Metadata-Version: 2.1
Name: symspellpy
Version: 6.5.0
Summary: Python SymSpell
Home-page: https://github.com/mammothb/symspellpy
Author: mmb L
Author-email: mammothb@hotmail.com
License: MIT
Description: symspellpy <br>
        [![Build Status](https://travis-ci.com/mammothb/symspellpy.svg?branch=master)](https://travis-ci.com/mammothb/symspellpy)
        [![Documentation Status](https://readthedocs.org/projects/symspellpy/badge/?version=latest)](https://symspellpy.readthedocs.io/en/latest/?badge=latest)
        [![codecov](https://codecov.io/gh/mammothb/symspellpy/branch/master/graph/badge.svg)](https://codecov.io/gh/mammothb/symspellpy)
        ========
        
        symspellpy is a Python port of [SymSpell](https://github.com/wolfgarbe/SymSpell) v6.3, a spelling correction algorithm designed for high speed and low memory consumption. Unit tests
        from the original project are implemented to ensure the accuracy of the port.
        
        Please note that the port has not been optimized for speed.
        
        Usage
        ========
        ### Installing the `symspellpy` module
        ```pip install -U symspellpy```
        
        ### Copying the frequency dictionary to your project
        Copy `frequency_dictionary_en_82_765.txt` and `frequency_bigramdictionary_en_243_342.txt` (found in the inner `symspellpy`
        directory) to your project directory so you end up with the following layout:
        ```
        project_dir
          +-frequency_dictionary_en_82_765.txt
          +-frequency_bigramdictionary_en_243_342.txt
          \-project.py
        ```
        
        ### Adding new terms
          - Use `load_dictionary(corpus=<path/to/dictionary.txt>, term_index=<term_index>, count_index=<count_index>)`. `dictionary.txt` should contain:
        ```
        <term> <count>
        <term> <count>
        ...
        <term> <count>
        ```
        with `term_index` indicating the zero-based column number of the terms and `count_index` indicating the zero-based column number of the counts/frequencies.
          - Append `<term> <count>` to the provided `frequency_dictionary_en_82_765.txt`
          - Use the method `create_dictionary_entry(key=<term>, count=<count>)`
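        
        The file layout described above can be illustrated with a small standard-library sketch. The helper `read_frequency_file` is hypothetical (it is not part of symspellpy), and the terms and counts are made up; it only shows how `term_index` and `count_index` select columns from each space-separated line.
        
        ```python
        import os
        import tempfile
        
        def read_frequency_file(path, term_index, count_index, encoding="utf-8"):
            """Hypothetical helper: parse space-separated `<term> <count>` lines."""
            words = {}
            with open(path, encoding=encoding) as infile:
                for line in infile:
                    parts = line.split()
                    if len(parts) <= max(term_index, count_index):
                        continue  # skip blank or incomplete lines
                    try:
                        words[parts[term_index]] = int(parts[count_index])
                    except ValueError:
                        continue  # skip lines whose count is not an integer
            return words
        
        # Write a tiny dictionary file with made-up counts, then read it back.
        with tempfile.TemporaryDirectory() as tmp_dir:
            path = os.path.join(tmp_dir, "dictionary.txt")
            with open(path, "w", encoding="utf-8") as outfile:
                outfile.write("apple 120\nbanana 45\n")
            words = read_frequency_file(path, term_index=0, count_index=1)
        
        print(words)
        ```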
        
        ### Sample usage (`create_dictionary`)
        ```python
        from symspellpy.symspellpy import SymSpell  # import the module
        
        def main():
            # maximum edit distance per dictionary precalculation
            max_edit_distance_dictionary = 2
            prefix_length = 7
            # create object
            sym_spell = SymSpell(max_edit_distance_dictionary, prefix_length)
            
            # create dictionary using corpus.txt (replace the placeholder path)
            if not sym_spell.create_dictionary("<path/to/corpus.txt>"):
                print("Corpus file not found")
                return
        
            for key, count in sym_spell.words.items():
                print("{} {}".format(key, count))
        
        if __name__ == "__main__":
            main()
        ```
        `corpus.txt` should contain:
        ```
        abc abc-def abc_def abc'def abc qwe qwe1 1qwe q1we 1234 1234
        ```
        Expected output:
        ```
        abc 4
        def 2
        abc'def 1
        qwe 1
        qwe1 1
        1qwe 1
        q1we 1
        1234 2
        ```
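        
        The counts above are consistent with a simple tokenization rule: lowercase the text, treat runs of letters, digits, and apostrophes as words, and treat everything else (hyphens, underscores, whitespace) as separators. The sketch below reproduces the expected output under that assumed rule using only the standard library; it is not necessarily how `create_dictionary` is implemented.
        
        ```python
        import re
        from collections import Counter
        
        corpus = "abc abc-def abc_def abc'def abc qwe qwe1 1qwe q1we 1234 1234"
        
        # Assumed rule: a word is a run of letters, digits, or apostrophes;
        # hyphens, underscores, and whitespace act as separators.
        tokens = re.findall(r"(?:[^\W_]|')+", corpus.lower())
        counts = Counter(tokens)
        
        # Counter preserves first-seen order, so this prints one "<term> <count>"
        # line per distinct token in the order the tokens first appear.
        for term, count in counts.items():
            print("{} {}".format(term, count))
        ```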
        
        ### Sample usage (`lookup` and `lookup_compound`)
        Using `project.py` (code is more verbose than required to allow explanation of method arguments)
        ```python
        import pkg_resources
        
        from symspellpy.symspellpy import SymSpell, Verbosity  # import the module
        
        def main():
            # maximum edit distance per dictionary precalculation
            max_edit_distance_dictionary = 2
            prefix_length = 7
            # create object
            sym_spell = SymSpell(max_edit_distance_dictionary, prefix_length)
            # load dictionary
            dictionary_path = pkg_resources.resource_filename(
                "symspellpy", "frequency_dictionary_en_82_765.txt")
            bigram_path = pkg_resources.resource_filename(
                "symspellpy", "frequency_bigramdictionary_en_243_342.txt")
            # term_index is the column of the term and count_index is the
            # column of the term frequency
            if not sym_spell.load_dictionary(dictionary_path, term_index=0,
                                             count_index=1):
                print("Dictionary file not found")
                return
            if not sym_spell.load_bigram_dictionary(bigram_path, term_index=0,
                                                    count_index=2):
                print("Bigram dictionary file not found")
                return
        
            # lookup suggestions for single-word input strings
            input_term = "memebers"  # misspelling of "members"
            # max edit distance per lookup
            # (max_edit_distance_lookup <= max_edit_distance_dictionary)
            max_edit_distance_lookup = 2
            suggestion_verbosity = Verbosity.CLOSEST  # TOP, CLOSEST, ALL
            suggestions = sym_spell.lookup(input_term, suggestion_verbosity,
                                           max_edit_distance_lookup)
            # display suggestion term, edit distance, and term frequency
            for suggestion in suggestions:
                print("{}, {}, {}".format(suggestion.term, suggestion.distance,
                                          suggestion.count))
        
            # lookup suggestions for multi-word input strings (supports compound
            # splitting & merging)
            input_term = ("whereis th elove hehad dated forImuch of thepast who "
                          "couqdn'tread in sixtgrade and ins pired him")
            # max edit distance per lookup (per single word, not per whole input string)
            max_edit_distance_lookup = 2
            suggestions = sym_spell.lookup_compound(input_term,
                                                    max_edit_distance_lookup)
            # display suggestion term, edit distance, and term frequency
            for suggestion in suggestions:
                print("{}, {}, {}".format(suggestion.term, suggestion.distance,
                                          suggestion.count))
        
        if __name__ == "__main__":
            main()
        ```
        ##### Expected output:
        `members, 1, 226656153`<br><br>
        `where is the love he had dated for much of the past who couldn't read in six grade and inspired him, 9, 0`
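        
        The `1` in the first result is the edit distance between `memebers` and `members`. As a minimal sketch of what that number means, the function below computes the plain Levenshtein distance with the standard library only. The function name is illustrative, and the library's own distance measure may differ in details such as how transpositions are counted.
        
        ```python
        def levenshtein(source, target):
            """Minimum number of single-character insertions, deletions, and substitutions."""
            # previous[j] is the distance between the processed prefix of source
            # and target[:j]
            previous = list(range(len(target) + 1))
            for i, source_char in enumerate(source, start=1):
                current = [i]
                for j, target_char in enumerate(target, start=1):
                    cost = 0 if source_char == target_char else 1
                    current.append(min(previous[j] + 1,          # deletion
                                       current[j - 1] + 1,       # insertion
                                       previous[j - 1] + cost))  # substitution or match
                previous = current
            return previous[-1]
        
        print(levenshtein("memebers", "members"))  # 1
        ```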
        
        ### Sample usage (`word_segmentation`)
        Using `project.py` (code is more verbose than required to allow explanation of
        method arguments)
        ```python
        import pkg_resources
        
        from symspellpy.symspellpy import SymSpell  # import the module
        
        def main():
            # maximum edit distance per dictionary precalculation
            max_edit_distance_dictionary = 0
            prefix_length = 7
            # create object
            sym_spell = SymSpell(max_edit_distance_dictionary, prefix_length)
            # load dictionary
            dictionary_path = pkg_resources.resource_filename(
                "symspellpy", "frequency_dictionary_en_82_765.txt")
            bigram_path = pkg_resources.resource_filename(
                "symspellpy", "frequency_bigramdictionary_en_243_342.txt")
            # term_index is the column of the term and count_index is the
            # column of the term frequency
            if not sym_spell.load_dictionary(dictionary_path, term_index=0,
                                             count_index=1):
                print("Dictionary file not found")
                return
            if not sym_spell.load_bigram_dictionary(bigram_path, term_index=0,
                                                    count_index=2):
                print("Bigram dictionary file not found")
                return
        
            # a sentence without any spaces
            input_term = "thequickbrownfoxjumpsoverthelazydog"
            
            result = sym_spell.word_segmentation(input_term)
            # display segmented string, total edit distance, and sum of log probabilities
            print("{}, {}, {}".format(result.corrected_string, result.distance_sum,
                                      result.log_prob_sum))
        
        if __name__ == "__main__":
            main()
        ```
        ##### Expected output:
        `the quick brown fox jumps over the lazy dog, 8, -34.491167981910635`
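        
        To give an intuition for `corrected_string` and `log_prob_sum`, here is a minimal dynamic-programming sketch that splits a string into dictionary words while maximizing the summed log probability. It is an illustration under stated assumptions, not the library's algorithm: it does no spelling correction, the `segment` function is hypothetical, and the word counts are toy values made up for the example.
        
        ```python
        import math
        
        # Toy word counts, made up for illustration only.
        COUNTS = {"the": 500, "quick": 20, "brown": 30, "fox": 10, "he": 200}
        TOTAL = sum(COUNTS.values())
        MAX_WORD_LENGTH = max(len(word) for word in COUNTS)
        
        def segment(text):
            """Return (words, log_prob_sum) for the most probable split of `text`."""
            # best[i] holds the best (log_prob_sum, words) for text[:i], or None
            # when text[:i] cannot be split into known words.
            best = [None] * (len(text) + 1)
            best[0] = (0.0, [])
            for end in range(1, len(text) + 1):
                for start in range(max(0, end - MAX_WORD_LENGTH), end):
                    word = text[start:end]
                    if best[start] is None or word not in COUNTS:
                        continue
                    score = best[start][0] + math.log10(COUNTS[word] / TOTAL)
                    if best[end] is None or score > best[end][0]:
                        best[end] = (score, best[start][1] + [word])
            if best[-1] is None:
                return None, None
            return best[-1][1], best[-1][0]
        
        words, log_prob_sum = segment("thequickbrownfox")
        print(" ".join(words), log_prob_sum)
        ```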
        
        
        ### Transferring casing
        
        To transfer the casing (e.g., uppercase/lowercase) from the original phrase
        to the typo-corrected one, use the `transfer_casing` boolean flag of
        the `lookup()` and the `lookup_compound()` methods:
        
        `lookup_compound()`:
        ```python
        suggestions = sym_spell.lookup_compound(input_term,
                                                max_edit_distance_lookup,
                                                transfer_casing=True)
        ```
        
        `lookup()`:
        ```python
        suggestions = sym_spell.lookup(input_term,
                                       suggestion_verbosity,
                                       max_edit_distance_lookup,
                                       transfer_casing=True)
        ```
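        
        As a rough picture of what casing transfer means, the sketch below copies the casing pattern of a single original word onto its corrected form. The function `copy_word_casing` is hypothetical and deliberately simplistic (it only distinguishes all-uppercase, capitalized, and other words); the behaviour of `transfer_casing=True` in the library may be more fine-grained.
        
        ```python
        def copy_word_casing(original, corrected):
            """Hypothetical helper: apply the casing pattern of `original` to `corrected`."""
            if original.isupper():
                return corrected.upper()
            if original[:1].isupper():
                # Capitalized word: uppercase only the first character.
                return corrected[:1].upper() + corrected[1:]
            return corrected
        
        print(copy_word_casing("Memebers", "members"))  # Members
        print(copy_word_casing("MEMEBERS", "members"))  # MEMBERS
        print(copy_word_casing("memebers", "members"))  # members
        ```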
        
        
        
        
        CHANGELOG <br>
        ==============
        
        ## 6.5.0 (2019-09-21)
        ---------------------
        - Added `load_bigram_dictionary` and bigram dictionary `frequency_bigramdictionary_en_243_342.txt`
        - Updated `lookup_compound` algorithm
        - Added `Levenshtein` to compute edit distance
        - Added `save_pickle_stream` and `load_pickle_stream` to save/load SymSpell data alongside other structure (contribution by [marcoffee](https://github.com/marcoffee))
        
        ## 6.3.9 (2019-08-06)
        ---------------------
        - Added `transfer_casing` to `lookup` and `lookup_compound`
        - Fixed prefix length check in `_edits_prefix`
        
        ## 6.3.8 (2019-03-21)
        ---------------------
        - Implemented `delete_dictionary_entry`
        - Improved performance by using python builtin hashing
        - Added versioning of the pickle
        
        ## 6.3.7 (2019-02-18)
        ---------------------
        - Fixed `include_unknown` in `lookup`
        - Removed unused `initial_capacity` argument
        - Improved `_get_str_hash` performance
        - Implemented `save_pickle` and `load_pickle` to avoid having to create the
        dictionary every time
        
        ## 6.3.6 (2019-02-11)
        ---------------------
        - Added `create_dictionary()` feature
        
        ## 6.3.5 (2019-01-14)
        ---------------------
        - Fixed `lookup_compound()` to return the correct `distance`
        
        ## 6.3.4 (2019-01-04)
        ---------------------
        - Added `<self._replaced_words = dict()>` to track number of misspelled words
        - Added `ignore_token` to `word_segmentation()` to ignore words with regular expression
        
        ## 6.3.3 (2018-12-05)
        ---------------------
        - Added `word_segmentation()` feature
        
        ## 6.3.2 (2018-10-23)
        ---------------------
        - Added `encoding` option to `load_dictionary()`
        
        ## 6.3.1 (2018-08-30)
        ---------------------
        - Created a package for `symspellpy`
        
        ## 6.3.0 (2018-08-13)
        ---------------------
        - Ported [SymSpell](https://github.com/wolfgarbe/SymSpell) v6.3
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Requires-Python: >=3.4
Description-Content-Type: text/markdown
