Metadata-Version: 1.1
Name: spsim
Version: 0.1.2
Summary: A spelling similarity measure for cognate identification.
Home-page: https://github.com/luismsgomes/spsim
Author: Luís Gomes
Author-email: luismsgomes@gmail.com
License: MIT
Description: =======
         spsim
        =======
        
        ``spsim`` is a Python 3 module that implements a spelling similarity measure
        for identifying cognates across languages, taking into account spelling
        differences that are characteristic of each language pair, as described
        in [Gomes2011]_.
        
        Note: in the examples below, ``$`` denotes the Bash prompt, and a Linux,
        macOS or similar \*nix environment is assumed.
        
        Install as usual::
        
            $ pip3 install spsim
        
        Example command line usage::
        
            $ # first let's get some pairs of words that may be cognates:
            $ wget http://research.variancia.com/spsim/maybe_enpt.txt
            $ cat maybe_enpt.txt
            pharmacy    farmácia
            arithmetic  aritmética
        
            $ # If we don't give any example cognates, SpSim will be equivalent to
            $ #             1 - edit_distance / max_len_of_strings
            $ # Note that by default spsim ignores accents when comparing characters, i.e. a == á
            $ echo "" > empty.txt
            $ spsim empty.txt maybe_enpt.txt
            pharmacy    farmácia    0.5
            arithmetic  aritmética  0.8
        
            $ # now let's get some example cognates:
            $ wget http://research.variancia.com/spsim/examples_enpt.txt
            $ cat examples_enpt.txt
            alcohol     álcool
            alpha       alfa
            anomaly     anomalia
            mathematics matemática
            methodology metodologia
            metric      métrica
            morphine    morfina
            photos      fotos
        
            $ # by giving these examples to spsim, it will learn to ignore certain differences:
            $ spsim examples_enpt.txt maybe_enpt.txt
            pharmacy    farmácia    1.0
            arithmetic  aritmética  1.0
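        
        The fallback behaviour noted in the comments above (no example cognates
        given) can be sketched in Python. This is only an illustration of
        ``1 - edit_distance / max_len_of_strings`` with accents ignored, not the
        code used by ``spsim``; the names ``fold_accents``, ``edit_distance`` and
        ``baseline_similarity`` are hypothetical, and accent matching is assumed
        to be equivalent to stripping Unicode combining marks::
        
            import unicodedata
        
        
            def fold_accents(text):
                """Strip combining marks so that, e.g., 'á' compares equal to 'a'."""
                decomposed = unicodedata.normalize("NFD", text)
                return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
        
        
            def edit_distance(a, b):
                """Levenshtein distance with unit-cost insert, delete and substitute."""
                previous = list(range(len(b) + 1))
                for i, ca in enumerate(a, start=1):
                    current = [i]
                    for j, cb in enumerate(b, start=1):
                        current.append(min(
                            previous[j] + 1,               # delete ca
                            current[j - 1] + 1,            # insert cb
                            previous[j - 1] + (ca != cb),  # substitute or match
                        ))
                    previous = current
                return previous[-1]
        
        
            def baseline_similarity(a, b):
                """Assumed baseline: 1 - edit_distance / max_len_of_strings."""
                a, b = fold_accents(a), fold_accents(b)
                longest = max(len(a), len(b))
                if longest == 0:
                    return 1.0  # assumption: two empty strings count as identical
                return 1.0 - edit_distance(a, b) / longest
        
        
            print(baseline_similarity("pharmacy", "farmácia"))     # 0.5
            print(baseline_similarity("arithmetic", "aritmética"))  # 0.8
        
        For the two pairs in ``maybe_enpt.txt`` this sketch gives the same scores
        as the run with ``empty.txt`` shown above. It does not model the
        substitution patterns that ``spsim`` learns from example cognates.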
        
        
        .. [Gomes2011] Luís Gomes and Gabriel Pereira Lopes,
            Measuring Spelling Similarity for Cognate Identification,
            in *Progress in Artificial Intelligence, 15th Portuguese Conference in
            Artificial Intelligence, EPIA 2011, Lisboa, Portugal, October 2011*,
            http://www.springerlink.com/content/gtl56j3l06906020/
        
        
Keywords: text bilingual cognate mt machine translation
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.5
Classifier: Topic :: Text Processing :: Linguistic
