Metadata-Version: 2.1
Name: super-collator
Version: 0.0.2
Summary: Collate textual sources with relaxed spelling.
Project-URL: Homepage, https://github.com/cceh/super-collator
Project-URL: Bug Tracker, https://github.com/cceh/super-collator/issues
Author-email: Marcello Perathoner <marcello@perathoner.de>
License-File: LICENSE
Keywords: collator,needleman-wunsch,python
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Text Processing :: Indexing
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.8
Description-Content-Type: text/x-rst

==============
Super Collator
==============

.. |py39| image:: docs/_images/tox-py39.svg

.. |py310| image:: docs/_images/tox-py310.svg

.. |py311| image:: docs/_images/tox-py311.svg

.. |pypy38| image:: docs/_images/tox-pypy38.svg

.. |coverage| image:: docs/_images/coverage.svg

|py39| |py310| |py311| |pypy38| |coverage|

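Collates textual sources with relaxed spelling: tokens are aligned when they
are similar, not only when they are identical.  Uses Gotoh's variant of the
Needleman-Wunsch sequence alignment algorithm, which scores gaps with an
affine penalty.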

.. code-block:: shell

   $ pip install super-collator

.. code-block:: python

   >>> from super_collator.strategy import CommonNgramsStrategy
   >>> from super_collator.token import SingleToken
   >>> from super_collator.super_collator import align, to_table

   >>> a = "Lorem ipsum dollar amat adipiscing elit"
   >>> b = "qui dolorem ipsum quia dolor sit amet consectetur adipisci velit"
   >>>
   >>> a = [SingleToken(s) for s in a.split()]
   >>> b = [SingleToken(s) for s in b.split()]
   >>>
   >>> c, score = align(a, b, CommonNgramsStrategy(2))
   >>> print(to_table(c))  # doctest: +NORMALIZE_WHITESPACE
   -   Lorem   ipsum -    dollar -   amat -           adipiscing elit
   qui dolorem ipsum quia dolor  sit amet consectetur adipisci   velit

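To give a feel for why ``dollar`` lines up with ``dolor`` in the table above,
here is a minimal sketch of a similarity score based on shared character
bigrams.  It assumes a Dice coefficient over n-gram sets; the helper names
``ngrams`` and ``similarity`` are hypothetical, and the scoring actually used
by ``CommonNgramsStrategy`` may differ.

.. code-block:: python

   def ngrams(word, n=2):
       """Return the set of character n-grams of a word (hypothetical helper)."""
       return {word[i:i + n] for i in range(len(word) - n + 1)}

   def similarity(a, b, n=2):
       """Dice coefficient over shared n-grams, from 0.0 to 1.0.

       Assumption: this only illustrates relaxed matching and is not
       the library's own scoring function.
       """
       ga, gb = ngrams(a, n), ngrams(b, n)
       if not ga or not gb:
           # Words shorter than n have no n-grams; fall back to equality.
           return 1.0 if a == b else 0.0
       return 2 * len(ga & gb) / (len(ga) + len(gb))

   print(similarity("ipsum", "ipsum"))
   print(similarity("dollar", "dolor"))
   print(similarity("dollar", "sit"))

Under this assumed measure, identical tokens score highest, near-spellings
such as ``dollar`` and ``dolor`` score in between, and unrelated tokens score
zero, which is the kind of graded match an aligner can reward.
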
Documentation: https://cceh.github.io/super-collator/

PyPI: https://pypi.org/project/super-collator/
