Metadata-Version: 2.1
Name: cldr-language-helpers
Version: 1.2.0
Summary: Some helpful unicode string processing tools in native Python
Home-page: https://github.com/night-crawler/cldr-language-helpers
Author: night-crawler
Author-email: lilo.panic@gmail.com
License: MIT
Platform: UNKNOWN
Classifier: Environment :: Web Environment
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: License :: OSI Approved :: MIT License
Requires-Python: >=3.6
Requires-Dist: funcy

cldr-language-helpers
=====================

This project was designed to provide a map from Unicode block ranges to
locale/language codes.

It uses the CLDR project (cldr.unicode.org) as its reference.
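
The idea of mapping code point ranges to language codes can be sketched
with the standard library alone. This is only an illustration under stated
assumptions: the ``RANGES`` table, ``langs_for_char`` and
``langs_for_string`` are hypothetical names that are not part of this
package, and the three ranges are hand-written rather than derived from
CLDR data.

.. code:: python

    from bisect import bisect_right

    # Hypothetical, hand-written table of (first code point, last code point, languages).
    # Must stay sorted by first code point with non-overlapping ranges.
    RANGES = [
        (0x0041, 0x005A, {'en'}),  # Latin capital letters A-Z
        (0x0061, 0x007A, {'en'}),  # Latin small letters a-z
        (0x0410, 0x044F, {'ru'}),  # Cyrillic letters А-я
    ]
    STARTS = [start for start, _, _ in RANGES]


    def langs_for_char(char):
        """Return the language codes whose range contains ``char``."""
        code_point = ord(char)
        # Find the last range that starts at or before this code point.
        idx = bisect_right(STARTS, code_point) - 1
        if idx >= 0:
            start, end, langs = RANGES[idx]
            if start <= code_point <= end:
                return set(langs)
        return set()


    def langs_for_string(text):
        """Return the union of language codes over all characters."""
        return set().union(*(langs_for_char(char) for char in text))


    assert langs_for_char('ф') == {'ru'}
    assert langs_for_char('1') == set()
    assert langs_for_string('qwe йцу') == {'en', 'ru'}

A sorted range table with binary search keeps the lookup cheap even when
many ranges are present; the package itself may store and query its data
differently.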

Build
=====

Check out the CLDR release sources (release 34 in this example):

.. code:: bash

    svn co http://www.unicode.org/repos/cldr/tags/release-34/ ./cldr-release-34

Tests
-----

Install the development requirements:

.. code:: bash

    pip install -r requirements/dev.txt

Then run the tests:

.. code:: bash

    pytest

Running the tests also populates the ``cldr_language_helpers/data`` directory.
Alternatively, run only the tests that populate it:

.. code:: bash

    pytest -sm generate

API
---

.. code:: python

    from cldr_language_helpers.annotator import StringAnnotator

    assert StringAnnotator('123').char_types_by_index == [{'numbers'}, {'numbers'}, {'numbers'}]
    assert 'ru_RU' in StringAnnotator('ф').langs_by_index[0]
    assert {'ru_RU', 'en_US', 'en', 'ru'}.issubset(StringAnnotator('йцу 123 qwe LOL').all_langs)

    stats = StringAnnotator('qwe йцу').lang_stats
    assert stats['ru_RU'] == 3
    assert stats['ru'] == 3
    assert stats['en'] == 3
    assert stats['space'] == 1

    assert 'en' in StringAnnotator('somesortof123').langs_intersection

    assert StringAnnotator('somesortof123').char_types_intersection == set()
    assert StringAnnotator('somesortof').char_types_intersection == {'auxiliary', 'main'}
    assert StringAnnotator(' ').char_types_intersection == {'space'}

    assert StringAnnotator('что-то everything как-то lol !').split_by_lang_intersection() == \
           ['что-то', ' ', 'everything', ' ', 'как-то', ' ', 'lol', ' ', '!']
    assert StringAnnotator('somesortof123').split_by_lang_intersection() == ['somesortof123']


    assert StringAnnotator('qwe, 123!!!').split_by_char_type() == \
           ['qwe', ',', ' ', '123', '!!!']

    assert StringAnnotator('йцу 123 qwe LOL').has_langs('ru', 'en')

    assert StringAnnotator().has_langs_throughout('ru') is False
    assert StringAnnotator('йцу 123').has_langs_throughout('ru') is True
    assert StringAnnotator('йцу 123').has_langs_throughout('ru_RU') is True
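
To give a feel for what splitting a string into runs of the same character
type involves, here is a minimal standard-library sketch. It is not how
``StringAnnotator`` is implemented: it groups by Unicode general category
from ``unicodedata`` instead of CLDR character types, and
``split_by_category`` is a hypothetical helper name.

.. code:: python

    from itertools import groupby
    from unicodedata import category


    def split_by_category(text):
        """Split ``text`` into runs of characters sharing a major Unicode category."""
        # category(char)[0] is the major class, e.g. 'L' letter, 'N' number,
        # 'P' punctuation, 'Z' separator.
        return [
            ''.join(run)
            for _, run in groupby(text, key=lambda char: category(char)[0])
        ]


    assert split_by_category('qwe, 123!!!') == ['qwe', ',', ' ', '123', '!!!']

For this input the result matches the ``split_by_char_type`` example above,
but the two approaches can differ on other strings.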


