Metadata-Version: 1.1
Name: maskouk-pysqlite
Version: 0.1
Summary: maskouk: Arabic Dictionary for Collocations - python + sqlite
Home-page: http://maskouk.sourceforge.net/
Author: Taha Zerrouki
Author-email: taha.zerrouki@gmail.com
License: GPL
Description: Maskouk-pysqlite مكتبة مسكوك (Maskouk library)
        ================================================
        
        Arabic collocations library and data for Python, with an SQLite API.
        |maskouk logo|
        
        |downloads| |downloads2|
        
        Developers: Taha Zerrouki, http://tahadz.com, taha dot zerrouki at gmail
        dot com
        
        +-----------+------------------------------------------------------------------+
        | Features  | Value                                                            |
        +===========+==================================================================+
        | Authors   | `Authors.md <https://github.com/linuxscout/maskouk-pysqlite/mast |
        |           | er/AUTHORS.md>`__                                                |
        +-----------+------------------------------------------------------------------+
        | Release   | 0.1                                                              |
        +-----------+------------------------------------------------------------------+
        | License   | `GPL <https://github.com/linuxscout/maskouk-pysqlite/master/LICE |
        |           | NSE>`__                                                          |
        +-----------+------------------------------------------------------------------+
        | Tracker   | `linuxscout/maskouk/Issues <https://github.com/linuxscout/maskou |
        |           | k-pysqlite/issues>`__                                            |
        +-----------+------------------------------------------------------------------+
        | Website   | `http://maskouk.sourceforge.net <http://maskouk-pysqlite.sourcef |
        |           | orge.net>`__                                                     |
        +-----------+------------------------------------------------------------------+
        | Source    | `Github <http://github.com/linuxscout/maskouk-pysqlite>`__       |
        +-----------+------------------------------------------------------------------+
        | Downloads | `sourceforge <http://maskouk.sourceforge.net>`__                 |
        +-----------+------------------------------------------------------------------+
        | Feedbacks | `Comments <https://github.com/linuxscout/maskouk-pysqlite/>`__   |
        +-----------+------------------------------------------------------------------+
        | Accounts  | `@Twitter <https://twitter.com/linuxscout>`__                    |
        |           | `@Sourceforge <http://sourceforge.net/projects/maskouk/>`__      |
        +-----------+------------------------------------------------------------------+
        
        Description
        -----------
        
        Maskouk is a database of Arabic collocations extracted from Wikipedia.
        
        The source data is the Arabic Wikipedia database of 21 June 2011.
        
        Installation
        ~~~~~~~~~~~~
        
        .. code:: shell
        
            pip install maskouk-pysqlite
        
        Usage
        ~~~~~
        
        Import
        ^^^^^^
        
        .. code:: python
        
            >>> import pyarabic.araby as araby
            >>> import maskouk.collocations as msk
            >>> mydict = msk.CollocationClass()
        
        Test if collocation exists in database
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        
        .. code:: python
        
            >>> wlist = [u"كرة", u"القدم"]
            >>> # test if collocation exists
            >>> results = mydict.is_collocated(wlist)
            >>> print("input:", wlist)
            input: ['كرة', 'القدم']
            >>> print("output:", results)
            output: كرة القدم
            >>> wlist = [u"شمس", u"النهار"]
            >>> results = mydict.is_collocated(wlist)
            >>> print("input:", wlist)
            input: ['شمس', 'النهار']
            >>> print("output:", results)
            output: False
        
        Test if a word has collocations in database
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        
        .. code:: python
        
            >>> # get all collocations for a specific word
            >>> word1 = u"كرة"
            >>> results = mydict.is_collocated_word(word1)
            >>> print("input:", word1)
            input: كرة
            >>> print("output:", results)
            output: {'القدم': 'كُرَة الْقَدَمِ'}
            >>> word = u"بيت"
            >>> # get all collocations for a specific word
            >>> results = mydict.is_collocated_word(word)
            >>> print("input:", word)
            input: بيت
            >>> print("output:", results)
            output: {'العدة': 'بَيْت الْعِدَّةِ', 'المستأجر': 'بَيْت الْمُسْتَأْجِرِ', 'المشتري': 'بَيْتِ الْمُشْتَرِي', 'الرجل': 'بَيْت الرَّجُلِ', 'البناء': 'بَيْت الْبِنَاءِ', 'الزوج': 'بَيْت الزَّوْجِ', 'المال': 'بيت المال', 'المقدس': 'بَيْت الْمَقْدِسِ', 'البائع': 'بَيْت الْبَائِعِ', 'الخلاء': 'بَيْت الْخَلَاءِ', 'الأب': 'بَيْت الْأَبِ', 'الله': 'بَيْت اللّهِ'}
        
        Detect collocation in a phrase
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        
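        The result can be returned either as a list in which each detected
        collocation is merged into one item (``ngramfinder``), or in tagged form
        as a pair of lists of vocalized words and tags (``lookup``). As the
        example shows, CB marks the first word of a collocation, CI a following
        word of the same collocation, and CO a word outside any collocation.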
        
        .. code:: python
        
            >>> # detect collocations in a phrase, list form
            >>> text = u"لعبنا مباراة كرة القدم في بيت المقدس"
            >>> wordlist = araby.tokenize(text)
            >>> results = mydict.ngramfinder(2, wordlist)
            >>> print("input:", text)
            input: لعبنا مباراة كرة القدم في بيت المقدس
            >>> print("output:", results)
            output: ['لعبنا', 'مباراة', 'كرة القدم', 'في', 'بيت المقدس']
            >>> # detect collocations in the same phrase, tagged form
            >>> results = mydict.lookup(wordlist)
            >>> print("input:", text)
            input: لعبنا مباراة كرة القدم في بيت المقدس
            >>> print("output:", results)
            output: (['لعبنا', 'مباراة', 'كُرَة', 'الْقَدَمِ', 'في', 'بَيْت', 'الْمَقْدِسِ'], ['CO', 'CO', 'CB', 'CI', 'CO', 'CB', 'CI'])
        
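        The two output forms can be illustrated with a minimal, self-contained
        sketch. It is not the library's implementation: it assumes a hypothetical
        in-memory set ``KNOWN_BIGRAMS`` in place of the SQLite database, splits on
        whitespace instead of calling ``araby.tokenize``, and keeps the words
        unvocalized, whereas ``lookup`` returns vocalized forms.
        
        .. code:: python
        
            # Hypothetical stand-in for the collocations table: (first, second) pairs.
            KNOWN_BIGRAMS = {("كرة", "القدم"), ("بيت", "المقدس")}
        
        
            def merge_bigrams(tokens, known=KNOWN_BIGRAMS):
                """List form: join adjacent tokens that form a known bigram."""
                merged = []
                i = 0
                while i < len(tokens):
                    if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in known:
                        merged.append(tokens[i] + " " + tokens[i + 1])
                        i += 2
                    else:
                        merged.append(tokens[i])
                        i += 1
                return merged
        
        
            def tag_bigrams(tokens, known=KNOWN_BIGRAMS):
                """Tagged form: CB/CI for the words of a bigram, CO for the others."""
                tags = []
                i = 0
                while i < len(tokens):
                    if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in known:
                        tags.extend(["CB", "CI"])
                        i += 2
                    else:
                        tags.append("CO")
                        i += 1
                return tokens, tags
        
        
            words = "لعبنا مباراة كرة القدم في بيت المقدس".split()
            print(merge_bigrams(words))
            print(tag_bigrams(words)[1])  # ['CO', 'CO', 'CB', 'CI', 'CO', 'CB', 'CI']
        
        Both functions scan left to right and merge greedily, so in a run of three
        words where both overlapping pairs are known, only the first pair is merged.
        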
        Detect long collocations in a phrase
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        
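        Some collocations are too long to be stored in a bigram database, for
        example "بسم الله الرحمن الرحيم", "السلام عليكم ورحمة الله وبركاته" and
        "أهلا وسهلا بكم". ``lookup4long_collocations`` takes the raw text (not a
        token list) and returns it with the detected long collocations vocalized.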
        
        .. code:: python
        
            >>> # get long collocations
            >>> text = u" قلت لهم السلام عليكم ورحمة الله تعالى وبركاته ثم رجعت"
            >>> results = mydict.lookup4long_collocations(text)
            >>> print("input:", text)
            input:  قلت لهم السلام عليكم ورحمة الله تعالى وبركاته ثم رجعت
            >>> print("output:", results)
            output:  قلت لهم السّلامُ عَلَيكُمْ وَرَحْمَةُ اللهِ تَعَالَى وبركاته ثم رجعت
        
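        Because a long collocation can span more than two words, one way to find
        it is a longest-match scan over the token list. The sketch below is an
        illustration under assumptions, not how ``lookup4long_collocations`` is
        implemented: the phrase set ``LONG_COLLOCATIONS`` is hypothetical (built
        from the examples above), and phrases must appear as contiguous, exactly
        spelled words.
        
        .. code:: python
        
            # Hypothetical stand-in for the long-collocations list, as word tuples.
            LONG_COLLOCATIONS = {
                ("بسم", "الله", "الرحمن", "الرحيم"),
                ("السلام", "عليكم", "ورحمة", "الله", "وبركاته"),
                ("أهلا", "وسهلا", "بكم"),
            }
            MAX_LEN = max(len(phrase) for phrase in LONG_COLLOCATIONS)
        
        
            def find_long_collocations(tokens, phrases=LONG_COLLOCATIONS, max_len=MAX_LEN):
                """Return (start, end) token spans of known phrases, scanning left to right."""
                spans = []
                i = 0
                while i < len(tokens):
                    # Try the longest window first so longer phrases win.
                    for size in range(min(max_len, len(tokens) - i), 1, -1):
                        if tuple(tokens[i:i + size]) in phrases:
                            spans.append((i, i + size))
                            i += size
                            break
                    else:
                        # No phrase starts at position i.
                        i += 1
                return spans
        
        
            words = "قال أهلا وسهلا بكم ثم جلس".split()
            print(find_long_collocations(words))  # prints [(1, 4)]
        
        Trying the longest window first means that, when a listed phrase begins
        with a shorter listed phrase, the longer one is reported.
        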
        Detect candidate collocations in phrase
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        
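        A candidate collocation is a word pair that is not in the database but
        matches one of the extraction rules; this feature is used to extract new
        collocations with rules. ``is_possible_collocation`` returns the number
        of the matching rule, or 100 (the default) when the pair is not a
        collocation.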
        
        .. code:: python
        
            >>> text = u"ظهر رئيس الوزراء السيد عبد الملك بن عامر ومعه أمير دولة غرناطة ونهر النيل انطلاق السباق"
            >>> wordlist = araby.tokenize(text)
            >>> previous = "__"
            >>> for wrd in wordlist:
            ...     wlist = [previous, wrd]
            ...     results = mydict.is_possible_collocation(wlist, lenght=2)
            ...     print("input:", wlist)
            ...     print("output:", results)
            ...     previous = wrd
            ...
            input: ['__', 'ظهر']
            output: 100
            input: ['ظهر', 'رئيس']
            output: 100
            input: ['رئيس', 'الوزراء']
            output: 100
            input: ['الوزراء', 'السيد']
            output: 20
            input: ['السيد', 'عبد']
            output: 100
            input: ['عبد', 'الملك']
            output: 15
            input: ['الملك', 'بن']
            output: 100
            input: ['بن', 'عامر']
            output: 15
            input: ['عامر', 'ومعه']
            output: 100
            input: ['ومعه', 'أمير']
            output: 100
            input: ['أمير', 'دولة']
            output: 100
            input: ['دولة', 'غرناطة']
            output: 10
            input: ['غرناطة', 'ونهر']
            output: 100
            input: ['ونهر', 'النيل']
            output: 100
            input: ['النيل', 'انطلاق']
            output: 100
            input: ['انطلاق', 'السباق']
            output: 100
        
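        The loop above can be wrapped in a helper that keeps only the pairs whose
        rule code differs from the default 100. In this sketch the scoring
        function is passed in as a parameter; ``candidate_collocations`` and
        ``stub_rule`` are hypothetical names, and the stub's codes are copied
        from the session above so the sketch runs without the database.
        
        .. code:: python
        
            NO_RULE = 100  # default code: no collocation rule matched
        
        
            def candidate_collocations(tokens, rule_of, start="__"):
                """Return (previous, word, code) for each adjacent pair with a rule code.
        
                rule_of takes a [previous, word] list and returns a rule code.
                """
                found = []
                previous = start  # placeholder before the first word, as in the example
                for word in tokens:
                    code = rule_of([previous, word])
                    if code != NO_RULE:
                        found.append((previous, word, code))
                    previous = word
                return found
        
        
            # Hypothetical offline stub: codes copied from the session above.
            STUB_CODES = {("الوزراء", "السيد"): 20, ("عبد", "الملك"): 15, ("دولة", "غرناطة"): 10}
        
        
            def stub_rule(pair):
                return STUB_CODES.get(tuple(pair), NO_RULE)
        
        
            words = "رئيس الوزراء السيد عبد الملك أمير دولة غرناطة".split()
            for previous, word, code in candidate_collocations(words, stub_rule):
                print(previous, word, code)
        
        With the library installed, passing
        ``lambda pair: mydict.is_possible_collocation(pair, lenght=2)`` as
        ``rule_of`` would apply the same filter to the real rule codes (keyword
        spelled as in the example above).
        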
        Requirements
        ^^^^^^^^^^^^
        
        1. pyarabic
        2. SQLite
        
        Data structure
        --------------
        
        Collocations database
        ~~~~~~~~~~~~~~~~~~~~~
        
        .. code:: sql
        
            CREATE TABLE "collocations" (
                "id" INTEGER PRIMARY KEY  NOT NULL , 
                "vocalized" VARCHAR,
                "unvocalized" VARCHAR,
                "rule" VARCHAR, 
                "category" VARCHAR, 
                "note" VARCHAR,
                "first" VARCHAR,
                "second" VARCHAR
                );
        
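        For illustration, the table can be created and queried with Python's
        standard ``sqlite3`` module. This sketch uses a throwaway in-memory
        database and one sample row taken from the examples above; it assumes
        that ``first`` and ``second`` hold unvocalized words, and the helper
        ``collocations_of`` is hypothetical: it only mimics the shape of the
        ``is_collocated_word`` result, not necessarily the library's own query.
        
        .. code:: python
        
            import sqlite3
        
            # Schema copied from the CREATE TABLE statement above.
            SCHEMA = """
            CREATE TABLE "collocations" (
                "id" INTEGER PRIMARY KEY NOT NULL,
                "vocalized" VARCHAR,
                "unvocalized" VARCHAR,
                "rule" VARCHAR,
                "category" VARCHAR,
                "note" VARCHAR,
                "first" VARCHAR,
                "second" VARCHAR
            );
            """
        
        
            def collocations_of(conn, first_word):
                """Hypothetical helper: map each second word to its vocalized collocation."""
                cursor = conn.execute(
                    'SELECT "second", "vocalized" FROM "collocations" WHERE "first" = ?',
                    (first_word,),
                )
                return dict(cursor.fetchall())
        
        
            conn = sqlite3.connect(":memory:")  # throwaway in-memory database
            conn.executescript(SCHEMA)
            conn.execute(
                'INSERT INTO "collocations" ("id", "vocalized", "unvocalized", "first", "second") '
                "VALUES (?, ?, ?, ?, ?)",
                (1, "كُرَة الْقَدَمِ", "كرة القدم", "كرة", "القدم"),
            )
            print(collocations_of(conn, "كرة"))
        
        Using ``?`` placeholders lets ``sqlite3`` handle quoting of the Arabic
        strings instead of formatting them into the SQL text.
        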
        CSV structure (same columns as the ``collocations`` table):
        
        1. id: unique identifier in the database
        2. vocalized: vocalized collocation
        3. unvocalized: unvocalized collocation
        4. rule: number of the extraction rule
        5. category: collocation category
        6. note: free-text note
        7. first: first word
        8. second: second word
        
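        A CSV file with these columns could be read with the standard ``csv``
        module, as sketched below. The comma delimiter and the absence of a
        header line are assumptions, since this README does not specify them;
        ``read_collocations`` is a hypothetical helper, and the sample row reuses
        the كرة القدم example with the unknown fields left empty.
        
        .. code:: python
        
            import csv
            import io
        
            # Column order as documented above.
            FIELDS = ["id", "vocalized", "unvocalized", "rule",
                      "category", "note", "first", "second"]
        
        
            def read_collocations(stream, delimiter=","):
                """Yield one dict per row; assumes no header line and the given delimiter."""
                yield from csv.DictReader(stream, fieldnames=FIELDS, delimiter=delimiter)
        
        
            sample = io.StringIO("1,كُرَة الْقَدَمِ,كرة القدم,,,,كرة,القدم\n")
            for record in read_collocations(sample):
                print(record["first"], record["second"], record["vocalized"])
        
        To read a file on disk, open it with ``encoding="utf-8"`` and
        ``newline=""`` (as the ``csv`` documentation recommends) and pass the
        file object to ``read_collocations``.
        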
        
        
        .. |maskouk logo| image:: doc/maskouk_header.png
        .. |downloads| image:: https://img.shields.io/sourceforge/dt/maskouk.svg
           :target: http://sourceforge.org/projects/maskouk
        .. |downloads2| image:: https://img.shields.io/sourceforge/dm/maskouk.svg
           :target: http://sourceforge.org/projects/maskouk
        
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: End Users/Desktop
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
