Metadata-Version: 1.1
Name: fastent
Version: 0.5.1
Summary: Automated Custom NER tool
Home-page: https://github.com/fastent/fastent
Author: Erik Arakelyan
Author-email: erikarakelyan1997@gmail.com
License: UNKNOWN
Download-URL: https://github.com/fastent/fastent/archive/0.5.tar.gz
Description: # fastent
        
        Fastent is a tool designed for creating end to end Custom Named Entity Recognition models. Entities **ARE NOT** limited to the usual predefiend classes of Person(PER), Location(LOC), Companies/agencies/institutions(ORG) etc etc. Any custom entity that can be described using a list of words can be created. 
        
        The package is comprised of several modules that can be used both sperately for their designated tasks (i.e Anotation, contextualization, etc etc.) or in a combined workflow. Most of the modules offer multilingual support, meaning the datasets and text don't necessarily require English language.
        
        Table of contents
        =================
        
        <!--ts-->
         * [Installation](#installation)
         * [Usage](#usage)
            * [Dataset generation](#Dataset-Generation)
            * [Contextualization](#Contextualization)
            * [Api for model download](#Api)
            * [Annotation](#Annotation)
            * [Text utilities](#Text-utilities)
            * [wordnet utilities](#Wordnet)
            * [Poincare embeddings wrapper](#Poincare)
            * [Combinging everyting](#combo)
         * [Baselines](#tests)
         * [Dependency](#dependency)
        <!--te-->
        
        
        Installation
        ============
        
        This section show the process for installing the package with different methods
        
        ### From source
        
        1) lets start by cloning the package
        
        ``` 
        git clone https://github.com/fastent/fastent.git
        ```
        2) Installing all the relevant packages
        
        ``` 
        pip install -r requirements.txt 
        ```
        
        3) Install couchDB 
        
        Update the current packages
        ```
        sudo apt-get update
        ```
        
        Adding PPA Repository
        ```
        sudo apt-get install software-properties-common
        sudo add-apt-repository ppa:couchdb/stable
        sudo apt-get update
        ```
        
        Installing CouchDB
        ```
        sudo apt-get install couchdb
        ```
        
        Ownership changes (recommended to fix the permission)
        
        ```
        sudo chown -R couchdb:couchdb /usr/bin/couchdb /etc/couchdb /usr/share/couchdb
        ```
        
        Once this is completed we need to fix the permissions
        
        ```
        sudo chmod -R 0770 /usr/bin/couchdb /etc/couchdb /usr/share/couchdb
        ```
        
        Restarting CouchDB
        
        ```
        sudo systemctl restart couchdb
        ```
        
        couchDB can now be accessed from http://127.0.0.1:5984/_utils/
        
        4) Now you need to install NLTK dependencies.
        
        ```
        >>> import nltk
        >>> nltk.download()
        ```
        The minimum installation requires to download the *stopwords* corpora. (Feel free to add more if you feel so)
        
        ### From pip
        
        Coming Soon
        
        Usage
        ======
        
        ## Dataset generation
        
        The module includes a possibility to generate a dataset for raw entity words.
        Example command looks as this if using source
        
        ```
        python dataset_pseudo_generator.py -m en_core_web_lg -s cocaine,heroin
        ```
        
        If using the package is installed 
        ```
        from fastent import dataset_pseudo_generator
        
        model = dataset_pseudo_generator.spacy_initialize(model_name)
        dataset_pseudo_generator.dataset_generate(model,['cocaine', 'heroin'], 100)
        
        ```
        
        
        
        
        
Keywords: NER,Anotation,Contextualization
Platform: UNKNOWN
