Metadata-Version: 1.2
Name: usbusiness
Version: 0.1.8
Summary: NAICS code business domain classifier and domain utility kit
Home-page: UNKNOWN
Author: Glendon Thompson
Author-email: glendonthompson1@gmail.com
License: MIT
Description: # usbusiness
        
        The aim of the project is to provide an open-source business classifier that assigns NAICS codes using website information.
        
        ## Research
        
        Web Page Classification: Features and Algorithms (2009)
        https://www.cs.ucf.edu/~dcm/Teaching/COT4810-Fall%202012/Literature/WebPageClassification.pdf
        
        Automated Text Classification in the DMOZ Hierarchy (2009)
        http://users.cecs.anu.edu.au/~ssanner/Papers/Lachlan_Report.pdf
        
        Topical Web-page classification of the DMOZ Dataset (2015)
        https://github.com/kahliloppenheimer/Web-page-classification/blob/master/paper.pdf
        
        ## Industries with Weak Classification
        
        1. Religious
        2. Oil and Gas
        3. Finance
        4. Large Companies
        
        ### Options
        
        1. Remove stop words (True/False)
        2. Word list selection: None, google_10, or google_100k
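        
        The two options above can be illustrated with a minimal sketch. It assumes the word-list option keeps only tokens that appear in the selected vocabulary. `filter_words`, `STOP_WORDS`, and `WORD_LISTS` are hypothetical names, and the tiny sets are placeholders, not the package's real stop-word list or the google_10 / google_100k files.
        
        ```python
        import re
        
        # Hypothetical placeholders: the real stop-word list and the
        # google_10 / google_100k vocabularies are not shown in this README.
        STOP_WORDS = {"the", "and", "of", "a", "to", "in", "for", "our"}
        WORD_LISTS = {
            None: None,  # no vocabulary restriction
            "google_10": {"bank", "loan", "church", "oil", "gas", "service"},
        }
        
        
        def filter_words(text, remove_stop_words=True, word_list=None):
            """Tokenize page text and apply both options (hypothetical API)."""
            tokens = re.findall(r"[a-z]+", text.lower())
            if remove_stop_words:
                tokens = [t for t in tokens if t not in STOP_WORDS]
            vocab = WORD_LISTS[word_list]
            if vocab is not None:
                # Keep only tokens that appear in the selected word list.
                tokens = [t for t in tokens if t in vocab]
            return tokens
        
        
        text = "Our bank offers the best loan service in town"
        print(filter_words(text))  # ['bank', 'offers', 'best', 'loan', 'service', 'town']
        print(filter_words(text, word_list="google_10"))  # ['bank', 'loan', 'service']
        ```
        
        Restricting tokens to a fixed vocabulary shrinks the feature space the model has to learn, at the cost of dropping rare, industry-specific terms.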
        
        ### TO DO
        
        1. Option to set the link depth when pulling pages
        2. Data set
        3. Training and validation
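        
        The link depth pull option could work as a breadth-first crawl that stops following links after a fixed number of hops. This is a sketch under assumptions: `pull_pages` is a hypothetical function, and the `get_links` callback stands in for real HTTP fetching and HTML link extraction so the example runs offline.
        
        ```python
        from collections import deque
        
        
        def pull_pages(start_url, get_links, max_depth=1):
            """Breadth-first page pull limited to max_depth link hops.
        
            get_links(url) returns the URLs linked from that page; a real
            crawler would fetch and parse HTML here (hypothetical callback).
            """
            seen = {start_url}
            order = []
            queue = deque([(start_url, 0)])
            while queue:
                url, depth = queue.popleft()
                order.append(url)
                if depth == max_depth:
                    continue  # do not follow links beyond the depth limit
                for link in get_links(url):
                    if link not in seen:
                        seen.add(link)
                        queue.append((link, depth + 1))
            return order
        
        
        # Hypothetical site map standing in for real HTTP requests.
        SITE = {
            "/": ["/about", "/services"],
            "/about": ["/team", "/"],
            "/services": ["/services/loans"],
        }
        
        print(pull_pages("/", lambda u: SITE.get(u, []), max_depth=1))
        # ['/', '/about', '/services']
        ```
        
        With `max_depth=0` only the start page is pulled; each extra level widens the text sample at the cost of more requests.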
        
        ### Components
        
        1. The data set
        2. The words
        3. The confidence
        4. Link depth
        5. The predictive model
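        
        The README does not say which predictive model is used, so the following only illustrates how the words, data set, model, and confidence components could fit together. It swaps in a multinomial naive Bayes classifier with add-one smoothing and reports the posterior probability of the top class as the confidence. `train`, `predict`, and the toy training pages are hypothetical; the labels are two-digit NAICS sectors (21 Mining, Quarrying, and Oil and Gas Extraction; 52 Finance and Insurance; 81 Other Services).
        
        ```python
        import math
        from collections import Counter, defaultdict
        
        
        def train(docs):
            """Count words per NAICS code; docs is a list of (tokens, code) pairs."""
            class_docs = Counter()
            word_counts = defaultdict(Counter)
            vocab = set()
            for tokens, code in docs:
                class_docs[code] += 1
                word_counts[code].update(tokens)
                vocab.update(tokens)
            return class_docs, word_counts, vocab
        
        
        def predict(model, tokens):
            """Return (best_code, confidence); confidence is the top posterior."""
            class_docs, word_counts, vocab = model
            total_docs = sum(class_docs.values())
            log_scores = {}
            for code in class_docs:
                total_words = sum(word_counts[code].values())
                score = math.log(class_docs[code] / float(total_docs))  # log prior
                for t in tokens:
                    if t not in vocab:
                        continue  # ignore words never seen in training
                    # Add-one (Laplace) smoothing over the shared vocabulary.
                    score += math.log((word_counts[code][t] + 1.0) /
                                      (total_words + len(vocab)))
                log_scores[code] = score
            # Turn log scores into probabilities, shifting by the max for stability.
            top = max(log_scores.values())
            probs = dict((c, math.exp(s - top)) for c, s in log_scores.items())
            norm = sum(probs.values())
            best = max(probs, key=probs.get)
            return best, probs[best] / norm
        
        
        # Hypothetical toy pages, already tokenized; labels are NAICS sectors.
        TRAIN = [
            (["bank", "loan", "credit", "account"], "52"),
            (["loan", "mortgage", "bank"], "52"),
            (["church", "worship", "sunday", "ministry"], "81"),
            (["oil", "gas", "drilling", "well"], "21"),
        ]
        
        model = train(TRAIN)
        code, confidence = predict(model, ["bank", "loan"])
        print("%s %.3f" % (code, confidence))  # prints: 52 0.867
        ```
        
        A low confidence value is a natural signal for sending a site to manual review, which may be useful for the weaker industries listed above.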
        
        ### Ideas
        
        1. Stemmers
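        
        Stemming would map inflected forms such as "banking" and "banks" to a single feature. The sketch below is a deliberately crude suffix stripper, not the Porter algorithm; `crude_stem` and its suffix list are hypothetical. A real implementation would more likely use an established stemmer such as NLTK's `PorterStemmer`.
        
        ```python
        # Hypothetical suffix list; "ies" must be tried before "es" and "s".
        SUFFIXES = ("ing", "ies", "es", "ed", "ly", "s")
        
        
        def crude_stem(word, min_stem=3):
            """Strip the first matching suffix, keeping at least min_stem letters."""
            for suffix in SUFFIXES:
                if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                    if suffix == "ies":
                        return word[:-3] + "y"  # "agencies" -> "agency"
                    return word[:-len(suffix)]
            return word
        
        
        print([crude_stem(w) for w in ["banking", "banks", "agencies", "churches", "gas"]])
        # ['bank', 'bank', 'agency', 'church', 'gas']
        ```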
        
Keywords: naics usa business predict website analytics industry classification
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Build Tools
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 2.7
Requires-Python: >=2.7
