Metadata-Version: 1.1
Name: icePick
Version: 0.0.3.1
Summary: icePick is an all-in-one package library for easy scraping
Home-page: https://github.com/teitei-tk/ice-pick
Author: teitei-tk
Author-email: teitei.tk@gmail.com
License: MIT
Download-URL: https://github.com/teitei-tk/ice-pick/archive/master.tar.gz
Description: 
        IcePick
        ===================
        
        IcePick is an all-in-one package library for easy scraping.
        
        --------------
        
        Concept
        -------
        
        -  Lightweight scraping library
        -  All-in-one package for easy scraping
        
        Requirements
        ------------
        
        -  Python 3.4 or later (Python 2.x is not supported)
        -  MongoDB
        
        Library Dependencies
        --------------------
        
        -  aiohttp
        -  beautifulsoup4
        -  pymongo >= 3.0
        -  nose
        
        Usage
        -----
        
        Scraping flow:
        
        ::
        
            Your Scraping Order (Order) -> Do Scraping (Picker) -> HTML Parse (Parser) -> Save in Database (Recorder)
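        
        The four stages of this flow can be pictured with a small standard-library
        sketch. It is not icePick's implementation: ``pick``, ``parse``, ``record``,
        ``run``, ``LinkTextParser`` and ``fake_fetch`` are hypothetical names, the
        fetch step returns canned HTML instead of making a request, and a plain list
        stands in for MongoDB.
        
        .. code:: python
        
            from html.parser import HTMLParser
        
        
            class LinkTextParser(HTMLParser):
                """Collects the text inside every <a> element (Parser stage)."""
        
                def __init__(self):
                    super().__init__()
                    self.in_link = False
                    self.texts = []
        
                def handle_starttag(self, tag, attrs):
                    if tag == "a":
                        self.in_link = True
        
                def handle_endtag(self, tag):
                    if tag == "a":
                        self.in_link = False
        
                def handle_data(self, data):
                    if self.in_link and data.strip():
                        self.texts.append(data.strip())
        
        
            def pick(order, fetch):
                # Picker stage: download the page described by the order.
                return fetch(order["url"], order["ua"])
        
        
            def parse(html):
                # Parser stage: turn raw HTML into a plain document.
                parser = LinkTextParser()
                parser.feed(html)
                parser.close()
                return {"files": parser.texts}
        
        
            def record(store, document):
                # Recorder stage: persist the document (a list stands in for MongoDB).
                store.append(document)
        
        
            def run(orders, fetch, store):
                # Push every order through pick -> parse -> record.
                for order in orders:
                    record(store, parse(pick(order, fetch)))
        
        
            def fake_fetch(url, ua):
                # Assumption: canned HTML replaces a real HTTP request.
                return '<a href="/a">README.md</a><a href="/b">LICENSE</a>'
        
        
            store = []
            run([{"url": "https://example.com/repo", "ua": "sketch-agent"}], fake_fetch, store)
            print(store)
        
        Running the sketch prints ``[{'files': ['README.md', 'LICENSE']}]``.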
        
        Example
        -------
        
        Fetch the file names in the ice-pick GitHub repository:
        
        .. code:: python
        
        
            import icePick
        
            db = icePick.get_database('icePick_example', 'localhost')
        
        
            class GithubRepoParser(icePick.Parser):
                def serialize(self):
                    result = {
                        "files": [],
                    }
        
                    # Collect the text of every directory link on the page.
                    for v in self.bs.find_all(class_="js-directory-link"):
                        result['files'].append(v.text)
                    return result
        
        
            class GithubRepoRecorder(icePick.Recorder):
                struct = icePick.Structure(files=list())
        
                class Meta:
                    database = db
        
        
            class GithubRepoOrder(icePick.Order):
                recorder = GithubRepoRecorder
                parser = GithubRepoParser
        
        
            def main():
                document = {
                    'url': 'https://github.com/teitei-tk/ice-pick/tree/master',
                    'ua': 'Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko',
                }
        
                print('---download start---')
                order = GithubRepoOrder(document.get('url'), document.get('ua'))
                picker = icePick.Picker([order])
                picker.run()
                print("---finish---")
        
            if __name__ == "__main__":
                main()
        
        After the run, the saved records can be read back through the recorder::
        
            >>> import icePick
            >>> db = icePick.get_database('icePick_example', 'localhost')
            >>> class GithubRepoRecorder(icePick.Recorder):
            ...     struct = icePick.Structure(files=list())
            ...     class Meta:
            ...         database = db
            ...
            >>> records = GithubRepoRecorder.find()
            >>> records[0].files
            ['example', 'icePick', 'tests', 'LICENSE', 'README.md', 'circle.yml', 'requirements.txt']
            >>>
        
        TODO
        ----
        
        -  Crawling
        -  Documentation
        
        LICENSE
        -------
        
        -  MIT
        
        
Keywords: scraping
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
