Metadata-Version: 2.1
Name: rollet
Version: 0.0.2a0
Summary: Collect data from various sources
Home-page: UNKNOWN
Author: Opscidia (Tech)
Author-email: tech@opscidia.com
Maintainer: Loïc Rakotoson
Maintainer-email: loic.rakotoson@opscidia.com
License: UNKNOWN
Description: # Rollet
        `Rollet` collects, standardizes and completes data from various sources.
        
        [![PyPI](https://img.shields.io/pypi/v/Rollet?logo=PyPI&style=for-the-badge&labelColor=%233775A9&logoColor=white)](https://pypi.org/project/rollet/)
        ![PyPI - Status](https://img.shields.io/pypi/status/rollet?style=for-the-badge)
        [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/rollet?logo=python&logoColor=yellow&style=for-the-badge)](https://pypi.org/project/rollet/)
        
        
        
        # Installation
        ## PyPI
        The safest way to install `rollet` is through pip:
        ```bash
        python -m pip install rollet
        ```
        
        # How to use?
        ## Command script
        ```sh
        usage: rollet {extract-txt,extract-csv,extract-json} path
                      [-h] [-o [OUTFILE]] [-l [LINK]] [-f [FIELDS]] [--start [START]]
                      [--size [SIZE]] [-t [TIMESLEEP]]
        
        positional arguments:
          {extract-txt,extract-csv,extract-json} extraction mode, chosen by input file type
          path                                   file path
        
        optional arguments:
          -h, --help                   show this help message and exit
          -o [OUTFILE], --outfile      output file path
          -l [LINK], --link            link field if csv or json
          -f [FIELDS], --fields        fields to keep separated by comma
          --start [START]              number of rows to skip
          --size  [SIZE]               max number of rows to keep
          -t [TIMESLEEP], --timesleep  sleep time in seconds between two pulls
        ```
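        
        The `--link`, `--start` and `--size` options can be read as a row-selection step applied before anything is pulled. The sketch below illustrates that reading for a CSV file. It is not `rollet`'s actual implementation: the helper name `select_links` and the sample column names are hypothetical.
        
        ```python
        import csv
        import io
        from itertools import islice
        
        
        def select_links(csv_file, link="url", start=0, size=None):
            """Return the values of the `link` column, skipping `start` rows
            and keeping at most `size` rows (hypothetical helper)."""
            reader = csv.DictReader(csv_file)
            stop = None if size is None else start + size
            return [row[link] for row in islice(reader, start, stop)]
        
        
        # In-memory sample standing in for a real CSV file.
        sample = io.StringIO(
            "id,url\n"
            "1,https://example.url.com/a\n"
            "2,https://example.url.com/b\n"
            "3,https://example.url.com/c\n"
        )
        links = select_links(sample, link="url", start=1, size=1)
        ```
        
        Here `start=1` skips the first data row and `size=1` keeps a single row, so only the second link is selected.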
        
        ## Python
        ### Basic usage
        ```python
        from rollet import get_content
        from rollet.extractor import BaseExtractor
        
        url = 'https://example.url.com/content-id'
        
        content_dict = get_content(url)
        
        content_object = BaseExtractor(url)
        content_object.title            # Title
        content_object.abstract         # Abstract
        content_object.lang             # Language
        content_object.content_type     # Type (pdf, json, html, ...)
        content_object.to_dict()        # Same as get_content
        ```
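        
        To process several URLs from Python, the `--fields` and `--timesleep` options of the command script suggest a simple loop. The following is a minimal sketch under stated assumptions: `collect` and `fake_fetch` are hypothetical names, `fetch` is assumed to be any callable returning a dict (such as `get_content`), and the dict keys are assumed to match the attribute names shown above.
        
        ```python
        import time
        
        
        def collect(urls, fetch, fields=None, timesleep=0.0):
            """Fetch each URL with `fetch` and keep only the requested fields.
        
            `fetch` is any callable returning a dict, e.g. `rollet.get_content`.
            """
            results = []
            for i, url in enumerate(urls):
                if i and timesleep:
                    time.sleep(timesleep)  # pause between two consecutive pulls
                content = fetch(url)
                if fields is not None:
                    # Missing keys are kept with a None value.
                    content = {key: content.get(key) for key in fields}
                results.append(content)
            return results
        
        
        # Stub standing in for `rollet.get_content`, so the sketch runs offline.
        def fake_fetch(url):
            return {"title": "Title of " + url, "abstract": "...", "lang": "en"}
        
        
        rows = collect(
            ["https://example.url.com/content-id"],
            fake_fetch,
            fields=["title", "lang"],
        )
        ```
        
        With `rollet` installed, passing `get_content` in place of `fake_fetch` should give the same loop over real pages, provided it returns a dict as assumed here.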
        
        ### Custom extractors
        ```python
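        from rollet.extractor import BaseExtractor
        
        class CustomExtractor(BaseExtractor):
            
            @property
            def title(self):
                # Override the title lookup on the underlying page object
                return self._page.find('title')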
        ```
        
        And more!
Keywords: fetch,pull,extract,scrap
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
