Metadata-Version: 2.1
Name: irspdf
Version: 0.4.0
Summary: A simple information retrieval system for pdf documents
Home-page: UNKNOWN
Author: Jibril Frej
Author-email: <frejjibril@gmail.com>
License: UNKNOWN
Project-URL: Source code, https://github.com/Jibril-Frej/irspdf
Project-URL: Documentation, https://irspdf.readthedocs.io/en/latest/
Description: # README
        ## Presentation
        
        irspdf is a simple text-based information retrieval system for PDF documents.
        
        Text is extracted from PDF files with [pdfplumber](https://pypi.org/project/pdfplumber/).
        
        Standard text preprocessing for information retrieval is applied:
        * Stop-word removal
        * Stemming
        * Punctuation removal
        * Lowercase conversion
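        
        The preprocessing steps above can be sketched in plain Python as follows. This is a minimal illustration, not irspdf's implementation: the toy stop-word list, the crude suffix-stripping stemmer (a stand-in for a real stemmer such as Porter's), the helper names `preprocess` and `crude_stem`, and the order of the steps are all assumptions.
        
        ```python
        import string
        from typing import List
        
        # Hypothetical toy stop-word list; irspdf's actual list is not documented here.
        STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to", "with"}
        
        # Hypothetical suffixes for a crude stemmer standing in for a real one (e.g. Porter).
        SUFFIXES = ("ing", "ed", "es", "s")
        
        # Translation table that turns every ASCII punctuation character into a space.
        PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))
        
        
        def crude_stem(token: str) -> str:
            """Strip the first matching suffix, keeping a stem of at least three characters."""
            for suffix in SUFFIXES:
                if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                    return token[: -len(suffix)]
            return token
        
        
        def preprocess(text: str) -> List[str]:
            """Lowercase, remove punctuation, drop stop words, then stem each token."""
            text = text.lower().translate(PUNCT_TO_SPACE)
            return [crude_stem(tok) for tok in text.split() if tok not in STOP_WORDS]
        ```
        
        For example, `preprocess("Ranking the PDF documents, quickly!")` returns `["rank", "pdf", "document", "quickly"]`. Lowercasing first keeps stop-word matching case-insensitive, and mapping punctuation to spaces (rather than deleting it) splits hyphenated words such as `retrieval-systems` into separate tokens.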
        
        The ranking function used is BM25.
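        
        As a rough illustration of BM25, the sketch below scores tokenized documents against a tokenized query. It is a generic Okapi BM25 implementation, not irspdf's code; the function name `bm25_scores`, the common defaults `k1 = 1.2` and `b = 0.75`, and the non-negative IDF variant `log(1 + (N - n + 0.5) / (n + 0.5))` are assumptions.
        
        ```python
        import math
        from collections import Counter
        from typing import List
        
        
        def bm25_scores(query: List[str], docs: List[List[str]],
                        k1: float = 1.2, b: float = 0.75) -> List[float]:
            """Score each tokenized document against a tokenized query with Okapi BM25.
        
            Assumes `docs` is non-empty and contains at least one token overall.
            """
            n_docs = len(docs)
            avgdl = sum(len(d) for d in docs) / n_docs
            # Document frequency: number of documents containing each term.
            df = Counter(term for d in docs for term in set(d))
            scores = []
            for doc in docs:
                tf = Counter(doc)
                score = 0.0
                for term in query:
                    if term not in tf:
                        continue  # terms absent from the document contribute nothing
                    # Non-negative IDF variant (assumed; also used by Lucene).
                    idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
                    freq = tf[term]
                    # Term-frequency saturation (k1) and document-length normalization (b).
                    norm = freq + k1 * (1 - b + b * len(doc) / avgdl)
                    score += idf * freq * (k1 + 1) / norm
                scores.append(score)
            return scores
        ```
        
        With `scores = bm25_scores(query, docs)`, documents can be ranked with `sorted(range(len(docs)), key=scores.__getitem__, reverse=True)`. The `k1` parameter limits how much repeated occurrences of a term keep raising the score, while `b` controls how strongly long documents are penalized.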
        
        
        ## Installation
        
        ### Install with pip
        ```
        pip install irspdf
        ```
        
        ### Or install from GitHub
        ```
        git clone https://github.com/Jibril-Frej/irspdf.git
        cd irspdf && python setup.py install
        ```
        
        ## Usage
        
        ### Build a collection
        
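        ```python
        from irspdf import build
        
        folder_path = "pdfs/"              # example value: folder containing the PDF files
        collection_path = "collection"     # example value: file where the collection is saved
        build(folder_path, collection_path)
        ```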
        `folder_path`: path of the folder that contains all the PDF files to include in the collection.
        
        `collection_path`: file where the collection will be saved.
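        
        The on-disk format of a collection is not documented here. As a hypothetical illustration of what an IR collection typically stores, the sketch below builds an inverted index mapping each term to the documents that contain it and their term frequencies; `build_index`, the `Index` alias, and its input (document names mapped to already-preprocessed tokens) are assumptions, not part of irspdf's API.
        
        ```python
        from typing import Dict, List
        
        # Hypothetical layout: term -> {document name: term frequency}.
        Index = Dict[str, Dict[str, int]]
        
        
        def build_index(docs: Dict[str, List[str]]) -> Index:
            """Build an inverted index from document names mapped to preprocessed tokens."""
            index: Index = {}
            for name, tokens in docs.items():
                for term in tokens:
                    postings = index.setdefault(term, {})
                    postings[name] = postings.get(name, 0) + 1
            return index
        ```
        
        For example, `build_index({"a.pdf": ["pdf", "search", "pdf"], "b.pdf": ["search"]})` returns `{"pdf": {"a.pdf": 2}, "search": {"a.pdf": 1, "b.pdf": 1}}`. Storing term frequencies per document is what lets a ranking function such as BM25 score a query without rereading the original files.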
        
        ### Query the collection
        
        ```python
        from irspdf import query
        
        collection_path = "collection"  # example value: file where the collection is saved
        query(collection_path)
        ```
        
        `collection_path`: file where the collection is saved.
        
        ### Update the collection
        
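        ```python
        from irspdf import update
        
        folder_path = "new_pdfs/"        # example value: folder containing the PDF files to add
        collection_path = "collection"   # example value: file where the original collection is saved
        update(folder_path, collection_path)
        ```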
        
        `folder_path`: path of the folder that contains all the PDF files to add to the collection.
        
        `collection_path`: file where the original collection is saved.
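        
        Conceptually, updating a collection means indexing only the new documents and merging their postings into the existing index. The sketch below assumes a hypothetical term → {document name: term frequency} index layout; `update_index`, the `indexed` set, and the choice to skip documents that are already indexed are assumptions, not documented irspdf behavior.
        
        ```python
        from typing import Dict, List, Set
        
        
        def update_index(index: Dict[str, Dict[str, int]], indexed: Set[str],
                         new_docs: Dict[str, List[str]]) -> List[str]:
            """Add documents not yet indexed to `index` in place; return the names added."""
            added = []
            for name, tokens in new_docs.items():
                if name in indexed:
                    continue  # assumed behavior: skip files already in the collection
                for term in tokens:
                    postings = index.setdefault(term, {})
                    postings[name] = postings.get(name, 0) + 1
                indexed.add(name)
                added.append(name)
            return added
        ```
        
        Because only the new documents are tokenized and merged, the cost of an update grows with the size of the added folder rather than with the whole collection.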
        
        ## Useful links
        
        Documentation: [https://irspdf.readthedocs.io/en/latest/](https://irspdf.readthedocs.io/en/latest/)
        
        Source Code: [https://github.com/Jibril-Frej/irspdf](https://github.com/Jibril-Frej/irspdf)
        
        Package: [https://pypi.org/project/irspdf/](https://pypi.org/project/irspdf/)
        
Keywords: python,information retrieval
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Description-Content-Type: text/markdown
