Metadata-Version: 2.1
Name: gismo
Version: 0.4.0
Summary: GISMO is an NLP tool to rank and organize a corpus of documents according to a query.
Home-page: https://github.com/balouf/gismo
Author: Fabien Mathieu
Author-email: fabien.mathieu@normalesup.org
License: GNU General Public License v3
Description: =====
        GISMO
        =====
        
        
        .. image:: https://img.shields.io/pypi/v/gismo.svg
                :target: https://pypi.python.org/pypi/gismo
        
        .. image:: https://img.shields.io/travis/balouf/gismo.svg
                :target: https://travis-ci.org/balouf/gismo
        
        .. image:: https://readthedocs.org/projects/gismo/badge/?version=latest
                :target: https://gismo.readthedocs.io/en/latest/?badge=latest
                :alt: Documentation Status
        
        
        .. image:: https://codecov.io/gh/balouf/gismo/branch/master/graphs/badge.svg
                :target: https://codecov.io/gh/balouf/gismo/branch/master/graphs/badge
                :alt: Code Coverage
        
        
        
        
        
        GISMO is an NLP tool to rank and organize a corpus of documents according to a query.
        
        Gismo stands for Generic Information Search... with a Mind of its Own.
        
        * Free software: GNU General Public License v3
        * GitHub: https://github.com/balouf/gismo.
        * Documentation: https://gismo.readthedocs.io.
        
        
        Features
        --------
        
        Gismo combines three main ideas:
        
        * **TF-IDTF**: a symmetric version of the TF-IDF embedding.
        * **DIteration**: a fast, push-based variant of the PageRank algorithm.
        * **Fuzzy dendrogram**: a variant of the Louvain clustering algorithm.
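        
        To give an intuition of what "push-based" means for the second idea, here is a minimal sketch of a
        push-style PageRank computation. It is an illustration only and not Gismo's actual implementation: the
        function name ``push_pagerank``, the dictionary-based graph format, and the "largest fluid first" selection
        rule are assumptions made for this example, which also assumes that every node has at least one out-link.
        
        .. code-block:: python
        
            def push_pagerank(graph, alpha=0.85, tol=1e-9):
                """Illustrative push-based PageRank (hypothetical helper, not part of Gismo).
        
                graph maps each node to the non-empty list of nodes it links to.
                """
                nodes = list(graph)
                # Each node starts with an equal share of "fluid" still to be processed.
                fluid = {n: (1 - alpha) / len(nodes) for n in nodes}
                # History accumulates the fluid already processed: it converges to the scores.
                history = {n: 0.0 for n in nodes}
                while sum(fluid.values()) > tol:
                    # Select the node holding the most fluid (one possible selection rule).
                    node = max(fluid, key=fluid.get)
                    amount = fluid[node]
                    fluid[node] = 0.0
                    history[node] += amount
                    # Push a damped share of the processed fluid to each out-neighbor.
                    share = alpha * amount / len(graph[node])
                    for neighbor in graph[node]:
                        fluid[neighbor] += share
                return history
        
        
            scores = push_pagerank({"a": ["b", "c"], "b": ["a"], "c": ["a"]})
            print(max(scores, key=scores.get))  # the node with the highest score
        
        Instead of repeatedly multiplying a whole vector by the transition matrix, the loop updates one node at a
        time and stops as soon as the amount of unprocessed fluid falls below ``tol``.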
        
        Quickstart
        ----------
        
        Install gismo:
        
        .. code-block:: console
        
            $ pip install gismo
        
        Import gismo in a Python project::
        
            import gismo as gs
        
        
        To get the hang of a typical Gismo workflow, you can check the `Toy Example`_ notebook. For more advanced uses,
        look at the other tutorials_ or go directly to the reference_ section.
        
        
        
        Credits
        -------
        
        Thomas Bonald, Anne Bouillard, Marc-Olivier Buob, Dohy Hong.
        
        This package was created with Cookiecutter_ and the `francois-durand/package_helper`_ project template.
        
        .. _reference: https://gismo.readthedocs.io/en/latest/reference.html
        .. _`Toy Example`: https://gismo.readthedocs.io/en/latest/tutorials/tutorial_toy_example.html
        .. _tutorials: https://gismo.readthedocs.io/en/latest/tutorials/index.html#
        .. _Cookiecutter: https://github.com/audreyr/cookiecutter
        .. _`francois-durand/package_helper`: https://github.com/francois-durand/package_helper
        
        
        =======
        History
        =======
        
        X.X.X (TODO-List)
        -----------------
        * Rethink distortion on both vectors normalization and IDTF/query trade-off.
        * Accelerate similarity computation (currently sklearn-based) in clustering.
        * Add a logo!
        
        0.4.0 (2020-07-21)
        ------------------
        0.4 is a big update. Lots of things added, lots of things changed.
        
        * New API for Gismo runtime parameters (see new parameters module for details). Short version:
            * ``gismo = Gismo(corpus, embedding, alpha=0.85)``: create a gismo with damping factor set to 0.85 instead of default value.
            * ``gismo.parameters.alpha = 0.85``: set the damping factor of the gismo to 0.85.
            * ``gismo.rank(query, alpha=0.85)``: make a query with damping factor temporarily set to 0.85.
        * Landmarks! Half Corpus, half Gismo, the Landmarks class can simplify many analysis tasks.
            * A Landmarks object is a (small) corpus in which each entry is augmented with the computation of an associated gismo query;
            * Landmarks can be used to refine the analysis around a part of your data;
            * They can be used as soft and fast classifiers.
            * Landmarks' runtime parameters follow the same approach as for Gismo instances (see above).
            * See the dedicated tutorial to learn more!
        * Documentation summer cleaning.
        * The ``query_distortion`` parameter (reshape subspace for clustering) is renamed to ``distortion`` and is now a float instead of a bool (i.e. distortion can be applied in a non-binary way).
        * Full refactoring of get_*** and post_*** methods and objects.
            * The good news is that they are now more natural, self-describing, and unified.
            * The bad news is that there is no backward-compatibility with previous Gismo versions. Hopefully this refactoring
              will last for some time!
        
        0.3.1 (2020-06-12)
        ------------------
        
        * New dataset: Reuters C50
        * New module: sentencizer
        
        
        0.3.0 (2020-05-13)
        ------------------
        
        * dblp module: url2source function added to directly load a small dblp source in memory instead of using a FileSource approach.
        * Possibility to disable query distortion in gismo.
        * XGismo class to cross analyze embeddings.
        * Tutorials updated
        
        0.2.5 (2020-05-11)
        ------------------
        
        * auto_k feature: if k is not specified, a reasonable, query-dependent number of results is estimated.
        * covering methods added to gismo. It is now possible to use get_covering_* instead of get_ranked_* to maximize coverage and/or eliminate redundancy.
        
        
        0.2.4 (2020-05-07)
        ------------------
        
        * Tutorials for ACM and DBLP added. After cleaning, there are currently three tutorials:
            * Toy model, to get the hang of Gismo on a tiny example,
            * ACM, to play with Gismo on a small example,
            * DBLP, to play with a large dataset.
        
        
        0.2.3 (2020-05-04)
        ------------------
        
        * ACM and DBLP dataset creation added.
        
        
        0.2.2 (2020-05-04)
        ------------------
        
        * Notebook tutorials added (early version)
        
        0.2.1 (2020-05-03)
        ------------------
        
        * Actual code
        * Coverage badge
        
        0.1.0 (2020-04-30)
        ------------------
        
        * First release on PyPI.
        
Keywords: gismo
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.6
Description-Content-Type: text/x-rst
