Metadata-Version: 2.1
Name: cliqs
Version: 1.0.0
Summary: Module provides implementation of multilingual crisis social media summarization model.
Author-email: Fedor Vitiugin <fedor.vitiugin@upf.edu>
License: MIT License
        
        Copyright (c) 2023 Fedor Vitiugin
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
Project-URL: Homepage, https://github.com/vitiugin/cliqs
Keywords: text summarization,text classification,multilingual
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.8.0
Description-Content-Type: text/markdown
License-File: LICENSE

# CLiQS Python module

The CLiQS Python module provides an implementation of a multilingual summarization model for crisis-related social media.

If you use CLiQS in your research, please consider citing:

>Fedor Vitiugin, Carlos Castillo: Cross-Lingual Query-Based Summarization of Crisis-Related Social Media: An Abstractive Approach Using Transformers. In ACM Hypertext 2022. ACM Press. https://doi.org/10.1145/3511095.3531279

## Installation

1. Install the module via pip:

```console
pip install cliqs
```

2. Download LASER and CLiQS models:

```console
python -m laserembeddings download-models
python -m cliqs download-models
```

3. Before running the script, make sure the [spaCy models](https://spacy.io/models) for the languages you plan to use are installed:

```console
python -m spacy download fr_core_news_sm # for French
```
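
A model fetched with `spacy download` is installed as a regular Python package, so its presence can be checked before running the summarizer. The following is a minimal sketch that uses only the standard library; the helper name `is_model_installed` is hypothetical and is not part of CLiQS or spaCy.

```python
import importlib.util


def is_model_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be imported."""
    # find_spec returns None when no importable package of that name exists.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    model = "fr_core_news_sm"  # French model from the step above
    if not is_model_installed(model):
        print(f"Missing model, run: python -m spacy download {model}")
```

This only confirms that the package can be found; it does not load the model or verify that its version matches the installed spaCy release.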


## Test use

Download the test data file [example.csv](https://github.com/vitiugin/cliqs/blob/main/example.csv) and place it in the current directory.

Example of use:

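```python
import pandas as pd
from cliqs import CliqSum

# Named `summarizer` rather than `sum` to avoid shadowing the Python built-in.
summarizer = CliqSum()

# Load the tweets, then summarize the 'Damage' category for French texts.
tweets = pd.read_csv('example.csv')
summary = summarizer.summarize(tweets, 'Damage', 'fr')

print(summary)
```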

>cyclone seroja a touché terre en Australie, entre Kalbarri et northampton, l'oeil est encore bien dessiné mais devrait rapidement se déstructurer. cyclone seroja devrait prendre le dessus et atteindre le stade de cyclone 65kt ce WE avant de toucher terre sur côte ouest Australie dimanche soir.

- example.csv -- data file with three columns: id, text, en_text (English translation of the texts).
- Damage -- information category. The current version supports 6 categories: Casualties, Damage, Danger, Sensor, Service, and Weather.
- fr -- language of the texts in the file.
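
The three arguments described above can be sanity-checked before calling the summarizer. The sketch below is illustrative only: `check_inputs`, `REQUIRED_COLUMNS`, and `SUPPORTED_CATEGORIES` are hypothetical names that are not part of CLiQS, and treating category names as case-sensitive is an assumption.

```python
# Column and category names are taken from the list above.
REQUIRED_COLUMNS = ("id", "text", "en_text")
SUPPORTED_CATEGORIES = (
    "Casualties", "Damage", "Danger", "Sensor", "Service", "Weather",
)


def check_inputs(columns, category):
    """Return a list of problems found; an empty list means the inputs look fine."""
    problems = []
    # Collect required columns that are absent from the data file.
    missing = [name for name in REQUIRED_COLUMNS if name not in columns]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    # Assumption: category names must match the documented spelling exactly.
    if category not in SUPPORTED_CATEGORIES:
        problems.append("unsupported category: " + category)
    return problems
```

With the data frame from the example, this could be called as `check_inputs(tweets.columns, 'Damage')`; any returned messages point to what needs fixing before summarization.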

## Resources

Code for training custom models — [CLiQS-CM GitHub repository](https://github.com/vitiugin/CLiQS-CM)

Dataset for text classification — [tweets dataset](https://data.d4science.org/ctlg/ResourceCatalogue/cross-lingual_dataset_of_crisis-related_social_media)

Dataset for summary evaluation — [summaries dataset](https://data.d4science.org/ctlg/ResourceCatalogue/dataset_for_evaluating_abstractive_summaries_of_crisis-related_social_media)
