Metadata-Version: 2.1
Name: summac
Version: 0.0.2
License: Apache
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: transformers (==4.8.1)
Requires-Dist: click (<7.2.0,>=7.1.1)
Requires-Dist: huggingface-hub (==0.0.12)
Requires-Dist: sentencepiece (==0.1.97)
Requires-Dist: protobuf (==3.20.1)
Requires-Dist: xlrd (==1.2.0)
Requires-Dist: requests
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: torch
Requires-Dist: datasets (==1.7.0)
Requires-Dist: scikit-learn (==1.0.2)
Requires-Dist: nltk (==3.6.6)

# SummaC: Summary Consistency Detection

This repository contains the code for the TACL 2021 paper "SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization".

We release: (1) the trained SummaC models, (2) the SummaC Benchmark and data loaders, (3) training and evaluation scripts.

<p align="center">
  <img width="400" src="https://tingofurro.github.io/images/tacl2021_summac.png">
</p>

## Installing/Using SummaC

[Update] Thanks to @Aktsvigun for the help, we now have a pip package, making it easy to install the SummaC models:
```
pip install summac
```

The two trained models SummaC-ZS and SummaC-Conv are implemented in `model_summac` ([link](https://github.com/tingofurro/summac/blob/master/model_summac.py)). Once the package is installed, the models can be used like this:

### Example use

```
from summac.model_summac import SummaCZS
model = SummaCZS(granularity="sentence", model_name="vitc", device="cpu") # If you have a GPU, switch to device="cuda"

document = """Scientists are studying Mars to learn about the Red Planet and find landing sites for future missions.
One possible site, known as Arcadia Planitia, is covered in strange sinuous features.
The shapes could be signs that the area is actually made of glaciers, which are large masses of slow-moving ice.
Arcadia Planitia is in Mars' northern lowlands."""

summary1 = "There are strange shape patterns on Arcadia Planitia. The shapes could indicate the area might be made of glaciers. This makes Arcadia Planitia ideal for future missions."
summary2 = "There are strange shape patterns on Arcadia Planitia. The shapes could indicate the area might be made of glaciers."

score1 = model.score([document], [summary1])
print("Summary Score 1 consistency: %.3f" % (score1["scores"][0])) # Prints: 0.587

score2 = model.score([document], [summary2])
print("Summary Score 2 consistency: %.3f" % (score2["scores"][0])) # Prints: 0.877
```
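
For intuition about what the scores printed above mean, here is a minimal sketch of a zero-shot aggregation in the spirit of SummaC-ZS: each summary sentence keeps its best-supported score over all document sentences, and those per-sentence scores are averaged. The helper `zero_shot_score`, the made-up probabilities, and the choice of entailment minus contradiction as the pair score are assumptions of this sketch, not the package's code.

```
def zero_shot_score(entailment, contradiction):
    """Aggregate sentence-level NLI probabilities into one consistency score.

    entailment[i][j] and contradiction[i][j] are the probabilities for
    document sentence i (premise) and summary sentence j (hypothesis).
    """
    n_doc = len(entailment)
    n_sum = len(entailment[0])
    per_sentence = []
    for j in range(n_sum):
        # Keep the most favourable document sentence for summary sentence j.
        best = max(entailment[i][j] - contradiction[i][j] for i in range(n_doc))
        per_sentence.append(best)
    # Average over summary sentences; the result lies in [-1, 1].
    return sum(per_sentence) / n_sum

# Made-up probabilities: 3 document sentences x 2 summary sentences.
entailment = [[0.9, 0.1], [0.2, 0.3], [0.1, 0.2]]
contradiction = [[0.0, 0.5], [0.1, 0.4], [0.2, 0.1]]

score = zero_shot_score(entailment, contradiction)
print("Toy consistency score: %.3f" % score)
```

Under this scheme, a single unsupported summary sentence lowers the average, which matches the lower score of `summary1` compared with `summary2` in the example above.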

To load all the necessary files: (1) clone this repository, and (2) add the repository to your Python path: `export PYTHONPATH="${PYTHONPATH}:/path/to/summac/"`
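
If changing the shell environment is inconvenient, the same effect can be obtained from inside a Python script by extending `sys.path` before importing. This is a minimal sketch; `/path/to/summac/` is the same placeholder as above, and `SUMMAC_REPO` is a name chosen for illustration.

```
import sys

# Placeholder: replace with the location of your local clone of the repository.
SUMMAC_REPO = "/path/to/summac/"

# Append the clone to the module search path once, for this process only.
if SUMMAC_REPO not in sys.path:
    sys.path.append(SUMMAC_REPO)
```

Unlike the `export` line, this only affects the running interpreter, so it has to appear before any import that relies on the cloned files.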


## SummaC Benchmark

The SummaC Benchmark consists of 6 summary consistency datasets that have been standardized to a binary classification task. The datasets included are:

<p align="center">
  <img width="500" src="https://tingofurro.github.io/images/tacl2021_summac_benchmark.png?1"><br />
  <b>% Positive</b> is the percentage of positive (consistent) summaries. <b>IAA</b> is the inter-annotator agreement (Fleiss' kappa). <b>Source</b> is the dataset used for the source documents (CNN/DM or XSum). <b># Summarizers</b> is the number of summarizers (extractive and abstractive) included in the dataset. <b># Sublabel</b> is the number of labels in the typology used to label summary errors.
</p>



The data loaders for the benchmark are included in `utils_summac_benchmark.py` ([link](https://github.com/tingofurro/summac/blob/master/utils_summac_benchmark.py)). Because the benchmark builds on previously published work, several of its datasets must be downloaded manually. For each of the 6 tasks, the download link and instructions are given as a comment in the file. Once all the files have been collected, the benchmark can be loaded and standardized by running:
```
from utils_summac_benchmark import SummaCBenchmark
benchmark_validation = SummaCBenchmark(benchmark_folder="/path/to/summac_benchmark/", cut="val")
```
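
Because the benchmark is a binary classification task with uneven class proportions, one way to use the validation cut is to pick a decision threshold on the model scores and measure balanced accuracy. The sketch below does this in plain Python on made-up scores and labels; `balanced_accuracy` and `best_threshold` are hypothetical helpers written for illustration and are not part of the repository.

```
def balanced_accuracy(labels, predictions):
    """Mean of the recall on the negative (0) and positive (1) classes.

    Assumes both classes occur at least once in `labels`.
    """
    recalls = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        correct = sum(1 for i in idx if predictions[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

def best_threshold(scores, labels):
    """Try each observed score as a threshold and keep the first best one."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):
        # A summary is predicted consistent when its score reaches the threshold.
        preds = [1 if s >= t else 0 for s in scores]
        acc = balanced_accuracy(labels, preds)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# Made-up validation data: consistency scores and labels (1 = consistent).
val_scores = [0.91, 0.85, 0.40, 0.62, 0.10, 0.77]
val_labels = [1, 1, 0, 1, 0, 0]

threshold, accuracy = best_threshold(val_scores, val_labels)
print("threshold=%.2f balanced accuracy=%.3f" % (threshold, accuracy))
```

The threshold selected on validation data would then be held fixed when scoring the test cut, so that the test labels play no part in choosing it.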

Note: we plan to streamline this process by automatically downloading the necessary files when they are not present. If you would like to help, please let us know. If you encounter an issue during the manual download process, please contact us.

## Cite the work

If you make use of the code, models, or algorithm, please cite our paper.
```
@article{Laban2022SummaCRN,
  title={SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization},
  author={Philippe Laban and Tobias Schnabel and Paul N. Bennett and Marti A. Hearst},
  journal={Transactions of the Association for Computational Linguistics},
  year={2022},
  volume={10},
  pages={163-177}
}
```

## Contributing

If you'd like to contribute, or have questions or suggestions, you can contact us at phillab@berkeley.edu. All contributions are welcome, for example helping to make the benchmark easier to download, or improving model performance on the benchmark.
