Metadata-Version: 2.4
Name: GAICo
Version: 0.1.5
Summary: GenAI Results Comparator, GAICo, is a Python library to help compare, analyze and visualize outputs from Large Language Models (LLMs), often against a reference text. In doing so, one can use a range of extensible metrics from the literature.
Project-URL: Bug Tracker, https://github.com/ai4society/GenAIResultsComparator/issues
Project-URL: Documentation, https://ai4society.github.io/projects/GenAIResultsComparator/index.html
Project-URL: Homepage, https://github.com/ai4society/GenAIResultsComparator
Project-URL: Repository, https://github.com/ai4society/GenAIResultsComparator
Author-email: AI4Society Team <ai4societyteam@gmail.com>, Nitin Gupta <nitin1209@gmail.com>, Pallav Koppisetti <pallav.koppisetti5@gmail.com>, Biplav Srivastava <prof.biplav@gmail.com>
Maintainer-email: AI4Society Team <ai4societyteam@gmail.com>, Nitin Gupta <nitin1209@gmail.com>
License: MIT License
        
        Copyright (c) 2024 AI for Society Research Group
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: evaluation,generative-ai,llm,metrics,nlp,text-comparison
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: <3.13,>=3.10
Requires-Dist: levenshtein>=0.23.0
Requires-Dist: matplotlib>=3.7.5
Requires-Dist: numpy>=1.26
Requires-Dist: pandas>=2.2.1
Requires-Dist: rouge-score>=0.1.2
Requires-Dist: seaborn>=0.13.0
Provides-Extra: bertscore
Requires-Dist: bert-score>=0.3.13; extra == 'bertscore'
Provides-Extra: cosine
Requires-Dist: scikit-learn==1.5.0; extra == 'cosine'
Provides-Extra: jsd
Requires-Dist: nltk>=3.9.1; extra == 'jsd'
Requires-Dist: scipy>=1.15.3; extra == 'jsd'
Description-Content-Type: text/markdown

<!-- This file is generated by root's scripts/generate_pypi_description.py -->

# GAICo: GenAI Results Comparator

**Repository:** [github.com/ai4society/GenAIResultsComparator](https://github.com/ai4society/GenAIResultsComparator)

**Documentation:** [ai4society.github.io/projects/GenAIResultsComparator](https://ai4society.github.io/projects/GenAIResultsComparator/index.html)

## Overview

_GenAI Results Comparator, GAICo, is a Python library_ to help compare, analyze and visualize outputs from Large Language Models (LLMs), often against a reference text. In doing so, one can use a range of extensible metrics from the literature.

At its core, the library provides a set of metrics that take text strings as input and produce scores normalized to a scale of 0 to 1, where 1 indicates a perfect match between the texts. These scores are used to analyze and visualize LLM outputs.
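
To make the 0-to-1 scale concrete, here is a minimal sketch of a word-level Jaccard score written in plain Python. It is an illustration only and not GAICo's implementation: the function name `jaccard_similarity` and the lowercase, whitespace-based tokenization are assumptions made for this example.

```python
def jaccard_similarity(generated: str, reference: str) -> float:
    """Word-level Jaccard similarity in [0, 1]; 1.0 means identical word sets."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    gen_tokens = set(generated.lower().split())
    ref_tokens = set(reference.lower().split())
    if not gen_tokens and not ref_tokens:
        return 1.0  # two empty texts are treated as a perfect match
    # Shared words divided by all distinct words seen in either text.
    return len(gen_tokens & ref_tokens) / len(gen_tokens | ref_tokens)


reference = "Sorry, I am unable to answer such a question as it is not appropriate."
candidate = "Sorry, I am designed not to answer such a question."

print(jaccard_similarity(reference, reference))  # identical texts score 1.0
print(jaccard_similarity(candidate, reference))  # partial overlap scores between 0 and 1
```

Because the score is a ratio of set sizes, it is bounded by 0 and 1 by construction, which is what makes scores from different metrics comparable on one plot.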

## Quickstart

GAICo's `Experiment` class offers a streamlined workflow for comparing multiple model outputs, applying thresholds, generating plots, and creating CSV reports.

Here's a quick example:

```python
from gaico import Experiment

# Sample data from https://arxiv.org/abs/2504.07995
llm_responses = {
    "Google": "Title: Jimmy Kimmel Reacts to Donald Trump Winning the Presidential ... Snippet: Nov 6, 2024 ...",
    "Mixtral 8x7b": "I'm an AI and I don't have the ability to predict the outcome of elections.",
    "SafeChat": "Sorry, I am designed not to answer such a question.",
}
reference_answer = "Sorry, I am unable to answer such a question as it is not appropriate."

# 1. Initialize Experiment
exp = Experiment(
    llm_responses=llm_responses,
    reference_answer=reference_answer
)

# 2. Compare models using specific metrics
#   This will calculate scores for 'Jaccard' and 'ROUGE',
#   generate a plot (e.g., radar plot for multiple metrics/models),
#   and save a CSV report.
results_df = exp.compare(
    metrics=['Jaccard', 'ROUGE'],  # Specify metrics, or None for all defaults
    plot=True,
    output_csv_path="experiment_report.csv",
    custom_thresholds={"Jaccard": 0.6, "ROUGE_rouge1": 0.35} # Optional: override default thresholds
)

# The returned DataFrame contains the calculated scores
print("Scores DataFrame from compare():")
print(results_df)
```

For more detailed examples, please refer to our Jupyter Notebooks in the [`examples/`](https://github.com/ai4society/GenAIResultsComparator/tree/main/examples) folder in the repository.

## Features

- Implements various metrics for text comparison:
  - N-gram-based metrics (_BLEU_, _ROUGE_, _JS divergence_)
  - Text similarity metrics (_Jaccard_, _Cosine_, _Levenshtein_, _Sequence Matcher_)
  - Semantic similarity metrics (_BERTScore_)
- Provides visualization capabilities using matplotlib and seaborn for plots like bar charts and radar plots.
- Exports results to CSV files, including scores and threshold pass/fail status.
- Provides a streamlined `Experiment` class for easy comparison of multiple models, applying thresholds, plotting, and reporting.
- Supports batch processing for efficient computation.
- Optimized for different input types (lists, numpy arrays, pandas Series).
- Has an extensible architecture for easy addition of new metrics.
- Supports automated testing of metrics using [Pytest](https://docs.pytest.org/en/stable/).
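
The batch scoring, threshold pass/fail status, and CSV export listed above can be pictured with a small standard-library sketch. This is not GAICo's code or API: `sequence_ratio`, `score_batch`, `rows_to_csv`, and the `model`/`score`/`passed` column names are hypothetical, and `difflib.SequenceMatcher` stands in for the library's own Sequence Matcher metric.

```python
import csv
import io
from difflib import SequenceMatcher


def sequence_ratio(generated: str, reference: str) -> float:
    """Similarity in [0, 1] from the standard library's SequenceMatcher."""
    return SequenceMatcher(None, generated, reference).ratio()


def score_batch(responses: dict, reference: str, threshold: float) -> list:
    """Score every response against one reference and flag pass/fail."""
    rows = []
    for model_name, text in responses.items():
        score = sequence_ratio(text, reference)
        rows.append(
            {"model": model_name, "score": round(score, 4), "passed": score >= threshold}
        )
    return rows


def rows_to_csv(rows: list) -> str:
    """Serialize the scored rows to CSV text (hypothetical column layout)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["model", "score", "passed"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


responses = {
    "SafeChat": "Sorry, I am designed not to answer such a question.",
    "Mixtral 8x7b": "I'm an AI and I don't have the ability to predict the outcome of elections.",
}
reference = "Sorry, I am unable to answer such a question as it is not appropriate."

# The 0.6 threshold is an arbitrary value chosen for illustration.
print(rows_to_csv(score_batch(responses, reference, threshold=0.6)))
```

Keeping scoring, thresholding, and serialization as separate steps mirrors the feature list: each step can be swapped (a different metric, a different threshold, a different output format) without touching the others.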

## Installation

GAICo can be installed using pip. We strongly recommend using a [Python virtual environment](https://docs.python.org/3/tutorial/venv.html) to manage dependencies and avoid conflicts with other packages.

- **Create and activate a virtual environment** (e.g., named `gaico-env`):

  ```shell
    # Requires Python 3.10-3.12 (GAICo declares >=3.10,<3.13)
    python3 -m venv gaico-env
    source gaico-env/bin/activate  # On macOS/Linux
    # gaico-env\Scripts\activate   # On Windows
  ```

- **Install GAICo:**
  Once your virtual environment is active, install GAICo using pip:

  ```shell
    pip install gaico
  ```

This installs the core GAICo library.
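
As a quick sanity check after installing, the sketch below confirms that the interpreter falls within the declared range (`>=3.10,<3.13`) and looks up the installed distribution version. It uses only the standard library; the helper names `python_is_supported` and `installed_version` are made up for this example.

```python
import sys
from importlib import metadata
from typing import Optional


def python_is_supported(version: tuple) -> bool:
    """True when (major, minor) satisfies Requires-Python: >=3.10,<3.13."""
    return (3, 10) <= tuple(version) < (3, 13)


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


print("Python supported:", python_is_supported(sys.version_info[:2]))
print("gaico version:", installed_version("gaico"))  # None means it is not installed here
```

If the second line prints `None`, the most likely cause is that `pip install gaico` ran in a different environment than the one executing this script.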

### Using GAICo with Jupyter Notebooks/Lab

If you plan to use GAICo within Jupyter Notebooks or JupyterLab (recommended for exploring examples and interactive analysis), install them into the _same activated virtual environment_:

```shell
# (Ensure your 'gaico-env' is active)
pip install notebook  # For Jupyter Notebook
# OR
# pip install jupyterlab # For JupyterLab
```

Then, launch Jupyter from the same terminal where your virtual environment is active:

```shell
# (Ensure your 'gaico-env' is active)
jupyter notebook
# OR
# jupyter lab
```

New notebooks created in this session should automatically use the `gaico-env` Python environment. For troubleshooting kernel issues, please see our [FAQ document](https://ai4society.github.io/projects/GenAIResultsComparator/faq).
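
To confirm that a notebook is really running on the `gaico-env` interpreter, a cell along the following lines can help. It is a small standard-library sketch; `in_virtualenv` is a hypothetical helper name, and the check relies on `sys.prefix` differing from `sys.base_prefix` inside a `venv`-style environment.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


# The interpreter path should point inside the gaico-env directory.
print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

If the interpreter path points elsewhere, the notebook kernel is not the one from the activated environment.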

### Optional Installations for GAICo

The default installation includes core metrics and is lightweight. For optional features and metrics that have larger dependencies:

- To include the **BERTScore** metric (which has larger dependencies like PyTorch):
  ```shell
  pip install 'gaico[bertscore]'
  ```
- To include the **CosineSimilarity** metric (requires scikit-learn):
  ```shell
  pip install 'gaico[cosine]'
  ```
- To include the **JSDivergence** metric (requires SciPy and NLTK):
  ```shell
  pip install 'gaico[jsd]'
  ```
- To install with **all optional features**:
  ```shell
  pip install 'gaico[bertscore,cosine,jsd]'
  ```
  _(Note: All optional features are also installed if you use the `dev` extra for development installs.)_
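
To see which extras are usable in the current environment, one option is to probe for the importable modules behind each extra. The sketch below is an assumption-laden illustration, not part of GAICo: `OPTIONAL_EXTRAS` and `available_extras` are hypothetical names, and the mapping uses the usual import names of the packages listed above (`bert_score`, `sklearn`, `nltk`, `scipy`).

```python
from importlib.util import find_spec

# Extra name -> top-level modules that the extra's dependencies provide.
OPTIONAL_EXTRAS = {
    "bertscore": ["bert_score"],
    "cosine": ["sklearn"],
    "jsd": ["nltk", "scipy"],
}


def available_extras() -> dict:
    """Map each extra to True when all of its modules can be located."""
    return {
        extra: all(find_spec(module) is not None for module in modules)
        for extra, modules in OPTIONAL_EXTRAS.items()
    }


for extra, is_available in available_extras().items():
    status = "available" if is_available else f"missing (pip install 'gaico[{extra}]')"
    print(f"{extra}: {status}")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy dependencies such as PyTorch.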

### Installation Size Comparison

The following table provides an _estimated_ overview of the relative disk space impact of different installation options. Actual sizes may vary depending on your operating system, Python version, and existing packages. The figures are meant primarily to illustrate the relative impact of optional dependencies.

_Note:_ Core dependencies include: `levenshtein`, `matplotlib`, `numpy`, `pandas`, `rouge-score`, and `seaborn`.

| Installation Command                        | Dependencies                                                 | Estimated Total Size Impact |
| ------------------------------------------- | ------------------------------------------------------------ | --------------------------- |
| `pip install gaico`                         | Core                                                         | 210 MB                      |
| `pip install 'gaico[jsd]'`                  | Core + `scipy`, `nltk`                                       | 310 MB                      |
| `pip install 'gaico[cosine]'`               | Core + `scikit-learn`                                        | 360 MB                      |
| `pip install 'gaico[bertscore]'`            | Core + `bert-score` (includes `torch`, `transformers`, etc.) | 800 MB                      |
| `pip install 'gaico[bertscore,cosine,jsd]'` | Core + all dependencies from above                           | 950 MB                      |
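
The estimates in the table can be turned into per-extra increments over the core install with a few lines of arithmetic. The numbers below are copied from the table; the variable names are illustrative only.

```python
CORE_MB = 210
ESTIMATED_TOTAL_MB = {"jsd": 310, "cosine": 360, "bertscore": 800}
ALL_EXTRAS_MB = 950

# Extra disk space each option adds on top of the core install.
increments = {extra: total - CORE_MB for extra, total in ESTIMATED_TOTAL_MB.items()}
# What the combined install would cost if no dependencies were shared.
naive_total = CORE_MB + sum(increments.values())

print(increments)    # {'jsd': 100, 'cosine': 150, 'bertscore': 590}
print(naive_total)   # 1050
print(naive_total - ALL_EXTRAS_MB)  # 100
```

The combined estimate (950 MB) is lower than the naive sum (1050 MB). That would be consistent with extras sharing dependencies (for instance, scikit-learn itself depends on SciPy), though the table does not state the cause.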

### For Developers (Installing from source)

If you want to contribute to GAICo or install it from source for development:

1.  Clone the repository:

    ```shell
    git clone https://github.com/ai4society/GenAIResultsComparator.git
    cd GenAIResultsComparator
    ```

2.  Set up a virtual environment and install dependencies:
    We recommend using [UV](https://docs.astral.sh/uv/#installation) for managing environments and dependencies.

    ```shell
    # Create a virtual environment (e.g., Python 3.10-3.12 recommended)
    uv venv
    # Activate the environment
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    # Install the package in editable mode with all development dependencies
    # (includes all optional features like bertscore, cosine, jsd)
    uv pip install -e ".[dev]"
    ```

    _If you prefer not to use `uv`,_ you can use `pip`:

    ```shell
    # Create a virtual environment (e.g., Python 3.10-3.12 recommended)
    python3 -m venv .venv
    # Activate the environment
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    # Install the package in editable mode with development extras
    pip install -e ".[dev]"
    ```

    _(The `dev` extra installs GAICo with all its optional features, plus dependencies for testing, linting, building, and documentation.)_

3.  Set up pre-commit hooks (optional but recommended for contributors):

    ```shell
    pre-commit install
    ```


## Citation

If you find GAICo useful in your research or work, please consider citing it:

```bibtex
@software{AI4Society_GAICo_GenAI_Results,
  author = {Gupta, Nitin and Koppisetti, Pallav and Srivastava, Biplav},
  license = {MIT},
  title = {{GAICo: GenAI Results Comparator}},
  year = {2025},
  url = {https://github.com/ai4society/GenAIResultsComparator}
}
```
