Metadata-Version: 2.1
Name: factool
Version: 0.0.1
Summary: Factuality Detection for Generative AI
Home-page: https://github.com/GAIR-NLP/factool
Author: GAIR Research Group
License: Apache License
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: Topic :: Text Processing
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3 :: Only
Description-Content-Type: text/markdown
Requires-Dist: openai (==0.27.8)
Requires-Dist: PyYAML (==6.0)
Requires-Dist: asyncio (==3.4.3)
Requires-Dist: numpy
Requires-Dist: pydantic (==1.10.9)
Requires-Dist: scholarly (==1.7.11)
Requires-Dist: scikit-learn
Requires-Dist: aiohttp (==3.8.4)
Requires-Dist: fastapi (==0.96.0)
Requires-Dist: uvicorn (==0.22.0)

## Factuality Detection in Generative AI

This repository contains the source code and plugin configuration for our paper [Factuality Detection in Generative AI: A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios]().

Factool is a tool-augmented framework for detecting factual errors in text generated by large language models (e.g., ChatGPT).

Factool currently supports four tasks:
* **knowledge-based QA**: Factool detects factual errors in knowledge-based QA.
* **code generation**: Factool detects execution errors in code generation.
* **mathematical reasoning**: Factool detects calculation errors in mathematical reasoning.
* **scientific literature review**: Factool detects hallucinated scientific literature.

<p align="center"> 
<img src="./figure/factool.png" width="300"/>
 </p>

## Installation
```bash
conda create -n factool_env python=3.11
conda activate factool_env
pip uninstall factool
pip install git+https://github.com/GAIR-NLP/factool
```

## Quick Start
1. Install the package.
2. Create a file called `keys.yaml` and put your API keys (openai, serper, scraperapi) in it.
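
Before running Factool, it can help to confirm that `keys.yaml` contains an entry for each service. The following is a minimal sketch, not part of the package: the field names `openai`, `serper`, and `scraperapi` are assumptions taken from the service names above, and `missing_keys` and `load_keys` are hypothetical helpers. Adjust the names to whatever your installed version expects.

```python
from pathlib import Path

# Assumed field names: this README lists the services but not the exact keys.
REQUIRED_KEYS = ("openai", "serper", "scraperapi")


def missing_keys(config, required=REQUIRED_KEYS):
    """Return the required entries that are absent or empty in `config`."""
    config = config or {}
    return [name for name in required if not config.get(name)]


def load_keys(path="keys.yaml"):
    """Read the YAML file and fail early if an entry is missing."""
    import yaml  # PyYAML, already listed in Requires-Dist

    config = yaml.safe_load(Path(path).read_text(encoding="utf-8"))
    missing = missing_keys(config)
    if missing:
        raise ValueError(f"{path} is missing entries: {', '.join(missing)}")
    return config
```

Calling `load_keys()` from the directory that holds `keys.yaml` either returns the parsed mapping or raises a `ValueError` naming the missing entries.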

## General Usage:

```python
from factool import Factool

# Initialize a Factool instance. The foundation model can be either "gpt-3.5-turbo" or "gpt-4".
factool_instance = Factool("gpt-3.5-turbo")

inputs = [
    {
        "prompt": "Introduce Graham Neubig",
        "response": "Graham Neubig is a professor at MIT",
        "category": "kbqa",
        "entry_point": "answer_question",
    },
]

response_list = factool_instance.run(inputs)

print(response_list)
```
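
When checking many responses, a small helper can catch a mistyped category before any API call is made. This is a sketch under assumptions: `make_input` is a hypothetical helper, the category labels are the ones that appear in Factool's output format, and this README does not state whether `entry_point` is required for every category, so the helper simply passes it through.

```python
# Category labels as they appear in Factool's output format.
CATEGORIES = {"kbqa", "code", "math", "scientific"}


def make_input(prompt, response, category, entry_point):
    """Build one input dictionary with the four fields used in the example above."""
    if category not in CATEGORIES:
        raise ValueError(
            f"unknown category {category!r}; expected one of {sorted(CATEGORIES)}"
        )
    return {
        "prompt": prompt,
        "response": response,
        "category": category,
        "entry_point": entry_point,
    }


inputs = [
    make_input(
        "Introduce Graham Neubig",
        "Graham Neubig is a professor at MIT",
        "kbqa",
        "answer_question",
    ),
]
```

The resulting `inputs` list has the same shape as the hand-written one and can be passed to `factool_instance.run(inputs)`.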


Factool returns one dictionary per input. The fields depend on the task category:

```
knowledge-based QA: 
response_list = 
[
    {
        'prompt': prompt_1, 
        'response': response_1, 
        'category': 'kbqa', 
        'claims': [claim_11, claim_12, ..., claim_1n],
        'queries': [[query_111, query_112], [query_121, query_122], ..., [query_1n1, query_1n2]],
        'evidences': [[evidences_11], [evidences_12], ..., [evidences_1n]], 
        'claim_level_factuality': [{claim_11, reasoning_11, error_11, correction_11, factuality_11}, {claim_12, reasoning_12, error_12, correction_12, factuality_12}, ..., {claim_1n, reasoning_1n, error_1n, correction_1n, factuality_1n}], 
        'response_level_factuality': factuality_1
    },
    {
        'prompt': prompt_2, 
        'response': response_2, 
        'category': 'kbqa',
        'claims': [claim_21, claim_22, ..., claim_2n],
        'queries': [[query_211, query_212], [query_221, query_222], ..., [query_2n1, query_2n2]], 
        'evidences': [[evidences_21], [evidences_22], ..., [evidences_2n]], 
        'claim_level_factuality': [{claim_21, reasoning_21, error_21, correction_21, factuality_21}, {claim_22, reasoning_22, error_22, correction_22, factuality_22}, ..., {claim_2n, reasoning_2n, error_2n, correction_2n, factuality_2n}],
        'response_level_factuality': factuality_2,
    },
    ...
]

code generation:
response_list = 
[
    {
        'prompt': prompt_1, 
        'response': response_1, 
        'claim': claim_1,
        'category': 'code',
        'testcases_queries': [testcase_query_11, testcase_query_12, testcase_query_13], 
        'potential_solutions_queries': [potential_solution_query_11, potential_solution_query_12, potential_solution_query_13], 
        'exec_results': [[evidences_111, evidences_112, evidences_113, evidences_114], [evidences_121, evidences_122, evidences_123, evidences_124], [evidences_131, evidences_132, evidences_133, evidences_134]], 
        'claim_level_factuality': factuality_1,
        'response_level_factuality': factuality_1,
    },
    {
        'prompt': prompt_2, 
        'response': response_2, 
        'claim': claim_2,
        'category': 'code',
        'testcases_queries': [testcase_query_21, testcase_query_22, testcase_query_23], 
        'potential_solutions_queries': [potential_solution_query_21, potential_solution_query_22, potential_solution_query_23], 
        'exec_results': [[evidences_211, evidences_212, evidences_213, evidences_214], [evidences_221, evidences_222, evidences_223, evidences_224], [evidences_231, evidences_232, evidences_233, evidences_234]], 
        'claim_level_factuality': factuality_2,
        'response_level_factuality': factuality_2,
    },
    ...
]

mathematical reasoning:
response_list = 
[
    {
        'prompt': prompt_1, 
        'response': response_1, 
        'category': 'math', 
        'claims': [claim_11, claim_12, ..., claim_1n],
        'queries': [query_11, query_12, ..., query_1n], 
        'execution_results': [exec_result_11, exec_result_12, ..., exec_result_1n],
        'claim_level_factuality': [factuality_11, factuality_12, ..., factuality_1n], 
        'response_level_factuality': factuality_1
    },
    {
        'prompt': prompt_2, 
        'response': response_2, 
        'category': 'math', 
        'claims': [claim_21, claim_22, ..., claim_2n],
        'queries': [query_21, query_22, ..., query_2n], 
        'execution_results': [exec_result_21, exec_result_22, ..., exec_result_2n],
        'claim_level_factuality': [factuality_21, factuality_22, ..., factuality_2n], 
        'response_level_factuality': factuality_2
    },
    ...
]

scientific literature review:
response_list = 
[
    {
        'prompt': prompt_1, 
        'response': response_1, 
        'category': 'scientific', 
        'claims': [claim_11, claim_12, ..., claim_1n],
        'queries': [query_11, query_12, ..., query_1n], 
        'evidences': [evidences_11, evidences_12, ..., evidences_1n], 
        'claim_level_factuality': [{claim_11, evidence_11, error_11, factuality_11}, {claim_12, evidence_12, error_12, factuality_12}, ..., {claim_1n, evidence_1n, error_1n, factuality_1n}], 
        'response_level_factuality': factuality_1
    },
    {
        'prompt': prompt_2, 
        'response': response_2, 
        'category': 'scientific', 
        'claims': [claim_21, claim_22, ..., claim_2n],
        'queries': [query_21, query_22, ..., query_2n],
        'evidences': [evidences_21, evidences_22, ..., evidences_2n], 
        'claim_level_factuality': [{claim_21, evidence_21, error_21, factuality_21}, {claim_22, evidence_22, error_22, factuality_22}, ..., {claim_2n, evidence_2n, error_2n, factuality_2n}], 
        'response_level_factuality': factuality_2
    },
    ...
]

```
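
Every entry in the formats above carries `category` and `response_level_factuality`, so a per-category tally needs only those two fields. The sketch below assumes that `response_level_factuality` is truthy when the response is judged factual; `summarize` is a hypothetical helper and the sample entries are made up for illustration.

```python
from collections import Counter


def summarize(response_list):
    """Count factual and non-factual responses per category."""
    counts = {}
    for item in response_list:
        per_category = counts.setdefault(item["category"], Counter())
        # Assumption: the field is truthy when the response is judged factual.
        label = "factual" if item["response_level_factuality"] else "non_factual"
        per_category[label] += 1
    return {category: dict(counter) for category, counter in counts.items()}


# Made-up entries that contain only the two fields the helper reads.
example = [
    {"category": "kbqa", "response_level_factuality": False},
    {"category": "kbqa", "response_level_factuality": True},
    {"category": "math", "response_level_factuality": True},
]
print(summarize(example))
```

In practice you would pass the list returned by `factool_instance.run(inputs)` instead of `example`.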

## Steps for setting up FACTOOL ChatGPT Plugin:
1. Install the package: [Installation](#installation)
2. Clone the repo: `git clone https://github.com/GAIR-NLP/factool.git`
3. `cd ./factool/plugin_config`
4. Create your `keys.yaml`
5. Run the API locally: `uvicorn main:app --host 0.0.0.0 --port ${PORT:-5003}`
6. Open the plugin store on the [ChatGPT website](https://chat.openai.com/?model=gpt-4-plugins).
7. Click 'Develop your own plugin', then enter `localhost:5003` under 'domain'.
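
Before registering the plugin, you may want to confirm that the local server from step 5 is responding. The sketch below assumes that `main:app` is a FastAPI application with default settings, in which case FastAPI serves its schema at `/openapi.json`; this README does not confirm that path, and `schema_url` and `is_up` are hypothetical helpers.

```python
import urllib.error
import urllib.request


def schema_url(host="localhost", port=5003):
    """Build the URL of the OpenAPI schema that FastAPI serves by default."""
    return f"http://{host}:{port}/openapi.json"


def is_up(url, timeout=3.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as reply:
            return reply.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx reply all count as "not up".
        return False
```

With the server running, `is_up(schema_url())` should return `True`; if you started uvicorn on another port, pass it to `schema_url`.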

## Experiments:

1. Experimental results:

- Exp I: `./results/knowledge_QA/RoSE/`

- Exp II: `./results/knowledge_QA`, `./results/code`, `./results/math`, `./results/scientific`

- Exp III: `./results/chat`

2. Get the final results and statistics for fine-grained analysis:

- Exp I: `python ./results/knowledge_QA/RoSE/run_rose_claim_extraction.py`

- Exp II: `bash ./results/evaluation.sh`

- Exp III: `python ./results/chat/calc_stats.py`

3. Reproduce the experiments:

- Exp II: `bash run_experiments.sh`

- Exp III: `bash run_chatbot.sh`

