Metadata-Version: 2.4
Name: veritasai
Version: 0.1.13
Summary: Multilingual fact verification toolkit that grounds claims in a user-provided knowledge base
Home-page: https://github.com/sopolat/veritasai
Download-URL: https://github.com/sopolat/veritasai/archive/refs/tags/v0.1.tar.gz
Author: Suleyman Olcay Polat
Author-email: suleyman.olcay.polat@gmail.com
License: MIT License
        
        Copyright (c) 2025 sopolat
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/sopolat/veritasai
Project-URL: Repository, https://github.com/sopolat/veritasai
Project-URL: Issues, https://github.com/sopolat/veritasai/issues
Keywords: veritasai
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: spacy
Requires-Dist: sentence-transformers
Requires-Dist: transformers
Requires-Dist: accelerate
Requires-Dist: bitsandbytes
Requires-Dist: peft
Requires-Dist: torch
Dynamic: author-email
Dynamic: download-url
Dynamic: home-page
Dynamic: license-file

# VeritasAI

[![PyPI - Version](https://img.shields.io/pypi/v/veritasai.svg)](https://pypi.org/project/veritasai/)
[![Python](https://img.shields.io/badge/python-3.12-blue.svg)](https://www.python.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Status](https://img.shields.io/badge/status-alpha-orange.svg)](#project-status)

**Multilingual fact checker that verifies claims against the knowledge bases you provide.**

VeritasAI is a **fact verification** toolkit designed for multilingual settings and domains where **traceable evidence** matters (e.g., medical). It works **with your own knowledge base**—raw text is sufficient—and verifies that any claim produced by the pipeline can be traced back to a specific **line or paragraph** in that corpus.

> **Alpha notice:** the API may change. Contributions are not currently accepted; feel free to open issues.

---

## Table of Contents

- [Features](#features)
- [Installation](#installation)
- [Requirements](#requirements)
- [Quickstart](#quickstart)
- [Usage & API Overview](#usage--api-overview)
  - [Core classes](#core-classes)
  - [Wrapper pipeline](#wrapper-pipeline)
- [Preparing a Knowledge Base](#preparing-a-knowledge-base)
- [Project Status](#project-status)
- [Roadmap](#roadmap)
- [Limitations](#limitations)
- [Why VeritasAI vs Alternatives](#why-veritasai-vs-alternatives)
- [Documentation & Support](#documentation--support)
- [License](#license)
- [Citation](#citation)

---

## Features

- **Multilingual**: works across languages as long as a relevant knowledge base is supplied.
- **Evidence-grounded**: every verified claim links back to a **line/paragraph** in your corpus.
- **Modular pipeline**: separate components for claim extraction, evidence retrieval, and claim verification.
- **GPU-first**: CUDA support for efficient large-model inference.
- **No telemetry**: VeritasAI does **not** collect any usage data.

> _Benchmarks and end-to-end demos are coming; see the Colab links once they are published._

## Installation

```bash
pip install veritasai
```

> If you don't have a CUDA-enabled PyTorch yet, install one that matches your system:
>
> ```bash
> # Example for CUDA 12.x (adjust per https://pytorch.org/get-started/locally/)
> pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
> ```

### Heavy dependencies (installed automatically by pip)

- `torch`, `transformers`, `accelerate`, `bitsandbytes`, `peft`, `spacy`, `sentence-transformers`

## Requirements

- **Python**: 3.12 (the package metadata declares `>=3.8`, but 3.12 is the only version listed in the classifiers)
- **OS**: Linux, macOS, or Windows
- **Hardware**: NVIDIA GPU with CUDA
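
To confirm that the hardware requirement is met before loading any models, a quick check along these lines may help. This is a minimal sketch: `cuda_status` is a hypothetical helper written for this README, not part of VeritasAI, and it only relies on the standard PyTorch calls `torch.cuda.is_available()` and `torch.cuda.get_device_name()`.

```python
import importlib.util


def cuda_status() -> str:
    """Return a short description of whether PyTorch can see a CUDA device.

    Hypothetical helper for illustration; not part of the VeritasAI API.
    """
    # Avoid an ImportError if PyTorch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"

    import torch

    if torch.cuda.is_available():
        # Index 0 is the first visible CUDA device.
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "torch installed, but no CUDA device visible"


print(cuda_status())
```

If the last message appears on a machine that does have an NVIDIA GPU, the installed PyTorch build is probably CPU-only and needs to be replaced with a CUDA build as described under Installation.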

## Quickstart

> A **Colab notebook** is linked [here](https://colab.research.google.com/drive/1C8caA_1QsIpfWnEB19CoCgtKNinwg0dC?usp=sharing). The snippet below is a minimal illustrative example using the high-level wrapper.  
> _Note: function names below reflect the intended design; adjust if your local API differs._

```python
from veritasai import VeritasAI  # wrapper that runs extraction → retrieval → verification

# A very small raw-text knowledge base (list of passages or documents)
kb = [
    "Aspirin (acetylsalicylic acid) can help reduce fever and relieve minor aches.",
    "For adults, typical oral doses of ibuprofen are 200–400 mg every 4–6 hours as needed.",
    "Type 2 diabetes is characterized by insulin resistance."
]

# Initialize the pipeline (no extra config required)
va = VeritasAI(knowledge_base=kb)  # or VeritasAI(kb=kb) depending on your API

text = "Aspirin reduces fever in adults."
result = va.verify(text)  # returns a structured object with claims, evidence, and verdicts

# Inspect the first verified claim
first = result.claims[0]
print("Claim:", first.text)
print("Verdict:", first.verdict)        # e.g., "SUPPORTED", "REFUTED", "NOT ENOUGH INFO"
print("Evidence snippet:", first.evidence.snippet)
print("Evidence source:", first.evidence.source_id)  # index/identifier in your KB
```

## Usage & API Overview

### Core classes

VeritasAI exposes three core building blocks (plus a wrapper). Typical usage wires them in sequence:

- **`claim_extractor.py`** (`ClaimExtractor`): finds atomic claims in raw text.
- **`evidence_retriever.py`** (`EvidenceRetriever`): retrieves candidate evidence spans from the knowledge base.
- **`claim_verifier.py`** (`ClaimVerifier`): classifies each (claim, evidence) pair (e.g., supported/refuted/unknown).
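
To make the data flow between the three stages concrete, here is a toy stand-in written in plain Python. It is **not** how VeritasAI is implemented: sentence splitting replaces claim extraction, token overlap replaces embedding-based retrieval, and a fixed threshold replaces the model-based verifier. All names (`extract_claims`, `retrieve`, `toy_verify`, `Verdict`) and the `0.5` threshold are hypothetical and exist only for this illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Verdict:
    claim: str
    evidence: str
    source_id: int  # index of the supporting passage in the knowledge base
    label: str


def tokens(text: str) -> set[str]:
    """Lowercased word tokens of a string."""
    return set(re.findall(r"\w+", text.lower()))


def extract_claims(text: str) -> list[str]:
    """Stage 1 (toy): treat each sentence as one claim."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def retrieve(claim: str, kb: list[str]) -> tuple[int, str, float]:
    """Stage 2 (toy): pick the passage sharing the most tokens with the claim."""
    claim_tokens = tokens(claim)
    best_id, best_score = 0, -1.0
    for i, passage in enumerate(kb):
        score = len(claim_tokens & tokens(passage)) / max(len(claim_tokens), 1)
        if score > best_score:
            best_id, best_score = i, score
    return best_id, kb[best_id], best_score


def toy_verify(text: str, kb: list[str], threshold: float = 0.5) -> list[Verdict]:
    """Stage 3 (toy): label each claim by its overlap with the best passage."""
    if not kb:
        raise ValueError("knowledge base must contain at least one passage")
    results = []
    for claim in extract_claims(text):
        source_id, passage, score = retrieve(claim, kb)
        label = "SUPPORTED" if score >= threshold else "NOT ENOUGH INFO"
        results.append(Verdict(claim, passage, source_id, label))
    return results


kb = [
    "Aspirin (acetylsalicylic acid) can help reduce fever and relieve minor aches.",
    "Type 2 diabetes is characterized by insulin resistance.",
]
for v in toy_verify("Aspirin can reduce fever. Bananas are blue.", kb):
    print(v.label, "|", v.claim, "| source:", v.source_id)
```

The point of the sketch is the shape of the output: every verdict carries the claim, the evidence passage, and the index of that passage in the knowledge base, which is what makes the result traceable.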

### Wrapper pipeline

- **`veritasai`** (wrapper) — orchestrates the three stages end-to-end.
  - Example sketch:

    ```python
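    from veritasai import ClaimExtractor, EvidenceRetriever, ClaimVerifier, VeritasAI

    # Inputs (same as in the Quickstart)
    kb = ["Aspirin (acetylsalicylic acid) can help reduce fever and relieve minor aches."]
    text = "Aspirin reduces fever in adults."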

    extractor = ClaimExtractor()
    retriever = EvidenceRetriever(knowledge_base=kb)
    verifier  = ClaimVerifier()

    # Use individual stages
    claims = extractor.extract(text)
    evidence = retriever.retrieve(claims)
    verdicts = verifier.verify(claims, evidence)

    # Or just use the high-level wrapper
    va = VeritasAI(knowledge_base=kb)
    result = va.verify(text)
    ```

> **No config needed:** the defaults should work out-of-the-box; customize models/parameters as needed once those knobs are exposed.

## Preparing a Knowledge Base

VeritasAI expects **raw text** as the knowledge source. You can start simple:

```python
kb = [
  "Paragraph 1 ...",
  "Paragraph 2 ...",
  # ...
]
```

For larger corpora, consider pre-splitting into **passages** (e.g., 256–512 tokens) and storing an **ID** per passage so evidence links are stable. You can load from files or a database and pass a Python list to the pipeline.
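
One possible way to pre-split a document and keep stable IDs is sketched below. It is an illustration under stated assumptions, not part of VeritasAI: `Passage` and `split_into_passages` are hypothetical names, and the word count is used as a rough proxy for tokens (the real token count depends on the tokenizer of the model you use).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    passage_id: str  # stable identifier, e.g. "guide:3"
    text: str


def split_into_passages(doc_id: str, text: str, max_words: int = 200) -> list[Passage]:
    """Split a document on blank lines, then cut long paragraphs into word windows.

    Each passage gets an ID of the form "<doc_id>:<running index>".
    """
    passages: list[Passage] = []
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    for paragraph in paragraphs:
        words = paragraph.split()
        # Paragraphs longer than max_words are cut into consecutive windows.
        for start in range(0, len(words), max_words):
            chunk = " ".join(words[start:start + max_words])
            passages.append(Passage(f"{doc_id}:{len(passages)}", chunk))
    return passages


document = "First paragraph of the guide.\n\nSecond paragraph with more detail."
passages = split_into_passages("guide", document)

kb = [p.text for p in passages]              # plain list to pass to the pipeline
kb_ids = [p.passage_id for p in passages]    # parallel list: KB index -> stable ID
print(list(zip(kb_ids, kb)))
```

Keeping `kb_ids` parallel to `kb` means that an evidence index returned for a passage can be mapped back to a stable identifier, even if the way you load or order your documents changes later.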

## Project Status

- **Stage**: Alpha (API subject to change)
- **CLI**: None at the moment
- **Public API**: Not finalized
- **Telemetry**: None collected

## Roadmap

- Public Colab notebooks (Quickstart, evaluation, domain-specific demos)
- Configurable models and retrieval backends
- Evaluation scripts and reproducible benchmarks
- Expanded multilingual tests and domain examples (e.g., clinical, legal)
- Optional CLI once the API stabilizes

## Limitations

- Requires a **GPU + CUDA** for practical performance.
- Quality depends heavily on your **knowledge base coverage** and cleanliness.
- The verifier may be conservative without sufficient domain evidence.

## Why VeritasAI vs Alternatives

- **Evidence requirement by design**: the pipeline prioritizes **traceability** to your own corpus.
- **Modular**: you can inspect or replace any stage (extraction / retrieval / verification) as the system evolves.
- **Multilingual**: leverages modern multilingual embeddings and LMs to work across languages (subject to model availability).

> If you’re comparing with generic RAG toolkits: VeritasAI focuses specifically on **claim-level verification** with explicit evidence spans rather than general question answering.

## Documentation & Support

- **Issues**: use the repository’s Issues tab for bugs/questions.
- **Contributing**: external contributions aren’t accepted yet—please open an issue to discuss.
- **Security**: please open a private security advisory in the repo if you discover a vulnerability.

## License

MIT. See [LICENSE](LICENSE).

## Citation

If you use VeritasAI in academic work, please cite it. A BibTeX entry will be provided here.

```text
# (To be added)
```

---

<sub>README generated on 2025-10-04.</sub>
