Metadata-Version: 2.4
Name: evalvault
Version: 1.75.0
Summary: RAG evaluation system using Ragas with Phoenix/Langfuse tracing
Project-URL: Homepage, https://github.com/ntts9990/EvalVault
Project-URL: Documentation, https://github.com/ntts9990/EvalVault#readme
Project-URL: Repository, https://github.com/ntts9990/EvalVault.git
Project-URL: Issues, https://github.com/ntts9990/EvalVault/issues
Project-URL: Changelog, https://github.com/ntts9990/EvalVault/releases
Author: EvalVault Contributors
Maintainer: EvalVault Contributors
License: Apache-2.0
License-File: LICENSE.md
Keywords: ai,evaluation,langfuse,llm,machine-learning,nlp,observability,opentelemetry,phoenix,rag,ragas,retrieval-augmented-generation,testing
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Software Development :: Testing
Classifier: Typing :: Typed
Requires-Python: >=3.12
Requires-Dist: chainlit>=2.9.5
Requires-Dist: chardet
Requires-Dist: fastapi>=0.128.0
Requires-Dist: instructor
Requires-Dist: langchain-openai
Requires-Dist: langfuse
Requires-Dist: matplotlib<3.9.0,>=3.8.0
Requires-Dist: networkx
Requires-Dist: openai
Requires-Dist: openpyxl
Requires-Dist: pandas
Requires-Dist: pydantic
Requires-Dist: pydantic-settings
Requires-Dist: pypdf>=4.3.0
Requires-Dist: python-multipart
Requires-Dist: ragas==0.4.2
Requires-Dist: rich
Requires-Dist: truststore>=0.10.4
Requires-Dist: typer
Requires-Dist: uvicorn>=0.40.0
Requires-Dist: xlrd
Provides-Extra: analysis
Requires-Dist: scikit-learn>=1.3.0; extra == 'analysis'
Requires-Dist: xgboost>=2.0.0; extra == 'analysis'
Provides-Extra: anthropic
Requires-Dist: anthropic; extra == 'anthropic'
Requires-Dist: langchain-anthropic; extra == 'anthropic'
Provides-Extra: benchmark
Requires-Dist: datasets>=2.0.0; extra == 'benchmark'
Requires-Dist: lm-eval[api]>=0.4.0; extra == 'benchmark'
Provides-Extra: dashboard
Requires-Dist: matplotlib<3.9.0,>=3.8.0; extra == 'dashboard'
Provides-Extra: dev
Requires-Dist: anthropic; extra == 'dev'
Requires-Dist: arize-phoenix>=8.0.0; extra == 'dev'
Requires-Dist: datasets>=2.0.0; extra == 'dev'
Requires-Dist: faiss-cpu>=1.8.0; extra == 'dev'
Requires-Dist: ijson>=3.3.0; extra == 'dev'
Requires-Dist: kiwipiepy>=0.18.0; extra == 'dev'
Requires-Dist: langchain-anthropic; extra == 'dev'
Requires-Dist: lm-eval[api]>=0.4.0; extra == 'dev'
Requires-Dist: mkdocs-material>=9.5.0; extra == 'dev'
Requires-Dist: mkdocs>=1.5.0; extra == 'dev'
Requires-Dist: mkdocstrings[python]>=0.24.0; extra == 'dev'
Requires-Dist: mlflow>=2.0.0; extra == 'dev'
Requires-Dist: openinference-instrumentation-langchain>=0.1.0; extra == 'dev'
Requires-Dist: opentelemetry-api>=1.20.0; extra == 'dev'
Requires-Dist: opentelemetry-exporter-otlp>=1.20.0; extra == 'dev'
Requires-Dist: opentelemetry-sdk>=1.20.0; extra == 'dev'
Requires-Dist: pgvector>=0.2.5; extra == 'dev'
Requires-Dist: psycopg[binary]>=3.0.0; extra == 'dev'
Requires-Dist: pydeps>=3.0.1; extra == 'dev'
Requires-Dist: pymdown-extensions>=10.7.0; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: pytest-asyncio; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest-html; extra == 'dev'
Requires-Dist: pytest-mock; extra == 'dev'
Requires-Dist: pytest-rerunfailures; extra == 'dev'
Requires-Dist: pytest-xdist; extra == 'dev'
Requires-Dist: python-multipart; extra == 'dev'
Requires-Dist: rank-bm25>=0.2.2; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Requires-Dist: scikit-learn<1.4.0,>=1.3.0; extra == 'dev'
Requires-Dist: sentence-transformers>=5.2.0; extra == 'dev'
Requires-Dist: xgboost>=2.0.0; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.5.0; extra == 'docs'
Requires-Dist: mkdocs>=1.5.0; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.24.0; extra == 'docs'
Requires-Dist: pymdown-extensions>=10.7.0; extra == 'docs'
Provides-Extra: korean
Requires-Dist: kiwipiepy>=0.18.0; extra == 'korean'
Requires-Dist: rank-bm25>=0.2.2; extra == 'korean'
Requires-Dist: sentence-transformers>=5.2.0; extra == 'korean'
Provides-Extra: mlflow
Requires-Dist: mlflow>=2.0.0; extra == 'mlflow'
Provides-Extra: perf
Requires-Dist: faiss-cpu>=1.8.0; extra == 'perf'
Requires-Dist: ijson>=3.3.0; extra == 'perf'
Provides-Extra: phoenix
Requires-Dist: arize-phoenix>=8.0.0; extra == 'phoenix'
Requires-Dist: openinference-instrumentation-langchain>=0.1.0; extra == 'phoenix'
Requires-Dist: opentelemetry-api>=1.20.0; extra == 'phoenix'
Requires-Dist: opentelemetry-exporter-otlp>=1.20.0; extra == 'phoenix'
Requires-Dist: opentelemetry-sdk>=1.20.0; extra == 'phoenix'
Provides-Extra: postgres
Requires-Dist: pgvector>=0.2.5; extra == 'postgres'
Requires-Dist: psycopg[binary]>=3.0.0; extra == 'postgres'
Provides-Extra: secrets
Requires-Dist: boto3; extra == 'secrets'
Requires-Dist: google-cloud-secret-manager; extra == 'secrets'
Requires-Dist: hvac; extra == 'secrets'
Provides-Extra: timeseries
Requires-Dist: aeon>=1.3.0; extra == 'timeseries'
Requires-Dist: numba>=0.55.0; extra == 'timeseries'
Provides-Extra: web
Description-Content-Type: text/markdown

# EvalVault

EvalVault is a CLI + Web UI platform for RAG (Retrieval-Augmented Generation) systems that combines the **Eval → Analysis → Tracing → Improvement loop** into a single workflow.

[![PyPI](https://img.shields.io/pypi/v/evalvault.svg)](https://pypi.org/project/evalvault/)
[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
[![CI](https://github.com/ntts9990/EvalVault/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/ntts9990/EvalVault/actions/workflows/ci.yml)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE.md)

English version? See `README.en.md`.

---

## Quickstart (CLI)

```bash
uv sync --extra dev
cp .env.example .env

uv run evalvault run --mode simple tests/fixtures/e2e/insurance_qa_korean.json \
  --metrics faithfulness,answer_relevancy \
  --profile dev \
  --auto-analyze
```

Tip: The default storage backend is Postgres+pgvector. To use SQLite instead, pass `--db`, or set `DB_BACKEND=sqlite` together with `EVALVAULT_DB_PATH`.
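
The environment-variable route from the tip above can be sketched as follows. This is a minimal example under stated assumptions: the two variable names come from this README, but the database file path is hypothetical and should be replaced with a location that suits your setup.

```bash
# Select the SQLite backend for the current shell session.
export DB_BACKEND=sqlite

# Hypothetical location for the SQLite file; any writable path should do.
export EVALVAULT_DB_PATH="$PWD/evalvault.db"
```

With both variables exported, the Quickstart command can be run unchanged in the same shell. The same two settings could presumably also be placed in the `.env` file copied from `.env.example`, assuming that file is read at startup.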

---

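## Key Features

- **End-to-End evaluation loop**: Run Eval → Analysis → Tracing → Improvement as a single flow
- **Dataset-centric operation**: Pass criteria (thresholds) are kept with the dataset
- **Artifacts-first**: Stores raw per-module results in structured form, not just reports
- **Optional observability**: Phoenix/Langfuse/MLflow are enabled only when needed
- **CLI + Web UI**: History, comparison, and reports are unified around the same run_id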

---

## Documentation Hub

- Documentation index: `docs/INDEX.md`
- Handbook (textbook-style): `docs/handbook/INDEX.md`
- External summary: `docs/handbook/EXTERNAL.md`
- Operations guide (local/Docker/observability/runbooks): `docs/handbook/CHAPTERS/04_operations.md`
- Workflows (run/analyze/compare/regression): `docs/handbook/CHAPTERS/03_workflows.md`
- Quality/testing/CI: `docs/handbook/CHAPTERS/06_quality_and_testing.md`
- Architecture: `docs/handbook/CHAPTERS/01_architecture.md`
- Offline/air-gapped (Docker/model cache): `docs/guides/OFFLINE_DOCKER.md`, `docs/guides/OFFLINE_MODELS.md`

Note (compatibility): Some documents, such as `docs/guides/USER_GUIDE.md` and `docs/guides/DEV_GUIDE.md`, are deprecated stubs kept so that old links continue to work. The handbook holds the current content.

---

## Web UI

```bash
# API
uv run evalvault serve-api --reload

# Frontend
cd frontend
npm install
npm run dev
```

Open `http://localhost:5173` in a browser, then use Evaluation Studio to view runs, history, and reports.

---

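## Offline / Air-Gapped Environments

- Docker image bundle: `docs/guides/OFFLINE_DOCKER.md`
- NLP model cache bundle: `docs/guides/OFFLINE_MODELS.md`

LLM models are managed by the infrastructure inside the air-gapped network. The EvalVault bundle includes only the **NLP model cache used for analysis**.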

---

## Contributing

```bash
uv run ruff check src/ tests/
uv run ruff format src/ tests/
uv run pytest tests -v
```

- Contributing guide: `CONTRIBUTING.md`
- Development/test routine: `AGENTS.md`, `docs/handbook/CHAPTERS/06_quality_and_testing.md`

---

## License

EvalVault is licensed under the [Apache 2.0](LICENSE.md) license.
