Metadata-Version: 2.1
Name: krag
Version: 0.0.3
Summary: A Python package for RAG performance evaluation
License: MIT
Keywords: RAG,evaluation,BM25,NLP
Author: Pandas-Studio
Author-email: ontofinance@gmail.com
Requires-Python: >=3.10,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: cython (>=3.0.11,<4.0.0)
Requires-Dist: kiwipiepy (>=0.18.0,<0.19.0)
Requires-Dist: korouge-score (>=0.1.4,<0.2.0)
Requires-Dist: langchain (>=0.2.12,<0.3.0)
Requires-Dist: langchain-community (>=0.2.11,<0.3.0)
Requires-Dist: langchain-openai (>=0.1.20,<0.2.0)
Requires-Dist: numpy (>=1.21,<2.0)
Requires-Dist: python-dotenv (==1.0.0)
Requires-Dist: rank-bm25 (>=0.2.2,<0.3.0)
Requires-Dist: twine (>=5.1.1,<6.0.0)
Description-Content-Type: text/markdown

# Krag

Krag is a Python package designed for evaluating retrieval-augmented generation (RAG) systems. It provides tools to calculate various evaluation metrics such as hit rate, recall@k, precision@k, MRR (Mean Reciprocal Rank), MAP (Mean Average Precision), NDCG (Normalized Discounted Cumulative Gain), and more.


## Installation

You can install Krag using pip:

```bash
pip install krag
```


## Usage

Here is a simple example of how to use the `KragDocument` and `OfflineRetrievalEvaluators` classes provided by this package.


```python
from krag.document import KragDocument
from krag.evaluators import OfflineRetrievalEvaluators

# Ground-truth (relevant) documents for each query
actual_docs = [
    #  Query 1
    [
        KragDocument(metadata={'id': 1}, page_content='1'),
        KragDocument(metadata={'id': 2}, page_content='2'),
        KragDocument(metadata={'id': 3}, page_content='3'),
    ],
    #  Query 2
    [
        KragDocument(metadata={'id': 4}, page_content='4'),
        KragDocument(metadata={'id': 5}, page_content='5'),
        KragDocument(metadata={'id': 6}, page_content='6'),
    ],
    #  Query 3
    [
        KragDocument(metadata={'id': 7}, page_content='7'),
        KragDocument(metadata={'id': 8}, page_content='8'),
        KragDocument(metadata={'id': 9}, page_content='9'),
    ],
]


# Retrieved (predicted) documents for each query, in ranked order
predicted_docs = [
    #  Query 1
    [
        KragDocument(metadata={'id': 1}, page_content='1'),
        KragDocument(metadata={'id': 4}, page_content='4'),
        KragDocument(metadata={'id': 7}, page_content='7'),
        KragDocument(metadata={'id': 2}, page_content='2'),
        KragDocument(metadata={'id': 5}, page_content='5'),
        KragDocument(metadata={'id': 8}, page_content='8'),
        KragDocument(metadata={'id': 3}, page_content='3'),
        KragDocument(metadata={'id': 6}, page_content='6'),
        KragDocument(metadata={'id': 9}, page_content='9')
    ],

    #  Query 2
    [
        KragDocument(metadata={'id': 4}, page_content='4'),
        KragDocument(metadata={'id': 1}, page_content='1'),
        KragDocument(metadata={'id': 7}, page_content='7'),
        KragDocument(metadata={'id': 5}, page_content='5'),
        KragDocument(metadata={'id': 2}, page_content='2'),
        KragDocument(metadata={'id': 8}, page_content='8'),
        KragDocument(metadata={'id': 6}, page_content='6'),
        KragDocument(metadata={'id': 3}, page_content='3'),
        KragDocument(metadata={'id': 9}, page_content='9')
    ],
    
    #  Query 3
    [
        KragDocument(metadata={'id': 7}, page_content='7'),
        KragDocument(metadata={'id': 2}, page_content='2'),
        KragDocument(metadata={'id': 4}, page_content='4'),
        KragDocument(metadata={'id': 8}, page_content='8'),
        KragDocument(metadata={'id': 5}, page_content='5'),
        KragDocument(metadata={'id': 3}, page_content='3'),
        KragDocument(metadata={'id': 9}, page_content='9'),
        KragDocument(metadata={'id': 6}, page_content='6'),
        KragDocument(metadata={'id': 1}, page_content='1')
    ]
]


# Initialize the evaluator
evaluator = OfflineRetrievalEvaluators(actual_docs, predicted_docs, match_method="rouge1", threshold=0.8)

# Calculate evaluation metrics
hit_rate = evaluator.calculate_hit_rate()
mrr = evaluator.calculate_mrr()
recall_at_3 = evaluator.calculate_recall_k(k=3)
precision_at_5 = evaluator.calculate_precision_k(k=5)
map_at_5 = evaluator.calculate_map_k(k=5)
ndcg_at_5 = evaluator.ndcg_at_k(k=5)

# Print results
print(f"Hit Rate: {hit_rate}")
print(f"MRR: {mrr}")
print(f"Recall@3: {recall_at_3}")
print(f"Precision@5: {precision_at_5}")
print(f"MAP@5: {map_at_5}")
print(f"NDCG@5: {ndcg_at_5}")
```
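
The `match_method="rouge1"` and `threshold=0.8` arguments above control when a retrieved document counts as a match for a ground-truth document. The sketch below illustrates the general idea with a plain-Python ROUGE-1 F1 score. It is not Krag's implementation: the helper names `rouge1_f1` and `is_match` are hypothetical, whitespace tokenization stands in for a real tokenizer, and the choice of the F1 variant of ROUGE-1 is an assumption.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between two texts (hypothetical helper)."""
    # Assumption: whitespace tokenization; Korean text would need a
    # morphological tokenizer instead.
    ref_tokens = reference.split()
    cand_tokens = candidate.split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    # Clipped overlap: each token counts at most as often as it occurs in both texts.
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def is_match(reference: str, candidate: str, threshold: float = 0.8) -> bool:
    """Treat two documents as matching when their score reaches the threshold."""
    return rouge1_f1(reference, candidate) >= threshold


reference = "the cat sat on the mat"
print(f"{rouge1_f1(reference, 'the cat sat on a mat'):.3f}")  # 0.833, a match at 0.8
print(f"{rouge1_f1(reference, 'the cat sat'):.3f}")           # 0.667, not a match at 0.8
```

A higher threshold makes matching stricter, so fewer retrieved documents are counted as relevant.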

### Key Features

1. **Document Matching**:
    - The evaluator provides multiple methods to match actual and predicted documents, including exact text match and ROUGE-based matching (`rouge1`, `rouge2`, `rougeL`).

2. **Evaluation Metrics**:
    - **Hit Rate**: Measures the proportion of actual documents correctly identified in the predicted set.
    - **Recall@k**: Measures the proportion of relevant documents that appear in the top-k predictions.
    - **Precision@k**: Measures the proportion of the top-k predictions that are relevant.
    - **MRR (Mean Reciprocal Rank)**: Averages, over all queries, the reciprocal of the rank of the first relevant document.
    - **MAP@k (Mean Average Precision at k)**: Averages the precision at each rank within the top k where a relevant document appears.
    - **NDCG@k (Normalized Discounted Cumulative Gain at k)**: Evaluates ranking quality by taking both the relevance scores and the order of the documents into account. Softmax normalization is applied when ROUGE scores are used.

3. **BM25 Integration**:
    - Integrates with BM25 retrievers for scoring and evaluating document retrieval performance.
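
The ranking metrics listed above can be illustrated with a short, dependency-free sketch. This is not Krag's code: it matches documents by integer ID rather than by text, all function names are hypothetical, and the MAP@k normalization (dividing by `min(k, number of relevant documents)`) is one common convention that may differ from the package's. Each query is assumed to have at least one relevant document.

```python
from typing import Iterable, Sequence


def precision_at_k(relevant: set[int], ranked: Sequence[int], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(doc_id in relevant for doc_id in ranked[:k]) / k


def recall_at_k(relevant: set[int], ranked: Sequence[int], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    return sum(doc_id in relevant for doc_id in ranked[:k]) / len(relevant)


def reciprocal_rank(relevant: set[int], ranked: Sequence[int]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def average_precision_at_k(relevant: set[int], ranked: Sequence[int], k: int) -> float:
    """Sum of precision values at each relevant rank in the top k, normalized."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Assumption: normalize by min(k, number of relevant documents).
    return precision_sum / min(k, len(relevant))


def mean(values: Iterable[float]) -> float:
    values = list(values)
    return sum(values) / len(values)


# Document IDs taken from the usage example: three queries, nine ranked results each.
actual_ids = [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}]
predicted_ids = [
    [1, 4, 7, 2, 5, 8, 3, 6, 9],
    [4, 1, 7, 5, 2, 8, 6, 3, 9],
    [7, 2, 4, 8, 5, 3, 9, 6, 1],
]
pairs = list(zip(actual_ids, predicted_ids))

mrr = mean(reciprocal_rank(a, p) for a, p in pairs)
recall_3 = mean(recall_at_k(a, p, 3) for a, p in pairs)
precision_5 = mean(precision_at_k(a, p, 5) for a, p in pairs)
map_5 = mean(average_precision_at_k(a, p, 5) for a, p in pairs)

print(f"MRR: {mrr:.3f}")                  # 1.000
print(f"Recall@3: {recall_3:.3f}")        # 0.333
print(f"Precision@5: {precision_5:.3f}")  # 0.400
print(f"MAP@5: {map_5:.3f}")              # 0.500
```

These values come from exact ID matching under the definitions above, so they are not necessarily what `OfflineRetrievalEvaluators` returns for the same data with ROUGE-based matching.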

#### Using a Custom Tokenizer with the BM25 Retriever
You can also use the `KiWiBM25RetrieverWithScore` class with a custom Kiwi tokenizer for advanced text retrieval. Here's an example:


```python
from krag.tokenizers import KiwiTokenizer
from krag.retrievers import KiWiBM25RetrieverWithScore

# Create a KiwiTokenizer with specific options
kiwi_tokenizer = KiwiTokenizer(model_type='knlm', typos='basic')

# Create the retriever instance with the custom tokenizer.
# `langchain_docs` is assumed to be a list of LangChain Document objects prepared beforehand.
retriever = KiWiBM25RetrieverWithScore(
    documents=langchain_docs, 
    kiwi_tokenizer=kiwi_tokenizer, 
    k=5, 
    threshold=0.0,
)

# Use the retriever
# The query means: "Which evaluation metrics does the K-RAG package provide?"
query = "K-RAG 패키지는 어떤 평가지표를 제공하나요?"

retrieved_docs = retriever.invoke(query, 2)

# Print retrieved documents with their BM25 scores
for doc in retrieved_docs:
    print(doc.metadata.get('doc_id'))
    print(doc.metadata["bm25_score"])
    print(doc.page_content)
    print("------------------------------")

```
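
To show what the `k` and `threshold` parameters above are doing conceptually, here is a minimal, standard-library sketch of BM25 scoring followed by threshold filtering and top-k selection. It is an illustration under stated assumptions, not the retriever's actual code: `bm25_scores` and `retrieve` are hypothetical names, whitespace splitting replaces the Kiwi tokenizer, the IDF uses the non-negative `log(1 + ...)` variant, and documents are kept only when their score is strictly above the threshold.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every tokenized document in corpus_tokens against query_tokens."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in corpus_tokens for term in set(doc))

    scores = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        # Longer-than-average documents are penalized through this term.
        length_norm = k1 * (1 - b + b * len(doc) / avg_len)
        score = 0.0
        for term in query_tokens:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


def retrieve(query, corpus, k=5, threshold=0.0):
    """Return up to k (score, text) pairs scoring above threshold, best first."""
    # Assumption: whitespace tokenization stands in for a real tokenizer.
    corpus_tokens = [text.split() for text in corpus]
    scores = bm25_scores(query.split(), corpus_tokens)
    ranked = sorted(zip(scores, corpus), key=lambda pair: pair[0], reverse=True)
    return [(score, text) for score, text in ranked if score > threshold][:k]


corpus = [
    "krag evaluates retrieval quality",
    "bm25 ranks documents by term overlap",
    "the weather is nice today",
]
results = retrieve("bm25 retrieval", corpus, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Whitespace splitting works for this English toy corpus, but Korean text generally needs morphological analysis to produce useful terms, which is the role the custom Kiwi tokenizer plays in the example above.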

## License

This project is licensed under the [MIT License](https://opensource.org/licenses/MIT).


## Contact

If you have any questions, feel free to reach out via [email](mailto:ontofinances@gmail.com).

