Metadata-Version: 2.4
Name: aegis-detect
Version: 0.1.2
Summary: AI Code Classifier Tool
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: huggingface-hub>=0.15.0
Requires-Dist: numpy>=1.21.0
Requires-Dist: peft>=0.4.0
Requires-Dist: safetensors>=0.3.0
Requires-Dist: torch>=2.0.0
Requires-Dist: transformers>=4.30.0
Provides-Extra: dev
Requires-Dist: accelerate>=0.20.0; extra == "dev"
Requires-Dist: datasets>=2.0.0; extra == "dev"
Requires-Dist: matplotlib>=3.5.0; extra == "dev"
Requires-Dist: openai>=1.0.0; extra == "dev"
Requires-Dist: pandas>=1.5.0; extra == "dev"
Requires-Dist: python-dotenv>=0.19.0; extra == "dev"
Requires-Dist: scikit-learn>=1.0.0; extra == "dev"
Requires-Dist: seaborn>=0.13.2; extra == "dev"
Requires-Dist: tqdm>=4.60.0; extra == "dev"
Provides-Extra: all
Requires-Dist: aegis-detect[dev]; extra == "all"
Dynamic: license-file

# Aegis: AI Python Code Detection Model

## Overview
Aegis is a fine-tuned CodeBERT model that classifies Python code as AI-generated or human-written. Although CodeBERT has 125 million parameters, Aegis was trained efficiently on local hardware using LoRA (Low-Rank Adaptation), which keeps the base model's weights frozen and trains only a small set of low-rank adapter weights.
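
To give a rough sense of why LoRA makes local training feasible, the sketch below counts the trainable parameters a LoRA adapter would add. The rank (`8`), the choice of target matrices (query and value projections), and the helper name `lora_params` are assumptions for illustration only; the actual Aegis training configuration is not stated here. The hidden size of 768 and the 12 layers are those of the CodeBERT base architecture.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


HIDDEN_SIZE = 768           # CodeBERT (RoBERTa-base) hidden size
NUM_LAYERS = 12             # CodeBERT (RoBERTa-base) encoder layers
RANK = 8                    # ASSUMPTION: illustrative LoRA rank
MATRICES_PER_LAYER = 2      # ASSUMPTION: adapt query and value projections only
BASE_PARAMS = 125_000_000   # figure quoted in the overview

per_matrix = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
total_trainable = per_matrix * MATRICES_PER_LAYER * NUM_LAYERS

print(f"per adapted matrix: {per_matrix:,}")
print(f"total trainable:    {total_trainable:,}")
print(f"share of base:      {total_trainable / BASE_PARAMS:.3%}")
```

Under these assumed settings the adapter amounts to well under 1% of the base model's parameter count; a classification head, if trained, would add to that.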

This project investigates whether code can be classified by semantic differences rather than surface features such as formatting or comments. To that end, the dataset (20K Python samples: 10K AI, 10K human) was aggressively cleaned: formatting was standardized, and comments and docstrings were removed. A confidence threshold of 0.7 was chosen so that a sample is flagged as AI-generated only when the evidence is strong. Aegis is not a definitive judge; predictions can be wrong, particularly on tasks where human and AI solutions converge semantically (e.g., LeetCode solutions).
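
The thresholding described above can be sketched as follows. This is a minimal illustration, not the package's actual code: the function name `label_from_probability`, the label strings, and the use of `>=` (rather than `>`) at the boundary are all assumptions.

```python
def label_from_probability(p_ai: float, threshold: float = 0.7) -> str:
    """Flag a sample as AI-generated only when P(AI) reaches the threshold.

    ASSUMPTION: hypothetical helper; labels and boundary handling are illustrative.
    """
    if not 0.0 <= p_ai <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {p_ai}")
    return "ai" if p_ai >= threshold else "human"


# A sample at 0.65 would be "ai" under a 0.5 cutoff but stays "human" at 0.7.
print(label_from_probability(0.65))                 # human
print(label_from_probability(0.65, threshold=0.5))  # ai
print(label_from_probability(0.91))                 # ai
```

Raising the threshold above 0.5 trades some recall for fewer false accusations, which matches the stated goal of flagging code only on strong evidence.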

## Installation

```bash
pip install aegis-detect
```

### CLI Usage

**Supported commands**:
```bash
# Predict from a file
aegis --file path/to/code.py

# Predict from a code string
aegis --text "def add(a, b):\n    return a + b"

# Write JSON output to a file
aegis --file path/to/code.py --json > result.json

# Set the threshold for AI classification
aegis --file path/to/code.py --threshold 0.7

# Show help
aegis --help

# Uninstall
pip uninstall aegis-detect
aegis-cleanup
```

**Notes**:
- On first run, the model adapter is downloaded from the Hugging Face repo [anthonyq7/aegis](https://huggingface.co/anthonyq7/aegis) and cached under `~/.aegis/models`.
- Internet access is required on the first run; subsequent runs use the local cache.
- The CLI prints the predicted label and probabilities for human and AI.

## Key Results

### Model Performance
- **Accuracy**: 85.10%
- **Precision**: 83.37%
- **Recall**: 87.70%
- **F1-Score**: 85.48%
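
As a quick consistency check, the F1-score is the harmonic mean of precision and recall, so it can be recomputed from the two figures above. The helper name `f1_score` is introduced here for illustration only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported above, expressed as fractions.
f1 = f1_score(0.8337, 0.8770)
print(f"{f1:.2%}")  # 85.48%
```

The recomputed value agrees with the reported F1-score. Recall being higher than precision indicates that, on the evaluation data, the model misses fewer AI samples than it mislabels human ones.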

### Confusion Matrix
![Confusion matrix](model/results/confusion_matrix.png)

### Attention Heatmap
![Attention heatmap](model/results/attention_weights.png)

## Contact
**Email**: a.j.qin@wustl.edu

## License
This project is licensed under the [MIT License](LICENSE).
