Metadata-Version: 2.4
Name: llmleaks
Version: 2.0.0
Summary: Security tool for detecting exposed LLM/AI API keys in public repositories
Project-URL: Homepage, https://github.com/fariiixm/llmleaks
Project-URL: Documentation, https://github.com/fariiixm/llmleaks#readme
Project-URL: Repository, https://github.com/fariiixm/llmleaks
Project-URL: Issues, https://github.com/fariiixm/llmleaks/issues
Project-URL: Changelog, https://github.com/fariiixm/llmleaks/blob/main/CHANGELOG.md
Author: fariiixm
License-Expression: MIT
License-File: LICENSE
Keywords: api-key,auditing,credentials,github,secrets,security,vulnerability
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: httpx>=0.25.0
Provides-Extra: dev
Requires-Dist: mypy>=1.5.0; extra == 'dev'
Requires-Dist: pre-commit>=3.4.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
Requires-Dist: pytest>=7.4.0; extra == 'dev'
Requires-Dist: respx>=0.20.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.0.0; extra == 'docs'
Requires-Dist: mkdocs>=1.5.0; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.23.0; extra == 'docs'
Description-Content-Type: text/markdown

# LLMLeaks

A security auditing tool for detecting exposed API keys in public GitHub repositories. Built with async Python for high performance and designed with SOLID principles for extensibility.

## Overview

LLMLeaks scans GitHub repositories for accidentally committed API keys and validates their status against provider APIs. This tool is designed for:

- **Security researchers** conducting authorized vulnerability assessments
- **Organizations** auditing their own repositories for leaked credentials
- **DevSecOps teams** implementing proactive secret detection

## Features

- Asynchronous scanning with fail-fast quota management
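- Resilient HTTP validation: an HTTP 429 (rate-limited) response is treated as a sign that the key exists, not as an invalid key
- Provider-scoped deduplication: keys are cached per provider, so there is no cross-provider cache poisoning
- Greedy regex patterns with negative lookaheads, so matched keys are never truncated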
- Support for 10 AI/LLM providers:
  - OpenAI (including `sk-proj-` and `sk-svcacct-` variants)
  - Anthropic (Claude)
  - Google Gemini
  - DeepSeek
  - Groq
  - Perplexity
  - Hugging Face
  - OpenRouter
  - Replicate
  - RunwayML
- Extensible validator architecture (Strategy + Factory patterns)
- Real-time validation of discovered keys
- Configurable concurrency and output
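
The lookahead idea from the feature list can be illustrated with a minimal sketch. The `demo-` key format and the names `PATTERN`, `NAIVE`, and `find_keys` are hypothetical and chosen for illustration only; they are not the patterns shipped with LLMLeaks.

```python
import re

# Hypothetical key format: "demo-" followed by 8 to 12 lowercase alphanumerics.
KEY_CHARS = "[a-z0-9]"

# Greedy quantifier plus a negative lookahead: the match must end where the
# run of key characters ends, so a longer token is never cut short.
PATTERN = re.compile(rf"demo-{KEY_CHARS}{{8,12}}(?!{KEY_CHARS})")

# Same pattern without the lookahead, shown only for comparison.
NAIVE = re.compile(rf"demo-{KEY_CHARS}{{8,12}}")


def find_keys(text: str) -> list[str]:
    """Return every candidate key in `text` that ends on a clean boundary."""
    return PATTERN.findall(text)


if __name__ == "__main__":
    too_long = "demo-abcdefgh1234567"  # 15 key characters, over the limit
    print(NAIVE.findall(too_long))     # the naive pattern returns a truncated prefix
    print(find_keys(too_long))         # the lookahead version rejects the token
    print(find_keys('token="demo-abcdefgh12"'))
```

Without the lookahead, an over-length token yields a truncated prefix that looks like a key; with it, the token is either matched whole or not at all.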

## Installation

### From PyPI

```bash
pip install llmleaks
```

### From Source

```bash
git clone https://github.com/fariiixm/llmleaks.git
cd llmleaks
pip install -e .
```

### Development Installation

```bash
pip install -e ".[dev]"
```

## Usage

### Command Line

```bash
# Basic usage
llmleaks --token YOUR_GITHUB_TOKEN --query "API_KEY"

# Scan with custom parameters
llmleaks --token YOUR_GITHUB_TOKEN \
    --query "sk-proj-" \
    --pages 10 \
    --out results.txt \
    --concurrency 3

# Using environment variable for token
export GITHUB_TOKEN=ghp_xxxxxxxxxxxx
llmleaks --query "OPENAI_API_KEY"
```

### As a Python Module

```bash
python -m auditor --token YOUR_GITHUB_TOKEN --query "API_KEY"
```

### Programmatic Usage

```python
import asyncio
from auditor import AsyncKeyAuditor

async def main():
    auditor = AsyncKeyAuditor(
        github_token="YOUR_GITHUB_TOKEN",
        max_concurrent=5
    )

    stats = await auditor.run(
        query="API_KEY",
        pages=5,
        output_file="results.txt"
    )

    print(f"Found {stats['valid']} valid keys")

asyncio.run(main())
```

### Adding Custom Validators

```python
from auditor.core.base import Validator
from auditor.core.factory import ValidatorFactory

class CustomValidator(Validator):
    name = "custom"

    async def validate(self, client, key):
        return await self._safe_fetch(
            client,
            "https://api.example.com/verify",
            headers={"Authorization": f"Bearer {key}"},
        )

# Register the validator
ValidatorFactory.register(
    "custom",
    r"custom-[a-z0-9]{32}(?![a-z0-9])",
    CustomValidator()
)
```

## CLI Options

| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| `--token` | | GitHub Personal Access Token | `$GITHUB_TOKEN` |
| `--query` | `-q` | GitHub code search query | `API_KEY` |
| `--pages` | `-p` | Number of result pages to scan | `5` |
| `--out` | `-o` | Output file for valid keys | `valid_keys.txt` |
| `--concurrency` | `-c` | Maximum concurrent requests | `5` |
| `--quiet` | | Suppress progress output | `false` |
| `--verbose` | `-v` | Enable debug logging | `false` |
| `--list-providers` | | List supported API providers | |
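
As a rough illustration of how the options in the table map onto a parser, here is a minimal `argparse` sketch. It mirrors the documented flags and defaults, but `build_parser` is a hypothetical name and this is not the project's actual `cli.py`.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="llmleaks")
    # The token falls back to the GITHUB_TOKEN environment variable.
    parser.add_argument("--token", default=os.environ.get("GITHUB_TOKEN"),
                        help="GitHub Personal Access Token")
    parser.add_argument("-q", "--query", default="API_KEY",
                        help="GitHub code search query")
    parser.add_argument("-p", "--pages", type=int, default=5,
                        help="Number of result pages to scan")
    parser.add_argument("-o", "--out", default="valid_keys.txt",
                        help="Output file for valid keys")
    parser.add_argument("-c", "--concurrency", type=int, default=5,
                        help="Maximum concurrent requests")
    parser.add_argument("--quiet", action="store_true",
                        help="Suppress progress output")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Enable debug logging")
    parser.add_argument("--list-providers", action="store_true",
                        help="List supported API providers")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-q", "sk-proj-", "-p", "10"])
    print(args.query, args.pages, args.out, args.concurrency)
```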

## Architecture

```
llmleaks/
├── src/auditor/
│   ├── __init__.py          # Package exports (v2.0.0)
│   ├── __main__.py          # Module entry point
│   ├── cli.py               # Command-line interface
│   ├── core/
│   │   ├── base.py          # Abstract Validator + _safe_fetch
│   │   ├── factory.py       # ValidatorFactory registry (10 providers)
│   │   └── engine.py        # AsyncKeyAuditor (fail-fast, provider cache)
│   └── validators/
│       └── llm.py           # LLM provider validators (10 classes)
├── tests/                   # Test suite
├── docs/                    # Documentation
└── pyproject.toml           # Project configuration
```

### Design Principles

- **Strategy Pattern**: Validators implement a common interface for different providers
- **Factory Pattern**: Centralized registry for validator management
- **Dependency Injection**: Validators are decoupled from the engine
- **Async I/O**: Non-blocking network operations for performance
- **Fail-Fast**: Startup handshake verifies quota before scanning
- **Resilient Validation**: An HTTP 429 response means the key exists (rate-limited, not invalid)
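
Two of these principles, resilient validation and provider-scoped caching, can be sketched in a few lines. The names `KeyStatus`, `classify`, and `ProviderCache` are hypothetical and are not the classes in `engine.py`. Mapping 401 to "invalid" and every other status to "unknown" is also an assumption; real providers differ in how they reject keys.

```python
from enum import Enum


class KeyStatus(Enum):
    VALID = "valid"
    RATE_LIMITED = "rate_limited"  # the key exists but is currently throttled
    INVALID = "invalid"
    UNKNOWN = "unknown"


def classify(status_code: int) -> KeyStatus:
    """Map an HTTP status code to a key status (assumed mapping)."""
    if 200 <= status_code < 300:
        return KeyStatus.VALID
    if status_code == 429:
        # Rate limiting implies the provider recognized the key.
        return KeyStatus.RATE_LIMITED
    if status_code == 401:
        return KeyStatus.INVALID
    return KeyStatus.UNKNOWN


class ProviderCache:
    """Deduplicate on (provider, key) pairs rather than on the key alone."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, str]] = set()

    def first_time(self, provider: str, key: str) -> bool:
        """Return True only the first time this provider/key pair is seen."""
        entry = (provider, key)
        if entry in self._seen:
            return False
        self._seen.add(entry)
        return True
```

Keying the cache on the pair means a string already checked against one provider is still checked against another whose pattern also matches it.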

## Requirements

- Python 3.10+
- GitHub Personal Access Token with `repo` scope

### Creating a GitHub Token

1. Go to GitHub Settings > Developer settings > Personal access tokens
2. Click "Generate new token (classic)"
3. Select the `repo` scope
4. Copy the generated token
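
Once a token exists, its remaining quota can be checked before a scan starts. The sketch below only parses a payload shaped like the GitHub REST API `GET /rate_limit` response (`resources.<name>.remaining`); fetching that payload is left out. The helper names `remaining_quota` and `ensure_quota` and the default resource name `code_search` are assumptions made for illustration, not LLMLeaks internals.

```python
def remaining_quota(payload: dict, resource: str = "code_search") -> int:
    """Read the remaining request count for one resource; 0 if it is absent."""
    resources = payload.get("resources", {})
    return int(resources.get(resource, {}).get("remaining", 0))


def ensure_quota(payload: dict, needed: int, resource: str = "code_search") -> int:
    """Fail fast when the quota cannot cover the planned number of requests."""
    remaining = remaining_quota(payload, resource)
    if remaining < needed:
        raise RuntimeError(
            f"Only {remaining} {resource} requests left, {needed} needed"
        )
    return remaining


if __name__ == "__main__":
    # Hand-written sample payload, not a real API response.
    sample = {"resources": {"code_search": {"limit": 10, "remaining": 7}}}
    print(ensure_quota(sample, needed=5))
```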

## Development

### Running Tests

```bash
pytest
```

### Running Tests with Coverage

```bash
pytest --cov=auditor --cov-report=html
```

### Code Quality

```bash
# Linting
ruff check src tests

# Type checking
mypy src
```

## Ethical Use

This tool is intended for **authorized security research only**. Users must:

1. Only scan repositories they own or have explicit authorization to audit
2. Follow responsible disclosure practices for any vulnerabilities found
3. Comply with GitHub's Terms of Service and API usage policies
4. Never use discovered keys for unauthorized access

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/new-validator`)
3. Commit your changes (`git commit -am 'Add new validator'`)
4. Push to the branch (`git push origin feature/new-validator`)
5. Open a Pull Request

See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.

## Quick Start Guides

- [Google Colab / Windows / Linux](docs/QUICKSTART.md) - Step-by-step setup
- [Colab Notebook](examples/llmleaks_colab.ipynb) - One-click notebook
- [Simple Example](examples/simple_scan.py) - Python script example

## License

MIT License - see [LICENSE](LICENSE) for details.

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for version history.
