Metadata-Version: 2.3
Name: mailprobe-py
Version: 0.1.0
Summary: A Python implementation of advanced Bayesian email classification and filtering
License: MIT
Keywords: email,filter,bayesian,classification,spam,multi-category
Author: Peter Bowen
Author-email: peter-mailprobe@bowenfamily.org
Maintainer: Peter Bowen
Maintainer-email: peter-mailprobe@bowenfamily.org
Requires-Python: >=3.9,<4.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Communications :: Email :: Filters
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Filters
Requires-Dist: click (>=8.0.0,<9.0.0)
Project-URL: Bug Tracker, https://github.com/huntsberg/mailprobe-py/issues
Project-URL: Changelog, https://github.com/huntsberg/mailprobe-py/blob/main/CHANGELOG.md
Project-URL: Documentation, https://github.com/huntsberg/mailprobe-py/blob/main/README.md
Project-URL: Homepage, https://github.com/huntsberg/mailprobe-py
Project-URL: Repository, https://github.com/huntsberg/mailprobe-py
Description-Content-Type: text/markdown

# MailProbe-Py

[![CI](https://github.com/huntsberg/mailprobe-py/actions/workflows/ci.yml/badge.svg)](https://github.com/huntsberg/mailprobe-py/actions/workflows/ci.yml)
[![PyPI version](https://badge.fury.io/py/mailprobe-py.svg)](https://badge.fury.io/py/mailprobe-py)
[![Python versions](https://img.shields.io/badge/python-3.9%2B-blue)](https://pypi.org/project/mailprobe-py/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A Python implementation of the MailProbe Bayesian email classifier, inspired by the original C++ MailProbe by Burton Computer Corporation.

## Overview

MailProbe-Py is a statistical email classifier that uses Bayesian analysis to identify spam emails. It learns from examples of spam and legitimate emails to build a database of word frequencies, then uses this information to score new emails.

## Features

- **Bayesian Analysis**: Uses statistical analysis of word frequencies to identify spam
- **Learning Capability**: Trains on your specific email patterns for personalized filtering
- **Multiple Email Formats**: Supports mbox, maildir, and individual email files
- **MIME Support**: Handles MIME attachments and encoding (quoted-printable, base64)
- **Phrase Analysis**: Analyzes both individual words and multi-word phrases
- **Header Analysis**: Separately analyzes different email headers for improved accuracy
- **Database Management**: Built-in database cleanup and maintenance commands
- **Command Line Interface**: Comprehensive CLI matching original MailProbe functionality

## Installation

### Using Poetry (Recommended)

```bash
git clone https://github.com/huntsberg/mailprobe-py
cd mailprobe-py
poetry install
```

### Using pip

```bash
pip install mailprobe-py
```

For development:

```bash
git clone https://github.com/huntsberg/mailprobe-py
cd mailprobe-py
poetry install --with dev
```

## Quick Start

### Command Line Usage

1. **Create a database**:
   ```bash
   poetry run mailprobe-py create-db
   ```

2. **Train on existing emails**:
   ```bash
   poetry run mailprobe-py good ~/mail/inbox
   poetry run mailprobe-py spam ~/mail/spam
   ```

3. **Score new emails**:
   ```bash
   poetry run mailprobe-py score < new_email.txt
   ```

### Python API Usage (Recommended for Integration)

```python
from mailprobe import MailProbeAPI

# Create email classifier
with MailProbeAPI() as spam_filter:
    # Train on messages
    spam_filter.train_good(["From: friend@example.com\nSubject: Meeting\n\nLet's meet tomorrow."])
    spam_filter.train_spam(["From: spammer@bad.com\nSubject: FREE MONEY\n\nClick here!"])
    
    # Classify new messages
    email_content = "From: unknown@example.com\nSubject: Question\n\nI have a question."
    is_spam = spam_filter.classify_text(email_content)
    print(f"Is spam: {is_spam}")
    
    # Get detailed results
    result = spam_filter.classify_text(email_content, return_details=True)
    print(f"Probability: {result.probability:.3f}")
```

### Convenience Functions

```python
from mailprobe import classify_email, get_spam_probability

# Quick classification
is_spam = classify_email("From: test@example.com\nSubject: Test\n\nHello")
probability = get_spam_probability("From: test@example.com\nSubject: Test\n\nHello")
```

### Multi-Category Classification

MailProbe-Py supports classification into multiple categories beyond just spam/not-spam:

```python
from mailprobe import MultiCategoryFilter, FolderBasedClassifier

# Multi-category classification
categories = ['work', 'personal', 'newsletters', 'spam']
with MultiCategoryFilter(categories) as classifier:
    # Train on different categories
    classifier.train_category('work', work_emails)
    classifier.train_category('personal', personal_emails)
    
    # Classify into categories
    result = classifier.classify(email_content)
    print(f"Category: {result.category}, Confidence: {result.confidence:.3f}")

# Folder-based classification (auto-discovers categories from folders)
with FolderBasedClassifier('emails/') as classifier:
    classifier.train_from_folders()  # Train from folder structure
    result = classifier.classify(email_content)
    print(f"Should go to folder: {result.category}")
```

## Commands

- `create-db` - Create a new spam database
- `train` - Score and selectively train on messages
- `receive` - Score and always train on messages  
- `score` - Score messages without updating database
- `good` - Mark messages as non-spam
- `spam` - Mark messages as spam
- `remove` - Remove messages from database
- `dump` - Export database contents
- `cleanup` - Remove old/rare terms from database
- `help` - Show detailed help for commands

## Configuration

MailProbe-Py stores its database in `~/.mailprobe-py/` by default. You can change this with the `-d` option:

```bash
poetry run mailprobe-py -d /path/to/database score < email.txt
```

## Integration with Mail Systems

### Procmail Example

Add to your `.procmailrc`:

```
:0 fw
| poetry run mailprobe-py receive

:0:
* ^X-MailProbe: SPAM
spam/
```

### Postfix Example

Configure as a content filter in Postfix to automatically score incoming mail.

## Accuracy

Like the original MailProbe, this implementation typically achieves:
- 99%+ spam detection rate
- Very low false positive rate (< 0.1%)
- Improves with training on your specific email patterns

## Testing

MailProbe-Py includes a comprehensive test suite with **97 tests** and **81% code coverage**.

### Running Tests

```bash
# Run all tests
poetry run pytest

# Run tests with coverage
poetry run pytest --cov=src/mailprobe --cov-report=html

# Run specific test categories
poetry run pytest tests/test_api.py      # API tests
poetry run pytest tests/test_cli.py      # CLI tests
poetry run pytest tests/test_filter.py   # Core filtering tests

# Use the convenient test runner
python run_tests.py quick      # Quick test run
python run_tests.py coverage   # With coverage report
python run_tests.py html       # Generate HTML report
```

### Test Coverage

- **API Tests**: High-level object-oriented interface
- **CLI Tests**: Command-line interface functionality  
- **Core Tests**: Spam filtering, database, tokenization
- **Integration Tests**: End-to-end workflows
- **Error Handling**: Edge cases and invalid inputs

All tests pass successfully and the library is ready for production use.

## Documentation

- **[USAGE.md](USAGE.md)** - Comprehensive usage guide with examples
- **[OO_API_GUIDE.md](OO_API_GUIDE.md)** - Object-oriented API documentation
- **[MULTI_CATEGORY_GUIDE.md](MULTI_CATEGORY_GUIDE.md)** - Multi-category classification guide
- **[DEVELOPMENT.md](DEVELOPMENT.md)** - Development setup and guidelines
- **[TEST_SUMMARY.md](TEST_SUMMARY.md)** - Detailed test documentation
- **[CHANGELOG.md](CHANGELOG.md)** - Version history and changes

## License

MIT License - see LICENSE file for details.

## Acknowledgments

Based on the original MailProbe by Burton Computer Corporation, inspired by Paul Graham's "A Plan for Spam" article.

