Metadata-Version: 2.3
Name: fenliu
Version: 0.6.2
Summary: Monitor and filter Fediverse hashtags, curate quality content, and distribute via external tools like Zhongli
Author: marvin8
Author-email: marvin8 <marvin8@tuta.io>
License: AGPL-3.0-or-later
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: alembic~=1.18.4
Requires-Dist: apscheduler~=3.11.2
Requires-Dist: jinja2~=3.1.6
Requires-Dist: minimal-activitypub~=1.5.6
Requires-Dist: pydantic~=2.12.5
Requires-Dist: pydantic-settings~=2.13.1
Requires-Dist: python-multipart~=0.0.22
Requires-Dist: pyview-web~=0.8.3
Requires-Dist: sqlalchemy~=2.0.48
Requires-Dist: uvicorn[standard]~=0.41.0
Requires-Python: >=3.12, <3.14
Description-Content-Type: text/markdown

# FenLiu (分流)

*Created by marvin8 with assistance from Claude and DeepSeek AI assistants.*

> **⚠️ DISCLAIMER / PROVISO**: This project is a **work in progress** with major changes still happening. It is in no way anywhere close to finished and is only borderline useful for actual production use. Expect breaking changes, incomplete features, and significant architectural evolution as development continues.

**Divide the Fediverse content flow**

FenLiu is a web application that monitors Fediverse hashtags, filters spam, allows human review, learns from feedback, and exports quality content for boosting. Inspired by the ancient Chinese Dujiangyan irrigation system (256 BC) that separated silt from water, FenLiu applies 2,300-year-old engineering wisdom to modern digital content streams.

## Current Status — v0.6.0

FenLiu is a fully functional spam filtering and content management system with complete Curated Queue integration, flexible pattern-based user blocking, automated queue lifecycle management, and production-ready containerization. Monitor hashtags, score posts for spam, manually review content, reliably export quality posts, and manage queue health with automatic cleanup and trimming.

**Latest updates (v0.6.0)**: Auto-delete old delivered posts (7-day retention), weighted random deletion of excess pending posts based on age/engagement/author-activity, cleanup/trim API endpoints with manual UI controls, production containerization with Podman/Docker support, persistent data volumes for database and logs; 389 total tests passing.

## Features

### Core Functionality
- **Hashtag Monitoring**: Monitor multiple Fediverse hashtags with customizable instance sources and scheduling
- **Spam Scoring**: Rule-based detection (0-100 scale) with 7 intelligent detection rules
- **Manual Review Interface**: Web interface for reviewing and approving/rejecting posts with scoring
- **Bulk Operations**: Fetch and process posts in bulk with real-time progress tracking
- **Curated Queue Export**: API-driven queue with ack/nack/error reliability pattern

### Reblog Controls (Export Filters)
- **Pattern-Based User Blocking**: Block users with flexible matching modes:
  - **exact**: Exact account identifier (e.g., `@user@mastodon.social`)
  - **suffix**: Block all users from domain (e.g., `bsky.app` for all Bluesky users)
  - **prefix**: Block by username prefix (e.g., `bot_` for bot accounts)
  - **contains**: Block by substring (e.g., `spam` for accounts with "spam" in name)
- **"Don't Reblog" Hashtag Blocklist**: Exclude posts with blocked hashtags
- **Attachments-Only Mode**: Export only posts with media attachments
- **Auto-Reject on Fetch**: Automatically reject blocked content before review
- **Blocklist Refresh**: Apply Settings changes to review page instantly without losing progress

### Web Interface
- **Dashboard**: Real-time analytics, top hashtags, review progress
- **Streams Management**: Create, edit, manage hashtag streams with CRUD operations
- **Review Workflow**: Approve/reject posts with manual score adjustment and spam breakdown
- **Pattern Blocking Settings**: Intuitive UI for adding pattern-based user blocks with examples
- **Queue Preview**: Monitor queue health (pending/reserved/delivered/error counts)
- **Statistics**: Charts for posts over time and hashtag distribution
- **Responsive Design**: Fully responsive across desktop, tablet, mobile

### REST API
- **Hashtag Streams**: Full CRUD for stream management and bulk fetching
- **Posts**: List, filter, update with approval/rejection and scoring
- **Curated Queue**: `/next`, `/ack`, `/nack`, `/error`, `/requeue` endpoints
- **Reblog Controls**: Manage blocked users (with pattern types) and hashtags
- **Statistics**: Post counts, hashtag distribution, approval rates
- **Authentication**: API key-based authentication for queue endpoints
- **Health**: Health check and application info endpoints

### Technical Quality
- **Type Safety**: Comprehensive type hints throughout
- **Testing**: 389 tests with 100% pass rate (including 29 pattern matching tests)
- **Resource Management**: Proper cleanup of DB sessions and HTTP connections
- **Database Migrations**: Alembic with automatic schema migration on startup
- **API Key Security**: Secure generation and management of API keys
- **Code Complexity**: All functions optimized for maintainability
- **No JavaScript Bloat**: Pure HTML/CSS frontend, no external JS dependencies

## Quick Start

### Prerequisites
- Python 3.12 or 3.13 (the package requires `>=3.12, <3.14`)
- `uv` package manager (recommended)

### Installation
```bash
# Install dependencies
uv sync -U --all-groups

# Optional: Set up pre-commit hooks
uv run pre-commit install
```

### Running the Application
```bash
# Development mode with auto-reload
fenliu --reload --debug

# Alternative development mode
uv run python -m fenliu --reload --debug

# Production mode
fenliu --host 0.0.0.0 --port 8000

# See all options
fenliu --help
```

### Container Deployment (Docker/Podman)

FenLiu includes production-ready containerization with minimal image size (~207 MB).

#### Build the Image
```bash
# With Podman (recommended)
podman build -t fenliu -f Containerfile .

# Or with Docker
docker build -t fenliu -f Containerfile .
```

#### Run the Container
```bash
# Copy environment file and edit with your settings
cp .env.example .env
# Edit .env with your configuration

# Run with persistent volumes (recommended)
podman run -p 8000:8000 \
  -v fenliu-data:/app/data \
  -v fenliu-logs:/app/logs \
  --env-file .env \
  fenliu

# Or specify DATABASE_URL and SECRET_KEY directly
podman run -p 8000:8000 \
  -v fenliu-data:/app/data \
  -v fenliu-logs:/app/logs \
  -e DATABASE_URL="sqlite:////app/data/fenliu.db" \
  -e SECRET_KEY="your-production-secret-key" \
  fenliu
```

#### Container Features
- **Multi-stage build**: Minimal final image (~207 MB)
- **Non-root user**: Runs as `fenliu` user (UID 1000) for security
- **Persistent volumes**: Separate volumes for data (`/app/data`) and logs (`/app/logs`)
- **Automatic migrations**: Alembic migrations run on container startup
- **Production ready**: Uses `python:3.13-slim-bookworm` base image

#### Docker Compose Example
```yaml
version: '3.8'

services:
  fenliu:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - fenliu-data:/app/data
      - fenliu-logs:/app/logs
    environment:
      - DATABASE_URL=sqlite:////app/data/fenliu.db
      - DEBUG=false
      - SECRET_KEY=your-secret-key-change-in-production
    restart: unless-stopped

volumes:
  fenliu-data:
  fenliu-logs:
```

Run with: `docker-compose up -d`

### First Steps
1. Start the server: `fenliu --reload`
2. Open browser: Navigate to `http://localhost:8000`
3. Add a hashtag: Go to Streams page and create a hashtag stream (e.g., "python")
4. Fetch posts: Click "Fetch" on the stream to retrieve posts from Fediverse
5. Review posts: Use the Review interface to approve quality content or reject spam
6. Block patterns: Go to Settings to add pattern-based blocks (optional)
7. Export: Monitor the Queue Preview to see posts flowing to Curated Queue

### Pattern-Based Blocking Examples

#### Settings Page Usage
1. Go to **Settings → Don't Reblog — Users**
2. Enter pattern: `bsky.app`
3. Select type: **suffix**
4. Click "Block"
5. Result: All users from Bluesky are now blocked

#### Common Patterns
- **Block all Bluesky users**: Pattern `bsky.app`, Type `suffix`
- **Block bot accounts**: Pattern `bot_`, Type `prefix`
- **Block accounts with spam keyword**: Pattern `spam`, Type `contains`
- **Block specific user**: Pattern `@user@mastodon.social`, Type `exact`
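
The four pattern types listed above can be read as plain string comparisons. The following is a minimal sketch of that reading, not FenLiu's actual implementation: the function names `normalize` and `is_blocked` are hypothetical, and lowercasing plus stripping a leading `@` before comparing is an assumption made so that the `bot_` prefix example matches `@bot_news@example.org`.

```python
from typing import Literal

PatternType = Literal["exact", "suffix", "prefix", "contains"]


def normalize(account: str) -> str:
    """Lowercase and drop one leading '@' (assumption, not confirmed by FenLiu)."""
    return account.strip().lower().removeprefix("@")


def is_blocked(account: str, pattern: str, pattern_type: PatternType) -> bool:
    """Return True when the account matches the pattern under the given mode."""
    acct = normalize(account)
    pat = normalize(pattern)
    if pattern_type == "exact":
        return acct == pat
    if pattern_type == "suffix":
        return acct.endswith(pat)
    if pattern_type == "prefix":
        return acct.startswith(pat)
    if pattern_type == "contains":
        return pat in acct
    raise ValueError(f"unknown pattern type: {pattern_type!r}")


# The common patterns from the list above
print(is_blocked("@alice@bsky.app", "bsky.app", "suffix"))
print(is_blocked("@bot_news@example.org", "bot_", "prefix"))
print(is_blocked("@cheapspamdeals@example.org", "spam", "contains"))
print(is_blocked("@user@mastodon.social", "@user@mastodon.social", "exact"))
```

Under this reading, a bare `suffix` comparison would also match a domain that merely ends in the same characters (for example `notbsky.app`), so a stricter pattern such as `@bsky.app` may be preferable if the real matcher behaves the same way.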

#### Applying to Review Page
1. While reviewing posts, go to Settings to add new patterns
2. Return to Review page
3. Click **Refresh Blocklists** button (next to Refresh)
4. Current posts instantly re-evaluated with new patterns
5. Continue reviewing without page reload

### Debug Logging
Enable detailed debug logging with the `--debug` flag:

```bash
# Enable debug logging to file
fenliu --debug

# View logs in real-time
tail -f logs/fenliu_debug.log

# Custom log directory
fenliu --debug --log-dir=/var/log/fenliu
```

In your code, import `get_logger` from `fenliu.logging`, use it to obtain a logger, and then call `logger.debug("message")`.

## API Usage

### Authentication
All queue endpoints require API key authentication. Generate a key in Settings, then include it in requests:

```bash
curl -H "X-API-Key: your-api-key-here" \
  http://localhost:8000/api/v1/curated/next
```

### Common Examples
```bash
# List all hashtag streams
curl http://localhost:8000/api/v1/streams

# Create a new hashtag stream
curl -X POST http://localhost:8000/api/v1/streams \
  -H "Content-Type: application/json" \
  -d '{"hashtag": "python", "instance": "mastodon.social", "active": true}'

# Fetch posts for a stream
curl -X POST "http://localhost:8000/api/v1/streams/1/fetch?limit=20"

# Get next post from Curated Queue
curl -H "X-API-Key: your-api-key-here" \
  http://localhost:8000/api/v1/curated/next

# Acknowledge successful reblog
curl -X POST -H "X-API-Key: your-api-key-here" \
  http://localhost:8000/api/v1/curated/123/ack

# Report permanent failure
curl -X POST -H "X-API-Key: your-api-key-here" \
  -H "Content-Type: application/json" \
  -d '{"reason": "Account suspended"}' \
  http://localhost:8000/api/v1/curated/123/error

# Review a post (approve)
curl -X PATCH http://localhost:8000/api/v1/posts/123 \
  -H "Content-Type: application/json" \
  -d '{"approved": true, "reviewer_notes": "Quality content"}'

# Adjust spam score manually
curl -X PATCH http://localhost:8000/api/v1/posts/123 \
  -H "Content-Type: application/json" \
  -d '{"manual_spam_score": 15}'

# Add a pattern-based block (suffix type)
curl -X POST http://localhost:8000/api/v1/reblog-controls/blocked-users \
  -H "Content-Type: application/json" \
  -d '{"account_identifier": "bsky.app", "pattern_type": "suffix", "notes": "Block all Bluesky"}'

# List blocked users with pattern types
curl http://localhost:8000/api/v1/reblog-controls/blocked-users
```
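
For scripts that prefer Python over `curl`, the same requests can be built with the standard library. This is a rough sketch only: `build_request` is a hypothetical helper, the base URL is copied from the examples above, and the payload fields are the ones shown in the `curl` commands. Nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:8000/api/v1"  # host and port taken from the curl examples


def build_request(
    method: str,
    path: str,
    payload: Optional[dict[str, Any]] = None,
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request with an optional JSON body and API key."""
    headers: dict[str, str] = {}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if api_key is not None:
        headers["X-API-Key"] = api_key  # only the queue endpoints are documented as needing this
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# Same payloads as the curl examples above
block_bluesky = build_request(
    "POST",
    "/reblog-controls/blocked-users",
    {"account_identifier": "bsky.app", "pattern_type": "suffix", "notes": "Block all Bluesky"},
)
approve_post = build_request(
    "PATCH",
    "/posts/123",
    {"approved": True, "reviewer_notes": "Quality content"},
)
# urllib.request.urlopen(block_bluesky) would send it to a running FenLiu instance.
```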

### API Endpoints

**Streams & Posts:**
- `GET /api/v1/streams` - List streams
- `POST /api/v1/streams` - Create stream
- `GET/PUT/DELETE /api/v1/streams/{id}` - Stream operations
- `POST /api/v1/streams/{id}/fetch` - Fetch posts for stream
- `POST /api/v1/streams/fetch-all` - Fetch all active streams
- `GET /api/v1/posts` - List posts with filtering
- `GET /api/v1/posts/{id}` - Get post details
- `PATCH /api/v1/posts/{id}` - Update post (review, approve, score)
- `GET /api/v1/stats` - Application statistics

**Curated Queue:**
- `GET /api/v1/curated/next` - Get next post (returns 204 if empty)
- `POST /api/v1/curated/{post_id}/ack` - Confirm successful reblog
- `POST /api/v1/curated/{post_id}/nack` - Return to queue (transient failure)
- `POST /api/v1/curated/{post_id}/error` - Mark permanently failed
- `POST /api/v1/curated/{post_id}/requeue` - Return errored post to queue
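
A consumer of the queue would typically fetch one post, try to reblog it, and then report the outcome through one of the endpoints above. The sketch below illustrates that loop under several assumptions: the `/next` response is assumed to be a JSON object with an `id` field, `TransientError` and `process_next` are hypothetical names, and the transport is passed in as a function so the logic can be exercised without a running server.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable, Optional

BASE_URL = "http://localhost:8000/api/v1/curated"  # from the examples in this README
API_KEY = "your-api-key-here"  # placeholder, generate a real key in Settings

Transport = Callable[[str, str, Optional[dict]], tuple[int, Any]]


class TransientError(Exception):
    """Hypothetical: raised by the reblog callback for failures worth retrying."""


def http_transport(method: str, path: str, body: Optional[dict] = None) -> tuple[int, Any]:
    """Send one request and return (status code, decoded JSON or None)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"X-API-Key": API_KEY}
    if data is not None:
        headers["Content-Type"] = "application/json"
    req = urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            raw = resp.read()
            return resp.status, (json.loads(raw) if raw else None)
    except urllib.error.HTTPError as exc:
        return exc.code, None


def process_next(send: Transport, reblog: Callable[[Any], None]) -> str:
    """Handle one queue item and report which outcome was sent back."""
    status, post = send("GET", "/next", None)
    if status == 204:  # documented: 204 means the queue is empty
        return "empty"
    if status != 200 or not post:
        return "failed"
    post_id = post["id"]  # assumption: the payload exposes the post id as "id"
    try:
        reblog(post)
    except TransientError:
        send("POST", f"/{post_id}/nack", None)  # return to queue
        return "nack"
    except Exception as exc:
        send("POST", f"/{post_id}/error", {"reason": str(exc)})  # permanent failure
        return "error"
    send("POST", f"/{post_id}/ack", None)  # confirm successful reblog
    return "ack"
```

Against a live instance this could be called as `process_next(http_transport, my_reblog_function)`, where `my_reblog_function` is whatever the external tool uses to boost a post.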

**Reblog Controls (Pattern-Based Blocking):**
- `GET /api/v1/reblog-controls/settings` - Get reblog filter settings
- `PUT /api/v1/reblog-controls/settings` - Update settings
- `GET /api/v1/reblog-controls/blocked-users` - List blocked users with pattern types
- `POST /api/v1/reblog-controls/blocked-users` - Add blocked user (with pattern_type)
- `DELETE /api/v1/reblog-controls/blocked-users/{id}` - Remove blocked user
- `GET /api/v1/reblog-controls/blocked-hashtags` - List blocked hashtags
- `POST /api/v1/reblog-controls/blocked-hashtags` - Add blocked hashtag
- `DELETE /api/v1/reblog-controls/blocked-hashtags/{id}` - Remove blocked hashtag
- `POST /api/v1/reblog-controls/reject-blocked` - Bulk reject posts matching any pattern

**System:**
- `GET /health` - Health check
- `GET /info` - Application info

## Configuration

Environment variables (via `.env` file):

```bash
# Database
DATABASE_URL=sqlite:///./fenliu.db

# Fediverse settings
DEFAULT_INSTANCE=mastodon.social
API_TIMEOUT=30
MAX_POSTS_PER_FETCH=20
RATE_LIMIT_DELAY=1.0

# Application
DEBUG=false
SECRET_KEY=your-secret-key-change-in-production
APP_NAME=FenLiu

# Spam scoring thresholds
VERY_HIGH_THRESHOLD=76
LOW_MAX_THRESHOLD=25

# Queue timeout
RESERVE_TIMEOUT_SECONDS=300
```
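
To illustrate how the two spam scoring thresholds might partition the 0-100 score range, here is a small standard-library sketch. It is not FenLiu's configuration code (the project depends on pydantic-settings for that). The band names, the inclusive comparisons, and the helper names `load_thresholds` and `score_band` are all assumptions made for this example.

```python
import os
from typing import Mapping


def load_thresholds(env: Mapping[str, str] = os.environ) -> tuple[int, int]:
    """Read both thresholds, falling back to the defaults shown above."""
    low_max = int(env.get("LOW_MAX_THRESHOLD", "25"))
    very_high = int(env.get("VERY_HIGH_THRESHOLD", "76"))
    if not 0 <= low_max < very_high <= 100:
        raise ValueError("expected 0 <= LOW_MAX_THRESHOLD < VERY_HIGH_THRESHOLD <= 100")
    return low_max, very_high


def score_band(score: int, low_max: int, very_high: int) -> str:
    """Classify a spam score; band names and inclusive bounds are assumptions."""
    if not 0 <= score <= 100:
        raise ValueError("spam scores are on a 0-100 scale")
    if score <= low_max:
        return "low"
    if score >= very_high:
        return "very_high"
    return "medium"


low_max, very_high = load_thresholds({})  # empty mapping, so the defaults apply
print(score_band(15, low_max, very_high))
print(score_band(80, low_max, very_high))
```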

## Development

### Testing
```bash
# Run full test suite
pytest

# Run with coverage
pytest --cov=src/fenliu tests/

# Quick validation
python -m pytest -q

# Run specific test file
pytest tests/test_pattern_blocking.py -v
```

### Code Quality
```bash
# Linting
ruff check src/fenliu/

# Formatting
ruff format src/fenliu/

# Complexity check
complexipy src

# Pre-commit checks
prek run --all-files

# Full CI simulation
nox
```

### Database Migrations
```bash
# Apply pending migrations
alembic upgrade head

# Create new migration
alembic revision --autogenerate -m "description"

# Show current revision
alembic current

# View all revisions
alembic history
```

### Development Workflow
```bash
# After dependency changes
uv sync -U --all-groups

# Quick validation before commits
prek run --all-files

# Full validation before commits
nox
```

## Project Structure
```
fenliu/
├── src/fenliu/
│   ├── __init__.py              # Package definition
│   ├── __main__.py              # CLI entry point
│   ├── main.py                  # PyView application
│   ├── config.py                # Configuration
│   ├── database.py              # Database setup
│   ├── models.py                # SQLAlchemy models
│   ├── schemas.py               # Pydantic validation
│   ├── api/                     # REST API endpoints
│   │   ├── curated.py           # Queue API
│   │   ├── reblog_controls.py   # Filter management (pattern-based)
│   │   └── api_keys.py          # API key management
│   ├── services/                # Business logic
│   │   ├── spam_scoring.py      # Spam detection
│   │   ├── fediverse.py         # Fediverse client
│   │   ├── export_eligibility.py # Export filtering with pattern matching
│   │   ├── scheduler.py         # Task scheduling
│   │   └── api_key.py           # API key service
│   ├── templates/               # HTML templates
│   └── static/                  # CSS and assets
├── alembic/                     # Database migrations
├── tests/                       # Test suite (389 tests)
├── docs/                        # MkDocs documentation
├── pyproject.toml               # Project configuration
├── ROADMAP.md                   # Development roadmap
├── README.md                    # This file
└── PATTERN_BLOCKING_FEATURE.md  # Pattern blocking documentation
```

## Documentation

Complete documentation is available in the `docs/` folder and is built with [MkDocs](https://www.mkdocs.org/):

```bash
# Serve locally with hot reload
mkdocs serve

# Build static site
mkdocs build
```

**📚 Live Documentation**: https://marvinsmastodontools.codeberg.page/fenliu/

Includes: Installation, Quick Start, API Reference, Pattern Blocking Guide, Curated Queue Integration, Contributing Guide, Roadmap, and FAQ.

## Technical Stack
- **Framework**: PyView (Starlette-based LiveView) with real-time capabilities
- **Database**: SQLAlchemy with SQLite, optimized with eager loading
- **API Client**: minimal-activitypub for Fediverse integration
- **Async**: async/await throughout, except for SQLite access, which is synchronous
- **Type Hints**: Comprehensive type annotations with Pydantic validation
- **Frontend**: Jinja2 templates with Tailwind CSS, responsive design
- **Testing**: pytest with 389 tests (100% pass rate)
- **Linting**: ruff for formatting and linting
- **Migrations**: Alembic for schema management
- **Package Manager**: uv for dependency management

## Upcoming Features

See [Roadmap](ROADMAP.md) for detailed plans. Phase 4 focus:
- Docker containerization and CI/CD
- Performance optimization and caching for pattern matching
- Multi-user support with roles
- Advanced monitoring dashboard
- PostgreSQL/MySQL support

## What's New in v0.6.0

### Queue Lifecycle Management
Automatic management of pending and delivered posts to prevent indefinite queue growth:

- **Auto-Delete Delivered Posts**: Posts automatically deleted after 7 days (configurable), with historical stats preserved
- **Trim Excess Pending Posts**: Weighted random deletion maintains invariant: `pending_count ≥ 2 × daily_consumption_rate`
  - **Age-based weighting**: Older posts have higher deletion probability
  - **Engagement-based weighting**: Posts with fewer likes deleted preferentially
  - **Author activity weighting**: Posts from prolific authors in pending queue have higher deletion probability
- **Cleanup API Endpoints**: `POST /api/v1/curated/cleanup` and `POST /api/v1/curated/trim-pending` for manual control
- **Queue UI Controls**: New "Purge old delivered" and "Trim excess pending" buttons on Queue Preview page
- **Historical Stats**: All-time deletion counts preserved in database; stats page shows both active and historical data
- **5 New Tests**: Comprehensive coverage of cleanup/trim logic (389 total tests)
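
The trimming behaviour described above can be sketched as weighted sampling without replacement. This is only an illustration of the idea, not the logic inside `_trim_pending_posts()`: the `PendingPost` fields, the way the three factors are multiplied together, and the reading that only posts above the `2 × daily_consumption_rate` floor are eligible for deletion are all assumptions.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PendingPost:  # hypothetical shape, for illustration only
    id: int
    author: str
    age_days: float
    likes: int


def deletion_weight(post: PendingPost, author_counts: Counter) -> float:
    """Combine the three factors; the exact formula is an assumption."""
    age = 1.0 + post.age_days                # older posts weigh more
    engagement = 1.0 / (1.0 + post.likes)    # fewer likes weigh more
    activity = author_counts[post.author]    # more pending posts by the author weigh more
    return age * engagement * activity


def pick_posts_to_trim(
    posts: list[PendingPost],
    daily_rate: int,
    rng: Optional[random.Random] = None,
) -> list[PendingPost]:
    """Pick posts to delete so that the pending count stays at 2 x daily_rate or more."""
    if rng is None:
        rng = random.Random()
    floor = 2 * daily_rate
    excess = max(0, len(posts) - floor)
    remaining = list(posts)
    author_counts = Counter(p.author for p in remaining)
    chosen: list[PendingPost] = []
    for _ in range(excess):
        weights = [deletion_weight(p, author_counts) for p in remaining]
        victim = rng.choices(remaining, weights=weights, k=1)[0]
        remaining.remove(victim)
        author_counts[victim.author] -= 1  # keep the activity factor current
        chosen.append(victim)
    return chosen
```

Recomputing the weights after each pick keeps the author-activity factor in step with the shrinking queue, at the cost of quadratic work. For large queues a single-pass weighted sampling scheme would likely be a better fit.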

### Production Containerization
FenLiu is now production-ready for containerized deployment:

- **Multi-stage Containerfile**: Minimal final image (~207 MB)
- **Non-root User**: Runs as `fenliu` (UID 1000) for enhanced security
- **Persistent Volumes**: Separate data and logs volumes for durability
- **Automatic Migrations**: Database schema migrated automatically on container startup
- **Environment Configuration**: `.env.example` with comprehensive documentation
- **Docker/Podman Support**: Works with both Docker and Podman
- **Docker Compose Example**: Ready-to-use configuration in documentation

### Code Quality
- **Complexity Optimization**: Refactored `_trim_pending_posts()` from complexity 16 to 6 via helper functions
- **Type Safety**: Full type hints across all new functions with zero type errors
- **Linting**: All code passes ruff checks (no warnings)
- **Test Pass Rate**: 389 tests passing (100%)

## Previous Release — v0.5.3

### Pattern-Based User Blocking (v0.5.3)
Users can block Fediverse accounts using flexible pattern matching:

- **Four Pattern Types**: exact, suffix, prefix, contains
- **Real-World Examples**: Block all Bluesky users, all bot accounts, or any account with a keyword
- **Settings UI**: Intuitive pattern selector with helpful examples
- **Review Page Integration**: Pattern-based blocks show on review page with instant visibility
- **Blocklist Refresh**: New button allows applying Settings changes to review page without losing progress

See [PATTERN_BLOCKING_FEATURE.md](PATTERN_BLOCKING_FEATURE.md) for complete details and examples.

## Cultural Context

The name "FenLiu" (分流) means "divide the flow" in Chinese, inspired by the ancient Dujiangyan irrigation system (256 BC). This project applies the same engineering wisdom to digital content streams, separating valuable content from spam and noise while maintaining the natural flow of community conversation.

## Key Resources
- [Roadmap](ROADMAP.md) - Development plans and future features
- [Pattern Blocking Guide](PATTERN_BLOCKING_FEATURE.md) - Detailed pattern matching documentation
- [LLM System Prompt](LLM_SYSTEM_PROMPT.md) - Development standards
- [Live Docs](https://marvinsmastodontools.codeberg.page/fenliu/) - Complete documentation

## License
AGPL-3.0-or-later. See the LICENSE file for details.

## Contributing
1. Follow existing code style (ruff formatted with comprehensive type hints)
2. Write tests for new functionality (maintain 100% test pass rate)
3. Update documentation as needed
4. Run `nox` before submitting changes
5. Run `alembic upgrade head` after pulling changes with new migrations

---

- **Version**: 0.6.0
- **Status**: Production Ready ✅
- **Released**: 2026-03-12
- **Tests**: 389 passing ✅
- **Code Quality**: All checks passing ✅
- **Container Size**: ~207 MB (multi-stage optimized)
- **Framework**: PyView (Starlette-based LiveView)
- **Architecture**: Async Python with comprehensive type hints
- **Repository**: https://codeberg.org/marvinsmastodontools/fenliu
