Metadata-Version: 2.4
Name: bgf-agents
Version: 0.1.0
Summary: Autonomous Data Factory Agents - Health Monitoring, Self-Healing, and Data Governance
Project-URL: Homepage, https://github.com/bgf-dev/bgf-agents
Project-URL: Documentation, https://github.com/bgf-dev/bgf-agents#readme
Project-URL: Repository, https://github.com/bgf-dev/bgf-agents
Author-email: BGF Team <dev@bgf.dev>
License-Expression: MIT
License-File: LICENSE
Keywords: agents,autonomous,data-factory,governance,llm,monitoring,self-healing
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: anthropic>=0.40.0
Requires-Dist: asyncpg>=0.29.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: openai>=1.50.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: redis>=5.0.0
Requires-Dist: tenacity>=8.0.0
Provides-Extra: all
Requires-Dist: crawl4ai>=0.3.0; extra == 'all'
Requires-Dist: mypy>=1.10.0; extra == 'all'
Requires-Dist: playwright>=1.40.0; extra == 'all'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'all'
Requires-Dist: pytest>=8.0.0; extra == 'all'
Requires-Dist: ruff>=0.4.0; extra == 'all'
Provides-Extra: collect
Requires-Dist: crawl4ai>=0.3.0; extra == 'collect'
Requires-Dist: playwright>=1.40.0; extra == 'collect'
Provides-Extra: dev
Requires-Dist: mypy>=1.10.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.4.0; extra == 'dev'
Description-Content-Type: text/markdown

# BGF Agents

Autonomous Data Factory Agents - A lightweight, Token-Zero agent framework for data pipeline management.

[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Features

- **5 Specialized Agents**: Orchestrator, Healer, Governor, Collector, Analytics
- **76 Tools**: Database, cache, API, file, notification, and more
- **Token-Zero Architecture**: ~50-150 tokens per agent run
- **Multi-Tier Collection**: HTTP → JS → Firecrawl → Browser automation
- **Self-Healing**: Circuit breakers and anomaly detection
- **Data Governance**: Quality checks, lineage tracking, metadata management

## Quick Start

```bash
# Install
pip install bgf-agents

# Or with collection dependencies
pip install "bgf-agents[collect]"

# Check system health
bgf-agents health

# List available agents
bgf-agents list

# Run an agent
bgf-agents run orchestrator --task health
```

## Architecture

```
┌─────────────────────────────────────────────────────────────────────────┐
│                  Autonomous Agent System                                 │
├─────────────────────────────────────────────────────────────────────────┤
│  Orchestrator  │ System health + ETL         │ 10 tools                │
│  Healer        │ Self-healing + circuits     │ 10 tools                │
│  Governor      │ Data governance             │ 20 tools                │
│  Collector     │ Multi-tier collection       │ 8 tools                 │
│  Analytics     │ Statistics + trends         │ 8 tools                 │
├─────────────────────────────────────────────────────────────────────────┤
│  Extended      │ DB/Cache/API/File/Notify    │ 76 tools total          │
├─────────────────────────────────────────────────────────────────────────┤
│  Total         │ Token Zero Architecture     │ ~50-150 tokens/run      │
└─────────────────────────────────────────────────────────────────────────┘
```

## Agents

### MasterOrchestratorAgent

System health monitoring and ETL orchestration.

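```python
import asyncio

from bgf_agents import MasterOrchestratorAgent


async def main() -> None:
    # Use an OpenAI-compatible endpoint, here a local Docker Model Runner
    agent = MasterOrchestratorAgent(
        provider='openai',
        model='ai/gpt-oss',
        base_url='http://localhost:12434/v1'
    )
    # agent.run() is a coroutine, so it must be awaited inside an async function
    result = await agent.run('Check all system health')
    print(result)


asyncio.run(main())
```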

**Tools**: `check_database_health`, `check_redis_health`, `check_api_health`, `run_etl_pipeline`, `get_pipeline_status`, `schedule_etl`, `cancel_etl`, `get_etl_history`, `get_system_metrics`, `get_alerts`
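
The health tools listed above are independent of one another, so one plausible way to combine them is to run the checks concurrently and aggregate the results. The sketch below is self-contained and does not use the `bgf_agents` API: `check_database`, `check_redis`, `check_api`, and `check_all` are hypothetical stand-ins, and treating a raised exception as "unhealthy" is an assumption of this example.

```python
import asyncio


# Hypothetical stand-ins for the real health tools; each returns (name, healthy).
async def check_database() -> tuple[str, bool]:
    await asyncio.sleep(0)  # placeholder for a real connectivity probe
    return "database", True


async def check_redis() -> tuple[str, bool]:
    await asyncio.sleep(0)
    return "redis", True


async def check_api() -> tuple[str, bool]:
    await asyncio.sleep(0)
    raise ConnectionError("API unreachable")  # simulate a failing dependency


async def check_all() -> dict[str, bool]:
    """Run every check concurrently; a raised exception counts as unhealthy."""
    checks = {"database": check_database, "redis": check_redis, "api": check_api}
    results = await asyncio.gather(
        *(check() for check in checks.values()), return_exceptions=True
    )
    report = {}
    for name, result in zip(checks, results):
        # With return_exceptions=True, failures arrive as exception objects.
        report[name] = (not isinstance(result, BaseException)) and result[1]
    return report


report = asyncio.run(check_all())
print(report)
```

Using `return_exceptions=True` keeps one failing dependency from hiding the status of the others.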

### HealerAgent

Self-healing with circuit breakers and anomaly detection.

```python
from bgf_agents import HealerAgent

agent = HealerAgent()
anomalies = agent.detect_anomalies([1, 2, 3, 100, 4, 5])
circuits = agent.list_circuits()
```

**Tools**: `detect_anomalies`, `get_circuit_status`, `open_circuit`, `close_circuit`, `reset_circuit`, `get_healing_history`, `trigger_healing`, `get_anomaly_report`, `configure_circuit`, `get_health_score`
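
For readers unfamiliar with the circuit-breaker pattern behind `open_circuit`, `close_circuit`, and `reset_circuit`, here is a minimal, generic sketch. It is not the `HealerAgent` implementation: the `CircuitBreaker` class, its parameters, and the state names are assumptions chosen for illustration.

```python
import time


class CircuitBreaker:
    """Minimal breaker: closed -> open after N failures -> half-open after a cooldown."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock  # injectable so the cooldown can be tested without sleeping
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"  # cooldown elapsed: allow one trial call
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise RuntimeError("circuit open: call rejected")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, calls fail fast instead of hitting the broken dependency. After the cooldown, one trial call decides whether the circuit closes again.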

### GovernorAgent

Data governance, quality, lineage, and metadata management.

```python
from bgf_agents import GovernorAgent

agent = GovernorAgent()
quality = agent.check_data_quality('users_table')
lineage = agent.get_lineage('revenue_metric')
```

**Tools**: `check_data_quality`, `get_lineage`, `update_metadata`, `validate_schema`, `check_freshness`, `get_data_catalog`, `register_dataset`, `get_quality_report`, `set_data_owner`, `get_compliance_status`
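
To make "quality" and "freshness" concrete, the sketch below computes two common metrics over in-memory rows: per-column completeness and whether the newest row falls within a maximum age. It is an illustration only. The `check_quality` function, its parameters, and the returned keys are hypothetical and say nothing about how `check_data_quality` or `check_freshness` are actually implemented.

```python
from datetime import datetime, timedelta, timezone


def check_quality(rows, required, timestamp_field, max_age, now=None):
    """Return completeness per required column and whether the newest row is fresh."""
    now = now or datetime.now(timezone.utc)
    total = len(rows)
    # Completeness: share of rows where the column is present and not None.
    completeness = {
        col: (sum(1 for r in rows if r.get(col) is not None) / total if total else 0.0)
        for col in required
    }
    timestamps = [r[timestamp_field] for r in rows if r.get(timestamp_field) is not None]
    newest = max(timestamps) if timestamps else None
    # Freshness: the most recent timestamp must be no older than max_age.
    fresh = newest is not None and (now - newest) <= max_age
    return {"row_count": total, "completeness": completeness, "fresh": fresh}


rows = [
    {"id": 1, "email": "a@example.com", "updated_at": datetime(2024, 1, 1, 12, tzinfo=timezone.utc)},
    {"id": 2, "email": None, "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
]
report = check_quality(
    rows,
    required=["id", "email"],
    timestamp_field="updated_at",
    max_age=timedelta(hours=24),
    now=datetime(2024, 1, 2, tzinfo=timezone.utc),
)
print(report)
```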

### CollectorAgent

Multi-tier data collection (HTTP/JS/Firecrawl/Browser).

```python
import asyncio

from bgf_agents import CollectorAgent


async def main() -> None:
    collector = CollectorAgent()

    # Automatic tier detection
    result = await collector.smart_collect('https://example.com')

    # Batch collection
    results = await collector.batch_collect([
        'https://api.example.com/data',
        'https://js-heavy-site.com',
        'https://amazon.com/product'  # Auto-routes to Tier-4
    ])


asyncio.run(main())
```

**Tools**: `collect_http`, `collect_js`, `collect_complex`, `collect_browser`, `detect_tier`, `smart_collect`, `batch_collect`, `get_collection_stats`

**Tier Routing**:

| Tier | Tool | Use Case |
|------|------|----------|
| 1 | httpx | REST APIs, static pages |
| 2 | Crawl4AI | JS-rendered pages |
| 3 | Firecrawl | Complex JS, anti-scraping |
| 4 | Playwright | Amazon, LinkedIn, anti-bot |
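
One way to read the tier table is as an escalation ladder: try the cheapest tier first and move up only when it fails. The sketch below illustrates that idea with stub fetchers. Whether `smart_collect` escalates on failure like this is an assumption, and `collect_with_escalation` and the stub functions are hypothetical names.

```python
from typing import Callable

# Tier numbers follow the table above; the fetchers are hypothetical stubs.
Fetcher = Callable[[str], str]


def collect_with_escalation(
    url: str, tiers: dict[int, Fetcher], start_tier: int = 1
) -> tuple[int, str]:
    """Try tiers in ascending order from start_tier; return (tier_used, content)."""
    errors = []
    for tier in sorted(t for t in tiers if t >= start_tier):
        try:
            content = tiers[tier](url)
        except Exception as exc:  # a failed tier triggers escalation to the next one
            errors.append(f"tier {tier}: {exc}")
            continue
        if content.strip():  # treat an empty body as a failed attempt
            return tier, content
        errors.append(f"tier {tier}: empty response")
    raise RuntimeError("all tiers failed: " + "; ".join(errors))


def http_stub(url: str) -> str:
    raise ConnectionError("blocked")  # simulates a plain HTTP request being refused


def js_stub(url: str) -> str:
    return ""  # simulates a render that produced no content


def browser_stub(url: str) -> str:
    return "<html>ok</html>"


tier, body = collect_with_escalation(
    "https://example.com", {1: http_stub, 2: js_stub, 4: browser_stub}
)
print(tier, body)
```

Passing `start_tier` lets a caller skip straight to a higher tier for sites already known to need it.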

### AnalyticsAgent

Statistical analysis, trend detection, anomaly detection, and reporting.

```python
from bgf_agents import AnalyticsAgent

analytics = AnalyticsAgent()

# Statistical analysis
result = analytics.analyze_statistics([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(result.summary)  # {'count': 10, 'mean': 5.5, 'median': 5.5, ...}

# Trend detection
trend = analytics.detect_trend([1, 2, 3, 4, 5])
print(trend.direction)  # TrendDirection.UP

# Anomaly detection
anomalies = analytics.detect_anomalies([1, 2, 3, 100, 4, 5])
print(anomalies.anomaly_indices)  # [3]
```

**Tools**: `analyze_statistics`, `detect_trend`, `detect_anomalies`, `profile_data`, `compare_datasets`, `calculate_correlation`, `generate_report`, `forecast_values`
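
The README does not say which algorithms `detect_anomalies` and `detect_trend` use. As an illustration only, the sketch below uses two standard techniques: a modified z-score based on the median absolute deviation (MAD) for outliers, and a least-squares slope for trend. The function names and the 3.5 threshold are assumptions.

```python
from statistics import median


def mad_anomaly_indices(values, threshold=3.5):
    """Indices whose modified z-score (median/MAD based) exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure deviations against
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


def trend_slope(values):
    """Least-squares slope of the values against their index positions."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a slope")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


print(mad_anomaly_indices([1, 2, 3, 100, 4, 5]))  # [3]
print(trend_slope([1, 2, 3, 4, 5]))  # 1.0
```

A median-based score is used here because, on a sample this small, the outlier inflates the mean and standard deviation enough that a classic z-score with a threshold of 3 would not flag it.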

## Extended Tools

76 tools organized in 7 categories:

### Database Tools (10)
```python
from bgf_agents.tools import DATABASE_TOOLS

# execute_query, execute_transaction, get_table_schema,
# list_tables, get_row_count, backup_table,
# vacuum_table, get_table_stats, create_index, drop_index
```

### Cache Tools (10)
```python
from bgf_agents.tools import CACHE_TOOLS

# cache_get, cache_set, cache_delete, cache_exists,
# cache_ttl, cache_keys, cache_clear_pattern,
# cache_increment, cache_get_many, cache_set_many
```

### API Tools (9)
```python
from bgf_agents.tools import API_TOOLS

# http_get, http_post, http_put, http_delete,
# graphql_query, check_api_health, get_api_metrics,
# retry_request, batch_requests
```

### File Tools (9)
```python
from bgf_agents.tools import FILE_TOOLS

# read_file, write_file, append_file, delete_file,
# list_directory, file_exists, get_file_info,
# copy_file, move_file
```

### Notification Tools (5)
```python
from bgf_agents.tools import NOTIFICATION_TOOLS

# send_email, send_slack, send_webhook,
# send_sms, get_notification_history
```

### Search Tools
```python
from bgf_agents.tools import SEARCH_TOOLS

# search_files, search_content
```

## Configuration

### Environment Variables

```bash
# Database
DATABASE_URL=postgresql://localhost:5432/mydb

# Redis
REDIS_URL=redis://localhost:6379/0

# LLM Provider
LLM_PROVIDER=anthropic  # or openai
ANTHROPIC_API_KEY=sk-ant-...
# or
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=http://localhost:12434/v1
```
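
The variables above could be read into a typed settings object along the following lines. This is a sketch under stated assumptions, not the package's own loader: the `Settings` dataclass, `load_settings`, the choice to require `DATABASE_URL` and `REDIS_URL`, and the `anthropic` default are all hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    llm_provider: str
    api_key: Optional[str]
    base_url: Optional[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, failing early on missing values."""
    env = os.environ if env is None else env
    missing = [name for name in ("DATABASE_URL", "REDIS_URL") if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    # Defaulting to 'anthropic' is an assumption of this sketch.
    provider = env.get("LLM_PROVIDER", "anthropic").lower()
    if provider not in ("anthropic", "openai"):
        raise ValueError(f"unsupported LLM_PROVIDER: {provider!r}")
    key_name = "ANTHROPIC_API_KEY" if provider == "anthropic" else "OPENAI_API_KEY"
    return Settings(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        llm_provider=provider,
        api_key=env.get(key_name),
        # The base URL override only applies to OpenAI-compatible endpoints.
        base_url=env.get("OPENAI_BASE_URL") if provider == "openai" else None,
    )


settings = load_settings({
    "DATABASE_URL": "postgresql://localhost:5432/mydb",
    "REDIS_URL": "redis://localhost:6379/0",
    "LLM_PROVIDER": "openai",
    "OPENAI_API_KEY": "sk-...",
    "OPENAI_BASE_URL": "http://localhost:12434/v1",
})
print(settings.llm_provider, settings.base_url)
```

Accepting a mapping instead of reading `os.environ` directly keeps the loader easy to test.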

### YAML Configuration

```yaml
# config.yaml
database:
  url: postgresql://localhost:5432/mydb
  pool_size: 10

redis:
  url: redis://localhost:6379/0

llm:
  provider: anthropic
  model: claude-3-5-sonnet-20241022

agents:
  max_iterations: 10
  timeout: 300
```

### Programmatic Configuration

```python
from bgf_agents import Config, get_config, set_config
from bgf_agents.config import DatabaseConfig, LLMConfig

# Get current config
config = get_config()

# Set custom config
custom_config = Config(
    database=DatabaseConfig(url='postgresql://...'),
    llm=LLMConfig(provider='openai', model='gpt-4')
)
set_config(custom_config)
```

## CLI Reference

```bash
# List agents and tools
bgf-agents list
bgf-agents list -v  # Verbose, show all tools

# Health check
bgf-agents health

# Run agents
bgf-agents run orchestrator --task health
bgf-agents run healer --task detect
bgf-agents run governor --task quality --table users
bgf-agents run collector --url https://example.com
bgf-agents run analytics --data '[1,2,3,4,5]'

# Configuration
bgf-agents config show
bgf-agents config validate
bgf-agents config set --key llm.model --value gpt-4
```
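
The `--key llm.model` form addresses a nested setting with a dotted path. The sketch below shows one way such a path can be applied to a nested dictionary like the YAML configuration. It is illustrative only: `set_by_path` is a hypothetical helper, not part of the CLI.

```python
import copy


def set_by_path(config: dict, dotted_key: str, value) -> dict:
    """Return a copy of config with the nested key named by dotted_key set to value."""
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    *parents, leaf = dotted_key.split(".")
    node = updated
    for part in parents:
        child = node.setdefault(part, {})  # create intermediate sections on demand
        if not isinstance(child, dict):
            raise TypeError(f"{part!r} is not a mapping in {dotted_key!r}")
        node = child
    node[leaf] = value
    return updated


config = {"llm": {"provider": "anthropic", "model": "claude-3-5-sonnet-20241022"}}
updated = set_by_path(config, "llm.model", "gpt-4")
print(updated["llm"]["model"])
```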

## Token Zero Architecture

The agents follow a Token Zero design: only the skill layer consumes LLM tokens, while the action and tool layers run as plain Python.

| Layer | Tokens | Responsibility |
|-------|--------|----------------|
| Skill Layer | ~50 | Intent understanding |
| Action Layer | 0 | Python computation |
| Tool Layer | 0 | Data operations |

Total token consumption: ~50-150 tokens per agent run (98.5% savings).
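
The layering in the table can be pictured as a dispatch pattern: a short classification step picks an intent, and everything after it is ordinary Python. The sketch below is a simplified illustration, not the framework's code. `classify_intent` uses keyword matching as an offline stand-in for the LLM call, and `ACTIONS`, `fetch_latencies`, and `run` are hypothetical names.

```python
from statistics import mean
from typing import Callable


def classify_intent(request: str) -> str:
    """Skill layer stand-in: the only step that would spend LLM tokens.

    A keyword match keeps this sketch runnable offline.
    """
    text = request.lower()
    if "health" in text:
        return "health"
    if "average" in text or "mean" in text:
        return "average"
    return "unknown"


def fetch_latencies() -> list[float]:
    """Tool layer: a plain data operation (hypothetical sample data)."""
    return [12.0, 15.0, 11.0]


# Action layer: ordinary Python computation keyed by intent label.
ACTIONS: dict[str, Callable[[], dict]] = {
    "health": lambda: {"status": "ok"},
    "average": lambda: {"mean_latency_ms": mean(fetch_latencies())},
}


def run(request: str) -> dict:
    intent = classify_intent(request)
    action = ACTIONS.get(intent)
    if action is None:
        return {"error": f"no action for intent {intent!r}"}
    return action()  # no model involved past this point


print(run("Check all system health"))
print(run("What is the average latency?"))
```

Because the model only has to produce a short intent label, the token cost per run stays small regardless of how much data the action and tool layers process.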

## Docker Model Runner Setup

To use a local LLM with Docker Desktop:

```bash
# Enable Model Runner (TCP mode)
docker desktop enable model-runner --tcp=12434

# Available models
docker model list
# ai/gpt-oss (11.04 GiB) - General purpose
# ai/gemma3 (2.31 GiB) - Fast inference
# ai/qwen3-vl:8B (4.79 GiB) - Multimodal
```

## Production Deployment

### Cron Schedule

```bash
# Orchestrator - every hour
10 * * * * /path/to/run_orchestrator.sh

# Healer - every 4 hours
20 */4 * * * /path/to/run_healer.sh

# Governor - every 6 hours
30 */6 * * * /path/to/run_governor.sh
```

### Monitoring

If TimescaleDB is configured, agent executions are logged to the `dwd.agent_executions` and `dwd.agent_tool_calls` tables.

## Installation Options

```bash
# Basic installation
pip install bgf-agents

# With browser automation (Playwright, Crawl4AI)
pip install "bgf-agents[collect]"

# Development installation
pip install "bgf-agents[dev]"

# All dependencies
pip install "bgf-agents[all]"
```

## Development

```bash
# Clone repository
git clone https://github.com/bgf-dev/bgf-agents.git
cd bgf-agents

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Lint and format
ruff check .
ruff format .

# Type checking
mypy .
```

## License

MIT License - see [LICENSE](LICENSE) for details.
