Metadata-Version: 2.4
Name: blackreach
Version: 5.0.0b1
Summary: Autonomous browser agent that accomplishes goals through web browsing
Author: Phnix
License: MIT
Project-URL: Homepage, https://github.com/Null-Phnix/Blackreach
Project-URL: Documentation, https://github.com/Null-Phnix/Blackreach#readme
Project-URL: Repository, https://github.com/Null-Phnix/Blackreach
Project-URL: Issues, https://github.com/Null-Phnix/Blackreach/issues
Keywords: browser,automation,agent,ai,llm,web,scraping,autonomous
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP :: Browsers
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: playwright>=1.40.0
Requires-Dist: playwright-stealth>=1.0.0
Requires-Dist: undetected-playwright>=0.3.0
Requires-Dist: beautifulsoup4>=4.12.0
Requires-Dist: ollama>=0.1.0
Requires-Dist: click>=8.0.0
Requires-Dist: rich>=13.0.0
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: prompt-toolkit>=3.0.0
Provides-Extra: openai
Requires-Dist: openai>=1.0.0; extra == "openai"
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.18.0; extra == "anthropic"
Provides-Extra: google
Requires-Dist: google-genai>=0.1.0; extra == "google"
Provides-Extra: all
Requires-Dist: openai>=1.0.0; extra == "all"
Requires-Dist: anthropic>=0.18.0; extra == "all"
Requires-Dist: google-genai>=0.1.0; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Dynamic: license-file

# Blackreach

[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![Version](https://img.shields.io/badge/version-5.0.0--beta.1-orange.svg)](https://github.com/Null-Phnix/Blackreach)

**Autonomous Browser Agent** - Give it a goal, watch it browse.

![Blackreach Demo](assets/demo.gif)

Blackreach is a CLI tool that uses an LLM to browse the web autonomously and accomplish tasks. It navigates websites, searches for content, and downloads files such as PDFs, images, and datasets.

```bash
blackreach run "find and download papers about machine learning from arxiv"
```

## Features

- **General-Purpose**: Download any content type - papers, images, datasets, ebooks, etc.
- **ReAct Pattern**: Observe -> Think -> Act loop for intelligent browsing
- **DOM Walker**: Live browser DOM extraction gives the LLM numbered interactive elements
- **Session Resume**: Pause and resume interrupted sessions
- **Smart Deduplication**: Never download the same file twice (URL + hash checking)
- **Memory System**: Remembers successful patterns across sessions
- **Multi-Provider**: Ollama, OpenAI, Anthropic, Google, xAI
- **Stealth Mode**: Evades basic bot detection
- **Stuck Detection**: Automatically detects loops and recovers with alternate strategies
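
As an illustration of the stuck detection idea, the sketch below flags a loop when the same page and action recur several times within a short window. The `LoopDetector` class, its parameters, and the window-based heuristic are assumptions made for this example, not the logic in `stuck_detector.py`.

```python
from collections import deque


class LoopDetector:
    """Hypothetical helper: flags a loop when one (url, action) pair
    repeats too often within the most recent steps."""

    def __init__(self, window: int = 6, max_repeats: int = 3):
        # Only the last `window` steps are kept; older ones fall off.
        self.recent = deque(maxlen=window)
        self.max_repeats = max_repeats

    def record(self, url: str, action: str) -> bool:
        """Record a step and return True if the agent looks stuck."""
        step = (url, action)
        self.recent.append(step)
        return self.recent.count(step) >= self.max_repeats
```

A caller would check the return value after each step and switch strategy (for example, trying a different source) once it becomes `True`.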

## How It Works

Blackreach uses a **DOM walker** approach to let the LLM interact with web pages:

1. **Observe**: The DOM walker (`dom_walker.py`) runs JavaScript in the live browser to find all interactive elements (links, buttons, inputs, etc.) and assigns each a numeric `[N]` ID.
2. **Think**: The LLM receives the page text and the numbered element list, then reasons about which action moves the agent closest to the goal.
3. **Act**: The LLM outputs a JSON action referencing a specific element ID (e.g., `{"action":"click","element":15}`), and the agent executes it in the browser via Playwright.

This cycle repeats until the goal is accomplished or the step limit is reached. The agent includes stuck detection, automatic source failover, and error recovery to handle real-world browsing challenges.
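
A minimal sketch of the cycle described above, assuming the LLM replies with a JSON object like the one in step 3. The helper names (`format_elements`, `parse_action`, `run_loop`) and the `ELEMENTS` sample are hypothetical and do not reflect the internals of `agent.py` or `dom_walker.py`. The `think` and `act` callables stand in for the LLM and the Playwright-driven browser.

```python
import json
from typing import Callable, Dict

# Hypothetical sample of numbered interactive elements.
ELEMENTS: Dict[int, str] = {12: "link: Home", 15: "button: Search"}


def format_elements(elements: Dict[int, str]) -> str:
    """Render elements as '[N] description' lines for the prompt."""
    return "\n".join(f"[{n}] {desc}" for n, desc in sorted(elements.items()))


def parse_action(raw: str, elements: Dict[int, str]) -> dict:
    """Parse the model's JSON reply and check that the element ID exists."""
    action = json.loads(raw)
    if "action" not in action:
        raise ValueError("reply has no 'action' field")
    if "element" in action and action["element"] not in elements:
        raise ValueError(f"unknown element ID: {action['element']}")
    return action


def run_loop(
    think: Callable[[str], str],
    act: Callable[[dict], bool],
    max_steps: int = 5,
) -> int:
    """Repeat Observe -> Think -> Act until act() reports success
    or the step limit is reached. Returns the number of steps used."""
    for step in range(1, max_steps + 1):
        observation = format_elements(ELEMENTS)               # Observe
        action = parse_action(think(observation), ELEMENTS)   # Think
        if act(action):                                       # Act
            return step
    return max_steps
```

Validating the element ID before acting means a hallucinated `[N]` fails early with a clear error instead of reaching the browser.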

## Installation

### Quick Install (pip)

```bash
pip install blackreach
```

### Install with Cloud Providers

```bash
# With OpenAI support
pip install "blackreach[openai]"

# With Anthropic support
pip install "blackreach[anthropic]"

# With Google support
pip install "blackreach[google]"

# With all providers
pip install "blackreach[all]"
```

### From Source

```bash
git clone https://github.com/Null-Phnix/Blackreach
cd Blackreach
pip install -e .
```

### Post-Install: Browser Setup

Blackreach uses Playwright for browser automation. Install the browser:

```bash
playwright install chromium
```

## Quick Start

### First Run

```bash
blackreach
```

On first run, Blackreach will walk you through setup:
1. Install browser (if needed)
2. Choose AI provider (Ollama, OpenAI, etc.)
3. Configure API key (for cloud providers)

### Basic Usage

```bash
# Interactive mode
blackreach

# Run with a specific goal
blackreach run "search wikipedia for artificial intelligence"

# Run headless (no browser window)
blackreach run --headless "download papers about transformers from arxiv"

# Use a specific provider/model
blackreach run -p openai -m gpt-4o "find papers about attention mechanisms"

# Resume an interrupted session
blackreach run --resume 42
```

## Commands

| Command | Description |
|---------|-------------|
| `blackreach` | Interactive mode |
| `blackreach run "goal"` | Run agent with a goal |
| `blackreach run --resume ID` | Resume a paused session |
| `blackreach sessions` | List resumable sessions |
| `blackreach config` | Configure settings and API keys |
| `blackreach models` | List available models |
| `blackreach status` | Show current configuration |
| `blackreach stats` | Show performance metrics |
| `blackreach setup` | Run setup wizard |
| `blackreach doctor` | Check system requirements |
| `blackreach health` | Check content source availability |
| `blackreach downloads` | Show download history |

### Interactive Commands

In interactive mode, use these slash commands:

| Command | Short | Description |
|---------|-------|-------------|
| `/help` | `/h` | Show help |
| `/model` | `/m` | Switch model |
| `/provider` | `/p` | Switch provider |
| `/status` | `/s` | Show status |
| `/plan "goal"` | | Preview a plan without executing |
| `/sessions` | | List resumable sessions |
| `/resume ID` | | Resume a session |
| `/logs` | `/l` | View recent logs |
| `/clear` | `/cls` | Clear screen |
| `/quit` | `/q` | Exit |

## Supported AI Providers

| Provider | Type | Models |
|----------|------|--------|
| **Ollama** | Local | qwen2.5:7b, llama3.2:3b, mistral:7b |
| **xAI** | Cloud | grok-2, grok-2-mini |
| **OpenAI** | Cloud | gpt-4o, gpt-4o-mini |
| **Anthropic** | Cloud | claude-sonnet-4-6, claude-haiku-4-5 |
| **Google** | Cloud | gemini-2.5-pro, gemini-2.5-flash |

### Using Ollama (Local, Free, Private)

1. Install Ollama: https://ollama.ai
2. Pull a model: `ollama pull qwen2.5:7b`
3. Start Ollama: `ollama serve`
4. Use Blackreach: `blackreach`

### Using Cloud Providers

1. Get an API key from your provider
2. Configure: `blackreach config` -> Set API key
3. Switch provider: `blackreach config` -> Set default provider

## Configuration

Config file: `~/.blackreach/config.yaml`

### Environment Variables

```bash
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="..."
export XAI_API_KEY="xai-..."
```
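
To show how these variables might be consumed, here is a hedged sketch of a provider-to-key lookup. The variable names come from the list above. The `resolve_api_key` function and its fallback behaviour are assumptions for illustration, not the code in `config.py`.

```python
import os
from typing import Mapping, Optional

# Variable names as listed above; the mapping itself is illustrative.
ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "xai": "XAI_API_KEY",
}


def resolve_api_key(
    provider: str, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the API key for a cloud provider, or None if it is unset.

    Providers without an entry (such as a local Ollama install)
    also yield None, since they need no key.
    """
    env = os.environ if env is None else env
    name = ENV_VARS.get(provider.lower())
    if name is None:
        return None
    return env.get(name) or None
```

Passing `env` explicitly keeps the function easy to test without touching the real environment.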

## Examples

### Research Papers

```bash
blackreach run "go to arxiv.org, search for 'attention mechanism', download 3 papers"
```

### Images and Media

```bash
blackreach run "find and download landscape wallpapers from unsplash"
```

### Datasets

```bash
blackreach run "download CSV files about climate data from kaggle"
```

### Documentation

```bash
blackreach run "go to github.com/pytorch/pytorch and download the README"
```

### Ebooks

```bash
blackreach run "find and download 'pride and prejudice' from project gutenberg"
```

## Session Resume

Sessions are automatically saved when interrupted (Ctrl+C):

```bash
# Start a task
blackreach run "download 10 papers from arxiv"
# Press Ctrl+C to pause

# Later, resume where you left off
blackreach sessions  # See available sessions
blackreach run --resume 42  # Resume session #42
```
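
The pause-and-resume behaviour can be pictured with the sketch below, which catches `KeyboardInterrupt`, records how many steps finished, and skips them on the next run. It is an assumption-laden illustration: the `run_with_checkpoint` function and the JSON state file are hypothetical, and the way Blackreach actually stores sessions may differ.

```python
import json
from pathlib import Path
from typing import Callable, Sequence


def run_with_checkpoint(steps: Sequence[Callable[[], None]], state_path: Path) -> bool:
    """Run steps in order. On Ctrl+C, save progress and return False.

    A later call with the same state_path resumes after the last
    completed step. Returns True once every step has finished.
    """
    done = 0
    if state_path.exists():
        done = json.loads(state_path.read_text())["completed"]
    try:
        for step in steps[done:]:
            step()
            done += 1
    except KeyboardInterrupt:
        # Persist only the count of fully completed steps.
        state_path.write_text(json.dumps({"completed": done}))
        return False
    state_path.unlink(missing_ok=True)  # finished: discard the checkpoint
    return True
```

The interrupted step is re-run on resume, so each step should be safe to repeat.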

## Architecture

```
blackreach/
├── agent.py          # ReAct loop coordinator
├── browser.py        # Playwright browser control (stealth, downloads)
├── dom_walker.py     # Live DOM extraction - assigns [N] IDs to interactive elements
├── observer.py       # Legacy HTML parsing (fallback utilities)
├── llm.py            # Multi-provider LLM integration
├── memory.py         # Session memory + SQLite persistence
├── detection.py      # CAPTCHA, login, paywall detection
├── knowledge.py      # Content source knowledge base
├── resilience.py     # Retry logic, circuit breaker
├── stuck_detector.py # Loop detection and recovery strategies
├── error_recovery.py # Error categorization and recovery
├── exceptions.py     # Error hierarchy
├── config.py         # Configuration management
├── logging.py        # Structured session logging
├── ui.py             # Rich terminal UI components
└── cli.py            # Command-line interface
```
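
For readers unfamiliar with the circuit breaker pattern named next to `resilience.py`, here is a simplified sketch. The `CircuitBreaker` class, its thresholds, and the injectable clock are assumptions for this example and are not taken from the module itself.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Hypothetical breaker: stops calls to a failing source for a
    cooldown period after too many consecutive failures."""

    def __init__(
        self,
        max_failures: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True if a call may proceed."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.reset_after:
            # Cooldown elapsed: close the breaker and start counting afresh.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record(self, success: bool) -> None:
        """Update the failure count after a call."""
        if success:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()
```

This version closes fully after the cooldown rather than using a separate half-open state, which keeps the example short.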

## Troubleshooting

### Check System Status

```bash
blackreach doctor
```

### Common Issues

**Browser not found:**
```bash
playwright install chromium
```

**Ollama not running:**
```bash
ollama serve
```

**No API key configured:**
```bash
blackreach config          # Interactive setup
# or set environment variable:
export OPENAI_API_KEY="sk-..."
```

**Bot detection (418/403 errors):**
- Some sites block headless browsers
- Try running without `--headless`
- Try a different browser: `blackreach run -b firefox "your goal"`
- Try a different search engine or source (Google and Wikipedia tend to work better than DuckDuckGo)

**Session resume fails:**
```bash
blackreach sessions  # Check if session exists
```

## Memory and Learning

Blackreach maintains two types of memory:

1. **Session Memory** (RAM): Current session state
2. **Persistent Memory** (SQLite): Cross-session learning

The persistent memory tracks:
- All downloads (prevents re-downloading)
- Site patterns that worked
- Action success rates per domain
- Common failures to avoid
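
One way to picture the download tracking is a small SQLite table keyed by URL with a content hash alongside it. The sketch below assumes SHA-256 for the hash. The table layout and the `open_store`, `is_duplicate`, and `record_download` helpers are hypothetical, not the schema used by `memory.py`.

```python
import hashlib
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a download log. Defaults to an in-memory database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS downloads ("
        "url TEXT PRIMARY KEY, sha256 TEXT NOT NULL)"
    )
    return conn


def is_duplicate(conn: sqlite3.Connection, url: str, content: bytes) -> bool:
    """True if this URL or identical content has been recorded before."""
    digest = hashlib.sha256(content).hexdigest()
    row = conn.execute(
        "SELECT 1 FROM downloads WHERE url = ? OR sha256 = ?", (url, digest)
    ).fetchone()
    return row is not None


def record_download(conn: sqlite3.Connection, url: str, content: bytes) -> None:
    """Store the URL and content hash of a finished download."""
    digest = hashlib.sha256(content).hexdigest()
    conn.execute(
        "INSERT OR IGNORE INTO downloads (url, sha256) VALUES (?, ?)",
        (url, digest),
    )
    conn.commit()
```

Checking both columns catches the same file served from two different URLs as well as a repeat visit to the same URL.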

View stats:
```bash
blackreach stats
```

## License

MIT

## Contributing

Contributions welcome! Please open an issue or PR on GitHub.
