Metadata-Version: 2.4
Name: ghostfolio-ai-agent
Version: 0.2.0
Summary: AI-powered conversational portfolio assistant for Ghostfolio with tool calling, verification, and evaluation framework
Author: Leszek Bartkowski
License-Expression: MIT
Project-URL: Homepage, https://github.com/leszekbar/ghostfolio-agent
Project-URL: Repository, https://github.com/leszekbar/ghostfolio-agent
Project-URL: Issues, https://github.com/leszekbar/ghostfolio-agent/issues
Keywords: ghostfolio,ai-agent,portfolio,finance,langchain,langgraph
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Office/Business :: Financial
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastapi>=0.116.0
Requires-Dist: httpx>=0.28.0
Requires-Dist: langchain-openai>=0.3.0
Requires-Dist: langchain-anthropic>=0.3.0
Requires-Dist: langfuse>=2.0.0
Requires-Dist: langgraph>=0.6.0
Requires-Dist: pydantic>=2.11.0
Requires-Dist: pydantic-settings>=2.10.0
Requires-Dist: streamlit>=1.49.0
Requires-Dist: uvicorn>=0.35.0
Provides-Extra: dev
Requires-Dist: pytest>=8.4.0; extra == "dev"
Requires-Dist: pytest-asyncio>=1.1.0; extra == "dev"
Requires-Dist: ruff>=0.9.0; extra == "dev"
Dynamic: license-file

# Ghostfolio AI Agent

AI-powered conversational portfolio assistant for [Ghostfolio](https://ghostfol.io). Ask natural-language questions about your portfolio and get verified, fact-grounded responses.

## Architecture

```mermaid
graph TD
    A[Streamlit Chat UI] -->|HTTP| B[FastAPI Server]
    B --> C{Agent}
    C -->|Primary| D[LLM Mode<br/>OpenRouter / OpenAI / Anthropic]
    C -->|Fallback| E[Rule-Based Mode]
    D --> F[Tool Layer<br/>7 Tools]
    E --> F
    F --> G{Data Source}
    G -->|Testing| H[Mock Provider]
    G -->|Production| I[Ghostfolio API]
    F --> J[Verification Layer]
    J -->|Traces| K[Langfuse]
```

**Dual-mode agent**: LLM-powered tool calling with automatic rule-based fallback. Every response passes through fact grounding, disclaimer enforcement, and confidence scoring.
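
The fallback path can be pictured roughly as follows. This is a minimal sketch, assuming the agent drops to rule-based mode whenever the LLM call raises an exception; `answer_with_fallback`, `broken_llm`, and `rules` are hypothetical names for illustration, not the project's actual API.

```python
from typing import Callable, Tuple

Answerer = Callable[[str], str]


def answer_with_fallback(
    question: str, llm_answer: Answerer, rule_based_answer: Answerer
) -> Tuple[str, str]:
    """Try the LLM path first; fall back to the rule-based path if it fails.

    Returns (mode, answer) so callers can see which path produced the reply.
    """
    try:
        return "llm", llm_answer(question)
    except Exception:  # assumption: any LLM failure triggers the fallback
        return "rule_based", rule_based_answer(question)


def broken_llm(question: str) -> str:
    # Stand-in for a provider outage or a missing API key.
    raise RuntimeError("LLM unavailable")


def rules(question: str) -> str:
    # Stand-in for the deterministic rule-based responder.
    return f"(rule-based) received: {question}"
```

Calling `answer_with_fallback("hi", broken_llm, rules)` takes the rule-based branch, while a working `llm_answer` callable is used directly.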

## Features

- **7 portfolio tools**: Summary, performance, transactions, accounts, market data, allocation analysis, risk rules
- **LLM integration**: Configurable model via OpenRouter (GPT, Claude) with direct OpenAI/Anthropic fallback
- **Verification layer**: Fact grounding, financial disclaimer, trade advice refusal, prompt injection defense
- **Observability**: Langfuse tracing for tool calls, LLM invocations, and verification
- **50+ eval test cases**: Deterministic checks + LLM-as-judge scoring
- **Production-ready**: FastAPI + Streamlit, Railway deployment, CI/CD with linting and evals

## Quick Start

```bash
# 1. Setup
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# 2. Configure (optional — works with mock data by default)
cp .env.example .env  # Add API keys if desired

# 3. Run API
uvicorn app.main:app --reload

# 4. Run Chat UI (new terminal)
streamlit run ui/streamlit_app.py

# 5. Test
pytest -v

# 6. Run evals
python evals/run_evals.py
```

## Configuration

Environment variables (prefix: `GHOSTFOLIO_`):

### Core

| Variable | Default | Description |
|----------|---------|-------------|
| `GHOSTFOLIO_DEFAULT_DATA_SOURCE` | `mock` | `mock` or `ghostfolio_api` |
| `GHOSTFOLIO_BASE_URL` | `https://ghostfol.io` | Ghostfolio instance URL |
| `GHOSTFOLIO_REQUEST_TIMEOUT_SECONDS` | `10` | HTTP timeout |
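
To illustrate how the `GHOSTFOLIO_` prefix maps onto settings, the sketch below reads the three core variables with their documented defaults using only the standard library. `CoreSettings` and `load_core_settings` are hypothetical names; the project depends on `pydantic-settings`, so the real `app/config.py` may be structured differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping

PREFIX = "GHOSTFOLIO_"


@dataclass(frozen=True)
class CoreSettings:  # hypothetical container, not the project's class
    default_data_source: str
    base_url: str
    request_timeout_seconds: float


def load_core_settings(env: Mapping[str, str] = os.environ) -> CoreSettings:
    """Read the core variables, falling back to the documented defaults."""
    source = env.get(PREFIX + "DEFAULT_DATA_SOURCE", "mock")
    if source not in ("mock", "ghostfolio_api"):
        raise ValueError(f"unsupported data source: {source!r}")
    return CoreSettings(
        default_data_source=source,
        base_url=env.get(PREFIX + "BASE_URL", "https://ghostfol.io"),
        request_timeout_seconds=float(
            env.get(PREFIX + "REQUEST_TIMEOUT_SECONDS", "10")
        ),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.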

### LLM (OpenRouter — recommended)

The easiest way to configure the agent's LLM is via [OpenRouter](https://openrouter.ai), which provides a unified API for multiple providers. Set two env vars:

| Variable | Default | Description |
|----------|---------|-------------|
| `GHOSTFOLIO_OPENROUTER_API_KEY` | — | OpenRouter API key |
| `GHOSTFOLIO_AGENT_MODEL` | — | Model to use (see table below) |

Available models:

| `AGENT_MODEL` value | Routed to |
|---------------------|-----------|
| `gpt-o` | `openai/gpt-5.2` |
| `gpt-mini` | `openai/gpt-5.1-chat` |
| `claude-sonnet` | `anthropic/claude-sonnet-4.6` |
| `claude-opus` | `anthropic/claude-opus-4.6` |

You can also pass a raw OpenRouter model ID (e.g. `anthropic/claude-haiku-4-5-20251001`) for any model available on OpenRouter.
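
The alias lookup described above could be sketched like this. The mapping is copied from the table; the `resolve_model` function and the rule that any unknown value is passed through unchanged as a raw OpenRouter ID are assumptions for illustration.

```python
# Alias -> OpenRouter model ID, as listed in the table above.
MODEL_ALIASES = {
    "gpt-o": "openai/gpt-5.2",
    "gpt-mini": "openai/gpt-5.1-chat",
    "claude-sonnet": "anthropic/claude-sonnet-4.6",
    "claude-opus": "anthropic/claude-opus-4.6",
}


def resolve_model(agent_model: str) -> str:
    """Map a GHOSTFOLIO_AGENT_MODEL value to an OpenRouter model ID.

    Known aliases are translated; anything else is assumed to already be
    a raw OpenRouter model ID and is returned as-is.
    """
    name = agent_model.strip()
    if not name:
        raise ValueError("GHOSTFOLIO_AGENT_MODEL is empty")
    return MODEL_ALIASES.get(name, name)
```

For example, `resolve_model("claude-sonnet")` yields `anthropic/claude-sonnet-4.6`, while a raw ID such as `anthropic/claude-haiku-4-5-20251001` comes back unchanged.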

### LLM (direct API keys — fallback)

If OpenRouter is not configured, the agent falls back to direct provider keys:

| Variable | Default | Description |
|----------|---------|-------------|
| `GHOSTFOLIO_OPENAI_API_KEY` | — | OpenAI API key |
| `GHOSTFOLIO_OPENAI_MODEL` | `gpt-4.1` | OpenAI model |
| `GHOSTFOLIO_ANTHROPIC_API_KEY` | — | Anthropic API key |
| `GHOSTFOLIO_ANTHROPIC_MODEL` | `claude-sonnet-4-20250514` | Anthropic model |
| `GHOSTFOLIO_LLM_ENABLED` | `true` | Enable/disable LLM mode |

**Priority**: OpenRouter > direct OpenAI > direct Anthropic > rule-based fallback.
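
The priority order can be expressed as a short selection function. This is a sketch under stated assumptions: `select_backend` is a hypothetical name, only the presence of each API key is checked, and `GHOSTFOLIO_LLM_ENABLED=false` is assumed to force rule-based mode regardless of which keys are set.

```python
from typing import Mapping

TRUTHY = ("true", "1", "yes")  # assumption: accepted spellings of "enabled"


def select_backend(env: Mapping[str, str]) -> str:
    """Return the backend the documented priority order would pick."""
    enabled = env.get("GHOSTFOLIO_LLM_ENABLED", "true").strip().lower()
    if enabled not in TRUTHY:
        return "rule_based"
    if env.get("GHOSTFOLIO_OPENROUTER_API_KEY"):
        return "openrouter"
    if env.get("GHOSTFOLIO_OPENAI_API_KEY"):
        return "openai"
    if env.get("GHOSTFOLIO_ANTHROPIC_API_KEY"):
        return "anthropic"
    return "rule_based"  # no keys configured
```

With both an OpenRouter key and an OpenAI key present, the function returns `"openrouter"`; with no keys at all it returns `"rule_based"`.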

### Observability

| Variable | Default | Description |
|----------|---------|-------------|
| `GHOSTFOLIO_LANGFUSE_PUBLIC_KEY` | — | Langfuse public key |
| `GHOSTFOLIO_LANGFUSE_SECRET_KEY` | — | Langfuse secret key |
| `GHOSTFOLIO_LANGFUSE_HOST` | `https://cloud.langfuse.com` | Langfuse host |

### Logging

| Variable | Default | Description |
|----------|---------|-------------|
| `GHOSTFOLIO_LOG_LEVEL` | `INFO` | `DEBUG\|INFO\|WARNING\|ERROR` |
| `GHOSTFOLIO_LOG_FORMAT` | `json` | `json` or `text` |

## Tools

| Tool | Description |
|------|-------------|
| `get_portfolio_summary` | Portfolio value, holdings, allocations |
| `get_performance` | Returns for time ranges (1d, ytd, 1y, 5y, max) |
| `get_transactions` | Buy/sell activity history |
| `get_account_details` | Linked brokerage accounts and balances |
| `get_market_data` | Current prices for stock/ETF symbols |
| `analyze_allocation` | Sector, region, asset class breakdown + risk flags |
| `check_risk_rules` | Concentration, diversification, asset class risk checks |

## Verification

Every response is verified before delivery:
- **Fact grounding**: Numerical claims traced to tool output
- **Disclaimer**: Financial disclaimer on every response
- **Trade advice refusal**: Buy/sell recommendations politely refused
- **Prompt injection defense**: Override attempts detected and blocked
- **Data freshness**: Stale data warnings (>6h old)
- **Confidence scoring**: 0.4 (low) to 0.95 (high)
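
As one concrete example of these checks, the data freshness rule might look like the sketch below. Only the 6-hour threshold comes from the list above; the `staleness_warning` function, its signature, and the warning text are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

STALE_AFTER = timedelta(hours=6)  # threshold from the list above


def staleness_warning(
    fetched_at: datetime, now: Optional[datetime] = None
) -> Optional[str]:
    """Return a warning string if the data is older than 6 hours, else None."""
    now = now or datetime.now(timezone.utc)
    age = now - fetched_at
    if age > STALE_AFTER:
        hours = age.total_seconds() / 3600
        return f"Data is {hours:.1f}h old and may be stale."
    return None
```

Both timestamps are expected to be timezone-aware; mixing naive and aware `datetime` values raises a `TypeError` on subtraction.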

## Evaluation

```bash
# Deterministic evals (50+ test cases, >80% gate)
python evals/run_evals.py

# LLM-as-judge (requires OpenAI key, advisory)
python evals/llm_judge.py
```

Categories: happy path (21), edge cases (10), adversarial (12), multi-step (10)
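
The gate arithmetic can be checked with a few lines. This sketch assumes the gate applies to the overall pass rate and is a strict "greater than 80%" comparison; `passes_gate` is a hypothetical helper, and the `adversarial` and `multi_step` key spellings are guesses modeled on the `happy_path` and `edge_cases` names used by the comparison script.

```python
# Case counts per category, as listed above.
CATEGORY_COUNTS = {
    "happy_path": 21,
    "edge_cases": 10,
    "adversarial": 12,  # assumed key spelling
    "multi_step": 10,   # assumed key spelling
}
GATE = 0.80


def passes_gate(passed: int, total: int, gate: float = GATE) -> bool:
    """True when the pass rate is strictly above the gate."""
    if total <= 0:
        raise ValueError("total must be positive")
    return passed / total > gate


TOTAL_CASES = sum(CATEGORY_COUNTS.values())
```

The counts sum to 53 cases, so 43 passes (about 81.1%) clear the gate while 42 (about 79.2%) do not.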

### Multi-Model Comparison

Compare agent configurations side-by-side across the full eval dataset. Runs each model against all 53 cases using `MockFileDataProvider` (48 holdings, 577 transactions, 5 accounts), scores with deterministic checks + LLM-as-judge, measures response time, and outputs a ranked comparison table.

Available models: `rule-based`, `gpt-o`, `gpt-mini`, `claude-haiku`, `claude-sonnet`, `claude-opus`.

```bash
# Quick smoke test (no API keys needed)
python evals/compare_models.py --models rule-based --no-judge

# Compare two LLM models with verbose per-case output
python evals/compare_models.py --models gpt-mini claude-sonnet -v

# Full run (all models except opus, with LLM judge)
python evals/compare_models.py

# Include opus (expensive)
python evals/compare_models.py --include-expensive

# Filter by eval category
python evals/compare_models.py --models gpt-o claude-sonnet --categories happy_path edge_cases
```

Output includes a ranked summary table, per-category breakdown, and a detailed JSON results file. Ranking composite score: `0.4 × det_pass_rate + 0.4 × (judge/5) + 0.2 × (1 − error_rate)`.
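
The composite formula translates directly into code. In this sketch the `composite_score` function name is hypothetical, and the judge score is assumed to be on a scale with a maximum of 5, as the `judge/5` term suggests.

```python
def composite_score(
    det_pass_rate: float, judge_score: float, error_rate: float
) -> float:
    """Ranking score: 0.4*det_pass_rate + 0.4*(judge/5) + 0.2*(1 - error_rate).

    det_pass_rate and error_rate are fractions in [0, 1];
    judge_score is assumed to have a maximum of 5.
    """
    return 0.4 * det_pass_rate + 0.4 * (judge_score / 5) + 0.2 * (1 - error_rate)
```

A model with a 90% deterministic pass rate, an average judge score of 4.0, and no errors scores approximately 0.88 (0.36 + 0.32 + 0.20).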

## Deployment (Railway)

1. Create a Railway project from this repo
2. Set environment variables:
   - `GHOSTFOLIO_DEFAULT_DATA_SOURCE=mock` (or `ghostfolio_api`)
   - `GHOSTFOLIO_OPENROUTER_API_KEY=sk-or-...` and `GHOSTFOLIO_AGENT_MODEL=claude-sonnet` (recommended)
   - Or `GHOSTFOLIO_OPENAI_API_KEY=sk-...` (direct OpenAI, alternative)
   - `GHOSTFOLIO_LANGFUSE_PUBLIC_KEY` / `GHOSTFOLIO_LANGFUSE_SECRET_KEY` (optional)
3. Deploy — Railway uses `Procfile`: `web: bash scripts/start.sh`
4. The Streamlit UI is the public entrypoint on `$PORT`

## Project Structure

```
ghostfolio-agent/
├── app/
│   ├── agent.py           # Dual-mode LLM + rule-based agent
│   ├── config.py          # Environment-based settings
│   ├── ghostfolio_client.py # HTTP client with retry
│   ├── llm.py             # LLM factory (OpenRouter/OpenAI/Anthropic)
│   ├── main.py            # FastAPI server
│   ├── observability.py   # Langfuse tracing
│   ├── schemas.py         # Pydantic models
│   ├── telemetry.py       # Structured logging
│   ├── tool_defs.py       # Tool schemas for LLM
│   ├── tools.py           # 7 tool implementations
│   └── data_sources/
│       ├── base.py        # Provider protocol
│       ├── mock_provider.py
│       ├── mock_file_provider.py  # Large dataset provider
│       └── ghostfolio_api_provider.py
├── evals/
│   ├── eval_dataset.json  # 50+ test cases
│   ├── run_evals.py       # Deterministic eval runner
│   ├── llm_judge.py       # LLM-as-judge scorer
│   └── compare_models.py  # Multi-model comparison
├── tests/                 # pytest test suite
├── ui/
│   └── streamlit_app.py   # Chat interface
├── docs/
│   ├── architecture.md    # Architecture documentation
│   └── cost_analysis.md   # Cost projections
└── scripts/
    └── start.sh           # Railway startup script
```

## Development

```bash
# Run tests
pytest -v

# Lint
ruff check app/ tests/ evals/

# Format
ruff format app/ tests/ evals/
```

## License

See [LICENSE](LICENSE).
