Metadata-Version: 2.4
Name: superinfo-tool
Version: 1.0.0
Summary: AI-powered research, OSINT, verification and report synthesis CLI
License: MIT
Keywords: research,osint,rag,ai,cli
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: fastapi>=0.111.0
Requires-Dist: uvicorn[standard]>=0.29.0
Requires-Dist: pydantic>=2.7.1
Requires-Dist: pydantic-settings>=2.2.1
Requires-Dist: typer[all]>=0.12.3
Requires-Dist: rich>=13.7.1
Requires-Dist: httpx>=0.27.0
Requires-Dist: beautifulsoup4>=4.12.3
Requires-Dist: readability-lxml>=0.8.1
Requires-Dist: lxml>=5.2.1
Requires-Dist: python-dotenv>=1.0.1
Requires-Dist: loguru>=0.7.2
Requires-Dist: faiss-cpu>=1.13.2
Requires-Dist: sentence-transformers>=2.7.0
Requires-Dist: numpy>=1.26.4
Requires-Dist: openai>=1.30.1
Requires-Dist: arxiv>=2.1.0
Requires-Dist: tenacity>=8.3.0
Requires-Dist: tiktoken>=0.7.0
Requires-Dist: aiofiles>=23.2.1
Requires-Dist: redis[asyncio]>=5.0.4
Requires-Dist: sqlalchemy[asyncio]>=2.0.30
Requires-Dist: asyncpg>=0.29.0
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-asyncio; extra == "dev"

# 🔍 SuperInfo - AI-Powered Research Tool

A **production-grade, modular** research platform combining:
- 🔬 **Research Agent** — web search + RAG + evidence-grounded answers
- 🕵️ **OSINT Agent** — public digital footprint analysis
- ✅ **Verification Agent** — claim fact-checking against authoritative sources
- 📄 **Report Agent** — structured report synthesis

Available as both a **CLI tool** and a **FastAPI backend**.

---

## Architecture

```
User Input
    │
    ▼
Intent / CLI / API
    │
    ▼
Agent Router
    │
    ├─► Search (Brave/Bing/SerpAPI)
    │       │
    │       ▼
    │   Fetch & Clean HTML
    │       │
    │       ▼
    │   Chunk + Embed (SentenceTransformers / Grok)
    │       │
    │       ▼
    │   FAISS Vector Store ◄──── Persistent
    │       │
    │       ▼
    │   Top-K Retrieval
    │       │
    │       ▼
    ├─► Grok LLM (reasoning + synthesis)
    │       │
    │       ▼
    └─► Structured Output
            │
            ├─► PostgreSQL (history + reports)
            ├─► Redis (cache + dedup)
            └─► JSON / Markdown report
```
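To make the "Top-K Retrieval" step concrete, here is a minimal, dependency-free sketch of exhaustive nearest-neighbor search by squared L2 distance, which is the kind of search a flat FAISS index performs. The names `squared_l2` and `top_k` are hypothetical and not taken from the project code, and the toy vectors stand in for real embeddings.

```python
from math import fsum


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return fsum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query: list[float], vectors: list[list[float]], k: int = 5) -> list[tuple[int, float]]:
    """Return (index, squared distance) for the k stored vectors closest to the query."""
    scored = [(i, squared_l2(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy 2-dimensional "embeddings"; real ones would be much higher-dimensional.
store = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.1], [5.0, 5.0]]
hits = top_k([1.0, 1.0], store, k=2)
print(hits)
```

The indices returned here would map back to stored chunks. A smaller distance means a closer match, so results are sorted in ascending order.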

---

## Quick Start

### 1. Install

```bash
git clone <repo>
cd superinfo
pip install -r requirements.txt

# Or install the package (and its CLI) in editable mode
pip install -e .
```

### 2. Configure

```bash
cp .env.example .env
# Edit .env with your API keys:
# - GROK_API_KEY
# - SEARCH_API_KEY
# - DATABASE_URL (optional, for persistence)
# - REDIS_URL (optional, for caching)
```

### 3. Run CLI

```bash
# Research a topic
python superinfo.py research "What is retrieval-augmented generation?"

# Research with arXiv papers
python superinfo.py research "transformer architecture" --arxiv

# OSINT - username
python superinfo.py osint self --username johndoe

# OSINT - organization/domain
python superinfo.py osint org --domain openai.com

# Verify a claim
python superinfo.py verify "The Great Wall of China is visible from space"

# Generate report from saved outputs
python superinfo.py report --input research_output.json --title "My Report"

# Start API server
python superinfo.py serve --port 8000
```

### 4. Deploy with Docker

```bash
docker-compose up -d
# API available at http://localhost:8000
# Docs at http://localhost:8000/docs
```

---

## API Reference

### POST /research
```json
{
  "question": "What is quantum computing?",
  "include_arxiv": false,
  "top_k": 5
}
```

### POST /osint
```json
{"username": "johndoe"}
```

### POST /osint/domain
```json
{"domain": "example.com"}
```

### POST /verify
```json
{"claim": "Coffee is the most consumed beverage in the world."}
```

### POST /report
```json
{
  "agent_outputs": [{...research output...}, {...verify output...}],
  "title": "Quarterly Research Report"
}
```
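As a rough illustration of calling these endpoints from Python, the sketch below builds a JSON POST request with the standard library. It assumes the server is reachable at `http://localhost:8000` (the address given in the Docker section). The helpers `build_request` and `post_json` are hypothetical, and the response shape is not documented here, so it is returned as a plain dictionary.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"  # assumption: default address from the Docker section


def build_request(path: str, payload: dict) -> request.Request:
    """Build a JSON POST request for one of the endpoints listed above."""
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with request.urlopen(build_request(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request does not touch the network.
req = build_request(
    "/research",
    {"question": "What is quantum computing?", "include_arxiv": False, "top_k": 5},
)
print(req.get_method(), req.full_url)
```

With the server running, `post_json("/verify", {"claim": "..."})` would send the request and return the decoded body.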

---

## Tech Stack

| Component | Technology |
|-----------|-----------|
| API Framework | FastAPI + Uvicorn |
| CLI | Typer + Rich |
| LLM | Grok API (OpenAI-compatible) |
| Embeddings | SentenceTransformers / Grok |
| Vector Store | FAISS (IndexFlatL2, persistent) |
| Search | Brave / Bing / SerpAPI |
| Academic | arXiv API |
| Database | PostgreSQL (async SQLAlchemy) |
| Cache | Redis |
| HTTP | httpx (async) |
| HTML Parsing | BeautifulSoup4 + readability-lxml |
| Validation | Pydantic v2 |
| Logging | Loguru |

---

## Folder Structure

```
superinfo/
├── app/
│   ├── agents/
│   │   ├── research.py       # Research agent
│   │   ├── osint.py          # OSINT agent
│   │   ├── verification.py   # Verification agent
│   │   └── report.py         # Report synthesis agent
│   ├── api/
│   │   ├── app.py            # FastAPI app + endpoints
│   │   └── schemas.py        # Pydantic request/response models
│   ├── cli/
│   │   └── main.py           # Typer CLI commands
│   ├── core/
│   │   ├── config.py         # Settings (pydantic-settings)
│   │   ├── llm.py            # Grok LLM client
│   │   ├── embeddings.py     # Embedding service
│   │   ├── rag.py            # RAG pipeline
│   │   └── logging.py        # Loguru setup
│   ├── db/
│   │   ├── models.py         # SQLAlchemy async models
│   │   ├── cache.py          # Redis cache layer
│   │   └── vector_store.py   # FAISS vector store
│   └── utils/
│       ├── text.py           # Chunking, cleaning, fetching
│       └── search.py         # Search abstraction (Brave/Bing/SerpAPI)
├── tests/
│   └── test_core.py          # Unit tests
├── superinfo.py              # Entry point
├── requirements.txt
├── pyproject.toml
├── Dockerfile
├── docker-compose.yml
└── .env.example
```

---

## RAG Implementation

- **Chunk size**: 1000 tokens (~4000 chars)
- **Overlap**: 150 tokens (~600 chars)
- **Embedding**: `all-MiniLM-L6-v2` (384-dim) or Grok embeddings
- **Index**: FAISS `IndexFlatL2`, persisted to disk
- **Retrieval**: Top-k nearest neighbors by L2 distance
- **LLM constraint**: All answers grounded in retrieved chunks with `[CHUNK-N]` citations
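The chunking and citation settings above can be sketched as follows. This is an illustration under assumptions, not the project's code: `chunk_tokens` and `build_context` are hypothetical names, the tokens are plain strings rather than real tokenizer output, and numbering chunks from 1 is a guess at how `[CHUNK-N]` labels are assigned.

```python
def chunk_tokens(tokens: list[str], size: int = 1000, overlap: int = 150) -> list[list[str]]:
    """Split tokens into windows of `size` that share `overlap` tokens with the previous window."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # each window starts this many tokens after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


def build_context(chunks: list[str]) -> str:
    """Label each retrieved chunk so an answer can cite it as [CHUNK-N]."""
    return "\n\n".join(f"[CHUNK-{n}] {text}" for n, text in enumerate(chunks, start=1))


tokens = [f"t{i}" for i in range(2000)]  # stand-in for real tokenizer output
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])
print(build_context(["RAG retrieves passages.", "FAISS stores vectors."]))
```

With the default settings each window advances by 850 tokens, so consecutive chunks share 150 tokens and a sentence cut at one boundary still appears whole in a neighboring chunk.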

---

## Running Tests

```bash
pytest tests/ -v
```

---

## Configuration Reference

| Variable | Default | Description |
|----------|---------|-------------|
| `GROK_API_KEY` | required | Grok API key |
| `GROK_BASE_URL` | `https://api.x.ai/v1` | Grok API base URL |
| `GROK_CHAT_MODEL` | `grok-beta` | Chat model name |
| `EMBEDDING_BACKEND` | `sentence_transformers` | Embedding backend: `grok` or `sentence_transformers` |
| `SEARCH_PROVIDER` | `brave` | Search provider: `brave`, `bing`, or `serpapi` |
| `SEARCH_API_KEY` | required | Search provider API key |
| `DATABASE_URL` | postgres URL | PostgreSQL connection string (optional, for persistence) |
| `REDIS_URL` | `redis://localhost:6379/0` | Redis URL (optional, for caching) |
| `CHUNK_SIZE` | `1000` | Tokens per chunk |
| `CHUNK_OVERLAP` | `150` | Overlap tokens |
| `TOP_K` | `5` | RAG retrieval count |
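
The project loads these values with pydantic-settings in `app/core/config.py`. As a dependency-free stand-in, the sketch below shows how the documented defaults and required keys might be resolved from environment variables. The `Settings` class and `load_settings` function here are hypothetical, and `DATABASE_URL` is left out because its default is not spelled out above.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    grok_api_key: str
    search_api_key: str
    grok_base_url: str = "https://api.x.ai/v1"
    grok_chat_model: str = "grok-beta"
    embedding_backend: str = "sentence_transformers"
    search_provider: str = "brave"
    redis_url: str = "redis://localhost:6379/0"
    chunk_size: int = 1000
    chunk_overlap: int = 150
    top_k: int = 5


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying the table's defaults."""
    env = os.environ if env is None else env
    missing = [key for key in ("GROK_API_KEY", "SEARCH_API_KEY") if not env.get(key)]
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")
    return Settings(
        grok_api_key=env["GROK_API_KEY"],
        search_api_key=env["SEARCH_API_KEY"],
        grok_base_url=env.get("GROK_BASE_URL", Settings.grok_base_url),
        grok_chat_model=env.get("GROK_CHAT_MODEL", Settings.grok_chat_model),
        embedding_backend=env.get("EMBEDDING_BACKEND", Settings.embedding_backend),
        search_provider=env.get("SEARCH_PROVIDER", Settings.search_provider),
        redis_url=env.get("REDIS_URL", Settings.redis_url),
        chunk_size=int(env.get("CHUNK_SIZE", Settings.chunk_size)),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", Settings.chunk_overlap)),
        top_k=int(env.get("TOP_K", Settings.top_k)),
    )


# Placeholder keys for illustration only.
settings = load_settings({"GROK_API_KEY": "xai-placeholder", "SEARCH_API_KEY": "search-placeholder", "TOP_K": "8"})
print(settings.search_provider, settings.chunk_size, settings.top_k)
```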

---

## Extending the System

### Add a new agent:
1. Create `app/agents/my_agent.py` with an async `run()` method
2. Add Pydantic schemas in `app/api/schemas.py`
3. Register endpoint in `app/api/app.py`
4. Add CLI command in `app/cli/main.py`
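
Step 1 might look roughly like the skeleton below. Everything in it is hypothetical: the class names, the `run()` signature, and the result fields are illustrative, and a dataclass stands in for the Pydantic schema that step 2 would add to `app/api/schemas.py`.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class SummaryResult:
    """Hypothetical output shape; a real agent would use a Pydantic schema."""
    query: str
    summary: str
    sources: list[str] = field(default_factory=list)


class SummaryAgent:
    """Hypothetical agent exposing the async run() entry point from step 1."""

    async def run(self, query: str) -> SummaryResult:
        # A real agent would await search, retrieval, and LLM calls here.
        await asyncio.sleep(0)  # placeholder for awaited I/O
        return SummaryResult(query=query, summary=f"No sources consulted for: {query}")


result = asyncio.run(SummaryAgent().run("example topic"))
print(result.summary)
```

Keeping `run()` async lets the same agent be awaited directly from a FastAPI endpoint and wrapped with `asyncio.run()` from a CLI command.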

### Add a search provider:
Add a new `_search_<provider>()` function in `app/utils/search.py` and register it in `search_web()`.
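
One way such a registration could work is a name-to-function registry, sketched below. This is an assumption about the design, not the actual contents of `app/utils/search.py`: the `register` decorator, the `(query, limit)` signature, the result fields, and the `example` provider are all hypothetical, and the provider returns canned data instead of calling a search API.

```python
from typing import Callable

SearchFn = Callable[[str, int], list[dict]]
_PROVIDERS: dict[str, SearchFn] = {}


def register(name: str) -> Callable[[SearchFn], SearchFn]:
    """Decorator that records a provider function under the given name."""
    def decorator(fn: SearchFn) -> SearchFn:
        _PROVIDERS[name] = fn
        return fn
    return decorator


@register("example")
def _search_example(query: str, limit: int) -> list[dict]:
    """Hypothetical provider returning canned results instead of calling an API."""
    return [
        {"title": f"Result {i} for {query}", "url": f"https://example.com/{i}"}
        for i in range(1, limit + 1)
    ]


def search_web(query: str, provider: str = "example", limit: int = 5) -> list[dict]:
    """Dispatch to the registered provider, failing loudly on unknown names."""
    try:
        fn = _PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"unknown search provider: {provider!r}") from None
    return fn(query, limit)


results = search_web("retrieval-augmented generation", limit=2)
print(results)
```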

---

## Constraints

- ✅ No browser automation
- ✅ No login scraping
- ✅ Public HTTP GET only
- ✅ No hidden endpoints
- ✅ No inference beyond visible data
