Metadata-Version: 2.4
Name: syke
Version: 0.1.1
Summary: Personal context daemon — pulls your digital footprint, perceives who you are, feeds that to any AI.
Author-email: Utkarsh Saxena <utkarsh@mysyke.com>
License-Expression: MIT
Project-URL: Repository, https://github.com/saxenauts/syke
Project-URL: Documentation, https://syke-docs.vercel.app
Project-URL: Changelog, https://github.com/saxenauts/syke/blob/main/CHANGELOG.md
Keywords: ai,context,memory,mcp,anthropic,claude,identity
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: anthropic>=0.79.0
Requires-Dist: click>=8.1
Requires-Dist: pydantic>=2.0
Requires-Dist: pydantic-settings>=2.0
Requires-Dist: rich>=13.0
Requires-Dist: python-dotenv>=1.0
Requires-Dist: uuid7>=0.1.0
Requires-Dist: beautifulsoup4>=4.12
Requires-Dist: lxml>=5.0
Requires-Dist: mcp>=1.0
Requires-Dist: claude-agent-sdk>=0.1.0
Provides-Extra: gmail
Requires-Dist: google-auth-oauthlib>=1.0; extra == "gmail"
Requires-Dist: google-api-python-client>=2.100; extra == "gmail"
Provides-Extra: browser
Requires-Dist: browser-use>=0.11; extra == "browser"
Requires-Dist: playwright>=1.40; extra == "browser"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Provides-Extra: all
Requires-Dist: syke[gmail]; extra == "all"
Requires-Dist: syke[browser]; extra == "all"
Requires-Dist: syke[dev]; extra == "all"
Dynamic: license-file

# Syke — Personal Context Daemon

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
[![Tests](https://img.shields.io/badge/tests-210%2B%20passing-brightgreen.svg)](https://github.com/saxenauts/syke/actions)
[![Anthropic](https://img.shields.io/badge/Anthropic-Opus%204.6-blueviolet.svg)](https://www.anthropic.com)
[![MCP](https://img.shields.io/badge/MCP-7%20tools-orange.svg)](https://modelcontextprotocol.io)
[![Demo](https://img.shields.io/badge/demo-live-ff69b4.svg)](https://syke-ai.vercel.app)
[![Docs](https://img.shields.io/badge/docs-site-blue.svg)](https://syke-docs.vercel.app)

> Your cross-web working memory. One system, every AI.

Syke is infrastructure for portable AI identity. It ingests your digital footprint across platforms, synthesizes a psyche-level understanding of who you are using Claude's Agent SDK, and distributes that context to any AI tool via MCP — so every model you interact with already knows you.

**This is not a memory system.** Memory systems store facts. Syke synthesizes *psyche* — patterns, voice, what drives you, what you're working on right now. It's the active context layer that attaches to any memory system, any model, any platform.

## Contents

- [The Problem](#the-problem)
- [How It Works](#how-it-works)
- [Quick Start](#quick-start)
- [Creative Use of Opus 4.6](#creative-use-of-opus-46)
- [MCP Server](#mcp-server)
- [Benchmarks](#benchmarks)
- [Architecture](#architecture--design-decisions)
- [Privacy](#privacy-model)
- [Platforms](#supported-platforms)
- [CLI Reference](#cli-reference)
- [Contributing](#contributing)
- [Docs](#documentation)
- [License](#license)

## The Problem

Every AI conversation starts from zero. You've had 10,000 interactions across ChatGPT, Claude, GitHub, email — and none of it carries over. Each new session, each new tool, each new model: *"Hi, I'm an AI assistant. How can I help you today?"*

Your digital presence is fragmented across platforms. No single AI sees the full picture. The insights from your ChatGPT research don't inform your Claude Code sessions. Your GitHub commits don't shape how your email assistant talks to you.

Syke fixes this. One daemon, continuous sync, every AI gets your context.

## How It Works

```mermaid
graph TB
    subgraph Clients["ANY MCP CLIENT"]
        CC[Claude Code]
        CU[Cursor]
        CA[Custom Agent]
    end

    subgraph Syke["SYKE DAEMON"]
        IS["Identity Synthesis<br/>Agent SDK + 6 MCP Tools<br/>Coverage-Gated Exploration<br/>Strategy Evolution (ALMA)"]
        TL["Unified Timeline<br/>SQLite + WAL"]
        IS --> TL
    end

    subgraph Sources["DATA SOURCES"]
        S1[Claude Code<br/>Sessions]
        S2[ChatGPT<br/>Export]
        S3[GitHub<br/>API]
        S4[Gmail<br/>OAuth]
        S5[MCP Push<br/>any client]
    end

    Clients <-->|"MCP (pull & push)"| Syke
    TL --- S1
    TL --- S2
    TL --- S3
    TL --- S4
    TL --- S5
```

**Federated push/pull**: Any MCP client can *read* your context (pull) and *contribute* new events back (push). Your Claude Code session logs what you're building. Your custom agent logs decisions. Everything feeds the living profile.

## Quick Start

```bash
git clone https://github.com/saxenauts/syke.git && cd syke
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
cp .env.example .env  # Set ANTHROPIC_API_KEY

python -m syke --user you setup --yes  # Detect → ingest → perceive → output
```

That's it. One command detects your local data (Claude Code sessions, ChatGPT exports, GitHub), ingests everything, runs agentic perception, and generates your identity profile.

## Agent Installation

For AI agents setting up Syke for a user:

```bash
python3 -m venv .venv && source .venv/bin/activate && pip install -e .
cp .env.example .env
# Set ANTHROPIC_API_KEY (Claude Code agents: read from environment)
python -m syke --user <name> setup --yes
```

**Auto-detected sources:**
- Claude Code sessions from `~/.claude/` (transcripts + project context)
- ChatGPT export ZIPs in `~/Downloads/`
- GitHub via `gh` CLI or `GITHUB_TOKEN`
- Gmail via OAuth token (if configured)

## Creative Use of Opus 4.6

This section covers how Syke uses the Agent SDK and Opus 4.6: custom MCP tools, coverage gating, multi-agent orchestration, strategy evolution, and extended thinking.

### Agent SDK with Custom MCP Tools

The perception agent doesn't receive a text dump — it *explores* interactively. Six custom MCP tools let it browse timelines, search across platforms, cross-reference topics, read its own prior profiles, and submit structured output:

| Tool | Purpose |
|------|---------|
| `get_source_overview` | Understand what data exists: platforms, counts, date ranges |
| `browse_timeline` | Browse events chronologically with source/date filters |
| `search_footprint` | Full-text keyword search across all events |
| `cross_reference` | Search a topic across ALL platforms, grouped by source |
| `read_previous_profile` | Read prior perception for incremental updates |
| `submit_profile` | Submit the final structured profile (gated by coverage) |

The agent typically makes 5-12 targeted tool calls, forming hypotheses and testing them — not processing a static context window.

### Coverage-Gated Exploration (PermissionResultDeny)

The Agent SDK's hook system enforces exploration quality. A `PreToolUse` hook tracks which sources the agent has browsed, searched, and cross-referenced. If the agent tries to call `submit_profile` before covering all platforms:

```
PermissionResultDeny(reason="Sources not explored: github (67% coverage).
Explore the missing sources first, then resubmit.")
```

The agent cannot submit a shallow profile: it must explore every source before the system accepts its synthesis. The gate adds no API cost, because hooks run inside turns the agent is already making.
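
The gating logic can be sketched in plain Python. This is a minimal illustration, not Syke's implementation: the `CoverageTracker` fields and the `check_submit` method are assumptions, and the sketch returns a plain tuple where the real hook would return the SDK's `PermissionResultDeny`.

```python
from dataclasses import dataclass, field


@dataclass
class CoverageTracker:
    """Tracks which sources the agent has touched (hypothetical sketch)."""

    known_sources: set[str]
    explored: set[str] = field(default_factory=set)

    def record(self, source: str) -> None:
        # Called whenever a browse/search/cross-reference tool touches a source.
        if source in self.known_sources:
            self.explored.add(source)

    def coverage(self) -> float:
        if not self.known_sources:
            return 1.0
        return len(self.explored) / len(self.known_sources)

    def check_submit(self) -> tuple[bool, str]:
        """Return (allowed, reason); deny while any source is unexplored."""
        missing = sorted(self.known_sources - self.explored)
        if not missing:
            return True, ""
        pct = round(self.coverage() * 100)
        return False, (
            f"Sources not explored: {', '.join(missing)} ({pct}% coverage). "
            "Explore the missing sources first, then resubmit."
        )


tracker = CoverageTracker(known_sources={"claude-code", "chatgpt", "github"})
tracker.record("claude-code")
tracker.record("chatgpt")
allowed, reason = tracker.check_submit()
print(allowed, reason)
```

With two of three sources explored, the check denies submission and reports 67% coverage, matching the message shown above.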

### Multi-Agent Orchestration

Three specialized Sonnet sub-agents explore in parallel, each with constrained tool access:

- **Timeline Explorer** — browses chronologically, identifies active threads and recent patterns
- **Pattern Detective** — cross-references topics across platforms, finds contradictions
- **Voice Analyst** — analyzes communication style, tone, vocabulary, personality signals

Opus synthesizes their findings into the final profile. The Agent SDK's `AgentDefinition` handles delegation, tool scoping, and result aggregation.

### Strategy Evolution via Trace Analysis (ALMA)

Inspired by the ALMA paper (Clune, 2026) — the agent evolves its own exploration strategy across runs:

1. **Explore**: Agent runs perception, leaving a trace of every tool call and result
2. **Reflect**: Deterministic analysis labels each search as productive or wasted (zero LLM cost)
3. **Evolve**: Productive queries promoted, dead ends culled, new priorities discovered
4. **Adapt**: Next run reads the evolved strategy via tool, explores smarter
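
The Reflect and Evolve steps above can be sketched as a deterministic pass over the tool-call trace. The names `SearchTrace`, `reflect`, and `evolve`, the "at least one result" rule, and the result counts are all illustrative assumptions; the actual labeling criteria and the discovery of new priorities are not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchTrace:
    query: str
    result_count: int  # events returned by one tool call


def reflect(traces: list[SearchTrace], min_results: int = 1) -> dict[str, bool]:
    """Label each query productive (True) or wasted (False) without an LLM."""
    labels: dict[str, bool] = {}
    for trace in traces:
        hit = trace.result_count >= min_results
        # A query is productive if any of its calls returned enough results.
        labels[trace.query] = labels.get(trace.query, False) or hit
    return labels


def evolve(strategy: list[str], labels: dict[str, bool]) -> list[str]:
    """Promote productive queries to the front and drop dead ends."""
    productive = [query for query, ok in labels.items() if ok]
    untested = [query for query in strategy if query not in labels]
    return productive + untested


# Made-up trace for illustration only.
traces = [
    SearchTrace("memory", 14),
    SearchTrace("Syke", 3),
    SearchTrace("wizard", 0),
]
labels = reflect(traces)
next_strategy = evolve(["Syke", "memory", "federated"], labels)
print(next_strategy)
```

Because both functions are pure and rule-based, rerunning them on the same trace always yields the same strategy, which is what keeps reflection free of LLM cost.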

**12 runs. Real data. The system learned.**

| Strategy | Runs | Key Searches | Peak Score |
|----------|------|-------------|------------|
| v0 (baseline) | 1-3 | project names: Syke, Pogu, ALMA | 88.7% |
| v1 (concepts) | 4-6 | concepts: memory, federated, PersonaMem | **94.3%** |
| v2 (entities) | 7-9 | entities: wizard, Persona, Eder | 91.2% |
| v3 (refined) | 10-12 | refined ranking, yukti added | 92.8% |

The key discovery: searching for *concepts* beats searching for *project names*. Strategy v1 found deeper cross-platform connections because "memory" appears across ChatGPT research, Claude Code implementation, and GitHub commits — while "Syke" only appears where the project is explicitly named.

Total cost: $8.07 across 12 runs. Peak quality: 94.3% at $0.60/run — 67% cheaper than the $1.80 legacy baseline.

*Note: Scores vary across runs due to non-deterministic LLM exploration. The 94.3% is the peak from run 5; median across all 12 runs is ~90%. Reflection is deterministic (zero LLM cost), but the agent's tool-call choices are not.*
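
The cost figures above can be checked with a few lines of arithmetic. The variable names are arbitrary; the dollar amounts are the ones reported in this section.

```python
legacy_cost = 1.80  # USD per run, legacy baseline
best_cost = 0.60    # USD per run, peak-scoring strategy
total_cost = 8.07   # USD across all meta-learning runs
runs = 12

savings = (legacy_cost - best_cost) / legacy_cost
mean_cost = total_cost / runs

print(f"savings vs legacy: {savings:.0%}")   # 67%
print(f"mean cost per run: ${mean_cost:.2f}")  # $0.67
```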

### Extended Thinking

A 16K+ token thinking budget lets Opus cross-reference signals deeply before synthesizing. The agent uses thinking to connect patterns across platforms: a GitHub commit pattern, a ChatGPT research thread, and an email discussion about the same topic are woven into a single coherent thread.

## MCP Server

Add to `~/.claude/settings.json`:

```json
{
  "mcpServers": {
    "syke": {
      "command": "/path/to/syke/.venv/bin/python",
      "args": ["-m", "syke", "--user", "you", "serve", "--transport", "stdio"],
      "cwd": "/path/to/syke"
    }
  }
}
```

Now every Claude Code session knows who you are. Seven MCP tools are available:

| Tool | Parameters | Returns |
|------|-----------|---------|
| `get_profile` | `format`: json/markdown/claude-md/user-md | Full identity profile |
| `query_timeline` | `since`, `source`, `limit` | Event timeline (summaries by default) |
| `get_event` | `event_id` | Full content for a single event |
| `get_manifest` | — | Data summary: sources, counts, status |
| `search_events` | `query`, `limit` | Full-text search (summaries by default) |
| `push_event` | `source`, `event_type`, `title`, `content` | Push event from any client |
| `push_events` | `events_json` | Batch push |

**Consumption contract**: `query_timeline` and `search_events` return summaries by default (content stripped) to prevent context flooding. Use `get_event` to fetch full content for specific events.

## Benchmarks

All methods produce the same `UserProfile` schema. Tested on 3,225 events across ChatGPT, Claude Code, and GitHub:

| | Legacy | Agentic v1 | Multi-Agent v2 | Meta-Best |
|---|-------:|----------:|---------------:|---:|
| **Cost** | $1.80 | $0.71 | $1.04 | **$0.60** |
| **Eval score** | -- | -- | -- | **94.3%** |
| Source coverage | 100% | 67% | 100% | 100%* |
| Cross-platform threads | 2 | 1 | 2 | 4 |
| Identity anchor | 660ch | 411ch | 637ch | 819ch |
| Wall time | 119s | 160s | 225s | 189s |
| API turns | 1 | 13 | 13 | 12 |

*\*Meta-Best coverage varies by run. Run 5 (94.3%) achieved 100%; some runs hit 67-100% depending on agent exploration choices. Coverage gating was tightened after the benchmark.*

**Meta-Best Per-Dimension Breakdown (Run 5):**

| Dimension | Score | Detail |
|-----------|------:|--------|
| Thread quality | 61% | 6 threads, 4 cross-platform, high specificity |
| Identity anchor | 78% | 819 chars, deep and specific |
| Voice patterns | 100% | Rich tone, 5 vocab notes, 6 examples |
| Source coverage | 100% | 3/3 platforms |
| Completeness | 100% | All fields populated |
| Recent detail | 100% | 1,304 chars, 10 temporal markers |
| **Composite** | **94.3%** | Weighted average |

## Architecture & Design Decisions

### Why Agent SDK over raw API calls?

The Agent SDK gives us hooks (`PreToolUse`, `PostToolUse`), sub-agent delegation, and structured tool definitions in one package. With raw Messages API calls, the coverage gate (blocking `submit_profile` until sources are explored) would require building a custom orchestration loop from scratch. With the SDK, it's a single `PermissionResultDeny` return.

### Why SQLite over vector DB?

Syke doesn't need semantic search at the storage layer — that's the LLM's job during perception. SQLite with WAL mode gives us concurrent reads (daemon + MCP server), ACID transactions, zero infrastructure, and keyword search via `LIKE` queries. The agent's `search_footprint` tool does keyword matching, which is fast and predictable. Semantic understanding happens in Opus's thinking, not in the database.
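
To make the storage choice concrete, here is a small sketch using Python's standard `sqlite3` module: it enables WAL mode and runs a `LIKE` keyword search. The `events` table layout and the sample rows are assumptions for illustration, not Syke's actual schema, and the sketch does not escape `%` or `_` in the keyword.

```python
import sqlite3
import tempfile
from pathlib import Path

# WAL needs a file-backed database, so use a temporary file rather than :memory:.
db_path = Path(tempfile.mkdtemp()) / "syke.db"
conn = sqlite3.connect(db_path)
journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

conn.execute(
    "CREATE TABLE events ("
    "id INTEGER PRIMARY KEY, source TEXT NOT NULL, "
    "title TEXT NOT NULL, content TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO events (source, title, content) VALUES (?, ?, ?)",
    [
        ("chatgpt", "Research notes", "Comparing memory architectures for agents"),
        ("github", "Commit", "Add timeline schema"),
    ],
)
conn.commit()


def search_footprint(
    conn: sqlite3.Connection, keyword: str, limit: int = 20
) -> list[tuple[str, str]]:
    """Keyword search over title and content using a parameterized LIKE."""
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT source, title FROM events "
        "WHERE title LIKE ? OR content LIKE ? ORDER BY id LIMIT ?",
        (pattern, pattern, limit),
    )
    return rows.fetchall()


hits = search_footprint(conn, "memory")
print(journal_mode, hits)
```

In WAL mode readers do not block the writer, which is what lets a sync process and an MCP server share one database file.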

### Why one event per session (not per message)?

Sessions are the natural unit of intent. A single Claude Code session about "refactoring the auth system" contains 50+ messages but represents one coherent activity. Storing per-message would bloat the timeline 50x and force the perception agent to spend tokens reconstructing context that the session boundary already provides. Content is capped at 50K chars per event.
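
As a sketch of the session-as-event idea, the function below collapses a list of messages into one event and applies the 50K-character cap. The function name, the field names, and the choice to join messages with blank lines are assumptions made for this example.

```python
MAX_CONTENT_CHARS = 50_000  # per-event cap stated above


def session_to_event(session_id: str, title: str, messages: list[str]) -> dict:
    """Collapse one session's messages into a single timeline event."""
    content = "\n\n".join(messages)
    return {
        "event_id": session_id,
        "title": title,
        "content": content[:MAX_CONTENT_CHARS],  # truncate oversized sessions
        "message_count": len(messages),
    }


# 60 messages of 1,000 characters each exceed the cap once joined.
messages = ["m" * 1_000 for _ in range(60)]
event = session_to_event("session-1", "Refactoring the auth system", messages)
print(event["message_count"], len(event["content"]))
```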

### Why keyword search in tools (not semantic)?

Keyword matching fails silently on multi-word queries: "AI memory system architecture" returns nothing useful, while single keywords such as "memory", "ALMA", or "Persona" match reliably. The `PreToolUse` hook therefore simplifies each multi-word query to its longest single word, which keeps the agent's searches effective without any change on the agent's side.
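
The query simplification described above amounts to picking the longest whitespace-separated word. A minimal sketch follows; the function name `simplify_query` and its tie-breaking (the first longest word wins) are assumptions.

```python
def simplify_query(query: str) -> str:
    """Reduce a multi-word query to its longest single word."""
    words = query.split()
    if len(words) <= 1:
        return query.strip()
    # max() returns the first word of maximal length when there is a tie.
    return max(words, key=len)


print(simplify_query("AI memory system architecture"))
print(simplify_query("memory"))
```

For the example query from this section, the rule selects "architecture", since it is the longest of the four words.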

### Why pre-ingestion content filtering?

Privacy by design, not afterthought. The `ContentFilter` runs before events enter SQLite, catching credential patterns (API keys, tokens, passwords) and private messaging content (WhatsApp logs, DMs pasted into AI conversations). Content that never enters the timeline can never be sent to an LLM for perception.
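
The credential half of this filter can be approximated with a few regular expressions. The patterns below are illustrative guesses at what "credential patterns" might look like, not the rules `ContentFilter` actually ships with, and the private-messaging check is left out.

```python
import re

# Hypothetical patterns; a real filter would need a broader, tested set.
CREDENTIAL_PATTERNS = [
    re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),        # API-key-like strings
    re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}"),   # GitHub-token-like strings
    re.compile(r"(?i)\b(password|passwd|secret)\s*[=:]\s*\S+"),
]


def redact_credentials(text: str) -> str:
    """Replace anything that looks like a credential before storage."""
    for pattern in CREDENTIAL_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


print(redact_credentials("export KEY=sk-abcdefghijklmnopqrstuvwxyz123"))
print(redact_credentials("password: hunter2 ok"))
```

Running the filter before the insert, rather than at read time, is what guarantees that redacted values never reach the database or a later perception call.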

### Why 4 output formats?

Different consumers need different shapes:
- **JSON** — programmatic access, other tools
- **Markdown** — human reading
- **CLAUDE.md** — injected into Claude Code projects, follows the `.claude/CLAUDE.md` convention
- **USER.md** — portable identity file for any context

### Why a daemon?

Identity drifts. What's true about you on Monday isn't true on Friday. The background daemon (macOS LaunchAgent, 15-min sync cycle) re-ingests new events and runs incremental perception, keeping the profile current without manual intervention. Currently experimental — lives in `experiments/daemon/`.

## The Journey

How Syke evolved across the hackathon:

**Day 1-2: Foundation.** Core pipeline: Claude Code adapter (dual-store discovery with DFS path resolver), ChatGPT ZIP parser, GitHub REST API adapter. SQLite timeline. Legacy perception: single-shot Opus with recency-weighted timeline dump and 16K extended thinking budget. It worked, but the agent had no ability to explore — it received everything at once and tried to make sense of it.

**Day 3: Agent SDK.** Rewrote perception using the Claude Agent SDK with custom MCP tools. The agent could now *explore* — browsing timelines, searching topics, cross-referencing platforms. Added coverage gating via `PermissionResultDeny` hooks. Quality improved, but single-agent exploration was narrow (67% source coverage).

**Day 4: Multi-agent.** Three Sonnet sub-agents exploring in parallel, each specialized (timeline, patterns, voice). Opus synthesizes. 100% source coverage, richer cross-platform connections. But expensive ($1.04/run).

**Day 5: Meta-learning.** ALMA-inspired strategy evolution. 12 runs, real data, the system learned which searches work. Peak 94.3% at $0.60 — cheaper than any other method. Built the 4-way benchmark.

**Day 6: Polish.** MCP server tests (23 tests, push/pull verified), viz site (14 interactive sections), Vercel deployment, documentation.

## Privacy Model

**Local storage**: All timeline data stays in `data/{user}/syke.db` on your machine. The database is never uploaded; the only data that leaves your machine is what perception sends to the Anthropic API.

**API transmission**: During perception, event data is sent to the Anthropic API for Opus to analyze. This is the same trust model as using Claude directly: your data goes to Anthropic's API, is processed under their [data policy](https://www.anthropic.com/privacy), and is not used for training.

**Content filtering**: Pre-ingestion filter strips credentials (API keys, tokens, passwords, SSH keys) and skips private messaging content (WhatsApp logs, DMs). Runs before data enters the timeline.

**Consent tiers**:
- Public sources (GitHub): no consent required
- Private sources (Claude Code, ChatGPT, Gmail): require `--yes` flag or interactive consent prompt
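
The two tiers can be sketched as a small gate in front of ingestion. The function name `may_ingest`, the injectable `prompt` parameter, and the decision to treat unknown sources as private are assumptions for illustration.

```python
from typing import Callable

PUBLIC_SOURCES = {"github"}


def may_ingest(
    source: str,
    *,
    assume_yes: bool = False,
    prompt: Callable[[str], str] = input,
) -> bool:
    """Return True if ingestion of this source may proceed."""
    if source in PUBLIC_SOURCES:
        return True  # public tier: no consent required
    if assume_yes:
        return True  # private tier: consent given up front (--yes)
    # Private tier, interactive: anything other than y/yes is a refusal.
    answer = prompt(f"Ingest private source '{source}'? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}


print(may_ingest("github"))
print(may_ingest("gmail", assume_yes=True))
print(may_ingest("chatgpt", prompt=lambda _: "n"))
```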

**Strategy files**: Contain search query patterns only — no user content.

## Supported Platforms

| Platform | Status | Method | Data Captured |
|----------|--------|--------|---------------|
| Claude Code | Working | Local JSONL parsing | Sessions, tools, projects, git branches |
| ChatGPT | Working | ZIP export parsing | Conversations, topics, timestamps |
| GitHub | Working | REST API | Repos, commits, issues, PRs, stars, READMEs |
| Gmail | Working | OAuth API (gog CLI + Python fallback) | Subjects, snippets, labels, sent patterns |
| Twitter/X | Stub | -- | Adapter stubbed, not implemented |
| YouTube | Stub | -- | Adapter stubbed, not implemented |

## CLI Reference

```bash
# Detection & setup
syke detect                              # Scan for data sources
syke --user <id> setup [--yes]           # Full pipeline: ingest -> perceive -> output

# Ingestion
syke --user <id> ingest claude-code      # Claude Code sessions (~/.claude/)
syke --user <id> ingest chatgpt -f <zip> # ChatGPT export ZIP
syke --user <id> ingest github --username <name>  # GitHub public data
syke --user <id> ingest gmail            # Gmail via OAuth

# Perception
syke --user <id> perceive                # Agentic (default, Agent SDK)
syke --user <id> perceive --legacy       # Legacy single-shot
syke --user <id> perceive -m agentic-v2  # Multi-agent (3 Sonnet + Opus)
syke --user <id> perceive -m meta        # ALMA meta-learning

# Output
syke --user <id> profile -f json         # JSON profile
syke --user <id> profile -f markdown     # Markdown profile
syke --user <id> profile -f claude-md    # CLAUDE.md format
syke --user <id> profile -f user-md      # USER.md format
syke --user <id> inject -t <dir> -f claude-md  # Inject into project

# Distribution
syke --user <id> serve --transport stdio # MCP server for Claude Code
syke --user <id> serve --transport http  # MCP server via HTTP

# Sync & monitoring
syke --user <id> sync                    # Re-ingest + update profile
syke --user <id> sync --agentic          # Sync with agentic perception
syke --user <id> status                  # Data summary
syke --user <id> timeline                # Event timeline
syke --user <id> health                  # Environment health checks
syke --user <id> metrics                 # Cost/token/timing dashboard
syke --user <id> validate                # End-to-end validation

# Background daemon (experimental, macOS)
syke --user <id> daemon install          # Install LaunchAgent
syke --user <id> daemon status           # Check daemon status
```

## Package Structure

```
syke/                                     # ~5,300 source lines
├── cli.py                    # Click CLI with all commands
├── sync.py                   # Sync business logic (detect, ingest, perceive)
├── config.py                 # .env, paths, API key, model defaults
├── db.py                     # SQLite schema + queries (WAL mode)
├── models.py                 # Pydantic 2.x: Event, UserProfile, etc.
├── metrics.py                # JSONL observability, health checks
├── ingestion/
│   ├── base.py               # BaseAdapter ABC + ContentFilter
│   ├── claude_code.py        # Dual-store adapter (projects + transcripts)
│   ├── chatgpt.py            # ChatGPT ZIP export parser
│   ├── github_.py            # GitHub REST API + pagination
│   └── gmail.py              # Gmail OAuth (gog CLI + Python fallback)
├── perception/
│   ├── agentic_perceiver.py  # Agent SDK perception (single + multi-agent)
│   ├── perceiver.py          # Legacy single-shot perception
│   ├── tools.py              # 6 MCP tools + CoverageTracker
│   ├── agent_prompts.py      # System/task prompts + sub-agent defs
│   └── prompts.py            # Legacy prompts
├── distribution/
│   ├── formatters.py         # 4 output formats (JSON, MD, CLAUDE.md, USER.md)
│   ├── inject.py             # File injection + MCP config
│   └── mcp_server.py         # FastMCP server (7 tools)
└── llm/
    └── client.py             # Anthropic SDK wrapper (streaming, retries, cost)

tests/                                    # ~3,000 test lines, 210+ tests
experiments/                              # Untracked experiment code
├── cli_experiments.py         # Auto-registered experiment CLI commands
├── perception/                # Schema-free perceiver, eval framework
├── benchmarking/              # Benchmark runner, trace analysis, reports
├── simulation/                # 14-day federated push simulation
├── viz/                       # Interactive identity visualizer
├── daemon/                    # Background sync daemon + LaunchAgent
└── stubs/                     # Platform adapter stubs (twitter, youtube)
```

## Testing

```bash
python -m pytest tests/ -v   # 210+ tests across 16 files
```

## Demo

See the [interactive demo site](https://syke-ai.vercel.app) for: before/after comparison, data sources and privacy model, profile evolution across 12 meta-learning runs, benchmark charts, MCP tool gallery, and pipeline walkthrough.

## Contributing

Syke is young — born in a hackathon, growing into infrastructure. See [CONTRIBUTING.md](CONTRIBUTING.md) for dev setup, tests, and how to add adapters.

## Documentation

Full docs at [syke-docs.vercel.app](https://syke-docs.vercel.app) — architecture deep dives, MCP tool reference, platform guides.

## License

MIT

---

*Built for the [Claude Code Hackathon](https://docs.anthropic.com/en/docs/claude-code), Feb 2026. By [Utkarsh Saxena](https://github.com/saxenauts).*
