Metadata-Version: 2.4
Name: syke
Version: 0.2.7
Summary: Agentic memory for AI — collects your digital footprint, synthesizes your identity, feeds that to any AI.
Author-email: Utkarsh Saxena <utkarsh@mysyke.com>
License-Expression: MIT
Project-URL: Repository, https://github.com/saxenauts/syke
Project-URL: Documentation, https://syke-docs.vercel.app
Project-URL: Changelog, https://github.com/saxenauts/syke/blob/main/CHANGELOG.md
Keywords: ai,context,memory,mcp,anthropic,claude,identity
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: anthropic>=0.79.0
Requires-Dist: click>=8.1
Requires-Dist: pydantic>=2.0
Requires-Dist: pydantic-settings>=2.0
Requires-Dist: rich>=13.0
Requires-Dist: python-dotenv>=1.0
Requires-Dist: uuid7>=0.1.0
Requires-Dist: beautifulsoup4>=4.12
Requires-Dist: lxml>=5.0
Requires-Dist: mcp>=1.0
Requires-Dist: claude-agent-sdk>=0.1.0
Provides-Extra: gmail
Requires-Dist: google-auth-oauthlib>=1.0; extra == "gmail"
Requires-Dist: google-api-python-client>=2.100; extra == "gmail"
Provides-Extra: browser
Requires-Dist: browser-use>=0.11; extra == "browser"
Requires-Dist: playwright>=1.40; extra == "browser"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Provides-Extra: all
Requires-Dist: syke[gmail]; extra == "all"
Requires-Dist: syke[browser]; extra == "all"
Requires-Dist: syke[dev]; extra == "all"
Dynamic: license-file

# Syke — Personal Context Daemon

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
[![Tests](https://img.shields.io/badge/tests-272%20passing-brightgreen.svg)](https://github.com/saxenauts/syke/actions)
[![Anthropic](https://img.shields.io/badge/Anthropic-Opus%204.6-blueviolet.svg)](https://www.anthropic.com)
[![MCP](https://img.shields.io/badge/MCP-8%20tools-orange.svg)](https://modelcontextprotocol.io)
[![Demo](https://img.shields.io/badge/demo-live-ff69b4.svg)](https://syke-ai.vercel.app)
[![Docs](https://img.shields.io/badge/docs-site-blue.svg)](https://syke-docs.vercel.app)

> Cross-web working memory for AI. One daemon, continuous sync, every model knows you.

## The Problem

Your identity is scattered across 5 platforms. No AI sees the whole picture. Every new session starts from zero — *"Hi, I'm an AI assistant. How can I help you today?"*

Syke fixes this.

## What You Get

A synthesized identity that follows you everywhere. Here's a real CLAUDE.md output:

```
# About alex
<!-- Generated by Syke from gmail, chatgpt, github (150 events) -->

A curious builder exploring the intersection of AI and developer tools.

## What's Active Right Now
🔥 **Syke Hackathon**: Building a personal context daemon for Claude Code hackathon.
  - Multiple commits today
  - ChatGPT conversations about architecture

## Recent Context
Working intensely on Syke, a personal context daemon. Writing Python, using Opus 4.6.

## Current World State
Building Syke v0.2 for Claude Code Hackathon (deadline Feb 16). Core focus: ask() tool.

## How They Communicate
casual, intense, exploratory. Direct, fast-paced, mixes technical and philosophical.
```

Notice how the entry under "What's Active Right Now" (one of the profile's `active_threads`) weaves signals from *multiple platforms*, GitHub commits and ChatGPT research, into a single coherent thread. That's the synthesis.

## How It Works

```mermaid
graph TB
    subgraph Clients["ANY MCP CLIENT"]
        CC[Claude Code]
        CU[Cursor]
        CA[Custom Agent]
    end

    subgraph Syke["SYKE DAEMON"]
        IS["Agentic Perception<br/>Agent SDK + 6 MCP Tools<br/>Coverage-Gated Exploration<br/>Strategy Evolution (ALMA)"]
        TL["Unified Timeline<br/>SQLite + WAL"]
        IS --> TL
    end

    subgraph Sources["DATA SOURCES"]
        S1[Claude Code<br/>Sessions]
        S2[ChatGPT<br/>Export]
        S3[GitHub<br/>API]
        S4[Gmail<br/>OAuth]
        S5[MCP Push<br/>any client]
    end

    Clients <-->|"MCP (pull & push)"| Syke
    TL --- S1
    TL --- S2
    TL --- S3
    TL --- S4
    TL --- S5
```

**The sync loop**: Collect signals from your platforms → perceive patterns across them → distribute context to every AI tool → collect new signals back → re-sync. Your identity evolves as you do.

## Quick Start

```bash
uvx syke setup --yes
```

`ANTHROPIC_API_KEY` is recommended for profile generation ([get one here](https://console.anthropic.com/settings/keys)). Setup works without it: data collection, MCP, and the daemon proceed, and perception is skipped until the key is added.

Setup auto-detects your username and local data sources, builds your identity profile, and configures MCP.

<details>
<summary>Other install methods</summary>

**pipx** (persistent install):
```bash
pipx install syke
syke setup --yes
```

**pip** (in a venv):
```bash
pip install syke
syke setup --yes
```
</details>

<details>
<summary>From source (development)</summary>

```bash
git clone https://github.com/saxenauts/syke.git && cd syke
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
cp .env.example .env  # Set ANTHROPIC_API_KEY
python -m syke setup --yes
```
</details>

<details>
<summary>API key setup</summary>

Get your key from [console.anthropic.com](https://console.anthropic.com/settings/keys). The key is required for perception and for the `ask()` MCP tool.

```bash
export ANTHROPIC_API_KEY=your-key-here
echo 'export ANTHROPIC_API_KEY=your-key-here' >> ~/.zshrc  # persist
```
</details>

## Agentic Perception

This is where Syke leans hardest on the Agent SDK: custom MCP tools, permission hooks, and sub-agent delegation.

### Agent SDK with Custom MCP Tools

The perception agent doesn't receive a text dump — it *explores* interactively. Six custom MCP tools let it browse timelines, search across platforms, cross-reference topics, read its own prior profiles, and submit structured output:

| Tool | Purpose |
|------|---------|
| `get_source_overview` | Understand what data exists: platforms, counts, date ranges |
| `browse_timeline` | Browse events chronologically with source/date filters |
| `search_footprint` | Full-text keyword search across all events |
| `cross_reference` | Search a topic across ALL platforms, grouped by source |
| `read_previous_profile` | Read prior perception for incremental updates |
| `submit_profile` | Submit the final structured profile (gated by coverage) |

The agent typically makes 5-12 targeted tool calls, forming hypotheses and testing them — not processing a static context window.
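
As a rough illustration of what a tool like `cross_reference` might do, the sketch below groups keyword matches by source over an in-memory event list. The `Event` fields, the substring matching, and the sample events are assumptions made for this example, not Syke's actual implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str   # hypothetical field: platform the event came from
    title: str
    content: str


def cross_reference(events: list[Event], topic: str) -> dict[str, list[str]]:
    """Return titles of events mentioning `topic`, grouped by source."""
    needle = topic.lower()
    grouped: dict[str, list[str]] = defaultdict(list)
    for event in events:
        haystack = f"{event.title} {event.content}".lower()
        if needle in haystack:
            grouped[event.source].append(event.title)
    return dict(grouped)


# Made-up sample events, for illustration only.
events = [
    Event("github", "Add memory store", "SQLite-backed memory for the timeline"),
    Event("chatgpt", "Agent design chat", "How should long-term memory work?"),
    Event("claude-code", "Refactor auth", "Session tokens vs JWT"),
]
hits = cross_reference(events, "memory")
```

Here `hits` contains one list of titles per platform that mentions the topic, which is the shape an agent needs in order to spot a theme recurring across sources.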

### Coverage-Gated Exploration (PermissionResultDeny)

The Agent SDK's hook system enforces exploration quality. A `PreToolUse` hook tracks which sources the agent has browsed, searched, and cross-referenced. If the agent tries to call `submit_profile` before covering all platforms:

```
PermissionResultDeny(reason="Sources not explored: github (67% coverage).
Explore the missing sources first, then resubmit.")
```

The agent cannot submit a profile until every source has been explored. The gate adds no API cost, because hooks piggyback on turns the agent is already taking.
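
The gating logic itself can be sketched in plain Python, independent of the Agent SDK. `CoverageGate`, its method names, and the `source` key in the tool input are hypothetical; in Syke the equivalent check would run inside the hook and return a deny result instead of a tuple.

```python
from dataclasses import dataclass, field


@dataclass
class CoverageGate:
    """Tracks which sources have been explored and gates the final submit."""
    required_sources: set[str]
    explored: set[str] = field(default_factory=set)

    def record(self, tool_name: str, tool_input: dict) -> None:
        # Only exploration calls that name a known source count toward coverage.
        source = tool_input.get("source")
        if tool_name != "submit_profile" and source in self.required_sources:
            self.explored.add(source)

    def check_submit(self) -> tuple[bool, str]:
        """Return (allowed, reason); deny while any source is unexplored."""
        missing = sorted(self.required_sources - self.explored)
        if not missing:
            return True, ""
        coverage = round(100 * len(self.explored) / len(self.required_sources))
        reason = (
            f"Sources not explored: {', '.join(missing)} ({coverage}% coverage). "
            "Explore the missing sources first, then resubmit."
        )
        return False, reason


gate = CoverageGate(required_sources={"github", "chatgpt", "claude-code"})
gate.record("browse_timeline", {"source": "chatgpt"})
gate.record("search_footprint", {"source": "claude-code"})
allowed_early, reason_early = gate.check_submit()  # github still missing

gate.record("browse_timeline", {"source": "github"})
allowed_late, _ = gate.check_submit()
```

With two of three sources explored, the first check is denied and names `github` as the gap; once it is browsed, the second check passes.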

### Multi-Agent Orchestration

Three specialized Sonnet sub-agents explore in parallel, each with constrained tool access:

- **Timeline Explorer** — browses chronologically, identifies active threads and recent patterns
- **Pattern Detective** — cross-references topics across platforms, finds contradictions
- **Voice Analyst** — analyzes communication style, tone, vocabulary, personality signals

Opus synthesizes their findings into the final profile. Agent SDK's `AgentDefinition` handles delegation, tool scoping, and result aggregation.
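
The fan-out and fan-in shape of this orchestration can be sketched with `asyncio`, without the Agent SDK. The tool scopes per sub-agent and the `run_sub_agent` stand-in are assumptions for illustration; the real delegation goes through `AgentDefinition`.

```python
import asyncio

# Hypothetical tool scopes; the actual per-agent assignment is not documented here.
TOOL_SCOPES = {
    "timeline_explorer": {"get_source_overview", "browse_timeline"},
    "pattern_detective": {"search_footprint", "cross_reference"},
    "voice_analyst": {"browse_timeline", "search_footprint"},
}


async def run_sub_agent(name: str, wanted_tool: str) -> dict:
    """Stand-in for a sub-agent run: enforce tool scope, then return a finding."""
    if wanted_tool not in TOOL_SCOPES[name]:
        raise PermissionError(f"{name} may not call {wanted_tool}")
    await asyncio.sleep(0)  # placeholder for real model and tool latency
    return {"agent": name, "used": wanted_tool}


async def perceive() -> list[dict]:
    # Fan out: the three sub-agents run concurrently.
    findings = await asyncio.gather(
        run_sub_agent("timeline_explorer", "browse_timeline"),
        run_sub_agent("pattern_detective", "cross_reference"),
        run_sub_agent("voice_analyst", "search_footprint"),
    )
    # Fan in: a synthesizer would merge these findings into one profile.
    return list(findings)


findings = asyncio.run(perceive())
```

`asyncio.gather` preserves argument order, so the synthesizer always sees findings in the same sequence regardless of which sub-agent finishes first.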

### ALMA-Inspired Strategy Evolution

**This is the technical crown jewel.** Inspired by the ALMA paper (Clune, 2026) — the agent evolves its own exploration strategy across runs:

1. **Explore**: Agent runs perception, leaving a trace of every tool call and result
2. **Reflect**: Deterministic analysis labels each search as productive or wasted (zero LLM cost)
3. **Evolve**: Productive queries promoted, dead ends culled, new priorities discovered
4. **Adapt**: Next run reads the evolved strategy via tool, explores smarter
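
The Reflect and Evolve steps can be sketched as plain functions. The `SearchTrace` fields and the labelling rule (a search is productive if it returned results that were used in the profile) are assumptions for this example, and the trace values are made up rather than taken from real runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchTrace:
    query: str
    result_count: int        # how many events the search returned
    cited_in_profile: bool   # whether any result reached the final profile


def reflect(trace: list[SearchTrace]) -> dict[str, str]:
    """Label each query deterministically, with no LLM call."""
    labels = {}
    for step in trace:
        productive = step.result_count > 0 and step.cited_in_profile
        labels[step.query] = "productive" if productive else "wasted"
    return labels


def evolve(strategy: list[str], labels: dict[str, str]) -> list[str]:
    """Cull wasted queries, then append newly productive ones."""
    kept = [q for q in strategy if labels.get(q) != "wasted"]
    promoted = [
        q for q, label in labels.items()
        if label == "productive" and q not in kept
    ]
    return kept + promoted


strategy_v0 = ["ProjectX", "OldRepo"]  # hypothetical starting queries
trace = [
    SearchTrace("ProjectX", result_count=12, cited_in_profile=True),
    SearchTrace("OldRepo", result_count=0, cited_in_profile=False),
    SearchTrace("memory", result_count=40, cited_in_profile=True),
]
labels = reflect(trace)
strategy_v1 = evolve(strategy_v0, labels)
```

In this toy run the dead-end query is dropped and the concept query discovered mid-run is promoted, so the next run starts from `strategy_v1`.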

**12 runs. Real data. The system learned.**

| Strategy | Runs | Key Searches | Peak Score |
|----------|------|-------------|------------|
| v0 (baseline) | 1-3 | project names: Syke, Pogu, ALMA | 88.7% |
| v1 (concepts) | 4-6 | concepts: memory, federated, PersonaMem | **94.3%** |
| v2 (entities) | 7-9 | entities: wizard, Persona, Eder | 91.2% |
| v3 (refined) | 10-12 | refined ranking, entity relationships | 92.8% |

**Key discovery**: searching for *concepts* beats searching for *project names*. Strategy v1 found deeper cross-platform connections because "memory" appears across ChatGPT research, Claude Code implementation, and GitHub commits — while "Syke" only appears where the project is explicitly named.

Total cost: $8.07 across 12 runs. Peak quality: 94.3% at $0.60/run — 67% cheaper than the $1.80 legacy baseline.
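
The savings figure follows directly from the two per-run costs. A quick check, using only the numbers quoted above:

```python
legacy_cost = 1.80   # USD per run, legacy baseline
best_cost = 0.60     # USD per run, best strategy
total_cost = 8.07    # USD across all runs
runs = 12

savings = 1 - best_cost / legacy_cost      # fraction saved per run
mean_cost_per_run = total_cost / runs      # average over all 12 runs

print(f"savings vs legacy: {savings:.0%}")
print(f"mean cost per run: ${mean_cost_per_run:.2f}")
```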

### Federated Push/Pull

Any MCP client can *read* your context (pull) and *contribute* new events back (push). Your Claude Code session logs what you're building. Your Cursor session adds context. The identity grows from every tool.

### Continuous Sync

The daemon syncs every 15 minutes, runs incremental profile updates, and skips the update when nothing has changed. Your identity drifts with you: what's true about you on Monday may not be true on Friday.
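
One way to implement "skip when nothing changed" is to fingerprint the collected events and compare against the previous cycle. This is a minimal sketch under that assumption; `SyncState`, `fingerprint`, and the hashing approach are illustrative choices, not a description of how the daemon actually detects change.

```python
import hashlib
import json

SYNC_INTERVAL_SECONDS = 15 * 60  # the 15-minute cadence described above


def fingerprint(events: list[dict]) -> str:
    """Stable hash of the collected events, used to detect change."""
    payload = json.dumps(events, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class SyncState:
    def __init__(self) -> None:
        self.last_fingerprint: str | None = None
        self.profile_updates = 0

    def sync_once(self, events: list[dict]) -> bool:
        """Run one sync cycle; return True if the profile was updated."""
        current = fingerprint(events)
        if current == self.last_fingerprint:
            return False  # nothing changed, so skip the perception step
        self.last_fingerprint = current
        self.profile_updates += 1  # placeholder for an incremental update
        return True


state = SyncState()
first = state.sync_once([{"id": 1, "source": "github"}])
second = state.sync_once([{"id": 1, "source": "github"}])  # identical data
third = state.sync_once([{"id": 1, "source": "github"}, {"id": 2, "source": "chatgpt"}])
```

A scheduler would call `sync_once` every `SYNC_INTERVAL_SECONDS`; only the first and third cycles here trigger an update.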

### Memory Threading

Active threads track what you're working on *across* platforms. A GitHub commit about "auth refactor" + a ChatGPT research thread on "JWT vs session tokens" + a Claude Code session implementing the change = one coherent thread with cross-platform signals.

```json
{
  "name": "Syke Hackathon",
  "intensity": "high",
  "platforms": ["github", "chatgpt"],
  "recent_signals": [
    "Multiple commits today",
    "ChatGPT conversations about architecture"
  ]
}
```

The perception agent discovers these connections by cross-referencing topics across all platforms — it's not hard-coded.
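
To make the thread shape concrete, the sketch below merges per-platform signals that share a topic into thread objects like the JSON above. In Syke the agent discovers the topic; here each signal arrives pre-labelled with a hypothetical `topic` key, and the two-platform threshold is an assumption. The `intensity` field is omitted because its scoring rule is not described here.

```python
from collections import defaultdict


def build_threads(signals: list[dict]) -> list[dict]:
    """Group signals by topic; keep topics seen on two or more platforms."""
    by_topic: dict[str, list[dict]] = defaultdict(list)
    for signal in signals:
        by_topic[signal["topic"]].append(signal)

    threads = []
    for topic, items in by_topic.items():
        platforms = sorted({item["platform"] for item in items})
        if len(platforms) < 2:
            continue  # single-platform activity is not a cross-platform thread
        threads.append({
            "name": topic,
            "platforms": platforms,
            "recent_signals": [item["summary"] for item in items],
        })
    return threads


# Made-up signals mirroring the auth-refactor example above.
signals = [
    {"topic": "auth refactor", "platform": "github", "summary": "Commit: auth refactor"},
    {"topic": "auth refactor", "platform": "chatgpt", "summary": "Research: JWT vs session tokens"},
    {"topic": "auth refactor", "platform": "claude-code", "summary": "Session implementing the change"},
    {"topic": "gardening", "platform": "chatgpt", "summary": "Tomato planting schedule"},
]
threads = build_threads(signals)
```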

## MCP Server

8 tools via the Model Context Protocol. The `ask()` tool is the recommended entry point — it uses agentic reasoning to explore the timeline and synthesize answers.

```python
ask("What is the user working on?")
ask("What did they do last week?")
ask("What do they think about AI agents?")
```

Core data tools: `get_profile`, `query_timeline`, `search_events`, `get_manifest`, `push_event`, `push_events`.

## Benchmarks

All methods produce the same `UserProfile` schema. Tested on 3,225 events across ChatGPT, Claude Code, and GitHub:

| | Legacy | Agentic v1 | Multi-Agent v2 | Meta-Best |
|---|-------:|----------:|---------------:|---:|
| **Cost** | $1.80 | $0.71 | $1.04 | **$0.60** |
| **Eval score** | -- | -- | -- | **94.3%** |
| Source coverage | 100% | 67% | 100% | 100%* |
| Cross-platform threads | 2 | 1 | 2 | 4 |
| Identity anchor | 660ch | 411ch | 637ch | 819ch |
| Wall time | 119s | 160s | 225s | 189s |
| API turns | 1 | 13 | 13 | 12 |

**Meta-Best Per-Dimension Breakdown (Run 5):**

| Dimension | Score | Detail |
|-----------|------:|--------|
| Thread quality | 61% | 6 threads, 4 cross-platform, high specificity |
| Identity anchor | 78% | 819 chars, deep and specific |
| Voice patterns | 100% | Rich tone, 5 vocab notes, 6 examples |
| Source coverage | 100% | 3/3 platforms |
| Completeness | 100% | All fields populated |
| Recent detail | 100% | 1,304 chars, 10 temporal markers |
| **Composite** | **94.3%** | Weighted average |

## Architecture

### Why SQLite over vector DB?

Syke doesn't need semantic search at the storage layer — that's the LLM's job during perception. SQLite with WAL mode gives concurrent reads, ACID transactions, zero infrastructure. Semantic understanding happens in Opus's thinking, not in the database.
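
A minimal sketch of this storage choice using Python's built-in `sqlite3` module. The `events` table layout and the sample rows are assumptions for illustration; Syke's real schema may differ.

```python
import os
import sqlite3
import tempfile

# WAL requires a file-backed database, so use a temporary file.
db_path = os.path.join(tempfile.mkdtemp(), "timeline.db")
conn = sqlite3.connect(db_path)

# In WAL mode, readers are not blocked by an active writer.
journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

conn.execute(
    """
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        source TEXT NOT NULL,
        timestamp TEXT NOT NULL,
        content TEXT NOT NULL
    )
    """
)
with conn:  # commits on success, rolls back on error
    conn.executemany(
        "INSERT INTO events (source, timestamp, content) VALUES (?, ?, ?)",
        [
            ("github", "2026-02-10T09:00:00Z", "commit: auth refactor"),
            ("chatgpt", "2026-02-10T10:30:00Z", "JWT vs session tokens"),
        ],
    )

# Plain keyword lookup; semantic interpretation is left to the model.
rows = conn.execute(
    "SELECT source FROM events WHERE content LIKE ? ORDER BY timestamp",
    ("%auth%",),
).fetchall()
conn.close()
```

The database only needs to answer filtered, ordered lookups; deciding that the two rows belong to the same thread is left to perception.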

### Why Agent SDK over raw API calls?

The SDK provides hooks (`PreToolUse`, `PostToolUse`), sub-agent delegation, and structured tool definitions. With raw API calls, the coverage gate would require building a custom orchestration loop from scratch. With the SDK, it's a single `PermissionResultDeny` return.

### Why one event per session?

Sessions are the natural unit of intent. A Claude Code session about "refactoring auth" has 50+ messages but represents one activity. Storing one event per message would bloat the timeline roughly 50x.
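
The collapse from messages to session events might look like the sketch below. The `Message` fields and the shape of the resulting event dict are hypothetical, chosen only to show the reduction in row count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    session_id: str
    timestamp: str  # ISO 8601, so string order matches time order
    text: str


def collapse_sessions(messages: list[Message]) -> list[dict]:
    """Fold each session's messages into a single timeline event."""
    sessions: dict[str, list[Message]] = {}
    for message in messages:
        sessions.setdefault(message.session_id, []).append(message)

    events = []
    for session_id, items in sessions.items():
        ordered = sorted(items, key=lambda m: m.timestamp)
        events.append({
            "session_id": session_id,
            "started_at": ordered[0].timestamp,
            "ended_at": ordered[-1].timestamp,
            "message_count": len(ordered),
            "content": "\n".join(m.text for m in ordered),
        })
    return events


# 50 messages in one session plus 2 in another become 2 events, not 52.
messages = [
    Message("refactor-auth", f"2026-02-10T09:{i:02d}:00Z", f"message {i}")
    for i in range(50)
] + [
    Message("fix-typo", "2026-02-11T14:00:00Z", "found it"),
    Message("fix-typo", "2026-02-11T14:01:00Z", "fixed"),
]
events = collapse_sessions(messages)
```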

### Why content filtering?

Privacy by design, not afterthought. Credentials and private messages never enter the timeline. Content that never enters SQLite can never be sent to an LLM.

### Why 4 output formats?

Different consumers: JSON for programs, Markdown for humans, CLAUDE.md for Claude Code projects, USER.md for portable identity.

## Privacy

**Local storage**: All data stays in `~/.syke/data/{user}/syke.db`. Nothing is uploaded except during perception (Anthropic API, under their [data policy](https://www.anthropic.com/privacy)).

**Content filtering**: Pre-collection filter strips credentials and private messaging content before events enter SQLite.

**Consent tiers**: Public sources (GitHub) need no consent. Private sources (Claude Code, ChatGPT, Gmail) require the `--yes` flag.

## Supported Platforms

| Platform | Status | Method | Data Captured |
|----------|--------|--------|---------------|
| Claude Code | Working | Local JSONL parsing | Sessions, tools, projects, git branches |
| ChatGPT | Working | ZIP export parsing | Conversations, topics, timestamps |
| GitHub | Working | REST API | Repos, commits, issues, PRs, stars, READMEs |
| Gmail | Working | OAuth API | Subjects, snippets, labels, sent patterns |
| Twitter/X | Stub | -- | Adapter stubbed, not implemented |
| YouTube | Stub | -- | Adapter stubbed, not implemented |

---

[Docs](https://syke-docs.vercel.app) · [Demo](https://syke-ai.vercel.app) · [PyPI](https://pypi.org/project/syke/) · 272 tests · MIT

*Built for the [Claude Code Hackathon](https://docs.anthropic.com/en/docs/claude-code), Feb 2026. By [Utkarsh Saxena](https://github.com/saxenauts).*
