Turn a product concept into shipping code. PRD, plan, build, evaluate, document, ship.
Standalone CLI · Any LLM Provider
ProductTeam is a standalone Python CLI. No Claude Code required. Works with Anthropic, OpenAI, Ollama (free, local), or Gemini. Just pip install and run.
The Pipeline
Eight specialized agents pass structured artifacts through a pipeline with three human approval gates and a build-evaluate loop that catches bugs the builder misses.
The Team
Each skill is a markdown file. Drop in the ones you need, skip the ones you don't.
How It Works
The pipeline runs automatically between gates. You only stop three times to confirm intent, scope, and readiness to ship.
Architecture
Six layers, from your command to files on disk. Every component has one job.
The brain of the system. When you run productteam run "concept", the Supervisor takes over. It reads your config, loads pipeline state from state.json, and launches each stage in sequence.
It enforces three approval gates (PRD, Sprint, Ship) where it pauses for your input. Between gates, it runs autonomously. It manages the build-evaluate loop: Builder produces code, Evaluator grades it, and if the verdict is NEEDS_WORK, the Supervisor feeds the feedback back to the Builder and loops (max 3 times). If an agent gets stuck (timeout, infinite loop, max tool calls), the Supervisor detects it and escalates to you with full context. On every state change, it writes state.json so you can resume later with just productteam run. If a stage gets stuck, productteam recover resets it and re-enters the pipeline.
PRD Writer and Design Evaluator are thinkers. They take context in and produce a text artifact out. One LLM call: system prompt (the SKILL.md file), user message (the context), response (the artifact).
Because they don't need to touch the filesystem, they work with any provider — Anthropic, OpenAI, Ollama, Gemini. The Supervisor calls provider.complete(system, messages) and writes the result to .productteam/.
Planner, Builder, UI Builder, Evaluator, and Doc Writer are doers. They need to read files, write code, run tests, and react to results. A single LLM call can't do that. They use an agentic tool-use loop.
The loop: call LLM with 4 tools (read_file, write_file, run_bash, list_dir). The Planner writes sprint YAML files directly to .productteam/sprints/. The Evaluator reads source files and runs the test suite. The Doc Writer reads every module before writing documentation. Path validation blocks traversal attacks. Command validation blocks credential access. Loop detection catches the LLM calling the same tool with identical args 3 times.
The tool loop is deliberately minimal. Four tools, no more:
Forge is three components working together. The queue (queue.py) is file-based — each job is a directory with job.json, gate.json, and log.txt. Zero infrastructure, no database. The daemon (daemon.py) polls the queue every 10 seconds. When it finds a new job, it creates a project directory, runs init, and starts the Supervisor with auto-approve. At gates, it writes gate.json and sends a webhook notification.
The dashboard (dashboard.py) is a single-page app served by Python's stdlib http.server at 0.0.0.0:7654 — accessible from any device on your local network. It shows all jobs, live log tailing, approve/reject buttons, and a submit form. Open the dashboard on your phone, type a product idea, hit “Forge it,” and the daemon picks it up. No CLI needed. No framework, no build step.
Every LLM call goes through the provider layer. An abstract LLMProvider base class defines two methods: complete() for thinker stages and complete_with_tools() for doer stages. Four implementations: Anthropic (SDK), OpenAI-compatible (httpx), Ollama (native API), Gemini (REST).
The factory function get_provider(config) reads your productteam.toml and returns the right provider. API keys come from environment variables only — never from config files, never logged, never in state.json.
productteam doctor checks 8 things: Python version, package version, productteam.toml validity, .productteam/ directory, skills directory (all 8 present), provider configuration and API key, forge queue health, and disk space. It prints the thinker/doer note unconditionally — that's the thing users hit first.
Exit code 0 if everything passes. Exit code 1 if anything critical is wrong. --json flag outputs machine-readable results for scripting. --no-network skips API reachability checks.
The Supervisor writes state.json on every state change. It records which stages are complete, which sprint is being built, what loop iteration the evaluator is on, and when things last changed. When you run productteam run without a concept, the Supervisor reads state.json and resumes from exactly where it left off. Sprints that already passed evaluation are skipped. The --rebuild flag forces a full rebuild.
Why This Approach
The core insight: separate the builder from the judge.
| Other Multi-Agent Systems | ProductTeam |
|---|---|
| Agents self-evaluate | Separate skeptical judge that assumes code is broken |
| "Done" when builder says so | "Done" only when Evaluator grades PASS |
| State in conversation memory | State in structured YAML files that persist across sessions |
| All agents or nothing | Drop in only the skills you need |
| Complex setup (databases, hooks, scripts) | Just markdown files in a directory |
| Single quality standard | Two evaluators: code quality and visual quality |
Proven Results
Built and verified across 7 Python packages in a real monorepo.
60 Seconds
Install, init, run. Three commands to your first pipeline.
CLI
Every command you need.
Forge
Submit an idea from anywhere — your phone, a browser, the CLI, or a GitHub Issue. The dashboard at http://<your-ip>:7654 has a submit form, live status, and gate approval buttons. All from your phone's browser.
Standalone Skill
Grades visual artifacts like a senior designer with 15 years of experience. Four scoring dimensions, evidence-based feedback, no vague criticism.
4.0+ = PASS · 3.0-3.9 = NEEDS_WORK · Below 3.0 = FAIL
Works With
Thinker stages work with any provider. Configure once in productteam.toml.