GitHub

ProductTeam

Turn a product concept into shipping code. PRD, plan, build, evaluate, document, ship.

Standalone CLI · Any LLM Provider

ProductTeam is a standalone Python CLI. No Claude Code required. Works with Anthropic, OpenAI, Ollama (free, local), or Gemini. Just pip install and run.

$ pip install productteam
$ productteam run "I want a CLI tool that estimates API costs"

From idea to shipped product

Eight specialized agents pass structured artifacts through a pipeline with three human approval gates and a build-evaluate loop that catches bugs the builder misses.

PRD Writer
Product Manager
Planner
Tech Lead
max 3 loops
Builder
Engineer
Evaluator
QA Engineer
Doc Writer
Technical Writer
Ship
Done

8 specialized skills

Each skill is a markdown file. Drop in the ones you need, skip the ones you don't.

prd-writer
Product Manager
Takes a concept, asks clarifying questions with sensible defaults, researches competitors, produces a structured PRD.
planner
Tech Lead
Reads PRD, decomposes into sprint contracts with testable acceptance criteria. Never writes code.
builder
Engineer
Implements sprint contracts with production-quality code and tests. Declares "ready for review" -- never "done."
ui-builder
Frontend Engineer
Specialized builder for visual work. Landing pages, dashboards, web UIs. Dark theme, responsive, WCAG AA.
evaluator
QA Engineer
Skeptical by default. Runs tests, verifies acceptance criteria, tries to break things. Grades PASS / NEEDS_WORK / FAIL.
evaluator-design
Design Reviewer
Grades visual artifacts on four criteria: Coherence, Originality, Craft, and Functionality. 1-5 scale.
doc-writer
Technical Writer
Reads code, never fabricates features. Produces README, landing page, PDF, changelog with real data only.
orchestrator
Project Manager
Routes work between agents, manages build-evaluate loops (max 3), enforces approval gates, writes handoff artifacts.

Three approval gates

The pipeline runs automatically between gates. You only stop three times to confirm intent, scope, and readiness to ship.

Gate 1
PRD Approval
"Does this capture your intent?" Review the PRD before planning begins. Revise until it matches your vision.
Gate 2
Sprint Approval
"Does this scope look right?" Review sprint contracts and acceptance criteria before building starts.
Gate 3
Ship Approval
"Ready to commit/push/publish?" All evaluations have passed. Review the final state before shipping.

The Build-Evaluate Loop

  • Builder implements the sprint contract, then declares "ready for review"
  • Evaluator runs tests, verifies acceptance criteria, tries to break things
  • PASS: sprint is complete, move to documentation
  • NEEDS_WORK: automatically routes back to Builder with findings
  • FAIL: escalate to human immediately
  • Maximum 3 loops. After loop 3, the plan is wrong, not the implementation

How it really works

Six layers, from your command to files on disk. Every component has one job.

ProductTeam v2.0 Architecture Diagram
supervisor.py Orchestration

The brain of the system. When you run productteam run "concept", the Supervisor takes over. It reads your config, loads pipeline state from state.json, and launches each stage in sequence.

It enforces three approval gates (PRD, Sprint, Ship) where it pauses for your input. Between gates, it runs autonomously. It manages the build-evaluate loop: Builder produces code, Evaluator grades it, and if the verdict is NEEDS_WORK, the Supervisor feeds the feedback back to the Builder and loops (max 3 times). If an agent gets stuck (timeout, infinite loop, max tool calls), the Supervisor detects it and escalates to you with full context. On every state change, it writes state.json so you can resume later with just productteam run. If a stage gets stuck, productteam recover resets it and re-enters the pipeline.

Thinker Stages Any Provider

PRD Writer and Design Evaluator are thinkers. They take context in and produce a text artifact out. One LLM call: system prompt (the SKILL.md file), user message (the context), response (the artifact).

Because they don't need to touch the filesystem, they work with any provider — Anthropic, OpenAI, Ollama, Gemini. The Supervisor calls provider.complete(system, messages) and writes the result to .productteam/.

Doer Stages Tool-Use Loop

Planner, Builder, UI Builder, Evaluator, and Doc Writer are doers. They need to read files, write code, run tests, and react to results. A single LLM call can't do that. They use an agentic tool-use loop.

The loop: call LLM with 4 tools (read_file, write_file, run_bash, list_dir). The Planner writes sprint YAML files directly to .productteam/sprints/. The Evaluator reads source files and runs the test suite. The Doc Writer reads every module before writing documentation. Path validation blocks traversal attacks. Command validation blocks credential access. Loop detection catches the LLM calling the same tool with identical args 3 times.

tool_loop.py Agentic Runtime

The tool loop is deliberately minimal. Four tools, no more:

read_file
Read any file in the project directory. Path must be relative, no traversal.
write_file
Write content to a file. Creates parent directories. Same path restrictions.
run_bash
Run a shell command. Timeout per call. Blocks .ssh, .aws, credential paths.
list_dir
List files and directories. Shows [FILE] and [DIR] prefixes.
forge/ Phone to Product

Forge is three components working together. The queue (queue.py) is file-based — each job is a directory with job.json, gate.json, and log.txt. Zero infrastructure, no database. The daemon (daemon.py) polls the queue every 10 seconds. When it finds a new job, it creates a project directory, runs init, and starts the Supervisor with auto-approve. At gates, it writes gate.json and sends a webhook notification.

The dashboard (dashboard.py) is a single-page app served by Python's stdlib http.server at 0.0.0.0:7654 — accessible from any device on your local network. It shows all jobs, live log tailing, approve/reject buttons, and a submit form. Open the dashboard on your phone, type a product idea, hit “Forge it,” and the daemon picks it up. No CLI needed. No framework, no build step.

providers/ LLM Abstraction

Every LLM call goes through the provider layer. An abstract LLMProvider base class defines two methods: complete() for thinker stages and complete_with_tools() for doer stages. Four implementations: Anthropic (SDK), OpenAI-compatible (httpx), Ollama (native API), Gemini (REST).

The factory function get_provider(config) reads your productteam.toml and returns the right provider. API keys come from environment variables only — never from config files, never logged, never in state.json.

doctor.py Diagnostics

productteam doctor checks 8 things: Python version, package version, productteam.toml validity, .productteam/ directory, skills directory (all 8 present), provider configuration and API key, forge queue health, and disk space. It prints the thinker/doer note unconditionally — that's the thing users hit first.

Exit code 0 if everything passes. Exit code 1 if anything critical is wrong. --json flag outputs machine-readable results for scripting. --no-network skips API reachability checks.

state.json Pipeline State

The Supervisor writes state.json on every state change. It records which stages are complete, which sprint is being built, what loop iteration the evaluator is on, and when things last changed. When you run productteam run without a concept, the Supervisor reads state.json and resumes from exactly where it left off. Sprints that already passed evaluation are skipped. The --rebuild flag forces a full rebuild.

What makes this different

The core insight: separate the builder from the judge.

Other Multi-Agent Systems ProductTeam
Agents self-evaluate Separate skeptical judge that assumes code is broken
"Done" when builder says so "Done" only when Evaluator grades PASS
State in conversation memory State in structured YAML files that persist across sessions
All agents or nothing Drop in only the skills you need
Complex setup (databases, hooks, scripts) Just markdown files in a directory
Single quality standard Two evaluators: code quality and visual quality

Tested on prompttools

Built and verified across 7 Python packages in a real monorepo.

755
tests written and verified
7
bugs caught by Evaluators
6
packages passed review
3
max loops per sprint

Getting Started

Install, init, run. Three commands to your first pipeline.

# Install
pip install productteam

# Set up your provider
export ANTHROPIC_API_KEY=sk-ant-...

# Init a project
productteam init

# Run the full pipeline
productteam run "a CLI tool that estimates LLM API costs"

Commands

Every command you need.

# Run the pipeline
productteam run "concept"
productteam run # resume
productteam recover # unstick
productteam run --auto-approve
# Forge: phone to product
productteam forge "idea"
productteam forge --listen
productteam forge status
# Setup & diagnostics
productteam init
productteam doctor
productteam status
# Configuration
productteam config
productteam config set pipeline.provider ollama
productteam config set pipeline.model llama3

Phone to product

Submit an idea from anywhere — your phone, a browser, the CLI, or a GitHub Issue. The dashboard at http://<your-ip>:7654 has a submit form, live status, and gate approval buttons. All from your phone's browser.

1
Submit
From your phone's browser, CLI, or GitHub Issue
2
Daemon runs
PRD, plan, build, evaluate, document. Fully headless.
3
Approve gates
Get notified via webhook or Slack. Approve from your phone's browser or CLI.
4
Product ready
Code written, tests passing, docs generated. Ship it.

Design Evaluator

Grades visual artifacts like a senior designer with 15 years of experience. Four scoring dimensions, evidence-based feedback, no vague criticism.

30%
Coherence
Unified visual language
25%
Originality
Deliberate creative choices
25%
Craft
Typography, spacing, contrast
20%
Functionality
User understands in 10 sec

4.0+ = PASS  ·  3.0-3.9 = NEEDS_WORK  ·  Below 3.0 = FAIL

Multi-provider support

Thinker stages work with any provider. Configure once in productteam.toml.

Anthropic Claude
OpenAI GPT-4o
Ollama (local)
Google Gemini
LM Studio
vLLM