Eight specialized agents. Three human approval gates. One CLI. The builder never grades its own work — a separate, skeptical evaluator does.
Standalone Python CLI. Any LLM provider. No Claude Code required.
The Headline Feature
Start the daemon on your workstation. Open the dashboard on your phone. Type a product idea. Hit "Forge it." Go to bed. The pipeline runs headlessly — PRD, plan, build, evaluate, document. When a gate needs your approval, you get a Slack notification. Tap approve. Wake up to a built, tested, documented codebase.
Zero infrastructure. File-based queue. Python stdlib HTTP server. No React, no build step, no npm.
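A file-based queue can be as small as a directory of JSON files. The sketch below is one possible shape, not ProductTeam's actual implementation: the `pending/` and `running/` layout, the job fields, and the function names `enqueue` and `claim_next` are all assumptions made for illustration.

```python
import json
import os
import time
import uuid
from pathlib import Path
from typing import Optional


def enqueue(queue_dir: Path, idea: str) -> Path:
    """Write one job as a JSON file under queue_dir/pending (hypothetical layout)."""
    pending = queue_dir / "pending"
    pending.mkdir(parents=True, exist_ok=True)
    # A time-based prefix keeps file names roughly in submission order.
    job_id = f"{time.time_ns()}-{uuid.uuid4().hex[:8]}"
    job = {"id": job_id, "idea": idea}
    tmp = pending / f".{job_id}.tmp"
    tmp.write_text(json.dumps(job), encoding="utf-8")
    final = pending / f"{job_id}.json"
    # Rename into place so a reader never sees a half-written job file.
    os.replace(tmp, final)
    return final


def claim_next(queue_dir: Path) -> Optional[dict]:
    """Move the oldest pending job into queue_dir/running and return its contents."""
    pending = queue_dir / "pending"
    running = queue_dir / "running"
    running.mkdir(parents=True, exist_ok=True)
    if not pending.is_dir():
        return None
    for path in sorted(pending.glob("*.json")):
        target = running / path.name
        try:
            os.replace(path, target)
        except FileNotFoundError:
            continue  # another worker claimed this job first
        return json.loads(target.read_text(encoding="utf-8"))
    return None
```

The rename is what makes this workable without a database or broker: a job is either fully in `pending/` or fully in `running/`, and the files themselves are the audit trail.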
Most AI coding tools let a single agent build something and then declare it done. That's a student grading their own exam.
ProductTeam separates builder from judge. The Builder writes code and says "ready for review." The Evaluator, a separate agent with a separate prompt and skeptical by default, reads the source, runs the tests, and tries to break things. It grades each sprint PASS, NEEDS_WORK, or FAIL.
If the grade is NEEDS_WORK, findings route back to the Builder automatically, for a maximum of 3 loops. A sprint that still has not passed after loop 3 points to a wrong plan, not a wrong implementation. The Builder can never ship its own code.
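The build and evaluate loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's code: `run_sprint`, the `build` and `evaluate` callables, and the choice to treat FAIL as terminal are hypothetical. Only the three grades and the limit of 3 loops come from the text.

```python
from enum import Enum
from typing import Callable, List, Tuple

MAX_LOOPS = 3  # from the text: maximum 3 loops


class Grade(Enum):
    PASS = "PASS"
    NEEDS_WORK = "NEEDS_WORK"
    FAIL = "FAIL"


def run_sprint(
    build: Callable[[List[str]], None],
    evaluate: Callable[[], Tuple[Grade, List[str]]],
) -> Tuple[Grade, int]:
    """Alternate builder and evaluator; return the final grade and loops used."""
    findings: List[str] = []
    for loop in range(1, MAX_LOOPS + 1):
        build(findings)               # builder only ever hands work over for review
        grade, findings = evaluate()  # a separate judge decides the grade
        if grade is not Grade.NEEDS_WORK:
            # Assumption: both PASS and FAIL end the loop.
            return grade, loop
    # Still NEEDS_WORK after the last loop: the caller should revisit the plan.
    return Grade.NEEDS_WORK, MAX_LOOPS
```

Note that `build` has no return value that can end the loop. The only way out with a PASS is through `evaluate`, which mirrors the rule that the Builder cannot ship its own code.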
| Other AI Tools | ProductTeam |
|---|---|
| Agent self-evaluates | Separate skeptical judge |
| "Done" when builder says so | "Done" only when Evaluator grades PASS |
| State in conversation memory | State in files that persist across sessions |
| All agents or nothing | Drop in only the skills you need |
| Complex setup | pip install and run |
| One quality standard | Code evaluator + design evaluator |
The Pipeline
Eight specialized agents pass structured artifacts through a pipeline with three human approval gates.
Human in the Loop
The pipeline runs automatically between gates. It pauses exactly three times for you to confirm intent, scope, and readiness to ship.
Commitments
These aren't marketing claims. They're architectural constraints enforced by the code.
The Doc Writer is a doer stage — it reads every source file via read_file before writing documentation. If a function doesn't exist in the code, it doesn't appear in the docs. No hallucinated APIs. No invented features.
Only the Evaluator can grade a sprint PASS. The Builder declares "ready for review" — never "done." This is the GAN-inspired insight: separate the generator from the discriminator.
state.json is written on every state change. Crash, timeout, or Ctrl+C at any point — productteam run resumes from exactly where you left off. Passed sprints are skipped.
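Resume-from-file behavior of this kind usually comes down to two habits: write the state file atomically after every change, and skip anything already marked as passed on the next run. The sketch below assumes a flat `{sprint_name: grade}` schema and the helper names `load_state`, `save_state`, and `run_sprints`; the real `state.json` format is not documented here and may differ.

```python
import json
import os
from pathlib import Path
from typing import Callable, Dict, List


def load_state(path: Path) -> Dict[str, str]:
    """Return the saved state, or an empty dict on a first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def save_state(path: Path, state: Dict[str, str]) -> None:
    """Write to a temp file, then rename, so a crash never leaves a torn file."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    os.replace(tmp, path)


def run_sprints(
    path: Path, sprints: List[str], run_one: Callable[[str], str]
) -> List[str]:
    """Run each sprint not yet graded PASS; return the names actually executed."""
    state = load_state(path)
    executed: List[str] = []
    for name in sprints:
        if state.get(name) == "PASS":
            continue  # passed sprints are skipped on resume
        state[name] = run_one(name)
        save_state(path, state)  # persist after every state change
        executed.append(name)
    return executed
```

If the process dies inside `run_one`, the file on disk still reflects the last completed sprint, so the next invocation picks up at the interrupted one.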
Sensitive environment variables (*_KEY, *_TOKEN, *_SECRET) are stripped from the subprocess environment before run_bash executes. The Builder writes Python and runs tests — it doesn't need your credentials.
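Stripping credentials before spawning a subprocess takes only the standard library. The following is a sketch of the idea rather than the project's code: the three patterns come from the text, while `scrubbed_env`, `run_scrubbed`, and the case-insensitive matching are assumptions.

```python
import fnmatch
import os
import subprocess
from typing import Dict, List, Mapping, Optional

# Patterns named in the text above.
SENSITIVE_PATTERNS = ("*_KEY", "*_TOKEN", "*_SECRET")


def scrubbed_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment without variables matching the patterns."""
    source = os.environ if env is None else env
    return {
        name: value
        for name, value in source.items()
        # Assumption: compare in upper case so api_key is caught as well.
        if not any(fnmatch.fnmatchcase(name.upper(), p) for p in SENSITIVE_PATTERNS)
    }


def run_scrubbed(argv: List[str], timeout: int = 60) -> subprocess.CompletedProcess:
    """Run a command with the scrubbed environment (hypothetical helper)."""
    return subprocess.run(
        argv,
        env=scrubbed_env(),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

Passing `env=` replaces the child's environment entirely, so anything filtered out here is invisible to the command. Variables such as `PATH` survive because they match none of the patterns.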
Doer agents get four tools: read_file, write_file, run_bash, and list_dir. A narrow tool surface means more predictable behavior and a smaller attack surface than frameworks with dozens of tools.
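A narrow tool surface is easy to enforce with an allowlist: anything the model asks for that is not in the table is rejected before it runs. In the sketch below only the four tool names come from the text. The function bodies, their signatures, the `TOOLS` table, and `call_tool` are assumptions for illustration.

```python
import subprocess
from pathlib import Path
from typing import Any, Callable, Dict, List


def read_file(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


def write_file(path: str, content: str) -> str:
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return f"wrote {len(content)} characters to {path}"


def list_dir(path: str) -> List[str]:
    return sorted(entry.name for entry in Path(path).iterdir())


def run_bash(command: str, timeout: int = 60) -> str:
    # Requires bash on PATH; output is returned to the agent as plain text.
    result = subprocess.run(
        ["bash", "-c", command], capture_output=True, text=True, timeout=timeout
    )
    return result.stdout + result.stderr


# The complete tool surface: four entries and nothing else.
TOOLS: Dict[str, Callable[..., Any]] = {
    "read_file": read_file,
    "write_file": write_file,
    "run_bash": run_bash,
    "list_dir": list_dir,
}


def call_tool(name: str, **arguments: Any) -> Any:
    """Dispatch a tool call, refusing any name outside the allowlist."""
    if name not in TOOLS:
        raise PermissionError(f"tool not allowed: {name}")
    return TOOLS[name](**arguments)
```

With a table this small, the full set of side effects an agent can cause is visible on one screen, which is the practical meaning of a smaller attack surface.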
Each agent is a standalone markdown skill file. Want just the Evaluator as a QA agent? Just the PRD Writer as a thinking tool? Drop in the skills you need. Skip the rest.
The Team
Each skill is a markdown file. Readable, editable, replaceable.
60 Seconds
Install, init, run. Three commands to your first pipeline.
Fit
ProductTeam is an opinionated, auditable idea-to-code operating system for small software teams.
You can describe a product but want structured, auditable AI execution instead of chatting with a coding assistant. ProductTeam gives you a delivery pipeline, not a conversation partner.
You want PRD → Sprint → Build → Evaluate → Document → Ship with human gates at every strategic decision point. ProductTeam encodes a software delivery doctrine you can trust.
The evaluator loop is the difference between "the AI said it's done" and "the AI proved it works." If you've been burned by hallucinated features or rubber-stamped tests, this is for you.