## 2026-03-14 - US-018
- What was implemented: End-to-end integration tests in tests/test_e2e.py. 4 tests covering: (1) full pipeline creates all 8 deliverable .md files, (2) metadata.json created with all required fields, (3) context accumulates correctly so stage N prompt contains outputs from stages 1..N-1, (4) missing output file triggers retry then abort with exit code 1.
- Files changed: tests/test_e2e.py (new), tasks/stories.json, tasks/progress.txt
- **Learnings for future iterations:**
  - metadata.json stores engine as key `engine` (not `engine_name`) — generate_metadata() takes `engine_name` param but writes `"engine": engine_name` in the dict
  - E2E mock engine pattern: use a closure over `output_dir` in `execute_side_effect(prompt, working_directory)` — `working_directory` is `repo_path`, so `out_path = os.path.join(working_directory, output_dir, filename)` correctly reconstructs the output path
  - Track call count with a mutable dict `{"n": 0}` in the closure to derive stage index from call number
  - Patch `threatsmith.main.get_engine` (not `threatsmith.engines.get_engine`) when testing via CLI — the CLI imports get_engine into its own module namespace
  - Real `detect_scanners()` and `generate_metadata()` can be used in E2E tests without mocking — scanners uses shutil.which() (safe), metadata falls back to 'unknown' for git failures
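
A minimal sketch of the mock-engine closure from the learnings above, using only the standard library. The two filenames and the `run_fake_pipeline` helper are hypothetical stand-ins (the real names come from `_STAGE_FILES` and the orchestrator); only the closure shape and the `{"n": 0}` counter follow the notes.

```python
import os
import tempfile
from unittest.mock import MagicMock

# Hypothetical deliverable names, for illustration only.
STAGE_FILENAMES = ["01-objectives.md", "02-technical-scope.md"]


def make_execute_side_effect(output_dir):
    """Return a fake engine.execute() that writes one deliverable per call."""
    calls = {"n": 0}  # mutable dict so the inner function can update the count

    def execute_side_effect(prompt, working_directory):
        filename = STAGE_FILENAMES[calls["n"]]  # stage index derived from call number
        calls["n"] += 1
        # working_directory is repo_path, so this rebuilds the expected output path.
        out_path = os.path.join(working_directory, output_dir, filename)
        os.makedirs(os.path.dirname(out_path), exist_ok=True)
        with open(out_path, "w", encoding="utf-8") as fh:
            fh.write(f"# fake output for {filename}\n")
        return 0  # simulate a successful engine run

    return execute_side_effect


def run_fake_pipeline(output_dir="threatmodel"):
    """Drive the mock once per stage and list the files it produced."""
    with tempfile.TemporaryDirectory() as repo_path:
        engine = MagicMock()
        engine.execute.side_effect = make_execute_side_effect(output_dir)
        for _ in STAGE_FILENAMES:
            engine.execute("prompt text", repo_path)
        return sorted(os.listdir(os.path.join(repo_path, output_dir)))
```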
---

## 2026-03-14 - US-017
- What was implemented: CLI interface in src/threatsmith/main.py using Typer. Positional `path` argument, `--engine` (default `claude-code`), `--business-objectives`, `--security-objectives`, `--output-dir` (default `threatmodel/`), `-v/--verbose`. Creates output directory with `os.makedirs(exist_ok=True)`, calls `detect_scanners()`, `generate_metadata()`, `write_metadata()` before pipeline, then instantiates `Orchestrator` and runs it. `raise SystemExit(orchestrator.run())` propagates exit code. Entry point `threatsmith = "threatsmith.main:main"` added to `[project.scripts]` in pyproject.toml.
- Files changed: src/threatsmith/main.py (new), tests/test_cli.py (new), pyproject.toml ([project.scripts] section added), tasks/stories.json
- **Learnings for future iterations:**
  - `typer.testing.CliRunner` is the right way to test Typer CLIs — use `runner.invoke(app, [...])` and check `result.exit_code` and `result.output`
  - `raise SystemExit(code)` propagates exit codes through Typer; `sys.exit()` also works but `raise SystemExit` is more explicit
  - Output dir should be resolved as `os.path.join(repo_path, output_dir)` — relative to the target repo, not CWD
  - Orchestrator receives `output_dir` as a relative string (not the absolute path); it joins with `repo_path` internally
  - `typer` is already a declared dependency in pyproject.toml
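
A rough, Typer-free sketch of two points above: the output directory is joined onto the target repo, and `raise SystemExit(code)` carries the pipeline's exit code to the caller. The `main` and `demo` functions here are hypothetical stand-ins, not the real CLI in src/threatsmith/main.py.

```python
import os
import tempfile


def resolve_output_dir(repo_path, output_dir):
    """Resolve output_dir against the target repo, not the current directory."""
    resolved = os.path.join(repo_path, output_dir)
    os.makedirs(resolved, exist_ok=True)  # safe to call when it already exists
    return resolved


def main(repo_path, output_dir="threatmodel/", pipeline_exit_code=0):
    """Stand-in for the CLI body: prepare the directory, then exit with the code."""
    resolve_output_dir(repo_path, output_dir)
    raise SystemExit(pipeline_exit_code)


def demo():
    """Return (directory was created, exit code seen by the caller)."""
    with tempfile.TemporaryDirectory() as repo_path:
        try:
            main(repo_path, pipeline_exit_code=1)
        except SystemExit as exc:
            created = os.path.isdir(os.path.join(repo_path, "threatmodel"))
            return created, exc.code
```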
---

## Codebase Patterns
- Orchestrator is a dataclass in src/threatsmith/orchestrator.py; stage→filename mapping in _STAGE_FILES dict at module level
- Do not write tests for string constants — test behavior and logic only
- Package is under src/threatsmith with hatchling build backend
- Tests go in tests/ at root level (configured via pyproject.toml testpaths)
- Use `uv run pytest` and `uv run ruff check --fix && uv run ruff format` for quality checks
- Engine abstraction uses ABC with abstract method returning int exit code
- Engine execute() takes (prompt: str, working_directory: str) — single assembled prompt, no system/user split
- get_engine(name) factory in engines/__init__.py maps string names to engine classes
- Stage prompt modules export STAGE_PROMPT constant + build_prompt(context: dict) -> str
- Use .replace() not .format() for prompt placeholders — avoids curly brace escaping issues in prompt text
- Stage 5 introduces `scanners_available` context key (list of scanner name strings); scanner snippets from scanner_snippets.py are conditionally injected via `{scanner_section}` placeholder
- When testing for absence of injected section headers, use `"## SECTION NAME"` (with ##) to avoid matching the phrase inside the prompt body text
- build_prompt context dict uses optional keys: business_objectives, security_objectives (str or None)
- Prior stage output injection uses XML-delimited format: `<prior_stages><stage_NN_name>...</stage_NN_name></prior_stages>`
- Prior stage context key convention: `stage_01_output` (raw markdown string from that stage's deliverable)
- OWASP conditional injection uses keyword lists (_API_KEYWORDS, _LLM_KEYWORDS, _MOBILE_KEYWORDS) matched case-insensitively against Stage 2 output; Web Top 10 is always included

## 2026-03-13 - US-004
- What was implemented: owasp_references.py with OWASP_WEB_TOP_10 (A01–A10), OWASP_API_TOP_10 (API1–API10), OWASP_LLM_TOP_10 (LLM01–LLM10), and OWASP_MOBILE_TOP_10 (M1–M10) as concise one-liner constants per PRD format. scanner_snippets.py with SEMGREP_SNIPPET, TRIVY_SNIPPET, GITLEAKS_SNIPPET, and SCANNER_SNIPPETS dict.
- Files changed: src/threatsmith/prompts/owasp_references.py (new), src/threatsmith/prompts/scanner_snippets.py (new), tasks/stories.json
- **Learnings for future iterations:**
  - Scanner CLI invocations aligned with v0.1.0: `semgrep scan --config auto --severity=ERROR --json --quiet --no-error`, `trivy fs --format json --severity CRITICAL,HIGH --scanners vuln --quiet --exit-code 0`, `gitleaks dir --report-format json --report-path ... --no-banner --redact=100`
  - The Gitleaks invocation carried over from v0.1.0 uses `gitleaks dir` (not `gitleaks detect --source`)
  - Skipped test_constants.py — no value in asserting string content; test logic, not data
---

## 2026-03-12 - US-001
- What was implemented: Package structure with src/threatsmith/__init__.py (version 0.2.0), engines/base.py (abstract Engine class), engines/__init__.py (exports Engine), prompts/__init__.py (empty), utils/__init__.py (empty), and tests/test_package.py
- Files changed: src/threatsmith/__init__.py, src/threatsmith/engines/__init__.py, src/threatsmith/engines/base.py, src/threatsmith/prompts/__init__.py, src/threatsmith/utils/__init__.py, tests/__init__.py, tests/test_package.py
- **Learnings for future iterations:**
  - The src layout is used with hatchling; imports use `from threatsmith.engines import Engine` (no `src.` prefix)
  - Empty __init__.py files are needed for the prompts and utils packages
---

## 2026-03-13 - US-003
- What was implemented: CodexEngine in src/threatsmith/engines/codex.py using `codex exec <prompt>` via subprocess; get_engine() factory in engines/__init__.py mapping 'claude-code' and 'codex' to their engine classes; tests for CodexEngine and get_engine() added to tests/test_engines.py; fixed outdated test_engine_execute_signature in tests/test_package.py to match the simplified prompt/working_directory interface
- Files changed: src/threatsmith/engines/codex.py (new), src/threatsmith/engines/__init__.py, tests/test_engines.py, tests/test_package.py, tasks/stories.json
- **Learnings for future iterations:**
  - Codex non-interactive mode uses `codex exec "<prompt>"` (analogous to claude's `-p` flag)
  - get_engine() factory uses a dict mapping names to classes, so adding a new engine only requires one dict entry
  - test_package.py had an outdated signature test expecting system_prompt/user_prompt; the real interface is prompt/working_directory (simplified in US-002)
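
The dict-based factory could look roughly like the sketch below. The `_ENGINES` name, the placeholder `execute()` bodies, returning an instance rather than the class, and raising `ValueError` for unknown names are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class Engine(ABC):
    @abstractmethod
    def execute(self, prompt: str, working_directory: str) -> int:
        """Run the agent and return its exit code."""


class ClaudeCodeEngine(Engine):
    def execute(self, prompt: str, working_directory: str) -> int:
        return 0  # placeholder; the real class shells out via subprocess


class CodexEngine(Engine):
    def execute(self, prompt: str, working_directory: str) -> int:
        return 0  # placeholder; the real class runs `codex exec <prompt>`


# Adding an engine means adding one entry here (dict name is hypothetical).
_ENGINES = {
    "claude-code": ClaudeCodeEngine,
    "codex": CodexEngine,
}


def get_engine(name: str) -> Engine:
    """Look up the engine class by name and instantiate it."""
    try:
        return _ENGINES[name]()
    except KeyError:
        # Assumed error behaviour; the log does not say how unknown names are handled.
        raise ValueError(f"Unknown engine: {name!r}") from None
```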
---

## 2026-03-12 - US-002
- What was implemented: ClaudeCodeEngine in src/threatsmith/engines/claude_code.py; exports added to engines/__init__.py; tests/test_engines.py with 3 tests covering command construction, prompt ordering, and exit code passthrough
- Files changed: src/threatsmith/engines/claude_code.py, src/threatsmith/engines/__init__.py, tests/test_engines.py, tests/test_package.py
- **Learnings for future iterations:**
  - The engine receives a single fully-assembled `prompt` string; prompt assembly (combining stage instructions, dynamic context, prior outputs) is the orchestrator's responsibility, not the engine's
  - working_directory is passed as cwd= to subprocess.run()
  - Mock subprocess.run to return a `MagicMock` with `returncode` set to test exit code propagation
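
A small sketch of the exit-code test pattern, assuming the engine runs `claude -p <prompt>` with no other flags (the real command may carry more). `run_claude` is a simplified function standing in for `ClaudeCodeEngine.execute()`.

```python
import subprocess
from unittest.mock import MagicMock, patch


def run_claude(prompt, working_directory):
    """Simplified engine call: run the CLI in the repo and return its exit code."""
    result = subprocess.run(["claude", "-p", prompt], cwd=working_directory)
    return result.returncode


def check_exit_code_passthrough():
    """Patch subprocess.run so no real process starts, then inspect the call."""
    fake_result = MagicMock(returncode=3)
    with patch("subprocess.run", return_value=fake_result) as mock_run:
        code = run_claude("hello", "/tmp/repo")
    return code, mock_run.call_args
```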
---

## 2026-03-13 - US-005
- What was implemented: Scanner detection utility in src/threatsmith/utils/scanners.py with detect_scanners() function using shutil.which() to check for semgrep, trivy, gitleaks. Returns dict with 'available' and 'unavailable' lists. Comprehensive test coverage in tests/test_scanners.py for all availability combinations.
- Files changed: src/threatsmith/utils/scanners.py (new), tests/test_scanners.py (new), tasks/stories.json
- **Learnings for future iterations:**
  - Scanner list is centralized in detect_scanners() — adding a new scanner requires only adding to the `scanners` list, no structural changes needed
  - shutil.which() returns path if found, None if not found; using list comprehension pattern for available/unavailable separation makes the code maintainable
  - The 8 tests cover the all-available, none-available, and mixed scenarios, and verify that shutil.which() is called for each scanner
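
A minimal sketch of the detection logic as described, plus one way to exercise it by patching `shutil.which`. The `demo_only_trivy` helper and the fake path are illustrative; the real function body may differ in detail.

```python
import shutil
from unittest.mock import patch


def detect_scanners():
    """Split the known scanners into available and unavailable lists."""
    scanners = ["semgrep", "trivy", "gitleaks"]  # add new scanners here
    available = [s for s in scanners if shutil.which(s) is not None]
    unavailable = [s for s in scanners if s not in available]
    return {"available": available, "unavailable": unavailable}


def demo_only_trivy():
    """Pretend only trivy is on PATH and return what detect_scanners reports."""
    def fake_which(name):
        return "/usr/bin/trivy" if name == "trivy" else None

    with patch("shutil.which", side_effect=fake_which):
        return detect_scanners()
```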
---

## 2026-03-13 - US-006
- What was implemented: Metadata generation utility in src/threatsmith/utils/metadata.py with generate_metadata() function capturing threatsmith_version (from __init__.__version__), engine_name, commit_hash (via git rev-parse HEAD), branch (via git branch --show-current), ISO 8601 timestamp with UTC timezone, scanners_available, scanners_unavailable, and user_objectives. write_metadata(output_dir, metadata) writes JSON file. Graceful fallback to 'unknown' for git commands if they fail.
- Files changed: src/threatsmith/utils/metadata.py (new), tests/test_metadata.py (new), tasks/stories.json
- **Learnings for future iterations:**
  - Metadata dict is JSON-serializable (values are plain `str` and `list` only)
  - Git command failures (missing git, not in repo, etc.) are caught gracefully with broad Exception handler (not just CalledProcessError)
  - UTC timezone import via `from datetime import UTC` (Python 3.11+) is simpler than `timezone.utc`
  - Test mocking pattern: mock subprocess.run with side_effect list of mock objects with stdout.strip()
  - 9 tests cover all required fields, types, engine capture, scanner lists, objectives, git failures, ISO 8601 format, JSON serialization, and write_metadata file creation
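
The git fallback and timestamp handling might be sketched as follows. The two-argument `generate_metadata(engine_name, repo_path)` signature, the `git_output` helper, and the `timestamp` key name are simplifications assumed for this example; the real function captures more fields.

```python
import json
import subprocess
from datetime import datetime, timezone


def git_output(args, repo_path):
    """Return stripped stdout of a git command, or 'unknown' on any failure."""
    try:
        result = subprocess.run(
            ["git", *args],
            cwd=repo_path,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()
    except Exception:  # missing git, not a repository, bad path, ...
        return "unknown"


def generate_metadata(engine_name, repo_path):
    """Build a reduced metadata dict; every value is JSON-serializable."""
    return {
        "engine": engine_name,  # stored under `engine`, not `engine_name`
        "commit_hash": git_output(["rev-parse", "HEAD"], repo_path),
        "branch": git_output(["branch", "--show-current"], repo_path),
        # timezone.utc keeps this sketch runnable on interpreters without datetime.UTC
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```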
---

## 2026-03-13 - US-007
- What was implemented: Stage 1 prompt template in src/threatsmith/prompts/stage_01_objectives.py with STAGE_PROMPT constant and build_prompt(context) function. STAGE_PROMPT covers all four PASTA Stage 1 pillars: Business Objectives, Security/Compliance/Legal Requirements, Business Impact Analysis (with full data sensitivity classification depth), and Operational Impact. Hybrid output format — four required H2 headings with flexible sub-structure. Three-phase investigation approach (documentation → data layer → business logic). build_prompt() injects user-supplied business/security objectives via placeholder replacement, treating None and empty strings as absent.
- Files changed: src/threatsmith/prompts/stage_01_objectives.py (new), tests/test_stage_01_objectives.py (new), tasks/stories.json, tasks/progress.txt
- **Learnings for future iterations:**
  - Use `.replace("{placeholder}", value)` instead of `.format()` for prompt templates — prompt text frequently contains curly braces in code examples that would need escaping with `.format()`
  - Stage prompt modules follow a consistent pattern: STAGE_PROMPT constant with `{user_objectives_section}` placeholder + `build_prompt(context: dict) -> str` function
  - `context.get("key") or None` pattern handles both missing keys and empty strings uniformly
  - Prompt design: v0.1.0 missed Business Impact Analysis and Operational Impact pillars from PASTA Stage 1 — v0.2.0 covers all four explicitly
  - 11 tests: 9 for build_prompt behavior (objectives injection, empty/None handling, output file reference, placeholder removal) + 2 structural checks on STAGE_PROMPT
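
To illustrate why `.replace()` is preferred, here is a toy template (not the real Stage 1 prompt) that contains literal curly braces. The wording of the injected lines is invented for the example; `.format()` on the same template would fail on the JSON braces.

```python
STAGE_PROMPT = """Analyze the repository.
Example config: {"debug": true}
{user_objectives_section}
Write findings to the output file."""


def build_prompt(context):
    """Inject user objectives, treating None and empty strings as absent."""
    business = context.get("business_objectives") or None
    security = context.get("security_objectives") or None
    lines = []
    if business:
        lines.append(f"Business objectives: {business}")
    if security:
        lines.append(f"Security objectives: {security}")
    # Only the named placeholder is touched; other braces stay as written.
    return STAGE_PROMPT.replace("{user_objectives_section}", "\n".join(lines))


def format_fails():
    """Show that str.format() chokes on the literal braces in the template."""
    try:
        STAGE_PROMPT.format(user_objectives_section="")
    except (KeyError, ValueError, IndexError):
        return True
    return False
```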
---

## 2026-03-13 - US-008
- What was implemented: Stage 2 prompt template in src/threatsmith/prompts/stage_02_technical_scope.py with STAGE_PROMPT constant and build_prompt(context) function. STAGE_PROMPT covers six analysis pillars: Project Boundary Definition, Technology Stack Mapping, Dependency and Supply Chain Analysis (with level of impact assessment), Data Classification and Flow Boundaries, Infrastructure and Deployment, Integration Points and External Attack Surface. Three-phase investigation approach (manifest/config → dependency/architecture tracing → boundary and impact assessment). build_prompt() injects Stage 1 output via XML-delimited `<prior_stages><stage_01_objectives>...</stage_01_objectives></prior_stages>` format.
- Files changed: src/threatsmith/prompts/stage_02_technical_scope.py (new), tests/test_stage_02_technical_scope.py (new), tasks/stories.json, tasks/progress.txt
- **Learnings for future iterations:**
  - Prior stage injection uses `{prior_stages_section}` placeholder and `.replace()` — same pattern as user objectives but wrapping content in XML tags
  - Context key for prior stage output is `stage_01_output` (raw markdown string); subsequent stages should use `stage_02_output`, etc.
  - Stage 2 deliberately does NOT re-classify data — it maps technical components TO classifications from Stage 1. This separation of concerns avoids redundancy across stages.
  - The "level of impact" dimension (blast radius, data access, ripple effects) is what distinguishes a useful technical scope from a naive tech inventory
  - 12 tests: 9 for build_prompt behavior (XML injection, empty/None handling, output file reference, placeholder removal, instructional context) + 3 structural checks on STAGE_PROMPT
---

## 2026-03-13 - US-009
- What was implemented: Stage 3 prompt template in src/threatsmith/prompts/stage_03_decomposition.py with STAGE_PROMPT constant and build_prompt(context) function. STAGE_PROMPT covers five analysis pillars: Use Case Identification (with explicit abuse cases feeding Stage 4), Actors Roles and Trust Levels (4-tier classification: untrusted/semi-trusted/trusted/privileged), Entry Points and Attack Surface (comprehensive catalog with per-entry-point metadata), Assets and Data Inventory (data classification mapping from Stage 1), Data Flow Diagrams and Trust Boundaries (minimum 2 Mermaid DFDs: architecture overview + sensitive data flow). Three-phase investigation approach (entry point discovery → actor/asset tracing → data flow/trust boundary mapping). build_prompt() injects Stages 1-2 outputs via XML-delimited format with selective inclusion (only present stages are wrapped).
- Files changed: src/threatsmith/prompts/stage_03_decomposition.py (new), tests/test_stage_03_decomposition.py (new), tasks/stories.json, tasks/progress.txt
- **Learnings for future iterations:**
  - Stage 3 build_prompt handles two prior stage outputs (stage_01_output, stage_02_output) — each is independently optional and only included in XML when present
  - The `or None` pattern consistently handles both missing keys and empty strings across all stage build_prompt functions
  - Abuse cases in Stage 3 are tied to specific use cases (not generic threats) — generic threats belong in Stage 4's STRIDE analysis
  - Mermaid diagram guidance includes the parentheses-in-labels pitfall (use hyphens/commas instead) carried forward from v0.1.0 experience
  - 17 tests: 13 for build_prompt behavior (both stages, partial stages, XML wrapping, empty/None handling, output file reference, placeholder removal, instructional context) + 4 structural checks on STAGE_PROMPT
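
Selective XML wrapping of prior stages could be sketched like this. The tag names follow the module-name convention recorded in this log; the exact whitespace and the choice to return an empty string when nothing is present are assumptions.

```python
# Context key -> XML tag, following the module-name suffix convention.
_PRIOR_STAGE_TAGS = {
    "stage_01_output": "stage_01_objectives",
    "stage_02_output": "stage_02_technical_scope",
}


def build_prior_stages_section(context):
    """Wrap only the stages that are present; skip None and empty strings."""
    parts = []
    for key, tag in _PRIOR_STAGE_TAGS.items():
        content = context.get(key) or None
        if content is not None:
            parts.append(f"<{tag}>\n{content}\n</{tag}>")
    if not parts:
        return ""  # assumed: no wrapper at all when every stage is absent
    return "<prior_stages>\n" + "\n".join(parts) + "\n</prior_stages>"
```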
---

## 2026-03-13 - US-010
- What was implemented: Stage 4 prompt template in src/threatsmith/prompts/stage_04_threat_analysis.py with STAGE_PROMPT constant and build_prompt(context) function. STAGE_PROMPT covers four analysis pillars: STRIDE Threat Analysis (systematic per-component coverage with all six STRIDE categories), Probabilistic Attack Scenario Analysis (scenario narratives with preconditions, probability assessment, kill chain mapping, cross-component cascading), Regression Analysis on Security Events (technology-specific threat history, architectural pattern analysis, component-level regression, similar incident patterns, dependency threat landscape), Threat Intelligence Correlation (OWASP cross-referencing, public vulnerability pattern matching, threat actor profiling with four actor classes, supply chain threat assessment, emerging threat patterns). Three-phase investigation approach (context integration → systematic identification → cross-cutting validation). build_prompt() injects Stages 1-3 outputs via XML-delimited format, always includes OWASP Web Top 10, conditionally injects API/LLM/Mobile Top 10 based on case-insensitive keyword matching against Stage 2 output.
- Files changed: src/threatsmith/prompts/stage_04_threat_analysis.py (new), tests/test_stage_04.py (new), tasks/stories.json, tasks/progress.txt
- **Learnings for future iterations:**
  - Stage 4 is the first stage to use OWASP references — imports from owasp_references.py and injects them via a separate `{owasp_section}` placeholder (distinct from `{prior_stages_section}`)
  - OWASP Web Top 10 is always injected; API, LLM, and Mobile variants are conditional on Stage 2 keyword presence — keywords are matched case-insensitively against the full stage_02_output string
  - Stage 4 build_prompt handles three prior stage outputs (stage_01_output, stage_02_output, stage_03_output) — each independently optional
  - The v0.1.0 prompt was heavily STRIDE-focused; v0.2.0 adds three additional pillars (probabilistic scenarios, regression analysis, threat intel correlation) making it a more complete threat analysis
  - Mobile OWASP Top 10 keywords: android, ios, react native, flutter, swift, kotlin, mobile
  - 48 tests: 40 for build_prompt behavior (prior stages, XML wrapping, OWASP conditional injection for all four variants with keyword coverage) + 8 structural checks on STAGE_PROMPT
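
A hedged sketch of the conditional OWASP selection. The mobile keyword list is the one recorded above; the API keyword list is invented for the example, and plain substring matching is an assumption (under it, a short keyword such as `ios` would also match inside longer words like "scenarios").

```python
_MOBILE_KEYWORDS = ["android", "ios", "react native", "flutter", "swift", "kotlin", "mobile"]
_API_KEYWORDS = ["rest api", "graphql", "openapi"]  # hypothetical list


def select_owasp_lists(stage_02_output):
    """Return which OWASP lists to inject; the Web Top 10 is always included."""
    selected = ["web"]
    text = (stage_02_output or "").lower()  # case-insensitive matching
    if any(keyword in text for keyword in _API_KEYWORDS):
        selected.append("api")
    if any(keyword in text for keyword in _MOBILE_KEYWORDS):
        selected.append("mobile")
    return selected
```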
---

## 2026-03-13 - US-011
- What was implemented: Stage 5 prompt template in src/threatsmith/prompts/stage_05_vulnerability.py with STAGE_PROMPT constant and build_prompt(context) function. STAGE_PROMPT covers five analysis pillars: Scanner-Augmented Vulnerability Discovery (scanner correlation with Stage 4 threats, false positive assessment, gap identification), Threat-to-Vulnerability Mapping with Threat Trees (root cause analysis, AND/OR threat tree construction, exploitability assessment, vulnerability chain analysis), Design Flaw Analysis (use case security review, abuse case development, trust boundary violation analysis, security control gap analysis), Vulnerability Scoring and Classification (CVSS 3.1 with all 8 vector components documented, CWE classification, CVE cross-reference, environmental context adjustment), Impact and Exposure Assessment (affected systems/data mapping, historical vulnerability context, current state assessment, change impact analysis, remediation guidance with effort estimates and priority tiers). Three-phase investigation approach (context integration + scanner execution → systematic identification → cross-cutting validation). build_prompt() injects Stages 1-4 outputs via XML-delimited format, conditionally injects scanner snippets from scanner_snippets.py based on `scanners_available` context key.
- Files changed: src/threatsmith/prompts/stage_05_vulnerability.py (new), tests/test_stage_05.py (new), tasks/stories.json, tasks/progress.txt
- **Learnings for future iterations:**
  - Stage 5 is the first stage to use scanner snippets — imports SCANNER_SNIPPETS dict from scanner_snippets.py and injects via a `{scanner_section}` placeholder (distinct from `{prior_stages_section}`)
  - Scanner injection is driven by `scanners_available` context key (list of scanner name strings); unknown scanner names are silently ignored (snippet lookup returns None)
  - When testing for absence of a dynamically injected section header (e.g., "SCANNER INSTRUCTIONS"), use the markdown heading prefix `"## SCANNER INSTRUCTIONS"` to avoid false matches against the same phrase appearing in the prompt body text
  - Stage 5 build_prompt handles four prior stage outputs (stage_01_output through stage_04_output) — each independently optional
  - The v0.1.0 prompt was a monolithic system prompt with all scanner logic baked into the agent class; v0.2.0 separates the prompt template from scanner execution — the wrapper detects scanners and passes availability info, the prompt tells the agent to run them
  - 40 tests: 29 for build_prompt behavior (prior stages XML wrapping, scanner snippet injection for all three scanners, empty/None/unknown scanner handling, combined scanner + prior stages) + 11 structural checks on STAGE_PROMPT
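
The scanner injection and the `##` heading check can be sketched together. The snippet strings and the template body below are placeholders, not the real contents of scanner_snippets.py or the Stage 5 prompt.

```python
# Placeholder snippet text; the real strings live in scanner_snippets.py.
SCANNER_SNIPPETS = {
    "semgrep": "Semgrep: (placeholder instructions)",
    "trivy": "Trivy: (placeholder instructions)",
    "gitleaks": "Gitleaks: (placeholder instructions)",
}

STAGE_PROMPT = """Correlate findings with Stage 4 threats.
If SCANNER INSTRUCTIONS are present below, run each scanner first.
{scanner_section}
Write findings to the output file."""


def build_prompt(scanners_available=None):
    """Inject snippets for known scanners; unknown names are skipped."""
    snippets = []
    for name in scanners_available or []:  # None is treated as "no scanners"
        snippet = SCANNER_SNIPPETS.get(name)  # unknown names yield None
        if snippet is not None:
            snippets.append(snippet)
    section = ""
    if snippets:
        section = "## SCANNER INSTRUCTIONS\n" + "\n".join(snippets)
    return STAGE_PROMPT.replace("{scanner_section}", section)
```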
---

## 2026-03-14 - US-012
- What was implemented: Stage 6 prompt template in src/threatsmith/prompts/stage_06_attack_modeling.py with STAGE_PROMPT constant and build_prompt(context) function. STAGE_PROMPT covers four analysis pillars: Attack Surface Analysis (entry point inventory, exposure assessment, pre-remediation vs post-remediation attack surface comparison, attack surface reduction opportunities), Attack Tree Development (Mermaid flowchart TD diagrams with AND/OR decomposition, MITRE ATT&CK tactic and technique mapping with IDs like TA0001/T1190, Mermaid parentheses pitfall guidance), Attack-Vulnerability-Exploit Analysis (technique-to-vulnerability-to-exploit tracing referencing Stage 5 findings by CWE/CVSS/code location, prerequisite analysis, existing control evaluation, vulnerability chaining with amplification effect), Impact Summary and Risk Narrative (plain-language attack narrative, CIA impact with Stage 1 data classification cross-reference, business impact with stakeholder/regulatory/operational/reputational dimensions, feasibility assessment with skill level/tooling/time/detection axes, aggregate risk summary). Three-phase investigation approach (context integration + attack surface mapping → attack tree construction + exploit analysis → impact synthesis + completeness validation). build_prompt() injects Stages 1-5 outputs via XML-delimited format with selective inclusion (only present stages are wrapped).
- Files changed: src/threatsmith/prompts/stage_06_attack_modeling.py (new), tests/test_stage_06.py (new), tasks/stories.json, tasks/progress.txt
- **Learnings for future iterations:**
  - Stage 6 has no dynamic injection beyond prior stages — no scanner snippets, no OWASP references. It only uses the `{prior_stages_section}` placeholder.
  - Stage 6 build_prompt handles five prior stage outputs (stage_01_output through stage_05_output) — each independently optional
  - The v0.1.0 prompt was a monolithic system prompt embedded in a LangGraph agent class with tool-specific instructions (code ingestor, mermaid validation). v0.2.0 separates prompt from execution — the agent uses its native code navigation and the prompt focuses on analytical methodology.
  - Key design distinction from v0.1.0: v0.2.0 adds pre-remediation vs post-remediation attack surface comparison, which is critical for Stage 7's cost-benefit analysis of remediation strategies
  - MITRE ATT&CK integration includes specific tactic IDs (TA0001-TA0040) and example technique IDs (T1190) for concrete cross-referencing rather than generic framework mentions
  - 31 tests: 19 for build_prompt behavior (all five prior stages, XML wrapping, partial stages, empty/None handling, output file reference, placeholder removal, instructional context) + 12 structural checks on STAGE_PROMPT
---

## 2026-03-14 - US-013
- What was implemented: Stage 7 prompt template in src/threatsmith/prompts/stage_07_risk_impact.py with STAGE_PROMPT constant and build_prompt(context) function. STAGE_PROMPT covers five analysis pillars: Business Impact Qualification and Quantification (structured severity scale with Critical/High/Medium/Low, impact quantification across data exposure/financial/operational/blast radius dimensions, composite likelihood assessment combining Stage 6 feasibility with Stage 5 CVSS, risk matrix rating), Countermeasure Identification (four control categories: preventive/detective/corrective/compensating, per-countermeasure specificity with vulnerability reference, implementation approach, effort estimate, dependencies), Residual Risk Assessment (post-countermeasure reassessment of vulnerability/attack surface/impact/likelihood, residual risk rating comparison, risk acceptance criteria with ownership and re-evaluation triggers), Mitigation Effectiveness vs Cost Analysis (risk reduction magnitude, coverage breadth, durability, defense-in-depth contribution, cost dimensions including introduction risk, cost-effectiveness ranking as quick wins/strategic investments/diminishing returns/maintenance items, residual benefits for cross-cutting countermeasures), Prioritized Remediation Roadmap (P0-P3 tiers with clear criteria, per-item details including vulnerability reference/countermeasure/risk reduction/effort/dependencies/residual benefits/acceptance criteria). Three-phase investigation approach (context integration + risk assessment → countermeasure development + residual risk → roadmap synthesis + completeness validation). build_prompt() injects Stages 1-6 outputs via XML-delimited format with selective inclusion (only present stages are wrapped).
- Files changed: src/threatsmith/prompts/stage_07_risk_impact.py (new), tests/test_stage_07.py (new), tasks/stories.json, tasks/progress.txt
- **Learnings for future iterations:**
  - Stage 7 has no dynamic injection beyond prior stages — no scanner snippets, no OWASP references. Like Stage 6, it only uses the `{prior_stages_section}` placeholder.
  - Stage 7 build_prompt handles six prior stage outputs (stage_01_output through stage_06_output) — each independently optional
  - Stage 7 is the first stage to explicitly require residual benefits analysis — documenting how fixing one component improves security for other dependent systems. This cross-cutting analysis is a key differentiator from generic vulnerability remediation advice.
  - The PRD clearly distinguishes Stage 7 (analytical — risk qualification, countermeasures, residual risk) from Stage 8 (report consolidation — no new analysis). Stage 7 is the last analytical stage.
  - 34 tests: 21 for build_prompt behavior (all six prior stages, XML wrapping, partial stages, empty/None handling, output file reference, placeholder removal, instructional context) + 13 structural checks on STAGE_PROMPT
---

## 2026-03-14 - US-014
- What was implemented: Stage 8 prompt template in src/threatsmith/prompts/stage_08_report.py with STAGE_PROMPT constant and build_prompt(context) function. STAGE_PROMPT is a report consolidation template (NOT a PASTA stage) covering: Executive Summary (scope, critical findings count, top risks, key recommendations, overall risk posture), Stage Content Consolidation (all 7 stages in order with standardized ## Stage N: Title headings), Content Preservation Rules (Mermaid diagrams verbatim, CVSS scores/vectors, CWE/CVE/MITRE ATT&CK IDs, code locations and file references, tables and structured data, all countermeasures and remediation items), Content Cleanup Rules (conversational artifact removal, investigation process note removal, heading level normalization, cross-reference deduplication, formatting consistency). build_prompt() injects Stages 1-7 outputs via XML-delimited format with selective inclusion (only present stages are wrapped). Instructional context section header is "PRIOR STAGE OUTPUTS" (distinct from analytical stages' "PRIOR STAGE FINDINGS") to reinforce the consolidation-only role.
- Files changed: src/threatsmith/prompts/stage_08_report.py (new), tests/test_stage_08.py (new), tasks/stories.json, tasks/progress.txt
- **Learnings for future iterations:**
  - Stage 8 is explicitly NOT a PASTA stage — it is a post-pipeline deliverable generation step. The prompt reinforces "no new analysis" as a critical requirement.
  - Stage 8 has no dynamic injection beyond prior stages — no scanner snippets, no OWASP references. Like Stages 6 and 7, it only uses the `{prior_stages_section}` placeholder.
  - Stage 8 build_prompt handles seven prior stage outputs (stage_01_output through stage_07_output) — each independently optional
  - The instructional context uses "PRIOR STAGE OUTPUTS" instead of "PRIOR STAGE FINDINGS" to semantically distinguish the consolidation step from analytical stages
  - Stage 8 XML tag for Stage 7 is `<stage_07_risk_impact>` following the established naming convention (module name suffix)
  - 34 tests: 21 for build_prompt behavior (all seven prior stages, XML wrapping, partial stages, empty/None handling, output file reference, placeholder removal, instructional context, no-new-analysis constraint, consolidation role) + 13 structural checks on STAGE_PROMPT
---

## 2026-03-14 - Dynamic output directory parameter

- What was implemented: All 8 stage prompt modules (stage_01 through stage_08) now accept `output_dir: str = "threatmodel"` as an optional parameter in `build_prompt()`. STAGE_PROMPT constants updated to use `{output_dir}` placeholder instead of hardcoded `threatmodel/` in OUTPUT REQUIREMENTS sections. All `build_prompt()` functions normalize the output directory via `output_dir.rstrip("/") + "/"` to handle both `"threatmodel"` and `"threatmodel/"` input formats cleanly.
- Files changed: src/threatsmith/prompts/stage_01 through stage_08 (build_prompt signature + prompt templates), tests/test_stage_02, test_stage_03, test_stage_04, test_stage_05 (assertion updates)
- **Learnings for future iterations:**
  - Keep runtime/system config (like output_dir) separate from analytical context to maintain proper separation of concerns
  - Single normalization point (`rstrip("/") + "/"`) cleanly handles edge cases without branching logic in callers
  - Update pattern: add parameter with default → normalize in function → inject via `.replace()` → chain with other replacements
  - Test assertions for STAGE_PROMPT should check for placeholder `{output_dir}XX-name.md` rather than hardcoded paths
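
The normalization step in isolation, as a sketch with a one-line stand-in template (the `01-objectives.md` filename is hypothetical). One edge case worth keeping in mind: an empty `output_dir` would normalize to `/`.

```python
STAGE_PROMPT = "Write your findings to {output_dir}01-objectives.md."


def build_prompt(output_dir="threatmodel"):
    """Normalize to exactly one trailing slash, then inject via .replace()."""
    normalized = output_dir.rstrip("/") + "/"
    return STAGE_PROMPT.replace("{output_dir}", normalized)
```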
---

## 2026-03-14 - Typed context dataclasses
- What was implemented: `src/threatsmith/prompts/contexts.py` with 8 typed dataclasses — ObjectivesContext, TechnicalScopeContext, DecompositionContext, ThreatAnalysisContext, VulnerabilityContext, AttackModelingContext, RiskImpactContext, ReportContext. All `build_prompt(context: dict)` signatures across stages 1–8 updated to accept the appropriate dataclass. All 8 test files updated to construct dataclass instances instead of passing raw dicts.
- Files changed: src/threatsmith/prompts/contexts.py (new), stage_01 through stage_08 prompt modules (signature + attribute access), tests/test_stage_01 through test_stage_08
- **Learnings for future iterations:**
  - All context fields are `str | None = None`; `scanners_available` in VulnerabilityContext is `list[str] | None = None` — the `or []` / `or None` guards in build_prompt handle None at runtime
  - dataclass fields use plain `= None` defaults (no `field(default_factory=...)`) since all are scalar or nullable
  - The `or None` access pattern (`context.field or None`) preserves existing empty-string-as-absent semantics from the dict era
---

## 2026-03-14 - US-015
- What was implemented: `src/threatsmith/prompts/assembler.py` with `assemble_prompt(stage_number, prior_outputs, scanner_info, user_objectives, commit_hash, output_dir)` function. Maps stage numbers 1–8 to the correct stage module, builds the appropriate typed context dataclass from the caller's flat dicts, and delegates to each stage's `build_prompt()`. Raises `ValueError` for out-of-range stage numbers. `commit_hash` is accepted but not yet injected into prompts (reserved for future use).
- Files changed: src/threatsmith/prompts/assembler.py (new), tests/test_assembler.py (new), tasks/stories.json
- **Learnings for future iterations:**
  - Stages 6, 7, and 8 have hardcoded `threatmodel/` in their STAGE_PROMPT constants instead of `{output_dir}` placeholders. The `output_dir` parameter exists in `build_prompt()`, but it replaces a placeholder that isn't present in those templates (a no-op replacement). Stages 1–5 fully support a custom `output_dir`.
  - scanner_info dict uses key `"available"` (not `"scanners_available"`) — matches the return from `detect_scanners()`; assembler maps it to the `scanners_available` field in `VulnerabilityContext`
  - The `commit_hash` parameter is part of the assembler signature per the PRD, but no current stage context dataclass has a `commit_hash` field; it is accepted and silently ignored until a future story adds it to stage prompts or metadata injection
  - Ruff auto-fixed the import style from individual `from X import Y` lines to a grouped `from X import (...)` block
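
Two of the points above are easy to miss, so here is a small sketch of each: why a missing `{output_dir}` placeholder fails silently, and how the `"available"` key is translated into the `scanners_available` field. The template strings and the helper names (`render`, `scanners_from_info`, `check_stage`) are hypothetical and exist only for this illustration.

```python
from __future__ import annotations

# Hypothetical templates: one with the placeholder, one with a hardcoded path.
TEMPLATE_WITH_PLACEHOLDER = "Write findings to {output_dir}/vulnerabilities.md"
TEMPLATE_HARDCODED = "Write findings to threatmodel/attack-modeling.md"


def render(template: str, output_dir: str) -> str:
    # str.replace returns the string unchanged when the placeholder is absent,
    # so a hardcoded template ignores output_dir without raising any error.
    return template.replace("{output_dir}", output_dir)


def scanners_from_info(scanner_info: dict | None) -> list[str]:
    # The detection result uses the key "available"; the context field is
    # named scanners_available, so the caller has to translate between them.
    return list((scanner_info or {}).get("available") or [])


def check_stage(stage_number: int) -> int:
    # Out-of-range stage numbers are rejected up front.
    if not 1 <= stage_number <= 8:
        raise ValueError(f"stage_number must be between 1 and 8, got {stage_number}")
    return stage_number
```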
---

## 2026-03-14 - US-016
- What was implemented: `src/threatsmith/orchestrator.py` with `Orchestrator` dataclass. `run()` iterates stages 1–8, assembles prompts via `assemble_prompt()`, calls `engine.execute()`, validates that the expected output file was created, and reads the deliverable into the accumulated `_prior_outputs` context. Each stage gets 2 attempts (original + 1 retry); after both fail, `run()` prints an abort message and returns exit code 1. Verbose mode prints stage start, non-zero exit code warnings, missing file warnings, and completion with cumulative context size.
- Files changed: src/threatsmith/orchestrator.py (new), tests/test_orchestrator.py (new), tasks/stories.json
- **Learnings for future iterations:**
  - Orchestrator uses a `@dataclass` with `_prior_outputs: dict[str, str] = field(default_factory=dict, init=False)` for internal state — underscore prefix signals it's not a constructor param
  - `_STAGE_FILES` dict maps stage_number → (filename, prior_outputs_key) — adding a new stage requires only one entry
  - The retry loop uses `range(1, 3)` (attempts 1 and 2) to avoid needing a separate "first attempt" flag
  - Output file path is constructed as `os.path.join(repo_path, output_dir, filename)` — repo_path is the working directory passed to engine.execute()
  - Verbose-mode context size tracking: `sum(len(v) for v in self._prior_outputs.values())` gives cumulative chars across all accumulated outputs
---

## 2026-03-15 - Stdlib logging

- What was implemented: Replaced the hand-rolled `_log()`/`verbose` pattern with Python's stdlib `logging` throughout the package. `main.py` calls `logging.basicConfig(level=DEBUG if verbose else INFO, format="%(message)s", force=True)` as the single configuration point. All modules declare a module-level `logger = logging.getLogger(__name__)`. Orchestrator's `verbose` field and `_log()` method removed entirely.
- Files changed: `src/threatsmith/main.py`, `src/threatsmith/orchestrator.py`, `src/threatsmith/engines/claude_code.py`, `src/threatsmith/engines/codex.py`, `src/threatsmith/utils/scanners.py`, `src/threatsmith/utils/metadata.py`, `tests/test_orchestrator.py`, `tests/test_cli.py`
- **Learnings for future iterations:**
  - `-v` → `logging.DEBUG`, no flag → `logging.INFO` is a common convention for Python CLIs (pip, ansible, and httpx all expose a similar `-v` flag)
  - Use `force=True` in `logging.basicConfig()` at the CLI entry point — ensures reconfiguration even if handlers were set up earlier (important for tests and repeated invocations)
  - Library modules use `logging.getLogger(__name__)` only — never call `basicConfig` or configure handlers themselves; callers configure logging, libraries use loggers
  - Log levels: DEBUG = verbose diagnostics (command run, context size, scanner checks, metadata path); INFO = operational progress (stage start/complete, pipeline start/done, scanner summary); WARNING = non-fatal anomalies before retry (bad exit code, missing output file); ERROR = unrecoverable failures (abort after retry)
  - Remove `verbose` from the dataclass constructor once logging handles verbosity; threading a boolean flag through the object graph is exactly what stdlib logging makes unnecessary
  - Test logging with `caplog.at_level(logging.DEBUG, logger="threatsmith.orchestrator")` — scoped to the module under test to avoid noise from other loggers; check absence of debug details at INFO level to verify level gating works
  - Renamed `test_verbose_flag_passed_to_orchestrator` → `test_verbose_flag_not_forwarded_to_orchestrator`: when refactoring from flag-threading to logging, update tests to assert that the flag is NOT forwarded rather than that it is
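
The split between the configuring entry point and library-style loggers can be sketched as below. The function names `configure_logging` and `run_stage` and the logger name `threatsmith.example` are hypothetical; the `basicConfig` arguments mirror the ones recorded in this entry.

```python
import logging

# Library-style module logger. Real modules would pass __name__; the literal
# name here is a hypothetical stand-in so the sketch runs on its own.
logger = logging.getLogger("threatsmith.example")


def configure_logging(verbose: bool) -> None:
    # Single configuration point, meant to be called only from the CLI entry.
    # force=True removes any handlers already attached to the root logger,
    # so repeated calls (for example across tests) do not stack handlers.
    logging.basicConfig(
        level=logging.DEBUG if verbose else logging.INFO,
        format="%(message)s",
        force=True,
    )


def run_stage(stage: int, context_chars: int) -> None:
    # INFO: operational progress, shown with or without -v.
    logger.info("Stage %d started", stage)
    # DEBUG: verbose diagnostics, emitted only when -v is given.
    logger.debug("Context size: %d chars", context_chars)
```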
---
