Metadata-Version: 2.4
Name: sanicode
Version: 0.8.0
Summary: AI-assisted code sanitization scanner with OWASP ASVS, NIST 800-53, and ASD STIG compliance mapping.
Project-URL: Homepage, https://github.com/rdwj/sanicode
Project-URL: Repository, https://github.com/rdwj/sanicode
Project-URL: Issues, https://github.com/rdwj/sanicode/issues
Author: Sanicode Contributors
License: Apache-2.0
Keywords: compliance,llm,owasp,sast,security,stig
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Requires-Python: >=3.10
Requires-Dist: fastapi>=0.100
Requires-Dist: httpx>=0.24
Requires-Dist: litellm>=1.0
Requires-Dist: networkx>=3.0
Requires-Dist: prometheus-client>=0.17
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0
Requires-Dist: tomli>=2.0; python_version < '3.11'
Requires-Dist: tomlkit>=0.12
Requires-Dist: tree-sitter-language-pack>=0.7
Requires-Dist: tree-sitter>=0.24
Requires-Dist: typer>=0.9.0
Requires-Dist: uvicorn[standard]>=0.20
Provides-Extra: dev
Requires-Dist: build>=1.0; extra == 'dev'
Requires-Dist: mlflow>=2.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Requires-Dist: twine>=5.0; extra == 'dev'
Provides-Extra: mlflow
Requires-Dist: mlflow>=2.0; extra == 'mlflow'
Description-Content-Type: text/markdown

# Sanicode

Sanicode scans Python, JavaScript/TypeScript, and PHP codebases for input validation and sanitization gaps using field-sensitive taint analysis and a data flow knowledge graph, then maps every finding to OWASP ASVS 5.0, NIST 800-53, ASD STIG v4r11, PCI DSS 4.0, FedRAMP, and CMMC 2.0. It also scans lockfiles for third-party dependency vulnerabilities via the OSV database and can generate CycloneDX 1.5 SBOMs. Output formats include SARIF (for GitHub Code Scanning), JSON, Markdown, and an HTML dashboard with an interactive knowledge graph.

Unlike pattern-only tools such as Bandit or Semgrep, sanicode traces tainted data from source to sink across function boundaries with field-level precision: `request.args` and `request.form["name"]` are tracked as distinct taint keys, not flattened to `request`. Findings carry context about *how* untrusted input reaches a dangerous call and *whether* sanitization exists along the path.

## Install

```
pip install sanicode
```

Requires Python 3.10+.

## Quick start

Scan a codebase and generate a Markdown report:

```
sanicode scan .
```

Generate SARIF output for CI integration:

```
sanicode scan . -f sarif
```
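
If you want to post-process the SARIF output in a CI script, the sketch below counts results by level. It relies only on the standard SARIF 2.1.0 layout (`runs[].results[].level`); the report file name is not specified here, so the path is left as a parameter, and `summarize_sarif` is a hypothetical helper, not part of sanicode.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_sarif(path):
    """Count SARIF results by level across all runs in the file."""
    sarif = json.loads(Path(path).read_text(encoding="utf-8"))
    counts = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # A result without an explicit level is counted as "warning",
            # the SARIF default (rule-level overrides are ignored here).
            counts[result.get("level", "warning")] += 1
    return dict(counts)
```

Call it with whichever `.sarif` file the scan wrote under `sanicode-reports/`.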

Generate an HTML dashboard with an interactive knowledge graph:

```
sanicode scan . -f html
```

Generate a DISA STIG Viewer checklist for ATO packages:

```
sanicode scan . -f stig-checklist
```

Fail the build if high-severity findings exist:

```
sanicode scan . --fail-on high
```

Scan dependencies for known vulnerabilities:

```
sanicode deps .
```

Generate a CycloneDX SBOM alongside scan results:

```
sanicode scan . --sbom sbom.json
```

Reports are written to `sanicode-reports/` by default.

## CI/CD integration

### GitHub Action

```yaml
- uses: rdwj/sanicode@v0
  with:
    path: .
    fail-on: high
    format: sarif
```

### Pre-commit hook

```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/rdwj/sanicode
    rev: v0.8.0
    hooks:
      - id: sanicode
```

See [docs/ci-cd-integration.md](docs/ci-cd-integration.md) for GitLab CI, Jenkins, Azure DevOps, and Tekton/OpenShift Pipelines.

## API server

Start the FastAPI server for remote or hybrid scan mode:

```
sanicode serve
```

The server listens on port 8080 and exposes Prometheus metrics at `/metrics`.

### Endpoints

```
POST /api/v1/scan              Submit a scan (async)
GET  /api/v1/scan/{id}         Poll scan status
GET  /api/v1/scan/{id}/findings   Retrieve findings (JSON or ?format=sarif)
GET  /api/v1/scan/{id}/graph      Retrieve knowledge graph
POST /api/v1/analyze           Instant snippet analysis
GET  /api/v1/compliance/map    Compliance framework lookup
GET  /api/v1/health            Liveness check
GET  /metrics                  Prometheus metrics
```
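
A minimal client-side sketch for the routes above, using only the standard library. The paths come from the endpoint table; the helper names are hypothetical, and treating HTTP 200 from `/api/v1/health` as "alive" is an assumption. Request and response bodies are not documented here, so the sketch does not guess at them.

```python
import urllib.request


def scan_urls(base_url, scan_id):
    """Build the per-scan URLs listed in the endpoint table."""
    root = f"{base_url.rstrip('/')}/api/v1/scan/{scan_id}"
    return {
        "status": root,
        "findings_json": f"{root}/findings",
        "findings_sarif": f"{root}/findings?format=sarif",
        "graph": f"{root}/graph",
    }


def is_healthy(base_url, timeout=5.0):
    """Return True if GET /api/v1/health answers with HTTP 200 (assumed)."""
    url = f"{base_url.rstrip('/')}/api/v1/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts all derive from OSError.
        return False
```

For a local server started with `sanicode serve`, the base URL would be `http://localhost:8080`.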

## CLI commands

```
sanicode scan .                              # Scan codebase, generate reports
sanicode scan . -f sarif                     # SARIF output
sanicode scan . -f json -f sarif             # Multiple formats
sanicode scan . -f html                      # HTML dashboard with interactive graph
sanicode scan . --fail-on high               # Exit non-zero on high+ findings
sanicode serve                               # Start API server on :8080
sanicode report scan-result.json             # Re-generate reports from saved results
sanicode report scan-result.json -s high     # Filter by severity
sanicode report scan-result.json --cwe 89    # Filter by CWE
sanicode config setup                        # Interactive provider configuration wizard
sanicode config set llm.fast.model granite-nano  # Script-friendly config
sanicode config test                         # Test configured LLM tiers
sanicode config --show                       # Show resolved configuration
sanicode config --init                       # Create starter sanicode.toml
sanicode graph . --export graph.json         # Export knowledge graph
sanicode graph . --visualize graph.html      # Standalone graph visualization
sanicode rules --list                        # List all detection rules
sanicode rules --validate custom.yaml        # Validate custom rule file
sanicode benchmark                           # Benchmark against Bandit and Semgrep
sanicode scan . -f stig-checklist            # STIG Viewer checklist (.ckl) + summary
sanicode scan . -f poam                      # POA&M entries (CSV + JSON + summary)
sanicode report scan-result.json -f stig-checklist  # STIG checklist from saved results
sanicode report scan-result.json -f poam     # POA&M from saved results
sanicode enrich bandit.sarif semgrep.sarif   # Enrich third-party SARIF with compliance
sanicode enrich *.sarif --merge -o merged.sarif  # Merge and enrich multiple SARIF files
sanicode validate-llm                        # Benchmark LLM pipeline quality (precision/recall/F1 deltas)
sanicode deps .                              # Scan lockfiles for dependency vulnerabilities
sanicode deps . --format json                # JSON output for CI pipelines
sanicode deps . --sbom sbom.json             # Generate CycloneDX SBOM
sanicode scan . --no-deps                    # Skip dependency scanning
sanicode scan . --sbom sbom.json             # Include SBOM with scan
sanicode scan . --offline                    # Skip OSV queries (air-gapped mode)
```

## Detection rules

21 built-in rules across three languages:

**Python** (10 rules, SC001–SC010): path traversal, OS command injection, XSS, SQL injection, code injection, weak cryptography, insecure random, deserialization, hardcoded credentials, SSRF.

**JavaScript/TypeScript** (6 rules, SC200–SC205): path traversal, OS command injection, XSS, weak cryptography, insecure random, hardcoded credentials.

**PHP** (5 rules, SC100–SC104): OS command injection, XSS, SQL injection, deserialization, hardcoded credentials.

Custom YAML rules extend this set; see Custom rules below.

## Custom rules

```yaml
id: CUSTOM001
cwe_id: 78
severity: high
pattern:
  targets: [python]
  ast_pattern: "call:subprocess.run"
  args:
    shell: "True"
```

Rule files are discovered from `rules/` in the project root and `~/.config/sanicode/rules/`. Run `sanicode rules --validate custom.yaml` to check syntax before deploying.
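
To illustrate the kind of call the example rule targets, here is a standard-library `ast` sketch that flags `subprocess.run(..., shell=True)`. It is not how sanicode evaluates rules (sanicode depends on tree-sitter), and reading `args: shell: "True"` as "keyword argument `shell` is the literal `True`" is an assumption about the rule schema. `find_shell_true_calls` is a hypothetical name.

```python
import ast


def find_shell_true_calls(source):
    """Return line numbers of subprocess.run(..., shell=True) calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match the attribute call `subprocess.run`, not aliased imports.
        is_subprocess_run = (
            isinstance(func, ast.Attribute)
            and func.attr == "run"
            and isinstance(func.value, ast.Name)
            and func.value.id == "subprocess"
        )
        # Match only the literal keyword argument shell=True.
        has_shell_true = any(
            kw.arg == "shell"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is True
            for kw in node.keywords
        )
        if is_subprocess_run and has_shell_true:
            hits.append(node.lineno)
    return hits


sample = """
import subprocess
subprocess.run(cmd, shell=True)
subprocess.run(["ls", "-l"])
"""
print(find_shell_true_calls(sample))  # [3]
```

Only the first call is reported; the list-argument form without `shell=True` is left alone.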

## Taint analysis

Sanicode performs field-sensitive, dataflow-aware taint tracking at two levels:

- **Intra-procedural**: reaching-definitions analysis within each function body, with field-level precision. Attribute chains like `request.args.get("id")` are tracked as dotted taint keys, not flattened to individual identifiers. Prefix matching ensures that tainting `request` implicitly taints `request.args`, but tainting only `request.args` does not falsely taint unrelated attributes.
- **Inter-procedural**: function summaries propagated across the call graph.
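
The prefix-matching behaviour described above can be sketched as a check over dotted keys. This is an illustration of the idea, not sanicode's implementation; `is_tainted` and the set-of-strings representation are assumptions made for the example.

```python
def is_tainted(key, tainted_keys):
    """True if `key` or any dotted prefix of it is in `tainted_keys`."""
    parts = key.split(".")
    # Compare whole segments, so "requests.get" never matches "request".
    return any(
        ".".join(parts[:i]) in tainted_keys
        for i in range(1, len(parts) + 1)
    )


# Tainting the whole object taints its attributes.
assert is_tainted("request.args", {"request"})
# Tainting one attribute leaves its siblings clean.
assert not is_tainted("request.form", {"request.args"})
```

Matching on segments rather than raw string prefixes is what keeps sibling attributes and similarly named identifiers from being tainted by accident.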

Taint paths produce high-confidence edges in the knowledge graph, giving the LLM (and human reviewers) evidence of whether untrusted data actually reaches a sink.

## Dependency scanning

Sanicode discovers lockfiles (`requirements.txt`, `package-lock.json`, `composer.lock`) and queries the [OSV database](https://osv.dev) for known vulnerabilities. Findings are mapped to CWE-1395 (Dependency on Vulnerable Third-Party Component) with compliance cross-references to NIST SI-2/RA-5, PCI DSS 6.3.2, and FedRAMP baselines. CycloneDX 1.5 SBOMs can be generated alongside scan results.

Dependency scanning runs automatically during `sanicode scan` and can be used standalone via `sanicode deps`. Use `--offline` for air-gapped environments or `--no-deps` to skip it entirely.
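
For a sense of what an OSV lookup involves, the sketch below turns `name==version` pins from a `requirements.txt` into payloads for OSV's public `POST /v1/query` endpoint. It is a simplified illustration, not sanicode's scanner: the regex handles only plain pins (no extras, ranges, or hashes), and `osv_queries` and `query_osv` are hypothetical helpers.

```python
import json
import re
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"
PINNED = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def osv_queries(requirements_text):
    """Build one OSV query payload per `name==version` pin."""
    queries = []
    for line in requirements_text.splitlines():
        match = PINNED.match(line)
        if match:  # comments, blank lines, and unpinned specifiers are skipped
            name, version = match.groups()
            queries.append({
                "package": {"name": name, "ecosystem": "PyPI"},
                "version": version,
            })
    return queries


def query_osv(payload, timeout=10.0):
    """POST one payload to OSV and return the decoded JSON response."""
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

`query_osv` needs network access, which is exactly what `--offline` avoids in air-gapped environments.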

## Compliance frameworks

Findings map to six frameworks, covering 104 CWEs:

- **OWASP ASVS 5.0** — V1: Encoding and Sanitization requirements (L1/L2/L3)
- **NIST 800-53** — SI-10 (Information Input Validation), SI-15 (Information Output Filtering), and related controls
- **ASD STIG v4r11** — APSC-DV-002510 (CAT I), APSC-DV-002520 (CAT II), APSC-DV-002530 (CAT II), and related checks. Use `--format stig-checklist` to output a DISA STIG Viewer `.ckl` file with findings mapped directly to ASD STIG v4r11 checklist items, suitable for submission to STIG assessors.
- **PCI DSS 4.0** — Requirement 6 (Develop and Maintain Secure Systems and Software)
- **FedRAMP** — Baselines (Low, Moderate, High) derived from NIST 800-53 control selection. Findings indicate which FedRAMP authorization baselines are affected.
- **CMMC 2.0** — Cybersecurity Maturity Model Certification practices (Level 2+) mapped from NIST 800-53 controls. Useful for DoD supply chain compliance assessments.

## Configuration

Create a config file:

```
sanicode config --init
```

This writes a `sanicode.toml` in the current directory. Config is loaded from (in order):

1. `--config` flag
2. `sanicode.toml` in the current directory
3. `~/.config/sanicode/config.toml`
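
The lookup order can be sketched as below. This assumes the first location that exists wins and that files are not merged, which the list above does not state explicitly; `find_config` and `load_config` are hypothetical names, not sanicode's API.

```python
import sys
from pathlib import Path


def find_config(explicit=None, cwd=None, home=None):
    """Return the first config path in lookup order, or None."""
    if explicit:  # 1. --config flag
        return Path(explicit)
    cwd = Path(cwd) if cwd else Path.cwd()
    home = Path(home) if home else Path.home()
    candidates = (
        cwd / "sanicode.toml",                           # 2. current directory
        home / ".config" / "sanicode" / "config.toml",   # 3. user config
    )
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


def load_config(path):
    """Parse the TOML file at `path`; no file means an empty config."""
    if path is None:
        return {}
    # Imported lazily: tomllib is stdlib on 3.11+, tomli is the backport.
    if sys.version_info >= (3, 11):
        import tomllib
    else:
        import tomli as tomllib
    with open(path, "rb") as fh:
        return tomllib.load(fh)
```

Returning an empty dict when nothing is found mirrors the fact that sanicode runs without any configuration.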

Sanicode works fully without any configuration. LLM tiers are optional — without them, the tool runs in degraded mode using AST pattern matching, taint analysis, knowledge graph construction, and compliance lookups. LLM integration adds context-aware reasoning on top of these.

### LLM tiers (optional)

The config supports three tiers for different task complexities. Supported providers include cloud APIs (Anthropic, OpenAI, Google, Azure) and self-hosted inference (vLLM, Ollama, OpenShift AI). Run `sanicode config setup` for an interactive wizard that walks through provider selection and endpoint configuration.

| Tier        | Purpose                                  | Recommended model        |
|-------------|------------------------------------------|--------------------------|
| `fast`      | Classification, severity scoring         | Granite Nano, Mistral 7B |
| `analysis`  | Data flow context, taint reasoning       | Granite Code 8B          |
| `reasoning` | Compliance mapping, graph exploitability | Llama 3.1 70B            |

## Current status

v0.8.0 adds:

- A pre-built Grafana dashboard (4-row layout with compliance score, findings trends, scan operations, and knowledge graph health panels) plus a GrafanaDatasource CR for OpenShift Thanos Querier auto-import.
- Optional MLflow integration that tracks scan runs as experiments with params, metrics, and artifact upload (`pip install 'sanicode[mlflow]'`).
- A container runtime switched to UBI9 minimal for reduced attack surface.

Everything from v0.7 carries over: field-sensitive taint analysis, SBOM-aware dependency scanning via OSV, CycloneDX 1.5 SBOM generation, FedRAMP/CMMC 2.0 mappings, SARIF enrichment, POA&M generation, STIG checklist output, multi-language scanning, 21 detection rules, inter-procedural taint analysis, and CI/CD integration.

## License

Apache-2.0
