Metadata-Version: 2.4
Name: npkt
Version: 1.0.1
Summary: AWS Service Control Policy toolkit: lint, analyze, and simulate SCP impact
Author: NPKT Team
License-Expression: MIT
Project-URL: Homepage, https://github.com/OnticX/npkt
Project-URL: Documentation, https://github.com/OnticX/npkt#readme
Project-URL: Repository, https://github.com/OnticX/npkt
Project-URL: Issues, https://github.com/OnticX/npkt/issues
Keywords: aws,scp,service-control-policy,cloudtrail,security,compliance,linter,policy-analysis
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Classifier: Topic :: System :: Systems Administration
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.1.0
Requires-Dist: rich>=13.7.0
Requires-Dist: python-dateutil>=2.8.0
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: mypy>=1.8.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Provides-Extra: aws
Requires-Dist: boto3>=1.34.0; extra == "aws"
Provides-Extra: all
Requires-Dist: npkt[aws]; extra == "all"
Dynamic: license-file

# NPKT - AWS Service Control Policy Toolkit

**"Apply SCPs with 0% chance of breaking existing workflows"**

A unified Python toolkit for linting, analyzing, and simulating AWS Service Control Policy (SCP) impact against CloudTrail logs.

## Features

- **Lint** - Validate SCP syntax and detect common mistakes (36+ rules)
- **Analyze** - Find conflicts, shadows, duplicates across policies
- **Simulate** - Test SCP impact against real CloudTrail data before deployment
- **Validate** - Quick syntax check for SCP JSON files
- **Generate Logs** - Generate mock CloudTrail events for all 442 AWS services for testing

### Simulation Capabilities

- **400+ AWS services** with resource ARN extraction (22 hardcoded quirky services + data-driven via IAM reference)
- **1,150+ condition key extractors** (88 hand-tuned + 1,069 auto-generated from IAM reference)
- **External context enrichment** - supply org ID, principal/resource tags, VPC mappings, and management account ID via `--context` file to resolve normally-unevaluable condition keys
- **Service-linked role filtering** - automatically excludes SLR events (SCPs don't apply to them)
- **Management account filtering** - excludes management account events when `management_account_id` is in the context file
- **Simulation confidence scoring** - reports whether denial rate is exact or a lower bound based on unevaluable condition keys
- **Strict conditions mode** - `--strict-conditions` treats unevaluable conditions as non-matching to produce an upper-bound denial rate
- **Resource-level permissions lint rule** - warns when SCP statements use non-`*` Resource with actions that don't support resource-level permissions (W052)
- **Multi-policy hierarchy warning** - warns when multiple policies are evaluated as a flat set instead of OU hierarchy
- **Data event gap detection** - warns when SCPs target data events (for example `s3:GetObject`) that do not appear in the supplied logs
- **Mock CloudTrail generation** - generate test events for all 442 AWS services with realistic `requestParameters` and condition key values

## Installation

### Prerequisites

- Python 3.10 or higher
- pip

### Install from source

```bash
git clone https://github.com/OnticX/npkt
cd npkt

# Install the package
pip install -e .

# Or with dev dependencies
pip install -e ".[dev]"
```

## Quick Start

### Lint an SCP

```bash
npkt lint policy.json
npkt lint ./policies/
npkt lint policy.json --format json
```

### Analyze policies for conflicts

```bash
npkt analyze ./policies/
npkt analyze policy.json
npkt analyze ./policies/ --format json
```

### Simulate SCP impact

```bash
npkt simulate policy.json --logs ./cloudtrail/
npkt simulate policy.json --logs events.json --days 30
npkt simulate policy.json --logs ./logs/ --context context.json
npkt simulate policy.json --logs ./logs/ --quick
```

### Validate syntax

```bash
npkt validate policy.json
npkt validate policy.json --verbose
```

### Generate mock CloudTrail logs

```bash
npkt generate-logs -o logs.json
npkt generate-logs -o logs.json -s ec2,s3,iam -c 100
npkt generate-logs -o logs.json --write-only --seed 42
npkt generate-logs -o logs.json -c 1000 --regions us-east-1,eu-west-1
```

## CLI Reference

### `npkt lint`

Lint SCP policies for errors and best practices.

```bash
npkt lint <policy_path> [options]
```

| Option | Description |
|--------|-------------|
| `--format, -f` | Output format: `text` (default) or `json` |
| `--strict` | Treat warnings as errors |
| `--quiet, -q` | Only show errors, suppress warnings |

**Example output:**
```
policy.json
  [W] W050: Statement denies all S3 actions without conditions (Statement.0)
  [I] W090: Statement uses NotAction (Statement.1)

Summary: 1 warning(s), 1 info
```

### `npkt analyze`

Analyze policies for conflicts, shadows, and redundancies.

```bash
npkt analyze <policy_path> [options]
```

| Option | Description |
|--------|-------------|
| `--format, -f` | Output format: `text` (default) or `json` |
| `--cross-policy/--no-cross-policy` | Enable cross-policy analysis (default: enabled) |
| `--strict` | Treat warnings as errors |

**Detected issues:**
- **DUPLICATE_STATEMENT** - Identical statements across policies
- **DUPLICATE_SID** - Duplicate statement IDs
- **SHADOW** - Statement is overshadowed by another
- **CONFLICT** - Conflicting Allow/Deny for same actions
- **UNREACHABLE** - Allow statement blocked by Deny statements

### `npkt simulate`

Simulate SCP impact against CloudTrail events.

```bash
npkt simulate <policy_path> --logs <logs_path> [options]
```

| Option | Description |
|--------|-------------|
| `--logs, -l` | Path to CloudTrail logs (required) |
| `--context, -c` | Path to external context JSON file (org ID, tags, VPC mappings) |
| `--format, -f` | Output format: `text` (default) or `json` |
| `--output, -o` | Write output to file |
| `--days, -d` | Days to analyze (default: 90) |
| `--quick` | Quick analysis with sampling |
| `--sample-size` | Sample size for quick mode (default: 1000) |
| `--no-details` | Hide detailed denial list |
| `--strict-conditions` | Treat unevaluable conditions as non-matching (worst-case upper-bound denial rate) |

**Exit codes:**
- `0` - No risk or low risk
- `1` - Medium risk
- `2` - High or critical risk
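
The exit codes above make it possible to gate a deployment pipeline on the simulation result. The following is a minimal sketch, not part of NPKT: `should_block` and `run_gate` are hypothetical helper names, and the default of blocking on anything above exit code `0` is an assumption you may want to relax.

```python
import subprocess

# Exit codes as documented above; the labels are only used for display.
EXIT_CODE_MEANING = {
    0: "no or low risk",
    1: "medium risk",
    2: "high or critical risk",
}


def should_block(exit_code: int, max_allowed: int = 0) -> bool:
    """Return True unless the exit code is within 0..max_allowed.

    Negative codes (process killed by a signal) and unknown codes
    above the threshold are treated as blocking.
    """
    return not (0 <= exit_code <= max_allowed)


def run_gate(policy: str, logs: str, max_allowed: int = 0) -> int:
    """Run `npkt simulate` and return 1 if the rollout should be blocked."""
    proc = subprocess.run(
        ["npkt", "simulate", policy, "--logs", logs, "--no-details"],
        check=False,  # a non-zero exit code is a result here, not an error
    )
    meaning = EXIT_CODE_MEANING.get(proc.returncode, "unexpected exit code")
    print(f"npkt simulate exited with {proc.returncode} ({meaning})")
    return 1 if should_block(proc.returncode, max_allowed) else 0
```

Calling `run_gate("policy.json", "./cloudtrail/", max_allowed=1)` would let medium-risk results through while still blocking high or critical ones.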

### `npkt validate`

Quick syntax validation for SCP files.

```bash
npkt validate <policy_path> [--verbose]
```

### `npkt generate-logs`

Generate mock CloudTrail log events for SCP simulation testing. Creates realistic events for any of the 442 AWS services in the IAM reference database. The `requestParameters` of each event are derived from IAM resource ARN patterns, so generated events resolve to resource ARNs at a high rate.

```bash
npkt generate-logs -o <output_path> [options]
```

| Option | Description |
|--------|-------------|
| `-o, --output` | Output file path (required) |
| `-s, --services` | Comma-separated service prefixes or `all` (default: `all`) |
| `-c, --count` | Total number of events to generate (default: 500) |
| `--regions` | Comma-separated AWS regions (default: `us-east-1,us-west-2,eu-west-1`) |
| `--account-id` | AWS account ID (default: `123456789012`) |
| `--seed` | Random seed for reproducible output |
| `--write-only` | Only include write/mutative operations (skip read/list) |

**Example workflow - generate logs then simulate:**
```bash
# Generate 500 write-only events for key services
npkt generate-logs -o test_logs.json -s ec2,s3,iam,lambda,rds -c 500 --write-only --seed 42

# Simulate your SCP against the generated events
npkt simulate policy.json --logs test_logs.json

# Simulate with external context for higher accuracy
npkt simulate policy.json --logs test_logs.json --context context.json
```

## Python API

```python
from npkt import (
    load_policy,
    load_policies_from_dir,
    SCPLinter,
    PolicyAnalyzer,
    analyze_policies,
    ImpactAnalyzer,
    FileIngester,
)

# Load and lint a policy
policy = load_policy("policy.json")
linter = SCPLinter()
report = linter.lint(policy.to_dict())

if report.has_errors:
    for result in report.errors:
        print(f"{result.code}: {result.message}")

# Analyze multiple policies
policies = load_policies_from_dir("./policies/")
analysis = analyze_policies(*policies)

for issue in analysis.issues:
    print(f"{issue.type.value}: {issue.message}")

# Simulate SCP impact
ingester = FileIngester("./cloudtrail/")
analyzer = ImpactAnalyzer(
    scp_policies=[policy],
    cloudtrail_ingester=ingester,
)
report = analyzer.analyze()

print(f"Denial rate: {report.denial_rate:.2%}")
print(f"Risk level: {report.get_risk_level()}")

# Simulate with external context for better accuracy
from npkt import ExternalContext

ctx = ExternalContext.from_file("context.json")
analyzer = ImpactAnalyzer(
    scp_policies=[policy],
    cloudtrail_ingester=ingester,
    external_context=ctx,
)
report = analyzer.analyze()
```

## Project Structure

```
NPKT/
+-- src/npkt/               # Main package
|   +-- cli/                # CLI commands
|   |   +-- main.py         # Entry point
|   |   +-- lint.py         # lint command
|   |   +-- analyze.py      # analyze command
|   |   +-- simulate.py     # simulate command
|   |   +-- validate.py     # validate command
|   |   +-- generate.py     # generate-logs command
|   +-- models/             # Data models
|   |   +-- scp.py          # SCPStatement, SCPPolicy
|   |   +-- cloudtrail.py   # CloudTrailEvent
|   |   +-- report.py       # ImpactReport, EvaluationResult, EvaluationContext
|   |   +-- external_context.py # ExternalContext (--context FILE)
|   |   +-- lint.py         # LintReport, LintResult
|   |   +-- analysis.py     # AnalysisReport, Issue
|   +-- linter/             # SCP linter
|   +-- analyzer/           # Policy and impact analysis
|   +-- engine/             # SCP evaluation engine
|   +-- parsers/            # SCP parsers
|   +-- ingest/             # CloudTrail ingesters
|   +-- reporters/          # Output formatters
|   +-- generators/         # CloudTrail log generation
|   +-- data/               # IAM reference data
+-- tests/                  # Test suite (1219 tests)
|   +-- test_services/      # Per-service tests (48 files)
|   +-- test_cli/           # CLI command tests
|   +-- test_engine/        # Engine tests
|   +-- fixtures/           # Test data
```

## How It Works

1. **Parse SCP**: Reads and validates SCP policies (JSON format)
2. **Ingest CloudTrail**: Loads CloudTrail events from files (JSON/gzip)
3. **Filter**: Excludes service-linked role events and management account events (SCPs do not apply to either)
4. **Extract Context**: Resolves resource ARNs and condition key values from each event
5. **Enrich Context**: If `--context` is provided, enriches each event with external data (org ID, principal/resource tags, VPC mappings)
6. **Evaluate**: Tests each event against SCP statements (action, resource, principal, conditions)
7. **Track Confidence**: Records unresolved resources and unevaluable condition keys; qualifies denial rate as exact or lower-bound
8. **Analyze**: Aggregates results, calculates statistics, detects data event gaps
9. **Report**: Generates output with risk assessment, confidence score, and recommendations

## Understanding Risk Levels

| Level | Denial Rate | Action |
|-------|-------------|--------|
| NONE | 0% | Safe to apply |
| LOW | <1% | Review denials, likely safe |
| MEDIUM | 1-5% | Careful review needed |
| HIGH | 5-20% | Significant impact expected |
| CRITICAL | >20% | Major impact, refine SCP first |
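
The table can be read as a simple threshold function. The sketch below is an interpretation, not NPKT's code: `classify_risk` is a hypothetical name, the rate is assumed to be a fraction (as suggested by the `:.2%` formatting in the Python API example), and the boundary values are an assumption because the table lists 5% under both MEDIUM and HIGH.

```python
def classify_risk(denial_rate: float) -> str:
    """Map a denial rate (fraction, e.g. 0.03 for 3%) to a risk level.

    Assumed boundaries: exactly 1% is MEDIUM, exactly 5% is HIGH,
    exactly 20% is HIGH, and anything above 20% is CRITICAL.
    """
    if denial_rate <= 0:
        return "NONE"
    if denial_rate < 0.01:
        return "LOW"
    if denial_rate < 0.05:
        return "MEDIUM"
    if denial_rate <= 0.20:
        return "HIGH"
    return "CRITICAL"
```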

## Supported SCP Features

- **Effects**: Allow, Deny
- **Actions**: Wildcards (`*`, `s3:*`, `s3:Delete*`)
- **NotAction**: Inverse action matching
- **Resources/NotResource**: ARN pattern matching with wildcards
- **Principal/NotPrincipal**: Principal ARN pattern matching
- **Conditions**: 24 operators with the `IfExists` suffix and the `ForAllValues`/`ForAnyValue` set qualifiers
  - String: `StringEquals`, `StringLike`, `StringEqualsIgnoreCase`, etc.
  - ARN: `ArnEquals`, `ArnLike`, `ArnNotEquals`, `ArnNotLike`
  - Numeric: `NumericEquals`, `NumericLessThan`, `NumericGreaterThan`, etc.
  - IP: `IpAddress`, `NotIpAddress`
  - Date: `DateEquals`, `DateLessThan`, `DateGreaterThan`, etc.
  - Bool, Null

## Resource ARN Extraction

NPKT uses a hybrid approach for extracting resource ARNs from CloudTrail events:

### Hardcoded Quirky Services (22)

Services with non-trivial extraction logic that requires hand-tuned patterns:

| Category | Services |
|----------|----------|
| **Compute** | EC2 (10 resource types, nested `instancesSet`), EKS (sub-resources under cluster) |
| **Storage** | S3 (composite `bucket/key`, regionless) |
| **Database** | RDS (colon separator `db:id`), ElastiCache (colon separator `cluster:id`) |
| **Messaging** | SQS (URL parsing), EventBridge |
| **Networking** | ELBv2, Route 53 (prefix stripping, regionless), CloudFront (regionless) |
| **Security** | IAM (regionless, priority ordering), KMS (UUID/alias/ARN detection), WAFv2 (scope-based path), Organizations |
| **Monitoring** | CloudWatch, CloudTrail, AWS Config |
| **Integration** | Step Functions, SSM (leading slash stripping), CodePipeline |
| **Data** | Glue (composite `database/table`) |
| **DevOps** | CloudFormation |

### Data-Driven Services (400+)

All remaining services use IAM reference ARN patterns for automatic extraction:

- **ARN passthrough** - Detects when parameters already contain valid ARNs
- **Template scoring** - When multiple resource types exist, picks the best match by resolved placeholders and specificity
- **8 parameter matching strategies** - Exact, camelCase, lowercase, snake_case, abbreviation expansion, suffix stripping, aliases, name fallback
- **Regionless/accountless handling** - Correctly handles services that omit region or account from ARNs

## Condition Key Extraction

NPKT evaluates **1,150+ condition keys** from CloudTrail events using a two-tier system:

### Hand-Tuned Extractors (88 keys across 31 services)

Manually crafted extractors for keys with non-obvious mappings:

| Service | Example Keys |
|---------|--------------|
| **S3** | `s3:prefix`, `s3:delimiter`, `s3:x-amz-acl`, `s3:x-amz-server-side-encryption` |
| **EC2** | `ec2:instancetype`, `ec2:imageid`, `ec2:region`, `ec2:tenancy`, `ec2:volumetype` |
| **RDS** | `rds:databaseclass`, `rds:databaseengine`, `rds:multi-az`, `rds:storagetype` |
| **Lambda** | `lambda:functionarn`, `lambda:layer`, `lambda:runtime` |
| **KMS** | `kms:viaservice`, `kms:callerarn`, `kms:encryptioncontext` |
| **IAM/STS** | `iam:permissionsboundary`, `sts:rolesessionname`, `sts:externalid` |

### Auto-Generated Extractors (1,069 keys across 200+ services)

Derived from IAM reference condition key definitions at startup. Key name parts are converted to `requestParameters` field candidates (e.g., `sagemaker:VolumeKmsKeyId` -> `volumeKmsKeyId`). Hand-tuned extractors always take priority.

## External Context Enrichment

Some condition keys (`aws:PrincipalOrgId`, `aws:PrincipalTag/*`, `aws:ResourceTag/*`, `aws:SourceVpc`) require data not present in CloudTrail events. Without this data, NPKT conservatively assumes conditions match and reports them as unevaluable.

The `--context` flag lets you supply this data via a JSON file, turning "assumed match" into actual evaluation:

```bash
npkt simulate policy.json --logs ./logs/ --context context.json
```

### Context file format

```json
{
  "management_account_id": "123456789012",
  "organization": {
    "id": "o-a1b2c3d4e5",
    "paths": ["o-a1b2c3d4e5/r-ab12/ou-ab12-11111111"]
  },
  "principals": {
    "arn:aws:iam::123456789012:role/AdminRole": {
      "tags": { "Department": "Engineering", "Environment": "production" }
    },
    "arn:aws:iam::123456789012:role/*": {
      "tags": { "OrgUnit": "eng" }
    }
  },
  "resources": {
    "arn:aws:s3:::my-bucket": {
      "tags": { "Classification": "confidential" }
    },
    "arn:aws:s3:::public-*": {
      "tags": { "Classification": "public" }
    }
  },
  "vpc_map": {
    "vpce-0a1b2c3d": "vpc-11111111"
  }
}
```

### What each section resolves

| Section | Effect |
|---------|--------|
| `management_account_id` | Excludes events from this account (SCPs don't apply to management account) |
| `organization.id` | Resolves `aws:PrincipalOrgId` condition key |
| `organization.paths` | Resolves `aws:PrincipalOrgPaths` condition key |
| `principals.*.tags` | Resolves `aws:PrincipalTag/*` condition keys |
| `resources.*.tags` | Resolves `aws:ResourceTag/*` condition keys |
| `vpc_map` | Resolves `aws:SourceVpc` (via VPC endpoint ID mapping) |

Principal and resource ARN patterns support `*` and `?` wildcards. Exact matches take priority over wildcards, and more specific patterns override less specific ones.
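
One way to picture this precedence is the sketch below, which looks up the tags for an ARN in a `principals` or `resources` section. It is an interpretation rather than NPKT's code: it assumes that tags from all matching patterns are merged, that "more specific" means more literal characters (then fewer `*`), and that the most specific pattern wins per tag key. The function names are hypothetical.

```python
import re


def _matches(pattern: str, arn: str) -> bool:
    """Wildcard match where `*` is any run of characters and `?` is one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, arn) is not None


def _specificity(pattern: str) -> tuple[int, int]:
    """More literal characters first, then fewer `*` wildcards."""
    literal = sum(1 for ch in pattern if ch not in "*?")
    return (literal, -pattern.count("*"))


def lookup_tags(arn: str, entries: dict[str, dict]) -> dict[str, str]:
    """Merge tags from matching patterns, least specific first.

    Later (more specific) patterns overwrite earlier ones, so an exact
    ARN entry always has the final say.
    """
    matching = sorted((p for p in entries if _matches(p, arn)), key=_specificity)
    tags: dict[str, str] = {}
    for pattern in matching:
        tags.update(entries[pattern].get("tags", {}))
    return tags
```

With the example context file above, `AdminRole` would end up with its own `Department` and `Environment` tags plus the `OrgUnit` tag from the `role/*` pattern, under the merge assumption stated here.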

### Gathering context data

The data for the context file can be collected with a few AWS CLI commands:

```bash
# Organization ID
aws organizations describe-organization --query 'Organization.Id'

# Principal tags
aws iam list-role-tags --role-name MyRole
aws iam list-user-tags --user-name MyUser

# Resource tags
aws resourcegroupstaggingapi get-resources --resource-type-filters ec2:instance

# VPC endpoint to VPC mapping
aws ec2 describe-vpc-endpoints --query 'VpcEndpoints[].{Id:VpcEndpointId,VpcId:VpcId}'
```
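
The same data can be assembled into a context file with boto3 (available through the `aws` extra). The sketch below is an assumption-laden starting point, not a bundled NPKT helper: `build_context`, `collect`, and `write_context` are hypothetical names, pagination is ignored, resource tags are left out, and the management account ID is taken from the `MasterAccountId` field of the `describe_organization` response.

```python
import json
from typing import Any


def tags_to_dict(tag_list: list[dict[str, str]]) -> dict[str, str]:
    """Convert AWS `[{"Key": ..., "Value": ...}]` tag lists to a plain dict."""
    return {tag["Key"]: tag["Value"] for tag in tag_list}


def build_context(
    org_response: dict[str, Any],
    role_tags: dict[str, list[dict[str, str]]],
    endpoints_response: dict[str, Any],
) -> dict[str, Any]:
    """Shape raw API responses into the context file format shown above."""
    org = org_response["Organization"]
    return {
        "management_account_id": org["MasterAccountId"],
        "organization": {"id": org["Id"]},
        "principals": {
            arn: {"tags": tags_to_dict(tags)} for arn, tags in role_tags.items()
        },
        "vpc_map": {
            ep["VpcEndpointId"]: ep["VpcId"]
            for ep in endpoints_response["VpcEndpoints"]
        },
    }


def collect(role_names: list[str]) -> dict[str, Any]:
    """Call AWS and build the context (first page of each result only)."""
    import boto3  # imported lazily so the helpers above work without boto3

    org = boto3.client("organizations").describe_organization()
    iam = boto3.client("iam")
    role_tags = {}
    for name in role_names:
        arn = iam.get_role(RoleName=name)["Role"]["Arn"]
        role_tags[arn] = iam.list_role_tags(RoleName=name)["Tags"]
    endpoints = boto3.client("ec2").describe_vpc_endpoints()
    return build_context(org, role_tags, endpoints)


def write_context(path: str, context: dict[str, Any]) -> None:
    """Write the context dict as indented JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(context, handle, indent=2)
```

For example, `write_context("context.json", collect(["AdminRole"]))` would produce a file that can be passed to `npkt simulate` via `--context`.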

## Testing

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=npkt

# Run specific test file
pytest tests/test_engine/test_scp_engine.py
```

## Development

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run linters
ruff check .
mypy src/

# Format code
ruff format .
```

## Simulation Confidence

NPKT reports simulation confidence to help you understand result reliability:

```
X Would Be DENIED:  3 (>=0.3% -- lower bound, 2 unevaluable condition key(s))

  Filtered: 42 service-linked role event(s) (SCPs do not apply to SLRs)
  Filtered: 15 management account event(s) (SCPs do not apply to management account)

Simulation Confidence: MEDIUM
------------------------------------------------------------
  Resource ARN resolved: 847/1,000 events (84.7%)
  Unevaluable condition keys encountered:
    - aws:PrincipalOrgId (found in 3 evaluations)
    - aws:ResourceTag/Environment (found in 1 evaluation)

  WARNING: These conditions were assumed to MATCH (not deny).
  The actual denial rate may be HIGHER than reported.
  Supply a --context file to resolve evaluable keys, or use
  --strict-conditions to treat unevaluable conditions as denials (worst-case).
```

Use `--context` to resolve unevaluable keys and improve confidence:

```bash
npkt simulate policy.json --logs ./logs/ --context context.json
```

Use `--strict-conditions` for worst-case analysis (upper-bound denial rate):

```bash
npkt simulate policy.json --logs ./logs/ --strict-conditions
```

Running both modes gives a range: the normal mode shows a lower bound and strict mode shows an upper bound. The actual denial rate is somewhere in between.

**Confidence Levels:**
- **HIGH** - Resource resolution >95%, few unevaluable keys
- **MEDIUM** - Some resources unresolved or unevaluable keys present
- **LOW** - Significant data gaps, results may be unreliable

## Known Limitations

### CloudTrail Ingestion
- **File-based only**: CloudTrail logs must be downloaded locally (JSON or gzip format)
- **No S3 direct access**: Cannot read logs directly from S3 buckets
- **Data events**: S3 object operations, Lambda invocations, and DynamoDB item operations require explicit CloudTrail data event logging. Most trails capture only management events, so a deny rule targeting `s3:GetObject` would show a 0% denial rate if data events were not enabled, giving false confidence. NPKT warns when SCPs target these events but none are found in the logs.

### Condition Keys Not Evaluable

Some condition keys require external context not available in CloudTrail. Most of these can be resolved by providing a `--context` file (see [External Context Enrichment](#external-context-enrichment)):

| Key Type | Examples | Resolvable via `--context`? |
|----------|----------|---------------------------|
| **Organization context** | `aws:PrincipalOrgId`, `aws:PrincipalOrgPaths` | Yes |
| **Principal tags** | `aws:PrincipalTag/*` | Yes |
| **Resource tags** | `aws:ResourceTag/*` | Yes |
| **VPC context** | `aws:SourceVpc` | Yes (via VPC endpoint mapping) |
| **Service-specific keys** | `s3:prefix`, `s3:x-amz-acl`, `kms:ViaService`, etc. | No |
| **Multi-factor auth** | `aws:MultiFactorAuthAge` | No |

When these keys are encountered without a context file, NPKT assumes the condition matches (conservative approach -- the reported denial rate is a lower bound) and tracks them in the simulation confidence report. SCPs that rely heavily on service-specific condition keys will have less accurate results. Use `--strict-conditions` to flip this assumption and get an upper-bound denial rate.

### Resource ARN Extraction

- **400+ services supported**: 22 hardcoded quirky services + data-driven extraction via IAM reference for all others
- **5-layer extraction**: Direct ARN fields, quirky service patterns, ~190 known ARN parameter keys, IAM reference template resolution, response element scan
- **Resolution rate**: Typically 80-98% depending on service mix. The simulation confidence section reports the exact resolution rate so you can assess impact.

### SCP Evaluation Scope

- **SCP layer only**: This tool evaluates SCPs in isolation. It does not model identity policies, resource policies, permissions boundaries, or session policies. An action the SCP allows could still be denied by other policy types (and vice versa).
- **Service-linked roles**: Automatically filtered out (SCPs do not apply to SLRs)
- **Management account**: Filtered when `management_account_id` is provided in the context file
- **Resource-level permissions**: Linter warns (W052) when actions that don't support resource-level permissions are paired with non-`*` Resource restrictions. The simulator does not yet adjust Resource matching for these actions.
- **OU hierarchy**: SCPs are inherited at every level (Root, OU, Account) and all must allow an action. This tool evaluates provided policies as a flat set and warns when multiple policies are provided.

## Troubleshooting

If you encounter issues, see [TROUBLESHOOTING.md](TROUBLESHOOTING.md).

**Common checks:**
1. Validate your SCP: `npkt validate policy.json`
2. Run tests: `pytest` (1219 tests verify functionality)
3. Check CloudTrail format: Ensure valid JSON with `Records` array
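
For the third check, a short standalone script can confirm that a log file parses and has the expected shape before you hand it to `npkt simulate`. This is a sketch independent of NPKT's own ingester; `load_records` is a hypothetical name, and gzip detection by the `.gz` suffix is an assumption.

```python
import gzip
import json
from pathlib import Path


def load_records(path: str) -> list[dict]:
    """Load a CloudTrail log file (plain or gzip) and return its Records.

    Raises ValueError if the top level is not an object with a
    `Records` array, and json.JSONDecodeError if the file is not JSON.
    """
    file_path = Path(path)
    opener = gzip.open if file_path.suffix == ".gz" else open
    with opener(file_path, "rt", encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, dict) or not isinstance(data.get("Records"), list):
        raise ValueError(f"{path}: expected a JSON object with a 'Records' array")
    return data["Records"]
```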
