Metadata-Version: 2.4
Name: pycrq
Version: 1.0.0
Summary: A production-quality Open FAIR library for quantitative cyber risk analysis
License: MIT
Project-URL: Homepage, https://github.com/securemetrics/pyCRQ
Project-URL: Repository, https://github.com/securemetrics/pyCRQ
Project-URL: Issues, https://github.com/securemetrics/pyCRQ/issues
Keywords: cyber risk,FAIR,risk quantification,monte carlo,information security
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Topic :: Security
Classifier: Topic :: Office/Business :: Financial
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.24.0
Requires-Dist: scipy>=1.10.0
Requires-Dist: matplotlib>=3.7.0
Dynamic: license-file

# Open FAIR Python Library

A production-quality implementation of the **Open FAIR** (Factor Analysis of Information Risk) standard for quantitative cyber risk analysis. This library provides Monte Carlo simulation, risk metrics, scenario building, control ROI analysis, sensitivity analysis, and rich reporting — all in a clean, well-typed Python API.

---

## Table of Contents

1. [Overview](#overview)
2. [Installation](#installation)
3. [Quick Start](#quick-start)
4. [Architecture](#architecture)
5. [API Reference](#api-reference)
   - [Distributions](#distributions)
   - [Ontology (Data Model)](#ontology-data-model)
   - [ScenarioBuilder](#scenariobuilder)
   - [FAIRSimulator](#fairsimulator)
   - [Analysis Functions](#analysis-functions)
   - [Reporting](#reporting)
   - [Utilities](#utilities)
6. [Examples](#examples)
7. [Open FAIR Model Overview](#open-fair-model-overview)
8. [License](#license)

---

## Overview

Open FAIR is a risk taxonomy and measurement standard published by The Open Group. It provides a structured way to decompose and quantify information risk in financial terms. This library implements the core FAIR ontology and adds:

- **Monte Carlo simulation** with configurable iteration counts and seeds
- **Eight probability distributions**: PERT, Normal, LogNormal, Uniform, Triangular, Constant, Beta, Poisson
- **Fluent scenario builder** for readable, maintainable risk models
- **Risk registers** for portfolio-level analysis
- **Control ROI (ROSI)** calculation
- **Sensitivity analysis** for one-at-a-time parameter sweeps
- **Bootstrap confidence intervals** for simulation uncertainty
- **Text reports, JSON/CSV export, and matplotlib visualisations**
- **Calibration helpers** for converting expert estimates to distributions

---

## Installation

### Prerequisites

- Python 3.9 or later
- pip

### Install Dependencies

```bash
pip install -r requirements.txt
```

**requirements.txt:**
```
numpy>=1.24.0
scipy>=1.10.0
matplotlib>=3.7.0
```

### Install as a Package (Optional)

To import `pycrq` from any working directory, install it in editable (development) mode from the repository root:

```bash
pip install -e .
```

Or copy the `pycrq/` directory into your project.

---

## Quick Start

```python
from pycrq import (
    ScenarioBuilder, FAIRSimulator,
    pert, scenario_report, compute_metrics, format_currency
)

# 1. Build a scenario using the fluent API
scenario = (
    ScenarioBuilder("Ransomware Attack")
    .asset("Customer Database", "Information", "PII and payment records")
    .threat("External Hacker", "External Adversarial", "Financially motivated group")
    .add_control("EDR Solution", "Endpoint detection and response", "Detective")
    .tef(pert(low=1, mode=3, high=8))           # 1-8 attempts/year
    .vulnerability(pert(low=0.10, mode=0.25, high=0.55))
    .loss_productivity(pert(50_000, 200_000, 750_000))
    .loss_response(pert(30_000, 100_000, 400_000))
    .loss_reputation(pert(0, 75_000, 500_000))
    .secondary_loss(
        slef=pert(0.05, 0.15, 0.35),
        slm=pert(50_000, 250_000, 2_000_000)
    )
    .build()
)

# 2. Simulate
simulator = FAIRSimulator(n_simulations=10_000, seed=42)
result = simulator.simulate(scenario)

# 3. Report
print(scenario_report(result))

# 4. Access metrics programmatically
metrics = compute_metrics(result)
print(f"Mean ALE: {format_currency(metrics.mean_ale)}")
print(f"Risk Level: {metrics.risk_level}")
print(f"VaR (95%): {format_currency(metrics.var_95)}")
```

---

## Architecture

```
pycrq/
  __init__.py        — Public API exports, version
  distributions.py   — Probability distributions (PERT, Normal, etc.)
  ontology.py        — FAIR data model (Asset, ThreatAgent, FAIRScenario, ...)
  simulation.py      — Monte Carlo engine (FAIRSimulator, SimulationResult)
  scenarios.py       — Fluent ScenarioBuilder API
  analysis.py        — Risk metrics, ranking, ROI, sensitivity
  reporting.py       — Text/JSON/CSV reports and matplotlib plots
  utils.py           — Calibration, validation, unit conversions

examples/
  01_basic_scenario.py       — Single ransomware scenario
  02_risk_register.py        — 5-scenario risk register
  03_control_roi.py          — MFA implementation ROI
  04_sensitivity_analysis.py — Threat capability sweep
```

---

## API Reference

### Distributions

All distributions inherit from `Distribution` and implement:

| Method | Description |
|--------|-------------|
| `.sample(n)` | Draw `n` samples as `np.ndarray` |
| `.mean()` | Theoretical mean |
| `.std()` | Theoretical standard deviation |
| `.percentile(p)` | The `p`-th percentile (0–100) |
| `.to_dict()` | JSON-serializable representation |

**Available distributions:**

```python
from pycrq import (
    pert, normal, lognormal, uniform, triangular, constant, beta, poisson
)

# PERT — the workhorse for expert elicitation
tef_dist = pert(low=1, mode=3, high=8)          # (lam=4.0 default)

# Normal
loss_dist = normal(mean=500_000, std=100_000)

# LogNormal — from real-space moments
from pycrq import LogNormalDistribution
loss_dist = LogNormalDistribution.from_moments(mean=500_000, std=300_000)

# Others
vuln_dist  = uniform(low=0.1, high=0.4)
freq_dist  = triangular(low=1, mode=4, high=12)
fixed_val  = constant(value=50_000)
score_dist = beta(alpha=2, beta=5, low=0, high=100)
count_dist = poisson(lam=3.5)

# Deserialize from dict
from pycrq import from_dict
d = tef_dist.to_dict()   # {"type": "pert", "low": 1, "mode": 3, "high": 8, "lam": 4.0}
tef_dist2 = from_dict(d)
```
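
For readers unfamiliar with PERT, the following is a minimal standard-library sketch of how a modified-PERT distribution can be sampled as a rescaled Beta distribution. It is not taken from `pycrq`; the names `sample_pert` and `pert_mean` are hypothetical, and the library's own implementation may differ.

```python
import random
import statistics


def sample_pert(low, mode, high, lam=4.0, n=10_000, seed=None):
    """Draw n samples from a modified-PERT distribution via a scaled Beta."""
    if not low <= mode <= high or low == high:
        raise ValueError("require low <= mode <= high and low < high")
    rng = random.Random(seed)
    span = high - low
    # Shape parameters of the underlying Beta distribution on [0, 1].
    alpha = 1.0 + lam * (mode - low) / span
    beta = 1.0 + lam * (high - mode) / span
    return [low + span * rng.betavariate(alpha, beta) for _ in range(n)]


def pert_mean(low, mode, high, lam=4.0):
    """Closed-form mean of the modified-PERT distribution."""
    return (low + lam * mode + high) / (lam + 2.0)


samples = sample_pert(1, 3, 8, seed=42)
print(pert_mean(1, 3, 8))                    # 3.5
print(round(statistics.fmean(samples), 2))   # should land close to 3.5
```

With `lam=4.0` the mode is weighted four times as heavily as either bound, which is why the mean of `pert(1, 3, 8)` is 3.5 rather than the midpoint 4.5.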

---

### Ontology (Data Model)

```python
from pycrq import (
    Asset, ThreatAgent, Control,
    FrequencyFactors, LossMagnitudeFactors,
    FAIRScenario, RiskRegister,
    AssetType, ThreatType, EffectType
)

# Asset
asset = Asset(
    name="Customer Database",
    asset_type=AssetType.INFORMATION,
    description="Stores PII and payment records.",
    tags=["critical", "pii"],
)

# ThreatAgent
threat = ThreatAgent(
    name="External Hacker",
    threat_type=ThreatType.EXTERNAL_ADVERSARIAL,
)

# FrequencyFactors
ff = FrequencyFactors(
    tef=pert(1, 3, 8),
    vulnerability=pert(0.1, 0.25, 0.55),
)

# Alternatively, derive TEF and vulnerability from their component factors:
ff = FrequencyFactors(
    contact_frequency=pert(5, 10, 20),
    probability_of_action=pert(0.2, 0.4, 0.7),
    threat_capability=pert(40, 60, 80),
    control_strength=pert(50, 65, 80),
)

# LossMagnitudeFactors
lmf = LossMagnitudeFactors(
    # Option A: one aggregate primary-loss estimate
    primary_loss=pert(100_000, 500_000, 2_000_000),
    # Option B: itemise by loss category instead (use A or B, not both)
    productivity=pert(50_000, 200_000, 750_000),
    response=pert(30_000, 100_000, 400_000),
    fines_judgments=pert(0, 75_000, 500_000),
    # Secondary loss:
    secondary_loss_event_frequency=pert(0.1, 0.25, 0.5),
    secondary_loss_magnitude=pert(100_000, 500_000, 3_000_000),
)

# FAIRScenario
scenario = FAIRScenario(
    name="Ransomware Attack",
    asset=asset,
    threat_agent=threat,
    frequency_factors=ff,
    loss_magnitude_factors=lmf,
    time_horizon_years=1.0,
)

# RiskRegister
register = RiskRegister(name="Enterprise Cyber 2026")
register.add_scenario(scenario)
register.remove_scenario("Ransomware Attack")   # remove by scenario name
s = register.get_scenario("Some Scenario")      # look up by scenario name
```

---

### ScenarioBuilder

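The fluent builder API is the recommended way to construct scenarios. The chain below lists the builder methods for reference; where comments mark alternatives, a real scenario uses only one option from each group: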

```python
from pycrq import ScenarioBuilder, pert

scenario = (
    ScenarioBuilder("My Scenario")
    # Asset
    .asset("Name", "Information", "Description")
    .with_asset(existing_asset_object)           # alternative

    # Threat
    .threat("Name", "External Adversarial", "Description")
    .with_threat(existing_threat_object)          # alternative

    # Controls
    .add_control("Control Name", "Description", "Preventive")
    .with_control(existing_control_object)        # alternative

    # Metadata
    .describe("Scenario narrative text")
    .tag("tag1", "tag2")
    .effect("Confidentiality")                    # or EffectType enum
    .time_horizon(1.0)                            # years

    # Frequency (use one of these groups)
    .tef(pert(1, 3, 8))                           # Direct TEF
    .contact_frequency(pert(5, 10, 20))           # or derive TEF as CF × PoA
    .probability_of_action(pert(0.2, 0.5, 0.8))
    .threat_capability(pert(40, 60, 80))          # for logistic vuln
    .control_strength(pert(50, 65, 80))
    .vulnerability(pert(0.1, 0.25, 0.55))         # or direct vuln

    # Loss magnitude (use primary_loss OR category methods)
    .primary_loss(pert(100_000, 500_000, 2_000_000))
    .loss_productivity(pert(50_000, 200_000, 750_000))
    .loss_response(pert(30_000, 100_000, 400_000))
    .loss_replacement(pert(10_000, 50_000, 200_000))
    .loss_competitive_advantage(pert(0, 25_000, 150_000))
    .loss_fines_judgments(pert(0, 75_000, 500_000))
    .loss_reputation(pert(0, 50_000, 300_000))
    .secondary_loss(
        slef=pert(0.05, 0.20, 0.45),             # probability
        slm=pert(50_000, 250_000, 2_000_000),    # magnitude
    )

    .build()                                      # returns FAIRScenario
)
```
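
The chaining style above relies on every setter returning the builder itself. The sketch below illustrates that pattern with hypothetical `ToyScenario` and `ToyScenarioBuilder` classes; it is not the `pycrq` implementation, and the validation rule in `build()` is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyScenario:
    name: str
    asset: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    time_horizon_years: float = 1.0


class ToyScenarioBuilder:
    """Hypothetical builder; illustrates method chaining only."""

    def __init__(self, name: str) -> None:
        self._scenario = ToyScenario(name=name)

    def asset(self, name: str) -> "ToyScenarioBuilder":
        self._scenario.asset = name
        return self  # returning self is what makes chaining possible

    def tag(self, *tags: str) -> "ToyScenarioBuilder":
        self._scenario.tags.extend(tags)
        return self

    def time_horizon(self, years: float) -> "ToyScenarioBuilder":
        if years <= 0:
            raise ValueError("time horizon must be positive")
        self._scenario.time_horizon_years = years
        return self

    def build(self) -> ToyScenario:
        # Validate once, at the end, so setters can be called in any order.
        if self._scenario.asset is None:
            raise ValueError("an asset is required")
        return self._scenario


toy = (
    ToyScenarioBuilder("Ransomware Attack")
    .asset("Customer Database")
    .tag("critical", "pii")
    .time_horizon(2.0)
    .build()
)
print(toy)
```

Deferring validation to `build()` keeps the setters order-independent, which is one common reason to prefer a builder over a long constructor call.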

---

### FAIRSimulator

```python
from pycrq import FAIRSimulator

simulator = FAIRSimulator(
    n_simulations=10_000,   # iterations (default 10,000)
    seed=42,                 # optional for reproducibility
)

# Single scenario
result = simulator.simulate(scenario)          # -> SimulationResult

# Full risk register
agg_result = simulator.simulate_register(register)  # -> AggregateSimulationResult
```

**SimulationResult attributes:**

| Attribute | Type | Description |
|-----------|------|-------------|
| `scenario_name` | `str` | Scenario identifier |
| `n_simulations` | `int` | Number of iterations |
| `annual_loss_exposure` | `np.ndarray` | Per-iteration ALE |
| `loss_event_frequency` | `np.ndarray` | Per-iteration LEF |
| `loss_magnitude` | `np.ndarray` | Per-iteration LM |
| `threat_event_frequency` | `np.ndarray` | Per-iteration TEF |
| `vulnerability` | `np.ndarray` | Per-iteration vulnerability |
| `primary_loss` | `np.ndarray` | Per-iteration primary loss |
| `secondary_loss` | `np.ndarray` | Per-iteration secondary loss |
| `loss_by_category` | `Dict[str, np.ndarray]` | Per-category losses |
| `mean_ale` | `float` (property) | Mean ALE |
| `std_ale` | `float` (property) | Std dev of ALE |
| `.percentile(p)` | method | `p`-th percentile of ALE |
| `.var(confidence)` | method | Value-at-Risk |
| `.cvar(confidence)` | method | Conditional VaR |
| `.summary()` | method | Dict of all key stats |
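
As background for `.var(confidence)` and `.cvar(confidence)`, here is a small standard-library sketch of empirical Value-at-Risk and Conditional VaR over a list of per-iteration losses. The function names are hypothetical, and the nearest-rank quantile used here is an assumption; the library may interpolate between samples instead.

```python
import math


def value_at_risk(losses, confidence=0.95):
    """Empirical VaR using the nearest-rank quantile."""
    ordered = sorted(losses)
    # Smallest value with at least `confidence` of the data at or below it.
    rank = max(1, math.ceil(confidence * len(ordered)))
    return ordered[rank - 1]


def conditional_value_at_risk(losses, confidence=0.95):
    """Empirical CVaR: mean of the losses at or above the VaR threshold."""
    threshold = value_at_risk(losses, confidence)
    tail = [x for x in losses if x >= threshold]
    return sum(tail) / len(tail)


toy_losses = [float(i) for i in range(1, 101)]        # toy losses: 1.0 .. 100.0
print(value_at_risk(toy_losses, 0.95))                # 95.0
print(conditional_value_at_risk(toy_losses, 0.95))    # 97.5
```

CVaR is never smaller than VaR at the same confidence level, because it averages only the outcomes at or beyond the VaR threshold.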

---

### Analysis Functions

```python
from pycrq import (
    compute_metrics,
    classify_risk_level,
    rank_scenarios,
    compute_risk_reduction,
    control_roi,
    sensitivity_analysis,
    bootstrap_confidence_interval,
)

# Full metrics from a SimulationResult
metrics = compute_metrics(result)
# metrics.mean_ale, metrics.var_95, metrics.cvar_95, metrics.risk_level, ...

# Risk level from mean ALE
level = classify_risk_level(metrics.mean_ale)
# Returns: "LOW" | "MEDIUM" | "HIGH" | "CRITICAL"

# Rank a list of SimulationResult objects
ranked = rank_scenarios(results, by="mean_ale")
# Returns: [("scenario_name", value), ...] sorted descending

# Compare before/after a control
reduction = compute_risk_reduction(before_result, after_result)
# reduction["mean_ale_reduction_abs"], reduction["mean_ale_reduction_pct"], ...

# Compute ROSI
roi = control_roi(before_result, after_result, control_cost=60_000)
# roi["rosi_pct"], roi["net_benefit"], roi["break_even_years"], roi["recommended"]

# Sensitivity analysis (one-at-a-time)
# build_scenario is a user-defined function that returns a FAIRScenario for one parameter value
results = sensitivity_analysis(
    scenario_factory=lambda cap: build_scenario(cap),
    param_name="threat_capability",
    param_values=[20, 40, 60, 80],
    n_simulations=5_000,
    seed=42,
)

# Bootstrap confidence interval
lower, upper = bootstrap_confidence_interval(
    result, statistic="mean", confidence=0.95, n_bootstrap=1_000
)
```
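
To make the ROSI and bootstrap ideas concrete, the sketch below shows the conventional ROSI formula and a percentile bootstrap for the mean, using only the standard library. Both functions are hypothetical illustrations under those assumptions; `control_roi` and `bootstrap_confidence_interval` may compute their results differently.

```python
import random
import statistics


def rosi(ale_before, ale_after, annual_control_cost):
    """Return on Security Investment as a fraction (0.5 means 50%)."""
    if annual_control_cost <= 0:
        raise ValueError("annual_control_cost must be positive")
    risk_reduction = ale_before - ale_after
    return (risk_reduction - annual_control_cost) / annual_control_cost


def bootstrap_mean_ci(values, confidence=0.95, n_bootstrap=1_000, seed=None):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_bootstrap)
    )
    tail = (1.0 - confidence) / 2.0
    lower = means[int(tail * n_bootstrap)]
    upper = means[min(n_bootstrap - 1, int((1.0 - tail) * n_bootstrap))]
    return lower, upper


print(rosi(ale_before=400_000, ale_after=250_000, annual_control_cost=60_000))  # 1.5

toy_rng = random.Random(7)
toy_ale = [toy_rng.lognormvariate(12, 1) for _ in range(2_000)]  # made-up losses
ci_low, ci_high = bootstrap_mean_ci(toy_ale, seed=7)
print(f"95% CI for mean ALE: {ci_low:,.0f} to {ci_high:,.0f}")
```

A ROSI of 1.5 means the control avoids 1.5 units of net loss for every unit spent; a negative value means the control costs more than the risk it removes.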

---

### Reporting

```python
from pycrq import (
    scenario_report,
    register_report,
    export_json,
    export_csv,
    plot_ale_distribution,
    plot_scenario_comparison,
    plot_risk_matrix,
    format_currency,
    format_frequency,
)

# Text reports
print(scenario_report(result, verbose=True))
print(register_report(agg_result))

# Data export
export_json(result, "output/my_scenario.json")
export_csv(result, "output/my_scenario.csv")

# Plots (return matplotlib Figure objects)
fig1 = plot_ale_distribution(result, title="My Scenario ALE", show_percentiles=True)
fig1.savefig("ale_distribution.png", dpi=150)

fig2 = plot_scenario_comparison(results, metric="mean_ale")
fig3 = plot_risk_matrix(results)

# Formatting helpers
format_currency(1_500_000)   # "$1.50M"
format_currency(85_000)      # "$85.0K"
format_frequency(0.25)       # "Once per 4.0 years"
format_frequency(12.0)       # "12.0x/year"
```

---

### Utilities

```python
from pycrq import (
    validate_scenario,
    calibrate_pert_from_confidence,
    frequency_to_rate,
    annualize,
    summary_table,
)

# Validate a scenario before simulation
warnings = validate_scenario(scenario)
for w in warnings:
    print(f"[WARN] {w}")

# Convert P10/P90 expert estimates to PERT parameters
low, mode, high = calibrate_pert_from_confidence(p10=10_000, p90=500_000)
dist = pert(low, mode, high)

# Convert natural language to annual frequency
rate = frequency_to_rate("monthly")    # 12.0
rate = frequency_to_rate("weekly")     # 52.0
rate = frequency_to_rate("quarterly")  # 4.0

# Annualize a periodic value
annual_cost = annualize(5_000, period="monthly")   # 60_000.0
annual_cost = annualize(1_200, period="weekly")    # 62_400.0

# ASCII comparison table
print(summary_table(list_of_simulation_results))
```
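
The frequency and annualization helpers amount to a lookup of periods per year. A minimal sketch follows, with hypothetical names; the `"monthly"`, `"weekly"`, and `"quarterly"` values match the comments above, while `"daily"` and `"annually"` are assumptions added for illustration.

```python
PERIODS_PER_YEAR = {
    "daily": 365.0,      # assumption, not shown in the examples above
    "weekly": 52.0,
    "monthly": 12.0,
    "quarterly": 4.0,
    "annually": 1.0,     # assumption, not shown in the examples above
}


def periods_per_year(period):
    """Translate a period name into occurrences per year."""
    try:
        return PERIODS_PER_YEAR[period.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown period: {period!r}") from None


def annualize_value(value, period):
    """Scale a per-period amount to a yearly amount."""
    return value * periods_per_year(period)


print(annualize_value(5_000, "monthly"))   # 60000.0
print(annualize_value(1_200, "weekly"))    # 62400.0
```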

---

## Examples

| File | Description |
|------|-------------|
| `examples/01_basic_scenario.py` | Full ransomware scenario with all PERT distributions, report, bootstrap CI |
| `examples/02_risk_register.py` | 5-scenario risk register, aggregate portfolio analysis, rankings |
| `examples/03_control_roi.py` | Before/after MFA comparison, ROSI calculation, break-even analysis |
| `examples/04_sensitivity_analysis.py` | Threat capability sweep (20→80), control strength sweep, logistic vulnerability insight |

Run any example from the repository root:

```bash
python examples/01_basic_scenario.py
python examples/02_risk_register.py
python examples/03_control_roi.py
python examples/04_sensitivity_analysis.py
```

---

## Open FAIR Model Overview

The Open FAIR model decomposes risk as:

```
Risk = Loss Event Frequency (LEF) × Loss Magnitude (LM)

LEF = Threat Event Frequency (TEF) × Vulnerability (Vuln)

TEF = Contact Frequency (CF) × Probability of Action (PoA)

Vuln = f(Threat Capability (TCap), Control Strength (CS))
     = logistic(6 × (TCap - CS) / 100)   [when using capability scores]

LM = Primary Loss (PL) + Secondary Loss (SL)

Primary Loss = Productivity + Response + Replacement
             + Competitive Advantage + Fines & Judgments + Reputation

Secondary Loss = SL Event Frequency × SL Magnitude
```
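
The logistic vulnerability mapping above can be written out directly. This is a small sketch of that one formula, with a hypothetical function name and the factor 6 exposed as a `steepness` parameter for illustration.

```python
import math


def logistic_vulnerability(tcap, cs, steepness=6.0):
    """Map capability scores on a 0-100 scale to a probability in (0, 1)."""
    z = steepness * (tcap - cs) / 100.0
    return 1.0 / (1.0 + math.exp(-z))


# Equal capability and control strength gives exactly 0.5; a stronger
# attacker pushes the value toward 1, stronger controls push it toward 0.
for tcap, cs in [(20, 65), (65, 65), (80, 50)]:
    print(tcap, cs, round(logistic_vulnerability(tcap, cs), 3))
```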

**Annual Loss Exposure (ALE)** is the primary output metric:
```
ALE = LEF × LM × Time Horizon (years)
```

The Monte Carlo simulation samples each distribution independently per iteration, producing an empirical ALE distribution from which any statistic (mean, percentile, VaR, CVaR) can be derived.
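
The decomposition can be tied together in a short, standard-library sketch of the sampling loop. It is an illustration of the formulas in this section, not the `FAIRSimulator` code: the helper names and the toy parameter values are assumptions, and the real engine may differ in detail.

```python
import random
import statistics


def pert_sample(rng, low, mode, high, lam=4.0):
    """One draw from a modified-PERT distribution (scaled Beta)."""
    span = high - low
    alpha = 1.0 + lam * (mode - low) / span
    beta = 1.0 + lam * (high - mode) / span
    return low + span * rng.betavariate(alpha, beta)


def simulate_ale(n_simulations=10_000, seed=42, time_horizon_years=1.0):
    """Return a list of per-iteration ALE values for one toy scenario."""
    rng = random.Random(seed)
    ale = []
    for _ in range(n_simulations):
        tef = pert_sample(rng, 1, 3, 8)                  # threat events per year
        vuln = pert_sample(rng, 0.10, 0.25, 0.55)        # P(event becomes a loss)
        primary = pert_sample(rng, 80_000, 300_000, 1_150_000)
        slef = pert_sample(rng, 0.05, 0.15, 0.35)        # secondary loss frequency
        slm = pert_sample(rng, 50_000, 250_000, 2_000_000)
        lef = tef * vuln                                 # LEF = TEF x Vuln
        lm = primary + slef * slm                        # LM = PL + SLEF x SLM
        ale.append(lef * lm * time_horizon_years)
    return ale


ale = simulate_ale()
print(f"mean ALE: {statistics.fmean(ale):,.0f}")
print(f"P95 ALE:  {statistics.quantiles(ale, n=20)[-1]:,.0f}")
```

Fixing `seed` makes the run reproducible, which is useful when comparing a scenario before and after a control change.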

---

## License

MIT License. See [LICENSE](LICENSE) for details.

Open FAIR is a standard published by The Open Group. This library is an independent implementation and is not affiliated with or endorsed by The Open Group.
