Metadata-Version: 2.4
Name: dvm-haranalyzer
Version: 0.1.0
Summary: Analyze HAR files and identify page-load bottlenecks
Requires-Python: >=3.12
Description-Content-Type: text/markdown

# HAR Analyzer

A command-line tool that parses `.har` files and identifies page-load bottlenecks:
slow requests, large assets, missing cache headers, ad/tracker overload, redirect chains,
and slow DNS or TLS setup.

## Requirements

- Python 3.12+ (standard library only; no third-party packages required)

## Quick Start

```bash
# Analyze a HAR file and print the report
python3 har-analyzer.py metrics/hars/mypage.har

# Save the report to metrics/reports/ as well
python3 har-analyzer.py metrics/hars/mypage.har --output metrics/reports/
```

## How to Capture a HAR File

### Chrome / Edge
1. Open DevTools (`F12`, or `Cmd+Option+I` on macOS)
2. Go to the **Network** tab
3. Check **Preserve log** and **Disable cache**
4. Navigate to the page you want to analyze
5. Right-click the request list → **Save all as HAR with content**
6. Save the file into `metrics/hars/`

### Firefox
1. Open DevTools → **Network** tab
2. Navigate to the page
3. Click the gear icon → **Save All As HAR**
4. Save the file into `metrics/hars/`

### Safari
1. Open **Develop** menu → **Show Web Inspector**
2. Go to the **Network** tab
3. Navigate to the page
4. Click **Export** (floppy disk icon) and save the HAR into `metrics/hars/`

## Sanitizing HAR Files Before Analysis

> **Important:** HAR files captured from a browser contain session cookies, auth tokens,
> API keys, and personal data. Sanitize them before sharing, committing, or storing.

[`har-capture`](https://pypi.org/project/har-capture/) handles sanitization.
It needs no permanent installation; run it on demand with `uvx`:

```bash
uvx "har-capture[cli]" <command>
```

### Recommended workflow

```
capture in browser  →  validate  →  sanitize  →  analyze
```

---

### 1. Validate — check what's sensitive before touching it

```bash
# Check a single file
uvx "har-capture[cli]" validate metrics/hars/mypage.har

# Scan the whole hars/ folder (recursive)
uvx "har-capture[cli]" validate --dir metrics/hars/ --recursive

# Treat any warning as an error (useful in CI)
uvx "har-capture[cli]" validate metrics/hars/mypage.har --strict
```

The validator scans for passwords, tokens, API keys, MAC addresses, IP addresses,
and other PII, and exits non-zero if it finds any. With `--strict`, warnings also
cause a non-zero exit.

---

### 2. Sanitize — redact PII and produce a clean file

```bash
# Basic — writes mypage.sanitized.har alongside the original
uvx "har-capture[cli]" sanitize metrics/hars/mypage.har

# Write to a specific path
uvx "har-capture[cli]" sanitize metrics/hars/mypage.har --output metrics/hars/mypage.clean.har

# Also produce a compressed .har.gz (useful for large captures)
uvx "har-capture[cli]" sanitize metrics/hars/mypage.har --compress

# Write a JSON report of everything that was redacted
uvx "har-capture[cli]" sanitize metrics/hars/mypage.har --report metrics/reports/redaction.json

# Skip the interactive review step (good for scripting)
uvx "har-capture[cli]" sanitize metrics/hars/mypage.har --no-interactive
```

**How redaction works:**

By default each sensitive value is replaced with a salted hash. The same value always
maps to the same hash within a session, so cross-request correlation is preserved while
the actual value is hidden. Pass `--no-salt` to use static `[REDACTED]` placeholders
instead.
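
The idea behind salted-hash redaction can be sketched in a few lines of Python. This is an illustration only, not `har-capture`'s actual implementation: the `Redactor` class, the `REDACTED_` prefix, the use of HMAC-SHA256, and the 12-character digest length are all assumptions made for the example.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets


class Redactor:
    """Illustrative salted-hash redactor (hypothetical, not har-capture's code)."""

    def __init__(self, salt: bytes | None = None) -> None:
        # One random salt per session: hashes are stable inside the session
        # but cannot be matched against hashes from another session.
        self.salt = salt if salt is not None else secrets.token_bytes(16)

    def redact(self, value: str) -> str:
        # Keyed hash of the value; truncated so the placeholder stays readable.
        digest = hmac.new(self.salt, value.encode("utf-8"), hashlib.sha256).hexdigest()
        return f"REDACTED_{digest[:12]}"


session = Redactor()
token_a = session.redact("session-cookie-value")
token_b = session.redact("session-cookie-value")  # same value, same session
other = session.redact("different-value")
```

Here `token_a` and `token_b` are identical, so two requests carrying the same cookie can still be linked, while `other` differs and neither placeholder reveals the original value.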

---

### 3. Capture directly from a URL (auto-sanitizes)

`har-capture get` drives a headless browser and sanitizes the output in one step:

```bash
# Capture and auto-sanitize (writes <hostname>.har + <hostname>.har.gz)
uvx "har-capture[cli]" get https://example.com

# Save to a specific file
uvx "har-capture[cli]" get https://example.com --output metrics/hars/example.har

# Keep the raw (unsanitized) file alongside the sanitized one
uvx "har-capture[cli]" get https://example.com --keep-raw

# Include images and fonts in the capture (excluded by default)
uvx "har-capture[cli]" get https://example.com --include-images --include-fonts

# Use Firefox instead of the default Chromium
uvx "har-capture[cli]" get https://example.com --browser firefox

# Skip sanitization (not recommended for sharing)
uvx "har-capture[cli]" get https://example.com --no-sanitize
```

---

### Full workflow example

```bash
# 1. Capture from URL into the metrics/hars folder
# (uses uv run so it finds Playwright browsers already installed on the system)
uv run --with "har-capture[cli]" --python python3 \
  har-capture get https://www.example.com \
    --output metrics/hars/example.har \
    --include-images

# 2. Validate the sanitized file
uvx "har-capture[cli]" validate metrics/hars/example.har --strict

# 3. Analyze
python3 har-analyzer.py metrics/hars/example.har --output metrics/reports/
```

Or, for a HAR captured manually in the browser:

```bash
# 1. Sanitize the browser export
uvx "har-capture[cli]" sanitize metrics/hars/raw.har \
    --output metrics/hars/raw.clean.har \
    --report metrics/reports/redaction.json

# 2. Validate the result
uvx "har-capture[cli]" validate metrics/hars/raw.clean.har --strict

# 3. Analyze
python3 har-analyzer.py metrics/hars/raw.clean.har --output metrics/reports/
```

---

## Output

The tool prints a report to stdout containing:

| Section | What it shows |
|---|---|
| Overview | DOMContentLoaded, onLoad, request count, total transfer size |
| Bottleneck Summary | Ranked list of CRITICAL / WARNING findings with fix recommendations |
| Top Slowest Requests | Time, TTFB, SSL, status, KB for the 15 slowest requests (set with `--top-n`) |
| Large Resources | Resources over 50 KB (set with `--large-kb`), with type and cache headers |
| Content Type Breakdown | Total KB per MIME type |
| Top Domains | Request count, KB, and average time per origin |
| Slow TTFB | Requests with a wait time over 300 ms (set with `--ttfb-ms`) |
| Slow TLS | Cold TLS handshakes over 100 ms (set with `--ssl-ms`) |
| Slow DNS | DNS lookups over 50 ms (set with `--dns-ms`) |
| Poorly Cached Resources | Large resources missing a `Cache-Control` header |
| Redirects | All 3xx chains |
| HTTP Version Breakdown | HTTP/1.1 vs HTTP/2 usage |
| Concurrency | Peak concurrent requests in the first 5 seconds |
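
As an illustration of what the Concurrency row measures, peak concurrency can be derived from each entry's `startedDateTime` and `time` fields (both part of the HAR 1.2 format) with a sweep over start and finish events. This is a sketch under assumptions, not the tool's actual code: `peak_concurrency` and its `window_s` parameter are hypothetical names, and the window is assumed to be measured from the earliest request start.

```python
from datetime import datetime


def _parse_iso(ts):
    # HAR timestamps are ISO 8601; map a trailing "Z" to an explicit offset
    # so datetime.fromisoformat accepts it on older Python versions too.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def peak_concurrency(entries, window_s=5.0):
    """Peak number of in-flight requests among those that start within
    window_s seconds of the first request."""
    spans = []
    for entry in entries:
        start = _parse_iso(entry["startedDateTime"]).timestamp()
        end = start + max(entry.get("time", 0), 0) / 1000.0  # "time" is in ms
        spans.append((start, end))
    if not spans:
        return 0
    t0 = min(start for start, _ in spans)
    events = []
    for start, end in spans:
        if start - t0 <= window_s:
            events.append((start, 1))   # request begins
            events.append((end, -1))    # request finishes
    # At equal timestamps, process finishes before starts (half-open intervals).
    events.sort()
    peak = current = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Made-up entries carrying only the two fields used here.
sample = [
    {"startedDateTime": "2025-01-01T00:00:00.000Z", "time": 1000},
    {"startedDateTime": "2025-01-01T00:00:00.200Z", "time": 500},
    {"startedDateTime": "2025-01-01T00:00:00.400Z", "time": 100},
    {"startedDateTime": "2025-01-01T00:00:02.000Z", "time": 100},
    {"startedDateTime": "2025-01-01T00:00:10.000Z", "time": 100},  # outside window
]
peak = peak_concurrency(sample)
```

In this sample the first three requests overlap at the 400 ms mark, so `peak` is 3.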

When `--output` is given, the report is also written to a timestamped file, where `<stem>` is the HAR file name without its extension:
```
metrics/reports/<stem>_YYYYMMDD_HHMMSS.txt
```
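
A file name of that shape can be built from the HAR path and the current time. The sketch below is an assumption about how such a name could be produced, not the tool's actual code; `report_path` is a hypothetical helper, and the fixed `now` value is there only to make the example reproducible.

```python
from __future__ import annotations

from datetime import datetime
from pathlib import Path


def report_path(har_file: Path, out_dir: Path, now: datetime | None = None) -> Path:
    """Build <out_dir>/<stem>_YYYYMMDD_HHMMSS.txt for the given HAR file."""
    now = now or datetime.now()
    return out_dir / f"{har_file.stem}_{now:%Y%m%d_%H%M%S}.txt"


example = report_path(
    Path("metrics/hars/mypage.har"),
    Path("metrics/reports"),
    now=datetime(2025, 1, 31, 14, 5, 9),
)
print(example.name)  # mypage_20250131_140509.txt
```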

## All Options

```
python3 har-analyzer.py <har> [options]

Positional:
  har                  Path to the .har file

Options:
  --output, -o DIR     Directory to write the text report
  --large-kb N         Threshold (KB) for "large resource" section (default: 50)
  --ttfb-ms N          Slow TTFB threshold in ms (default: 300)
  --ssl-ms N           Slow TLS threshold in ms (default: 100)
  --dns-ms N           Slow DNS threshold in ms (default: 50)
  --top-n N            Number of slowest requests to list (default: 15)
```
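
The interface above maps naturally onto `argparse` from the standard library. The following is a sketch of how such a parser could be declared, mirroring the documented flags and defaults. It is not taken from `har-analyzer.py`; the `build_parser` name and the choice of `int` for the numeric options are assumptions.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="har-analyzer.py",
        description="Analyze HAR files and identify page-load bottlenecks",
    )
    parser.add_argument("har", help="Path to the .har file")
    parser.add_argument("--output", "-o", metavar="DIR",
                        help="Directory to write the text report")
    parser.add_argument("--large-kb", type=int, default=50, metavar="N",
                        help='Threshold (KB) for "large resource" section')
    parser.add_argument("--ttfb-ms", type=int, default=300, metavar="N",
                        help="Slow TTFB threshold in ms")
    parser.add_argument("--ssl-ms", type=int, default=100, metavar="N",
                        help="Slow TLS threshold in ms")
    parser.add_argument("--dns-ms", type=int, default=50, metavar="N",
                        help="Slow DNS threshold in ms")
    parser.add_argument("--top-n", type=int, default=15, metavar="N",
                        help="Number of slowest requests to list")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["metrics/hars/checkout.har", "--large-kb", "200"])
print(args.large_kb, args.ttfb_ms, args.top_n)  # 200 300 15
```

Options that are not passed fall back to the defaults listed above, and `args.output` stays `None` when no report directory is given.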

## Folder Structure

```
webanalytics/
├── har-analyzer.py        # main script
├── README.md              # this file
├── developer.md           # how to extend the tool
└── metrics/
    ├── hars/              # drop your .har files here
    └── reports/           # generated reports land here
```

## Examples

```bash
# Higher threshold — only flag resources over 200 KB
python3 har-analyzer.py metrics/hars/checkout.har --large-kb 200

# Show top 30 slowest requests
python3 har-analyzer.py metrics/hars/homepage.har --top-n 30

# Stricter TTFB — flag anything over 100ms
python3 har-analyzer.py metrics/hars/api-heavy.har --ttfb-ms 100 --output metrics/reports/
```
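
To show what a threshold such as `--ttfb-ms` changes, the sketch below filters HAR entries by `timings.wait`, the HAR 1.2 field for the time spent waiting for the server's response. It is an illustration under assumptions, not the tool's code: `slow_ttfb` is a hypothetical helper and the sample entries are made up.

```python
def slow_ttfb(entries, ttfb_ms=300):
    """Return (url, wait_ms) pairs whose wait time exceeds ttfb_ms, slowest first."""
    hits = []
    for entry in entries:
        # Exporters may report -1 when a timing is unavailable; such values
        # never exceed a positive threshold, so they are skipped naturally.
        wait = entry.get("timings", {}).get("wait", -1)
        if wait is not None and wait > ttfb_ms:
            hits.append((entry["request"]["url"], wait))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up sample entries in HAR 1.2 shape (only the fields used here).
sample = [
    {"request": {"url": "https://example.com/"}, "timings": {"wait": 420}},
    {"request": {"url": "https://example.com/api"}, "timings": {"wait": 150}},
    {"request": {"url": "https://example.com/app.js"}, "timings": {"wait": 80}},
    {"request": {"url": "https://example.com/cached.css"}, "timings": {"wait": -1}},
]

default_hits = slow_ttfb(sample)              # default 300 ms threshold
strict_hits = slow_ttfb(sample, ttfb_ms=100)  # as with --ttfb-ms 100
```

With the default threshold only the 420 ms page request is flagged; lowering it to 100 ms also flags the 150 ms API call.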
