Metadata-Version: 2.4
Name: nl-voting-data-scraper
Version: 0.1.1
Summary: Scrape Dutch voting advice (StemWijzer) data for any election
Author: Rehan Fazal
License: MIT
License-File: LICENSE
Keywords: dutch,elections,scraper,stemwijzer,votematch
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Requires-Python: >=3.11
Requires-Dist: click>=8.3
Requires-Dist: httpx>=0.28
Requires-Dist: pycryptodome>=3.23
Requires-Dist: pydantic>=2.12
Requires-Dist: rich>=14.0
Requires-Dist: tenacity>=9.1
Provides-Extra: browser
Requires-Dist: playwright>=1.58; extra == 'browser'
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=1.3; extra == 'dev'
Requires-Dist: pytest-httpx>=0.36; extra == 'dev'
Requires-Dist: pytest>=9.0; extra == 'dev'
Requires-Dist: ruff>=0.15; extra == 'dev'
Description-Content-Type: text/markdown

# nl-voting-data-scraper

[![PyPI](https://img.shields.io/pypi/v/nl-voting-data-scraper?color=green)](https://pypi.org/project/nl-voting-data-scraper/)
[![Downloads](https://img.shields.io/pypi/dm/nl-voting-data-scraper)](https://pypi.org/project/nl-voting-data-scraper/)
[![Python package](https://github.com/rhnfzl/nl-voting-data-scraper/actions/workflows/publish.yml/badge.svg)](https://github.com/rhnfzl/nl-voting-data-scraper/actions/workflows/publish.yml)
[![Python](https://img.shields.io/pypi/pyversions/nl-voting-data-scraper)](https://pypi.org/project/nl-voting-data-scraper/)
[![License](https://img.shields.io/pypi/l/nl-voting-data-scraper)](https://github.com/rhnfzl/nl-voting-data-scraper/blob/main/LICENSE)

Scrape Dutch voting advice ([StemWijzer](https://stemwijzer.nl)) data for any election: municipal, national, European, or provincial.

Outputs structured JSON with party positions, policy statements, and metadata. Reusable across election cycles.

## Installation

```bash
pip install nl-voting-data-scraper
```

To enable the optional browser automation fallback, install the `browser` extra and a Chromium build:

```bash
pip install "nl-voting-data-scraper[browser]"
playwright install chromium
```

## Quick Start

### CLI

```bash
# List known elections
nl-voting-data-scraper list-elections

# Scrape all municipalities for 2026 municipal elections
nl-voting-data-scraper scrape gr2026 -o ./output

# Scrape a specific municipality
nl-voting-data-scraper scrape gr2026 -m GM0014 -o ./output

# Scrape national election
nl-voting-data-scraper scrape tk2025 -o ./output

# List municipalities for an election
nl-voting-data-scraper list-municipalities gr2026

# Discover API endpoints
nl-voting-data-scraper discover gr2026
```

### Python Library

```python
import asyncio
from nl_voting_data_scraper import StemwijzerScraper

async def main():
    async with StemwijzerScraper("gr2026") as scraper:
        # Scrape a single municipality
        data = await scraper.scrape_one("GM0014")
        print(f"{data.votematch.name}: {len(data.parties)} parties, {len(data.statements)} statements")

        # Scrape all
        results = await scraper.scrape()
        print(f"Scraped {len(results)} entries")

asyncio.run(main())
```
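
To scrape a handful of municipalities rather than one or all, a small helper can fan out over `scrape_one`. The sketch below is not part of the package: `scrape_many` and `max_concurrency` are hypothetical names, and it assumes a scraper instance tolerates concurrent `scrape_one` calls, which the project does not document.

```python
import asyncio


async def scrape_many(scraper, codes, max_concurrency=2):
    """Scrape several municipality codes, returning {code: result or exception}.

    Hypothetical helper: `scraper` is any object with an async
    `scrape_one(code)` method, such as the one shown above.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch(code):
        async with semaphore:  # cap the number of in-flight requests
            try:
                return code, await scraper.scrape_one(code)
            except Exception as exc:  # record the failure and keep going
                return code, exc

    pairs = await asyncio.gather(*(fetch(code) for code in codes))
    return dict(pairs)
```

Inside the `async with StemwijzerScraper("gr2026") as scraper:` block this would be called as `results = await scrape_many(scraper, ["GM0014"])`. Failed codes map to the exception that was raised, so one bad municipality does not abort the rest.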

## Supported Elections

| Slug | Type | Year | Description |
|------|------|------|-------------|
| `gr2026` | Municipal | 2026 | Gemeenteraadsverkiezingen 2026 |
| `tk2025` | National | 2025 | Tweede Kamerverkiezingen 2025 |
| `tk2023` | National | 2023 | Tweede Kamerverkiezingen 2023 |
| `eu2024` | European | 2024 | Europees Parlement 2024 |
| `ps2023` | Provincial | 2023 | Provinciale Staten 2023 |

New elections are auto-detected from URL patterns. You can also pass custom election slugs.
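
The slugs in the table follow a `<prefix><year>` shape, which makes custom slugs easy to sanity-check before passing them in. The sketch below only reflects the table above; `parse_slug` and `ELECTION_TYPES` are hypothetical names, and this is not the package's own detection logic.

```python
import re

# Prefix-to-type mapping read off the table above; other prefixes are unknown here.
ELECTION_TYPES = {
    "gr": "Municipal",
    "tk": "National",
    "eu": "European",
    "ps": "Provincial",
}

SLUG_PATTERN = re.compile(r"^([a-z]+)(\d{4})$")


def parse_slug(slug):
    """Split a slug like 'gr2026' into (type, year); type is None if unrecognised."""
    match = SLUG_PATTERN.match(slug.lower())
    if match is None:
        raise ValueError(f"not a '<prefix><year>' slug: {slug!r}")
    prefix, year = match.groups()
    return ELECTION_TYPES.get(prefix), int(year)
```

For example, `parse_slug("tk2025")` yields `("National", 2025)`, while a slug with an unfamiliar prefix still parses but reports its type as `None`.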

## How It Works

**Hybrid approach:**

1. **API-first (fast):** Tries to fetch data from StemWijzer data endpoints via HTTP. Handles base64-encoded responses and optional AES decryption.
2. **Browser fallback:** If the API fails, uses Playwright to load the frontend, intercept network requests, and capture the data. Falls back to DOM extraction as a last resort.
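
The decoding half of the API-first step can be sketched roughly as follows. This is an illustration under assumptions, not the package's implementation: `decode_payload` is a hypothetical name, and because the AES mode and key handling are not described here, decryption is left as a caller-supplied `decrypt` hook.

```python
import base64
import binascii
import json


def decode_payload(raw, decrypt=None):
    """Return parsed JSON from a body that may be plain or base64-wrapped.

    `decrypt` is a hypothetical hook: a callable that takes the
    base64-decoded bytes and returns plaintext bytes.
    """
    try:
        return json.loads(raw)  # plain JSON needs no further work
    except ValueError:
        pass  # not JSON as-is, so try the base64 path

    try:
        decoded = base64.b64decode(raw.strip(), validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("body is neither JSON nor base64") from exc

    if decrypt is not None:
        decoded = decrypt(decoded)  # optional decryption step
    return json.loads(decoded)
```

Trying plain JSON first keeps the helper usable for endpoints that return unencoded data, and a body that is neither JSON nor base64 raises a `ValueError`, which a caller could treat as the signal to fall back to the browser.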

## Output Format

Each municipality/election produces a JSON file:

```json
{
  "parties": [
    {
      "id": 206919,
      "name": "Party Name",
      "fullName": "Full Party Name",
      "website": "https://...",
      "hasSeats": true,
      "statements": [
        { "id": 206987, "position": "agree", "explanation": "..." }
      ]
    }
  ],
  "statements": [
    {
      "id": 206987,
      "theme": "Housing",
      "title": "The municipality should build more affordable housing.",
      "index": 1
    }
  ],
  "shootoutStatements": [...],
  "votematch": {
    "id": 206918,
    "name": "Municipality Name",
    "context": "2026GR",
    "remote_id": "GM0014",
    "langcode": "nl"
  }
}
```
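
A file in this shape can be reshaped into a per-statement comparison with a few lines of Python. The sketch below assumes, as the example above suggests, that the `id` inside a party's `statements` list refers to the top-level statement with the same `id`; `positions_by_statement` is a hypothetical helper and the sample data is trimmed from the example.

```python
def positions_by_statement(data):
    """Map each statement title to {party name: position}."""
    titles = {s["id"]: s["title"] for s in data["statements"]}
    table = {title: {} for title in titles.values()}
    for party in data["parties"]:
        for answer in party["statements"]:
            title = titles.get(answer["id"])
            if title is not None:  # skip answers to unknown statements
                table[title][party["name"]] = answer["position"]
    return table


# Sample trimmed from the output example above.
sample = {
    "parties": [
        {
            "id": 206919,
            "name": "Party Name",
            "statements": [{"id": 206987, "position": "agree"}],
        }
    ],
    "statements": [
        {
            "id": 206987,
            "theme": "Housing",
            "title": "The municipality should build more affordable housing.",
            "index": 1,
        }
    ],
}

table = positions_by_statement(sample)
```

With a real output file, `data` would come from `json.load` on one of the files written to the output directory.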

## CLI Options

```text
nl-voting-data-scraper scrape ELECTION [OPTIONS]

Options:
  -m, --municipality TEXT   Specific GM codes (repeatable)
  -l, --language TEXT       Languages to scrape (default: nl)
  -o, --output TEXT         Output directory (default: ./output)
  --combined                Also write combined.json
  --rate-limit FLOAT        Requests per second (default: 2.0)
  --no-cache                Disable caching
  --resume                  Resume interrupted scrape
  --browser-only            Only use browser scraping
  --api-only                Only use API scraping
  -v, --verbose             Verbose output
```

## Development

```bash
git clone https://github.com/rhnfzl/nl-voting-data-scraper.git
cd nl-voting-data-scraper
pip install -e ".[dev,browser]"
playwright install chromium
pytest
```

## License

MIT
