Metadata-Version: 2.4
Name: echr-extractor
Version: 1.0.45
Summary: Python library for extracting case law data from the European Court of Human Rights (ECHR) HUDOC database
Author-email: LawTech Lab <lawtech@maastrichtuniversity.nl>
License: Apache-2.0
Project-URL: Homepage, https://github.com/maastrichtlawtech/echr-extractor
Project-URL: Repository, https://github.com/maastrichtlawtech/echr-extractor
Project-URL: Bug Reports, https://github.com/maastrichtlawtech/echr-extractor/issues
Project-URL: Documentation, https://github.com/maastrichtlawtech/echr-extractor
Keywords: echr,extractor,european,convention,human,rights,court,case-law,legal,hudoc,data-extraction
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Legal Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Markup :: HTML
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.26.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: beautifulsoup4>=4.9.3
Requires-Dist: dateparser>=1.0.0
Requires-Dist: tqdm>=4.60.0
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: pytest-cov>=2.10; extra == "dev"
Requires-Dist: black>=21.0.0; extra == "dev"
Requires-Dist: isort>=5.0.0; extra == "dev"
Requires-Dist: flake8>=3.8.0; extra == "dev"
Requires-Dist: mypy>=0.910; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=4.0.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=0.5.0; extra == "docs"
Dynamic: license-file

# ECHR Extractor

Python library for extracting case law data from the European Court of Human Rights (ECHR) HUDOC database.

## Features

- Extract metadata for ECHR cases from the HUDOC database
- Download full text content for cases
- Support for custom date ranges and case ID ranges
- Support for multiple languages
- Generate nodes and edges for network analysis
- Flexible output formats (CSV, JSON, in-memory DataFrames)

## Installation

```bash
pip install echr-extractor
```

## Quick Start

```python
from echr_extractor import get_echr, get_echr_extra, get_nodes_edges

# Get basic metadata for cases, returned as a DataFrame
df = get_echr(start_id=0, count=100, language=['ENG'], save_file='n')

# Get metadata + full text
df, full_texts = get_echr_extra(start_id=0, count=100, language=['ENG'], save_file='n')

# Generate network data
nodes, edges = get_nodes_edges(df=df, save_file='n')
```

## Functions

### `get_echr`

Gets all available metadata for ECHR cases from the HUDOC database.

**Parameters:**
- `start_id` (int, optional): The ID of the first case to download (default: 0)
- `end_id` (int, optional): The ID of the last case to download (default: maximum available)
- `count` (int, optional): Number of cases per language to download (default: None)
- `start_date` (str, optional): Start publication date (yyyy-mm-dd) (default: None)
- `end_date` (str, optional): End publication date (yyyy-mm-dd) (default: current date)
- `verbose` (bool, optional): Show progress information (default: False)
- `fields` (list, optional): Limit metadata fields to download (default: all fields)
- `save_file` (str, optional): Save as CSV file ('y') or return DataFrame ('n') (default: 'y')
- `language` (list, optional): Languages to download (default: ['ENG'])
- `link` (str, optional): Direct HUDOC search URL (default: None)
- `query_payload` (str, optional): Direct API query payload (default: None)
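The date parameters take `yyyy-mm-dd` strings. A quick check before starting a long download catches typos early. The sketch below uses a hypothetical helper, `check_date_args`, which is not part of `echr_extractor`.

```python
from datetime import datetime
from typing import Optional


def check_date_args(start_date: Optional[str], end_date: Optional[str]) -> None:
    """Raise ValueError if a date does not parse as yyyy-mm-dd or the range is reversed.

    Hypothetical helper, not part of echr_extractor. None means
    "use the library default" and is skipped.
    """
    parsed = {}
    for name, value in (("start_date", start_date), ("end_date", end_date)):
        if value is None:
            continue
        try:
            parsed[name] = datetime.strptime(value, "%Y-%m-%d").date()
        except ValueError:
            raise ValueError(f"{name}={value!r} is not a yyyy-mm-dd date") from None
    # Only compare when both dates were given.
    if len(parsed) == 2 and parsed["start_date"] > parsed["end_date"]:
        raise ValueError("start_date is later than end_date")
```

For example, call `check_date_args("2020-01-01", "2023-12-31")` before passing the same strings to `get_echr(start_date=..., end_date=...)`.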

### `get_echr_extra`

Gets metadata and downloads the full text of each case. With `save_file='n'`, it returns the metadata DataFrame together with the full texts.

**Parameters:** Same as `get_echr` plus:
- `threads` (int, optional): Number of threads for parallel download (default: 10)
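`get_echr_extra` returns the metadata and the full texts separately. To get one record per case with its text attached, you can join them on the case ID. This sketch assumes each full-text entry is a dict with `item_id` and `full_text` keys and that metadata rows have an `itemid` column; those key names are assumptions, so check them against your own output. `attach_full_texts` is a hypothetical helper.

```python
from typing import Dict, List


def attach_full_texts(rows: List[Dict], full_texts: List[Dict]) -> List[Dict]:
    """Return copies of metadata rows with a 'full_text' key added.

    Assumed (hypothetical) shapes: full-text entries carry 'item_id' and
    'full_text'; metadata rows carry 'itemid'. Rows with no matching text
    get None. The input rows are not modified.
    """
    text_by_id = {entry["item_id"]: entry["full_text"] for entry in full_texts}
    return [{**row, "full_text": text_by_id.get(row["itemid"])} for row in rows]
```

`df.to_dict("records")` converts a DataFrame into the list-of-dicts shape used here. With pandas you could instead build `pd.DataFrame(full_texts)` and use `df.merge(..., left_on="itemid", right_on="item_id", how="left")`.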

### `get_nodes_edges`

Generates nodes and edges for network analysis from case metadata.

**Parameters:**
- `metadata_path` (str, optional): Path to metadata CSV file (default: None)
- `df` (DataFrame, optional): Metadata DataFrame (default: None)
- `save_file` (str, optional): Save the nodes and edges as files ('y') or return them ('n') (default: 'y')
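A common first step in network analysis is counting how often each case is cited. With `save_file='y'`, `get_nodes_edges` writes its output to files; assuming the edges file is a CSV with one citation per row, a standard-library sketch could look like this. The column name `target` is a placeholder, not the library's documented schema, so pass whatever the edges file's header actually uses.

```python
import csv
from collections import Counter
from typing import Iterable, List, Tuple


def most_cited(
    edge_lines: Iterable[str], target_col: str = "target", top: int = 5
) -> List[Tuple[str, int]]:
    """Count how often each case appears as a citation target.

    `edge_lines` is any iterable of CSV lines, such as an open file.
    'target' is a hypothetical default column name.
    """
    reader = csv.DictReader(edge_lines)
    counts = Counter(row[target_col] for row in reader)
    # Highest counts first; ties keep first-seen order.
    return counts.most_common(top)
```

With a real file: `with open(edges_path, newline="") as f: print(most_cited(f, target_col=...))`, where `edges_path` is wherever the edges file was written.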

## Advanced Usage

### Using Custom Search URLs

You can pass a HUDOC search URL, as copied from the browser's address bar, through the `link` parameter:

```python
url = "https://hudoc.echr.coe.int/eng#{%22itemid%22:[%22001-57574%22]}"
df = get_echr(link=url, save_file='n')
```
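The fragment after `#` in the URL above is URL-encoded JSON (`%22` is a double quote). Judging only from that example, you can decode a link to see which query it carries, or build one for a list of item IDs. Both helpers are hypothetical and cover only the `itemid` form shown here.

```python
import json
from urllib.parse import quote, unquote, urlsplit

HUDOC_BASE = "https://hudoc.echr.coe.int/eng"


def build_itemid_link(item_ids):
    """Build a HUDOC search URL whose fragment selects the given item IDs."""
    query = json.dumps({"itemid": list(item_ids)}, separators=(",", ":"))
    # Leave JSON punctuation as-is; double quotes become %22.
    return f"{HUDOC_BASE}#{quote(query, safe='{}[]:,')}"


def parse_link(url):
    """Decode the JSON query stored in a HUDOC URL fragment."""
    return json.loads(unquote(urlsplit(url).fragment))
```

`build_itemid_link(["001-57574"])` reproduces the URL in the example, and `parse_link` turns it back into `{"itemid": ["001-57574"]}`.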

### Using Query Payloads

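For more robust searching, copy the query payload that the HUDOC search page sends (visible in the Network tab of your browser's developer tools) and pass it through `query_payload`: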

```python
payload = '{"query":{"terms":{"articles":["8"]}}}'
df = get_echr(query_payload=payload, save_file='n')
```
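Writing the payload by hand makes quoting mistakes easy; generating it with `json.dumps` guarantees valid JSON. This hypothetical helper reproduces only the structure of the example above and does not know which other query shapes HUDOC accepts.

```python
import json
from typing import Iterable


def articles_payload(articles: Iterable) -> str:
    """Serialize a payload shaped like the README example.

    Hypothetical helper: mirrors {"query": {"terms": {"articles": [...]}}}
    only, and converts each article number to a string.
    """
    payload = {"query": {"terms": {"articles": [str(a) for a in articles]}}}
    # Compact separators match the hand-written example exactly.
    return json.dumps(payload, separators=(",", ":"))
```

The result can be passed directly as `get_echr(query_payload=articles_payload(["8"]))`.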

### Date Range Filtering

```python
df = get_echr(
    start_date="2020-01-01",
    end_date="2023-12-31",
    language=['ENG', 'FRE'],
    save_file='n'
)
```
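For long date ranges you may prefer to download in smaller pieces, so that a failed request costs only one piece. The hypothetical helper below splits an inclusive `yyyy-mm-dd` range into calendar-year chunks; whether chunking is needed at all depends on HUDOC's behavior, which this README does not document.

```python
from datetime import date, timedelta
from typing import List, Tuple


def yearly_ranges(start: str, end: str) -> List[Tuple[str, str]]:
    """Split an inclusive yyyy-mm-dd range into calendar-year chunks."""
    start_d = date.fromisoformat(start)
    end_d = date.fromisoformat(end)
    ranges = []
    current = start_d
    while current <= end_d:
        # Stop each chunk at 31 December or at the overall end, whichever is first.
        chunk_end = min(date(current.year, 12, 31), end_d)
        ranges.append((current.isoformat(), chunk_end.isoformat()))
        current = chunk_end + timedelta(days=1)
    return ranges  # empty if start is after end
```

Each `(start, end)` pair can then be passed as `start_date`/`end_date` to `get_echr(..., save_file='n')`, and the resulting DataFrames combined with `pd.concat`.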

### Specific Fields Only

```python
fields = ['itemid', 'doctypebranch', 'title', 'kpdate']
df = get_echr(count=100, fields=fields, save_file='n')
```
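When you restrict `fields`, it is worth confirming that the saved file has the columns you asked for before analyzing it. The sketch below reads CSV lines (for example, an open file written with `save_file='y'`), checks the header against the requested fields, and tallies cases per value of one field, `doctypebranch` by default. `summarize_fields` is a hypothetical helper, not part of the library.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List


def summarize_fields(
    lines: Iterable[str], requested: List[str], group_by: str = "doctypebranch"
) -> Dict[str, int]:
    """Check that requested columns exist, then count rows per `group_by` value."""
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []  # reading fieldnames consumes the header line
    missing = [f for f in requested if f not in header]
    if missing:
        raise KeyError(f"columns missing from file: {missing}")
    return dict(Counter(row[group_by] for row in reader))
```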

## Requirements

- Python 3.8+
- requests (>= 2.26.0)
- pandas (>= 1.3.0)
- beautifulsoup4 (>= 4.9.3)
- dateparser (>= 1.0.0)
- tqdm (>= 4.60.0)

## License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

## Contributors

- Benjamin Rodrigues de Miranda
- Chloe Crombach
- Piotr Lewandowski
- Pranav Bapat
- Shashank MC
- Gijs van Dijck

## Citation

If you use this library in your research, please cite:

```bibtex
@software{echr_extractor,
  title={ECHR Extractor: Python Library for European Court of Human Rights Data},
  author={{LawTech Lab, Maastricht University}},
  url={https://github.com/maastrichtlawtech/echr-extractor},
  year={2024}
}
```

## Support

For bug reports and feature requests, please open an issue on GitHub.
