Metadata-Version: 2.4
Name: vestige-scraper
Version: 1.2.0
Summary: Python library for getting a list of details regarding the Bulgarian State Gazette (Държавен вестник) issues
Author-email: "ytcalifax (98w)" <hello@98w.eu>
License-Expression: MIT
Project-URL: Homepage, https://github.com/ytcalifax/vestige
Project-URL: Repository, https://github.com/ytcalifax/vestige.git
Project-URL: Bug Tracker, https://github.com/ytcalifax/vestige/issues
Project-URL: Changelog, https://github.com/ytcalifax/vestige/blob/main/CHANGELOG.md
Keywords: bulgaria,gazette,state,state-gazette,dv,darzhaven-vestnik
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: asyncio>=4.0.0
Requires-Dist: httpx[http2]>=0.28.1
Requires-Dist: selectolax>=0.4.7
Dynamic: license-file

# 📰 Vestige

> **A Python library for fetching and parsing issues of the Bulgarian State Gazette (Държавен вестник).**

Vestige gives you a clean, dependency-light Python interface to the official [Държавен вестник](https://dv.parliament.bg/) listing. Scrape issue metadata and download links without wrestling with raw HTML or cryptic JSF form fields ever again.

## ✨ Features

- **📄 Issue Listings**: Fetch paginated lists of State Gazette issues, including issue number, date, year, and type.
- **📥 Download Links**: Automatically resolve the PDF and RTF download URLs for each issue.
- **🔌 Pluggable Architecture**: Swap out the HTTP transport or the HTML parser via simple interfaces — great for testing with mocks.
- **🗂️ Typed Models**: Clean dataclasses (`IssueEntry`, `DownloadFile`, `PageResult`) with `to_dict()` helpers for easy JSON serialization.
- **⚡ Minimal Dependencies**: Only needs `selectolax` and `httpx[http2]`.

## 📥 Installation

```bash
pip install vestige-scraper
```

Or, to install directly from source:

```bash
pip install .
```

## 🚀 Quick Start

```python
import asyncio
from vestige import DVClient

async def main():
    client = DVClient()
    try:
        # Fetch the first page of issues (with download URLs resolved)
        page = await client.get_page(1)

        print(page)
        # PageResult(page=1, total_results=..., total_pages=..., entries=...)

        for entry in page.entries:
            print(entry)
            for file in entry.download_urls:
                print(f"  → {file.filename}: {file.url}")
    finally:
        await client.aclose()

asyncio.run(main())
```

### Fetch all pages at once

```python
import asyncio
from vestige import DVClient

async def main():
    client = DVClient()
    try:
        all_pages = await client.get_all_pages(max_pages=5)
        for page in all_pages:
            for entry in page.entries:
                print(entry.to_dict())
    finally:
        await client.aclose()

asyncio.run(main())
```

### Skip download URL resolution (faster)

```python
# Inside an async function, with `client` created as in Quick Start
page = await client.get_page(1, fetch_downloads=False)
```

## 🗂️ Models

| Class          | Description                                                                    |
|----------------|--------------------------------------------------------------------------------|
| `IssueEntry`   | One issue of the State Gazette — number, date, year, type, and download files. |
| `DownloadFile` | A single downloadable file (`url`, `filename`).                                |
| `PageResult`   | One page of results — holds metadata and a list of `IssueEntry` objects.       |
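
As a rough illustration of how the `to_dict()` helpers can feed JSON output, the sketch below serializes any iterable of objects that expose `to_dict()`. The helper name `entries_to_json` and the `_StubEntry` class are hypothetical, and the field names in the stub are illustrative only. It assumes `to_dict()` returns plain Python values, with `default=str` as a fallback for anything else, such as date objects.

```python
import json
from typing import Any, Iterable


def entries_to_json(entries: Iterable[Any]) -> str:
    """Serialize objects exposing to_dict() into a JSON array string."""
    payload = [entry.to_dict() for entry in entries]
    # ensure_ascii=False keeps Cyrillic text readable in the output;
    # default=str is a fallback for non-JSON values (an assumption, not library behavior).
    return json.dumps(payload, ensure_ascii=False, indent=2, default=str)


class _StubEntry:
    """Hypothetical stand-in for IssueEntry so the sketch runs offline."""

    def to_dict(self) -> dict[str, Any]:
        # Illustrative field names, not the library's confirmed schema
        return {"number": 1, "type": "Официален"}


output = entries_to_json([_StubEntry()])
```

With the real client, `page.entries` from `get_page()` could be passed in place of the stub list.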

## ⚙️ Advanced Usage

### Custom transport / parser (e.g. for testing)

```python
from vestige import DVClient
from vestige.network.transport import AsyncRequestsTransport
from vestige.scraping.parsers import IssueParser

# MyCustomTransport and MyCustomParser are placeholders for your own classes
client = DVClient(
    transport=MyCustomTransport(),
    parser=MyCustomParser(),
)
```

`transport` accepts any object that satisfies the `AsyncPageFetcher` interface, and `parser` accepts any object that satisfies the `PageParser` interface. Both interfaces are defined in `vestige.core.interfaces`.
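
For tests, a transport double might look something like the sketch below. The class name `CannedTransport`, the method name `fetch`, its signature, and the example URL are all assumptions made for illustration; check `vestige.core.interfaces` for the methods the real interface requires before adapting it.

```python
import asyncio


class CannedTransport:
    """Hypothetical test double: serves stored HTML instead of using the network."""

    def __init__(self, pages: dict[str, str]) -> None:
        self._pages = pages
        self.requested: list[str] = []  # lets a test assert which URLs were hit

    # ASSUMPTION: the real AsyncPageFetcher method name and signature may differ.
    async def fetch(self, url: str) -> str:
        self.requested.append(url)
        return self._pages[url]

    async def aclose(self) -> None:
        """Nothing to release; present in case the client closes its transport."""


transport = CannedTransport({"https://example.test/list": "<html>stub</html>"})
html = asyncio.run(transport.fetch("https://example.test/list"))
# With the real library, this object would be passed as DVClient(transport=transport).
```

Recording the requested URLs makes it possible to check which pages the client asked for without touching the live site.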

## 🐍 Requirements

- Python **3.10+**
- `asyncio>=4.0.0`
- `selectolax>=0.4.7`
- `httpx[http2]>=0.28.1`

## 🤝 Contributing

Issues and pull requests are welcome! Please file an issue if you encounter any problems or have suggestions for improvements.

---
*Built with ❤️ for Bulgaria's open data. MIT Licensed.*
