Metadata-Version: 2.4
Name: transit-parser
Version: 0.2.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Rust
Classifier: Topic :: Scientific/Engineering :: GIS
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: pytest>=8.0 ; extra == 'dev'
Requires-Dist: pytest-benchmark>=4.0 ; extra == 'dev'
Requires-Dist: mypy>=1.10 ; extra == 'dev'
Requires-Dist: ruff>=0.6 ; extra == 'dev'
Provides-Extra: dev
Summary: High-performance transit data parser with TXC to GTFS conversion
Keywords: gtfs,transxchange,txc,transit,parsing,transport
Author-email: alexogeny <alexogeny@gmail.com>
License: MIT OR Apache-2.0
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Documentation, https://alexogeny.github.io/transit-parser/
Project-URL: Homepage, https://github.com/alexogeny/transit-parser
Project-URL: Issues, https://github.com/alexogeny/transit-parser/issues
Project-URL: Repository, https://github.com/alexogeny/transit-parser

# Transit Parser

High-performance Python library with a Rust core for parsing transit data formats, including TransXChange (TXC) to GTFS conversion.

## Features

- **GTFS Static** - Parse and write GTFS feeds (CSV-based)
- **TransXChange (TXC)** - Parse the UK's XML standard for bus route and timetable data
- **TXC to GTFS** - Convert TransXChange to GTFS
- **Schedule Validation** - Validate operational schedules against GTFS
- **Deadhead Inference** - Infer missing pull-out, pull-in, and interlining movements
- **Generic CSV/JSON** - Parse any CSV/JSON with schema inference
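
The deadhead terms above have fairly standard meanings: a *pull-out* is the empty run from the depot to a vehicle's first trip, a *pull-in* is the run back to the depot after its last trip, and (in this sketch) an *interlining* move repositions the vehicle between consecutive trips that don't meet at the same stop. The plain-Python sketch below shows how those gaps can be spotted for one vehicle block. It is not `transit_parser`'s API or algorithm: `Trip`, `Deadhead` and `infer_block_deadheads` are hypothetical names, and it assumes every block starts and ends at a single known depot.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trip:
    # Hypothetical minimal trip record; times are seconds after midnight.
    trip_id: str
    start_stop: str
    end_stop: str
    start_time: int
    end_time: int


@dataclass
class Deadhead:
    kind: str  # "pull_out", "pull_in" or "interline"
    from_location: str
    to_location: str


def infer_block_deadheads(trips: List[Trip], depot: str) -> List[Deadhead]:
    """Infer missing non-revenue moves for one block (the trips worked by one vehicle).

    Hypothetical helper: assumes the block starts and ends at `depot`.
    """
    if not trips:
        return []
    ordered = sorted(trips, key=lambda t: t.start_time)
    # Pull-out: the vehicle leaves the depot for the first trip's origin.
    moves = [Deadhead("pull_out", depot, ordered[0].start_stop)]
    # Interlining: a move is needed whenever a trip ends somewhere other
    # than where the next trip in the block starts.
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev.end_stop != nxt.start_stop:
            moves.append(Deadhead("interline", prev.end_stop, nxt.start_stop))
    # Pull-in: the vehicle returns from the last trip's terminus to the depot.
    moves.append(Deadhead("pull_in", ordered[-1].end_stop, depot))
    return moves
```

A fuller implementation would also need travel times between locations to check that each move fits in the gap between trips; this sketch only decides *which* moves are missing.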

## Installation

### Prerequisites

- Python 3.9+
- Rust 1.75+ (with cargo)
- uv (recommended) or pip

### Development Setup

```bash
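# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone the repository and enter the project root (the directory containing pyproject.toml)
git clone https://github.com/alexogeny/transit-parser.git
cd transit-parser

# Create a virtual environment and activate it
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Compile the Rust extension and install it into the venv in develop mode.
# uv-created environments don't include pip, so have maturin install with uv.
uv pip install maturin
maturin develop --uv

# Or, without uv: create the venv with the standard library and use pip
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install maturin
maturin develop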
```

### Building for Release

```bash
maturin build --release  # wheels are written to target/wheels/
```

## Usage

### Parse GTFS Feed

```python
from transit_parser import GtfsFeed

# From ZIP file
feed = GtfsFeed.from_zip("path/to/gtfs.zip")

# From directory
feed = GtfsFeed.from_path("path/to/gtfs/")

# Access data
print(f"Agencies: {len(feed.agencies)}")
print(f"Routes: {len(feed.routes)}")
print(f"Stops: {len(feed.stops)}")
print(f"Trips: {len(feed.trips)}")

# Write to ZIP
feed.to_zip("output.zip")
```
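
A GTFS Static feed is a ZIP archive of CSV-formatted `.txt` files (`agency.txt`, `routes.txt`, `stops.txt`, `trips.txt` and so on), one per table, and the counts printed above correspond to rows in those tables. As a rough illustration of that layout, here is a stdlib-only sketch that counts data rows per file. It is independent of `transit_parser`; `count_gtfs_rows` is a hypothetical helper.

```python
import csv
import io
import zipfile
from typing import Dict


def count_gtfs_rows(source) -> Dict[str, int]:
    """Count data rows (excluding the header) in each .txt table of a GTFS ZIP.

    Hypothetical helper; `source` may be a filesystem path or a binary file object.
    """
    counts: Dict[str, int] = {}
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            if not name.endswith(".txt"):
                continue  # skip anything that is not a GTFS table
            # GTFS tables are UTF-8 CSV; "utf-8-sig" also drops a leading BOM.
            with io.TextIOWrapper(zf.open(name), encoding="utf-8-sig", newline="") as text:
                reader = csv.reader(text)
                next(reader, None)  # skip the header row
                counts[name] = sum(1 for row in reader if row)  # ignore blank lines
    return counts
```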

### Parse TransXChange

```python
from transit_parser import TxcDocument

# From file
doc = TxcDocument.from_path("path/to/file.xml")

# From string (xml_string holds the TXC XML text)
doc = TxcDocument.from_string(xml_string)

# Inspect document
print(f"Schema version: {doc.schema_version}")
print(f"Operators: {doc.operator_count}")
print(f"Services: {doc.service_count}")
print(f"Vehicle journeys: {doc.vehicle_journey_count}")
```

### Convert TXC to GTFS

```python
from transit_parser import TxcDocument, TxcToGtfsConverter, ConversionOptions

# Parse TXC
doc = TxcDocument.from_path("input.xml")

# Configure conversion
options = ConversionOptions(
    include_shapes=True,
    region="england",  # For bank holiday handling
    calendar_start="2024-01-01",
    calendar_end="2024-12-31",
)

# Convert
converter = TxcToGtfsConverter(options)
result = converter.convert(doc)

# Check results
print(f"Converted {result.stats.trips_converted} trips")
print(f"Warnings: {len(result.warnings)}")

# Save GTFS
result.feed.to_zip("output.zip")
```
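
The `calendar_start`/`calendar_end` window and `region` matter because GTFS expresses service days as a weekly pattern in `calendar.txt` (with `start_date`/`end_date`) plus per-date exceptions in `calendar_dates.txt`, and UK bank holidays differ between nations. The sketch below shows one plausible mapping for a simple days-of-week operating profile; it is an illustration, not the converter's code. `to_gtfs_calendar` is a hypothetical helper that takes explicit holiday dates rather than a region name, and it assumes the service does not run on bank holidays. Real TXC operating profiles are considerably richer than a set of weekdays.

```python
from datetime import date
from typing import Dict, Iterable, List, Set, Tuple

# GTFS calendar.txt day columns, indexed like date.weekday() (Monday == 0).
DAY_NAMES = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def to_gtfs_calendar(
    service_id: str,
    operating_days: Set[str],
    start: date,
    end: date,
    bank_holidays: Iterable[date],
) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Build one calendar.txt row plus calendar_dates.txt removals (hypothetical helper)."""
    row = {"service_id": service_id}
    for name in DAY_NAMES:
        row[name] = "1" if name in operating_days else "0"
    # GTFS dates use the YYYYMMDD format.
    row["start_date"] = start.strftime("%Y%m%d")
    row["end_date"] = end.strftime("%Y%m%d")

    # exception_type "2" means service is removed on that date. Only holidays that
    # fall inside the window, on a day the service would otherwise run, need an entry.
    removals = [
        {"service_id": service_id, "date": d.strftime("%Y%m%d"), "exception_type": "2"}
        for d in sorted(set(bank_holidays))
        if start <= d <= end and DAY_NAMES[d.weekday()] in operating_days
    ]
    return row, removals
```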

### Batch Conversion

```python
from pathlib import Path
from transit_parser import TxcDocument, TxcToGtfsConverter

# Parse every TXC file in the directory (sorted so the order is deterministic)
docs = [TxcDocument.from_path(str(p)) for p in sorted(Path("txc_files").glob("*.xml"))]

# Convert all documents into a single GTFS feed
converter = TxcToGtfsConverter()
result = converter.convert_batch(docs)
result.feed.to_zip("combined.zip")
```

### Generic CSV Parsing

```python
from transit_parser import CsvDocument

# Parse with automatic type inference
doc = CsvDocument.from_path("data.csv")

print(f"Columns: {doc.columns}")
print(f"Rows: {len(doc)}")

# Access rows as dicts
for row in doc.rows:
    print(row)
```
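
The README doesn't document which types `CsvDocument` infers or in what order it tries them. A common approach is to pick the narrowest type that every non-empty value in a column parses as; the stdlib-only sketch below illustrates that idea. `infer_type`, `infer_schema` and the type names they return are hypothetical, not `transit_parser`'s API.

```python
import csv
import io
from typing import Callable, Dict, List, Tuple

# Candidate types from narrowest to widest; each parser raises ValueError on a mismatch.
_CANDIDATES: List[Tuple[str, Callable[[str], object]]] = [("integer", int), ("float", float)]


def infer_type(values: List[str]) -> str:
    """Return the narrowest type every non-empty value in a column parses as."""
    present = [v for v in values if v != ""]
    if not present:
        return "string"  # an all-empty column gives no evidence either way
    for type_name, parse in _CANDIDATES:
        try:
            for v in present:
                parse(v)
        except ValueError:
            continue  # some value does not fit; try the next, wider type
        return type_name
    if all(v.lower() in ("true", "false") for v in present):
        return "boolean"
    return "string"


def infer_schema(text: str) -> Dict[str, str]:
    """Map each CSV column name to an inferred type name (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    # DictReader fills missing trailing fields with None; treat those as empty.
    return {col: infer_type([row[col] or "" for row in rows]) for col in (reader.fieldnames or [])}
```

One consequence worth knowing for transit data: an identifier column such as `stop_id` holding `"007"` would be classed as an integer by this rule, and converting it would lose the leading zeros, so ID columns are often better kept as strings.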

### JSON Parsing

```python
from transit_parser import JsonDocument

# Parse JSON
doc = JsonDocument.from_path("data.json")

# Access root value
data = doc.root

# Use JSON pointer for nested access
value = doc.pointer("/data/items/0/name")
```
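
`doc.pointer(...)` takes a JSON Pointer string. Assuming it follows RFC 6901 (the README doesn't say so explicitly), resolution walks the document one `/`-separated token at a time, unescaping `~1` to `/` and then `~0` to `~`. The plain-Python sketch below shows that procedure; `resolve_pointer` is a hypothetical helper, not part of `transit_parser`.

```python
import json
import re
from typing import Any

# RFC 6901 array indices: "0" or a number without leading zeros.
_ARRAY_INDEX = re.compile(r"0|[1-9][0-9]*")


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON Pointer such as "/data/items/0/name" (hypothetical helper)."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"JSON pointer must start with '/': {pointer!r}")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape "~1" -> "/" first, then "~0" -> "~", in the order RFC 6901 requires.
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            if not _ARRAY_INDEX.fullmatch(token):
                raise KeyError(f"invalid array index: {token!r}")
            current = current[int(token)]  # IndexError if out of range
        elif isinstance(current, dict):
            current = current[token]  # KeyError if the member is missing
        else:
            raise KeyError(f"cannot index into {type(current).__name__} with {token!r}")
    return current


doc = json.loads('{"data": {"items": [{"name": "Stop A"}, {"name": "Stop B"}]}}')
print(resolve_pointer(doc, "/data/items/0/name"))  # Stop A
```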

### Schedule Validation

```python
from transit_parser import GtfsFeed, Schedule, ValidationConfig

# Load GTFS and schedule
gtfs = GtfsFeed.from_path("gtfs/")
schedule = Schedule.from_csv("schedule.csv")

# Validate with custom rules
config = ValidationConfig(
    gtfs_compliance="standard",
    min_layover_seconds=300,  # 5 minutes
    max_duty_length_seconds=32400,  # 9 hours
)
result = schedule.validate(gtfs, config)

if not result.is_valid:
    for error in result.errors:
        print(f"Error: {error['message']}")

# Infer missing deadheads
inference = schedule.infer_deadheads(gtfs, default_depot="MAIN")
print(f"Inferred {inference.total_count} deadheads")

# Export
schedule.to_csv("output.csv", preset="optibus")
```
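
The README doesn't spell out how `min_layover_seconds` and `max_duty_length_seconds` are measured. Assuming layover is the gap between one trip's arrival and the next trip's departure within a duty, and duty length is the span from the first departure to the last arrival, the two checks reduce to something like the sketch below. `DutyTrip` and `check_duty` are hypothetical; the error dicts carry a `message` key only to mirror how the example above reads `result.errors`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DutyTrip:
    # Hypothetical record; times are seconds after midnight.
    trip_id: str
    start_time: int
    end_time: int


def check_duty(
    duty_id: str,
    trips: List[DutyTrip],
    min_layover_seconds: int = 300,
    max_duty_length_seconds: int = 32400,
) -> List[Dict[str, str]]:
    """Return error dicts (each with a "message") for one duty's trips."""
    errors: List[Dict[str, str]] = []
    ordered = sorted(trips, key=lambda t: t.start_time)
    # Layover: time between arriving on one trip and departing on the next.
    for prev, nxt in zip(ordered, ordered[1:]):
        layover = nxt.start_time - prev.end_time
        if layover < min_layover_seconds:
            errors.append({"message": f"{duty_id}: layover of {layover}s between "
                                      f"{prev.trip_id} and {nxt.trip_id} is below the "
                                      f"{min_layover_seconds}s minimum"})
    # Duty length: first departure to last arrival.
    if ordered:
        length = ordered[-1].end_time - ordered[0].start_time
        if length > max_duty_length_seconds:
            errors.append({"message": f"{duty_id}: duty length {length}s exceeds the "
                                      f"{max_duty_length_seconds}s maximum"})
    return errors
```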

## Project Structure

```
parser/
├── pyproject.toml          # Python project config (maturin backend)
├── Cargo.toml              # Rust workspace root
├── rust/
│   ├── transit-core/       # Core data models and traits
│   ├── gtfs-parser/        # GTFS Static parser
│   ├── txc-parser/         # TransXChange parser
│   ├── txc-gtfs-adapter/   # TXC→GTFS conversion
│   ├── schedule-parser/    # Schedule validation & generation
│   ├── csv-parser/         # Generic CSV parser
│   ├── json-parser/        # Generic JSON parser
│   └── transit-bindings/   # PyO3 Python bindings
└── python/
    └── transit_parser/     # Python package
```

## Performance

The performance-critical work runs in the Rust core:

- **Streaming XML parsing** - Large TXC files are processed without building a full DOM in memory
- **Zero-copy CSV parsing** - Efficient GTFS file reading
- **Parallel processing** - Batch conversion uses multiple cores
- **GIL release** - Long-running operations release the GIL, so other Python threads can run in the meantime
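
Streaming parsing means handling each element as soon as its closing tag has been read and then discarding it, so memory use is bounded by roughly one record rather than the whole file. The Rust implementation isn't shown in this README; as an illustration of the same technique, the sketch below uses the standard library's `xml.etree.ElementTree.iterparse` to count TXC elements. `count_txc_elements` is a hypothetical helper, and matching on local element names (ignoring the TXC namespace) is a simplifying assumption.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Iterable


def count_txc_elements(source, names: Iterable[str] = ("Operator", "Service", "VehicleJourney")) -> Counter:
    """Count selected elements in a TXC file (path or binary file object) by streaming."""
    wanted = set(names)
    counts: Counter = Counter()
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)  # the first event is the start of the root element
    for event, elem in events:
        if event != "end":
            continue  # act only once an element, including its children, is complete
        local_name = elem.tag.rsplit("}", 1)[-1]  # strip any "{namespace}" prefix
        if local_name in wanted:
            counts[local_name] += 1
        # Detach finished subtrees from the root so they can be garbage-collected.
        root.clear()
    return counts


sample = io.BytesIO(
    b'<TransXChange xmlns="http://www.transxchange.org.uk/">'
    b'<Operators><Operator id="O1"/></Operators>'
    b"<Services><Service><ServiceCode>S1</ServiceCode></Service></Services>"
    b"<VehicleJourneys><VehicleJourney/><VehicleJourney/></VehicleJourneys>"
    b"</TransXChange>"
)
print(dict(count_txc_elements(sample)))  # {'Operator': 1, 'Service': 1, 'VehicleJourney': 2}
```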

## License

MIT OR Apache-2.0

