Metadata-Version: 2.4
Name: pyptine
Version: 0.3.0
Summary: High-level Python client for INE Portugal (Statistics Portugal) API
Author-email: Nigel Randsley <nigel.randsley@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/nigelrandsley/pyptine
Project-URL: Documentation, https://pyptine.readthedocs.io
Project-URL: Repository, https://github.com/nigelrandsley/pyptine
Project-URL: Bug Tracker, https://github.com/nigelrandsley/pyptine/issues
Keywords: ine,portugal,statistics,data,api
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.28.0
Requires-Dist: pandas>=1.5.0
Requires-Dist: click>=8.0.0
Requires-Dist: requests-cache>=1.0.0
Requires-Dist: lxml>=4.9.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: platformdirs>=3.0.0
Requires-Dist: rich>=13.0.0
Requires-Dist: plotly>=5.0.0
Requires-Dist: httpx>=0.24.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-mock>=3.10.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: responses>=0.22.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: pre-commit>=3.0.0; extra == "dev"
Requires-Dist: types-requests>=2.28.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=6.0.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.2.0; extra == "docs"
Requires-Dist: sphinx-click>=4.4.0; extra == "docs"
Dynamic: license-file

# pyptine - INE Portugal API Client

[![PyPI version](https://img.shields.io/pypi/v/pyptine.svg)](https://pypi.org/project/pyptine/)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

High-level Python client for the Statistics Portugal (INE) API. Query and download statistical data from [INE Portugal](https://www.ine.pt) with a simple, intuitive interface.

## Features

- 🎯 **High-level Convenience API**: Simple interface for common data retrieval and analysis tasks.
- ⚡ **Async Support**: Non-blocking I/O with `AsyncINE` for concurrent requests using httpx.
- 📊 **Multiple Output Formats**: Export data to pandas DataFrames, JSON, or CSV with ease.
- 📈 **Data Visualization**: Interactive plotly charts (line, bar, area, scatter) directly from data.
- 🔬 **Statistical Analysis**: Built-in YoY growth, MoM changes, moving averages, and EMA calculations.
- 💾 **Smart Caching**: Disk-based caching reduces redundant API calls, speeding up repeated queries.
- 🔍 **Metadata Browsing**: Search and discover indicators, themes, and dimensions.
- 🖥️ **Enhanced CLI**: Rich formatting with progress bars, tables, and colored output.
- 📑 **True Pagination**: Efficient streaming of large datasets with `get_all_data()`.
- 📖 **Modern Python**: Fully type-annotated for better developer experience and IDE support.
- ✅ **Well-Tested**: Comprehensive test suite with 81% code coverage (239 tests).
- 🔄 **API Compatible**: Supports both old and new INE API response formats seamlessly.

## Installation

```bash
pip install pyptine
```

For development, install with all extra dependencies:

```bash
pip install "pyptine[dev,docs]"
```

## Quick Start

```python
from pyptine import INE

# Initialize the client
ine = INE(language="EN")

# 1. Search for an indicator
print("Searching for 'gdp' indicators...")
results = ine.search("gdp")
for indicator in results[:5]:  # Print top 5 results
    print(f"- {indicator.varcd}: {indicator.title}")

# 2. Get data for a specific indicator
varcd = "0004167"  # Resident population
print(f"\nFetching data for indicator {varcd}...")
response = ine.get_data(varcd)

# 3. Convert to a pandas DataFrame
df = response.to_dataframe()
print("\nData as DataFrame:")
print(df.head())

# 4. Export data to a CSV file
output_file = "population_data.csv"
print(f"\nExporting data to {output_file}...")
ine.export_csv(varcd, output_file)
print("Done!")
```

## Async API

For concurrent requests and non-blocking I/O, use the `AsyncINE` client:

```python
import asyncio
from pyptine import AsyncINE

async def main():
    async with AsyncINE(language="EN") as ine:
        # Fetch single indicator
        response = await ine.get_data("0004167")
        df = response.to_dataframe()
        print(df.head())

        # Fetch multiple indicators concurrently (asyncio is already imported above)
        responses = await asyncio.gather(
            ine.get_data("0004167"),
            ine.get_data("0004127"),
            ine.get_data("0008074")
        )

        # Stream large datasets
        async for chunk in ine.get_all_data("0004127", chunk_size=40000):
            df_chunk = chunk.to_dataframe()
            print(f"Processing {len(df_chunk)} rows...")

asyncio.run(main())
```

**AsyncINE Features:**
- Non-blocking I/O for faster concurrent requests
- Async iterator for memory-efficient pagination
- Same API as the synchronous `INE` client
- Automatic connection pooling and retries
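
When fetching many indicators at once, it can help to cap the number of requests in flight. The sketch below shows one common way to do that with `asyncio.Semaphore`. It is not part of pyptine: `fetch_indicator` is a hypothetical stand-in for `await ine.get_data(varcd)` so that the example runs without network access, and `fetch_many` and `max_concurrent` are illustrative names.

```python
import asyncio


async def fetch_indicator(varcd: str) -> dict:
    """Hypothetical stand-in for `await ine.get_data(varcd)` (no network)."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return {"varcd": varcd}


async def fetch_many(varcds, max_concurrent=2):
    """Fetch all codes concurrently, at most `max_concurrent` at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(varcd):
        # Each task waits here until a slot is free.
        async with semaphore:
            return await fetch_indicator(varcd)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(v) for v in varcds))


results = asyncio.run(fetch_many(["0004167", "0004127", "0008074"]))
print([r["varcd"] for r in results])
```

With a real client, the body of `bounded` would presumably call `ine.get_data(varcd)` inside the `async with AsyncINE(...)` block shown above.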

## Command-Line Usage

The pyptine CLI provides a convenient way to access INE data from your terminal, with rich formatting and progress indicators for a better user experience.

```bash
# Search for indicators related to "pib" (GDP in Portuguese)
pyptine search "pib"

# Get detailed information about a specific indicator
pyptine info 0004127

# Download data for an indicator to a CSV file (with progress bar)
pyptine download 0004127 --output data.csv

# Download data and filter by dimensions
pyptine download 0004167 --output filtered_data.csv -d Dim1=S7A2023 -d Dim2=PT

# List all available statistical themes (in formatted table)
pyptine list-commands themes

# List all indicators (with pagination support)
pyptine list-commands indicators --limit 50

# View available dimensions for an indicator
pyptine dimensions 0004167

# Clear the local cache
pyptine cache clear
```

**CLI Features:**
- **Rich Formatting** - Tables, panels, and colored output for better readability
- **Progress Indicators** - Spinners and progress bars for long-running operations
- **Error Handling** - Centralized, user-friendly error messages with context
- **Better Organization** - Data displayed in well-formatted tables rather than plain text

## Documentation

### Initializing the Client

The `INE` class is the main entry point.

```python
from pyptine import INE
from pathlib import Path

# Default client (language='EN', caching=True)
ine = INE()

# Client with Portuguese language
ine_pt = INE(language="PT")

# Disable caching
ine_no_cache = INE(cache=False)

# Use a custom cache directory
ine_custom_cache = INE(cache_dir=Path("/path/to/custom/cache"))
```

### Working with Indicators

#### Searching for Indicators

You can search for indicators by keyword and filter by theme or sub-theme.

```python
# Basic search
results = ine.search("unemployment rate")

# Search within a specific theme
results = ine.search("employment", theme="Labour market")
```

#### Getting Indicator Metadata

Retrieve detailed information about an indicator, including its dimensions.

```python
metadata = ine.get_metadata("0004167")
print(f"Title: {metadata.title}")
print(f"Unit: {metadata.unit}")
print(f"Source: {metadata.source}")

# List available dimensions
dimensions = ine.get_dimensions("0004167")
for dim in dimensions:
    print(f"\nDimension: {dim.name}")
    for value in dim.values[:5]:  # Show first 5 values
        print(f"- {value.code}: {value.label}")
```

### Fetching and Exporting Data

#### Getting Data

The `get_data` method returns a `DataResponse` object, which can be easily converted to different formats.

```python
response = ine.get_data("0004127")

# Convert to pandas DataFrame
df = response.to_dataframe()

# Convert to a dictionary
data_dict = response.to_dict()

# Get data as a JSON string
json_str = response.to_json()
```

#### Filtering Data with Dimensions

Use the `dimensions` parameter to filter data before downloading.

```python
# Get data for the year 2023 and region "Portugal"
# Note: Dimension values use specific codes (e.g., 'S7A2023' for year 2023)
filtered_response = ine.get_data(
    "0004167",
    dimensions={
        "Dim1": "S7A2023",  # Year 2023
        "Dim2": "PT"        # Geographic region 'Portugal'
    }
)
df_filtered = filtered_response.to_dataframe()
```
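
The CLI passes the same filters as repeated `-d KEY=VALUE` options (for example `-d Dim1=S7A2023 -d Dim2=PT`). As a rough illustration of how such pairs map onto the `dimensions` dictionary, here is a small helper. `parse_dimension_args` is a hypothetical name for this sketch, not a pyptine function, and the library's own parsing may differ.

```python
def parse_dimension_args(pairs):
    """Turn ["Dim1=S7A2023", "Dim2=PT"] into {"Dim1": "S7A2023", "Dim2": "PT"}."""
    dimensions = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        key, value = key.strip(), value.strip()
        # Reject entries with no "=", an empty key, or an empty value.
        if not sep or not key or not value:
            raise ValueError(f"Expected KEY=VALUE, got {pair!r}")
        dimensions[key] = value
    return dimensions


dims = parse_dimension_args(["Dim1=S7A2023", "Dim2=PT"])
print(dims)
```

The resulting dictionary has the same shape as the `dimensions` argument passed to `get_data` above.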

#### Exporting Data

You can export data directly to CSV or JSON files.

```python
# Export to CSV
ine.export_csv("0004127", "output.csv")

# Export to JSON with pretty printing
ine.export_json("0004127", "output.json", pretty=True)

# Export filtered data
ine.export_csv(
    "0004167",
    "filtered_output.csv",
    dimensions={"Dim1": "S7A2023"}
)
```

#### Working with Large Datasets

For datasets that exceed the default limit of 40,000 data points, use the `get_all_data()` method, which handles pagination automatically:

```python
from pyptine.client.data import DataClient

client = DataClient(language="EN")

# Fetch data in chunks (default chunk_size=40,000)
for chunk in client.get_all_data("0004127"):
    df = chunk.to_dataframe()
    print(f"Processed {len(df)} rows")
    # Process each chunk

# Custom chunk size
for chunk in client.get_all_data("0004127", chunk_size=5000):
    # Process smaller chunks
    pass

# Combine all chunks into a single dataset
all_chunks = list(client.get_all_data("0004127"))
all_data = [point for chunk in all_chunks for point in chunk.data]
```
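
Collecting every chunk into a list keeps the whole dataset in memory. If that is a concern, chunks can instead be written out as they arrive. The sketch below does this with the standard `csv` module. It assumes each chunk exposes a `.data` list whose items behave like plain dictionaries; the real data point type may differ. `fake_chunks` is a hypothetical stand-in for `client.get_all_data(...)`, and `write_chunks` is an illustrative helper, not part of pyptine.

```python
import csv
import io
from types import SimpleNamespace


def fake_chunks():
    """Hypothetical stand-in for client.get_all_data(...); yields two chunks."""
    yield SimpleNamespace(data=[{"Period": "2021", "value": 1.0},
                                {"Period": "2022", "value": 2.0}])
    yield SimpleNamespace(data=[{"Period": "2023", "value": 3.0}])


def write_chunks(chunks, handle):
    """Write chunks to an open text handle one row at a time; return row count."""
    writer = None
    rows = 0
    for chunk in chunks:
        for point in chunk.data:
            if writer is None:
                # Take the column names from the first data point seen.
                writer = csv.DictWriter(handle, fieldnames=list(point))
                writer.writeheader()
            writer.writerow(point)
            rows += 1
    return rows


buffer = io.StringIO()  # use open("out.csv", "w", newline="") for a real file
rows_written = write_chunks(fake_chunks(), buffer)
print(rows_written)
```

Only one chunk is held in memory at a time, at the cost of assuming that all data points share the same fields.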

#### Visualizing Data

Create interactive visualizations directly from indicator data, without first converting it to a DataFrame:

```python
# Get data and create interactive line chart
response = ine.get_data("0004127")
fig = response.plot(chart_type="line")
fig.show()

# Different chart types
fig_bar = response.plot_bar()
fig_area = response.plot_area()
fig_scatter = response.plot_scatter()

# Customize visualization
fig = response.plot_line(
    markers=True,
    x_column="Period",
    y_column="value"
)

# Color by dimensions (if data has dimension columns)
fig = response.plot_line(color_column="region")

# Save to HTML for sharing
fig.write_html("indicator_plot.html")

# Further customization with plotly
fig.update_layout(height=600, width=1200, title="Custom Title")
fig.show()
```

**Available Visualization Methods:**
- `plot(chart_type)` - Generic plot with selectable chart type
- `plot_line()` - Interactive line chart with optional markers
- `plot_bar()` - Bar chart for categorical comparison
- `plot_area()` - Stacked area chart for trends
- `plot_scatter()` - Scatter plot with optional size and color dimensions

All methods support:
- Interactive plotly charts with hover, zoom, and pan
- Custom column selection for x/y axes
- Color coding by dimension columns
- Export to HTML, PNG, or other formats

#### Advanced Data Analysis

Perform statistical calculations on indicator data directly within the library:

```python
# Get data and calculate year-over-year growth
response = ine.get_data("0004127")
yoy_response = response.calculate_yoy_growth()
df_yoy = yoy_response.to_dataframe()
print(df_yoy[['Period', 'value', 'yoy_growth']])

# Calculate month-over-month changes
mom_response = response.calculate_mom_change()
df_mom = mom_response.to_dataframe()

# Calculate simple moving average (3-period)
ma_response = response.calculate_moving_average(window=3)
df_ma = ma_response.to_dataframe()

# Calculate exponential moving average
ema_response = response.calculate_exponential_moving_average(span=5)
df_ema = ema_response.to_dataframe()

# Chain multiple analyses
result = response.calculate_yoy_growth().calculate_moving_average(window=2)
df = result.to_dataframe()
print(df[['Period', 'value', 'yoy_growth', 'moving_avg']])
```

Available analysis methods on `DataResponse`:
- `calculate_yoy_growth()` - Year-over-year percentage change
- `calculate_mom_change()` - Month-over-month percentage change
- `calculate_moving_average(window)` - Simple moving average
- `calculate_exponential_moving_average(span)` - Exponential weighted moving average

All methods support custom `value_column` and `period_column` parameters to work with different data structures.
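
For reference, the sketch below spells out the conventional definitions of these quantities in plain Python. It is not pyptine's implementation. In particular, the lag used for year-over-year growth (1 for annual data, 12 for monthly data) and the EMA smoothing factor `alpha = 2 / (span + 1)` are assumptions based on common practice, and the function names are illustrative.

```python
def pct_change(values, lag=1):
    """Percentage change against the value `lag` periods earlier."""
    out = [None] * len(values)
    for i in range(lag, len(values)):
        prev = values[i - lag]
        if prev:  # leave None when the base is missing or zero
            out[i] = (values[i] - prev) / prev * 100
    return out


def moving_average(values, window):
    """Simple moving average over the trailing `window` values."""
    out = [None] * len(values)
    for i in range(window - 1, len(values)):
        out[i] = sum(values[i - window + 1 : i + 1]) / window
    return out


def exponential_moving_average(values, span):
    """Recursive EMA seeded with the first value; alpha = 2 / (span + 1)."""
    alpha = 2 / (span + 1)
    out = []
    for v in values:
        out.append(v if not out else alpha * v + (1 - alpha) * out[-1])
    return out


values = [100.0, 110.0, 121.0, 133.1]
growth = pct_change(values)            # lag=1: annual data assumed
sma = moving_average(values, window=2)
ema = exponential_moving_average(values, span=3)
print(growth, sma, ema)
```

The first `lag` (or `window - 1`) entries are `None` because there is no earlier value to compare against.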

## API Reference

### `INE` Class

The main class for interacting with the INE API.

`INE(language: str = "EN", cache: bool = True, cache_dir: Optional[Path] = None, cache_ttl: int = 86400)`

| Method | Description |
| --- | --- |
| `search(query, ...)` | Search for indicators. |
| `get_data(varcd, ...)` | Get data for an indicator as a `DataResponse` object. |
| `get_metadata(varcd)` | Get detailed metadata for an indicator. |
| `get_dimensions(varcd)` | Get available dimensions for an indicator. |
| `get_indicator(varcd)` | Get catalogue information for a single indicator. |
| `validate_indicator(varcd)` | Check if an indicator code is valid. |
| `list_themes()` | Get a list of all available themes. |
| `export_csv(varcd, ...)` | Export indicator data to a CSV file. |
| `export_json(varcd, ...)` | Export indicator data to a JSON file. |
| `clear_cache()` | Clear all cached data. |
| `get_cache_info()` | Get statistics about the cache. |

---

## Links & Resources

- **PyPI Package**: https://pypi.org/project/pyptine/
- **GitHub Repository**: https://github.com/randsley/pyptine
- **Issue Tracker**: https://github.com/randsley/pyptine/issues
- **INE Portal**: https://www.ine.pt/

---

## Development

### Setup

To set up the development environment:

```bash
# Clone the repository
git clone https://github.com/nigelrandsley/pyptine.git
cd pyptine

# Install in editable mode with development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks to ensure code quality
pre-commit install
```

### Running Tests

```bash
# Run all tests
pytest

# Run tests with coverage report
pytest --cov=src/pyptine --cov-report=term-missing
```

### Code Quality

This project uses `black` for formatting, `ruff` for linting, and `mypy` for type checking.

```bash
# Format code
black src/ tests/

# Lint code
ruff check src/ tests/

# Type check
mypy src/
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1.  Fork the repository.
2.  Create your feature branch (`git checkout -b feature/amazing-feature`).
3.  Commit your changes (`git commit -m 'Add amazing feature'`).
4.  Push to the branch (`git push origin feature/amazing-feature`).
5.  Open a Pull Request.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
