Metadata-Version: 2.4
Name: per-datasets
Version: 0.0.7a2
Summary: A Python package for loading petroleum datasets
Author-email: PERD Team <data.per@uniben.edu>
License-Expression: MIT
Project-URL: Homepage, https://github.com/P-E-R-D/library-py
Project-URL: Bug Tracker, https://github.com/P-E-R-D/library-py/issues
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: ipython<=9.11.0
Requires-Dist: numpy<=2.4.2
Requires-Dist: plotly<=6.6.0
Requires-Dist: matplotlib<=3.10.8
Requires-Dist: pandas<=3.0.0
Requires-Dist: python-socketio<=5.16.1
Requires-Dist: requests<=2.32.5
Requires-Dist: websocket-client<=1.9.0
Requires-Dist: nbformat<=5.10.4
Requires-Dist: anywidget<=0.9.21
Requires-Dist: ipywidgets<=8.1.8
Requires-Dist: typing-extensions<=3.10.0.2; python_version < "3.10"
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: pytest-cov>=2.0; extra == "dev"

# per-datasets

A Python package for loading reservoir datasets from API endpoints.

## Installation

```bash
pip install per-datasets
```

## Quick Start

### Option 1: Using Global API Key (Recommended)

First, set your API key globally:

```bash
# Set API key globally (works across all projects)
per-datasets set-key "your_api_key_here"

# Or use interactive setup
per-datasets interactive
```

Then use in your Python code:

```python
import per_datasets as pds

# Initialize without API key (uses global key)
pds.initialize()

# Load a random reservoir dataset
df_random = pds.reservoir.load_random()
print(f"Loaded dataset with shape: {df_random.shape}")
```

### Option 2: Using API Key in Code

```python
import per_datasets as pds

# Initialize with your API key
pds.initialize('your_api_key_here')

# Load a random reservoir dataset
df_random = pds.reservoir.load_random()
print(f"Loaded dataset with shape: {df_random.shape}")
```

## Workflows

The package includes Dockerized workflows for common operations:

### Available Workflows

1. **Add Workflow** - Adds two numbers together
2. **Subtract Workflow** - Subtracts one number from another
3. **PINN Workflow** - Trains a Physics-Informed Neural Network (Transformer-based)

### Running Workflows in Python

You can run workflows directly in Python:

```python
from per_datasets.workflows import add, subtract, pinn

# Run simple workflows
print(add(5, 3))       # 8
print(subtract(10, 4)) # 6

# Run PINN training workflow
results = pinn(epochs=50)
print(f"Final Loss: {results['final_loss']}")

# Visualize the loss history dynamically
from per_datasets import visual # Or use pds.visual if imported as pds
visual.line_plot(results, y='loss_history', title="PINN Training Loss")
```

### Building Workflow Containers

```bash
# Build all workflow Docker images
./build_workflows.sh

# Or build individually
docker build -t perd-add-workflow -f per_datasets/workflows/add/Dockerfile .
docker build -t perd-subtract-workflow -f per_datasets/workflows/substract/Dockerfile .
```

### Running Workflow Containers

```bash
# Run add workflow
docker run --rm perd-add-workflow 5.2 3.8

# Run subtract workflow
docker run --rm perd-subtract-workflow 10.5 4.3
```

See `per_datasets/workflows/README.md` for more details.
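The container invocations above can also be driven from Python with `subprocess`. This is a sketch, assuming the images were built with the tags shown in the build step and that each workflow prints its result to standard output. The helper names `workflow_command` and `run_workflow` are hypothetical and not part of the package.

```python
import subprocess
from typing import List


def workflow_command(image: str, *args: float) -> List[str]:
    """Build the `docker run` argument list for a workflow image."""
    return ["docker", "run", "--rm", image, *[str(a) for a in args]]


def run_workflow(image: str, *args: float) -> str:
    """Run the container and return its standard output as text.

    Raises subprocess.CalledProcessError if the container exits non-zero.
    """
    result = subprocess.run(
        workflow_command(image, *args),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Only builds and prints the command; call run_workflow(...) to execute it.
print(workflow_command("perd-add-workflow", 5.2, 3.8))
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` surfaces a failed container run as an exception instead of an empty result.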

## Command Line Interface

The package includes a CLI for managing API keys globally:

```bash
# Set API key globally
per-datasets set-key "your_api_key_here"

# Check configuration status
per-datasets status

# Get stored API key (masked)
per-datasets get-key

# Remove API key
per-datasets remove-key

# Interactive setup
per-datasets interactive

# Clear all configuration
per-datasets clear

# Show help
per-datasets --help
```

## Complete Usage Examples

```python
import per_datasets as pds

# Initialize (uses global key if available)
pds.initialize()

# Load a random reservoir dataset
df_random = pds.reservoir.load_random()
print(f"Loaded dataset with shape: {df_random.shape}")

# Load a specific dataset by ID
df_specific = pds.reservoir.load('your_dataset_id')

# Get information about available datasets
info = pds.get_dataset_info()
```

## API Reference

### `initialize(api_key=None)`

Initialize the per_datasets module with API credentials.

**Parameters:**

- `api_key` (str, optional): The API key for authentication. If not provided, the globally stored key is used.

**Note:** If no API key is provided and none is stored globally, raises a ValueError with instructions to set a global key.

### `load_random()`

Loads a random reservoir model from the API endpoint and returns it as a pandas DataFrame. It is called as `pds.reservoir.load_random()`, as in the examples above.

**Returns:**

- `pandas.DataFrame`: A DataFrame containing the dataset

## Configuration Management

The package stores configuration in `~/.per_datasets/config.json` by default:

```json
{
  "api_key": "your_api_key_here"
}
```
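
For illustration, the stored key could be read directly with the standard library. This is a sketch, not the package's own loader: the helper `read_stored_key` is hypothetical, and it assumes only the file location and the single `api_key` field shown above.

```python
import json
from pathlib import Path
from typing import Optional

# Default location stated in this README.
DEFAULT_CONFIG = Path.home() / ".per_datasets" / "config.json"


def read_stored_key(path: Path = DEFAULT_CONFIG) -> Optional[str]:
    """Return the `api_key` value from the config file, or None.

    None is returned when the file is missing or unreadable, is not
    valid JSON, or has no non-empty string under `api_key`.
    """
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None
    key = data.get("api_key") if isinstance(data, dict) else None
    return key if isinstance(key, str) and key else None


# Report only whether a key exists; never print the key itself.
print("Key configured:", read_stored_key() is not None)
```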

### Benefits of Global Configuration:

- ✅ **No API key in code**: Keep sensitive keys out of your source code
- ✅ **Cross-project**: Use the same API key across multiple projects
- ✅ **Secure**: API keys are stored in your home directory, outside your project files
- ✅ **Override**: Can still provide API key in code to override global setting
- ✅ **Easy management**: Use CLI commands to manage keys

### Security Notes:

- API keys are stored in plain text in your home directory
- Only you can access the configuration file
- Consider using environment variables for production deployments
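
One way to follow the environment-variable suggestion is to resolve the key yourself before calling `initialize`. The variable name `PER_DATASETS_API_KEY` and the helper `resolve_api_key` are assumptions made for this sketch; this README does not say that the package reads any environment variable on its own.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    env: Mapping[str, str] = os.environ,
    var_name: str = "PER_DATASETS_API_KEY",  # hypothetical variable name
) -> Optional[str]:
    """Return the key from the environment, or None if unset or blank."""
    value = env.get(var_name, "").strip()
    return value or None


api_key = resolve_api_key()
# Passing None falls back to the globally stored key, per the
# `initialize(api_key=None)` description in the API reference:
# pds.initialize(api_key)
print("Using environment key" if api_key else "Falling back to stored key")
```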

## Dependencies

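- requests>=2.25.1
- pandas>=1.3.0

The package metadata also declares numpy, plotly, matplotlib, ipython, python-socketio, websocket-client, nbformat, anywidget, and ipywidgets as requirements, each with an upper version bound; `pip` installs them automatically.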

## License

MIT

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## Development

To set up the development environment:

```bash
git clone https://github.com/P-E-R-D/library-py.git
cd library-py
pip install -e .
```

## Building and Publishing

### Automatic Deployment (Recommended)

This package uses GitHub Actions for automatic deployment to PyPI:

1. **Make your changes** to the code
2. **Update version numbers** in `per_datasets/__init__.py` and `pyproject.toml`
3. **Create a git tag** with the new version:
   ```bash
   git tag v0.2.0
   git push origin v0.2.0
   ```
4. **GitHub Actions** then builds the package and uploads it to PyPI automatically.

See [DEPLOYMENT.md](DEPLOYMENT.md) for detailed setup instructions.

### Manual Publishing

```bash
python -m build
twine upload dist/*
```
