Metadata-Version: 2.1
Name: scidx_streaming
Version: 0.1.3
Summary: A Python client library for interacting with the scidx POP and creating streams.
Home-page: https://github.com/sci-ndp/streaming-py
Author: Andreu Fornos, Raul Bardaji, Saleem Alharir
Author-email: andreu.fornos@utah.edu, rbardaji@gmail.com, saleem.alharir@utah.edu
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: AUTHORS
Requires-Dist: requests
Requires-Dist: pytest
Requires-Dist: pytest-order
Requires-Dist: aiokafka
Requires-Dist: pytest-asyncio
Requires-Dist: pointofpresence
Requires-Dist: confluent_kafka
Requires-Dist: blosc
Requires-Dist: msgpack
Requires-Dist: PyJWT
Requires-Dist: xarray
Requires-Dist: sseclient
Requires-Dist: aiohttp
Requires-Dist: h5netcdf
Requires-Dist: netCDF4
Requires-Dist: fastapi
Requires-Dist: python-snappy
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: avro-python3

# scidx Streaming

A Python library for managing streaming data using the sciDX platform and a Point of Presence. This library provides easy-to-use methods for creating, consuming, and managing Kafka streams and related resources.


## Table of Contents

- [Installation](https://github.com/sci-ndp/streaming-py/blob/main/README.md#installation)
- [Tutorial](https://github.com/sci-ndp/streaming-py/blob/main/README.md#tutorial)
- [Running Tests](https://github.com/sci-ndp/streaming-py/blob/main/README.md#running-tests)
- [Usage examples](https://github.com/sci-ndp/streaming-py/blob/main/README.md#usage-examples)
- [Contributing](https://github.com/sci-ndp/streaming-py/blob/main/README.md#contributing)
- [License](https://github.com/sci-ndp/streaming-py/blob/main/README.md#license)
- [Contact](https://github.com/sci-ndp/streaming-py/blob/main/README.md#contact)


## Installation

Ensure you have Python 3.7 or higher installed. Using a virtual environment is recommended.

### Option 1: Install from GitHub

1. **Clone the repository:**

   ```bash
   git clone https://github.com/sci-ndp/streaming-py.git
   cd streaming-py
   ```
2. **Create and activate a virtual environment:**

   ```bash
   python3 -m venv .venv
   source .venv/bin/activate
   ```
3. **Install the package in editable mode:**

   ```bash
   pip install -e .
   ```
4. **Install development dependencies (optional, for testing):**

   ```bash
   pip install -r requirements.txt
   ```

### Option 2: Install via pip

Once the package is published on PyPI, you can install it directly using pip:

```bash
pip install scidx-streaming
```

## Tutorial

For a step-by-step guide on how to use the `streaming` library, check out our comprehensive tutorial: [10 Minutes for Streaming POP Data](https://github.com/sci-ndp/streaming-py/blob/main/docs/streaming_tutorial.ipynb).


## Running Tests

To run the tests, navigate to the project root and execute:

```bash
pytest
```

## Usage examples

Below is an example showcasing how to set up the library, register a data object, create a filtered stream, and consume its data.

### 1. Set up the POP and Streaming libraries

Initialize the `APIClient`, then use it to initialize the `StreamingClient`:

```python
from streaming import StreamingClient
from pointofpresence import APIClient

API_URL = "http://your-api-url.com"
USERNAME = "your_username"
PASSWORD = "your_password"

client = APIClient(base_url=API_URL, username=USERNAME, password=PASSWORD)
streaming = StreamingClient(client)
```
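
Hard-coded credentials are convenient for a demo, but they can also be read from the environment. The sketch below is only an illustration: the variable names `SCIDX_API_URL`, `SCIDX_USERNAME`, and `SCIDX_PASSWORD` and the helper `load_credentials` are hypothetical and not defined by the library. It builds the same keyword arguments passed to `APIClient` above and leaves the actual call commented out.

```python
import os

# Hypothetical variable names; pick whatever suits your deployment.
REQUIRED_VARS = ("SCIDX_API_URL", "SCIDX_USERNAME", "SCIDX_PASSWORD")


def load_credentials(env=os.environ):
    """Collect connection settings from a mapping of environment variables."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        # Fail early with a clear message instead of sending empty credentials.
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "base_url": env["SCIDX_API_URL"],
        "username": env["SCIDX_USERNAME"],
        "password": env["SCIDX_PASSWORD"],
    }


# Usage, with the same keyword arguments as the example above:
# client = APIClient(**load_credentials())
```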

### 2. Register a data object

```python
data_object_metadata = {
    "name": "sample_data_object",
    "type": "url",
    "url": "http://example.com/data.csv",
    "description": "Sample data object for streaming demo"
}
client.register_url(data_object_metadata)
```

### 3. Create a filtered data stream

```python
# Define filters
filters = [
    "column_name > 100",
    "IF column_name < 50 THEN alert = 'low' ELSE alert = 'high'"
]

# Create a Kafka stream with filters
stream = await streaming.create_kafka_stream(
    keywords=["sample_data_object"],
    match_all=True,
    filter_semantics=filters
)
print(f"Stream created with topic: {stream.data_stream_id}")
```
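
To make the intent of the two filter strings concrete, here is a plain-Python sketch of what they appear to express. It is not the library's implementation: the helper names `keep_above` and `tag_alert` are hypothetical, and it assumes the filters are applied in the order listed.

```python
rows = [
    {"column_name": 30},
    {"column_name": 120},
    {"column_name": 250},
]


def keep_above(rows, key, threshold):
    """Comparison filter: keep only rows whose value exceeds the threshold."""
    return [row for row in rows if row[key] > threshold]


def tag_alert(rows, key, threshold):
    """Conditional rule: add an `alert` field without dropping any row."""
    return [
        {**row, "alert": "low" if row[key] < threshold else "high"}
        for row in rows
    ]


# Mirror the order of the `filters` list: comparison first, then the IF/THEN/ELSE rule.
filtered = tag_alert(keep_above(rows, "column_name", 100), "column_name", 50)
for row in filtered:
    print(row)
```

Under that ordering assumption, every row that survives the first filter is tagged `high`, because values below 50 have already been removed by `column_name > 100`.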

### 4. Consume the filtered data stream

```python
# Consume stream data
consumer = streaming.consume_kafka_messages(stream.data_stream_id)
print(consumer.dataframe.head())
```

### 5. Clean up

```python
consumer.stop()
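# Delete the stream and the data object.
await streaming.delete_stream(stream)
# NOTE: `search_results` is not defined in the steps above; it is assumed to hold
# the result of an earlier search for the registered data object.
client.delete_resource_by_id(search_results[0]["id"])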
print("Cleanup completed.")
```
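
The `await` calls in steps 3 and 5 run as written in a Jupyter notebook, where top-level `await` is available. In a plain script they need to sit inside a coroutine driven by `asyncio.run`. A minimal sketch of that pattern follows; `FakeStreaming` is a hypothetical stand-in for `StreamingClient` that only mimics the async call shape, so the example runs without a POP or Kafka.

```python
import asyncio


class FakeStreaming:
    """Hypothetical stand-in for StreamingClient; not part of the library."""

    async def create_kafka_stream(self, keywords, match_all, filter_semantics):
        await asyncio.sleep(0)  # placeholder for the real network round trip
        return {"keywords": keywords, "filters": filter_semantics}

    async def delete_stream(self, stream):
        await asyncio.sleep(0)
        return True


async def main(streaming):
    stream = await streaming.create_kafka_stream(
        keywords=["sample_data_object"],
        match_all=True,
        filter_semantics=["column_name > 100"],
    )
    try:
        # Consume messages here.
        return stream
    finally:
        # Runs even if consumption raises, so the stream is not left behind.
        await streaming.delete_stream(stream)


result = asyncio.run(main(FakeStreaming()))
print(result)
```

Putting the deletion in a `finally` block is a design choice for the sketch, not something the library requires.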



## Contributing

Contributions are welcome! Please follow these steps:

1. **Fork the repository**
2. **Create a new branch** (`git checkout -b feature/new-feature`)
3. **Make your changes** and **commit** (`git commit -m 'Add new feature'`)
4. **Push** to the branch (`git push origin feature/new-feature`)
5. **Open a Pull Request**

### Publishing to PyPI

To publish the library to PyPI, follow these steps:

Ensure `setup.py` is correctly configured.

Build the distribution files:

```bash
python setup.py sdist bdist_wheel
```

Upload to PyPI using twine:

```bash
twine upload dist/*
```

Verify the package on PyPI by visiting https://pypi.org/ and checking your package listing.

If you need to update the library on PyPI:
- Make your changes and update the version in `setup.py`.
- Repeat the build and upload steps above to publish the new version.

## License

This project is licensed under the MIT License. See [LICENSE.md](https://github.com/sci-ndp/streaming-py/blob/main/docs/LICENSE.md) for more details.

## Contact

For any questions or suggestions, please open an [issue](https://github.com/sci-ndp/streaming-py/blob/main/docs/issues.md) on GitHub.
