Metadata-Version: 2.4
Name: aiogzip
Version: 0.3
Summary: Asynchronous gzip file reader/writer with aiocsv support.
Author-email: Geoff Davis <geoff@keksi.ai>
License: MIT License
        
        Copyright (c) 2025 Geoff Davis
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/geoff-davis/aiogzip
Project-URL: Source Code, https://github.com/geoff-davis/aiogzip
Project-URL: Bug Tracker, https://github.com/geoff-davis/aiogzip/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Framework :: AsyncIO
Classifier: Topic :: System :: Archiving :: Compression
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: aiofiles>=23.0.0
Provides-Extra: csv
Requires-Dist: aiocsv>=1.2.0; extra == "csv"
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Requires-Dist: pytest-asyncio; extra == "dev"
Requires-Dist: psutil; extra == "dev"
Requires-Dist: tomli; python_version < "3.11" and extra == "dev"
Dynamic: license-file

# aiogzip ⚡️

**An asynchronous library for reading and writing gzip-compressed files.**

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![PyPI version](https://img.shields.io/pypi/v/aiogzip.svg)](https://pypi.org/project/aiogzip/)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/aiogzip.svg)
[![Tests](https://github.com/geoff-davis/aiogzip/workflows/Python%20CI/badge.svg)](https://github.com/geoff-davis/aiogzip/actions)

`aiogzip` provides a fast, simple, and asyncio-native interface for handling `.gz` files, making it a useful complement to Python's built-in `gzip` module for asynchronous applications.

It is designed for high-performance I/O operations, especially for text-based data pipelines, and integrates seamlessly with other `async` libraries like `aiocsv`.

## Features

- **Truly Asynchronous**: Built with `asyncio` and `aiofiles` for non-blocking file I/O.
- **High-Performance Text Processing**: Significantly faster than the standard `gzip` library for text and JSONL file operations.
- **Simple API**: Mimics the interface of `gzip.open()`, making it easy to adopt.
- **Separate Binary and Text Modes**: `AsyncGzipBinaryFile` and `AsyncGzipTextFile` provide clear, type-safe handling of data.
- **Excellent Compression Quality**: Achieves compression ratios nearly identical to the standard `gzip` module.
- **`aiocsv` Integration**: Read and write compressed CSV files effortlessly.

---

## Installation

Install `aiogzip` using pip. To include optional `aiocsv` support, specify the `[csv]` extra.

```bash
# Standard installation
pip install aiogzip

# With aiocsv support
pip install aiogzip[csv]
```

---

## Quickstart

Using `aiogzip` is as simple as using the standard `gzip` module, but with `async`/`await`.

### Writing to a Compressed File

```python
import asyncio
from aiogzip import AsyncGzipFile

async def main():
    # Write binary data
    async with AsyncGzipFile("file.gz", "wb") as f:
        await f.write(b"Hello, async world!")

    # Write text data
    async with AsyncGzipFile("file.txt.gz", "wt") as f:
        await f.write("This is a text file.")

asyncio.run(main())
```

### Reading from a Compressed File

```python
import asyncio
from aiogzip import AsyncGzipFile

async def main():
    # Read the entire file
    async with AsyncGzipFile("file.gz", "rb") as f:
        content = await f.read()
        print(content)

    # Iterate over lines in a text file
    async with AsyncGzipFile("file.txt.gz", "rt") as f:
        async for line in f:
            print(line.strip())

asyncio.run(main())
```

---

## Performance

`aiogzip` is optimized for text-based async workflows and provides excellent performance across different scenarios:

**Text Operations** (where aiogzip excels):

- **2.4x faster** for bulk text read/write operations (33 MB/s vs 13.8 MB/s)
- **2.0x faster** for JSONL processing workflows
- **1.7M lines/sec** for line-by-line iteration
- `async for` and `readline()` have equivalent performance

**Binary Operations** (comparable to standard gzip):

- **1.14x faster** with many small chunks (1.65M chunks/sec) - better overhead handling
- **~50 MB/s** throughput for bulk operations (comparable to gzip's ~52 MB/s)
- **Equivalent performance** for typical 64KB chunked streaming

**Concurrency** (with simulated I/O):

- **1.5x faster** when processing multiple files with I/O delays
- Enables non-blocking concurrent file operations

The key is to match the tool to the task. Use `aiogzip` where its async and text-handling capabilities provide the most significant advantage.

### Async and Concurrent Processing Benefits

`aiogzip` excels in scenarios where you need to process multiple files concurrently or integrate with other async libraries:

- **Concurrent file processing**: Process multiple `.gz` files simultaneously without blocking
- **Async pipeline integration**: Seamlessly works with `aiocsv`, `aiohttp`, and other async libraries
- **Non-blocking I/O**: Allows your application to handle other tasks while file operations are in progress
- **Better resource utilization**: More efficient use of system resources in I/O-bound applications

**Note**: The benefits of async are most visible when there's actual I/O latency (network storage, remote APIs, etc.) or when mixing file operations with other async tasks. For purely local file processing on SSDs, the async overhead may exceed the benefits due to minimal I/O wait times.

### When to Use `aiogzip`

✅ **Recommended for:**

- **Text file processing**: 2.4x performance advantage for text operations
- **Async applications**: Processing CSV, JSONL, or log files in async pipelines
- **Concurrent workflows**: Processing multiple files simultaneously
- **Many small writes**: Better overhead handling (1.65M small chunks/sec)
- **Integration with async libraries**: Works seamlessly with `aiohttp`, `aiocsv`, etc.

### When to Use Standard `gzip`

❌ **Consider standard `gzip` for:**

- **Purely synchronous applications**: No async event loop overhead
- **Simple binary file operations**: Comparable performance (~50 MB/s for both)
- **Memory-constrained environments**: `aiogzip` may use more memory for buffering
- **Seeking/metadata operations**: Not yet supported by `aiogzip`

---

## Limitations

`aiogzip` focuses on the most common file-based read/write operations and does not implement the full API of the standard `gzip` module. Notably, it does not currently support:

- In-memory compression/decompression (e.g., `gzip.compress`/`gzip.decompress`).
- The `seek()` and `tell()` methods for navigating within a file stream.
- Reading or writing gzip headers and metadata like `mtime`.

## Development

This project uses `setuptools` for packaging.

1. **Clone the repository**:

   ```bash
   git clone https://github.com/geoff-davis/aiogzip.git
   cd aiogzip
   ```

2. **Create a virtual environment and install dependencies**:

   ```bash
   python -m venv .venv
   source .venv/bin/activate
   pip install -e ".[csv]"  # Install in editable mode with extras
   pip install -e ".[dev]" # Install dev dependencies
   ```

3. **Run tests**:

   ```bash
   pytest
   ```

## License

This project is licensed under the **MIT License**. See the
[`LICENSE`](https://github.com/geoff-davis/aiogzip/blob/main/LICENSE ) file for details.
