Metadata-Version: 2.4
Name: thread-downloader
Version: 0.1.0
Summary: Lightweight async downloader with Range detection, multi-part downloads, and flexible assembly modes
Author-email: Michael-YS <21081757+Michael-YS@users.noreply.github.com>
License: MIT
Project-URL: Homepage, https://github.com/Michael-YS/thread-downloader
Project-URL: Documentation, https://github.com/Michael-YS/thread-downloader#readme
Project-URL: Repository, https://github.com/Michael-YS/thread-downloader
Project-URL: Issues, https://github.com/Michael-YS/thread-downloader/issues
Keywords: downloader,async,http,range,concurrent,multi-threaded,asyncio,httpx
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: System :: Networking
Classifier: Typing :: Typed
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: httpx>=0.28.0
Provides-Extra: test
Requires-Dist: pytest>=8.0.0; extra == "test"

# thread-downloader

A lightweight, reusable Python async downloader with intelligent Range header detection, multi-part download support, and flexible assembly modes.

## Features

- **Async-first design**: Pure `asyncio` + `httpx` for modern async I/O
- **Smart Range detection**: Automatically detects server Range support; falls back to single-stream download if unavailable
- **Multi-part concurrent downloads**: Splits files into segments for parallel download (when Range is supported)
- **Flexible assembly modes**:
  - Memory mode (default): Cache all parts in RAM, assemble once, write to disk
  - Temp file mode: Write parts to system temp directory, merge in order (low memory footprint for large files)
- **Reusable instances**: Each `Downloader` instance can handle multiple sequential downloads
- **Global concurrency control**: Class-level semaphore to limit total concurrent downloads across all instances
- **Automatic retry**: Configurable retry policy with exponential backoff
- **Progress tracking**: Optional callback for download progress updates

## Installation

From PyPI:

```bash
pip install thread-downloader
```

From source:

```bash
git clone https://github.com/Michael-YS/thread-downloader.git
cd thread-downloader
pip install -e .
```

## Quick Start

### Basic Usage with Async API

```python
import asyncio
from thread_downloader import Downloader, DownloadConfig, DownloadTask

async def main():
    downloader = Downloader(DownloadConfig(workers=4))
    task = DownloadTask(
        url="https://example.com/file.bin",
        output="downloaded_file.bin"
    )
    result = await downloader.download(task)
    print(f"Downloaded {result.bytes_written} bytes")
    print(f"Used multi-thread: {result.used_multi_thread}")
    print(f"Range supported: {result.range_supported}")

asyncio.run(main())
```

### Function API

```python
import asyncio
from thread_downloader import download_file_async, DownloadTask

async def main():
    task = DownloadTask("https://example.com/file.bin", "output.bin")
    result = await download_file_async(task)
    print(f"Download complete: {result.bytes_written} bytes")

asyncio.run(main())
```

### CLI Usage

```bash
# Basic download
thread-downloader https://example.com/file.bin output.bin

# Advanced options
thread-downloader https://example.com/file.bin output.bin \
  --workers 8 \
  --mem-assemble no \
  --global-limit 4 \
  --timeout 60 \
  --retries 5

# Or via Python module
python -m thread_downloader https://example.com/file.bin output.bin
```

## Configuration

### `DownloadConfig` Options

```python
from thread_downloader import Downloader, DownloadConfig, RetryPolicy

config = DownloadConfig(
    workers=4,              # Parallel segments for multi-part download (default: 4)
    chunk_size=262144,      # Bytes per read/write operation (default: 256 KiB)
    timeout_seconds=30.0,   # HTTP request timeout in seconds (default: 30.0)
    retry_policy=RetryPolicy(
        max_attempts=3,            # Max retry attempts (default: 3)
        backoff_seconds=0.5,       # Initial backoff duration (default: 0.5 s)
        max_backoff_seconds=4.0,   # Backoff cap (default: 4.0 s)
    ),
    mem_assemble=True,      # True = assemble in memory, False = temp files (default: True)
)
)

downloader = Downloader(config)
```
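
The three `RetryPolicy` fields suggest a capped exponential backoff. The sketch below shows one plausible delay schedule and retry loop. It assumes that `max_attempts` counts the first try, that the delay doubles after each failure with no jitter, and that `ConnectionError`/`TimeoutError` count as transient. The library's exact formula may differ, and `backoff_delays` and `with_retries` are hypothetical helpers.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


def backoff_delays(
    max_attempts: int = 3,
    backoff_seconds: float = 0.5,
    max_backoff_seconds: float = 4.0,
) -> list[float]:
    """Sleep durations between attempts (max_attempts - 1 of them).

    Assumption: the delay doubles after each failure and is capped.
    """
    return [
        min(backoff_seconds * 2**retry, max_backoff_seconds)
        for retry in range(max_attempts - 1)
    ]


async def with_retries(
    op: Callable[[], Awaitable[T]],
    delays: list[float],
    transient: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Run `op`, sleeping delays[i] after the i-th transient failure."""
    for delay in delays:
        try:
            return await op()
        except transient:
            await asyncio.sleep(delay)
    # Final attempt: any exception now propagates to the caller.
    return await op()


print(backoff_delays())                # [0.5, 1.0]
print(backoff_delays(max_attempts=6))  # [0.5, 1.0, 2.0, 4.0, 4.0]
```

With the defaults, a download would be tried three times in total, waiting 0.5 s and then 1.0 s between attempts.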

### Assembly Modes

- **`mem_assemble=True` (default)**
  - All parts are buffered in memory
  - Single write to disk after all parts complete
  - Best for: Small to medium files on systems with spare RAM
  - Pros: Fewer disk I/O operations, simple cleanup
  - Cons: Memory usage grows with file size

- **`mem_assemble=False`**
  - Each part is written to the system temp directory (e.g., `/tmp` on Linux)
  - Parts are read back in order and merged into the final file
  - Best for: Large files, memory-constrained systems
  - Pros: Constant memory usage regardless of file size
  - Cons: Additional disk I/O for temp files
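
The trade-off above comes down to where each part lives before the final write. The following is a minimal, standard-library-only sketch of both strategies. It assumes each part arrives as an iterable of byte chunks, listed in segment order. `assemble_in_memory` and `assemble_from_temp_files` are hypothetical helpers, not the library's internals.

```python
import os
import shutil
import tempfile
from collections.abc import Iterable


def assemble_in_memory(parts: list[Iterable[bytes]], output: str) -> int:
    """Memory mode: buffer every part in RAM, then write the result once."""
    data = b"".join(b"".join(chunks) for chunks in parts)
    with open(output, "wb") as out:
        out.write(data)
    return len(data)


def assemble_from_temp_files(parts: list[Iterable[bytes]], output: str) -> int:
    """Temp-file mode: spill each part to disk, then merge the files in order."""
    temp_paths: list[str] = []
    try:
        for chunks in parts:
            # delete=False so the file survives close() and can be reopened below
            tmp = tempfile.NamedTemporaryFile(delete=False)
            temp_paths.append(tmp.name)
            with tmp:
                for chunk in chunks:
                    tmp.write(chunk)
        written = 0
        with open(output, "wb") as out:
            for path in temp_paths:  # list order == segment order
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, out)  # copies with a fixed-size buffer
                written += os.path.getsize(path)
        return written
    finally:
        # Clean up temp files even if writing or merging failed
        for path in temp_paths:
            os.remove(path)
```

During the merge, the temp-file variant only holds `shutil.copyfileobj`'s fixed-size buffer in memory. This is why its footprint does not grow with file size.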

### Global Concurrency Control

Control the maximum number of simultaneous downloads across all `Downloader` instances:

```python
from thread_downloader import Downloader

# Limit to 2 concurrent downloads globally
Downloader.set_global_concurrency_limit(2)

downloader1 = Downloader()
downloader2 = Downloader()

# Both instances will respect the global limit
```

## Advanced Examples

### Progress Tracking

```python
import asyncio
from thread_downloader import Downloader, DownloadTask, DownloadProgress

def on_progress(progress: DownloadProgress) -> None:
    print(
        f"Downloaded: {progress.downloaded_bytes}/{progress.total_bytes} bytes "
        f"({progress.percent:.1f}%) - Mode: {progress.mode} - Workers: {progress.worker_count}"
    )

async def main():
    downloader = Downloader()
    task = DownloadTask("https://example.com/large_file.bin", "output.bin")
    result = await downloader.download(
        task,
        progress_callback=on_progress,
    )
    print(f"Complete: {result.bytes_written} bytes")

asyncio.run(main())
```
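
If the callback fires once per chunk, printing every update can flood the terminal. The sketch below shows a minimal throttling wrapper that forwards an update only when progress crosses the next step boundary. It assumes only the `percent` attribute used above and that `percent` is always a number. How the library reports `percent` when `total_bytes` is unknown is not covered here. `throttle_progress` and `FakeProgress` are hypothetical, and the step size is arbitrary.

```python
from collections.abc import Callable
from dataclasses import dataclass
from typing import Any


def throttle_progress(callback: Callable[[Any], None], step: float = 10.0) -> Callable[[Any], None]:
    """Wrap `callback` so it fires only when `percent` reaches the next multiple of `step`."""
    next_threshold = 0.0

    def wrapper(progress: Any) -> None:
        nonlocal next_threshold
        if progress.percent >= next_threshold:
            callback(progress)
            # Jump past every boundary this update already crossed.
            next_threshold = (progress.percent // step + 1) * step

    return wrapper


@dataclass
class FakeProgress:
    """Hypothetical stand-in exposing only the `percent` field."""
    percent: float


seen: list[float] = []
report = throttle_progress(lambda p: seen.append(p.percent), step=25.0)
for pct in [0.0, 5.0, 24.9, 25.0, 30.0, 60.0, 99.0, 100.0]:
    report(FakeProgress(pct))
print(seen)  # [0.0, 25.0, 60.0, 99.0, 100.0]
```

With the documented API, the wrapper would be passed as `progress_callback=throttle_progress(on_progress)`.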

### Multi-threaded Scenario with Instance Reuse

```python
import asyncio
import threading
from thread_downloader import Downloader, DownloadConfig, DownloadTask

Downloader.set_global_concurrency_limit(4)

async def download_files(urls: list[str], prefix: str) -> None:
    downloader = Downloader(DownloadConfig(workers=4))
    for idx, url in enumerate(urls):
        task = DownloadTask(url, f"{prefix}_{idx}.bin")
        result = await downloader.download(task)
        print(f"Downloaded file {idx}: {result.bytes_written} bytes")

def worker(urls: list[str], prefix: str) -> None:
    asyncio.run(download_files(urls, prefix))

# Spawn multiple threads, each with its own Downloader instance
threads = [
    threading.Thread(
        target=worker,
        args=(["https://example.com/file1.bin", "https://example.com/file2.bin"], "thread_a")
    ),
    threading.Thread(
        target=worker,
        args=(["https://example.com/file3.bin", "https://example.com/file4.bin"], "thread_b")
    ),
]

for t in threads:
    t.start()
for t in threads:
    t.join()

print("All downloads complete")
```

### Large File Download with Temp Mode

```python
import asyncio
from thread_downloader import Downloader, DownloadConfig, DownloadTask

async def main():
    # Use temp files for a 5GB file to avoid memory bloat
    config = DownloadConfig(
        workers=8,
        mem_assemble=False,  # Use temp files
        timeout_seconds=60,
    )
    downloader = Downloader(config)
    task = DownloadTask(
        "https://example.com/large_5gb_file.iso",
        "downloaded.iso"
    )
    result = await downloader.download(task)
    print(f"Downloaded: {result.bytes_written} bytes in temp mode")

asyncio.run(main())
```

## Design Constraints

- Each `Downloader` instance handles one download at a time (instance-level mutual exclusion)
- Downloading several files concurrently requires one `Downloader` instance per concurrent download; a single instance handles multiple files through sequential calls
- Global concurrency is controlled by a class-level semaphore, independent of instance count
- Thread-safety: Safe for use in multi-threaded environments where each thread has its own `Downloader` instance
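
Instance-level mutual exclusion of this kind is commonly built with an `asyncio.Lock` held for the duration of each download. Here is a hypothetical sketch; `SerialDownloader` is not part of the package. It assumes a second call on a busy instance waits for the first to finish rather than raising, and the library may behave differently.

```python
import asyncio


class SerialDownloader:
    """Hypothetical stand-in for an instance that allows one download at a time."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()  # per-instance, unlike a class-level semaphore
        self.log: list[str] = []

    async def download(self, name: str) -> None:
        async with self._lock:  # a concurrent call on this instance waits here
            self.log.append(f"start {name}")
            await asyncio.sleep(0.01)  # stand-in for the network transfer
            self.log.append(f"end {name}")


async def main() -> list[str]:
    downloader = SerialDownloader()
    # Both calls are scheduled at once, but the lock serializes them.
    await asyncio.gather(downloader.download("a"), downloader.download("b"))
    return downloader.log


log = asyncio.run(main())
print(log)  # ['start a', 'end a', 'start b', 'end b']
```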

## Testing

Run the test suite:

```bash
pip install -e ".[test]"
pytest -v
```

Tests cover:
- Instance reuse and sequential downloads
- Multi-instance parallel downloads with global limit enforcement
- Range header detection and automatic fallback
- Retry mechanism for transient failures
- Both memory and temp file assembly modes

## CLI Reference

```text
usage: thread-downloader [-h] [--workers WORKERS] [--chunk-size CHUNK_SIZE]
                          [--timeout TIMEOUT] [--retries RETRIES]
                          [--global-limit GLOBAL_LIMIT]
                          [--mem-assemble {yes,no}]
                          url output

Async downloader with Range fallback

positional arguments:
  url                   File URL to download
  output                Output file path

options:
  -h, --help            show this help message and exit
  --workers WORKERS     Number of parallel segments (default: 4)
  --chunk-size CHUNK_SIZE
                        Bytes per chunk (default: 262144)
  --timeout TIMEOUT     HTTP timeout in seconds (default: 30.0)
  --retries RETRIES     Max retry attempts (default: 3)
  --global-limit GLOBAL_LIMIT
                        Global concurrency limit (default: 4)
  --mem-assemble {yes,no}
                        Assembly mode: yes (memory) or no (temp files)
                        (default: yes)
```
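
The usage text above corresponds to a standard `argparse` setup. The sketch below reconstructs it to show how the CLI's `--mem-assemble yes|no` string plausibly maps onto the boolean `mem_assemble` used by `DownloadConfig`. It is not the package's actual entry point: `build_parser` and `parse_cli` are hypothetical names. Mapping `--retries` to `max_attempts` and `--timeout` to `timeout_seconds` is also an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="thread-downloader",
        description="Async downloader with Range fallback",
    )
    parser.add_argument("url", help="File URL to download")
    parser.add_argument("output", help="Output file path")
    parser.add_argument("--workers", type=int, default=4, help="Number of parallel segments (default: 4)")
    parser.add_argument("--chunk-size", type=int, default=262144, help="Bytes per chunk (default: 262144)")
    parser.add_argument("--timeout", type=float, default=30.0, help="HTTP timeout in seconds (default: 30.0)")
    parser.add_argument("--retries", type=int, default=3, help="Max retry attempts (default: 3)")
    parser.add_argument("--global-limit", type=int, default=4, help="Global concurrency limit (default: 4)")
    parser.add_argument(
        "--mem-assemble",
        choices=["yes", "no"],
        default="yes",
        help="Assembly mode: yes (memory) or no (temp files) (default: yes)",
    )
    return parser


def parse_cli(argv: list[str]) -> dict[str, object]:
    """Parse argv and translate CLI names into (assumed) config field names."""
    args = build_parser().parse_args(argv)
    return {
        "url": args.url,
        "output": args.output,
        "workers": args.workers,
        "chunk_size": args.chunk_size,
        "timeout_seconds": args.timeout,
        "max_attempts": args.retries,
        "global_limit": args.global_limit,
        "mem_assemble": args.mem_assemble == "yes",  # CLI string -> bool
    }


opts = parse_cli(["https://example.com/file.bin", "output.bin", "--workers", "8", "--mem-assemble", "no"])
print(opts["workers"], opts["mem_assemble"])  # 8 False
```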

## How It Works

1. **Range Detection**:
   - Sends a HEAD request and checks the `Accept-Ranges` header
   - If the header is missing, sends a probe request with `Range: bytes=0-0` and checks for a `206 Partial Content` response
   - The result determines whether the server supports Range requests, and therefore whether a multi-part download is possible

2. **Multi-part Download** (when Range is supported):
   - Splits the file into segments based on `Content-Length` and the `workers` count
   - Creates an async task for each segment, respecting the global semaphore
   - Stores parts in memory or in temp files, depending on `mem_assemble`
   - After all parts complete, assembles them into the final file

3. **Single-stream Fallback** (when Range is not supported):
   - Falls back to a standard GET request
   - Streams the response directly to the output file

4. **Retry & Error Handling**:
   - Exponential backoff on transient errors
   - Configurable max attempts
   - Exceptions propagate to the caller on permanent failures

## Requirements

- Python 3.11+
- `httpx >= 0.28.0`

## License

MIT License. See [LICENSE](LICENSE) file for details.

## Contributing

Contributions are welcome! Feel free to open a pull request.

## Changelog

### v0.1.0 (2026-03-18)

- Initial release
- Async-first download engine with httpx
- Range header detection and fallback
- Multi-part concurrent downloads
- Global and instance-level concurrency control
- Memory and temp file assembly modes
- Full CLI support
- Comprehensive test suite
