Metadata-Version: 2.4
Name: cascache-lib
Version: 0.1.1
Summary: Content-addressable storage (CAS) caching library for Python with local and remote backend support
Project-URL: Homepage, https://gitlab.com/cascascade/cascache_lib
Project-URL: Repository, https://gitlab.com/cascascade/cascache_lib
Project-URL: Issues, https://gitlab.com/cascascade/cascache_lib/-/issues
Project-URL: Documentation, https://gitlab.com/cascascade/cascache_lib/-/blob/main/README.md
Author-email: ladidadida <stefan@dalada.de>
License: MIT
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Build Tools
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: aiofiles>=23.0.0
Requires-Dist: anyio>=4.0.0
Requires-Dist: grpcio>=1.64.0
Requires-Dist: protobuf>=5.26.0
Requires-Dist: pydantic>=2.0.0
Provides-Extra: dev
Requires-Dist: build>=1.0; extra == 'dev'
Requires-Dist: grpcio-tools>=1.64.0; extra == 'dev'
Requires-Dist: pyright>=1.1.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest-cov>=7.0.0; extra == 'dev'
Requires-Dist: pytest>=9.0.0; extra == 'dev'
Requires-Dist: ruff>=0.3.0; extra == 'dev'
Requires-Dist: twine>=4.0; extra == 'dev'
Description-Content-Type: text/markdown

# cascache_lib

**Content-addressable storage (CAS) caching library for Python**

`cascache_lib` brings content-addressed caching to Python applications. It supports local filesystem caching and remote CAS backends over gRPC, which suits build systems, CI/CD pipelines, and other workflows that repeat expensive steps on unchanged inputs.

## Features

- 🚀 **Multiple Backends**: Local filesystem, remote CAS (cascache server), or hybrid (both)
- 🔐 **Content-Addressed**: Cache keys based on SHA256 hashes of inputs
- ⚡ **Async/Await**: Non-blocking I/O with asyncio support
- 🔄 **Automatic Retry**: Exponential backoff for transient network errors
- 📦 **Compression**: Efficient tar.gz compression for cached artifacts
- 🛡️ **Graceful Degradation**: Automatic fallback to local cache when remote unavailable
- 🎯 **Type-Safe**: Full type hints with Python 3.11+ support
- 🧪 **Well-Tested**: Comprehensive test suite with >85% coverage

## Installation

```bash
# From PyPI (once published)
pip install cascache-lib

# From source
pip install git+https://gitlab.com/cascascade/cascache_lib.git

# With development dependencies
pip install "cascache-lib[dev]"
```

## Quick Start

### Local Caching

```python
import asyncio
from pathlib import Path

from cascache_lib import LocalCache, compute_cache_key


async def main() -> None:
    # Create a local cache
    cache = LocalCache(Path(".cache"))

    # Compute cache key from inputs
    cache_key = compute_cache_key(
        command="python build.py",
        inputs=[Path("src/main.py"), Path("pyproject.toml")],
        env={"PYTHON_VERSION": "3.13"},
    )

    # Check if artifacts are cached
    if await cache.exists(cache_key):
        print("Cache hit!")
        await cache.get(cache_key, [Path("dist/")])
    else:
        print("Cache miss - building...")
        # Run your build process here
        # ...
        # Cache the outputs
        await cache.put(cache_key, [Path("dist/")])


# The cache methods are coroutines, so they need a running event loop
asyncio.run(main())
```

### Remote CAS Caching

```python
import asyncio
from pathlib import Path

from cascache_lib import RemoteCache, compute_cache_key


async def main() -> None:
    # Connect to cascache server
    cache = RemoteCache(
        cas_url="grpc://cache.example.com:50051",
        token="your-auth-token",
        timeout=30.0,
        max_retries=3,
    )

    # Use exactly like LocalCache
    cache_key = compute_cache_key("make build", [Path("src/")])
    if not await cache.exists(cache_key):
        # Build and cache
        await cache.put(cache_key, [Path("build/")])


asyncio.run(main())
```

### Hybrid Caching (Recommended)

```python
import asyncio
from pathlib import Path

from cascache_lib import HybridCache, LocalCache, RemoteCache, compute_cache_key


async def main() -> None:
    # Create hybrid cache: local + remote
    local = LocalCache(Path(".cache/local"))
    remote = RemoteCache("grpc://cache.example.com:50051", token="...")

    cache = HybridCache(
        local_cache=local,
        remote_cache=remote,
        auto_upload=True,  # Automatically sync to remote
    )

    # Automatic behavior:
    # - get(): Check local first (fast), then remote, populate local on hit
    # - put(): Store in local AND upload to remote
    # - Graceful fallback to local-only on remote errors

    cache_key = compute_cache_key("cargo build --release", [Path("src/")])
    if await cache.get(cache_key, [Path("target/release/")]):
        print("Restored from cache (local or remote)")
    else:
        # Build...
        await cache.put(cache_key, [Path("target/release/")])


asyncio.run(main())
```
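
The lookup order listed in the comments above (local first, then remote, populate local on a hit, fall back to local-only when the remote fails) can be sketched with two in-memory dictionaries standing in for the tiers. This only illustrates the read-through pattern and is not the library's implementation; `TwoTierStore` and its methods are hypothetical names.

```python
from typing import Optional


class TwoTierStore:
    """Hypothetical two-tier store: a fast local dict in front of a remote dict."""

    def __init__(self, remote_available: bool = True) -> None:
        self.local: dict[str, bytes] = {}
        self.remote: dict[str, bytes] = {}
        self.remote_available = remote_available

    def _remote_get(self, key: str) -> Optional[bytes]:
        # Simulate a network failure when the remote tier is down
        if not self.remote_available:
            raise ConnectionError("remote unreachable")
        return self.remote.get(key)

    def get(self, key: str) -> Optional[bytes]:
        # 1. Check the local tier first
        if key in self.local:
            return self.local[key]
        # 2. Ask the remote tier; treat a remote failure as a plain miss
        try:
            value = self._remote_get(key)
        except ConnectionError:
            return None
        # 3. Populate the local tier so the next lookup stays local
        if value is not None:
            self.local[key] = value
        return value

    def put(self, key: str, value: bytes) -> None:
        # Always store locally; upload only when the remote is reachable
        self.local[key] = value
        if self.remote_available:
            self.remote[key] = value


store = TwoTierStore()
store.remote["abc"] = b"artifact"
first = store.get("abc")                 # served by remote, copied into local
now_local = "abc" in store.local
```

After the first `get()`, the entry lives in the local tier, so a second lookup would not touch the remote at all.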

### Configuration-Based Setup

```python
from cascache_lib import create_cache
from cascache_lib.config import CacheConfig

# Define configuration (or load from YAML/JSON)
config = CacheConfig(
    local={
        "enabled": True,
        "path": ".cache/cascache",
    },
    remote={
        "enabled": True,
        "url": "grpc://localhost:50051",
        "token_file": "~/.cache/cascache/token",
        "upload": True,
        "download": True,
        "timeout": 30.0,
        "max_retries": 3,
    },
)

# Create cache from config
cache = create_cache(config)

# Use the cache (automatically hybrid if both local and remote enabled)
```

## Use Cases

### Build Systems

```python
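import subprocess
from pathlib import Path

from cascache_lib import compute_cache_key

# `cache` is any backend from Quick Start; run this inside an async function

# Cache compiled artifacts
cache_key = compute_cache_key(
    command="gcc -o myapp main.c",
    inputs=[Path("main.c"), Path("config.h")],
)

if not await cache.exists(cache_key):
    # check=True raises on a failed compile, so a broken build is never cached
    subprocess.run(["gcc", "-o", "myapp", "main.c"], check=True)
    await cache.put(cache_key, [Path("myapp")])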
```

### CI/CD Pipelines

```python
import os
from pathlib import Path

from cascache_lib import HybridCache, LocalCache, RemoteCache, compute_cache_key

# Share build artifacts across CI jobs
cache = HybridCache(
    local_cache=LocalCache(Path("/tmp/ci-cache")),
    remote_cache=RemoteCache(
        cas_url=os.environ["CACHE_SERVER_URL"],
        token=os.environ["CACHE_TOKEN"],
    ),
)

# First job caches, subsequent jobs restore (run inside an async function)
cache_key = compute_cache_key("npm run build", [Path("package.json"), Path("src/")])
if await cache.get(cache_key, [Path("dist/")]):
    print("Skipped build - restored from cache")
```

### Test Frameworks

```python
# Cache test fixtures or test results
cache_key = compute_cache_key(
    command="pytest tests/",
    inputs=expand_globs(["tests/**/*.py", "src/**/*.py"]),
)

# Cache test database fixtures
await cache.put(cache_key, [Path("tests/fixtures/test.db")])
```

## API Reference

### CacheBackend (Abstract Base)

All cache implementations inherit from `CacheBackend`:

```python
class CacheBackend(ABC):
    async def exists(self, cache_key: str) -> bool: ...
    async def get(self, cache_key: str, output_paths: list[Path]) -> bool: ...
    async def put(self, cache_key: str, output_paths: list[Path]) -> bool: ...
    async def clear(self) -> None: ...
```

### LocalCache

Filesystem-based cache with tar.gz compression:

```python
cache = LocalCache(cache_dir: Path | None = None)
```

- **cache_dir**: Cache directory (default: `.cache`)
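
To illustrate what storing artifacts as tar.gz involves, here is a minimal sketch using the standard-library `tarfile` module. The `pack` and `unpack` helpers are hypothetical and make no claim about how `LocalCache` actually names or lays out its archives.

```python
import tarfile
import tempfile
from pathlib import Path


def pack(archive: Path, paths: list[Path], root: Path) -> None:
    """Write paths into a gzip-compressed tarball, stored relative to root."""
    with tarfile.open(archive, "w:gz") as tar:
        for path in paths:
            # Directories are added recursively
            tar.add(path, arcname=str(path.relative_to(root)))


def unpack(archive: Path, dest: Path) -> None:
    """Extract a tarball created by pack() into dest."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Safer extraction on Python versions that support filters
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "work"
    (work / "dist").mkdir(parents=True)
    (work / "dist" / "app.txt").write_text("built artifact")

    archive = Path(tmp) / "entry.tar.gz"
    pack(archive, [work / "dist"], root=work)

    restored = Path(tmp) / "restored"
    unpack(archive, restored)
    restored_text = (restored / "dist" / "app.txt").read_text()
```

Storing members relative to a root keeps the archive portable: it can be restored under any directory, which is what lets a cache entry created on one machine be unpacked on another.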

### RemoteCache

Remote cache using cascache server:

```python
cache = RemoteCache(
    cas_url: str,
    token: str | None = None,
    timeout: float = 30.0,
    max_retries: int = 3,
    initial_backoff: float = 0.1,
)
```

- **cas_url**: Server URL (format: `grpc://host:port`)
- **token**: Authentication token (optional)
- **timeout**: Request timeout in seconds
- **max_retries**: Max retry attempts for transient errors
- **initial_backoff**: Initial retry delay in seconds
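
As an illustration of how `max_retries` and `initial_backoff` could interact, the sketch below retries a coroutine with a delay that doubles after each failure. The doubling factor, the use of `ConnectionError` as the transient error, and the helper name `retry_with_backoff` are assumptions made for this example; the library's actual retry policy may differ.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    op: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    initial_backoff: float = 0.1,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Call op(); on ConnectionError wait and retry, doubling the delay each time."""
    delay = initial_backoff
    for attempt in range(max_retries + 1):
        try:
            return await op()
        except ConnectionError:
            if attempt == max_retries:
                raise  # retries exhausted, surface the error
            await sleep(delay)
            delay *= 2  # assumed growth factor
    raise AssertionError("unreachable")


async def demo() -> tuple[str, list[float]]:
    calls = 0
    delays: list[float] = []

    async def flaky() -> str:
        # Fails twice, then succeeds
        nonlocal calls
        calls += 1
        if calls < 3:
            raise ConnectionError("transient")
        return "ok"

    async def record_sleep(seconds: float) -> None:
        delays.append(seconds)  # record the delay instead of sleeping

    result = await retry_with_backoff(flaky, sleep=record_sleep)
    return result, delays


result, delays = asyncio.run(demo())
```

Injecting the `sleep` function keeps the demo instantaneous while still showing the delay schedule that would apply with real sleeps.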

### HybridCache

Hybrid cache combining local + remote:

```python
cache = HybridCache(
    local_cache: LocalCache,
    remote_cache: RemoteCache | None = None,
    auto_upload: bool = True,
)
```

- **local_cache**: Local cache instance (required)
- **remote_cache**: Remote cache instance (optional)
- **auto_upload**: Auto-upload to remote on `put()`

Methods:
- `get_stats() -> dict`: Get cache hit/miss statistics
- `reset_stats() -> None`: Reset statistics

### Utility Functions

#### compute_cache_key

Compute deterministic SHA256 hash for cache keys:

```python
cache_key = compute_cache_key(
    command: str,
    inputs: list[Path],
    env: dict[str, str] | None = None,
) -> str
```

Returns 64-character hex string (SHA256 digest).
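
One way such a key could be derived is sketched below with `hashlib`: feed the command, each input file's name and contents in sorted order, and the sorted environment entries into a single SHA-256 digest. This is an illustration under stated assumptions, not the library's algorithm; `sketch_cache_key` is a hypothetical name, and directory inputs are not handled.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def _feed(h, data: bytes) -> None:
    # Length-prefix each field so adjacent fields cannot run together
    h.update(len(data).to_bytes(8, "big"))
    h.update(data)


def sketch_cache_key(
    command: str,
    inputs: list[Path],
    env: Optional[dict[str, str]] = None,
) -> str:
    h = hashlib.sha256()
    _feed(h, command.encode())
    for path in sorted(inputs):  # sorted, so argument order does not matter
        _feed(h, path.name.encode())
        _feed(h, path.read_bytes())
    for name, value in sorted((env or {}).items()):
        _feed(h, name.encode())
        _feed(h, value.encode())
    return h.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "main.py"
    env = {"PYTHON_VERSION": "3.13"}

    src.write_text("print('v1')")
    key_a = sketch_cache_key("python build.py", [src], env)
    key_b = sketch_cache_key("python build.py", [src], env)

    src.write_text("print('v2')")  # change the input contents
    key_c = sketch_cache_key("python build.py", [src], env)
```

Identical inputs give identical keys (`key_a` and `key_b`), while editing the file yields a different key (`key_c`), which is the property that makes a stale artifact unreachable.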

#### expand_globs

Expand glob patterns to file paths:

```python
files = expand_globs(
    patterns: list[str],
    base_dir: Path | None = None,
) -> list[Path]
```

Supports `**` for recursive matching.
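
For a sense of what recursive expansion means here, the sketch below does something comparable with `pathlib.Path.glob`, which also understands `**`. The function name `sketch_expand_globs`, the files-only filter, and the sorted, de-duplicated result are assumptions of this example rather than documented behavior of `expand_globs`.

```python
import tempfile
from pathlib import Path
from typing import Optional


def sketch_expand_globs(
    patterns: list[str],
    base_dir: Optional[Path] = None,
) -> list[Path]:
    base = base_dir or Path.cwd()
    matches: set[Path] = set()
    for pattern in patterns:
        # Path.glob treats ** as "zero or more directories"
        matches.update(p for p in base.glob(pattern) if p.is_file())
    return sorted(matches)  # sorted for a deterministic order


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src" / "pkg").mkdir(parents=True)
    (root / "src" / "app.py").write_text("")
    (root / "src" / "pkg" / "util.py").write_text("")
    (root / "src" / "notes.txt").write_text("")

    found = [
        p.relative_to(root).as_posix()
        for p in sketch_expand_globs(["src/**/*.py"], root)
    ]
```

A deterministic order matters when the result feeds a cache key, since the same set of files should always hash the same way.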

## Configuration

### Pydantic Models

```python
from cascache_lib.config import (
    CacheConfig,
    LocalCacheConfig,
    RemoteCacheConfig,
)

config = CacheConfig(
    local=LocalCacheConfig(
        enabled=True,
        path=".cache",
    ),
    remote=RemoteCacheConfig(
        enabled=True,
        type="cas",
        url="grpc://localhost:50051",
        token_file=None,
        upload=True,
        download=True,
        timeout=30.0,
        max_retries=3,
        initial_backoff=0.1,
    ),
)
```

### YAML Configuration

```yaml
local:
  enabled: true
  path: .cache

remote:
  enabled: true
  type: cas
  url: grpc://cache.example.com:50051
  token_file: ~/.cache/cascache/token
  upload: true
  download: true
  timeout: 30.0
  max_retries: 3
```

Load with:

```python
# Requires PyYAML (pip install pyyaml); it is not a dependency of cascache-lib
import yaml
from cascache_lib.config import CacheConfig

with open("cache-config.yaml") as f:
    config_dict = yaml.safe_load(f)

config = CacheConfig(**config_dict)
```
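
If adding a YAML parser is undesirable, the same dictionary shape can be read from JSON with the standard library alone. The sketch below only parses the text; the field names are copied from the YAML example above, and handing the result to `CacheConfig` is assumed to work the same way as for YAML.

```python
import json

config_text = """
{
  "local": {"enabled": true, "path": ".cache"},
  "remote": {
    "enabled": true,
    "type": "cas",
    "url": "grpc://cache.example.com:50051",
    "timeout": 30.0,
    "max_retries": 3
  }
}
"""

config_dict = json.loads(config_text)

# With cascache_lib installed, the dict would be passed on unchanged:
#   config = CacheConfig(**config_dict)
```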

## Error Handling

The library handles errors gracefully:

- **Network errors**: Automatic retry with exponential backoff
- **Timeout errors**: Configurable timeout per request
- **Authentication errors**: Clear error messages, no retry
- **Remote unavailable**: Automatic fallback to local cache (`HybridCache`)

```python
import grpc

from cascache_lib import RemoteCache

try:
    cache = RemoteCache("grpc://cache.example.com:50051")
    await cache.get(cache_key, outputs)
except grpc.RpcError as e:
    # Handle gRPC errors
    print(f"Cache error: {e}")
```

## Development

### Setup

```bash
# Clone repository
git clone https://gitlab.com/cascascade/cascache_lib.git
cd cascache_lib

# Install with dev dependencies
pip install -e ".[dev]"
```

### Running Tests

```bash
# Run all tests
pytest

# With coverage
pytest --cov=cascache_lib --cov-report=html

# Run specific test file
pytest tests/unit/test_local_cache.py -v
```

### Code Quality

```bash
# Type checking
pyright

# Linting
ruff check src/

# Formatting
ruff format src/
```

### Regenerating Protobuf Code

```bash
python -m grpc_tools.protoc \
    -I src/cascache_lib/api/protos \
    --python_out=src/cascache_lib/api/generated \
    --grpc_python_out=src/cascache_lib/api/generated \
    --pyi_out=src/cascache_lib/api/generated \
    src/cascache_lib/api/protos/*.proto
```

## Architecture

```
cascache_lib/
├── cache/
│   ├── backend.py      # Abstract CacheBackend interface
│   ├── local.py        # LocalCache (filesystem)
│   ├── cas.py          # RemoteCache (remote gRPC)
│   ├── manager.py      # HybridCache (local + remote)
│   ├── hash.py         # Cache key computation
│   └── factory.py      # Factory function
├── api/
│   ├── protos/         # Protobuf definitions (.proto)
│   └── generated/      # Generated Python code
└── config.py           # Pydantic configuration models
```

## Compatibility

- **Python**: 3.11, 3.12, 3.13+
- **Platforms**: Linux, macOS, Windows
- **CAS Server**: Compatible with [cascache](https://gitlab.com/cascascade/cascache)

## Related Projects

- [cascache](https://gitlab.com/cascascade/cascache) - CAS server implementation (Python)
- [cascade](https://gitlab.com/cascascade/cascade) - Workflow orchestration tool using cascache_lib

## License

MIT License - see [LICENSE](LICENSE) file for details.

## Contributing

Contributions welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a merge request

## Support

- **Issues**: https://gitlab.com/cascascade/cascache_lib/-/issues
- **Documentation**: https://gitlab.com/cascascade/cascache_lib/-/blob/main/README.md

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for version history.

---

**Built with ❤️ for better caching**
