Metadata-Version: 2.1
Name: datamole
Version: 0.1.0
Summary: Dataset versioning and management for ML projects.
Author-email: Anshuman Narayan <anshu.nryn@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/anshumandec94/datamole
Project-URL: Repository, https://github.com/anshumandec94/datamole
Project-URL: Issues, https://github.com/anshumandec94/datamole/issues
Keywords: data,versioning,ml,datasets,dvc-alternative
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: pyyaml>=6.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"

# datamole

Simple data versioning for ML projects. Track, version, and share your datasets with minimal overhead.

## Features
- 🚀 Simple CLI interface (`dtm` command)
- 📦 Version datasets with automatic hashing
- 🏷️ Tag versions for easy reference
- 🔍 Smart lookup: pull by hash, prefix, or tag
- 💾 Pluggable storage backends (local available now; GCS, S3, and Azure planned)
- 🔒 Transaction-safe uploads
- 🤝 Collaboration-friendly with shared storage
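
To make "automatic hashing" concrete, here is a minimal sketch of how a dataset directory could be reduced to a single content hash. It is an illustration only: the function name `hash_directory`, the choice of SHA-256, and the path-plus-content scheme are assumptions, not datamole's documented implementation.

```python
import hashlib
from pathlib import Path


def hash_directory(data_dir: str, chunk_size: int = 1 << 20) -> str:
    """Return a SHA-256 hex digest covering relative paths and file contents.

    Hypothetical helper: datamole's real hashing scheme may differ.
    """
    root = Path(data_dir)
    digest = hashlib.sha256()
    # Sort by POSIX-style relative path so the result does not depend on
    # filesystem traversal order or the host OS path separator.
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    for path in files:
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator between path and content
        with path.open("rb") as fh:
            # Read in chunks so large files are never loaded fully into memory.
            for chunk in iter(lambda: fh.read(chunk_size), b""):
                digest.update(chunk)
        digest.update(b"\0")  # separator between files
    return digest.hexdigest()
```

Under this scheme, any change to a file's name or contents yields a different hash, while re-hashing unchanged data always yields the same one.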

## Installation

```bash
pip install datamole
```

After installation, the `dtm` command is available in the environment where you installed the package.

## Quick Start

```bash
# Configure storage backend (one-time setup)
dtm config --backend local --remote-uri /path/to/shared/storage

# Initialize in your project
cd my-ml-project
dtm init

# Add your data and create a version
dtm add-version -m "Initial dataset" -t v1.0

# List versions
dtm list-versions

# Pull a specific version (by tag, hash, or prefix)
dtm pull v1.0
dtm pull abc123  # by hash prefix
dtm pull latest  # pull current version
```

## CLI Commands

### Setup & Configuration
```bash
# Configure storage backend
dtm config --backend local --remote-uri /path/to/storage

# Initialize project
dtm init [--data-dir data] [--backend local] [--no-pull]
```

### Version Management
```bash
# Create a new version
dtm add-version [-m "message"] [-t tag-name]

# Pull a version
dtm pull [version] [-f]

# List all versions
dtm list-versions

# Show current version
dtm current-version
```
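
The `[version]` argument to `dtm pull` accepts a tag, a full hash, or a hash prefix. The sketch below shows one way such a reference could be resolved. The function `resolve_version`, its arguments, and the precedence order (tag first, then exact hash, then unique prefix) are assumptions for illustration; datamole's actual lookup rules may differ.

```python
from typing import Dict, List


def resolve_version(ref: str, hashes: List[str], tags: Dict[str, str]) -> str:
    """Resolve a tag, full hash, or unique hash prefix to a full version hash.

    Hypothetical helper; the precedence order here is an assumption.
    """
    # 1. A tag name maps directly to the hash it points at.
    if ref in tags:
        return tags[ref]
    # 2. An exact hash needs no further lookup.
    if ref in hashes:
        return ref
    # 3. Otherwise treat the reference as a prefix, which must be unambiguous.
    matches = [h for h in hashes if h.startswith(ref)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise KeyError(f"no version matches {ref!r}")
    raise ValueError(f"ambiguous prefix {ref!r}: matches {len(matches)} versions")
```

Rejecting ambiguous prefixes, rather than picking the first match, avoids silently pulling the wrong dataset when two hashes share leading characters.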

## Python API

```python
from datamole.core import DataMole

# Initialize
dtm = DataMole()
dtm.init(data_dir="data", backend="local")

# Create versions
dtm.add_version(message="Initial dataset", tag="v1.0")

# Pull versions
dtm.pull("v1.0")
dtm.pull("abc123")  # by hash prefix
dtm.pull()  # pull current version
```

## Storage Backends

- **local**: Local filesystem storage
- **gcs**: Google Cloud Storage (coming soon)
- **s3**: AWS S3 (coming soon)
- **azure**: Azure Blob Storage (coming soon)
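
For the local backend, a transaction-safe upload can be approximated by copying into a staging directory and renaming it into place only once the copy has finished. The following is a minimal sketch under that assumption; `upload_version`, the `.staging-` prefix, and the one-directory-per-hash layout are hypothetical and not taken from datamole's source.

```python
import os
import shutil
import tempfile
from pathlib import Path


def upload_version(data_dir: str, remote_root: str, version_hash: str) -> Path:
    """Copy data_dir to remote_root/version_hash without exposing partial copies.

    Hypothetical helper illustrating a stage-then-rename upload.
    """
    remote = Path(remote_root)
    remote.mkdir(parents=True, exist_ok=True)
    final = remote / version_hash
    if final.exists():
        # Content-addressed layout: an existing directory is already complete.
        return final
    # Stage inside remote_root so the final rename stays on one filesystem.
    staging = Path(tempfile.mkdtemp(prefix=".staging-", dir=remote))
    try:
        staged = staging / "data"
        shutil.copytree(data_dir, staged)
        # The version only becomes visible under its final name here.
        os.rename(staged, final)
    finally:
        # Remove the staging directory whether or not the upload succeeded.
        shutil.rmtree(staging, ignore_errors=True)
    return final
```

If the copy fails midway, only the staging directory is affected, so collaborators reading from the shared storage never see a half-written version.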

## Development

```bash
# Clone repository
git clone https://github.com/anshumandec94/datamole.git
cd datamole

# Install in development mode with the "dev" extra
# (uv shown here; plain `pip install -e ".[dev]"` works as well)
uv pip install -e ".[dev]"

# Run tests
pytest
```

## License

MIT


