Metadata-Version: 2.3
Name: avex
Version: 0.5.0a1
Summary: A comprehensive Python-based system for training, evaluating, and analyzing audio representation learning models with support for both supervised and self-supervised learning paradigms
Keywords: audio,machine-learning,avex,self-supervised,bioacoustics
Author: Earth Species
Author-email: Earth Species <contact@earthspecies.org>
License: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Multimedia :: Sound/Audio :: Analysis
Requires-Dist: librosa>=0.11.0
Requires-Dist: numpy>=1.26.4
Requires-Dist: omegaconf>=2.3.0
Requires-Dist: pydantic>=2.11.3
Requires-Dist: pyyaml>=6.0.2
Requires-Dist: soundfile>=0.13.1
Requires-Dist: birdnetlib>=0.18.0
Requires-Dist: timm>=1.0.15
Requires-Dist: torch>=2.5.0
Requires-Dist: torchaudio>=2.2.0
Requires-Dist: torchvision>=0.17.0
Requires-Dist: tqdm>=4.67.1
Requires-Dist: huggingface-hub>=0.36.0
Requires-Dist: transformers>=4.0.0
Requires-Dist: h5py>=3.13.0
Requires-Dist: tensorflow>=2.17.0
Requires-Dist: tensorflow-hub>=0.16.1
Requires-Dist: einops>=0.8.1
Requires-Dist: pydantic-settings>=2.9.1
Requires-Dist: fsspec>=2024.2.0
Requires-Dist: gcsfs>=2024.2.0
Requires-Dist: s3fs>=2024.2.0
Requires-Dist: pandas>=2.2.3
Requires-Dist: pytorch-lightning>=2.5.1.post0 ; extra == 'dev'
Requires-Dist: mlflow>=2.22.0 ; extra == 'dev'
Requires-Dist: wandb>=0.21.0 ; extra == 'dev'
Requires-Dist: esp-sweep>=0.1.0 ; extra == 'dev'
Requires-Dist: esp-data>=1.5.0 ; extra == 'dev'
Requires-Dist: gradio>=4.0.0 ; extra == 'dev'
Requires-Dist: gradio-leaderboard>=0.0.13 ; extra == 'dev'
Requires-Python: >=3.10, <3.13
Project-URL: Homepage, https://github.com/earthspecies/avex
Project-URL: Repository, https://github.com/earthspecies/avex
Project-URL: Documentation, https://github.com/earthspecies/avex#readme
Project-URL: Bug Tracker, https://github.com/earthspecies/avex/issues
Provides-Extra: dev
Description-Content-Type: text/markdown

# avex - Animal Vocalization Encoder Library

![CI status](https://github.com/earthspecies/avex/actions/workflows/pythonapp.yml/badge.svg?branch=main)
![Pre-commit status](https://github.com/earthspecies/avex/actions/workflows/pre-commit.yml/badge.svg?branch=main)

An API for model loading and inference, and a Python-based system for training and evaluating bioacoustics representation learning models.

## Description

The Animal Vocalization Encoder library (avex) provides a unified interface for working with pre-trained bioacoustics representation learning models, with support for:

- **Model Loading**: Load pre-trained models with checkpoints and class mappings
- **Embedding Extraction**: Extract features from audio for downstream tasks
- **Probe System**: Flexible probe heads (linear, MLP, LSTM, attention, transformer) for transfer learning
- **Training & Evaluation**: Scripts for supervised learning experiments
- **Plugin Architecture**: Register and use custom models seamlessly

## Installation

### Prerequisites

- Python 3.10, 3.11, or 3.12
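
To confirm that an interpreter falls inside the supported range before installing, a check along these lines may help. This is a minimal sketch: `is_supported` is a hypothetical helper written for illustration and is not part of avex.

```python
import sys

# Supported range taken from the prerequisites above: 3.10 <= version < 3.13.
MIN_VERSION = (3, 10)
MAX_VERSION_EXCLUSIVE = (3, 13)


def is_supported(version: tuple[int, int]) -> bool:
    """Return True if (major, minor) falls in the supported range."""
    return MIN_VERSION <= version < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = sys.version_info[:2]
    status = "supported" if is_supported(current) else "not supported"
    print(f"Python {current[0]}.{current[1]} is {status}")
```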

### Install with pip

```bash
pip install avex
```

### Install with uv

```bash
uv add avex
```

For development installation with training/evaluation tools, see the [Contributing guide](CONTRIBUTING.md).

## Quick Start

```python
import torch
import librosa
from avex import load_model, list_models

# List available models
print(list_models().keys())

# Load a pre-trained model
model = load_model("esp_aves2_sl_beats_all", device="cpu")

# Load and preprocess audio (BEATs expects 16kHz)
audio, sr = librosa.load("your_audio.wav", sr=16000)
audio_tensor = torch.tensor(audio).unsqueeze(0)  # Shape: (1, num_samples)

# Run inference
with torch.no_grad():
    logits = model(audio_tensor)
    predicted_class = logits.argmax(dim=-1).item()

# Get human-readable label
if model.label_mapping:
    label = model.label_mapping.get(str(predicted_class), predicted_class)
    print(f"Predicted: {label}")
```
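
The example above reports only the single most likely class. When a ranked list with probabilities is more useful, the logits can be passed through a softmax and sorted. The sketch below is written in plain Python so it runs without any dependencies; `top_k_labels`, the example logits, and the example mapping are hypothetical, and the mapping is assumed to be keyed by string indices as in the lookup above.

```python
import math


def top_k_labels(logits, label_mapping, k=3):
    """Return the k most likely (label, probability) pairs for one clip."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

    # Fall back to the raw index when no label is known, as in the Quick Start.
    return [(label_mapping.get(str(i), i), probs[i]) for i in ranked]


# Hypothetical logits and mapping, for illustration only.
example_logits = [0.2, 2.5, -1.0, 1.1]
example_mapping = {"0": "frog", "1": "owl", "2": "bat"}

for label, prob in top_k_labels(example_logits, example_mapping, k=2):
    print(f"{label}: {prob:.3f}")
```

With tensors, `torch.softmax(logits, dim=-1)` followed by `torch.topk` gives the same ranking; for a single clip, `logits[0].tolist()` produces the list this helper expects.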

### Embedding Extraction

```python
# Load for embedding extraction (no classifier head)
# Reuses `torch`, `load_model`, and `audio_tensor` from the Quick Start above
model = load_model("esp_aves2_sl_beats_all", return_features_only=True, device="cpu")

with torch.no_grad():
    embeddings = model(audio_tensor)
    # Shape: (batch, time_steps, 768) for BEATs

# Pool to get fixed-size embedding
embedding = embeddings.mean(dim=1)  # Shape: (batch, 768)
```
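
A common downstream use of pooled embeddings is comparing two clips by cosine similarity. The sketch below shows the arithmetic in plain Python on toy vectors so that it runs without any dependencies. `mean_pool` and `cosine_similarity` are hypothetical helpers rather than avex functions, and the 3-dimensional frames stand in for real `(time_steps, 768)` outputs.

```python
import math


def mean_pool(frames):
    """Average a (time_steps, dim) list of frame vectors into one vector."""
    num_frames = len(frames)
    return [sum(column) / num_frames for column in zip(*frames)]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional frames standing in for real (time_steps, 768) embeddings.
clip_a = [[1.0, 0.0, 0.0], [1.0, 2.0, 0.0]]
clip_b = [[2.0, 2.0, 0.0], [2.0, 2.0, 0.0]]

similarity = cosine_similarity(mean_pool(clip_a), mean_pool(clip_b))
print(f"cosine similarity: {similarity:.3f}")
```

On real tensors, `torch.nn.functional.cosine_similarity(emb_a, emb_b, dim=-1)` computes the same quantity for a whole batch at once.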

### Transfer Learning with Probes

```python
from avex import load_model
from avex.models.probes import build_probe_from_config
from avex.configs import ProbeConfig

# Load backbone for feature extraction
base = load_model("esp_aves2_sl_beats_all", return_features_only=True, device="cpu")

# Define a probe head for your task
probe_config = ProbeConfig(
    probe_type="linear",
    target_layers=["last_layer"],
    aggregation="mean",
    freeze_backbone=True,
    online_training=True,
)

probe = build_probe_from_config(
    probe_config=probe_config,
    base_model=base,
    num_classes=10,  # Your number of classes
    device="cpu",
)
```

## Documentation

**Full documentation**: [docs/index.md](docs/index.md)

### Core Documentation

- **[API Reference](docs/api_reference.md)** - Complete API documentation for model loading, registry, and management functions
- **[Architecture](docs/api_architecture.md)** - Framework architecture, core components, and plugin system
- **[Supported Models](docs/supported_models.md)** - List of supported models and their configurations
- **[Configuration](docs/configuration.md)** - ModelSpec parameters, audio requirements, and configuration options

### Usage Guides

- **[Training and Evaluation](docs/training_evaluation.md)** - Guide to training and evaluating models
- **[Embedding Extraction](docs/embedding_extraction.md)** - Working with feature representations and embeddings
- **[Examples](docs/examples.md)** - Comprehensive examples and use cases

### Advanced Topics

- **[Probe System](docs/probe_system.md)** - Understanding and using probes for transfer learning
- **[API Probes](docs/api_probes.md)** - API reference for probe-related functionality
- **[Custom Model Registration](docs/custom_model_registration.md)** - Guide on registering custom model classes and loading pre-trained models

**Examples**: See the [`examples/`](examples/) directory:

- `00_quick_start.py` - Basic model loading
- `01_basic_model_loading.py` - Loading models with different configurations
- `02_checkpoint_loading.py` - Working with checkpoints
- `03_custom_model_registration.py` - Custom model registration
- `04_training_and_evaluation.py` - Training and evaluation examples
- `05_embedding_extraction.py` - Feature extraction
- `06_classifier_head_loading.py` - Classifier head behavior

## Supported Models

The framework supports the following audio representation learning models:

- **EfficientNet** - EfficientNet-based models for audio classification
- **BEATs** - BEATs transformer models for audio representation learning
- **EAT** - Efficient Audio Transformer models
- **AVES** - AVES model for bioacoustics
- **BirdMAE** - BirdMAE masked autoencoder for bioacoustic representation learning
- **ATST** - Audio Teacher-Student Transformer
- **ResNet** - ResNet models (ResNet18, ResNet50, ResNet152)
- **CLIP** - CLIP-style contrastive language-audio pretraining models
- **BirdNet** - BirdNet models for bioacoustic classification
- **Perch** - Perch models for bioacoustics
- **SurfPerch** - SurfPerch models

See [Supported Models](docs/supported_models.md) for detailed information and configuration examples.

## Supported Probes

The framework provides flexible probe heads for transfer learning:

- **Linear** - Simple linear classifier (fastest, most memory-efficient)
- **MLP** - Multi-layer perceptron with configurable hidden layers
- **LSTM** - Long Short-Term Memory network for sequence modeling
- **Attention** - Self-attention mechanism for sequence modeling
- **Transformer** - Full transformer encoder architecture

Probes can be trained:

- **Online**: End-to-end with the backbone (raw audio input)
- **Offline**: On pre-computed embeddings

See [Probe System](docs/probe_system.md) and [API Probes](docs/api_probes.md) for detailed documentation.

## Citing

If you use this framework in your research, please cite:

```bibtex
@article{miron2025matters,
  title={What Matters for Bioacoustic Encoding},
  author={Miron, Marius and Robinson, David and Alizadeh, Milad and Gilsenan-McMahon, Ellen and Narula, Gagan and Pietquin, Olivier and Geist, Matthieu and Chemla, Emmanuel and Cusimano, Maddie and Effenberger, Felix and others},
  journal={arXiv preprint arXiv:2508.11845},
  year={2025}
}
```

## Contributing

We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for:

- Development setup
- Running tests
- Code style guidelines
- Adding new functionality
- Pull request process

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Built on top of PyTorch
- Integrates with various pre-trained audio models
