Metadata-Version: 2.4
Name: transaudio
Version: 0.1.3
Summary: Transferable Audio Representations via Procedural Generation (Physics-based AudioMAE)
Author-email: TransAudio Authors <author@example.com>
License: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Multimedia :: Sound/Audio :: Analysis
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=1.10.0
Requires-Dist: torchaudio>=0.10.0
Requires-Dist: numpy
Requires-Dist: timm>=0.4.12
Requires-Dist: tqdm
Dynamic: license-file

# TransAudio: Transferable Audio Representations via Procedural Generation

An implementation of the **AudioMAE** model pre-trained entirely on **procedurally generated audio** (physics-based synthesis). This library lets you train audio representations without any real-world data while still achieving strong sim-to-real transfer.

Based on the paper *Transferable Audio Representations via Procedural Generation*.

## Features

* **Physics-based Synthesizer**: Generates an effectively unlimited stream of training data using Additive, FM, and Pulse synthesis with ADSR envelopes and transient bursts.
* **AudioMAE**: A Masked Autoencoder tailored to audio spectrograms.
* **Zero-Data Pre-training**: Train models without downloading massive datasets such as AudioSet.
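
To make the synthesizer idea concrete, here is a minimal sketch of additive synthesis shaped by an ADSR envelope, written with the standard library only. It is an illustration and not the code behind `advanced_physics_synth`: the function names `adsr_envelope` and `additive_tone`, the 16 kHz sample rate, the `1/k` harmonic weighting, and all envelope timings are assumptions chosen for the example.

```python
import math


def adsr_envelope(n_samples, sample_rate, attack=0.01, decay=0.05,
                  sustain=0.7, release=0.1):
    """Piecewise-linear ADSR envelope with values in [0, 1] (illustrative defaults)."""
    a = int(attack * sample_rate)    # attack length in samples
    d = int(decay * sample_rate)     # decay length in samples
    r = int(release * sample_rate)   # release length in samples
    s = max(n_samples - a - d - r, 0)  # remaining samples hold the sustain level
    env = []
    env += [i / max(a, 1) for i in range(a)]                          # 0 -> 1
    env += [1.0 - (1.0 - sustain) * i / max(d, 1) for i in range(d)]  # 1 -> sustain
    env += [sustain] * s                                              # hold
    env += [sustain * (1.0 - i / max(r, 1)) for i in range(r)]        # sustain -> 0
    # Pad or truncate so the envelope always matches the requested length.
    return (env + [0.0] * n_samples)[:n_samples]


def additive_tone(f0_hz=440.0, duration_s=0.5, sample_rate=16000, n_harmonics=5):
    """Sum of harmonics with 1/k amplitudes, shaped by the ADSR envelope."""
    n = int(duration_s * sample_rate)
    env = adsr_envelope(n, sample_rate)
    # Normalise by the sum of harmonic weights so the output stays within [-1, 1].
    norm = sum(1.0 / k for k in range(1, n_harmonics + 1))
    wave = []
    for i in range(n):
        t = i / sample_rate
        partials = sum(math.sin(2 * math.pi * k * f0_hz * t) / k
                       for k in range(1, n_harmonics + 1))
        wave.append(env[i] * partials / norm)
    return wave


if __name__ == "__main__":
    tone = additive_tone(f0_hz=440.0)
    print(f"{len(tone)} samples, peak amplitude {max(abs(x) for x in tone):.3f}")
```

Varying `f0_hz`, the number of harmonics, and the envelope timings per sample is one simple way to get a diverse stream of training audio; FM and pulse synthesis would replace the inner sum of sines with a different oscillator.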

## Installation

```bash
pip install transaudio
```

## Usage

### 1. Training from Scratch 
You can start the pre-training loop immediately after installation. The synthesizer generates training audio on the fly, so there is no dataset to download or prepare first.

```bash
transaudio-train --epochs 100 --batch_size 32
```

### 2. Using the Model in Python 

```python
import torch
from transaudio import mae_vit_base_patch16, advanced_physics_synth

# 1. Create the model
model = mae_vit_base_patch16(audio_exp=True, img_size=(1024, 128))

# 2. Generate a synthetic sample
waveform = advanced_physics_synth(f0_hz=440.0)
print(f"Generated waveform shape: {waveform.shape}")

# 3. Forward pass on a random tensor standing in for a spectrogram
# Expected input shape: [Batch, Channel, Time, Freq]
dummy_input = torch.randn(1, 1, 1024, 128)
loss, pred, mask = model(dummy_input, mask_ratio=0.75)
print(f"Loss: {loss.item()}")
```
```
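
To see what `mask_ratio=0.75` means for an input of size `(1024, 128)`, the sketch below works through the patch arithmetic and a generic MAE-style random mask. It assumes a 16×16 patch size (inferred from the name `mae_vit_base_patch16`) and the common convention that `1` marks a masked patch. The helpers `patch_grid` and `random_mask` are hypothetical and are not part of `transaudio`; the library's internal masking may differ.

```python
import random


def patch_grid(img_size=(1024, 128), patch_size=16):
    """Number of patches along the time and frequency axes (assumes 16x16 patches)."""
    time_frames, freq_bins = img_size
    return time_frames // patch_size, freq_bins // patch_size


def random_mask(num_patches, mask_ratio=0.75, seed=0):
    """Generic MAE-style random mask: 0 = visible patch, 1 = masked patch."""
    rng = random.Random(seed)
    len_keep = int(num_patches * (1 - mask_ratio))  # patches the encoder sees
    order = list(range(num_patches))
    rng.shuffle(order)                              # random permutation of patch ids
    keep = set(order[:len_keep])
    return [0 if i in keep else 1 for i in range(num_patches)]


if __name__ == "__main__":
    n_time, n_freq = patch_grid()
    total = n_time * n_freq
    mask = random_mask(total, mask_ratio=0.75)
    print(f"grid: {n_time} x {n_freq} = {total} patches")
    print(f"visible: {total - sum(mask)}, masked: {sum(mask)}")
```

Under these assumptions the spectrogram splits into 64 × 8 = 512 patches, of which 128 stay visible to the encoder and 384 are reconstructed by the decoder.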

## Citation 

If you use this code, please cite the original paper: 

```
Fengrui Liu, Ruiyang Huang, Qijian Zheng, Yuanfang Wang, and Feng Liu,
"Transferable Audio Representations via Procedural Generation,"
IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. XX, no. XX, 2026.
```
