Metadata-Version: 2.4
Name: eigenwave-asr
Version: 1.0.4
Summary: EigenWave-ASR: High-Performance Speech Recognition with Multi-Scale Robin Features
Author-email: Sakib Hasan <sakibhasan@example.com>
License: MIT
Project-URL: Homepage, https://github.com/sakibhasan/eigenwave-asr
Project-URL: Repository, https://github.com/sakibhasan/eigenwave-asr
Keywords: asr,speech-recognition,deep-learning,pytorch,ctc
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: torch>=1.10
Requires-Dist: torchaudio>=0.10
Provides-Extra: lm
Requires-Dist: pyctcdecode>=0.4; extra == "lm"
Requires-Dist: kenlm; extra == "lm"
Provides-Extra: hub
Requires-Dist: huggingface_hub>=0.10; extra == "hub"
Provides-Extra: all
Requires-Dist: pyctcdecode>=0.4; extra == "all"
Requires-Dist: kenlm; extra == "all"
Requires-Dist: huggingface_hub>=0.10; extra == "all"

# EigenWave-ASR 🎤

**High-Performance Speech Recognition with Multi-Scale Robin Features**

A novel ASR model achieving **6.36% WER** on LibriSpeech test-clean (beam search with a KenLM language model) with only **27.8M parameters**.

## 🚀 Quick Start

```python
from eigenwave import EigenWaveASR

# Load the model from a local directory
model = EigenWaveASR.from_pretrained("./")
# or load it from the Hugging Face Hub (needs the `hub` extra)
model = EigenWaveASR.from_pretrained("sakibhasan/eigenwave-asr")

# Transcribe audio
text = model.transcribe("speech.wav")
print(text)
# Output: "hello this is a test of the speech recognition system"
```

## 📦 Installation

```bash
pip install eigenwave-asr

# For best accuracy (KenLM language model):
pip install "eigenwave-asr[lm]"

# For Hugging Face Hub support:
pip install "eigenwave-asr[hub]"

# Everything (language model + Hub):
pip install "eigenwave-asr[all]"
```

## 📋 Usage Examples

### Basic Transcription
```python
from eigenwave import EigenWaveASR

model = EigenWaveASR.from_pretrained("sakibhasan/eigenwave-asr")

# From file
text = model.transcribe("audio.wav")

# Batch
texts = model.transcribe(["audio1.wav", "audio2.wav", "audio3.wav"])

# From a tensor (16 kHz mono); random noise stands in for real audio here
import torch
audio_tensor = torch.randn(16000 * 5)  # 5 seconds of samples
text = model.transcribe(audio_tensor)
```

### Detailed Output
```python
result = model.transcribe_with_details("audio.wav")
print(result)
# {
#     "text": "hello world",
#     "duration": 2.5,
#     "processing_time": 0.320,
#     "rtf": 0.128,         # Real-Time Factor = processing_time / duration (< 1.0 = faster than real time)
#     "real_time": True
# }
```
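
The `rtf` field can be reproduced from the other two numbers in the result. A minimal sketch, assuming RTF is defined as processing time divided by audio duration (which matches the sample values above); `real_time_factor` is a hypothetical helper, not part of the package:

```python
def real_time_factor(processing_time: float, duration: float) -> float:
    """Return processing time divided by audio duration (both in seconds)."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return processing_time / duration


# Values taken from the sample output above
rtf = real_time_factor(processing_time=0.320, duration=2.5)
print(round(rtf, 3))  # 0.128
print(rtf < 1.0)      # True: transcription finished faster than the audio plays
```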

### CPU Inference
```python
model = EigenWaveASR.from_pretrained("sakibhasan/eigenwave-asr", device="cpu")
text = model.transcribe("audio.wav", beam_width=10)  # smaller beam for speed
```

### Without Language Model (faster, less accurate)
```python
model = EigenWaveASR.from_pretrained("sakibhasan/eigenwave-asr", use_lm=False)
text = model.transcribe("audio.wav")
```
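
The package keywords mention CTC, so decoding without a language model is presumably close to greedy CTC decoding; that is an assumption, not something confirmed here. The sketch below shows the general technique on per-frame argmax IDs: collapse consecutive repeats, then drop blanks. The function name, the toy vocabulary, and `blank_id=0` are all hypothetical.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse repeated frame labels, then remove CTC blanks."""
    # groupby merges runs of identical consecutive IDs into one entry
    collapsed = [label for label, _ in groupby(frame_ids)]
    return "".join(id_to_char[i] for i in collapsed if i != blank_id)


# Toy vocabulary (assumed); 0 is the blank symbol
vocab = {1: "h", 2: "i", 3: " "}
frames = [0, 1, 1, 0, 2, 2, 2, 0, 0]
print(ctc_greedy_decode(frames, vocab))  # hi
```

A blank between two identical labels keeps them separate, which is how CTC represents doubled letters.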

## 📊 Performance

| LibriSpeech split | Greedy WER | Beam+LM WER |
|-------------------|------------|-------------|
| test-clean | ~8.5% | **6.36%** |
| test-other | ~20% | ~15% |
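
For reference, WER is the word-level edit distance between reference and hypothesis, divided by the number of reference words. A minimal sketch of that standard definition follows; `word_error_rate` is a hypothetical helper, and the scoring script used for the table above may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of five reference words
print(word_error_rate("hello this is a test", "hello this is test"))  # 0.2
```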

## 🏗️ Model Details

- **Architecture**: Conformer-style encoder with Multi-Scale Robin Features
- **Parameters**: 27.8M
- **Input**: 16 kHz mono audio
- **Output**: English text (lowercase)
- **Training**: LibriSpeech 960h, 182k steps
- **Features**: Novel Robin differential operator feature extraction at scales [1, 3, 5]

## ⚡ Optimal Hyperparameters (Optuna-tuned)

```
alpha       = 0.9268   (LM weight)
beta        = 0.061    (word insertion bonus)
temperature = 0.536    (softmax sharpness)
beam_width  = 50       (beam search width)
```
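
To illustrate what these knobs typically do, here is a sketch of temperature-scaled log-softmax and a common shallow-fusion score. Both function names are hypothetical, and the exact formulas used inside the package are an assumption; the sketch only shows the usual roles of `temperature`, `alpha`, and `beta`.

```python
import math

ALPHA = 0.9268       # LM weight (from the table above)
BETA = 0.061         # word insertion bonus
TEMPERATURE = 0.536  # softmax sharpness


def log_softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a stable log-softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_z = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_z for s in scaled]


def fused_score(acoustic_logp, lm_logp, n_words, alpha=ALPHA, beta=BETA):
    """Assumed shallow-fusion form: acoustic + alpha * LM + beta * word count."""
    return acoustic_logp + alpha * lm_logp + beta * n_words


logits = [2.0, 1.0, 0.0]
base = [math.exp(v) for v in log_softmax_with_temperature(logits, 1.0)]
sharp = [math.exp(v) for v in log_softmax_with_temperature(logits, TEMPERATURE)]
# A temperature below 1 concentrates probability on the top class
print(sharp[0] > base[0])  # True
```

In a beam search, a score like `fused_score` would rank candidate transcripts: a larger `alpha` trusts the language model more, and a positive `beta` offsets the language model's bias toward shorter outputs.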

## 📄 License

MIT License

## 👤 Author

Sakib Hasan ([@sakibhasanml](https://kaggle.com/sakibhasanml))
