Metadata-Version: 2.4
Name: laser-asd
Version: 0.1.0
Summary: LASER ASD - Lip Landmark Assisted Speaker Detection for Active Speaker Detection
Author-email: nawta <nawta1998@gmail.com>
Maintainer-email: nawta <nawta1998@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/nawta/LASER_ASD_PyPI
Project-URL: Repository, https://github.com/nawta/LASER_ASD_PyPI
Project-URL: Documentation, https://github.com/nawta/LASER_ASD_PyPI#readme
Project-URL: Bug Tracker, https://github.com/nawta/LASER_ASD_PyPI/issues
Keywords: active-speaker-detection,speaker-detection,audio-visual,lip-sync,deep-learning,pytorch,computer-vision
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Multimedia :: Video
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=1.10.0
Requires-Dist: torchvision>=0.11.0
Requires-Dist: numpy>=1.20.0
Requires-Dist: opencv-python>=4.5.0
Requires-Dist: resampy>=0.4.0
Requires-Dist: python_speech_features>=0.6
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: isort>=5.0; extra == "dev"
Requires-Dist: flake8>=6.0; extra == "dev"
Dynamic: license-file

# LASER ASD - Lip Landmark Assisted Speaker Detection

[![PyPI version](https://badge.fury.io/py/laser-asd.svg)](https://badge.fury.io/py/laser-asd)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A PyTorch implementation of LASER for active speaker detection (ASD). It provides a simple inference interface to the LoCoNet encoder with LASER landmark injection.

## Features

- Active Speaker Detection using audio-visual fusion
- Based on LoCoNet architecture with LASER landmark injection
- Simple Python API for inference
- GPU acceleration with CUDA support
- Compatible with PyTorch 1.10+

## Installation

```bash
pip install laser-asd
```

## Quick Start

```python
import numpy as np
from laser_asd import LaserASDModel

# Initialize model
model = LaserASDModel(device="cuda")

# Load pre-trained weights
model.load_weights("/path/to/loconet_laser.model")

# Prepare inputs
# face_crops: numpy array of shape [T, H, W, C] or [T, H, W]
# audio_data: numpy array of shape [samples] at 16kHz
face_crops = np.random.rand(100, 112, 112, 3).astype(np.float32)
audio_data = np.random.rand(64000).astype(np.float32)  # 4 seconds at 16kHz

# Predict speaking scores
scores = model.predict(face_crops, audio_data, sample_rate=16000, fps=25.0)

# A score >= 0 indicates that the face is speaking in that frame
is_speaking = scores >= 0
```
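
Per-frame scores are often easier to consume as time segments. The following is a minimal sketch, not part of the `laser-asd` API: `scores_to_segments` is a hypothetical helper that assumes the `>= 0` threshold and the 25 fps frame rate used in the example above, and groups consecutive speaking frames into `(start_sec, end_sec)` pairs.

```python
import numpy as np


def scores_to_segments(scores, fps=25.0, threshold=0.0):
    """Group consecutive speaking frames into (start_sec, end_sec) segments.

    Hypothetical helper, not provided by laser-asd.
    """
    speaking = np.asarray(scores) >= threshold
    # Pad with False on both sides so every run has a rising and a falling edge.
    padded = np.concatenate(([False], speaking, [False])).astype(np.int8)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # first frame of each speaking run
    ends = np.flatnonzero(edges == -1)    # one past the last frame of each run
    return [(float(s) / fps, float(e) / fps) for s, e in zip(starts, ends)]


# Made-up scores for illustration only; real scores come from model.predict().
example_scores = np.array([-1.2, 0.3, 0.8, -0.1, -0.5, 1.1, 0.4, 0.2])
segments = scores_to_segments(example_scores, fps=25.0)
```

With these illustrative scores, frames 1-2 and 5-7 are above the threshold, so `segments` holds two entries. The end time of each segment is exclusive: it marks the start of the first non-speaking frame.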

## Model Weights

Pre-trained model weights can be downloaded from:
- [Original LASER ASD repository](https://github.com/plnguyen2908/LASER_ASD)

Set the model path via the `LASER_ASD_MODEL_PATH` environment variable:
```bash
export LASER_ASD_MODEL_PATH=/path/to/loconet_laser.model
```

Or pass it directly:
```python
model = LaserASDModel(device="cuda", model_path="/path/to/loconet_laser.model")
```
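
If you want to validate the weights location before constructing the model, the lookup can be sketched as below. `resolve_model_path` is a hypothetical helper, not part of the package, and the precedence it applies (explicit argument first, then `LASER_ASD_MODEL_PATH`) is an assumption about how you might want the two options to combine, not a statement of what the library does internally.

```python
import os
from pathlib import Path
from typing import Optional, Union


def resolve_model_path(model_path: Optional[Union[str, Path]] = None) -> Path:
    """Return an explicit path if given, else fall back to LASER_ASD_MODEL_PATH.

    Hypothetical helper; the precedence order is an assumption.
    """
    candidate = model_path or os.environ.get("LASER_ASD_MODEL_PATH")
    if not candidate:
        raise FileNotFoundError(
            "No model path given and LASER_ASD_MODEL_PATH is not set."
        )
    path = Path(candidate).expanduser()
    # Fail early with a clear message instead of at weight-loading time.
    if not path.is_file():
        raise FileNotFoundError(f"Model weights not found: {path}")
    return path
```

The returned path can then be passed as `model_path` when creating the model.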

## API Reference

### LaserASDModel

```python
from pathlib import Path
from typing import Optional

import numpy as np


class LaserASDModel:
    def __init__(
        self,
        device: str = "cuda",
        model_path: Optional[Path] = None,
        use_landmarks: bool = False,
    ):
        """
        Initialize LASER ASD model.

        Args:
            device: Device to run model on ('cuda' or 'cpu')
            model_path: Path to model weights file
            use_landmarks: Whether to use landmark features (False uses zeros)
        """

    def load_weights(self, model_path: Optional[str] = None):
        """Load model weights."""

    def predict(
        self,
        face_crops: np.ndarray,
        audio_data: np.ndarray,
        sample_rate: int = 16000,
        fps: float = 25.0,
    ) -> np.ndarray:
        """
        Predict a speaking score for each frame.

        Args:
            face_crops: Face crop images [T, H, W, C] or [T, H, W]
            audio_data: Audio waveform [samples]
            sample_rate: Audio sample rate (default: 16000)
            fps: Video frame rate (default: 25.0)

        Returns:
            Per-frame speaking scores [T] (>= 0 means speaking)
        """
```
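
`predict` takes the video frames and the audio waveform separately, so the two inputs should cover the same time span. How the package treats mismatched lengths is not documented here, so the sketch below assumes you want to align them yourself first. `trim_to_common_duration` is a hypothetical helper that trims both arrays to the shorter of the two durations, using the same `sample_rate` and `fps` defaults as `predict`.

```python
import numpy as np


def trim_to_common_duration(face_crops, audio_data, sample_rate=16000, fps=25.0):
    """Trim frames and audio to the shorter of the two durations.

    Hypothetical helper, not provided by laser-asd.
    """
    video_sec = face_crops.shape[0] / fps
    audio_sec = audio_data.shape[0] / sample_rate
    duration = min(video_sec, audio_sec)
    # Round to the nearest index, but never exceed the available data.
    num_frames = min(face_crops.shape[0], int(round(duration * fps)))
    num_samples = min(audio_data.shape[0], int(round(duration * sample_rate)))
    return face_crops[:num_frames], audio_data[:num_samples]


# 100 frames at 25 fps is 4.0 s; 70000 samples at 16 kHz is 4.375 s.
frames = np.zeros((100, 112, 112), dtype=np.float32)
audio = np.zeros(70000, dtype=np.float32)
frames_aligned, audio_aligned = trim_to_common_duration(frames, audio)
```

Here the audio is the longer input, so it is cut to 64000 samples while all 100 frames are kept.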

### Factory Function

```python
def create_laser_model(
    device: str = "cuda",
    model_path: Optional[str] = None,
    **kwargs,
) -> LaserASDModel:
    """Factory function to create LASER ASD model."""
```

## Requirements

- Python >= 3.8
- PyTorch >= 1.10.0
- torchvision >= 0.11.0
- numpy >= 1.20.0
- opencv-python >= 4.5.0
- resampy >= 0.4.0
- python_speech_features >= 0.6

## Citation

If you use this code, please cite the original LASER ASD paper:

```bibtex
@inproceedings{nguyen2024laser,
  title={LASER: Lip Landmark Assisted Speaker Detection},
  author={Nguyen, Phat Lam and others},
  booktitle={Proceedings},
  year={2024}
}
```

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Original LASER ASD implementation: https://github.com/plnguyen2908/LASER_ASD
- LoCoNet encoder architecture
- TalkNet audio-visual framework

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
