Metadata-Version: 2.4
Name: audiogen-mcp
Version: 0.2.1
Summary: MCP server for generating sound effects using Meta's AudioGen
Project-URL: Homepage, https://github.com/peerjakobsen/audiogen-mcp
Project-URL: Repository, https://github.com/peerjakobsen/audiogen-mcp
Project-URL: Issues, https://github.com/peerjakobsen/audiogen-mcp/issues
Author: Peer Jakobsen
License: MIT
License-File: LICENSE
Keywords: ai,audio,audiogen,claude,mcp,sound-effects
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Multimedia :: Sound/Audio
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: <3.12,>=3.9
Requires-Dist: mcp>=1.0.0
Requires-Dist: scipy>=1.11.0
Requires-Dist: torch>=2.1.0
Requires-Dist: torchaudio>=2.1.0
Description-Content-Type: text/markdown

# AudioGen MCP Server

[![PyPI version](https://badge.fury.io/py/audiogen-mcp.svg)](https://badge.fury.io/py/audiogen-mcp)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

An MCP server that generates sound effects from text descriptions using Meta's AudioGen model. Designed for Apple Silicon Macs.

## Prerequisites

- macOS with Apple Silicon (M1/M2/M3/M4)
- Python 3.9-3.11 (3.12+ not yet supported by audiocraft)
- ffmpeg: `brew install ffmpeg`
- ~4GB disk space for model weights
- ~8GB RAM recommended
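
To confirm these prerequisites before installing, a small stdlib-only script like the following can help. It is a sketch, not part of audiogen-mcp: the function names are hypothetical, and it checks only the platform, the Python version range, and whether `ffmpeg` is on `PATH` (not disk space or RAM).

```python
import platform
import shutil
import sys

# Supported range from the Prerequisites list: 3.9 through 3.11 (3.12+ excluded).
MIN_PY = (3, 9)
MAX_PY_EXCLUSIVE = (3, 12)


def python_supported(version=None):
    """Return True if the (major, minor) version falls within 3.9-3.11."""
    if version is None:
        version = sys.version_info[:2]
    return MIN_PY <= tuple(version[:2]) < MAX_PY_EXCLUSIVE


def check_prerequisites():
    """Return a list of human-readable problems; an empty list means every check passed."""
    problems = []
    if platform.system() != "Darwin" or platform.machine() != "arm64":
        problems.append(
            f"expected macOS on Apple Silicon, found {platform.system()} {platform.machine()}"
        )
    if not python_supported():
        major, minor = sys.version_info[:2]
        problems.append(f"Python {major}.{minor} is outside the supported 3.9-3.11 range")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH (try: brew install ffmpeg)")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("\n".join(issues) if issues else "All prerequisite checks passed.")
```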

## Installation

Due to audiocraft's complex dependencies (xformers doesn't build on Apple Silicon), installation requires a specific order:

```bash
# Create virtual environment with Python 3.11
uv venv ~/.audiogen-env --python 3.11
source ~/.audiogen-env/bin/activate

# Install audiocraft without its problematic dependencies
uv pip install audiocraft --no-deps

# Install audiocraft's runtime dependencies manually (everything except xformers)
uv pip install torch torchaudio transformers huggingface_hub encodec einops \
    flashy num2words sentencepiece librosa av julius spacy torchmetrics \
    hydra-core hydra-colorlog demucs lameenc

# Install audiogen-mcp
uv pip install audiogen-mcp
```
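
After these steps, a quick way to confirm that the expected packages landed in the virtual environment (and that xformers did not) is to query installed distribution metadata. This is a hedged sketch using only `importlib.metadata`; the `EXPECTED` list is an assumption based on the commands above and the package's declared requirements, and the helper names are hypothetical. Run it with `~/.audiogen-env/bin/python`.

```python
from importlib import metadata

# Distributions the install commands above should produce (assumed list).
EXPECTED = ("audiocraft", "audiogen-mcp", "torch", "torchaudio", "mcp", "scipy")


def installed_versions(names=EXPECTED):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def xformers_installed():
    """True if an xformers distribution is installed (not expected on Apple Silicon)."""
    try:
        metadata.version("xformers")
        return True
    except metadata.PackageNotFoundError:
        return False


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:15} {version or 'MISSING'}")
    print("xformers present (unexpected)" if xformers_installed() else "xformers absent (expected)")
```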

The first run will download the AudioGen model (~2GB).

## Configure Claude Code

```bash
claude mcp add audiogen ~/.audiogen-env/bin/python -- -m audiogen_mcp.server
```

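Or, for Claude Desktop, add the following to `~/Library/Application Support/Claude/claude_desktop_config.json`, replacing `YOUR_USERNAME` with your macOS username: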

```json
{
  "mcpServers": {
    "audiogen": {
      "command": "/Users/YOUR_USERNAME/.audiogen-env/bin/python",
      "args": ["-m", "audiogen_mcp.server"]
    }
  }
}
```
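
If you'd rather not hand-edit `YOUR_USERNAME`, a sketch like this builds the same entry from your actual home directory and merges it into an existing config dict without touching other servers. The helper names are hypothetical; the script only prints JSON, so review the output before pasting it into your config file.

```python
import json
from pathlib import Path


def audiogen_server_entry(home=None):
    """Build the "audiogen" server entry with this user's venv path filled in."""
    home = Path(home) if home is not None else Path.home()
    return {
        "command": str(home / ".audiogen-env" / "bin" / "python"),
        "args": ["-m", "audiogen_mcp.server"],
    }


def merge_into_config(config, home=None):
    """Return a copy of an existing config dict with the audiogen entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["audiogen"] = audiogen_server_entry(home)
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # Prints the merged JSON for an empty config; paste it in only after reviewing.
    print(json.dumps(merge_into_config({}), indent=2))
```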

## Available Tools

| Tool | Description |
|------|-------------|
| `generate_sound_effect` | Start a background generation job; returns a `job_id` |
| `check_generation_status` | Check a job's status by `job_id`; poll until it reports `completed` |
| `list_generation_jobs` | List all jobs and their current status |
| `list_generated_sounds` | List previously generated audio files |
| `get_model_status` | Report whether the model is loaded and which device it runs on |

## How It Works

Generation runs as a background job so that long renders don't exceed client timeouts:

1. Call `generate_sound_effect` with your prompt → returns `job_id`
2. Poll `check_generation_status` with the `job_id` every 10-15 seconds
3. When the status is `completed`, the response includes the `file_path` of the generated audio
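
The same job-then-poll flow, written as client-side Python, might look like this. It is a sketch under loud assumptions: `call_tool` is a hypothetical stand-in for whatever MCP client you use (returning a parsed dict), and the `prompt` argument name, the `status` key, and the `failed` error status are all assumed; only `job_id`, `completed`, and `file_path` come from the steps above.

```python
import time


def generate_and_wait(call_tool, prompt, poll_interval=12.0, timeout=600.0, sleep=time.sleep):
    """Start a generation job, poll until it completes, and return the output file path.

    call_tool(name, arguments) -> dict is a hypothetical client callable.
    The "prompt" argument, "status" key, and "failed" state are assumptions.
    """
    job = call_tool("generate_sound_effect", {"prompt": prompt})
    job_id = job["job_id"]
    waited = 0.0
    while True:
        status = call_tool("check_generation_status", {"job_id": job_id})
        if status["status"] == "completed":
            return status["file_path"]
        if status["status"] == "failed":  # assumed name for an error state
            raise RuntimeError(f"job {job_id} failed: {status}")
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still {status['status']!r} after {timeout:.0f}s")
        sleep(poll_interval)
        waited += poll_interval
```

Injecting `sleep` keeps the loop testable. The 12-second default sits inside the suggested 10-15 second polling window, and the 600-second timeout leaves headroom over the ~180 seconds quoted for 10 seconds of audio.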

## Example Prompts

Once configured, ask Claude Code to generate sounds:

- "Generate an explosion sound effect"
- "Create a dark ambient tension drone, 10 seconds"
- "Make a retro 8-bit power-up sound, 2 seconds long"
- "Generate footsteps on gravel, 5 seconds"

### Prompt Tips

For best results, be specific:

```
# Good
"glass breaking, single wine glass falling on tile floor"
"8-bit arcade explosion, retro game style"
"dark ambient tension drone, synth pad, ominous low frequency rumble"

# Too vague
"glass sound"
"explosion"
"ambient"
```

Include style, mood, and context for better results.
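
If you script prompts, a tiny helper can enforce that structure. This is a hypothetical convenience function, not something the server provides; the comma-separated output simply mirrors the "Good" examples above.

```python
def build_prompt(sound, style=None, mood=None, context=None):
    """Join a sound description with optional style, mood, and context, comma-separated.

    Empty or whitespace-only parts are dropped; surrounding whitespace is trimmed.
    """
    parts = [sound, style, mood, context]
    return ", ".join(p.strip() for p in parts if p and p.strip())
```

For example, `build_prompt("glass breaking", context="single wine glass falling on tile floor")` returns `"glass breaking, single wine glass falling on tile floor"`.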

## Performance

- ~18 seconds to generate 1 second of audio on Apple Silicon
- 5 seconds of audio ≈ 90 seconds generation time
- 10 seconds of audio ≈ 180 seconds generation time
- The first generation takes longer (model loading adds ~5s; the very first run also downloads the model)
- Uses Metal Performance Shaders (MPS) for GPU acceleration
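
The figures above amount to a linear rule of thumb: roughly 18 seconds of compute per second of audio, plus about 5 seconds of model loading on the first generation. A sketch that turns this into an estimate (and an approximate number of status polls) might look like this. The constants are the approximate numbers quoted here, not measured guarantees, the first-run model download is not included, and the function names are hypothetical.

```python
import math

SECONDS_PER_AUDIO_SECOND = 18.0  # ~18 s of compute per second of audio (figure above)
MODEL_LOAD_SECONDS = 5.0         # one-time model loading cost on the first generation


def estimate_generation_seconds(duration_s, first_run=False):
    """Rough wall-clock estimate for generating `duration_s` seconds of audio on MPS."""
    estimate = duration_s * SECONDS_PER_AUDIO_SECOND
    if first_run:
        estimate += MODEL_LOAD_SECONDS
    return estimate


def expected_polls(duration_s, poll_interval_s=12.0, first_run=False):
    """Approximate number of status polls needed at the given polling interval."""
    return math.ceil(estimate_generation_seconds(duration_s, first_run) / poll_interval_s)
```

With these constants, `estimate_generation_seconds(5)` gives `90.0` and `expected_polls(5)` gives `8` at a 12-second interval.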

## Output

Generated files are saved to `~/audiogen_outputs/` by default, in WAV or OGG format.
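
To find outputs without going through Claude (for example, from a build script), you can scan the directory directly. This sketch assumes the default `~/audiogen_outputs/` location and the `.wav`/`.ogg` extensions mentioned above; the function name is hypothetical. Inside Claude, the `list_generated_sounds` tool serves the same purpose.

```python
from pathlib import Path

AUDIO_SUFFIXES = {".wav", ".ogg"}


def list_outputs(directory=None):
    """Return generated audio files in `directory`, newest first (empty list if it doesn't exist)."""
    directory = Path(directory) if directory is not None else Path.home() / "audiogen_outputs"
    if not directory.is_dir():
        return []
    files = [p for p in directory.iterdir() if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES]
    # Sort by modification time so the most recent generation comes first.
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```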

## Troubleshooting

### Installation fails with xformers error

This is expected on Apple Silicon: xformers is only needed for CUDA, so the server mocks it at runtime. If installing audiocraft still fails, try:

```bash
uv pip install torch torchaudio
uv pip install audiocraft --no-build-isolation
```
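
For context on what mocking xformers at runtime typically means: a stand-in module is registered in `sys.modules` before audiocraft is imported, so `import xformers` succeeds without the real package. The sketch below shows that general technique; it is not necessarily how audiogen-mcp implements it, and the function name is hypothetical. Any code path that actually calls an xformers function would still fail with `AttributeError`, so this only works when those CUDA-only paths are never reached.

```python
import importlib.util
import sys
import types


def install_xformers_stub():
    """Register empty `xformers` and `xformers.ops` modules if the real package is absent.

    Must run before anything imports audiocraft. Returns True if a stub was installed.
    """
    # Skip if xformers is already imported or is genuinely installed.
    if "xformers" in sys.modules or importlib.util.find_spec("xformers") is not None:
        return False
    xformers = types.ModuleType("xformers")
    ops = types.ModuleType("xformers.ops")
    xformers.ops = ops  # lets `from xformers import ops` resolve to the stub
    sys.modules["xformers"] = xformers
    sys.modules["xformers.ops"] = ops
    return True
```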

### Model download fails

Make sure you have a stable internet connection and enough free disk space (see Prerequisites). The model is downloaded from the Hugging Face Hub.
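
To check free space up front, `shutil.disk_usage` is enough. This sketch assumes the model cache lives on the same filesystem as your home directory (pass a different `path` if you've relocated it) and uses the ~4GB figure from the Prerequisites list; the function name is hypothetical.

```python
import shutil
from pathlib import Path

REQUIRED_BYTES = 4 * 1024**3  # ~4GB, from the Prerequisites list


def has_enough_disk(path=None, required_bytes=REQUIRED_BYTES):
    """Return (ok, free_bytes) for the filesystem containing `path` (home directory by default)."""
    free = shutil.disk_usage(path or Path.home()).free
    return free >= required_bytes, free
```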

### Slow generation

Use the `get_model_status` tool to check which device the model is running on. The CPU fallback is 10-20x slower than MPS.
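
To check the device outside Claude, run something like the following with the virtual environment's Python. It is a sketch of common PyTorch device selection, not the server's own logic, which may differ; the function name is hypothetical, and it falls back to `"cpu"` if PyTorch isn't importable.

```python
def detect_device():
    """Return "mps", "cuda", or "cpu" depending on what this PyTorch build can use."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed in this interpreter
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print(f"PyTorch would use: {detect_device()}")
```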

### MPS not available

MPS acceleration requires macOS 12.3+ and PyTorch 2.0+.
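
A stdlib-only check for the macOS side of that requirement might look like this. `platform.mac_ver()` returns an empty release string on other operating systems, which the sketch treats as unsupported; the helper names are hypothetical. For the PyTorch side, `torch.__version__` reports the installed version.

```python
import platform


def parse_version(text):
    """Turn "12.3.1" into (12, 3, 1); missing or non-numeric parts become 0."""
    parts = [int(piece) if piece.isdigit() else 0 for piece in text.split(".")[:3]]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def macos_supports_mps(release=None):
    """True if the macOS release is 12.3 or later; False on non-macOS hosts."""
    release = platform.mac_ver()[0] if release is None else release
    if not release:
        return False
    return parse_version(release) >= (12, 3, 0)
```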

## License

MIT License - see [LICENSE](LICENSE) file.

## Acknowledgments

- [Meta AudioCraft](https://github.com/facebookresearch/audiocraft) - The underlying AI model
- [MCP](https://modelcontextprotocol.io/) - Model Context Protocol specification
