Metadata-Version: 2.1
Name: openmmla-audio
Version: 0.1.3
Summary: Audio module for OpenMMLA platform, including data collection, data processing, and data analytics.
Home-page: https://github.com/ucph-ccs/mbox-audio
Author: Zaibei Li
Author-email: lizaibeim@gmail.com
License: MIT License
Classifier: Topic :: Multimedia :: Sound/Audio :: Analysis
Classifier: Topic :: Education :: Computer Aided Instruction (CAI)
Requires-Python: >=3.9.0
Description-Content-Type: text/markdown
Provides-Extra: server
License-File: LICENSE


# OpenMMLA Audio
[![PyPI version](https://img.shields.io/pypi/v/openmmla-audio.svg)](https://pypi.org/project/openmmla-audio/)

The audio module of the mBox multimodal learning analytics system. For more details, please refer to [mBox System Design](https://github.com/lizaibeim/mbox-uber/blob/main/docs/mbox_system.md).

## Uber Server Setup
Before setting up the audio base, you need to set up a server hosting the InfluxDB, Redis, and Mosquitto services.
For setup instructions, please refer to the [mbox-uber](https://github.com/lizaibeim/mbox-uber/blob/main/README.md) module.
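
Once the uber server is running, you can probe its service ports from the machine that will host an audio base to confirm it is reachable. The script below is a minimal sketch and is not part of mbox-uber. It assumes `nc` (netcat) is installed, that `UBER_HOST` (a placeholder name) holds your uber server's hostname, and that the services listen on their upstream default ports (InfluxDB 8086, Redis 6379, Mosquitto/MQTT 1883). Adjust the ports if your mbox-uber configuration changes them.

```sh
#!/usr/bin/env bash
# Hypothetical helper (not part of mbox-uber): probe the uber server's service ports.
# Assumed defaults: InfluxDB 8086, Redis 6379, Mosquitto (MQTT) 1883.
# UBER_HOST is a placeholder for your uber server's hostname.

# port_open HOST PORT: succeed if a TCP connection opens within 2 seconds (needs nc).
port_open() {
  nc -z -w 2 "$1" "$2" >/dev/null 2>&1
}

# check_uber_services HOST: print one status line per service; return 1 if any is down.
check_uber_services() {
  local host="$1" status=0 entry name port
  for entry in influxdb:8086 redis:6379 mosquitto:1883; do
    name="${entry%%:*}"
    port="${entry##*:}"
    if port_open "$host" "$port"; then
      printf '%-6s%s (%s:%s)\n' "OK" "$name" "$host" "$port"
    else
      printf '%-6s%s (%s:%s)\n' "DOWN" "$name" "$host" "$port"
      status=1
    fi
  done
  return "$status"
}

if command -v nc >/dev/null 2>&1; then
  check_uber_services "${UBER_HOST:-localhost}"
else
  echo "nc (netcat) not found; install it to run this check" >&2
fi
```

Saved to a file (for example a hypothetical `check_uber.sh`), it can be run as `UBER_HOST=<your uber host> bash check_uber.sh`. It checks only that the ports accept TCP connections, not that the services are configured correctly.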

## Audio Base & Server Setup

Setting up the mbox-audio module takes three steps:  
(1) Clone the repository from GitHub to your local home directory.  
(2) Install required system dependencies.  
(3) Install openmmla-audio.

1. Clone the repository from GitHub
    ```sh
    git clone https://github.com/ucph-ccs/mbox-audio.git
    cd mbox-audio
    ```

2. Install the required dependencies
   - <details>
     <summary> Mac </summary>
     
        ```sh
        # Install ffmpeg, portaudio-19.7.0, mecab-0.996 (required by sacrebleu for NLP collection) and llvm-16.0.6
        brew install ffmpeg
        brew install portaudio
        brew install mecab
        brew install llvm
        
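        # Add llvm to your PATH. The paths below assume Apple Silicon (Homebrew under /opt/homebrew);
        # on Intel Macs, replace /opt/homebrew with /usr/local.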
        echo 'export PATH="/opt/homebrew/opt/llvm/bin:$PATH"' >> ~/.zshrc
        echo 'export LDFLAGS="-L/opt/homebrew/opt/llvm/lib"' >> ~/.zshrc
        echo 'export CPPFLAGS="-I/opt/homebrew/opt/llvm/include"' >> ~/.zshrc
        source ~/.zshrc
        ```
     </details>
   
   - <details>
     <summary> Ubuntu 24.04 </summary>
    
        ```sh
        sudo apt update && sudo apt upgrade
        sudo apt install build-essential
        sudo apt install git
        sudo apt install ffmpeg
        sudo apt install python3-pyaudio
        sudo apt update && sudo apt install -y libsndfile1

        # Install portaudio
        sudo apt install libasound-dev
        # Download the portaudio archive from: http://files.portaudio.com/download.html
        wget https://files.portaudio.com/archives/pa_stable_v190700_20210406.tgz
        # Extract the archive
        tar -zxvf pa_stable_v190700_20210406.tgz
        # Enter the directory and compile
        cd portaudio
        ./configure && make
        sudo make install
        ```
     </details>
   
   - <details>
     <summary> Raspberry Pi OS Bullseye or later </summary>
    
        ```sh
        # Install the PortAudio development headers needed to build PyAudio
        sudo apt-get install portaudio19-dev
        ```
     </details>

3. Install openmmla-audio in a [conda environment](https://docs.anaconda.com/free/miniconda/index.html)
   - <details>
     <summary> Conda </summary>
     
       ```sh
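       # For Raspberry Pi (Miniforge)
       wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
       bash "Miniforge3-$(uname)-$(uname -m).sh"
       
       # For Linux
       wget "https://repo.anaconda.com/miniconda/Miniconda3-latest-$(uname)-$(uname -m).sh"
       bash "Miniconda3-latest-$(uname)-$(uname -m).sh"

       # For Mac: Miniconda names its macOS installers "MacOSX" (not "Darwin", which `uname` prints),
       # and macOS ships curl rather than wget
       curl -O "https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-$(uname -m).sh"
       bash "Miniconda3-latest-MacOSX-$(uname -m).sh"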
       ```
     </details>
   
   - <details>
     <summary> Audio Base </summary>
     
       ```sh
       conda create -c conda-forge -n audio-base python==3.10.12 -y
       conda activate audio-base
       pip install openmmla-audio
       ```  
     </details>

   - <details>
     <summary> Audio Server </summary>
     
       ```sh
       conda create -c conda-forge -n audio-server python==3.10.12 -y
       conda activate audio-server
       # The quotes stop zsh (the default shell on Mac) from treating the brackets as a glob; they also work in bash
       pip install 'openmmla-audio[server]'
       ```  
     </details>
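
After step 3, a quick check can confirm that the pieces above are in place. This is a minimal sketch, not a script shipped with openmmla-audio. It assumes a bash shell with the `audio-base` or `audio-server` conda environment activated. For PortAudio it relies on `pkg-config`, if present, and on the `portaudio-2.0` pkg-config file that PortAudio installs; without `pkg-config` it reports PortAudio as unknown rather than missing.

```sh
#!/usr/bin/env bash
# Hypothetical post-install check (not shipped with openmmla-audio).
# Run it with the audio-base or audio-server conda environment activated.

# have_cmd NAME: succeed if NAME is an executable on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# report STATUS ITEM: print an aligned status line.
report() {
  printf '%-8s%s\n' "$1" "$2"
}

# check_install: report each dependency; return 1 if a required piece is missing.
check_install() {
  local missing=0 version
  if have_cmd ffmpeg; then report OK ffmpeg; else report MISSING ffmpeg; missing=1; fi

  # PortAudio installs a pkg-config file named portaudio-2.0; without pkg-config we cannot tell.
  if have_cmd pkg-config && pkg-config --exists portaudio-2.0; then
    report OK "portaudio $(pkg-config --modversion portaudio-2.0)"
  else
    report UNKNOWN "portaudio (pkg-config or portaudio-2.0.pc not found)"
  fi

  # pip show prints a "Version: x.y.z" line when the package is installed.
  if have_cmd python && version="$(python -m pip show openmmla-audio 2>/dev/null | sed -n 's/^Version: //p')" && [ -n "$version" ]; then
    report OK "openmmla-audio $version"
  else
    report MISSING "openmmla-audio (in the active environment)"; missing=1
  fi
  return "$missing"
}

check_install
```

The script reports every item instead of stopping at the first failure, so a single run shows everything that still needs installing.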

## Usage

After successfully installing all required libraries, you can run the audio module from the terminal.

1. Run the real-time audio analysis system
    + Run the audio server and audio bases in distributed mode
    ```sh
    # Run the server script on each application server that supports the audio bases. On your uber server,
    # list your audio server cluster in the mbox-uber/conf/nginx.sh file and declare any extra audio
    # upstream services in the mbox-uber/conf/nginx.conf file.
   
    # e.g. our default setup runs audio services on 3 servers: server-01.local, server-02.local and
    # server-03.local. Inside nginx.conf, we declare 5 audio services hosted on those three servers:
    # transcribe, separate, infer, enhance and vad.
    # Install tmux first (on Mac: brew install tmux)
    sudo apt install tmux -y
    ./server.sh
    
    # Run audio bases
    # :param -b the number of audio bases to run (default: 3).
    # :param -s the number of audio base synchronizers to run (default: 1).
    # :param -l whether to run the audio bases standalone or with application servers (default: false).
    # :param -p whether to perform speech separation during recognition (default: false).
    ./run.sh
    
    # Control script to start or stop the session
    ./control.sh
    ```

2. Run the post-time audio analyzer
   1. Create a speaker corpus folder under the ***/audio_db/post-time/*** folder. The folder name must match the
      name of the audio file to be processed **[audio_file_name.wav]** without the extension, 
      e.g. ***/audio_db/post-time/[audio_file_name]/***.
   2. Copy the speaker audio files into the speaker corpus folder; each file should be named **[speaker_name].wav**.
   3. Run **run_audio_post_analyzer.py**
      ```sh
      cd examples/
         
      # Process a single audio file (supported formats: wav, m4a, mp3)
      python3 run_audio_post_analyzer.py -f [audio_file_name.wav]
         
      # Process all audio files under the /audio/post-time/origin/ folder
      python3 run_audio_post_analyzer.py
      ```
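
Steps 1 and 2 of the post-time analyzer can be scripted. The helper below is a hypothetical sketch, not part of the repository. `make_speaker_corpus`, the example file names (`meeting.wav`, `alice.wav`, `bob.wav`) and the `./audio_db` argument are assumptions; point the first argument at whichever ***audio_db*** folder your checkout uses. It names the corpus folder after the recording with its extension stripped and copies in the speaker clips, which must already be named **[speaker_name].wav**.

```sh
#!/usr/bin/env bash
# Hypothetical helper: lay out a speaker corpus for the post-time analyzer.
# make_speaker_corpus AUDIO_DB RECORDING SPEAKER_WAV...
#   AUDIO_DB     path to the audio_db folder (location is an assumption; check your checkout)
#   RECORDING    the audio file to be processed (wav, m4a or mp3)
#   SPEAKER_WAV  one clip per speaker, already named [speaker_name].wav
make_speaker_corpus() {
  local audio_db="$1" recording="$2" base corpus clip
  shift 2
  # Folder name = recording file name without its directory and extension.
  base="$(basename "$recording")"
  base="${base%.*}"
  corpus="$audio_db/post-time/$base"
  mkdir -p "$corpus" || return 1
  for clip in "$@"; do
    case "$clip" in
      *.wav) cp "$clip" "$corpus/" || return 1 ;;
      *) echo "skipping $clip: speaker clips must be named [speaker_name].wav" >&2 ;;
    esac
  done
  # Print the corpus folder so callers can inspect it.
  echo "$corpus"
}
```

For example, `make_speaker_corpus ./audio_db meeting.wav alice.wav bob.wav` would create `./audio_db/post-time/meeting/` containing `alice.wav` and `bob.wav`, after which `python3 run_audio_post_analyzer.py -f meeting.wav` can be run from ***examples/***.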

## Visualization

After running, the logs and visualizations are stored in the ***/logs/*** and ***/visualizations/*** folders.
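
To find the output of the most recent run, listing the newest files in those two folders is usually enough. This is a minimal sketch. `latest_outputs` is a hypothetical helper, and it assumes ***logs/*** and ***visualizations/*** sit under the directory you pass in (the current directory by default).

```sh
#!/usr/bin/env bash
# Hypothetical helper: list the most recently modified files in logs/ and visualizations/.
# latest_outputs [ROOT] [N]: print up to N newest file names per folder (defaults: ., 5).
latest_outputs() {
  local root="${1:-.}" n="${2:-5}" dir
  for dir in logs visualizations; do
    echo "== $root/$dir"
    if [ -d "$root/$dir" ]; then
      # ls -t sorts newest first
      ls -t "$root/$dir" | head -n "$n"
    else
      echo "(not found)"
    fi
  done
}

latest_outputs "${1:-.}"
```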

## [FAQ](https://github.com/lizaibeim/mbox-uber/blob/main/docs/FAQ.md)

## Citation
If you use this code in your research, please cite the following paper:
```
@inproceedings{inproceedings,
author = {Li, Zaibei and Jensen, Martin and Nolte, Alexander and Spikol, Daniel},
year = {2024},
month = {03},
pages = {785-791},
title = {Field report for Platform mBox: Designing an Open MMLA Platform},
doi = {10.1145/3636555.3636872}
}
```

## References
- [NeMo](https://github.com/NVIDIA/NeMo)
- [Silero VAD](https://github.com/snakers4/silero-vad)
- [Denoiser](https://github.com/facebookresearch/denoiser)
- [MossFormer](https://github.com/alibabasglab/MossFormer)
- [Whisper](https://github.com/openai/whisper)

## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. 
