Metadata-Version: 2.4
Name: multimodal-emotion-engine
Version: 0.2.0
Summary: Multimodal emotion recognition engine (audio + text + fusion)
Author-email: Satyam Sinha <satyamsinha9404@gmail.com>
License-Expression: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.9
Description-Content-Type: text/x-rst
License-File: LICENSE.txt
Requires-Dist: numpy
Requires-Dist: torch
Requires-Dist: sounddevice
Requires-Dist: librosa
Requires-Dist: transformers
Requires-Dist: pydantic
Requires-Dist: rich
Requires-Dist: typer
Provides-Extra: asr
Requires-Dist: faster-whisper; extra == "asr"
Requires-Dist: ctranslate2; extra == "asr"
Provides-Extra: nlp
Requires-Dist: transformers; extra == "nlp"
Requires-Dist: torch; extra == "nlp"
Provides-Extra: full
Requires-Dist: torch; extra == "full"
Requires-Dist: faster-whisper; extra == "full"
Requires-Dist: ctranslate2; extra == "full"
Requires-Dist: transformers; extra == "full"
Provides-Extra: audio
Requires-Dist: torch; extra == "audio"
Requires-Dist: sounddevice; extra == "audio"
Requires-Dist: librosa; extra == "audio"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Dynamic: license-file

multimodal_emotion_engine
========================================

Multimodal Emotion Recognition Engine combining **audio**, **speech-to-text (ASR)**,
and **text-based emotion classification**, with a simple CLI and Python API.

This project is designed as a modular engine that can:

- 🎙️ Listen to live microphone input
- 📂 Analyze audio files
- 🧠 Fuse audio and text emotion signals
- ⚡ Run locally with pretrained models
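
The engine's exact fusion rule is not documented here, and fusion weighting is still on the roadmap. A common baseline is weighted late fusion: take each model's per-label probabilities and average them. The sketch below assumes both models return plain ``{label: probability}`` dicts; ``fuse_late`` and its ``audio_weight`` parameter are hypothetical, not part of the package API.

.. code-block:: python

    from typing import Dict


    def fuse_late(
        audio_probs: Dict[str, float],
        text_probs: Dict[str, float],
        audio_weight: float = 0.5,  # hypothetical parameter, not a documented engine option
    ) -> Dict[str, float]:
        """Weighted average of two label->probability dicts, rescaled to sum to 1."""
        if not 0.0 <= audio_weight <= 1.0:
            raise ValueError("audio_weight must be between 0 and 1")
        labels = set(audio_probs) | set(text_probs)
        # A label missing from one modality contributes 0 from that modality.
        fused = {
            label: audio_weight * audio_probs.get(label, 0.0)
            + (1.0 - audio_weight) * text_probs.get(label, 0.0)
            for label in labels
        }
        total = sum(fused.values())
        if total == 0.0:
            return fused  # nothing to rescale
        return {label: score / total for label, score in fused.items()}


    audio = {"neutral": 0.6, "angry": 0.4}  # hypothetical audio-model output
    text = {"neutral": 0.2, "angry": 0.8}   # hypothetical text-model output
    fused = fuse_late(audio, text, audio_weight=0.5)
    print(max(fused, key=fused.get))  # angry

With ``audio_weight=1.0`` or ``0.0`` the result reduces to the audio-only or text-only prediction, which makes the weight easy to sanity-check.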

----

Features
--------

- **Audio emotion recognition**

  - Model based on MFCC and pitch features

- **Speech-to-text (ASR)**

  - Powered by ``faster-whisper``

- **Text emotion classification**

  - DistilRoBERTa-based emotion model

- **CLI interface**

  - ``emotion-engine listen``
  - ``emotion-engine analyze <file>``

- **Python API**
- **PyScaffold-based project structure**
- **Tested with pytest**
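
The audio model is listed as using MFCC and pitch features. To illustrate the pitch half only, the sketch below estimates the fundamental frequency (F0) of a frame by picking the lag with the highest autocorrelation. It is plain Python for readability and is not the contents of ``features.py``; a real pipeline would more likely call a library routine such as ``librosa.yin``.

.. code-block:: python

    import math
    from typing import Sequence


    def estimate_f0(samples: Sequence[float], sr: int, fmin: float = 50.0, fmax: float = 500.0) -> float:
        """Return the F0 estimate (Hz) at the lag that maximizes the raw autocorrelation."""
        min_lag = max(1, int(sr / fmax))                  # shortest period searched (highest pitch)
        max_lag = min(len(samples) - 1, int(sr / fmin))   # longest period searched (lowest pitch)
        if max_lag < min_lag:
            raise ValueError("signal too short for the requested pitch range")
        best_lag, best_score = min_lag, -math.inf
        for lag in range(min_lag, max_lag + 1):
            # Unnormalized autocorrelation of the signal with itself shifted by `lag`.
            score = sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))
            if score > best_score:
                best_lag, best_score = lag, score
        return sr / best_lag


    sr = 8000
    tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]  # 0.1 s of a 200 Hz sine
    print(estimate_f0(tone, sr))  # 200.0

Because the raw autocorrelation sums fewer terms at longer lags, it slightly favours the true period over its multiples on clean tones; noisy speech usually calls for a normalized method such as YIN.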

----

Project Structure
-----------------

::

    multimodal_emotion_engine/
    ├── src/multimodal_emotion_engine/
    │   ├── audio/
    │   │   ├── emotion_model.py
    │   │   ├── features.py
    │   │   └── mic.py
    │   ├── text/
    │   │   └── asr.py
    │   ├── engine.py
    │   ├── cli.py
    │   └── config.py
    ├── models/
    │   └── emotion_model.pt
    ├── tests/
    ├── pyproject.toml
    └── setup.cfg

----

Installation
------------

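1. Clone the repository
~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

    git clone https://github.com/satsin06/multimodal_emotion_engine.git
    cd multimodal_emotion_engine

----

2. Create and activate a virtual environment (recommended)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

    python -m venv venv
    source venv/bin/activate   # macOS / Linux
    venv\Scripts\activate      # Windows

----

3. Install the package
~~~~~~~~~~~~~~~~~~~~~~

Per the package metadata, the base install already depends on ``numpy``, ``torch``, ``sounddevice``, ``librosa``, ``transformers``, ``pydantic``, ``rich`` and ``typer``. Of the runtime extras, only ``asr`` and ``full`` add new packages (``faster-whisper`` and ``ctranslate2``).

Minimal install
^^^^^^^^^^^^^^^

.. code-block:: bash

    pip install -e .

With audio support
^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    pip install -e ".[audio]"

With ASR support
^^^^^^^^^^^^^^^^

.. code-block:: bash

    pip install -e ".[asr]"

Full installation (recommended)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    pip install -e ".[full]"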
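
To see which of the optional dependencies are importable in the current environment, you can probe their import names with the standard library. ``available_modules`` is an illustrative helper, not part of ``multimodal_emotion_engine``; note that the PyPI name ``faster-whisper`` is imported as ``faster_whisper``.

.. code-block:: python

    import importlib.util
    from typing import Dict, Iterable

    # Import names of the packages pulled in by the base install and the extras above.
    OPTIONAL_MODULES = ("torch", "transformers", "sounddevice", "librosa", "faster_whisper", "ctranslate2")


    def available_modules(names: Iterable[str] = OPTIONAL_MODULES) -> Dict[str, bool]:
        """Map each top-level module name to whether it can be found on the import path."""
        return {name: importlib.util.find_spec(name) is not None for name in names}


    if __name__ == "__main__":
        for name, ok in available_modules().items():
            print(f"{name:15} {'installed' if ok else 'missing'}")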

----

CLI Usage
---------

After installation, the CLI command is available as:

.. code-block:: bash

    emotion-engine --help

### 🎙️ Listen from microphone

.. code-block:: bash

    emotion-engine listen

This will:

- Record audio from your microphone
- Run emotion recognition
- Print emotion probabilities

Press **Ctrl+C** to stop recording; the microphone stream is closed cleanly on exit.
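
Closing the microphone on **Ctrl+C** usually follows a standard Python pattern: catch ``KeyboardInterrupt`` around the capture loop and release the device in a ``finally`` block. The sketch below shows that pattern; ``FakeMicStream`` is a hypothetical stand-in so the code runs without audio hardware, and the project's ``mic.py`` may be organized differently.

.. code-block:: python

    from typing import Callable, List


    class FakeMicStream:
        """Hypothetical stand-in for a real microphone stream (a real read() signature will differ)."""

        def __init__(self, chunk: List[float]) -> None:
            self.chunk = chunk
            self.closed = False

        def read(self) -> List[float]:
            return list(self.chunk)

        def close(self) -> None:
            self.closed = True


    def listen_loop(stream: FakeMicStream, on_chunk: Callable[[List[float]], None]) -> None:
        """Feed chunks to on_chunk until Ctrl+C, then always close the stream."""
        try:
            while True:
                on_chunk(stream.read())
        except KeyboardInterrupt:
            print("Stopped by user.")
        finally:
            stream.close()  # runs after Ctrl+C and after any other exception


    chunks_seen: List[List[float]] = []


    def handle(chunk: List[float]) -> None:
        chunks_seen.append(chunk)
        if len(chunks_seen) == 3:
            raise KeyboardInterrupt  # simulate the user pressing Ctrl+C


    stream = FakeMicStream([0.0, 0.1, -0.1])
    listen_loop(stream, handle)
    print(stream.closed)  # True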

----

### 📂 Analyze an audio file

.. code-block:: bash

    emotion-engine analyze path/to/audio.wav
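
Before analyzing a file, it can help to confirm its sample rate, channel count and duration. The standard-library ``wave`` module is enough for uncompressed PCM WAV files; ``inspect_wav`` is an illustrative helper, not part of the CLI, and the engine itself may load audio differently (``librosa`` is among its dependencies).

.. code-block:: python

    import wave
    from typing import NamedTuple


    class WavInfo(NamedTuple):
        sample_rate: int
        channels: int
        sample_width_bytes: int
        duration_s: float


    def inspect_wav(path: str) -> WavInfo:
        """Read the header of a PCM WAV file without decoding the samples."""
        with wave.open(path, "rb") as wf:
            frames = wf.getnframes()
            rate = wf.getframerate()
            return WavInfo(rate, wf.getnchannels(), wf.getsampwidth(), frames / rate)

For example, ``inspect_wav("path/to/audio.wav")`` returns the header values as a named tuple. Compressed formats such as MP3 cannot be read with ``wave``.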

----

Python API Usage
----------------

You can also use the engine programmatically:

.. code-block:: python

    from multimodal_emotion_engine.engine import EmotionEngine

    engine = EmotionEngine()
    result = engine.analyze_file("sample.wav")

    print(result)

Example output:

.. code-block:: python

    {
        "neutral": 0.38,
        "happy": 0.10,
        "sad": 0.11,
        "angry": 0.05
    }
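
The scores in this example add up to 0.64 rather than 1, so the dictionary may cover only some labels or may not be normalized. Assuming ``analyze_file`` returns a plain ``{label: score}`` dict like the one above, a small post-processing step can pick the dominant emotion and rescale the scores; ``top_emotion`` is a hypothetical helper, not part of the package API.

.. code-block:: python

    from typing import Dict, Tuple


    def top_emotion(scores: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
        """Return the highest-scoring label and the scores rescaled to sum to 1."""
        if not scores:
            raise ValueError("empty score dict")
        total = sum(scores.values())
        normalized = {label: s / total for label, s in scores.items()} if total > 0 else dict(scores)
        best = max(scores, key=scores.__getitem__)
        return best, normalized


    label, probs = top_emotion({"neutral": 0.38, "happy": 0.10, "sad": 0.11, "angry": 0.05})
    print(label)  # neutral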

----

Models
------

- **Audio emotion model**

  - Stored in ``models/emotion_model.pt``

- **Text emotion model**

  - ``j-hartmann/emotion-english-distilroberta-base`` (downloaded automatically)

- **ASR model**

  - ``faster-whisper`` model (downloaded on first use)

Models are cached under:

::

    ~/.cache/emotion-engine/
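
To check how much disk space the cached models use (for example, before clearing the cache), a short ``pathlib`` helper is enough. It assumes the ``~/.cache/emotion-engine/`` location stated above; ``cache_size_bytes`` is illustrative, and any files that other libraries store in their own cache directories are not counted.

.. code-block:: python

    from pathlib import Path
    from typing import Optional

    DEFAULT_CACHE = Path.home() / ".cache" / "emotion-engine"


    def cache_size_bytes(cache_dir: Optional[Path] = None) -> int:
        """Total size of all regular files under the cache directory (0 if it does not exist)."""
        root = cache_dir if cache_dir is not None else DEFAULT_CACHE
        if not root.is_dir():
            return 0
        return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


    print(f"{DEFAULT_CACHE}: {cache_size_bytes() / 1e6:.1f} MB")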

----

Running Tests
-------------

Install test dependencies:

.. code-block:: bash

    pip install -e ".[dev]"

Run tests:

.. code-block:: bash

    pytest
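
The project's own tests under ``tests/`` are not shown here. As an illustration of a check an emotion engine benefits from, the test below asserts that every returned score is a finite value between 0 and 1. ``sample_result`` is a hypothetical hard-coded stand-in for a real engine call, and a file name such as ``tests/test_scores.py`` is only a suggestion; pytest collects ``test_*`` functions without needing to import pytest.

.. code-block:: python

    import math
    from typing import Dict


    def sample_result() -> Dict[str, float]:
        # Hypothetical stand-in; a real test might call EmotionEngine().analyze_file(...).
        return {"neutral": 0.38, "happy": 0.10, "sad": 0.11, "angry": 0.05}


    def test_scores_are_probabilities() -> None:
        result = sample_result()
        assert result, "engine returned no scores"
        for label, score in result.items():
            assert isinstance(label, str)
            # Each score should be a finite probability in [0, 1].
            assert math.isfinite(score) and 0.0 <= score <= 1.0, label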

----

Development Notes
-----------------

- Python ≥ 3.9
- PyScaffold 4.6
- Type-annotated code (Pylance and mypy friendly)
- Modular design for future multimodal fusion

----

Roadmap
-------

- [ ] Real-time streaming emotion detection
- [ ] Multilingual ASR
- [ ] Emotion fusion weighting
- [ ] REST API
- [ ] Model retraining pipeline

----

License
-------

MIT License. See ``LICENSE.txt`` for details.

----

Author
------

**Satyam Sinha**

- GitHub: https://github.com/satsin06
- Email: satyamsinha9404@gmail.com
