Metadata-Version: 2.1
Name: neural-sync
Version: 0.1.2
Summary: A library to standardize the usage of various machine learning models
Home-page: https://github.com/HasanMaqsood8747/neuralsync.git
Author: Hasan Maqsood
Author-email: hasanmaqsood2001@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10.12
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: torch ==2.4.1
Requires-Dist: transformers ==4.44.2
Requires-Dist: torchvision ==0.19.1
Requires-Dist: numpy ==1.26.4
Requires-Dist: pyannote.audio ==3.3.1
Requires-Dist: torchaudio ==2.4.1
Requires-Dist: accelerate ==0.34.2
Requires-Dist: openai ==1.43.1
Requires-Dist: huggingface-hub ==0.23.4
Requires-Dist: pydub ==0.25.1
Requires-Dist: SpeechRecognition ==3.10.4
Requires-Dist: Cython ==3.0.11
Requires-Dist: nemo-toolkit[all] ==1.23.0
Requires-Dist: diffusers ==0.30.2
Requires-Dist: sentencepiece ==0.2.0

# Neural_sync Library

## Overview

The Neural_sync Library provides a unified interface for working with different machine learning models across various tasks. This library aims to standardize the way models are loaded, parameters are set, and results are generated, enabling a consistent approach regardless of the model type.

## Supported Models

### Text-to-Speech (TTS)
- **FastPitch & HiFi-GAN**: Convert text to high-quality speech audio.

### Speech-to-Text (STT)
- **Distil-Large-V2**: Transcribe speech to text.
- **Openai/whisper-large-v3**: Transcribe speech to text.
- **Nemo_asr**: Transcribe speech to text.

### Speaker Diarization
- **Pyannote 3.1**: Identify and separate speakers in an audio file.

### Voice Activity Detection (VAD)
- **Silero VAD**: Detect speech segments in audio files.

### Text-to-Image Generation
- **Stable Diffusion Medium-3**: Generate images from text prompts.

### Transformers-based Models
- **llama2, llama3, llama3_1**
- **Mistralv2, Mistralv3**
- **Phi3.5 Mini**
- **AgentLM 7b**

### Quantized Models
- **LLaMA 2, LLaMA 3, LLaMA 3.1**
- **Mistral v2, Mistral v3**
- **Phi3.5 Mini**
- **AgentLM 7b**

## Usage Examples
### 1. Transformers
Transformers models are versatile and can be used for various NLP tasks. Here's an example using the LLaMA 3 model

```bash
from parent.factory import ModelFactory
model = ModelFactory.get_model("llama3_1_8b_instruct")  # No need to specify model_path
params = ModelFactory.load_params_from_json('parameters.json')
model.set_params(**params)
response = model.generate(
     prompt="What is Artificial Intelligence?",
     system_prompt="Answer in German."
 )
print(response)
```
Similarly use following string for other models of transformers:
- agentlm
- Phi3_5
- llama2
- llama3_8b_instruct
- llama3_1_8b_instruct
- Mistralv2
- Mistralv3



### 2. FastPitch (Text-to-Speech)

FastPitch is used for generating speech from text:
```bash
from parent.factory import ModelFactory
model = ModelFactory.get_model("fastpitch")

response = model.generate(text="Hello, this is Hasan Maqsood",output_path="Hasan.wav")
```


### 3. Voice Activity Detection (VAD)

Silero VAD is used for detecting speech timestamps in audio files:

```bash
from parent.factory import ModelFactory
model = ModelFactory.get_model("silero_vad")

response = model.generate("Youtube.wav")
print("Speech Timestamps:", response)
```

### 4. Speaker Diarization

Pyannote is used for speaker diarization:
```bash
from parent.factory import ModelFactory
model = ModelFactory.get_model("pyannote",use_auth_token="Enter Your authentication token")
response = model.generate("Hasan.wav", visualize =True)
```

### 5. Automatic Speech Recognition (Speech-To-Text)

Nemo ASR is used for transcribing audio to text:
```bash
from parent.factory import ModelFactory
model = ModelFactory.get_model("nemo_asr")
response = model.generate(audio_files=["Hasan.wav"])
print(response)
```

### 6. Distil/ Openai whsiper (Speech-To-Text)

 Distil-whisper is used for transcribing audio to text:
```bash
from parent.factory import ModelFactory
model = ModelFactory.get_model("distil_whisper")
response = model.generate("Youtube.wav")
print("Transcription:", response)
```
 Openai-whisper is also used for transcribing audio to text:
```bash
 from parent.factory import ModelFactory
 model = ModelFactory.get_model("openai_whisper")
 response = model.generate("Youtube.wav")
 print("Transcription:", response)
```
### 7. Text-to-Image Generation

Stable Diffusion is used for generating images from text prompts:

```bash
from parent.factory import ModelFactory
model = ModelFactory.get_model("sd_medium3")
response = model.generate(prompt ="House")
image_path = "new_house.png"
response.save(image_path)
```



