Metadata-Version: 2.4
Name: podangelex_JustAnotherCoderTheThird
Version: 1.0.5
Summary: Automatically remove profanity and toxic content from audio files using Whisper and Detoxify
Author-email: Dante Edmiston <dante.edmiston@gmail.com>
License: CC0
Project-URL: Homepage, https://github.com/IDKCoding-commits/PodangelEX
Project-URL: Repository, https://github.com/IDKCoding-commits/PodangelEX
Project-URL: Issues, https://github.com/IDKCoding-commits/PodangelEX/issues
Keywords: audio,profanity,toxicity,whisper,detoxify,speech-to-text
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: whisper-timestamped>=1.15.9
Requires-Dist: openai-whisper>=20231114
Requires-Dist: detoxify>=0.5.1
Requires-Dist: torch>=2.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: black>=23.0; extra == "dev"
Requires-Dist: isort>=5.0; extra == "dev"
Requires-Dist: flake8>=6.0; extra == "dev"
Dynamic: license-file

# PodAngelEX

PodAngelEX is a passion project of mine, made specifically for muting inappropriate words and segments in audio files.

# How it works

PodAngel takes input files and transcribes them using OpenAI's Whisper model. Depending on your hardware, you can run multiple workers asynchronously, which can drastically increase how many files get 'cleaned' in a given amount of time.
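
As a rough sketch of the multi-worker idea (not PodAngel's actual code), a fixed-size pool can hand files to workers as they become free. Here `clean_file` and `clean_all` are hypothetical names, and `clean_file` is only a stand-in for the real transcribe-and-mute step. A thread pool is an assumption too; the real program may schedule its workers differently.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def clean_file(path: Path) -> str:
    """Hypothetical stand-in for the per-file transcribe-and-mute pipeline."""
    return f"cleaned {path.name}"


def clean_all(paths, workers=2):
    """Process files concurrently with a fixed number of workers.

    Results come back in the same order as `paths`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(clean_file, paths))
```

Calling `clean_all([Path("a.mp3"), Path("b.mp3")], workers=2)` runs both files at once. Raising `workers` only helps while the machine has the memory to hold that many jobs.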

Once there is a list of all words and sentences from the audio file, the program first compares each word against a list of swears. For every swear it finds, it 'makes note' of that word's start and end time. Then, so as not to skew the context catching, it 'erases' those swear words from the transcription.
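
The word-matching step could look something like the sketch below. This is an illustration, not the program's real implementation: `SWEARS` holds placeholder entries, `split_swears` is a hypothetical name, and the shape of each word (a dict with `text`, `start`, and `end` keys) is an assumption.

```python
import string

# Hypothetical word list with placeholder entries, not PodAngel's real list.
SWEARS = {"darn", "heck"}


def split_swears(words, swears=SWEARS):
    """Separate listed words from the rest of a timestamped transcript.

    `words` is a list of dicts with "text", "start", and "end" keys
    (an assumed shape). Returns (mute_spans, remaining_text).
    """
    mute_spans = []
    kept = []
    for word in words:
        # Normalize so that "Darn!" still matches "darn".
        token = word["text"].strip().strip(string.punctuation).lower()
        if token in swears:
            # Remember when the word was spoken, and leave it out of the text.
            mute_spans.append((word["start"], word["end"]))
        else:
            kept.append(word["text"].strip())
    return mute_spans, " ".join(kept)
```

The returned spans are what gets muted later, and the remaining text is what would go on to the context check.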

Then, the 'new' transcription is passed to Detoxify. Detoxify reads a sentence and assigns multiple values to it, denoting how vulgar it is. If any of those values pass the user-set threshold, the segment is flagged for muting.
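
The threshold check itself is simple enough to sketch. The function below assumes the scores arrive as a dictionary of label to value, which is the shape Detoxify's `predict` returns for a single string. `is_flagged` is a hypothetical name, the labels in the usage note are only illustrative, and treating "pass" as strictly greater than the threshold is my assumption.

```python
def is_flagged(scores, thresholds):
    """Return True if any score is strictly above its threshold.

    Labels that have no configured threshold are ignored.
    """
    return any(
        score > thresholds[label]
        for label, score in scores.items()
        if label in thresholds
    )
```

For example, `is_flagged({"toxicity": 0.91}, {"toxicity": 0.8})` flags the segment. With a threshold of 1.0 nothing can be flagged, because no score goes above 1.0.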

Once we have the words and segments to be muted, the start and end times for each are passed to FFmpeg, which cuts and concatenates the audio file accordingly, resulting in a much more socially appropriate file.
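
Cutting and concatenating means working out which parts of the file to keep. The sketch below shows one way to turn a list of flagged spans into the ranges that survive. It is an illustration under my own assumptions (`keep_segments` is a hypothetical name, and times are in seconds), and it leaves out the FFmpeg call entirely.

```python
def keep_segments(mute_spans, duration):
    """Return the (start, end) ranges left after cutting out `mute_spans`.

    Overlapping or touching spans are merged first.
    """
    merged = []
    for start, end in sorted(mute_spans):
        # Clamp each span to the bounds of the file.
        start, end = max(0.0, start), min(duration, end)
        if start >= end:
            continue  # span is empty or lies outside the file
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    kept = []
    cursor = 0.0
    for start, end in merged:
        if start > cursor:
            kept.append((cursor, start))
        cursor = end
    if cursor < duration:
        kept.append((cursor, duration))
    return kept
```

Merging first matters because a flagged word often sits inside a flagged sentence, and cutting the same stretch twice would shift everything after it.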

*Please note: I cannot promise absolute accuracy. In my experience using Whisper's Turbo model, the program occasionally misses one or two swears.

# What can be configured

I made PodAngel to be highly configurable. You can configure:

1. Worker count. There is, functionally, no limit to the number of workers you can have active at one time. They still use VRAM though, so don't just set it to a hundred and let it run if your hardware can't support that.

2. Toxicity thresholds. Each threshold from Detoxify can be made stricter or more lenient. Each one is a float from 0.0 to 1.0, and the lower the number, the stricter the context catching. 1.0 will let everything through, and 0.0 will let almost nothing through.

3. File paths. You can set a file path if you'd like to move PodAngel's 'workspace'. This will move every file/folder that PodAngel relies on to that new path, so if you change it, consider pointing it at a fresh folder.
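
To make the three options concrete, here is a hypothetical settings object with basic validation. None of the names or default values below come from PodAngel itself; they are assumptions chosen only to show the allowed ranges described above.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Settings:
    """Hypothetical container for the three configurable options."""

    workers: int = 1
    # Illustrative default only; the label and value are assumptions.
    thresholds: dict = field(default_factory=lambda: {"toxicity": 0.8})
    workspace: Path = Path("podangel_workspace")

    def __post_init__(self):
        if self.workers < 1:
            raise ValueError("workers must be at least 1")
        for label, value in self.thresholds.items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"threshold for {label!r} must be in [0.0, 1.0]")
        # Accept plain strings as well as Path objects.
        self.workspace = Path(self.workspace)
```

Something like `Settings(workers=2, thresholds={"toxicity": 0.5}, workspace="clean_audio")` would pass, while a threshold of 1.5 raises a `ValueError`.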

# Install and How to Run

To install, simply run the following command:

    pip install podangelex-JustAnotherCoderTheThird

Then, to run it, just run this command in your terminal:

    podangel

On the first run, it'll guide you through setting up the program and create all the files/folders it needs. After that, just put files in the input folder, run the program, wait a bit, and enjoy your clean audio.

## License

CC0 1.0 Universal - Public Domain

## AI Declaration

I used some AI to help debug the code, write commit messages on GitHub, and organize the files for package uploading.
