Metadata-Version: 2.1
Name: rVADfast
Version: 0.0.1
Summary: rVADfast - a fast and robust unsupervised VAD
Author-email: Zheng-Hua Tan <zt@es.aau.dk>, Achintya Kumar Sarkar <sarkar.achintya@gmail.com>, Holger Severin Bovbjerg <hsbo@es.aau.dk>
Maintainer-email: Holger Severin Bovbjerg <hsbo@es.aau.dk>, Zheng-Hua Tan <zt@es.aau.dk>
License: MIT
Project-URL: Homepage, https://github.com/zhenghuatan/rVADfast/
Project-URL: Repository, https://github.com/zhenghuatan/rVADfast.git
Project-URL: Issues, https://github.com/zhenghuatan/rVADfast/issues
Project-URL: Source, https://github.com/zhenghuatan/rVADfast/
Keywords: Audio,Tools,VAD,Speech,Speech Processing
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Multimedia :: Sound/Audio
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.23.5
Requires-Dist: scipy>=1.10.0
Requires-Dist: audiofile>=1.1.1
Requires-Dist: tqdm>=4.64.1

# rVADfast
rVADfast is a Python library for fast, unsupervised, robust voice activity detection (rVAD), as presented in the 
paper "rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method," Computer Speech and Language, 2020. 
More information is available on [the rVAD GitHub page](https://github.com/zhenghuatan/rVAD). 

The rVAD method consists of two passes of denoising followed by a VAD stage. It has been applied as a preprocessor for 
a wide range of applications, including speech recognition, speaker identification, language identification, age and 
gender identification, self-supervised learning, human-robot interaction, and audio archive segmentation; 
see the citing works on [Google Scholar](https://scholar.google.com/citations?view_op=view_citation&hl=en&user=fugL2E8AAAAJ&citation_for_view=fugL2E8AAAAJ:-mN3Mh-tlDkC).  

The method is unsupervised to make it applicable to a broad range of acoustic environments, 
and it is optimized considering both noisy and clean conditions. 

Out of the box, rVAD ranked 4th out of 27 supervised and unsupervised systems 
in a Fearless Steps Speech Activity Detection Challenge. 

As of 2023, the rVAD paper ranks 6th among [the most cited articles from Computer Speech and Language published since 2018](https://www.journals.elsevier.com/computer-speech-and-language/most-cited-articles).


## Usage
The rVADfast method is available as a Python package, installable via ```pip install rVADfast```.
After installation, you can import the rVADfast VAD class with ```from rVADfast import rVADfast``` 
and instantiate a VAD instance, e.g. ```vad = rVADfast()```.  
The package also contains functionality to process folders of audio files, either to generate VAD labels 
or to trim non-speech segments from audio files.
This is done by importing the ```rVADfast.process``` module, which has two functions for processing audio files, 
namely ```process.rVADfast_single_process``` and ```process.rVADfast_multi_process```, 
with the latter utilizing multiple CPUs for processing.
Additionally, a processing script is available from the command line as ```rVADfast_process```.

A concrete example of how to use the rVADfast package can be found in ```/notebooks```.
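
To give a rough idea of what trimming non-speech segments involves, the sketch below turns a frame-level label sequence into speech segments and keeps only the matching samples. It is not the package's implementation: the helper names `labels_to_segments` and `trim_non_speech`, the `hop_length` parameter, and the assumption that labels are one 0/1 value per frame are all hypothetical and chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def labels_to_segments(labels: Sequence[int]) -> List[Tuple[int, int]]:
    """Merge runs of speech frames (label 1) into (start, end) frame indices.

    The start index is inclusive and the end index is exclusive.
    """
    segments = []
    start = None
    for i, label in enumerate(labels):
        if label and start is None:
            start = i  # a speech run begins
        elif not label and start is not None:
            segments.append((start, i))  # the speech run ended at frame i
            start = None
    if start is not None:
        # The signal ends while still inside a speech run.
        segments.append((start, len(labels)))
    return segments


def trim_non_speech(samples: Sequence[float],
                    labels: Sequence[int],
                    hop_length: int) -> list:
    """Keep only the samples that fall inside frames labelled as speech.

    hop_length is the assumed number of samples per label frame,
    e.g. 160 for a 10 ms frame shift at 16 kHz.
    """
    kept = []
    for start, end in labels_to_segments(labels):
        kept.extend(samples[start * hop_length:end * hop_length])
    return kept
```

Working with frame indices rather than seconds keeps the arithmetic exact; converting a segment to seconds is a matter of multiplying by the frame shift used to produce the labels.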

## References
1) Z.-H. Tan, A. K. Sarkar and N. Dehak, "rVAD: an unsupervised segment-based robust voice activity detection method," Computer Speech and Language, vol. 59, pp. 1-21, 2020. 
2) Z.-H. Tan and B. Lindberg, "Low-complexity variable frame rate analysis for speech recognition and voice activity detection," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 5, pp. 798-807, 2010.
