Metadata-Version: 2.1
Name: fad-pytorch
Version: 0.0.2
Summary: Frechet Audio Distance evaluation in PyTorch
Home-page: https://github.com/drscotthawley/fad_pytorch
Author: Scott H. Hawley
Author-email: scott.hawley@belmont.edu
License: Apache Software License 2.0
Keywords: nbdev jupyter notebook python
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: aeiou
Requires-Dist: torch (>=1.13.1)
Requires-Dist: torchaudio (>=0.13.1)
Requires-Dist: laion-clap
Requires-Dist: accelerate
Requires-Dist: torchlibrosa
Provides-Extra: dev

fad_pytorch
================


[Original FAD paper (PDF)](https://arxiv.org/pdf/1812.08466.pdf)

## Install

``` sh
pip install fad_pytorch
```

## About

(Intended) Features:

- runs in parallel on multiple GPUs
- supports 48 kHz sample rates and stereo audio when possible
- supports CLAP embeddings, in addition to VGGish and PANN
- favors ops in PyTorch instead of numpy
- allows dataset access via WebDataset (over s3://)
- runs on CPU, CUDA, or MPS

This is designed to be run as three command-line scripts in succession.
The last two (`fad_embed` and `fad_score`) are probably what most people
will want:

1.  `fad_gen`: produces directories of real & fake audio
2.  `fad_embed <real_audio_dir> <fake_audio_dir>`: produces directories
    of *embeddings* of real & fake audio
3.  `fad_score <real_emb_dir> <fake_emb_dir>`: reads the embeddings and
    computes the FAD score between the real (“$r$”) and fake (“$f$”)
    sets, where $\mu$ and $\Sigma$ are the mean and covariance of each
    set's embeddings:

$$ \mathrm{FAD} = \lVert \mu_r - \mu_f \rVert^2 + \mathrm{tr}\left(\Sigma_r + \Sigma_f - 2 \sqrt{\Sigma_r \Sigma_f}\right) $$
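
To make the formula concrete, here is a minimal sketch of the score computation. It is not the code that `fad_score` runs: it uses NumPy rather than PyTorch to stay dependency-light, the function name `frechet_distance` is hypothetical, and the embeddings are random stand-ins assumed to have shape `(n_samples, dim)`. It also assumes the trace of the matrix square root can be taken as the sum of the square roots of the eigenvalues of $\Sigma_r \Sigma_f$, which holds when both covariances are positive semi-definite.

```python
import numpy as np

def frechet_distance(emb_real, emb_fake):
    """Frechet distance between Gaussians fit to two (n_samples, dim) arrays.

    Hypothetical helper for illustration; not part of the fad_pytorch API.
    """
    # Mean and covariance of each embedding set
    mu_r, mu_f = emb_real.mean(axis=0), emb_fake.mean(axis=0)
    sigma_r = np.cov(emb_real, rowvar=False)
    sigma_f = np.cov(emb_fake, rowvar=False)

    # Squared distance between the means
    diff = mu_r - mu_f
    mean_term = diff @ diff

    # tr(sqrt(Sigma_r Sigma_f)) as the sum of square roots of the product's
    # eigenvalues; tiny negative or imaginary parts are numerical noise
    eigvals = np.linalg.eigvals(sigma_r @ sigma_f)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    return float(mean_term + np.trace(sigma_r) + np.trace(sigma_f) - 2.0 * tr_sqrt)

# Random stand-ins for embeddings (assumption: 500 clips, 8-dim embeddings)
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
fake = rng.normal(loc=0.5, size=(500, 8))

same = frechet_distance(real, real)     # identical sets: approximately zero
shifted = frechet_distance(real, fake)  # shifted mean: clearly positive
```

Comparing a set with itself should give a score near zero, while a shift in the mean raises the score mostly through the first term.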

## Related Repos

There are several other FAD implementations, but this one is mine. These
repos didn’t have all the features I wanted, but I used them for
inspiration:

- https://github.com/gudgud96/frechet-audio-distance
- https://github.com/google-research/google-research/tree/master/frechet_audio_distance
  (accompanies the [original FAD paper](https://arxiv.org/pdf/1812.08466.pdf))
- https://github.com/AndreevP/speech_distances
