Metadata-Version: 2.1
Name: fad_pytorch
Version: 0.0.6
Summary: Frechet Audio Distance evaluation in PyTorch
Home-page: https://github.com/drscotthawley/fad_pytorch
Author: Scott H. Hawley
Author-email: scott.hawley@belmont.edu
License: Apache Software License 2.0
Keywords: nbdev jupyter notebook python
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Provides-Extra: dev
License-File: LICENSE

fad_pytorch
================

<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

[Original FAD paper (PDF)](https://arxiv.org/pdf/1812.08466.pdf)

## Install

``` sh
pip install fad_pytorch
```

## Features:

- runs in parallel on multiple processors and multiple GPUs (via
  `accelerate`)
- supports multiple embedding methods:
  - VGGish and PANN, both mono @ 16kHz
  - OpenL3 and (LAION-)CLAP, stereo @ 48kHz
- uses publicly available pretrained checkpoints for music (+ other
  sources) for those models. (If you want speech, submit a PR or an
  Issue; I don’t do speech.)
- favors ops in PyTorch rather than numpy (or tensorflow)
- `fad_gen` supports local data read or WebDataset (audio data stored in
  S3 buckets)
- runs on CPU, CUDA, or MPS

## Instructions:

This is designed to be run as 3 command-line scripts in succession. The
latter 2 (`fad_embed` and `fad_score`) are probably what most people
will want:

1.  `fad_gen`: produces directories of real & fake audio (given real
    data). See `fad_gen`
    [documentation](https://drscotthawley.github.io/fad_pytorch/fad_gen.html)
    for calling sequence.
2.  `fad_embed [options] <real_audio_dir> <fake_audio_dir>`: produces
    directories of *embeddings* of real & fake audio
3.  `fad_score [options] <real_emb_dir> <fake_emb_dir>`: reads the
    embeddings & generates FAD score, for real (“$r$”) and fake (“$f$”):

$$ \mathrm{FAD} = \lVert \mu_r - \mu_f \rVert^2 + \mathrm{tr}\left(\Sigma_r + \Sigma_f - 2 \sqrt{\Sigma_r \Sigma_f}\right) $$
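
To make the formula above concrete, here is a minimal NumPy sketch of the same quantity. It is not the code `fad_score` runs (this package favors PyTorch ops); the function name `frechet_distance` and the `(n_samples, dim)` array layout are assumptions made for illustration. The trace of the matrix square root is computed as the sum of the square roots of the eigenvalues of $\Sigma_r \Sigma_f$, which avoids forming the square root explicitly.

``` python
import numpy as np

def frechet_distance(emb_real, emb_fake):
    """Frechet distance between Gaussians fit to two (n_samples, dim) arrays.

    Illustrative helper only; not part of the fad_pytorch API.
    """
    mu_r = emb_real.mean(axis=0)
    mu_f = emb_fake.mean(axis=0)
    # Covariance over samples (rows); atleast_2d keeps the dim == 1 case a matrix
    sigma_r = np.atleast_2d(np.cov(emb_real, rowvar=False))
    sigma_f = np.atleast_2d(np.cov(emb_fake, rowvar=False))
    # tr(sqrt(Sigma_r Sigma_f)) = sum of sqrt of the eigenvalues of the product.
    # Tiny negative or imaginary parts are numerical noise, so drop/clip them.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_f)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_r - mu_f) ** 2)
    return float(mean_term + np.trace(sigma_r) + np.trace(sigma_f) - 2.0 * tr_sqrt)
```

Calling `frechet_distance(x, x)` on any array `x` should give a value very close to zero, and shifting every embedding by a constant vector should add the squared length of that shift.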

## Documentation

See the [Documentation
Website](https://drscotthawley.github.io/fad_pytorch/).

## Comments / FAQ / Troubleshooting

- “`RuntimeError: CUDA error: invalid device ordinal`”: This happens
  when you have a “bad node” on an AWS cluster. [Haven’t yet figured out
  what causes it or how to fix
  it](https://discuss.huggingface.co/t/solved-accelerate-accelerator-cuda-error-invalid-device-ordinal/21509/1).
  Workaround: Just add the current node to your SLURM `--exclude` list,
  exit and retry. Note: it may take as many as 5 to 7 retries before you
  get a “good node”.
- “FAD scores obtained from different embedding methods are *wildly*
  different!” …Yea. It’s not obvious that scores from different
  embedding methods should be comparable. Rather, compare different
  groups of audio files using the same embedding method, and/or check
  that FAD scores go *down* as similarity improves.
- “FAD score for the same dataset repeated (twice) is not exactly zero!”
  …Yea. There seems to be an uncertainty of around +/- 0.008. I’d say,
  don’t quote any numbers past the first decimal point.
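
As a toy illustration of the “scores go *down* as similarity improves” check above, the sketch below uses the one-dimensional special case of the Frechet distance, $(\mu_r - \mu_f)^2 + (\sigma_r - \sigma_f)^2$, with only the Python standard library. The helper `frechet_1d` and the shifted “fake” sets are hypothetical; no audio or embedding model is involved, so this says nothing about the size of the uncertainty you will see from real embeddings.

``` python
import random
import statistics

def frechet_1d(real, fake):
    """1-D Frechet distance: (mu_r - mu_f)^2 + (sigma_r - sigma_f)^2.

    Illustrative helper only; not part of the fad_pytorch API.
    """
    mu_r, mu_f = statistics.fmean(real), statistics.fmean(fake)
    sd_r, sd_f = statistics.pstdev(real), statistics.pstdev(fake)
    return (mu_r - mu_f) ** 2 + (sd_r - sd_f) ** 2

rng = random.Random(0)
real = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Hypothetical "fake" sets: the real values offset by a shrinking shift,
# so each set is more similar to the real data than the one before it.
shifts = (2.0, 1.0, 0.5, 0.0)
scores = [frechet_1d(real, [x + s for x in real]) for s in shifts]

for s, score in zip(shifts, scores):
    print(f"shift={s:.1f}  score={score:.3f}")
```

Because each fake set here is just a shifted copy of the real one, the score reduces to the squared shift and falls steadily to zero. With real embeddings, apply the same kind of ordering check using a single embedding method throughout.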

## Contributing

This repo is still fairly “bare bones” and will benefit from more
documentation and features as time goes on. Note that it is written
using [nbdev](https://nbdev.fast.ai/), so the things to do are:

1.  Fork this repo
2.  Clone your fork to your (local) machine
3.  Install nbdev: `python3 -m pip install -U nbdev`
4.  Make changes by editing the notebooks in `nbs/`, not the `.py` files
    in `fad_pytorch/`.
5.  Run `nbdev_export` to export notebook changes to `.py` files
6.  For good measure, run `nbdev_install_hooks` and `nbdev_clean` -
    especially if you’ve *added* any notebooks.
7.  Do a `git status` to see all the `.ipynb` and `.py` files that need
    to be added & committed
8.  `git add` those files and then `git commit`, and then `git push`
9.  Take a look in your fork’s GitHub Actions tab, and see if the “test”
    and “deploy” CI runs finish properly (green light) or fail (red
    light)
10. Once you get green lights, send in a Pull Request!

*Feel free to ask me for tips with nbdev; it has quite a learning curve.
You can also ask on [fast.ai forums](https://forums.fast.ai/) and/or
[fast.ai
Discord](https://discord.com/channels/689892369998676007/887694559952400424)*

## Citations / Blame / Disclaimer

This repo is 2 weeks old. I’m not ready for this to be cited in your
papers. I’d hate for there to be some mistake I haven’t found yet.
Perhaps a later version will have citation info. For now, instead,
there’s:

**Disclaimer:** Results from this repo are still a work in progress.
While every effort has been made to test model outputs, the author takes
no responsibility for mistakes. If you want to double-check via another
source, see “Related Repos” below.

## Related Repos

There are \[several\] others, but this one is mine. These repos didn’t
have all the features I wanted, but I used them for inspiration:

- https://github.com/gudgud96/frechet-audio-distance
- https://github.com/google-research/google-research/tree/master/frechet_audio_distance:
  Goes with [Original FAD paper](https://arxiv.org/pdf/1812.08466.pdf)
- https://github.com/AndreevP/speech_distances
