Metadata-Version: 2.1
Name: txt-utils
Version: 0.0.2
Summary: CLI to modify text files.
Author-email: Stefan Taubert <pypi@stefantaubert.com>
Maintainer-email: Stefan Taubert <pypi@stefantaubert.com>
License: MIT
Project-URL: Homepage, https://github.com/stefantaubert/txt-utils
Project-URL: Issues, https://github.com/stefantaubert/txt-utils/issues
Keywords: Preprocessing,Processing,Text-to-speech,Speech synthesis,Utils,Language,Linguistics
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Operating System :: OS Independent
Classifier: Operating System :: MacOS
Classifier: Operating System :: POSIX
Classifier: Operating System :: POSIX :: BSD
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: Unix
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: License :: OSI Approved :: MIT License
Requires-Python: <4,>=3.8
Description-Content-Type: text/markdown
License-File: LICENSE

# txt-utils

[![PyPI](https://img.shields.io/pypi/v/txt-utils.svg)](https://pypi.python.org/pypi/txt-utils)
[![PyPI](https://img.shields.io/pypi/pyversions/txt-utils.svg)](https://pypi.python.org/pypi/txt-utils)
[![MIT](https://img.shields.io/github/license/stefantaubert/txt-utils.svg)](https://github.com/stefantaubert/txt-utils/blob/master/LICENSE)
[![PyPI](https://img.shields.io/github/commits-since/stefantaubert/txt-utils/latest/master.svg)](https://github.com/stefantaubert/txt-utils/compare/v0.0.3...master)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7986310.svg)](https://doi.org/10.5281/zenodo.7986310)

CLI to modify text files.

## Features

- `merge`: merge multiple text files into one
- `extract-vocabulary`: extract unit vocabulary
- `transcribe`: transcribe units
- `replace`: replace text
- `replace-line`: replace text in a line
- `trim-units`: trim units
- `remove-units`: remove units
- `create-unit-occurrence-stats`: create unit occurrence statistics
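
To make the terms above concrete, here is a minimal Python sketch of what merging texts, extracting a unit vocabulary, and counting unit occurrences could look like. It is an illustration only, not the package's implementation or API: the function names (`merge_texts`, `split_units`, `extract_vocabulary`, `unit_occurrences`) are hypothetical, and it assumes that lines are separated by newlines and units by single spaces.

```python
from collections import Counter
from typing import Iterable, List


def merge_texts(texts: Iterable[str], sep: str = "\n") -> str:
    """Join the contents of several texts into one string (hypothetical helper)."""
    return sep.join(texts)


def split_units(text: str, line_sep: str = "\n", unit_sep: str = " ") -> List[str]:
    """Split text into lines, then split each line into non-empty units."""
    return [
        unit
        for line in text.split(line_sep)
        for unit in line.split(unit_sep)
        if unit
    ]


def extract_vocabulary(text: str) -> List[str]:
    """Return the distinct units of the text in sorted order."""
    return sorted(set(split_units(text)))


def unit_occurrences(text: str) -> Counter:
    """Count how often each unit occurs in the text."""
    return Counter(split_units(text))


# Assumed sample data; in practice these would be the contents of text files.
merged = merge_texts(["a b", "b c"])
vocabulary = extract_vocabulary(merged)
occurrences = unit_occurrences(merged)
```

Here a "unit" is a space-separated token; the real CLI may define units and separators differently.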

## Roadmap

- add tests
- create n-grams
- map units
- merge units right/left
- calculate units TF-IDF

## Installation

```sh
pip install txt-utils --user
```

## Usage

```sh
txt-utils-cli
```

## Contributing

If you notice an error, please don't hesitate to open an issue.

## Dependencies

- pandas
- tqdm
- ordered-set >=4.1.0
- pronunciation-dictionary >=0.0.4

## License

MIT License

## Acknowledgments

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410.

## Citation

If you want to cite this repository, you can use the BibTeX entry generated by GitHub (see *About → Cite this repository*).

## Changelog

- 0.0.2 (2023-05-30)
  - Bugfix: Merge multiple files
  - Added:
    - Support for Python 3.11
- 0.0.1 (2022-05-30)
  - Initial release
