Metadata-Version: 2.4
Name: teraflopai-data
Version: 0.1.0
Summary: A petabyte-scale data processing framework for AI models using Ray.
Author-email: Teraflop AI <enrico@teraflop.ai>
Classifier: Typing :: Typed
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: ray[data]>=2.47.1
Requires-Dist: transformers>=4.53.2
Provides-Extra: text
Requires-Dist: warcio>=1.7.5; extra == "text"
Requires-Dist: trafilatura>=2.0.0; extra == "text"
Requires-Dist: chonkie>=1.1.0; extra == "text"
Requires-Dist: sentence-transformers>=5.0.0; extra == "text"
Requires-Dist: vllm>=0.9.2; extra == "text"
Provides-Extra: audio
Requires-Dist: silero-vad>=5.1.2; extra == "audio"
Provides-Extra: video
Requires-Dist: scenedetect>=0.6.6; extra == "video"
Provides-Extra: image
Requires-Dist: vllm>=0.9.2; extra == "image"
Requires-Dist: warcio>=1.7.5; extra == "image"
Provides-Extra: all
Requires-Dist: teraflopai-data[audio,image,text,video]; extra == "all"
Dynamic: license-file

# Open-data

A petabyte-scale data processing framework for AI models, built on Ray.

## Installation
```bash
pip install teraflopai-data
```
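
The package metadata declares optional extras (`text`, `audio`, `video`, `image`, `all`), which pip can install with the usual bracket syntax, for example `pip install "teraflopai-data[text]"`. As a rough sketch, the snippet below shows one way to check which extras an installed distribution declares, using only the standard library. The helper name `declared_extras` is hypothetical and not part of this package; it returns an empty list when the distribution is not installed.

```python
from importlib.metadata import PackageNotFoundError, metadata


def declared_extras(dist_name: str) -> list[str]:
    """Return the extras declared by an installed distribution.

    Hypothetical helper for illustration; not part of teraflopai-data.
    Returns an empty list if the distribution is not installed.
    """
    try:
        # "Provides-Extra" may appear several times in the metadata,
        # so get_all() is used; it returns None when the field is absent.
        extras = metadata(dist_name).get_all("Provides-Extra") or []
    except PackageNotFoundError:
        return []
    return sorted(extras)


if __name__ == "__main__":
    # Distribution name taken from the "Name" field of the package metadata.
    print(declared_extras("teraflopai-data"))
```

If the list comes back empty, the distribution is probably not installed in the active environment.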
