Metadata-Version: 2.1
Name: kosmos2-torch
Version: 0.0.1
Summary: Kosmos - Pytorch
Home-page: https://github.com/kyegomez/Kosmos2.5
License: MIT
Keywords: artificial intelligence,deep learning,optimizers,Prompt Engineering
Author: Kye Gomez
Author-email: kye@apac.ai
Requires-Python: >=3.6,<4.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: torch
Requires-Dist: zeta
Project-URL: Repository, https://github.com/kyegomez/Kosmos2.5
Description-Content-Type: text/markdown

[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)

# Kosmos2.5
My implementation of Kosmos2.5 from Microsoft Research, as described in the paper "KOSMOS-2.5: A Multimodal Literate Model".

[Paper Link](https://arxiv.org/pdf/2309.11419.pdf)

# Appreciation
* Lucidrains
* Agorians



# Install
`pip install kosmos2-torch`

# Dataset Strategy
The table below summarizes the datasets used in the paper "KOSMOS-2.5: A Multimodal Literate Model", with the modality, size, domain, and source of each:

| Dataset | Modality | # Samples | Domain | Source | 
|-|-|-|-|-|  
| IIT-CDIP | Text + Layout | 27.6M pages | Scanned documents | [Link](https://ir.nist.gov/cdip/)|
| arXiv papers | Text + Layout | 20.9M pages | Research papers | [Link](https://arxiv.org/) |  
| PowerPoint slides | Text + Layout | 6.2M pages | Presentation slides | Web crawl |
| General PDF | Text + Layout | 155.2M pages | Diverse PDF files | Web crawl |
| Web screenshots | Text + Layout | 100M pages | Webpage screenshots | [Link](https://www.tensorflow.org/datasets/catalog/c4) |
| README | Text + Markdown | 2.9M files | GitHub README files | [Link](https://github.com/) |  
| DOCX | Text + Markdown | 1.1M pages | WORD documents | Web crawl |
| LaTeX | Text + Markdown | 3.7M pages | Research papers | [Link](https://arxiv.org/) |
| HTML | Text + Markdown | 6.3M pages | Webpages | [Link](https://www.tensorflow.org/datasets/catalog/c4) |
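
To get a feel for how the data is distributed, the figures in the table can be tallied per modality. The snippet below is a minimal sketch, not part of this package: `DATASETS`, `totals_by_modality`, and `size_shares` are hypothetical names introduced for illustration, and the sizes are copied from the table as-is (in millions, mixing pages and files, since the README row is counted in files). It says nothing about the sampling ratios actually used in the paper.

```python
from collections import defaultdict

# (dataset, modality, size in millions of pages or files), copied from the table.
# Hypothetical structure for illustration only; not an API of this package.
DATASETS = [
    ("IIT-CDIP", "Text + Layout", 27.6),
    ("arXiv papers", "Text + Layout", 20.9),
    ("PowerPoint slides", "Text + Layout", 6.2),
    ("General PDF", "Text + Layout", 155.2),
    ("Web screenshots", "Text + Layout", 100.0),
    ("README", "Text + Markdown", 2.9),
    ("DOCX", "Text + Markdown", 1.1),
    ("LaTeX", "Text + Markdown", 3.7),
    ("HTML", "Text + Markdown", 6.3),
]


def totals_by_modality(rows):
    """Sum dataset sizes (in millions) per modality, rounded to one decimal."""
    totals = defaultdict(float)
    for _name, modality, size_m in rows:
        totals[modality] += size_m
    return {modality: round(total, 1) for modality, total in totals.items()}


def size_shares(rows):
    """Return each dataset's fraction of the combined size of all datasets."""
    overall = sum(size_m for _name, _modality, size_m in rows)
    return {name: size_m / overall for name, _modality, size_m in rows}


if __name__ == "__main__":
    for modality, total in totals_by_modality(DATASETS).items():
        print(f"{modality}: {total}M")
    for name, share in size_shares(DATASETS).items():
        print(f"{name}: {share:.1%}")
```

The shares are plain size proportions. Treat them as a starting point if you want to experiment with your own data mixture, not as a reproduction of the paper's training recipe.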



# License
MIT

# Citations
```bibtex
@misc{2309.11419,
Author = {Tengchao Lv and Yupan Huang and Jingye Chen and Lei Cui and Shuming Ma and Yaoyao Chang and Shaohan Huang and Wenhui Wang and Li Dong and Weiyao Luo and Shaoxiang Wu and Guoxin Wang and Cha Zhang and Furu Wei},
Title = {Kosmos-2.5: A Multimodal Literate Model},
Year = {2023},
Eprint = {arXiv:2309.11419},
}
```


