Metadata-Version: 2.4
Name: small-vlm
Version: 0.5.2
Summary: small vlm for training and experiments
Project-URL: Repository, https://github.com/leo1oel/small-vlm
Author-email: Yiming Liu <liuym23@mails.tsinghua.edu.cn>
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Typing :: Typed
Requires-Python: <4.0,>=3.11
Requires-Dist: blobfile>=3.0.0
Requires-Dist: datasets>=3.5.0
Requires-Dist: deepspeed>=0.16.5
Requires-Dist: hydra-core>=1.3.2
Requires-Dist: lightning>=2.5.1
Requires-Dist: pillow>=11.1.0
Requires-Dist: polars>=1.26.0
Requires-Dist: pretty-errors>=1.2.25
Requires-Dist: sentencepiece>=0.2.0
Requires-Dist: tiktoken>=0.9.0
Requires-Dist: torchvision>=0.21.0
Requires-Dist: transformers>=4.50.0
Requires-Dist: wandb>=0.19.8
Description-Content-Type: text/markdown

# small-vlm

![Architecture](assets/architecture.png)

A small vision-language model (VLM) implementation in PyTorch. The model consists of three main components:

- **Visual Encoder**: Extracts visual features from images using vision transformers
- **Language Model**: Processes text and generates responses using LLMs
- **Connector**: Connects visual and language features for multimodal understanding

You can swap in a different visual encoder, language model, or connector by changing the config.
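As a rough illustration of config-driven component selection, here is a minimal, dependency-free sketch. Everything in it is hypothetical: `ModelConfig`, `REGISTRY`, `build_model`, and the component names (`toy_vit`, `linear`, `mlp`, `toy_lm`) are placeholders and not this project's actual API. The project lists `hydra-core` as a dependency, so the real config is likely structured differently. Plain strings stand in for PyTorch modules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical config: one name per component."""

    visual_encoder: str
    connector: str
    language_model: str


# Hypothetical registries mapping config names to factories.
# Strings stand in for real modules to keep the sketch dependency-free.
REGISTRY = {
    "visual_encoder": {"toy_vit": lambda: "ToyViT()"},
    "connector": {
        "linear": lambda: "LinearConnector()",
        "mlp": lambda: "MLPConnector()",
    },
    "language_model": {"toy_lm": lambda: "ToyLM()"},
}


def build_model(cfg: ModelConfig) -> dict[str, str]:
    """Look up each of the three components by the name given in the config."""
    parts = {}
    for role in ("visual_encoder", "connector", "language_model"):
        name = getattr(cfg, role)
        try:
            factory = REGISTRY[role][name]
        except KeyError:
            raise ValueError(f"unknown {role}: {name!r}") from None
        parts[role] = factory()
    return parts


# Swapping the connector only requires changing one config field.
base = ModelConfig(visual_encoder="toy_vit", connector="linear", language_model="toy_lm")
swapped = ModelConfig(visual_encoder="toy_vit", connector="mlp", language_model="toy_lm")
print(build_model(base)["connector"])
print(build_model(swapped)["connector"])
```

The point of the pattern is that the model-building code stays unchanged while the config decides which encoder, connector, and language model are instantiated.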

* * *

## Project Docs

For instructions on installing uv and Python, see [installation.md](installation.md).

For development workflows, see [development.md](development.md).

For instructions on publishing to PyPI, see [publishing.md](publishing.md).

* * *

*This project was built from
[simple-modern-uv](https://github.com/jlevy/simple-modern-uv).*
