Metadata-Version: 2.1
Name: palm-vadapter
Version: 0.0.1
Summary: Paper - Pytorch
Home-page: https://github.com/kyegomez/PaLM2-VAdapter
License: MIT
Keywords: artificial intelligence,deep learning,optimizers,Prompt Engineering
Author: Kye Gomez
Author-email: kye@apac.ai
Requires-Python: >=3.6,<4.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: einops
Requires-Dist: torch
Requires-Dist: zetascale
Project-URL: Documentation, https://github.com/kyegomez/PaLM2-VAdapter
Project-URL: Repository, https://github.com/kyegomez/PaLM2-VAdapter
Description-Content-Type: text/markdown

[![Multi-Modality](agorabanner.png)](https://discord.gg/qUtxnK2NMf)

# Palm2 Adapter
An implementation of the multi-modal model from the paper "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter".

The model pairs a perceiver resampler of depth 1 with a tiny PaLM to efficiently learn image features and map them into the same embedding space as the large language model.
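
To make the resampler idea concrete, here is a minimal sketch of a depth-1 perceiver-style resampler in PyTorch. It is not the code shipped in this package: the class name `ResamplerSketch`, the `num_latents` parameter, and the layer layout are assumptions chosen for illustration. A fixed set of learned latent vectors cross-attends once to the image patch features, so the output length no longer depends on the number of patches.

```python
import torch
from torch import nn


class ResamplerSketch(nn.Module):
    """Depth-1 perceiver-style resampler (illustrative, hypothetical names)."""

    def __init__(self, dim: int = 512, num_latents: int = 64, heads: int = 8):
        super().__init__()
        # Learned query vectors; their count fixes the output sequence length.
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        self.norm_latents = nn.LayerNorm(dim)
        self.norm_feats = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, dim * 4),
            nn.GELU(),
            nn.Linear(dim * 4, dim),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_patches, dim)
        batch = feats.shape[0]
        q = self.latents.unsqueeze(0).expand(batch, -1, -1)
        kv = self.norm_feats(feats)
        # Single cross-attention layer: latents query the image features.
        attended, _ = self.attn(self.norm_latents(q), kv, kv)
        x = q + attended
        return x + self.ff(x)


feats = torch.randn(2, 196, 512)  # stand-in for 196 patch features per image
out = ResamplerSketch()(feats)
print(out.shape)
```

In this sketch the 196 input features are compressed to 64 latent tokens per image, which could then be passed to a small language model acting as the adapter.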

## Install
`$ pip install palm-vadapter`


## Usage
```python
import torch
from palm_vadapter.main import PaLM2VAdapter

# Random token IDs standing in for a text batch: (batch, seq_length)
text = torch.randint(0, 1000, (1, 32), dtype=torch.long)

# Random image batch: (batch, channels, height, width)
img = torch.randn(1, 3, 224, 224)

# Initialize PaLM2VAdapter model
model = PaLM2VAdapter(
    tiny_dim=512,
    dim=512,
    num_tokens=10000,
    seq_length=32,
    depth=6,
    heads=8,
    image_size=224,
    patch_size=16,
)

# Forward pass through the model
out = model(text, img)

# Print the shape of the output
print(out.shape)
```
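
The `image_size=224` and `patch_size=16` arguments suggest the image is split into patches before it reaches the resampler. Assuming non-overlapping square patches in the style of a vision transformer (an assumption; the package's internals are not shown here), the number of patch tokens can be estimated with a small helper. The function name `num_patches` is hypothetical.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Count non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size  # patches along one edge
    return side * side


print(num_patches(224, 16))  # 196
```

Under that assumption, the settings in the usage example yield 14 patches per side, or 196 patch tokens per image.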


## License
MIT

## Citation
```bibtex
@misc{xiao2024palm2vadapter,
    title={PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter},
    author={Junfei Xiao and Zheng Xu and Alan Yuille and Shen Yan and Boyu Wang},
    year={2024},
    eprint={2402.10896},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```
