Metadata-Version: 2.4
Name: vemb
Version: 0.2.0
Summary: httpie for embeddings. Embed anything from the command line.
Project-URL: Homepage, https://github.com/yuvrajangadsingh/vemb
Project-URL: Repository, https://github.com/yuvrajangadsingh/vemb
Author: Yuvraj Angad Singh
License-Expression: MIT
License-File: LICENSE
Keywords: cli,embeddings,gemini,multimodal,rag,vector
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: click>=8.0.0
Requires-Dist: google-genai>=1.0.0
Description-Content-Type: text/markdown

# vemb

httpie for embeddings. Embed text, images, audio, video, and PDFs from the command line.

```bash
pipx install vemb
export GEMINI_API_KEY=your_key
vemb text "hello world"
```

Powered by [Gemini Embedding 2](https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-embedding-2/), Google's first natively multimodal embedding model. One model, one vector space for everything.

## Install

```bash
pipx install vemb
# or
pip install vemb
```

Get a free API key at https://aistudio.google.com/apikey

```bash
export GEMINI_API_KEY=your_key
```

## Commands

```bash
vemb text "hello world"                    # embed text
vemb embed photo.jpg                       # embed any file (auto-detects type)
vemb embed *.jpg --jsonl                   # batch embed, one JSON object per line
vemb image photo.jpg                       # embed image (PNG, JPEG)
vemb audio clip.mp3                        # embed audio (MP3, WAV)
vemb video clip.mp4                        # embed video (MP4, MOV)
vemb pdf doc.pdf                           # embed PDF
vemb similar photo1.jpg photo2.jpg         # cosine similarity between two files
vemb search ./photos "sunset at beach"     # search a directory
```

Pipe from stdin:

```bash
echo "hello world" | vemb text -
cat document.txt | vemb text -
```

## Output

Default output is JSON:

```json
{
  "model": "gemini-embedding-2-preview",
  "dimensions": 768,
  "values": [0.012, -0.034, ...]
}
```
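
To consume this output from a script, a minimal Python sketch might look like the following. It works on a hard-coded sample string in the shape shown above rather than calling `vemb`. The `parse_embedding` helper is hypothetical, and the check that `dimensions` equals the length of `values` is an assumption about the output, not documented behavior.

```python
import json

# Sample output in the shape shown above; real vectors are much longer.
raw = (
    '{"model": "gemini-embedding-2-preview", '
    '"dimensions": 3, "values": [0.012, -0.034, 0.056]}'
)


def parse_embedding(text: str) -> list[float]:
    """Parse one JSON document and return its vector (hypothetical helper)."""
    doc = json.loads(text)
    values = doc["values"]
    # Assumption: "dimensions" reports the length of "values".
    if len(values) != doc["dimensions"]:
        raise ValueError("dimensions field does not match vector length")
    return values


vector = parse_embedding(raw)
print(len(vector))
```

In practice the string would come from `vemb text "hello world"`, for example captured through a pipe or a file.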

Options:

```bash
vemb text "hello" --compact                # just the vector array
vemb text "hello" --numpy                  # numpy format
vemb text "hello" --dim 768                # set dimensions (128-3072)
vemb text "hello" --task RETRIEVAL_QUERY   # set task type
```

Batch mode outputs JSONL (one embedding per line):

```bash
vemb embed *.jpg --jsonl > embeddings.jsonl
```
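
A JSONL file like this can be read back one record at a time. The sketch below is only illustrative: it uses an in-memory stand-in for `embeddings.jsonl`, and the `file` key in the sample records is a hypothetical field name, since the per-line schema is not spelled out here. Only `values` is taken from the default output shown earlier.

```python
import io
import json


def read_jsonl(stream):
    """Yield one parsed record per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for open("embeddings.jsonl"); the "file" key is hypothetical.
sample = io.StringIO(
    '{"file": "a.jpg", "values": [0.1, 0.2]}\n'
    '{"file": "b.jpg", "values": [0.3, 0.4]}\n'
)

records = list(read_jsonl(sample))
print(len(records))
```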

## Search

Search indexes a directory and finds files similar to your query:

```bash
vemb search ./photos "sunset at beach" --top 5
```
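
Conceptually, this kind of search comes down to embedding the query and ranking the file vectors by cosine similarity, the same measure `vemb similar` reports. The following Python sketch shows that ranking step on toy two-dimensional vectors. The `cosine` and `top_k` functions and the sample index are hypothetical and are not taken from vemb's source.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, index, k=5):
    """Return the k (name, score) pairs most similar to the query."""
    scored = [(cosine(query, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]


# Toy index: file name -> embedding (real vectors have hundreds of dimensions).
index = {
    "beach.jpg": [0.9, 0.1],
    "office.jpg": [0.1, 0.9],
    "sunset.jpg": [0.8, 0.3],
}

results = top_k([1.0, 0.0], index, k=2)
print([name for name, _ in results])
```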

Embeddings are cached in `.vemb/cache.json` inside the searched directory. Unchanged files won't be re-embedded on subsequent searches.
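
One common way to decide whether a file is unchanged is to store a fingerprint next to each cached vector. The sketch below uses file size plus modification time. That choice, the cache layout, and the `embed` callable are all assumptions made for illustration, not a description of how vemb builds `.vemb/cache.json`.

```python
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Cheap change detector: size and modification time (an assumption)."""
    st = path.stat()
    return f"{st.st_size}:{st.st_mtime_ns}"


def embed_with_cache(files, cache, embed):
    """Fill `cache` in place, calling `embed(path)` only for new or changed files.

    `cache` maps str(path) -> {"fingerprint": ..., "values": ...} (hypothetical layout).
    `embed` is any callable that turns a path into a vector.
    """
    for path in files:
        key = str(path)
        fp = fingerprint(path)
        entry = cache.get(key)
        if entry is None or entry["fingerprint"] != fp:
            cache[key] = {"fingerprint": fp, "values": embed(path)}
    return cache
```

A second call with the same files and the same `cache` dictionary skips every file whose fingerprint still matches. Hashing the file contents instead would be more robust, at the cost of reading every file on each run.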

## Supported formats

| Type | Formats |
|------|---------|
| Text | any string, stdin |
| Image | PNG, JPEG |
| Audio | MP3, WAV (up to 80s) |
| Video | MP4, MOV (up to 128s) |
| PDF | up to 6 pages |
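
For scripts that pre-filter files before handing them to `vemb embed`, the table above can be mirrored as an extension lookup. This is a hypothetical sketch: the mapping and the `detect_kind` helper are illustrative, the `.jpeg` suffix is included on the assumption that it is treated like `.jpg`, and vemb's own auto-detection may work differently (for example by inspecting file contents).

```python
from pathlib import Path

# Extension -> type, following the table above (hypothetical mapping).
KIND_BY_SUFFIX = {
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
    ".mp3": "audio",
    ".wav": "audio",
    ".mp4": "video",
    ".mov": "video",
    ".pdf": "pdf",
}


def detect_kind(path):
    """Return the type for a path, or None if the extension is not listed."""
    return KIND_BY_SUFFIX.get(Path(path).suffix.lower())


print(detect_kind("photo.JPG"))
```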

## License

MIT
