Metadata-Version: 2.4
Name: embeddoor
Version: 0.1.0
Summary: A browser-based embedding visualization and analysis tool
Author-email: Robert Haase <robert.haase@uni-leipzig.de>
License: MIT License
Keywords: embeddings,visualization,dimensionality-reduction,data-analysis
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: flask>=2.3.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: plotly>=5.14.0
Requires-Dist: scikit-learn>=1.3.0
Requires-Dist: pyarrow>=12.0.0
Requires-Dist: umap-learn>=0.5.3
Requires-Dist: pillow>=9.5.0
Requires-Dist: wordcloud>=1.9.0
Provides-Extra: dev
Requires-Dist: pytest>=7.3.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: black>=23.3.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Provides-Extra: embeddings
Requires-Dist: transformers>=4.30.0; extra == "embeddings"
Requires-Dist: torch>=2.0.0; extra == "embeddings"
Requires-Dist: sentence-transformers>=2.2.2; extra == "embeddings"
Requires-Dist: openai>=1.0.0; extra == "embeddings"
Requires-Dist: google-generativeai>=0.3.0; extra == "embeddings"
Dynamic: license-file

# Embeddoor

A browser-based tool for embedding visualization and analysis.

![Embeddoor logo](logo.png)

## Features

- **Dual-panel interface**: 2D/3D plots on the left, custom visualizations (tables, images, word clouds) on the right
- **Interactive data exploration**: Load CSV files, view tabular data, and plot two or three numerical columns
- **Advanced plot controls**: Configure hue, size, and shape based on data columns
- **Lasso selection**: Select data points interactively and store selections in the dataframe
- **Correlation analysis**: Visualize pairwise correlations with Pearson, Spearman, or Kendall methods
- **Heatmap visualizations**: View data as heatmaps from embeddings or numeric columns
- **Modular embedding framework**: Create embeddings with Hugging Face, OpenAI, Gemini, or custom models
- **Dimensionality reduction**: Apply PCA, t-SNE, and UMAP to high-dimensional embeddings
- **Data persistence**: Save and load data in Parquet format
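
To illustrate what the correlation analysis feature computes, here is a minimal sketch using pandas directly. It is not Embeddoor's internal code: the helper name `pairwise_correlations` and the sample columns are assumptions made for this example. Note that pandas may rely on SciPy for the Kendall method.

```python
import pandas as pd


def pairwise_correlations(df: pd.DataFrame, method: str = "pearson") -> pd.DataFrame:
    """Return the pairwise correlation matrix of the numeric columns.

    Hypothetical helper for illustration; not part of Embeddoor's API.
    """
    if method not in ("pearson", "spearman", "kendall"):
        raise ValueError(f"unsupported method: {method}")
    # numeric_only=True skips text columns such as labels.
    return df.corr(method=method, numeric_only=True)


# Made-up sample data: one monotonic but non-linear relation, one inverse relation.
df = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0, 4.0, 5.0],
        "x_cubed": [1.0, 8.0, 27.0, 64.0, 125.0],
        "x_reversed": [5.0, 4.0, 3.0, 2.0, 1.0],
        "label": list("abcde"),
    }
)

pearson = pairwise_correlations(df, "pearson")
spearman = pairwise_correlations(df, "spearman")
print(pearson.round(3))
print(spearman.round(3))
```

Pearson measures linear association, while Spearman and Kendall work on ranks, so a monotonic but non-linear pair such as `x` and `x_cubed` scores higher under the rank-based methods.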

## Installation

### Development Installation

```bash
git clone https://github.com/haesleinhuepf/embeddoor.git
cd embeddoor
# Quote the extras so shells such as zsh do not expand the brackets
pip install -e ".[dev,embeddings]"
```
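
After installing, it can be useful to check which of the optional `embeddings` packages are importable. The following is a small sketch using only the standard library. The function name `missing_optional_packages` is an assumption for this example, and the import names in the mapping are the commonly used ones for these distributions.

```python
from importlib.util import find_spec

# Distribution name -> import name for the packages in the "embeddings" extra.
# The import names are assumptions based on each package's usual usage.
OPTIONAL_MODULES = {
    "transformers": "transformers",
    "torch": "torch",
    "sentence-transformers": "sentence_transformers",
    "openai": "openai",
    "google-generativeai": "google.generativeai",
}


def missing_optional_packages(modules=OPTIONAL_MODULES):
    """Return the distribution names whose import name cannot be found."""
    missing = []
    for dist_name, import_name in modules.items():
        try:
            found = find_spec(import_name) is not None
        except (ImportError, ValueError):
            # Raised for dotted names when the parent package is absent.
            found = False
        if not found:
            missing.append(dist_name)
    return missing


if __name__ == "__main__":
    missing = missing_optional_packages()
    if missing:
        print("Missing optional packages:", ", ".join(missing))
    else:
        print("All optional embedding packages are importable.")
```

The check only locates the modules and does not import them, so it stays fast even when large packages such as `torch` are installed.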

## Quick Start

Launch the application:

```bash
embeddoor
```

This starts the server and opens your default browser at `http://localhost:5000`.

## Workflow

1. **Load Data**: Use File → Open to load a CSV file
2. **Visualize**: View tabular data in the right panel, plot numerical columns in the left panel
3. **Customize Plot**: Select hue, size, and shape attributes for data points
4. **Select Points**: Use the lasso tool to select data points (stored as a new column)
5. **Create Embeddings**: Use Embedding → Create Embedding to generate embeddings from text or image columns
6. **Reduce Dimensions**: Use the Dimensionality Reduction menu to apply PCA, t-SNE, or UMAP to embeddings
7. **Save**: Use File → Save to export data as Parquet
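
For readers who want to see roughly what the dimensionality reduction step does to the data, here is a minimal sketch with scikit-learn. It is an illustration under stated assumptions, not Embeddoor's implementation: the function `reduce_embeddings`, the random stand-in embeddings, and the column names such as `pca_x` are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def reduce_embeddings(embeddings: np.ndarray, method: str = "pca", seed: int = 0) -> np.ndarray:
    """Project (n_samples, n_features) embeddings down to 2D.

    Hypothetical helper for illustration; not part of Embeddoor's API.
    """
    if method == "pca":
        reducer = PCA(n_components=2, random_state=seed)
    elif method == "tsne":
        # t-SNE requires perplexity to be smaller than the number of samples.
        perplexity = min(30.0, max(1.0, (embeddings.shape[0] - 1) / 3))
        reducer = TSNE(n_components=2, perplexity=perplexity, random_state=seed)
    else:
        raise ValueError(f"unsupported method: {method}")
    return reducer.fit_transform(embeddings)


# Random vectors stand in for real text or image embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(60, 16))

df = pd.DataFrame({"text": [f"item {i}" for i in range(60)]})
for method in ("pca", "tsne"):
    coords = reduce_embeddings(embeddings, method)
    # Store the 2D coordinates as plottable numerical columns.
    df[f"{method}_x"] = coords[:, 0]
    df[f"{method}_y"] = coords[:, 1]

print(df.head())
```

UMAP from the `umap-learn` package follows the same `fit_transform` pattern via `umap.UMAP(n_components=2)`. A frame like this can be written with `df.to_parquet(...)`, which needs `pyarrow` or another Parquet engine installed.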

## License

MIT License

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
