Metadata-Version: 2.4
Name: afterthoughts
Version: 0.0.1
Summary: Late chunking for transformer embeddings - chunk after the model, not before.
Author-email: Nicholas Gigliotti <ndgigliotti@gmail.com>
License: Apache-2.0
Project-URL: homepage, https://github.com/ndgigliotti/afterthoughts
Classifier: Development Status :: 1 - Planning
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown

# Afterthoughts

**Late chunking for transformer embeddings.**

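Generate fine-grained, context-aware sentence embeddings by chunking *after* the model forward pass rather than before. The document is encoded in a single pass (up to the model's maximum sequence length), and token embeddings are then pooled within each chunk's span. Each chunk embedding therefore carries context from the rest of the document through the transformer's attention mechanism.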
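To make the idea concrete, here is a minimal sketch of the pooling step in late chunking. It assumes you already have the token-level hidden states from one forward pass over the whole document and the token index ranges of each sentence. The function name `late_chunk` and the mean-pooling choice are illustrative assumptions, not this package's API. Random vectors stand in for real model output.

```python
import numpy as np


def late_chunk(token_embeddings: np.ndarray, spans: list[tuple[int, int]]) -> np.ndarray:
    """Mean-pool token embeddings over each half-open [start, end) span.

    Hypothetical helper. token_embeddings has shape (seq_len, hidden_dim) and
    comes from ONE forward pass over the full document, so every token vector
    has already attended to the surrounding context.
    """
    chunks = []
    for start, end in spans:
        if not 0 <= start < end <= token_embeddings.shape[0]:
            raise ValueError(f"invalid span ({start}, {end})")
        # Pool only the tokens belonging to this chunk.
        chunks.append(token_embeddings[start:end].mean(axis=0))
    return np.stack(chunks)


# Stand-in for a transformer's last hidden state: 10 tokens, 4 dimensions.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(10, 4))
sentence_spans = [(0, 3), (3, 7), (7, 10)]
embeddings = late_chunk(hidden, sentence_spans)
print(embeddings.shape)  # (3, 4)
```

Chunking before the model would instead run each sentence through the encoder separately, so a sentence like "It was rebuilt in 1998." would never see the noun that "It" refers to.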

## Coming Soon

This package is under active development. Features will include:

- Sentence-level embeddings with full document context
- Overlapping chunk extraction for dense retrieval
- Memory-efficient dimensionality reduction with incremental PCA
- Dynamic batching for efficient GPU utilization
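Overlapping chunk extraction could, for example, group consecutive sentence spans into sliding windows. The resulting token spans would be pooled the same way as single sentences. The following is a minimal sketch under that assumption. The function name `overlapping_spans` and the `window`/`stride` parameters are hypothetical and do not describe the planned API.

```python
def overlapping_spans(
    sentence_spans: list[tuple[int, int]], window: int = 2, stride: int = 1
) -> list[tuple[int, int]]:
    """Merge consecutive sentence token spans into sliding windows.

    Hypothetical helper. Each window covers `window` consecutive sentences and
    windows start `stride` sentences apart, so with stride < window
    neighboring windows overlap.
    """
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be >= 1")
    n = len(sentence_spans)
    if n == 0:
        return []
    # Window start indices; add a final window if the stride would
    # otherwise skip trailing sentences.
    starts = list(range(0, max(n - window, 0) + 1, stride))
    if starts[-1] + window < n:
        starts.append(n - window)
    windows = []
    for i in starts:
        group = sentence_spans[i : i + window]
        # Token span runs from the first sentence's start to the last one's end.
        windows.append((group[0][0], group[-1][1]))
    return windows


print(overlapping_spans([(0, 3), (3, 7), (7, 10)], window=2, stride=1))
# [(0, 7), (3, 10)]
```

Because late chunking pools from token embeddings that already exist, overlapping windows in this scheme would add no extra forward passes. Only the pooling step is repeated.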

## Why "Afterthoughts"?

- **"After"** = late (as in late chunking)
- **"Thoughts"** = the sentences/segments extracted
- Chunking is done "as an afterthought" rather than beforehand

## License

Apache 2.0
