Metadata-Version: 2.4
Name: chunkhive
Version: 0.1.1
Summary: Hierarchical, semantic code chunking for AI systems
Home-page: https://github.com/AgentAhmed/CodeAtlas
Author: CodeAtlas
Author-email: ChunkHive <contact@codeatlas.ai>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/AgentAhmed/CodeAtlas
Project-URL: Documentation, https://github.com/AgentAhmed/CodeAtlas#readme
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: tree-sitter>=0.20.0
Requires-Dist: tree-sitter-python>=0.20.0
Requires-Dist: typer>=0.9.0
Provides-Extra: javascript
Requires-Dist: tree-sitter-javascript>=0.20.0; extra == "javascript"
Provides-Extra: all
Requires-Dist: tree-sitter-javascript>=0.20.0; extra == "all"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# chunkhive

**Semantic, hierarchical code chunking for AI systems**

chunkhive is a code chunking engine, currently in beta, designed for modern AI workflows such as
code embeddings, retrieval-augmented generation (RAG), agentic systems, and dataset synthesis.

It converts raw repositories into **clean, structured, semantically accurate chunks**
with byte-level precision and preserved hierarchy.

---

## 🚀 Why chunkhive?

Modern AI systems need **more than naive text splitting**.

chunkhive provides:
- AST-first semantic correctness
- Hierarchical structure awareness
- Byte-accurate spans
- Robust parsing across real-world repositories

---

## 🧠 Core Principle

> **AST is the Authority, Tree-sitter is Enrichment**

- **Primary source of truth**: Language AST (semantic accuracy)
- **Fallback & enrichment**: Tree-sitter (structural robustness)
- **Result**: Maximum parsing success across diverse codebases
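
The principle above can be illustrated with a minimal sketch. This is not chunkhive's implementation or API: `outline` is a hypothetical helper, Python's standard `ast` module plays the role of the language AST, and a crude regular-expression scan stands in for an error-tolerant parser such as Tree-sitter.

```python
import ast
import re

# Hypothetical helper for illustration only; not part of chunkhive.
def outline(source: str) -> dict:
    """Return top-level symbol names, preferring the AST over a fallback scan."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Fallback path: a line scan standing in for a tolerant parser.
        # It still recovers names when the file does not parse.
        pattern = re.compile(
            r"^(?:async\s+def|def|class)\s+([A-Za-z_]\w*)", re.MULTILINE
        )
        return {"source": "fallback", "symbols": pattern.findall(source)}

    # Primary path: the AST is the source of truth for symbol names.
    symbols = [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    return {"source": "ast", "symbols": symbols}


valid = "class A:\n    pass\n\ndef f():\n    return 1\n"
broken = "def g(:\n    pass\n\nclass B:\n    pass\n"

print(outline(valid))
print(outline(broken))
```

The valid file is handled by the AST path, while the file with a syntax error still yields symbols through the fallback, which is the behaviour the principle aims for.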

---

## ✨ Features

- Semantic AST-first chunking (no filename-based chunks)
- Preserves hierarchy: Module → Class → Method / Function
- Accurate parent–child relationships
- Byte-level precision (`start_byte`, `end_byte`)
- Clean symbol naming (`ast.name`)
- Import & decorator capture
- Robust handling of edge cases (empty files, `__init__.py`)
- Supports documentation + code chunking flows
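
As a rough illustration of how hierarchy and byte spans fit together, the sketch below walks a Python file with the standard `ast` module and records each class, method, and function with its parent and byte offsets. It is an assumption-laden sketch, not chunkhive's code: `walk_chunks` and `line_starts` are hypothetical names, the source is assumed to use `\n` line endings, and definitions nested inside `if` or `try` blocks are skipped for brevity.

```python
import ast

def line_starts(data: bytes) -> list:
    """Byte offset at which each line begins (index 0 corresponds to line 1)."""
    starts = [0]
    for i, byte in enumerate(data):
        if byte == 0x0A:  # "\n"
            starts.append(i + 1)
    return starts

# Hypothetical helper for illustration only; not part of chunkhive.
def walk_chunks(source: str) -> list:
    data = source.encode("utf-8")
    starts = line_starts(data)
    chunks = []

    def visit(node, parent, in_class):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                kind = "class"
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                kind = "method" if in_class else "function"
            else:
                continue
            # col_offset and end_col_offset are UTF-8 byte offsets within
            # the line, so adding the line's start gives a file byte offset.
            start = starts[child.lineno - 1] + child.col_offset
            end = starts[child.end_lineno - 1] + child.end_col_offset
            chunks.append({
                "chunk_type": kind,
                "name": child.name,
                "parent": parent,
                "start_byte": start,
                "end_byte": end,
            })
            visit(child, child.name, kind == "class")

    visit(ast.parse(source), None, False)
    return chunks


sample = (
    "class Greeter:\n"
    "    def hello(self):\n"
    '        return "héllo"\n'
    "\n"
    "def main():\n"
    "    pass\n"
)

for chunk in walk_chunks(sample):
    text = sample.encode("utf-8")[chunk["start_byte"]:chunk["end_byte"]]
    print(chunk["chunk_type"], chunk["name"], chunk["parent"], len(text))
```

Working in bytes rather than characters matters here: the `é` in the sample occupies two bytes, so character offsets and byte offsets diverge after it.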

---

## 🗂 Supported Chunk Types

- module
- class
- method
- function
- documentation

---

## 📦 Installation

```bash
pip install chunkhive
pip install "chunkhive[javascript]"  # optional: adds the JavaScript grammar
```

[![PyPI](https://img.shields.io/pypi/v/chunkhive)](https://pypi.org/project/chunkhive/)
[![Python](https://img.shields.io/pypi/pyversions/chunkhive)](https://pypi.org/project/chunkhive/)
[![License](https://img.shields.io/pypi/l/chunkhive)](https://pypi.org/project/chunkhive/)


---

## 📦 Output Schema (Simplified)

```json
{
  "chunk_id": "ts_123_456",
  "file_path": "src/example.py",
  "chunk_type": "function",
  "language": "python",
  "code": "...",
  "ast": {
    "name": "my_function",
    "parent": "MyClass"
  },
  "span": {
    "start_byte": 123,
    "end_byte": 456,
    "start_line": 10,
    "end_line": 25
  },
  "metadata": {
    "byte_accuracy": "exact_bytes"
  }
}
```
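
A consumer of such records may want to confirm that a span really points at the stored code. The sketch below shows one way to do that, under two assumptions that the schema above does not state: the span is half-open (`start_byte` inclusive, `end_byte` exclusive) and `code` is the UTF-8 text of that byte range. `check_span` is a hypothetical helper, not a chunkhive function.

```python
# Hypothetical helper for illustration only; not part of chunkhive.
def check_span(record: dict, file_bytes: bytes) -> bool:
    """Return True if the record's byte span matches its stored code."""
    span = record["span"]
    start, end = span["start_byte"], span["end_byte"]
    if not (0 <= start <= end <= len(file_bytes)):
        return False
    # Compare raw bytes so a misaligned span cannot raise a decode error.
    return file_bytes[start:end] == record["code"].encode("utf-8")


# An in-memory stand-in for the file on disk; the comment line contains a
# two-byte character, so byte and character offsets differ.
file_bytes = "# é\ndef my_function():\n    return 1\n".encode("utf-8")
code = "def my_function():\n    return 1"
start = file_bytes.index(code.encode("utf-8"))

record = {
    "file_path": "src/example.py",
    "chunk_type": "function",
    "language": "python",
    "code": code,
    "ast": {"name": "my_function", "parent": None},
    "span": {"start_byte": start, "end_byte": start + len(code.encode("utf-8"))},
}

print(check_span(record, file_bytes))
```

A check like this is cheap to run over a whole dataset and catches records whose offsets were computed in characters instead of bytes.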


---

## 🛠 Use Cases

- Code embedding model training
- RAG pipelines
- Agentic AI systems
- Code search & navigation
- QA dataset generation
- Static analysis & tooling

---

## 📜 License

Apache License 2.0: free to use, modify, and distribute, including commercial use.

