Metadata-Version: 2.3
Name: gittxt
Version: 1.7.5
Summary: Gittxt: Get text from Git repositories in AI-ready formats
License: MIT
Keywords: llm-dataset,text-extraction,git-repos,ai-preprocessing,cli-tool,nlp,data-pipeline,repo-analysis,source-code,github-scanner,ai-ready
Author: Sandeep Paidipati
Author-email: sandeep.paidipati@gmail.com
Requires-Python: >=3.9,!=3.9.7
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Version Control :: Git
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Utilities
Requires-Dist: aiofiles (>=23.2.1)
Requires-Dist: aiohttp (>=3.9.5)
Requires-Dist: binaryornot (>=0.4.4)
Requires-Dist: click (>=8.1.3)
Requires-Dist: colorama (>=0.4.6)
Requires-Dist: gitpython (>=3.1.40)
Requires-Dist: humanize (>=4.12.2)
Requires-Dist: python-dotenv (>=0.21.1)
Requires-Dist: rich (>=13.9.4)
Requires-Dist: tiktoken (>=0.9.0)
Project-URL: Homepage, https://github.com/sandy-sp/gittxt
Project-URL: Repository, https://github.com/sandy-sp/gittxt
Description-Content-Type: text/markdown

🚀 **AI-Ready Text Extractor for Git Repos** | CLI tool for dataset prep, summaries, reverse engineering & bundling

# 📝 Gittxt: Get text from Git repositories in AI-ready formats

[![Docs](https://img.shields.io/badge/docs-online-blue?logo=mkdocs&labelColor=gray)](https://sandy-sp.github.io/gittxt/)
[![Python Version](https://img.shields.io/badge/python-≥3.9-blue)](pyproject.toml)
[![PyPI version](https://badge.fury.io/py/gittxt.svg)](https://pypi.org/project/gittxt/)
[![Release](https://img.shields.io/github/release/sandy-sp/gittxt.svg)](https://github.com/sandy-sp/gittxt/releases)
[![Tested with Pytest](https://img.shields.io/badge/tested%20with-pytest-9cf.svg)](https://docs.pytest.org/en/stable/)
[![PyPI Downloads](https://img.shields.io/pypi/dm/gittxt)](https://pypi.org/project/gittxt/)
![GitHub repo size](https://img.shields.io/github/repo-size/sandy-sp/gittxt)
![GitHub top language](https://img.shields.io/github/languages/top/sandy-sp/gittxt)
[![Build Status](https://github.com/sandy-sp/gittxt/actions/workflows/release.yml/badge.svg)](https://github.com/sandy-sp/gittxt/actions)
[![Made for LLMs](https://img.shields.io/badge/LLM%20ready-Yes-brightgreen)](https://github.com/sandy-sp/gittxt)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)

---

## ✨ What is Gittxt? 
![](./docs/getting-started/assets/gittxt-demo.gif)

**Gittxt** is a powerful CLI and plugin framework that extracts structured text and metadata from Git repositories. It’s designed to help you build AI-ready datasets, analyze large codebases, and even reverse engineer report outputs.

Use it for:
- 🔍 Curating datasets from code and documentation
- 🗃️ Generating `.txt`, `.json`, `.md`, and `.zip` bundles
- 📑 Extracting and classifying technical files by sub-type
- 🧠 Analyzing size, token count, and file types
- 🔄 Reconstructing full project trees from summary reports

---

## 🚀 Features

- ✅ **File-Type Detection** (extension, MIME, content heuristic)
- ✅ **.gittxtignore Support** (with `--sync`)
- ✅ **Subcategory Classification** (docs, config, code, etc.)
- ✅ **Async File I/O** for scalable performance
- ✅ **Lite Mode** for minimal outputs (`--lite`)
- ✅ **Bundled ZIPs** (`--zip`) with manifest, summary, README
- ✅ **Reverse Engineering** from `.txt`, `.md`, `.json` reports
- ✅ **Plugin System**: `gittxt-api`, `gittxt-streamlit`, etc.

---

## 🏗️ Installation

```bash
pip install gittxt
```

Or for development:

```bash
git clone https://github.com/sandy-sp/gittxt.git
cd gittxt
poetry install
poetry run gittxt config install  # Optional installer
```

---

## ⚙️ Quickstart

```bash
gittxt scan https://github.com/sandy-sp/gittxt --output-format txt,json --zip --lite
gittxt re outputs/gittxt_summary.json
```

---

## 🖥️ CLI Commands

```bash
gittxt scan [OPTIONS] [REPOS]...
gittxt config [SUBCOMMANDS]
gittxt clean [--output-dir]
gittxt re REPORT_FILE [--output-dir]
gittxt plugin [list|install|run|uninstall]
```

---

## 🔌 Plugin System

```bash
gittxt plugin list
gittxt plugin install gittxt-api
gittxt plugin run gittxt-api
```

Plugins include:

- 🧪 `gittxt-api`: FastAPI backend for scanning and summaries
- 🖥️ `gittxt-streamlit`: Interactive visual dashboard

---

## 📦 Output Formats

```
<output_dir>/
├── txt/
├── json/
├── md/
├── zip/
│   ├── summary.json
│   ├── manifest.json
│   ├── outputs/
│   └── assets/
```

---

## 🔄 Reverse Engineer

```bash
gittxt re report.txt -o ./restored
```

This recreates original file structure in a ZIP from Gittxt `.txt`, `.md`, or `.json` reports.

---

## 📚 Documentation

Docs are now organized in a full [Docs site](https://sandy-sp.github.io/gittxt/) with:

- ✅ Getting Started
- ✅ CLI Reference
- ✅ API Endpoints
- ✅ Reverse Engineering
- ✅ Developer & Contributor Guide

---

## 🛣️ Roadmap

- ✅ Plugin framework with API/Streamlit
- ✅ Reverse from Gittxt reports
- ⏳ AI-powered summaries
- ⏳ Live web UI

---

## 🤝 Contributing

See [Contributing Guide](https://sandy-sp.github.io/gittxt/development/contributing/)

```bash
make lint     # Code style
make test     # Run CLI + API tests
```

---

## 🛡️ License

MIT License © [Sandeep Paidipati](https://github.com/sandy-sp)

---

Gittxt — **Get text from Git repositories in AI-ready formats.**

