Metadata-Version: 2.4
Name: readanybook
Version: 0.1.4
Summary: A RAG-based cheat sheet generator for books and papers
Home-page: https://github.com/readanybook/readanybook
Author: ReadAnyBook Team
Author-email: ReadAnyBook Team <team@readanybook.dev>
License: MIT
Project-URL: Homepage, https://github.com/readanybook/readanybook
Project-URL: Documentation, https://readanybook.dev/docs
Project-URL: Repository, https://github.com/readanybook/readanybook
Project-URL: Issues, https://github.com/readanybook/readanybook/issues
Keywords: rag,retrieval-augmented-generation,cheat-sheet,pdf,summarization,llm,nlp,education
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Education
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: General
Classifier: Topic :: Education
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pydantic-settings>=2.0.0
Requires-Dist: PyYAML>=6.0
Requires-Dist: pypdf>=3.0.0
Requires-Dist: ebooklib>=0.18
Requires-Dist: beautifulsoup4>=4.12.0
Requires-Dist: lxml>=4.9.0
Requires-Dist: chardet>=5.0.0
Requires-Dist: sentence-transformers>=2.2.0
Requires-Dist: transformers>=4.30.0
Requires-Dist: torch>=2.0.0
Requires-Dist: chromadb>=0.4.0
Requires-Dist: tiktoken>=0.5.0
Requires-Dist: rank-bm25>=0.2.2
Requires-Dist: jinja2>=3.1.0
Requires-Dist: tqdm>=4.65.0
Requires-Dist: numpy>=1.24.0
Provides-Extra: cli
Requires-Dist: typer>=0.9.0; extra == "cli"
Requires-Dist: rich>=13.0.0; extra == "cli"
Provides-Extra: api
Requires-Dist: fastapi>=0.100.0; extra == "api"
Requires-Dist: uvicorn>=0.23.0; extra == "api"
Requires-Dist: python-multipart>=0.0.6; extra == "api"
Provides-Extra: qdrant
Requires-Dist: qdrant-client>=1.6.0; extra == "qdrant"
Provides-Extra: weaviate
Requires-Dist: weaviate-client>=3.24.0; extra == "weaviate"
Provides-Extra: ollama
Requires-Dist: ollama>=0.1.0; extra == "ollama"
Provides-Extra: openai
Requires-Dist: openai>=1.0.0; extra == "openai"
Provides-Extra: quantization
Requires-Dist: bitsandbytes>=0.41.0; extra == "quantization"
Requires-Dist: accelerate>=0.24.0; extra == "quantization"
Provides-Extra: eval
Requires-Dist: nltk>=3.8.0; extra == "eval"
Requires-Dist: rouge-score>=0.1.2; extra == "eval"
Provides-Extra: observability
Requires-Dist: opentelemetry-api>=1.20.0; extra == "observability"
Requires-Dist: opentelemetry-sdk>=1.20.0; extra == "observability"
Requires-Dist: opentelemetry-exporter-otlp>=1.20.0; extra == "observability"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: mypy>=1.5.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Provides-Extra: all
Requires-Dist: readanybook[api,cli,eval,observability,ollama,openai,qdrant,quantization]; extra == "all"
Dynamic: author
Dynamic: home-page
Dynamic: requires-python

# ReadAnyBook 📚

A retrieval-augmented generation (RAG) tool that turns books and papers into structured, 12-page LaTeX cheat sheets.

## Features

- **Multi-format Document Support**: PDF, EPUB, HTML, LaTeX, Markdown
- **Intelligent Chunking**: Math-aware and code-aware text splitting
- **Hybrid Retrieval**: Dense embeddings + BM25 with reciprocal rank fusion
- **Multi-pass Generation**: Separate extraction for concepts, formulas, algorithms, and models
- **LaTeX Output**: Professional cheat sheets compiled to PDF
- **Multiple LLM Backends**: HuggingFace, Ollama, vLLM, OpenAI-compatible APIs
- **Vector Store Options**: ChromaDB, Qdrant, Weaviate
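
The hybrid retrieval feature above combines a dense ranking and a BM25 ranking with reciprocal rank fusion. The following is a minimal sketch of that fusion step, not ReadAnyBook's actual implementation: the function name, the chunk IDs, and the constant `k=60` (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a value
    taken from ReadAnyBook.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["chunk_3", "chunk_1", "chunk_7"]  # hypothetical dense ranking
bm25_hits = ["chunk_1", "chunk_9", "chunk_3"]   # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)  # ['chunk_1', 'chunk_3', 'chunk_9', 'chunk_7']
```

Chunks that rank well in both lists rise to the top, and because only ranks are used, the dense and BM25 scores never need to be put on a common scale.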

## Quick Start

### Installation

```bash
# Basic installation
pip install readanybook

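# With CLI support (quoted so shells such as zsh do not expand the brackets)
pip install "readanybook[cli]"

# With the bundled extras: api, cli, eval, observability, ollama, openai, qdrant, quantization
pip install "readanybook[all]"

# Weaviate support is not part of "all"; install it separately
pip install "readanybook[weaviate]"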
```

### From Source

```bash
git clone https://github.com/readanybook/readanybook.git
cd readanybook
pip install -e ".[dev]"
```

### Usage

#### Command Line

The commands below assume the `cli` extra is installed, which pulls in `typer` and `rich`.

```bash
# Generate a cheat sheet from a PDF
read-any-book build document.pdf -o cheatsheet.pdf

# Use a specific profile
read-any-book build document.pdf --profile math_paper

# Index a document
read-any-book index document.pdf --collection my_collection

# Search indexed documents
read-any-book search "gradient descent" --collection my_collection
```

#### Python API

```python
from readanybook import CheatSheetPipeline, Settings

# Initialize pipeline
settings = Settings()
pipeline = CheatSheetPipeline(settings)

# Process document
pipeline.ingest("textbook.pdf")
pipeline.index(collection_name="textbook")

# Generate cheat sheet
content = pipeline.generate_content()
cheat_sheet = pipeline.build(content, "output/cheatsheet.pdf")

print(f"Generated: {cheat_sheet.pdf_path}")
```

#### REST API

```bash
# Start the API server (fastapi and uvicorn are installed by the "api" extra)
uvicorn readanybook.api:app --host 0.0.0.0 --port 8000

# Upload a document
curl -X POST "http://localhost:8000/upload" \
  -F "file=@document.pdf" \
  -F "collection_name=my_docs"

# Generate cheat sheet
curl -X POST "http://localhost:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{"collection_name": "my_docs", "title": "My Cheat Sheet"}'
```
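
The same `/generate` call can be prepared from Python using only the standard library. This is a sketch under the assumption that the endpoint accepts exactly the JSON body shown in the `curl` example. The helper name `build_generate_request` is hypothetical, and the response format is not documented here, so the code builds the request without sending it.

```python
import json
import urllib.request


def build_generate_request(base_url, collection_name, title):
    """Build (but do not send) the POST /generate request from the curl example."""
    payload = json.dumps({"collection_name": collection_name, "title": title})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request("http://localhost:8000", "my_docs", "My Cheat Sheet")
print(request.get_method(), request.full_url)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```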

## Configuration

Settings can come from a `config.yaml` file or from environment variables. An example `config.yaml`:

```yaml
# Embedding model
embedding:
  model_name: "BAAI/bge-base-en-v1.5"
  device: "cuda"

# Vector store
vectordb:
  store_type: "chroma"
  persist_directory: "./data/chroma"

# LLM settings
llm:
  backend: "ollama"
  model_name: "llama3:8b"

# Retrieval
retrieval:
  mode: "hybrid"
  top_k: 15

# LaTeX output
latex:
  columns: 2
  font_size: 10
  paper_size: "a4paper"
```
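
The README does not spell out how environment variables map onto these settings. As an illustration only, the sketch below layers environment overrides on top of values loaded from a file. The `READANYBOOK_` prefix, the `SECTION__KEY` naming, and the function `apply_env_overrides` are all hypothetical, so check the project's settings module for the real variable names.

```python
import copy


def apply_env_overrides(config, environ, prefix="READANYBOOK_"):
    """Return a copy of config with PREFIX + SECTION__KEY overrides applied.

    The prefix and the double-underscore separator are hypothetical naming
    choices for this sketch, not documented ReadAnyBook variable names.
    """
    merged = copy.deepcopy(config)
    for name, raw in environ.items():
        if not name.startswith(prefix):
            continue
        section, sep, key = name[len(prefix):].lower().partition("__")
        if not sep or section not in merged or key not in merged[section]:
            continue  # ignore variables that do not match a known setting
        current = merged[section][key]
        # Coerce to the existing value's type (fine for str, int, float;
        # booleans would need special handling).
        merged[section][key] = type(current)(raw)
    return merged


file_config = {
    "llm": {"backend": "ollama", "model_name": "llama3:8b"},
    "retrieval": {"mode": "hybrid", "top_k": 15},
}
fake_env = {"READANYBOOK_RETRIEVAL__TOP_K": "20", "PATH": "/usr/bin"}
effective = apply_env_overrides(file_config, fake_env)
print(effective["retrieval"])  # {'mode': 'hybrid', 'top_k': 20}
```

In this sketch the file values act as defaults and the environment wins, which is a common precedence for deployments where the same config file is shared across machines.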

### Configuration Profiles

Use built-in profiles for different document types:

```bash
# For technical books
read-any-book build book.pdf --profile technical_book

# For math papers
read-any-book build paper.pdf --profile math_paper

# For non-technical books
read-any-book build novel.pdf --profile nontechnical_book
```

## Architecture

```
readanybook/
├── core/           # Domain logic
│   ├── ingestion.py    # Document parsing
│   ├── chunking.py     # Text splitting
│   ├── indexing.py     # Embedding & indexing
│   ├── retrieval.py    # Hybrid retrieval
│   ├── models.py       # LLM clients
│   ├── prompts.py      # Prompt templates
│   └── pipeline.py     # Main orchestrator
├── generation/     # Content generation
│   ├── concepts.py     # Concept extraction
│   ├── formulas.py     # Formula extraction
│   ├── algorithms.py   # Algorithm synthesis
│   ├── models_theory.py # Model summarization
│   └── latex_builder.py # LaTeX generation
├── evaluation/     # Quality metrics
│   ├── rag_eval.py     # RAG evaluation
│   └── metrics.py      # Content metrics
├── infra/          # Infrastructure
│   ├── settings.py     # Configuration
│   ├── vectordb.py     # Vector stores
│   ├── logging.py      # Logging
│   └── tracing.py      # Observability
├── api/            # REST API
├── cli/            # Command line interface
├── templates/      # LaTeX templates
└── config/         # Default configs
```

## Requirements

- Python 3.10+
- PyTorch 2.0+
- LaTeX distribution (for PDF compilation)
  - TeX Live, MiKTeX, or Tectonic

### LaTeX Installation

```bash
# Ubuntu/Debian
sudo apt install texlive-full

# macOS
brew install --cask mactex

# Or use Tectonic (lightweight; installing via cargo needs a Rust toolchain)
cargo install tectonic
```
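
Before building a cheat sheet, it can help to confirm that a LaTeX engine is on `PATH`. The check below is a small sketch using only the standard library. The list of executable names is an assumption, since the README does not state which engine ReadAnyBook invokes.

```python
import shutil


def find_latex_engines(candidates=("pdflatex", "xelatex", "lualatex", "tectonic")):
    """Return the candidate executables found on PATH, in the order given."""
    return [name for name in candidates if shutil.which(name)]


engines = find_latex_engines()
if engines:
    print("Found LaTeX engines:", ", ".join(engines))
else:
    print("No LaTeX engine found on PATH; PDF compilation will likely fail.")
```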

## Development

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black readanybook tests
isort readanybook tests

# Type check
mypy readanybook

# Lint
ruff check readanybook
```

## Examples

See the [examples](examples/) directory for:

- Processing academic papers
- Creating ML textbook cheat sheets
- Custom template usage
- API integration examples

## License

MIT License - see [LICENSE](LICENSE) for details.

## Contributing

Contributions welcome! Please read [CONTRIBUTING.md](CONTRIBUTING.md) first.

## Acknowledgments

- Built with 🤗 Transformers, ChromaDB, and FastAPI
- Inspired by the need for better study materials
