Metadata-Version: 2.4
Name: granule
Version: 0.1.9
Summary: Granularity-on-demand learning object extractor and composer
Author-email: "R. Cooper Snyder" <robertcoopersnyder@gmail.com>
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: pydantic>=2.7.0
Requires-Dist: youtube-transcript-api>=0.6.2
Provides-Extra: videocard
Requires-Dist: openai>=1.9.9; extra == "videocard"
Requires-Dist: python-dotenv>=1.0.0; extra == "videocard"
Provides-Extra: cli
Requires-Dist: typer>=0.12.0; extra == "cli"
Requires-Dist: rich>=13.0.0; extra == "cli"
Provides-Extra: article
Requires-Dist: beautifulsoup4>=4.12.0; extra == "article"
Requires-Dist: trafilatura>=1.7.0; extra == "article"
Requires-Dist: readability-lxml>=0.8.1; extra == "article"
Requires-Dist: markdown-it-py>=3.0.0; extra == "article"
Provides-Extra: analysis
Requires-Dist: numpy>=1.25.0; extra == "analysis"
Requires-Dist: scikit-learn>=1.3.0; extra == "analysis"
Provides-Extra: vector
Requires-Dist: chromadb>=0.5.0; extra == "vector"
Provides-Extra: web
Requires-Dist: fastapi>=0.111.0; extra == "web"
Requires-Dist: uvicorn>=0.30.0; extra == "web"
Provides-Extra: ui
Requires-Dist: streamlit>=1.34.0; extra == "ui"
Provides-Extra: advanced-llm
Requires-Dist: pydantic-ai>=0.0.12; extra == "advanced-llm"
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Requires-Dist: httpx>=0.27.0; extra == "dev"
Requires-Dist: granule[advanced-llm,analysis,article,cli,ui,vector,videocard,web]; extra == "dev"
Provides-Extra: full
Requires-Dist: granule[advanced-llm,analysis,article,cli,ui,vector,videocard,web]; extra == "full"

# Granule (v0.1.9)
Granule ingests blogs, Hacker News, Reddit, YouTube, podcasts, and news; atomizes them into SemanticAtoms with citations; and composes MicroLearning, FocusedSessions, and DeepMastery units sized to a target duration.

## Install
```bash
pip install -e .
```

### Minimal Video Insight Card Install
If you only need to generate structured `VideoInsightCardPayload` objects (YouTube transcript + OpenAI LLM), install the lightweight extra:

```bash
pip install "granule[videocard]"
```

This pulls in only:
- pydantic (core models)
- youtube-transcript-api (transcript fetch)
- openai (LLM generation)
- python-dotenv (optional .env loading)

Example:
```python
from granule.api_simple import video_insight_card
from dotenv import load_dotenv
load_dotenv()  # reads OPENAI_API_KEY
card = video_insight_card("https://www.youtube.com/watch?v=mAClw7r3ETc")
print(card.model_dump_json(indent=2))
```

If `OPENAI_API_KEY` is unset, you'll still get a minimal fallback card (header + short TL;DR snippet if transcript exists).

### Full Feature Install
For CLI, article ingestion, ML metrics, vector store, API, UI, and advanced agent support:
```bash
pip install "granule[full]"
```

## CLI
```bash
granule ingest "Some article text" --kind blog -o doc.json
granule dissect doc.json --max-tokens 60 --only definition,example -o atoms.json
granule expand atoms.json --target 55s -o micro.json
granule stream atoms.json --pace 55s --until 3m -o session.jsonl
granule simple-youtube https://www.youtube.com/watch?v=dQw4w9WgXcQ -o simple.json
granule simple-card https://www.youtube.com/watch?v=dQw4w9WgXcQ -o card.json
granule video-card https://www.youtube.com/watch?v=dQw4w9WgXcQ -o video_card.json
granule video-card https://www.youtube.com/watch?v=dQw4w9WgXcQ --title "Custom Title" -o titled_card.json
granule text-video-card transcript.txt -o text_card.json
granule text-video-card "[00:00] Intro\n[00:05] Point A" --title "Inline Snippet" -o inline_card.json
```

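The `--target`, `--pace`, and `--until` options above take compact durations such as `55s` and `3m`. How Granule parses them internally is not shown here; the following is a minimal sketch of one way to convert such strings to seconds. The helper name `parse_duration` and the supported unit letters (`s`, `m`, `h`) are assumptions for illustration, not Granule's API.

```python
import re

# Hypothetical helper, not part of Granule's documented API.
# Assumed units: s = seconds, m = minutes, h = hours.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert a compact duration such as '55s' or '3m' to whole seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([smh])\s*", text.lower())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


print(parse_duration("55s"))
print(parse_duration("3m"))
```
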
## FastAPI
```bash
uvicorn granule.fastapi_app:app --reload --port 8000
```
Endpoints:
- POST /ingest {source, kind}
- POST /dissect {doc, max_tokens, kinds}
- POST /expand {graph, target_seconds}

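As a rough illustration of calling the endpoints listed above, the sketch below builds a JSON `POST` request for `/ingest` using only the standard library. The field names `source` and `kind` come from the endpoint list; the base URL, the helper names `build_request` and `post_json`, and the assumption that `kind` accepts the same values as the CLI's `--kind` option (for example `blog`) are illustrative and should be checked against the running server.

```python
import json
import urllib.request

# Assumed base URL, matching the uvicorn command with --port 8000.
BASE_URL = "http://localhost:8000"


def build_request(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a Granule endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, base_url: str = BASE_URL) -> dict:
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(build_request(path, payload, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network.
request = build_request("/ingest", {"source": "Some article text", "kind": "blog"})
print(request.get_method(), request.full_url)
```

With the server running, `post_json("/ingest", {...})` would return the decoded response body; its exact shape is not documented in this README.
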
## YouTube transcript
```python
from granule.ingest.youtube import ingest_youtube
doc = ingest_youtube("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
print(doc.text[:200])
```

## Environment Variables
Granule reads optional environment variables for LLM integration.

1. Copy `.env.example` to `.env` (or export variables another way).
2. Add your OpenAI key if you want LLM-powered features (planned / experimental):

```bash
cp .env.example .env
echo "OPENAI_API_KEY=sk-..." >> .env
```

Variables:

- `OPENAI_API_KEY` – enables LLM-backed features via OpenAI: video insight card generation, segment summaries, and atom enrichment.
- `GRANULE_LLM_MODEL` – preferred model (e.g. `gpt5`).
- `GRANULE_LLM_PROVIDER` – provider alias (`openai`, `azure-openai`, etc.).

- `GRANULE_SUPPRESS_PROBLEM_QUESTIONS` – if set to `1`, `true`, or `on`, skips adding ontology-derived and synthesized integrative "Questions Raised" entries (useful when you only want claims and problems without extra exploratory questions).

If `python-dotenv` is installed, `.env` will be auto-loaded; otherwise export vars normally.

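To make the variables above concrete, here is a minimal sketch of reading them from a script of your own. The helper names `env_flag` and `llm_settings` are hypothetical, and treating the flag values case-insensitively is an assumption; Granule's own parsing may differ.

```python
import os

# Values documented as enabling GRANULE_SUPPRESS_PROBLEM_QUESTIONS.
_TRUTHY = {"1", "true", "on"}


def env_flag(name, environ=None):
    """Return True if the variable is set to 1/true/on (case-insensitive by assumption)."""
    env = os.environ if environ is None else environ
    return env.get(name, "").strip().lower() in _TRUTHY


def llm_settings(environ=None):
    """Collect the LLM-related variables into one dict without exposing the key itself."""
    env = os.environ if environ is None else environ
    return {
        "api_key_present": bool(env.get("OPENAI_API_KEY")),
        "model": env.get("GRANULE_LLM_MODEL"),
        "provider": env.get("GRANULE_LLM_PROVIDER"),
        "suppress_problem_questions": env_flag("GRANULE_SUPPRESS_PROBLEM_QUESTIONS", env),
    }


print(llm_settings())
```
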
## Streamlit UI (YouTube → Atoms → Unit)
Experimental helper UI.

Install extras:
```bash
pip install -e ".[ui,videocard]"
```

Run:
```bash
streamlit run streamlit_app.py
```

Paste a YouTube URL, adjust the parameters, and run the pipeline. Optionally, enrich the first atoms (requires `OPENAI_API_KEY`).

## New: Segments, Summaries & Insight Card
The pipeline now also produces:

- `segments` – improved sentence-cluster segments with token counts.
- `segment_summaries` – per-segment micro summaries (LLM-backed if key present, heuristic fallback otherwise).
- `insight_card` – a heuristic high-level Transcript Insight Card (claims, glossary, metrics stubs) serialized for UI consumption.

These appear in the composite JSON returned by `process_youtube` and in the Streamlit UI (preview table + card header/sections).

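A quick way to check which of these keys a given result contains is sketched below. It works on a plain dict, for example one loaded from saved JSON; the helper name `summarize_composite` is hypothetical, and the sample dict is a placeholder rather than real pipeline output.

```python
# Keys documented as part of the composite JSON.
EXPECTED_KEYS = ("segments", "segment_summaries", "insight_card")


def summarize_composite(result: dict) -> dict:
    """Report, per documented key, whether it is present and how large it is."""
    report = {}
    for key in EXPECTED_KEYS:
        value = result.get(key)
        if value is None:
            report[key] = "missing"
        elif isinstance(value, list):
            report[key] = f"{len(value)} item(s)"
        else:
            report[key] = type(value).__name__
    return report


# Placeholder data for illustration only; real values come from the pipeline.
sample = {"segments": [{}, {}], "insight_card": {}}
for key, status in summarize_composite(sample).items():
    print(f"{key}: {status}")
```
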
## Video Insight Card (Structured JSON)

Granule can produce a richer, schema-driven `VideoInsightCardPayload` either from a YouTube URL (auto transcript + optional title fetch) or any raw transcript text file.

Quickstart:
```bash
# Install first: pip install "granule[videocard]"
# YouTube → structured video insight card (minimal fallback if no OPENAI_API_KEY set)
granule video-card https://www.youtube.com/watch?v=mAClw7r3ETc -o video_card.json

# Override the title (skips the auto-fetched oEmbed title)
granule video-card https://www.youtube.com/watch?v=dQw4w9WgXcQ --title "My Custom Title" -o custom.json

# Raw transcript file → card
granule text-video-card path/to/transcript.txt --title "Workshop Transcript" -o workshop_card.json

# Inline raw text (small snippets) → card
granule text-video-card "[00:00] Intro to X\n[00:10] Key idea" --title "Snippet" -o snippet_card.json
```

If you provide an `OPENAI_API_KEY`, the card is generated via the OpenAI Responses API with strict schema parsing; otherwise a minimal fallback card (header + TL;DR snippet) is produced.

Schema highlights:
- Header (title, subtitle, badges)
- Video meta (url, id)
- Sections (TL;DR, Claims & Evidence, Glossary, Rhetoric, Misconceptions, Questions, Segments, Metrics, etc. – only those with content appear)
- Footer (persuasion modes, devices, timeline events)

Use cases:
- Fast analysis / structuring of transcripts for research
- Feeding downstream UI components or analytics pipelines
- Offline transcript auditing (supply a .txt file without hitting YouTube)

Tip: Use `simple-card` for a lighter heuristic card, or `video-card` for the full structured extraction.

### Extras Overview
| Extra | Purpose |
|-------|---------|
| videocard | Minimal OpenAI-powered video insight card generation |
| cli | Typer/Rich command line UX |
| article | Blog/article/markdown parsing & readability |
| analysis | Optional numeric/text metrics (numpy, scikit-learn) |
| vector | Chroma vector store integration |
| web | FastAPI + Uvicorn API server |
| ui | Streamlit prototype UI |
| advanced-llm | pydantic-ai agent experimentation |
| full | Aggregate of all feature extras |

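To see which extras are usable in the current environment, a sketch like the following checks whether each extra's packages can be imported. The mapping is derived from the package's declared requirements, using each distribution's import name (for example `bs4` for beautifulsoup4 and `sklearn` for scikit-learn); the helper name `missing_modules` is hypothetical, and the check does not verify version constraints.

```python
from importlib.util import find_spec

# Import names for the packages each extra declares.
EXTRA_MODULES = {
    "videocard": ["openai", "dotenv"],
    "cli": ["typer", "rich"],
    "article": ["bs4", "trafilatura", "readability", "markdown_it"],
    "analysis": ["numpy", "sklearn"],
    "vector": ["chromadb"],
    "web": ["fastapi", "uvicorn"],
    "ui": ["streamlit"],
    "advanced-llm": ["pydantic_ai"],
}


def missing_modules(extra, table=EXTRA_MODULES):
    """Return the import names for `extra` that cannot be found in this environment."""
    return [name for name in table[extra] if find_spec(name) is None]


for extra in EXTRA_MODULES:
    missing = missing_modules(extra)
    status = "ok" if not missing else "missing " + ", ".join(missing)
    print(f"{extra}: {status}")
```
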
## Releasing / Publishing

The helper script `publish.sh` automates the version bump, build, and upload.

Examples:
```bash
# Bump patch version, build, upload to PyPI
./publish.sh patch

# Bump minor and upload to TestPyPI
./publish.sh minor --test

# Build only (no version change, no upload)
./publish.sh same --no-upload

# Dry run (show actions without changing files)
./publish.sh patch --dry-run
```

Set the `PYPI_TOKEN` environment variable for non-interactive uploads (create the token in your PyPI account settings).

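For reference, the bump keywords used above (`patch`, `minor`, `same`) follow the usual `MAJOR.MINOR.PATCH` convention. The sketch below shows that arithmetic in Python; it is not taken from `publish.sh`, whose implementation is not shown here, and the `major` branch is an assumption added for completeness.

```python
def bump_version(version: str, part: str) -> str:
    """Return the next version string for a MAJOR.MINOR.PATCH version."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "same":
        return version  # build only, no version change
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # a minor bump resets patch
    if part == "major":  # assumed keyword, not shown in the examples above
        return f"{major + 1}.0.0"
    raise ValueError(f"unknown bump part: {part!r}")


print(bump_version("0.1.9", "patch"))
print(bump_version("0.1.9", "minor"))
```
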