Metadata-Version: 2.4
Name: folio-notion
Version: 0.1.0
Summary: Orchestrator: PDFs → NotebookLM → synthesis → disk + Notion
Author: Folio Notion contributors
License-Expression: MIT
Project-URL: Homepage, https://pypi.org/project/folio-notion/
Keywords: notion,notebooklm,pdf,cli,automation,research
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Office/Business
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: httpx>=0.27.0
Requires-Dist: prompt-toolkit>=3.0.36
Requires-Dist: pyyaml>=6.0.1
Provides-Extra: notebooklm
Requires-Dist: playwright>=1.49.0; extra == "notebooklm"
Requires-Dist: notebooklm-py>=0.3.4; extra == "notebooklm"
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Requires-Dist: build>=1.2.0; extra == "dev"
Requires-Dist: twine>=5.0.0; extra == "dev"
Dynamic: license-file

# Folio Notion

**User story:** configure once → add PDFs → stay logged into NotebookLM → run → local export + Notion.

This repo is a **single product**: an orchestrator that owns the sequence and glue between tools you already use (files, [NotebookLM](https://notebooklm.google.com/), disk, [Notion](https://www.notion.so/)). It does not replace those tools; it drives them in order and passes data between steps.

For technical readers: this is a **research / ingestion pipeline** with orchestration at the center.

## Layout

See the tree below; each top-level folder has a short `README.md` where it helps.

```
folio-notion/
├── README.md
├── pyproject.toml
├── .env.example
├── .gitignore
├── config/                 # checked-in templates; user config lives outside or via env
├── docs/
│   ├── user-workflow.md    # journey + milestones (OSS-friendly)
│   └── roadmap.md
├── src/folio_notion/       # application package
│   ├── cli.py              # “Run pipeline” entrypoint
│   ├── pipeline.py         # sequence: configure → … → Notion
│   ├── steps/              # one module per pipeline stage
│   └── integrations/       # NotebookLM, Notion API, filesystem
├── tests/
├── scripts/                # optional one-off maintenance / dev helpers
└── var/                    # default local scratch (gitignored); exports, caches
```
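
The `pipeline.py` / `steps/` split above suggests an ordered list of stage functions that each receive and return shared state. The following is a minimal sketch under that assumption; `Pipeline`, `Context`, and the two demo steps are hypothetical names, not the package's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical types: the real step signatures are not shown in this README.
Context = dict
Step = Callable[[Context], Context]


@dataclass
class Pipeline:
    """Runs named steps in order, passing one context dict along."""

    steps: list[tuple[str, Step]] = field(default_factory=list)

    def add(self, name: str, step: Step) -> "Pipeline":
        self.steps.append((name, step))
        return self

    def run(self, context: Context) -> Context:
        for name, step in self.steps:
            context = step(context)
            # Record which stages finished, in order.
            context.setdefault("completed", []).append(name)
        return context


# Illustrative stand-ins for real stages (no PDFs or network involved).
def collect_pdfs(ctx: Context) -> Context:
    return {**ctx, "pdfs": ["a.pdf"]}


def synthesize(ctx: Context) -> Context:
    return {**ctx, "summary": f"{len(ctx['pdfs'])} source(s)"}


result = Pipeline().add("collect", collect_pdfs).add("synthesize", synthesize).run({})
```

Keeping each stage a plain function makes it straightforward to skip one (for example the Notion stage) by simply not adding it.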

## Quick start

**Install from PyPI** (after you [publish](docs/publishing.md) the package):

```bash
pip install folio-notion
# or: pip install folio_notion   # same package (normalized name)

# NotebookLM + browser flow:
pip install "folio-notion[notebooklm]"
playwright install chromium
```

**Install from a git clone** (development):

```bash
cd folio-notion
python -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -e .
```

**Run pipeline** (PDFs → NotebookLM chat → `var/exports` + Notion page):

```bash
pip install -e ".[notebooklm]"   # NotebookLM API + browser login
playwright install chromium
# Put PDFs in var/inbox (or set input_dir in config/config.yaml)
fn run                            # same as: fn run -C .
fn run -C /path/to/folio-notion
python -m folio_notion
```

**Run modes**

- `fn run --dry-run`: loads config and verifies NotebookLM storage, plus the Notion connection unless combined with `--skip-notion`. It does not ingest PDFs or call NotebookLM.
- `fn run --skip-notion`: runs NotebookLM synthesis and writes local markdown under `export_dir`, skipping Notion API checks and page creation. Useful for end-to-end NotebookLM testing without publishing.
- **PDFs**: by default Folio looks in `input_dir` (often `var/inbox`). If that folder is empty and your terminal is interactive, it **prompts for file or folder path(s)** (comma-separated). For non-interactive or CI runs, pass `fn run --pdf /path/to/file.pdf` or `--pdf /path/to/folder` (repeat `--pdf` for several paths).
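
The PDF lookup described above (files or folders, several paths, comma-separated answers) could be expanded roughly as follows. This is a sketch, not Folio's code: `resolve_pdfs` is a hypothetical helper, and the non-recursive folder scan and case-insensitive `.pdf` match are assumptions.

```python
from pathlib import Path


def resolve_pdfs(raw_paths: list[str]) -> list[Path]:
    """Expand files and folders into a sorted, de-duplicated list of PDFs."""
    found: set[Path] = set()
    for raw in raw_paths:
        # An interactive answer may hold several comma-separated paths.
        for part in raw.split(","):
            part = part.strip()
            if not part:
                continue
            path = Path(part).expanduser()
            if path.is_dir():
                # Assumption: only the folder's direct children are scanned.
                found.update(
                    p for p in path.iterdir()
                    if p.is_file() and p.suffix.lower() == ".pdf"
                )
            elif path.is_file() and path.suffix.lower() == ".pdf":
                found.add(path)
    return sorted(found)
```

Splitting on commas is convenient for prompt input but would mis-handle a path that itself contains a comma, so a real implementation might split only interactive answers.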

Structured logs: set **`FOLIO_LOG_LEVEL`** to `DEBUG` or `INFO` (default).
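
As a rough illustration of how `FOLIO_LOG_LEVEL` might map onto Python's `logging` levels: the helper name `log_level_from_env` is hypothetical, and falling back to `INFO` for unrecognized values is an assumption.

```python
import logging
import os


def log_level_from_env(env=None) -> int:
    """Return logging.DEBUG or logging.INFO based on FOLIO_LOG_LEVEL."""
    env = os.environ if env is None else env
    name = env.get("FOLIO_LOG_LEVEL", "INFO").strip().upper()
    # Assumption: anything other than DEBUG falls back to the INFO default.
    return logging.DEBUG if name == "DEBUG" else logging.INFO


logging.basicConfig(level=log_level_from_env())
```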

**Status** — see whether Notion / NotebookLM look connected (uses a quick API check; needs network):

```bash
fn status
fn status -C /path/to/folio-notion
```

A `.env` with `NOTION_TOKEN`, `NOTION_PARENT_PAGE_ID`, and `FOLIO_NOTEBOOKLM_STORAGE` (from the connect commands) is required. Configuration is read from `config/config.yaml`, merged with `config/defaults.example.yaml`.
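
The README does not spell out how the two config files are merged. One plausible reading is a recursive merge in which values from `config/config.yaml` win over the defaults. The sketch below shows that behavior on already-parsed dictionaries; `merge_config` and the `notion` keys are hypothetical.

```python
def merge_config(defaults: dict, overrides: dict) -> dict:
    """Recursively merge two mappings; values in `overrides` win."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections instead of replacing them wholesale.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Stand-ins for the parsed YAML files (the "notion" keys are illustrative).
defaults = {"input_dir": "var/inbox", "export_dir": "var/exports",
            "notion": {"enabled": True, "title_prefix": "Folio"}}
user = {"input_dir": "~/papers", "notion": {"title_prefix": "Reading"}}
config = merge_config(defaults, user)
```

With this approach a user file only needs to list the keys it changes; everything else falls through from the defaults.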

**Connect Notion** — verify your [internal integration](https://www.notion.so/my-integrations) secret with the API, optionally check a parent page id, optionally write `.env`:

```bash
fn connect notion
# alias:
fn notion connect
```

Flags: `--token SECRET`, `--parent PAGE_ID`, `--save`, `--no-save`, `-C /path/to/project` (where `.env` lives).
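
For orientation, the public Notion API exposes `GET /v1/users/me` for checking a token, and `GET /v1/pages/{id}` or `GET /v1/databases/{id}` for checking a parent. Whether `fn connect notion` uses exactly these calls is an assumption; the helper names below are hypothetical, and the pinned `Notion-Version` is just one published version.

```python
NOTION_API = "https://api.notion.com/v1"
# One published API version; pin whichever version you target.
NOTION_VERSION = "2022-06-28"


def notion_headers(token: str) -> dict[str, str]:
    """Headers the Notion API expects on every request."""
    return {
        "Authorization": f"Bearer {token}",
        "Notion-Version": NOTION_VERSION,
    }


def token_check_url() -> str:
    """Endpoint that returns the bot user for a valid integration token."""
    return f"{NOTION_API}/users/me"


def parent_check_url(parent_id: str, kind: str = "page") -> str:
    """Endpoint for retrieving a parent page or database by id."""
    if kind not in ("page", "database"):
        raise ValueError("kind must be 'page' or 'database'")
    return f"{NOTION_API}/{kind}s/{parent_id}"
```

With httpx (already a dependency), the token check would be roughly `httpx.get(token_check_url(), headers=notion_headers(token))`, treating a 200 response as a valid secret.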

Interactive prompts use **prompt-toolkit**, so **arrow keys and normal line editing** work once the package is installed (for example with `pip install -e .`). The parent may be a **Notion page or database**; database URLs are verified via the databases API.

In the interactive parent step, invalid input re-prompts, and a short guide appears after every 3 failures. **Enter** skips the parent; **quit** / **exit** / **q** also leaves the parent empty and continues to the save question.
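
The retry behavior described above can be sketched as a small loop. This is an illustration only: `prompt_for_parent` and its parameters are hypothetical, the input function is injected so the loop can run without a terminal, and Enter and the quit words are both modeled as returning `None`.

```python
from typing import Callable, Optional

QUIT_WORDS = {"quit", "exit", "q"}


def prompt_for_parent(
    ask: Callable[[str], str],
    validate: Callable[[str], Optional[str]],
    show_guide: Callable[[str], None] = print,
) -> Optional[str]:
    """Keep asking until input validates, is skipped, or the user quits."""
    failures = 0
    while True:
        answer = ask("Parent page or database (Enter to skip): ").strip()
        if answer == "" or answer.lower() in QUIT_WORDS:
            return None  # parent stays empty; the caller moves on to saving
        parsed = validate(answer)
        if parsed is not None:
            return parsed
        failures += 1
        if failures % 3 == 0:
            # Remind the user what is expected after every third bad answer.
            show_guide("Paste a Notion page or database URL, or its id.")
```

In the real CLI, `ask` would presumably be prompt-toolkit's `prompt` function rather than a scripted callable.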

Do **not** paste lines from `pyproject.toml` (like `[project.scripts]`) into the shell; those belong only in the file.

### Connect NotebookLM (browser → storage → httpx)

**Architecture:** interactive **Playwright** session once → save `storage_state.json` → **httpx** loads cookies and fetches NotebookLM page tokens (**SNlM0e**, **FdrFJe**) for future RPC-style calls. Cookie parsing patterns are aligned with [notebooklm-py](https://github.com/teng-lin/notebooklm-py); paths default to `~/.folio-notion/notebooklm/` (override with `NOTEBOOKLM_HOME` / `FOLIO_NOTEBOOKLM_HOME`).

```bash
pip install -e ".[notebooklm]"
playwright install chromium   # if not already installed
fn connect notebooklm
# alias:
fn notebooklm connect
```

- Chromium uses stealth-ish flags (`AutomationControlled`, no `--enable-automation` banner); **persistent profile** under `browser_profile/`.
- Session file is **`storage_state.json`** (mode `600` where supported). Optional: `NOTEBOOKLM_AUTH_JSON` for CI (same shape as Playwright storage).
- Set `FOLIO_PLAYWRIGHT_AUTO_INSTALL=0` to skip automatic `playwright install chromium`.
- Flags: `--storage PATH`, `--no-verify`, `--save` / `--no-save`, `-C` project root (for `.env`).
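
The storage location and file mode mentioned above can be illustrated as follows. This is a sketch under assumptions: `notebooklm_home` and `write_private_json` are hypothetical helpers, and the precedence between `FOLIO_NOTEBOOKLM_HOME` and `NOTEBOOKLM_HOME` is a guess, since the README does not state it.

```python
import json
import os
from pathlib import Path


def notebooklm_home(env=None) -> Path:
    """Resolve the NotebookLM state directory, honoring env overrides."""
    env = os.environ if env is None else env
    # Assumed precedence: the Folio-specific variable wins if both are set.
    override = env.get("FOLIO_NOTEBOOKLM_HOME") or env.get("NOTEBOOKLM_HOME")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".folio-notion" / "notebooklm"


def write_private_json(path: Path, payload: dict) -> None:
    """Write JSON readable only by the owner (mode 600) where supported."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with restrictive permissions from the start.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
    try:
        # Tighten the mode if the file already existed with looser bits.
        os.chmod(path, 0o600)
    except OSError:
        pass  # some platforms or filesystems do not support this
```

Opening with `os.open` and an explicit mode avoids a window in which the session file exists with default, more permissive permissions.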

## Configuration

Configure paths, prompts, the Notion parent page, and an optional notebook id **once**, via the `config/` templates and environment variables (see `.env.example`). After `fn connect notion`, `NOTION_TOKEN` and optionally `NOTION_PARENT_PAGE_ID` can live in `.env`. After `fn connect notebooklm`, `FOLIO_NOTEBOOKLM_STORAGE` can point at your saved `storage_state.json`.
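
A quick way to see which of these variables are still missing is to parse `.env` and compare it against the expected keys. The sketch below uses a deliberately simple `KEY=VALUE` parser. `parse_dotenv` and `missing_keys` are hypothetical helpers, and the required set is taken from the Status note in this README.

```python
REQUIRED_KEYS = ("NOTION_TOKEN", "NOTION_PARENT_PAGE_ID", "FOLIO_NOTEBOOKLM_STORAGE")


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes; no escape handling in this sketch.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """List required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


sample = """
# written by fn connect notion
NOTION_TOKEN="secret_example"
NOTION_PARENT_PAGE_ID=
"""
todo = missing_keys(parse_dotenv(sample))
```

Here `todo` lists the parent page id and the NotebookLM storage path, which points at the connect steps that still need to run.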
