Metadata-Version: 2.4
Name: glyph-forge
Version: 3.0.0
Summary: Pip installable client for Glyph Forge API
Project-URL: Homepage, https://www.glyphapi.ai/
Project-URL: Issues, https://github.com/Devpro-LLC/glyph-forge-client/issues
Project-URL: Source, https://github.com/Devpro-LLC/glyph-forge-client
Author: Devpro LLC
License: Apache-2.0
License-File: LICENSE
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.9
Requires-Dist: httpx>=0.25.0
Requires-Dist: lxml<6.0,>=5.2
Requires-Dist: python-docx<2.0,>=1.1
Provides-Extra: docs
Requires-Dist: furo>=2025.0.0; extra == 'docs'
Requires-Dist: myst-parser>=4.0.0; extra == 'docs'
Requires-Dist: sphinx-autodoc-typehints>=3.0.0; extra == 'docs'
Requires-Dist: sphinx-copybutton>=0.5.0; extra == 'docs'
Requires-Dist: sphinx>=8.0.0; extra == 'docs'
Provides-Extra: test
Requires-Dist: pytest-cov>=4.0.0; extra == 'test'
Requires-Dist: pytest>=7.0.0; extra == 'test'
Description-Content-Type: text/markdown

# Glyph Forge

A Python framework for turning LLM plaintext into styled DOCX documents. Inspired by HTML/CSS and Tailwind design patterns — schemas define the baseline formatting, inline markup handles the overrides, and AI agents handle the rest.

## Installation

```bash
pip install glyph-forge
```

## Quick Start

```python
from glyph_forge import ForgeClient, create_workspace

ws = create_workspace()
client = ForgeClient()

# Build schema from a reference DOCX
schema = client.build_schema_from_docx(ws, docx_path="template.docx", save_as="my_schema")

# Generate a new DOCX from plaintext
docx_path = client.run_schema(
    ws,
    schema=schema,
    plaintext="Your content here...",
    dest_name="output.docx"
)
```

---

## Tools & Their Purpose

Glyph Forge has several tools. Each one does a specific job. Understanding what each tool is (and isn't) for is key to building reliable workflows.

### Schemas — Your Baseline Formatter

**What it does:** A schema maps heuristic types (headings, paragraphs, lists, tables) to styling rules. When you run a schema against plaintext, Glyph classifies each line by its structural role and applies the matching style.

**What it is NOT:** A schema is not AI. It does not understand meaning, context, or semantics. It matches structural patterns — a short title-cased line is a heading, a line starting with `•` is a bullet, and so on.

**When to use it:** Always. The schema is the foundation of every Glyph workflow. Start here.

**Performance:** Schemas compile locally in milliseconds. No API key and no network calls.

```python
# A schema selector targets a structural pattern (shown here as a Python dict)
selector = {
    "type": "H-SHORT",       # Short headings (title case, ALL CAPS, <=6 words)
    "style": {"font": {"bold": True, "size": 18}},
}
```
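
To make "structural patterns, not meaning" concrete, here is a toy classifier. It is only a sketch: the rules and the `EMPTY` label are simplified assumptions for illustration, not Glyph Forge's actual heuristic engine.

```python
def classify_line(line: str) -> str:
    """Return a form label based purely on the line's surface structure."""
    stripped = line.strip()
    if not stripped:
        return "EMPTY"  # hypothetical label for blank lines
    if stripped.startswith("•"):
        return "L-BULLET"
    words = stripped.split()
    looks_like_title = stripped.isupper() or stripped.istitle()
    ends_like_sentence = stripped.endswith((".", "?", "!", ":"))
    if len(words) <= 6 and looks_like_title and not ends_like_sentence:
        return "H-SHORT"
    return "P-BODY"


sample = [
    "Professional Summary",
    "• Led a team of five engineers",
    "I have ten years of experience building document pipelines.",
]
labels = [classify_line(line) for line in sample]
print(labels)
```

Nothing in these rules looks at what the words mean, which is why semantic requests belong to markup or an agent.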

### Inline Markup — Context-Aware Overrides

**What it does:** Inline markup lets you (or an LLM) embed styling instructions directly in the plaintext. Block markup (`$glyph-{utilities}`) wraps entire paragraphs. Inline markup (`[utilities]text[/]`) styles specific words or phrases.

**What it is NOT:** A replacement for schemas. Markup handles exceptions and overrides — it is not meant to style every line of a document from scratch.

**When to use it:** When you need to style something a schema can't identify on its own. A schema knows what a heading looks like structurally, but it doesn't know that "Professional Summary" is a section you want bolded in blue. An LLM does.

**Cascade rule:** Inline markup always overrides schema styles. `[bold,color-FF0000]` on a word wins over whatever the schema says for that line.

```
$glyph-font-size-11
This is normal body text, but [bold,color-FF0000]this phrase[/] stands out.
$glyph
```
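
The cascade rule can be sketched in a few lines. This is an illustration of the idea, not Glyph Forge's parser: `split_inline`, `resolve_style`, and the flat style dict are hypothetical, and only the `bold` and `color-` utilities shown above are handled.

```python
import re

# Matches one inline span of the form [utility,utility]text[/].
INLINE_TAG = re.compile(r"\[([^\[\]/]+)\](.*?)\[/\]")


def split_inline(line: str) -> list[tuple[str, list[str]]]:
    """Split a line into (text, utilities) segments; plain text gets []."""
    segments = []
    cursor = 0
    for match in INLINE_TAG.finditer(line):
        if match.start() > cursor:
            segments.append((line[cursor:match.start()], []))
        utilities = [u.strip() for u in match.group(1).split(",")]
        segments.append((match.group(2), utilities))
        cursor = match.end()
    if cursor < len(line):
        segments.append((line[cursor:], []))
    return segments


def resolve_style(schema_style: dict, utilities: list[str]) -> dict:
    """Apply utilities on top of the schema style; markup wins on conflict."""
    resolved = dict(schema_style)
    for utility in utilities:
        if utility == "bold":
            resolved["bold"] = True
        elif utility.startswith("color-"):
            resolved["color"] = utility[len("color-"):]
    return resolved


line = "This is normal body text, but [bold,color-FF0000]this phrase[/] stands out."
schema_style = {"bold": False, "color": "000000", "size": 11}  # assumed baseline
segments = split_inline(line)
styled = [(text, resolve_style(schema_style, utils)) for text, utils in segments]
```

Untagged segments keep the schema style unchanged. Only the tagged phrase picks up the overrides, and properties the markup does not mention (here `size`) still come from the schema.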

### Plaintext Agent (Markup Agent) — LLM-Powered Styling

**What it does:** You describe what you want in natural language, and the agent rewrites the plaintext with the appropriate `$glyph` blocks and `[utilities]text[/]` inline tags inserted.

**What it is NOT:** A content generator. It does not write or rewrite your text. It wraps existing text in markup.

**When to use it:** When you already have an established schema and want to apply styling that requires understanding the meaning of the text — things like "bold the professional summary" or "make the warning section red."

**Requires:** API key. This is an AI agent, so it adds processing time.

```python
# The agent reads the plaintext, interprets the request, and inserts markup
response = client.ask(message="Make the professional summary bold", current_plaintext=plaintext)
marked_up = response["plaintext"]  # The marked-up text is returned under "plaintext"
```
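
Because the agent is only supposed to wrap existing text, one cheap safeguard is to strip the markup from its output and compare the result with the input. The sketch below assumes the markup forms shown in this README (a `$glyph-...` opener and a bare `$glyph` closer on their own lines, plus `[utilities]text[/]` spans). `strip_markup` and `text_is_unchanged` are hypothetical helpers, not part of the client, and plaintext containing literal square brackets would need a stricter pattern.

```python
import re

BLOCK_LINE = re.compile(r"^\$glyph(-\S+)?$")   # "$glyph-..." opener or bare "$glyph" closer
INLINE_OPEN = re.compile(r"\[[^\[\]/]+\]")     # e.g. "[bold,color-FF0000]"
INLINE_CLOSE = "[/]"


def strip_markup(marked_up: str) -> str:
    """Remove block marker lines and inline tags, keeping the wrapped text."""
    kept = []
    for line in marked_up.splitlines():
        if BLOCK_LINE.match(line.strip()):
            continue  # marker lines carry no content
        kept.append(INLINE_OPEN.sub("", line).replace(INLINE_CLOSE, ""))
    return "\n".join(kept)


def text_is_unchanged(original: str, marked_up: str) -> bool:
    """True when the marked-up text differs from the original only by markup."""
    return strip_markup(marked_up).strip() == original.strip()


original = "Professional Summary\nSeasoned engineer with a focus on reliability."
agent_output = (
    "$glyph-bold-color-1F4E78\n"
    "Professional Summary\n"
    "$glyph\n"
    "Seasoned engineer with a [bold]focus on reliability[/]."
)
ok = text_is_unchanged(original, agent_output)
```

If the check fails, the safest fallback is usually to compile the original plaintext with the schema alone.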

### Schema Agent — Developer Scaffolding

**What it does:** Helps you quickly draft or edit schemas through natural language prompts. You describe the document structure you want, and it generates selector JSON.

**What it is NOT:** A source of truth. Schemas generated by this agent should be reviewed, tested, and stored in your backend. Do not use agent-generated schemas in production without human review.

**When to use it:** During development, to bootstrap a schema quickly. Think of it like a code generator — useful to get started, but you own the output.

**Requires:** API key.
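
Part of that review can be automated before a human looks at the draft. The following is a minimal sketch that assumes a schema is a JSON array of selectors shaped like the `type`/`style` example earlier. The real schema format may differ, and `review_selectors` and `KNOWN_FORMS` are hypothetical names.

```python
import json

# Form names mentioned in this README; extend to match the forms you actually use.
KNOWN_FORMS = {"H-SHORT", "H-SECTION-N", "L-BULLET", "P-BODY", "T-ROW"}


def review_selectors(raw_json: str) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    try:
        selectors = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(selectors, list):
        return ["expected a JSON array of selectors"]
    problems = []
    for index, selector in enumerate(selectors):
        if not isinstance(selector, dict):
            problems.append(f"selector {index}: not an object")
            continue
        if selector.get("type") not in KNOWN_FORMS:
            problems.append(f"selector {index}: unknown type {selector.get('type')!r}")
        if not isinstance(selector.get("style"), dict):
            problems.append(f"selector {index}: missing or non-object 'style'")
    return problems


draft = (
    '[{"type": "H-SHORT", "style": {"font": {"bold": true, "size": 18}}},'
    ' {"type": "H-HUGE", "style": null}]'
)
problems = review_selectors(draft)
```

A check like this catches malformed drafts early, but it does not replace rendering a test document and inspecting the result.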

### XML Agent — Experimental Final Polish

**What it does:** Operates directly on the unzipped DOCX XML structure. An LLM identifies a target element in the XML and writes modifications to it.

**What it is NOT:** A content generator or primary formatter. Do not use it to style an entire document. Its job is surgical, targeted edits — a final polish step when the schema and markup aren't enough.

**When to use it:** Rarely. When you need something that can't be expressed through schemas or markup — for example, modifying a specific XML attribute that Glyph's styling utilities don't cover. In theory an LLM can write anything to a DOCX with this method, but it requires precision.

**Status:** Beta. The accuracy and reliability of direct XML writing is still being researched.

**Requires:** API key.
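
For readers who have not seen the inside of a DOCX, the sketch below shows the kind of targeted edit this involves, using only the standard library. It does not use the XML Agent. The archive is a stand-in that holds just `word/document.xml`, and `center_paragraph` is a hypothetical helper.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # keep the conventional "w:" prefix on output

DOCUMENT_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Quarterly Report</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Revenue grew this quarter.</w:t></w:r></w:p>"
    "</w:body></w:document>"
)


def center_paragraph(document_xml: str, target_text: str) -> str:
    """Set w:jc="center" on the first paragraph whose text equals target_text."""
    root = ET.fromstring(document_xml)
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        text = "".join(t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t"))
        if text != target_text:
            continue
        props = paragraph.find(f"{{{W_NS}}}pPr")
        if props is None:
            props = ET.Element(f"{{{W_NS}}}pPr")
            paragraph.insert(0, props)  # paragraph properties come before the runs
        justification = props.find(f"{{{W_NS}}}jc")
        if justification is None:
            justification = ET.SubElement(props, f"{{{W_NS}}}jc")
        justification.set(f"{{{W_NS}}}val", "center")
        break
    return ET.tostring(root, encoding="unicode")


# Stand-in archive: a real DOCX has more parts (content types, styles, relationships).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", DOCUMENT_XML)

buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    original_xml = archive.read("word/document.xml").decode("utf-8")

edited_xml = center_paragraph(original_xml, "Quarterly Report")
```

Even this small edit has to respect element order and namespaces, which is why direct XML writing is reserved for cases that schemas and markup cannot express.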

### Form Detection — Heuristic Line Classification

**What it does:** Classifies each line of plaintext by its structural form (H-SHORT, L-BULLET, P-BODY, T-ROW, etc.) using the same heuristic engine that powers schemas. Returns a list of classifications with confidence scores.

**What it is NOT:** AI. This is the same deterministic heuristic engine used by schemas, exposed as a standalone tool.

**When to use it:** When you want to understand what Glyph "sees" in your plaintext before building a schema. Also useful for filtering — extract only headings, or only list items, from a large document.

**Performance:** Local, milliseconds, no API key.

```python
result = client.detect_forms(ws, text=text, forms=["H-SHORT", "L-BULLET"])
```

### Document Chunking — Heading-Bounded Splitting

**What it does:** Splits plaintext or DOCX files at heading boundaries, producing independent chunks that can be processed one at a time.

**What it is NOT:** Semantic chunking. It splits at structural heading boundaries detected by heuristics, not by topic or meaning.

**When to use it:** To reduce LLM context window usage. Instead of sending a 50-page document to an LLM, chunk it and process one section at a time. Works with both plaintext files and DOCX files.

**Performance:** Local, milliseconds, no API key.

```python
result = client.chunk_plaintext_text(ws, text=text)
for chunk in result["chunks"]:
    llm_response = call_llm(chunk["plaintext"])  # Each chunk fits in context
```
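
Heading-bounded splitting can be pictured with the toy version below. It is a sketch under simplified assumptions: `is_heading` is a stand-in for Glyph's heading detection, and only the `plaintext` key mirrors the result shape used above.

```python
def is_heading(line: str) -> bool:
    """Simplified stand-in: a short title-cased or ALL CAPS line with no end punctuation."""
    stripped = line.strip()
    if not stripped or stripped.endswith((".", "?", "!", ":")):
        return False
    return len(stripped.split()) <= 6 and (stripped.isupper() or stripped.istitle())


def chunk_at_headings(text: str) -> list[dict]:
    """Start a new chunk at every heading line; each chunk keeps its own heading."""
    chunks: list[dict] = []
    current: list[str] = []

    def flush() -> None:
        body = "\n".join(current).strip()
        if body:  # skip chunks that contain only blank lines
            chunks.append({"plaintext": body})

    for line in text.splitlines():
        if is_heading(line) and current:
            flush()
            current = []
        current.append(line)
    flush()
    return chunks


document = (
    "Introduction\n"
    "Glyph turns plaintext into DOCX.\n"
    "\n"
    "Installation\n"
    "Run pip install glyph-forge.\n"
)
chunks = chunk_at_headings(document)
```

The split points depend only on where headings are detected, so two sections about the same topic still end up in separate chunks.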

---

## Tool Summary

| Tool | AI? | API Key? | Speed | Purpose |
|------|-----|----------|-------|---------|
| **Schema** | No | No | Milliseconds | Baseline structural styling |
| **Inline Markup** | No | No | Milliseconds | Embedded style overrides in plaintext |
| **Plaintext Agent** | Yes | Yes | Seconds | LLM applies markup based on meaning |
| **Schema Agent** | Yes | Yes | Seconds | LLM drafts/edits schemas |
| **XML Agent** | Yes | Yes | Seconds | Direct DOCX XML modifications (beta) |
| **Form Detection** | No | No | Milliseconds | Classify lines by heuristic form |
| **Chunking** | No | No | Milliseconds | Split documents at heading boundaries |

---

## Workflow Patterns

### Pattern 1: Schema Only (Fastest)

The simplest path. Good when your document structure is consistent and predictable.

```
LLM writes plaintext --> Schema styles each line by heuristic type --> DOCX
                         (milliseconds, no AI)
```

```python
schema = client.build_schema_from_docx(ws, docx_path="template.docx")
docx_path = client.run_schema(ws, schema=schema, plaintext=plaintext)
```

### Pattern 2: Schema + Markup Agent (Most Common)

Schema handles the structural baseline, then the markup agent adds context-aware overrides.

```
LLM writes plaintext --> Markup agent inserts styling --> Schema compiles --> DOCX
                         (seconds, requires API key)      (milliseconds)
```

```python
schema = client.build_schema_from_docx(ws, docx_path="template.docx")

# Agent understands "professional summary" semantically and inserts markup
response = client.ask(
    message="Bold the professional summary and make section headers dark blue",
    current_plaintext=plaintext,
)
marked_up_plaintext = response["plaintext"]

docx_path = client.run_schema(ws, schema=schema, plaintext=marked_up_plaintext)
```

### Pattern 3: Chunk + Process (Large Documents)

For documents that exceed LLM context windows, chunk first, process per-section, reassemble.

```
Document --> Chunk at headings --> Process each chunk --> Reassemble --> Schema --> DOCX
             (milliseconds)       (per-chunk LLM calls)
```

```python
schema = client.build_schema_from_docx(ws, docx_path="template.docx")
chunks = client.chunk_plaintext_text(ws, text=full_document)

processed_sections = []
for chunk in chunks["chunks"]:
    result = call_llm(chunk["plaintext"])  # Your LLM call
    processed_sections.append(result)

final_plaintext = "\n".join(processed_sections)
docx_path = client.run_schema(ws, schema=schema, plaintext=final_plaintext)
```
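
When many sections are short, sending one request per chunk can be wasteful. One possible refinement is to pack consecutive chunks into batches that stay under a size budget, as in this sketch. `pack_chunks` and `max_chars` are hypothetical, and character count is only a rough proxy for tokens.

```python
def pack_chunks(chunks: list[dict], max_chars: int) -> list[str]:
    """Greedily merge consecutive chunks so each batch stays within max_chars.

    A single chunk longer than max_chars is kept as its own oversized batch.
    """
    batches: list[str] = []
    current = ""
    for chunk in chunks:
        text = chunk["plaintext"]
        candidate = f"{current}\n\n{text}" if current else text
        if current and len(candidate) > max_chars:
            batches.append(current)  # budget exceeded: close the batch
            current = text
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches


demo_chunks = [{"plaintext": "A" * 40}, {"plaintext": "B" * 40}, {"plaintext": "C" * 40}]
batches = pack_chunks(demo_chunks, max_chars=100)
```

Because chunks are only merged with their neighbours, document order is preserved and the processed batches can be joined back together in sequence.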

### Pattern 4: Detect + Filter (Pre-Processing)

Use form detection to extract or filter specific content types before processing.

```
Document --> Detect forms --> Filter by type --> Process subset
             (milliseconds)
```

```python
result = client.detect_forms(ws, text=text, forms=["H-SHORT", "H-SECTION-N"])
headings = [c["text"] for c in result["classifications"]]
# Use headings as a table of contents, outline, or navigation structure
```
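
Since classifications carry confidence scores, you may also want to drop low-confidence matches before building an outline. The sketch below works on a hand-written stand-in for a `detect_forms` result. Only `classifications` and `text` appear in the examples in this README. The `form` and `confidence` keys and the `build_outline` helper are assumptions.

```python
# Hand-written stand-in for a detect_forms result ("form" and "confidence" are assumed keys).
result = {
    "classifications": [
        {"text": "Overview", "form": "H-SHORT", "confidence": 0.94},
        {"text": "1. Scope", "form": "H-SECTION-N", "confidence": 0.88},
        {"text": "Notes", "form": "H-SHORT", "confidence": 0.41},
    ]
}


def build_outline(result: dict, min_confidence: float = 0.5) -> list[str]:
    """Keep the text of classifications at or above the confidence threshold."""
    outline = []
    for item in result["classifications"]:
        # Items without a score are kept rather than silently dropped.
        if item.get("confidence", 1.0) < min_confidence:
            continue
        outline.append(item["text"])
    return outline


outline = build_outline(result)
```

The threshold is a tuning knob: a higher value gives a cleaner outline at the cost of missing borderline headings.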

---

## Building a Workflow: Resume Builder Example

Here's how the tools layer together for a real use case.

**Step 1 — Build your schema.** Look at your reference resume DOCX. Note the font sizes, colors, and spacing for headings, body text, bullet lists. Create selectors for each:

```python
schema = client.build_schema_from_docx(ws, docx_path="resume_template.docx", save_as="resume")
```

At this point, `LLM plaintext -> schema -> DOCX` already produces a solid result with correct heading sizes, bullet formatting, and paragraph spacing. This runs in milliseconds with no API key.

**Step 2 — Add markup for semantic styling.** A schema knows what a heading looks like structurally, but it doesn't know that "Professional Summary" should be styled differently from "Education." Use the markup agent:

```python
response = client.ask(
    message="Bold the professional summary section header and make it dark blue",
    current_plaintext=resume_plaintext,
)
```

The agent returns the same plaintext, but now the professional summary line is wrapped in a `$glyph-bold-color-1F4E78` block. When the schema runs, that markup overrides the default heading style for just that line.

**Step 3 — Compile.** Feed the marked-up plaintext through the schema:

```python
docx_path = client.run_schema(
    ws,
    schema=schema,
    plaintext=response["plaintext"],
    dest_name="resume_output.docx"
)
```

**The key insight:** Schemas are fast and deterministic. Agents are smart but slow. Layer them — schema for the 90% that's structural, agents for the 10% that requires understanding.

---

## CLI Usage

```bash
# Build schema from template
glyph-forge build template.docx -o ./output

# Build and run in one command
glyph-forge build-and-run template.docx input.txt -o ./output

# Run existing schema
glyph-forge run schema.json input.txt -o ./output

# Detect forms in plaintext
glyph-forge detect-forms document.txt --forms H-SHORT,L-BULLET

# Chunk a document
glyph-forge chunk report.txt
glyph-forge chunk report.docx
```

## Documentation

Full documentation: [glyphapi.ai](https://www.glyphapi.ai/)

## License

Apache License 2.0 - see [LICENSE](LICENSE) for details.

Copyright 2025 Devpro LLC
