Metadata-Version: 2.4
Name: worai
Version: 2.2.0
Summary: AI-powered CLI for WordLift knowledge graph and SEO workflows.
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: copier<10.0.0,>=9.7.1
Requires-Dist: jinja2>=3.1.0
Requires-Dist: morph-kgc>=2.7.0
Requires-Dist: playwright>=1.48.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: pyshacl>=0.26.0
Requires-Dist: typer>=0.12.5
Requires-Dist: wordlift-sdk<4.0.0,>=3.6.0
Provides-Extra: dev
Requires-Dist: pytest>=8.3.4; extra == "dev"

# worai

Command-line toolkit for WordLift operations and SEO checks.
Pronunciation: "waw-RYE"

Docs: https://docs.wordlift.io/worai/

## Install

- `pipx install worai`
- `pip install worai`

Runtime dependency note:
- `wordlift-sdk>=3.6.0,<4.0.0` (installed automatically by pip)
- `copier` (required by `worai graph sync create`, installed automatically by pip)

If you plan to run `seocheck`, install Playwright browsers:
- `playwright install chromium`

## Quick Start

- `worai --help`
- `worai seocheck https://example.com/sitemap.xml`
- `worai google-search-console --site sc-domain:example.com --client-secrets ./client_secrets.json`
- `worai <command> --help`

## Configuration

Config file (TOML) discovery order:
- `--config`
- `WORAI_CONFIG`
- `./worai.toml`
- `~/.config/worai/config.toml`
- `~/.worai.toml`
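
The discovery order above reads as "first match wins". The following is a minimal sketch of that lookup, not worai's actual implementation. The function name `discover_config` and its parameters are hypothetical, and it assumes that an explicit `--config` or `WORAI_CONFIG` value is taken as given, while the remaining candidates are skipped when the file does not exist.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def discover_config(
    cli_config: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    cwd: Optional[Path] = None,
    home: Optional[Path] = None,
) -> Optional[Path]:
    """Return the first config path in the documented order, or None.

    Hypothetical helper: it only illustrates the documented order.
    """
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else cwd
    home = Path.home() if home is None else home

    # 1. --config, then 2. WORAI_CONFIG: explicit choices (assumed to be used as given).
    if cli_config:
        return Path(cli_config).expanduser()
    if env.get("WORAI_CONFIG"):
        return Path(env["WORAI_CONFIG"]).expanduser()

    # 3-5. Well-known locations: the first existing file wins.
    candidates = [
        cwd / "worai.toml",
        home / ".config" / "worai" / "config.toml",
        home / ".worai.toml",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Passing `env`, `cwd`, and `home` explicitly keeps the sketch easy to test without touching the real environment.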

Profiles:
- `[profile.<name>]` with `--profile` or `WORAI_PROFILE`

Common keys:
- `wordlift.api_key`
- `gsc.id`
- `gsc.client_secrets`
- `ga.id`
- `ga.client_secrets`
- `oauth.token` (shared token for GSC + GA)

Supported environment variables:
- `WORAI_CONFIG` — path to a config TOML file (overrides discovery order).
- `WORAI_PROFILE` — profile name under `[profile.<name>]`.
- `WORAI_LOG_LEVEL` — default log level (`debug|info|warning|error`).
- `WORAI_LOG_FORMAT` — default log format (`text|json`).
- `WORDLIFT_KEY` — WordLift API key for entity operations.
- `WORDLIFT_API_KEY` — alternate WordLift API key name (also accepted by some commands).
- `GSC_CLIENT_SECRETS` — path to OAuth client secrets JSON for GSC.
- `GSC_ID` — GSC property URL.
- `OAUTH_TOKEN` — path to store the shared OAuth token (GSC + GA).
- `GSC_OUTPUT` — default output CSV path for GSC export.
- `GA_ID` — GA4 property ID for Analytics sections.
- `GA_CLIENT_SECRETS` — path to OAuth client secrets JSON for GA4.
- `GSC_TOKEN` / `GA_TOKEN` — legacy aliases for `OAUTH_TOKEN` (must point to the same file if used).
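
The legacy alias rule (all token variables must point to the same file) can be sketched as follows. This is an illustration only: `resolve_oauth_token_path` is a hypothetical helper, and comparing paths after expanding `~` and normalizing them is an assumption about what "the same file" means, not confirmed worai behavior.

```python
import os
from typing import Mapping, Optional

# Variable names taken from the list above.
TOKEN_VARS = ("OAUTH_TOKEN", "GSC_TOKEN", "GA_TOKEN")


def resolve_oauth_token_path(env: Mapping[str, str]) -> Optional[str]:
    """Return the single token path named by OAUTH_TOKEN or its legacy aliases."""
    # Collect every variable that is set, normalizing each path for comparison.
    found = {
        name: os.path.abspath(os.path.expanduser(env[name]))
        for name in TOKEN_VARS
        if env.get(name)
    }
    if not found:
        return None
    # All variables that are set must agree on one file.
    if len(set(found.values())) > 1:
        raise ValueError(f"Conflicting token paths: {found}")
    return next(iter(found.values()))
```

Raising on a mismatch, rather than silently picking one variable, makes a stale legacy alias visible early.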

`.env` support:
- `worai` loads `.env` from the current working directory (and parent lookup) at startup.
- values from `.env` are treated as environment variables.
- existing environment variables take precedence over `.env` values.
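
The precedence rule above can be illustrated with a small standard-library sketch. It is not worai's loader: `parse_dotenv` and `effective_env` are hypothetical names, and the parser is deliberately simplified (no escapes, no interpolation). For reference, `python-dotenv` (a declared dependency) leaves already-set variables untouched when `load_dotenv` is called with its default `override=False`; whether worai calls it exactly that way is an assumption.

```python
from typing import Dict, Mapping


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines (simplified: no escapes or interpolation)."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop surrounding single or double quotes from the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def effective_env(dotenv_text: str, environ: Mapping[str, str]) -> Dict[str, str]:
    """Merge .env values under the real environment: existing variables win."""
    merged = parse_dotenv(dotenv_text)
    merged.update(environ)
    return merged
```

With this merge, a `WORDLIFT_KEY` exported in the shell is kept even if `.env` defines a different value.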

Example environment setup:
```
export WORDLIFT_KEY="wl_..."
# Use $HOME here: the shell does not expand "~" inside double quotes.
export WORAI_CONFIG="$HOME/worai.toml"
export WORAI_PROFILE="dev"
export GSC_CLIENT_SECRETS="$HOME/client_secrets.json"
export OAUTH_TOKEN="$HOME/oauth_token.json"
```

Example `worai.toml`:
```
[defaults]
log_level = "info"

[wordlift]
api_key = "wl_..."

[gsc]
id = "sc-domain:example.com"
client_secrets = "/path/to/client_secrets.json"

[ga]
id = "123456789"
client_secrets = "/path/to/client_secrets.json"

[oauth]
token = "/path/to/oauth_token.json"
```
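
The dotted names under "Common keys" correspond to TOML tables and their keys. The sketch below shows that mapping with the standard TOML parser. `get_key` is a hypothetical helper written for this illustration and is not part of worai.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport with the same API for Python 3.10

# A shortened copy of the example file above.
EXAMPLE = """
[wordlift]
api_key = "wl_..."

[gsc]
id = "sc-domain:example.com"
"""


def get_key(config: dict, dotted: str, default=None):
    """Look up a dotted key such as "gsc.id" in nested TOML tables."""
    node = config
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


config = tomllib.loads(EXAMPLE)
print(get_key(config, "gsc.id"))                          # sc-domain:example.com
print(get_key(config, "oauth.token", default="<unset>"))  # <unset>
```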

## Commands

- `seocheck` — run SEO checks against sitemap URLs.
- `google-search-console` — export GSC page metrics to CSV.
- `dedupe` — deduplicate WordLift entities by schema:url.
- `canonicalize-duplicate-pages` — choose canonical URLs using GSC KPIs.
- `delete-entities-from-csv` — delete entities listed in a CSV.
- `find-faq-page-wrong-type` — find/patch FAQPage type issues.
- `find-missing-names` — list pages missing schema:name/headline.
- `find-url-by-type` — extract schema:url by type from RDF.
- `graph` — run graph-specific workflows.
- `link-groups` — build/apply LinkGroup data from CSV.
- `patch` — patch entities from RDF.
- `structured-data` — generate JSON-LD/YARRRML mappings or materialize RDF from YARRRML.
- `validate` — validate JSON-LD against SHACL shapes (use `structured-data validate page` for webpage URLs).
- `upload-entities-from-turtle` — upload .ttl files with resume.

Command help:
- `worai <command> --help`

Autocompletion:
- `worai --install-completion`
- `worai --show-completion`

## Examples

seocheck
- `worai seocheck https://example.com/sitemap.xml`
- `worai seocheck https://example.com/sitemap.xml --output-dir ./seocheck-report --save-html`
- `worai seocheck https://example.com/sitemap.xml --output-dir ./seocheck-report --no-open-report`
- `worai seocheck https://example.com/sitemap.xml --user-agent "Mozilla/5.0 ..."`
- `worai seocheck https://example.com/sitemap.xml --sitemap-fetch-mode browser`
- `worai seocheck https://example.com/sitemap.xml --no-report-ui`
- `worai seocheck https://example.com/sitemap.xml --recheck-failed --recheck-from ./seocheck-report`

google-search-console
- `worai google-search-console --site sc-domain:example.com --client-secrets ./client_secrets.json`
  - Uses OAuth redirect port 8080 by default.

seoreport (with Analytics)
- `worai seoreport --site sc-domain:example.com --ga-id 123456789 --format html`

canonicalize-duplicate-pages
- `worai canonicalize-duplicate-pages --input gsc_pages.csv --output canonical_targets.csv --kpi-window 28d --kpi-metric clicks`
- `worai canonicalize-duplicate-pages --input gsc_pages.csv --entity-type Product`

dedupe
- `worai dedupe --dry-run`

find-faq-page-wrong-type
- `worai find-faq-page-wrong-type ./data.ttl --dry-run --replace-type`
- `worai find-faq-page-wrong-type ./data.ttl --patch --replace-type`

find-missing-names
- `worai find-missing-names ./data.ttl`

find-url-by-type
- `worai find-url-by-type ./data.ttl schema:Service schema:Product`

link-groups
- `worai link-groups ./links.csv --format turtle`
- `worai link-groups ./links.csv --apply --dry-run --concurrency 4`

graph
- `worai --config ./worai.toml graph sync run --profile acme`
- `worai graph sync run --profile acme --debug`
- `worai graph sync create ./acme-graph`
- `worai graph sync create ./acme-graph --template ./graph-sync-template --defaults`
- `worai graph sync create ./acme-graph --data-file ./answers.yml --non-interactive`
- `worai graph sync create ./acme-graph --vcs-ref v1.2.3`
  - `graph sync create` runs Copier in trusted mode by default so template `_tasks` execute.
  - Mapping docs (for `[profiles.<name>]`): `docs/graph-sync-mappings-reference.md`, `docs/graph-sync-mappings-guide.md`, `docs/graph-sync-mappings-examples.md`
  - `web_page_import_timeout` is configured in seconds in `worai.toml` (`60` -> `60000` ms in SDK).
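
The unit conversion noted for `web_page_import_timeout` is a plain seconds-to-milliseconds multiplication. A minimal sketch follows. `timeout_seconds_to_ms` is a hypothetical name, and accepting fractional seconds and rejecting negative values are assumptions, not documented behavior.

```python
def timeout_seconds_to_ms(seconds: float) -> int:
    """Convert a timeout given in seconds (as in worai.toml) to milliseconds."""
    if seconds < 0:
        raise ValueError("timeout must be non-negative")
    return int(round(seconds * 1000))


print(timeout_seconds_to_ms(60))  # 60000
```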

patch
- `worai patch ./data.ttl --dry-run --add-types`

structured-data
- `worai structured-data create https://example.com/article Review --output-dir ./structured-data`
- `worai structured-data create https://example.com/article --type Review --output-dir ./structured-data`
- `worai structured-data create https://example.com/article --type Review --debug`
- `worai structured-data create https://example.com/article --type Review --max-xhtml-chars 40000 --max-nesting-depth 2`
- `worai structured-data generate https://example.com/sitemap.xml --yarrrml ./mapping.yarrrml --output-dir ./out`
- `worai structured-data generate https://example.com/page --yarrrml ./mapping.yarrrml --format jsonld`
- `worai structured-data inventory https://example.com/sitemap.xml --output ./structured-data-inventory.csv`
- `worai structured-data inventory ./urls.txt --output ./structured-data-inventory.csv`
- `worai structured-data inventory https://docs.google.com/spreadsheets/d/<id>/edit --sheet-name URLs_US --output ./structured-data-inventory.csv`
- `worai structured-data inventory https://example.com/sitemap.xml --destination-sheet-id <spreadsheet_id> --destination-sheet-name Inventory`
- `worai structured-data inventory https://example.com/sitemap.xml --output ./structured-data-inventory.csv --concurrency auto`
- `worai structured-data inventory /path/to/debug_cloud/us --source-type debug-cloud --output ./structured-data-inventory.csv`

validate
- `worai validate jsonld --shape review-snippet --shape schema-review ./data.jsonld`
- `worai validate jsonld --format raw https://api.wordlift.io/data/example.jsonld`
- `worai structured-data validate page https://example.com/article --shape review-snippet`

upload-entities-from-turtle
- `worai upload-entities-from-turtle ./entities --recursive --limit 50`

## Troubleshooting

- Playwright missing browsers:
  - `playwright install chromium`
- YARRRML conversion:
  - `npm install -g @rmlio/yarrrml-parser`
- RML execution:
  - `morph-kgc` is included in project dependencies
- Dependency notes:
  - Common runtime libs (e.g., `requests`, `rdflib`, `tqdm`, `advertools`, Google auth helpers) are provided transitively by `wordlift-sdk`.
- OAuth token issues:
  - Remove the token file and re-run `worai google-search-console`.
  - If you are prompted to re-authenticate on every run, the stored token likely lacks a refresh token. Delete the token file to force a new consent flow that includes one.
