Metadata-Version: 2.4
Name: worai
Version: 1.0.2
Summary: Command-line toolkit for WordLift operations and SEO checks.
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: google-auth>=2.35.0
Requires-Dist: google-auth-oauthlib>=1.2.1
Requires-Dist: playwright>=1.48.0
Requires-Dist: rdflib>=7.5.0
Requires-Dist: requests>=2.32.5
Requires-Dist: tqdm>=4.67.1
Requires-Dist: typer>=0.12.5
Requires-Dist: wordlift-client>=1.133.0
Provides-Extra: dev
Requires-Dist: pytest>=8.3.4; extra == "dev"

# WORAi

Command-line toolkit for WordLift operations and SEO checks.

## Install

- Local:
  - `pipx install .`
- From git:
  - `pipx install git+<repo_url>`
- Official (PyPI):
  - `pipx install worai`
  - `pip install worai`
- Curl installer:
  - `curl -fsSL <install.sh_url> | bash -s -- <repo_url>`

If you plan to run `seocheck`, install Playwright browsers:
- `playwright install chromium`

## Quick Start

- `worai --help`
- `worai seocheck https://example.com/sitemap.xml`
- `worai google-search-console --site sc-domain:example.com --client-secrets ./client_secrets.json`
- `worai <command> --help`

## Configuration

Config file (TOML) discovery order:
- `--config`
- `WORAI_CONFIG`
- `./worai.toml`
- `~/.config/worai/config.toml`
- `~/.worai.toml`
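
The discovery order above can be sketched in Python as follows. This is a minimal illustration, not worai's actual implementation: the function name `resolve_config_path` and its parameters are hypothetical, and it assumes "first existing file wins" (a real CLI might instead fail when an explicit `--config` path is missing).

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(
    cli_config: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    cwd: Optional[Path] = None,
    home: Optional[Path] = None,
) -> Optional[Path]:
    """Return the first candidate config file that exists, or None.

    Hypothetical helper: the candidate order mirrors the documented list,
    but the fall-through behaviour is an assumption.
    """
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else cwd
    home = Path.home() if home is None else home

    candidates = []
    if cli_config:  # value passed via --config
        candidates.append(Path(cli_config).expanduser())
    if env.get("WORAI_CONFIG"):  # environment override
        candidates.append(Path(env["WORAI_CONFIG"]).expanduser())
    candidates += [
        cwd / "worai.toml",
        home / ".config" / "worai" / "config.toml",
        home / ".worai.toml",
    ]

    for path in candidates:
        if path.is_file():
            return path
    return None
```

Passing `env`, `cwd`, and `home` explicitly keeps the sketch easy to test without touching the real home directory.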

Profiles:
- `[profile.<name>]` with `--profile` or `WORAI_PROFILE`

Common keys:
- `wordlift.api_key`
- `gsc.client_secrets`
- `gsc.token`

Supported environment variables:
- `WORAI_CONFIG` — path to a config TOML file (overrides discovery order).
- `WORAI_PROFILE` — profile name under `[profile.<name>]`.
- `WORAI_LOG_LEVEL` — default log level (`debug|info|warning|error`).
- `WORAI_LOG_FORMAT` — default log format (`text|json`).
- `WORDLIFT_KEY` — WordLift API key for entity operations.
- `WORDLIFT_API_KEY` — alternate WordLift API key name (also accepted by some commands).
- `GSC_CLIENT_SECRETS` — path to OAuth client secrets JSON for GSC.
- `GSC_TOKEN` — path to store the OAuth token.
- `GSC_OUTPUT` — default output CSV path for GSC export.
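
A script that wraps worai may want to check for the API key before invoking a command. The sketch below reads the two WordLift variables listed above; the lookup order (`WORDLIFT_KEY` first, then `WORDLIFT_API_KEY`) and the helper name `wordlift_api_key` are assumptions for illustration, not documented worai behaviour.

```python
import os
from typing import Mapping, Optional


def wordlift_api_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first non-empty WordLift key found in the environment.

    The precedence between the two variable names is assumed, not documented.
    """
    env = os.environ if env is None else env
    for name in ("WORDLIFT_KEY", "WORDLIFT_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            return value
    return None
```

Called without arguments, the helper reads from `os.environ`; passing a mapping makes it easy to exercise in tests.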

Example environment setup:
```
export WORDLIFT_KEY="wl_..."
export WORAI_CONFIG="$HOME/worai.toml"
export WORAI_PROFILE="dev"
export GSC_CLIENT_SECRETS="$HOME/client_secrets.json"
```

Example `worai.toml`:
```
[defaults]
log_level = "info"

[wordlift]
api_key = "wl_..."

[gsc]
client_secrets = "/path/to/client_secrets.json"
token = "/path/to/gsc_token.json"
```

## Commands

- `seocheck` — run SEO checks against sitemap URLs. Docs: `docs/commands/seocheck.md`
- `google-search-console` — export GSC page metrics to CSV. Docs: `docs/commands/google-search-console.md`
- `dedupe` — deduplicate WordLift entities by schema:url. Docs: `docs/commands/dedupe.md`
- `canonicalize-duplicate-pages` — choose canonical URLs using GSC KPIs. Docs: `docs/commands/canonicalize-duplicate-pages.md`
- `delete-entities-from-csv` — delete entities listed in a CSV. Docs: `docs/commands/delete-entities-from-csv.md`
- `find-faq-page-wrong-type` — find/patch FAQPage type issues. Docs: `docs/commands/find-faq-page-wrong-type.md`
- `find-missing-names` — list pages missing schema:name/headline. Docs: `docs/commands/find-missing-names.md`
- `find-url-by-type` — extract schema:url by type from RDF. Docs: `docs/commands/find-url-by-type.md`
- `link-groups` — build/apply LinkGroup data from CSV. Docs: `docs/commands/link-groups.md`
- `patch` — patch entities from RDF. Docs: `docs/commands/patch.md`
- `upload-entities-from-turtle` — upload .ttl files with resume. Docs: `docs/commands/upload-entities-from-turtle.md`

Command help:
- `worai <command> --help`

Autocompletion:
- `worai --install-completion`
- `worai --show-completion`

## Examples

seocheck
- `worai seocheck https://example.com/sitemap.xml`
- `worai seocheck https://example.com/sitemap.xml --output-dir ./seocheck-report --save-html`

google-search-console
- `worai google-search-console --site sc-domain:example.com --client-secrets ./client_secrets.json`

canonicalize-duplicate-pages
- `worai canonicalize-duplicate-pages --input gsc_pages.csv --output canonical_targets.csv --kpi-window 28d --kpi-metric clicks`

dedupe
- `worai dedupe --dry-run`

find-faq-page-wrong-type
- `worai find-faq-page-wrong-type ./data.ttl --dry-run --replace-type`
- `worai find-faq-page-wrong-type ./data.ttl --patch --replace-type`

find-missing-names
- `worai find-missing-names ./data.ttl`

find-url-by-type
- `worai find-url-by-type ./data.ttl schema:Service schema:Product`

link-groups
- `worai link-groups ./links.csv --format turtle`
- `worai link-groups ./links.csv --apply --dry-run --concurrency 4`

patch
- `worai patch ./data.ttl --dry-run --add-types`

upload-entities-from-turtle
- `worai upload-entities-from-turtle ./entities --recursive --limit 50`

## Troubleshooting

- Playwright missing browsers:
  - `playwright install chromium`
- OAuth token issues:
  - Remove the token file and re-run `worai google-search-console`.
