Metadata-Version: 2.4
Name: worai
Version: 1.0.0
Summary: WORAi CLI utilities
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: google-auth>=2.35.0
Requires-Dist: google-auth-oauthlib>=1.2.1
Requires-Dist: playwright>=1.48.0
Requires-Dist: rdflib>=7.5.0
Requires-Dist: requests>=2.32.5
Requires-Dist: tqdm>=4.67.1
Requires-Dist: typer>=0.12.5
Requires-Dist: wordlift-client>=1.133.0
Provides-Extra: dev
Requires-Dist: pytest>=8.3.4; extra == "dev"

This folder contains WORAi CLI utilities.

Install
- Local: `pipx install .`
- From git: `pipx install git+<repo_url>`
- Curl installer: `curl -fsSL <install.sh_url> | bash -s -- <repo_url>`

Usage
- `worai --help`
- `worai seocheck <sitemap_url_or_path>`
- `worai google-search-console --site sc-domain:example.com --client-secrets /path/to/client_secrets.json`

Configuration
- Config file (TOML) discovery order: `--config`, `WORAI_CONFIG`, `./worai.toml`, `~/.config/worai/config.toml`, `~/.worai.toml`.
- Profiles: define `[profile.<name>]` tables and select one with `--profile` or `WORAI_PROFILE`.
- Common keys:
  - `wordlift.api_key`
  - `gsc.client_secrets`
  - `gsc.token`

seocheck
The `seocheck` command runs SEO checks against URLs found in a sitemap. It supports remote sitemap URLs and local sitemap files.

Usage
- `worai seocheck <sitemap_url_or_path>`

Options
- `sitemap_url`: URL or local file path to a sitemap XML (or sitemap index). Supports `.gz` files and `file://` URLs.
- `--max-urls`: Limit the number of URLs checked. Default: no limit.
- `--timeout`: Timeout in seconds for HTTP requests (sitemaps, robots.txt, llms.txt). Default: 20.0.
- `--page-timeout`: Timeout in milliseconds for browser page loads. Default: 30000.
- `--wait-until`: Playwright wait strategy for page load. Choices: `domcontentloaded`, `load`, `networkidle`. Default: `domcontentloaded`.
- `--ttfb-ok-ms`: Time to first byte (TTFB) threshold, in milliseconds, for an ok rating. Default: 200.
- `--ttfb-warn-ms`: Time to first byte (TTFB) threshold, in milliseconds, for a warn rating. Default: 500.
- `--headed`: Run the browser with a visible UI instead of headless mode.
- `--format`: Output format. Choices: `text`, `json`. Default: `text`.
- `--output-dir`: Write report outputs to this directory (report.json, summary.txt, per-page JSONs, report UI).
- `--output`: Write a comprehensive JSON report to this file path.
- `--output-summary`: Write a human-readable summary report to this file path.
- `--save-html`: Save rendered HTML for each page to the output directory.
- `--checks`: Comma-separated list of page check names to run (others disabled).
- `--disable-checks`: Comma-separated list of page check names to skip.
- `--concurrency`: Number of pages to process concurrently, or `auto`. Default: 1.
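
The two TTFB thresholds presumably split response times into three bands. A minimal sketch of that classification, assuming inclusive boundaries and the band names `ok`, `warn`, and `fail` (both are assumptions, not taken from the worai source):

```python
def classify_ttfb(ttfb_ms: float, ok_ms: float = 200.0, warn_ms: float = 500.0) -> str:
    """Classify a time-to-first-byte measurement against two thresholds.

    Defaults mirror the documented --ttfb-ok-ms and --ttfb-warn-ms values.
    The band names and the inclusive comparisons are illustrative guesses.
    """
    if ttfb_ms <= ok_ms:
        return "ok"
    if ttfb_ms <= warn_ms:
        return "warn"
    return "fail"
```

Under these assumptions, raising `--ttfb-ok-ms` widens the ok band, while raising `--ttfb-warn-ms` only delays the point at which a slow page is reported as failing.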

Notes
- The page checks are modular. Add a new check under `worai/seocheck/checks` and register it in `worai/seocheck/checks/__init__.py`.
- Pages are loaded in a Playwright-driven Chromium browser so that JavaScript-rendered content is included in the checks.
- When using `--output-dir`, open `index.html` from that directory to view the report UI (or serve it with a web server).
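
To illustrate what a modular check registry and the `--checks` / `--disable-checks` selection could look like, here is a hypothetical sketch. The `register` decorator, the `CHECKS` dict, the check signature (rendered HTML in, result dict out), and the check bodies are all invented for illustration; the real interface in `worai/seocheck/checks` may differ.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical check signature: rendered HTML in, result dict out.
CheckFn = Callable[[str], dict]

# Hypothetical registry mapping check names to functions.
CHECKS: dict[str, CheckFn] = {}


def register(name: str) -> Callable[[CheckFn], CheckFn]:
    """Register a check function under the given name."""
    def decorator(fn: CheckFn) -> CheckFn:
        CHECKS[name] = fn
        return fn
    return decorator


@register("canonical")
def check_canonical(html: str) -> dict:
    # Deliberately naive: only looks for a canonical link marker.
    return {"ok": 'rel="canonical"' in html}


@register("page_meta")
def check_page_meta(html: str) -> dict:
    # Deliberately naive: only looks for a <title> element.
    return {"ok": "<title>" in html}


def select_checks(available, checks=None, disable_checks=None) -> list[str]:
    """Apply comma-separated include and exclude lists to the available names."""
    def parse(value):
        return {name.strip() for name in (value or "").split(",") if name.strip()}

    wanted, skipped = parse(checks), parse(disable_checks)
    unknown = (wanted | skipped) - set(available)
    if unknown:
        raise ValueError(f"unknown checks: {sorted(unknown)}")
    return [
        name for name in available
        if (not wanted or name in wanted) and name not in skipped
    ]
```

In this sketch an include list disables every check it does not name, and an exclude list is applied afterwards, which matches the option descriptions but is not confirmed behaviour.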

Examples
- `worai seocheck https://example.com/sitemap.xml`
- `worai seocheck ./sitemap.xml`
- `worai seocheck /path/to/sitemap.xml.gz`
- `worai seocheck https://example.com/sitemap.xml --wait-until networkidle`
- `worai seocheck https://example.com/sitemap.xml --max-urls 25 --format json`
- `worai seocheck https://example.com/sitemap.xml --output ./report.json --output-summary ./report.txt`
- `worai seocheck https://example.com/sitemap.xml --output-dir ./seocheck-report --save-html`
- `worai seocheck https://example.com/sitemap.xml --checks status,page_meta,canonical`
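
When `seocheck` is driven from a script (for example a scheduled job), the command line can be assembled programmatically. The helper below is a sketch: `seocheck_argv` is a hypothetical name, and it only uses flags documented above.

```python
from __future__ import annotations

WAIT_UNTIL_CHOICES = ("domcontentloaded", "load", "networkidle")


def seocheck_argv(
    sitemap: str,
    *,
    max_urls: int | None = None,
    wait_until: str | None = None,
    fmt: str | None = None,
    output_dir: str | None = None,
    checks: list[str] | None = None,
) -> list[str]:
    """Build an argument list for `worai seocheck` from keyword options."""
    argv = ["worai", "seocheck", sitemap]
    if max_urls is not None:
        argv += ["--max-urls", str(max_urls)]
    if wait_until is not None:
        if wait_until not in WAIT_UNTIL_CHOICES:
            raise ValueError(f"wait_until must be one of {WAIT_UNTIL_CHOICES}")
        argv += ["--wait-until", wait_until]
    if fmt is not None:
        argv += ["--format", fmt]
    if output_dir is not None:
        argv += ["--output-dir", output_dir]
    if checks:
        argv += ["--checks", ",".join(checks)]
    return argv


# To run the command (requires worai on PATH):
#   import subprocess
#   subprocess.run(seocheck_argv("https://example.com/sitemap.xml", max_urls=25), check=True)
```

Passing a list to `subprocess.run` rather than a shell string avoids quoting problems with sitemap URLs that contain `&` or `?`.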

Command Examples
- `worai google-search-console --site sc-domain:example.com --client-secrets ./client_secrets.json`
- `worai dedupe --dry-run`
- `worai canonicalize-duplicate-pages --input gsc_pages.csv --output canonical_targets.csv --kpi-window 28d --kpi-metric clicks`
- `worai delete-entities-from-csv ./entities.csv --batch-size 20`
- `worai find-faq-page-wrong-type ./data.ttl`
- `worai find-faq-page-wrong-type ./data.ttl --dry-run --replace-type`
- `worai find-faq-page-wrong-type ./data.ttl --patch --replace-type`
- `worai find-missing-names ./data.ttl`
- `worai find-url-by-type ./data.ttl schema:Service schema:Product`
- `worai link-groups ./links.csv --format turtle`
- `worai link-groups ./links.csv --apply --dry-run --concurrency 4`
- `worai patch ./data.ttl --dry-run --add-types`
- `worai patch ./data.jsonld --types-only --workers 4`
- `worai upload-entities-from-turtle ./entities --recursive --limit 50`

Client secrets format
The `--client-secrets` file is the OAuth2 client configuration downloaded from Google Cloud Console.
It looks like:

{
  "installed": {
    "client_id": "YOUR_CLIENT_ID.apps.googleusercontent.com",
    "project_id": "your-project-id",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    "client_secret": "YOUR_CLIENT_SECRET",
    "redirect_uris": [
      "http://localhost"
    ]
  }
}
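
Before running the command, it can help to sanity-check the downloaded file against the shape shown above. The following is a small sketch; `check_client_secrets` is a hypothetical helper, and the set of keys it treats as required is an assumption rather than something worai documents.

```python
import json
from pathlib import Path

# Keys assumed to be required for a Desktop app ("installed") client.
REQUIRED_KEYS = ("client_id", "client_secret", "auth_uri", "token_uri")


def check_client_secrets(data: dict) -> list[str]:
    """Return a list of problems found in a parsed client secrets document."""
    section = data.get("installed")
    if not isinstance(section, dict):
        # Web application clients use a top-level "web" object instead.
        return ['missing top-level "installed" object']
    return [
        f'missing "installed.{key}"'
        for key in REQUIRED_KEYS
        if not section.get(key)
    ]


def check_client_secrets_file(path: str) -> list[str]:
    """Load a client secrets JSON file and validate its shape."""
    return check_client_secrets(json.loads(Path(path).read_text(encoding="utf-8")))
```

An empty list means the file has the expected structure; it does not confirm that the credentials themselves are valid.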

Setup steps
1. Google Cloud Console -> APIs & Services -> OAuth consent screen:
   - Configure a consent screen (External or Internal) and add yourself as a test user if required.
2. APIs & Services -> Credentials -> Create Credentials -> OAuth client ID:
   - Application type: Desktop app.
3. Download the JSON and pass its path to `--client-secrets`.
4. Run the command:
   - `worai google-search-console --site sc-domain:example.com --client-secrets /path/to/client_secrets.json`
   - If your OAuth client requires a fixed redirect URI (e.g. Web app), add `--port 8080` and register `http://localhost:8080`.
5. The first run opens a browser for consent and writes a token file (default `gsc_token.json`).
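
For readers curious about what the first-run consent flow involves, the sketch below shows a typical installed-app OAuth flow using `google-auth-oauthlib`, which this package depends on. It is not the worai implementation: the function name, the read-only Search Console scope, and the token caching logic are assumptions.

```python
from pathlib import Path

# Assumed scope: read-only access to Search Console data.
SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]


def load_or_create_credentials(client_secrets: str,
                               token_path: str = "gsc_token.json",
                               port: int = 0):
    """Reuse a cached token if possible, otherwise run the browser consent flow."""
    # Imported lazily so the module loads even if the Google libraries are absent.
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from google_auth_oauthlib.flow import InstalledAppFlow

    token_file = Path(token_path)
    creds = None
    if token_file.exists():
        creds = Credentials.from_authorized_user_file(str(token_file), SCOPES)
    if creds and creds.valid:
        return creds

    if creds and creds.expired and creds.refresh_token:
        # Refresh silently without opening a browser.
        creds.refresh(Request())
    else:
        # Opens a browser; port=0 lets the OS pick a free local port,
        # while a fixed value such as 8080 matches a registered redirect URI.
        flow = InstalledAppFlow.from_client_secrets_file(client_secrets, SCOPES)
        creds = flow.run_local_server(port=port)

    token_file.write_text(creds.to_json(), encoding="utf-8")
    return creds
```

Deleting the token file forces the consent screen to appear again on the next run, which is useful after changing scopes or switching Google accounts.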
