Metadata-Version: 2.4
Name: genworker-ai
Version: 0.2.1
Summary: Desktop AI Agent powered by Claude Computer Use — controls mouse, keyboard & shell via natural language
Project-URL: Homepage, https://github.com/fabiokatsumi/GENWORKER
Project-URL: Documentation, https://github.com/fabiokatsumi/GENWORKER#platform-setup-guide
Project-URL: Repository, https://github.com/fabiokatsumi/GENWORKER
Project-URL: Issues, https://github.com/fabiokatsumi/GENWORKER/issues
Project-URL: Changelog, https://github.com/fabiokatsumi/GENWORKER/releases
Author-email: Fabio Katsumi <fabio.katsumi@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: agent,ai,automation,claude,computer-use,desktop,macos,rpa,windows
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: MacOS X
Classifier: Environment :: Win32 (MS Windows)
Classifier: Environment :: X11 Applications
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Desktop Environment
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.13
Requires-Dist: anthropic>=0.83.0
Requires-Dist: openai>=2.23.0
Requires-Dist: pillow>=12.1.1
Requires-Dist: pyautogui>=0.9.54
Requires-Dist: pyperclip>=1.11.0
Requires-Dist: python-dotenv>=1.2.1
Requires-Dist: screeninfo>=0.8.1
Description-Content-Type: text/markdown

# GenWorker

A desktop AI agent powered by Claude Computer Use. GenWorker takes natural-language tasks and executes them on your desktop by controlling the mouse, keyboard, and shell — using Claude's computer, bash, and text editor tools.

## Features

- **Computer Use** — mouse clicks, keyboard input, and screenshots via Claude's computer tool
- **Shell Execution** — runs PowerShell/CMD/bash commands
- **File Editing** — view, create, and edit files with the str_replace_editor tool
- **Multi-monitor support** — select which display to control
- **Adaptive screenshots** — fast for CLI, slower for GUI loads, with JPEG compression
- **Retry & error recovery** — exponential backoff on API errors, automatic recovery hints
- **Context management** — token tracking, smart compaction, and image pruning
- **Task timeout** — configurable time limit with graceful termination
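
The retry behaviour listed above can be illustrated with a minimal sketch of exponential backoff. This is not GenWorker's actual implementation: the names `call_with_retry`, `max_retries`, `base`, and `cap` are hypothetical, and a real agent would catch only the retryable API errors rather than every exception.

```python
import time


def call_with_retry(fn, max_retries=3, base=1.0, cap=30.0, sleep=time.sleep):
    """Call fn(); after a failure wait base, 2*base, 4*base, ... seconds (capped) and retry.

    All parameter names are illustrative assumptions, not GenWorker options.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            # Out of retries: surface the last error to the caller.
            if attempt == max_retries:
                raise
            # Double the wait on every failed attempt, never exceeding `cap`.
            sleep(min(cap, base * 2 ** attempt))
```

Passing `sleep` as a parameter keeps the helper easy to test, since a test can record the waits instead of actually sleeping.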

## Installation

**With pip:**

```bash
pip install genworker-ai
```

**With [uv](https://docs.astral.sh/uv/) (recommended):**

```bash
uv pip install genworker-ai
```

Or run it directly without installing — `uv` will create a temporary environment, resolve dependencies, and execute the command in one step:

```bash
uv run --with genworker-ai genworker "Open Chrome and search for Python docs"
```

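You can also use `uvx` (shorthand for `uv tool run`) to run it as a CLI tool. The package is named `genworker-ai` while the command used throughout this guide is `genworker`, so name both explicitly with `--from`:

```bash
uvx --from genworker-ai genworker "Open Notepad and write Hello World"
```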

## Requirements

- Python 3.13+
- An [Anthropic API key](https://console.anthropic.com/)

## Quick start

1. Create a `.env` file (or export the variable):

   ```
   ANTHROPIC_API_KEY=sk-ant-...
   ```

2. Run a task:

   ```bash
   genworker "Open Notepad and write Hello World"
   ```

   Or start interactively:

   ```bash
   genworker
   ```

   Or with `uv` (no install needed):

   ```bash
   uv run --with genworker-ai genworker "Open Notepad and write Hello World"
   ```

## Usage

```
genworker [OPTIONS] [TASK]
```

| Option | Description |
|---|---|
| `TASK` | Natural-language task to execute (interactive prompt if omitted) |
| `--monitor`, `-m` | Monitor index to use (default: `0`) |
| `--list-monitors`, `-l` | List available monitors and exit |
| `--no-thinking` | Disable Claude's extended thinking |
| `--timeout`, `-t` | Task timeout in seconds (default: `600`) |
| `--gui` | Launch the graphical interface |

You can also run it as a Python module:

```bash
python -m genworker "Open Chrome and search for Python docs"
```
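
If you want to drive GenWorker from another Python script, one option is to build the documented command line and launch it as a subprocess. The sketch below uses only the flags from the options table above. The helper names `build_command` and `run_task` are hypothetical and not part of the package.

```python
import subprocess
import sys


def build_command(task=None, monitor=0, timeout=600, thinking=True):
    """Return the argv list for `python -m genworker [OPTIONS] [TASK]`."""
    cmd = [sys.executable, "-m", "genworker",
           "--monitor", str(monitor),
           "--timeout", str(timeout)]
    if not thinking:
        cmd.append("--no-thinking")
    if task is not None:
        # The task goes last, matching `genworker [OPTIONS] [TASK]`.
        cmd.append(task)
    return cmd


def run_task(task, **options):
    """Run one task and return the process exit code.

    Needs genworker-ai installed and ANTHROPIC_API_KEY set in the environment.
    """
    return subprocess.run(build_command(task, **options)).returncode
```

For example, `run_task("Open Notepad and write Hello World", timeout=120)` would start the agent with a two-minute limit.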

## Platform Setup Guide

GenWorker controls your desktop: mouse, keyboard, and screen capture. The permissions and setup this requires differ by operating system, and Linux often needs none. Follow the instructions for your platform below.

### macOS

#### 1. Grant Accessibility Permission (Required)

macOS blocks apps from controlling the mouse and keyboard unless you explicitly grant **Accessibility** access. Without this, GenWorker will run but **mouse clicks and key presses will silently do nothing**.

1. Open **System Settings** (or System Preferences on older macOS)
2. Go to **Privacy & Security → Accessibility**
3. Click the **+** button (you may need to unlock with your password)
4. Add the terminal app you use to run GenWorker:
   - **Terminal.app** — `/Applications/Utilities/Terminal.app`
   - **iTerm2** — `/Applications/iTerm.app`
   - **VS Code integrated terminal** — add `/Applications/Visual Studio Code.app`
   - **PyCharm terminal** — add the PyCharm application
5. Make sure the toggle next to your app is **ON**
6. **Restart your terminal** after granting access (required for the permission to take effect)

> **How do I know if this is working?**
> GenWorker runs built-in diagnostics at startup on macOS. Look for:
> ```
> 🔍  macOS mouse diagnostics
> ──────────────────────────────────────────────────
>   ✅  Accessibility: GRANTED
> ```
> If you see `❌  Accessibility: DENIED`, follow the steps above.

#### 2. Grant Screen Recording Permission (Required for screenshots)

GenWorker takes screenshots to see what's on your screen. macOS requires **Screen Recording** permission for this.

1. Open **System Settings → Privacy & Security → Screen Recording**
2. Click **+** and add the same terminal app from step 1
3. Toggle it **ON**
4. **Restart your terminal**

> On first run, macOS may show a popup asking for Screen Recording permission — click **Allow**.

#### 3. Retina Display Notes

GenWorker automatically detects Retina (HiDPI) displays and handles coordinate scaling. The startup diagnostics will report:

```
📐  CoreGraphics logical  = 1512×982
📐  CoreGraphics physical = 3024×1964
📐  Retina scale factor   = 2.0×
✅  Coordinate scaling: screeninfo matches logical points
```

If a mismatch is detected, GenWorker will auto-correct. No manual action is needed.
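
To make the scaling concrete, here is a minimal sketch of the arithmetic involved, assuming a screenshot is captured in physical pixels while the mouse is positioned in logical points. It is an illustration only, not GenWorker's code, and the function names are hypothetical.

```python
def scale_factor(logical_size, physical_size):
    """Return the (x, y) ratio of physical pixels to logical points."""
    logical_w, logical_h = logical_size
    physical_w, physical_h = physical_size
    return physical_w / logical_w, physical_h / logical_h


def to_logical(point, logical_size, physical_size):
    """Convert a screenshot coordinate (physical pixels) to logical points."""
    sx, sy = scale_factor(logical_size, physical_size)
    x, y = point
    return round(x / sx), round(y / sy)
```

With the sizes from the diagnostics above, `to_logical((1000, 500), (1512, 982), (3024, 1964))` gives `(500, 250)`.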

#### 4. macOS Quick Start

```bash
# Install (pick one)
pip install genworker-ai
# or
uv pip install genworker-ai

# Set your API key
export ANTHROPIC_API_KEY=sk-ant-...

# Run (diagnostics will confirm permissions are OK)
genworker "Open Safari and search for Python documentation"

# Or use the GUI
genworker --gui

# Or run without installing (uv handles everything)
uv run --with genworker-ai genworker "Open Safari and search for Python documentation"
```

---

### Windows

#### 1. Display Scaling (DPI)

Windows display scaling (100%, 125%, 150%, etc.) can cause mouse clicks to land at wrong positions. For best results:

- **Option A** — Set display scaling to **100%** in Settings → Display → Scale
- **Option B** — If you need a higher scaling, GenWorker will attempt to handle it automatically, but 100% is the most reliable setting

To check your current scaling:
1. Right-click the desktop → **Display settings**
2. Look at **Scale and layout** → **Scale**

#### 2. Running as Administrator

Some tasks require elevated privileges (e.g., interacting with apps running as admin, or modifying system settings). If GenWorker can't click on certain windows:

1. Right-click your terminal (Command Prompt, PowerShell, or Windows Terminal)
2. Select **Run as administrator**
3. Run GenWorker from that elevated terminal

> For most tasks, standard (non-admin) mode works fine. Only escalate if you encounter permission issues.

#### 3. Antivirus / Security Software

Some antivirus programs may flag GenWorker's mouse/keyboard control as suspicious. If you encounter issues:

- Add your Python installation directory to your antivirus exclusions
- Or temporarily disable the protection while using GenWorker

#### 4. Windows Quick Start

```powershell
# Install (pick one)
pip install genworker-ai
# or
uv pip install genworker-ai

# Set your API key (PowerShell)
$env:ANTHROPIC_API_KEY = "sk-ant-..."

# Or set it permanently via System Environment Variables:
# Settings → System → About → Advanced system settings → Environment Variables

# Run
genworker "Open Notepad and write Hello World"

# Or use the GUI
genworker --gui

# Or run without installing (uv handles everything)
uv run --with genworker-ai genworker "Open Notepad and write Hello World"
```

---

### Linux

On most Linux desktop environments, GenWorker works without special permissions. If you encounter issues:

- **Wayland** — `pyautogui` requires X11. If your desktop uses Wayland, either switch to an X11 session or run with `XDG_SESSION_TYPE=x11`
- **Headless servers** — GenWorker requires a display. Use a virtual display like Xvfb:
  ```bash
  sudo apt install xvfb
  xvfb-run genworker "your task"
  ```
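
To find out which of these cases applies before running a task, you can inspect the standard `XDG_SESSION_TYPE` and `DISPLAY` environment variables. The sketch below is a rough heuristic rather than something GenWorker ships, and `detect_display_session` is a hypothetical name.

```python
import os


def detect_display_session(env=None):
    """Classify the session as 'wayland', 'x11', or 'headless' from environment variables."""
    env = os.environ if env is None else env
    if env.get("XDG_SESSION_TYPE", "").lower() == "wayland":
        return "wayland"
    if env.get("DISPLAY"):
        # An X server, possibly a virtual one such as Xvfb, is reachable.
        return "x11"
    return "headless"
```

A result of `"wayland"` points to the first bullet above, and `"headless"` to the Xvfb instructions.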

---

### GUI Mode

All platforms support an optional graphical interface:

```bash
genworker --gui
```

The GUI provides:
- A multi-line task input with Enter to submit
- Live screenshot preview of what the agent sees
- Start/Stop controls, timeout and step configuration
- Colour-coded output log

You can also launch it directly:

```bash
genworker-gui
```

## Configuration

All settings can be configured via environment variables or a `.env` file:

| Variable | Default | Description |
|---|---|---|
| `ANTHROPIC_API_KEY` | — | **Required.** Your Anthropic API key |
| `MODEL` | `claude-sonnet-4-5-20250929` | Claude model to use |
| `MAX_STEPS` | `100` | Maximum agent loop iterations |
| `TASK_TIMEOUT` | `600` | Task timeout in seconds |
| `THINKING_BUDGET` | `4096` | Token budget for extended thinking |
| `SCREENSHOT_INTERVAL` | `1.0` | Seconds between screenshots |
| `SCREENSHOT_QUALITY` | `75` | JPEG compression quality (1-100) |
| `MAX_CONTEXT_IMAGES` | `15` | Max screenshots kept in context |
| `BASH_TIMEOUT` | `30` | Shell command timeout in seconds |
| `MONITOR_INDEX` | `0` | Default monitor index |
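
As an illustration of how these variables and defaults fit together, the sketch below reads them from a mapping such as `os.environ`. It is an assumption about one reasonable way to load them, not GenWorker's own loader. `Settings` and `load_settings` are hypothetical names. If you use python-dotenv, calling its `load_dotenv()` first copies values from a `.env` file into `os.environ`.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Settings:
    """Defaults mirror the configuration table; the class itself is illustrative."""
    api_key: str
    model: str = "claude-sonnet-4-5-20250929"
    max_steps: int = 100
    task_timeout: int = 600
    thinking_budget: int = 4096
    screenshot_interval: float = 1.0
    screenshot_quality: int = 75
    max_context_images: int = 15
    bash_timeout: int = 30
    monitor_index: int = 0


def load_settings(env=None):
    """Build Settings from environment variables, falling back to the defaults."""
    env = os.environ if env is None else env
    api_key = env.get("ANTHROPIC_API_KEY")
    if not api_key:
        raise RuntimeError("ANTHROPIC_API_KEY is required")
    values = {"api_key": api_key}
    for field in fields(Settings):
        if field.name == "api_key":
            continue
        raw = env.get(field.name.upper())  # e.g. max_steps -> MAX_STEPS
        if raw is not None:
            # Convert the string to the same type as the default (int, float, or str).
            values[field.name] = type(field.default)(raw)
    settings = Settings(**values)
    if not 1 <= settings.screenshot_quality <= 100:
        raise ValueError("SCREENSHOT_QUALITY must be between 1 and 100")
    return settings
```

Reading from an explicit mapping rather than `os.environ` directly makes it straightforward to try different configurations in tests.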

## License

[MIT](LICENSE)
