Metadata-Version: 2.4
Name: svo-client
Version: 2.3.2
Summary: Async client for SVO semantic chunker microservice.
Home-page: https://github.com/your_org/svo_client
Author: Vasiliy Zdanovskiy
Author-email: vasilyvz@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: pydantic>=2.0.0
Requires-Dist: chunk_metadata_adapter>=3.3.4
Requires-Dist: mcp-proxy-adapter>=6.9.122
Requires-Dist: embed-client==3.1.9.2
Requires-Dist: aiohttp>=3.8.0
Requires-Dist: httpx>=0.24.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# svo-client

Async Python client for the SVO Semantic Chunker microservice.

## Installation

```bash
pip install svo-client
```

## Quick Start

### Text Chunking

```python
from svo_client import ChunkerClient
import asyncio

async def main():
    async with ChunkerClient(
        host="localhost", port=8009,
        cert="client.crt", key="client.key", ca="ca.crt",
    ) as client:
        chunks = await client.chunk(["Your text here."])
        for text_chunks in chunks:
            for chunk in text_chunks:
                print(chunk.text)

asyncio.run(main())
```

### File Chunking

```python
import asyncio
from svo_client import ChunkerClient

async def main():
    async with ChunkerClient(
        host="localhost", port=8009,
        cert="client.crt", key="client.key", ca="ca.crt",
    ) as client:
        # Any file, any format: the server handles processing.
        # Default: no timeout limit (the server can work for up to 1 hour).
        chunks = await client.chunk_file(
            filepath="/path/to/document.pdf",
            filter_name="plain_text",
        )

        # Explicit timeout limit (optional):
        chunks = await client.chunk_file(
            filepath="/path/to/document.pdf",
            timeout=1800,  # 30 min max
        )

asyncio.run(main())
```

## API Reference

### `ChunkerClient`

```python
ChunkerClient(
    *,
    config: Optional[Dict[str, Any]] = None,
    host: str = "localhost",
    port: int = 8009,
    cert: Optional[str] = None,
    key: Optional[str] = None,
    ca: Optional[str] = None,
    token: Optional[str] = None,
    token_header: str = "X-API-Key",
    check_hostname: bool = False,
    timeout: Optional[float] = None,
)
```

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `config` | `Optional[Dict]` | `None` | Pre-built config; if set, other args ignored |
| `host` | `str` | `"localhost"` | Server host |
| `port` | `int` | `8009` | Server port |
| `cert` | `Optional[str]` | `None` | Client certificate path (mTLS) |
| `key` | `Optional[str]` | `None` | Client key path (mTLS) |
| `ca` | `Optional[str]` | `None` | CA certificate path (mTLS) |
| `token` | `Optional[str]` | `None` | API key for authentication |
| `token_header` | `str` | `"X-API-Key"` | HTTP header for API key |
| `check_hostname` | `bool` | `False` | Verify SSL hostname |
| `timeout` | `Optional[float]` | `None` | Default request timeout (seconds); `None` = no timeout |

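The constructor arguments above can be collected from the environment before the client is created. The following is a minimal sketch, not part of `svo_client`: the helper name `client_kwargs_from_env` and the `SVO_*` variable names are hypothetical and only illustrate how the documented parameters fit together.

```python
import os
from typing import Any, Dict, Mapping, Optional


def client_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build ChunkerClient keyword arguments from environment variables.

    The SVO_* variable names are hypothetical; adapt them to your deployment.
    """
    env = os.environ if env is None else env
    kwargs: Dict[str, Any] = {
        "host": env.get("SVO_HOST", "localhost"),
        "port": int(env.get("SVO_PORT", "8009")),
    }
    # Optional settings are included only when set, so the
    # constructor defaults documented above apply otherwise.
    optional = (
        ("cert", "SVO_CERT"),
        ("key", "SVO_KEY"),
        ("ca", "SVO_CA"),
        ("token", "SVO_TOKEN"),
    )
    for name, var in optional:
        value = env.get(var)
        if value:
            kwargs[name] = value
    if env.get("SVO_TIMEOUT"):
        kwargs["timeout"] = float(env["SVO_TIMEOUT"])
    return kwargs
```

The result can then be passed on as `ChunkerClient(**client_kwargs_from_env())`.
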
### Methods

#### `chunk(texts, use_sv=False, timeout=0.0, verify_integrity=False, **params)`

Chunk a list of texts via WebSocket. Returns `List[List[SemanticChunk]]` —
one list of chunks per input text.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `texts` | `List[str]` | required | Texts to chunk |
| `use_sv` | `bool` | `False` | Use semantic verification |
| `timeout` | `float` | `0.0` | Timeout in seconds; `0` = no limit |
| `verify_integrity` | `bool` | `False` | Check text integrity after chunking |
| `**params` | `Any` | — | Additional chunk parameters |

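Because `chunk()` returns one list of chunks per input text, it is often handy to walk the nested result with explicit indices. The sketch below assumes only that each chunk exposes a `.text` attribute, as in the Quick Start; `iter_chunk_texts` is a hypothetical helper, and the demo uses stand-in objects rather than real `SemanticChunk` instances.

```python
from types import SimpleNamespace
from typing import Any, Iterator, List, Tuple


def iter_chunk_texts(results: List[List[Any]]) -> Iterator[Tuple[int, int, str]]:
    """Yield (text_index, chunk_index, chunk_text) for every chunk."""
    for text_index, text_chunks in enumerate(results):
        for chunk_index, chunk in enumerate(text_chunks):
            yield text_index, chunk_index, chunk.text


# Stand-in objects with a .text attribute, used here in place of
# the SemanticChunk instances a real chunk() call would return.
demo = [
    [SimpleNamespace(text="First text, part 1."), SimpleNamespace(text="First text, part 2.")],
    [SimpleNamespace(text="Second text.")],
]
rows = list(iter_chunk_texts(demo))
```
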
#### `file(*, filepath=None, filename=None, file_content=None, filter_name="plain_text", timeout=None)`

Send a file to the server for text extraction. Returns `FileResponse`.

Two input modes:
- **CLI channel**: provide `filepath` (server reads the file).
- **API channel**: provide `filename` + `file_content` (bytes; base64-encoded internally).

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `filepath` | `Optional[str]` | `None` | Local file path (CLI channel) |
| `filename` | `Optional[str]` | `None` | Filename (API channel) |
| `file_content` | `Optional[bytes]` | `None` | Raw file bytes (API channel) |
| `filter_name` | `str` | `"plain_text"` | Server-side filter name |
| `timeout` | `Optional[float]` | `None` | Per-call timeout in seconds |

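The two input modes differ only in which keyword arguments are passed. As a rough illustration, the hypothetical helper below (`build_file_kwargs` is not part of the library) selects the mode based on whether file bytes are supplied; it makes no assumption about how the server resolves `filepath`.

```python
from pathlib import PurePath
from typing import Any, Dict, Optional


def build_file_kwargs(
    path: str,
    content: Optional[bytes] = None,
    filter_name: str = "plain_text",
) -> Dict[str, Any]:
    """Return keyword arguments for file() in one of its two input modes."""
    if content is None:
        # CLI channel: only the path is sent.
        return {"filepath": path, "filter_name": filter_name}
    # API channel: the filename plus the raw bytes are sent.
    return {
        "filename": PurePath(path).name,
        "file_content": content,
        "filter_name": filter_name,
    }
```

A call would then look like `await client.file(**build_file_kwargs("/path/to/report.pdf"))`.
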
#### `chunk_file(*, filepath=None, filename=None, file_content=None, filter_name="plain_text", use_sv=False, timeout=0, verify_integrity=False, **chunk_params)`

Convenience method: `file()` + `chunk()` in one call. Returns
`List[List[SemanticChunk]]`.

Any file format is supported (PDF, DOCX, images, text, markdown, etc.);
the server handles all processing.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `filepath` | `Optional[str]` | `None` | Local file path |
| `filename` | `Optional[str]` | `None` | Filename for API channel |
| `file_content` | `Optional[bytes]` | `None` | Raw file bytes |
| `filter_name` | `str` | `"plain_text"` | Server-side filter name |
| `use_sv` | `bool` | `False` | Use semantic verification |
| `timeout` | `Optional[float]` | `0` | Timeout in seconds; `0` = no limit |
| `verify_integrity` | `bool` | `False` | Check text integrity |
| `**chunk_params` | `Any` | — | Additional chunk parameters |

#### `config(timeout=None)`

Retrieve server configuration. Returns `Dict[str, Any]`.

#### `help_cmd(command=None, timeout=None)`

Retrieve server help information. Returns `Dict[str, Any]`.
If `command` is given, returns help for that specific command.

#### `health()`

Health check — verifies the server is up. Returns `Dict[str, Any]`.

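One possible use of `health()` is to wait for the server to come up before sending work. The sketch below is an assumption-laden illustration: `wait_until_healthy` is a hypothetical helper, it accepts any object with an awaitable `health()` method, and it retries on every `Exception` by default. In practice `retry_on` could be narrowed, for example to `SVOConnectionError`.

```python
import asyncio
from typing import Any, Dict, Optional, Tuple, Type


async def wait_until_healthy(
    client: Any,
    *,
    attempts: int = 5,
    delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> Dict[str, Any]:
    """Call client.health() until it succeeds or attempts run out."""
    last_error: Optional[BaseException] = None
    for attempt in range(1, attempts + 1):
        try:
            return await client.health()
        except retry_on as exc:
            last_error = exc
            # Sleep between attempts, but not after the final one.
            if attempt < attempts:
                await asyncio.sleep(delay)
    raise RuntimeError(f"server not healthy after {attempts} attempts") from last_error
```
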
#### `open_ws_channel(receive_timeout=60.0, heartbeat=30.0)`

Open a bidirectional WebSocket channel for multiple requests.
Returns `BidirectionalWsChannel`.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `receive_timeout` | `float` | `60.0` | Per-message receive timeout (seconds) |
| `heartbeat` | `float` | `30.0` | WebSocket keepalive interval (seconds) |

#### `close()`

Close the underlying client connection. Also available via async context
manager (`async with`).

## Long-Running File Operations

File processing is entirely server-side. The client sends the file as-is
(any format: PDF, DOCX, images, text, markdown, etc.).

- `chunk_file()` defaults to no timeout limit — the client waits as long
  as the server needs (up to 1 hour for large files).
- Results arrive via WebSocket; the adapter heartbeat (30s) keeps the
  connection alive.
- To set an explicit limit, pass `timeout=N` (seconds):

```python
# Wait up to 30 minutes:
chunks = await client.chunk_file(
    filepath="/path/to/large.pdf",
    timeout=1800,
)
```
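
When several large files are queued, it may help to cap how many long-running calls are in flight at once. The following sketch uses an `asyncio.Semaphore` for that. It assumes that a single client can serve concurrent `chunk_file()` calls, which this README does not confirm, and `chunk_files` is a hypothetical helper rather than a library function.

```python
import asyncio
from typing import Any, Dict, List


async def chunk_files(
    client: Any,
    paths: List[str],
    *,
    timeout: float = 0,
    max_concurrent: int = 2,
) -> Dict[str, Any]:
    """Chunk several files with at most max_concurrent calls in flight.

    Each path maps to its chunk list, or to the exception that call raised.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def _one(path: str) -> Any:
        async with semaphore:
            return await client.chunk_file(filepath=path, timeout=timeout)

    # return_exceptions=True keeps one failing file from cancelling the rest.
    results = await asyncio.gather(*(_one(p) for p in paths), return_exceptions=True)
    return dict(zip(paths, results))
```

Returning exceptions as values lets the caller decide per file whether to retry, skip, or abort.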

## CLI Reference

```bash
# Chunk text
svo-chunker chunk --text "Your text here" [--use-sv] [--type Draft]

# Chunk multiple texts (batch)
svo-chunker chunk-batch --text "Text one" --text "Text two"

# Process file (extraction + optional chunking)
svo-chunker file --filepath /path/to/file [--filter plain_text] [--chunk]

# Server configuration
svo-chunker config

# Health check
svo-chunker health
```

## Error Handling

All exceptions are importable from `svo_client`:

```python
from svo_client import SVOServerError, SVOTimeoutError, SVOFileError
```

| Exception | When Raised |
|-----------|-------------|
| `SVOServerError` | Server application-level error |
| `SVOChunkingIntegrityError` | Text integrity check fails (subclass of `SVOServerError`) |
| `SVOJSONRPCError` | JSON-RPC error response |
| `SVOHTTPError` | HTTP error or invalid response |
| `SVOWebSocketRequiredError` | WebSocket required but unavailable |
| `SVOConnectionError` | Network/connection issues |
| `SVOTimeoutError` | Request timeout exceeded |
| `SVOEmbeddingError` | Embedding service error |
| `SVOFileError` | Base file-command error |
| `SVOFilePayloadError` | Invalid file payload (subclass of `SVOFileError`) |
| `SVOFileTypeError` | Unknown `filter_name` (subclass of `SVOFileError`) |
| `SVOFileNotFoundError` | File not found (subclass of `SVOFileError`) |
| `SVOFilePermissionError` | Permission denied (subclass of `SVOFileError`) |
| `SVOFileReadError` | OS-level file read error (subclass of `SVOFileError`) |

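The subclass relationships in the table mean that specific exceptions should be checked before their base classes. Application code would normally do this with ordinary `except` clauses on the imported classes. The sketch below shows the same ordering in a form that runs without the package installed, by matching class names in the exception's MRO. `classify_error` and its category labels are hypothetical, and treating connection and timeout errors as retryable is an assumption, not documented library behavior.

```python
# Assumption: connection and timeout failures are worth retrying.
RETRYABLE_NAMES = {"SVOConnectionError", "SVOTimeoutError"}


def classify_error(exc: BaseException) -> str:
    """Map an exception to a coarse category using its class hierarchy."""
    names = {cls.__name__ for cls in type(exc).__mro__}
    # Check the subclass before its base class SVOServerError.
    if "SVOChunkingIntegrityError" in names:
        return "integrity"
    # Any SVOFile*Error subclass carries SVOFileError in its MRO.
    if "SVOFileError" in names:
        return "file"
    if names & RETRYABLE_NAMES:
        return "retryable"
    if "SVOServerError" in names:
        return "server"
    return "other"
```
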
## Filter Names

Supported server-side filter names for file processing:

- `plain_text` — extract plain text content (default)
- `markdown` — extract and process markdown
- `txt` — raw text extraction

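A client-side check against the names listed above can catch typos before a request is sent. This is only a sketch: `check_filter_name` is a hypothetical helper, and it assumes the list in this README is complete, which may not hold if a server is deployed with additional filters.

```python
# Filter names as listed in this README; a given server may support others.
KNOWN_FILTERS = ("plain_text", "markdown", "txt")


def check_filter_name(name: str = "plain_text") -> str:
    """Return name unchanged if it is a documented filter, else raise ValueError."""
    if name not in KNOWN_FILTERS:
        known = ", ".join(KNOWN_FILTERS)
        raise ValueError(f"unknown filter_name {name!r}; expected one of: {known}")
    return name
```

For example, `await client.chunk_file(filepath=path, filter_name=check_filter_name("markdown"))`.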