Metadata-Version: 2.4
Name: vixio
Version: 0.1.5
Summary: Voice-Powered Agent Framework
Author: Weyne Chen
License-Expression: MIT
Project-URL: Homepage, https://github.com/weynechen/vixio
Project-URL: Repository, https://github.com/weynechen/vixio
Project-URL: Issues, https://github.com/weynechen/vixio/issues
Keywords: voice,agent,ai,speech,framework
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic>=2.5.0
Requires-Dist: pydantic-settings>=2.1.0
Requires-Dist: ruamel.yaml>=0.18.16
Requires-Dist: loguru>=0.7.3
Provides-Extra: xiaozhi
Requires-Dist: fastapi>=0.110.0; extra == "xiaozhi"
Requires-Dist: uvicorn[standard]>=0.27.0; extra == "xiaozhi"
Requires-Dist: websockets<15.0,>=14.0; extra == "xiaozhi"
Requires-Dist: opuslib_next>=1.1.5; extra == "xiaozhi"
Requires-Dist: PyJWT>=2.10.0; extra == "xiaozhi"
Requires-Dist: aiohttp>=3.13.0; extra == "xiaozhi"
Requires-Dist: aiohttp-cors>=0.8.0; extra == "xiaozhi"
Requires-Dist: numpy<2.0.0,>=1.26.0; extra == "xiaozhi"
Provides-Extra: silero-vad-grpc
Requires-Dist: grpcio>=1.76.0; extra == "silero-vad-grpc"
Requires-Dist: grpcio-tools>=1.76.0; extra == "silero-vad-grpc"
Provides-Extra: sherpa-onnx-asr-grpc
Requires-Dist: grpcio>=1.76.0; extra == "sherpa-onnx-asr-grpc"
Requires-Dist: grpcio-tools>=1.76.0; extra == "sherpa-onnx-asr-grpc"
Requires-Dist: numpy<2.0.0,>=1.26.0; extra == "sherpa-onnx-asr-grpc"
Provides-Extra: kokoro-cn-tts-grpc
Requires-Dist: grpcio>=1.76.0; extra == "kokoro-cn-tts-grpc"
Requires-Dist: grpcio-tools>=1.76.0; extra == "kokoro-cn-tts-grpc"
Requires-Dist: numpy<2.0.0,>=1.26.0; extra == "kokoro-cn-tts-grpc"
Provides-Extra: silero-vad-local
Requires-Dist: onnxruntime-gpu>=1.16.0; extra == "silero-vad-local"
Requires-Dist: numpy>=1.24.0; extra == "silero-vad-local"
Requires-Dist: silero-vad>=5.0; extra == "silero-vad-local"
Provides-Extra: sherpa-onnx-asr-local
Requires-Dist: onnxruntime-gpu>=1.16.0; extra == "sherpa-onnx-asr-local"
Requires-Dist: numpy>=1.24.0; extra == "sherpa-onnx-asr-local"
Requires-Dist: sherpa-onnx>=1.12.15; extra == "sherpa-onnx-asr-local"
Requires-Dist: huggingface_hub>=0.20.0; extra == "sherpa-onnx-asr-local"
Provides-Extra: kokoro-cn-tts-local
Requires-Dist: torch>=2.0.0; extra == "kokoro-cn-tts-local"
Requires-Dist: numpy>=1.24.0; extra == "kokoro-cn-tts-local"
Requires-Dist: kokoro>=0.8.1; extra == "kokoro-cn-tts-local"
Requires-Dist: misaki[zh]>=0.8.1; extra == "kokoro-cn-tts-local"
Provides-Extra: openai-agent
Requires-Dist: openai-agents[litellm]>=0.4.2; extra == "openai-agent"
Requires-Dist: openai>=2.7.0; extra == "openai-agent"
Requires-Dist: httpx>=0.28.0; extra == "openai-agent"
Provides-Extra: edge-tts
Requires-Dist: edge-tts>=7.2.3; extra == "edge-tts"
Requires-Dist: pydub>=0.25.0; extra == "edge-tts"
Provides-Extra: qwen
Requires-Dist: dashscope>=1.25.3; extra == "qwen"
Provides-Extra: doubao
Requires-Dist: websockets>=14.0; extra == "doubao"
Provides-Extra: dev-local-cn
Requires-Dist: vixio[kokoro-cn-tts-local,openai-agent,sherpa-onnx-asr-local,silero-vad-local,xiaozhi]; extra == "dev-local-cn"
Provides-Extra: dev-grpc
Requires-Dist: vixio[kokoro-cn-tts-grpc,openai-agent,sherpa-onnx-asr-grpc,silero-vad-grpc,xiaozhi]; extra == "dev-grpc"
Provides-Extra: dev-qwen
Requires-Dist: vixio[openai-agent,qwen,silero-vad-local,xiaozhi]; extra == "dev-qwen"
Provides-Extra: dev-qwen-streaming
Requires-Dist: vixio[openai-agent,qwen,xiaozhi]; extra == "dev-qwen-streaming"
Provides-Extra: dev-doubao
Requires-Dist: vixio[doubao,openai-agent,xiaozhi]; extra == "dev-doubao"
Provides-Extra: quickstart
Requires-Dist: vixio[openai-agent,qwen,silero-vad-local,xiaozhi]; extra == "quickstart"
Provides-Extra: test
Requires-Dist: pytest>=7.4.0; extra == "test"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "test"
Requires-Dist: pytest-cov>=4.1.0; extra == "test"
Requires-Dist: black>=23.7.0; extra == "test"
Requires-Dist: isort>=5.12.0; extra == "test"
Requires-Dist: mypy>=1.5.0; extra == "test"
Requires-Dist: ruff>=0.0.287; extra == "test"
Requires-Dist: pydub; extra == "test"
Requires-Dist: psutil>=7.0.0; extra == "test"
Dynamic: license-file

# Vixio

**Quickly add voice interaction capabilities to AI Agents, with Xiaozhi protocol compatibility for seamless hardware integration**

[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
[![Status: Alpha](https://img.shields.io/badge/status-alpha-orange.svg)]()

**[中文文档](docs/README_zh.md)**

## Why Vixio?

- Vixio is Agent-centric — quickly add voice capabilities to any Agent without dealing with complex audio processing details.
- Compatible with Xiaozhi protocol for rapid hardware integration.
- Can serve as a Xiaozhi server — start with just one command.

## Features

### 🎯 Core Advantages

- **Flexible DAG Architecture**: Data flows through a directed acyclic graph (DAG) whose nodes can be freely combined. Beyond voice conversation, it supports transcription, real-time translation, digital humans, and more.

- **Three Operating Modes**:
  - **Pipeline** - Traditional cascade (VAD→ASR→Agent→TTS), maximum control
  - **Streaming** - Bidirectional streaming, low latency
  - **Realtime** - End-to-end model, lowest latency
- **Multiple Providers**: Supports OpenAI, Qwen, Doubao, and more, with the list still growing.
- **Ready to Use**: Built-in Xiaozhi hardware protocol support.
- **Interface Agnostic**: Interfaces are abstracted as transports, so any protocol can be swapped in.
- **Local Inference Support**: Common models can run locally, either in-process or behind a unified gRPC abstraction.
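
To illustrate the DAG idea, here is a minimal sketch using only the Python standard library. It is not Vixio's API: the node functions, `NODES`, `CASCADE`, and `run_graph` are hypothetical names invented for this example, and each node is assumed to have at most one upstream node.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical stand-ins for real stages; none of these are Vixio APIs.
def vad(audio: str) -> str:
    return audio.strip()          # pretend to trim silence

def asr(speech: str) -> str:
    return speech.lower()         # pretend to transcribe

def agent(text: str) -> str:
    return f"echo: {text}"        # pretend to call an LLM

def tts(reply: str) -> str:
    return f"<audio {reply}>"     # pretend to synthesize speech

NODES: dict[str, Callable[[str], str]] = {
    "vad": vad, "asr": asr, "agent": agent, "tts": tts,
}

# Each key maps to the set of nodes it depends on (VAD -> ASR -> Agent -> TTS).
CASCADE: dict[str, set[str]] = {"asr": {"vad"}, "agent": {"asr"}, "tts": {"agent"}}

def run_graph(source: str, edges: dict[str, set[str]] = CASCADE) -> dict[str, str]:
    """Run every node in dependency order and return each node's output."""
    outputs: dict[str, str] = {}
    for name in TopologicalSorter(edges).static_order():
        upstream = edges.get(name, set())
        # Source nodes read the raw input; other nodes read their upstream output.
        data = outputs[next(iter(upstream))] if upstream else source
        outputs[name] = NODES[name](data)
    return outputs
```

Passing a smaller edge map, for example `run_graph(audio, {"asr": {"vad"}})`, reuses the same nodes as a transcription-only graph, which is the kind of recombination the DAG design is meant to allow.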


## Requirements

- Python 3.12 or higher
- [uv](https://docs.astral.sh/uv/) (recommended package manager)

## 🚀 Quick Start

### Step 1: Get API Key
Visit the [DashScope Console](https://dashscope.console.aliyun.com/) to obtain your API key.

### Step 2: Start Xiaozhi Voice Chat Service with One Command!

```bash
uvx --from "vixio[dev-qwen-streaming]" vixio run xiaozhi-server \
  --preset qwen-realtime \
  --dashscope-key sk-your-key-here
```

**What you get:**
- WebSocket server running at `http://localhost:8000`
- End-to-end voice AI (Qwen Omni Realtime)
- Low latency (first response can be under 1s with the realtime model)
- Ready for Xiaozhi devices or custom clients
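
Before pointing a device at the server, it can help to confirm that something is actually listening on the port. The sketch below is a plain TCP reachability check, not a Xiaozhi protocol handshake; `is_listening` is a hypothetical helper, and the default host and port simply mirror the address quoted above.

```python
import socket

def is_listening(host: str = "localhost", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

Calling `is_listening()` after starting the server should return `True`; a `False` result usually means the server has not finished starting or is bound to a different port.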

### Step 3: Recompile Xiaozhi Firmware
- Run `idf.py menuconfig`
- Select Xiaozhi Assistant
- Change the OTA address to the address shown in the console.

You have now configured the server address in your Xiaozhi device. You can start chatting!

If the default configuration doesn't meet your needs, try customizing:

### Customize Your Bot

```bash
# Use custom prompt
uvx --from "vixio[dev-qwen-streaming]" vixio run xiaozhi-server \
  --preset qwen-realtime \
  --dashscope-key sk-xxx \
  --prompt "You are a professional programming assistant"

# Use pipeline mode (more control)
uvx --from "vixio[dev-qwen-streaming]" vixio run xiaozhi-server \
  --dashscope-key sk-xxx

# Export template for full customization
uvx --from "vixio[xiaozhi]" vixio init xiaozhi-server
cd xiaozhi-server
# Edit .env, config.yaml, prompt.txt
python run.py
```

## Try the Examples

For more advanced customization, refer to the examples in the `examples/` directory.

### Install from Source

```bash
git clone https://github.com/weynechen/vixio.git
cd vixio
uv sync --extra dev-qwen  # or dev-local-cn, dev-grpc, etc.
```
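
The names accepted by `--extra` are declared in the package metadata as `Provides-Extra` fields. As a rough sketch, the extras of an installed distribution can be listed with the standard library's `importlib.metadata`; `list_extras` is a hypothetical helper name, and it assumes `vixio` is installed in the current environment.

```python
from importlib.metadata import PackageNotFoundError, metadata

def list_extras(dist_name: str = "vixio") -> list[str]:
    """Return the extras declared by an installed distribution, sorted by name."""
    try:
        meta = metadata(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return []
    # get_all returns None when the field is absent, hence the fallback.
    return sorted(meta.get_all("Provides-Extra") or [])
```

Running `print("\n".join(list_extras()))` inside the synced environment prints one extra per line, which is a quick way to see which `dev-*` bundles are available.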

### Browse Configurations
`config/provider.yaml` ships with several default configurations:
- `dev-in-process`: All local inference runs in a single process, so there are no microservices to start. Each connection launches its own inference service, however, which consumes more resources. Suitable for quick local inference testing.

- `dev-grpc`: Local inference runs as separate microservices that the main process reaches over gRPC. You must start each microservice manually first, either one by one from the inference directory (using `uv run` for each) or with Docker Compose.

- `dev-qwen-xxx`: Uses Alibaba Cloud services. Configure your key and run; local dependencies are minimal.

### Run Examples

- Bidirectional streaming ASR and TTS usage:

```bash
uv run python examples/xiaozhi/streaming.py
```
With cloud-based bidirectional streaming, first-response latency is about 1-2s. The agent remains autonomous and keeps full tool-calling capability. Recommended for regular use.


- Realtime:
```bash
uv run python examples/xiaozhi/realtime_chat.py --env dev-qwen-realtime
```
With end-to-end realtime models, first-response latency can drop below 1s. Due to model limitations, however, tool calling is not yet available.

- Traditional cascade mode:
```bash
# Development mode: in-process inference (no external services needed)
uv run python examples/xiaozhi/pipeline.py --env dev-in-process

# Development mode: gRPC microservices
uv run python examples/xiaozhi/pipeline.py --env dev-grpc

# Or use Qwen
uv run python examples/xiaozhi/pipeline.py --env dev-qwen-pipeline
```
This mode offers the highest flexibility, but latency is 1.5-3s.

## Available Components

### Transport
- `xiaozhi` - Xiaozhi protocol transport (WebSocket + HTTP)

Other protocols are being designed and developed...

### VAD (Voice Activity Detection)
- `silero-vad-grpc` - Silero VAD via gRPC service
- `silero-vad-local` - Silero VAD local inference

More coming...

### ASR (Automatic Speech Recognition)
- `sherpa-onnx-asr-grpc` - Sherpa-ONNX ASR via gRPC service
- `sherpa-onnx-asr-local` - Sherpa-ONNX ASR local inference
- `qwen` - Qwen platform ASR

More coming...

### TTS (Text-to-Speech)
- `kokoro-cn-tts-grpc` - Kokoro TTS via gRPC service
- `kokoro-cn-tts-local` - Kokoro TTS local inference
- `edge-tts` - Microsoft Edge TTS (cloud)
- `qwen` - Qwen platform TTS

More coming...

### Agent
- `openai-agent` - OpenAI-compatible LLM via LiteLLM

More coming...


## Reference
- [xiaozhi-esp32](https://github.com/78/xiaozhi-esp32)


## Project Status

**Current Version: v0.1.x (Alpha)**

> **Note**: This project is under active development. APIs may change.

## License

MIT License - see [LICENSE](LICENSE) for details.
