Metadata-Version: 2.4
Name: voice-agent-core
Version: 1.2.0
Summary: A wake-word driven voice agent with extensible tools.
Author-email: Darshan <darshanbr081@gmail.com>
License: MIT License
        
        Copyright (c) [2025] [Darshan BR]
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
Project-URL: Homepage, https://github.com/Darshan0312/Voice_Agent
Project-URL: Bug Tracker, https://github.com/Darshan0312/Voice_agent/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: google-generativeai>=0.8.5
Requires-Dist: openai-whisper>=20231117
Requires-Dist: PyAudio>=0.2.14
Requires-Dist: SpeechRecognition>=3.14.3
Requires-Dist: pyttsx3>=2.90
Requires-Dist: pywhatkit>=5.4
Requires-Dist: beautifulsoup4>=4.12.3
Requires-Dist: PyAutoGUI>=0.9.54
Requires-Dist: python-dotenv>=1.0.1
Requires-Dist: torch>=2.7.1
Requires-Dist: Flask>=3.0.0
Requires-Dist: wikipedia>=1.4.0
Requires-Dist: pydub>=0.25.1
Requires-Dist: soundfile>=0.12.1
Requires-Dist: pvporcupine>=3.0.0
Requires-Dist: requests>=2.31.0
Dynamic: license-file

# Voice Agent Core

An extensible, wake-word-driven voice assistant framework for Python. This project provides a foundation for building your own conversational AI agents that can perform tasks, answer questions, and integrate with various APIs.

![Voice Agent](https://user-images.githubusercontent.com/1094726/106368383-35f1f900-633b-11eb-814a-71b504a9b5bd.gif)
*(Demo GIF: A short animation showing the agent being activated and performing a task would go here.)*

## Features

-   **⚡️ High-Performance Wake Word:** Utilizes the industry-standard **Picovoice Porcupine** engine for instant, low-resource, and highly accurate wake-word detection right on your device.
-   **🚀 Fast & Accurate Speech-to-Text:** Powered by **OpenAI's Whisper (`tiny.en` model)** for fast and reliable English command transcription.
-   **🧠 Intelligent Conversational Brain:** Leverages **Google's Gemini 1.5 Pro** model for state-of-the-art natural language understanding, tool use, and conversational abilities. The agent knows when to ask for clarification and when to have a normal conversation.
-   **🗣️ High-Quality Voice:** Features a custom, high-quality Text-to-Speech API with a robust local fallback mechanism, ensuring the agent can always respond.
-   **🛠️ Extensible Skillset:** Easily add new capabilities (skills/tools) by writing simple Python functions. The agent's brain automatically understands your functions and their parameters from their docstrings.
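
To illustrate the last point, the sketch below shows one way a framework could derive a tool description from a function's signature and docstring using the standard `inspect` module. It is not taken from this package's source: `describe_tool`, `set_timer`, and the layout of the returned dictionary are hypothetical and only meant to show the idea.

```python
import inspect


def describe_tool(func):
    """Build a plain-dict description of a function (hypothetical helper)."""
    signature = inspect.signature(func)
    params = {}
    for name, param in signature.parameters.items():
        annotation = param.annotation
        if annotation is inspect.Parameter.empty:
            # No type hint was given for this parameter.
            type_name = "any"
        else:
            type_name = getattr(annotation, "__name__", str(annotation))
        params[name] = {
            "type": type_name,
            # A parameter without a default value must be supplied by the caller.
            "required": param.default is inspect.Parameter.empty,
        }
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": params,
    }


def set_timer(minutes: int, label: str = "timer"):
    """Starts a countdown timer for the given number of minutes."""
    return f"Timer '{label}' set for {minutes} minutes."


tool = describe_tool(set_timer)
print(tool)
```

A description like this is what a language model would need in order to decide when to call a function and which arguments to pass, which is why clear docstrings and type hints matter when writing new skills.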

## Prerequisites

Before you install the Python package, you must install a few system-level dependencies required by the audio libraries.

#### 1. For Audio Input (`PyAudio`)
-   **Debian/Ubuntu Linux:**
    ```bash
    sudo apt-get update && sudo apt-get install portaudio19-dev
    ```
-   **macOS (using Homebrew):**
    ```bash
    brew install portaudio
    ```
-   **Windows:**
    `PyAudio` is usually installed with the necessary binaries via pip, so no extra steps are typically needed.

#### 2. For Audio Playback (`pydub`)
This library requires FFmpeg for decoding and playing audio.
-   **Debian/Ubuntu Linux:**
    ```bash
    sudo apt-get install ffmpeg
    ```
-   **macOS (using Homebrew):**
    ```bash
    brew install ffmpeg
    ```
-   **Windows:**
    Follow a guide to [install FFmpeg and add it to your system's PATH](https://www.geeksforgeeks.org/how-to-install-ffmpeg-on-windows/).
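
To confirm that FFmpeg is actually reachable before running the agent, a small check along these lines can help. It uses only the standard library; `find_missing_executables` is a hypothetical helper name, not part of this package.

```python
import shutil


def find_missing_executables(names):
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = find_missing_executables(["ffmpeg"])
if missing:
    print("Missing system tools:", ", ".join(missing))
else:
    print("FFmpeg found on PATH.")
```

If the check reports FFmpeg as missing on Windows, the usual cause is that its `bin` directory has not been added to the `PATH` of the terminal session you are using.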

## Installation & Setup

### Step 1: Install the Package
The agent can be installed directly from PyPI using pip. It is highly recommended to do this within a virtual environment.

```bash
# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install the package
pip install voice-agent-core
```

### Step 2: Configure Your API Keys
The agent requires three API keys to function. The recommended way to manage these is by creating a `.env` file in the directory where you plan to run the agent.
Create a file named `.env` and add the following content:

```
# Get from Google AI Studio: https://ai.google.dev/
GEMINI_API_KEY="YOUR_GEMINI_API_KEY_HERE"

# Get from Picovoice Console: https://console.picovoice.ai/
PICOVOICE_ACCESS_KEY="YOUR_PICOVOICE_ACCESS_KEY_HERE"

# Get from OpenWeatherMap: https://openweathermap.org/api
OPENWEATHER_API_KEY="YOUR_OPENWEATHER_API_KEY_HERE"
```
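
Before starting the agent, it can help to verify that all three keys are visible to Python. The sketch below assumes the keys are exposed as environment variables under the names shown above; `find_missing_keys` is a hypothetical helper, and the optional `load_dotenv()` call comes from the `python-dotenv` package that this project lists as a dependency.

```python
import os

REQUIRED_KEYS = ("GEMINI_API_KEY", "PICOVOICE_ACCESS_KEY", "OPENWEATHER_API_KEY")


def find_missing_keys(env, required=REQUIRED_KEYS):
    """Return the required key names that are absent or empty in `env`."""
    return [key for key in required if not env.get(key, "").strip()]


try:
    # python-dotenv copies variables from a .env file it finds into os.environ.
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass  # Fall back to whatever is already set in the environment.

missing = find_missing_keys(os.environ)
if missing:
    print("Missing API keys:", ", ".join(missing))
else:
    print("All API keys are set.")
```

Running this from the same directory as your `.env` file gives a quick sanity check without starting the microphone or any network calls.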





## Quick Start: Running the Agent

Once you have installed the package and configured your `.env` file with the API keys, you can run the agent with a single command from your terminal:

```bash
run-voice-agent
```

The agent will initialize and print `Listening for wake word ('porcupine')...`. It is now passively listening.

## Default Skills & Example Commands
Say the wake word "Porcupine" followed by one of these commands:

-   **Check the Weather:** "Porcupine... what's the weather like in Tokyo?"
-   **Play Music on YouTube:** "Porcupine... play some lofi hip hop radio on YouTube."
-   **Search Google:** "Porcupine... search for the latest news on Python."
-   **Open a Website:** "Porcupine... open wikipedia.org."
-   **Control Media:**
    -   "Porcupine... pause the song."
    -   "Porcupine... resume playing."
    -   "Porcupine... stop the music."
-   **Open Applications:** "Porcupine... open my code editor."
-   **Have a Conversation:**
    -   "Porcupine... hello, how are you today?"
    -   "Porcupine... what is the capital of Canada?"


## Advanced Usage: Using as a Library

The true power of this project is its use as a framework. You can import the core `run_agent` function into your own scripts to create an agent with a completely custom set of tools.

### Example: `my_custom_agent.py`

```python
from voice_agent_core.main import run_agent
import datetime

# 1. Define your custom Python functions with clear docstrings.
def get_current_time():
    """
    Returns the current time in a human-readable format.
    """
    now = datetime.datetime.now()
    return f"The current time is {now.strftime('%I:%M %p')}."

def shutdown_computer(delay_minutes: int):
    """
    Initiates a system shutdown after a specified delay.

    Args:
        delay_minutes (int): The number of minutes to wait before shutting down.
    """
    print(f"WARNING: Shutdown scheduled in {delay_minutes} minutes!")
    # import os
    # os.system(f"shutdown /s /t {delay_minutes * 60}") # Example for Windows
    return f"Okay, I will shut down the computer in {delay_minutes} minutes."

# 2. Create a dictionary mapping the tool name to the function.
my_personal_tools = {
    "get_current_time": get_current_time,
    "shutdown_computer": shutdown_computer,
}

# 3. Run the agent with your custom set of tools!
if __name__ == '__main__':
    print("Starting agent with custom tools...")
    # The agent's brain will automatically learn to use your new functions.
    run_agent(available_tools=my_personal_tools)

```
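
The commented-out `os.system` call in the example is Windows-specific. As a rough sketch of how the same tool could choose a command per platform, the hypothetical helper below only builds the argument list and does not execute it. Actually running the command (for example with `subprocess.run`) is left to you and may require administrator rights.

```python
import platform


def build_shutdown_command(delay_minutes, system=None):
    """Return the shutdown command for the given OS as an argument list."""
    if delay_minutes < 0:
        raise ValueError("delay_minutes must be non-negative")
    # Detect the current OS unless one is passed in explicitly.
    system = system or platform.system()
    if system == "Windows":
        # Windows expects the delay in seconds: shutdown /s /t <seconds>
        return ["shutdown", "/s", "/t", str(delay_minutes * 60)]
    # Linux and macOS accept a delay in minutes: shutdown -h +<minutes>
    return ["shutdown", "-h", f"+{delay_minutes}"]


print(build_shutdown_command(5, system="Windows"))
print(build_shutdown_command(5, system="Linux"))
```

Keeping the command construction separate from its execution makes a destructive tool like this easy to test, and lets you add a confirmation step before anything is run.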


