Metadata-Version: 2.4
Name: valyu
Version: 2.2.0
Summary: Deepsearch API for AI.
Home-page: https://valyu.network
Author: Valyu
Author-email: contact@valyu.network
Maintainer: Harvey Yorke
Maintainer-email: harvey@valyu.network
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: requests>=2.31.0
Requires-Dist: pydantic>=2.5.0
Requires-Dist: openai>=1.66.0
Requires-Dist: anthropic>=0.46.0
Requires-Dist: python-dotenv>=1.0.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: maintainer
Dynamic: maintainer-email
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# Valyu SDK

**Search for AIs**

Valyu's Deepsearch API gives AI the context it needs. Integrate trusted, high-quality public and proprietary sources, with full-text multimodal retrieval.

Get **$10 free credits** for the Valyu API when you sign up at [Valyu](https://platform.valyu.network)!

*No credit card required.*

## How does it work?

We do all the heavy lifting for you - one unified API for all data:

- **Academic & Research Content** - Access millions of scholarly papers and textbooks
- **Real-time Web Search** - Get the latest information from across the internet  
- **Structured Financial Data** - Stock prices, market data, and financial metrics
- **Intelligent Reranking** - Results across all sources are automatically sorted by relevance
- **Transparent Pricing** - Pay only for what you use with clear CPM pricing

## Installation

Install the Valyu SDK using pip:

```bash
pip install valyu
```

## Quick Start

Here's what it looks like, make your first query in just 4 lines of code:

```python
from valyu import Valyu

valyu = Valyu(api_key="your-api-key-here")

response = valyu.search(
    "Implementation details of agentic search-enhanced large reasoning models",
    max_num_results=5,            # Limit to top 5 results
    max_price=10                  # Maximum price per thousand queries (CPM)
)

print(response)

# Feed the results to your AI agent as you would with other search APIs
```

## API Reference

### Search Method

The `search()` method is the core of the Valyu SDK. It accepts a query string as the first parameter, followed by optional configuration parameters.

```python
def search(
    query: str,                                    # Your search query
    search_type: str = "all",                     # "all", "web", or "proprietary"
    max_num_results: int = 10,                    # Maximum results to return (1-20)
    is_tool_call: bool = True,                    # Whether this is an AI tool call
    relevance_threshold: float = 0.5,             # Minimum relevance score (0-1)
    max_price: int = 30,                          # Maximum price per thousand queries (CPM)
    included_sources: List[str] = None,           # Specific sources to search
    excluded_sources: List[str] = None,            # Sources to exclude from search
    country_code: str = None,                     # Country code filter (e.g., "US", "GB")
    response_length: Union[str, int] = None,      # Response length: "short"/"medium"/"large"/"max" or character count
    category: str = None,                         # Category filter
    start_date: str = None,                       # Start date (YYYY-MM-DD)
    end_date: str = None,                         # End date (YYYY-MM-DD)
) -> SearchResponse
```

### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `query` | `str` | *required* | The search query string |
| `search_type` | `str` | `"all"` | Search scope: `"all"`, `"web"`, or `"proprietary"` |
| `max_num_results` | `int` | `10` | Maximum number of results to return (1-20) |
| `is_tool_call` | `bool` | `True` | Whether this is an AI tool call (affects processing) |
| `relevance_threshold` | `float` | `0.5` | Minimum relevance score for results (0.0-1.0) |
| `max_price` | `int` | `30` | Maximum price per thousand queries in CPM |
| `included_sources` | `List[str]` | `None` | Specific data sources or URLs to search |
| `excluded_sources` | `List[str]` | `None` | Data sources or URLs to exclude from search |
| `country_code` | `str` | `None` | Country code filter (e.g., "US", "GB", "JP", "ALL") |
| `response_length` | `Union[str, int]` | `None` | Response length: "short"/"medium"/"large"/"max" or character count |
| `category` | `str` | `None` | Category filter for results |
| `start_date` | `str` | `None` | Start date filter in YYYY-MM-DD format |
| `end_date` | `str` | `None` | End date filter in YYYY-MM-DD format |

### Response Format

The search method returns a `SearchResponse` object with the following structure:

```python
class SearchResponse:
    success: bool                           # Whether the search was successful
    error: Optional[str]                    # Error message if any
    tx_id: str                             # Transaction ID for feedback
    query: str                             # The original query
    results: List[SearchResult]            # List of search results
    results_by_source: ResultsBySource     # Count of results by source type
    total_deduction_pcm: float             # Cost in CPM
    total_deduction_dollars: float         # Cost in dollars
    total_characters: int                  # Total characters returned
```

Each `SearchResult` contains:

```python
class SearchResult:
    title: str                             # Result title
    url: str                              # Source URL
    content: Union[str, List[Dict]]       # Full content (text or structured)
    description: Optional[str]            # Brief description
    source: str                           # Source identifier
    price: float                          # Cost for this result
    length: int                           # Content length in characters
    image_url: Optional[Dict[str, str]]   # Associated images
    relevance_score: float                # Relevance score (0-1)
    data_type: Optional[str]              # "structured" or "unstructured"
```

### Contents Method

The `contents()` method extracts clean, structured content from web pages with optional AI-powered data extraction and summarization.

```python
def contents(
    urls: List[str],                                      # List of URLs to process (max 10)
    summary: Union[bool, str, Dict] = None,              # AI summary configuration
    extract_effort: str = None,                          # "normal" or "high"
    response_length: Union[str, int] = None,             # Content length configuration
    max_price_dollars: float = None,                     # Maximum cost limit in USD
) -> ContentsResponse
```

### Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `urls` | `List[str]` | *required* | List of URLs to process (maximum 10 URLs per request) |
| `summary` | `Union[bool, str, Dict]` | `None` | AI summary configuration:<br>- `False/None`: No AI processing (raw content)<br>- `True`: Basic automatic summarization<br>- `str`: Custom instructions (max 500 chars)<br>- `dict`: JSON schema for structured extraction |
| `extract_effort` | `str` | `None` | Extraction thoroughness: `"normal"` (fast) or `"high"` (thorough but slower) |
| `response_length` | `Union[str, int]` | `None` | Content length per URL:<br>- `"short"`: 25,000 characters<br>- `"medium"`: 50,000 characters<br>- `"large"`: 100,000 characters<br>- `"max"`: No limit<br>- `int`: Custom character limit |
| `max_price_dollars` | `float` | `None` | Maximum cost limit in USD |

### Response Format

The contents method returns a `ContentsResponse` object:

```python
class ContentsResponse:
    success: bool                          # Whether the request was successful
    error: Optional[str]                   # Error message if any
    tx_id: str                            # Transaction ID for tracking
    urls_requested: int                   # Number of URLs submitted
    urls_processed: int                   # Number of URLs successfully processed
    urls_failed: int                      # Number of URLs that failed
    results: List[ContentsResult]        # List of extraction results
    total_cost_dollars: float             # Total cost in dollars
    total_characters: int                 # Total characters extracted
```

Each `ContentsResult` contains:

```python
class ContentsResult:
    url: str                              # Source URL
    title: str                            # Page/document title
    content: Union[str, int, float]       # Extracted content
    length: int                           # Content length in characters
    source: str                           # Data source identifier
    summary: Optional[Union[str, Dict]]   # AI-generated summary or structured data
    summary_success: Optional[bool]       # Whether summary generation succeeded
    data_type: Optional[str]              # Type of data extracted
    image_url: Optional[Dict[str, str]]   # Extracted images
    citation: Optional[str]               # APA-style citation
```

## Examples

### Basic Search

```python
from valyu import Valyu

valyu = Valyu("your-api-key")

# Simple search across all sources
response = valyu.search("What is machine learning?")
print(f"Found {len(response.results)} results")
```

### Academic Research

```python
# Search academic papers on arXiv
response = valyu.search(
    "transformer architecture improvements",
    search_type="proprietary",
    included_sources=["valyu/valyu-arxiv"],
    relevance_threshold=0.7,
    max_num_results=10
)
```

### Web Search with Date Filtering

```python
# Search recent web content
response = valyu.search(
    "AI safety developments",
    search_type="web",
    start_date="2024-01-01",
    end_date="2024-12-31",
    max_num_results=5
)
```

### Hybrid Search

```python
# Search both web and proprietary sources
response = valyu.search(
    "quantum computing breakthroughs",
    search_type="all",
    category="technology",
    relevance_threshold=0.6,
    max_price=50
)
```

### Processing Results

```python
response = valyu.search("climate change solutions")

if response.success:
    print(f"Search cost: ${response.total_deduction_dollars:.4f}")
    print(f"Sources: Web={response.results_by_source.web}, Proprietary={response.results_by_source.proprietary}")
    
    for i, result in enumerate(response.results, 1):
        print(f"\n{i}. {result.title}")
        print(f"   Source: {result.source}")
        print(f"   Relevance: {result.relevance_score:.2f}")
        print(f"   Content: {result.content[:200]}...")
else:
    print(f"Search failed: {response.error}")
```

### Content Extraction Examples

#### Basic Content Extraction

```python
# Extract raw content from URLs
response = valyu.contents(
    urls=["https://techcrunch.com/2025/08/28/anthropic-users-face-a-new-choice-opt-out-or-share-your-data-for-ai-training/"]
)

if response.success:
    for result in response.results:
        print(f"Title: {result.title}")
        print(f"Content: {result.content[:500]}...")
```

#### Content with AI Summary

```python
# Extract content with automatic summarization
response = valyu.contents(
    urls=["https://docs.python.org/3/tutorial/"],
    summary=True,
    response_length="max"
)

for result in response.results:
    print(f"Summary: {result.summary}")
```

#### Structured Data Extraction

```python
# Extract structured data using JSON schema
company_schema = {
    "type": "object",
    "properties": {
        "company_name": {"type": "string"},
        "founded_year": {"type": "integer"},
        "key_products": {
            "type": "array",
            "items": {"type": "string"},
            "maxItems": 3
        }
    }
}

response = valyu.contents(
    urls=["https://en.wikipedia.org/wiki/OpenAI"],
    summary=company_schema,
    response_length="max"
)

if response.success:
    for result in response.results:
        if result.summary:
            print(f"Structured data: {json.dumps(result.summary, indent=2)}")
```

#### Multiple URLs

```python
# Process multiple URLs with a cost limit
response = valyu.contents(
    urls=[
        "https://www.valyu.network/",
        "https://docs.valyu.network/overview",
        "https://www.valyu.network/blogs/why-ai-agents-and-llms-struggle-with-search-and-data-access"
    ],
    summary="Provide key takeaways in bullet points, and write in very emphasised singaporean english"
)

print(f"Processed {response.urls_processed}/{response.urls_requested} URLs")
print(f"Cost: ${response.total_cost_dollars:.4f}")
```

## Authentication

Set your API key in one of these ways:

1. **Environment variable** (recommended):
   ```bash
   export VALYU_API_KEY="your-api-key-here"
   ```

2. **Direct initialization**:
   ```python
   valyu = Valyu(api_key="your-api-key-here")
   ```

## Error Handling

The SDK handles errors gracefully and returns structured error responses:

```python
response = valyu.search("test query")

if not response.success:
    print(f"Error: {response.error}")
    print(f"Transaction ID: {response.tx_id}")
else:
    # Process successful results
    for result in response.results:
        print(result.title)
```

## Getting Started

1. Sign up for a free account at [Valyu](https://platform.valyu.network)
2. Get your API key from the dashboard  
3. Install the SDK: `pip install valyu`
4. Start building with the examples above

## Support

- **Documentation**: [docs.valyu.network](https://docs.valyu.network)
- **API Reference**: Full parameter documentation above
- **Examples**: Check the `examples/` directory in this repository
- **Issues**: Report bugs on GitHub

## License

This project is licensed under the MIT License.
