Metadata-Version: 2.4
Name: async-web-search
Version: 1.2.1
Summary: Async web search library supporting Google, Wikipedia, arXiv, NewsAPI, GitHub, and PubMed data sources.
Project-URL: Homepage, https://github.com/nwaughachukwuma/async-web-search
Project-URL: Bug Tracker, https://github.com/nwaughachukwuma/async-web-search/issues
Author: Chukwuma
License-Expression: MIT
License-File: LICENSE
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Internet :: WWW/HTTP :: Dynamic Content
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: beautifulsoup4
Requires-Dist: httpx
Requires-Dist: lxml
Requires-Dist: python-dotenv
Requires-Dist: wikipedia
Description-Content-Type: text/markdown

# Web Search

Async web search library supporting Google Custom Search, Wikipedia, arXiv, NewsAPI, GitHub, and PubMed data sources.

> You can search across multiple sources and retrieve relevant, clean results in JSON format or as compiled text.

## 🌟 Features

- ⚡ Asynchronous Searching: Perform searches concurrently across multiple sources
- 🔗 Multi-Source Support: Query Google Custom Search, Wikipedia, arXiv, NewsAPI, GitHub, and PubMed
- 🧹 Content extraction and cleaning
- 🔧 Configurable Search Parameters: Adjust maximum results, preview length, and sources
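
To illustrate what searching concurrently across sources means, here is a minimal sketch built on `asyncio.gather`. It does not show this library's internals: `fake_source` and `search_all` are hypothetical names, and each source is simulated with a short sleep instead of a network call.

```python
import asyncio


async def fake_source(name: str, query: str, delay: float) -> list[dict]:
    """Hypothetical stand-in for a provider; sleeps instead of calling an API."""
    await asyncio.sleep(delay)
    return [{"title": f"{name} result for {query}", "source": name}]


async def search_all(query: str) -> list[dict]:
    """Run all stand-in sources concurrently and flatten their results."""
    batches = await asyncio.gather(
        fake_source("google", query, 0.02),
        fake_source("arxiv", query, 0.01),
        fake_source("pubmed", query, 0.03),
    )
    # gather() returns results in argument order, not completion order
    return [item for batch in batches for item in batch]


if __name__ == "__main__":
    for item in asyncio.run(search_all("quantum computing")):
        print(item["source"], "->", item["title"])
```

Because the coroutines run concurrently, the total wait is roughly that of the slowest source rather than the sum of all of them.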

## 📋 Prerequisites

- 🐍 Python 3.10 or newer
- 🔑 API keys and configuration:
  - Google Search: Requires a Google API key and a Custom Search Engine (CSE) ID.
  - NewsAPI: Requires a free API key from newsapi.org.
  - arXiv: No API key required.
  - Wikipedia: No API key required.
  - GitHub: No API key required.
  - PubMed: No API key required.

Set environment variables:

```bash
export GOOGLE_API_KEY="your_google_api_key"
export CSE_ID="your_cse_id"
export NEWSAPI_KEY="your_newsapi_key"
```
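
Before running a search, it can help to confirm that the variables above are actually set. The following is a small sketch using only the standard library. The helper `missing_env_vars` and the `REQUIRED_VARS` mapping are illustrative and not part of this package; the variable names come from the export commands above.

```python
import os

# Variable names taken from the export commands above; the mapping from
# source name to variables is an assumption made for this illustration.
REQUIRED_VARS = {
    "google": ["GOOGLE_API_KEY", "CSE_ID"],
    "newsapi": ["NEWSAPI_KEY"],
}


def missing_env_vars(sources: list[str]) -> dict[str, list[str]]:
    """Return, per requested source, the variables that are unset or empty."""
    missing = {}
    for source in sources:
        unset = [v for v in REQUIRED_VARS.get(source, []) if not os.environ.get(v)]
        if unset:
            missing[source] = unset
    return missing


if __name__ == "__main__":
    # Sources without required keys (e.g. arxiv) never appear in the output
    print(missing_env_vars(["google", "arxiv", "newsapi"]))
```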

## 📦 Installation

```bash
pip install async-web-search
```

## 🛠️ Usage

### Example 1: Search across multiple sources

```python
import asyncio

from web_search import WebSearch, WebSearchConfig


async def main():
    config = WebSearchConfig(sources=["google", "arxiv", "github", "newsapi", "pubmed"])
    results = await WebSearch(config).search("quantum computing")

    # results is a list of dicts with keys: url, title, preview, source
    for result in results:
        print(f"Title: {result['title']}")
        print(f"URL: {result['url']}")
        print(f"Preview: {result['preview']}")
        print(f"Source: {result['source']}")
        print("---")


# `await` is only valid inside a coroutine, so run the example with asyncio.run()
asyncio.run(main())
```

### Example 1.1: Compiled search results as string

```python
import asyncio

from web_search import WebSearch, WebSearchConfig


async def main():
    config = WebSearchConfig(sources=["google", "arxiv", "github"])
    compiled_results = await WebSearch(config).compile_search("quantum computing")
    print(compiled_results)  # Prints a formatted string with all results


asyncio.run(main())
```
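
If you need a different layout than the one `compile_search` produces, you can format the list returned by `search` yourself. The helper below is a hypothetical sketch, not the library's own formatter. It assumes only the result keys listed in Example 1 (`url`, `title`, `preview`, `source`), and the `preview_length` parameter is an illustrative name.

```python
def compile_results(results: list[dict], preview_length: int = 200) -> str:
    """Hypothetical helper: join results into one text block, truncating previews."""
    sections = []
    for r in results:
        preview = r["preview"]
        if len(preview) > preview_length:
            # Cut long previews and mark the truncation
            preview = preview[:preview_length].rstrip() + "..."
        sections.append(f"[{r['source']}] {r['title']}\n{r['url']}\n{preview}")
    # Separate individual results with a blank line
    return "\n\n".join(sections)


if __name__ == "__main__":
    sample = [
        {
            "url": "https://example.com/a",
            "title": "Sample result",
            "preview": "A short preview of the page content.",
            "source": "google",
        }
    ]
    print(compile_results(sample, preview_length=20))
```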

### Example 2: Google Search

```python
import asyncio

from web_search import GoogleSearchConfig
from web_search.google import GoogleSearch


async def main():
    config = GoogleSearchConfig(
        api_key="your_google_api_key",
        cse_id="your_cse_id",
        max_results=5,
    )
    results = await GoogleSearch(config)._search("quantum computing")

    for result in results:
        print(result)


asyncio.run(main())
```

### Example 3: Wikipedia Search

```python
import asyncio

from web_search import BaseConfig
from web_search.wikipedia_ import WikipediaSearch


async def main():
    wiki_config = BaseConfig(max_results=5)
    results = await WikipediaSearch(wiki_config)._search("deep learning")

    for result in results:
        print(result)


asyncio.run(main())
```

### Example 4: arXiv Search

```python
import asyncio

from web_search import BaseConfig
from web_search.arxiv import ArxivSearch


async def main():
    arxiv_config = BaseConfig(max_results=3)
    results = await ArxivSearch(arxiv_config)._search("neural networks")

    for result in results:
        print(result)


asyncio.run(main())
```

## 🔌 Plugin System

Need to search a data source that isn't bundled with the library? Subclass `PluginSearch`, implement `_search`, and pass an **instance** of your class via `WebSearchConfig.plugins`.

### Example

```python
import asyncio

from web_search import WebSearch, WebSearchConfig
from web_search.base import PluginSearch, SearchResult


class RedditSearch(PluginSearch):
    slug = "reddit"

    async def _search(self, query: str):
        # ...implement Reddit search here...
        return [
            SearchResult(
                url="https://reddit.com/r/MachineLearning/1",
                title="AMA about quantum ML",
                preview="I recently built a quantum ...",
                source=self.slug,
            )
        ]


async def main():
    # Option 1: register the plugin in WebSearchConfig
    config = WebSearchConfig(
        sources=["google", "arxiv"],
        plugins=[RedditSearch()],
    )
    results = await WebSearch(config).search("quantum computing")

    # Option 2: add the plugin after initializing WebSearch
    ws = WebSearch(config=WebSearchConfig(sources=["google", "arxiv"]))
    ws.add_plugin(RedditSearch())
    results = await ws.search("quantum computing")


asyncio.run(main())
```

### Edge cases handled automatically

1. Objects in the plugin list that do **not** inherit from `PluginSearch` are ignored.
2. Exceptions raised inside a plugin are caught; other providers still return results.
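
The two rules above can be pictured with a small, self-contained sketch. It is an illustration of the behavior and not the library's actual code: `Plugin`, `GoodPlugin`, `FailingPlugin`, and `run_plugins` are hypothetical names, and isolating failures with `asyncio.gather(..., return_exceptions=True)` is one possible way to get this result.

```python
import asyncio


class Plugin:
    """Hypothetical stand-in for PluginSearch."""
    slug = "plugin"

    async def _search(self, query: str) -> list[dict]:
        raise NotImplementedError


class GoodPlugin(Plugin):
    slug = "good"

    async def _search(self, query: str) -> list[dict]:
        return [{"title": f"hit for {query}", "source": self.slug}]


class FailingPlugin(Plugin):
    slug = "failing"

    async def _search(self, query: str) -> list[dict]:
        raise RuntimeError("upstream API is down")


async def run_plugins(plugins: list, query: str) -> list[dict]:
    # Rule 1: ignore anything that is not a Plugin instance
    valid = [p for p in plugins if isinstance(p, Plugin)]
    # Rule 2: return_exceptions=True hands back raised errors as values,
    # so one failing plugin does not prevent the others from finishing
    outcomes = await asyncio.gather(
        *(p._search(query) for p in valid), return_exceptions=True
    )
    results = []
    for outcome in outcomes:
        if isinstance(outcome, BaseException):
            continue  # skip the failed plugin
        results.extend(outcome)
    return results


if __name__ == "__main__":
    mixed = [GoodPlugin(), "not a plugin", FailingPlugin()]
    print(asyncio.run(run_plugins(mixed, "quantum computing")))
```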

## 🌐 Production API Server

A FastAPI-based production server is available for teams that want to use async web search as a web service. The server is hosted at **https://awebs.veedo.ai** and can be run locally as well.

See [`server/README.md`](server/README.md) for detailed API documentation, endpoints, and deployment instructions.

## 📘 API Overview

### 🔧 Configuration

- `BaseConfig`: Shared configuration for all sources (e.g., `max_results` and `timeout`).
- `GoogleSearchConfig`: Google-specific settings (e.g., `api_key`, `cse_id`).
- `WebSearchConfig`: Configuration for the overall search process (e.g., `sources` to query, `plugins` to register).

### 📚 Classes

- `WebSearch`: Entry point for performing searches across multiple sources.
- `GoogleSearch`: Handles searches via the Google Custom Search Engine API.
- `WikipediaSearch`: Searches Wikipedia and retrieves article previews.
- `ArxivSearch`: Queries arXiv for academic papers.

### ⚙️ Methods

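- `search(query: str)`: Main search method for `WebSearch`; returns a list of results.
- `compile_search(query: str)`: `WebSearch` method that returns the results compiled into a single formatted string.
- `_search(query: str)`: Source-specific search logic for `GoogleSearch`, `WikipediaSearch`, and `ArxivSearch`.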

## 🤝 Contributing

We welcome contributions! To contribute:

1. Fork the repository.
2. Create a new branch (`git checkout -b feature-name`).
3. Commit your changes (`git commit -am "Add new feature"`).
4. Push to the branch (`git push origin feature-name`).
5. Open a pull request.

### 🧪 Running Tests

```bash
pytest -v
```

## License

MIT
