Metadata-Version: 2.3
Name: spiderai
Version: 0.0.2
Summary: A Python library for extracting structured data from web pages using AI.
Project-URL: Homepage, https://github.com/AB7zz/web_extractor
Project-URL: Issues, https://github.com/AB7zz/web_extractor/issues
Author-email: Abhinav C V <abhinavcv007@gmail.com>
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.10
Requires-Dist: absl-py==2.1.0
Requires-Dist: annotated-types==0.7.0
Requires-Dist: beautifulsoup4==4.12.3
Requires-Dist: cachetools==5.5.0
Requires-Dist: google-ai-generativelanguage==0.6.10
Requires-Dist: google-api-core==2.23.0
Requires-Dist: google-api-python-client==2.154.0
Requires-Dist: google-auth-httplib2==0.2.0
Requires-Dist: google-auth==2.36.0
Requires-Dist: google-generativeai==0.8.3
Requires-Dist: pydantic==2.10.2
Requires-Dist: python-dotenv==1.0.1
Requires-Dist: requests==2.32.3
Requires-Dist: tqdm==4.67.1
Description-Content-Type: text/markdown

# SpiderAI

SpiderAI is a Python library for extracting structured data from web pages. It uses Google's Gemini AI to extract and format data according to a schema you specify.

## Features

- Easy-to-use interface for web data extraction
- AI-powered content analysis using Google's Gemini AI
- Flexible schema definition for structured data extraction
- Automatic handling of web page fetching and parsing

## Installation

```bash
pip install spiderai
```

## Quick Start

1. Get a Gemini API key from [Google AI Studio](https://makersuite.google.com/app/apikey).

2. Create a `.env` file in your project root and add your API key:

```
GEMINI_API_KEY=your_api_key_here
```

3. Use the library in your code:

```python
from spiderai import WebDataExtractor
import os
from dotenv import load_dotenv

# Load API key from .env file
load_dotenv()
gemini_api_key = os.getenv("GEMINI_API_KEY")

# Create the extractor
extractor = WebDataExtractor(api_key=gemini_api_key)

# URL to extract data from
url = "https://yoururl.com"

# Define your schema
schema = {
    "key1": "string",
    "key2": "float",
    "key3": "string"
}

# Extract the data
result = extractor.extract(url, schema)

# Use the extracted data (labels match the schema keys defined above)
print("key1:", result["key1"])
print("key2:", result["key2"])
print("key3:", result["key3"])
```
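
If `GEMINI_API_KEY` is missing from the environment, `os.getenv` returns `None`, and the problem may only surface later when the extractor is used. A minimal sketch of a fail-fast check follows. `require_api_key` is a hypothetical helper written for illustration and is not part of spiderai.

```python
import os


def require_api_key(name="GEMINI_API_KEY", env=None):
    """Return the API key stored under `name`, or raise if it is missing or blank.

    Hypothetical helper for illustration; it is not part of spiderai.
    """
    # Default to the process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    value = (env.get(name) or "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return value
```

Call it after `load_dotenv()` and pass the result on, for example `extractor = WebDataExtractor(api_key=require_api_key())`.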

## Schema Definition

The schema is a dictionary where:
- Keys are the field names you want to extract
- Values are the expected data types ("string", "float", "integer", etc.)

Example schema:

```python
# Product schema
schema = {
    "name": "string",
    "price": "float",
    "rating": "float",
    "review_count": "integer"
}
```
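
This README does not state whether extracted values arrive already converted to the declared types. As a defensive sketch, assuming the type names map to Python's `str`, `float`, and `int`, the result can be coerced after extraction. `TYPE_MAP` and `coerce_result` are hypothetical names and are not part of spiderai.

```python
# Assumed mapping from schema type names to Python types (not defined by spiderai).
TYPE_MAP = {"string": str, "float": float, "integer": int}


def coerce_result(result, schema):
    """Return a new dict with each schema field cast to its declared type.

    Missing fields and None values become None. Type names that are not in
    TYPE_MAP are passed through unchanged. Raises ValueError when a value
    cannot be converted.
    """
    coerced = {}
    for field, type_name in schema.items():
        value = result.get(field)
        caster = TYPE_MAP.get(type_name)
        if value is None or caster is None:
            coerced[field] = value
            continue
        try:
            coerced[field] = caster(value)
        except (TypeError, ValueError) as exc:
            raise ValueError(
                f"Field {field!r}: cannot convert {value!r} to {type_name}"
            ) from exc
    return coerced


schema = {"name": "string", "price": "float", "review_count": "integer"}
raw = {"name": "Widget", "price": "19.99", "review_count": "42"}
print(coerce_result(raw, schema))
# → {'name': 'Widget', 'price': 19.99, 'review_count': 42}
```

In practice, `raw` would be the dictionary returned by `extractor.extract(url, schema)`.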

## Requirements

- Python 3.10 or higher
- Google Gemini AI API key
- Internet connection for web scraping and AI processing


## License

This project is licensed under Apache 2.0. See the [LICENSE](LICENSE) file for details.

## Contact

Contributions are welcome: open an issue or suggest an improvement on the [issue tracker](https://github.com/AB7zz/web_extractor/issues). For questions, email abhinavcv007@gmail.com.