Metadata-Version: 2.4
Name: nexract
Version: 2.0.0
Summary: Nexract — Web Intelligence API SDK. Extract, analyze, and monitor any webpage. Built-in OpenAI, Claude, and LangChain integrations.
Author-email: Nexract <support@nexract.ai>
License: MIT
Project-URL: Homepage, https://www.nexract.ai
Project-URL: Documentation, https://www.nexract.ai/docs.html
Project-URL: Repository, https://github.com/Aromyla/Nexract
Project-URL: API Playground, https://www.nexract.ai/api/docs
Project-URL: Bug Tracker, https://github.com/Aromyla/Nexract/issues
Keywords: web-scraping,web-extraction,ai,openai,claude,langchain,llm,scraper,markdown,api,nexract,web-intelligence,batch-extraction,arabic-nlp
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Markup :: HTML
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: requests>=2.28.0
Provides-Extra: langchain
Requires-Dist: langchain-core>=0.1.0; extra == "langchain"
Requires-Dist: pydantic>=2.0.0; extra == "langchain"

# Nexract — Web Intelligence API SDK

Extract, analyze, and monitor any webpage with one API call.

[![PyPI version](https://badge.fury.io/py/nexract.svg)](https://pypi.org/project/nexract/)

## Features

- **Extract** clean markdown from any public URL
- **AI Analysis** — summary, sentiment, entities, categories (powered by Claude)
- **Batch Extract** — up to 10 URLs in one request
- **Schema Extraction** — custom JSON schema for structured data
- **OpenAI Tool** — built-in function calling definition
- **Claude Tool** — built-in Anthropic tool definition
- **LangChain Tool** — native StructuredTool integration
- **Content Filtering** — max_words, no_links, first_paragraphs
- **Retry Logic** — automatic exponential backoff for 429/503
- **Arabic Intelligence** — dialect detection, bilingual analysis
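
The retry behavior listed above can be pictured with a generic sketch. This is not the SDK's internal code, which this README does not show. The names `call_with_backoff`, `send`, `max_retries`, and `base_delay` are hypothetical, and the default values are assumptions chosen for illustration.

```python
import random
import time

RETRYABLE_STATUS = {429, 503}  # the two status codes named in the feature list


def call_with_backoff(send, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or retries run out.

    send is any zero-argument callable returning an object with a
    .status_code attribute (for example, a requests.Response).
    """
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code not in RETRYABLE_STATUS or attempt == max_retries:
            return response
        # The delay doubles on each attempt: base, 2*base, 4*base, ...
        delay = base_delay * (2 ** attempt)
        # A small random jitter keeps many clients from retrying in lockstep.
        sleep(delay + random.uniform(0, delay / 10))
```

The `sleep` parameter is injectable only so the function can be exercised without real waiting.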

## Install

```bash
pip install nexract
```

For LangChain integration:
```bash
pip install "nexract[langchain]"
```

## Quick Start

```python
from nexract import Nexract

nx = Nexract("SK-LAB-YOUR-KEY")

# Basic extraction (1 credit)
result = nx.extract("https://example.com/article")
print(result.data)        # Clean markdown
print(result.title)       # Article title
print(result.word_count)  # Word count

# With AI analysis (+2 credits)
result = nx.extract("https://bbc.com/article", ai=True)
print(result.summary)     # AI-generated summary
print(result.sentiment)   # positive/negative/neutral

# Batch extract (up to 10 URLs)
batch = nx.batch_extract([
    "https://site1.com",
    "https://site2.com",
    "https://site3.com",
])
for r in batch:
    print(r.title, r.word_count)

# Check balance
bal = nx.balance()
print(f"Credits: {bal.balance}")
```
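
Because a batch request accepts at most 10 URLs, a longer list has to be split first. The helper below is a minimal sketch of that step; `chunked` is a hypothetical name and is not part of the SDK.

```python
def chunked(urls, size=10):
    """Yield consecutive slices of urls, each with at most `size` items."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


urls = [f"https://example.com/page/{i}" for i in range(23)]
groups = list(chunked(urls))
print([len(g) for g in groups])  # [10, 10, 3]

# Each group could then be passed to the client from the Quick Start:
# for group in groups:
#     for r in nx.batch_extract(group):
#         print(r.title, r.word_count)
```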

## OpenAI Integration

```python
from openai import OpenAI
from nexract import Nexract
import json

nx = Nexract("SK-LAB-YOUR-KEY")
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize https://example.com"}],
    tools=[nx.openai_tool()],  # One line!
)

for call in response.choices[0].message.tool_calls or []:
    if call.function.name == "nexract_extract":
        args = json.loads(call.function.arguments)
        content = nx.handle_tool_call(args)
        print(content)
```
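
The example above stops after printing the extracted content. For the model to actually write the summary, the tool output usually has to be sent back in a follow-up request. The sketch below builds that message in the Chat Completions format; `tool_result_message` is a hypothetical helper, not an SDK method.

```python
def tool_result_message(tool_call_id, content):
    """Build the chat message that returns a tool's output to the model."""
    return {"role": "tool", "tool_call_id": tool_call_id, "content": content}


# Usage, continuing from the example above (assumes `response`, `call`,
# and `content` are defined as in that snippet):
# messages = [
#     {"role": "user", "content": "Summarize https://example.com"},
#     response.choices[0].message,             # assistant turn with tool_calls
#     tool_result_message(call.id, content),   # the extracted page content
# ]
# final = client.chat.completions.create(model="gpt-4o", messages=messages)
# print(final.choices[0].message.content)
```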

## Claude Integration

```python
import anthropic
from nexract import Nexract

nx = Nexract("SK-LAB-YOUR-KEY")
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[nx.claude_tool()],  # One line!
    messages=[{"role": "user", "content": "Extract https://example.com"}],
)

for block in response.content:
    if block.type == "tool_use" and block.name == "nexract_extract":
        content = nx.handle_tool_call(block.input)
        print(content)
```
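
As with OpenAI, the extracted content normally goes back to Claude in a follow-up turn so it can produce the final answer. The sketch below builds that turn in the Messages API `tool_result` format; `tool_result_turn` is a hypothetical helper, not an SDK method.

```python
def tool_result_turn(tool_use_id, content):
    """Build the user turn that returns a tool's output to Claude."""
    return {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
        ],
    }


# Usage, continuing from the example above (assumes `response`, `block`,
# and `content` are defined as in that snippet):
# messages = [
#     {"role": "user", "content": "Extract https://example.com"},
#     {"role": "assistant", "content": response.content},
#     tool_result_turn(block.id, content),
# ]
# final = client.messages.create(
#     model="claude-sonnet-4-20250514",
#     max_tokens=1024,
#     tools=[nx.claude_tool()],
#     messages=messages,
# )
```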

## LangChain Integration

```python
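# Requires the langchain and langchain-openai packages in addition to nexract[langchain].
# Note: initialize_agent is deprecated in recent LangChain releases.
from langchain.agents import initialize_agent, AgentType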
from langchain_openai import ChatOpenAI
from nexract import Nexract

nx = Nexract("SK-LAB-YOUR-KEY")
llm = ChatOpenAI(model="gpt-4o")

agent = initialize_agent(
    tools=[nx.langchain_tool()],  # One line!
    llm=llm,
    agent=AgentType.OPENAI_FUNCTIONS,
)

result = agent.invoke("Summarize https://techcrunch.com")
print(result["output"])
```

## API Reference

| Method | Description |
|--------|-------------|
| `nx.extract(url, **options)` | Extract a webpage (1 credit; +2 credits with `ai=True`) |
| `nx.batch_extract(urls, **options)` | Extract up to 10 URLs in one request (1 credit each) |
| `nx.balance()` | Check credit balance |
| `nx.openai_tool()` | OpenAI function calling definition |
| `nx.claude_tool()` | Anthropic Claude tool definition |
| `nx.langchain_tool()` | LangChain StructuredTool |
| `nx.handle_tool_call(args)` | Handle AI tool call → returns content string |
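
The credit costs quoted in this README (1 credit per extraction, +2 with AI analysis) can be turned into a rough pre-flight estimate. This is a sketch only: `estimate_credits` is a hypothetical helper, and it assumes the AI surcharge applies per URL in a batch, which the README does not state.

```python
BASE_CREDITS = 1   # per extracted URL, as listed in the table above
AI_SURCHARGE = 2   # extra credits with ai=True, as noted in the Quick Start


def estimate_credits(url_count, ai=False):
    """Estimate the credits needed to extract `url_count` URLs.

    Assumption: the AI surcharge is charged once per URL.
    """
    per_url = BASE_CREDITS + (AI_SURCHARGE if ai else 0)
    return url_count * per_url


print(estimate_credits(10))           # 10
print(estimate_credits(10, ai=True))  # 30
```

The estimate can be compared with `nx.balance().balance` before starting a large job.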

## Links

- **Website:** [nexract.ai](https://www.nexract.ai)
- **API Docs:** [nexract.ai/docs.html](https://www.nexract.ai/docs.html)
- **Swagger UI:** [nexract.ai/api/docs](https://www.nexract.ai/api/docs)
- **Integrations:** [nexract.ai/integrations.html](https://www.nexract.ai/integrations.html)

## License

MIT
