Metadata-Version: 2.4
Name: nebulascrape
Version: 0.0.1
Summary: NebulaScrape — Ultra-powerful HTTP scraping library with smart bypass, async support, and modular transport.
Home-page: https://github.com/MERO/nebulascrape
Author: MERO
Author-email: mero@ps.com
Keywords: nebulascrape,cloudflare,scraping,ddos,scrape,webscraper,anti-bot,waf,bypass,challenge,akamai,datadome,perimeterx,kasada,async,fingerprint,tls,http2
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.9.2
Requires-Dist: requests_toolbelt>=0.9.1
Requires-Dist: pyparsing>=2.4.7
Requires-Dist: httpx[http2]>=0.24.0
Requires-Dist: h2>=4.0.0
Requires-Dist: aiohttp>=3.8.0
Provides-Extra: headless
Requires-Dist: curl_cffi>=0.5.0; extra == "headless"
Provides-Extra: brotli
Requires-Dist: brotli>=1.0.9; extra == "brotli"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: summary

<div align="center">

# NebulaScrape

<img src="https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C%203.10%20%7C%203.11%20%7C%203.12-blue?style=for-the-badge&logo=python&logoColor=white" alt="Python Versions">
<img src="https://img.shields.io/badge/version-0.0.1-informational?style=for-the-badge" alt="Version">
<img src="https://img.shields.io/badge/license-MIT-green?style=for-the-badge" alt="License">
<img src="https://img.shields.io/badge/asyncio-supported-blueviolet?style=for-the-badge&logo=python" alt="Async">
<img src="https://img.shields.io/badge/HTTP%2F2-supported-orange?style=for-the-badge" alt="HTTP2">
<img src="https://img.shields.io/badge/TLS-fingerprint%20spoof-red?style=for-the-badge" alt="TLS">
<img src="https://img.shields.io/badge/WAF-bypass%20engine-darkred?style=for-the-badge" alt="WAF">
<img src="https://img.shields.io/badge/maintained%20by-6x--u-black?style=for-the-badge&logo=github" alt="GitHub">

**NebulaScrape** is a Python HTTP scraping library built for the modern web.
It combines a modular transport system, intelligent session analysis, browser-realistic fingerprinting,
async support, and a powerful WAF bypass engine into a single clean API.

[Installation](#installation) &nbsp;|&nbsp;
[Quick Start](#quick-start) &nbsp;|&nbsp;
[Profiles](#fingerprint-profiles) &nbsp;|&nbsp;
[Retry Engine](#smart-retry-engine) &nbsp;|&nbsp;
[Session Intel](#session-intelligence-layer) &nbsp;|&nbsp;
[Transports](#modular-transport-system) &nbsp;|&nbsp;
[Async](#async-support) &nbsp;|&nbsp;
[Metrics](#built-in-metrics) &nbsp;|&nbsp;
[Plugins](#plugin-system) &nbsp;|&nbsp;
[API Reference](#api-reference)

</div>

---

## Table of Contents

- [Overview](#overview)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Fingerprint Profiles](#fingerprint-profiles)
- [Smart Retry Engine](#smart-retry-engine)
- [Session Intelligence Layer](#session-intelligence-layer)
- [Modular Transport System](#modular-transport-system)
- [Async Support](#async-support)
- [Built-in Metrics](#built-in-metrics)
- [Plugin System](#plugin-system)
- [WAF Bypass Engine](#waf-bypass-engine)
- [Advanced Usage](#advanced-usage)
- [API Reference](#api-reference)
- [Configuration Reference](#configuration-reference)

---

## Overview

NebulaScrape was designed to solve the hardest problem in modern web scraping: getting a real HTTP response from a server that actively tries to block automated clients.

Most scraping libraries send requests that are trivially identifiable as bots: their TLS fingerprints and header order do not match any real browser, they omit the sec-ch-ua fields, they show no browser-like timing, and they cannot recover intelligently from blocks. NebulaScrape addresses all of these problems together.

### What Makes NebulaScrape Different

**TLS fingerprint spoofing.** Every modern WAF inspects the TLS ClientHello. NebulaScrape sends the exact cipher suite list, ECDH curve, and TLS extension order that real Chrome 120 sends. A plain requests session sends a fingerprint that gets flagged immediately.

**Ordered, realistic HTTP headers.** Browsers send headers in a specific order that WAFs check. NebulaScrape uses OrderedDict-based profiles that match real browser traffic captures, including the correct sec-ch-ua, sec-ch-ua-mobile, sec-ch-ua-platform, Sec-Fetch-* fields.

**Smart retry decisions.** When a request fails with 403, 429, or 503, NebulaScrape does not blindly retry. It analyzes the response, reads the Retry-After header, calculates exponential backoff with jitter, decides whether to rotate the session fingerprint or rebuild the connection, and escalates to a more capable transport if needed.

**Session intelligence.** Every response is analyzed for WAF vendor signatures. The library tells you whether you hit a Cloudflare IUAM page, a DataDome challenge, a PerimeterX block, rate limiting, or a redirect loop, and attaches a risk score from 0 to 100 to each response.

**Transport escalation.** Under high-protection targets, NebulaScrape automatically escalates from standard HTTP/1.1 with TLS spoofing, to HTTP/2 via httpx, to full browser impersonation via curl_cffi. No code change required.

**Async-first.** NebulaScrape ships with a native asyncio client that supports all the same features, including retry logic, intelligence analysis, metrics, and plugins.

---

## Installation

**Minimum requirements:** Python 3.8+

Install the core library:

```bash
pip install nebulascrape
```

Install with headless browser impersonation support (required for Cloudflare Turnstile, Managed Challenge, and the most aggressive WAF protections):

```bash
pip install "nebulascrape[headless]"
```

Install from source:

```bash
git clone https://github.com/6x-u/nebulascrape.git
cd nebulascrape
pip install -e .
```

### Dependencies

| Package | Purpose | Required |
|---|---|---|
| `requests` >= 2.9.2 | Base HTTP transport | Yes |
| `requests_toolbelt` >= 0.9.1 | Request debugging | Yes |
| `pyparsing` >= 2.4.7 | JS challenge parsing | Yes |
| `httpx[http2]` >= 0.24.0 | HTTP/2 transport | Yes |
| `h2` >= 4.0.0 | HTTP/2 protocol | Yes |
| `aiohttp` >= 3.8.0 | Async fallback transport | Yes |
| `curl_cffi` >= 0.5.0 | Browser impersonation | Optional (headless) |
| `brotli` >= 1.0.9 | Brotli decompression | Optional (brotli) |

---

## Quick Start

The simplest way to use NebulaScrape is through the `Client` class. It handles everything internally.

```python
from nebulascrape import Client

client = Client(profile="chrome_windows", auto_retry=True)
response = client.get("https://target.com")

print(response.status_code)
print(response.meta["challenge_type"])
print(response.meta["risk_score"])
print(response.meta["metrics"]["latency_ms"])
```

For full session control, use `NebulaScraper` directly:

```python
from nebulascrape import NebulaScraper

scraper = NebulaScraper(
    profile="chrome_windows",
    auto_retry=True,
    max_retries=5,
    mode="auto",
    interpreter="native",
    debug=False,
)

response = scraper.get("https://target.com")
print(response.meta)
```

For token extraction:

```python
from nebulascrape import get_tokens

tokens, user_agent = get_tokens("https://target.com")
print(tokens)
print(user_agent)
```

---

## Fingerprint Profiles

NebulaScrape ships with four pre-built browser fingerprint profiles. Each profile contains a real User-Agent string, browser-realistic headers in the correct order, a matching TLS cipher suite list, and the correct ECDH curve.

| Profile Name | Browser | Platform | sec-ch-ua-mobile |
|---|---|---|---|
| `chrome_windows` | Chrome 120 | Windows 10 x64 | false |
| `chrome_linux` | Chrome 120 | Linux x86_64 | false |
| `firefox` | Firefox 121 | Windows 10 | N/A |
| `mobile` | Chrome 120 | Android 13 | true |

### Using a Profile

```python
from nebulascrape import Client

# Use any built-in profile
client = Client(profile="chrome_linux")
client = Client(profile="firefox")
client = Client(profile="mobile")
```

### Inspecting a Profile

```python
from nebulascrape.fingerprints import get_profile, available_profiles

print(available_profiles())
# ['chrome_windows', 'chrome_linux', 'firefox', 'mobile']

profile = get_profile("chrome_windows")
print(profile["user_agent"])
print(profile["headers"])
print(profile["cipher_suite"])
```

### Why Header Order Matters

A standard requests session sends headers in an arbitrary order. Real browsers always send headers in a fixed, browser-specific order. WAFs such as DataDome and Kasada inspect header order as a primary bot signal. NebulaScrape uses `OrderedDict` to enforce the correct order for every profile:

```
User-Agent
Accept
Accept-Language
Accept-Encoding
sec-ch-ua
sec-ch-ua-mobile
sec-ch-ua-platform
Upgrade-Insecure-Requests
Sec-Fetch-Dest
Sec-Fetch-Mode
Sec-Fetch-Site
Sec-Fetch-User
```

This matches the exact order captured from a real Chrome 120 browser session.
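
The ordering idea can be illustrated with a minimal, standard-library-only sketch. It is not NebulaScrape's internal code: `merge_headers` and the placeholder values are hypothetical, and only the header names are taken from the list above. The sketch overlays user-supplied headers on a profile without disturbing the profile's order.

```python
from collections import OrderedDict

# Header names in the order listed above; the values below are placeholders.
PROFILE_ORDER = [
    "User-Agent", "Accept", "Accept-Language", "Accept-Encoding",
    "sec-ch-ua", "sec-ch-ua-mobile", "sec-ch-ua-platform",
    "Upgrade-Insecure-Requests", "Sec-Fetch-Dest", "Sec-Fetch-Mode",
    "Sec-Fetch-Site", "Sec-Fetch-User",
]


def merge_headers(profile_headers, user_headers):
    """Overlay user headers on profile headers while keeping profile order.

    Names are matched case-insensitively. Headers the profile does not
    know about are appended at the end. (Hypothetical helper.)
    """
    merged = OrderedDict(profile_headers)
    canonical = {name.lower(): name for name in merged}
    for name, value in user_headers.items():
        key = canonical.get(name.lower())
        if key is None:
            merged[name] = value            # new header: append
            canonical[name.lower()] = name
        else:
            merged[key] = value             # known header: replace in place
    return merged


profile = OrderedDict((name, "placeholder") for name in PROFILE_ORDER)
merged = merge_headers(
    profile,
    {"accept-language": "de-DE", "Referer": "https://example.com"},
)
print(list(merged))
```

Replacing a value in an `OrderedDict` keeps the key's original position, which is why overriding `Accept-Language` here does not move it.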

---

## Smart Retry Engine

The `SmartRetryEngine` replaces naive retry loops with a response-aware retry decision system. It analyzes each failed response and decides the appropriate action based on the error type, attempt count, and session intelligence.

### How It Works

Every response is passed through `analyze_response()`, which returns a `RetryDecision` containing:

- `action` — what to do next (pass, wait and retry, rotate session, rebuild connection, switch transport, or abort)
- `backoff_seconds` — how long to wait before retrying
- `rotate_session` — whether to change the User-Agent and fingerprint
- `rebuild_connection` — whether to tear down and rebuild the connection pool
- `switch_transport` — whether to escalate to a higher-tier transport
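
As a rough mental model, the decision object can be pictured as a small dataclass. The sketch below is an assumption-laden illustration, not the library's definition: `RetryAction`, `RetryDecisionSketch`, `should_retry`, and the enum string values are hypothetical names, and only the field names come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class RetryAction(Enum):
    # The six actions named in the list above; the string values are assumed.
    PASS = "pass"
    WAIT_AND_RETRY = "wait_and_retry"
    ROTATE_SESSION = "rotate_session"
    REBUILD_CONNECTION = "rebuild_connection"
    SWITCH_TRANSPORT = "switch_transport"
    ABORT = "abort"


@dataclass
class RetryDecisionSketch:
    """Illustrative stand-in for the RetryDecision described above."""
    action: RetryAction
    backoff_seconds: float = 0.0
    rotate_session: bool = False
    rebuild_connection: bool = False
    switch_transport: bool = False


def should_retry(decision, attempt, max_retries=5):
    """Return True when the caller should send the request again."""
    if decision.action in (RetryAction.PASS, RetryAction.ABORT):
        return False
    return attempt < max_retries


decision = RetryDecisionSketch(
    action=RetryAction.ROTATE_SESSION,
    backoff_seconds=4.0,
    rotate_session=True,
)
print(should_retry(decision, attempt=1))
```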

### Per-Status Logic

**HTTP 403 Forbidden**

Indicates fingerprint detection or an IP block. The engine rotates the browser fingerprint and session identity, waits a random jitter interval between 2 and 8 seconds to simulate human behavior, and rebuilds the connection on the third attempt to clear any connection-level state the server may be tracking.

**HTTP 429 Too Many Requests**

Indicates rate limiting. The engine first reads the `Retry-After` response header and uses that value if present, adding a small random jitter. If no header is present, it calculates exponential backoff: `1.5 * 2^attempt` seconds, capped at 120 seconds. Session rotation activates from the second attempt onward.
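
The 429 wait time described above can be sketched in a few lines. This is a minimal illustration rather than the engine's actual code: the function name `backoff_429`, the zero-based `attempt` numbering, and the one-second jitter ceiling are all assumptions.

```python
import random


def backoff_429(attempt, retry_after=None, base=1.5, cap=120.0, max_jitter=1.0):
    """Seconds to wait after a 429 response (hypothetical helper).

    attempt      zero-based retry counter (an assumption of this sketch)
    retry_after  raw Retry-After header value, or None when absent
    """
    if retry_after is not None:
        try:
            # Server-provided delay in seconds, plus a small random jitter.
            return float(retry_after) + random.uniform(0.0, max_jitter)
        except (TypeError, ValueError):
            # Retry-After can also be an HTTP-date; this sketch does not
            # parse dates and falls through to exponential backoff.
            pass
    # Exponential backoff: 1.5 * 2^attempt seconds, capped at 120 seconds.
    return min(base * (2 ** attempt), cap)


for attempt in range(4):
    print(attempt, backoff_429(attempt))
```

With these assumptions the wait grows 1.5, 3, 6, 12 seconds and so on, reaching the 120-second cap from the seventh attempt (index 7) onward.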

**HTTP 503 Service Unavailable**

Indicates the connection itself may be flagged. The engine rebuilds the connection pool immediately and escalates to a higher transport tier from the second attempt onward.

**HTTP 407, 408, 502, 504, 52x**

Treated as transient infrastructure errors. Exponential backoff applies, capped at 90 seconds.

**Intelligence-driven retry**

If `SessionIntelligence` detects a challenge or high-risk response (even on a 200), the retry engine uses the intel result to decide whether to rotate, switch transport, or escalate.

### Configuration

```python
from nebulascrape import Client

client = Client(
    auto_retry=True,
    max_retries=7,  # default is 5
)
```

```python
from nebulascrape.retry_engine import SmartRetryEngine

engine = SmartRetryEngine(max_retries=10, base_backoff=2.0)
```

---

## Session Intelligence Layer

The `SessionIntelligence` class analyzes every response and classifies the WAF vendor and challenge type. This information is attached to `response.meta` on every request.

### Challenge Types

| Challenge Type | Description |
|---|---|
| `none` | Clean response, no challenge detected |
| `cf_iuam` | Cloudflare I'm Under Attack Mode (v1 JS challenge) |
| `cf_captcha` | Cloudflare hCaptcha / reCAPTCHA challenge |
| `cf_turnstile` | Cloudflare Turnstile (v3 challenge) |
| `cf_managed` | Cloudflare Managed Challenge |
| `cf_block_1020` | Cloudflare firewall rule block (error 1020) |
| `datadome` | DataDome bot detection challenge |
| `perimeterx` | PerimeterX / HUMAN Security block |
| `kasada` | Kasada protection challenge |
| `akamai` | Akamai Bot Manager challenge |
| `imperva` | Imperva / Incapsula protection |
| `shape` | F5 Shape Security protection |
| `rate_limited` | Generic rate limiting (429 or Retry-After header) |
| `js_required` | Page requires JavaScript execution |
| `redirect_loop` | Detected circular redirect chain |

### Risk Score

The risk score is an integer from 0 to 100 representing how likely the response represents a blocking or detection event:

| Score Range | Interpretation |
|---|---|
| 0 | Clean response |
| 1-30 | WAF present but not triggered |
| 31-60 | Rate limiting or soft block |
| 61-80 | Active JS or captcha challenge |
| 81-100 | Hard block, firewall, or advanced WAF challenge |
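
For logging or alerting, the table above translates directly into a lookup. The helper below is a hypothetical convenience, not part of the library; it only encodes the ranges and labels from the table.

```python
def interpret_risk(score):
    """Map a 0-100 risk score to the interpretation from the table above."""
    if not 0 <= score <= 100:
        raise ValueError(f"risk score out of range: {score}")
    if score == 0:
        return "Clean response"
    if score <= 30:
        return "WAF present but not triggered"
    if score <= 60:
        return "Rate limiting or soft block"
    if score <= 80:
        return "Active JS or captcha challenge"
    return "Hard block, firewall, or advanced WAF challenge"


print(interpret_risk(45))
```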

### Reading the Meta

```python
from nebulascrape import Client

client = Client(profile="chrome_windows", auto_retry=True)
response = client.get("https://target.com")

print(response.meta["challenge_type"])   # "cf_iuam" / "datadome" / "none" / ...
print(response.meta["risk_score"])       # 0 - 100
print(response.meta["waf_vendor"])       # "cloudflare" / "akamai" / "none" / ...
print(response.meta["retry_recommended"])
print(response.meta["rotate_session"])
print(response.meta["details"])          # {"retry_after": None, "cf_ray": "...", "status_code": 200}
```

### Direct Usage

```python
from nebulascrape.session_intel import SessionIntelligence
import requests

resp = requests.get("https://some-protected-site.com")
intel = SessionIntelligence()
result = intel.analyze(resp)

print(result.challenge_type)
print(result.risk_score)
print(result.waf_vendor)
```

---

## Modular Transport System

NebulaScrape uses a three-tier transport system. Each tier provides a higher level of browser mimicry. The `TransportManager` can automatically escalate through tiers when lower tiers accumulate failures.

### Transport Tiers

**Tier 1 — TransportHTTP**

Standard HTTPS over HTTP/1.1 with TLS fingerprint spoofing. Uses a custom `HTTPAdapter` that builds an SSL context with the exact cipher suite list, ECDH curve, and TLS version range from the selected fingerprint profile. This matches the JA3/JA4 fingerprint of real Chrome or Firefox and passes most WAF TLS fingerprint checks.

**Tier 2 — TransportHTTP2**

HTTP/2 transport backed by `httpx`. Sends the correct SETTINGS frame, WINDOW_UPDATE values, and pseudo-header order (`:method :authority :scheme :path`) that match real Chrome HTTP/2 fingerprints. Many sites block HTTP/1.1 clients that cannot negotiate HTTP/2.

**Tier 3 — TransportHeadless**

Full browser impersonation using `curl_cffi`. This sends traffic that is byte-for-byte indistinguishable from the target browser at the TLS and HTTP/2 layers using libcurl compiled with BoringSSL. Used as a last resort for Cloudflare Turnstile, Managed Challenge, Kasada, and similar advanced protections.
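
The escalation behavior across the three tiers can be modeled as a simple ladder. The sketch below is an illustration under assumptions, not the `TransportManager` implementation: the class name `EscalationLadder` and the failure threshold of 2 are hypothetical, and the tier names reuse the mode strings `http1`, `http2`, and `headless`.

```python
TIERS = ["http1", "http2", "headless"]


class EscalationLadder:
    """Move up one tier after a run of consecutive failures (hypothetical)."""

    def __init__(self, failure_threshold=2):
        self.failure_threshold = failure_threshold  # assumed value
        self.index = 0
        self.failures = 0

    @property
    def current(self):
        return TIERS[self.index]

    def record_success(self):
        # A success clears the failure streak but never de-escalates.
        self.failures = 0

    def record_failure(self):
        """Count a failure and return the tier to use for the next attempt."""
        self.failures += 1
        at_top = self.index == len(TIERS) - 1
        if self.failures >= self.failure_threshold and not at_top:
            self.index += 1
            self.failures = 0
        return self.current


ladder = EscalationLadder()
history = [ladder.record_failure() for _ in range(5)]
print(history)
```

Staying on the highest tier once it is reached, rather than wrapping around, mirrors the "last resort" role described for Tier 3.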

### Modes

```python
from nebulascrape import Client

# Auto: starts at HTTP1, escalates to HTTP2, then Headless on repeated failures
client = Client(mode="auto")

# Force a specific transport
client = Client(mode="http1")
client = Client(mode="http2")
client = Client(mode="headless")
```

### Manual Transport Control

```python
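from nebulascrape import NebulaScraper
from nebulascrape.transports import TransportManager, TransportHTTP, TransportHTTP2, TransportHeadless

# NebulaScraper extends requests.Session, so it can serve as the session to mount on
scraper_session = NebulaScraper(profile="chrome_windows")

manager = TransportManager(profile_name="chrome_windows", mode="auto")
manager.mount_on(scraper_session)  # attach the current transport to the session
manager.escalate(scraper_session)  # manually escalate one tier
manager.rebuild(scraper_session)   # rebuild the current transport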
```

---

## Async Support

NebulaScrape provides a native asyncio client through `AsyncNebulaScraper` (also exported as `AsyncClient`). It supports the same profile system, retry engine, session intelligence, metrics, and plugin hooks as the synchronous client.

The async client uses `httpx.AsyncClient` with HTTP/2 enabled as its primary backend, and falls back to `aiohttp` if httpx is not available.

### Basic Async Usage

```python
import asyncio
from nebulascrape import AsyncClient

async def main():
    client = AsyncClient(profile="chrome_windows", auto_retry=True)
    try:
        response = await client.get("https://target.com")
        print(response.status_code)
        print(response.meta)
    finally:
        # Close the client even if the request raises
        await client.close()

asyncio.run(main())
```

### Context Manager

```python
import asyncio
from nebulascrape import AsyncClient

async def main():
    async with AsyncClient(profile="chrome_linux", auto_retry=True, max_retries=5) as client:
        r1 = await client.get("https://httpbin.org/get")
        r2 = await client.post("https://httpbin.org/post", json={"key": "value"})
        print(r1.status_code, r2.status_code)

asyncio.run(main())
```

### Concurrent Requests

```python
import asyncio
from nebulascrape import AsyncClient

async def fetch(client, url):
    r = await client.get(url)
    return r.status_code, r.meta["risk_score"]

async def main():
    async with AsyncClient(profile="chrome_windows") as client:
        urls = [
            "https://httpbin.org/get",
            "https://httpbin.org/headers",
            "https://httpbin.org/ip",
        ]
        results = await asyncio.gather(*[fetch(client, u) for u in urls])
        for status, risk in results:
            print(f"Status: {status}  Risk: {risk}")

asyncio.run(main())
```

---

## Built-in Metrics

Every response returned by NebulaScrape contains a `meta["metrics"]` dictionary with timing and retry information collected during the request lifecycle.

### Per-Request Metrics

| Field | Type | Description |
|---|---|---|
| `latency_ms` | float | Total request duration in milliseconds |
| `tls_handshake_ms` | float | Approximate TLS handshake time in milliseconds |
| `retry_count` | int | Number of retries made for this request |
| `redirect_depth` | int | Number of redirects followed |
| `transport_used` | str | Which transport tier was active (`http1`, `http2`, `async_http2`) |

### Reading Metrics

```python
from nebulascrape import Client

client = Client(profile="chrome_windows", auto_retry=True)
response = client.get("https://httpbin.org/get")

m = response.meta["metrics"]
print(f"Latency:   {m['latency_ms']} ms")
print(f"Handshake: {m['tls_handshake_ms']} ms")
print(f"Retries:   {m['retry_count']}")
print(f"Redirects: {m['redirect_depth']}")
print(f"Transport: {m['transport_used']}")
```

### Session-Level Aggregate Metrics

```python
from nebulascrape import Client

client = Client(profile="chrome_windows")

for url in ["https://httpbin.org/get", "https://httpbin.org/headers"]:
    client.get(url)

stats = client.metrics
print(f"Total requests:    {stats['total_requests']}")
print(f"Average latency:   {stats['avg_latency_ms']} ms")
print(f"Max latency:       {stats['max_latency_ms']} ms")
print(f"Total retries:     {stats['total_retries']}")
print(f"Challenges solved: {stats['challenges_solved']}")
```
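
To make the relationship between per-request and session-level numbers concrete, here is a minimal sketch of how such aggregates could be derived from a list of `meta["metrics"]` dictionaries. The `aggregate` function and the sample values are hypothetical; `challenges_solved` is left out because it cannot be computed from the per-request fields listed earlier.

```python
def aggregate(per_request):
    """Summarize a list of per-request metrics dicts (hypothetical helper)."""
    latencies = [m["latency_ms"] for m in per_request]
    return {
        "total_requests": len(per_request),
        "avg_latency_ms": sum(latencies) / len(latencies) if latencies else 0.0,
        "max_latency_ms": max(latencies, default=0.0),
        "total_retries": sum(m["retry_count"] for m in per_request),
    }


# Made-up sample data in the shape of meta["metrics"]
samples = [
    {"latency_ms": 100.0, "retry_count": 0},
    {"latency_ms": 250.0, "retry_count": 2},
    {"latency_ms": 130.0, "retry_count": 1},
]
stats = aggregate(samples)
print(stats)
```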

---

## Plugin System

NebulaScrape includes a plugin registry that allows you to attach custom behavior to the request lifecycle without modifying the core library. All plugins inherit from `BasePlugin` and can hook into pre-request, post-request, challenge detection, and retry events.

### Built-in Plugins

**RateLimitPlugin**

Adds adaptive pre-request delays based on request rate. Detects burst patterns and automatically increases delays. Respects Retry-After headers on 429 responses.

```python
from nebulascrape import Client
from nebulascrape.plugins.rate_limit_handler import RateLimitPlugin

client = Client(profile="chrome_windows")
client.register_plugin(RateLimitPlugin(
    min_delay=0.3,
    max_delay=2.0,
    burst_threshold=10,
))
```
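
One way to picture the adaptive delay is a sliding window over recent request timestamps. The sketch below is an assumption about how such a policy might work, not the plugin's actual algorithm: the class `AdaptiveDelay`, the 10-second window, and the doubling rule are hypothetical, while `min_delay`, `max_delay`, and `burst_threshold` mirror the plugin's parameter names.

```python
from collections import deque


class AdaptiveDelay:
    """Sliding-window delay policy (illustrative sketch only)."""

    def __init__(self, min_delay=0.3, max_delay=2.0, burst_threshold=10, window=10.0):
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.burst_threshold = burst_threshold
        self.window = window          # seconds of history to keep (assumed)
        self._stamps = deque()

    def next_delay(self, now):
        """Record a request at time `now` and return the delay to apply."""
        # Forget requests that have fallen out of the window.
        while self._stamps and now - self._stamps[0] > self.window:
            self._stamps.popleft()
        self._stamps.append(now)
        overshoot = len(self._stamps) - self.burst_threshold
        if overshoot <= 0:
            return self.min_delay
        # Each request beyond the threshold doubles the delay, up to max_delay.
        return min(self.min_delay * (2 ** overshoot), self.max_delay)


policy = AdaptiveDelay(min_delay=0.3, max_delay=2.0, burst_threshold=3)
delays = [policy.next_delay(now=t) for t in (0, 1, 2, 3, 4, 5)]
print(delays)
```

Passing the timestamp in explicitly keeps the policy deterministic and easy to test; a real plugin would presumably read a monotonic clock instead.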

**HeaderOptimizerPlugin**

Ensures browser-realistic headers are applied to every request, merging them with any user-supplied headers while preserving the correct order. Adjusts Sec-Fetch headers automatically for POST requests.

```python
from nebulascrape import Client
from nebulascrape.plugins.header_optimizer import HeaderOptimizerPlugin

client = Client(profile="chrome_windows")
client.register_plugin(HeaderOptimizerPlugin(profile_name="chrome_windows"))
```

**ProxyManagerPlugin**

Manages a pool of proxy servers with automatic rotation on failure. Tracks per-proxy failure counts and rotates after two consecutive failures on the same proxy.

```python
from nebulascrape import Client
from nebulascrape.plugins.proxy_manager import ProxyManagerPlugin

proxies = [
    "http://user:pass@proxy1:8080",
    "http://user:pass@proxy2:8080",
    "http://user:pass@proxy3:8080",
]

client = Client(profile="chrome_windows")
client.register_plugin(ProxyManagerPlugin(
    proxies=proxies,
    rotate_on_fail=True,
    rotate_on_status=[403, 429, 503],
))
```
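
The rotation rule (move on after two consecutive failures on the same proxy) can be sketched independently of the plugin. `ProxyPool` and its `report` method are hypothetical names used for illustration, and the proxy URLs are placeholders.

```python
class ProxyPool:
    """Round-robin pool that rotates after consecutive failures (sketch)."""

    def __init__(self, proxies, max_consecutive_failures=2):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self.proxies = list(proxies)
        self.max_consecutive_failures = max_consecutive_failures
        self.index = 0
        self.failures = {proxy: 0 for proxy in self.proxies}

    @property
    def current(self):
        return self.proxies[self.index]

    def report(self, ok):
        """Record the outcome of a request and return the proxy to use next."""
        proxy = self.current
        if ok:
            self.failures[proxy] = 0      # a success resets the streak
            return proxy
        self.failures[proxy] += 1
        if self.failures[proxy] >= self.max_consecutive_failures:
            self.failures[proxy] = 0
            self.index = (self.index + 1) % len(self.proxies)
        return self.current


pool = ProxyPool(["http://proxy1:8080", "http://proxy2:8080", "http://proxy3:8080"])
sequence = [pool.report(ok) for ok in (False, False, True, False, True, False, False)]
print(sequence)
```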

### Writing a Custom Plugin

```python
from nebulascrape.plugins import BasePlugin
from nebulascrape import Client

class LoggingPlugin(BasePlugin):
    name = "logging_plugin"
    priority = 5  # lower number = runs first

    def on_pre_request(self, scraper, method, url, kwargs):
        print(f"REQUEST  {method} {url}")
        return kwargs

    def on_post_request(self, scraper, response, kwargs):
        print(f"RESPONSE {response.status_code} - risk={response.meta.get('risk_score', 'n/a')}")
        return response

    def on_retry(self, scraper, attempt, decision):
        print(f"RETRY {attempt} - reason: {decision.reason} - waiting {decision.backoff_seconds:.1f}s")

client = Client(profile="chrome_windows", auto_retry=True)
client.register_plugin(LoggingPlugin())

response = client.get("https://httpbin.org/get")
```

### Plugin Hook Reference

| Hook | When it runs | Return value |
|---|---|---|
| `on_pre_request(scraper, method, url, kwargs)` | Before every request | Modified kwargs dict |
| `on_post_request(scraper, response, kwargs)` | After every response | response object |
| `on_challenge_detected(scraper, response, intel_result)` | When a challenge is found | bool |
| `on_retry(scraper, attempt, decision)` | Before each retry sleep | None |
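
The effect of `priority` on hook order can be shown with a toy registry. This is a sketch under assumptions, not the library's registry: `PluginRegistrySketch`, the two sample plugins, and the `trace` key are hypothetical, and the only behavior taken from the text above is that a lower priority number runs first and that `on_pre_request` returns the (possibly modified) kwargs.

```python
class PluginRegistrySketch:
    """Keeps plugins sorted by priority and chains their pre-request hooks."""

    def __init__(self):
        self._plugins = []

    def register(self, plugin):
        self._plugins.append(plugin)
        # sort() is stable, so equal priorities keep registration order.
        self._plugins.sort(key=lambda p: p.priority)

    def run_pre_request(self, scraper, method, url, kwargs):
        for plugin in self._plugins:
            kwargs = plugin.on_pre_request(scraper, method, url, kwargs)
        return kwargs


class TracePlugin:
    """Sample plugin that records its name so the call order is visible."""

    def __init__(self, name, priority):
        self.name = name
        self.priority = priority

    def on_pre_request(self, scraper, method, url, kwargs):
        kwargs.setdefault("trace", []).append(self.name)
        return kwargs


registry = PluginRegistrySketch()
registry.register(TracePlugin("runs_second", priority=10))
registry.register(TracePlugin("runs_first", priority=5))

result = registry.run_pre_request(None, "GET", "https://httpbin.org/get", {})
print(result["trace"])
```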

---

## WAF Bypass Engine

NebulaScrape's bypass capabilities are integrated across multiple layers of the library. There is no single "bypass" function. Instead, bypass is the result of the fingerprint, transport, intelligence, and retry systems working together.

### Cloudflare

**I'm Under Attack Mode (v1)**

Detected by inspecting the response body for the characteristic `jsch` trace image and challenge form. The library extracts the challenge parameters and waits a browser-realistic delay (parsed from the page's own JavaScript, with jitter added). It then solves the JavaScript challenge using the configured interpreter (`native` by default), submits the solution as a POST request, and follows the redirect to retrieve the real page. The `cf_clearance` cookie is retained in the session for future requests.

**Turnstile**

Detected by looking for `cf-turnstile` or `challenges.cloudflare.com/turnstile` in the response. When this challenge is detected, the library raises `TurnstileChallengeError` and recommends using `TransportHeadless` with `curl_cffi`, which passes the Turnstile check at the TLS and HTTP/2 fingerprint layer without requiring a browser.

**Managed Challenge and v2**

Detected by inspecting the CDN CGI orchestration endpoint pattern. Escalation to the headless transport is recommended.

**Cloudflare Firewall 1020**

Detected and raised as `CloudflareCode1020`. This is an IP-level block that requires proxy rotation.

### Multi-WAF Detection

The `SessionIntelligence` layer detects the following vendors using header and body signature matching:

| WAF | Detection Method |
|---|---|
| Cloudflare | `Server: cloudflare` header + body patterns |
| DataDome | `dd_sitekey`, `datadome.co` cookie domains |
| PerimeterX | `_pxdk` cookie, `PerimeterX` body references |
| Kasada | `kasada`, `kpsdk` body references |
| Akamai | `_abck`, `ak_bmsc` cookies, sensor_data |
| Imperva | `incap_ses_`, `visid_incap_` cookies |
| Shape Security | `shape.io`, `x-shape-` headers |
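
Signature matching of this kind reduces to a handful of string checks. The function below is a simplified illustration, not the `SessionIntelligence` implementation: `detect_vendor` is a hypothetical name, the check order is arbitrary, vendor strings other than `cloudflare`, `akamai`, and `none` are assumed, and Cloudflare is matched on the `Server` header alone for brevity.

```python
def detect_vendor(headers, cookie_names, body):
    """Guess the WAF vendor from the signatures in the table above (sketch)."""
    headers = {name.lower(): value for name, value in headers.items()}
    cookie_names = set(cookie_names)
    body = body.lower()

    if headers.get("server", "").lower() == "cloudflare":
        return "cloudflare"
    if "dd_sitekey" in body or "datadome.co" in body:
        return "datadome"
    if "_pxdk" in cookie_names or "perimeterx" in body:
        return "perimeterx"
    if "kasada" in body or "kpsdk" in body:
        return "kasada"
    if cookie_names & {"_abck", "ak_bmsc"}:
        return "akamai"
    if any(c.startswith(("incap_ses_", "visid_incap_")) for c in cookie_names):
        return "imperva"
    if any(name.startswith("x-shape-") for name in headers):
        return "shape"
    return "none"


print(detect_vendor({"Server": "cloudflare"}, [], "<html></html>"))
print(detect_vendor({}, ["_abck"], ""))
```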

### TLS Fingerprint Spoofing

Standard Python `ssl` sends a TLS fingerprint (JA3) that is trivially identifiable as a non-browser client. NebulaScrape replaces the default SSL context with one that:

- Sets the cipher suite list to match Chrome 120's exact order
- Sets the ECDH curve to `prime256v1`
- Sets TLS minimum version to TLS 1.2 and maximum to TLS 1.3
- Preserves the correct TLS extension set

This produces a JA3 fingerprint that matches a real Chrome browser.
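
The first three settings map onto real attributes of Python's standard `ssl` module, as the sketch below shows. It is an illustration under assumptions: `build_tls_context` is a hypothetical name and the cipher string is a generic placeholder, not Chrome 120's list. Note that `set_ciphers` only affects TLS 1.2 and earlier suites, and the `ssl` module does not expose TLS extension order, so this sketch does not cover the last bullet.

```python
import ssl


def build_tls_context(cipher_string="ECDHE+AESGCM:ECDHE+CHACHA20"):
    """Build an SSLContext with an explicit cipher list, curve, and versions.

    cipher_string is an OpenSSL cipher string; the default is a placeholder
    for illustration, not a browser's actual list.
    """
    context = ssl.create_default_context()
    context.set_ciphers(cipher_string)          # TLS 1.2 and earlier suites
    context.set_ecdh_curve("prime256v1")        # ECDH curve named above
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    context.maximum_version = ssl.TLSVersion.TLSv1_3
    return context


context = build_tls_context()
print(context.minimum_version, context.maximum_version)
print(len(context.get_ciphers()), "cipher suites enabled")
```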

---

## Advanced Usage

### All Options Together

```python
from nebulascrape import Client
from nebulascrape.plugins.rate_limit_handler import RateLimitPlugin
from nebulascrape.plugins.proxy_manager import ProxyManagerPlugin
from nebulascrape.plugins.header_optimizer import HeaderOptimizerPlugin

client = Client(
    profile="chrome_windows",
    auto_retry=True,
    max_retries=7,
    mode="auto",
    interpreter="native",
    debug=False,
)

client.register_plugin(RateLimitPlugin(min_delay=0.5, max_delay=3.0))
client.register_plugin(HeaderOptimizerPlugin(profile_name="chrome_windows"))
client.register_plugin(ProxyManagerPlugin(proxies=["http://proxy1:8080"]))

response = client.get("https://target.com", timeout=30)

print("Status:    ", response.status_code)
print("Challenge: ", response.meta["challenge_type"])
print("Risk:      ", response.meta["risk_score"])
print("WAF:       ", response.meta["waf_vendor"])
print("Latency:   ", response.meta["metrics"]["latency_ms"], "ms")
print("Retries:   ", response.meta["metrics"]["retry_count"])
```

### Using the Low-Level NebulaScraper

```python
from nebulascrape import NebulaScraper

scraper = NebulaScraper(
    browser={"browser": "chrome", "platform": "windows", "desktop": True},
    auto_retry=True,
    max_retries=5,
    mode="auto",
    captcha={"provider": "2captcha", "api_key": "YOUR_KEY"},
    solveDepth=3,
    doubleDown=True,
    delay=None,
)

response = scraper.get("https://target.com")
cookies = response.cookies
cf_clearance = scraper.cookies.get("cf_clearance")  # None if no clearance cookie was set
```

### Passing Cookies or Proxies

```python
from nebulascrape import Client

client = Client(profile="chrome_windows")

# Proxies
response = client.get("https://target.com", proxies={
    "http": "http://proxy:8080",
    "https": "http://proxy:8080",
})

# Custom cookies
response = client.get("https://target.com", cookies={
    "session_id": "abc123",
})

# Custom headers (merged with profile headers)
response = client.get("https://target.com", headers={
    "Referer": "https://google.com",
    "X-Custom-Header": "value",
})
```

### Integrating with Existing Sessions

```python
import requests
from nebulascrape import NebulaScraper

existing_session = requests.Session()
existing_session.headers.update({"Authorization": "Bearer token123"})

scraper = NebulaScraper.create_scraper(
    sess=existing_session,
    profile="chrome_linux",
    auto_retry=True,
)

response = scraper.get("https://api.target.com/data")
```

### Captcha Integration

```python
from nebulascrape import Client

client = Client(
    profile="chrome_windows",
    captcha={
        "provider": "2captcha",
        "api_key": "YOUR_2CAPTCHA_KEY",
    }
)

response = client.get("https://cloudflare-captcha-site.com")
```

Supported captcha providers: `2captcha`, `anticaptcha`, `capmonster`, `capsolver`, `9kw`, `deathbycaptcha`.

### Getting Cloudflare Tokens

```python
from nebulascrape import get_tokens, get_cookie_string

tokens, user_agent = get_tokens("https://cloudflare-protected-site.com")
print("cf_clearance:", tokens["cf_clearance"])
print("User-Agent:  ", user_agent)

cookie_string, user_agent = get_cookie_string("https://cloudflare-protected-site.com")
print("Cookie:", cookie_string)
```

---

## API Reference

### `Client`

```
Client(
    profile="chrome_windows",
    auto_retry=True,
    max_retries=5,
    mode="auto",
    captcha={},
    interpreter="native",
    debug=False,
    **kwargs
)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `profile` | str | `chrome_windows` | Fingerprint profile to use |
| `auto_retry` | bool | `True` | Enable smart retry engine |
| `max_retries` | int | `5` | Maximum retry attempts |
| `mode` | str | `auto` | Transport mode (`auto`, `http1`, `http2`, `headless`) |
| `captcha` | dict | `{}` | Captcha provider configuration |
| `interpreter` | str | `native` | JS interpreter for challenge solving |
| `debug` | bool | `False` | Enable request/response debugging output |

**Methods:** `get(url, **kwargs)`, `post(url, **kwargs)`, `put(url, **kwargs)`, `delete(url, **kwargs)`, `request(method, url, **kwargs)`, `register_plugin(plugin)`

**Properties:** `session`, `metrics`

---

### `AsyncClient` / `AsyncNebulaScraper`

```
AsyncClient(
    profile="chrome_windows",
    auto_retry=True,
    max_retries=5,
    debug=False,
    **kwargs
)
```

**Methods:** `await get(url, **kwargs)`, `await post(url, **kwargs)`, `await put(url, **kwargs)`, `await delete(url, **kwargs)`, `await request(method, url, **kwargs)`, `register_plugin(plugin)`, `await close()`, supports `async with`.

---

### `NebulaScraper`

Extends `requests.Session`. All `requests.Session` methods are available.

Additional parameters beyond those accepted by `Client`:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `browser` | dict or None | None | Browser dict with keys `browser`, `platform`, `desktop`, `mobile` |
| `solveDepth` | int | `3` | Maximum Cloudflare challenge solve loops |
| `doubleDown` | bool | `True` | Double request on captcha to check if cfuid is enough |
| `delay` | float or None | None | Manual Cloudflare challenge delay in seconds |
| `disableCloudflareV1` | bool | `False` | Disable built-in Cloudflare v1 bypass |
| `requestPreHook` | callable | None | Function called before each request |
| `requestPostHook` | callable | None | Function called after each response |
| `source_address` | str or tuple | None | Bind to a specific local IP |
| `ssl_context` | ssl.SSLContext | None | Custom SSL context |

---

### `response.meta` Fields

| Field | Type | Description |
|---|---|---|
| `challenge_type` | str | Detected challenge type (see challenge type table) |
| `waf_vendor` | str | Detected WAF vendor |
| `risk_score` | int | Risk score 0-100 |
| `retry_recommended` | bool | Whether retry is suggested |
| `rotate_session` | bool | Whether session rotation is suggested |
| `switch_transport` | bool | Whether transport escalation is suggested |
| `details` | dict | Raw details: retry_after, cf_ray, status_code |
| `metrics` | dict | latency_ms, tls_handshake_ms, retry_count, redirect_depth, transport_used |

---

## Configuration Reference

### Fingerprint Profiles

| Profile | User-Agent snippet | Platform |
|---|---|---|
| `chrome_windows` | Chrome/120.0.0.0 ... Windows NT 10.0 | Windows |
| `chrome_linux` | Chrome/120.0.0.0 ... X11; Linux x86_64 | Linux |
| `firefox` | Firefox/121.0 ... Windows NT 10.0 | Windows |
| `mobile` | Chrome/120.0.6099.144 Mobile ... Android 13 | Android |

### Transport Modes

| Mode | Backend | HTTP Version | TLS Spoof | Impersonation Level |
|---|---|---|---|---|
| `http1` | requests | HTTP/1.1 | JA3 cipher suite | High |
| `http2` | httpx | HTTP/2 | JA3 + H2 SETTINGS | Very High |
| `headless` | curl_cffi | HTTP/2 | Full BoringSSL | Maximum |
| `auto` | escalating | depends | depends | Adaptive |

### JavaScript Interpreters

| Interpreter | Requirement | Description |
|---|---|---|
| `native` | None (built-in) | Pure Python JS evaluation for simple challenges |
| `js2py` | `pip install js2py` | Full JavaScript runtime |
| `nodejs` | Node.js installed | Executes via Node.js subprocess |
| `chakracore` | ChakraCore binary | Microsoft JS engine |
| `v8` | V8 binary | Google V8 JS engine |

---

## Author

| Field | Value |
|---|---|
| Developer | MERO |
| Contact | TG@QP4M |
| GitHub | [github.com/6x-u](https://github.com/6x-u) |
| License | MIT |
