Metadata-Version: 2.4
Name: ux-journey-scraper
Version: 0.3.0
Summary: Autonomous web crawler for capturing user journeys
Home-page: https://github.com/resabh/ux-journey-scraper
Author: Rishabh Patel
Author-email: Rishabh Patel <rp87704@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/resabh/ux-journey-scraper
Project-URL: Documentation, https://github.com/resabh/ux-journey-scraper#readme
Project-URL: Repository, https://github.com/resabh/ux-journey-scraper
Project-URL: Bug Tracker, https://github.com/resabh/ux-journey-scraper/issues
Project-URL: Changelog, https://github.com/resabh/ux-journey-scraper/blob/main/CHANGELOG.md
Keywords: ux,user-experience,journey,scraper,playwright,testing,guidelines,accessibility
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Software Development :: Testing
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: playwright>=1.40.0
Requires-Dist: Pillow>=10.0.0
Requires-Dist: beautifulsoup4>=4.12.0
Requires-Dist: requests>=2.31.0
Requires-Dist: click>=8.1.0
Requires-Dist: urllib3>=2.0.0
Requires-Dist: tqdm>=4.66.0
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: aiofiles>=23.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# UX Journey Scraper

[![CI/CD Pipeline](https://github.com/resabh/ux-journey-scraper/workflows/CI/CD%20Pipeline/badge.svg)](https://github.com/resabh/ux-journey-scraper/actions)
[![codecov](https://codecov.io/gh/resabh/ux-journey-scraper/branch/main/graph/badge.svg)](https://codecov.io/gh/resabh/ux-journey-scraper)
[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**Autonomous web crawler for capturing user journeys.**

A powerful tool for UX designers, developers, and QA teams to autonomously capture complete user flows through websites.

## 🎯 What It Does

### Core Features (v0.3.0)
- **🤖 Autonomous Crawling**: Intelligently navigates websites without manual intervention
- **🎯 Smart Element Detection**: Finds ALL clickables (buttons, links, onclick handlers, ARIA roles)
- **📸 Journey Capture**: Records complete user flows with screenshots at each step
- **🔐 Auth Support**: Handles login flows and session management
- **📋 Form Filling**: Automatically fills checkout forms (test data only)
- **🕵️ Stealth Mode**: Anti-bot detection with human-like behavior simulation
- **🔄 SPA Support**: Works with modern single-page applications
- **🛡️ Privacy-Aware**: Automatically blurs PII in screenshots

## 🚀 Installation

### PyPI Installation (Recommended)

```bash
# Install latest version (v0.3.0 - Autonomous Crawling)
pip install ux-journey-scraper
```

### Installation from Source

```bash
# Clone the repository
git clone https://github.com/resabh/ux-journey-scraper.git
cd ux-journey-scraper

# Install in development mode
pip install -e .

# OR install with full UX guidelines support
pip install -e ".[full]"
```

**Note:** The `[full]` option includes `ecommerce-ux-guidelines` package for detailed UX analysis. The base installation works fully for autonomous crawling and journey capture without it.

### Post-Installation

After installation, install Playwright browsers:

```bash
playwright install chromium
```

## 📖 Quick Start

### Autonomous Crawl (v0.3.0)

```bash
# Create a configuration file (see scrape-config.example.yaml)
ux-journey crawl --config scrape-config.yaml --output-dir journey_output/

# The tool will:
# 1. Check robots.txt (asks for confirmation if needed)
# 2. Autonomously navigate through the site
# 3. Capture screenshots at each step
# 4. Record complete user journey
# 5. Save journey data as JSON + screenshots
```

### Manual Recording (Deprecated)

```bash
# Manually record a journey (interactive mode)
ux-journey record \
  --start-url https://example.com \
  --output my-journey.json

# Note: Use 'crawl' for autonomous navigation
```

## 💡 Use Cases

### 1. **Competitor Analysis**
Capture competitor websites to understand user flows and patterns.

```bash
ux-journey crawl --config competitor-config.yaml --output-dir competitor_journey/
```

### 2. **User Flow Documentation**
Document user flows with screenshots for design/QA teams.

```bash
ux-journey crawl --config mysite-config.yaml --output-dir user_flows/
```

### 3. **Pre-Launch Testing**
Capture staging sites before launch.

```bash
ux-journey crawl --config staging-config.yaml --output-dir staging_test/
```

### 4. **QA Automation**
Automated journey capture for regression testing.

```bash
ux-journey crawl --config qa-config.yaml --output-dir qa_snapshots/
```

## 🛠️ Features

### Journey Recording
- ✅ Autonomous crawling with priority queue
- ✅ Multi-step journey capture
- ✅ Automatic screenshot on each navigation
- ✅ HTML structure extraction
- ✅ Form field detection and auto-fill
- ✅ CTA and button analysis
- ✅ Navigation element capture
- ✅ Session management and auth support

### Accessibility
- ✅ WCAG 2.1 Level A/AA validation
- ✅ Color contrast analysis
- ✅ Keyboard navigation checks
- ✅ Screen reader compatibility
- ✅ ARIA attributes validation

### Privacy & Security
- ✅ PII blur in screenshots (emails, credit cards, names)
- ✅ robots.txt respect (with user confirmation)
- ✅ Rate limiting
- ✅ No data collection or tracking

### Reports
- ✅ Interactive HTML reports
- ✅ Annotated screenshots with issue markers
- ✅ JSON export for automation
- ✅ Journey flow visualization
- ✅ Issue summaries and scores

## 📋 CLI Reference

### `ux-journey record`

Record a user journey interactively.

```bash
ux-journey record [OPTIONS]

Options:
  --start-url TEXT         Starting URL (required)
  --output TEXT            Output file path (default: journey.json)
  --viewport TEXT          Viewport size (default: 1920x1080)
  --blur-pii               Blur PII in screenshots (default: true)
  --respect-robots         Check robots.txt (default: true)
  --headless              Run in headless mode (default: false)
```


## 🔧 Python API

```python
from ux_journey_scraper import JourneyRecorder, Journey
from ux_journey_scraper.config import ScrapeConfig
from ux_journey_scraper.core import AutonomousCrawler

# Option 1: Autonomous crawling with config
config = ScrapeConfig.load("scrape-config.yaml")
crawler = AutonomousCrawler(config=config, output_dir="journey_output/")
journey = await crawler.crawl()

# Option 2: Manual recording (deprecated)
recorder = JourneyRecorder(
    start_url="https://example.com",
    blur_pii=True,
    respect_robots=True
)
journey = await recorder.record()
journey.save("my-journey.json")

# Load existing journey
journey = Journey.load("journey.json")

# Access journey data
for step in journey.steps:
    print(f"Step {step.step_number}: {step.title}")
    print(f"  URL: {step.url}")
    print(f"  Screenshot: {step.screenshot_path}")
```

## 📊 Sample Journey Output

```json
{
  "start_url": "https://example.com",
  "start_time": "2026-03-19T12:00:00Z",
  "end_time": "2026-03-19T12:05:30Z",
  "viewport": [1920, 1080],
  "steps": [
    {
      "step_number": 1,
      "url": "https://example.com",
      "title": "Homepage",
      "screenshot_path": "screenshots/step-001.png",
      "html_snapshot": "...",
      "page_data": {
        "forms": [...],
        "buttons": [...],
        "links": [...]
      }
    },
    ...
  ]
}
```

## 🔍 How It Works

1. **Configuration**
   - YAML config defines seed URLs, auth, form fill settings
   - Priority queue manages navigation strategy

2. **Autonomous Crawling**
   - Launches stealth browser (anti-bot detection)
   - Navigates autonomously using priority queue
   - Finds ALL clickables (buttons, links, ARIA roles, onclick handlers)
   - Handles login flows and session management
   - Fills forms with test data (payment safeguards)

3. **Capture**
   - Waits for page readiness (DOM stable, no spinners, lazy-load complete)
   - Captures screenshot + HTML at each step
   - Blurs PII automatically
   - Records all page data (forms, buttons, links)

4. **Output**
   - Saves journey JSON with all steps
   - Screenshots folder with PII blurred
   - Complete DOM snapshots for analysis

## 🛣️ Roadmap

### v0.3.0 (Current) - Pure Journey Capture
- ✅ Autonomous crawling with priority queue
- ✅ Desktop web journey recording
- ✅ Smart element detection (4 strategies)
- ✅ Auth support and session management
- ✅ Form auto-fill with safeguards
- ✅ PII blur in screenshots
- ✅ robots.txt handling
- ✅ Stealth mode (anti-bot detection)

### Future Enhancements
- ⬜ Mobile viewports (responsive crawling)
- ⬜ Multi-page site mapping
- ⬜ CAPTCHA handling improvements
- ⬜ Visual regression detection
- ⬜ Performance metrics capture
- ⬜ CI/CD integration

### Analysis Features → Moved to BayMAAR
- UX guidelines analysis is now in the separate BayMAAR Analysis Engine (private)
- This package remains a pure journey capture tool

## 🤝 Contributing

Contributions welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request

## 📄 License

MIT License - see LICENSE file for details

## 🔗 Related Projects

- [ecommerce-ux-guidelines](https://github.com/rishabhpatelgofynd/BayMAAR) - 324+ research-backed UX guidelines

## 📞 Support

- **Issues**: [GitHub Issues](https://github.com/rishabhpatelgofynd/ux-journey-scraper/issues)
- **Discussions**: [GitHub Discussions](https://github.com/rishabhpatelgofynd/ux-journey-scraper/discussions)

---

**Made with ❤️ for better user experiences**
