Metadata-Version: 2.4
Name: wikiscraperx
Version: 1.0.6
Summary: Scrape Wikipedia tables into CSVs or JSON
Home-page: https://github.com/Joseph-Press/wikiscraperx
Download-URL: https://github.com/Joseph-Press/WikiScraperX/archive/refs/tags/1.0.4.1.tar.gz
Author: Joseph Press
Author-email: Joseph Press <joepress101@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/Joseph-Press/wikiscraperx
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests
Requires-Dist: beautifulsoup4
Requires-Dist: lxml
Dynamic: author
Dynamic: download-url
Dynamic: home-page
Dynamic: license-file

# wikiscraperx

## Overview
This repository contains wikiscraperx, a Python tool that scrapes tables from Wikipedia pages and saves them as CSV or JSON files. The project consists of two main Python files:

1. `tableParser.py`: logic for parsing HTML tables.
2. `mainCli.py`: command-line interface for the application.
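
To illustrate the kind of work a table parser does, here is a minimal sketch that extracts tables from an HTML string and renders one as CSV. It is not the code in `tableParser.py`: it uses only the standard library's `html.parser` instead of the `beautifulsoup4`/`lxml` stack the package depends on, and the names `TableExtractor` and `table_to_csv` are hypothetical. It assumes explicitly closed `<tr>`, `<td>`, and `<th>` tags and ignores nested tables, `rowspan`, and `colspan`.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in a document as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> seen so far
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def table_to_csv(rows):
    """Serialize a list of rows to a CSV string."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


html = (
    "<table>"
    "<tr><th>Type</th><th>Mutable</th></tr>"
    "<tr><td>list</td><td>Yes</td></tr>"
    "</table>"
)
extractor = TableExtractor()
extractor.feed(html)
print(table_to_csv(extractor.tables[0]))
```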

## Prerequisites

- Python 3.7 or later
- Additional Python packages, as listed in `requirements.txt`.

## Installation
To install the latest release from PyPI:
```bash
pip install wikiscraperx
```
Or install from source:
```bash
git clone https://github.com/Joseph-Press/wikiscraperx.git
cd wikiscraperx
pip install .
```

## Usage

### Command Line Interface

Scrape all tables from a Wikipedia page into a folder (CSV is the default format):
```bash
wikiscraperx --url "https://en.wikipedia.org/wiki/Python_(programming_language)" --output-folder ./output
```

Scrape to JSON format:
```bash
wikiscraperx --url "https://en.wikipedia.org/wiki/Python_(programming_language)" --output-folder ./output --format json
```

Print a specific table, selected by its header, to stdout:
```bash
wikiscraperx --url "https://en.wikipedia.org/wiki/Python_(programming_language)" --header "Summary of Python 3's built-in types"
```

Print the same table as JSON to stdout:
```bash
wikiscraperx --url "https://en.wikipedia.org/wiki/Python_(programming_language)" --header "Summary of Python 3's built-in types" --format json
```
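
Because a single table can be written to stdout, the CLI can also be driven from a Python script. The sketch below shells out to `wikiscraperx` using only the flags shown above. The helper names `build_command` and `scrape_table` are hypothetical, and the code assumes that `--format json` writes exactly one valid JSON document to stdout; the shape of that document is not specified here.

```python
import json
import subprocess


def build_command(url, header, fmt="json"):
    """Assemble the argument list from the --url, --header, and --format flags."""
    return ["wikiscraperx", "--url", url, "--header", header, "--format", fmt]


def scrape_table(url, header):
    """Run the CLI and decode its stdout.

    Assumption: with --format json the tool prints one valid JSON document.
    """
    result = subprocess.run(
        build_command(url, header),
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the output to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

Calling `scrape_table("https://en.wikipedia.org/wiki/Python_(programming_language)", "Summary of Python 3's built-in types")` requires the package to be installed and network access to be available.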

## Use Cases

### Sports Stats Analysis
Scrape professional records of athletes (e.g., MMA fighters) for analysis:
```bash
wikiscraperx --url "https://en.wikipedia.org/wiki/Khabib_Nurmagomedov" --output-folder ./fighters/khabib --format json
```

### Climate Data Collection
Extract historical weather and climate data tables for specific regions:
```bash
wikiscraperx --url "https://en.wikipedia.org/wiki/Climate_of_London" --output-folder ./weather/london --format csv
```
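
Once tables have been saved to an output folder, they can be loaded for analysis with the standard library alone. This is a minimal sketch: `load_csv_folder` is a hypothetical helper, and it assumes the saved files carry a `.csv` extension and are UTF-8 encoded, neither of which is stated above.

```python
import csv
from pathlib import Path


def load_csv_folder(folder):
    """Return {file stem: rows} for every .csv file directly inside folder."""
    tables = {}
    for path in sorted(Path(folder).glob("*.csv")):
        # newline="" lets the csv module handle line endings itself.
        with path.open(newline="", encoding="utf-8") as handle:
            tables[path.stem] = list(csv.reader(handle))
    return tables
```

For example, `load_csv_folder("./weather/london")` would return one entry per CSV file in that folder, each holding the table's rows as lists of strings.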
