Metadata-Version: 2.4
Name: sotoki
Version: 3.0.1
Summary: Turn StackExchange dumps into ZIM files for offline usage
Project-URL: Donate, https://www.kiwix.org/en/support-us/
Project-URL: Homepage, https://github.com/openzim/sotoki
Author-email: openZIM <dev@openzim.org>
License: GPL-3.0-or-later
License-File: LICENSE
Keywords: offline,openzim,stackexchange,stackoverflow,zim
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.14
Requires-Python: <3.15,>=3.14
Requires-Dist: backoff==2.2.1
Requires-Dist: beautifulsoup4==4.14.2
Requires-Dist: bidict==0.23.1
Requires-Dist: cli-formatter==1.2.0
Requires-Dist: dependency-injector==4.48.2
Requires-Dist: jinja2-pluralize==0.3.0
Requires-Dist: jinja2==3.1.6
Requires-Dist: kiwixstorage==0.10.1
Requires-Dist: lxml==6.0.2
Requires-Dist: mistune==3.1.4
Requires-Dist: pif==0.8.2
Requires-Dist: psutil==7.1.3
Requires-Dist: py7zr==1.0.0
Requires-Dist: python-dateutil==2.9.0.post0
Requires-Dist: python-slugify==8.0.4
Requires-Dist: python-snappy==0.7.3
Requires-Dist: redis==7.1.0
Requires-Dist: tld==0.13.1
Requires-Dist: xml-to-dict==0.1.6
Requires-Dist: zimscraperlib==5.3.0
Provides-Extra: check
Requires-Dist: pyright==1.1.407; extra == 'check'
Provides-Extra: dev
Requires-Dist: black==25.11.0; extra == 'dev'
Requires-Dist: coverage==7.12.0; extra == 'dev'
Requires-Dist: debugpy==1.8.17; extra == 'dev'
Requires-Dist: invoke==2.2.1; extra == 'dev'
Requires-Dist: ipdb==0.13.13; extra == 'dev'
Requires-Dist: ipython==9.7.0; extra == 'dev'
Requires-Dist: pre-commit==4.5.0; extra == 'dev'
Requires-Dist: pyright==1.1.407; extra == 'dev'
Requires-Dist: pytest==9.0.1; extra == 'dev'
Requires-Dist: ruff==0.14.6; extra == 'dev'
Provides-Extra: lint
Requires-Dist: black==25.11.0; extra == 'lint'
Requires-Dist: ruff==0.14.6; extra == 'lint'
Provides-Extra: scripts
Requires-Dist: invoke==2.2.1; extra == 'scripts'
Provides-Extra: test
Requires-Dist: coverage==7.12.0; extra == 'test'
Requires-Dist: pytest==9.0.1; extra == 'test'
Description-Content-Type: text/markdown

Sotoki
======

`Sotoki` (*Stack Overflow to Kiwix*) is an
[openZIM](https://github.com/openzim) scraper to create offline
versions of [Stack Exchange](https://stackexchange.com) websites such
as [Stack Overflow](https://stackoverflow.com/).

It is based on Stack Exchange's Data Dumps hosted by [The Internet
Archive](https://archive.org/download/stackexchange/).

[![CodeFactor](https://www.codefactor.io/repository/github/openzim/sotoki/badge)](https://www.codefactor.io/repository/github/openzim/sotoki)
[![Docker](https://ghcr-badge.egpl.dev/openzim/sotoki/latest_tag?label=docker)](https://ghcr.io/openzim/sotoki)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![PyPI version shields.io](https://img.shields.io/pypi/v/sotoki.svg)](https://pypi.org/project/sotoki/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/sotoki.svg)](https://pypi.org/project/sotoki)

## Usage

`Sotoki` works off a dump of a Stack Exchange website, as regularly published by the Stack Exchange team. You must provide
the `--mirror` to download this dump from and the `--domain` you want to scrape.

For instance, to scrape the Sports Stack Exchange website as of August 2024 from the dump hosted on archive.org,
use `--mirror https://archive.org/download/stackexchange_20240829 --domain sports.stackexchange.com`.
The mirror value therefore changes whenever the Stack Exchange team publishes a new dump.

The following CLI parameters are also mandatory:
- `--title`: ZIM title, must be less than 30 characters, e.g. `Gardening and Landscaping`
- `--description`: ZIM description, must be less than 80 characters
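
As a rough sketch, the four parameters above can be combined into one invocation after checking the length limits. The `check_len` helper, the variable names and the example title and description are hypothetical, and the check reads "less than" literally, which is an assumption about the exact boundary. The command is printed rather than run, so the sketch works without `sotoki` installed.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of sotoki): fail if a value is not
# strictly shorter than the given limit.
# $1 = label, $2 = value, $3 = exclusive upper bound on character count
check_len() {
  local label="$1" value="$2" limit="$3"
  if [ "${#value}" -ge "$limit" ]; then
    echo "$label too long: ${#value} chars (must be less than $limit)" >&2
    return 1
  fi
}

# Mirror and domain are the example values from the text above.
MIRROR="https://archive.org/download/stackexchange_20240829"
DOMAIN="sports.stackexchange.com"
# Illustrative values only.
TITLE="Sports"
DESCRIPTION="Sports Stack Exchange questions and answers"

if check_len "title" "$TITLE" 30 && check_len "description" "$DESCRIPTION" 80; then
  # Remove the leading `echo` to actually run the scraper.
  echo sotoki --mirror "$MIRROR" --domain "$DOMAIN" \
    --title "$TITLE" --description "$DESCRIPTION"
fi
```

Checking the lengths up front is only a convenience: `sotoki` validates its own arguments, and `sotoki --help` remains the reference for the full list of options.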

### Docker

```bash
docker run -v my_dir:/output ghcr.io/openzim/sotoki sotoki --help
```

### Installation

`sotoki` is a Python 3 program. If you are not using the
[Docker](https://ghcr.io/openzim/sotoki/) image, you are advised to install it in a
virtual environment to avoid installing its dependencies system-wide.

```sh
python3 -m venv ./env  # creates a virtual python environment in ./env folder
./env/bin/pip install -U pip  # upgrade pip (package manager). recommended
./env/bin/pip install -U sotoki  # install/upgrade sotoki inside virtualenv

# direct access to in-virtualenv sotoki binary, without shell-attachment
./env/bin/sotoki --help
# alias or link it for convenience
sudo ln -s "$(pwd)/env/bin/sotoki" /usr/local/bin/

# alternatively, attach virtualenv to shell
source env/bin/activate
sotoki --help
deactivate  # unloads virtualenv from shell
```

## Developers

Anybody is welcome to improve Sotoki.

To run Sotoki off the git repository, you'll need to download a few
external dependencies that are bundled in the Python releases. Just run
`python src/sotoki/dependencies.py`.

See `requirements.txt` for the list of Python dependencies.

## Users

You don't have to make your own ZIM files of Stack Exchange
websites. Updated ZIM files are built on a regular basis for all
of them. See https://library.kiwix.org/?category=stack_exchange
to download them.
