Metadata-Version: 2.1
Name: regisoup
Version: 0.1.10
Summary: Convert <figure> and <xr> tags from deprecated MediaWiki extension to a new supported format.
Author: pengu5055
Author-email: urbancmarko1@gmail.com
Requires-Python: >=3.7,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Dist: beautifulsoup4 (>=4.12.0,<5.0.0)
Requires-Dist: lxml (>=4.9.2,<5.0.0)
Requires-Dist: typer[all] (>=0.7.0,<0.8.0)
Description-Content-Type: text/markdown

# **RegiSoup: Just Reg(Ex)ular and Beautiful Soup**
A tool to convert `<figure>` and `<xr>` tags from a deprecated MediaWiki extension to the new supported format specified by the `Figures` extension, available at:
[https://www.mediawiki.org/wiki/Extension:Figures](https://www.mediawiki.org/wiki/Extension:Figures).  

It was originally developed to modify pages in bulk by editing the wiki's `.xml` dump, but that turned out not to be a viable approach: `<text>` blocks inside the
`.xml` dump are hashed with `sha1` and also have their size in bytes specified, so edited text would no longer match the recorded values. It was simpler to create a MediaWiki API wrapper for `RegiSoup`, so the original use case
for `RegiSoup` has moved to [`StewChef`](https://pypi.org/project/stewchef/), while `RegiSoup` remains a standalone package.
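
To illustrate why editing the dump directly is awkward, here is a minimal sketch (not part of `RegiSoup`) that computes the two values that would have to be kept in sync for every modified `<text>` block. The helper names `to_base36` and `text_fingerprint` are hypothetical, and the base-36 encoding of the digest is an assumption about how MediaWiki writes the hash in its dumps; check your own dump before relying on it.

```python
import hashlib


def to_base36(number):
    """Encode a non-negative integer in base 36 (digits 0-9, then a-z)."""
    digits = "0123456789abcdefghijklmnopqrstuvwxyz"
    if number == 0:
        return "0"
    encoded = []
    while number:
        number, remainder = divmod(number, 36)
        encoded.append(digits[remainder])
    return "".join(reversed(encoded))


def text_fingerprint(text):
    """Return (size_in_bytes, sha1_hex, sha1_base36) for a page's wikitext.

    Assumption: the dump records the UTF-8 byte length and a SHA-1 of the
    text, with the hash written in base 36 rather than hexadecimal.
    """
    raw = text.encode("utf-8")
    sha1_hex = hashlib.sha1(raw).hexdigest()
    return len(raw), sha1_hex, to_base36(int(sha1_hex, 16))


if __name__ == "__main__":
    before = "<xr id='fig:example'/>"
    after = "converted reference"  # placeholder text, not real RegiSoup output
    # Both the size and the hash change as soon as the text is edited.
    print(text_fingerprint(before))
    print(text_fingerprint(after))
```

Any tool that rewrites the dump in place would need to recompute both values for each page it touches, which is part of why going through the MediaWiki API is the simpler route.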

## **Usage:**
`RegiSoup` is simple to use. If at any point you need a reminder of what an argument or option means, run:
```
regisoup --help
```
which gathers a lot of information about the tool. Basic usage is as follows:
```
regisoup [input_file] [output_file]
```
`RegiSoup` should find and replace all `<figure>` and `<xr>` tags inside the input page. All the information required for the replacement MediaWiki parser functions is extracted from the existing tags, or at least `RegiSoup` attempts to do so. One issue you may commonly run into is pages that do not specify which page they are referencing. This can be quickly fixed by supplying the page title as an extra argument. For example:
```
regisoup [input_file] [output_file] 'Some Page Title'
```
**Note the use of quotation marks!** Another issue you may run into is improperly specified captions. By default, `RegiSoup` uses captions to generate the replacement parser functions; you can override this by supplying `--no-caption`. Some wiki pages contain unescaped HTML characters, which `RegiSoup` works around with a bit of encoding "magic". If this produces undesirable results, try supplying the `--no-magic` flag and see whether that works better. Additionally, for debugging, `RegiSoup` can be told not to delete its temporary files with `--no-clean-up` and made more verbose with `--verbose`.
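
When converting many pages from a script, it can help to assemble the command line programmatically. The sketch below is not part of `RegiSoup`: `build_regisoup_command` is a hypothetical helper that only builds the argument list from the positional arguments and flags described above, and it assumes the flags can follow the positional arguments.

```python
import shlex


def build_regisoup_command(input_file, output_file, page_title=None,
                           caption=True, magic=True, clean_up=True,
                           verbose=False):
    """Build the argument list for one regisoup call (hypothetical helper)."""
    command = ["regisoup", input_file, output_file]
    if page_title is not None:
        # Passed as a single list element, so no manual quoting is needed.
        command.append(page_title)
    if not caption:
        command.append("--no-caption")
    if not magic:
        command.append("--no-magic")
    if not clean_up:
        command.append("--no-clean-up")
    if verbose:
        command.append("--verbose")
    return command


if __name__ == "__main__":
    command = build_regisoup_command(
        "page.txt", "page_converted.txt", "Some Page Title",
        magic=False, verbose=True,
    )
    # Show the equivalent shell command, with the title safely quoted.
    print(" ".join(shlex.quote(part) for part in command))
```

The resulting list can be handed to `subprocess.run(command, check=True)`; because the page title is a single list element, the shell quoting mentioned above is taken care of automatically.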
