Metadata-Version: 2.0
Name: mwxml
Version: 0.3.1
Summary: A set of utilities for processing MediaWiki XML dump data.
Home-page: https://github.com/mediawiki-utilities/python-mwxml
Author: Aaron Halfaker
Author-email: aaron.halfaker@gmail.com
License: MIT
Platform: UNKNOWN
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Environment :: Other Environment
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Text Processing :: General
Classifier: Topic :: Utilities
Classifier: Topic :: Scientific/Engineering
Requires-Dist: jsonschema (>=2.5.1)
Requires-Dist: mwcli (>=0.0.2)
Requires-Dist: mwtypes (>=0.1.1)
Requires-Dist: para (>=0.0.1)

# MediaWiki XML

This library contains a collection of utilities for efficiently
processing MediaWiki’s XML database dumps. It addresses two
concerns: the complexity and the performance of streaming XML
parsing. The library enables memory-efficient stream processing of
XML dumps with a simple
[`iterator`](https://pythonhosted.org/mwxml/iteration.html)
strategy. It also implements a distributed processing strategy (see
[`map()`](https://pythonhosted.org/mwxml/map.html)) that enables
parallel processing of many XML dump files at the same time.

* **Installation:** ``pip install mwxml``
* **Documentation:** https://pythonhosted.org/mwxml
* **Repository:** https://github.com/mediawiki-utilities/python-mwxml
* **License:** MIT

## Example

    >>> import mwxml
    >>>
    >>> dump = mwxml.Dump.from_file(open("dump.xml"))
    >>> print(dump.site_info.name, dump.site_info.dbname)
    Wikipedia enwiki
    >>>
    >>> for page in dump:
    ...     for revision in page:
    ...         print(revision.id)
    ...
    1
    2
    3
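
To illustrate why an iterator strategy keeps memory use low, here is a minimal sketch of streaming parsing that uses only the standard library's `xml.etree.ElementTree.iterparse`. It is not how `mwxml` is implemented; the helper names `local_name` and `iter_revision_ids` and the tiny `SAMPLE` document are hypothetical and exist only for this illustration. The idea is to yield each revision as soon as its closing tag is seen and then discard the parsed subtree, instead of loading the whole dump first.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}page'."""
    return tag.rsplit("}", 1)[-1]


def iter_revision_ids(xml_file):
    """Yield (page_title, revision_id) pairs one at a time (hypothetical helper)."""
    title = None
    # "end" events fire when an element's closing tag has been parsed.
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            # Look only at direct children so that <contributor><id> is ignored.
            rev_id = next(
                (child.text for child in elem if local_name(child.tag) == "id"),
                None,
            )
            if rev_id is not None:
                yield title, int(rev_id)
            # Drop the revision's children now that it has been handled.
            elem.clear()
        elif name == "page":
            elem.clear()
            title = None


# A tiny stand-in for a dump file (assumed structure, for illustration only).
SAMPLE = b"""<mediawiki>
  <page>
    <title>Foo</title>
    <id>10</id>
    <revision><id>1</id><contributor><id>99</id></contributor></revision>
    <revision><id>2</id></revision>
  </page>
  <page>
    <title>Bar</title>
    <id>11</id>
    <revision><id>3</id></revision>
  </page>
</mediawiki>"""

for page_title, revision_id in iter_revision_ids(io.BytesIO(SAMPLE)):
    print(page_title, revision_id)
```

Because `iter_revision_ids` is a generator, the caller decides how fast to consume the file, which is the same shape as the `for page in dump` loop in the example above.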

## Author
* Aaron Halfaker -- https://github.com/halfak

## See also
* http://dumps.wikimedia.org/
* http://community.wikia.com/wiki/Help:Database_download


