Metadata-Version: 2.1
Name: nxml2txt
Version: 0.1.2
Summary: XML formatted full-text articles to text format conversion. Standoff annotations for text and entities. Originally written by Sampo Pyysalo: https://github.com/spyysalo/nxml2txt
License: MIT
Author: Gully Burns
Author-email: gully.burns@chanzuckerberg.com
Requires-Python: >=3.11,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: beautifulsoup4 (>=5.3.0,<6.0.0)
Requires-Dist: lxml (==4.9.4)
Requires-Dist: pandas (>=2.2.2,<3.0.0)
Requires-Dist: requests (>=2.32.3,<3.0.0)
Description-Content-Type: text/markdown

nxml2txt
========

Converts NLM .nxml full-text articles to plain text with standoff annotations.

Usage:

    ./nxml2txt NXMLFILE [TEXTFILE] [SOFILE]

For example (using the bundled test document):

    ./nxml2txt test/PMC3357053.nxml test/PMC3357053.txt test/PMC3357053.so

This creates the files `test/PMC3357053.txt`, containing the text
content of the input document, and `test/PMC3357053.so`, containing
the annotations (XML elements and their attributes) in a simple
standoff format.
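
The same invocation could be driven from Python. The following is a minimal sketch, not part of nxml2txt itself: `build_command` and `convert` are hypothetical helper names, and it assumes the `./nxml2txt` script is run from the repository root with the positional arguments shown in the usage line.

```python
import subprocess
from pathlib import Path


def build_command(nxml_path, script="./nxml2txt"):
    """Return the argument list for one nxml2txt run.

    Output names are derived from the input by swapping the suffix
    (an assumption that mirrors the naming used in the example).
    """
    nxml = Path(nxml_path)
    return [
        script,
        str(nxml),
        str(nxml.with_suffix(".txt")),  # TEXTFILE
        str(nxml.with_suffix(".so")),   # SOFILE
    ]


def convert(nxml_path, script="./nxml2txt"):
    """Run the conversion and return the (text, standoff) output paths."""
    cmd = build_command(nxml_path, script)
    # check=True raises CalledProcessError if the script exits non-zero.
    subprocess.run(cmd, check=True)
    return Path(cmd[2]), Path(cmd[3])
```

Calling `convert("test/PMC3357053.nxml")` would then run the command from the example and return the two output paths.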

nxml2txt assumes a unix-like environment.
If the input .nxml file contains embedded TeX-math, nxml2txt
requires [LaTeX](http://en.wikipedia.org/wiki/LaTeX) and
[catdvi](http://catdvi.sourceforge.net/).
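
Before converting documents with embedded TeX math, it may help to confirm that both tools are installed. The sketch below is only an illustration: `missing_tex_tools` is a hypothetical helper, and the executable names `latex` and `catdvi` are assumed to be the ones found on `PATH`.

```python
import shutil


def missing_tex_tools(tools=("latex", "catdvi")):
    """Return the names of the given executables that are not on PATH."""
    # shutil.which returns None when no matching executable is found.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tex_tools()
    if missing:
        print("Missing tools for TeX math conversion:", ", ".join(missing))
    else:
        print("latex and catdvi were found on PATH.")
```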

This tool was originally introduced as part of the BioNLP Shared Task
2011 supporting resources
(https://github.com/ninjin/bionlp_st_2011_supporting).

