Metadata-Version: 2.1
Name: rst2json
Version: 0.2.0
Summary: Split rendered reStructuredText into JSON
Home-page: https://github.com/jwodder/rst2json
Author: John Thorvald Wodder II
Author-email: rst2json@varonathe.org
License: MIT
Project-URL: Source Code, https://github.com/jwodder/rst2json
Project-URL: Bug Tracker, https://github.com/jwodder/rst2json/issues
Keywords: JSON,docutils,reStructuredText,template,templating
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: License :: OSI Approved :: MIT License
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Topic :: Documentation
Classifier: Topic :: Software Development :: Documentation
Classifier: Topic :: Text Processing
Classifier: Topic :: Text Processing :: Markup :: HTML
Classifier: Topic :: Text Processing :: Markup :: LaTeX
Requires-Python: ~=3.6
Description-Content-Type: text/x-rst
Requires-Dist: docutils (>=0.16)

.. image:: http://www.repostatus.org/badges/latest/active.svg
    :target: http://www.repostatus.org/#active
    :alt: Project Status: Active — The project has reached a stable, usable
          state and is being actively developed.

.. image:: https://travis-ci.com/jwodder/rst2json.svg?branch=master
    :target: https://travis-ci.com/jwodder/rst2json

.. image:: https://codecov.io/gh/jwodder/rst2json/branch/master/graph/badge.svg
    :target: https://codecov.io/gh/jwodder/rst2json

.. image:: https://img.shields.io/pypi/pyversions/rst2json.svg
    :target: https://pypi.org/project/rst2json/

.. image:: https://img.shields.io/github/license/jwodder/rst2json.svg
    :target: https://opensource.org/licenses/MIT
    :alt: MIT License

`GitHub <https://github.com/jwodder/rst2json>`_
| `PyPI <https://pypi.org/project/rst2json/>`_
| `Issues <https://github.com/jwodder/rst2json/issues>`_

.. contents::
    :backlinks: top

``rst2json`` renders a reStructuredText_ document as HTML or (Xe)LaTeX, but —
unlike Docutils_' built-in converters, which produce a complete output document
— it outputs a JSON object in which the document "frontmatter" (title,
subtitle, bibliographic fields, etc.) has been broken out from the body and
split into multiple fields.  By combining the output with a templating system
like Jinja_, the user can perform more powerful & more customizable templating
than is possible with Docutils' built-in template support.

Sample templates that make use of the library's output can be found in
|exampledir|_.

.. _reStructuredText: https://docutils.sourceforge.io/rst.html
.. _Docutils: https://docutils.sourceforge.io/
.. _Jinja: https://palletsprojects.com/p/jinja/

.. |exampledir| replace:: the repository's ``examples/`` directory
.. _exampledir: https://github.com/jwodder/rst2json/tree/master/examples


Installation
============
``rst2json`` requires Python 3.6 or higher.  Just use `pip
<https://pip.pypa.io>`_ for Python 3 (you have pip, right?) to install
``rst2json`` and its dependencies::

    python3 -m pip install rst2json


Command-Line Usage
==================

``rst2json`` provides a single command, also named ``rst2json``, that converts
an input reStructuredText document to markup organized into a JSON object::

    rst2json [--format <FMT>] [<docutils options>] [<infile> [<outfile>]]

The target markup format is specified with the ``-f`` or ``--format`` option.
Valid values (case insensitive) are:

``html`` (default)
   Alias for ``html4``.  When Docutils eventually changes ``rst2html.py`` to
   produce HTML 5 output instead of HTML 4, this alias will likewise update to
   point to ``html5``.

``html4``
   HTML 4 / XHTML 1.0 output, based on the Docutils writer used for
   ``rst2html4.py``.  A CSS stylesheet (such as the ``html4css1.css``
   stylesheet distributed with Docutils) must be added to the final document in
   order for everything to render properly.

``html5``
   HTML 5 output, based on the Docutils writer used for ``rst2html5.py``.  A
   CSS stylesheet (such as the ``minimal.css`` and ``plain.css`` stylesheets
   distributed with Docutils) must be added to the final document in order for
   everything to render properly.

``latex``
   LaTeX output, based on the Docutils writer used for ``rst2latex.py``

``latex2e``
   Alias for ``latex``

``xetex``
   Alias for ``xelatex``

``xelatex``
   XeLaTeX output, based on the Docutils writer used for ``rst2xetex.py``

In addition to the ``--format`` option, ``rst2json`` accepts all options that
Docutils' ``rst2html4.py``, ``rst2html5.py``, ``rst2latex.py``, and
``rst2xetex.py`` commands accept, and it can also be configured via a `Docutils
configuration file <https://docutils.sourceforge.io/docs/user/config.html>`_
the same way as the respective Docutils commands.
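
As a rough sketch, the command can also be driven from a Python script with
``subprocess``.  The helper names ``build_command()`` and ``convert()`` are
hypothetical; only the ``--format`` option and the ``<infile>``/``<outfile>``
arguments are taken from the synopsis above.

.. code:: python

    import subprocess


    def build_command(infile, outfile, fmt="html"):
        """Build an argument list that follows the synopsis shown above."""
        return ["rst2json", "--format", fmt, infile, outfile]


    def convert(infile, outfile, fmt="html"):
        """Run ``rst2json``; a non-zero exit raises CalledProcessError."""
        subprocess.run(build_command(infile, outfile, fmt), check=True)

Calling ``convert("input.rst", "output.json", fmt="html5")`` assumes that the
``rst2json`` command is on ``PATH``; the file names are placeholders.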

``rst2json`` ignores the following Docutils configuration options, as they have
no effect on its operation:

- ``documentclass``
- ``documentoptions``
- ``latex_preamble``
- ``stylesheet_path``
- ``stylesheet``
- ``template``
- ``use_latex_abstract``
- ``use_latex_docinfo``

In addition, the ``embed_stylesheet`` and ``stylesheet_dirs`` options only have
an effect when emitting HTML with ``math_output`` set to ``html`` with a
stylesheet argument.


Library Usage
=============

Convenience Function
--------------------

*New in version 0.2.0*

.. code:: python

    rst2json.core.rst2json(source, format='html', options=None, config_files=None, destination_path=None)

``rst2json`` provides a ``rst2json.core.rst2json()`` function for rendering &
splitting reStructuredText into a ``dict`` directly within Python.

``source`` specifies the input reStructuredText markup.  It can be a path to a
file (a string), a file-like object (with ``read()`` and ``close()`` methods),
or a path-like object.

``format`` is a string specifying the markup format to produce.  It has the
same set of possible values as the ``--format`` option to the ``rst2json``
command.  Alternatively, it may be set directly to a Docutils writer class.

``options`` sets values for Docutils settings.  When non-``None``, it must be a
``dict`` that maps option names to option values.  Option names must be given
as listed at <https://docutils.sourceforge.io/docs/user/config.html>, i.e., no
leading hyphens, with internal hyphens replaced with underscores.  Option
values must be of the appropriate Python type, e.g., ``bool`` for on/off
switches or ``List[str]`` for comma-separated values.

``config_files`` is a list of file paths specifying the Docutils configuration
files to read from; if ``None``, configuration is read from the files specified
in the ``DOCUTILSCONFIG`` environment variable, or from the standard
configuration files if that is not set.  Settings in configuration files
override any conflicting settings given in ``options``.

``destination_path`` is a path to a file (which need not exist); stylesheet
paths in HTML ``<link>`` tags are rewritten relative to it.  If ``None``, the
paths are rewritten relative to the current directory.  This
parameter is only relevant when emitting HTML with ``math_output`` set to
``html`` with a stylesheet argument.

.. Will also be relevant if stylesheet_path links are ever captured
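
A minimal sketch of calling the function, assuming ``rst2json`` is installed.
The sample document, the temporary file name, and the choice of
``doctitle_xform`` as the option to set are illustrative only.

.. code:: python

    import tempfile
    from pathlib import Path

    from rst2json.core import rst2json

    SAMPLE = """\
    War & Peace
    ===========

    A very short summary.
    """

    with tempfile.TemporaryDirectory() as tmpdir:
        path = Path(tmpdir) / "sample.rst"
        path.write_text(SAMPLE, encoding="utf-8")
        # ``source`` is given here as a path string.  ``options`` uses
        # Docutils setting names (underscores) with Python-typed values.
        result = rst2json(
            str(path),
            format="html5",
            options={"doctitle_xform": True},
        )

    # The return value is a dict laid out as described under
    # "JSON Output Structure".
    print(result["meta"]["format"])
    print(result["content"]["title"])

Because ``config_files`` is left as ``None``, any Docutils configuration files
found on the system are still read and may override ``options``.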


Docutils Writers
----------------

The actual rendering & conversion to JSON is done by custom Writer classes
inheriting from Docutils' built-in Writers.  Users familiar with Docutils can
use these Writers directly in combination with other Docutils machinery.

The ``rst2json.writers.get_json_writer_class()`` function can be used to
retrieve a specific Writer class by case-insensitive name.  The classes and
their names are as follows:

======================  ===================================
Names                   Class
======================  ===================================
``html``, ``html4``     ``rst2json.writers.html4.Writer``
``html5``               ``rst2json.writers.html5.Writer``
``latex``, ``latex2e``  ``rst2json.writers.latex.Writer``
``xelatex``, ``xetex``  ``rst2json.writers.xelatex.Writer``
======================  ===================================

Each Writer's ``translate()`` method sets ``writer.json_data`` to the final
JSON structure as a ``dict`` and sets ``writer.output`` to ``json_data`` as a
JSON-serialized string.  Once ``assemble_parts()`` has been called,
``writer.parts["json_data"]`` will also equal the JSON ``dict``.
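
One way to combine a Writer with standard Docutils machinery is sketched
below.  It assumes that ``docutils.core.publish_parts()`` can be handed an
instance of the Writer class like any other Docutils writer; the sample source
text is made up for illustration.

.. code:: python

    from docutils.core import publish_parts

    from rst2json.writers import get_json_writer_class

    SOURCE = """\
    Example Title
    =============

    Some *body* text.
    """

    # Look up the Writer class by name and pass an instance to Docutils.
    writer_class = get_json_writer_class("html5")
    parts = publish_parts(source=SOURCE, writer=writer_class())

    # ``publish_parts()`` returns the Writer's ``parts`` dict, which is
    # filled in by ``assemble_parts()``.
    data = parts["json_data"]
    print(sorted(data))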


JSON Output Structure
=====================

The description below divides strings into the following types:

- A *rendered string* is a string containing markup in the target format (HTML
  or LaTeX).  Rendered strings are stripped of leading & trailing newlines.

- A *stripped string* is a string in which characters with special meaning to
  the format are escaped but all other markup has been removed; in addition,
  in stripped strings with corresponding rendered strings, newlines and tabs
  are replaced with space characters.

  For example, under HTML, if ``content.title`` is ``"<i>War &amp;
  Peace</i>"``, then ``content.title_stripped`` would be ``"War &amp; Peace"``.

- Unqualified (neither rendered nor stripped) strings are expected to never
  contain any special characters.
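
The practical consequence for template authors can be sketched as follows,
using the example values above (they are illustrative, not real output).
Because both kinds of string are already escaped for the target format, they
should be inserted into markup without a second round of escaping.

.. code:: python

    import html

    # Illustrative values, copied from the example above.
    title = "<i>War &amp; Peace</i>"     # rendered string
    title_stripped = "War &amp; Peace"   # stripped string

    # Both strings can be placed into HTML markup as they are.
    heading = f"<h1>{title}</h1>"
    title_tag = f"<title>{title_stripped}</title>"

    # Escaping a stripped string again would corrupt the ampersand.
    double_escaped = html.escape(title_stripped)

    # For a plain-text context, undo the escaping instead.
    plain = html.unescape(title_stripped)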

----

The output from ``rst2json`` is a JSON object containing the following fields:

``content`` : object
   The input document converted to the target format and broken up into the
   following fields:

   ``title`` : rendered string or ``null``
      The document title, derived from a lone top-level section title if
      |doctitle_xform|_ is enabled, or ``null`` if no title was specified or
      ``doctitle_xform`` was not enabled.

   ``subtitle`` : rendered string or ``null``
      The document subtitle, derived from a lone second-level section title
      after the document title if |doctitle_xform|_ is enabled, or ``null`` if
      no subtitle was specified or ``doctitle_xform`` was not enabled.

   ``title_stripped`` : stripped string or ``null``
      The ``title`` field, but with non-escaping markup removed.  This field
      can be used to populate an HTML document's ``<title>`` tag.

   ``subtitle_stripped`` : stripped string or ``null``
      The ``subtitle`` field, but with non-escaping markup removed.  This field
      can be used to populate an HTML document's ``<title>`` tag.

   ``document_ids`` : list of strings
      A list of all IDs assigned to the parsed ``document`` node.  Such IDs
      should be attached to the topmost or near-topmost structure of the final
      templated document using HTML's ``id`` attribute or (Xe)LaTeX's
      ``\label`` command.

   ``subtitle_ids`` : list of strings
      A list of all IDs assigned to the document subtitle, or the empty list if
      the document does not have a subtitle.  Such IDs should be attached to
      the templated subtitle using HTML's ``id`` attribute or (Xe)LaTeX's
      ``\label`` command.

      (As far as I can determine, it is not possible for a reStructuredText
      document to produce a doctree in which the ``title`` node has any IDs;
      the output from ``rst2json`` thus does not include a
      ``content.title_ids`` field.)

   ``authors`` : list of rendered strings
      A list of all authors specified in the ``:Author:`` and/or ``:Authors:``
      `bibliographic fields`_, in the order that they appear in the input.

   ``header`` : rendered string or ``null``
      The rendered contents of the ``header::`` directive from the document, or
      ``null`` if there was no such directive.  Markup for using it as a header
      is not included.

   ``footer`` : rendered string or ``null``
      The rendered contents of the ``footer::`` directive from the document, or
      ``null`` if there was no such directive.  Markup for using it as a footer
      is not included.

   ``docinfo`` : list of objects
      The document's `bibliographic fields`_ (excluding the dedication &
      abstract) in the order that they appear in the input, each one
      represented as an object with the following fields:

      ``type`` : string
         For registered fields, this is the name of the Docutils node class
         that represents the field — i.e., the English name of the field in
         lowercase (e.g., ``"author"``).  For unregistered fields, this is the
         string ``"field"``.

      ``name`` : rendered string
         For registered fields, this is the name of the field in the document's
         language (e.g., ``"Author"``).  For unregistered fields, this is the
         name of the field as it appears in the input.

      ``value``
         When ``type`` is ``"authors"`` (plural), this is a list of author
         names as rendered strings.  For all other values of ``type``, this is
         a rendered string.

         Note that, when ``type`` is ``"address"``, whitespace in ``value`` is
         significant, and ``value`` should be wrapped in ``<pre>`` tags or
         similar.

      ``value_stripped``
         The ``value`` field, but with non-escaping markup removed.  When
         ``type`` is ``"authors"`` (plural), this is a list of stripped
         strings.  For all other values of ``type``, this is a stripped string.
         This field can be used to populate the ``content`` attribute of a
         ``<meta>`` tag.

      ``classes`` : list of strings
         A list of classes attached to the field.  Normally, for registered
         fields, this list is empty, while for unregistered fields it contains
         a single element equal to the field name converted to a valid class
         token.  This field can be used to set the CSS class of the HTML
         structure that contains the rendered field.

   ``abstract`` : rendered string or ``null``
      The rendered contents of the document's ``:Abstract:`` field, or ``null``
      if there was no such field.  The abstract title and enclosing block are
      not included.

   ``dedication`` : rendered string or ``null``
      The rendered contents of the document's ``:Dedication:`` field, or
      ``null`` if there was no such field.  The dedication title and enclosing
      block are not included.

   ``body`` : rendered string
      The rendered contents of the rest of the document.

``meta`` : object
   A dictionary of data about the input document and the ``rst2json`` process,
   containing the following fields:

   ``format`` : string
      The name of the target markup format: ``"html4"``, ``"html5"``,
      ``"latex"``, or ``"xelatex"``.

   ``title`` : stripped string or ``null``
      The document's metadata title.  By default, this is equal to
      ``content.title_stripped``, but it can be overridden by a ``title::``
      directive or the ``title`` configuration option.  If none of these are
      set, the field is ``null``.

      Note that, if the title is set via the ``title::`` directive or ``title``
      configuration option, any reStructuredText markup in it will not be
      processed (though characters special to the output format will still be
      escaped).  For example, including ``.. title:: *War & Peace*`` in your
      input document will (when outputting HTML) produce a ``meta.title`` value
      of ``"*War &amp; Peace*"``, with the asterisks left as-is and the
      ampersand escaped.

   ``source`` : stripped string or ``null``
      The name of/path to the input file, or ``null`` if no name can be
      determined

   ``language`` : string
      The language code for the document language, as set via the
      |language_code|_ configuration option

   ``docutils_version`` : string
      The version of Docutils used to produce the output

   ``rst2json_version`` : string
      The version of ``rst2json`` used to produce the output

   ``generator`` : stripped string
      A string of the form ``"rst2json {version} ({url}), Docutils {version}
      ({url})"``

``html`` : object
   A dictionary of strings to insert in the head of the final HTML document.
   This object only appears in the output when the target format is HTML4 or
   HTML5.  The fields of the dictionary are as follows:

   ``math_requires`` : rendered string
      If the input document contains any ``math::`` directives or ``:math:``
      roles, this is a string containing the appropriate markup to add to the
      HTML document head in order to support them; if there are no such
      directives or roles, this is the empty string.

      When set, the value of this field is determined by the |math_output|_
      configuration option.  When set to ``html``, it is either a ``<link>``
      tag or a ``<style>`` block (as determined by the |embed_stylesheet|_
      configuration option) enabling the stylesheet passed as the option
      argument; when set to ``mathjax``, it is a ``<script>`` tag pointing to
      the path or URL passed as the option argument.  When ``math_output`` is
      ``mathml`` or ``latex``, the ``math_requires`` field is the empty string,
      as nothing needs to be added to the HTML document.

   ``meta_tags`` : rendered string
      A string containing any & all ``<meta>`` tags added to the document with
      the ``meta::`` directive.  If no ``meta::`` directives were given, this
      is the empty string.

``latex`` : object
   A dictionary of strings to insert in the preamble of the final (Xe)LaTeX
   document.  This object only appears in the output when the target format is
   LaTeX or XeLaTeX.  The fields of the dictionary are as follows:

   ``language`` : string
      The name of the document language (set via the |language_code|_
      configuration option) in a form recognized by Babel.  If Docutils does
      not recognize the language, this will be the empty string.  Note that,
      when the language is not English, ``latex.requirements`` will already
      contain the appropriate ``\usepackage[LANGUAGE]{babel}`` command; the
      purpose of this field is to be able to set the language in the document
      options.

   ``requirements`` : rendered string
      Required packages and setup, mostly consisting of ``\usepackage``
      commands needed for the markup in ``content.body``.  In a templated
      (Xe)LaTeX document, this should be placed near the beginning of the
      preamble.

   ``fallbacks`` : rendered string
      Fallback definitions (declared with ``\providecommand*``) for the various
      custom commands that Docutils uses in the body.  These definitions can be
      overridden by defining commands of the same name in the preamble before
      ``latex.fallbacks`` occurs.  In a templated (Xe)LaTeX document, this
      should be placed after ``latex.requirements`` and after any custom
      preamble commands.

   ``pdfsetup`` : rendered string
      Inclusion & setup of the ``hyperref`` package.  In a templated (Xe)LaTeX
      document, this should be placed at the end of the preamble.

``system_messages`` : list of objects
   A list of system messages generated during processing of the input document.
   Normally, system messages are embedded in the output in addition to being
   reported to stderr, but ``rst2json`` removes them from the body and places
   them in this list.  Each system message is represented as an object with the
   following fields:

   ``level`` : integer
      The system message level as an integer from 0 (least severe) through 4
      (most severe)

   ``type`` : string
      The name of the system message level.  The names and corresponding
      integer values of the system message levels are as follows:

      ===========  =========
      ``type``     ``level``
      ===========  =========
      ``DEBUG``    0
      ``INFO``     1
      ``WARNING``  2
      ``ERROR``    3
      ``SEVERE``   4
      ===========  =========

   ``source`` : stripped string or ``null``
      The name of the input file in which the message was generated, or
      ``null`` if it cannot be determined

   ``line`` : integer or ``null``
      The line of the input file at which the message was generated, or
      ``null`` if it cannot be determined

   ``body`` : rendered string
      The message itself

   ``ids`` : list of strings
      The IDs of the ``system_message`` node.  If the parsed document tree
      contains a ``problematic`` node enclosing the markup that generated the
      system message, the rendered ``problematic`` node will link to this
      system message by targeting an ID in ``ids``.

      If the system message is included in the templated document, the IDs
      should be attached to the structure using HTML's ``id`` attribute or
      (Xe)LaTeX's ``\label`` command.

   ``backrefs`` : list of strings
      If the parsed document tree contains a ``problematic`` node enclosing the
      markup that generated the system message, ``backrefs`` will contain the
      rendered ``problematic`` node's IDs, usable for creating an
      intra-document link.
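
As an illustration of how the ``content``, ``meta``, and ``html`` fields fit
together, the following sketch assembles a bare-bones HTML page using only the
standard library.  The function name ``build_html_page()``, the page layout,
and the sample data are assumptions made for this example; a real template
would also add a stylesheet.

.. code:: python

    def build_html_page(data):
        """Assemble a bare-bones HTML page from an rst2json output dict."""
        content, meta, extras = data["content"], data["meta"], data["html"]
        head = ['<meta charset="utf-8">']
        # ``meta.title`` is a stripped string: already escaped, no markup.
        if meta["title"] is not None:
            head.append(f"<title>{meta['title']}</title>")
        # Both ``html`` fields are the empty string when they do not apply.
        head.extend(
            s for s in (extras["meta_tags"], extras["math_requires"]) if s
        )
        # Attach the first document ID (if any) to the outermost wrapper.
        ids = content["document_ids"]
        id_attr = f' id="{ids[0]}"' if ids else ""
        body = []
        if content["title"] is not None:
            body.append(f"<h1>{content['title']}</h1>")
        body.append(content["body"])
        return "\n".join(
            ["<!DOCTYPE html>", f'<html lang="{meta["language"]}">', "<head>"]
            + head
            + ["</head>", "<body>", f"<div{id_attr}>"]
            + body
            + ["</div>", "</body>", "</html>"]
        )


    # Hypothetical data containing only the fields the function reads.
    sample = {
        "content": {
            "title": "<i>War &amp; Peace</i>",
            "document_ids": ["war-peace"],
            "body": "<p>Body text.</p>",
        },
        "meta": {"title": "War &amp; Peace", "language": "en"},
        "html": {"meta_tags": "", "math_requires": ""},
    }
    page = build_html_page(sample)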

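The placement rules for the ``latex`` fields can be sketched in the same way.
The function name ``build_latex_document()``, the ``article`` document class,
and the sample strings are assumptions made for this example; only the
ordering of the pieces follows the field descriptions above.

.. code:: python

    def build_latex_document(data, custom_preamble=""):
        """Assemble a (Xe)LaTeX document in the order described above."""
        latex = data["latex"]
        # ``latex.language`` is empty if Docutils does not know the language.
        options = f"[{latex['language']}]" if latex["language"] else ""
        pieces = [
            f"\\documentclass{options}{{article}}",  # class is an assumption
            latex["requirements"],  # near the beginning of the preamble
            custom_preamble,        # custom commands precede the fallbacks
            latex["fallbacks"],     # after requirements and custom commands
            latex["pdfsetup"],      # hyperref setup ends the preamble
            "\\begin{document}",
            data["content"]["body"],
            "\\end{document}",
        ]
        # Skip pieces that are the empty string.
        return "\n".join(piece for piece in pieces if piece) + "\n"


    # Hypothetical data containing only the fields the function reads.
    sample = {
        "content": {"body": "Body text."},
        "latex": {
            "language": "english",
            "requirements": "\\usepackage[T1]{fontenc}",
            "fallbacks": "% fallback definitions would appear here",
            "pdfsetup": "\\usepackage{hyperref}",
        },
    }
    document = build_latex_document(
        sample, "\\newcommand{\\projectname}{Example}"
    )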

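Since system messages are removed from the body, a caller may want to report
them separately.  The sketch below filters the ``system_messages`` list by
``level``; the function name ``format_messages()``, the output format, and the
sample messages are made up for illustration.

.. code:: python

    def format_messages(data, min_level=2):
        """List system messages at or above ``min_level`` (2 is WARNING)."""
        lines = []
        for msg in data["system_messages"]:
            if msg["level"] < min_level:
                continue
            # ``source`` and ``line`` are None when they are not known.
            source = "<unknown>" if msg["source"] is None else msg["source"]
            line = "?" if msg["line"] is None else msg["line"]
            # ``body`` is a rendered string, so it still contains markup.
            lines.append(f"{source}:{line}: {msg['type']}: {msg['body']}")
        return lines


    # Hypothetical messages shaped like the objects described above.
    sample = {
        "system_messages": [
            {
                "level": 1,
                "type": "INFO",
                "source": "doc.rst",
                "line": 3,
                "body": "<p>An informational note.</p>",
                "ids": [],
                "backrefs": [],
            },
            {
                "level": 3,
                "type": "ERROR",
                "source": None,
                "line": None,
                "body": "<p>Unknown directive.</p>",
                "ids": ["id1"],
                "backrefs": ["id2"],
            },
        ]
    }
    report = format_messages(sample)
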
.. |doctitle_xform| replace:: ``doctitle_xform``
.. _doctitle_xform: https://docutils.sourceforge.io/docs/user/config.html#doctitle-xform

.. _bibliographic fields: https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#bibliographic-fields

.. |math_output| replace:: ``math_output``
.. _math_output: https://docutils.sourceforge.io/docs/user/config.html#math-output

.. |embed_stylesheet| replace:: ``embed_stylesheet``
.. _embed_stylesheet: https://docutils.sourceforge.io/docs/user/config.html#embed-stylesheet

.. |language_code| replace:: ``language_code``
.. _language_code: https://docutils.sourceforge.io/docs/user/config.html#language-code


