Metadata-Version: 2.1
Name: docx2md
Version: 1.0.0
Summary: convert .docx to .md
Home-page: https://github.com/dogatana/docx2md
Author: Toshihiko Ichida
Author-email: dogatana@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Environment :: Other Environment
Classifier: Intended Audience :: Other Audience
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Text Editors
Classifier: Topic :: Text Editors :: Documentation
Classifier: Topic :: Text Editors :: Word Processors
Classifier: Topic :: Text Processing
Classifier: Topic :: Text Processing :: Markup
Classifier: Topic :: Text Processing :: Markup :: Markdown
Classifier: Topic :: Utilities
Requires-Dist: lxml
Requires-Dist: Pillow

docx2md

=======



Converts Microsoft Word document files (.docx extension) to Markdown

files.



Install

-------



::



   pip install docx2md



How to use

----------



::



   usage: docx2md [-h] [-m] [-v] [--debug] SRC.docx DST.md



   positional arguments:

     SRC.docx        Microsoft Word file to read

     DST.md          Markdown file to write



   optional arguments:

     -h, --help      show this help message and exit

     -m, --md_table  use Markdown table notation instead of <table>

     -v, --version   show version

     --debug         for debug



Tables

~~~~~~



A table is output as ``<table id="table(n)">``. ``id`` is the order of

output, starting from 1.



If ``--md_table`` is specified, the output will use ``|``, but the title

line item will be ``#`` fixed.



::



   | # | # | # |

   |---|---|---|

   |a|b|c|

   |d|e|f|

   |g|h|i|



Pictures

~~~~~~~~



Images will be output as ``<img id="image(n)">``. The ``id`` is output

in order starting from 1.



Elements that can be converted

------------------------------



-  Tables (including merged cells)

-  Lists (also with numbers as bullets)

-  Headings

-  Embedded images

-  Page breaks (converted to ``<div class="break"></div>``)

-  Line breaks within paragraphs (converted to ``<br>``)

-  Text boxes (inserted in the body)



Elements that cannot be converted (only known ones)

---------------------------------------------------



-  Table of Contents

-  Text decoration (bold and etc…)



License

-------



MIT



