Metadata-Version: 2.1
Name: tex2docx
Version: 1.3.0
Summary: A project to convert LaTeX to DOCX
Author-email: Mingzefei <huagb@qq.com>
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: regex
Requires-Dist: tqdm
Requires-Dist: typer
Provides-Extra: dev
Requires-Dist: ruff; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Requires-Dist: pytest; extra == "dev"

# LaTeX to Word Conversion Tool

![PyPI Downloads](https://img.shields.io/pypi/dm/tex2docx)![PyPI Version](https://img.shields.io/pypi/v/tex2docx)![PyPI License](https://img.shields.io/pypi/l/tex2docx)

[简体中文](./README_zh.md)

In daily work, colleagues or supervisors unfamiliar with LaTeX may request Word documents for review and collaboration. This project provides a Python script that uses Pandoc and Pandoc-Crossref to automatically convert LaTeX files into Word documents following a specified format. Although there's no perfect method for converting LaTeX to Word, the output generated by this project meets informal review needs. However, around 5% of the content (such as author information) may require manual corrections post-conversion.

[Project GitHub](https://github.com/Mingzefei/latex2word)

## Features

- Supports formula conversion
- Automatically numbers and cross-references images, tables, formulas, and citations
- Converts multi-figure LaTeX files
- Outputs Word files in a specified format
- Supports Chinese

Examples are shown below; more results are in [tests](./tests):

<p align="center">
  <img src=".assets/en-word-1.jpg" width="300"/>
  <img src=".assets/en-word-2.jpg" width="300"/>
  <img src=".assets/en-word-3.jpg" width="300"/>
</p>

## Quick Start

Ensure Pandoc and Pandoc-Crossref are correctly installed (see [Install Dependencies](#install-dependencies)). Execute the following command in your terminal:

```shell
tex2docx convert --input-texfile <your_texfile> --output-docxfile <your_docxfile> 
```

Replace `<...>` in the command with the corresponding file path and name.

## Install Dependencies

Ensure you have installed Pandoc, Pandoc-Crossref, and related Python libraries.

### Pandoc

Install Pandoc as described in the [official documentation](https://github.com/jgm/pandoc/blob/main/INSTALL.md). It is recommended to download the latest release from [Pandoc Releases](https://github.com/jgm/pandoc/releases).

### Pandoc-Crossref

Install Pandoc-Crossref by following the [official documentation](https://github.com/lierdakil/pandoc-crossref). Ensure compatibility between Pandoc and Pandoc-Crossref and configure the path correctly.

### Python Libraries

Install from PyPI:

```shell
pip install tex2docx
```

## Usage and Examples

This tool supports both command-line and script-based usage. Ensure the required dependencies are installed.

### Command-Line Usage

Run the following command in your terminal:

```shell
tex2docx convert --input-texfile <your_texfile> --output-docxfile <your_docxfile> --reference-docfile <your_reference_docfile> --bibfile <your_bibfile> --cslfile <your_cslfile>
```

Use `convert --help` to see details on these parameters.

For example, using `tests/en`:

```shell
convert --input-texfile ./tests/en/main.tex --output-docxfile ./tests/en/main_cli.docx --reference-docfile ./my_temp.docx --bibfile ./tests/ref.bib --cslfile ./ieee.csl
```

This will generate the Word file `main_cli.docx` in the `tests/en` directory.

### Script Usage

```python
from tex2docx import LatexToWordConverter

config = {
    'input_texfile': '<your_texfile>',
    'output_docxfile': '<your_docxfile>',
    'reference_docfile': '<your_reference_docfile>',
    'cslfile': '<your_cslfile>',
    'bibfile': '<your_bibfile>',
    'fix_table': True,
    'debug': False
}

converter = LatexToWordConverter(**config)
converter.convert()
```

For more examples, refer to `tests/test_integration.py`.

## FAQ

1. **Inconsistent Multi-Figure Layout**  
   The relative positions of sub-figures may differ between LaTeX compilation and Word conversion, as shown below:

   <p align="center">
     <img src=".assets/raw_multifig_multi-L-charge-equalization.png" width="200"/>
     <img src=".assets/modified_multifig_multi-L-charge-equalization.png" width="200"/>
   </p>

   This may result from redefined page size or parameters in the LaTeX file. To address this, adjust the `MULTIFIG_TEXFILE_TEMPLATE` variable. Below is an example for reference:

   ```python
   import tex2docx

   my_multifig_texfile_template = r"""
   \documentclass[preview,convert,convert={outext=.png,command=\unexpanded{pdftocairo -r 600 -png \infile}}]{standalone}
   \usepackage{graphicx}
   \usepackage{subfig}
   \usepackage{xeCJK}
   \usepackage{geometry}
   \newgeometry{
       top=25.4mm, bottom=33.3mm, left=20mm, right=20mm,
       headsep=10.4mm, headheight=5mm, footskip=7.9mm,
   }
   \graphicspath{{%s}}

   \begin{document}
   \thispagestyle{empty}
   %s
   \end{document}
   """

   config = {
       'input_texfile': 'tests/en/main.tex',
       'output_docxfile': 'tests/en/main.docx',
       'reference_docfile': 'my_temp.docx',
       'cslfile': 'ieee.csl',
       'bibfile': 'tests/ref.bib',
       'multifig_texfile_template': my_multifig_texfile_template,
   }

   converter = tex2docx.LatexToWordConverter(**config)
   converter.convert()
   ```

2. **The Word Output Doesn't Meet Formatting Requirements**  
   Use Word's style management tools to adjust the styles in `my_temp.docx`.

## Implementation Details

This project relies on Pandoc and Pandoc-Crossref to convert LaTeX files to Word documents. The core command used is:

```shell
pandoc texfile -o docxfile \
    --lua-filter resolve_equation_labels.lua \
    --filter pandoc-crossref \
    --reference-doc=temp.docx \
    --number-sections \
    -M autoEqnLabels \
    -M tableEqns \
    -M reference-section-title=Reference \
    --bibliography=ref.bib \
    --citeproc --csl ieee.csl \
    -t docx+native_numbering
```

1. `--lua-filter resolve_equation_labels.lua` handles equation numbering and cross-references, inspired by [Constantin Ahlmann-Eltze's script](https://gist.github.com/const-ae/752ad85c43d92b72865453ea3a77e2dd).
2. `--filter pandoc-crossref` handles cross-references for other elements.
3. `--reference-doc=my_temp.docx` applies the styles from `my_temp.docx` to the generated Word file. Two template files are included: `TIE-temp.docx` (for TIE journal submission, double-column format) and `my_temp.docx` (single-column, designed for easier annotation).
4. `--number-sections` adds numbering to section headings.
5. `-M autoEqnLabels`, `-M tableEqns` enable automatic numbering of equations and tables.
6. `-M reference-section-title=Reference` adds a section title for references.
7. `--bibliography=my_ref.bib` generates the bibliography from `ref.bib`.
8. `--citeproc --csl ieee.csl` formats citations and the bibliography using the IEEE citation style.
9. `-t docx+native_numbering` improves captions for images and tables.

The conversion for multi-figure LaTeX content may not be perfect. This project extracts multi-figure code from the LaTeX file and uses the `convert` and `pdftocairo` tools to compile the figures into a single large PNG file, replacing the original LaTeX image code and updating references to ensure smooth import into Word.

## Known Issues

1. Captions for figures and tables in Chinese still start with "Figure" and "Table".
2. Author information is not fully converted.

## Changelog

### v1.3.0

1. **Major code refactoring**: Complete modular restructuring of the codebase
   - Split monolithic `tex2docx.py` (1139 lines) into 8 specialized modules for better maintainability
   - Introduced clear separation of concerns with dedicated modules for configuration, parsing, conversion, etc.
   - Enhanced type annotations and error handling throughout the codebase

2. **Improved testing infrastructure**: 
   - Renamed and reorganized test files for better clarity:
     - `test_tex2docx_refactored.py` → `test_unit.py` (unit tests for individual components)
     - `test_tex2docx.py` → `test_integration.py` (end-to-end integration tests)
   - Added comprehensive test documentation in `tests/README.md`
   - Enhanced pytest configuration with proper markers and test discovery

3. **Critical bug fixes**:
   - Fixed LaTeX reference line break issue where `\ref{}` commands were incorrectly split as `\nef{}`
   - Resolved CLI import errors with proper module structure
   - Enhanced reference numbering accuracy for tables, figures, and equations

4. **Developer experience improvements**:
   - Better project structure with clear module boundaries
   - Comprehensive documentation updates
   - Cleaner development workflow with organized test suites
   - Preserved all existing functionality while improving code quality

### v1.2.4

1. Add support for `\include` in LaTeX texfiles. (#3)
2. Enhanced the display of figures and tables for better formatting and presentation.

3. Fixed conflicts between `cm` and `varwidth` in tables.
4. Resolved conflict issues between `subfig` and `varwidth`.

### v1.2.3

1. Add feature and option to fix table (issue [#2](https://github.com/Mingzefei/latex2word/issues/2)).

### v1.2.2

1. fix comments bug [#1](https://github.com/Mingzefei/latex2word/issues/1).

### v1.2.1

1. Improved default value settings, including built-in Word style templates and ieee.csl (used as default values).

### v1.2.0

1. Fixed module import issues, improving stability.
2. Enhanced the command-line tool for a more intuitive and efficient user experience.
3. Switched to `pyproject.toml` for dependency management, replacing `setup.py`.
4. Released on PyPI; users can install via `pip install tex2docx`.

## Miscellaneous

There are two kinds of people: those who use LaTeX and those who don't. The latter often ask the former for a Word version. Hence, the following command:

```bash
pandoc input.tex -o output.docx\
  --filter pandoc-crossref \
  --reference-doc=my_temp.docx \
  --number-sections \
  -M autoEqnLabels -M tableEqns \
  -M reference-section-title=Reference \
  --bibliography=my_ref.bib \
  --citeproc --csl ieee.csl \
  -t docx+native_numbering
```
