Metadata-Version: 2.1
Name: docx-parser-converter
Version: 0.5
Summary: A library for converting DOCX documents to HTML and plain text
Home-page: https://github.com/omer-go/docx-html-txt
Author: Omer Hayun
Author-email: your.email@example.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: alabaster ==0.7.16
Requires-Dist: annotated-types ==0.7.0
Requires-Dist: Babel ==2.15.0
Requires-Dist: beautifulsoup4 ==4.12.3
Requires-Dist: bs4 ==0.0.2
Requires-Dist: certifi ==2024.6.2
Requires-Dist: charset-normalizer ==3.3.2
Requires-Dist: colorama ==0.4.6
Requires-Dist: docutils ==0.20.1
Requires-Dist: idna ==3.7
Requires-Dist: imagesize ==1.4.1
Requires-Dist: Jinja2 ==3.1.4
Requires-Dist: lxml ==5.2.2
Requires-Dist: MarkupSafe ==2.1.5
Requires-Dist: packaging ==24.1
Requires-Dist: pydantic ==2.7.4
Requires-Dist: pydantic-core ==2.18.4
Requires-Dist: Pygments ==2.18.0
Requires-Dist: regex ==2024.5.15
Requires-Dist: requests ==2.32.3
Requires-Dist: snowballstemmer ==2.2.0
Requires-Dist: soupsieve ==2.5
Requires-Dist: Sphinx ==7.3.7
Requires-Dist: sphinx-autodoc-typehints ==2.2.2
Requires-Dist: sphinx-rtd-theme ==2.0.0
Requires-Dist: sphinxcontrib-applehelp ==1.0.8
Requires-Dist: sphinxcontrib-devhelp ==1.0.6
Requires-Dist: sphinxcontrib-htmlhelp ==2.0.5
Requires-Dist: sphinxcontrib-jquery ==4.1
Requires-Dist: sphinxcontrib-jsmath ==1.0.1
Requires-Dist: sphinxcontrib-qthelp ==1.0.7
Requires-Dist: sphinxcontrib-serializinghtml ==1.1.10
Requires-Dist: typing-extensions ==4.12.2
Requires-Dist: urllib3 ==2.2.2

# Docx Parser and Converter 📄✨

A powerful library for converting DOCX documents into HTML and plain text, with detailed parsing of document properties and styles.

## Table of Contents
- [Introduction 🌟](#introduction-)
- [Project Overview 🛠️](#project-overview-)
- [Key Features 🌟](#key-features-)
- [Installation 💾](#installation-)
- [Usage 🚀](#usage-)
- [Quick Start Guide 📖](#quick-start-guide-)
- [Examples 📚](#examples-)
- [API Reference 📜](#api-reference-)

## Introduction 🌟
Welcome to the DOCX-HTML-TXT Converter project! This library converts DOCX documents to HTML and plain text, and uses Pydantic models to extract detailed document properties and styles.

## Project Overview 🛠️
The project is structured to parse DOCX files, convert their content into structured data using Pydantic models, and provide conversion utilities to transform this data into HTML or plain text.

## Key Features 🌟
- Convert DOCX documents to HTML or plain text.
- Parse and extract detailed document properties and styles.
- Structured data representation using Pydantic models.

## Installation 💾
Install the library with pip:

```sh
pip install docx-parser-converter
```
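
To confirm that the installation succeeded, you can query the installed distribution's version. The snippet below is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name and not part of this library, and `importlib.metadata` requires Python 3.8 or newer.

```python
from importlib import metadata  # standard library, Python 3.8+
from typing import Optional


def installed_version(dist_name: str = "docx-parser-converter") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing.

    Hypothetical helper for illustration; not part of docx-parser-converter.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string when the package is installed.
print(installed_version() or "docx-parser-converter is not installed")
```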

## Usage 🚀

### Importing the Library
To start using the library, import the necessary modules:

```python
from docx_parser_converter.docx_to_html import DocxToHtmlConverter
from docx_parser_converter.docx_to_txt import DocxToTxtConverter
from docx_parser_converter.docx_parsers.utils import read_binary_from_file_path
```

### Quick Start Guide 📖
1. **Convert to HTML**:
   ```python
   from docx_parser_converter.docx_to_html import DocxToHtmlConverter
   from docx_parser_converter.docx_parsers.utils import read_binary_from_file_path

   docx_path = "path_to_your_docx_file.docx"
   html_output_path = "output.html"

   # Read the DOCX file's binary content.
   docx_file_content = read_binary_from_file_path(docx_path)

   # Convert the document to HTML and save the result.
   converter = DocxToHtmlConverter(docx_file_content, use_default_values=True)
   html_output = converter.convert_to_html()
   converter.save_html_to_file(html_output, html_output_path)
   ```

2. **Convert to Plain Text**:
   ```python
   from docx_parser_converter.docx_to_txt import DocxToTxtConverter
   from docx_parser_converter.docx_parsers.utils import read_binary_from_file_path

   docx_path = "path_to_your_docx_file.docx"
   txt_output_path = "output.txt"

   # Read the DOCX file's binary content.
   docx_file_content = read_binary_from_file_path(docx_path)

   # Convert the document to plain text and save the result.
   converter = DocxToTxtConverter(docx_file_content, use_default_values=True)
   txt_output = converter.convert_to_txt(indent=True)
   converter.save_txt_to_file(txt_output, txt_output_path)
   ```
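
The same steps can be wrapped in a small helper to convert a whole folder of documents. This is only a sketch: `convert_docx_to_html`, `convert_folder`, and the `convert_one` parameter are hypothetical names introduced here, not part of the library. It uses only the library calls shown in the Quick Start, and assumes `read_binary_from_file_path` and `save_html_to_file` accept string paths as they do there.

```python
from pathlib import Path
from typing import Callable, List


def convert_docx_to_html(docx_path: Path, html_path: Path) -> None:
    """Convert one DOCX file to HTML with the calls from the Quick Start."""
    # Imported lazily so convert_folder can be used with a different converter.
    from docx_parser_converter.docx_to_html import DocxToHtmlConverter
    from docx_parser_converter.docx_parsers.utils import read_binary_from_file_path

    docx_file_content = read_binary_from_file_path(str(docx_path))
    converter = DocxToHtmlConverter(docx_file_content, use_default_values=True)
    html_output = converter.convert_to_html()
    converter.save_html_to_file(html_output, str(html_path))


def convert_folder(
    input_dir: Path,
    output_dir: Path,
    convert_one: Callable[[Path, Path], None] = convert_docx_to_html,
) -> List[Path]:
    """Convert every .docx file directly inside input_dir.

    Returns the list of output paths, sorted by input file name.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for docx_path in sorted(input_dir.glob("*.docx")):
        html_path = output_dir / (docx_path.stem + ".html")
        convert_one(docx_path, html_path)
        written.append(html_path)
    return written
```

A call such as `convert_folder(Path("docs_in"), Path("html_out"))` would then write one `.html` file per input document. Passing the converter as a parameter keeps the folder logic independent of the library, so a plain-text variant built on `DocxToTxtConverter` could be swapped in the same way.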

## Examples 📚

### Original DOCX File
![Original DOCX File in LibreOffice](docs/images/docx-test-1.png)
![Original DOCX File in LibreOffice](docs/images/docx-test-2.png)

### Converted to HTML
![Converted HTML Output](docs/images/docx-to-html-1.png)
![Converted HTML Output](docs/images/docx-to-html-2.png)

### Converted to Plain Text
![Converted TXT Output](docs/images/docx-to-txt.png)


## API Reference 📜

For detailed API documentation, please visit our [Read the Docs page](https://docx-parser-and-converter.readthedocs.io/en/latest/).


Enjoy using DOCX-HTML-TXT Converter! 🚀✨
