Metadata-Version: 2.4
Name: docx2everything
Version: 1.0.0
Summary: A pure-Python utility to extract and convert DOCX files to various formats, including plain text and Markdown
Author: sudipnext
Maintainer: sudipnext
License: MIT
Project-URL: Homepage, https://parajulisudip.com.np
Project-URL: Repository, https://github.com/sudipnext/docx2everything
Keywords: python,docx,text,markdown,convert,extract
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Text Processing :: Markup
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file
Dynamic: requires-python

# docx2everything

Convert DOCX files to plain text or markdown format with preserved structure.

## Installation

```bash
pip install docx2everything
```

Or install from source:

```bash
# Modern way (recommended)
pip install .

# Or using setup.py (deprecated but still works)
python setup.py install
```

## Testing Without Installation

The CLI script in `bin/` runs directly from a source checkout; no installation or `PYTHONPATH` setup is required.

**Using CLI (no installation required):**
```bash
# Extract text
python3 bin/docx2everything demo.docx

# Convert to markdown
python3 bin/docx2everything --markdown demo.docx > output.md

# With images
python3 bin/docx2everything --markdown -i images/ demo.docx > output.md
```

**Using Python:**
```bash
# Set PYTHONPATH to current directory
PYTHONPATH=. python3 -c "import docx2everything; print(docx2everything.process('demo.docx')[:100])"
```

**In Python script:**
```python
import sys
sys.path.insert(0, '/path/to/docx2everything')  # path to your source checkout

import docx2everything
text = docx2everything.process('document.docx')
```

## Usage

### Command Line

**Extract plain text:**
```bash
docx2everything document.docx
```

**Convert to markdown:**
```bash
docx2everything --markdown document.docx > output.md
```

**Extract images:**
```bash
docx2everything -i images/ document.docx
```

**Markdown with images:**
```bash
docx2everything --markdown -i images/ document.docx > output.md
```

### Python API

```python
import docx2everything

# Extract plain text
text = docx2everything.process("document.docx")

# Convert to markdown
markdown = docx2everything.process_to_markdown("document.docx")

# Extract plain text and save embedded images to images/
text = docx2everything.process("document.docx", img_dir="images/")

# Markdown with images
markdown = docx2everything.process_to_markdown("document.docx", img_dir="images/")
```
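
The calls above can be combined into a small batch converter. The following is a minimal sketch, not part of the package: `convert_folder` and its `convert` parameter are hypothetical names, and it assumes `process_to_markdown` returns the Markdown as a string, as the examples above suggest.

```python
from pathlib import Path


def convert_folder(src_dir, out_dir, convert=None):
    """Convert every .docx file in src_dir to a .md file in out_dir.

    `convert` is a hypothetical hook: any callable that takes a file path
    and returns Markdown text. It defaults to process_to_markdown.
    """
    if convert is None:
        # Imported lazily so the helper can also be used with a stub converter.
        import docx2everything
        convert = docx2everything.process_to_markdown

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    # Sorted so the order of results is stable across platforms.
    for docx_path in sorted(Path(src_dir).glob("*.docx")):
        target = out / (docx_path.stem + ".md")
        # Assumption: convert() returns the Markdown as a str.
        target.write_text(convert(str(docx_path)), encoding="utf-8")
        written.append(target)
    return written
```

Calling `convert_folder("docs/", "markdown/")` would write one `.md` file per `.docx` file found directly in `docs/` and return the list of paths written. Subdirectories are not searched.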

## Features

- ✅ Plain text extraction
- ✅ Markdown conversion with preserved structure:
  - Tables → Markdown tables
  - Lists → Bulleted/numbered lists
  - Headings → Markdown headings (#, ##, ###)
  - Formatting → Bold, italic, strikethrough
  - Links → Markdown links
  - Images → Markdown image references
- ✅ Image extraction
- ✅ Header and footer support

## Requirements

Python 3.6+

## License

MIT License - see LICENSE
