Metadata-Version: 2.4
Name: tfq0tool
Version: 2.1.7
Summary: A powerful text extraction utility for multiple file formats, including PDFs, Word documents, spreadsheets, and code files.
Home-page: https://github.com/tfq0/tfq0tool
Author: Talal
Project-URL: Bug Reports, https://github.com/tfq0/TFQ0tool/issues
Project-URL: Source, https://github.com/tfq0/TFQ0tool
Keywords: text extraction pdf docx xlsx ocr
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: End Users/Desktop
Classifier: Topic :: Text Processing :: General
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Environment :: Console
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: PyPDF2>=3.0.0
Requires-Dist: python-docx>=0.8.11
Requires-Dist: pandas>=1.5.0
Requires-Dist: openpyxl>=3.1.0
Requires-Dist: pdfminer.six>=20221105
Requires-Dist: chardet>=5.0.0
Requires-Dist: tqdm>=4.65.0
Requires-Dist: pytesseract>=0.3.10
Requires-Dist: Pillow>=9.5.0
Requires-Dist: python-magic>=0.4.27
Requires-Dist: python-magic-bin>=0.4.14; sys_platform == "win32"
Dynamic: author
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license-file
Dynamic: project-url
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# TFQ0tool

[![PyPI version](https://img.shields.io/pypi/v/tfq0tool.svg)](https://pypi.org/project/tfq0tool/)
[![License](https://img.shields.io/pypi/l/tfq0tool.svg)](https://github.com/tfq0/TFQ0tool/blob/main/LICENSE)
[![Python Versions](https://img.shields.io/pypi/pyversions/tfq0tool.svg)](https://pypi.org/project/tfq0tool/)
[![Downloads](https://img.shields.io/pypi/dm/tfq0tool.svg)](https://pypi.org/project/tfq0tool/)

TFQ0tool is a command-line utility for extracting text from a variety of file formats, designed for simplicity and efficiency.

## Features

- **Format Support**:
  - PDF (including password-protected files)
  - Microsoft Office (DOCX, DOC, XLSX, XLS)
  - Data files (CSV, JSON, XML)
  - Text files (TXT, LOG, MD)

- **Processing Features**:
  - Parallel processing
  - Memory-efficient streaming
  - Text preprocessing (lowercase, whitespace removal)
  - Progress tracking
  - Automatic encoding detection

## Installation

```bash
pip install tfq0tool
```

## Usage

### Basic Commands

```bash
# Extract text from a file
tfq0tool extract document.pdf

# Extract to a specific directory
tfq0tool extract document.pdf -o output_dir

# Process multiple files
tfq0tool extract *.pdf *.docx -o ./extracted

# Show supported formats
tfq0tool formats

# Show help
tfq0tool --help
```

### Extract Options

```bash
tfq0tool extract [OPTIONS] FILE_PATHS...

Options:
  -o, --output DIR      Output directory
  -t, --threads N       Thread count (default: auto)
  -f, --force           Overwrite existing files
  -p, --password PWD    PDF password
  --preprocess OPT      Preprocessing (lowercase,strip_whitespace)
  --progress            Show progress bar
  --verbose             Detailed output
```
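
When driving the tool from a script, it can help to assemble the argument list programmatically rather than by string concatenation. The following is a minimal sketch, not part of TFQ0tool itself: `build_extract_command` is a hypothetical helper that uses only the flags listed above and places options before the file paths, as in the usage line.

```python
import shlex
from typing import List, Optional, Sequence


def build_extract_command(
    files: Sequence[str],
    output: Optional[str] = None,
    threads: Optional[int] = None,
    password: Optional[str] = None,
    preprocess: Sequence[str] = (),
    force: bool = False,
    progress: bool = False,
    verbose: bool = False,
) -> List[str]:
    """Build an argv list for `tfq0tool extract` (hypothetical helper)."""
    if not files:
        raise ValueError("at least one input file is required")

    cmd = ["tfq0tool", "extract"]
    if output is not None:
        cmd += ["-o", output]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if password is not None:
        cmd += ["-p", password]
    if preprocess:
        # The README shows the steps as one comma-separated value.
        cmd += ["--preprocess", ",".join(preprocess)]
    if force:
        cmd.append("--force")
    if progress:
        cmd.append("--progress")
    if verbose:
        cmd.append("--verbose")

    # Options first, then file paths, matching the documented usage line.
    cmd += list(files)
    return cmd


if __name__ == "__main__":
    argv = build_extract_command(
        ["report.pdf", "notes.docx"],
        output="./extracted",
        threads=4,
        progress=True,
    )
    # Print a shell-quoted form of the command for inspection.
    print(shlex.join(argv))
    # To run it (requires tfq0tool on PATH):
    #   import subprocess; subprocess.run(argv, check=True)
```

Passing the resulting list to `subprocess.run` avoids shell quoting issues with file names that contain spaces.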

### Configuration

View or modify settings:

```bash
# Show current config
tfq0tool config --show

# Reset to defaults
tfq0tool config --reset

# Change settings
tfq0tool config --set processing.chunk_size 2097152
tfq0tool config --set threading.max_threads 8
```

## Examples

```bash
# Basic text extraction
tfq0tool extract document.pdf

# Multiple files with progress
tfq0tool extract *.pdf *.docx --progress -o ./output

# Process a password-protected PDF
tfq0tool extract secure.pdf -p mypassword

# Extract with preprocessing
tfq0tool extract input.docx --preprocess lowercase,strip_whitespace

# Parallel processing
tfq0tool extract *.pdf -t 4 --progress
```
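
The `--preprocess` value in the examples above is a comma-separated list of step names. As a rough illustration of how such a list could be applied to extracted text, here is a small sketch. It is not the tool's implementation: the `STEPS` table and the `preprocess` function are hypothetical, and the exact behavior of `strip_whitespace` (here assumed to collapse runs of whitespace and trim the ends) may differ in TFQ0tool.

```python
import re
from typing import Callable, Dict

# Hypothetical step table; the real tool may define these steps differently.
STEPS: Dict[str, Callable[[str], str]] = {
    "lowercase": str.lower,
    # Assumption: collapse whitespace runs to one space and trim both ends.
    "strip_whitespace": lambda text: re.sub(r"\s+", " ", text).strip(),
}


def preprocess(text: str, spec: str) -> str:
    """Apply the comma-separated steps in `spec` to `text`, in order."""
    names = [part.strip() for part in spec.split(",") if part.strip()]
    for name in names:
        if name not in STEPS:
            raise ValueError(f"unknown preprocessing step: {name!r}")
        text = STEPS[name](text)
    return text


if __name__ == "__main__":
    print(preprocess("  Hello   WORLD \n", "lowercase,strip_whitespace"))
    # prints "hello world"
```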
