Metadata-Version: 2.1
Name: tabledetector
Version: 1.0.2
Summary: End-to-End table structure detector
Home-page: https://github.com/rajban94/TableDetector
Download-URL: https://github.com/rajban94/TableDetector.git
Author: Rishav Banerjee
Author-email: rishavbanerjee10.rb@gmail.com
License: MIT License
Keywords: table detector
Classifier: Programming Language :: Python :: 3
Classifier: Development Status :: 1 - Planning
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
License-File: LICENSE

# Tabledetector

[![PyPI](https://img.shields.io/pypi/v/tabledetector)](https://pypi.org/project/tabledetector/)

Tabledetector is a Python package that takes PDFs or images as input, checks their alignment, re-aligns them if required, detects the table structure, extracts the data, and returns it as a pandas DataFrame for further use. The current implementation focuses on bordered, semibordered, and unbordered table structures.

## Features

- **PDF/Image Input:** Accepts PDF or image files as input for table detection.
- **Alignment Check:** Verifies the alignment of the input and corrects it if needed.
- **Table Detection:** Identifies bordered, semibordered, and unbordered tables in the PDF or image file.
- **Table Extraction:** Extracts the tabular data as a pandas DataFrame.

## Libraries Used

- Python 3.x
- OpenCV
- NumPy
- pdf2image
- Pillow
- scipy
- jinja2
- easyocr
- pandas

## Create and Activate Environment
```bash
conda create -n <env_name> python=3.7
conda activate <env_name>
```

## Installation of package using pip

```bash
pip install tabledetector
```

## Clone the repository for latest development release

```bash
git clone https://github.com/rajban94/TableDetector.git
```

## Dependency
To use this library on Windows, make sure Poppler is installed and the path to its `bin` directory is added to the `PATH` environment variable. Poppler is required by pdf2image to read PDF files.
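
A quick way to confirm that Poppler is reachable before running the detector is to look for one of its command-line tools on `PATH`. The sketch below is not part of tabledetector; the helper names `find_poppler` and `poppler_status` are hypothetical, and it assumes that locating `pdftoppm` (one of the Poppler tools pdf2image calls) is a good enough proxy for a working install.

```python
import shutil


def find_poppler(binary="pdftoppm"):
    """Return the full path of a Poppler tool found on PATH, or None.

    Hypothetical helper, not part of tabledetector.
    """
    return shutil.which(binary)


def poppler_status(binary="pdftoppm"):
    """Return a human-readable message about whether the tool was found."""
    path = find_poppler(binary)
    if path is None:
        return (
            f"{binary} not found on PATH; install Poppler and add its "
            "bin directory to the PATH environment variable"
        )
    return f"{binary} found at {path}"


if __name__ == "__main__":
    print(poppler_status())
```

If the message reports that the tool was not found, fix the `PATH` entry and open a new terminal before retrying.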

## Usage

## Detection
To detect bordered tables when rotation is not required:
```python
import tabledetector as td
result = td.detect(pdf_path="pdf_path", type="bordered", rotation=False, method="detect")
```

To detect semibordered tables when rotation is not required:
```python
import tabledetector as td
# The table type is passed via `type`; `method` selects detect vs. extract.
result = td.detect(pdf_path="pdf_path", type="semibordered", rotation=False, method="detect")
```

To detect unbordered tables when rotation is not required:
```python
import tabledetector as td
# The table type is passed via `type`; `method` selects detect vs. extract.
result = td.detect(pdf_path="pdf_path", type="unbordered", rotation=False, method="detect")
```

## Extraction
To detect and extract bordered tables when rotation is not required:
```python
import tabledetector as td
result = td.detect(pdf_path="pdf_path", type="bordered", rotation=False, method="extract")
```

To detect and extract semibordered tables when rotation is not required:
```python
import tabledetector as td
# The table type is passed via `type`; `method` selects detect vs. extract.
result = td.detect(pdf_path="pdf_path", type="semibordered", rotation=False, method="extract")
```

To detect and extract unbordered tables when rotation is not required:
```python
import tabledetector as td
# The table type is passed via `type`; `method` selects detect vs. extract.
result = td.detect(pdf_path="pdf_path", type="unbordered", rotation=False, method="extract")
```
If no table type is specified, the package checks for all three types (bordered, semibordered, and unbordered) and returns the result accordingly. If the input needs to be rotated, set `rotation=True`.
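
The calls above differ only in their keyword arguments, so a small wrapper can make the options explicit. This is a minimal sketch under stated assumptions, not part of tabledetector: the helper `build_detect_kwargs` and its `table_type` parameter are hypothetical, the default `method="detect"` is chosen here for illustration, and leaving out the `type` keyword is assumed to be how "no table type" is expressed.

```python
VALID_TYPES = {"bordered", "semibordered", "unbordered"}
VALID_METHODS = {"detect", "extract"}


def build_detect_kwargs(pdf_path, table_type=None, rotation=False, method="detect"):
    """Assemble keyword arguments for td.detect().

    Hypothetical helper, not part of tabledetector. When table_type is
    None, the `type` keyword is left out (assumption: the package then
    checks all table types).
    """
    if table_type is not None and table_type not in VALID_TYPES:
        raise ValueError(f"unknown table type: {table_type!r}")
    if method not in VALID_METHODS:
        raise ValueError(f"unknown method: {method!r}")

    kwargs = {"pdf_path": pdf_path, "rotation": rotation, "method": method}
    if table_type is not None:
        kwargs["type"] = table_type
    return kwargs


# Usage (requires tabledetector to be installed):
# import tabledetector as td
# result = td.detect(**build_detect_kwargs("pdf_path", rotation=True, method="extract"))
```

Validating the arguments up front turns a typo such as `"semi-bordered"` into an immediate `ValueError` instead of a failure deep inside the detection run.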
