Metadata-Version: 2.4
Name: sampdfextractor
Version: 0.1.0
Summary: Automatically detect and extract text from scanned and normal PDFs.
Home-page: https://github.com/sam670/sampdfextractor
Author: Samrendra Vishwakarma
Author-email: samrendradev@gmail.com
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: PyMuPDF
Requires-Dist: easyocr
Requires-Dist: pandas
Requires-Dist: Pillow
Dynamic: author
Dynamic: author-email
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary


# 🧠 SamPDFExtractor

![PyPI - Version](https://img.shields.io/badge/version-0.1.0-blue)
![Python](https://img.shields.io/badge/python-3.8+-green)
![License](https://img.shields.io/badge/license-MIT-yellow)
![Status](https://img.shields.io/badge/build-passing-success)

---

**SamPDFExtractor** is a Python library for extracting text and structured data from both *digital* (text-based) and *scanned* PDFs.  
It installs with a single `pip` command, and its dependencies (PyMuPDF, easyocr, pandas, and Pillow) are pulled in automatically, which makes it a good fit for automation, document processing, and analytics workflows.

---

## 🚀 Features

- ⚡ **Fast text extraction** from normal (text-based) PDFs  
- 🔍 **Smart scanned PDF handling** (auto-detect + optional OCR fallback)  
- 🧩 **JSON structured output** for easy integration with your applications  
- 🪶 **Easy to install and deploy**: one `pip install` pulls in the required dependencies (PyMuPDF, easyocr, pandas, Pillow)  
- 🧠 **Extensible** — clean codebase for developers who want to build on top  

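The auto-detect, OCR fallback, and JSON output features listed above can be illustrated with a minimal sketch. This is not SamPDFExtractor's actual API: the names `needs_ocr`, `extract_pages`, and `to_json`, the `min_chars` threshold, and the record layout are all hypothetical. The sketch assumes the embedded text of each page has already been read (for example with PyMuPDF's `page.get_text()`) and that `ocr` is any callable returning recognized text for a page number (for example a wrapper around easyocr).

```python
import json
from typing import Callable, Dict, List, Optional


def needs_ocr(text: str, min_chars: int = 20) -> bool:
    """Heuristic (an assumption): a page with almost no embedded text is probably a scan."""
    return len(text.strip()) < min_chars


def extract_pages(
    page_texts: List[str],
    ocr: Optional[Callable[[int], str]] = None,
    min_chars: int = 20,
) -> List[Dict[str, object]]:
    """Return one record per page, calling `ocr` only for pages that look scanned."""
    records = []
    for number, text in enumerate(page_texts, start=1):
        source = "text"
        if needs_ocr(text, min_chars):
            if ocr is not None:
                # Hypothetical OCR hook: receives the 1-based page number.
                text, source = ocr(number), "ocr"
            else:
                source = "skipped"
        records.append({"page": number, "source": source, "text": text.strip()})
    return records


def to_json(records: List[Dict[str, object]]) -> str:
    """Serialize the page records as a JSON document."""
    return json.dumps({"pages": records}, ensure_ascii=False, indent=2)


# Stand-in data: page 1 has a text layer, page 2 is effectively empty (scanned).
pages = ["Invoice 2024-001\nTotal due: 120.00 EUR", "  \n"]
records = extract_pages(pages, ocr=lambda n: f"<OCR text for page {n}>")
print(to_json(records))
```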
---

## 📦 Installation

```bash
pip install sampdfextractor
