Metadata-Version: 2.2
Name: pun_nlp
Version: 0.0.3
Summary: A robust NLP pipeline for stemming, lemmatization, and vectorization
Home-page: https://github.com/PunVas/pun_nlp
Author: Puneet Vaswani
Author-email: vaswaniusham2212@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: nltk
Requires-Dist: spacy
Requires-Dist: scikit-learn
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# NLPProcessor

## Overview
NLPProcessor is an NLP preprocessing pipeline that handles:
- **Stemming & Lemmatization** (via NLTK or spaCy)
- **Vectorization** (TF-IDF or Count Vectorizer, via scikit-learn)
- **Dependency Management** (auto-installs missing libraries)
- **Support for 2D Text Arrays** (processes lists of lists of text)
- **Exception-Free Execution** (designed to keep running when an underlying library's API changes)
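
The dependency management step can be pictured roughly as follows. This is a minimal sketch using only the standard library, not NLPProcessor's actual implementation: the names `REQUIRED`, `find_missing`, and `install_missing` are hypothetical, and the package list is taken from the `Requires-Dist` entries above.

```python
import importlib.util
import subprocess
import sys

# Hypothetical mapping of import names to pip package names
# (assumption: mirrors the Requires-Dist entries of this package).
REQUIRED = {"nltk": "nltk", "spacy": "spacy", "sklearn": "scikit-learn"}


def find_missing(modules):
    """Return the import names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def install_missing(modules=REQUIRED, dry_run=True):
    """Return the pip packages that are missing; install them unless dry_run."""
    packages = [modules[name] for name in find_missing(modules)]
    if packages and not dry_run:
        # Use the current interpreter so packages land in the active environment.
        subprocess.check_call([sys.executable, "-m", "pip", "install", *packages])
    return packages


if __name__ == "__main__":
    print("Missing packages:", install_missing(dry_run=True))
```

With `dry_run=True` the sketch only reports what would be installed, which is a safer default when trying it out in a shared environment.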

## Features
- **Automated dependency installation**
- **Works with both NLTK and spaCy**
- **Vectorization support using scikit-learn**
- **Handles single strings and 2D arrays**
- **No manual setup required** (missing dependencies are installed at runtime)

## Installation
Missing dependencies (`nltk`, `spacy`, `scikit-learn`) are installed automatically the first time the script runs:
```bash
python your_script.py
```

## Usage
### Import and Initialize
```python
from your_script import NLPProcessor

processor = NLPProcessor(stem=True, lemmatize=True, vectorize="tfidf", backend="spacy")
```

### Process a Single Text
```python
output = processor.process("running jumped swimming")
print(output)
```

### Process a 2D Array of Text
```python
input_texts = [
    ["I am running", "He is jumping"],
    ["They are swimming", "Dogs are barking"]
]
output = processor.process(input_texts)
print(output)
```
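
One way to think about 2D input is as the same per-string operation applied at every level of nesting. The sketch below illustrates that idea with a hypothetical helper, `map_nested`, which is not part of NLPProcessor. It assumes the output keeps the shape of the input, which may not hold when `vectorize` is set.

```python
def map_nested(func, data):
    """Apply func to every string in data, preserving the list nesting."""
    if isinstance(data, str):
        return func(data)
    return [map_nested(func, item) for item in data]


input_texts = [
    ["I am running", "He is jumping"],
    ["They are swimming", "Dogs are barking"],
]

# str.lower stands in for a real stemming or lemmatization function.
normalized = map_nested(str.lower, input_texts)
print(normalized)
```

Because a single string is the base case, the same helper covers both the single-text and the 2D-array call patterns shown above.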

### Customization Options
| Parameter | Description |
|-----------|-------------|
| `stem` | Enable stemming (default: `False`) |
| `lemmatize` | Enable lemmatization (default: `False`) |
| `vectorize` | Choose `"tfidf"`, `"count"`, or `None` (default: `None`) |
| `backend` | Choose `"nltk"` or `"spacy"` (default: `"nltk"`) |

### Check Supported Vectorizers
```python
print(NLPProcessor.supported_vectorizers())  # ['tfidf', 'count']
```
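
To make the `"count"` option concrete, here is a plain-Python illustration of what count vectorization produces. It is a simplified stand-in written for this README, not scikit-learn's `CountVectorizer` and not NLPProcessor's code: `count_vectorize` is a hypothetical name, and it tokenizes on whitespace only, whereas scikit-learn's default tokenizer also strips punctuation and drops single-character tokens.

```python
from collections import Counter


def count_vectorize(docs):
    """Return (vocabulary, rows), where rows[i][j] counts vocabulary[j] in docs[i]."""
    tokenized = [doc.lower().split() for doc in docs]
    # Sorted vocabulary gives each term a stable column index.
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


vocab, matrix = count_vectorize(["dogs are barking", "dogs are dogs"])
print(vocab)   # ['are', 'barking', 'dogs']
print(matrix)  # [[1, 1, 1], [1, 0, 2]]
```

The `"tfidf"` option starts from the same kind of counts and then down-weights terms that appear in many documents.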

## License
MIT License
