Metadata-Version: 2.4
Name: nlpap
Version: 1.0.0
Summary: A comprehensive NLP library for text preprocessing, semantic analysis, and information extraction
Home-page: https://github.com/yourusername/nlpap
Author: Your Name
Author-email: your.email@example.com
Project-URL: Bug Reports, https://github.com/yourusername/nlpap/issues
Project-URL: Documentation, https://nlpap.readthedocs.io/
Project-URL: Source, https://github.com/yourusername/nlpap
Keywords: nlp natural-language-processing text-analysis sentiment-analysis named-entity-recognition word-embeddings
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: nltk>=3.7
Requires-Dist: numpy>=1.19.0
Requires-Dist: pandas>=1.2.0
Requires-Dist: matplotlib>=3.3.0
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: pytest-cov>=2.0; extra == "dev"
Requires-Dist: black>=21.0; extra == "dev"
Requires-Dist: flake8>=3.8; extra == "dev"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license-file
Dynamic: project-url
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# NLPAP - NLP Analyzer Package

A comprehensive Natural Language Processing library that provides easy-to-use tools for text preprocessing, semantic analysis, and information extraction.

## Features

### Text Preprocessing & Representation
- ✅ Advanced tokenization using NLTK
- ✅ Stop word removal and text cleaning
- ✅ Lemmatization and stemming
- ✅ Bag-of-Words (BoW) representation
- ✅ TF-IDF calculation and analysis
- ✅ Vocabulary analysis and word frequency

### Semantic Understanding & Language Modeling
- ✅ WordNet integration for synonyms, antonyms, and hypernyms
- ✅ N-gram language models (unigram, bigram, trigram)
- ✅ Laplace smoothing for probability estimation
- ✅ Next word prediction capabilities
- ✅ Semantic relationship extraction

### Information Extraction & Sentiment Analysis
- ✅ Named Entity Recognition (NER)
- ✅ Rule-based sentiment analysis
- ✅ Word similarity analysis using co-occurrence
- ✅ Entity classification (Person, Organization, Location)
- ✅ Sentiment distribution analysis

## Installation

```bash
pip install nlpap
```

## Quick Start

```python
from nlpap import ComprehensiveNLP

# Initialize the analyzer
nlp = ComprehensiveNLP()

# Sample texts (Mental Health domain example)
texts = [
    "I feel overwhelmed with anxiety and stress lately. Need professional help.",
    "The therapy sessions have been very helpful for my mental health recovery.",
    "Depression affects many students during exam periods. Support is crucial."
]

# Run complete analysis
results = nlp.run_complete_analysis(texts)

# Access individual components
processed_data, tokens = nlp.preprocess_text(texts)
semantic_data = nlp.semantic_analysis(tokens)
entities, sentiments = nlp.information_extraction_and_sentiment(texts)
```

## Detailed Usage

### Text Preprocessing & Representation

```python
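from nlpap import ComprehensiveNLP

# Texts to analyze (replace with your own)
your_texts = [
    "Your domain-specific text here...",
    "Another text for analysis...",
]

# Initialize analyzer
nlp = ComprehensiveNLP()

# Preprocess texts
processed_data, all_tokens = nlp.preprocess_text(your_texts)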

# The method returns:
# - Tokenized text
# - Filtered tokens (no stopwords/punctuation)
# - Lemmatized tokens
# - Stemmed tokens
# - BoW and TF-IDF representations
```
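
To make the two representations concrete, here is a minimal pure-Python sketch of Bag-of-Words counts and one common TF-IDF variant (`tf = count / document length`, `idf = log(N / df)`). It illustrates the concept only; the sample documents and the `tf_idf` helper are assumptions for this example and may differ from how NLPAP computes these values internally.

```python
import math
from collections import Counter

# Hypothetical, already-preprocessed documents (lists of tokens).
docs = [
    ["therapy", "helps", "anxiety"],
    ["anxiety", "affects", "students"],
    ["students", "need", "support"],
]

# Bag-of-Words: raw term counts per document.
bow = [Counter(doc) for doc in docs]

# Document frequency: number of documents that contain each term.
doc_freq = Counter(term for doc in docs for term in set(doc))


def tf_idf(term, doc_index):
    """TF-IDF with tf = count / doc length and idf = log(N / df)."""
    if doc_freq[term] == 0:
        return 0.0  # term never seen in the corpus
    tf = bow[doc_index][term] / len(docs[doc_index])
    idf = math.log(len(docs) / doc_freq[term])
    return tf * idf


print(bow[0])
print(round(tf_idf("therapy", 0), 3))  # appears in one document only
print(round(tf_idf("anxiety", 0), 3))  # appears in two documents
```

A term that occurs in fewer documents receives a higher weight than one spread across the corpus, which is the main difference from plain BoW counts.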

### Semantic Analysis

```python
# Analyze semantic relationships (all_tokens comes from preprocess_text above)
semantic_data = nlp.semantic_analysis(all_tokens)

# Returns WordNet analysis:
# - Definitions
# - Synonyms and antonyms
# - Hypernyms (broader categories)
# - Language model with Laplace smoothing
```
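
The language-modeling idea can be sketched in a few lines. The example below builds a bigram model with Laplace (add-one) smoothing and uses it to predict the next word. The token list and the helper names `laplace_bigram_prob` and `predict_next` are hypothetical and chosen for illustration; this is not NLPAP's internal code.

```python
from collections import Counter

# Hypothetical token sequence; in practice this would come from preprocessing.
tokens = ["need", "support", "need", "help", "need", "support"]

bigram_counts = Counter(zip(tokens, tokens[1:]))
# How often each word occurs as the first element of a bigram.
context_counts = Counter(tokens[:-1])
vocab = sorted(set(tokens))


def laplace_bigram_prob(prev_word, word):
    """P(word | prev_word) with add-one smoothing over the vocabulary."""
    numerator = bigram_counts[(prev_word, word)] + 1
    denominator = context_counts[prev_word] + len(vocab)
    return numerator / denominator


def predict_next(prev_word):
    """Return the vocabulary word with the highest smoothed probability."""
    return max(vocab, key=lambda w: laplace_bigram_prob(prev_word, w))


print(predict_next("need"))
print(laplace_bigram_prob("need", "support"))
print(laplace_bigram_prob("help", "support"))  # unseen bigram, still non-zero
```

Adding one to every count keeps unseen bigrams from receiving zero probability, at the cost of taking some probability mass away from observed ones.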

### Information Extraction & Sentiment

```python
# Extract entities and analyze sentiment
entities, sentiments = nlp.information_extraction_and_sentiment(your_texts)

# Returns:
# - Named entities (Person, Organization, Location)
# - Sentiment classification (Positive, Negative, Neutral)
# - Word similarity analysis
```
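
Rule-based sentiment classification usually means counting hits against word lists. The sketch below shows that idea with a tiny lexicon. The word lists and the `classify_sentiment` function are assumptions made for this example, so the rules NLPAP applies may be different.

```python
import re

# Tiny illustrative lexicons; a real analyzer would use much larger lists.
POSITIVE_WORDS = {"helpful", "excellent", "support", "recovery", "good"}
NEGATIVE_WORDS = {"overwhelmed", "anxiety", "stress", "stressed", "depression"}


def classify_sentiment(text):
    """Label text by comparing positive and negative lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(word in POSITIVE_WORDS for word in words)
    negative_hits = sum(word in NEGATIVE_WORDS for word in words)
    if positive_hits > negative_hits:
        return "Positive"
    if negative_hits > positive_hits:
        return "Negative"
    return "Neutral"


samples = [
    "The therapy sessions have been very helpful.",
    "I feel overwhelmed with anxiety and stress lately.",
    "The exam is on Monday.",
]
for sample in samples:
    print(classify_sentiment(sample), "|", sample)
```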

## Command Line Interface

After installation, you can use the command line interface:

```bash
# Analyze a single text
nlpap-analyze --text "Your text here"

# Analyze a file
nlpap-analyze --file path/to/your/textfile.txt

# Get help
nlpap-analyze --help
```
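
For readers curious how a command with these options could be wired up, here is a minimal `argparse` sketch. Only the `--text` and `--file` options come from the usage above. Treating them as mutually exclusive, reading a file as one text per non-empty line, and the helper names `build_parser`, `load_texts`, and `main` are all assumptions for illustration, not a description of the actual `nlpap-analyze` entry point.

```python
import argparse


def build_parser():
    """Build a parser that accepts either --text or --file (assumed exclusive)."""
    parser = argparse.ArgumentParser(
        prog="nlpap-analyze", description="Analyze text with NLPAP."
    )
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--text", help="text to analyze")
    source.add_argument("--file", help="path to a UTF-8 text file to analyze")
    return parser


def load_texts(args):
    """Return the input as a list of texts (one per non-empty line for files)."""
    if args.text is not None:
        return [args.text]
    with open(args.file, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def main(argv=None):
    """Parse arguments and load the texts; analysis would follow from here."""
    args = build_parser().parse_args(argv)
    texts = load_texts(args)
    print(f"Loaded {len(texts)} text(s) for analysis")
    return texts


# Passing an explicit argument list makes the sketch easy to try without a shell.
main(["--text", "Your text here"])
```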

## Dependencies

- nltk >= 3.7
- numpy >= 1.19.0
- pandas >= 1.2.0
- matplotlib >= 3.3.0

## Educational Use

This library is designed for educational purposes and covers fundamental NLP concepts:

1. **Text Preprocessing**: Learn how to clean and prepare text data
2. **Representation Methods**: Understand BoW vs TF-IDF approaches
3. **Semantic Analysis**: Explore word relationships using WordNet
4. **Language Modeling**: Implement N-gram models with smoothing
5. **Information Extraction**: Practice NER and sentiment analysis
6. **Word Similarity**: Analyze word similarity and relationships using co-occurrence
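
As an illustration of the last topic, the following sketch measures word similarity from sentence-level co-occurrence counts using cosine similarity. The sample sentences, the choice of a whole sentence as the context window, and the `cosine_similarity` helper are assumptions for this example. The library's own similarity method may use a different window or weighting.

```python
import math
from collections import Counter, defaultdict

# Hypothetical tokenized sentences.
sentences = [
    ["therapy", "reduces", "anxiety"],
    ["counseling", "reduces", "anxiety"],
    ["exams", "cause", "stress"],
]

# Count how often each word shares a sentence with every other word.
cooccurrence = defaultdict(Counter)
for sentence in sentences:
    for word in sentence:
        for other in sentence:
            if other != word:
                cooccurrence[word][other] += 1


def cosine_similarity(word_a, word_b):
    """Cosine similarity between the co-occurrence vectors of two words."""
    vec_a = cooccurrence.get(word_a, Counter())
    vec_b = cooccurrence.get(word_b, Counter())
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one word has no recorded context
    return dot / (norm_a * norm_b)


print(cosine_similarity("therapy", "counseling"))  # same contexts
print(cosine_similarity("therapy", "exams"))       # no shared contexts
```

Words that appear alongside the same neighbors score close to 1, while words with no shared context score 0.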

## Examples

### Mental Health Text Analysis

```python
from nlpap import ComprehensiveNLP

# Mental health related texts
mental_health_texts = [
    "Feeling stressed about exams, need support and guidance.",
    "Therapy has been helpful for managing anxiety effectively.",
    "University counseling services provide excellent mental health support."
]

nlp = ComprehensiveNLP()
results = nlp.run_complete_analysis(mental_health_texts)

print("Analysis complete!")
print(f"Entities found: {results['entities']}")
print(f"Sentiments: {results['sentiments']}")
```

### Custom Domain Analysis

```python
# Use with your own domain-specific texts
your_texts = [
    "Your domain-specific text here...",
    "Another text for analysis...",
]

nlp = ComprehensiveNLP()

# Run individual analyses
processed_data, tokens = nlp.preprocess_text(your_texts)
semantic_data = nlp.semantic_analysis(tokens)
entities, sentiments = nlp.information_extraction_and_sentiment(your_texts)
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Citation

If you use this library in your research or projects, please cite:

```
@software{nlp_comprehensive_analyzer,
  title={NLP Comprehensive Analyzer},
  author={Your Name},
  year={2024},
  url={https://github.com/yourusername/nlpap}
}
```

## Support

For support, please open an issue on GitHub or contact your.email@example.com.

---

**Perfect for students, researchers, and developers learning NLP fundamentals!**
