Metadata-Version: 2.4
Name: LinguaLab
Version: 3.4.3
Summary: A multilingual text and voice processing toolkit
Home-page: https://github.com/EusDancerDev/LinguaLab
Author: Jon Ander Gabantxo
Author-email: Jon Ander Gabantxo <jagabantxo@gmail.com>
Project-URL: Homepage, https://github.com/EusDancerDev/LinguaLab
Project-URL: Documentation, https://github.com/EusDancerDev/LinguaLab#readme
Project-URL: Repository, https://github.com/EusDancerDev/LinguaLab.git
Project-URL: Bug Reports, https://github.com/EusDancerDev/LinguaLab/issues
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Text Processing
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9,<3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.21.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: filewise>=3.7.0
Requires-Dist: pygenutils>=15.11.0
Requires-Dist: paramlib>=3.4.0
Requires-Dist: SpeechRecognition>=3.10.0
Requires-Dist: googletrans>=4.0.0
Requires-Dist: gTTS>=2.3.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: flake8>=5.0.0; extra == "dev"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# LinguaLab

[![Python Version](https://img.shields.io/badge/python-3.8%2B-blue.svg)](https://www.python.org/downloads/)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
[![PyPI Version](https://img.shields.io/pypi/v/lingualab.svg)](https://pypi.org/project/lingualab/)

**LinguaLab** is a Python toolkit designed for natural language processing and linguistic analysis. It provides a comprehensive set of tools for text processing, language analysis, and linguistic research, making it easier to work with text data in Python applications.

## Features

- **Text Processing**:
  - Text cleaning and normalization
  - Tokenization and lemmatization
  - Stop word removal and filtering
- **Language Analysis**:
  - Syntax analysis
  - Morphological analysis
  - Semantic analysis
- **Corpus Management**:
  - Corpus creation and management
  - Text collection and organization
  - Metadata handling
- **Statistical Analysis**:
  - Frequency analysis
  - Word distribution analysis
  - Text similarity measures

## Installation

### Prerequisites

Before installing, please ensure the following dependencies are available on your system:

- **Required Third-Party Libraries**:

  ```bash
  pip install nltk spacy pandas numpy scikit-learn
  ```

  Or via Anaconda (recommended channel: `conda-forge`):

  ```bash
  conda install -c conda-forge nltk spacy pandas numpy scikit-learn
  ```

### Installation (from PyPI)

Install the package using pip:

```bash
pip install lingualab
```

### Development Installation

For development purposes, you can install the package in editable mode:

```bash
git clone https://github.com/yourusername/lingualab.git
cd lingualab
pip install -e .
```

## Usage

### Basic Example

```python
from lingualab.processing import TextProcessor
from lingualab.analysis import LanguageAnalyzer

# Process text
processor = TextProcessor("This is a sample text.")
processed_text = processor.clean().tokenize().lemmatize()

# Analyze language
analyzer = LanguageAnalyzer(processed_text)
syntax_tree = analyzer.parse_syntax()
```

### Advanced Example

```python
from lingualab.corpus import CorpusManager
from lingualab.statistics import TextAnalyzer

# Manage corpus
corpus = CorpusManager("my_corpus")
corpus.add_document("doc1.txt", metadata={"author": "John Doe"})

# Analyze text statistics
analyzer = TextAnalyzer(corpus)
word_freq = analyzer.word_frequency()
similarity = analyzer.text_similarity("doc1.txt", "doc2.txt")
```

## Project Structure

The package is organised into several sub-packages:

```text
LinguaLab/
├── processing/
│   ├── text_processor.py
│   └── tokenizer.py
├── analysis/
│   ├── syntax_analyzer.py
│   └── semantic_analyzer.py
├── corpus/
│   ├── manager.py
│   └── collector.py
└── statistics/
    ├── frequency_analyzer.py
    └── similarity_analyzer.py
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Natural Language Processing community
- Open-source contributors
- Linguistic research community

## Contact

For any questions or suggestions, please open an issue on GitHub or contact the maintainers.
