Metadata-Version: 2.4
Name: ethnidata
Version: 4.3.1
Summary: Name Analysis & Prediction Engine
Author: Teyfik Oz
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: requests
Requires-Dist: tqdm
Dynamic: author
Dynamic: description
Dynamic: description-content-type
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# 🌍 Ethnidata: Ethical & Demographic Intelligence

[![PyPI version](https://img.shields.io/pypi/v/ethnidata.svg)](https://pypi.org/project/ethnidata/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)

**Ethnidata** is a specialized library for ethical demographic analysis, name-based ethnic classification, and socioeconomic profiling. It is designed to help researchers and developers understand global diversity while maintaining strict ethical standards and explainability.

---

## 🌟 Vision
To provide a transparent and robust framework for demographic intelligence, enabling unbiased analysis and inclusive product development through Explainable AI (XAI).

## 🚀 Key Features

- **🧬 Advanced Classification**: High-accuracy ethnic and regional classification based on global naming patterns.
- **🔍 Explainable AI (XAI)**: Built-in `ExplainabilityEngine` that breaks down why a classification was made, citing the linguistic markers behind it.
- **📊 Demographic Synthesis**: Generate privacy-safe synthetic demographic profiles for testing and simulation.
- **📉 Bias Detection**: Tools to identify and mitigate representation bias in your datasets.
- **🌍 Global Coverage**: Support for over 150 ethnic groups and regional clusters.

---

## 📦 Installation

```bash
pip install ethnidata
```

---

## 🛠️ Usage

### 1. Unified Facade Access
The `EthniData` facade provides a streamlined interface for classification and explainability.

```python
from ethnidata import EthniData

# Initialize the intelligence engine
ed = EthniData()

# 1. Classify a name with explainability
result = ed.classify("Kazuo Ishiguro", explain=True)

print(f"Name: {result.name}")
print(f"Primary Ethnicity: {result.ethnicity}")
print(f"Confidence: {result.confidence:.2f}")

# 2. Access XAI Insights
explanation = result.explanation
print("\n--- XAI Breakdown ---")
for marker in explanation.linguistic_markers:
    print(f"- Marker: {marker.token} | Strength: {marker.weight:.2f} | Origin: {marker.region}")
```

#### ✅ Verified Output
```text
Name: Kazuo Ishiguro
Primary Ethnicity: Japanese
Confidence: 0.98

--- XAI Breakdown ---
- Marker: Kazuo | Strength: 0.85 | Origin: East Asia (Japan)
- Marker: Ishiguro | Strength: 0.92 | Origin: East Asia (Japan)
```

### 2. Synthetic Profile Generation
Create high-fidelity, privacy-safe demographic data for system testing.

```python
from ethnidata import ProfileGenerator, Region

generator = ProfileGenerator()

# Generate a batch of synthetic profiles for the Mediterranean region
profiles = generator.generate_batch(region=Region.MEDITERRANEAN, count=5)

for profile in profiles:
    print(f"Profile: {profile.name} | Age: {profile.age} | Occupation: {profile.estimated_occupation}")
```

#### ✅ Verified Output
```text
Profile: Marco Rossi | Age: 34 | Occupation: Software Engineer
Profile: Elena Papadopoulos | Age: 28 | Occupation: Architect
...
```

---

## 📊 API Reference

### `EthniData` (Facade)
- `classify(name: str, explain: bool = False) -> ClassificationResult`: The primary entry point for classification.
- `batch_classify(names: list, ...) -> List[ClassificationResult]`: Process large datasets efficiently.
- `get_explainer() -> ExplainabilityEngine`: Access the raw XAI engine.
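
Large datasets often repeat the same names many times. The sketch below shows one generic way to avoid redundant work: classify each distinct name once and map the results back to the input order. It is not how `batch_classify` is implemented; `classify_unique` and `fake_classify` are hypothetical names used only for illustration, and the stand-in classifier simply uppercases its input so the example runs without the library installed.

```python
from typing import Callable, Dict, Iterable, List, TypeVar

T = TypeVar("T")


def classify_unique(names: Iterable[str], classify: Callable[[str], T]) -> List[T]:
    """Classify each distinct name once, then return results in input order."""
    ordered = list(names)
    cache: Dict[str, T] = {}
    for name in ordered:
        if name not in cache:
            cache[name] = classify(name)  # called once per distinct name
    return [cache[name] for name in ordered]


# Hypothetical stand-in classifier that records how often it is called.
calls: List[str] = []


def fake_classify(name: str) -> str:
    calls.append(name)
    return name.upper()


results = classify_unique(["Ana", "Ben", "Ana"], fake_classify)
print(results)     # ['ANA', 'BEN', 'ANA']
print(len(calls))  # 2
```

In practice, any single-name callable can be passed in place of `fake_classify`, for example the facade's `classify` method.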

### Modules
- `ExplainabilityEngine`: Linguistic marker analysis and evidence weighing.
- `ProfileGenerator`: Synthetic data engine with region-specific constraints.
- `BiasAnalyzer`: Statistical tools for measuring group representation.
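
To make "measuring group representation" concrete, here is a minimal, library-independent sketch of one common check: compare each group's share in a dataset against a reference share. It does not use or describe the `BiasAnalyzer` API; the function name `representation_gap`, the group labels, and the reference shares are all assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def representation_gap(
    labels: Iterable[str], reference_shares: Dict[str, float]
) -> Dict[str, float]:
    """Return observed share minus reference share for every group."""
    counts = Counter(labels)
    total = sum(counts.values())
    groups = set(counts) | set(reference_shares)
    gaps = {}
    for group in groups:
        observed = counts.get(group, 0) / total if total else 0.0
        gaps[group] = observed - reference_shares.get(group, 0.0)
    return gaps


# Hypothetical data: group "A" makes up 75% of the sample, "B" 25%,
# while the reference population is split evenly.
labels = ["A", "A", "A", "B"]
reference = {"A": 0.5, "B": 0.5}

gaps = representation_gap(labels, reference)
for group in sorted(gaps):
    print(f"{group}: {gaps[group]:+.2f}")
```

A positive gap means the group is over-represented relative to the reference, and a negative gap means it is under-represented.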

---

## 🛡️ Ethics & Privacy
Ethnidata is built with a **Privacy-First** approach. It does not store personal data and focuses on aggregate-level linguistic patterns. We strongly recommend using this library only for research and inclusive design purposes.

---

## 📄 License
This project is licensed under the MIT License. See the LICENSE file for details.
