Metadata-Version: 2.4
Name: intuitiveness
Version: 0.1.0
Summary: Transform raw datasets into purpose-built data through descent-ascent methodology
Author-email: Arthur Sarazin <arthur.sarazin@datactivist.coop>, Mathis Mourey <mathis.mourey@datactivist.coop>
Maintainer-email: Arthur Sarazin <arthur.sarazin@datactivist.coop>
License-Expression: MIT
Project-URL: Homepage, https://github.com/ArthurSrz/intuitiveness
Project-URL: Documentation, https://github.com/ArthurSrz/intuitiveness#readme
Project-URL: Repository, https://github.com/ArthurSrz/intuitiveness
Project-URL: Issues, https://github.com/ArthurSrz/intuitiveness/issues
Project-URL: Changelog, https://github.com/ArthurSrz/intuitiveness/releases
Project-URL: Research, https://zenodo.org/badge/latestdoi/685140191
Keywords: data-science,data-quality,dataset-transformation,complexity-management,tabpfn,knowledge-graph,data-engineering,open-data
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.5.0
Requires-Dist: networkx>=2.8
Requires-Dist: numpy>=1.23.0
Requires-Dist: scikit-learn>=1.2.0
Requires-Dist: scipy>=1.10.0
Provides-Extra: quality
Requires-Dist: tabpfn-client>=0.0.21; extra == "quality"
Requires-Dist: tabpfn-extensions>=0.0.12; extra == "quality"
Requires-Dist: shap>=0.42.0; extra == "quality"
Provides-Extra: neo4j
Requires-Dist: neo4j>=5.0.0; extra == "neo4j"
Provides-Extra: embeddings
Requires-Dist: sentence-transformers>=2.2.0; extra == "embeddings"
Provides-Extra: discovery
Requires-Dist: requests>=2.28.0; extra == "discovery"
Provides-Extra: app
Requires-Dist: streamlit>=1.28.0; extra == "app"
Requires-Dist: streamlit-agraph>=0.0.45; extra == "app"
Requires-Dist: plotly>=5.14.0; extra == "app"
Requires-Dist: matplotlib>=3.7.0; extra == "app"
Provides-Extra: all
Requires-Dist: intuitiveness[app,discovery,embeddings,neo4j,quality]; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: behave>=1.2.6; extra == "dev"
Requires-Dist: playwright>=1.40.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: build>=0.10.0; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"
Dynamic: license-file

# Intuitiveness

[![PyPI version](https://img.shields.io/pypi/v/intuitiveness.svg)](https://pypi.org/project/intuitiveness/)
[![Python versions](https://img.shields.io/pypi/pyversions/intuitiveness.svg)](https://pypi.org/project/intuitiveness/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![DOI](https://zenodo.org/badge/685140191.svg)](https://zenodo.org/badge/latestdoi/685140191)

<p align="center">
  <img src="https://raw.githubusercontent.com/ArthurSrz/intuitiveness/main/images/gear_cube_logo.svg" alt="Intuitiveness Gear Cube" width="120"/>
</p>

**Transform raw, complex datasets into purpose-built data that directly answers your questions.**

Intuitiveness is a Python package implementing the **descent-ascent methodology** for dataset transformation, backed by peer-reviewed research. It helps data scientists and analysts simplify messy data, extract core insights, and rebuild enriched datasets with exactly the dimensions they need.

---

## 🎯 What is Intuitiveness?

Traditional data workflows force you to work with whatever structure you're given. Intuitiveness flips this: **you define the question first, then the data follows**.

### The Descent-Ascent Cycle

1. **Descent (L4 → L0)**: Strip away complexity to find the core truth
   - Raw tabular data → Knowledge graph → Categories → Features → Single datum

2. **Ascent (L0 → L3)**: Rebuild with YOUR intent, adding only relevant dimensions
   - Single datum → Features → YOUR categories → YOUR relationships → Purpose-built dataset

### The 5 Levels of Abstraction

| Level | Name | Description | Example |
|-------|------|-------------|---------|
| **L4** | Raw Dataset | Original tabular data | `students.csv` with 50 columns |
| **L3** | Entity Graph | Knowledge graph of relationships | Student → School → District |
| **L2** | Domain Categories | Grouped by semantic domains | Urban/Rural schools |
| **L1** | Feature Vector | Unified numeric representation | [85.5, 320, 0.42, ...] |
| **L0** | Core Datum | Single atomic value | Average score: 85.5 |
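
The levels in the table can be illustrated without the package itself. The following is a minimal, standard-library-only sketch of one possible descent from L4 to L0; the sample rows, the variable names, and the choice of `mean` as the aggregation are assumptions made for illustration, not the package's actual implementation.

```python
from collections import defaultdict
from statistics import mean

# L4: raw tabular rows (hypothetical sample data)
rows = [
    {"student": "A", "school": "North", "area": "Urban", "score": 90.0},
    {"student": "B", "school": "North", "area": "Urban", "score": 80.0},
    {"student": "C", "school": "South", "area": "Rural", "score": 85.0},
    {"student": "D", "school": "South", "area": "Rural", "score": 87.0},
]

# L3: entity graph as a set of directed edges (student -> school -> area)
edges = set()
for r in rows:
    edges.add((r["student"], r["school"]))
    edges.add((r["school"], r["area"]))

# L2: group the scores by a semantic category (here: area)
by_area = defaultdict(list)
for r in rows:
    by_area[r["area"]].append(r["score"])

# L1: one numeric feature per category, in a fixed (sorted) order
feature_vector = [mean(by_area[area]) for area in sorted(by_area)]

# L0: a single atomic value summarising the whole dataset
core_datum = mean(r["score"] for r in rows)

print(feature_vector)  # [86.0, 85.0]  (Rural, Urban)
print(core_datum)      # 85.5
```

Each step discards detail: the graph keeps only relationships, the categories keep only grouped values, and the datum keeps a single number.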

---

## ⚡ Quick Start

### Installation

```bash
# Core package (descent/ascent operations)
pip install intuitiveness

# Extras are quoted so that shells such as zsh do not expand the brackets

# With quality assessment (TabPFN-based)
pip install "intuitiveness[quality]"

# With Neo4j knowledge graphs
pip install "intuitiveness[neo4j]"

# With data discovery (data.gouv.fr search)
pip install "intuitiveness[discovery]"

# Full features (includes Streamlit app)
pip install "intuitiveness[all]"
```

### Basic Usage

#### Quality Assessment (TabPFN)

```python
import pandas as pd
from intuitiveness import assess_quality

# Load your dataset
df = pd.read_csv("data.csv")

# Quick quality assessment
report = assess_quality(df, target_column="label")

print(f"Usability Score: {report.usability_score:.2f}")
print(f"Data Completeness: {report.completeness:.2f}")
print(f"Feature Diversity: {report.diversity:.2f}")

# Get improvement suggestions
for suggestion in report.suggestions:
    print(f"- {suggestion.type}: {suggestion.description}")
```
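
This README does not define how `report.completeness` is computed. As a rough intuition only, one common definition of completeness is the share of cells that are not missing. The sketch below implements that definition in plain Python; the function name and the treatment of `None` and empty strings as missing are assumptions, not the package's behaviour.

```python
def completeness(rows, columns):
    """Share of non-missing cells, assuming None and "" mean 'missing'."""
    total = len(rows) * len(columns)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for row in rows
        for col in columns
        if row.get(col) not in (None, "")
    )
    return filled / total


# Hypothetical sample: 6 cells, 2 of them missing
sample = [
    {"name": "A", "score": 90.0},
    {"name": "B", "score": None},
    {"name": "", "score": 85.0},
]
print(f"{completeness(sample, ['name', 'score']):.2f}")  # 0.67
```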

#### Descent-Ascent Cycle

```python
from intuitiveness import (
    Level4Dataset,
    ComplexityLevel,
    descend,
    ascend
)
import pandas as pd

# Start with raw data
df = pd.read_csv("schools_data.csv")
l4 = Level4Dataset({"raw": df})

# Descend to core truth
l0 = descend(l4, ComplexityLevel.LEVEL_0,
             operation="mean",
             target_column="score")
print(f"Core datum: {l0.data}")  # {'average_score': 85.5}

# Ascend with YOUR dimensions
l3 = ascend(l0, ComplexityLevel.LEVEL_3,
            enrichments=["region", "school_type", "funding_level"])

# Export purpose-built dataset
l3.export("purpose_built_schools.csv")
```
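
To give an intuition for the ascent step, the sketch below rebuilds a small table from raw rows by breaking the core value down along only the dimensions the caller asks for. It uses the standard library alone; `rebuild`, the sample rows, and the `mean_<column>` naming are hypothetical and are not part of the `intuitiveness` API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw rows
rows = [
    {"region": "North", "school_type": "public", "score": 90.0},
    {"region": "North", "school_type": "private", "score": 80.0},
    {"region": "South", "school_type": "public", "score": 85.0},
    {"region": "South", "school_type": "public", "score": 87.0},
]


def rebuild(rows, value, dimensions):
    """Average `value` for each combination of the chosen dimensions."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        groups[key].append(row[value])
    return [
        {**dict(zip(dimensions, key)), f"mean_{value}": mean(values)}
        for key, values in sorted(groups.items())
    ]


# Only the dimensions relevant to the question are kept
by_region = rebuild(rows, "score", ["region"])
print(by_region)
# [{'region': 'North', 'mean_score': 85.0}, {'region': 'South', 'mean_score': 86.0}]
```

Passing a longer list of dimensions, such as `["region", "school_type"]`, yields a finer-grained table from the same rows.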

#### Feature Checking

```python
from intuitiveness import (
    QUALITY_AVAILABLE,
    DISCOVERY_AVAILABLE,
    NEO4J_AVAILABLE
)

print(f"Quality Assessment: {'✓' if QUALITY_AVAILABLE else '✗ (pip install intuitiveness[quality])'}")
print(f"Data Discovery: {'✓' if DISCOVERY_AVAILABLE else '✗ (pip install intuitiveness[discovery])'}")
print(f"Neo4j Graphs: {'✓' if NEO4J_AVAILABLE else '✗ (pip install intuitiveness[neo4j])'}")
```
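
How the package computes these flags is not shown here. A common pattern for this kind of optional-dependency check is to ask the import system whether a module can be found, without importing it. The sketch below assumes a hypothetical mapping from extras to top-level module names (`tabpfn_client`, `requests`, `neo4j`); adjust it to whatever the package actually probes.

```python
import importlib.util


def is_available(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Hypothetical mapping: extra name -> a module that the extra installs
EXTRA_MODULES = {
    "quality": "tabpfn_client",
    "discovery": "requests",
    "neo4j": "neo4j",
}

availability = {extra: is_available(mod) for extra, mod in EXTRA_MODULES.items()}

for extra, ok in availability.items():
    hint = "installed" if ok else f'missing (pip install "intuitiveness[{extra}]")'
    print(f"{extra}: {hint}")
```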

---

## 🚀 Features

### Core Functionality
- ✅ **5-level complexity system** (L0-L4) for dataset abstraction
- ✅ **Descent operations** to simplify datasets and extract core insights
- ✅ **Ascent operations** to rebuild datasets with custom dimensions
- ✅ **Navigation system** to explore and track transformation paths

### Quality Assessment (with `[quality]` extra)
- 📊 **TabPFN-based scoring** for dataset usability (0-100 scale)
- 🔍 **Feature profiling** with importance rankings
- 💡 **Automated suggestions** for data improvements
- 🎯 **Anomaly detection** using density estimation
- 🧪 **Synthetic data generation** with validation benchmarks

### Data Discovery (with `[discovery]` extra)
- 🇫🇷 **Natural language search** for French open data (data.gouv.fr)
- 🤖 **SmolLM3-powered queries** in plain French
- 📥 **Direct CSV downloads** with caching

### Knowledge Graphs (with `[neo4j]` extra)
- 🕸️ **Neo4j integration** for entity relationship storage
- 🔗 **Graph-based navigation** through dataset transformations
- 🧠 **Semantic matching** using sentence embeddings (requires the separate `[embeddings]` extra)

### Streamlit App (with `[app]` extra)
- 🖥️ **Interactive web interface** for visual workflows
- 📈 **Real-time quality visualizations** with Plotly
- 🎨 **Export tools** for CSV, JSON, and Python snippets

---

## 📦 Installation Options

| Install Command | Includes | Use Case |
|----------------|----------|----------|
| `pip install intuitiveness` | Core package | Basic descent/ascent operations |
| `pip install intuitiveness[quality]` | + TabPFN, SHAP | Data quality assessment |
| `pip install intuitiveness[neo4j]` | + Neo4j driver | Knowledge graph storage |
| `pip install intuitiveness[embeddings]` | + sentence-transformers | Semantic matching |
| `pip install intuitiveness[discovery]` | + requests | Data.gouv.fr search |
| `pip install intuitiveness[app]` | + Streamlit, Plotly | Full web application |
| `pip install intuitiveness[all]` | All optional features (`app`, `discovery`, `embeddings`, `neo4j`, `quality`); excludes `dev` | Complete feature set |
| `pip install intuitiveness[dev]` | + pytest, ruff, mypy | Development tools |

---

## 📚 Documentation

- **GitHub Repository**: [ArthurSrz/intuitiveness](https://github.com/ArthurSrz/intuitiveness)
- **Research Paper**: [Intuitiveness as the Next Stage of Open Data](https://zenodo.org/badge/latestdoi/685140191)
- **Scientific Article**: See the `scientific_article/` directory in the repository for the peer-reviewed methodology

### Prerequisites for Full Features

#### Neo4j Database (optional)
```bash
docker run -d --name neo4j -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  -e NEO4J_PLUGINS='["apoc"]' \
  neo4j:latest
```

#### HuggingFace Token for NL Queries (optional)
Set the `HF_TOKEN` environment variable:
```bash
export HF_TOKEN="your_token_here"
```

Or add it to `.streamlit/secrets.toml`:
```toml
HF_TOKEN = "your_token_here"
```
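
If you want to check from your own code that the token is visible before running natural-language queries, a small helper like the one below can be used. It is a sketch only: `get_hf_token` is a hypothetical name, and it reads the environment variable alone, not `.streamlit/secrets.toml`.

```python
import os
from typing import Optional


def get_hf_token() -> Optional[str]:
    """Return the HF_TOKEN environment variable, or None if unset or blank."""
    token = os.environ.get("HF_TOKEN", "").strip()
    return token or None


if get_hf_token() is None:
    print("HF_TOKEN is not set; natural-language queries may be unavailable.")
```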

---

## 🧪 Example Use Cases

### 1. Data Scientist: Quick Quality Check
```python
import pandas as pd
from intuitiveness import apply_all_suggestions, assess_quality

df = pd.read_csv("messy_data.csv")
report = assess_quality(df, target_column="target")

if report.usability_score < 60:
    print("⚠️ Low quality dataset - applying suggestions...")
    improved_df = apply_all_suggestions(df, report.suggestions)
    # Re-assess against the same target column so the scores are comparable
    improved = assess_quality(improved_df, target_column="target")
    print(f"✅ Usability score after suggestions: {improved.usability_score:.0f}")
```

### 2. Analyst: Finding Core Insights
```python
import pandas as pd
from intuitiveness import Level4Dataset, descend, ComplexityLevel

# Strip away complexity
l4 = Level4Dataset({"raw": pd.read_csv("sales_2024.csv")})
l0 = descend(l4, ComplexityLevel.LEVEL_0,
             operation="sum",
             target_column="revenue")
print(f"Total Revenue: ${l0.data['total_revenue']:,.2f}")
```

### 3. Researcher: Building Custom Datasets
```python
from intuitiveness import Level0Dataset, ascend, ComplexityLevel

# Start from core truth
l0 = Level0Dataset({"gdp_growth": 2.3})

# Add relevant dimensions for YOUR research question
l3 = ascend(l0, ComplexityLevel.LEVEL_3,
            enrichments=["country", "sector", "quarter"])

# Export for modeling
l3.export("research_dataset.csv")
```

---

## 🏆 Acknowledgments

Part of the [Dataflow](https://dataflow.hypotheses.org/) research project.

**Funded by**:
- [Datactivist](https://datactivist.coop/)
- UNESCO Chair in AI and Data Science for Society

**Designed by**: Arthur Sarazin & Mathis Mourey

---

## 📄 License

MIT License - see [LICENSE](LICENSE) file for details.

Copyright (c) 2024-2025 Arthur Sarazin & Mathis Mourey

---

## 📖 Citation

If you use Intuitiveness in your research, please cite:

```bibtex
@software{intuitiveness2024,
  author = {Sarazin, Arthur and Mourey, Mathis},
  title = {Intuitiveness: Purpose-Built Dataset Transformation},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/ArthurSrz/intuitiveness},
  note = {Latest DOI: https://zenodo.org/badge/latestdoi/685140191}
}
```

---

<p align="center">
  <sub>Built with ❤️ for better data science</sub>
</p>
