Metadata-Version: 2.4
Name: ngpu
Version: 0.1.0
Summary: Hierarchical polling unit intelligence for Nigeria (ML-ready)
Author-email: "Chidiebere V. Christopher" <vchidiebere.vc@gmail.com>, MaryBlessing Umeh <umaryblessing@gmail.com>
Project-URL: Homepage, https://github.com/93Chidiebere/ngpu
Project-URL: Issues, https://github.com/93Chidiebere/ngpu/issues
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=1.5
Requires-Dist: scikit-learn>=1.2
Requires-Dist: rapidfuzz>=3.0
Dynamic: license-file

# Nigerian Polling Units

**ngpu** is a Python library (also usable from R via reticulate) that provides hierarchical intelligence over
Nigeria’s polling unit administrative structure (states, LGAs, wards, and polling units), designed
for machine learning, analytics, and research workflows.

## Features

- Programmatic access to States, LGAs, Wards, and Polling Units
- ML-ready DataFrame export
- Hierarchical categorical encoders
- Coverage and bias diagnostics
- Fuzzy search for noisy text inputs

## Installation

For Python users, run in a terminal (in a Jupyter notebook, use `%pip install ngpu`):
```bash
pip install ngpu
```

For R users, run in a terminal or Anaconda Prompt:
```bash
conda create -n ngpu-r -c conda-forge python=3.10 ngpu r-base r-reticulate pandas scikit-learn
```
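This creates a conda environment named `ngpu-r` containing:

- Python 3.10
- ngpu
- R (`r-base`)
- reticulate (`r-reticulate`)
- ML dependencies (pandas, scikit-learn)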


## Quick Start

```python
from ngpu import Index, to_dataframe

states = Index.states()
lgas = Index.lgas("Anambra")

df = to_dataframe()
```

## Connecting R to ngpu

Open R or RStudio.
```r
library(reticulate)

use_condaenv("ngpu-r", required = TRUE)

ngpu <- import("ngpu")
```

## Machine Learning Usage

```python
from ngpu.ml.encoder import PollingUnitEncoder

encoder = PollingUnitEncoder(level="ward")
X = encoder.fit_transform(df)
```

## Coverage Diagnostics

```python
from ngpu.ml.diagnostics import CoverageReport

report = CoverageReport(df)
report.coverage_by_state()
```
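What `coverage_by_state()` computes isn't spelled out here. As a rough sketch of one plausible coverage metric (not necessarily ngpu's), the hypothetical helper `lga_coverage_by_state` below assumes a reference table with `state` and `lga` columns, like the one `to_dataframe()` appears to return, and reports the share of each state's LGAs that actually appear in your own dataset:

```python
import pandas as pd


def lga_coverage_by_state(reference: pd.DataFrame, observed: pd.DataFrame) -> pd.Series:
    """Share of each state's reference LGAs that appear in `observed`.

    Hypothetical helper, not part of ngpu. Both frames need `state` and `lga` columns.
    """
    ref_pairs = reference[["state", "lga"]].drop_duplicates()
    obs_pairs = observed[["state", "lga"]].drop_duplicates()
    # Left join keeps every reference LGA; the indicator marks which ones were observed
    merged = ref_pairs.merge(obs_pairs, on=["state", "lga"], how="left", indicator=True)
    merged["covered"] = merged["_merge"] == "both"
    return merged.groupby("state")["covered"].mean()


# Toy stand-in for ngpu.to_dataframe(); the real table is far larger
reference = pd.DataFrame({
    "state": ["Lagos", "Lagos", "Kano", "Kano"],
    "lga": ["Ikeja", "Epe", "Gwale", "Dala"],
})
survey = pd.DataFrame({
    "state": ["Lagos", "Kano", "Kano"],
    "lga": ["Ikeja", "Gwale", "Dala"],
})

print(lga_coverage_by_state(reference, survey))
```

Here Lagos scores 0.5 because Epe never appears in the survey; low shares point at regions your data under-represents.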

## Use Cases

- Regional ML feature engineering
- Bias and coverage analysis
- Socio-economic modeling
- Civic tech and policy research
- Any domain requiring stable Nigerian administrative anchors

## In Python

List all states:
```python
from ngpu import Index

states = Index.states()
print(states)
```
⬆ Validate categorical values in incoming datasets.
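A minimal sketch of that validation step, assuming `Index.states()` returns a list of state-name strings. A short hard-coded list stands in for it so the example runs without ngpu, and `find_invalid` is a hypothetical helper:

```python
def find_invalid(values, valid):
    """Return the distinct values not present in `valid`, in first-seen order."""
    valid_set = set(valid)
    seen = set()
    invalid = []
    for v in values:
        if v not in valid_set and v not in seen:
            seen.add(v)
            invalid.append(v)
    return invalid


# Assumption: in practice this list would come from Index.states();
# a short hard-coded subset is used so the sketch runs without ngpu.
valid_states = ["Abia", "Enugu", "Kano", "Lagos"]
incoming = ["Lagos", "lagos", "Kano", "Enugu State", "Lagos"]

print(find_invalid(incoming, valid_states))
```

Matching is exact and case-sensitive, so values like `lagos` or `Enugu State` are flagged; normalize casing and suffixes first if that is too strict.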


Get LGAs for a state:
```python
lgas = Index.lgas("Enugu")
print(lgas)
```
⬆ Fill missing LGA values during data cleaning.
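Before filling or validating LGAs, noisy spellings usually need to be mapped onto the canonical list. ngpu lists fuzzy search as a feature, but its API isn't shown in this README, so the sketch below swaps in the standard library's `difflib.get_close_matches`. The `canonicalize` helper and the LGA list (a stand-in for `Index.lgas("Enugu")` output) are assumptions for illustration:

```python
from difflib import get_close_matches


def canonicalize(raw, candidates, cutoff=0.8):
    """Map a noisy name to its closest canonical spelling, or None if nothing is close enough."""
    # Compare case-insensitively, but return the canonical casing
    lookup = {c.lower(): c for c in candidates}
    match = get_close_matches(raw.strip().lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[match[0]] if match else None


# Hypothetical stand-in for Index.lgas("Enugu"); the real list is longer
enugu_lgas = ["Awgu", "Enugu East", "Enugu North", "Enugu South", "Nsukka", "Udi"]

for raw in ["  nsuka", "Enugu Nort", "Awgu", "Abuja"]:
    print(repr(raw), "->", canonicalize(raw, enugu_lgas))
```

Returning `None` below the cutoff (as for `Abuja`) leaves genuinely unknown values for manual review instead of forcing a wrong match.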


Get wards and polling units:
```python
wards = Index.wards("Enugu", "Awgu")
pus = Index.polling_units("Enugu", "Awgu", "Ward 1")
```
⬆ Hierarchical drill-down analysis.
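Drill-down usually means counting at one level, then filtering to one branch and counting a level lower. The sketch below uses pandas `groupby` on a toy table that assumes the flat `state`/`lga`/`ward`/`polling_unit` layout used elsewhere in this README, with one row per polling unit:

```python
import pandas as pd

# Toy stand-in for ngpu.to_dataframe(); one row per polling unit (assumed layout)
df = pd.DataFrame({
    "state": ["Enugu"] * 5,
    "lga": ["Awgu", "Awgu", "Awgu", "Udi", "Udi"],
    "ward": ["Ward 1", "Ward 1", "Ward 2", "Ward 1", "Ward 1"],
    "polling_unit": ["PU 001", "PU 002", "PU 001", "PU 001", "PU 002"],
})

# Level 1: polling units per LGA. Count rows, not unique names,
# because labels like "PU 001" repeat across wards.
pus_per_lga = df.groupby(["state", "lga"]).size()

# Level 2: drill into one LGA and count polling units per ward
awgu = df[(df["state"] == "Enugu") & (df["lga"] == "Awgu")]
pus_per_ward = awgu.groupby("ward").size()

print(pus_per_lga)
print(pus_per_ward)
```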


Converting to a DataFrame:
```python
import ngpu

df = ngpu.to_dataframe()
df.head()
```
Typical uses:

- Join with survey data
- Join with transaction data
- Feature engineering pipelines
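When joining survey data, merging respondents directly onto the full polling-unit table would duplicate each respondent once per polling unit. A sketch of one way around that, assuming the flat layout used in this README: collapse the reference to one row per LGA first, then join with pandas' `validate="many_to_one"` check. The survey table and derived feature names are hypothetical:

```python
import pandas as pd

# Toy stand-in for ngpu.to_dataframe(): one row per polling unit (assumed layout)
ngpu_df = pd.DataFrame({
    "state": ["Lagos", "Lagos", "Lagos", "Kano"],
    "lga": ["Ikeja", "Ikeja", "Epe", "Gwale"],
    "ward": ["Ward 1", "Ward 2", "Ward 1", "Ward 1"],
    "polling_unit": ["PU 001", "PU 001", "PU 001", "PU 001"],
})

# Hypothetical survey responses keyed by state and LGA
survey = pd.DataFrame({
    "respondent": [1, 2, 3],
    "state": ["Lagos", "Lagos", "Kano"],
    "lga": ["Ikeja", "Epe", "Gwale"],
})

# Collapse the reference to one row per LGA so the join cannot duplicate respondents
lga_features = (
    ngpu_df.groupby(["state", "lga"])
    .agg(n_wards=("ward", "nunique"), n_polling_units=("polling_unit", "count"))
    .reset_index()
)

# validate="many_to_one" raises MergeError if lga_features has duplicate keys
joined = survey.merge(lga_features, on=["state", "lga"], how="left", validate="many_to_one")
print(joined)
```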



#### Problem

Your dataset contains a `state` column but is missing `lga`, `ward`, and `polling_unit`.

#### Solution
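```python
import pandas as pd
import ngpu

data = pd.DataFrame({
    "state": ["Lagos", "Kano"]
})

# Full reference hierarchy (state, lga, ward, polling_unit)
ngpu_df = ngpu.to_dataframe()

# Left join: each input row is repeated once for every matching row in ngpu_df
enriched = data.merge(ngpu_df, on="state", how="left")
```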
#### Result:

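- Each state row is expanded into one row per LGA/ward/polling-unit combination in that state (a one-to-many join), so expect far more rows than you started with
- No manual mapping required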
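Because this join is one-to-many, it helps to check how much each record expanded and which records matched nothing (for example, a state name with a stray trailing space). A sketch using pandas' `indicator=True`, on toy data that assumes the same column layout; the `id` column is hypothetical:

```python
import pandas as pd

# Toy stand-in for ngpu.to_dataframe(): one row per polling unit (assumed layout)
ngpu_df = pd.DataFrame({
    "state": ["Lagos", "Lagos", "Kano"],
    "lga": ["Ikeja", "Epe", "Gwale"],
    "ward": ["Ward 1", "Ward 1", "Ward 1"],
    "polling_unit": ["PU 001", "PU 001", "PU 001"],
})

# "Lagos " has a trailing space; data["state"].str.strip() would fix it before merging
data = pd.DataFrame({"id": [1, 2, 3], "state": ["Lagos", "Kano", "Lagos "]})

enriched = data.merge(ngpu_df, on="state", how="left", indicator=True)

# Rows per input record: more than 1 means the record was expanded
rows_per_record = enriched.groupby("id").size()

# Records whose state found no match (their lga/ward/polling_unit are NaN)
unmatched = enriched.loc[enriched["_merge"] == "left_only", "id"].tolist()

print(rows_per_record)
print(unmatched)
```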

## Machine Learning
#### Hierarchical Encoding
```python
from ngpu.ml import PollingUnitEncoder

encoder = PollingUnitEncoder()

encoded = encoder.fit_transform(
    state="Lagos",
    lga="Ikeja",
    ward="Ward 1",
    polling_unit="PU 001"
)

print(encoded)
```
#### Use Case

- Converts the hierarchy into numeric features
- Compatible with scikit-learn models
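How `PollingUnitEncoder` works internally isn't described here. One reason a *hierarchical* encoding matters is that ward names like "Ward 1" repeat across LGAs, so encoding the ward column alone merges unrelated wards. The sketch below illustrates that idea with scikit-learn's `OrdinalEncoder` on full-path keys; it is an assumption-based illustration, not necessarily what ngpu does:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({
    "state": ["Lagos", "Lagos", "Kano"],
    "lga": ["Ikeja", "Epe", "Gwale"],
    "ward": ["Ward 1", "Ward 1", "Ward 1"],
})

# Full-path keys keep "Ward 1" in Ikeja distinct from "Ward 1" in Epe
paths = pd.DataFrame({
    "state": df["state"],
    "lga_path": df["state"] + "/" + df["lga"],
    "ward_path": df["state"] + "/" + df["lga"] + "/" + df["ward"],
})

# Categories unseen during fit map to -1 instead of raising at transform time
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
X = encoder.fit_transform(paths)
print(X)
```

Encoding `df["ward"]` directly would give all three rows the same code; the path keys give three distinct ward codes.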

## ML Pipeline
```python
import ngpu
from ngpu.ml import PollingUnitEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

ngpu_df = ngpu.to_dataframe()

X = ngpu_df[["state", "lga", "ward"]]
y = ngpu_df["polling_unit"]

encoder = PollingUnitEncoder()
X_encoded = encoder.fit_transform_df(X)

# Fixed random_state makes the split and the model reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X_encoded, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
```
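An alternative that keeps encoding inside a single scikit-learn `Pipeline`, so the identical transformation runs at fit and predict time, is a `ColumnTransformer` with `OneHotEncoder`. This swaps in scikit-learn's encoder rather than ngpu's; the training rows and the binary label `y` are hypothetical:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data: administrative features plus a made-up binary label
X = pd.DataFrame({
    "state": ["Lagos", "Lagos", "Kano", "Kano", "Enugu", "Enugu"],
    "lga": ["Ikeja", "Epe", "Gwale", "Dala", "Awgu", "Udi"],
})
y = [1, 0, 1, 1, 0, 0]

pipeline = Pipeline([
    # One-hot encode both columns; categories unseen during fit become all-zero columns
    ("encode", ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), ["state", "lga"])]
    )),
    ("model", RandomForestClassifier(n_estimators=50, random_state=0)),
])

pipeline.fit(X, y)

# Surulere was not seen during training, so only the state column contributes
new_rows = pd.DataFrame({"state": ["Lagos"], "lga": ["Surulere"]})
print(pipeline.predict(new_rows))
```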

### Data Validation & Diagnostics
```python
from ngpu.ml import HierarchyDiagnostics

diag = HierarchyDiagnostics()

# Gwale is an LGA in Kano State, so this is an invalid combination
diag.validate(
    state="Lagos",
    lga="Gwale"
)
```
#### Returns:

- `True` → valid
- `False` → invalid (the state/LGA combination does not exist)

#### Use Case:

- Detect bad records before training
- Prevent invalid inference
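To check a whole dataset rather than one pair at a time, a vectorized sketch with a pandas merge can flag every row at once. The reference pairs are a hard-coded stand-in (in practice they could be derived from `to_dataframe()`), and `flag_invalid_pairs` is a hypothetical helper, not ngpu's `HierarchyDiagnostics`:

```python
import pandas as pd

# Hypothetical reference of valid (state, lga) pairs; in practice this could be
# ngpu.to_dataframe()[["state", "lga"]].drop_duplicates()
reference = pd.DataFrame({
    "state": ["Lagos", "Kano", "Enugu"],
    "lga": ["Ikeja", "Gwale", "Awgu"],
})

records = pd.DataFrame({
    "state": ["Lagos", "Lagos", "Kano"],
    "lga": ["Ikeja", "Gwale", "Gwale"],  # Gwale is in Kano, not Lagos
})


def flag_invalid_pairs(records, reference):
    """Add an `is_valid` column marking rows whose (state, lga) pair exists in `reference`."""
    # Deduplicate so the left join cannot multiply rows
    pairs = reference[["state", "lga"]].drop_duplicates()
    checked = records.merge(pairs, on=["state", "lga"], how="left", indicator=True)
    checked["is_valid"] = checked["_merge"] == "both"
    return checked.drop(columns="_merge")


result = flag_invalid_pairs(records, reference)
print(result)
```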




## In R
```r
states <- ngpu$Index$states()
print(states)
```
⬆ Validate state names in a dataset before analysis.

```r
lgas <- ngpu$Index$lgas("Lagos")
print(lgas)
```
⬆ Fill missing LGAs when only state information exists.

```r
df <- ngpu$to_dataframe()
df_r <- py_to_r(df)

head(df_r)
```
⬆ Join ngpu data with survey, transaction, or demographic datasets.

### Problem:

You have a dataset with a `state` column but missing `lga`, `ward`, and `polling_unit`.

### Solution:
```r
library(dplyr)

data <- data.frame(
  state = c("Lagos", "Kano")
)

ngpu_df <- py_to_r(ngpu$to_dataframe())

enriched <- data %>%
  left_join(ngpu_df, by = "state")

head(enriched)
```
Each state row is expanded into one row per matching LGA, ward, and polling unit (a one-to-many join).

### Machine Learning Usage
```r
encoder <- ngpu$PollingUnitEncoder()

encoded <- encoder$fit_transform(
  state = "Taraba",
  lga = "Jalingo",
  ward = "Ward 7",
  polling_unit = "PU 007"
)

print(encoded)
```
- Converts the categorical hierarchy into numeric, ML-ready features
- Usable in regression, classification, and clustering

### Use with caret/tidymodels
```r
library(caret)
library(dplyr)

ml_df <- ngpu_df %>%
  select(state, lga, ward) %>%
  mutate(across(everything(), as.factor))

# method = "rf" requires the randomForest package to be installed
model <- train(
  ward ~ .,
  data = ml_df,
  method = "rf"
)
```
- Predict missing administrative attributes
- Learn regional patterns


### Data Validation & Diagnostics
```r
diag <- ngpu$HierarchyDiagnostics()

# Gwale is an LGA in Kano State, so this is an impossible combination
diag$validate(
  state = "Lagos",
  lga = "Gwale"
)
```
#### Returns:

- `TRUE` → valid
- `FALSE` → impossible combination

#### Use case:

- Catch data errors before modeling
- Prevent garbage-in, garbage-out ML


## Common Errors & Fixes

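#### Error: ModuleNotFoundError

reticulate is not using the `ngpu-r` environment. Bind it before importing ngpu, in a fresh R session (reticulate cannot switch Python once it has been initialized):
```r
use_condaenv("ngpu-r", required = TRUE)
```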
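#### Error: Wrong Python

Check which Python interpreter reticulate is using:
```r
py_config()
```
Ensure the reported `python` path points to the `ngpu-r` environment. If it does not, restart R and call `use_condaenv("ngpu-r", required = TRUE)` before anything else.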


## Authors:

#### MaryBlessing Umeh, Software Engineer

#### Chidiebere V. Christopher, Data Scientist


## LICENSE

MIT

