Metadata-Version: 2.4
Name: C2REG1
Version: 0.1.0
Summary: A minimal linear regression pipeline providing six functions for data inspection, type adjustment, listwise cleaning, OLS fitting, parameter export, and diagnostics.
Author-email: Dr Subbiah <msubbiah@cepheus.in>
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Requires-Dist: pandas>=1.3
Requires-Dist: numpy>=1.19
Requires-Dist: statsmodels>=0.12
Requires-Dist: scipy>=1.5
Dynamic: license-file

# C2REG1: A Structured Linear Regression Utility for Python

C2REG1 provides a clean and predictable workflow for linear regression analysis.

It is designed for analysts who prefer a transparent, step-wise approach to statistical modelling: inspecting the data, converting variable types, checking missing values, fitting OLS models, and generating compact diagnostic tables.

The package is intentionally minimal, without hidden transformations, and is suitable for teaching, research, and production-quality statistical reporting.

C2REG1 guides users through:

1. Dataset structure review
2. Variable type conversion
3. Missing-value summary and listwise deletion
4. OLS model fitting
5. Compact model summary
6. Diagnostics table with CIs, t-tests, and p-values

Each step is a simple, transparent function that can be called independently.

---

## Main Functions

* `print_dataset_structure()`
* `convert_variable_types()`
* `mv_summary_and_listwise_deletion()`
* `fit_ols()`
* `specific_output()`
* `summary_table()`

## 🔧 Installation

Install from PyPI:


    pip install C2REG1


---

## 📘 Function Reference: C2REG1 Linear Regression Workflow

C2REG1 provides a structured, six-step linear regression workflow. Each function is intentionally minimal, predictable, and works on standard pandas DataFrames.

### 1️⃣ print_dataset_structure(df, max_rows_preview=5)

**Purpose**

Provides a compact structural summary of a dataset, similar to R's `str()` or SAS `PROC CONTENTS`.

**Inputs**

| Argument | Type | Description |
|---|---|---|
| `df` | DataFrame | Raw dataset to inspect |
| `max_rows_preview` | int | Number of rows to preview (default 5) |

**Output**

Prints column names, dtypes, non-null counts, and a small data preview.

### 2️⃣ convert_variable_types(df, conversions)

**Purpose**

User-controlled conversion of selected variables into desired data types.

**Inputs**

| Argument | Type | Description |
|---|---|---|
| `df` | DataFrame | Input DataFrame |
| `conversions` | dict | Mapping of column name to target type (`'float'`, `'int'`, `'category'`, `'str'`, `'bool'`) |

**Output**

Returns a modified copy of the DataFrame with updated types.
Prints a warning for each requested variable that is not present.
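
As a rough illustration of the behaviour described above, the sketch below does the same kind of conversion with plain pandas. The helper `convert_types_sketch` and the column `parity` are hypothetical names used only for this example; this is not the package's actual implementation.

```python
import pandas as pd

def convert_types_sketch(df, conversions):
    """Hypothetical stand-in: return a copy with the requested dtypes applied."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for col, target in conversions.items():
        if col not in out.columns:
            # Mirror the documented behaviour: warn and skip missing columns
            print(f"Warning: '{col}' not found; skipped.")
            continue
        out[col] = out[col].astype(target)
    return out

raw = pd.DataFrame({"matage": ["25", "31", "19"], "ht": ["yes", "no", "no"]})
typed = convert_types_sketch(
    raw, {"matage": "int", "ht": "category", "parity": "int"}
)
print(typed.dtypes)
```

Because the function works on a copy, `raw` keeps its original string columns, which makes it easy to compare before and after.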

### 3️⃣ mv_summary_and_listwise_deletion(df, variables_to_consider=None)

**Purpose**

Summarizes missing values and performs listwise deletion across selected variables.

**Inputs**

| Argument | Type | Description |
|---|---|---|
| `df` | DataFrame | Data after type conversions |
| `variables_to_consider` | list or None | Variables to check for missing values. If None, all columns are used. |

**Outputs**

Returns:

* `df_clean`: dataset after listwise deletion
* `summary_dict`: counts of missing values, rows removed, and sample deleted row indices

Also prints a human-readable missing-value summary.
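
For readers unfamiliar with the term, listwise deletion drops every row that has a missing value in any of the variables under consideration. A minimal pandas sketch of the idea follows; the toy data and variable names are assumptions for illustration, and the package's own summary format may differ.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "bweight": [3100.0, np.nan, 2900.0, 3300.0],
    "matage": [25, 31, np.nan, 28],
    "sex": ["male", "female", "male", None],
})

# Only these variables are checked; a missing 'sex' does not remove a row
subset = ["bweight", "matage"]

missing_counts = df[subset].isna().sum()          # per-variable missing counts
df_clean = df.dropna(subset=subset)               # listwise deletion
removed_index = df.index.difference(df_clean.index)

print(missing_counts)
print("rows removed:", len(removed_index), list(removed_index))
```

Restricting the check to the model variables avoids discarding rows because of gaps in columns the regression never uses.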

### 4️⃣ fit_ols(df, dependent, independents, add_intercept=True, NumDigits=6)

**Purpose**

Fits an OLS regression using numeric variables and encoded categorical variables.
Prints summaries, ANOVA, and parameter tables.

**Inputs**

| Argument | Type | Description |
|---|---|---|
| `df` | DataFrame | Cleaned dataset |
| `dependent` | str | Name of the dependent variable |
| `independents` | list | List of predictors (numeric or categorical) |
| `add_intercept` | bool | Whether to add an intercept column (default True) |
| `NumDigits` | int | Rounding precision for printed output |

**Outputs**

Returns:

* `results`: statsmodels OLS result object
* `summary_stats`: dict with ANOVA, coefficient table, RMSE, R², Adj-R², CV %, etc.

Also prints:

* OLS fit summary
* Coefficient table
* Manual ANOVA table

### 5️⃣ specific_output(results, dependent, model_label='Model1', NumDigits=6)

**Purpose**

Creates a compact, single-row table containing coefficients and metadata.

**Inputs**

| Argument | Type | Description |
|---|---|---|
| `results` | OLS results | Output of `fit_ols()` |
| `dependent` | str | Dependent variable name |
| `model_label` | str | Label for the model (default `"Model1"`) |
| `NumDigits` | int | Rounding precision for parameters |

**Output**

Returns a single-row DataFrame containing parameters, RMSE, and metadata (`_MODEL_`, `_TYPE_`, `_DEPVAR_`).


### 6️⃣ summary_table(results, alpha=0.05, NumDigits=6)

**Purpose**

Computes diagnostic statistics: standard errors, t-values, p-values, and CI bounds.

**Inputs**

| Argument | Type | Description |
|---|---|---|
| `results` | OLS results | Model results from `fit_ols()` |
| `alpha` | float | Significance level for the confidence intervals (default 0.05, giving 95% CIs) |
| `NumDigits` | int | Rounding precision for printed diagnostics |

**Output**

Returns a DataFrame with:

* estimate
* stderr
* t-statistic
* p-value
* lower/upper CI bounds

Prints a rounded version of the diagnostics table.

---

## 🚀 Quick Example

```python
import pandas as pd
import numpy as np
import C2REG1 as c2r1


# Sample dataset
df = pd.DataFrame({
    "bweight": np.random.normal(3000, 600, size=200),
    "matage": np.random.randint(18, 40, size=200),
    "ht": np.random.choice(["yes", "no"], size=200),
    "sex": np.random.choice(["male", "female"], size=200)
})

# Step 1: Inspect structure
c2r1.print_dataset_structure(df)

# Step 2: Adjust variable types
df = c2r1.convert_variable_types(df, {"ht": "category", "sex": "category"})

# Step 3: Missing-value summary & listwise deletion
# By default, all columns are considered if variables_to_consider is not provided
df_clean, mv_summary = c2r1.mv_summary_and_listwise_deletion(df)

# Or, optionally, specify a subset of variables
# df_clean, mv_summary = c2r1.mv_summary_and_listwise_deletion(df, ["bweight", "matage", "ht", "sex"])

# Step 4: Fit OLS model
results, stats = c2r1.fit_ols(
    df_clean,
    dependent="bweight",
    independents=["matage", "ht", "sex"],
    NumDigits=3,
)

# Step 5: Export parameter estimates
outest_df = c2r1.specific_output(
    results, dependent="bweight", model_label="BW_Model1", NumDigits=4
)

# Step 6: Diagnostics table (stderr, t, p-value, CI)
diag_df = c2r1.summary_table(results, alpha=0.05, NumDigits=4)
```

---

## 📘 Notes

* C2REG1 does **not** perform automatic transformations (e.g., log, squared, interaction terms).
  Users should create any derived variables manually in their DataFrame before fitting.
* Categorical variables are automatically dummy-encoded during regression.
* The output includes:

  * ANOVA
  * RMSE
  * R² and Adjusted R²
  * Coefficient table
  * OUTEST-like export
  * Diagnostic table with confidence intervals
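
Since derived variables are left to the user, the following sketch shows one way to create a log term, a squared term, and an interaction term with pandas before fitting. The column `gestwks` and the new column names are hypothetical and chosen only for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "bweight": [3100.0, 2950.0, 3400.0],
    "matage": [25, 31, 19],
    "gestwks": [39.0, 38.0, 40.0],   # hypothetical extra predictor
})

# Derived variables are ordinary columns added before model fitting
df["log_bweight"] = np.log(df["bweight"])               # log transform
df["matage_sq"] = df["matage"] ** 2                     # squared term
df["matage_x_gestwks"] = df["matage"] * df["gestwks"]   # interaction term

print(df)
```

The new columns can then be named in `independents` (or `dependent`) like any other variable.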

---

## 📄 License

C2REG1 is open-source under the MIT License.


---
