Metadata-Version: 2.4
Name: mlguardx
Version: 0.1.0
Summary: ML data validation, profiling, drift detection and data contracts framework
Home-page: https://github.com/asthapaikacse/mlguardian
Author: Astha Paika
Author-email: asthapaika647@gmail.com
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: scikit-learn
Requires-Dist: pyyaml
Dynamic: author-email
Dynamic: home-page
Dynamic: requires-python

# MLGuardian
- Smart Schema Inference
- Data Contracts
- Drift Detection
- Data Profiling


## utils.py
1. Tracks whether the data has changed
2. Detects schema changes
3. Reduces memory usage
4. Automatically detects the CSV format
5. Prepares datasets for production ML
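
Two of the utilities listed above, change tracking and CSV format detection, can be sketched with the standard library alone. This is a minimal illustration, not the package's actual code: the function names `sniff_csv_format` and `fingerprint` are hypothetical, and hashing the rows with SHA-256 is an assumed way of noticing that data changed.

```python
import csv
import hashlib
import io


def sniff_csv_format(sample):
    """Guess the delimiter and quoting of a CSV text sample (stdlib sniffer)."""
    return csv.Sniffer().sniff(sample, delimiters=",;\t|")


def fingerprint(rows):
    """Return a SHA-256 digest of the rows; a different digest means the data changed."""
    digest = hashlib.sha256()
    for row in rows:
        # Join cells with the ASCII unit separator, which rarely occurs in data.
        digest.update("\x1f".join(row).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


sample = "id;name;score\n1;ada;0.9\n2;bob;0.7\n"
dialect = sniff_csv_format(sample)
rows = list(csv.reader(io.StringIO(sample), dialect))
baseline = fingerprint(rows)

# Simulate a changed file: one score is edited.
changed_rows = [row[:] for row in rows]
changed_rows[1][2] = "0.8"
print(dialect.delimiter, baseline != fingerprint(changed_rows))
```

Storing the baseline digest next to the dataset would let a later run compare digests instead of comparing the data cell by cell.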

Why? These utilities guard against:
1. Schema changes
2. Data corruption
3. Memory overload

## Profiling script
An alternative to profiling a dataset by hand with pandas: it automatically analyzes the dataset's structure, quality, correlations, and drift, then produces warnings and actionable ML recommendations.
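
The text does not say which statistic the drift check uses. As one possible illustration, the sketch below computes the Population Stability Index (PSI) between a reference sample and a new sample. The choice of PSI, the function name `psi`, and the bin count are all assumptions made for this example.

```python
import math


def psi(reference, current, bins=4):
    """Population Stability Index between two numeric samples."""
    lo, hi = min(reference), max(reference)
    # Fall back to a width of 1.0 if the reference sample is constant.
    width = (hi - lo) / bins or 1.0

    def shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp values outside the reference range into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


reference = [1, 2, 3, 4, 5, 6, 7, 8]
same = psi(reference, reference)
shifted = psi(reference, [v + 4 for v in reference])
print(same, shifted)
```

An identical sample scores zero, and the score grows as the new sample's distribution moves away from the reference.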

## Schema.py
A schema management system for ML datasets. It does four main things:
- Automatically infers the dataset schema
- Detects semantic types (email, URL, UUID, etc.)
- Validates new data against the stored schema
- Handles schema evolution and drift
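
The first three steps above (infer, detect semantic types, validate) can be sketched as follows. This is a simplified stdlib illustration under stated assumptions: rows are plain dictionaries, and the names `infer_schema`, `semantic_type`, and `validate` as well as the regular expressions are hypothetical, not the package's API.

```python
import re
import uuid

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
URL_RE = re.compile(r"^https?://\S+$")


def is_uuid(text):
    try:
        uuid.UUID(text)
        return True
    except ValueError:
        return False


def semantic_type(values):
    """Return a semantic label if every non-null value is a string matching it."""
    texts = [v for v in values if v is not None]
    if not texts or not all(isinstance(v, str) for v in texts):
        return None
    if all(EMAIL_RE.match(v) for v in texts):
        return "email"
    if all(URL_RE.match(v) for v in texts):
        return "url"
    if all(is_uuid(v) for v in texts):
        return "uuid"
    return None


def infer_schema(rows):
    """Map each column to its Python type name and an optional semantic label."""
    schema = {}
    for column in rows[0]:
        values = [row.get(column) for row in rows]
        present = [v for v in values if v is not None]
        dtype = type(present[0]).__name__ if present else "unknown"
        schema[column] = {"dtype": dtype, "semantic": semantic_type(values)}
    return schema


def validate(rows, schema):
    """List human-readable violations of the stored schema."""
    errors = []
    for i, row in enumerate(rows):
        for column in schema.keys() - row.keys():
            errors.append(f"row {i}: missing column '{column}'")
        for column in row.keys() - schema.keys():
            errors.append(f"row {i}: unexpected column '{column}'")
        for column, spec in schema.items():
            value = row.get(column)
            if value is not None and type(value).__name__ != spec["dtype"]:
                errors.append(f"row {i}: '{column}' is not {spec['dtype']}")
    return errors


train = [
    {"user": "ada@example.com", "age": 36},
    {"user": "bob@example.com", "age": 41},
]
schema = infer_schema(train)
problems = validate([{"user": "eve@example.com", "age": "n/a"}], schema)
print(schema["user"]["semantic"], problems)
```

Because the schema is a plain dictionary, it could be written to JSON with `json.dump` and reloaded before validating a later batch.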


## Problems I faced when working on ML projects

1. The dataset schema changes and the model crashes
2. More NaNs or nulls appear and the model fails
3. Accuracy drops

What will this package do?

1. Automatically infers column types

2. Detects target column

3. Detects semantic types (email, UUID, etc.)

4. Stores min/max/mean

5. Saves schema to JSON/YAML

6. Validates new data against schema

7. Calculates null %

8. Detects high cardinality

9. Detects constant columns

10. Finds correlated features

11. Produces HTML report

12. Suggests recommendations
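
The profiling checks in items 7 to 10 can be sketched with the standard library as below. This is an illustrative example only: the names `profile` and `pearson`, the 0.9 cardinality threshold, and the sample rows are assumptions for this sketch, and the package itself may compute these checks differently.

```python
import math


def profile(rows, cardinality_ratio=0.9):
    """Compute null %, constant-column and high-cardinality flags per column."""
    report = {}
    for column in rows[0]:
        values = [row.get(column) for row in rows]
        present = [v for v in values if v is not None]
        distinct = len(set(present))
        report[column] = {
            "null_pct": 100.0 * (len(values) - len(present)) / len(values),
            "constant": distinct <= 1,
            # Flag columns where nearly every present value is unique.
            "high_cardinality": bool(present)
            and distinct / len(present) >= cardinality_ratio,
        }
    return report


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


rows = [
    {"id": "a1", "country": "IN", "height_cm": 150.0, "height_in": 59.0, "note": None},
    {"id": "a2", "country": "IN", "height_cm": 160.0, "height_in": 63.0, "note": "late"},
    {"id": "a3", "country": "IN", "height_cm": 170.0, "height_in": 67.0, "note": None},
    {"id": "a4", "country": "IN", "height_cm": 180.0, "height_in": 71.0, "note": "ok"},
]
report = profile(rows)
r = pearson([row["height_cm"] for row in rows], [row["height_in"] for row in rows])
print(report["note"]["null_pct"], report["country"]["constant"], round(r, 3))
```

In this sample, `country` is flagged as constant, `id` as high cardinality, and the two height columns are almost perfectly correlated, which is the kind of finding a recommendation step could turn into "drop one of these features".
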

## Installation
pip install mlguardx




