Metadata-Version: 2.4
Name: swifteda
Version: 3.0.0
Summary: A lightweight open-source tool to automate the EDA pipeline from reading and cleaning to visualization.
Home-page: https://github.com/Nitroxium18/Swift_EDA
Author: Rohan Babaria
Author-email: rohanbabaria18@gmail.com
License: Apache-2.0
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: license
Requires-Dist: numpy
Requires-Dist: matplotlib
Requires-Dist: seaborn
Dynamic: author
Dynamic: author-email
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

SwiftEDA
SwiftEDA is a lightweight Python library that automates common data cleaning and exploratory data analysis (EDA) tasks for ML pipelines. It simplifies missing-value handling, imputation, outlier detection, visualization, and dataset reporting into just a few lines of code.

This is my first personal project and my contribution to the data science and ML community.

Features:

Phase 1 (V1)
Custom CSV reader (read_csv) with: Header parsing Row alignment (handles uneven rows) Data type inference (int, float, datetime, bool, str) Missing value report generator Drop columns with high missing values Impute missing values (mean, median, mode) Tabular display of cleaned dataset One-line wrapper: clean_df() for custom control over feature usage

Phase 2 (V2)
Restructured wrapper: Simple wrapper that encapsulates all helpers with kwargs for modular use. Multi-format support: Added read_json() alongside read_csv. (Excel deliberately excluded to avoid dependency bloat — save as CSV/JSON instead).

Summary statistics upgrade: Mean, median, mode, min, max, 25th percentile, 75th percentile, and IQR. Limiter function: Optional row limiter (limit=n) to prevent terminal flooding with large datasets. Wrapper help(): Callable help function that explains wrapper usage, parameters, and defaults. Improved logging: Cleaner and more descriptive status reporting.

Phase 3 (V3)
Outlier detection & handling: IQR and Z-score methods. Flexible options to flag or remove outliers.

Edge-case refinement: Numeric coercion for strings like "$1,200" or "3.5%". Protection for identifiers (ZIP codes, IDs, phone numbers).

Visualization helpers: Histograms, boxplots, scatter plots, line charts. Correlation heatmaps and category frequency plots. Flexible kwargs for matplotlib/seaborn under the hood.

Comprehensive HTML report: Dataset info, missing value summary (before & after). Summary statistics, outlier summary, and selected plots. Exportable with export_html_path="report.html".

Type re-check system: recheck_types=True automatically re-infers column types after imputation/casting. Skips protected identifier columns. Logs upgrades (e.g., "Age" str → float).

SwiftEDA is developed as a learning project and personal contribution to simplify EDA workflows. The Devlog contains detailed patch notes for each version, implementation details, and real-world test case reports. Pull requests are welcome! For major changes, please open an issue first to discuss your ideas.

Example Usage (V3) from Swift_EDA_V3_Final import clean_df

Clean a dataset with outlier handling, visualization, and HTML report
header, data, types = clean_df( "Titanic-Dataset.csv", drop_threshold=0.3, impute_strategy="median", outlier_method="iqr", outlier_action="flag", visualize=True, plots=[("hist", "Age"), ("scatter", "Age", "Fare"), ("heatmap", None)], summary=True, export_html_path="eda_report.html" )

print("Header:", header) print("Types:", types)

Python Version 3.12.7

License: Apache

Thank you for viewing Swift EDA! Contributions, collaborations & feedback welcome!
SwiftEDA is an open source learning project & I'd love to collaborate with others who are passionate about Data Science & Python tooling.
If you'd like to help improve future versions whether through ideas, testing or development or notice any bugs, please open an issue. If you've already made improvements, open a PR so we can review & merge them. I'd love to learn from your feedback. Every bit helps the project grow!
