Metadata-Version: 2.4
Name: idmd
Version: 0.0.1
Summary: IDMD - Interactive Data Manipulator and Descriptor
Author: Bence Gercuj, Csongor Loránd Laczkó, Richárd Bence Rózsa
Maintainer-email: Bence Gercuj <gercuj.bence@hallgato.ppke.hu>, Csongor Loránd Laczkó <laczko.csongor.lorand@hallgato.ppke.hu>, Richárd Bence Rózsa <rozsa.richard.bence@hallgato.ppke.hu>
License-Expression: GPL-3.0-only
Project-URL: Homepage, https://github.com/CsongorLaczko/idmd
Project-URL: Documentation, https://idmd.readthedocs.io
Project-URL: Repository, https://github.com/CsongorLaczko/idmd.git
Project-URL: Issues, https://github.com/CsongorLaczko/idmd/issues
Classifier: Development Status :: 1 - Planning
Classifier: Programming Language :: Python :: 3.10
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: matplotlib==3.10.3
Requires-Dist: openpyxl==3.1.5
Requires-Dist: pandas==2.2.3
Requires-Dist: seaborn==0.13.2
Requires-Dist: streamlit==1.45.0
Dynamic: license-file

# Interactive Data Manipulator and Descriptor (IDMD)
## Overview

The **Interactive Data Manipulator and Descriptor (IDMD)** is a Python package for interactive data exploration, manipulation, visualization, and reporting. Its modular structure keeps dataset handling easy to extend and maintain. The package was developed as project work for the Scientific Python course at PPCU.

![idmd chart](images/idmd_chart.svg)

## Dependencies

The IDMD package requires the following dependencies:
- matplotlib
- openpyxl
- pandas
- seaborn
- streamlit

For specific version requirements, refer to the [requirements.txt](requirements.txt) file. If you are developing or contributing to the project, additional tools for code quality, formatting, linting, and testing are listed in [requirements-dev.txt](requirements-dev.txt).

---

## Package Structure

### `data` Module
Handles all data-related operations, such as file uploading, dataset generation, and exporting.

- **Submodules**:
  - [`exporter.py`](idmd/data/exporter.py): Handles exporting datasets to CSV or other formats.
  - [`generator.py`](idmd/data/generator.py): Generates sample datasets with different distributions.
  - [`uploader.py`](idmd/data/uploader.py): Handles file uploads and loading datasets.
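
As a rough illustration of what generating a sample dataset with different distributions can look like, the sketch below uses NumPy and pandas directly. The function name `make_sample_dataset` and its parameters are hypothetical and are not part of the IDMD API; refer to `generator.py` for the actual implementation.

```python
import numpy as np
import pandas as pd


def make_sample_dataset(n_rows: int = 100, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper (not the IDMD generator API): build synthetic data."""
    rng = np.random.default_rng(seed)  # seeded so repeated calls match
    return pd.DataFrame(
        {
            # One column per distribution
            "normal": rng.normal(loc=0.0, scale=1.0, size=n_rows),
            "uniform": rng.uniform(low=0.0, high=1.0, size=n_rows),
        }
    )


sample = make_sample_dataset(n_rows=50, seed=42)
```

Fixing the seed keeps the generated data reproducible, which is convenient when the dataset is used for testing.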

---

### `manipulation` Module
Provides functionality for manipulating datasets.

- **Submodules**:
  - [`columns.py`](idmd/manipulation/columns.py): Handles column-specific operations like swapping, dropping, and selecting columns.
  - [`replace.py`](idmd/manipulation/replace.py): Handles value-specific operations like replacing values with mean, median, or other methods.
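
The operations described above map closely onto plain pandas calls. The following is a minimal sketch under that assumption; the column names and variables are made up for illustration, and the IDMD modules may wrap these steps differently.

```python
import pandas as pd

# Toy data with one missing value in column "a" (illustrative only).
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [10, 20, 30], "c": ["x", "y", "z"]})

# Replace missing values in "a" with the column mean;
# df["a"].median() could be used in the same way.
filled = df.assign(a=df["a"].fillna(df["a"].mean()))

# Drop a column, then swap the order of the remaining two.
dropped = filled.drop(columns=["c"])
swapped = dropped[["b", "a"]]
```

Each step returns a new DataFrame, so the original `df` is left unchanged.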

---

### `ui` Module
Contains components for rendering the Streamlit interface.

- **Submodules**:
  - [`base.py`](idmd/ui/base.py): Defines the abstract `Component` class for all UI components.
  - [`columns_ui.py`](idmd/ui/columns_ui.py): Provides UI for column manipulation.
  - [`data_preview.py`](idmd/ui/data_preview.py): Displays a preview of the dataset.
  - [`data_stats.py`](idmd/ui/data_stats.py): Displays dataset statistics and metadata.
  - [`exporter_ui.py`](idmd/ui/exporter_ui.py): Provides UI for exporting data.
  - [`generator_ui.py`](idmd/ui/generator_ui.py): Provides UI for generating data.
  - [`replace_ui.py`](idmd/ui/replace_ui.py): Provides UI for replacing operations.
  - [`uploader_ui.py`](idmd/ui/uploader_ui.py): Provides UI for file uploading.
  - [`visualizer_ui.py`](idmd/ui/visualizer_ui.py): Provides UI for visualizing data, including default and custom plots.
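
To show how an abstract base class can tie such components together, here is a small sketch. Only the class name `Component` comes from the description above; the `render` method name, the `PreviewComponent` class, and the returned string are assumptions made for illustration, and the real definitions in `idmd/ui/base.py` may differ.

```python
from abc import ABC, abstractmethod


class Component(ABC):
    """Illustrative base class; the real one in idmd/ui/base.py may differ."""

    @abstractmethod
    def render(self) -> str:
        """Draw the component. The method name `render` is an assumption."""


class PreviewComponent(Component):
    """Hypothetical concrete component that previews a few rows."""

    def __init__(self, rows: list[dict]) -> None:
        self.rows = rows

    def render(self) -> str:
        # A real component would call Streamlit here (e.g. st.dataframe);
        # returning text keeps this sketch runnable without a running app.
        return f"Previewing {len(self.rows)} row(s)"


message = PreviewComponent([{"a": 1}, {"a": 2}]).render()
```

Because `render` is abstract, instantiating `Component` directly raises a `TypeError`, which forces every UI component to provide its own rendering logic.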

---

### `visualization` Module
Handles data visualization.

- **Submodules**:
  - [`plots.py`](idmd/visualization/plots.py): Generates various types of plots (e.g., line plots, bar plots).
  - [`heatmaps.py`](idmd/visualization/heatmaps.py): Generates correlation heatmaps.
  - [`histograms.py`](idmd/visualization/histograms.py): Generates histograms.
  - [`visualizer.py`](idmd/visualization/visualizer.py): Utility class for generating visualizations, including line plots, histograms, and heatmaps.
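
A correlation heatmap of the kind mentioned above can be sketched with pandas and matplotlib alone. This is not the code in `heatmaps.py`; the data, variable names, and colormap are illustrative choices.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Toy data: y rises with x, z falls with x.
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1]})
corr = df.corr()  # pairwise Pearson correlation matrix

fig, ax = plt.subplots()
image = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")

# Label both axes with the column names.
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(image, ax=ax)
```

In a Streamlit app, a figure like `fig` can be displayed with `st.pyplot(fig)`.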

---

### `report` Module
Handles report generation.

- **Submodules**:
  - [`report.py`](idmd/report/report.py): Generates PDF reports with data and visualizations.

---

### `app.py`
Orchestrates the integration of all components and runs the Streamlit application.

---

## Features

1. **Sample Dataset Generation**:
   - Generate datasets with different distributions (e.g., normal, uniform).
   - Easily create synthetic data for testing and exploration.

2. **Data Manipulation**:
   - Swap, drop, and select columns.
   - Replace missing values with mean, median, or other methods.
   - Normalize or remove outliers.

3. **Data Visualization**:
   - Generate interactive plots, histograms, and heatmaps.
   - Explore data visually with Streamlit's interactivity.

4. **Data Export**:
   - Export processed datasets to CSV format.

5. **Report Generation**:
   - Generate PDF reports with data summaries and visualizations.
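
As an illustration of the CSV export feature listed above, the sketch below serializes a DataFrame to CSV bytes with pandas and reads it back. The variable names are hypothetical, and the IDMD exporter may work differently.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Serialize without the index column and encode for download or storage.
csv_bytes = df.to_csv(index=False).encode("utf-8")

# Round trip: parse the bytes back into a DataFrame.
restored = pd.read_csv(io.BytesIO(csv_bytes))
```

Bytes produced this way can be passed as the `data` argument of Streamlit's `st.download_button`.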

---

## Example Usage

Run the example application using:

```bash
streamlit run example_app.py
```

You can also explore the interactive example notebook [example_app.ipynb](example_app.ipynb) that demonstrates:
- Package installation
- Creating a complete dashboard application
- Running the app locally or on Google Colab
- Using all major components of the package

Opening the resulting page in a browser should show a dashboard like this:
![Dashboard example without data](images/Dashboard_without_data.png)

After loading or generating data, the dashboard should look similar to this:
![Dashboard example with data 1](images/Dashboard_with_data_1.png)
![Dashboard example with data 2](images/Dashboard_with_data_2.png)

---

## Links

- [Homepage](https://github.com/CsongorLaczko/idmd)
- [Documentation](https://idmd.readthedocs.io)
- [Repository](https://github.com/CsongorLaczko/idmd.git)
- [Issues](https://github.com/CsongorLaczko/idmd/issues)
