Metadata-Version: 2.1
Name: dreamml
Version: 3.5.4
Summary: Framework for creating, running and validation of ML models on tabular data
Author: 'Nikita Buts, Nikita Varganov, Alexander Izyurov, Ivan Plotnikov, Maidari Tsydenov, Evgeny Tkachenko, Ilya Ivanov'
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.0
Requires-Dist: numpy<2.0,>=1.24.0
Requires-Dist: numba>=0.58.1
Requires-Dist: pandas<1.5,>=1.4.3
Requires-Dist: matplotlib>=3.3.4
Requires-Dist: pip>=22.1
Requires-Dist: seaborn>=0.12.1
Requires-Dist: scipy>=1.6.3
Requires-Dist: scikit-learn<1.2,>=1.0.2
Requires-Dist: ipython>=8.12.0
Requires-Dist: tqdm>=4.63.0
Requires-Dist: pyspark<3.4,>=3.2.1
Requires-Dist: catboost>=1.2.6
Requires-Dist: lightautoml>=0.3.8b1
Requires-Dist: lightgbm==3.2.1
Requires-Dist: bayesian-optimization==1.2.0
Requires-Dist: fairlearn==0.6.2
Requires-Dist: hyperopt==0.2.7
Requires-Dist: Jinja2==3.1.2
Requires-Dist: joblib==1.2.0
Requires-Dist: openpyxl==3.0.10
Requires-Dist: optuna==2.10.0
Requires-Dist: py4j==0.10.9.5
Requires-Dist: PyYAML==6.0
Requires-Dist: shap<0.45,>=0.42.1
Requires-Dist: XlsxWriter==1.4.3
Requires-Dist: xgboost>=2.0.3
Requires-Dist: workalendar==17.0.0
Requires-Dist: etna==2.7.1
Requires-Dist: Bottleneck<=1.4.0
Requires-Dist: prophet==1.1.5
Requires-Dist: statsmodels==0.13.5
Requires-Dist: colorama>=0.4.6
Requires-Dist: py-boost==0.4.3
Requires-Dist: autowoe>=1.3.0
Requires-Dist: ipynbname>=2023.2.0.0
Requires-Dist: pydantic>=2.0
Requires-Dist: requests>=2.0
Requires-Dist: scikit-multilearn==0.2.0
Requires-Dist: nltk==3.8.1
Requires-Dist: gensim==4.1.0
Requires-Dist: pymystem3==0.2.0
Requires-Dist: simhash==2.1.2
Requires-Dist: nlpaug==1.1.11
Requires-Dist: transformers==4.42.4
Requires-Dist: pytorch-lightning==2.2.5
Requires-Dist: evaluate==0.4.2
Requires-Dist: natasha==1.6.0
Requires-Dist: ruslingua==1.0.0
Requires-Dist: PyMultiDictionary==1.2.4
Requires-Dist: fasttext-wheel==0.9.2
Requires-Dist: sacremoses==0.1.1
Requires-Dist: sentencepiece==0.2.0
Requires-Dist: sentence-transformers==3.0.1
Requires-Dist: bertopic==0.16.3
Requires-Dist: skforecast==0.7.0
Requires-Dist: kaleido==0.2.1
Requires-Dist: reportlab>=3.6.12
Requires-Dist: botocore<1.35.44
Requires-Dist: notebook>=6.0.0
Requires-Dist: jupyterlab>=3.0.0
Provides-Extra: tests
Requires-Dist: pytest>=8.2.1; extra == "tests"
Requires-Dist: black>=24.4.2; extra == "tests"
Requires-Dist: notebook<7,>=6.5.7; extra == "tests"

## DreamML - Self Machine Learning ❤️

### The next stage in the evolution of DS-Template

![DreamML_promo](./docs/art/DreamML-promo.png)

### About DreamML

---
__DreamML__ is a machine learning framework designed for industrial use.
Its main goal is to select a simple model that balances complexity, quality, and metrics.
We also provide dedicated development reports for reviewing model quality and, for some tasks, a validation report built according to the central bank's methodology.
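
The complexity/quality trade-off described above can be illustrated with a small, self-contained sketch. This is not DreamML's actual selection logic: it assumes that complexity is measured by the number of features and quality by a single higher-is-better validation metric, and `Candidate` and `pick_simple_model` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """A trained candidate model summarised by its complexity and validation score (hypothetical)."""
    name: str
    n_features: int  # used here as a simple proxy for model complexity
    score: float     # validation metric where higher is better (e.g. ROC AUC)


def pick_simple_model(candidates: List[Candidate], tolerance: float = 0.01) -> Candidate:
    """Return the least complex candidate whose score is within `tolerance` of the best score."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    best_score = max(c.score for c in candidates)
    # Keep every model that is "good enough" compared with the best one.
    acceptable = [c for c in candidates if best_score - c.score <= tolerance]
    # Among those, prefer fewer features; break ties by the higher score.
    return min(acceptable, key=lambda c: (c.n_features, -c.score))


models = [
    Candidate("boosting_200_features", 200, 0.812),
    Candidate("boosting_40_features", 40, 0.806),
    Candidate("boosting_10_features", 10, 0.780),
]
print(pick_simple_model(models).name)  # boosting_40_features
```

A larger `tolerance` trades more quality for a simpler model; with `tolerance=0.0` the sketch always returns the best-scoring candidate.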

*This is the project's first open-source release cycle; we plan to publish more materials and keep improving the framework.*

---
### Installation

#### 📦 Python Package
```bash
pip install dreamml
```

#### 📂 Repository
Step 1: Install [Anaconda](https://docs.anaconda.com/anaconda/install/) or [Python 3.8](https://www.python.org/downloads/release/python-380/)

Step 2: Create an environment
* Anaconda: `conda create --name dreamml_env python=3.8`
* Python 3.8: `python -m venv dreamml_env`

Step 3: Activate the environment
* Anaconda: `conda activate dreamml_env`
* Python (Linux/macOS): `source dreamml_env/bin/activate`; on Windows: `dreamml_env\Scripts\activate`

Step 4: Clone the repository and go to the dreamml root folder
```bash
git clone https://gitverse.ru/dreamml/DreamML.git
cd DreamML
```
Step 5: Install dreamml into your environment in editable mode
```bash
pip install -e .
```
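
To confirm that the package or repository installation worked, you can run a quick check from the activated environment. This sketch uses only the standard library (`importlib.metadata`, available since Python 3.8); the helper names are illustrative and are not part of DreamML.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def in_virtualenv() -> bool:
    """True inside a `python -m venv` environment (conda environments are not detected this way)."""
    return sys.prefix != sys.base_prefix


print("Python:", sys.version.split()[0])
print("Inside venv:", in_virtualenv())
print("dreamml:", installed_version("dreamml") or "not installed")
```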

#### 🐳 Docker
```bash
git clone https://gitverse.ru/dreamml/DreamML.git
cd DreamML
docker build -t dreamml:v3.5.4 .
docker run -d -p 8888:8888 -v $(pwd):/app --name dreamml_container dreamml:v3.5.4
```
(!) If `$(pwd)` does not work (for example, in older versions of PowerShell), use the absolute path instead:
```bash
docker run -d -p 8888:8888 -v C:\path\to\DreamML:/app --name dreamml_container dreamml:v3.5.4
```
Then open http://localhost:8888 in your browser.
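
If the page does not load, a quick way to tell whether anything is listening on the mapped port is a plain TCP check. This is a generic standard-library sketch, assuming the default `-p 8888:8888` mapping from the commands above; it only checks that the port accepts connections, not that Jupyter inside the container is healthy.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8888, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError, timeout) if nothing answers.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("Port 8888 reachable:", port_is_open())
```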

### Getting started

---
To develop a model, use the notebooks in `notebooks/1. Model Development`
and pick the one that matches your task type.

To validate models, use the notebooks in `notebooks/2. Validate Model`.

To calibrate models, use the notebooks in `notebooks/3. Calibration`.


### How to Use

---

#### Development notebooks (`notebooks/1. Model Development`)

1. First, define the pipeline configuration:
   * For the `regression`, `binary`, `multiclass`, and `multilabel` tasks, see `docs/1_Model_Development_doc.md`
   * For the `topic_modeling` task, see `docs/1_Topic_Modeling_doc.md`
   * For the `timeseries` task (boosting models), see `docs/1_TimeSeries_doc.md`
   * For the `amts` task (Prophet), see `docs/1_AltModeTimeSeries_forecast.md`
   * If your dataset contains text features, see `docs/1_NLP_text_classification_doc.md`
   * To learn more about quality metrics and loss functions, see `docs/Binary_Classification_Metrics_doc.md`
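
The task-to-documentation mapping in step 1 can be captured in a small lookup helper, for example when scripting over several projects. The task names and paths are taken directly from the list; `docs_for_task` itself is a hypothetical helper and is not part of DreamML.

```python
from typing import Dict, List

# Task name -> documentation file, as listed in this README (paths relative to the repository root).
TASK_DOCS: Dict[str, str] = {
    "regression": "docs/1_Model_Development_doc.md",
    "binary": "docs/1_Model_Development_doc.md",
    "multiclass": "docs/1_Model_Development_doc.md",
    "multilabel": "docs/1_Model_Development_doc.md",
    "topic_modeling": "docs/1_Topic_Modeling_doc.md",
    "timeseries": "docs/1_TimeSeries_doc.md",
    "amts": "docs/1_AltModeTimeSeries_forecast.md",
}
TEXT_FEATURES_DOC = "docs/1_NLP_text_classification_doc.md"


def docs_for_task(task: str, has_text_features: bool = False) -> List[str]:
    """Return the documentation files to read before configuring a pipeline for `task`."""
    if task not in TASK_DOCS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TASK_DOCS)}")
    docs = [TASK_DOCS[task]]
    if has_text_features:
        docs.append(TEXT_FEATURES_DOC)
    return docs


print(docs_for_task("binary", has_text_features=True))
```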
   

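2. Next, build the configuration storage and prepare the data for modeling:
```python
# `config` holds the pipeline configuration defined in step 1.
# ConfigStorage and DataTransformer come from dreamml; see the development notebooks for the exact imports.
config_storage = ConfigStorage(config=config)
transformer = DataTransformer(config_storage)
data_storage = transformer.transform()  # prepared data used by the steps below
```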

3. Then run the modeling pipeline:
```python
# Run the modeling pipeline on the prepared data
pipeline = MainPipeline(config_storage=config_storage, data_storage=data_storage)
pipeline.transform()
```

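4. For some tasks, you can also add a LightAutoML (LAMA) model and calculate the out-of-time (OOT) potential:
```python
# Optional: LightAutoML model(s), merged into the pipeline's model dict in step 5
lama = add_lama_model(data_storage.get_eval_set(), config_storage)
# Optional: out-of-time potential metrics
oot_potential = calculate_oot_metrics(data_storage.get_eval_set(), config_storage)
```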
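
Out-of-time (OOT) evaluation means scoring a model on observations from a later period than the data it was trained on. The sketch below only illustrates how such a sample is commonly carved out by date with pandas; it is not how `calculate_oot_metrics` works internally, and the column names, values, and cut-off date are made up for the example.

```python
import pandas as pd


def split_out_of_time(df: pd.DataFrame, date_col: str, oot_start: str):
    """Split `df` into a development part and an out-of-time (OOT) part by date.

    Rows dated on or after `oot_start` form the OOT sample, which is kept away
    from training and used to estimate how the model performs on later data.
    """
    dates = pd.to_datetime(df[date_col])
    is_oot = dates >= pd.Timestamp(oot_start)
    return df.loc[~is_oot].copy(), df.loc[is_oot].copy()


data = pd.DataFrame({
    "report_date": ["2023-01-31", "2023-02-28", "2023-03-31", "2023-04-30"],
    "feature": [0.1, 0.4, 0.3, 0.9],
    "target": [0, 1, 0, 1],
})
dev, oot = split_out_of_time(data, "report_date", "2023-04-01")
print(len(dev), len(oot))  # 3 1
```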

5. If needed, save the modeling artifacts:
```python
saver = pipeline.artifact_saver
models = pipeline.prepared_model_dict
pipeline.oot_potential = oot_potential  # defined in step 4
models.update(lama)                     # defined in step 4
nb_name = saver.get_notebook_path_and_save()
saver.save_artifacts(
    models=models,
    other_models=pipeline.other_model_dict,
    encoder=transformer.cat_transformer,
    ipynb_name=nb_name,
    feature_threshold=config_storage.feature_threshold,
)
# Save the evaluation data together with the data dropped during preparation
saver.save_data(data=data_storage.get_eval_set(), dropped_data=data_storage.get_dropped_data())
```

6. Finally, generate a development report. By default, it is saved to the `dreamml/results` folder.
```python
get_report(pipeline=pipeline, config_storage=config_storage, data_storage=data_storage, encoder=transformer.cat_transformer)
```
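
After several runs the results folder can hold many outputs. A small standard-library sketch for locating the newest one, assuming the default `dreamml/results` location (resolved relative to the current working directory) and making no assumption about the report's file name or format; `latest_entry` is a hypothetical helper, not part of DreamML.

```python
from pathlib import Path
from typing import Optional


def latest_entry(folder: str = "dreamml/results") -> Optional[Path]:
    """Return the most recently modified file or sub-folder in `folder`, or None if it is empty or missing."""
    root = Path(folder)
    if not root.is_dir():
        return None
    entries = list(root.iterdir())
    if not entries:
        return None
    # Compare by modification time; the newest entry is most likely the latest report.
    return max(entries, key=lambda p: p.stat().st_mtime)


print(latest_entry())
```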

### Authors

---
| Author            | Email                     |
|-------------------|---------------------------|
| Nikita Buts       | nikitabuts2000@gmail.com  |
| Alexander Izyurov | halfbrick845@gmail.com    |
| Ivan Plotnikov    | com.gateway.api@gmail.com |
| Maidari Tsydenov  | maidaritsydenov@gmail.com |
| Evgeny Tkachenko  | e_t@inbox.ru              |
| Ilya Ivanov       | morwes4@gmail.com         |
| Nikita Varganov   | -                         |


### LICENSE

---
This project is licensed under the Apache License, Version 2.0. See [LICENSE](./LICENSE) for details.
