Metadata-Version: 2.1
Name: honcaml
Version: 0.2.1
Summary: Holistic and No Code Auto Machine Learning
Author-email: Joan Erráez <joan.erraez@eurecat.org>, Xavier de Juan <xavier.dejuan@eurecat.org>, Jordi Casals <jordi.casalsg@eurecat.org>, Marina Rosell <marina.rosellg@eurecat.org>, Cristina Soler <cristina.soler@eurecat.org>, Cirus Iniesta <cirus.iniesta@eurecat.org>, Luca Piras <luca.piras@eurecat.org>
Maintainer-email: Applied Machine Learning <aml@eurecat.org>
License: BSD License
        
        Copyright (c) 2022, Eurecat Centre Tecnològic de Catalunya
        All rights reserved.
        
        Redistribution and use in source and binary forms, with or without modification,
        are permitted provided that the following conditions are met:
        
        * Redistributions of source code must retain the above copyright notice, this
          list of conditions and the following disclaimer.
        
        * Redistributions in binary form must reproduce the above copyright notice, this
          list of conditions and the following disclaimer in the documentation and/or
          other materials provided with the distribution.
        
        * Neither the name of Eurecat nor the names of its
          contributors may be used to endorse or promote products derived from this
          software without specific prior written permission.
        
        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
        ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
        WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
        IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
        INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
        BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
        DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
        OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
        OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
        OF THE POSSIBILITY OF SUCH DAMAGE.
        
Project-URL: Homepage, https://github.com/Data-Science-Eurecat/HoNCAML
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: Unix
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: joblib
Requires-Dist: openpyxl
Requires-Dist: optuna
Requires-Dist: pandas
Requires-Dist: plotly
Requires-Dist: pyyaml
Requires-Dist: ray==2.0.0
Requires-Dist: ray[tune]
Requires-Dist: scikit-learn
Requires-Dist: streamlit==1.29
Requires-Dist: torch==2.0.1
Provides-Extra: check
Requires-Dist: flake8; extra == "check"
Provides-Extra: document
Requires-Dist: sphinx; extra == "document"
Provides-Extra: tests
Requires-Dist: pytest; extra == "tests"
Requires-Dist: pytest-cov; extra == "tests"

# HoNCAML

## Introduction

HoNCAML (Holistic No Code Automated Machine Learning) is a tool for running
automated machine learning pipelines, with a specific focus on finding the
best model and hyperparameters for the problem at hand.

Following the [no code
paradigm](https://en.wikipedia.org/wiki/No-code_development_platform), no
Python knowledge is needed. There are two ways to define pipelines:

* Through the Graphical User Interface
* Through [YAML](https://yaml.org/) configuration files

## Pipelines

HoNCAML provides three types of pipelines.

### Train

Train a specific model with the hyperparameters specified.

- Input: A dataset for the training.
- Output: The model object stored to disk.

### Predict

Use a model to generate predictions for a specific dataset.

- Input: A dataset for the test, together with a model object.
- Output: A tabular file with the predictions.

### Benchmark

Search for the best model and hyperparameters for the dataset at hand.

- Input: A dataset for the benchmark.
- Output: Mainly a configuration file with the best model and
  hyperparameters, plus a tabular file with the results for all tested
  configurations.

## Focus

HoNCAML has been designed with the following aspects in mind:

* Ease of use
* Modularity
* Extensibility
* Simpler is better

## Users

HoNCAML does not assume any technical knowledge, yet it is designed to be
extended by experts. Its user base may therefore range from:

* **Basic users**: Users with little programming experience and/or machine
  learning knowledge, who can still get results easily.

* **Advanced users**: Experts who can customize experiments to fit a
  specific use case.

## Support

HoNCAML supports a specific set of options for each of the following
concepts; nevertheless, by design, extending the library further should be not
only feasible, but intuitive.

### Data structure

For now, only tabular data is supported. HoNCAML also provides the following
preprocessing methods when needed:

* Normalization
* One hot encoding of categorical features
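
To make these two preprocessing steps concrete, here is a minimal pure-Python
sketch. It is an illustration only, not HoNCAML's implementation: the function
names `min_max_normalize` and `one_hot_encode` are hypothetical, and min-max
scaling is just one common form of normalization, assumed here for simplicity.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread to rescale; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(values):
    """Turn a categorical column into one 0/1 column per category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


ages = [20, 30, 40]
colors = ["red", "blue", "red"]

print(min_max_normalize(ages))  # [0.0, 0.5, 1.0]
print(one_hot_encode(colors))   # {'blue': [0, 1, 0], 'red': [1, 0, 1]}
```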

### Problem type

At this moment, the following types of problems are supported:

* Regression
* Classification

### Model type

Regarding available models, the following are supported:

* scikit-learn models (ML)
* PyTorch models (DL)

## Requirements

HoNCAML requires Python >= 3.10.
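
A script that wraps HoNCAML could check this requirement up front. The
following is a minimal sketch; `python_is_supported` is a hypothetical helper
and not part of HoNCAML.

```python
import sys

# Minimum interpreter version stated in the requirements above.
MIN_VERSION = (3, 10)


def python_is_supported(version_info=sys.version_info):
    """Return True if the given version meets the minimum requirement."""
    return tuple(version_info[:2]) >= MIN_VERSION


if not python_is_supported():
    print("HoNCAML requires Python >= 3.10")
```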

## Install

To install HoNCAML, run: `pip install honcaml`

## Command line execution

### Quick execution with example data

For a quick start with example data and configuration, run:

   ```commandline
   honcaml -e {example_directory}
   ```

This creates a directory containing sample data and configuration files that
show how HoNCAML works. Enter that directory with `cd {example_directory}` and
run one of the pipelines located in the *files* directory. For example, to run
a benchmark for a classification task:

   ```commandline
   honcaml -c files/classification_benchmark.yaml
   ```

### Standard execution

To run a particular pipeline, first generate a configuration file for it. The
easiest way to start is from a template, which the CLI itself provides.

If a basic configuration file with only the minimum required options is
enough, run:

   ```commandline
   honcaml -b {config_file} -t {pipeline_type}
   ```

On the other hand, there is the possibility of generating an advanced
configuration file, with all the supported options:

   ```commandline
   honcaml -a {config_file} -t {pipeline_type}
   ```

In both cases, ``{config_file}`` should be the path to the YAML configuration
file, and ``{pipeline_type}`` one of the supported types: train, predict or
benchmark.
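
If these commands need to be launched from a script, the argument list can be
assembled as sketched below. The flags `-a`, `-b` and `-t` are the ones shown
above; the helper `template_command` and the file name `config.yaml` are
hypothetical and used only for illustration.

```python
import shlex

# Pipeline types listed in this document.
PIPELINE_TYPES = ("train", "predict", "benchmark")


def template_command(config_file, pipeline_type, advanced=False):
    """Build the argument list that generates a configuration template."""
    if pipeline_type not in PIPELINE_TYPES:
        raise ValueError(f"Unsupported pipeline type: {pipeline_type!r}")
    flag = "-a" if advanced else "-b"
    return ["honcaml", flag, config_file, "-t", pipeline_type]


cmd = template_command("config.yaml", "train")
print(shlex.join(cmd))  # honcaml -b config.yaml -t train
# To actually run it (requires honcaml to be installed):
# subprocess.run(cmd, check=True)
```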

Once the configuration file is filled in, running the pipeline is just a
matter of executing:

   ```commandline
   honcaml -c {config_file}
   ```

For example, the following basic configuration would train a default model
for classification and store it.

   ```yaml
   global:
     problem_type: classification

   steps:
     data:
       extract:
         filepath: data/dataset.csv
         target: class
       transform:

     model:
       transform:
         fit:
       load:
         filepath: default_model.sav
   ```

## GUI execution

To run the HoNCAML GUI locally in a web browser tab, run the following command:

   ```commandline
   honcaml -g
   ```

The GUI lets you run HoNCAML by interactively selecting pipeline options; it
is also possible to run a pipeline by uploading its configuration file.

## Contribute

All contributions are more than welcome! For further information, please refer
to the [contribution
documentation](https://github.com/Data-Science-Eurecat/HoNCAML/blob/main/CONTRIBUTING.md).

## Bugs

If you find a bug, please check whether an existing
[issue](https://github.com/Data-Science-Eurecat/HoNCAML/issues) already covers
it, and if not, open a new one with a clear description.

## Contact

Should you have any inquiry regarding the library or its development, please
contact the [Applied Machine Learning team](mailto:aml@eurecat.org).
