Metadata-Version: 2.1
Name: smlb
Version: 0.3.0
Summary: Scientific Machine Learning Benchmark
Home-page: https://github.com/CitrineInformatics/smlb
Author: Matthias Rupp
Author-email: mrupp@mrupp.io
License: Apache-2.0
Platform: UNKNOWN
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.6.0
Description-Content-Type: text/markdown
Requires-Dist: numpy (<2,>=1.18.3)
Requires-Dist: scipy (<2,>=1.4.1)
Requires-Dist: pandas (<2,>=1.0.3)
Requires-Dist: matplotlib (<4,>=3.2.1)
Requires-Dist: scikit-learn (<0.23,>=0.22.2)
Provides-Extra: lolo
Requires-Dist: lolopy (<2,>=1.0.4) ; extra == 'lolo'
Requires-Dist: py4j (<0.11,>=0.10.9) ; extra == 'lolo'
Provides-Extra: optional
Requires-Dist: py4j (<0.11,>=0.10.9) ; extra == 'optional'
Requires-Dist: pymatgen (>=2020.4.2) ; extra == 'optional'
Requires-Dist: matminer (<0.7,>=0.6.2) ; extra == 'optional'


[![Build Status](https://api.travis-ci.org/CitrineInformatics/smlb.svg?branch=master)](https://travis-ci.org/github/CitrineInformatics/smlb)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

# Scientific Machine Learning Benchmark (smlb)

## Introduction

`smlb` is a toolbox focused on enabling rigorous empirical assessments of data-driven modeling approaches for applications in the natural sciences.
It is particularly useful when developing or fine-tuning data-driven algorithms to ensure statistically sound decisions.
Its focus is on models for experimental and computed properties of molecules and materials.
It emphasizes correctness, flexibility, and domain support.

`smlb` was designed to help answer questions that arise during the development of domain-specific machine-learning models.
Examples of such questions include

* Which of these uncertainty estimate approaches most closely matches the true error distribution? 
* How does removing slow or failing features affect the predictive accuracy of my model? 

`smlb` provides

* ready-to-use synthetic, computational and experimental datasets
* bindings to other software, including domain-specific features and general machine-learning packages
* standard loss functions and error metrics, also for predictive distributions (uncertainties)
* reproducibility  by systematic control of pseudo-random number generation

Other uses include integration tests to ensure that local changes to a modeling pipeline do not have overall adverse effects.

See the [Overview](docs/overview.md) for a more detailed description.

## Getting started

To get started, follow [installation](docs/installation.md) instructions and run the [tutorial](docs/tutorial.ipynb).

## Other

To contribute, see the [Contributing](docs/contributing.md) instructions.

[Related work](docs/related.md)<br>
[Acknowledgments](docs/acknowledgments.md)


