Metadata-Version: 2.1
Name: dabl
Version: 0.3.1
Summary: Data Analysis Baseline Library
Author-email: Andreas Mueller <t3kcit+githubspam@gmail.com>
Maintainer-email: Andreas Mueller <t3kcit+githubspam@gmail.com>, Brian Kroth <bpkroth+githubspam@gmail.com>
License: Copyright (c) 2016, Vighnesh Birodkar
        All rights reserved.
        
        Redistribution and use in source and binary forms, with or without
        modification, are permitted provided that the following conditions are met:
        
        * Redistributions of source code must retain the above copyright notice, this
          list of conditions and the following disclaimer.
        
        * Redistributions in binary form must reproduce the above copyright notice,
          this list of conditions and the following disclaimer in the documentation
          and/or other materials provided with the distribution.
        
        * Neither the name of project-template nor the names of its
          contributors may be used to endorse or promote products derived from
          this software without specific prior written permission.
        
        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
        AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
        IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
        DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
        FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
        DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
        SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
        CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
        OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
        OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
        
Project-URL: Documentation, https://dabl.github.io/
Project-URL: Repository, https://github.com/dabl/dabl
Project-URL: Issues, https://github.com/dabl/dabl/issues
Keywords: data analysis,visualization
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Scientific/Engineering :: Visualization
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: scikit-learn>=1.1
Requires-Dist: pandas
Requires-Dist: seaborn
Requires-Dist: numpy<2.0; python_version < "3.9"
Requires-Dist: scikit-learn<1.4; python_version < "3.9"
Requires-Dist: matplotlib<3.8; python_version < "3.9"
Requires-Dist: scikit-learn>=1.3; python_version >= "3.9"
Requires-Dist: matplotlib>=3.8; python_version >= "3.9"

# dabl

[![CI](https://github.com/dabl/dabl/actions/workflows/ci.yml/badge.svg)](https://github.com/dabl/dabl/actions/workflows/ci.yml)

The data analysis baseline library.

- "Mr Sanchez, are you a data scientist?"
- "I dabl, Mr president."

Find more information on the [website](https://dabl.github.io/).

## Try it out

```
pip install dabl
```

or [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/dabl/dabl/main)

## Current scope and upcoming features
This library is still very much under development. The current code focuses mostly on exploratory visualization and preprocessing.
There are also drop-in replacements for GridSearchCV and RandomizedSearchCV using successive halving.
There are preliminary portfolios in the style of
[POSH
auto-sklearn](https://ml.informatik.uni-freiburg.de/papers/18-AUTOML-AutoChallenge.pdf)
to find strong models quickly. In essence, that boils down to a quick search
over different gradient boosting models and other tree ensembles and
potentially kernel methods.
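
To give a feel for the successive halving idea mentioned above, here is a minimal, library-free sketch. It is not dabl's implementation: the function name `successive_halving`, its parameters, and the toy scoring function are all assumptions made for illustration. The idea is to score many candidates on a small budget, keep the best fraction, and repeat with a larger budget.

```python
def successive_halving(candidates, evaluate, min_resources, max_resources, factor=3):
    """Return the best candidate found by a simple successive halving loop.

    candidates: list of hyperparameter settings (any objects)
    evaluate:   callable (candidate, resources) -> score, higher is better
    All names here are hypothetical and chosen for illustration only.
    """
    survivors = list(candidates)
    resources = min_resources
    while True:
        # Score every surviving candidate with the current budget.
        scored = sorted(survivors, key=lambda c: evaluate(c, resources), reverse=True)
        if len(scored) == 1 or resources >= max_resources:
            return scored[0]
        # Keep the top 1/factor of the candidates (at least one) ...
        survivors = scored[: max(1, len(scored) // factor)]
        # ... and give them `factor` times more resources in the next round.
        resources = min(resources * factor, max_resources)


# Toy objective (an assumption): the score peaks at 0.3, and a larger budget
# shrinks a penalty term that is the same for every candidate.
def toy_score(candidate, resources):
    return -abs(candidate - 0.3) - 1.0 / resources


grid = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0]
best = successive_halving(grid, toy_score, min_resources=10, max_resources=90)
print(best)
```

With nine candidates and `factor=3`, the loop evaluates nine, then three, then one candidate, so most of the budget goes to the settings that looked promising early on.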

Check out [the website](https://dabl.github.io/dev/) and [example gallery](https://dabl.github.io/0.1.9/auto_examples/index.html) to get an idea of the visualizations that are available.

Stay Tuned!

## Related packages

### Lux
[Lux](https://github.com/lux-org/lux) is an awesome project for easy interactive visualization of pandas dataframes within notebooks.

### Pandas Profiling
[Pandas Profiling](https://github.com/pandas-profiling/pandas-profiling) can
provide a thorough summary of the data in only a single line of code. Using
`ProfileReport()`, you can generate an HTML report of your data
that can help you find correlations and identify missing data.

`dabl` focuses less on statistical measures of individual columns, and more on
providing a quick overview via visualizations, as well as convenient
preprocessing and model search for machine learning.
