Metadata-Version: 2.1
Name: poniard
Version: 0.5.0
Summary: Streamline scikit-learn model comparison
Home-page: https://github.com/rxavier/poniard
Author: Rafael Xavier
Author-email: rxaviermontero@gmail.com
License: MIT
Description: Poniard
        ================
        
        <p align="center">
        <img src="https://raw.githubusercontent.com/rxavier/poniard/main/logo.png" alt="Poniard logo" title="Poniard" width="50%"/>
        </p>
        
        ## Introduction
        
        > A poniard /ˈpɒnjərd/ or poignard (Fr.) is a long, lightweight
        > thrusting knife ([Wikipedia](https://en.wikipedia.org/wiki/Poignard)).
        
        Poniard is a scikit-learn companion library that streamlines the process
        of fitting different machine learning models and comparing them.
        
        It can be used to provide quick answers to questions like these:
        
        -   What is the reasonable range of scores for this task?
        -   Is a simple and explainable linear model enough, or should I work
            with forests and gradient boosters?
        -   Are the features good enough as is, or should I work on feature
            engineering?
        -   How much can hyperparameter tuning improve metrics?
        -   Do I need to work on a custom preprocessing strategy?
        
        This is not meant to be an end-to-end solution, and you should
        definitely keep working on your models after you are done with Poniard.
        
        The core functionality has been tested to work on Python 3.7 through
        3.10 on Linux systems, and from 3.8 to 3.10 on macOS.
        
        ## Installation
        
        Stable version:
        
        ``` bash
        pip install poniard
        ```
        
        Dev version with the most up-to-date changes:
        
        ``` bash
        pip install git+https://github.com/rxavier/poniard.git@develop#egg=poniard
        ```
        
        ## Documentation
        
        Check the full [Quarto docs](https://rxavier.github.io/poniard),
        including guides and API reference.
        
        ## Usage/features
        
        ### Basics
        
        The API was designed with tabular tasks in mind, but it should also work
        with time series tasks, provided an appropriate cross validation
        strategy is used (don’t shuffle!).
        
        The usual Poniard flow is:
        
        1.  Define some estimators.
        2.  Define some metrics.
        3.  Define a cross validation strategy.
        4.  Fit everything.
        5.  Print the results.
        
        Poniard provides sane defaults for 1, 2 and 3, so in most cases you can
        just do…
        
        ``` python
        from poniard import PoniardRegressor
        from sklearn.datasets import load_diabetes
        ```
        
        ``` python
        X, y = load_diabetes(return_X_y=True, as_frame=True)
        pnd = PoniardRegressor(random_state=0)
        pnd.setup(X, y)
        pnd.fit()
        ```
        
        <h2>Setup info</h2>
        <h3>Target</h3>
        <p><b>Type:</b> continuous</p>
        <p><b>Shape:</b> (442,)</p>
        <p><b>Unique values:</b> 214</p>
        <h3>Metrics</h3>
        <p><b>Main metric:</b> neg_mean_squared_error</p>
        
        <h3>Feature type inference</h3>
        <p><b>Minimum unique values to consider a number-like feature numeric:</b> 44</p>
        <p><b>Minimum unique values to consider a categorical feature high cardinality:</b> 20</p>
        <p><b>Inferred feature types:</b></p>
        <table border="1" class="dataframe">
          <thead>
            <tr style="text-align: right;">
              <th></th>
              <th>numeric</th>
              <th>categorical_high</th>
              <th>categorical_low</th>
              <th>datetime</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <th>0</th>
              <td>age</td>
              <td></td>
              <td>sex</td>
              <td></td>
            </tr>
            <tr>
              <th>1</th>
              <td>bmi</td>
              <td></td>
              <td></td>
              <td></td>
            </tr>
            <tr>
              <th>2</th>
              <td>bp</td>
              <td></td>
              <td></td>
              <td></td>
            </tr>
            <tr>
              <th>3</th>
              <td>s1</td>
              <td></td>
              <td></td>
              <td></td>
            </tr>
            <tr>
              <th>4</th>
              <td>s2</td>
              <td></td>
              <td></td>
              <td></td>
            </tr>
            <tr>
              <th>5</th>
              <td>s3</td>
              <td></td>
              <td></td>
              <td></td>
            </tr>
            <tr>
              <th>6</th>
              <td>s4</td>
              <td></td>
              <td></td>
              <td></td>
            </tr>
            <tr>
              <th>7</th>
              <td>s5</td>
              <td></td>
              <td></td>
              <td></td>
            </tr>
            <tr>
              <th>8</th>
              <td>s6</td>
              <td></td>
              <td></td>
              <td></td>
            </tr>
          </tbody>
        </table>
        
            PoniardRegressor(random_state=0)
        
        … and get a nice table showing the average of each metric across all
        folds for every model, including fit and score times (thanks,
        scikit-learn `cross_validate` function!)
        
        ``` python
        pnd.get_results()
        ```
        
        <table border="1" class="dataframe">
          <thead>
            <tr style="text-align: right;">
              <th></th>
              <th>test_neg_mean_squared_error</th>
              <th>test_neg_mean_absolute_percentage_error</th>
              <th>test_neg_median_absolute_error</th>
              <th>test_r2</th>
              <th>fit_time</th>
              <th>score_time</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <th>LinearRegression</th>
              <td>-2977.598515</td>
              <td>-0.396566</td>
              <td>-39.009146</td>
              <td>0.489155</td>
              <td>0.005265</td>
              <td>0.001960</td>
            </tr>
            <tr>
              <th>ElasticNet</th>
              <td>-3159.017211</td>
              <td>-0.422912</td>
              <td>-42.619546</td>
              <td>0.460740</td>
              <td>0.003509</td>
              <td>0.001755</td>
            </tr>
            <tr>
              <th>RandomForestRegressor</th>
              <td>-3431.823331</td>
              <td>-0.419956</td>
              <td>-42.203000</td>
              <td>0.414595</td>
              <td>0.101435</td>
              <td>0.004821</td>
            </tr>
            <tr>
              <th>HistGradientBoostingRegressor</th>
              <td>-3544.069433</td>
              <td>-0.407417</td>
              <td>-40.396390</td>
              <td>0.391633</td>
              <td>0.334695</td>
              <td>0.009266</td>
            </tr>
            <tr>
              <th>KNeighborsRegressor</th>
              <td>-3615.195398</td>
              <td>-0.418674</td>
              <td>-38.980000</td>
              <td>0.379625</td>
              <td>0.003038</td>
              <td>0.002083</td>
            </tr>
            <tr>
              <th>XGBRegressor</th>
              <td>-3923.488860</td>
              <td>-0.426471</td>
              <td>-39.031309</td>
              <td>0.329961</td>
              <td>0.055696</td>
              <td>0.002855</td>
            </tr>
            <tr>
              <th>LinearSVR</th>
              <td>-4268.314411</td>
              <td>-0.374296</td>
              <td>-43.388592</td>
              <td>0.271443</td>
              <td>0.003470</td>
              <td>0.001721</td>
            </tr>
            <tr>
              <th>DummyRegressor</th>
              <td>-5934.577616</td>
              <td>-0.621540</td>
              <td>-61.775921</td>
              <td>-0.000797</td>
              <td>0.003010</td>
              <td>0.001627</td>
            </tr>
            <tr>
              <th>DecisionTreeRegressor</th>
              <td>-6728.423034</td>
              <td>-0.591906</td>
              <td>-59.700000</td>
              <td>-0.145460</td>
              <td>0.004179</td>
              <td>0.001667</td>
            </tr>
          </tbody>
        </table>
        
        Alternatively, you can get a nice plot of your different metrics by
        using the `PoniardBaseEstimator.plot.metrics` method.
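        For comparison, the five-step flow described above can be written by
        hand with plain scikit-learn. This is a minimal sketch, not Poniard’s
        internals. The subset of estimators, the metric list and the shuffled
        5-fold `KFold` strategy are assumptions chosen to resemble the table
        above, and are not necessarily Poniard’s exact defaults.
        
        ``` python
        import pandas as pd
        from sklearn.datasets import load_diabetes
        from sklearn.dummy import DummyRegressor
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.linear_model import ElasticNet, LinearRegression
        from sklearn.model_selection import KFold, cross_validate
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        
        X, y = load_diabetes(return_X_y=True, as_frame=True)
        
        # 1. Define some estimators (an assumed subset). Scaling lives inside each
        #    pipeline so it is refit on the training part of every fold.
        estimators = {
            "LinearRegression": make_pipeline(StandardScaler(), LinearRegression()),
            "ElasticNet": make_pipeline(StandardScaler(), ElasticNet()),
            "RandomForestRegressor": make_pipeline(
                StandardScaler(), RandomForestRegressor(random_state=0)
            ),
            "DummyRegressor": make_pipeline(StandardScaler(), DummyRegressor()),
        }
        
        # 2. Define some metrics, using scikit-learn scorer names
        metrics = [
            "neg_mean_squared_error",
            "neg_mean_absolute_percentage_error",
            "neg_median_absolute_error",
            "r2",
        ]
        
        # 3. Define a cross validation strategy (shuffled: not for time series!)
        cv = KFold(n_splits=5, shuffle=True, random_state=0)
        
        # 4. Fit everything, averaging each metric and timing over the folds
        rows = {}
        for name, estimator in estimators.items():
            scores = cross_validate(estimator, X, y, scoring=metrics, cv=cv)
            rows[name] = {key: values.mean() for key, values in scores.items()}
        
        # 5. Print the results, best main metric first
        results = pd.DataFrame(rows).T.sort_values(
            "test_neg_mean_squared_error", ascending=False
        )
        print(results)
        ```
        
        Every extra estimator or metric means editing this boilerplate, which
        is the part Poniard aims to abstract away.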
        
        ### Type inference
        
        Poniard uses some basic heuristics to infer the data types.
        
        Float and integer columns are defined as numeric if their number of
        unique values is greater than the threshold set by the
        `categorical_threshold` parameter.
        
        String/object/categorical columns are assumed to be categorical.
        
        Datetime features are processed separately with a custom encoder.
        
        For categorical features, high and low cardinality is defined by the
        `cardinality_threshold` parameter. Only low cardinality categorical
        features are one-hot encoded.
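        The rules above can be condensed into a short pandas sketch. It is
        hypothetical and is not Poniard’s code. The function name
        `infer_feature_types` is made up. Both thresholds are treated as
        absolute unique-value counts, with defaults copied from the setup output
        above (44 and 20). Poniard’s own parameters may be defined differently.
        
        ``` python
        import pandas as pd
        
        
        def infer_feature_types(
            df: pd.DataFrame, categorical_threshold: int = 44, cardinality_threshold: int = 20
        ) -> dict:
            """Hypothetical sketch of unique-value-based type inference."""
            types = {"numeric": [], "categorical_high": [], "categorical_low": [], "datetime": []}
            for column in df.columns:
                series = df[column]
                n_unique = series.nunique()
                if pd.api.types.is_datetime64_any_dtype(series):
                    # Datetimes get their own bucket (and their own encoder)
                    types["datetime"].append(column)
                elif pd.api.types.is_numeric_dtype(series) and n_unique > categorical_threshold:
                    types["numeric"].append(column)
                # Strings, categoricals and numbers with few unique values end up here
                elif n_unique > cardinality_threshold:
                    types["categorical_high"].append(column)
                else:
                    types["categorical_low"].append(column)
            return types
        
        
        df = pd.DataFrame({
            "age": range(100),                           # 100 unique numbers
            "sex": [1, 2] * 50,                          # 2 unique numbers
            "city": [f"c{i % 30}" for i in range(100)],  # 30 unique strings
            "date": pd.date_range("2022-01-01", periods=100),
        })
        print(infer_feature_types(df))
        # {'numeric': ['age'], 'categorical_high': ['city'], 'categorical_low': ['sex'], 'datetime': ['date']}
        ```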
        
        ### Ensembles
        
        Poniard makes it easy to combine various estimators in stacking or
        voting ensembles. The base estimators can be selected according to their
        performance (top-n) or by name.
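        The top-n selection can be sketched with plain scikit-learn rather than
        Poniard’s own ensemble methods. The sketch ranks a few candidate
        estimators by mean cross-validated score. It then combines the best `n`
        in a `VotingRegressor` and a `StackingRegressor`. The candidate list,
        `n = 2` and the `RidgeCV` meta-model are arbitrary assumptions.
        
        ``` python
        from sklearn.datasets import load_diabetes
        from sklearn.ensemble import RandomForestRegressor, StackingRegressor, VotingRegressor
        from sklearn.linear_model import ElasticNet, LinearRegression, RidgeCV
        from sklearn.model_selection import KFold, cross_val_score
        from sklearn.neighbors import KNeighborsRegressor
        
        X, y = load_diabetes(return_X_y=True, as_frame=True)
        cv = KFold(n_splits=5, shuffle=True, random_state=0)
        
        candidates = {
            "LinearRegression": LinearRegression(),
            "ElasticNet": ElasticNet(),
            "RandomForestRegressor": RandomForestRegressor(random_state=0),
            "KNeighborsRegressor": KNeighborsRegressor(),
        }
        
        
        def mean_cv_score(estimator) -> float:
            """Mean cross-validated negative MSE (higher is better)."""
            return cross_val_score(
                estimator, X, y, cv=cv, scoring="neg_mean_squared_error"
            ).mean()
        
        
        # Rank candidates by score and keep the top n
        scores = {name: mean_cv_score(est) for name, est in candidates.items()}
        n = 2
        top_n = sorted(scores, key=scores.get, reverse=True)[:n]
        base_estimators = [(name, candidates[name]) for name in top_n]
        
        # Voting averages the base predictions; stacking fits a meta-model on
        # their out-of-fold predictions
        ensembles = {
            "voting": VotingRegressor(base_estimators),
            "stacking": StackingRegressor(base_estimators, final_estimator=RidgeCV(), cv=cv),
        }
        ensemble_scores = {label: mean_cv_score(ens) for label, ens in ensembles.items()}
        print(top_n, ensemble_scores)
        ```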
        
        Poniard also reports how similar the estimators’ predictions are, so
        that ensembles can be built from dissimilar base estimators. A basic
        correlation table of the cross-validated predictions is built for
        regression tasks, while [Cramér’s
        V](https://en.wikipedia.org/wiki/Cram%C3%A9r%27s_V) is used for
        classification.
        
        By default, this similarity is computed on prediction errors rather
        than on the predictions themselves. This helps in building ensembles of
        well-scoring estimators with uncorrelated errors, which in principle
        (and hopefully) should lead to a “wisdom of crowds” kind of situation.
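        The error-based similarity can be sketched in plain scikit-learn.
        Collect out-of-fold predictions with `cross_val_predict`, turn them into
        residuals, and correlate them. A small Cramér’s V helper for predicted
        class labels is included as well. Both are illustrative assumptions
        rather than Poniard’s implementation. The estimator choice is arbitrary,
        and the sketch does not cover how Poniard defines errors for
        classification.
        
        ``` python
        import numpy as np
        import pandas as pd
        from scipy.stats import chi2_contingency
        from sklearn.datasets import load_diabetes
        from sklearn.linear_model import LinearRegression
        from sklearn.model_selection import KFold, cross_val_predict
        from sklearn.neighbors import KNeighborsRegressor
        from sklearn.tree import DecisionTreeRegressor
        
        X, y = load_diabetes(return_X_y=True, as_frame=True)
        cv = KFold(n_splits=5, shuffle=True, random_state=0)
        estimators = {
            "LinearRegression": LinearRegression(),
            "KNeighborsRegressor": KNeighborsRegressor(),
            "DecisionTreeRegressor": DecisionTreeRegressor(random_state=0),
        }
        
        # Regression: correlate out-of-fold errors (residuals), not raw predictions
        errors = pd.DataFrame(
            {name: y - cross_val_predict(est, X, y, cv=cv) for name, est in estimators.items()}
        )
        print(errors.corr().round(2))
        
        
        def cramers_v(a, b) -> float:
            """Cramér's V between two label arrays (0 = no association, 1 = perfect)."""
            table = pd.crosstab(np.asarray(a), np.asarray(b)).to_numpy()
            r, k = table.shape
            if min(r, k) < 2:
                return float("nan")  # undefined when either variable has a single level
            chi2 = chi2_contingency(table, correction=False)[0]
            return float(np.sqrt(chi2 / (table.sum() * (min(r, k) - 1))))
        
        
        # Classification: compare two models' predicted labels
        print(cramers_v([0, 1, 0, 1, 2, 2], [0, 1, 0, 1, 2, 2]))  # identical labels
        print(cramers_v([0, 0, 1, 1], [0, 1, 0, 1]))  # unrelated labels
        ```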
        
        ### Hyperparameter optimization
        
        The
        [`PoniardBaseEstimator.tune_estimator`](https://rxavier.github.io/poniard/estimators.core.html#poniardbaseestimator.tune_estimator)
        method can be used to optimize the hyperparameters of a given estimator,
        either by passing a grid of parameters or by using the built-in grids
        available for the default estimators. The tuned estimator will be added
        to the list of estimators and will be scored the next time
        [`PoniardBaseEstimator.fit`](https://rxavier.github.io/poniard/estimators.core.html#poniardbaseestimator.fit)
        is called.
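        Conceptually, tuning is a cross-validated search over a parameter grid.
        The best configuration is then added back to the pool of estimators to
        compare. The sketch below does this with scikit-learn’s `GridSearchCV`.
        The grid values and the `estimators` dictionary are assumptions for
        illustration. They are not Poniard’s built-in grids or the
        `tune_estimator` signature.
        
        ``` python
        from sklearn.datasets import load_diabetes
        from sklearn.linear_model import ElasticNet, LinearRegression
        from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
        
        X, y = load_diabetes(return_X_y=True, as_frame=True)
        cv = KFold(n_splits=5, shuffle=True, random_state=0)
        estimators = {"LinearRegression": LinearRegression(), "ElasticNet": ElasticNet()}
        
        # Hypothetical grid; Poniard's built-in grids may differ
        param_grid = {"alpha": [0.001, 0.01, 0.1, 1.0], "l1_ratio": [0.1, 0.5, 0.9]}
        search = GridSearchCV(ElasticNet(), param_grid, cv=cv, scoring="neg_mean_squared_error")
        search.fit(X, y)
        
        # Add the best configuration as a new, unfitted candidate...
        estimators["ElasticNet_tuned"] = ElasticNet(**search.best_params_)
        
        # ...so that it is scored alongside the others in the next comparison
        results = {
            name: cross_val_score(est, X, y, cv=cv, scoring="neg_mean_squared_error").mean()
            for name, est in estimators.items()
        }
        print(search.best_params_, results)
        ```
        
        Scoring the tuned estimator on the same folds used for the search gives
        an optimistic estimate; nested cross validation would be stricter.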
        
        ### Plotting
        
        The `plot` accessor provides several plotting methods based on the
        attached Poniard estimator instance. These Plotly plots are based on a
        default template, but can be modified by passing a different
        [`PoniardPlotFactory`](https://rxavier.github.io/poniard/plot.plot_factory.html#poniardplotfactory)
        to the Poniard `plot_options` argument.
        
        ### Plugin system
        
        The `plugins` argument in Poniard estimators takes a plugin or list of
        plugins that subclass
        [`BasePlugin`](https://rxavier.github.io/poniard/plugins.core.html#baseplugin).
        These plugins have access to the Poniard estimator instance and hook
        onto different sections of the process, for example, on setup start, on
        fit end, on remove estimator, etc.
        
        This makes it easy for third parties to extend Poniard’s functionality.
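        The hook pattern described above can be illustrated with a small,
        self-contained sketch. Everything in it is hypothetical. `SketchPlugin`,
        `SketchEstimator` and the hook names `on_setup_start`, `on_fit_end` and
        `on_remove_estimators` are stand-ins modeled on the description. They
        are not Poniard’s actual `BasePlugin` interface.
        
        ``` python
        from typing import List, Optional
        
        
        class SketchPlugin:
            """Hypothetical plugin base: every hook is a no-op unless overridden."""
        
            def __init__(self) -> None:
                self._estimator = None  # filled in by the host estimator
        
            def on_setup_start(self) -> None: ...
        
            def on_fit_end(self) -> None: ...
        
            def on_remove_estimators(self) -> None: ...
        
        
        class SketchEstimator:
            """Hypothetical host that calls each plugin's hooks at fixed points."""
        
            def __init__(self, plugins: Optional[List[SketchPlugin]] = None) -> None:
                self.plugins = plugins or []
                self.estimators = {"LinearRegression": None, "DummyRegressor": None}
                for plugin in self.plugins:
                    plugin._estimator = self  # give plugins access to the instance
        
            def _run_hook(self, hook: str) -> None:
                for plugin in self.plugins:
                    getattr(plugin, hook)()
        
            def setup(self) -> "SketchEstimator":
                self._run_hook("on_setup_start")
                # ... type inference and preprocessing would happen here ...
                return self
        
            def fit(self) -> "SketchEstimator":
                # ... cross validation would happen here ...
                self._run_hook("on_fit_end")
                return self
        
            def remove_estimators(self, names: List[str]) -> "SketchEstimator":
                for name in names:
                    self.estimators.pop(name, None)
                self._run_hook("on_remove_estimators")
                return self
        
        
        class EventLogPlugin(SketchPlugin):
            """Example third-party plugin that records lifecycle events."""
        
            def __init__(self) -> None:
                super().__init__()
                self.events: List[str] = []
        
            def on_setup_start(self) -> None:
                self.events.append("setup started")
        
            def on_fit_end(self) -> None:
                n = len(self._estimator.estimators)
                self.events.append(f"fit ended with {n} estimators")
        
            def on_remove_estimators(self) -> None:
                self.events.append("estimators removed")
        
        
        plugin = EventLogPlugin()
        SketchEstimator(plugins=[plugin]).setup().fit().remove_estimators(["DummyRegressor"])
        print(plugin.events)
        # ['setup started', 'fit ended with 2 estimators', 'estimators removed']
        ```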
        
        Two plugins are baked into Poniard:
        
        1.  Weights and Biases: logs your data and plots, runs the wandb
            scikit-learn analysis, saves model artifacts, etc.
        2.  Pandas Profiling: generates an HTML report of the features and
            target. If the Weights and Biases plugin is also present, this
            report is logged to the wandb run as well.
        
        The requirements for these plugins are not included in the base Poniard
        dependencies, so you can safely ignore them if you don’t intend to use
        them.
        
        ## Design philosophy
        
        ### Not another dependency
        
        We try very hard to avoid cluttering the environment with stuff you
        won’t use outside of this library. Poniard’s dependencies are:
        
        1.  scikit-learn (duh)
        2.  pandas
        3.  XGBoost
        4.  Plotly
        5.  tqdm
        6.  That’s it!
        
        Apart from `tqdm` and possibly `Plotly`, all dependencies were most
        likely going to be installed anyway, so Poniard’s added footprint should
        be small.
        
        ### We don’t do that here (AutoML)
        
        Poniard tries not to take control away from the user. As such, it is not
        designed to perform 2 hours of feature engineering and selection, try
        every model under the sun together with endless ensembles and select the
        top performing model according to some metric.
        
        Instead, it strives to abstract away some of the boilerplate code needed
        to fit and compare a number of models and allows the user to decide what
        to do with the results.
        
        Poniard can be your first stab at a prediction problem, but it
        definitely shouldn’t be your last one.
        
        ### Opinionated with a few exceptions
        
        While some parameters can be modified to control how variable type
        inference and preprocessing are performed, the API is designed to
        prevent parameter proliferation.
        
        ### Cross validate all the things
        
        Everything in Poniard is run with cross validation by default, and in
        fact no relevant functionality can be used without cross validation.
        
        ### Use baselines
        
        A dummy estimator is always included in model comparisons so you can
        gauge whether your model is better than a dumb strategy.
        
        ### Fast TTFM (time to first model)
        
        Preprocessing tries to ensure that your models run successfully without
        significant data munging. By default, Poniard imputes missing data and
        one-hot encodes or target encodes (depending on cardinality) inferred
        categorical variables, which in most cases is enough for scikit-learn
        algorithms to fit without complaints. Additionally, it scales numeric
        data and drops features with a single unique value.
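        A rough scikit-learn equivalent of these defaults is sketched below. The
        toy data, column lists and imputation strategies are assumptions for
        illustration. So is the use of `TargetEncoder` (available from
        scikit-learn 1.3) for high-cardinality features. Poniard’s actual
        preprocessor may differ in its details.
        
        ``` python
        import numpy as np
        import pandas as pd
        from sklearn.compose import ColumnTransformer
        from sklearn.impute import SimpleImputer
        from sklearn.linear_model import LinearRegression
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import OneHotEncoder, StandardScaler, TargetEncoder
        
        rng = np.random.default_rng(0)
        n = 200
        df = pd.DataFrame({
            "income": rng.normal(50, 10, n),
            "color": rng.choice(["red", "green", "blue"], n),         # low cardinality
            "zip_code": rng.choice([f"z{i}" for i in range(40)], n),  # high cardinality
            "constant": 1.0,                                          # single unique value
        })
        df.loc[::10, "income"] = np.nan  # introduce some missing data
        y = 2 * df["income"].fillna(50) + rng.normal(0, 1, n)
        
        # Drop features with a single unique value
        X = df.loc[:, df.nunique(dropna=False) > 1]
        
        preprocessor = ColumnTransformer([
            # Numeric: impute, then scale
            ("numeric", make_pipeline(SimpleImputer(strategy="mean"), StandardScaler()),
             ["income"]),
            # Low cardinality categorical: impute, then one-hot encode
            ("categorical_low", make_pipeline(
                SimpleImputer(strategy="most_frequent"), OneHotEncoder(handle_unknown="ignore")
            ), ["color"]),
            # High cardinality categorical: impute, then target encode
            ("categorical_high", make_pipeline(
                SimpleImputer(strategy="most_frequent"), TargetEncoder(random_state=0)
            ), ["zip_code"]),
        ])
        
        model = make_pipeline(preprocessor, LinearRegression()).fit(X, y)
        print(X.columns.tolist(), model.score(X, y))
        ```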
        
        ## Similar projects
        
        Poniard is not a groundbreaking idea, and a number of libraries follow a
        similar approach.
        
        **[ATOM](https://github.com/tvdboom/ATOM)** is perhaps the most similar
        library to Poniard, albeit with a different approach to the API.
        
        **[LazyPredict](https://github.com/shankarpandala/lazypredict)** is
        similar in that it runs multiple estimators and provides results for
        various metrics. Unlike Poniard, by default it tries most scikit-learn
        estimators, and is not based on cross validation.
        
        **[PyCaret](https://github.com/pycaret/pycaret)** is a whole other beast
        that includes model explainability, deployment, plotting, NLP, anomaly
        detection, etc., which leads to a list of dependencies several times
        larger than Poniard’s, and a more complicated API.
        
Keywords: machine learning,scikit-learn
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Provides-Extra: dev
