Metadata-Version: 2.1
Name: lapros
Version: 0.3
Summary: lapros data for better AI
Home-page: https://github.com/Tre-Xanh/lapros.py/tree/master/
Author: Vo Chi Cong
Author-email: ccvo@live.jp
License: Apache Software License 2.0
Keywords: AI,data,noise
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pip
Requires-Dist: packaging
Requires-Dist: loguru
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: plum-dispatch
Requires-Dist: scipy
Provides-Extra: dev
Requires-Dist: cleanlab ; extra == 'dev'
Requires-Dist: pyarrow ; extra == 'dev'
Requires-Dist: scikit-learn ; extra == 'dev'

LaPros
================

<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Install

`pip install -U lapros`

## How to use

LaPros works with classifiers. Given the class probabilities predicted
by a classification model, it ranks the labels that look suspicious.
Inputs can be plain Python lists, NumPy arrays or Pandas data. Results
come back as a NumPy array or a Pandas series or dataframe; the larger
the value, the more suspicious the corresponding label.

``` python
import lapros
assert lapros.__version__ == '0.3'
```

``` python
import pandas as pd
from lapros import suspect
```

``` python
labels = pd.Series(["cat", "dog", "dog", "cat", "cat"])
```

    0    cat
    1    dog
    2    dog
    3    cat
    4    cat
    dtype: object

``` python
probas = pd.DataFrame(dict(
    cat=[0.5, 0.4, 0.3, 0.2, 0.1],
    dog=[0.5, 0.6, 0.7, 0.8, 0.9],
))
```

<div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>cat</th>
      <th>dog</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>0.5</td>
      <td>0.5</td>
    </tr>
    <tr>
      <th>1</th>
      <td>0.4</td>
      <td>0.6</td>
    </tr>
    <tr>
      <th>2</th>
      <td>0.3</td>
      <td>0.7</td>
    </tr>
    <tr>
      <th>3</th>
      <td>0.2</td>
      <td>0.8</td>
    </tr>
    <tr>
      <th>4</th>
      <td>0.1</td>
      <td>0.9</td>
    </tr>
  </tbody>
</table>
</div>

``` python
suspect(
    probas,
    labels=labels,
)
```

    lapros.classification.estimate_noise.avg_confidence:36 [0.26666667 0.65      ]

<div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>err</th>
      <th>suspected</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>0.000000</td>
      <td>False</td>
    </tr>
    <tr>
      <th>1</th>
      <td>0.183333</td>
      <td>True</td>
    </tr>
    <tr>
      <th>2</th>
      <td>0.000000</td>
      <td>False</td>
    </tr>
    <tr>
      <th>3</th>
      <td>0.216667</td>
      <td>True</td>
    </tr>
    <tr>
      <th>4</th>
      <td>0.416667</td>
      <td>True</td>
    </tr>
  </tbody>
</table>
</div>

``` python
residual = suspect(
    probas,
    labels=labels,
    rank_method="residual",
    return_non_errors=False,
)
```

    lapros.classification.estimate_noise.avg_confidence:36 [0.26666667 0.65      ]

<div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>err</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>1</th>
      <td>0.4</td>
    </tr>
    <tr>
      <th>3</th>
      <td>0.8</td>
    </tr>
    <tr>
      <th>4</th>
      <td>0.9</td>
    </tr>
  </tbody>
</table>
</div>
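
The `err` values in the residual output above equal one minus the probability that the model assigns to the observed label. The sketch below reproduces those numbers with plain pandas. The formula is inferred from the values shown here, not taken from the LaPros source, and the names `self_confidence` and `residual_sketch` are illustrative only.

``` python
import pandas as pd

labels = pd.Series(["cat", "dog", "dog", "cat", "cat"])
probas = pd.DataFrame(dict(
    cat=[0.5, 0.4, 0.3, 0.2, 0.1],
    dog=[0.5, 0.6, 0.7, 0.8, 0.9],
))

# Probability the model assigns to each row's observed label.
# Assumption: labels and probas share the same default integer index.
self_confidence = pd.Series(
    [probas.at[i, label] for i, label in labels.items()],
    index=labels.index,
)

# Assumed reading of "residual": the probability mass not given to the label.
residual_sketch = 1.0 - self_confidence

# Rows 1, 3 and 4 are the ones reported in the table above.
print(residual_sketch.loc[[1, 3, 4]].round(6).tolist())
```

Which rows get reported is decided separately, so this sketch only accounts for the magnitude of `err`, not for the filtering.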

``` python
set_logger("INFO")
confidence = suspect(
    probas,
    labels=labels,
    rank_method="confidence",
    return_non_errors=False,
)
```

    lapros.classification.estimate_noise.avg_confidence:36 [0.26666667 0.65      ]

<div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>err</th>
    </tr>
    <tr>
      <th>id</th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>1</th>
      <td>0.183333</td>
    </tr>
    <tr>
      <th>3</th>
      <td>0.216667</td>
    </tr>
    <tr>
      <th>4</th>
      <td>0.416667</td>
    </tr>
  </tbody>
</table>
</div>
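
The logged vector `[0.26666667 0.65]` and the `err` column above can be reproduced by hand. The sketch below is one reading that is consistent with this two-class example, and it is an assumption rather than the LaPros implementation: each class gets a threshold equal to the mean probability of that class over the rows carrying that label, a row is suspected when its own label falls below its threshold, and `err` adds that shortfall to the largest excess of any other class over its threshold.

``` python
import numpy as np
import pandas as pd

labels = pd.Series(["cat", "dog", "dog", "cat", "cat"])
probas = pd.DataFrame(dict(
    cat=[0.5, 0.4, 0.3, 0.2, 0.1],
    dog=[0.5, 0.6, 0.7, 0.8, 0.9],
))

p = probas.to_numpy()
y = probas.columns.get_indexer(labels)  # column position of each observed label
rows = np.arange(len(y))
self_conf = p[rows, y]                  # probability of the observed label

# Per-class threshold: average self-confidence of the rows labelled with that class.
thresholds = np.array([self_conf[y == k].mean() for k in range(p.shape[1])])

margin = p - thresholds                 # how far each class sits above its threshold
own_gap = -margin[rows, y]              # positive when the observed label falls short

margin_other = margin.copy()
margin_other[rows, y] = -np.inf         # ignore the observed label itself
best_other = margin_other.max(axis=1)   # strongest competing class

# Assumed scoring rule, inferred from the numbers in this README only.
suspected = own_gap > 0
err = np.where(suspected, own_gap + best_other, 0.0)

print(thresholds)
print(err.round(6))
```

With only two classes the example cannot distinguish this rule from other plausible ones, so treat it as a way to sanity-check the numbers rather than a specification.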

``` python
probas.assign(labels=labels, residual=residual, confidence=confidence)
```

<div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>cat</th>
      <th>dog</th>
      <th>labels</th>
      <th>residual</th>
      <th>confidence</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>0.5</td>
      <td>0.5</td>
      <td>cat</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>1</th>
      <td>0.4</td>
      <td>0.6</td>
      <td>dog</td>
      <td>0.4</td>
      <td>0.183333</td>
    </tr>
    <tr>
      <th>2</th>
      <td>0.3</td>
      <td>0.7</td>
      <td>dog</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>3</th>
      <td>0.2</td>
      <td>0.8</td>
      <td>cat</td>
      <td>0.8</td>
      <td>0.216667</td>
    </tr>
    <tr>
      <th>4</th>
      <td>0.1</td>
      <td>0.9</td>
      <td>cat</td>
      <td>0.9</td>
      <td>0.416667</td>
    </tr>
  </tbody>
</table>
</div>

## Docstring

------------------------------------------------------------------------

### suspect

Rank the suspicious labels given probas from a classifier. Accepts NumPy
arrays, Pandas dataframes and series. Labels can be integers, strings or
even floats, provided that the columns of the probability matrix are
indexed by the same label set.
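
The requirement that labels and probability columns share one label set can be checked before ranking. The helper below is a hypothetical pre-flight check written for this README, not part of the lapros API; it assumes `probas` is a Pandas dataframe whose column names are the class labels.

``` python
import pandas as pd


def check_inputs(probas: pd.DataFrame, labels: pd.Series) -> None:
    """Hypothetical sanity check; not provided by lapros itself."""
    if len(probas) != len(labels):
        raise ValueError("probas and labels must have the same number of rows")
    # Every observed label needs a matching probability column.
    unknown = set(labels.unique()) - set(probas.columns)
    if unknown:
        names = sorted(map(str, unknown))
        raise ValueError(f"labels without a probability column: {names}")


probas = pd.DataFrame(dict(
    cat=[0.5, 0.4, 0.3, 0.2, 0.1],
    dog=[0.5, 0.6, 0.7, 0.8, 0.9],
))
labels = pd.Series(["cat", "dog", "dog", "cat", "cat"])

check_inputs(probas, labels)  # passes silently for the example data
```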

#### Args

- probas (n x m matrix): probabilities for the possible classes.

#### KwArgs

- labels (n x 1 vector): observed class labels
- rank_method (str): `residual` or `confidence`
- return_non_errors (bool, default = True): return all rows, including
  non-errors

#### Returns

a Pandas DataFrame with 1 index and 2 columns:

- id (int): the index, identical to the row index of the original data
- err (float): the magnitude of suspiciousness, a value in \[0, 1\]
- suspected (bool): whether the data row is suspected of having a label
  error. This column is returned if and only if return_non_errors=True.
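
Since larger `err` values mean more suspicious labels, a typical next step is to review the suspected rows from the top down. The sketch below builds a frame by hand with the values from the default `suspect(probas, labels=labels)` output in the "How to use" section, so it runs without lapros installed; `review_queue` is an illustrative name.

``` python
import pandas as pd

# Values copied from the example output in "How to use".
result = pd.DataFrame(
    {
        "err": [0.0, 0.183333, 0.0, 0.216667, 0.416667],
        "suspected": [False, True, False, True, True],
    },
    index=pd.RangeIndex(5, name="id"),
)

# Keep only suspected rows and put the most suspicious first.
review_queue = result.loc[result["suspected"]].sort_values("err", ascending=False)

print(review_queue.index.tolist())  # [4, 3, 1]
```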

``` python
help(suspect)
```

    Help on function suspect in module lapros.api:

    suspect(...)
        Rank the suspicious labels given probas from a classifier.
        Accept Numpy arrays, Pandas dataframes and series.
        We can use integer, string or even float labels, given that
        the probability matrix's columns are indexed by the same label set.
        
        #### Args
        
        - probas (n x m matrix): probabilities for possible classes.
        
        #### KwArgs
        
        - labels (n x 1 vector): observed class labels
        - rank_method (str): `residual` or `confidence`
        - return_non_errors (bool, default = True): return all rows, including non-errors
        
        #### Returns
        
        a Pandas DataFrame including 1 index and 2 columns:
        
        - id (int): the index which is the same to the original data row index
        - err (float): the magnitude of suspiciousness, valued between [0, 1]
        - suspected (bool):  whether the data row is suspected as having a label error. This column is returned iff return_non_errors=True.
