Metadata-Version: 2.1
Name: katonic
Version: 1.1
Summary: Python SDK for MLOps
Home-page: https://www.katonic.ai/
Author: Katonic Pty Ltd.
Author-email: shailesh.kumar@katonic.ai
License: MIT
Project-URL: Documentation, https://docs.katonic.ai/
Project-URL: Source, https://github.com/katonic-dev/katonic-sdk
Project-URL: Issues, https://github.com/katonic-dev/katonic-sdk/issues
Project-URL: Changelog, https://github.com/katonic-dev/katonic-sdk/blob/master/CHANGELOG.md
Platform: unix
Platform: linux
Platform: osx
Platform: cygwin
Platform: win32
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: certifi (<2021.10.8,>=2017.4.17)
Requires-Dist: click (<8.1.2,>=7.1.1)
Requires-Dist: colorama
Requires-Dist: convertdate (>=2.1.2)
Requires-Dist: cron-descriptor
Requires-Dist: deprecation (<2.2.0,>=2.1.0)
Requires-Dist: func-timeout
Requires-Dist: gunicorn (>=20.1.0)
Requires-Dist: greenlet (==1.1.1)
Requires-Dist: holidays (==0.10.3)
Requires-Dist: humanize
Requires-Dist: isodate
Requires-Dist: importlib-metadata (==4.8.1)
Requires-Dist: LunarCalendar (>=0.0.9)
Requires-Dist: markdown (>=3.0)
Requires-Dist: msgpack (<1.1,>=1.0.0)
Requires-Dist: pandas (<1.4.0,>=1.0.0)
Requires-Dist: parsedatetime
Requires-Dist: pyparsing (<4,>=3.0.6)
Requires-Dist: python-dateutil
Requires-Dist: python-dotenv
Requires-Dist: python-geohash
Requires-Dist: pyarrow (<6.1.0,>=6.0.0)
Requires-Dist: pydantic (==1.8.2)
Requires-Dist: pyyaml (<6,>=5.3)
Requires-Dist: requests (<2.26.0,>=2.10.0)
Requires-Dist: tqdm (>=4.36.1)
Requires-Dist: typing-extensions (<4,>=3.10)
Requires-Dist: zipp (==3.5.0)
Provides-Extra: all
Requires-Dist: azure (<=4.0.0,>=3.0.0) ; extra == 'all'
Requires-Dist: boto3 (==1.19.12) ; extra == 'all'
Requires-Dist: catboost (==1.0.3) ; extra == 'all'
Requires-Dist: cmdstanpy (==0.9.5) ; extra == 'all'
Requires-Dist: delta (==0.4.2) ; extra == 'all'
Requires-Dist: delta-spark (==1.0.0) ; extra == 'all'
Requires-Dist: google (==3.0.0) ; extra == 'all'
Requires-Dist: Jinja2 (<3.0,>=2.10) ; extra == 'all'
Requires-Dist: kfp (<1.8.12,>0.1.10) ; extra == 'all'
Requires-Dist: lightgbm (==3.3.1) ; extra == 'all'
Requires-Dist: matplotlib (<3.4.3,>=3.0.0) ; extra == 'all'
Requires-Dist: mlflow (<1.24.0,>=1.20.0) ; extra == 'all'
Requires-Dist: mmh3 (==3.0.0) ; extra == 'all'
Requires-Dist: mysql-connector-python (<=8.0.28,>=8.0.14) ; extra == 'all'
Requires-Dist: numpy (<=1.24.0,>=1.22.0) ; extra == 'all'
Requires-Dist: optuna (<2.10.0,>=2.8.0) ; extra == 'all'
Requires-Dist: protobuf (==3.19.4) ; extra == 'all'
Requires-Dist: python-dateutil (>=2.8.0) ; extra == 'all'
Requires-Dist: pyspark (==3.1.2) ; extra == 'all'
Requires-Dist: psycopg2-binary (<=2.9.3,>=2.8) ; extra == 'all'
Requires-Dist: redis (==3.5.3) ; extra == 'all'
Requires-Dist: redis-py-cluster (==2.1.3) ; extra == 'all'
Requires-Dist: river (<=0.10.1,>=0.7.0) ; extra == 'all'
Requires-Dist: scikit-learn (<1.0.2,>=0.24.0) ; extra == 'all'
Requires-Dist: seaborn (==0.11.2) ; extra == 'all'
Requires-Dist: snowflake-connector-python (<=2.7.6,>=2.0.0) ; extra == 'all'
Requires-Dist: snowflake-connector-python[pandas] (<=2.7.6,>=2.0.0) ; extra == 'all'
Requires-Dist: SQLAlchemy (<1.4.32,>=1.4.23) ; extra == 'all'
Requires-Dist: starlette (==0.14.2) ; extra == 'all'
Requires-Dist: tqdm (>=4.36.1) ; extra == 'all'
Requires-Dist: xgboost (==1.5.0) ; extra == 'all'
Provides-Extra: azure-connector
Requires-Dist: azure (<=4.0.0,>=3.0.0) ; extra == 'azure-connector'
Provides-Extra: connectors
Requires-Dist: azure (<=4.0.0,>=3.0.0) ; extra == 'connectors'
Requires-Dist: mysql-connector-python (<=8.0.28,>=8.0.14) ; extra == 'connectors'
Requires-Dist: psycopg2-binary (<=2.9.3,>=2.8) ; extra == 'connectors'
Requires-Dist: snowflake-connector-python (<=2.7.6,>=2.0.0) ; extra == 'connectors'
Requires-Dist: snowflake-connector-python[pandas] (<=2.7.6,>=2.0.0) ; extra == 'connectors'
Provides-Extra: drift
Requires-Dist: numpy (<=1.24.0,>=1.22.0) ; extra == 'drift'
Requires-Dist: river (<=0.10.1,>=0.7.0) ; extra == 'drift'
Requires-Dist: scikit-learn (<1.0.2,>=0.24.0) ; extra == 'drift'
Provides-Extra: fs
Requires-Dist: delta (==0.4.2) ; extra == 'fs'
Requires-Dist: delta-spark (==1.0.0) ; extra == 'fs'
Requires-Dist: google (==3.0.0) ; extra == 'fs'
Requires-Dist: protobuf (==3.19.4) ; extra == 'fs'
Requires-Dist: Jinja2 (<3.0,>=2.10) ; extra == 'fs'
Requires-Dist: mmh3 (==3.0.0) ; extra == 'fs'
Requires-Dist: pyspark (==3.1.2) ; extra == 'fs'
Requires-Dist: psycopg2-binary (<2.9.3,>=2.9) ; extra == 'fs'
Requires-Dist: redis (==3.5.3) ; extra == 'fs'
Requires-Dist: redis-py-cluster (==2.1.3) ; extra == 'fs'
Requires-Dist: SQLAlchemy (<1.4.32,>=1.4.23) ; extra == 'fs'
Requires-Dist: starlette (==0.14.2) ; extra == 'fs'
Provides-Extra: ml
Requires-Dist: boto3 (==1.19.12) ; extra == 'ml'
Requires-Dist: catboost (==1.0.3) ; extra == 'ml'
Requires-Dist: cmdstanpy (==0.9.5) ; extra == 'ml'
Requires-Dist: lightgbm (==3.3.1) ; extra == 'ml'
Requires-Dist: matplotlib (<3.4.3,>=3.0.0) ; extra == 'ml'
Requires-Dist: mlflow (<1.24.0,>=1.20.0) ; extra == 'ml'
Requires-Dist: numpy (<1.22.2,>=1.19.0) ; extra == 'ml'
Requires-Dist: optuna (<2.10.0,>=2.8.0) ; extra == 'ml'
Requires-Dist: python-dateutil (>=2.8.0) ; extra == 'ml'
Requires-Dist: scikit-learn (<1.0.2,>=0.24.0) ; extra == 'ml'
Requires-Dist: seaborn (==0.11.2) ; extra == 'ml'
Requires-Dist: tqdm (>=4.36.1) ; extra == 'ml'
Requires-Dist: xgboost (==1.5.0) ; extra == 'ml'
Provides-Extra: mysql-connector
Requires-Dist: mysql-connector-python (<=8.0.28,>=8.0.14) ; extra == 'mysql-connector'
Provides-Extra: pipeline
Requires-Dist: kfp (<1.8.12,>0.1.10) ; extra == 'pipeline'
Requires-Dist: matplotlib (<3.4.3,>=3.0.0) ; extra == 'pipeline'
Provides-Extra: postgres-connector
Requires-Dist: psycopg2-binary (<=2.9.3,>=2.8) ; extra == 'postgres-connector'
Provides-Extra: snowflake-connector
Requires-Dist: snowflake-connector-python (<=2.7.6,>=2.0.0) ; extra == 'snowflake-connector'
Requires-Dist: snowflake-connector-python[pandas] (<=2.7.6,>=2.0.0) ; extra == 'snowflake-connector'
Provides-Extra: testing
Requires-Dist: pytest (>=6.0) ; extra == 'testing'
Requires-Dist: pytest-cov (>=2.0) ; extra == 'testing'
Requires-Dist: mypy (>=0.910) ; extra == 'testing'
Requires-Dist: flake8 (>=3.9) ; extra == 'testing'
Requires-Dist: tox (>=3.24) ; extra == 'testing'

<p align="center">
    <a href="https://katonic.ai/">
      <img src="docs/assets/katonic_logo.png" width="550">
    </a>
</p>
<br />

[![Docs Latest](https://img.shields.io/badge/docs-latest-blue.svg)](https://docs.katonic.ai/)
[![License](https://img.shields.io/badge/License-MIT-blue)](https://github.com/katonic-dev/katonic-sdk/blob/master/LICENSE)

# Katonic Python SDK

This document guides data scientists and developers in building ML applications on the Katonic MLOps platform. The Katonic SDK is a repository of abstract Python classes and libraries, designed to help data scientists and developers interact with Katonic from their code, experiments, and models. Through the SDK, you can create experiments, manage models, automate your machine learning pipelines, and more.


Topics on this page:

- Connectors
- Feature Store
- Experiment Operations
- Registry Operations
- Pipelines (KFP Pipeline SDK: pipeline operations and pipeline creation)
- Drift

### Connectors 

A typical AI model life cycle starts with loading data into your workspace and analyzing it to discover useful insights. The Katonic SDK provides several connectors for this: use them to load data from sources such as Azure Blob Storage, MySQL, and Postgres, and put it wherever you want to work with it.
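
The load-then-place flow can be sketched with the standard library alone. This is a minimal illustration, not the Katonic connector API: `load_table` and `rows_to_csv` are hypothetical helpers, and an in-memory SQLite database stands in for a real MySQL or Postgres source.

```python
import csv
import io
import sqlite3


def load_table(conn, table):
    """Read every row of `table` into a list of dicts keyed by column name."""
    # Table names cannot be bound as SQL parameters, so validate before formatting.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def rows_to_csv(rows):
    """Serialize rows to CSV text, ready to be written wherever you work."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Assumption: in-memory SQLite stands in for a real database source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, plan TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "free"), (2, "pro")])

rows = load_table(conn, "customers")
csv_text = rows_to_csv(rows)
conn.close()
```

With a real source, only the connection object would change; the read and write steps stay the same.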

### Feature Store

Once you have loaded the data you want to work with, you preprocess it: handling missing values, removing outliers, scaling the data, encoding the features, and so on. When preprocessing is finished, you ingest the data into a feature store.
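
The four preprocessing steps named above can be sketched in plain Python. The helper names and the choices made here (median imputation, a 1.5 × IQR outlier rule, min-max scaling, one-hot encoding) are illustrative assumptions, not what the Katonic SDK necessarily does.

```python
from statistics import median, quantiles


def fill_missing(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_mask(values, k=1.5):
    """Return True for values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [low <= v <= high for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [(v - lo) / span for v in values]


def one_hot(values):
    """Encode a categorical column as one 0/1 column per category."""
    categories = sorted(set(values))
    return {c: [int(v == c) for v in values] for c in categories}


# Made-up sample data: one missing age and one obvious outlier (400).
ages = [25, 31, None, 40, 28, 35, 400]
plans = ["free", "pro", "pro", "free", "pro", "free", "pro"]

ages = fill_missing(ages)
keep = iqr_mask(ages)
# Drop outlier rows from every column so the columns stay aligned.
ages = [a for a, ok in zip(ages, keep) if ok]
plans = [p for p, ok in zip(plans, keep) if ok]

scaled_ages = min_max_scale(ages)
encoded_plans = one_hot(plans)
```

The order matters: imputing before the outlier check keeps the row count stable, and scaling after outlier removal prevents one extreme value from compressing the rest of the range.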

By uploading the clean data to a feature store, you can share it across the organization, so that other teams and data scientists working on the same problem can use it. This is how you achieve feature reusability.

Beyond that, machine learning models depend entirely on the data the data scientist provides, so a change in the data infrastructure can break them. The transformations and logic used for the training data must also be applied to the serving data. To ensure this, you can retrieve the processed features from an existing feature store at serving time. This keeps the training data and the serving data consistent; without it, you risk training-serving skew.
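
A minimal sketch of the consistency idea: transformation parameters are learned once from the training data, stored, and reloaded at serving time rather than recomputed. The `store` dictionary is a hypothetical stand-in for a feature store, and `fit_scaler` and `transform` are illustrative names, not Katonic functions.

```python
import json
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn standardization parameters from the training data only."""
    return {"mean": mean(values), "std": pstdev(values) or 1.0}


def transform(values, params):
    """Apply stored parameters; the same function serves training and serving."""
    return [(v - params["mean"]) / params["std"] for v in values]


# Assumption: a plain dict, serialized as JSON, stands in for a feature store.
store = {}

train_income = [30.0, 50.0, 70.0]
store["income"] = fit_scaler(train_income)
train_features = transform(train_income, store["income"])

# Serving reloads the stored parameters instead of re-deriving them
# from the serving data, which is what would introduce skew.
serialized = json.dumps(store)
serving_params = json.loads(serialized)["income"]
serving_features = transform([50.0, 90.0], serving_params)
```

If serving recomputed the mean and standard deviation from its own batch, the same raw value would map to a different feature value than it did during training.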

### Experiment Operations 

Experiment Operations covers all the data science activities, such as loading the data, performing exploratory data analysis, training models, and tracking metrics. All of these can be handled by the Auto ML component of the Katonic SDK, which lets you train models in a few lines of code without writing the training logic explicitly. The SDK also catalogs the metrics for classification and regression. Available metrics include accuracy, F1 score, precision, recall, and log loss for classification, and mean squared error, mean absolute error, and root mean squared error for regression use cases.
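
For reference, the metrics listed above follow their standard definitions, sketched here in plain Python for the binary-classification and regression cases. The function names are illustrative and this is not how the SDK computes or stores them.

```python
import math


def classification_metrics(y_true, y_pred, y_prob, eps=1e-15):
    """Binary-classification metrics; the positive class is 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Clip probabilities so that log(0) never occurs.
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    log_loss = -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p) for t, p in zip(y_true, clipped)
    ) / len(y_true)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "log_loss": log_loss,
    }


def regression_metrics(y_true, y_pred):
    """Mean squared error, mean absolute error, and root mean squared error."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"mse": mse, "mae": mae, "rmse": math.sqrt(mse)}


# Made-up labels, predictions, and predicted probabilities of class 1.
clf = classification_metrics([1, 0, 1, 1], [1, 0, 0, 1], [0.9, 0.2, 0.4, 0.8])
reg = regression_metrics([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
```

In this sample, three of four predictions are correct, there are no false positives, and one positive is missed, so precision is perfect while recall is not.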

### Registry Operations

Once you have finished training models on your data, the Katonic SDK keeps track of all of them and stores the model metadata and metrics in the Experiment Registry. From there you can choose the best model and send it to the Model Registry.

In the Model Registry you can store the best models according to your performance metrics and tag them as `staging` or `production`. Models tagged `production` can be deployed to production, while models tagged `staging` can be reviewed by the QA team before moving on to further stages.
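
The select-then-tag workflow can be pictured with a small in-memory sketch. `ModelVersion` and `ModelRegistry` are hypothetical classes written for this illustration, the model names and scores are made up, and none of this is the Katonic registry API.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    name: str
    metrics: dict
    stage: str = "none"  # "none", "staging", or "production"


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry, for illustration only."""

    versions: list = field(default_factory=list)

    def register(self, name, metrics):
        """Record a trained model together with its metrics."""
        version = ModelVersion(name, metrics)
        self.versions.append(version)
        return version

    def best(self, metric):
        """Return the version with the highest value of `metric`."""
        return max(self.versions, key=lambda v: v.metrics[metric])

    def transition(self, version, stage):
        """Tag a version as staging or production."""
        if stage not in {"staging", "production"}:
            raise ValueError(f"unknown stage: {stage!r}")
        version.stage = stage

    def in_stage(self, stage):
        """List the names of all versions carrying the given tag."""
        return [v.name for v in self.versions if v.stage == stage]


registry = ModelRegistry()
candidate = registry.register("xgboost-v1", {"accuracy": 0.91})
registry.register("lightgbm-v1", {"accuracy": 0.94})

# Promote the best model; send the other one to staging for review.
registry.transition(registry.best("accuracy"), "production")
registry.transition(candidate, "staging")
```

Keeping the stage as data on each version makes it easy to ask which models are deployable without inspecting the models themselves.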

### Pipelines

No data scientist wants to do the same thing again and again; it is far better to reuse earlier work. The AI model life cycle allows exactly that. Once you have trained your models and registered the best one in the model registry, you can convert all the work done so far into a scalable pipeline. The Pipelines component of the Katonic SDK lets you do this in a few lines of code. To run the same operations on different data, you only need to change the data source and run the pipeline; everything else happens automatically and at scale.
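
The "change only the data source" idea can be sketched as a list of steps applied in order to whatever the source returns. This is a plain-Python illustration that uses neither KFP nor the Katonic Pipelines component; `run_pipeline`, the step functions, and the two data sources are all hypothetical.

```python
from statistics import mean


def run_pipeline(source, steps):
    """Feed the output of each step into the next, starting from `source()`."""
    data = source()
    for step in steps:
        data = step(data)
    return data


def drop_missing(values):
    """Remove missing entries."""
    return [v for v in values if v is not None]


def center(values):
    """Subtract the mean so the values are centered on zero."""
    mu = mean(values)
    return [v - mu for v in values]


# The steps are defined once and reused for every data source.
steps = [drop_missing, center]


def january_data():
    return [1.0, None, 3.0]


def february_data():
    return [10.0, 20.0, None, 30.0]


january_result = run_pipeline(january_data, steps)
february_result = run_pipeline(february_data, steps)
```

Because the source is just another argument, rerunning the same work on new data is a one-line change.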

### Drift

The AI model life cycle does not end with model deployment. You need to monitor the model's performance continuously to detect model deterioration or degradation. The Drift component of the Katonic SDK helps you find drift in your data. It runs statistical analyses on incoming data to check for outliers or abnormal values, and reports the results through a visual representation.
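
One common statistical check for drift is the two-sample Kolmogorov-Smirnov statistic, which measures the largest gap between the empirical distributions of reference data and incoming data. The sketch below is an example of the kind of analysis described, not necessarily the test the Drift component uses; the `threshold` value is an illustrative assumption rather than a calibrated critical value.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    # The gap can only change at observed values, so checking those suffices.
    for x in set(ref) | set(cur):
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def has_drifted(reference, current, threshold=0.5):
    """Flag drift when the KS statistic exceeds an assumed threshold."""
    return ks_statistic(reference, current) > threshold


# Made-up samples: one close to the training data, one clearly shifted.
training = [1.0, 2.0, 3.0, 4.0, 5.0]
similar = [1.5, 2.5, 3.5, 4.5, 5.5]
shifted = [6.0, 7.0, 8.0, 9.0, 10.0]

similar_drift = has_drifted(training, similar)
shifted_drift = has_drifted(training, shifted)
```

The statistic ranges from 0 (identical distributions) to 1 (no overlap), so the fully shifted sample reaches the maximum while the slightly offset one stays low.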


