Metadata-Version: 2.1
Name: ft-drift
Version: 0.0.4
Summary: Check for data drift with OAI data
Home-page: https://github.com/hamelsmu/ft-drift
Author: Hamel Husain
Author-email: hamel.husain@gmail.com
License: Apache Software License 2.0
Keywords: nbdev jupyter notebook python
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas
Requires-Dist: rich
Requires-Dist: openai >=1.12.0
Requires-Dist: scikit-learn >=1.4.1
Provides-Extra: dev

# ft-drift


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

`ft-drift` helps you check for data drift by comparing two OpenAI
[multi-turn chat jsonl
files](https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset).

## Install

``` sh
pip install ft_drift
```

## Background

Checking for dataset drift can help you debug if:

1.  Your model is trained on data that doesn’t reflect production
    (different prompts, functions, etc).
2.  Your training data contains unexpected or accidental artifacts.

In either situation, you can compare data from relevant sources
(e.g. production vs. fine-tuning) to find unwanted changes. This is one
of the most common sources of errors when fine-tuning models!

The demo below shows a CLI tool used to detect data drift between two
files, `file_a.jsonl` and `file_b.jsonl`. Afterwards, it shows a table
of the important tokens that account for the drift, such as:

- `END-UI-FORMAT`
- `UI-FORMAT`
- “\`\`\`json”
- etc.
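
To make the idea of "tokens that account for the drift" concrete, here is
a minimal sketch that ranks tokens by how differently they occur in two
chat datasets. It is an illustration under stated assumptions, not
`ft-drift`'s actual implementation: the helper names (`message_text`,
`load_texts`, `document_frequency`, `drifting_tokens`) are hypothetical,
tokens are simply whitespace-delimited strings, and the score is a plain
gap in document frequency. It assumes each JSONL line is an object with a
`messages` list whose items carry a string `content`.

```python
import json
from collections import Counter


def message_text(record):
    """Join the string contents of every message in one chat record."""
    parts = []
    for message in record.get("messages", []):
        content = message.get("content")
        if isinstance(content, str):  # skip null or non-text contents
            parts.append(content)
    return "\n".join(parts)


def load_texts(path):
    """Read a chat JSONL file and return one text string per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [message_text(json.loads(line)) for line in fh if line.strip()]


def document_frequency(texts):
    """Fraction of records in which each whitespace-delimited token appears."""
    counts = Counter()
    for text in texts:
        counts.update(set(text.split()))  # count each token once per record
    total = max(len(texts), 1)  # avoid dividing by zero on empty input
    return {token: n / total for token, n in counts.items()}


def drifting_tokens(texts_a, texts_b, top_k=10):
    """Rank tokens by the absolute gap in document frequency (B minus A)."""
    df_a = document_frequency(texts_a)
    df_b = document_frequency(texts_b)
    gaps = {
        token: df_b.get(token, 0.0) - df_a.get(token, 0.0)
        for token in set(df_a) | set(df_b)
    }
    # Largest gaps first; ties are broken alphabetically for stable output.
    return sorted(gaps.items(), key=lambda kv: (-abs(kv[1]), kv[0]))[:top_k]
```

With real files, a call such as
`drifting_tokens(load_texts("file_a.jsonl"), load_texts("file_b.jsonl"))`
would return `(token, gap)` pairs, where a positive gap means the token
appears in a larger share of records in the second file. A token like
`UI-FORMAT` that is present in every record of one file and absent from
the other would rank at the top.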

## Usage

After installing `ft_drift`, the `detect_drift` CLI command will be
available to you.

![Demo of the detect_drift CLI comparing two files](first.gif)
