Metadata-Version: 2.1
Name: MLPet
Version: 0.0.1
Summary: Package to prepare well log data for ML projects.
Home-page: https://bitbucket.org/akerbp/petroml/
Author: Saghar Asadi
Author-email: saghar.asadi@akerbp.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: cognite-sdk (>=2.31.0)
Requires-Dist: imbalanced-learn (>=0.8.0)
Requires-Dist: joblib (==1.0.1)
Requires-Dist: numpy (>=1.19.5)
Requires-Dist: pandas (>=1.3.2)
Requires-Dist: scikit-learn (>=0.24.2)
Requires-Dist: scipy (>=1.7.1)
Requires-Dist: pyyaml

# MLPet

Preprocessing tools for Petrophysics ML projects at Eureka

## Quick start

- Clone this repository

- Install the package by running the following from the root directory (requires Python 3.8 or later):

        python -m pip install --upgrade pip
        python -m pip install .

- Short example for pre-processing data prior to making a regression model:

        from mlpet.Datasets.shear import Sheardata
        # Instantiate an empty dataset object using the example settings and mappings provided
        ds = Sheardata(settings="support/settings_shear.yaml", mappings="support/mappings.yaml", folder_path="support/")
        # Populate the dataset with data from a file (other file formats and direct data collection from CDF are also supported)
        ds.load_from_pickle("support/data/shear.pkl")
        # The original data is kept in ds.df_original and remains unchanged
        print(ds.df_original.head())
        # Split the data into training and validation sets
        df_train_original, df_valid_original, valid_wells = ds.train_test_split(df=ds.df_original, test_size=0.3)
        # Preprocess the data for training
        df_train, train_key_wells, feats = ds.preprocess(df_train_original)
        # preprocess accepts keyword arguments for its individual steps (e.g. the key wells used for normalizing curves such as GR)
        df_valid, valid_key_wells, _ = ds.preprocess(df_valid_original, _normalize_curves={'key_wells':train_key_wells})


- Short example for pre-processing data prior to making a classification model:

        from mlpet.Datasets.lithology import Lithologydata
        ds = Lithologydata(settings="support/settings_lithology.yaml", mappings="support/mappings.yaml", folder_path="support/")
        ds.load_from_pickle("support/data/lithology.pkl")
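
In the regression example above, `train_test_split` returns the validation wells alongside the two data frames, which suggests that whole wells are held out rather than individual rows. The sketch below illustrates that general idea with plain pandas and NumPy. It is not MLPet's implementation: the function `split_by_well`, the column name `well_name`, and the `seed` parameter are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def split_by_well(df, test_size=0.3, well_col="well_name", seed=0):
    """Hold out whole wells so that no well appears in both sets.

    Illustrative only: `well_col` and `seed` are hypothetical parameters
    and are not part of MLPet's API.
    """
    wells = df[well_col].unique().tolist()
    # Shuffle well indices reproducibly, then hold out a fraction of the wells
    order = np.random.default_rng(seed).permutation(len(wells))
    n_valid = max(1, int(round(test_size * len(wells))))
    valid_wells = sorted(wells[i] for i in order[:n_valid])
    is_valid = df[well_col].isin(valid_wells)
    return df[~is_valid].copy(), df[is_valid].copy(), valid_wells


# Toy data: four wells with three samples each (values are made up)
df = pd.DataFrame({
    "well_name": ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3,
    "GR": np.linspace(20.0, 130.0, 12),
})
df_train, df_valid, valid_wells = split_by_well(df, test_size=0.25)
print(valid_wells, len(df_train), len(df_valid))
```

Splitting on wells rather than rows keeps depth-adjacent samples from the same well out of both sets at once, which would otherwise tend to make validation scores look better than they are.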


