Metadata-Version: 2.1
Name: irdatacleaning
Version: 2021.0.2
Summary: A module designed to make your data preprocessing experience easier
Home-page: UNKNOWN
Author: Islander Robotics (William McKeon)
Author-email: <IslanderRobotics@gmail.com>
License: UNKNOWN
Description: # irdatacleaning
        This Python package aims to make artificial intelligence more accessible, starting
        with the data cleaning stage.
        
        ## DataCorrelation:
        This class lets you view the correlation values of your dataset so that you can catch simple errors early.
        
        DataCorrelation(df)
        
        - df: the pandas DataFrame you would like to evaluate.
        - Correlationmatrix(): shows which columns have correlation relationships.
        - LookingAtCorr(): makes the actual changes to your dataset and returns a pandas DataFrame.
        - Check(): calls both LookingAtCorr() and Correlationmatrix() for you, and also returns a pandas DataFrame.
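        
        The README does not describe how LookingAtCorr() decides which changes to make. One common approach is to flag pairs of columns whose absolute Pearson correlation exceeds a threshold. The sketch below shows that approach only; it is not this package's code. The 0.9 threshold is an arbitrary assumption, pearson and highly_correlated_pairs are hypothetical helpers, and plain Python lists stand in for DataFrame columns so the example has no dependencies.
        
        ```python
        import math
        from itertools import combinations
        
        def pearson(xs, ys):
            """Pearson correlation of two equal-length numeric sequences.
        
            Assumes neither column is constant (zero variance would divide by zero).
            """
            n = len(xs)
            mean_x = sum(xs) / n
            mean_y = sum(ys) / n
            cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
            var_x = sum((x - mean_x) ** 2 for x in xs)
            var_y = sum((y - mean_y) ** 2 for y in ys)
            return cov / math.sqrt(var_x * var_y)
        
        def highly_correlated_pairs(columns, threshold=0.9):
            """Return (col_a, col_b, r) for every column pair with |r| >= threshold."""
            pairs = []
            for a, b in combinations(columns, 2):
                r = pearson(columns[a], columns[b])
                if abs(r) >= threshold:
                    pairs.append((a, b, r))
            return pairs
        
        # The same measurement stored twice (cm and inches) is a classic redundant pair.
        data = {
            "height_cm": [150, 160, 170, 180],
            "height_in": [59.1, 63.0, 66.9, 70.9],
            "shoe_size": [9, 7, 10, 8],
        }
        print(highly_correlated_pairs(data))
        ```
        
        Only the height_cm / height_in pair is reported. In pandas, df.corr() computes Pearson correlations by default.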
        
        ## DataDiscovery:
        This class helps you evaluate your data so that you can get an idea of what needs to change in the dataset.
        The best way to use this class is simply to create an instance of it; the evaluation then runs automatically.
        
        DataDiscovery(df)
        
        - df: any pandas DataFrame you wish to evaluate.
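        
        The README does not list which checks DataDiscovery runs. The sketch below shows the kind of per-column summary such a tool typically produces: inferred type, number of missing values, and number of distinct values. It is not the package's implementation. describe_columns is a hypothetical helper, and missing values are assumed to be represented as None.
        
        ```python
        def describe_columns(columns):
            """Summarise each column: inferred types, missing count, distinct values."""
            summary = {}
            for name, values in columns.items():
                present = [v for v in values if v is not None]
                types = {type(v).__name__ for v in present}
                summary[name] = {
                    "types": sorted(types),
                    "missing": len(values) - len(present),
                    "distinct": len(set(present)),
                }
            return summary
        
        data = {"age": [25, None, 31, 25], "city": ["Boston", "boston", None, "NYC"]}
        for column, info in describe_columns(data).items():
            print(column, info)
        ```
        
        The city column reports three distinct values because "Boston" and "boston" differ only in case. InconsistentData is meant to address exactly this kind of issue.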
        
        ## Encoder
        This class is designed to make encoding your data simple. Its input arguments are:
        
        - df: a pandas DataFrame.
        - type: set to ONEHOTENCODER by default. If you wish to use OrdinalEncoder instead, set type to ordinalencoder.
        
        Then call the check method to apply the corrections; it returns a pandas DataFrame.
        If you wish to compare the returned value with the original dataset, use the copy variable.
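        
        The difference between the two encodings can be shown without any dependencies. In this sketch, ordinal_encode and one_hot_encode are hypothetical helpers, not part of irdatacleaning, and categories are sorted alphabetically (an assumption made so the output is deterministic). In pandas, pd.get_dummies produces one-hot columns.
        
        ```python
        def ordinal_encode(values):
            """Map each category to an integer code (categories sorted alphabetically)."""
            categories = sorted(set(values))
            codes = {cat: i for i, cat in enumerate(categories)}
            return [codes[v] for v in values], categories
        
        def one_hot_encode(values):
            """Turn one categorical column into one 0/1 column per category."""
            categories = sorted(set(values))
            return {f"is_{cat}": [int(v == cat) for v in values] for cat in categories}
        
        colors = ["red", "green", "red", "blue"]
        print(ordinal_encode(colors))  # ([2, 1, 2, 0], ['blue', 'green', 'red'])
        print(one_hot_encode(colors))  # {'is_blue': [0, 0, 0, 1], 'is_green': [0, 1, 0, 0], 'is_red': [1, 0, 1, 0]}
        ```
        
        Ordinal codes imply an order (red > green > blue here) that many models take literally. That is why one-hot encoding is the usual choice for unordered categories.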
        
        ## FeatureScaler:
        This class is designed to make feature scaling simple and beginner friendly.
        It takes two input arguments:
        
        FeatureScaler(df, checker=2)
        
        - df: the dataset to which the standard scaler will be applied.
        - checker: the threshold at which your columns are evaluated. It defaults to 2, but you can change it depending on what you need.
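        
        The README does not document how checker selects columns, so this sketch covers only standard scaling itself. Each value is rescaled as z = (x - mean) / std using the population standard deviation, which is the convention scikit-learn's StandardScaler follows. standard_scale is a hypothetical helper, not the package's API.
        
        ```python
        from statistics import mean, pstdev
        
        def standard_scale(values):
            """Rescale to zero mean and unit (population) standard deviation."""
            mu = mean(values)
            sigma = pstdev(values)
            if sigma == 0:
                # A constant column carries no information; map every value to 0.0.
                return [0.0 for _ in values]
            return [(v - mu) / sigma for v in values]
        
        print(standard_scale([2, 4, 4, 4, 5, 5, 7, 9]))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
        ```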
        
        ## InconsistentData
        This class is designed to help you correct inconsistent data. It provides the following methods.
        
        seperatingwords(origin, change): makes sure that every column name containing more than one word is separated correctly.
        
        - origin: the separator originally used between words.
        - change: the separator you would like to use instead.
        
        changeing_column_cases(case = "title"): converts the column names so that they are all upper case, all lower case, or title case.
        
        - case: the case you would like. It defaults to "title"; case = "lower" puts the column names in all lower case, and case = "upper" puts them in all upper case.
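        
        This sketch shows what separator and case normalisation do to a list of column names. change_separator and change_case are hypothetical helpers, not the package's implementation. With pandas, the resulting list could be assigned back to df.columns.
        
        ```python
        def change_separator(names, origin, change):
            """Replace the word separator in every column name, e.g. "_" -> " "."""
            return [name.replace(origin, change) for name in names]
        
        def change_case(names, case="title"):
            """Put every column name in "title", "upper" or "lower" case."""
            converters = {"title": str.title, "upper": str.upper, "lower": str.lower}
            return [converters[case](name) for name in names]
        
        columns = ["first_name", "Last_Name", "ZIP_code"]
        spaced = change_separator(columns, "_", " ")
        print(spaced)                       # ['first name', 'Last Name', 'ZIP code']
        print(change_case(spaced))          # ['First Name', 'Last Name', 'Zip Code']
        print(change_case(spaced, "upper")) # ['FIRST NAME', 'LAST NAME', 'ZIP CODE']
        ```
        
        Note that title case also lowercases the rest of each word, so an acronym such as "ZIP" becomes "Zip".
        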
        
        column_names_white_space(): corrects white space in the column names.
        
        data_white_space(): corrects white space in the values of the dataset.
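        
        The README does not say exactly what "correcting white space" involves. This sketch assumes a common interpretation: trim leading and trailing spaces, and collapse internal runs of white space to a single space. clean_whitespace is a hypothetical helper, applied here to column names and to the string cells of some rows.
        
        ```python
        import re
        
        def clean_whitespace(text):
            """Trim leading/trailing white space and collapse internal runs to one space."""
            return re.sub(r"\s+", " ", text).strip()
        
        names = ["  first name", "last   name ", "zip\tcode"]
        print([clean_whitespace(n) for n in names])  # ['first name', 'last name', 'zip code']
        
        # Only string cells are cleaned; numbers pass through unchanged.
        rows = [["  Boston ", 3], ["New  York", 5]]
        cleaned = [[clean_whitespace(v) if isinstance(v, str) else v for v in row] for row in rows]
        print(cleaned)  # [['Boston', 3], ['New York', 5]]
        ```
        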
        
        correcting(column_name, corrections): helps you make the needed changes to the values in individual cells so that your data is more consistent.
        
        - column_name: the column that will receive the corrections.
        - corrections: a dictionary of the corrected values.
        
        check(seperatingwords = False, origin = "", corrections = "", change_case = False, case = "title", correcting = False, column_name = "", cell_corrections = None):
        This method automates all of the steps above, although you still have to provide some input arguments.
        
        - seperatingwords: False by default. Set it to True to call the seperatingwords method; you must then also set origin and corrections, which are both string values.
        - change_case: False by default. Set it to True to change all of your column names to the same case.
        - case: "title" by default. Change it depending on how you would like your column names formatted.
        - correcting: set it to True when you want to correct specific values in the data. You must then also set column_name to the column that will receive the corrections, and cell_corrections to a dictionary.
        
        The corrected pandas DataFrame is returned.
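        
        Cell corrections of the kind performed by correcting() and by check(correcting = True, ...) can be sketched as a dictionary lookup. This assumes the dictionary maps each incorrect value to its corrected form, which the README does not state explicitly. apply_corrections is a hypothetical helper. In pandas, df[column_name].replace(corrections) performs a similar mapping.
        
        ```python
        def apply_corrections(values, corrections):
            """Replace each value found in `corrections` with its corrected form."""
            # Values without an entry in the dictionary are kept as they are.
            return [corrections.get(v, v) for v in values]
        
        states = ["NY", "new york", "N.Y.", "CA"]
        fixes = {"new york": "NY", "N.Y.": "NY"}
        print(apply_corrections(states, fixes))  # ['NY', 'NY', 'NY', 'CA']
        ```
        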
        
        autocheck(): does the same as check, but walks you through the process of making all the changes.
        
        resources(): gives you links to more information about the class.
        
        ## MissingValues:
        This class is designed to make correcting missing values much more accessible.
        
        MissingValues(df)
        
        - df: the input pandas DataFrame that will have corrections made to it.
        
        Call the check method to start the corrections; it returns the corrected DataFrame.
        If you wish to get the original DataFrame, use the copy variable.
        Currently only the median strategy is available, but other strategies are in the works.
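        
        Median imputation replaces each missing entry with the median of the observed values in the same column. This is a minimal sketch, not the package's code. It assumes missing values are represented as None (pandas uses NaN), and fill_missing_with_median is a hypothetical helper.
        
        ```python
        from statistics import median
        
        def fill_missing_with_median(values):
            """Replace None entries with the median of the non-missing values.
        
            Raises statistics.StatisticsError if every value is missing.
            """
            present = [v for v in values if v is not None]
            fill = median(present)
            return [fill if v is None else v for v in values]
        
        print(fill_missing_with_median([4, None, 1, 10, None]))  # [4, 4, 1, 10, 4]
        ```
        
        With pandas, df.fillna(df.median(numeric_only=True)) applies the same strategy to every numeric column. The median is often preferred over the mean because a few extreme values barely move it.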
        
        ## StringToDateTime:
        This class is designed to make converting strings to datetime values more accessible.
        To use it, create an instance of the class: StringToDateTime(df, column_names)
        
        - df: the pandas DataFrame you will work with.
        - column_names: the names of any columns you wish to convert to datetime whose names are not among ["date","dates","starttime","start_time","start time"]. To use this argument successfully, you must pass in a list.
        
        check(): call this method to tell the module to make the corrections.
        
        resources(): gives you the link to the YouTube video about this module as well as the GitHub repository.
        
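        
        The conversion can be sketched with the standard library. This is not the package's implementation, and it makes three assumptions: column names are matched case-insensitively against the default list above, any extra names are supplied as a list, and every date string uses the "%Y-%m-%d" format. convert_date_columns is a hypothetical helper. In pandas, pd.to_datetime(df[column]) is the usual tool.
        
        ```python
        from datetime import datetime
        
        # Default column names taken from the StringToDateTime description above.
        DEFAULT_DATE_COLUMNS = {"date", "dates", "starttime", "start_time", "start time"}
        
        def convert_date_columns(columns, extra_names=(), fmt="%Y-%m-%d"):
            """Parse string columns whose (lower-cased) name marks them as dates."""
            targets = DEFAULT_DATE_COLUMNS | {n.lower() for n in extra_names}
            converted = {}
            for name, values in columns.items():
                if name.lower() in targets:
                    converted[name] = [datetime.strptime(v, fmt) for v in values]
                else:
                    converted[name] = values
            return converted
        
        data = {
            "Date": ["2021-01-05", "2021-02-10"],
            "signup": ["2020-12-31", "2021-01-01"],
            "score": [3, 4],
        }
        result = convert_date_columns(data, extra_names=["signup"])
        print(result["Date"][0])          # 2021-01-05 00:00:00
        print(result["signup"][1].year)   # 2021
        ```
        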
        ## Resources:
        This class gives you, Islanders, access to additional resources on the module and its classes.
        
Keywords: python,Machine Learning,Artificial Intelligence,Data Science,Data Cleaning
Platform: UNKNOWN
Description-Content-Type: text/markdown
