Metadata-Version: 2.1
Name: pandas-extension-christinedonszelmann
Version: 0.2.3
Summary: makes combinations of all columns of one dataframe, and more
Home-page: UNKNOWN
Author: Christine Donszelmann
Author-email: christine.donszelmann@nl.ey.com
License: MIT
Description: ###### This is part of the EY-onboarding project for *Christine Donszelmann*.
        
        It might be useful when working with e.g. general ledger data. Or it might not be useful except for teaching Christine how to use Python effectively.
        
        Making combinations and grouping them is logged at DEBUG level in the 'logging_christine.log' file.
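        
        For readers new to file logging, here is a minimal sketch of how DEBUG messages typically end up in a file such as 'logging_christine.log' with Python's standard `logging` module. The logger name `demo_logger`, the message text, and the temporary directory are assumptions made for illustration; the package's own logging setup may differ.
        
        ```python
        import logging
        import os
        import tempfile
        
        # Hypothetical setup: write to a temporary directory so this sketch
        # does not touch the package's real 'logging_christine.log'.
        log_path = os.path.join(tempfile.mkdtemp(), "logging_christine.log")
        
        logger = logging.getLogger("demo_logger")  # hypothetical logger name
        logger.setLevel(logging.DEBUG)
        handler = logging.FileHandler(log_path)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        
        # A DEBUG message similar in spirit to what a combination step might log.
        logger.debug("made combinations for columns: %s", ["letter", "code", "greek"])
        
        # Detach and close the handler so the file is flushed and released.
        logger.removeHandler(handler)
        handler.close()
        
        with open(log_path) as f:
            log_text = f.read()
        print(log_text)
        ```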
        
        ---
        
        When making combinations, remember that the number of columns in the new DataFrame can be calculated with:
        `number_of_columns_new = 2**(number_of_columns_old)-1`, so don't feed it 100 columns unless you want your memory to explode :-)
        
        ---
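        
        To get a feeling for how fast this grows, the formula can be evaluated for a few column counts. The helper below simply restates the formula and is not part of the package. For the three grouping columns used in Example 1 it gives 7: the 3 original columns plus the 4 joined columns shown there.
        
        ```python
        def number_of_columns_new(number_of_columns_old: int) -> int:
            """Number of non-empty column combinations: 2**n - 1."""
            return 2 ** number_of_columns_old - 1
        
        
        for n in (3, 10, 20):
            print(n, number_of_columns_new(n))
        # 3 7
        # 10 1023
        # 20 1048575
        ```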
        
        This package relies on the following packages; please install them before using it:
        * pandas
        * NumPy
        * tqdm
        
        This package has the following classes and methods:
        * class CombinationMaker
            * combinations(listofkeys)
            
                    makes all possible combinations of a list.
                        example input: [a, b, c]
                        example output: [[], [a], [b], [c], [a, b], [a, c], [b, c], [a, b, c]]
        
                    parameters:    
                    listofkeys: list of strings that need to be combined
                    totalpairs: used internally for recursion; leave it as None, the script replaces it with the total number of possible combinations
                    Return: list of lists with all combinations of strings
        
        
        * class DataFrameGrouper
            * show_combinations(*optional: joiner*)
            
                    makes the combinations made in CombinationMaker.combinations more readable.
                        example df.keys() = [a, b, c]
                        example joiner = '-'
                        example output = [a, b, c, a-b, a-c, b-c, a-b-c]
                    (removes the empty list)
                    
                    parameters:    
                    joiner: string that comes between the joined keys, default='-'
                    Return: list of strings of combinations with the joiner string in between
                
            * groupbyer(sum_on_key, *optional: group_on_keys*, *disabletqdm*, *joiner*)
                
                    makes a list of column names to group by (group_on_keys or self.frame.keys()),
                    then concatenates all combinations of those columns, using joiner as the joining string,
                    then groups self.frame by each of those columns and sums the sum_on_key column within each group,
                    then combines this information into one dataframe and returns it.
            
                    parameters:       
                    sum_on_key: key on which the summing takes place
                    group_on_keys: list of all keys that need to be combined and grouped by if not all keys in df, default = None
                    disabletqdm: if True there will be no tqdm shown, default=False
                    joiner: string that is used to join the columns in concatenator, default='-'
                    Return: dataframe
            
            * evaluator(sum_on_key, *optional: group_on_keys*, *disabletqdm*, *joiner*)
            
                    makes dataframe with the following evaluation-statistics about the groupbyer-dataframe:
                        -new_column_name: all combinations of given columns in group_on_keys or self.frame.keys(), joined by joiner
                        -unique_count: number of unique rows in the groupbyer-dataframe for that combination
                        -not_zero: number of rows in the groupbyer-dataframe of which the sum of summed_column is not 0.0
                        -string_length: mean string length of the values in that combination-column
            
                    parameters:    
                    sum_on_key: key on which the summing takes place
                    group_on_keys: list of all keys that need to be combined and grouped by if not all keys in df, default = None
                    disabletqdm: if True there will be no tqdm shown on the groupbyer, default=False
                    joiner: string that is used to join the columns in concatenator, default='-'
                    Return: dataframe with evaluation-statistics
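        
        The behaviour described for `combinations` and `show_combinations` can be sketched with the standard library. This is an illustration only: it uses `itertools.combinations` instead of the recursion the package describes, and the function names `all_combinations` and `readable_combinations` are hypothetical, not part of the package.
        
        ```python
        from itertools import combinations as iter_combinations
        
        
        def all_combinations(listofkeys):
            """All combinations of listofkeys, ordered by length (empty list first)."""
            result = []
            for size in range(len(listofkeys) + 1):
                for combo in iter_combinations(listofkeys, size):
                    result.append(list(combo))
            return result
        
        
        def readable_combinations(listofkeys, joiner="-"):
            """Join every non-empty combination into a single string."""
            return [joiner.join(combo) for combo in all_combinations(listofkeys) if combo]
        
        
        print(all_combinations(["a", "b", "c"]))
        # [[], ['a'], ['b'], ['c'], ['a', 'b'], ['a', 'c'], ['b', 'c'], ['a', 'b', 'c']]
        print(readable_combinations(["a", "b", "c"]))
        # ['a', 'b', 'c', 'a-b', 'a-c', 'b-c', 'a-b-c']
        ```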
        
        
        Example 1:
        
        ```
        import pandas as pd
        
        # DataFrameGrouper comes from this package and must be imported first.
        df = pd.DataFrame(
                [['a', 1, 'xx', 'alpha'], ['b', 2, 'yy', 'beta'], ['c', 3, 'zz', 'gamma'], ['d', 4, 'qq', 'delta'],
                ['e', -1, 'xx', 'alpha']],
                columns=['letter', 'value', 'code', 'greek'])
        
        DFG = DataFrameGrouper(df)
        
        print(DFG.evaluator('value'))
        ```
        gives:
        ```     
             new_column_name  unique_count  not_zero  string_length
        0         code-greek             4         3            7.8
        1       letter-greek             5         5            6.8
        2        letter-code             5         5            4.0
        3  letter-code-greek             5         5            9.8
        ```
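        
        To make the `code-greek` row of this table concrete, its three statistics can be reproduced with plain pandas. This is a sketch of one way to arrive at the same numbers, not the package's own implementation; the variable names `joined` and `sums` are made up for illustration.
        
        ```python
        import pandas as pd
        
        df = pd.DataFrame(
            [['a', 1, 'xx', 'alpha'], ['b', 2, 'yy', 'beta'], ['c', 3, 'zz', 'gamma'],
             ['d', 4, 'qq', 'delta'], ['e', -1, 'xx', 'alpha']],
            columns=['letter', 'value', 'code', 'greek'])
        
        # Join the two columns with the default joiner '-'.
        joined = df['code'] + '-' + df['greek']
        
        # Sum 'value' per joined key.
        sums = df.groupby(joined)['value'].sum()
        
        unique_count = joined.nunique()          # distinct joined keys
        not_zero = int((sums != 0).sum())        # keys whose summed value is not 0
        string_length = joined.str.len().mean()  # mean length of the joined strings
        
        print(unique_count, not_zero, string_length)
        ```
        
        The pair `xx-alpha` occurs twice with values 1 and -1, which is why one of the four unique keys sums to zero.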
        
        
        Example 2:
        ```
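        import pandas as pd
        
        # DataFrameGrouper comes from this package and must be imported first.
        df = pd.DataFrame(
                [['a', 1, 'xx', 'alpha'], ['b', 2, 'yy', 'beta'], ['c', 3, 'zz', 'gamma'], ['d', 4, 'qq', 'delta'],
                ['e', -1, 'xx', 'alpha']],
                columns=['letter', 'value', 'code', 'greek'])
        
        DFG = DataFrameGrouper(df)
        
        # group on 'code' and 'greek', matching the columns in the output below
        print(DFG.groupbyer('value', group_on_keys = ['code','greek']))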
        ```
        gives:
        ```
           value code  greek code-greek  code-greek_length  code-greek_summed
        0      1   xx  alpha   xx-alpha                7.8                  0
        1      2   yy   beta    yy-beta                7.8                  2
        2      3   zz  gamma   zz-gamma                7.8                  3
        3      4   qq  delta   qq-delta                7.8                  4
        4     -1   xx  alpha   xx-alpha                7.8                  0
        ```
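        
        The table above appears to attach the group sum and the mean string length to every row. Under that assumption, the same table can be rebuilt with plain pandas using `groupby(...).transform('sum')`. This is a sketch for a single column combination, not the package's implementation; the name `out` is made up for illustration.
        
        ```python
        import pandas as pd
        
        df = pd.DataFrame(
            [['a', 1, 'xx', 'alpha'], ['b', 2, 'yy', 'beta'], ['c', 3, 'zz', 'gamma'],
             ['d', 4, 'qq', 'delta'], ['e', -1, 'xx', 'alpha']],
            columns=['letter', 'value', 'code', 'greek'])
        
        # Keep the summed column and the two columns that are combined.
        out = df[['value', 'code', 'greek']].copy()
        
        # Joined key, using '-' as the joiner.
        out['code-greek'] = out['code'] + '-' + out['greek']
        
        # Mean string length of the joined key, repeated on every row.
        out['code-greek_length'] = out['code-greek'].str.len().mean()
        
        # Sum of 'value' per joined key, broadcast back to the original rows.
        out['code-greek_summed'] = out.groupby('code-greek')['value'].transform('sum')
        
        print(out)
        ```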
Platform: UNKNOWN
Description-Content-Type: text/markdown
