Metadata-Version: 1.1
Name: cynet
Version: 1.1.24
Summary: Learning Point Processes Using Deep Granger Nets
Home-page: https://github.com/zeroknowledgediscovery/
Author: zed.uchicago.edu
Author-email: ishanu@uchicago.edu
License: LICENSE.txt
Download-URL: https://github.com/zeroknowledgediscovery/Cynet/archive/1.1.24.tar.gz
Description: ===============
        cynet
        ===============
        
        .. figure:: https://pypistats.com/badge/cynet.png
           :alt: cynet PyPI Downloads
        
        .. image:: http://zed.uchicago.edu/logo/logozed1.png
           :height: 400px
           :scale: 50 %
           :alt: alternate text
           :align: center
        
        
        .. class:: no-web no-pdf
        
        :Info: See <https://arxiv.org/abs/1406.6651> for theoretical background
        :Author: ZeD@UChicago <zed.uchicago.edu>
        :Description: Implementation of the Deep Granger net inference algorithm, described in https://arxiv.org/abs/1406.6651, for learning spatio-temporal stochastic processes (*point processes*). **cynet** learns a network of generative local models, without assuming any specific model structure.
        
        .. NOTE:: If issues arise with dependencies in python3, be sure that *tkinter* is installed
        
        .. code-block::
        
            sudo apt-get install python3-tk
        
        **Usage:**
        
        .. code-block::
        
            from cynet import cynet
            from cynet.cynet import uNetworkModels as models
            from viscynet import viscynet as vcn
        
        
        **cynet module includes:**
          * cynet
          * viscynet
          * bokeh_pipe
        
        
        cynet library classes:
        ~~~~~~~~~~~~~~~~~~~~~~
        * spatioTemporal
        * uNetworkModels
        * simulateModels
        
        **class spatioTemporal**
          Utilities for spatial-temporal analysis
        
          **Attributes:**
              * log_store (Pickle): Pickle storage of class data & dataframes
              * log_file (string): path to CSV of legacy dataframe
              * ts_store (string): path to CSV containing most recent ts export
              * DATE (string):
              * EVENT (string): column label for category filter
              * coord1 (string): first coordinate level type; is column name
              * coord2 (string): second coordinate level type; is column name
              * coord3 (string): third coordinate level type; (z coordinate)
              * end_date (datetime.date): upper bound of daterange
              * freq (string): timeseries increments; e.g. D for date
              * columns (list): list of column names to use; requires at least 2 coordinates and event type
              * types (list of strings): event type list of filters
              * value_limits (tuple): boundaries (magnitude of event above threshold)
              * grid (dictionary or list of lists): coordinate dictionary with respective ranges
                and EPS value OR custom list of lists
                of custom grid tiles as [coord1_start, coord1_stop, coord2_start, coord2_stop]
              * grid_type (string): parameter to determine if grid should be built up
                from a coordinate start/stop range ('auto') or be
                built from custom tile coordinates ('custom')
              * threshold (float): significance threshold
        
          **Methods:**
        
          .. code-block::
        
                __init__(self, log_store='log.p', log_file=None, ts_store=None, DATE='Date',
                        year=None, month=None, day=None, EVENT='Primary Type', coord1='Latitude',
                        coord2='Longitude', coord3=None, init_date=None, end_date=None, freq=None,
                        columns=None, types=None, value_limits=None, grid=None, threshold=None)
        
        
                fit(self, grid=None, INIT=None, END=None, THRESHOLD=None, csvPREF='TS',
                    auto_adjust_time=False,incr=6,max_incr=24, poly_tile=False):
        
                    Fit dataproc with specified grid parameters and
                    create timeseries for
                    date boundaries specified by INIT, THRESHOLD,
                    and END or input list of custom coordinate boundaries which do NOT have
                    to match the arguments first input to the dataproc
        
                    Inputs -
                        grid (dictionary or list of lists): coordinate dictionary with
                            respective ranges and EPS value OR custom list of lists
                            of custom grid tiles as [coord1_start, coord1_stop,
                            coord2_start, coord2_stop]
                        INIT (datetime.date): starting timeseries date
                        END (datetime.date): ending timeseries date
                        THRESHOLD (float): significance threshold
                        auto_adjust_time (boolean): if True, within increments specified
                        (6H default), determine optimal temporal frequency for timeseries data
                        incr (int): frequency increment
                        max_incr (int): user-specified maximum increment
                        poly_tile(boolean): whether or not tiles define polygons
        
                    Outputs -
                        (No output) grid pd.Dataframe written out as CSV file to path specified
        
        
                getTS(self, _types=None, tile=None, freq=None):
                    Given location tile boundaries and type category filter, creates the
                    corresponding timeseries as a pandas DataFrame
                    (Note: can reassign type filter, does not have to be the same one
                    as the one initialized to the dataproc)
        
                    Inputs:
                        _types (list of strings): list of category filters
                        tile (list of floats): location boundaries for tile
                        freq (string): intervals of time between timeseries columns
                        poly_tile (boolean): whether or not input for tiles defines
                            a polygon filter
        
                    Outputs:
                        pd.Dataframe of timeseries data to corresponding grid tile
                        pd.DF index is stringified LAT/LON boundaries
                        with the type filter  included
        
        
                get_rand_tile(tiles=None,LAT=None,LON=None,EPS=None,_types=None):
                    Picks random tile from options fed into timeseries method which maps to a
                    non-empty subset within the larger dataset
        
                    Inputs -
                        LAT (float or list of floats): singular coordinate float or list of
                                                       coordinate start floats
                        LON (float or list of floats): singular coordinate float or list of
                                                       coordinate start floats
                        EPS (float): coordinate increment ESP
                        _types (list): event type filter; accepted event type list
                        tiles (list of lists): list of tiles to build
                            Ex:(list of [lat1 lat2 lon1 lon2]) or tuples (i.e. [(x1,y1),(x2,y2)])
                            defining polygons
                        poly_tile (boolean): whether input for tile specifies a polygon
        
                    Outputs -
                        tile dataframe (pd.DataFrame)
        
        
                get_opt_freq(df,incr=6,max_incr=24):
                    Returns the optimal frequency for timeseries based on highest non-zero
                    to zero timeseries event count
        
                    Input -
                        df (pd.DataFrame): filtered subset of dataset corresponding to
                        random tile from get_rand_tile
                        incr (int): frequency increment
                        max_incr (int): user-specified maximum increment
        
                    Output -
                        (string) to pass to pd.date_range(freq=) argument
        
        
                getGrid(self):
                    Returns the tile coordinates of the working as a list of lists
        
                    Input -
                        (No inputs)
                    Output -
                        TILE (list of lists): the grid tiles
        
        
                pull(self, domain='data.cityofchicago.org', dataset_id='crimes', token=None,
                    store=True, out_fname='pull_df.p', pull_all=False):
                    Pulls new entries from datasource
        
                    Input -
                        domain (string): Socrata database domain hosting data
                        dataset_id (string): dataset ID to pull
                        token (string): Socrata token for increased pull capacity;
                            Note: Requires Socrata account
                        store (boolean): whether or not to write out new dataset
                        pull_all (boolean): pull complete dataset
                        instead of just updating
        
                    Output -
                        None (writes out files if store is True and modifies inplace)
        
        
                timeseries(self, LAT=None, LON=None, EPS=None,_types=None,CSVfile='TS.csv',
                    THRESHOLD=None,tiles=None,incr=6,max_incr=24, poly_tile=False):
        
                    Creates DataFrame of location tiles and their
                    respective timeseries from input datasource with
                    significance threshold THRESHOLD
                    latitude, longitude coordinate boundaries given by LAT, LON and EPS
                    or the custom boundaries given by tiles
                    calls on getTS for individual tile then concats them together
        
                    Input -
                        LAT (float or list of floats): singular coordinate float or list of
                                                       coordinate start floats
                        LON (float or list of floats): singular coordinate float or list of
                                                       coordinate start floats
                        EPS (float): coordinate increment ESP
                        _types (list): event type filter; accepted event type list
                        CSVfile (string): path to output file
                        tiles (list of lists): list of tiles to build
                            (list of [lat1 lat2 lon1 lon2])
                        auto_adjust_time (boolean): if True, within increments specified
                        (6H default), determine optimal temporal frequency for timeseries data
                        incr (int): frequency increment
                        max_incr (int): user-specified maximum increment
                        poly_tile (boolean): whether or tiles define polygons
        
                    Output:
                        No Output grid pd.Dataframe written out as CSV file to path specified
        
        
          **Utility functions for spatioTemporal:**
            .. code-block::
        
                splitTS(TSfile, csvNAME='TS1', dirname='./', prefix='@', BEG=None, END=None,
                    VARNAME='')
                    Utilities for spatio temporal analysis
        
                    Writes out each row of the pd.DataFrame as a separate CSVfile
                    For XgenESeSS binary
        
                    Inputs -
                        TSfile (pd.DataFrame): DataFrame to write out
                        csvNAME (string): output filename
                        dirname (string): directory for output file
                        prefix (string): prefix for files
                        VARNAME (string): string to append to file names
                        BEG (datetime): start date
                        END (datetime): end date
        
                    Outputs -
                        (No output)
        
        
                stringify(List):
                    Utility function
        
                    Converts list into string separated by dashes
                    or empty string if input list is not list or is empty
        
                    Input:
                        List (list): input list to be converted
        
                    Output:
                        (string)
        
        
                to_json(pydict, outFile):
                    Writes dictionary json to file
        
                    Input -
                        pydict (dict): ditionary to store
                        outFile (string): name of outfile to write json to
        
                    Output -
                        (No output but writes out files)
        
        
                readTS(TSfile,csvNAME='TS1',BEG=None,END=None):
                     Utilities for spatio temporal analysis
        
                     Reads in output TS logfile into pd.DF and outputs necessary
                     CSV files in XgenESeSS-friendly format
        
                     Input -
                         TSfile (string or list of strings): filename of input TS to read
                             or list of filenames to read in and concatenate into one TS
                         csvNAME (string)
                         BEG (string): start datetime
                         END (string): end datetime
        
                     Output -
                         dfts (pandas.DataFrame)
        
        
        **class uNetworkModels:**
          Utilities for storing and manipulating XPFSA models
          inferred by XGenESeSS
        
          Attributes:
            jsonFile (string): path to json file containing models
        
          Methods defined here:
        
        .. code-block::
        
            __init__(self, jsonFILE):
        
        
            append(self,pydict):
                Utilities for storing and manipulating XPFSA models
                inferred by XGenESeSS
        
                append models to internal dictionary
        
        
            augmentDistance(self):
                Utilities for storing and manipulating XPFSA models
                inferred by XGenESeSS
        
                Calculates the distance between all models and stores
                them under the
                distance key of each model;
        
                No I/O
        
        
            select(self,var="gamma",n=None,
                reverse=False, store=None,
                high=None,low=None,equal=None,inplace=False):
                Utilities for storing and manipulating XPFSA models
                inferred by XGenESeSS
        
                Selects the N top models as ranked by var specified value
                (in reverse order if reverse is True)
        
                Inputs -
                    var (string): model parameter to rank by
                    n (int): number of models to return
                    reverse (boolean): return in ascending order (True)
                        or descending (False) order
                    store (string): name of file to store selection json
                    high (float): higher cutoff
                    equal (float): choose models with selection values
                        equal to the given value
                    low (float): lower cutoff
                    inplace (bool): update models if true
                Output -
                    (dictionary): top n models as ranked by var
                                 in ascending/descending order
        
        
            setVarname(self):
                Utilities for storing and manipulating XPFSA models
                inferred by XGenESeSS
        
                Extracts the varname for src and tgt of
                each model and stores under src_var and tgt_var
                keys of each model;
        
                No I/O
        
        
            to_json(outFile):
                Utilities for storing and manipulating XPFSA models
                inferred by XGenESeSS
        
                Writes out updated models json to file
        
                Input -
                    outFile (string): name of outfile to write json to
        
                Output -
                    (No output but writes out files)
        
        
            setDataFrame(self,scatter=None):
                Generate dataframe representation of models
        
                Input -
                    scatter (string) : prefix of filename to plot 3X3 regression
                    matrix between delay, distance and coefficiecient of causality
                Output -
                    Dataframe with columns
                    ['latsrc','lonsrc','lattgt', 'lontgtt','gamma','delay','distance']
        
        **class simulateModel**
          Utilities for generating statistical analysis after processing models
        
          **Attributes:**
            * MODEL_PATH(string)- The path to the model being processed.
            * DATA_PATH(string)- Path to the split file.
            * RUNLEN(integer)- Length of the run.
            * READLEN(integer)- Length of split data to read from begining
            * CYNET_PATH - path to cynet binary.
            * FLEXROC_PATH - path to flexroc binary.
        
          **Methods:**
            .. code-block::
        
                run(self, LOG_PATH=None,
                    PARTITION=0.5,
                    DATA_TYPE='continuous',
                    FLEXWIDTH=1,
                    FLEX_TAIL_LEN=100,
                    POSITIVE_CLASS_COLUMN=5,
                    EVENTCOL=3,
                    tpr_thrshold=0.85,
                    fpr_threshold=0.15):
        
        
                This function is intended to replace the cynrun.sh shell script. This
                function will use the subprocess library to call cynet on a model to process
                it and then run flexroc on it to obtain statistics: auc, tpr, fuc.
                Inputs:
                    LOG_PATH(string)- Logfile from cynet run
                    PARTITION(string)- Partition to use on split data
                    FLEXWIDTH(int)-  Parameter to specify flex in flwxroc
                    FLEX_TAIL_LEN(int)- tail length of input file to consider [0: all]
                    POSITIVE_CLASS_COLUMN(int)- positive class column
                    EVENTCOL(int)- event column
                    tpr_thershold(float)- tpr threshold
                    fpr_threshold(float)- fpr threshold
                Returns:
                auc, tpr, and fpr statistics from flexroc.
        
          **Utility functions for simulateModel:**
            .. code-block::
        
                def parallel_process(arguments):
                    This function takes a model and produces statistics on them. The output is
                    saved to a result file with the suffix defined by RESUFFIX. We note that
                    arguments needs to be a list of various arguments (detailed below) due to
                    the nature of joblib. We expect this function to be called by a parallel
                    processing library such as joblib.
                    Inputs:
                        arguments(list) - a list of arguments necessary for the function:
                            arguments[0]-FILE(str): path to the model being processed.
                            arguments[1]-model_nums(int): Number of models to use in prediction
                            arguments[2]-Horizon(int): prediction horizon.
                            arguments[3]-DATA_PATH: path to split file.
                                Ex: './split/1995-01-01_1999-12-31'
                            arguments[4]-RUNLEN(int): the runlength
                            arguments[5]-VARNAME(list)-Variable names to be considering.
                            arguments[6]-RESSUFIX- suffix to add to the end of results.
                            arguments[7]-CYNET_PATH- path to cynet binary.
                            arguments[8]-FLEXROC_PATH- path to flexroc binary.
        
                def run_pipeline(glob_path,model_nums,horizon, DATA_PATH, RUNLEN, VARNAME,
                                RES_PATH, RESSUFIX = '.res', cores = 4):
        
                    This function is intended to take the output models from midway, process
                    them, and produce graphs. This will call the parallel_process function
                    in parallel using joblib. Eventually stores the result as 'res_all.csv'.
                    Cynet and flexroc are binaries written in C++.
                    Inputs:
                        Glob_path(str)-The glob string to be used to find all models.
                            EX: 'models/*model.json'
                        model_nums(list of ints)- The model numbers to use. Ex; [10,15,20,25]
                        Horizon(int)- prediction horizons to test in unit of temporal
                            quantization (using cynet binary)
                        DATA_PATH(str)-Path to the split files.
                            Ex: './split/1995-01-01_1999-12-31'
                        RUNLEN(int)-Length of run. Ex: 2291.
                        VARNAME(list of str)- List of variables to consider.
                        RES_PATH(str)- glob string for glob to locate all result files.
                            Ex:'./models/*model*res'
                        RESUFFIX(str)- suffix to add to the end of results.Ex:'.res'
                        cores(int)-cores to use for parrallel processing.
        
                        Outputs: Produces graphs of statistics.
        
                def get_var(res_csv, coords,varname='auc',VARNAMES=None):
        
                    This function outputs graphs of the results produced by run_pipeline. The
                    graphs concern auc, fpr, and tpr statistics.
                    Inputs:
                        res_csv(str)- path to 'res_all.csv' file produced by run_pipeline.
                        coords(list of str)- the coords to consider.
                            Ex:['lattgt1','lattgt2','lontgt1','lontgt2']
                        varname(str)-the variable name to consider. Ex: 'auc'.
                            VARNAMES(str)- List of the variable name from the dataset
                            to consider.
                            Ex: VARNAMES=['Personnel','Infrastructure','Casualties']
        
        **viscynet library classes:**
          visualization library for Network Models produced by uNetworkModels based on
          matplotlib
        
          Functions:
            .. code-block::
        
              draw_screen_poly(lats, lons, m, ax, val, cmap, ALPHA=0.6)
                  utility function to draw polygons on basemap
        
                  Inputs -
                      lats (list of floats): mpl_toolkits.basemap lat parameters
                      lons (list of floats): mpl_toolkits.basemap lon parameters
                      m (mpl.mpl_toolkits.Basemap): mpl instance for plotting
                      ax (axis parent handle)
                      cax (colorbar parent handle)
                      val (Matplotlib color)
                      cmap (string): colormap cmap parameter
                      ALPHA (float): alpha value to use for plot
        
                  Outputs -
                      (No outputs - modifies objects in place)
        
        
              getalpha(arr, index, F=0.9)
                  ction to normalize transparency of quiver
        
                  Inputs -
                      arr (iterable): list of input values
                      index (int): index position from which alpha value should be taken from
                      F (float): multiplier
                      M (float): minimum alpha value
        
                  Outputs -
                      v (float): alpha value
        
        
              showGlobalPlot(coords, ts=None, fsize=[14, 14], cmap='jet',
                            m=None, figname='fig', F=2):
                  plot global distribution of events within time period specified
        
                  Inputs -
                      coords (string): filename with coord list as lat1.lat2.lon1.lon2
                      ts (string): time series filename with data in rows, space separated
                      fsize (list):
                      cmap (string):
                      m (mpl.mpl_toolkits.Basemap): mpl instance for plotting
                      figname (string): Name of the Plot
                      F (int)
        
                  Output -
                     num (np.array): data values
                     fig (mpl.figure): heatmap of events from fitted data
                     ax (axis handler): output axis handler
                     cax (colorbar axis handler): output colorbar axis handler
        
        
              viz(unet,jsonfile=False,colormap='autumn',res='c',
                  drawpoly=False,figname='fig',BGIMAGE=None,BGIMGNAME='BM',
                  IMGRES='high',WIDTH=0.007):
        
                  Utility function to visualize spatio temporal interaction networks
        
                  Inputs -
                      unet (string): json filename
                      unet (python dict):
                      jsonfile (bool): True if unet is string  specifying json filename
                      colormap (string): colormap
                      res (string): 'c' or 'f'
                      drawpoly (bool): if True draws transparent patch showing srcs
                      figname  (string): prefix of pdf image file
                  Outputs -
                      m (Basemap handle)
                      fig (figure handle)
                      ax (axis handle)
                      cax (colorbar handle)
        
        
              _scaleforsize(a)
                  normalize array for plotting
        
                  Inputs -
                      a (ndarray): input array
                  Output -
                      a (ndarray): output array
        
        
        
        **bokeh_pipe library:**
          visualization library for Network Models produced by uNetworkModels based on
          bokeh
        
          Process overview:
            This code starts from the point
            when the json data files have been obtained.
        
            To get the neighborhood plot:
                1. run json_to_csv on the batch of json files to get the batch of csv files.
                2. run combine_merc to combine the batch of csv files into one csv file in mercator coordinates.
                3. run neighbor_plot on the combined csv file to get the neighbor hood plot.
        
        
            To get the streamline plot:
                1. same as step 1 of neighborhood plot (can be skipped if already done)
        
                2. run streamheat_combine to combine the batch of csv files into one csv file. *THIS IS IN A FORMAT DIFFERENT FROM THAT OF THE NEIGHBORHOOD PLOT.*
        
                3. run crime_stream.py on the combined file.
        
            To get the heatplot:
                1. same as streamline plot.
                2. same as streamline plot.
                3. run heat_map on the combined file.
        
            We have provided two sample datasets for use. 'crime_filtered_data.csv' can be considered
            the combined file for the neighborhood plot. 'contourmerc.csv' can be considered
            the combined file for the streamline plot and the heatplot.
        
          Functions:
            .. code-block::
        
              json_to_csv(FILEPATH, DEST):
                  This function takes a group of json data files and transforms
                  them into csv files for use. Edit the selection variables as
                  you see fit. It is very important that you initialize DEST to a folder,
                  as it generates many csv files. WARNING: Run this function in
                  python2. The rest of the code should use python3.
                  THIS TAKES QUITE A BIT OF TIME.
        
                Inputs -
                    FILEPATH (string): the filepath to the json files. Example: 'jsons/'
                    DEST (string): the place for the csv files to be stored. Example: 'csvs/'
        
        
              combine_merc(DIR, filename, N = 20):
                  This function combines the csv's into a single file. At the same time,
                  this function will convert the format of the coordinates from longitude
                  and latitude which is necessary to make our neighborhood plot. Our tileset
                  accepts mercator coordinates. This generates one combined csv in the
                  current directory. USE PYTHON 3.
        
                  Inputs:
                      DIR (string): The location(filepath) of the csvs to be combined.
                          Example 'csvs/'
                      filename (string): the desired name for the combined csv file.
                          Example: 'combined.csv'
                      N (int): the max number of sources selected for in json_to_csv:
                          M.select(var='delay',high=20,reverse=False,inplace=True).
                          high argument is N.
        
        
              neighbor_plot(filepath= 'crime_filtered_data.csv'):
                  This is the first implementation of our Bokeh plot. The function takes
                  the filepath of the data and opens the bokeh plot in a browser. Chrome
                  seems to be the best browser for bokeh plots. The datafile must be a csv
                  file in the correct format. See the file 'crime_filtered_data.csv' for an
                  example. Each row represents a point, all the lines(sources) connected to
                  it and the gammas and delays associated with the lines. The current
                  implementation results in the bokeh plot, and a linked table of the data.
                  IMPORTANT: Points are in MERCATOR Coordinates. This is because the current
                  tileset for the map is in mercator coordinates.
                  Example file is 'crime_filtered_data.csv'
        
                  Inputs -
                    filepath (string): input data file
        
        
              streamheat_combine(DIR, filename):
                  We need to once again combine the csvs, into a format appropriate for the
                  streamplots. This file will do that. This function will produce two files.
                  File 1 will be in longitude and latitude. File 2 will be in mercator
                  coordinates. We will be primiarily working with file 2
        
                  Inputs -
                      DIR (string): The filepath to the csvs. Ex: 'csvs/'
                      filename (string): The filename for the combined csv file.
                        Ex: 'contourmerc.csv'
        
        
              crime_stream(datafile='contourmerc.csv',density=4, npoints=10,
                  output_name='streamplot.html', method = 'cubic'):
                  This function takes a csv datafile of crime vectors, reads it into
                  a pandas dataframe and plots the streamplot using Delanuay
                  interpolation. Function will open the plot in a new browser. Use chrome.
                  Inputs:
                      datafile: name of the csv file. Example file is 'contourmerc.csv'
                      density: desired line density of the plot. Ex: 4.
                      npoints: The dimensions used for the streamplot. The grid will
                          have npoints**2 number of grids. It is not advised to have
                          npoints > 200.
                          Reccommended: npoints =10.
                      ouput_name: name to save plot to.
                      method: method for interpolation. 'cubic','linear', or 'nearest'
        
        
              heat_map(datafile='contourmerc.csv', npoints=300, output_name='heatmap.html',
                      method = 'linear'):
                  Makes a heatmap from the same datafile that cimre_stream uses.
                  datafile: name of the datafile. Example file is 'contourmerc.csv'.
                  npoints: dimension for plot. number of squares = npoints**2.
                      Recommended: 100-300
        
                  Inputs -
                    output_name (string): output file name for the plot.
                    method (string): method for interpolation. 'cubic','linear', or 'nearest'
        
        
        VERSION 1.1.03
        
Keywords: spatial,temporal,inference,statistical,causality
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 2.7
