Metadata-Version: 2.1
Name: change_detection
Version: 0.3.1
Summary: package for detecting change in time-series data
Home-page: https://github.com/ebmdatalab/change_detection
License: UNKNOWN
Author: Alex Walker
Author-email: alex.walker@phc.ox.ac.uk
Description-Content-Type: text/markdown
Classifier: License :: OSI Approved :: MIT License
Requires-Dist: pandas >= 0.24
Requires-Dist: pandas-gbq >= 0.8.0
Requires-Dist: numpy
Requires-Dist: rpy2
Requires-Dist: ebmdatalab >=0.0.20

# Change detection in prescribing data

Detects changes in time series with a python wrapper around the R package gets (https://cran.r-project.org/web/packages/gets/index.html). Uses a combination of Google BigQuery and Python to query data, which is then fed to the R change detection code. Outputs a table containing results.

## Installation
`pip install change_detection`

Anaconda users may have to `conda install rpy2` and `conda install geopandas` if not already installed.

## Usage
See https://github.com/ebmdatalab/change_detection/blob/master/examples/examples.ipynb for examples of use.

## Data flow
1. Get data, by:
    - using a csv in `data/<name>`, which must have only the fields `code`, `month`, `numerator` and `denominator`
    - creating a BigQuery SQL query in the same folder as the notebook that you're using, query must produce a table with only the fields `code`, `month`, `numerator` and `denominator`
    - querying any number of the OpenPrescribing measures in BigQuery
2. Reshapes data with Pandas
3. Splits data into chunks and passes each chunk to the R change detection code
4. The resulting output is then extracted with further R code
5. The R outputs are then concatenated

### Options
- `name` specifies either the name of the custom SQL file, or the name of the BigQuery measure to be queried
- `verbose` makes the R output more verbose to help with bug fixing _default = False_
- `sample` for testing purposes, takes a random sample of 100 entities, to reduce processing time _default = False_
- `measure` specifies that the `name` specified refers to a measure, rather than custom SQL _default = False_
- `direction` specifies which direction to look for changes, may be `'up'`, `'down'`, or `'both'`, _default = 'both'_
- `use_cache` passes the `use_cache` option to `bq.cached_read` _default = True_
- `csv_name` to specify a .csv file to be used in the change detection, rather than getting the data from BigQuery
- `overwrite` forces reprocessing of the change detection, default behaviour is to not re-run if the output files exist _default = False_
- `draw_figures` draw an R plot for each of the time-series, along with plotting regression lines/breaks. These are stored in the `figures` folder. Options are `'no'` or `'yes'` _default = 'no'_

## Output table

### Timing Measures
`is.tfirst` First negative break
`is.tfirst.pknown` First negative break after a known intervention date
`is.tfirst.pknown.offs` First negative break after a known intervention date not offset by a XX% increase
`is.tfirst.offs` First negative break not offset by a XX% increase
`is.tfirst.big` Steepest break as identified by `is.slope.ma`

### Slope Measures
`is.slope.ma` Average slope over steepest segment contributing at least XX% of total drop
`is.slope.ma.prop` Average slope as proportion to prior level
`is.slope.ma.prop.lev` Percentage of the total drop the segment used to evaluate the slope makes up

### Level Measures
`is.intlev.initlev` Pre-drop level
`is.intlev.finallev` End level
`is.intlev.levd` Difference between pre and end level
`is.intlev.levdprop` Proportion of drop


## Requirements

Python with an associated install of R. Python dependencies should be dealt with on installation (though for my install, I had to install rpy2 separately. R packages should be installed with the package is first loaded.

### Python installation requires:
- ebmdatalab library https://github.com/ebmdatalab/datalab-pandas
- rpy2 (to install R and the below libraries)
- pandas
- pandas-gbq
- numpy

### R installation requires:
- zoo
- caTools
- gets


