Metadata-Version: 2.1
Name: sankee
Version: 0.0.1
Summary: Visualize classified time series data with interactive Sankey plots in Google Earth Engine.
Home-page: https://github.com/aazuspan/sankee
Author: Aaron Zuspan
Author-email: aa.zuspan@gmail.com
License: UNKNOWN
Project-URL: Bug Reports, https://github.com/aazuspan/sankee/issues
Project-URL: Source, https://github.com/aazuspan/sankee/
Keywords: sankey land cover visualization
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, <4
Description-Content-Type: text/markdown
Requires-Dist: backcall (==0.2.0)
Requires-Dist: certifi (==2020.12.5)
Requires-Dist: cffi (==1.14.5)
Requires-Dist: decorator (==4.4.2)
Requires-Dist: earthengine-api (==0.1.254)
Requires-Dist: google-api-python-client (==1.12.8)
Requires-Dist: google-auth-httplib2 (==0.1.0)
Requires-Dist: httplib2 (==0.19.0)
Requires-Dist: httplib2shim (==0.0.3)
Requires-Dist: ipykernel (==5.5.0)
Requires-Dist: ipython-genutils (==0.2.0)
Requires-Dist: jsonschema (==3.2.0)
Requires-Dist: nbformat (==5.1.2)
Requires-Dist: numpy (==1.20.1)
Requires-Dist: pandas (==1.2.3)
Requires-Dist: pickleshare (==0.7.5)
Requires-Dist: plotly (==4.14.3)
Requires-Dist: protobuf (==3.15.5)
Requires-Dist: pyasn1 (==0.4.8)
Requires-Dist: pyasn1-modules (==0.2.8)
Requires-Dist: pytz (==2021.1)
Requires-Dist: retrying (==1.3.3)
Requires-Dist: wcwidth (==0.2.5)
Requires-Dist: prompt-toolkit (==3.0.16) ; python_full_version >= "3.6.1"
Requires-Dist: future (==0.18.2) ; python_version >= "2.6" and python_version not in "3.0, 3.1, 3.2, 3.3"
Requires-Dist: pyparsing (==2.4.7) ; python_version >= "2.6" and python_version not in "3.0, 3.1, 3.2, 3.3"
Requires-Dist: attrs (==20.3.0) ; python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3"
Requires-Dist: idna (==2.10) ; python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3"
Requires-Dist: packaging (==20.9) ; python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3"
Requires-Dist: pycparser (==2.20) ; python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3"
Requires-Dist: python-dateutil (==2.8.1) ; python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3"
Requires-Dist: six (==1.15.0) ; python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3"
Requires-Dist: uritemplate (==3.0.1) ; python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3"
Requires-Dist: chardet (==4.0.0) ; python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3, 3.4"
Requires-Dist: requests (==2.25.1) ; python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3, 3.4"
Requires-Dist: urllib3 (==1.26.3) ; python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3, 3.4" and python_version < "4"
Requires-Dist: google-api-core (==1.26.1) ; python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3, 3.4, 3.5"
Requires-Dist: google-auth (==1.27.1) ; python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3, 3.4, 3.5"
Requires-Dist: google-cloud-core (==1.6.0) ; python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3, 3.4, 3.5"
Requires-Dist: google-cloud-storage (==1.36.1) ; python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3, 3.4, 3.5"
Requires-Dist: google-resumable-media (==1.2.0) ; python_version >= "2.7" and python_version not in "3.0, 3.1, 3.2, 3.3, 3.4, 3.5"
Requires-Dist: google-crc32c (==1.1.2) ; python_version >= "3.5"
Requires-Dist: jupyter-client (==6.1.11) ; python_version >= "3.5"
Requires-Dist: pygments (==2.8.1) ; python_version >= "3.5"
Requires-Dist: pyrsistent (==0.17.3) ; python_version >= "3.5"
Requires-Dist: tornado (==6.1) ; python_version >= "3.5"
Requires-Dist: googleapis-common-protos (==1.53.0) ; python_version >= "3.6"
Requires-Dist: jedi (==0.18.0) ; python_version >= "3.6"
Requires-Dist: jupyter-core (==4.7.1) ; python_version >= "3.6"
Requires-Dist: parso (==0.8.1) ; python_version >= "3.6"
Requires-Dist: pyzmq (==22.0.3) ; python_version >= "3.6"
Requires-Dist: rsa (==4.7.2) ; python_version >= "3.6"
Requires-Dist: ipython (==7.21.0) ; python_version >= "3.7"
Requires-Dist: traitlets (==5.0.5) ; python_version >= "3.7"
Requires-Dist: cachetools (==4.2.1) ; python_version ~= "3.5"
Requires-Dist: colorama (==0.4.4) ; sys_platform == "win32"
Requires-Dist: pywin32 (==300) ; sys_platform == "win32"
Provides-Extra: dev

# sankee
Visualize classified time series data with interactive Sankey plots in Google Earth Engine

[![MODIS glacier loss example Sankey plot](examples/MODIS.png)](https://htmlpreview.github.io/?https://github.com/aazuspan/sankee/main/examples/MODIS.html)



## Description
`sankee` provides a dead-simple API that combines the power of [GEE](https://github.com/google/earthengine-api) and [Plotly](https://github.com/plotly/plotly.py) to visualize changes in land cover, plant health, burn severity, or any other classified imagery over a time series in a region of interest using interactive Sankey plots. Use a library of built-in datasets like NLCD, MODIS Land Cover, or CGLS for convenience, or define your own custom datasets for flexibility.

## Installation
```
pip install sankee
```

## Requirements
- An authenticated GEE Python environment ([official guide](https://developers.google.com/earth-engine/guides/python_install))

## Quick start 
### Using a Premade Dataset
Datasets in `sankee` are used to apply labels and colors to classified imagery. `sankee` includes premade `Dataset` objects for common classified datasets in GEE like NLCD, MODIS land cover, and CGLS. See [datasets](https://github.com/aazuspan/sankee#Datasets) for a detailed explanation.
```python
import ee
import sankee

ee.Initialize()

# Choose a premade dataset object that contains band, label, and palette information for NLCD
dataset = sankee.datasets.NLCD2016

# Build a list of images
img_list = [ee.Image("USGS/NLCD/NLCD2001"), ee.Image("USGS/NLCD/NLCD2016")]
# Build a matching list of labels for the images (optional)
label_list = ["2001", "2016"]

# Define an area of interest
vegas = ee.Geometry.Polygon(
        [[[-115.4127152226893, 36.29589873319828],
          [-115.4127152226893, 36.12082334399102],
          [-115.3248245976893, 36.12082334399102],
          [-115.3248245976893, 36.29589873319828]]])

# Choose a title to display over your plot (optional)
title = "Las Vegas Urban Sprawl, 2001 - 2016"

# Generate your Sankey plot
plot = sankee.sankify(img_list, vegas, label_list, dataset, max_classes=4, title=title)
```
[![NLCD Las Vegas urbanization example Sankey plot](examples/NLCD.png)](https://htmlpreview.github.io/?https://github.com/aazuspan/sankee/main/examples/NLCD.html)

### Using a Custom Dataset
Datasets can also be manually defined for custom datasets. In this example, we'll classify 1-year and 5-year post-fire Landsat imagery using NDVI and visualize plant recovery using `sankee`.
```python
import ee
import sankee

ee.Initialize()

# Load fire perimeters from MTBS data
fires = ee.FeatureCollection("users/aazuspan/fires/mtbs_1984_2018")
# Select the 2014 Happy Camp Complex fire perimeter in California
fire = fires.filterMetadata("Fire_ID", "equals", "CA4179612337420140814")

# Load imagery 1 year after fire and 5 years after fire
immediate = ee.Image("LANDSAT/LC08/C01/T1_TOA/LC08_045031_20150718")
recovery = ee.Image("LANDSAT/LC08/C01/T1_TOA/LC08_046031_20200807")

# Calculate NDVI
immediate_NDVI = immediate.normalizedDifference(["B5", "B4"])
recovery_NDVI = recovery.normalizedDifference(["B5", "B4"])

# Reclassify continuous NDVI values into classes of plant health
immediate_class = ee.Image(1) \
  .where(immediate_NDVI.lt(0.3), 0) \
  .where(immediate_NDVI.gt(0.5), 2) \
  .rename("health")

recovery_class = ee.Image(1) \
  .where(recovery_NDVI.lt(0.3), 0) \
  .where(recovery_NDVI.gt(0.5), 2) \
  .rename("health")

# Specify the band name for the image
band = "health"

# Assign labels to the pixel values defined above
labels = {
    0: "Unhealthy",
    1: "Moderate",
    2: "Healthy"
}
# Assign colors to the pixel values defined above
palette = {
    0: "#e5f5f9",
    1: "#99d8c9",
    2: "#2ca25f"
}

# Define the images to use and create labels to describe them
img_list = [immediate_class, recovery_class]
label_list = ["Immediate", "Recovery"]

# Generate your Sankey plot
plot = sankee.sankify(img_list, fire, label_list, band=band, labels=labels, palette=palette, scale=20)
```
[![NDVI post-fire recovery example Sankey plot](examples/NDVI.png)](https://htmlpreview.github.io/?https://github.com/aazuspan/sankee/main/examples/NDVI.html)


### Datasets

Datasets in `sankee` define how classified image values are labeled and colored when plotting (e.g. a value of 42 in an NLCD 2016 image should be labeled "Evergreen forest" and colored green). The `labels` and `palette` arguments for `sankee` functions can be provided manually as dictionaries where pixel values are the keys and labels or colors are the values. Every value in the image __must__ have a corresponding color and label. Datasets also define the name of the `band` in the image that contains the classified values.

Any classified image can be visualized by manually defining a band, palette, and labels. However, premade datasets are included for convenience in the `sankee.datasets` module. To access a dataset, use its name, such as `sankee.datasets.NLCD2016`. To get a list of all dataset names, run `sankee.datasets.names()`. Datasets can also be accessed using `sankee.datasets.get()`, which returns a list of `Dataset` objects that can be selected by indexing.
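
Because every pixel value needs both a label and a color, it can help to check custom dictionaries before plotting. The following is a minimal sketch in plain Python; `check_class_coverage` is a hypothetical helper written for this example and is not part of `sankee`.

```python
def check_class_coverage(pixel_values, labels, palette):
    """Return the pixel values missing a label or a color, as sorted lists.

    Hypothetical helper for illustration only; it is not part of sankee.
    """
    values = set(pixel_values)
    missing_labels = sorted(values - set(labels))
    missing_colors = sorted(values - set(palette))
    return missing_labels, missing_colors


labels = {0: "Unhealthy", 1: "Moderate", 2: "Healthy"}
palette = {0: "#e5f5f9", 1: "#99d8c9"}  # the color for class 2 is left out on purpose

missing_labels, missing_colors = check_class_coverage([0, 1, 2], labels, palette)
print(missing_labels)  # []
print(missing_colors)  # [2]
```

Here the list of pixel values is typed in by hand; for a real image, the values would have to come from the classification scheme you used.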

# API
## Core function
### sankee.sankify(image_list, region, *label_list, dataset, band, labels, palette, exclude, max_classes, n, title, scale, seed, dropna*)

Generate `n` random sample points within a `region` and extract classified pixel values from each image in an `image_list`. Arrange the sample data into a Sankey plot that can be used to visualize changes in image classifications.
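
To make the idea concrete, the sketch below shows one general way that sampled class values can be turned into the links of a Sankey diagram: count how many sample points move from each class to each other class between consecutive images. This is an illustration in plain Python, not the code `sankee` uses internally; `count_transitions` and the sample values are made up for the example.

```python
from collections import Counter


def count_transitions(samples):
    """Count class-to-class transitions between consecutive images.

    `samples` holds one tuple per sample point, with the class value
    observed in each image in time order. The result is keyed by
    (step, source_class, target_class), where step 0 is the transition
    from the first image to the second.
    """
    links = Counter()
    for sample in samples:
        for step, (source, target) in enumerate(zip(sample, sample[1:])):
            links[(step, source, target)] += 1
    return links


# Four hypothetical sample points observed in two images
samples = [(1, 1), (1, 2), (1, 2), (0, 2)]
links = count_transitions(samples)
print(links[(0, 1, 2)])  # 2
```

Each key would correspond to one link in the plot, and its count to the width of that link.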

__Arguments__  
- image_list (list)
  - An ordered list of images representing a time series of classified data. Each image will be sampled to generate the Sankey plot. Any length of list is allowed, but lists with more than 3 or 4 images may produce unusable plots. 
- region (ee.Geometry)
  - A region to generate samples within.
- *label_list (list, default: None)*
  - A list of labels corresponding to the images. The list must be the same length as `image_list`. If none is provided, sequential numeric labels will be automatically assigned starting at 0.
- *dataset (sankee.datasets.Dataset, default: None)*
  - A premade dataset that defines the band, labels, and palette for all images in `image_list`. If none is provided, `band`, `labels`, and `palette` must be provided instead.
- *band (str, default: None)*
  - The name of the band in all images of `image_list` that contains classified data. If none is provided, `dataset` must be provided instead.
- *labels (dict, default: None)*
  - The labels associated with each value of all images in `image_list`. Every value in the images must be included in the `labels` dictionary. If none is provided, `dataset` must be provided instead.
- *palette (dict, default: None)*
  - The colors associated with each value of all images in `image_list`. Every value in the images must be included in the `palette` dictionary. If none is provided, `dataset` must be provided instead. Colors must be supported by `Plotly`.
- *exclude (list, default: None)*
  - An optional list of pixel values to exclude from the plot. Excluded values must be raw pixel values rather than class labels.
- *max_classes (int, default: None)*
  - If a value is provided, small classes will be removed until `max_classes` remain.
- *n (int, default: 100)*
  - The number of sample points to randomly generate for characterizing all images. More samples will provide more representative data but will take longer to process.
- *title (str, default: None)*
  - An optional title that will be displayed above the Sankey plot.
- *scale (int, default: None)*
  - The scale in image units to perform sampling at. If none is provided, GEE will attempt to use the image's nominal scale, which may cause errors.
- *seed (int, default: 0)*
  - The seed value used to generate repeatable results during random sampling.
- *dropna (bool, default: True)*
  - If the `region` extends into areas that contain no data in any image, some samples may have null values. If `dropna` is True, those samples will be dropped. This may lead to fewer samples being returned than were requested by `n`. 
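
The effect of `exclude`, `max_classes`, and `dropna` can be pictured as filtering a table of samples before plotting. The sketch below is one plausible reading of the descriptions above, written in plain Python. It is an assumption, not `sankee`'s implementation: `filter_samples` is a hypothetical function, and the choice to drop a whole sample when any of its values is removed is a guess.

```python
from collections import Counter


def filter_samples(samples, exclude=None, max_classes=None, dropna=True):
    """Filter sample rows the way the arguments above are described.

    Hypothetical illustration only. Each row holds the class value seen
    in each image, with None standing for a missing (null) value.
    """
    excluded = set(exclude or [])
    rows = [tuple(row) for row in samples]

    # dropna: discard samples with a null value in any image
    if dropna:
        rows = [row for row in rows if None not in row]

    # exclude: discard samples that contain an excluded pixel value
    rows = [row for row in rows if not excluded.intersection(row)]

    # max_classes: keep only the most frequent classes (assumed behavior)
    if max_classes is not None:
        counts = Counter(v for row in rows for v in row if v is not None)
        keep = {value for value, _ in counts.most_common(max_classes)}
        rows = [row for row in rows if all(v is None or v in keep for v in row)]

    return rows


samples = [(41, 41), (41, 22), (22, 22), (11, None), (90, 41), (41, 41)]
print(filter_samples(samples, exclude=[11], max_classes=2))
# [(41, 41), (41, 22), (22, 22), (41, 41)]
```

In this made-up data, the sample with a null value is dropped first, and the sample containing the rare class 90 is removed because only the two most common classes are kept.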

__Returns__
- A `Plotly` Sankey plot object.

---
## Dataset functions
### sankee.datasets.names()

Get a list of supported dataset names. Names can be used to access datasets using `sankee.datasets.{dataset_name}`.

__Arguments__
- None

__Returns__ (list)
- A list of strings for supported dataset names.

### sankee.datasets.get(*i*)
Get a list of supported `sankee.datasets.Dataset` objects.  
__Arguments__
- *i (int, default: None)*
  - An optional index to retrieve a specific dataset.  

__Returns__ (list)
- A list of supported `sankee.datasets.Dataset` objects. If `i` is provided, only one object is returned.

### sankee.datasets.Dataset.get_images(*max_images*)
Get a list of image names in the collection of a specific dataset.  
__Arguments__
- *max_images (int, default: 20)*
  - The max number of images to return.  

__Returns__ (list)
- A list of image names that can be used to load `ee.Image` objects.

__Example__
```python
sankee.datasets.NLCD2016.get_images(3)

# >> ['USGS/NLCD/NLCD1992', 'USGS/NLCD/NLCD2001', 'USGS/NLCD/NLCD2001_AK', '...']
```
---
## Dataset properties and attributes
### sankee.datasets.Dataset.collection
- Return the image collection associated with the dataset.

### sankee.datasets.Dataset.df
- Return a Pandas dataframe describing the classes, labels, and colors associated with the dataset.

### sankee.datasets.Dataset.id
- Return the system ID of the image collection.

