Metadata-Version: 2.1
Name: energy-dashboard-client
Version: 0.3
Summary: Energy Dashboard Command Line Interface (CLI)
Home-page: http://github.com/energy-analytics-project/energy-dashboard-cli
Author: Todd Greenwood-Geer (Enviro Software Solutions, LLC)
Author-email: pub+github@zwrob.com
License: UNKNOWN
Description: # Energy Dashboard Command Line Interface (edc)
        
        Command Line Interface for the Energy Dashboard.
        
        !!!PRE-ALPHA!!!
        
        While this is the master branch, this project is not released yet. Stand by...
        
        All example commands, installation steps, etc. assume a Linux (Ubuntu) installation
        and use the `apt` package manager.
        
        ## Prerequisites
        
        ### Install basic deps
        
        ```bash
        sudo apt install parallel
        sudo apt install build-essential
        sudo apt install git
        ```
        
        ### Install git-lfs (Git Large File Storage)
        
        *git-lfs* is used for storing the database files, which are basically binary blobs
        that are updated periodically. Rather than store all the db blob revisions in the 
        git repository, which would bloat it considerably, the db blobs are offloaded
        to git-lfs.
        
        For installation instructions, go here:
        
        * https://git-lfs.github.com/
        
        Example:
        
        ```bash
        curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
        ```
        
        ### Install conda/anaconda
        
        You don't strictly _need_ anaconda for this toolchain to work. If you prefer
        mucking with python virtualenv directly, then go for it. I find that anaconda
        works really well with other parts of this toolchain, namely Jupyter Notebooks. 
        All the examples and documentation will assume you are using anaconda.
        
        See the website for current instructions:
        
        * https://www.anaconda.com/distribution/#download-section
        
        Example:
        
        ```bash
        wget https://repo.anaconda.com/archive/Anaconda3-2019.07-Linux-x86_64.sh
        chmod +x Anaconda3-2019.07-Linux-x86_64.sh 
        ./Anaconda3-2019.07-Linux-x86_64.sh 
        ```
        
        ## Installation
        
        This webpage has a great tutorial on how to use conda. It's what I use
        when I forget the commands and concepts:
        
        * https://geohackweek.github.io/Introductory/01-conda-tutorial/
        
        First, create a conda environment. It can be named anything; I'll call
        this one `edc-cli`:
        
        ```bash
        conda update conda
        conda create -n edc-cli python=3 numpy jupyter pandas
        conda activate edc-cli
        ```
        
        Then install the energy-dashboard-client:
        
        ```bash
        pip install -U energy-dashboard-client
        ```
        
        
        ## Setup
        
        The energy-dashboard-client has two commands to get you up and running with an
        energy-dashboard:
        
        * clone : this will literally use git to clone the energy-dashboard repo to your local machine
        * update : this will pull down all the submodules to your local machine
        
        Note: if you only want a subset of the submodules installed on your local machine, then you
        can remove the others with `git submodule deinit data/[name-of-submodule-to-remove]`.
        
        As always, let me know if you need better tooling around this or any other aspect of this project.
        
        ```bash
        mkdir foo
        cd foo
        edc clone
        cd energy-dashboard
        edc update
        ```
        
        At this point you should have a working environment.
        
        Verify that you have files:
        
        ```bash
        $ tree -L 1
        .
        ├── data
        ├── docs
        ├── LICENSE
        ├── notebooks
        ├── README.md
        └── run.sh
        ```
        
        Verify that `edc` works:
        
        ```bash
        $ edc --help
        Usage: edc [OPTIONS] COMMAND [ARGS]...
        
          Command Line Interface for the Energy Dashboard. This tooling  collects
          information from a number of data feeds, imports that data,  transforms
          it, and inserts it into a database.
        
        Options:
          --ed-dir TEXT                   Energy Dashboard directory (defaults to cwd)
          --log-level [CRITICAL|ERROR|WARNING|INFO|DEBUG]
          --help                          Show this message and exit.
        
        Commands:
          clone    Clone energy-dashboard locally
          feed     Manage individual 'feed' (singular).
          feeds    Manage the full set of data 'feeds' (plural).
          license  Show the license (GPL v3).
          update   Update the submodules
        ```
        
        Verify that you can list out the data feeds:
        
        ```bash
        $ edc feeds list | head
        data-oasis-atl-ruc-zone-map
        data-oasis-cbd-nodal-grp-cnstr-prc
        data-oasis-cmmt-rmr-dam
        data-oasis-atl-sp-tie
        data-oasis-prc-mpm-cnstr-cmp-dam
        data-oasis-trns-curr-usage-all-all
        data-oasis-ene-baa-mkt-events-rtd-all
        data-oasis-ene-eim-transfer-limit-all-all
        data-oasis-as-results-dam
        data-oasis-ene-wind-solar-summary
        ```
        
        Using `find` we can verify that we don't have any data files 
        such as .zip, .xml, or .sql in the tree, but that we _do_ have
        the state files:
        
        ```bash
        $ find data/ | grep state | head
        data/data-oasis-atl-ruc-zone-map/sql/state.txt
        data/data-oasis-atl-ruc-zone-map/xml/state.txt
        data/data-oasis-atl-ruc-zone-map/zip/state.txt
        data/data-oasis-cbd-nodal-grp-cnstr-prc/sql/state.txt
        data/data-oasis-cbd-nodal-grp-cnstr-prc/xml/state.txt
        data/data-oasis-cbd-nodal-grp-cnstr-prc/zip/state.txt
        data/data-oasis-cmmt-rmr-dam/sql/state.txt
        data/data-oasis-cmmt-rmr-dam/xml/state.txt
        data/data-oasis-cmmt-rmr-dam/zip/state.txt
        data/data-oasis-atl-sp-tie/sql/state.txt
        ```
        
        Now verify what databases you have downloaded...
        
        ```bash
        $ find data/ | grep "\.db$" | head
        data/data-oasis-atl-gen-cap-lst/db/data-oasis-atl-gen-cap-lst_00.db
        data/data-oasis-sld-adv-fcst-rtd/db/data-oasis-sld-adv-fcst-rtd_01.db
        data/data-oasis-sld-adv-fcst-rtd/db/data-oasis-sld-adv-fcst-rtd_00.db
        data/data-oasis-sld-sf-eval-dmd-fcst/db/data-oasis-sld-sf-eval-dmd-fcst_00.db
        data/data-oasis-sld-sf-eval-dmd-fcst/db/data-oasis-sld-sf-eval-dmd-fcst_03.db
        data/data-oasis-sld-sf-eval-dmd-fcst/db/data-oasis-sld-sf-eval-dmd-fcst_01.db
        data/data-oasis-sld-sf-eval-dmd-fcst/db/data-oasis-sld-sf-eval-dmd-fcst_05.db
        data/data-oasis-sld-sf-eval-dmd-fcst/db/data-oasis-sld-sf-eval-dmd-fcst_04.db
        data/data-oasis-sld-sf-eval-dmd-fcst/db/data-oasis-sld-sf-eval-dmd-fcst_02.db
        data/data-oasis-ene-flex-ramp-dc-rtd-all/db/data-oasis-ene-flex-ramp-dc-rtd-all_00.db
        ```
        
        I'll go over this in more detail below, but there are multiple database files
        for a given data feed because the feed has multiple formats (argh!) and I
        have not yet sorted out how to deal with that. More on this later.
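        
        To see at a glance how many database files each feed has, a small Python sketch like
        the following could group them. It assumes the `data/<feed>/db/<feed>_NN.db` layout
        visible in the listing above; `databases_by_feed` is a hypothetical helper name, not
        an `edc` command.
        
        ```python
        from collections import defaultdict
        from pathlib import Path
        
        
        def databases_by_feed(data_dir):
            """Map each feed name to the sorted list of its .db file names.
        
            Assumes the layout data/<feed>/db/<name>.db (an assumption based on
            the `find` output, not a documented contract).
            """
            found = defaultdict(list)
            for db_path in Path(data_dir).glob("*/db/*.db"):
                # db_path is data/<feed>/db/<file>, so the feed is two levels up.
                feed_name = db_path.parent.parent.name
                found[feed_name].append(db_path.name)
            return {feed: sorted(names) for feed, names in found.items()}
        ```
        
        Calling `databases_by_feed("data")` from the energy-dashboard directory would return a
        dictionary keyed by feed name, which makes feeds with more than one database easy to spot.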
        
        
        ## Use Cases
        
        ### Create Jupyter Notebook
        
        TODO
        
        This is what most of the users of this project want to do.
        
        ### Process Data Feeds
        
        At a high level, a data feed is simply a url and some instructions for processing
        it. The url is stored in the `manifest.json`, and the processing instructions
        are stored in the `./src` directory. The `./src` directory contains python files
        that handle downloading, parsing, constructing sql insert statements, and inserting
        the data into a sqlite3 database. See the section on `Add New Data Feed` for more
        details on the construction of a data feed.
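        
        As a rough illustration of the first half of that description, the sketch below reads
        the url out of a feed's `manifest.json`. It assumes the manifest is a JSON object with
        a top-level `url` key (suggested by the `jq .url < manifest.json` examples later in this
        document); `feed_url` is a hypothetical helper name, not part of `edc`.
        
        ```python
        import json
        from pathlib import Path
        
        
        def feed_url(feed_dir):
            """Return the source url recorded in a feed's manifest.json.
        
            Assumption: the manifest is a JSON object with a top-level "url" key.
            """
            manifest_path = Path(feed_dir) / "manifest.json"
            with manifest_path.open(encoding="utf-8") as handle:
                manifest = json.load(handle)
            return manifest["url"]
        ```
        
        For example, `feed_url("data/data-oasis-atl-ruc-zone-map")` would return whatever url
        that feed's manifest declares, and raises `KeyError` if the key is missing.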
        
        Data feeds are processed in stages, and you can think of the processing along two
        axes: horizontal and vertical.
        
        Horizontally, the process is very simple, we move from downloading a resource
        through the stages until we insert the records into a database. The DATABASE 
        is the final product.
        
        ```bash
        DOWNLOAD -> EXTRACT -> PARSE -> SQL -> INSERT -> *DATABASE*
        ```
        
        Vertically, the process is also very simple. Each stage processes all the artifacts
        from the previous stage. However, we need this to be robust and re-startable. No idea
        if that's a word, but it's a real thing. Machines crash, or you may need to stop
        for some reason. I've also had the case where I had an error in the logic in a given
        stage, and needed to start over from scratch. How do we do this here? Easy. A `state`
        file:
        
        ```bash
        DOWNLOAD     -> EXTRACT      -> PARSE SQL    -> INSERT       -> *DATABASE*
        ./zip/          ./xml/          ./sql/          ./db/
            state.txt       state.txt       state.txt       state.txt
        ```
        
        Each state.txt contains a list of artifacts that have already been processed. 
        Originally I called these: ./zip/downloaded.txt, ./xml/unzipped.txt, etc. But 
        after working with this for a few days, it is easier to just `cat` out [dir]/state.txt
        rather than remembering what each state file is named.
        
        Each stage in the pipeline looks at the artifacts in the previous stage and compares
        that list with the list of previously processed artifacts in its `state.txt` file, and
        then gives the delta of new files to the processing code.
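        
        A minimal Python sketch of that delta computation might look like the following. It is
        not the project's actual code: the function names are hypothetical, and it assumes that
        `state.txt` holds one artifact file name per line, which this document does not spell out.
        
        ```python
        from pathlib import Path
        
        
        def read_state(state_path):
            """Return the set of artifact names already recorded in a state file."""
            path = Path(state_path)
            if not path.exists():
                return set()
            # Assumption: one artifact name per line; blank lines are ignored.
            return {line.strip() for line in path.read_text().splitlines() if line.strip()}
        
        
        def new_artifacts(previous_stage_dir, state_path, suffix):
            """List artifacts from the previous stage that are not yet in the state file."""
            done = read_state(state_path)
            candidates = sorted(p.name for p in Path(previous_stage_dir).glob(f"*{suffix}"))
            return [name for name in candidates if name not in done]
        
        
        def mark_processed(state_path, name):
            """Append one artifact name to the state file once it has been handled."""
            with Path(state_path).open("a") as handle:
                handle.write(name + "\n")
        ```
        
        Under these assumptions, the extract stage would call
        `new_artifacts("zip", "xml/state.txt", ".zip")`, process each returned file, and call
        `mark_processed` after each one, so an interrupted run picks up where it left off.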
        
        So, to restart a given stage, you just delete the stage directory. This deletes all 
        the generated artifacts *and* the state file. Voila. You are ready to start over.
        
        Note: if you delete a stage, you may want to delete the subsequent stages, too.
        
        Here's the command that does this for you:
        
        ```bash
        edc feed reset [feed name] --stage [stage]
        ```
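        
        To make the mechanics concrete, here is a rough Python sketch of what a reset amounts
        to: deleting a stage directory (artifacts plus `state.txt`) and, optionally, every stage
        after it. This is an illustration under stated assumptions, not the `edc`
        implementation; `STAGE_DIRS`, `stages_to_reset`, and `reset_stages` are hypothetical names.
        
        ```python
        import shutil
        from pathlib import Path
        
        # Stage directories in pipeline order, taken from the diagram above.
        STAGE_DIRS = ["zip", "xml", "sql", "db"]
        
        
        def stages_to_reset(stage, include_downstream=True):
            """Return the stage plus, optionally, every stage that follows it."""
            start = STAGE_DIRS.index(stage)
            return STAGE_DIRS[start:] if include_downstream else [stage]
        
        
        def reset_stages(feed_dir, stage, include_downstream=True):
            """Delete the selected stage directories and return the names removed."""
            removed = []
            for name in stages_to_reset(stage, include_downstream):
                stage_dir = Path(feed_dir) / name
                if stage_dir.exists():
                    # Removes the generated artifacts and the state file together.
                    shutil.rmtree(stage_dir)
                    removed.append(name)
            return removed
        ```
        
        With this sketch, `reset_stages(feed_dir, "xml")` would remove `xml`, `sql`, and `db`
        but leave the downloaded files in `zip` untouched.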
        
        And to process a given stage:
        
        ```bash
        edc feed [feed name] proc [stage]
        ```
        
        Here's the scenario. I've been writing the code for this project on my laptop. But it does
        not have the horsepower to crunch all this data into the various sqlite databases in
        a reasonable amount of time. So I'm firing up a desktop machine to perform the heavy
        lifting. Here's what that process looks like. It is the same process that any researcher
        who wants to replicate my work would follow.
        
        Clone and update
        
        ```bash
        mkdir foo
        cd foo
        edc clone
        cd energy-dashboard
        edc update
        ```
        
        At this point, we have the energy-dashboard project, but we don't want to re-download
        all the previously downloaded files from their original source. In the case of CAISO
        OASIS, that would simply take too long (I've calculated the upper bound as 152 days, 
        though in reality it took about 3 weeks to download the resources here). Instead, we
        can pull these previously downloaded artifacts from one of the public S3 buckets that
        I've mirrored them on...
        
        Example:
        
        ```bash
        # edc feed s3restore [feed-name]
        edc feed s3restore data-oasis-atl-ruc-zone-map
        ```
        
        To grab the artifacts from the entire set of feeds:
        
        
        ```bash
        edc feeds list | xargs -L 1 -I {} edc feed s3restore {}
        ```
        
        
        
        ### Add New Data Feed
        
        TODO
        
        
        
        ## Show Help
        
        
        ### edc
        
        ```bash
        Usage: edc [OPTIONS] COMMAND [ARGS]...
        
          Command Line Interface for the Energy Dashboard. This tooling  collects
          information from a number of data feeds, imports that data,  transforms
          it, and inserts it into a database.
        
        Options:
          --config-dir TEXT     Config file directory
          --debug / --no-debug  Enable debug logging
          --help                Show this message and exit.
        
        Commands:
          config   Manage config file.
          feed     Manage individual 'feed' (singular).
          feeds    Manage the full set of data 'feeds' (plural).
          license  Show the license (GPL v3).
        ```
        
        ### config
        
        ```bash
        Usage: edc config [OPTIONS] COMMAND [ARGS]...
        
          Manage config file.
        
        Options:
          --help  Show this message and exit.
        
        Commands:
          show    Show the config
          update  Update config
        ```
        
        ### feed
        
        ```bash
        Usage: edc feed [OPTIONS] COMMAND [ARGS]...
        
          Manage individual 'feed' (singular).
        
        Options:
          --help  Show this message and exit.
        
        Commands:
          archive    Archive feed to tar.gz
          create     Create new feed
          download   Download from source url
          invoke     Invoke a shell command in the feed directory
          proc       Process a feed through the stages
          reset      Reset feed to reprocess stage
          restore    Restore feed from tar.gz
          s3archive  Archive feed to S3 bucket
          s3restore  Restore feed zip files from from S3 bucket
          status     Show feed status
        ```
        
        ### feeds
        
        ```bash
        Usage: edc feeds [OPTIONS] COMMAND [ARGS]...
        
          Manage the full set of data 'feeds' (plural).
        
        Options:
          --help  Show this message and exit.
        
        Commands:
          list    List feeds
          search  Search feeds (NYI)
        ```
        
        ### license
        
        ```bash
            
            edc : Energy Dashboard Command Line Interface
            Copyright (C) 2019  Todd Greenwood-Geer (Enviro Software Solutions, LLC)
        
            This program is free software: you can redistribute it and/or modify
            it under the terms of the GNU General Public License as published by
            the Free Software Foundation, either version 3 of the License, or
            (at your option) any later version.
        
            This program is distributed in the hope that it will be useful,
            but WITHOUT ANY WARRANTY; without even the implied warranty of
            MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
            GNU General Public License for more details.
        
            You should have received a copy of the GNU General Public License
            along with this program.  If not, see <https://www.gnu.org/licenses/>.
            
        ```
        
        
        ## Usage
        
        ### Examples
        
        ```bash
        edc feed invoke data-oasis-atl-lap-all "git st"
        edc feed invoke data-oasis-atl-lap-all "ls"
        edc feed invoke data-oasis-atl-lap-all "cat manifest.json"
        edc feed invoke data-oasis-atl-lap-all "head manifest.json"
        edc feeds list
        edc feeds list | grep atl
        edc feeds list | grep atl | edc feed invoke "head manifest.json"
        edc feeds list | grep atl | edc feed invoke "head manifest.json" -
        edc feeds list | grep atl | xargs -L 1 -I {} edc feed invoke {} "head manifest.json"
        edc feeds list | grep atl | xargs -L 1 -I {} edc feed invoke {} "jq . < manifest.json"
        edc feeds list | grep atl | xargs -L 1 -I {} edc feed invoke {} "jq .url < manifest.json"
        edc feeds list | grep mileage | xargs -L 1 -I {} edc feed invoke {} "echo {}; sqlite3 db/{}.db 'select count(*) from oasis'"
        edc feeds list | xargs -L 1 -I {} edc feed invoke {} "echo {}; sqlite3 db/{}.db 'select count(*) from oasis'"
        edc feeds list | grep atl | xargs -L 1 -I {} edc feed status {}
        edc feeds list | grep atl | xargs -L 1 -I {} edc feed status --header {}
        edc feeds list | grep mileage | xargs -L 1 -I {} edc feed status --header {}
        edc feeds list | xargs -L 1 -I {} edc feed invoke {} "./src/10_down.py"
        edc feed archive data-oasis-as-mileage-calc-all
        edc feed archive data-oasis-as-mileage-calc-all | xargs -L 1 -I {} tar -tvf {}
        edc feed reset data-oasis-as-mileage-calc-all --stage xml --stage db
        edc feed s3restore data-oasis-as-mileage-calc-all --outdir=temp --service=wasabi
        edc feed s3archive data-oasis-as-mileage-calc-all
        ```
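        
        The row-count examples above shell out to the `sqlite3` command line tool. From a
        notebook, the same check could be sketched with Python's standard `sqlite3` module.
        The table name `oasis` comes from those examples; `count_rows` is a hypothetical
        helper name, not part of `edc`.
        
        ```python
        import sqlite3
        
        
        def count_rows(db_path, table="oasis"):
            """Return the number of rows in one table of a sqlite3 database file."""
            connection = sqlite3.connect(db_path)
            try:
                # Table names cannot be bound as parameters, so quote the identifier.
                quoted = '"' + table.replace('"', '""') + '"'
                (count,) = connection.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
                return count
            finally:
                connection.close()
        ```
        
        Note that `sqlite3.connect` creates an empty database when the file does not exist, so
        a mistyped path shows up as a "no such table" error rather than a missing file.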
        
        ### Onboarding
        
        Some quick notes on how I onboarded 'data-oasis-as-mileage-calc-all':
        
        ```bash
        edc feed proc data-oasis-as-mileage-calc-all
        edc feed s3archive data-oasis-as-mileage-calc-all --service wasabi
        edc feed s3archive data-oasis-as-mileage-calc-all --service digitalocean
        edc feed status data-oasis-as-mileage-calc-all --header
        edc feed invoke data-oasis-as-mileage-calc-all "git st"
        edc feed invoke data-oasis-as-mileage-calc-all "git log"
        edc feed invoke data-oasis-as-mileage-calc-all "git show HEAD"
        ```
        
        ## Author
        Todd Greenwood-Geer (Enviro Software Solutions, LLC)
        
        ## Notes
        This project uses submodules, and this page has been useful:
        https://github.blog/2016-02-01-working-with-submodules/
        
        
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Build Tools
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Programming Language :: Python :: 3.6
Description-Content-Type: text/markdown
