Metadata-Version: 2.1
Name: hound
Version: 1.0.0
Summary: A FireCloud database extension
Home-page: https://github.com/broadinstitute/hound
Author: Aaron Graubert - Broad Institute - Cancer Genome Computational Analysis
Author-email: aarong@broadinstitute.org
License: BSD3
Description: # Hound
        A FireCloud database extension
        
        ## Purpose
        
        This repository contains the source code for the Hound database extension system.
        This system aims to provide a low-latency system for logging changes to a FireCloud
        workspace.
        
        This allows for attribute provenance to be reconstructed by querying the database
        history, and for external tools to log changes as well.
        
        ## Usage
        1) Users with hound-enabled software automatically generate logs as they continue
        to do their work
        2) Hound can recreate attribute value histories from logs to produce provenance
        
        ## Format
        
        Hound logs data in a bucket folder. Entries are organized based on the list below.
        `snowflake` refers to an auto-generated ID from Hound's snowflake implementation.
        Snowflakes are almost guaranteed to be unique (see below)
        
        * hound/: Root folder for hound data in bucket
          * (samples|pairs|participants|sets)/: Folder for entity-metadata change logs
            * (entity-id)/: Folder for entity data
              * (attribute)/: Folder for attribute data on each entity
                * (snowflake): Serial numbered files containing **update objects**
          * workspace/: Folder for workspace-level metadata
            * (attribute)/: Folder for attribute data on the workspace
              * (snowflake): Serial numbered files containing **update objects**
          * logs/: Folder for event-logs
            * (job|upload|meta|other)/: Folder for specific event entries
              * (snowflake): Files containing **log entries**
        
        ### Snowflake spec
        Encode 22-byte snowflake into 44 byte (hex encoded) object name
        * 64-bit (8 byte) unix timestamp (8 byte floating-point number)
        * 64-bit (8 byte) machine id (based on nodename) Only 6 bytes used
        * 16-bit (2 byte) random client id (generated during init of Snowflake object)
        * 16-bit (2 byte) sequence identifier (starts at 0 per client, increments from there)
        * 8-bit (1 byte) Zero field (reserved)
        * 8-bit (1 byte) checksum field (sum of remaining fields)
        
        #### Uniqueness
        
        Snowflakes are structured to almost guarantee uniqueness. If two clients from the
        same machine (or from machines with identical MAC addresses) create a snowflake
        at **exactly** the same time (within their system clocks' precisions) **AND** the
        clients have generated the same number of snowflakes so far (clients have the same
          sequence id), there is a 1/65536 chance that the snowflakes will collide (based on
          client id).
        
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Interface Engine/Protocol Translator
Description-Content-Type: text/markdown
