Metadata-Version: 2.1
Name: contentai-activity-classifier
Version: 1.3.7
Summary: ContentAI Activity Classification Service
Home-page: https://gitlab.research.att.com/turnercode/activity-classifier-extractor
Author: Eric Zavesky
License: Apache
Platform: UNKNOWN
Requires-Python: >=3.6
Description-Content-Type: text/x-rst
Requires-Dist: pandas (==1.0.4)
Requires-Dist: numexpr (==2.7.1)
Requires-Dist: scikit-learn (==0.23.1)
Requires-Dist: h5py (==2.10.0)
Requires-Dist: matplotlib (==3.2.1)
Requires-Dist: imblearn (==0.0)
Requires-Dist: tensorflow (==2.2.0)
Requires-Dist: contentaiextractor (>=1.0.4)

activity-classifier-extractor
=============================


Generates activity classifications from low-level feature inputs in support
of analytic workflows within the `ContentAI Platform <https://www.contentai.io>`__, 
published as the extractor
``dsai_activity_classifier``. 

1. `Getting Started <#getting-started>`__
2. `Execution <#execution-and-deployment>`__
3. `Adding New Models <#adding-new-models>`__
4. `Testing <#testing>`__
5. `Future Development <#future-development>`__
6. `Changes <CHANGES.md>`__

Getting Started
===============

This library is used as a `single-run executable <#command-line-standalone>`__.
Runtime parameters configure the returned results. They are listed below
and can be examined in more detail in the `main <main.py>`__ script.

-  ``verbose`` - *(bool)* - verbose input/output configuration printing (*default=false*)
-  ``path_content`` - *(str)* - input video path for files to label (*default=video.mp4*)
-  ``path_result`` - *(str)* - output path for samples (*default=.*)
-  ``path_models`` - *(str)* - manifest path for model information (*default=data/models/manifest.json*)
-  ``time_interval`` - *(float)* - time interval for predictions from models (*default=3.0*)
-  ``average_predictions`` - *(bool)* - flatten predictions across time and class (*default=false*)
-  ``round_decimals`` - *(int)* - rounding decimals for predictions (*default=5*)
-  ``score_min`` - *(float)* - apply a minimum score threshold for classes (*default=0.1*)
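
As a rough illustration of how these parameters fit together, the sketch
below rebuilds the option list with ``argparse``.  It is an approximation
and not the parser in ``main.py``: the function name ``build_parser`` is
hypothetical, and only the option names, types, and defaults are taken from
the list above.

.. code:: python

   import argparse


   def build_parser():
       """Hypothetical parser mirroring the documented runtime parameters."""
       parser = argparse.ArgumentParser(description="activity classifier (sketch)")
       # Boolean switches default to false, as documented.
       parser.add_argument("--verbose", action="store_true")
       parser.add_argument("--average_predictions", action="store_true")
       # Paths for input content, output samples, and the model manifest.
       parser.add_argument("--path_content", type=str, default="video.mp4")
       parser.add_argument("--path_result", type=str, default=".")
       parser.add_argument("--path_models", type=str, default="data/models/manifest.json")
       # Numeric controls for prediction timing and output formatting.
       parser.add_argument("--time_interval", type=float, default=3.0)
       parser.add_argument("--round_decimals", type=int, default=5)
       parser.add_argument("--score_min", type=float, default=0.1)
       return parser


   # Parse an explicit argument list so the example does not depend on sys.argv.
   args = build_parser().parse_args(["--path_content", "features/", "--time_interval", "1.5"])
   print(vars(args))

Options that are not supplied keep their documented defaults, so only the
values that differ need to appear on the command line.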


dependencies
------------

To install package dependencies on a fresh system, the recommended
technique is a set of vanilla pip packages. The latest requirements
should always be validated against the ``requirements.txt`` file, which
can be installed directly.

.. code:: shell

   pip install --no-cache-dir -r requirements.txt 

Execution and Deployment
========================

This package is meant to be run as a one-off processing tool that
aggregates the insights of other extractors.

command-line standalone
-----------------------

Run the code as if it is an extractor. In this mode, configure a few
environment variables to let the code know where to look for content.

One can also run the command-line with a single argument as input and
optionally add runtime configuration (see `runtime
variables <#getting-started>`__) as part of the ``EXTRACTOR_METADATA``
variable as JSON.

.. code:: shell

   EXTRACTOR_METADATA='{"compressed":true}'
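
Because the variable must hold valid JSON, booleans are written in lowercase
(``true``/``false``) and keys need double quotes.  One way to avoid
hand-written quoting mistakes is to build the string with ``json.dumps``.
The snippet below is a small sketch, and the chosen keys are only examples
taken from the runtime parameter list.

.. code:: python

   import json
   import os

   # Example settings; the keys come from the runtime parameter list, the values are arbitrary.
   settings = {"verbose": True, "time_interval": 1.5}

   # json.dumps emits lowercase JSON booleans and handles the quoting.
   metadata = json.dumps(settings)
   os.environ["EXTRACTOR_METADATA"] = metadata

   # Round-trip check: the stored string parses back to the same settings.
   assert json.loads(os.environ["EXTRACTOR_METADATA"]) == settings
   print(metadata)  # {"verbose": true, "time_interval": 1.5}

Setting ``os.environ`` only affects the current Python process and any child
processes it starts, which is sufficient when the classifier is launched from
the same script.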

Locally Run Classifier on Results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For convenience, local execution has been wrapped in the bash script
``run_local.sh``.

.. code:: shell

    ./run_local.sh <docker_image> [<source_directory> <output_data_dir> [<json_args>]] [<all_args>]
       - run clip extraction on source with prior processing

      <docker_image> = 0 IF local command-line based (args using arg parse) 
                     = 1 IF local docker emulation
                     = IMAGE_NAME IF docker image name to run

      ./run_local.sh 0 --path_content features/ --path_result results/ --verbose 
      ./run_local.sh 1 features/ results/ 0 '{\"verbose\":true}' 

In all of the above examples, the underlying command-line execution is 
similar to this execution run on the testing data.

.. code:: shell

    python -u activity_classifier/main.py --path_content testing/data/launch/video.mp4 \
            --path_result testing/class --path_models activity_classifier/data/models/manifest.json --verbose

Feature-Based Similarity
~~~~~~~~~~~~~~~~~~~~~~~~

A helper script is also available to compute the similarity of clips in 
one or more feature files. *(v1.1.0)*

.. code:: shell

    python -u activity_classifier/features.py --path_content testing/data/dummy.txt \
            --feature_type dsai_videocnn dsai_vggish --path_result testing/dist
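
The exact distance used by ``features.py`` is not described here.  As an
illustration of what clip-to-clip similarity over feature vectors can look
like, the sketch below computes cosine similarity in plain Python.  The
metric, the function name ``cosine_similarity``, and the toy vectors are all
assumptions.

.. code:: python

   import math


   def cosine_similarity(a, b):
       """Cosine similarity of two equal-length feature vectors (assumed metric)."""
       if len(a) != len(b):
           raise ValueError("feature vectors must have the same length")
       dot = sum(x * y for x, y in zip(a, b))
       norm_a = math.sqrt(sum(x * x for x in a))
       norm_b = math.sqrt(sum(y * y for y in b))
       if norm_a == 0.0 or norm_b == 0.0:
           return 0.0  # convention for this sketch: an all-zero vector matches nothing
       return dot / (norm_a * norm_b)


   # Toy per-clip features; real ones would come from the upstream extractors.
   clips = {
       "clip_a": [1.0, 0.0, 2.0],
       "clip_b": [2.0, 0.0, 4.0],
       "clip_c": [0.0, 3.0, 0.0],
   }

   # Compare every unordered pair of clips once.
   names = sorted(clips)
   for i, first in enumerate(names):
       for second in names[i + 1:]:
           score = cosine_similarity(clips[first], clips[second])
           print(first, second, round(score, 5))

Here ``clip_a`` and ``clip_b`` point in the same direction and score close to
1, while ``clip_c`` is orthogonal to both and scores 0.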


ContentAI
---------

Deployment
~~~~~~~~~~

Deployment is easy and follows standard ContentAI steps.

.. code:: shell

   contentai deploy dsai_activity_classifier
   Deploying...
   writing workflow.dot
   done

Alternatively, you can pass an image name to reduce rebuilding a docker
instance.

.. code:: shell

   docker build -t dsai_activity_classifier .
   contentai deploy metadata-flatten dsai_activity_classifier

Locally Downloading Results
~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can locally download data from a specific job for this extractor to
directly analyze.

.. code:: shell

   contentai data wHaT3ver1t1s --dir data

Run as an Extractor
~~~~~~~~~~~~~~~~~~~

.. code:: shell

   contentai run https://bucket/video.mp4  -w 'digraph { dsai_videocnn -> dsai_activity_classifier; dsai_vggish -> dsai_activity_classifier }'

   JOB ID:     1Tfb1vPPqTQ0lVD1JDPUilB8QNr
   CONTENT:    s3://bucket/video.mp4
   STATE:      complete
   START:      Fri Feb 15 04:38:05 PM (6 minutes ago)
   UPDATED:    1 minute ago
   END:        Fri Feb 15 04:43:04 PM (1 minute ago)
   DURATION:   4 minutes 

   EXTRACTORS

   my_extractor

   TASK      STATE      START           DURATION
   724a493   complete   5 minutes ago   1 minute 

Or run it via the docker image.  Please review the ``run_local.sh`` file for more information.


View Extractor Logs (stdout)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: shell

   contentai logs -f <my_extractor>
   my_extractor Fri Nov 15 04:39:22 PM writing some data
   Job complete in 4m58.265737799s


Adding New Models
=================

There are two steps to adding new models. 

1. First, train the models and formulate
   a well-known structure (this can be done exhaustively across a number of model types).  
   See `MODELS.rst <MODELS.rst>`__ for more details.
2. Update the manifest according to the instructions below to indicate how the activity
   classifier should load the model (e.g. the ``framework``), the required features, and
   a few descriptive fields (e.g. the ``name`` and the ``id``).
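
To make the second step concrete, here is a minimal sketch of a sanity check
for manifest entries.  The field names follow the manifest example in this
document.  The helper ``check_entry``, the incomplete second entry, and the
rule that at least one of ``video`` or ``audio`` must be present are
assumptions rather than the package's own validation.

.. code:: python

   import json

   # Fields used by the manifest example in this README.
   REQUIRED_FIELDS = ("path", "name", "id", "framework")
   FEATURE_FIELDS = ("video", "audio")


   def check_entry(entry):
       """Return a list of problems found in one manifest entry (hypothetical helper)."""
       problems = [f"missing field: {key}" for key in REQUIRED_FIELDS if not entry.get(key)]
       if not any(entry.get(key) for key in FEATURE_FIELDS):
           problems.append("no video or audio feature extractor listed")
       return problems


   # The first entry mirrors the README example; the second is deliberately incomplete.
   manifest_text = """
   [
       {
           "path": "3dcnn-vggish/lr-Running.pkl.gz",
           "name": "Running",
           "id": "ugc",
           "framework": "sklearn",
           "video": "dsai_videocnn",
           "audio": "dsai_vggish"
       },
       {"path": "3dcnn/incomplete.pkl.gz", "name": "Incomplete"}
   ]
   """

   for entry in json.loads(manifest_text):
       print(entry.get("name"), check_entry(entry))

Running a check like this before committing a manifest change catches
missing fields earlier than a failed model load would.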


Updating The Manifest
---------------------

Adding models to the pre-determined set of models is as easy as editing a manifest file and 
adding a model into git LFS.  

1. Archive the new model into a serialized fileset.  At time of writing, this was serializing 
   models from `sklearn <https://scikit-learn.org>`__ with simple 
   `pickle load/save serialization <https://scikit-learn.org/stable/modules/model_persistence.html>`__. 
2. Gather all of the relevant output files and compress them if you can.  Currently, the library 
   understands gzip compression extensions (e.g. ".gz").
3. Choose the appropriate sub-directory that corresponds to the upstream feature extractor.  For 
   example, models built on ``3dcnn`` features depend on new videos first being processed (via `extractor chaining <https://www.contentai.io/docs/extractor-chaining>`__)  
   by the extractor ``dsai_3dcnn``.  If one doesn't exist yet, please create a new directory, but
   remember what combination of audio and video features is required.
4. Modify the manifest file in ``activity_classifier/data/models/manifest.json`` for your new entry.
   Specifically, the input video and audio features must be defined as well as the serialization
   library.  Below is an example block that indicates ``3dcnn`` video and ``vggish`` audio features for 
   a model created with ``sklearn`` where prediction results will be nested under the name ``Running``.

    .. code:: shell

        [ ...
        {
            "path": "3dcnn-vggish/lr-Running.pkl.gz",
            "name": "Running",
            "id": "ugc",
            "framework": "sklearn",
            "video": "dsai_videocnn",
            "audio": "dsai_vggish"
        },
        ... ]

5. Prepare to add your model files to the repo.  **NOTE:** This repo uses `git-lfs <https://git-lfs.github.com/>`__
   to store all binary files like models.  If your model is added with regular git tools alone, you will 
   get a sternly worded email (and friendly advice on how to re-add correctly).

    .. code:: shell

        (from the base directory only)
        git lfs track activity_classifier/data/models/3dcnn/moonwalk_model.pkl.gz
        git add activity_classifier/data/models/3dcnn/moonwalk_model.pkl.gz
        git add activity_classifier/data/models/manifest.json

6. Test your model with the data in the ``testing`` directory.  The CI/CD process should do this too
   but it's always easier to find and fix problems here than with a vague email.  The features in this
   directory came from processing of the `HBO Max Launch Video <https://www.youtube.com/watch?v=9yLNhhHs3-k>`__,
   which is publicly available as a reference.

    .. code:: shell

        (from the base directory)

        ./run_local.sh 0 --path_content testing/data/test.mp4 --time_interval 1.5

        (check for predictions from your new model in data.json) 
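
Steps 1 and 2 above can be sketched as a gzip-compressed pickle round trip.
This is an illustration under stated assumptions: a plain dictionary stands
in for a trained ``sklearn`` estimator, the file name is made up, and the
package's actual loader may differ.

.. code:: python

   import gzip
   import os
   import pickle
   import tempfile

   # Stand-in for a trained estimator; any picklable object is handled the same way.
   model = {"name": "Running", "weights": [0.25, -1.5, 3.0]}

   with tempfile.TemporaryDirectory() as tmp_dir:
       path_model = os.path.join(tmp_dir, "lr-Running.pkl.gz")  # hypothetical file name

       # Serialize and compress in one pass.
       with gzip.open(path_model, "wb") as handle:
           pickle.dump(model, handle)

       # Reload the way a consumer of the ".gz" file would.
       with gzip.open(path_model, "rb") as handle:
           restored = pickle.load(handle)

   print(restored == model)

Pickle files can execute arbitrary code when loaded, so only unpickle models
that come from a trusted source.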



Testing
=======

Testing is included via tox.  To launch testing for the entire package, just run ``tox`` at the command line. 
Testing can also be run for a specific file within the package by setting the environment variable ``TOX_ARGS``.

.. code:: shell

   TOX_ARGS=test_basic.py tox 



Future Development
==================

-  additional training hooks?




Changes
=======


1.3
---

1.3.7
~~~~~
- fix run_local typos
- more verbosity checks

1.3.6
~~~~~
- modeling.py separators
- docs reorg

1.3.5
~~~~~
- contentai key request fix

1.3.3
~~~~~
- docs update
- multiclass write

1.3.2
~~~~~
- docker build update, run example update

1.3.1
~~~~~
- docs fix for example of using package
- bug fix for default location, change inputs to classify function

1.3.0
~~~~~
- move models out of the primary package
- *breaking change*, rename input param ``path_models`` to ``path_manifest``

1.2
---

1.2.2
~~~~~
- bump version for model migration to LFS

1.2.1
~~~~~
- fix docker/deployed image run command

1.2.0
~~~~~
- switch to package representation, push to pypi
- several updates for MANIFEST definition (id)
- inclusion of multi-parameter training and testing framework
- safety for model loading, catch exceptions, return gracefully
- update documents to split for binary models 

1.1
---

1.1.1
~~~~~
- cosmetic change for reuse in other libraries

1.1.0
~~~~~

- refactor feature code, add utility for difference computation among segments
- min value thresholding to avoid low scoring results in output (default=0.1)
- refactor caching information for feature load (allow flatten, remove cache, allow multi-asset)
- allow recursive feature load for distance compute


1.0
---

1.0.2
~~~~~

- fixes for output, modify to require other extractors as dependencies
- fix order of parameters for local runs


1.0.1
~~~~~

- updates for integration of other models, fixes for prediction output
- add l2norm after average/merge in time of source features

1.0.0
~~~~~

- initial project merge from other sources
- generates json prediction dict
- callable as package
- includes some testing routines with windowing comparison



