*****
Guide
*****

Model Interface
===============

A MISTK model implements the `mistk.model.AbstractModel` class. This class
provides a set of abstract methods that represent the model lifecycle and
must be implemented by the new model. Ultimately, these methods form the core of
the web endpoint service that is made available for every model implementation.

**do_initialize** (objectives: list, props: dict, hparams: dict)
    Called once the endpoint service has launched.  This would typically be the
    first call made to the service. Perform any general setup and initialization.

    :param objectives: A list of model objectives intended to aid in its
        setup and initialization. Possible values are:
        'training', 'prediction', 'streaming_prediction', 'generation', 'transfer_learning'.
    :param props: A dictionary of settings or configuration values that are passed
        from the ecosystem, but are not considered model hyperparameters.
    :param hparams: A dictionary of hyperparameters that are used by the model.

**do_load_data** (dataset_map: dict)
    Instructs the container to load training, testing, or generation data (or at least
    record in memory where the data is) from the supplied paths.

    :param dataset_map: A dictionary that maps string keys {train, test, generation} to a
        MistkDataset object that contains information on the dataset to load.
        The `objectives` values for this model determine which keys are present in
        the map (e.g. a model with only a training objective should not depend on
        a 'test' key being present).

        The MistkDataset values will require the following fields:

            object_info: A dictionary that stores metadata information for the
            dataset. Only the 'name' and 'kind' fields are required, all other
            fields are optional.

            data_path: A string containing the path to the dataset root folder

            modality: A string with value image, audio, video, or text

            format: A string containing the name of the format of this dataset

**do_build_model** (path=None)
    Instructs the service to build all necessary data structures given the
    architecture and selected hyperparameters.

    :param path: The path to the model file or checkpoint that should be loaded.
        Defaults to None if no file was specified in this model's definition.

**do_train** ()
    Perform training with the previously supplied data.

**do_predict** ()
    Perform predictions with the previously supplied data.

**do_generate** ()
    Perform generations with the previously supplied data.

**do_pause** ()
    Pause the current training or prediction.

**do_resume_training** ()
    Resume the previously paused training.

**do_resume_predict** ()
    Resume the previously paused predictions.

**do_save_predictions** (dataPath)
    Save the model predictions to the supplied data path. The predictions should
    be saved as CSV with each row in the following format:

    id,label[,confidence][,bounds]

    :param dataPath: The path to which the predictions should be saved.
    
**do_save_generations** (dataPath)
    Saves the generated media produced by the model to the supplied data path.

    :param dataPath: The path to which the generations should be saved.

**do_stream_predict** (data_map: dict, details: bool)
    Perform predictions on the input dict of id's to base64 encoded data and
    return a dict of id's to predicted values.
    The underlying format of the base64 encoded data should be the native
    input format for the model.

    :param data_map: A dict of id's to base64 encoded data.
    :param details: Optional parameter for the model to provide additional details in the returned dict. Default value is False.
 
    :return: A dict of id's to predicted values. The id 'details' is a keyword used for additional details provided in Markdown or HTML.

**do_update_stream_properties** (props: dict)
    Updates the stream prediction properties for subsequent stream predict calls. 
    
    :param props: A dict that can be used for streaming properties.
    
**do_save_model** (path)
    Save a checkpoint of the model to the supplied data path. Format of the saved
    file(s) is at the discretion of the model. The infrastructure associates
    this checkpoint with this model.

    :param path: The path to which the model should be saved.

**do_terminate** ()
    Prepare for application termination.
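
As a rough sketch of how these methods fit together, the stand-in class below walks a toy
model through initialization, data loading, model building, prediction, and saving
predictions. It deliberately does not import MISTK: the class name 'LifecycleSketch', the
'default_label' hyperparameter, the 'predictions.csv' file name, and the treatment of
'dataPath' as a directory are all assumptions made for illustration, and plain dictionaries
stand in for MistkDataset objects. A real model would subclass 'mistk.model.AbstractModel'.

```python
import csv
import os
import tempfile


class LifecycleSketch:
    """Stand-in for a MISTK model; it does not use the real AbstractModel."""

    def do_initialize(self, objectives: list, props: dict, hparams: dict):
        # Keep the settings so later lifecycle steps can use them.
        self.objectives = objectives
        self.props = props
        self.hparams = hparams

    def do_load_data(self, dataset_map: dict):
        # Only record where the data lives; nothing is read yet.
        self.data_paths = {key: ds["data_path"] for key, ds in dataset_map.items()}

    def do_build_model(self, path=None):
        # Toy "model": always predicts one fixed label (hypothetical hyperparameter).
        self.default_label = self.hparams.get("default_label", "0")
        self.checkpoint = path

    def do_predict(self):
        # One prediction per file found under the 'test' dataset path.
        names = sorted(os.listdir(self.data_paths["test"]))
        self.predictions = [(name, self.default_label) for name in names]

    def do_save_predictions(self, dataPath):
        # Assumption: dataPath is a directory and the file is named predictions.csv.
        with open(os.path.join(dataPath, "predictions.csv"), "w", newline="") as f:
            csv.writer(f).writerows(self.predictions)


def run_lifecycle_sketch():
    """Drive the stand-in model through one prediction workflow."""
    with tempfile.TemporaryDirectory() as root:
        test_dir = os.path.join(root, "test")
        out_dir = os.path.join(root, "out")
        os.makedirs(test_dir)
        os.makedirs(out_dir)
        for name in ("a.jpg", "b.jpg"):
            open(os.path.join(test_dir, name), "w").close()

        model = LifecycleSketch()
        model.do_initialize(["prediction"], {}, {"default_label": "7"})
        model.do_load_data({"test": {"data_path": test_dir}})
        model.do_build_model()
        model.do_predict()
        model.do_save_predictions(out_dir)

        with open(os.path.join(out_dir, "predictions.csv"), newline="") as f:
            return list(csv.reader(f))
```

Calling 'run_lifecycle_sketch()' returns the rows written to the CSV file, one
'id,label' pair per input file.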

The Model State Machine
-----------------------

Model implementations follow a workflow lifecycle based on state machine
transitions. The underlying MISTK infrastructure ensures that only legal
transitions from one state to another can be made and that the appropriate
model methods are called during those transitions. The image below illustrates
the high-level state machine transitions and the model methods that may be
called between them.

.. image:: images/state_machine.png

The 'Terminated' state (and associated method 'do_terminate') is not
pictured but is a valid transition from any of the states depicted above.
Note that an internal 'Failed' state will be entered if the model
implementation throws an exception. No further model activities are permitted
from the 'Failed' state.

If a model workflow enters the 'Failed' state, then the workflow will need to be restarted
from the beginning. This is required in order to support various machine learning
technologies (e.g. TensorFlow) that may not have a way to reset or resume their state
after a critical failure.
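
The behavior described above can be sketched with a small transition table. This is only
an illustration of the idea, not the MISTK implementation: the state names other than
'failed' and 'terminated', the transition table, and the 'WorkflowSketch' class are
hypothetical, and blocking every transition out of 'failed' is one reading of the rule that
no further model activities are permitted.

```python
# Hypothetical state names and transitions, chosen only for illustration.
LEGAL_TRANSITIONS = {
    "started": {"initialized"},
    "initialized": {"ready"},
    "ready": {"training"},
    "training": {"ready"},
}


class WorkflowSketch:
    def __init__(self):
        self.state = "started"

    def transition(self, target, action=lambda: None):
        """Run `action` and move to `target` if the transition is legal."""
        if self.state == "failed":
            raise RuntimeError("workflow failed; restart it from the beginning")
        # 'terminated' is reachable from every non-failed state.
        allowed = LEGAL_TRANSITIONS.get(self.state, set()) | {"terminated"}
        if target not in allowed:
            raise ValueError(f"illegal transition: {self.state} -> {target}")
        try:
            action()  # stands in for a model method such as do_train
        except Exception:
            self.state = "failed"  # an exception from the model ends the workflow
            return self.state
        self.state = target
        return self.state
```

An illegal request raises 'ValueError' and leaves the state unchanged, while an exception
raised by the model method moves the workflow to 'failed'.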

Initializing a Model
--------------------

Initialization of a model loads the hyperparameters and model properties associated
with the model. Hyperparameters are defined by the algorithm that a model is based on.
Model properties are defined by a specific implementation of an algorithm (e.g. a
PyTorch implementation of DenseNet may have different properties than a
TensorFlow implementation). Below are sample hyperparameter and model property
dictionaries that can be passed to a model implementation's 'do_initialize' method.

Hyperparameters
:::::::::::::::

::

  {
    'learning_rate': 0.001,
    'weight_decay': 0.004,
    'momentum': 0.9
  }

Model Properties
::::::::::::::::

::

  {
    'arch': 'densenet',
    'num_labels': 1000,
    'model_load_file': 'checkpoint.pth'
  }
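
One possible way for 'do_initialize' to consume these two dictionaries is sketched below.
The helper name 'resolve_settings' and the default values are hypothetical; the sketch only
shows supplied hyperparameters overriding defaults and the checkpoint name being read from
the model properties.

```python
# Hypothetical defaults; a real model defines its own.
DEFAULT_HPARAMS = {"learning_rate": 0.01, "weight_decay": 0.0, "momentum": 0.0}


def resolve_settings(props: dict, hparams: dict):
    """Merge supplied hyperparameters over defaults and pick out the checkpoint name."""
    merged = {**DEFAULT_HPARAMS, **hparams}
    checkpoint = props.get("model_load_file")  # None when no checkpoint is configured
    return merged, checkpoint
```

Hyperparameters that are not supplied keep their default values, so a caller can pass
only the values it wants to change.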


Loading Data
------------

When loading data into a model instance using the **do_load_data(dataset_map)**
method inherited from 'AbstractModel', the dataset_map parameter must include
sufficient information for each dataset mapped to an objective {train, test, generation}.
The required fields are 'object_info', 'data_path', 'modality', and 'format'
(described in the method documentation above).
An example dataset map is provided below:

::

  {
    "train": {
       "object_info": {
         "name": "image_training_set",
         "kind": "MistkDataset"
       },
       "data_path": "/path/to/data",
       "modality": "image",
       "format": "jpg"
    },
    "test": {
       "object_info": {
         "name": "image_testing_set",
         "kind": "MistkDataset"
       },
       "data_path": "/path/to/data",
       "modality": "image",
       "format": "jpg"
    },
    "generation": {
       "object_info": {
         "name": "image_generation_set",
         "kind": "MistkDataset"
       },
       "data_path": "/path/to/data",
       "modality": "text",
       "format": "raw"
    }
  }

The dataset for generations should contain files that indicate the targets for generation.


Note that when using the RESTful API directly, where the user submits the JSON
via the 'load_data' API call, the JSON keys must be in camelCase rather than
snake_case (e.g. object_info becomes objectInfo).
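
A small helper can perform this key conversion before the JSON is submitted. The sketch
below is not part of MISTK; 'to_camel' and 'camelize' are hypothetical names, and only
dictionary keys are rewritten while string values are left untouched.

```python
def to_camel(key: str) -> str:
    """Convert a snake_case key such as 'object_info' to 'objectInfo'."""
    head, *rest = key.split("_")
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


def camelize(value):
    """Recursively rewrite dictionary keys; other values are returned unchanged."""
    if isinstance(value, dict):
        return {to_camel(key): camelize(item) for key, item in value.items()}
    if isinstance(value, list):
        return [camelize(item) for item in value]
    return value
```

Applied to the dataset map above, keys such as 'train' stay as they are, while
'object_info' and 'data_path' become 'objectInfo' and 'dataPath'.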

Model Status
------------

Model implementations are encouraged to report status pertaining to their
current workflow state back to the MISTK infrastructure by calling the
**update_status(dict)** method inherited from `AbstractModel` (note that this method
is intended to be called by the model, not overridden like the methods above).
This method takes a dictionary of key-value pairs defined by the model.
For example, the logistic regression model in the Example Model section below updates
its status with the number of samples fit during training and the number of samples
predicted during testing.
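
The calling pattern can be sketched without MISTK as follows. 'StatusSketch' is a
hypothetical stand-in whose 'update_status' only records what it receives; in a real model
the method is inherited from 'AbstractModel'. The 'samples_fit' key is likewise an
assumption, since the dictionary contents are defined by the model.

```python
class StatusSketch:
    """Stand-in for a model; a real one inherits update_status from AbstractModel."""

    def __init__(self, batches):
        self.batches = batches
        self.reported = []  # what the infrastructure would receive

    def update_status(self, payload: dict):
        # Stand-in only: record the payload instead of sending it to MISTK.
        self.reported.append(dict(payload))

    def do_train(self):
        samples_fit = 0
        for batch in self.batches:
            samples_fit += len(batch)  # pretend the batch was fit here
            self.update_status({"samples_fit": samples_fit})
```

Reporting a running total after each batch lets an observer follow progress while
training is still under way.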

Predictions and Ground Truth
----------------------------

Model prediction output from the **do_save_predictions** method as well as
corresponding dataset ground truth should be CSV-formatted as follows::

  id,label[,confidence][,bounds]

The `id` field can be arbitrary as long as predictions and ground truth use the
same values.
All columns except `id` may contain whitespace-separated values as necessary.
The confidence probabilities and bounds values are optional.

* Example Prediction CSV line for only id and label:

    mnist-test-00001,7

* Example Prediction CSV line for id and a single label, confidence, and bounds:

    ILSVRC2012_val_00000163,n03733805,.9,84 0 433 278
    
* Example Prediction CSV line for id and a single label, no confidence, and bounds:

    ILSVRC2012_val_00000163,n03733805,,84 0 433 278

* Example Prediction CSV line for id and multiple (3) labels, confidences, and bounds:

    ILSVRC2012_val_00004833,n03733805 n03733805 n03733805,.9 .8 .8,85 93 350 355 301 306 421 428 0 5 469 373

* Example of an *incorrect* Prediction CSV line, where the additional bounds and confidences will be ignored because there is only 1 label:

    ILSVRC2012_val_00004833,n03733805,.9 .8 .8,85 93 350 355 301 306 421 428 0 5 469 373
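
The format above can be produced with a helper along these lines. This is a minimal sketch,
not MISTK code: 'prediction_line' is a hypothetical name, and the assumption that each
label has four bounds values is inferred from the examples. The helper rejects mismatched
counts rather than letting the extra values be ignored.

```python
import csv
import io


def prediction_line(item_id, labels, confidences=None, bounds=None):
    """Format one prediction as 'id,label[,confidence][,bounds]'.

    labels, confidences: one entry per label.
    bounds: one sequence of four numbers per label (assumed from the examples).
    """
    for name, values in (("confidences", confidences), ("bounds", bounds)):
        if values is not None and len(values) != len(labels):
            raise ValueError(f"{name} must have one entry per label")
    row = [item_id, " ".join(labels)]
    if confidences is not None:
        row.append(" ".join(str(c) for c in confidences))
    elif bounds is not None:
        row.append("")  # keep the confidence column empty so bounds stay in place
    if bounds is not None:
        row.append(" ".join(str(v) for box in bounds for v in box))
    buffer = io.StringIO()
    csv.writer(buffer).writerow(row)
    return buffer.getvalue().rstrip("\r\n")
```

For instance, 'prediction_line("mnist-test-00001", ["7"])' yields the first example line
above, and passing only 'bounds' leaves the confidence column empty.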
    

Example Model
-------------

The following example code implements a logistic regression algorithm
from scikit-learn with the MISTK model interface which operates on a common public dataset:

.. literalinclude:: sample_code_files/logistic_regression.py
    :linenos:
    :language: python

Transfer learning
-----------------

Support for transfer learning is specific to each model implementation.
Model developers that use the MISTK library need to incorporate their
transfer learning logic into their do_build_model and do_train methods. The PyTorch
implementation below builds its model with specific configurations when performing
transfer learning. The highlighted lines are the ones relevant to transfer learning
in this PyTorch implementation.

.. literalinclude:: sample_code_files/pytorch_model.py
    :emphasize-lines: 62-69, 79-82
    :linenos:
    :language: python

Model developers will need to specify in their model properties that they are performing
transfer learning when using this model for that purpose.


Data Transformation Interface
=============================

The MISTK API provides an interface for creating data transformation plugins. These
plugins run within docker containers similar to models. A MISTK data transform
implements the 'mistk.transform.AbstractTransformPlugin' abstract class. This class
provides a set of abstract methods that represent the data transform lifecycle and
must be implemented by the new data transform plugin. Ultimately, these methods form the core of
the web endpoint service that is made available for every data transform implementation.

**do_transform** (inputDirs: list, outputDir: string, properties: dict)
  Performs a data transformation using the data provided in the input directories
  and stores the resulting dataset(s) in the output directory.

  :param inputDirs: A list of directory paths from which to load input datasets.
  :param outputDir: The directory path where the output dataset will be stored.
  :param properties: A dictionary of settings or configuration values that are passed
      to the transformation.

**do_terminate** ()
    Prepare for application termination.
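
To illustrate the shape of a 'do_transform' implementation, the sketch below copies files
from every input directory into the output directory. It is a standalone function rather
than a method on 'AbstractTransformPlugin', and the 'suffix' property and the function names
are hypothetical; a real transform would replace the copy step with its own conversion.

```python
import os
import shutil
import tempfile


def transform_sketch(inputDirs: list, outputDir: str, properties: dict):
    """Copy files whose names end with properties['suffix'] into outputDir."""
    suffix = properties.get("suffix", "")  # hypothetical property; "" matches all files
    os.makedirs(outputDir, exist_ok=True)
    copied = []
    for input_dir in inputDirs:
        for name in sorted(os.listdir(input_dir)):
            source = os.path.join(input_dir, name)
            if os.path.isfile(source) and name.endswith(suffix):
                shutil.copy(source, os.path.join(outputDir, name))
                copied.append(name)
    return copied


def run_transform_sketch():
    """Build two small input directories and run the sketch on them."""
    with tempfile.TemporaryDirectory() as root:
        in_a = os.path.join(root, "a")
        in_b = os.path.join(root, "b")
        out = os.path.join(root, "out")
        for folder, name in ((in_a, "x.txt"), (in_a, "skip.bin"), (in_b, "y.txt")):
            os.makedirs(folder, exist_ok=True)
            with open(os.path.join(folder, name), "w") as f:
                f.write(name)
        copied = transform_sketch([in_a, in_b], out, {"suffix": ".txt"})
        return copied, sorted(os.listdir(out))
```

Files with the same name in different input directories would overwrite each other here,
which a real transform would need to handle.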

The Transformation State Machine
--------------------------------

Data transformation plugin implementations follow a workflow lifecycle based on state machine
transitions. The underlying MISTK infrastructure ensures that only legal
transitions from one state to another can be made and that the appropriate
plugin methods are called during those transitions. The image below illustrates
the high-level state machine transitions and the data transform plugin methods
that may be called between them.

.. image:: images/transform_state_machine.png

A data transform plugin instance will automatically go to the Ready state if its
container instance successfully starts up. After completing a transformation, it
will return to the Ready state so that multiple transformations can be executed
using the same container instance.

Example
-------

The following example code implements a ground truth transform for an MNIST-formatted dataset
with the MISTK data transformation interface:

.. literalinclude:: sample_code_files/groundtruth_mnist_plugin.py
    :linenos:
    :language: python


Evaluation Interface
====================

The MISTK API provides an interface for creating evaluation plugins to evaluate models
against specific metrics. These plugins run within docker containers similar to models. 
A MISTK evaluation implements the 'mistk.evaluation.AbstractEvaluationPlugin' abstract class. 
This class provides a set of abstract methods that represent the evaluation lifecycle and
must be implemented by the new evaluation plugin. Ultimately, these methods form the core of
the web endpoint service that is made available for every evaluation implementation.

**do_evaluate** (assessment_type: string, metrics: list, input_data_path: string, evaluation_input_format: string, ground_truth_path: string, evaluation_path: string, properties: dict)
  Performs an evaluation using the ground truth data and predictions data for the assessment type and metrics
  specified and stores the resulting evaluation in the output directory.

  :param assessment_type: The evaluation type. One of {'BinaryClassification', 
        'MultilabelClassification', 'MulticlassClassification', 'Regression'}
  :param metrics: Specific metrics to evaluate against instead of all metrics defined by assessment_type
  :param input_data_path: Path to input data for the evaluation
  :param evaluation_input_format: The format of the input data. One of {predictions, generations}
  :param ground_truth_path: The directory path where the ground_truth.csv file is located
  :param evaluation_path: A directory path to where the evaluation.json output file will be stored
  :param properties: A dictionary of key value pairs for evaluation plugin arguments. 

**do_terminate** ()
    Prepare for application termination.
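
The core of a 'do_evaluate' implementation might look like the sketch below, which computes
a simple accuracy from the 'id' and 'label' columns. It is a standalone function with a
reduced argument list, not the plugin method itself. Several details are assumptions:
'input_data_path' is treated as the path of a predictions CSV file, the CSV files have no
header row, and the layout of 'evaluation.json' is invented for the example.

```python
import csv
import json
import os
import tempfile


def read_labels(csv_path):
    """Return {id: label} from a CSV whose rows start with id,label."""
    with open(csv_path, newline="") as f:
        return {row[0]: row[1] for row in csv.reader(f) if row}


def evaluate_sketch(input_data_path, ground_truth_path, evaluation_path):
    truth = read_labels(os.path.join(ground_truth_path, "ground_truth.csv"))
    predicted = read_labels(input_data_path)
    shared = sorted(set(truth) & set(predicted))  # ids present in both files
    correct = sum(truth[i] == predicted[i] for i in shared)
    result = {
        "accuracy": correct / len(shared) if shared else None,
        "samples": len(shared),
    }
    # Hypothetical output layout; the source only names the file.
    with open(os.path.join(evaluation_path, "evaluation.json"), "w") as f:
        json.dump(result, f)
    return result


def run_evaluate_sketch():
    """Write tiny ground truth and prediction files, then evaluate them."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "ground_truth.csv"), "w", newline="") as f:
            csv.writer(f).writerows([["a", "7"], ["b", "1"], ["c", "3"], ["d", "3"]])
        predictions = os.path.join(root, "predictions.csv")
        with open(predictions, "w", newline="") as f:
            csv.writer(f).writerows([["a", "7"], ["b", "1"], ["c", "3"], ["d", "8"]])
        return evaluate_sketch(predictions, root, root)
```

Matching on the shared ids relies on predictions and ground truth using the same 'id'
values, as required by the prediction format.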
    
Evaluation Metrics
------------------
Metrics are loaded for the evaluation container based on the metrics.json file. This JSON file contains a list
of Metric objects to describe the metric and how it will be executed.

The Metric values will require the following fields:

            object_info: A dictionary that stores metadata information for the
            metric. Only the 'name' and 'kind' fields are required, all other
            fields are optional.

            image_id: The image id for the evaluation container to run.
            
            assessment_types: The assessment type for the metric, e.g. MulticlassClassification.


An example for a metric:

::

  {
    "objectInfo": {
      "kind": "Metric",
      "name": "sklearn.metrics.r2_score"
    },
    "imageId": "docker-registry:5000/sml-evaluators/sklearn",
    "assessmentTypes": ["Regression"]
  }


The Metric values can optionally contain the following fields for accessing external metrics from other Python packages:

            package: (str) The name of the package containing the implementation of this external metric.

            method: (str) The name of the method from `package` to be called when executing the external metric.
                        
            defaultArgs: (dict) The default arguments passed to the method when the metric is called.  
            These key/value pairs can also be used as properties for the metric.
            
An example for an external metric:

::

  {
    "objectInfo": {
      "kind": "Metric",
      "name": "r2_score"
    },
    "imageId": "docker-registry:5000/sml-evaluators/sklearn",
    "assessmentTypes": ["Regression"],
    "package": "sklearn.metrics",
    "method": "r2_score",
    "defaultArgs": {
      "multioutput": "uniform_average"
    }
  }
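
One plausible way to resolve the 'package', 'method', and 'defaultArgs' fields at run time
is a dynamic import, sketched below. This is an assumption about the mechanism, not a
description of the MISTK implementation, and 'call_external_metric' is a hypothetical name.
The example entry points at a standard-library function so the sketch runs without
scikit-learn installed.

```python
import importlib


def call_external_metric(metric: dict, *args, **overrides):
    """Import metric['package'], look up metric['method'], and call it."""
    module = importlib.import_module(metric["package"])
    function = getattr(module, metric["method"])
    # Explicit keyword arguments take precedence over the metric's defaultArgs.
    kwargs = {**metric.get("defaultArgs", {}), **overrides}
    return function(*args, **kwargs)


# Standard-library stand-in for an external metric entry.
EXAMPLE_METRIC = {
    "package": "math",
    "method": "isclose",
    "defaultArgs": {"abs_tol": 0.1},
}
```

With this entry, 'call_external_metric(EXAMPLE_METRIC, 1.0, 1.05)' calls
'math.isclose(1.0, 1.05, abs_tol=0.1)'. The same pattern would apply to the 'r2_score'
entry above, given the true and predicted values as positional arguments.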
        

The Evaluation State Machine
----------------------------

Evaluation plugin implementations follow a workflow lifecycle based on state machine
transitions. The underlying MISTK infrastructure ensures that only legal
transitions from one state to another can be made and that the appropriate
plugin methods are called during those transitions. The image below illustrates
the high-level state machine transitions and the evaluation plugin methods
that may be called between them.

.. image:: images/evaluation_state_machine.png

An evaluation plugin instance will automatically go to the Started state if its
container instance successfully starts up. After completing an evaluation, it
will return to the Ready state so that multiple evaluations can be executed
using the same container instance.

Example
-------

The following example code implements scikit-learn evaluation metrics with the MISTK evaluation interface:

.. literalinclude:: sample_code_files/sklearn_evaluation.py
    :linenos:
    :language: python