Getting Started with DMM

This section walks you through the steps to start monitoring a model with DMM. Before getting into the specific steps, let’s look at a high-level overview of the model monitoring workflow and how DMM integrates with your system.

Model Monitoring Approach

_images/model-monitoring-approach.png

Model Monitoring Workflow

_images/model-monitoring-workflow.png

DMM monitors models by reading and processing the model’s training, prediction, and ground truth data from one of the supported data stores. You can link data stores to DMM so that it can read data from them. In the current release, DMM can read data from AWS S3 buckets.

Use the steps below to quickly get started with drift detection.

1) Connect Data Source

  1. If your S3 bucket is already linked to DMM, skip this step.

  2. Go to the Data Sources section, and click ‘Add Data Source’.

  3. In the popup, fill in the details of the AWS S3 bucket that will hold your data files.

  4. Click on the newly created Data Source to confirm that DMM shows the files present in the S3 bucket.

2) Register Model

  1. Go to the Model Dashboard and click on ‘Register Model > Use Guided Flow’.

  2. Enter the model’s metadata. Click ‘Next’.

  3. Select your S3 bucket from the ‘Choose Data Source’ dropdown. The files in it should be displayed.

  4. If the training and prediction data CSV files of the model are not present in the files list, add them to the bucket using ‘Upload New File’.

  5. Select the training data CSV file. Click ‘Next’.

  6. You should see all the columns present in the file. Deselect the columns that you do not want to monitor.

  7. For the selected columns, review and correct the Column Type and Value Type using the dropdowns.

  8. Be particularly careful about the row_identifier and timestamp columns. The row identifier is later used to match ground truth labels to the model’s predictions, and the timestamp is used to date the predictions in a dataset.

  9. Once you are satisfied with the selections, click ‘Next’.

  10. You can see the generated Model Config file used for model registration. Click ‘Register’.

  11. You should see the newly registered model listed on the Health Dashboard.
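
Because the row identifier and timestamp columns matter in later steps, it can help to sanity-check a CSV file before registering it. The following is a minimal sketch and not part of DMM. The column names `row_id` and `ts`, the ISO 8601 timestamp format, and the requirement that identifiers be unique are all assumptions made for illustration.

```python
import csv
import io
from datetime import datetime


def check_identifier_and_timestamp(csv_text, id_column, ts_column):
    """Return a list of problems found in the identifier and timestamp columns."""
    problems = []
    seen_ids = set()
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        row_id = row.get(id_column)
        if not row_id:
            problems.append(f"line {line_no}: missing {id_column}")
        elif row_id in seen_ids:
            # Assumption: identifiers should be unique so labels can be matched.
            problems.append(f"line {line_no}: duplicate {id_column} {row_id!r}")
        else:
            seen_ids.add(row_id)
        try:
            # Assumption: ISO 8601 timestamps such as 2021-03-01T12:00:00.
            datetime.fromisoformat(row.get(ts_column) or "")
        except ValueError:
            problems.append(f"line {line_no}: unparseable {ts_column}")
    return problems


# Hypothetical sample data with one duplicate identifier and one bad timestamp.
sample = (
    "row_id,ts,age\n"
    "a1,2021-03-01T12:00:00,34\n"
    "a1,2021-03-01T12:05:00,41\n"
    "a3,not-a-date,29\n"
)
for problem in check_identifier_and_timestamp(sample, "row_id", "ts"):
    print(problem)
```

An empty result means the two columns passed these particular checks; it says nothing about how DMM itself validates the file.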

3) Add Prediction Data

  1. Click on the registered model.

  2. Go to the ‘Data Drift > Analyze’ page.

  3. Click on ‘Register Prediction Data > Use Guided Flow’.

  4. Select your S3 bucket from the ‘Choose Data Source’ dropdown. The files in it should be displayed.

  5. Select or upload your prediction data CSV file.

  6. If the prediction dataset contains new columns, you will be asked whether any of them are timestamp, row identifier, prediction probability, or sample weight columns (if those were not declared during model registration). Identifying them is optional.

  7. Click ‘Next’ to see the Config file and click ‘Register’.

  8. You will now be taken to the Data Drift Analyze page, and DMM will begin calculating drift on this dataset.
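
To anticipate which columns the guided flow may ask about, you can compare the headers of the two CSV files yourself. This is a rough sketch under stated assumptions: the sample headers are hypothetical, and how DMM actually detects new columns is not described here.

```python
import csv
import io


def new_prediction_columns(training_csv_text, prediction_csv_text):
    """Return prediction columns absent from the training header, in file order."""
    training_header = next(csv.reader(io.StringIO(training_csv_text)))
    prediction_header = next(csv.reader(io.StringIO(prediction_csv_text)))
    known = set(training_header)
    return [name for name in prediction_header if name not in known]


# Hypothetical headers: the prediction file adds three columns.
training_csv = "age,income,label\n34,52000,yes\n"
prediction_csv = "row_id,ts,age,income,score\na1,2021-03-01T12:00:00,34,52000,0.81\n"
print(new_prediction_columns(training_csv, prediction_csv))
```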

4) Analyze Data Drift

  1. After the drift calculations are complete, you should see the divergence value for each feature in the Drift column.

  2. The default Test Type and Threshold value will be used for the computation. (You can change the defaults in the ‘Settings > Drift Tests Defaults’ section.)

  3. You can now experiment with other Test Types, Conditions and Threshold values for this dataset.

  4. You can filter the data by different Date Ranges to look at Drift and its trends for specific time periods.

  5. If your model has a timestamp column declared, it is used to get the timestamp of each prediction in the dataset. If it was not declared, the data’s ingestion time in DMM is taken as its timestamp.

  6. You can control which features should be excluded from the Scheduled Checks using the Enable/Disable Alerts toggle icon in the table. This is helpful to reduce alert noise.

  7. To save the current test types, thresholds, and other settings as the model’s default config for running scheduled checks, use the ‘Other Actions > Save Checks Config’ action.

  8. If you have experimented with different configs and want to reset the Analyze table to the model’s default config, use the ‘Other Actions > Restore Checks Config’ action.
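
To make the idea of a per-feature divergence value and threshold concrete, here is a small sketch that compares one numeric feature between training and prediction data. The test types DMM offers are not listed in this section, so the sketch uses the Population Stability Index purely as an illustrative stand-in. The bin edges, sample values, and threshold are hypothetical.

```python
import math


def bin_index(value, edges):
    """Index of the bin holding value; out-of-range values go to the end bins."""
    for i in range(1, len(edges) - 1):
        if value < edges[i]:
            return i - 1
    return len(edges) - 2


def population_stability_index(training, prediction, edges, eps=1e-6):
    """PSI between two samples of one numeric feature, using shared bin edges."""

    def fractions(values):
        counts = [0] * (len(edges) - 1)
        for value in values:
            counts[bin_index(value, edges)] += 1
        # eps keeps empty bins from causing log(0) or division by zero.
        return [max(count / len(values), eps) for count in counts]

    p = fractions(training)
    q = fractions(prediction)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


training_ages = [23, 25, 31, 38, 42, 47, 55, 58]
prediction_ages = [41, 44, 52, 57, 59, 63, 66, 68]
edges = [20, 30, 40, 50, 60, 70]  # hypothetical bin edges

threshold = 0.2  # hypothetical threshold, not a DMM default
psi = population_stability_index(training_ages, prediction_ages, edges)
print(f"PSI = {psi:.3f}, drift flagged: {psi > threshold}")
```

Identical distributions give a value of zero, and the value grows as the prediction data moves away from the training data, which is the behaviour a "greater than threshold" condition relies on.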

5) Add Ground Truth Data

  1. After you have added prediction data, you can ingest the model’s ground truth data to monitor the quality of predictions made by the model.

  2. This analysis requires that a row_identifier column was declared for the model. It is used to match the ground truth labels to the model’s predictions.

  3. Go to the ‘Model Quality > Analyze’ page.

  4. Click on ‘Register Ground Truth Data > Use Guided Flow’.

  5. Select your S3 bucket from the ‘Choose Data Source’ dropdown. The files in it should be displayed.

  6. Select or upload your ground truth data CSV file.

  7. In the next step, select the Ground Truth column in the file and the output prediction column that it maps to.

  8. Click ‘Next’ to see the Config file and click ‘Register’.

  9. You will now be taken to the Model Quality Analyze page, and DMM will begin calculating model quality metrics for this dataset.

  10. You can control which metrics should be excluded from the Scheduled Checks using the Enable/Disable Alerts toggle icon in the table. This is helpful to reduce alert noise.
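
The role of the row identifier in this step can be sketched as a simple join between predictions and ground truth labels. This is only an illustration of the matching idea: the identifiers and labels are made up, and accuracy is used as an example metric, since the metrics DMM computes are not listed in this section.

```python
def accuracy_from_ground_truth(predictions, ground_truth):
    """Match labels to predictions by row identifier and compute accuracy.

    Both arguments map a row identifier to a label. Returns a tuple of
    (accuracy, matched_count); accuracy is None when nothing matches.
    """
    matched = [
        (predictions[row_id], label)
        for row_id, label in ground_truth.items()
        if row_id in predictions
    ]
    if not matched:
        return None, 0
    correct = sum(1 for predicted, actual in matched if predicted == actual)
    return correct / len(matched), len(matched)


# Hypothetical data: "a4" has a label but no prediction, so it is ignored.
predictions = {"a1": "yes", "a2": "no", "a3": "yes"}
ground_truth = {"a1": "yes", "a2": "yes", "a4": "no"}
print(accuracy_from_ground_truth(predictions, ground_truth))
```

Rows without a matching identifier on both sides cannot contribute to the metric, which is why the row_identifier column has to be declared.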

6) Set Up Scheduled Checks and Notifications

  1. You can also set up checks that run periodically and configure who should receive email alert notifications if they fail.

  2. Go to the model’s Notifications section and enter the email addresses to which notifications will be sent for this model.

  3. Go to the Analyze tab in Data Drift to control which features should be excluded from the Scheduled Checks using the Enable/Disable Alerts toggle icon in the table. This is helpful to reduce alert noise.

  4. Now go to the Schedule Checks tab to set the frequency and other parameters.

  5. Enter the name for the check.

  6. Configure the frequency at which it should be run.

  7. In the “Select Data to Check” options, select one:

    1. ‘Use data since last check time’ considers only predictions whose timestamps fall after the last time the scheduled check ran.

    2. ‘Data since last x time period’ considers only predictions whose timestamps fall within the last specified x time period (e.g. the last 3 days).

  8. Click ‘Save’ to enable the Scheduled Checks. You can see the historical reports of all the runs in the Checks History tab for data drift.

  9. Similarly, you can set up Scheduled Checks for model quality as well.

  10. One important distinction to keep in mind: for data drift, the timestamp of the predictions is used to select data for the scheduled checks, whereas for model quality it is the ingestion time of the ground truth labels. Refer to the ‘Setting Scheduled Checks for the model’ section of the documentation for more details.
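
The two “Select Data to Check” options can be pictured as two different time filters. The sketch below is an assumption-laden illustration rather than DMM’s implementation: the record layout, the `time` key, and the exact boundary handling are hypothetical. For data drift, `time` would stand for the prediction timestamp; for model quality, it would stand for the ingestion time of the ground truth labels.

```python
from datetime import datetime, timedelta


def select_since_last_check(records, last_check_time):
    """'Use data since last check time': keep records stamped after the previous run."""
    return [r for r in records if r["time"] > last_check_time]


def select_last_period(records, now, period):
    """'Data since last x time period': keep records inside the trailing window."""
    return [r for r in records if now - period <= r["time"] <= now]


# Hypothetical records; "time" is the timestamp used for selection.
records = [
    {"id": "p1", "time": datetime(2021, 3, 1, 9, 0)},
    {"id": "p2", "time": datetime(2021, 3, 3, 9, 0)},
    {"id": "p3", "time": datetime(2021, 3, 4, 9, 0)},
]
now = datetime(2021, 3, 4, 12, 0)
last_check_time = datetime(2021, 3, 3, 12, 0)

since_last = select_since_last_check(records, last_check_time)
last_three_days = select_last_period(records, now, timedelta(days=3))
print([r["id"] for r in since_last])
print([r["id"] for r in last_three_days])
```

With the first option each record is checked at most once across runs, while the second option can re-check the same records when the window is longer than the check frequency.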


System diagram with DMM integration

This diagram shows one way to integrate the current release of DMM into your system.

_images/dmm_sys_integration_diag.png