Machine Learning Workbench

Overview

Machine Learning Workbench (MLW) helps data scientists and machine learning practitioners solve business problems faster by streamlining the machine learning lifecycle, including data capture and analysis, model training and evaluation, and model deployment. MLW provides multi-modal model-building capabilities, including an easy-to-use, no-code graphical user interface and a programmer-friendly Jupyter Notebook based environment.

MLW provides a project-based structure for encapsulating data science assets - data, code, resources, and models - along with a version control system. MLW provides visual tools to ingest and transform data produced by devices connected to Cumulocity IoT or data offloaded by Cumulocity IoT DataHub. A built-in task scheduler can be used to orchestrate periodic data pulls or model re-training activities. Once a model is trained and evaluated, MLW allows one-click deployment of models to Machine Learning Engine.

The following sections describe the features of MLW in detail.

Home screen

In the Cumulocity IoT platform, you access Machine Learning Workbench through the app switcher: clicking Machine Learning Workbench in the app switcher opens the application.

App Switcher screen

Projects

Machine Learning Workbench (MLW) provides a version-controlled project-based structure to organize all the data science resources including data, code, models, neural network architectures, inference pipelines, and training workflows.

Projects functionality includes:

Creating a new project

Click Projects in the navigator. This will list all the available projects.

Click +Add Project at the right of the top menu bar, enter a project name and description, and click Add Project. This will create a new project with the given name. The new project will not contain any resources.

Uploading resources

Machine Learning Workbench (MLW) categorizes project resources as follows:

  • Data: Training data for Machine Learning Workbench (MLW). File types: csv, json, zip, png, jpg, txt
  • Code: Python code for data preparation/exploration, data pre-/post-processing steps, and model training and evaluation. File types: py, ipynb
  • Model: Models trained by Machine Learning Workbench (MLW). File types: pmml, onnx
  • NN Designer: Architectures depicting complex structures of deep neural networks. File type: architecture (JSON)
  • Inference Pipeline: Inference pipelines that define a sequence consisting of a pre-processing step, an ONNX model, and a post-processing step. File type: pipeline (JSON)
  • Training Workflow: Training workflows that define a sequence of data preparation and model training/export activities that can be scheduled periodically. File type: wf (JSON)

To upload files, click the cloud upload icon Upload, then either click on the upload pane and select the files to upload or drag and drop the files onto the pane.

Upload resources

Once the files are uploaded, they will be placed under the respective categories.

Committing a project version

To commit a project with its resources for versioning, click the plus icon Commit at the top right and click Commit Project.

You can select all or a subset of the resource files that need to be committed to a version, and then click the submit icon Submit.

Commit version

Click Tasks in the navigator and click the respective task name, which in this case will be the project name, to display the status of the commit process in the Task History section at the centre.

The project card will show the different versions available for that project.

Upload Resources

Data pull

Machine Learning Workbench (MLW) provides connectors to various data sources, such as Cumulocity IoT and DataHub, from which data can be downloaded to start the machine learning model-building process.

Cumulocity IoT

The following steps illustrate how to ingest and transform data produced by devices connected to Cumulocity IoT.

  1. Click the add icon Add and select Import from Cumulocity.

  2. Select the device for which you want to pull the data and click the download icon Download under Fetch Data.

  3. As part of the data pull, provide parameters such as the data file name, data interval, data aggregation, and sensor name for data extraction. Once these parameters are provided, click the submit icon Submit.

    Cumulocity parameters

Click Tasks in the navigator and click the corresponding task name to display the status of the Cumulocity IoT data pull in the Task History section at the centre.

Once the task has reached the status COMPLETED, the data is stored in the Data folder of the respective project in Machine Learning Workbench (MLW).
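For readers who prefer a programmatic view, the same kind of pull can be sketched against the Cumulocity IoT REST measurements interface. This is a minimal sketch, not MLW's internal implementation; the tenant URL, device ID, and timestamps below are placeholder values.

```python
# Sketch: composing a Cumulocity IoT measurements query similar to what
# a device data pull performs. All concrete values are placeholders.
from urllib.parse import urlencode

def build_measurements_url(tenant_url, device_id, date_from, date_to, page_size=2000):
    """Compose the REST URL for pulling measurements of one device."""
    params = urlencode({
        "source": device_id,    # managed object (device) ID
        "dateFrom": date_from,  # ISO 8601 timestamps
        "dateTo": date_to,
        "pageSize": page_size,
    })
    return f"{tenant_url}/measurement/measurements?{params}"

url = build_measurements_url(
    "https://mytenant.cumulocity.com", "12345",
    "2023-01-01T00:00:00Z", "2023-01-02T00:00:00Z",
)
print(url)
```

The resulting URL can then be fetched with any HTTP client using your tenant credentials.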

DataHub

The following steps illustrate how to ingest and transform data offloaded by Cumulocity IoT DataHub.

  1. Click the add icon Add and select Import from DataHub.

  2. Input the query, and click the submit icon Submit.

    DataHub Query

  3. Provide the resource name with which you want to save the pulled data, and click Submit.

    DataHub name

Click Tasks in the navigator and click the corresponding task name to display the status of the Cumulocity IoT DataHub data pull in the Task History section at the centre.

Once the task has reached the status COMPLETED, the data will be stored in the Data folder of the respective project in Machine Learning Workbench (MLW).
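The query entered in step 2 is SQL against the tables DataHub has offloaded. A hypothetical example is shown below; the tenant, table, and column names are illustrative assumptions and should be replaced with those from your own offloading configuration.

```python
# Hypothetical DataHub import query; table and column names are examples only.
query = """
SELECT source, time, temperature
FROM myTenant.measurements_offloaded
WHERE time >= TIMESTAMP '2023-01-01 00:00:00'
ORDER BY time
"""
print(query.strip())
```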

Once the data is ingested from Cumulocity IoT or DataHub, the corresponding CSV file is available in the Data folder of the respective project in Machine Learning Workbench (MLW). You can view the metadata for the newly created CSV file by clicking on the respective file name.

Data from the CSV file can be previewed by clicking the preview icon Preview at the right of the top menu bar.

Automated ML

Machine Learning Workbench (MLW) provides an Automated Machine Learning (AutoML) feature which enables you to build machine learning models for classification, regression, and anomaly detection with ease. AutoML performs an exhaustive grid search in the hyper-parameter space to generate the best model for your dataset.
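The exhaustive grid search idea can be illustrated with a small sketch: every combination of hyper-parameters is scored, and the best-scoring one wins. The parameter grid and scoring function below are toy stand-ins, not MLW internals.

```python
# Toy sketch of exhaustive grid search over a hyper-parameter space.
from itertools import product

param_grid = {"max_depth": [2, 4, 8], "learning_rate": [0.01, 0.1, 0.3]}

def score(params):
    # Stand-in for cross-validated model evaluation; higher is better.
    return -abs(params["max_depth"] - 4) - abs(params["learning_rate"] - 0.1)

# Enumerate every combination and keep the best-scoring one.
candidates = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]
best = max(candidates, key=score)
print(best)  # → {'max_depth': 4, 'learning_rate': 0.1}
```

In MLW itself, the scoring function corresponds to the model evaluation criteria you select, and the number of combinations explored is governed by the Generation and Population Size parameters.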

AutoML

The following steps illustrate how to train a machine learning model using AutoML.

  1. Click the upload icon Upload to upload the tabular dataset on which you want to train a machine learning model.

  2. Select the data resource in the Data folder, and click the add icon Add at the right of the top menu bar to proceed with training the AutoML model on that data.

  3. Select the Problem Type (Regression or Classification) and select the Target Variable at the right. Next, select the imputation methods and data transformation steps for the respective columns and click Build to proceed.

    Pre-processing steps

  4. In the Training Parameter section at the right, select the training parameters which include model evaluation criteria (Scoring), training iterations (Generation) and population size for each generation (Population Size) and click the submit icon Submit.

    Pre-processing steps

This will create a new task in the Tasks section.

Click Tasks in the navigator and click the corresponding task name to display the status of the model training in the Task History section at the centre.

Once the task is COMPLETED, all the trained models are listed along with their model evaluation scores in descending order.

The hyper-parameters for each model can be viewed by clicking on the corresponding model name.

Pre-processing steps

After the training is complete, the best model selected by the evaluation criteria will be saved in the Model folder of the respective Project in PMML format.

Model deployment and predictions

Once the model is available in the Model folder, it can be deployed on Machine Learning Engine (MLE) for predictions.

Select the model from the Model folder and click the cloud icon Deploy at the right of the top menu bar to deploy the selected model on Machine Learning Engine (MLE).

Once the model is successfully deployed, the cloud icon will change to Deployed.

To predict data using a deployed model, select the data set from the Data folder and click the predict icon Predict.

Data can be sent to a PMML model, an ONNX model, or an ONNX pipeline. For this example, we will use a PMML model deployed on the Machine Learning Engine (MLE). Select the PMML option under the predict icon Predict.

Select Format MLE

This will list all the PMML models deployed on the Machine Learning Engine (MLE). Select the PMML model for prediction and click the submit icon Submit.

Select Model for Prediction

The predicted results will be stored in the Data folder. For PMML models, the format of the input data determines the format of the predictions, i.e. the output data. In our example, the input data was in CSV format, so the output data will also be in CSV format.

Select the output data from the Data folder and click the download icon Download at the right of the top menu bar to download the output data to the local machine.

Select the output data from the Data folder and click the preview icon Preview to preview the output data.

Neural Network (NN) Designer

Machine Learning Workbench (MLW) provides an intuitive drag-and-drop designer that allows you to construct, edit, train and analyze deep neural networks. Built on the solid foundation of TensorFlow and Keras, the visual approach provides greater insight and clarity into the architecture of neural networks, enabling the creation of state-of-the-art deep learning models without writing a single line of code.

There are two approaches to training deep neural networks using Neural Network (NN) Designer. You can either start with a pre-trained model from a similar domain and use its architecture and weights as a starting point (transfer learning) or you can start from a blank slate and design a custom network from scratch. We will look at both these approaches in detail.

Transfer learning

To begin the model training with transfer learning, you need to create a new neural network architecture file from an existing architecture.

The following steps illustrate how to train a deep neural network model using transfer learning.

Creating a new transfer learning architecture file

  1. To create a new architecture file, click the add icon Add and select Add New Resource.

  2. In the Add New Resource dialog, select “NN Designer” as Resource Type and “MobileNet” as Architecture, enter a Resource Name and click Submit.

    New NN Name

This will create a new architecture file with the extension .architecture in the NN Designer folder of the project.

Editing the architecture file

Select the architecture file in the NN Designer folder and click the edit icon Edit to edit the architecture.

This will open the MobileNet architecture in the editor where you can add new layers or remove existing layers.

New NN Selector

With the pre-trained MobileNet model represented by the architecture shown above, you can initiate transfer learning. To get started, remove the last two layers: Reshape and Activation.

Next, drag and drop Flatten and Dense (Activation Function: softmax, Units: 2) layers, set their properties and connect them to the network.

Click the save icon Save at the right of the top menu bar to save the architecture.
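Conceptually, the layer surgery above amounts to the following sketch, with the network modeled as a plain list of layer names. In MLW this is done visually in the NN Designer, without writing code.

```python
# Conceptual sketch of transfer-learning layer surgery; layer names only.
mobilenet = ["Conv2D", "...", "GlobalAveragePooling2D", "Reshape", "Activation"]

# 1. Drop the last two layers (Reshape and Activation).
base = mobilenet[:-2]

# 2. Append a Flatten layer and a Dense layer (softmax, 2 units)
#    as the new head for a two-class problem.
custom_head = ["Flatten", "Dense(units=2, activation=softmax)"]
model = base + custom_head
print(model[-2:])
```

The pre-trained layers in `base` carry over their learned weights; only the new head is trained from scratch on your data.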

Specifying the training process

  1. Click the cogwheel icon Cogwheel to train a model on the updated architecture.
  2. In the Training Parameters section at the right, select the appropriate data under Data.
  3. Specify the Problem Type which can either be “classification” or “regression”.
  4. If the data needs pre-processing, specify the Pre Processing Script.
  5. The Recurrence parameter defines whether the training task needs to be executed one time or periodically. For this example, the training task will be one time.
  6. Provide values for Epoch, Learning Rate, Loss, Metrics, Optimizer. Other parameters can be left as default.
  7. Once the training parameters are updated, click the submit icon Submit to trigger the training process.

NN training parameter

Click Tasks in the navigator and click the corresponding task name to display the status of the model training in the Task History section at the centre.

Once the task is COMPLETED, the trained model will be saved in the Model folder of the respective Project in ONNX format.

Custom architecture

To begin the model training with a custom architecture, you need to create a new neural network architecture file from scratch.

The following steps illustrate how to train a deep neural network model using custom architecture.

Creating a new custom architecture file

  1. To create a new architecture file, click the add icon Add and select Add New Resource.

  2. In the Add New Resource dialog, select “NN Designer” as Resource Type and “None” as Architecture, and enter a Resource Name and click Submit.

This will create an empty architecture file with the extension .architecture in the NN Designer folder of the project.

Editing the architecture file

Select the architecture file and click the edit icon Edit to edit the architecture.

This will open a blank architecture in the editor where you can add new layers to build a custom neural network architecture.

The remaining steps to save the custom architecture and train the neural network model remain the same as in the case of transfer learning.

Jupyter Notebook

Machine Learning Workbench (MLW) provides an integrated Jupyter Notebook environment that enables you to write your code, perform exploratory data analysis, visualize your data, and build your models. The notebook environment is an intuitive in-browser editor that can be used to combine Markdown text and executable Python source code.

Creating a new notebook

  1. To create a new notebook, click the add icon Add and select Add New Resource.

  2. In the Add New Resource dialog, select “Notebook” as Resource Type and provide the Resource Name which identifies the notebook and click Submit.

    Notebook Selector

This will create a new notebook file with the extension .ipynb in the Code folder of the project.

Editing and executing a notebook

To edit a notebook, select the notebook file in the Code folder and click the edit icon Edit at the top right.

This will open the notebook in an editor which you can use to write and execute Python code and Markdown text interactively.

Code snippets and comments can be combined in individual cells. Output generated from a cell is displayed below the cell.

Notebook execution
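A typical cell in such a notebook might combine a short comment with exploratory Python code. The data below is inlined for illustration; in practice you would read a CSV file from the project's Data folder.

```python
# Example notebook cell: quick exploratory look at a small CSV dataset.
import csv
import io

# Inlined sample data; in a real notebook, open a file from the Data folder.
raw = io.StringIO("temperature,humidity\n21.5,40\n22.0,42\n23.1,39\n")
rows = list(csv.DictReader(raw))
mean_temp = sum(float(r["temperature"]) for r in rows) / len(rows)
print(f"{len(rows)} rows, mean temperature = {mean_temp:.2f}")
```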

Info: The Jupyter Notebook environment currently supports the Python 3 kernel.

Task scheduler

Machine Learning Workbench (MLW) provides a flexible task scheduler which can be used for orchestrating a wide variety of activities including periodic data pulls from data sources or retraining your machine learning models at regular intervals.

To showcase the task scheduler, we will create a simple Python script which will be scheduled for periodic execution. This can easily be extended for more involved activities like data pull and model retraining. First, we create a new resource which will contain the Python source code.

Creating a new Python script

  1. To create a new Python file, click the add icon Add and select Add New Resource.

  2. In the Add New Resource dialog, select “Python Script” as Resource Type and provide the Resource Name which identifies the source file and click Submit.

    New PY Name

This will create a new Python file with the extension .py in the Code folder of the project.

Scheduling a Python script

From the Code folder, edit the Python file and write any Python code. Click Execute and select “Repeat” as Recurrence. Provide the execution interval and click the submit icon Submit. This will execute the Python script periodically at the specified intervals.

PY Script Scheduler
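A minimal script for such a schedule might simply record each run, for example by appending a timestamped line to a log file. The log file name is illustrative.

```python
# Minimal scheduled script: append a timestamped heartbeat line per run.
from datetime import datetime, timezone

def run():
    message = f"task ran at {datetime.now(timezone.utc).isoformat()}"
    with open("heartbeat.log", "a") as fh:  # illustrative file name
        fh.write(message + "\n")
    return message

print(run())
```

In practice, the body of `run()` would be replaced by your data pull or retraining logic.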

Training workflow

Model training is a complex process which often requires data ingestion/transformation via arbitrary scripts. Training workflows define a sequence of data preparation and model training/export activities that can be scheduled periodically.
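The prepare/train/export sequence a workflow encodes can be sketched as follows; the three functions are toy placeholders for the pre-processing script, model training, and model export stages you configure when creating a workflow.

```python
# Sketch of the data preparation -> training -> export sequence of a workflow.
def prepare(raw):
    # Stand-in for data preparation, e.g. dropping missing values.
    return [x for x in raw if x is not None]

def train(data):
    # Stand-in for model training; returns a trivial "model" (the mean).
    return sum(data) / len(data)

def export(model):
    # Stand-in for exporting the trained model (e.g. to PMML/ONNX).
    return {"type": "mean-model", "value": model}

artifact = export(train(prepare([1.0, None, 2.0, 3.0])))
print(artifact)  # → {'type': 'mean-model', 'value': 2.0}
```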

Info: To proceed, you will require a trained model created from Automated ML.

Creating a new workflow

  1. To create a new workflow file, click the add icon Add and select Add New Resource.

  2. In the Add New Resource dialog, select “Workflow” as Resource Type and provide the Resource Name which identifies the workflow.

  3. Select the appropriate Model, Pre Processing Script, and Data which defines the sequence of this workflow and click Submit.

    Add Workflow

This will create a new workflow file with the extension .wf in the Training Workflow folder of the respective project.

Click on a workflow file in the Training Workflow folder to view its metadata.

Executing a workflow

  1. To schedule the execution of a workflow, click the cogwheel icon Cogwheel.

  2. In the Workflow Execution section at the right, provide the parameters that will define the workflow execution including Task Name and Recurrence and click submit icon Submit.

    Workflow training

This will create a new task in the Tasks section.

Click Tasks in the navigator and click the corresponding task name to display the status of the workflow execution in the Task History section at the centre.

Inference pipeline

ONNX models typically require a pre-processing step that converts raw input data into tensors and a post-processing step that converts tensors into output values. Inference pipelines define a sequence consisting of a pre-processing step, an ONNX model, and a post-processing step. Machine Learning Workbench (MLW) can deploy inference pipelines to Machine Learning Engine.

Info: To proceed, you will require a trained model in ONNX format created from the Neural Network Designer or Jupyter Notebooks along with pre-processing and post-processing Python scripts.
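The two scripts can be as simple as the following sketch; the pixel normalization and class labels are illustrative assumptions, not a prescribed MLW interface.

```python
# Sketch of pre- and post-processing around an ONNX model (illustrative).
def preprocess(raw_pixels):
    """Convert raw 0-255 pixel values into a normalized tensor-like list."""
    return [p / 255.0 for p in raw_pixels]

def postprocess(scores, labels=("cat", "dog")):
    """Map the model's output scores to the highest-scoring class label."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]

tensor = preprocess([0, 128, 255])   # model input
label = postprocess([0.2, 0.8])      # model output -> human-readable label
print(tensor, label)
```

The ONNX model sits between these two functions: the pipeline feeds it the tensor produced by `preprocess` and passes its output scores to `postprocess`.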

Creating a new pipeline

  1. To create a new pipeline file, click the add icon Add and select Add New Resource.

  2. In the Add New Resource dialog, select “Pipeline” as Resource Type and provide the Resource Name which identifies the pipeline.

  3. Select the appropriate Model, Pre-processing Script and Post-processing Script which defines the sequence of this pipeline and click Submit.

    Add Pipeline

This will create a new pipeline file with the extension .pipeline in the Inference Pipeline folder of the project.

Click on a pipeline file in the Inference Pipeline folder to view its metadata.

Deploying a pipeline

Click on a pipeline file in the Inference Pipeline folder and click the deploy icon Deploy to deploy the inference pipeline on Machine Learning Engine.