Overview
Machine Learning Workbench (MLW) helps data scientists and machine learning practitioners solve business problems faster by streamlining the machine learning lifecycle: data capture and analysis, model training and evaluation, and model deployment. MLW supports several ways of building models, including an easy-to-use, no-code graphical user interface and a programmer-friendly Jupyter Notebook-based environment.
MLW provides a project-based structure for encapsulating data science assets (data, code, resources, and models) along with a version control system. MLW provides visual tools to ingest and transform data produced by devices connected to Cumulocity IoT or data offloaded by Cumulocity IoT DataHub. A built-in task scheduler can be used to orchestrate periodic data pulls or model re-training activities. Once a model is trained and evaluated, MLW allows one-click deployment of the model to Machine Learning Engine.
The following sections will describe the features of MLW in detail.
Home screen
In the Cumulocity IoT platform, you open the Machine Learning Workbench application by clicking Machine Learning Workbench in the app switcher.
Projects
Machine Learning Workbench (MLW) provides a version-controlled project-based structure to organize all the data science resources including data, code, models, neural network architectures, inference pipelines, and training workflows.
Projects functionality includes:
- Creating a new project
- Uploading resources
- Deleting resources
- Committing a project version
- Switching between project versions
- Downloading a project
- Uploading a project
- Deleting a project
- Deleting a project version
- Deleting multiple projects
- Task grouping and deletion
Creating a new project
Click Projects in the navigator. This will list all the available projects.
Click +Add Project at the right of the top menu bar, enter a project name and description, and click Add Project. This will create a new project with the given name. The new project will not contain any resources.
Uploading resources
Machine Learning Workbench (MLW) categorizes project resources as follows:
Resource | Content description | File type |
---|---|---|
Data | Training data for Machine Learning Workbench (MLW) | |
Code | Python code for data preparation/exploration, data pre/post-processing steps, model training and evaluation | |
Model | Models trained by Machine Learning Workbench (MLW) | |
NN Designer | Architectures depicting complex structures of deep neural networks | |
Inference Pipeline | Inference pipelines that define a sequence of a pre-processing step, an ONNX model, and a post-processing step | |
Training Workflow | Training workflows that define a sequence of data preparation and model training/export activities that can be scheduled periodically | |
To upload files, click the cloud upload icon, then either click the upload pane and select the files or drag and drop the files onto it.
Once the files are uploaded, they will be placed under the respective categories.
Deleting resources
To delete resource(s), click Projects in the navigator, select the project from which you want to delete resources, select the resource category (for example, Data) and select the resource(s). Click the delete icon at the top right to delete the selected resource(s).
Committing a project version
To commit a project with its resources for versioning, click the plus icon next to the project name at the top.
You can select all or a subset of the resource files to commit to a version, then click the submit icon.
Click Tasks in the navigator and click the respective task name, which in this case will be the project name, to display the status of the commit process in the Task History section at the center.
The project card will show the different versions available for that project.
Switching between project versions
To switch to a different version of the project, click Projects in the navigator, select the version you want to switch to in the project tile and click Pull. The version switch message will show up in the respective project tile.
Click Tasks in the navigator and click the respective task name, which in this case is the project name, to display the status of the version switch in the Task History section at the center.
Downloading a project
To download a specific version of a project, click the icon and click Download.
Click Tasks in the navigator and click the respective task name, which in this case is the project name followed by an underscore and “download” (for example, demoproject_download), to display the status of the download process in the Task History section at the center.
Once the task has reached COMPLETED status, the project ZIP file has been created. You can download it by clicking the Project button on the project card.
Uploading a project
To facilitate collaboration and sharing, MLW allows you to upload the contents from an exported project archive.
To upload a project on a particular tenant, click +Add Project at the right of the top menu bar, select the Upload radio button and upload the ZIP file by clicking the Drop file here button.
Click Tasks in the navigator and click the respective task name, which in this case is the project name followed by an underscore and a unique ID (for example, demoproject_a3n67e), to display the status of the upload process in the Task History section at the center.
Once the task has reached COMPLETED status, the new project card is successfully created. You can view the contents of the project by clicking the respective project card.
Deleting a project
To delete a project, click the context menu icon and click Delete.
Click Tasks in the navigator and click the respective task name, which in this case is the project name followed by an underscore and “delete” (for example, demoproject_delete), to display the status of the delete process in the Task History section at the center.
Once the task has reached COMPLETED status, the project is deleted.
Deleting a project version
To delete a specific version of the project, click Projects in the navigator, select the version you want to delete in the project tile and click Delete.
Click Tasks in the navigator and click the respective task name, which in this case is “project name underscore version underscore version number underscore delete” (for example, demoproject_version_v1_delete), to display the status of the delete process in the Task History section at the center.
Once the task has reached COMPLETED status, the project version is successfully deleted.
Deleting multiple projects
To delete multiple projects, click Projects in the navigator, then click Bulk Actions at the top right. Select the projects you want to delete and click Delete.
Click Tasks in the navigator and click the respective task names for each deleted project, which in this case is “project name underscore delete” (for example, demoproject_delete), to display the status of the delete process in the Task History section at the center.
Once the tasks have reached COMPLETED status, the selected projects are deleted.
Note that deleting a project or a specific project version will fail with a notification if there is any ongoing task associated with the project.
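The task names mentioned in the sections above follow a simple pattern based on the project name. The following Python sketch only restates that pattern so it is easier to find the right entry under Tasks. The function and the action labels are hypothetical helpers for illustration and are not part of MLW.

```python
from typing import Optional


def task_name(project: str, action: str, detail: Optional[str] = None) -> str:
    """Return the task name described in this section for a project action.

    The action labels are invented for this sketch; they are not MLW identifiers.
    """
    if action in ("commit", "switch_version"):
        # Commit and version switch tasks are named after the project itself.
        return project
    if action == "download":
        return f"{project}_download"
    if action == "delete":
        return f"{project}_delete"
    if action == "delete_version":
        if detail is None:
            raise ValueError("delete_version needs the version, for example 'v1'")
        return f"{project}_version_{detail}_delete"
    if action == "upload":
        if detail is None:
            raise ValueError("upload needs the unique suffix assigned by MLW")
        return f"{project}_{detail}"
    raise ValueError(f"unknown action: {action}")


print(task_name("demoproject", "download"))
print(task_name("demoproject", "delete_version", "v1"))
```

The unique suffix of an upload task is assigned by MLW, so it can only be passed in after the task has been created.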
Task grouping and deletion
Tasks are grouped separately for each project. To see tasks associated with a project, click Tasks in the navigator, and then select the project to see all its associated tasks.
To delete task(s), click Tasks in the navigator and select the project to display its associated tasks. Select the task(s) and click the delete icon at the top right.
Data pull
Machine Learning Workbench (MLW) provides connectors to data sources such as Cumulocity IoT, DataHub, and AWS S3, from which data can be downloaded to start the machine learning model-building process.
Cumulocity IoT
The following steps illustrate how to ingest and transform data produced by devices connected to Cumulocity.
- Click the add icon and select Import from Cumulocity IoT.
- Select the device for which you want to pull the data and click the download icon under Fetch Data.
- As part of the data pull, provide parameters such as data file name, data interval, data aggregation, and sensor name for data extraction. Once these parameters are provided, click the submit icon.
Click Tasks in the navigator and click the corresponding task name, to display the status of the Cumulocity IoT data pull in the Task History section at the center.
Once the task has reached the status COMPLETED, the data is stored in the Data folder of the respective project in Machine Learning Workbench (MLW).
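MLW performs the data pull itself, so no code is required. For readers who want to see what a comparable query looks like outside the GUI, the following sketch builds a URL for the Cumulocity IoT measurement series REST endpoint. How MLW maps its dialog fields to that endpoint is an assumption, and the tenant URL, device ID, and series name are hypothetical placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def series_query_url(base_url, device_id, date_from, date_to, aggregation, series):
    """Build a measurement series query URL (no request is sent)."""
    params = {
        "source": device_id,             # device whose data is pulled
        "dateFrom": date_from,           # start of the data interval
        "dateTo": date_to,               # end of the data interval
        "aggregationType": aggregation,  # for example DAILY, HOURLY, or MINUTELY
        "series": series,                # sensor series, for example fragment.series
    }
    return f"{base_url.rstrip('/')}/measurement/measurements/series?{urlencode(params)}"


url = series_query_url(
    "https://tenant.example.com/",       # hypothetical tenant
    "12345",                             # hypothetical device ID
    "2024-01-01T00:00:00Z",
    "2024-01-02T00:00:00Z",
    "HOURLY",
    "c8y_Temperature.T",                 # hypothetical series name
)
print(url)
print(parse_qs(urlsplit(url).query)["aggregationType"])
```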
DataHub
The following steps illustrate how to ingest and transform data offloaded by Cumulocity IoT DataHub.
- Click the add icon and select Import from DataHub.
- Input the query, and click the submit icon.
- Provide the resource name with which you want to save the pulled data, and click Submit.
Click Tasks in the navigator and click the corresponding task name, to display the status of the Cumulocity IoT DataHub data pull in the Task History section at the center.
Once the task has reached the status COMPLETED, the data will be stored in the Data folder of the respective project in Machine Learning Workbench (MLW).
Once the data is ingested from Cumulocity IoT or DataHub, the corresponding CSV file is available in the Data folder of the respective project in Machine Learning Workbench (MLW). You can view the metadata for the newly created CSV file by clicking on the respective file name.
Data from the CSV file can be previewed by clicking the preview icon at the right of the top menu bar.
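The same kind of preview can also be produced in code, for example from a notebook. The following is a minimal sketch using only the Python standard library. The column names and values are made up, and an in-memory string stands in for a CSV file from the Data folder.

```python
import csv
import io
from itertools import islice


def preview_csv(handle, rows=5):
    """Return the header row and the first `rows` data rows of a CSV stream."""
    reader = csv.reader(handle)
    header = next(reader, [])
    return header, list(islice(reader, rows))


# Hypothetical content; a real file would be opened with open(path, newline="").
sample = io.StringIO(
    "time,temperature\n"
    "2024-01-01T00:00:00Z,21.5\n"
    "2024-01-01T01:00:00Z,21.9\n"
)
header, first_rows = preview_csv(sample, rows=1)
print(header)
print(first_rows)
```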
AWS S3
The following steps illustrate how to ingest and transform data available in AWS S3 buckets.
- Click Settings in the navigator, then click the AWS-S3 tab to register the S3 credentials with Machine Learning Workbench.
- Click Projects in the navigator, then select the project. Click the add resources icon and select Import from AWS S3.
- Select the bucket from which you want to pull the data and click the save icon to save the ZIP file in the project’s resources.
Click Tasks in the navigator and click the corresponding task name, to display the status of the AWS S3 data pull in the Task History section at the center.
Once the task has reached the status COMPLETED, the data is stored in the Data/Code folder of the respective project in Machine Learning Workbench (MLW).
Automated ML
Machine Learning Workbench (MLW) provides an Automated Machine Learning (AutoML) feature which enables you to build your machine learning models for classification, regression and anomaly detection with ease by performing an exhaustive grid search in hyper-parameter space to generate the best model for your dataset.
AutoML
The following steps illustrate how to train a machine learning model using AutoML.
- Click the upload icon to upload a tabular dataset to train a machine learning model on that data.
- Select the data resource in the Data folder, and click the add icon at the right of the top menu bar to proceed with training the AutoML model on that data.
- Select the Problem Type (Regression or Classification) and select the Target Variable at the right. Next, select the imputation methods and data transformation steps for the respective column and click Build to proceed.
- In the Training Parameter section at the right, select the training parameters which include model evaluation criteria (Scoring), training iterations (Generation) and population size for each generation (Population Size) and click the submit icon.
This will create a new task in the Tasks section.
Click Tasks in the navigator and click the corresponding task name, to display the status of the model training in the Task History section at the center.
Once the task is COMPLETED, all the trained models are listed along with the model evaluation score in descending order.
The hyper-parameters for each model can be viewed by clicking on the corresponding model name.
After the training is complete, the best model selected by the evaluation criteria will be saved in the Model folder of the respective Project in PMML format.
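To make the idea of searching a hyper-parameter space and ranking the candidates by score more concrete, the following sketch runs an exhaustive grid search over a toy parameter grid. It is an illustration only and not MLW's AutoML implementation. The parameter names and the scoring function are invented stand-ins for real model training and evaluation.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every combination in `param_grid` and return (score, params) pairs, best first."""
    names = list(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((score_fn(params), params))
    # Sort by score only, in descending order, like the model list in the task view.
    results.sort(key=lambda item: item[0], reverse=True)
    return results


def toy_score(params):
    """Stand-in for an evaluation score; highest at max_depth=4, learning_rate=0.1."""
    return 1.0 - abs(params["max_depth"] - 4) * 0.1 - abs(params["learning_rate"] - 0.1)


grid = {"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1, 0.5]}
ranking = grid_search(grid, toy_score)
best_score, best_params = ranking[0]
print(best_params)
```

In a real search, the scoring function would train a model with the given parameters and return the selected evaluation metric.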
Model deployment and predictions
Once the model is available in the Model folder, it can be deployed on Machine Learning Engine (MLE) for predictions.
Select the model from the Model folder and click the cloud icon (“Deploy”) at the right of the top menu bar to deploy the selected model on Machine Learning Engine (MLE).
Once the model is successfully deployed, the cloud icon will change to “Deployed”.
To predict data using a deployed model, select the data set from the Data folder and click the predict icon.
Data can be sent to a PMML model, an ONNX model, or an ONNX pipeline. For this example, we will use a PMML model deployed on Machine Learning Engine (MLE). Select the PMML option under the predict icon.
This will list all the PMML models deployed on Machine Learning Engine (MLE). Select the PMML model for prediction and click the submit icon.
The predicted results will be stored in the Data folder. For PMML models, the format of the input data will determine the format of predictions, i.e. output data. In our example, the input data was in CSV format. The output data will also be in CSV format.
Select the output data from the Data folder and click the download icon at the right of the top menu bar to download the output data to the local machine.
Select the output data from the Data folder and click the preview icon to preview the output data.
Neural Network (NN) Designer
Machine Learning Workbench (MLW) provides an intuitive drag-and-drop designer that allows you to construct, edit, train and analyze deep neural networks. Built on the solid foundation of TensorFlow and Keras, the visual approach provides greater insight and clarity into the architecture of neural networks, enabling the creation of state-of-the-art deep learning models without writing a single line of code.
There are two approaches to training deep neural networks using Neural Network (NN) Designer. You can either start with a pre-trained model from a similar domain and use its architecture and weights as a starting point (transfer learning) or you can start from a blank slate and design a custom network from scratch. We will look at both these approaches in detail.
Transfer learning
To begin the model training with transfer learning, you need to create a new neural network architecture file from an existing architecture.
The following steps illustrate how to train a deep neural network model using transfer learning.
Creating a new transfer learning architecture file
- To create a new architecture file, click the add icon and select Add New Resource.
- In the Add New Resource dialog, select “NN Designer” as Resource Type and “MobileNet” as Architecture, enter a Resource Name and click Submit.
This will create a new architecture file with the extension .architecture in the NN Designer folder of the project.
Editing the architecture file
Select the architecture file in the NN Designer folder and click the edit icon to edit the architecture.
This will open the MobileNet architecture in the editor where you can add new layers or remove existing layers.
With the pre-trained MobileNet model represented by the architecture shown in the editor, you can initiate transfer learning. To get started, remove the last two layers: Reshape and Activation.
Next, drag and drop a Flatten layer and a Dense layer (Activation Function: softmax, Units: 2), set their properties and connect them to the network.
Click the save icon at the right of the top menu bar to save the architecture.
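The edit described above amounts to replacing the head of the network. The following sketch pictures it as a list operation in plain Python. The layer dictionaries are an illustration only: they are not the format of the .architecture file, and the single Conv2D entry is a placeholder for all earlier MobileNet layers.

```python
def replace_head(layers, new_head, drop=2):
    """Return a copy of `layers` with the last `drop` entries replaced by `new_head`."""
    if drop > len(layers):
        raise ValueError("cannot drop more layers than the network has")
    return layers[:len(layers) - drop] + list(new_head)


# Placeholder for the end of the pre-trained network.
mobilenet_tail = [
    {"type": "Conv2D"},
    {"type": "Reshape"},
    {"type": "Activation", "activation": "softmax"},
]

# New head for a two-class problem, as described in the steps above.
new_head = [
    {"type": "Flatten"},
    {"type": "Dense", "units": 2, "activation": "softmax"},
]

edited = replace_head(mobilenet_tail, new_head)
print([layer["type"] for layer in edited])
```

The pre-trained layers stay in place, while the two new layers adapt the output to the classes in the new dataset.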
Specifying the training process
- Click the cogwheel icon to train a model on the updated architecture.
- In the Training Parameters section at the right, select the appropriate data under Data.
- Specify the Problem Type which can either be “classification” or “regression”.
- If the data needs pre-processing, specify the Pre Processing Script.
- The Recurrence parameter defines whether the training task needs to be executed one time or periodically. For this example, the training task will be one time.
- Provide values for Epoch, Learning Rate, Loss, Metrics, Optimizer. Other parameters can be left as default.
- Once the training parameters are updated, click the submit icon to trigger the training process.
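The training parameters listed above can be thought of as one configuration record. The following sketch collects them in a Python dataclass with a few basic checks. The field names, the recurrence labels, and the example values (file name, loss, optimizer, metric) are assumptions for illustration and do not describe how MLW stores or validates these settings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TrainingParameters:
    """Hypothetical record of the values entered in the Training Parameters section."""
    data: str
    problem_type: str
    epoch: int
    learning_rate: float
    loss: str
    optimizer: str
    metrics: List[str] = field(default_factory=list)
    pre_processing_script: Optional[str] = None
    recurrence: str = "one time"

    def validate(self) -> None:
        """Raise ValueError if a value is outside the ranges assumed by this sketch."""
        if self.problem_type not in ("classification", "regression"):
            raise ValueError("problem_type must be 'classification' or 'regression'")
        if self.epoch < 1:
            raise ValueError("epoch must be at least 1")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        if self.recurrence not in ("one time", "repeat"):
            raise ValueError("recurrence must be 'one time' or 'repeat'")


params = TrainingParameters(
    data="images.zip",                    # hypothetical data resource
    problem_type="classification",
    epoch=10,
    learning_rate=0.001,
    loss="categorical_crossentropy",      # example value
    optimizer="adam",                     # example value
    metrics=["accuracy"],                 # example value
)
params.validate()
print(params.recurrence)
```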
Click Tasks in the navigator and click the corresponding task name, to display the status of the model training in the Task History section at the center.
Once the task is COMPLETED, the trained model will be saved in the Model folder of the respective Project in ONNX format.
Custom architecture
To begin the model training with a custom architecture, you need to create a new neural network architecture file from scratch.
The following steps illustrate how to train a deep neural network model using custom architecture.
Creating a new custom architecture file
- To create a new architecture file, click the add icon and select Add New Resource.
- In the Add New Resource dialog, select “NN Designer” as Resource Type and “None” as Architecture, enter a Resource Name and click Submit.
This will create an empty architecture file with the extension .architecture in the NN Designer folder of the project.
Editing the architecture file
Select the architecture file and click the edit icon to edit the architecture.
This will open a blank architecture in the editor where you can add new layers to build a custom neural network architecture.
The remaining steps to save the custom architecture and train the neural network model are the same as for transfer learning.
Jupyter Notebook
Machine Learning Workbench (MLW) provides an integrated Jupyter Notebook environment that enables you to write your code, perform exploratory data analysis, visualize your data, and build your models. The notebook environment is an intuitive in-browser editor that can be used to combine Markdown text and executable Python source code.
Creating a new notebook
- To create a new notebook, click the add icon and select Add New Resource.
- In the Add New Resource dialog, select “Notebook” as Resource Type and provide the Resource Name which identifies the notebook and click Submit.
This will create a new notebook file with the extension .ipynb in the Code folder of the project.
Editing and executing a notebook
To edit a notebook, select the notebook file in the Code folder and click the edit icon at the top right.
This will open the notebook in an editor which you can use to write and execute Python and Markdown code interactively.
Snippets of code/comments can be combined in individual cells. Output generated from a cell is displayed below the cell.
Jupyter Enterprise Gateway and MLW library
Jupyter Enterprise Gateway (JEG) is now integrated with MLW. To use JEG, you must first set the JEG credentials.
Click Settings in the navigator, switch to the JEG tab and enter the JEG credentials.
Once the JEG credentials are set, all previously active Notebook instances will be killed. While opening any notebook, the desired JEG kernel must be selected from the dropdown list.
A new library, mlw-sdk, has been developed to help move files between MLW and JEG. Sample code snippets for this library are readily available.
To open the code snippets panel, click the code snippets icon at the end of the Jupyter Notebook toolbar.
Select the desired code snippet from this panel and click the insert option to populate the code in Jupyter Notebook cells.
Assets grouping and deletion
Jupyter Notebook instances are grouped separately for each project. To see the Jupyter Notebook instances associated with a project, click Assets in the navigator, and then select the project to see all its active notebook instances.
To kill instance(s), click Assets in the navigator and select a project to display the associated instances. Select the instance(s), and click the kill instance icon at the top right.
Task scheduler
Machine Learning Workbench (MLW) provides a flexible task scheduler which can be used for orchestrating a wide variety of activities including periodic data pulls from data sources or retraining your machine learning models at regular intervals.
To showcase the task scheduler, we will create a simple Python script which will be scheduled for periodic execution. This can easily be extended for more involved activities like data pull and model retraining. First, we create a new resource which will contain the Python source code.
Creating a new Python script
- To create a new Python file, click the add icon and select Add New Resource.
- In the Add New Resource dialog, select “Python Script” as Resource Type and provide the Resource Name which identifies the source file and click Submit.
This will create a new Python file with the extension .py in the Code folder of the project.
Scheduling a Python script
From the Code folder, open the Python file for editing and write any Python code. Click Execute and select “Repeat” as Recurrence. Provide the execution interval and click the submit icon. This will execute the Python script periodically at the specified interval.
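A script for a first scheduling test can be very small. The following sketch prints a timestamped message on each run, which makes it easy to confirm that the scheduler executes the script at the expected interval. The function name and message text are arbitrary choices for this example.

```python
from datetime import datetime, timezone


def heartbeat(now=None):
    """Return a message with the time of the current run (UTC)."""
    now = now or datetime.now(timezone.utc)
    return f"scheduled run at {now.isoformat()}"


if __name__ == "__main__":
    # Each scheduled execution prints one line with its own timestamp.
    print(heartbeat())
```

Once this works, the body of the script can be replaced with more involved activities such as a data pull or model retraining.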
Training workflow
Model training is a complex process which often requires data ingestion/transformation via arbitrary scripts. Training workflows define a sequence of data preparation and model training/export activity that can be scheduled periodically.
Creating a new workflow
- To create a new workflow file, click the add icon and select Add New Resource.
- In the Add New Resource dialog, select “Workflow” as Resource Type and provide the Resource Name which identifies the workflow.
- Select the appropriate Model, Pre Processing Script, and Data which define the sequence of this workflow and click Submit.
This will create a new workflow file with the extension .wf in the Training Workflow folder of the respective project.
Click on a workflow file in the Training Workflow folder to view its metadata.
Executing a workflow
- To schedule the execution of a workflow, click the cogwheel icon.
- In the Workflow Execution section at the right, provide the parameters that will define the workflow execution including Task Name and Recurrence and click the submit icon.
This will create a new task in the Tasks section.
Click Tasks in the navigator and click the corresponding task name, to display the status of the workflow execution in the Task History section at the center.
Inference pipeline
ONNX models typically require a pre-processing step that converts raw input data into tensors and a post-processing step that converts tensors into output values. Inference pipelines define a sequence of a pre-processing step, an ONNX model, and a post-processing step. Machine Learning Workbench (MLW) can deploy inference pipelines to Machine Learning Engine.
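The three stages of such a pipeline can be sketched in plain Python. In the following example, nested lists stand in for tensors and a simple function stands in for the ONNX model. The function names, the scaling factor, and the labels are hypothetical, and the signatures that MLW expects for pre-processing and post-processing scripts are not shown here.

```python
import math


def pre_process(readings, scale=100.0):
    """Turn raw readings into a 1 x N nested list of floats (a stand-in for a tensor)."""
    return [[float(value) / scale for value in readings]]


def stand_in_model(tensor):
    """Placeholder for the ONNX model: returns two scores per input row."""
    return [[sum(row), 1.0 - sum(row)] for row in tensor]


def post_process(scores, labels):
    """Map each row of scores to the label with the highest softmax probability."""
    results = []
    for row in scores:
        exps = [math.exp(score - max(row)) for score in row]
        probs = [value / sum(exps) for value in exps]
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append({"label": labels[best], "probability": probs[best]})
    return results


def run_pipeline(readings, labels):
    """Chain the three stages in the order an inference pipeline defines."""
    return post_process(stand_in_model(pre_process(readings)), labels)


print(run_pipeline([30, 50], ["high", "low"]))
```

Keeping the conversion logic in separate pre-processing and post-processing steps means the model itself only ever sees and returns tensors.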
Creating a new pipeline
- To create a new pipeline file, click the add icon and select Add New Resource.
- In the Add New Resource dialog, select “Pipeline” as Resource Type and provide the Resource Name which identifies the pipeline.
- Select the appropriate Model, Pre-processing Script and Post-processing Script which define the sequence of this pipeline and click Submit.
This will create a new pipeline file with the extension .pipeline in the Inference Pipeline folder of the project.
Click on a pipeline file in the Inference Pipeline folder to view its metadata.
Deploying a pipeline
Click on a pipeline file in the Inference Pipeline folder and click the deploy icon to deploy the inference pipeline on Machine Learning Engine.