Integrating Cumulocity IoT DataHub with TrendMiner
TrendMiner provides process manufacturing companies with the analytical means to further optimize their production processes. Its self-service analytics approach allows you to conduct time-series industrial analytics, with the data being automatically visualized in displays and dashboards.
For that purpose, TrendMiner accesses the industrial data generated by these production processes, that is, time series of sensor, instrument, and asset data. TrendMiner analyzes these time series to identify trends and patterns and to derive actionable insights for solving production issues.
With the offloading and query capabilities of Cumulocity IoT DataHub, TrendMiner can also access and analyze the data being managed by the Cumulocity IoT platform. Key features of the integration between Cumulocity IoT DataHub and TrendMiner are:
- TrendMiner can leverage historical data of the Cumulocity IoT platform without adversely affecting the Operational Store of the platform. For that purpose, Cumulocity IoT DataHub offloads the data from the Operational Store to a data lake.
- TrendMiner offers a time-series visualization interface and operational monitoring, both of which rely on live data from the Cumulocity IoT platform. For that purpose, Cumulocity IoT DataHub provides a live view on recent data in the Operational Store of the platform.
- Cumulocity IoT DataHub unifies the data access layer so that TrendMiner can access historical as well as live data by querying a single view.
- Cumulocity IoT DataHub ensures that the layout of that view meets the query needs of TrendMiner, that is, the data is in a relational and flattened format, not in the document-based format of the Operational Store.
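To illustrate what "relational and flattened" means, the following Python sketch turns one nested measurement document into one flat row per series. The input follows the usual Cumulocity IoT measurement structure (a fragment such as `c8y_Climate` containing series objects with `value` and `unit`). The output column names `tagname`, `source_id`, `ts`, `value`, and `unit`, and the tag name format (source ID and path joined with dots), are hypothetical stand-ins, not the actual TrendMiner layout produced by Cumulocity IoT DataHub.

```python
from typing import Any

# Top-level keys of a measurement document that carry metadata, not values.
META_KEYS = {"id", "self", "source", "time", "type"}


def flatten_measurement(doc: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten one nested measurement document into one row per series.

    Hypothetical layout: the column names and the tag name format
    "<sourceId>.<fragment>.<series>" are illustrative assumptions.
    """
    source_id = doc["source"]["id"]
    rows = []
    for fragment, series_map in doc.items():
        # Skip metadata and anything that is not a fragment object.
        if fragment in META_KEYS or not isinstance(series_map, dict):
            continue
        for series, reading in series_map.items():
            # A series is an object holding at least a "value" entry.
            if not isinstance(reading, dict) or "value" not in reading:
                continue
            rows.append({
                "tagname": f"{source_id}.{fragment}.{series}",
                "source_id": source_id,
                "ts": doc["time"],
                "value": reading["value"],
                "unit": reading.get("unit"),
            })
    return rows


if __name__ == "__main__":
    measurement = {
        "id": "901",
        "source": {"id": "4711"},
        "time": "2024-05-01T10:00:00.000Z",
        "type": "c8y_Climate",
        "c8y_Climate": {
            "temperature": {"value": 21.5, "unit": "C"},
            "humidity": {"value": 48.0, "unit": "%"},
        },
    }
    for row in flatten_measurement(measurement):
        print(row)
```

One document with two series yields two rows, which is the shape a relational query tool can filter and aggregate directly.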
The following diagram illustrates the high-level concepts of the integration between Cumulocity IoT DataHub and TrendMiner.
Design of a TrendMiner offloading pipeline
To give TrendMiner access to Cumulocity IoT data, you only need to define an offloading pipeline that uses the TrendMiner data layout. Once the offloading pipeline is in place, Cumulocity IoT data is regularly extracted from the Operational Store, flattened, and exported into a data lake. In addition, Dremio is configured to access recent data from the Operational Store, using the same schema as for the historical data.
Dremio provides a new view that combines the historical data in the data lake with recent data from the Operational Store, effectively providing a unified view over hot data in the Operational Store and cold data in the data lake. Cumulocity IoT DataHub ensures that the combined data in that view is lossless and free of duplicates. This view is the single connection point that gives TrendMiner access to historical and live data of the Cumulocity IoT platform.
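How such a combined view can stay lossless and duplicate-free can be sketched in Python under a hypothetical watermark scheme: the data lake holds all rows up to the timestamp of the last completed offload, and only rows newer than that watermark are taken from the Operational Store. This section does not describe the actual mechanism used by Cumulocity IoT DataHub; the row format and the `watermark` parameter are assumptions for illustration.

```python
from datetime import datetime, timezone


def unified_view(lake_rows, store_rows, watermark):
    """Combine cold (data lake) and hot (Operational Store) rows.

    Assumes, hypothetically, that the data lake contains every row with
    ts <= watermark. Rows at or before the watermark therefore come only
    from the lake, and rows after it only from the Operational Store, so
    overlapping rows are never counted twice and no time range is skipped.
    """
    cold = [r for r in lake_rows if r["ts"] <= watermark]
    hot = [r for r in store_rows if r["ts"] > watermark]
    return sorted(cold + hot, key=lambda r: r["ts"])


def ts(hour):
    """Helper for the example: a UTC timestamp on a fixed day."""
    return datetime(2024, 5, 1, hour, tzinfo=timezone.utc)


if __name__ == "__main__":
    lake = [{"ts": ts(8), "value": 1.0}, {"ts": ts(9), "value": 2.0}]
    # The Operational Store still holds the already offloaded 09:00 row.
    store = [{"ts": ts(9), "value": 2.0}, {"ts": ts(10), "value": 3.0}]
    for row in unified_view(lake, store, watermark=ts(9)):
        print(row["ts"].isoformat(), row["value"])
```

The design point is that each time range is served by exactly one source, so the 09:00 row that exists in both stores appears only once.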
Follow the instructions in Configuring offloading jobs to configure an offloading pipeline for the measurements collection, so that TrendMiner can access the data.
Accessing Cumulocity IoT data in TrendMiner
Once you have defined and activated a TrendMiner offloading pipeline, the initial offload must be completed before you can start querying the data in TrendMiner.
Cumulocity IoT DataHub provides the following views within Dremio, based on tables having the same name and the same schema:
- c8y_cdh_tm_measurements is the view over the table in the data lake that stores the historical data offloaded from the Operational Store so far.
- c8y_cdh_tm_measurements_live is the live view combining c8y_cdh_tm_measurements with recent data from the Operational Store. Both views have the same schema.
- c8y_cdh_tm_tags is the view over the table in the data lake that stores the tag names and the source IDs. The source ID identifies the device managed in the Cumulocity IoT platform. The tag name combines the source ID with the path within the measurement documents to the values that make up the time series. In TrendMiner, you use the tag names to select the time series you want to investigate. With this view, you can map each time series to its device in the platform.
For details on the schema of these views and tables, see the section Offloading Cumulocity IoT base collections.
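The following Python sketch shows how a query can use c8y_cdh_tm_tags to map a time series back to its device while reading values from c8y_cdh_tm_measurements_live. It uses an in-memory SQLite database as a stand-in for Dremio so that it runs on its own. The view names come from the list above, but the column names `tagname`, `source_id`, `ts`, and `value` are hypothetical and must be checked against the actual schema.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the Dremio views (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE c8y_cdh_tm_tags (tagname TEXT, source_id TEXT);
        CREATE TABLE c8y_cdh_tm_measurements_live (tagname TEXT, ts TEXT, value REAL);
        INSERT INTO c8y_cdh_tm_tags VALUES ('4711.c8y_Climate.temperature', '4711');
        INSERT INTO c8y_cdh_tm_measurements_live VALUES
            ('4711.c8y_Climate.temperature', '2024-05-01T10:00:00Z', 21.5),
            ('4711.c8y_Climate.temperature', '2024-05-01T09:00:00Z', 21.0);
    """)
    return conn


def series_with_device(conn: sqlite3.Connection, tagname: str):
    """Return (source_id, ts, value) rows for one tag name, oldest first."""
    sql = """
        SELECT t.source_id, m.ts, m.value
        FROM c8y_cdh_tm_measurements_live AS m
        JOIN c8y_cdh_tm_tags AS t ON t.tagname = m.tagname
        WHERE m.tagname = ?
        ORDER BY m.ts
    """
    return conn.execute(sql, (tagname,)).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    for row in series_with_device(conn, "4711.c8y_Climate.temperature"):
        print(row)
```

Against Dremio, the same SQL would be sent over the ODBC connection that TrendMiner also uses; only the connection object changes.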
In TrendMiner, you connect to these Dremio views using ODBC. To look up the ODBC connection settings, navigate to the Home page of the Cumulocity IoT DataHub UI and click the ODBC icon.
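If you want to check the connection outside TrendMiner, for example from Python, the values from the ODBC dialog are typically combined into a DSN-less connection string of `key=value` pairs separated by semicolons. The sketch below assembles such a string. The key names and values in the example are placeholders, not the actual Dremio driver keywords; copy the real keys and values from the ODBC connection settings shown in the Cumulocity IoT DataHub UI.

```python
def odbc_connection_string(settings: dict[str, str]) -> str:
    """Join ODBC settings into a "key=value;" connection string.

    Values containing ';', '{' or '}', or leading/trailing spaces are
    wrapped in braces, with any '}' doubled, following the common ODBC
    escaping convention.
    """
    parts = []
    for key, value in settings.items():
        if any(ch in value for ch in ";{}") or value != value.strip():
            value = "{" + value.replace("}", "}}") + "}"
        parts.append(f"{key}={value}")
    return ";".join(parts) + ";"


if __name__ == "__main__":
    # Placeholder keys and values; use those shown in the DataHub ODBC dialog.
    conn_str = odbc_connection_string({
        "Driver": "<driver name>",
        "Host": "<host>",
        "Port": "<port>",
        "UID": "<user>",
        "PWD": "<password>",
    })
    print(conn_str)
```

With the `pyodbc` package installed, `pyodbc.connect(conn_str)` would open the connection; that call is left out so the sketch runs without a driver. Escaping matters mainly for passwords, which may contain semicolons.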
For more details on the steps required in TrendMiner, see the TrendMiner documentation on the connector configuration.
Integrating Cumulocity IoT DataHub with Machine Learning Workbench
Machine Learning Workbench (MLW) is designed to facilitate the work of data scientists and machine learning practitioners by streamlining model training and evaluation activities. MLW provides a no-code UI as well as a Jupyter Notebook-based setup for the various machine learning tasks.
Machine learning heavily relies on suitable datasets for training and evaluating models. For the specific case of IoT data, MLW offers tooling to ingest and process data from devices connected to the Cumulocity IoT platform. In particular, MLW can process the data which Cumulocity IoT DataHub has offloaded into a data lake. For that purpose, MLW provides a connector for Cumulocity IoT DataHub, which fetches the data from the data lake using a SQL query. The imported data is then stored in CSV format in MLW. Once the data is in place, you can start training or evaluating corresponding machine learning models.
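The data pull described above, that is, running an SQL query and keeping the result as CSV, can be sketched in Python as follows. This is not the MLW connector itself: it uses an in-memory SQLite table as a stand-in for the data lake accessed through Dremio, and the table and column names are hypothetical.

```python
import csv
import io
import sqlite3


def query_to_csv(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Run a query and return the result, including a header row, as CSV text."""
    cur = conn.execute(sql, params)
    header = [col[0] for col in cur.description]  # column names of the result
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(cur)
    return buf.getvalue()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurements (source_id TEXT, ts TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO measurements VALUES (?, ?, ?)",
        [("4711", "2024-05-01T09:00:00Z", 21.0), ("4711", "2024-05-01T10:00:00Z", 21.5)],
    )
    print(query_to_csv(conn, "SELECT source_id, ts, value FROM measurements ORDER BY ts"))
```

Keeping the header row from the cursor description means the CSV file stays self-describing when it is later loaded for training.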
For detailed instructions on how to leverage data offloaded by Cumulocity IoT DataHub in MLW, see the section Data pull of the MLW documentation.
Integrating Cumulocity IoT DataHub with Microsoft Power BI
Microsoft Power BI is a business intelligence tool that allows you to create and use interactive reports for data from various sources, including your IoT data. If your devices are connected to the Cumulocity IoT platform, you can use Cumulocity IoT DataHub to offload their data into a data lake of your choice and then create a Microsoft Power BI report based on the data in the data lake. Cumulocity IoT DataHub allows you to access and work with these reports from within its web frontend.
Before setting up the connection to Microsoft Power BI in Cumulocity IoT DataHub, complete the following steps.
Accessing data lakes in Microsoft Power BI reports
Cumulocity IoT DataHub leverages the native interaction between Microsoft Power BI and Dremio. Microsoft Power BI reports can consume data from data lakes using Dremio as the query and data access layer. When creating a new report in Microsoft Power BI Desktop, you can select Dremio as a database and establish a connection to the Dremio cluster. This connection gives you access to the data lakes connected to Dremio.
A report is typically published so that it is also available to other users. A published report currently requires a deployed Microsoft Power BI gateway, which establishes the connection between Microsoft Power BI and Dremio.
Configuring access to Microsoft Power BI reports
To make reports available in its web frontend, Cumulocity IoT DataHub embeds Microsoft Power BI content. Users do not need to sign in to Microsoft Power BI or hold a Microsoft Power BI license to access the reports. Authentication uses an Azure Active Directory service principal object with an application secret.
The following configuration steps are required; the corresponding Microsoft documentation describes them in detail.
As a prerequisite, you need an Azure Active Directory tenant. If you do not have one, follow the instructions in the Microsoft documentation.
Next, register an Azure Active Directory application, which serves as the service principal. Configure the service principal application to access the REST APIs of Microsoft Power BI, following the instructions on the Microsoft Power BI website:
- Select Embed for your customers.
- Sign in to Microsoft Power BI.
- Register an application with respective permissions.
- Skip creating a workspace and importing content.
- Grant permissions to the service principal.
Alternatively, you can create a service principal application following the section Creating an Azure AD app in the Microsoft Azure portal in the Microsoft documentation.
Additionally, you must add a client secret for the service principal application. You can do that via the Azure portal. Search for App registrations, select your application by its name under All applications, and click the link next to the Client credentials entry on the Overview page of the application.
Next, you can define a workspace to organize your reports. Once you add the service principal application to the workspace as a member or admin, the service principal can access the reports in that workspace. To grant the permissions, go to the Microsoft Power BI website and complete the following steps:
- Sign in to Microsoft Power BI.
- Click Workspaces.
- Select the context menu of the workspace to share with the service principal.
- Select Workspace access.
- Enter the name of your recently created service principal application and grant the Member or Admin permission.
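The Microsoft Power BI REST API also offers a Groups - Add Group User operation (`POST https://api.powerbi.com/v1.0/myorg/groups/{groupId}/users`) that grants the same workspace access programmatically. The Python sketch below only builds that request and does not send it. The workspace ID, the service principal object ID (as opposed to the client ID), and the caller's access token are placeholders; the caller needs sufficient rights on the workspace, typically admin. Verify the body fields against the Power BI REST API reference before use.

```python
import json
import urllib.request

API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def build_add_workspace_user_request(
    workspace_id: str, principal_object_id: str, access_token: str, right: str = "Member"
) -> urllib.request.Request:
    """Build (but do not send) a request that adds a service principal to a workspace."""
    if right not in ("Member", "Admin"):
        # The integration requires Member or Admin access (see the steps above).
        raise ValueError("right must be 'Member' or 'Admin'")
    body = json.dumps({
        "identifier": principal_object_id,
        "groupUserAccessRight": right,
        "principalType": "App",
    }).encode("utf-8")
    return urllib.request.Request(
        f"{API_ROOT}/groups/{workspace_id}/users",
        data=body,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Restricting `right` to Member or Admin mirrors the requirement that only workspaces granting these permissions show up in Cumulocity IoT DataHub.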
Only workspaces that grant access to the service principal application can be browsed from within Cumulocity IoT DataHub. Once the workspace is available, you can publish reports to it and access these reports in Cumulocity IoT DataHub.
Setting up the connection in Cumulocity IoT DataHub
In the navigator, select Settings and then Microsoft Power BI to define the connection settings.
|Setting|Description|
|---|---|
|Azure Active Directory tenant ID|The ID of the Azure Active Directory tenant. Within the tenant, an Azure Active Directory application must exist with a service principal that is allowed to access corresponding resources of Microsoft Power BI.|
|Client ID|The ID of the Azure Active Directory application which has permissions to call the REST APIs of Microsoft Power BI.|
|Client secret|The client secret, which is configured for the Azure Active Directory application.|
Once all settings are completed, click Save on the action bar to save the settings and establish the connection.
If you want to delete the settings, click Delete on the action bar. You cannot access reports afterwards.
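The three settings correspond to the inputs of the OAuth 2.0 client credentials flow of Azure Active Directory, in which an application exchanges its client ID and client secret for an access token scoped to the Microsoft Power BI API. The Python sketch below only builds that token request with the standard library and does not send it. It illustrates the flow and is not the Cumulocity IoT DataHub implementation; the tenant ID, client ID, and secret are placeholders.

```python
import urllib.parse
import urllib.request

TOKEN_URL = "https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
POWER_BI_SCOPE = "https://analysis.windows.net/powerbi/api/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request for Power BI."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": POWER_BI_SCOPE,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL.format(tenant_id=urllib.parse.quote(tenant_id, safe="")),
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(req)` returns a JSON document whose `access_token` field serves as the bearer token for Microsoft Power BI REST calls. Because the secret travels in this request, it should be stored only in the connection settings, not in client-side code.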
Working with reports
Once the settings are defined, you can access and work with the reports.
In the navigator, select Microsoft Power BI. The menu entry is only shown if the connection settings are defined.
On the Reports page, click Add report in the action bar. A dialog with two dropdown boxes opens. The first dropdown box lists all workspaces that grant member or admin access to the service principal. Select the workspace you are interested in. The second dropdown box lists all reports of the selected workspace. Select a report from it.
Click Select to open the report or Cancel to close the dialog without selecting a report.
The selected report opens, and you can interact with it. You can open multiple reports; for each opened report, a tab entry appears in the action bar. To close the currently selected report, click Remove report in the action bar.
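Conceptually, the two dropdown boxes of the Add report dialog can be populated from the Microsoft Power BI REST API on behalf of the service principal: `GET https://api.powerbi.com/v1.0/myorg/groups` lists the workspaces it can access, and `GET https://api.powerbi.com/v1.0/myorg/groups/{groupId}/reports` lists the reports of one workspace. Both responses carry their items in a `value` array. The Python sketch below builds these requests and turns a response into `(id, name)` pairs; it is an illustration under these assumptions, not the Cumulocity IoT DataHub implementation, and the access token and sample data are placeholders.

```python
import json
import urllib.request

API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def build_get(path: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request against the Power BI REST API."""
    return urllib.request.Request(
        f"{API_ROOT}{path}",
        headers={"Authorization": f"Bearer {access_token}"},
    )


def workspaces_request(access_token: str) -> urllib.request.Request:
    """Request for the first dropdown: workspaces visible to the service principal."""
    return build_get("/groups", access_token)


def reports_request(workspace_id: str, access_token: str) -> urllib.request.Request:
    """Request for the second dropdown: reports of the selected workspace."""
    return build_get(f"/groups/{workspace_id}/reports", access_token)


def dropdown_options(response_body: str) -> list[tuple[str, str]]:
    """Extract (id, name) pairs from a Power BI list response ({"value": [...]})."""
    return [(item["id"], item["name"]) for item in json.loads(response_body)["value"]]
```

Because the workspace list is fetched with the service principal's token, only workspaces that grant it access can appear in the first dropdown box.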