Querying offloaded Cumulocity data

Cumulocity IoT DataHub offers an SQL interface so that you can efficiently query offloaded (device) data and leverage the results in your own applications. A prerequisite for running SQL queries over device data is that you have configured and executed offloading pipelines that replicate and transform data from Cumulocity’s Operational Store to the data lake.

Overview

As described in section Cumulocity IoT DataHub at a glance, Cumulocity IoT DataHub manages offloading pipelines which periodically extract data from Cumulocity’s Operational Store, transform the data into a relational format and finally store it in a data lake. Instead of querying the Operational Store, you run your queries against the data lake. The distributed SQL engine Dremio provides the query interfaces to access the data lake.

Different standard interfaces exist for that purpose, namely JDBC, ODBC, and REST. In order to work with one of those interfaces, select Home in the navigation bar. Under Quick links you will find starting points for the different interfaces.

Access to data lake contents

You require an account in both DataHub Console and Dremio; both accounts share the same username and password. You need these credentials to authenticate your requests when running queries against Dremio.

When you have established a connection to Dremio, you can run SQL queries against your tables in the data lake (to which new data is appended whenever the offloading pipeline has successfully run). The source you refer to in the query is defined by your account name and the target table you have specified in the offloading configuration. The identifier to be used as the source in a SQL query is:

YourAccountNameDataLake.YourAccountName.TargetTable

For example, if your account name is JohnSmith and you have defined an offloading configuration to write the alarms collection to the target table JohnsAlarms, then an example query would be:

SELECT * FROM JohnSmithDataLake.JohnSmith.JohnsAlarms;

Info: The offloading pipeline must have been executed at least once, with corresponding data offloaded to the data lake, before you can run a query.

Connecting via JDBC

If you have a Java client, you can use JDBC to run SQL queries against the data lake. You have to download the Dremio JDBC driver. You can obtain the JDBC connection string from DataHub Console by clicking the JDBC icon in the Quick links section of the Home page. When setting up your JDBC client, use the credentials of your Dremio/DataHub Console account as username and password.

For additional JDBC settings of Dremio see also the associated Dremio documentation.
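
For illustration, here is a minimal sketch of a Java client connecting through the Dremio JDBC driver. The host, port, credentials, and table name are placeholders for this sketch, as is the connection string format jdbc:dremio:direct=<host>:<port>; use the actual connection string shown under Quick links for your environment.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DataHubJdbcExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string; copy the actual one from the JDBC
        // quick link on the Home page. With the Dremio JDBC driver JAR on
        // the classpath, the driver registers itself automatically.
        String url = "jdbc:dremio:direct=<dremio-host>:31010";
        try (Connection connection = DriverManager.getConnection(url, "JohnSmith", "<password>");
             Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery(
                     "SELECT * FROM JohnSmithDataLake.JohnSmith.JohnsAlarms")) {
            // Iterate over the result set and print one column per row.
            while (resultSet.next()) {
                System.out.println(resultSet.getString("text"));
            }
        }
    }
}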

Connecting via ODBC

If you want to use an ODBC client to run SQL queries against the data lake, you have to configure the platform-specific driver, following the associated Dremio installation instructions. To obtain the ODBC connection string, click the ODBC icon in the Quick links section of the Home page. When setting up your ODBC client, use the credentials of your Dremio/DataHub Console account as username and password.

Connecting via Dremio REST API

Dremio offers a SQL REST API which you can use to run SQL queries against tables in the data lake. You need to authenticate against Dremio in order to use the API. Note that this API might change at any time; Software AG does not provide any guarantees for it.
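
As a rough illustration, the following Java sketch (using the HttpClient introduced in Java 11) first logs in to Dremio and then submits a query via the SQL API. The Dremio URL, credentials, and table name are placeholders, and the sketch assumes Dremio's REST authentication scheme, in which a POST to /apiv2/login returns a token that is passed as Authorization: _dremio<token> on subsequent requests.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DremioRestExample {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String base = "http://<dremio-host>:9047";  // placeholder Dremio URL

        // Log in to Dremio; the response contains a token that must be sent
        // as "Authorization: _dremio<token>" on subsequent requests.
        HttpRequest login = HttpRequest.newBuilder()
                .uri(URI.create(base + "/apiv2/login"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"userName\": \"JohnSmith\", \"password\": \"<password>\"}"))
                .build();
        String loginBody = client.send(login, HttpResponse.BodyHandlers.ofString()).body();
        // Crude token extraction to keep the sketch dependency-free; a real
        // client would parse the response with a JSON library.
        String token = loginBody.replaceAll("(?s).*\"token\"\\s*:\\s*\"([^\"]+)\".*", "$1");

        // Submit a SQL query via Dremio's SQL API.
        HttpRequest query = HttpRequest.newBuilder()
                .uri(URI.create(base + "/api/v3/sql"))
                .header("Content-Type", "application/json")
                .header("Authorization", "_dremio" + token)
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"sql\": \"SELECT * FROM JohnSmithDataLake.JohnSmith.JohnsAlarms\"}"))
                .build();
        System.out.println(client.send(query, HttpResponse.BodyHandlers.ofString()).body());
    }
}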

Connecting via DataHub Console REST API

The DataHub Console server can also handle REST requests for Dremio query processing. It acts as a proxy for these requests and forwards them to Dremio for evaluation. As a consequence, this API might change at any time as well.

In contrast to calling the Dremio REST API directly, authentication in this case is done against DataHub Console. Future releases are planned to use single sign-on, so that no separate authorization will be required.

Request URLs

The URL paths of proxied requests consist of the corresponding Dremio REST API path, prefixed by “/dremio”. Dremio’s SQL and Job APIs are supported. Headers and request body are as specified in the corresponding Dremio REST API documentation.

Note that you must not provide the authorization header for Dremio when using the DataHub Console REST API. Instead you have to provide the authorization header for DataHub Console.

Submitting a query
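
A query is submitted via a POST request to /dremio/api/v3/sql, with the SQL statement provided in the JSON request body (see the example below).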

Retrieving job status
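
The status of the corresponding job is retrieved via a GET request to /dremio/api/v3/job/{jobid}, with {jobid} being the job id returned upon query submission.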

Retrieving job results
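
Once the job has completed, its results are retrieved via a GET request to /dremio/api/v3/job/{jobid}/results.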

Cancelling running jobs
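
A running job can be cancelled via a POST request to /dremio/api/v3/job/{jobid}/cancel, following Dremio's job API.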

Example

This example submits a Dremio SQL query that fetches the five most recent alarms which have already been offloaded, waits for the job to complete, and retrieves the result.

The SQL query, assuming tenant name “Smith”, is:

SELECT creationTime, severity, text
FROM SmithDataLake.Smith.alarms
ORDER BY creationTime DESC
LIMIT 5

This request submits the query, with the URL specific to your organization and the authorization header for DataHub Console omitted for brevity:

POST /dremio/api/v3/sql HTTP/1.1
Host: mytenant.cumulocity.com:9090
Content-Type: application/json

{
    "sql": "SELECT creationTime, severity, text\nFROM SmithDataLake.Smith.alarms\nORDER BY creationTime DESC\nLIMIT 5"
}

The response provides the job id:

{
    "id": "22feee74-875a-561c-5508-04114bdda000"
}

The following request checks for job completion:

GET /dremio/api/v3/job/22feee74-875a-561c-5508-04114bdda000 HTTP/1.1
Host: mytenant.cumulocity.com:9090

The response shows the job status:

{
    "jobState": "COMPLETED",
    "rowCount": 5,
    "errorMessage": "",
    "startedAt": "2019-06-12T14:51:54.158Z",
    "endedAt": "2019-06-12T14:51:54.563Z",
    "queryType": "REST",
    "queueName": "Low Cost User Queries",
    "queueId": "f62caa92-c36b-4d5f-b20c-ff0ce3feff1c",
    "resourceSchedulingStartedAt": "2019-06-12T14:51:54.211Z",
    "resourceSchedulingEndedAt": "2019-06-12T14:51:54.240Z",
    "cancellationReason": ""
}

A job state of RUNNING is returned while the query is still being executed, and eventually it changes to COMPLETED or FAILED. Once the job has been completed, its results are returned by the following request:

GET /dremio/api/v3/job/22feee74-875a-561c-5508-04114bdda000/results HTTP/1.1
Host: mytenant.cumulocity.com:9090

The response might look as follows (only the first of the five result rows is shown):

{
    "rowCount": 5,
    "schema": [
        {
            "name": "creationTime",
            "type": {
                "name": "TIMESTAMP"
            }
        },
        {
            "name": "severity",
            "type": {
                "name": "VARCHAR"
            }
        },
        {
            "name": "text",
            "type": {
                "name": "VARCHAR"
            }
        }
    ],
    "rows": [
        {
            "creationTime": "2019-06-07 13:58:38.197",
            "severity": "MINOR",
            "text": "Something unfortunate went wrong (1)."
        }
    ]
}

Connecting other clients

Dremio offers support for connecting a variety of clients, including reporting tools like Power BI and common analytics languages like Python. The Dremio documentation discusses how to connect these clients to Dremio and leverage its query capabilities.