Running Cumulocity DataHub on Cumulocity Edge

This section describes how to run Cumulocity DataHub on Cumulocity Edge, the local version of Cumulocity.

Cumulocity DataHub Edge overview

Documentation overview

The following sections walk you through the functionality of Cumulocity DataHub Edge in detail.

For your convenience, here is an overview of the contents:

| Section | Content |
| --- | --- |
| Cumulocity DataHub Edge overview | Get an overview of Cumulocity DataHub Edge |
| Setting up Cumulocity DataHub Edge | Set up Cumulocity DataHub Edge and its components |
| Working with Cumulocity DataHub Edge | Manage offloading pipelines and query the offloaded results |
| Operating Cumulocity DataHub Edge | Run administrative tasks |

Cumulocity DataHub Edge at a glance

Cumulocity Edge is the single-server variant of the Cumulocity platform. It is designed to run in factories on industrial PCs or local servers, that is, at the same site (“onsite”) where the IoT assets are located. Cumulocity DataHub is available as an add-on to Cumulocity Edge.

Cumulocity DataHub Edge offers the same functionality as the cloud variant of Cumulocity DataHub and is deployed similarly into a Kubernetes cluster. The significant difference is that processes and data are entirely local to your network, rather than in the cloud. You can define offloading pipelines, which regularly move data from the Operational Store of Cumulocity into a data lake. In the Edge setup, a NAS is used as the data lake. Dremio, the internal engine of Cumulocity DataHub, can access the data lake and run analytical queries against its contents, using SQL as the query interface.

Cumulocity DataHub Edge versus Cumulocity DataHub cloud deployments

Cumulocity DataHub Edge uses the same software as Cumulocity DataHub, but the two variants differ in the following aspects:

| Area | Cumulocity DataHub Edge | Cumulocity DataHub Cloud |
| --- | --- | --- |
| High Availability | Depending on any underlying virtualization technology | Depending on the cloud deployment setup |
| Vertical scalability | Yes | Yes |
| Horizontal scalability | No | Yes |
| Upgrades with no downtime | No | No |
| Root access | No | Yes, if customer is hosting |
| Installation | Offline & Online | Online |
| Dremio cluster setup | 1 master, 1 executor | Minimum 1 master, 1 executor |
| Dremio container management | Kubernetes | Kubernetes |
| Cumulocity DataHub backend container management | Microservice in Cumulocity Edge | Microservice in Cumulocity Core |
| Data lakes | NAS | Azure Storage, S3, (NAS) |

Setting up Cumulocity DataHub Edge on Kubernetes

In this setup, Cumulocity DataHub is deployed into a Kubernetes environment using the Edge operator. The DataHub backend is run as a microservice within the Cumulocity platform. The Dremio master and executor are deployed as a set of Kubernetes pods.

Prerequisites

Resource requirements

The resource requirements for running a bare Cumulocity Edge instance are described in Requirements. When Cumulocity DataHub Edge on Kubernetes is deployed on top, the resource requirements increase by the following amounts:

  • Recommended: 16 GB RAM, minimum: 10 GB RAM
  • Recommended: 10 logical CPU cores, minimum: 6 logical CPU cores
  • 100 GB of free disk space plus sufficient free disk space for the data lake contents. For more information about configuring the storage, see Configuring storage.

These figures do not include the hardware requirements of the host OS.

Setting up Cumulocity DataHub Edge on Kubernetes

To install and configure DataHub Edge on Kubernetes, update the spec.dataHub field in the Edge Custom Resource (CR) with the necessary configuration details for the Edge operator. After making the changes, apply the updated CR to deploy DataHub Edge.

For more details on the spec.dataHub field, refer to Edge Custom Resource - DataHub.

For additional guidance, see the Install Edge and Modify Edge sections in the Edge on Kubernetes documentation.

To access Dremio, you must also make the domain datahub-<domain_name> resolvable, in the same way as the configured domain name and management-<domain_name> were made resolvable in Accessing Edge.
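On a single workstation, one common way to make a name resolvable is an entry in the local hosts file. The following is a minimal sketch under that assumption, not the procedure from Accessing Edge. The function name datahub_hosts_entry is hypothetical, and 192.0.2.10 and myown.iot.com are placeholders; which IP address is correct depends on how your cluster exposes Dremio.

```shell
#!/bin/sh
# Print a hosts-file line for the DataHub (Dremio) domain.
# $1 = IP address under which Dremio is reachable (assumption: you know it)
# $2 = the domain name configured for Edge
datahub_hosts_entry() {
  printf '%s datahub-%s\n' "$1" "$2"
}

# Example with placeholder values; appending to /etc/hosts needs root rights:
#   datahub_hosts_entry 192.0.2.10 myown.iot.com | sudo tee -a /etc/hosts
```

In environments with a managed DNS server, a DNS record for datahub-<domain_name> serves the same purpose and avoids editing each client machine.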

Using Cumulocity DataHub Edge on Kubernetes

Cumulocity DataHub Edge on Kubernetes behaves like the cloud version and the Edge appliance version.

Validation of the Cumulocity DataHub installation

If the product doesn’t work as intended after the installation, go through the validation steps described below.

Info
Substitute the namespace name c8yedge in the subsequent commands with the specific namespace name into which you installed Edge.

MySQL

You can monitor the startup of the MySQL pod datahub-mysql-0 using:

kubectl get pods -n c8yedge datahub-mysql-0 --watch

The result will be similar to:

NAME              READY   STATUS    RESTARTS   AGE
datahub-mysql-0   1/1     Running   0          4m55s

When running the command:

kubectl get svc -n c8yedge

The output will be similar to:

NAME          TYPE          CLUSTER-IP          EXTERNAL-IP          PORT(S)          AGE
mysql-client  ClusterIP     XXX.XXX.XXX.XXX     <none>               3306/TCP         10m

Dremio

You can monitor the state of the Dremio pods “zk-0”, “dremio-executor-0”, and “dremio-master-0” using:

kubectl get pods -n c8yedge --watch

The status “Running” indicates that the pods have started successfully:

NAME                READY   STATUS    RESTARTS   AGE
...
zk-0                1/1     Running   0          6m34s
dremio-executor-0   1/1     Running   0          6m34s
dremio-master-0     1/1     Running   0          6m34s
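Instead of reading the pod list by eye, the check can be scripted. The sketch below assumes the default column layout of kubectl get pods (NAME, READY, STATUS, ...). The helper name pods_not_ready is hypothetical. It lists every pod that is not in status Running or whose READY count is incomplete.

```shell
#!/bin/sh
# Read `kubectl get pods` output on stdin and print the name of each pod
# that is not fully up: STATUS other than Running, or READY n/m with n != m.
# Note: pods of finished jobs (STATUS Completed) are listed as well.
pods_not_ready() {
  awk '
    $1 == "NAME" || $1 == "..." { next }   # skip the header and elided rows
    {
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) print $1
    }'
}

# Example (do not combine with --watch, which never terminates):
#   kubectl get pods -n c8yedge | pods_not_ready
```

An empty result means that all listed pods are running and ready. To block until a single pod is ready, kubectl wait --for=condition=Ready pod/dremio-master-0 -n c8yedge --timeout=300s is an alternative.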

When running the command:

kubectl get svc -n c8yedge

The output will be similar to:

NAME              TYPE              CLUSTER-IP              EXTERNAL-IP               PORT(S)                                         AGE
dremio-client     LoadBalancer      XXX.XXX.XXX.XXX         XXX.XXX.XXX.XXX           31010:XXXXX/TCP,9047:XXXXX/TCP,32010:XXXXX/TCP  9m33s
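For a LoadBalancer service such as dremio-client, kubectl shows <pending> in the EXTERNAL-IP column until an address has been assigned. The sketch below assumes the default column layout of kubectl get svc. The helper names svc_external_ip and svc_is_exposed are hypothetical. It extracts that column and checks that it holds an address.

```shell
#!/bin/sh
# Read `kubectl get svc` output on stdin and print the EXTERNAL-IP column
# (fourth column) of the service named in $1.
svc_external_ip() {
  awk -v name="$1" '$1 == name { print $4 }'
}

# Succeed only if the service named in $1 has an external address,
# that is, the column is neither empty, <none>, nor <pending>.
svc_is_exposed() {
  ip=$(svc_external_ip "$1")
  case "$ip" in
    ''|'<none>'|'<pending>') return 1 ;;
    *) return 0 ;;
  esac
}

# Example:
#   kubectl get svc -n c8yedge | svc_is_exposed dremio-client && echo "exposed"
```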

Cumulocity DataHub microservice

When logged into the Cumulocity UI, the Cumulocity DataHub microservice is available under Administration > Ecosystem > Microservices.

You can monitor the startup of the microservice pod “datahub-scope-edge-deployment-….” using:

kubectl get pods -n c8yedge --watch

The status “Running” indicates that the pod has started successfully:

NAME                                             READY   STATUS    RESTARTS   AGE
...
datahub-scope-edge-deployment-XXXXXXXXXX-YYYYY   1/1     Running   0          16m

DataHub web application

When logged into the Cumulocity UI, the Cumulocity DataHub web application is available under Administration > Ecosystem > Applications. It should also be present in the usual Cumulocity application switcher.

Working with Cumulocity DataHub Edge

Cumulocity DataHub Edge offers the same set of functionality as the cloud variant. See Working with Cumulocity DataHub for details on configuring and monitoring offloading jobs, querying offloaded Cumulocity data, and refining offloaded Cumulocity data.

Operating Cumulocity DataHub Edge

Similar to the cloud variant, the Cumulocity DataHub Edge UI allows you to check system information and view audit logs. See Operating Cumulocity DataHub for details.

When managing Cumulocity DataHub Edge, the following standard tasks are additionally relevant.

Troubleshooting the system

If problems occur, follow these steps:

  • Perform a health check, see Health check.
  • Check the log files, see Log files.

If you need to contact product support, include the diagnostic log archive. See Accessing logs.

Health check

Check Cumulocity DataHub Edge backend status

You can check the status of the backend on the Administration page of the Cumulocity DataHub UI. Alternatively, you can query the isalive endpoint, for example with curl. The response should be similar to the following:

curl --user admin:your_password https://edge_domain_name/service/datahub/isalive

{
  "timestamp" : 1582204706844,
  "version" : {
    "versionId" : "10.6.0.0.337",
    "build" : "202002200050",
    "scmRevision" : "4ddbb70bf96eb82a2f6c5e3f32c20ff206907f43"
  }
}

If the backend cannot be reached, you will get an error response.
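For scripted checks it can be handy to reduce the isalive response to the version string. The sketch below assumes the response format shown above. The helper name isalive_version is hypothetical. It uses sed so that no JSON tool is required; where jq is installed, jq -r .version.versionId is the more robust choice.

```shell
#!/bin/sh
# Read an isalive response on stdin and print the value of "versionId".
# Prints nothing if the input contains no such field, for example an error page.
isalive_version() {
  sed -n 's/.*"versionId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example; --fail makes curl exit with a non-zero code on HTTP errors:
#   curl --silent --fail --user admin:your_password \
#     https://edge_domain_name/service/datahub/isalive | isalive_version
```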

Check Dremio backend status

You can check the status of Dremio using the server_status endpoint:

curl http://datahub-edge_domain_name/apiv2/server_status
"OK"

Dremio is running if OK is returned. If Dremio is not running or is inaccessible, no response is returned.
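Because a failed check produces no response at all, a scripted check should set a timeout and compare the body explicitly. The sketch below assumes the response body shown above, including its double quotes. The helper name dremio_is_up and the 10-second timeout are hypothetical choices.

```shell
#!/bin/sh
# Succeed only if the given server_status response body is exactly "OK"
# (the body includes the double quotes).
dremio_is_up() {
  [ "$1" = '"OK"' ]
}

# Example; --max-time stops curl from waiting indefinitely:
#   status=$(curl --silent --max-time 10 http://datahub-edge_domain_name/apiv2/server_status)
#   if dremio_is_up "$status"; then echo "Dremio is up"; else echo "Dremio is down"; fi
```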

Log files

Logs of the Cumulocity DataHub microservice are available in the Administration application of Cumulocity. Navigate to Ecosystem > Microservices, select the DataHub microservice from the list, and switch to the Logs tab.

Dremio logs are available for each pod via kubectl, for example:

kubectl -n c8yedge logs dremio-executor-0
kubectl -n c8yedge logs dremio-master-0

Cleanup of Dremio job history

Dremio maintains a history of job details and profiles, which can be inspected in Dremio’s job log, that is, the Jobs page of the Dremio UI. This job history must be cleaned up regularly to free the resources necessary for storing it.

Dremio is configured to clean up job results automatically without downtime. By default, job results are kept for a maximum of seven days. To change that value, a Dremio administrator must modify the support key jobs.max.age_in_days. The change becomes effective within 24 hours or after restarting Dremio. See the corresponding Dremio documentation for more details on support keys.