Running Cumulocity DataHub on Cumulocity Edge
This section describes how to run Cumulocity DataHub on the Cumulocity Edge, the local version of Cumulocity.
The following sections will walk you through all the functionalities of Cumulocity DataHub Edge in detail.
For your convenience, here is an overview of the contents:
| Section | Content |
| --- | --- |
| Cumulocity DataHub Edge overview | Get an overview of Cumulocity DataHub Edge |
| Setting up Cumulocity DataHub Edge | Set up Cumulocity DataHub Edge and its components |
| Working with Cumulocity DataHub Edge | Manage offloading pipelines and query the offloaded results |
| Operating Cumulocity DataHub Edge | Run administrative tasks |
Cumulocity Edge is the single-server variant of the Cumulocity platform, designed to run in factories on industrial PCs or local servers, that is, in the same site (“onsite”) in which the IoT assets are located. Cumulocity DataHub is available as an add-on to Cumulocity Edge.
Cumulocity DataHub Edge offers the same functionality as the cloud variant of Cumulocity DataHub, and is deployed similarly into a Kubernetes cluster. The significant difference is that processes and data are entirely local to your network, rather than in the cloud. You can define offloading pipelines, which regularly move data from the Operational Store of Cumulocity into a data lake. In the Edge setup, a NAS is used as the data lake. Dremio, the internal engine of Cumulocity DataHub, can access the data lake and run analytical queries against its contents, using SQL as the query interface.
Cumulocity DataHub Edge uses the same software as Cumulocity DataHub, but the two variants differ in the following aspects:
| Area | Cumulocity DataHub Edge | Cumulocity DataHub Cloud |
| --- | --- | --- |
| High availability | Depends on the underlying virtualization technology | Depends on the cloud deployment setup |
| Vertical scalability | Yes | Yes |
| Horizontal scalability | No | Yes |
| Upgrades with no downtime | No | No |
| Root access | No | Yes, if customer is hosting |
| Installation | Offline & online | Online |
| Dremio cluster setup | 1 master, 1 executor | Minimum 1 master, 1 executor |
| Dremio container management | Kubernetes | Kubernetes |
| Cumulocity DataHub backend container management | Microservice in Cumulocity Edge | Microservice in Cumulocity Core |
| Data lakes | NAS | Azure Storage, S3, (NAS) |
In this setup, Cumulocity DataHub is deployed into a Kubernetes environment using the Edge operator. The DataHub backend is run as a microservice within the Cumulocity platform. The Dremio master and executor are deployed as a set of Kubernetes pods.
The resource requirements for running a bare Cumulocity Edge instance are described in Requirements. When Cumulocity DataHub Edge on Kubernetes is deployed on top, the resource requirements change by the following additional amounts:
Hardware requirements for the host OS are excluded.
To install and configure DataHub Edge on Kubernetes, update the `spec.dataHub` field in the Edge Custom Resource (CR) with the necessary configuration details for the Edge operator. After making the changes, apply the updated CR to deploy DataHub Edge.
For more details on the `spec.dataHub` field, refer to Edge Custom Resource - DataHub.
For additional guidance, see the Install Edge and Modify Edge sections in the Edge on Kubernetes documentation.
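The apply step can be sketched as follows, assuming the Edge CR is kept in a local file named `edge-cr.yaml` (a hypothetical filename) and deployed into the `c8yedge` namespace used in the examples below:

```shell
# Add or adjust the spec.dataHub section in your local copy of the Edge CR,
# then re-apply it so the Edge operator deploys DataHub Edge.
kubectl apply -n c8yedge -f edge-cr.yaml

# Watch the DataHub pods come up while the operator reconciles the change.
kubectl get pods -n c8yedge --watch
```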
To access Dremio, you must also make the domain `datahub-<domain_name>` resolvable, just as the configured domain name and `management-<domain_name>` were made resolvable in Accessing Edge.
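One common way to make the domain resolvable on a client machine is a hosts-file entry. A minimal sketch, where the IP address and domain are placeholder assumptions to replace with your Edge host's values:

```shell
EDGE_IP='192.168.1.10'         # assumption: IP address of the Edge host
DOMAIN='myedge.example.com'    # assumption: your configured Edge domain

# Build the hosts-file line for the DataHub domain.
entry="$EDGE_IP  datahub-$DOMAIN"
echo "$entry"

# Append it to /etc/hosts (requires root), for example:
#   echo "$entry" | sudo tee -a /etc/hosts
```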
Cumulocity DataHub Edge on Kubernetes behaves in the same way as the cloud variant and the Edge appliance version.
If the product doesn’t work as intended after the installation, go through the validation steps described below.
You can monitor the startup of the MySQL pod `datahub-mysql-0` using:

```shell
kubectl get pods -n c8yedge datahub-mysql-0 --watch
```
The result will be similar to:

```
NAME              READY   STATUS    RESTARTS   AGE
datahub-mysql-0   1/1     Running   0          4m55s
```
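For a scripted check, the STATUS column can be extracted from the `kubectl` output. A minimal sketch using the sample line above; in practice, capture the line with `kubectl get pods -n c8yedge datahub-mysql-0 --no-headers`:

```shell
# Sample line as shown above; replace with live kubectl output in practice.
pod_line='datahub-mysql-0   1/1     Running   0          4m55s'

# The third column of `kubectl get pods` output is the pod status.
status=$(echo "$pod_line" | awk '{print $3}')
echo "$status"
```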
When running the command:

```shell
kubectl get svc -n c8yedge
```

the output will be similar to:

```
NAME           TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)    AGE
mysql-client   ClusterIP   XXX.XXX.XXX.XXX   <none>        3306/TCP   10m
```
You can monitor the state of the Dremio pods `zk-0`, `dremio-executor-0`, and `dremio-master-0` using:

```shell
kubectl get pods -n c8yedge --watch
```

The status `Running` indicates that the pods have started successfully:

```
NAME                READY   STATUS    RESTARTS   AGE
...
zk-0                1/1     Running   0          6m34s
dremio-executor-0   1/1     Running   0          6m34s
dremio-master-0     1/1     Running   0          6m34s
```
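The same check can be scripted by counting how many of the listed pods report `Running`. A minimal sketch against the sample output above; in practice, capture it with `kubectl get pods -n c8yedge --no-headers`:

```shell
# Sample output as shown above; replace with live kubectl output in practice.
pods='zk-0                1/1     Running   0          6m34s
dremio-executor-0   1/1     Running   0          6m34s
dremio-master-0     1/1     Running   0          6m34s'

# Count the pods whose STATUS column (third field) is Running.
running=$(echo "$pods" | awk '$3 == "Running" { n++ } END { print n }')
echo "$running"
```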
When running the command:

```shell
kubectl get svc -n c8yedge
```

the output will be similar to:

```
NAME            TYPE           CLUSTER-IP        EXTERNAL-IP       PORT(S)                                          AGE
dremio-client   LoadBalancer   XXX.XXX.XXX.XXX   XXX.XXX.XXX.XXX   31010:XXXXX/TCP,9047:XXXXX/TCP,32010:XXXXX/TCP   9m33s
```
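If the external IP of the `dremio-client` LoadBalancer service is not reachable from your workstation, a port-forward to the Dremio UI port (9047, as listed in the service output above) is one possible workaround; a sketch, not part of the standard setup:

```shell
# Forward the Dremio UI port from the cluster to the local machine,
# then open http://localhost:9047 in a browser.
kubectl port-forward -n c8yedge svc/dremio-client 9047:9047
```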
When logged into the Cumulocity UI, the Cumulocity DataHub microservice is available under Administration > Ecosystem > Microservices.
You can monitor the startup of the microservice pod `datahub-scope-edge-deployment-…` using:

```shell
kubectl get pods -n c8yedge --watch
```

The status `Running` indicates that the pod has started successfully:

```
NAMESPACE   NAME                                             READY   STATUS    RESTARTS   AGE
...
c8yedge     datahub-scope-edge-deployment-XXXXXXXXXX-YYYYY   1/1     Running   0          16m
```
When logged into the Cumulocity UI, the Cumulocity DataHub web application is available under Administration > Ecosystem > Applications. It should also be present in the usual Cumulocity application switcher.
Cumulocity DataHub Edge offers the same set of functionality as the cloud variant. See Working with Cumulocity DataHub for details on configuring and monitoring offloading jobs, querying offloaded Cumulocity data, and refining offloaded Cumulocity data.
Similar to the cloud variant, Cumulocity DataHub Edge UI allows you to check system information and view audit logs. See Operating Cumulocity DataHub for details.
When managing Cumulocity DataHub Edge, the following standard tasks are additionally relevant.
If problems occur, follow the troubleshooting steps below.
If you need to contact product support, include the diagnostic log archive. See Accessing logs.
You can check the status of the backend on the Administration page of the Cumulocity DataHub UI. Alternatively, you can query the `isalive` endpoint, which should produce an output similar to:

```shell
curl --user admin:your_password https://edge_domain_name/service/datahub/isalive
```

```
{
  "timestamp" : 1582204706844,
  "version" : {
    "versionId" : "10.6.0.0.337",
    "build" : "202002200050",
    "scmRevision" : "4ddbb70bf96eb82a2f6c5e3f32c20ff206907f43"
  }
}
```
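If a script needs the backend version, the `versionId` can be pulled out of the `isalive` response. A minimal sketch using a condensed copy of the sample payload above; in practice, pipe the `curl` output in, and prefer a JSON-aware tool such as `jq` if it is available on the host:

```shell
# Condensed sample isalive response from the documentation.
response='{"timestamp":1582204706844,"version":{"versionId":"10.6.0.0.337","build":"202002200050"}}'

# Extract versionId with sed; jq is the more robust choice when installed.
version=$(echo "$response" | sed -n 's/.*"versionId":"\([^"]*\)".*/\1/p')
echo "$version"
```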
If the backend cannot be reached, you will get an error response.
You can check the status of Dremio using the `server_status` endpoint:

```shell
curl http://datahub-edge_domain_name/apiv2/server_status
```

```
"OK"
```
Dremio is running if `"OK"` is returned. If Dremio is not running or is inaccessible, no response is returned.
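For automated monitoring, the response can be compared against the documented `"OK"` value. A minimal sketch; the sample response is hard-coded here, and in practice you would capture it with `status=$(curl -s http://datahub-edge_domain_name/apiv2/server_status)`:

```shell
# Sample response as documented above; replace with a live curl call in practice.
status='"OK"'

# Treat anything other than "OK" (including an empty response) as failure.
if [ "$status" = '"OK"' ]; then
  echo "Dremio is running"
else
  echo "Dremio is not reachable" >&2
fi
```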
Logs are available in the Administration application of Cumulocity. Navigate to Ecosystem > Applications, select the DataHub microservice from the application list and switch to the Logs tab.
Dremio logs are available for each pod via kubectl, for example:

```shell
kubectl -n c8yedge logs dremio-executor-0
kubectl -n c8yedge logs dremio-master-0
```
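When diagnosing an ongoing problem, two standard `kubectl logs` options are often useful with these pods; a sketch:

```shell
# Stream the log continuously instead of printing a one-off snapshot.
kubectl -n c8yedge logs -f dremio-master-0

# After a restart, show the log of the previous container instance
# (useful when a pod is crash-looping).
kubectl -n c8yedge logs --previous dremio-master-0
```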
Dremio maintains a history of job details and profiles, which can be inspected in Dremio’s job log, that is, the Jobs page of the Dremio UI. This job history must be cleaned up regularly to free the resources necessary for storing it.
Dremio is configured to clean up job results automatically and without downtime. By default, job results are stored for a maximum of seven days. To change that value, a Dremio administrator must modify the support key `jobs.max.age_in_days`. The change becomes effective within 24 hours or after restarting Dremio. See the corresponding Dremio documentation for more details on support keys.