DataHub

Cumulocity DataHub on Edge is an optional component of Edge that offers the same functionality as a cloud installation of Cumulocity DataHub. The significant difference is that processes and data are entirely local to your network, rather than in the cloud. You can define offloading pipelines, which regularly move data from the Operational Store of Cumulocity into a data lake. In the Edge setup, a NAS or local disk serves as the data lake. Dremio, the internal engine of Cumulocity DataHub, can access the data lake and run analytical queries against its contents, using SQL as the query interface.

To learn more about DataHub in general, see DataHub overview. To an end user, DataHub on Edge appears and behaves much like DataHub in a cloud installation, subject to the limitations listed in the comparison table later in this section.

Installing and using DataHub

DataHub is an optional component of Edge and can be enabled by updating the spec.dataHub field in the Edge custom resource (CR). For more details on the spec.dataHub field, refer to Edge custom resource - DataHub. For general guidance on configuring Edge, see the Install Edge and Modify Edge sections.

The data lake and related storage are always written to the host file system under the path /datahub, regardless of what is mounted there. You are expected to mount a single shared NAS file system, such as NFS, at that path on all nodes of the Kubernetes cluster that Edge runs on. This ensures the resilience of your data lake contents.
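Because DataHub writes to /datahub whether or not anything is mounted there, it can help to check each node before enabling DataHub. The following is a minimal sketch, not part of the Edge tooling: the function name check_data_lake_path is hypothetical, and it only tests that the path exists and is a mount point on the node where it runs. It cannot tell whether the mount is the same shared NAS on every node.

```python
import os


def check_data_lake_path(path: str = "/datahub") -> list[str]:
    """Return a list of problems found with the data lake path on this node.

    Hypothetical helper for illustration only; run it on every node.
    An empty list means the path exists and something is mounted on it.
    """
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} does not exist or is not a directory")
        return problems
    if not os.path.ismount(path):
        # Nothing is mounted here, so data would land on the node's local disk.
        problems.append(f"{path} is not a mount point")
    return problems
```

On a single-node setup that deliberately uses a local disk as the data lake, the "not a mount point" finding may be expected rather than an error.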

To access Dremio, you must also make the domain datahub-<domain_name> resolvable, just as the configured domain name and management-<domain_name> were made resolvable in Accessing Edge.
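One way to confirm that all three names resolve from a client machine is sketched below. It is an illustration under stated assumptions: the helper names edge_hostnames and unresolved are hypothetical, and the hostname patterns are taken literally from the text above (management-<domain_name> and datahub-<domain_name>).

```python
import socket


def edge_hostnames(domain_name: str) -> list[str]:
    """Return the hostnames that must resolve for Edge with DataHub.

    The patterns follow the text above literally (an assumption of this sketch).
    """
    return [
        domain_name,
        f"management-{domain_name}",
        f"datahub-{domain_name}",
    ]


def unresolved(hostnames: list[str]) -> list[str]:
    """Return the hostnames that the local resolver cannot resolve."""
    missing = []
    for name in hostnames:
        try:
            # Uses the system resolver, so /etc/hosts entries count as well as DNS.
            socket.getaddrinfo(name, None)
        except socket.gaierror:
            missing.append(name)
    return missing
```

Calling unresolved(edge_hostnames(...)) with your configured domain name returns the names that still need a DNS record or hosts-file entry. An empty list means all three resolve from the machine where the check runs.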

Comparison between DataHub Edge and DataHub Cloud

| Area | Cumulocity DataHub Edge | Cumulocity DataHub Cloud |
|---|---|---|
| High availability | Depending on any underlying virtualization technology | Depending on the cloud deployment setup |
| Vertical scalability | Yes | Yes |
| Horizontal scalability | No | Yes |
| Upgrades with no downtime | No | No |
| Installation | Offline & Online | Online |
| Dremio cluster setup | 1 master, 1 executor | Minimum 1 master, 1 executor |
| Data lakes | NAS or local disk | Azure Storage, S3, (NAS) |