Cumulocity IoT DataHub

Release 10.13.0

Cumulocity IoT DataHub Release 10.13 includes the following improvements, limitations, and known issues:

New version of internal query engine

Cumulocity IoT DataHub now uses version 19.3 of Dremio as its query engine. The new version comes with various features and improvements, such as a revamped user interface, performance improvements, and more fine-grained permission management.

Materialization of views

For the alarms, events, and inventory collections, Cumulocity IoT DataHub configures views over the target tables which maintain the latest state of all entities. To avoid potential performance penalties of these views in large-scale setups, they can optionally be materialized so that the latest state of each entity is persisted in the data lake.
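
The view definitions themselves are managed by DataHub, but the underlying idea of a "latest state" view can be sketched as a window query that keeps only the most recent record per entity. The snippet below is a minimal illustration issued over ODBC; the DSN name, the table path, and the columns id and lastUpdated are assumptions for this sketch, not DataHub's actual view definition.

```python
# Minimal sketch of a "latest state per entity" query, assuming an ODBC DSN
# named "DataHub" and a target table with "id" and "lastUpdated" columns.
# These names are illustrative; they are not DataHub's actual view definition.
import pyodbc

conn = pyodbc.connect("DSN=DataHub;UID=<user>;PWD=<password>", autocommit=True)
cursor = conn.cursor()

latest_state_sql = """
SELECT *
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY "lastUpdated" DESC) AS rn
    FROM "myTenant"."alarms" AS t
) ranked
WHERE rn = 1
"""

for row in cursor.execute(latest_state_sql):
    print(row)
```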

Compaction strategies

Cumulocity IoT DataHub uses a periodically executed compaction process to combine multiple smaller files in the data lake into one or more larger files. The corresponding strategy can now be configured for each offloading pipeline.
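
The compaction itself runs inside DataHub and requires no user code; the sketch below only illustrates the general idea, assuming a local folder of Parquet files and using pyarrow, neither of which reflects DataHub's internal implementation.

```python
# Conceptual sketch of compaction: many small Parquet files are rewritten into
# one larger file. The folder layout and the use of pyarrow are assumptions
# for illustration; this is not DataHub's internal implementation.
import glob

import pyarrow as pa
import pyarrow.parquet as pq

small_files = sorted(glob.glob("datalake/measurements/2021/12/*.parquet"))

# Read all small files and concatenate them into a single in-memory table.
combined = pa.concat_tables(pq.read_table(path) for path in small_files)

# Persist the combined data as one larger file.
pq.write_table(combined, "datalake/measurements/2021/12/compacted.parquet")
```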

Improved rendering of large sets of artifacts

The user interface was improved with respect to the handling of large sets of artifacts, including job history, offloading status, and offloading configuration entries. The newly introduced pagination supports larger use cases and reduces main memory consumption.

Manual refresh of measurement types

When configuring an offloading pipeline for the measurements collection, you can manually trigger a refresh of measurement types, which makes the system search for measurement types that have not been detected yet.

Limitations

- Mixed usage of uppercase and lowercase characters for an attribute name in the documents is not supported, as it may corrupt the schema of the offloaded data.
- If the collection to be offloaded has JSON attributes consisting of more than 32,000 characters, its data cannot be offloaded. One specific case where this limitation applies is Cumulocity IoT's application builder, which stores its assets in the inventory collection.
- If the collection to be offloaded has more than 800 JSON attributes, its data cannot be offloaded. This limit also covers nested JSON content, which is expanded into columns during offloading (see the sketch after this list). Measurement documents with more than 800 series/series value fragments are therefore not supported.
- If an attribute of a collection has varying types associated with it, the result table will contain a mixed type, which may make query writing difficult or cause problems in consuming applications.
- DataHub does not work with Kubernetes versions prior to 1.9.
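
The 800-attribute limit applies after nested JSON content has been expanded into columns. The sketch below illustrates that expansion; the dot-separated naming and the example document are assumptions for illustration, not DataHub's exact column-naming scheme.

```python
# Sketch of how nested JSON expands into flat columns during offloading.
# The dot separator and the example document are assumptions for illustration;
# DataHub's actual column-naming scheme may differ.
def flatten(doc, prefix="", sep="."):
    """Recursively flatten a nested dict into {column_name: value} pairs."""
    columns = {}
    for key, value in doc.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            columns.update(flatten(value, name, sep))
        else:
            columns[name] = value
    return columns


measurement = {
    "type": "c8y_TemperatureMeasurement",
    "c8y_Temperature": {"T": {"value": 21.5, "unit": "C"}},
}

# Each leaf value becomes one column; a document with more than 800 such
# columns cannot be offloaded.
print(flatten(measurement))
# {'type': 'c8y_TemperatureMeasurement',
#  'c8y_Temperature.T.value': 21.5,
#  'c8y_Temperature.T.unit': 'C'}
```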

Known issues

- Cloud: Data lake configuration validation does not properly detect wrong bucket names (AWS S3) or wrong account names (Azure Storage). When the settings are saved with an invalid bucket/account name, DataHub does not fail fast but instead runs a time-consuming check, which shows up as an ongoing save request in the UI. Eventually the request fails in the UI with a timeout, and the save request in the backend fails as well. In this case, carefully check the bucket/account name and try saving again.
- Cloud: For older deployments, if the track changes feature of Cumulocity IoT was not enabled, the readings in the Operational Store of Cumulocity IoT do not have a creation timestamp. If more than 4096 such readings exist, offloading will fail, even if track changes was enabled afterwards. In that case, a Dremio administrator must run a SELECT * query without limit or filter over the collection to gather the correct schema, which is required for successful offloading (see the sketch after this list).
- Edge: There are no retention policies in place that prevent the data lake contents from exceeding the hard disk limits.
- Edge: TLS is not supported for ODBC and JDBC.
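
For the schema workaround described above, the unrestricted query can be submitted through any Dremio interface; below is a minimal sketch over ODBC. The DSN and the table path are assumptions for this example.

```python
# Sketch of the administrator workaround: an unrestricted SELECT * over the
# source collection so that Dremio derives the complete schema. The DSN and
# the table path are assumptions for this example.
import pyodbc

conn = pyodbc.connect("DSN=DataHub;UID=<admin-user>;PWD=<password>", autocommit=True)
cursor = conn.cursor()

# Deliberately no LIMIT and no WHERE clause, so every reading is scanned.
cursor.execute('SELECT * FROM "myTenant"."events"')
cursor.fetchall()
```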

Release 10.13.0.9

Cumulocity IoT DataHub Release 10.13.0.9 includes the following improvements, limitations, and known issues:

New version of internal query engine

Cumulocity IoT DataHub now uses version 19.3 of Dremio as its query engine. The new version comes with various features and improvements, such as a revamped user interface, performance improvements, and more fine-grained permission management.

Materialization of views

For the alarms, events, and inventory collections, Cumulocity IoT DataHub configures views over the target tables which maintain the latest state of all entities. To avoid potential performance penalties of these views in large-scale setups, they can optionally be materialized so that the latest state of each entity is persisted in the data lake.

Compaction strategies

Cumulocity IoT DataHub uses a periodically executed compaction process to combine multiple smaller files in the data lake into one or more larger files. The corresponding strategy can now be configured for each offloading pipeline.

Improved rendering of large sets of artifacts

The user interface was improved with respect to the handling of large sets of artifacts, including job history, offloading status, and offloading configuration entries. The newly introduced pagination supports larger use cases and reduces main memory consumption.

Manual refresh of measurement types

When configuring an offloading pipeline for the measurements collection, you can manually trigger a refresh of measurement types, which makes the system search for measurement types that have not been detected yet.

Limitations

- Mixed usage of uppercase and lowercase characters for an attribute name in the documents is not supported, as it may corrupt the schema of the offloaded data.
- If the collection to be offloaded has JSON attributes consisting of more than 32,000 characters, its data cannot be offloaded. One specific case where this limitation applies is Cumulocity IoT's application builder, which stores its assets in the inventory collection.
- If the collection to be offloaded has more than 800 JSON attributes, its data cannot be offloaded. This limit also covers nested JSON content, which is expanded into columns during offloading. Measurement documents with more than 800 series/series value fragments are therefore not supported.
- If an attribute of a collection has varying types associated with it, the result table will contain a mixed type, which may make query writing difficult or cause problems in consuming applications.
- DataHub does not work with Kubernetes versions prior to 1.9.
- In previous releases, the childDevices and childAssets fields were part of the default offloading columns for the inventory collection. They are now excluded from the offloading process as their potentially high number of list items leads to problems: when that number exceeds a corresponding Dremio limit, the offloaded data is no longer readable. New inventory offloading configurations exclude these two fields. For existing inventory configurations, they are automatically added as additional result columns in order to preserve the set of columns handled by the offloading configuration.

Known issues

- Cloud: Data lake configuration validation does not properly detect wrong bucket names (AWS S3) or wrong account names (Azure Storage). When the settings are saved with an invalid bucket/account name, DataHub does not fail fast but instead runs a time-consuming check, which shows up as an ongoing save request in the UI. Eventually the request fails in the UI with a timeout, and the save request in the backend fails as well. In this case, carefully check the bucket/account name and try saving again.
- Cloud: For older deployments, if the track changes feature of Cumulocity IoT was not enabled, the readings in the Operational Store of Cumulocity IoT do not have a creation timestamp. If more than 4096 such readings exist, offloading will fail, even if track changes was enabled afterwards. In that case, a Dremio administrator must run a SELECT * query without limit or filter over the collection to gather the correct schema, which is required for successful offloading.
- Edge: There are no retention policies in place that prevent the data lake contents from exceeding the hard disk limits.
- Edge: TLS is not supported for ODBC and JDBC.