Cumulocity IoT DataHub

Release 10.16.0

Cumulocity IoT DataHub Release 10.16 includes the following improvements, limitations, and known issues:

Authentication against Azure data lake via Azure Active Directory

When using Azure Storage as data lake, Azure Active Directory credentials can be used for authentication.

Configuration of offloading frequency

Per default an offloading run is executed once an hour, for each hour of the day. The offloading frequency can be configured more fine-granular by selecting at which hours of the day an offloading run will be executed.

Handling of duplicate column names

When deriving column names for offloading the measurements collection and attribute names vary only by their case, duplicate column names may occur, leading to unwanted behavior like mixed types. A new option for name sanitization ensures that distinct column names are generated.

Minor UI enhancements

The offloading page has been enhanced with status information. It provides the status of the most recent offloading and compaction runs for an offloading configuration. Regarding the definition of an offloading configuration, the additional filter predicate step provides examples for filter predicates.

Limitations

Description
Mixed usage of uppercase and lowercase characters for attribute names in the documents is not supported.
If the collection to be offloaded has JSON attributes consisting of more than 32,000 characters, its data cannot be offloaded. One specific case where this limitation applies is Cumulocity IoT’s application builder, which stores its assets in the inventory collection when being used.
If the collection to be offloaded has more than 800 JSON attributes, its data cannot be offloaded. This limitation also includes nested JSON content, which will be expanded into columns during offloading. Therefore, measurements documents with more than 800 series/series value fragments are not supported.
If an attribute of a collection has varying types associated, the result table will contain a mixed type which may render query writing difficult or lead to problems with subsequent consumer applications.
In previous releases, the childDevices and childAssets fields were part of the default offloading columns for the inventory collection. They are now excluded from the offloading process as their potentially high number of list items leads to problems. When that number exceeds a corresponding Dremio limit, the offloaded data is no longer readable. New inventory offloading configurations exclude those two fields. For existing inventory configurations, they are automatically added as additional result columns, in order to preserve the set of columns handled by the offloading configuration.

Known issues

Edition
Description
Cloud Data lake configuration validation is broken in terms of wrong bucket names (AWS S3) and wrong account names (Azure Storage). When saving the settings with an invalid bucket/account name, DataHub fails to quickly detect the problem and will instead run a time-consuming check, which shows up as an ongoing save request in the UI. Eventually the request will fail in the UI with a timeout and the save request in the backend will fail as well. In such a case, please carefully check the bucket/account name and try saving again.
Edge There are no retention policies in place that prevent the data lake contents from exceeding the hard disk limits.
Edge TLS is not supported for ODBC and JDBC.