Edge Operations

This section describes the main operating procedures for standard tasks that must be carried out when managing Cumulocity IoT Edge.

Monitoring

Edge on Kubernetes allows for monitoring of the Edge deployment using Prometheus, an open-source project that is used for monitoring application state. See https://prometheus.io/ for detailed information on Prometheus and how to use it.

The Edge operator exposes a Prometheus-compatible metrics endpoint, https://<domain>:3443/metrics, where <domain> is the domain you specified in the Edge CR (or myown.iot.com if you followed the Quickstart installation steps). The recommended metrics to monitor are listed below; more are available at the same endpoint.
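As a rough illustration of working with this endpoint, the following Python sketch builds the endpoint URL and parses a response body in the Prometheus text exposition format. The helper names metrics_url and parse_metrics are hypothetical and not part of Edge; fetching the body (TLS setup, credentials) is deliberately left out because it depends on your environment.

```python
import re
from typing import Dict, Tuple

def metrics_url(domain: str, port: int = 3443) -> str:
    """Build the operator metrics URL for the given Edge domain."""
    return f"https://{domain}:{port}/metrics"

# Matches "name{labels} value" (labels optional); any trailing timestamp is ignored.
_SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(\{.*\})?\s+(\S+)')

def parse_metrics(text: str) -> Dict[Tuple[str, str], float]:
    """Parse exposition text into {(metric_name, raw_label_string): value}."""
    samples: Dict[Tuple[str, str], float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, labels, value = match.group(1), match.group(2) or "", match.group(3)
        try:
            samples[(name, labels)] = float(value)
        except ValueError:
            continue  # ignore samples whose value is not numeric
    return samples
```

For example, parse_metrics applied to a body containing the line c8yedge_db_mongodb_up 1 yields an entry with the key ("c8yedge_db_mongodb_up", "") and the value 1.0.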

Cumulocity IoT Core Metrics

Metrics related to Cumulocity IoT Core, prefixed by c8yedge_core.

Metric | Description | Interpretation
c8yedge_core_sag_c8y_process_cpu_usage | CPU usage percentage of the Cumulocity IoT Core process. | Key for monitoring the CPU demand of the Cumulocity IoT Core process; helps to identify high CPU consumption.
c8yedge_core_sag_c8y_system_cpu_count | The total number of CPU cores available to the Cumulocity IoT Core system. | Useful for understanding the computing capacity of the system and for scaling considerations.
c8yedge_core_sag_c8y_system_load_average_1m | The 1-minute average system load. | Indicates the immediate demand placed on the system’s resources; helps to identify spikes in usage.
c8yedge_core_sag_c8y_process_files_open_files | The number of files currently open by the Cumulocity IoT Core process. | Important for monitoring resource utilization and preventing exhaustion of file descriptors.
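One common way to read the load average together with the CPU count is to normalize the load by the number of cores. The sketch below assumes you have already scraped both values; the function names and the default threshold of 1.0 are illustrative assumptions, not values recommended by Edge.

```python
def load_per_core(load_average_1m: float, cpu_count: float) -> float:
    """Normalize the 1-minute load average by the number of CPU cores."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average_1m / cpu_count

def is_overloaded(load_average_1m: float, cpu_count: float,
                  threshold: float = 1.0) -> bool:
    """Return True when the per-core load exceeds the (assumed) threshold."""
    return load_per_core(load_average_1m, cpu_count) > threshold
```

A per-core load that stays above the chosen threshold for a sustained period suggests that more work is queued than the cores can process.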

Apama Metrics

Metrics related to the Apama microservice, prefixed by c8yedge_apama.

For more details, see Monitoring with Prometheus in the Apama documentation.

Metric | Description | Interpretation
c8yedge_apama_process_cpu_usage | CPU usage of the Apama process. | Essential for understanding how Apama impacts overall system CPU resources.
c8yedge_apama_system_cpu_usage | The total CPU usage percentage of the system, including Apama and other processes. | Helps to gauge the overall CPU load and identify potential bottlenecks.
c8yedge_apama_system_load_average_1m | The system’s 1-minute load average, including Apama’s impact. | Gives an immediate view of system demand; useful for identifying sudden increases in load.
c8yedge_apama_sag_apama_in_c8y_uptime_secs | Measures the uptime of Apama and its correlator, in seconds, within the Edge deployment. | Crucial for tracking the stability and reliability of the Apama service.
c8yedge_apama_process_uptime_seconds | Measures how long the Apama process has been running, in seconds. | Like the other uptime metrics, provides insight into the stability of the Apama process.
c8yedge_apama_sag_apama_correlator_uptime_seconds | Measures the uptime of Apama and its correlator, in seconds, within the Edge deployment. | Crucial for tracking the stability and reliability of the Apama service.
c8yedge_apama_process_files_open_files | The number of files currently open by the Apama process. | Critical for ensuring that Apama does not reach file descriptor limits, which would affect its ability to operate.
c8yedge_apama_jvm_threads_peak_threads | The peak thread count used by Apama’s JVM (Java Virtual Machine). | Indicates the maximum concurrency level required by Apama; useful for JVM tuning and performance optimization.
c8yedge_apama_process_start_time_seconds | The start time of the Apama process, measured in seconds since the Unix epoch. | Can be used to determine the age of the Apama process and to correlate it with other events or metrics.
c8yedge_apama_sag_apama_in_c8y_is_starter_mode | Indicates whether Apama is running in starter mode within Edge. | Starter mode may have different resource usage or limitations compared to full operation mode.
c8yedge_apama_sag_apama_in_c8y_is_safe_mode | Indicates whether Apama is operating in safe mode within Edge. | Safe mode might restrict certain operations or functions to ensure stability or security.
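The uptime metrics above can be used to spot restarts: an uptime value that is lower than the previously scraped value means the process restarted in between. The following sketch assumes you collect successive uptime samples in scrape order; count_restarts is a hypothetical helper, not an Edge or Apama API.

```python
from typing import Sequence

def count_restarts(uptime_samples: Sequence[float]) -> int:
    """Count how often uptime dropped between consecutive samples.

    Each drop is treated as one restart. Restarts that happen entirely
    between two scrapes and recover past the previous uptime are not seen.
    """
    restarts = 0
    for previous, current in zip(uptime_samples, uptime_samples[1:]):
        if current < previous:
            restarts += 1
    return restarts
```

For instance, the sample series 10, 70, 130, 5, 65 contains one drop (130 to 5) and therefore counts as one restart.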

MongoDB Metrics

Metrics related to MongoDB, prefixed by c8yedge_db.

Metric | Description | Interpretation
c8yedge_db_mongodb_up | Indicates whether the MongoDB database instance is up and running. | A binary metric where 1 means the database is operational and 0 means it is down. Crucial for alerting on database availability.
c8yedge_db_process_cpu_seconds_total | Cumulative CPU time used by the database process, measured in seconds. | Shows the total CPU time the MongoDB process has consumed since it started, allowing analysis of CPU usage trends over time.
c8yedge_db_process_resident_memory_bytes | The amount of RAM currently used by the database process, in bytes. | Vital for understanding the database’s memory footprint, helping to ensure that the database does not exceed available memory and to plan for scaling or optimization if necessary.
c8yedge_db_process_virtual_memory_bytes | The total virtual memory used by the database process, in bytes. | Virtual memory includes all memory that the process can access, both in RAM and on disk (swap). Monitoring it helps in understanding the database’s overall memory demand, which is crucial for performance and stability.
c8yedge_db_process_virtual_memory_max_bytes | The maximum amount of virtual memory the database process can use. | Often reflects a system-level or process-level limit on memory usage. Not all systems enforce a maximum virtual memory size; when one is present, this metric can help identify configurations that may limit database performance or scalability.
c8yedge_db_process_open_fds | The number of file descriptors currently open by the database process. | Monitoring this metric alongside c8yedge_db_process_max_fds can warn of resource exhaustion when the number of open file descriptors approaches the maximum limit.
c8yedge_db_process_max_fds | The maximum number of file descriptors the database process can open. | File descriptors are used by processes to access files and network sockets. This metric indicates the upper limit set for these resources, helping to anticipate and prevent resource exhaustion.
c8yedge_db_process_start_time_seconds | The start time of the database process, measured in seconds since the Unix epoch. | Can be used to determine how long the database has been running since its last restart. Useful for tracking uptime and correlating with other events or metrics.
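Two of the interpretations above reduce to simple arithmetic: the ratio of open to maximum file descriptors, and the uptime derived from the process start time. The sketch below assumes the metric values have already been scraped; the function names are hypothetical.

```python
import time
from typing import Optional

def fd_utilization(open_fds: float, max_fds: float) -> float:
    """Fraction of the file descriptor limit in use (0.0 to 1.0)."""
    if max_fds <= 0:
        raise ValueError("max_fds must be positive")
    return open_fds / max_fds

def uptime_seconds(start_time_seconds: float,
                   now: Optional[float] = None) -> float:
    """Seconds elapsed since the process start time (Unix epoch based)."""
    if now is None:
        now = time.time()
    # Clamp at zero in case of clock skew between hosts.
    return max(0.0, now - start_time_seconds)
```

A utilization value approaching 1.0 indicates that the database process is close to its file descriptor limit; the value at which you alert is a choice for your environment.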

StatefulSets Metrics

This metric tracks the health status of StatefulSets in the Edge deployment. StatefulSets are Kubernetes workloads that manage stateful applications, ensuring that the deployment and scaling of the application are handled properly, and that the application maintains its state across restarts and migrations.
The c8yedge_statefulset metric can be qualified by the StatefulSet name, such as statefulset="c8ycore-sts", statefulset="edge-db-rs0", and so on, for monitoring different StatefulSets in the Edge deployment. A gauge value of 1 for a StatefulSet indicates that it is healthy, meaning all desired replicas are up and serving without issues. A value of 0 indicates a failure, such as one or more replicas not running as expected.

Metric: c8yedge_statefulset{statefulset="c8ycore-sts"}
Label options:
- statefulset="c8ycore-sts"
- statefulset="edge-db-rs0"
- statefulset="logging-fluentd"
- statefulset="event-tailer-event-tailer"

Deployments Metrics

These metrics represent the health status of Deployments in Edge. Deployments are another type of Kubernetes workload that manage stateless applications, ensuring that a specified number of replicas of the application are running at any given time.
The c8yedge_deployment metric can be qualified by the Deployment name, such as deployment="c8yedge-operator-controller-manager", deployment="apama-ctrl-scope-edge-deployment", and so on, for monitoring different Deployments in Edge. A gauge value of 1 for a Deployment indicates that it is healthy, meaning all desired replicas are up and serving without issues. A value of 0 indicates a failure, such as one or more replicas not running as expected.

Metric: c8yedge_deployment{deployment="microservices-registry"}
Label options:
- deployment="c8yedge-operator-controller-manager"
- deployment="logging-operator-c8yedge-edge-sample"
- deployment="psmdb-operator-c8yedge-edge-sample"
- deployment="microservices-registry"
- deployment="thin-edge"
- deployment="smartrule-scope-management-deployment"
- deployment="apama-ctrl-scope-edge-deployment"
- deployment="opcua-mgmt-service-scope-management-deployment"
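The health gauges for StatefulSets and Deployments follow the same convention (1 is healthy, 0 is a failure), so one small helper can serve both. The sketch below assumes you hold the scraped gauge values in a dictionary keyed by workload name; selector and unhealthy_workloads are hypothetical helper names.

```python
from typing import Dict, List

def selector(metric: str, label: str, value: str) -> str:
    """Build a metric selector such as c8yedge_deployment{deployment="thin-edge"}."""
    return f'{metric}{{{label}="{value}"}}'

def unhealthy_workloads(gauges: Dict[str, float]) -> List[str]:
    """Return the names of workloads whose health gauge is not 1, sorted by name."""
    return sorted(name for name, value in gauges.items() if value != 1)
```

For example, given the values {"thin-edge": 1, "microservices-registry": 0}, unhealthy_workloads returns ["microservices-registry"].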

Pod Metrics (CPU and Memory Usage)

These metrics track the CPU and memory usage of the workloads in your Edge deployment. CPU usage is a critical performance metric, indicating how much of the allocated CPU resources are being utilized by a container. Monitoring memory usage helps in managing application scalability, optimizing resource allocation, and preventing out-of-memory (OOM) issues that could lead to application downtime.
The c8yedge_pod_cpu_usage and c8yedge_pod_memory_usage metrics can be qualified by the Pod and Container names, such as container="cumulocity-core", pod="c8ycore-sts-0" and so on for monitoring the CPU and memory usage of different containers in the Edge deployment.


Pod names often contain a unique hash or a set of alphanumeric characters that change every time the pod is restarted, redeployed, or rescheduled on a different node. To continuously monitor the CPU usage of a container across pod restarts, you can use regular expressions (regex) in your monitoring queries to match only the static part of the pod name. For example, if you want to monitor the apama-ctrl-scope-edge-pod container regardless of pod restarts, you must construct a query that matches any pod name starting with apama-ctrl-scope-edge-deployment without specifying the unique hash, like container="apama-ctrl-scope-edge-pod",pod=~"apama-ctrl-scope-edge-deployment.*".
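Prometheus anchors regular expressions in label matchers at both ends, so the pattern must cover the whole pod name. The following Python sketch imitates that behavior with re.fullmatch to show why the trailing .* is needed. Python's re module is not the RE2 engine that Prometheus uses, but the two agree for simple patterns like these. The pod name with its hash suffix is a made-up example.

```python
import re

def pod_matches(pattern: str, pod_name: str) -> bool:
    """Mimic a fully anchored =~ label matcher using re.fullmatch."""
    return re.fullmatch(pattern, pod_name) is not None

# Hypothetical pod name; the suffix changes whenever the pod is recreated.
POD = "apama-ctrl-scope-edge-deployment-7d9f8b6c5d-x2x9z"

# ".*" consumes the changing suffix, so the whole name matches.
matches_with_dot_star = pod_matches("apama-ctrl-scope-edge-deployment.*", POD)

# A glob-style "-*" means "zero or more hyphens" in a regex and does not
# cover the suffix, so the anchored match fails.
matches_with_glob_star = pod_matches("apama-ctrl-scope-edge-deployment-*", POD)
```

Here matches_with_dot_star is True and matches_with_glob_star is False, which is why patterns written in shell glob style need to be converted to .* before they are used with =~.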

Metric: c8yedge_pod_cpu_usage{container="cumulocity-core", pod="c8ycore-sts-0"}
Label options:
- container="cumulocity-core", pod="c8ycore-sts-0"
- container="openresty", pod="c8ycore-sts-0"
- container="smartrule-scope-management-pod", pod=~"smartrule-scope-management-deployment-.*"
- container="apama-ctrl-scope-edge-pod", pod=~"apama-ctrl-scope-edge-deployment-.*"
- container="opcua-mgmt-service-scope-management-pod", pod=~"opcua-mgmt-service-scope-management-deployment-.*"
- container="mongod", pod="edge-db-rs0-0"
- container="mongodb-exporter", pod="edge-db-rs0-0"
- container="psmdb-operator", pod=~"psmdb-operator-c8yedge-edge-sample-.*"
- container="kedge-agent", pod=~"thin-edge-.*"
- container="mosquitto", pod=~"thin-edge-.*"
- container="microservices-registry", pod=~"microservices-registry-.*"
- container="microservices-registry", pod=~"microservices-registry-garbage-collector-.*"
- container="microservices-registry-config", pod=~"microservices-registry-config-.*"
- container="fluentd", pod="logging-fluentd-0"
- container="config-reloader", pod="logging-fluentd-0"
- container="fluent-bit", pod=~"logging-fluentbit-.*"
- container="event-tailer", pod="event-tailer-event-tailer-0"
- container="logging-operator", pod=~"logging-operator-c8yedge-edge-sample-.*"
- container="manager", pod=~"c8yedge-operator-controller-manager-.*"

Metric: c8yedge_pod_memory_usage{container="cumulocity-core", pod="c8ycore-sts-0"}
Label options:
- container="cumulocity-core", pod="c8ycore-sts-0"
- container="openresty", pod="c8ycore-sts-0"
- container="smartrule-scope-management-pod", pod=~"smartrule-scope-management-deployment-.*"
- container="apama-ctrl-scope-edge-pod", pod=~"apama-ctrl-scope-edge-deployment-.*"
- container="opcua-mgmt-service-scope-management-pod", pod=~"opcua-mgmt-service-scope-management-deployment-.*"
- container="mongod", pod="edge-db-rs0-0"
- container="mongodb-exporter", pod="edge-db-rs0-0"
- container="psmdb-operator", pod=~"psmdb-operator-c8yedge-edge-sample-.*"
- container="kedge-agent", pod=~"thin-edge-.*"
- container="mosquitto", pod=~"thin-edge-.*"
- container="microservices-registry", pod=~"microservices-registry-.*"
- container="microservices-registry", pod=~"microservices-registry-garbage-collector-.*"
- container="microservices-registry-config", pod=~"microservices-registry-config-.*"
- container="fluentd", pod="logging-fluentd-0"
- container="config-reloader", pod="logging-fluentd-0"
- container="fluent-bit", pod=~"logging-fluentbit-.*"
- container="event-tailer", pod="event-tailer-event-tailer-0"
- container="logging-operator", pod=~"logging-operator-c8yedge-edge-sample-.*"
- container="manager", pod=~"c8yedge-operator-controller-manager-.*"

Installing and Configuring Monitoring Tools

Installing Prometheus

Prometheus is an open-source project that is used for monitoring application state. See https://prometheus.io/ for detailed information on Prometheus and how to use it. See Installing Prometheus for detailed steps on installing Prometheus.

Installing Grafana

Grafana is an open-source tool for querying, visualizing, alerting on, and exploring metrics, logs, and traces from diverse data sources. See https://grafana.com/docs/grafana/latest/ for detailed information on Grafana and how to use it. See Installing Grafana for detailed steps on installing Grafana.

Monitoring the Edge metrics from your cloud tenant

In your Cumulocity IoT cloud tenant, you can monitor the measurements of the Edge listed in the table below. To monitor the measurements from your cloud tenant, ensure that you have registered your Edge with the Cumulocity IoT cloud tenant. See Registering Edge in the cloud tenant for more information.

Measurement: Disk space
Metrics:
- Total disk space
- Free disk space
- Used disk space
- Percentage of used disk space
Description: The Edge appliance sends the disk space metrics as a measurement for both the installation disk and the data disk every 10 minutes. The measurements are sent in gigabytes (GB), rounded to two decimal places. The percentage is rounded to one decimal place. The data points for this measurement are:
- c8y_InstallationDisk
- c8y_DataDisk
If Edge is unable to read the metrics from the installation disk or the data disk, an alarm is sent to the Cumulocity IoT tenant. The alarms have a minor severity and the data points for the alarms are:
- c8y_FileSystemMeasurementErrorInstallationDisk
- c8y_FileSystemMeasurementErrorDataDisk

Measurement: Memory (RAM)
Metrics:
- Total RAM
- Free RAM
- Used RAM
- Percentage of RAM used
Description: The Edge appliance sends the memory usage metrics as a measurement every 5 seconds, in gibibytes (GiB). The data point for this measurement is:
- c8y_Memory
If Edge is unable to read the metrics from the memory, an alarm is sent to the Cumulocity IoT tenant. The data point for the alarm is:
- c8y_MemoryMeasurementError

Measurement: CPU
Metrics:
- Percentage of CPU used
Unit: Percentage
Description: The Edge appliance sends the percentage of CPU used at intervals of 5 seconds, 60 seconds, and 600 seconds. The data points for this measurement are:
- c8y_CpuUsage5Seconds
- c8y_CpuUsage60Seconds
- c8y_CpuUsage600Seconds
If Edge is unable to read the metrics from the CPU, an alarm is sent to the Cumulocity IoT tenant. The data point for the alarm is:
- c8y_CPUMeasurementError

Measurement: Disk I/O
Metrics:
- Data read per second
- Data written per second
Unit: KB/s
Description: The Edge appliance sends the disk input/output metrics as a measurement for both the installation disk and the data disk at intervals of 5 seconds, 60 seconds, and 600 seconds. The data points for this measurement are:
- c8y_DataDiskIo5Seconds
- c8y_DataDiskIo60Seconds
- c8y_DataDiskIo600Seconds
- c8y_InstallationDiskIo5Seconds
- c8y_InstallationDiskIo60Seconds
- c8y_InstallationDiskIo600Seconds
If Edge is unable to read the metrics from the disk, an alarm is sent to the Cumulocity IoT tenant. The data point for the alarm is:
- c8y_DiskIOMeasurementError

Measurement: Network
Metrics:
- Data and packets sent per second
- Data and packets received per second
Unit: KB/s and packets/s
Description: The Edge appliance sends the network metrics as a measurement at intervals of 5 seconds, 60 seconds, and 600 seconds. The data points for this measurement are:
- c8y_NetworkInterface_lo-5Seconds
- c8y_NetworkInterface_lo-60Seconds
- c8y_NetworkInterface_lo-600Seconds
If Edge is unable to read the metrics from the network, an alarm is sent to the Cumulocity IoT tenant. The data point for the alarm is:
- c8y_NetworkIoMeasurementError

To monitor the metrics in your Cumulocity IoT tenant, you can create a dashboard and add widgets in the Cockpit application of your tenant. For more information about creating dashboards, see Working with dashboards.

Also, you can define smart rules to create alerts or raise alarms for the metrics. For example, when the free disk space is less than 5 GB, create an alert. For more information about smart rules, see Smart rules.
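The free disk space example can be expressed as a small threshold check. The Python sketch below is not a smart rule definition; it only illustrates the arithmetic, following the rounding described for the disk space measurement (two decimal places for GB, one for the percentage). The function names and dictionary keys are hypothetical, and it assumes 1 GB = 1,000,000,000 bytes.

```python
from typing import Dict

BYTES_PER_GB = 1_000_000_000  # assumption: decimal gigabytes

def disk_space_summary(total_bytes: int, free_bytes: int) -> Dict[str, float]:
    """Summarize disk usage with the rounding described for disk space."""
    used_bytes = total_bytes - free_bytes
    used_percent = (used_bytes / total_bytes * 100) if total_bytes else 0.0
    return {
        "total_gb": round(total_bytes / BYTES_PER_GB, 2),
        "free_gb": round(free_bytes / BYTES_PER_GB, 2),
        "used_gb": round(used_bytes / BYTES_PER_GB, 2),
        "used_percent": round(used_percent, 1),
    }

def needs_disk_alert(summary: Dict[str, float], threshold_gb: float = 5.0) -> bool:
    """Return True when free disk space is below the threshold (5 GB here)."""
    return summary["free_gb"] < threshold_gb
```

With a 100 GB disk that has 4 GB free, the summary reports 96.0 percent used and needs_disk_alert returns True.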