Edge Operations
This section describes the main operating procedures for standard tasks that must be carried out when managing Cumulocity IoT Edge.
Edge on Kubernetes supports monitoring the Edge deployment with Prometheus, an open-source project for monitoring application state. See https://prometheus.io/ for detailed information on Prometheus and how to use it.
The Edge operator exposes a Prometheus-compatible metrics endpoint, https://<domain>:3443/metrics, where <domain> is the domain you specified in the Edge CR (or myown.iot.com if you followed the Quickstart installation steps).
The tables below list the recommended metrics to monitor. More metrics are available at the same endpoint.
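As a minimal sketch, the Python snippet below downloads the metrics text from the endpoint and keeps only the samples whose names start with c8yedge_. It assumes the endpoint accepts a plain HTTPS GET without authentication, which may not match your installation. The option to skip certificate verification is only for test setups whose certificate your client does not trust. The parser handles the simple `name{labels} value [timestamp]` lines of the Prometheus text format, not every corner of that format.

```python
import ssl
import urllib.request


def fetch_metrics(domain: str, verify_tls: bool = True, timeout: float = 10.0) -> str:
    """Download the raw metrics text from the Edge operator endpoint.

    Assumption: the endpoint accepts an unauthenticated HTTPS GET.
    """
    url = f"https://{domain}:3443/metrics"
    context = ssl.create_default_context()
    if not verify_tls:
        # Only for testing against untrusted (for example, self-signed) certificates
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text: str, prefix: str = "c8yedge_") -> dict:
    """Map each series (name plus labels) to its value, keeping only `prefix` metrics.

    Simplified parser: comment lines (# HELP, # TYPE) are skipped and any
    trailing timestamp after the value is ignored.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Label values may contain spaces, so split after the closing brace
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:].split()
        else:
            series, *rest = line.split()
        if rest and series.startswith(prefix):
            samples[series] = float(rest[0])
    return samples
```

For example, `parse_metrics(fetch_metrics("myown.iot.com"))` returns a dictionary keyed by series such as `c8yedge_statefulset{statefulset="c8ycore-sts"}`.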
Metrics related to Cumulocity IoT Core, prefixed by c8yedge_core
Metric | Description | Interpretation |
---|---|---|
c8yedge_core_sag_c8y_process_cpu_usage | CPU usage percentage of the Cumulocity IoT Core process. | Key for monitoring the CPU demand of the Cumulocity IoT Core process and for identifying high CPU consumption. |
c8yedge_core_sag_c8y_system_cpu_count | The total count of CPU cores available to the Cumulocity IoT Core system. | Useful for understanding the computing capacity of the system and for scaling considerations. |
c8yedge_core_sag_c8y_system_load_average_1m | The 1-minute average system load. | Indicates the immediate demand placed on the system’s resources, helping to identify spikes in usage. |
c8yedge_core_sag_c8y_process_files_open_files | The number of files currently open by the Cumulocity IoT Core process. | Important for monitoring resource utilization and preventing exhaustion of file descriptors. |
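Because the raw load average depends on how many cores the system has, a common rule of thumb is to divide c8yedge_core_sag_c8y_system_load_average_1m by c8yedge_core_sag_c8y_system_cpu_count. The Python sketch below does this. The threshold of 1.0 (on average, one runnable task per core) is a generic heuristic and an assumption, not a documented Edge limit. On Linux the load average also counts tasks waiting on I/O, so treat the ratio as an indicator rather than a precise CPU measurement.

```python
def load_per_core(load_average_1m: float, cpu_count: int) -> float:
    """Normalize the 1-minute load average by the number of available CPU cores."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average_1m / cpu_count


def is_overloaded(load_average_1m: float, cpu_count: int, threshold: float = 1.0) -> bool:
    """Rule-of-thumb check: more than `threshold` runnable tasks per core on average."""
    return load_per_core(load_average_1m, cpu_count) > threshold


# Hypothetical sample values for the two Core metrics
print(load_per_core(6.0, 4))  # 1.5
print(is_overloaded(6.0, 4))  # True
```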
Metrics related to the Apama Microservice, prefixed by c8yedge_apama
For more details, see Monitoring with Prometheus in the Apama documentation.
Metric | Description | Interpretation |
---|---|---|
c8yedge_apama_process_cpu_usage | CPU usage of the Apama process. | Essential for understanding how Apama affects overall system CPU resources. |
c8yedge_apama_system_cpu_usage | The total CPU usage percentage of the system, including Apama and other processes. | Helps gauge the overall CPU load and identify potential bottlenecks. |
c8yedge_apama_system_load_average_1m | The system’s 1-minute load average, including Apama’s impact. | Gives an immediate view of system demand, useful for identifying sudden increases in load. |
c8yedge_apama_sag_apama_in_c8y_uptime_secs | The uptime of Apama and its correlator within the Edge deployment, in seconds. | Crucial for tracking the stability and reliability of the Apama service. |
c8yedge_apama_process_uptime_seconds | How long the Apama process has been running, in seconds. | Like the other uptime metrics in this table, this provides insight into the stability of the Apama process. |
c8yedge_apama_sag_apama_correlator_uptime_seconds | The uptime of the Apama correlator within the Edge deployment, in seconds. | Crucial for tracking the stability and reliability of the Apama service. |
c8yedge_apama_process_files_open_files | The number of files currently open by the Apama process. | Critical for ensuring that Apama does not run into file descriptor limits, which would affect its ability to operate. |
c8yedge_apama_jvm_threads_peak_threads | The peak thread count used by Apama’s JVM (Java Virtual Machine). | Indicates the maximum concurrency level required by Apama, useful for JVM tuning and performance optimization. |
c8yedge_apama_process_start_time_seconds | The start time of the Apama process, measured in seconds since the Unix epoch. | Can be used to determine the Apama process’s age, correlating with other events or metrics. |
c8yedge_apama_sag_apama_in_c8y_is_starter_mode | Indicates whether Apama is running in starter mode within Edge. | Starter mode may have different resource usage or limitations compared to full operation mode. |
c8yedge_apama_sag_apama_in_c8y_is_safe_mode | Flags if Apama is operating in a safe mode within Edge. | Safe mode might restrict certain operations or functions to ensure stability or security. |
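Because c8yedge_apama_process_start_time_seconds is a Unix timestamp, the age of the process is the current time minus that value. The Python sketch below computes the age and flags a restart when the start time differs between two scrapes. How you collect the samples is left open, and the timestamps in the example are made up. The same approach applies to c8yedge_db_process_start_time_seconds.

```python
import time
from datetime import timedelta
from typing import Optional


def process_age_seconds(start_time_seconds: float, now: Optional[float] = None) -> float:
    """Seconds elapsed since a *_process_start_time_seconds value (Unix epoch)."""
    if now is None:
        now = time.time()
    # Clamp at zero in case of small clock differences between hosts
    return max(0.0, now - start_time_seconds)


def format_age(seconds: float) -> str:
    """Human-readable age, for example '1:02:05' for 3725 seconds."""
    return str(timedelta(seconds=int(seconds)))


def was_restarted(previous_start: float, current_start: float) -> bool:
    """A changed start time between two scrapes means the process restarted."""
    return current_start != previous_start


age = process_age_seconds(1_700_000_000, now=1_700_003_725)
print(format_age(age))  # 1:02:05
```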
Metrics related to MongoDB, prefixed by c8yedge_db
Metric | Description | Interpretation |
---|---|---|
c8yedge_db_mongodb_up | Indicates whether the MongoDB database instance is up and running. | A binary metric where 1 means the database is operational, and 0 indicates it is down. This metric is crucial for alerting on database availability. |
c8yedge_db_process_cpu_seconds_total | Cumulative CPU time used by the database process, measured in seconds. | This helps in understanding the total CPU time the MongoDB process has consumed since it started, allowing for analysis of CPU usage trends over time. |
c8yedge_db_process_resident_memory_bytes | The amount of RAM currently being used by the database process, in bytes. | This metric is vital for understanding the database’s memory footprint, helping to ensure that the database does not exceed available memory resources and to plan for scaling or optimization if necessary. |
c8yedge_db_process_virtual_memory_bytes | The total virtual memory used by the database process, in bytes. | Virtual memory includes all memory that the process can access, including what is in RAM and on disk (swap). Monitoring this helps in understanding the database’s overall memory demand, which is crucial for performance and stability. |
c8yedge_db_process_virtual_memory_max_bytes | The maximum amount of virtual memory the database process can use. | This metric often reflects a system or process-level limit on memory usage. While not all systems enforce a maximum virtual memory size, when present, this metric can help identify configurations that may limit database performance or scalability. |
c8yedge_db_process_open_fds | The number of file descriptors currently open by the database process. | Monitoring this metric alongside c8yedge_db_process_max_fds can warn of potential resource exhaustion when the number of open file descriptors approaches the maximum limit. |
c8yedge_db_process_max_fds | The maximum number of file descriptors the database process can open. | Processes use file descriptors to access files and network sockets. This metric indicates the upper limit set for these resources, helping to anticipate and prevent resource exhaustion. |
c8yedge_db_process_start_time_seconds | The start time of the database process, measured in seconds since the Unix epoch. | This metric can be used to determine how long the database has been running since its last restart. It’s useful for tracking uptime and correlating with other events or metrics. |
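Two of the MongoDB metrics are most useful as derived values. c8yedge_db_process_cpu_seconds_total is a cumulative counter, so CPU load is the difference between two samples divided by the time between them; in PromQL, `rate(c8yedge_db_process_cpu_seconds_total[5m])` does this for you. c8yedge_db_process_open_fds is most meaningful as a fraction of c8yedge_db_process_max_fds. The Python sketch below shows both calculations on values you have already scraped. The function names and the 80% warning threshold are illustrative assumptions.

```python
def cpu_cores_used(cpu_seconds_t0, cpu_seconds_t1, t0, t1):
    """Average number of CPU cores used between two samples of a cumulative CPU counter.

    cpu_seconds_*: counter values in seconds of CPU time; t0, t1: sample times in seconds.
    """
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("t1 must be later than t0")
    delta = cpu_seconds_t1 - cpu_seconds_t0
    if delta < 0:
        # Counter reset (for example, the database restarted): assume the counter
        # started again from zero, as Prometheus' rate() does
        delta = cpu_seconds_t1
    return delta / elapsed


def fd_utilization(open_fds, max_fds):
    """Fraction of the file descriptor limit currently in use."""
    if max_fds <= 0:
        raise ValueError("max_fds must be positive")
    return open_fds / max_fds


def fd_near_limit(open_fds, max_fds, threshold=0.8):
    # The 80% threshold is an illustrative choice, not an Edge default
    return fd_utilization(open_fds, max_fds) >= threshold
```

A result of 0.5 from `cpu_cores_used` means the database used, on average, half of one core during the interval.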
The c8yedge_statefulset metric tracks the health status of StatefulSets in the Edge deployment. StatefulSets are Kubernetes workloads that manage stateful applications, ensuring that the deployment and scaling of the application are handled properly and that the application maintains its state across restarts and migrations.
The c8yedge_statefulset metric can be qualified by the StatefulSet name, such as statefulset="c8ycore-sts", statefulset="edge-db-rs0", and so on, for monitoring different StatefulSets in the Edge deployment. A gauge value of 1 for a StatefulSet indicates that it is healthy, meaning all desired replicas are up and serving without issues. A value of 0 indicates a failure, such as one or more replicas not running as expected.
Metrics | Label Options |
---|---|
c8yedge_statefulset{statefulset="c8ycore-sts"} | statefulset="c8ycore-sts", statefulset="edge-db-rs0", statefulset="logging-fluentd", statefulset="event-tailer-event-tailer" |
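Once a Prometheus server scrapes the operator endpoint (see Installing Prometheus), you can list failing StatefulSets through the Prometheus HTTP API. The Python sketch below sends an instant query to `/api/v1/query` and returns the names whose gauge is 0. The Prometheus server address you pass in is a placeholder for your own installation. The same function works for the c8yedge_deployment metric with the label `deployment`.

```python
import json
import urllib.parse
import urllib.request


def query_prometheus(base_url: str, promql: str, timeout: float = 10.0) -> dict:
    """Run an instant query against a Prometheus server's HTTP API."""
    url = base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def unhealthy_workloads(response: dict, label: str) -> list:
    """Return the `label` values (for example StatefulSet names) whose gauge is 0."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    unhealthy = []
    for sample in response["data"]["result"]:
        _timestamp, value = sample["value"]  # the value arrives as a string, e.g. "1"
        if float(value) == 0:
            unhealthy.append(sample["metric"].get(label, "<unknown>"))
    return sorted(unhealthy)
```

Usage, with a hypothetical server address: `unhealthy_workloads(query_prometheus("http://prometheus.example:9090", "c8yedge_statefulset"), "statefulset")`.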
The c8yedge_deployment metric represents the health status of Deployments in Edge. Deployments are another type of Kubernetes workload that manage stateless applications, ensuring that a specified number of replicas of the application are running at any given time.
The c8yedge_deployment metric can be qualified by the Deployment name, such as deployment="c8yedge-operator-controller-manager", deployment="apama-ctrl-scope-edge-deployment", and so on, for monitoring different Deployments in Edge. A gauge value of 1 for a Deployment indicates that it is healthy, meaning all desired replicas are up and serving without issues. A value of 0 indicates a failure, such as one or more replicas not running as expected.
Metrics | Label Options |
---|---|
c8yedge_deployment{deployment="microservices-registry"} | deployment="c8yedge-operator-controller-manager", deployment="logging-operator-c8yedge-edge-sample", deployment="psmdb-operator-c8yedge-edge-sample", deployment="microservices-registry", deployment="thin-edge", deployment="smartrule-scope-management-deployment", deployment="apama-ctrl-scope-edge-deployment", deployment="opcua-mgmt-service-scope-management-deployment" |
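When queries are generated by scripts, building the label selector in code avoids quoting mistakes. The helper below is a hypothetical Python sketch that assembles a PromQL selector such as `c8yedge_deployment{deployment="thin-edge"}`. It escapes backslashes and double quotes in label values, as double-quoted PromQL string literals require. The same helper can build regex matchers with `=~`.

```python
VALID_OPERATORS = {"=", "!=", "=~", "!~"}


def promql_selector(metric: str, matchers: list) -> str:
    """Build a PromQL selector from (label, operator, value) tuples."""
    parts = []
    for label, op, value in matchers:
        if op not in VALID_OPERATORS:
            raise ValueError(f"unsupported operator: {op}")
        # Escape backslashes first, then double quotes, for a double-quoted literal
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{label}{op}"{escaped}"')
    return metric + "{" + ", ".join(parts) + "}"


print(promql_selector("c8yedge_deployment", [("deployment", "=", "thin-edge")]))
# c8yedge_deployment{deployment="thin-edge"}
```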
These metrics track the CPU and memory usage of the workloads in your Edge deployment. CPU usage is a critical performance metric, indicating how much of the allocated CPU resources are being utilized by a container. Monitoring memory usage helps in managing application scalability, optimizing resource allocation, and preventing out-of-memory (OOM) issues that could lead to application downtime.
The c8yedge_pod_cpu_usage and c8yedge_pod_memory_usage metrics can be qualified by the Pod and Container names, such as container="cumulocity-core", pod="c8ycore-sts-0", and so on, for monitoring the CPU and memory usage of different containers in the Edge deployment.
Pod names often contain a unique hash or set of alphanumeric characters that changes every time the pod is recreated, for example when it is redeployed or rescheduled on a different node. To monitor a container continuously across these changes, use a regular expression (regex) in your queries that matches only the static part of the pod name. For example, to monitor the apama-ctrl-scope-edge-pod container regardless of pod re-creation, construct a query that matches any pod name starting with apama-ctrl-scope-edge-deployment without specifying the unique hash, such as container="apama-ctrl-scope-edge-pod",pod=~"apama-ctrl-scope-edge-deployment.*".
Metrics | Label Options |
---|---|
c8yedge_pod_cpu_usage{container="cumulocity-core", pod="c8ycore-sts-0"} | container="cumulocity-core", pod="c8ycore-sts-0"; container="openresty", pod="c8ycore-sts-0"; container="smartrule-scope-management-pod", pod=~"smartrule-scope-management-deployment-.*"; container="apama-ctrl-scope-edge-pod", pod=~"apama-ctrl-scope-edge-deployment-.*"; container="opcua-mgmt-service-scope-management-pod", pod=~"opcua-mgmt-service-scope-management-deployment-.*"; container="mongod", pod="edge-db-rs0-0"; container="mongodb-exporter", pod="edge-db-rs0-0"; container="psmdb-operator", pod=~"psmdb-operator-c8yedge-edge-sample-.*"; container="kedge-agent", pod=~"thin-edge-.*"; container="mosquitto", pod=~"thin-edge-.*"; container="microservices-registry", pod=~"microservices-registry-.*"; container="microservices-registry", pod=~"microservices-registry-garbage-collector-.*"; container="microservices-registry-config", pod=~"microservices-registry-config-.*"; container="fluentd", pod="logging-fluentd-0"; container="config-reloader", pod="logging-fluentd-0"; container="fluent-bit", pod=~"logging-fluentbit-.*"; container="event-tailer", pod="event-tailer-event-tailer-0"; container="logging-operator", pod=~"logging-operator-c8yedge-edge-sample-.*"; container="manager", pod=~"c8yedge-operator-controller-manager-.*" |
c8yedge_pod_memory_usage{container="cumulocity-core", pod="c8ycore-sts-0"} | container="cumulocity-core", pod="c8ycore-sts-0"; container="openresty", pod="c8ycore-sts-0"; container="smartrule-scope-management-pod", pod=~"smartrule-scope-management-deployment-.*"; container="apama-ctrl-scope-edge-pod", pod=~"apama-ctrl-scope-edge-deployment-.*"; container="opcua-mgmt-service-scope-management-pod", pod=~"opcua-mgmt-service-scope-management-deployment-.*"; container="mongod", pod="edge-db-rs0-0"; container="mongodb-exporter", pod="edge-db-rs0-0"; container="psmdb-operator", pod=~"psmdb-operator-c8yedge-edge-sample-.*"; container="kedge-agent", pod=~"thin-edge-.*"; container="mosquitto", pod=~"thin-edge-.*"; container="microservices-registry", pod=~"microservices-registry-.*"; container="microservices-registry", pod=~"microservices-registry-garbage-collector-.*"; container="microservices-registry-config", pod=~"microservices-registry-config-.*"; container="fluentd", pod="logging-fluentd-0"; container="config-reloader", pod="logging-fluentd-0"; container="fluent-bit", pod=~"logging-fluentbit-.*"; container="event-tailer", pod="event-tailer-event-tailer-0"; container="logging-operator", pod=~"logging-operator-c8yedge-edge-sample-.*"; container="manager", pod=~"c8yedge-operator-controller-manager-.*" |
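PromQL regex matchers (`=~`) must match the entire label value, so a glob-style pattern such as `apama-ctrl-scope-edge-deployment-*` does not do what it appears to do. In a regular expression, `-*` means "zero or more hyphens", not "anything". The Python sketch below imitates the anchored matching with `re.fullmatch`. Python's `re` module and the RE2 engine used by Prometheus agree on these simple patterns, but not on every construct. The pod name is made up.

```python
import re


def label_matches(pattern: str, value: str) -> bool:
    """Imitate a PromQL =~ matcher, which is anchored at both ends."""
    return re.fullmatch(pattern, value) is not None


pod = "apama-ctrl-scope-edge-deployment-7c9f8d6b5-x2k4q"  # hypothetical pod name

print(label_matches("apama-ctrl-scope-edge-deployment-.*", pod))  # True
print(label_matches("apama-ctrl-scope-edge-deployment.*", pod))   # True
print(label_matches("apama-ctrl-scope-edge-deployment-*", pod))   # False
print(label_matches("apama-ctrl-scope-edge-deployment", pod))     # False (no partial match)
```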
Prometheus is an open-source project that is used for monitoring application state. See https://prometheus.io/ for detailed information on Prometheus and how to use it. See Installing Prometheus for detailed steps on installing Prometheus.
Grafana is an open-source project for querying, visualizing, alerting on, and exploring metrics, logs, and traces from diverse data sources. See https://grafana.com/docs/grafana/latest/ for detailed information on Grafana and how to use it. See Installing Grafana for detailed steps on installing Grafana.
In your Cumulocity IoT cloud tenant, you can monitor the measurements of the Edge listed in the table below. To monitor the measurements from your cloud tenant, ensure that you have registered your Edge with the Cumulocity IoT cloud tenant. See Registering Edge in the cloud tenant for more information.
Measurement | Metrics | Description |
---|---|---|
Disk space | Total disk space, free disk space, used disk space, percentage of used disk space | The Edge appliance sends the disk space metrics as a measurement for both the installation disk and the data disk every 10 minutes. The measurements are sent in gigabytes (GB) rounded to two decimal places; the percentage is rounded to one decimal place. The data points for this measurement are c8y_InstallationDisk and c8y_DataDisk. If Edge is unable to read the metrics from the installation disk or the data disk, an alarm is sent to the Cumulocity IoT tenant. The alarms have a minor severity, and their data points are c8y_FileSystemMeasurementErrorInstallationDisk and c8y_FileSystemMeasurementErrorDataDisk. |
Memory (RAM) | Total RAM, free RAM, used RAM, percentage of RAM used | The Edge appliance sends the memory usage metrics as a measurement every 5 seconds in gibibytes (GiB). The data point for this measurement is c8y_Memory. If Edge is unable to read the metrics from the memory, an alarm is sent to the Cumulocity IoT tenant. The data point for the alarm is c8y_MemoryMeasurementError. |
CPU | Percentage of CPU used (unit: %) | The Edge appliance sends the percentage of CPU used, measured over intervals of 5, 60, and 600 seconds. The data points for this measurement are c8y_CpuUsage5Seconds, c8y_CpuUsage60Seconds, and c8y_CpuUsage600Seconds. If Edge is unable to read the metrics from the CPU, an alarm is sent to the Cumulocity IoT tenant. The data point for the alarm is c8y_CPUMeasurementError. |
Disk I/O | Data read per second, data written per second (unit: KB/s) | The Edge appliance sends the disk input/output metrics as a measurement for both the installation disk and the data disk, measured over intervals of 5, 60, and 600 seconds. The data points for this measurement are c8y_DataDiskIo5Seconds, c8y_DataDiskIo60Seconds, c8y_DataDiskIo600Seconds, c8y_InstallationDiskIo5Seconds, c8y_InstallationDiskIo60Seconds, and c8y_InstallationDiskIo600Seconds. If Edge is unable to read the metrics from the disk, an alarm is sent to the Cumulocity IoT tenant. The data point for the alarm is c8y_DiskIOMeasurementError. |
Network | Data and packets sent per second, data and packets received per second (units: KB/s and packets/s) | The Edge appliance sends the network metrics as a measurement over intervals of 5, 60, and 600 seconds. The data points for this measurement are c8y_NetworkInterface_lo-5Seconds, c8y_NetworkInterface_lo-60Seconds, and c8y_NetworkInterface_lo-600Seconds. If Edge is unable to read the metrics from the network, an alarm is sent to the Cumulocity IoT tenant. The data point for the alarm is c8y_NetworkIoMeasurementError. |
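The unit conversions and rounding described for the disk space and memory measurements can be reproduced in a few lines, for example to cross-check the values shown in Cockpit against readings taken on the appliance. This Python sketch assumes decimal gigabytes (1 GB = 10^9 bytes) for disk space, which the description does not state explicitly, and binary gibibytes (1 GiB = 2^30 bytes) for memory. The function names are hypothetical.

```python
GB = 1000 ** 3   # assumption: disk values use decimal gigabytes
GIB = 1024 ** 3  # memory values are reported in gibibytes


def disk_space_measurement(total_bytes: int, free_bytes: int) -> dict:
    """Disk values in GB rounded to two decimals, percentage to one decimal."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    used_bytes = total_bytes - free_bytes
    return {
        "total_gb": round(total_bytes / GB, 2),
        "free_gb": round(free_bytes / GB, 2),
        "used_gb": round(used_bytes / GB, 2),
        "used_percent": round(used_bytes * 100 / total_bytes, 1),
    }


def bytes_to_gib(n_bytes: int) -> float:
    """Convert a byte count to gibibytes."""
    return n_bytes / GIB


print(disk_space_measurement(250_000_000_000, 100_000_000_000))
# {'total_gb': 250.0, 'free_gb': 100.0, 'used_gb': 150.0, 'used_percent': 60.0}
```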
To monitor the metrics in your Cumulocity IoT tenant, you can create a dashboard and add widgets in the Cockpit application of your tenant. For more information about creating dashboards, see Working with dashboards.
You can also define smart rules that create alerts or raise alarms based on these metrics, for example, an alert when the free disk space falls below 5 GB. For more information about smart rules, see Smart rules.