Edge Operations

Monitoring

Edge on Kubernetes allows for monitoring of the Edge deployment using Prometheus, an open-source project that is used for monitoring application state. See https://prometheus.io/ for detailed information on Prometheus and how to use it.

The Edge operator exposes a Prometheus-compatible metrics endpoint, https://\<domain>:3443/metrics, where the domain is the one you specified in the Edge CR (or myown.iot.com if you followed the Quickstart installation steps). You can monitor the recommended list of metrics below, with more available at the above endpoint.

Cumulocity Core Metrics

Metrics related to Cumulocity Core, prefixed by c8yedge_core

Metric	Description	Interpretation
c8yedge_core_sag_c8y_process_cpu_usage	CPU usage percentage of Cumulocity Core process.	Key for monitoring the CPU demand of the Cumulocity Core process, helping in identifying high CPU consumption issues.
c8yedge_core_sag_c8y_system_cpu_count	The total count of CPU cores available to the Cumulocity Core system.	Useful for understanding the computing capacity of the system and for scaling considerations.
c8yedge_core_sag_c8y_system_load_average_1m	The 1-minute average system load.	Indicates the immediate demand placed on the system’s resources, helping to identify spikes in usage.
c8yedge_core_sag_c8y_process_files_open_files	The number of files currently open by Cumulocity Core process.	Important for monitoring resource utilization and preventing potential exhaustion of file descriptors.

Apama Metrics

Metrics related to the Apama Microservice, prefixed by c8yedge_apama

For more details, see Monitoring with Prometheus in the Apama documentation.

Metric	Description	Interpretation
c8yedge_apama_process_cpu_usage	CPU usage by Apama process.	Essential for understanding how Apama impacts overall system CPU resources.
c8yedge_apama_system_cpu_usage	The total CPU usage percentage by the system, including Apama and other process.	Helps gauge the overall CPU load and identify potential bottlenecks.
c8yedge_apama_system_load_average_1m	The system’s 1-minute load average, including Apama’s impact.	Gives an immediate view of system demand, useful for identifying sudden increases in load.
c8yedge_apama_sag_apama_in_c8y_uptime_secs	This metric measures the uptime of Apama and its correlator in seconds within the Edge deployment.	These are crucial for tracking the stability and reliability of the Apama service.
c8yedge_apama_process_uptime_seconds	Measures how long the Apama process has been running in seconds.	Similar to the uptime metrics above, this provides insights into Apama’s process stability.
c8yedge_apama_sag_apama_correlator_uptime_seconds	metrics measure the uptime of Apama and its correlator in seconds within the Edge deployment.	These are crucial for tracking the stability and reliability of the Apama service.
c8yedge_apama_process_files_open_files	The number of files currently open by Apama process.	Critical for ensuring Apama does not run into file descriptor limits, affecting its ability to operate.
c8yedge_apama_jvm_threads_peak_threads	The peak thread count used by Apama’s JVM (Java Virtual Machine).	Indicates the maximum concurrency level required by Apama, useful for JVM tuning and performance optimization.
c8yedge_apama_process_start_time_seconds	The start time of the Apama process, measured in seconds since the Unix epoch.	Can be used to determine the Apama process’s age, correlating with other events or metrics.
c8yedge_apama_sag_apama_in_c8y_is_starter_mode	Indicates whether Apama is running in starter mode within Edge.	Starter mode may have different resource usage or limitations compared to full operation mode.
c8yedge_apama_sag_apama_in_c8y_is_safe_mode	Flags if Apama is operating in a safe mode within Edge.	Safe mode might restrict certain operations or functions to ensure stability or security.

MongoDB Metrics

Metrics related to MongoDB, prefixed by c8yedge_db

Metric	Description	Interpretation
c8yedge_db_mongodb_up	Indicates whether the MongoDB database instance is up and running.	A binary metric where 1 means the database is operational, and 0 indicates it is down. This metric is crucial for alerting on database availability.
c8yedge_db_process_cpu_seconds_total	Cumulative CPU time used by the database process, measured in seconds.	This helps in understanding the total CPU time the MongoDB process has consumed since it started, allowing for analysis of CPU usage trends over time.
c8yedge_db_process_resident_memory_bytes	The amount of RAM currently being used by the database process, in bytes.	This metric is vital for understanding the database’s memory footprint, helping to ensure that the database does not exceed available memory resources and to plan for scaling or optimization if necessary.
c8yedge_db_process_virtual_memory_bytes	The total virtual memory used by the database process, in bytes.	Virtual memory includes all memory that the process can access, including what is in RAM and on disk (swap). Monitoring this helps in understanding the database’s overall memory demand, which is crucial for performance and stability.
c8yedge_db_process_virtual_memory_max_bytes	The maximum amount of virtual memory the database process can use.	This metric often reflects a system or process-level limit on memory usage. While not all systems enforce a maximum virtual memory size, when present, this metric can help identify configurations that may limit database performance or scalability.
c8yedge_db_process_open_fds	The current number of open file descriptors by the database process.	Monitoring this metric alongside c8yedge_db_process_max_fds can warn of potential resource exhaustion if the number of open file descriptors approaches the maximum limit.
c8yedge_db_process_max_fds	The maximum number of file descriptors the database process can open.	File descriptors are used by process to access files and network sockets. This metric indicates the upper limit set for these resources, helping to anticipate and prevent resource exhaustion issues.
c8yedge_db_process_start_time_seconds	The start time of the database process, measured in seconds since the Unix epoch.	This metric can be used to determine how long the database has been running since its last restart. It’s useful for tracking uptime and correlating with other events or metrics.

StatefulSets Metrics

This metric tracks the health status of StatefulSets in the Edge deployment. StatefulSets are Kubernetes workloads that manage stateful applications, ensuring that the deployment and scaling of the application are handled properly, and that the application maintains its state across restarts and migrations.
The c8yedge_statefulset metric can be qualified by the StatefulSet name, such as statefulset="c8ycore-sts", statefulset="edge-db-rs0", and so on, for monitoring different StatefulSets in the Edge deployment. A gauge value of 1 for a StatefulSet indicates that it is healthy, meaning all desired replicas are up and serving without issues. A value of 0 indicates a failure, such as one or more replicas not running as expected.

Metrics	Label Options
c8yedge_statefulset{statefulset=“c8ycore-sts”}	- `statefulset="c8ycore-sts"` - `statefulset="edge-db-rs0"` - `statefulset="logging-fluentd"` - `statefulset="event-tailer-event-tailer"`

Deployments Metrics

These metrics represent the health status of Deployments in Edge. Deployments are another type of Kubernetes workload that manage stateless applications, ensuring that a specified number of replicas of the application are running at any given time.
The c8yedge_deployment metric can be qualified by the Deployment name, such as deployment="c8yedge-operator-controller-manager", deployment="apama-ctrl-scope-edge-deployment", and so on, for monitoring different Deployments in Edge. A gauge value of 1 for a Deployment indicates that it is healthy, meaning all desired replicas are up and serving without issues. A value of 0 indicates a failure, such as one or more replicas not running as expected.

Metrics	Label Options
c8yedge_deployment{deployment=“microservices-registry”}	- `deployment="c8yedge-operator-controller-manager"` - `deployment="logging-operator-c8yedge-edge-sample"` - `deployment="psmdb-operator-c8yedge-edge-sample"` - `deployment="microservices-registry"` - `deployment="thin-edge"` - `deployment="smartrule-scope-management-deployment"` - `deployment="apama-ctrl-scope-edge-deployment"` - `deployment="opcua-mgmt-service-scope-management-deployment"`

Pod Metrics (CPU and Memory Usage)

These metrics track the CPU and memory usage of the workloads in your Edge deployment. CPU usage is a critical performance metric, indicating how much of the allocated CPU resources are being utilized by a container. Monitoring memory usage helps in managing application scalability, optimizing resource allocation, and preventing out-of-memory (OOM) issues that could lead to application downtime.
The c8yedge_pod_cpu_usage and c8yedge_pod_memory_usage metrics can be qualified by the Pod and Container names, such as container="cumulocity-core", pod="c8ycore-sts-0" and so on for monitoring the CPU and memory usage of different containers in the Edge deployment.

Pod names often contain a unique hash or a set of alphanumeric characters that change every time the pod is restarted, redeployed, or rescheduled on a different node. To continuously monitor the CPU usage of a container across pod restarts, you can use regular expressions (regex) in your monitoring queries to match only the static part of the pod name. For example, if you want to monitor the apama-ctrl-scope-edge-pod container regardless of pod restarts, you must construct a query that matches any pod name starting with apama-ctrl-scope-edge-deployment without specifying the unique hash, like container="apama-ctrl-scope-edge-pod",pod=~"apama-ctrl-scope-edge-deployment.*".

Metrics	Label Options
c8yedge_pod_cpu_usage{container=“cumulocity-core”, pod=“c8ycore-sts-0”}	- `container="cumulocity-core", pod="c8ycore-sts-0"` - `container="openresty", pod="c8ycore-sts-0"` - `container="smartrule-scope-management-pod", pod=~"smartrule-scope-management-deployment-"` - `container="apama-ctrl-scope-edge-pod", pod=~"apama-ctrl-scope-edge-deployment-"` - `container="opcua-mgmt-service-scope-management-pod", pod=~"opcua-mgmt-service-scope-management-deployment-"` - `container="mongod", pod="edge-db-rs0-0"` - `container="mongodb-exporter", pod="edge-db-rs0-0"` - `container="psmdb-operator", pod=~"psmdb-operator-c8yedge-edge-sample-"` - `container="kedge-agent", pod=~"thin-edge-"` - `container="mosquitto", pod=~"thin-edge-"` - `container="microservices-registry", pod=~"microservices-registry-"` - `container="microservices-registry", pod=~"microservices-registry-garbage-collector-"` - `container="microservices-registry-config", pod=~"microservices-registry-config-"` - `container="fluentd", pod="logging-fluentd-0"` - `container="config-reloader", pod="logging-fluentd-0"` - `container="fluent-bit", pod=~"logging-fluentbit-"` - `container="event-tailer", pod="event-tailer-event-tailer-0"` - `container="logging-operator", pod="logging-operator-c8yedge-edge-sample-"` - `container="manager", pod=~"c8yedge-operator-controller-manager-"`
c8yedge_pod_memory_usage{container=“cumulocity-core”, pod=“c8ycore-sts-0”}	- `container="cumulocity-core", pod="c8ycore-sts-0"` - `container="openresty", pod="c8ycore-sts-0"` - `container="smartrule-scope-management-pod", pod=~"smartrule-scope-management-deployment-"` - `container="apama-ctrl-scope-edge-pod", pod=~"apama-ctrl-scope-edge-deployment-"` - `container="opcua-mgmt-service-scope-management-pod", pod=~"opcua-mgmt-service-scope-management-deployment-"` - `container="mongod", pod="edge-db-rs0-0"` - `container="mongodb-exporter", pod="edge-db-rs0-0"` - `container="psmdb-operator", pod=~"psmdb-operator-c8yedge-edge-sample-"` - `container="kedge-agent", pod=~"thin-edge-"` - `container="mosquitto", pod=~"thin-edge-"` - `container="microservices-registry", pod=~"microservices-registry-"` - `container="microservices-registry", pod=~"microservices-registry-garbage-collector-"` - `container="microservices-registry-config", pod=~"microservices-registry-config-"` - `container="fluentd", pod="logging-fluentd-0"` - `container="config-reloader", pod="logging-fluentd-0"` - `container="fluent-bit", pod=~"logging-fluentbit-"` - `container="event-tailer", pod="event-tailer-event-tailer-0"` - `container="logging-operator", pod="logging-operator-c8yedge-edge-sample-"` - `container="manager", pod=~"c8yedge-operator-controller-manager-"`

Monitoring the Edge metrics from your cloud tenant

In your Cumulocity cloud tenant, you can monitor the measurements of the Edge listed in the table below. To monitor the measurements from your cloud tenant, ensure that you have registered your Edge with the Cumulocity cloud tenant. See Registering Edge in the cloud tenant for more information.

Measurement	Metrics	Description
Disk space	- Total disk space - Free disk space - Used disk space - Percentage of used disk space	The Edge appliance sends the disk space metrics as a measurement for both installation disk and data disk, every 10 minutes. The measurements are sent in gigabytes (GB) rounded to two decimal places. The percentage is rounded to one decimal place. The data points for this measurement are: - c8y_InstallationDisk - c8y_DataDisk If Edge is unable to read the metrics from the installation disk or the data disk, an alarm is sent to the Cumulocity tenant. The alarms have a minor severity and the data points for the alarms are: - c8y_FileSystemMeasurementErrorInstallationDisk - c8y_FileSystemMeasurementErrorDataDisk
Memory (RAM)	- Total RAM - Free RAM - Used RAM - Percentage of RAM used	The Edge appliance sends the memory usage metrics as a measurement every 5 seconds in gibibytes (GiB). The data point for this measurement is c8y_Memory If Edge is unable to read the metrics from the memory, an alarm is sent to the Cumulocity tenant. The data point for the alarm is: - c8y_MemoryMeasurementError.
CPU	Percentage of CPU used Unit: Percentage	The Edge appliance sends the percentage of CPU used at intervals over 5 seconds, 60 seconds, and 600 seconds. The data points for this measurement are: - c8y_CpuUsage5Seconds - c8y_CpuUsage60Seconds - c8y_CpuUsage600Seconds If Edge is unable to read the metrics from the CPU, an alarm is sent to the Cumulocity tenant. The data point for the alarm is: - c8y_CPUMeasurementError.
Disk I/O	- Data read per second - Data written per second Unit: KB/s	The Edge appliance sends the disk input/output metrics as a measurement for both installation disk and data disk at intervals over 5 seconds, 60 seconds, and 600 seconds. The data points for this measurement are: - c8y_DataDiskIo5Seconds - c8y_DataDiskIo60Seconds - c8y_DataDiskIo600Seconds - c8y_InstallationDiskIo5Seconds - c8y_InstallationDiskIo60Seconds - c8y_InstallationDiskIo5Seconds If Edge is unable to read the metrics from the disk, an alarm is sent to the Cumulocity tenant. The data point for the alarm is: - c8y_DiskIOMeasurementError.
Network	- Data and packets sent per second - Data and packets received per second Unit: KB/s and packets/s	The Edge appliance sends the network metrics as a measurement at intervals over 5 seconds, 60 seconds, and 600 seconds. The data points for this measurement are: - c8y_NetworkInterface_lo-5Seconds - c8y_NetworkInterface_lo-60Seconds - c8y_NetworkInterface_lo-600Seconds If Edge is unable to read the metrics from the network, an alarm is sent to the Cumulocity tenant. The data point for the alarm is: - c8y_NetworkIoMeasurementError.

To monitor the metrics in your Cumulocity tenant, you can create a dashboard and add widgets in the Cockpit application of your tenant. For more information about creating dashboards, see Working with dashboards.

Also, you can define smart rules to create alerts or raise alarms for the metrics. For example, when the free disk space is less than 5 GB, create an alert. For more information about smart rules, see Smart rules.