Watching correlator runtime status

The engine_watch tool lets you monitor the runtime operational status of a running correlator. The executable for this tool is located in the Apama/bin directory.

Synopsis

To monitor the operation of a correlator, run the following command:

engine_watch [ options ]

When you run this command with the -h option, the usage message for this command is shown.

Description

The engine_watch tool periodically polls a running correlator for status information and writes the standard status messages to stdout (see List of correlator status statistics for details of the standard status messages). If you also specify the -a option, any user-defined status values are appended to the standard status messages. For additional progress information, use the -v option.
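
For example, the following command polls a correlator every 5 seconds and appends any user-defined status values to the standard status messages (the host, port and interval shown here are illustrative; adjust them for your deployment):

engine_watch --hostname localhost --port 15903 --interval 5000 --all

Because --once is not specified, the tool keeps polling and writing status to stdout until it is stopped.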

Options

The engine_watch tool takes the following options:

  • -h | --help: Displays usage information. Optional.
  • -n host | --hostname host: Name of the host on which the correlator is running. The default is localhost. Non-ASCII characters are not allowed in host names.
  • -p port | --port port: Port on which the correlator is listening. Optional. The default is 15903.
  • -i ms | --interval ms: Specifies the poll interval in milliseconds. Optional. The default is 1000.
  • -f filename | --filename filename: Writes status output to the named file. Optional. The default is to send status information to stdout.
  • -r | --raw: Produces raw output format, which is more suitable for machine parsing. Raw output format consists of a single line for each status message, where each line is a comma-separated list of status numbers. This format can be useful in a test environment. The default is a multi-line, human-readable format for each status message.
  • -a | --all: Outputs all user-defined status values after the standard status messages. Optional. The default is to output only the standard status messages.
  • -t | --title: When specified together with the --raw option, includes headers in the output that make it easy to identify the columns.
  • -o | --once: Outputs one set of status information and then quits. Optional. The default is to return status information indefinitely at the specified poll interval.
  • -v | --verbose: Displays process names and versions in addition to status information. Optional. The default is to display only status information.
  • -V | --version: Displays version information for the engine_watch tool. Optional. The default is that the tool does not output this information.
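
For scripted or test use, the options can be combined. The following sketches (the file name, port and interval are illustrative) first take a single human-readable snapshot that also shows process names and versions, and then capture raw, comma-separated status lines with column headers to a file:

engine_watch -p 15903 -o -v
engine_watch -p 15903 -i 2000 -r -t -f correlator_status.csv

In the second command, each poll produces one comma-separated status line, and the --title option adds headers that identify the columns.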

Exit status

The engine_watch tool returns the following exit values:

  • 0: All status requests were processed successfully.
  • 1: No connection to the correlator was possible or the connection failed.
  • 2: Other error(s) occurred while requesting/processing status.
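
A minimal sketch of acting on the exit status from a shell script, for example to fail fast in a test environment when the correlator cannot be reached (the host and port are illustrative):

engine_watch -n localhost -p 15903 -o > /dev/null
status=$?
if [ "$status" -ne 0 ]; then
    # 1 means the connection failed; 2 means some other error occurred
    echo "correlator status check failed (exit code $status)" >&2
    exit "$status"
fi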

List of correlator status statistics

This topic gives a detailed list of the status values that can be monitored for a correlator. The status values are available through several mechanisms: the REST API, the Java API, the correlator status lines written to the log, Prometheus metrics, and the display names used by monitoring tools. The descriptions below give the corresponding identifier for each mechanism, or indicate "not applicable" where a value is not exposed through that mechanism.
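
As a sketch, assuming the REST API and Prometheus metrics are served from the correlator's port with typical endpoint paths (these paths are not defined in this section, so check the REST API and Prometheus documentation for your version), the values can be fetched from the command line:

curl http://localhost:15903/correlator/status
curl http://localhost:15903/metrics | grep sag_apama_correlator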

The descriptions below also indicate the typical trend. This can be one of the following:

  • Steady: After any start-up phase, this number is typically steady. It may increase during bursts of events, or if the size of the application changes (for example, the number of items the application is tracking). If these numbers trend continually upwards when no more is being asked of the application, that typically indicates an application leak of monitor instances, listeners or objects, which will eventually lead to an out-of-memory condition.
  • Increasing: This number may increase in normal usage. Depending on the deployment, some of these statistics may not increase; however, if a statistic that normally increases stops doing so, this may indicate that something is preventing events from being delivered or processed correctly.
  • Low: This number is typically 0 or near 0. If it increases, this typically indicates that the correlator is not keeping up with processing events. For queues, it is normal for these to be non-zero for some time during bursts of activity. Steadily increasing queue sizes can be a sign of back-pressure due to a slow receiver, or a sign that the system is not keeping up and may eventually block senders because events are not being processed at the rate they arrive.
  • Varies: This number will typically vary. A value of 0 may indicate a problem with events being delivered.
  • None: Typically, all contexts and receivers should be keeping up, so none are reported as slow (in which case, the empty string is returned from the API).

The term “receiver”, as used in the descriptions below, refers to any of the following:

  • EPL, Java or C++ plug-ins using the Correlator.subscribe method.
  • Connectivity plug-ins, for events sent towards the transport.
  • Client library connections, including other correlators that have been connected with the engine_connect or engine_receive tools.
Time since the correlator was started

The time in milliseconds since the correlator was started.

Typical trend: increasing.

  • REST API: uptime
  • Java API: getUptime
  • Log field: not applicable
  • Prometheus metric name: sag_apama_correlator_uptime_seconds
  • Display name: Uptime (ms)
Number of contexts

The number of contexts in the correlator, including the main context.

Typical trend: steady.

  • REST API: numContexts
  • Java API: getNumContexts
  • Log field: nctx=n
  • Prometheus metric name: sag_apama_correlator_contexts_total
  • Display name: Number of contexts
Number of monitors

The number of EPL monitor definitions injected into the correlator. This number changes on injections, deletions or if the last instance of a monitor terminates.

Typical trend: steady.

  • REST API: numMonitors
  • Java API: getNumMonitors
  • Log field: not applicable
  • Prometheus metric name: sag_apama_correlator_monitors_total
  • Display name: Number of monitors
Number of monitor instances

The number of monitor instances, also known as sub-monitors.

Typical trend: steady.

  • REST API: numProcesses
  • Java API: getNumProcesses
  • Log field: sm=n
  • Prometheus metric name: sag_apama_correlator_monitor_instances_total
  • Display name: Number of sub-monitors
Number of Java applications and Java EPL plug-ins

The number of Java applications and Java EPL plug-ins loaded in the correlator. This number changes on injections and deletions.

Typical trend: steady.

  • REST API: numJavaApplications
  • Java API: getNumJavaApplications
  • Log field: not applicable
  • Prometheus metric name: sag_apama_correlator_java_applications_total
  • Display name: Number of Java applications
Number of listeners

The number of listeners in all contexts. This includes on statements and active stream source templates.

Typical trend: steady.

  • REST API: numListeners
  • Java API: getNumListeners
  • Log field: ls=n
  • Prometheus metric name: sag_apama_correlator_listeners_total
  • Display name: Number of listeners
Number of sub-listeners

The number of sub-event-listeners that are active across all contexts. Stream source templates will have one sub-event-listener. An on statement can have multiple sub-event-listeners. See also Evaluating event listeners for all A-events followed by B-events.

Typical trend: steady.

  • REST API: numSubListeners
  • Java API: getNumSubListeners
  • Log field: not applicable
  • Prometheus metric name: sag_apama_correlator_sub_listeners_total
  • Display name: Number of sub-listeners
Number of event types

The number of event types defined within the correlator. This number changes on injections and deletions.

Typical trend: steady.

  • REST API: numEventTypes
  • Java API: getNumEventTypes
  • Log field: not applicable
  • Prometheus metric name: sag_apama_correlator_event_types_total
  • Display name: Number of event types
Number of executors on input queues

The number of executors on the input queues of all contexts. As well as events, this can include clock ticks, spawns, injections and other operations. The input queue of a context that is stuck in an infinite loop will grow by 10 entries per second due to clock ticks. Every context has an input queue, which by default holds a maximum of 20,000 entries.

Typical trend: low.

  • REST API: numQueuedInput
  • Java API: getNumQueuedInput
  • Log field: iq=n
  • Prometheus metric name: sag_apama_correlator_queued_input_total
  • Display name: Events on input queue
Number of received events

The number of events that the correlator has received from external sources since the correlator started. This includes connectivity plug-ins, engine_send, other correlators connected with engine_connect, and events that are not parsed correctly. This number excludes events sent within the correlator from EPL monitors or EPL plug-ins.

Typical trend: increasing.

  • REST API: numReceived
  • Java API: getNumReceived
  • Log field: rx=n
  • Prometheus metric name: sag_apama_correlator_input_total
  • Display name: Events received
Number of processed events

The number of events processed by the correlator in all contexts. This includes external events and events routed to contexts by monitors. An event is considered to have been processed when all listeners and streams that were waiting for it have been triggered, or when it has been determined that there are no listeners for the event.

Typical trend: increasing.

  • REST API: numProcessed
  • Java API: getNumProcessed
  • Log field: not applicable
  • Prometheus metric name: sag_apama_correlator_processed_total
  • Display name: Events processed
Sum of events on route queues

The sum of routed events on the route queues of all contexts.

Typical trend: low.

  • REST API: numQueuedFastTrack
  • Java API: getNumQueuedFastTrack
  • Log field: rq=n
  • Prometheus metric name: sag_apama_correlator_queued_route_total
  • Display name: Events on internal queue
Number of routed events

The number of events that have been routed across all contexts since the correlator was started.

Typical trend: increasing.

  • REST API: numFastTracked
  • Java API: getNumFastTracked
  • Log field: rt=n
  • Prometheus metric name: sag_apama_correlator_route_total
  • Display name: Events routed internally
Number of external consumers/receivers

The number of external consumers/receivers connected to receive emitted events. This includes connectivity plug-ins, engine_receive, or correlators connected using engine_connect.

Typical trend: steady.

  • REST API: numConsumers
  • Java API: getNumConsumers
  • Log field: nc=n
  • Prometheus metric name: sag_apama_correlator_consumers_total
  • Display name: Number of consumers
Number of events on output queues

The number of events waiting on output queues to be dispatched to any connected external consumers/receivers.

Typical trend: low.

  • REST API: numOutEventsQueued
  • Java API: getNumOutEventsQueued
  • Log field: oq=n
  • Prometheus metric name: sag_apama_correlator_queued_output_total
  • Display name: Events on output queue
Number of events created for sending to external channels

The number of events that have been sent (see The send… to statement) or emitted (see The emit statement) to channels which have at least one external consumer/receiver subscribed (see also Number of external consumers/receivers). This excludes events sent to channels with no external consumers/receivers. This counts each event once, even if delivered to multiple external consumers/receivers.

Typical trend: increasing.

  • REST API: numEmits
  • Java API: getNumOutEventsCreated
  • Log field: not applicable
  • Prometheus metric name: sag_apama_correlator_created_output_total
  • Display name: Output events created
Number of events delivered to external consumers/receivers

The number of events that have been delivered to external consumers/receivers. An event is counted once for each external consumer/receiver it is delivered to; that is, this counts the number of deliveries of events.

Note:

This status indicator counts every event that was delivered, whereas the previous status indicator counts every event that was sent. For example, sending one event to a channel with two external consumers/receivers would be counted as one event sent (numEmits), but two events delivered (numOutEventsSent).

Typical trend: increasing.

  • REST API: numOutEventsSent
  • Java API: getNumOutEventsSent
  • Log field: tx=n
  • Prometheus metric name: sag_apama_correlator_output_total
  • Display name: Output events sent
Number of events on input queues of all public contexts

The number of events on the input queues of all public contexts. See also About context properties for information on the receiveInput flag.

Typical trend: low.

  • REST API: numInputQueuedInput
  • Java API: getNumInputQueuedInput
  • Log field: icq=n
  • Prometheus metric name: sag_apama_correlator_queued_input_public_total
  • Display name: Events on input context queues
Name of slowest context

The name of the slowest context. This may or may not be a public context.

Typical trend: none.

  • REST API: mostBackedUpInputContext
  • Java API: getMostBackedUpInput
  • Log field: lcn=name
  • Prometheus metric name: The name of the slowest context is given as a Prometheus label on the Prometheus metric sag_apama_correlator_slowest_input_queue_size_total
  • Display name: Slowest context name
Number of events on queue for slowest context

The number of events on the slowest context’s queue, as identified by the name of the slowest context.

Typical trend: low.

  • REST API: mostBackedUpICQueueSize
  • Java API: getMostBackedUpQueueSize
  • Log field: lcq=n
  • Prometheus metric name: sag_apama_correlator_slowest_input_queue_size_total
  • Display name: Slowest context queue size
Time difference in seconds for slowest context

For the context identified by the slowest context name, this is the time difference in seconds between its current logical time and the most recent time tick added to its input queue.

Typical trend: low.

  • REST API: mostBackedUpICLatency
  • Java API: getMostBackedUpICLatency
  • Log field: lct=seconds
  • Prometheus metric name: sag_apama_correlator_slowest_input_queue_latency_seconds
  • Display name: not applicable
Name of slowest consumer/receiver of events

The name of the consumer/receiver with the largest number of incoming events waiting to be processed. This is the slowest non-context consumer/receiver of events, which can be an external receiver or an EPL plug-in.

Typical trend: none.

  • REST API: slowestReceiver
  • Java API: getSlowestReceiver
  • Log field: srn=name
  • Prometheus metric name: The name of the slowest consumer/receiver of events is given as a Prometheus label on the Prometheus metric sag_apama_correlator_slowest_output_queue_size_total
  • Display name: Slowest receiver name
Number of events on queue for slowest consumer/receiver

The number of events on the slowest consumer’s/receiver’s queue, as identified by the name of the slowest consumer/receiver.

Typical trend: low.

  • REST API: slowestReceiverQueueSize
  • Java API: getSlowestReceiverQueueSize
  • Log field: srq=n
  • Prometheus metric name: sag_apama_correlator_slowest_output_queue_size_total
  • Display name: Slowest receiver queue size
Number of events per second

The number of events per second currently being processed by the correlator across all contexts. This value is computed with every status refresh and is only an approximation.

Typical trend: varies.

  • REST API: not applicable
  • Java API: not applicable
  • Log field: not applicable
  • Prometheus metric name: not applicable
  • Display name: Event rate over last interval
Number of enqueued events

The number of events queued from the enqueue statement (not the enqueue...to statement). The enqueue statement is deprecated.

Typical trend: low.

  • REST API: enqueueQueueSize
  • Java API: not applicable
  • Log field: not applicable
  • Prometheus metric name: not applicable
  • Display name: not applicable
Virtual memory

The virtual memory usage of the correlator process. For the REST API, the value is in megabytes. For the log field, the value is in kilobytes. For Prometheus, the value is in bytes.

Typical trend: steady.

  • REST API: virtualMemoryMB
  • Java API: not applicable
  • Log field: vm=kB
  • Prometheus metric name: sag_apama_correlator_virtual_memory_bytes
  • Display name: not applicable
Physical memory

The physical memory usage of the correlator process. For the REST API, the value is in megabytes. For the log field, the value is in kilobytes. For Prometheus, the value is in bytes.

Typical trend: steady.

  • REST API: physicalMemoryMB
  • Java API: not applicable
  • Log field: pm=kB
  • Prometheus metric name: sag_apama_correlator_physical_memory_bytes
  • Display name: not applicable
Peak physical memory usage

The highest amount of physical memory used by the correlator at any measurement point since startup, in megabytes. Memory usage is measured whenever a status line is logged or status is requested from the correlator, so this value is the highest such measurement.

Typical trend: steady.

  • REST API: peakPhysicalMemoryMB
  • Java API: not applicable
  • Log field: not applicable
  • Prometheus metric name: not applicable
  • Display name: not applicable
Number of contexts on run queue

The number of contexts on the run queue. These are the contexts that have work to do but are not currently running.

Typical trend: low.

  • REST API: not applicable
  • Java API: not applicable
  • Log field: runq=n
  • Prometheus metric name: not applicable
  • Display name: not applicable
Number of pages read from swap space

The number of pages per second that are being read from swap space. If this is greater than zero, it may indicate that the machine is under-provisioned, which can lead to reduced performance, connection timeouts and other problems. Consider adding more memory, reducing the number of other processes running on the machine, or partitioning your Apama application across multiple machines.

Typical trend: low.

  • REST API: swapPagesRead
  • Java API: not applicable
  • Log field: si=n
  • Prometheus metric name: sag_apama_correlator_swap_pages_read_hertz
  • Display name: not applicable
Number of pages written to swap space

The number of pages per second that are being written to swap space. If this is greater than zero, it may indicate that the machine is under-provisioned, which can lead to reduced performance, connection timeouts and other problems. Consider adding more memory, reducing the number of other processes running on the machine, or partitioning your Apama application across multiple machines.

Typical trend: low.

  • REST API: swapPagesWrite
  • Java API: not applicable
  • Log field: so=n
  • Prometheus metric name: sag_apama_correlator_swap_pages_write_hertz
  • Display name: not applicable
Total heap memory used by the JVM

The total heap memory used by the Java virtual machine (JVM) which is embedded in the correlator. For the REST API, the value is in megabytes. For Prometheus, the value is in bytes. These statistics will only exist if the embedded JVM has been enabled. If the JVM is disabled, the REST API will return 0 (zero) as the value, and Prometheus will not have this metric.

Typical trend: steady.

  • REST API: jvmMemoryHeapUsedMB
  • Java API: not applicable
  • Log field: not applicable
  • Prometheus metric name: sag_apama_correlator_jvm_heap_used_bytes
  • Display name: not applicable
Total free heap memory in the JVM

The total heap memory that is free in the Java virtual machine (JVM) which is embedded in the correlator. For the REST API, the value is in megabytes. For Prometheus, the value is in bytes. These statistics will only exist if the embedded JVM has been enabled. If the JVM is disabled, the REST API will return 0 (zero) as the value, and Prometheus will not have this metric.

Typical trend: steady.

  • REST API: jvmMemoryHeapFreeMB
  • Java API: not applicable
  • Log field: not applicable
  • Prometheus metric name: sag_apama_correlator_jvm_heap_free_bytes
  • Display name: not applicable
Total non-heap memory used by the JVM

The total non-heap memory used by the Java virtual machine (JVM) which is embedded in the correlator. For the REST API, the value is in megabytes. For Prometheus, the value is in bytes. These statistics will only exist if the embedded JVM has been enabled. If the JVM is disabled, the REST API will return 0 (zero) as the value, and Prometheus will not have this metric.

Typical trend: steady.

  • REST API: jvmMemoryNonHeapUsedMB
  • Java API: not applicable
  • Log field: not applicable
  • Prometheus metric name: sag_apama_correlator_jvm_non_heap_used_bytes
  • Display name: not applicable
Total memory used by all buffer pools in the JVM

The sum of memory used by all buffer pools in the Java virtual machine (JVM) which is embedded in the correlator. For the REST API, the value is in megabytes. For Prometheus, the value is in bytes. These statistics will only exist if the embedded JVM has been enabled. If the JVM is disabled, the REST API will return 0 (zero) as the value, and Prometheus will not have this metric.

Typical trend: steady.

  • REST API: jvmMemoryBufferPoolUsedMB
  • Java API: not applicable
  • Log field: not applicable
  • Prometheus metric name: sag_apama_correlator_jvm_buffer_pool_used_bytes
  • Display name: not applicable
Total memory used by the JVM

The sum of all memory used by the Java virtual machine (JVM) which is embedded in the correlator (that is, the used heap memory, the used non-heap memory, and the used buffer pool memory). For the REST API and the log field, the value is in megabytes. For Prometheus, the value is in bytes. These statistics will only exist if the embedded JVM has been enabled. If the JVM is disabled, the REST API and the log field will return 0 (zero) as the value, and Prometheus will not have this metric.

Typical trend: steady.

  • REST API: jvmMemoryAllUsedMB
  • Java API: not applicable
  • Log field: jvm=MB
  • Prometheus metric name: sag_apama_correlator_jvm_memory_all_bytes
  • Display name: not applicable
Number of threads in use by the JVM

The total number of active threads in the Java virtual machine (JVM). These statistics will only exist if the embedded JVM has been enabled. If the JVM is disabled, the REST API will return 0 (zero) as the value, and Prometheus will not have this metric.

Typical trend: steady.

  • REST API: jvmNumThreads
  • Java API: not applicable
  • Log field: not applicable
  • Prometheus metric name: sag_apama_correlator_jvm_num_threads
  • Display name: not applicable