Watching correlator runtime status

The engine_watch tool lets you monitor the runtime operational status of a running correlator. The executable for this tool is located in the Apama/bin directory.

Synopsis

To monitor the operation of a correlator, run the following command:

engine_watch [ options ]

When you run this command with the -h option, the usage message for this command is shown.

Description

The engine_watch tool periodically polls a running correlator for status information and writes the standard status messages to stdout (see List of correlator status statistics for details of the standard status messages). If you also specify the -a option, any user-defined status values are appended to the standard status messages. For additional progress information, use the -v option.
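
For example, the following command polls a correlator every 5 seconds and appends any user-defined status values to the standard status messages (the host, port and interval shown here are illustrative; adjust them for your deployment):

engine_watch --hostname localhost --port 15903 --interval 5000 --all

Because --once is not specified, the tool keeps polling and writing status to stdout until it is stopped.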

Options

The engine_watch tool takes the following options:

  • -h | --help: Displays usage information. Optional.
  • -n host | --hostname host: Name of the host on which the correlator is running. The default is localhost. Non-ASCII characters are not allowed in host names.
  • -p port | --port port: Port on which the correlator is listening. Optional. The default is 15903.
  • -i ms | --interval ms: Specifies the poll interval in milliseconds. Optional. The default is 1000.
  • -f filename | --filename filename: Writes status output to the named file. Optional. The default is to send status information to stdout.
  • -r | --raw: Produces raw output format, which is more suitable for machine parsing. Raw output format consists of a single line for each status message, where each line is a comma-separated list of status numbers. This format can be useful in a test environment. The default is a multi-line, human-readable format for each status message.
  • -a | --all: Outputs all user-defined status values after the standard status messages. Optional. The default is to output only the standard status messages.
  • -t | --title: When specified together with the --raw option, includes headers in the output that make it easy to identify the columns.
  • -o | --once: Outputs one set of status information and then quits. Optional. The default is to return status information indefinitely at the specified poll interval.
  • -v | --verbose: Displays process names and versions in addition to status information. Optional. The default is to display only status information.
  • -V | --version: Displays version information for the engine_watch tool. Optional. The default is that the tool does not output this information.
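
For scripted or test use, the options can be combined. The following sketches (the file name, port and interval are illustrative) first take a single human-readable snapshot that also shows process names and versions, and then capture raw, comma-separated status lines with column headers to a file:

engine_watch -p 15903 -o -v
engine_watch -p 15903 -i 2000 -r -t -f correlator_status.csv

In the second command, each poll produces one comma-separated status line, and the --title option adds headers that identify the columns.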

Exit status

The engine_watch tool returns the following exit values:

  • 0: All status requests were processed successfully.
  • 1: No connection to the correlator was possible or the connection failed.
  • 2: Other error(s) occurred while requesting/processing status.
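
A minimal sketch of acting on the exit status from a shell script, for example to fail fast in a test environment when the correlator cannot be reached (the host and port are illustrative):

engine_watch -n localhost -p 15903 -o > /dev/null
status=$?
if [ "$status" -ne 0 ]; then
    # 1 means the connection failed; 2 means some other error occurred
    echo "correlator status check failed (exit code $status)" >&2
    exit "$status"
fi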

List of correlator status statistics

This topic gives a detailed list of the status values that can be monitored for a correlator. The status values are available through several mechanisms: the REST API, the Java API, the correlator status lines written to the log, Prometheus metrics, and the display names used by monitoring tools. The descriptions below give the corresponding identifier for each mechanism, or indicate "not applicable" where a value is not exposed through that mechanism.
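
As a sketch, assuming the REST API and Prometheus metrics are served from the correlator's port with typical endpoint paths (these paths are not defined in this section, so check the REST API and Prometheus documentation for your version), the values can be fetched from the command line:

curl http://localhost:15903/correlator/status
curl http://localhost:15903/metrics | grep sag_apama_correlator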

The descriptions below also indicate the typical trend. This can be one of the following:

  • Steady: After any start-up phase, this number is typically steady. It may increase during bursts of events, or if the size of the application changes (for example, the number of items the application is tracking). If these numbers trend continually upwards when no more is being asked of the application, that typically indicates an application leak of monitor instances, listeners or objects, which will eventually lead to an out-of-memory condition.
  • Increasing: This number may increase in normal usage. Depending on the deployment, some of these statistics may not increase; however, if a statistic that normally increases stops doing so, this may indicate that something is preventing events from being delivered or processed correctly.
  • Low: This number is typically 0 or near 0. If it increases, this typically indicates that the correlator is not keeping up with processing events. For queues, it is normal for these to be non-zero for some time during bursts of activity. Steadily increasing queue sizes can be a sign of back-pressure due to a slow receiver, or a sign that the system is not keeping up and may eventually block senders because events are not being processed at the rate they arrive.
  • Varies: This number will typically vary. A value of 0 may indicate a problem with events being delivered.
  • None: Typically, all contexts and receivers should be keeping up, so none are reported as slow (in which case, the empty string is returned from the API).

The term “receiver”, as used in the descriptions below, refers to any of the following:

  • EPL, Java or C++ plug-ins using the Correlator.subscribe method.
  • Connectivity plug-ins, for events sent towards the transport.
  • Client library connections, including other correlators that have been connected with the engine_connect or engine_receive tools.
Time since the correlator was started

The time in milliseconds since the correlator was started.

Typical trend: increasing.

  • REST API: uptime
  • Java API: getUptime
  • Log field: not applicable
  • Prometheus metric name: sag_apama_correlator_uptime_seconds
  • Display name: Uptime (ms)
Number of contexts

The number of contexts in the correlator, including the main context.

Typical trend: steady.

  • REST API: numContexts
  • Java API: getNumContexts
  • Log field: nctx=n
  • Prometheus metric name: sag_apama_correlator_contexts_total
  • Display name: Number of contexts
Number of monitors

The number of EPL monitor definitions injected into the correlator. This number changes on injections, deletions or if the last instance of a monitor terminates.

Typical trend: steady.

  • REST API: numMonitors
  • Java API: getNumMonitors
  • Log field: not applicable
  • Prometheus metric name: sag_apama_correlator_monitors_total
  • Display name: Number of monitors
Number of monitor instances

The number of monitor instances, also known as sub-monitors.

Typical trend: steady.

  • REST API: numProcesses
  • Java API: getNumProcesses
  • Log field: sm=n
  • Prometheus metric name: sag_apama_correlator_monitor_instances_total
  • Display name: Number of sub-monitors
Number of Java applications and Java EPL plug-ins

The number of Java applications and Java EPL plug-ins loaded in the correlator. This number changes on injections and deletions.

Typical trend: steady.

  • REST API: numJavaApplications
  • Java API: getNumJavaApplications
  • Log field: not applicable
  • Prometheus metric name: sag_apama_correlator_java_applications_total
  • Display name: Number of Java applications
Number of listeners

The number of listeners in all contexts. This includes on statements and active stream source templates.

Typical trend: steady.

  • REST API: numListeners
  • Java API: getNumListeners
  • Log field: ls=n
  • Prometheus metric name: sag_apama_correlator_listeners_total
  • Display name: Number of listeners
Number of sub-listeners

The number of sub-event-listeners that are active across all contexts. Stream source templates will have one sub-event-listener. An on statement can have multiple sub-event-listeners. See also Evaluating event listeners for all A-events followed by B-events.

Typical trend: steady.

  • REST API: numSubListeners
  • Java API: getNumSubListeners
  • Log field: not applicable
  • Prometheus metric name: sag_apama_correlator_sub_listeners_total
  • Display name: Number of sub-listeners
Number of event types

The number of event types defined within the correlator. This number changes on injections and deletions.

Typical trend: steady.

  • REST API: numEventTypes
  • Java API: getNumEventTypes
  • Log field: not applicable
  • Prometheus metric name: sag_apama_correlator_event_types_total
  • Display name: Number of event types
Number of executors on input queues

The number of executors on the input queues of all contexts. As well as events, this can include clock ticks, spawns, injections and other operations. The input queue of a context that is stuck in an infinite loop will grow by 10 entries per second due to clock ticks. Every context has an input queue, which by default holds a maximum of 20,000 entries.

Typical trend: low.

  • REST API: numQueuedInput
  • Java API: getNumQueuedInput
  • Log field: iq=n
  • Prometheus metric name: sag_apama_correlator_queued_input_total
  • Display name: Events on input queue
Number of received events

The number of events that the correlator has received from external sources since the correlator started. This includes connectivity plug-ins, engine_send, other correlators connected with engine_connect, and events that are not parsed correctly. This number excludes events sent within the correlator from EPL monitors or EPL plug-ins.

Typical trend: increasing.

  • REST API: numReceived
  • Java API: getNumReceived
  • Log field: rx=n
  • Prometheus metric name: sag_apama_correlator_input_total
  • Display name: Events received
Number of processed events

The number of events processed by the correlator in all contexts. This includes external events and events routed to contexts by monitors. An event is considered to have been processed when all listeners and streams that were waiting for it have been triggered, or when it has been determined that there are no listeners for the event.

Typical trend: increasing.

  • REST API: numProcessed
  • Java API: getNumProcessed
  • Log field: not applicable
  • Prometheus metric name: sag_apama_correlator_processed_total
  • Display name: Events processed
Sum of events on route queues

The sum of routed events on the route queues of all contexts.

Typical trend: low.

  • REST API: numQueuedFastTrack
  • Java API: getNumQueuedFastTrack
  • Log field: rq=n
  • Prometheus metric name: sag_apama_correlator_queued_route_total
  • Display name: Events on internal queue
Number of routed events

The number of events that have been routed across all contexts since the correlator was started.

Typical trend: increasing.

  • REST API: numFastTracked
  • Java API: getNumFastTracked
  • Log field: rt=n
  • Prometheus metric name: sag_apama_correlator_route_total
  • Display name: Events routed internally
Number of external consumers/receivers

The number of external consumers/receivers connected to receive emitted events. This includes connectivity plug-ins, engine_receive, or correlators connected using engine_connect.

Typical trend: steady.

  • REST API: numConsumers
  • Java API: getNumConsumers
  • Log field: nc=n
  • Prometheus metric name: sag_apama_correlator_consumers_total
  • Display name: Number of consumers
Number of events on output queues

The number of events waiting on output queues to be dispatched to any connected external consumers/receivers.

Typical trend: low.

  • REST API: numOutEventsQueued
  • Java API: getNumOutEventsQueued
  • Log field: oq=n
  • Prometheus metric name: sag_apama_correlator_queued_output_total
  • Display name: Events on output queue
Number of events created for sending to external channels

The number of events that have been sent (see The send… to statement) or emitted (see The emit statement) to channels which have at least one external consumer/receiver subscribed (see also Number of external consumers/receivers). This excludes events sent to channels with no external consumers/receivers. This counts each event once, even if delivered to multiple external consumers/receivers.

Typical trend: increasing.

  • REST API: numEmits
  • Java API: getNumOutEventsCreated
  • Log field: not applicable
  • Prometheus metric name: sag_apama_correlator_created_output_total
  • Display name: Output events created
Number of events delivered to external consumers/receivers

The number of events that have been delivered to external consumers/receivers. An event is counted once for each external consumer/receiver it is delivered to; that is, this counts the number of deliveries of events.

Note:

This status indicator counts every event that was delivered, whereas the previous status indicator counts every event that was sent. For example, sending one event to a channel with two external consumers/receivers would be counted as one event sent (numEmits), but two events delivered (numOutEventsSent).

Typical trend: increasing.

  • REST API: numOutEventsSent
  • Java API: getNumOutEventsSent
  • Log field: tx=n
  • Prometheus metric name: sag_apama_correlator_output_total
  • Display name: Output events sent
Number of events on input queues of all public contexts

The number of events on the input queues of all public contexts. See also About context properties for information on the receiveInput flag.

Typical trend: low.

  • REST API: numInputQueuedInput
  • Java API: getNumInputQueuedInput
  • Log field: icq=n
  • Prometheus metric name: sag_apama_correlator_queued_input_public_total
  • Display name: Events on input context queues
Name of slowest context

The name of the slowest context. This may or may not be a public context.

Typical trend: none.

  • REST API: mostBackedUpInputContext
  • Java API: getMostBackedUpInput
  • Log field: lcn=name
  • Prometheus metric name: The name of the slowest context is given as a Prometheus label on the Prometheus metric sag_apama_correlator_slowest_input_queue_size_total
  • Display name: Slowest context name
Number of events on queue for slowest context

The number of events on the slowest context’s queue, as identified by the name of the slowest context.

Typical trend: low.

  • REST API: mostBackedUpICQueueSize
  • Java API: getMostBackedUpQueueSize
  • Log field: lcq=n
  • Prometheus metric name: sag_apama_correlator_slowest_input_queue_size_total
  • Display name: Slowest context queue size
Time difference in seconds for slowest context

For the context identified by the slowest context name, this is the time difference in seconds between its current logical time and the most recent time tick added to its input queue.

Typical trend: low.

  • REST API: mostBackedUpICLatency
  • Java API: getMostBackedUpICLatency
  • Log field: lct=seconds
  • Prometheus metric name: sag_apama_correlator_slowest_input_queue_latency_seconds
  • Display name: not applicable
Name of slowest consumer/receiver of events

The name of the consumer/receiver with the largest number of incoming events waiting to be processed. This is the slowest non-context consumer/receiver of events, which can be an external receiver or an EPL plug-in.

Typical trend: none.

  • REST API: slowestReceiver
  • Java API: getSlowestReceiver
  • Log field: srn=name
  • Prometheus metric name: The name of the slowest consumer/receiver of events is given as a Prometheus label on the Prometheus metric sag_apama_correlator_slowest_output_queue_size_total
  • Display name: Slowest receiver name
Number of events on queue for slowest consumer/receiver

The number of events on the slowest consumer’s/receiver’s queue, as identified by the name of the slowest consumer/receiver.

Typical trend: low.

  • REST API: slowestReceiverQueueSize
  • Java API: getSlowestReceiverQueueSize
  • Log field: srq=n
  • Prometheus metric name: sag_apama_correlator_slowest_output_queue_size_total
  • Display name: Slowest receiver queue size
Number of events per second

The number of events per second currently being processed by the correlator across all contexts. This value is computed with every status refresh and is only an approximation.

Typical trend: varies.

  • REST API: not applicable
  • Java API: not applicable
  • Log field: not applicable
  • Prometheus metric name: not applicable
  • Display name: Event rate over last interval
Number of enqueued events

The number of events queued from the enqueue statement (not the enqueue...to statement). The enqueue statement is deprecated.

Typical trend: low.

  • REST API: enqueueQueueSize
  • Java API: not applicable
  • Log field: not applicable
  • Prometheus metric name: not applicable
  • Display name: not applicable
Virtual memory

The virtual memory usage of the correlator process. For the REST API, the value is in megabytes. For the log field, the value is in kilobytes. For Prometheus, the value is in bytes.

Typical trend: steady.

  • REST API: virtualMemoryMB
  • Java API: not applicable
  • Log field: vm=kB
  • Prometheus metric name: sag_apama_correlator_virtual_memory_bytes
  • Display name: not applicable
Physical memory

The physical memory usage of the correlator process. For the REST API, the value is in megabytes. For the log field, the value is in kilobytes. For Prometheus, the value is in bytes.

Typical trend: steady.

  • REST API: physicalMemoryMB
  • Java API: not applicable
  • Log field: pm=kB
  • Prometheus metric name: sag_apama_correlator_physical_memory_bytes
  • Display name: not applicable
Peak physical memory usage

The highest amount of physical memory used by the correlator at any measurement point since startup, in megabytes. Memory usage is measured whenever a status line is logged or status is requested from the correlator, so this value is the highest such measurement.

Typical trend: steady.

  • REST API: peakPhysicalMemoryMB
  • Java API: not applicable
  • Log field: not applicable
  • Prometheus metric name: not applicable
  • Display name: not applicable
Number of contexts on run queue

The number of contexts on the run queue. These are the contexts that have work to do but are not currently running.

Typical trend: low.

  • REST API: not applicable
  • Java API: not applicable
  • Log field: runq=n
  • Prometheus metric name: not applicable
  • Display name: not applicable
Number of pages read from swap space

The number of pages per second that are being read from swap space. If this is greater than zero, it may indicate that the machine is under-provisioned, which can lead to reduced performance, connection timeouts and other problems. Consider adding more memory, reducing the number of other processes running on the machine, or partitioning your Apama application across multiple machines.

Typical trend: low.

  • REST API: swapPagesRead
  • Java API: not applicable
  • Log field: si=n
  • Prometheus metric name: sag_apama_correlator_swap_pages_read_hertz
  • Display name: not applicable
Number of pages written to swap space

The number of pages per second that are being written to swap space. If this is greater than zero, it may indicate that the machine is under-provisioned, which can lead to reduced performance, connection timeouts and other problems. Consider adding more memory, reducing the number of other processes running on the machine, or partitioning your Apama application across multiple machines.

Typical trend: low.

  • REST API: swapPagesWrite
  • Java API: not applicable
  • Log field: so=n
  • Prometheus metric name: sag_apama_correlator_swap_pages_write_hertz
  • Display name: not applicable
Total heap memory used by the JVM

The total heap memory used by the Java virtual machine (JVM) which is embedded in the correlator. For the REST API, the value is in megabytes. For Prometheus, the value is in bytes. These statistics will only exist if the embedded JVM has been enabled. If the JVM is disabled, the REST API will return 0 (zero) as the value, and Prometheus will not have this metric.

Typical trend: steady.

  • REST API: jvmMemoryHeapUsedMB
  • Java API: not applicable
  • Log field: not applicable
  • Prometheus metric name: sag_apama_correlator_jvm_heap_used_bytes
  • Display name: not applicable
Total free heap memory in the JVM

The total heap memory that is free in the Java virtual machine (JVM) which is embedded in the correlator. For the REST API, the value is in megabytes. For Prometheus, the value is in bytes. These statistics will only exist if the embedded JVM has been enabled. If the JVM is disabled, the REST API will return 0 (zero) as the value, and Prometheus will not have this metric.

Typical trend: steady.

  • REST API: jvmMemoryHeapFreeMB
  • Java API: not applicable
  • Log field: not applicable
  • Prometheus metric name: sag_apama_correlator_jvm_heap_free_bytes
  • Display name: not applicable
Total non-heap memory used by the JVM

The total non-heap memory used by the Java virtual machine (JVM) which is embedded in the correlator. For the REST API, the value is in megabytes. For Prometheus, the value is in bytes. These statistics will only exist if the embedded JVM has been enabled. If the JVM is disabled, the REST API will return 0 (zero) as the value, and Prometheus will not have this metric.

Typical trend: steady.

  • REST API: jvmMemoryNonHeapUsedMB
  • Java API: not applicable
  • Log field: not applicable
  • Prometheus metric name: sag_apama_correlator_jvm_non_heap_used_bytes
  • Display name: not applicable
Total memory used by all buffer pools in the JVM

The sum of memory used by all buffer pools in the Java virtual machine (JVM) which is embedded in the correlator. For the REST API, the value is in megabytes. For Prometheus, the value is in bytes. These statistics will only exist if the embedded JVM has been enabled. If the JVM is disabled, the REST API will return 0 (zero) as the value, and Prometheus will not have this metric.

Typical trend: steady.

  • REST API: jvmMemoryBufferPoolUsedMB
  • Java API: not applicable
  • Log field: not applicable
  • Prometheus metric name: sag_apama_correlator_jvm_buffer_pool_used_bytes
  • Display name: not applicable
Total memory used by the JVM

The sum of all memory used by the Java virtual machine (JVM) which is embedded in the correlator (that is, the used heap memory, the used non-heap memory, and the used buffer pool memory). For the REST API and the log field, the value is in megabytes. For Prometheus, the value is in bytes. These statistics will only exist if the embedded JVM has been enabled. If the JVM is disabled, the REST API and the log field will return 0 (zero) as the value, and Prometheus will not have this metric.

Typical trend: steady.

  • REST API: jvmMemoryAllUsedMB
  • Java API: not applicable
  • Log field: jvm=MB
  • Prometheus metric name: sag_apama_correlator_jvm_memory_all_bytes
  • Display name: not applicable
Number of threads in use by the JVM

The total number of active threads in the Java virtual machine (JVM). These statistics will only exist if the embedded JVM has been enabled. If the JVM is disabled, the REST API will return 0 (zero) as the value, and Prometheus will not have this metric.

Typical trend: steady.

  • REST API: jvmNumThreads
  • Java API: not applicable
  • Log field: not applicable
  • Prometheus metric name: sag_apama_correlator_jvm_num_threads
  • Display name: not applicable