Watching correlator runtime status
The engine_watch
tool lets you monitor the runtime operational status of a running correlator. The executable for this tool is located in the Apama/bin
directory.
Synopsis
To monitor the operation of a correlator, run the following command:
engine_watch [ options ]
When you run this command with the -h option, the usage message for this command is shown.
Description
The engine_watch
tool periodically polls a correlator for status information, writing the standard status messages to stdout
(see List of correlator status statistics for more information on the standard status messages). When you also specify the -a
option, any user-defined status values are appended to the standard status messages. For additional progress information, use the -v
option.
Options
The engine_watch
tool takes the following options:
| Option | Description |
| --- | --- |
| -h \| --help | Displays usage information. Optional. |
| -n host \| --hostname host | Name of the host on which the correlator is running. Optional. The default is localhost. |
| -p port \| --port port | Port on which the correlator is listening. Optional. The default is 15903. |
| -i ms \| --interval ms | Specifies the poll interval in milliseconds. Optional. The default is 1000 milliseconds. |
| -f file \| --filename file | Writes status output to the named file. Optional. The default is to send status information to stdout. |
| -r \| --raw | Indicates that you want raw output format, which is more suitable for machine parsing. Raw output format consists of a single line for each status message. Each line is a comma-separated list of status numbers. This format can be useful in a test environment. If you do not specify that you want raw output format, the default is a multi-line, human-readable format for each status message. |
| -a \| --all | Outputs all user-defined status values after the standard status messages. Optional. The default is to output only the standard status messages. |
| -t \| --title | If you also specify the -r option, this option outputs a title line identifying each field of the raw output. Optional. |
| -o \| --once | Outputs one set of status information and then quits. Optional. The default is to indefinitely return status information at the specified poll interval. |
| -v \| --verbose | Displays process names and versions in addition to status information. Optional. The default is to display only status information. |
| -V \| --version | Displays version information for the engine_watch tool. |
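For example, the following hypothetical invocation polls a correlator on the default port every 5 seconds and writes raw output to a file:

engine_watch -p 15903 -i 5000 -r -f status.csv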
Exit status
The engine_watch
tool returns the following exit values:
| Value | Description |
| --- | --- |
| 0 | All status requests were processed successfully. |
| 1 | No connection to the correlator was possible or the connection failed. |
| 2 | Other errors occurred while requesting or processing status. |
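The exit value makes it straightforward to script health checks around the tool. Below is a minimal Python sketch that runs engine_watch once in raw mode and maps the documented exit values to messages; it assumes engine_watch is on the PATH and a correlator on the default port.

```python
# Minimal sketch: run engine_watch once (-o) in raw mode (-r) and interpret
# the documented exit values. Assumes engine_watch is on the PATH and a
# correlator is listening on the default port.
import subprocess

result = subprocess.run(
    ["engine_watch", "-o", "-r", "-p", "15903"],
    capture_output=True,
    text=True,
)
if result.returncode == 0:
    # Raw output: one comma-separated line of status numbers per poll.
    print(result.stdout.strip())
elif result.returncode == 1:
    print("No connection to the correlator was possible or the connection failed.")
else:
    print("Error while requesting or processing status:", result.stderr.strip())
```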
List of correlator status statistics
This topic gives a detailed list of the status values that can be monitored for a correlator. The descriptions below show where the status values are used. The status is available through the following mechanisms:

- REST API: The name of the key in the REST API, which can be read as shown in the sketch after this list. See also Managing and monitoring over REST and the descriptions of /correlator/status and /info/stats in the API reference for Component Management REST APIs.
- Java API: The name of the method used to retrieve the value through the Java client API.
- Log field: The name of the status log field in the Status: log lines in the main correlator log file. See also Correlator status log fields.
- Prometheus metric name: The name used to expose internal correlator statistics to the Prometheus monitoring system. See also Monitoring with Prometheus.
- Display name: The standard status message that the engine_watch tool writes to stdout (see Watching correlator runtime status).
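The REST API names below are keys in the status document served by the correlator. As a minimal sketch, the following reads /correlator/status and prints a few of the values described in this topic; it assumes a correlator on localhost:15903 and a JSON response (check the Component Management REST API reference for the exact response format of your version).

```python
# Minimal sketch: fetch correlator status over REST and print a few keys.
# Host, port and the JSON response format are assumptions; adjust for your
# deployment and Apama version.
import json
from urllib.request import Request, urlopen

request = Request(
    "http://localhost:15903/correlator/status",
    headers={"Accept": "application/json"},
)
with urlopen(request) as response:
    status = json.load(response)

print("uptime (ms):    ", status["uptime"])
print("contexts:       ", status["numContexts"])
print("events received:", status["numReceived"])
```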
The descriptions below also indicate the typical trend. This can be one of the following:
- Steady: After any start-up phase, this number would typically be steady. It may increase as bursts of events come in, or if there is a change in the size of the application (for example, the number of items the application is tracking). If these numbers are continually trending upwards when no more is being asked of the application, that typically indicates an application leak of monitor instances, listeners or objects, which will eventually lead to an out-of-memory condition.
- Increasing: This may be increasing in normal usage. Depending on the deployment, some statistics may not be increasing; if they normally increase and have stopped, this may indicate that something is preventing events from being delivered or processed correctly.
- Low: This number is typically 0 or near 0. If this number increases, this typically indicates that the correlator is not keeping up with processing events. For queues, it is normal that during bursts of activity these may be non-zero for some time. Steadily increasing queue sizes can be a sign of back-pressure from a slow receiver, or a sign that the system is not processing events at the rate they arrive and may eventually block senders.
- Varies: Will typically vary. A value of 0 may indicate a problem with events being delivered.
- None: Typically, all contexts and receivers should be keeping up, so none are reported as slow (in which case, the empty string is returned from the API).
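To illustrate the "low" trend, the following sketch polls one of the queue statistics described below and warns if it grows on every poll, the back-pressure pattern described above. The get_status callable is a hypothetical helper that returns the REST status map (for example, built from the sketch above).

```python
# Minimal sketch: flag a "low" statistic that grows on every poll, which can
# indicate back-pressure. get_status is a hypothetical helper returning the
# REST status map (see the sketch above).
import time

def watch_input_queue(get_status, polls=10, interval=5.0):
    previous = None
    growing = 0
    for _ in range(polls):
        size = int(get_status()["numQueuedInput"])
        if previous is not None and size > previous:
            growing += 1
        previous = size
        time.sleep(interval)
    if growing == polls - 1:
        print("Input queues grew on every poll; the correlator may not be keeping up.")
```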
The term "receiver" used in the descriptions below refers to any of the following:

- EPL, Java or C++ plug-ins using the Correlator.subscribe method.
- Connectivity plug-ins for "towards" transport events.
- Client library connections, including other correlators that have been connected with the engine_connect or engine_receive tools.
Time since the correlator was started
The time in milliseconds since the correlator was started.
Typical trend: increasing.
- REST API:
uptime
- Java API:
getUptime
- Log field: not applicable
- Prometheus metric name:
sag_apama_correlator_uptime_seconds
- Display name:
Uptime (ms)
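As an illustration of how the Prometheus names appear, the following sketch scrapes the metrics endpoint and prints the uptime metric. The /metrics path on the correlator's port is an assumption; see Monitoring with Prometheus for the definitive configuration.

```python
# Minimal sketch: read the Prometheus exposition text and print one metric.
# The /metrics path and the port are assumptions; adjust for your deployment.
from urllib.request import urlopen

with urlopen("http://localhost:15903/metrics") as response:
    for line in response.read().decode("utf-8").splitlines():
        if line.startswith("sag_apama_correlator_uptime_seconds"):
            print(line)
```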
Number of contexts
The number of contexts in the correlator, including the main context.
Typical trend: steady.
- REST API:
numContexts
- Java API:
getNumContexts
- Log field:
nctx=n
- Prometheus metric name:
sag_apama_correlator_contexts_total
- Display name:
Number of contexts
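Log fields such as nctx can be pulled straight out of the Status: lines in the correlator log. A minimal sketch, using an illustrative (not real) log line:

```python
# Minimal sketch: extract key=value status fields (for example, nctx) from a
# correlator "Status:" log line. The line below is illustrative, not real
# correlator output.
import re

line = "INFO  [12345] - Correlator Status: sm=10 nctx=2 ls=5 rx=100 tx=50"
fields = dict(re.findall(r"(\w+)=(\S+)", line.split("Status:", 1)[1]))
print(fields["nctx"])  # -> "2"
```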
Number of monitors
The number of EPL monitor definitions injected into the correlator. This number changes on injection or deletion, or when the last instance of a monitor terminates.
Typical trend: steady.
- REST API:
numMonitors
- Java API:
getNumMonitors
- Log field: not applicable
- Prometheus metric name:
sag_apama_correlator_monitors_total
- Display name:
Number of monitors
Number of monitor instances
The number of monitor instances, also known as sub-monitors.
Typical trend: steady.
- REST API:
numProcesses
- Java API:
getNumProcesses
- Log field:
sm=n
- Prometheus metric name:
sag_apama_correlator_monitor_instances_total
- Display name:
Number of sub-monitors
Number of Java applications and Java EPL plug-ins
The number of Java applications and Java EPL plug-ins loaded in the correlator. This number changes on injections and deletions.
Typical trend: steady.
- REST API:
numJavaApplications
- Java API:
getNumJavaApplications
- Log field: not applicable
- Prometheus metric name:
sag_apama_correlator_java_applications_total
- Display name:
Number of Java applications
Number of listeners
The number of listeners in all contexts. This includes on
statements and active stream source templates.
Typical trend: steady.
- REST API:
numListeners
- Java API:
getNumListeners
- Log field:
ls=n
- Prometheus metric name:
sag_apama_correlator_listeners_total
- Display name:
Number of listeners
Number of sub-listeners
The number of sub-event-listeners that are active across all contexts. Stream source templates will have one sub-event-listener. An on
statement can have multiple sub-event-listeners. See also Evaluating event listeners for all A-events followed by B-events.
Typical trend: steady.
- REST API:
numSubListeners
- Java API:
getNumSubListeners
- Log field: not applicable
- Prometheus metric name:
sag_apama_correlator_sub_listeners_total
- Display name:
Number of sub-listeners
Number of event types
The number of event types defined within the correlator. This number changes on injections and deletions.
Typical trend: steady.
- REST API:
numEventTypes
- Java API:
getNumEventTypes
- Log field: not applicable
- Prometheus metric name:
sag_apama_correlator_event_types_total
- Display name:
Number of event types
Number of executors on input queues
The number of executors on the input queues of all contexts. As well as events, this can include clock ticks, spawns, injections and other operations. The input queue of a context that is stuck in an infinite loop will grow by 10 entries per second due to clock ticks. Every context has an input queue, which by default holds a maximum of 20,000 entries.
Typical trend: low.
- REST API:
numQueuedInput
- Java API:
getNumQueuedInput
- Log field:
iq=n
- Prometheus metric name:
sag_apama_correlator_queued_input_total
- Display name:
Events on input queue
Number of received events
The number of events that the correlator has received from external sources since the correlator started. This includes connectivity plug-ins, engine_send
, other correlators connected with engine_connect
, and events that are not parsed correctly. This number excludes events sent within the correlator from EPL monitors or EPL plug-ins.
Typical trend: increasing.
- REST API:
numReceived
- Java API:
getNumReceived
- Log field:
rx=n
- Prometheus metric name:
sag_apama_correlator_input_total
- Display name:
Events received
Number of processed events
The number of events processed by the correlator in all contexts. This includes external events and events routed to contexts by monitors. An event is considered to have been processed when all listeners and streams that were waiting for it have been triggered, or when it has been determined that there are no listeners for the event.
Typical trend: increasing.
- REST API:
numProcessed
- Java API:
getNumProcessed
- Log field:
not applicable
- Prometheus metric name:
sag_apama_correlator_processed_total
- Display name:
Events processed
Sum of events on route queues
The sum of routed events on the route queues of all contexts.
Typical trend: low.
- REST API:
numQueuedFastTrack
- Java API:
getNumQueuedFastTrack
- Log field:
rq=n
- Prometheus metric name:
sag_apama_correlator_queued_route_total
- Display name:
Events on internal queue
Number of routed events
The number of events that have been routed across all contexts since the correlator was started.
Typical trend: increasing.
- REST API:
numFastTracked
- Java API:
getNumFastTracked
- Log field:
rt=n
- Prometheus metric name:
sag_apama_correlator_route_total
- Display name:
Events routed internally
Number of external consumers/receivers
The number of external consumers/receivers connected to receive emitted events. This includes connectivity plug-ins, engine_receive
, or correlators connected using engine_connect
.
Typical trend: steady.
- REST API:
numConsumers
- Java API:
getNumConsumers
- Log field:
nc=n
- Prometheus metric name:
sag_apama_correlator_consumers_total
- Display name:
Number of consumers
Number of events on output queues
The number of events waiting on output queues to be dispatched to any connected external consumers/receivers.
Typical trend: low.
- REST API:
numOutEventsQueued
- Java API:
getNumOutEventsQueued
- Log field:
oq=n
- Prometheus metric name:
sag_apama_correlator_queued_output_total
- Display name:
Events on output queue
Number of events created for sending to external channels
The number of events that have been sent (see The send… to statement) or emitted (see The emit statement) to channels which have at least one external consumer/receiver subscribed (see also Number of external consumers/receivers). This excludes events sent to channels with no external consumers/receivers. This counts each event once, even if delivered to multiple external consumers/receivers.
Typical trend: increasing.
- REST API:
numEmits
- Java API:
getNumOutEventsCreated
- Log field:
not applicable
- Prometheus metric name:
sag_apama_correlator_created_output_total
- Display name:
Output events created
Number of events delivered to external consumers/receivers
The number of events that have been delivered to external consumers/receivers. An event is counted once for each external consumer/receiver it is sent to, so this is the number of deliveries rather than the number of distinct events.
Note:
This status indicator counts every event that was delivered, whereas the previous status indicator counts every event that was sent. For example, sending one event to a channel with two external consumers/receivers would be counted as one event sent (numEmits
), but two events delivered (numOutEventsSent
).
Typical trend: increasing.
- REST API:
numOutEventsSent
- Java API:
getNumOutEventsSent
- Log field:
tx=n
- Prometheus metric name:
sag_apama_correlator_output_total
- Display name:
Output events sent
Number of events on input queues of all public contexts
The number of events on the input queues of all public contexts. See also About context properties for information on the receiveInput
flag.
Typical trend: low.
- REST API:
numInputQueuedInput
- Java API:
getNumInputQueuedInput
- Log field:
icq=n
- Prometheus metric name:
sag_apama_correlator_queued_input_public_total
- Display name:
Events on input context queues
Name of slowest context
The name of the slowest context. This may or may not be a public context.
Typical trend: none.
- REST API:
mostBackedUpInputContext
- Java API:
getMostBackedUpInput
- Log field:
lcn=name
- Prometheus metric name: The name of the slowest context is given as a Prometheus label on the Prometheus metric
sag_apama_correlator_slowest_input_queue_size_total
- Display name:
Slowest context name
Number of events on queue for slowest context
The number of events on the slowest context’s queue, as identified by the name of the slowest context.
Typical trend: low.
- REST API:
mostBackedUpICQueueSize
- Java API:
getMostBackedUpQueueSize
- Log field:
lcq=n
- Prometheus metric name:
sag_apama_correlator_slowest_input_queue_size_total
- Display name:
Slowest context queue size
Time difference in seconds for slowest context
For the context identified by the slowest context name, this is the time difference in seconds between its current logical time and the most recent time tick added to its input queue.
Typical trend: low.
- REST API:
mostBackedUpICLatency
- Java API:
getMostBackedUpICLatency
- Log field:
lct=seconds
- Prometheus metric name:
sag_apama_correlator_slowest_input_queue_latency_seconds
- Display name: not applicable
Name of slowest consumer/receiver of events
The name of the consumer/receiver with the largest number of incoming events waiting to be processed. This is the slowest non-context consumer/receiver of events, which can be an external receiver or an EPL plug-in.
Typical trend: none.
- REST API:
slowestReceiver
- Java API:
getSlowestReceiver
- Log field:
srn=name
- Prometheus metric name: The name of the slowest consumer/receiver of events is given as a Prometheus label on the Prometheus metric
sag_apama_correlator_slowest_output_queue_size_total
- Display name:
Slowest receiver name
Number of events on queue for slowest consumer/receiver
The number of events on the slowest consumer’s/receiver’s queue, as identified by the name of the slowest consumer/receiver.
Typical trend: low.
- REST API:
slowestReceiverQueueSize
- Java API:
getSlowestReceiverQueueSize
- Log field:
srq=n
- Prometheus metric name:
sag_apama_correlator_slowest_output_queue_size_total
- Display name:
Slowest receiver queue size
Number of events per second
The number of events per second currently being processed by the correlator across all contexts. This value is computed with every status refresh and is only an approximation.
Typical trend: varies.
- REST API: not applicable
- Java API: not applicable
- Log field: not applicable
- Prometheus metric name: not applicable
- Display name:
Event rate over last interval
Number of enqueued events
The number of events queued from the enqueue
statement (not the enqueue...to
statement). The enqueue
statement is deprecated.
Typical trend: low.
- REST API:
enqueueQueueSize
- Java API: not applicable
- Log field: not applicable
- Prometheus metric name: not applicable
- Display name: not applicable
Virtual memory
The virtual memory used by the correlator process. For the REST API, the value is in megabytes. For the log field, the value is in kilobytes. For Prometheus, the value is in bytes.
Typical trend: steady.
- REST API:
virtualMemoryMB
- Java API: not applicable
- Log field:
vm=kB
- Prometheus metric name:
sag_apama_correlator_virtual_memory_bytes
- Display name: not applicable
Physical memory
The physical memory used by the correlator process. For the REST API, the value is in megabytes. For the log field, the value is in kilobytes. For Prometheus, the value is in bytes.
Typical trend: steady.
- REST API:
physicalMemoryMB
- Java API: not applicable
- Log field:
pm=kB
- Prometheus metric name:
sag_apama_correlator_physical_memory_bytes
- Display name: not applicable
Peak physical memory usage
The highest amount of physical memory used by the correlator at any measurement point since startup, in megabytes. Measurements are taken when a status line is logged or when status is requested from the correlator.
Typical trend: steady.
- REST API:
peakPhysicalMemoryMB
- Java API: not applicable
- Log field: not applicable
- Prometheus metric name: not applicable
- Display name: not applicable
Number of contexts on run queue
The number of contexts on the run queue. These are the contexts that have work to do but are not currently running.
Typical trend: low.
- REST API: not applicable
- Java API: not applicable
- Log field:
runq=n
- Prometheus metric name: not applicable
- Display name: not applicable
Number of pages read from swap space
The number of pages per second that are being read from swap space. If this is greater than zero, it may indicate that the machine is under-provisioned, which can lead to reduced performance, connection timeouts and other problems. Consider adding more memory, reducing the number of other processes running on the machine, or partitioning your Apama application across multiple machines.
Typical trend: low.
- REST API:
swapPagesRead
- Java API: not applicable
- Log field:
si=n
- Prometheus metric name:
sag_apama_correlator_swap_pages_read_hertz
- Display name: not applicable
Number of pages written to swap space
The number of pages per second that are being written to swap space. If this is greater than zero, it may indicate that the machine is under-provisioned, which can lead to reduced performance, connection timeouts and other problems. Consider adding more memory, reducing the number of other processes running on the machine, or partitioning your Apama application across multiple machines.
Typical trend: low.
- REST API:
swapPagesWrite
- Java API: not applicable
- Log field:
so=n
- Prometheus metric name:
sag_apama_correlator_swap_pages_write_hertz
- Display name: not applicable
Total heap memory used by the JVM
The total heap memory used by the Java virtual machine (JVM) which is embedded in the correlator. For the REST API, the value is in megabytes. For Prometheus, the value is in bytes. These statistics will only exist if the embedded JVM has been enabled. If the JVM is disabled, the REST API will return 0 (zero) as the value, and Prometheus will not have this metric.
Typical trend: steady.
- REST API:
jvmMemoryHeapUsedMB
- Java API: not applicable
- Log field: not applicable
- Prometheus metric name:
sag_apama_correlator_jvm_heap_used_bytes
- Display name: not applicable
Total free heap memory in the JVM
The total heap memory that is free in the Java virtual machine (JVM) which is embedded in the correlator. For the REST API, the value is in megabytes. For Prometheus, the value is in bytes. These statistics will only exist if the embedded JVM has been enabled. If the JVM is disabled, the REST API will return 0 (zero) as the value, and Prometheus will not have this metric.
Typical trend: steady.
- REST API:
jvmMemoryHeapFreeMB
- Java API: not applicable
- Log field: not applicable
- Prometheus metric name:
sag_apama_correlator_jvm_heap_free_bytes
- Display name: not applicable
Total non-heap memory used by the JVM
The total non-heap memory used by the Java virtual machine (JVM) which is embedded in the correlator. For the REST API, the value is in megabytes. For Prometheus, the value is in bytes. These statistics will only exist if the embedded JVM has been enabled. If the JVM is disabled, the REST API will return 0 (zero) as the value, and Prometheus will not have this metric.
Typical trend: steady.
- REST API:
jvmMemoryNonHeapUsedMB
- Java API: not applicable
- Log field: not applicable
- Prometheus metric name:
sag_apama_correlator_jvm_non_heap_used_bytes
- Display name: not applicable
Total memory used by all buffer pools in the JVM
The sum of memory used by all buffer pools in the Java virtual machine (JVM) which is embedded in the correlator. For the REST API, the value is in megabytes. For Prometheus, the value is in bytes. These statistics will only exist if the embedded JVM has been enabled. If the JVM is disabled, the REST API will return 0 (zero) as the value, and Prometheus will not have this metric.
Typical trend: steady.
- REST API:
jvmMemoryBufferPoolUsedMB
- Java API: not applicable
- Log field: not applicable
- Prometheus metric name:
sag_apama_correlator_jvm_buffer_pool_used_bytes
- Display name: not applicable
Total memory used by the JVM
The sum of all memory used by the Java virtual machine (JVM) which is embedded in the correlator (that is, the used heap memory, the used non-heap memory, and the used buffer pool memory). For the REST API and the log field, the value is in megabytes. For Prometheus, the value is in bytes. These statistics will only exist if the embedded JVM has been enabled. If the JVM is disabled, the REST API and the log field will return 0 (zero) as the value, and Prometheus will not have this metric.
Typical trend: steady.
- REST API:
jvmMemoryAllUsedMB
- Java API: not applicable
- Log field:
jvm=MB
- Prometheus metric name:
sag_apama_correlator_jvm_memory_all_bytes
- Display name: not applicable
Number of threads in use by the JVM
The total number of active threads in the Java virtual machine (JVM). These statistics will only exist if the embedded JVM has been enabled. If the JVM is disabled, the REST API will return 0 (zero) as the value, and Prometheus will not have this metric.
Typical trend: steady.
- REST API:
jvmNumThreads
- Java API: not applicable
- Log field: not applicable
- Prometheus metric name:
sag_apama_correlator_jvm_num_threads
- Display name: not applicable