Troubleshooting and diagnostics

Info: If you have Apama Smart Rules-only (also called Apama-ctrl-smartrules), the functionality described in this topic does not apply.

Downloading diagnostics and logs

If a user has READ permission for “CEP management”, then two links for downloading diagnostics information are available in both the Apama EPL Apps and Apama Analytics Builder web applications: one for downloading basic diagnostics information (the Diagnostics link) and another one for downloading enhanced (more resource-intensive) diagnostics information (the Enhanced link). These links are shown at the bottom of the web application’s starting page (that is, in the EPL app manager and in the model manager).

It can be useful to capture this diagnostics information when you experience problems or when you debug EPL apps. It is also useful to attach it when you file a support ticket.

You can see a version number in the EPL app manager and in the model manager. It is shown next to the above links.

Basic diagnostics information is provided in a ZIP file named diagnostic-overview<timestamp>.zip and includes the following information (the file is typically a few megabytes in size and is generated in about 5 seconds):

Enhanced diagnostics information is provided in a ZIP file named diagnostic-enhanced<timestamp>.zip and includes the following information:

What a user can see or do depends on the permissions:

Log files of the Apama-ctrl microservice

There are two ways to get the logs of the Apama-ctrl microservice:

Contact Software AG Support if needed.

Diagnostics REST endpoints

The following diagnostics endpoints are available for REST requests. These require authentication as a user with READ permission for “CEP management”:

Alarms generated by the Apama-ctrl microservice

Alarms are created by user applications in the Cumulocity IoT tenant (for example, by a smart rule, an Analytics Builder model, or an activated EPL file). To learn about alarms in general, refer to Working with alarms in the User guide. The Apama-ctrl microservice also generates alarms itself when it encounters a problem. For example, if the Apama license is about to expire, the Apama-ctrl microservice raises an alarm in the platform so that the user is notified about the situation. The information below is about alarms that are generated by the Apama-ctrl microservice, their causes, their consequences, and possible ways to resolve them.

Info: Alarms generated by Apama-ctrl about its own state are available as of Apama Analytics Builder 10.5.0 and Apama EPL Apps 10.5.0.

You can view alarms in the following ways:

  1. In the Cockpit application. See Cockpit in the User guide for detailed information.
  2. In the Administration application, in the Applications menu under Subscribed applications (or possibly under Own applications). Click the card for Apama-ctrl and then click Status. See Managing applications and especially its subsection Monitoring microservices in the User guide for detailed information.
  3. From the Apama EPL Apps and Apama Analytics Builder applications. Click the Diagnostics (or Enhanced) link which is provided with both applications. A ZIP file is then downloaded that contains alarms information under /alarm/alarms_apama-ctrl-object.json. See Downloading diagnostics and logs for detailed information.
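Once the ZIP file has been downloaded, the alarms file can be read with a few lines of script. The following is a minimal Python sketch, not an official tool: the helper name read_json_member is made up for illustration, and nothing is assumed about the structure of the JSON beyond it being valid JSON. Entries are matched by suffix, because the exact layout of the entry names inside the archive is not specified here.

```python
import io
import json
import zipfile


def read_json_member(zip_source, member_suffix):
    """Return the parsed JSON of the first ZIP entry whose name ends with member_suffix.

    zip_source may be a file path or a binary file-like object.
    """
    wanted = member_suffix.lstrip("/")
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            # Match on the suffix so that a leading slash or a top-level
            # folder inside the archive does not matter.
            if name.endswith(wanted):
                with archive.open(name) as member:
                    return json.load(member)
    raise KeyError(f"no entry ending with {member_suffix!r} in the archive")


# Self-contained demonstration with an in-memory archive (made-up content).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("alarm/alarms_apama-ctrl-object.json", json.dumps({"alarms": []}))
demo.seek(0)
print(read_json_member(demo, "/alarm/alarms_apama-ctrl-object.json"))
```

With a real download, pass the path of the diagnostics ZIP file instead of the in-memory archive.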

Alarm severities

Severity   Description
CRITICAL   Apama-ctrl was unable to continue running the user’s applications and will require corrective action.
MAJOR      Apama-ctrl has encountered a situation that will result in some loss of service (for example, due to a restart).
MINOR      Apama-ctrl has a problem that you might want to fix.
WARNING    There is a warning.

Alarms created by the Apama-ctrl microservice

Apama-ctrl can create alarms to notify users in scenarios such as Apama license expiry, the correlator running out of memory, or uncaught exceptions in activated EPL files. Once you see an alarm in the Cumulocity IoT tenant, you should diagnose it and resolve it depending on the severity level of the raised alarm. Each alarm has details such as title, text, type, date, and count (the number of times the alarm has been raised).

The following is a list of the alarms. The information further down below explains when these alarms will occur, their consequences, and how to resolve them.

Once the cause of an alarm is resolved, you have to acknowledge and clear the alarm in the Cumulocity IoT tenant. Otherwise, you will continue to see the alarm until a further restart of the Apama-ctrl microservice.

Info: The alarm texts for the alarms below may undergo minor changes in the future.

Change in tenant options and restart of Apama-ctrl

This alarm is raised only when a tenant option changes in the analytics.builder category. For details on the tenant options, refer to Option in the Reference guide.

Apama Analytics Builder allows you to configure its settings by changing the tenant options, using key names such as numWorkerThreads or status_device_name. For example, if you want to process things in parallel, you can set numWorkerThreads to 3 by sending a REST request to Cumulocity IoT, which will update the tenant option. Such a change requires a restart of the Apama-ctrl microservice. To notify the users about the restart, Apama-ctrl raises an alarm, saying that Apama has detected a change in a tenant option and will restart in order to use it.
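As an illustration of such a REST request, the following Python sketch builds (but does not send) a request that sets numWorkerThreads to 3. It assumes the Cumulocity IoT tenant options API accepts a PUT to /tenant/options/{category}/{key} with a JSON body containing the value as a string, and that basic authentication is used. The base URL, the credentials, and the helper name are placeholders for illustration.

```python
import base64
import json
import urllib.request


def build_tenant_option_request(base_url, username, password, key, value,
                                category="analytics.builder"):
    """Build a PUT request that updates one tenant option (nothing is sent here)."""
    # Assumed endpoint layout: /tenant/options/{category}/{key}
    url = f"{base_url.rstrip('/')}/tenant/options/{category}/{key}"
    # Tenant option values are sent as strings in this sketch.
    body = json.dumps({"value": str(value)}).encode("utf-8")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


# Placeholder tenant URL and credentials, for illustration only.
request = build_tenant_option_request(
    "https://example.invalid", "user", "secret", "numWorkerThreads", 3)
print(request.get_method(), request.full_url)
```

To actually send the request, pass it to urllib.request.urlopen. If the option does not exist yet, creating it with a POST to /tenant/options may be required instead; check the Reference guide for your version.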

Once you see this alarm, you can be sure that your change is effective.

No license found on startup

This alarm is raised when the correlator cannot find a valid Apama license. Software AG provides an Apama license file for use during the product installation. For more details, see About Apama license files in the Apama documentation.

The Apama correlator looks for a license key during its startup. If a license key is not found, the correlator continues to run with limited capabilities, such as restrictions on the number of EPL monitors, query definitions, etc. For more details on the limited capabilities, refer to Running Apama without a license file in the Apama documentation.

When a valid Apama license is not found, an alarm of MAJOR severity with the above type and text is raised by the Apama-ctrl microservice. You can either continue without an Apama license with limited capabilities or upload a valid Apama license file to get the full functionality of Apama EPL Apps and Apama Analytics Builder. For getting a license, refer to License file in the Apama documentation.

License change and restart of Apama-ctrl

This alarm is raised when Apama-ctrl restarts due to license changes.

The Apama-ctrl microservice detects changes in the license file. To apply the latest changes, the Apama-ctrl microservice has to be restarted. To notify the users about the restart, the microservice raises an alarm with MINOR severity.

It is recommended that you also check for any other alarms after the restart, just to make sure Apama-ctrl is able to start with the new license.

License expiry

This alarm is raised when the license is about to expire. It is repeated seven days prior to the license expiry date.

The Apama license key has the expiry date as one of its parameters. During the startup process, the Apama-ctrl microservice checks the license expiry date against the current date and raises an alarm with WARNING severity seven days prior to the expiry date. In order to continue without any intervention, you should renew the license before the expiry date. If the license expires, there is a grace period of seven days, beyond which the correlator will exit and kill the microservice.
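The timeline described above can be summarized in a small Python sketch. It only illustrates the seven-day warning window and the seven-day grace period; it is not how Apama-ctrl is implemented. The state names and the exact boundary handling (for example, whether the license is still valid on the expiry date itself) are assumptions.

```python
from datetime import date, timedelta

WARNING_WINDOW = timedelta(days=7)  # alarm is raised this long before expiry
GRACE_PERIOD = timedelta(days=7)    # correlator keeps running this long after expiry


def license_state(expiry, today):
    """Classify a date relative to the license expiry date.

    Returns one of "ok", "warning", "grace" or "exited" (hypothetical names).
    """
    if today > expiry + GRACE_PERIOD:
        return "exited"   # grace period is over, the correlator exits
    if today > expiry:
        return "grace"    # expired, but still within the grace period
    if today >= expiry - WARNING_WINDOW:
        return "warning"  # within seven days before expiry
    return "ok"


expiry = date(2024, 1, 31)
for day in (date(2024, 1, 1), date(2024, 1, 25), date(2024, 2, 3), date(2024, 2, 10)):
    print(day, license_state(expiry, day))
```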

Currently, when the Apama-ctrl microservice goes down due to license expiry, it generates an alarm with a generic message saying “Apama has exited unexpectedly. As a precaution, user-provided EPL and Analytics Builder models that might have caused this have been disabled, refer to the audit log for more details. Please check any recent alarms, or contact support or your administrator”.

Contact Software AG Support or refer to the Operations guide (available from the Software AG Empower Product Support website) to find out how to upload a new license file. After uploading a new license file, you have to restart the Apama-ctrl microservice manually.

You can view the license information using the diagnostics information as described in Downloading diagnostics and logs. When you click the Diagnostics link, a ZIP file is downloaded which contains license information in the file /info/license.json.

Safe mode on startup

This alarm is raised whenever the Apama-ctrl microservice switches to Safe mode.

In the case of unexpected restarts, Apama-ctrl assumes that they may have been caused by user error, for example, by an EPL app that consumes more memory than is available, or by an extension containing bugs. To avoid an infinite restart loop caused by these errors, Safe mode is activated, resulting in all user-provided content being disabled.

The microservice checks on every restart if it has restarted in the last 20 minutes. If so, the microservice considers the restart as unexpected and enables Safe mode. Otherwise, it treats the restart as a normal restart. Safe mode can be erroneously triggered by the user manually unsubscribing and resubscribing the microservice too quickly, or by problems in the hosting infrastructure that cause frequent restarts.
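The check described above amounts to comparing two start times. The following Python sketch is only an illustration of that rule; the function name is hypothetical, and the handling of the very first start and of a restart after exactly 20 minutes are assumptions.

```python
from datetime import datetime, timedelta

SAFE_MODE_WINDOW = timedelta(minutes=20)


def should_enter_safe_mode(previous_start, current_start):
    """Return True if a restart counts as unexpected under the 20-minute rule."""
    if previous_start is None:
        # Assumption: the very first start is always a normal start.
        return False
    return current_start - previous_start < SAFE_MODE_WINDOW


first = datetime(2024, 1, 1, 12, 0)
print(should_enter_safe_mode(first, first + timedelta(minutes=5)))   # restarted too soon
print(should_enter_safe_mode(first, first + timedelta(minutes=45)))  # normal restart
```

This also shows why unsubscribing and resubscribing the microservice within a few minutes triggers Safe mode: the rule looks only at the time between starts, not at the reason for the restart.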

You can check the mode of the microservice (either Normal or Safe mode) by making a REST request to service/cep/diagnostics/apamaCtrlStatus (available as of Apama EPL Apps 10.5.7 and Apama Analytics Builder 10.5.7), which contains a safe_mode flag in its response.
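A request to this endpoint could be sketched in Python as follows. The base URL and credentials are placeholders, basic authentication is an assumption, and the sketch assumes that safe_mode appears as a top-level boolean field of a JSON object in the response. Verify this against an actual response from your tenant.

```python
import base64
import json
import urllib.request

STATUS_PATH = "service/cep/diagnostics/apamaCtrlStatus"


def build_status_request(base_url, username, password):
    """Build a GET request for the Apama-ctrl status endpoint (nothing is sent here)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{STATUS_PATH}",
        method="GET",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


def is_safe_mode(response_body):
    """Return True if the response reports Safe mode.

    Assumption: safe_mode is a top-level field of the JSON response.
    """
    status = json.loads(response_body)
    return bool(status.get("safe_mode", False))


# Made-up response body, for illustration only.
print(is_safe_mode('{"safe_mode": true}'))
```

Against a real tenant, the response body would come from urllib.request.urlopen(build_status_request(...)).read().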

To diagnose the cause of an unexpected restart, you can try the following:

In Safe mode, all previously active Apama Analytics Builder models and Apama EPL apps are deactivated and must be manually re-activated.

Deactivating models in Apama Starter

This alarm is raised when Apama-ctrl switches from the licensed microservice to Apama Starter with more than 3 active models.

In Apama Starter, a user can have a maximum of 3 active models. For example, a user is working with the licensed Apama-ctrl microservice and has 5 active models, and then switches to Apama Starter. Since Apama Starter does not allow more than 3 active models, it deactivates all 5 active models and raises an alarm to notify the user.
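Note that the rule deactivates all models, not just the ones above the limit. A short Python sketch of the rule as described here (the function name and the model names are made up for illustration):

```python
MAX_ACTIVE_MODELS_STARTER = 3


def models_to_deactivate(active_models):
    """Return the models that are deactivated when switching to Apama Starter."""
    if len(active_models) > MAX_ACTIVE_MODELS_STARTER:
        # Over the limit: every active model is deactivated, not only the excess.
        return list(active_models)
    return []


print(models_to_deactivate(["m1", "m2", "m3", "m4", "m5"]))
print(models_to_deactivate(["m1", "m2", "m3"]))
```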

High memory usage

This alarm is raised whenever the correlator consumes 90% of the maximum memory permitted for the microservice container. During this time, the Apama-ctrl microservice automatically generates the diagnostics overview ZIP file which contains diagnostics information used for identifying the most likely cause for memory consumption.

There are 3 variants of this alarm, depending on the time and count restrictions of the generated diagnostics overview ZIP file.

First variant:

Second variant:

Third variant:

Running Apama EPL Apps (and, to a lesser extent, smart rules and Analytics Builder) consumes memory; the amount depends heavily on the nature of the apps that are running. The memory usage should be approximately constant for a given set of apps, but it is possible to create a “memory leak”, particularly in an EPL file or a custom block. The Apama-ctrl microservice monitors memory usage. If the 90% memory limit is reached, it raises an alarm with WARNING severity, generates the diagnostics overview ZIP file, and saves that file to the files repository (as mentioned in the alarm text).
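The threshold itself is simple arithmetic, sketched below in Python. The function name is hypothetical, and treating usage of exactly 90% as reaching the limit is an assumption. Integer arithmetic is used to avoid floating-point rounding at the boundary.

```python
MEMORY_ALARM_PERCENT = 90


def memory_alarm_needed(used_bytes, limit_bytes):
    """Return True if memory usage has reached 90% of the container limit."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    # used / limit >= 90 / 100, rewritten without division.
    return used_bytes * 100 >= limit_bytes * MEMORY_ALARM_PERCENT


one_gib = 1024 ** 3
print(memory_alarm_needed(950 * 1024 ** 2, one_gib))  # about 93% of 1 GiB
print(memory_alarm_needed(512 * 1024 ** 2, one_gib))  # 50% of 1 GiB
```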

Apama-ctrl generates the diagnostics overview ZIP files with the following conditions:

To diagnose high-memory-consuming models and EPL apps, you can try the following (it could be listener leaks, excessive state being stored or spawned monitors leaking, etc.):

If the memory continues to grow, then when it reaches the limit, the correlator will run out of memory and Apama-ctrl will shut down. To prevent the microservice from going down, you have to fix this as a priority.

See also the TECHniques blog post at Diagnostic tools for Apama in Cumulocity IoT.

Warning or higher level logging from an EPL file

This alarm is raised whenever messages are logged by Apama EPL files with specific log levels (including CRITICAL, FATAL, ERROR and WARNING).

Apama EPL Apps allows you to deploy EPL files to the correlator. The Apama-ctrl microservice analyzes the content logged by the EPL files and raises an alarm for specific log levels with details such as monitor name, log text and alarm type (either WARNING or MAJOR), based on the log level.

For example, the following is a simple monitor which logs some text at different EPL log levels.

monitor Sample {
   action onload() {
      log "Info"; // no level given, so the default level INFO is used
      log "Fatal Error" at FATAL; // log level is FATAL
      log "Critical Error" at CRIT; // log level is CRITICAL
      log "Warning" at WARN; // log level is WARNING
   }
}

Apama-ctrl analyzes all the log messages, filters out only certain log messages, and raises an alarm for the identified ones. Thus, Apama-ctrl generates the following three alarms for the above example:

First alarm:

Second alarm:

Third alarm:

An EPL file throws an uncaught exception

You have seen that the Apama-ctrl microservice raises alarms for logged messages. In addition, there can also be uncaught exceptions (during runtime). Apama-ctrl identifies such exceptions and raises alarms so that you can identify and fix the problem.

For example, the following monitor throws IndexOutOfBoundsException during runtime:

monitor Sample{
   sequence<string> values := ["10", "20", "30"];
   action onload() {
      // IndexOutOfBoundsException (runtime error)
      log "Value = " + values[10] at ERROR;
   }
}

Apama-ctrl generates the following alarm for the above example:

You can diagnose the issue by the monitor name and line number given in the alarm.

For more details, you can also check the Apama logs if the tenant has the “microservice hosting” feature enabled. Alarms of this type should be fixed as a priority as these uncaught exceptions will terminate the execution of that monitor instance, which will typically mean that your app is not going to function correctly. This might even lead to a correlator crash if not handled properly.

An EPL file blocks the correlator context for too long

If an EPL app has an infinite loop, it may block the correlator context for too long, not letting any other apps run in the same context or, even worse, causing excessive memory usage (as the correlator is unable to perform any garbage collection cycles), which leads to the app running out of memory. The Apama-ctrl microservice identifies such scenarios (the correlator logs warning messages if an app is blocking a context for too long) and raises alarms, so that the user can identify and fix the problem.

For example, the following monitor blocks the correlator main context:

event MyEvent {
}
 
monitor Sample{
    action onload() {
        while true {
            // do something
            send MyEvent() to "foo";
        }
    }
}

Apama-ctrl generates the following alarm for the above example:

You can diagnose the issue by the monitor name and context name given in the alarm.

For more details, you can also check the Apama logs if the tenant has the “microservice hosting” feature enabled. Alarms of this type should be fixed as a priority as these scenarios may lead to the microservice and correlator running out of memory.

Invalid measurement format

This alarm is raised whenever the measurementFormat key is set to an invalid value in the tenant options.

Valid measurementFormat values for any tenant are MEASUREMENT_ONLY and BOTH. The default value is MEASUREMENT_ONLY.
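Before updating the tenant option, a script can check the value on the client side. The following Python sketch uses only the two values listed above; the function name is hypothetical, and the case-sensitive comparison is an assumption.

```python
VALID_MEASUREMENT_FORMATS = ("MEASUREMENT_ONLY", "BOTH")
DEFAULT_MEASUREMENT_FORMAT = "MEASUREMENT_ONLY"


def is_valid_measurement_format(value):
    """Return True if value is one of the documented measurementFormat values."""
    # Assumption: the comparison is case-sensitive.
    return value in VALID_MEASUREMENT_FORMATS


for candidate in ("BOTH", "both", "EVENT_ONLY"):
    print(candidate, is_valid_measurement_format(candidate))
```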

Multiple extensions with the same name

This alarm is raised when the Apama-ctrl microservice tries to activate the deployed extensions during its startup process and there are multiple extensions with the same name.

This disables all extensions that were deployed to Apama-ctrl. In order to use the deployed extensions, the user has to decide which extensions to keep and then delete the duplicate ones.

Info: In case of multiple duplicates, this alarm is only listed once.

Connection to correlator lost

This alarm is raised in certain cases when the connection between the Apama-ctrl microservice and the correlator is lost. This should not happen, but can be triggered by high load situations.

Apama-ctrl will automatically restart. Report this to Software AG Support if this is happening frequently.

The correlator queue is full

This alarm is raised whenever the correlator queue is full, including both input and output queues.

The correlator’s input and output queues are periodically monitored to check for a buildup of events. If the pending queue size grows above the normal threshold (20,000 for the input queue and 10,000 for the output queue), an alarm is raised. The alarm text contains a snapshot of the correlator status at the time of raising the alarm. A correlator with a full input or output queue can cause a serious performance degradation.

The correlator queue size is based on the number of events, not raw bytes.
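The threshold check can be sketched in Python as follows. The queue sizes are passed in as plain event counts, so nothing is assumed about the field names in the correlator status. The function name is hypothetical, and "above the threshold" is read as strictly greater than.

```python
INPUT_QUEUE_THRESHOLD = 20_000   # events pending on the input queue
OUTPUT_QUEUE_THRESHOLD = 10_000  # events pending on the output queue


def queues_over_threshold(input_queued, output_queued):
    """Return the names of the queues whose pending event count is above its threshold."""
    over = []
    if input_queued > INPUT_QUEUE_THRESHOLD:
        over.append("input")
    if output_queued > OUTPUT_QUEUE_THRESHOLD:
        over.append("output")
    return over


print(queues_over_threshold(25_000, 100))
print(queues_over_threshold(500, 500))
```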

Check the alarm text to get an indication of which queue is blocking. This also contains information about the slowest receiver and the most backed-up context. To diagnose the cause, see the information given in The CEP queue is full. A problem is likely to trigger the “correlator queue is full” alarm followed by the “CEP queue is full” alarm.

The CEP queue is full

This alarm is raised whenever the CEP queue for the respective tenant is full.

Karaf nodes that send events to the CEP engine maintain per-tenant queues for the incoming events. This data gets processed by the CEP engine for the hosted CEP rules. For various reasons, these queues can become full and cannot accommodate newly arriving data. In such cases, an alarm is sent to the platform so that the end users are notified about the situation.

If the CEP queue is full, older events are removed to handle new incoming events. To avoid this, you have to diagnose the cause of the queue being full and resolve it as soon as possible.

The CEP queue size is based on the number of CEP events, not raw bytes.
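The effect of a full queue that discards its oldest entries can be illustrated with a bounded queue in Python. This is only a model of the behavior described above, not the platform’s implementation; the capacity and the event values are made up.

```python
from collections import deque


def enqueue_all(events, capacity):
    """Push events into a bounded queue and report which ones were dropped.

    Returns (remaining, dropped). When the queue is full, the oldest event
    is removed to make room for the new one.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    queue = deque(maxlen=capacity)
    dropped = []
    for event in events:
        if len(queue) == capacity:
            # The append below evicts the oldest event, so record it first.
            dropped.append(queue[0])
        queue.append(event)
    return list(queue), dropped


remaining, dropped = enqueue_all(["e1", "e2", "e3", "e4", "e5"], capacity=3)
print("remaining:", remaining)
print("dropped:", dropped)
```

In this model the dropped events are never processed, which is why a full CEP queue should be diagnosed and resolved quickly.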

To diagnose the cause, you can try the following. It may be that the Apama-ctrl microservice is running slow because of time-consuming rules in the script, or the microservice is deprived of resources, or code is not optimized, etc. Check the input and output queues from the “correlator queue is full” alarm (or from the microservice logs or from the diagnostics overview ZIP file under /correlator/status.json).