Defining queries

Info
Apama queries are deprecated and will be removed in a future release.

A query is one of the basic units of EPL program execution.

Info
The other basic unit is a monitor. A monitor cannot contain a query. A query cannot contain a monitor. For information about writing monitors, see Defining monitors.

Apama queries are suitable for applications where the incoming events provide information updates about a very large set of real-world entities.

The topics below provide information and instructions for defining queries.

For reference information, see Queries.

See also: Using Query Designer and Deploying and Managing Queries.

Introduction to queries

An Apama query is a self-contained processing element that communicates with other queries, and with its environment, by sending and receiving events. Queries are designed to be multithreaded and to scale across machines.

You use Apama queries to find patterns within, or perform aggregations over, defined sets of events. For each pattern that is found, an associated block of procedural code is executed. Typically this results in one or more events being transmitted to other parts of the system.

Info
If a license file cannot be found while the correlator is running, several restrictions are enforced on queries. See Running Apama without a license file.

Example of a query

The following code provides an example of a query. This query monitors credit card transactions for a large set of credit card holders. The goal is to identify any fraudulent transactions. While this example illustrates query operation, it is not intended to be a realistic application.

query ImprobableWithdrawalLocations {
   parameters {
      float period;
   }
   inputs {
      Withdrawal(value>500) key cardNumber within period;
   }
   find Withdrawal as w1 -> Withdrawal as w2
      where w2.country != w1.country {
      log "Suspicious withdrawal: " + w2.toString() at INFO;
   }
}

Each query definition is in a separate file that has a .qry file name extension. The example shows several query features:

  • Parameters section

    Queries can be parameterized. When a query has no parameters, a single instance of the query is automatically created when the query is loaded into a correlator. If one or more parameters are defined for a query, when the query is loaded into a correlator, no instances are created until you define an instance and specify a set of parameter values for that instance.

  • Inputs section

    The inputs section identifies the events that the query will operate on, that is, the event inputs. This section contains one or more definitions. Each definition identifies the type of input event (Withdrawal in the example) together with details identifying which Withdrawal events are input, how those events are distributed, and what state, or event history, is to be held.

    The query key is a fundamental concept. If a key is defined, then the incoming events are partitioned into different sets based on the value of the key. Query processing operates independently for each set/partition. In the example query, events for each cardNumber will be independently processed.

    For each event input, the definition identifies the set of events that are current. When looking for pattern matches or evaluating aggregates, only current events are used. For each event input, the set of events that is current is referred to as the event window.

  • Find statement

    The find statement identifies an event pattern to be matched and defines what event processing actions are taken when a match is found. A find statement consists of an event pattern followed by a find block.

    The event pattern can specify conditions that determine whether there is a match. A where condition specifies a Boolean expression that must evaluate to true for there to be a match. A within condition specifies that certain elements within the pattern must occur within a given time period. A without condition specifies an event whose presence can prevent a match.

    Statements in a find block can send events to communicate with other queries, with monitor instances, and with external system elements in a deployment, such as adapters, correlators, or other deployed processes. Some EPL statements, such as on, spawn, from, and die are not allowed in queries.
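    For example, a minimal sketch of a find block that uses send to forward the matching event to other components (the channel name "SuspiciousWithdrawals" is illustrative, not part of the example application):

    find Withdrawal as w1 -> Withdrawal as w2
       where w2.country != w1.country {
       // Forward the second withdrawal to a channel that a monitor or
       // another query can subscribe to (channel name is illustrative).
       send w2 to "SuspiciousWithdrawals";
    }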

Use cases for queries

Apama queries are useful when you want to monitor incoming events that provide information updates about a very large set of real-world entities such as credit cards, bank accounts, cell phones. Typically, you want to independently examine the set of events associated with each entity, that is, all events related to a particular credit card account, bank account, or cell phone. A query application operates on a huge number of independent sets with a relatively small number of events in each set.

One use case for Apama queries is to detect subsequent withdrawals from the same bank account but from locations that make it improbable that the withdrawals are legitimate. Very large numbers of withdrawal events would stream into your application. A query can segregate the transactions for each bank account from the transactions of any other bank account. Your query application can then check the transaction events for a particular account to determine if there have been withdrawals within, for example, a two-hour period from locations that are more than two hours apart. You can write a query application so that if it finds this situation the response is to contact the credit card holder.

Another use case is to detect repeated maximum withdrawals from the same automatic teller machine (ATM) within a short period of time. This might be due to a criminal with a stack of copied cards and identification numbers. In this case, a query can segregate events by ATMs. That is, the transactions conducted at a particular ATM would be in their own partition, separate from transactions conducted at any other ATM. Your query application can check the events in each partition to determine if, for example, there are repeated withdrawals of $500 within one hour. If such a situation is found your query can be written to send an alert message to the local police.

Another use case for Apama queries is to offer a better data plan to new smartphone users. Large numbers of events related to cell phone customers would come into the system. Your query application can create sets of events where each set, or partition, contains the events related to one cell phone customer. When your query detects an upgrade from a flip phone to a smart phone, your application can automatically send a message to that customer that outlines a better data plan.

In summary, the characteristics of an Apama query application include:

  • You want to monitor a very large number of real-world entities.
  • You want to process events on a per-entity basis, for example, all events related to one credit card account.
  • The data you need to retain in order to run Apama queries is either too large to fit on to a single machine or there is a requirement to place it in shared, fast-access storage (a cache) to support resilience/availability requirements.

Delayed and out of order events

In many of the typical applications envisaged for Apama queries, the input events may be either delayed or out of order. For example, cars and other mobile sources of events such as smart phones and tablet computers might normally send regular streams of events, but when such devices are out of network coverage, these events will have to be batched and sent when back in range. Many older generation factory robots store events and only send periodic batches by design. And in other cases, events may be sent out of order. Television set top boxes, for example, often employ distinct channels for tuning information and diagnostics. This means that a “channel changed” event may be received before a “set top box crashed” event, and so may be thought to have caused it, even though the event in fact happened after it, and was causally unconnected.

Delayed or out of order events can create problems for the query runtime because it assumes that events should be treated as being in the order in which they are processed, and the time of each event is the correlator’s time at the point the event is processed. However, provided that the input events contain a timestamp recording the time that the event was created at the source, these problems can be overcome by using the Apama queries source timestamp functionality. This allows the queries runtime to wait for specified periods before processing events, and then to process those events on the basis of their source timestamps rather than the time they were received by the correlator. (For out of order events, the Apama event definitions must have the appropriate annotation; for more information, see Out of order events).

Events can also be supplemented by heartbeat events with timestamps from data sources to inform the query runtime when communication with the data source is working correctly, which avoids long delays waiting for events to occur in case they are delayed.

See Using source timestamps of events for details on how to configure Apama queries to use source timestamps.

Query terminology

The following table defines important query terms.

Term | Description
query | A self-contained processing unit. It partitions incoming events according to a key and then independently processes the events in each partition. Processing involves watching for an event pattern and then executing a block of procedural code when that pattern is found.
input | An event type that a query operates on. An input definition specifies an event type plus details that indicate how to partition incoming events and what state, or event history, is to be held.
key | A query key identifies one or more fields in the events being operated on. Each input definition must specify the same key.
partition | A partition contains a set of events that all have the same key value. One or more windows contain the events added to each partition.
window | For each input, a window contains the events that are current. The query operates on only current events.
latest event | The latest event is the event that was most recently added to a partition.
set of current events | The events that are in the window(s) of a partition.
pattern | Specification of the event or sequence of events or aggregation that you are interested in. A pattern can include conditions and operators.
match set | A match set is the set of events that matches the specified pattern. A match set always includes the latest event.
parameterization | A query definition that specifies parameters is a parameterized query. An instance of a parameterized query is referred to as a parameterization.
source timestamp | The time an event occurred at its source. This may be before it is processed if there is some delay or disruption in delivering the event from the source to the query runtime. This will be data in one or more fields of an input event. Queries can use the source timestamp if an action is provided to obtain the source timestamp in the correct form. See Using source timestamps of events.
heartbeat event | An event that a query uses to determine when communication with a data source is working correctly, and it has not missed any events that are delayed. With heartbeat events, received input events can be processed as they are considered definitive. Without these, the query runtime needs to wait for the input’s wait time specified in the query definition to ensure it avoids missing delayed events.
definitive time | The point in time for which the query runtime has been told that it can assume it has received all the events it is going to receive. All events at or before this point in time are considered definitive and can be used to evaluate the query. This applies when using the source timestamp functionality.

Overview of query processing

When Apama executes queries, it does so in parallel, making use of multiple CPU cores as available. This is good for performance, but uses more resources on the hosts running the correlator and can, in edge cases, cause events to be processed in an order that is different from the order in which they were delivered to the correlator. To simplify testing, a serial mode is supported where events are processed in order, no matter how quickly they are sent.

Apama processes queries as follows:

  1. Based on the inputs section of a query, the query subsystem creates listeners for the required events.
  2. Running Apama queries receive events sent on the default channel and on the com.apama.queries channel.
  3. Events matching those listeners are forwarded to the query subsystem that processes the events.
  4. The events are processed in parallel. That is, multiple threads of execution are employed, thereby achieving vertical scaling on machines that have multiple cores.
  5. The query subsystem must locate the relevant events for the query partition. That is, the previously encountered events that are still current according to the defined event windows for that query. The information in the incoming event, that is, the key, is all that is required to locate these events.
  6. The window contents are updated, adding the new event and discarding any events that are no longer current.
  7. The system then checks the updated window contents to determine if there are any new pattern matches.
  8. For each new pattern match the associated find block statements are executed.

In a single correlator solution, events in a particular partition are held in one or more Apama MemoryStore records. The key from the incoming event is used to locate these records. In a multi-correlator solution, the records are held in a distributed cache, accessed by means of the MemoryStore API. All of this is internal; however, you should consider timing constraints when deciding whether a query-based solution is appropriate for a given problem.

After injecting a query into a correlator, events may be immediately sent to that query. If necessary, Apama stores these events until the query is prepared. That is, the query might be opening local/remote stores. Events are delivered when the query is ready to process them. There is no guarantee that the order in which the events arrived in the correlator is the same order in which the query processes them. See Event ordering in queries.

When testing, either send events at a realistic event rate, with pauses in between each set of events, or use single context mode. To send events with pauses, you can place BATCH entries in the .evt file. See Event timing.

By default, the query subsystem determines the size of the machine it is running on (the number of cores) and scales accordingly. If other services are affected by the load on the host machine, or for testing, then send one of the following events to the correlator (for example, by creating an .evt file in Apama Plugin for Eclipse and sending it as part of the Run Configuration) to configure how the correlator executes queries:

  • com.apama.queries.SetSingleContext()
  • com.apama.queries.SetMultiContext()
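
For example, a one-line event (.evt) file that switches the query subsystem to single-context (serial) execution for a test run would contain just the following event:

com.apama.queries.SetSingleContext()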

Overview of query application components

While queries can make up the central logic of an Apama deployment, deploying an Apama query application also requires event definitions, and connections to event sinks and event sources. Optionally, an Apama query application can make use of EPL plug-ins, EPL actions, and interactions with EPL monitors.

In addition to queries, the following components are required to implement a query application.

  • Event definitions. This includes event types used by adapters or mapped from message buses (see below) or used internally within application components. Typically, event types specific to an adapter or to existing messages on a message bus would be written by those creating or configuring the adapter.
  • Connections between event sources and queries and also between queries and event sinks. This is typically handled by adapters or by mapping to messages on a message bus by means of JMS. For testing, it is possible to use Apama Plugin for Eclipse or command line tools to send and receive messages.
  • A correlator process. Several queries can share the same correlator process. Queries can be started by Ant scripts, which can be exported from an Apama project. For testing, Apama Plugin for Eclipse can start the queries.
  • Optionally, queries can use a library of functions that you provide. These would be written in EPL and can call EPL plug-ins written in C++ or Java. Functions in such a library can be invoked from different points in a query.
  • Optionally, a query can interact with monitors. See Communication between monitors and queries.

For additional information, see Query application architecture.

Writing event definitions

Event definitions are defined in Apama .mon files. When writing event type definitions be sure to consider the following:

  • An inputs block in a query can specify filters on event fields of type boolean, decimal, float, integer, string or location.

  • An event field to be specified as a query key must be of type boolean, decimal, float, integer, string or location.

  • An event field to be specified in an inputs block, whether as a filter or a key, cannot be marked with the wildcard modifier in the event type definition.

  • A where condition in a query can make use of all actions and fields of events, including members of reference types such as sequence, dictionary and other events.

  • Specifying an event filter in an inputs block is very efficient because it prevents any part of the query from executing if the filter condition does not match. However, a filter in an inputs block can operate on only contiguous ranges and can compare only a single field to a constant or parameter value.

    Specifying an event filter in a where condition is more expensive than specifying an event filter in an inputs block. However, a filter in a where clause can be more powerful because it can specify any EPL expression.

  • A query cannot use an event that contains an action variable or fields of type chunk or listener.

  • If you want to take advantage of the source timestamp functionality, be sure to add an event field that records the time of the creation of the data encapsulated in the event, and an action that returns this time in the form of a float representing the number of seconds since the epoch (midnight, 1 Jan 1970 UTC). If the time data is not in this format, you can use the TimeFormat event library to perform the relevant conversions (for further information, see Using the TimeFormat event library).
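
To illustrate the last point, here is a sketch of an event whose source timestamp arrives as a string, with an action that converts it to seconds since the epoch using the TimeFormat event library. The MeterReading event, its field names and the timestamp format are hypothetical; adjust the format string to match how your data source encodes the time:

using com.apama.correlator.timeformat.TimeFormat;

event MeterReading {
    string meterId;
    string createdAt; // for example "2023-01-31 09:15:00.000" (assumed format)
    float value;

    // Returns the source timestamp as float seconds since the epoch (UTC).
    action getSourceTime() returns float {
        TimeFormat timeFormat := new TimeFormat;
        return timeFormat.parseTimeUTC("yyyy-MM-dd HH:mm:ss.SSS", createdAt);
    }
}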

For example, consider the following event definitions:

event Slice {
    integer quantity;
    float price;
}
event UsableEvent {
    integer quantity;
    string username;
    wildcard string auxData;
    sequence<Slice> slices;
    action averagePrice() returns float {
        float t := 0.0;
        Slice s;
        for s in slices {
            t:=t+s.price;
        }
        return t/(slices.length().toFloat());
    }
}
event InternalEvent {
    action<> returns float  averager;
}

UsableEvent.quantity and UsableEvent.username can be used in a query inputs block or in a query where condition.

UsableEvent.auxData, UsableEvent.slices and UsableEvent.averagePrice() can be used in where conditions but not in inputs blocks.

InternalEvent cannot be an input to a query because it has an action variable. However, an instance of InternalEvent could be used in a where condition or in triggered EPL code in a find block.

For example, the find statement in a query can be written as follows:

find UsableEvent as e1 and UsableEvent as e2
   where e1.averagePrice() > e2.averagePrice()
   and
   e1.slices[0].price < e2.slices[0].price

Action definitions can supply helper actions such as the averagePrice() action above. This can be useful in both event types used by adapters and in internal event types. For example, some event types may have no members but simply be a named container for useful library actions.
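
For instance, a minimal sketch of such a library event type (the name and the helper action are hypothetical):

// A "library" event type: no fields, just helper actions that queries can call.
event MathUtil {
    action percentChange(float oldValue, float newValue) returns float {
        return ((newValue - oldValue) / oldValue) * 100.0;
    }
}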

To make use of EPL plug-ins written in C, C++ or Java, it is recommended to write an EPL event type or set of event types that wrap the plug-in. This provides a more consistent interface and can add type safety to the use of chunks, which are opaquely-typed C, C++ or Java objects. These EPL actions can then be called from queries, as can any EPL action.

Event sinks and sources

A typical deployment includes adapters that connect the Apama system to external sources of data or provide the means to send events out of Apama.

For testing purposes, Apama Plugin for Eclipse can send events from files and capture received events to files, and command line tools are provided as well.

Correlator process

When developing queries in Apama Plugin for Eclipse, launching a configuration starts a correlator and injects queries into it by default. It is also possible to export the Apama launch configuration to an Ant script, which can be copied onto another machine such as a server to run your project on that machine.

It is possible to run multiple correlators that are configured to use the same distributed cache store. These correlators share query state. In such deployments, the recommendation is to use a JMS message queue. Typically, these deployments would use correlators on separate physical machines so a failure of one does not affect others. For testing, it is possible to run several correlators on a single machine provided a separate port number is allocated to each correlator. Take care to use the correct port number when interacting with the correlators.

Format of query definitions

A query searches for an event pattern that you specify. You define a query in a file with the extension .qry. Each .qry file contains the definition of only one query. The following sample shows the definition of a simple query that will search for a Withdrawal event pattern:

query ImprobableWithdrawalLocations {
    metadata {
        "author":"Apama",
        "version":"1"
    }
    parameters {
        float period;
    }
    inputs {
        Withdrawal() key cardNumber within (period);
    }
    find
        Withdrawal as w1 -> Withdrawal as w2
        where w2.country != w1.country {
        log "Suspicious withdrawal: " + w2.toString() at INFO;
    }
}

The format for a query definition is as follows:

query name {
   [ metadata { metadata_block } ]
   [ parameters { parameters_block } ]
   inputs { inputs_block }
   find pattern block
   [ action_definition... ]
}

The syntax elements are described below.

query name

Specify the query keyword followed by a name for your query. Like monitors and event types, the identifier you specify as the name of a query must be unique within your application.

metadata

The metadata section is optional. If you specify a metadata section, it must be the first section in the query. Metadata are specified as a list of key-value pairs. Both key and value must be string literals. For more information, see Defining metadata in a query.

parameters

The parameters section is optional. If you specify a parameters section, it must follow the metadata section, if there is one, and precede the inputs section. Parameters must be of the following types:

  • integer
  • decimal
  • float
  • string
  • boolean

Specify one or more data_type parameter_name pairs. The parameter name can use any of the characters allowed for EPL identifiers (see Identifiers). Any parameters you specify are available throughout the rest of the query. For more information about parameters and how parameters get their values, see Implementing parameterized queries.

inputs

The inputs section is required and it must follow the parameters section, if there is one, and precede the find statement. In the inputs section, you must define at least one input. If you specify more than one input, each input must be a different event type. The inputs section specifies the events that the query operates on. An input definition can include the keyword key followed by one or more fields in the specified event. This is the query key. The correlator uses the key to partition incoming events into separate windows. For example, the cardNumber key indicates that there is a separate window for the Withdrawal events for each card number. In other words, each window can contain Withdrawal events associated with only one card number.

For details, see Defining query input.

find statement

After the inputs section, you must specify a find statement. A find statement specifies the event pattern of interest and a block that contains procedural code. This code can define EPL actions you want to perform when there is a match. For more information, see Finding and acting on event patterns.

action_definition

After the find statement, you can optionally specify one or more actions in the same form as in EPL monitors. An expression in a find statement can reference an action defined in that query. See Defining actions in queries.

Defining metadata in a query

You can record information about a query in the metadata section, for example, the author, the version number, or the last modified date of the query. Once defined, metadata information about a query can be viewed in the Scenario Browser. See also Using the Scenario Browser view.

Format for defining query metadata

You define query metadata in the metadata section of a query definition. The metadata section is optional. If you specify a metadata section, it must be the first section in the query. The format for specifying the metadata section is as follows:

metadata {
   key:value
   [, key:value ]...
}

key and value must be string literals. Both are case-sensitive.

value can be a multi-line string.

key must be a valid EPL identifier (see Identifiers). Therefore, key must not include spaces, hyphens, dots or any other characters that are not allowed in EPL identifiers.

All key definitions that are contained in a single metadata section of a query must be unique.

It is recommended to use lowerCamelCase style for the key. The prefix “apama” should not be used for the key as it is reserved for future use.
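
For example, a metadata section recording an author, a version and a last-modified date (the key names and values here are purely illustrative) could look like this:

metadata {
   "author":"Jane Smith",
   "version":"2",
   "lastModifiedDate":"2023-01-31"
}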

Partitioning queries

Based on the values of selected fields in incoming events, the correlator segregates events into many separate partitions. Partitions typically relate to real-world entities that you are monitoring such as bank accounts, cell phones, or subscriptions. For example, you can specify a query that partitions Withdrawal events based on their account number. Each partition could contain the Withdrawal events for one account. Typically, a query application operates on a huge number of partitions with a relatively small number of events in each partition.

Each partition is identified by a unique key value. You specify a key definition in each input definition in the query’s inputs block. The key definition specifies one or more fields or actions in the event type you want to monitor. The number, order and type of the key fields must be the same in each input definition in a query.

A query operates on the events in the windows in each partition independently of the other partitions.

Info
Several restrictions are enforced on queries if a license file cannot be found while the correlator is running. See Running Apama without a license file.

Defining query keys

At runtime, each partition is identified by a unique key value, which is the value of one or more fields or actions in the events that the query operates on.

Info

Using a key is optional. If you do not specify a key, all events the query operates on are in one partition. Since this is an unusual use case for queries, the documentation assumes that you always choose to specify a key.

An event member that is declared as a constant cannot be used as a query key.

In a query, each input definition in the inputs section specifies the query key in the key definition. The key definition specifies one or more fields or actions in the event that the window will contain. For example:

query ImprobableWithdrawalLocations {
   inputs {
      Withdrawal() key cardNumber within (600.0);
   }
   find (Withdrawal as w1 -> Withdrawal as w2)
        where (w1.country != w2.country) {
            getAccountInfo();
            sendEmail();
        }
}

In this example, the definition for Withdrawal events specifies that the cardNumber field is the key. When the correlator processes a Withdrawal event, it adds the event to the partition identified by the Withdrawal event’s cardNumber value.

Suppose the input definition in this example specifies two key fields:

inputs {
   Withdrawal() key cardNumber, cardType within (600.0);
}

Each partition is now identified by a combination of the cardNumber value and the cardType value. When you specify two or more key fields, insert a comma after each field except the last one. It is allowable to specify key fields in an order that is different than the order of the fields in the event.

When you specify more than one input in a query, the key fields in each input definition must match in number, data type, and order. For example:

inputs {
   Withdrawal() key cardNumber within (600.0);
   AddressChange() key cardNumber retain 1;
}

For each input in this example, the key is the cardNumber field. The data type of the cardNumber field in the Withdrawal event must be the same as the data type of the cardNumber field in the AddressChange event.

Sometimes, a field in one event contains the same information as a field in another event but the two fields have different names. For example, information about the type of a card could be in the cardType field in Withdrawal events and the typeOfCard field in AddressChange events. In this situation, you must specify an alias for one of the event field names. You do this in the input definition’s key definition. In the following example, as cardType in the second input definition specifies the alias:

inputs {
   Withdrawal() key cardNumber, cardType within (600.0);
   AddressChange() key cardNumber, typeOfCard as cardType retain 1;
}

When you specify more than one input, the key definition in each input definition must specify the same number of fields in the same order. Also, the data type of a field in one key definition must be the same as the data type of its corresponding field in every other key definition in the same inputs block. If the names of corresponding key fields are not the same, you must use the as keyword to specify an alias.

While specification of an alias for a key field name is sometimes required, it is always an option you can choose to use. For example:

inputs {
   Withdrawal() key number as cardNumber, cardType within (600.0);
   AddressChange() key number as cardNumber, typeOfCard as cardType retain 1;
}

An alias maps a field in an event to a key field. You cannot use an alias as a field of the event. For example, consider the following query:

query Q {
   inputs {
      A() key surname as lastName,  dob as dateOfBirth retain 5;
      B() key lastName, dateOfBirth retain 4;
   }
   find A as a -> B as b...
}

In the find block of this query, you can use the following:

  • a.surname, a.dob - Names of event fields
  • b.lastName, b.dateOfBirth - Names of event fields
  • lastName, dateOfBirth - Names of key fields
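
For example, a find block for query Q could refer to all three forms (assuming the key fields are strings; the log message is illustrative):

   find A as a -> B as b {
      // Event fields are reached through the coassigned events; the key
      // fields are available directly under their (aliased) names.
      log "Matched " + a.surname + " and " + b.lastName + ", key: " + lastName at INFO;
   }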

Defining actions as query keys

A query may also use the result of an action call on the event as the key for a query. To use an action as a query input key, you must provide the action name, parameters and an alias. The action call must always return a value (chunk, listener and action are invalid return types).

The following example calls the getName() action within the A event to generate the input key:

query Q {
   inputs {
      A() key getName() as name retain 1;
   }
   find A as a...
}

The parameters passed to an input key action can include query parameters or literals, or can be blank in which case empty parenthesis must still be supplied. Passing a query parameter allows for specializing the partitions depending on the query parameter, which can reduce duplicating query code when only the input keys differ.

The following example calls the getName() action within the B event, supplying the query parameter nameToPartition into the action. The action can then return a different field (firstname or surname) depending on the parameters of this query instance:

query Q {
   parameters {
       string nameToPartition;
   }
   inputs {
      B() key getName(nameToPartition) as name retain 1;
   }
   find B as b...
}

An alias must always be supplied and can be used to identify the returned value of the action call in the find block. For example, the alias name will identify the value returned from the call to getName() and can be used in the find block:

find B as b {
    print name;
}

If using multiple input events, the action return type must match the type for the key in all other inputs, and the alias must match between inputs. The following example uses both action and field input keys; the surname field and the return from getName() must be the same type, and they are both mapped to the alias name:

query Q {
   inputs {
      A() key surname as name retain 1;
      B() key getName("Surname") as name retain 1;
   }
   find A as a -> B as b...
}

We suggest that query parameters that are passed into action keys are not updated. Updating them can cause unexpected partitioning as the returned value from the action call may not be as expected.

Query partition example with one input

A key can be one event field. For example:

query ImprobableWithdrawalLocations {
   inputs {
      Withdrawal() key cardNumber within (600.0);
   }
   find (Withdrawal as w1 -> Withdrawal as w2)
        where (w1.country != w2.country) {
            getAccountInfo();
            sendEmail();
   }
}

In the previous code fragment, the key is the cardNumber field in the incoming Withdrawal event type. When a Withdrawal event arrives the correlator adds it to the window in the partition identified by the value of the Withdrawal event’s cardNumber field. For each partition, each unique card number in this example, the correlator maintains the window and evaluates the pattern separately from every other partition.

Suppose that cardNumber is the first field in Withdrawal events. The following table shows what happens at runtime.

Incoming Event | Goes Into Window in Partition Identified by This Key Value | Window Contents
Withdrawal (12345, 50.0,...) | 12345 | Withdrawal (12345, 50.0,...)
Withdrawal (24601, 60.0,...) | 24601 | Withdrawal (24601, 60.0,...)
Withdrawal (12345, 10.0,...) | 12345 | Withdrawal (12345, 50.0,...), Withdrawal (12345, 10.0,...)

In the execution of this query, there is no interaction between the `Withdrawal` events for card number `12345` and the `Withdrawal` event for card number `24601`.

Query partition example with multiple inputs

The following query provides an example of partitioning with two inputs. This query operates on APNR (Automatic Plate Number Recognition) events and Accident events:

query DetectSpeedingAccidents {
   inputs {
      APNR() key road within(150.0);
      Accident() key road within(10.0);
   }
   find APNR as checkpointA -> APNR as checkpointB -> Accident as accident
      where checkpointA.plateNumber = checkpointB.plateNumber
      and checkpointB.time - checkpointA.time < 100
      // Which indicates the car was speeding
   {
      emit NotifyPolice(accident.road, checkpointA.plateNumber);
   }
}

The road field in an APNR event must be the same type as the road field in an Accident event. Assuming that road is a string, each partition is identified by a unique value for that string.

Suppose the correlator processes the following events in top to bottom order and that road is the first field in each event:

  • Accident("M11")
  • APNR("A14", "FAB 1",...)
  • APNR("A14", "BSG 75",...)
  • APNR("M11", "ZC 158",...)
  • APNR("A14", "BSG 75",...)
  • APNR("M11", "ZC 158",...)
  • APNR("A14", "FAB 1",...)
  • Accident("A14")

The following table shows which events are in which partitions. Note that in each partition, the APNR events are in one window and the Accident events are in another window. Although the events are in separate windows, the correlator time-orders the events across all windows in a partition.

Events in Partition Identified by “M11” | Events in Partition Identified by “A14”
Accident("M11") | APNR("A14", "FAB 1",...)
APNR("M11", "ZC 158",...) | APNR("A14", "BSG 75",...)
APNR("M11", "ZC 158",...) | APNR("A14", "BSG 75",...)
 | APNR("A14", "FAB 1",...)
 | Accident("A14")

In each partition, the query evaluates the event pattern against the events in the windows in that partition. The query does this for each partition separately from every other partition. In this example, when the correlator adds the `Accident("A14")` event to the partition identified by `"A14"` the event pattern is triggered if the `where` clause in the `find` statement evaluates to `true`. The event pattern is not triggered in the partition identified by `"M11"`.

About keys that have more than one field

A key can be made up of several event fields. For example, a Transaction event might contain a field that indicates the transaction source account and another field that indicates the transaction destination account. You can specify that you want to partition Transaction events according to each unique source/destination combination:

query TransactionMonitor {
   inputs {
      Transaction() key source, dest within PERIOD;
   }
...
}

In this example, there is a partition identified by the value of each source/dest combination. Each of the following events is added to a window in a different partition:

This Event | Is Added to the Window in the Partition Identified By
Transaction(1, 100,...) | 1, 100
Transaction(1, 102,...) | 1, 102
Transaction(2, 100,...) | 2, 100
Transaction(2, 102,...) | 2, 102

Regardless of the event pattern in the query, this query monitors the transfer of money from one specific account to another specific account. This query handles each transfer between the same two accounts separately from all other transactions.

Now suppose that there is an Acknowledgement event that acknowledges that a transaction has succeeded. It also has account source and account destination fields, but they are inverted when compared to the transaction event fields. That is, the source account for an acknowledgment is the destination account of the transaction. To ensure that the acknowledgments are added to the same partition as the corresponding transactions, the key definition specifies the as keyword:

inputs {
   Transaction() key source as txSource, dest as txDest within PERIOD;
   Acknowledgement() key dest as txSource, source as txDest within PERIOD;
}

The query partitions events according to the combined values of the fields identified by txSource and txDest. The following table shows the partition that each event is added to.

This Event | Is Added to a Window in the Partition Identified By
Transaction(1, 100,...) | 1, 100
Acknowledgement(100, 1,...) | 1, 100
Transaction(1, 102,...) | 1, 102
Transaction(2, 100,...) | 2, 100
Acknowledgement(100, 2,...) | 2, 100

As you can see, a Transaction event and its Acknowledgement event are each added to a window in the same partition.

Defining query input

In a query definition, you must specify an inputs block that defines at least one input. The input definitions identify the events that you want the query to operate on. An input definition can specify particular content and it can also specify a number of events or a time period. For example:

query FraudulentWithdrawalDetection {
    inputs {
        Withdrawal(amount > 10.0)
            key cardNumber, cardType
            within 600.0;
        AddressChange()
            key cardNumber, typeOfCard as cardType
            retain 1;
     }
    find (Withdrawal as w1 -> Withdrawal as w2)
        where (w1.country != w2.country or w1.city != w2.city)
        without AddressChange as ac {
            getAccountInfo();
            if preferredContactType = "Email" {
                sendEmail();
            }
            if preferredContactType = "SMS" {
                sendSMS();
            }
    }
}

The previous code defines two inputs. For each input, there is an associated window of events. The first input window contains Withdrawal events and the second contains AddressChange events.

The input definition for the Withdrawal events specifies that each Withdrawal event in the window must have a value greater than 10.0 in the amount field. The input definition for the AddressChange events does not specify an event filter. Therefore, each AddressChange event that arrives is eligible to be in the window.

The next element in an input definition is the key definition. The key definition indicates how you want to partition the incoming events. If you define more than one input, the number, type and order of the key fields must be the same for each input. In the previous sample code, assume that the key fields for Withdrawal events, cardNumber and cardType, are integer and string, respectively, and that the key fields for AddressChange events, cardNumber and typeOfCard, are also integer and string, respectively. The two input keys match in number, type and order of key fields.

After the key definition, you can specify a within clause, a retain clause, or both. If you specify both, the within clause must be before the retain clause. A within clause specifies a period of time. Only events that arrive within that period of time are in the window. In the window that contains Withdrawal events, only Withdrawal events that have arrived in the last 10 minutes (600.0 seconds) are in the window. A retain clause specifies how many events can be in the window. In the window that contains AddressChange events, only the last AddressChange event that arrived can be in the window. When an AddressChange event arrives, if an AddressChange event is already in the window it is ejected.

After the within and retain clauses, you can optionally specify a with unique clause to prevent repeated values appearing in the window. If specified, the with unique clause lists one or more fields or actions on the event type (action names should be followed by open and close parentheses). If, after the within and retain specifications have been applied, the window contains more than one event with the same values for the listed fields or actions, all but the latest of those events are discarded. See Matching only the latest event for a given field.
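
A minimal sketch of the clause order (assuming the Withdrawal event has a city field):

inputs {
   // Of the Withdrawal events retained by the within clause, keep only the
   // most recent one for each distinct city value.
   Withdrawal() key cardNumber within 1 hour with unique city;
}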

The final, optional, element of an input definition is the specification of the event source timestamp and the associated wait period. If you expect that input events from a source will be subject to delays or may be received out of order, then you can specify a time from clause with the name of an action that returns a float specifying the number of seconds from the epoch (midnight, 1 Jan 1970 UTC) that the event was created. If you do this, you must also add a wait clause, which requires a float or time literal specifying the maximum delay expected for these events. This tells the query runtime how long it must wait, if it has not received any events, before it can continue processing the events it has received. Both of these clauses require that the event definition has a source timestamp recording the time of creation of the event, and a corresponding action that returns that timestamp in the form of a float representing the number of seconds since the epoch.

In the example below, the query is gathering data from cars, which may be delayed if a vehicle goes out of network coverage. Accordingly, the input definitions specify that the source timestamps of the events are to be obtained from the events’ getEcuTime actions, which simply return the value of the events’ ts float field. Further, the input definitions specify that in each case the runtime should wait for up to 1 hour before continuing to process any events already received, to allow for possible delays. For further details, see Using source timestamps of events.

event CarRPM {
    string carId;
    float ts;
    float rpm;

    action getEcuTime() returns float {
        return ts;
    }
}

event CarEngineTemp {
    string carId;
    float ts;
    float temp;

    action getEcuTime() returns float {
        return ts;
    }
}

event CarEngineMisfire {
    string carId;
    float ts;

    action getEcuTime() returns float {
        return ts;
    }
}

query DetectEnginePerformanceProblems {

  inputs {
    CarEngineTemp() key carId within 1 hour time from getEcuTime wait 1 hour;
    CarRPM() key carId within 1 hour time from getEcuTime wait 1 hour;
    CarEngineMisfire() key carId within 1 hour time from getEcuTime wait 1 hour;
  }

  find CarEngineTemp as t and CarRPM as r -> wait 1 minute
    where t.temp > T_THRESHOLD
    where r.rpm > R_THRESHOLD
    without CarEngineMisfire as misfire {
    log "Possible engine performance problem" + t.toString() + r.toString();

  }
}

Typically, you define one to four inputs. If you define more than one input, each must be a different event type. In other words, two inputs to the same query cannot be the same event type.

Queries can share windows

All query instances that have the same input definitions share the same windows. Two queries have the same input definitions when they specify:

  • the same input event types (the order can be different)
  • the same keys
  • the same (if any) input filters
  • the same use of source timestamps - that is, the same action named in time from clauses (wait times may be different)
  • the same use of heartbeat events

Any wait, within, retain or with unique specifications can be different.

When two query instances have the same input definitions and no parameters are used in any input filters, then all instances of those query definitions can share window data. If parameters are used in input filters, then parameterizations with different parameter values each store data separately. This increases total storage requirements and cost of processing the queries.

If a query is already running and you inject a query that defines the same inputs or create a parameterization that defines the same inputs then the new query instance or new parameterization uses the same windows as the query that was already running. This means that events that were received before the new query was injected or before the parameterization was created can be in a match set for the new query instance or new parameterization. This can happen when an event arrives after the new query is injected or after the parameterization is created and that event completes the pattern that the new instance or parameterization is looking for.

To reduce the amount of memory storage required to run queries, you might want to adjust the input definition for a query so that it is the same as another query. For example, suppose query Q is consuming inputs A, B, and X, while query P is consuming inputs A, B, and Y. If both queries define both X and Y as inputs (as well as A and B) then they can share the same windows. This can be an advantage when there are many A and B events but comparatively few X and Y events. If many queries can be written with similar input sections then they can share windows, which can lead to very efficient use of memory.
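
As a sketch of this (event types A, B, X and Y and the key field k are hypothetical, and each query would normally live in its own .qry file), both of the following queries declare identical inputs and can therefore share window data, even though each find pattern refers to only some of those inputs:

query Q {
   inputs {
      A() key k within 1 hour;
      B() key k within 1 hour;
      X() key k within 1 hour;
      Y() key k within 1 hour;
   }
   find A as a -> X as x {
      log "Q matched: " + a.toString() + " " + x.toString() at INFO;
   }
}

query P {
   inputs {
      A() key k within 1 hour;
      B() key k within 1 hour;
      X() key k within 1 hour;
      Y() key k within 1 hour;
   }
   find B as b -> Y as y {
      log "P matched: " + b.toString() + " " + y.toString() at INFO;
   }
}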

If the reason for adding an input using source timestamps is simply in order to share window contents, then the wait time for this input can be zero to avoid unnecessary delays.

Format of input definitions

In a query definition, you define one or more inputs in the inputs block. The format of the inputs block is as follows.

inputs {
   event_type(event_filter)
   key query_key [within_clause] [retain_clause]
   [with_unique_clause]
   [time_from_clause wait_clause [or_clause]];

   [ event_type(event_filter)
   key query_key [within_clause] [retain_clause]
   [with_unique_clause]
   [time_from_clause wait_clause [or_clause]]; ]...
}

The syntax elements are described below.

event_type

Name of the event type that you want to operate on. The event type must be parsable. See Type properties summary. Event type names can come from the root namespace, a using declaration, or a local package as specified in a package declaration.

event_filter

Optionally filter which events of this type you want to be in the window. For example, you might define the window to contain only the events whose amount field is greater than 10. The rules for what you can specify for the event filter are the same as for what you can specify in an event template in EPL. See Event templates.

query_key

Specify one or more event fields or actions. You can specify event fields of type boolean, decimal, float, integer, string or location. When an action is used as an input key, an alias must be supplied. The correlator uses the key to partition the events. Each partition is identified by a unique key value. One or two key fields is typical; three is unusual and rarely needed. More than three key fields is discouraged.

When you define more than one input in a query:

  • The number, type, and order of the key fields in each input definition must be the same.
  • If the names of the key fields are not the same in each input definition, you must specify aliases so that the names match. For details, see Partitioning queries.

retain_clause

Optionally specify retain followed by an EPL integer expression that indicates how many events to hold in the window. For example, if you specify retain 1, only the last event that arrived that is of the specified type and that has the key value(s) associated with that partition is in the window. You must specify a retain clause or a within clause or both. While it is possible to retain any number of events, you must ensure that you define an input that allows a match with the event pattern specified in the corresponding find statement. For example, the following query never finds a match:

query Q {
  inputs {
      A() key k retain 3;
  }
  find A as a1 -> A as a2 -> A as a3 -> A as a4 {
      print a1.toString()+ " - "+a4.toString();
  }
}

within_clause

Optionally specify within followed by a float expression or time literal that specifies the length of time that an event remains in the window. You must specify a retain clause or a within clause or both. See Specifying event duration in windows.

with_unique_clause

Optionally specify a set of secondary keys which constrains the window to only include the latest event for each value for the set of keys. See Matching only the latest event for a given field.

time_from_clause

Optionally specify time from followed by the name of an action that specifies how the source timestamp of the event can be obtained. The named action must be an action defined on that input event type. It must take no parameters and must return a float. This is taken to be when the event occurred, specified as seconds since the epoch.

Info
You are not permitted to use the event’s built-in getTime() method. This method returns the time when the correlator either processed or created the event, which defeats the purpose of the source timestamp functionality.

wait_clause

If a time_from_clause is provided, a wait_clause is required, which specifies wait followed by a float expression or time literal which specifies the maximum delay expected for events. This is how long a query will wait for events if it has not received any events. See also Using heartbeat events with source timestamps and Out of order events.

or_clause

Optionally specify a heartbeat event type which informs the query runtime when communication with the data source is not delayed. See Using heartbeat events with source timestamps. This can only be specified if the time_from_clause and wait_clause are specified.

Behavior when there is more than one input

The correlator orders the events in a window according to the time it processes each event, that is, the time it adds the event to its window. When a query defines more than one input then, for each partition, the correlator maintains a single time-order for all events in all windows.

Suppose the correlator adds an event to a window and within 0.1 seconds the correlator adds a different event to the same window or to another window in the same partition. Outside a query, these two events might have the same timestamp because default correlator behavior is to increment the timestamp only every tenth of a second. In a query, however, if an event is added to a window within 0.1 seconds after another event was added to a window, the correlator assigns the second event a timestamp with enough significant digits to ensure that time order is preserved. The following code fragment shows the result of calling the getTime() method on two events that arrive within 0.1 seconds of each other:

find E as e -> F as f {
   print e.getTime().toString(); // Yields "1365761429.1"
   print f.getTime().toString(); // Yields "1365761429.100001"
}

The order of the events is important when the event pattern in a find statement specifies the followed-by operator. Consider this example:

query Q {
   inputs {
      A() key k retain 20;
      B() key k retain 10;
   }
   find A as a -> B as b {... }
}

This pattern does not trigger when the correlator adds an A event to the A window. But if there is already an A event in the A window then this pattern triggers when a B event is added to the B window.

In a partition, at any one time, it is possible for the set of windows to contain multiple sets of events that, each taken in isolation, would match the defined event pattern. In this case, the event matching policy determines which of the candidate sets triggers an action. See Event matching policy for a description of how the query chooses the event set that triggers an action. To illustrate event matching policy, that topic provides an example of query behavior when there is more than one window.

Specifying event duration in windows

In an input definition, you can specify an optional within clause that indicates the length of time that an event remains in the window. For example:

query FraudulentWithdrawalDetection {
    inputs {
        Withdrawal() key userId within 1 hour;
     }
    find Withdrawal as w1 -> Withdrawal as w2
        where w1.city != w2.city {
        log "Suspicious withdrawal: " + w2.toString() at INFO;
    }
...
}

In this example, a Withdrawal event remains in the window for 1 hour. After 1 hour in the window, an event is ejected. Each time an event is added to one of the windows in a partition, the correlator evaluates the find pattern for that partition. Ejection of an event from a window does not trigger pattern evaluation. There are two formats for specifying a within clause:

  • within time_literal
  • within float_expression

Parentheses in within clauses are allowed. The rules for specifying a time literal are:

  • Specify one or more integer or float literal(s) and follow each one with a keyword that indicates a time unit.

  • Time unit keywords are:

    • day, days
    • hour, hours
    • min, minute, minutes
    • sec, second, seconds
    • msec, millisecond, milliseconds

    Outside a query, you can use these keywords as identifiers. Inside a query, you cannot use these keywords as identifiers unless you prefix them with a hash symbol (#). See also Keywords.

  • A space is required between an integer or float literal and its time unit. A space is required after a time unit if it is followed by an integer or float literal. Additional whitespace is allowed.

  • If you specify more than one time unit keyword they must be in the order of decreasing size. For example, days must be before minutes.

  • You need not specify all time units.

  • Each time unit keyword must represent a different time unit, that is, you cannot, for example, specify both day and days.

Examples of valid time literals:

  • 10 hours
  • 1 days 12 hours
  • 1 day 2 hours 30 minutes 4 sec
  • 2 days 5 minutes
  • 2.5 sec
  • 10 seconds - This is equivalent to specifying the float expression 10.0.

While it is possible to define time literals using float values, for example, 3.5 days 12.5 hours 33.3 min, it is recommended that you use only integer values when the specification includes more than one time unit. For example, rather than specifying 2 days 65.75 minutes, you should specify 2 days 1 hour 5 min 45 sec.

If you open and edit a query in Apama’s Query Designer in Apama Plugin for Eclipse, it modifies the time literal (if necessary) such that it contains only integers. Also, the allowable range of integers is 0 to 23 for hours, 0 to 59 for minutes, 0 to 59 for seconds, and 0 to 999 for milliseconds. Where necessary, the Query Designer rounds up to a whole number of milliseconds. For example, suppose you specify the following time literal in EPL code:

3.5 days 4 hours 27.5 minutes 1002.75 milliseconds

The Query Designer converts this to 3 days 16 hours 27 minutes 31 seconds 3 milliseconds. The actual Query Designer display is: 3d 16h 27m 31s 3ms.

When you specify a float expression it indicates a number of seconds.
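For example, assuming the Withdrawal input from the query above, the following two input definitions keep events in the window for the same length of time, 90 seconds:

Withdrawal() key userId within 90.0;                  // float expression: 90 seconds
Withdrawal() key userId within 1 minute 30 seconds;   // equivalent time literal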

Consider the example at the beginning of this topic as the following events are added to their appropriate windows:

Time Event Added to Window
10:00 Withdrawal("Dan", "London")
10:30 Withdrawal("Dan", "Dublin")
10:45 Withdrawal("Dan", "Paris")
11:15 Withdrawal("Ray", "Honolulu")
11:30 Withdrawal("Dan", "Rome")

For the partition identified by user ID Dan, the query evaluates the pattern at the following times:

| Time  | Window Contents | Matching Events |
| ----- | --------------- | --------------- |
| 10:00 | Withdrawal("Dan", "London") | |
| 10:30 | Withdrawal("Dan", "Dublin"), Withdrawal("Dan", "London") | w1=Withdrawal("Dan", "London"), w2=Withdrawal("Dan", "Dublin") |
| 10:45 | Withdrawal("Dan", "Paris"), Withdrawal("Dan", "Dublin"), Withdrawal("Dan", "London") | w1=Withdrawal("Dan", "Dublin"), w2=Withdrawal("Dan", "Paris") |
| 11:30 | Withdrawal("Dan", "Rome"), Withdrawal("Dan", "Paris") | w1=Withdrawal("Dan", "Paris"), w2=Withdrawal("Dan", "Rome") |
An event remains in its window for exactly the specified duration. For example, at 11:00, `Withdrawal("Dan", "London")` is no longer in the window and at 11:30, `Withdrawal("Dan", "Dublin")` is no longer in the window. Although the contents of the window have changed, ejection of an event does not cause evaluation of the event pattern.

At 11:15, there is no evaluation of the event pattern for the partition identified by user ID Dan because an event is added to a window in the partition identified by user ID Ray.

Using the output of another query as query input

While it is possible to have a query send an event explicitly containing all of the coassignments in the pattern or aggregates using the %send construct, this requires setting each field and defining an event type, which currently can only be done in EPL. That event definition then needs to be kept in sync with the query: if the query pattern is modified, the query’s %send construct and the event definition may both need to be updated. It is therefore recommended that you use the query output event to chain queries together.

Every query will automatically generate a query output event, which can be used as an input to other queries. This makes it easy to connect multiple queries together. One query may compute an aggregate such as an average over a sensor reading, while another query checks if a number of averaged readings match some condition.

The query output event

With each query definition, a query output event definition is automatically defined which is named after the query (see below) and has all of the values available to actions defined as fields.

The query output event definition comprises:

  • All parameters.
  • All keys (using the alias name).
  • For find patterns:
    • All “positive” event coassignments (that is, excluding without events and wait nodes).
  • For find every patterns (that is, a query which contains aggregates):
    • All aggregate select values.

Consider the two examples below, where we assume that the Measure event is defined as follows:

event Measure {
     string deviceId;
     integer userId;
     float value;
}

Example 1

query Q1 {
    parameters {
        float threshold;
    }

    inputs {
        Measure() key deviceId within 5 minutes;
        Error() key deviceId within 5 minutes;
    }
    find Measure:m1 -> (Measure:m2 or Error:e1)
        without Error:err {
    }
}

Query output event definition for Q1:

event Q1 {
    float threshold;
    string deviceId;
    Measure m1;
    optional<Measure> m2;
    optional<Error> e1;
}

In query Q1, the parameter threshold of type float, the solitary input key deviceId of type string, and the positive coassignments m1, m2 and e1 are mapped to fields of the query output event Q1. m2 and e1 are wrapped in optional since they might or might not contain a value (either m2 or e1 on its own is enough to trigger the find pattern).

Example 2

package com.apamax.mypkg;
query Q2 {
    inputs {
        Measure() key deviceId, userId as id within 5 minutes;
    }
    find every Measure:m1
        select mean(m1.value):avg_value
        select last(m1.userId):last_value {
    }
}

Query output event definition for Q2:

package com.apamax.mypkg;
event Q2 {
    string deviceId;
    integer id;
    float avg_value;
    integer last_value;
}

In query Q2, which uses aggregates, both the keys deviceId of type string and userId (aliased as id) of type integer are mapped in the query output event Q2. The query contains two select statements both of whose values are also mapped in the query output event.

Note that the query output event definition resides in the same package as the query, and that it can be used in any EPL application in the usual way you use any external event definition.
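For example, a conventional EPL monitor could listen for the output events of query Q1 above. This is a minimal sketch; the monitor name and log message are illustrative, and it assumes the Q1 output events are delivered to the context in which the monitor runs:

monitor WatchSuspiciousMeasures {
    action onload() {
        on all Q1() as q {
            log "Q1 triggered for device " + q.deviceId + ": " + q.toString() at INFO;
        }
    }
}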

Info
Only the first 32 fields of indexable event types can be used in listeners. Beyond this, fields will be marked as wildcard fields. See Improving performance by ignoring some fields in matching events for more information on wildcards.

When is the query output event generated?

Whenever a query’s find statement triggers, the query output event is routed, regardless of whether the trigger was an incoming event or a timer firing. Another query can use this query output event as an input, thus allowing the “chaining” of one query to another.

For example, query Q3 below uses both Q1 and Q2 as input:

using com.apamax.mypkg.Q2;
query Q3 {
    inputs {
        Q1() key deviceId within 5 minutes;
        Q2() key deviceId within 5 minutes;
    }
    find Q1:q1 and Q2:q2 {
        print "Query Q3 is triggered";
    }
}

Q3 will trigger on receiving the inputs of type Q1 and Q2. Hence, Q3 will trigger when both queries Q1 and Q2 trigger.

Info

It is recommended to use separate packages for queries (and hence the query output event) and any external event definitions defined in EPL. This makes it clear where the event definitions are and avoids name clashes.

Cyclic dependencies are illegal. For example, if a query Q2 uses Q1 as input, then Q1 cannot also use Q2 as input.

Specifying maximum number of events in windows

In an input definition, you can specify a retain clause that indicates how many events can be in the window. For example:

query FraudulentWithdrawalDetection2 {
   inputs {
      Withdrawal() key userId retain 3;
   }
   find Withdrawal as w1 -> Withdrawal as w2 where w1.city != w2.city {
      log "Suspicious withdrawal: " + w2.toString() at INFO;
   }
}

In this query, only the most recent three Withdrawal events can be in the window. In other words, the window cannot contain more than three events. If only zero, one or two Withdrawal events with a particular key have arrived since the application was started then there would be only zero, one or two events, respectively, in the window.

The correlator evaluates the event pattern each time an event is added to the window. Suppose that at the indicated times the following events are added to the window in the partition identified by user ID Dan:

Time Event Added to Window
10:00 Withdrawal("Dan", "Dublin")
10:10 Withdrawal("Dan", "London")
10:20 Withdrawal("Dan", "London")
10:30 Withdrawal("Dan", "London")
11:30 Withdrawal("Dan", "Paris")

For the partition identified by user ID Dan, the query evaluates the pattern at the following times:

| Time  | Window Contents | Matching Events |
| ----- | --------------- | --------------- |
| 10:00 | Withdrawal("Dan", "Dublin") | |
| 10:10 | Withdrawal("Dan", "Dublin"), Withdrawal("Dan", "London") | w1=Withdrawal("Dan","Dublin"), w2=Withdrawal("Dan","London") |
| 10:20 | Withdrawal("Dan", "Dublin"), Withdrawal("Dan", "London"), Withdrawal("Dan", "London") | w1=Withdrawal("Dan","Dublin"), w2=Withdrawal("Dan","London") |
| 10:30 | Withdrawal("Dan", "London"), Withdrawal("Dan", "London"), Withdrawal("Dan", "London") | |
| 11:30 | Withdrawal("Dan", "London"), Withdrawal("Dan", "London"), Withdrawal("Dan", "Paris") | w1=Withdrawal("Dan","London"), w2=Withdrawal("Dan","Paris") |

It is important to note that at 10:30, the `Withdrawal("Dan", "Dublin")` event that arrived at 10:00 is no longer in the window because the window retains three events at most and there are three `Withdrawal` events that have been added to the window more recently.

Specifying event duration and maximum number of events

In an input definition, you can specify a within clause that indicates how long an event can remain in the window and a retain clause that indicates how many events can be in the window. When you specify both a within clause and a retain clause the within clause must be before the retain clause. For example:

query FraudulentWithdrawalDetection3 {
   inputs {
      Withdrawal() key userId within 1 hour retain 3;
   }
   find Withdrawal as w1 -> Withdrawal as w2 where w1.city != w2.city {
      log "Suspicious withdrawal: " + w2.toString() at INFO;
   }
}

In this query, a Withdrawal event can be in the window for up to one hour and only the three most recent Withdrawal events, if each one arrived during the previous hour, can be in the window. In other words, the window cannot contain an event that arrived more than an hour ago and it cannot contain more than three events. If only two Withdrawal events arrived in the previous hour then there would be only two events in the window.

Suppose that at the indicated times the following events are added to the window in the partition identified by user ID Dan:

Time Event Added to Window
10:00 Withdrawal("Dan", "Dublin")
10:10 Withdrawal("Dan", "London")
10:20 Withdrawal("Dan", "London")
10:30 Withdrawal("Dan", "London")
11:30 Withdrawal("Dan", "Paris")

For the partition identified by user ID Dan, the query evaluates the pattern at the following times:

| Time  | Window Contents | Matching Events |
| ----- | --------------- | --------------- |
| 10:00 | Withdrawal("Dan", "Dublin") | |
| 10:10 | Withdrawal("Dan", "Dublin"), Withdrawal("Dan", "London") | w1=Withdrawal("Dan","Dublin"), w2=Withdrawal("Dan","London") |
| 10:20 | Withdrawal("Dan", "Dublin"), Withdrawal("Dan", "London"), Withdrawal("Dan", "London") | w1=Withdrawal("Dan","Dublin"), w2=Withdrawal("Dan","London") |
| 10:30 | Withdrawal("Dan", "London"), Withdrawal("Dan", "London"), Withdrawal("Dan", "London") | |
| 11:30 | Withdrawal("Dan", "Paris") | |

It is important to note that at 10:30 the `Withdrawal("Dan", "Dublin")` event that arrived at 10:00 is no longer in the window because the window retains three events at most and there are three `Withdrawal` events that have been added to the window more recently. Also, at 11:30 there are no `Withdrawal("Dan","London")` events in the window as they have been ejected because more than one hour has elapsed since each one was added to the window.

Using source timestamps of events

By default, the query runtime treats events as occurring in the order in which they are processed, and the time of each event is the correlator’s time at the point the event is processed. This is suitable if events are delivered to the Apama correlator reliably, promptly and in order. However, if events are delayed, accumulated into batches before being sent, or delivered over unreliable networks, then it may be necessary to use the time at which an event happened at the event source; this source timestamp must be available in the event for queries to use it. For example, a car may measure the engine’s temperature, RPM and other important statistics along with a timestamp, and record these in a small computer in the car. Periodically, when the car is connected to a wireless network, the car sends this data as a batch of events. For queries that make use of the time or ordering of events to behave correctly, the query needs to be configured to use the source timestamp.

Info
Source timestamps are not intended to be a replacement for Xclock. They can, however, be used in conjunction with Xclock for testing purposes. Xclock controls the correlator’s time (see Disabling the correlator’s internal clock), whereas source timestamps indicate the time at which an event occurred.
In order to use the source timestamp:

  • Every event which may be delayed should contain the source timestamp in some form.

  • An action must be defined on the event definition, which takes no parameters and returns a float. This should calculate the source time of the event - typically the time the event occurred - based on the fields of the event. The return value of the action should specify the time in seconds since the epoch (midnight, 1 Jan 1970 UTC). If the event contains the time in seconds since the epoch (in this example, stored in a field named sourceTime), this can be as simple as the following:

    action getSourceTime() returns float { return sourceTime; }
    

    Otherwise, the TimeFormat event library can be used to help convert from time of day and date, and perform time zone conversions as necessary. For example, if the source timestamps in your events are not already in the UTC time zone, then one way to do this is to include a time zone field and then use the TimeFormat event’s parseTimeWithTimeZone action to obtain the source time in the correct form as shown in the following event definition:

    using com.apama.correlator.timeformat.TimeFormat;
    using com.apama.queries.TimeFrom;
    
    @TimeFrom("getSourceTime")
    event E {
      integer k;
      string sourceTime;
      string timeZone;
    
      action getSourceTime() returns float {
          TimeFormat timeFormat := TimeFormat();
          return timeFormat.parseTimeWithTimeZone("HH:MM:SS", sourceTime,
                 timeZone);
      }
    }
    

    See also Using the TimeFormat event library.

  • The event definition should have a @TimeFrom annotation as in the above example (see also Adding predefined annotations) or queries that use the event as an input must specify a time from clause that names the action that provides the source time. In either case, queries must always specify a maximum time to wait for the events (see below). If both are specified, the time from clause in the query takes precedence.

    Info
    You are not permitted to use the event’s built-in getTime() method. This method returns the time when the correlator either processed or created the event, which defeats the purpose of the source timestamp functionality.

Waiting for delayed events

If using source timestamps, we assume events may be delayed between the source time at which they occur and being processed by the Apama correlator. If no events are received by the correlator, it needs to distinguish between no events having occurred and events being delayed. If events are delayed, the query runtime will wait before evaluating the query, as it does not have a definitive view of all of the events. A query that uses source timestamps must specify the maximum wait time that a query will wait before it will process events. This is the maximum delay that the query will tolerate and the maximum delay between an event having occurred and the query processing that event. The wait time is inclusive - that is, an event delayed by exactly the value specified in the wait clause will be considered valid.

The maximum wait time must be specified and should be set to a reasonable value, as it can increase the number of events stored by the query runtime, and processing of the query may be delayed by up to that duration. The maximum wait time for an input may be less than or greater than the within duration, but it should not represent a large number of events at the typical event rate for that input.

The wait time must be specified in a query using the wait clause in an input declaration. The wait clause can specify a time as a time literal (using days, hours, minutes and seconds) or as a float expression. Both the source timestamp action and wait clause must be specified (or neither). The source timestamp action can be specified via the time from clause in the query or a @TimeFrom annotation on the event type definition.

It is possible to mix inputs that have source times and events that do not have source times in a single query. Event inputs without a source time are equivalent to using currentTime (that is, the correlator’s current time, see currentTime) as the source time, and a wait time of 0.
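For example, a query can combine a delayed, source-timestamped input with an input that is processed using correlator time. The following sketch reuses the CarEngineTemp event from the example below and assumes an illustrative ManualOverride event type that is also keyed by carId:

query DetectTempAfterOverride {
    inputs {
        CarEngineTemp() key carId within 1 hour wait 6 hours;  // source timestamp via @TimeFrom, may be delayed
        ManualOverride() key carId within 1 hour;              // no source time: correlator time, wait 0
    }
    find ManualOverride as o -> CarEngineTemp as t {
        log "Engine temperature reported after a manual override: " + t.toString() at INFO;
    }
}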

Event definitions may have an annotation defined @DefaultWait which specifies the default value to use for the wait time (see also Adding predefined annotations). This is only informational and used by the Design tab in Apama Plugin for Eclipse when editing query files as a means of setting the default wait time. The query must always specify the wait time, even if it is using the default value. Note that the editor will copy the value from the annotation, so changing the annotation will not affect existing query definitions.

Definitive time of a query event source

Given that input events may be delayed or out of order, how does the query runtime know when it is safe to process events? To answer this question, we introduce the concept of definitive time. The point in time for which the query runtime is entitled to think that it has received all the events it is going to receive is called the “definitive time”. All events at or before this point in time are considered definitive and can be used to evaluate the query. Events after the definitive time will not be processed until they become definitive (that is, the definitive time has changed so that the events are now at or before the definitive time). The query runtime will assume that no further events will be received with a time before the definitive time, and will only evaluate events that have occurred before the definitive time.

In the case of an individual query input, the definitive time of that input is the latest of:

  • the source timestamp of the most recent event received on that input (provided events arrive in order);
  • the correlator’s current time minus the maximum wait time specified for that input;
  • the source timestamp of the most recent heartbeat event for that input, if heartbeat events are used (see Using heartbeat events with source timestamps).

The query’s overall definitive time is then the minimum, that is, the earliest, of the definitive times of its inputs.

If no events (either input or heartbeat events) are received, then a query may need to wait in order to evaluate the events it has received (particularly if using the wait operator in the pattern, or more than one input, where some inputs have no events received).

The concept of definitive time is best explained using worked examples. Consider, first, a query with a single input event type.

using com.apama.queries.TimeFrom;

@TimeFrom("getSourceTime")
event E {
    integer k;
    float sourceTime;

    action getSourceTime() returns float {
        return sourceTime;
    }
}

query SingleInput {

  inputs {
      E() key k within 1 hour wait 2 hours;
  }

  find E as e1 -> E as e2 where e2.getSourceTime() - e1.getSourceTime() > 600.0
  {
     log "Time gap " + (e2.getSourceTime() - e1.getSourceTime()).toString();
  }
}

In this case, where there is only a single input type, the definitive time will be the latest or most recent of either: the source timestamp of the last event, or the current time minus the wait time (2 hours in this example). The following table shows how the query runtime keeps track of the definitive time as it receives input events.

| Wall Time | E event source timestamp | Query definitive time | Result / Explanation |
| --------- | ------------------------ | --------------------- | -------------------- |
| 10:00     | 07:00                    | 08:00                 | |
| 10:05     | 07:30                    | 08:05                 | Nothing - events are too old. |
| 10:10     | 08:30                    | 08:30                 | |
| 10:24     | 08:32                    | 08:32                 | Nothing - event timestamps were only 2 minutes apart. |
| 10:26     | 08:50                    | 08:50                 | Time gap 18 minutes |
| 10:30     | 10:30                    | 10:30                 | Nothing - only 1 event in the “within 1 hour” window. |

Now consider a more complex case where the query has two input event types. Events of type E are defined as above, but we add another definition for events of type X.

@TimeFrom("getSourceTime")
event X {
    integer k;
    float sourceTime;

    action getSourceTime() returns float {
        return sourceTime;
    }
}

query MultipleInputs {

    inputs {
        E() key k within 1 hour wait 1 hour;
        X() key k within 1 hour wait 1 hour;
    }

    find E as e1 -> E as e2 without X as x {
        log "Got (" + e1.sourceTime.toString() + ", "
             + e2.sourceTime.toString() + ")";
    }
}

Once again the table below shows how the definitive time of the query is determined. In this case, the runtime must take the definitive time as being the earliest of the definitive times of the input types because, as the pattern depends on all input types, it is only up until that point that it has a definitive view of all the query inputs.

For example, at wall time 09:22, even though the runtime has got E events with source timestamps 08:32 and 08:40, it is not entitled to conclude that we have a match for the query pattern because the most recent X event has a timestamp of only 08:25, so we do not yet know if there was an X event between 08:32 and 08:40 that would prevent a match. The wait time of 1 hour has not yet elapsed, so the definitive time of the query remains at 08:25, which is the source time of the most recent X event.

It is not until wall time 09:23 that we receive another X event with a source timestamp of 08:50. At this point, given that in this example we know that events are being delivered in order, it is safe for the runtime to assume that there were no other X events between 08:25 and 08:50 and so it can proceed to execute the query and match for the two pairs of E events (“08:30, 08:32” and “08:32, 08:40”). Further, at this time (wall time 09:23) the receipt of the X event with source timestamp 08:50 allows the runtime to update the definitive time of the overall query to 08:40, which has become the earliest of the definitive times of the query inputs.

| Wall Time | E event source timestamp | X event source timestamp | Query definitive time | Result | Explanation |
| --------- | ------------------------ | ------------------------ | --------------------- | ------ | ----------- |
| 09:20     | 08:30                    | 08:25                    | 08:25                 | | |
| 09:21     | 08:32                    |                          | 08:25                 | | Nothing yet. Still waiting for an X. |
| 09:22     | 08:40                    |                          | 08:25                 | | |
| 09:23     |                          | 08:50                    | 08:40                 | Got (08:30, 08:32), Got (08:32, 08:40) | |
| 09:24     | 08:55                    |                          | 08:50                 | | No 08:40 - 08:55 match, there is an X at 08:50. |
| 09:25     | 09:00                    |                          | 08:50                 | | Nothing yet - still waiting for X after 08:50. |
| 09:26     |                          | 08:57                    | 08:57                 | | No 08:55 - 09:00 match, there is an X. |
| 09:27     | 09:10                    |                          | 08:57                 | | Nothing yet - still waiting for X after 08:57. |
| 10:10     |                          |                          | 09:10                 | Got (09:00, 09:10) | We waited for 1 hour for an X. |

Using heartbeat events with source timestamps

When using source timestamps, if a query’s input has no events for a period of time, then the query will wait for the wait time specified for that input before evaluating events. This can cause unacceptable delays in processing events from other inputs. Some data sources may provide heartbeat events with timestamps, which signal that communication from the data source to the queries system is working correctly. If heartbeats are received but no input events are, the query can infer that no input events (or only the input events received) have occurred, so the input is definitive as soon as a heartbeat is received, without waiting any further. If communication is disrupted or delayed, then the heartbeat events will similarly be delayed and the query will wait, as it has to in order to process delayed events.

Heartbeat events are specified on the input event type’s definition or per input of the query. They are only used if a query input is using source timestamps, that is, it has a wait clause specified. The heartbeat can be specified as a @Heartbeat annotation on the event definition, which should name the fully qualified event type to use as heartbeat events.

If a query input contains a time from clause, then the heartbeat must be explicitly named with an or heartbeat-type clause after the wait clause. For example, these two are equivalent:

@TimeFrom("getEcuTime")
@Heartbeat("CarHeartbeat")
event CarEngineTemp { ... }
...
query... {
    inputs {
        CarEngineTemp() key carId within 1 hour wait 6 hours;
    }
...

or:

query... {
    inputs {
        CarEngineTemp() key carId within 1 hour time from getEcuTime
           wait 6 hours or CarHeartbeat;
    }
...

The following rules apply for the heartbeat event:

  • The heartbeat event cannot be filtered.
  • The heartbeat event must share the same key fields and the same types as the input event type. In the above example, both CarEngineTemp and CarHeartbeat must have a field named carId which is of the same type in each event type. If actions are used in the input key, the heartbeat event must also supply the same action with the same signature (same parameters and return type).
  • The heartbeat event must have a matching action for obtaining the source time. In the above example, both CarEngineTemp and CarHeartbeat must have an action of the signature action getEcuTime() returns float. Typically, these would have the same implementation, as the heartbeat would have source timestamps in the same form as the input events; but the implementation of these methods may be different for heartbeat events (see Out of order events.)
  • The heartbeat event cannot be used as an input in the pattern, unless it is also listed as an input event in its own right.
  • The same heartbeat event type may be used for different inputs of the same query (this is typical, as a query may use a number of different types of events from the same data source, such as a car in the above example).

When a heartbeat event is received and processed, it will step forward the definitive time for all inputs that specify that heartbeat event. Thus, if all inputs use the same heartbeat event, then that heartbeat can step forward the definitive time, allowing the query to evaluate events received on some inputs without having to wait for the input wait time on other inputs where no input events were received.
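For example, if two inputs come from the same car and both of their event definitions carry the @Heartbeat("CarHeartbeat") annotation, a single CarHeartbeat event advances the definitive time of both inputs. The following sketch assumes an illustrative CarEngineRPM event type defined analogously to CarEngineTemp:

query MonitorCarReadings {
    inputs {
        CarEngineTemp() key carId within 1 hour wait 6 hours;  // @Heartbeat("CarHeartbeat") on the event definition
        CarEngineRPM() key carId within 1 hour wait 6 hours;   // also annotated with @Heartbeat("CarHeartbeat")
    }
    find CarEngineTemp as t and CarEngineRPM as r {
        log "Temperature and RPM received: " + t.toString() + " " + r.toString() at INFO;
    }
}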

Typically, heartbeat events will be delivered regularly. The rate at which heartbeat events are sent is dependent on the data source, but the queries system must be able to handle all of the heartbeat events from all data sources as well as the input events. Some devices may only send the heartbeats under certain conditions, for example, a car may only send heartbeats if the engine is running or the car is occupied. If no heartbeat events are received, then queries will use the wait time specified in the input before evaluating any events received, as needed.

Note that queries assume that the heartbeat events are delivered in the same order as input events. If an input event arrives with a timestamp before a previous heartbeat event, it will be discarded.

Typically, heartbeat events will be events that come from the same data source as the input events they are used with. Thus, any communications disruption affecting the input events will affect the heartbeat events in the same way. This is not a requirement; if some other system has knowledge of when a data source is connected or disconnected, the heartbeat events could be sent from that system - but if the system incorrectly sends heartbeat events and input events are delayed, then input events may be discarded.

Out of order events

When using source timestamps (see also Using source timestamps of events), the query runtime by default expects events to arrive in order. If an event arrives with an earlier source timestamp than a previous event for that same partition, it will be discarded. However, there are two cases where this behavior does not occur (see below), and queries will store events which arrive out of order and re-order them so that when they are processed, they are processed in order according to the source time.

Info
In both cases described below (with the @OutOfOrder() annotation and delayed events), heartbeat events (if specified) are always considered definitive, even if they are delayed. You cannot use an event definition with an @OutOfOrder() annotation as a heartbeat event. Note that as soon as a heartbeat event is processed, the query will ignore any events with earlier timestamps.

Case 1: Using the @OutOfOrder() annotation on the event definition

If the event definition (in an EPL file) has the @OutOfOrder() annotation which is available in the package com.apama.queries (see also Adding predefined annotations), then the queries runtime will treat it as not occurring in order.

This means that definitive time is not affected by the timestamp on the events. Thus, events will not be processed until the specified wait time has elapsed since their source time, or a heartbeat event (if specified) with a later timestamp has been processed (and all inputs have had their definitive time moved forward).

It is recommended to use heartbeats when using @OutOfOrder() events. They are not required, but if not used, the query execution will be delayed by the longest input wait specified in the query.
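For example, an event type that is known to arrive out of order can be annotated as follows. This is a minimal sketch; the DelayedReading event type and its fields are illustrative:

using com.apama.queries.OutOfOrder;
using com.apama.queries.TimeFrom;

@OutOfOrder()
@TimeFrom("getSourceTime")
event DelayedReading {
    integer k;
    float sourceTime;

    action getSourceTime() returns float {
        return sourceTime;
    }
}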

The following example compares the behavior if @OutOfOrder() is or is not specified on the input:

query FindAdjacentAEvents {
    inputs {
        A() within 30.0 wait 20 seconds;
    }
    find A as a1 -> A as a2 {
        print "a1 = "+a1.toString()+"; a2 = "+a2.toString();
    }
}

In the following tables, the events are listed in the order in which they are processed, but they occur in the order A(1), A(2), A(3), A(4). Note that A(2) is delayed by more than the wait time of the query (the actual events would have a source timestamp, but we show that as a separate column for clarity).

The following table applies if the event definition does have @OutOfOrder():

| Input event | Input event timestamp | Correlator time | Notes | Query definitive time | Query output |
| ----------- | --------------------- | --------------- | ----- | --------------------- | ------------ |
| A(1) | 10:00:10 | 10:00:20 | | 10:00:00 | |
| A(4) | 10:00:20 | 10:00:30 | | 10:00:10 | |
| A(3) | 10:00:15 | 10:00:32 | | 10:00:12 | |
|      |          | 10:00:35 | 20 seconds after A(3)’s source time (10:00:15) | 10:00:15 | a1=A(1); a2=A(3) |
| A(2) | 10:00:12 | 10:00:37 | discarded - more than 20 seconds old | 10:00:17 | |
|      |          | 10:00:40 | 20 seconds after A(4)’s source time (10:00:20) | 10:00:20 | a1=A(3); a2=A(4) |

The following table applies if the event definition does not have @OutOfOrder():

| Input event | Input event timestamp | Correlator time | Notes | Query definitive time | Query output |
| ----------- | --------------------- | --------------- | ----- | --------------------- | ------------ |
| A(1) | 10:00:10 | 10:00:20 | | 10:00:10 | |
| A(4) | 10:00:20 | 10:00:30 | | 10:00:20 | a1=A(1); a2=A(4) |
| A(3) | 10:00:15 | 10:00:32 | | 10:00:20 | (nothing - event is discarded as it is out of order) |
| A(2) | 10:00:12 | 10:00:37 | discarded - more than 20 seconds old | 10:00:20 | |

Case 2: Events are delayed

Even in the case where events are normally delivered in order from the data source, if there is a delay which is then resolved, a number of delayed events may all be processed in a very short space of time. Even if they are delivered to Apama correlators in the correct order, the query runtime runs in parallel within the correlator, so events processed close together in time may be processed out of order, even if they do not have an @OutOfOrder() annotation on the event definition. If an event is delayed, then the query runtime will wait before considering the event’s time as definitive for that input.

By default, the query runtime considers an event as delayed if its source time is more than 10 seconds before the correlator’s time at the point it is processed, and it will wait for 10 seconds before considering the event’s time as definitive for that input. These settings can be modified by sending in a SetDelayedEventsLeeway(delayLeeway, reorderBuffer) event:

com.apama.queries.SetDelayedEventsLeeway(5, 20.0)

The above example would set the query runtime to consider events older than 5 seconds as delayed, and would not consider them definitive until 20 seconds after they were received.

To consider all events in order regardless of delay, send an event with the first value set to infinity (as all actual delays must be less than infinity):

com.apama.queries.SetDelayedEventsLeeway(infinity, 0.0)

These events should be sent to all correlators in a cluster, typically as part of the initialization of the correlator along with injecting the query definitions.

The following example compares the behavior with different configurations and some delayed events:

query FindAdjacentAEvents {
    inputs {
        A() within 30 minutes wait 10 minutes;
    }
    find A as a1 -> A as a2 {
        print "a1 = "+a1.toString()+"; a2 = "+a2.toString();
    }
}

The following table lists the events where the A event does not have @OutOfOrder(). The last three columns give the behavior with different configurations:

  • Default config. A. Matches with the default values: 10 seconds delay threshold and 10 seconds reorder buffer.
  • Config. B. Matches if SetDelayedEventsLeeway(300, 10) is sent: 5 minutes (300 seconds) delay threshold and 10 seconds reorder buffer.
  • Config. C. Matches if SetDelayedEventsLeeway(10, 60) is sent: 10 seconds delay threshold and 1 minute reorder buffer.

| Input event | Input event timestamp | Correlator time | Definitive time of the query for default leeway values | Default config. A | Config. B | Config. C |
| ----------- | --------------------- | --------------- | ------------------------------------------------------- | ----------------- | --------- | --------- |
| A(1) | 10:06:10 | 10:10:30 | 10:00:30 (10 minutes ago) | | | |
| A(4) | 10:06:20 | 10:10:31 | 10:00:31 (10 minutes ago) | | a1=A(1); a2=A(4) | |
| A(3) | 10:06:15 | 10:10:32 | 10:00:32 (10 minutes ago) | | (A(3) out of order and discarded) | |
| A(2) | 10:06:13 | 10:10:33 | 10:00:33 (10 minutes ago) | | (A(2) out of order and discarded) | |
|      |          | 10:10:43 | 10:06:20 (latest A event received) | a1=A(1); a2=A(2) / a1=A(2); a2=A(3) / a1=A(3); a2=A(4) | | |
|      |          | 10:11:33 | | | | a1=A(1); a2=A(2) / a1=A(2); a2=A(3) / a1=A(3); a2=A(4) |
| A(6) | 10:12:05 | 10:12:10 | 10:12:05 (latest A event received) | a1=A(4); a2=A(6) | a1=A(4); a2=A(6) | a1=A(4); a2=A(6) |
| A(5) | 10:12:04 | 10:12:11 | 10:12:05 (latest A event received) | (none - event A(5) is discarded) | (none - event A(5) is discarded) | (none - event A(5) is discarded) |

Note that A(6) is treated as occurring in order, as it is delayed by less than the delayLeeway value. Thus A(5) is discarded, as it has occurred out of order.

Matching only the latest event for a given field

A query input can optionally limit the window to only contain the most recent item for each value of a given field or action of the event. This is performed by the with unique operator, which is followed by one or more fields or actions of the input event type.

For example, consider a query looking at sensor data from a number of sensors on the same production line, with events that specify the productionLine and sensorId. The query compares sensor values between different machines and sensors on the same production line, so the query can be keyed on the productionLine field of events, but not on the sensorId field. However, only the latest event for each sensor is required. By specifying a with unique sensorId clause, only the latest value of each sensor is used.

If you add a with unique clause and more than one item in the window has the same value for all the fields or actions listed in that clause, then only the most recent of those events is considered to be in the window and can match the pattern. The suppression of duplicates occurs after the within and/or retain clauses are applied. For example:

inputs {
    Sensor() key productionLine retain 3 with unique sensorId;
}

Given the following events, the window contains only those marked in the third column of the following table (assuming all are for the same productionLine):

Event sensorId Window contains Notes
1 A 1(A)
2 B 1(A), 2(B)
3 C 1(A), 2(B), 3(C)
4 B 3(C), 4(B) Event 1 is discarded due to retain 3. Event 2 is discarded as event 4 has the same sensorId.
5 D 3(C), 4(B), 5(D)
6 C 4(B), 5(D), 6(C) Event 3 is discarded due to retain 3.
7 D 6(C), 7(D) Event 4 is discarded due to retain 3. Event 5 is discarded as event 7 has the same sensorId.

Note that the with unique is applied after the retain expression. Any with unique expression does not affect window sharing (see also Queries can share windows) nor how much data is stored.

The with unique clause comes after the sizing of the window (within, retain) and before, if present, the time from, wait or or clauses used for specifying source time.
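For example, an input that combines all of these clauses in the required order might look like the following sketch, which assumes the Sensor event also defines a getSourceTime() action returning its source time:

inputs {
    Sensor() key productionLine within 1 hour retain 100 with unique sensorId
        time from getSourceTime wait 30 seconds;
}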

with unique can list a number of comma-separated members or calls to actions, where the action name is followed by parentheses. Actions used in a with unique clause must take no parameters and return a value. The ordering is unimportant.

For example, using with unique upperName() for an event definition such as the following would only keep one event for each value of the name field, ignoring case:

event E {
    string name;
    action upperName() returns string { return name.toUpper(); }
}
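A corresponding input definition could then be written as follows (a sketch; the window duration is arbitrary and, as in the other examples, the with unique clause follows the window sizing):

inputs {
    E() within 1 day with unique upperName();
}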

Finding and acting on event patterns

In a query, the find statement specifies the event pattern you are interested in. At runtime, for each event that the correlator adds to a window, the query checks for a match. Depending on the definition of the event pattern, the set of events that matches the pattern contains one or more events. This is the match set. A match set

  • Always contains the latest event, which is the event that was most recently added to a window.
  • Satisfies the event pattern.
  • Is always the most recent set that matches the event pattern. This is important when there is more than one set that is a candidate for the match set.

The format of a find statement is as follows:

find pattern block
Syntax Element Description
pattern The event pattern that you want to find. See Defining event patterns.
block The procedural code to execute when a match is found. See Acting on pattern matches.

Defining event patterns

In a query definition, you specify a find statement when you want to detect a particular event pattern. The find statement specifies the event pattern of interest followed by a procedural block that specifies what you want to happen when a match is found. For example:

query ImprobableWithdrawalLocations
{
    inputs {
        Withdrawal() key cardNumber within 24 hours;
    }
    find
        Withdrawal as w1 -> Withdrawal as w2 where w2.country != w1.country {
                log "Suspicious withdrawal: " + w2.toString() at INFO;
        }
}

In this example, the window that the query operates on contains any Withdrawal events that have arrived in the last 24 hours. The key is the card number so each partition contains only Withdrawal events that have the same value in their cardNumber field. In other words, each partition contains the Withdrawal events for one particular account. For more information about input definitions, see Defining query input.

The find statement specifies that the event pattern of interest is a Withdrawal event followed by another Withdrawal event.

In each partition, the where clause filters the Withdrawal events so that there is a match only when the values of their country fields are different. The two event templates in the find statement coassign matching Withdrawal events to w1 and w2, respectively.

In this example, the two matching Withdrawal events might or might not have arrived in the partition consecutively. For details, see Query followed-by operator.

When there is a match the query executes the action in the find block.

The format for defining a find statement is as follows:

find
   [every] [wait duration as identifier]
   event_type as identifier [find_operator event_type as identifier]...
   [wait duration as identifier]
   [where_clause] [within_clause] [without_clause]
   [select_clause] [having_clause] {
      block
   }

Syntax Element

Description

event_type

Name of the event type you are interested in. You must have specified this event type in the inputs section.

every

Specify the optional every modifier in conjunction with the select and having clauses. This lets you specify a pattern that aggregates event field values in order to find data based on many sets of events. See Aggregating event field values.

wait

Specify the optional wait modifier followed by a time literal or a float expression. A wait modifier indicates a period of elapsed time at the beginning of the event pattern and/or at the end of the event pattern. A float expression always indicates a number of seconds. See Query wait operator.

identifier

Coassign the matching event to this identifier. A coassignment variable specified in an event pattern is within the scope of the find block and is a private copy in that block. The exception is an aggregating find statement, in which only the projection expression can use the coassignments from the pattern; the procedural block of code can use projection coassignments and any parameters, but it cannot use coassignments from the pattern. Changes to the content that the variable points to do not affect any values outside the query. Unlike EPL event expressions, you need not declare this identifier before you coassign a value to it.

In an event pattern in a find statement, each coassignment variable identifier must be unique. You must ensure that an identifier in an event pattern does not conflict with an identifier in the parameters section, or inputs section.

find_operator

Optionally specify and or -> and then specify an event_type and coassignment variable. Parentheses are allowed in the pattern specification and you can specify multiple operators, each followed by an event_type and coassignment variable. For example, the following is a valid find statement:

find (A as a1 -> ((A as a2)) -> (A as a3) ->
      (A as a4 -> A as a5 -> A as a6) ->
      (((A as a7) -> A as a8) -> A as a9) ->
      A as a10)
{
    print "query with 10: " + a1.toString() + " - " + a10.toString();
}

You can use either as or the colon (:) as the coassignment operator.

where_clause

To filter which events match, specify where followed by a Boolean expression that refers to the events you are interested in. The Boolean expression must evaluate to true for the events to match. The where clause is optional. Coassignment variables specified in the find or select statements are in scope in the where clause. Also available in a where clause are any parameter values and key values. This where clause applies to the event pattern and is referred to as a find where clause to distinguish it from a where clause that is part of a without clause, which is referred to as a without where clause. See Query conditions.

within_clause

A within clause sets the time period during which events in the match set must have been added to their windows. A pattern can specify zero, one, or more within clauses. See Query conditions.

without_clause

A without clause specifies event types whose presence prevents a match. See Query conditions.

select_clause

A select clause indicates that aggregate values are to be computed. See Aggregating event field values.

having_clause

A having clause restricts when the procedural code is invoked for a pattern that aggregates values. See Aggregating event field values.

block

Specify one or more statements that operate on the matching event(s). For details about code that is permissible in the find block, see Acting on pattern matches. Items available in a find block can include:

  • Any parameters defined in the parameters section
  • Coassignment variables specified in the event pattern (or projection coassignments in the case of aggregating find statements).
  • Key values

Query followed-by operator

You can specify the -> (followed-by) operator in the find statement. The -> operator matches events that come after each other. The event on the left of the operator always arrives in the correlator before the event on the right. In other words, the -> operator is always between two distinct events. For example, A as a1 -> A as a2 requires the arrival of two instances of an A event for the query to find a match. Also, any where clauses in the find statement must evaluate to true for an event pattern to match. Finally, the match set always includes the latest event.

Thus, the rules for when there is a match for an event pattern that specifies one or more followed-by operators are as follows. All of these requirements must be met for there to be a match.

  • There are events in the partition that match the subpatterns on both sides of the followed-by operator(s).
  • There is a match for the subpattern on the left of a followed-by operator before there is a match for the subpattern on the right of a followed-by operator. One event cannot match more than one subpattern in an event pattern.
  • If a subpattern contains a where clause then the where clause must evaluate to true for the subpattern to match.
  • The match set contains the latest event.
  • If there is more than one candidate event set for the match set then it is the most recent candidate event set that is the match set. See Event matching policy.

The following sections provide examples that illustrate these rules.

Two coassignments

Consider the following code in which the Withdrawal event contains only one field of interest, which is the country. Assume that the query partitions arriving Withdrawal events into windows according to the account number field.

find Withdrawal as w1 -> Withdrawal as w2
   where w1.country = "UK" and w2.country = "Narnia" {
      // Recent card fraud in Narnia against UK customers
      send SuspiciousWithdrawal(w2) to "Suspicious";
}

To make it easier to understand the behavior of the -> operator in more populated windows, the following example events omit the account number field but include a unique identifier field. Suppose the window for this query contains the following events, in arrival order top to bottom:

Withdrawal("Belgium", 1)
Withdrawal("UK", 2)

Although there is a Withdrawal event followed by another Withdrawal event, the where clause does not evaluate to true so there is no match. Now suppose the window contains these events:

Withdrawal("UK", 3)
Withdrawal("Narnia", 4)

Now the query finds a match. There is a Withdrawal event followed by another Withdrawal event, and the where clause evaluates to true. Withdrawal("UK", 3) is coassigned to w1 and Withdrawal("Narnia", 4) is coassigned to w2. The query executes the statements in its find block, which in this example is to send the event that triggered the match.

In this example, the Withdrawal events in the match set arrived consecutively. However, this is not a requirement. Consider a window that contains the following events:

Withdrawal("UK", 5)
Withdrawal("Belgium", 6)
Withdrawal("Belgium", 7)
Withdrawal("Narnia", 8)

When Withdrawal("Narnia", 8) is added to its window, the query finds a match because the Withdrawal("UK", 5) event is followed by the Withdrawal("Narnia", 8) event and the where clause evaluates to true for those two events. The effective behavior is that all combinations of events in the window are inspected to find a combination that matches. The Withdrawal("UK, 5") event is coassigned to w1 and Withdrawal("Narnia, 8") is coassigned to w2. The query executes the statements in its find block.

A match must include the event that arrived most recently in the window (the latest event). This ensures that a query does not detect more than one match for the same combination of events. In the previous example, the query found a match when the Withdrawal("Narnia", 8) event arrived.

Imagine that another Withdrawal event arrives and the window now contains the following events:

Withdrawal("UK", 5)
Withdrawal("Belgium", 6)
Withdrawal("Belgium", 7)
Withdrawal("Narnia", 8)
Withdrawal("Belgium", 9)

While the window still contains the Withdrawal("UK", 5) event followed by the Withdrawal("Narnia", 8) event, the arrival of the Withdrawal("Belgium", 9) event does not trigger a new match because it is not part of that combination. However, suppose the Withdrawal("Narnia", 10) event arrives. The window now contains the following events:

Withdrawal("UK", 5)
Withdrawal("Belgium", 6)
Withdrawal("Belgium", 7)
Withdrawal("Narnia", 8)
Withdrawal("Belgium", 9)
Withdrawal("Narnia", 10)

Now the query finds a new match. The Withdrawal("UK", 5) event is followed by the just arrived Withdrawal("Narnia", 10) event and the where clause evaluates to true for these two events. This match set contains Withdrawal("UK", 5) and Withdrawal("Narnia", 10). While this match set contains the same Withdrawal("UK", 5) event that was in the previous match set, it is a new match set because it contains the event that arrived most recently, which is the Withdrawal("Narnia", 10) event.

Suppose that the Withdrawal("Narnia", 14) event has just arrived in the following window:

Withdrawal("Belgium", 11)
Withdrawal("UK", 12)
Withdrawal("UK", 13)
Withdrawal("Narnia", 14)

In this situation, there is a match set that contains the two most recently arrived events, that is, Withdrawal("UK", 13) and Withdrawal("Narnia", 14). The Withdrawal("UK", 12) event is not part of the match set because it is not the most recently arrived Withdrawal event whose country field is "UK".

Three coassignments

The code example below shows three coassignments in the find statement. This query partitions the arriving events into windows according to their automated teller machine identifier numbers (atmId).

query RepeatedMaxWithdrawals {
    inputs {
        Withdrawal() key atmId within 4 minutes;
    }
    find Withdrawal as w1 -> Withdrawal as w2 -> Withdrawal as w3
        where w1.amount = 500 and w2.amount = 500 and w3.amount = 500  {
            log "Suspicious withdrawal: " + w3.toString() at INFO;
    }
}

Each window contains the Withdrawal events that occurred in the last four minutes at a particular ATM. For simplicity, the following examples show only the amount and transactionId event fields. Suppose the following events are in the window and that they arrived in order from top to bottom:

Withdrawal(500, 101)   w1
Withdrawal(500, 102)   w2
Withdrawal(500, 103)   w3

After the third event arrives, the event pattern is matched, the where clause evaluates to true, and the events are coassigned to w1, w2, and w3 as shown above.

Another event arrives in the window:

Withdrawal(500, 101)
Withdrawal(500, 102)   w1
Withdrawal(500, 103)   w2
Withdrawal(500, 104)   w3

When the fourth event arrives there is a new match and the events are coassigned as shown above. The Withdrawal(500, 101) event is not part of the new match set. A match set always includes the most recent events that satisfy the event pattern and that allow the where clause to evaluate to true.

Another event arrives and the window now contains these events:

Withdrawal(500, 101)
Withdrawal(500, 102)
Withdrawal(500, 103)
Withdrawal(500, 104)
Withdrawal(100, 105)

The latest event, Withdrawal(100, 105), does not have 500 in its amount field. Consequently, its arrival in the window does not trigger a new match because a match set must always include the latest event. While the window still contains three events that satisfy the event pattern, the actions in the find block are not executed as a result of the arrival of Withdrawal(100, 105) because it did not trigger a new match.

Another event arrives and the window now contains these events:

Withdrawal(500, 101)
Withdrawal(500, 102)
Withdrawal(500, 103)  w1
Withdrawal(500, 104)  w2
Withdrawal(100, 105)
Withdrawal(500, 106)  w3

With the arrival of the Withdrawal(500, 106) event, there is a new match and the events are coassigned as shown above. The coassigned events are the three most recently arrived events that satisfy the event pattern. It does not matter that Withdrawal(100, 105) arrived after some events that are in the match set. That event does not satisfy the event pattern and so it is not included in the match set.

Finally, suppose all of the following events have arrived in the window within the specified four minutes:

Withdrawal(500, 101)
Withdrawal(500, 102)
Withdrawal(500, 103)
Withdrawal(500, 104)
Withdrawal(100, 105)
Withdrawal(500, 106)   w1
Withdrawal(500, 107)   w2
Withdrawal(100, 108)
Withdrawal(100, 109)
Withdrawal(500, 110)   w3

As you can see, the latest event causes a new match. This match set does not include the two events that arrived just before the latest event. Those two events do not satisfy the event pattern.

Query and operator

In a find statement, you can specify the and operator in the event pattern. The events on both sides of the and operator must be matched for the query to fire. The condition on each side of an and operator can be a single event template or a more complex expression.

In the next example, assuming that an X event and a Y event have already been added to their respective windows, adding an A event to its window causes a match.

(X as x -> A as a1) and (Y as y -> A as a2)

In the second example, suppose events were added to their windows in this order: X(1), A(1), Y(1), A(2). The A(1) event is not included in the match set. Only A(2) is in the match set because it is the most recent A event to follow X(1) as well as the most recent A event to follow Y(1).

When a single event is used in more than one coassignment you must coassign the event, A in these examples, to distinct identifiers, a1 and a2 in these examples.

Specification of an and operator implies that there are no requirements regarding the order in which the events specified in the event pattern are added to the window. For example, events specified in the right-side condition can be added to their windows before events specified in the left-side condition. When conditions specify multiple events, the events that cause one side of the and operator to evaluate to true:

  • can all be added to their windows before the events that cause the other side to evaluate to true;
  • can all be added to their windows after the events that cause the other side to evaluate to true;
  • can arrive in their windows at times interleaved with the arrival of the events that cause the other side to evaluate to true;
  • can contain the events that cause the other side to evaluate to true;
  • can be contained by the events that cause the other side to evaluate to true.

When there is an order requirement or when you require multiple instances of the same event type, specify the followed-by (->) operator.

The and operator has a higher precedence than the followed-by (->) operator, and lower precedence than the or operator. For clarity, use brackets in expressions that specify more than one type of operator.
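For example, brackets make the intended grouping explicit when and and followed-by are combined (a sketch with placeholder event types A, B and C):

find (A as a and B as b) -> C as c {
    log "C arrived after both A and B" at INFO;
}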

Query or operator

In a find statement, you can specify the or operator in the event pattern. The events on one side or the other of the or operator must be matched for the query to fire. The condition on each side of an or operator can be a single event template or a more complex expression.

In the next example, assuming that a FlagAccount event and an OrderPlaced event have already been added to the query’s window, adding either a CreditCardAdded or OrderCancelled event to its window causes a match.

FlagAccount as account -> (CreditCardAdded as added or
  (OrderPlaced as placed and OrderCancelled as cancelled) )

A pattern normally only matches one side of an or operator, as it matches the most recent events. However, if one event matches both sides of an or operator, then both events may be coassigned.

Optional or-terms

Events on one side of the or operator are not required to be present when matching the pattern. In the example above, the added, placed and cancelled coassignments are not all required to be present. The pattern matches if either an added event, or both a placed and a cancelled event, appear in the query’s window. These terms are referred to as “or-terms”. It is possible for the pattern to match with only some of those events matched, leaving the others without an event assigned to them. These or-terms are thus optional rather than definitely having a value matched by the pattern. The following rules apply to or-terms:

  • Or-terms can only be used in where clauses (see Query conditions) if the where clause does not make use of or-terms on the opposite side of the or operator in the pattern. In the above example, added is opposite placed and cancelled. Therefore, the following where clauses are not legal:

    • where added.cardId = placed.cardId

    • where added.cardId = 5 or placed.cardId = 5

      (but see the next point for an example of how to express these conditions)

  • If one of the where clauses uses or-terms that are not being matched by the pattern, then they are ignored as they cannot be evaluated. For example, only one of the following where clauses is required to match (as it is not possible for both to match):

    • where added.cardId = 5
    • where placed.cardId = 5
  • In the action of the query (see Acting on pattern matches), the type of an or-term is optional<EventType>. In the above example, account is of type FlagAccount, while added, placed and cancelled are of types optional<CreditCardAdded>, optional<OrderPlaced> and optional<OrderCancelled>, respectively.

  • If using the %send construct (see Adding query send event actions), any or-terms required by the fields of the event should be included in an ifpresent entry of the %send. This uses the ifpresent statement, so the contents of the or-term events are available to the send action. When using the Query Send Event Action dialog in Apama Plugin for Eclipse, the ifpresent entry is filled out automatically.

  • If one side of an or operator matches and the other side is incomplete, then no or-terms from the incomplete side are included in the matched events. Each side of an or operator can either match completely or not at all. In the above example, if a CreditCardAdded event occurs, the OrderPlaced event is discarded, despite being present in the window. Thus, detecting the presence of just the placed or cancelled event with ifpresent is sufficient to detect which side of the or has matched.

  • Aggregates (see Aggregating event field values) cannot use or-terms.
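
The following sketch is not part of the product examples; it reuses the event types from the pattern above and shows one way to use the ifpresent statement in the find block to act on whichever side of the or operator matched:

find FlagAccount as account ->
   (CreditCardAdded as added or
    (OrderPlaced as placed and OrderCancelled as cancelled)) {
   // added, placed and cancelled are or-terms, so their types are optional<...>
   ifpresent added {
      log "Card added to flagged account: " + added.toString() at INFO;
   }
   ifpresent placed, cancelled {
      log "Order placed and then cancelled: " + cancelled.toString() at INFO;
   }
}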

The or operator has a higher precedence than both the and operator and the followed-by (->) operator. For clarity, use brackets in expressions that specify more than one type of operator.

Query wait operator

You can specify the wait operator in an event pattern. The wait operator indicates that there must be a time interval either at the beginning of the matching pattern or at the end of the matching pattern. The format for specifying the wait operator is as follows:

wait ( durationExpression ) as coassignmentId

You can use either as or the colon (:) as the coassignment operator.

The syntax elements are as follows:

  • durationExpression: a time literal (such as 2 min 3 seconds) or a float expression. A float expression can use constants and parameters. It indicates a number of seconds.
  • coassignmentId: an identifier. You can specify this identifier only in a between clause. See Query condition ranges.

Typically, you specify the wait operator in conjunction with an event pattern condition. For example:

find A as a -> B as b -> wait(10) as t
   without X as x between ( b t )

There is a match for this pattern when these things happen in this order:

  1. An A event is added to a window in a partition.
  2. A B event is added to a window in the same partition.
  3. Ten seconds go by without an X event being added to a window in that partition.

The wait operator can be unambiguously at the beginning of a pattern that uses the followed-by operator or unambiguously at the end of a pattern that uses the followed-by operator. For example:

X as x -> wait(1.0) -> Y as y              // Not allowed
X as x and wait(1.0) and Y as y            // Not allowed
X as x and Y as y and wait(1.0)            // Not allowed
wait(1.0) -> (X as x and Y as y)           // Allowed
wait(1.0) -> X as x -> Y as y -> wait(1.0) // Allowed

The following code fragment detects the opening of a door without security authorization:

find wait( 5 seconds ) as p -> DoorOpened as e
   without SecurityAuthorization as s where s.doorId = e.doorId {
   emit UnauthorizedAccess(e.doorId);
   }

Suppose the following events were received:

Time Event
00 SecurityAuthorization("door1")
02 DoorOpened("door1")
07 DoorOpened("door1")
15 DoorOpened("door2")

The first DoorOpened event for door1 does not generate an alert because a SecurityAuthorization event was received within the 5 seconds that preceded the first DoorOpened event and the doorId field is the same for both events. That is, because the Boolean expression in the where clause of the without clause evaluates to true, a match is prevented and so an alert is not sent.

The second DoorOpened event for door1 causes an UnauthorizedAccess alert because the SecurityAuthorization event was received more than 5 seconds before the second DoorOpened event for door1.

The DoorOpened event for door2 causes an UnauthorizedAccess alert because a SecurityAuthorization event was not received within the 5 seconds that preceded that DoorOpened event. Since there was no SecurityAuthorization event, the Boolean expression in the where clause that is in the without clause evaluates to false, which allows a match.

Query conditions

A find statement can specify conditions that determine whether there is a match for the specified event pattern. The following overview describes the conditions you can specify: where, within and without.

where condition

  • Specifies: a Boolean expression.
  • Latest event can cause a match when: the Boolean expression evaluates to true.
  • Number allowed: zero or more.
  • Order when all conditions are specified: 1st.
  • Format: where boolean_expression
  • Notes: where x where y is equivalent to where x and y. A where clause that precedes any within or without clauses is referred to as a find where clause.

within condition

  • Specifies: a time period.
  • Latest event can cause a match when: the events in the pattern (or, if specified, the between range) were received within the time specified. That is, the elapsed time from when the first event was received to when the last event was received must be less than the within time period.
  • Number allowed: zero or more.
  • Order when all conditions are specified: 2nd.
  • Format: within time_literal
  • Notes: Alternatively, you can specify within expression. The expression must evaluate to a float. Optionally, after each within clause, you can specify a between clause. See Query condition ranges.

without condition

  • Specifies: an event type coassigned to an identifier.
  • Latest event can cause a match when: no event of the specified type was added to a window after the addition of the oldest event in the potential match set and before the addition of the latest event.
  • Number allowed: zero or more.
  • Order when all conditions are specified: 3rd.
  • Format: without typeId as coassignmentID (you can use either as or the colon (:) as the coassignment operator).
  • Notes: Optionally, after each without clause you can specify one where clause, which is referred to as a without where clause to distinguish it from a find where clause. Optionally, after each without clause, you can specify a between clause. See Query condition ranges.

Query where clause

A where clause filters which events match. A where clause consists of the where keyword followed by a Boolean expression that refers to the events you are interested in.

Coassignment variables specified in the find statement are in scope in the find where clause, as are any parameter values and key values. However, a where clause cannot use or-terms (coassignments that are not required to be present because they are on one side of an or operator in the pattern) from both sides of an or operator: each where clause can use coassignments from at most one side of every or operator in the pattern. Different where clauses can use coassignments from different sides of an or operator (see Query or operator). Thus, when writing where clauses that apply to either side of an or, use a separate where clause for each condition. They cannot be combined into a single where clause with an or or and operator, because that could require both sides of the pattern's or operator to be evaluated.

For example, instead of

find OrderPlaced:placed or OrderCancelled:cancelled
        where placed.orderId = 1 or cancelled.orderId = 2

write the following:

find OrderPlaced:placed or OrderCancelled:cancelled
        where placed.orderId = 1
        where cancelled.orderId = 2

For where clauses that do not use any or-terms, all of the Boolean expressions must evaluate to true for the events to match.

For where clauses that use or-terms, they only apply if the events they make use of are matched by the pattern. If they use or-terms that have not been matched by the pattern, then those where clauses are ignored as they cannot be evaluated.

All of the where clauses that can be evaluated must be true for the pattern to match. If a single where clause combines (with an and or or operator) conditions on an or-term and a normal coassignment, then the entire where clause is ignored if the or-term is not matched.

The where clause is optional. You can specify zero, one or more where clauses.

Info
You can specify a find where clause that applies to the event pattern and you can also specify a without where clause that is part of a without clause. Any where clauses that you want to apply to the event pattern must precede any within or without clauses.

Query within clause

A within clause sets the time period during which events in the match set must have been added to their windows. A pattern can specify zero, one, or more within clauses. These must appear after any find where clauses and before any without clauses. The format of a within clause is as follows. The between clause is optional.

within durationExpression [ between ( identifier1 identifier2... ) ]

The durationExpression must be a time literal (such as 2 min 3 seconds) or it must evaluate to a float value. A float expression can use constants and parameters. It indicates a number of seconds.

For example, consider the following find statement:

find LoggedIn as lc -> OneTimePass as otp
   where lc.user = otp.user
   within 30.0 {
      emit AccessGranted(lc.user);
   }

If specified, the between clause lists two or more items. Each item can be a coassigned event in the pattern. A wait coassignment can also be specified. These items define a range. See Query condition ranges. For example:

find wait(1.0) as w -> A as a
   within 5.0 between ( w a ) {
   ...
}

Now assume that the following events arrive:

Time Event Access Granted?
10 LoggedIn("Andy")  
15 OneTimePass("Andy") Yes. Both events received within 30 seconds.
20 LoggedIn("Mike")  
60 OneTimePass("Mike") No. OneTimePass event received more than 30 seconds after corresponding LoggedIn event.
60 LoggedIn("Sam")  
90 OneTimePass("Sam") No. OneTimePass event received exactly 30 seconds after corresponding LoggedIn event. For there to be a match, the OneTimePass event must be received less than 30 seconds after its corresponding LoggedIn event.

As mentioned earlier, a find statement can specify multiple within clauses. This is useful when the pattern of interest refers to multiple events and you specify a between range as part of each within clause. When you specify multiple within clauses they must all be satisfied for there to be a match.

Query without clause

A without clause specifies event types, which must be specified in the inputs block of the query, whose presence prevents a match. For example, if a potential match set contains 3 events, it can be a match only if no event of a type specified in a without clause was added to a window after the first event and before the third event. Any event type that can be used in the find pattern can be used in the without clause.

Optionally, after each without clause, you can specify one where clause, which is referred to as a without where clause to distinguish it from a find where clause. Find where clauses and without where clauses compare as follows.

Find where clause:

  • true allows a match. Think of this as a positive where clause.
  • Can only appear before any within or without clauses.
  • Applies to the event pattern.
  • Cannot refer to the event specified in a without clause.

Without where clause:

  • false allows a match. Think of this as a negative where clause.
  • Can only be part of a without clause.
  • Applies to the event specified in its without clause.
  • Can refer to the event specified in its without clause.

The absence of an event of a type specified in a without clause has the same effect as the presence of an event for which the without where clause evaluates to false.

In addition to being able to refer to parameters and coassignment identifiers in the event pattern, a without where clause can refer to the one event mentioned in its without clause. When a without where clause evaluates to true, the presence of the without event prevents a match. If a without where clause is false, then that without event instance is ignored; that is, a match is possible.

A without clause cannot use the -> or and pattern operators. However, you can specify multiple without clauses. If there are multiple without clauses each one can refer to only its own coassignment and not coassignments in other without clauses. However, all without clauses can make use of the pattern’s standard coassignments, such as od.user in the example at the end of this topic.

If there are multiple without clauses a matching event for any one of them prevents a pattern match. Multiple without clauses can use the same type and the same coassignment, which is useful only when their where conditions are different.

Typically, a without where clause references the event in its without clause, but this is not a requirement.
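
For example, the following sketch extends the door example with a second without clause. The GuardOverride event type and its field are assumptions for illustration, and any event type used in a without clause must also appear in the query's inputs:

find OuterDoorOpened as od -> InnerDoorOpened as id
   where od.user = id.user
   without SecurityCodeEntered as sce where sce.user = od.user
   without GuardOverride as go where go.user = od.user {
   emit Alert("Intruder " + id.user);
}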

Optionally, after each without clause, you can specify a between clause, which lists two or more coassigned events. It can also list a wait coassignment. For an event to cause a match, the type specified in the without clause cannot be added to the window between the points specified in the between clause. See Query condition ranges.

Any without clauses must be after any find where clauses and within clauses. If you specify both optional clauses, the without where clause must be before the between clause.

When a without clause includes both optional clauses, where and between, the format looks like this:

without typeId as coassignmentId
   where boolean_expression
   between ( identifier1 identifier2... )

As mentioned previously, a find where clause applies to the event pattern while a without where clause applies to the event specified in its without clause. The resulting behavior, according to the type of the where clause and the value of its Boolean expression, is as follows:

  • Find where clause (applies to the event pattern): true allows a match; false prevents a match.
  • Without where clause (applies to its without event): true prevents a match; false allows a match.
Example

Consider the following find statement:

find OuterDoorOpened as od -> InnerDoorOpened as id
   where od.user = id.user
   without SecurityCodeEntered as sce where od.user = sce.user {
      emit Alert("Intruder "+id.user);
}

Now suppose the following events arrive:

Event Received Result
OuterDoorOpened("Andrew")
SecurityCodeEntered("Andrew") Causes the without where clause to evaluate to true, which prevents a match.
InnerDoorOpened("Andrew") No alert is sent.
OuterDoorOpened("Brian")
InnerDoorOpened("Brian") Because there is no intermediate SecurityCodeEntered event, there is a match and the query sends an alert. This is an example of how the absence of an event of a type specified in a without clause has the same effect as the presence of an event for which the without where clause evaluates to false.
OuterDoorOpened("Chris")
SecurityCodeEntered("Charlie") Causes the without where clause to evaluate to false, which allows a match.
InnerDoorOpened("Chris") Causes a match and the query sends an alert.
OuterDoorOpened("Dan")
SecurityCodeEntered("David") Causes the without where clause to evaluate to false, which allows a match.
SecurityCodeEntered("Dan") Causes the without where clause to evaluate to true, which prevents a match.
SecurityCodeEntered("Densel") Causes the without where clause to evaluate to false, which allows a match.
InnerDoorOpened("Dan") There is no match because one of the SecurityCodeEntered events caused the without where clause to evaluate to true, which prevents a match.

Query condition ranges

The within and without clauses (see Query conditions) can each have an optional between clause that restricts which part of the pattern the within or without clause applies to. The format for specifying a range is as follows:

between ( identifier1 identifier2... )

At least two identifiers that are specified in the event pattern are required. The identifiers specify a period of time that starts when one of the specified events is received and ends when one of the other specified events is received. A between clause is the only place in which you can specify a coassignment identifier that was assigned in a wait clause. You cannot specify identifiers used in a without clause. Also, the same event cannot match both the coassignment identifier in the without clause and an identifier in a between clause.

The condition that the between clause is part of must occur in the range of identifiers specified in the between clause. For example, consider the following find pattern:

find A as a and B as b and C as c without X as x between ( a b )

For there to be a match set for this pattern, no X event can be added to its window between the arrivals of the a and b events. If events are received in the order B A X C, then there is a match set because the X event is not between the a and b events. If the events are received in the order B C X A, then there is no match set because an X event occurred between the a and b events.

Here is another example:

find A as a -> B as b -> (C as c and D as d)
   within 10.0 between (a b)
   within 10.0 between (c d)
Range Description
(a b) This duration starts when an A event is received because the pattern is looking for an A event followed by a B event. For there to be a match, the B event must arrive less than 10 seconds after the A event.
(c d) After an A event followed by a B event has been received, this duration starts when either a C event or a D event is received. Since the pattern is looking for a C and a D, it does not matter which event is received first. For there to be a match, the event that is not received first must be received less than 10 seconds after the first event.

The following table provides examples of match sets.

Time Event Received Match Set
10 A(1)  
15 B(1)  
20 D(1)  
25 C(1) A(1), B(1), D(1), C(1)
37 D(2) No match. More than 10 seconds elapsed between C(1) and D(2).
40 C(2) A(1), B(1), D(2), C(2)

The range is exclusive. That is, the range applies only after the first event is received and before the last event is received. For example, consider this pattern:

find A as a1 -> A as a2 without A as repeated between ( a1 a2 )

A match set for this pattern is two consecutive A events. If three consecutive A events are added to the window, the first and third do not constitute a match set even though the first A was followed by the third A. This is because the second A was added between the first and the third A events. In other words, the events that match a1 and a2 are excluded from the range in which the repeated event can match. The following table provides examples of match sets for this pattern. It assumes that A(1) is still in the window when A(4) is added.

Event Added to Window Match Set Not a Match Set
A(1)    
A(2) A(1), A(2)  
A(3) A(2), A(3) A(1), A(3)
A(4) A(3), A(4) A(1), A(4) and A(2), A(4)

The query below is a real-world example of the pattern just discussed. It emits the average price change in the last minute.

query FindAveragePriceMove {
   inputs {
      Trade() key symbol within 1 minute;
   }
   find every Trade as t1 -> Trade as t2
      without Trade as mid between (t1 t2)
      select avg(t2.price - t1.price) as avgPriceChange {
         emit AveragePriceChange(symbol, avgPriceChange);
   }
}

It is illegal to have two within clauses with identical between ranges. This would be redundant, as only the shortest within duration would have any effect. It is, however, legal to have more than one without clause with the same between range. Typically, these would refer to different event types or where conditions.
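
For example, the following pattern fragment is a sketch (the X event's severity and source fields are assumed) that uses two without clauses with the same between range but different where conditions:

find A as a -> B as b -> C as c
   without X as x1 where x1.severity > 5 between ( a b )
   without X as x2 where x2.source = "manual" between ( a b )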

If or-terms (see Query or operator) are included in the range of a condition, then if an or-term is not matched, that coassignment is ignored in the range. If this means that the range has less than two points, the condition is ignored. There must be a combination of events for which there are at least two coassignments definitely in the range. Using only or-terms on opposite sides of an or operator in the pattern is an error, as the condition will never apply.

Special behavior of the and operator

To optimize performance when evaluating a query where clause, the correlator evaluates each side of an and operator as early as possible even if evaluation is not in left to right order. This behavior is different from the behavior outside a query. That is, outside a query, the left side of an and operator is guaranteed to be evaluated first. See Logical intersection (and).

For example, suppose you specify the following event pattern:

A as a -> B as b where a.x = 1 and b.y = 2

Consider what happens when the following events are added to their windows:

A(1), A(2), A(3), B(5), B(4), B(3)

The correlator can identify that

  • only the a coassignment target is needed to evaluate the a.x = 1 condition;
  • only the b coassignment target is needed to evaluate the b.y = 2 condition.

Because none of the B events cause the b.y = 2 condition to evaluate to true, the correlator does not evaluate the a.x = 1 condition.

In a where clause, because the right side of an and operator might be evaluated first, you should not specify conditions that have side effects. Side effects include, but are not limited to:

  • print or log statements
  • route, emit, enqueue...to, send...to statements
  • Modifying events, sequences, dictionaries, etc.
  • Causing a runtime error
  • Calling an action that has a side effect statement in it
  • Calling plug-ins that have side effects

If a where clause calls an action that has a side effect, you should not rely on when or whether the action is executed.

Whether the correlator can optimize evaluation of the where clause depends on how you specify the where clause conditions. For example, consider the following event definition:

event Util {
    static action myWhereClause(A a, B b) returns boolean {
         return a.x = 1 and b.y = 2;
    }
}

Suppose you specify the following event pattern:

A as a -> B as b where Util.myWhereClause(a, b)

If the same A and B events listed above are added to their windows, the result is the same as the result of evaluating the following:

A as a -> B as b where a.x = 1 and b.y = 2

However, evaluation might take longer because the correlator cannot separate evaluation of b.y = 2 from evaluation of a.x = 1. The myWhereClause() action returns a.x = 1 and b.y = 2 as a single expression. Consequently, the correlator evaluates Util.myWhereClause(a, b) for each combination of a and b. Given the A and B events listed above, this is a total of 9 times.

While the correlator might evaluate some where clause conditions in a right-to-left order, the correlator always evaluates each where clause condition as soon as it is ready to be evaluated. When multiple conditions become ready to be evaluated at the same time then the correlator evaluates those conditions in the order they are written. For example, the typical pattern of checking whether a dictionary contains a key before operating on the value with that key continues to work reliably:

E as e -> F as f where e.dict.hasKey("k") and e.dict["k"] = f.x and f.y = 1

In this example, f.y = 1 might be evaluated before the other two conditions, but e.dict.hasKey("k") is always evaluated before e.dict["k"] = f.x, and the latter is not evaluated if the hasKey() method returns false.

Aggregating event field values

A find statement can specify a pattern that aggregates event field values in order to find data based on many sets of events. A pattern that aggregates values specifies the every modifier in conjunction with select and having clauses.

Based on a series of values, an aggregate function computes a single value, such as the average of a series of numbers. See the API reference for EPL (ApamaDoc) for detailed information on all built-in aggregate functions.

Info
If a built-in aggregate function does not meet your needs, you can use EPL to write a custom aggregate function. A custom aggregate function that you want to use in a query must either be a bounded function or it must support both bounded and unbounded operation. See Defining custom aggregate functions.
For example, the following query watches for a withdrawal amount that is greater than some threshold multiplied by the average withdrawal amount of the ATMWithdrawal events in the window, which might be as many as 20 events. This query uses the last() aggregate function to identify the event added to the window most recently and uses the avg() aggregate function to find the average withdrawal amount of the events in the window. The having clause must evaluate to true for the query to send the SuspiciousTransaction event, passing the transaction ID of the suspicious withdrawal. You can use either as or the colon (:) as the coassignment operator.

using com.apama.aggregates.avg;
using com.apama.aggregates.last;
query FindSuspiciouslyLargeATMWithdrawals {
   parameters {
      float THRESHOLD;
   }
   inputs {
      ATMWithdrawal() key accountId retain 20;
   }
   find every ATMWithdrawal as w
      select last(w.transactionId) as tid
      having last(w.amount) > THRESHOLD * avg(w.amount) {
      send SuspiciousTransaction(tid) to "SuspiciousTxHandler";
   }
}

To use an aggregate function in a find statement, specify the every modifier and specify one or more select or having clauses. A select clause indicates that aggregate values are to be computed. Each select clause specifies a projection expression and a projection coassignment. The projection expression can use coassignments from the pattern if the coassignments are within a single aggregate function call. For example, the following pattern computes the average value of the x member of event type A in the query’s input and coassigns that average value to aax.

find every A as a select avg(a.x) as aax

A select clause can use parameter values. For example the following two select clauses are both valid if there is a parameter param:

find every A as a
   select avg(param * a.x) as apax
   select param * avg(a.x) as paax

You can specify multiple select clauses to produce multiple aggregate values.

In an aggregating find statement, only the projection expression can use the coassignments from the pattern. The procedural block of code can use projection coassignments and any parameters, but it cannot use coassignments from the pattern.

The first() and last() built-in aggregate functions are useful if you want to refer to the coassignment value of the oldest or newest event, respectively, in the window.
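
For example, the following fragment is a sketch (assuming a Trade event with a float price field, and the corresponding using statements for com.apama.aggregates.first and com.apama.aggregates.last) that reports how much the newest price in the window differs from the oldest:

find every Trade as t
   select last(t.price) - first(t.price) as priceChange {
   print priceChange.toString();
}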

The following example determines the average price of trades other than your own:

find every Trade as t
   where t.buyer != myId and t.seller != myId
   select wavg(t.price, t.amount) as avgprice
Match sets used in aggregations

In find statements without the every modifier, only the most recent set of events that matches the pattern is used to invoke the procedural code block. With the every modifier, every set of events that matches the pattern is available for use by the aggregate functions, provided that the latest event is present in one of the sets of events. Any events or combinations of events that do not match the pattern, do not match the where clause, or are invalidated due to a within or without clause are ignored; their values are not used in the aggregate calculation.

For example, consider the following find statement:

find every A as a -> B as b
   where b.x >= 2
   select avg(a.x + b.x) as aabx {
   print aabx.toString();
}

The following shows what happens as each event is added to the window.

  • A(1) added: no match sets.
  • A(2) added: no match sets.
  • B(2) added: the match sets are {A(1), B(2)} and {A(2), B(2)}; aabx is the average of 3 and 4, which is 3.5.
  • B(1) added: no match sets, because B(1) causes the where clause to be false.
  • B(3) added: the match sets are {A(1), B(2)}, {A(2), B(2)}, {A(1), B(3)} and {A(2), B(3)}; aabx is the average of 3, 4, 4 and 5, which is 4.

Info
Only coassignments that definitely have a value may be used in aggregates. Or-terms that are on one side of an or operator in the pattern may not be used in aggregate expressions (see also Query or operator).
Using aggregates in namespaces

As with event types, an aggregate function is typically defined in a namespace. To use an aggregate function, specify its fully-qualified name or a using statement. The built-in aggregate functions are in the com.apama.aggregates namespace. For example, to use the avg() aggregate function you would specify the following in the query:

using com.apama.aggregates.avg;
Filtering unwanted invocation of procedural code

Each select clause defines an aggregate value to be produced. You can also specify one or more having clauses to restrict when the procedural code is invoked. For example, consider the following find statement:

find every A as a
   select avg(a.x) as aax
   having avg(a.x) > 10.0 {
   print aax.toString();
}

This example calculates the average value of a.x for the set of A events in the window. However, it executes the procedural block only when the average value of a.x is greater than 10.0.

Multiple having clauses

You can specify multiple having clauses and you can use parameter values in having clauses. For example,

find every A as a
   select avg(a.x) as aax
   select sum(a.y) as aay
   having avg(a.x) > 10.0
   having sum(a.y) > param1
   having max(a.z) < param2
   {
   print aax.toString() + " : " + aay.toString();
}

When you specify more than one having clause it is equivalent to specifying the and operator, for example:

...
   having avg(a.x) > 10.0 or sum(a.y) > param1
   having max(a.z) < param2
...

is equivalent to

...
   having ( avg(a.x) > 10.0 or sum(a.y) > param1 ) and  ( max(a.z) < param2 )
...
Using a select coassignment in a having clause

Rather than specifying an aggregate expression twice, once in a select clause and then subsequently in a having clause, it is possible to refer to the aggregate value by using the select coassignment name. For example:

find every A as a
   select avg(a.x) as aax
   having avg(a.x) > 10.0 {
   print aax.toString();
}

You can rewrite that as follows:

find every A as a
   select avg(a.x) as aax
   having aax > 10.0 {
   print aax.toString();
}
Using a having clause without a select clause

When you want to test for an aggregate condition but you do not want to use the aggregate value, you can specify a having clause without specifying a select clause. For example,

find every A as a
   having avg(a.x) > 10.0 {
   print "Average value is greater than ten!";
}

Event matching policy

It is possible for the windows for a given key to contain multiple sets of events that, each taken in isolation, would match the defined event pattern. In this case, the matching policy determines which of the candidate event sets is the match set that triggers the query. There are two event matching policies:

  • Recent — This is the only policy followed for queries that do not specify the every keyword, that is, they do not specify aggregate functions.
  • Every — This is the only policy followed for queries that specify the every keyword. That is, they specify aggregate functions.

For both policies, the match set must include the latest event. The latest event is the event that was most recently added to the set of windows identified by a particular key.

For the recent matching policy, to identify which candidate match set triggers the query, the correlator compares the times of the second-most-recent events in the candidate event sets. If one of these events is more recent than its corresponding event(s) then the candidate event set it is in is the match set. However, if two or more candidate event sets share the second-most-recent event, then the correlator compares the times of the third-most-recent events in those candidate event sets. The correlator continues this until it finds an event that is more recent than its corresponding event(s) in other candidate event set(s). The candidate event set that becomes the match set is referred to as the most recent set that matches the event pattern.

Once the correlator determines which candidate event set is the match set, it ignores the order of any earlier events in any event sets. This means that it is possible for the most recent set of events to contain an event that was added earlier than an event in a set that is not the most recent set. The following event definitions and sample query illustrate this.

event APNR {
   // Automatic Plate Number Recognition
   string road;
   string plateNumber;
   integer time; // Represents time order for illustration purposes
}

event Accident {
   string road;
}

event NotifyPolice {
   string road;
   string plateNumber;
}

The following query uses these events:

query DetectSpeedingAccidents {
   inputs {
      APNR() key road within(150.0);
      Accident() key road within(10.0);
   }
   find APNR as checkpointA -> APNR as checkpointB -> Accident as accident
      where checkpointA.plateNumber = checkpointB.plateNumber
      and checkpointB.time - checkpointA.time < 100
      // Which indicates the car was speeding
   {
      emit NotifyPolice(accident.road, checkpointA.plateNumber);
   }
}

Suppose the following events are in the query windows:

  • APNR("MyRoad", "2N2R4", 1000)
  • APNR("MyRoad","FAB 1", 1010)
  • APNR("MyRoad","FAB 1", 1080)
  • APNR("MyRoad","2N2R4", 1090)
  • Accident("MyRoad")

There are two candidate event sets:

  • First candidate event set: checkpointA is APNR("MyRoad", "2N2R4", 1000), checkpointB is APNR("MyRoad", "2N2R4", 1090), and accident is Accident("MyRoad").
  • Second candidate event set: checkpointA is APNR("MyRoad", "FAB 1", 1010), checkpointB is APNR("MyRoad", "FAB 1", 1080), and accident is Accident("MyRoad").

Both sets match against the single Accident event. The next most recent events are APNR("MyRoad", "2N2R4", 1090) and APNR("MyRoad", "FAB 1", 1080). The APNR("MyRoad", "2N2R4", 1090) event is more recent. Consequently, after the Accident event is added to its window, there is a match and the match set includes the Accident event and the 2N2R4 APNR events. This is the most recent set of events.

In this example, in the most recent set of events, the earliest event, APNR("MyRoad", "2N2R4", 1000), is earlier than the earliest event, APNR("MyRoad", "FAB 1", 1010), in the other set of events.

Acting on pattern matches

When a query finds a set of events that matches the specified pattern it executes the statements in its find block. The find block specifies one or more statements that operate on the matching event(s). The items available in a find block include:

  • Any parameters defined in the parameters section.

  • Coassignment variables specified in the event pattern.

    In the case of an aggregating find statement, only the projection expression can use the coassignments from the pattern. The find block can use projection coassignments, but it cannot use coassignments from the pattern.

  • Key values.

  • Actions that are defined in the same query after the find block. Any expression in the find statement pattern or block can reference an action defined after the find block.

  • EPL constructs and statements that are allowed in queries. See Restrictions in queries.

Defining actions in queries

In a query, after a find statement, you can define one or more actions in the same form as in EPL monitors. See Defining actions.

In a given query, an action that you define can be referenced from any expression in that query’s find statement, including any statements in its find block. For example:

query CallingQueryActions {
   parameters {
      float distanceThreshold;
      float period;
   }
   inputs {
      Withdrawal() key account within period;
   }
   find Withdrawal as w1 -> Withdrawal as w2
      where distance(w1.coords, w2.coords ) > distanceThreshold
      {
      logIncident( w1, w2 );
      sendSmsAlertToCustomer(
         getTelephoneNumber(w1), getAlertText(w1,w2) );
   }

   action distance( Coords a, Coords b) returns float {
      float x := a.x - b.x;
      float y := a.y - b.y;
      return ( x*x + y*y ).sqrt();
   }

   action logIncident ( Withdrawal w, Withdrawal w2 ) {... }
   action getTelephoneNumber(Withdrawal w ) returns string {... }
   action getAlertText ( Withdrawal w1, Withdrawal w2 ) returns string {... }
   action sendSmsAlertToCustomer( string telephoneNumber, string text ) {... }
}
Info
In a query, do not define an action whose name is onload, ondie, onunload, onBeginRecovery, or onConcludeRecovery. In EPL monitors, actions with these names have special meaning. For more information, see Monitor actions.

Implementing parameterized queries

An Apama query can define parameters and then refer to those parameters throughout the query definition. This enables a query definition to function as a template for multiple query instances.

A query that defines parameters is referred to as a parameterized query. An instance of a parameterized query is referred to as a parameterization.

A parameterized query offers the following benefits:

  • Patterns of interest (find patterns) may be customized from a single generic query. This can significantly reduce the amount of code that needs to be written and maintained.
  • Parameterizations exist only at runtime. There is no need to maintain a file for each instance.
  • Parameters can be used throughout the query in which they are defined. For example, you can use them in the definition of inputs, in find actions, and in user-defined actions. Values do not need to be hardcoded.

You define query parameters in the parameters section of a query definition. See also Format of query definitions. The format for specifying the parameters section is as follows:

parameters {
   data_type parameter_name;
   [ data_type parameter_name; ]...
}

In the following example, note the parameters section and the references to the parameters throughout the query.

query FaultyProduct {
   parameters {
      string product;
      float thresholdCost;
      float warrantyPeriod;
   }
   inputs {
      Sale() key customerId within warrantyPeriod;
      Repair() key customerId retain 1;
   }
   find Sale() as s1 -> Repair() as r1
      where s1.product = product
      and r1.product = product
      and r1.cost >= thresholdCost
   {
      log "Cost of warranty covered repair for product \"" + product +
         "\" above threshold $" + thresholdCost.toString() + " by $" +
         (r1.cost - thresholdCost).toString() at INFO;
   }
}

See also: Query lifetime.

Parameterized queries as templates

When a parameterized query is injected into a correlator no instances of the query are created until a request to create a parameterization is sent using the Scenario Service (that is, the com.apama.services.scenario client API). This request must include valid values for the query’s parameters. For example, if the query in the previous topic is injected, the request to create a parameterization must include valid values for the product, thresholdCost, and warrantyPeriod parameters. Only then does the query become active.

A parameterized query lets you define a generic query find pattern that operates on a particular group of input types and that can be customized for particular criteria. The query in the previous topic could be created for any product with the threshold cost and warranty period specified as required. To achieve the same result with a non-parameterized query, you would have to define a query such as the following:

query FaultyProduct {
   inputs {
      Sale() key customerId within 1 week; //warrantyPeriod
      Repair() key customerId retain 1;
   }
   find Sale() as s1 -> Repair() as r1
      where s1.product = "Mobile device A" // product
      and r1.product = "Mobile device A" // product
      and r1.cost >= 50.00 // thresholdCost
   {
      log "Cost of warranty covered repair for product \"Mobile device A\
      " above threshold $50.00 by $" + (r1.cost - 50.00).toString() at INFO;
   }
}

While this query is valid it has the drawback that whenever you want to perform a similar query for a product that differs by type, warranty coverage period or threshold repair cost then a new query will need to be written (or most likely copied and pasted) with the new set of values and then injected into the correlator. The benefit of a parameterized query is that only one query definition needs to be injected into the correlator and you can then manually or programmatically create as many different instances for the different product-value combinations as required.

Using the Scenario Service to manage parameterized queries

There are several ways to manage (create, edit and remove) parameterizations.

The Scenario Service is also used to read and manage instances of DataViews and MemoryStore.

To these tools, a query will appear with the fully qualified name declared in the .qry file, prefixed with QRY_ to highlight that the entity being viewed is a query. For parameterized queries, instances can be created, edited or deleted. For unparameterized queries, a single instance appears as soon as the query is injected. This instance cannot be edited or deleted, and new instances cannot be created.

When there is a request to create a parameterization, the Scenario Service tries to validate the supplied parameter values. If the values are valid, the result is as if a query with those values had just been injected.

End users have the ability to define conditions on parameter values when setting them in dashboards. Parameter values can be modified only by the Scenario Service. Updates by the Scenario Service do not occur atomically across all contexts if the query is running in multiple contexts. Consequently, it is possible to observe the effects of the old parameter values interleaved with the effects of the new parameter values. For example, consider a query that has a pattern such as the following:

find A as a -> wait(paramValue) as t

The wait period will be based on the value the parameter had when the wait period started. If the parameter value is edited after the A event enters the partition the wait still fires according to the old value. Such transitions are typically short. The actual time required depends on various factors such as machine load and memory.

Some important differences between parameterized queries and other strategies include:

  • Parameterized queries have input variables but not output variables. DataViews and MemoryStore have both input variables and output variables. All queries have an empty list of output variables.
  • Requests to create or update a parameterization with values that are invalid will be denied. Invalid values are values that would cause wait, within or retain clauses to evaluate to less than or equal to zero, or would cause them to fail to evaluate, for example, by causing a runtime exception to be thrown.

For example, consider the following query:

query ParameterizationExample {
   parameters {
      integer intParam;
      float floatParam;
   }

   inputs {
      A() key id retain (10/intParam);
      B() key id within (5.0 - floatParam);
   }

   find A as a -> B as b -> wait(-1.0 * floatParam)
      where (a.intField/intParam > 0) {
      log "Found match" at INFO;
   }
}

Suppose that there is a request to create a parameterization of this query. The request indicates that intParam is equal to 0 and floatParam is equal to 10.0. If the parameterization were created then every expression that contains a parameter value would immediately throw an exception or be invalid. In the inputs block, evaluation of the retain expression would result in a divide-by-zero exception. The within expression would evaluate to -5.0, which is not valid. Similarly, upon evaluating the elements in the find block the wait expression would be a negative value and the where clause would also result in a divide-by-zero exception. Since a parameterization such as this would lead to either invalid expressions or exceptions being thrown, these values are not allowed. If you try to pass disallowed values to the Scenario Service createInstance() method then the Scenario Service returns null. Similarly, if you try to pass invalid values to the Scenario Service editInstance() method, then the Scenario Service returns false, which indicates an error.

Referring to parameters in queries

You can refer to parameters throughout a query definition.

You cannot change parameter values in the query code itself. Parameter values can be modified only by the Scenario Service.

CAUTION:
Apama recommends that you do not change parameter values used in input filters because it is possible to miss events that would cause a match. In a given parameterization, when an input filter refers to a parameter and you change the value of that parameter, it causes the parameterization to stop and restart. Events sent during the changeover are ignored. Also, there might have been earlier events that match the new parameter value but that did not make it into the window because they did not match the previous parameter value. An alternative is to use a parameter in a where clause in the find statement instead. This can be more efficient when the parameter value needs to be changed frequently. Using parameter values in input filters can also increase memory usage; see Queries can share windows.

Examples of using parameters in queries:

  • In retain and within expressions that are in the inputs block:

    parameters {
       integer maxRetention;
       float maxDuration;
    }
    inputs {
       A() key id retain maxRetention;
       B() key id within maxDuration;
    }
    
  • In the filter of the event template in the inputs block:

    parameters {
       float threshold;
    }
    inputs {
       Withdrawal(amount > threshold) key k;
    }
    
  • In where and within clauses that are in the find pattern:

    parameters {
       float maxDuration;
       float maxDifference;
    }
    inputs {
       A() key id retain 2;
    }
    find A as a1 -> A as a2 where (a2.cost - a1.cost) >
       maxDifference within maxDuration {
       ...
    }

  • In wait expression(s) that are in the find pattern:

    parameters {
       float interval;
    }
    inputs {
       A() key id retain 2;
    }
    find A as a1 -> wait(interval) as w1 -> A as a2 {
       ...
    }

  • In an aggregate expression that is in the find pattern:

    parameters {
       float avg;
    }
    inputs {
       A() key id within 1 day;
    }
    find every A as a
       select avg(a.cost - avg) as avgDeviation {
       ...
    }

  • In an action that is in the find block:

    parameters {
       float avg;
    }
    inputs {
       A() key id retain 1;
    }
    find A as a {
       log "Deviation from mean = " + (a.value - avg).toString();
    }
    
  • In a user-defined action block:

    parameters {
       float avg;
    }
    inputs {
       A() key id retain 1;
    }
    find A as a {
       log "Deviation from mean = " + getDeviation(a).toString();
    }
    action getDeviation(A a) returns float {
       return (a.value - avg);
    }
    

While parameter values can be used anywhere within the query, it is illegal to mutate them; they can be modified only by the Scenario Service.

Scaling and performance of parameterized queries

Depending on the machine architecture a user can expect to be able to create several hundred parameterizations, which all concurrently process events.

As a result of the time required to process a parameterization edit request, the recommendation is to avoid multiple simultaneous edit requests for the same parameterization. There is no guarantee that all of the threads executing the parameterization will hold the same parameter values during the update period. During the update period, there might be a mix of results based on old parameter values and results based on new parameter values. Any requests to the same parameterization should be spaced approximately 1 second apart to allow time for requests to be executed throughout the parameterization. This applies to create, edit and delete requests.

In a cluster of correlators, the correlators share the same set of parameter values across the cluster. While a Scenario Service client can connect to any correlator in the cluster, it is not recommended to edit the same parameterization from multiple Scenario Service clients concurrently, as the results will be undefined.

Restrictions in queries

There are some EPL elements that are appropriate for monitors but not queries, for example spawn and die. This is because queries scale automatically, with multiple threads of execution processing the events for different partitions as and when they arrive. Hence, within query code, the spawn and die operations are meaningless. Queries operate on the events in their windows and do not need to set up event listeners, stream queries, or stream listeners. Also, queries cannot subscribe to receive events sent to particular channels.

The following EPL features cannot be used in queries:

  • Event listeners, that is, on statements
  • Stream queries and stream listeners
  • spawn and spawn...to statements
  • die statements
  • monitor.subscribe() and monitor.unsubscribe()
  • An identifier cannot start with two consecutive underscore characters. For example, __MyEvent is an invalid event type name in a query (it is valid in a monitor). A single underscore at the beginning of an identifier is valid.
  • Predefined self variable

Of course, you cannot call an action on an event when that action uses a restricted feature listed here.

The recommended means to send events from queries to monitors is by sending to a channel. See Generating events with the send statement.
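
For example, the following sketch sends an event from a query's find block to a named channel; the SuspiciousActivity event type, its field and the channel name are assumptions for illustration:

find Withdrawal as w1 -> Withdrawal as w2 {
   // A monitor subscribed to the "FRAUD_ALERTS" channel receives this event
   send SuspiciousActivity(w2.account) to "FRAUD_ALERTS";
}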

The debugger does not support debugging query execution - it is not possible to set breakpoints in a query file. Use of the debugger can also affect how quickly queries are ready to respond to events, and should not be used in a production system (where it would cause significant pauses of the correlator).

Info
Several restrictions are enforced on queries if a license file cannot be found while the correlator is running. See Running Apama without a license file.

Best practices for defining queries

Use values for the length of the window that will not store too much data in the window.

Given the expected incoming event rate, set the within and/or retain window lengths so that typically fewer than a hundred events per partition will be within the window. With more than that, the cost of executing queries can become excessive and the system will not perform efficiently. There is no limit on the number of events within any partition. If a very small proportion of exceptional partitions has many more, then that is not a problem. The important factor is that if the average number is large, this can affect the performance of executing queries.

Use parameters instead of creating many similar queries.

Rather than writing many separate queries which are very similar in structure and differ only in values, it may be easier to write a template query and create multiple parameterizations of it. See also Parameterized queries as templates.

If a query requires different fields for its keys depending on the query parameters, it should use an action as a query key. See also Defining actions as query keys.

Use within in input durations if the partition values change over time

In some queries, the key used by the query may correspond to a transient object, that is, any given value for the partition is not permanent. For example, if tracking parcels being delivered, then each consignment ID will be short-lived. Once a parcel is delivered, there would in most cases be no more events for that consignment ID (and future deliveries may never re-use the same consignment ID). In these cases, over long periods, the number of different key values processed will only increase, as new IDs are generated. Such queries should include a within specification in the inputs for all event types. Otherwise, if inputs only have a retain specification, then the events will be held forever, and more and more storage will be required by the queries system. This is not typically necessary if the key corresponds to more permanent objects, such as ATMs or distribution depots.

Use an input within that is larger than the values of all waits and withins in the pattern

If your inputs specify a within and there is wait or within in the pattern, then the input within should be larger than the longest wait and within in the pattern. If not, the pattern will not have the intended effect, as events will be expired from the input window while a wait or within in the pattern is still active.

Use same set of inputs to allow sharing of data

If you have many queries of different types and they are using a lot of memory or are running slowly, then check if they are using the same inputs definitions (see also Queries can share windows). Memory usage can be reduced and performance increased by making multiple queries use the same set of input definitions, even if some queries have some event types in their inputs that they are not using.

Understand the difference between filters and where clauses

Filters in the input section filter events before they are stored in the distributed cache. By contrast, the where clause filters events (or combinations of events) after they have been stored in the distributed cache. The where clause is more powerful, but also more expensive, especially if most events do not match the where clause.

  • A filter applies before the event window. Thus, events not matching the filter are ignored and do not need to be stored anywhere. This makes filtering a very cheap way of reducing the number of events that need to be processed. The retain count only applies to the events that match the filter. For example, the following query input will match when there have been two events with value = 5; it will match even if another event with value not equal to 5 has occurred between them for the same k.

    query Q1 {
        inputs {
            Event( value = 5) key k retain 2;
        }
        find Event as e1 -> Event as e2 {
        }
    }
    

    Compare the above with:

    query Q2 {
        inputs {
            Event() key k retain 2;
        }
        find Event as e1 -> Event as e2 where e1.value = 5 and e2.value = 5 {
        }
    }
    

    This only matches if the last two events for a given value of k both have the value 5, because only 2 events are retained and the where clause is then checked against those 2 retained events.

  • A filter applies to every event of its input. Note that in query Q2 above, the value = 5 check had to be repeated for both e1 and e2.

  • A where clause does not affect the definition of the inputs. Query Q2 could share window contents with other queries that are concerned with different values of value, or do not filter at all.

  • A filter is restricted to range or equality matches per field of the incoming events. where clauses can be more complex (for example, where e1.field1 + e2.field2 = 10 is valid, as is e1.isTypeA or e1.isTypeB; but neither could be expressed in a filter).

Avoid changing parameter values used in filters

If using parameters in filters, avoid changing the values of those parameters. As this changes which events should be stored in the window, this is similar in effect to stopping a query instance and creating a new query instance. It involves creating new tables in the distributed cache, and events that are delivered to correlators while a new table is opened will be dropped. It may be more desirable to use a where clause to restrict which events match a pattern.
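
For example, both of the following sketches restrict matching to large withdrawals, but only the first stores the threshold in the input filter; with the second, editing the threshold parameter does not change which events are stored in the window (the field names and the within duration are illustrative):

// Parameter used in the input filter: editing threshold stops and restarts the window
inputs {
   Withdrawal(amount > threshold) key k within 1 hour;
}
find Withdrawal as w {
   ...
}

// Parameter used in a where clause instead: the window contents are unaffected by edits
inputs {
   Withdrawal() key k within 1 hour;
}
find Withdrawal as w where w.amount > threshold {
   ...
}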

Use custom aggregates to get data from multiple match sets

As well as the built-in aggregates, it is possible to define new aggregates in EPL to collate information about all events that matched a pattern. For example, it may be desirable to have a list of all events that matched a pattern. This can be achieved by writing a new custom aggregate. For example:

// file MyAggregates.mon:
aggregate CollateEvents(Event e) returns sequence<Event> {
    sequence<Event> allEvts;
    action add(Event e) {
        allEvts.append(e);
    }

    action value() returns sequence<Event> {
        return allEvts;
    }
}
// file PrintAllEvents.qry:
query PrintAllEvents {
    inputs {
        Event() within 2 hours;
    }

    find every Event as e1 select CollateEvents(e1) as c1 {
        Event e;
        for e in c1 {
            print e.toString();
        }
    }
}

Testing query execution

When writing queries, as with any programming, it is important to test that the query is behaving as expected. Testing can be as simple as a small Apama project with the event definitions, the queries, and an .evt file of events to send to the query. You can use this project to check whether the query sends out the correct events. In Apama Plugin for Eclipse, use the Engine Receive view to observe the output of the query. Whether or not a query is written to send output events, you can add log statements to the query file to verify whether it has or has not triggered.

Be sure to test queries in an environment that is separate from your production environment. Of course, preventing problems is the best way to avoid the need to troubleshoot so ensure that queries are sufficiently tested before deploying them.

The following background information and troubleshooting tips provide some guidance. See also Overview of query processing.

Exceptions in queries

In a query, exceptions can occur in the following places:

  • Procedural code in a find statement block
  • having clause
  • retain clause
  • select clause
  • wait clause
  • All where clauses
  • All within clauses

An exception in the inputs block (retain or within clause) or the find block’s wait or within clause causes the query to terminate. If there is an exception elsewhere, the query continues to process incoming events. An exception that occurs in a where or having clause causes the Boolean expression to evaluate to false.
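
As an illustration, assume a hypothetical Event type with an integer count field. If evaluating the where clause below throws an exception (for example, a division by zero when count is 0), the clause simply evaluates to false for that candidate match and the query keeps processing incoming events:

find Event as e where 100 / e.count > 10 {
    // Reached only when the where clause evaluated to true without throwing.
    log "Matched: " + e.toString() at INFO;
}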

Event ordering in queries

Unlike EPL monitors, queries do not necessarily process events in the order in which they were sent into the correlator. In particular, if two events that will be processed by the same query with the same key value are sent very close together in time (less than about 0.1 seconds apart), they may be processed as if they had been sent in a different order. For example, consider a query that is looking for an A event followed by an A event. If two A events with the same key arrive 1 millisecond apart, the events might not be processed in the order in which they were sent.

Queries use multiple threads to process events and to scale across multiple correlators on multiple machines. To do this efficiently, there is no enforcement that events are processed in order. However, when events with the same key arrive roughly 0.5 seconds apart or more, out-of-order processing is typically avoided, provided the system can keep up with the load. Therefore, specify a query so that it operates on partitions in which consecutive events are spaced far enough apart. For example, consider a query that operates on credit card transaction events, which could mean thousands of events per second. Partition this query on the credit card number so that there is one event or less per partition per second. By following this recommendation, it becomes possible to process events that are generated at rates of up to 10,000 events per second.

When creating an .evt file for testing purposes, the recommendation is to begin the file with a &FLUSHING(1) line to cause more predictable and reliable event-processing behavior. See Event timing.
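
A minimal test file might look like the following sketch; the Event instances and their field values are illustrative, and only the &FLUSHING(1) line and the general one-event-per-line format matter:

&FLUSHING(1)
Event("k1", 5)
Event("k1", 7)
Event("k2", 5)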

Query diagnostics

To help you monitor queries that are running on a given correlator, Apama provides data about active queries in DataViews. See Monitoring running queries.

When deploying Apama queries, it is possible to enable the generation of diagnostic information. This consists of log statements that explain some of the internal workings of the query evaluation. In particular, events coming into the query and the contents of the windows before the pattern is evaluated are both logged, which can aid understanding of how the query evaluation occurs. If a query is misbehaving, providing this diagnostic logging to Apama support can help in understanding the issue.

Info
Diagnostic logs contain the event data. You may want to consider using fake data rather than real data if the real data is sensitive.

Logging in where statements

It can be useful to modify a query so that, rather than writing the expression directly in a where clause, the where clause calls an action on the query that evaluates the expression. This allows logging of the inputs and the result of the expression. For example, instead of a query that contains the following:

find A as a -> B as b where a.x >= b.x {...

Write the query this way:

action compareAB(A a, B b) returns boolean {
    log "compareAB; inputs: A as a = " + a.toString() + ", B as b = " + b.toString();
    boolean r := (a.x >= b.x);
    log "compareAB; result is " + r.toString();
    return r;
}

find A as a -> B as b where compareAB(a, b) {...

You can then use these log statements to check if the query is behaving as expected.

Divide and conquer

One of the advantages of testing a query with a known set of input events is that it is possible to see how changing the query affects the results. For example, if a query is not matching any events and has many within and without clauses, try removing all of them. One way to do this is to place them on separate lines and comment them out by adding // at the beginning of each line in the source view, as shown below. If the query still does not fire, use query diagnostics to check that events are being evaluated. If the query is now firing, add the within and without clauses back one at a time until the query stops firing; the problem lies in the clause that stops it from firing when it should.
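
For example, with the optional clauses placed on separate lines they can be commented out and reinstated one at a time. This is only a rough sketch: the OtherEvent type and the durations are placeholders, and the exact clause syntax and ordering are described in the query reference:

find Event as e1 -> Event as e2
    where e1.value = e2.value
    // within 1 hour
    // without OtherEvent as o
{
    log "Matched: " + e2.toString() at INFO;
}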

Query performance

A critical factor that affects the performance of queries is the size of the windows specified in the inputs block of the query. Aim for windows that contain no more than 100 events. Depending on the distributed cache used to store data, it may also be necessary to change the number of parallel contexts per correlator. Experiment with different values for the number of worker contexts. See also Overview of query processing.
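
For example, a more selective filter and a shorter window in the inputs block keep each partition's window small; the field names and numbers here are purely illustrative:

inputs {
    // A selective filter plus a short window keeps each per-key window
    // well below the suggested 100 events; a retain count is another
    // way to bound the window size.
    Event(value > 100) key k within 10 minutes;
}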

Using external clocking when testing

When testing queries, as well as switching into single context execution, it is often useful to use external clocking. This allows &TIME events to be sent into the correlator to simulate the passage of time, which allows queries involving long durations (for example, multiple days) to be tested easily. To ensure the correct ordering of processing between events and &TIME events, you should also include &FLUSHING(1) at the beginning of the event file, before any events. See Externally generating events that keep time (&TIME events) and Event timing.
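
For example, an event file for an externally clocked correlator might look like the following sketch, where each &TIME line carries seconds since the epoch and the Event instances are illustrative:

&FLUSHING(1)
&TIME(1500000000)
Event("k1", 5)
&TIME(1500003600)
Event("k1", 7)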

Communication between monitors and queries

Queries can be used with or without monitors written in EPL. The following statements can be used to send events between queries, or between queries and monitors in either direction:

  • route statement. A route statement from a query sends the event to be processed by other query instances. This is the recommended mechanism for sending events between queries. See also The route statement.

    The route statement cannot be used to send events to a monitor.

  • send...to statement. A send...to statement can be used to send an event to all Apama queries running on that correlator by sending it to the com.apama.queries channel or the default channel. To send the event to a monitor, send it to a channel the monitor is listening to. See also The send...to statement.

Info

Queries receive events sent to the default channel, which is useful for testing.

The order in which events are processed is not guaranteed for queries. See Event ordering in queries.

If the events are expected to be received by a monitor, the monitor author should make it clear which channel the monitor expects events on. The channel name can be a single name for a given monitor or a name constructed from data in the event, so that different values are processed in parallel.
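
As a sketch, a query's find block can send a hypothetical Alert event to an agreed channel name, and a monitor subscribes to that channel to receive it (the Alert type, the "ALERTS" channel name and the coassigned event e are assumptions for illustration):

// In the query's find block:
send Alert(e.k, e.value) to "ALERTS";

// A monitor that receives those alerts:
monitor AlertReceiver {
    action onload() {
        monitor.subscribe("ALERTS");
        on all Alert() as a {
            log "Received alert: " + a.toString() at INFO;
        }
    }
}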

If you are using multiple correlators, be aware that communication between queries and monitors normally takes place within a single correlator. However, it is possible to use engine_connect or Universal Messaging to connect correlators. This allows an event sent on a channel on one correlator to be processed by a monitor subscribed to that channel on another correlator.

Unlike a query’s history window, any state stored in EPL monitors, including in the listeners, is independent in each correlator, and is not automatically moved or shared between correlators.