Lexical elements

The lexical rules of the EPL grammar describe how sequences of characters are used to form the basic elements of the language, that is, identifiers, constants (string, numeric, and so on), operators, separators, white space, comments, and language keywords. These elements, after discarding any white space and comments, form the symbols used in the syntactical grammar of the language.

Program text

A program’s source text is composed of an optional UTF-8 byte-order marker followed by characters that form a sequence of symbols, white space, comments, and line terminators, up to the end of file (denoted by the EOF symbol).

The UTF-8 byte-order marker is a sequence of three consecutive bytes with the values 0xEF, 0xBB, and 0xBF respectively, appearing at the beginning of a file containing EPL source text. The UTF-8 character encoding format does not need a byte-order marker to indicate the byte order because UTF-8 is by definition a bytewise encoding. A UTF-8 byte-order marker at the start of a file only indicates that the program text is encoded in the UTF-8 format. It is inserted automatically by some text editors, such as Notepad on Windows systems.

A program’s source text can be encoded as Unicode UTF-8, as 7-bit ASCII (which is a proper subset of UTF-8), or in various other encodings. The compiler converts the source text from the locale’s encoding to UTF-8 if necessary. In practice, this only affects comments, white space, and string literals because all other EPL constructs are limited to the ASCII subset. Identifiers, for example, are limited to only a few of the many possible Unicode characters.

Comments

Comments are explanatory notes or text intended for human readers to help them understand what a program or section of a program does.

There are several kinds of comments:

  • Block comments

    Block comments begin with the character sequence slash-asterisk /*, which is followed by any number of other characters and line breaks, followed by a closing asterisk-slash */ sequence. The entire contents of all block comments are ignored.
     

  • End-of-line comments

    End-of-line comments begin with two consecutive slash characters // followed by any number of characters up to and including the end of the current line. The entire contents of all end-of-line comments are ignored.
     

  • ApamaDoc comments

    ApamaDoc comments are a special kind of block comment that begins with the character sequence slash-asterisk-asterisk /**. Their content is used when generating documentation for your EPL code. See also Generating documentation for your EPL code.
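
As a minimal sketch, the following monitor uses all three kinds of comment. The monitor name CommentExample and its contents are made up for illustration.

```epl
/**
 * ApamaDoc comment: describes the monitor that follows and is used
 * when documentation is generated.
 */
monitor CommentExample {
    /* Block comment:
       it can span several lines. */
    action onload() {
        integer count := 3; // End-of-line comment: runs to the end of this line.
        print count.toString();
    }
}
```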

White space

White space characters are characters such as spaces and tabs that are used between symbols to separate them. White space characters are sometimes required between symbols when they would otherwise be misinterpreted or unrecognizable. For example, the symbol / is used as the division operator and the symbol * is used as the multiplication operator, but the character pair /* with no white space between them marks the beginning of a block comment.

Though they act as separators between symbols, white space characters are otherwise ignored and discarded during program compilation.

Judicious use of white space improves a program’s readability.

The ASCII white space characters and their encodings are listed below:

Code Point    UTF-8 Encoding    ASCII Encoding    Name
0x0020        0x20              0x20              Space
0x0009        0x09              0x09              Horizontal Tab
0x000c        0x0c              0x0c              Form Feed
0x001c        0x1c              0x1c              File Separator
0x001d        0x1d              0x1d              Group Separator
0x001e        0x1e              0x1e              Record Separator
0x001f        0x1f              0x1f              Unit Separator

The Unicode white space characters, as defined by the Unicode character database, and their encodings are listed below:

Code Point    UTF-8 Encoding    Name
0x0085        0xc2 0x85         unnamed control character
0x00a0        0xc2 0xa0         NO-BREAK SPACE
0x1680        0xe1 0x9a 0x80    OGHAM SPACE MARK
0x180e        0xe1 0xa0 0x8e    MONGOLIAN VOWEL SEPARATOR
0x2000        0xe2 0x80 0x80    EN QUAD
0x2001        0xe2 0x80 0x81    EM QUAD
0x2002        0xe2 0x80 0x82    EN SPACE
0x2003        0xe2 0x80 0x83    EM SPACE
0x2004        0xe2 0x80 0x84    THREE-PER-EM SPACE
0x2005        0xe2 0x80 0x85    FOUR-PER-EM SPACE
0x2006        0xe2 0x80 0x86    SIX-PER-EM SPACE
0x2007        0xe2 0x80 0x87    FIGURE SPACE
0x2008        0xe2 0x80 0x88    PUNCTUATION SPACE
0x2009        0xe2 0x80 0x89    THIN SPACE
0x200a        0xe2 0x80 0x8a    HAIR SPACE
0x2028        0xe2 0x80 0xa8    LINE SEPARATOR
0x2029        0xe2 0x80 0xa9    PARAGRAPH SEPARATOR
0x202f        0xe2 0x80 0xaf    NARROW NO-BREAK SPACE
0x205f        0xe2 0x81 0x9f    MEDIUM MATHEMATICAL SPACE
0x3000        0xe3 0x80 0x80    IDEOGRAPHIC SPACE

All white space characters appearing between two symbols are ignored. However, note that white space appearing within string literals is not ignored. See Literals.

Line terminators

Line terminators are used to mark the end of a line of source text. Different operating systems use different characters or character sequences to mark the end of a line.

The following terminators are used on various operating systems:

Operating System    Line Terminator
Classic Mac OS      ASCII Carriage Return (0x0D)
UNIX                ASCII Newline (0x0A)
Linux               ASCII Newline (0x0A)
Windows             ASCII Carriage Return (0x0D) followed by ASCII Newline (0x0A)

In general, any number of line terminators can be used between any two symbols in a program and they are treated the same as other white space. A line terminator appearing at the end of an end-of-line comment terminates the comment.

Symbols

Symbols (also called tokens, atoms, or lexemes) are the elements and words of the language, consisting of identifiers, keywords, operators, separators, and literals. Symbols are composed of one or more characters, excluding white space, comments, and line terminators.

Identifiers

An identifier is a character sequence composed of a combination of the following characters:

  • The 26 letters of the Roman alphabet in upper and lower case
  • Digits 0 through 9
  • Underscore character (_)
  • Dollar sign ($)

The first character must not be a digit. Identifiers are case sensitive. An identifier must not have the same spelling as a keyword. For example, the word action is a keyword and cannot be used as an identifier. An identifier can also contain a hash symbol (#) as the first character. This is helpful if you want to use a keyword as an identifier. See also Keywords.
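
The rules above can be illustrated with a short sketch. The monitor and variable names are hypothetical, and the lines that break a rule are commented out so that the remainder stays valid.

```epl
monitor IdentifierExample {
    integer orderCount;    // letters only
    integer order_count2;  // letters, an underscore and a digit
    integer _total;        // leading underscore
    integer $total;        // leading dollar sign
    // integer 2ndOrder;   // not valid: the first character is a digit
    // integer action;     // not valid: action is a keyword

    action onload() {
        // Identifiers are case sensitive, so these are two different variables.
        integer total := 1;
        integer Total := 2;
        integer sum := total + Total;
        print sum.toString();
    }
}
```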

Keywords

EPL keywords are case sensitive. They are reserved words, such as monitor, print and event, that are an intrinsic part of the language and must not be used as identifiers. See also Identifiers.

There are also some words, such as public, class and byte, that EPL may use as keywords in a future release. You should avoid using them as identifiers.

Apama Plugin for Eclipse and the correlator will warn you if you attempt to use a keyword or a word reserved for future use.

If you absolutely have to use a keyword or a word reserved for future use as an identifier, prefix the word with the hash symbol (#). This may be the case if an external system mandates a particular field name in an event type. The hash prefix makes sure that EPL accepts the keyword as an identifier. Internally, however, Apama treats #action as action.
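
A minimal sketch of the hash prefix follows. The event type TradeRequest and its fields are hypothetical. The sketch assumes that the prefixed spelling is used wherever the field is referenced, which follows from # being part of the identifier.

```epl
event TradeRequest {
    string symbol;
    string #action; // field name mandated by an external system
}

monitor HashPrefixExample {
    action onload() {
        TradeRequest request := TradeRequest("APMA", "BUY");
        // The field is referenced with the same #-prefixed spelling.
        print request.#action;
    }
}
```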

Operators

Operators are symbols used in expressions and statements to perform a computation on or test a relation between data values or, in event expressions, to detect sequences and patterns of events. The same symbol is sometimes used for different operations, depending on the context in which the operator is used. For example, the and operator is used both in logical expressions and in event sequencing, and the * operator is used both for integer and floating-point multiplication and to match any value in event templates.
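
The following sketch shows both operators in each of their two contexts. The event type Tick and the symbol values are hypothetical.

```epl
event Tick {
    string symbol;
    float price;
}

monitor OperatorContextExample {
    action onload() {
        // * as floating-point multiplication.
        float doubled := 2.0 * 10.5;
        // and as a logical operator.
        boolean inRange := doubled > 20.0 and doubled < 22.0;
        print inRange.toString();

        // In the event expression, * matches any value of the price field
        // and the and operator combines two event templates.
        on Tick("APMA", *) and Tick("SAG", *) {
            print "Both ticks seen";
        }
    }
}
```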

Ordinary operators

The ordinary operators are used in primary and bitwise expressions (see Expressions) to perform calculations and comparisons on variables, data values, and other constructs. The descriptions of the built-in types in the API reference for EPL (ApamaDoc) provide information about the operators that you can use with values of each type.

The ordinary operators are grouped into the following subcategories:

Field operators

Field operators are used within event expressions to define conditions on individual fields in an event template. See Field operators.

Separators

Separators are symbols that are used in certain statements and expressions. These are:

  • {
  • }
  • [
  • ]
  • (
  • )
  • .
  • ;
  • ,
  • :
  • white space

Separators are used to:

  • Keep the various parts from bumping into each other, for example commas between parameter values in an action call.
  • Group related elements together, for example the left and right braces at the beginning and end of a block of statements.

Literals

A literal is a source text representation of a constant value of a primitive type, or a location, dictionary, or sequence type.

You might want to declare a constant for a frequently used literal so that you can refer to it by name. See Specifying named constant values.
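
As a brief sketch, a named constant could be declared and used as follows. The monitor name and the constant names PI and GREETING are illustrative only.

```epl
monitor ConstantExample {
    // Each constant gives a name to a literal value.
    constant float PI := 3.14159265358979;
    constant string GREETING := "Hello, World!";

    action onload() {
        float circumference := 2.0 * PI * 10.0;
        print GREETING + " " + circumference.toString();
    }
}
```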

Boolean literals

There are two Boolean literal values: true and false.

Example:

a := true;
b := false;

Integer literals

Integer literal values can be written in either base 10 (decimal) or base 16 (hexadecimal).

Base 10 literals

Base 10 integral literal values are a sequence of one or more of the digits 0 through 9.

Examples:

i := 0;
i := 11;
i := 1023;
i := 9223372036854775807;

The value can optionally be preceded by a sign. If the sign is omitted, + is assumed.

The number 9223372036854775807 or (2^63 - 1) is the largest base 10 integer literal value that can be represented.

Base 16 literals

Base 16 integral literal values begin with the characters 0x, and consist of a combination of the decimal digits 0 through 9 and the hexadecimal digits a through f and A through F.

Examples:

j := 0x0;
j := 0x0d;
j := 0x0aFF;
j := 0x7fffffffffffffff;

The number 0x7fffffffffffffff or (2^63 - 1) is the largest base 16 integer literal value that can be represented.

Floating point and decimal literals

Floating-point literal values can take one of the following forms:

  • Optional sign, integer digits, and an exponent.
  • Optional sign, integer digits, a decimal point, and an optional exponent.
  • Optional sign, integer digits, a decimal point, fraction digits, and an optional exponent.
  • Optional sign, a decimal point, fraction digits, and an optional exponent.

If the sign is omitted, + is assumed. If the exponent is omitted, e0 is assumed.

The exponent is the letter “e” (or “E”, as in some of the examples below) followed by an optional sign and one or more decimal digits.

Examples:

f := 0.0;
f := 1.;
f := 200128.00005;
f := 3.14159265358979;
f := 1e4;
f := 1e-4;
f := 10000e0;
f := .1234;
f := .1234e4;
f := 1.E-32;
f := 1.E-032;
f := 6.0221415E23;
f := 1.7976931348623157e308;

The largest positive floating-point literal value that can be represented in EPL is 1.7976931348623157 * 10^308. The smallest positive nonzero value that can be represented is 2.2250738585072014 * 10^-308. If you write a floating-point literal whose value would be outside the range of values that can be represented, the compiler raises an error.

String literals

A string literal is a sequence of characters enclosed in double quotes.

The backslash character is used as an escape character to allow inclusion of special characters such as newlines and horizontal tabs.

To include a double quote in a string literal, precede it with a backslash: \". The backslash tells the compiler not to treat the quote as the end of the string literal.

To include a newline, use \n.

To include a tab character, use \t.

To include a single \ character, use two: \\. The compiler will remove the extra backslashes.

Examples:

s := "Hello, World!";
s := "\ta\tstring\twith\ttabs\tbetween\twords";
s := "a string on\n two lines";
s := "a string with \\ a backslash and a \" quote";

The length of a string literal is limited only by available memory at compile time and runtime. In practice, this means you can make them as long as you need.

Location literals

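A location literal consists of the keyword location followed by four float literals in parentheses. The four float literals form the location’s corner point coordinates: x1, y1 and x2, y2.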

Example:

location(0.0, 0.0, 10.0, 10.0)
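
In context, the literal might be assigned to a variable of type location, as in this small sketch. The monitor and variable names are hypothetical.

```epl
monitor LocationLiteralExample {
    action onload() {
        // Corner points (x1, y1) = (0.0, 0.0) and (x2, y2) = (10.0, 10.0).
        location area := location(0.0, 0.0, 10.0, 10.0);
        print area.toString();
    }
}
```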

Dictionary literals

A dictionary literal can contain one or more pairs of key/item values.

The first expression in a dictionary literal entry is the key value and the second expression is the item value. In a dictionary literal, all key values must be the same type and all item values must be the same type. Both must be of a type that matches the types specified in the dictionary variable’s definition.

A dictionary literal must contain at least one key/item pair except when the dictionary literal is in an initializer. For example, the following statement is valid:

myDictionary := {};

The following statement is not valid:

takesADictionaryArgument({});

Example:

{1:"One", 2:"Two", 3:"Three"}
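
The sketch below places the literal in a declaration so that the type matching is visible. The monitor and variable names are made up for illustration.

```epl
monitor DictionaryLiteralExample {
    action onload() {
        // Key type integer and item type string match every pair in the literal.
        dictionary<integer, string> names := {1:"One", 2:"Two", 3:"Three"};
        // An empty literal is accepted in an initializer.
        dictionary<integer, string> empty := {};
        print names[2];
        print empty.size().toString();
    }
}
```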

Sequence literals

A sequence literal can contain one or more sequence item values.

Each expression in the comma separated list is one entry in the sequence literal. The types must all be the same and must match the sequence type.

A sequence literal must contain at least one item except when the sequence literal is in an initializer. For example, the following statement is valid:

mySequence := [];

The following statement is not valid:

takesASequenceArgument([]);

Example:

[1,2,3,4]
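
The same literal is shown here in a declaration, as a minimal sketch with hypothetical monitor and variable names.

```epl
monitor SequenceLiteralExample {
    action onload() {
        // Every item is an integer, matching the sequence's declared item type.
        sequence<integer> values := [1,2,3,4];
        // An empty literal is accepted in an initializer.
        sequence<integer> none := [];
        print values.size().toString();
        print none.size().toString();
    }
}
```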

Time literals

In Apama query definitions, time literals can be used in within clauses. They are either float or integer literals followed by a unit. Not all units are required, but the units that are used have to be in order.

Note: Apama queries are deprecated and will be removed in a future release.

You can specify the following time literals, in the following order:

  • day/days
  • hour/hours
  • min/minute/minutes
  • sec/second/seconds
  • msec/millisecond/milliseconds

For example:

  • 10 hours
  • 1.5 days
  • 1 day 2.5 hours 10 min 4 sec
  • 2 day 3.5 minutes

A space is required between a float or integer literal and its associated time unit. A space is required between a time unit and a float or integer literal that follows it. Additional white space is also allowed.

You cannot specify a negative number.

Outside a query, you can use these keywords as identifiers. Inside a query, you cannot use these keywords as identifiers unless you prefix them with a hash symbol (#). See also Keywords.

Names

Names are used in EPL programs to refer to the various different kinds of entities in the program. Actions, variables and reference variable members, parameters, monitors, methods, aggregate functions, events, packages, and plug-ins all have names.

Description

Names are either simple or qualified. Simple names consist of a single identifier. Qualified names consist of a sequence of identifiers separated by . symbols, with an optional . prefix.

Every name has a scope, which is the part of a program’s text where the name can be used as a simple identifier. The scope is determined by where in the program the name is declared. See Variable scope.

Do not create EPL structures in the com.apama namespace. This namespace is reserved for future Apama features. If you inadvertently create an EPL structure in the com.apama namespace, the correlator might not flag it as an error in this release, but it might flag it as an error in a future release.

Name Precedence

When there are duplicate unqualified names for types, the correlator searches for the associated definition in the following order, and uses the first one it finds:

  1. The monitor-internal type definitions, for example, event type definitions and custom aggregate function definitions.
  2. Definitions that have been brought in with a using declaration in the current file.
  3. Definitions in the current package (this could be the root namespace if a package was omitted).
  4. The root namespace.

You can always refer to a type by using a dot (.) followed by its fully qualified name. For example, select .com.apama.aggregates.avg(x) uses the built-in avg type, even if com is a name in the current package.
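
The three ways of naming a type can be sketched as follows. The package com.example.orders, the event type Order and the monitor are hypothetical, and the sketch assumes that the leading-dot form is accepted in a variable declaration.

```epl
package com.example.orders;

event Order {
    string id;
}

monitor NameExample {
    action onload() {
        // Simple name: found here in the current package.
        Order first := Order("A-1");
        // Qualified name: package and type separated by dots.
        com.example.orders.Order second := first;
        // Leading dot: the lookup starts from the root namespace.
        .com.example.orders.Order third := second;
        print third.id;
    }
}
```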

If you try to create a package-level type that has the same name as a definition brought in with a using declaration, it causes a compiler error and the code does not inject. For example:

package foo;
using bar.Bar;
event Bar { // Causes an error when injecting as Bar has already been
            // defined by a "using" declaration.
}

You cannot define a type that has the same fully-qualified name as another type.

If two types have the same name but are in different packages, either one can take precedence over the other depending on their ordering in the precedence list. The correlator uses the first match it finds even if that results in an error when a lower-priority match would have worked. For example:

X x;

This causes an error if, for example, there is an aggregate function called X in the current package even if there is an event type called X in the root namespace. You can use a . prefix on the name to force it to be looked up from the root namespace, in which case the fully qualified name must be used.

Annotations

A program can contain predefined annotations before specific language elements. For detailed information, see Adding predefined annotations.