Using correlator persistence

Info
Correlator persistence is deprecated and will be removed in a future release.

When the correlator shuts down, the default behavior is that all state is lost. When you restart the correlator, no state from the previous time the correlator was running is available. You can change this default behavior by using correlator persistence.

Correlator persistence means that the correlator periodically and automatically takes a snapshot of its current state and saves it to disk. When you shut down and restart that correlator, the correlator restores the most recently saved state.

To enable persistence, you indicate in your EPL code which monitors you want to be persistent. Optionally, you can write actions that the correlator executes as part of the recovery process. When you inject code for a persistence application, the target correlator must have been started with a persistence option.

Persistent monitors must be written in EPL. State in chunks, with a few exceptions, cannot be persistent.

To protect the security of personal data, see Handling personal data “at rest” in the correlator persistence and JMS datastores.

If you plan to install a new version of Apama, see Persistence database backup.

Info
If a license file cannot be found, the number of persistent monitors that the correlator allows is limited. See Running Apama without a license file.

Description of state that can be persistent

A correlator that is running with persistence enabled automatically stores state on disk and automatically recovers state when it restarts. Saved state includes the following:

  • For a persistent EPL monitor, all of that monitor’s state is saved. This includes all events, strings, primitives, sequences, dictionaries, action variables, closures, and global variables. It also includes all the state of listeners and streams — local variables captured by them and all active listeners and sublisteners, including the events currently flowing through them.

  • All source code that was injected into the correlator, including any non-persistent EPL monitors. EPL files that were injected from a correlator deployment package (CDP) are not stored in plain text.

    Code that is not injected, and is therefore not saved, includes the following:

    • EPL plug-ins, which are imported at runtime. The actual plug-in file must be on a specified path that the correlator can load it from.
    • Any Java class files on the correlator’s classpath but not injected.
    • The correlator runtime itself.

  • Contents of all context queues.

  • Some correlator-global state including integer.getUnique() and integer.incrementCounter() IDs and context IDs. See the API reference for EPL (ApamaDoc) for more information on the integer type and its built-in methods incrementCounter and getUnique.

Info
In general, chunks cannot be persistent. However, chunks used by the Apama Time Format plug-in and the Apama MemoryStore plug-in can be persistent.

When persistence is useful

Enabling correlator persistence is a good fit for applications in which it is unacceptable to lose any information. For example, an application for processing mortgage requests does not need to be available continuously. A small amount of downtime, especially outside business hours, might be acceptable. However, losing any state associated with a mortgage application would be unacceptable.

In such a mortgage processing application, there is unlikely to ever be a point at which there are no open applications and thus no state to preserve. But state might change over the course of weeks, rather than seconds. Enabling correlator persistence lets you implement complex event expressions such as the following:

on all LoanRequest() -> (PropertyValuation() and ProofOfIncome())
  within (4 * week)...

With persistence enabled, the event expression can still be running even if weeks elapse between when it is created and when it finally completes. Without persistence, the event expression’s state is susceptible to being lost if there are system restarts, software upgrades, and the like.
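
As a minimal sketch of how such a long-running expression might sit inside a persistent monitor, consider the following. The event definitions, field names, the monitor name LoanProcessor, and the FOUR_WEEKS constant are all assumptions made for illustration; they are not part of any Apama sample. The sketch also nests a second listener so that it can match on the application identifier, which differs slightly from the single expression shown above.

```epl
// Hypothetical event types for illustration only.
event LoanRequest { string applicationId; }
event PropertyValuation { string applicationId; float amount; }
event ProofOfIncome { string applicationId; float annualIncome; }

persistent monitor LoanProcessor {
   // Four weeks expressed in seconds (28 * 24 * 60 * 60).
   constant float FOUR_WEEKS := 2419200.0;

   action onload() {
      on all LoanRequest() as req {
         // Wait up to four weeks for both supporting documents.
         // Because the monitor is persistent, this listener and the
         // captured req variable are part of each snapshot.
         on (PropertyValuation(applicationId=req.applicationId) as valuation
               and ProofOfIncome(applicationId=req.applicationId) as income)
               within (FOUR_WEEKS) {
            log "Application " + req.applicationId + " has all documents" at INFO;
         }
      }
   }
}
```

If the same monitor were declared without the persistent keyword, a restart during the four-week window would discard every pending inner listener.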

When non-persistent monitors are useful

A correlator that is running with persistence enabled can have persistent and non-persistent monitors injected. Non-persistence is a good choice for a monitor that does one or more of the following:

  • Uses legacy code that does not use the persistence feature. See Designing applications for persistence-enabled correlators.
  • Interacts with user-defined EPL plug-ins or Apama EPL plug-ins other than the Time Format or MemoryStore plug-ins.
  • Contains large amounts of fast-changing state that is undesirable to persist for performance reasons.
  • Operates as a stateless utility that just responds to incoming events.
  • Contains minimal state that can be reconstructed by the onBeginRecovery() action on a persistent monitor.

How the correlator persists state

When persistence is enabled, the correlator periodically writes data to disk to reflect the correlator’s runtime state. To do this, the correlator:

  1. Suspends all execution in the correlator across all contexts.
  2. Takes an in-memory snapshot of what needs to be stored.
  3. Resumes processing while the state is written to disk.

The correlator waits to suspend execution until all contexts have completed any in-progress event processing and any in-progress deletions. It can take time for the correlator to pause all contexts. Consequently, it is best practice to ensure that no single event listener takes a long time to run. When you need to perform a large amount of work, try to split the work across multiple events.

How finely you split the work depends on the performance requirements of the application. Avoid very fine-grained work units, because scheduling overhead then starts to dominate and slows the application down.
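
One way to split a large job across multiple events is sketched below. This is only an illustration under stated assumptions: the ProcessBatch event, the BatchWorker monitor, the BATCH_SIZE value, and the processItem() action are hypothetical names, and the right batch size depends on your application.

```epl
// Hypothetical continuation event: start is the index of the next item.
event ProcessBatch { integer start; }

persistent monitor BatchWorker {
   sequence<float> items;                 // Work to get through (hypothetical).
   constant integer BATCH_SIZE := 1000;   // Assumed value; tune for your application.

   action onload() {
      on all ProcessBatch() as b {
         integer limit := b.start + BATCH_SIZE;
         if limit > items.size() {
            limit := items.size();
         }
         integer i := b.start;
         while i < limit {
            processItem(items[i]);
            i := i + 1;
         }
         if limit < items.size() {
            // Queue the next slice as a separate event so that the
            // listener returns promptly between slices.
            send ProcessBatch(limit) to context.current();
         }
      }
   }

   action processItem(float item) {
      // Application-specific work goes here.
   }
}
```

In this sketch, sending ProcessBatch(0) into the monitor's context starts the job. Each listener invocation handles at most BATCH_SIZE items, so the correlator does not have to wait for the whole sequence to be processed before it can pause the context.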

Committing the snapshot to disk is an atomic operation. That is, a failure while storing state reverts the stored data to the previously successfully stored snapshot.

By default, the correlator does the following when you enable persistence:

  • Takes a snapshot of state changes every 200 milliseconds. This is the snapshot interval. The correlator tracks the in-memory objects that have changed since the last snapshot and writes only that state to disk. If only a small fraction of the correlator’s state changes, then only a fraction of the correlator’s state must be stored for each snapshot.
  • Automatically adjusts the snapshot interval. For example, if a significant percentage of the correlator’s state changes, then the correlator increases the snapshot interval so that the overall throughput is not adversely affected.
  • Stores persistent state in the current directory, which is the directory in which the correlator was started.
  • Uses persistence.db as the name of the file that contains persistent state. This is the recovery datastore.
  • Copies the recovery datastore to the input log if one was specified when the correlator was started. This happens only upon restarting the correlator.
  • For applications that do not use the correlator’s internal clock (correlators started with the -Xclock option), the correlator uses the time of day in the last committed snapshot as the current time in the restarted correlator.

Enabling correlator persistence

Before you enable persistence, you should design and develop your application to handle persistence and recovery. See Designing applications for persistence-enabled correlators.

Info
If a license file cannot be found, the number of persistent monitors that the correlator allows is limited. See Running Apama without a license file.

To enable correlator persistence, you must proceed as follows:

  • Insert the word persistent before the monitor declaration for each monitor written in EPL that you want to be persistent. For example:

    persistent monitor Order {
       action onload() {
          ...
       }
    }

    For a monitor declared as persistent, the correlator persists the state of all monitor instances of that name, and all instances of events that the monitor instances create.

    You do not mark event types as persistent. Whether or not an event is persisted depends on whether it is used from a persistent or non-persistent monitor. If an event is on a context queue when the correlator takes a snapshot, the event is persisted.

  • Optionally, define onBeginRecovery() and onConcludeRecovery() actions in your persistent monitors. The correlator executes any such actions as part of the recovery process. To determine whether you need to define these actions, see Designing applications for persistence-enabled correlators, Defining recovery actions, and Sample code for persistence applications.

  • Specify one or more persistence options when you start the correlator. To enable correlator persistence, you specify one of the following:

    • the -P or -Penabled=true option, or

    • the --config option together with the name of a YAML configuration file that contains the following definition:

      correlator:
        persistence:
          enabled: true
      

    Specify just one of the above options (without any additional persistence options) to implement the default behavior for correlator persistence.

    To change the default behavior, also specify one or more of the options described in the table below. The correlator uses the default when you do not specify an option that indicates otherwise. For example, if you specify -P, -PsnapshotIntervalMillis and -PstoreLocation (or --config with a YAML configuration file that contains the corresponding options), the correlator uses the values you specify for the snapshot interval and the recovery datastore location and uses the default settings for all other persistence behavior.

    For more information on the different -P options and the --config option, see Starting the correlator.

    For information on all of the persistence options that you can specify in a YAML configuration file, see Configuring persistence in a YAML configuration file.

    Info
    During development of a persistence application, whether you specify a persistence option when you start the correlator depends on the stage of development. In the earlier stages, you might choose not to specify a persistence option, because frequent changes to early versions of your program (for example, changes to the structure or new variables) make recovery of a previous version impossible. Once your program structure becomes relatively stable, you must take into account what happens during recovery and you will want to define onBeginRecovery() and onConcludeRecovery() actions. These actions are never called in a correlator that was not started with a persistence option. To deploy a persistence application, the correlator must be started with a persistence option.
  • If you are using both correlator persistence and the compiled runtime (--runtime compiled option), we recommend the use of the --runtime-cache option to improve recovery times. For more information on these options, see Starting the correlator.

The following table describes correlator persistence behavior, the default behavior, and the options you can specify to change default behavior.

Correlator persistence behavior: The correlator waits a specified length of time between snapshots.
Default: 200 milliseconds
Option for changing: -PsnapshotIntervalMillis=interval, or the corresponding option in a YAML configuration file, snapshotIntervalMillis: interval. Specify an integer that indicates the number of milliseconds to wait.

Correlator persistence behavior: The correlator can automatically adjust the snapshot interval according to application behavior. It can be useful to set this to false to diagnose a problem or test a new feature.
Default: true. The correlator automatically adjusts the snapshot interval.
Option for changing: -PadjustSnapshot=boolean, or the corresponding option in a YAML configuration file, adjustSnapshot: boolean.

Correlator persistence behavior: The correlator puts the recovery datastore in a specified directory.
Default: The directory in which the correlator was started. That is, the current directory.
Option for changing: -PstoreLocation=path, or the corresponding option in a YAML configuration file, storeLocation: path. You can specify an absolute or relative path. The directory must exist.

Correlator persistence behavior: The correlator copies the snapshot into a specified file. This is the recovery datastore.
Default: persistence.db
Option for changing: -PstoreName=filename, or the corresponding option in a YAML configuration file, storeName: filename. Specify a filename without a path.

Correlator persistence behavior: For correlators that use an external clock, the correlator uses a specified time of day as its starting time when it restarts. This behavior is useful only for replaying input logs that contain recovery information.
Default: The time of day captured in the last committed snapshot.
Option for changing: -XrecoveryTime num. To change the default, specify an integer that indicates seconds since the epoch.

Correlator persistence behavior: The correlator can automatically copy the recovery datastore to the input log when a persistence-enabled correlator restarts.
Default: The correlator copies the recovery datastore to the input log.
Option for changing: -noDatabaseInReplayLog, or the corresponding option in a YAML configuration file, includeDatabaseInInputLog. You might set this option if you are using an input log as a record of what the correlator received. The recovery datastore is a large overhead that you probably do not need. Or, if you maintain an independent copy of the recovery datastore, you probably do not want a copy of it in the input log.

Info
Important: If an option is specified both with one of the -P options on the command line and in a YAML configuration file, the value on the command line takes precedence and a warning is logged.
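
As an illustration of combining several of these options, the following YAML fragment is a sketch that enables persistence and overrides three defaults. It assumes that the additional options sit under the same correlator/persistence block as enabled; confirm the exact layout in Configuring persistence in a YAML configuration file. The directory and file name are hypothetical values.

```yaml
correlator:
  persistence:
    enabled: true
    # Wait 500 ms between snapshots instead of the default 200 ms.
    snapshotIntervalMillis: 500
    # Hypothetical directory; it must already exist.
    storeLocation: /var/apama/recovery
    # File name only, without a path (hypothetical name).
    storeName: orders-recovery.db
```

All persistence behavior not named in the fragment, such as automatic adjustment of the snapshot interval, keeps its default. The rough command-line equivalent would be -P together with -PsnapshotIntervalMillis=500, -PstoreLocation and -PstoreName set to the same values.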

How the correlator recovers state

When you restart a correlator for which persistence has been enabled, the correlator:

  • Detects, recompiles, and re-injects all code that was injected and not deleted as of the last committed snapshot.
  • Restarts and restores the state of all persistent monitors as of the last committed snapshot.
  • Restarts non-persistent EPL monitors at their onload() action.
  • Executes any onBeginRecovery() and onConcludeRecovery() actions. See Defining recovery actions.
  • Recovers persistent connections (connections created with engine_connect -p) and resumes them at the first opportunity.

Code is re-injected in the order in which it was originally injected. The correlator tracks which objects (monitors, events, Java objects) were deleted and does not re-inject them. Such objects might have been deleted explicitly with the engine_delete utility, or implicitly, for example when all instances of a monitor have terminated. If a snapshot shows that an object was deleted and then re-injected, recovery ignores the first injection and re-injects the monitor or event at the point of its second injection.

For a persistent monitor, recovery appears to be a pause in processing. This pause can be long enough to make some events stale. All non-persistent monitors appear to have spontaneously reverted to their onload state. Communication channels to external components have been interrupted and can be assumed to not yet be connected. The exception is connections created with engine_connect -p, which are persistent connections: the correlator treats them the same way it treats persistent state. Persistent connections continue until you explicitly remove them. Upon recovery, the correlator tries to reconnect to the external components that were connected with persistent connections. However, events sent or received after the last committed snapshot might have been dropped because there is no reliable delivery on persistent connections.

For a non-persistent monitor, recovery appears the same as starting the correlator. The correlator’s current time is up-to-date. The monitor is in the state it would be in if it had just been injected. External components have not yet connected to the correlator. If a monitor sends a request to a non-persistent monitor, the non-persistent monitor might have to queue the request until a connection is made to an external component, for example, until the correlator subscribes to a data stream from an external adapter.

Recovery order

When the correlator recovers state from a recovery datastore, it performs the following steps in order:

  1. Recompile and re-inject all sources except for deleted events and monitors, which are ignored.
  2. Restore objects and listeners in persistent monitors. The correlator does not execute any user code in the first two steps. While it sets up listeners, the listeners cannot yet change state.
  3. Set currentTime to the currentTime of the last committed snapshot, which might be considerably earlier than the current time of day if the correlator was down for some time before recovering.
  4. Initiate execution of any onBeginRecovery() actions on instances of restored events, monitors, and custom aggregate functions in all persistent monitor instances in all contexts. The order of execution of these actions is undefined. See Defining recovery actions.
  5. Quiesce — The correlator waits for all events that have been sent to a context to be processed, and also waits for any events that are sent to a context as a result of those events to be processed, and so on, until no more events are generated and sent to a context. The correlator also does this for spawn...to statements. This is similar to processing all events in all queues. Be careful not to generate an infinite loop of send...to statements.
  6. Restore events, clock ticks, pending spawn...to statements, and so on, that were waiting on context queues when the snapshot was taken.
  7. Send a single clock tick of the time at which the correlator is recovered, that is, the current time of day. If -XrecoveryTime was set when the correlator was started, the correlator uses that time for the current time of day.
  8. Initiate execution of onload() actions in all non-persistent monitors in injection order.
  9. Quiesce.
  10. Initiate execution of any onConcludeRecovery() actions on instances of restored events, monitors, and custom aggregate functions in all persistent monitor instances in all contexts. The order of execution of these actions is undefined. See Defining recovery actions.
  11. Quiesce.
  12. Start generating clock ticks.
  13. Start taking persistence snapshots.
  14. Open the server port. External components can now connect with the correlator, for example, IAF, engine_send, and engine_receive.

Defining recovery actions

In a persistent monitor, you can define one or two actions that the correlator executes as part of the recovery process:

  • onBeginRecovery() — The correlator executes this action after it re-injects all source code and restores state in persistent monitors. The order of execution of onBeginRecovery() actions is undefined.
  • onConcludeRecovery() — The correlator executes this action just before it begins sending clock ticks, taking persistent snapshots, and becoming available for connections to external components. The order of execution of onConcludeRecovery() actions is undefined.

Whether you define zero, one or both actions in each persistent monitor is application-dependent. See Designing applications for persistence-enabled correlators and Sample code for persistence applications.

You can define an event and define one or both of these actions as members of the event. If an event defines a recovery action and an instance of the event is live in a persistent monitor, then the correlator calls the action(s) on those objects as well. A live event is reachable from a global variable or listener-captured local variable and consequently is not a candidate for garbage collection.

You can define onBeginRecovery() and onConcludeRecovery() actions in custom aggregate functions in the same way as you define them in events. When an aggregate function contains an onBeginRecovery() or onConcludeRecovery() action, this action is called on each custom aggregate function instance in a live query in a persistent monitor along with the onBeginRecovery() and onConcludeRecovery() actions in persistent monitors and events.

The order in which the correlator executes instances of onBeginRecovery() actions and instances of onConcludeRecovery() actions for objects in a monitor is not defined. If a monitor terminates after execution of onBeginRecovery() and before recovered queues have been flushed, the correlator does not call that monitor’s onConcludeRecovery() action (if it has one). If the correlator terminates all of a monitor’s listeners in one execution of onBeginRecovery(), later calls to onBeginRecovery() for that monitor instance still occur because they might instantiate new listeners. If no listeners exist in a monitor after onBeginRecovery() and onConcludeRecovery() have been executed for every object in that monitor, the monitor instance terminates as usual.

See Recovery order for more details about when onBeginRecovery() and onConcludeRecovery() are executed.
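
The following sketch shows one way an event held by a persistent monitor might carry its own recovery actions. Everything named here is an assumption for illustration: the Subscribe, Data, and SubscriptionHandle events, the Consumer monitor, the "service" channel, and the subscription key are hypothetical, and a real application would need a service monitor that is subscribed to that channel.

```epl
// Hypothetical event types for illustration only.
event Subscribe { string subkey; }
event Data { string subkey; float price; }

event SubscriptionHandle {
   string subkey;

   // Called early in recovery for each live instance.
   action onBeginRecovery() {
      log "Recovery started for subscription " + subkey at INFO;
   }

   // Called near the end of recovery, after non-persistent
   // monitors have run their onload() actions.
   action onConcludeRecovery() {
      send Subscribe(subkey) to "service";
   }
}

persistent monitor Consumer {
   SubscriptionHandle handle;   // Global variable keeps the event live.

   action onload() {
      handle := SubscriptionHandle("EUR/USD");
      send Subscribe(handle.subkey) to "service";
      on all Data(subkey=handle.subkey) as d {
         log "Received " + d.price.toString() at INFO;
      }
   }
}
```

Because handle is a global variable in a persistent monitor, the instance is live, so the correlator calls its recovery actions during recovery even though the Consumer monitor defines none of its own.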

Simplest recovery use case

When you observe the following restrictions, the correlator’s recovery behavior is straightforward:

  • All monitors are persistent. The correlator contains no chunks.
  • There are no implementations of onBeginRecovery() or onConcludeRecovery() actions.

EPL code that adheres to these restrictions appears to behave as if it is running in a completely reliable and fault-tolerant system. The downside is that while the correlator is down, incoming or outgoing events are dropped. If you implement a “retransmit until acknowledge” protocol, then the correlator can have a large number of events (and retransmits) to process when it restarts, depending on how long it is down.

Designing applications for persistence-enabled correlators

When you are designing an application that you will deploy on a persistence-enabled correlator, you should consider the following issues:

  • You do not need to re-inject code after you restart a persistence-enabled correlator. During recovery, the correlator obtains injected code from the recovery datastore.
  • To recover from a hardware failure, you must maintain a copy of the recovery datastore on some form of reliable, shared storage. You want to ensure that the storage medium for the recovery datastore is not a single point of failure. This typically means putting it on a fileserver with suitable levels of redundancy (disk, power supply, network and controller) that is accessible by two correlator host servers.
  • The length of time between when a correlator shuts down and when it restarts is unpredictable. Consequently, you might want to implement onBeginRecovery() actions that do the following:
    • Specify behavior according to how long the down time was. For example, you could write a listener that ignores a subset of old events but matches on a new event.
    • Terminate any on all wait(...) listeners. Such listeners have the potential to fire many times because the time jumps from the time of the last committed snapshot to the time at which the correlator was restarted.
  • It is possible for persistent monitors to communicate with non-persistent monitors and to set up state, such as subscriptions to a stream of data, in a non-persistent monitor. If you need to recover this state, you must write code to do it in the onConcludeRecovery() action of a persistent monitor or an event within a persistent monitor. In a persistent monitor, having an event that manages an activity in a non-persistent monitor is a recommended practice.

Upgrading monitors in a persistence-enabled correlator

While injection order is fixed and you cannot change it, you might want to upgrade a monitor and this would appear to require a change in the injection order. That is, upon recovery, you want the correlator to restore the upgraded monitor and not the older version of the monitor.

Remember that it is an error if you try to inject a monitor while instances of that monitor are already running in the correlator. The correlator never injects a duplicate monitor definition.

In a correlator without persistence enabled, you can terminate all monitor instances and then inject the updated monitor definition. Since all old versions of the monitor had terminated, the correlator would correctly inject the updated monitor even though it had the same name. Also, since persistence is not enabled, there is no recovery process and so recovery of the older version of the monitor is not an issue.

In a persistence-enabled correlator, terminating all instances of a monitor you want to upgrade is unlikely to be an option. For more information, see Versioning and upgrading monitors.

When your upgrade procedure terminates all instances of the old monitor, the recovery process does not restore that monitor because all of its instances were deleted.

You might find that it makes more sense for your upgrade procedure to leave the instances of the old monitor running while changing the interface for whatever creates new instances of the monitor to create instances of the upgraded monitor instead of instances of the old monitor. The correlator would then be running some old versions of the monitor and some new versions of the monitor. Upon recovery, the correlator would recover both versions until all instances of the old monitor had terminated. This approach might be appropriate when the logic has changed so much that it is not practical to upgrade monitor instances, or when maintaining behavior for existing instances is desired.

Sample code for persistence applications

The topics below provide sample code for persistence applications.

See also Versioning and upgrading monitors which describes a sample for transferring monitor state using the MemoryStore.

Sample code for discarding stale state during recovery

The following code provides an example of discarding stale data during recovery. This application discards all recovered Data events because their data has become stale. However, the application always processes and does not discard ControlEvent events.

persistent monitor eg1 {
   listener l;
   listener lt;
   action onload() {
      initializeState();
      initiateListeners();
      on all ControlEvent() as c { handleControl(c); }
   }
   action initiateListeners() {
      l:=on all Data() as d { process(d); } // Process is moderately expensive
      lt:=on all wait(0.1) { send Average(state) to "output"; }
   }
   action onBeginRecovery() {
      l.quit();  // Discard all recovered Data events.
      lt.quit(); // Stop sending intermittent updates.
                 // Do not flood receivers.
                 // Note that the ControlEvent listener is still present.
                 // The code throttles only Data events. If the
                 // ControlEvent listener is not present, this monitor
                 // would have no listeners and would terminate
                 // after this action.
   }
   action onConcludeRecovery() {
      initiateListeners(); // Go back to normal.
   }
}

Sample code for recovery behavior based on downtime duration

The following sample is the same as the discard-stale-data sample with some changes that provide a downtime policy. Downtime is the duration between the last committed snapshot and the time of day upon recovery.

This code sample ignores downtimes that are less than two hours. However, if recovery starts just under the two-hour limit, the processing of old data might appear to be beyond the two-hour threshold. The downtime policy must take this into account.

persistent monitor eg1 {
   import "TimeFormatPlugin" as timeFormatPlugin;
   //... onload() and so on
   listener l;
   listener lt;
   action onload() {
      initiateListeners();
     // on all ControlEvent() as c { handleControl(c); }
   }
   action initiateListeners() {
     // l:=on all Data() as d { process(d); } // Process is moderately expensive
      //lt:=on all wait(0.1) { send Average(state) to "output"; }
   }
   boolean longDowntime;
   action onBeginRecovery()  {
      // currentTime is the time of the last snapshot, which is
      // approximately when the correlator went down.
      // timeFormatPlugin.getTime() is the actual time of recovery.
      if (timeFormatPlugin.getTime() - currentTime > (60.0 * 60.0 * 2.0)) {
         // If we were down for less than 2 hours, pretend nothing
         // happened.  For longer gaps, skip stale data as it will be
         // too expensive to process it.
         longDowntime:=true;
         log "Correlator was down for a long time - will discard stale data.";
         l.quit();  // Discard all recovered Data events.
         lt.quit(); // Stop sending intermittent updates.
                    // Do not flood receivers.
      }
   }
   action onConcludeRecovery() {
      if longDowntime {
         longDowntime:=false;
         initiateListeners(); // Go back to normal.
      }
   }
}

Sample code that recovers subscription to non-persistent monitor

This sample code defines a persistent monitor that subscribes to a non-persistent service monitor. Note that the service monitor can handle the case where the subscription is received before the adapter is connected.

monitor service_monitor {
   boolean connected;
   sequence <Subscribe> pendingSubscribes;
   action onload() {
      on all Subscribe() as s {
         if not connected {
            pendingSubscribes.append(s);
         } else {
            if(incrRefCount(s.subkey)) {
               send Adapter_Subscribe(s.subkey) to "output";
            }
         }
      }
      on all wait(1.0) {
         send IsAdapterUp() to "output";
      }
      on all AdapterUp() {
         connected:=true;
         Subscribe s;
         for s in pendingSubscribes {
            route s;
         }
         pendingSubscribes.clear();
      }
   }
   action incrRefCount(string subkey) returns boolean {
      return false;
   }
}

persistent monitor eg2 {
   listener l;
   Instance i;
   context svcCtx;
   action spawnedInstance(context c) {
      svcCtx:=c; // Contains anything required to recover subscription.
      send Subscribe(i.subkey) to svcCtx;
      l:=on all Data() as d { process(d); }
   }
   action onConcludeRecovery() {
        // Non-persistent service monitor is now reset to its onload state.
        // Re-subscribe.
      send Subscribe(i.subkey) to svcCtx;
   }
}

Requesting snapshots from EPL

A persistent or non-persistent monitor can request a snapshot to occur as soon as possible using the Management interface. For details, see Using the Management interface.

Developing persistence applications

While you are writing the EPL code for your persistence application, use Apama Plugin for Eclipse as you usually do, and do not enable persistence. When your application is near completion and has been successfully tested, start testing execution of the onBeginRecovery() and onConcludeRecovery() actions you defined in your application. Do this as follows:

  1. Select Run, Run configurations, Correlator component.
  2. Add -P to the command line of the correlator.
  3. Start the correlator.
  4. In the Run configuration, Correlator component, Initialization tab, disable all check boxes so that nothing is re-injected.
  5. Stop and restart the correlator. It will have persisted the injected monitors.
  6. Test the behavior of onBeginRecovery() and onConcludeRecovery() actions.
  7. If everything is working correctly, you can stop here. Otherwise, modify your code and continue with the following steps.
  8. Delete the persistence.db file.
  9. In the Run configuration, Correlator component, Initialization tab, re-enable all check boxes so that your code is injected.
  10. Start again at step 3 and continue until your code is working as desired.

Ensure that you delete the persistence.db file and re-inject fresh monitors only when loss of all state is acceptable, for example, during testing.
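
For repeated test cycles, the deletion in step 8 could be wrapped in a small shell helper such as the following sketch. The function name is hypothetical. It also assumes that the correlator is stopped and that the companion files listed in the backup instructions (-journal, -wal, -shm) belong to the same database and can be removed with it.

```shell
#!/bin/sh
# Hypothetical helper for test environments only: removes the persistence
# database so that the next correlator start begins with no saved state.
reset_persistence_db() {
    db_dir=$1    # directory that contains persistence.db

    if [ ! -d "$db_dir" ]; then
        echo "not a directory: $db_dir" >&2
        return 1
    fi

    # Assumption: the companion files are part of the same database.
    # rm -f does not fail when a file is absent.
    rm -f "$db_dir/persistence.db" \
          "$db_dir/persistence.db-journal" \
          "$db_dir/persistence.db-wal" \
          "$db_dir/persistence.db-shm"
}
```

Run it only where loss of all persisted state is acceptable, for example `reset_persistence_db /path/to/persistenceDatabase`.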

Backing up the persistence database while the correlator is running

Backing up the correlator persistence database while the correlator is running is not as simple as copying the file. A file copy reads the file one chunk at a time, and the database can be modified between reads, so the copy might be inconsistent. You must therefore take an atomic snapshot of the database file that captures its entire state at a single point in time. You can do this with file-system snapshots, a capability provided by many storage systems (for example, VMware or NetApp) and operating systems (for example, Shadow Copy on Windows, or LVM with XFS and Ext4 file systems on Linux).

Proceed as follows to create a backup of the persistence database:

  1. Create a snapshot of the volume containing the persistence database file.
  2. Copy the persistence database from the snapshot.
  3. Also copy all other files with similar names in the same folder. Using a wildcard filter such as “persistence.db*” will copy the following:
    • persistence.db
    • persistence.db-journal
    • persistence.db-wal
    • persistence.db-shm
  4. Delete the snapshot when you have copied all required files. The snapshot is no longer needed.
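
Steps 2 and 3 can be sketched as a small shell function. This is an illustration only: the function name and its arguments are hypothetical, and it assumes that the snapshot is already mounted or otherwise reachable as a directory.

```shell
#!/bin/sh
# Sketch only: copy the persistence database and its companion files
# out of an already-accessible snapshot. Not part of Apama.
copy_persistence_files() {
    src_dir=$1     # database directory inside the snapshot
    dest_dir=$2    # backup destination

    # Stop early if the main database file is not where we expect it.
    if [ ! -f "$src_dir/persistence.db" ]; then
        echo "no persistence.db in $src_dir" >&2
        return 1
    fi

    mkdir -p "$dest_dir" || return 1

    # The wildcard also matches the -journal, -wal and -shm files
    # when they are present.
    cp -p "$src_dir"/persistence.db* "$dest_dir"/
}
```

For example, `copy_persistence_files /mnt/path/to/persistenceDatabase /backup` copies every matching file and leaves unrelated files in the directory untouched.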

Examples of how to create a snapshot and back up the persistence database on Windows and Linux are given below.

Backing up the persistence database using Shadow Copy on Windows

On Windows server platforms, you can create a persistent snapshot of the volume containing the persistence database, copy the database and related files from the snapshot, and then delete the snapshot.

It is also possible to use a temporary snapshot to copy the files in a single command. This works on all supported Windows operating systems. For example, you can take a snapshot using the VShadow tool. With this tool, snapshots are temporary by default. To invoke the tool, use an elevated command prompt (that is, run the Windows command prompt as an administrator) and enter the following:

vshadow -nw -script=SETVAR1.cmd -exec=copyPersistence.bat D:

where:

  • -nw takes the snapshot without involving writers, that is, applications that react to a snapshot being taken. This makes the backup faster, and the persistence database is not such an application.
  • -script generates a script called SETVAR1.cmd which provides variables for accessing the snapshot.
  • -exec runs the script copyPersistence.bat (see below) before the tool exits (that is, while the snapshot still exists).
  • The final argument is the drive that contains the persistence database.

The copyPersistence.bat script handles the actual copying. It has the following content:

call SETVAR1.cmd
for %%i in (%SHADOW_DEVICE_1%\path\to\persistenceDatabase\persistence.db*) do copy %%i D:\backup\location

Backing up the persistence database using LVM on Linux

Proceed as follows:

  1. Log in as root.

  2. Create a snapshot of the volume containing the persistence database:

    lvcreate -L1G -s -n snapshot /dev/vgname/persistence_volume
    

    In this case, the name of the snapshot is “snapshot”.

    Info
    Some file systems require you to pause writes while creating the snapshot. On XFS, for example, you have to run xfs_freeze -f /myxfs before running lvcreate and then xfs_freeze -u /myxfs after it completes.
  3. To copy the database, first mount the snapshot:

    mount /dev/vgname/snapshot /mnt
    
  4. Copy the relevant files from the snapshot:

    cp /mnt/path/to/persistenceDatabase/persistence.db* /backup/
    
  5. Unmount the snapshot:

    umount /mnt
    
  6. Remove the snapshot:

    lvremove /dev/vgname/snapshot
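
The LVM steps could be combined into one script along the following lines. This is a sketch under stated assumptions, not a tested procedure: the function names and the DRY_RUN switch are hypothetical, the -f option on lvremove is added to avoid the confirmation prompt, lvremove is given the full path of the snapshot volume, and the volume and path names are the placeholders used in the steps. It does not include the xfs_freeze calls that some file systems need.

```shell
#!/bin/sh
# Sketch of the LVM backup steps as one script. Must run as root
# unless DRY_RUN=1 is set.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

backup_with_lvm_snapshot() {
    origin=$1    # for example /dev/vgname/persistence_volume
    db_dir=$2    # database directory relative to the volume root
    dest=$3      # backup directory
    snap=$(dirname "$origin")/snapshot
    mnt=/mnt

    # Create and mount the snapshot; clean up if the mount fails.
    run lvcreate -L1G -s -n snapshot "$origin" || return 1
    run mount "$snap" "$mnt" || { run lvremove -f "$snap"; return 1; }

    # Copy the database and its companion files, remembering the result.
    run cp "$mnt/$db_dir"/persistence.db* "$dest"/
    status=$?

    # Always unmount and remove the snapshot, even if the copy failed.
    run umount "$mnt"
    run lvremove -f "$snap"
    return $status
}
```

Setting DRY_RUN=1 before calling `backup_with_lvm_snapshot /dev/vgname/persistence_volume path/to/persistenceDatabase /backup` prints the commands without executing them, which is a safe way to review them first.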
    

Using EPL plug-ins when persistence is enabled

A persistent monitor can import an EPL plug-in only when one of the following conditions is met:

  • None of the plug-in’s functions/actions, including unused functions/actions, refer to a chunk type.
  • The plug-in is capable of persisting its chunks. In this release, only the Time Format plug-in and the MemoryStore plug-in are capable of persisting chunks. User-defined EPL plug-ins and other Apama-provided plug-ins cannot persist chunks.

See also Restrictions on correlator persistence.

Using the MemoryStore when persistence is enabled

When persistence is enabled, a persistent monitor can use the MemoryStore only with a correlator-persistent store. A correlator-persistent store is a store that was created by execution of the storage.prepareCorrelatorPersistent(store name) action. A persistent monitor cannot use a store that was created by executing any other storage.prepare() action. The only exception to this is if the persistent monitor is in a correlator for which persistence is not enabled. In this situation, the correlator treats persistent monitors in the same way it treats non-persistent monitors.

In a persistence-enabled correlator, both persistent and non-persistent monitors can use correlator-persistent stores. If you try to prepare an in-memory, on-disk, or distributed store from a persistent monitor in a persistence-enabled correlator, the correlator terminates that monitor. These are runtime errors; the compiler cannot catch them. The following table shows when you can use each kind of store.

Store type | Persistent correlator and persistent monitor | Persistent correlator and non-persistent monitor | Non-persistent correlator and persistent monitor | Non-persistent correlator and non-persistent monitor
In-memory | No | Yes | Yes | Yes
On-disk | No | Yes | Yes | Yes
Correlator-persistent | Yes | Yes* | Yes* | Yes*
Distributed | No | Yes | Yes | Yes

* Correlator-persistent store behaves as an in-memory store.

Snapshots include the contents of all correlator-persistent stores that are open. A snapshot can occur at any time, and it is not possible to commit only certain states of correlator-persistent stores or the tables in them. However, when using correlator-persistent stores from persistent monitors, failure and recovery of a correlator should appear as though nothing has happened. That is, all monitor state and table state should be as it was when the most recent snapshot was taken.

Just as you cannot execute Store.persist() for in-memory stores, you cannot execute the Store.persist() action on correlator-persistent stores. You can, however, use Apama’s Management interface to request a snapshot of the entire correlator state and wait for that to complete. See Using the Management interface.

In persistent monitors, Store, Table, Row, and Iterator events are persistent and their state can be recovered to the latest snapshot. Persistent monitors should not see any inconsistency between the contents of the table and any state in the monitor, including Store, Table, Row, and Iterator events. Correlator-persistent stores behave the same as in-memory stores, except that the state of correlator-persistent stores is preserved across correlator restarts.

When the correlator takes a snapshot, it includes Row events held by persistent monitors. Such Row events are, of course, versions of rows in a table that is in a correlator-persistent store. A persistence snapshot does not include Row events held by non-persistent monitors, even if they represent rows in tables that are in correlator-persistent stores.

Info
The recovery datastore in which the correlator saves snapshots is different from the stores used with the MemoryStore. The recovery datastore contains the state of all persistent monitors, which might include Row events, Iterator events, and other MemoryStore-related events, and also the state of any correlator-persistent stores created with the MemoryStore. Thus, the recovery datastore contains any correlator-persistent stores. If non-persistent monitors have opened in-memory and/or on-disk stores, those stores operate independently of the recovery datastore. For example, a non-persistent monitor can request persistence for an on-disk store and this on-disk store would not be persisted in the recovery datastore.

In a DataView, you can expose only in-memory and on-disk stores; you cannot expose correlator-persistent stores.

See also Using the MemoryStore.

Comparison of correlator persistence with other persistence mechanisms

Correlator persistence is not the only way to persist Apama application data. The table below compares the various features you can use to persist Apama data. As you can see, correlator persistence provides the most comprehensive, automatic persistence.

Persistence characteristic | Correlator persistence | MemoryStore | Apama Database Connector Adapter (ADBC)
Completeness of what is persisted | All state in persistent EPL monitors | Only state that you explicitly store. Partial listener evaluations are impossible to store. | Only state that you explicitly store. Partial listener evaluations are impossible to store.
Recovery mechanism | Automatic | Manual | Manual
EPL monitors can be notified about recovery | Yes | Yes | Yes
Supported across Apama versions | Yes | Yes | Yes
Incremental snapshots | Yes | Yes | Yes
Storage type | Embedded | Embedded | Shared servers are supported. You can use any database server or driver.
Atomic snapshots | Yes | Yes | Yes
Performance benefit from pipelining disk writes with processing | Yes | Yes | Yes
Supports multiple contexts | Yes | Yes | Yes

Restrictions on correlator persistence

EPL plug-ins written in C++ and Java

A persistent monitor can use the Apama Time Format and MemoryStore EPL plug-ins and the chunk types contained by the events defined by those plug-ins. A persistent monitor cannot use any other chunk types. This means that a persistent monitor cannot use an event or plug-in that references a chunk type even if the application does not use those chunks.