Handling personal data in the "in-memory" state of the correlator
In Apama applications, customer-defined data - potentially including personal data - is held “in memory”, in various places:
- event fields
- monitor variables
- variables in the scope of an event listener
- MemoryStore
- state held in EPL plug-ins
- state held in connectivity plug-ins
As with languages such as C++ or Java, it is the responsibility of both the author of the Apama application and those who deploy and use it to ensure that this personal data is handled appropriately, by writing suitable logic into the application and by ensuring that the right policies and access controls are in place.
Some key areas to consider are listed below:
- Data minimization should be practiced by ensuring that no personal data is sent into Apama other than that absolutely required to implement the required functionality. For example, consider using the ApamaDoc to review all the event definitions used for data entering or leaving the correlator, and any monitors that store personal data, and use them to check for data that could be safely removed. If there is some personal data that Apama does not need to make direct use of, but must be opaquely passed through to another system that Apama is sending messages to, consider whether it is possible to avoid that data existing as plaintext inside Apama by having the system that generated the messages pass it into Apama as a Base64-encoded encrypted string, with the key shared only between the originating upstream system and the downstream system that needs to use the data.
- Rectification (correction) or erasure of data can be implemented by identifying which monitor instances and data structures hold personal data, and ensuring that EPL logic is in place to change or delete it for a given person, in response to an Apama event requesting this. If the application has a monitor instance dedicated to each user, this could be as simple as making that monitor instance “die” in response to an event requesting removal. Otherwise, it may be a case of removing that user from dictionaries and other data structures. If the application uses plug-ins such as the MemoryStore to hold personal data, the EPL application must also remove the keys holding personal data from the MemoryStore. It is important that any rectification or erasure capabilities are built into the application from the beginning and carefully tested, since it would often not be possible to add them once the application is deployed without losing the correlator’s state.
- On-demand access to a user’s personal data can be implemented by identifying where personal data is held and ensuring EPL logic is in place to return it, perhaps by sending an event containing the data in response to an event requesting the data. Personal data can be exported in an open and portable format by using the JSON EPL plug-in to serialize it to a JSON string (see also Using the JSON plug-in).
- Pseudonymization should be practiced where possible. This means keeping the identity of a person and data about that person separate as much as possible. For example, rather than sending a message into Apama containing a user’s real name and information such as the user’s medical history, the user’s name could be replaced by a unique identifier (for example, a GUID) assigned and protected by the upstream system, so that the user’s identity and the information about the user are not held together inside the correlator. By separating the user’s name from the other information, and ensuring that the mapping between the real name and the assigned unique identifier is kept secure, the risk of the data about that user being leaked and linked back to the person it relates to is significantly reduced. Pseudonymization techniques should be applied as early as possible in the processing of messages, ideally before the message enters the Apama correlator. This helps to minimize the number of systems where data about people and their identities exist together. If that is not possible, pseudonymization should be performed as early as possible in the correlator’s handling of the data, for example, in a connectivity plug-in before the data is passed to any EPL, or in the initial EPL listener before the data is stored in a data structure inside the application. This reduces the chance of the personal data leaking out in a log message.
- A security audit trail should be created, where practical, whenever personal data is created, modified or deleted, in order to protect its accuracy and allow errors to be tracked down, whether introduced accidentally or as part of an attack. For example, this could be achieved by using EPL log statements. It is possible to configure the file that log entries are written to; so if necessary, audit logging could be written to a dedicated log file. It is usually best to perform logging regarding personal data at the application level in EPL, rather than relying on logging of input events or connections from connectivity plug-ins. However, it is important to ensure that user identifiers in the incoming events can be relied upon, which means using a connectivity plug-in such as the HTTP server that supports per-user authentication, or if using a message bus such as Universal Messaging, ensuring that channel permissions are set appropriately and that the system that is publishing messages can be relied upon to set usernames accurately. See Specifying log statements for more information about logging from EPL.
- Security must be architected into the design and deployment of the Apama application to ensure the information security of personal data handled by Apama, and protect the confidentiality and integrity of data. Key points are:
- See Security Requirements for Apama for detailed information about how to ensure your Apama deployment is secure.
- Remember to also fully configure all connected systems to perform adequate authentication and authorization, for example, setting appropriate permissions on all channels if using Universal Messaging (see Configuring the connection to Universal Messaging (dynamicChainManagers)).
- Note that it is possible for a user with direct access to the correlator’s port to receive events passing through the correlator - which may contain customer-defined personal data - or to inject code that may change or access that data. Similar considerations apply to the IAF port and the dashboard server management port. There is no encryption or authentication for these ports, but in a properly configured deployment they would always be locked down using standard operating system configuration tools and firewalls, so that you can be confident that only trusted server processes and system administrators are able to connect to them. For monitoring purposes, products such as webMethods API Gateway can expose required correlator features such as the read-only monitoring REST APIs in a secure way without exposing other features that are security-critical.
- Apama provides many ways to get data in and out of the correlator that support encryption (for example, SSL/TLS) and authentication to protect the confidentiality and integrity of data: for example, the dashboard servers, and correlator connectivity plug-ins such as the HTTP server (when configured to use TLS/HTTPS). Be sure to check the documentation for your chosen means of connectivity carefully to ensure all the required security options are enabled. If using a message bus such as Universal Messaging or JMS, ensure that the permissions on the channels/topics/queues are set securely. If message publishers provide user-identifying information that will be relied upon for authorization or audit logging, also ensure that they implement authentication securely, so that this information is accurate.
- Apama application developers can use EPL to implement any user-specific authorization checks needed to protect access to personal data. However, it is essential to check that authentication is performed securely by whichever connectivity plug-in or upstream message publisher sets the username field that the EPL relies on, so that both the authorization checks and the security audit trail can be trusted.
- Regularly install the latest Apama fixes and keep your operating system fully patched, so that the latest security fixes are present.
Apama also provides the ability to store customer-defined data in external systems such as a Terracotta distributed cache (using our MemoryStore API) or a database. You should consult the documentation of systems such as these for information about how to ensure personal data written there by your application is properly handled and protected, and you should also make sure that your Apama application logic includes mechanisms to rectify or erase personal data stored there by the application.
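The pseudonymization guidance in the list above recommends replacing a user's real name with a protected unique identifier before a message ever reaches the correlator. The following is a minimal sketch of how an upstream system might do this, written in Python purely for illustration. The Pseudonymizer class, the pseudonymize function and the field names userName, userId and heartRate are all hypothetical and are not part of Apama; a real deployment would also need to persist and secure the name-to-identifier mapping, which this sketch keeps only in memory.

```python
import json
import uuid


class Pseudonymizer:
    """Hypothetical upstream-side helper that maps real names to opaque identifiers."""

    def __init__(self):
        # The mapping is the sensitive part. It stays in the upstream system
        # and is never sent to the correlator.
        self._id_by_name = {}

    def pseudonym_for(self, real_name):
        # Reuse the same identifier for a known person so that events about
        # that person can still be correlated downstream.
        if real_name not in self._id_by_name:
            self._id_by_name[real_name] = str(uuid.uuid4())
        return self._id_by_name[real_name]


def pseudonymize(message, pseudonymizer, name_field="userName", id_field="userId"):
    """Return a copy of the message with the real name replaced by a pseudonym."""
    # Copy every field except the real name; the input message is not modified.
    out = {k: v for k, v in message.items() if k != name_field}
    out[id_field] = pseudonymizer.pseudonym_for(message[name_field])
    return out


def demo():
    p = Pseudonymizer()
    original = {"userName": "Alice Example", "heartRate": 72}
    safe = pseudonymize(original, p)
    # Only the pseudonymized payload would be sent on to the correlator.
    return original, safe, json.dumps(safe)
```

With this approach, the events that the correlator receives carry only the opaque identifier, so an accidental log message or data leak on the Apama side does not by itself reveal who the data relates to. The same person always maps to the same identifier, which keeps per-user correlation possible.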