Handling personal data "at rest" in log files

Personal data can appear in the log files of any Apama server process, such as the correlator, IAF, or dashboard servers.

Example log messages containing personal data

Apama log files include logging performed by the customer’s application, by standard Apama connectivity, EPL, or IAF plug-ins, and by the correlator, IAF, and dashboard servers themselves.

For example, customer application log statements may contain personal data. Note also that the contents of Apama events are often logged (either in full, or truncated so that only the beginning of the event’s fields is displayed) if an error occurs while processing or sending the event, and data from events or other EPL data structures may be logged as part of error messages.

We provide a small set of indicative log messages as examples to give an idea of the kind of data that may be present. Note that this is for illustrative purposes only. It is not possible to provide an exhaustive list of all possible log messages, and the format of the messages shown below is subject to change at any time and should not be relied upon.

When a client connects directly to the correlator port, the client’s IP address and username are logged. Note that the username is not authenticated, so it should not be relied upon for security purposes; the client would be either a system administrator or another machine, not an end user:

2018-05-30 14:23:43.950 INFO [30188] - Sender engine_inject
(MY_USERNAME) (0000000000435490) (component ID
6561364086281508704/6561645561258219360) connected from 127.0.0.1:6714

When the HTTP server connectivity plug-in receives a new incoming connection, the IP address and username are logged (authentication and TLS may optionally be enabled, in which case the username can be relied upon):

2018-05-30 14:23:45.257 INFO [29300] -
<connectivity.httpServer.httpServer-instance> Started receiving messages
from host 127.0.0.1
2018-05-30 14:23:45.260 WARN [29300] -
<connectivity.httpServer.httpServer-instance> Authentication failed for
user 'uniqueusername' from host 127.0.0.1.

When an event is sent directly to the correlator and cannot be parsed (perhaps due to an application bug, or an error in the format used by the sender), a message like this will be logged (in log messages like this, all the personal data is in customer-defined fields):

2018-05-30 16:36:58.609 WARN [29308] - Failed to parse the event
"com.acme.MyEvent("private political opinions go here", "1/2/1990",
"My full name", "MY_USERID")" from My Sender Client due to the error:
Unable to find event type com.acme.MyEvent

When a connectivity plug-in fails to transform an incoming message into the form the application is expecting, the message will be logged. The order in which the fields appear is undefined, and it is possible the message will be truncated:

2018-05-30 16:44:47.811 WARN [17644:processing] -
<connectivity.myCodec.myFirstChain> Codec plug-in MyCodec failed to
transform message Message<metadata={"sag.channel"="MyFirstChainChannel",
"sag.type"="MyMessage", "username":"J_BLOGGS"}, payload={"medical
info": "Embarrassing medical info here", "name":"Joe Bloggs",
"username":"J_BLO...}>: java.lang.Exception: Something bad happened

When correlator-integrated JMS receives an invalid incoming message, it may log some or all of the body (which may contain customer-defined personal data), potentially truncated if it is long:

2018-05-30 16:25:37.114 WARN
[11708:JMSReceiver:myConnection-receiver-apama-queue-01] -
1 mapping rule warning(s) while mapping to target event:
test.TestMessage("DOB=1/2/1990","Embarrassing medical info here",
"Joe Bloggs","J_BLO...:
  - MappingRule<source="${jms.body.textmessage}",
    target="${apamaEvent['str']}", action="xpath", actionResource="/xxx">
  - Exception evaluating the xpath "/xxx": org.xml.sax.SAXParseException;
    lineNumber: 3; columnNumber: 70; The element type "xxx" must be
    terminated by the matching end-tag "</xxx>".
  with source JMS message:
   Property.USERNAME=J_BLOGGS
    --->    JMSDestination=Queue<apama-queue-01>
    --->    Body=<mydata>
    ---> <val key="date_of_birth">1/1/2018</val>
    ---> <val key="name">Joe Bloggs</val>
    ---> <val key="medicalinfo">Embarrassing medical info here</val>
    ---> <val key="username">J_BLO...

Protecting and erasing data from Apama log files

To protect personal data in log files, set operating system file permissions on the log files and on the directory containing them so that only the correlator process and authorized system administrators have access. On Windows, this means setting an inheritable Access Control List (ACL) that limits read access to the contained files. On UNIX systems, this means restricting the directory’s read, write, and execute permissions to the owning user only (that is, “700”) and, if possible, also setting a umask of 0077 on the correlator process so that files created by the correlator are likewise locked down.
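The UNIX permission settings described above could be applied by a small deployment script. The following Python sketch is illustrative only: the function name lock_down_log_dir is a hypothetical helper, not part of Apama, and it covers POSIX permissions only (not Windows ACLs).

```python
import os
import stat
from pathlib import Path


def lock_down_log_dir(log_dir: str) -> None:
    """Restrict a log directory and the files already in it to the owning user.

    Hypothetical helper for illustration; not an Apama API. POSIX only.
    """
    directory = Path(log_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # 0700: the owner may read, write and traverse; group and others get nothing.
    os.chmod(directory, stat.S_IRWXU)
    for path in directory.iterdir():
        if path.is_file():
            # 0600: owner read/write only for each existing log file.
            os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```

A launcher script might call os.umask(0o077) before starting the correlator as a child process, so that log files created later inherit owner-only permissions; whether that fits depends on how the correlator is started in your environment.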

As there are many situations in which usernames, IP addresses or events containing personal data may be logged, including by customer-provided plug-ins and third-party libraries, it is not practical to enumerate all of the log messages that may contain such data, or the set of categories they may be logged under.

Log files are by nature immutable and formatted for reading by human system administrators (not machines), so rectification of data contained within them does not make sense, and erasure of data for individual persons is not practical. The retention of complete information in log files also serves an important and legitimate purpose, in providing a security audit trail, and the ability to diagnose and fix accidental or unlawful events compromising the availability, integrity or confidentiality of the application and personal data it contains.

For these reasons, the recommended approach to protecting personal data in Apama log files is to regularly rotate the logs, and archive the old log files to a secured location protected by encryption.

Optionally, old log files may be deleted after a set time period, though this should be done only when necessary as it will destroy information that might be important for diagnosing bugs or attacks that compromise the integrity or availability of the application. Cumulocity product support may not be able to provide assistance with support requests if the relevant log files have been deleted.
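As a rough illustration of the archive-then-expire approach, the Python sketch below compresses rotated log files into a separate archive directory with owner-only permissions and, optionally, deletes archives older than a retention period. The function names, the "*.log*" file pattern, and the idea of passing the names of the active log files are all assumptions made for this example. Encryption is deliberately left out; it would typically be added with a tool such as gnupg or by archiving to an encrypted volume.

```python
import gzip
import os
import shutil
import time
from pathlib import Path


def archive_rotated_logs(log_dir: str, archive_dir: str,
                         active_names: set[str]) -> list[Path]:
    """Compress rotated logs into archive_dir and remove the originals.

    Hypothetical helper. Assumes archive_dir is a different directory from
    log_dir and that active_names lists files still being written to.
    """
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    os.chmod(archive, 0o700)
    archived = []
    for src in sorted(Path(log_dir).glob("*.log*")):
        if not src.is_file() or src.name in active_names:
            continue
        dest = archive / (src.name + ".gz")
        # Create the archive file with owner-only permissions from the start.
        fd = os.open(dest, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        with os.fdopen(fd, "wb") as raw:
            with gzip.GzipFile(fileobj=raw, mode="wb") as out:
                with open(src, "rb") as inp:
                    shutil.copyfileobj(inp, out)
        src.unlink()
        archived.append(dest)
    return archived


def delete_expired_archives(archive_dir: str, max_age_days: float) -> list[Path]:
    """Delete archived logs whose modification time is older than max_age_days."""
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for path in sorted(Path(archive_dir).glob("*.gz")):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Such a script could be run from a cron job or a Windows Scheduled Task after each rotation. The retention period is a policy decision, and deleting archives carries the diagnostic cost described above.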

Apama provides a variety of mechanisms for rotating its various log files. These can be combined with operating system features such as cron jobs on Linux or Scheduled Tasks on Windows, and with common utilities such as logrotate and gnupg, to implement whatever log handling scheme best fits your organization’s data protection policies. For full information about how to rotate logs, see the log rotation topics in the Apama documentation.

You may wish to inform your employees or end users (or in some cases request consent from them) that personal data may be stored in server log files, and to describe the steps your organization takes to protect that data.

We recommend always setting the log level of the main correlator, IAF, and dashboard server logs to INFO. If it is set to WARN or higher, security-relevant events will not be recorded and diagnosing failures can be difficult. If it is set to DEBUG or lower, there will be a significant performance impact and security-sensitive information will likely be written to the logs.

For similar reasons (as well as to avoid performance problems), it is important not to go into production with diagnostic logging of input/output messages enabled. For example, do not enable the Diagnostic codec connectivity plug-in or set logJmsMessages to true, except for non-production testing when there is no real personal data present in the messages.

Recommendations for logging by Apama application code

If you are developing EPL applications, connectivity plug-ins or EPL plug-ins, you will need to make your own choices about what information to log, and at what level, from the code you write. To comply with the principle of data minimization, it is best to avoid unnecessarily including personal data in log output. Where possible, avoid logging details such as username, IP address or the contents of messages, unless they are needed for security auditing or for legitimate interests such as diagnosing and resolving application errors. In some cases, “pseudonymization” will be possible. That is, when logging personal data, use an application-generated globally unique identifier (GUID) instead of a username or IP address that could be used to link the data to an individual person, and protect the mapping between GUIDs and usernames.
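The pseudonymization idea can be sketched as follows. This Python example is a minimal illustration, not an Apama API: the Pseudonymizer class, its method names and the audit_line helper are all hypothetical, and the in-memory dictionary stands in for whatever protected store a real application would use for the GUID-to-username mapping.

```python
import uuid


class Pseudonymizer:
    """Maps usernames to random GUIDs so log lines need not name the user.

    Hypothetical helper. The mapping is itself personal data and must be
    stored and protected separately from the log files.
    """

    def __init__(self) -> None:
        self._guid_by_user: dict[str, str] = {}

    def pseudonym_for(self, username: str) -> str:
        # uuid4 is random, so the GUID reveals nothing about the username.
        if username not in self._guid_by_user:
            self._guid_by_user[username] = str(uuid.uuid4())
        return self._guid_by_user[username]

    def forget(self, username: str) -> None:
        # Dropping the mapping means this table can no longer link
        # previously logged GUIDs back to the user.
        self._guid_by_user.pop(username, None)


def audit_line(pseudonymizer: Pseudonymizer, username: str, resource: str) -> str:
    """Build a log message that carries the GUID instead of the username."""
    guid = pseudonymizer.pseudonym_for(username)
    return f"Security event: user '{guid}' modified resource: {resource}"
```

The same user always receives the same GUID while the mapping exists, so audit entries remain correlatable without exposing the username in the log itself.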

It is important to select an appropriate log level for application-generated log messages. It is possible to select different log levels for individual packages within your EPL application, and to direct the output to different log files. See Setting EPL log files and log levels dynamically for more details. If using multiple log files, ensure that the same file system permissions and secure rotation policies are applied to all of them.

You may wish to gather together all the security audit logging from your EPL application into a single file, or perhaps all of the logging that may include personal data. As EPL log statements are written to a category based on the event definition where they exist, these use cases can be addressed by defining a dedicated Apama event definition to perform the logging. For example:

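package com.mycompany;

// Dedicated event type: all log statements inside it are written to the
// com.mycompany.SecurityAuditLogging log category.
event SecurityAuditLogging
{
    // Declared static so that it can be called on the event type itself.
    static action logModification(string username, string resource)
    {
        log "Security event: user '"+username+"' modified resource: "
          +resource at INFO;
    }
}
...
// Elsewhere in the application:
SecurityAuditLogging.logModification("myuserid", "resource");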

The log level and log file for this could then be configured in the correlator’s YAML configuration file as described in Setting EPL log files and log levels in a YAML configuration file. For example:

eplLogging:
  com.mycompany.SecurityAuditLogging:
    file: apama-security-auditing.log
    level: INFO