Collector Development Guide

CONNECTOR INTERACTION

In general, the Collectors written under this template will get their data from another type of plug-in, the Connector. If you review the basic template structure, each time through the loop a call is made to fetch the next record from the Connector, which is what the Collector then parses. It is possible to circumvent the template and not use this basic methodology, but we've only done this once and that was because we did not have a Connector for the particular protocol we were trying to implement.

The template does quite a bit of work to abstract the complexities of dealing with the Connector, but there are some key considerations of which to be aware. This section will discuss those topics.

Basic Architecture

The general data flow is represented in Event Source Management (ESM) as follows:

The Event Source represents the device or application from which data is being collected. This could be a file, a syslog source, a database, etc.
The Connector represents the software component that handles the protocol-level communications with the Event Source. There can be multiple Event Sources that are handled by a single Connector.
The Collector represents the parser that takes the log record from the Connector and converts the input into a normalized Sentinel event.
The Collector then delivers the event to the Collector Manager which performs additional filtering and mapping, then delivers the event to the Sentinel backend.

Since the Connector is a general-purpose component that can work with many different types of sources, it provides several ways to configure its operation. One set of configuration applies to how the Connector gets data from the Event Source, and is embodied in the representation of the Event Source in Event Source Management. You set this up by right-clicking on the Connector and selecting Add Event Source, and then specifying the details of how the Connector can find the event source and connect to it (options vary based on the type of Connector).

Each Connector can also be used with one or more Collectors, so a second set of configuration parameters applies to the interaction between the Collector and the Connector. Some of these parameters also affect how the Connector gets data from the Event Source, and others modify how the data is presented to the Collector by the Connector. These parameters are applied across all Event Sources that are connected via the Connector to the relevant Collector.

Here are some examples of various configurable options for a few of the common Connectors:

For the File Connector, the properties stored in the Event Source include things like the file name or pattern, whether it's a rotating file, what the file encoding is, whether the file is local or on a remote file share, and how often to check for new data appended to the file. The key property that is set by the Collector to which the Connector is attached is the record delimiter, which the Connector uses to determine how to deliver a single "record" collected from the source to the Collector.
For the Windows Event (WMI) Connector, the Event Source holds properties relating to things like the hostname of the system from which data should be collected and the credentials that should be used to access that system. The Collector sets properties related to the specific query (which logs, which events from those logs) that should be issued against those sources.
For the Database Connector, the Event Source holds properties about each database source: which JDBC driver to use, connection properties, credentials, etc. The properties set by the Collector include how the query should identify the start of the audit trail table, whether column names should get a prefix, and whether the data should be presented to the Collector as a string or as a map. For the Database Connector specifically, the Collector actually manages the query, so it is a little more complex.

When developing a Collector to handle data from a particular device or application, you will need to determine which settings the Collector should apply to the Connector and hence to its Event Sources. You should also document any specific steps needed to configure the actual device or application and, if necessary, the Event Source nodes in ESM that complete the configuration.

Record Handling

In general, Connectors can be divided into two categories: those that passively receive records, and those that go out and fetch them. In the former category are the Novell Audit and Syslog Connectors, and in the latter are Connectors such as the File Connector and the Database Connector. Some, like the Process Connector, are a bit of a hybrid: you have to manually configure the Event Source, but once invoked, the source then just delivers a stream of events to the Connector.

In general, for the first class of Connectors the architecture is set up with a network server that receives data from remote systems; this is called an Event Source Server within Event Source Management, but is properly part of the Connector. Source systems can generally just start sending records to the appropriate network ports and the Connector will automatically provision appropriate Event Source nodes. The Collector will typically then have some properties it can set that help the Connector route different sets of events so that they are handled by the appropriate Collector.

For the second class of Connectors, usually some manual configuration of the Event Source nodes is necessary. After that, the Connector will scan the defined sources and pass any new data on to the Collector for processing. The Connector itself will keep track of the offset — the position within the event stream to which the Connector has previously read.

In both cases, log records are collected from the one or more configured sources, and are typically stored in a buffer. Those records are then routed to the appropriate Collector, where there is another buffer that serves as an input queue for the Collector. All the Collector has to do is pick up the next record that was placed into the queue and process it. The Connector handles almost all of the complexities of interacting with the source: error handling, retry logic, offsets, etc. (with the exception of the Database Connector; see below).

The one important decision to be made on the Collector side is how to handle situations where the input queue is empty. This can of course happen if the Collector consumes all of the input records and the source has not produced any new ones. The template defines the following interaction models:

falloff: This is the default interaction mode, and is designed to provide the most desirable behavior in most circumstances. It is so named because the Collector will attempt to read a new record and, if it does not find one, the read will almost immediately return control to the Collector. The second time it doesn't find a new record it will wait a bit longer, and the third time longer still (exponential falloff). The mode strikes a balance between waiting around too long when there is actually data coming in, and querying the source too frequently or running through the Collector loop too often when there is not.
fixed: This mode requests a record from the input queue and if it doesn't find one, waits a pre-defined fixed retry interval before asking again.
none: This mode means to skip reading from the Connector entirely. This is only used in rare cases where the Collector does not use a Connector at all.
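
The difference between the three modes can be sketched as a function that decides how long to wait after each consecutive empty read. This is only an illustration, not the template's actual implementation: the function name nextWaitMs, the 10 ms base delay, the doubling factor, and the 5000 ms cap are all assumptions chosen for the example.

```javascript
// Hypothetical sketch: how long to wait after N consecutive empty reads.
// The base delay, doubling factor and cap are assumed values, not the
// template's real settings.
function nextWaitMs(mode, emptyReads, fixedIntervalMs) {
    var BASE_MS = 10;    // assumed first wait in falloff mode
    var CAP_MS = 5000;   // assumed upper bound on a single wait
    if (mode === "none") {
        return 0;        // the Connector is never read, so there is no wait
    }
    if (mode === "fixed") {
        return fixedIntervalMs;  // same wait every time
    }
    // falloff: double the wait for each consecutive empty read, up to the cap
    var wait = BASE_MS * Math.pow(2, emptyReads - 1);
    return Math.min(wait, CAP_MS);
}

// Waits for the first five consecutive empty reads in each mode.
var falloffWaits = [];
var fixedWaits = [];
for (var n = 1; n <= 5; n++) {
    falloffWaits.push(nextWaitMs("falloff", n, 1000));
    fixedWaits.push(nextWaitMs("fixed", n, 1000));
}
// falloffWaits is [10, 20, 40, 80, 160]
// fixedWaits is [1000, 1000, 1000, 1000, 1000]
```

With falloff the first few retries are cheap, so a busy source is picked up again quickly, while a quiet source is polled less and less often.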

Note that whenever the Collector is waiting for input data in either falloff or fixed mode, the template Collector loop is not processing. This means that you can't issue any "no recent data" alerts, can't process any expired Sessions, or do anything like that. This is one of the reasons that the falloff method is designed the way it is.

If you do nothing, your Collector will automatically use the falloff mode. You can override this by manually specifying a different mode; add the following to your Collector.prototype.initialize() method to (for example) set the fixed mode:

this.CONFIG.params.Conn_Retry = "fixed";

You can also let the person configuring the Collector onsite choose; to do so, include the conn_retry template parameter in your Collector (see the Parameters documentation). Note that whenever you specify the fixed mode, you should also hardcode a retry interval (instance.CONFIG.params.RX_Timeout) and/or include the rx_timeout parameter to allow it to be configured onsite.
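
Putting the two settings together, a hardcoded fixed-mode configuration might look like the following sketch. The Collector constructor here is only a stand-in so the example runs on its own (in a real Collector the template supplies it), and the RX_Timeout value of 5, including its unit, is an assumption; check the Parameters documentation for the real semantics.

```javascript
// Stand-in for the template's Collector class so the sketch is runnable;
// in a real Collector the template provides this object and CONFIG.params.
function Collector() {
    this.CONFIG = { params: {} };
}

Collector.prototype.initialize = function() {
    // Wait a fixed interval between reads when the input queue is empty.
    this.CONFIG.params.Conn_Retry = "fixed";
    // Hardcoded retry interval; the value and its unit are assumptions.
    this.CONFIG.params.RX_Timeout = 5;
    return true;
};

var collector = new Collector();
collector.initialize();
```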

Configuring Connector Properties

As part of setting up a Collector, you must also specify the allowed Connectors (methods) that can deliver data to the Collector, and the modes that can be used by each Connector. This is determined by the capabilities of the event source, and usually only one method is suitable (although Collectors can be configured to support more than one method if the source supports it). An XML template file with a custom editor is provided, and it contains many examples; essentially you eliminate the methods and modes you won't support. You can support multiple methods and multiple modes per method, but one method, and one mode for each method, must be declared as the default; the implementor will have to override that choice to use non-default modes.

Recall from the earlier discussion how this works: you specify a set of connection properties as part of the Collector configuration, and when the Collector is deployed and a Connector is deployed with associated Event Source nodes, those connection properties are applied to the Event Sources and/or to the Connector output.

The specific properties that apply to each Connection Method (Connector) are documented fully in the associated Connector document — find the Connector on the Plug-ins Download site and look for details in the appendices. Below we will provide some examples of interesting Connectors and their supported mode properties.

File Connector

The File Connector really only supports one relevant property, the Delimiter. You set the delimiter to tell the Connector how to identify, for that particular event source, the end of a single record. For many sources this is just a newline, but some multi-line sources use either a double newline or some funky character. You can specify more than one alternate line ending by separating the options with commas, and non-printable characters are specified in hex (for example, 0x0D).

You can also do more complex things: for XML input, for example, you can specify the end tag of the record to get a complete XML record. Just note that the delimiter is consumed by the Connector, so you may have to restore it to the input string to get parsable XML.
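
As a minimal sketch of restoring a consumed delimiter, the helper below re-appends the end tag before the record is handed to an XML parser. The function name restoreDelimiter and the sample record are hypothetical; in a Collector the input would be whatever string the Connector delivered for the record.

```javascript
// Hypothetical helper: re-append the end tag that the Connector consumed
// because it was configured as the record delimiter.
function restoreDelimiter(record, delimiter) {
    // Guard against appending twice if the record is already complete.
    if (record.slice(-delimiter.length) === delimiter) {
        return record;
    }
    return record + delimiter;
}

// Example: the delimiter was "</Event>", so the record arrives without it.
var received = "<Event><User>jsmith2</User>";
var xmlText = restoreDelimiter(received, "</Event>");
// xmlText is "<Event><User>jsmith2</User></Event>"
```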

Windows Event (WMI) Connector

This Connector is considerably more complex, with an agent that can poll multiple event sources and deliver data back to Sentinel. But again there is only one important mode property, EventLogQuery, that affects how the Connector queries the event source. It specifies which Windows Event Logs should be queried and which events should be filtered. The Collector determines this because the Collector will be designed to parse a specific subset of the universe of possible events in Windows, hence it should only query for those events that it can parse.

The syntax of this property is a little funny: it is a comma-separated list, but values come in pairs. The first element of each pair is the name of an Event Log in Windows — like Security or System — and the second element is the filter (in WQL) that should be applied to that log. You can leave the filter blank, but you must include the position.

For example:

Security,"EventCode=4696 OR EventCode < 100"

This example requests event code 4696 and codes less than 100 from the Security log.

Security,"EventCode=4696 OR EventCode < 100",Application,

This example requests the same Security events and all Application events (note the empty filter at the end).
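
Because the pairing rule is easy to get wrong by hand, it can help to generate the value from a list of (log, filter) pairs. The helper below is a hypothetical convenience for illustration, not part of the Connector or the template; it only reproduces the syntax shown in the examples above.

```javascript
// Hypothetical helper: build an EventLogQuery value from (log, filter) pairs.
function buildEventLogQuery(pairs) {
    var parts = [];
    for (var i = 0; i < pairs.length; i++) {
        var filter = pairs[i].filter;
        parts.push(pairs[i].log);
        // An empty filter still occupies its position in the list.
        parts.push(filter ? '"' + filter + '"' : "");
    }
    return parts.join(",");
}

var query = buildEventLogQuery([
    { log: "Security", filter: "EventCode=4696 OR EventCode < 100" },
    { log: "Application", filter: "" }
]);
// query is: Security,"EventCode=4696 OR EventCode < 100",Application,
```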

Database Connector

The Database Connector is by far the most complex to work with, because so many parameters of the event source are variable, such as the table(s) and column(s) to be queried, how the data is indexed, whether we're using an interactive query or a stored procedure, and so forth. As a result, the Collector actually ends up doing a lot more work to manage the database queries than with any other type of Connector; thankfully, the Collector template handles a lot of the complexity for you.

We'll get into the details of handling the database query in a bit, but first let's discuss the connection mode options as we've done with the other Connectors. To help this discussion, let's assume that we have a little database table that we want to query that looks like this:

Time	EventName	AccountName
Jan 5 2012 10:12:34	Authenticate	jsmith2
Jan 5 2012 10:12:35	File Open	jsmith2

We'll be issuing a query something like SELECT * from this table.

First, the Connector has a mode property called DataFormat that affects how the results are presented to the Collector. For this example's first record:

DataFormat: nvp would result in a name-value pair structure like this: rec.s_RXBufferString='Time="Jan 5 2012 10:12:34" EventName="Authenticate" AccountName="jsmith2"'
DataFormat: map would result in a structured record, represented here in pseudo-JSON (JavaScript's native format): rec.RXMap = { Time: "Jan 5 2012 10:12:34", EventName: "Authenticate", AccountName: "jsmith2" }

With this second format, you can easily address each column directly using a syntax like rec.RXMap.AccountName, making it very easy to map this to the output event.
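
To make the difference concrete, the sketch below reads AccountName from both record shapes. The parseNvp helper is hypothetical and assumes values never contain embedded double quotes; the two rec objects are stand-ins built from the example row, not real Connector output.

```javascript
// Hypothetical helper: turn an nvp string of the form name="value" ...
// into a plain object. Assumes values contain no embedded double quotes.
function parseNvp(text) {
    var result = {};
    var pattern = /(\w+)="([^"]*)"/g;
    var match;
    while ((match = pattern.exec(text)) !== null) {
        result[match[1]] = match[2];
    }
    return result;
}

// Stand-ins for the two record shapes described above.
var nvpRec = { s_RXBufferString: 'Time="Jan 5 2012 10:12:34" EventName="Authenticate" AccountName="jsmith2"' };
var mapRec = { RXMap: { Time: "Jan 5 2012 10:12:34", EventName: "Authenticate", AccountName: "jsmith2" } };

var fromNvp = parseNvp(nvpRec.s_RXBufferString).AccountName;  // "jsmith2"
var fromMap = mapRec.RXMap.AccountName;                       // "jsmith2"
```

The map format avoids the extra parsing step entirely, which is why it is usually the more convenient choice.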

There are some other properties that control details of how the columns are presented:

ForceLower: This forces all column names to appear in lowercase; for example, EventName above would appear as eventname.
AddPrefix: Handling the input as a native JavaScript map introduces the potential for name conflicts, so for example if the database happens to have a column named s_db_hostname (which is supposed to report the hostname configured for the database Event Source), the Connector metadata variable of the same name might get overwritten. This property allows you to add an arbitrary prefix onto each column name to prevent this (you can also use the SELECT col AS alias syntax for many DBs).
ReportNullValue: This indicates that the Connector should provide an entry for each queried column in its output, even if the value is empty. If you don't do this, you can run into trouble attempting to access variables that have not been defined.
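
If a column may be absent from the map (for instance when ReportNullValue is not set), a defensive accessor avoids reading undefined values. This is a hypothetical helper for illustration, not part of the template, and the sample map assumes ForceLower has been applied.

```javascript
// Hypothetical helper: read a column from the record map, returning a
// fallback when the column is missing or null.
function getColumn(map, name, fallback) {
    if (map && typeof map[name] !== "undefined" && map[name] !== null) {
        return map[name];
    }
    return fallback;
}

// Sample map with lowercase names; accountname was empty and not reported.
var rxMap = { eventname: "Authenticate" };

var eventName = getColumn(rxMap, "eventname", "");      // "Authenticate"
var accountName = getColumn(rxMap, "accountname", "");  // ""
```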

That covers the basics of handling the data returned from the database... but how do we get data in the first place?

Database Connector Queries

In most data collection scenarios, the input from any source can be thought of as an infinite stream of records, starting at some point in the past and extending infinitely into the future. Of course, the most you can read at any point in time only extends up to the present moment, and one must subsequently check (poll) to find out whether there is more data. Similarly, since the stream of event data extends backwards in time until whatever time the audit trail was first configured, there could be thousands, millions, even billions of records in the full audit trail. Clearly, to deal with this data stream we will need to do two things:

When querying for past data, we will need to limit the set of results returned so that we don't overflow memory resources on the Collector Manager.
We will need to repeatedly poll the event source to get additional sets of records.
- If we are querying old data, we need to poll the database as fast as is reasonable until we reach "real time"
- Once we reach "real time", we should poll the database less frequently to avoid overloading the database if there's no actual data to retrieve

To do this correctly, we need some way to track where we are in our event stream: an offset or index that records how far we've read, so that subsequent queries pick up where we left off and we don't miss any events. We'll need something that changes in only one direction with respect to time (that is, monotonically), and ideally something that is unique for each and every record so that we can tell exactly where we last read to with a resolution of a single record. This gets really tricky, however, since not all database vendors provide support for such constructs, and not all audit trails are designed with a nice, simple, incrementing row number or similar. In fact, the most common offset we use is a timestamp; timestamps satisfy the first criterion (monotonically increasing) but not the second (unique for each record), because it's quite possible to have multiple records in a single "timeslice", meaning the smallest time interval reported.
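
The risk with a non-unique offset can be shown with a small simulation. Everything here is made up for illustration: an in-memory array stands in for the audit table, and fetchBatch mimics a row-limited query that returns rows whose offset column is strictly greater than the last offset read.

```javascript
// In-memory stand-in for an audit table; three rows share timestamp 101.
var rows = [
    { id: 1, ts: 100 }, { id: 2, ts: 101 }, { id: 3, ts: 101 },
    { id: 4, ts: 101 }, { id: 5, ts: 102 }
];

// Simulates a row-limited query: rows where column > offset, in table order.
function fetchBatch(column, offset, maxRows) {
    var batch = [];
    for (var i = 0; i < rows.length && batch.length < maxRows; i++) {
        if (rows[i][column] > offset) {
            batch.push(rows[i]);
        }
    }
    return batch;
}

// Polls until no rows are returned and lists the ids that were read.
function readAll(column, maxRows) {
    var seen = [];
    var offset = 0;
    var batch = fetchBatch(column, offset, maxRows);
    while (batch.length > 0) {
        for (var i = 0; i < batch.length; i++) {
            seen.push(batch[i].id);
            offset = batch[i][column];  // last value read becomes the offset
        }
        batch = fetchBatch(column, offset, maxRows);
    }
    return seen;
}

var byId = readAll("id", 2);  // [1, 2, 3, 4, 5]
var byTs = readAll("ts", 2);  // [1, 2, 5]; ids 3 and 4 are skipped
```

With the unique id column every row is read once. With the timestamp column, the row limit cuts the first batch in the middle of the 101 timeslice, and the next query starts after 101, so the remaining rows in that timeslice are never read.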

The SDK Collector template provides a number of features that help simplify the complexity described above. First of all, a very simple method is provided for specifying the database query: put it in a file. In each new Collector, you'll find a file called sqlquery.base (if the Collector won't use the Database Connector, it's safe to simply delete this file). Simply type or copy the desired SQL query into that file, something like this:

SELECT * FROM table1

Of course, you can specify individual columns and column aliases, join multiple tables, add filters in a WHERE clause - all the usual stuff. But this query doesn't yet satisfy our constraints listed above, because it will simply get all records from the source every time it is issued.

The next thing we have to do is to restrict the query so that it returns only a manageable subset of records. The way to do this varies from database to database, but in general you'll be using a construct like SELECT TOP 100000... or SELECT ... WHERE rownum <= 1000000. In fact, the template provides some parameters to allow the implementor to tune the number of records returned, so when writing your query you should use the special replacement variable %d at the point where you will need to specify the max, for example:

SELECT TOP %d * FROM table1

Second, you need to decide what field or fields you wish to use to track your offset. Then you need to do two things: add a WHERE clause to your query to begin the query at that offset, and tell the Collector template how to calculate the offset. Here's an example of a modified sqlquery.base file with an offset specified:

SELECT TOP %d * FROM table1 WHERE RowNumber > %s

Note that we're using %s here to represent the current value of the offset — that value will need to be updated continuously as new records come in and the Collector processes them. Rather than make the Collector developer manually extract the offset each time, the way the template handles this involves having the developer define an anonymous method that "knows" how to calculate the offset, and then attach that method to the Connector object (the class that represents the Connector in the Collector execution space). The Connector will then automatically take care of calculating and updating the offset. Here's an example; this code would live in Collector.prototype.initialize():

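// Offset method: returns the value to use as the offset for one record.
// The column read here should be the one your query filters on.
this.PARSER.getOffsetData = function(input) {
    // Only records delivered in map format carry RXMap.
    if (typeof input.RXMap != "undefined") {
        return input.RXMap.TIMESTAMP;
    }
};
// Attach the method to the Connector object so it can track the offset.
conn.addParser(this.PARSER.getOffsetData);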

In this example, we've defined an arbitrarily-named method getOffsetData() (it could be anything). We go ahead and store this method in the global instance.PARSER area (instance represents the running Collector) — this is optional, but allows us to re-use the method later or swap it out temporarily if we need to. Next, we add this new method to the Connector using the addParser() method, which is part of the Connector class.

It is important to note that the Connector itself will keep track of the offset, and that that information will be persisted across restarts of the Collector/Connector/Event Source, indeed of Sentinel.

The way this works in the background is as follows:

Whenever a new Event Source is added to Sentinel that is attached to a Database Connector, and that Event Source is started, the first thing the source does is generate a pseudo-event that reports whatever the last offset it saw from event data was.
The Collector template will automatically create a new SQLQuery object that is pre-loaded with the template query from the sqlquery.base file, plus has the offset calculation method attached to it.
The Collector template will pass the offset sent by the Event Source to the SQLQuery object and it will construct a new query with the current max rows and offsets injected into it.
The Connector will pass the query to the database and retrieve a set of records.
The Connector then feeds those records to the Collector, and the Collector processes the record data.
As part of the processing, the SQLQuery object will automatically calculate any new offsets and store them back in the Connector.
When the input record set is depleted, the SQLQuery object will automatically construct a new query with the latest offset.
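
The cycle above can be sketched in a few lines. This is not the SQLQuery class itself: buildQuery is a hypothetical stand-in for the template's substitution step, the row limit of 1000 and the starting offset of 0 are assumed values, and the two records are invented to match the example table.

```javascript
// Hypothetical stand-in for filling in the sqlquery.base template: the first
// %d becomes the row limit and the first %s becomes the current offset.
function buildQuery(template, maxRows, offset) {
    return template.replace("%d", String(maxRows)).replace("%s", String(offset));
}

// Same shape as the offset method shown earlier, but reading the column
// that this example query filters on.
function getOffsetData(input) {
    if (typeof input.RXMap !== "undefined") {
        return input.RXMap.RowNumber;
    }
}

var template = "SELECT TOP %d * FROM table1 WHERE RowNumber > %s";
var offset = 0;  // assumed starting offset for a brand-new source

var firstQuery = buildQuery(template, 1000, offset);
// firstQuery is "SELECT TOP 1000 * FROM table1 WHERE RowNumber > 0"

// Pretend the database returned these two records for the first query.
var records = [
    { RXMap: { RowNumber: 1, EventName: "Authenticate" } },
    { RXMap: { RowNumber: 2, EventName: "File Open" } }
];
for (var i = 0; i < records.length; i++) {
    offset = getOffsetData(records[i]);  // track the latest offset seen
}

var nextQuery = buildQuery(template, 1000, offset);
// nextQuery is "SELECT TOP 1000 * FROM table1 WHERE RowNumber > 2"
```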

Last but not least, there is one important connection mode property needed to make all this work, the InitialStartOfData property. This is used to construct the SQL query for a source for the very first time — since the Collector hasn't yet received any records from that source, it can't know what value and format the offset should take on. In many cases the SQL query will expect a particular syntax for the offset, for example something like Mar 12 2012 10:12:03pm, and therefore the Connector needs to pre-initialize the source with a string that matches that format.
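
Conceptually, the initial value is just a fallback for the case where no offset has been recorded yet. The sketch below illustrates that idea only; chooseOffset is a hypothetical function, the sample date strings are assumptions, and the real fallback logic lives inside the Connector and template.

```javascript
// Hypothetical helper: pick the offset for the next query. reportedOffset is
// whatever the Event Source last reported; initialStartOfData is the
// configured InitialStartOfData value.
function chooseOffset(reportedOffset, initialStartOfData) {
    if (typeof reportedOffset === "undefined" || reportedOffset === null || reportedOffset === "") {
        return initialStartOfData;  // first run: nothing has been read yet
    }
    return reportedOffset;
}

var initial = "Jan 1 2012 12:00:00am";  // assumed InitialStartOfData value
var firstRun = chooseOffset(undefined, initial);
// firstRun is "Jan 1 2012 12:00:00am"
var laterRun = chooseOffset("Mar 12 2012 10:12:03pm", initial);
// laterRun is "Mar 12 2012 10:12:03pm"
```

Note that the initial value uses the same string format as later offsets, so the same query template works on the first run and on every run after it.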

This covers the basics of handling databases. The SDK template supports some additional features to help drive database-oriented Collectors; see the SQLQuery class documentation for more detail.
