Collector Development Guide

COLLECTOR OVERVIEW

Collectors are designed for one purpose: to collect data from endpoint systems, parse that data, and send it to the Sentinel platform. Most Collectors handle event data, but on SIEM platforms they can also process identity, host, and vulnerability data that provides context for those events.

This Guide walks you through creating a Collector to parse data from a new data source. It covers the most common techniques, usage, and best practices; if you need more information, see the Development Topics area. Not every step in the Guide is necessary for every Collector (the section on complete documentation, for example, might not be needed in your environment), but if you follow all the steps, you should end up with a complete, functional, tested, and documented Collector.

Sentinel Collector Plug-ins

Sentinel Collectors run within the Collector Manager environment and are managed using the Web UI or the Event Source Manager application. The Collector Manager environment provides much of the infrastructure needed to run Collectors and their associated components, as well as the interface through which Collectors receive and send data. The other key component within Collector Manager is the Connector, which works in tandem with the Collector to gather and process data:

Connector: Provides either inbound (data sent from source to Connector) or outbound (Connector queries for data) protocol-level communications. Instantiates an Event Source to represent each distinct data source being processed. Converts inbound event data into a textual map form consumable by the Collector.
Collector: Receives the textual map from the Connector, parses the proprietary data format in that map, and normalizes it into the Sentinel Event Schema. Enriches the Event with additional source-specific data. Sends the Event to the Collector Manager framework.
Collector Manager: Hosts the Collectors and Connectors. Receives the normalized Event from the Collector. On SIEM platforms, performs local filtering and applies maps from the Mapping Service. Reliably delivers the Event to the backend Sentinel system.
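To make the division of labor concrete, here is a minimal sketch of the flow in plain JavaScript. Everything in it is hypothetical and only stands in for the real framework: `connectorRead()` plays the Connector (wrapping a raw line in a textual map tagged with its Event Source), `collectorParse()` plays the Collector (normalizing that map into an Event-like object), and `managerDeliver()` plays the Collector Manager. The field names (`rawMessage`, `eventSourceId`, `initiatorUser`, `action`) and the `user=... action=...` input format are illustrative, not taken from the Sentinel Event Schema.

```javascript
// Hypothetical sketch of the Connector -> Collector -> Collector Manager flow.
// None of these functions are part of the Sentinel SDK.

// Connector role: protocol-level receipt; wraps each raw record in a textual map
// and tags it with the Event Source it came from.
function connectorRead(rawLine, eventSourceId) {
  return { rawMessage: String(rawLine), eventSourceId: String(eventSourceId) };
}

// Collector role: knows this (hypothetical) product's "user=<name> action=<verb>"
// format and normalizes it into an Event-like object.
function collectorParse(textMap) {
  const fields = {};
  for (const pair of textMap.rawMessage.trim().split(/\s+/)) {
    const idx = pair.indexOf("=");
    if (idx > 0) fields[pair.slice(0, idx)] = pair.slice(idx + 1);
  }
  return {
    initiatorUser: fields.user || null, // illustrative field names only
    action: fields.action || null,
    eventSourceId: textMap.eventSourceId,
  };
}

// Collector Manager role: accepts normalized Events and delivers them onward
// (here into an in-memory array standing in for the backend).
const backend = [];
function managerDeliver(event) {
  backend.push(event);
}

managerDeliver(collectorParse(connectorRead("user=alice action=login", "src-01")));
console.log(backend[0]);
```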

The split between the Connector and the Collector separates the task of fetching data from the task of parsing it. The Connector is generalized to support a specific protocol and is usually not concerned with which product it is fetching data from, as long as that product follows the protocol. It must, however, be configured, either through the UI or by the Collector, to fetch the data correctly from the specific source and in the specific format the Collector expects. The Collector, by contrast, is written to support a single product from a single vendor (except for a few “Generic” Collectors) and encodes special knowledge about that product: how its data can be captured and how it can be parsed.
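The sketch below illustrates that separation under simplified, hypothetical assumptions. One protocol-level connector (`makeLineConnector`) only knows how to split a stream into records, using a delimiter that each Collector supplies as configuration; each Collector (`vendorACollector`, `vendorBCollector`) encodes knowledge of one product's format. None of these names or formats come from the SDK.

```javascript
// Hypothetical illustration: one generic, protocol-level connector and two
// product-specific collectors. Not Sentinel SDK code.

// Generic connector: knows nothing about any product. It only splits a raw
// stream into records using the delimiter it was configured with.
function makeLineConnector(config) {
  const delimiter = config.recordDelimiter || "\n";
  return {
    read(stream) {
      return stream.split(delimiter).filter((r) => r.length > 0);
    },
  };
}

// Collector for a hypothetical Vendor A that emits CSV: timestamp,user,action
const vendorACollector = {
  connectorConfig: { recordDelimiter: "\n" },
  parse(record) {
    const [time, user, action] = record.split(",");
    return { time, user, action };
  },
};

// Collector for a hypothetical Vendor B that emits semicolon-terminated
// "action:user" records.
const vendorBCollector = {
  connectorConfig: { recordDelimiter: ";" },
  parse(record) {
    const [action, user] = record.split(":");
    return { time: null, user, action };
  },
};

// The same connector code serves both products; only the configuration that
// each Collector supplies differs.
function run(collector, stream) {
  const connector = makeLineConnector(collector.connectorConfig);
  return connector.read(stream).map((rec) => collector.parse(rec));
}

console.log(run(vendorACollector, "2024-01-01T00:00:00Z,alice,login\n"));
console.log(run(vendorBCollector, "logout:bob;login:carol;"));
```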

The Collector Manager framework presents a very simple interface to the Collector: it provides a queue from which the Collector can pick up events from the Connector(s), and a method by which the Collector can send events to the Collector Manager. It then invokes the Collector script within that environment. It also provides some auxiliary functions that allow the Collector to cache its state or to query for information about its environment. Finally, the Collector Manager provides the infrastructure within which Collectors are instantiated, configured, and hooked up to one or more Connectors and Event Sources.
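A minimal, hypothetical model of that interface follows: a queue the Collector polls with a timeout, a `send()` method for output, and a small key/value cache for state. The class and method names (`MockCollectorManager`, `enqueue`, `receive`, `send`, `saveState`, `loadState`) are invented for illustration and are not the Collector Manager's actual API.

```javascript
// Hypothetical model of the Collector Manager's interface to a Collector.
// Names and behavior are illustrative only, not the real Collector Manager API.
class MockCollectorManager {
  constructor() {
    this.inbound = [];      // records placed here by the Connector(s)
    this.outbound = [];     // Events the Collector has sent
    this.state = new Map(); // cache the Collector can use to persist state
  }

  // Connector side: enqueue a record for the Collector.
  enqueue(record) {
    this.inbound.push(record);
  }

  // Collector side: take the next record. A real implementation would wait up
  // to timeoutMs; this in-memory model returns null at once when the queue is
  // empty, which the Collector treats the same way as a timeout.
  receive(timeoutMs) {
    return this.inbound.length > 0 ? this.inbound.shift() : null;
  }

  // Collector side: hand a finished Event to the framework.
  send(event) {
    this.outbound.push(event);
  }

  // Auxiliary state cache, e.g. to remember how far processing has progressed.
  saveState(key, value) {
    this.state.set(key, value);
  }

  loadState(key, fallback) {
    return this.state.has(key) ? this.state.get(key) : fallback;
  }
}

// Usage: a Collector drains the queue and keeps a running count in the cache.
const mgr = new MockCollectorManager();
mgr.enqueue("record one");
mgr.enqueue("record two");
let rec;
while ((rec = mgr.receive(1000)) !== null) {
  mgr.send({ message: rec });
  mgr.saveState("processed", mgr.loadState("processed", 0) + 1);
}
console.log(mgr.outbound.length, mgr.loadState("processed", 0));
```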

Within that simple environment, the SDK provides a complete template that makes developing a Collector much easier. The template provides:

The basic control flow (loop) that processes one event at a time and then repeats.
The infrastructure for configuring the Connector to properly query the Event Source.
The configurable interface by which inbound records are picked up from the import queue, with proper timeouts and error handling.
The conversion of the native JavaScript Event object into the form required by Collector Manager.
A simple framework for providing configurable parameters to control Collector operations in different environments.
JavaScript Application Programming Interfaces (APIs) to allow the Collector to store Identity, Asset (host), and Vulnerability information in SIEM platforms.
Utility classes and methods to assist with data parsing, normalization, and manipulation.
A customization capability that allows local instances to be tuned to work in custom environments.
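Three of the template items above (the one-record-at-a-time loop, the timed pickup from the import queue with error handling, and configurable parameters) can be sketched roughly as follows. This is not the SDK template: `runCollectorLoop` and the parameter names `receiveTimeoutMs` and `maxIdlePolls` are hypothetical, and the loop stops after a configurable number of empty polls only so that the example terminates.

```javascript
// Hypothetical sketch of a template-style main loop; not the Sentinel SDK template.

// Configurable parameters with defaults that a deployment could override.
const DEFAULT_PARAMS = { receiveTimeoutMs: 500, maxIdlePolls: 3 };

// receive(timeoutMs): returns the next record, or null if the timeout expires.
// processRecord(record): parses one record into an Event; may throw on bad input.
// send(event): forwards the finished Event.
function runCollectorLoop(receive, processRecord, send, overrides) {
  const params = Object.assign({}, DEFAULT_PARAMS, overrides);
  const stats = { sent: 0, errors: 0, lastError: null };
  let idlePolls = 0;

  // One record per iteration, then repeat. Stopping after maxIdlePolls empty
  // polls exists only so this example terminates.
  while (idlePolls < params.maxIdlePolls) {
    const record = receive(params.receiveTimeoutMs);
    if (record === null) {
      idlePolls += 1; // timeout: nothing arrived this time
      continue;
    }
    idlePolls = 0;
    try {
      send(processRecord(record));
      stats.sent += 1;
    } catch (err) {
      // A malformed record must not stop the Collector: note it and move on.
      stats.errors += 1;
      stats.lastError = err.message;
    }
  }
  return stats;
}

// Usage with an in-memory array standing in for the import queue.
const queue = ["a=1", "garbage", "b=2"];
const out = [];
const stats = runCollectorLoop(
  () => (queue.length ? queue.shift() : null),
  (rec) => {
    const [key, value] = rec.split("=");
    if (value === undefined) throw new Error("unparseable record: " + rec);
    return { key, value };
  },
  (event) => out.push(event),
  { maxIdlePolls: 1 }
);
console.log(stats, out);
```

Keeping the try/catch around a single record is the design point: one bad input costs one record, not the whole Collector.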

The image below shows the overall control flow of the standard Collector template. The states in grey are built right into the template and are not touched by the developer; the states in green are part of the template and have default code but can be modified by the developer; and the states in yellow are provided to allow end customers to tweak the Collector for local needs (they are not executed unless the Collector is configured for 'custom' Execution Mode).
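The ordering of the developer-facing states and the gating of the customer hooks can be modeled roughly as below. This is an assumption-laden sketch, not the template: the exact position of each state in the real flow, the yellow-hook names `customPreParse` and `customParse`, and the mode name `"default"` are hypothetical. Only the rule that customer states run solely in 'custom' Execution Mode comes from the description above; the grey, built-in states correspond to the framework code that would call `runStates()`.

```javascript
// Hypothetical model of the per-record state sequence in a template-style
// Collector. State order and the custom hook names are assumptions.
const STATES = [
  { name: "preParse", kind: "green" },
  { name: "customPreParse", kind: "yellow" },
  { name: "parse", kind: "green" },
  { name: "customParse", kind: "yellow" },
  { name: "normalize", kind: "green" },
  { name: "postParse", kind: "green" },
];

// Runs each state method the record defines, skipping yellow (customer)
// states unless the Execution Mode is 'custom'. Returns the Event built and
// the names of the states that actually ran.
function runStates(record, executionMode) {
  const event = {};
  const ran = [];
  for (const state of STATES) {
    if (state.kind === "yellow" && executionMode !== "custom") continue;
    const method = record[state.name];
    if (typeof method !== "function") continue;
    method.call(record, event); // 'this' inside the method is the record
    ran.push(state.name);
  }
  return { event, ran };
}

// Example record: the developer's code plus one customer tweak.
const sampleRecord = {
  raw: "user=alice",
  preParse() { this.text = this.raw.trim(); },
  parse() { this.user = this.text.split("=")[1]; },
  customParse() { this.user = this.user.toUpperCase(); },
  normalize(e) { e.user = this.user; },
  postParse() {},
};

// "default" is a hypothetical stand-in for any mode other than 'custom'.
console.log(runStates(Object.create(sampleRecord), "default").event);
console.log(runStates(Object.create(sampleRecord), "custom").event);
```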

In practice, the developer sets up the Collector in initialize() and then puts most of the parsing code in preParse(), parse(), and normalize(). The query(), postParse(), reply(), and cleanup() states are rarely used. Note that the Collector is structured so that each method is a member of one of the foundational classes, meaning that each method is called on some object. For example, the Collector calls initialize() on itself to get configured, and then calls preParse(), parse(), and so on, on the received Record object to apply the transforms that those methods represent. In a sense, then, the Record object parses itself: in the sample code you will see that within the parse() method, for example, the Record being parsed is referred to as 'this'. If this isn't immediately familiar to you, review the first couple of sections of this article, which presents a concise overview.
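For readers less used to JavaScript's prototype-based objects, the pattern described above looks roughly like this. `Collector` and `Record` here are stand-in constructors written for the example, not the SDK's classes, and the `rawText` property, the `config.separator` setting, and the `host|message` input format are hypothetical. The point is only that methods are attached to each class's prototype and that, inside `parse()`, `this` is the Record being processed.

```javascript
// Stand-in constructors for illustration; the real SDK defines its own classes.
function Collector() {
  this.config = null;
}

// The Collector configures itself: here 'this' is the Collector instance.
Collector.prototype.initialize = function () {
  this.config = { separator: "|" }; // hypothetical setting
  return true;
};

function Record(rawText, collector) {
  this.rawText = rawText; // hypothetical name for the received data
  this.collector = collector;
}

// Called on each received Record: 'this' is the Record being parsed,
// so the Record effectively parses itself.
Record.prototype.preParse = function () {
  this.rawText = this.rawText.trim();
  return this.rawText.length > 0; // signal that empty records should be skipped
};

Record.prototype.parse = function () {
  const parts = this.rawText.split(this.collector.config.separator);
  this.host = parts[0];
  this.message = parts[1];
  return true;
};

const collector = new Collector();
collector.initialize();
const rec = new Record("  WEB01|service started \n", collector);
if (rec.preParse() && rec.parse()) {
  console.log(rec.host, "->", rec.message);
}
```

The parsed values end up as properties of the Record itself rather than being written straight into an output object.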

There are a number of other design considerations to keep in mind when writing a Collector. For example, in general you don't add values to the output Event object directly; instead, you construct a map that defines how the input Record will be transformed into the output Event. We'll cover those details later.
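The idea of describing the transformation, rather than writing into the Event by hand, can be sketched as a declarative map applied by one generic routine. The map format, the `applyMap` function, and the output field names below are hypothetical and much simpler than whatever mechanism the SDK itself uses.

```javascript
// Hypothetical sketch: a declarative map from Record properties to Event
// fields, applied by one generic routine, instead of hand-setting each field.
const EVENT_MAP = {
  // output Event field : input Record property
  InitiatorUser: "user",
  TargetHost: "host",
  Message: "message",
};

// Builds a new Event from the Record according to the map.
// Properties missing from the Record are simply left out of the Event.
function applyMap(record, map) {
  const event = {};
  for (const [eventField, recordProp] of Object.entries(map)) {
    if (record[recordProp] !== undefined) {
      event[eventField] = record[recordProp];
    }
  }
  return event;
}

// The parsing code only sets properties on the Record...
const parsedRecord = { user: "alice", host: "web01" };
// ...and the transform into the output Event is driven entirely by the map.
console.log(applyMap(parsedRecord, EVENT_MAP));
```

With this split, changing which Record property feeds which Event field means editing the map, not the parsing code.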
