Wednesday, December 29, 2010

Working with unified and unified2 files

First you might ask "why do I care about unified files?" (unified2 these days, more on that later)

There is no perfect answer to that, just good reasons you should. Some of those reasons are:

  • It is a unified format allowing for easy archive management
  • It is the fastest output method available for snort
  • It is the preferred method used by Sourcefire and the most tested
  • It is not easily modified (though it is still possible)

Eight years ago, if you needed to get data out of a snort unified file and do something with it, you had exactly one option: barnyard. That was fine, and for good reason people were spooling unified output to disk, reading those files with barnyard, and storing the results in a mysql database. In the typical use case this was a perfect solution that worked for many years for many people. It had one major drawback though: unified files could not be processed by wetware without barnyard.

Normally this wouldn't be a problem, as barnyard supported csv, fast alerts, database, pcap, etc. as output methods. While it was a pain to process a unified file and then post-process the results from several different file formats, it was sufficient to get the job done and not too painful.

...Unless you don't have barnyard available. This is exactly the situation that forced me to write the first iteration of SnortUnified. I was working on a system that had unified files but no barnyard, no database, and no compiler. Normally this isn't a problem either; you just install the tools you need. But that was not allowed. As a matter of fact it was forbidden, as these tools were not "certified" for use in production environments and the data was not allowed to leave this environment. The absurdity of that situation, having the data but no way to use it, by policy, was maddening. I thought to myself, who was the person that decided to use unified logging and nothing else? What was the value of having the data if you never intended to be able to use it? Blah blah blah, you can think the rest yourself, and just like I did, please keep those thoughts to yourself. Clearly their answer to not being able to use the data was to find a way to use the data when they had a need for it. Back to the point of the post.

Looking around on the system, I quickly determined that there was xxd, perl, bash, and a smattering of other useful utilities. This was good, as I was not looking forward to manually processing these files from hex dumps. (A side note: I'm a fan of perl. It is a wonderful tool, so capable, so simple, so complex. It is what you need when you need something and, at the same time, the problem to your solution if done wrong.) The Snort source and perl were all I needed to get to the data they wanted and to save me from death by paper and pen. The first version of was in fact and quite possibly the ugliest thing you have ever seen.

So here we are. What is this thing? SnortUnified is a perl module for processing unified files (a set of perl modules, actually). It isn't well documented outside the code itself, has few examples, and has saved me at least once. I know it is used in production environments, and I assume it is stable because I never get bug reports, just feature requests, though none since I added the processing hooks I'll be discussing below.

If you would like to know more you can browse the source in google code, get the latest source, or just continue reading, it is up to you. Working with SnortUnified is easy and depending on what you are trying to do could be as simple as a few lines of perl.
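use SnortUnified(qw(:ALL));
use Data::Dumper;

# Open the unified file named on the command line
$UF_Data = openSnortUnified(shift);
# Read and dump one record at a time until the file is exhausted
while ( $record = readSnortUnifiedRecord() ) {
    print Dumper($record);
}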

In the source tarball there is a samples directory with various samples, ranging from converting a unified file to XML to signing the unified files or the records within them, ensuring they cannot be modified without detection.

Two samples that I think are worth discussing and illustrate core functionality are and

The comments at the beginning of the handler script are sufficient for us to discuss here. First up are handlers. These get called for each record type / action defined and allow you to work with the records without having to change your functional tools. I initially envisioned this as a way to handle transformation of data within the records without having to rework the module, e.g. converting an IP address integer to a dotted quad without having to change anything else. I also find it quite useful for other tasks like converting payloads, debugging, adding output methods, etc. The result of a handler is irrelevant to the processing path; qualifiers are there for that.
# Handlers come before qualifiers come before pcre

# handlers will be run and regardless of the result processing will continue
# The available handlers for SnortUnified are
# ("unified_opened", $UF);
# ("unified2_event", $UF_Record);
# ("unified2_packet", $UF_Record);
# ("unified2_unhandled", $UF_Record);
# ("unified2_record", $UF_Record);
# ("unified_record", $UF_Record);
# ("read_data", ($readsize, $buffer));
# ("read_header", $h);

register_handler('unified2_packet', \&make_ascii_pkt);
register_handler('unified_record', \&make_ascii_pkt);
# register_handler does not care about return values so the following will continue
# register_handler('unified_record', \&make_noise_fail);
# register_handler('unified_record', \&make_noise);

register_handler('unified2_record', \&printrec);
register_handler('unified_record', \&printrec);

# show_handlers();

# Qualifiers will be run, if any return a value less than 1
# then the record will be discarded and processing will continue
# with the next record in the file
# Only one option for unified types

# Skip all but sid 402
# register_qualifier(0,0,0, sub{return 0;});
# By having something specific for 402
# register_qualifier(0,1,402, \&printrec);
# register_qualifier(0,1,402, sub{return 1;});
# register_qualifier(0,1,402, \&make_noise);
# register_qualifier(0,1,402, \&make_noise);
# register_qualifier(0,1,402, \&make_noise_fail);
# register_qualifier(0,1,402, \&make_noise_never);

# But you can be granular with unified2 types
# register_qualifier($UNIFIED2_IDS_EVENT,1,402, \&make_noise);
# register_qualifier($UNIFIED2_PACKET,1,402, \&make_noise);

# register_pcre(1,402, "test");
# register_pcre(1,402, "*");
# register_pcre(1,402, ".**");

# show_qualifiers();

Breaking out the comments above:

  • unified_opened gets called every time a new unified file is opened. This allows you to do things like hook in verification routines, mark progress in a waldo file, create verbose informed logging, etc.
  • unified2_event gets called every time a unified2 event record type is encountered in the unified file.
  • unified2_packet gets called every time a unified2 packet record type is encountered in the unified file.
  • unified2_unhandled gets called if a record type that is unknown / not handled by default is encountered in the unified2 file. This is useful as a bridge for new types, like the recently added 104 and 105 event types. I suggest that minimally you register a handler to log the unhandled types to a file so you can further investigate.
  • unified_record gets called for earlier types of records only found in "unified" files, not "unified2" files
  • read_data gets called every time we read data from the unified file.
  • read_header gets called every time we read a header from the file.
  • register_handler('unified2_packet', \&make_ascii_pkt); registers the "make_ascii_pkt" subroutine as a handler for all unified2_packet records. It gets called as make_ascii_pkt($UF_Data), and line 105 of the uf_csv_handler is the beginning of that routine.
  • show_handlers() will print the handlers that are registered.
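
To make the handler idea concrete, here is a minimal sketch of two handlers of the kind described above: one that rewrites integer IP addresses as dotted quads, and one that keeps a tally of unhandled record types. It assumes handlers receive the record as a hash reference, and the field names sip, dip, and TYPE are assumptions for illustration; run Dumper() on your own records to confirm what they actually contain. The subroutine names are hypothetical, and the register_handler calls are shown only as comments so the sketch runs without the module installed.

```perl
use strict;
use warnings;

# Convert a 32-bit integer IPv4 address to dotted-quad notation.
sub int_to_quad {
    my ($ip) = @_;
    return join '.', unpack('C4', pack('N', $ip));
}

# Hypothetical handler: rewrite the address fields of a record in place.
# The field names 'sip' and 'dip' are assumptions, not confirmed by the post.
sub make_dotted_quad {
    my ($rec) = @_;
    for my $field (qw(sip dip)) {
        # Only touch fields that exist and still hold a plain integer
        next unless defined $rec->{$field} && $rec->{$field} =~ /^\d+$/;
        $rec->{$field} = int_to_quad($rec->{$field});
    }
    return 1;    # the return value of a handler is ignored
}

# Hypothetical handler for unknown record types: tally what was seen so it
# can be investigated later. The 'TYPE' field name is an assumption.
my %unhandled_seen;
sub note_unhandled {
    my ($rec) = @_;
    my $type = defined $rec->{TYPE} ? $rec->{TYPE} : 'unknown';
    $unhandled_seen{$type}++;
    return 1;
}

# With SnortUnified loaded, these would be registered along the lines of:
#   register_handler('unified2_event',     \&make_dotted_quad);
#   register_handler('unified2_unhandled', \&note_unhandled);

# Stand-alone demonstration with a made-up record
my $rec = { sip => 3232235777, dip => 167772161 };
make_dotted_quad($rec);
print "$rec->{sip} -> $rec->{dip}\n";    # prints "192.168.1.1 -> 10.0.0.1"
```

Because the handler changes the record in place, whatever output routine runs afterwards sees the dotted quad without needing to know a conversion ever happened.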

Qualifiers are intended to allow you to "tune after the fact", and I think this is the right way to handle a lot of tuning tasks. We should not be artificially limited in our detection scope simply because that detection potentially creates a lot of "noise". I think that we should have the freedom to cast the widest net necessary and possible when it comes to these things, especially when investigating evolving threats and potentially targeted and ongoing attacks.

Qualifiers also allow you to get very specific in your actions. You can limit a qualifier to a specific sid or cast a super wide net on all events. All registered qualifiers are executed, so you can easily do things like suppress all output except for specific events. You can also get granular within those specific events. In the comments above, all events are suppressed except gen:1 sid:402, and then only those sid:402 events that match the PCRE ".**" get through. You could make the qualifier look at the IP address, check an archived (as in comprehensive) blacklist, change the context of the event, etc. The PCRE I used earlier is a special, purpose-built type of qualifier that makes it trivial to use a regular expression as a qualifier. It has one caveat: a PCRE will not override the decision of other qualifiers. It only has a vote, and in the case of conflict the proper qualifier wins.
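
As a rough illustration of the blacklist idea, the sketch below shows what such a qualifier could look like. It assumes the qualifier receives the record as a hash reference with the source address in a field named sip (an assumption, as are the subroutine name and the blacklist contents, which use documentation-only addresses). It returns 1 to keep a record and 0 to have it discarded, matching the "less than 1 means discard" rule from the comments above.

```perl
use strict;
use warnings;

# Hypothetical blacklist; in practice this might be loaded from a file
my %blacklist = map { $_ => 1 } qw(203.0.113.7 198.51.100.23);

# Convert a 32-bit integer IPv4 address to dotted-quad notation.
sub int_to_quad {
    my ($ip) = @_;
    return join '.', unpack('C4', pack('N', $ip));
}

# Hypothetical qualifier: keep the record (return 1) only when its source
# address is on the blacklist, otherwise return 0 so it is discarded.
# The 'sip' field name is an assumption.
sub qualify_blacklisted_source {
    my ($rec) = @_;
    return 0 unless defined $rec->{sip};
    my $src = $rec->{sip} =~ /^\d+$/ ? int_to_quad($rec->{sip}) : $rec->{sip};
    return $blacklist{$src} ? 1 : 0;
}

# With SnortUnified loaded, this would be registered along the lines of:
#   register_qualifier(0, 1, 402, \&qualify_blacklisted_source);

# Stand-alone demonstration with made-up records
for my $rec ( { sip => 3405803783 }, { sip => 3232235777 } ) {
    printf "%s: %s\n", int_to_quad( $rec->{sip} ),
        qualify_blacklisted_source($rec) ? 'keep' : 'discard';
}
```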

Qualifiers are called in the same way as handlers, but the return code is checked. If any qualifier returns a value less than one, the record is discarded, processing continues with the next record, and your code never knows that an event was skipped. If you want or need tracking of these things, you could use the qualifier to populate some stats and a handler to keep global stats.
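
A minimal sketch of that stats idea follows. It simulates the handler-then-qualifier order by hand rather than using the module, and the subroutine names and the sig_id field name are assumptions for illustration only.

```perl
use strict;
use warnings;

my %stats = ( seen => 0, discarded => 0 );

# Hypothetical handler: count every record that passes through.
sub count_seen {
    $stats{seen}++;
    return 1;    # ignored, handlers never affect the processing path
}

# Hypothetical qualifier: keep only sid 402 and count what gets dropped.
# The 'sig_id' field name is an assumption.
sub only_sid_402 {
    my ($rec) = @_;
    my $keep = ( defined $rec->{sig_id} && $rec->{sig_id} == 402 ) ? 1 : 0;
    $stats{discarded}++ unless $keep;
    return $keep;
}

# Simulate the documented order: handlers first, then qualifiers.
my @kept;
for my $rec ( { sig_id => 402 }, { sig_id => 1000 }, { sig_id => 402 } ) {
    count_seen($rec);
    push @kept, $rec if only_sid_402($rec) >= 1;
}

printf "seen=%d discarded=%d kept=%d\n",
    $stats{seen}, $stats{discarded}, scalar @kept;
# prints "seen=3 discarded=1 kept=2"
```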

If you use the code, please drop me a note letting me know. It helps me gauge interest and the impact of any possible changes. I've not modified the code in some time, largely because I've not needed to. But if you have patches, requests, criticisms, commentary, or want to catch a pint at the next event we are both at, feel free to let me know.

That is all for now, if you would like to know more or need assistance don't hesitate to ask. I can be reached at or My details are also in the README contained in the release tarball if you don't want to come back to the post.

Happy Snorting!