Rewriting http_inspect has given me a real appreciation for the original. In this business we don’t always show proper respect for old software. I’ve come to understand how many difficult problems http_inspect solves, how much real-world messiness it deals with, and how it remains fast and efficient in spite of these things. Many really smart people have contributed to it over the years.
That’s a lot of expectations to live up to and it has been a lot of work. But it is paying off because new_http_inspect is approaching a critical mass of features. Over the next few months we will complete the remaining core capabilities including gzip decompression and flow depth limits. That’s when the real fun begins as we start to add powerful new features such as support for HTTP/2.0 and other advanced web protocols.
Perhaps the most fundamental difference in new_http_inspect is true separation of HTTP from the lower protocol layers. new_http_inspect is a module that inspects HTTP protocol messages. That sounds obvious and even trite but classic http_inspect does something slightly different. It separately inspects pieces of HTTP protocol taken from the underlying TCP segments and it does this in a fairly stateless way. Heroic redevelopment efforts in recent years have largely overcome this weakness but it still has many subtle effects on how features work. Here are two examples where we hope the big-picture approach of new_http_inspect will be simpler to use and easier to understand.
Flow Depth
HTTP messages can be very long. A single response may include a file with many megabytes of data. Snort can run detection rules on all of it if you like. But you may not want to, because detection is hard work and if you have a lot of traffic you may discover you need bigger, badder, and especially more machines to run Snort on. Meanwhile most of the malware tends to be in scripts near the beginning of the message body and not embedded in the middle of a video someone is downloading. It’s normal to set a parameter called “flow depth” which limits the amount of data Snort runs through detection to a reasonable amount.
If you look in the classic Snort Manual you can find the “server_flow_depth” and “client_flow_depth” parameters under 2.2.7 HTTP Inspect. In my copy almost all of page 69 is devoted to explaining them. Sometimes the limit is per TCP segment and other times it is per TCP connection. The HTTP headers may be included in the limit or only the message body. It can matter whether the traffic was zipped. It might be all used up by a jumbo cookie in the headers even if you are not inspecting cookies at all.
It can be difficult to figure out whether your detection rules are going to cover a situation. It may depend on things that shouldn’t matter, like how the bad guy cleverly divided his message up into packets and whether he tossed in an extra large chocolate chip cookie as a distraction.
The approach for new_http_inspect is simple. The flow depth is the first N bytes of the message body. That’s the part of the message body that detection rules can search. TCP segment boundaries don’t matter. Reusing the TCP connection for multiple messages doesn’t matter. Zipping doesn’t matter because the unzipped size is always used. You can still set different limits for client messages (POST/PUT) and server responses.
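To make the rule concrete, here is a minimal Python sketch of the idea. It is not the new_http_inspect implementation; the function name detection_window and its parameters are made up for illustration, and a real inspector would process the body as a stream rather than buffering all of it. The point is only that the searchable window depends on the unzipped message body and nothing else.

```python
import gzip

# Hypothetical helper, not part of Snort: returns the bytes that
# detection rules would be allowed to search under a given flow depth.
def detection_window(body_segments, flow_depth, gzipped=False):
    # How the body was split into TCP segments is irrelevant:
    # reassemble it first, then measure.
    body = b"".join(body_segments)
    if gzipped:
        # The depth is counted against the unzipped size.
        body = gzip.decompress(body)
    # The flow depth is simply the first N bytes of the message body.
    return body[:flow_depth]

payload = b"<script>evil()</script>" + b"A" * 1000

# The same body delivered three ways: one segment, many tiny
# segments, and gzip-compressed.
one_segment = [payload]
many_segments = [payload[i:i + 7] for i in range(0, len(payload), 7)]
zipped = [gzip.compress(payload)]

window_a = detection_window(one_segment, 64)
window_b = detection_window(many_segments, 64)
window_c = detection_window(zipped, 64, gzipped=True)
```

In this sketch all three calls yield the same 64 bytes, so a rule author only has to reason about the message body and the configured depth.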
Detection Buffers
new_http_inspect will simplify the rule-writing process and improve detection efficiency by better organizing the data buffers provided to detection. It divides HTTP messages into separate sections for the start line, the rest of the headers, and the message body (in 16K blocks). Already things are better because TCP segment boundaries become irrelevant and cannot be manipulated by an attacker to avoid detection. Searches for specific items only need to be performed within the section of the message that might contain them, improving efficiency and minimizing false positives. Keywords that identify specific message features such as URI (http_uri) and message headers (http_header) will be supplemented by new keywords to further narrow the search. Among those planned are subcomponents of the URI including path, query, host, and authority; the reason phrase; and individual message header fields. The last of these is already implemented and eliminates the need to write rules that search all message headers trying to match the name of a specific header field followed by a colon, white space, and the field value. Instead the rule specifies the name of the specific header field to examine followed by the field value you want to match.
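A rough Python sketch of this sectioning follows. It is only an illustration under simplifying assumptions: the whole message is already in memory, lines end in CRLF, and a repeated header name simply overwrites the earlier value. The names split_message and BLOCK_SIZE are hypothetical and do not come from Snort.

```python
BLOCK_SIZE = 16 * 1024  # body is handed to detection in 16K blocks

# Hypothetical helper, not Snort code: split a complete HTTP message
# into start line, individual header fields, and body blocks.
def split_message(raw):
    head, _, body = raw.partition(b"\r\n\r\n")
    start_line, _, header_block = head.partition(b"\r\n")

    headers = {}
    for line in header_block.split(b"\r\n"):
        if not line:
            continue
        name, _, value = line.partition(b":")
        # Field names are case-insensitive, so normalize them.
        headers[name.strip().lower()] = value.strip()

    body_blocks = [body[i:i + BLOCK_SIZE]
                   for i in range(0, len(body), BLOCK_SIZE)]
    return start_line, headers, body_blocks

raw = (b"POST /upload.php?id=1 HTTP/1.1\r\n"
       b"Host: example.com\r\n"
       b"User-Agent: test\r\n"
       b"\r\n" + b"x" * 40000)

start_line, headers, body_blocks = split_message(raw)

# Examine one named header field directly instead of scanning every
# header for "name, colon, white space, value".
host_matches = headers.get(b"host") == b"example.com"
```

With the fields separated like this, a check against one header never has to look at the start line, the other headers, or the body.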
The new approach also makes it easy to search for combinations of things that are in different message parts. Suppose you want to search for a particular URI fragment combined with a signature in the POST body. In classic Snort that is hard to do unless the entire message is small and processed all at once. new_http_inspect is aware of the entire HTTP message. When a POST body section is received it can be searched for the signature while simultaneously searching the stored URI for a match. This principle can even be extended to matching the requested URI with a signature in the response from the server.
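The following Python sketch shows the general shape of such a combined match. It assumes the inspector keeps a small per-message record; the Transaction class and body_section_matches function are invented for this example and are not Snort APIs. Note that this simple version would miss a signature that straddles two body sections.

```python
# Hypothetical per-message state, not a Snort data structure: the
# method and URI are stored when the start line is seen.
class Transaction:
    def __init__(self, method, uri):
        self.method = method
        self.uri = uri

# Hypothetical rule check: evaluated each time a body section arrives,
# with the stored URI still available for matching.
def body_section_matches(txn, body_section, uri_fragment, body_signature):
    return (txn.method == b"POST"
            and uri_fragment in txn.uri
            and body_signature in body_section)

txn = Transaction(b"POST", b"/cgi-bin/upload.php?debug=1")
sections = [b"harmless form data",
            b"more data <?php system($_GET['c']); ?>"]

# Same body, checked against the stored URI of this message.
hits = [body_section_matches(txn, s, b"upload.php", b"<?php")
        for s in sections]

# The same body sent to a different URI should not match.
other = Transaction(b"POST", b"/index.html")
misses = [body_section_matches(other, s, b"upload.php", b"<?php")
          for s in sections]
```

The same stored record could in principle be consulted when the server response arrives, which is how a requested URI might be paired with a signature in the response.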
One of the biggest questions is what to do next. What new features would you like to see for new_http_inspect? This is an opportunity for a good idea to have a really major impact on the future of Snort. Send your suggestions to the snort-devel mailing list.