Faster Suricata startups with Hyperscan caching

Suricata, a high-performance network analysis and threat detection engine, scans for thousands of patterns at once to keep detection fast. Hyperscan, one of the Multi-Pattern Matching (MPM) libraries Suricata supports, is widely used for its efficient pattern matching.

Suricata 8 introduces Hyperscan Multi-Pattern Matching (MPM) caching, a feature that dramatically reduces startup times by avoiding repeated compilation of large pattern databases. Traditionally, on every Suricata start, Hyperscan compiled all patterns into a special optimized database. This process could take up to several minutes on large rule sets. With detect.sgh-mpm-caching enabled, Suricata now stores these compiled databases on disk and loads them on subsequent startups, cutting startup time by up to 50%!

A gif with a demo comparison of the hyperscan caching.
Figure 1: Live startup-time comparison, with the baseline on the left (24 seconds) and caching on the right (12 seconds).

Now go try it for yourself! To mark the unveiling of this feature, I wrote a technical deep dive into how it actually works, along with a tour of Suricata’s bag of tricks for optimizing packet processing. By the end, you will have a comprehensive understanding of what happens during the Hyperscan MPM caching process.

As Shivani showed in her Suricon 2024 talk, naively processing, e.g., 1M packets per second against 100k signatures means 100k checks every microsecond (1 microsecond being the per-packet time budget at 1M packets per second). Suricata needs to work smart to handle high packet rates without losing precision.
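The arithmetic behind that budget can be spelled out in a few lines. This is only a back-of-the-envelope sketch using the two figures quoted above; the variable names are illustrative and not taken from Suricata.

```python
# Back-of-the-envelope budget for naive matching, using the figures from the text.
packets_per_second = 1_000_000
signatures = 100_000

# Time available per packet, in nanoseconds (1 second = 1e9 ns).
budget_per_packet_ns = 1_000_000_000 / packets_per_second

# Naive approach: every signature is checked against every packet.
checks_per_second = packets_per_second * signatures
budget_per_check_ns = budget_per_packet_ns / signatures

print(f"per-packet budget: {budget_per_packet_ns:.0f} ns")
print(f"checks per second: {checks_per_second:.0e}")
print(f"per-check budget:  {budget_per_check_ns * 1000:.0f} ps")
```

In other words, a naive engine would have about 10 picoseconds per signature check, which is why the rest of this post is about avoiding checks rather than making each one faster.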

Signature groups

The first trick is signature grouping. Suricata organizes signatures into groups based on protocol, direction, and port ranges. Each packet is mapped to exactly one Signature Group Head (SGH), and only the signatures in that group are considered.

Why is this important? Because it narrows the search space and is crucial for performance. A TCP packet to port 80 doesn’t need to be compared against ICMP or UDP signatures. Similarly, DNS signatures are grouped under UDP signature groups, so they won’t slow down HTTP traffic analysis. Some signatures may be part of multiple groups too (e.g. signatures targeting L3 (IP) protocols or all L4 ports). 

Signature Groups are identified by L4 protocol (ICMP4/6, TCP, UDP), traffic direction (to server/to client), and server port range (SGs don’t overlap). The number and structure of groups can be tuned in the detect section of suricata.yaml. Distribution of SGHs can be seen in rule_group.json after enabling settings in detect.profiling.grouping.*. Note that it is best to explore with only a handful of rules first, to not get overwhelmed.

Here are some signature group examples:

  • tcp.toserver.80-80: contains primarily signatures that target server port 80, plus signatures that don’t specify ports or traffic direction
  • icmpv4.toserver: ICMP groups are not grouped by port, since ICMP has no ports
  • udp.toclient.0-65535: catch-all group
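The mapping from a packet to its signature group can be sketched as a lookup keyed by protocol and direction, followed by a search over non-overlapping port ranges. The class, the table contents, and the signature IDs below are hypothetical and only mirror the naming of the examples above; Suricata's real data structures are different.

```python
from dataclasses import dataclass, field


@dataclass
class SignatureGroup:
    name: str
    port_lo: int
    port_hi: int
    sids: set = field(default_factory=set)  # signature IDs in this group


# Hypothetical layout: (protocol, direction) -> groups with non-overlapping ports.
# Signature 1 has no port restriction, so it appears in every TCP group.
GROUPS = {
    ("tcp", "toserver"): [
        SignatureGroup("tcp.toserver.0-79", 0, 79, {1}),
        SignatureGroup("tcp.toserver.80-80", 80, 80, {1, 2}),
        SignatureGroup("tcp.toserver.81-65535", 81, 65535, {1}),
    ],
    ("udp", "toclient"): [SignatureGroup("udp.toclient.0-65535", 0, 65535, {3})],
    # ICMP has no ports; the full range is just a placeholder here.
    ("icmpv4", "toserver"): [SignatureGroup("icmpv4.toserver", 0, 65535, {4})],
}


def lookup_group(proto, direction, server_port=0):
    """Return the single group a packet maps to, or None if there is none."""
    for group in GROUPS.get((proto, direction), []):
        if group.port_lo <= server_port <= group.port_hi:
            return group
    return None


print(lookup_group("tcp", "toserver", 80).name)
print(sorted(lookup_group("tcp", "toserver", 80).sids))
```

Because the port ranges do not overlap, the lookup always yields at most one group, and a TCP packet to port 80 never touches the UDP or ICMP signatures.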

Multi-Pattern Matching (MPM)

Once a packet is assigned to an SGH, Suricata still needs to figure out which signatures in that group are worth evaluating in detail. Checking every condition of every signature would be far too costly. This is where prefiltering and Multi-Pattern Matching (MPM) make a difference.

Prefiltering logic

Each Suricata signature is essentially a set of conditions combined with a logical AND. To match, all conditions (e.g., dsize, flow.age, content) must hold true. Instead of evaluating everything up front, prefilters try to quickly rule out (short-circuit) impossible signature matches using the cheapest and most selective conditions first. When a signature specifies them, the first layer of prefilters checks numerical values such as TTL or packet size (dsize).
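The short-circuit idea can be illustrated with a toy evaluator that sorts conditions by cost and stops at the first failure. The cost numbers and the packet dictionary are made up for illustration, and the sketch orders by cost only (it ignores selectivity); it is not how Suricata represents signatures internally.

```python
# Each condition is (relative_cost, predicate). Costs are invented for illustration.
SIGNATURE = [
    (100, lambda pkt: b"/search" in pkt["payload"]),  # content match (expensive)
    (1, lambda pkt: pkt["dsize"] > 20),               # numeric check (cheap)
    (1, lambda pkt: pkt["ttl"] < 64),                 # numeric check (cheap)
]


def matches(signature, pkt, trace=None):
    """AND all conditions, cheapest first, stopping at the first failure."""
    for cost, predicate in sorted(signature, key=lambda cond: cond[0]):
        if trace is not None:
            trace.append(cost)  # record which conditions were actually evaluated
        if not predicate(pkt):
            return False
    return True


small_packet = {"dsize": 10, "ttl": 32, "payload": b"GET /search"}
evaluated = []
print(matches(SIGNATURE, small_packet, evaluated), evaluated)
```

For the small packet, only the first cheap dsize check runs; the expensive content match is never evaluated.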

Multi-Pattern Matching (MPM)

MPM, one of the core prefilters, matches rules’ fast patterns. Fast patterns are string literals defined by content keywords. Suricata extracts one fast pattern per rule, either the one the rule writer marks explicitly or one chosen by Suricata’s internal heuristics. Ideally, a fast pattern is the string in the rule that is least likely to occur in benign traffic. Choosing it this way avoids additional work (i.e., full rule evaluation): if the fast pattern does not match, the rule can never match, and Suricata can skip it. Single Pattern Matching (SPM) is the step after MPM, in which the candidate signature’s contents and their relationships/restrictions (e.g., depth, distance, absent) are evaluated.

Rule writers generally know best which string literal is least plausible in network traffic. Therefore, Suricata relies on this hand-picked pattern when selecting the fast pattern. The chosen content is marked by appending the fast_pattern directive to it. As a fallback, Suricata chooses the longest pattern with some MPM context specified, e.g., http.uri.
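A simplified version of this selection logic might look like the sketch below: an explicitly marked content wins, otherwise the longest content is used. The Content class and pick_fast_pattern function are hypothetical names, and Suricata's real heuristics take more into account than pattern length alone.

```python
from dataclasses import dataclass


@dataclass
class Content:
    pattern: bytes
    buffer: str                 # inspection context, e.g. "http.uri"
    fast_pattern: bool = False  # True if the rule writer marked it explicitly


def pick_fast_pattern(contents):
    """Explicit fast_pattern wins; otherwise fall back to the longest content."""
    if not contents:
        return None
    for content in contents:
        if content.fast_pattern:
            return content
    return max(contents, key=lambda c: len(c.pattern))


rule_contents = [
    Content(b"GET", "http.method"),
    Content(b"/search", "http.uri"),
]
print(pick_fast_pattern(rule_contents).pattern)
```

With no explicit marker, the longer "/search" is selected over "GET", which is also the less common string in ordinary HTTP traffic.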

For example, if a rule says

alert http any any -> any 80 (http.uri; content:"/search"; pcre:"/\/search\.php\?q=.*(['\"\;]|--|\bOR\b|\bAND\b)/"; http.method; content:"GET"; ...)

there’s no need to check for other contents (SPM) in e.g. HTTP method or the expensive regular expression matching (pcre) unless the string “/search” actually occurs in the HTTP URI.
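The gating effect for this particular rule can be mimicked in a few lines. A plain substring test stands in for the MPM engine here, Python's re module stands in for pcre, and the stats counter is only there to show how often the expensive step runs; none of these names come from Suricata.

```python
import re

FAST_PATTERN = b"/search"
# Python translation of the rule's regular expression (an approximation of the pcre).
PCRE = re.compile(rb"""/search\.php\?q=.*(['";]|--|\bOR\b|\bAND\b)""")

stats = {"pcre_calls": 0}


def rule_matches(uri: bytes, method: bytes) -> bool:
    # Step 1: fast pattern check (stand-in for MPM). Most traffic stops here.
    if FAST_PATTERN not in uri:
        return False
    # Step 2: remaining cheap content check on the HTTP method buffer.
    if b"GET" not in method:
        return False
    # Step 3: only now pay for the regular expression.
    stats["pcre_calls"] += 1
    return PCRE.search(uri) is not None


print(rule_matches(b"/index.html", b"GET"), stats["pcre_calls"])
print(rule_matches(b"/search.php?q=1' OR 1=1", b"GET"), stats["pcre_calls"])
```

The request for /index.html is rejected without the regular expression ever running, while the suspicious search query goes through all three steps.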

Fast patterns are compiled into pattern sets per inspection context (sticky buffer). At runtime, Suricata feeds the parsed data buffers (payload, HTTP URI, DNS query, TLS SNI, etc.) into the MPM engine, which uses only the pattern set specialized for that context. Signatures whose patterns matched are then evaluated further.

SGH: tcp.toserver.80
   ├── payload.db      (patterns for raw TCP payload)
   ├── http.uri.db     (patterns for HTTP URIs)
   └── tls.sni.db      (patterns for TLS Server Names)
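As a rough model of the layout above, each buffer can be given its own pattern set, and only the set that belongs to a buffer is scanned against that buffer's data. The patterns and signature IDs are invented, and the nested loop is a naive stand-in for what Hyperscan does in a single pass over the data.

```python
# Hypothetical pattern sets for one SGH: buffer name -> {pattern: signature IDs}.
PATTERN_SETS = {
    "payload": {b"\x90\x90\x90\x90": {10}},
    "http.uri": {b"/search": {20}, b"/admin": {21, 22}},
    "tls.sni": {b"evil.example": {30}},
}


def mpm_candidates(buffers):
    """Return IDs of signatures whose fast pattern occurs in its own buffer."""
    candidates = set()
    for name, data in buffers.items():
        # Only the pattern set specialized for this buffer is consulted.
        for pattern, sids in PATTERN_SETS.get(name, {}).items():
            if pattern in data:
                candidates |= sids
    return candidates


print(sorted(mpm_candidates({"http.uri": b"/admin/login", "payload": b"GET /admin/login"})))
```

Note that "/admin" also appears in the raw payload, but it only counts as a hit in the http.uri buffer because that is the context its signatures were written for.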

MPM caching

Suricata, by default, divides the ruleset into 128 groups, where each group has around 30 different inspection contexts (payload, http.uri, tls.sni) and each context can have hundreds to thousands of patterns. To be fast at runtime, pattern matching engines (e.g., Hyperscan) compile these pattern sets into highly optimized pattern databases.

This means that on every startup or rule reload, Suricata has to spend processing power converting raw patterns into compiled pattern-matching databases. Until now, Suricata optimized this process by keeping the compiled databases in memory so that one database could be shared across all identical pattern sets (occasionally, different SGHs contain the same signatures). While this sped up subsequent compilations, creating the initial databases still took precious time. This was especially visible with the Hyperscan/Vectorscan MPM algorithms.

Suricata 8 brings on-disk caching of MPM databases. As a result, Suricata can skip the pattern compilation step and deliver reliably faster startup/reload times. Each cache file is a separate file that stores one pattern set database. Storing the MPM databases at this granular level also allows them to be reused when the overall ruleset changes slightly, because some pattern sets will very likely remain the same (adding a TCP-based rule won’t affect UDP SGHs).
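The per-pattern-set caching idea can be sketched as a content-addressed cache: hash the pattern set, look for a file named after the hash, and compile only on a miss. Everything concrete here is an assumption made for illustration, including the SHA-256 key, the ".cache" file suffix, the pickle format, and the placeholder compile step; the post does not describe Suricata's actual hashing or file format.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def pattern_set_key(patterns):
    """Order-independent digest of a pattern set (hypothetical key scheme)."""
    digest = hashlib.sha256()
    for pattern in sorted(patterns):
        digest.update(len(pattern).to_bytes(4, "big"))  # length prefix avoids ambiguity
        digest.update(pattern)
    return digest.hexdigest()


def compile_patterns(patterns):
    """Placeholder for the expensive compile step of a real MPM library."""
    return sorted(patterns)


def load_or_compile(patterns, cache_dir, stats):
    path = Path(cache_dir) / f"{pattern_set_key(patterns)}.cache"
    if path.exists():
        stats["loaded"] += 1
        return pickle.loads(path.read_bytes())
    database = compile_patterns(patterns)
    path.write_bytes(pickle.dumps(database))
    stats["newly_cached"] += 1
    return database


cache_dir = tempfile.mkdtemp()
stats = {"loaded": 0, "newly_cached": 0}
tcp_patterns = [b"/search", b"/admin"]
udp_patterns = [b"example.com"]

for pattern_set in (tcp_patterns, udp_patterns):   # first start: everything compiles
    load_or_compile(pattern_set, cache_dir, stats)
tcp_patterns.append(b"/login")                     # ruleset change touches TCP only
for pattern_set in (tcp_patterns, udp_patterns):   # second start
    load_or_compile(pattern_set, cache_dir, stats)
print(stats)
```

On the second pass, the unchanged UDP pattern set is loaded from disk while the modified TCP set is compiled and cached anew. The old TCP cache file stays behind, which matches the note below that stale files are not pruned automatically.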

Suricata reports the MPM caching status with a log line such as:

mpm-hs: Rule group caching - loaded: 115 newly cached: 0 total cacheable: 115
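For monitoring, the counters in such a line can be extracted with a small parser. The regular expression below assumes exactly the wording of the sample line above, which may differ between Suricata versions; parse_cache_stats and hit_ratio are hypothetical helper names.

```python
import re

# Assumes the wording of the sample log line; adjust if your version differs.
LOG_RE = re.compile(
    r"loaded: (?P<loaded>\d+) newly cached: (?P<newly_cached>\d+) "
    r"total cacheable: (?P<total>\d+)"
)


def parse_cache_stats(line):
    """Return the three counters as integers, or None if the line does not match."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


def hit_ratio(stats):
    """Fraction of cacheable databases that were loaded from disk."""
    return stats["loaded"] / stats["total"] if stats["total"] else 0.0


line = "mpm-hs: Rule group caching - loaded: 115 newly cached: 0 total cacheable: 115"
parsed = parse_cache_stats(line)
print(parsed, hit_ratio(parsed))
```

A ratio of 1.0, as in the sample line, means every cacheable database came from disk and nothing had to be compiled on this start.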

At the moment, cache files are never deleted automatically (automatic pruning of stale caches is previewed in PR#13850), and writing them requires appropriate filesystem permissions. In very rare cases, hash collisions may cause a warning. If the warning persists over multiple runs, caching should be disabled to avoid mismatched databases.

Conclusion

And that’s a wrap! Startup times cut in half, less wasted CPU, and a smoother reload process thanks to Hyperscan caching. Under the hood, Suricata keeps getting smarter about doing less work for the same result. I hope you also enjoyed the tour through the initialization and detection pipeline of Suricata. If you test it, I’d love to hear how much time you save on your ruleset. And if you have more ideas on how to accelerate Suricata, please let us know!

Written by: Lukáš Šišmiš, Suricata team member.

The post Faster Suricata startups with Hyperscan caching appeared first on Suricata.
