Bug #6376
Huge increase in Suricata load time with a lot of ip-only rules and a bigger HOME_NET (closed)
Description
At deployments with bigger HOME_NET variables, such as a university or a big enterprise, we can end up with a very long rule load time, both at startup and on reload. At one deployment a rule reload even took several hours, which was the initial report for this issue.
I was able to reproduce this with Suricata 6.0.14 and 7.0.1 using the following setup:
HOME_NET: "[10.0.0.0/8,102.168.0.0/16,172.16.0.0/12,100.64.0.0/10,127.0.0.0/8,169.254.0.0/16,0.0.0.0/8,100.64.0.0/10,192.0.0.0/24,192.0.2.0/24,192.88.99.0/24,198.18.0.0/15,198.51.100.0/24,203.0.113.0/24,233.252.0.0/24,2001:db8::/32,fc00::/7,2001:20::/28,2001:0000::/32,64:ff9b:1::/48,64:ff9b::/96]"
This HOME_NET is purely for testing and uses some subnets reserved for testing, documentation, etc., but it is still smaller than the HOME_NET of some big enterprise or university deployments.
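To get a feel for the size of this test HOME_NET, here is a small sketch using Python's standard `ipaddress` module that counts its entries and checks whether any of them could be merged. It is only a rough illustration under simplifying assumptions (plain CIDRs, no negation or nested variables); the helper names are hypothetical and it does not model how Suricata itself parses or stores address groups.

```python
import ipaddress

# HOME_NET value copied verbatim from the test configuration above.
HOME_NET = (
    "[10.0.0.0/8,102.168.0.0/16,172.16.0.0/12,100.64.0.0/10,127.0.0.0/8,"
    "169.254.0.0/16,0.0.0.0/8,100.64.0.0/10,192.0.0.0/24,192.0.2.0/24,"
    "192.88.99.0/24,198.18.0.0/15,198.51.100.0/24,203.0.113.0/24,"
    "233.252.0.0/24,2001:db8::/32,fc00::/7,2001:20::/28,2001:0000::/32,"
    "64:ff9b:1::/48,64:ff9b::/96]"
)


def parse_address_group(value):
    """Turn a bracketed, comma-separated list of CIDRs into network objects.

    Simplified on purpose: no negation (!), no nested groups, no $VARIABLES.
    """
    items = value.strip().strip("[]").split(",")
    return [ipaddress.ip_network(item.strip()) for item in items if item.strip()]


def summarize(networks):
    """Count raw entries, distinct entries and entries left after merging."""
    v4 = [n for n in networks if n.version == 4]
    v6 = [n for n in networks if n.version == 6]
    return {
        "entries": len(networks),
        "unique": len(set(networks)),
        "ipv4_after_merge": len(list(ipaddress.collapse_addresses(v4))),
        "ipv6_after_merge": len(list(ipaddress.collapse_addresses(v6))),
    }


print(summarize(parse_address_group(HOME_NET)))
```

For this HOME_NET the sketch counts 21 entries, 20 of them distinct (100.64.0.0/10 is listed twice), and none of the remaining IPv4 or IPv6 networks can be merged into a larger one.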
The rest of the config is default. The ruleset being used is https://threatfox.abuse.ch/downloads/threatfox_suricata.rules which has over 20k ip-only rules as of today.
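To cross-check how many rules in a file are ip-only, one could use a rough heuristic like the sketch below. It is an assumption-based approximation, not Suricata's actual classification (which is done inside the detection engine): a rule counts as "ip-only" here if it uses protocol `ip`, `any` ports, and only option keywords from a hypothetical allowlist of metadata keywords (`NON_MATCHING_KEYWORDS`). The sample rules in the test are made up for illustration and are not taken from ThreatFox.

```python
import re

# Assumption: option keywords that only carry metadata and do not add any
# packet, flow or payload inspection. This is not Suricata's own list.
NON_MATCHING_KEYWORDS = {
    "msg", "sid", "rev", "gid", "reference", "metadata",
    "classtype", "priority", "target",
}


def split_header(header):
    """Split a rule header on whitespace, keeping [ ... ] groups intact."""
    tokens, current, depth = [], "", 0
    for ch in header:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch.isspace() and depth == 0:
            if current:
                tokens.append(current)
                current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


def looks_ip_only(rule):
    """Heuristic: protocol 'ip', 'any' ports and metadata-only options."""
    header, _, rest = rule.partition("(")
    tokens = split_header(header)
    if len(tokens) != 7:
        return False
    _action, proto, _src, sport, _direction, _dst, dport = tokens
    body = rest.rsplit(")", 1)[0]
    # Options end with ';' unless the semicolon is escaped as '\;'.
    options = [o.strip() for o in re.split(r"(?<!\\);", body) if o.strip()]
    keywords = {o.split(":", 1)[0].strip() for o in options}
    return (proto == "ip" and sport == "any" and dport == "any"
            and keywords <= NON_MATCHING_KEYWORDS)


def count_rules(lines):
    """Return (total rules, rules that look ip-only), skipping comments."""
    total = ip_only = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        total += 1
        if looks_ip_only(line):
            ip_only += 1
    return total, ip_only
```

Passing the lines of the downloaded `threatfox_suricata.rules` file to `count_rules` gives a rough cross-check of the ip-only count; because this is a heuristic, the result may differ from the numbers Suricata itself reports.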
These are the load times for this ruleset, for subsets of it and, for comparison, for the ETOpen ruleset:
- ThreatFox, full ruleset, 70931 rules (22560 ip-only): 34 min
- ThreatFox, only the ip-only rules, 22560 rules: 28 min
- ThreatFox without the ip-only rules, 48731 rules: 28 s
- ETOpen (all), 33485 rules (0 ip-only): 19 s
As you can see, the load time stays rather short when the ruleset contains no ip-only rules: below 1 minute on a normal workstation.
Once ip-only rules are included, it grows very large. Adding or removing parts of HOME_NET also directly increases or decreases it.
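Dividing the reported load times by the rule counts makes the gap concrete. This sketch only does arithmetic on the numbers listed above; an average per rule says nothing about whether the cost actually grows linearly with the number of rules (the HOME_NET dependency suggests other factors are involved too).

```python
# Load times (seconds) and rule counts as reported in the measurements above.
MEASUREMENTS = {
    "ThreatFox, ip-only rules only": (28 * 60, 22560),
    "ThreatFox without ip-only rules": (28, 48731),
    "ETOpen (all)": (19, 33485),
}


def per_rule_ms(seconds, rules):
    """Average load time per rule in milliseconds."""
    return seconds / rules * 1000.0


for name, (seconds, rules) in MEASUREMENTS.items():
    print(f"{name}: {per_rule_ms(seconds, rules):.2f} ms per rule")

# How much more an ip-only rule costs on average than a non-ip-only rule.
ratio = per_rule_ms(*MEASUREMENTS["ThreatFox, ip-only rules only"]) / per_rule_ms(
    *MEASUREMENTS["ThreatFox without ip-only rules"]
)
print(f"ip-only rules cost about {ratio:.0f}x more per rule")
```

This works out to roughly 74 ms per ip-only rule versus roughly 0.6 ms per rule for both rulesets without ip-only rules, a factor of about 130.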
In addition, we have experienced a big performance penalty with ip-only rules in the past, so the problem with these rules might be even bigger.
I also created a version of the ruleset that uses the ip dataset feature; with it, the load time was back within seconds.
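The converted ruleset itself is not included here, so the following is only a guess at how such a conversion might look. It assumes that the IOC addresses sit in the destination field of each ip-only rule, and that the Suricata version in use supports the `ip` dataset type together with the `ip.dst` sticky buffer (check the datasets documentation for your version). The one-address-per-line file format is also an assumption. The dataset name `threatfox-ips`, the file name `threatfox-ips.lst` and the sid are hypothetical.

```python
import ipaddress


def split_header(header):
    """Split a rule header on whitespace, keeping [ ... ] groups intact."""
    tokens, current, depth = [], "", 0
    for ch in header:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch.isspace() and depth == 0:
            if current:
                tokens.append(current)
                current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


def extract_dst_ips(rule):
    """Return the host addresses listed in the rule's destination field.

    Assumes the IOC addresses are in the destination field. Variables,
    negations and CIDR ranges are skipped because they are not single
    host addresses.
    """
    tokens = split_header(rule.partition("(")[0])
    if len(tokens) != 7:
        return []
    ips = []
    for item in tokens[5].strip("[]").split(","):
        try:
            ips.append(ipaddress.ip_address(item.strip()))
        except ValueError:
            continue
    return ips


def build_dataset(rules):
    """Collect unique addresses from all rules, keeping first-seen order."""
    seen = {}
    for rule in rules:
        for ip in extract_dst_ips(rule):
            seen.setdefault(ip, None)
    return list(seen)


def dataset_file_contents(ips):
    """One address per line (assumed format for an 'ip' dataset file)."""
    return "".join(f"{ip}\n" for ip in ips)


def dataset_rule(name, filename, sid):
    """Single replacement rule; name, file and sid are hypothetical."""
    return (
        f'alert ip $HOME_NET any -> any any (msg:"IOC destination IP"; '
        f"ip.dst; dataset:isset,{name},type ip,load {filename}; "
        f"sid:{sid}; rev:1;)"
    )
```

The trade-off of this design is that thousands of per-IOC rules collapse into a single rule, so per-IOC details such as `msg`, `reference` or individual sids no longer appear in the alert; they would have to be looked up separately.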
We could suggest ip datasets to ThreatFox and others, but we should also try to find the root cause of the big penalty when ip-only rules are used. If it is hard to fix, or maybe a consequence of otherwise good design choices, we should at least warn about it.