Bug #6963 (open)
rule-reload: potential memory leak in multiple rule reloads
Description
There is a potential memory leak in Suricata 6.0.18 and 7.0.4 that shows up in the memory usage across rule reloads.
No traffic needs to be forwarded to trigger it.
To reproduce it, start Suricata with the ET Open ruleset and default/basic settings. You can use a dummy interface instead of a real interface that carries traffic:
ip link add dummy0 type dummy
ip link set dummy0 up
Once Suricata has started, check the memory usage, for example with htop. After 2-3 minutes, trigger a rule reload (either via suricatasc or by sending a USR2 signal).
Watch the memory usage and repeat this a few times. In most cases the memory usage rises during the reload and drops a bit at the end, but the net difference between before and after the reload is positive.
On a test run with 6.0.18 I saw the following usage for the Suricata process, read from htop as VIRT/RES:
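The manual htop check can be automated. The following is a minimal sketch, assuming a Linux system where `/proc/<pid>/status` is readable; its `VmSize` and `VmRSS` fields correspond to the VIRT and RES columns in htop. The function names and the `settle_seconds` wait are hypothetical choices, not part of Suricata.

```python
import os
import signal
import time


def parse_status(text):
    """Return (VmSize, VmRSS) in kB from the content of /proc/<pid>/status."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # The value looks like "  3246080 kB"; keep the number only.
            fields[key] = int(value.split()[0])
    return fields["VmSize"], fields["VmRSS"]


def read_memory(pid):
    """Read the current (VIRT, RES) pair in kB for the given process."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_status(fh.read())


def sample_reloads(pid, reloads=4, settle_seconds=180):
    """Trigger rule reloads via USR2 and sample memory after each one.

    settle_seconds is an assumed wait for the reload to finish; adjust it
    to the size of the ruleset.
    """
    samples = [read_memory(pid)]
    for _ in range(reloads):
        os.kill(pid, signal.SIGUSR2)
        time.sleep(settle_seconds)
        samples.append(read_memory(pid))
    return samples
```

Calling `sample_reloads(pid)` with the PID of the running Suricata process returns one `(VIRT, RES)` pair in kB per measurement, starting with the value before the first reload.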
3170/732 3443/993 3485/1026 3486/1039 3490/1049
These are the values for 7.0.4:
3271/883 3587/1182 3653/1247 3666/1278 3679/1291
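To make the trend easier to read, the growth per reload can be computed from the values above. This is a small sketch that only uses the numbers reported in this ticket, in the unit htop displayed; the helper names are made up for illustration.

```python
def parse_samples(text):
    """Turn "3170/732 3443/993 ..." into a list of (VIRT, RES) tuples."""
    return [tuple(int(x) for x in pair.split("/")) for pair in text.split()]


def growth_per_reload(samples):
    """Return the (VIRT, RES) difference between consecutive samples."""
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(samples, samples[1:])]


# Values copied from the test runs reported above.
reported = {
    "6.0.18": "3170/732 3443/993 3485/1026 3486/1039 3490/1049",
    "7.0.4": "3271/883 3587/1182 3653/1247 3666/1278 3679/1291",
}

for version, line in reported.items():
    print(version, growth_per_reload(parse_samples(line)))
```

In both versions the first reload accounts for the largest jump, and every later reload still adds to RES.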
The PR https://github.com/OISF/suricata/pull/9756, which is linked at https://redmine.openinfosecfoundation.org/issues/6454, does not fix this issue (I tried backporting that PR to 7.0.4).
Updated by Andreas Herz 7 months ago
Some additions: the first bump in memory usage happens with the call to "SigLoadSignatures", from 3271/883 to 3888/1499 in the example. This increase is expected, since the rules are loaded in parallel while the current ruleset is still active, and the two are swapped later.
The second bump happens when "DetectEngineReloadThreads" is triggered. It is much smaller, from 3888/1499 to 3907/1502.
With "DetectEnginePruneFreeList" most of it is freed again, down to 3587/1182, and the addition from the malloc PR reduces it further to 3587/964. It makes sense that less RES memory is used once "malloc_trim" runs. But there is still an overall increase that adds up with each reload, so over time the system memory will eventually be exhausted.
Updated by Andreas Herz 6 months ago
Further investigation with separate runs of Suricata using the default `glibc` allocator, `jemalloc` and `tcmalloc` showed no difference in the root issue. The amount of memory used differs slightly (see table below), but the steady increase is there in all 3 cases.
The additional output described in https://blog.inliniac.net/2014/12/23/profiling-suricata-with-jemalloc/ for jemalloc also showed no actual leak, so it's more of a logical "leak".
`runmode=single` shows the issue as well.
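Memory usage (VIRT/RES) after the start, after the first reload, and after the second reload:

| Allocator     | After start | After 1st reload | After 2nd reload |
|---------------|-------------|------------------|------------------|
| Default glibc | 1065/761    | 1381/859         | 1402/1096        |
| tcmalloc      | 966/854     | 1607/1494        | 1657/1595        |
| jemalloc      | 1185/788    | 2007/975         | 2119/972         |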
Updated by Andreas Herz about 2 months ago
I had some time to play around with the suggestion from Victor to see if it's related to threshold, classification and/or reference. I tried an ET ruleset from which I removed all metadata (except sid/rev/msg) and all thresholds. The "leak" is still present there as well.
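A stripped ruleset like this could be produced with a small script. The sketch below is not the procedure used for the test above; it is a simplified illustration. It assumes that "metadata" means the `classtype`, `reference` and `metadata` keywords, and that semicolons inside option values are escaped as `\;`. The function name and the keyword set are hypothetical.

```python
import re

# Assumed set of keywords to drop; adjust to what should count as metadata.
DROP_KEYWORDS = {"classtype", "reference", "metadata", "threshold"}


def strip_rule_options(rule, drop=DROP_KEYWORDS):
    """Remove the given option keywords from a single rule line."""
    rule = rule.strip()
    start = rule.find("(")
    end = rule.rfind(")")
    if rule.startswith("#") or start == -1 or end < start:
        return rule  # comment, blank line, or not a rule: leave unchanged
    header = rule[:start]
    body = rule[start + 1:end]
    kept = []
    # Split on semicolons that are not escaped with a backslash.
    for option in re.split(r"(?<!\\);", body):
        option = option.strip()
        if not option:
            continue
        name = option.split(":", 1)[0].strip()
        if name not in drop:
            kept.append(option)
    return header + "(" + "; ".join(kept) + ";)"
```

Applied line by line to a rules file, this keeps the rule header and all detection options such as `msg`, `content`, `sid` and `rev`.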