Bug #4502
TCP reassembly memuse approaching memcap value results in TCP detection being stopped
Status: Closed
Description
We discovered that the majority of our 6.0 deployments show an increase in overall memory usage. Especially for TCP reassembly memuse we saw an increase that ends with memuse hitting the memcap (but without a .memcap entry appearing in stats.log), and at that point nearly all TCP app-layer detection stopped.
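To make the symptom easier to reason about, below is a minimal, purely illustrative Python sketch; this is not Suricata's actual code, and ReassemblyPool / insert_segment are hypothetical names. It only shows how a memory-capped pool behaves once its accounting is pinned at the cap: every further insert fails, which would surface as insert failures and gaps rather than as a dedicated memcap counter.

```python
# Purely illustrative sketch, not Suricata code: a memory-capped pool where
# inserts that would exceed the cap are rejected. All names are hypothetical.

class ReassemblyPool:
    def __init__(self, memcap_bytes):
        self.memcap = memcap_bytes
        self.memuse = 0
        self.insert_fail = 0   # loosely analogous to tcp.insert_data_normal_fail
        self.gaps = 0          # loosely analogous to tcp.reassembly_gap

    def insert_segment(self, size):
        # If memuse never shrinks (e.g. buffers are not released back),
        # it creeps up to the cap and every later insert is rejected.
        if self.memuse + size > self.memcap:
            self.insert_fail += 1
            self.gaps += 1     # the missing data later shows up as a gap
            return False
        self.memuse += size
        return True


pool = ReassemblyPool(memcap_bytes=4 * 1024)
for _ in range(12):
    pool.insert_segment(512)
print(pool.memuse, pool.insert_fail, pool.gaps)  # 4096 4 4
```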
On all of those deployments we see some amount of tcp.reassembly_gap, and after the point where TCP reassembly memuse hits the memcap those gaps increase much faster and tcp.insert_data_normal_fail starts to occur as well. As you can see in the Grafana output (and can cross-check with the eve.json extraction sketch below the list), there are some correlations:
- tcp reassembly memuse hits the limit after less than 24 hours with an 8 Gbit/s traffic ingest (generated via T-Rex)
- tcp.insert_data_normal_fail starts to rise as soon as the limit is hit
- tcp reassembly gaps make a jump as well
- tcp app-layer detection is completely gone, only udp (dns) is still inspected
- traffic, load and flows/min stay stable, with no drops
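For cross-checking the Grafana graphs against the raw logs, here is a rough helper. It assumes stats events are written to eve.json and that the nested field names match the flat counter names used above; they may differ slightly between builds, so treat this as a sketch:

```python
# Extract the TCP reassembly counters discussed above from eve.json stats
# events. Assumes eve.json logging of "stats" records is enabled; the field
# names are taken from the counter names above and may need adjusting.
import json

def tcp_counters(eve_path):
    """Yield (timestamp, reassembly_memuse, reassembly_gap, insert_data_normal_fail)."""
    with open(eve_path) as fh:
        for line in fh:
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue
            if rec.get("event_type") != "stats":
                continue
            tcp = rec.get("stats", {}).get("tcp", {})
            yield (
                rec.get("timestamp"),
                tcp.get("reassembly_memuse"),
                tcp.get("reassembly_gap"),
                tcp.get("insert_data_normal_fail"),
            )

for ts, memuse, gaps, fails in tcp_counters("/var/log/suricata/eve.json"):
    print(ts, memuse, gaps, fails)
```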
The most interesting thing is that we can't reproduce this with 5.0.3 while everything else stays the same (we only changed the config so that the new app-layer parsers that exist only in 6.0 are not used).
(In general, overall memory usage with 5.0.3 is much lower than with 6.0.2.)
In 5.0.3 the TCP memuse as well as the flow memuse settle at a plateau, which makes much more sense given the stable T-Rex traffic ingest.
Some system details:
Debian Buster with the 5.10 backports kernel.
Capture on X710 10GE SFP+ NICs with AF_PACKET v3, although there was no difference between using cluster_qm + RSS and using cluster_flow with a single queue.
The current workaround is a daily restart of Suricata 6.0.2, and a rollback to 5.0.3 might become necessary.
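Before forcing the daily restart we check how close reassembly memuse is to the cap via the unix socket. The sketch below assumes the unix command socket is enabled and that suricatasc is installed; since we are not sure about the exact layout of the dump-counters output, it simply searches the returned JSON recursively for the counter name:

```python
# Query the running Suricata over the unix socket (assumes unix-command is
# enabled and suricatasc is on PATH) and look up the reassembly memuse
# counter, wherever it sits in the dump-counters JSON.
import json
import subprocess

def find_counter(obj, suffix):
    """Return the first numeric value whose key ends with the given counter name."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key.endswith(suffix) and isinstance(value, (int, float)):
                return value
            found = find_counter(value, suffix)
            if found is not None:
                return found
    elif isinstance(obj, list):
        for item in obj:
            found = find_counter(item, suffix)
            if found is not None:
                return found
    return None

raw = subprocess.run(
    ["suricatasc", "-c", "dump-counters"],
    capture_output=True, text=True, check=True,
).stdout
counters = json.loads(raw)
print("tcp reassembly memuse:", find_counter(counters, "reassembly_memuse"))
```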
Any hints on where we should look on the code side would help; we will nevertheless also go through the git log to see if we can spot something that explains the regression.