Bug #3782
Once Suricata enters emergency mode it does not recover properly
Status: Closed
Description
Using:
Suricata version 6.0.0-dev (639f3d265 2020-06-16)
AFPv3, with cluster_flow or cluster_qm
In two separate cases (TRex and pcap replay) I can reproduce the following:
Once emergency mode is entered, even after "recovery", it seems Suricata never really returns to a stable state.
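For context on what "recovery" means here: as the Suricata documentation describes it (our reading), the flow engine enters emergency mode when the flow memcap is reached and leaves it only once the `emergency-recovery` percentage of the preallocated flows (`prealloc`) is free again. The following is a minimal C model of that enter/exit hysteresis under those assumptions. The struct and function names (`FlowEngineModel`, `model_update`) are hypothetical and are not Suricata's internal API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of flow-engine emergency mode.
 * These names are illustrative only, not Suricata's real internals. */
typedef struct {
    size_t prealloc;        /* preallocated flows (cf. flow.prealloc) */
    unsigned recovery_pct;  /* cf. flow.emergency-recovery, e.g. 30 */
    bool emergency;         /* true while in emergency mode */
} FlowEngineModel;

/* Update the mode from the current number of spare (free) flows and
 * whether the memcap has been hit. Returns the new emergency state. */
static bool model_update(FlowEngineModel *m, size_t spare_flows,
                         bool memcap_reached)
{
    assert(m->recovery_pct <= 100);
    if (!m->emergency) {
        /* Enter emergency mode when the memcap is reached. */
        if (memcap_reached)
            m->emergency = true;
    } else {
        /* Leave only once enough flows are free again. The gap between
         * the entry and exit conditions is hysteresis, so the engine
         * does not flap in and out of emergency mode. */
        size_t threshold = m->prealloc * m->recovery_pct / 100;
        if (!memcap_reached && spare_flows >= threshold)
            m->emergency = false;
    }
    return m->emergency;
}
```

With `prealloc = 10000` and `recovery_pct = 30`, the model stays in emergency mode until at least 3000 flows are free. The problem reported here is that Suricata's CPU profile does not return to normal even after this kind of exit condition has been met.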
I can also observe that after entering emergency mode, a lot of CPU time is spent in mutex/pthread lock functions (seen with perf top), and that usage never returns to normal.
As shown in the attached screenshot, before entering emergency mode the top 3 functions by CPU usage are FM/ and the pthread/mutex lock functions. After entering emergency mode they switch places, and the mutex/lock functions take over CPU usage completely.
This is an extreme case: TRex testing on a 40G setup where all flows are "proper" and last 1-2 seconds, though they nonetheless include file transfers and similar traffic. Also, the number of "active flows" never exceeds 1 million.
Sharing the runs and pcaps in a separate communication.