Bug #4421
Closed
flow manager: using too much CPU during idle (6.0.x backport)
Description
Less extreme variant of #4096: the new usleep-based loops in the flow manager and recycler consume too much CPU time. On busy systems this has been shown to be more efficient than the old pthread_cond_timedwait logic, but on lower end systems the usleep approach has too much overhead.
The reason for getting rid of the pthread_cond_timedwait logic was that it would actually wake up millions of times per second on busy systems, leading to lots of overhead.
Perhaps we can make the usleep value configurable. On lower end systems there is no need to wake up frequently. Or optionally bring back the old behavior. Overall I'm not a great fan of these kinds of options.
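For reference, the idle wakeup rate can be observed with standard tools; something along these lines (illustration only, assuming a single suricata process; usleep ends up as nanosleep on Linux):

  # count syscalls across all threads for ~10 seconds
  timeout 10 strace -c -f -p $(pidof suricata)
  # or just count context switches for the whole process
  perf stat -e context-switches -p $(pidof suricata) -- sleep 10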
Files
Updated by Jeff Lucovsky over 3 years ago
- Copied from Bug #4379: flow manager: using too much CPU during idle added
Updated by Victor Julien over 3 years ago
- Target version changed from 6.0.3 to 6.0.4
Updated by Victor Julien about 3 years ago
- Target version changed from 6.0.4 to 6.0.5
Updated by Victor Julien over 2 years ago
- Target version changed from 6.0.5 to 6.0.6
Updated by Victor Julien over 2 years ago
- Target version changed from 6.0.6 to 6.0.7
Updated by Victor Julien over 2 years ago
- Subject changed from flow manager: using too much CPU during idle to flow manager: using too much CPU during idle (6.0.x backport)
For backports to 6.0.7:
https://github.com/OISF/suricata/pull/7534/commits/e6ac2e4e8a697a4c98b637a0d6c58dce8fb918aa
https://github.com/OISF/suricata/pull/7534/commits/f271fb457522d77a1befeb1d097c125afcbdeeb9
Might be good to open a PR already so it can be tested by those affected by the issue.
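For those affected who want to try it before a backport PR exists, something along these lines should work (the 6.0 maintenance branch is master-6.0.x at the time of writing; adjust the branch name to your setup):

  git clone https://github.com/OISF/suricata.git && cd suricata
  git checkout master-6.0.x
  git cherry-pick -x e6ac2e4e8a697a4c98b637a0d6c58dce8fb918aa f271fb457522d77a1befeb1d097c125afcbdeeb9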
Updated by Jeff Lucovsky about 2 years ago
- Status changed from Assigned to In Progress
- Assignee changed from Shivani Bhardwaj to Jeff Lucovsky
Updated by Jeff Lucovsky about 2 years ago
- Status changed from In Progress to In Review
Cherry-pick commit(s):
- e6ac2e4
- f271fb4
Updated by Jeff Lucovsky about 2 years ago
Updated by Jeff Lucovsky about 2 years ago
- Status changed from In Review to Closed
Updated by Michiel Janssens about 2 years ago
- File 5.0.7 pidstat in vm measuring suricata process added
- File 5.0.7 pidstat on host measuring pid vm running suricata added
- File 5.0.10 pidstat in vm measuring suricata process added
- File 5.0.10 pidstat on host measuring pid vm running suricata added
- File 6.0.0 pidstat in vm measuring suricata process added
- File 6.0.0 pidstat on host measuring pid vm running suricata added
- File 6.0.1 pidstat in vm measuring suricata process added
- File 6.0.1 pidstat on host measuring pid vm running suricata added
- File 6.0.2 pidstat in vm measuring suricata process added
- File 6.0.2 pidstat on host measuring pid vm running suricata added
Hi, I noticed your heads-up here and that it was merged into the 6.0.x branch, thanks for that.
I had been following this for a while in the forum and here on the issue list, so I thought it was time to put in the effort and try to measure the changes for several Suricata releases.
And it seems my hunch was on the right track: the cost of context switching is one of the important ingredients in the high CPU load.
I'll let you decide and interpret the results I found so far.
In a summary I put together the most important results from the files I uploaded to this report, just to get a feeling for the differences. Not all values are exact, as some are spread across multiple threads.
My interest in Suricata is mainly through OPNsense, which I run virtualized in Proxmox VE.
I haven't tested Suricata on bare metal.
Hardware specs for the system I tested on: an older AMD Opteron 6380, 1 socket, 1 CPU, 16 cores.
A VM with Ubuntu 22.04.1 LTS and 8 cores configured in Proxmox runs Suricata for all the tests. No other VMs were running during testing.
I hope the names of the uploaded files speak for themselves.
I felt that pidstat gave me the most important results, so all the files have output from pidstat.
As an extra I also did a test with stress-ng to simulate context switching. It's not exactly comparable with the other tests, but you see the same sort of CPU increase.
Please ask if any of my comments or uploaded files are unclear or incorrect.
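For reference, the pidstat invocations were roughly of this form (pidstat is from the sysstat package; the exact options and intervals may differ per uploaded file):

  # inside the VM: per-thread CPU usage and context switches of the suricata process
  pidstat -u -w -t -p $(pidof suricata) 10
  # on the Proxmox host: the same, but for the PID of the kvm process running the VM
  pidstat -u -w -p <pid of the VM's kvm process> 10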
Updated by Michiel Janssens about 2 years ago
- File 6.0.3 pidstat in vm measuring suricata process added
- File 6.0.3 pidstat on host measuring pid vm running suricata added
- File 6.0.4 pidstat in vm measuring suricata process added
- File 6.0.4 pidstat on host measuring pid vm running suricata added
- File 6.0.5 pidstat in vm measuring suricata process added
- File 6.0.5 pidstat on host measuring pid vm running suricata added
- File 6.0.6 pidstat in vm measuring suricata process added
- File 6.0.6 pidstat on host measuring pid vm running suricata added
- File 6.0.6 usleep 100 pidstat in vm measuring suricata process added
- File 6.0.6 usleep 100 pidstat on host measuring pid vm running suricata added
Updated by Michiel Janssens about 2 years ago
- File 6.0.6 usleep 50000 pidstat in vm measuring suricata process added
- File 6.0.6 usleep 50000 pidstat on host measuring pid vm running suricata added
- File 6.0.6 usleep 1000000 pidstat in vm measuring suricata process added
- File 6.0.6 usleep 1000000 pidstat on host measuring pid vm running suricata added
- File 6.0.7-dev (f40ad90ad 2022-09-19) pidstat in vm measuring suricata process added
- File 6.0.7-dev (f40ad90ad 2022-09-19) pidstat on host measuring pid vm running suricata added
- File Baseline pidstat in vm measuring all added
- File Baseline pidstat on host measuring pid vm added
- File Stress-ng pidstat in vm measuring stress-ng cyclic usleep process added
- File Stress-ng pidstat in vm running stressng cyclic usleep measuring all added
Updated by Michiel Janssens about 2 years ago
Updated by Michiel Janssens about 2 years ago
And to add:
All Suricata installs were done from source, with make install-full.
Suricata was started with the default yaml.
Measurements were done while idling, after waiting a bit for the process to settle down; there was no traffic on Suricata.
On the Proxmox host cpufreq.default_governor was set to performance. Normally this system runs the schedutil governor, but that would interfere with the measurements.
If these results are correct, I think many people will be pleased when 6.0.7 is released with this backport included, especially considering current energy costs.
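To be precise about the setup, the steps were roughly as follows (a release tarball is assumed; a git checkout also needs ./autogen.sh first, and the governor can be set either via the cpufreq.default_governor=performance boot parameter or at runtime):

  # build and install with the default configuration
  ./configure && make && sudo make install-full
  # on the Proxmox host, force the performance governor for the duration of the tests
  echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor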
Updated by Jeff Lucovsky about 2 years ago
@Michiel Janssens Thanks for the additional data files and analysis.
Updated by Michiel Janssens about 2 years ago
- File Baseline top in freebsd vm measuring all added
- File 6.0.6 pidstat on host measuring pid freebsd vm running suricata added
- File 6.0.6 top in freebsd vm measuring suricata pid added
- File 6.0.6 vmstat in freebsd vm measuring all added
- File 6.0.8 pidstat on host measuring pid freebsd vm running suricata added
- File Baseline pidstat on host measuring pid freebsd vm added
- File Baseline vmstat in freebsd vm measuring all added
- File 6.0.8 vmstat in freebsd vm measuring all added
- File 6.0.8 top in freebsd vm measuring suricata pid added
And to conclude the analysis: additional results for release 6.0.8, and now also for FreeBSD (as my main reason for testing was OPNsense).
An updated summary is also included. Results on FreeBSD are comparable with those on Ubuntu.
@Jeff Lucovsky Thank you for maintaining this project, and thanks to the other contributors as well.
Updated by Michiel Janssens about 2 years ago
- File Summary usleep tests with freebsd and 6.0.8.ods added
- File 6.0.8 pidstat on host measuring pid vm running suricata added
- File 6.0.8 pidstat in vm measuring suricata process added