Bug #1716
closed
live rule reloads not functioning on some servers
Added by Chris Beverly over 8 years ago.
Updated over 8 years ago.
Description
We have ~85 servers that are all running the same build of suricata, distributed as a docker image. On all but 5 of these servers, live rule reloading (docker kill -s USR2 suricata) works just fine. On the 5 servers where it does not work, the USR2 signal gets as far as printing "<Notice> - rule reload starting", but never prints the "<Notice> - rule reload completed" message that all of the others do. Any subsequent attempt to send a USR2 signal to the suricata process results in no activity or log output at all.
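A minimal sketch of how the stuck state described above could be detected from the log. The two message strings are the ones quoted in this report; the function names are hypothetical, and the log path and container name in the usage note are assumptions about the deployment.

```shell
#!/bin/sh
# Sketch: detect a rule reload that started but never logged completion.
# The matched strings are quoted from the report; everything else here
# (function names, polling interval) is illustrative.

# Print how many reloads have started without a matching completion message.
pending_reloads() {
    log=$1
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    started=$(grep -c 'rule reload starting' "$log" || true)
    completed=$(grep -c 'rule reload completed' "$log" || true)
    echo $(( ${started:-0} - ${completed:-0} ))
}

# Poll the log until no reload is pending, or give up after TIMEOUT seconds.
# Returns 0 if the reload completed, 1 on timeout.
wait_for_reload() {
    log=$1
    timeout=${2:-60}
    waited=0
    while [ "$(pending_reloads "$log")" -gt 0 ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For example, assuming the log lives at `/var/log/suricata/suricata.log`, one could run `docker kill -s USR2 suricata` and then `wait_for_reload /var/log/suricata/suricata.log 120 || echo "reload did not complete"`.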
The containers have been removed and recreated from image, and the servers have been completely reprovisioned (full OS reinstall, all servers configured the exact same using config management). Nothing appears to be clearing this issue. Attached is the gdb output. The sequence of the output is as follows:
01) gdb was attached right after a fresh container creation and after suricata had completed its full startup sequence
02) The rule reload was issued
03) 'thread apply all bt' was issued
04) 'cont' was issued
05) After the rule reload stuck at "<Notice> - rule reload starting", 'thread apply all bt' was issued once more
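The sequence above was run interactively. A non-interactive approximation, offered only as a sketch: `gdb -p PID -batch -ex '...'` attaches, runs the command, and detaches on exit, which stands in for the 'cont' step. The function name and the `GDB` override variable are hypothetical.

```shell
#!/bin/sh
# Sketch: capture 'thread apply all bt' for a running process without an
# interactive gdb session. GDB may be overridden for a dry run; it defaults
# to the gdb binary on PATH.

dump_threads() {
    pid=$1
    # -batch makes gdb exit after running the -ex command, which detaches
    # from the process and lets it continue running.
    "${GDB:-gdb}" -p "$pid" -batch -ex 'thread apply all bt'
}
```

When attaching from the host, the PID could be obtained with `docker inspect -f '{{.State.Pid}}' suricata` (the container name is taken from the reload command in this report). Calling `dump_threads "$pid" > bt-before.txt` before the reload and `dump_threads "$pid" > bt-after.txt` once it is stuck gives two backtraces to compare.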
Please let me know if there is any other information that can be provided.
It's also worth noting that although the "<Notice> - rule reload completed" message never prints on these 5 servers, the rule reload itself does appear to work on that first run (if a test rule is added, alerts show up for the new rule). Something appears to be preventing the rule reload function from completing and logging the completion message, which in turn prevents any later rule reload attempt from doing anything at all.
Is there really no difference between those machines? Same hardware and also same ruleset?
It's also a known "issue" that another USR2 signal sent while a rule reload is still in progress is ignored. We're looking to work on this as well.
They all differ a bit in hardware builds, but the 5 on which live reloads do not work match the hardware builds and rule sets of other servers that do work. Of the 5 that aren't working, 2 are seeing very little traffic for inspection (less than 50 Mb/s) while the others see between 8 and 13 Gb/s. Changing rule sets does not appear to have any effect on the 5 that aren't working.
I have seen the bug about a second rule reload not processing if one is already in progress, but I think the issue here is that the first one never completes on these five servers.
- Status changed from New to Assigned
- Assignee changed from OISF Dev to Victor Julien
- Target version changed from 3.0.1RC1 to 70
I suspect some of the threads never get a packet. Can you confirm this by enabling per thread stats logging (either in eve or the regular stats.log)?
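Once per-thread stats are being written, a sketch like the following could list threads whose packet counter stays at zero. It assumes stats.log rows of the form `Counter | TM Name | Value`, that zero values are printed, that merged rows use the name `Total`, and that `decoder.pkts` is the counter of interest; all of these should be checked against the actual file. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: list threads whose most recent value for a counter is 0.
# Assumes pipe-separated rows: "Counter | TM Name | Value".

idle_threads() {
    # $1 = path to stats.log, $2 = counter name (defaults to decoder.pkts)
    awk -F'|' -v counter="${2:-decoder.pkts}" '
        NF == 3 {
            # Trim surrounding whitespace from each of the three columns.
            for (i = 1; i <= 3; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
            # Keep only the latest value seen for each thread.
            if ($1 == counter && $2 != "Total") last[$2] = $3
        }
        END {
            for (t in last) if (last[t] + 0 == 0) print t
        }
    ' "$1" | sort
}
```

Running `idle_threads /var/log/suricata/stats.log` (path assumed) after a few minutes of traffic would print one thread name per line, or nothing if every thread has seen packets.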
I'll give some thought about how to handle such cases better.
I've attached the full stats.log file after having cleared it, enabled per thread statistics, let suricata run for 5 minutes, issued a rule reload, then run for another 5 minutes.
- Status changed from Assigned to Closed
- Target version changed from 70 to 3.0.1