Support #1791
closedKernel Drops: Thread occasionally Spike to 100%
Description
All,
I have noticed an issue that occasionally when traffic hits around 600-800mbps that thread utilization will peak up to 100% causing kernel drops (about %.009 of our total traffic on the interface). I have worked with VictorJ on this issue a bit. Running Suri Version 3.0.1. He requested that i run `perf` on the thread that pegs. Here is the perf output output. It appears some instruction in SCACSearch is potentially being called too much or something. Let me know if I can provide anything else to help track this down. Attached is my suricata.yaml config as well as the perf report output.
Files
Updated by zach hatsis over 8 years ago
Please note i am also running PF_RING ZC as well and do not see drops on the ZC interfaces when running pfcount.
Updated by Peter Manev over 8 years ago
For starters you can try -
- max-pending-packets: 65536
- Your cpu affinity section needs to be adjusted. You can follow the guideline here - https://redmine.openinfosecfoundation.org/projects/suricata/wiki/Suricatayaml#Relevant-cpu-affinity-settings-for-IDSIPS-modes
After you do the changes can you please start in "-v" mode and share the suricata.log (enabled inside suricata.yaml)
Updated by zach hatsis over 8 years ago
Peter Manev wrote:
For starters you can try -
- max-pending-packets: 65536
- Your cpu affinity section needs to be adjusted. You can follow the guideline here - https://redmine.openinfosecfoundation.org/projects/suricata/wiki/Suricatayaml#Relevant-cpu-affinity-settings-for-IDSIPS-modesAfter you do the changes can you please start in "-v" mode and share the suricata.log (enabled inside suricata.yaml)
Peter, thank you for your input. I read through that guide and reduced the amount of management threads that i'm using, and repurposed the cores for the detect threads. I didn't however set priorities on the detect threads though as i wasn't clear on what those do in the article you linked. Also i set - max-pending-packets: 65534 (65536 was too high a value and suri wouldn't start.) still seeing some drops however it doesn't appear that they are caused by cpu's pegging at 100% now.
set-cpu-affinity: yes
- Tune cpu affinity of suricata threads. Each family of threads can be bound
- on specific CPUs.
cpu-affinity:
- management-cpu-set:
cpu: [ 18 ] # include only these cpus in affinity settings
mode: "balanced"
- receive-cpu-set:
cpu: [ 18 ] # include only these cpus in affinity settings
- decode-cpu-set:
cpu: [ 18, 19 ]
mode: "balanced"
- stream-cpu-set:
cpu: [ "18-19" ]
- detect-cpu-set:
cpu: [ 29,21,22,23,2,4,6,8,10,12,14,16 ]
mode: "exclusive" # run detect threads in these cpus- prio:
- low: [ 0 ]
- medium: [ "1-2" ]
- high: [ 3 ]
- default: "medium"
- default: "high"
- Use explicitely 3 threads and don't compute number by using
- detect-thread-ratio variable:
- threads: 16
- prio:
#low: [ "0-4" ]
#medium: [ "5-23" ]
#high: [ 3 ]
- verdict-cpu-set:
cpu: [ 0 ]
prio:
default: "high"
- reject-cpu-set:
cpu: [ 0 ]
prio:
default: "low"
- output-cpu-set:
cpu: [ "all" ]
prio:
default: "medium"
output from running suri:
- /root/scripts/suri_init.sh
[+] Purging log files in /var/log/suricata
[!] Suricata is running, killing 26605 26571 restarting now.
pids remaining:
[.] Starting Suricata.
[+] Starting Perimeter Suricata..
[+] Taskset suricata instance 1 to cores:
[+] 2,4,6,8,10,12,14,16,18-23
26/5/2016 -- 12:34:29 - <Notice> - This is Suricata version 3.0.1 RELEASE
26/5/2016 -- 12:34:29 - <Info> - CPUs/cores online: 32
26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@0 from config file
26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@1 from config file
26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@2 from config file
26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@3 from config file
26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@4 from config file
26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@5 from config file
26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@6 from config file
26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@7 from config file
26/5/2016 -- 12:34:29 - <Info> - 'default' server has 'request-body-minimal-inspect-size' set to 33882 and 'request-body-inspect-window' set to 4053 after randomization.
26/5/2016 -- 12:34:29 - <Info> - 'default' server has 'response-body-minimal-inspect-size' set to 42119 and 'response-body-inspect-window' set to 16872 after randomization.
26/5/2016 -- 12:34:29 - <Info> - DNS request flood protection level: 500
26/5/2016 -- 12:34:29 - <Info> - DNS per flow memcap (state-memcap): 524288
26/5/2016 -- 12:34:29 - <Info> - DNS global memcap: 16777216
26/5/2016 -- 12:34:29 - <Info> - Protocol detection and parser disabled for modbus protocol.
26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@0': No such device (19)
26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@1': No such device (19)
26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@2': No such device (19)
26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@3': No such device (19)
26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@4': No such device (19)
26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@5': No such device (19)
26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@6': No such device (19)
26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@7': No such device (19)
26/5/2016 -- 12:34:29 - <Info> - allocated 3669960 bytes of memory for the defrag hash... 65535 buckets of size 56
26/5/2016 -- 12:34:29 - <Info> - preallocated 65535 defrag trackers of size 168
26/5/2016 -- 12:34:29 - <Info> - defrag memory usage: 14679840 bytes, maximum: 1073741824
[+] Taskset suricata instance 2 to cores:
[+] 3,5,7,9,11,13,15,17,18-23
26/5/2016 -- 12:34:29 - <Notice> - This is Suricata version 3.0.1 RELEASE
26/5/2016 -- 12:34:29 - <Info> - CPUs/cores online: 32
26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@8 from config file
26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@9 from config file
26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@10 from config file
26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@11 from config file
26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@12 from config file
26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@13 from config file
26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@14 from config file
26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@15 from config file
26/5/2016 -- 12:34:29 - <Info> - 'default' server has 'request-body-minimal-inspect-size' set to 33882 and 'request-body-inspect-window' set to 4053 after randomization.
26/5/2016 -- 12:34:29 - <Info> - 'default' server has 'response-body-minimal-inspect-size' set to 42119 and 'response-body-inspect-window' set to 16872 after randomization.
26/5/2016 -- 12:34:29 - <Info> - DNS request flood protection level: 500
26/5/2016 -- 12:34:29 - <Info> - DNS per flow memcap (state-memcap): 524288
26/5/2016 -- 12:34:29 - <Info> - DNS global memcap: 16777216
26/5/2016 -- 12:34:29 - <Info> - Protocol detection and parser disabled for modbus protocol.
26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@8': No such device (19)
26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@9': No such device (19)
26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@10': No such device (19)
26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@11': No such device (19)
26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@12': No such device (19)
26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@13': No such device (19)
26/5/2016 -- 12:34:30 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@14': No such device (19)
26/5/2016 -- 12:34:30 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@15': No such device (19)
26/5/2016 -- 12:34:30 - <Info> - allocated 3670016 bytes of memory for the defrag hash... 65536 buckets of size 56
26/5/2016 -- 12:34:30 - <Info> - preallocated 65535 defrag trackers of size 168
26/5/2016 -- 12:34:30 - <Info> - defrag memory usage: 14679896 bytes, maximum: 536870912
26/5/2016 -- 12:34:30 - <Info> - Use pid file /var/run/suricata-2.pid from config file.
[+] Sleeping for 5 seconds to allow processes to spawn.
[+] 2 of 2 Suricata processes have restarted.
Updated by zach hatsis over 8 years ago
Peter Manev wrote:
What is your start (command) line?
/bin/taskset -c 2,4,6,8,10,12,14,16,18-23 /bin/suricata -v --pfring -k none -D -c /etc/suricata/suricata.yaml
/bin/taskset -c 3,5,7,9,11,13,15,17,18-23 /bin/suricata -v --pfring -k none -D -c /etc/suricata/suricata-2.yaml
Updated by Peter Manev over 8 years ago
Remove the taskset command - affinity is taken care of by Suricata in your suricata.yaml (where you have already set up the affinity).
What is the reason you have 2 Suricata running ?
Updated by zach hatsis over 8 years ago
Peter Manev wrote:
Remove the taskset command - affinity is taken care of by Suricata in your suricata.yaml (where you have already set up the affinity).
What is the reason you have 2 Suricata running ?
I will try removing the taskset. Also the reason for running 2 suricata instances is to split up the amount of rules running per instance.
Updated by Anoop Saldanha about 8 years ago
Hi Zach,
Wanted to see if you are still seeing the issue, or did the issue solve itself? If it is the later, what did you do to solve the issue?
Updated by zach hatsis about 8 years ago
Hi Anoop, Just removed the taskset to see if this resolves my issue. I will report back once i know whether or not this fixed my issue.
Anoop Saldanha wrote:
Hi Zach,
Wanted to see if you are still seeing the issue, or did the issue solve itself? If it is the later, what did you do to solve the issue?
Updated by zach hatsis about 8 years ago
Anoop Saldanha wrote:
Hi Zach,
Wanted to see if you are still seeing the issue, or did the issue solve itself? If it is the later, what did you do to solve the issue?
Taking taskset off i am still seeing the issue. Drops actually seem to be increasing more somehow than when i had taskset enabled.
Updated by Andreas Herz about 8 years ago
- Assignee set to Anonymous
- Target version set to TBD
Updated by Andreas Herz about 5 years ago
- Tracker changed from Optimization to Support
- Assignee changed from Community Ticket to zach hatsis
- Target version changed from TBD to Support
@zach did you have time to look into that again?
Updated by Victor Julien about 5 years ago
- Status changed from New to Closed
- Assignee deleted (
zach hatsis) - Target version deleted (
Support)