Support #1791

Kernel Drops: Thread Occasionally Spikes to 100%

Added by zach hatsis over 8 years ago. Updated over 5 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Affected Versions:
Label:

Description

All,

I have noticed that occasionally, when traffic hits around 600-800 Mbps, thread utilization peaks at 100%, causing kernel drops (about 0.009% of our total traffic on the interface). I have worked with VictorJ on this issue a bit. Running Suricata version 3.0.1. He requested that I run `perf` on the thread that pegs; the perf output is attached. It appears some instruction in SCACSearch is potentially being called too much. Let me know if I can provide anything else to help track this down. Attached are my suricata.yaml config and the perf report output.
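For reference, one rough way to profile a single pegged thread (an illustrative sketch, not the exact commands used in this ticket): locate the hot TID, then point `perf` at it with something like `perf record -t <tid> -g -- sleep 30` followed by `perf report`. The `busiest_tid` helper below is hypothetical and assumes Linux /proc semantics (field 14 of `/proc/<tid>/stat` is utime):

```shell
#!/bin/sh
# Hypothetical helper: pick the thread (TID) of a process that has burned
# the most user-mode CPU time, so it can be handed to `perf record -t <tid>`.
# Assumes Linux: field 14 of /proc/<tid>/stat is utime in clock ticks
# (valid as long as the thread's comm field contains no whitespace).
busiest_tid() {
    pid=$1
    best_tid=""
    best_utime=-1
    for task in /proc/"$pid"/task/*; do
        tid=${task##*/}
        utime=$(awk '{print $14}' "$task/stat")
        if [ "$utime" -gt "$best_utime" ]; then
            best_utime=$utime
            best_tid=$tid
        fi
    done
    echo "$best_tid"
}

# Example: run against this shell's own PID; in practice substitute the
# Suricata PID, e.g.
#   perf record -t "$(busiest_tid <suricata-pid>)" -g -- sleep 30 && perf report
busiest_tid "$$"
```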


Files

suricata.yaml (61.5 KB) suricata.yaml zach hatsis, 05/24/2016 05:15 PM
perf_report.txt (62.6 KB) perf_report.txt zach hatsis, 05/24/2016 05:15 PM
Actions #1

Updated by zach hatsis over 8 years ago

Please note I am also running PF_RING ZC, and I do not see drops on the ZC interfaces when running pfcount.

Actions #2

Updated by Peter Manev over 8 years ago

For starters you can try -

- max-pending-packets: 65536
- Your cpu affinity section needs to be adjusted. You can follow the guideline here - https://redmine.openinfosecfoundation.org/projects/suricata/wiki/Suricatayaml#Relevant-cpu-affinity-settings-for-IDSIPS-modes

After you do the changes can you please start in "-v" mode and share the suricata.log (enabled inside suricata.yaml)
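In suricata.yaml the two suggestions above would look roughly like the fragment below. This is a minimal illustration only; the CPU lists are placeholders to be adapted per the linked guideline, not recommended values:

```yaml
max-pending-packets: 65536

threading:
  set-cpu-affinity: yes
  cpu-affinity:
    - management-cpu-set:
        cpu: [ 0 ]        # placeholder - see the linked wiki guideline
    - detect-cpu-set:
        cpu: [ "1-7" ]    # placeholder - pin detect threads to dedicated cores
        mode: "exclusive"
```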

Actions #3

Updated by zach hatsis over 8 years ago

Peter Manev wrote:

For starters you can try -

- max-pending-packets: 65536
- Your cpu affinity section needs to be adjusted. You can follow the guideline here - https://redmine.openinfosecfoundation.org/projects/suricata/wiki/Suricatayaml#Relevant-cpu-affinity-settings-for-IDSIPS-modes

After you do the changes can you please start in "-v" mode and share the suricata.log (enabled inside suricata.yaml)

Peter, thank you for your input. I read through that guide, reduced the number of management threads I'm using, and repurposed those cores for the detect threads. I didn't set priorities on the detect threads, though, as I wasn't clear on what those do from the article you linked. I also set max-pending-packets: 65534 (65536 was too high a value and Suricata wouldn't start). I'm still seeing some drops, but they no longer appear to be caused by CPUs pegging at 100%.

    set-cpu-affinity: yes
    # Tune cpu affinity of suricata threads. Each family of threads can be bound
    # on specific CPUs.
    cpu-affinity:
      - management-cpu-set:
          cpu: [ 18 ]  # include only these cpus in affinity settings
          mode: "balanced"
      - receive-cpu-set:
          cpu: [ 18 ]  # include only these cpus in affinity settings
      - decode-cpu-set:
          cpu: [ 18, 19 ]
          mode: "balanced"
      - stream-cpu-set:
          cpu: [ "18-19" ]
      - detect-cpu-set:
          cpu: [ 29,21,22,23,2,4,6,8,10,12,14,16 ]
          mode: "exclusive"  # run detect threads in these cpus
          # prio:
          #   low: [ 0 ]
          #   medium: [ "1-2" ]
          #   high: [ 3 ]
          #   default: "medium"
          # default: "high"
          # Use explicitely 3 threads and don't compute number by using
          # detect-thread-ratio variable:
          # threads: 16
          # prio:
          #low: [ "0-4" ]
          #medium: [ "5-23" ]
          #high: [ 3 ]
      - verdict-cpu-set:
          cpu: [ 0 ]
          prio:
            default: "high"
      - reject-cpu-set:
          cpu: [ 0 ]
          prio:
            default: "low"
      - output-cpu-set:
          cpu: [ "all" ]
          prio:
            default: "medium"

output from running suri:

    # /root/scripts/suri_init.sh
    [+] Purging log files in /var/log/suricata
    [!] Suricata is running, killing 26605 26571 restarting now.
    pids remaining:
    [.] Starting Suricata.
    [+] Starting Perimeter Suricata..
    [+] Taskset suricata instance 1 to cores:
    [+] 2,4,6,8,10,12,14,16,18-23
    26/5/2016 -- 12:34:29 - <Notice> - This is Suricata version 3.0.1 RELEASE
    26/5/2016 -- 12:34:29 - <Info> - CPUs/cores online: 32
    26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@0 from config file
    26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@1 from config file
    26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@2 from config file
    26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@3 from config file
    26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@4 from config file
    26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@5 from config file
    26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@6 from config file
    26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@7 from config file
    26/5/2016 -- 12:34:29 - <Info> - 'default' server has 'request-body-minimal-inspect-size' set to 33882 and 'request-body-inspect-window' set to 4053 after randomization.
    26/5/2016 -- 12:34:29 - <Info> - 'default' server has 'response-body-minimal-inspect-size' set to 42119 and 'response-body-inspect-window' set to 16872 after randomization.
    26/5/2016 -- 12:34:29 - <Info> - DNS request flood protection level: 500
    26/5/2016 -- 12:34:29 - <Info> - DNS per flow memcap (state-memcap): 524288
    26/5/2016 -- 12:34:29 - <Info> - DNS global memcap: 16777216
    26/5/2016 -- 12:34:29 - <Info> - Protocol detection and parser disabled for modbus protocol.
    26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@0': No such device (19)
    26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@1': No such device (19)
    26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@2': No such device (19)
    26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@3': No such device (19)
    26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@4': No such device (19)
    26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@5': No such device (19)
    26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@6': No such device (19)
    26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@7': No such device (19)
    26/5/2016 -- 12:34:29 - <Info> - allocated 3669960 bytes of memory for the defrag hash... 65535 buckets of size 56
    26/5/2016 -- 12:34:29 - <Info> - preallocated 65535 defrag trackers of size 168
    26/5/2016 -- 12:34:29 - <Info> - defrag memory usage: 14679840 bytes, maximum: 1073741824
    [+] Taskset suricata instance 2 to cores:
    [+] 3,5,7,9,11,13,15,17,18-23
    26/5/2016 -- 12:34:29 - <Notice> - This is Suricata version 3.0.1 RELEASE
    26/5/2016 -- 12:34:29 - <Info> - CPUs/cores online: 32
    26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@8 from config file
    26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@9 from config file
    26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@10 from config file
    26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@11 from config file
    26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@12 from config file
    26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@13 from config file
    26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@14 from config file
    26/5/2016 -- 12:34:29 - <Info> - Adding interface zc:99@15 from config file
    26/5/2016 -- 12:34:29 - <Info> - 'default' server has 'request-body-minimal-inspect-size' set to 33882 and 'request-body-inspect-window' set to 4053 after randomization.
    26/5/2016 -- 12:34:29 - <Info> - 'default' server has 'response-body-minimal-inspect-size' set to 42119 and 'response-body-inspect-window' set to 16872 after randomization.
    26/5/2016 -- 12:34:29 - <Info> - DNS request flood protection level: 500
    26/5/2016 -- 12:34:29 - <Info> - DNS per flow memcap (state-memcap): 524288
    26/5/2016 -- 12:34:29 - <Info> - DNS global memcap: 16777216
    26/5/2016 -- 12:34:29 - <Info> - Protocol detection and parser disabled for modbus protocol.
    26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@8': No such device (19)
    26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@9': No such device (19)
    26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@10': No such device (19)
    26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@11': No such device (19)
    26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@12': No such device (19)
    26/5/2016 -- 12:34:29 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@13': No such device (19)
    26/5/2016 -- 12:34:30 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@14': No such device (19)
    26/5/2016 -- 12:34:30 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 'zc:99@15': No such device (19)
    26/5/2016 -- 12:34:30 - <Info> - allocated 3670016 bytes of memory for the defrag hash... 65536 buckets of size 56
    26/5/2016 -- 12:34:30 - <Info> - preallocated 65535 defrag trackers of size 168
    26/5/2016 -- 12:34:30 - <Info> - defrag memory usage: 14679896 bytes, maximum: 536870912
    26/5/2016 -- 12:34:30 - <Info> - Use pid file /var/run/suricata-2.pid from config file.
    [+] Sleeping for 5 seconds to allow processes to spawn.
    [+] 2 of 2 Suricata processes have restarted.
Actions #4

Updated by Peter Manev over 8 years ago

What is your start (command) line?

Actions #5

Updated by zach hatsis over 8 years ago

Peter Manev wrote:

What is your start (command) line?

/bin/taskset -c 2,4,6,8,10,12,14,16,18-23 /bin/suricata -v --pfring -k none -D -c /etc/suricata/suricata.yaml
/bin/taskset -c 3,5,7,9,11,13,15,17,18-23 /bin/suricata -v --pfring -k none -D -c /etc/suricata/suricata-2.yaml

Actions #6

Updated by Peter Manev over 8 years ago

Remove the taskset command - affinity is taken care of by Suricata itself via your suricata.yaml (where you have already set up the affinity).

What is the reason you have 2 Suricata instances running?
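One way to verify which affinity actually took effect (the yaml settings versus the outer taskset) is to read each thread's effective CPU mask straight from /proc. The helper below is a hypothetical sketch assuming Linux's `Cpus_allowed_list` field in `/proc/<tid>/status`:

```shell
#!/bin/sh
# Sketch: print each thread (TID) of a process together with its effective
# CPU affinity, read from the Cpus_allowed_list line of /proc/<tid>/status.
show_affinity() {
    for task in /proc/"$1"/task/*; do
        tid=${task##*/}
        cpus=$(sed -n 's/^Cpus_allowed_list:[[:space:]]*//p' "$task/status")
        printf '%s\t%s\n' "$tid" "$cpus"
    done
}

# Example: run against this shell's own PID; in practice substitute the
# Suricata PID, e.g. show_affinity "$(pidof suricata)"
show_affinity "$$"
```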

Actions #7

Updated by zach hatsis over 8 years ago

Peter Manev wrote:

Remove the taskset command - affinity is taken care of by Suricata in your suricata.yaml (where you have already set up the affinity).

What is the reason you have 2 Suricata running ?

I will try removing the taskset. The reason for running 2 Suricata instances is to split the rule load between them.

Actions #8

Updated by Peter Manev over 8 years ago

What is the HW config you utilize?

Actions #9

Updated by Anoop Saldanha over 8 years ago

Hi Zach,

Wanted to see if you are still seeing the issue, or did the issue resolve itself? If it is the latter, what did you do to solve it?

Actions #10

Updated by zach hatsis over 8 years ago

Hi Anoop, I just removed the taskset to see if this resolves my issue. I will report back once I know whether or not it fixed things.

Anoop Saldanha wrote:

Hi Zach,

Wanted to see if you are still seeing the issue, or did the issue resolve itself? If it is the latter, what did you do to solve it?

Actions #11

Updated by zach hatsis over 8 years ago

Anoop Saldanha wrote:

Hi Zach,

Wanted to see if you are still seeing the issue, or did the issue resolve itself? If it is the latter, what did you do to solve it?

With taskset removed I am still seeing the issue. Drops actually seem to be increasing compared to when I had taskset enabled.

Actions #12

Updated by Andreas Herz over 8 years ago

  • Assignee set to Anonymous
  • Target version set to TBD
Actions #13

Updated by Andreas Herz almost 6 years ago

  • Assignee set to Community Ticket
Actions #14

Updated by Andreas Herz over 5 years ago

  • Tracker changed from Optimization to Support
  • Assignee changed from Community Ticket to zach hatsis
  • Target version changed from TBD to Support

@zach did you have time to look into that again?

Actions #15

Updated by Victor Julien over 5 years ago

  • Status changed from New to Closed
  • Assignee deleted (zach hatsis)
  • Target version deleted (Support)