Optimization #2845
Counters for kernel_packets decreases at times without restart
Status: Closed
Description
We have seen cases in Suricata where the stats.capture.kernel_packets counter decreases while Suricata is running. My understanding is that this is supposed to be a running counter that should not decrease unless Suricata is restarted. This behavior has been observed on 4.0.6 and 4.1.2, and I am fairly confident I have seen it on 3.2.2 as well. This decrease would be more expected if the value reset or rolled over from overflow, but I don't believe that is what is happening here.
Below is one example from the logs I am attaching. I have many other logs I can provide if desired.
$ jq 'select(.event_type == "stats") | select(.timestamp | startswith("2019-02-22T07:55:")) | .timestamp, .stats.capture' eve.json_stats_only_08-snf3-2019022208
...
"2019-02-22T07:55:36.000327-0600"
{ "kernel_packets": 17308040184, "kernel_packets_delta": 1039779, "kernel_drops": 0, "kernel_drops_delta": 0, "kernel_ifdrops": 0, "kernel_ifdrops_delta": 0 }
"2019-02-22T07:55:45.000335-0600"
{ "kernel_packets": 13013890235, "kernel_packets_delta": -4294149949, "kernel_drops": 0, "kernel_drops_delta": 0, "kernel_ifdrops": 0, "kernel_ifdrops_delta": 0 }
"2019-02-22T07:55:54.000320-0600"
{ "kernel_packets": 13014866476, "kernel_packets_delta": 976241, "kernel_drops": 0, "kernel_drops_delta": 0, "kernel_ifdrops": 0, "kernel_ifdrops_delta": 0 }
Corresponding from stats.log:
Date: 2/22/2019 -- 07:55:36 (uptime: 2d, 21h 14m 54s)
------------------------------------------------------------------------------------
Counter                   | TM Name   | Value
------------------------------------------------------------------------------------
capture.kernel_packets    | Total     | 17308040184
------------------------------------------------------------------------------------
Date: 2/22/2019 -- 07:55:45 (uptime: 2d, 21h 15m 03s)
------------------------------------------------------------------------------------
Counter                   | TM Name   | Value
------------------------------------------------------------------------------------
capture.kernel_packets    | Total     | 13013890235
------------------------------------------------------------------------------------
Date: 2/22/2019 -- 07:55:54 (uptime: 2d, 21h 15m 12s)
------------------------------------------------------------------------------------
Counter                   | TM Name   | Value
------------------------------------------------------------------------------------
capture.kernel_packets    | Total     | 13014866476
------------------------------------------------------------------------------------
Here are more examples from other Suricata instances that don't have logs attached, but I am including for reference:
"2019-02-22T15:09:00.000327-0600"
{ "kernel_packets": 15681829155, "kernel_packets_delta": -4294025171, "kernel_drops": 0, "kernel_drops_delta": 0, "kernel_ifdrops": 0, "kernel_ifdrops_delta": 0 }
"2019-02-22T03:18:51.000325-0600"
{ "kernel_packets": 15980883154, "kernel_packets_delta": -4293598551, "kernel_drops": 0, "kernel_drops_delta": 0, "kernel_ifdrops": 0, "kernel_ifdrops_delta": 0 }
"2019-02-19T10:22:00.000363-0600"
{ "kernel_packets": 17749102321, "kernel_packets_delta": -4294216445, "kernel_drops": 2227794327, "kernel_drops_delta": 0, "kernel_ifdrops": 0, "kernel_ifdrops_delta": 0 }
"2019-02-19T10:17:40.000327-0600"
{ "kernel_packets": 16791755239, "kernel_packets_delta": -4294006615, "kernel_drops": 1280457873, "kernel_drops_delta": 0, "kernel_ifdrops": 0, "kernel_ifdrops_delta": 0 }
"2019-02-19T09:30:35.000346-0600"
{ "kernel_packets": 17342905685, "kernel_packets_delta": -4294369072, "kernel_drops": 580833306, "kernel_drops_delta": 0, "kernel_ifdrops": 0, "kernel_ifdrops_delta": 0 }
"2019-02-19T09:25:05.000338-0600"
{ "kernel_packets": 23570036423, "kernel_packets_delta": -4293688281, "kernel_drops": 775213362, "kernel_drops_delta": 0, "kernel_ifdrops": 0, "kernel_ifdrops_delta": 0 }
"2019-02-19T08:51:53.000331-0600"
{ "kernel_packets": 12005768232, "kernel_packets_delta": -4294159125, "kernel_drops": 4547641950, "kernel_drops_delta": 0, "kernel_ifdrops": 0, "kernel_ifdrops_delta": 0 }
"2019-02-19T08:51:03.000358-0600"
{ "kernel_packets": 22256188092, "kernel_packets_delta": -4294023378, "kernel_drops": 722622375, "kernel_drops_delta": 0, "kernel_ifdrops": 0, "kernel_ifdrops_delta": 0 }
I do not see any messages in the suricata.log file during this time.
Is this behavior expected, and if not, what additional troubleshooting would you like us to perform to assist with this issue?
Files
Updated by Eric Urban over 5 years ago
Additional info:
Running CentOS Linux release 7.6.1810 (Core)
Myricom 10G-PCIE2-8C2-2S with SNFv3 (Sniffer 3) driver versions 3.0.14.50843 and 3.0.15.50857
Suricata is using PCAP capture method (using --pcap command-line option) with workers runmode
Updated by Victor Julien over 5 years ago
It's possible to log the stats per thread. Could you try that? One possible cause I can think of is that some threads either did not report stats for a while (so they stopped contributing to the sum of thread values) or somehow reset their stats.
Updated by Eric Urban over 5 years ago
I made this change yesterday but will need to wait until we get a negative delta.
Updated by Eric Urban over 5 years ago
- File stats.log-2019030413.gz stats.log-2019030413.gz added
We encountered this three times recently with stats per thread enabled.
Here is what the stats.log shows (which now looks like rollover):
Case 1:
Date: 3/4/2019 -- 12:43:15 (uptime: 6d, 00h 40m 50s)
------------------------------------------------------------------------------------
Counter                   | TM Name   | Value
------------------------------------------------------------------------------------
capture.kernel_packets    | Total     | 31856634887
capture.kernel_packets    | W#01-snf0 | 3674776885
capture.kernel_packets    | W#02-snf0 | 4294933329
capture.kernel_packets    | W#03-snf0 | 3846012591
capture.kernel_packets    | W#04-snf0 | 4091485723
capture.kernel_packets    | W#05-snf0 | 3515277825
capture.kernel_packets    | W#06-snf0 | 3560094513
capture.kernel_packets    | W#07-snf0 | 3955686633
capture.kernel_packets    | W#08-snf0 | 689417483
capture.kernel_packets    | W#09-snf0 | 4228949905
------------------------------------------------------------------------------------
Date: 3/4/2019 -- 12:43:25 (uptime: 6d, 00h 41m 00s)
------------------------------------------------------------------------------------
Counter                   | TM Name   | Value
------------------------------------------------------------------------------------
capture.kernel_packets    | Total     | 27562916233
capture.kernel_packets    | W#01-snf0 | 3674912821
capture.kernel_packets    | W#02-snf0 | 48425
capture.kernel_packets    | W#03-snf0 | 3846094371
capture.kernel_packets    | W#04-snf0 | 4091538831
capture.kernel_packets    | W#05-snf0 | 3515371879
capture.kernel_packets    | W#06-snf0 | 3560184719
capture.kernel_packets    | W#07-snf0 | 3956273142
capture.kernel_packets    | W#08-snf0 | 689468748
capture.kernel_packets    | W#09-snf0 | 4229023297
------------------------------------------------------------------------------------
Date: 3/4/2019 -- 12:43:35 (uptime: 6d, 00h 41m 10s)
------------------------------------------------------------------------------------
Counter                   | TM Name   | Value
------------------------------------------------------------------------------------
capture.kernel_packets    | Total     | 27564390368
capture.kernel_packets    | W#01-snf0 | 3675238778
capture.kernel_packets    | W#02-snf0 | 107570
capture.kernel_packets    | W#03-snf0 | 3846173711
capture.kernel_packets    | W#04-snf0 | 4091614569
capture.kernel_packets    | W#05-snf0 | 3515459369
capture.kernel_packets    | W#06-snf0 | 3560261997
capture.kernel_packets    | W#07-snf0 | 3956906711
capture.kernel_packets    | W#08-snf0 | 689517589
capture.kernel_packets    | W#09-snf0 | 4229110074
------------------------------------------------------------------------------------
Case 2:
Date: 3/4/2019 -- 20:53:58 (uptime: 6d, 08h 51m 33s)
------------------------------------------------------------------------------------
Counter                   | TM Name   | Value
------------------------------------------------------------------------------------
capture.kernel_packets    | Total     | 21287251135
capture.kernel_packets    | W#01-snf0 | 3885782812
capture.kernel_packets    | W#02-snf0 | 195681170
capture.kernel_packets    | W#03-snf0 | 4030967035
capture.kernel_packets    | W#04-snf0 | 4294929626
capture.kernel_packets    | W#05-snf0 | 3690908901
capture.kernel_packets    | W#06-snf0 | 3795432023
capture.kernel_packets    | W#07-snf0 | 67179936
capture.kernel_packets    | W#08-snf0 | 1088286168
capture.kernel_packets    | W#09-snf0 | 238083464
------------------------------------------------------------------------------------
Date: 3/4/2019 -- 20:54:08 (uptime: 6d, 08h 51m 43s)
------------------------------------------------------------------------------------
Counter                   | TM Name   | Value
------------------------------------------------------------------------------------
capture.kernel_packets    | Total     | 16992716963
capture.kernel_packets    | W#01-snf0 | 3885847698
capture.kernel_packets    | W#02-snf0 | 195736417
capture.kernel_packets    | W#03-snf0 | 4030987599
capture.kernel_packets    | W#04-snf0 | 35986
capture.kernel_packets    | W#05-snf0 | 3690939607
capture.kernel_packets    | W#06-snf0 | 3795475350
capture.kernel_packets    | W#07-snf0 | 67239312
capture.kernel_packets    | W#08-snf0 | 1088323942
capture.kernel_packets    | W#09-snf0 | 238131052
------------------------------------------------------------------------------------
Date: 3/4/2019 -- 20:54:18 (uptime: 6d, 08h 51m 53s)
------------------------------------------------------------------------------------
Counter                   | TM Name   | Value
------------------------------------------------------------------------------------
capture.kernel_packets    | Total     | 16993260554
capture.kernel_packets    | W#01-snf0 | 3885915442
capture.kernel_packets    | W#02-snf0 | 195803255
capture.kernel_packets    | W#03-snf0 | 4031007545
capture.kernel_packets    | W#04-snf0 | 149232
capture.kernel_packets    | W#05-snf0 | 3690968960
capture.kernel_packets    | W#06-snf0 | 3795511142
capture.kernel_packets    | W#07-snf0 | 67335413
capture.kernel_packets    | W#08-snf0 | 1088360742
capture.kernel_packets    | W#09-snf0 | 238208823
------------------------------------------------------------------------------------
Case 3:
Date: 3/5/2019 -- 08:05:06 (uptime: 6d, 21h 04m 04s)
------------------------------------------------------------------------------------
Counter                   | TM Name   | Value
------------------------------------------------------------------------------------
capture.kernel_packets    | Total     | 31126576613
capture.kernel_packets    | W#01-snf0 | 3646951621
capture.kernel_packets    | W#02-snf0 | 4294902636
capture.kernel_packets    | W#03-snf0 | 3627729495
capture.kernel_packets    | W#04-snf0 | 3948373796
capture.kernel_packets    | W#05-snf0 | 3293425221
capture.kernel_packets    | W#06-snf0 | 3386946249
capture.kernel_packets    | W#07-snf0 | 4041643843
capture.kernel_packets    | W#08-snf0 | 700726492
capture.kernel_packets    | W#09-snf0 | 4185877260
------------------------------------------------------------------------------------
Date: 3/5/2019 -- 08:05:16 (uptime: 6d, 21h 04m 14s)
------------------------------------------------------------------------------------
Counter                   | TM Name   | Value
------------------------------------------------------------------------------------
capture.kernel_packets    | Total     | 26832214503
capture.kernel_packets    | W#01-snf0 | 3647014810
capture.kernel_packets    | W#02-snf0 | 21714
capture.kernel_packets    | W#03-snf0 | 3627756678
capture.kernel_packets    | W#04-snf0 | 3948410543
capture.kernel_packets    | W#05-snf0 | 3293498165
capture.kernel_packets    | W#06-snf0 | 3387020464
capture.kernel_packets    | W#07-snf0 | 4041702493
capture.kernel_packets    | W#08-snf0 | 700764713
capture.kernel_packets    | W#09-snf0 | 4186024923
------------------------------------------------------------------------------------
Date: 3/5/2019 -- 08:05:26 (uptime: 6d, 21h 04m 24s)
------------------------------------------------------------------------------------
Counter                   | TM Name   | Value
------------------------------------------------------------------------------------
capture.kernel_packets    | Total     | 26832983650
capture.kernel_packets    | W#01-snf0 | 3647065229
capture.kernel_packets    | W#02-snf0 | 364829
capture.kernel_packets    | W#03-snf0 | 3627780433
capture.kernel_packets    | W#04-snf0 | 3948497668
capture.kernel_packets    | W#05-snf0 | 3293516751
capture.kernel_packets    | W#06-snf0 | 3387097606
capture.kernel_packets    | W#07-snf0 | 4041746104
capture.kernel_packets    | W#08-snf0 | 700792834
capture.kernel_packets    | W#09-snf0 | 4186122196
------------------------------------------------------------------------------------
I am attaching one of the logs, though I don't know if it is necessary. I am hoping to look into a fix for this if I get a chance, but don't know if and when that will happen :)
Updated by Victor Julien over 5 years ago
That looks suspiciously like a u32 overflow: first the value is very close to 2^32, then it is very low.
Updated by Victor Julien over 5 years ago
Yup, from pcap.h:
/*
 * As returned by the pcap_stats()
 */
struct pcap_stat {
    u_int ps_recv;          /* number of packets received */
    u_int ps_drop;          /* number of packets dropped */
    u_int ps_ifdrop;        /* drops by interface -- only supported on some platforms */
#if defined(_WIN32) && defined(HAVE_REMOTE)
    u_int ps_capt;          /* number of packets that reach the application */
    u_int ps_sent;          /* number of packets sent by the server on the network */
    u_int ps_netdrop;       /* number of packets lost on the network */
#endif /* _WIN32 && HAVE_REMOTE */
};
Suri takes those values and sets them directly to the stats API.
static inline void PcapDumpCounters(PcapThreadVars *ptv)
{
    struct pcap_stat pcap_s;
    if (likely((pcap_stats(ptv->pcap_handle, &pcap_s) >= 0))) {
        StatsSetUI64(ptv->tv, ptv->capture_kernel_packets, pcap_s.ps_recv);
        StatsSetUI64(ptv->tv, ptv->capture_kernel_drops, pcap_s.ps_drop);
        (void) SC_ATOMIC_SET(ptv->livedev->drop, pcap_s.ps_drop);
        StatsSetUI64(ptv->tv, ptv->capture_kernel_ifdrops, pcap_s.ps_ifdrop);
    }
}
So when the pcap counters overflow, Suricata just sets the overflowed values.
Updated by Victor Julien over 5 years ago
- Assignee set to Community Ticket
- Effort set to low
- Difficulty set to low
I guess the way to address this would be to keep track of the previous stats values, detect the overflow, calculate how many packets there must have been, and then feed that to our stats API.
Updated by Fabian Franz over 4 years ago
Hey,
I just wanted to chime in because I see the exact same problem as Eric reported.
Is there currently a plan to fix this?
Updated by Peter Manev over 4 years ago
What NIC, runmode and Suricata version are you using?
Updated by Fabian Franz over 4 years ago
I am using a Myricom 10G-PCIE2-8C2-2S capture card (SNF 3.0.20) and Suricata 5.0.2 in workers runmode.
Updated by Roland Fischer over 4 years ago
I might be able to help out with the implementation, but I am not sure I'd be able to test it at home. Don't have access to work equipment. Unsure whether I'll have the time to set up the test env.
FYI, yup, pcap's 32-bit counters are "interesting". We had to fix the same thing at work for another project, and fixed it the way it's proposed here: by storing the old value and doing a "wrap-around subtraction". Mind you, by the time we implemented that we had a native SNF driver. Moreover, at 10 Gbps and worst-case minimal Ethernet packet size (60 bytes, or 84 with preamble, gap and CRC), you need to read your stats no later than every 4.5 minutes.
A longer term plan could be to get a 64-bit pcap_stats_ex() or so into libpcap, but that would require Myricom/SNF libpcap updates as well.
A hail mary would be to ask Aria/Myricom whether they'd be willing to provide an additional, non-pcap-conforming, 64-bit stats API for their libpcap, or to expose the internal ring handle so that we could use the native SNF snf_ring_stats().
Another longer term plan is to implement an SNF source. This has not been a prio for us so far, so I am doubtful it will be in the future.
FYI, I quickly checked whether Myricom/Sniffer v3 provides extended stats in their libpcap and whether you could recompile Suricata with Myricom's pcap.h to get 64-bit counters, but alas no. That makes sense, as they want to keep ABI compatibility with the normal libpcap/pcap_stats() so they remain a drop-in replacement.
Updated by Roland Fischer over 4 years ago
FYI, I talk about my company when talking about "SNF source has not been a prio for us" in the comment above, not about the Suricata project. I am just a bystander here.
Browsing through the source code I realize that stats are collected every sec in Suricata. My bad. Ignore the "4.5mins statement above".
Just for completeness sake, another option in case there's a "have you thought about that discussion"...
Instead of waiting for a libpcap 64-bit stats implementation, implement it in Suricata: essentially an alternative "64-bit pcap SNF stats call" used instead of the normal PcapDumpCounters() in source-pcap.c when an SNF compile switch is set. This requires some "pcap open-heart surgery" to look into pcap's opaque handle, but it is more than doable; look at https://github.com/the-tcpdump-group/libpcap/blob/master/pcap-snf.c#L50 for inspiration. It would require users compiling Suricata themselves to get this support, and hopefully not LD_PRELOADing the wrong libpcap, or bad things would happen. I personally don't like it because of that potential accidental misuse; a config option could make it more palatable, but not by much. However, it would provide (native SNF) 64-bit counts without having to implement a full SNF source.
Updated by Roland Fischer over 4 years ago
Sorry, let me know if there is too much chatter from my side.
Anyways, if no one else is looking into this ticket, I can take care of it. Just let me know.
I would implement it as proposed by Victor, keeping the old stats values around, as that seems the most useful solution for now, imho.
As a side note, I can add an "Implement native SNF source" ticket and describe what needs to be done if that is of any help. Depending on how much longer this work-from-home will last and my other work load, I might be able to work on it in a few weeks probably.
Updated by Roland Fischer over 4 years ago
FYI, I do have an implementation for this based on storing the last value in the pcap source as mentioned by Victor. I.e. it handles one-time wrap-around of pcap_stats. Should be able to open an official PR soon...
Updated by Roland Fischer over 4 years ago
Updated by Victor Julien over 4 years ago
- Tracker changed from Support to Optimization
- Status changed from New to In Review
- Assignee changed from Community Ticket to Roland Fischer
- Priority changed from Low to Normal
- Target version set to 6.0.0beta1
- Affected Versions deleted (4.0.6, 4.1.2)
Updated by FATEMA WALA over 4 years ago
I am running Suricata v 5.0.2 with Myricom SNF : 10G-PCIE-8B-S myri_snf 3.0.20.50894
And I notice the values of the capture.kernel_* counters are way more than the u32 limit (2^32) of pcap_stat.
Is it because of Suricata using Myricom's libpcap libraries?
I am a little confused after reading this thread; as per this thread, these counters shouldn't exceed 2^32 (unless the PR by Roland is merged and available in the latest Suricata update, which clearly is not the case as I am still running an old version). Am I missing something here?
Here are some values of these counters from my stats.log file:
------------------------------------------------------------------------------------
Date: 6/18/2020 -- 09:37:18 (uptime: 0d, 02h 47m 30s)
------------------------------------------------------------------------------------
Counter | TM Name | Value
------------------------------------------------------------------------------------
------------------------------------------------------------------------------------
capture.kernel_packets | Total | 67707835013
capture.kernel_drops | Total | 67346525680
more..
$stats.log | grep "capture"
capture.kernel_packets | Total | 68408818324
capture.kernel_drops | Total | 68089493904
capture.kernel_ifdrops | Total | 2800
capture.kernel_packets | Total | 68408994460
capture.kernel_drops | Total | 68089493904
capture.kernel_ifdrops | Total | 2800
Updated by FATEMA WALA over 4 years ago
FATEMA WALA wrote in #note-19: (full quote above)
Ah, nevermind. I realized it's the TOTAL of 16 threads, which means 16 x 2^32 can be expected as the maximum value of the capture.kernel_packets counter.
Sorry, my bad.
Updated by Victor Julien over 4 years ago
- Status changed from In Review to Closed