Optimization #2845


Counter for kernel_packets decreases at times without restart

Added by Eric Urban over 5 years ago. Updated over 4 years ago.

Status:
Closed
Priority:
Normal
Target version:
6.0.0beta1
Effort:
low
Difficulty:
low

Description

We have seen cases in Suricata where the stats.capture.kernel_packets counter decreases while Suricata is running. My understanding is that this is supposed to be a running counter that should not decrease unless Suricata is restarted. We have observed this behavior on 4.0.6 and 4.1.2, and I am fairly confident I have also seen it on 3.2.2. The decrease would be more expected if the value had reset or rolled over from overflow, but I don't believe that is what is happening here.

Below is one example from the logs I am attaching. I have many other logs I can provide if desired.

$ jq 'select(.event_type == "stats") | select(.timestamp | startswith("2019-02-22T07:55:")) | .timestamp, .stats.capture' eve.json_stats_only_08-snf3-2019022208

...
"2019-02-22T07:55:36.000327-0600" 
{
  "kernel_packets": 17308040184,
  "kernel_packets_delta": 1039779,
  "kernel_drops": 0,
  "kernel_drops_delta": 0,
  "kernel_ifdrops": 0,
  "kernel_ifdrops_delta": 0
}
"2019-02-22T07:55:45.000335-0600" 
{
  "kernel_packets": 13013890235,
  "kernel_packets_delta": -4294149949,
  "kernel_drops": 0,
  "kernel_drops_delta": 0,
  "kernel_ifdrops": 0,
  "kernel_ifdrops_delta": 0
}
"2019-02-22T07:55:54.000320-0600" 
{
  "kernel_packets": 13014866476,
  "kernel_packets_delta": 976241,
  "kernel_drops": 0,
  "kernel_drops_delta": 0,
  "kernel_ifdrops": 0,
  "kernel_ifdrops_delta": 0
}

The corresponding entries from stats.log:

Date: 2/22/2019 -- 07:55:36 (uptime: 2d, 21h 14m 54s)
------------------------------------------------------------------------------------
Counter                                    | TM Name                   | Value
------------------------------------------------------------------------------------
capture.kernel_packets                     | Total                     | 17308040184
------------------------------------------------------------------------------------

Date: 2/22/2019 -- 07:55:45 (uptime: 2d, 21h 15m 03s)
------------------------------------------------------------------------------------
Counter                                    | TM Name                   | Value
------------------------------------------------------------------------------------
capture.kernel_packets                     | Total                     | 13013890235
------------------------------------------------------------------------------------

Date: 2/22/2019 -- 07:55:54 (uptime: 2d, 21h 15m 12s)
------------------------------------------------------------------------------------
Counter                                    | TM Name                   | Value
------------------------------------------------------------------------------------
capture.kernel_packets                     | Total                     | 13014866476
------------------------------------------------------------------------------------

Here are more examples from other Suricata instances. Their logs are not attached, but I am including the entries for reference:

"2019-02-22T15:09:00.000327-0600" 
{
  "kernel_packets": 15681829155,
  "kernel_packets_delta": -4294025171,
  "kernel_drops": 0,
  "kernel_drops_delta": 0,
  "kernel_ifdrops": 0,
  "kernel_ifdrops_delta": 0
}
"2019-02-22T03:18:51.000325-0600" 
{
  "kernel_packets": 15980883154,
  "kernel_packets_delta": -4293598551,
  "kernel_drops": 0,
  "kernel_drops_delta": 0,
  "kernel_ifdrops": 0,
  "kernel_ifdrops_delta": 0
}
"2019-02-19T10:22:00.000363-0600" 
{
  "kernel_packets": 17749102321,
  "kernel_packets_delta": -4294216445,
  "kernel_drops": 2227794327,
  "kernel_drops_delta": 0,
  "kernel_ifdrops": 0,
  "kernel_ifdrops_delta": 0
}
"2019-02-19T10:17:40.000327-0600" 
{
  "kernel_packets": 16791755239,
  "kernel_packets_delta": -4294006615,
  "kernel_drops": 1280457873,
  "kernel_drops_delta": 0,
  "kernel_ifdrops": 0,
  "kernel_ifdrops_delta": 0
}
"2019-02-19T09:30:35.000346-0600" 
{
  "kernel_packets": 17342905685,
  "kernel_packets_delta": -4294369072,
  "kernel_drops": 580833306,
  "kernel_drops_delta": 0,
  "kernel_ifdrops": 0,
  "kernel_ifdrops_delta": 0
}
"2019-02-19T09:25:05.000338-0600" 
{
  "kernel_packets": 23570036423,
  "kernel_packets_delta": -4293688281,
  "kernel_drops": 775213362,
  "kernel_drops_delta": 0,
  "kernel_ifdrops": 0,
  "kernel_ifdrops_delta": 0
}
"2019-02-19T08:51:53.000331-0600" 
{
  "kernel_packets": 12005768232,
  "kernel_packets_delta": -4294159125,
  "kernel_drops": 4547641950,
  "kernel_drops_delta": 0,
  "kernel_ifdrops": 0,
  "kernel_ifdrops_delta": 0
}
"2019-02-19T08:51:03.000358-0600" 
{
  "kernel_packets": 22256188092,
  "kernel_packets_delta": -4294023378,
  "kernel_drops": 722622375,
  "kernel_drops_delta": 0,
  "kernel_ifdrops": 0,
  "kernel_ifdrops_delta": 0
}

I do not see any messages in the suricata.log file during this time.

Is this behavior expected, and if not, what additional troubleshooting would you like us to perform to assist with this issue?


Files

stats_08-snf3.log-2019022208.gz (124 KB) - Stats log - Eric Urban, 02/22/2019 09:11 PM
eve.json_stats_only_08-snf3-2019022208.gz (203 KB) - Eve log (filtered for only stats events) - Eric Urban, 02/22/2019 09:12 PM
stats.log-2019030413.gz (2.68 MB) - Eric Urban, 03/05/2019 06:51 PM
Actions #1

Updated by Eric Urban over 5 years ago

Additional info:

Running CentOS Linux release 7.6.1810 (Core)

Myricom 10G-PCIE2-8C2-2S with SNFv3 (Sniffer 3) driver versions 3.0.14.50843 and 3.0.15.50857

Suricata is using the PCAP capture method (via the --pcap command-line option) with the workers runmode

Actions #2

Updated by Victor Julien over 5 years ago

It's possible to log the stats per thread. Could you try that? One possible cause I can think of is that some threads either stop reporting stats for a while (so they stop contributing to the sum of the thread values) or somehow reset their stats.
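
In suricata.yaml that looks something like the following (a minimal sketch; the option names follow the stock stats output section, so check the default config that ships with your version):

outputs:
  - stats:
      enabled: yes
      filename: stats.log
      totals: yes     # cumulative counters summed over all threads
      threads: yes    # also log one line per worker thread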

Actions #3

Updated by Eric Urban over 5 years ago

I made this change yesterday but will need to wait until we get a negative delta.

Actions #4

Updated by Eric Urban over 5 years ago

We encountered this three times recently with stats per thread enabled.

Here is what stats.log shows (which now looks like rollover):

Case 1:

Date: 3/4/2019 -- 12:43:15 (uptime: 6d, 00h 40m 50s)
------------------------------------------------------------------------------------
Counter                                    | TM Name                   | Value
------------------------------------------------------------------------------------
capture.kernel_packets                     | Total                     | 31856634887
capture.kernel_packets                     | W#01-snf0                 | 3674776885
capture.kernel_packets                     | W#02-snf0                 | 4294933329
capture.kernel_packets                     | W#03-snf0                 | 3846012591
capture.kernel_packets                     | W#04-snf0                 | 4091485723
capture.kernel_packets                     | W#05-snf0                 | 3515277825
capture.kernel_packets                     | W#06-snf0                 | 3560094513
capture.kernel_packets                     | W#07-snf0                 | 3955686633
capture.kernel_packets                     | W#08-snf0                 | 689417483
capture.kernel_packets                     | W#09-snf0                 | 4228949905
------------------------------------------------------------------------------------
Date: 3/4/2019 -- 12:43:25 (uptime: 6d, 00h 41m 00s)
------------------------------------------------------------------------------------
Counter                                    | TM Name                   | Value
------------------------------------------------------------------------------------
capture.kernel_packets                     | Total                     | 27562916233
capture.kernel_packets                     | W#01-snf0                 | 3674912821
capture.kernel_packets                     | W#02-snf0                 | 48425
capture.kernel_packets                     | W#03-snf0                 | 3846094371
capture.kernel_packets                     | W#04-snf0                 | 4091538831
capture.kernel_packets                     | W#05-snf0                 | 3515371879
capture.kernel_packets                     | W#06-snf0                 | 3560184719
capture.kernel_packets                     | W#07-snf0                 | 3956273142
capture.kernel_packets                     | W#08-snf0                 | 689468748
capture.kernel_packets                     | W#09-snf0                 | 4229023297
------------------------------------------------------------------------------------
Date: 3/4/2019 -- 12:43:35 (uptime: 6d, 00h 41m 10s)
------------------------------------------------------------------------------------
Counter                                    | TM Name                   | Value
------------------------------------------------------------------------------------
capture.kernel_packets                     | Total                     | 27564390368
capture.kernel_packets                     | W#01-snf0                 | 3675238778
capture.kernel_packets                     | W#02-snf0                 | 107570
capture.kernel_packets                     | W#03-snf0                 | 3846173711
capture.kernel_packets                     | W#04-snf0                 | 4091614569
capture.kernel_packets                     | W#05-snf0                 | 3515459369
capture.kernel_packets                     | W#06-snf0                 | 3560261997
capture.kernel_packets                     | W#07-snf0                 | 3956906711
capture.kernel_packets                     | W#08-snf0                 | 689517589
capture.kernel_packets                     | W#09-snf0                 | 4229110074
------------------------------------------------------------------------------------

Case 2:

Date: 3/4/2019 -- 20:53:58 (uptime: 6d, 08h 51m 33s)
------------------------------------------------------------------------------------
Counter                                    | TM Name                   | Value
------------------------------------------------------------------------------------
capture.kernel_packets                     | Total                     | 21287251135
capture.kernel_packets                     | W#01-snf0                 | 3885782812
capture.kernel_packets                     | W#02-snf0                 | 195681170
capture.kernel_packets                     | W#03-snf0                 | 4030967035
capture.kernel_packets                     | W#04-snf0                 | 4294929626
capture.kernel_packets                     | W#05-snf0                 | 3690908901
capture.kernel_packets                     | W#06-snf0                 | 3795432023
capture.kernel_packets                     | W#07-snf0                 | 67179936
capture.kernel_packets                     | W#08-snf0                 | 1088286168
capture.kernel_packets                     | W#09-snf0                 | 238083464
------------------------------------------------------------------------------------
Date: 3/4/2019 -- 20:54:08 (uptime: 6d, 08h 51m 43s)
------------------------------------------------------------------------------------
Counter                                    | TM Name                   | Value
------------------------------------------------------------------------------------
capture.kernel_packets                     | Total                     | 16992716963
capture.kernel_packets                     | W#01-snf0                 | 3885847698
capture.kernel_packets                     | W#02-snf0                 | 195736417
capture.kernel_packets                     | W#03-snf0                 | 4030987599
capture.kernel_packets                     | W#04-snf0                 | 35986
capture.kernel_packets                     | W#05-snf0                 | 3690939607
capture.kernel_packets                     | W#06-snf0                 | 3795475350
capture.kernel_packets                     | W#07-snf0                 | 67239312
capture.kernel_packets                     | W#08-snf0                 | 1088323942
capture.kernel_packets                     | W#09-snf0                 | 238131052
------------------------------------------------------------------------------------
Date: 3/4/2019 -- 20:54:18 (uptime: 6d, 08h 51m 53s)
------------------------------------------------------------------------------------
Counter                                    | TM Name                   | Value
------------------------------------------------------------------------------------
capture.kernel_packets                     | Total                     | 16993260554
capture.kernel_packets                     | W#01-snf0                 | 3885915442
capture.kernel_packets                     | W#02-snf0                 | 195803255
capture.kernel_packets                     | W#03-snf0                 | 4031007545
capture.kernel_packets                     | W#04-snf0                 | 149232
capture.kernel_packets                     | W#05-snf0                 | 3690968960
capture.kernel_packets                     | W#06-snf0                 | 3795511142
capture.kernel_packets                     | W#07-snf0                 | 67335413
capture.kernel_packets                     | W#08-snf0                 | 1088360742
capture.kernel_packets                     | W#09-snf0                 | 238208823
------------------------------------------------------------------------------------

Case 3:

Date: 3/5/2019 -- 08:05:06 (uptime: 6d, 21h 04m 04s)
------------------------------------------------------------------------------------
Counter                                    | TM Name                   | Value
------------------------------------------------------------------------------------
capture.kernel_packets                     | Total                     | 31126576613
capture.kernel_packets                     | W#01-snf0                 | 3646951621
capture.kernel_packets                     | W#02-snf0                 | 4294902636
capture.kernel_packets                     | W#03-snf0                 | 3627729495
capture.kernel_packets                     | W#04-snf0                 | 3948373796
capture.kernel_packets                     | W#05-snf0                 | 3293425221
capture.kernel_packets                     | W#06-snf0                 | 3386946249
capture.kernel_packets                     | W#07-snf0                 | 4041643843
capture.kernel_packets                     | W#08-snf0                 | 700726492
capture.kernel_packets                     | W#09-snf0                 | 4185877260
------------------------------------------------------------------------------------
Date: 3/5/2019 -- 08:05:16 (uptime: 6d, 21h 04m 14s)
------------------------------------------------------------------------------------
Counter                                    | TM Name                   | Value
------------------------------------------------------------------------------------
capture.kernel_packets                     | Total                     | 26832214503
capture.kernel_packets                     | W#01-snf0                 | 3647014810
capture.kernel_packets                     | W#02-snf0                 | 21714
capture.kernel_packets                     | W#03-snf0                 | 3627756678
capture.kernel_packets                     | W#04-snf0                 | 3948410543
capture.kernel_packets                     | W#05-snf0                 | 3293498165
capture.kernel_packets                     | W#06-snf0                 | 3387020464
capture.kernel_packets                     | W#07-snf0                 | 4041702493
capture.kernel_packets                     | W#08-snf0                 | 700764713
capture.kernel_packets                     | W#09-snf0                 | 4186024923
------------------------------------------------------------------------------------
Date: 3/5/2019 -- 08:05:26 (uptime: 6d, 21h 04m 24s)
------------------------------------------------------------------------------------
Counter                                    | TM Name                   | Value
------------------------------------------------------------------------------------
capture.kernel_packets                     | Total                     | 26832983650
capture.kernel_packets                     | W#01-snf0                 | 3647065229
capture.kernel_packets                     | W#02-snf0                 | 364829
capture.kernel_packets                     | W#03-snf0                 | 3627780433
capture.kernel_packets                     | W#04-snf0                 | 3948497668
capture.kernel_packets                     | W#05-snf0                 | 3293516751
capture.kernel_packets                     | W#06-snf0                 | 3387097606
capture.kernel_packets                     | W#07-snf0                 | 4041746104
capture.kernel_packets                     | W#08-snf0                 | 700792834
capture.kernel_packets                     | W#09-snf0                 | 4186122196
------------------------------------------------------------------------------------

I am attaching one of the logs, though I don't know if it is necessary. I am hoping to look into a fix for this if I get a chance, but don't know if and when that will happen :)

Actions #5

Updated by Victor Julien over 5 years ago

That looks suspiciously like a u32 overflow. First the value is very close to 2^32, then it's very low.
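
The per-thread numbers in case 1 above fit a single wrap exactly: W#02-snf0 goes from 4294933329 to 48425, and (2^32 - 4294933329) + 48425 = 33967 + 48425 = 82392 packets in that 10-second interval, which is in line with the deltas the other threads show.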

Actions #6

Updated by Victor Julien over 5 years ago

Yup, from pcap.h:

/*
 * As returned by the pcap_stats()
 */
struct pcap_stat {
    u_int ps_recv;      /* number of packets received */
    u_int ps_drop;      /* number of packets dropped */
    u_int ps_ifdrop;    /* drops by interface -- only supported on some platforms */
#if defined(_WIN32) && defined(HAVE_REMOTE)
    u_int ps_capt;      /* number of packets that reach the application */
    u_int ps_sent;      /* number of packets sent by the server on the network */
    u_int ps_netdrop;   /* number of packets lost on the network */
#endif /* _WIN32 && HAVE_REMOTE */
};

Suri takes those values and sets them directly to the stats API.

static inline void PcapDumpCounters(PcapThreadVars *ptv)
{
    struct pcap_stat pcap_s;
    if (likely((pcap_stats(ptv->pcap_handle, &pcap_s) >= 0))) {
        StatsSetUI64(ptv->tv, ptv->capture_kernel_packets, pcap_s.ps_recv);
        StatsSetUI64(ptv->tv, ptv->capture_kernel_drops, pcap_s.ps_drop);
        (void) SC_ATOMIC_SET(ptv->livedev->drop, pcap_s.ps_drop);
        StatsSetUI64(ptv->tv, ptv->capture_kernel_ifdrops, pcap_s.ps_ifdrop);
    }
}

So when the pcap counters overflow, Suricata just sets the wrapped values.

Actions #7

Updated by Victor Julien over 5 years ago

  • Assignee set to Community Ticket
  • Effort set to low
  • Difficulty set to low

I guess the way to address this would be to keep track of the previous stats values, detect the overflow, calculate how many packets there must have been, and then feed that to our stats API.
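
A minimal sketch of that approach (illustrative only, not an actual patch; the helper name and layout are made up here). It assumes the u32 counter wraps at most once between two consecutive samples, which holds as long as stats are read often enough:

#include <stdint.h>

/* Accumulate a 32-bit pcap counter into a 64-bit running total. */
static uint64_t UpdateCounter64(uint64_t total, uint32_t prev, uint32_t cur)
{
    if (cur >= prev) {
        /* normal monotonic increase since the last sample */
        total += (uint64_t)cur - prev;
    } else {
        /* the u32 counter wrapped: count up to UINT32_MAX, the wrap
         * itself, and then the new value */
        total += ((uint64_t)UINT32_MAX - prev) + 1 + cur;
    }
    return total;
}

PcapDumpCounters() would then keep the previous pcap_stat sample in PcapThreadVars and pass the accumulated 64-bit totals to StatsSetUI64() instead of the raw values.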

Actions #8

Updated by Victor Julien over 5 years ago

  • Target version set to TBD
Actions #9

Updated by Fabian Franz over 4 years ago

Hey,
I just wanted to chime in because I see the exact same problem as Eric reported.
Is there currently a plan to fix this?

Actions #10

Updated by Peter Manev over 4 years ago

What NIC, runmode and Suricata version are you using?

Actions #11

Updated by Fabian Franz over 4 years ago

I am using a Myricom 10G-PCIE2-8C2-2S capture card (SNF 3.0.20) and Suricata 5.0.2 in workers runmode.

Actions #12

Updated by Peter Manev over 4 years ago

Ok, thank you for confirming.

Actions #13

Updated by Roland Fischer over 4 years ago

I might be able to help out with the implementation, but I am not sure I'd be able to test it at home. Don't have access to work equipment. Unsure whether I'll have the time to set up the test env.

FYI, yup, pcap 32-bit counters are "interesting". We had to fix the same thing at work for another project, and fixed it the way proposed here, by storing the old value and doing wrap-around subtraction. Mind you, by the time we implemented that we had a native SNF driver. Moreover, at 10 Gbps and the worst-case minimal Ethernet packet size (60 bytes, or 84 on the wire with preamble, gap and CRC), you need to read your stats no later than roughly every 4.5 minutes.
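
(For reference, the math behind that figure: at 10 Gbps with 84 bytes per packet on the wire, 10e9 / (84 * 8) is about 14.9M packets/s, and 2^32 / 14.9e6 is about 288 seconds, so a u32 packet counter can wrap in under five minutes at line rate.)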

A longer term plan could be to get a 64-bit pcap_stats_ex() or so into libpcap, but that would require Myricom/SNF libpcap updates as well.

A hail mary would be to ask Aria/Myricom whether they'd be willing to provide an additional, non-pcap-conforming, 64-bit stats API for their libpcap, or to expose the internal ring handle so that we could use the SNF-native snf_ring_stats().

Another longer term plan is to implement an SNF source. This has not been a prio for us so far, so I am doubtful it will be in the future.

FYI, I quickly checked whether Myricom/Sniffer v3 provides extended stats in their libpcap, and whether you could possibly recompile Suricata with Myricom's pcap.h to get 64-bit counters, but alas no. That makes sense, as they want to keep ABI compatibility with the normal libpcap/pcap_stats() so as to be a drop-in replacement.

Actions #14

Updated by Roland Fischer over 4 years ago

FYI, I was talking about my company when saying "SNF source has not been a prio for us" in the comment above, not about the Suricata project. I am just a bystander here.

Browsing through the source code, I realize that stats are collected every second in Suricata. My bad, ignore the "4.5 mins" statement above.

Just for completeness' sake, another option, in case there's a "have you thought about that" discussion:
Instead of waiting for a libpcap 64-bit stats implementation, implement it in Suricata. Essentially, implement an alternative "64-bit pcap SNF stats call" to use instead of the normal PcapDumpCounters() in source-pcap.c if some SNF compile switch is set. It requires some "pcap open-heart surgery" to look into pcap's opaque handle, but is more than doable. Look at https://github.com/the-tcpdump-group/libpcap/blob/master/pcap-snf.c#L50 for inspiration. It would require clients compiling Suricata themselves to get this support, and hopefully not LD_PRELOADing the wrong libpcap, or bad things would happen; I personally don't like it because of that potential for accidental misuse. A config option could make this more palatable, but not by much. However, it would provide (native SNF) 64-bit counts without having to implement a full SNF source.

Actions #15

Updated by Roland Fischer over 4 years ago

Sorry, let me know if there is too much chatter from my side.

Anyways, if no one else is looking into this ticket, I can take care of it. Just let me know.

I would implement it as proposed by Victor, keeping the old stats around, as that seems the most useful solution for now, imho.

As a side note, I can add an "Implement native SNF source" ticket and describe what needs to be done, if that is of any help. Depending on how much longer this work-from-home lasts and on my other workload, I might be able to work on it in a few weeks.

Actions #16

Updated by Roland Fischer over 4 years ago

FYI, I do have an implementation for this based on storing the last value in the pcap source, as mentioned by Victor, i.e. it handles a one-time wrap-around of pcap_stats. I should be able to open an official PR soon...

Actions #18

Updated by Victor Julien over 4 years ago

  • Tracker changed from Support to Optimization
  • Status changed from New to In Review
  • Assignee changed from Community Ticket to Roland Fischer
  • Priority changed from Low to Normal
  • Target version set to 6.0.0beta1
  • Affected Versions deleted (4.0.6, 4.1.2)
Actions #19

Updated by FATEMA WALA over 4 years ago

I am running Suricata v5.0.2 with Myricom SNF: 10G-PCIE-8B-S, myri_snf 3.0.20.50894.

And I notice the value of the capture.kernel_* counters is way more than the u32 limit (2^32) of pcap_stat.
Is it because of Suricata using Myricom's libpcap libraries?
I am a little confused after reading this thread, as per this thread these counters shouldn't exceed 2^32 (unless the PR by Roland is merged and available in the latest Suri update, which clearly is not the case as I am still running an old version of Suri). Am I missing something here?

Here are some values of these counters from my stats.log file:

------------------------------------------------------------------------------------
Date: 6/18/2020 -- 09:37:18 (uptime: 0d, 02h 47m 30s)
------------------------------------------------------------------------------------
Counter | TM Name | Value
------------------------------------------------------------------------------------
capture.kernel_packets | Total | 67707835013
capture.kernel_drops | Total | 67346525680

more..

$stats.log | grep "capture"

capture.kernel_packets | Total | 68408818324
capture.kernel_drops | Total | 68089493904
capture.kernel_ifdrops | Total | 2800
capture.kernel_packets | Total | 68408994460
capture.kernel_drops | Total | 68089493904
capture.kernel_ifdrops | Total | 2800

Actions #20

Updated by FATEMA WALA over 4 years ago

FATEMA WALA wrote in #note-19:

[...]

Ah, never mind. I realized it's the Total over 16 threads, which means 16 x 2^32 can be expected as the maximum value of capture.kernel_packets. Sorry, my bad.
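
(For reference: 16 x 2^32 = 68719476736, and the totals above, e.g. 68408818324, are indeed just below that bound.)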

Actions #21

Updated by Victor Julien over 4 years ago

  • Status changed from In Review to Closed