Project

General

Profile

Actions

Bug #2993

closed

Suricata 5.0.0beta1 memory allocation of 4294966034 bytes failed

Added by Anonymous over 5 years ago. Updated almost 5 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Target version:
Affected Versions:
Effort:
Difficulty:
Label:

Description

Today suricata stopped after running for almost 9h:
May 27 12:05:22 server suricata28762: memory allocation of 4294966034 bytes failed
May 27 12:05:25 server systemd1: suricata.service: main process exited, code=killed, status=6/ABRT
May 27 12:05:25 server systemd1: Unit suricata.service entered failed state.
May 27 12:05:25 server systemd1: suricata.service failed.

Could not find a core file though I configured a home directory and core dump in suricata.yaml
As far as I can see the system was not running out of any resources.

  1. suricata -V
    This is Suricata version 5.0.0-beta1 RELEASE
  1. ldd `which suricata`
    linux-vdso.so.1 => (0x00007ffd769c7000)
    libhtp.so.2 => /lib/libhtp.so.2 (0x00007fd563c32000)
    librt.so.1 => /lib64/librt.so.1 (0x00007fd563a2a000)
    libm.so.6 => /lib64/libm.so.6 (0x00007fd563728000)
    liblz4.so.1 => /lib64/liblz4.so.1 (0x00007fd563513000)
    libGeoIP.so.1 => /lib64/libGeoIP.so.1 (0x00007fd5632e3000)
    libmagic.so.1 => /lib64/libmagic.so.1 (0x00007fd5630c6000)
    libcap-ng.so.0 => /lib64/libcap-ng.so.0 (0x00007fd562ec0000)
    libpfring.so.1 => /usr/local/lib/libpfring.so.1 (0x00007fd562c28000)
    libpcap.so.1 => /usr/local/lib/libpcap.so.1 (0x00007fd562955000)
    libnet.so.1 => /lib64/libnet.so.1 (0x00007fd56273b000)
    libjansson.so.4 => /lib64/libjansson.so.4 (0x00007fd56252e000)
    libyaml-0.so.2 => /lib64/libyaml-0.so.2 (0x00007fd56230e000)
    libhs.so.5 => /usr/local/lib64/libhs.so.5 (0x00007fd5619f5000)
    libpcre.so.1 => /lib64/libpcre.so.1 (0x00007fd561793000)
    libz.so.1 => /lib64/libz.so.1 (0x00007fd56157d000)
    liblzma.so.5 => /lib64/liblzma.so.5 (0x00007fd561357000)
    libssl3.so => /lib64/libssl3.so (0x00007fd561105000)
    libsmime3.so => /lib64/libsmime3.so (0x00007fd560ede000)
    libnss3.so => /lib64/libnss3.so (0x00007fd560bb1000)
    libnssutil3.so => /lib64/libnssutil3.so (0x00007fd560981000)
    libplds4.so => /lib64/libplds4.so (0x00007fd56077d000)
    libplc4.so => /lib64/libplc4.so (0x00007fd560578000)
    libnspr4.so => /lib64/libnspr4.so (0x00007fd56033a000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fd56011e000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007fd55ff1a000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fd55fd04000)
    libc.so.6 => /lib64/libc.so.6 (0x00007fd55f937000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fd563e51000)
    libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fd55f630000)

Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 7.6 (Maipo)
Release: 7.6
Codename: Maipo
Kernel 3.10.0-957.12.1.el7.x86_64

pfring : 7.5.0 dev


Files

suricata.yaml (75.8 KB) suricata.yaml Anonymous, 05/28/2019 08:25 AM
Actions #1

Updated by Peter Manev over 5 years ago

Hi,
Thank you for the report!
Was this a default config run or there were changes made to the memcap/ring settings ?

Actions #2

Updated by Anonymous over 5 years ago

Hi,
  1. cat /etc/modprobe.d/pf_ring.conf
    options pf_ring transparent_mode=0 min_num_slots=32767 enable_tx_capture=0
Actions #3

Updated by Andreas Herz over 5 years ago

  • Assignee set to OISF Dev
  • Target version set to Support

Can you share some more details? Like suricata --build-info, the configuration file and it would be perfect if you can reproduce it with a coredump.

At https://redmine.openinfosecfoundation.org/projects/suricata/wiki/Debugging you can find more information how to debug this.

Actions #4

Updated by Anonymous over 5 years ago

Config file attached, maybe the settings are not correct to create a core dump because I was not able to find one after the crash yesterday?


# suricata --build-info
This is Suricata version 5.0.0-beta1 RELEASE
Features: PCAP_SET_BUFF PF_RING AF_PACKET HAVE_PACKET_FANOUT LIBCAP_NG LIBNET1.1 HAVE_HTP_URI_NORMALIZE_HOOK PCRE_JIT HAVE_NSS HAVE_LIBJANSSON TLS MAGIC RUST 
SIMD support: SSE_4_2 SSE_4_1 SSE_3 
Atomic intrisics: 1 2 4 8 16 byte(s)
64-bits, Little-endian architecture
GCC version 4.8.5 20150623 (Red Hat 4.8.5-36), C version 199901
compiled with -fstack-protector
compiled with _FORTIFY_SOURCE=2
L1 cache line size (CLS)=64
thread local storage method: __thread
compiled with LibHTP v0.5.30, linked against LibHTP v0.5.30

Suricata Configuration:
  AF_PACKET support:                       yes
  eBPF support:                            no
  XDP support:                             no
  PF_RING support:                         yes
  NFQueue support:                         no
  NFLOG support:                           no
  IPFW support:                            no
  Netmap support:                          no 
  DAG enabled:                             no
  Napatech enabled:                        no
  WinDivert enabled:                       no

  Unix socket enabled:                     yes
  Detection enabled:                       yes

  Libmagic support:                        yes
  libnss support:                          yes
  libnspr support:                         yes
  libjansson support:                      yes
  liblzma support:                         yes
  hiredis support:                         no
  hiredis async with libevent:             no
  Prelude support:                         no
  PCRE jit:                                yes
  LUA support:                             no
  libluajit:                               no
  libgeoip:                                yes
  Non-bundled htp:                         no
  Old barnyard2 support:                   no
  Hyperscan support:                       yes
  Libnet support:                          yes
  liblz4 support:                          yes

  Rust support:                            yes
  Rust strict mode:                        no
  Rust debug mode:                         no
  Rust compiler:                           rustc 1.34.0
  Rust cargo:                              cargo 1.34.0

  Python support:                          yes
  Python path:                             /bin/python2.7
  Python version:                          
  Python distutils                         yes
  Python yaml                              yes
  Install suricatactl:                     yes
  Install suricatasc:                      yes
  Install suricata-update:                 yes

  Profiling enabled:                       no
  Profiling locks enabled:                 no

Development settings:
  Coccinelle / spatch:                     no
  Unit tests enabled:                      no
  Debug output enabled:                    no
  Debug validation enabled:                no

Generic build parameters:
  Installation prefix:                     /usr
  Configuration directory:                 /etc/suricata/
  Log directory:                           /var/log/suricata/

  --prefix                                 /usr
  --sysconfdir                             /etc
  --localstatedir                          /var
  --datarootdir                            /usr/share

  Host:                                    x86_64-pc-linux-gnu
  Compiler:                                gcc (exec name) / gcc (real)
  GCC Protect enabled:                     yes
  GCC march native enabled:                yes
  GCC Profile enabled:                     no
  Position Independent Executable enabled: no
  CFLAGS                                   -g -O2 -march=native -I${srcdir}/../rust/gen/c-headers
  PCAP_CFLAGS                               -I/usr/local/include
  SECCFLAGS                                -fstack-protector -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security

Actions #5

Updated by Peter Manev over 5 years ago

How much memory you have on the box btw in total? (and is it only suricata running or something else that can be resource intensive ?)

Actions #6

Updated by Anonymous over 5 years ago

64Gb RAM available and indeed an ELK stack consumes almost half of the available RAM, but this setup has been running for years quite stable. On the other we want/need to move the Elasticsearch setup to a newly to build cluster.

Actions #7

Updated by Peter Manev over 5 years ago

I think it is a good candidate reason in some conditions (aka an intense ES search) for not being able to have that needed extra allocation. In general yes - could be better if things are separated as i suspect resource contention would also occur(if not yet) in terms of CPU too.

Actions #8

Updated by Victor Julien over 5 years ago

I don't think this has anything to do with the system or what else is running on it. The error suggests suri tried to alloc almost 4GB in a single malloc, which is certainly a bug. It smells like an int underflow.

Actions #9

Updated by Anonymous over 5 years ago

Indeed, but I had a look at some system resources when the crash occured, but neither Zabbix nor sar did mention lack of memory or cpu. Oke is is a busy system, but that keeps the dust away ;)

Actions #10

Updated by Peter Manev over 5 years ago

Does it happen repeatedly ?

Actions #11

Updated by Anonymous over 5 years ago

Running v5 since May second, it happend twice so far:

2019-05-27T12:05:22.331122+02:00 server suricata: memory allocation of 4294966034 bytes failed
2019-05-21T09:44:12.279565+02:00 server suricata: memory allocation of 4294926257 bytes failed

Suricata is restarted daily to rotate logs.

Actions #12

Updated by Peter Manev over 5 years ago

Is it possible to correlate the sys/ram utilization at that time ? Just to make sure it is not sys/mem related I was thinking.

Actions #13

Updated by Anonymous over 5 years ago

May 27 crash at 12:05 :


                CPU     %user     %nice   %system   %iowait    %steal     %idle
11:30:02 AM     all     38.47      5.57     12.01      3.43      0.00     40.53
11:40:02 AM     all     37.36      5.61     11.57      4.06      0.00     41.40
11:50:01 AM     all     38.84      5.73     11.63      4.01      0.00     39.79
12:00:02 PM     all     38.99      5.63     12.03      3.78      0.00     39.57
12:10:01 PM     all     17.96      2.53      5.88      2.80      0.00     70.82
12:20:01 PM     all      1.27      0.08      0.46      0.17      0.00     98.02
12:30:01 PM     all      1.06      0.07      0.43      0.05      0.00     98.39

            kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
11:30:02 AM    429940  65502432     99.35      2784   2184880  42437912     57.10  27393028   3474244     87552
11:40:02 AM    434328  65498044     99.34      2732   2262996  42416296     57.07  27314036   3541212     26828
11:50:01 AM    533528  65398844     99.19      2568   1451616  42286528     56.90  27520568   3257140     27148
12:00:02 PM    598556  65333816     99.09      2700   1085900  42580564     57.29  27543920   3160296     78272
12:10:01 PM  26531992  39400380     59.76      6572   4556032  35589444     47.89   2030348   4324928      2220
12:20:01 PM  25655172  40277200     61.09     23952   5361284  35609344     47.91   2549072   4669684      3396
12:30:01 PM  25304540  40627832     61.62     24840   5701996  35586908     47.88   2782596   4777068      2944

             kbswpfree kbswpused  %swpused  kbswpcad   %swpcad
11:30:02 AM   2970544   5418060     64.59    232280      4.29
11:40:02 AM   2714704   5673900     67.64    229024      4.04
11:50:01 AM   2390440   5998164     71.50    249900      4.17
12:00:02 PM   2020524   6368080     75.91    220356      3.46
12:10:01 PM   3570620   4817984     57.43     91680      1.90
12:20:01 PM   3576252   4812352     57.37    121428      2.52
12:30:01 PM   3579580   4809024     57.33    124324      2.59

May 21 crash at 09:44 :


                CPU     %user     %nice   %system   %iowait    %steal     %idle
09:00:01 AM     all     33.32      7.42      9.71      2.61      0.00     46.94
09:10:01 AM     all     32.51      6.73     10.10      2.17      0.00     48.49
09:20:02 AM     all     32.09      6.75     10.27      2.50      0.00     48.40
09:30:02 AM     all     32.20      6.41      9.55      2.37      0.00     49.46
09:40:02 AM     all     34.08      6.48     10.17      4.98      0.00     44.29
09:50:01 AM     all     16.93      2.75      5.31      2.47      0.00     72.54
10:00:02 AM     all      0.94      0.08      0.45      0.05      0.00     98.49

            kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
09:00:01 AM    684396  65247976     98.96      6620  12489868  39250032     52.81  21870676   8354172    313188
09:10:01 AM    768752  65163620     98.83      6884  10145636  40047656     53.88  22893100   7287660    417328
09:20:02 AM    560148  65372224     99.15      6848   7709440  40094496     53.95  24282280   6187248    390432
09:30:02 AM    669136  65263236     98.99      5820   6772564  39834916     53.60  24672960   5765980    261164
09:40:02 AM    436924  65495448     99.34      2392    774276  40230372     54.13  27868580   3037652     32116
09:50:01 AM  29049272  36883100     55.94     25652   2155684  35554456     47.84   2226176   1659824      1304
10:00:02 AM  28781336  37151036     56.35     26612   2405776  35554344     47.84   2534504   1611564        96

Actions #14

Updated by Anonymous over 5 years ago


             kbswpfree kbswpused  %swpused  kbswpcad   %swpcad
09:00:01 AM   3479724   4908880     58.52    301672      6.15
09:10:01 AM   3473240   4915364     58.60    300224      6.11
09:20:02 AM   3468888   4919716     58.65    278480      5.66
09:30:02 AM   3457368   4931236     58.78    240940      4.89
09:40:02 AM   2628548   5760056     68.67    122640      2.13
09:50:01 AM   3331300   5057304     60.29    125436      2.48
10:00:02 AM   3338468   5050136     60.20    128456      2.54
Actions #15

Updated by Andreas Herz over 5 years ago

Might be worth to test AF_PACKET instead of PF_RING and a more recent kernel maybe?

Actions #16

Updated by Victor Julien about 5 years ago

I think it would be good to set a maximum to what a single allocation can try to get. Determining what that max would be is of course the trick.

Actions #17

Updated by Victor Julien almost 5 years ago

Possibly related:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007fb803076801 in __GI_abort () at abort.c:79
#2  0x00005619886e5367 in std::sys::unix::abort_internal ()
#3  0x00005619886f39fd in rust_oom ()
#4  0x0000561988708b67 in alloc::alloc::handle_alloc_error ()
#5  0x000056198860a181 in alloc::raw_vec::RawVec<T,A>::allocate_in (cap=4294965836, zeroed=<optimized out>, a=...) at /usr/src/rustc-1.37.0/src/liballoc/raw_vec.rs:99
#6  alloc::raw_vec::RawVec<T>::with_capacity_zeroed (cap=4294965836) at /usr/src/rustc-1.37.0/src/liballoc/raw_vec.rs:145
#7  <u8 as alloc::vec::SpecFromElem>::from_elem (elem=0, n=4294965836) at /usr/src/rustc-1.37.0/src/liballoc/vec.rs:1635
#8  alloc::vec::from_elem (elem=0, n=4294965836) at /usr/src/rustc-1.37.0/src/liballoc/vec.rs:1614
#9  suricata::smb::smb::SMBState::parse_tcp_data_tc_gap (self=0x7fb7f47b6630, gap_size=<optimized out>) at src/smb/smb.rs:1734
#10 rs_smb_parse_response_tcp_gap (state=state@entry=0x7fb7f47b6630, input_len=<optimized out>, input_len@entry=4294965836) at src/smb/smb.rs:1864
#11 0x00005619883f766e in SMBTCPParseResponse (f=0x56198a88b380, state=0x7fb7f47b6630, pstate=0x7fb7f4611760, input=0x0, input_len=4294965836, local_data=0x0, flags=24 '\030') at app-layer-smb.c:68
#12 0x00005619883f6e06 in AppLayerParserParse (tv=tv@entry=0x56199f507920, alp_tctx=0x7fb7f429eeb0, f=f@entry=0x56198a88b380, alproto=8, flags=<optimized out>, input=input@entry=0x0,
    input_len=input_len@entry=4294965836) at app-layer-parser.c:1225
#13 0x00005619883b1cd6 in AppLayerHandleTCPData (tv=tv@entry=0x56199f507920, ra_ctx=ra_ctx@entry=0x7fb7f429ea90, p=p@entry=0x7fb7f4268180, f=0x56198a88b380, ssn=ssn@entry=0x7fb7f4359620,
    stream=stream@entry=0x7fb7fd3ae748, data=<optimized out>, data@entry=0x0, data_len=<optimized out>, flags=<optimized out>) at app-layer.c:601
#14 0x00005619885483d0 in ReassembleUpdateAppLayer (dir=<optimized out>, p=<optimized out>, stream=<optimized out>, ssn=<optimized out>, ra_ctx=<optimized out>, tv=<optimized out>)
    at stream-tcp-reassemble.c:1012
#15 StreamTcpReassembleAppLayer (tv=tv@entry=0x56199f507920, ra_ctx=ra_ctx@entry=0x7fb7f429ea90, ssn=ssn@entry=0x7fb7f4359620, stream=<optimized out>, stream@entry=0x7fb7f4359630, p=p@entry=0x7fb7f4268180,
    dir=dir@entry=UPDATE_DIR_PACKET) at stream-tcp-reassemble.c:1123
#16 0x000056198854905c in StreamTcpReassembleHandleSegment (tv=tv@entry=0x56199f507920, ra_ctx=0x7fb7f429ea90, ssn=ssn@entry=0x7fb7f4359620, stream=0x7fb7f4359630, p=p@entry=0x7fb7f4268180,
    pq=pq@entry=0x7fb7f429e738) at stream-tcp-reassemble.c:1788
#17 0x000056198853e33f in HandleEstablishedPacketToClient (stt=<optimized out>, pq=<optimized out>, p=<optimized out>, ssn=<optimized out>, tv=<optimized out>) at stream-tcp.c:2408
#18 StreamTcpPacketStateEstablished (tv=0x56199f507920, p=0x7fb7f4268180, stt=stt@entry=0x7fb7f429e730, ssn=0x7fb7f4359620, pq=0x7fb7f429e738) at stream-tcp.c:2645
#19 0x0000561988541870 in StreamTcpStateDispatch (tv=0x56199f507920, p=0x7fb7f4268180, stt=0x7fb7f429e730, ssn=0x7fb7f4359620, pq=0x7fb7f429e738, state=<optimized out>) at stream-tcp.c:4650
#20 0x0000561988543bd9 in StreamTcpPacket (tv=0x56199f507920, p=0x7fb7f4268180, stt=0x7fb7f429e730, pq=0x7fb7f428d5f8) at stream-tcp.c:4838
#21 0x0000561988544505 in StreamTcp (tv=tv@entry=0x56199f507920, p=p@entry=0x7fb7f4268180, data=<optimized out>, pq=pq@entry=0x7fb7f428d5f8, postpq=postpq@entry=0x0) at stream-tcp.c:5174
#22 0x00005619884cc4b2 in FlowWorker (tv=0x56199f507920, p=0x7fb7f4268180, data=0x7fb7f428d5d0, preq=0x56199f507d00, unused=<optimized out>) at flow-worker.c:245
#23 0x000056198855308c in TmThreadsSlotVarRun (tv=tv@entry=0x56199f507920, p=p@entry=0x7fb7f4268180, slot=slot@entry=0x56199f507b80) at tm-threads.c:130
#24 0x000056198852ae4e in TmThreadsSlotProcessPkt (p=0x7fb7f4268180, s=0x56199f507b80, tv=0x56199f507920) at tm-threads.h:162
#25 AFPReadFromRing (ptv=<optimized out>) at source-af-packet.c:1025
#26 0x000056198852ecd6 in ReceiveAFPLoop (tv=0x56199f507920, data=<optimized out>, slot=<optimized out>) at source-af-packet.c:1589
#27 0x000056198855430d in TmThreadsSlotPktAcqLoop (td=0x56199f507920) at tm-threads.c:335
#28 0x00007fb803cf36db in start_thread (arg=0x7fb7fd3b0700) at pthread_create.c:463
#29 0x00007fb80315788f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Actions #18

Updated by Victor Julien almost 5 years ago

  • Status changed from New to Assigned
  • Assignee changed from OISF Dev to Victor Julien
  • Target version changed from Support to 5.0.2
  • Affected Versions 5.0.0, 5.0.1 added
Actions #19

Updated by Victor Julien almost 5 years ago

  • Status changed from Assigned to Closed
Actions

Also available in: Atom PDF