Bug #4219
Closed
Suricata 6.0.1 segfault
Added by Anonymous about 4 years ago. Updated almost 4 years ago.
Description
Hello,
Last night around 01:00 CEST, so not under much load, Suricata crashed. I was not able to find a core dump, although one is configured in suricata.yaml.
Dec 14 01:06:57 scomp1185 kernel: W#01[47474]: segfault at 5a4 ip 00007fda8d6c2c40 sp 00007fd9d4efb568 error 4 in libc-2.17.so[7fda8d576000+1c4000]
Dec 14 01:07:00 scomp1185 kernel: device ens2f0 left promiscuous mode
Dec 14 01:07:01 scomp1185 kernel: device ens2f1 left promiscuous mode
Dec 14 01:07:01 scomp1185 systemd[1]: suricata.service: main process exited, code=killed, status=11/SEGV
Dec 14 01:07:01 scomp1185 systemd[1]: Unit suricata.service entered failed state.
Dec 14 01:07:01 scomp1185 systemd: suricata.service: main process exited, code=killed, status=11/SEGV
Dec 14 01:07:01 scomp1185 systemd: Unit suricata.service entered failed state.
Dec 14 01:07:01 scomp1185 systemd[1]: suricata.service failed.
Dec 14 01:07:01 scomp1185 systemd: suricata.service failed.
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 7.9 (Maipo)
Release: 7.9
Codename: Maipo
Kernel: 3.10.0-1160.2.2.el7.x86_64 #1 SMP Sat Oct 17 05:06:47 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
SELinux not enabled
This is Suricata version 6.0.1 RELEASE
# cat /proc/net/pf_ring/info
PF_RING Version          : 7.8.0 (unknown)
Total rings              : 28

Standard (non ZC) Options
Ring slots               : 32767
Slot version             : 18
Capture TX               : No [RX only]
IP Defragment            : No
Socket Mode              : Standard
Cluster Fragment Queue   : 201
Cluster Fragment Discard : 0
Updated by Victor Julien about 4 years ago
Think it is going to be hard to determine the cause of this. A segfault in libc is probably related to some kind of memory corruption or double free or something, but w/o more details it will be impossible to say where. Are you able to run it in the foreground in gdb or fix your core dumping setup? Then the next crash might give us more clues.
Updated by Anonymous about 4 years ago
Hi Victor, I know. Any advised settings regarding core dumps? A bit busy here, given, among other things, SolarWinds ;)
Updated by Anonymous about 4 years ago
Crashed again today:
Dec 15 14:43:54 scomp1185 kernel: W#2922400: segfault at 3eac ip 00007fae97d08c40 sp 00007fad41ffa5c8 error 4 in libc-2.17.so[7fae97bbc000+1c4000]
Dec 15 14:43:56 scomp1185 kernel: device ens2f0 left promiscuous mode
Dec 15 14:43:58 scomp1185 kernel: device ens2f1 left promiscuous mode
Dec 15 14:43:58 scomp1185 systemd[1]: suricata.service: main process exited, code=killed, status=11/SEGV
Dec 15 14:43:58 scomp1185 systemd[1]: Unit suricata.service entered failed state.
Dec 15 14:43:58 scomp1185 systemd: suricata.service: main process exited, code=killed, status=11/SEGV
Dec 15 14:43:58 scomp1185 systemd[1]: suricata.service failed.
Dec 15 14:43:58 scomp1185 systemd: Unit suricata.service entered failed state.
Dec 15 14:43:58 scomp1185 systemd: suricata.service failed.
suricata.yaml:
coredump:
  max-dump: 2g
  #max-dump: unlimited

run-as:
  user: suri
  group: suri

daemon-directory: "/home/suri/"
It runs as a non-privileged user, but no core is to be found in its home directory, or anywhere else?!
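For anyone hitting the same problem: on RHEL 7 a missing core is usually explained outside suricata.yaml, since systemd applies its own LimitCORE to the service and kernel.core_pattern may hand cores to abrt. A sketch of a drop-in that would allow cores for the service (the unit name suricata.service and the drop-in path are assumptions based on this report):

```ini
# Sketch, assumed path: /etc/systemd/system/suricata.service.d/coredump.conf
[Service]
LimitCORE=infinity
```

After adding the drop-in, run systemctl daemon-reload and restart the service, and check sysctl kernel.core_pattern; on RHEL 7 it often points at abrt, in which case cores are handled by abrt rather than written to the daemon directory.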
Updated by Anonymous about 4 years ago
And again last night; please advise on correcting the coredump settings! Running via gdb is a bit tricky given the log rotation and rule update scripts.
Otherwise I'll have to switch back to the stable 5.0.5, because this is production that is now failing.
Dec 16 04:19:59 scomp1185 kernel: W#2443752: segfault at 0 ip 00007f26a05d37b0 sp 00007f25657f9568 error 4 in libc-2.17.so[7f26a0485000+1c4000]
Dec 16 04:20:00 scomp1185 kernel: device ens2f0 left promiscuous mode
Dec 16 04:20:02 scomp1185 kernel: device ens2f1 left promiscuous mode
Dec 16 04:20:02 scomp1185 systemd[1]: suricata.service: main process exited, code=killed, status=11/SEGV
Dec 16 04:20:02 scomp1185 systemd[1]: Unit suricata.service entered failed state.
Dec 16 04:20:02 scomp1185 systemd: suricata.service: main process exited, code=killed, status=11/SEGV
Dec 16 04:20:02 scomp1185 systemd: Unit suricata.service entered failed state.
Dec 16 04:20:02 scomp1185 systemd[1]: suricata.service failed.
Dec 16 04:20:02 scomp1185 systemd: suricata.service failed.
Updated by Anonymous about 4 years ago
It gets better: it also crashed a moment ago on another server:
W#1731108: segfault at 0 ip 00007fad4b47dbe5 sp 00007fac5bffe598 error 4 in libc-2.17.so[7fad4b331000+1c4000]
Switching back to 5.0.5
Updated by Peter Manev about 4 years ago
Do you start it as a service or on the command line?
The core might be in the "/" (root) folder, maybe? Did you compile it with debugging enabled, to give us a bit more clues? (For reference: ./configure YOUR_USUAL_FLAGS CFLAGS="-ggdb -O0")
Updated by Anonymous about 4 years ago
As a service via systemd/systemctl, indeed.
No, I'm running Suricata under a normal, non-root account.
No debugging options are configured; I'll build 6.0.1 again if you prefer:
./configure --enable-gccprotect --prefix=/usr/ --sysconfdir=/etc/ --localstatedir=/var/ --enable-pfring LD_RUN_PATH=/opt/PF_RING/lib:/usr/lib:/usr/local/lib --with-libpfring-libraries=/usr/local/lib/ --with-libpfring-includes=/opt/PF_RING/include/ --with-libpcap-libraries=/usr/local/lib/ --with-libpcap-includes=/usr/local/include/ --enable-geoip --with-libhs-includes=/usr/local/include/hs/ --with-libhs-libraries=/usr/local/lib64/ --enable-rust
What should I add to improve your debugging options?
TIA!
Updated by Victor Julien about 4 years ago
- Status changed from New to Assigned
- Assignee set to Philippe Antoine
- Target version set to 6.0.2
I've helped Andre get a backtrace and analyze it:
#0  0x00007f7bbfa5dc40 in __memcpy_ssse3 () from /lib64/libc.so.6
#1  0x00000000004e2dc0 in HTPCallbackResponseHeaderData (tx_data=0x7f7b28efb2d0) at app-layer-htp.c:2337
#2  0x00007f7bc3a1116a in htp_hook_run_all (hook=0x12e6350, user_data=user_data@entry=0x7f7b28efb2d0) at htp_hooks.c:127
#3  0x00007f7bc3a157d5 in htp_connp_res_receiver_send_data (connp=connp@entry=0x7f7b8d3c0e90, is_last=is_last@entry=0) at htp_response.c:102
#4  0x00007f7bc3a174b2 in htp_connp_res_data (connp=0x7f7b8d3c0e90, timestamp=<optimized out>, data=<optimized out>, len=<optimized out>) at htp_response.c:1342
#5  0x00000000004e00df in HTPHandleResponseData (f=0x7f7ac4cdea20, htp_state=0x7f7b8ef0a050, pstate=0x7f7b8e38f2f0, input=0x0, input_len=1460, local_data=0x0, flags=26 '\032') at app-layer-htp.c:953
#6  0x00000000004eb756 in AppLayerParserParse (tv=0x182c5850, alp_tctx=0x7f7b1c548340, f=0x7f7ac4cdea20, alproto=1, flags=26 '\032', input=0x0, input_len=1460) at app-layer-parser.c:1258
#7  0x00000000004bbc00 in AppLayerHandleTCPData (tv=0x182c5850, ra_ctx=0x7f7b1c535cc0, p=0x7f7b1c4deb80, f=0x7f7ac4cdea20, ssn=0x7f7b1c60e430, stream=0x7f7b28efb5e0, data=0x0, data_len=1460, flags=26 '\032') at app-layer.c:627
#8  0x0000000000616b12 in ReassembleUpdateAppLayer (tv=0x182c5850, ra_ctx=0x7f7b1c535cc0, ssn=0x7f7b1c60e430, stream=0x7f7b28efb5e0, p=0x7f7b1c4deb80, dir=UPDATE_DIR_PACKET) at stream-tcp-reassemble.c:1111
#9  0x0000000000616f7c in StreamTcpReassembleAppLayer (tv=0x182c5850, ra_ctx=0x7f7b1c535cc0, ssn=0x7f7b1c60e430, stream=0x7f7b1c60e440, p=0x7f7b1c4deb80, dir=UPDATE_DIR_PACKET) at stream-tcp-reassemble.c:1238
#10 0x000000000061824c in StreamTcpReassembleHandleSegment (tv=0x182c5850, ra_ctx=0x7f7b1c535cc0, ssn=0x7f7b1c60e430, stream=0x7f7b1c60e440, p=0x7f7b1c4deb80, pq=0x7f7b1c50c7a0) at stream-tcp-reassemble.c:1900
#11 0x000000000060fcae in StreamTcpPacket (tv=0x182c5850, p=0x7f7b1c4deb80, stt=0x7f7b1c5359b0, pq=0x7f7b1c50c7a0) at stream-tcp.c:4824
#12 0x000000000061075f in StreamTcp (tv=0x182c5850, p=0x7f7b1c4deb80, data=0x7f7b1c5359b0, pq=0x7f7b1c50c7a0) at stream-tcp.c:5194
#13 0x00000000005abfab in FlowWorkerStreamTCPUpdate (tv=0x182c5850, fw=0x7f7b1c50c770, p=0x7f7b1c4deb80, detect_thread=0x7f7b1c61ed10) at flow-worker.c:364
#14 0x00000000005ac0c6 in FlowWorkerFlowTimeout (tv=0x182c5850, p=0x7f7b1c4deb80, fw=0x7f7b1c50c770, detect_thread=0x7f7b1c61ed10) at flow-worker.c:401
#15 0x00000000005ab7c9 in FlowFinish (tv=0x182c5850, f=0x7f7ac4cdea20, fw=0x7f7b1c50c770, detect_thread=0x7f7b1c61ed10) at flow-worker.c:157
#16 0x00000000005ab88c in CheckWorkQueue (tv=0x182c5850, fw=0x7f7b1c50c770, detect_thread=0x7f7b1c61ed10, counters=0x7f7b28efb850, fq=0x7f7b28efb860) at flow-worker.c:183
#17 0x00000000005ac25c in FlowWorkerProcessInjectedFlows (tv=0x182c5850, fw=0x7f7b1c50c770, p=0x7f7b1c4dff00, detect_thread=0x7f7b1c61ed10) at flow-worker.c:447
#18 0x00000000005ac5d6 in FlowWorker (tv=0x182c5850, p=0x7f7b1c4dff00, data=0x7f7b1c50c770) at flow-worker.c:570
#19 0x0000000000621fd6 in TmThreadsSlotVarRun (tv=0x182c5850, p=0x7f7b1c4dff00, slot=0x1a576e70) at tm-threads.c:117
#20 0x00000000005fff61 in TmThreadsSlotProcessPkt (tv=0x182c5850, s=0x1a576e70, p=0x7f7b1c4dff00) at tm-threads.h:192
#21 0x00000000006007c3 in ReceivePfringLoop (tv=0x182c5850, data=0x7f7b1c4e08c0, slot=0x1c78a9a0) at source-pfring.c:415
#22 0x00000000006226ba in TmThreadsSlotPktAcqLoop (td=0x182c5850) at tm-threads.c:312
#23 0x00007f7bc0100ea5 in start_thread () from /lib64/libpthread.so.0
#24 0x00007f7bbfa0f96d in clone () from /lib64/libc.so.6

(gdb) f 1
#1  0x00000000004e2dc0 in HTPCallbackResponseHeaderData (tx_data=0x7f7b28efb2d0) at app-layer-htp.c:2337
2337            memcpy(tx_ud->response_headers_raw + tx_ud->response_headers_raw_len,
(gdb) p *tx_data
$4 = {tx = 0x7f79b02b5af0, data = 0x0, len = 1460, is_last = 0}

#3  0x00007f7bc3a157d5 in htp_connp_res_receiver_send_data (connp=connp@entry=0x7f7b8d3c0e90, is_last=is_last@entry=0) at htp_response.c:102
102         htp_status_t rc = htp_hook_run_all(connp->out_data_receiver_hook, &d);
(gdb) p d
$5 = {tx = 0x7f79b02b5af0, data = 0x0, len = 1460, is_last = 0}
(gdb) l
97          d.tx = connp->out_tx;
98          d.data = connp->out_current_data + connp->out_current_receiver_offset;
99          d.len = connp->out_current_read_offset - connp->out_current_receiver_offset;
100         d.is_last = is_last;
101
102         htp_status_t rc = htp_hook_run_all(connp->out_data_receiver_hook, &d);
103         if (rc != HTP_OK) return rc;
104
105         connp->out_current_receiver_offset = connp->out_current_read_offset;
106
(gdb) p connp->out_current_data
$6 = (unsigned char *) 0x0
(gdb) p connp->out_current_receiver_offset
$7 = 0
(gdb) p connp->out_current_read_offset
$8 = 1460

#4  0x00007f7bc3a174b2 in htp_connp_res_data (connp=0x7f7b8d3c0e90, timestamp=<optimized out>, data=<optimized out>, len=<optimized out>) at htp_response.c:1342
1342            htp_connp_res_receiver_send_data(connp, 0 /* not last */);

#5  0x00000000004e00df in HTPHandleResponseData (f=0x7f7ac4cdea20, htp_state=0x7f7b8ef0a050, pstate=0x7f7b8e38f2f0, input=0x0, input_len=1460, local_data=0x0, flags=26 '\032') at app-layer-htp.c:953
953             const int r = htp_connp_res_data(hstate->connp, &ts, input, input_len);
(gdb) l
948
949         htp_time_t ts = { f->lastts.tv_sec, f->lastts.tv_usec };
950         htp_tx_t *tx = NULL;
951         size_t consumed = 0;
952         if (input_len > 0) {
953             const int r = htp_connp_res_data(hstate->connp, &ts, input, input_len);
954             switch (r) {
955                 case HTP_STREAM_ERROR:
956                     ret = -1;
957                     break;
(gdb) p input
$9 = (const uint8_t *) 0x0
(gdb) p input_len
$10 = 1460
(gdb) p/x flags
$11 = 0x1a
A first look suggests an issue between Suricata and libhtp in timeout handling, where Suricata is handed a NULL pointer it doesn't expect.
It is unclear whether that is the issue itself or a symptom of a deeper problem.
Updated by Philippe Antoine almost 4 years ago
Can we print connp->out_state and rc in frame 4 (htp_connp_res_data)? And also htp_connp_RES_BODY_IDENTITY_CL_KNOWN, htp_connp_RES_BODY_IDENTITY_STREAM_CLOSE, and htp_connp_RES_FINALIZE?
Updated by Victor Julien almost 4 years ago
Not sure I understand what you meant with "and htp_connp_RES_BODY_IDENTITY_CL_KNOWN and htp_connp_RES_BODY_IDENTITY_STREAM_CLOSE and htp_connp_RES_FINALIZE ?" What do you need me to do there?
(gdb) f 4
#4  0x00007f7bc3a174b2 in htp_connp_res_data (connp=0x7f7b8d3c0e90, timestamp=<optimized out>, data=<optimized out>, len=<optimized out>) at htp_response.c:1342
1342            htp_connp_res_receiver_send_data(connp, 0 /* not last */);
(gdb) p connp->out_state
$1 = (int (*)(htp_connp_t *)) 0x7f7bc3a15940 <htp_connp_RES_BODY_IDENTITY_STREAM_CLOSE>
(gdb) p rc
$2 = <optimized out>
(gdb) l
1337        }
1338
1339        if (rc != HTP_OK) {
1340            // Do we need more data?
1341            if ((rc == HTP_DATA) || (rc == HTP_DATA_BUFFER)) {
1342                htp_connp_res_receiver_send_data(connp, 0 /* not last */);
1343
1344                if (rc == HTP_DATA_BUFFER) {
1345                    if (htp_connp_res_buffer(connp) != HTP_OK) {
1346                        connp->out_status = HTP_STREAM_ERROR;
(gdb) p htp_connp_RES_BODY_IDENTITY_STREAM_CLOSE
$3 = {htp_status_t (htp_connp_t *)} 0x7f7bc3a15940 <htp_connp_RES_BODY_IDENTITY_STREAM_CLOSE>
(gdb) p htp_connp_RES_FINALIZE
$4 = {htp_status_t (htp_connp_t *)} 0x7f7bc3a16830 <htp_connp_RES_FINALIZE>
Updated by Philippe Antoine almost 4 years ago
Thanks, I got the information I need.
I think this highlights a deeper functional bug where we do not call htp_connp_res_receiver_finalize_clear in one scenario, still to be determined.
Updated by Philippe Antoine almost 4 years ago
- Status changed from Assigned to In Review
Gitlab MR
The root bug is that out_data_receiver_hook does not get cleared by calling htp_connp_res_receiver_finalize_clear in every case, in particular when htp_tx_state_response_complete_ex gets called while processing the headers. Thus, out_data_receiver_hook got reused for another transaction for which it was not valid.
Updated by Philippe Antoine almost 4 years ago
Updated by Victor Julien almost 4 years ago
- Related to Task #4257: libhtp 0.5.37 added
Updated by Victor Julien almost 4 years ago
- Status changed from In Review to Closed
Fixed by https://github.com/OISF/libhtp/pull/314, thanks Philippe!
Updated by Philippe Antoine almost 4 years ago
- Status changed from Closed to In Review
https://github.com/OISF/libhtp/pull/315 to complete
Updated by Philippe Antoine almost 4 years ago
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=29531&q=label%3AProj-libhtp
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=29532&q=label%3AProj-libhtp
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=29551&q=label%3AProj-libhtp
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=29556&q=label%3AProj-libhtp
Updated by Philippe Antoine almost 4 years ago
And https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=29690 is a new variant due to the incomplete fix:
https://github.com/OISF/libhtp/pull/317
Updated by Philippe Antoine almost 4 years ago
- Status changed from In Review to Closed
Merged in https://github.com/OISF/libhtp/pull/317