Bug #4582
closedBUG_ON triggered from TmThreadsInjectFlowById
Updated by Jeff Lucovsky over 3 years ago
SIGABRT at large customer with 40Gbps+
Program terminated with signal SIGABRT, Aborted. #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. [Current thread is 1 (Thread 0x7fee0f4ff640 (LWP 37729))] (gdb) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007ff0b30bf12e in __GI_abort () at abort.c:79 #2 0x00007ff0b30b680a in __assert_fail_base (fmt=0x7ff0b3210040 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x56501fb0ae88 "!(id <= 0 || id > (int)thread_store.threads_size)", file=file@entry=0x56501fb0aef6 "tm-threads.c", line=line@entry=2320, function=function@entry=0x56501fb0afd0 <__PRETTY_FUNCTION__.22581> "TmThreadsInjectFlowById") at assert.c:92 #3 0x00007ff0b30b6882 in __GI___assert_fail (assertion=assertion@entry=0x56501fb0ae88 "!(id <= 0 || id > (int)thread_store.threads_size)", file=file@entry=0x56501fb0aef6 "tm-threads.c", line=line@entry=2320, function=function@entry=0x56501fb0afd0 <__PRETTY_FUNCTION__.22581> "TmThreadsInjectFlowById") at assert.c:101 #4 0x000056501f830fa0 in TmThreadsInjectFlowById (f=f@entry=0x7fee6d734ac0, id=<optimized out>) at tm-threads.c:2320 #5 0x000056501f7d0469 in FlowForceReassemblyForFlow (f=f@entry=0x7fee6d734ac0) at flow-timeout.c:352 #6 0x000056501f7cd865 in ProcessAsideQueue (td=td@entry=0x7fee0e801030, counters=<optimized out>, counters=<optimized out>) at flow-manager.c:335 #7 0x000056501f7cdd4d in FlowTimeoutHash (td=td@entry=0x7fee0e801030, ts=ts@entry=0x7fee0f4fd310, hash_min=<optimized out>, hash_max=<optimized out>, counters=counters@entry=0x7fee0f4fd330) at flow-manager.c:524 #8 0x000056501f7ce69c in FlowTimeoutHashInChunks (chunks=40, iter=21, counters=0x7fee0f4fd330, hash_max=<optimized out>, hash_min=<optimized out>, ts=0x7fee0f4fd310, td=0x7fee0e801030) at flow-manager.c:554 #9 FlowManager (th_v=0x7fee93f16f80, thread_data=0x7fee0e801000) at flow-manager.c:899 #10 0x000056501f82ef54 in TmThreadsManagement (td=0x7fee93f16f80) at tm-threads.c:551 #11 0x00007ff0b4d2f37e in start_thread (arg=0x7fee0f4ff640) at pthread_create.c:463 #12 0x00007ff0b3180b5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 (gdb) fr 5 #5 0x000056501f7d0469 in FlowForceReassemblyForFlow (f=f@entry=0x7fee6d734ac0) at flow-timeout.c:352 352 in flow-timeout.c (gdb) p f->thread_id $5 = {0, 18}
Second occurrence -- similar -- note f->thread_id
array
Program terminated with signal SIGABRT, Aborted. #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. [Current thread is 1 (Thread 0x7f19298ff640 (LWP 21542))] (gdb) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007f1bcd6f612e in __GI_abort () at abort.c:79 #2 0x00007f1bcd6ed80a in __assert_fail_base (fmt=0x7f1bcd847040 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x55b42e378e88 "!(id <= 0 || id > (int)thread_store.threads_size)", file=file@entry=0x55b42e378ef6 "tm-threads.c", line=line@entry=2320, function=function@entry=0x55b42e378fd0 <__PRETTY_FUNCTION__.22581> "TmThreadsInjectFlowById") at assert.c:92 #3 0x00007f1bcd6ed882 in __GI___assert_fail (assertion=assertion@entry=0x55b42e378e88 "!(id <= 0 || id > (int)thread_store.threads_size)", file=file@entry=0x55b42e378ef6 "tm-threads.c", line=line@entry=2320, function=function@entry=0x55b42e378fd0 <__PRETTY_FUNCTION__.22581> "TmThreadsInjectFlowById") at assert.c:101 #4 0x000055b42e09efa0 in TmThreadsInjectFlowById (f=f@entry=0x7f1a8fe28300, id=<optimized out>) at tm-threads.c:2320 #5 0x000055b42e03e469 in FlowForceReassemblyForFlow (f=f@entry=0x7f1a8fe28300) at flow-timeout.c:352 #6 0x000055b42e03b865 in ProcessAsideQueue (td=td@entry=0x7f1928c01030, counters=<optimized out>, counters=<optimized out>) at flow-manager.c:335 #7 0x000055b42e03bd4d in FlowTimeoutHash (td=td@entry=0x7f1928c01030, ts=ts@entry=0x7f19298fd310, hash_min=<optimized out>, hash_max=<optimized out>, counters=counters@entry=0x7f19298fd330) at flow-manager.c:524 #8 0x000055b42e03c69c in FlowTimeoutHashInChunks (chunks=40, iter=30, counters=0x7f19298fd330, hash_max=<optimized out>, hash_min=<optimized out>, ts=0x7f19298fd310, td=0x7f1928c01030) at flow-manager.c:554 #9 FlowManager (th_v=0x7f19ae1fb480, thread_data=0x7f1928c01000) at flow-manager.c:899 #10 0x000055b42e09cf54 in TmThreadsManagement (td=0x7f19ae1fb480) at tm-threads.c:551 #11 0x00007f1bcf36637e in start_thread (arg=0x7f19298ff640) at pthread_create.c:463 #12 0x00007f1bcd7b7b5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 (gdb) fr 5 #5 0x000055b42e03e469 in FlowForceReassemblyForFlow (f=f@entry=0x7f1a8fe28300) at flow-timeout.c:352 352 flow-timeout.c: No such file or directory. (gdb) p f->thread_id $1 = {0, 3} (gdb)
Updated by Jeff Lucovsky over 3 years ago
3rd occurrence -- same customer site
Program terminated with signal SIGABRT, Aborted. #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. [Current thread is 1 (Thread 0x7fef74eff640 (LWP 5103))] (gdb) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007ff218aec12e in __GI_abort () at abort.c:79 #2 0x00007ff218ae380a in __assert_fail_base (fmt=0x7ff218c3d040 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x55cd9d1f3e88 "!(id <= 0 || id > (int)thread_store.threads_size)", file=file@entry=0x55cd9d1f3ef6 "tm-threads.c", line=line@entry=2320, function=function@entry=0x55cd9d1f3fd0 <__PRETTY_FUNCTION__.22581> "TmThreadsInjectFlowById") at assert.c:92 #3 0x00007ff218ae3882 in __GI___assert_fail (assertion=assertion@entry=0x55cd9d1f3e88 "!(id <= 0 || id > (int)thread_store.threads_size)", file=file@entry=0x55cd9d1f3ef6 "tm-threads.c", line=line@entry=2320, function=function@entry=0x55cd9d1f3fd0 <__PRETTY_FUNCTION__.22581> "TmThreadsInjectFlowById") at assert.c:101 #4 0x000055cd9cf19fa0 in TmThreadsInjectFlowById (f=f@entry=0x7fe1692195c0, id=<optimized out>) at tm-threads.c:2320 #5 0x000055cd9ceb9469 in FlowForceReassemblyForFlow (f=f@entry=0x7fe1692195c0) at flow-timeout.c:352 #6 0x000055cd9ceb6865 in ProcessAsideQueue (td=td@entry=0x7fef74201030, counters=<optimized out>, counters=<optimized out>) at flow-manager.c:335 #7 0x000055cd9ceb6d4d in FlowTimeoutHash (td=td@entry=0x7fef74201030, ts=ts@entry=0x7fef74efd310, hash_min=<optimized out>, hash_max=<optimized out>, counters=counters@entry=0x7fef74efd330) at flow-manager.c:524 #8 0x000055cd9ceb769c in FlowTimeoutHashInChunks (chunks=40, iter=11, counters=0x7fef74efd330, hash_max=<optimized out>, hash_min=<optimized out>, ts=0x7fef74efd310, td=0x7fef74201030) at flow-manager.c:554 #9 FlowManager (th_v=0x7feff9266b40, thread_data=0x7fef74201000) at flow-manager.c:899 #10 0x000055cd9cf17f54 in TmThreadsManagement (td=0x7feff9266b40) at tm-threads.c:551 #11 0x00007ff21a75c37e in start_thread (arg=0x7fef74eff640) at pthread_create.c:463 #12 0x00007ff218badb5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 (gdb) fr 5 #5 0x000055cd9ceb9469 in FlowForceReassemblyForFlow (f=f@entry=0x7fe1692195c0) at flow-timeout.c:352 352 flow-timeout.c: No such file or directory. (gdb) p f->thread_id $1 = {0, 19}
Updated by Jeff Lucovsky over 3 years ago
Occurred again with the patches from #4478 applied.
(gdb) where #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007f387060512e in __GI_abort () at abort.c:79 #2 0x00007f38705fc80a in __assert_fail_base (fmt=0x7f3870756040 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5615ef73cde8 "!(id <= 0 || id > (int)thread_store.threads_size)", file=file@entry=0x5615ef73ce56 "tm-threads.c", line=line@entry=2320, function=function@entry=0x5615ef73cf30 <__PRETTY_FUNCTION__.22581> "TmThreadsInjectFlowById") at assert.c:92 #3 0x00007f38705fc882 in __GI___assert_fail (assertion=assertion@entry=0x5615ef73cde8 "!(id <= 0 || id > (int)thread_store.threads_size)", file=file@entry=0x5615ef73ce56 "tm-threads.c", line=line@entry=2320, function=function@entry=0x5615ef73cf30 <__PRETTY_FUNCTION__.22581> "TmThreadsInjectFlowById") at assert.c:101 #4 0x00005615ef462f80 in TmThreadsInjectFlowById (f=f@entry=0x7f369e2cf580, id=<optimized out>) at tm-threads.c:2320 #5 0x00005615ef402439 in FlowForceReassemblyForFlow (f=f@entry=0x7f369e2cf580) at flow-timeout.c:352 #6 0x00005615ef3ff835 in ProcessAsideQueue (td=td@entry=0x7f35cba00030, counters=<optimized out>, counters=<optimized out>) at flow-manager.c:335 #7 0x00005615ef3ffd1d in FlowTimeoutHash (td=td@entry=0x7f35cba00030, ts=ts@entry=0x7f35cc6fd310, hash_min=<optimized out>, hash_max=<optimized out>, counters=counters@entry=0x7f35cc6fd330) at flow-manager.c:524 #8 0x00005615ef40066c in FlowTimeoutHashInChunks (chunks=40, iter=10, counters=0x7f35cc6fd330, hash_max=<optimized out>, hash_min=<optimized out>, ts=0x7f35cc6fd310, td=0x7f35cba00030) at flow-manager.c:554 #9 FlowManager (th_v=0x7f3639169f80, thread_data=0x7f35cba00000) at flow-manager.c:899 #10 0x00005615ef460f34 in TmThreadsManagement (td=0x7f3639169f80) at tm-threads.c:551 #11 0x00007f387227537e in start_thread (arg=0x7f35cc6ff640) at pthread_create.c:463 #12 0x00007f38706c6b5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 (gdb) fr 4 #4 0x00005615ef462f80 in TmThreadsInjectFlowById (f=f@entry=0x7f369e2cf580, id=<optimized out>) at tm-threads.c:2320 2320 in tm-threads.c (gdb) p f->thread_id $4 = {0, 30}
Updated by Jeff Lucovsky over 3 years ago
This occurs with regularity at one of our customer sites.
Could slot 0 of thread_id
remain the same as its initialized state (0
) if traffic if only pkts to the client are observed?
The slots are assigned based on the idx
value from HandleThreadId
: const int idx = (!(PKT_IS_TOSERVER(p)));
Updated by Victor Julien over 3 years ago
Can you show the contents of the flow? At least the stats in it and p/x f->flags
?
Updated by Jeff Lucovsky over 3 years ago
(gdb) p/x f->flags $1 = 0x10031b (gdb) p *f $2 = {src = {address = {address_un_data32 = {3017916995, 0, 0, 0}, address_un_data16 = {49731, 46049, 0, 0, 0, 0, 0, 0}, address_un_data8 = "C\302\341\263", '\000' <repeats 11 times>}}, dst = {address = {address_un_data32 = {1348125032, 0, 0, 0}, address_un_data16 = { 49512, 20570, 0, 0, 0, 0, 0, 0}, address_un_data8 = "h\301ZP", '\000' <repeats 11 times>}}, {sp = 61436, icmp_s = {type = 252 '\374', code = 239 '\357'}}, {dp = 443, icmp_d = {type = 187 '\273', code = 1 '\001'}}, proto = 6 '\006', recursion_level = 0 '\000', vlan_id = {0, 0}, use_cnt = 0, vlan_idx = 0 '\000', {{ffr_ts = 1 '\001', ffr_tc = 1 '\001'}, ffr = 17 '\021'}, timeout_at = 1630017823, thread_id = {0, 30}, next = 0x0, livedev = 0x0, flow_hash = 2575188026, lastts = {tv_sec = 1630017523, tv_usec = 544004}, timeout_policy = 300, flow_state = 1, tenant_id = 0, probing_parser_toserver_alproto_masks = 0, probing_parser_toclient_alproto_masks = 0, flags = 1049371, file_flags = 3087, protodetect_dp = 0, parent_id = 0, m = {__data = {__lock = 1, __count = 0, __owner = 10851, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\001\000\000\000\000\000\000\000c*\000\000\001", '\000' <repeats 26 times>, __align = 1}, protoctx = 0x7f346926b480, protomap = 0 '\000', flow_end_flags = 0 '\000', alproto = 0, alproto_ts = 0, alproto_tc = 0, alproto_orig = 0, alproto_expect = 0, de_ctx_version = 99, min_ttl_toserver = 123 '{', max_ttl_toserver = 123 '{', min_ttl_toclient = 49 '1', max_ttl_toclient = 49 '1', alparser = 0x0, alstate = 0x0, sgh_toclient = 0x7f3633331550, sgh_toserver = 0x7f363314c130, flowvar = 0x0, fb = 0x0, startts = {tv_sec = 1630017482, tv_usec = 114984}, todstpktcnt = 1, tosrcpktcnt = 101, todstbytecnt = 64, tosrcbytecnt = 8464}
Updated by Victor Julien over 3 years ago
Can you also add p *(TcpSession *)f->protoctx
?
Updated by Jeff Lucovsky over 3 years ago
(gdb) p *(TcpSession *)f->protoctx $4 = {res = 29, state = 4 '\004', pstate = 0 '\000', queue_len = 0 '\000', data_first_seen_dir = 8 '\b', tcp_packet_flags = 25 '\031', flags = 5123, reassembly_depth = 5242880, server = {flags = 0, wscale = 14, os_policy = 0 '\000', tcp_flags = 25 '\031', isn = 993559311, next_seq = 993559312, last_ack = 993559312, next_win = 993559312, window = 0, last_ts = 0, last_pkt_ts = 0, base_seq = 993559312, app_progress_rel = 0, raw_progress_rel = 0, log_progress_rel = 0, min_inspect_depth = 0, data_required = 0, sb = { cfg = 0x5615efa7a550 <stream_config+48>, stream_offset = 0, buf = 0x0, buf_size = 0, buf_offset = 0, sbb_tree = {rbh_root = 0x0}, head = 0x0}, seg_tree = {rbh_root = 0x0}, segs_right_edge = 0, sack_size = 0, sack_tree = {rbh_root = 0x0}}, client = {flags = 0, wscale = 14, os_policy = 12 '\f', tcp_flags = 0 '\000', isn = 3770227041, next_seq = 3770227088, last_ack = 3770227042, next_win = 3779647842, window = 9420800, last_ts = 0, last_pkt_ts = 0, base_seq = 3770227042, app_progress_rel = 0, raw_progress_rel = 0, log_progress_rel = 0, min_inspect_depth = 0, data_required = 0, sb = {cfg = 0x5615efa7a550 <stream_config+48>, stream_offset = 0, buf = 0x7f3442c3e800 "\027\003\003", buf_size = 2048, buf_offset = 46, sbb_tree = {rbh_root = 0x0}, head = 0x0}, seg_tree = { rbh_root = 0x7f3442ae0620}, segs_right_edge = 3770227088, sack_size = 0, sack_tree = {rbh_root = 0x0}}, queue = 0x0}
Updated by Victor Julien over 3 years ago
The following flow flags are set
FLOW_IPV4 FLOW_TO_SRC_SEEN FLOW_TO_DST_SEEN FLOW_TOSERVER_IPONLY_SET FLOW_TOCLIENT_IPONLY_SET FLOW_SGH_TOSERVER FLOW_SGH_TOCLIENT
So nothing special.
Traffic is TCP. We've seen a single packet to the server direction (todstpktcnt = 1
). There is a single path that could avoid us to reach HandleThreadId
, which is when the packet had a bad checksum. Do you see any bad checksums in the stats of this sensor?
Updated by Jeff Lucovsky over 3 years ago
The data from the time of the crash is not available ... We do see many occurrences of non-zero invalid_checksum
, e.g., 38470 invalid checksums out of 75,644,142
packets
Updated by Victor Julien over 3 years ago
Thanks, I think the TcpSession
internals support the idea that no toserver
packet made it past that checksum check, thus not reaching the HandleThreadId
func.
Can you try:
diff --git a/src/stream-tcp.c b/src/stream-tcp.c
index 8d3611285..8ee4aefb1 100644
--- a/src/stream-tcp.c
+++ b/src/stream-tcp.c
@@ -4785,8 +4785,6 @@ int StreamTcpPacket (ThreadVars *tv, Packet *p, StreamTcpThread *stt,
SCLogDebug("p->pcap_cnt %"PRIu64, p->pcap_cnt);
- HandleThreadId(tv, p, stt);
-
TcpSession *ssn = (TcpSession *)p->flow->protoctx;
/* track TCP flags */
@@ -5207,6 +5205,8 @@ TmEcode StreamTcp (ThreadVars *tv, Packet *p, void *data, PacketQueueNoLock *pq)
return TM_ECODE_OK;
}
+ HandleThreadId(tv, p, stt);
+
/* only TCP packets with a flow from here */
if (!(p->flags & PKT_PSEUDO_STREAM_END)) {
If you can't try a patch, a work around could be to disable csum checks in the stream engine config.
Updated by Jeff Lucovsky over 3 years ago
The patch was installed on 30 August mid-day and has been running without issue. The frequency of the crash was low and it's not clear what traffic caused the issue to occur. Thus, the patch is not harmful and success cannot be determined at this point.
Updated by Victor Julien about 3 years ago
- Related to Bug #4468: Assertion failures in TmThreadsInjectFlowById added
Updated by Chris Sibley over 2 years ago
Hi Jeff,
We recently upgraded to Suricata 6.0.6 and have been consistently experiencing crashes that appear to be related to this issue. Specifically, the process crashes with the following error:
suricata: tm-threads.c:2365: TmThreadsInjectFlowById: Assertion `!(id <= 0 || id > (int)thread_store.threads_size)' failed.
systemd reports:
Sep 19 11:43:45 rsmtnmri03 systemd1: suricata.service: Main process exited, code=killed, status=6/ABRT
Sep 19 11:43:45 rsmtnmri03 systemd1: suricata.service: Failed with result 'signal'.
Sep 19 11:43:45 rsmtnmri03 systemd1: suricata.service: Consumed 10h 52min 27.523s CPU time.
After finding this bug, I updated the suricata configuration to disable stream engine checksum checks and suricata has been stable ever since. We have a second server, processing the same traffic, that I have not made the configuration change on and it continues to crash.
Thanks,
--Chris
Updated by Victor Julien over 2 years ago
- Status changed from New to Assigned
- Assignee set to Victor Julien
- Target version set to 6.0.7
Updated by Victor Julien over 2 years ago
- Status changed from Assigned to In Progress
- Target version changed from 6.0.7 to 7.0.0-beta1
- Label Needs backport to 6.0 added
Updated by Victor Julien over 2 years ago
- Status changed from In Progress to In Review
Updated by Victor Julien over 2 years ago
- Status changed from In Review to Closed
- Label deleted (
Needs backport to 6.0)