Bug #6066
Memory Corruption in util-streaming-buffer (Closed)
Description
While running a recent version of Suricata (commit 2ddd26446e3, 12 Apr 2023), we have been getting crashes related to memory corruption in the streaming buffer.
Our suricata.yaml has:
```yaml
stream:
  inline: yes
  midstream: true
  async-oneside: true
```
We noticed the instability manifested when we switched to inline: yes.
The crashes manifest as seemingly random memory corruption. Using ASAN, we found a large write (>4GB) in GrowRegionToSize() at util-streaming-buffer.c:609.
The backtrace was:
GrowRegionToSize util-streaming-buffer.c:609 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/util-streaming-buffer.c#L609
BufferInsertAtRegionConsolidate util-streaming-buffer.c:1109 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/util-streaming-buffer.c#L1109
BufferInsertAtRegionDo util-streaming-buffer.c:1245 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/util-streaming-buffer.c#L1245
BufferInsertAtRegion util-streaming-buffer.c:1311 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/util-streaming-buffer.c#L1311
StreamingBufferInsertAt util-streaming-buffer.c:1331 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/util-streaming-buffer.c#L1331
InsertSegmentDataCustom stream-tcp-list.c:99 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/stream-tcp-list.c#L99
StreamTcpReassembleInsertSegment stream-tcp-list.c:650 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/stream-tcp-list.c#L650
StreamTcpReassembleHandleSegmentHandleData stream-tcp-reassemble.c:804 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/stream-tcp-reassemble.c#L804
StreamTcpReassembleHandleSegment stream-tcp-reassemble.c:2031 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/stream-tcp-reassemble.c#L2031
HandleEstablishedPacketToClient stream-tcp.c:2813 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/stream-tcp.c#L2813
StreamTcpPacketStateEstablished stream-tcp.c:3225 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/stream-tcp.c#L3225
StreamTcpStateDispatch stream-tcp.c:5238 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/stream-tcp.c#L5238
StreamTcpPacket stream-tcp.c:5447 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/stream-tcp.c#L5447
StreamTcp stream-tcp.c:5759 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/stream-tcp.c#L5759
FlowWorkerStreamTCPUpdate flow-worker.c:377 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/flow-worker.c#L377
FlowWorker flow-worker.c:555 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/flow-worker.c#L555
TmThreadsSlotVarRun tm-threads.c:119 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/tm-threads.c#L119
TmThreadsSlotProcessPkt tm-threads.h:194 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/tm-threads.h#L194
ReceiveDPDKLoop source-dpdk.c:413 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/source-dpdk.c#L413
TmThreadsSlotPktAcqLoop tm-threads.c:314 https://github.com/OISF/suricata/blob/2ddd26446e3a568074650cf16ec4ad8402acdcd4/src/tm-threads.c#L314
We have identified that the problem is here:
```c
/* for safe printing and general caution, lets memset the
 * new data to 0 */
size_t diff = grow - region->buf_size;
void *new_mem = ((char *)ptr) + region->buf_size;
memset(new_mem, 0, diff);
```
Specifically, in our case `grow` is less than `region->buf_size`, so the unsigned calculation of `diff` underflows to a huge value and the `memset()` writes zero bytes over a large chunk of memory.
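For illustration, a standalone sketch of the underflow (the variable names mirror the snippet above and the sizes are just examples of the same order as the ones we observe; this is not the actual GrowRegionToSize() code):

```c
#include <stdio.h>
#include <stddef.h>

int main(void)
{
    size_t buf_size = 743424; /* existing region->buf_size (example value) */
    size_t grow     = 335872; /* requested size, smaller than buf_size */

    size_t diff = grow - buf_size; /* unsigned subtraction wraps around */
    /* on 64-bit this prints roughly 1.8e19; a memset() of that length is
     * the large write ASAN reports */
    printf("diff = %zu\n", diff);
    return 0;
}
```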
We have a workaround that simply identifies when this condition will occur and returns `-1` to indicate an error before reallocating the buffer. When this happens, the requested new `size` of the buffer passed to `GrowRegionToSize()` is actually less than the existing `region->buf_size`, so the call is effectively trying to shrink the buffer. Simply returning an error here seems effective at preventing the memory corruption (and subsequent segfault), but it is obviously not the right solution, since it does not handle the reassembly correctly.
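Roughly, the workaround looks like the following. This is a minimal sketch of a simplified grow path; the real GrowRegionToSize() takes a StreamingBufferRegion and config and reallocates through the configured handler, so the function name and signature here are hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the grow path, only to show where the guard sits. */
static int GrowSketch(char **buf, size_t buf_size, size_t grow)
{
    if (grow < buf_size) {
        /* workaround: the caller is effectively asking to shrink the region,
         * which this path cannot handle; refuse instead of letting
         * `grow - buf_size` underflow below */
        return -1;
    }
    char *ptr = realloc(*buf, grow);
    if (ptr == NULL)
        return -1;
    /* zero only the newly added tail, as the original code intends */
    memset(ptr + buf_size, 0, grow - buf_size);
    *buf = ptr;
    return 0;
}
```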
After applying this workaround, Suricata stability is much improved, but now we seem to be hitting an assert (which may or may not be related; it is possible these are two separate issues).
The backtrace on this assert is (sorry, no line numbers):
StreamingBufferSBBGetData()
StreamReassembleRaw()
PrefilterPktStream()
Prefilter()
DetectRun.part.0()
Detect()
FlowWorker()
TmThreadsSlotVarRun()
ReceiveDPDKLoop()
TmThreadsSlotPktAcqLoop()
It isn't obvious what is going wrong to make us hit this assert, but after looking at the region / SBB segment selection logic, we are trying the following patch:
```diff
--- a/src/util-streaming-buffer.c
+++ b/src/util-streaming-buffer.c
@@ -72,10 +72,10 @@ static inline int InclusiveCompare(StreamingBufferBlock *lookup, StreamingBuffer
     const uint64_t tre = intree->offset + intree->len;
     if (lre <= intree->offset) // entirely before
         return -1;
-    else if (lookup->offset < tre && lre <= tre) // (some) overlap
-        return 0;
-    else
-        return 1; // entirely after
+    if (tre <= lookup->offset) // entirely after
+        return 1;
+
+    return 0; // (some) overlap
 }
```
The existing logic does not identify the right-hand overlap case (`lookup->offset < tre && lre > tre`) as an overlap; it classifies it as entirely after. We haven't yet been able to determine whether this patch is effective.
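To illustrate the three cases the patched comparison distinguishes, here is a small standalone example (not the Suricata structs, just the same end-offset arithmetic with made-up offsets):

```c
#include <stdio.h>
#include <stdint.h>

/* Same comparison as the patched InclusiveCompare(), but taking plain
 * (offset, len) pairs for a lookup block and an in-tree block. */
static int CompareSketch(uint64_t lo_off, uint64_t lo_len,
                         uint64_t in_off, uint64_t in_len)
{
    const uint64_t lre = lo_off + lo_len;
    const uint64_t tre = in_off + in_len;
    if (lre <= in_off) /* lookup ends before the in-tree block starts */
        return -1;
    if (tre <= lo_off) /* lookup starts after the in-tree block ends */
        return 1;
    return 0;          /* anything else overlaps, including the right-hand
                        * case lo_off < tre && lre > tre */
}

int main(void)
{
    /* in-tree block covers [100, 150) in all three calls */
    printf("%d\n", CompareSketch( 10, 20, 100, 50)); /* -1: entirely before */
    printf("%d\n", CompareSketch(160, 20, 100, 50)); /*  1: entirely after */
    printf("%d\n", CompareSketch(140, 40, 100, 50)); /*  0: right-hand overlap,
                                                      *  which the old code
                                                      *  classified as 1 */
    return 0;
}
```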
Updated by Victor Julien over 1 year ago
- Status changed from New to Assigned
- Assignee changed from OISF Dev to Victor Julien
- Target version changed from TBD to 7.0.0-rc2
Updated by Victor Julien over 1 year ago
@Todd Mortimer do you have a pcap to reproduce the issue?
Updated by Todd Mortimer over 1 year ago
Victor Julien wrote in #note-2:
@Todd Mortimer do you have a pcap to reproduce the issue?
I don't have a pcap to reproduce, sorry. I have core dumps but can't share them.
At the time of failure we see values like region->buf_size = 743424 with a requested size of 335872, so the requested size is smaller than the existing region buffer. Sometimes it is smaller by a few KB, sometimes by about half (as above).
Updated by Philippe Antoine over 1 year ago
Quite high numbers, after the original 2048 bytes allocation...
Could you print out more information? About the flow? (specific ports, flags like gap)
I wonder how you could get a pcap for these flows...
Updated by Victor Julien over 1 year ago
- Related to Bug #5834: tcp/regions: list corruption added
Updated by Victor Julien over 1 year ago
- Related to Bug #6041: ASSERT: !(sb->region.buf_offset != 0) added
Updated by Todd Mortimer over 1 year ago
Philippe Antoine wrote in #note-4:
Could you print out more information? About the flow? (specific ports, flags like gap)
No problem. I captured some info about the buffer state, session, stream and packet at the time this happens.
For an instance where it tried to shrink the region from 745472 to 460800 bytes (cfg.buf_size = 2048):
```
Region Info
  region.buf_size: 745472
  region.buf_offset: 35612
  region.stream_offset: 897668

Streaming Buffer (SBBPrintList() output)
  sbb offset 0, len 49120
  gap offset 49120 len 848548
  sbb offset 897668 len 1228
  gap offset 898896 len 284896
  sbb offset 1183792 len 4912
  gap offset 1188704 len 1228
  sbb offset 1189932 len 9824
  gap offset 1199756 len 46664
  sbb offset 1246420 len 6140
  gap offset 1252560 len 335244
  sbb offset 1587804 len 35612
  gap offset 1623416 len 8596
  sbb offset 1632012 len 11052

Packet Info:
  packet.source port: 443
  packet.dest port: 52140
  packet.proto: 6
  packet.flowflags: 5 = 0b0000_0101 = FLOW_PKT_TOSERVER & FLOW_PKT_ESTABLISHED
  packet.payload_len: 1228
  packet.flow->flow_state: 1 = FLOW_STATE_ESTABLISHED
  packet.flow->flags: 1089802 = 0b0001_0000_1010_0001_0000_1010 = FLOW_TO_DST_SEEN | FLOW_TOSERVER_IPONLY_SET | FLOW_SGH_TOSERVER | FLOW_TS_PM_ALPROTO_DETECT_DONE | FLOW_TS_PE_ALPROTO_DETECT_DONE | FLOW_IPV4
  packet.flow->alproto: 0 = ALPROTO_UNKNOWN
  packet.flow->todstpktcnt: 99
  packet.flow->tosrcpktcnt: 0
  packet.flow->todstbytecnt: 129294
  packet.flow->tosrcbytecnt: 0

TCP Header info
  packet.tcphr.th_sport: 443
  packet.tcphr.th_dport: 52140
  packet.tcphr.th_seq: 1274407213
  packet.tcphr.th_ack: 3686284819
  packet.tcphr.th_offx2: 128
  packet.tcphr.th_flags: 16
  packet.tcphr.th_win: 546
  packet.tcphr.th_sum: 11219
  packet.tcphr.th_urp: 0

Stream Info
  TcpStream.flags: 0
  TcpStream.isn: 1273154652
  TcpStream.next_seq: 1274797717
  TcpStream.last_ack: 1274797717
  TcpStream.base_seq: 1273154653

Session Info
  TcpSession.state: 4 = TCP_ESTABLISHED
  TcpSession.pstate: 0 = TCP_NONE
  TcpSession.queue_len: 0
  TcpSession.data_first_seen_dir: 4 = STREAM_TOSERVER
  TcpSession.tcp_packet_flags: 24
  TcpSession.flags: 3147 = STREAMTCP_FLAG_MIDSTREAM | STREAMTCP_FLAG_MIDSTREAM_ESTABLISHED | STREAMTCP_FLAG_TIMESTAMP | STREAMTCP_FLAG_ASYNC | STREAMTCP_FLAG_SACKOK | STREAMTCP_FLAG_LOSSY_BY_LIBERAL
  TcpSession.reassembly_depth: 134217728
```
I can reproduce this pretty reliably (it takes some time, but happens at least daily), so I can pull more info about the stream or the streaming buffer state if needed.
Looking at this output, it seems like the stream is one-sided, attached midstream, and is missing a lot of packets, so it has lots of gaps. The packet we are trying to insert is closer to the beginning of the stream than the end, so it has selected the 3rd region in the streaming buffer to insert into.
Updated by Victor Julien over 1 year ago
Updated by Todd Mortimer over 1 year ago
Updated by Philippe Antoine over 1 year ago
- Related to Bug #6117: tcp regions streaming buffer: assert failed (!((region->stream_offset == sbb->offset && region->buf_offset > sbb->len))), function StreamingBufferSBBGetData added
Updated by Victor Julien over 1 year ago
- Status changed from Assigned to In Review
Updated by Victor Julien over 1 year ago
- Status changed from In Review to Closed
2 more sets of fixes, hoping that this case is addressed by them as well.
Updated by Todd Mortimer over 1 year ago
Victor Julien wrote in #note-9:
Does this still happen with the current git master? I've addressed 2 possible related issues in #5834 and #6041
Okay, we have had this in prod for a couple of days and haven't had any issues with GrowRegionToSize() underflowing, so that looks like it is definitely fixed!
We do see an assert being hit in SBBPrune() now, so I will pull in the latest changes and put them through the test / deploy process to see!
Thank you!