Optimization #7208
opentcp/reassemble: GetBlock takes O(nlgn) in worst case
Description
The GetBlock function search the suitable StreamingBufferBlock by offset, but it relys on SBB_RB_NEXT which degree the sbb red black tree search time from O(logn) to O(n), and the SBB_RB_NEXT increases L1 dcache missing when tcp session within much tcpsegment. In production env, SBB_RB_NEXT would occoupies 8-12% cpu times. Suggest changing the way to find block to standerd red black search:
Updated by chris tang 3 months ago
1. Finds the first node A greater than or equal to the offset
2. Find A's prev node B
3. Check B->len+B->offset > offset? yes:return B ,no: return A
According to optimization above, we've decrease the cpu time from 8-12% to under 1%.
Updated by Shivani Bhardwaj 3 months ago
chris tang wrote in #note-1:
1. Finds the first node A greater than or equal to the offset
2. Find A's prev node B
3. Check B->len+B->offset > offset? yes:return B ,no: return AAccording to optimization above, we've decrease the cpu time from 8-12% to under 1%.
Thank you.
Q: Could you please tell how is it guaranteed that only the A's prev node B will satisfy B->len + B->offset > offset
?
Q: Could it not be any other node prior to A?
Updated by chris tang 3 months ago
Shivani Bhardwaj wrote in #note-2:
chris tang wrote in #note-1:
1. Finds the first node A greater than or equal to the offset
2. Find A's prev node B
3. Check B->len+B->offset > offset? yes:return B ,no: return AAccording to optimization above, we've decrease the cpu time from 8-12% to under 1%.
Thank you.
Q: Could you please tell how is it guaranteed that only the A's prev node B will satisfyB->len + B->offset > offset
?
Q: Could it not be any other node prior to A?
Thanks for pointing out that, that's also confused me.
So I read the SBBUpdate function code, found that new StreamingBufferBlock node will be adjusted by ConsolidateFwd and ConsolidateBackward to make sure there is only one node at a certain offset+len, overlap would be merged, only gap exists? I'm not sure if my understanding is right. If it is,i think any node prior to B will not satisfy if B satisfied B->len + B->offset > offset
?
Updated by Shivani Bhardwaj about 2 months ago
So I read the SBBUpdate function code, found that new StreamingBufferBlock node will be adjusted by ConsolidateFwd and ConsolidateBackward to make sure there is only one node at a certain offset+len, overlap would be merged, only gap exists? I'm not sure if my understanding is right. If it is,i think any node prior to B will not satisfy if B satisfied
B->len + B->offset > offset
?
Thank you for the necessary pointers. Apologies for getting back late on this.
After my analysis, I think you algorithm could actually work and yield more correct results indeed.
However, I'm a bit unsure about the claimed time complexity of the rbtree macros. It seems to me that RB_NEXT
(finding successor) will take the same amount of time as RB_PREV
(finding predecessor) i.e. O(lg n) per run. Please check the implementation in tree.h
in the source code. Lmk wdyt about this?
However, with the change in algorithm that you suggest, particularly the search for first node greater than equal to offset, we always seem to reach the correct node.
I'd like to understand how you'd handle the following.
Assumption: There is no offset bigger than the current offset in the tree.
1. There does exist an overlap with an existing buffer whose offset + len > offset (right most node always?)
2. There is no block covering that offset (so, perhaps we're in a GAP?)
Let's see if your PR passes our QA and if our analysis is correct. :) Please make sure to add your test results and useful pointers/test data for us to verify that as well.
Please feel free to assign the ticket to yourself since you're working on it. Thanks.
Updated by chris tang about 2 months ago
- Assignee changed from OISF Dev to chris tang
Thank you for your ongoing attention and responses to my tickets.
However, I'm a bit unsure about the claimed time complexity of the rbtree macros. It seems to me that
RB_NEXT
(finding successor) will take the same amount of time asRB_PREV
(finding predecessor) i.e. O(lg n) per run. Please check the implementation intree.h
in the source code. Lmk wdyt about this?
Sorry for my describe, yes the time complexity of RB_NEXT is equal to RB_PREV, i mean, but the GetBlock call O(n) times RB_NEXT.
for ( ; blk != NULL; blk = SBB_RB_NEXT(blk)) {
if (blk->offset >= offset)
return blk;
else if ((blk->offset + blk->len) > offset) {
return blk;
}
}
It searches from beginning of the tree, one by one, truns rb tree search into line. In worst situation, the node we find is the end block of the streaming buffer, it takes n*logn time to find it. If we find the prior node from root, we can run RB_PREV only once. Here's the code snippet:
static StreamingBufferBlock *GetBlockDo(const StreamingBuffer *sb, const uint64_t offset)
{
StreamingBufferBlock *blk;
StreamingBufferBlock *res = GetPriorBlock(sb, offset);
if (res != NULL) {
blk = SBB_RB_PREV(res);
} else {
blk = SBB_RB_MINMAX(&sb->sbb_tree, RB_INF);
}
if (blk && blk->offset + blk->len > offset) {
return blk;
}
return res;
}
However, with the change in algorithm that you suggest, particularly the search for first node greater than equal to offset, we always seem to reach the correct node.
I'd like to understand how you'd handle the following.
Assumption: There is no offset bigger than the current offset in the tree.
1. There does exist an overlap with an existing buffer whose offset + len > offset (right most node always?)
2. There is no block covering that offset (so, perhaps we're in a GAP?)
Pleas look at the code snippet,
1.Return the RB_MAX node, then check if it satisfied offset and length check
2.Like what current GetBlock code is doing: seems that there is't a correct node, so we return NULL.
Let's see if your PR passes our QA and if our analysis is correct. :) Please make sure to add your test results and useful pointers/test data for us to verify that as well.
Please feel free to assign the ticket to yourself since you're working on it. Thanks.
Thanks,here's the commit https://github.com/regit/suricata/pull/51/commits/04906b57fdb0ed5a941389e8630c7783cd15d740 PR and hope it passed the QA.
Updated by Shivani Bhardwaj about 2 months ago
chris tang wrote in #note-6:
Thank you for your ongoing attention and responses to my tickets.
However, I'm a bit unsure about the claimed time complexity of the rbtree macros. It seems to me that
RB_NEXT
(finding successor) will take the same amount of time asRB_PREV
(finding predecessor) i.e. O(lg n) per run. Please check the implementation intree.h
in the source code. Lmk wdyt about this?Sorry for my describe, yes the time complexity of RB_NEXT is equal to RB_PREV, i mean, but the GetBlock call O(n) times RB_NEXT.
ah ok. yes, thank you!
[...]
It searches from beginning of the tree, one by one, truns rb tree search into line. In worst situation, the node we find is the end block of the streaming buffer, it takes n*logn time to find it. If we find the prior node from root, we can run RB_PREV only once. Here's the code snippet:
agreed.
[...]
However, with the change in algorithm that you suggest, particularly the search for first node greater than equal to offset, we always seem to reach the correct node.
I'd like to understand how you'd handle the following.
Assumption: There is no offset bigger than the current offset in the tree.
1. There does exist an overlap with an existing buffer whose offset + len > offset (right most node always?)
2. There is no block covering that offset (so, perhaps we're in a GAP?)Pleas look at the code snippet,
1.Return the RB_MAX node, then check if it satisfied offset and length check
2.Like what current GetBlock code is doing: seems that there is't a correct node, so we return NULL.Let's see if your PR passes our QA and if our analysis is correct. :) Please make sure to add your test results and useful pointers/test data for us to verify that as well.
Please feel free to assign the ticket to yourself since you're working on it. Thanks.
Thanks,here's the commit https://github.com/regit/suricata/pull/51/commits/04906b57fdb0ed5a941389e8630c7783cd15d740 PR and hope it passed the QA.
Thank you. Could you please create a PR at https://github.com/OISF/suricata/ so we can see if the QA passes and we're both correct in our analysis? ;)
Updated by chris tang about 2 months ago
Shivani Bhardwaj wrote in #note-7:
Thank you. Could you please create a PR at https://github.com/OISF/suricata/ so we can see if the QA passes and we're both correct in our analysis? ;)
Oh sorry for my mistake. I've request a PR at https://github.com/OISF/suricata/ but wrote the wrong address, here's the PR https://github.com/OISF/suricata/pull/11727, and i'm requesting a new PR to format code and sign commit.
Thanks.
Updated by chris tang about 2 months ago ยท Edited
Here is the new PR https://github.com/OISF/suricata/pull/11749
Updated by Shivani Bhardwaj about 1 month ago
- Subject changed from Tcpreassmble GetBlock needs optimization to tcp/reassemble: GetBlock takes O(nlgn) in worst case
- Status changed from New to In Review
Updated by Victor Julien about 1 month ago
- Status changed from In Review to Resolved
- Label Needs backport to 7.0 added
Updated by OISF Ticketbot about 1 month ago
- Label deleted (
Needs backport to 7.0)