Bug #4187 (Closed): rs_dcerpc_udp_get_tx takes out unusual amount of CPU
Description
I noticed something unusual while chasing https://redmine.openinfosecfoundation.org/issues/4080
It seems rs_dcerpc_udp_get_tx keeps every thread busy with about 0.5-2.5% of the total CPU, which is a lot for a function like that.
(I was also having a side discussion with Eric about it.)
Please also see the attached flame graph (SVG), which covers a 12 hr run.
The purple squares in the screenshot attached are from the rs_dcerpc_udp_get_tx function.
Using:
/opt/suritest-tmp/bin/suricata --build-info
This is Suricata version 6.0.1-dev (f3c59ef8a 2020-12-01)
Features: PCAP_SET_BUFF AF_PACKET HAVE_PACKET_FANOUT LIBCAP_NG LIBNET1.1 HAVE_HTP_URI_NORMALIZE_HOOK PCRE_JIT HAVE_NSS HAVE_LUA HAVE_LUAJIT HAVE_LIBJANSSON TLS TLS_C11 MAGIC RUST
SIMD support: SSE_4_2 SSE_4_1 SSE_3
Atomic intrinsics: 1 2 4 8 16 byte(s)
64-bits, Little-endian architecture
GCC version 4.2.1 Compatible Clang 7.0.1 (tags/RELEASE_701/final), C version 201112
compiled with _FORTIFY_SOURCE=0
L1 cache line size (CLS)=64
thread local storage method: _Thread_local
compiled with LibHTP v0.5.35, linked against LibHTP v0.5.35

Suricata Configuration:
  AF_PACKET support: yes
  eBPF support: yes
  XDP support: yes
  PF_RING support: no
  NFQueue support: no
  NFLOG support: no
  IPFW support: no
  Netmap support: no
  DAG enabled: no
  Napatech enabled: no
  WinDivert enabled: no

  Unix socket enabled: yes
  Detection enabled: yes

  Libmagic support: yes
  libnss support: yes
  libnspr support: yes
  libjansson support: yes
  hiredis support: no
  hiredis async with libevent: no
  Prelude support: no
  PCRE jit: yes
  LUA support: yes, through luajit
  libluajit: yes
  GeoIP2 support: yes
  Non-bundled htp: no
  Hyperscan support: yes
  Libnet support: yes
  liblz4 support: yes

  Rust support: yes
  Rust strict mode: yes
  Rust compiler path: /root/.cargo/bin/rustc
  Rust compiler version: rustc 1.48.0 (7eac88abb 2020-11-16)
  Cargo path: /root/.cargo/bin/cargo
  Cargo version: cargo 1.48.0 (65cbdd2dc 2020-10-14)
  Cargo vendor: yes

  Python support: yes
  Python path: /usr/bin/python3
  Python distutils yes
  Python yaml yes
  Install suricatactl: yes
  Install suricatasc: yes
  Install suricata-update: not bundled

  Profiling enabled: no
  Profiling locks enabled: no
  Plugin support (experimental): yes

Development settings:
  Coccinelle / spatch: no
  Unit tests enabled: no
  Debug output enabled: no
  Debug validation enabled: no

Generic build parameters:
  Installation prefix: /opt/suritest-tmp
  Configuration directory: /opt/suritest-tmp/etc/suricata/
  Log directory: /opt/suritest-tmp/var/log/suricata/

  --prefix /opt/suritest-tmp
  --sysconfdir /opt/suritest-tmp/etc
  --localstatedir /opt/suritest-tmp/var
  --datarootdir /opt/suritest-tmp/share

  Host: x86_64-pc-linux-gnu
  Compiler: clang (exec name) / g++ (real)
  GCC Protect enabled: no
  GCC march native enabled: yes
  GCC Profile enabled: no
  Position Independent Executable enabled: no
  CFLAGS -ggdb3 -O0 -Wchar-subscripts -Wshadow -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -Wno-unused-parameter -Wno-unused-function -std=c11 -march=native -I${srcdir}/../rust/gen -I${srcdir}/../rust/dist
  PCAP_CFLAGS -I/usr/include
  SECCFLAGS
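For context on why a transaction lookup can show up this prominently in a profile: if the per-call lookup walks the flow's transaction list linearly, its cost grows with the number of live transactions, and the lookup may be called many times per flow. The following is only a minimal Rust sketch of that pattern, not the actual Suricata code; the DceRpcUdpState and Tx names and fields are made up for illustration.

// Hypothetical, simplified model of a per-flow app-layer state holding a
// list of transactions. Names and fields are illustrative only.
struct Tx {
    id: u64,
    replied: bool,
}

struct DceRpcUdpState {
    txs: Vec<Tx>,
}

impl DceRpcUdpState {
    // A linear get_tx: every call scans the list until it finds the
    // requested id, so with N live transactions each call is O(N).
    fn get_tx(&self, tx_id: u64) -> Option<&Tx> {
        self.txs.iter().find(|tx| tx.id == tx_id)
    }
}

fn main() {
    // Simulate many unreplied transactions piling up in one flow's state.
    let state = DceRpcUdpState {
        txs: (0..100_000).map(|id| Tx { id, replied: false }).collect(),
    };
    // Worst case: looking up the newest tx walks the whole list.
    println!("found newest tx: {}", state.get_tx(99_999).is_some());
}

With a pattern like this, CPU time in get_tx would be expected to track the number of transactions kept alive in the state, which fits the observations later in this ticket.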
Files
Updated by Peter Manev almost 4 years ago
- File Screenshot from 2020-12-02 23-41-52.png added
Another perf top trace attempt is attached.
Updated by Peter Manev almost 4 years ago
- File deleted (Screenshot from 2020-12-02 23-41-52.png)
Updated by Shivani Bhardwaj almost 4 years ago
- Status changed from New to Assigned
- Assignee set to Shivani Bhardwaj
- Target version set to 7.0.0-beta1
Updated by Peter Manev almost 4 years ago
On certain CPUs I can observe the following:
Updated by Shivani Bhardwaj almost 4 years ago
- Priority changed from Normal to High
Updated by Peter Manev over 3 years ago
It does not seem related to traffic on the traditional DCERPC ports.
When the traditional DCERPC ports are negated, very few DCERPC logs are left - really low, around 40-50 events per minute.
An interesting observation is that the performance hit is tied to a number of specific events appearing.
The CPU usage of the function (rs_dcerpc_udp_get_tx) goes up and down, disappears over time, and only climbs after a handful (5-15) of "UNREPLIED" responses are seen. The performance hit can reach up to 70% of that thread's usage.
Example:
{"timestamp":"2021-02-04T00:28:49.287610+0100","flow_id":126330399730600,"in_iface":"abcd","event_type":"dcerpc","vlan":[222],"src_ip":"x.x.x.x","src_port":51820,"dest_ip":"y.y.y.y","dest_port":61056,"proto":"UDP","dcerpc":{"request":"REQUEST","req":{"opnum":0,"frag_cnt":1,"stub_data_size":63},"response":"UNREPLIED","activityuuid":"2cfa693a-298c-cee2-7c09-ebbd4a5a474e","seqnum":15733106,"rpc_version":"4.0"}}
pcap shared privately.
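This behaviour would be consistent with unreplied UDP transactions accumulating in the per-flow state, so that each lookup has to walk an ever-growing list. As a sketch only (not necessarily what the linked fix does), one common mitigation when tx ids are handed out monotonically is to index directly by offset instead of scanning; the DceRpcUdpState, Tx and tx_id_offset names below are again made up for illustration.

// Illustrative only: O(1) lookup by offset instead of a linear scan,
// assuming tx ids are monotonically increasing and completed txs are
// drained from the front of the list.
struct Tx {
    id: u64,
}

struct DceRpcUdpState {
    txs: Vec<Tx>,
    // id that txs[0] corresponds to; bumped whenever completed
    // transactions are drained from the front of the list.
    tx_id_offset: u64,
}

impl DceRpcUdpState {
    fn get_tx(&self, tx_id: u64) -> Option<&Tx> {
        // Position of a tx in the list is its id minus the offset.
        let idx = tx_id.checked_sub(self.tx_id_offset)? as usize;
        self.txs.get(idx)
    }
}

fn main() {
    let state = DceRpcUdpState {
        txs: (10..20).map(|id| Tx { id }).collect(),
        tx_id_offset: 10,
    };
    println!("found tx 15: {}", state.get_tx(15).is_some());
    // tx 5 would already have been drained, so the lookup returns None.
    println!("found tx 5: {}", state.get_tx(5).is_some());
}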
Updated by Shivani Bhardwaj over 3 years ago
- Status changed from Assigned to Closed
Closed by: https://github.com/OISF/suricata/pull/5891
Updated by Shivani Bhardwaj over 3 years ago
- Copied to Bug #4353: rs_dcerpc_udp_get_tx takes out unusual amount of CPU added