Feature #6805
cpu-affinity: enhance CPU affinity logic with per-interface NUMA preferences
Description
This could help with deployments where the CPU cores of one NUMA node are interleaved with the CPU cores of the other NUMA node(s) and there are NICs on every NUMA node.
In this scenario, the user might want to use CPU cores from both NUMA nodes but control the NUMA assignment on a per-interface basis.
Suppose we have 2 NUMA nodes and 2 interfaces; currently we cannot assign CPU cores to the interfaces in a NUMA-friendly way:
e.g.:
NUMA node 1 CPUs: 0,2,4,6,8
NUMA node 2 CPUs: 1,3,5,7,9
iface1 on NUMA node 1,
iface2 on NUMA node 2.
The desired assignment: cores 0,2,4,6,8 are assigned to iface1 and cores 1,3,5,7,9 are assigned to iface2.
Currently, the cores are merged together and picked up in order by the individual NICs, so iface1 gets cores 0,1,2,3,4 and iface2 gets cores 5,6,7,8,9.
This could be solved by more granular CPU assignment, e.g. a CPU mask per interface, or the CPU-assignment logic could prefer CPU cores from the NUMA node of the NIC that is currently being configured.
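As a rough illustration of the per-node grouping described above, the following standalone C sketch (not Suricata code) uses libnuma to split a configured CPU list by NUMA node; on the example box it would print 0,2,4,6,8 for one node and 1,3,5,7,9 for the other (actual node numbering may differ). The CPU list is hard-coded for the example.

```c
/* Sketch only: group a configured CPU set by NUMA node using libnuma.
 * Build with: gcc -o group group.c -lnuma */
#include <stdio.h>
#include <numa.h>   /* numa_available(), numa_max_node(), numa_node_of_cpu() */

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available on this system\n");
        return 1;
    }

    /* CPU set from the example above: cores 0-9, interleaved across nodes */
    int cpus[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    int ncpus = sizeof(cpus) / sizeof(cpus[0]);

    /* Print the cores that belong to each NUMA node */
    for (int node = 0; node <= numa_max_node(); node++) {
        printf("NUMA node %d:", node);
        for (int i = 0; i < ncpus; i++) {
            if (numa_node_of_cpu(cpus[i]) == node)
                printf(" %d", cpus[i]);
        }
        printf("\n");
    }
    return 0;
}
```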
Updated by Lukas Sismis 5 months ago · Edited
This feature will likely need to be capture-mode-specific, e.g. for DPDK and AF_PACKET.
Or it might be generic if there is a way to obtain the NUMA ID of the NIC via generic calls.
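One generic, Linux-only way to obtain the NUMA node of a NIC is to read /sys/class/net/<iface>/device/numa_node (present for PCI-backed interfaces; it reads -1 when unknown, e.g. on single-node systems or virtual interfaces). A minimal sketch, with a hypothetical helper name GetIfaceNumaNode:

```c
/* Sketch: obtain the NUMA node of a NIC in a capture-mode-agnostic way
 * on Linux by reading /sys/class/net/<iface>/device/numa_node.
 * Returns the node id, or -1 if unknown. Illustrative only. */
#include <stdio.h>

static int GetIfaceNumaNode(const char *iface)
{
    char path[256];
    snprintf(path, sizeof(path), "/sys/class/net/%s/device/numa_node", iface);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    int node = -1;
    if (fscanf(fp, "%d", &node) != 1)
        node = -1;
    fclose(fp);
    return node;
}

int main(void)
{
    printf("eth0 is on NUMA node %d\n", GetIfaceNumaNode("eth0"));
    return 0;
}
```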
For AF_PACKET you also need to support autofp mode; in that case you need to consider receive-cpu-set, otherwise you consider worker-cpu-set. This doesn't make much sense either, because receive-cpu-set would then transfer data to worker-cpu-set.
Can management threads be pinned to a specific NUMA node and primarily work only with that node?
- Not at the moment.
Is memory allocated on both NUMA nodes for Suricata structures?
- The packet pool should be allocated by the worker, flow memory by the main thread.
The true goal would be to be NUMA-local with all memory allocations.
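For reference, one way to verify where an allocation actually ended up is the kernel's move_pages() query mode (nodes argument set to NULL). A minimal sketch, not Suricata code; build with -lnuma:

```c
/* Sketch: report which NUMA node a given allocation resides on, using
 * move_pages() in query mode. Useful for checking whether e.g. packet
 * pool or flow memory ended up NUMA-local. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <numaif.h>   /* move_pages() */

static int NodeOfAddress(void *addr)
{
    void *pages[1] = { addr };
    int status[1] = { -1 };
    /* With nodes == NULL, move_pages() only reports the node each page
     * currently resides on (the page must already have been touched). */
    if (move_pages(0 /* self */, 1, pages, NULL, status, 0) != 0)
        return -1;
    return status[0];
}

int main(void)
{
    size_t sz = 1 << 20;
    char *buf = malloc(sz);
    if (buf == NULL)
        return 1;
    memset(buf, 0, sz);              /* touch so pages are faulted in */
    printf("buffer resides on NUMA node %d\n", NodeOfAddress(buf));
    free(buf);
    return 0;
}
```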
In DPDK it can be relatively easy to pin workers to a specific NUMA node (see the sketch after this list):
- you take the first CPU core from the NUMA node where the NIC is,
- memory for the packet mempool is allocated on the same NUMA node, because it is allocated in the initialization phase of the workers,
- the Suricata packet pool should be allocated by the worker via a kernel call, hopefully on the same NUMA node as the CPU,
- it is unfortunate with the flow table and other structures, etc.; they will likely be allocated on only one NUMA node.
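A minimal sketch of that DPDK flow, assuming the EAL has already been initialized; the port id, pool sizing and lcore selection here are illustrative, not Suricata's actual setup code:

```c
/* Sketch: determine the NUMA node of a port, allocate the packet mempool
 * on that node, and pick a worker lcore from the same node. */
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

static int setup_port_numa_local(uint16_t port_id)
{
    /* NUMA node the NIC is attached to (SOCKET_ID_ANY if unknown) */
    int nic_socket = rte_eth_dev_socket_id(port_id);

    /* Packet mempool allocated on the NIC's NUMA node */
    struct rte_mempool *mp = rte_pktmbuf_pool_create(
        "pkt_pool", 8192, 256, 0, RTE_MBUF_DEFAULT_BUF_SIZE, nic_socket);
    if (mp == NULL)
        return -1;

    /* Pick the first worker lcore that lives on the same NUMA node */
    unsigned lcore;
    RTE_LCORE_FOREACH_WORKER(lcore) {
        if ((int)rte_lcore_to_socket_id(lcore) == nic_socket) {
            /* launch the worker for port_id on this lcore ... */
            break;
        }
    }
    return 0;
}
```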
The flow table could be allocated per interface to be NUMA-local (the OS-level mechanism is sketched after this list),
- this likely requires a lot of changes,
- it might not be so useful in the end in larger deployments,
- currently it is not possible to allocate memory for the flow table on a specific NUMA node.
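For context, the OS-level mechanism such a change would build on is libnuma's numa_alloc_onnode(). The sketch below only illustrates that mechanism; the names are hypothetical and this is not current Suricata behaviour. Build with -lnuma:

```c
/* Sketch: what a per-NUMA-node bucket array allocation could look like
 * if the flow table were ever split per interface/node. */
#include <stdio.h>
#include <numa.h>   /* numa_available(), numa_alloc_onnode(), numa_free() */

struct FlowBucket { void *head; };   /* placeholder bucket type */

static struct FlowBucket *AllocBucketsOnNode(size_t nbuckets, int node)
{
    /* numa_alloc_onnode() binds the allocation to the given node */
    return numa_alloc_onnode(nbuckets * sizeof(struct FlowBucket), node);
}

int main(void)
{
    if (numa_available() < 0)
        return 1;
    size_t nbuckets = 65536;
    struct FlowBucket *node0 = AllocBucketsOnNode(nbuckets, 0);
    struct FlowBucket *node1 = AllocBucketsOnNode(nbuckets, 1);
    printf("node0=%p node1=%p\n", (void *)node0, (void *)node1);
    if (node0) numa_free(node0, nbuckets * sizeof(struct FlowBucket));
    if (node1) numa_free(node1, nbuckets * sizeof(struct FlowBucket));
    return 0;
}
```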
Implementing it in a generic way could use the following call chain:
RunModeSetLiveCaptureWorkersForDevice -> TmThreadCreatePacketHandler -> TmThreadCreate -> TmThreadSetSlots -> TmThreadsSlotVar -> TmThreadSetupOptions -> AffinityGetNextCPU
Design idea:
From RunModeSetLiveCaptureWorkersForDevice, propagate the device name to AffinityGetNextCPU, or assign it to the thread vars structure so that it can later be queried in AffinityGetNextCPU for the NUMA ID. In this function, individual CPUs should also be queried for their NUMA locality.
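A pseudocode-level sketch of this design idea; the names below (HypotheticalThreadVars, AffinityGetNextCPUNumaAware) are hypothetical stand-ins, not the actual Suricata API, and the NIC's NUMA node is assumed to have been resolved already, e.g. via the sysfs lookup shown earlier. Build with -lnuma:

```c
/* Sketch: the device name set in the runmode setup travels with the
 * thread vars, and the core-picking step prefers CPUs whose NUMA node
 * matches the NIC's NUMA node, falling back to the current behaviour. */
#include <stdio.h>
#include <numa.h>    /* numa_available(), numa_node_of_cpu() */

typedef struct HypotheticalThreadVars_ {
    const char *iface;       /* device name propagated from the runmode setup */
    int iface_numa_node;     /* resolved once, e.g. from sysfs or DPDK; -1 if unknown */
} HypotheticalThreadVars;

static int AffinityGetNextCPUNumaAware(const HypotheticalThreadVars *tv,
                                       const int *cpu_set, int ncpus, int *used)
{
    /* First pass: prefer an unused CPU local to the NIC's NUMA node */
    if (tv->iface_numa_node >= 0) {
        for (int i = 0; i < ncpus; i++) {
            if (!used[i] && numa_node_of_cpu(cpu_set[i]) == tv->iface_numa_node) {
                used[i] = 1;
                return cpu_set[i];
            }
        }
    }
    /* Second pass: fall back to the current "next free CPU" behaviour */
    for (int i = 0; i < ncpus; i++) {
        if (!used[i]) {
            used[i] = 1;
            return cpu_set[i];
        }
    }
    return -1;   /* no CPUs left in the set */
}

int main(void)
{
    if (numa_available() < 0)
        return 1;
    HypotheticalThreadVars tv = { .iface = "eth0", .iface_numa_node = 1 };
    int cpu_set[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    int used[10] = { 0 };
    int cpu = AffinityGetNextCPUNumaAware(&tv, cpu_set, 10, used);
    printf("first CPU for %s: %d\n", tv.iface, cpu);
    return 0;
}
```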
Updated by Victor Julien 5 months ago
- Related to Task #3318: Research: NUMA awareness added
Updated by Victor Julien 5 months ago
I think note 2 is mostly off topic here. It should probably be added to #3318 or a related ticket. Let's focus this ticket on how to express the NIC/NUMA/cores mapping in our yaml.
Updated by Victor Julien 4 months ago
- Related to Bug #7137: "invalid cpu range" when trying to use CPU affinity added
Updated by Lukas Sismis about 2 months ago
- Related to Task #3695: research: libhwloc for better autoconfiguration added