Optimization #3322: Use standard CRC32 for hash-like functions - Suricata - Open Information Security Foundation

Optimization #3322

open

Use standard CRC32 for hash-like functions

Added by Philippe Antoine over 5 years ago. Updated 7 months ago.

Status:

New

Priority:

Low

Assignee:

Community Ticket

Target version:

TBD

Effort:

low

Difficulty:

Label:

Description

Use a standard CRC32 instead of a custom hash function (CRC32 was, I think, designed in part to avoid collisions).

One such function is StringHash; cf. https://github.com/OISF/suricata/pull/4337
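
To make the proposal concrete, here is a minimal sketch of a standard CRC-32 used as a string hash. The function names `crc32_bytes` and `crc32_string_bucket` are hypothetical and are not taken from the Suricata code base; the bitwise loop is chosen for clarity, not speed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 with the reflected polynomial 0xEDB88320 (the variant
 * used by zlib and Ethernet). Table-free and slow; a real build would
 * more likely use a lookup table or a hardware instruction. */
static uint32_t crc32_bytes(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* mask is all ones when the low bit is set, zero otherwise */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;
}

/* Hypothetical stand-in for a string hash: CRC-32 of the bytes, reduced
 * to a bucket index. table_size must be non-zero. */
static uint32_t crc32_string_bucket(const char *s, uint32_t table_size)
{
    return crc32_bytes((const uint8_t *)s, strlen(s)) % table_size;
}
```

Whether this distributes Suricata's real keys better than the existing function is exactly what would need to be measured.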

Related issues 2 (1 open — 1 closed)

Updated by Victor Julien over 5 years ago

As we discussed offline, it would be nice to create some kind of benchmarking framework where we could validate such changes. Pure pcap tests may not always give enough insight. For example, with the bm (Boyer-Moore) optimizations, pcap-based tests showed no difference, while I think micro-level benchmarks would have shown something.
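
As a rough illustration of what a micro-level benchmark could look like, the sketch below times repeated calls to a hash function over a fixed key set. `bench_hash`, `hash_fn` and the `djb2` sample function are hypothetical names for this example only, and `clock()` measures CPU time at coarse resolution, so the round count needs to be large for the number to mean anything.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef uint32_t (*hash_fn)(const char *key);

/* Returns the average CPU time per call in nanoseconds. The sum of all
 * hash results is stored in *sink so the calls cannot be optimized away. */
static double bench_hash(hash_fn fn, const char *const *keys, size_t nkeys,
                         size_t rounds, uint32_t *sink)
{
    uint32_t acc = 0;
    clock_t start = clock();
    for (size_t r = 0; r < rounds; r++) {
        for (size_t i = 0; i < nkeys; i++) {
            acc += fn(keys[i]);
        }
    }
    clock_t end = clock();
    *sink = acc;

    double calls = (double)rounds * (double)nkeys;
    if (calls == 0.0) {
        return 0.0;
    }
    return ((double)(end - start) / CLOCKS_PER_SEC) * 1e9 / calls;
}

/* Sample function to measure: the classic DJB2 string hash. */
static uint32_t djb2(const char *s)
{
    uint32_t h = 5381u;
    for (; *s != '\0'; s++) {
        h = h * 33u + (unsigned char)*s;
    }
    return h;
}
```

A caller would pass an array of representative keys, for example `bench_hash(djb2, keys, nkeys, 1000000, &sink)`, and compare the result across candidate functions on the same machine.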

Updated by Philippe Antoine over 5 years ago

It would be nice if this benchmarking framework handles caches realistically.

With the example of the Boyer-Moore optimizations (one less call to alloc), I am not sure a naive benchmark would show much difference: the additional call to alloc would repeatedly grab the same cached memory area, whereas in a real Suricata execution this would not be the case.
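
The effect described above can be observed with a small experiment: allocate and free in a tight loop and count how often the allocator returns the address it was just given back. This is only a sketch; `count_reused_allocations` is a hypothetical helper, and the result depends entirely on the allocator in use, so no particular count should be expected.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Counts how many times malloc() returns the same address as the block
 * freed in the previous iteration. Returns 0 for a zero size. */
static size_t count_reused_allocations(size_t size, size_t iterations)
{
    if (size == 0) {
        return 0;
    }
    uintptr_t previous = 0;
    size_t reused = 0;
    for (size_t i = 0; i < iterations; i++) {
        volatile unsigned char *p = malloc(size);
        if (p == NULL) {
            break;
        }
        p[0] = 1; /* touch the block so the allocation is really used */
        uintptr_t current = (uintptr_t)p;
        if (i > 0 && current == previous) {
            reused++;
        }
        previous = current;
        free((void *)p);
    }
    return reused;
}
```

If the count comes out close to the iteration count, a naive loop benchmark is mostly measuring a warm, already cached block, which is the concern raised here.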

Updated by Andreas Herz over 5 years ago

Assignee set to Philippe Antoine
Target version set to TBD

Updated by Philippe Antoine over 5 years ago

MurmurHash may be the best function
https://en.wikipedia.org/wiki/MurmurHash

The current function is not random.
It is the DJB hash:
https://stackoverflow.com/questions/10696223/reason-for-5381-number-in-djb-hash-function

Here is a comparison
https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed
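
For reference, here is a sketch of the two functions being compared: the classic DJB2 form (multiply by 33, start at 5381) and the 32-bit x86 variant of MurmurHash3. The DJB2 code shows the textbook form and may differ from the exact code in Suricata; the MurmurHash3 version assembles each 4-byte block in little-endian order, which matches the reference implementation on little-endian machines.

```c
#include <stddef.h>
#include <stdint.h>

/* Textbook DJB2: h = h * 33 + byte, starting from 5381. */
static uint32_t djb2(const uint8_t *data, size_t len)
{
    uint32_t h = 5381u;
    for (size_t i = 0; i < len; i++) {
        h = h * 33u + data[i];
    }
    return h;
}

static uint32_t rotl32(uint32_t x, int r)
{
    return (x << r) | (x >> (32 - r));
}

/* MurmurHash3, x86 32-bit variant. */
static uint32_t murmur3_32(const uint8_t *data, size_t len, uint32_t seed)
{
    const uint32_t c1 = 0xcc9e2d51u;
    const uint32_t c2 = 0x1b873593u;
    uint32_t h = seed;
    size_t nblocks = len / 4;

    /* Body: mix one 4-byte block at a time. */
    for (size_t i = 0; i < nblocks; i++) {
        const uint8_t *p = data + 4 * i;
        uint32_t k = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                     ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
        k *= c1;
        k = rotl32(k, 15);
        k *= c2;
        h ^= k;
        h = rotl32(h, 13);
        h = h * 5u + 0xe6546b64u;
    }

    /* Tail: the remaining 1 to 3 bytes, if any. */
    const uint8_t *tail = data + nblocks * 4;
    uint32_t k = 0;
    switch (len & 3u) {
    case 3: k ^= (uint32_t)tail[2] << 16; /* fall through */
    case 2: k ^= (uint32_t)tail[1] << 8;  /* fall through */
    case 1: k ^= tail[0];
            k *= c1;
            k = rotl32(k, 15);
            k *= c2;
            h ^= k;
    }

    /* Finalization: fold in the length and avalanche the bits. */
    h ^= (uint32_t)len;
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}
```

DJB2 does one multiply-add per byte with no final mixing, while MurmurHash3 processes four bytes per step and avalanches at the end; which is faster or better distributed on Suricata's keys is a question for the benchmark.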

Updated by Philippe Antoine over 5 years ago

I put a first benchmark here
https://github.com/OISF/suricata/pull/4371

Updated by Philippe Antoine about 5 years ago

Status changed from New to In Review

https://github.com/OISF/suricata/pull/4371

Updated by Roland Fischer almost 5 years ago

There are a few other hash functions that might be interesting. Google's CityHash comes to mind as it's their "general purpose" hash. SpookyHash/xxhash might be options as well. The stackexchange page lists them as well I think.

Ultimately, it depends on how much time you want to spend on this vs how happy you are with the current hash. Plus, what the usage patterns of the hash are as well as the data to be hashed. ;)

Updated by Victor Julien over 4 years ago

Blocked by Bug #4265: QA lab: add possibility to do repeatable replay tests added

Updated by Philippe Antoine over 4 years ago

Closing https://github.com/OISF/suricata/pull/4816 while waiting for the QA lab

Updated by Philippe Antoine over 2 years ago

Assignee changed from Philippe Antoine to Community Ticket
Priority changed from Normal to Low

This does not seem the most important area to optimize...

Updated by Philippe Antoine 8 months ago

Status changed from In Review to New

Updated by Philippe Antoine 7 months ago

Related to Security #7209: thash: random factor not used; possible abusive hash collisions added

Updated by Philippe Antoine 7 months ago

SipHash seems to be the current standard, used internally by Rust and others...
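
One reason a keyed hash such as SipHash is attractive: a DJB-style hash stays predictable even when a random value is mixed in as the initial state. The sketch below assumes the seed is used as the starting value of a DJB2-style loop (an assumption for illustration, not a description of the Suricata code). For any seed, both "Ez" and "FY" hash to seed * 1089 + 2399, so they collide regardless of the seed.

```c
#include <stdint.h>

/* DJB2-style hash with the seed as the initial state. This is an assumed
 * way of adding a random factor, used here only for illustration. */
static uint32_t djb2_seeded(uint32_t seed, const char *s)
{
    uint32_t h = seed;
    for (; *s != '\0'; s++) {
        h = h * 33u + (unsigned char)*s;
    }
    return h;
}

/* Returns 1 when "Ez" and "FY" collide under the given seed.
 * 33 * 'E' + 'z' == 33 * 'F' + 'Y' == 2399, independent of the seed. */
static int collides_under_seed(uint32_t seed)
{
    return djb2_seeded(seed, "Ez") == djb2_seeded(seed, "FY");
}
```

Longer colliding keys can be built by concatenating such pairs, which is the kind of abuse a keyed function like SipHash is designed to resist.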

Related to Suricata - Security #7209: thash: random factor not used; possible abusive hash collisions (Closed, Philippe Antoine)
Blocked by Suricata - Bug #4265: QA lab: add possibility to do repeatable replay tests (In Progress, Peter Manev)
