Feature #6802
Support Domain rollup using existing dataset library
Status: Open
Description
Support domain rollup using a specialized matcher that leverages the existing dataset code.
The matcher would walk the input buffer string backward and, at each . (dot), query the dataset for the presence of the subdomain.
Example: api.google.com is in the inspection buffer.
Iterate the string backward and stop at the first dot:
com -> check the dataset
Keep going:
google.com -> check the dataset
api.google.com -> check the dataset
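As a rough illustration, a minimal sketch of that backward scan in Rust (the function names are made up here, and the lookup closure only stands in for a dataset membership check, not Suricata's actual DatasetLookup API):

    // Walk the buffer backward and query a membership callback at every dot
    // boundary, plus once for the full buffer.
    use std::collections::HashSet;

    fn domain_rollup_match(buf: &[u8], lookup: &dyn Fn(&[u8]) -> bool) -> bool {
        for (i, b) in buf.iter().enumerate().rev() {
            if *b == b'.' {
                // The suffix starts right after this dot: "com", then "google.com", ...
                if lookup(&buf[i + 1..]) {
                    return true;
                }
            }
        }
        // Finally check the whole buffer, e.g. "api.google.com".
        lookup(buf)
    }

    fn main() {
        let mut blocked: HashSet<&[u8]> = HashSet::new();
        blocked.insert(b"google.com");
        let lookup = |s: &[u8]| blocked.contains(s);
        assert!(domain_rollup_match(b"api.google.com", &lookup));
        assert!(!domain_rollup_match(b"mail.example.org", &lookup));
    }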
It would introduce a new signature keyword:
dns.query; domain-rollup <dataset-name>;
The matcher would automatically perform a dataset:isset check internally, using the DatasetLookup function directly.
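For illustration, a rule using the proposed keyword could look like the following (the block-domains dataset name is made up, and the keyword itself exists only in this proposal):
alert dns any any -> any any (msg:"DNS query hits blocked domain (rollup)"; dns.query; domain-rollup block-domains; sid:1000001; rev:1;)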
An optimization that could be explored is to support a new dataset type: domain.
In this case the domain would be hashed in reverse order when it is added to the dataset:
if we add google.com to the dataset, it would be stored as the hash of moc.elgoog.
When we navigate the inspection buffer in reverse, the matcher would compute the hash incrementally as it moves along the byte array.
Upon reaching a . (dot), the hash is already complete and ready to be checked; there is no need to rehash the string.
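As a rough illustration of that idea, a sketch using FNV-1a as a placeholder hash (the constants, function names, and storage layout are assumptions for this sketch, not Suricata's actual dataset implementation):

    const FNV_OFFSET: u64 = 0xcbf29ce484222325;
    const FNV_PRIME: u64 = 0x100000001b3;

    fn fnv1a_step(hash: u64, byte: u8) -> u64 {
        (hash ^ byte as u64).wrapping_mul(FNV_PRIME)
    }

    // What a hypothetical "domain" dataset would store for an entry:
    // the hash of the domain fed in reverse ("google.com" -> "moc.elgoog").
    fn hash_reversed(domain: &[u8]) -> u64 {
        domain.iter().rev().fold(FNV_OFFSET, |h, &b| fnv1a_step(h, b))
    }

    fn main() {
        let stored = hash_reversed(b"google.com");

        // Walk the inspection buffer backward, updating the hash one byte at
        // a time. At each dot, the hash of the suffix seen so far is already
        // complete, so the suffix never has to be re-hashed from scratch.
        let buf = b"api.google.com";
        let mut hash = FNV_OFFSET;
        let mut hit = false;
        for &b in buf.iter().rev() {
            if b == b'.' {
                // hash currently covers the suffix after this dot, reversed.
                hit = hit || hash == stored;
            }
            hash = fnv1a_step(hash, b);
        }
        // The full buffer (no dot in front of "api") gets one last check.
        hit = hit || hash == stored;
        assert!(hit);
    }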
Updated by Jason Ish 7 months ago
- Related to Feature #5639: Allow dataset to match on extracted domain added
- Related to Feature #5681: datasets: add more transform layers to match on domains added
Updated by Francois Methot 7 months ago
Support domain rollup matching against the existing dataset functionality (endswith-like behavior).
Option 1 - Domain rollup using a specialized matcher
The matcher would walk the input buffer string backward and, at each . (dot), query the dataset for the presence of the subdomain.
Example: api.google.com is in the inspection buffer.
Iterate the string backward and stop at the first dot:
com -> check the dataset
Keep going:
google.com -> check the dataset
api.google.com -> check the dataset
It would introduce a new signature keyword:
dns.query; domain-rollup <dataset-name>;
The matcher would automatically perform a dataset:isset check internally, using the DatasetLookup function directly.
Option 2 - Add support for a new "domain" dataset type to enable subdomain matching
Config example:
datasets:
  domain-block:
    type: domain
    state: domain-block.lst
Signature implementation
dataset:set -> add the domain to the associated dataset
dataset:isset -> return true if the domain matcher (as described in Option 1) finds any subdomain in the associated dataset
dataset:isnotset -> return true if dataset:isset would return false
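For illustration, with such a dataset configured, an ordinary dataset rule would then match on any subdomain of the stored entries (the dataset keyword syntax below already exists; the rollup behavior would come from the proposed domain type):
alert dns any any -> any any (msg:"DNS query hits blocked domain"; dns.query; dataset:isset,domain-block; sid:1000002; rev:1;)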
In this case the domain could be hashed in reverse order when it is added to the dataset:
if we add google.com to the dataset, it would be stored as the hash of moc.elgoog.
When we navigate the inspection buffer in reverse, the matcher would compute the hash incrementally as it moves along the byte array.
Upon reaching a . (dot), the hash is already complete and ready to be checked; there is no need to rehash the string.
Updated by Eric Leblond 6 months ago
Hello François, what is your final goal? Is the domain keyword as implemented in https://github.com/OISF/suricata/pull/8155 enough for your needs? We still need a discussion on the crate to use, but the concept looked OK to the OISF team.
Updated by Francois Methot 6 months ago · Edited
Eric Leblond wrote in #note-3:
Hello François, what is your final goal? Is the domain keyword as implemented in https://github.com/OISF/suricata/pull/8155 enough for your needs? We still need a discussion on the crate to use, but the concept looked OK to the OISF team.
Our end goal is to use datasets to store domains with any number of subdomain labels and match them as subdomains (suffix matches).
So we could have IOCs like:
- test1.com
- test2.test3.com
- test4.test5.test6.com
- any.number.of.subdomains.test7.test8.test9.com
These would be added to a dataset, allowing us to match any dns.query/http.host value that ends with one of these subdomain sequences.
We are keen on using datasets because domains can be added/removed very quickly without reloading rules.
The only drawback of this algorithm is that a long domain from the wire like "any.number.of.subdomains.test7.test8.test9.com" will trigger multiple dataset checks (8 checks in this case, enumerated below).
But our tests showed that dataset hash check performance is great.
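Rolling the algorithm out on that example, the lookups would be, from shortest suffix to longest:
- com
- test9.com
- test8.test9.com
- test7.test8.test9.com
- subdomains.test7.test8.test9.com
- of.subdomains.test7.test8.test9.com
- number.of.subdomains.test7.test8.test9.com
- any.number.of.subdomains.test7.test8.test9.com
That is one lookup per label, eight in total.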