Feature #2076
closed
Strip whitespace from buffers
Added by Jason Williams almost 8 years ago.
Updated almost 7 years ago.
Description
It would be very useful to have a modifier that can be applied to buffers to normalize or eliminate whitespace. This would be most useful in the file_data section and would significantly reduce PCRE usage in HTML and JavaScript signatures.
For example, in JavaScript you can have:
'window . location = '
We have to write PCREs to account for this possible whitespace, such as:
content:"window"; pcre:"/^\s*\.\s*location\s*=\s*/R";
It would be much simpler if we could write this as:
file_data; content:"window.location="; ignore_whitespace;
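To illustrate what the proposed ignore_whitespace behaviour would mean for a single content match, here is a minimal C sketch. It is not Suricata code: strip_whitespace and find_content are hypothetical helpers, and "whitespace" is assumed to mean whatever isspace() accepts in the C locale. The pattern "window.location=" never matches the raw JavaScript, but does match once the buffer is stripped.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `in` to `out`, dropping every byte for which
 * isspace() is true (space, \t, \n, \v, \f, \r in the C locale).
 * `out` must have room for at least `len` bytes. Returns the new length. */
size_t strip_whitespace(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (!isspace(in[i]))
            out[n++] = in[i];
    }
    return n;
}

/* Naive substring search over a byte buffer: returns the offset of the
 * first occurrence of `pat` in `buf`, or -1 if there is none. */
long find_content(const unsigned char *buf, size_t len, const char *pat)
{
    size_t plen = strlen(pat);
    if (plen == 0)
        return 0;
    for (size_t i = 0; i + plen <= len; i++) {
        if (memcmp(buf + i, pat, plen) == 0)
            return (long)i;
    }
    return -1;
}
```

With this model, a plain content match replaces the content-plus-PCRE pair, at the cost of one extra pass over the buffer to build the stripped copy.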
How do you see this interacting with other keywords?
file_data; content:"window.location="; ignore_whitespace; content:"something"; distance:0; within:10; isdataat:!1,relative;
Would the second content and the isdataat also run on the stripped buffer? If so, it might make more sense to have something like:
file_data; ignore_whitespace; content:"window.location="; content:"something"; distance:0; within:10; isdataat:!1,relative;
Or even something ugly like:
file_data_ignore_whitespace; content:"window.location="; content:"something"; distance:0; within:10; isdataat:!1,relative;
If we preprocess the file_data buffer to strip whitespace or do some other transformation, we're essentially creating a new buffer and a new inspect engine internally. Related ticket: #1006.
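If the transformation is applied up front, every keyword that follows it in the rule sees the new buffer. The C sketch below uses hypothetical names (TransformedBuffer, transform_strip, match_chain) and is not Suricata's inspect-engine API. It builds a stripped copy in its own allocation and evaluates the example chain from the second option above entirely against that copy. It assumes isdataat:!1,relative means "the previous match ends at the end of the buffer".

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical transformed buffer: the stripped bytes live in their own
 * allocation, so the original file_data is left untouched for rules that
 * do not request the transformation. */
typedef struct {
    unsigned char *data;
    size_t len;
} TransformedBuffer;

/* Build a copy of `in` with every isspace() byte removed.
 * Returns 0 on success, -1 on allocation failure; caller frees tb->data. */
int transform_strip(const unsigned char *in, size_t len, TransformedBuffer *tb)
{
    tb->data = malloc(len > 0 ? len : 1);
    tb->len = 0;
    if (tb->data == NULL)
        return -1;
    for (size_t i = 0; i < len; i++) {
        if (!isspace(in[i]))
            tb->data[tb->len++] = in[i];
    }
    return 0;
}

/* Evaluate, on the transformed buffer only, the chain
 *   content:"window.location="; content:<second>; distance:0; within:10;
 *   isdataat:!1,relative;
 * isdataat:!1,relative is modelled as "the second match ends exactly at
 * the end of the buffer". Returns 1 if the chain matches, else 0. */
int match_chain(const TransformedBuffer *tb, const char *second)
{
    const char *first = "window.location=";
    size_t flen = strlen(first);
    size_t slen = strlen(second);

    for (size_t i = 0; i + flen <= tb->len; i++) {
        if (memcmp(tb->data + i, first, flen) != 0)
            continue;
        size_t start = i + flen;   /* distance:0 -> begin right after match */
        size_t limit = start + 10; /* within:10 -> must end by start + 10 */
        for (size_t j = start; j + slen <= tb->len && j + slen <= limit; j++) {
            if (memcmp(tb->data + j, second, slen) == 0 &&
                j + slen == tb->len) /* isdataat:!1,relative */
                return 1;
        }
    }
    return 0;
}
```

For the raw input "x = 1; window . location = evil ;", the stripped copy is "x=1;window.location=evil;". A second content of "evil;" satisfies the chain, while "evil" fails the isdataat check because ";" follows it.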
I believe the second option would be the most practical for reducing PCRE usage.
- Assignee set to OISF Dev
- Target version set to TBD
Will this also eliminate nulls (0x00)? This would help with matching Unicode text (for example UTF-16, where ASCII characters are interleaved with 0x00 bytes), among other things.
In a buffer like " a b c d", would the expected result be "abcd" or something else? Would all whitespace be stripped?
Victor Julien wrote:
In a buffer like " a b c d", would the expected result be "abcd" or something else? Would all whitespace be stripped?
Well, I think we should either remove all whitespace and smush the buffer together, or replace each run of whitespace with a single space. So a run matching [\s\x00]+ would become a single \x20. I don't think it really matters on the sig-writing side; whichever has the least overhead on the sensor would be best.
In your original example of 'window . location = ', the best result would probably be 'window.location='?
Yes, I think that would be the best result.
I agree; stripping whitespace out entirely would be best, especially for \x00. Turning runs of \x00 into \x20 would defeat the purpose of handling \x00 at all.
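The two options discussed above can be compared with a short C sketch. The function names ws_strip and ws_collapse are illustrative only. The sketch assumes, hypothetically, that NUL bytes are treated like ASCII whitespace, as requested earlier in the thread. On " a b c d", stripping gives "abcd" while collapsing leaves the buffer unchanged. On 'window . location = ', only stripping yields 'window.location='.

```c
#include <stddef.h>

/* Hypothetical definition of "whitespace" for this sketch: the usual ASCII
 * whitespace bytes plus NUL (0x00). */
static int is_ws_or_nul(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n' ||
           c == '\v' || c == '\f' || c == '\0';
}

/* Option 1: remove every whitespace/NUL byte ("smush the buffer together").
 * `out` must hold at least `len` bytes. Returns the new length. */
size_t ws_strip(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (!is_ws_or_nul(in[i]))
            out[n++] = in[i];
    }
    return n;
}

/* Option 2: replace each run of whitespace/NUL bytes with a single 0x20.
 * `out` must hold at least `len` bytes. Returns the new length. */
size_t ws_collapse(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;
    int in_run = 0;
    for (size_t i = 0; i < len; i++) {
        if (is_ws_or_nul(in[i])) {
            if (!in_run)
                out[n++] = ' ';
            in_run = 1;
        } else {
            out[n++] = in[i];
            in_run = 0;
        }
    }
    return n;
}
```

Stripping also turns UTF-16LE text such as "w\0i\0n\0" into plain "win". Collapsing would produce "w i n ", which is the \x00 concern raised above.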
- Status changed from New to Assigned
- Assignee changed from OISF Dev to Victor Julien
- Target version changed from TBD to 70
- Status changed from Assigned to Closed
- Target version changed from 70 to 4.1beta1