Jason Ish wrote in #note-7:
Can we get some examples of the actual breakage? I'm playing with 6.0.13 and 7.0.0-rc2 with some basic requests, but not too much is jumping out, at least with a default configuration.
I was trying to get the HTTP server header into the log, and I used this configuration to do so:
- http:
    extended: yes # enable this for extended logging information
    custom: [server, accept-encoding]
What I get with 7 is:
{
"hostname": "198.71.247.91",
"url": "/",
"http_user_agent": "curl/7.58.0",
"http_content_type": "text/html",
"http_method": "GET",
"protocol": "HTTP/1.1",
"status": 200,
"length": 51,
"response_headers": [
{
"name": "Server",
"value": "Apache/2.4.41 (Ubuntu)"
}
]
}
With 6.x I got:
{
"hostname": "198.71.247.91",
"url": "/",
"http_user_agent": "curl/7.58.0",
"http_content_type": "text/html",
"server": "Apache/2.4.41 (Ubuntu)",
"http_method": "GET",
"protocol": "HTTP/1.1",
"status": 200,
"length": 51
}
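For anyone consuming both formats during a migration, a minimal Python sketch of a lookup that accepts either shape could look like the following. The helper name get_response_header is hypothetical, and the mapping of a header name to a 6.x flat key (lower case, dashes replaced by underscores) is an assumption based on the two samples above, not a statement about the full list of fields.

```python
def get_response_header(http, name):
    """Return a response header value from an EVE "http" object.

    Tries the 7.x-style "response_headers" list first, then falls back
    to a 6.x-style flat key. The flat key naming (lower case, dashes
    replaced by underscores) is an assumption for illustration.
    """
    wanted = name.lower()
    for header in http.get("response_headers", []):
        if header.get("name", "").lower() == wanted:
            return header.get("value")
    return http.get(wanted.replace("-", "_"))


# The two shapes shown above, reduced to the relevant fields.
http_v7 = {
    "status": 200,
    "response_headers": [{"name": "Server", "value": "Apache/2.4.41 (Ubuntu)"}],
}
http_v6 = {"status": 200, "server": "Apache/2.4.41 (Ubuntu)"}

print(get_response_header(http_v7, "Server"))
print(get_response_header(http_v6, "Server"))
```

This only papers over the difference on the consumer side; it does not address the question of which format the output should use.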
I believe the main issue was that listing headers in custom could overwrite some http fields as we, arguably in error, logged these objects directly into the http object instead of some sub-object named headers. And now we log these under response_headers. @regit, am I correct that you see this breakage when using a custom value for http.custom?
As the header names allowed in custom are a defined list, an alternative fix to the issue could be to make sure there is no name overlap by renaming some fields, but still logging them under the http object.
Then response_headers and request_headers could be saved for the dump-all-headers option.
Within response_headers and request_headers, what is the best format? What we have now:
"response_headers": [
{ "name": "Server", "value": "Caddy" }
]
IMO, this is the way to go for listing of arbitrary keys.
or..
[...]
where the first may be harder to query, but the second causes index expansion and possibly other issues.
Yes, this one is going to cause software like Elasticsearch to explode.
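To make the index expansion concrete, here is a rough Python sketch that counts the distinct field paths an indexer would have to map for each shape. The second format is elided above, so the header-name-as-key object used here is an assumption about what it looked like; as_dynamic_keys and mapped_fields are hypothetical helper names.

```python
def as_dynamic_keys(headers):
    """Turn a name/value list into an object keyed by header name.

    This is the assumed shape of the elided alternative: every header
    name becomes its own JSON key.
    """
    out = {}
    for header in headers:
        out.setdefault(header["name"], []).append(header["value"])
    return out


def mapped_fields(doc, prefix=""):
    """Collect the dotted field paths found in a JSON-like document."""
    paths = set()
    if isinstance(doc, dict):
        for key, value in doc.items():
            paths |= mapped_fields(value, f"{prefix}{key}.")
    elif isinstance(doc, list):
        for item in doc:
            paths |= mapped_fields(item, prefix)
    else:
        paths.add(prefix.rstrip("."))
    return paths


headers = [
    {"name": "Server", "value": "Caddy"},
    {"name": "X-Request-Id", "value": "abc123"},
    {"name": "X-Cache", "value": "HIT"},
]

list_form = {"response_headers": headers}
dynamic_form = {"response_headers": as_dynamic_keys(headers)}

print(sorted(mapped_fields(list_form)))
print(sorted(mapped_fields(dynamic_form)))
```

With the name/value list the set of paths stays at two no matter how many different headers show up, while the dynamic form grows by one path per distinct header name.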
Eric Leblond wrote in #note-6:
Philippe Antoine wrote in #note-5:
First, it breaks backward compatibility of events, and users will have to upgrade all their data handling for something that has been there for ages.
So, how do you propose to fix #5320 ?
Having content_range changed to content_range_parsed in EveHttpLogJSONBasic?
This is still a backward compatibility breakage (but maybe a lesser one)
Yes, changing one single key would be better. Also, there is something we should avoid, which is having a JSON key that points to different value types. The issue pointed out that one single key could hold either a single value or multiple values. Elasticsearch handles this fine, but for most "DB" engines, or even for coding, this is a nightmare.
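As an illustration of the extra work this puts on consumers, every reader of such a key ends up needing something like the following Python sketch. The helper name as_list is hypothetical, and the sample values are made up for the example.

```python
def as_list(value):
    """Normalize a field that may be missing, a single value, or a list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# Same key, three different shapes depending on the event.
print(as_list("gzip"))
print(as_list(["gzip", "br"]))
print(as_list(None))
```

If the key always carried the same type, none of this branching would be needed.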
Second, switching from a set of JSON key: value pairs to a dictionary {"key": $key, "value": $value} is going to cause severe issues in a lot of data lakes.
Then, @regit, is that not a problem with the dump-all-headers suricata.yaml option?
(disclaimer: I do not understand everything about Elastic)
For me, dump-all-headers is really nice when you want to investigate one single event or do forensics, as we are not going to miss a single field. The key/value format is needed because adding dynamic keys to tools like Elasticsearch causes an explosion in their engine. But it is not usable for queries like "give me the rare $name_your_header_key values". I think we need to keep the set of static keys we already have, as it is really easy to query them and do stats or dashboards.
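A small Python sketch of the kind of "rare values" query that static keys make trivial is shown below. The function name rare_values and the sample events are made up for illustration; the static "server" key follows the 6.x sample earlier in the thread.

```python
from collections import Counter


def rare_values(events, key, max_count=1):
    """Return values of a static http key seen at most max_count times."""
    counts = Counter(
        event["http"][key]
        for event in events
        if key in event.get("http", {})
    )
    return sorted(value for value, seen in counts.items() if seen <= max_count)


# Made-up events using a static "server" key under "http".
events = [
    {"http": {"server": "Apache/2.4.41 (Ubuntu)"}},
    {"http": {"server": "Apache/2.4.41 (Ubuntu)"}},
    {"http": {"server": "Caddy"}},
    {"http": {"url": "/"}},
    {"event_type": "dns"},
]

print(rare_values(events, "server"))
```

The same question against a name/value list needs a nested lookup per event first, which is the querying cost mentioned above.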
The lack of conversion to lower case makes sense, as this output tries to match the reality of the transaction as closely as possible.
Not sure I understand if you want normalization or raw data here...
In dump-all-headers, I think it makes sense to have raw data. For me, this is for looking at one event at a time, or something used in processing.