Feature #2488
HTML Parsing / Buffers
Description
We write a lot of signatures against the HTML content in file_data. It would be very useful to have some parsing/buffering here so that rules do not have to search the whole file_data buffer. Alternatively, perhaps this could be some kind of transform?
Here is a quick example of HTML, off the top of my head:
<html>
<head>
<title>Meerkat HQ</title>
<!--Meerkat HQ cloned by z001ie -->
<link rel="stylesheet" href="./z001ie_files/css/meerkats.css">
<link rel="shortcut icon" href="./z001ie_files/images/favicon.gif" type="image/gif"/>
<script src="./z001ie_files/jquery_003_002.html"></script>
<script>
function IsEmpty() {
var x = document.forms["login"]["user"].value;
var y = document.forms["login"]["pass"].value;
if (x == "") {
document.getElementById("ErrorBox").style.display = "block";
document.getElementById("ErrorUser").style.display = "block";
return false;
}
}
</script>
</head>
<body>
<form id="signon" name="login" action="login.php" method="post" autocomplete="off" onsubmit="return IsEmpty();">
<input type="text" id="userid" placeholder="Username" class="required" name="user" value="" autocomplete="off">
<input type="password" placeholder="Password" class="required" id="passwd" name="pass" value="" autocomplete="off">
<input type="submit" class="signin" value="Sign On" onclick="return IsEmpty();">
</form>
</body>
</html>
I think the following buffers could be very useful for detection, since they would avoid searching all of file_data (much like avoiding a search of all of http_header):
html_title
literal: <title>Meerkat HQ</title>
rule: html_title; content:"Meerkat HQ"; nocase;
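To make the idea concrete, here is a rough sketch of what an html_title buffer could hold. It uses Python's standard html.parser purely as a stand-in for whatever parser Suricata would actually use; the function name extract_html_title is borrowed from the transform idea further down, and the class name is made up for illustration.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text found between <title> and </title> (illustrative only)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current title.
        if self._in_title:
            self.titles[-1] += data


def extract_html_title(file_data):
    """Return one string per <title> element found in file_data."""
    parser = TitleExtractor()
    parser.feed(file_data)
    parser.close()
    return parser.titles
```

With a buffer like this, content:"Meerkat HQ" would be checked against a few bytes of title text instead of the whole document.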
html_comment
literal comment: <!--Meerkat HQ cloned by z001ie -->
rule: html_comment; content:"cloned by z001ie"; nocase;
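A similar sketch for comments, again assuming Python's html.parser as a stand-in parser. The helper content_nocase is a hypothetical approximation of content with nocase, not Suricata's matcher. It also shows why the exact spelling of the handle matters: "z001ie" (digit one) matches the sample page, "z00lie" (letter l) does not.

```python
from html.parser import HTMLParser


class CommentExtractor(HTMLParser):
    """Collects the body of every <!-- ... --> comment (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # data is the text between "<!--" and "-->", delimiters excluded.
        self.comments.append(data)


def extract_html_comment(file_data):
    """Return one string per HTML comment found in file_data."""
    parser = CommentExtractor()
    parser.feed(file_data)
    parser.close()
    return parser.comments


def content_nocase(buffers, needle):
    """Hypothetical stand-in for content:"..."; nocase; over a list of buffers."""
    needle = needle.lower()
    return any(needle in buf.lower() for buf in buffers)
```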
html_resources
literal resources: (there are a few)
<link rel="stylesheet" href="./z001ie_files/css/meerkats.css">
<link rel="shortcut icon" href="./z001ie_files/images/favicon.gif" type="image/gif"/>
<script src="./z001ie_files/jquery_003_002.html"></script>
rule: html_resources; content:"/z001ie"; nocase;
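For resources, one possible reading is a buffer holding only the referenced URLs. That is an assumption on my part; the buffer could just as well hold the whole tags. The sketch below uses Python's html.parser as a stand-in, and the tag-to-attribute mapping in RESOURCE_ATTRS is a hypothetical choice.

```python
from html.parser import HTMLParser

# Hypothetical mapping of tag name to the attribute that references a resource.
RESOURCE_ATTRS = {"link": "href", "script": "src", "img": "src"}


class ResourceExtractor(HTMLParser):
    """Collects resource URLs from link/script/img tags (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> also end up here, because the
        # default handle_startendtag() calls handle_starttag().
        wanted = RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.resources.append(value)


def extract_html_resources(file_data):
    """Return the referenced URLs in document order."""
    parser = ResourceExtractor()
    parser.feed(file_data)
    parser.close()
    return parser.resources
```

Each returned URL could then be inspected on its own, so content:"/z001ie" would only ever look at resource paths.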
html_javascript
literal javascript:
function IsEmpty() {
var x = document.forms["login"]["user"].value;
var y = document.forms["login"]["pass"].value;
if (x == "") {
document.getElementById("ErrorBox").style.display = "block";
document.getElementById("ErrorUser").style.display = "block";
return false;
}
}
rule: html_javascript; strip_whitespace; content:"varx=document.forms[|22|login|22|][|22|user|22|]";
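A sketch of how the inline script buffer plus strip_whitespace could fit together, again with Python's html.parser as a stand-in. The strip_whitespace function here only approximates the existing transform by dropping every whitespace character; skipping empty script bodies (for example, scripts that only carry a src attribute) is my own assumption.

```python
from html.parser import HTMLParser


class InlineScriptExtractor(HTMLParser):
    """Collects the body of every <script> element (illustrative only)."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


def extract_html_javascript(file_data):
    """Return non-empty inline script bodies in document order."""
    parser = InlineScriptExtractor()
    parser.feed(file_data)
    parser.close()
    return [s for s in parser.scripts if s.strip()]


def strip_whitespace(buffer):
    """Approximation of the strip_whitespace transform: remove all whitespace."""
    return "".join(buffer.split())
```

After stripping, `var x = document.forms[...]` becomes `varx=document.forms[...]`, which is why the rule above matches on "varx=".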
html_form
literal form:
<form id="signon" name="login" action="login.php" method="post" autocomplete="off" onsubmit="return IsEmpty();">
<input type="text" id="userid" placeholder="Username" class="required" name="user" value="" autocomplete="off">
<input type="password" placeholder="Password" class="required" id="passwd" name="pass" value="" autocomplete="off">
<input type="submit" class="signin" value="Sign On" onclick="return IsEmpty();">
</form>
rule: html_form; content:".php"; content:"method=|22|post|22|"; nocase; content:"onsubmit=|22|return IsEmpty()|3b|"; nocase; content:"user"; nocase; content:"pass"; nocase; distance:0;
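Because the form rule matches on raw markup such as method="post", the buffer would probably need to keep the tags as written. Here is a rough sketch under that assumption, with Python's html.parser as a stand-in. It rebuilds each form from the parser events, so it is only an approximation of the original bytes: comments inside a form are dropped and character references come back decoded.

```python
from html.parser import HTMLParser


class FormExtractor(HTMLParser):
    """Rebuilds the markup of each <form> element (illustrative only)."""

    def __init__(self):
        super().__init__()
        self._depth = 0  # greater than zero while inside a <form>
        self.forms = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            if self._depth == 0:
                self.forms.append("")
            self._depth += 1
        if self._depth:
            # get_starttag_text() returns the start tag exactly as written.
            self.forms[-1] += self.get_starttag_text()

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags: keep the raw text, emit no separate end tag.
        if self._depth:
            self.forms[-1] += self.get_starttag_text()

    def handle_endtag(self, tag):
        if self._depth:
            self.forms[-1] += "</%s>" % tag
            if tag == "form":
                self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.forms[-1] += data


def extract_html_form(file_data):
    """Return one reconstructed markup string per top-level form."""
    parser = FormExtractor()
    parser.feed(file_data)
    parser.close()
    return parser.forms
```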
Or maybe as a transform?
file_data; extract_html_title; content:"Meerkat HQ";
file_data; extract_html_comment; content:"cloned by z001ie"; nocase;
file_data; extract_html_resources; content:"/z001ie"; nocase;
file_data; extract_html_javascript; strip_whitespace; content:"varx=document.forms[|22|login|22|][|22|user|22|]";
file_data; extract_html_form; content:".php"; content:"method=|22|post|22|"; nocase; content:"onsubmit=|22|return IsEmpty()|3b|"; nocase; content:"user"; nocase; content:"pass"; nocase; distance:0;
Updated by Victor Julien over 5 years ago
Maybe we can use a rust html parsing crate.
Updated by Victor Julien about 5 years ago
Possible Rust HTML parser: https://github.com/servo/html5ever
Updated by Jeff Lucovsky about 4 years ago
- Related to Task #4097: Suricon 2020 brainstorm added