Fast and Precise Sanitizer Analysis with BEK
Pieter Hooimeijer Ben Livshits David Molnar Prateek Saxena Margus Veanes
2011-08-10 USENIX Security
<img src='some untrusted input'/>

QUESTION: What could possibly go wrong?

Attacker: im.png' onload='javascript:...
Result: <img src='im.png' onload='javascri
FAIL
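The attack on this slide can be reproduced in a few lines. The sketch below is an illustration only: `render_img` and `render_img_escaped` are hypothetical helpers, not part of the talk, and Python's standard `html.escape` stands in for a sanitizer that encodes quotes.

```python
import html


def render_img(src: str) -> str:
    """Hypothetical template: splice src into a single-quoted attribute, unsanitized."""
    return "<img src='" + src + "'/>"


def render_img_escaped(src: str) -> str:
    """Same template, but quotes in src are entity-encoded first."""
    return "<img src='" + html.escape(src, quote=True) + "'/>"


# The attacker string from the slide, kept verbatim (including the elision).
attack = "im.png' onload='javascript:..."

print(render_img(attack))          # the quote closes src and opens a new attribute
print(render_img_escaped(attack))  # the quote stays inside the src value
```

In the unsanitized version the single quote ends the `src` value, so `onload` becomes an attribute of its own. Once the quote is encoded, the whole string stays inside `src`.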
A tale of two sanitizers…
'  →  &#39;
(single quote → HTML entity)

some untrusted input

               Library A                            Library B
Name:          HtmlEncode                           HtmlEncode
Around for:    Years                                Years
Availability:  Readily available to C# developers   Readily available to C# developers
Output for ':  &#39; ✔                              ' ✘
.NET WebUtility

public static string HtmlEncode(string s)
{
    if (s == null) return null;
    int num = IndexOfHtmlEncodingChars(s, 0);
    if (num == -1) return s;
    StringBuilder builder = new StringBuilder(s.Length + 5);
    int length = s.Length;
    int startIndex = 0;
Label_002A:
    if (num > startIndex) {
        builder.Append(s, startIndex, num - startIndex);
    }
    char ch = s[num];
    if (ch > '>') {
        builder.Append("&#");
        builder.Append(((int) ch).ToString(NumberFormatInfo.InvariantInfo));
        builder.Append(';');
    } else {
        char ch2 = ch;
        if (ch2 != '"') {
            switch (ch2) {
                case '<': builder.Append("&lt;"); goto Label_00D5;
                case '=': goto Label_00D5;
                case '>': builder.Append("&gt;"); goto Label_00D5;
                case '&': builder.Append("&amp;"); goto Label_00D5;
            }
        } else {
            builder.Append("&quot;");
        }
    }
Label_00D5:
    startIndex = num + 1;
    if (startIndex < length) {
        num = IndexOfHtmlEncodingChars(s, startIndex);
        if (num != -1) {
            goto Label_002A;
        }
        builder.Append(s, startIndex, length - startIndex);
    }
    return builder.ToString();
}

MS AntiXSS

private static string HtmlEncode(string input, bool useNamedEntities, MethodSpecificEncoder encoderTweak)
{
    if (string.IsNullOrEmpty(input)) { return input; }
    if (characterValues == null) { InitialiseSafeList(); }
    if (useNamedEntities && namedEntities == null) { InitialiseNamedEntityList(); }

    // Setup a new character array for output.
    char[] inputAsArray = input.ToCharArray();
    int outputLength = 0;
    int inputLength = inputAsArray.Length;
    char[] encodedInput = new char[inputLength * 10];

    SyncLock.EnterReadLock();
    try {
        for (int i = 0; i < inputLength; i++) {
            char currentCharacter = inputAsArray[i];
            int currentCodePoint = inputAsArray[i];
            char[] tweekedValue;

            // Check for invalid values
            if (currentCodePoint == 0xFFFE || currentCodePoint == 0xFFFF) {
                throw new InvalidUnicodeValueException(currentCodePoint);
            } else if (char.IsHighSurrogate(currentCharacter)) {
                if (i + 1 == inputLength) {
                    throw new InvalidSurrogatePairException(currentCharacter, '\0');
                }

                // Now peek ahead and check if the following character is a low surrogate.
                char nextCharacter = inputAsArray[i + 1];
                char nextCodePoint = inputAsArray[i + 1];
                if (!char.IsLowSurrogate(nextCharacter)) {
                    throw new InvalidSurrogatePairException(currentCharacter, nextCharacter);
                }

                // Look-ahead was good, so skip.
                i++;

                // Calculate the combined code point
                long combinedCodePoint = 0x10000 + ((currentCodePoint - 0xD800) * 0x400) + (nextCodePoint - 0xDC00);
                char[] encodedCharacter = SafeList.HashThenValueGenerator(combinedCodePoint);
                encodedInput[outputLength++] = '&';
                for (int j = 0; j < encodedCharacter.Length; j++) {
                    encodedInput[outputLength++] = encodedCharacter[j];
                }
                encodedInput[outputLength++] = ';';
            } else if (char.IsLowSurrogate(currentCharacter)) {
                throw new InvalidSurrogatePairException('\0', currentCharacter);
            } else if (encoderTweak != null && encoderTweak(currentCharacter, out tweekedValue)) {
                for (int j = 0; j < tweekedValue.Length; j++) {
                    encodedInput[outputLength++] = tweekedValue[j];
                }
            } else if (useNamedEntities && namedEntities[currentCodePoint] != null) {
                char[] encodedCharacter = namedEntities[currentCodePoint];
                encodedInput[outputLength++] = '&';
                for (int j = 0; j < encodedCharacter.Length; j++) {
                    encodedInput[outputLength++] = encodedCharacter[j];
                }
                encodedInput[outputLength++] = ';';
            } else if (characterValues[currentCodePoint] != null) {
                // character needs to be encoded
                char[] encodedCharacter = characterValues[currentCodePoint];
                encodedInput[outputLength++] = '&';
                for (int j = 0; j < encodedCharacter.Length; j++) {
                    encodedInput[outputLength++] = encodedCharacter[j];
                }
                encodedInput[outputLength++] = ';';
            } else {
                // character does not need encoding
                encodedInput[outputLength++] = currentCharacter;
            }
        }
    } finally {
        SyncLock.ExitReadLock();
    }

    return new string(encodedInput, 0, outputLength);
}
• Same behavior on all inputs?
• If not, what is a differentiating input?
• Can it generate any known ‘bad’ outputs?
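A rough way to get a feel for the "differentiating input" question is bounded brute force. The sketch below is not BEK's method (BEK answers the question symbolically, for all strings). The two encoders are hypothetical stand-ins for the two libraries, differing only in whether they encode the single quote, and `differentiating_input` is an illustrative helper name.

```python
from itertools import product

# Hypothetical stand-ins for the two HtmlEncode implementations (assumption:
# they differ only in their treatment of the single quote).
TABLE_A = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;"}
TABLE_B = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}


def encode_a(s: str) -> str:
    return "".join(TABLE_A.get(c, c) for c in s)


def encode_b(s: str) -> str:
    return "".join(TABLE_B.get(c, c) for c in s)


def differentiating_input(f, g, alphabet: str, max_len: int):
    """Return a shortest string (up to max_len) on which f and g disagree, else None."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if f(w) != g(w):
                return w
    return None


witness = differentiating_input(encode_a, encode_b, "a<'", 2)
print(repr(witness), "->", encode_a(witness), "vs", encode_b(witness))
```

A search like this can only report disagreement up to the chosen length. A `None` result says nothing about longer inputs, which is the gap an exact equivalence check closes.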
A tale of 151 sanitizers…
PHP Trunk Changes to html.c, 1999-2011

R7,841      April 1999       135 loc
R32,564     September 2000   ENT_QUOTES introduced
R242,949    September 2007   $double_encode=true
R309,482    March 2011       1693 loc

• Safe to apply twice?
• Safe to combine with other sanitizers?
Motivation
• Writing string sanitizers correctly is difficult
• There is no cheap way to identify problems with sanitizers
• ‘Correctness’ is a moving target
• What if we could say more about sanitizer behavior?
Contributions

BEK
• Frontend: a small language for string manipulation; similar to how sanitizers are written today
• Backend: a model based on symbolic finite transducers with algorithms for analysis and code generation

Evaluation
• Converted sanitizers from a variety of sources
• Checked properties like reversibility, idempotence, equivalence, and commutativity
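Two of the properties named above, idempotence and reversibility, can be illustrated with bounded testing. This is a sketch under stated assumptions: `html_encode` is a hypothetical minimal encoder, the helper names are invented for illustration, and the checks only cover strings up to a fixed length, whereas the talk's analysis is exact.

```python
from itertools import product


def html_encode(s: str) -> str:
    """Hypothetical minimal encoder: only '&' and '<' are rewritten."""
    return s.replace("&", "&amp;").replace("<", "&lt;")


def strings(alphabet: str, max_len: int):
    """Yield every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def is_idempotent(f, alphabet: str, max_len: int) -> bool:
    """True if applying f twice equals applying it once on all tested strings."""
    return all(f(f(w)) == f(w) for w in strings(alphabet, max_len))


def is_reversible(f, alphabet: str, max_len: int) -> bool:
    """True if no two distinct tested inputs share an output (f is injective)."""
    seen = {}
    for w in strings(alphabet, max_len):
        out = f(w)
        if seen.setdefault(out, w) != w:
            return False
    return True


print(is_idempotent(html_encode, "a&<", 3))   # '&' -> '&amp;' -> '&amp;amp;'
print(is_reversible(html_encode, "a&<", 3))
```

The failed idempotence check is the "safe to apply twice?" question in miniature: encoding `&` a second time double-encodes it.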
BEK: Architecture

Bek Program:

s := iter(c in t)[b := false;] {
  case (!b && c in "[\"\\]"):
    b := false; yield('\\', c);
  case (c == '\\'):
    b := !b; yield(c);
  case (true):
    b := false; yield(c);
};

[Architecture diagram. Labels: Bek Program; Transformation; Symbolic Finite Transducers (Microsoft.Automata, Z3); Analysis ("Does it do the right thing?", Counterexample “\' vs. \\'”); Code Gen (C#, JavaScript, C)]
A BEK Program: Escape Quotes

t := iter(c in s)[b := false;] {
  case (!b && c in "['\"]"):
    b := false; yield('\\', c);
  case (c == '\\'):
    b := !b; yield(c);
  case (true):
    b := false; yield(c);
};

• iterate over the characters in string s
• while updating one boolean variable b
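For readers who do not know BEK syntax, the program above can be read as the following Python loop. This is a hand translation for illustration, not output of BEK's code generator. It assumes the cases are tried in order and that `"['\"]"` denotes the character class containing the single and double quote.

```python
def escape_quotes(s: str) -> str:
    """Hand translation of the BEK 'Escape Quotes' program (illustrative)."""
    out = []
    b = False  # True when the previous character was an unpaired backslash
    for c in s:
        if not b and c in "'\"":
            # Unescaped quote: emit a backslash in front of it.
            b = False
            out.append("\\")
            out.append(c)
        elif c == "\\":
            # Backslash: toggle the "escape pending" flag and copy it through.
            b = not b
            out.append(c)
        else:
            # Anything else (including an already-escaped quote) is copied.
            b = False
            out.append(c)
    return "".join(out)


print(escape_quotes("it's"))     # quote gains a backslash
print(escape_quotes("it\\'s"))   # already escaped, left unchanged
```

Because an already-escaped quote is copied unchanged, running the function on its own output changes nothing. That is the idempotence property listed under Evaluation.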
A Symbolic Finite Transducer
[Diagram of an SFT; annotated parts: symbolic predicates, output lists]
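Since the diagram did not survive extraction, here is a minimal sketch of the two annotated ingredients. Each transition carries a predicate over the input character (instead of one concrete character) and a list of output functions. The `Transition` and `SFT` classes are hypothetical and are not the Microsoft.Automata API. The sketch also assumes a deterministic transducer in which every state is accepting. The example instance encodes the Escape Quotes program, with state 0 for b = false and state 1 for b = true.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Transition:
    source: int
    guard: Callable[[str], bool]         # symbolic predicate on the input character
    outputs: List[Callable[[str], str]]  # output list: each entry yields one character
    target: int


class SFT:
    """Illustrative deterministic SFT; all states are treated as accepting."""

    def __init__(self, initial: int, transitions: List[Transition]):
        self.initial = initial
        self.transitions = transitions

    def run(self, text: str) -> str:
        state, out = self.initial, []
        for c in text:
            for t in self.transitions:
                if t.source == state and t.guard(c):
                    out.extend(fn(c) for fn in t.outputs)
                    state = t.target
                    break
            else:
                raise ValueError(f"no transition from state {state} on {c!r}")
        return "".join(out)


def is_quote(c): return c in "'\""
def is_backslash(c): return c == "\\"
def is_other(c): return not is_quote(c) and not is_backslash(c)
def same(c): return c
def backslash(_c): return "\\"


ESCAPE_QUOTES = SFT(0, [
    Transition(0, is_quote, [backslash, same], 0),  # unescaped quote: emit \ then c
    Transition(0, is_backslash, [same], 1),         # backslash: remember it
    Transition(0, is_other, [same], 0),
    Transition(1, lambda c: True, [same], 0),       # character after a backslash
])

print(ESCAPE_QUOTES.run("say 'hi'"))
```

Four transitions cover the whole character set because the guards are predicates. A classical transducer would need one transition per character.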
Now what?

SFT Algorithms: Equivalence Checking
AntiXSS.HtmlEncode
WebUtility.HtmlEncode
SFT Algorithms: Join Composition
[Diagram: SFT A and SFT B, each with in and out, joined into a single transducer labeled "SFT A B"]
JavaScriptEncode(HtmlEncode(w))
HtmlEncode(JavaScriptEncode(w))
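The two expressions above ask whether the order of composition matters. A bounded sketch of that question follows. `html_encode` and `js_encode` are hypothetical simplified encoders, not the real library functions, and the search covers only short strings, whereas composing SFTs lets the tool compare the two orders exactly.

```python
from itertools import product

HTML_TABLE = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;"}


def html_encode(s: str) -> str:
    """Hypothetical HTML encoder: entity-encode five special characters."""
    return "".join(HTML_TABLE.get(c, c) for c in s)


def js_encode(s: str) -> str:
    """Hypothetical JavaScript string encoder: backslash-escape \\, ' and "."""
    return "".join("\\" + c if c in "\\'\"" else c for c in s)


def compose(first, second):
    """Return the function w -> second(first(w))."""
    return lambda w: second(first(w))


def commutativity_counterexample(f, g, alphabet: str, max_len: int):
    """Return a shortest w with g(f(w)) != f(g(w)), or None within the bound."""
    fg, gf = compose(f, g), compose(g, f)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if fg(w) != gf(w):
                return w
    return None


w = commutativity_counterexample(html_encode, js_encode, "a'", 2)
print(repr(w))
print(js_encode(html_encode(w)))  # JavaScriptEncode(HtmlEncode(w))
print(html_encode(js_encode(w)))  # HtmlEncode(JavaScriptEncode(w))
```

With these toy encoders a lone single quote already separates the two orders. HTML-encoding first removes the quote before the JavaScript encoder sees it.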
SFT Algorithms: Pre-Image Computation
[Diagram labels: in, SFT A, Regular Language (twice), S?]
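Pre-image computation asks which inputs a sanitizer maps into a given set of outputs, for example a set of known-bad outputs. The sketch below approximates this by enumeration. The encoder, the regular expression for "bad" outputs, and the helper name `bounded_preimage` are all assumptions made for illustration. An exact pre-image over all strings is what the SFT algorithm provides.

```python
import re
from itertools import product


def encode_no_quote(s: str) -> str:
    """Hypothetical encoder that forgets to encode the single quote."""
    table = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}
    return "".join(table.get(c, c) for c in s)


def encode_with_quote(s: str) -> str:
    """Hypothetical encoder that also encodes the single quote."""
    return encode_no_quote(s).replace("'", "&#39;")


def bounded_preimage(f, bad: "re.Pattern", alphabet: str, max_len: int):
    """List inputs up to max_len whose output contains a match for bad."""
    hits = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if bad.search(f(w)):
                hits.append(w)
    return hits


# Assumed "bad" output language: any output still containing a raw single quote.
BAD = re.compile(r"'")

print(bounded_preimage(encode_no_quote, BAD, "a'", 1))
print(bounded_preimage(encode_with_quote, BAD, "a'", 2))
```

An empty pre-image means no tested input reaches a bad output. A non-empty one gives concrete attack inputs.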
Some Questions
• What features are needed to port existing sanitizers?
• Can we check interesting properties on real sanitizers?
• Will HtmlEnc implementations protect against XSS Cheat Sheet samples?
Language Features

What features are needed to port existing sanitizers?

Data:
• 1x OWASP esapi HTMLencode
• 13x Google Ctemplate AutoEscape
• 21x IE 8 XSS Filter
• 7x Synthetic

Data → inspect → feature counts

• Majority (76%) of sanitizers can be ported without extending the language
• With multi-character lookahead: 90%
Can we check interesting properties on real sanitizers?

Data:
• 4x MS internal HtmlEncode
• 3x ‘for hire’ HtmlEncode based on English-language specification (C#)

Commutative? Equivalent?

• Short answer: Yes!
• EQ results take less than a minute to obtain:

      1   2   3   4   5   6   7
  1   ✔   ✔   ✔   ✘   ✘   ✔   ✘
  2       ✔   ✔   ✘   ✘   ✔   ✘
  3           ✔   ✘   ✘   ✔   ✘
  4               ✔   ✘   ✘   ✘
  5                   ✔   ✘   ✘
  6                       ✔   ✘
  7                           ✔
The Cheat Sheet

Will HtmlEnc protect against known XSS strings?
[Pre-image diagram labels: in, SFT A, Regular Language (twice), S?]

• One out of seven implementations correctly encodes all strings for use in both HTML and attribute contexts
Conclusion
• BEK is a domain-specific language for writing string sanitizers
• We model BEK programs without approximation using symbolic finite transducers, enabling, e.g., equivalence checks
• We evaluate our system using real-world sanitizers from a variety of different sources
Thanks!
http://research.microsoft.com/en-us/projects/bek/
http://www.rise4fun.com/bek/
Demo Time
Scalability: Approach
Randomly generated BEK programs, parameterized on SFT size
Commutative? Equivalent?

Scalability: Results
[Plots: Commutativity, Self-Equivalence]
Sanitizer use in PHP code: Approach
100 PHP projects → scrape → 9.6 million lines of PHP → static count → usage stats for 111 distinct PHP library functions

Sanitizer use in PHP code: Results