1 / 28
Modeling the HTML DOM and Browser API in Static Analysis
of JavaScript Web Applications
ESEC/FSE 2011
Anders Møller, Magnus Madsen and Simon Holm Jensen
2 / 28
Motivation
• How can we help developers writing JavaScript web applications?
  – by providing tools for finding bugs early in the development cycle
• In this work we focus on finding bugs in the way JavaScript programs interact with the web browser
3 / 28
JavaScript in a browser
[Diagram: the user, the web browser, the JavaScript code and the Document Object Model, connected by arrows labelled interaction, rendering, events and DOM manipulation]
4 / 28
Example
The el.button property is always absent (it is undefined)
An HTMLImageElement object does not have a button property
Unreachable
The programmer has confused el and ev
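The code listing that these annotations point at is not preserved here. The following is only a hypothetical sketch of the kind of mistake they describe, assuming a click handler that receives both an element `el` and an event `ev`. The objects are plain stand-ins (not real DOM objects) so that the sketch runs outside a browser.

```javascript
// Hypothetical stand-ins so the sketch runs outside a browser.
// In a real page, `el` would be an HTMLImageElement and `ev` a mouse event.
const el = { tagName: "IMG", src: "a.png" }; // no `button` property
const ev = { type: "click", button: 0 };      // 0 = primary mouse button

function handleClick(el, ev) {
  // Bug: the programmer meant ev.button but wrote el.button.
  // el.button is always undefined, so the comparison is never true.
  if (el.button === 0) {
    return "primary button"; // unreachable
  }
  return "not handled";
}

const result = handleClick(el, ev);
console.log(result); // "not handled"
```

An analysis that knows which properties an image element can have is able to flag both the absent property read and the unreachable branch.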
5 / 28
TAJS: Type Analysis for JavaScript
A tool for static analysis of plain JavaScript
– the starting point for our work
– flow-sensitive dataflow analysis
– interprocedural
– whole-program analysis
– intended for non-minified, non-obfuscated code
[S.H. Jensen, A. Møller and P. Thiemann, SAS '09]
6 / 28
Bug Finding
We look for general errors such as:
– dead or unreachable code
– invocations of built-in functions with an incorrect number of arguments or wrong argument types
– undefined dereference
– reading absent properties
– etc.
7 / 28
Contributions
We extend the static analysis of TAJS to reason about JavaScript that executes in a browser:
– how to model the browser API?
  • 100s of non-standardized objects and functions
– how to model the HTML page?
  • complex prototype hierarchy of the W3C DOM
– how to model the event system?
  • many kinds of events
  • dynamic registration of event handlers
8 / 28
Architecture
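[Diagram: architecture of the extended TAJS, which outputs potential errors. Labelled components:]
• DOM model
  – named tags, e.g. <form id="foo">...</form>
• Flow graph extension
  – JavaScript code, e.g. <script>...</script>
  – event handler code, e.g. <div onclick="..."/>
• Browser API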
9 / 28
The Browser API
• The global window object
  – history, location, navigator, screen
  – alert(...), print(...), encodeURI(...)
  – setTimeout(...), setInterval(...)
  – addEventListener(...)
• Non-standard and legacy functionality
10 / 28
The HTML DOM
• The Document Object Model (W3C)
  – tree-like structure
  – e.g. one JavaScript object for each HTML tag
    • HTMLInputElement, HTMLFontElement, etc.
  – arranged in a large prototype hierarchy
• Huge number of properties and functions
  – most properties are string or integer constants
11 / 28
The HTML DOM
• Important functions
  – createElement(...)
  – getElementById(...)
  – getElementsByName(...)
  – getElementsByTagName(...)
• The analysis tracks elements by tag, ID and name:
  <img id="foo" name="bar"/>
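Tracking elements by tag, ID and name can be pictured as three lookup tables over the elements of the initial page. The sketch below is an illustration only: `buildIndex` and the record layout are hypothetical and say nothing about how TAJS stores this information internally. Each lookup yields a set of candidates, because a static analysis has to over-approximate when several elements could match.

```javascript
// Hypothetical sketch: index the elements of the initial page by tag, id and name.
function buildIndex(elements) {
  const index = { byTag: new Map(), byId: new Map(), byName: new Map() };
  const add = (map, key, el) => {
    if (key === undefined) return;            // attribute not present
    if (!map.has(key)) map.set(key, new Set());
    map.get(key).add(el);
  };
  for (const el of elements) {
    add(index.byTag, el.tag, el);
    add(index.byId, el.id, el);
    add(index.byName, el.name, el);
  }
  return index;
}

// A made-up initial page with three elements.
const page = [
  { tag: "img", id: "foo", name: "bar" },
  { tag: "img" },
  { tag: "div", id: "main" },
];

const index = buildIndex(page);
const byId = [...(index.byId.get("foo") || [])];   // candidates for getElementById("foo")
const byTag = [...(index.byTag.get("img") || [])]; // candidates for getElementsByTagName("img")
console.log(byId.length, byTag.length); // 1 2
```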
12 / 28
Prototype Hierarchy
The complete model has ~250 objects and ~500 properties
13 / 28
Choice of Abstraction
Model the DOM objects as:
• a single abstract object
• a single abstract object for every element kind
• an abstract object for every element in the initial HTML page
[Figure: the three options illustrated on the elements <img><img><div>, with one of them marked "Our Choice"]
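To make the trade-off concrete, the three granularities can be compared by counting how many abstract objects each one produces for the example elements `<img><img><div>`. This is a sketch only: the labelling functions and the label strings are hypothetical, chosen for illustration.

```javascript
// Elements of the example page: <img><img><div>
const elements = ["img", "img", "div"];

// Each function maps a concrete element (kind, position) to the label of the
// abstract object that would represent it. The labels are made up.
const abstractions = {
  single: (kind, i) => "HTMLElement",      // one object for everything
  perKind: (kind, i) => kind,              // one object per element kind
  perElement: (kind, i) => `${kind}#${i}`, // one object per element in the page
};

// Number of distinct abstract objects a labelling produces.
function countAbstractObjects(label) {
  return new Set(elements.map((kind, i) => label(kind, i))).size;
}

const counts = {};
for (const [name, label] of Object.entries(abstractions)) {
  counts[name] = countAbstractObjects(label);
}
console.log(counts); // { single: 1, perKind: 2, perElement: 3 }
```

Fewer abstract objects make the analysis cheaper but merge more concrete elements together, which costs precision.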
14 / 28
Straightforward Hierarchy?
• The image tag looks pretty innocent:
<img src="a.png" alt=""/>
• Image objects can be created in several ways:
new Image();
document.createElement("img");
15 / 28
Example
16 / 28
Image Prototype Hierarchy
[Diagram: prototype hierarchy for image objects. Nodes: Object (prototype obj); HTMLImageElement (constructor obj), HTMLImageElement (prototype obj), HTMLImageElement (instance obj); Image (constructor obj), Image (prototype obj), Image (instance obj). Two nodes are labelled "Attached to window".]
Blue arrows are internal prototype links
Red arrows are external prototype links
new Image();
document.createElement("img");
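The two kinds of links in the diagram can be illustrated in plain JavaScript, assuming that "internal prototype link" means an object's hidden [[Prototype]] slot and "external prototype link" means the ordinary `prototype` property of a constructor. `FakeImageElement` is a hypothetical constructor used so that the sketch runs outside a browser. It is not the browser's HTMLImageElement and does not reproduce the exact arrows of the diagram.

```javascript
// Hypothetical constructor standing in for a DOM constructor object.
function FakeImageElement() {}
FakeImageElement.prototype.kind = "image";

// An instance object created from the constructor object.
const instance = new FakeImageElement();

// External link: the constructor's visible `prototype` property
// points at the prototype object.
const external = FakeImageElement.prototype;

// Internal links: the hidden [[Prototype]] slot of each object.
const internalOfInstance = Object.getPrototypeOf(instance);
const internalOfPrototype = Object.getPrototypeOf(external);

console.log(internalOfInstance === external);          // true
console.log(internalOfPrototype === Object.prototype); // true
console.log(instance.kind);                            // "image"
```

Property lookups such as `instance.kind` follow only the internal links, which is why a model of the DOM has to get them right for every constructor, prototype and instance object.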
17 / 28
Registration of Event Handlers
• Directly in the HTML source
  – <div onclick="...">
• Using the Browser API
  – setTimeout(...), setInterval(...)
  – addEventListener(...)
• Writes to "magic properties"
  – x.onclick = ...
  – special properties that have side effects on the DOM when written to
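The difference between an API call and a write to a "magic property" can be sketched with a mock element. `MockElement` is hypothetical and only imitates the two registration styles so that the code runs outside a browser. The order in which it fires handlers is a choice made for this mock, not a statement about browser behaviour.

```javascript
// Hypothetical mock of a DOM element.
class MockElement {
  constructor() {
    this.handlers = { click: [] };
    this._onclick = null;
  }
  // Explicit registration through an API call.
  addEventListener(type, fn) {
    (this.handlers[type] = this.handlers[type] || []).push(fn);
  }
  // "Magic property": a plain-looking assignment with a registration side effect.
  set onclick(fn) {
    this._onclick = fn;
  }
  get onclick() {
    return this._onclick;
  }
  // Fire a click: property handler first, then listeners (mock-only ordering).
  click() {
    const log = [];
    if (this._onclick) log.push(this._onclick());
    for (const fn of this.handlers.click) log.push(fn());
    return log;
  }
}

const x = new MockElement();
x.onclick = () => "property handler";           // looks like an ordinary write
x.addEventListener("click", () => "listener");  // explicit API call
const fired = x.click();
console.log(fired); // [ 'property handler', 'listener' ]
```

For the analysis, the point is that the assignment to `x.onclick` cannot be treated as an ordinary property write: it must also be recorded as an event handler registration.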
18 / 28
Tracking Event Handlers
Separate event handlers based on their kind
– page load (onload)
– keyboard (onkeypress, ...)
– mouse (onclick, onmouseover, ...)
– timed (setTimeout, setInterval, ...)
– etc.
19 / 28
Flow graph Extension
Event handlers are executed by introducing an event-handler loop
– separates page load event handlers from other event handlers
– executes event handlers in two non-deterministic loops
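The execution order that the two loops describe can be sketched as a small concrete simulation: first some arbitrary sequence of page load handlers, then some arbitrary sequence of all other handlers. This is only an illustration of the ordering. The analysis itself encodes the loops in the flow graph rather than running handlers, and every name below (`makeRng`, `nondeterministicLoop`, `simulate`) is hypothetical.

```javascript
// Tiny deterministic pseudo-random generator so that runs are reproducible.
function makeRng(seed) {
  let s = seed >>> 0;
  return () => {
    s = (Math.imul(s, 1664525) + 1013904223) >>> 0;
    return s / 4294967296;
  };
}

// Run zero or more handlers from `handlers`, in arbitrary order and with
// repetition, imitating one non-deterministic loop.
function nondeterministicLoop(handlers, rng, maxSteps, trace) {
  if (handlers.length === 0) return;
  const steps = Math.floor(rng() * (maxSteps + 1));
  for (let i = 0; i < steps; i++) {
    const h = handlers[Math.floor(rng() * handlers.length)];
    trace.push(h.name);
    h.run();
  }
}

function simulate(loadHandlers, otherHandlers, seed) {
  const rng = makeRng(seed);
  const trace = [];
  nondeterministicLoop(loadHandlers, rng, 3, trace);  // loop 1: page load handlers
  nondeterministicLoop(otherHandlers, rng, 5, trace); // loop 2: all other handlers
  return trace;
}

const state = { ready: false, clicks: 0 };
const load = [{ name: "onload", run: () => { state.ready = true; } }];
const other = [
  { name: "onclick", run: () => { state.clicks++; } },
  { name: "timeout", run: () => {} },
];

const trace = simulate(load, other, 42);
console.log(trace);
```

Whatever the seed, no load handler appears in the trace after a handler of another kind, which is the separation the first sub-point describes.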
20 / 28
Evaluation
• With these extensions TAJS can reason about JavaScript applications that run in a browser
• Is the analysis precise enough to be useful?
21 / 28
Benchmarks
Evaluated on a series of benchmarks:
– Chrome Experiments
– Internet Explorer 9 Test Drive
– 10K Challenge
– A List Apart
– (excluding benchmarks that use eval or jQuery, or that are not relevant for JavaScript)
22 / 28
Research Questions
Q1: Ability to show absence of errors?
The analysis is able to show that
• 85-100% of call sites are safe
• 80-100% of property reads are safe
23 / 28
Research Questions
Q2: Ability to locate sources of errors?
– We randomly introduce spelling errors
– The analysis is able to pinpoint most of them (details in the paper)
24 / 28
Research Questions
Q3: Precision of computed call graph?
The analysis is able to show that 90-100% of call sites are monomorphic
25 / 28
Research Questions
Q4: Precision of inferred types?
– boolean, number, string, object and undefined
– the analysis is able to show that the average type size is 1.0-1.3
  • e.g. if the average type size is 1.0 then every read in the program results in values of a single type
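A small worked example of the metric, assuming that the type size of a read is the number of distinct types (out of boolean, number, string, object and undefined) that the analysis infers for it, and that the average is taken over all reads. The data and the function name `averageTypeSize` are made up for illustration.

```javascript
// Hypothetical analysis output: for each read in the program, the set of
// types that the read may yield.
const inferred = [
  ["number"],
  ["string"],
  ["object", "undefined"], // an imprecise read: two possible types
  ["boolean"],
];

// Average number of distinct types per read.
function averageTypeSize(reads) {
  const total = reads.reduce((sum, types) => sum + new Set(types).size, 0);
  return total / reads.length;
}

const avg = averageTypeSize(inferred);
console.log(avg); // 1.25
```

Here three reads have a single type and one has two, so the average is (1 + 1 + 2 + 1) / 4 = 1.25, which falls inside the reported 1.0-1.3 range.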
26 / 28
Research Questions
Q5: Ability to detect dead or unreachable code?
– found several unreachable functions
– most appear to be unused library code copied and pasted directly into the benchmark programs
27 / 28
Future / Current Work
• Dynamically generated code
  – eval
• Library support
  – jQuery, MooTools, etc.
28 / 28
Conclusion
Extended previous work to reason precisely about JavaScript programs that execute in a browser-based environment
This allows us to discover general errors such as:
• reading absent properties
• dereferencing null or undefined
• invoking functions with incorrect arguments
• etc.
30 / 28
DOM Modules & Levels
Module \ Level   Level 0   Level 1   Level 2   Level 3
Core Module      - ()
HTML Module      - ()
Event Module     - - ()
CSS Module       - - () ()
Browser API      - - -
Year             ~1996     1998      2000      2004
In addition we support the HTMLCanvasElement from HTML5.
31 / 28
Soundness Issues?
Assignment to computed property names
foo[bar] = "baz"
foo[bar] = function() {...}
If the exact value of bar is unknown:
– it could be a write to a "magic property"
– or a registration of an event handler
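The issue can be made concrete with a mock object whose `onclick` property registers a handler when written to. `makeElement` and `assign` are hypothetical helpers that let the sketch run outside a browser. The same assignment statement behaves as an ordinary write or as a handler registration depending solely on the runtime value of `bar`.

```javascript
// Hypothetical mock element whose `onclick` property registers a handler.
function makeElement() {
  const registered = [];
  const el = { registered };
  Object.defineProperty(el, "onclick", {
    set(fn) { registered.push(fn); },
    get() { return registered[registered.length - 1]; },
  });
  return el;
}

// The same statement, foo[bar] = function() {...}, for a given value of bar.
// Returns how many event handlers ended up registered.
function assign(bar) {
  const foo = makeElement();
  foo[bar] = function () { return "handled"; };
  return foo.registered.length;
}

const plain = assign("title");   // ordinary property write
const magic = assign("onclick"); // registers an event handler
console.log(plain, magic); // 0 1
```

When the analysis cannot determine the value of `bar`, a sound treatment has to account for both outcomes.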