Modeling the HTML DOM and Browser API in Static Analysis of JavaScript Web Applications

ESEC/FSE 2011

Anders Møller, Magnus Madsen and Simon Holm Jensen

Motivation

• How can we help developers writing JavaScript web applications?
  – by providing tools for finding bugs early in the development cycle
• In this work we focus on finding bugs in the way JavaScript programs interact with the web browser

JavaScript in a browser

[Diagram: the user interacts with the web browser, which renders the page. The browser passes events to the JavaScript code, and the JavaScript code manipulates the DOM (The Document Object Model).]

Example

The el.button property is always absent (it is undefined)

An HTMLImageElement object does not have a button property

Unreachable

The programmer has confused el and ev
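The code shown on this slide did not survive extraction. A minimal sketch of the kind of mistake the annotations describe, with plain objects standing in for the image element and the mouse event (the function names and both object shapes are assumptions for illustration):

```javascript
// Hypothetical stand-ins: a real page would pass an HTMLImageElement and a MouseEvent.
const el = { tagName: "IMG", src: "a.png" }; // the element: no `button` property
const ev = { type: "click", button: 2 };     // the event: `button` identifies the mouse button

function handleClick(el, ev) {
  // Bug: the programmer meant ev.button. el.button is always undefined,
  // so the comparison is always false and the branch is unreachable.
  if (el.button === 2) {
    return "right click";
  }
  return "other";
}

function handleClickFixed(el, ev) {
  // Reads the property from the event object, where it actually exists.
  return ev.button === 2 ? "right click" : "other";
}
```

A type analysis that knows the element has no button property can report both the absent-property read and the dead branch.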

TAJS: Type Analysis for JavaScript

A tool for static analysis of plain JavaScript
– the starting point for our work
– flow-sensitive dataflow analysis
– interprocedural
– whole-program analysis
– intended for non-minified, non-obfuscated code

[S.H. Jensen, A. Møller and P. Thiemann SAS '09]

Bug Finding

We look for general errors such as:
– dead or unreachable code
– invocations of built-in functions with an incorrect number of arguments or wrong argument types
– undefined dereference
– reading absent properties
– etc.

Contributions

We extend the static analysis of TAJS to reason about JavaScript that executes in a browser:
– how to model the browser API?
    • 100s of non-standardized objects and functions
– how to model the HTML page?
    • complex prototype hierarchy of the W3C DOM
– how to model the event system?
    • many kinds of events
    • dynamic registration of event handlers

Architecture

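[Diagram: TAJS, extended with a DOM model, a flow graph extension and a Browser API model, reports potential errors]

• DOM model
  – named tags: <form id="foo">...</form>
• Flow graph extension
  – JavaScript code: <script>...</script>
  – event handler code: <div onclick="..."/>
• Browser API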

The Browser API

• The global window object
  – history, location, navigator, screen
  – alert(...), print(...), encodeURI(...)
  – setTimeout(...), setInterval(...)
  – addEventListener(...)
• Non-standard and legacy functionality

The HTML DOM

• The Document Object Model (W3C)
  – tree-like structure
  – e.g. one JavaScript object for each HTML tag
      • HTMLInputElement, HTMLFontElement, etc.
  – arranged in a large prototype hierarchy
• A huge number of properties and functions
  – most properties are string or integer constants

The HTML DOM

• Important functions
  – createElement(...)
  – getElementById(...)
  – getElementsByName(...)
  – getElementsByTagName(...)
• The analysis tracks elements by tag, ID and name:
  <img id="foo" name="bar"/>
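One way to picture tracking by tag, ID and name is an index with three lookup tables. This is an illustrative sketch only, not the TAJS implementation; `buildIndex` and the element records are hypothetical:

```javascript
// Hypothetical index from tag, id and name to the elements of a parsed page.
function buildIndex(elements) {
  const index = { byTag: new Map(), byId: new Map(), byName: new Map() };
  const add = (map, key, el) => {
    if (key === undefined) return; // the element lacks this attribute
    if (!map.has(key)) map.set(key, []);
    map.get(key).push(el);
  };
  for (const el of elements) {
    add(index.byTag, el.tag, el);
    add(index.byId, el.id, el);
    add(index.byName, el.name, el);
  }
  return index;
}

// Stand-in for a parsed page containing <img id="foo" name="bar"/> and two other tags.
const page = [
  { tag: "img", id: "foo", name: "bar" },
  { tag: "img" },
  { tag: "div", id: "main" },
];
const index = buildIndex(page);
```

With such an index, a call like getElementById("foo") can be resolved to the entries in index.byId, and getElementsByTagName("img") to the entries in index.byTag.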

Prototype Hierarchy

The complete model has ~250 objects and ~500 properties

Choice of Abstraction

Model the DOM objects as:
• single abstract object
• single abstract object for every element kind
• abstract object for every element in the initial HTML page

[Figure: the three abstractions illustrated on a page with the elements <img>, <img>, <div>; one of them is marked "Our Choice"]
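The three granularities can be compared by counting how many abstract objects each one produces for the example page. A small sketch, where the labeling functions and the `position` field are assumptions made for illustration:

```javascript
// The example page from the slide: <img>, <img>, <div>.
const elements = [
  { tag: "img", position: 0 },
  { tag: "img", position: 1 },
  { tag: "div", position: 2 },
];

// Each function maps a concrete element to the label of its abstract object.
const abstractions = {
  single: () => "Element",                        // one object for everything
  perKind: (el) => el.tag,                        // one object per element kind
  perElement: (el) => `${el.tag}#${el.position}`, // one object per element in the page
};

function countAbstractObjects(label) {
  return new Set(elements.map(label)).size;
}
```

For this page the counts are 1, 2 and 3 respectively: coarser abstractions merge more elements and lose precision, finer ones keep elements apart at a higher cost.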

Straightforward Hierarchy?

• The image tag looks pretty innocent:

<img src="a.png" alt=""/>

• Image objects can be created in several ways:

new Image();
document.createElement("img");

Example

Image Prototype Hierarchy

[Diagram: prototype hierarchy with the nodes HTMLImageElement (constructor obj), HTMLImageElement (prototype obj), HTMLImageElement (instance obj), Image (constructor obj), Image (prototype obj), Image (instance obj) and Object (prototype obj). Two nodes are labeled "Attached to window".]

Blue arrows are internal prototype links
Red arrows are external prototype links

new Image();
document.createElement("img");
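The two kinds of links can be shown in plain JavaScript. This sketch assumes that "internal" refers to an object's [[Prototype]] link and "external" to the ordinary `prototype` property of a constructor; `ImageModel` is a hypothetical stand-in, not the browser's Image:

```javascript
// Hypothetical constructor standing in for a browser-provided one.
function ImageModel() {}
const instance = new ImageModel();

// External link: the `prototype` property stored on the constructor object.
const externalLink = ImageModel.prototype;

// Internal link: the [[Prototype]] of the instance, set when `new` runs.
const internalLink = Object.getPrototypeOf(instance);

// The prototype object has its own internal link, here to Object.prototype.
const parentLink = Object.getPrototypeOf(externalLink);
```

Here internalLink and externalLink refer to the same object, and parentLink is Object.prototype, which matches the Object (prototype obj) node at the top of the hierarchy.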

Registration of Event Handlers

• Directly in the HTML source
  – <div onclick="...">
• Using the Browser API
  – setTimeout(...), setInterval(...)
  – addEventListener(...)
• Writes to "magic properties"
  – x.onclick = ...
  – special properties that have side-effects on the DOM when written to
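The effect of a "magic property" can be imitated with a setter. A minimal sketch, assuming a hypothetical `makeElement` helper and a plain array as the handler registry:

```javascript
// Hypothetical element model: writing `onclick` has a side effect beyond
// storing the value, namely registering the function as a click handler.
function makeElement(registry) {
  let handler = null;
  const el = {};
  Object.defineProperty(el, "onclick", {
    get() {
      return handler;
    },
    set(fn) {
      handler = fn;
      registry.push({ kind: "click", fn }); // the side effect
    },
    enumerable: true,
  });
  return el;
}

const registry = [];
const x = makeElement(registry);
x.onclick = function () {
  return "clicked";
};
```

After the assignment the registry holds one click handler, so an analysis that treats x.onclick = ... as an ordinary property write would miss the registration.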

Tracking Event Handlers

Separate event handlers based on their kind
– page load (onload)
– keyboard (onkeypress, ...)
– mouse (onclick, onmouseover, ...)
– timed (setTimeout, setInterval, ...)
– etc.
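The separation by kind amounts to a lookup from registration name to bucket. A sketch under the assumption that handlers are identified by the property or function used to register them; the table contents beyond the names on the slide are illustrative:

```javascript
// Hypothetical table of event handler kinds, keyed by how the handler is registered.
const KINDS = {
  load: ["onload"],
  keyboard: ["onkeypress", "onkeydown", "onkeyup"],
  mouse: ["onclick", "onmouseover", "onmouseout"],
  timed: ["setTimeout", "setInterval"],
};

function kindOf(name) {
  for (const [kind, names] of Object.entries(KINDS)) {
    if (names.includes(name)) {
      return kind;
    }
  }
  return "other"; // anything not listed falls into a catch-all bucket
}
```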

Flow graph Extension

Event handlers are executed by introducing an event-handler-loop
– separates page load event handlers from other event handlers
– executes event handlers in two non-deterministic loops
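The two loops can be imitated at run time as follows. This is a sketch of the idea rather than the flow graph construction itself: `choose` is a hypothetical stand-in for the non-deterministic choice, and it is scripted here so the run is repeatable:

```javascript
// Runs handlers in two phases: page load handlers first, then all others.
// choose(n) returns an index below n, or a negative number to leave the loop.
function eventLoop(loadHandlers, otherHandlers, choose) {
  const trace = [];
  for (const phase of [loadHandlers, otherHandlers]) {
    if (phase.length === 0) continue;
    for (let i = choose(phase.length); i >= 0; i = choose(phase.length)) {
      trace.push(phase[i]());
    }
  }
  return trace;
}

// Scripted choices: run load handler 0, stop; run handler 1, then 0, stop.
const script = [0, -1, 1, 0, -1];
const choose = () => script.shift();
const trace = eventLoop(
  [() => "load"],
  [() => "click", () => "key"],
  choose
);
```

The static analysis does not pick one order. It has to account for every sequence such a loop could produce, while still knowing that load handlers run before the others.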

Evaluation

• With these extensions TAJS can reason about JavaScript applications that run in a browser

• Is the analysis precise enough to be useful?

Benchmarks

Evaluated on a series of benchmarks:
– Chrome Experiments
– Internet Explorer 9 Test Drive
– 10K Challenge
– A List Apart
– (excluding benchmarks that use eval or jQuery, or that are not relevant for JavaScript)

Research Questions

Q1: Ability to show absence of errors?
The analysis is able to show that
• 85-100% of call sites are safe
• 80-100% of property reads are safe

Research Questions

Q2: Ability to locate sources of errors?
– We randomly introduce spelling errors
– The analysis is able to pinpoint most of them (details in the paper)

Research Questions

Q3: Precision of computed call graph?
The analysis is able to show that 90-100% of call sites are monomorphic
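A call site is monomorphic when the computed call graph gives it exactly one possible callee. A sketch of how such a ratio could be computed, with a made-up call graph (the site labels and function names are hypothetical, not benchmark data):

```javascript
// Maps each call site to the set of functions it may invoke.
const callGraph = {
  "a.js:10": new Set(["f"]),
  "a.js:20": new Set(["f", "g"]), // polymorphic: two possible callees
  "a.js:30": new Set(["h"]),
  "a.js:40": new Set(["g"]),
};

function monomorphicRatio(graph) {
  const sites = Object.values(graph);
  const mono = sites.filter((targets) => targets.size === 1).length;
  return mono / sites.length;
}

const ratio = monomorphicRatio(callGraph); // 3 of 4 sites, i.e. 0.75
```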

Research Questions

Q4: Precision of inferred types?
– boolean, number, string, object and undefined
– the analysis is able to show that the average type size is 1.0-1.3
    • e.g. if the average type size is 1.0 then every read in the program results in values of a single type
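Read this way, the average type size is the mean number of possible types per read. A sketch of the arithmetic with invented data (the four reads below are illustrative, not taken from the benchmarks):

```javascript
// Each entry is the set of types the analysis infers for one read in the program.
const reads = [
  new Set(["number"]),
  new Set(["string"]),
  new Set(["number", "undefined"]), // an imprecise read with two possible types
  new Set(["object"]),
];

function averageTypeSize(typeSets) {
  const total = typeSets.reduce((sum, types) => sum + types.size, 0);
  return total / typeSets.length;
}

const average = averageTypeSize(reads); // (1 + 1 + 2 + 1) / 4 = 1.25
```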

Research Questions

Q5: Ability to detect dead or unreachable code?
– found several unreachable functions
– most appear to be unused library code copied and pasted directly into the benchmark programs

Future / Current Work

• Dynamically generated code
  – eval
• Library support
  – jQuery, MooTools, etc.

Conclusion

Extended previous work to reason precisely about JavaScript programs that execute in a browser-based environment

This allows us to discover general errors such as:
• reading absent properties
• dereferencing null or undefined
• invoking functions with incorrect arguments
• etc.

DOM Modules & Levels

Module \ Level   Level 0   Level 1   Level 2   Level 3
Core Module - ()
HTML Module - ()
Event Module - - ()
CSS Module - - () ()
Browser API - - -
Year ~1996 1998 2000 2004

In addition we support the HTMLCanvasElement from HTML5.

Soundness Issues?

Assignment to computed property names

foo[bar] = "baz"
foo[bar] = function() {...}

If the exact value of bar is unknown:
– it could be a write to a "magic property"
– or a registration of an event handler
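One conservative way to treat such writes is to assume the worst whenever the key cannot be pinned down. The sketch below illustrates the concern and is not a description of how TAJS handles it; the `MAGIC` set and `mayRegisterHandler` are hypothetical:

```javascript
// Hypothetical set of property names whose writes register event handlers.
const MAGIC = new Set(["onclick", "onload", "onkeypress"]);

// possibleKeys is the set of strings the key may evaluate to,
// or null when the analysis knows nothing about it.
function mayRegisterHandler(possibleKeys) {
  if (possibleKeys === null) {
    return true; // unknown key: any property, including a magic one, may be written
  }
  return [...possibleKeys].some((key) => MAGIC.has(key));
}
```

For example, mayRegisterHandler(new Set(["title"])) is false, while an unknown key (null) or a set containing "onclick" gives true, so the write must then be treated as a possible handler registration.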