Automatic JavaScript Parallelism for Resource-Efficient

Preview:

Citation preview

Horcrux:Automatic JavaScript Parallelism for

Resource-Efficient Web Computations

Shaghayegh Mardani1, Ayush Goel2, Ronny Ko3, Harsha Madhyastha2, Ravi Netravali41UCLA, 2University of Michigan, 3Harvard University, 4Princeton University

1

Modern Web Browsing

2

Performance MattersWeb Traffic is Dominated by Mobile

Users

53% of visits are abandoned if a

mobile site takes longer than 3

seconds to load.Source: thinkwithgoogle.com

Content Providers

Source: bluecorona.com

If your site makes $100,000/day, 1 sec

improvement in page load increases revenue

by $7,000 daily.

The Problem: Computation Delays

• Evaluation setup

• With NO network delays:

3

Developed Region Emerging Region

• 582 pages in US• Google Pixel 3

• 2.0 GHz octa-core

• 91 pages in Pakistan• Redmi 6A

• 2.0 GHz quad-core

Performance metrics:• Page Load Time (PLT)• Speed Index (SI)

% of pages with load times > 3

Why Are Computation Delays Significant?

4

• 80% more JavaScript over the last 5 years

Mobile DevicesJavaScript

primary compute improvements

=more cores

52%

More cores != Better performance

5

Reason: Single-thread execution Solution: Parallelizing JavaScript

Parallelization Opportunities

• Web workers are widely supported by browsers• Constrained APIs:• No access to DOM APIs• No access to the main global state

• Legacy pages are highly amenable to safe parallelization

6

Pass-by-value

DOM APIs

Main Thread

JS Heap

JavaScript Engine

# of cores Speedup in JS Runtimes

2 cores 49%

4 cores 75%

8 cores 87%

Web Worker

Challenges

7

Conservative Signatures

C1: Ensuring CorrectnessThe exact state accessed by parallelized JavaScript- Despite non-determinism and across all possible control paths

Signature Generation

8

Dynamic Instrumentation

Instrumented

f(x)

Conservative Signatures

Original Page

Headless Browser

Concolic Engine

concrete input

JS codeto execute

f(x)

g(x)

h(x)

f´(x)

g´(x)

h´(x)

g(x) h(x)

if (x == 5) {y = x + 5;

} else {z = 9;

}

{"reads":[x],"writes":[y, z]

}Server-side and Offline

Conservative signaturesDynamic analysis: track read/writes to page state

Concolic execution: explore all possible control paths

Challenges

9

Client-sideParallelization

Scheduler

Conservative Signatures

C1: Ensuring CorrectnessThe exact state accessed by parallelized JavaScript- Despite non-determinism and across all possible control paths

C2: Web Workers Constrained APIs- How to parallelize execution with constrained APIs?

Horcrux Dynamic Scheduler

• Runs in the main browser thread• Only task: manage offloads in event-driven mode

10Main Thread

Scheduler

Page State

JS Heap

DOMQueue

f(x) +

g(x) +

h(x) +

Web Worker

Horcrux Dynamic Scheduler

• Runs in the main browser thread• Only task: manage offloads in event-driven mode

11Main Thread

Scheduler

Page State

JS Heap

DOMQueue

g(x) +

h(x) +

Web Worker

no state dependenciesx = "foo"

y = 5

f(x) reads & writes to x, yf(x) +

Horcrux Dynamic Scheduler

• Runs in the main browser thread• Only task: manage offloads in event-driven mode

12Main Thread

Scheduler

Page State

JS Heap

DOMQueue

g(x) +

h(x) +

Web Worker

f(x) +

x = "foo"y = 5

Horcrux Dynamic Scheduler

• Runs in the main browser thread• Only task: manage offloads in event-driven mode

13Main Thread

Scheduler

Page State

JS Heap

DOMQueue

g(x) +

h(x) +

Web Worker

Initializestate

x = "foo"y = 5

Executef(x)

Horcrux Dynamic Scheduler

• Runs in the main browser thread• Only task: manage offloads in event-driven mode

14Main Thread

Scheduler

Page State

JS Heap

DOMQueue

h(x) +

Web Worker

Initializestate

x = "foo"y = 5

Executef(x)

g(x) reads y

f(x) reads & writes to x, y

state dependency

g(x) +

Horcrux Dynamic Scheduler

• Runs in the main browser thread• Only task: manage offloads in event-driven mode

15Main Thread

Scheduler

Page State

JS Heap

DOMQueue

h(x) +

Web Worker

Initializestate

x = "foo"y = 5

Executef(x)

g(x)+

Horcrux Dynamic Scheduler

• Runs in the main browser thread• Only task: manage offloads in event-driven mode

16Main Thread

Scheduler

Page State

JS Heap

DOMQueue

h(x) +

Web Worker

Initializestate

x = "foo"y = 5

Executef(x)

h(x) reads z

g(x)+

no state dependenciesz = "bar"

Horcrux Dynamic Scheduler

• Runs in the main browser thread• Only task: manage offloads in event-driven mode

17Main Thread

Scheduler

Page State

JS Heap

DOMQueue

Web Worker

Initializestate

x = "foo"y = 5

Executef(x)

g(x)+

h(x) +

z = "bar"

Horcrux Dynamic Scheduler

• Runs in the main browser thread• Only task: manage offloads in event-driven mode

18Main Thread

Scheduler

Page State

JS Heap

DOMQueue

Web Worker

Initializestate

x = "foo"y = 5

Executef(x)

g(x)+

Initializestate

Executeh(x)

z = "bar"

Horcrux Dynamic Scheduler

• Runs in the main browser thread• Only task: manage offloads in event-driven mode

19Main Thread

Scheduler

Page State

JS Heap

DOMQueue

Web Worker

Initializestate

x = "foo"y = 5

Executef(x)

g(x)+

Initializestate

Executeh(x)

z = "bar"

Pauses the execution

getEle

mentBy

Id()

Challenges

20

Client-sideParallelization

Scheduler

Conservative Signatures

C1: Ensuring CorrectnessThe exact state accessed by parallelized JavaScript- Despite non-determinism and across all possible control paths

C2: Web Workers Constrained APIs- How to parallelize execution with constrained APIs?

Root-function Granularity

C3: Offloading overheads- Pass-by-value I/O can take ~0-10 ms

Offloading Granularity

• Trade-off: parallelization benefits vs. offloading overheads• Solution: offloading root-function invocations

21

A()

B()

C()

D()

<script>function A() {

…B(); // invokes C()D();…

}A();

</script>

Original Page Root-function

4x fewer offloads and 73% of parallelization

benefits

Evaluation Questions

• Impact on browser computation delays?• Impact on end-to-end performance?• Horcrux comparison to prior compute optimizations?• What do conservative signatures forgo?• How much are the server-side overheads?

22

Computation Delay Reductions

• Total Computation Time (TCT)

23

0

20

40

60

80

Developed Emerging

% Im

prov

emen

t in

TC

T

Network: WiFi LTEMedian Improvements:

Developed: WiFi (41%), LTE (34%)

Emerging: WiFi (44%), LTE (31%)

End-to-end Performance Improvements

• Page Load Time (PLT) and Speed Index (SI)

24

0.00

0.25

0.50

0.75

1.00

0 25 50 75% Improvement

CD

F

LTE, PLTLTE, SIWiFi, PLTWiFi, SI

100

27%

Developed Region

0.00

0.25

0.50

0.75

1.00

0 25 50 75% Improvement

CD

F

LTE, PLTLTE, SIWiFi, PLTWiFi, SI

100

29%

Emerging Region

Conclusion

• More cores != Better performance• Horcrux automatically parallelizes JavaScript execution using concolic

execution to take advantage of phones multi-core CPUs

25

shaghayegh@cs.ucla.edu

github.com/ShaghayeghMrdn/horcrux-osdi21

Recommended