Rendering Web Content @ 60FPS
Vangelis Kokkevis & Brian Salomon
[email protected] bsalomon@Google.com

WT-4072, Rendering Web Content at 60fps, by Vangelis Kokkevis, Antoine Labour and Brian Salomon


DESCRIPTION

Presentation WT-4072, Rendering Web Content at 60fps, by Vangelis Kokkevis, Antoine Labour and Brian Salomon at the AMD Developer Summit (APU13) Nov. 11-13, 2013.



| RENDERING WEB CONTENT AT 60FPS | NOVEMBER 12, 2013 | CONFIDENTIAL

Google Chrome

● Recently celebrated Chrome’s fifth anniversary!
● Hundreds of millions of active users
● Cross platform:
  ○ Windows (XP+), Mac, Linux
  ○ Chrome OS (x86 and ARM), Android, iOS (*)
● Open source: Chromium and Blink
● Rapid release cycle, four channels (canary, dev, beta, stable)
● Core Principles: Speed, Security, Stability, Simplicity


Chrome’s Multi-Process Architecture (pre-GPU)

[Diagram: three Renderer processes, each containing V8 (JavaScript), Blink (Web Renderer) and Skia (2D graphics); a Browser process; Shared Memory between them; Screen and User Input attached to the Browser.]


Why use the GPU?

● Enable new platform features:
  ○ 3D CSS, WebGL
● Speed & Responsiveness
  ○ Less jank: Smoother scrolling, 60fps CSS animations
  ○ Page “sticks to your finger”
  ○ Faster <canvas>, <video>


Accelerated Compositing

Re-rasterizing is expensive and should be avoided if possible

Caching rasterized contents into textures is an effective way to reduce raster costs.

Split the page contents into layers, use the GPU to composite them

What gets a layer?

● Content that rasters on the GPU: WebGL, 2D Canvas, Video, Flash
● Content that is expected to change infrequently:
  ○ CSS transform and opacity animations
  ○ Overflow scroll
  ○ Fixed position elements
● Content that overlaps other composited content


Compositing Layers


The Rendering Pipeline

[Diagram: User Input or Timer Event -> Run Script -> Re-Layout Document -> Rasterize Invalidated Content -> Upload New Content to Textures (if needed) -> Draw Textured Quads. A "Compositor" label appears in the diagram. Target for the whole sequence: < 16ms.]
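The 16ms figure follows from the 60fps target: one frame has 1000 / 60 ≈ 16.7 ms. A minimal sketch of that arithmetic is below. The per-stage timings are invented for illustration, and the stages are assumed to run serially on one thread.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


def fits_budget(stage_ms: dict, fps: float = 60.0) -> bool:
    """True if the serial sum of the stage times fits in one frame."""
    return sum(stage_ms.values()) <= frame_budget_ms(fps)


# Hypothetical timings (ms) for the pipeline stages named on the slide.
stages = {"script": 4.0, "layout": 3.0, "rasterize": 6.0, "upload": 1.5, "draw": 1.0}

print(round(frame_budget_ms(60), 2))
print(fits_budget(stages))
```

With these made-up numbers, a rasterize stage that grows from 6 ms to 12 ms is enough to miss the frame.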


Tiling

Large content layers get tiled

● Layer split up into 256 x 256 or 512 x 512 pixel tiles
● Cache rasterized contents in manageable chunks to
  ○ Speed up scrolling
  ○ Conserve VRAM
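As a rough illustration of the tiling idea, the sketch below computes how many tiles cover a layer and which tiles intersect a viewport. The function names and the `(x, y, w, h)` viewport convention are assumptions made for this example, not Chrome's actual interfaces.

```python
import math


def tile_grid(layer_w, layer_h, tile=256):
    """Number of tile columns and rows needed to cover a layer."""
    return math.ceil(layer_w / tile), math.ceil(layer_h / tile)


def visible_tiles(viewport, tile=256):
    """(col, row) indices of tiles that intersect viewport = (x, y, w, h),
    given in integer layer pixels."""
    x, y, w, h = viewport
    first_col, first_row = x // tile, y // tile
    last_col, last_row = (x + w - 1) // tile, (y + h - 1) // tile
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]
```

Only the tiles returned by `visible_tiles` (plus whatever is pre-painted around them) need to stay resident, which is how tiling bounds texture memory for a tall page.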


Tiling Example


GPU Architecture

[Diagram: the Browser and the Renderer processes (Blink (WebGL), Skia (Canvas), Compositor) each use a GLES2 Client with a CMD ring buffer and a Transfer buffer in Shared Memory; the GPU Process runs the GLES2 Service and ANGLE (GL ES -> D3D) and drives the Screen.]


The Challenge

Ideally…

[Timeline: two consecutive 16ms frames, each containing JS, Layout, Rasterize, Upload, Draw.]


The Challenge

In practice...

[Timeline: a single JS, Layout, Rasterize, Upload, Draw sequence shown against two 16ms frame intervals.]


Threaded Compositing

Solution: Move compositing to its own thread

[Timeline over two 16ms frames. Main Thread: JS, Layout, Rasterize. Compositor Thread: Upload, Draw.]


Good enough?

The devil’s in the details

● Need to aggressively pre-paint tiles to avoid running out of rasterized content in the compositor thread when scrolling.
● How many tiles to pre-paint?
  ○ Too many: VRAM pressure, possibly lots of unnecessary work
  ○ Too few: Checkerboarding


Deferred Rasterization

Less checkerboarding: Move raster out of main thread

[Timeline over two 16ms frames. Main Thread: JS, Layout, Record Display List. Compositor Thread: Sort Tiles, Issue Raster Tasks, Draw. Raster Thread(s): blocks labeled RT. Blocks labeled UT also appear in the timeline.]
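The "Sort Tiles" and "Issue Raster Tasks" steps can be sketched as follows. This is a simplified model under stated assumptions: tiles are whole rows of a vertically scrolling layer, priority is plain pixel distance to the viewport, the budget is a tile count, and `raster` is a hypothetical stand-in for replaying the recorded display list. None of these names come from Chrome.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 256  # tile edge in pixels


def distance_to_viewport(tile_row, viewport_top, viewport_bottom):
    """Vertical distance in pixels from a tile row to the viewport
    (0 if the row overlaps or touches it)."""
    top, bottom = tile_row * TILE, (tile_row + 1) * TILE
    if bottom <= viewport_top:
        return viewport_top - bottom
    if top >= viewport_bottom:
        return top - viewport_bottom
    return 0


def sort_tiles(rows, viewport_top, viewport_bottom, budget):
    """Order tile rows nearest-first and keep only what the budget allows."""
    ranked = sorted(
        rows,
        key=lambda r: (distance_to_viewport(r, viewport_top, viewport_bottom), r),
    )
    return ranked[:budget]


def raster(row):
    """Stand-in for rasterizing one tile from the display list."""
    return f"tile-row-{row}"


def issue_raster_tasks(rows, workers=4):
    """Run raster tasks on worker threads; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(raster, rows))
```

The `budget` parameter is where the pre-paint trade-off shows up: a larger value costs memory and possibly wasted work, a smaller one risks checkerboarding during a fast scroll.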


Tooling

Lots of threads, lots of asynchronous tasks.

Good performance tools are a must for debugging and improving!

Tools we use when developing Chrome:

● Tracing (to monitor what each thread is doing in a timeline)
● FrameViewer (Inspect layers, tiles and rasterization)
● Telemetry (automated performance measurement framework)


Tracing


Frame-Viewer


Telemetry


Challenges

● Rasterization is a bottleneck
● The main thread is unpredictable (JS, layout, long records)
● There aren’t enough cores to go around (mobile)
● Bandwidth is at a premium
● GPU is a shared resource and can get oversubscribed
● Huge matrix of OS / GPU / CPU / Drivers


What does the future hold?

More performance gains:

● Hardware accelerated rasterization
● “Zero-copy” texture uploads
● Hardware accelerated image decode
● Smarter and more efficient layers


Skia

● Portable 2D graphics/text engine
  ○ Device independent coordinates
  ○ 3x3 matrices w/ perspective
  ○ Arbitrary clipping
  ○ Transparency, anti-aliasing, dithering, filters
  ○ Extension architecture for…
● Multiple Backends
  ○ SW rasterizer
  ○ GPU (“Ganesh”)
  ○ PDF
  ○ Picture (display list)
● Open source
  ○ code.google.com/p/skia


Primitives

● Lines
● Rectangles
● Ellipses
● Rounded Corner Rectangles
● Text
● ...
● Paths
  ○ Made of contours
  ○ Contours are connected sets of Bezier curves
    ■ lines
    ■ quadratics (rational)
    ■ cubics
  ○ Can be filled or stroked
  ○ Fills are based on winding number

[Figure: a path filled with the Even/Odd rule and with the Non-Zero rule]
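To make the two fill rules concrete, here is a small sketch that computes the winding number of a point against polygonal contours and applies each rule. It assumes curves have already been flattened to line segments; the function names are invented for this example and are not Skia API.

```python
def winding_number(point, contours):
    """Signed number of times the contours wind around the point."""
    px, py = point
    wn = 0
    for contour in contours:
        n = len(contour)
        for i in range(n):
            (x0, y0), (x1, y1) = contour[i], contour[(i + 1) % n]
            # > 0 when the point lies to the left of the directed edge.
            side = (x1 - x0) * (py - y0) - (px - x0) * (y1 - y0)
            if y0 <= py < y1 and side > 0:
                wn += 1  # upward edge passing to the right of the point
            elif y1 <= py < y0 and side < 0:
                wn -= 1  # downward edge passing to the right of the point
    return wn


def filled(point, contours, rule):
    """Apply the 'non_zero' or 'even_odd' fill rule."""
    wn = winding_number(point, contours)
    return wn != 0 if rule == "non_zero" else wn % 2 != 0


# Two nested squares wound in the same direction.
outer = [(0, 0), (4, 0), (4, 4), (0, 4)]
inner = [(1, 1), (3, 1), (3, 3), (1, 3)]
```

For the point (2, 2) the winding number is 2, so the non-zero rule fills the inner square while the even/odd rule leaves a hole there.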


Pipeline Stages

SkPaint: Life of a Path

Programmable (via Subclassing)

● SkPathEffect
  ○ Path -> Path
  ○ e.g. Dashing
● SkRasterizer
  ○ Path -> Coverage Mask
  ○ e.g. ?? [considering deprecating]
● SkMaskFilter
  ○ Coverage Mask -> Coverage Mask
  ○ e.g. Blur
● SkShader
  ○ Source-Space Coordinate -> Color
  ○ e.g. Gradients, Bitmap Fill
● SkColorFilter
  ○ Color -> Color
  ○ e.g. Color Matrix, Blend with constant Color
● SkImageFilter
  ○ Src Image -> New Src Image
  ○ e.g. Color Blur, Morphology Filter
  ○ Subsume SkColorFilter?
● SkXfermode
  ○ AKA Blend
  ○ Src Color + Dst Color -> New Dst Color
  ○ e.g. Porter-Duff modes, Darken, …

Fixed Function

● Stroking (width, caps, joins)
● Text settings (typeface, pt size, …)
● AA enable/disable
● Image filtering quality level
● Alpha
● Default color if no SkShader


GPU Shaders

GPU Backend has an “effect” system for building shaders

● Effects arranged in linear order.
● Write a snippet of GLSL fragment code.
● Effect passes a vec4 “color” to the next effect.
  ○ Input to first effect is either constant or per-vertex value.
● Can insert uniforms, functions, textures.
● Internal effects can
  ○ Insert vertex shader code.
  ○ Require additional vertex attributes.

[Diagram: Initial Color -> Color Effect 1 -> Color Effect 2 -> final color; Initial Coverage -> Cov. Effect 1 -> Cov. Effect 2 -> Cov. Effect 3 -> final coverage. Effects can take inputs such as a texture or a matrix uniform.]

Important to keep color and fractional coverage separate.
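One way to see why color and coverage are kept apart is to model the two chains on the CPU. The sketch below is an illustration only: effects are plain Python functions rather than GLSL snippets, colors are premultiplied RGBA tuples, and the final combine `coverage * blend(src, dst) + (1 - coverage) * dst` is an assumed formulation, not code taken from Skia.

```python
from functools import reduce


def run_chain(initial, effects):
    """Feed a value through effects in linear order."""
    return reduce(lambda value, effect: effect(value), effects, initial)


def src_over(src, dst):
    """Porter-Duff src-over on premultiplied RGBA tuples."""
    inv = 1.0 - src[3]
    return tuple(s + d * inv for s, d in zip(src, dst))


def shade_pixel(initial_color, color_effects,
                initial_coverage, coverage_effects, dst, blend=src_over):
    color = run_chain(initial_color, color_effects)
    coverage = run_chain(initial_coverage, coverage_effects)
    blended = blend(color, dst)
    # Coverage only interpolates between the old and the blended destination.
    return tuple(coverage * b + (1.0 - coverage) * d
                 for b, d in zip(blended, dst))
```

For src-over, multiplying coverage into the source color first gives the same pixel. For a mode such as "src" (replace the destination) it does not, because the uncovered fraction of the pixel would lose its old destination color.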


Pipeline Stages and GPU Backend

● SkPathEffect
  ○ Perform on CPU
  ○ Call filterPath(), draw the resulting path
  ○ Special hooks for some dashing cases
  ○ Future: general mechanism to avoid creating intermediate path object on CPU
● SkRasterizer: ignored
  ○ No known clients use custom rasterizers.
  ○ Act as though no rasterizer installed
● SkMaskFilter:
  ○ Filter object is given a GPU “context object” and primitive’s mask
    ■ Can create intermediate textures
    ■ Performs draws using Effects
    ■ Returns new mask as a texture.
  ○ Special case for filters that can be performed inline with the draw to dst
  ○ In practice the only significant SkMaskFilter is blur
  ○ Future: Specialize blur code path for simple primitive types (e.g. rects)


Pipeline Stages and GPU Backend Continued

● SkShader
  ○ Produces an Effect object that is inserted into the draw
  ○ Implementations for bitmap shaders, various gradient types, noise shader.
● SkColorFilter:
  ○ Produces an effect that receives SkShader effect’s output.
  ○ Implementations for color matrix, color table, blend-against-const-color
● SkImageFilter:
  ○ Works the same way as SkMaskFilter but with color input/output
  ○ Implementations for
    ■ Color blur
    ■ Lighting effect
    ■ Any (color filter, shader, or xfermode) as an image filter
  ○ Graph implementation for chaining SkImageFilters together (CPU or GPU)
    ■ SVG image filter DAG
    ■ Future: Optimization pass to minimize intermediate draws.
  ○ Shortcuts for Image filters that can be done inline or are really just a matrix.


Pipeline Stages and GPU Backend Continued

● SkXfermode: Either as GL coefficients or Effect
  ○ The Porter-Duff blend modes (src-over, etc) are all expressible as GL blend coeffs
    ■ Big caveat here
  ○ Many others are not:
    ■ Luminance
    ■ Darken
    ■ Arithmetic
    ■ …
  ○ Xfermode can install an Effect
    ■ Access to the destination?
      ● Effect framework provides abstract interface for accessing the dst color
      ● GL_EXT_shader_framebuffer_fetch if available
      ● Future: GL_NV_texture_barrier
      ● Otherwise a dst-copy-to-texture is triggered
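The claim that Porter-Duff modes reduce to blend coefficients can be illustrated with a small table. Each mode is a pair of factors applied as `result = src * src_coeff + dst * dst_coeff` on premultiplied colors, which is the form fixed-function GL blending evaluates. The table and helper names below are written for this example and are not Skia's.

```python
def zero(sa, da): return 0.0
def one(sa, da): return 1.0
def src_alpha(sa, da): return sa
def dst_alpha(sa, da): return da
def inv_src_alpha(sa, da): return 1.0 - sa
def inv_dst_alpha(sa, da): return 1.0 - da


# mode -> (source coefficient, destination coefficient), premultiplied colors
PORTER_DUFF = {
    "clear":    (zero, zero),
    "src":      (one, zero),
    "dst":      (zero, one),
    "src-over": (one, inv_src_alpha),
    "dst-over": (inv_dst_alpha, one),
    "src-in":   (dst_alpha, zero),
    "dst-in":   (zero, src_alpha),
    "src-out":  (inv_dst_alpha, zero),
    "dst-out":  (zero, inv_src_alpha),
    "src-atop": (dst_alpha, inv_src_alpha),
    "dst-atop": (inv_dst_alpha, src_alpha),
    "xor":      (inv_dst_alpha, inv_src_alpha),
}


def blend(mode, src, dst):
    """Blend premultiplied RGBA tuples with a coefficient-based mode."""
    src_coeff, dst_coeff = PORTER_DUFF[mode]
    sa, da = src[3], dst[3]
    return tuple(s * src_coeff(sa, da) + d * dst_coeff(sa, da)
                 for s, d in zip(src, dst))
```

A mode such as Darken needs a per-channel `min` of source and destination terms, which does not fit this two-coefficient form. That is why such modes need shader access to the destination color.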


Primitives: Text

● Skia sits on top of system font engine:
  ○ FreeType
  ○ CoreText
  ○ GDI
  ○ DirectWrite
● Large ALPHA8 texture used as glyph mask atlas (1024 x 2048)
  ○ Will use a second RGB(A) texture if there are “LCD” glyphs
  ○ Texture divided into 256x256 texel “plots”
● Strike: A unique combination of
  ○ Typeface
  ○ Size
  ○ Style (italic, bold, …)
● Strikes claim (multiple) plots
● Plots purged wholesale using LRU

[Figure: the atlas as a grid of plots, each labeled with its owning strike (Strike 0 to Strike 3) or "(free)"]
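The plot bookkeeping described above can be modeled in a few lines. This is a sketch under assumptions: the class and method names are hypothetical, a strike is represented by an opaque key such as a `(typeface, size, style)` tuple, and glyph placement inside a plot is left out entirely.

```python
from collections import OrderedDict

ATLAS_W, ATLAS_H, PLOT = 1024, 2048, 256  # sizes quoted on the slide


class PlotAtlas:
    """Tracks which strike owns each plot and purges the least
    recently used plot when no free plot is left."""

    def __init__(self):
        self.free = [(x, y)
                     for y in range(0, ATLAS_H, PLOT)
                     for x in range(0, ATLAS_W, PLOT)]
        self.owners = OrderedDict()  # plot origin -> strike, oldest use first

    def claim(self, strike):
        """Give `strike` a plot, evicting the LRU plot wholesale if needed."""
        if self.free:
            plot = self.free.pop(0)
        else:
            plot, _evicted_strike = self.owners.popitem(last=False)
        self.owners[plot] = strike
        return plot

    def touch(self, plot):
        """Mark a plot as recently used, e.g. when one of its glyphs is drawn."""
        self.owners.move_to_end(plot)
```

With these dimensions the atlas holds 4 x 8 = 32 plots, so the 33rd claim is the first one that forces a purge.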


Primitives: Text Continued

● Glyphs are packed into plots using the Skyline algorithm [Jukka Jylänki http://clb.demon.fi/]
● Attempt to perform all uploads for a frame before draws
  ○ Queue GL draws
  ○ Uploads go through immediately
● Avoid flushing draws
  ○ Only flush draws to GL when a plot is purged that is referenced in currently queued draws
  ○ Matters a lot more on mobile, especially tiled architectures
● Works pretty well for scrolling
● Struggles with pinch-zoom
● Under development: distance field atlas
  ○ Same texture partitioning and replacement scheme
  ○ “Masks” are (mostly) resolution independent


Primitives: Rects

Not anti-aliased: Simple, draw a quad!

Two approaches for anti-aliasing (non-MSAA):

● Geometric
  ○ Create inner and outer offset geometry
  ○ Offset is 0.5 pixels
  ○ Use “coverage” vertex attribute
    ■ 0 at outer offset rect
    ■ 1 at inner offset rect
  ○ Handle degenerate cases
● Shader
  ○ Attributes:
    ■ W = rect.width() + 0.5, H = rect.height() + 0.5
    ■ Y = normalized y-axis of rect
    ■ C = center of rect
  ○ coverage in Y at pixel p is clamp(H - ((p - C) dot Y), 0, 1)

Geometry shaders could reduce VBO size and save CPU cycles

[Figure labels: Y, H, p, C, W, c=0, c=1]
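The shader approach can be tried out on the CPU. The sketch below makes several assumptions beyond the slide: distances are measured from the center, so it uses half extents padded by 0.5 pixel and the absolute value of the projection; the x-axis is taken as the perpendicular of Y; and the x and y coverages are multiplied together. It is an illustration of the idea, not Skia's shader.

```python
def clamp(v, lo, hi):
    return max(lo, min(hi, v))


def rect_coverage(p, center, y_axis, width, height):
    """Approximate coverage of the pixel centered at p by a rotated rect.
    y_axis must be the unit-length y-axis of the rect."""
    x_axis = (y_axis[1], -y_axis[0])  # perpendicular to the y-axis
    dx, dy = p[0] - center[0], p[1] - center[1]
    dist_x = abs(dx * x_axis[0] + dy * x_axis[1])
    dist_y = abs(dx * y_axis[0] + dy * y_axis[1])
    # Half extents padded by half a pixel so coverage ramps across each edge.
    cov_x = clamp(width / 2 + 0.5 - dist_x, 0.0, 1.0)
    cov_y = clamp(height / 2 + 0.5 - dist_y, 0.0, 1.0)
    return cov_x * cov_y
```

For a 4 x 4 axis-aligned rect centered at (10, 10), a pixel centered exactly on the right edge at (12, 10) gets coverage 0.5, and a pixel one unit further out gets 0.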


Primitives: Misc

Adaptations for stroked rectangles

Similar shader techniques for:

● Ellipses
● Circles
● Rounded-Rectangles


Primitives: Paths

Can’t anti-alias interior edge!

Can’t double blend in overlap!

Multiple edges from different contours relevant to pixels in concavities!

● Why are paths hard?
  ○ In the most general case, have to handle both the fill rule and anti-aliasing
  ○ After a blend, the coverage/alpha distinction is lost. Must only perform one blend in general.


Primitives: Paths Continued

● MSAA solves the AA problem
● Use the stencil to solve the fill rule problem
● Tessellate contours into line segments
● Pass 1:
  ○ Draw the tessellated contours as triangle fan
  ○ Disable color writes
  ○ Stencil op: +1 for front face, -1 for back face
● Pass 2:
  ○ Draw bounding geometry
  ○ Enable color writes
  ○ Stencil func
    ■ Winding: Pass if stencil is non-zero
    ■ Even/Odd: Pass if LSB is 1
● Avoid tessellating quadratic and cubic beziers:
  ○ Discard in FS if outside the curve [Kokojima et al.]
  ○ Need per sample discard or sample coverage mask
  ○ No-go on ES3 :(

[Figure: Pass 1 shows fan triangles contributing +1, -1, +1 to the stencil; Pass 2 shows the covered result]
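The two passes can be simulated for a single sample point. The sketch assumes counter-clockwise triangles are front-facing (the GL default) and fans each contour from its first vertex; the function names are invented for this example.

```python
def signed_area2(a, b, c):
    """Twice the signed area of triangle abc (> 0 when counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def in_triangle(p, a, b, c):
    d1, d2, d3 = signed_area2(a, b, p), signed_area2(b, c, p), signed_area2(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def stencil_pass(sample, contours):
    """Pass 1: add +1 or -1 for every fan triangle covering the sample.
    No color is written."""
    stencil = 0
    for contour in contours:
        pivot = contour[0]
        for b, c in zip(contour[1:], contour[2:]):
            area = signed_area2(pivot, b, c)
            if area != 0 and in_triangle(sample, pivot, b, c):
                stencil += 1 if area > 0 else -1
    return stencil


def cover_pass(stencil, rule):
    """Pass 2: the stencil test applied while drawing the bounding geometry."""
    return stencil != 0 if rule == "winding" else (stencil & 1) == 1


# A square with a V-shaped notch cut into its top edge (concave).
notched = [(0, 0), (4, 0), (4, 4), (2, 1), (0, 4)]
```

A sample inside the notch, such as (2.5, 2), is covered by one front-facing and one back-facing fan triangle, so its stencil value returns to 0 and pass 2 leaves it untouched.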


Primitives: Paths Continued

For AA paths without MSAA:

● Detect if path is one of the other primitive types (e.g. rounded rectangle)
● If very thin stroke, draw as AA lines (and ignore double blend problem)
● If path is convex, fill rule problem goes away
  ○ Fan the on-contour control points
  ○ Draw bounding hulls of curves
  ○ Compute coverage using implicit eq. approx distance to curve [Loop-Blinn]
● Otherwise, SW rasterize mask and upload
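The implicit-equation test for a quadratic segment can be sketched as follows, in the style of Loop and Blinn: the three control points carry (u, v) values (0, 0), (1/2, 0), (1, 1), and the sign of u² - v tells which side of the curve a point is on. This covers only the inside/outside part. A shader would typically also divide by the length of the screen-space gradient to approximate distance for anti-aliasing, and that step is omitted here. Function names are hypothetical.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of p with respect to triangle abc."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    wa = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    wb = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return wa, wb, 1.0 - wa - wb


def quad_implicit(p, p0, p1, p2):
    """Evaluate f = u^2 - v at p for the quadratic with control points
    p0, p1, p2. f < 0 on the side of the curve facing the chord p0-p2,
    f = 0 on the curve, f > 0 on the side facing p1."""
    _w0, w1, w2 = barycentric(p, p0, p1, p2)
    # Linear interpolation of the per-vertex (u, v) = (0,0), (1/2,0), (1,1).
    u = 0.5 * w1 + w2
    v = w2
    return u * u - v
```

For control points (0, 0), (1, 2), (2, 0) the curve passes through (1, 1), so (1, 0.5) evaluates negative and (1, 1.5) positive.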


Questions?