Presentation WT-4072, Rendering Web Content at 60fps, by Vangelis Kokkevis, Antoine Labour and Brian Salomon at the AMD Developer Summit (APU13) Nov. 11-13, 2013.
| RENDERING WEB CONTENT AT 60FPS | NOVEMBER 12, 2013 | CONFIDENTIAL
Google Chrome
● Recently celebrated Chrome’s fifth anniversary!
● Hundreds of millions of active users
● Cross platform:
  ○ Windows (XP+), Mac, Linux
  ○ Chrome OS (x86 and ARM), Android, iOS (*)
● Open source: Chromium and Blink
● Rapid release cycle, four channels (canary, dev, beta, stable)
● Core Principles: Speed, Security, Stability, Simplicity
Chrome’s Multi-Process Architecture (pre-GPU)
[Diagram labels: Browser; three Renderer processes, each containing V8 (JavaScript), Blink (Web Renderer) and Skia (2D graphics); Shared Memory; Screen; User Input]
Why use the GPU?
● Enable new platform features:
  ○ 3D CSS, WebGL
● Speed & Responsiveness
  ○ Less jank: Smoother scrolling, 60fps CSS animations
  ○ Page “sticks to your finger”
  ○ Faster <canvas>, <video>
Accelerated Compositing
Re-rasterizing is expensive and should be avoided if possible
Caching rasterized contents into textures is an effective way to reduce raster costs.
Split the page contents into layers, use the GPU to composite them
What gets a layer?
● Content that rasters on the GPU: WebGL, 2D Canvas, Video, Flash
● Content that is expected to change infrequently:
  ○ CSS transform and opacity animations
  ○ Overflow scroll
  ○ Fixed position elements
● Content that overlaps other composited content
Compositing Layers
The Rendering Pipeline
[Diagram, including a box labeled “Compositor”:]
User Input or Timer Event -> Run Script -> Re-Layout Document -> Rasterize Invalidated Content -> Upload New Content to Textures (if needed) -> Draw Textured Quads
< 16ms (one frame at 60fps)
Tiling
Large content layers get tiled
● Layer split up into 256 x 256 or 512 x 512 pixel tiles
● Cache rasterized contents in manageable chunks to
  ○ Speed up scrolling
  ○ Conserve VRAM
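As a rough illustration of why tiling conserves VRAM, the sketch below computes the tile grid for a layer and the subset of tiles a viewport touches. The function names and the 4 bytes per pixel figure are assumptions for illustration, not Chrome's actual code.

```python
import math

def tiles_for_layer(layer_w, layer_h, tile=256):
    """Return (cols, rows) of the tile grid that covers a layer."""
    return math.ceil(layer_w / tile), math.ceil(layer_h / tile)

def visible_tiles(viewport, layer_w, layer_h, tile=256):
    """List (col, row) indices of tiles intersecting viewport (x, y, w, h)."""
    x, y, w, h = viewport
    cols, rows = tiles_for_layer(layer_w, layer_h, tile)
    c0 = max(0, x // tile)
    r0 = max(0, y // tile)
    c1 = min(cols - 1, (x + w - 1) // tile)
    r1 = min(rows - 1, (y + h - 1) // tile)
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

# A 1024 x 10000 layer viewed through a 1024 x 768 viewport scrolled to y=300.
cols, rows = tiles_for_layer(1024, 10000)
visible = visible_tiles((0, 300, 1024, 768), 1024, 10000)
tile_bytes = 256 * 256 * 4  # assumed RGBA8 tiles
print(cols * rows, len(visible), len(visible) * tile_bytes)
```

Under these assumptions only the tiles intersecting the viewport need to be resident, rather than the whole layer.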
Tiling Example
GPU Architecture
[Diagram labels: Browser; Renderer, containing Blink (WebGL), Skia (Canvas) and Compositor; a GLES2 Client with a CMD ringbuffer and a Transfer buffer for each of the three client boxes; Shared Memory; GPU Process, containing the GLES2 Service and ANGLE (GL ES -> D3D); Screen]
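The command ring buffer idea can be sketched as a single-writer, single-reader queue with "put" and "get" offsets. This is a minimal sketch under assumed names (`CommandRingBuffer`, `write`, `drain`); it does not reproduce Chromium's actual command buffer format or its shared-memory handling.

```python
class CommandRingBuffer:
    """Fixed-size ring of command slots shared by one writer and one reader."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.put = 0  # next slot the client writes
        self.get = 0  # next slot the service reads

    def free_slots(self):
        # One slot stays empty so that put == get always means "empty".
        return (self.get - self.put - 1) % len(self.slots)

    def write(self, command):
        """Client side: enqueue a command, or return False if the ring is full."""
        if self.free_slots() == 0:
            return False
        self.slots[self.put] = command
        self.put = (self.put + 1) % len(self.slots)
        return True

    def drain(self):
        """Service side: consume every pending command in order."""
        out = []
        while self.get != self.put:
            out.append(self.slots[self.get])
            self.get = (self.get + 1) % len(self.slots)
        return out

ring = CommandRingBuffer(4)
for cmd in ("BindTexture", "TexSubImage2D", "DrawArrays"):
    ring.write(cmd)
print(ring.drain())
```

When the ring is full the client has to wait for the service to advance its read offset, which is one way a busy GPU process can stall its clients.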
The Challenge
Ideally…
[Timeline: JS -> Layout -> Rasterize -> Upload -> Draw completes within each of two consecutive 16ms frames]
The Challenge
In practice...
[Timeline: a single JS -> Layout -> Rasterize -> Upload -> Draw sequence shown against two 16ms frame intervals]
Threaded Compositing
Solution: Move compositing to its own thread
[Timeline over two 16ms frames: Main Thread runs JS -> Layout -> Rasterize; Compositor Thread runs Upload and Draw]
Good enough?
The devil’s in the details
● Need to aggressively pre-paint tiles to avoid running out of rasterized content in the compositor thread when scrolling.
● How many tiles to pre-paint?
  ○ Too many: VRAM pressure, possibly lots of unnecessary work
  ○ Too few: Checkerboarding
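The pre-paint trade-off can be made concrete with a little arithmetic. The sketch below assumes a vertically scrolling layer, a hypothetical pre-paint margin in pixels above and below the viewport, and 4 bytes per pixel; none of these values come from the talk.

```python
def prepaint_rows(viewport_y, viewport_h, layer_h, margin, tile=256):
    """Tile rows covering the viewport plus `margin` pixels on either side."""
    top = max(0, viewport_y - margin)
    bottom = min(layer_h, viewport_y + viewport_h + margin)
    return range(top // tile, (bottom - 1) // tile + 1)

def prepaint_cost_bytes(rows, layer_w, tile=256, bytes_per_pixel=4):
    """Texture memory needed to keep the given tile rows rasterized."""
    cols = -(-layer_w // tile)  # ceiling division
    return len(rows) * cols * tile * tile * bytes_per_pixel

# Compare no pre-painting with a 1024 pixel margin on a 1024 x 10000 layer.
for margin in (0, 1024):
    rows = prepaint_rows(2000, 768, 10000, margin)
    print(margin, len(rows), prepaint_cost_bytes(rows, 1024))
```

A larger margin tolerates faster scrolling before checkerboarding appears, at the cost of proportionally more texture memory and raster work.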
Deferred Rasterization
Less checkerboarding: Move raster out of main thread
[Timeline over two 16ms frames: Main Thread runs JS -> Layout -> Record Display List; Compositor Thread runs Sort Tiles -> Issue Raster Tasks -> Draw in each frame; Raster Thread(s) run blocks labeled RT and UT (presumably raster and upload tasks)]
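The "Sort Tiles" and "Issue Raster Tasks" steps might look roughly like the sketch below: tiles are ordered by distance from the viewport and handed to a pool of raster threads that play back a recorded display list. The display list format, the priority rule and all function names are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def tile_priority(tile_row, first_visible, last_visible):
    """0 for visible rows, otherwise the distance in rows from the viewport."""
    if tile_row < first_visible:
        return first_visible - tile_row
    if tile_row > last_visible:
        return tile_row - last_visible
    return 0

def rasterize(display_list, tile_row):
    """Stand-in for playing back a recorded display list into one tile."""
    return tile_row, [op for op in display_list if op["row"] == tile_row]

def raster_tiles(display_list, rows, first_visible, last_visible, workers=4):
    """Sort tiles by priority, then issue one raster task per tile."""
    ordered = sorted(
        rows, key=lambda r: (tile_priority(r, first_visible, last_visible), r)
    )
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, i.e. by priority.
        results = list(pool.map(lambda r: rasterize(display_list, r), ordered))
    return ordered, results

display_list = [{"row": 2, "op": "rect"}, {"row": 5, "op": "text"}]
order, results = raster_tiles(display_list, range(8), 3, 4)
print(order)
```

Because the display list is recorded once on the main thread, the raster threads never need to call back into layout or script.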
Tooling
Lots of threads, lots of asynchronous tasks.
Good performance tools are a must for debugging and improving!
Tools we use when developing Chrome:
● Tracing (to monitor what each thread is doing in a timeline)
● FrameViewer (Inspect layers, tiles and rasterization)
● Telemetry (automated performance measurement framework)
Tracing
Frame-Viewer
Telemetry
Challenges
● Rasterization is a bottleneck
● The main thread is unpredictable (JS, layout, long records)
● There aren’t enough cores to go around (mobile)
● Bandwidth is at a premium
● GPU is a shared resource and can get oversubscribed
● Huge matrix of OS / GPU / CPU / Drivers
What does the future hold
More performance gains:
● Hardware accelerated rasterization
● “Zero-copy” texture uploads
● Hardware accelerated image decode
● Smarter and more efficient layers
Skia
● Portable 2D graphics/text engine
  ○ Device independent coordinates
  ○ 3x3 matrices w/ perspective
  ○ Arbitrary clipping
  ○ Transparency, anti-aliasing, dithering, filters
  ○ Extension architecture for…
● Multiple Backends
  ○ SW rasterizer
  ○ GPU (“Ganesh”)
  ○ PDF
  ○ Picture (display list)
● Open source
  ○ code.google.com/p/skia
Primitives
● Lines
● Rectangles
● Ellipses
● Rounded Corner Rectangles
● Text
● ...
● Paths
  ○ Made of contours
  ○ Contours are connected sets of Bezier curves
    ■ lines
    ■ quadratics (rational)
    ■ cubics
  ○ Can be filled or stroked
  ○ Fills are based on winding number
[Figure: the same path filled with the Even/Odd rule and with the Non-Zero rule]
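To illustrate how the two fill rules differ, the sketch below computes the winding number of a point against polygonal contours (curves are assumed to be flattened to line segments already) and applies each rule. It uses a standard crossing-count method and is not taken from Skia's source.

```python
def winding_number(point, contours):
    """Sum of signed edge crossings of a ray going in +x from `point`."""
    px, py = point
    wn = 0
    for contour in contours:
        n = len(contour)
        for i in range(n):
            (x0, y0), (x1, y1) = contour[i], contour[(i + 1) % n]
            # Sign of the cross product: which side of the edge the point is on.
            side = (x1 - x0) * (py - y0) - (px - x0) * (y1 - y0)
            if y0 <= py < y1 and side > 0:    # upward edge, point to its left
                wn += 1
            elif y1 <= py < y0 and side < 0:  # downward edge, point to its right
                wn -= 1
    return wn

def is_filled(point, contours, rule):
    wn = winding_number(point, contours)
    return wn != 0 if rule == "non-zero" else wn % 2 != 0

# Two nested squares wound in the same direction.
outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
inner = [(3, 3), (7, 3), (7, 7), (3, 7)]
print(is_filled((5, 5), [outer, inner], "non-zero"))  # inside both contours
print(is_filled((5, 5), [outer, inner], "even-odd"))
```

With both contours wound the same way, the inner square is filled under Non-Zero but becomes a hole under Even/Odd.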
Pipeline Stages
SkPaint: Life of a Path
Programmable (via Subclassing)
● SkPathEffect
  ○ Path -> Path
  ○ e.g. Dashing
● SkRasterizer
  ○ Path -> Coverage Mask
  ○ e.g. ?? [considering deprecating]
● SkMaskFilter
  ○ Coverage Mask -> Coverage Mask
  ○ e.g. Blur
● SkShader
  ○ Source-Space Coordinate -> Color
  ○ e.g. Gradients, Bitmap Fill
● SkColorFilter
  ○ Color -> Color
  ○ e.g. Color Matrix, Blend with constant Color
● SkImageFilter
  ○ Src Image -> New Src Image
  ○ e.g. Color Blur, Morphology Filter
  ○ Subsume SkColorFilter?
● SkXfermode
  ○ AKA Blend
  ○ Src Color + Dst Color -> New Dst Color
  ○ e.g. Porter-Duff modes, Darken, …
Fixed Function
● Stroking (width, caps, joins)
● Text settings (typeface, pt size, …)
● AA enable/disable
● Image filtering quality level
● Alpha
● Default color if no SkShader
GPU Shaders
GPU Backend has an “effect” system for building shaders
● Effects arranged in linear order.
● Write a snippet of GLSL fragment code.
● Effect passes a vec4 “color” to the next effect.
  ○ Input to first effect is either constant or per-vertex value.
● Can insert uniforms, functions, textures.
● Internal effects can
  ○ Insert vertex shader code.
  ○ Require additional vertex attributes.
[Diagram: Initial Color -> Color Effect 1 -> Color Effect 2 -> final color; Initial Coverage -> Cov. Effect 1 -> Cov. Effect 2 -> Cov. Effect 3 -> final coverage; “texture” and “matrix uniform” shown as effect inputs]
Important to keep color and fractional coverage separate.
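One way to see why the separation matters: coverage should weight the result of the blend against the untouched destination, and folding it into the source color only happens to give the same answer for some blend modes. The sketch below uses premultiplied RGBA tuples and hypothetical helper names; it is an illustration, not Skia's implementation.

```python
def src_over(src, dst):
    """Porter-Duff src-over on premultiplied RGBA."""
    a = src[3]
    return tuple(s + d * (1 - a) for s, d in zip(src, dst))

def src_mode(src, dst):
    """Porter-Duff src: the source replaces the destination."""
    return src

def apply_with_coverage(blend, src, dst, coverage):
    """Keep coverage separate: lerp between dst and the blended result."""
    blended = blend(src, dst)
    return tuple(coverage * b + (1 - coverage) * d for b, d in zip(blended, dst))

def fold_coverage_into_color(blend, src, dst, coverage):
    """Shortcut: scale the source by coverage, then blend."""
    scaled = tuple(coverage * s for s in src)
    return blend(scaled, dst)

red, blue = (1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 1.0, 1.0)
for mode in (src_over, src_mode):
    print(mode.__name__,
          apply_with_coverage(mode, red, blue, 0.5),
          fold_coverage_into_color(mode, red, blue, 0.5))
```

For src-over the two approaches agree, but for src mode the shortcut discards the destination under a half-covered edge pixel.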
Pipeline Stages and GPU Backend
● SkPathEffect
  ○ Perform on CPU
  ○ Call filterPath(), draw the resulting path
  ○ Special hooks for some dashing cases
  ○ Future: general mechanism to avoid creating intermediate path object on CPU
● SkRasterizer: ignored
  ○ No known clients use custom rasterizers.
  ○ Act as though no rasterizer installed
● SkMaskFilter:
  ○ Filter object is given a GPU “context object” and primitive’s mask
    ■ Can create intermediate textures
    ■ Performs draws using Effects
    ■ Returns new mask as a texture.
  ○ Special case for filters that can be performed inline with the draw to dst
  ○ In practice the only significant SkMaskFilter is blur
  ○ Future: Specialize blur code path for simple primitive types (e.g. rects)
Pipeline Stages and GPU Backend Continued
● SkShader
  ○ Produces an Effect object that is inserted into the draw
  ○ Implementations for bitmap shaders, various gradient types, noise shader.
● SkColorFilter:
  ○ Produces an effect that receives SkShader effect’s output.
  ○ Implementations for color matrix, color table, blend-against-const-color
● SkImageFilter:
  ○ Works the same way as SkMaskFilter but with color input/output
  ○ Implementations for
    ■ Color blur
    ■ Lighting effect
    ■ Any (color filter, shader, or xfermode) as an image filter
  ○ Graph implementation for chaining SkImageFilters together (CPU or GPU)
    ■ SVG image filter DAG
    ■ Future: Optimization pass to minimize intermediate draws.
  ○ Shortcuts for Image filters that can be done inline or are really just a matrix.
Pipeline Stages and GPU Backend Continued
● SkXfermode: Either as GL coefficients or Effect
  ○ The Porter-Duff blend modes (src-over, etc) are all expressible as GL blend coeffs
    ■ Big caveat here
  ○ Many others are not:
    ■ Luminance
    ■ Darken
    ■ Arithmetic
    ■ …
  ○ Xfermode can install an Effect
    ■ Access to the destination?
      ● Effect framework provides abstract interface for accessing the dst color
      ● GL_EXT_shader_framebuffer_fetch if available
      ● Future: GL_NV_texture_barrier
      ● Otherwise a dst-copy-to-texture is triggered
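As an illustration of what "expressible as GL blend coeffs" means, the sketch below lists the usual Porter-Duff source and destination factors for premultiplied colors, named after the corresponding `glBlendFunc` enums, and evaluates them on the CPU. The table and helper names are an illustrative reconstruction, not Skia's internal representation.

```python
# (source factor, destination factor) per mode, for premultiplied RGBA.
COEFFS = {
    "clear":    ("GL_ZERO", "GL_ZERO"),
    "src":      ("GL_ONE", "GL_ZERO"),
    "dst":      ("GL_ZERO", "GL_ONE"),
    "src-over": ("GL_ONE", "GL_ONE_MINUS_SRC_ALPHA"),
    "dst-over": ("GL_ONE_MINUS_DST_ALPHA", "GL_ONE"),
    "src-in":   ("GL_DST_ALPHA", "GL_ZERO"),
    "dst-in":   ("GL_ZERO", "GL_SRC_ALPHA"),
    "src-out":  ("GL_ONE_MINUS_DST_ALPHA", "GL_ZERO"),
    "dst-out":  ("GL_ZERO", "GL_ONE_MINUS_SRC_ALPHA"),
    "src-atop": ("GL_DST_ALPHA", "GL_ONE_MINUS_SRC_ALPHA"),
    "dst-atop": ("GL_ONE_MINUS_DST_ALPHA", "GL_SRC_ALPHA"),
    "xor":      ("GL_ONE_MINUS_DST_ALPHA", "GL_ONE_MINUS_SRC_ALPHA"),
}

def factor(name, src, dst):
    """Numeric value of a blend factor for the given src/dst colors."""
    sa, da = src[3], dst[3]
    return {
        "GL_ZERO": 0.0, "GL_ONE": 1.0,
        "GL_SRC_ALPHA": sa, "GL_ONE_MINUS_SRC_ALPHA": 1.0 - sa,
        "GL_DST_ALPHA": da, "GL_ONE_MINUS_DST_ALPHA": 1.0 - da,
    }[name]

def blend(mode, src, dst):
    """result = src * src_factor + dst * dst_factor, per channel."""
    sf, df = (factor(n, src, dst) for n in COEFFS[mode])
    return tuple(sf * s + df * d for s, d in zip(src, dst))

half_red, blue = (0.5, 0.0, 0.0, 0.5), (0.0, 0.0, 1.0, 1.0)
print(blend("src-over", half_red, blue))
print(blend("dst-in", half_red, blue))
```

Modes such as Darken need a per-channel comparison of the two colors, which a pair of linear factors cannot express; that is why they require an Effect with access to the destination color.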
Primitives: Text
● Skia sits on top of system font engine:
  ○ FreeType
  ○ CoreText
  ○ GDI
  ○ DirectWrite
● Large ALPHA8 texture used as glyph mask atlas (1024 x 2048)
  ○ Will use a second RGB(A) texture if there are “LCD” glyphs
  ○ Texture divided into 256x256 texel “plots”
● Strike: A unique combination of
  ○ Typeface
  ○ Size
  ○ Style (italic, bold, …)
● Strikes claim (multiple) plots
● Plots purged wholesale using LRU
[Atlas diagram: plots labeled by owning strike]
Strike 1 | Strike 0 | Strike 2 | Strike 0
Strike 1 | Strike 3 | Strike 3 | Strike 2
Strike 3 | Strike 3 | Strike 1 | Strike 0
Strike 3 | (free)   | Strike 2 | Strike 2
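The plot ownership and wholesale LRU purge can be sketched as follows. The class and method names (`PlotAtlas`, `claim`, `touch`) are hypothetical, and the sketch tracks only which strike owns each plot, not the glyphs inside it.

```python
from collections import OrderedDict

class PlotAtlas:
    """Atlas divided into square plots; each plot is owned by one strike."""

    def __init__(self, width=1024, height=2048, plot=256):
        self.free = [(x, y)
                     for y in range(0, height, plot)
                     for x in range(0, width, plot)]
        self.owner = OrderedDict()  # plot origin -> strike, LRU first

    def claim(self, strike):
        """Give `strike` a plot, purging the least recently used if none is free.

        Returns (plot origin, strike that was purged or None).
        """
        purged = None
        if self.free:
            origin = self.free.pop(0)
        else:
            origin, purged = self.owner.popitem(last=False)
        self.owner[origin] = strike
        return origin, purged

    def touch(self, origin):
        """Mark a plot as recently used, e.g. when one of its glyphs is drawn."""
        self.owner.move_to_end(origin)

atlas = PlotAtlas()
print(len(atlas.free))  # number of plots in a 1024 x 2048 atlas
```

Purging a whole plot at once keeps the bookkeeping simple: every glyph in the purged plot is invalidated together, and the strike re-uploads glyphs on demand.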
Primitives: Text Continued
● Glyphs are packed into plots using the Skyline algorithm [Jukka Jylänki http://clb.demon.fi/]
● Attempt to perform all uploads for a frame before draws
  ○ Queue GL draws
  ○ Uploads go through immediately
● Avoid flushing draws
  ○ Only flush draws to GL when a plot is purged that is referenced in currently queued draws
  ○ Matters a lot more on mobile, especially tiled architectures
● Works pretty well for scrolling
● Struggles with pinch-zoom
● Under development: distance field atlas
  ○ Same texture partitioning and replacement scheme
  ○ “Masks” are (mostly) resolution independent
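For readers unfamiliar with skyline packing, the sketch below shows a simplified bottom-left variant: the packer tracks the top outline of everything placed so far and drops each new rectangle at the lowest position where it fits. It is an assumption-laden illustration (no segment merging, no waste tracking) and not the implementation used by Skia or described by Jylänki.

```python
class SkylinePacker:
    """Simplified bottom-left skyline rectangle packer."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.skyline = [(0, 0, width)]  # (x, y, w) segments, left to right

    def _fit(self, index, w, h):
        """Resting y for a w x h rect whose left edge starts at segment `index`."""
        x = self.skyline[index][0]
        if x + w > self.width:
            return None
        y, remaining, i = 0, w, index
        while remaining > 0:
            y = max(y, self.skyline[i][1])
            remaining -= self.skyline[i][2]
            i += 1
        return y if y + h <= self.height else None

    def _raise(self, index, x, top, w):
        """Insert a new segment at height `top` and trim what it covers."""
        self.skyline.insert(index, (x, top, w))
        end, i = x + w, index + 1
        while i < len(self.skyline):
            sx, sy, sw = self.skyline[i]
            if sx >= end:
                break
            if sx + sw <= end:
                del self.skyline[i]  # fully covered by the new segment
            else:
                self.skyline[i] = (end, sy, sx + sw - end)  # partly covered
                break

    def add(self, w, h):
        """Place a rect; return its (x, y) or None if it does not fit."""
        best = None
        for i in range(len(self.skyline)):
            y = self._fit(i, w, h)
            if y is not None and (best is None or y < best[1]):
                best = (i, y)
        if best is None:
            return None
        i, y = best
        x = self.skyline[i][0]
        self._raise(i, x, y + h, w)
        return x, y

packer = SkylinePacker(256, 256)
print([packer.add(w, h) for (w, h) in [(20, 30), (12, 30), (40, 18)]])
```

A `None` result is the signal that the plot is full, at which point the strike would claim another plot.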
Primitives: Rects
Not anti-aliased: Simple, draw a quad!
Two approaches for anti-aliasing (non-MSAA):
● Geometric
  ○ Create inner and outer offset geometry
  ○ Offset is 0.5 pixels
  ○ Use “coverage” vertex attribute
    ■ 0 at outer offset rect
    ■ 1 at inner offset rect
  ○ Handle degenerate cases
● Shader
  ○ Attributes:
    ■ W = rect.width() + 0.5, H = rect.height() + 0.5
    ■ Y = normalized y-axis of rect
    ■ C = center of rect
  ○ coverage in Y at pixel p is clamp(H - ((p - C) dot Y), 0, 1)
Geometry shaders could reduce VBO size and save CPU cycles
[Diagram labels: rect with center C, axis Y, extents W and H, pixel p; coverage c=0 and c=1]
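The shader approach can be tried out on the CPU. Note the assumptions: the slide defines W and H from the full width and height, while this sketch uses half extents plus 0.5 and the absolute distance from the center, because that is the form under which the clamp yields box-filter coverage for a rect centered at C. Treat that reading, and the function name, as assumptions rather than the talk's exact formula.

```python
def clamp(v, lo, hi):
    return max(lo, min(hi, v))

def rect_coverage(p, center, y_axis, width, height):
    """Approximate coverage of pixel center `p` by an oriented rect.

    y_axis is the unit vector along the rect's height.
    Assumption: extents are half sizes plus 0.5, distances are absolute.
    """
    half_w = width / 2 + 0.5
    half_h = height / 2 + 0.5
    x_axis = (y_axis[1], -y_axis[0])  # perpendicular to the rect's y axis
    dx, dy = p[0] - center[0], p[1] - center[1]
    dist_y = abs(dx * y_axis[0] + dy * y_axis[1])
    dist_x = abs(dx * x_axis[0] + dy * x_axis[1])
    cov_y = clamp(half_h - dist_y, 0.0, 1.0)
    cov_x = clamp(half_w - dist_x, 0.0, 1.0)
    return cov_x * cov_y

# Axis-aligned rect spanning y = 3.75 .. 6.25; the pixel with center (5.5, 6.5)
# spans y = 6 .. 7 and is a quarter covered.
print(rect_coverage((5.5, 6.5), (5.0, 5.0), (0.0, 1.0), 4.0, 2.5))
```

The product of the two per-axis terms is exact for axis-aligned edges and only an approximation at corners and for rotated rects.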
Primitives: Misc
Adaptations for stroked rectangles
Similar shader techniques for:
● Ellipses
● Circles
● Rounded-Rectangles
Primitives: Paths
Can’t anti-alias interior edge!
Can’t double blend in overlap!
Multiple edges from different contours relevant to pixels in concavities!
● Why are paths hard?
  ○ In the most general case we have to handle both the fill rule and anti-aliasing
  ○ After a blend, the coverage/alpha distinction is lost. Must only perform one blend in general.
Primitives: Paths Continued
● MSAA solves the AA problem
● Use the stencil to solve the fill rule problem
● Tessellate contours into line segments
● Pass 1:
  ○ Draw the tessellated contours as triangle fan
  ○ Disable color writes
  ○ Stencil op: +1 for front face, -1 for back face
● Pass 2:
  ○ Draw bounding geometry
  ○ Enable color writes
  ○ Stencil func
    ■ Winding: Pass if stencil is non-zero
    ■ Even/Odd: Pass if LSB is 1
● Avoid tessellating quadratic and cubic beziers:
  ○ Discard in FS if outside the curve [Kokojima et al.]
  ○ Need per sample discard or sample coverage mask
  ○ No-go on ES3 :(
[Figure: Pass 1 shows fan triangles contributing +1, -1 and +1 to the stencil; Pass 2 shows the cover draw]
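The two passes can be simulated on the CPU for a single sample point. The sketch below fans triangles from the first contour vertex, adds +1 or -1 depending on the triangle's facing, and then applies the stencil test. Treating counter-clockwise triangles as front-facing, and all function names, are assumptions for illustration.

```python
def signed_area2(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])

def contains(tri, p):
    """True if p lies inside or on the triangle, for either winding."""
    a, b, c = tri
    d = (signed_area2(a, b, p), signed_area2(b, c, p), signed_area2(c, a, p))
    return not (any(v < 0 for v in d) and any(v > 0 for v in d))

def stencil_pass(contour, p):
    """Pass 1: fan from contour[0]; +1 per front face, -1 per back face."""
    stencil = 0
    anchor = contour[0]
    for i in range(1, len(contour) - 1):
        tri = (anchor, contour[i], contour[i + 1])
        area = signed_area2(*tri)
        if area != 0 and contains(tri, p):
            stencil += 1 if area > 0 else -1
    return stencil

def cover_pass(stencil, rule):
    """Pass 2: stencil test applied while drawing the bounding geometry."""
    return stencil != 0 if rule == "winding" else (stencil & 1) == 1

# Concave contour with a notch whose tip is at (10, 5).
contour = [(0, 0), (20, 0), (20, 20), (10, 5), (0, 20)]
for p in [(15, 5), (15, 14), (10, 15)]:
    print(p, stencil_pass(contour, p), cover_pass(stencil_pass(contour, p), "winding"))
```

The point (15, 14) lies in the notch: a front-facing and a back-facing fan triangle both cover it, so the stencil contributions cancel and the cover pass rejects it.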
Primitives: Paths Continued
For AA paths without MSAA:
● Detect if path is one of the other primitive types (e.g. rounded rectangle)
● If very thin stroke, draw as AA lines (and ignore double blend problem)
● If path is convex, fill rule problem goes away
  ○ Fan the on-contour control points
  ○ Draw bounding hulls of curves
  ○ Compute coverage using implicit eq. approx distance to curve [Loop-Blinn]
● Otherwise, SW rasterize mask and upload
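The implicit-equation idea for a quadratic can be sketched as follows. In the Loop-Blinn formulation the control points map to (u, v) = (0, 0), (0.5, 0) and (1, 1), the curve is the zero set of f = u² - v, and f divided by the length of its screen-space gradient approximates the signed distance in pixels. How that distance turns into coverage (a clamp of 0.5 minus the distance here) is an assumption of this sketch, as are the names.

```python
import math

def area2(a, b, c):
    """Twice the signed area of triangle abc."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])

def quad_coverage(p, p0, p1, p2):
    """Approximate AA coverage at p on the chord side of a quadratic Bezier."""
    denom = area2(p0, p1, p2)  # must be non-zero (non-degenerate curve)
    # Barycentric weights of p with respect to p1 and p2.
    w1 = area2(p0, p, p2) / denom
    w2 = area2(p0, p1, p) / denom
    u, v = 0.5 * w1 + w2, w2
    # Constant screen-space gradients of the weights (the mapping is affine).
    g1 = ((p2[1] - p0[1]) / denom, -(p2[0] - p0[0]) / denom)
    g2 = (-(p1[1] - p0[1]) / denom, (p1[0] - p0[0]) / denom)
    gu = (0.5 * g1[0] + g2[0], 0.5 * g1[1] + g2[1])
    f = u * u - v  # negative on the chord side, zero on the curve
    grad_f = (2 * u * gu[0] - g2[0], 2 * u * gu[1] - g2[1])
    # First-order distance estimate; assumes the gradient is non-zero at p.
    distance = f / math.hypot(*grad_f)
    return max(0.0, min(1.0, 0.5 - distance))

p0, p1, p2 = (0.0, 0.0), (5.0, 10.0), (10.0, 0.0)
for p in [(5.0, 2.0), (5.0, 5.0), (5.0, 8.0)]:
    print(p, quad_coverage(p, p0, p1, p2))
```

In a fragment shader the gradient would typically come from screen-space derivative instructions rather than being computed analytically as done here.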
Questions?