The presentation explains the need for open source graphics drivers in the mobile space. Unlike the PC, the mobile system is more dynamic and has a broader innovation space. Will Android push this among the IP vendors? More information at www.soulbuzz.net
MOBILE GRAPHICS
The need for Open Source Drivers & Stack
Harsha Padmanabha
http://soulbuzz.net
TOPICS
• Mobile Graphics : An Introduction
• Application Processors : Missing datasheets
• Mobile graphics pipeline & cores
• A developer's perspective on Graphics & Games
• Case Study : Developing an Android based product
TYPICAL MOBILE SOC
[Block diagram: an ARM core and the blocks below, connected by a bus interconnect]
• Display: Post Processing, De-Interlacing
• Video: MPEG-1/2/4, H.264, VC-1
• Graphics: OpenGL ES 1.1, OpenGL ES 2.0
• Audio: MP3, AAC, 3GPP
• Communication: LTE, 3G, Wifi, Bluetooth
• OS: Android, Linux, MeeGo, Symbian, Windows Mobile
IPHONE : A4 PROCESSOR
• A4 Processor
  • Main processor for the Apple iPad
  • 1GHz ARM Cortex A8 45nm core
  • NEON SIMD
  • Cache size:
    • L1I$=32KB
    • L1D$=32KB
    • L2=512KB
  • Graphics engine:
    • PowerVR SGX 3D engine from Imagination Technologies
  • Video Engine
    • PowerVR VXD / VXE : MultiProfile video encode/decode
  • 53mm² on 45nm LP
[Figure: die photos with labelled blocks: SGX 535, ARM A8, VXD]
www.ubmtechinsights.com/.../Apple%20A4%20vs%20SEC%20S5PC110A01.pdf
NVIDIA TEGRA 250
• 8 separate cores
• Dual ARM Cortex A9
• ARM9 300MHz
• 1 HD Encode, 1 Decode
• OGLES 2.0 Graphics core
• Image Processor, for HD Camera
• Lots of internal memories
[Figure: die photo with labelled blocks: Dual ARM A9, HD Decode, HD Encode, 2D/3D GPU, ARM 9]
http://www.anandtech.com/show/2911
MOBILE OS MARKET SHARE
OS               Share
Symbian          41%
RIM              18%
Android          17%
iOS              14%
Windows Mobile   5%
Linux            2%
Others           2%
http://www.gartner.com/it/page.jsp?id=1421013
TOP APPLICATIONS
RANK CATEGORIES
1 Games
2 Social Networking
3 Mail & Messaging
4 Music
5 Entertainment
6 Weather
7 Sports
8 Education & Employment
9 News
10 Health & Fitness
http://www.millennialmedia.com/wp-content/images/mobilemix/MM-MobileMix-Oct2.pdf
TODAY’S GRAPHICS PERFORMANCE
Processor                GPU               MHz   MTri/sec   MPixels/sec
Texas Instruments OMAP   PowerVR SGX 535   130   28         500*
ST Micro                 ARM MALI 400      275   30         400*
Marvell ARMADA           Vivante GC 800    250   25         375*
References
http://arm.com/products/multimedia/mali-graphics-hardware/mali-400-mp.php
http://www.vivantecorp.com/p_mvr.html#GC800
* Estimates as per marketing data
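To put the peak rates above into per-frame terms, the following is a rough sketch that divides the quoted marketing figures by a frame rate. The 60 fps target and the 800x480 screen are assumptions made for illustration, not figures from the table, and real workloads will land well below these peaks.

```python
# Per-frame budgets derived from the quoted peak rates.
# FPS and SCREEN_PIXELS are assumptions, not values from the table.
FPS = 60
SCREEN_PIXELS = 800 * 480

# (GPU, million triangles/sec, million pixels/sec) as quoted in the table
GPUS = [
    ("PowerVR SGX 535", 28, 500),
    ("ARM MALI 400", 30, 400),
    ("Vivante GC 800", 25, 375),
]

def frame_budget(mtri_per_sec, mpix_per_sec, fps=FPS, screen_pixels=SCREEN_PIXELS):
    """Return (triangles per frame, number of times the screen can be filled per frame)."""
    triangles = mtri_per_sec * 1_000_000 / fps
    overdraw = mpix_per_sec * 1_000_000 / (screen_pixels * fps)
    return triangles, overdraw

for name, mtri, mpix in GPUS:
    tris, overdraw = frame_budget(mtri, mpix)
    print(f"{name}: {tris:,.0f} triangles/frame, {overdraw:.1f}x screen fill")
```

The second number is a fill-rate ceiling: it bounds how many layers of overdraw a scene could afford at the assumed resolution before the GPU becomes pixel bound.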
VISUAL COMPUTING
Embedded 3D
3D Digital Asset Exchange format
High-level Enhanced Audio
Vector 2D
Inter-API Interoperability Hub
Mobile OS resource abstraction
Heterogeneous Parallel Programming
Plugin-free 3D Web Content
Streaming Media and Image Processing
High-level Streaming Media Recording and Playback
Codec Creation
Window System Acceleration
Authoring & Accessibility
Multimedia Frameworks
Acceleration
System Integration
All logos & trademarks are copyright www.khronos.org
MOBILE GRAPHICS API
• OpenGL ES 1.1
  • Fixed-function API derived from OpenGL 1.5
• OpenGL ES 2.0
  • Fully programmable API, GLSL support derived from OpenGL 2.0
• OpenGL ES 3.0*
  • Future release planned for 2011/12, cross-compatible with OpenGL 4.0
• EGL 1.4
  • Visual API interoperability hub
• OpenCL 1.1
  • Open compute, supports GPGPU on multicore architectures
OPENGL ES VERSIONS
• OpenGL ES 1.1 – fixed-function pipeline
  • Based on OpenGL 1.5
  • Vertex Arrays / Buffer Objects
  • Transform & Lighting
  • Multi-texturing (min 2 units)
  • Fixed-point & Floating-point profiles
• OpenGL ES 2.0 – programmable pipeline
  • Based on OpenGL 2.0
  • Adds vertex and fragment shader programming
  • Removes fixed function pipeline
  • Super-compact, efficient API
  • High level language (GLSL ES)
  • On-line or off-line compilation
Images & text Copyright Khronos.org
Images Copyright Rightware
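The fixed-point profile mentioned for OpenGL ES 1.1 passes values in signed 16.16 format (the GLfixed type), which lets devices without a floating-point unit drive the API. The sketch below illustrates that number format only; the helper names are illustrative and are not part of any OpenGL ES header.

```python
# Signed 16.16 fixed point: 16 integer bits, 16 fractional bits.
# Helper names are hypothetical; only the number format is from OpenGL ES 1.x.
FRACTION_BITS = 16
ONE = 1 << FRACTION_BITS  # the value 1.0 in 16.16

def to_fixed(value):
    """Convert a float to 16.16 fixed point, rounding to the nearest step."""
    return int(round(value * ONE))

def to_float(fixed):
    """Convert a 16.16 fixed-point integer back to a float."""
    return fixed / ONE

def fixed_mul(a, b):
    """Multiply two 16.16 values; the wide product is shifted back to 16.16."""
    return (a * b) >> FRACTION_BITS

half = to_fixed(0.5)
print(hex(half), to_float(fixed_mul(half, to_fixed(3.0))))
```

The smallest representable step is 1/65536, which is why fixed-point paths trade range and precision for speed on integer-only cores.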
OPENGL ES 2.0 PIPELINE
[Pipeline diagram; labelled blocks: API, Vertex Buffer Objects, Primitive Processing, Vertex Shader, Fragment Shader, Alpha Test, Depth Test, Colour Buffer Blend, Dither, Framebuffer. Input primitives: Triangles/Lines/Points]
PROGRAMMABLE HARDWARE
[GPU block diagram; labelled blocks: four Shader Cores, two Texture Cache / Fetch units, Input Assembly, Scheduler, Rasterizer, Alpha / Depth Test, Output Blend]
DIFFUSE SHADER
<diffuseShader>:
sample r0, v4, t0, s0
mul r3, v0, cb0[0]
madd r3, v1, cb0[1], r3
madd r3, v2, cb0[2], r3
clmp r3, r3, l(0.0), l(1.0)
mul o0, r0, r3
mul o1, r1, r3
mul o2, r2, r3
mov o3, l(1.0)

precision mediump float;
uniform sampler2D my_color_texture;
// Interpolated inputs from the vertex shader (declarations missing on the slide)
varying vec3 normal;
varying vec3 vertex_to_light_vector;
void main() {
    // Defining The Material Colors
    const vec4 AmbientColor = vec4(0.1, 0.0, 0.0, 1.0); // assumed value, not given on the slide
    const vec4 DiffuseColor = vec4(1.0, 0.0, 0.0, 1.0);
    // Scaling The Input Vector To Length 1
    vec3 normalized_normal = normalize(normal);
    vec3 normalized_vertex_to_light_vector = normalize(vertex_to_light_vector);
    // Calculating The Diffuse Term And Clamping It To [0;1]
    float DiffuseTerm = clamp(dot(normalized_normal, normalized_vertex_to_light_vector), 0.0, 1.0);
    // Calculating The Final Color
    gl_FragColor = AmbientColor + DiffuseColor * DiffuseTerm;
}

[Figure labels: Shader compiler; Input Fragment -> Output Shaded Fragment]
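The arithmetic the fragment shader performs per fragment can be checked on the CPU. The following is a minimal sketch of the same diffuse term in plain Python; the ambient colour is an assumed value, since the slide does not give one, and the function names are illustrative.

```python
import math

def normalize(v):
    """Scale a 3-vector to length 1."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def diffuse_color(normal, to_light,
                  ambient=(0.1, 0.0, 0.0, 1.0),   # assumed, not from the slide
                  diffuse=(1.0, 0.0, 0.0, 1.0)):
    """Ambient + diffuse * clamp(N.L, 0, 1), per colour channel."""
    n = normalize(normal)
    l = normalize(to_light)
    term = clamp(dot(n, l), 0.0, 1.0)
    return tuple(a + d * term for a, d in zip(ambient, diffuse))

# Light at 45 degrees to the surface normal
print(diffuse_color((0.0, 0.0, 1.0), (1.0, 0.0, 1.0)))
```

Normalizing both vectors before the dot product matters: interpolated varyings are generally not unit length, so skipping it scales the diffuse term by the vector lengths.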
THE DRIVER ARCHITECTURE
[Layer diagram spanning User Space and Kernel Space; blocks: OpenGL ES, EGL, Shader Compiler, L2-Stack, GPU Driver, FB Driver, Hardware [ GPU + Display Controller ]]
KERNEL DRIVERS
• Device State & Information
• Memory Management
• Register Allocation
• IRQ handling
• Performance Counters
• Framebuffer Access
USER SPACE DRIVER
• Based on OpenGL ES state trackers
• Consider OpenGL ES as a big state machine
• Sometimes contains essential algorithms for GPU manipulation
• Use of the host CPU for setup
  • Tessellation, for example
• Emulates calls not supported by the GPU
• Just-In-Time shader compiler
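The state-machine view above can be illustrated with a toy state tracker. This is a sketch under stated assumptions, not how any particular vendor driver is built: the class, the command tuples and the call names are all hypothetical. It shows two of the jobs listed on the slide, filtering redundant state changes and routing unsupported calls to the host CPU.

```python
class StateTracker:
    """Toy user-space state tracker (hypothetical design).

    Caches API state, drops redundant changes, and marks calls the GPU
    does not support as emulated on the host CPU.
    """

    def __init__(self, gpu_supported_calls):
        self.state = {}            # current API state, e.g. {"blend": True}
        self.command_buffer = []   # commands that would go to the kernel driver
        self.gpu_supported = set(gpu_supported_calls)

    def set_state(self, name, value):
        """Record a state change; return False if it was redundant."""
        if name in self.state and self.state[name] == value:
            return False
        self.state[name] = value
        self.command_buffer.append(("SET", name, value))
        return True

    def call(self, name, *args):
        """Queue a call for the GPU, or flag it for CPU emulation."""
        target = "GPU" if name in self.gpu_supported else "CPU_EMULATED"
        self.command_buffer.append((target, name, args))


tracker = StateTracker({"draw_arrays"})
tracker.set_state("blend", True)
tracker.set_state("blend", True)      # redundant, filtered out
tracker.call("draw_arrays", 3)
tracker.call("tessellate", 8)         # not supported by this toy GPU
print(tracker.command_buffer)
```

Because this logic lives in user space, it is also where much of the hidden CPU cost of a graphics call is spent.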
GRAPHICS PC EMULATOR
[Layer diagram spanning User Space and Kernel Space; blocks: OpenGL ES, EGL, Shader Compiler, MESA GL / Proprietary OGL, GLSL, GPU Driver, FB Driver, Hardware [ GPU + Display Controller ]]
MEEGO QEMU EMULATOR
[Block diagram with regions QEMU Host and Client OS; blocks: Application, LibGL Client, X Server, Kernel, Virtual I/O Driver, Virtual I/O Device, Process Management, LibGL Server Stub, Offscreen Buffer, Frame Buffer]
http://conference2010.meego.com/sites/all/files/sessions/opengl_acceleration_in_meegosimulatorandemulator.pdf
THE EMULATOR DRAWBACKS
• Does not emulate all OpenGL ES functions
• OpenGL ES is usually mapped to the host OS's OpenGL calls
• Shader compilers are not optimized
• Too many SW layers, consider QEMU as well
• Rendering is not pixel accurate
• Most emulators don’t support texture compression formats
• Texture bandwidth & memory are not accurately depicted
• On Intel integrated GPUs rendering is mostly SW
• Only recently have Nvidia & AMD started supporting OGLES 2
• Open questions: performance counters, PBuffers, depth size variations
ANDROID GRAPHICS ARCHITECTURE
[Layer diagram spanning User Space and Kernel Space; blocks: Android JSR, JNI Wrapper (x2), SKIA, libSkiaHW, Surface Flinger, EGL, OpenGL ES, libagl, SW renderer, copyBLIT, ARM + NEON, HAL, Gralloc, GPU KDrv, FBDev]
MEEGO GRAPHICS ARCHITECTURE
[Layer diagram spanning User Space and Kernel Space; blocks: QML + API, QT Paint Engine, QT OpenGL Wrapper, QScreen, X Server (optional), OpenGL ES 2.0, OpenGL ES 1.1, EGL, GPU KDrv, FBDev]
A PLATFORM SCENARIO
• Consider a SoC with ARM Cortex A9 + SGX535 GPU
• And an Intel Atom Z6xx with x86 + SGX535
• As a system integrator you want to “sell products”
• Simpler to maintain a common set of drivers
• But GPU drivers are an issue
  • Android userspace drivers are compiled against bionic libc
  • MeeGo, for example, uses glibc and a generic Linux stack
  • DirectX support only on Intel SGX, not on TI OMAP
  • Symbol & linker errors
  • PC development uses an emulation layer with PC graphics
  • Performance on the same GPU varies
MOBILE GRAPHICS BIG PICTURE
OS        PC Development                                Device
Android   QEMU ARM Emulation; OGLES Based on AMD        PowerVR SGX; Qualcomm AMD; Nvidia Tegra
iPhone    Simulator, x86 code; OpenGL -> OGLES          PowerVR MBX; PowerVR SGX
MeeGo     QEMU ARM Emulation; X11, OGLES Passthrough    PowerVR SGX; ARM MALI
GPU IP             Emulators                              Emulator Features               API Support
Imagination Tech   Win32 - OpenGL; Linux - Mesa/OpenGL    No PVRTC support                OpenGL ES 2.0
ARM MALI           Win32 - OpenGL; Linux - Mesa/OpenGL    -                               OpenGL ES 2.0
Nvidia Tegra       Win32 - OpenGL ES                      No Antialiasing; No ETC1        OpenGL ES 2.0
Vivante            Win32 - OpenGL                         Basic                           OpenGL ES 2.0
Qualcomm*          Win32 - OpenGL; Android - OpenGL       Supports performance counters   OpenGL ES 2.0
THE DEVELOPER
• OpenGL ES is a standard; conformance guarantees the implementation & functionality
• Fragmentation: a lot of players
• Khronos does not specify
  • Texture compression formats (ETC, ETC2, S3TC)
  • Non-power-of-2 textures, wrapping & mip-mapping
  • Shader Application Binary Interface
  • Depth size, which is not constant: 24-bit on PVR, 16-bit on Tegra
  • Font rendering, tessellation of geometry
• Although FSAA has become a standard, there is no API to turn off the AA feature
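Why the depth size difference matters to a developer can be estimated with a short calculation. The sketch below assumes a standard perspective projection with an integer depth buffer, where window depth is d(z) = (1/near - 1/z) / (1/near - 1/far); the near, far and distance values are made-up scene parameters, not figures from the slides.

```python
def depth_resolution(z, near, far, bits):
    """Approximate smallest eye-space depth difference resolvable at distance z.

    One depth-buffer step is 1 / (2**bits - 1) in window depth. Dividing by
    d'(z) = near*far / ((far - near) * z**2) converts it to eye-space units.
    """
    steps = (1 << bits) - 1
    return (z * z) * (far - near) / (near * far) / steps

# Assumed scene: near plane 0.1, far plane 100, object at distance 50
for bits in (16, 24):
    print(bits, "bit:", depth_resolution(50.0, 0.1, 100.0, bits))
```

With these assumed values the 16-bit buffer resolves roughly 256 times more coarsely than the 24-bit one, so content tuned on one GPU can show depth fighting on another unless the near and far planes are chosen conservatively.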
WHAT'S HIDDEN IN SW
SoC                 OMAP 3730
SoC Process         45nm
OS                  Android 2.2
Mobile DDR2         256MBytes
Screen Resolution   800x480
CPU                 ARM Cortex A8
DSP                 C64x TI DSP
GPU                 SGX 530

Benchmark        ARM 600 MHz, GPU 200 MHz   ARM 800 MHz, GPU 200 MHz
Draw Image       16.0                       21.0
Utah TeaPot      30.33                      39.19
OGLES Fog        110.71                     129.78
OGLES Blending   105.43                     118.92
http://code.google.com/p/0xbench/wiki/Benchmarks
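The benchmark figures above change with the ARM clock even though the GPU clock stays at 200 MHz. A quick way to see how much of each test tracks the CPU is to compare the relative score gain with the relative CPU clock gain; the sketch below does only that arithmetic on the numbers from the table, and the interpretation is a rough indication rather than a profile.

```python
CPU_MHZ = (600, 800)   # GPU stays at 200 MHz in both runs

# Scores copied from the benchmark table (600 MHz run, 800 MHz run)
SCORES = {
    "Draw Image": (16.0, 21.0),
    "Utah TeaPot": (30.33, 39.19),
    "OGLES Fog": (110.71, 129.78),
    "OGLES Blending": (105.43, 118.92),
}

def gain(before, after):
    """Relative increase in percent."""
    return (after - before) / before * 100.0

def cpu_scaling():
    """Map each test to (score gain %, score gain / CPU clock gain)."""
    cpu_gain = gain(*CPU_MHZ)
    return {name: (gain(lo, hi), gain(lo, hi) / cpu_gain)
            for name, (lo, hi) in SCORES.items()}

for name, (pct, ratio) in cpu_scaling().items():
    print(f"{name}: +{pct:.1f}% score, {ratio:.2f} of the CPU clock gain")
```

A ratio near 1 suggests the test is limited by work done on the ARM core (driver, setup, software rendering), while a lower ratio suggests the GPU or memory is the tighter limit.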
SUMMARY OF THE ISSUES
• Many APIs ( OpenGL ES, OpenVG, OpenCL, OpenMAX )
• Many OSes ( Linux, Android, MeeGo, Symbian, WinPhone7 )
• Many GPU IPs ( SGX, MALI, TEGRA, Vivante, DMP )
• No binary portability of shader programs
• Extensions differentiate GPUs, but also fragment the platform
• No easy transition between vendors' IPs
• SW/HW partition is not transparent
• Performance of emulators ( Linux, Windows, OS X )
DRIVER ARCHITECTURE
[Layer diagram spanning User Space and Kernel Space, marked Open Source; blocks: OpenGL ES, EGL, Shader Compiler, L2-Stack, GPU Driver, FB Driver, Hardware [ GPU + Display Controller ]]
USER SPACE DRIVER
[Layer diagram: OpenGL ES 2.0, OpenGL ES 2.0 PC Emulation, OpenGL ES 1.1 and OpenCL 1.1 on top of Gallium and LLVM, with LLVM backends for SGX, MALI and PC GPU Emulation]
Source : http://llvm.org , 10-lattner-OpenGL.pdf
HOW DOES IT WORK
• OpenGL ES, OpenCL, OpenMAX are all standard APIs
• Gallium outputs LLVM intermediate representation
• LLVM optimizes this IR
• Each GPU driver has an LLVM backend description
• The LLVM backend describes
  • Instruction Set
  • Registers
  • Constraints
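The idea of one shared IR with per-GPU backend descriptions can be sketched with a toy model. Nothing below is LLVM's or Gallium's API: the `Backend` class, the tuple-based IR and both targets are hypothetical, and the fusion step assumes each temporary is used only once. The point is only that the same IR is lowered differently depending on the instruction set, register count and constraints a backend declares.

```python
from dataclasses import dataclass

@dataclass
class Backend:
    """Hypothetical target description: instruction set, registers, constraints."""
    name: str
    mnemonics: dict        # IR opcode -> target mnemonic
    num_registers: int
    has_fused_madd: bool   # constraint: can mul + add be one instruction?

def fuse_mul_add(ir):
    """Rewrite (mul t,a,b)(add d,t,c) as (madd d,a,b,c); assumes t is single-use."""
    out, i = [], 0
    while i < len(ir):
        cur = ir[i]
        nxt = ir[i + 1] if i + 1 < len(ir) else None
        if cur[0] == "mul" and nxt and nxt[0] == "add" and nxt[2] == cur[1]:
            out.append(("madd", nxt[1], cur[2], cur[3], nxt[3]))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out

def lower(ir, backend):
    """Lower the toy IR to target assembly text for the given backend."""
    if backend.has_fused_madd:
        ir = fuse_mul_add(ir)
    regs = {name for ins in ir for name in ins[1:]}
    if len(regs) > backend.num_registers:
        raise ValueError(f"{backend.name}: needs {len(regs)} registers")
    return [f"{backend.mnemonics[ins[0]]} " + ", ".join(ins[1:]) for ins in ir]

# Two made-up targets sharing one IR
GPU_A = Backend("gpu_a", {"mul": "mul", "add": "add", "madd": "madd"}, 8, True)
CPU_B = Backend("cpu_b", {"mul": "fmul", "add": "fadd"}, 8, False)

IR = [("mul", "t0", "a", "b"), ("add", "r0", "t0", "c")]
print(lower(IR, GPU_A))
print(lower(IR, CPU_B))
```

In a real compiler the register constraint would drive allocation and spilling rather than raise an error; the sketch only shows where such a limit would enter.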
WHAT DOES THIS OFFER
• Portable infrastructure supports all GPUs & CPUs
• Reuse of optimization paths
  • PowerPC ( Altivec )
  • ARM ( Neon )
  • x86 ( SSE )
• Smaller driver, simpler to maintain
• Interface to inject proprietary code as LLVM backend
• Support a common set of graphics functions across platforms
• Support a number of different APIs and bring cohesion
REALITY CHECK
• OpenGL support on x86 without a dedicated graphics card in Leopard (OS X 10.5)
• Step 1: the gcc front end parses the OpenGL C code
• Step 2: the GLSL shader is compiled using clang/LLVM
• Step 3: LLVM IR is produced and further optimized by the LLVM JIT
[Flow diagram: GLSL -> OpenGL Parser -> OpenGL AST -> OpenGL to LLVM -> LLVM IR -> LLVM Optimizer -> LLVM IR -> LLVM JIT. Opcode functions: C Code -> llvm-gcc -> Bytecode -> Disk]
REALITY CHECK
• OpenCL compilation process on Snow Leopard (OS X 10.6)
• Step 1: Compile OCL to LLVM IR (Intermediate Representation)
• Step 2: Compile to target device
  • NVIDIA GPU device compiles the LLVM IR in two steps: LLVM IR to PTX (CUDA IR), then PTX to target GPU
  • CPU device uses the LLVM x86 back end to compile directly to x86 binary code
[Flow diagram: OpenCL Compute Program -> OpenCL front-end [ clang ] -> LLVM IR; the x86 LLVM Back-End produces the X86 Binary; the NVIDIA path goes through PTX IR to G80, G92 and G200 binaries]
http://www.haifux.org/lectures/212/OpenCL_for_Halifux_new.pdf
CAN IT BECOME A REALITY
• Fabless companies should force this on IP vendors
• Projects like Android & MeeGo can help push SoC providers
• Ultimately it's the product companies: Nokia, Motorola etc.
• Developers
• Maybe a commercial FOSS company
MISC REFERENCES IN SLIDES
• [Slide-1] http://www.nobodysplace.nl/pacman.jpg
• ATI RUBY : http://www.pcgameshardware.com/&menu=browser&image_id=999114&article_id=680582&page=1&show=original
• http://cdn.erictric.com/wp-content/uploads/2010/06/samsung-galaxy-s-stock-shot.jpg
• http://media.photobucket.com/image/iPhone4/defskilz14/blog%20pics/torch_iphone4.jpg
• http://s08.idav.ucdavis.edu/fatahalian-gpu-architecture.pdf
Components of OpenGL ES 2.0 simulator
[Diagram: the Application includes gl2.h, gl2ext.h, gl2platform.h, egl.h, eglext.h, eglplatform.h and khrplatform.h, and links against libGLESv2.lib and libEGL.lib. On WinXP (GDI): libEGL.dll and libGLESv2.dll over OpenGL32.dll. On Linux (X11): libEGL.so and libGLESv2.so over xxx_DRI.so. Both paths end at the GPU.]