The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of...

Preview:

Citation preview

The ParLab Stack

Parallel Computing Laboratory

Sarah Bird

May 30, 2013

Bridging the Gap

Parallel Applications

Parallel Hardware

Parallel Software

IT industry Users

Easy to write correct programs that run efficiently on manycore

Integrated Software Stack Personal Health

Image Retrieval

Hearing, Music

Speech Parallel Browser

Motifs/Dwarfs

Sketching

Legacy Code

Schedulers Communication & Synch. Primitives

Efficiency Language Compilers

Legacy OS

Multicore/GPGPU

OS Libraries & Services

RAMP Manycore

Hypervisor

Co

rrec

tnes

s

Composition & Coordination Language (C&CL)

Parallel Libraries

Parallel Frameworks

Static Verification

Dynamic Checking

Debugging with Replay

Directed Testing

Autotuners

C&CL Compiler/Interpreter

Efficiency Languages

Type Systems

Hardware

• RAMP Gold

– Simulation

– 64 cores

– FPGA

RISC-V

Chip

FPGA

RAMP Gold

SPARC PTX

GPGPU

x86

Multicore

• RISC-V – Implementations

written in Chisel – Rocket

• 6 stage in-order

– Hwacha • 64 bit vector core

– FPGA • 2 Rocket

– 45 nm Chip • 1 Rocket, 1 Hwacha • 1 Ghz

• x86

• GPGPU

– Cuda

– OpenGL

Operating Systems

• Akaros

– Cloud OS

RISC-V

Chip

FPGA

RAMP Gold

SPARC PTX

GPGPU

x86

Multicore

• Tessellation

– Client OS

– Space-Time Partitioning

– Two-Level Scheduling

– QoS to Applications

– PACORA

• Linux

Akaros Tessellation Linux

Schedulers

• Lithe – Compose Parallel Runtimes

– Thread Building Blocks

– Open MP

RISC-V

Chip

FPGA

RAMP Gold

SPARC PTX

GPGPU

x86

Multicore

• PULSE – Framework to write schedulers

– Earliest Deadline-First

– Global Round Robin

Akaros Tessellation Linux

PULSE Lithe

OpenMP TBB EDF GRR

SEJITS

RISC-V

Chip

FPGA

RAMP Gold

SPARC PTX

GPGPU

x86

Multicore

Akaros Tessellation Linux

PULSE Lithe

OpenMP TBB EDF GRR

ASP Copperhead

PyCASP

TFJ

Applications

RISC-V

Chip

FPGA

RAMP Gold

SPARC PTX

GPGPU

x86

Multicore

Akaros Tessellation Linux

PULSE Lithe

OpenMP TBB EDF GRR

TFJ ASP Copperhead

Optical Flow Music

Synthesis

PyCASP

Parallel Browser

PARDORA

Why Create an Integrated Prototype?

• Encourages Collaboration • Prevents neglecting important

pieces of the problem • Uncover opportunities for

invention by seeing which side of an interface is the best place to satisfy a requirement

• Demonstrate the importance of design simplicity

• Enhance the education of the PhD students in areas beyond their own specialties

• Help with technology transfer by giving concrete examples of our ideas for our colleagues in industry

Forces for Integration • Design Compatibility

– Shared Space and Discussions – Symbiotic Designs – Example: Music and Tessellation

• Customized Support – In-house experts helping adapt

their design to your problem – Examples: Lithe and Tessellation CAA and Applications

• Motivating Applications – Exciting to show your research on

run a compelling application – Examples: Patterns and MRI BFS and RISC-V

Integrated Demos in ParLab History

Stack Redesign

• Productivity programs can create applications that require efficiency – Scale – Performance

• Easily target many platforms and features – Example: Vector units on RISC-V Chip

• Efficient performance predictibility – Interactivity and Responsiveness – Realtime Performance

What did rethinking the entire computing stack at once get us?

Integrated Demos

• Two Demos to show off the ParLab Stack

• Fun and compelling applications that require efficiency

• Easily target many platforms and features

• Interactivity and Realtime Performance

• Integration!

Music Exploration and Recommendation

• Better Pandora

• Audio Content Analysis Framework

• Parallel Browser Big Data Visualization

• Demonstrates Scale and Responsiveness

Music Recommendation and Exploration

RISC-V

RV FPGA

PTX

GPGPU

x86

Multicore

Tessellation

Lithe

TBB

ASP

PyCASP

Parallel Browser

PARDORA

RISC-V

RV FPGA

PTX

GPGPU

x86

Multicore

Tessellation

Lithe

TBB

ASP

PyCASP

Parallel Browser

PARDORA

Virtual Musical Instrument

• Musical Instrument using a camera

• Music Synthesis

• Vision Applications

• Demonstrate Realtime

Virtual Musical Instrument

PTX

GPGPU

x86

Multicore

Tess Linux

PULSE

GRR

TFJ

Optical Flow

Music Synthesis

RISC-V

FPGA

Chip

Virtual Musical Instrument

PTX

GPGPU

x86

Multicore

Tess Linux

PULSE

GRR

TFJ

Optical Flow

Music Synthesis

RISC-V

FPGA

Chip

Virtual Musical Instrument

PTX

GPGPU

x86

Multicore

Linux

PULSE

GRR

TFJ

Optical Flow

Music Synthesis

RISC-V

RV FPGA

RV Chip

Tess

Virtual Musical Instrument

PTX

GPGPU

x86

Multicore

Tess Linux

PULSE

GRR

TFJ

Optical Flow

Music Synthesis

RISC-V

RV Chip

RV Chip

Music Recommendation & Exploration Demo

Leo Meyerovich, Katya Gonina, Gage Eads, Eric Roman, Eric Battenberg, Henry Cook, Gerald Friedland

End of ParLab Celebration

May 30, 2013

21

Demo

22

System Overview

“Radiohead”

Client Server

Recommendation Engine 1K

clustered songs

23

1M songs

Layout Engine

sid_1, sid_2

“King Tubby”

SEJITS Parallel Browser

Tesselation

Client Architecture

24

Parallel JavaScript Libraries 1K

clustered songs

1M songs

Safari Web Browser

Parse (JSON)

Layout Render

viz.ftl Synthesizer

offline: compile-time

Music Recommendation Stack

ASP

PyCASP

OpenCL

GPGPU

Parallel Browser

PARDORA

Linux

CUDA / Cilk+

PTX

GPGPU

RISC-V

RV FPGA

x86

Multicore

Tessellation

Lithe

TBB

x86

Multicore

JavaScript

x86

Train a UBM*

Offline Phase

Online Phase

Find Closest Songs

Adapt UBM on query songs

Get potential neighbors using

Collaborative Filtering

* UBM = Universal Background Model 26

1M songs

1K clustered

songs

“Radiohead”

Server Architecture

SEJITS

GPU

CPU

FPGA

Tesselation

Music Recommendation Stack

ASP

PyCASP

OpenCL

GPGPU

Parallel Browser

PARDORA

Linux

CUDA / Cilk+

PTX

GPGPU

RISC-V

RV FPGA

x86

Multicore

Tessellation

Lithe

TBB

x86

Multicore

JavaScript

x86

Music Recommendation Stack

ASP

PyCASP

OpenCL

GPGPU

Parallel Browser

PARDORA

Tessellation

Lithe

TBB

x86

Multicore

RISC-V

RV FPGA

Linux

CUDA / Cilk+

PTX

GPGPU

x86

Multicore

Recommended