28
The ParLab Stack Parallel Computing Laboratory Sarah Bird May 30, 2013

The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

The ParLab Stack

Parallel Computing Laboratory

Sarah Bird

May 30, 2013

Page 2: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Bridging the Gap

Parallel Applications

Parallel Hardware

Parallel Software

IT industry Users

Easy to write correct programs that run efficiently on manycore

Page 3: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Integrated Software Stack Personal Health

Image Retrieval

Hearing, Music

Speech Parallel Browser

Motifs/Dwarfs

Sketching

Legacy Code

Schedulers Communication & Synch. Primitives

Efficiency Language Compilers

Legacy OS

Multicore/GPGPU

OS Libraries & Services

RAMP Manycore

Hypervisor

Co

rrec

tnes

s

Composition & Coordination Language (C&CL)

Parallel Libraries

Parallel Frameworks

Static Verification

Dynamic Checking

Debugging with Replay

Directed Testing

Autotuners

C&CL Compiler/Interpreter

Efficiency Languages

Type Systems

Page 4: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Hardware

• RAMP Gold

– Simulation

– 64 cores

– FPGA

RISC-V

Chip

FPGA

RAMP Gold

SPARC PTX

GPGPU

x86

Multicore

• RISC-V – Implementations

written in Chisel – Rocket

• 6 stage in-order

– Hwacha • 64 bit vector core

– FPGA • 2 Rocket

– 45 nm Chip • 1 Rocket, 1 Hwacha • 1 Ghz

• x86

• GPGPU

– Cuda

– OpenGL

Page 5: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Operating Systems

• Akaros

– Cloud OS

RISC-V

Chip

FPGA

RAMP Gold

SPARC PTX

GPGPU

x86

Multicore

• Tessellation

– Client OS

– Space-Time Partitioning

– Two-Level Scheduling

– QoS to Applications

– PACORA

• Linux

Akaros Tessellation Linux

Page 6: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Schedulers

• Lithe – Compose Parallel Runtimes

– Thread Building Blocks

– Open MP

RISC-V

Chip

FPGA

RAMP Gold

SPARC PTX

GPGPU

x86

Multicore

• PULSE – Framework to write schedulers

– Earliest Deadline-First

– Global Round Robin

Akaros Tessellation Linux

PULSE Lithe

OpenMP TBB EDF GRR

Page 7: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

SEJITS

RISC-V

Chip

FPGA

RAMP Gold

SPARC PTX

GPGPU

x86

Multicore

Akaros Tessellation Linux

PULSE Lithe

OpenMP TBB EDF GRR

ASP Copperhead

PyCASP

TFJ

Page 8: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Applications

RISC-V

Chip

FPGA

RAMP Gold

SPARC PTX

GPGPU

x86

Multicore

Akaros Tessellation Linux

PULSE Lithe

OpenMP TBB EDF GRR

TFJ ASP Copperhead

Optical Flow Music

Synthesis

PyCASP

Parallel Browser

PARDORA

Page 9: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Why Create an Integrated Prototype?

• Encourages Collaboration • Prevents neglecting important

pieces of the problem • Uncover opportunities for

invention by seeing which side of an interface is the best place to satisfy a requirement

• Demonstrate the importance of design simplicity

• Enhance the education of the PhD students in areas beyond their own specialties

• Help with technology transfer by giving concrete examples of our ideas for our colleagues in industry

Page 10: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Forces for Integration • Design Compatibility

– Shared Space and Discussions – Symbiotic Designs – Example: Music and Tessellation

• Customized Support – In-house experts helping adapt

their design to your problem – Examples: Lithe and Tessellation CAA and Applications

• Motivating Applications – Exciting to show your research on

run a compelling application – Examples: Patterns and MRI BFS and RISC-V

Page 11: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Integrated Demos in ParLab History

Page 12: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Stack Redesign

• Productivity programs can create applications that require efficiency – Scale – Performance

• Easily target many platforms and features – Example: Vector units on RISC-V Chip

• Efficient performance predictibility – Interactivity and Responsiveness – Realtime Performance

What did rethinking the entire computing stack at once get us?

Page 13: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Integrated Demos

• Two Demos to show off the ParLab Stack

• Fun and compelling applications that require efficiency

• Easily target many platforms and features

• Interactivity and Realtime Performance

• Integration!

Page 14: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Music Exploration and Recommendation

• Better Pandora

• Audio Content Analysis Framework

• Parallel Browser Big Data Visualization

• Demonstrates Scale and Responsiveness

Page 15: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Music Recommendation and Exploration

RISC-V

RV FPGA

PTX

GPGPU

x86

Multicore

Tessellation

Lithe

TBB

ASP

PyCASP

Parallel Browser

PARDORA

RISC-V

RV FPGA

PTX

GPGPU

x86

Multicore

Tessellation

Lithe

TBB

ASP

PyCASP

Parallel Browser

PARDORA

Page 16: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Virtual Musical Instrument

• Musical Instrument using a camera

• Music Synthesis

• Vision Applications

• Demonstrate Realtime

Page 17: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Virtual Musical Instrument

PTX

GPGPU

x86

Multicore

Tess Linux

PULSE

GRR

TFJ

Optical Flow

Music Synthesis

RISC-V

FPGA

Chip

Page 18: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Virtual Musical Instrument

PTX

GPGPU

x86

Multicore

Tess Linux

PULSE

GRR

TFJ

Optical Flow

Music Synthesis

RISC-V

FPGA

Chip

Page 19: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Virtual Musical Instrument

PTX

GPGPU

x86

Multicore

Linux

PULSE

GRR

TFJ

Optical Flow

Music Synthesis

RISC-V

RV FPGA

RV Chip

Tess

Page 20: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Virtual Musical Instrument

PTX

GPGPU

x86

Multicore

Tess Linux

PULSE

GRR

TFJ

Optical Flow

Music Synthesis

RISC-V

RV Chip

RV Chip

Page 21: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Music Recommendation & Exploration Demo

Leo Meyerovich, Katya Gonina, Gage Eads, Eric Roman, Eric Battenberg, Henry Cook, Gerald Friedland

End of ParLab Celebration

May 30, 2013

21

Page 22: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Demo

22

Page 23: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

System Overview

“Radiohead”

Client Server

Recommendation Engine 1K

clustered songs

23

1M songs

Layout Engine

sid_1, sid_2

“King Tubby”

SEJITS Parallel Browser

Tesselation

Page 24: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Client Architecture

24

Parallel JavaScript Libraries 1K

clustered songs

1M songs

Safari Web Browser

Parse (JSON)

Layout Render

viz.ftl Synthesizer

offline: compile-time

Page 25: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Music Recommendation Stack

ASP

PyCASP

OpenCL

GPGPU

Parallel Browser

PARDORA

Linux

CUDA / Cilk+

PTX

GPGPU

RISC-V

RV FPGA

x86

Multicore

Tessellation

Lithe

TBB

x86

Multicore

JavaScript

x86

Page 26: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Train a UBM*

Offline Phase

Online Phase

Find Closest Songs

Adapt UBM on query songs

Get potential neighbors using

Collaborative Filtering

* UBM = Universal Background Model 26

1M songs

1K clustered

songs

“Radiohead”

Server Architecture

SEJITS

GPU

CPU

FPGA

Tesselation

Page 27: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Music Recommendation Stack

ASP

PyCASP

OpenCL

GPGPU

Parallel Browser

PARDORA

Linux

CUDA / Cilk+

PTX

GPGPU

RISC-V

RV FPGA

x86

Multicore

Tessellation

Lithe

TBB

x86

Multicore

JavaScript

x86

Page 28: The ParLab Stackparlab.eecs.berkeley.edu/sites/all/parlab/files/The Par... · 2013. 6. 12. · of an interface is the best place to satisfy a requirement •Demonstrate the importance

Music Recommendation Stack

ASP

PyCASP

OpenCL

GPGPU

Parallel Browser

PARDORA

Tessellation

Lithe

TBB

x86

Multicore

RISC-V

RV FPGA

Linux

CUDA / Cilk+

PTX

GPGPU

x86

Multicore