Creation Being Originated From Practice

Jianfeng Yang [email protected]

Wuhan University

2008 Intel China Multi-core Academic Forum, Xiamen, China, August 28-29, 2008.


Agenda

• Introduction
• Achievements
• Course Development
• TBB
• Resources for curriculum building


Introduction

• Multi-core Architecture & Multi-threaded Programming
  – Electronic Information College, Wuhan University
  – 3 credits, 72 hours
  – 6th semester


MOE-Intel Model Curriculum, 2007

No. | Curriculum Name | Principal | University
1 | Parallel Program Design | Huashan Yu | Peking Univ.
2 | Advanced Computer Architecture | Weimin Zheng | Tsinghua Univ.
3 | Principle of OS | Dan Wang | Beijing Univ. of Industry
4 | Parallel Computing | Jizhou Sun | Tianjin Univ.
5 | Computer Architecture Based on Multi-core Processor | Jinjiao Li | Northeast Univ.
6 | Principle of OS | Huijuan Zhang | Tongji Univ.
7 | Parallel Programming Based on Multi-core Processor | Bin Luo | Nanjing Univ.
8 | Multi-core Computing | Tianzhou Chen | Zhejiang Univ.
9 | High Performance Processor Architecture | Hong An | University of Science and Technology of China
10 | Multi-core Architecture & Multi-threaded Programming | Jianfeng Yang | Wuhan Univ.
11 | Compiler Principle | Huowang Chen | National Univ. of Defense Technology
12 | Software Design Based on Multi-core | Hu Chen | South China Univ. of Technology

http://www.jpkcnet.com/new/zhengce/Announces_detail.asp?Announces_ID=94

Achievements


Multi-core Training Workshop

• April 10-17, 2008
• Participants
  – 46 faculty members from 22 PRC universities


http://softwareblogs-zho.intel.com/2008/05/06/125/


Intel Cup Embedded Contest

• 1st-Class Award
  – Vehicle Real-time Image Panorama System
• 2nd-Class Award
  – Intelligent Target Tracking and Photos Taking System
• 3rd-Class Award
  – Video-based Safety Guardianship Aid


Course Development

Teaching Groups Combination

[Diagram: the Embedded System teaching group (TG) and the Multi-core Architecture & Multi-threaded Programming TG are combined into one TG]

• Different concepts & design methods
• Similar programming modules & tools
• Intended for the design of embedded systems with multi/many-core processors
• One lab room


Applications

• Packet processing
• Image processing / searching
• Data clustering with large databases
• …

Real-time systems (may be I/O bound): control oriented
Large-scale computing: computing oriented
Mixed solution

More parallelism or concurrency will be better


We use…

• Intel Compilers
• Intel VTune Performance Analyzer
• Threading Analysis
  – Intel Thread Checker
  – Intel Thread Profiler
• Performance Library
  – Intel IPP
  – Intel OpenCV
  – Intel MKL
• OpenMP, MPI, Pthread…


What Why How


Threading Building Blocks (TBB)

• C++ template library
• Parallelism: higher-level, task-based
• Outfitting C++ for multi-core processor parallelism
• Initial release: August 2006; current version: 2.1
• Main focus: performance & scalability
• Will be the best solution for many-core processors


http://www.threadingbuildingblocks.org/


TBB

• Most students are familiar with C++
  – Easy to follow
• You specify tasks instead of threads
  – The library maps your logical tasks onto physical threads, efficiently using cache and balancing load
• Avoids the tedious work of creating and managing threads
• Avoids inefficient programs
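
The idea of specifying tasks rather than threads can be sketched in standard C++, without TBB itself. The code below is only an illustration under stated assumptions: `run_tasks` and `square_all` are hypothetical helpers, not TBB APIs, and the shared-counter pool is far simpler than a real task scheduler. It shows many logical tasks being executed by a small, fixed number of worker threads.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not a TBB API): run all `tasks` on `num_workers` threads.
// Each worker repeatedly claims the next unclaimed task index until none remain.
void run_tasks(const std::vector<std::function<void()>>& tasks,
               unsigned num_workers) {
    std::atomic<std::size_t> next{0};
    auto worker = [&] {
        for (;;) {
            std::size_t i = next.fetch_add(1);
            if (i >= tasks.size()) return;  // no tasks left
            tasks[i]();
        }
    };
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < num_workers; ++w) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
}

// Squares 0..n-1 as n logical tasks executed by a small worker pool.
// The caller never decides which thread runs which task.
std::vector<long> square_all(std::size_t n, unsigned num_workers) {
    std::vector<long> out(n, 0);
    std::vector<std::function<void()>> tasks;
    for (std::size_t i = 0; i < n; ++i)
        tasks.push_back([&out, i] { out[i] = static_cast<long>(i * i); });
    run_tasks(tasks, num_workers);
    return out;
}
```

Calling `square_all(100, 4)` creates 100 logical tasks but only 4 threads; changing the worker count does not change the task code.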


TBB

• Comparison with raw threads
  – Raw threads, e.g. Pthreads, Windows threads
    • Intended for shared-memory parallelism
    • Lowest level of control over parallelism
    • Costly in programming, debugging, and maintenance
    • Difficult to program correctly and efficiently
    • Load imbalances


TBB

• Comparison with MPI
  – MPI is intended for distributed-memory systems
  – High cost of communication among processes for synchronization


TBB

• Comparison with OpenMP
  – OpenMP is intended for shared memory; first standard: 1997; supported by most compilers
  – You need to choose one of the scheduling approaches for loop iterations:
    • Static, Dynamic, Guided
  – TBB:
    • Divide-and-conquer scheduling approach; you don't need to worry about scheduling policies
  – Reduction:
    • OpenMP: built-in types
    • TBB: all types
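
The two TBB points in this comparison, divide-and-conquer scheduling and reduction over arbitrary types, can be sketched together in standard C++. This is a stand-in, not TBB code: `MinMax`, `join`, and `reduce_minmax` are hypothetical names, and `std::async` replaces the TBB task scheduler. The shape (split the range, reduce each half, join the partial results) is roughly what a reduction with a user-supplied join step does.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// A user-defined result type: minimum and maximum of a sequence.
struct MinMax {
    int lo;
    int hi;
};

// Combine two partial results (the "join" step of the reduction).
MinMax join(MinMax a, MinMax b) {
    return {a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi};
}

// Hypothetical helper: reduce v[begin, end) by recursive halving.
// Ranges of at most `grain` elements are processed serially.
// Preconditions: begin < end and grain >= 1.
MinMax reduce_minmax(const std::vector<int>& v, std::size_t begin,
                     std::size_t end, std::size_t grain) {
    if (end - begin <= grain) {
        MinMax r{v[begin], v[begin]};
        for (std::size_t i = begin + 1; i < end; ++i)
            r = join(r, MinMax{v[i], v[i]});
        return r;
    }
    std::size_t mid = begin + (end - begin) / 2;
    // The left half runs on another thread; the right half runs on this one.
    auto left = std::async(std::launch::async, [&v, begin, mid, grain] {
        return reduce_minmax(v, begin, mid, grain);
    });
    MinMax right = reduce_minmax(v, mid, end, grain);
    return join(left.get(), right);
}
```

No scheduling policy appears anywhere in the caller's code; the only tuning knob is `grain`, which bounds how small a serial piece may get.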


TBB

• Besides:
  – Scalability
    • Tuning grain size
    • Does well on single-core platforms
    • Especially, moves to many-core processors
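
To make the effect of grain size concrete, the following sketch counts how many leaf ranges a recursive halving produces for a given grain size. It is plain serial C++ with hypothetical helper names (`split_range`, `leaf_ranges`); it is similar in spirit to the grain size parameter of TBB's `blocked_range`, but it is an assumption-laden illustration rather than the library's implementation.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Range = std::pair<std::size_t, std::size_t>;  // half-open [first, second)

// Hypothetical helper: split [begin, end) in half until each piece has at
// most `grain` elements (grain >= 1), collecting the resulting leaf ranges.
void split_range(std::size_t begin, std::size_t end, std::size_t grain,
                 std::vector<Range>& leaves) {
    if (end - begin <= grain) {
        leaves.push_back({begin, end});
        return;
    }
    std::size_t mid = begin + (end - begin) / 2;
    split_range(begin, mid, grain, leaves);
    split_range(mid, end, grain, leaves);
}

// Returns the leaf ranges for an iteration space of n elements.
std::vector<Range> leaf_ranges(std::size_t n, std::size_t grain) {
    std::vector<Range> leaves;
    split_range(0, n, grain, leaves);
    return leaves;
}
```

For 1000 iterations, a grain size of 1000 yields a single leaf (no parallelism), 100 yields 16 leaves, and 10 yields 128 leaves. A smaller grain gives more pieces to spread over cores but more per-piece overhead, which is why the value is worth tuning.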


Key contents of TBB

• Synchronization primitives
  – atomic operations
  – various flavors of mutexes (improved)
• Parallel algorithms
  – parallel_for (improved)
  – parallel_reduce (improved)
  – parallel_do (new)
  – pipeline (improved)
  – parallel_sort
  – parallel_scan
• Concurrent containers (all improved)
  – concurrent_hash_map
  – concurrent_queue
  – concurrent_vector
• Task scheduler
  – With new functionality
• Memory allocators
  – tbb_allocator (new), cache_aligned_allocator, scalable_allocator
• Utilities
  – tick_count
  – tbb_thread (new)
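
As a small illustration of the first category, synchronization primitives, the sketch below contrasts an atomic counter with a mutex-protected one. It uses the standard C++ counterparts (`std::atomic`, `std::mutex`) rather than TBB's own types, and the function names are hypothetical.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Count with an atomic integer: each increment is one indivisible operation.
long count_atomic(unsigned threads, long per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (long i = 0; i < per_thread; ++i) ++counter;
        });
    for (auto& th : pool) th.join();
    return counter.load();
}

// Count with a plain integer guarded by a mutex (one scoped lock per increment).
long count_mutex(unsigned threads, long per_thread) {
    long counter = 0;
    std::mutex m;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (long i = 0; i < per_thread; ++i) {
                std::lock_guard<std::mutex> lock(m);
                ++counter;
            }
        });
    for (auto& th : pool) th.join();
    return counter;
}
```

Both versions return `threads * per_thread`. The atomic is enough for a single counter, while a mutex is the usual choice when several variables must change together.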


TBB – Task-Based Approach

• Intel® TBB provides C++ constructs that allow you to express parallel solutions in terms of task objects
  – Task scheduler manages thread pool
  – Task scheduler avoids common performance problems of programming with threads

Problem | Intel® TBB Approach
Oversubscription | One scheduler thread per hardware thread
Fair scheduling | Non-preemptive unfair scheduling
High overhead | Programmer specifies tasks, not threads
Load imbalance | Work-stealing balances load
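
How work stealing balances load can be sketched as follows. This is a deliberately simplified model in standard C++, not TBB's scheduler: the queues are mutex-protected deques, tasks never spawn further tasks, and `WorkerQueue`, `pop_own`, `steal`, and `run_with_stealing` are hypothetical names. All tasks start on worker 0, and idle workers take tasks from other queues.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// One task queue per worker, guarded by a mutex for simplicity.
struct WorkerQueue {
    std::mutex m;
    std::deque<int> tasks;  // a task is just an id in this sketch
};

// Take the newest task from the back of our own queue.
bool pop_own(WorkerQueue& q, int& task) {
    std::lock_guard<std::mutex> lock(q.m);
    if (q.tasks.empty()) return false;
    task = q.tasks.back();
    q.tasks.pop_back();
    return true;
}

// Steal the oldest task from the front of a victim's queue.
bool steal(WorkerQueue& q, int& task) {
    std::lock_guard<std::mutex> lock(q.m);
    if (q.tasks.empty()) return false;
    task = q.tasks.front();
    q.tasks.pop_front();
    return true;
}

// Hypothetical driver: every task starts on worker 0 (an imbalanced load).
// Returns how many tasks each worker ended up executing.
std::vector<int> run_with_stealing(int num_tasks, unsigned num_workers) {
    std::vector<WorkerQueue> queues(num_workers);
    for (int i = 0; i < num_tasks; ++i) queues[0].tasks.push_back(i);
    std::vector<int> executed(num_workers, 0);
    auto worker = [&](unsigned id) {
        int task = 0;
        for (;;) {
            bool found = pop_own(queues[id], task);
            for (unsigned v = 0; !found && v < num_workers; ++v)
                if (v != id) found = steal(queues[v], task);
            if (!found) return;  // queues only shrink here, so we are done
            ++executed[id];      // "run" the task
        }
    };
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < num_workers; ++w) pool.emplace_back(worker, w);
    for (auto& t : pool) t.join();
    return executed;
}
```

The per-worker counts vary from run to run, but they always sum to the number of tasks. The programmer who filled queue 0 did nothing to distribute the work.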


Teaching TBB contents

• Delivered after the OpenMP and MPI contents
  – Students have already grasped the basic thread and threading concepts
• Analyze programs built with TBB using Intel programming tools
• Lab projects:
  – Matrix multiply
  – Numerical integration
  – Recursive tasks
  – Concurrent hash map
  – Scalable allocator
• Applications
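
A numerical integration lab could look something like the sketch below. This is not the course's actual lab code: it is a plain standard C++ version with a hypothetical function name, using the common exercise of estimating pi as the integral of 4/(1+x^2) over [0, 1]. Each worker sums its own block of intervals, and the partial sums are then combined, which is the reduction pattern a TBB version would express with a parallel reduction.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Midpoint-rule estimate of the integral of 4/(1+x^2) over [0, 1], i.e. pi.
// Each worker sums a contiguous block of intervals into its own slot;
// the partial sums are combined serially at the end.
double integrate_pi(std::size_t intervals, unsigned num_workers) {
    const double h = 1.0 / static_cast<double>(intervals);
    std::vector<double> partial(num_workers, 0.0);
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < num_workers; ++w) {
        pool.emplace_back([&, w] {
            std::size_t begin = intervals * w / num_workers;
            std::size_t end = intervals * (w + 1) / num_workers;
            double sum = 0.0;
            for (std::size_t i = begin; i < end; ++i) {
                double x = (static_cast<double>(i) + 0.5) * h;
                sum += 4.0 / (1.0 + x * x);
            }
            partial[w] = sum;  // each worker writes only its own slot
        });
    }
    for (auto& t : pool) t.join();
    double total = 0.0;
    for (double p : partial) total += p;
    return total * h;
}
```

Students can compare timings for different values of `num_workers` and then rewrite the loop with a task-based reduction to see what the library takes over from them.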


Teaching TBB contents

• Note: TBB is NOT intended for:
  – I/O-bound processing
  – Real-time processing
• So you need to tune your program carefully with Intel programming tools according to your application, such as embedded system design.
• Surely, TBB will win your heart sooner or later, for your teaching or applications!


Resources for curriculum building

• TBB as example
• What can we get from Intel?
  – Software
  – Technical support
  – Courseware
  – Teaching approach & experience
  – New/updated technology
  – Training opportunities

Resource


Software

• Mainly, ISC (Intel Software College)
  – http://softwarecollege.intel.com
  – http://www.intel.com/cd/software/products/asmo-na/eng/index.htm
  – Two columns
    • Intel® Academic Community
    • Training for Developers
• For TBB
  – http://www.threadingbuildingblocks.org/


Software

Registration for more


Courseware

• University Program
  – http://softwarecollege.intel.com/university/


Courseware

• Access Courseware
  – Multi-core courseware content from Intel, including lab projects, source code, etc.
  – Multi-core courseware from faculty
  – Other courseware
  – Webinars
  – Videos
  – ISC 'Wiki'
  – Featured events
  – Papers
  – Conference presentations


Courseware

• Curriculum websites from universities worldwide
• Websites of Microsoft and other companies
• Obtained directly from Intel engineering


Support

• Documentation
  – White papers
  – Reference manuals
  – Books
• Training opportunities
  – Workshops
  – Webinars
  – Forums


Others

• Intel Blog
  – Global (in English) and Chinese blogs
    • New/updated technology
    • Teaching approach & experience
• Webinars
  – An excellent training approach; you can talk with veteran Intel engineers directly
  – Recent topics (see next slide)


Others

• Towards a Curriculum for Parallelism - Design Patterns

• Towards a Curriculum for Parallelism - Using Multithreaded Libraries to Maximize Performance for Digital Media Apps

• Towards a Curriculum for Parallelism - Practical hands on architecture for applications programmers

• Teaching Many-core Computing in an Academic Environment - Design Doc Review

• Towards a Curriculum for Parallelism - Using Intel® Threading Building Blocks

• The March towards Manycores - Towards a Curriculum for Parallelism


Other Useful Programming Tools

• Parallel Studio
  – Announced August 20, 2008
  – Ultimate all-in-one parallelism toolkit
  – Intel Parallel Advisor
    • Gain insight on where parallelism will benefit existing source code.
  – Intel Parallel Composer
    • Incorporate parallelism quickly with a C/C++ compiler and comprehensive threaded libraries.
  – Intel Parallel Inspector
    • Ensure application reliability with proactive "bug finder" for all parallel programming models.
  – Intel Parallel Amplifier
    • Easy-to-use performance analyzer finds bottlenecks quickly.

Thanks!

Welcome to Wuhan University!