© 2013 IBM Corporation
Implement high-level parallel API in JDK
Richard Ning – Enterprise Developer
1st June 2013
Important Disclaimers
– THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
– WHILST EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.
– ALL PERFORMANCE DATA INCLUDED IN THIS PRESENTATION HAVE BEEN GATHERED IN A CONTROLLED ENVIRONMENT. YOUR OWN TEST RESULTS MAY VARY BASED ON HARDWARE, SOFTWARE OR INFRASTRUCTURE DIFFERENCES.
– ALL DATA INCLUDED IN THIS PRESENTATION ARE MEANT TO BE USED ONLY AS A GUIDE.
– IN ADDITION, THE INFORMATION CONTAINED IN THIS PRESENTATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM, WITHOUT NOTICE.
– IBM AND ITS AFFILIATED COMPANIES SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
– NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF: CREATING ANY WARRANTY OR REPRESENTATION FROM IBM, ITS AFFILIATED COMPANIES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS.
Introduction to the speaker
■Developing enterprise application software since 1999 (C++, Java)
■Recent work focus:
–IBM JDK development
■My contact information:
–mail: [email protected]
What should you get from this talk?
■By the end of this session, you should be able to:
–Understand implementation of high-level parallel API in JDK
–Understand how parallel computing works on multi-cores
Agenda
1. Introduction: multi-threading, multi-cores, parallel computing
2. Case study
3. Other high-level parallel API
4. Roadmap
Introduction
Multi-threading
Multi-core computer
Parallel computing
Case study
■ Execute the same task for every element in a loop
■ Use multi-threading for the execution
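The case above can be sketched with raw threads as follows. This is only an illustration, not code from the talk: the class name RawThreadLoop, the method squareAll, and the choice of squaring as the per-element task are all assumptions.

```java
import java.util.Arrays;

class RawThreadLoop {
    // Applies the same task (squaring, an assumed example) to every element.
    // The index range is split into one contiguous chunk per thread.
    static int[] squareAll(int[] input, int threadCount) {
        int[] output = new int[input.length];
        Thread[] threads = new Thread[threadCount];
        int chunk = (input.length + threadCount - 1) / threadCount; // ceiling division
        for (int t = 0; t < threadCount; t++) {
            final int start = Math.min(input.length, t * chunk);
            final int end = Math.min(input.length, start + chunk);
            threads[t] = new Thread(() -> {
                for (int i = start; i < end; i++) {
                    output[i] = input[i] * input[i];
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join(); // waits for the worker; its writes are visible afterwards
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7};
        System.out.println(Arrays.toString(squareAll(data, 3)));
    }
}
```

The caller has to pick threadCount and manage every thread by hand, which is the burden the following slides discuss.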
■ Can it improve performance?
■ Multi-threading on computer with one core
[Figure: timeline of a single CPU alternating between threads t1 and t2 over time]
■ 100% CPU usage with single thread and multi-threading
• Can't improve performance
• Performance even decreases because of the extra threading overhead
• It is useless to use a multi-threading (parallel) API
■ Multi-threading on computer with multi-core
[Figure: threads t1 to t4 plotted against time, each on its own core (Core1 to Core4)]
Thread runs separately on every core
■ Raw thread
Disadvantages
– Users need to create and manage it
– Not flexible: the number of threads is hard to configure
   > core number: resources are consumed in thread context switching, which can even decrease performance
   < core number: some cores are wasted
– No balance: the calculation can't be allocated to every core equally
Any improvement? Executor
■ Separate creation and execution of thread
■ Use thread pool to reuse thread
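A minimal sketch of these two points with the standard java.util.concurrent executor framework. The class name ExecutorLoop, the method squareAll, and the squaring task are assumptions made for illustration. The tasks are created separately from the threads that run them, and the pooled threads are reused across tasks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ExecutorLoop {
    // Submits one task per element to a thread pool of the given size.
    static List<Integer> squareAll(List<Integer> input, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // Task creation: no thread is created or started here.
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (Integer value : input) {
                tasks.add(() -> value * value);
            }
            // Task execution: the pool decides which reused thread runs each task.
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : pool.invokeAll(tasks)) {
                results.add(future.get()); // futures come back in task order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(squareAll(Arrays.asList(1, 2, 3, 4), 2));
    }
}
```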
■ A high-level API concurrent_for
The API is easy to use: users only need to supply the task to execute and
the data range, and don't need to care about how they are executed. However,
it still has disadvantages.
1. The number of threads in the thread pool isn't aligned to the core number
2. A task executes only one entry at a time, which isn't efficient
3. A task is bound to a thread, which isn't flexible
[Figure: m tasks are assigned to a thread pool of n threads, which run on a CPU with 4 cores]
Core: 4, Thread: n, Task: m
Overloading: n >> 4
Not flexible: m > n
Core number doesn't align to thread number: use a fixed thread pool
[Figure: a thread pool of 4 threads, one for each core of a 4-core CPU]
Thread number = core number
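One way to get "thread number = core number" with the standard library is to size a fixed pool from Runtime.availableProcessors(). The class CoreSizedPool and its methods are hypothetical names used only for this sketch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class CoreSizedPool {
    // Creates a fixed pool with exactly one thread per available core.
    static ExecutorService newCoreSizedPool() {
        int cores = Runtime.getRuntime().availableProcessors();
        return Executors.newFixedThreadPool(cores);
    }

    // Runs taskCount trivial tasks on the pool and returns how many completed.
    static int runTasks(int taskCount) {
        ExecutorService pool = newCoreSizedPool();
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(completed::incrementAndGet);
        }
        pool.shutdown(); // no new tasks; already submitted tasks still run
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("cores: " + Runtime.getRuntime().availableProcessors());
        System.out.println("completed: " + runTasks(100));
    }
}
```

This removes oversubscription, but the number of tasks can still exceed the number of threads, so balancing remains an open issue.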
Task division: another task division strategy, ForkJoinPool
[Figure: ForkJoin task tree in which Task1 is divided recursively into Task2 to Task7]
Divide and conquer
1. Divide big task into small tasks recursively
2. Execute the same operation for every task
3. Join result of every small task
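The three divide-and-conquer steps can be sketched with the JDK's ForkJoinPool and RecursiveTask. Summing an array is an assumed example problem, and the class name ForkJoinSum and the THRESHOLD value are illustrative choices, not taken from the talk.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ForkJoinSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // assumed cut-off for splitting

    private final int[] data;
    private final int start;
    private final int end;

    ForkJoinSum(int[] data, int start, int end) {
        this.data = data;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Long compute() {
        // Step 2: small enough, so execute the operation directly.
        if (end - start <= THRESHOLD) {
            long sum = 0;
            for (int i = start; i < end; i++) {
                sum += data[i];
            }
            return sum;
        }
        // Step 1: divide the big task into two smaller tasks.
        int mid = (start + end) >>> 1;
        ForkJoinSum left = new ForkJoinSum(data, start, mid);
        ForkJoinSum right = new ForkJoinSum(data, mid, end);
        left.fork();                      // schedule the left half asynchronously
        long rightSum = right.compute();  // compute the right half in this thread
        // Step 3: join the results of the small tasks.
        return left.join() + rightSum;
    }

    static long sum(int[] data) {
        return ForkJoinPool.commonPool().invoke(new ForkJoinSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        int[] data = new int[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data));
    }
}
```

Note that the splitting rule (here a fixed threshold) is chosen by the user of the API.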
Better suited to divide-and-conquer problems
Previous issues (thread oversubscription and starvation, load imbalance) still exist
The task division strategy comes from users and isn't adapted to the running conditions
New parallel API based on task scheduler
Initial status
● Tasks are allocated equally
● One thread per core
● Every thread maintains its own task queue, which consists of its affiliated tasks
[Figure: a thread pool of 4 threads on a 4-core CPU; tasks 1 to 20 are distributed across the per-thread task queues]
Unbalanced load
[Figure: the same thread pool and per-thread task queues, with tasks 2, 3, 4, 5, 10 and 15 remaining]
Balancing load by task stealing and adding new tasks
[Figure: the same thread pool and per-thread task queues after rebalancing, with new tasks 21 and 22 added]
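The stealing part of this mechanism can be sketched as follows. It is a simplified illustration under stated assumptions, not the scheduler from the talk: the class WorkStealingSketch and the method run are hypothetical, tasks are dealt out round-robin as the initial status, and tasks do not add new tasks while running. Each worker takes work from the front of its own queue and, when that queue is empty, steals from the back of another worker's queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;

class WorkStealingSketch {
    // Runs all tasks on workerCount threads and returns how many tasks
    // each worker executed (own tasks plus stolen ones).
    static int[] run(List<Runnable> tasks, int workerCount) {
        List<ConcurrentLinkedDeque<Runnable>> queues = new ArrayList<>();
        for (int w = 0; w < workerCount; w++) {
            queues.add(new ConcurrentLinkedDeque<>());
        }
        // Initial status: tasks are allocated equally (round-robin).
        for (int i = 0; i < tasks.size(); i++) {
            queues.get(i % workerCount).addLast(tasks.get(i));
        }

        int[] executed = new int[workerCount];
        Thread[] workers = new Thread[workerCount];
        for (int w = 0; w < workerCount; w++) {
            final int self = w;
            workers[w] = new Thread(() -> {
                while (true) {
                    // Prefer the front of this worker's own queue.
                    Runnable task = queues.get(self).pollFirst();
                    // Own queue empty: try to steal from the back of the others.
                    for (int k = 1; task == null && k < workerCount; k++) {
                        task = queues.get((self + k) % workerCount).pollLast();
                    }
                    if (task == null) {
                        return; // every queue was empty, and none can refill here
                    }
                    task.run();
                    executed[self]++; // only this worker writes this slot
                }
            });
            workers[w].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return executed;
    }
}
```

A worker that finishes early keeps the cores busy by taking over queued work from slower workers, which is how the load evens out.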
Parallel API with new working mechanism - concurrent_for
Range: the range of data set [0, n)
Strategy: the strategy of dividing range: automatic, static with granularity
Task: the task which executes the same operation on range
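The three parameters might map to Java roughly as below. The signature is an assumption, since the talk's actual API is not shown in this transcript: concurrentFor, AUTOMATIC, and the rule used for automatic granularity are all hypothetical. The sketch runs on the JDK's ForkJoinPool, whose workers steal tasks from each other.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.function.IntConsumer;

class ConcurrentFor {
    // Assumed marker for the "automatic" strategy; any positive value
    // selects the "static with granularity" strategy instead.
    static final int AUTOMATIC = 0;

    // Applies task to every index in [begin, end).
    static void concurrentFor(int begin, int end, int granularity, IntConsumer task) {
        if (end <= begin) {
            return;
        }
        ForkJoinPool pool = ForkJoinPool.commonPool();
        // Automatic strategy (assumed heuristic): about four chunks per worker.
        int grain = granularity > 0
                ? granularity
                : Math.max(1, (end - begin) / (Math.max(1, pool.getParallelism()) * 4));
        pool.invoke(new RangeAction(begin, end, grain, task));
    }

    private static final class RangeAction extends RecursiveAction {
        private final int begin;
        private final int end;
        private final int grain;
        private final IntConsumer task;

        RangeAction(int begin, int end, int grain, IntConsumer task) {
            this.begin = begin;
            this.end = end;
            this.grain = grain;
            this.task = task;
        }

        @Override
        protected void compute() {
            // A chunk no larger than the granularity is processed sequentially.
            if (end - begin <= grain) {
                for (int i = begin; i < end; i++) {
                    task.accept(i);
                }
                return;
            }
            // Otherwise split the range in half and run both halves as subtasks.
            int mid = begin + (end - begin) / 2;
            invokeAll(new RangeAction(begin, mid, grain, task),
                      new RangeAction(mid, end, grain, task));
        }
    }

    public static void main(String[] args) {
        int[] squares = new int[8];
        concurrentFor(0, squares.length, AUTOMATIC, i -> squares[i] = i * i);
        System.out.println(java.util.Arrays.toString(squares));
    }
}
```

Here one task covers a chunk of the range rather than a single entry, and the chunk size can be left to the library.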
Other high-level parallel API
concurrent_while: Can add to the data set while executing it concurrently.
concurrent_reduce: Uses a divide_join based task to return the calculation result.
concurrent_sort: Sorts the data set concurrently.
Math calculation: for example, multiplying one matrix by another:
int[5][10] matrix1, int[10][5] matrix2
int[5][5] matrix3 = matrix1 * matrix2
int[5][5] matrix3 = concurrent_multiply(matrix1, matrix2)
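As an illustration of the matrix example, here is a minimal sketch of what a concurrent_multiply could look like in Java. The name concurrentMultiply and its signature are assumptions. It uses a parallel IntStream over the result rows as a stand-in for the talk's task scheduler, so each row is computed independently.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class ConcurrentMultiply {
    // Multiplies a (rows x inner) by b (inner x cols), one result row per parallel task.
    static int[][] concurrentMultiply(int[][] a, int[][] b) {
        int rows = a.length;
        int inner = b.length;
        int cols = inner == 0 ? 0 : b[0].length;
        if (rows > 0 && a[0].length != inner) {
            throw new IllegalArgumentException("column count of a must equal row count of b");
        }
        int[][] result = new int[rows][cols];
        // Each index i writes only to result[i], so the rows do not conflict.
        IntStream.range(0, rows).parallel().forEach(i -> {
            for (int j = 0; j < cols; j++) {
                int sum = 0;
                for (int k = 0; k < inner; k++) {
                    sum += a[i][k] * b[k][j];
                }
                result[i][j] = sum;
            }
        });
        return result;
    }

    public static void main(String[] args) {
        int[][] matrix1 = {{1, 2, 3}, {4, 5, 6}};
        int[][] matrix2 = {{7, 8}, {9, 10}, {11, 12}};
        System.out.println(Arrays.deepToString(concurrentMultiply(matrix1, matrix2)));
    }
}
```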
In any case, on multi-core machines we can achieve performance improvement
through parallel computing.
Roadmap
■Implement high-level parallel API in JDK based on new task scheduler
–Scalable
–Correct
–Portable
–High performance
Review of Objectives
■Now that you’ve completed this session, you are able to:
–Understand the design of the new task-based parallel API
–Understand what parallel computing is and what it is good for
Q & A
Thanks!