Directed Reading 2 Key issues for the future of Software and Hardware for large scale Parallel Computing and the approaches to address these. Submitted by: KAPIL CHOGGA CAO JIANFENG RAMASESHAN KANNAN




Page 1:

Directed Reading 2

Key issues for the future of Software and Hardware for large scale Parallel Computing and the approaches to

address these.

Submitted by:

KAPIL CHOGGA

CAO JIANFENG

RAMASESHAN KANNAN

Page 2:

Hardware issues for large scale parallel computing.

Cost, Power and Processor Challenge

The Memory and Storage Challenge 

Communication Challenge

Resiliency Challenge

-Power consumption is now a critical issue.

-Power draw and the cooling it requires also constrain packaging density and floor space.

-For example, a 10-petaflop Opteron-based system was estimated to cost $1.8 billion and to require 179 megawatts to operate. This kind of approach is not feasible.
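A rough back-of-the-envelope calculation (using only the figures quoted above) shows why this power budget is untenable:

```python
# Energy cost per floating-point operation for the 10-petaflop,
# 179-megawatt example above (illustrative arithmetic only).
power_watts = 179e6            # 179 MW
flops_per_second = 10e15       # 10 petaflops
joules_per_flop = power_watts / flops_per_second
print(joules_per_flop)         # ~1.79e-8 J, i.e. roughly 18 nanojoules per flop
```

At that rate, scaling the same design by 100x to reach exascale would require on the order of tens of gigawatts, which is why energy per operation must drop by orders of magnitude.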

-Use of smaller processors:
• Energy efficient [Chandrakasan et al. 1992]
• Large processors face clock-speed limitations
• Highest performance per unit area for parallel codes
• Smaller cores are easier to manage (in case of a defect, it can be easier to deal with)

-FPGAs are an option.

-Different or same processors? Amdahl's Law [Hennessy and Patterson 2007] suggests that heterogeneous many-core systems yield better performance.
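A quick worked example of Amdahl's Law shows why the serial fraction dominates at scale, and hence why a heterogeneous design (one fast core for serial code, many small cores for parallel code) can pay off:

```python
def amdahl_speedup(serial_fraction, n_cores):
    """Amdahl's Law: speedup = 1 / (f + (1 - f) / n) for serial fraction f."""
    f = serial_fraction
    return 1.0 / (f + (1.0 - f) / n_cores)

# Even with just 1% serial code, 1024 identical cores give only ~91x speedup.
# A heterogeneous design that accelerates the serial part shrinks f and
# recovers much of the lost scaling.
print(round(amdahl_speedup(0.01, 1024), 1))
```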

-The memory and storage challenge is a major consequence of the power challenge.

-The currently available main memories (DRAM) and disk drives (HDDs) consume far too much power.

-New technologies are needed.

-Exascale requires higher bandwidth.

-Higher bandwidth can be achieved through point-to-point connectivity between cores (new ways to connect cores are required).

-Chip multiprocessors (CMPs) provide greater inter-core bandwidth and lower inter-core latencies.

-Synchronization using transactional memory (to avoid locks).
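The idea behind transactional memory can be sketched with a toy optimistic-concurrency loop (this is an illustrative sketch, not a real STM implementation: the `TVar` class and `atomically` helper are hypothetical names, and a commit-time version check stands in for hardware conflict detection):

```python
import threading

class TVar:
    """Toy transactional variable: reads are optimistic; commits validate
    the version seen at read time and retry on conflict."""
    def __init__(self, value):
        self.value = value
        self.version = 0
        self._commit_lock = threading.Lock()  # held only at commit, not during work

def atomically(tvar, update):
    """Retry loop: read, compute lock-free, commit only if nobody else wrote."""
    while True:
        seen_version, seen_value = tvar.version, tvar.value
        new_value = update(seen_value)        # work done without holding any lock
        with tvar._commit_lock:
            if tvar.version == seen_version:  # no conflicting write: commit
                tvar.value = new_value
                tvar.version += 1
                return new_value
        # conflict detected: the transaction retries instead of blocking

counter = TVar(0)
threads = [threading.Thread(target=atomically, args=(counter, lambda v: v + 1))
           for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(counter.value)  # 8: every increment committed exactly once
```

The appeal for many-core machines is that threads never wait on a lock during their computation; they only pay a retry cost when an actual conflict occurs.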

-The problem with tightly coupled designs is that any delays in moving information from any node to any other node can cause a delay for all the nodes. In other words, small delays can quickly add up to big drops in performance.

-Resilience is the ability of a system (with such a huge number of components) to continue operating in the presence of faults.

-An exascale system must be truly autonomic in nature, constantly aware of its status, and optimizing and adapting itself to rapidly changing conditions, including failures of its individual components. 
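Why faults become routine at this scale follows from simple arithmetic: with independent component failures, the aggregate mean time between failures shrinks in proportion to the component count (the numbers below are hypothetical, for illustration only):

```python
# If each component fails independently with MTBF m hours, a system of
# n components sees a failure roughly every m / n hours -- which is why
# an exascale machine must treat faults as normal, expected events.
def system_mtbf_hours(component_mtbf_hours, n_components):
    return component_mtbf_hours / n_components

# Hypothetical numbers: million-hour parts, a million of them.
print(system_mtbf_hours(1_000_000, 1_000_000))  # 1.0 hour between failures
```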

Page 3:

Software issues for large scale parallel computing.

Security

Synchronicity

-Data input/output is a considerable problem on petascale machines.

As a trivial example, imagine a 100K-processor machine in which all processors try to open a file for reading. The resulting file-system storm would probably swamp any single-interface storage server. Furthermore, without intelligent file-system semantics, 100K copies of exactly the same file could be pushed through the network.
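One standard remedy is to let a single designated process read the file and distribute the contents to everyone else, as an MPI broadcast would. A minimal single-process sketch (the helper names are hypothetical; a real system would use an MPI `bcast` over the interconnect):

```python
def read_once_and_broadcast(ranks, read_file):
    """Rank 0 performs the single read; the result is broadcast to all,
    so the file server sees one open() instead of len(ranks) of them."""
    data = read_file()                        # exactly one storage access
    return {rank: data for rank in ranks}     # stand-in for an MPI broadcast

reads = []
def read_file():
    reads.append(1)                           # count storage accesses
    return b"config contents"

copies = read_once_and_broadcast(range(100_000), read_file)
print(len(reads), len(copies))                # 1 storage access, 100000 copies
```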

-The amount of data that can be generated by a petascale machine is staggering. There should be one dedicated I/O node for every 8 compute nodes.

-Obviously, no single fileserver can currently handle I/O in the range of 100 GB/s. Thus, file I/O must be parallelized. A dedicated parallel filesystem has become a standard component of leadership-class architectures.
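The core trick behind parallel filesystems is striping: spreading a file's blocks across many I/O servers so that reads and writes proceed in parallel. A minimal sketch of a round-robin layout (illustrative only; real parallel filesystems use more sophisticated placement):

```python
# Block striping sketch: block i of a file lives on server i mod n,
# so aggregate bandwidth scales with the number of I/O servers.
def stripe_layout(file_size, block_size, n_servers):
    """Map each block index to the server that stores it."""
    n_blocks = -(-file_size // block_size)    # ceiling division
    return {block: block % n_servers for block in range(n_blocks)}

layout = stripe_layout(file_size=10 * 2**20, block_size=2**20, n_servers=4)
print(layout)  # 10 one-MiB blocks spread round-robin over 4 servers
```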

-The TLB (translation lookaside buffer) is a cache used to improve the speed of virtual address translation. One major challenge is to avoid TLB thrashing.
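TLB pressure is driven by how many distinct pages an access pattern touches. A small illustration (4 KiB pages assumed; the page-counting helper is ours, not a real OS interface):

```python
# TLB thrashing illustrated: the number of distinct pages touched by an
# access pattern determines TLB pressure (assuming 4 KiB pages).
PAGE = 4096

def pages_touched(addresses):
    return len({addr // PAGE for addr in addresses})

n = 1024
sequential = [i * 8 for i in range(n)]      # 8-byte elements, unit stride
strided    = [i * PAGE for i in range(n)]   # one element per page

print(pages_touched(sequential))  # 2: all accesses fit in two pages
print(pages_touched(strided))     # 1024: every access lands on a fresh page
```

With a typical TLB of only a few hundred to a few thousand entries, the strided pattern forces constant translation misses, while the sequential one stays resident.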

-Cache pollution occurs when multiple programs compete for the same processor core's cache. Cache pollution hurts performance, and techniques to avoid it should be developed.

-The crash of one component in a browser, such as Acrobat Reader or Flash Player, should not cause the entire browser or, worse yet, the entire machine to falter.

-Also important is how the petascale OS coordinates its fault response with other parts of the system. The most common and robust method for providing fault tolerance in scientific applications is checkpoint/restart (CPR).
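The CPR pattern can be sketched in a few lines: periodically persist application state so that after a failure, work resumes from the last checkpoint rather than from the beginning (a minimal single-process sketch; real CPR coordinates checkpoints across thousands of nodes):

```python
import os, pickle, tempfile

def checkpoint(state, path):
    """Persist state via an atomic rename so a crash never leaves a torn file."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)

def restart(path, initial_state):
    """Resume from the last checkpoint if one exists, else start fresh."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return initial_state

ckpt = os.path.join(tempfile.mkdtemp(), "app.ckpt")
state = restart(ckpt, {"step": 0})
while state["step"] < 10:
    state["step"] += 1              # one unit of work
    if state["step"] % 5 == 0:
        checkpoint(state, ckpt)     # checkpoint every 5 steps

# A rerun after a "crash" resumes from the last checkpoint, not from step 0.
print(restart(ckpt, {"step": 0})["step"])  # 10
```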

-Increasing need for protection: with an enormous number of users, their privacy and security require safeguards.

Handling I/O

Fault Tolerance

Page 4:

References:

• Operating System Issues for Petascale Systems, Argonne National Laboratory. {beckman, iskra, kazutomo, smc}@mcs.anl.gov
http://ftp.mcs.anl.gov/pub/tech_reports/reports/P1332.pdf

• Software Challenges for Extreme Scale Computing: Going from Petascale to Exascale Systems, Michael A. Heroux, Sandia National Laboratories.
http://www.exascale.org/mediawiki/images/7/72/Heroux_IESP.pdf

• The Landscape of Parallel Computing Research: A View from Berkeley.
http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.pdf

• Irving Wladawsky-Berger http://blog.irvingwb.com/blog/

• Productive Petascale Computing: Requirements, Hardware, and Software

http://labs.oracle.com/techrep/2009/smli_tr-2009-183.pdf