Multi-core and tera-scale computing
A short overview of benefits and challenges
CSC 2007
Andrzej Nowak, CERN, 28.08.2007
Multi-core and tera-scale computing - Andrzej Nowak, CERN 3
The “free” bonus
> Silicon technology advances more quickly than design capabilities
> Single-CPU complexity is rising slowly
> Moving from 90nm and 65nm processes to 45nm and 32nm processes
> Free transistors are available: take all you want… eat all you take
The multi-core revolution
> What do we do with the extra silicon? Copy what we already have
> First shot at the PC consumer market: Intel’s Hyper-Threading in the Xeon and Pentium 4 (SMT)
  - Idea: do work while nothing else is happening
  - Some resources in the CPU core were shared
  - The relation to the extra space on the die was not direct
> First popular dual-core CPU for Joe Average: the Intel Core Duo
  - Idea: copy a big part of the processor
  - Fewer resources are shared
> Next generations of x86-like CPUs are coming: 6, 8, 16 cores
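Taking advantage of those extra cores means splitting work into independent pieces. A minimal sketch with the Python standard library (the prime-counting workload and chunk sizes are invented for illustration):

```python
# Minimal sketch: spreading an embarrassingly parallel loop over all
# available cores. One worker process per core; each process runs on
# its own core, side-stepping CPython's global interpreter lock.
import os
from concurrent.futures import ProcessPoolExecutor

def count_primes(limit):
    """CPU-bound toy workload: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def parallel_count(limits):
    # Each element of `limits` becomes one task; the pool distributes
    # the tasks across the cores.
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        return sum(pool.map(count_primes, limits))

if __name__ == "__main__":
    print(parallel_count([10_000] * 4))  # four independent chunks of work
```

On a dual-core Core Duo this roughly halves the wall-clock time of the sequential loop, provided the chunks are large enough to amortize the process start-up cost.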
Multi-core designs
> Many other multi-core CPUs are on the market
  - AMD X2 (and X4 coming soon)
  - ARM specifications for multi-core CPUs (your iPod is dual core!)
  - Sun’s Niagara processor (8 cores)
  - The Cell processor in PlayStation 3 units
> Programmers need to take advantage of the new features
  - CERN openlab and Intel are organizing a multi-threading and parallelism workshop at the beginning of October!
Tera-scale computing
> Computer performance is traditionally expressed in FLOPS (floating-point operations per second)
  - CDC 6600 (1966): 10 MFLOPS, 64 kB memory
  - Your iPod: 100 MFLOPS
  - Your iMac: 3-4 GFLOPS
  - Your graphics card: 300-500 GFLOPS
> Not so far from the magical limit of 1 TFLOPS…? Hence the name, tera-scale
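Some quick arithmetic on the ratings above puts the scale in perspective: the time to finish a fixed amount of floating-point work is just work divided by rate. A small sketch (the ratings are the rough figures from this slide, not measurements):

```python
# Peak ratings from the slide, in FLOPS (floating-point operations/second).
RATINGS = {
    "CDC 6600 (1966)": 10e6,     # 10 MFLOPS
    "iPod": 100e6,               # 100 MFLOPS
    "iMac": 3e9,                 # ~3 GFLOPS
    "graphics card": 400e9,      # ~400 GFLOPS
    "tera-scale target": 1e12,   # 1 TFLOPS
}

WORK = 1e12  # one trillion floating-point operations

def seconds(flops):
    """Time to finish WORK operations at a sustained rate of `flops`."""
    return WORK / flops

for name, rating in RATINGS.items():
    print(f"{name}: {seconds(rating):.3g} s")
```

The same job that a tera-scale machine finishes in one second would keep the CDC 6600 busy for about a day, more than a 100,000x gap in forty years.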
Processors in GPUs (a digression)
> Newest trend: heavily multi-core (up to 128 cores)
> Blazing fast
> Toolkits are available (e.g. NVIDIA CUDA)
> But…
  - Floating-point operations are not precise enough, or are non-standard
  - Data types are limited
  - Memory handling is not optimized for general-purpose computing
  - Tiny cache, if any at all
  - ~150 W… for the chip alone
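The precision point is easy to demonstrate: GPUs of this era largely compute in 32-bit floats, which carry only about 7 decimal digits versus roughly 16 for the 64-bit doubles scientific codes usually assume. A standard-library sketch, round-tripping a Python double through the IEEE 754 single-precision format:

```python
import struct

def to_float32(x):
    """Round a Python float (64-bit) to the nearest representable 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

x = 0.1
print(abs(to_float32(x) - x))  # error on the order of 1e-9
```

An error of 1e-9 per operation sounds small, but it can accumulate quickly over the billions of operations a long-running simulation performs, which is one reason these chips were hard to use for general-purpose science.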
Tera-scale computing, continued
> Intel’s Polaris 80-core prototype: ~1 TFLOPS
> Intel’s Larrabee design: a 16-24 core x86-GPU hybrid, ~3 TFLOPS
> Research directions
  - How do you feed 80 hungry cores?
  - Parallelism: fine grained or coarse?
  - Effective virtualization
  - Memory access and bus optimization
  - Resource sharing
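The fine- vs coarse-grained question can be made concrete: the same reduction can be split into one task per element (fine grained, maximum flexibility, maximum scheduling overhead) or one task per core-sized chunk (coarse grained, minimal overhead, less load balancing). A sketch of the two decompositions (the chunking helper is illustrative, not any particular scheduler's API):

```python
def chunks(data, n_chunks):
    """Split `data` into `n_chunks` contiguous, near-equal slices."""
    size, rem = divmod(len(data), n_chunks)
    out, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < rem else 0)  # spread the remainder
        out.append(data[start:end])
        start = end
    return out

data = list(range(1000))
fine_tasks = [[x] for x in data]   # fine grained: 1000 tiny tasks
coarse_tasks = chunks(data, 8)     # coarse grained: 8 tasks for 8 cores

# Both decompositions compute the same answer; they differ only in
# how much work the scheduler has to hand out.
assert sum(sum(t) for t in fine_tasks) == sum(sum(t) for t in coarse_tasks)
print(len(fine_tasks), len(coarse_tasks))
```

With 80 cores the trade-off sharpens: too coarse and cores sit idle, too fine and the machine spends its time scheduling instead of computing.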
Questions for the future
>How many cores does your mother need?
>How many cores do you, a scientist, need?
>How do you effectively use what you have?
>What is the best level to introduce parallelism? Do you need to redesign your software?
> GRID computing or tera-scale homogeneous computers? Will virtualization be effective enough?
Q&A (1 Swiss minute)
This research project has been supported by a Marie Curie Early Stage Research Training Fellowship of the European Community’s Sixth Framework Programme under contract number MEST-CT-2004-504054.