Lars Arge
Massive Data
Pervasive use of computers and sensors; increased ability to acquire/store/process data
→ Massive data collected everywhere; society increasingly “data driven”
→ Access/process data anywhere, any time
Increased data availability: a “nano-technology-like” opportunity
Nature issue 2/06: “2020 – Future of computing”
Trend continues as, e.g., sensors become increasingly pervasive
→ Exponential growth of scientific data
Paradigm shift: science will be about mining data
→ Computer science paramount in all sciences
Algorithm Inadequacy
Importance of scalability/efficiency
→ Algorithmics is a core computer science area
Traditional algorithmics: transform input to output using a simple machine model
Inadequate with, e.g.:
Massive data
Small/diverse devices
Continually arriving data
→ Software inadequacies!
Communities addressing these inadequacies have emerged, but much work still needs to be done
Center for Massive Data Algorithmics
Established 2007; initially funded for 5 years by the National Research Foundation
High-level objectives:
Advance algorithmic knowledge in the “massive data” processing area
Train researchers in a world-leading international environment
Be a catalyst for multidisciplinary/industry collaboration
Building on:
Strong international team
Establish vibrant international environment (focus on people)
Focus areas:
I/O-efficient, streaming and cache-oblivious algorithms
Algorithm engineering
I/O-Efficient Algorithms
Problems involving massive data on disk
Disk access is 10^6 times slower than main memory access
Large access time is amortized by transferring large blocks of data
→ Important to store/access data so as to take advantage of blocks
I/O-efficient algorithms: move as few disk blocks as possible to solve a given problem
[Figure: disk drive showing a track, the magnetic surface, the read/write arm and the read/write head]
“The difference in speed between modern CPU and disk technologies is analogous to the difference in speed in sharpening a pencil using a sharpener on one’s desk or by taking an airplane to the other side of the world and using a sharpener on someone else’s desk.” (D. Comer)
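To make the block-counting cost measure concrete, here is a minimal Python sketch of a toy I/O model. The class `BlockDisk` and the functions `scan_ios` and `strided_ios` are hypothetical names introduced for illustration, and least-recently-used (LRU) replacement of memory blocks is an assumption of the sketch, not something the slide specifies. One I/O is counted each time an element is read from a block that is not currently in memory.

```python
from collections import OrderedDict


class BlockDisk:
    """Toy I/O model: data is stored in blocks of B elements and main memory
    holds M elements, i.e. M // B blocks. Fetching a block that is not in
    memory costs one I/O. LRU replacement is an assumption of this sketch."""

    def __init__(self, data, B, M):
        self.data = data
        self.B = B
        self.capacity = M // B          # number of blocks that fit in memory
        self.memory = OrderedDict()     # block id -> True, least recently used first
        self.ios = 0

    def read(self, i):
        block = i // self.B
        if block in self.memory:
            self.memory.move_to_end(block)        # hit: mark block as recently used
        else:
            self.ios += 1                         # miss: one block transfer
            if len(self.memory) >= self.capacity:
                self.memory.popitem(last=False)   # evict least recently used block
            self.memory[block] = True
        return self.data[i]


def scan_ios(N, B, M):
    """Read all N elements in storage order: about N / B I/Os."""
    disk = BlockDisk(list(range(N)), B, M)
    for i in range(N):
        disk.read(i)
    return disk.ios


def strided_ios(N, B, M):
    """Read elements so that consecutive reads hit different blocks.
    Assumes N is a multiple of B."""
    disk = BlockDisk(list(range(N)), B, M)
    order = [b * B + off for off in range(B) for b in range(N // B)]
    for i in order:
        disk.read(i)
    return disk.ios


print(scan_ios(10, 2, 4))     # 5 I/Os, i.e. N / B
print(strided_ios(10, 2, 4))  # 10 I/Os, i.e. N
```

The same N elements cost either N/B or N block transfers depending only on the order in which they are touched relative to the block layout.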
I/O-Efficient Algorithms Matter
Example: traversing a linked list (list ranking)
Array size N = 10 elements
Disk block size B = 2 elements
Main memory size M = 4 elements (2 blocks)
The difference between N and N/B is large since the block size is large
Example: N = 256 × 10^6, B = 8000, 1 ms disk access time
N I/Os take 256 × 10^3 sec ≈ 4266 min ≈ 71 hr
N/B I/Os take 256/8 sec = 32 sec
[Figure: the same 10-element list stored in two array layouts; Algorithm 1 traverses it with N = 10 I/Os, Algorithm 2 with N/B = 5 I/Os]
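The list-ranking example and the back-of-the-envelope timing can be replayed with a small simulation. In this sketch (hypothetical function names `traversal_ios` and `io_time_seconds`; LRU replacement assumed), list elements are labelled 1..N by their position in the list, `layout` gives their order in the array on disk, and the traversal visits 1, 2, ..., N. The two layouts used below are made-up illustrations, not the ones from the slide figure.

```python
def traversal_ios(layout, B, M):
    """Count block transfers when following a linked list stored on disk.

    layout[p] is the list element stored at array position p; elements are
    numbered by their rank in the list, so the traversal visits them in
    sorted order. Memory holds M // B blocks with LRU eviction (assumed)."""
    pos = {elem: p for p, elem in enumerate(layout)}
    capacity = M // B
    memory = []                 # block ids, least recently used first
    ios = 0
    for elem in sorted(layout):
        block = pos[elem] // B
        if block in memory:
            memory.remove(block)          # hit: will re-append as most recent
        else:
            ios += 1                      # miss: fetch block from disk
            if len(memory) == capacity:
                memory.pop(0)             # evict least recently used block
        memory.append(block)
    return ios


def io_time_seconds(num_ios, ms_per_io=1.0):
    """Total time for num_ios disk accesses at ms_per_io milliseconds each."""
    return num_ios * ms_per_io / 1000.0


scattered = [1, 6, 2, 7, 3, 8, 4, 9, 5, 10]  # consecutive elements in different blocks
blocked = list(range(1, 11))                 # consecutive elements share a block
print(traversal_ios(scattered, 2, 4))        # 10 I/Os (N)
print(traversal_ios(blocked, 2, 4))          # 5 I/Os (N/B)

N, B = 256 * 10**6, 8000
print(io_time_seconds(N) / 3600)             # about 71.1 hours
print(io_time_seconds(N // B))               # 32.0 seconds
```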
Streaming Algorithms
Problems involving truly massive data
Sequential read of disk blocks is much faster than random read
In many modern (sensor) applications data arrive continually
→ (Massive) problems often have to be solved in one sequential scan
Streaming algorithms: use a single scan, handle each element fast, using small space
[Figure: disk drive showing a track, the magnetic surface, the read/write arm and the read/write head]
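As one concrete instance of a single-scan, small-space algorithm, here is a sketch of the classic Misra-Gries frequent-items summary. It is chosen purely for illustration (the slide does not name a particular algorithm), and the function name `misra_gries` is hypothetical. It keeps at most k - 1 counters no matter how long the stream is.

```python
def misra_gries(stream, k):
    """One-pass frequent-items summary (Misra-Gries) using at most k - 1 counters.

    Every item occurring more than n / k times in a stream of length n is
    guaranteed to remain as a key. Other keys may be false positives, and
    each stored count underestimates the true count by at most n / k."""
    counters = {}
    for x in stream:                    # single sequential scan
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No free counter: decrement all counters, drop those reaching zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


print(misra_gries("abacabada", 3))  # {'a': 3}
```

Each decrement round cancels k stream occurrences (k - 1 counters plus the incoming item), which is why at most n / k rounds can happen and why the error bound holds; it also keeps the total work linear in the stream length.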
Cache-Oblivious Algorithms
Problems to be solved on unknown and/or changing devices
Block access is important on all levels of the memory hierarchy
But memory hierarchies are very diverse
Cache-oblivious algorithms: use blocks efficiently on all levels of any hierarchy
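A standard textbook illustration of the cache-oblivious idea is recursive matrix transposition. The sketch below (hypothetical names `transpose_co` and `transpose`) only shows the structure: Python lists of lists give no control over memory layout, so it illustrates the recursion rather than measuring cache behaviour. The cut-off `base` is a fixed constant that limits recursion overhead, not a tuned cache parameter, and the code assumes a non-empty rectangular matrix.

```python
def transpose_co(A, B, r0, r1, c0, c1, base=16):
    """Write the transpose of the sub-matrix A[r0:r1][c0:c1] into B,
    so that B[j][i] = A[i][j].

    Divide and conquer: split the longer dimension in half until the
    sub-block is small. No block or cache size appears anywhere; in the
    analysis, some recursion level produces sub-blocks that fit in each
    cache level, whatever its size."""
    rows, cols = r1 - r0, c1 - c0
    if rows <= base and cols <= base:
        for i in range(r0, r1):
            for j in range(c0, c1):
                B[j][i] = A[i][j]
    elif rows >= cols:
        mid = (r0 + r1) // 2            # split rows
        transpose_co(A, B, r0, mid, c0, c1, base)
        transpose_co(A, B, mid, r1, c0, c1, base)
    else:
        mid = (c0 + c1) // 2            # split columns
        transpose_co(A, B, r0, r1, c0, mid, base)
        transpose_co(A, B, r0, r1, mid, c1, base)


def transpose(A):
    """Return the transpose of a non-empty rectangular matrix A."""
    n, m = len(A), len(A[0])
    B = [[None] * n for _ in range(m)]
    transpose_co(A, B, 0, n, 0, m)
    return B


print(transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
```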
Algorithm Engineering
Algorithm engineering: design/implementation of practical algorithms; experimentation
Center motivated by theory inadequacy
Center wants to promote interdisciplinary/industry work
→ Natural to consider algorithm engineering
Algorithm engineering work can lead to practical breakthroughs
Example: flow simulation on massive terrain models, ~18 billion points at 1 meter resolution (>> 1 TB)
Implementation of I/O-efficient algorithm
→ from two weeks to three hours!
Center Team
International core team of algorithms researchers
Including top-ranked US and European groups
Leading expertise in focus areas:
AU: I/O, cache and algorithm engineering
MPI: I/O (graph) and algorithm engineering
MIT: cache and streaming
Core researchers:
AU: Arge, Brodal
MPI: Mehlhorn, Meyer
MIT: Demaine, Indyk
Center Activities
Visits of core researchers
Exchange of AU, MPI and MIT postdocs and PhD students
Visiting faculty, postdocs and students from other institutions
Frequent summer schools: Streaming algorithms this week!
Major international events:
25th Annual Symposium on Computational Geometry in 2009
New conference: Symposium on Algorithms for Massive Datasets
Theme workshops to foster multidisciplinary/industry collaboration
Initially funded for 5 years by the National Research Foundation
Collaboration between three leading international groups
Addresses inadequacies of traditional theory:
when processing massive data
when using diverse devices
Initial research focus on I/O-efficient, streaming and cache-oblivious algorithms
Also focus on algorithm engineering and multidisciplinary work
Focus on people: establish a vibrant international environment
www.madalgo.au.dk