matt-wood
An introduction to cloud computing from a scientific research perspective.
Into the Wonderful
Towards a Virtual Institute
you are here
Data
Lots of data
Lots of data, lots of people
Lots of data, lots of people, lots of compute
Lots of data, lots of people, lots of compute, lots of uses
Lots of data, lots of people, lots of compute, lots of uses, lots and lots and lots and lots...
Trillionics
A platform for science
1 Get
2 Select
3 Work
4 Save
Work is the killer app
get here quickly
Work = publications
Problematic for complex data
1 Get: flat files / databases
2 Select: scripts / directories
3 Work: interesting
4 Save: flat files / databases
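The four steps can be sketched in a few lines of Python. This is only an illustration, assuming a small in-memory CSV stands in for the flat file; the column names (sample, value), the minimum threshold and the mean used as the "work" are all hypothetical and not taken from the slides.

```python
import csv
import io

def get(source):
    """1 Get: read records from a flat file (here an in-memory CSV)."""
    return list(csv.DictReader(source))

def select(records, minimum):
    """2 Select: keep only the records worth working on (hypothetical rule)."""
    return [r for r in records if float(r["value"]) >= minimum]

def work(records):
    """3 Work: the interesting part; a mean is used purely as a stand-in."""
    values = [float(r["value"]) for r in records]
    mean = sum(values) / len(values) if values else None
    return {"n": len(values), "mean": mean}

def save(summary, sink):
    """4 Save: write the result back out as a flat file."""
    writer = csv.DictWriter(sink, fieldnames=["n", "mean"])
    writer.writeheader()
    writer.writerow(summary)

# Made-up input data, used only to exercise the four steps.
source = io.StringIO("sample,value\na,1.0\nb,5.0\nc,9.0\n")
sink = io.StringIO()

save(work(select(get(source), minimum=2.0)), sink)
print(sink.getvalue())
```

Only work() carries the science in this sketch; get, select and save are plumbing around it.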
Get Filter Work Save
Get
Filter
Work
Save
Get
Work
Virtualise
Get Save
Data platform
Work
App platform
Data accessible via services
Applications accessible via services
Data platform
Work
App platform
Get / Save
Projects / SNP calling
Distribute
Data platform
Work
App platform
Get / Save
Hinxton
San Diego
Distributed storage
Virtualised services
Application programming interfaces
Work
Getters Filters Savers
[Diagram: many Work units]
A distributed mindset
map/reduce
1. map
@a = [ 1, 2, 3 ]
@result = []

for each $value in @a
  push @result, map($value)
end

sub map($incoming)
  return ($incoming * 10)
end
2. reduce
reduce(@result)

sub reduce($r)
  <transform $r>
end
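The same two steps as a runnable Python sketch. The slide leaves the reduce transform open, so summing is an assumption made here for illustration; times_ten and add are hypothetical names.

```python
from functools import reduce

def times_ten(incoming):
    """Map step: transform one value on its own, as on the slide."""
    return incoming * 10

def add(total, value):
    """Reduce step (assumed): the slide does not say which transform, so we sum."""
    return total + value

a = [1, 2, 3]

# 1. map: apply the function to every value independently
result = [times_ten(value) for value in a]

# 2. reduce: fold the mapped values into a single answer
total = reduce(add, result, 0)

print(result)  # [10, 20, 30]
print(total)   # 60
```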
independent
of each other!
of array size!
independent
distribute across virtual machines!
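Because each map call depends only on its own input, the array can be cut into chunks and handed to separate workers. A minimal sketch, assuming a local thread pool stands in for the virtual machines and a sum stands in for the reduce step; chunks, map_chunk and distributed_sum are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def times_ten(incoming):
    """Map step from the slide: depends only on its own input."""
    return incoming * 10

def chunks(values, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]

def map_chunk(chunk):
    """One worker's job: map its chunk, then reduce it to a partial sum."""
    return sum(times_ten(value) for value in chunk)

def distributed_sum(values, workers=4, chunk_size=2):
    # The thread pool is only a stand-in for virtual machines (assumption).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks(values, chunk_size)))
    # Final reduce over the partial results from each worker.
    return sum(partials)

print(distributed_sum([1, 2, 3]))  # 60
print(distributed_sum(list(range(1, 101)), workers=8, chunk_size=10))  # 50500
```

The answer does not change with the number of workers or the chunk size, which is the point of the slides above: the work is independent of array size and of the other values.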
Prerequisites
Open data
easy to get at data
Open APIs
software as a service
Beyond SQL
Accessibility
24/7
East coast
West coast
Down the corridor
Reliability
Build for flux
Authentication
Privacy
Less software
Distribute everything
Replicate everything
Speed. Redundancy.
Will it scale?
Oh yes
New York Times
11 million TIFs
24 hours
$500
Google, Yahoo!, Amazon
We are here
We need to start now
2X
150Tb/week
We need to start now
as in, like, yesterday
Petabyte journal club
foomongers.org.uk
Thank you
GREENISGOOD.CO.UK