Into The Wonderful

Towards a Virtual Institute

you are here

Lots of data, lots of people, lots of compute, lots of uses, lots and lots and lots and lots...

Trillionics

A platform for science

1 Get

2 Select

3 Work

4 Save

Work is the killer app

get here quickly

Work = publications

Problematic for complex data

1 Get: flat files / databases

2 Select: scripts / directories

3 Work: interesting

4 Save: flat files / databases
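The four steps above can be sketched on a single flat file. This is only an illustration, not the workflow the talk describes: the file names, the `chromosome`/`start`/`end` columns and the "length" calculation are all made up for the example.

```python
import csv
import os
import tempfile

def get(path):
    """1 Get: read a flat file into a list of records."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))

def select(records, chromosome):
    """2 Select: keep only the records we care about."""
    return [r for r in records if r["chromosome"] == chromosome]

def work(records):
    """3 Work: the interesting part (here, a hypothetical length calculation)."""
    return [{**r, "length": int(r["end"]) - int(r["start"])} for r in records]

def save(records, path):
    """4 Save: write the results back to a flat file (expects at least one record)."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)

# Hypothetical input data, written to a temporary directory.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "features.csv")
with open(source, "w", newline="") as handle:
    handle.write("chromosome,start,end\n1,100,250\n2,5,30\n1,400,460\n")

saved = os.path.join(workdir, "lengths.csv")
save(work(select(get(source), "1")), saved)
print(get(saved))
```

Only `work` is specific to the science; `get`, `select` and `save` are plumbing that every project ends up rewriting.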

Get Filter Work Save

Get Work

Virtualise

Get Save

Data platform

Work

App platform

Data accessible via services

Applications accessible via services

Data platform

Work

App platform

Get / Save

Projects / SNP calling

Distribute

Data platform

Work

App platform

Get / Save

Hinxton, San Diego

Distributed storage

Virtualised services

Application programming interfaces

Work

Getters Filters Savers

[Diagram: many Work nodes]

A distributed mindset

map/reduce

1. map

@a = [ 1, 2, 3 ]
@result = []

for each $value in @a
  push @result, map($value)
end

sub map($incoming)
  return ($incoming * 10)
end

2. reduce

reduce(@result)

sub reduce($r)
  <transform $r>
end
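A runnable version of the two steps above, as a minimal Python sketch. The slide leaves the reduce transform unspecified, so summing the mapped values is an assumption made here for illustration. The map function is called `times_ten` because `map` is already a Python built-in.

```python
from functools import reduce

def times_ten(incoming):
    """The slide's map step: transform one value without looking at any other."""
    return incoming * 10

def add(total, value):
    """An assumed reduce step: fold the mapped values into a single sum."""
    return total + value

a = [1, 2, 3]

# 1. map: apply the function to every element
result = [times_ten(value) for value in a]

# 2. reduce: combine the mapped values into one answer
total = reduce(add, result, 0)

print(result)  # [10, 20, 30]
print(total)   # 60
```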

independent of array size!

independent of each other!

distribute across virtual machines!
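Because no map call depends on any other, the array can be cut into chunks and each chunk handed to a separate worker. In this rough sketch, threads in one process stand in for virtual machines, which is an assumption made to keep the example self-contained. The chunking scheme and worker count are likewise illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def times_ten(incoming):
    """The map step: depends only on its own input."""
    return incoming * 10

def map_chunk(chunk):
    """What one worker does: map every value in its own chunk."""
    return [times_ten(value) for value in chunk]

def split(values, n_chunks):
    """Cut the list into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]

def distributed_map(values, n_workers=3):
    """Map each chunk on its own worker, then stitch the results back in order."""
    chunks = split(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        mapped_chunks = list(pool.map(map_chunk, chunks))
    return [value for chunk in mapped_chunks for value in chunk]

a = list(range(1, 11))
distributed = distributed_map(a)
serial = [times_ten(value) for value in a]
print(distributed == serial)  # True
```

The result matches the serial loop whatever the array size or the number of workers, which is the property the slides are pointing at.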

Prerequisites

Open data

easy to get at data

Open APIs

software as a service

Beyond SQL

Accessibility

24/7

East coast

West coast

Down the corridor

Reliability

Build for flux

Authentication

Privacy

Less software

Distribute everything

Replicate everything

Speed. Redundancy.

Will it scale?

Oh yes

New York Times

11 million TIFs

24 hours

$500
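Taking the three figures above at face value, the implied throughput and unit cost are quick to work out. The calculation assumes the full 24 hours and the full $500 were used.

```python
tifs = 11_000_000          # files converted (from the slide)
hours = 24                 # wall-clock time (from the slide)
dollars = 500              # total cost (from the slide)

tifs_per_second = tifs / (hours * 3600)
dollars_per_tif = dollars / tifs

print(f"{tifs_per_second:.0f} TIFs per second")
print(f"${dollars_per_tif:.6f} per TIF")
```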

Google, Yahoo!, Amazon

We are here

We need to start now

2X

150Tb/week

We need to start now

as in, like, yesterday

Petabyte journal club

foomongers.org.uk

Thank you

GREENISGOOD.CO.UK
