The data challenge in astronomy archives technology problems solution DCC conference Bath Andy Lawrence Sep 2005 the virtual observatory

The data challenge in astronomy

• archives

• technology

• problems

• solution

DCC conference Bath Andy Lawrence Sep 2005

the virtual observatory

astronomical archives

(1)

IT in astronomy : key areas

• (1) facility operations

• (2) facility output processing

• (3) shared supercomputers for theory

• (4) science archives

• (5) end-user tools

(1-3) : big bucks

(4-5) : smaller bucks but

- produce the final science output

- set requirements for (1-2)

astronomical archives

• major archives growing at TB/yr

• issue not storage but management (curation)

• improving quality of data access and presentation

• needs specialist data centres

[Figure: ESO Archive Volume (GB), logarithmic axis from 1 to 10000]

end users

• increasing fraction of archive re-use

• increasing multi-archive use

• most download small files and analyse at home

• some users process whole databases

• reduction standardised; analysis home grown

needles in a haystack

Hambly et al 2001

- faint moving object is a cool white dwarf

- may be solution to the dark matter problem

- but hard to find : one in a million

- even harder across multiple archives

failed stars

compare optical and infra-red

extra object is very cold

a "brown dwarf" or failed star
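The optical versus infra-red comparison on this slide can be sketched as a positional cross-match: keep the infra-red sources that have no optical counterpart nearby. This is only an illustration under stated assumptions. The function names, the 1 arcsecond match radius and the source positions are all made up here and are not taken from the talk or from any survey.

```python
import math

def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation in arcseconds between two positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, which stays accurate for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600

def infrared_only(ir_sources, optical_sources, radius_arcsec=1.0):
    """Return names of infra-red sources with no optical source within the radius."""
    unmatched = []
    for name, ra, dec in ir_sources:
        has_match = any(
            separation_arcsec(ra, dec, o_ra, o_dec) <= radius_arcsec
            for _, o_ra, o_dec in optical_sources
        )
        if not has_match:
            unmatched.append(name)
    return unmatched

# Hypothetical positions (name, RA in degrees, Dec in degrees), for illustration only.
optical = [("opt-1", 10.0000, -5.0000), ("opt-2", 10.0100, -5.0100)]
infrared = [
    ("ir-1", 10.0001, -5.0000),
    ("ir-2", 10.0100, -5.0101),
    ("ir-3", 10.0200, -5.0200),
]

candidates = infrared_only(infrared, optical)
print(candidates)
```

The brute-force double loop is fine for a handful of sources. On billion-row catalogues a spatial index would be needed, which is part of why such searches suit data centre services.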

multi-wavelength views of a Supernova Remnant

Shocks seen in the X-ray

Heavy elements seen in the optical

Dust seen in the IR

Relativistic electrons seen in the radio

solar-terrestrial links

Coronal mass ejection imaged by space-based solar observatory

Effect detected hours later by satellites and ground radar

background technology

(2)

dogs and fleas

• there is a very large dog

hardware trends

• ops, storage, bw : all 1000x/decade

– can get 1TB IDE = $5K

– backbones and LANS are Gbps

• but device bw 10x/decade

– real PC disks 10MB/s; fibre channel SCSI poss 100MB/s

• and last mile problem remains

– end-end b/w typically 10Mbps

[Figure: ops per second/$ against year, 1880 to 2000, on a log scale from 1.E-06 to 1.E+09; annotations: doubles every 7.5 years, doubles every 2.3 years, doubles every 1.0 years]
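The growth figures on this slide are quoted two ways, as a factor per decade and as a doubling time. A short sketch of the conversion shows they agree: doubling every year is about 1000x per decade, and 10x per decade is a doubling roughly every three years. The function names are illustrative only.

```python
import math

def growth_per_decade(doubling_time_years):
    """Growth factor over ten years for a quantity with the given doubling time."""
    return 2 ** (10 / doubling_time_years)

def doubling_time_years(factor_per_decade):
    """Doubling time in years for a quantity that grows by the given factor per decade."""
    return 10 * math.log(2) / math.log(factor_per_decade)

# Doubling times annotated on the ops per second/$ plot.
for t in (7.5, 2.3, 1.0):
    print(f"doubling every {t} years -> x{growth_per_decade(t):.1f} per decade")

# Per-decade factors quoted in the bullets.
print(f"1000x per decade -> doubling every {doubling_time_years(1000):.2f} years")
print(f"10x per decade   -> doubling every {doubling_time_years(10):.2f} years")
```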

operations on a TB database

• searching at 10 MB/s takes a day

– solved by parallelism

– but development non-trivial ==> people

• transfer at 10 Mbps takes a week

– leave it where it is

• ==> data centres provide search and analysis services
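The two timings above follow from simple division. A rough check, assuming a decimal terabyte (10^12 bytes) and ideal sustained rates with no protocol overhead, gives a little over a day for the scan and about nine days for the transfer, so of order a week. The 100-disk parallel case is a hypothetical number chosen for illustration, not a figure from the talk.

```python
TB = 1e12              # bytes in one terabyte (decimal definition, an assumption)
SECONDS_PER_DAY = 86400

def seconds_to_scan(size_bytes, rate_bytes_per_s):
    """Time to read every byte once at a sustained disk rate."""
    return size_bytes / rate_bytes_per_s

def seconds_to_transfer(size_bytes, rate_bits_per_s):
    """Time to move the data over a network link quoted in bits per second."""
    return size_bytes * 8 / rate_bits_per_s

scan_days = seconds_to_scan(TB, 10e6) / SECONDS_PER_DAY          # 10 MB/s disk
transfer_days = seconds_to_transfer(TB, 10e6) / SECONDS_PER_DAY  # 10 Mbps link

# Hypothetical parallel case: the same scan spread evenly over 100 disks.
parallel_minutes = seconds_to_scan(TB, 10e6) / 100 / 60

print(f"scan at 10 MB/s:      {scan_days:.2f} days")
print(f"transfer at 10 Mbps:  {transfer_days:.2f} days")
print(f"scan on 100 disks:    {parallel_minutes:.1f} minutes")
```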

network development

• higher level protocols ==> transparency

• TCP/IP : message exchange

• HTTP : doc sharing (web)

• grid suite : CPU sharing

• XML/SOAP : data exchange

==> service paradigm

next up on the internet

• workflow definition

• dynamic semantics (ontology)

• software agents

the problems

(3)

data growth

• astronomical data is growing fast

• but so is computing power

• so what's the problem ?

(1) Heterogeneity

(2) End user delivery

(3) End user demand

data rich future

• heritage

– Schmidt, IRAS, Hipparcos

• current hits

– VLT, SDSS, 2MASS, HST, Chandra, XMM, WMAP

• coming up :

– UKIDSS, VISTA, ALMA, JWST, Planck, Herschel

• cross fingers :

– LSST, ELT, Lisa, Darwin, SKA, XEUS, etc.

• plus lots more

• issue is archive interoperability

– need standards and transparent infrastructure

archive data rates

• map the sky : 0.1" x 16 bits = 100 TB

• process to find objects : billion row tables

• VISTA 100 TB/yr by 2007

• SKA datacubes 100PB/yr by 2020

• not a technical or financial problem

– LHC doing 100PB/yr by 2007

• issue is logistic : data management

• need professional data centres
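The "map the sky" figure can be checked with a short calculation. The sketch below assumes a single uncompressed all-sky image in one band, with square 0.1 arcsecond pixels and a decimal terabyte. Under those assumptions the total comes out at roughly 107 TB, consistent with the 100 TB quoted.

```python
import math

# Area of the whole sky in square degrees: 4*pi steradians converted to deg^2.
SKY_SQ_DEG = 4 * math.pi * (180 / math.pi) ** 2

def sky_map_bytes(pixel_arcsec, bits_per_pixel):
    """Bytes needed for one uncompressed all-sky image with square pixels."""
    pixels_per_sq_deg = (3600 / pixel_arcsec) ** 2
    return SKY_SQ_DEG * pixels_per_sq_deg * bits_per_pixel / 8

size_tb = sky_map_bytes(0.1, 16) / 1e12
print(f"whole sky: {SKY_SQ_DEG:.0f} square degrees")
print(f"0.1 arcsec pixels at 16 bits: {size_tb:.0f} TB")
```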

data rates : user delivery

• disk I/O and bandwidth

– end-user bottlenecks will get WORSE

– but links between data centres can be good

• move from download to service paradigm

– leave the data where it is

– operations on data (search, cluster analysis, etc) as services

– shift the results not the data

– networks of collaborating data centres (datagrid or VO)

user demands

• bar constantly rising

– online ease

– multi-archive transparency

– easy data intensive science

• new requirements

– automated resource discovery (intelligent Google)

– cheap I/O and CPU cycles

– new standards and software infrastructure

the virtual observatory

(4)

the VO concept

• web : all docs in the world inside your PC

• VO : all databases in the world inside your PC

Generic science drivers

• data growth

• multi-archive science

• large database science

can do all this now, but needs to be fast and easy

• empowerment

Beijing as good as Berkeley

what it's not

• not a monolith

• not a warehouse

VO framework

• framework + standards

• inter-operable data

• inter-operable software modules

• no central VO-command

- it's not a thing

- it's a way of life

VO geometry

• not a warehouse

• not a hierarchy

• not a peer-to-peer system

• small set of service centres and large population of end users

– note : latest hot database lives with creators / curators

yesterday

[Diagram: browser front end, web page, DB engine; links labelled CGI request / html and SQL / data]

today

[Diagram: application (anything), web service, DB engine; links labelled SOAP/XML request / SOAP/XML data and SQL / native data; caption: standard formats]
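The "today" picture can be sketched in a few lines: an application sends an XML request, the service turns it into SQL next to the data, and the rows come back in an agreed XML layout. This is a toy under stated assumptions, not the SOAP protocol and not any IVOA format. The element names (`query`, `minMag`, `table`, `row`), the table schema and the rows are all hypothetical, and an in-memory SQLite database stands in for the data centre's DB engine.

```python
import sqlite3
import xml.etree.ElementTree as ET

def build_archive():
    """Stand-in for the data centre's DB engine, holding made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sources (name TEXT, ra REAL, dec REAL, mag REAL)")
    conn.executemany(
        "INSERT INTO sources VALUES (?, ?, ?, ?)",
        [
            ("src-1", 10.0, -5.0, 18.2),
            ("src-2", 10.5, -5.2, 21.7),
            ("src-3", 11.0, -4.8, 22.4),
        ],
    )
    return conn

def handle_request(conn, request_xml):
    """Parse an XML request, run the SQL beside the data, return an XML table."""
    request = ET.fromstring(request_xml)
    min_mag = float(request.findtext("minMag"))
    rows = conn.execute(
        "SELECT name, ra, dec, mag FROM sources WHERE mag >= ? ORDER BY name",
        (min_mag,),
    ).fetchall()
    table = ET.Element("table")
    for name, ra, dec, mag in rows:
        # Native rows are re-expressed in the agreed exchange layout.
        ET.SubElement(table, "row", name=name, ra=str(ra), dec=str(dec), mag=str(mag))
    return ET.tostring(table, encoding="unicode")

# The "application" side: any client that can write and read the XML.
conn = build_archive()
response = handle_request(conn, "<query><minMag>21.0</minMag></query>")
names = [row.get("name") for row in ET.fromstring(response)]
print(names)
```

Only the matching rows cross the network, so the client never needs to know the native storage format or hold the full table.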

tomorrow

[Diagram: application (anything) and many grid connected web services, each publishing WSDL; links labelled job / results; shared components: Registry, Workflow, GLUE, Certification, VO Space; caption: standard semantics]

publishing metaphor

• facilities are authors

• data centres are publishers

• VO portals are shops

• end-users are readers

• VO infrastructure is distribution system.

International VO alliance (IVOA)

IVOA standards

• formal process modelled on W3C

• technical working groups and interop workshops

• agreed functionality roadmap

IVOA standards

• key standards so far

– table formats

– resource and service metadata definitions

– semantic dictionary

– protocols for image and spectrum access

• coming year

– grid and web service interfaces

– authentication

– storage sharing protocols

– application metadata and interfaces

state of implementations

• key projects : AstroGrid, US-NVO, Euro-VO

• many compliant data services

• VO aware tools

• mutually harvesting registries

• workflow system

• simple shared storage

• AstroGrid has ~100 registered users

• first science results coming out

coming year

• single sign on

• internationally shared storage

• NGS link up

• many more tools

next steps

• intelligent glue

– ontology, agents

• analysis services

– cluster analysis, multi-D visualisation, etc

• theory services

– simulated data, models on demand

• embedding facilities

– VO ready facilities

– links to data creation

lessons

lessons

• drivers: end user bottleneck, end user demand, empowerment

• need network of healthy data centres

• need last mile investment

• need facilities to be VO ready

• need continuing technology development

• need continuing standards programme

FIN