Desperately Trying to Cope with the Data Explosion in Astronomical Sciences

Ray Norris
CSIRO Australia Telescope National Facility

Overview

• Background: astronomical data

• Good news

• Bad news

• Data Manifesto


Astronomical Data

Q: How did the first galaxies in the Universe form?

Need many wavelengths:

[Figure: source “c” at 3 cm wavelength]

The mysterious “source c”

[Figure: WFPC2 image; scale 2 arcsec]


The hard questions:

• Give me the WFPC image to normalise my spectral line cube
– Obviously best to do computation locally

• Give me every source in NED with J-K>4
– Obviously best to do computation at host

• “Give me the radio spectral indices (using ATCA data) of all the objects in SLOAN which have J-K>4 in available ESO/STScI databases”
– Some computations local, some on hosts
– VO needs to make sensible decisions
– VO needs grid computing standards

Terabyte database in Baltimore

Local megabyte dataset

NASA Extragalactic Database in Pasadena

Terabyte database in Sydney

Terabyte database in New Mexico

Multi-terabyte databases in Europe & US
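
The "sensible decision" about where to compute can be sketched as a byte-counting rule: run the job at whichever site moves the least data over the network. This is only an illustration, not the VO's actual scheduling method. The `Dataset` class, the function names, the site labels and all sizes below are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence

MB = 10**6
TB = 10**12


@dataclass(frozen=True)
class Dataset:
    name: str
    site: str        # where the data currently live
    size_bytes: int  # illustrative size, not a measured figure


def bytes_moved(inputs: Sequence[Dataset], compute_site: str,
                user_site: str, result_bytes: int) -> int:
    """Bytes that cross the network if the job runs at compute_site."""
    moved = sum(d.size_bytes for d in inputs if d.site != compute_site)
    if compute_site != user_site:
        moved += result_bytes  # the answer still has to reach the user
    return moved


def choose_compute_site(inputs: Sequence[Dataset], user_site: str,
                        result_bytes: int) -> str:
    """Pick the candidate site that minimises network traffic."""
    candidates = sorted({d.site for d in inputs} | {user_site})
    return min(candidates,
               key=lambda s: bytes_moved(inputs, s, user_site, result_bytes))


# Question 1: a small remote image and a local cube (assumed sizes).
cube = Dataset("spectral line cube", "local", 50 * MB)
wfpc = Dataset("WFPC image", "Baltimore", 10 * MB)
site_for_image = choose_compute_site([cube, wfpc], "local", 50 * MB)

# Question 2: a query over a terabyte catalogue, small result (assumed sizes).
ned = Dataset("NED catalogue", "Pasadena", 1 * TB)
site_for_query = choose_compute_site([ned], "local", 1 * MB)
```

With these assumed sizes the first question is computed locally and the second at the host, matching the first two bullets. The third question mixes both cases, which is why a real VO needs shared standards rather than a one-line rule.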


Good News

• The Virtual Observatory

• Astronomical Data Centres

• Public-domain data


The Virtual Observatory (VO)

• The FITS standard (~1980) paved the way for interoperability

• International Virtual Observatory Alliance involves all major astronomical observatories worldwide – IVOA established 2002

• VO is a collection of interoperating data archives and software tools which are linked to form a research environment in which astronomical research programs can be conducted.

• It includes terabyte distributed databases, data dictionaries, standards, protocols, tools, algorithms, web services, etc.


Examples of VO operations

Give me a list of all the objects which satisfy:

– Criterion A in the CDS database (in Strasbourg, France),

– Criterion B in the Parkes HIPASS survey (in Australia)

– Criterion C in the Hubble archive (in Baltimore, USA)

P.S.

– Each of these databases has a different format, coordinate system, and ontology, and each is several Tbyte in size.

– Metadata is of variable quality

– The object names will be different in each database.
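
Because object names differ between databases, one common way to link entries is by sky position rather than by name. The sketch below is a brute-force positional cross-match. It assumes both catalogues have already been converted to the same coordinate system and are given as `(name, ra_deg, dec_deg)` tuples. The function names and the 2 arcsec default radius are illustrative choices, not part of any VO standard.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula).

    All inputs are in degrees.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(catalogue_a, catalogue_b, radius_arcsec=2.0):
    """Pair each entry of A with its nearest entry of B inside the radius.

    Returns a list of (name_a, name_b, separation_arcsec) tuples.
    """
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for name_a, ra_a, dec_a in catalogue_a:
        best = None
        for name_b, ra_b, dec_b in catalogue_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (name_b, sep)
        if best is not None:
            matches.append((name_a, best[0], best[1] * 3600.0))
    return matches
```

The double loop costs one comparison per pair of entries, so it is only suitable for small samples. Terabyte catalogues need a spatial index, and the remaining problems on the slide (different formats, ontologies and metadata quality) are not addressed here at all.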


VO Status

• VO is not a project-managed project – it is a collaboration of different groups, with different drivers, but united by a common goal.

• Several groups worldwide are now defining standards, tools, protocols, etc.

• Some prototype tools and web services already available (e.g. http://www.aus-vo.org/services.html)

• More will become available over the next 1-2 years

• See http://www.ivoa.net/


Astronomical Data Centres

• Centre de Données astronomiques de Strasbourg, France (CDS)
– attempts to hold electronic copies of all published astronomical data, surveys, etc.

• NASA Astronomical Data Centre (ADC) Baltimore, USA

• NASA Extragalactic Database (NED)
– Interprets and combines extragalactic data

• Astrophysics Data System (ADS)
– All published astronomical literature

• Others


Public-domain data

Security, confidentiality, and IP protection are not major issues in astronomy, because most data are in the public domain. Hence the VO is interesting to Microsoft etc.


Bad News

• Intellectual Property controls

• Journal data

• Bad planning of new instruments

• Digital Divide

• Legacy data

• Lack of awareness

• "Why should I share my data with my competitors?"


Intellectual Property Protection

• Patents
– protect inventions

• Copyright
– protects written work and creative work

• Proposed database protection
– protects information (about anything)
– No “fair use” provisions
– You cannot cite someone else’s data without obtaining their permission
– Each paper will need a paper-trail showing rights to cite data


ICSU: International Council for Science

IAU, IUGG, etc.

CODATA: Committee on Data for Science and Technology

United Nations

WIPO: World Intellectual Property Organisation

National Representatives


Journal Data

• Most data published in journals never make it to the data centres

• When they do appear in data centres, they rarely carry the metadata or ontology that enable machine-understanding

• Journals need to impose standards (e.g. VOTable) on authors

Bad planning of new instruments

Many new instruments are planned without sufficient planning or funding for data management (decreasing scientific productivity).

Digital Divide

We take for granted instant access to literature and databases. Our colleagues in developing countries still dream of it (thus disadvantaging them even further).

Legacy data

Digitising old data competes for funding with new instruments.

Lack of awareness

BORING!


The Data Manifesto

http://www.ivoa.net/twiki/bin/view/Astrodata/AstronomersManifesto

We, the global community of astronomy, aspire to the following guidelines for managing astronomical data, believing that this would maximise the rate and cost-effectiveness of scientific discovery…


1. All major tables, images, and spectra published in journals should appear in the astronomical data centres.

• Journals should, in collaboration with data centres, define formats, table descriptions, and metadata that are easy for authors to adhere to, and can automatically be translated into a format (e.g. VOTable, FITS, etc) that can be entered by the data centre into their database.
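
As a rough idea of what an automatically translated table might look like, the sketch below writes a small VOTable-style XML document using only the Python standard library. It is deliberately simplified: it omits the XML namespace and schema declarations that a real VOTable carries, the version string is an assumption, and the column names, UCDs and the single data row are made-up placeholders rather than published values.

```python
import xml.etree.ElementTree as ET

# Column descriptions: name, type, unit and a UCD so software can tell
# what each column means. All entries are illustrative placeholders.
FIELDS = [
    {"name": "ID", "datatype": "char", "arraysize": "*", "ucd": "meta.id"},
    {"name": "RA", "datatype": "double", "unit": "deg", "ucd": "pos.eq.ra"},
    {"name": "DEC", "datatype": "double", "unit": "deg", "ucd": "pos.eq.dec"},
    {"name": "S_3cm", "datatype": "double", "unit": "mJy",
     "ucd": "phot.flux.density;em.radio"},
]

# One invented row, purely to show the layout.
ROWS = [("example-1", 150.0, -27.5, 1.2)]


def build_votable(fields, rows, table_name="published_table"):
    """Return a simplified VOTable-style XML string."""
    votable = ET.Element("VOTABLE", version="1.3")  # version is an assumption
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE", name=table_name)
    for field in fields:
        ET.SubElement(table, "FIELD", field)  # metadata for one column
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(votable, encoding="unicode")


xml_text = build_votable(FIELDS, ROWS)
```

The point of the example is the `FIELD` metadata: units and UCDs travel with the numbers, which is what lets a data centre ingest the table without a human reading the paper.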


2. All data obtained with publicly-funded observatories should, after appropriate proprietary periods, be placed in the public domain.

• Consistent with ICSU and OECD recommendations

• …to which Australia is a signatory


3. In any new major astronomical construction project, the data processing, storage, migration, and management requirements should be built in at an early stage of the project plan, and costed along with other parts of the project.

• Isn’t this obvious?
– apparently not!


4. Astronomers in all countries should have the same access to astronomical data and information.


5. Legacy astronomical data can be valuable, and high-priority legacy data should be preserved and stored in digital form in the data centres.

How do you prioritise?


6. The IAU should work with other international organisations to achieve our common goals and learn from our colleagues in other fields.

• Use bodies such as CODATA to cross-fertilise


But the major challenge to coping with the data explosion remains…


Why can’t someone else do it?