67
Scientific Databases Lecture: Virtual Observatories for Space Science Dr. Kirk Borne, GMU SCS November 18, 2003 GMU CSI 710

November 18

Embed Size (px)

Citation preview

Page 1: November 18

Scientific Databases Lecture:Virtual Observatories for Space Science

Dr. Kirk Borne, GMU SCS

November 18, 2003

GMU CSI 710

Page 2: November 18

11/18/2003 Virtual Observatories for Space Science 2

Outline

• Quick Review of Astronomy Data • The National Virtual Obseratory (NVO)• Other Virtual Observatories for Space Science• Why Virtual Observatories?• NVO – It’s all about the Science:

– IT-enabled, Science-enabling

• The Enabling Computational Science Technologies for the NVO – where you can help!

• Distributed Data Mining in the NVO

Page 3: November 18

11/18/2003 Virtual Observatories for Space Science 3

The Nature of Astronomical Data

• Imaging– 2D map of the sky at multiple wavelengths

• Derived catalogs– subsequent processing of images

– extracting object parameters (400+ per object)

• Spectroscopic follow-up– spectra: more detailed object properties

– clues to physical state and formation history

– lead to distances: 3D maps

• Numerical simulations

• All inter-related!

Page 4: November 18

11/18/2003 Virtual Observatories for Space Science 4

NOAO Deep Wide-Field Survey: http://www.noao.edu/noao/noaodeep/

Page 5: November 18

11/18/2003 Virtual Observatories for Space Science 5

NOAO Deep Wide-Field Survey: http://www.noao.edu/noao/noaodeep/

Page 6: November 18

11/18/2003 Virtual Observatories for Space Science 6

NOAO Deep Wide-Field Survey: http://www.noao.edu/noao/noaodeep/

Page 7: November 18

11/18/2003 Virtual Observatories for Space Science 7

NASA Astronomy Mission Data:the tip of the data mountain

http://nssdc.gsfc.nasa.gov/astro/astrolist.html

NSSDC’sastrophysicsdataholdings:

One of manyscience datacollectionsfor astronomyacross the USand the world!

NSSDC =NationalSpace ScienceData Center@ NASA/GSFC

Page 8: November 18

11/18/2003 Virtual Observatories for Space Science 8

“Quote of the day”

• “It's just as unpleasant to get more than you expected as it is to get less.”

– George Bernard Shaw

Page 9: November 18

11/18/2003 Virtual Observatories for Space Science 9

Why so many Telescopes? …

Many great astronomicaldiscoveries have comefrom inter-comparisonsof various wavelengths:- Quasars- Gamma-ray bursts- Ultraluminous IR galaxies- X-ray black-hole binaries- Radio galaxies- . . .

Overlay

Because …

Page 10: November 18

11/18/2003 Virtual Observatories for Space Science 10

Therefore, our science dataarchive systems should enablemulti-wavelength interdisciplinarydistributed database access, discovery, mining, and analysis.

Page 11: November 18

11/18/2003 Virtual Observatories for Space Science 11

How does one integrate and use these distributed data archives? …

Page 12: November 18

11/18/2003 Virtual Observatories for Space Science 12

Emerging Computational Environment

• Standardizing distributed data– Web Services, supported on all platforms

– Custom configure remote data dynamically

– XML: Extensible Markup Language

– SOAP: Simple Object Access Protocol

– WSDL: Web Services Description Language

– UDDI: Universal Description, Discovery and Integration

• Standardizing distributed computing– Grid Services

– Custom configure remote computing dynamically

– Build your own remote computer, use it, then discard it

– Virtual Data: new data sets on demand

Page 13: November 18

11/18/2003 Virtual Observatories for Space Science 13

…The National Virtual Observatory (NVO)

• National Academy of Sciences “Decadal Survey” recommended

NVO as highest priority small (<$100M) project :

“Several small initiatives recommended by the committee span both ground

and space. The first among them—the National Virtual Observatory (NVO)—

is the committee’s top priority among the small initiatives. The NVO will

provide a “virtual sky” based on the enormous data sets being created now

and the even larger ones proposed for the future. It will enable a new mode of

research for professional astronomers and will provide to the public an

unparalleled opportunity for education and discovery.” (p.14)

Page 14: November 18

11/18/2003 Virtual Observatories for Space Science 14

Why is it Virtual?

A Virtual Data System : has multiple components is (geographically) distributed is interoperable provides seamless user access to

distributed data system components provides “one-stop shopping” for data

end-user

Page 15: November 18

11/18/2003 Virtual Observatories for Space Science 15

Why is it Necessary?

To maximize cross-enterprise multi-institutional resources

To minimize duplication of effort

To streamline operations through shared development

To serve multiple user communities To facilitate simultaneous data mining, knowledge discovery,

and information retrieval from multiple distributed data collections

Because data volumes are huge & growing rapidly ... For example, in Astronomy : a few terabytes "yesterday” (10,000 CDROMs) tens of terabytes "today” (100,000 CDROMs) petabytes "tomorrow" (within 10 years) (100,000,000 CDROMs)

Page 16: November 18

11/18/2003 Virtual Observatories for Space Science 16

National Virtual Observatoryhttp://www.us-vo.org/

NVO is a concept. It was recommended by the Astronomy Decadal Survey Committee to the National Academy of Sciences. Currently funded by NSF ($10M Information Technology Research grant); and NASA next year(?).

NVO is not just “National”. It is actually “Global”: http://www.ivoa.net/

Will link geographically distributed astronomical data archives and information resources = provides “one-stop shopping” for data end-user

Will be heterogeneous, interoperable, and federated (autonomy maintained at local sites) … therefore, we are using XML and Web Services.

Requires middleware standards for : metadata, resource descriptions (including the Dublin Core), queries, query results, the data (including the Data Model – see next slide), and semantics (… we are using Unified Content Descriptors = UCDs).

Requires innovative computational science technologies for :

data discovery, data mining, data fusion, distributed querying, and code-shipping (“Ship the code, not the data”)

Page 17: November 18

11/18/2003 Virtual Observatories for Space Science 17

Virtual Observatory Data Model

A data model is the structure in which a computer program stores persistent information.

Page 18: November 18

11/18/2003 Virtual Observatories for Space Science 18

Page 19: November 18

11/18/2003 Virtual Observatories for Space Science 19

VxO: becoming an operational system (high TRL)

• What is a VxO?

– Virtual “anything” Observatory – where “anything” currently includes

Astrophysical, Solar, Magnetospheric, Heliospheric, Ionospheric, …

• Summary statement for any VxO …Researchers should be able to find and access seamlessly all existing data relevant to the research they are considering, that data should be independently and correctly useable, and that data should be available in useful ways and in useful contexts.

• Without exception, full VxO efforts aim in this direction by providing multi-mission data access and easy browse functionality.

Page 20: November 18

11/18/2003 Virtual Observatories for Space Science 20

(Trajectories)(Trajectories)

Tools & Services

Science Data Facility

Acquisition & Ingest

Science User Support

http://spdf.gsfc.nasa.gov/

ModelsWeb

CDAWLibHelioWeb

Capabilities of Space Physics Science Databases.The VxO Challenge: to Integrate Data, Tools, Services

Page 21: November 18

11/18/2003 Virtual Observatories for Space Science 21

How do Space Science Databases Change in a Future that has an Increasingly

Rich/Robust VxO Framework?

• One definition for this VxO framework could be …

"The distributed implementation of an integrated space sciences data environment"

• The broad goals of the data centers don't fundamentally change with this definition.– They still must enable new science by adding unique value to the Space Science research

community through strong multi-discipline and cross-discipline data resources, with unique services tied to unique databases.

• These services (data, functions, software) should (and will) be increasingly supplied as a key element of that new, broader VxO environment.

– Logically, the data center’s services eventually become consumers as well as providers.

– Visible early user impact of VxO is critical.

• VxOs should develop a good long-term hybrid solution = PIs + missions/projects + Science Data Centers + (other) specialized services

Page 22: November 18

11/18/2003 Virtual Observatories for Space Science 22

Science Data Formats – part of the glue

• Several key data formats are standard in space science: FITS (Astrophysics & Solar Physics), CDF and netCDF (Space Physics & Earth Science), HDF (Space Physics, Earth Science, & Computational Science).

• Why?

– These provide a baseline data format for all data sets in that discipline and in joint international projects.

– They provide the base for many data center services, data analysis tools, data integration tools, visualization packages.

– They are a key enabling technology for many different space missions and space science projects.

• Plans:

– Translation tools: from FITS <–> CDF <–> HDF <–> netCDF

– Substantial work on format translators via XML and XSLT.

Page 23: November 18

11/18/2003 Virtual Observatories for Space Science 23

Interfaces to a VxO Environment• "Web Services" interface to existing data services

– Web Services interfaces and software libraries complement existing FTP and interactive user web interfaces.

– Web Services provides application-to-application interface, without human intervention.

– Web Services provides distributed data registry (WSDL), data/resource discovery (UDDI), and data services (SOAP).

– Scientific database services have unique scope and functionality that must be accessible by the VxO environment for it to gain user acceptance. • e.g., SOAP/XML interface for Space Physics data now enables 3-D

interactive graphics of distributed multi-mission data.

– Plans for data format translators and converters

Page 24: November 18

11/18/2003 Virtual Observatories for Space Science 24

Why Virtual Observatories?

• Because:– The data are highly distributed.

– Multi-mission data lead to new discoveries.

– The data volumes are HUGE and growing.

– And maybe because of Augustine’s Law …

“Software is like entropy; it always increases.”

- Norman Augustine

Page 25: November 18

11/18/2003 Virtual Observatories for Space Science 25

Szalay’s Law:The utility of N comparable datasets

increases as N2

• Metcalf’s Law: The value of a network scales as n2, where n is the number of nodes connected.

• Hagel & Armstrong’s Axiom: The aggregation of resources is more important than the amount of resources owned.

• Metcalf’s law applies to telephones, the Internet …• Szalay argues as follows:

– Each new dataset gives new information. – 2-way combinations give even more new information.

Page 26: November 18

11/18/2003 Virtual Observatories for Space Science 26

Size of a Typical Archived Astronomical Data Repository

• Size of the archived data for an all-sky survey -- 40,000 square degrees is two Trillion pixels --– One band 4 Terabytes– Multi-wavelength 10-100 Terabytes– Time dimension 10 Petabytes– LSST project (10 yrs) ~100 Petabytes @ http://www.lsst.org/

All-sky distribution of526,280,881 stars fromthe MACHO survey.

Page 27: November 18

11/18/2003 Virtual Observatories for Space Science 27

Ongoing Surveys of the Sky• Large number of new surveys

– multi-TB in size, 100 million objects or more

– individual archives planned, or under way

• Multi-wavelength view of the sky– more than 13 wavelength coverage in 5 years

• Impressive early discoveries– finding exotic objects by unusual colors

• L,T dwarfs, high-z quasars

– finding objects by time variability• gravitational microlensing

MACHO2MASSDENISSDSSGALEXFIRSTDPOSSGSC-IICOBE MAPNVSSFIRSTROSATOGLE ...

MACHO2MASSDENISSDSSGALEXFIRSTDPOSSGSC-IICOBE MAPNVSSFIRSTROSATOGLE ...

Page 28: November 18

11/18/2003 Virtual Observatories for Space Science 28

Sloan Digital Sky Survey Data Productshttp://www.sdss.org/

• Full Data Collection ~20 TB

• Object catalog 400 GB parameters of >108 objects

• Redshift Catalog 1 GB parameters of 106 objects

• Atlas Images 1.5 TB 5 color cutouts of >108 objects

• Spectra 60 GB in a one-dimensional form

• Derived Catalogs 20 GB - clusters - QSO absorption lines

• 4x4 Pixel All-Sky Map 60 GB heavily compressed

Page 29: November 18

11/18/2003 Virtual Observatories for Space Science 29

Highly ranked in Decadal ReviewHighly ranked in Decadal Review

Optimized for surveysOptimized for surveys

• scan modescan mode

• deep modedeep mode

7 square degree field7 square degree field

6.9m effective aperture6.9m effective aperture

2424thth mag in 20 sec mag in 20 sec

> 20 Tbytes/night> 20 Tbytes/night

Real-time analysisReal-time analysis

“Celestial Cinematography”

Simultaneous multiple science Simultaneous multiple science

goalsgoals

Large Synoptic Survey Telescope

Page 30: November 18

11/18/2003 Virtual Observatories for Space Science 30

Large Mirror Fabrication(for large telescopes, such as LSST)

(Univ. of Arizona Mirror Laboratory)(Univ. of Arizona Mirror Laboratory)That’s big!

Page 31: November 18

11/18/2003 Virtual Observatories for Space Science 31

NVO – It’s all about the Science

Page 32: November 18

11/18/2003 Virtual Observatories for Space Science 32

Science Discovery - the Old Way

Page 33: November 18

11/18/2003 Virtual Observatories for Space Science 33

Science Discovery - The New Way -Different!

Systematic data exploration

– will have a central role

– statistical analysis of the

“typical” objects

– automated search for the

“rare” events

The discovery process will rely heavily on distributed data access and multi-archive data mining.

Page 34: November 18

11/18/2003 Virtual Observatories for Space Science 34

Conceptual Architecture for a Distributed Data Mining System

Data ArchivesData Archives

Discovery toolsDiscovery tools

Analysis toolsAnalysis toolsUser

Gateway

Page 35: November 18

11/18/2003 Virtual Observatories for Space Science 35

The Discovery Process

discover significant patternsfrom the analysis of statistically rich and unbiased image/catalog databases

understand complex astrophysical systems via confrontation between data and large numerical simulations

Past: observations of small, carefully selected samplesof objects in a narrow wavelength band

Future: high quality, homogeneous multi-wavelengthdata on millions of objects, allowing us to

The discovery process will rely heavily on advanced visualization, data mining, and statistical analysis tools.

Page 36: November 18

11/18/2003 Virtual Observatories for Space Science 36

The NVO in 5 words or less:

“The archive is the sky!”

Page 37: November 18

11/18/2003 Virtual Observatories for Space Science 37

NVO: It is all about the Science

• There is a huge scientific interest in the new data collections --large sky surveys, multiple telescopes, multiple-wavelength coverage of the sky, time domain coverage ... And it is all available on-line from your desktop …

– “The archive is the sky!”• Something is needed to help scientists access, mine, and

explore these huge data collections.– 1 Terabyte at 10 Mbyte/s takes 1 day to transmit

– Hundreds of intensive queries and thousands of casual queries per-day

– Data will reside at multiple locations, in many different formats

– Existing analysis tools do not scale to Terabyte data sets

• Acute need in a few years; solution will not just happen.

Page 38: November 18

11/18/2003 Virtual Observatories for Space Science 38

• Rare and exotic objects– Very high redshift quasars– Dark matter in the galactic halo– Time-variable objects, transient events:

distant supernovae and microlensing– Brown dwarfs– Variable stars– Asteroids...

• ...incoming!!

– Serendipity!

NVO Enables New Sciencehttp://www.us-vo.org/

Page 39: November 18

11/18/2003 Virtual Observatories for Space Science 39

NVO Science Cases & Drivers(from Aspen 2001 NVO Workshop)

Solar System : NEOs, Long-Period Comets, TNOs, Killer Asteroids!!! The Digital Galaxy : Find star streams and populations -- relics of past/present

assembly phase. Identify components of disk, thick disk, bulge, halo, arms, ?? The Low-Surface Brightness Universe : spatial filtering, multi-wavelength

searches, intersection of the image and catalog domains Panchromatic Census of AGN (Active Galactic Nuclei) : Complete sample of

the AGN zoo, their emission mechanisms, and their environments Precision Cosmology & Large-Scale Structure : Hierarchical Assembly History

of Galaxies and Structure, Cosmological Parameters, Dark Matter and Galaxy Biasing as f(z)

Precision science of any kind that depends on very large sample sizes "Survey Science Deluxe" Search for rare and exotic objects (e.g., high-z QSOs, high-z Sne, L/T dwarfs) Serendipity : Explore new domains of parameter space (e.g., time domain, or

"color-color space" of all kinds)

Page 40: November 18

11/18/2003 Virtual Observatories for Space Science 40

Enabling Computational Science Technologies for the NVO

Page 41: November 18

11/18/2003 Virtual Observatories for Space Science 41

Major Functions of the NVO and the related Enabling Computational Science Technologies

To facilitate data mining and knowledge discovery within the very large astronomical databases -- Requires: indexing for fast queries, filtering of large queries, data

subsetting, visualization, parallelization (queries, access), ...

To facilitate linkages and cross-archive investigations -- Requires: distributed computing, scalable architectures, load balancing, thin

middleware layer, interoperability, code libraries, code-shipping, data-finding services, data standards & interchange formats, query/results protocols, data fusion, quality assessment, archive/metadata profiles, user profiles, intelligent agents, ...

To serve a broad community of users (professionals, amateur astronomers, schools, general public) -- must support thousands of queries per day

Page 42: November 18

11/18/2003 Virtual Observatories for Space Science 42

Some General Challenges for NVO (and all Virtual Data Systems)

Data Discovery: Finding data within distributed data systems

Transparent User Access to Data: across heterogeneous environments

(Distributed) Data Mining and Analysis: of terabytes!

Interoperability: of systems, data, metadata, tools

New Technology Infusion: across multiple distributed systems

Sociology: "We don't need it" or "We already have it”

Page 43: November 18

11/18/2003 Virtual Observatories for Space Science 43

How do you get all of these distributed

science databases working together?

Virtual Observatory team motto:

“It’s the middleware, stupid.”

Page 44: November 18

11/18/2003 Virtual Observatories for Space Science 44

National Virtual Observatoryhttp://www.us-vo.org/

NVO is a concept. It was recommended by the Astronomy Decadal Survey Committee to the National Academy of Sciences. Currently funded by NSF ($10M Information Technology Research grant); and NASA next year(?).

NVO is not just “National”. It is actually “Global”: http://www.ivoa.net/

Will link geographically distributed astronomical data archives and information resources = provides “one-stop shopping” for data end-user

Will be heterogeneous, interoperable, and federated (autonomy maintained at local sites) … therefore, we are using XML and Web Services.

Requires middleware standards for : metadata, resource descriptions (including the Dublin Core), queries, query results, the data (including the Data Model – see next slide), and semantics (… we are using Unified Content Descriptors = UCDs).

Requires innovative computational science technologies for :

data discovery, data mining, data fusion, distributed querying, and code-shipping (“Ship the code, not the data”)

Page 45: November 18

11/18/2003 Virtual Observatories for Space Science 45

Tools for the NVO & other Virtual Data Systems

XML (eXtensible Markup Language) = "the language of interoperability" - ADC/XML Project was most comprehensive and advanced application of XML to NASA astrophysics data archives - including the XDF (eXtensible Data Format) and FITSML data standards [ http://xml.gsfc.nasa.gov/ ]

Comprehensive Data Mining Resource Guide for Large Scientific Databases - [follow the link at http://nvo.gsfc.nasa.gov/ ] "The trouble with facts is that there are so many of them."

- Samuel McChord Crothers, in "The Gentle Reader"

ISAIA (Interoperable Systems for Archival Information Access) : resource description profiles to enable access to distributed data providers

MOCHA (Middleware based On a Code-sHipping Architecture): middleware tools for search, retrieval, & data fusion from heterogeneous databases using heterogeneous interfaces - transparently federates distributed data access - "Ship the code, not the data“

The GRID! …

Page 46: November 18

11/18/2003 Virtual Observatories for Space Science 46

What is The Grid?

• The GRID is “a distributed computing infrastructure that facilitates resource-sharing and coordinated problem-solving in dynamic, multi-institutional virtual organizations.”

http://www.globus.org/datagrid/

http://www.gridforum.org/

http://www.nas.nasa.gov/About/IPG/ (NASA’s Information Power Grid)

Page 47: November 18

11/18/2003 Virtual Observatories for Space Science 47

The Grid: by Foster & Kesselman (Argonne National Laboratory)

Internet computing and GRID technologies promise to change the way we tackle complex problems. They will enable large-scale aggregation and sharing of computational, data and other resources across institutional boundaries …. Transform scientific disciplines ranging from high energy physics to the life sciences

Page 48: November 18

11/18/2003 Virtual Observatories for Space Science 48

Data Grids vs. Computational Grids

Page 49: November 18

11/18/2003 Virtual Observatories for Space Science 49

Slide shown earlier:

Conceptual Architecture for a Distributed Data Mining System

Data ArchivesData Archives

Discovery toolsDiscovery tools

Analysis toolsAnalysis toolsUser

Gateway

Page 50: November 18

11/18/2003 Virtual Observatories for Space Science 50

A Concept for a Data Grid Nodefor Distributed Data Mining**

Hardware requirements• Large distributed database engines

– with few Gbyte/s aggregate I/O speed

• High speed (>10 Gbit/s) backbones– cross-connecting the major archives

• Scalable computing environment– with hundreds of CPUs for analysis

ObjectivityObjectivity

RAIDRAIDObjectivityObjectivity

RAIDRAIDObjectivityObjectivity

RAIDRAIDObjectivityObjectivity

RAIDRAIDObjectivityObjectivity

RAIDRAIDObjectivityObjectivity

RAIDRAIDObjectivityObjectivity

RAIDRAID

RAIDRAID

Database layer 2 GBytes/sec

Compute nodeCompute nodeCompute nodeCompute node

Compute nodeCompute nodeCompute nodeCompute node

Compute nodeCompute nodeCompute nodeCompute node

Compute nodeCompute nodeCompute nodeCompute node

Compute nodeCompute nodeCompute nodeCompute node

Compute nodeCompute nodeCompute nodeCompute node

Compute nodeCompute nodeCompute nodeCompute node

Compute nodeCompute nodeCompute nodeCompute node

Compute nodeCompute nodeCompute nodeCompute node

Compute nodeCompute nodeCompute nodeCompute node

Compute nodeCompute nodeCompute nodeCompute node

Compute nodeCompute nodeCompute nodeCompute node

Compute layer200 CPUs

Interconnect layer 1 Gbits/sec/node

Other nodes10 Gbits/s

** Slide provided by Alex Szalay (JHU)

HPC comes to the rescue!

Page 51: November 18

11/18/2003 Virtual Observatories for Space Science 51

An HPC Application:Parallel Data Mining

Figure: How Parallel Processing Speeds Up Data Mining

The application of parallel computing resources and parallel data access (e.g., RAID) enables concurrent drill-downs into large data collections

Page 52: November 18

11/18/2003 Virtual Observatories for Space Science 52

Distributed Data Mining in the NVO

Page 53: November 18

11/18/2003 Virtual Observatories for Space Science 53

Data Mining: connecting the dots?

Reference: http://homepage.interaccess.com/~purcellm/lcas/Cartoons/cartoons.htm

Page 54: November 18

11/18/2003 Virtual Observatories for Space Science 54

Scaling the VO Mountain: Role of Data MiningDiscoveries

Data MiningVisualization

DataServices

Existing Data Centers and Archives

You arehere

Page 55: November 18

11/18/2003 Virtual Observatories for Space Science 55

Exploration of new domains of the

observable parameter space :

The Time Domain -Part 1

Moving object appears as little rainbow in multiple-color image overlays

In-coming Killer Asteroid?

NVO = Data Mining in Action

Page 56: November 18

11/18/2003 Virtual Observatories for Space Science 56

Data Mining through Data Processsing:Simple Multiple-Frame Subtraction

SUPERNOVAdiscovered !!

Page 57: November 18

11/18/2003 Virtual Observatories for Space Science 57

Mega-Flares on normalSun-like stars = a star like

our Sun increased in brightness 300X one night!

… say what??

The Time Domain -- Part 2

NVO = Data Mining in Action

Page 58: November 18

11/18/2003 Virtual Observatories for Space Science 58

SETI@home searches for E.T. -- An equivalent data mining tool VO@home on anyone’s desktop can find

new comets, asteroids, exploding stars, quasars -- Chunks of data are sent to user’s screensaver, which

begins to mine data for special or one-of-kind astronomical events.

The Time Domain -- Part 3

NVO = Data Mining in Action

Page 59: November 18

11/18/2003 Virtual Observatories for Space Science 59

VO@home brings science discovery to

the desktop of everyone! … a great

tool for space science and computational science education.Requires: access to distributed science databases and data

mining & analysis tools.

Page 60: November 18

11/18/2003 Virtual Observatories for Space Science 60

1. Potential tool for Distributed Data Mining:http://skyserver.pha.jhu.edu/VOconeprofile/

ConeSearch• Find all astronomical objects

within a radius of a point on the sky (= cone).

• Find cross-identifications (e.g., a radio galaxy in one catalog = an Infrared galaxy in another catalog)

• >70 services are now queried.

• Results are returned in XML format (VOTable).

Page 61: November 18

11/18/2003 Virtual Observatories for Space Science 61

2. Potential Tool for Distributed Data Mining:Data Inventory Service

http://us-vo.org/news/dis.html/

Response from the Data Inventory Service, showing links to relevant images and catalogs:

Uses ConeSearch Profile Service.

Page 62: November 18

11/18/2003 Virtual Observatories for Space Science 62

3. Potential tool for Distributed Data Mining: http://www.skyquery.net/main.htm

Submits queries to large distributed databases!

2nd placeWinner in MicrosoftContest

Page 63: November 18

11/18/2003 Virtual Observatories for Space Science 63

Sample Data Mining Applications within the NVO:·Discover data stored in geographically distributed heterogeneous systems.

·Search huge databases for trends and correlations in high-dimensional parameter spaces: identify new properties or new classes of objects.

·Search for rare, one-of-a-kind, and exotic objects in huge databases.

· Identify temporal variations in objects from millions or billions of observations.

· Identify moving objects in huge survey catalogs and image databases.

· Identify parameter glitches / anomalies / deviations either in static databases (e.g., archives) or in dynamic data (e.g., science / telemetry / engineering data streams from remote satellites).

·Find clusters, nearest neighbors, outliers, and/or zones of avoidance in the distribution of astrophysical objects or other observables in arbitrary parameter spaces.

·Serendipitously explore the huge databases that will be part of the NVO, through access to distributed, autonomous, federated, heterogeneous, multi-wavelength, multi-mission astrophysics data archives.

Summary - Applications of Data Mining to the NVOData Mining Resource Guide for Space Science:

http://nvo.gsfc.nasa.gov/nvo_datamining.html• Purpose and Content -- to assist NASA scientists in data mining activities by providing

comprehensive summaries of: NASA-funded data mining projects, data mining tutorials, algorithms, techniques, software, organizations, conference links, conference summaries, publications, lessons learned, related I.T. technologies, science applications, expert interviews, and applications of data mining to the new National Virtual Observatory (NVO).

http://www.us-vo.org/

Page 64: November 18

11/18/2003 Virtual Observatories for Space Science 64

Web References• General:

• http://xml.gsfc.nasa.gov/

• http://nvo.gsfc.nasa.gov/

• http://www.us-vo.org/

• Specific:• VOTable - XML language for queries and tabular query results:

http://www.us-vo.org/VOTable/

• Data Mining Resource Guide:http://nvo.gsfc.nasa.gov/nvo_datamining.html

• Scientific Data Mining Workshop and Reports:http://www.anc.ed.ac.uk/sdmiv/

Page 65: November 18

11/18/2003 Virtual Observatories for Space Science 65

VO: Creating the Future of Astrophysics Data Analysis

Page 66: November 18

11/18/2003 Virtual Observatories for Space Science 66

Summary

• Quick Review of Astronomy Data • The National Virtual Obseratory (NVO)• Other Virtual Observatories for Space Science• Why Virtual Observatories?• NVO – It’s all about the Science:

– IT-enabled, Science-enabling

• The Enabling Computational Science Technologies for the NVO – where you can help!

• Distributed Data Mining in the NVO

Page 67: November 18

11/18/2003 Virtual Observatories for Space Science 67

Next Lecture

• November 25 – Intelligent Archives of the Future