

VOTable: Tabular Data for Virtual Observatory

François Ochsenbein, Roy Williams

Clive Davenhall, Daniel Durand, Pierre Fernique, Robert Hanisch, David Giaretta, Tom McGlynn, Alex Szalay, Andreas Wicenec

The Context

Need to exchange data in tabular form:

• Coming from a wide variety of data servers and archives (VO context)

• Must include the associated metadata in order to be interpretable by applications

• Must deal with potentially millions of records

• Existence of FITS

VOTable History

• Astrores at CDS/ESO (June 1999)

• XSIL at Caltech (June 2000)

• October 2001: first discussions

• December 2001: VOTable 0.1

• January 2002: Interoperability meeting, Strasbourg

• 15 April 2002: VOTable 1.0

http://cdsweb.u-strasbg.fr/doc/VOTable/

VOTable archives & discussion groups:

http://archives.us-vo.org/VOTable/

Why XML ?

• includes in a single document the data and their associated metadata (descriptive data)

• has been in common use for about 3 years

• can be interpreted by readily available parsers and tools

• can be visualized (XSL)

• can be encapsulated in messages

A “classical” XML Document

<?xml version="1.0"?>
<!DOCTYPE VOTABLE SYSTEM "http://us-vo.org/xml/…...dtd">
<RESOURCE name="myResource">
  <OBSERVER>William Herschel</OBSERVER>
  <SOURCE id="mySource">
    <STAR-NAME>Procyon</STAR-NAME>
    <POSITION equinox="J2000" epoch="J2000">
      <RA unit="deg">114.827</RA>
      <Dec unit="deg">+05.227</Dec>
    </POSITION>
    <COUNTS>
      <COUNT>4</COUNT>
      <COUNT>5</COUNT>
      <COUNT>3</COUNT>
    </COUNTS>
  </SOURCE>
  …..
</RESOURCE>

Problems of “classical” XML Documents

Each data element is <tagged>, meaning:

• Huge overheads in terms of volume, required resources, and processing time

Not suited to multi-million-row tables

• Need to introduce new elements (tags) for each new parameter, or to cross-match a potentially large set of name spaces

The VOTable way

VOTables follow the classical tabular presentation, where each column is assumed to be homogeneous in terms of its associated metadata. A VOTable document contains:

• The metadata part (data description), essentially a set of <FIELD> and <PARAM> specifications

• The data part (serialisation), which may be in XML, FITS or binary.

<?xml version="1.0"?>
<!DOCTYPE VOTABLE SYSTEM "http://us-vo.org/xml/VOTable.dtd">
<VOTABLE version="1.0">
  <DEFINITIONS>
    <COOSYS ID="myJ2000" equinox="2000." epoch="2000." system="eq_FK5"/>
  </DEFINITIONS>
  <RESOURCE>
    <PARAM name="Observer" datatype="char" arraysize="*" value="William Herschel">
      <DESCRIPTION>This parameter is designed to store the observer's name</DESCRIPTION>
    </PARAM>
    <TABLE name="Stars">
      <DESCRIPTION>Some bright stars</DESCRIPTION>
      <FIELD name="Star-Name" ucd="ID_MAIN" datatype="char" arraysize="10"/>
      <FIELD name="RA" ucd="POS_EQ_RA" ref="myJ2000" unit="deg" datatype="float" precision="F3" width="7"/>
      <FIELD name="Dec" ucd="POS_EQ_DEC" ref="myJ2000" unit="deg" datatype="float" precision="F3" width="7"/>
      <FIELD name="Counts" ucd="NUMBER" datatype="int" arraysize="2x3x*"/>
      <DATA>
        <TABLEDATA>
          <TR>
            <TD>Procyon</TD><TD>114.827</TD><TD> 5.227</TD>
            <TD>4 5 3 4 3 2 1 2 3 3 5 6</TD>
          </TR>
          <TR>
            <TD>Vega</TD><TD>279.234</TD><TD>38.782</TD>
            <TD>8 7 8 6 8 6</TD>
          </TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
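To illustrate how an application might use the split between metadata and data, here is a minimal Python sketch that reads the <FIELD> descriptions and the <TR>/<TD> rows of a trimmed copy of the example above. It relies only on the standard library; the function name read_table and the trimmed document are assumptions made for this illustration, not part of VOTable or of any existing parser.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example document. The DOCTYPE line is omitted so that
# no external DTD is needed, and only two of the four columns are kept.
VOTABLE_XML = """<?xml version="1.0"?>
<VOTABLE version="1.0">
 <RESOURCE>
  <TABLE name="Stars">
   <FIELD name="Star-Name" ucd="ID_MAIN" datatype="char" arraysize="10"/>
   <FIELD name="RA" ucd="POS_EQ_RA" unit="deg" datatype="float"/>
   <DATA>
    <TABLEDATA>
     <TR><TD>Procyon</TD><TD>114.827</TD></TR>
     <TR><TD>Vega</TD><TD>279.234</TD></TR>
    </TABLEDATA>
   </DATA>
  </TABLE>
 </RESOURCE>
</VOTABLE>"""


def read_table(xml_text):
    """Return (fields, rows) for the first TABLE of the first RESOURCE.

    fields is a list of FIELD attribute dictionaries (the metadata part);
    rows is a list of dictionaries mapping column name to cell text.
    """
    root = ET.fromstring(xml_text)
    table = root.find("./RESOURCE/TABLE")
    fields = [dict(f.attrib) for f in table.findall("FIELD")]
    names = [f["name"] for f in fields]
    rows = []
    for tr in table.findall("./DATA/TABLEDATA/TR"):
        cells = [(td.text or "").strip() for td in tr.findall("TD")]
        rows.append(dict(zip(names, cells)))
    return fields, rows


fields, rows = read_table(VOTABLE_XML)
```

The column names and units come from the <FIELD> elements only once, while each row carries nothing but values, which is the point of the VOTable layout.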

<RESOURCE>
  <PARAM …/> …
  <TABLE>
    <FIELD …/> …
    <DATA>
      <TABLEDATA> <TR> <TD>… </TR> … </TABLEDATA>
      <FITS extnum="n"> <STREAM …> </FITS>
      <BINARY> <STREAM …> </BINARY>
    </DATA>
  </TABLE>
</RESOURCE>

The <FIELD> and <PARAM>

Describe the metadata attached to columns (<FIELD>) or to the resource (<PARAM>):

name        column label
unit        standardized unit
datatype    computer type
width       character representation
precision   character representation
arraysize   repetition factor
ucd         standardized parameter category
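As an illustration of the arraysize repetition factor, the sketch below resolves a value such as "2x3x*" (used for the Counts column in the example document) against the number of items found in one cell. The helper name resolve_shape is hypothetical, and the sketch assumes that only the last dimension may be variable ("*").

```python
def resolve_shape(arraysize, n_values):
    """Resolve an arraysize like "2x3x*" for a cell holding n_values items.

    Assumption of this sketch: at most one "*" appears, and only as the
    last dimension.
    """
    dims = arraysize.split("x")
    if "*" in dims[:-1]:
        raise ValueError("only the last dimension may be variable")

    # Product of the fixed dimensions.
    fixed = 1
    for d in dims:
        if d != "*":
            fixed *= int(d)

    if dims[-1] != "*":
        if n_values != fixed:
            raise ValueError("cell does not match the fixed array size")
        return tuple(int(d) for d in dims)

    if n_values % fixed != 0:
        raise ValueError("cell is not a whole number of sub-arrays")
    return tuple(int(d) for d in dims[:-1]) + (n_values // fixed,)


# The two Counts cells of the example document hold 12 and 6 integers.
procyon_shape = resolve_shape("2x3x*", len("4 5 3 4 3 2 1 2 3 3 5 6".split()))
vega_shape = resolve_shape("2x3x*", len("8 7 8 6 8 6".split()))
```

With these inputs the variable dimension comes out as 2 for Procyon and 1 for Vega.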

The UCDs

Unified Content Descriptor: categorisation of the parameters listed in the table

• Interpretation of the table contents

• Decide whether values can be compared

• Data mining

S. Derrière's talk on Friday

datatype          Meaning              FITS   Bytes
"boolean"         Logical              L      1
"bit"             Bit                  X      *
"unsignedByte"    Byte (0 to 255)      B      1
"short"           Short Integer        I      2
"int"             Integer              J      4
"long"            Long integer         K      8
"char"            ASCII Character      A      1
"unicodeChar"     Unicode Character           2
"float"           Floating point       E      4
"double"          Double               D      8
"floatComplex"    Float Complex        C      8
"doubleComplex"   Double Complex       M      16

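The byte sizes in the datatype table are enough to estimate the size of a fixed-length record. The following sketch copies the table into a dictionary and sums the sizes for a list of (datatype, element count) pairs; the names DATATYPE_BYTES and fixed_row_bytes are hypothetical, and "bit" is left out because its size is given as "*" in the table.

```python
# Byte size per element, copied from the datatype table above.
# "bit" is omitted: the table lists its size as "*".
DATATYPE_BYTES = {
    "boolean": 1,
    "unsignedByte": 1,
    "short": 2,
    "int": 4,
    "long": 8,
    "char": 1,
    "unicodeChar": 2,
    "float": 4,
    "double": 8,
    "floatComplex": 8,
    "doubleComplex": 16,
}


def fixed_row_bytes(columns):
    """Sum the byte sizes of fixed-length columns.

    columns is a list of (datatype, element_count) pairs.
    """
    return sum(DATATYPE_BYTES[datatype] * count for datatype, count in columns)


# Star-Name (char x 10), RA (float), Dec (float) from the example table.
row_size = fixed_row_bytes([("char", 10), ("float", 1), ("float", 1)])
```

For the three fixed-length columns of the example table this gives 10 + 4 + 4 = 18 bytes per row, compared with several times that for the same row written as <TR>/<TD> elements.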
FITS Compatibility

VOTable was designed to be compatible with existing FITS data tables

• Compatible data types

• FITS keywords are represented as <FIELD> attributes, e.g. width, precision, arraysize

• Arrays and variable-length arrays

• <DATA> may link to existing FITS data sets

Data Serialization

FITS or BINARY data may be embedded in the document, or remote; compression/encoding may be applied.
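To make the idea of an embedded, encoded stream concrete, here is a small Python sketch that packs rows into fixed-length binary records, base64-encodes them as they might appear inside a <STREAM>, and decodes them again. The record layout (char[10], float, float), the big-endian byte order, the NUL padding of short strings and the choice of base64 are all assumptions of this sketch; the actual rules are set by the VOTable specification.

```python
import base64
import struct

# Assumed record layout: big-endian, char[10] + float + float, no padding.
ROW_FORMAT = ">10sff"
ROW_SIZE = struct.calcsize(ROW_FORMAT)


def encode_rows(rows):
    """Pack (name, ra, dec) rows and return them as base64 text."""
    payload = b"".join(
        struct.pack(ROW_FORMAT, name.encode("ascii"), ra, dec)
        for name, ra, dec in rows
    )
    return base64.b64encode(payload).decode("ascii")


def decode_rows(text):
    """Reverse encode_rows: base64 text back to a list of (name, ra, dec)."""
    payload = base64.b64decode(text)
    rows = []
    for offset in range(0, len(payload), ROW_SIZE):
        name, ra, dec = struct.unpack_from(ROW_FORMAT, payload, offset)
        # struct pads short strings with NUL bytes; strip them again.
        rows.append((name.rstrip(b"\x00").decode("ascii"), ra, dec))
    return rows


stream_text = encode_rows([("Procyon", 114.827, 5.227), ("Vega", 279.234, 38.782)])
decoded = decode_rows(stream_text)
```

Because the values are stored as 4-byte floats, the decoded coordinates match the originals only to single precision.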

Existing tools and Servers

• Several databases are delivering VOTables: HEASARC, IPAC, NOAO, NRAO, VizieR, SIMBAD (cone search: >50 services)

• VOTable parsers in Perl, Java, C (different types of parsers for different applications)

• VOTable validators

• XSLT basic XML/HTML translators

DTD or XML-Schema

• The VOTable rules exist both as a DTD (Document Type Definition) and in the XML Schema language (heavily used in developing Web Services applications)

VOTable appendices

Astrores had two features not implemented in VOTables:

1. The LINK conventions describing how to get the correlated data (explanations, images, spectra…) based on substitution of the column contents

…
<FIELD name="FileName" datatype="char"…/>
…
<LINK href="http://server/getFile?${FileName}" …/>
…
<TR> … <TD>photo/procyon.dat</TD>… </TR>
<TR> … <TD>photo/vega.dat</TD>… </TR>
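The substitution of column contents into a LINK can be sketched in a few lines of Python. The function name expand_link is hypothetical, and the sketch inserts the cell value verbatim, without any URL escaping, which is an assumption rather than something the convention states.

```python
import re


def expand_link(href, row):
    """Replace each ${column} in href by that column's value in row."""
    def lookup(match):
        # match.group(1) is the column name found between ${ and }.
        return str(row[match.group(1)])

    return re.sub(r"\$\{([^}]+)\}", lookup, href)


template = "http://server/getFile?${FileName}"
procyon_url = expand_link(template, {"FileName": "photo/procyon.dat"})
vega_url = expand_link(template, {"FileName": "photo/vega.dat"})
```

Each row of the table therefore yields its own URL from a single <LINK> declaration.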

VOTable appendices (2)

2. The Query Mechanism using conventions similar to the HTML <FORM> for retrieving the data from user-supplied constraints

<PARAM name="Observer" datatype="char" arraysize="*" />
<TABLE name="Stars">
  <DESCRIPTION>Some bright stars</DESCRIPTION>
  <FIELD name="Star-Name" ucd="ID_MAIN" datatype="char" arraysize="10"/>
  <FIELD name="RA" ucd="POS_EQ_RA" ref="myJ2000" unit="deg" datatype="float" precision="F3" width="7"/>
  <FIELD name="Dec" ucd="POS_EQ_DEC" ref="myJ2000" unit="deg" datatype="float" precision="F3" width="7"/>
  <FIELD name="Counts" ucd="NUMBER" datatype="int" arraysize="2x3x*"/>
  <LINK type="query" action="http://server-node/getResult?" />
</TABLE>
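By analogy with an HTML <FORM>, a client could turn user-supplied constraints into a request by appending name=value pairs to the action URL. The sketch below shows that idea only; the function name build_query_url and the constraint syntax ("<280") are hypothetical, since the exact encoding of constraints is not given here.

```python
from urllib.parse import urlencode


def build_query_url(action, constraints):
    """Append form-style name=value pairs to the LINK action URL.

    constraints maps FIELD or PARAM names to user-supplied constraint
    strings; the action is assumed to end with "?" as in the example.
    """
    return action + urlencode(constraints)


query_url = build_query_url(
    "http://server-node/getResult?",
    {"Star-Name": "Vega", "RA": "<280"},
)
```

The FIELD and PARAM names play the role of form input names, so the same table description documents both the result and the way to query for it.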

Toward more generic WSDL-like solutions?

Conclusions

• Just version 1.0 … more to come

• Comments ? Proposals ?

Join the discussion group

VOTable@us-vo.org
