e-Science Data Information and Knowledge Transformation
The BinX Language
www.edikt.org
What is BinX?
Binary in XML
– Use XML to mark up binary data
– Mark up data types
– Mark up sequences
– Mark up arrays
– Complex structures
Primitive Data Types
Mark up data types
1. <short-16 byteOrder="littleEndian">32767</short-16>
2. <integer-32 byteOrder="bigEndian">2147483647</integer-32>
3. <float-32 byteOrder="littleEndian">100.0</float-32>
4. <float-32 byteOrder="bigEndian">100.0</float-32>
Binary representation (hex):
1: FF 7F
2: 7F FF FF FF
3: 00 00 C8 42
4: 42 C8 00 00
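As a cross-check of the byte layouts above, Python's standard struct module produces the same bytes. The pairing of BinX type names with struct format codes below is an assumption made for illustration; it is not part of BinX, and encode is a hypothetical helper.

```python
import struct

# Each BinX primitive from the slide paired with an equivalent struct format:
# "<" = little-endian, ">" = big-endian; h = 16-bit short, i = 32-bit int, f = 32-bit float.
SAMPLES = [
    ("short-16, littleEndian", "<h", 32767),
    ("integer-32, bigEndian", ">i", 2147483647),
    ("float-32, littleEndian", "<f", 100.0),
    ("float-32, bigEndian", ">f", 100.0),
]

def encode(fmt: str, value) -> str:
    """Return the raw bytes of one value as space-separated upper-case hex."""
    return struct.pack(fmt, value).hex(" ").upper()

for label, fmt, value in SAMPLES:
    print(f"{label:24} {encode(fmt, value)}")
```

The same value 100.0 yields mirrored byte sequences depending only on the byte order, which is exactly the information the byteOrder attribute carries.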
Abstract “struct” types
Mark up a sequence
<struct> <unsignedShort-16 /> <unsignedShort-16 /> <byte-8 /> <byte-8 /> <byte-8 /></struct>
Screen descriptor in GIF:
Screen width: unsigned short;
Screen height: unsigned short;
Packed field: a byte
Background colour index: byte
Pixel aspect ratio: byte
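A minimal sketch of reading the same five fields with Python's struct module. It assumes little-endian order, which the GIF format uses (the BinX fragment above does not state a byteOrder), and reads the three byte fields as unsigned, as GIF defines them; ScreenDescriptor and parse_screen_descriptor are hypothetical names.

```python
import struct
from typing import NamedTuple

class ScreenDescriptor(NamedTuple):
    width: int
    height: int
    packed: int
    background_colour_index: int
    pixel_aspect_ratio: int

# Same member order as the <struct>: two unsigned shorts, then three bytes.
# "<" = little-endian with no padding (assumed, per the GIF format).
SCREEN_DESCRIPTOR = struct.Struct("<HHBBB")

def parse_screen_descriptor(data: bytes, offset: int = 0) -> ScreenDescriptor:
    """Decode the 7-byte screen descriptor starting at `offset`."""
    return ScreenDescriptor(*SCREEN_DESCRIPTOR.unpack_from(data, offset))
```

In a GIF file the screen descriptor follows the 6-byte signature (GIF87a or GIF89a), so parse_screen_descriptor(gif_bytes, 6) would read it in place.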
Abstract “array” types
Mark up an array
<arrayFixed> <integer-32 /> <dim indexTo="99"> <dim indexTo="9" /> </dim> </arrayFixed>
A 2-dimensional array of 10-by-100 32-bit integers
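The element count follows from the indices: indexTo="99" and indexTo="9" give 100 × 10 = 1,000 integers, or 4,000 bytes. The sketch below decodes such an array into nested lists. It assumes indices start at 0, that the innermost dim varies fastest (row-major order) and that the integers are big-endian, none of which the slide states; read_array_fixed is a hypothetical helper.

```python
import struct

def read_array_fixed(data: bytes, dims: list, item_fmt: str = ">i") -> list:
    """Decode a fixed-size array into nested lists.

    dims gives the dimension sizes from outermost to innermost; the innermost
    dimension is assumed to vary fastest (row-major order).
    """
    item = struct.Struct(item_fmt)
    total = 1
    for n in dims:
        total *= n
    if len(data) < total * item.size:
        raise ValueError("buffer too short for the declared dimensions")
    flat = [v for (v,) in item.iter_unpack(data[: total * item.size])]

    def nest(values, shape):
        # Split the flat list into shape[0] equal chunks, recursively.
        if len(shape) == 1:
            return values
        step = len(values) // shape[0]
        return [nest(values[k * step:(k + 1) * step], shape[1:]) for k in range(shape[0])]

    return nest(flat, dims)
```

With dims=[100, 10], element [i][j] starts at byte offset (i × 10 + j) × 4 under these assumptions.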
Embedded abstract types
Complex structures
<struct>
<short-16 />
<arrayFixed>
<byte-8 />
<dim indexTo=“7” />
</arrayFixed>
<struct>
<integer-32 />
<float-32 />
<double-64 />
</struct>
</struct>
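Because a nested struct is laid out as its members in order, the whole structure above flattens into one fixed-size record of 2 + 8 + 4 + 4 + 8 = 26 bytes. A sketch with Python's struct module, assuming little-endian data, no padding between members and a signed byte-8; RECORD and parse_record are hypothetical names.

```python
import struct

# Flattened layout of the nested <struct> ("<" = little-endian, no alignment padding):
#   h      -> short-16
#   8b     -> arrayFixed of 8 byte-8 (dim indexTo="7" gives indices 0..7)
#   i f d  -> inner struct: integer-32, float-32, double-64
RECORD = struct.Struct("<h8bifd")

def parse_record(data: bytes) -> dict:
    """Decode one record, restoring the nesting of the BinX description."""
    fields = RECORD.unpack(data[:RECORD.size])
    return {
        "short": fields[0],
        "bytes": list(fields[1:9]),
        "inner": {"int": fields[9], "float": fields[10], "double": fields[11]},
    }
```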
User-defined metadata
Label the data types and structures
<struct varName="Data Sample">
  <short-16 varName="ID" />
  <arrayFixed varName="List of 10 complex numbers">
    <struct varName="Complex">
      <float-32 varName="Real" />
      <float-32 varName="Imaginary" />
    </struct>
    <dim indexTo="9" />
  </arrayFixed>
</struct>
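The varName labels can double as keys when the data is decoded. A sketch assuming little-endian data and no padding between members; parse_data_sample is a hypothetical helper, not part of the BinX library.

```python
import struct

def parse_data_sample(data: bytes, byte_order: str = "<") -> dict:
    """Decode the "Data Sample" struct into a dict keyed by its varName labels."""
    (sample_id,) = struct.unpack_from(byte_order + "h", data, 0)  # short-16 "ID"
    # arrayFixed with dim indexTo="9": 10 "Complex" structs of two float-32s each,
    # starting right after the 2-byte ID (no padding assumed).
    floats = struct.unpack_from(byte_order + "20f", data, 2)
    numbers = [complex(floats[2 * k], floats[2 * k + 1]) for k in range(10)]
    return {"ID": sample_id, "List of 10 complex numbers": numbers}
```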
Reusable type definitions
Define macros for reuse
<definitions>
  <defineType typeName="FourCC">
    <arrayFixed>
      <character-8 />
      <dim count="4" />
    </arrayFixed>
  </defineType>
</definitions>
<struct varName="Wave_Header">
  <useType typeName="FourCC" varName="Keyword" />
  <integer-32 varName="Chunk_Size" />
</struct>
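A rough Python analogue of defineType/useType: a named layout fragment defined once and spliced into larger layouts. DEFINITIONS, use_type and parse_wave_header are hypothetical, and little-endian order is assumed because RIFF/WAV files are little-endian.

```python
import struct

# Hypothetical stand-in for <definitions>: named, reusable layout fragments.
DEFINITIONS = {"FourCC": "4s"}  # arrayFixed of 4 character-8

def use_type(type_name: str) -> str:
    """Return the layout fragment registered under type_name."""
    return DEFINITIONS[type_name]

# Wave_Header: a FourCC "Keyword" followed by an integer-32 "Chunk_Size".
# "<" = little-endian, as in RIFF/WAV files (assumed; the slide gives no byteOrder).
WAVE_HEADER = struct.Struct("<" + use_type("FourCC") + "i")

def parse_wave_header(data: bytes) -> dict:
    keyword, chunk_size = WAVE_HEADER.unpack_from(data, 0)
    return {"Keyword": keyword.decode("ascii"), "Chunk_Size": chunk_size}
```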
Linking to binary data
Reference the binary data file
<definitions>
  <defineType typeName="Header">… …</defineType>
  <defineType typeName="Format_Chunk">… …</defineType>
  <defineType typeName="Data_Chunk">… …</defineType>
</definitions>
<dataset src="myfile.wav">
  <useType typeName="Header" />
  <useType typeName="Format_Chunk" />
  <useType typeName="Data_Chunk" />
</dataset>
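The dataset element tells a reader to decode the referenced file as the listed types, one after another. A sketch of that sequential reading, assuming each type has a fixed size expressed as a Python struct format string. Real WAV chunks vary in size, which this sketch does not handle, and the layouts behind Header, Format_Chunk and Data_Chunk are elided on the slide, so the test uses made-up types; read_dataset is a hypothetical helper.

```python
import struct

def read_dataset(src: str, definitions: dict, sequence: list) -> list:
    """Read each type named in `sequence`, in order, from the binary file `src`.

    `definitions` maps a typeName to a struct format string, standing in for
    <defineType>; every type is assumed to have a fixed size.
    """
    results = []
    with open(src, "rb") as f:
        for type_name in sequence:
            layout = struct.Struct(definitions[type_name])
            chunk = f.read(layout.size)
            if len(chunk) != layout.size:
                raise EOFError(f"{src} ended while reading {type_name}")
            results.append((type_name, layout.unpack(chunk)))
    return results
```

Results are returned as an ordered list rather than a dict, so the same type may appear more than once in a dataset.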
A BinX document
<binx byteOrder="bigEndian">
  <definitions>
    <defineType typeName="myTyp">
      <arrayFixed>
        <character-8/>
        <dim indexTo="9"/>
      </arrayFixed>
    </defineType>
  </definitions>
  <dataset src="myfile.bin">
    <useType typeName="myTyp"/>
    <integer-32 varName="X" />
  </dataset>
</binx>
Root element: <binx>
Data class section: <definitions>
Data instance section: <dataset>
Abstract data type: <defineType typeName="myTyp">
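To show how the data class and data instance sections fit together, the sketch below parses the document with Python's xml.etree.ElementTree and derives one struct format string for the whole dataset. It handles only a few primitive types plus struct, useType and arrayFixed, ignores per-element byteOrder attributes, and treats a missing root byteOrder as big-endian; these are simplifying assumptions, not BinX library behaviour, and all function names are hypothetical.

```python
import struct
import xml.etree.ElementTree as ET

BINX_DOC = """<binx byteOrder="bigEndian">
  <definitions>
    <defineType typeName="myTyp">
      <arrayFixed><character-8/><dim indexTo="9"/></arrayFixed>
    </defineType>
  </definitions>
  <dataset src="myfile.bin">
    <useType typeName="myTyp"/>
    <integer-32 varName="X"/>
  </dataset>
</binx>"""

# Python struct codes for a hand-picked subset of BinX primitive types.
PRIMITIVES = {"character-8": "c", "byte-8": "b", "short-16": "h",
              "integer-32": "i", "float-32": "f", "double-64": "d"}

def dim_size(dim):
    """Number of elements along one <dim>, from count or indexFrom/indexTo."""
    if dim.get("count") is not None:
        return int(dim.get("count"))
    return int(dim.get("indexTo")) - int(dim.get("indexFrom", "0")) + 1

def to_format(elem, types):
    """Translate one BinX element into an equivalent struct format fragment."""
    if elem.tag in PRIMITIVES:
        return PRIMITIVES[elem.tag]
    if elem.tag == "useType":
        return to_format(types[elem.get("typeName")], types)
    if elem.tag == "struct":
        return "".join(to_format(child, types) for child in elem)
    if elem.tag == "arrayFixed":
        item = next(child for child in elem if child.tag != "dim")
        count, dim = 1, elem.find("dim")
        while dim is not None:  # nested <dim> elements multiply together
            count *= dim_size(dim)
            dim = dim.find("dim")
        return to_format(item, types) * count
    raise ValueError(f"unsupported BinX element: {elem.tag}")

def dataset_format(doc: str) -> str:
    """Build one struct format string covering the whole <dataset>."""
    root = ET.fromstring(doc)
    # Assumption: a missing byteOrder is treated as big-endian.
    order = "<" if root.get("byteOrder") == "littleEndian" else ">"
    types = {d.get("typeName"): d[0] for d in root.iter("defineType")}
    return order + "".join(to_format(child, types) for child in root.find("dataset"))
```

For this document the dataset is 10 one-byte characters followed by a 4-byte integer, 14 bytes in total.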
DataBinX
DataBinX = BinX with Data
BinX description:
<dataset src="myfile.bin">
  <struct>
    <short-16 />
    <long-64 />
    <double-64 />
  </struct>
  <arrayFixed>
    <integer-32 />
    <dim count="2" />
  </arrayFixed>
</dataset>
Resulting DataBinX:
<dataset>
  <struct>
    <short-16>100</short-16>
    <long-64>1000</long-64>
    <double-64>5.257</double-64>
  </struct>
  <arrayFixed>
    <dim><integer-32>1</integer-32></dim>
    <dim><integer-32>2</integer-32></dim>
  </arrayFixed>
</dataset>
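A sketch of the BinX-to-DataBinX step: walk the BinX description, decode each primitive from the binary buffer and emit an element carrying the value as text, wrapping each array element in a dim element as in the DataBinX output above. It assumes big-endian data, one-dimensional arrays starting at index 0 and only the primitive types shown; to_databinx is a hypothetical name, not the BinX library's API.

```python
import struct
import xml.etree.ElementTree as ET

# Struct codes for the primitives used on this slide (q = 64-bit signed integer).
FORMATS = {"short-16": "h", "integer-32": "i", "long-64": "q", "double-64": "d"}

def to_databinx(binx_elem, data: bytes, offset: int = 0, order: str = ">"):
    """Decode `data` as described by `binx_elem`; return (DataBinX element, next offset)."""
    tag = binx_elem.tag
    out = ET.Element(tag)  # attributes such as src are deliberately not copied
    if tag in FORMATS:
        fmt = order + FORMATS[tag]
        (value,) = struct.unpack_from(fmt, data, offset)
        out.text = str(value)
        return out, offset + struct.calcsize(fmt)
    if tag in ("dataset", "struct"):
        for child in binx_elem:
            sub, offset = to_databinx(child, data, offset, order)
            out.append(sub)
        return out, offset
    if tag == "arrayFixed":  # one-dimensional arrays only in this sketch
        item = next(c for c in binx_elem if c.tag != "dim")
        dim = binx_elem.find("dim")
        # count, or indexTo + 1 assuming the index starts at 0
        n = int(dim.get("count")) if dim.get("count") else int(dim.get("indexTo")) + 1
        for _ in range(n):
            sub, offset = to_databinx(item, data, offset, order)
            ET.SubElement(out, "dim").append(sub)
        return out, offset
    raise ValueError(f"unsupported BinX element: {tag}")
```

Serialising the result with ET.tostring(element, encoding="unicode") gives the same element structure as the DataBinX document above, without the indentation.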
The BinX Library
BinX Components
The library has core functionality to support generic utilities and applications
– Applications: domain-specific
– Utilities: generic tools (DataBinX pack/unpack, extractor, viewer, BinX editor)
– BinX library core: parse/generate BinX documents, read/write binary data, parse/generate DataBinX
BinX application models
Data catalogue model
Data manipulation model
Data query model
Data service model
Data transportation model
Data catalogue model
Primary storage
– Binary data files
Metadata
– Syntactic annotation
– Semantic annotation
– Classification
– Domain specific
– Cross-reference (XLink)
[Figure: binary data files (BINARY) described by a hierarchy of BinX metadata documents (BinX1; BinX 1.1 and 1.2; BinX 1.2.1 to 1.2.3), ranging from abstract to detailed, cross-referenced through XLink]
Data manipulation model
Extraction
– Subset of a dataset
Combination
– Merge several datasets
Transformation
– Conversion of data types
– Change of sequence order
– Transposition of array dimensions
Transparency
– Automatic change of byte order
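Two of the manipulations listed above, byte-order transparency and transposition of array dimensions, reduce to small operations once the data is decoded. A sketch with hypothetical helper names, assuming the input is a whole number of big-endian items of a single struct type code:

```python
import struct

def to_little_endian(data: bytes, item_code: str) -> bytes:
    """Re-encode a buffer of big-endian items (one struct code) as little-endian."""
    values = [v for (v,) in struct.iter_unpack(">" + item_code, data)]
    return struct.pack(f"<{len(values)}{item_code}", *values)

def transpose(rows: list) -> list:
    """Swap the two dimensions of a 2-D array held as nested lists."""
    return [list(column) for column in zip(*rows)]
```

Converting via decoded values rather than reversing raw bytes keeps the helper correct for any fixed-size struct code, including floats.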
Data query model
In-dataset query
– XPath against virtual XML
Cross-dataset query
– Link into multiple datasets
Defining result format
– XQuery-based return fragment
Output interface
– SAX events
[Figure: binary files exposed as BinX data sources through the BinX library and a utility layer; XPath queries, XLink cross-references and transforms produce DataBinX, VOTable or XQuery SAX events for DataBinX, VOTable and custom applications]
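One way to picture SAX output is a decoder that streams events to any SAX ContentHandler instead of building a document in memory. The sketch below uses Python's xml.sax.saxutils.XMLGenerator as the handler and then runs ElementTree's limited XPath subset against the resulting virtual XML. emit_int_array is hypothetical, assumes big-endian integer-32 data and does not reflect the BinX library's interface.

```python
import io
import struct
import xml.etree.ElementTree as ET
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

def emit_int_array(data: bytes, handler, order: str = ">") -> None:
    """Stream a buffer of integer-32 values to a SAX ContentHandler as DataBinX-style events."""
    no_attrs = AttributesImpl({})
    handler.startDocument()
    handler.startElement("dataset", no_attrs)
    for (value,) in struct.iter_unpack(order + "i", data):
        handler.startElement("integer-32", no_attrs)
        handler.characters(str(value))
        handler.endElement("integer-32")
    handler.endElement("dataset")
    handler.endDocument()
```

Usage: write the events into an io.StringIO through XMLGenerator(buffer, "utf-8"), then query the text with ET.fromstring(...).find("integer-32[2]") to select the second value.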
Data service model
Publishing logical datasets in BinX
[Figure: a client reaches BinX-described logical datasets over the Grid, backed by binary files and a database (DB)]
– Dataset from one binary file
– Dataset from several binary files
– Dataset from multiple data sources
Data transportation model
DataBinX as interlingua
[Figure: on the sending side, XML document → XSLT → DataBinX → BinX utility (with SchemaBinX) → BinX + binary → ZIP tool → ZIP (MIME) → send; the receiving side reverses the steps: ZIP (MIME) → ZIP tool → BinX + binary → BinX utility → DataBinX → XSLT → XML document]
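The BinX + binary → ZIP step can be sketched with Python's zipfile module. The member names are hypothetical, and MIME wrapping and the actual send/receive are left out.

```python
import io
import zipfile

def pack_for_transport(binx_xml: str, binary: bytes, binary_name: str = "myfile.bin") -> bytes:
    """Bundle a BinX document and the binary file it describes into one ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("dataset.binx.xml", binx_xml)  # str members are stored as UTF-8
        zf.writestr(binary_name, binary)
    return buf.getvalue()

def unpack_after_transport(archive: bytes) -> dict:
    """Restore every member of the archive as name -> bytes."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}
```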
Application in Astronomy
Case Study 1
Data Conversion
Between FITS and VOTable
Application in astronomy
FITS and VOTable conversion
[Figure: FITS files (header "SIMPLE = T … END" followed by binary data) and VOTable documents ("<?xml version=… <VOTABLE>… </VOTABLE>") converted into each other by the DataBinX utility, which is built on the BinX library core]
FITS file
Primary HDU
– Header
SIMPLE  = T / file does conform to FITS standard
BITPIX  = 8 / number of bits per data pixel
NAXIS   = 1 / number of data axes
… …
END
– Data
3D 4A 14 0F 1C FE 25 04 … …
Extension
– Header
XTENSION= 'BINTABLE' / binary table extension
BITPIX  = 8 / 8-bit bytes
NAXIS   = 2 / 2-dimensional binary table
… …
END
– Data
7B 3E 40 2C 16 70 E7 6F … …
(Each header line is an 80-character card occupying columns 0 to 79.)
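A FITS header is a sequence of 80-character ASCII cards, padded to whole 2,880-byte blocks, with the keyword in columns 0 to 7 and "= " in columns 8 and 9 when the card carries a value. The simplified sketch below reads such a header and reports where the data unit begins. It ignores '/' inside quoted strings, CONTINUE cards and other FITS details, and parse_fits_header is a hypothetical name.

```python
CARD = 80      # bytes per header card
BLOCK = 2880   # headers and data units are padded to multiples of this size

def parse_fits_header(raw: bytes) -> tuple:
    """Return (keyword -> value string, byte offset where the data unit starts)."""
    header = {}
    for start in range(0, len(raw), CARD):
        card = raw[start:start + CARD].decode("ascii")
        keyword = card[:8].rstrip()
        if keyword == "END":
            header_len = start + CARD
            # Round up to the next whole 2,880-byte block.
            data_start = -(-header_len // BLOCK) * BLOCK
            return header, data_start
        if card[8:10] == "= ":
            # Simplification: treat the first '/' as the start of the comment.
            value = card[10:].split("/", 1)[0].strip()
            header[keyword] = value.strip("'").rstrip() if value.startswith("'") else value
    raise ValueError("no END card found")
```

For the primary header above, the binary data (3D 4A 14 0F …) would start at the returned offset.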
VOTable
<VOTABLE>
  <RESOURCE>
    <PARAM name="Obs" value="Bob"/>
    <TABLE name="Stars">
      <FIELD name="Star-name" datatype="char" arraysize="10" />
      <FIELD name="RA" datatype="float" />
      <FIELD name="Dec" datatype="float" />
      <FIELD name="Counts" datatype="int" arraysize="2x3x*" />
      <DATA>
        <TABLEDATA>
          <TR>
            <TD>Procyon</TD><TD>114.827</TD><TD>5.227</TD>
            <TD>4 5 3 4 3 2 1 2 3 3 5 6</TD>
          </TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
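A minimal sketch of reading TABLEDATA rows like the one above with xml.etree.ElementTree, converting each cell according to its FIELD datatype and splitting numeric array cells such as Counts on whitespace. It covers only the char, float and int datatypes and assumes no XML namespace, as in the example (real VOTable files usually declare one); read_tabledata is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Converters for the three VOTable datatypes used in the example.
CONVERTERS = {"char": str, "float": float, "int": int}

def read_tabledata(votable_xml: str) -> list:
    """Return one dict per <TR>, keyed by FIELD name."""
    table = ET.fromstring(votable_xml).find(".//TABLE")
    fields = table.findall("FIELD")
    rows = []
    for tr in table.iterfind("DATA/TABLEDATA/TR"):
        row = {}
        for field, td in zip(fields, tr.findall("TD")):
            datatype = field.get("datatype")
            convert = CONVERTERS[datatype]
            text = td.text or ""
            if datatype != "char" and field.get("arraysize"):
                # Numeric arrays (e.g. arraysize="2x3x*") are whitespace-separated.
                row[field.get("name")] = [convert(tok) for tok in text.split()]
            else:
                row[field.get("name")] = convert(text)
        rows.append(row)
    return rows
```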
FITS → DataBinX → VOTable
FITS to VOTable conversion
[Figure: FITS file → preprocessor with FITS SchemaBinX → DataBinX utility → DataBinX → XSLT transformer with XSLT stylesheet → VOTable]
VOTable → DataBinX → FITS
VOTable to FITS conversion
[Figure: VOTable → XSLT transformer with XSLT stylesheet → DataBinX → preprocessor and DataBinX utility with FITS SchemaBinX → binary data; a postprocessor joins the binary data with the FITS header to produce the FITS file]
FITS-VOTable experiment
Sample FITS file
– A data table of 82 rows × 20 fields
– File size: 37 KB
DataBinX generated by the DataBinX utility
– Time spent: 268 ms
– DataBinX document size: 1.2 MB
VOTable transformed by MSXML
– Time spent: about 1 second
– VOTable document size: 51 KB
Application in Astronomy
Case Study 2
Data Transportation by
pipelining BinX and VOTable
The Problem
Three kinds of VOTable data sources
– Pure XML VOTable (large)
– VOTable + FITS (small)
– VOTable + binary (smaller)
Difficulties
– An additional parser is needed for VOTable + binary
– Limited binary format
– Byte order and data types
The Solution: VOTable + BinX
– No coding necessary
– Smaller data files
– Easy to separate and restore
– Pipelined to work in the background
– Platform independent
Approaches
1. Embedded BinX
2. BinX document linking
Perhaps another method?
Embedded BinX
Example:
<VOTABLE xmlns:bx="http://www.edikt.org/binx/2003/06/binx">
  <TABLE name="stars">
    <FIELD name="star-name" datatype="char" arraysize="*"/>
    <FIELD name="RA" datatype="float"/>
    <DATA>
      <bx:dataset src="bin-file.dat">
        <bx:array>
          <bx:struct>
            <bx:string varName="star-name" />
            <bx:float-32 varName="RA" />
          </bx:struct>
        </bx:array>
      </bx:dataset>
    </DATA>
  </TABLE>
</VOTABLE>
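Because the embedded description lives in its own namespace, a VOTable reader can pick out the BinX elements with namespace-qualified lookups. The sketch below lists the varName/type pairs from the embedded bx:struct and checks them against the FIELD names, which is the consistency with VOTable definitions this approach is meant to give. Both helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

BINX_NS = "http://www.edikt.org/binx/2003/06/binx"

def embedded_binx_fields(votable_xml: str) -> list:
    """List (varName, BinX type) pairs declared inside the embedded bx:struct."""
    root = ET.fromstring(votable_xml)
    bx_struct = root.find(f".//{{{BINX_NS}}}struct")
    if bx_struct is None:
        return []
    fields = []
    for child in bx_struct:
        local_type = child.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
        fields.append((child.get("varName"), local_type))
    return fields

def fields_consistent(votable_xml: str) -> bool:
    """True if the VOTable FIELD names match the BinX varNames, in order."""
    root = ET.fromstring(votable_xml)
    votable_names = [f.get("name") for f in root.iter("FIELD")]
    return votable_names == [name for name, _ in embedded_binx_fields(votable_xml)]
```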
BinX Document Linking
Example:
<VOTABLE>
  <TABLE name="stars">
    <FIELD name="star-name" datatype="char" arraysize="*"/>
    <FIELD name="RA" datatype="float"/>
    <DATA>
      <BINX href="stars-data-binx.xml" type="TABLEDATA"/>
    </DATA>
  </TABLE>
</VOTABLE>
Comparison of the two approaches
Embedded BinX
– Advantages: a single annotation file; consistent with the VOTable definitions
– Disadvantages: clutters the VOTable document; difficult to parse
BinX document linking
– Advantages: keeps the VOTable clean; easy to parse
– Disadvantages: needs a separate BinX document; difficult to keep consistent
BinX Software
Today and the Future
Future releases
– Utilities (GUI BinX editor)
– XPath-based data query
– DFDL support
– Text file support
– Output through SAX events
– Output as XQuery return
– Database interfacing
– Java wrapper for utilities
Support
Information and software download:
– http://www.edikt.org/binx (coming soon)
Questions:
– [email protected]
Requirements and suggestions:
– [email protected]
– [email protected]