Data Format Description Language (DFDL) WG
Martin Westhead, EPCC, University of [email protected]
Alan Chappell, PNNL
Agenda
• Introduction and welcome - Martin Westhead, 10 mins
• Binary Format Description Language (BFD) - Alan Chappell, 10 mins
• Binary XML (BinX) - Stephen Rutherford, 10 mins
• DFDL - Martin Westhead, 15 mins
  – Big picture
  – Structural Description Language
  – Charter
  (20 mins discussion)
• Examples repository - Alan Chappell, 10 mins
  – Bruce Barkstrom, examples at NASA
  (15 mins discussion)
Motivation
• There will never be a standard data format
  – E.g. XML: verbose, tree-based, explicit structure
  – Legacy formats
  – Application-specific formats
  – One size will never fit all
• But could we provide a language for describing formats?
  – Transparency of physical representation
  – Automatic format conversion
  – Unambiguous description of data
There’s more…
Explicit structure enables:
• Standard transformation to/from an XML representation
  – Could allow applications to read/write XML
  – But provide an underlying efficient binary representation
• Data stream/file becomes a database
  – Point to parts of the structure
  – Extract parts of the structure
  – Modify parts of the structure
  – Integrate parts of different structures
And more…
• Generic tools possible
  – Browsing
  – Conversion and transformation
• Annotation of data
  – E.g. identify the bits that depict a hurricane in an image
• Enables general semantic labels; many ontologies could be developed, e.g.:
  – S.I. units, SQL types, Time
  – Community-specific labels, “starClass = whiteDwarf”
  – Application-specific labels, “nodeColour = green”
• Could lead to a standard transformation language
Not fairy tales
• Based on implemented work
  – BinX: http://www.epcc.ed.ac.uk/gridserve/WP5/Binx/
  – BFD, part of the Scientific Annotation Middleware project (http://www.scidac.org/SAM/)
• Generalized and extended a little
• Formal semantics
• Foundation for extensibility
Approach
• Separate out structure and semantics
• General structural language
  – Repetition
  – Pointers
  – References to data
  – New structures can be built (compositionality)
• Semantics
  – Hard to express so… we don’t
  – General labeling
  – Label semantics defined elsewhere (ontologies)
  – Labels can be added (extensibility)
Structure – arbitrary labels
[Figure: nested hierarchy over a binary sequence. A fooSet contains fooPairs, each fooPair contains foos, each foo contains bunchThings, and each bunchThings contains things whose values are the underlying bits (0 1 1 0 0 1 1 1 in the example).]
Structure – example labels
[Figure: the same hierarchy with meaningful labels. A complexArray contains complex values, each complex contains floats, each float contains bytes, and each byte contains bits (0 1 1 0 0 1 1 1 in the example).]
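The labelled hierarchy in the figure can be sketched in code. The following is only an illustration, not part of BinX, BFD, or the DFDL draft: it assumes each float is a 32-bit big-endian IEEE 754 value (the slides do not state a width or byte order), and the function name `describe_complex_array` and the dictionary layout are hypothetical.

```python
import struct

FLOAT_SIZE = 4  # assumption: 32-bit floats; the slides do not specify a width


def describe_float(chunk: bytes) -> dict:
    """Label one float together with the bytes and bits beneath it."""
    return {
        "label": "float",
        "value": struct.unpack(">f", chunk)[0],  # assumption: big-endian
        "bytes": [{"label": "byte", "bits": format(b, "08b")} for b in chunk],
    }


def describe_complex_array(raw: bytes) -> dict:
    """Impose complexArray -> complex -> float -> byte -> bit over raw bytes."""
    complex_size = 2 * FLOAT_SIZE
    if len(raw) % complex_size != 0:
        raise ValueError("input is not a whole number of complex values")
    elements = []
    for offset in range(0, len(raw), complex_size):
        real = raw[offset:offset + FLOAT_SIZE]
        imag = raw[offset + FLOAT_SIZE:offset + complex_size]
        elements.append({
            "label": "complex",
            "parts": [describe_float(real), describe_float(imag)],
        })
    return {"label": "complexArray", "elements": elements}
```

The underlying byte sequence is untouched; the function only returns a labelled view over it, which is the sense in which the structure is described rather than converted.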
Structural language
• Formal semantics
  – Structured binary sequence
  – Defines hierarchical structure over underlying sequence of binary values
• Language for describing hierarchical structure
  – Repetition
    • Explicit number of repeats
    • Termination characters
  – Data reference
    • Conditionals
    • Data size
  – Pointers
    • Scope
  – As general as possible, but
  – Must be concise and implementable
• Draft language definition on web page (www.epcc.ed.ac.uk/dfdl)
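The two forms of repetition, and the idea of a data reference supplying a size, can be sketched as follows. This is a minimal illustration under stated assumptions, not the DFDL draft semantics: the helper names `read_counted` and `read_terminated` are hypothetical, and the layout in the usage example (a one-byte count followed by items, then a NUL-terminated run) is invented for the demonstration.

```python
def read_counted(raw: bytes, offset: int, count: int, size: int):
    """Repetition with an explicit number of repeats.

    Returns the list of fixed-size items and the offset just past them.
    """
    end = offset + count * size
    if end > len(raw):
        raise ValueError("not enough data for the requested repeats")
    items = [raw[i:i + size] for i in range(offset, end, size)]
    return items, end


def read_terminated(raw: bytes, offset: int, terminator: bytes):
    """Repetition ended by a termination character.

    Returns the bytes before the terminator and the offset just past it.
    bytes.index raises ValueError if the terminator is missing.
    """
    end = raw.index(terminator, offset)
    return raw[offset:end], end + len(terminator)


# Usage: the count is itself read from the data (a "data reference").
raw = b"\x03abcxyz\x00rest"
count = raw[0]                                   # data size taken from the stream
items, pos = read_counted(raw, 1, count, 1)      # three one-byte items
text, pos = read_terminated(raw, pos, b"\x00")   # run ended by a NUL byte
```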
CSV file example
char:=byte
data:=[(char - [',']).*]
field:=[data; [',']]
finalField:=[data; ['\n']]
row:=[field.*] :: [finalField]
table:=[row.*]
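To make the grammar concrete, here is a hedged Python sketch that walks a byte sequence in the way the description reads. It is not generated from the description and is not a DFDL implementation; the function name `parse_table` is hypothetical. One assumption is labelled in the code: the description as written excludes only ',' from data, so the sketch additionally treats a newline as ending the data of a final field.

```python
def parse_table(raw: bytes) -> list:
    """Split raw bytes into rows of fields, following the CSV description."""
    table = []           # table := [row.*]
    row = []             # row := [field.*] :: [finalField]
    data = bytearray()   # data := [(char - [',']).*]
    for byte in raw:     # char := byte
        char = bytes([byte])
        if char == b",":
            # field := [data; [',']]
            row.append(bytes(data))
            data.clear()
        elif char == b"\n":
            # finalField := [data; ['\n']]
            # Assumption: a newline always ends the data of the final field.
            row.append(bytes(data))
            data.clear()
            table.append(row)
            row = []
        else:
            data.append(byte)
    # Trailing bytes with no closing newline do not form a complete row
    # under the description, so they are not returned.
    return table
```

For example, `parse_table(b"a,b,c\n1,2,3\n")` yields two rows of three fields each, with the separators stripped.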
Semantic labels
• Many ontologies possible
• Initial scope probably:
  – Basic types (floating point, integer, character)
  – Simple structures (structs, arrays, tables)
• Obvious extensions:
  – SQL types
  – XML Schema types
• Key WG goal:
  – Define form and requirements of new ontologies
What is an Ontology?
• XML Schema for new types
• Structural description of new types
• Definition of core API behaviour on new type
• API extensions
• Relationships to other types
WG goals
• Formal language for DFDL data structure
• Standard representation of this language in XML
• Requirements for DFDL ontology
• Basic types ontology
• Basic structures ontology
Currently under discussion
• Abstraction from the underlying binary
  – Compression, encoding, encryption
  – Physical vs. conceptual binary sequence
• Abstraction of description
  – complex:=[foo; foo]
  – Instantiate “foo:= float” or “foo:= double” at use time
• Filtering of results
  – Getting to the data model and leaving the format behind
  – CSV -> [[value; value; value]; [value; value; value]]
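The "abstraction of description" idea, where complex:=[foo; foo] is written once and foo is bound at use time, might look like this in code. It is a sketch only: the mapping of "float" and "double" to 32-bit and 64-bit big-endian IEEE 754 values is an assumption, and `FORMATS` and `parse_complex` are hypothetical names.

```python
import struct

# Assumption: "float" and "double" mean big-endian IEEE 754 values
# of 32 and 64 bits; the slides do not fix widths or byte order.
FORMATS = {"float": ">f", "double": ">d"}


def parse_complex(raw: bytes, foo: str = "float") -> list:
    """Parse complex := [foo; foo], with foo chosen at use time."""
    fmt = FORMATS[foo]
    size = struct.calcsize(fmt)
    if len(raw) != 2 * size:
        raise ValueError("expected exactly two values of the chosen type")
    real = struct.unpack(fmt, raw[:size])[0]
    imag = struct.unpack(fmt, raw[size:])[0]
    # Only the values are returned: the format is left behind.
    return [real, imag]
```

The same description serves both instantiations, for example `parse_complex(data, "double")` for 16 bytes of input and `parse_complex(data)` for 8.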
DFDL in the VO
• Generic tools
• Metadata possibilities
  – Ontologies can define relationships between types
  – E.g. polar to Cartesian
  – Standard classes over data objects
Getting involved
• Web pages: http://www.epcc.ed.ac.uk/dfdl
• Mailing list ([email protected])
• My address: [email protected]