I18: Data Format Description Language (DFDL)
Modeling and Parsing Business Data
Alex Wood IBM DFDL Development Team Lead
IBM Hursley Lab, UK
© 2014 IBM Corporation
Please Note
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
Agenda
• Introduction
• OGF DFDL – a standard for modelling text and binary data
• IBM DFDL Component
• DFDL and IIB - demo
• Questions
Modeling Text and Binary Data
• Much of the data in the world resides in files, is not XML, is a mixture of textual and binary with custom syntax and encodings, and does not have a shareable machine-readable description
• But there has been no universal standard for modelling this data!
  – XML -> use XML Schema
  – RDBMS -> use database schema
  – Text/binary -> ??
• Existing standards are too prescriptive: “Put your data in this format!”
• Organizations including IBM evolved their own way of modelling text and binary data based on customer need.
• IBM® examples…
  – WebSphere® Message Broker: MRM message set
  – WebSphere Transformation Extender: Type Trees
  – WebSphere DataPower: FFD
  – WebSphere Cast Iron: Flat File Schema
  – Sterling B2B Integrator: DDF and IDF files
✓ DFDL: a universal, shareable, non-prescriptive description for general text & binary data formats
Data Format Description Language (DFDL)
• A new open standard
  – From the Open Grid Forum (OGF)
  – http://www.ogf.org/
  – Version 1.0
  – ‘Proposed Recommendation’ status
• A way of describing data…
  – It is NOT a data format itself!
• A powerful modeling language…
  – Text, binary and bit
  – Commercial record-oriented
  – Scientific and numeric
  – Modern and legacy
  – Industry standards
• While allowing high performance…
  – You choose the right data format for the job
• Leverage XML Schema technology
  – Uses W3C XML Schema 1.0 subset & type system to describe the logical structure of the data
  – Uses XSDL annotations to describe the physical representation of the data
  – The result is a DFDL schema
• Keep simple cases simple
• Annotations are human readable
• Both read and write
  – A DFDL Processor can parse and serialize data using a DFDL schema
• Intelligent parsing
  – Automatically resolve choices and optionality
• Validation of data when parsing and serializing
Example – Delimited Text Data
cat=5;lbound=-7.1E8
• "cat=" and "lbound=" are initiators
• "5" is an integer held as ASCII text
• ";" is the separator
• "-7.1E8" is a floating point number held as ASCII text
Separators, initiators (aka tags), & terminators are all examples in DFDL of delimiters
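To make the roles of the initiators and the separator concrete, here is a minimal hand-written sketch in Python. It is not a DFDL processor and does not use any DFDL API; the function name `parse_numbers` and the hard-coded element table are illustrative assumptions that mirror the example data above.

```python
def parse_numbers(data: str) -> dict:
    """Parse text such as 'cat=5;lbound=-7.1E8' by hand.

    Illustrative only: a real DFDL processor derives this behaviour
    from a DFDL schema rather than from hard-coded tables.
    """
    separator = ";"
    # (element name, initiator, converter) in sequence order; assumed layout
    elements = [
        ("category", "cat=", int),
        ("lowerBound", "lbound=", float),
    ]
    fields = data.split(separator)
    if len(fields) != len(elements):
        raise ValueError(f"expected {len(elements)} fields, got {len(fields)}")

    result = {}
    for (name, initiator, convert), field in zip(elements, fields):
        if not field.startswith(initiator):
            raise ValueError(f"expected initiator {initiator!r} in {field!r}")
        # Strip the initiator, then convert the remaining text to a value
        result[name] = convert(field[len(initiator):])
    return result


print(parse_numbers("cat=5;lbound=-7.1E8"))
# {'category': 5, 'lowerBound': -710000000.0}
```

Every format needs its own hand-written code in this style, which is the problem a shareable description such as a DFDL schema is meant to remove.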
Example – DFDL Schema
<xs:complexType name="numbers">
  <xs:sequence>
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/v1.0">
        <dfdl:sequence separator=";" encoding="ascii" …/>
      </xs:appinfo>
    </xs:annotation>
    <xs:element name="category" type="xs:int">
      <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/v1.0">
          <dfdl:element representation="text" textNumberPattern="###0"
                        encoding="ascii" lengthKind="delimited" initiator="cat=" …/>
        </xs:appinfo>
      </xs:annotation>
    </xs:element>
    <xs:element name="lowerBound" type="xs:float">
      <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/v1.0">
          <dfdl:element representation="text" textNumberPattern="##0.0#E0"
                        encoding="ascii" lengthKind="delimited" initiator="lbound=" …/>
        </xs:appinfo>
      </xs:annotation>
    </xs:element>
  </xs:sequence>
</xs:complexType>
Callouts: DFDL annotation (the dfdl:sequence and dfdl:element elements inside xs:appinfo); DFDL properties (their attributes)
Data described: cat=5;lbound=-7.1E8
Example – DFDL Schema (Short Form)
<xs:complexType name="numbers">
  <xs:sequence dfdl:separator=";" dfdl:encoding="ascii" …>
    <xs:element name="category" type="xs:int"
                dfdl:representation="text" dfdl:textNumberPattern="###0"
                dfdl:encoding="ascii" dfdl:lengthKind="delimited"
                dfdl:initiator="cat=" … />
    <xs:element name="lowerBound" type="xs:float"
                dfdl:representation="text" dfdl:textNumberPattern="##0.0#E0"
                dfdl:encoding="ascii" dfdl:lengthKind="delimited"
                dfdl:initiator="lbound=" … />
  </xs:sequence>
</xs:complexType>
Callout: DFDL properties (the dfdl: attributes)
Data described: cat=5;lbound=-7.1E8
DFDL Processor
• A DFDL processor uses a DFDL schema to understand a data stream
• It consists of a DFDL parser and (optionally) a DFDL serializer
• The DFDL parser reads a data stream and creates a DFDL ‘infoset’
• The DFDL serializer takes a DFDL ‘infoset’ and writes a data stream

Data stream:
cat=5;lbound=-7.1E8

DFDL schema:
<xs:complexType name="numbers">
  <xs:sequence dfdl:separator=";" dfdl:encoding="ascii" ... >
    <xs:element name="category" type="xs:int"
                dfdl:representation="text" dfdl:encoding="ascii"
                dfdl:textNumberPattern="###0" dfdl:lengthKind="delimited"
                dfdl:initiator="cat=" ... />
    <xs:element name="lowerBound" type="xs:float"
                dfdl:representation="text" dfdl:encoding="ascii"
                dfdl:textNumberPattern="##0.0#E0" dfdl:lengthKind="delimited"
                dfdl:initiator="lbound=" ... />
  </xs:sequence>
</xs:complexType>

Infoset:
<Document>
  <Element name="numbers">
    <Element name="category" dataType="xs:int" dataValue="5"/>
    <Element name="lowerBound" dataType="xs:float" dataValue="-7.1E08"/>
  </Element>
</Document>
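The parser and serializer roles can be sketched as a pair of Python functions driven by one shared description. This is a toy illustration under stated assumptions, not the IBM DFDL or OGF API: the `SCHEMA` dictionary is a hypothetical stand-in for a DFDL schema, the infoset is modelled as a plain list of dictionaries, and `to_text` is a simplified formatter that does not implement `textNumberPattern`.

```python
# Hypothetical stand-in for a DFDL schema: a separator plus, per element,
# (name, logical type, initiator). Not a real DFDL data structure.
SCHEMA = {
    "name": "numbers",
    "separator": ";",
    "elements": [
        ("category", "xs:int", "cat="),
        ("lowerBound", "xs:float", "lbound="),
    ],
}


def from_text(text, data_type):
    """Convert the physical text to a logical value."""
    return int(text) if data_type == "xs:int" else float(text)


def to_text(value, data_type):
    """Convert a logical value to text (simplified; keeps 3 significant digits)."""
    if data_type == "xs:int":
        return str(value)
    mantissa, exponent = f"{value:.2E}".split("E")
    mantissa = mantissa.rstrip("0")
    if mantissa.endswith("."):
        mantissa += "0"
    return f"{mantissa}E{int(exponent)}"


def parse(data, schema=SCHEMA):
    """Data stream -> infoset (a list of name/type/value items)."""
    fields = data.split(schema["separator"])
    if len(fields) != len(schema["elements"]):
        raise ValueError("wrong number of fields")
    infoset = []
    for (name, data_type, initiator), field in zip(schema["elements"], fields):
        if not field.startswith(initiator):
            raise ValueError(f"missing initiator {initiator!r}")
        value = from_text(field[len(initiator):], data_type)
        infoset.append({"name": name, "dataType": data_type, "dataValue": value})
    return infoset


def serialize(infoset, schema=SCHEMA):
    """Infoset -> data stream, using the same description as parse()."""
    parts = []
    for (name, data_type, initiator), item in zip(schema["elements"], infoset):
        if item["name"] != name:
            raise ValueError(f"expected element {name!r}")
        parts.append(initiator + to_text(item["dataValue"], data_type))
    return schema["separator"].join(parts)


infoset = parse("cat=5;lbound=-7.1E8")
print(infoset)
print(serialize(infoset))
```

The point of the sketch is that one description drives both directions, so the parser and the serializer cannot drift apart.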
DFDL 1.0 Features
• Text data types such as strings, numbers, zoned decimals, calendars, booleans
• Binary data types such as integers, floats, BCD, packed decimals, calendars, booleans
• Fixed length data and data delimited by text or binary markup
• Bi-directional text
• Bit data of arbitrary length
• Pattern languages for text numbers and calendars
• Ordered, unordered and floating content
• Default values on parsing and serializing
• Nil values for handling out-of-band data
• Fixed and variable arrays
• XPath 2.0 expression language including variables to model dynamic data
• Speculative parsing to resolve choices and optional content
• Validation to XML Schema 1.0 rules
• Scoping mechanism to allow common property values to be applied at multiple points
• Hide elements in the data
• Calculate element values
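Of the features listed, speculative parsing is perhaps the least obvious. A rough sketch of the idea, assuming a choice between three hypothetical branches (date, integer, string): try each branch in order, and if one fails, back out and try the next. The names `ParseError`, `parse_choice` and `BRANCHES` are inventions for this illustration, not part of any DFDL implementation.

```python
import datetime
import re


class ParseError(Exception):
    """Raised when a branch cannot parse the data."""


def parse_date(text):
    match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", text)
    if match is None:
        raise ParseError(f"{text!r} is not in yyyy-MM-dd form")
    year, month, day = (int(part) for part in match.groups())
    try:
        datetime.date(year, month, day)  # rejects e.g. month 13
    except ValueError as exc:
        raise ParseError(str(exc)) from exc
    return ("date", (year, month, day))


def parse_int(text):
    if re.fullmatch(r"[+-]?\d+", text) is None:
        raise ParseError(f"{text!r} is not an integer")
    return ("int", int(text))


def parse_string(text):
    return ("string", text)  # always succeeds, so it goes last


def parse_choice(text, branches):
    """Try each branch in order; the first one that succeeds wins."""
    errors = []
    for branch in branches:
        try:
            return branch(text)
        except ParseError as exc:
            errors.append(str(exc))  # remember why, then try the next branch
    raise ParseError("no branch matched: " + "; ".join(errors))


BRANCHES = [parse_date, parse_int, parse_string]

for sample in ["2014-05-01", "42", "2014-13-01"]:
    print(sample, "->", parse_choice(sample, BRANCHES))
```

Branch order matters in this sketch: the catch-all string branch must come last, otherwise it would hide the more specific alternatives.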
When should I use DFDL?
• DFDL’s sweet spot is when you need to model and parse a text or binary data format and where either:
  – You have a specification of the data format ‘on the wire’
  – You have actual wire examples of the data format
• DFDL is recommended to model:
  – Binary data from COBOL, C, PL/1, ASM programs
  – Text data with delimiters such as CSV
  – Text industry standards such as SWIFT, HL7, EDIFACT, X12, HIPAA…
  – Binary industry standards such as ISO8583, TLog, ...
• DFDL is not recommended to model:
  – XML
    · Already have XML parsers and XML Schema / DTDs
  – JSON
    · Already have JSON parsers, and JSON schema under design
  – GPB, HDF5, …
    · With serialization formats like GPB, the wire format is never exposed to the consumer and access to the data is using APIs
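As an illustration of the kind of binary data that comes out of COBOL programs, the sketch below decodes a fixed-length record by hand. The record layout is an assumption made up for this example (a 3-byte packed decimal, a 10-character EBCDIC name, a 2-byte big-endian binary integer); only the field names echo the employee data shown later in the deck.

```python
def unpack_packed_decimal(raw: bytes) -> int:
    """Decode a packed decimal (COMP-3 style) field.

    Each byte holds two 4-bit digits; the final 4 bits hold the sign
    (0xB or 0xD negative; 0xA, 0xC, 0xE or 0xF positive/unsigned).
    """
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    *digits, sign = nibbles
    if any(digit > 9 for digit in digits) or sign < 0xA:
        raise ValueError(f"invalid packed decimal: {raw.hex()}")
    value = int("".join(str(digit) for digit in digits))
    return -value if sign in (0xB, 0xD) else value


def parse_record(raw: bytes) -> dict:
    """Decode one 15-byte record using the assumed layout."""
    if len(raw) != 15:
        raise ValueError(f"expected 15 bytes, got {len(raw)}")
    return {
        "EmpNo": unpack_packed_decimal(raw[0:3]),      # packed decimal, 5 digits
        "EmpName": raw[3:13].decode("cp037").rstrip(),  # EBCDIC text, space padded
        "Dept": int.from_bytes(raw[13:15], "big"),      # big-endian binary integer
    }


# Build a sample record in the assumed layout
record = (
    bytes([0x12, 0x34, 0x5F])
    + "STEVE".ljust(10).encode("cp037")
    + (21).to_bytes(2, "big")
)
print(parse_record(record))
# {'EmpNo': 12345, 'EmpName': 'STEVE', 'Dept': 21}
```

Offsets, encodings and number representations like these are what a DFDL schema records as properties, so that they do not have to be buried in application code.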
DFDL Adoption
• IBM DFDL reusable component ships with:
  – WebSphere Message Broker v8.0
  – IBM Integration Bus v9.0
  – IBM Integration Bus open-beta
  – Rational® Performance Test Server v8 (v8.0.1 onwards)
  – Rational Test Virtualization Server v8 (v8.0.1 onwards)
  – Rational Test Workbench v8 (v8.0.1 onwards)
  – Rational Developer for System z v8.5
  – InfoSphere® Master Data Management v11
• Further IBM products and appliances looking to adopt
• Open-source DFDL implementation in progress: ‘Daffodil’
  – Available as an alpha release (parser only)
  – More features added every release
• DFDL web community on GitHub for collaborative authoring of DFDL schemas for commercial and scientific data formats
DFDL Schemas Web Community
• Free public repository for DFDL models
• Hosted on the popular GitHub community website
• Unlimited read-only access
• Collaboration encouraged
• Evolving content
Getting Started with DFDL
IBM DFDL
• Designed as an embeddable component
  – First shipped in 2011
• DFDL processor
  – High performance parser and serializer
  – Java and C
  – Streaming, on-demand, speculative
  – Pre-compiles DFDL schema
  – Parser emits SAX-like events
• Tooling for creating DFDL models
  – DFDL Schema editor Eclipse plugins
  – Wizards for CSV, COBOL & C
  – Debug model using real data from within tooling
• IBM DFDL implements the majority of the OGF DFDL 1.0 specification
  – Some more advanced features of DFDL are not yet available
  – Will be added in future DFDL deliverables until 100% achieved

[Diagram: IBM DFDL Processor]
Data stream:
cat=5;lbound=-7.1E8

DFDL schema:
<xs:schema …>
  <xs:annotation>
    <xs:appinfo …>
    </xs:appinfo>
  </xs:annotation>
  ...
</xs:schema>

Infoset:
<Document>
  <Element name="numbers">
    <Element name="category" …/>
    <Element name="lowerBound" …/>
  </Element>
</Document>
What’s New in IBM DFDL
• Latest release of IBM DFDL is v1.1.1
• Since IBM DFDL v1.0 several spec features have been added:
  – Extracting data using a prefixed length
  – Extracting data using a regular expression
  – Extracting binary data with delimiters
  – User-defined variables
  – Default values when serializing
  – More XPath functions in expressions
  – Unordered sequences
  – Asserts with recoverable errors
• Since IBM DFDL v1.0 several tooling features have been added:
  – Keyboard shortcuts in DFDL editor
  – MBCS enablement in DFDL debugger
  – Copy/paste in DFDL editor
  – Generation of sample values
• Continual increase in performance
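Two of the extraction styles listed above, prefixed length and regular expression, can be sketched generically in Python. This shows the general techniques only, not IBM DFDL behaviour or API; the function names and the 2-byte big-endian prefix are assumptions chosen for the example.

```python
import re


def parse_prefixed(data: bytes, offset: int = 0, prefix_size: int = 2):
    """Read a big-endian length prefix, then that many bytes of content.

    Returns (content, offset of the next unread byte).
    """
    start = offset + prefix_size
    if start > len(data):
        raise ValueError("data ends inside the length prefix")
    length = int.from_bytes(data[offset:start], "big")
    end = start + length
    if end > len(data):
        raise ValueError("data ends inside the content")
    return data[start:end], end


def parse_pattern(text: str, pattern: str, offset: int = 0):
    """Take as much text as the regular expression matches at offset.

    Returns (matched text, offset of the next unread character).
    """
    match = re.compile(pattern).match(text, offset)
    if match is None:
        raise ValueError(f"pattern {pattern!r} does not match at {offset}")
    return match.group(0), match.end()


# Two prefixed-length fields back to back
data = b"\x00\x03IBM\x00\x05Wood!"
first, position = parse_prefixed(data)
second, position = parse_prefixed(data, position)
print(first, second)

# A field whose length is determined by a regular expression
print(parse_pattern("AB123rest", r"[A-Z]{2}\d+"))
```

In both cases the length of the field is discovered from the data itself rather than being fixed or marked by a delimiter.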
DFDL Schemas for Industry Formats
• Fully supported, full function DFDL schemas will appear as part of IBM products such as the industry connectivity packs for IIB
• Unsupported, part function DFDL schemas will appear on DFDLSchemas GitHub site as and when available
• HL7 v2.5.1, v2.6 and v2.7 – IIB Connectivity Pack for Healthcare & GitHub
• HIPAA v5010 mandatory transactions – IIB Connectivity Pack for Healthcare
• IBM/Toshiba 4690 SurePos ACE v7r3 TLog – IIB Retail Pack & GitHub
• ISO 8583 1987 & 1993 – WMB v8 and IIB v9 sample & GitHub
• NACHA 2013 – GitHub
• EDIFACT (all releases) – GitHub
• SWIFT FIN (2013-2014) – New! IBM Integration Bus Solution for SWIFT FIN Messaging (DFDL Edition)
IBM DFDL in WMB and IIB
• WMB v8 embeds IBM DFDL v1.0
• IIB v9 embeds IBM DFDL v1.1
• DFDL domain and parser
  – Available in usual way – input nodes, ESQL, Java, …
  – More capable and higher performing than MRM CWF/TDS
• DFDL models
  – DFDL schema files reside in libraries, not in message sets
• DFDL wizards and editor for creating DFDL models
• DFDL model debug and test
  – Debug and test parsing & writing of data within toolkit
  – No message flow or runtime deploy necessary!
• DFDL schema deployed in BAR file
  – No dictionary file
• No automatic migration from MRM
IBM DFDL Demo
New Message Model
[Screenshot callouts]
• Launcher for creating Message Models
• Selecting one of these creates a DFDL schema
New Message Model – CSV Wizard
[Screenshot callouts]
• Choose the end of line character
• Contains a header record?
• Change the number of columns
DFDL Editor – With Generated CSV Model
[Screenshot callouts]
• Logical structure view
• DFDL properties view
• Problems view
Test and Debug
[Screenshot callouts]
• Start a test parse
• Parsed ‘infoset’
• Parsed data
• Delimiters highlighted
Test and Debug – Failure
[Screenshot callouts]
• Parsed data up to error
• Trace console
• Error message
• Object in error
• Parsed ‘infoset’ up to error
• Model and data linked
Using the DFDL domain and parser
[Screenshot callouts]
• On Demand or Complete parsing
• Optional validation
• DFDL domain
• Specify message name
DFDL message tree
( ['MQROOT' : 0xd6d218]
  (0x01000000:Name):Properties = ( ['MQPROPERTYPARSER' : 0x141d34e8]
    (0x03000000:NameValue):MessageSet = '' (CHARACTER)
    (0x03000000:NameValue):MessageType = '{}:Company' (CHARACTER)
    (0x03000000:NameValue):MessageFormat = '' (CHARACTER)
    (0x03000000:NameValue):Encoding = 273 (INTEGER)
    (0x03000000:NameValue):CodedCharSetId = 850 (INTEGER)
    ....
  )
  (0x01000000:Name):DFDL = ( ['dfdl' : 0xd812c8]
    (0x01000000:Name):Company = (
      (0x03000000:NameValue):CompanyName = 'IBM' (CHARACTER)
      (0x01000000:Name):Employee = (
        (0x03000000:NameValue):EmpNo = 12345 (INTEGER)
        (0x03000000:NameValue):Dept = 21 (INTEGER)
        (0x03000000:NameValue):EmpName = 'Steve Hanson' (CHARACTER)
        ...
      )
    )
  )
)
[Callouts]
• DFDL message name (MessageType)
• Message name in tree (like XMLNSC)
• DFDL domain
• Compact ‘Name/Value’ syntax elements
• Data types from DFDL schema
Using IBM DFDL Outside a Broker
• The IBM DFDL Java classes may be used to create stand-alone Java applications for parsing/serializing text & binary data formats
• IIB license permits IBM DFDL classes to be used by Java applications:
  – On a computer where IIB is installed
  – On a remote computer where IIB is not installed
• When used remotely, charged as if IIB was installed (via ITLM)
• IBM DFDL for Java is fully supported when used in this manner
• Javadoc for APIs and a sample program are provided

[Diagram: Stand-alone Java Application containing IBM DFDL for Java]
Data stream:
intval=5;fltval=-7.1E8

Infoset:
<Document>
  <Element name="myNumbers">
    <Element name="myInt" dataType="xs:int" dataValue="5"/>
    <Element name="myFloat" dataType="xs:float" dataValue="-7.1E08"/>
  </Element>
</Document>
Summary
• DFDL – an OGF standard for describing and parsing text and binary data
  – Defines a rich set of features to handle a broad range of custom and industry data formats.
  – Leverages XML Schema logical model and logical validation.
  – DFDL Schemas community for sharing of standard industry data format descriptions
• IBM DFDL – the parser for text and binary data in IBM Integration Bus and other IBM products.
  – Highly performing runtime parser inside IIB and other IBM products.
  – Rich set of DFDL development and test tools in IIB Studio and other IBM products.
Useful DFDL links
• DFDL 1.0 specification: http://www.ogf.org/documents/GFD.174.pdf
• DFDL tutorials: http://redmine.ogf.org/dmsf/dfdl-wg?folder_id=5485
• DFDL developerWorks: http://www.ibm.com/developerworks/library/se-dfdl/index.html
• DFDL Wikipedia page: http://en.wikipedia.org/wiki/DFDL
• DFDL Schemas on GitHub: https://github.com/DFDLSchemas
• OGF DFDL home page: http://www.ogf.org/dfdl/
• Daffodil open source project: https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL
For Additional Information
• IBM Training: http://www.ibm.com/training
• IBM WebSphere: http://www-01.ibm.com/software/be/websphere/
• IBM developerWorks: www.ibm.com/developerworks/websphere/websphere2.html
• WebSphere forums and community: www.ibm.com/developerworks/websphere/community/