1
Introduction
• Unlike most standards, XML was born from a desire to simplify, nurtured with perseverance and insight. While forces conspire to complicate, obfuscate and mystify XML with an entourage of related standards, XML's inherent balance of simplicity and functionality keeps it relevant. This talk will explore the conditions that combined to create a standard unique in its simplicity. Then the forces of complexity will be examined by considering the development of related XML standards. Finally, the future of XML and its role in information architecture will be considered by projecting how these forces are aligned today.
3
Sherlock Holmes and Dr. Watson went on a camping trip.
• After a good meal and a bottle of wine they lay down for the night and went to sleep.
• Some hours later, Holmes awoke and nudged his faithful friend.
  – "Watson, look up at the sky and tell me what you see."
• Watson replied, "I see millions and millions of stars."
• Holmes asked, "What does that tell you?"
  – Watson pondered for a minute.
• "Astronomically, it tells me that there are millions of galaxies and potentially billions of planets.
• Astrologically, I observe that Saturn is in Leo.
• Horologically, I deduce that the time is approximately a quarter past three.
• Theologically, I can see that God is all powerful and that we are small and insignificant.
• Meteorologically, I suspect that we will have a beautiful day tomorrow."
5
Introduction
• Unlike most standards, XML was born from
  – a desire to simplify
  – perseverance and insight
• Forces conspire to
  – complicate, obfuscate and mystify
  – with an entourage of related standards
  – yet XML's inherent balance of simplicity and functionality keeps it relevant.
• This talk will explore the conditions that combined to create a standard unique in its simplicity.
  – Then the forces of complexity will be examined by considering the development of related XML standards.
• Finally, the future of XML and its role in information architecture will be considered by projecting how these forces are aligned today.
6
Constraint Systems
• Information architecture is an exercise in constraints and models.
  – Constraint: a Boolean relationship
  – Model: abstraction, resource allocation, shared understanding
• Isn't conformance to a model also a constraint? Yes.
  – Schema: a model and a system of constraints
• Schemas
  – Define contracts for the data that will be exchanged in the transaction
  – Provide guidance to application developers
  – Guide authors in creating and editing information
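The idea that a schema is a model plus a system of constraints can be sketched in Python: the model is the set of expected fields, and each constraint is a named Boolean predicate. The purchase-order fields and the `Constraint` and `violations` names are hypothetical, chosen only for illustration; schema languages such as DTDs or XML Schema express the same idea declaratively rather than in code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Constraint:
    """A constraint is a Boolean relationship: a named predicate over a record."""
    name: str
    holds: Callable[[Dict[str, object]], bool]

# Hypothetical model of a purchase-order record: the fields it must have.
PO_FIELDS = {"po_number", "quantity", "ship_to"}

# Hypothetical constraints layered on top of that model.
PO_CONSTRAINTS: List[Constraint] = [
    Constraint("has expected fields", lambda r: set(r) == PO_FIELDS),
    Constraint("quantity is a positive int",
               lambda r: isinstance(r.get("quantity"), int) and r["quantity"] > 0),
    Constraint("ship_to is non-empty",
               lambda r: bool(str(r.get("ship_to", "")).strip())),
]

def violations(record: Dict[str, object]) -> List[str]:
    """Return the names of every constraint the record fails (empty list = conforms)."""
    return [c.name for c in PO_CONSTRAINTS if not c.holds(record)]

print(violations({"po_number": "20-P1-749833", "quantity": 5, "ship_to": "Whippany, NJ"}))
print(violations({"po_number": "X", "quantity": 0, "ship_to": ""}))
```

An empty list means the record honors the contract; anything else names exactly which part of the contract was broken.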
7
Prior to XML
ISA~00~ ~00~ ~01~0819405530010 ~01~153734900 ~000114~0927~U~00302~000160473~0~P~|.GS~PO~COMDEX~D710-850~000114~0927~161441~X~003020.ST~850~290267.BEG~00~DS~20-P1-749833~~000114.NTE~ORI~SHIP ASAP.FOB~CC~OR.DTM~002~000114.N1~ST~LUCENT TECHNOLOGIES~92~99.N3~67 WHIPPANY RD~CAHNDANG.N4~WHIPPANY~NJ~07981.
I have no idea what this might mean!
• EDI error rates can approach 85%.
• HTML parsing requires up to 50% of the code in your favorite browser!
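To make the point concrete, here is a short Python sketch that splits part of the EDI sample above into segments and elements. It assumes, from the data itself, that "." ends a segment and "~" separates elements in this sample; real X12 interchanges declare their delimiters in the ISA header. Only the segment ID carries a label: every other value is identified purely by its position, and its meaning lives in a separate implementation guide. The function name is hypothetical.

```python
from typing import List

# Fragment of the sample above; delimiters are assumed from the data.
SAMPLE = "BEG~00~DS~20-P1-749833~~000114.NTE~ORI~SHIP ASAP.N4~WHIPPANY~NJ~07981."

def split_segments(edi: str, seg_term: str = ".", elem_sep: str = "~") -> List[List[str]]:
    """Split raw EDI text into segments, each a list of positional elements."""
    return [seg.split(elem_sep) for seg in edi.split(seg_term) if seg]

for seg in split_segments(SAMPLE):
    # seg[0] is the segment ID; the rest are known only by position
    # (element 1, element 2, ...), so nothing here says what they mean.
    print(seg[0], dict(enumerate(seg[1:], start=1)))
```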
8
Markup
• A simple syntax that makes it easy to separate “data” from “meta-data”
• Markup includes
  – Elements
  – Attributes
  – Comments
  – Entity references
  – Processing instructions
  – CDATA sections
  – Document type declarations
<tag> Content </tag>
[Figure: an element consists of an opening tag, its content, and a closing tag.]
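A minimal sketch with Python's standard `xml.etree.ElementTree`, showing how a parser separates markup (element names, attributes) from data (character content). The wrapper `<part>` element and its `id` attribute are hypothetical. Note that the default parser discards comments, so they never reach the application as data.

```python
import xml.etree.ElementTree as ET

# The slide's element, wrapped in a hypothetical <part> element that also
# carries an attribute and a comment.
DOC = '<part id="42"><!-- note to the author --><tag> Content </tag></part>'

root = ET.fromstring(DOC)
element = root.find("tag")

# Markup ("meta-data"): element names and attributes describe the data.
print(root.tag, root.attrib, element.tag)
# Data: the character content between the opening and closing tags.
print(repr(element.text))
# The default parser drops comments, so <part> has a single child element.
print(len(root))
```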
9
Understanding Data
• To understand data, you must be able to
  – parse it
  – infer its context
  – understand how it relates to you
[Figure: Semantic Harmonization. EDI, legacy, and flat-file data are brought to XML by lexical reconciliation, then pass through schema reconciliation and semantic reconciliation to harmonized XML; the stages correspond to the lexical, syntax, and semantic levels.]
10
[Figure: The Information Continuum. The stages Capture, Organize, Evaluate, and Synthesize are plotted against process complexity (low to high) and level of investment (low to high), spanning data management, information management, and knowledge management. Themes: managing assets, adding context, generating intellectual capital, increasing value. Activities: putting information into managed locations; classifying documents and creating classification schemes; collecting information about the quality and usefulness of the information; driving business processes with knowledge; creating new knowledge from existing knowledge.]
11
Kann ich bitte ein Glas Wasser haben? ("Can I please have a glass of water?")
• Presentment
  – Again, louder
  – Reword
  – Reduction
  – Gesture
  – Translate
• Fulfillment
  – Guess
  – Look up
  – Partial understanding
  – Full understanding
• Shared context
Secondary Factors
  – Trust
  – Policy
  – Ability
  – Anticipation
  – Motivation
Wasser bitte! ("Water, please!")
WASSER!!!!!! ("WATER!!!!!!")
Can I please a glass of water have? (word-for-word translation)
Wuerden Sie mir bitte ein Glas Wasser reichen? ("Would you please hand me a glass of water?")
12
Cognition
• Carbon-based life
  – Intuition
    • Experience
    • Reasoning
  – Logic
    • Inductive, deductive
  – Intent
    • Idiom
    • Semantics
    • Language roots
• Silicon-based
  – Data
    • Meta-data
  – Process
  – Context
    • Associations
    • Look-up tables
    • Repositories
Context Example: Crew Chief
• Race car team leader, or
• Rowing team leader
13
[Figure: Disruption. Performance metric plotted against time for EDI technology, XML technology, and a future "technology Z", each shown against the corresponding demand (demand for EDI, demand for XML eCommerce technology, demand for technology Z).]
• Volume of transactions; security, reliability, predictability; reduced cost of procurement
• Interoperability; flexibility and agility; number of trading partners; global supply chains; reduced setup and TCO; one-to-one marketing
• Reuse, leverage and communities; semantics; cost of new product deployment; one-to-one business; security, reliability, predictability?; completeness?
Ref: The Innovator's Dilemma; Clayton Christensen
15
Why XML?
• XML was designed to manage documents on the web
  – The team included architects of HP.COM and DOCS.SUN.COM
  – Reuse content made for print in multiple web pages:
    • data sheets, white papers, etc.
  – Present a more organized view of information
• We faced significant differences in how our organizations structured information
• So, the answer was to create XML to
  – Interchange document information between groups
  – Make it easy to publish content standards
  – Separate content from presentation
    • which makes it easy to build tools that reuse information
16
The design goals for XML
1. XML shall be straightforwardly usable over the Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process XML documents.
5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
6. XML documents should be human-legible and reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal importance.
17
XML
• XML is the eXtensible Markup Language
• Evolved from the ISO standard SGML
• Designed to
  – Add structure to Web documents
  – Be simple (25 pages)
• XML has expanded well beyond its original goals
18
Perseverance Timeline
Oct 18, 1994: First XML meeting, at the Cafe d'Artist at the WWW2 conference
Oct 20, 1994: First draft of the charter written (taxi ride with Jon Bosak)
July 22, 1996: First XML working group email (I hosted the server)
Aug 8, 1996: WG joined the W3C
Aug 19, 1996: XML name coined
Aug 25, 1996: Design principles
Feb 1998: Released
19
Origins of XML
• 1996 November: introduced to the SGML community
• 1997 March: first press articles
• 1997 April (WWW6): introduced to the Web community
• 1998 February: XML 1.0
• 1999 January: XML Namespaces
• 2001 May: XML Schema
• 2001 October: XSL Recommendation
• 2002 February: XML Digital Signatures
"I didn't actually build it, but it was based on my idea."
20
The world around us - The Evolution of e-Commerce
Web services promise to bring these all together and make networks of computers useful and ubiquitous
• Silicon chips made computers ubiquitous
• GUIs made using computers ubiquitous
• The Web made accessing content ubiquitous
• XML made understanding content ubiquitous
1980s: custom applications
early 1990s: ERP systems
mid 1990s: fax, phone, EDI
late 1990s: B2C, B2B
2000s: Web Services
1975: FedEx installs the first drop box
1991: Crossing the Chasm and Virtual Corporation published
1994: The Web carries commercial messages anywhere in the world.
1999: e-Everything, ad nauseam
2001: Crossroads: "The P.T. Barnum Era of B2B is over."
21
Insight
• SGML substrate (primordial soup)
  – All 12 of us had worked with SGML extensively
  – We knew the founders of SGML
  – We had worked together
• In short, we were a community of practice
• In development we walked through each SGML feature and asked:
  – Is this necessary for success?
22
Ockham's Razor
It is pointless to do with more what can be done with less.
According to Ockham
No one errs intentionally.
This means that whenever we do something wrong it is out of ignorance rather than evil.
According to Socrates
…it also means that Ockham's razor cut too thin…you needed more information to do it right.
23
Standards Development
• An example: XML Schema
• Complication
  – Time
  – Convergence
• Obfuscation
  – Priesthood
• Mystify
24
Serenity
• The Chair’s Credo
Grant me the
Serenity to accept the things I cannot change, the
Courage to change the things I can, and the
Wisdom to know the difference.
25
Validation
• Validation ensures that data conforms to the constraints of its schema(s)
  – XML requires documents to be well-formed
    • they must follow the grammar so that parsers can correctly separate data from markup
  – XML allows documents to be validated against:
    • DTDs, schemas, others
  – Schemas can express only part of the semantics required for business applications.
Validation with schemas improves data quality and lowers application costs.
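Well-formedness is the check every XML parser performs before anything else. A minimal Python sketch using the standard library; the function name is hypothetical. Python's standard library does not validate against DTDs or XML Schema, so schema validation would need a third-party package such as lxml.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True if the text parses as XML, i.e. a parser can separate data from markup."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<Order><Part>00000-99999</Part></Order>"))  # properly nested
print(is_well_formed("<Order><Part>00000-99999</Order></Part>"))  # tags overlap
```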
26
Where It Goes Wrong: The Precision Example
Detailed Description of Data
• Syntax
  – How the data is parsed into elements
• Structure
  – Contents, order and names
• Constraints
  – Datatypes
  – How many elements there can or must be
  – What values are valid
  – For example, a date constraint could state that the year must be an integer greater than 1960 and less than 2100
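The date constraint above, written as a check in Python. The function name and the `<Year>` element are hypothetical; in XML Schema the same rule could be expressed declaratively as an `xs:integer` restricted with `minExclusive` and `maxExclusive` facets.

```python
import xml.etree.ElementTree as ET
from typing import Optional

def year_is_valid(text: Optional[str]) -> bool:
    """Check the example rule: the year must be an integer
    greater than 1960 and less than 2100 (both bounds exclusive)."""
    if text is None:          # element missing entirely
        return False
    try:
        year = int(text.strip())
    except ValueError:        # empty or non-integer content
        return False
    return 1960 < year < 2100

order = ET.fromstring("<Order><Year>2003</Year></Order>")
print(year_is_valid(order.findtext("Year")))
```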
27
Standardizing Interoperation - Precision
Standards should be constrained wherever possible
• Constraints will allow
  – developers to tailor their applications
  – improved data quality through message validation
• But constraints limit adoption and flexibility
  – developers and adopters choose not to use them
Taken to the extreme, the only standard needed is…
a container for “anything”
28
An Entourage: The W3C XML Family
• XML Coordination Group
  – XML Core
    • errata, XInclude, Information Set
  – XML Schema
    • Parts 0, 1, 2, 3
  – XML Linking WG
    • XML Base, XPath, XLink, XPointer
  – XML Query WG
    • Data Model, Algebra, Language
  – XML Namespaces
• XML Protocols WG
• XSL WG
  – XSL, XSLT
• XML DSIG
  – XML Signature
  – Canonical XML
• DOM (Levels 1, 2, 3)
• Others
  – XML Encryption
  – VoiceXML
  – XForms WG
  – SMIL, SVG
  – XHTML
  – RDF …
More than 20 horizontal XML specifications!
29
Complication: Convergence
• Convergence
  – When two or more communities of practice mix, they may not share enough background beliefs to know how to judge what is "necessary for success".
• Fractured communities
  – Lack of community coherence due to historical differences in practice or implementation
• Examples
  – Schema: documentation, e-commerce, database
  – Query: hierarchical, relational
  – Namespaces: Java, XML, UML
30
Complication: Convergence
• Relational
  – Entity-relationship model
  – Normalization plan
    • BLOBs/CLOBs
  – Queries
    • Grievances
    • Signers and states
    • Declarations
• Hierarchical (XML)
  – Elements, attributes
  – Structure
  – Constraints
Now, tell me who's proudest?
31
Complication: Data/Context/Process
• Why do document and database people lack a shared perspective?
  – Is it really the difference between hierarchical and relational views?
… Because each community of practice focuses on different metrics…
and each thinks the other is a disruptive force.
32
Complication: Time
• Time pressure limits the exploration of alternatives and design clarity
  – Unwillingness to compromise where it is appropriate
• Examples
  – Namespaces: TBL (Tim Berners-Lee) wanted it done
  – XLink: over time, interest waned and new parties did not understand the original goals
Another common standards personality type… "Don Quixote": an impractical idealist bent on righting incorrigible wrongs…
33
Obfuscation
• Priesthood
  – Many standards participants attempt to create an expertise that they can then exploit
  – The priesthood that surrounds complicated technology is self-serving
• Example
  – Namespaces: the user community's desire to make them mean more than they do; overly complex namespace plans
"If you can't explain it to a 5-year-old, then you don't really understand it"
Cat's Cradle, Kurt Vonnegut
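The namespace example can be illustrated with Python's `xml.etree.ElementTree`: a namespace only qualifies element names, so two vocabularies can both define `<Order>` without colliding. The `example.com` URIs are hypothetical. The parser compares namespace URIs as plain strings and never fetches anything from them, and the prefix is not part of the resulting name at all.

```python
import xml.etree.ElementTree as ET

# Two vocabularies that both define <Order>; the URIs are hypothetical.
DOC = """<root xmlns:po="http://example.com/purchasing"
               xmlns:wo="http://example.com/workshop">
  <po:Order>purchase order</po:Order>
  <wo:Order>work order</wo:Order>
</root>"""

root = ET.fromstring(DOC)
for child in root:
    # ElementTree reports each name as "{namespace-uri}local-name".
    print(child.tag, "->", child.text)

# The prefix is irrelevant: a default namespace with the same URI
# produces exactly the same qualified name.
alt = ET.fromstring('<Order xmlns="http://example.com/purchasing">purchase order</Order>')
print(alt.tag == root[0].tag)
```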
34
Mystify
• Vendors are rewarded for creating a mystique around a standard, particularly one that may challenge their current competitive positioning.
• Mystique serves multiple corporate needs:
  – Increasing interest
  – Raising the value of supporting technologies
  – The ability to subvert the benefits of openness or functionality
To combat mystique, release open source or public domain tools that implement the standard.
IBM did this for XML parsers, and MS followed.
35
Free or near-free software
• XML enabled reuse of core technology
  – Parsers
    • DOM, SAX, others
  – Processors
    • App servers, Java, .NET
  – Databases
    • Native and enabled
• Free, or at least inexpensive:
  – http://www.xml.com/programming/
[Figure: the same order at the lexical, syntactic, and semantic levels, shown as an XML shopping cart, an EDI fragment, and a differently tagged XML order.]
<ShoppingCart><ProductList> Dave’s Order</ProductList><Part> 00000-99999 </Part></ShoppingCart>
ISA~00~ ~00~ ~01~0819405530010 BEG~00~DS~20-P1-749833~~000114.NTE~ORI~SHIP ASAP.
<Order><PL> Dave’s Order </PL><Part> 00000-99999 </Part></Order>
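The two XML orders in the figure differ only in their element names. Below is a Python sketch of that syntactic reconciliation, assuming a hypothetical one-to-one element-name map; it rewrites tags and leaves content untouched. Deciding that `ProductList` and `PL` actually mean the same thing is the semantic step, and this code simply takes that decision as given.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical mapping from the <ShoppingCart> vocabulary to the <Order> one.
RENAME: Dict[str, str] = {"ShoppingCart": "Order", "ProductList": "PL"}

def reconcile(source: str, rename: Dict[str, str]) -> str:
    """Rename elements per `rename`; unmapped names and all content are kept."""
    root = ET.fromstring(source)
    for elem in root.iter():
        elem.tag = rename.get(elem.tag, elem.tag)
    return ET.tostring(root, encoding="unicode")

cart = ("<ShoppingCart><ProductList> Dave's Order</ProductList>"
        "<Part> 00000-99999 </Part></ShoppingCart>")
print(reconcile(cart, RENAME))
```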
36
Future Projection
• Vulnerabilities
  – Byte count, Schema, Query, Namespace
• Semantics is the focus of the future… so why challenge what is working?
• Future projections
  – XML
  – Schema validation
  – XML databases and XQuery
  – Semantics
37
XML
• Enables information reuse
  – Global interchange
  – Machine processing
  – New uses for documents
• Benefits of XML
  – Feature/complexity balance
  – Enables user-defined semantics and semantic processing
[Figure: data-set size (small to large) plotted against structure (unstructured to structured), with quadrants labeled Publishing, Database, Desktop & PDA, and Transactional, and XML in the center.]
Interchangeable Parts drove the Industrial Age
Reusable Information drives the Information Age
XML will remain the standard platform for information convergence
©2003 Contivo. All rights reserved
38
Validation
• Validation is seldom used today
  – Complexity: computationally expensive
  – Mystify: difficult to maintain "tight" schemas
  – Obfuscate: schemas can express only part of the semantics required for business applications
• Hardware accelerators
• Schema tools
Validation with schemas will be done in production, improving data quality and lowering application costs.
39
XML Database and XQuery
• Suffers from
  – Mystification: IBM & MS v. the world
  – Obfuscation: pursuit of detail and exceptions to the point of distraction
• But time has overridden this, with XQuery-based tools on the low-cost track
XML databases will make a comeback.
40
Why Semantics
The basic problem with semantics:
We put words on everything
Then we put meaning on the words
Then we disagree
What do you do now?
Semantics in Business Systems; Dave McComb, p. 11
41
Data at the Edge
• In 1869 the transcontinental railroad enabled and accelerated migration westward.
• In the 1940s and 1950s, the interstate system enabled and accelerated migration to the suburbs.
• In the 1980s and 1990s computing became less centralized
  – Accelerated by PCs, relational databases, SQL, the Web
  – Data migrated out of "glass houses" and closer to the user
• Web Services, XQuery, XML
  – The latest technologies to help people get better control over the data and processes they rely on in their daily activities; in doing so, data will migrate closer to the edge
New technologies do not create chaos; they expose and accelerate it.
42
Sources of Semantic Chaos
• Data at the edge enables different processes for
  – different payment histories and methods
  – different customers and partners
  – different legal jurisdictions
• Data at the edge is more personalized
  – "Call Sally"
    • My cell phone knows who I mean
    • A centralized corporate directory does not
• With personalization come differences
  – with differences comes semantic chaos
Don't blame my phone.
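The "Call Sally" example can be sketched as a lookup whose answer depends on context. The names, numbers, and the `resolve` function are hypothetical: with a personal address book the name has one meaning, while a shared directory can only return every candidate.

```python
from typing import Dict, List

# Hypothetical data: a personal address book vs. a corporate directory.
MY_CONTACTS: Dict[str, str] = {"Sally": "Sally Chen, +1-555-0100"}
DIRECTORY: Dict[str, str] = {
    "Sally Chen": "+1-555-0100",
    "Sally Ortiz": "+1-555-0199",
}

def resolve(name: str, contacts: Dict[str, str], directory: Dict[str, str]) -> List[str]:
    """Return every entry the name could refer to in the given context."""
    if name in contacts:
        # Personal context: the phone already knows which Sally I mean.
        return [contacts[name]]
    # No personal context: every directory entry with that first name qualifies.
    return [f"{full}, {phone}" for full, phone in directory.items()
            if full.split()[0] == name]

print(resolve("Sally", MY_CONTACTS, DIRECTORY))  # one match
print(resolve("Sally", {}, DIRECTORY))           # two candidates
```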
43
Semantics
• Semantics today mirrors SGML of 1988
  – Complexity: description logics
  – Mystify: cashing in on the "semantic web" hype
  – Obfuscate: RDF, OWL, OWL Lite, DAML+OIL, KIF, REA, etc.
• Semantic integration
• Emergent modeling
• Community of practice
Semantics will be the "next big thing"… but today's semantic technology will seem like Model Ts.