Introduction

• Unlike most standards, XML was born from a desire to simplify, nurtured with perseverance and insight. While forces conspire to complicate, obfuscate and mystify XML with an entourage of related standards, XML's inherent balance of simplicity and functionality keeps it relevant. This talk will explore the conditions that combined to create a standard unique in its simplicity. Then the forces of complexity will be examined by considering the development of related XML standards. Finally, the future of XML and its role in information architecture will be considered by projecting how these forces are aligned today.


XML Trends

3

Sherlock Holmes and Dr. Watson went on a camping trip.

• After a good meal and a bottle of wine they lay down for the night and went to sleep.

• Some hours later, Holmes awoke and nudged his faithful friend.
  – "Watson, look up at the sky and tell me what you see."

• Watson replied, "I see millions and millions of stars."

• Holmes asked: "What does that tell you?"
  – Watson pondered for a minute.

• "Astronomically, it tells me that there are millions of galaxies and potentially billions of planets.

• Astrologically, I observe that Saturn is in Leo.

• Horologically, I deduce that the time is approximately a quarter past three.

• Theologically, I can see that God is all powerful and that we are small and insignificant.

• Meteorologically, I suspect that we will have a beautiful day tomorrow.

4

Dr. Watson

• What does it tell you?"

"Watson, you idiot. Someone has stolen our tent."

5

Introduction

• Unlike most standards, XML was born from
  – desire to simplify
  – perseverance and insight

• Forces conspire to
  – complicate, obfuscate and mystify
  – with an entourage of related standards,
  – XML's inherent balance of simplicity and functionality keeps it relevant.

• This talk will explore the conditions that combined to create a standard unique in its simplicity.
  – Then the forces of complexity will be examined by considering the development of related XML standards.

• Finally, the future of XML and its role in information architecture will be considered by projecting how these forces are aligned today.

6

Constraint Systems

• Information architecture is an exercise in constraints and models.
  – Constraint: Boolean relationship
  – Model: Abstraction, resource allocation, shared understanding

• Isn’t conformance to a model also a constraint? Yes.
  – Schema – a model and system of constraints

• Schemas
  – Define contracts for data that will be exchanged in the transaction
  – Provide application developers guidance
  – Guide an author in creating and editing information

7

Prior to XML

ISA~00~ ~00~ ~01~0819405530010 ~01~153734900 ~000114~0927~U~00302~000160473~0~P~|.GS~PO~COMDEX~D710-850~000114~0927~161441~X~003020.ST~850~290267.BEG~00~DS~20-P1-749833~~000114.NTE~ORI~SHIP ASAP.FOB~CC~OR.DTM~002~000114.N1~ST~LUCENT TECHNOLOGIES~92~99.N3~67 WHIPPANY RD~CAHNDANG.N4~WHIPPANY~NJ~07981.

I have no idea what this might mean!

• EDI error rates can approach 85%.

• HTML parsing requires up to 50% of the code in your favorite browser!

8

Markup

• Simple syntax that makes it easy to separate “data” from “meta-data”

• Markup includes
  – Elements
  – Attributes
  – Comments
  – Entity references
  – Processing instructions
  – CDATA sections
  – Document type declarations

Figure: an element, <tag> Content </tag>, made up of an opening tag, content and a closing tag.
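As a rough illustration of the markup kinds listed on this slide, the Python sketch below walks a small document with the standard library's `xml.dom.minidom` and reports what kind of markup each node is. The sample document, its element names and the `classify` helper are invented for this example and are not from the talk. Entity references and document type declarations are left out to keep it short.

```python
from xml.dom import minidom

# Hypothetical sample document, invented for illustration.
SAMPLE = """<?xml version="1.0"?>
<?render style="plain"?>
<order id="42">
  <!-- a comment is markup, not data -->
  <part>00000-99999</part>
  <note><![CDATA[ship < ASAP >]]></note>
</order>"""


def classify(xml_text):
    """Return (kind, value) pairs for each piece of markup or data found."""
    doc = minidom.parseString(xml_text)
    found = []

    def walk(node):
        if node.nodeType == node.ELEMENT_NODE:
            found.append(("element", node.tagName))
            for name, value in node.attributes.items():
                found.append(("attribute", f"{name}={value}"))
        elif node.nodeType == node.COMMENT_NODE:
            found.append(("comment", node.data.strip()))
        elif node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
            found.append(("processing instruction", node.target))
        elif node.nodeType == node.CDATA_SECTION_NODE:
            found.append(("cdata", node.data))
        elif node.nodeType == node.TEXT_NODE and node.data.strip():
            # Whitespace-only text between elements is skipped.
            found.append(("text", node.data.strip()))
        for child in node.childNodes:
            walk(child)

    walk(doc)
    return found


if __name__ == "__main__":
    for kind, value in classify(SAMPLE):
        print(f"{kind:>22}: {value}")
```

The parser, not the application, decides what is markup and what is data, which is the separation the slide describes.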

9

Understanding Data

• To understand data, you must be able to
  – parse it
  – infer its context
  – understand how it relates to you

Figure: Semantic Harmonization. Labels: EDI, legacy and flat-file sources; lexical reconciliation; XML; schema reconciliation; semantic reconciliation; harmonized XML. Layers: lexical, syntax, semantic.

10

The Information Continuum

Figure: stages placed on two axes, Level of Investment and Process Complexity (each from Low to High), spanning Data Mgmt., Information Mgmt. and Knowledge Mgmt.

• Stages: Capture, Organize, Synthesize, Evaluate

• Themes: Managing Assets, Adding Context, Generating Intellectual Capital, Increasing Value

• Putting information into managed locations

• Classifying documents, creating classification schemes

• Collecting information about the quality and usefulness of the information

• Driving business processes with knowledge

• Creating new knowledge from existing knowledge

11

Kann ich bitte ein Glas Wasser haben?

• Presentment
  – Again, louder
  – Reword
  – Reduction
  – Gesture
  – Translate

• Fulfillment
  – Guess
  – Look Up
  – Partial Understanding
  – Full Understanding

• shared context

Secondary Factors
  – Trust
  – Policy
  – Ability
  – Anticipation
  – Motivation

Wasser bitte! (Water, please!)

WASSER!!!!!! (WATER!!!!!!)

Can I please a glass of water have?

Wuerden Sie mir bitte ein Glas Wasser reichen? (Would you please hand me a glass of water?)

12

Cognition

• Carbon-based life
  – Intuition
    • Experience
    • Reasoning
  – Logic
    • Inductive, deductive
  – Intent
    • Idiom
    • Semantics
    • Language Roots

• Silicon-based
  – Data
    • Meta-data
  – Process
  – Context
    • Associations
    • Look-up Tables
    • Repositories

Context Example: Crew Chief
• Race Car Team leader or
• Rowing Team leader

13

Disruption

Figure: performance metric plotted against time. Labels: EDI Technology, demand for EDI, XML Technology, demand for XML eCommerce Technology, Technology Z, demand for technology Z.

• Volume of transactions
• Security, Reliability, Predictability
• Reduced Cost of Procurement

• Interoperability
• Flexibility and Agility
• Number of trading partners
• Global supply chains
• Reduced setup and TCO
• One-to-one marketing

• Reuse, leverage and communities
• Semantics
• Cost of new product deployment
• One-to-one business

• Security, Reliability, Predictability?
• Completeness?

Ref: The Innovator's Dilemma; Clayton Christensen

14

Creating XML

• Unlike most standards, XML was born from
  – Simplify
  – Perseverance
  – Insight

15

Why XML?

• XML was designed to manage documents on the web
  – Team included architects of HP.COM and DOCS.SUN.COM
  – Reuse content made for print in multiple web pages:
    • data sheets, white papers etc.
  – Present a more organized view of information
    • We faced significant differences in how our organizations structured information

• So, the answer was to create XML to
  – Interchange document information between groups
  – Make it easy to publish content standards
  – Separate content from presentation
    • which makes it easy to build tools that reuse information

16

The design goals for XML

1. XML shall be straightforwardly usable over the Internet.

2. XML shall support a wide variety of applications.

3. XML shall be compatible with SGML.

4. It shall be easy to write programs which process XML documents.

5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.

6. XML documents should be human-legible and reasonably clear.

7. The XML design should be prepared quickly.

8. The design of XML shall be formal and concise.

9. XML documents shall be easy to create.

10. Terseness in XML markup is of minimal importance.

17

XML

• XML is the eXtensible Markup Language

• Evolved from ISO Standard SGML

• Designed to
  – Add structure to Web documents
  – Be simple (25 pages)

• XML has expanded well beyond its original goals

18

Perseverance Timeline

Oct 18, 1994 First XML meeting at the Cafe d’Artist at the WWW2 conference

Oct 20, 1994 First draft of charter written (taxi ride w/ Jon Bosak)

July 22, 1996 First XML working group email (I hosted the server)

Aug 8, 1996 WG joined W3C

Aug 19, 1996 XML name coined

Aug 25, 1996 Design principles

Feb 1998 Released

19

Origins of XML

• 1996 November - introduced to SGML Community

• 1997 March - First press articles

• 1997 April (WWW6) - introduced to Web Community

• 1998 February - XML 1.0

• 1999 January - XML Namespaces

• 2001 May - XML Schema

• 2001 October - XSL Recommendation

• 2002 February - XML Digital Signatures

“I didn’t actually build it, but it was based on my idea.”

20

The world around us - The Evolution of e-Commerce

Web services promise to bring these all together and make networks of computers useful and ubiquitous

• Silicon chips made computers ubiquitous

• GUIs made using computers ubiquitous

• The Web made accessing content ubiquitous

• XML made understanding content ubiquitous

Timeline:

• 1980s: custom applications
• early 1990s: ERP systems
• mid 1990s: fax, phone, EDI
• late 1990s: B2C, B2B
• 2000s: Web Services

• 1975: FedEx installs the first drop box
• 1991: Crossing the Chasm and Virtual Corporation published
• 1994: The Web carries commercial messages anywhere in the world.
• 1999: e-Everything, ad nauseam
• 2001: Crossroads -- “The P.T. Barnum Era of B2B is over.”

21

Insight

• SGML Substrate (primordial soup)
  – All 12 of us had worked with SGML extensively
  – We knew the founders of SGML
  – We had worked together

• In short, we were a community-of-practice

• In development we walked through each SGML feature and asked:
  – Is this necessary for success?

22

Ockham's Razor

It is pointless to do with more what can be done with less.

According to Ockham

No one errs intentionally.

This means that whenever we do something wrong it is out of ignorance rather than evil.

According to Socrates

…it also means that Ockham’s razor cut too thin…you needed more information to do it right.

23

Standards Development

• An Example: XML Schema

• Complication
  – Time
  – Convergence

• Obfuscation
  – Priesthood

• Mystify

24

Serenity

• The Chair’s Credo

Grant me the
Serenity to accept the things I cannot change,
the Courage to change the things I can,
and the Wisdom to know the difference.

25

Validation

• Validation assures the data conforms to the schema's constraints
  – XML requires documents to be well-formed
    • must follow the grammar to assure parsers can correctly separate data from markup
  – XML allows documents to be validated against:
    • DTD, Schemas, Others
  – Schemas can only express part of the semantics required for business applications.

Validation done with schemas improves data quality and lowers application costs.
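The difference between well-formed and valid can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the standard-library parser used here checks well-formedness but does not validate against a DTD or XML Schema, so the hypothetical `REQUIRED_CHILDREN` table and `violations` function below stand in for a real schema and validator.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """True if the text follows the XML grammar, so it can be parsed at all."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical stand-in for a schema: the children each element must contain.
REQUIRED_CHILDREN = {"Order": ["PL", "Part"]}


def violations(xml_text):
    """List the rule violations in a well-formed document."""
    root = ET.fromstring(xml_text)
    problems = []
    for elem in root.iter():
        for child_name in REQUIRED_CHILDREN.get(elem.tag, []):
            if elem.find(child_name) is None:
                problems.append(f"<{elem.tag}> is missing <{child_name}>")
    return problems


if __name__ == "__main__":
    broken = "<Order><PL>Dave's Order</Order>"          # mismatched tags
    incomplete = "<Order><PL>Dave's Order</PL></Order>"  # parses, but no <Part>
    print(is_well_formed(broken))       # not well-formed
    print(is_well_formed(incomplete))   # well-formed
    print(violations(incomplete))       # but not valid under the rule table
```

A document that fails the first check cannot be processed at all. One that passes the first and fails the second can be parsed, but breaks the contract between the trading partners.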

26

Where It Goes Wrong: The Precision Example

Detailed Description Of Data

• Syntax
  – How the data is parsed into elements

• Structure
  – Contents, order and names

• Constraints
  – Datatypes
  – How many elements there can or must be
  – What values are valid
  – For example: date constraints could state that the year must be an integer that is greater than 1960 and less than 2100
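The date example reads naturally as a Boolean relationship, which is how an earlier slide defines a constraint. A minimal Python sketch, assuming the year arrives as element text (the function name `year_is_valid` is made up for this illustration):

```python
def year_is_valid(text):
    """Constraint from the slide: an integer greater than 1960 and less than 2100."""
    try:
        year = int(text)
    except (TypeError, ValueError):
        # Not an integer at all, so the datatype constraint already fails.
        return False
    return 1960 < year < 2100


if __name__ == "__main__":
    for sample in ["2003", "1960", "2100", "soon"]:
        print(sample, year_is_valid(sample))
```

Both bounds are exclusive, following the slide's wording. Whether 1960 and 2100 themselves should be accepted is exactly the kind of detail trading partners have to agree on.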

27

Standardizing Interoperation - Precision

Standards should be constrained wherever possible

• Constraints will allow
  – Developers to tailor their application
  – Improved data quality by message validation

• But, constraints limit adoption and flexibility
  – Developers and adopters choose not to use them

Taken to the extreme, the only standard needed is… a container for “anything”

28

An entourage: The W3C XML Family

• XML Coordination Group
  – XML Core
    • errata, XInclude, Information Set
  – XML Schema
    • Parts 0, 1, 2, 3
  – XML Linking WG
    • XML Base, XPath, XLink, XPointer
  – XML Query WG
    • Data Model, Algebra, Language
  – XML Namespaces

• XML Protocols WG

• XSL WG
  – XSL, XSLT

• XML DSIG
  – XML Signature
  – Canonical XML

• DOM (Levels 1, 2, 3)

• Others
  – XML-Encryption
  – VoiceXML
  – XForms WG
  – SMIL, SVG
  – XHTML
  – RDF …

More than 20 horizontal XML specifications!

29

Complication: Convergence

• Convergence
  – A mixture of two or more communities of practice that do not share sufficient background belief systems to know how to judge "necessary for success".

• Fractured Communities
  – Lack of community coherence due to historical differences in practice or implementation

• Examples
  – Schema: documentation, ecommerce, database
  – Query: hierarchy, relational
  – Namespaces: Java, XML, UML

30

Complication: Convergence

• Relational
  – Entity Relation Model
  – Normalization Plan
    • BLOBs/CLOBs
  – Queries
    • Grievances
    • Signers and states
    • Declarations

• Hierarchical (XML)
  – Elements, attributes
  – Structure
  – Constraints

Now, tell me who’s proudest?

31

Complication: Data/Context/Process

• Why do document and database people lack shared perspective?
  – Is it really the difference between hierarchical and relational views?

… Because their community-of-practice focuses on different metrics…

both think the other is a disruptive force

32

Complication: Time

• Time pressure causes lack of exploration of alternatives and design clarity
  – Unwillingness to compromise where it is appropriate

• Examples
  – Namespace – TBL wanted it done
  – XLink – after time, interest waned and new parties did not understand original goals

Another common standards personality type…
“Don Quixote”: An impractical idealist bent on righting incorrigible wrongs….

33

Obfuscation

• Priesthood
  – Many standards participants attempt to create an expertise that they can then exploit
  – The priesthood that surrounds complicated technology is self-serving

• Example
  – Namespaces: user community desire to make it mean more than it does, use of overly complex namespace plans

“If you can’t explain it to a 5 year old, then you don’t really understand it”

Cat’s Cradle, Kurt Vonnegut

34

Mystify

• Vendors are rewarded for creating a mystique around a standard -- particularly one that may challenge their current competitive positioning.

• Mystique serves multiple corporate needs:
  – Increase interest
  – Value of supporting technologies
  – Ability to subvert benefits of openness or functionality

To combat mystique - release open source or public domain tools that implement the standard.

IBM did this and MS followed for XML parsers.

35

Free or near-free software

• XML enabled reuse of core technology
  – Parsers
    • DOM, SAX, others
  – Processors
    • App servers, java, .Net
  – Databases
    • Native and Enabled

• Free, or at least inexpensive:
  – http://www.xml.com/programming/

Figure: the samples below illustrate lexical, syntactic and semantic differences between messages.

<ShoppingCart><ProductList> Dave’s Order</ProductList><Part> 00000-99999 </Part></ShoppingCart>

ISA~00~ ~00~ ~01~0819405530010 BEG~00~DS~20-P1-749833~~000114.NTE~ORI~SHIP ASAP.

<Order><PL> Dave’s Order </PL><Part> 00000-99999 </Part></Order>
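The two XML samples above carry the same order under different element names. One simple way to reconcile them, sketched here in Python, is a look-up table from each local element name to a shared concept. The `CONCEPTS` table, its concept names and the `harmonize` function are assumptions made for this illustration. The talk does not prescribe a mapping.

```python
import xml.etree.ElementTree as ET

CART = ("<ShoppingCart><ProductList> Dave’s Order</ProductList>"
        "<Part> 00000-99999 </Part></ShoppingCart>")
ORDER = "<Order><PL> Dave’s Order </PL><Part> 00000-99999 </Part></Order>"

# Hypothetical look-up table: local element name -> shared concept.
CONCEPTS = {
    "ShoppingCart": "order",
    "Order": "order",
    "ProductList": "order_name",
    "PL": "order_name",
    "Part": "part_number",
}


def harmonize(xml_text):
    """Translate one vocabulary into a record keyed by shared concepts."""
    root = ET.fromstring(xml_text)
    record = {"kind": CONCEPTS[root.tag]}
    for child in root:
        # Surrounding whitespace is a lexical detail, not part of the value.
        record[CONCEPTS[child.tag]] = (child.text or "").strip()
    return record


if __name__ == "__main__":
    print(harmonize(CART))
    print(harmonize(ORDER))
    print(harmonize(CART) == harmonize(ORDER))
```

The parser takes care of the lexical and syntactic layers. The table is where the semantic agreement lives, and it is the part that people, not parsers, have to supply.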

36

Future Projection

• Vulnerabilities
  – Byte count, Schema, Query, Namespace

• Semantics is the focus of the future…so why challenge what is working?

• Future Projections
  – XML
  – Schema Validation
  – XML Databases and XQuery
  – Semantics

37

XML

• Enables Information Reuse
  – Global interchange
  – Machine processing
  – New uses for documents

• Benefits of XML
  – Feature/Complexity balance
  – Enables user defined semantics and semantic processing

Figure: data set size (Small to Large) against structure (Un-Structured to Structured). Labels: Publishing, Database, Desktop & PDA, Transactional, XML.

Interchangeable Parts drove the Industrial Age

Reusable Information drives the Information Age

XML will remain the standard platform for information convergence

©2003 Contivo. All rights reserved

38

Validation

• Validation is seldom used today
  – Complexity: Computationally expensive
  – Mystify: Difficult to maintain “tight” schemas
  – Obfuscate: Schemas can only express part of the semantics required for business applications

• Hardware Accelerators

• Schema Tools

Validation will be done in production, with schemas improving data quality and lowering application costs.

39

XML Database and XQuery

• Suffers from
  – Mystification - IBM & MS v. the world.
  – Obfuscation - pursuit of detail and exceptions to distraction

• But time has overridden this with XQuery based tools on the low-cost track

XML Databases will make a comeback.

40

Why Semantics

The basic problem w/ semantics:

We put words on everything

Then we put meaning on the words

Then we disagree

What do you do now?

Semantics in Business Systems; Dave McComb p 11

41

Data at the Edge

• In 1869 the transcontinental railroad enabled and accelerated the migration westward.

• In the 40’s and 50’s, the interstate system enabled and accelerated migration to the suburbs.

• In the 80’s and 90’s computing became less centralized
  – Accelerated by PCs, relational databases, SQL, the Web
  – Data migrated out of “glass houses” and closer to the user

• Web Services, XQuery, XML
  – The latest technologies to help people get better control over data and processes that help them in their daily activities

…and in doing so, data will migrate closer to the edge

New technologies do not create chaos, they expose and accelerate it.

42

Sources of Semantic Chaos

• Data at the edge enables different processes for
  – different payment history and methods
  – different customers and partners
  – different legal jurisdictions

• Data at the edge is more personalized
  – “Call Sally”
    • My cell phone knows who I mean
    • A centralized corporate directory does not

• With personalization come differences
  – with differences comes semantic chaos

Don’t blame my phone.

43

Semantics

• Semantics today mirrors SGML of 1988
  – Complexity: description logics
  – Mystify: cashing in on the “semantic web” hype
  – Obfuscate: RDF, OWL, OWL Lite, DAML+OIL, KIF, REA, etc.

• Semantic Integration

• Emergent Modeling

• Community of Practice

Semantics will be the “next big thing”…but today’s semantic technology will seem like Model Ts.