11
Data Formats and XML Charles Severance www.dr-chuck.com Unless otherwise noted, the content of this course material is licensed under a Creative Commons Attribution 3.0 License. http://creativecommons.org/licenses/by/3.0/. Copyright 2009, Charles Severance. You assume all responsibility for use and potential liability associated with any use of the material. Material contains copyrighted content, used in accordance with U.S. law. Copyright holders of content included in this material should contact [email protected] with any questions, corrections, or clarifications regarding the use of content. The Regents of the University of Michigan do not license the use of third party content posted to this site unless such a license is specifically granted in connection with particular content. Users of content are responsible for their compliance with applicable law. Mention of specific products in this material solely represents the opinion of the speaker and does not represent an endorsement by the University of Michigan. For more information about how to cite these materials visit http://michigan.educommons.net/about/terms-of-use . Any medical information in this material is intended to inform and educate and is not a tool for self-diagnosis or a replacement for medical evaluation, advice, diagnosis or treatment by a healthcare professional. You should speak to your physician or make an appointment to be seen if you have questions or concerns about this information or your medical condition. Viewer discretion is advised: Material may contain medical images that may be disturbing to some viewers. Data on the Web With the HTTP Request/Response well understood and well supported there was a natural move toward exchanging data between programs using these protocols Outline XML - Marking up data XHTML - HTML + XML XML Schema - Contracts for XML RSS - Data Feeds

SI502-Data Formats and XML - University of Michigan...Mention of specific products in this material solely represents the opinion of the speaker and does not represent an endorsement

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: SI502-Data Formats and XML - University of Michigan...Mention of specific products in this material solely represents the opinion of the speaker and does not represent an endorsement

Data Formats and XMLCharles Severancewww.dr-chuck.com

Unless otherwise noted, the content of this course material is licensed under a Creative Commons Attribution 3.0 License.http://creativecommons.org/licenses/by/3.0/.

Copyright 2009, Charles Severance.

You assume all responsibility for use and potential liability associated with any use of the material. Material contains copyrighted content, used in accordance with U.S. law. Copyright holders of content included in this material should contact [email protected] with any questions, corrections, or clarifications regarding the use of content. The Regents of the University of Michigan do not license the use of third party content posted to this site unless such a license is specifically granted in connection with particular content. Users of content are responsible for their compliance with applicable law. Mention of specific products in this material solely represents the opinion of the speaker and does not represent an endorsement by the University of Michigan. For more information about how to cite these materials visit http://michigan.educommons.net/about/terms-of-use.

Any medical information in this material is intended to inform and educate and is not a tool for self-diagnosis or a replacement for medical evaluation, advice, diagnosis or treatment by a healthcare professional. You should speak to your physician or make an appointment to be seen if you have questions or concerns about this information or your medical condition. Viewer discretion is advised: Material may contain medical images that may be disturbing to some viewers.

Data on the Web

• With the HTTP Request/Response well understood and well supported there was a natural move toward exchanging data between programs using these protocols

Outline

• XML - Marking up data

• XHTML - HTML + XML

• XML Schema - Contracts for XML

• RSS - Data Feeds

Page 2: SI502-Data Formats and XML - University of Michigan...Mention of specific products in this material solely represents the opinion of the speaker and does not represent an endorsement

XMLMarking up data to send across the network...

http://en.wikipedia.org/wiki/XML

eXtensible Markup Language

• Primary purpose is to help information systems share structured data

• It started as a simplified subset of the Standard Generalized Markup Language (SGML), and is designed to be relatively human-legible

http://en.wikipedia.org/wiki/XML

XML Basics

• Start Tag

• End Tag

• Text Content

• Attribute

• Self Closing Tag

<person> <name>Chuck</name> <phone type=”intl”> +1 734 303 4456 </phone> <email hide=”yes” /></person>

White Space<person> <name>Chuck</name> <phone type=”intl”> +1 734 303 4456 </phone> <email hide=”yes” /></person>

<person> <name>Chuck</name> <phone type=”intl”>+1 734 303 4456</phone> <email hide=”yes” /></person>

Line ends do not matter. White space is generally discarded on text elements. We indent only

to be readable.

Page 3: SI502-Data Formats and XML - University of Michigan...Mention of specific products in this material solely represents the opinion of the speaker and does not represent an endorsement

Some XML... <recipe name="bread" prep_time="5 mins" cook_time="3 hours"> <title>Basic bread</title> <ingredient amount="8" unit="dL">Flour</ingredient> <ingredient amount="10" unit="grams">Yeast</ingredient> <ingredient amount="4" unit="dL" state="warm">Water</ingredient> <ingredient amount="1" unit="teaspoon">Salt</ingredient> <instructions> <step>Mix all ingredients together.</step> <step>Knead thoroughly.</step> <step>Cover with a cloth, and leave for one hour in warm room.</step> <step>Knead again.</step> <step>Place in a bread baking tin.</step> <step>Cover with a cloth, and leave for one hour in warm room.</step> <step>Bake in the oven at 180(degrees)C for 30 minutes.</step> </instructions> </recipe>

http://en.wikipedia.org/wiki/XML

XML Terminology

• Tags indicate the beginning and ending of elements

• Attributes - Keyword/value pairs on the opening tag of XML

• Serialize / De-Serialize - Convert data in one program into a common format that can be stored and/or transmitted between systems in a programming language independent manner

http://en.wikipedia.org/wiki/Serialization

Sending Data across the “Net”

PythonDictionary

JavaHashMap

a.k.a. “Wire Protocol” - What we send on the “wire”

Agreeing on a “Wire Format”

PythonDictionary

JavaHashMap

Serialize

<person> <name> Chuck </name> <phone> 303 4456 </phone></person>

De-Serialize

Page 4: SI502-Data Formats and XML - University of Michigan...Mention of specific products in this material solely represents the opinion of the speaker and does not represent an endorsement

XML “Elements” (or Nodes)

• Simple Element

• Complex Element

<people> <person> <name>Chuck</name> <phone>303 4456</phone> </person> <person> <name>Chuck</name> <phone>303 4456</phone> </person></people>

XML as a Tree

<a> ! <b>B</b> ! <c> ! <d>D</d> <e>E</e> ! </c></a>

a

b c

B d e

D EElements Text

XML Text and Attributes

<a> ! <b x=”5”>B</b> ! <c> ! <d>D</d> <e>E</e> ! </c></a>

a

b c

B d e

D EElements Text

5

xattrib

textnode

XML as Paths

<a> ! <b>B</b> ! <c> ! <d>D</d> <e>E</e> ! </c></a>

a

b c

B d e

D E

Elements Text

/a/b B/a/c/d D/a/c/e E

Page 5: SI502-Data Formats and XML - University of Michigan...Mention of specific products in this material solely represents the opinion of the speaker and does not represent an endorsement

XHTML - HTML as XML

http://en.wikipedia.org/wiki/XHTML

XHTML Rules (versus HTML)

• Makes HTML also be well-formed XML

• Allows more flexible parsing of HTML

• All tags must have end tags

• All tag names are lower case (in XML case matters)

• Attributes must have quotes

<HTML> <body> <p>Some text. <p>Some more text</p> <p>A <a href=http://www.cnn.com>link</a></body></HTML>

<html> <body> <p>Some text.</p> <p>Some more text</p> <p>A <a href=”http://www.cnn.com”>link</a></p></body></html>

Old HTML

New XHTML

XML SchemaDescribing a “contract” as to what is acceptable XML.

http://en.wikipedia.org/wiki/Xml_schemahttp://en.wikibooks.org/wiki/XML_Schema

Page 6: SI502-Data Formats and XML - University of Michigan...Mention of specific products in this material solely represents the opinion of the speaker and does not represent an endorsement

XML Schema

• Description of the legal format of an XML document

• Expressed in terms of constraints on the structure and content of documents

• Often used to specify a “contract” between systems - “My system will only accept XML that conforms to this particular Schema.”

• If a particular piece of XML meets the specification of the Schema - it is said to “validate”

http://en.wikipedia.org/wiki/Xml_schema

ValidatorXML Schema Contract

XML Document

XML Validation

Validator

<person> <lastname>Severance</lastname> <age>17</age> <dateborn>2001-04-17</dateborn></person>

<xs:complexType name=”person”> <xs:sequence> <xs:element name="lastname" type="xs:string"/> <xs:element name="age" type="xs:integer"/> <xs:element name="dateborn" type="xs:date"/> </xs:sequence></xs:complexType>

XML Schema Contract

XML Document XML Validation Many XML Schema Languages

• Document Type Definition (DTD)

• http://en.wikipedia.org/wiki/Document_Type_Definition

• Standard Generalized Markup Language (ISO 8879:1986 SGML)

• http://en.wikipedia.org/wiki/SGML

• XML Schema from W3C - (XSD)

• http://en.wikipedia.org/wiki/XML_Schema_(W3C)

http://en.wikipedia.org/wiki/Xml_schema

Page 7: SI502-Data Formats and XML - University of Michigan...Mention of specific products in this material solely represents the opinion of the speaker and does not represent an endorsement

XSD XML Schema (W3C spec)

• We will focus on the World Wide Web Consortium (W3C) version

• It is often called “W3C Schema” because “Schema” is considered generic

• More commonly it is called XSD because the file names end in .xsd

http://www.w3.org/XML/Schema

http://en.wikipedia.org/wiki/XML_Schema_(W3C)

XSD Structure

• xs:element

• xs:sequence

• xs:complexType

<xs:complexType name=”person”> <xs:sequence> <xs:element name="lastname" type="xs:string"/> <xs:element name="age" type="xs:integer"/> <xs:element name="dateborn" type="xs:date"/> </xs:sequence></xs:complexType>

<person> <lastname>Severance</lastname> <age>17</age> <dateborn>2001-04-17</dateborn></person>

XSDConstraints

http://www.w3schools.com/Schema/schema_complex_indicators.asp

<xs:element name="person"> <xs:complexType> <xs:sequence> <xs:element name="full_name" type="xs:string"/> maxOccurs="1" minOccurs="1"/> <xs:element name="child_name" type="xs:string" maxOccurs="10" minOccurs="0"/> </xs:sequence> </xs:complexType></xs:element>

<person> <full_name>Tove Refsnes</full_name> <child_name>Hege</child_name> <child_name>Stale</child_name> <child_name>Jim</child_name> <child_name>Borge</child_name></person>

XSD Data Types

http://www.w3schools.com/Schema/schema_dtypes_numeric.asp

<xs:element name="customer" type="xs:string"/><xs:element name="start" type="xs:date"/><xs:element name="startdate" type="xs:dateTime"/><xs:element name="prize" type="xs:decimal"/><xs:element name="weeks" type="xs:integer"/>

<customer>John Smith</customer><start>2002-09-24</start><startdate>2002-05-30T09:30:10Z</startdate><prize>999.50</prize><weeks>30</weeks>

Page 8: SI502-Data Formats and XML - University of Michigan...Mention of specific products in this material solely represents the opinion of the speaker and does not represent an endorsement

<?xml version="1.0" encoding="utf-8" ?><xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="Address"> <xs:complexType> <xs:sequence> <xs:element name="Recipient" type="xs:string" /> <xs:element name="House" type="xs:string" /> <xs:element name="Street" type="xs:string" /> <xs:element name="Town" type="xs:string" /> <xs:element minOccurs="0" name="County" type="xs:string" /> <xs:element name="PostCode" type="xs:string" /> <xs:element name="Country"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="FR" /> <xs:enumeration value="DE" /> <xs:enumeration value="ES" /> <xs:enumeration value="UK" /> <xs:enumeration value="US" /> </xs:restriction> </xs:simpleType> </xs:element> </xs:sequence> </xs:complexType> </xs:element></xs:schema>

<?xml version="1.0" encoding="utf-8"?><Address xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="SimpleAddress.xsd"> <Recipient>Mr. Walter C. Brown</Recipient> <House>49</House> <Street>Featherstone Street</Street> <Town>LONDON</Town> <PostCode>EC1Y 8SY</PostCode> <Country>UK</Country></Address>

<?xml version="1.0" encoding="ISO-8859-1" ?><xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"><xs:element name="shiporder"> <xs:complexType> <xs:sequence> <xs:element name="orderperson" type="xs:string"/> <xs:element name="shipto"> <xs:complexType> <xs:sequence> <xs:element name="name" type="xs:string"/> <xs:element name="address" type="xs:string"/> <xs:element name="city" type="xs:string"/> <xs:element name="country" type="xs:string"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="item" maxOccurs="unbounded"> <xs:complexType> <xs:sequence> <xs:element name="title" type="xs:string"/> <xs:element name="note" type="xs:string" minOccurs="0"/> <xs:element name="quantity" type="xs:positiveInteger"/> <xs:element name="price" type="xs:decimal"/> </xs:sequence> </xs:complexType> </xs:element> </xs:sequence> <xs:attribute name="orderid" type="xs:string" use="required"/> </xs:complexType></xs:element></xs:schema>

<?xml version="1.0" encoding="ISO-8859-1"?><shiporder orderid="889923" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="shiporder.xsd"> <orderperson>John Smith</orderperson> <shipto> <name>Ola Nordmann</name> <address>Langgt 23</address> <city>4000 Stavanger</city> <country>Norway</country> </shipto> <item> <title>Empire Burlesque</title> <note>Special Edition</note> <quantity>1</quantity> <price>10.90</price> </item> <item> <title>Hide your heart</title> <quantity>1</quantity> <price>9.90</price> </item></shiporder>

http://www.w3schools.com/Schema/schema_example.asp

RSSReally Simple Syndication

Rich Site SummaryRDF Site Summary

http://en.wikipedia.org/wiki/RSS_(file_format)http://en.wikipedia.org/wiki/Web_feed

Page 9: SI502-Data Formats and XML - University of Michigan...Mention of specific products in this material solely represents the opinion of the speaker and does not represent an endorsement

RSS - Rich Site Summary

• A way to list “what’s new” on a site

• Also called a “feed”

• Designed to be “scraped”

• Special Programs called “RSS Reader”, “News Reader”

http://en.wikipedia.org/wiki/Aggregator

Content Mashup

RSS

Subscribers

RSS

RSS

Page 10: SI502-Data Formats and XML - University of Michigan...Mention of specific products in this material solely represents the opinion of the speaker and does not represent an endorsement

Advertising an RSS Feed in HTML

<head> <title>Dr. Chuck's Interactive Personal Portfolio</title> <link rel="alternate" type="application/rss+xml" title="RSS" href="http://www.dr-chuck.com/csev-blog/index.rdf" /> <link rel="alternate" type="application/atom+xml" title="Atom" href="http://www.dr-chuck.com/csev-blog/atom.xml" /></head>

Several Versions of RSS

• There are two threads of RSS specification - the numbers overlap a bit - there mut have been a conflict of approaches

• RDF RSS - RDF Site Summary

• RSS 2.* - Really Simple Syndication

http://en.wikipedia.org/wiki/RSS_(file_format)

<?xml version="1.0"?> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/"> <channel rdf:about="http://www.xml.com/xml/news.rss"> <title>XML.com</title> <link>http://xml.com/pub</link> <description> XML.com features a rich mix of information and services for the XML community. </description> <image rdf:resource="http://xml.com/universal/images/xml_tiny.gif" /> <items> <rdf:Seq> <rdf:li rdf:resource="http://xml.com/pub/2000/08/09/xslt/xslt.html" /> <rdf:li rdf:resource="http://xml.com/pub/2000/08/09/rdfdb/index.html" /> </rdf:Seq> </items> <textinput rdf:resource="http://search.xml.com" /> </channel>

<item rdf:about="http://xml.com/pub/2000/08/09/xslt/xslt.html"> <title>Processing Inclusions with XSLT</title> <link>http://xml.com/pub/2000/08/09/xslt/xslt.html</link> <description> Processing document inclusions with general XML tools can be problematic. This article proposes a way of preserving inclusion information through SAX-based processing. </description> </item> <item rdf:about="http://xml.com/pub/2000/08/09/rdfdb/index.html"> <title>Putting RDF to Work</title> <link>http://xml.com/pub/2000/08/09/rdfdb/index.html</link> <description> Tool and API support for the Resource Description Framework is slowly coming of age. Edd Dumbill takes a look at RDFDB, one of the most exciting new RDF toolkits. </description> </item>

</rdf:RDF>

Page 11: SI502-Data Formats and XML - University of Michigan...Mention of specific products in this material solely represents the opinion of the speaker and does not represent an endorsement

<?xml version="1.0"?><rss version="2.0"> <channel> <title>Lift Off News</title> <link>http://liftoff.msfc.nasa.gov/</link> <description>Liftoff to Space Exploration.</description> <item> <title>Star City</title> <link>http://liftoff.msfc.nasa.gov/news/2003/news-starcity.asp</link> <description>How do Americans get ready to work with Russians aboard the International Space Station? They take a crash course in culture, language and protocol at Russia's Star City.</description> <pubDate>Tue, 03 Jun 2003 09:39:21 GMT</pubDate> <guid>http://liftoff.msfc.nasa.gov/2003/06/03.html#item573</guid> </item> <item> <title>Space Exploration</title> <link>http://liftoff.msfc.nasa.gov/</link> <description>Sky watchers in Europe, Asia, and parts of Alaska and Canada will experience a partial eclipse of the Sun on Saturday, May 31st.</description> <pubDate>Fri, 30 May 2003 11:06:42 GMT</pubDate> <guid>http://liftoff.msfc.nasa.gov/2003/05/30.html#item572</guid> </item> </channel></rss>

Summary

• We have protocols that allow cooperating applications to exchange data

• One common data format is XML

• XML Schema can be used to establish a contract about the format of a set of XML documents

• RSS is a format used for “feeds” that are published by sites and consumed by readers