XML Schema Basics

Embed Size (px)

Citation preview

  • 8/12/2019 XML Schema Basics

    1/25

    XML Schema BasicsIn this lesson of the XML tutorial, you will learn...1. The purpose of XML Schema.2. The limitations of DTDs.3. The power of XML Schema.4. How to validate an XML Instance with an XML schema.The Purpose of XML SchemaXML Schema is an XML-based language used to create XML-based languages and datamodels.An XML schema defines element and attribute names for a class of XML documents.Theschema also specifies the structure that those documents must adhere to and thetype of contentthat each element can hold.XML documents that attempt to adhere to an XML schema are said to be instances of thatschema. If they correctly adhere to the schema, then they are valid instances. This is not the sameas being well formed. A well-formed XML document follows all the syntax rules ofXML, but itdoes necessarily adhere to any particular schema. So, an XML document can be well formedwithout being valid, but it cannot be valid unless it is well formed.The Power of XML Schema

    You may already have some experience with DTDs. DTDs are similar to XML schemasin thatthey are used to create classes of XML documents. DTDs were around long before the advent ofXML. They were originally created to define languages based on SGML, the parentof XML.Although DTDs are still common, XML Schema is a much more powerful language.As a means of understanding the power of XML Schema, let's look at the limitations of DTD.1. DTDs do not have built-in datatypes.2. DTDs do not support user-derived datatypes.3. DTDs allow only limited control over cardinality (the number of occurrences of an

    element within its parent).4. DTDs do not support Namespaces or any simple way of reusing or importingotherschemas.A First LookAn XML schema describes the structure of an XML instance document by defining what eachelement must or may contain. An element is limited by its type. For example, anelement ofcomplex type can contain child elements and attributes, whereas a simple-type element can onlycontain text. The diagram below gives a first look at the types of XML Schema elements.

    Schema authors can define their own types or use the built-in types. Throughoutthis course, wewill refer back to this diagram as we learn to define elements. You may want tobookmark thispage, so that you can easily reference it.The following is a high-level overview of Schema types.1. Elements can be of simple type or complex type.2. Simple type elements can only contain text. They can not have child elements or

  • 8/12/2019 XML Schema Basics

    2/25

    attributes.3. All the built-in types are simple types (e.g, xs:string).4. Schema authors can derive simple types by restricting another simple type. For example,an email type could be derived by limiting a string to a specific pattern.5. Simple types can be atomic (e.g, strings and integers) or non-atomic (e.g, lists).6. Complex-type elements can contain child elements and attributes as wellas text.7. By default, complex-type elements have complex content, meaning that they have childelements.8. Complex-type elements can be limited to having simple content, meaning they onlycontain text. They are different from simple type elements in that they have attributes.9. Complex types can be limited to having no content, meaning they are empty, but theyhave may have attributes.10. Complex types may have mixed content - a combination of text and child elements.A Simple XML SchemaLet's take a look at a simple XML schema, which is made up of one complex type element with

    two child simple type elements.Code Sample: SchemaBasics/Demos/Author.xsd

    Code ExplanationAs you can see, an XML schema is an XML document and must follow all the syntaxrules ofany other XML document; that is, it must be well formed. XML schemas also have to follow therules defined in the "Schema of schemas," which defines, among other things, thestructure ofand element and attribute names in an XML schema.Although it is not required, it is a common practice to use the xs qualifier (see footnote) toidentify Schema elements and types.The document element of XML schemas is xs:schema. It takes the attribute xmlns:xs with the

    value of http://www.w3.org/2001/XMLSchema, indicating that the document should follow therules of XML Schema. This will be clearer after you learn about namespaces.In this XML schema, we see a xs:element element within the xs:schema element. xs:element isused to define an element. In this case it defines the element Author as a complex type element,which contains a sequence of two elements: FirstName and LastName, both of whichare of thesimple type, string.

  • 8/12/2019 XML Schema Basics

    3/25

  • 8/12/2019 XML Schema Basics

    4/25

    Simple-Type ElementsIn this lesson of the XML tutorial, you will learn...1. To use XML Schema's built-in simple types.2. To derive your own types.3. To define list types.4. To define union types.5. To declare global simple-type elements.6. To set default and fixed values.7. To allow for nil values.OverviewSimple-type elements have no children or attributes. For example, the Name element below is asimple-type element; whereas the Person and HomePage elements are not.Code Sample: SimpleTypes/Demos/SimpleType.xmlMark TwainCode ExplanationAs the diagram below shows, a simple type can either be built-in or user-derived

    . In this lesson,we will examine both.Built-in Simple TypesXML Schema specifies 44 built-in types, 19 of which are primitive.19 Primitive Data TypesThe 19 built-in primitive types are listed below.1. string2. boolean3. decimal4. float5. double6. duration7. dateTime

    8. time9. date10. gYearMonth11. gYear12. gMonthDay13. gDay14. gMonth15. hexBinary16. base64Binary17. anyURI18. QName19. NOTATIONBuilt-in Derived Data Types

    The other 25 built-in data types are derived from one of the primitive types listed above.1. normalizedString2. token3. language4. NMTOKEN5. NMTOKENS6. Name7. NCName8. ID

  • 8/12/2019 XML Schema Basics

    5/25

    9. IDREF10. IDREFS11. ENTITY12. ENTITIES13. integer14. nonPositiveInteger15. negativeInteger16. long17. int18. short19. byte20. nonNegativeInteger21. unsignedLong22. unsignedInt23. unsignedShort24. unsignedByte25. positiveIntegerDefining a Simple-type ElementA simple-type element is defined using the type attribute.Code Sample: SimpleTypes/Demos/Author.xsd

    Code ExplanationNotice the FirstName and LastName elements in the code sample below. They are not explicitlydefined as simple type elements. Instead, the type is defined with the type attribute. Because thevalue (string in both cases) is a simple type, the elements themselves are simpl

    e-type elements.The Author element is a complex-type element. Complex-type elements will be covered in"Complex-Type Elements".Exercise: Building a Simple SchemaDuration: 10 to 15 minutes.In this exercise, you will build a simple XML schema.1. Open SimpleTypes/Exercises/Song.xsd for editing.2. Between the open and close xs:sequence tags, declare three new elements:o Title of type xs:string.o Year of type xs:gYear.o Artist of type xs:string.3. Save the file.

    4. Try to validate LoveMeDo.xml against the schema you just created. If theXML document isinvalid, fix your schema.Code Sample: SimpleTypes/Exercises/Song.xsd

  • 8/12/2019 XML Schema Basics

    6/25

    Add three simple-type elements: 1. Title 2. Year 3. Artist --> Where is the solution?User-derived Simple TypesA schema author can derive a new simple type using the element.This simpletype can then be used in the same way that built-in simple types are.Simple types are derived by restricting built-in simple types or other user-derived simple types.For example, you might want to create a simple type called password that is an eight-characterstring. To do so, you would start with the xs:string type and restrict its length to eight characters.This is done nesting the element inside of the element.Code Sample: SimpleTypes/Demos/Password.xsd

    Code Sample: SimpleTypes/Demos/Password.xmlMyPasWrdApplying FacetsSimple types can be derived by applying one or more of the following facets.* length* minLength* maxLength* pattern

    * enumeration* whiteSpace* minInclusive* minExclusive* maxInclusive* maxExclusive* totalDigits* fractionDigitsControlling LengthThe length of a string can be controlled with the length, minLength, and maxLeng

  • 8/12/2019 XML Schema Basics

    7/25

    th facets. Weused the length facet in the example above to create a Password simple type as an eight-characterstring. We could use minLength and maxLength to allow passwords that were between six andtwelve characters in length.The schema below shows how this is done. The two XML instances shown below it are bothvalid, because the length of the password is between six and twelve characters.Code Sample: SimpleTypes/Demos/Password2.xsd

    Code Sample: SimpleTypes/Demos/Password2.xmlMyPassCode Sample: SimpleTypes/Demos/Password2b.xmlMyPassWord

    Specifying PatternsPatterns are specified using the xs:pattern element and regular expressions. (see footnote) Forexample, you could use the xs:pattern element to restrict the Password simple type to consist ofbetween six and twelve characters, which can only be lowercase and uppercase letters andunderscores.Code Sample: SimpleTypes/Demos/Password3.xsd

  • 8/12/2019 XML Schema Basics

    8/25

    Code Sample: SimpleTypes/Demos/Password3.xmlMyPasswordWorking with NumbersNumeric simple types can be derived by limiting the value to a certain range using minExclusive,minInclusive, maxExclusive, and maxInclusive. You can also limit the total number of digits andthe number of digits after the decimal point using totalDigits and fractionDigits, respectively.Mins and MaxsThe following example shows how to derive a simple type called Salary, which isa decimalbetween 10,000 and 90,000.Code Sample: SimpleTypes/Demos/Employee.xsd

    Code Sample: SimpleTypes/Demos/JohnSmith.xml

    55000Number of DigitsUsing totalDigits and fractionDigits, we can further specify that the Salary type should consist ofseven digits, two of which come after the decimal point. Both totalDigits and fractionDigits aremaximums. That is, if totalDigits is specified as 5 and fractionDigits is specified as 2, a validnumber could have no more than five digits total and no more than two digits aft

    er the decimalpoint.Code Sample: SimpleTypes/Demos/Employee2.xsd

  • 8/12/2019 XML Schema Basics

    9/25

    Code Sample: SimpleTypes/Demos/MarySmith.xml55000.00EnumerationsA derived type can be a list of possible values. For example, the JobTitle element could be a listof pre-defined job titles.Code Sample: SimpleTypes/Demos/Employee3.xsd

    Code Sample: SimpleTypes/Demos/SteveSmith.xml

    90000.00Sales ManagerWhitespace-handlingBy default, whitespace in elements of the datatype xs:string is preserved in XMLdocuments;however, this can be changed for datatypes derived from xs:string. This is done

  • 8/12/2019 XML Schema Basics

    10/25

    with thexs:whiteSpace element, the value of which must be one of the following.* preserve - whitespace is not normalized. That is to say, it is kept as is.* replace - all tabs, line feeds, and carriage returns are replaced by single spaces.* collapse - all tabs, line feeds, and carriage returns are replaced by single spaces and then allgroups of single spaces are replaced with one single space. All leading and trailing spaces arethen removed (i.e, trimmed).In SimpleTypes/Demos/Password.xsd, we looked at restricting the length of a Password datatypeto eight characters using the xs:length element. If whitespace is preserved, then leading andtrailing spaces are considered part of the password. In the following example, we setxs:whiteSpace to collapse, thereby discounting any leading or trailing whitespace. As you cansee, this allows the XML instance author to format the document without consideration ofwhitespace.Code Sample: SimpleTypes/Demos/Password4.xsd

    Code Sample: SimpleTypes/Demos/Password4.xml 12345678Exercise: Restricting Element ContentDuration: 15 to 20 minutes.In this exercise, you will further restrict the Song schema, so that the Title a

    nd Artist elementswill have a specified pattern and the Year will be 1950 or later.1. Open SimpleTypes/Exercises/Song.xsd and save it as Song2.xsd in the samedirectory.2. Define a simple type called ProperName that follows this pattern. Note that the only space inthe pattern is the one before the question mark.([A-Z0-9][A-Za-z0-9\-']* ?)+3. Change the Title and Artist elements to be of the ProperName type.4. Define another simple type called Year, which is derived from gYear and

  • 8/12/2019 XML Schema Basics

    11/25

    only accepts yearsbetween 1950 and 2010, inclusive.5. Change the Year element to be of the Year type.6. Try to validate SimpleTypes/Exercises/CantBuyMeLove.xml against the schema you justcreated. If the XML document is invalid, fix your schema.Code Sample: SimpleTypes/Exercises/Song.xsd Where is the solution?Specifying Element Type Locally

    So far in this lesson, we have been defining simple types globally and then setting the typeattribute of element declarations to be of our derived simple types. This makesit easy to reuse asimple type across multiple elements, as we saw with the ProperName type in thelast exercise.It is also possible to define the type of an element locally. The type is then unnamed andapplicable only to that element. The only reason to do this is to clearly show that the type isspecific to that element and not meant for reuse.Code Sample: SimpleTypes/Demos/PasswordLocal.xsd

    Nonatomic TypesAll of XML Schema's built-in types are atomic, meaning that they cannot be broken down intomeaningful bits. XML Schema provides for two nonatomic types: lists and unions.ListsList types are sequences of atomic types separated by whitespace; you can have alist of integers

  • 8/12/2019 XML Schema Basics

    12/25

    or a list of dates. Lists should not be confused with enumerations. Enumerationsprovide optionalvalues for an element. Lists represent a single value within an element.Code Sample: SimpleTypes/Demos/EmployeeList.xsd

    Code Sample: SimpleTypes/Demos/SandySmith.xml

    44000Salesperson2006-8-13 2006-08-14 2006-08-15UnionsUnion types are groupings of types, essentially allowing for the value of an element to be of morethan one type. In the example below, two atomic simple types are derived: RunningRace andGymnastics. A third simple type, Event, is then derived as a union of the previous two. The

    Event element is of the Event type, which means that it can either be of the RunningRace or theGymnastics type.Code Sample: SimpleTypes/Demos/Program.xsd

  • 8/12/2019 XML Schema Basics

    13/25

    Code Sample: SimpleTypes/Demos/100Meters.xml100 metersExercise: Adding Nonatomic TypesDuration: 10 to 15 minutes.In this exercise, you will add a nonatomic type to the song schema.1. Open SimpleTypes/Exercises/Song2.xsd and save it as Song3.xsd in the same directory.2. Define a new simple type called SongLength, which is an enumeration of t

    hree values: "Short","Medium', and "Long".3. Define another new simple type called SongTime, which is a union of xs:duration andSongLength.4. At the end of the sequence of elements within the Song element, insert an additionalelement, Length, which is of the SongTime datatype.5. Try to validate SimpleTypes/Exercises/TicketToRide.xml andSimpleTypes/Exercises/EleanorRigby.xml against the schema you just created. If either XMLdocument is invalid, fix your schema.Where is the solution?

    Declaring Global Simple-Type ElementsWhen an element declaration is a child of the xs:schema element, the declared element is global.Global elements can be referenced by other element declarations, allowing for element reuse.Take a look at the following example.Code Sample: SimpleTypes/Demos/AuthorGlobal.xsd

  • 8/12/2019 XML Schema Basics

    14/25

    Code ExplanationIn this example, the FirstName and LastName elements are both declared globally.The globalelements are then referenced as children of the Author sequence.Global vs. Local Simple-Type ElementsThe major advantage of declaring an element globally is that the element can then be referencedthroughout the schema. This makes the code more modular and easier to maintain.For example,suppose that the song schema contained MusicWriter, LyricsWriter, and Singer elements. Each ofthese elements might have the child element Name. If the Name element is declared globally, anychanges to that element can be made in one location.

    The major disadvantage of declaring elements globally is that all global elements must haveunique names.Code Sample: SimpleTypes/Demos/BookLocal.xsd

    Code ExplanationNotice that there are two elements named Title, which can appear in different locations in theXML instance and are of different types. When the Title element appears at the root of the XMLinstance, its value can be any string; whereas, when it appears as a child of Author, its value is

  • 8/12/2019 XML Schema Basics

    15/25

    limited to "Mr.", "Ms.", or "Dr."The example below defines a similar content model; however, because the elementsare declaredglobally, the name Title cannot be used twice.Code Sample: SimpleTypes/Demos/BookGlobal.xsd

    Exercise: Converting Simple-Type Element Declarationsfrom Local to GlobalDuration: 10 to 20 minutes.In this exercise, you will convert the element declarations in the song schema f

    rom local toglobal.1. Open SimpleTypes/Exercises/Song3.xsd and save it as Song4.xsd in the same directory.2. Change the Title, Year, Artist, and Length elements to be declared globally.3. Try to validate SimpleTypes/Exercises/StrawberryFields.xml against the schema you justcreated. If the XML document is invalid, fix your schema.Where is the solution?Default ValuesElements that do not have any children can have default values. To specify a default value, use

    the default attribute of the xs:element element.Code Sample: SimpleTypes/Demos/EmployeeDefault.xsd---- Code Omitted ----

  • 8/12/2019 XML Schema Basics

    16/25

  • 8/12/2019 XML Schema Basics

    17/25

    Code ExplanationThe MinOccurs attribute is used to make the Status element optional. However, ifit is used, itmust contain the value current or be left empty, in which case, the value current is implied. Thefile SimpleTypes/Demos/LauraSmith.xml in the Demos folder validates against thisschema.Nil ValuesWhen an optional element is left out of an XML instance, it has no clear meaning. For example,suppose a schema declares a Name element as having required FirstName and LastNameelements and an optional MiddleName element. And suppose a particular instance of this schemadoes not include the MiddleName element. Does this mean that the instance authordid not knowthe middle name of the person in question or does it mean the person in questionhas no middlename?Setting the nillable attribute of xs:element to true indicates that such element

    s can be set to nil bysetting the xsi:nil attribute to true.Code Sample: SimpleTypes/Demos/AuthorNillable.xsd

    Code Sample: SimpleTypes/Demos/MarkTwain.xmlMarkTwainCode ExplanationBy including the MiddleName element and setting xsi:nil to true, we are explicitly stating that

    Mark Twain has no middle name.Simple-Type Elements ConclusionIn this lesson of the XML tutorial, you have learned to work with XML Schema's built-in simpletypes, to derive your own simple types, and to declare simple-type elements. Wewill now take alook at complex-type elements.

  • 8/12/2019 XML Schema Basics

    18/25

    Complex-Type Elements

    Complex-Type ElementsIn this lesson of the XML tutorial, you will learn...1. To declare complex-type elements.2. To use content models.3. To control cardinality (quantity of elements).4. To declare elements of mixed types.

    5. To define named complex types.OverviewComplex-type elements have attributes, child elements, or some combination of the two. Forexample, the Name and HomePage elements below are both complex-type elements.Code Sample: ComplexTypes/Demos/ComplexType.xml Mark Twain

    As the diagram below shows, a complex-type element can be empty, contain simplecontent suchas a string, or can contain complex content such as a sequence of elements.Whereas it is not necessary to explicitly declare that a simple-type element isa simple type, it isnecessary to specify that a complex-type element is a complex type. This is donewith thexs:complexType element as shown below.Syntax

  • 8/12/2019 XML Schema Basics

    19/25

  • 8/12/2019 XML Schema Basics

    20/25

    Syntax Furthermore, model groups can be nested within each other. The following exampleillustratesthis. Notice that the choice model group, which allows for either a Salary element or a Wageelement is nested with in a sequence model group. Both of the subsequent instances are validaccording to this schema.

    Code Sample: ComplexTypes/Demos/Employee.xsd

    Code Sample: ComplexTypes/Demos/DaveSmith.xml Dave Smith

  • 8/12/2019 XML Schema Basics

    21/25

    90000Code Sample: ComplexTypes/Demos/JillSmith.xml Jill Smith20.50Occurrence ConstraintsBy default, elements that are declared locally must show up once and only once within theirparent. This constraint can be changed using the minOccurs and maxOccurs attributes. Thedefault value of each of these attributes is 1. The value of minOccurs can be any non-negativeinteger. The value of maxOccurs can be any positive integer or unbounded, meaning that theelement can appear an infinite number of times.The example below shows how minOccurs can be used to make an element optional and how

    maxOccurs can be used to allow an element to be repeated indefinitely.Code Sample: ComplexTypes/Demos/Employee2.xsd

  • 8/12/2019 XML Schema Basics

    22/25

    Code ExplanationNote that minOccurs and maxOccurs can also be applied to model groups (e.g, xs:sequence) tocontrol the number of times a model group can be repeated.Exercise: Adding ComplexType ElementsDuration: 15 to 25 minutes.In this exercise, you will modify the song schema, adding model groups and occurrenceconstraints.1. Open ComplexTypes/Exercises/Song.xsd for editing.2. Modify the schema so that the Song element's children can appear in anyorder.3. Change the content model of the Song element so that it contains an Artists element inplace of the Artist element.4. Make the Artists element allow for one or more child Artist elements. Note that the Artistelement is already declared globally.5. Make the Length element optional.6. Try to validate ComplexTypes/Exercises/WeAreTheWorld.xml against the schema youjust created. If the XML document is invalid, fix your schema.

    Where is the solution?Declaring Global Complex-Type ElementsAs with simple-type elements, complex-type elements can be declared globally byplacing theelement declaration as a child of the xs:schema element.Globally declared elements cannot take occurrence constraints. However, the minOccurs andmaxOccurs constraints can be applied to references to globally declared elements. To illustrate,look at the following example. Notice that all elements, both simple-type and complex-type, aredeclared globally and then referenced within the model groups. Some of the references (e.g,

    Responsibilities) have occurrence constraints assigned to them.Code Sample: ComplexTypes/Demos/Employee3.xsd

  • 8/12/2019 XML Schema Basics

    23/25

    Exercise: Converting Complex-Type Elements from Local toGlobal

    Duration: 10 to 15 minutes.In this exercise, you will convert the Artists element declaration in the song schema from local toglobal.1. Open ComplexTypes/Exercises/Song.xsd and save it as Song2.xsd in the same directory.2. Change the Artists element to be declared globally.3. Try to validate ComplexTypes/Exercises/TheGirlIsMine.xml against the schema you justcreated. If the XML document is invalid, fix your schema.Where is the solution?Mixed ContentSometimes an element will contain both child elements and character text. For ex

    ample, a paraelement might contain mostly plain character text, but it could also have otherelements (e.g,emphasis) littered throughout the character text.As an example, let's examine look at the following XML instance document.Code Sample: ComplexTypes/Demos/PaulMcCartney.xml Paul McCartney

    90000 Worked for the Beatles as a

    Singer. Worked for the Beatles as a

    Bass Guitarist. Worked for the Wings as a

    Singer.

  • 8/12/2019 XML Schema Basics

    24/25

    Code ExplanationNotice that the Bio element contains child elements Company and JobTitle as wellas charactertext. Such elements are said to contain mixed content. The syntax for declaringelements withmixed content is shown below.Syntax The following example illustrates how to define this in our employee schema.Code Sample: ComplexTypes/Demos/Employee4.xsd---- Code Omitted ----

    ---- Code Omitted ----

    Defining Complex Types GloballyAs with simple types, complex types can be defined globally. The example below shows how thisis done.Code Sample: ComplexTypes/Demos/Author.xsd

    Code ExplanationAs you can see, complex types are defined with the xs:complexType element. The major

    advantage of defining a complex type globally is that it can be reused. For example, a schemamight allow for an Illustrator element as well as an Author element. Both elements could be oftype Person. This way, if the Person type is changed later, the change will apply to both elements.The instance document below will validate properly against the schema above.Code Sample: ComplexTypes/Demos/MarkTwain.xml

  • 8/12/2019 XML Schema Basics

    25/25

    xsi:noNamespaceSchemaLocation="Author.xsd">MarkTwainComplex-Type Elements ConclusionIn this lesson of the XML tutorial, you have learned to work with complex typesand complex-type elements. We will now take a look at declaring attributes.