Venkat Subramaniam – [email protected]

HTML and XML

HTML
• Hyper Text Markup Language
• HTML 4.0 is defined as an application of SGML, not XML; XHTML is the reformulation that complies with the XML standard
• Presentation details are provided along with the information, using markup
• Browsers act as interpreters/parsers in
  – parsing through HTML documents
  – displaying the contents of the documents

Tags, Elements and Attributes
<STRONG>boldface Text</STRONG>
<HR>
<TABLE BORDER="1">…</TABLE>
• Tag starts with < and ends with >
• Elements generally have start and end tags
  – starts with <TagName>
  – ends with </TagName> (optional in some cases)
  – contents of elements included between tags
• Attributes
  – Name=Value specifies information about contents in an element
  – Provided between tag name and ending >
  – Multiple attributes separated by space

Tags, Case, well-formedness
• HTML is relaxed when it comes to case and well-formedness
• <HR> is as good as <hr> as are <Hr> and <hR>
• <STRONG>This is <I> italics</I> Text</STRONG>
• However,
  – <STRONG>This is <I> italics</STRONG> </I> Text
  – Is generally accepted, though not well-formed
  – How does a browser handle this? Try it on different browsers
• XML, on the other hand, must be well-formed and is case sensitive
• XHTML is HTML following XML restrictions
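
The relaxed handling of case and nesting can be observed outside a browser as well. The following is a minimal sketch using Python's standard `html.parser` module, which is a tolerant HTML tokenizer and not a browser; the `TagLogger` class and `tag_events` helper are hypothetical names introduced only for this illustration.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Records start and end tags in the order they are seen."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tag_events(markup):
    """Return the list of tag events for a piece of HTML."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


# Tag names are reported in lower case, so every spelling of HR is the same tag.
hr_variants = [tag_events(text) for text in ("<HR>", "<hr>", "<Hr>", "<hR>")]

# The badly nested example is accepted without any error being raised.
misnested = tag_events("<STRONG>This is <I> italics</STRONG> </I> Text")
```

The parser simply reports the tags in document order; deciding how to render the overlapping STRONG and I elements is left to whatever consumes these events, which is why browsers may differ.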

Tags, Line Breaks, Special Characters
• Block-level tags affect a block of text/content
  – HEAD, BODY, P, H1, BR, UL, TABLE
• Inline tags affect only a few letters or words
  – EM, B, IMG
• Line breaks
  – generally included automatically with block-level tags
  – Not so with inline tags
• Special characters
  – <, >, & and " are special characters
  – To display these use names (&lt;, &gt;, &amp;, &quot;) or numbers (&#60;, &#62;, &#38;, &#34;)
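
When page content is generated by a program, the special characters are usually escaped automatically. A small sketch with Python's standard `html` module follows; the sample string is made up for illustration.

```python
import html

raw = '<A HREF="a.html">Tom & Jerry</A>'

# Replace <, >, & and " with their named character references.
escaped = html.escape(raw, quote=True)

# Named and numeric references decode to the same characters.
decoded_named = html.unescape("&lt;&gt;&amp;&quot;")
decoded_numeric = html.unescape("&#60;&#62;&#38;&#34;")
```

Here `escaped` holds `&lt;A HREF=&quot;a.html&quot;&gt;Tom &amp; Jerry&lt;/A&gt;`, which a browser displays as the original text instead of interpreting it as a link.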

Common Tags
• <HTML> Optional tag indicating content type
• <TITLE> Title of a web page
• <BODY> Content of a web page
• <Hn ALIGN=direction>
  – Level 1 to 6 of header (Times New Roman 24, 18, 14, 12, 10 and 8 points)
  – direction = left, right or center
• <P ALIGN=direction>
  – Space between paragraphs

Text Formatting – Font, Size
• Specifying Font (deprecated in HTML 4.0)
• <FONT SIZE="value" FACE="name1, name2" COLOR="value">
  – Size value may be 1 to 7 (Times 8, 10, 12, 14, 18, 24, 36)
  – Size may also be +n or -n to specify a size n steps higher or lower
    • Also may be altered with <BIG> or <SMALL> tags
  – If name1 is not available on system, select name2
    • More alternatives may be specified
  – If none of the alternatives is available, choose default
• You may set default size for entire document using <BASEFONT SIZE="value">

Text Formatting - Color
• Color value can be specified
  – using either #rrggbb value
  – Or using "color" for one of 16 predefined colors
• <BODY TEXT="value">
  – Sets the default color for text in the document
• <FONT COLOR="value">
  – Sets the color for the content of this element

Text Formatting - Miscellaneous
• <SUB> for subscript
• <SUP> for superscript
• <STRIKE> for strikeout
• <U> for underline
• <B> or <STRONG> for boldface
• <I> or <EM> for italics
• <CODE>, <KBD>, <SAMP>, <TT> for monospace
• <BLINK> for blinking text
• <!-- to start comments and end with -->
• All these tags (other than comments) have a start and end tag

Links
• Links are used to relate documents together
  – to navigate, to view, to take some action, etc.
• Link has three parts: destination, label and target
  <A HREF="anotherPage.html">Next</A>
  – HREF provides the destination, Next is the label
  – A special attribute called TARGET may be used to tell browser to display in another frame or new window (_blank)
    • target names are case sensitive
    • <BASE TARGET="…"> in head section sets default target for page
• Good practice to use relative URL
  – use absolute for outside web pages
• Links may be of other types: ftp, news, mailto, etc.
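
How a relative URL is resolved against the page that contains it can be sketched with Python's standard `urllib.parse.urljoin`. The base address and file names below are made-up examples, not part of the slides.

```python
from urllib.parse import urljoin

# Hypothetical address of the page that contains the links.
base = "http://www.example.com/course/html/index.html"

# A relative HREF is resolved against the directory of the current page.
same_dir = urljoin(base, "anotherPage.html")
sibling_dir = urljoin(base, "../xml/intro.html")

# An absolute HREF (an outside web page) ignores the base entirely.
outside = urljoin(base, "http://www.w3.org/")
```

Because relative links carry no host name, a site that uses them can be moved to another server or directory without editing every page.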

Links and Anchors
• You may define an anchor within a document
  – <A NAME="anchorName">…</A>
• You may link to that location in document by
  – <A HREF="#anchorName">label</A>
  – <A HREF="URL#anchorName">label</A>

Tables
<TABLE>
  <TR>
    <TD>cell 1 content</TD>
    <TD>cell 2 content</TD>
  </TR>
  …
</TABLE>
• TABLE attribute BORDER=n defines thickness
  – default is 2
  – If you do not specify, the border is drawn with space, not line
  – to add extra space around table, use HSPACE or VSPACE
• TABLE attribute ALIGN=center will center the table
• TABLE or TD attribute WIDTH=n sets the width in pixels
  – size specified is ignored if specified space is too small for contents
• Attribute of TD, COLSPAN=n specifies number of columns to span
  – use ROWSPAN to span across rows
• Use <TH> for table header, centered and boldface
• Use <CAPTION> for a table caption
  – attribute ALIGN=direction (top, bottom, left, right)

Lists
• You may create ordered lists, unordered lists and definition lists
  – May be plain, numbered, bulleted
<OL TYPE=X>
  <LI> list item 1</LI>
  <LI> list item 2</LI>
</OL>
  – Type is optional (defaults to 1 for numbers)
  – A for capital letters, a for small letters, I for capital roman numerals, i for small roman numerals
  – Use START=n for initial value for list item
    • always numeric and converted automatically to proper type
  – In LI, may override TYPE, VALUE for this & following items
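
The automatic conversion of a numeric START value to the list's TYPE can be sketched as follows. This is an illustrative Python model, not browser code; `list_marker`, `to_roman` and `to_letters` are hypothetical helpers, and only values of 1 or more are handled.

```python
def to_roman(number):
    """Convert a positive integer to capital roman numerals."""
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    digits = []
    for value, symbol in pairs:
        while number >= value:
            digits.append(symbol)
            number -= value
    return "".join(digits)


def to_letters(number):
    """Convert 1 -> A, 26 -> Z, 27 -> AA, and so on."""
    letters = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def list_marker(number, list_type="1"):
    """Return the marker shown for item `number` in a list of TYPE `list_type`."""
    if number < 1:
        raise ValueError("this sketch only handles values of 1 or more")
    if list_type == "1":
        return str(number)
    if list_type == "A":
        return to_letters(number)
    if list_type == "a":
        return to_letters(number).lower()
    if list_type == "I":
        return to_roman(number)
    if list_type == "i":
        return to_roman(number).lower()
    raise ValueError("unknown list type: %r" % list_type)
```

For example, with `<OL TYPE=I START=4>` the first item would be labelled `list_marker(4, "I")`, which is `IV`.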

Unordered List
• Use <UL> to create unordered list
• Use attribute TYPE=shape for bullet type
  – disc for solid round bullet (default for 1st level)
  – circle for an empty round bullet (default for 2nd level)
  – square for square bullets (default for >= 3rd level)
• <LI> may override the type

Definition Lists
• Great to create lists that describe items
  – Like glossaries
<DL>Text here will appear on own line
  <DT>Text To Appear On Own Line Aligned Left</DT>
  <DD> Definition text </DD>
</DL>
  – You may have multiple DTs and DDs to allow multiple words or definitions

Images
• HTML tag IMG allows placement of images
• <IMG SRC="LocationAndNameOfImageFile">
• Attributes
  – BORDER="n"
  – ALT="tooltip or alternate text"
    • specify a text that may appear instead of image
    • this also serves as a tool tip on Windows
    • a required attribute in HTML 4
  – WIDTH="x" HEIGHT="y"
    • allows browser to optimize size for image while displaying text
  – LOWSRC
    • specify a fast load low resolution image to be shown first
    • high resolution image is loaded slowly replacing the low resolution image
  – ALIGN
    • align left or right to allow text wrapping around image
  – HSPACE="pixel" VSPACE="pixel"
    • Provides padding on sides (horizontal and vertical) around image

BR, CLEAR and Text Wrapping
• <BR> command provides a line break
• CLEAR attribute says do not begin text until the specified margin is clear
  – <BR CLEAR="left">
    • Do not begin text until left margin is clear of images
  – <BR CLEAR="right">
    • Do not begin text until right margin is clear of images
  – <BR CLEAR="all">
    • Do not begin text until both margins are clear of images

Forms
• Form has three parts
  – FORM tag with URL of the action script
  – form elements, text, radio buttons, etc.
  – Submit button to send data to the script
<FORM METHOD=POST ACTION="scriptURL">
…
</FORM>
• The method may be POST or GET
  – GET is limiting in the amount of information sent
    • sent as part of query string
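
What "sent as part of query string" looks like can be sketched with Python's standard `urllib.parse`. The field names, values and host below are invented for illustration; only `scriptURL` comes from the slide.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical form fields as name=value pairs.
fields = {"name": "Tom & Jerry", "course": "HTML/XML"}

# With METHOD=GET the browser appends the encoded pairs to the action URL.
query = urlencode(fields)
url = "http://www.example.com/scriptURL?" + query

# The server-side script decodes the same pairs from the query string.
received = parse_qs(urlsplit(url).query)
```

Since everything travels inside the URL, the amount of data is bounded by whatever URL length the browser and server accept, which is the limitation noted above; POST sends the same pairs in the request body instead.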

FORM elements
• Elements are created using
  <INPUT TYPE="type" NAME="name" VALUE="initvalue">
  – name and user given value are sent as name=value
  – Use attributes DISABLED or READONLY if desired
• Text box
  – TYPE="text"
  – Attributes: SIZE="n" MAXLENGTH=n
  – last two attributes are in number of characters, optional
  – SIZE defaults to 20
• Password box
  – A text box where what you type is not shown (asterisks)
  – Not encrypted when sent to server, though

FORM elements…
• Radio button
  – TYPE="radio"
  – NAME="radioset"
    • where radioset is group name for mutually exclusive buttons
    • verifies that only one of the group is set
    • This is the name sent to server side script, as well
  – attribute CHECKED if you like button checked initially
  – VALUE="value" is the value sent if this button checked
• Check box
  – TYPE="checkbox"
  – attribute CHECKED if you like button checked initially
  – VALUE="value" is the value sent if this button checked

FORM elements…
• Uploading files
  – TYPE="file"
  – NAME="title" for server to identify
  – SIZE=n number of chars of field to enter path/file
    • default 20
  – In the FORM tag, use attribute ENCTYPE="multipart/form-data"
  – METHOD on FORM should be POST
• Hidden fields
  – Useful to maintain session information
  – TYPE="hidden"

FORM elements…
• Menu
  <SELECT NAME="name" SIZE="n" MULTIPLE>
    <OPTION SELECTED VALUE="value">label</OPTION>
    …
  </SELECT>
  – SIZE is height in lines
  – SELECTED is optional, initial selection of menu item
• Text Area
  – When one line is not enough
  – <TEXTAREA NAME="name" ROWS="n" COLS="n" WRAP>
  – ROWS defaults to 4 and COLS to 40, WRAP optional
  – User may provide up to 32,700 chars

FORM elements…
• Submit button
  <INPUT TYPE="submit" VALUE="button text">
  – if you do not provide value, the word Submit appears
  – if you set the name attribute, value is sent to server
• Use TYPE="reset" to provide a clear/reset button
• HTML 4 adds BUTTON tag that allows you to
  – change the font
  – background color
  – image
<BUTTON TYPE="submit" NAME="name" VALUE="value" STYLE="font: size FontName;background:color">
  Text to left of image <IMG SRC="imageFileName"> Text to right of image
</BUTTON>

FORM elements…
• You may also use an image to send information
• <INPUT TYPE="image" SRC="imageFileName" NAME="name">
• Mouse coordinate on which user clicks is sent
  – as name.x and name.y
  – Top-left of image is (0, 0)

Organizing Form Elements
• You may put a box around elements
<FORM…>
  <FIELDSET>
    <LEGEND ALIGN=right>box caption</LEGEND>
    … elements …
  </FIELDSET>
  … other fieldsets
</FORM>
• Simply surround elements with FIELDSET element

Running a Script on Input
• It is useful to run a script when user makes a selection
  – JavaScript is the default scripting language
• Simply add an attribute of an event type to the tag
• Specify the code to execute
  – You may either type the code right there or refer to it
<BUTTON TYPE="button" NAME="Time"
        ONCLICK="alert('Today is ' + Date())">
  Current Time
</BUTTON>
We will see this put to work in JavaScript session

HTML Events
• ONBLUR user leaves an element that has focus
• ONCHANGE user modifies content of element (like INPUT)
• ONCLICK / ONDBLCLICK user clicks / double clicks on specified area
• ONFOCUS user selects, clicks or tabs to element
• ONKEYDOWN / ONKEYPRESS user types something in the specified area
• ONKEYUP user releases key after typing
• ONLOAD page is loaded in browser
• ONMOUSEDOWN mouse pressed down over the element
• ONMOUSEMOVE mouse moved while pointing at element
• ONMOUSEOVER mouse moved onto element
• ONMOUSEOUT mouse moved away from element after being over
• ONMOUSEUP mouse released after the click
• ONRESET form's reset button clicked
• ONSELECT selected one or more words in element
• ONSUBMIT form's submit button clicked
• ONUNLOAD browser loads different page after specified page

Cascading Style Sheets
• HTML allows specification of fonts, colors, etc.
• These may be placed throughout the document
  – results in poor maintainability
  – What if you want to change these?
• This is where CSS comes in
• You specify the formatting or styling separately in
  – the top of the document
  – or in a separate document

CSS: Specifying Style
• Instead of defining style all over document,
• specify at the top and simply refer to it in document
• Specification has two parts:
  – selector
    • this is a name you associate a style with
  – declarations
    • this is definition of how it should look
• The specification may be local, internal or external
• The cascade:
  – local overrides internal which in turn may override external specifications
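
The precedence in the cascade can be pictured with a toy model in Python. This is only a sketch under simplifying assumptions: each style source is a plain dictionary of declarations for a single element, and `resolve_style` and the sample values are hypothetical names used for illustration. A real browser also takes selector specificity and source order into account.

```python
def resolve_style(external, internal, local):
    """Merge declarations so that the more local source wins."""
    resolved = {}
    # Apply the sources from least to most specific; later ones override.
    for declarations in (external, internal, local):
        resolved.update(declarations)
    return resolved


external = {"color": "black", "font-size": "12pt"}  # from a separate file
internal = {"color": "navy"}                        # from the document head
local = {"font-size": "18pt"}                       # on the element itself

style = resolve_style(external, internal, local)
```

In this model the element ends up navy (internal beats external) and 18pt (local beats both), while any property set in only one place is kept as is.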

CSS: Local Style
• This style applies to the element on which it is declared
• This takes a local effect
• Useful to alter the style specified internally in the document or externally from another file

CSS: Internal Style
• Specified between the <HEAD> and the </HEAD>
• Provide one or more selectors
  – Separate by comma for declarations to apply to all of the selectors
  – Separate by space if declarations are to apply only to nested selectors and not other appearances
• Provide the declarations
  – within the {}, separated by ;

CSS: External Style Sheet
• Writing the style in a separate file allows sharing of the style and applying it to more than one page
• Pages link the style sheet that specifies the style
• You may apply internal style sheet as well as local at the same time

CSS: Defining Classes
• You can define a class or category and style for that class
• Any element defined to be part of that class will use the specified style for that class
• Classes are defined to belong to a certain selector type using the format selectorName.className

CSS: Defining IDs
• ID can be defined for individual elements in your document
  – The ID must be unique
• Style can be specified for that tag/element
  – Tag name followed by # followed by the ID
• The style applies only for that element with that ID
• Scripts may also identify that element in document

CSS: DIV and SPAN
• Style may be specified on pre-defined tags
  – like Hn and P
  – how to apply style on a wide range of items?
• DIV and SPAN allow you to define areas of document over which a style may be applied
• DIV is a block-level tag while SPAN is an inline tag

CSS: Font Styles
• font-family
  – specify a list of fonts to choose from
  – font-family:"Times Roman", "Helvetica", "Arial"
• font-style
  – specify whether font should be italic, oblique, or normal
  – font-style:italic
  – to remove italic font-style:normal
• font-weight
  – specifies boldness of text; possible values: bold, bolder, lighter
  – or multiple of 100s between 100 and 900, with 400 for book weight and 700 for bold
  – normal will remove bold
• font-size
  – specify absolute font size: xx-small, x-small, small, medium, large, x-large, xx-large
  – specify relative font size: larger, smaller
  – exact point size: 18pt
  – percentage relative size: 200%

CSS: Font Style…
• line-height
  – specifies the space between lines (leading) within a paragraph
  – line-height:15pt or line-height:50%
• All the font styles may be specified in one shot as well
  – Specify in the following order, space separated:
    • font-style font-variant font-weight font-size/line-height font-family
  – / separates font-size from line-height

CSS: Text Color Style
• color
  – specify one of 16 colors or #rrggbb or rgb(r, g, b) or rgb(r%, g%, b%)
• background
  – transparent or a color value
  – url(image.gif) to specify an image file name
  – repeat to tile the image, repeat-x for horizontal tiling, repeat-y for vertical tiling
  – fixed or scroll for background to scroll along canvas
  – x y for position of background image from top-left corner

CSS: Text Spacing Style
• word-spacing
• letter-spacing
• text-indent
• white-space
  – pre to preserve extra spaces; nowrap to keep elements on same line; normal to return to normal behavior
• text-align
  – left, center, right, justify
• text-decoration
  – underline, overline, line-through, none, blink
    • blink not supported by IE, generally not recommended as well
• text-transform
  – capitalize, uppercase, lowercase, none
• font-variant:small-caps displays lowercase letters as uppercase glyphs in a smaller size

Markup and XML
• Markup
  – conveying metadata with literals/tags to delimit, describe
  – Generalized Markup Language (GML)
  – Standard Generalized Markup Language (SGML)
    • adopted by ISO
    • Popular, but too complex
• eXtensible Markup Language (XML)
  – designed by World Wide Web Consortium (W3C)
  – subset of SGML
  – simpler to read, write and develop parsers for

Why XML?
• HTML is de facto standard for markup
  – Markup for information presentation
  – Talks about how information looks, is presented
  – Does not let you add more markups of your own
• What about the information itself?
• Need to
  – describe information
  – Extend the descriptions
  – Must be structured, easy to express and validate

What is XML?
• XML is about extensibility and flexibility
• tags describe and surround the data
• Example:
<?xml version = "1.0" ?>
<equipment>
  <pump>
    <name> p01 </name>
    <pressure units="psi"> 32.23 </pressure>
  </pump>
  <pump>
    <name> p02 </name>
    <pressure units="psi"> 22.887 </pressure>
  </pump>
</equipment>
• Open, extensible
• Platform independent
• Self describing data
  – Data Exchange
• Supports query and discovery of data
• Dynamic Data Exchange
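
Because the tags name the data, a program can pick values out of the equipment example by name. The sketch below uses Python's standard `xml.etree.ElementTree`; the `readings` dictionary is just one possible way to hold the result and is not part of the slides.

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0"?>
<equipment>
  <pump>
    <name> p01 </name>
    <pressure units="psi"> 32.23 </pressure>
  </pump>
  <pump>
    <name> p02 </name>
    <pressure units="psi"> 22.887 </pressure>
  </pump>
</equipment>"""

root = ET.fromstring(document)

# Map each pump name to its pressure value and the units attribute.
readings = {}
for pump in root.findall("pump"):
    name = pump.findtext("name").strip()
    pressure = pump.find("pressure")
    readings[name] = (float(pressure.text), pressure.get("units"))
```

The lookup is by element and attribute name, so the code keeps working if new elements are added to each pump later, which is the practical benefit of self-describing data.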

What does XML provide?
• Tags delimit content
  – lets you define structure of arbitrary complexity
• Self Describing Data
  – tags describe and name the data being defined
  – name related to the information it models/represents
• Standard eXtensibility
  – in defining new tags & semantics
• Vocabularies
  – description of data used for information exchange
  – within specific domains
• Separates contents from presentation

XML System
[Diagram components: XML Document; XML Constraint (DTD, Schema); XML Parser/Processor/Styling; XML App]

Features of XML technologies
• Well-Formed syntax
• Document Type Definitions (DTDs)
  – Captures rules added to extend core syntax rules
• Document Object Model (DOM)
  – API for manipulating, parsing, creating XML documents
  – provides a tree-structured view of the document
  – Standard API
• Simple API for XML (SAX)
  – Provides events as document is being parsed
  – Leaves it to application to keep state and content information
• Styling and Transformation (XSL and XSLT)
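
The difference between the tree view of DOM and the event stream of SAX can be sketched with Python's standard `xml.dom.minidom` and `xml.sax` modules. The sample document and the `NameCollector` handler are hypothetical and exist only for this illustration.

```python
import xml.dom.minidom
import xml.sax

document = ("<equipment><pump><name>p01</name></pump>"
            "<pump><name>p02</name></pump></equipment>")

# DOM: the whole document is parsed into a tree that can then be queried.
dom = xml.dom.minidom.parseString(document)
dom_names = [node.firstChild.data for node in dom.getElementsByTagName("name")]


# SAX: the parser reports events and the application keeps its own state.
class NameCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.in_name = False
        self.names = []

    def startElement(self, tag, attrs):
        if tag == "name":
            self.in_name = True
            self.names.append("")

    def characters(self, content):
        # Text may arrive in several pieces, so append to the current name.
        if self.in_name:
            self.names[-1] += content

    def endElement(self, tag):
        if tag == "name":
            self.in_name = False


handler = NameCollector()
xml.sax.parseString(document.encode("utf-8"), handler)
sax_names = handler.names
```

Both approaches find the same pump names. DOM is convenient when the document fits in memory and is navigated repeatedly; SAX suits large documents that are read once, at the cost of the bookkeeping shown in the handler.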

The Markup Syntax
• XML Entity
  – A file or stream with a well-formed structure
• Tags delimit the elements of the structure
• XML Tags are case-sensitive
• XML uses Unicode character set
• Names are used to identify structures
  – Names begin with letter, underscore or colon
    • Followed by any chars, including numbers, hyphen & period
[Diagram labels: Start Tag, Attributes, Content, End Tag]

Structure of a Document
• Prolog (Optional): comments, processing instructions
• BODY: Root Element
  – comments
  – processing instructions
  – Elements
    • Attributes: CDATA, Entities, ID, …
    • PCDATA, Entity References
  – Entity References
  – CDATA Sections
• Document Type Declaration
  – comments
  – processing instructions
  – Document Type Definitions: Element Declarations, Attribute Declarations, Entity Declarations, Notation Declarations
• (Optional): comments, processing instructions

Markups that go in XML Document
• The following markup may be contained in any XML document
  – Element start and end tags
  – Attributes
  – Comments
  – Entity references
  – Processing instructions
  – Character data sections (CDATA)
  – Document type declarations


A Sample XML File

Elements
• Building blocks of an XML document
• Element content may include
  – Other elements
  – Character data
  – Character references
  – Entity references
  – Processing instructions
  – Comments
  – CDATA sections
• Empty elements may be abbreviated to save space
  – <ElementTypeName/> indicates an empty element
• General form: StartTag Content EndTag
  <ElementTypeName> … </ElementTypeName>

Document and Elements
• XML document may be viewed as a hierarchical tree
[Diagram: tree with nodes Document, DocumentRoot, Prolog, DocumentElement, Epilog and Element (marked *); legend: represents containment/aggregation]

Contents
• Element Content
  – Contains other elements but no character data
• Mixed content
  – Contains character data and other elements
• Character content
  – Contains nothing but character data
• Empty element
  – Contains nothing

Page 53: 1 HTML and XML. 2 HTML Hyper Text Markup Language HTML 4.0 has strict compliance with XML standard Presentation details presented with information –using

53Venkat Subramaniam – [email protected] Subramaniam – [email protected]

Nesting
• XML requires proper nesting of elements
• Items must be fully contained within their nested level
• XML is strict about proper nesting, unlike HTML
– Allowing ambiguity leads to programming complexity
– Keep-it-simple policy
– Improper nesting produces a not-well-formed error
– This is a fatal error that terminates parsing
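
To see the fatal error in practice, here is a small sketch with Python's standard xml.etree.ElementTree. The two sample strings are invented for illustration; the first overlaps <b> and <i>, the second nests them properly.

```python
import xml.etree.ElementTree as ET

overlapping = "<p><b>bold <i>both</b> italic</i></p>"
try:
    ET.fromstring(overlapping)
    nesting_error = None
except ET.ParseError as err:
    # Parsing stops here; no partial tree is returned.
    nesting_error = err
    print("not well-formed:", err, "at (line, column)", err.position)

# The same text with every element fully contained in its parent.
properly_nested = ET.fromstring("<p><b>bold <i>both</i></b><i> italic</i></p>")
print([child.tag for child in properly_nested])
```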

Name
• A name
– begins with an alphabetic character or an underscore
– followed by any number of alphanumeric characters, periods (full stops), hyphens, or underscores

Name = (Letter | '_') (Char)*
Char = Letter | Digit | '.' | '-' | '_'
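
The grammar above translates almost directly into a regular expression. This sketch is an ASCII-only simplification (the real XML rules also admit non-ASCII letters), and is_simple_name is a hypothetical helper name.

```python
import re

# (Letter | '_') followed by any number of Letter | Digit | '.' | '-' | '_'
# Assumption: "Letter" is restricted to ASCII A-Z and a-z for brevity.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def is_simple_name(text):
    """Return True if text matches the simplified Name grammar."""
    return NAME_RE.fullmatch(text) is not None

for candidate in ["_id", "unit-price.v2", "2nd", "-x", "first name"]:
    print(candidate, is_simple_name(candidate))
```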

XML String Literals
• Literals are delimited by apostrophes or quotes
– "hello" 'hi'
• The character used as the delimiter can’t appear in the literal
– "George, What's up!"
– 'He said "what a nice day!"'
– The following is not valid: 'what's up'
• XML has no escape character; choose the delimiter that does not occur in the text
• What if you need to use both an apostrophe and a quote?
– You may use the entity references &apos; or &quot;
– 'I asked George, What&apos;s up, "He said, fine"'
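
A quick way to check these rules is to put the literals into attribute values and parse them. This is a sketch with Python's standard xml.etree.ElementTree; the element name msg, the attribute name text, and the helper parses are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Apostrophe-delimited literal: the inner apostrophe is written as &apos;,
# while the quotes can appear as they are.
ok = ET.fromstring(
    """<msg text='I asked George, What&apos;s up, "He said, fine"'/>"""
)
print(ok.get("text"))

def parses(xml_text):
    """Return True if xml_text is accepted by the parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(parses("<msg text='what's up'/>"))   # delimiter inside the literal
print(parses('<msg text="what\'s up"/>'))  # other delimiter chosen
```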

Attributes
• An element generally describes and contains information
• Attributes provide information that is part of the element rather than being contained in it
– Generally describe the information's format, etc.
• Name-value pair
– attributeName="value"
– attributeName='value'
– The value must be a quoted string literal; a bare number such as width=100 is not allowed
– An attribute may appear only once within a tag
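
The two restrictions can be observed with any conforming parser. Below is a sketch using Python's standard xml.etree.ElementTree; the cell element and width attribute are invented sample data.

```python
import xml.etree.ElementTree as ET

samples = {
    "quoted": '<cell width="100"/>',
    "unquoted": "<cell width=100/>",
    "duplicate": '<cell width="100" width="200"/>',
}

results = {}
for label, text in samples.items():
    try:
        results[label] = ET.fromstring(text).attrib
    except ET.ParseError as err:
        results[label] = f"rejected: {err}"

for label, outcome in results.items():
    print(label, "->", outcome)

# The parser always hands back a string; converting is the application's job.
width = int(results["quoted"]["width"])
```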

Special Attributes
• xml:space
– White space is not generally preserved
– How does one indicate that white space matters?
– xml:space signals that the white space in the element is significant
– Recommends that the white space be preserved
– Applications may choose to honor or ignore it
– Must take a value of "preserve" or "default"
• xml:lang
– Indicates the language/locale info of the XML document
• If present, these two attributes apply to all nested elements as well
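
The inheritance rule is left to the application in many APIs. As a sketch, Python's standard xml.etree.ElementTree reports xml:lang under the reserved XML namespace URI, and a small recursive walk can carry the value down to nested elements. The helper name languages and the sample book document are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml: prefix as this namespace URI in braces.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

def languages(elem, inherited=None, found=None):
    """Map each tag to its effective xml:lang, inheriting from ancestors."""
    found = {} if found is None else found
    lang = elem.get(XML_NS + "lang", inherited)
    found[elem.tag] = lang
    for child in elem:
        languages(child, lang, found)
    return found

doc = ET.fromstring(
    '<book xml:lang="en"><title>Hello</title>'
    '<quote xml:lang="fr"><line>Bonjour</line></quote></book>'
)
effective = languages(doc)
print(effective)
```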

Special Characters
• White spaces:
– Horizontal tab (hex 09), line-feed (hex 0A), carriage-return (hex 0D), space (hex 20)
– Parsers preserve white space within element content
– May remove it from attributes and element tags
• End-of-line
– End of line is generally indicated by
• A carriage-return followed by a line-feed
• Only a line-feed
• Only a carriage-return
– XML parsers are required to convert each of these to a single line-feed
– UNIX-style favored
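
End-of-line normalization is easy to observe. The sketch below feeds all three line-ending styles to Python's standard xml.etree.ElementTree (which is built on Expat); the poem element is made-up sample data.

```python
import xml.etree.ElementTree as ET

# CR+LF, lone CR, and lone LF inside the same element.
raw = "<poem>one\r\ntwo\rthree\nfour</poem>"
text = ET.fromstring(raw).text

# repr() makes the line-ending characters visible.
print(repr(text))
```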

Character References
• Represent displayable characters that can’t be placed in a well-formed document as is
• The character may be represented using
– &# followed by a decimal number representing the character, then ;
– &#x followed by a hexadecimal number representing the character, then ;
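
As a sketch of both forms, the following parses decimal and hexadecimal references with Python's standard xml.etree.ElementTree. The sym element is invented; 169 (hex A9) is the copyright sign and 60 and 62 are the angle brackets.

```python
import xml.etree.ElementTree as ET

# &#169; and &#xA9; name the same character; &#60; and &#62; stand in for
# angle brackets that could not appear literally in character data.
elem = ET.fromstring("<sym>&#169; &#xA9; &#60;tag&#62;</sym>")
print(elem.text)
```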

Entity References
• Think of these as macro definitions
• Allow insertion of string literals
• Provide mnemonic equivalence
• Start with an & and end with a ;
• Predefined entity references:
– &amp;, &lt;, &gt;, &apos;, &quot;
• Rather than repeating content, you can refer to where it is defined
– Declare the substitution text in the doctype
– Refer to it by &name;
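
Both kinds of entity reference can be tried out with Python's standard xml.etree.ElementTree, which expands entities declared in the document's internal subset. This is only a sketch; the memo document and the company entity are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The five predefined entity references.
predefined = ET.fromstring("<t>&lt;a&gt; &amp; &apos;b&apos; &quot;c&quot;</t>")
print(predefined.text)

# A user-defined entity: declared once in the doctype, referenced twice.
document = """<!DOCTYPE memo [
  <!ENTITY company "Example Widgets Inc.">
]>
<memo><from>&company;</from><footer>(c) &company;</footer></memo>"""
memo = ET.fromstring(document)
print(memo.find("from").text, "|", memo.find("footer").text)
```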

Processing Instructions
• Processing Instructions (PIs) allow you to provide hints to applications as part of the document
• A PI consists of two things:
– a target followed by an instruction
– <?target instruction ?>
– The target is an XML name that identifies the application the instruction is intended for
– The instruction is free-form text that must not contain ?>
• To avoid confusion with the XML declaration
– <?xml version = "1.0" ?>
– the PI target can’t be the string "xml" or "XML" (in any mix of upper and lower case)
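
A tree-based API makes the target/instruction split visible. The sketch below uses Python's standard xml.dom.minidom; the xml-stylesheet instruction and the file name style.xsl are sample values, not something the slides prescribe.

```python
from xml.dom import minidom, Node

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="style.xsl"?>'
    "<report/>"
)

# Collect (target, instruction) pairs found at the top level of the document.
# The XML declaration is not reported as a processing instruction.
instructions = [
    (node.target, node.data)
    for node in doc.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]
print(instructions)
```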

XML Comments
• Comments may be present anywhere in a document
– Except inside other markup
• Comments start with <!-- and end with -->
• May contain any string that
– does not contain --
– does not end with -
• Entities within comments are not expanded
• Markup within comments is not interpreted
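
These comment rules can be checked with Python's standard xml.dom.minidom, which keeps comment nodes in the tree. A sketch, with invented sample documents:

```python
from xml.dom import minidom, Node
import xml.parsers.expat

# Entity references inside a comment come back exactly as typed.
doc = minidom.parseString("<note><!-- &lt;b&gt; stays as typed --></note>")
comment = doc.documentElement.firstChild
print(repr(comment.data))

# A double hyphen inside the comment body is rejected by the parser.
try:
    minidom.parseString("<note><!-- bad -- comment --></note>")
    double_hyphen_ok = True
except xml.parsers.expat.ExpatError:
    double_hyphen_ok = False
print(double_hyphen_ok)
```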

CDATA Sections
• CDATA sections are blocks of the document that will not be interpreted for markup
– <![CDATA[ non parsed data ]]>
• Starts with the tag:
– <![CDATA[
• Ends with the tag:
– ]]>
• The contained text can’t have
– A string that contains the delimiter ]]>
– Nested CDATA
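
The sketch below shows a CDATA section passing markup characters through untouched, using Python's standard xml.etree.ElementTree. The helper cdata_wrap is hypothetical: it illustrates one common workaround for text that itself contains ]]>, namely splitting the text across two adjacent CDATA sections.

```python
import xml.etree.ElementTree as ET

# < and & need no escaping inside a CDATA section.
snippet = ET.fromstring(
    "<code><![CDATA[if (a < b && b > c) { run(); }]]></code>"
)
print(snippet.text)

def cdata_wrap(text):
    """Wrap text in CDATA, splitting any ]]> so the delimiter never appears."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

round_trip = ET.fromstring("<x>" + cdata_wrap("a]]>b") + "</x>").text
print(round_trip)
```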

Prolog
• Optional member of an XML document
• Provides hints and information on encoding methods
• Contains
– Optional XML declaration
– Optional comments (several)
– PIs
– White space characters
– Optional Document Type Declaration (not the DTD itself)
• Ties a DTD to the document

XML Declaration
• XML declaration is optional
• If present
– Must be the first thing in the document
• No comments or white space allowed to precede it
– The xml tag must be lowercase
• <?xml version="1.0" ?>
• Attributes:
– version: required. Allows for future versions
– encoding: optional. UTF-8, UTF-16, ISO-8859-1 (Latin-1), etc.
– standalone: optional. yes or no (no: an external DTD is required)
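
The placement rule and the encoding attribute can both be exercised with Python's standard xml.etree.ElementTree. This is a sketch; the doc element and the helper parses are made up, and the byte \xe9 is the Latin-1 encoding of an accented e.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if xml_text is accepted by the parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

declared = '<?xml version="1.0"?><doc/>'
print(parses(declared))                  # declaration comes first
print(parses("\n" + declared))           # white space precedes it
print(parses("<!-- hi -->" + declared))  # a comment precedes it

# The encoding attribute tells the parser how to decode the bytes that follow.
latin1 = ET.fromstring(
    b'<?xml version="1.0" encoding="ISO-8859-1"?><doc>\xe9</doc>'
)
print(latin1.text)
```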

Epilog
• Optional member of an XML document
• Contains
– Optional comments (several)
– PIs
– White space characters
• Relying on the epilog is questionable: it is optional, and most applications may not wait to read it

Well-formed Document
• An XML document is said to be well-formed if
– The document syntax conforms to the XML specification
– Elements form a hierarchical tree with a single root node
– There are no references to external entities
• Unless a DTD is provided
• A well-formed XML document
– is case sensitive
– closes every tag
– does not allow overlapping tags
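
A non-validating parser is enough to test well-formedness. The following sketch wraps Python's standard xml.etree.ElementTree in a hypothetical helper, is_well_formed, and runs it over a few invented samples that each break one of the rules above.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

checks = {
    "single root, closed tags": "<a><b/></a>",
    "case mismatch": "<Item></item>",
    "unclosed tag": "<a><b></a>",
    "two roots": "<a/><b/>",
}
report = {label: is_well_formed(text) for label, text in checks.items()}
for label, outcome in report.items():
    print(label, "->", outcome)
```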

Parsers
• An XML processor or parser is an application that reads through an XML document and interprets it
• Parser Types
– Non-validating
• Ensures the data object/document is well-formed XML
– Validating
• Also validates the well-formed data object’s form and content, using a DTD
• Parser Implementations
– Event-driven Parsers
• Parser calls back into the application as it identifies data
• Applications handle the data
• Parser does not keep the tree structure or the data after parsing
• Memory usage is minimal
– Tree-based Parsers
• A tree structure of the document is built in memory
• This tree is then manipulated using an interface
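
The two implementation styles can be compared side by side. In this sketch, Python's standard xml.sax plays the event-driven parser and xml.dom.minidom the tree-based one; the library/book document and the TitleCollector class are invented for illustration.

```python
import xml.sax
from xml.dom import minidom

DATA = b"<library><book>XML</book><book>HTML</book></library>"

class TitleCollector(xml.sax.ContentHandler):
    """Event-driven: the parser calls back; we keep only what we need."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_book = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "book":
            self._in_book = True
            self._buffer = []

    def characters(self, content):
        # Character data may arrive in several chunks.
        if self._in_book:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "book":
            self.titles.append("".join(self._buffer))
            self._in_book = False

handler = TitleCollector()
xml.sax.parseString(DATA, handler)

# Tree-based: the whole document is built in memory, then queried.
tree = minidom.parseString(DATA)
tree_titles = [n.firstChild.data for n in tree.getElementsByTagName("book")]

print(handler.titles, tree_titles)
```

The event-driven version never holds more than one title's text at a time, while the tree-based version keeps the entire document available for repeated navigation.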

XML Parsers
• Several parsers are available in the market
– Xerces (Apache)
– JAXP (More of an API from Sun)
– MSXML (Microsoft)
– Expat (James Clark)
– RXP (Richard Tobin)
– XP (James Clark)
– XML4J (IBM)
– XML::Parser (Clark Cooper)
– Pyexpat (Jack Jansen)
– Lark (Tim Bray)
– TclXML (Steve Ball)

Major APIs
• DOM API
• SAX API
• JDOM
• XSLT
• XPath