ICS-FORTH May 25, 2001

The Utility of XML

Martin Doerr

Foundation for Research and Technology - Hellas

Institute of Computer Science

Heraklion, May 25, 2001

Center for Cultural Informatics

XML is

XML is a compromise between databases and free texts.

It takes the best from both sides without being perfect on either.

It is readable. It allows meaning to be disambiguated.

It is simple.

It is rich enough to open a new systems paradigm.

What is a Document?

A composite statement: a unit relating known facts, items and categories to new knowledge, expressed in language or in other media.

It has an inner logic: the pure rendered knowledge, independent of language and form.

It has a meaningful structure: the sequence, arrangement or linking used to render the inner logic.

It has a presentation: structure and style to assist perception and impression.

A document

The statements…

Diego Velázquez is Spanish.

Diego Velázquez lived 1599-1660.

Diego Velázquez painted “Juan de Pareja”.

“Juan de Pareja” is a painting.

“Juan de Pareja” has dimensions 81.3 × 69.9 cm.

Juan de Pareja is Moorish.

Juan de Pareja is a painter.

Philip IV sent Velázquez to Italy.

…
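
Each of the statements above has a subject, a relation and an object. As a minimal sketch of that inner logic, independent of wording and layout, the statements could be held as plain tuples. The predicate names and the "(painting)" / "(person)" suffixes are assumptions introduced here to keep the two uses of "Juan de Pareja" apart; they are not from the slides.

```python
# Each statement as a (subject, predicate, object) tuple.
# Predicate names and the "(painting)"/"(person)" suffixes are illustrative assumptions.
statements = [
    ("Diego Velázquez", "has nationality", "Spanish"),
    ("Diego Velázquez", "lived", "1599-1660"),
    ("Diego Velázquez", "painted", "Juan de Pareja (painting)"),
    ("Juan de Pareja (painting)", "is a", "painting"),
    ("Juan de Pareja (painting)", "has dimensions", "81.3 x 69.9 cm"),
    ("Juan de Pareja (person)", "has origin", "Moorish"),
    ("Juan de Pareja (person)", "is a", "painter"),
    ("Philip IV", "sent to Italy", "Diego Velázquez"),
]


def about(subject):
    """Return all (predicate, object) pairs recorded for one subject."""
    return [(p, o) for s, p, o in statements if s == subject]


def who(predicate, obj):
    """Return every subject that stands in the given relation to obj."""
    return [s for s, p, o in statements if p == predicate and o == obj]


print(about("Juan de Pareja (person)"))
print(who("painted", "Juan de Pareja (painting)"))
```

Once the statements are separated from their wording, the same facts can be searched from either end, which free text does not allow.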

Another document

What’s Wrong with HTML

<B>MONET, Claude</B><BR>
Haystacks at Chailly at Sunrise<BR>
1865<BR>
Oil on canvas<BR>
30 x 60 cm (11 7/8 x 23 3/4 in.)<BR>
San Diego Museum of Art<BR>
<P><IMG SRC="http://192.41.13.240/artchive/m/monet/hayricks.jpg">

If written properly, normal HTML may reflect document presentation, but it cannot adequately represent the semantics and structure of data.

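What a reader recognizes in the fragment, but the markup does not say:

Artist Name: MONET, Claude
Artifact Title: Haystacks at Chailly at Sunrise
Date: 1865
Material: Oil on canvas
Dimensions: 30 x 60 cm (11 7/8 x 23 3/4 in.)
Museum: San Diego Museum of Art
Image Reference: http://192.41.13.240/artchive/m/monet/hayricks.jpg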
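
To illustrate the point, the sketch below parses the HTML fragment with Python's standard `html.parser`. The class name `TextCollector` is an assumption made for this example, and the fragment is used with the bold tag closed. The parser recovers the text pieces and the tag names, but nothing that says which piece is the title or the museum.

```python
from html.parser import HTMLParser

# The HTML fragment from the slide (bold tag closed, URL written without spaces).
HTML = (
    '<B>MONET, Claude</B><BR>Haystacks at Chailly at Sunrise<BR>1865<BR>'
    'Oil on canvas<BR>30 x 60 cm (11 7/8 x 23 3/4 in.)<BR>'
    'San Diego Museum of Art <BR><P>'
    '<IMG SRC="http://192.41.13.240/artchive/m/monet/hayricks.jpg">'
)


class TextCollector(HTMLParser):
    """Collect the text pieces and the tag names that occur in a page."""

    def __init__(self):
        super().__init__()
        self.fragments = []   # text pieces, in document order
        self.tags = set()     # tag names seen (lowercased by the parser)

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)

    def handle_data(self, data):
        if data.strip():
            self.fragments.append(data.strip())


collector = TextCollector()
collector.feed(HTML)
collector.close()

print(sorted(collector.tags))   # only presentational tags
for position, text in enumerate(collector.fragments):
    print(position, text)       # meaning must be guessed from position
```

A program reading this page can only assume that the second fragment is the title; a page that omits the date or adds a line breaks that assumption silently.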

User Problems / Design Reasons

Preserving information units: who said that? Self-contained.

Entering data: what can I say, what should I say, how can I say it?

Rendering data: how to tell my child, the public…

Accessing data: querying, mediation

Reusing data: transmission to other environments, merging, evolution of local system, preservation for future use.

In Technical Terms

Transformation under preservation of meaning

Correct adaptation of presentation without knowing meaning

Packaging information for presentation – “1 document”

Sequencing categories for data input.

Interpretation of intended meaning - searching

Automatic relating of common meaning – merging of different statements

What’s wrong with

Free texts: clear packaging, rendering for one target; not machine processable (poor querying, categories not comprehensive), poorly reusable, no help with entering or transforming data.

HTML: solves platform independence of presentation, but the connection between meaning and presentation structure is weak; not much better than free text.

Databases: clear logical structure, categorization, machine processable, excellent querying; but presentation, transformation, merging and evolution are difficult, and there are no information units.

XML: clear packaging, logical structure, machine processable if correctly used, clear separation and relation of meaningful structure and presentation.

Helpful for entering data; easy to extend, transform and present. Can be queried, but the structure is not independent of the user view.

XML and databases

Databases:

Schema first: a complete, inflexible analysis of all categories and their relations, made prior to the data.

Table structures: indexes prepared, excellent consistency enforcement.

XML:

Data first: structure is explanatory, can come second, need not be formalized, is extensible; DTDs can be combined.

Semi-structured: flexible, but with a reduced guarantee that a question can be answered and reduced consistency enforcement.

Embedded schema: each instance carries the schema it uses; querying by parsing, without index structures; ideal transport format.
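
The contrast can be sketched with Python's standard `sqlite3` and `xml.etree.ElementTree` modules. The table name `artifact` and its columns are assumptions made for this example only. The database refuses a fact its schema did not foresee, while the XML record accepts it; the price is that a query against the XML may simply find nothing.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Schema first: the table and all its columns are declared before any data exists.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE artifact (title TEXT NOT NULL, year INTEGER NOT NULL)")
db.execute("INSERT INTO artifact VALUES (?, ?)",
           ("Haystacks at Chailly at Sunrise", 1865))

# A category the schema did not foresee is rejected.
try:
    db.execute("INSERT INTO artifact (title, year, material) VALUES (?, ?, ?)",
               ("Haystacks at Chailly at Sunrise", 1865, "Oil on canvas"))
    rejected = False
except sqlite3.OperationalError:
    rejected = True  # the table has no column named "material"

# Data first: a new element is added without any prior declaration.
record = ET.fromstring(
    "<ARTIFACT><TITLE>Haystacks at Chailly at Sunrise</TITLE>"
    "<DATE>1865</DATE></ARTIFACT>"
)
ET.SubElement(record, "MATERIAL").text = "Oil on canvas"

# Reduced guarantee: a question is answered only if the instance happens to say so.
print(rejected)
print(record.findtext("MATERIAL"))
print(record.findtext("LOCATION"))  # this record carries no LOCATION element
```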

Data First, Embedded Schema

This document carries the interpretation with it. It is readable without knowledge of the schema.

<ARTIST>
  <NAME><FIRST>Claude</FIRST><LAST>Monet</LAST></NAME>
  <ARTWORK>
    <ARTIFACT>
      <TITLE>Haystacks at Chailly at Sunrise</TITLE>
      <DATE>1865</DATE>
      <MATERIAL>Oil on canvas</MATERIAL>
      <DIM Metric='cm'><HEIGHT>30</HEIGHT><WIDTH>60</WIDTH></DIM>
      <DIM Metric='in'><HEIGHT>11 7/8</HEIGHT><WIDTH>23 3/4</WIDTH></DIM>
      <LOCATION>San Diego Museum of Art</LOCATION>
      <IMAGE File='http://192.41.13.240/artchive/m/monet/hayricks.jpg'/>
    </ARTIFACT>
  </ARTWORK>
</ARTIST>
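
As a sketch of what "carries the interpretation with it" buys a program, the record above can be read with Python's standard `xml.etree.ElementTree`, asking for each value by its element name rather than by its position. The variable names are assumptions of this example; the element and attribute names are those of the record.

```python
import xml.etree.ElementTree as ET

XML = """
<ARTIST>
  <NAME><FIRST>Claude</FIRST><LAST>Monet</LAST></NAME>
  <ARTWORK>
    <ARTIFACT>
      <TITLE>Haystacks at Chailly at Sunrise</TITLE>
      <DATE>1865</DATE>
      <MATERIAL>Oil on canvas</MATERIAL>
      <DIM Metric='cm'><HEIGHT>30</HEIGHT><WIDTH>60</WIDTH></DIM>
      <DIM Metric='in'><HEIGHT>11 7/8</HEIGHT><WIDTH>23 3/4</WIDTH></DIM>
      <LOCATION>San Diego Museum of Art</LOCATION>
      <IMAGE File='http://192.41.13.240/artchive/m/monet/hayricks.jpg'/>
    </ARTIFACT>
  </ARTWORK>
</ARTIST>
"""

artist = ET.fromstring(XML.strip())

# Values are addressed by name, so reordering the elements would not matter.
name = artist.findtext("NAME/FIRST") + " " + artist.findtext("NAME/LAST")
artifact = artist.find("ARTWORK/ARTIFACT")
title = artifact.findtext("TITLE")

# The attribute disambiguates the two DIM elements.
dim_cm = artifact.find("DIM[@Metric='cm']")
height_cm = float(dim_cm.findtext("HEIGHT"))
width_cm = float(dim_cm.findtext("WIDTH"))

image_url = artifact.find("IMAGE").get("File")

print(name, "|", title, "|", height_cm, "x", width_cm, "cm")
print(image_url)
```

No index and no external schema were needed: parsing the instance is enough to query it, which is why the format suits transport between systems.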

What’s important

Data first: delayed analysis, preserves data.

Embedded schema: facilitates data transport, readable in the future.

Separation of semantics and presentation: enables information reuse.

Guides and controls data entry

Same meaning can be encoded in multiple formats:

DTD design depends on purpose: Transport, presentation, data entry…
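
A minimal sketch of the separation of semantics and presentation: one XML record, rendered twice by functions that know the element names but not each other. The function names `as_html` and `as_caption` and both output layouts are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# One record holding the meaning, with no presentation markup in it.
record = ET.fromstring(
    "<ARTIFACT><TITLE>Haystacks at Chailly at Sunrise</TITLE>"
    "<DATE>1865</DATE><MATERIAL>Oil on canvas</MATERIAL></ARTIFACT>"
)


def as_html(rec):
    """Web presentation: bold title, then date and material on separate lines."""
    title, date, material = (escape(rec.findtext(tag))
                             for tag in ("TITLE", "DATE", "MATERIAL"))
    return f"<B>{title}</B><BR>{date}<BR>{material}"


def as_caption(rec):
    """Print presentation: a one-line caption for a catalogue."""
    material = rec.findtext("MATERIAL").lower()
    return f"{rec.findtext('TITLE')} ({rec.findtext('DATE')}), {material}"


print(as_html(record))
print(as_caption(record))
```

Adding a third presentation means adding a third function; the record itself does not change.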

Useful Applications

Prescription for documentation / input

Data transfer between systems (“middleware”).

Document bases with full query access.

Combine databases with XML documents: mission-critical data in tables and DTD, rich extensible structures in DTD only.

Create data for long-term use: even machine readable from paper!

Create information sets for multiple presentation

Final Remark

How to encode meaning without structure ambiguities?

=> use RDF/RDFS

How to standardize the meaning of element types (tags)?

=> use ontologies, e.g. formulated in RDFS!