Native XML Databases 1
Native XML Databases
Lesson 4
XML Data
• XML adds a new data model to the world
– In addition to relational, hierarchical, OO, ...
• The “XML” data model is
– A tree of ordered nodes
– Nodes have different types (element, attribute, etc.)
– Some nodes are labeled (Date, Quantity, Price, etc.)
– Data stored in leaf nodes
• Modeling language is an XML schema language
– DTD, XML Schemas, etc.
Storing XML in a DB
• Build a model
– Model the data in the XML document, or...
– Model the XML document itself
• Map the model to the database
• Transfer data according to the model
Sample XML document
<Order>
  <Number>1234</Number>
  <Customer>Gallagher Co.</Customer>
  <Date>29.10.00</Date>
  <Item Number="1">
    <Part>A-10</Part>
    <Quantity>12</Quantity>
    <Price>10.95</Price>
  </Item>
  <Item Number="2">
    <Part>B-43</Part>
    <Quantity>600</Quantity>
    <Price>3.99</Price>
  </Item>
</Order>
Modeling data
• Objects in model are specific to XML schema
object Order {
  number=1234;
  customer="Gallagher Co.";
  date=29.10.00;
  items={ptrs to Items};
}
object Item {
  number=1;
  part="A-10";
  quantity=12;
  price=10.95;
}
object Item {
  number=2;
  part="B-43";
  quantity=600;
  price=3.99;
}
Storing data
• Database schema specific to XML schema
Orders
Number  Customer       Date
1234    Gallagher Co.  291000
...     ...            ...
...     ...            ...
Items
SONum  Item  Part  Qty  Price
1234   1     A-10  12   10.95
1234   2     B-43  600  3.99
...    ...   ...   ...  ...
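The data-centric mapping on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name map_order and the tuple layout are hypothetical, and the element names are hard-coded, which is exactly what makes this kind of mapping specific to one XML schema.

```python
import xml.etree.ElementTree as ET

# The sample order document from the earlier slide.
ORDER_XML = """
<Order>
  <Number>1234</Number>
  <Customer>Gallagher Co.</Customer>
  <Date>29.10.00</Date>
  <Item Number="1">
    <Part>A-10</Part><Quantity>12</Quantity><Price>10.95</Price>
  </Item>
  <Item Number="2">
    <Part>B-43</Part><Quantity>600</Quantity><Price>3.99</Price>
  </Item>
</Order>
"""

def map_order(xml_text):
    """Map one Order document to an Orders row and a list of Items rows.

    Hypothetical helper: the element names are fixed in the code, so a
    different XML schema would need a different mapping.
    """
    order = ET.fromstring(xml_text)
    number = int(order.findtext("Number"))
    orders_row = (number, order.findtext("Customer"), order.findtext("Date"))
    items_rows = [
        (
            number,                          # SONum: foreign key to Orders
            int(item.get("Number")),         # Item
            item.findtext("Part"),           # Part
            int(item.findtext("Quantity")),  # Qty
            float(item.findtext("Price")),   # Price
        )
        for item in order.findall("Item")
    ]
    return orders_row, items_rows

orders_row, items_rows = map_order(ORDER_XML)
```

The rows could then be written with ordinary SQL INSERT statements; document order, comments and anything else not named in the mapping is lost.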
Modeling documents
• Objects in model independent of XML schema
Element (Order)
  Element (Number)
    Text: 1234
  Element (Customer)
    Text: Gallagher Co.
  Element (Date)
    Text: 29.10.00
  Element (Item)
    Element (Part)
      Text: A-10
    Element (Quantity)
      Text: 12
    Element (Price)
      Text: 10.95
    Attr (Number)
      Text: 1
  Element (Item)
    ...
Storing documents
• Database schema independent of XML schema (order columns not shown)
Elements
ID  Name      Parent
1   Order     --
2   Number    1
3   Customer  1
4   Date      1
5   Item      1
6   Item      1
7   Part      5
8   Quantity  5
9   Price     5
10  Part      6
11  Quantity  6
12  Price     6
Attributes
ID  Name    Parent
13  Number  5
14  Number  6
Text
Parent  Value
2       1234
3       Gallagher Co.
4       29.10.00
7       A-10
8       12
9       10.95
10      B-43
11      600
12      3.99
13      1
14      2
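A document-centric mapping like the one above can be produced by generic code that knows nothing about orders or items. The following Python sketch is an assumption-laden illustration: the function name shred is hypothetical, and breadth-first numbering is chosen only because it reproduces the IDs shown in the tables.

```python
import xml.etree.ElementTree as ET
from collections import deque

ORDER_XML = """
<Order>
  <Number>1234</Number>
  <Customer>Gallagher Co.</Customer>
  <Date>29.10.00</Date>
  <Item Number="1">
    <Part>A-10</Part><Quantity>12</Quantity><Price>10.95</Price>
  </Item>
  <Item Number="2">
    <Part>B-43</Part><Quantity>600</Quantity><Price>3.99</Price>
  </Item>
</Order>
"""

def shred(xml_text):
    """Split any XML document into Elements, Attributes and Text rows."""
    root = ET.fromstring(xml_text)
    elements, attributes, text = [], [], []
    pending_attrs = []
    next_id = 1
    queue = deque([(root, None)])
    # Number elements breadth-first, as in the tables on the slide.
    while queue:
        node, parent_id = queue.popleft()
        node_id = next_id
        next_id += 1
        elements.append((node_id, node.tag, parent_id))
        if node.text and node.text.strip():
            text.append((node_id, node.text.strip()))
        for name, value in node.attrib.items():
            pending_attrs.append((name, node_id, value))
        for child in node:
            queue.append((child, node_id))
    # Attributes are numbered after all elements.
    for name, owner_id, value in pending_attrs:
        attributes.append((next_id, name, owner_id))
        text.append((next_id, value))
        next_id += 1
    return elements, attributes, text

elements, attributes, text = shred(ORDER_XML)
```

The same function works unchanged for any other document, which is the point of a schema-independent model. Sibling order columns are omitted here, as on the slide.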
Native XML Database
– Defines an XML data model
– Uses a document as its fundamental unit of (logical) storage
– Can have any physical storage
XML data model
• XML 1.0 did not define a model
• Native XML databases define their own model
– Model must include elements, attributes, text, and document order
– Examples are XPath, Infoset, DOM, and SAX
– XQuery model will be de facto standard in future?
• Data transferred according to the model
Fundamental unit of storage
• Document is fundamental unit of logical storage
– Equivalent structure in RDBMS is a row
• Document usually contains single set of data
Text-based storage
• Stores documents as text
• Uses file system, CLOB, etc.
– Includes XML-aware text in RDBMS
• May need to parse documents at run time
• Uses indexes to avoid extra parsing, increase search speed
Text-based storage
<Address>
  <Street>123 Main St.</Street>
  <City>Chicago</City>
  <State>IL</State>
  <PostCode>60609</PostCode>
  <Country>USA</Country>
</Address>
Text-based databases
• Indexed files
– TextML
• CLOBs
– Oracle 9i release 2, DB2
Model-based storage
• Stores documents in “object” form
• Documents parsed when inserted
• For example, store DOM objects in OODBMS
• Underlying storage can be relational, object-oriented, hierarchical, proprietary
• Uses indexes to speed searches
Model-based storage
<Address>
  <Street>123 Main St.</Street>
  <City>Chicago</City>
  <State>IL</State>
  <PostCode>60609</PostCode>
  <Country>USA</Country>
</Address>
Element (Address)
  Element (Street)
    Text
  Element (City)
    Text
  Element (State)
    Text
  Element (PostCode)
    Text
  Element (Country)
    Text
Model-based databases
• Proprietary
– Tamino, Xindice, Neocore, Ipedo, XStream DB, XYZFind, Infonyte, Virtuoso, Coherity, Luci, TeraText, Sekaiju, Cerisent, DOM-Safe, XDBM, ...
• Relational
– Xfinity, eXist, Sybase, DBDOM
• Object-oriented
– eXcelon, X-Hive, Ozone/Prowler, 4Suite, Birdstep
Whole documents and fragments (Text-based databases)
• Should be very fast
– Data is contiguous on disk
– Retrieval requires index lookup and single disk read
1. Index lookup
2. Position disk head
3. Read contiguously to the end of the document
Whole documents and fragments (Model-based databases)
• Databases with proprietary stores should be fast
– Can use physical pointers between nodes
• Databases built on other DBs may be fast or slow
– Depends on underlying database and implementation
1. Index lookup
2. Position disk head
3. Follow pointers from node to node to the end
Unindexed data
• Slow for model-based databases
– Must read many elements, not just particular type
– Comparisons may be slower due to converting text
• Very slow for text-based databases
– Must parse document as well as comparing values
Task: Find date 29.10.00
Relational database:
1. Search this column (the date column of Orders)
Orders
...   ...       ...
1234  29.10.00  Gallagher Industries
...   ...       ...
Model-based native XML database:
1. Search all elements for Date elements
2. Search text for all Date elements
Figure: tree of Element, Text and Attr nodes that has to be searched.
Unindexed data: Optimization
• /Order[Date="29.10.00"]
– Only need to search children of Order
– Schema may help locate Date element
• //Order[Date="29.10.00"]
– Schema may help locate Order and Date elements
– Without schema, must search entire document
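The difference between the two query shapes can be seen with the limited XPath support in Python's xml.etree.ElementTree. This is a sketch, not a database: the wrapper element Orders and the Archive subtree are assumptions added so that the child-axis and descendant-axis searches return different results.

```python
import xml.etree.ElementTree as ET

# Hypothetical test document: one Order directly under the root,
# two more nested deeper inside an Archive element.
DOC = """
<Orders>
  <Order><Number>1234</Number><Date>29.10.00</Date></Order>
  <Archive>
    <Order><Number>1200</Number><Date>29.10.00</Date></Order>
    <Order><Number>1201</Number><Date>30.10.00</Date></Order>
  </Archive>
</Orders>
"""

root = ET.fromstring(DOC)

def numbers(orders):
    """Return the order numbers of the matched Order elements."""
    return [order.findtext("Number") for order in orders]

# Child step: only the direct children of the context node are inspected.
direct = numbers(root.findall("./Order[Date='29.10.00']"))

# Descendant step: every element below the context node is a candidate,
# so without a schema or an index the whole document has to be walked.
anywhere = numbers(root.findall(".//Order[Date='29.10.00']"))
```

Here direct holds only order 1234, while anywhere also finds order 1200 inside Archive.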
The eXist project
• eXist is the leading open source implementation of XML:DB
• It acts as a repository for indexing and retrieval of XML and RDF documents
• Uses a native Java backend data store
– Which splits out elements, attributes and entities into columns and associates tree path information
– This allows very fast responses to search queries
• The eXist project goes beyond simple XPath queries by adding functionality like NEAR-based queries and document grouping (collections)
eXist DB
• Schema-less data storage
• Collections
• Index-based query processing
• XQuery/XPath extension for performing full text search
• XUpdate support
• exist-db.org
eXist Architecture
• eXist is split into three distinct areas
– Brokers, for accessing the data held in either the native Java data store or a relational DB like MySQL or Oracle
– The engine itself, used to rebuild the documents and query the data store
– Interfaces to the engine, either XML-RPC, the XML:DB API (Java) or SOAP for use within a Web Services framework
Building products using eXist
• There are three main options for building applications over eXist
– Using WebDAV access
– Java applications, using the XML:DB API
– Web Service applications (built using Java, Python, Perl)
• Web Services can then be used through any SOAP-aware programming language
• XML-RPC interfaces, available through most modern programming languages
• A natural fit for XML applications built on eXist is using Apache’s Cocoon as the presentation layer of your application
xml.apache.org
• The Apache XML Project has activities focused on different aspects of XML
– Xerces - XML parsers in Java, C++ (with Perl and COM bindings)
– Xalan - XSLT stylesheet processors, in Java and C++
– Cocoon - XML-based web publishing, in Java
– FOP - XSL formatting objects, in Java
– Xang - Rapid development of dynamic server pages, in Java
– Xindice - Native XML database
Building products using eXist
XPath Extension
• document()
– Selection of a document, a set of documents, or all
• collection()
– Specification of a collection of documents to be included in query evaluation
– collection('/db/vincent')//scene[speech[speaker="David"]]/title
XPath Extension
• Querying text
– XPath has only a few functions for text searching
– contains(), near(), &=, |=, match-all()
• //chapter[contains(., 'XML') and contains(., 'database')]
– Finds chapters whose content contains the words 'XML' and 'database'
• //section[near(., 'XML database', 50)]
– Finds sections containing both keywords in the correct order and with fewer than 50 words between them
• //scene[speech[. &= 'witch game' and line |= 'fenny snake']]
– Two additional operators for simple keyword queries: &= and |=
– &= : selects context nodes containing ALL of the keywords in the right-hand argument, in any order
– |= : selects context nodes containing ANY of the keywords in the right-hand argument
• //speech[match-all(line, 'li[fv]e[s]')]
– Tries to match the regular expression string
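The semantics described for &=, |= and near() can be emulated on plain strings. The sketch below is not eXist code: the function names contains_all, contains_any and near are hypothetical, and the word tokenizer (lower-cased runs of word characters) is an assumption.

```python
import re

def words(text):
    """Split text into lower-cased word tokens (assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())

def contains_all(text, keywords):
    """Emulates &=: every keyword occurs, in any order."""
    present = set(words(text))
    return all(k in present for k in words(keywords))

def contains_any(text, keywords):
    """Emulates |=: at least one keyword occurs."""
    present = set(words(text))
    return any(k in present for k in words(keywords))

def near(text, keywords, max_distance):
    """Emulates near(): keywords occur in the given order with fewer
    than max_distance words between consecutive keywords."""
    tokens = words(text)
    wanted = words(keywords)
    if not wanted:
        return False

    def match_from(pos, idx):
        # pos is the token index matched to wanted[idx - 1].
        if idx == len(wanted):
            return True
        upper = min(len(tokens), pos + max_distance + 1)
        return any(
            tokens[j] == wanted[idx] and match_from(j, idx + 1)
            for j in range(pos + 1, upper)
        )

    return any(
        tokens[i] == wanted[0] and match_from(i, 1)
        for i in range(len(tokens))
    )
```

For example, near("XML is a native database", "XML database", 50) holds, while the reversed text "database for XML" does not match because the keyword order is wrong.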
Query Execution
• Basic approach
– Top-down/bottom-up traversal for XPath expression
– Very inefficient
• /book//section[contains(title, 'XML')]
• Follow every child path beginning at BOOK to check for potential SECTION descendants
• Indexing structure
– Efficient processing of regular path expressions on large, unconstrained document collections
Indexing
Figure: document tree numbered as a complete k-ary tree, padded with virtual nodes.
Indexing
• Identifiers grow too large, which limits document size, so the completeness requirement is dropped
– For two nodes x and y of a tree, size(x) = size(y) if level(x) = level(y), where size(n) is the number of children of a node n and level(m) is the length of the path from the root node of the tree to m
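The arithmetic behind a level-order numbering of a complete k-ary tree can be sketched as follows. The formulas are the standard ones for 1-based level-order numbering; whether eXist uses exactly this form is an assumption, and the function names are hypothetical.

```python
def child_id(parent, position, k):
    """Identifier of the position-th child (1-based) of node `parent`
    in a complete k-ary tree numbered in level order from 1."""
    if not 1 <= position <= k:
        raise ValueError("position must be between 1 and k")
    return k * (parent - 1) + position + 1

def parent_id(node, k):
    """Identifier of the parent of `node`, or None for the root."""
    if node == 1:
        return None
    return (node - 2) // k + 1

def max_id(k, depth):
    """Largest identifier needed for a complete k-ary tree of this depth
    (the root is at depth 0)."""
    return sum(k ** level for level in range(depth + 1))
```

Because virtual nodes also consume identifiers, a single element with many children forces a large k for the whole tree: max_id(100, 8) already exceeds 10**16. Letting k vary per level, as in the relaxed condition above, keeps the identifiers smaller.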
Storage Implementation
• Storage backend
– dom.dbx collects DOM nodes in a paged file and associates node identifiers to the actual nodes
– collections.dbx manages the collection hierarchy
– element.dbx indexes elements and attributes
– words.dbx keeps track of word occurrences and is used by the full text search extensions
Storage Implementation
• All indexes are based on B+-trees
Figure: a multiroot B+-tree, with entries for Document d1 and Document d2, maps each node-id to the address of a DOM node (node n1, node n2, …) stored in the data pages.
Query Processing
• Decompose a path into a chain of basic steps
– /PLAY//SPEECH[SPEAKER='HAMLET']
• Load the root element (PLAY) for all documents in the input document set
• The set of SPEECH elements is retrieved for the input documents via an index lookup from file element.dbx
• Use an ancestor-descendant path join algorithm to join the two sets
• Evaluate the predicate
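The join step can be sketched on top of a level-order numbering, where a node's ancestors are computed from its identifier alone without touching the stored nodes. This is a simplified illustration, not eXist's algorithm: it assumes a single document, a fixed k for the whole tree, and hypothetical function names.

```python
def parent_id(node, k):
    """Parent of `node` in a complete k-ary tree numbered from 1."""
    return None if node == 1 else (node - 2) // k + 1

def ancestor_descendant_join(ancestors, descendants, k):
    """Return (ancestor, descendant) pairs where the descendant lies
    below the ancestor. Only identifiers are inspected."""
    ancestor_set = set(ancestors)
    pairs = []
    for node in descendants:
        current = parent_id(node, k)
        # Walk up the tree by arithmetic until the root is passed.
        while current is not None:
            if current in ancestor_set:
                pairs.append((current, node))
            current = parent_id(current, k)
    return pairs

# Binary tree: 1 is the root; 2 and 3 are its children;
# 4, 5 are children of 2 and 6, 7 are children of 3.
play_ids = [1]          # stands in for the PLAY root elements
speech_ids = [4, 7]     # stands in for SPEECH ids from the element index
joined = ancestor_descendant_join(play_ids, speech_ids, 2)
```

The predicate on SPEAKER would then be evaluated only for the SPEECH nodes that survive the join.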
Query Screen
Advantages of eXist
• Advantages of eXist as a Native XML DB
– eXist provides a scalable, reliable XML database implementation, royalty free
– By including multiple different interfaces eXist makes application integration simple
– With the addition of other open source platforms (like Cocoon for presentation and Axis for Web Services) eXist can be extended easily to fit many application needs
– eXist is currently being used in many live implementations
Drawbacks
• Drawbacks of eXist as a Native XML DB
– As with much open source software, documentation is scarce; you will have to rely on mailing lists for the more difficult problems
– No warranty comes with the source, although one can be provided by a third party
– Although stable, eXist is still considered beta software. Current version is 1.1.1 (Feb 2007)
Sedna – Another Native XML DB
• Full-featured database system (external and main memory management, query and update facilities, concurrency etc.)
• Native XML database
• Based on the XQuery language and the XQuery/XPath data model
• XUpdate language
• Implemented in Scheme and C/C++
• Supported platforms are Windows and Linux
• http://modis.ispras.ru/Development/sedna.htm
Data Organization
• A descriptive-schema-driven storage strategy is used: nodes of an XML document are clustered according to their positions in the descriptive schema
• Direct pointers are used to represent relationships between nodes of an XML document, such as parent, child and sibling relationships
Descriptive Schema (Data Guide)
<library>
  <book>
    <title>Foundation on databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
  </book>
  . . .
  <book>
    <title>An Introduction to Database Systems</title>
    <author>Date</author>
    <issue>
      <publisher>Addison-Wesley</publisher>
      <year>2004</year>
    </issue>
  </book>
  <paper>
    <title>A Relational Model for Large Shared Data Banks</title>
    <author>Codd</author>
  </paper>
  . . .
  <paper>
    <title>The Complexity of Relational Query Languages</title>
    <author>Codd</author>
  </paper>
</library>
Figure: descriptive schema of the document drawn as a tree (library → book, paper; book → title, author, issue; issue → publisher, year), with the path /library/book/title highlighted as library → book → title.
Data Structures (Node descriptor)
Figure: node descriptor of a title node. Fields shown: node handle (through the indirection table), children “by descriptive schema”, next-in-block, prev-in-block, left-sibling, right-sibling, parent, label (numbering scheme).
Structural query efficiency
When we answer structural queries like /library/book/title, we
• Read only blocks containing necessary information and do not read other blocks
• Find that every block that is read contains only nodes that belong to the answer
Node updates efficiency
• Node descriptors have a fixed size within a block
• Node descriptors are partly ordered
• Immutable numbering scheme
• Indirection table for parents
Figure: a node with left-sibling, right-sibling and child pointers; its parent pointer goes through the indirection table.
Memory Management
• Pointers are used to represent relationships between nodes, and traversing nodes results in intensive pointer dereferencing, so the dereferencing operation should be efficient
• Database address space should be big enough to represent large volumes of data
OS memory management restrictions
• Restriction on the size of address space caused by the 32-bit architecture that prevails nowadays
• Cannot control the page replacement (swapping) procedure
Layered Address Space (LAS)
Figure: a transaction process addresses data in the Layered Address Space as (layer, addr). addr is the address inside the OS virtual process address space; layer * LAYER_SIZE + addr identifies the data in external memory (disk). The buffer manager maps buffer memory into the process with MapViewOfFile (Windows) or mmap (Linux) and keeps it resident with VirtualLock (Windows) or mlock (Linux).
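The address arithmetic in the figure is a simple two-part encoding. A minimal sketch follows, assuming a LAYER_SIZE of 2**32 bytes (the slides do not give the value) and reading the formula as the position of the data in external memory; the function names are hypothetical.

```python
LAYER_SIZE = 2 ** 32  # assumed value; not stated in the slides

def to_flat(layer, addr):
    """Combine a (layer, addr) pointer into one flat offset."""
    if not 0 <= addr < LAYER_SIZE:
        raise ValueError("addr must fit inside one layer")
    return layer * LAYER_SIZE + addr

def from_flat(flat):
    """Split a flat offset back into (layer, addr)."""
    return divmod(flat, LAYER_SIZE)
```

Within a layer, addr can be used directly as an ordinary pointer in the process address space, which keeps dereferencing cheap; the layer number lifts the total database size beyond what a 32-bit address space can hold.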
Query Evaluation Aspects
• Suspended element constructors
• Different strategies for XPath queries evaluation
• Combining Lazy and Strict Semantics
Element Constructors
• XML element construction requires a deep copy of its content, so the operation is heavy
• Suspended element constructors
– Do not perform the deep copy
– Store a pointer instead
– The copy is performed on demand, when some operation accesses the content of the constructed element
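The idea of a suspended constructor, keep a pointer and copy only on demand, can be sketched in Python. This is an illustration of the general technique, not Sedna's implementation; the class name SuspendedElement and its methods are hypothetical.

```python
import copy

class SuspendedElement:
    """Constructed element whose content is deep-copied only on demand."""

    def __init__(self, name, content):
        self.name = name
        self._source = content   # pointer to the original content
        self._copy = None        # no deep copy has been made yet

    @property
    def materialized(self):
        """True once the deep copy has actually been performed."""
        return self._copy is not None

    def content(self):
        """Return the element's own content, copying on first access."""
        if self._copy is None:
            self._copy = copy.deepcopy(self._source)
        return self._copy

source = {"title": ["Foundation on databases"]}
element = SuspendedElement("result", source)
copied_before_access = element.materialized
own_content = element.content()
own_content["title"].append("changed")
```

If the constructed element is only serialized to the client, a real system can stream the original nodes and skip the copy entirely; the sketch shows only the deferred-copy part.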
Different strategies for XPath queries evaluation
• /library/book[issue/year=2004]
• /library/book/issue/year[.=2004]/../../title
Figure: descriptive schema tree (library → book, paper; book → title, author, issue; issue → publisher, year) with the year and book nodes marked; annotations “Top-down: descriptive schema” and “Two steps”.
Combining Lazy and Strict Semantics
• Iterative result computation (open; next; close)
• Iterative result computation with a functional programming language gives lazy evaluation
• On the other hand, the strict semantics of a language is more efficient than lazy semantics
• So strict and lazy semantics are combined for XQuery
Combining Lazy and Strict Semantics
• Query evaluation starts in lazy mode
• Every function call is a reason to switch to strict mode if the sizes of its arguments are relatively small
• A large input sequence for any physical operation in strict mode is a reason to switch to lazy mode
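The switching rule can be illustrated with Python lists (strict) and generators (lazy). This is a sketch of the general idea only: the threshold value, the function name evaluate and the use of list length as the size measure are assumptions, not Sedna's actual policy.

```python
from itertools import islice

THRESHOLD = 4  # assumed cut-off between "small" and "large" inputs

def evaluate(values, fn):
    """Apply fn to every value, strictly for small inputs and lazily
    for everything else."""
    if isinstance(values, list) and len(values) <= THRESHOLD:
        # Strict mode: compute the whole result at once.
        return [fn(v) for v in values]
    # Lazy mode: results are produced one at a time, on demand.
    return (fn(v) for v in values)

small = evaluate([1, 2, 3], lambda v: v * 2)
large = evaluate(range(10 ** 9), lambda v: v * 2)
first_three = list(islice(large, 3))
```

The large input is never materialized: only the three requested results are computed, which corresponds to the open/next/close style of iteration.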
Summary
• Efficient evaluation of structured XPath queries
• Local node-level updates
• Effective processing of XML data in main memory, comparable to a general-purpose programming language
Tamino XML Server
• Supports Internet and W3C standards
• Stores/retrieves native XML documents and non-XML data
• Offers full text search on XML documents
• Accesses RDBMS databases
• Integrates with other business applications
• Supports any programming language
• Works with major Web servers and application servers
Tamino XML Server Architecture
Figure: Tamino XML Server architecture. Labels in the figure:
– Core Services: Security Service, Tokenizer (opt.; Chinese, Japanese, Korean), Administration Services, Query / Text-Retrieval Service, XML Parser + Query Interpreter, Object Processor & Object Composer, X-Tension Service, Tamino Manager, Data Map, XML Schema Service, Native XML Data Store
– Enabling Services: Interactive Services, Application Programming, Software AG Integration Services, Schema Services, more ...
– Enterprise Edition Services: External DB Services (opt.), UDDI and Web Services
– XML Business Integration Solutions + Customer Solutions: Single Customer View, Self Service Portals, Supply Chain Integration, Business Reporting
– Front Office / Clients: mobile phone, PDA, printer, browser, CD, Internet File System
– Back Office / Back End: databases (incl. Adabas), applications
– Interfaces: Internet (HTTP, WebDAV, SOAP); XQuery, XPath; XML, WML, HTML; ODBC, Adabas; COM; data / metadata / non-XML
Tamino Features
• Integration of data in existing, external data sources
– Access to and modification of data from diverse systems (relational DBs, object DBs, Office systems ...)
• Database queries with ‘XQuery’
– XPath-based, covering document structure and content
• Simple administration
– Browser-based control via any PC having Internet access
• Simple connection to Internet via standard Web server
Tamino Features
• Security
– Group / user authorization rights down to XML element level
• Read-only databases supported
• Dynamic style-sheet processing supported
– e.g. HTML formatted output of XML documents
The Communication Interface
Figure: a web browser or application sends HTTP requests to a web server (Apache, MS IIS, IBM WebSphere, Sun iPlanet); X-Port passes them on to the Tamino servers, here "xml" on port 3204, "mydata" on port 3207 and "test1" on port 3210. Example URL: http://www.ebiz1.com/tamino/xml
Native XML Store
• Connecting XML-based businesses to Web
• Highest performance with unlimited native XML data storage
• “Valid” and “well-formed” XML accepted
• XML database structure easily changeable
• Full-text search (indexing for full text and standard search)
Tamino XML Data Store
Tamino Data Map
Tamino XML Schema
• Contains all the information needed for storage, indexing and processing of XML objects, especially for
– Storage of XML structures and properties
– Integration of external databases / applications
– Construction of standard and text indices
Schema - Collections
Collection "sailing"
  Doctype "cruise"
  Doctype "yacht"
  Doctype "contract"
Collection "shop"
  Doctype "order"
  Doctype "product"
  Doctype "contract"
XML Schema Support
• Complete support for XML Schema 1.0 Specification
• Industry schema support:
– Docbook 4.4
– FpML 4.1
– METS 1.0
– NewsML 1.2
– SVG 1.0
– UBL
– VoiceML
– Word 2003
– XBRL 2.1
• Full DTD support
Storing - customer.xml
<CUSTOMER>
  <CUSTOMERID>1044</CUSTOMERID>
  <FIRSTNAME>Paul</FIRSTNAME>
  <LASTNAME>Astoria</LASTNAME>
  <HOMEADDRESS>
    <STREET>123 Cherry Lane</STREET>
    <CITY>Best</CITY>
    <STATE>CA</STATE>
    <ZIP>94132</ZIP>
  </HOMEADDRESS>
</CUSTOMER>
• Without a schema
– Store the XML document using a URL with the command _Process in the default collection
– Ready!
• Or use a DTD / Tamino XML Schema
– Read the DTD or XML Schema into the Tamino Schema Editor
– Mark check boxes for (text) indexing
– Save the XML Schema with a collection name
– Store the XML document using a URL with the command _Process in the named collection
Tamino Search and Retrieval
• W3C XQuery Support
– User-defined functions
– If-Then-Else
– Node-level update
• XPath Support
– Extended with text search
Example query:
for $b in input()/bib/book
let $a := $b/author
where $b/price lt 200
return ($b/title, $a)
Figure labels: Application, API, XQuery, XML, Web Server, XML Schema, Data Map, Tamino X-Tension, custom application, existing DBMS.
Tamino Indexing and Retrieval
• Standard
– Classical database indexes
– Index any combination of elements and attributes
– Supports relational operators, exact comparisons, sorting
• Text
– Use in conjunction with text retrieval functions
– Supports wildcard searches
• Structure
– Index declared on the document
– Registers instances of undeclared nodes
Tamino Indexing and Retrieval
• Reference
– Indexes specific sub-trees of a document (e.g. /doc/a/b)
– Useful for documents of high complexity (multiplicity of sub-trees)
• Multipath
– Index any element or attribute that meets an XPath expression
• Compound
– Index a combination of two elements (e.g. lastname and firstname)
DB Connector
• Easy integration with existing DBMS
• “One Server View” on integrated heterogeneous databases
• Non-native objects stored in other database types (SQL, Adabas, ...)
• Connection to existing data storage in other DB types (e.g. RDBMS, Excel ...)
Figure labels: Tamino Kernel, Schema, X-Node, Open API, RDBMS (three instances), Adabas.
X-Tension
• XML-based application logic on the Tamino server
• Event functions, query functions, content-dependent mapping ...
• Message forwarding on events
• User-programmable server functionality, customization
• Allows integration with external applications
• Technology: COM objects (C++) and Java
Figure labels: Server Extension, Financial System, Spreadsheet, ...
X-Application Architecture
Figure labels, by layer:
– Server-side presentation layer: Browser, Internet, HTML pages, Web Server, HTML, JSP Tag Library
– Application layer: Business Modules, Plugins, SOAP Web Service
– Data access layer: Tamino Java API, Tamino
– XApp Generator: Search, Display, Modify
Tamino WebDAV Server
• WebDAV = Web-based Distributed Authoring and Versioning
• WebDAV is a standard
• Future: framework for CMS (with check-in/-out, versioning, query)
• Community downloads
Figure: client applications (MS Office 2000, Internet Explorer, XML-Spy, Adobe Acrobat, Python, WebDAV Explorer, Dreamweaver, other applications ...) connect through the Tamino WebDAV Server to the Tamino Server.
Tamino APIs
• Various APIs to Tamino Server
• Released with Tamino 3.1
– Java APIs
– EJB API (application server support: BEA, IBM, HP ...)
– ActiveX
Figure labels: Tamino XML Server surrounded by EJB API, C API, Java API, .NET, web-serverless APIs, ActiveX, ...
Tamino DOM APIs
Figure labels: Java application, COM application, C, Web browser/ASP; Java interface, JScript interface, HTTP C interface, COM interface; HTTP; Tamino server.
Cons of Native XML DB
• Products are immature
• Many standards are still in development
• Techniques are unfamiliar to people
• Not good at transaction processing
• Tool support is minimal
• Some expected database practices are still unsupported
• Interoperability between products is minimal
Pros of Native XML DB
• Good way to store XML
• Can store document or data style XML
• Tremendous flexibility
• Applications can be loosely coupled
• Data modeling is simple and flexible
• Complement RDBMS with XML mapping solutions
• Performance can be very good
HKCAN
• Hong Kong Chinese Authority (Name)
• A collaborative project since 1999
• 7 Hong Kong university libraries
– Chinese University of Hong Kong
– City University of Hong Kong
– Hong Kong Baptist University
– Hong Kong Institute of Education
– Hong Kong Polytechnic University
– Lingnan University
– University of Hong Kong
Aims
• To build up a Chinese name authority file with CJK (Chinese, Japanese, Korean) scripts that meets the need of the bilingual community
• To improve and streamline authority-control operations by setting up standardization for name headings and principles for authority record selection to achieve “Better”, “Faster” and “Cheaper”
• To participate in regional and global cooperative activities on authority work
Record model - HKCAN record
008 941020nc acannaabn |a aaa |||
010 $anr 94034993
035 $a(DLC#)nr 94034993a
040 $aDLC-R$beng$cDLC-R$dOCoLC$dHkCU$dHkCAN
066 $c$1
100 1 $aZhou, Ying,$d17th cent.
400 1 $wnne$aChou, Ying,$d17th cent.
400 1 $aZhou, Fangshu,$d17th cent.
400 1 $a周方叔,$d17th cent.
400 1 $aChou, Fang-shu,$d17th cent.
670 $aChih lin (卮林), 1992:$bt.p. (Chou Ying)
670 $aChung wen ta tz{176}u tien (中文大詞典):$bv. 6, p. 290 (Chou Ying; of Ming; native of P{176}u-t{176}ien; t. Fang-shu; author of Chih lin; lived around the mid of Emperor Ch{176}ung-chen reign)
670 $aHis卮林 : 10卷, 附補遺1卷, [1963]:$bt.p. (周嬰)
670 $a中國人名大辭典, 1934:$bp. 545 (周嬰, 明莆田人, 字方叔, 崇禎中以貢生知上猶縣, 所著卮林, 体近類書)
700 1 $a周嬰,$d17th cent
Record model – in library system
Document Type Definition
• HKCAN DTD (Document Type Definition) – to specify the structure of each XML authority record
• With this DTD, records can be output to the XML schema or other related schemas if needed
• The DTD has served all the necessary functionality of the present XML platform well
DTD
<?xml version="1.0" encoding="UTF-8"?>
<!--DTD generated by XMLSPY v2004 rel. 3 U (http://www.xmlspy.com)-->
<!ELEMENT Leader (#PCDATA)>
<!ELEMENT Name (Leader, (Tag* | tag_type00* | tag_type10* | tag_type11* | tag_type30* | tag_1xx* | tag_4xx* | tag_5xx* | tag_7xx* | tag_670*)*)>
<!ATTLIST Name
  tag001 CDATA #IMPLIED
  record_type CDATA #IMPLIED
>
<!ELEMENT Subfield (#PCDATA)>
<!ATTLIST Subfield
  subfield_code CDATA #IMPLIED
>
<!ELEMENT Tag (#PCDATA | Subfield)*>
<!ATTLIST Tag
  tagcode CDATA #IMPLIED
  record_type CDATA #IMPLIED
  ind1 CDATA #IMPLIED
  ind2 CDATA #IMPLIED
>
<!ELEMENT tag_1xx (#PCDATA)>
<!ELEMENT tag_4xx (#PCDATA)>
<!ELEMENT tag_5xx (#PCDATA)>
<!ELEMENT tag_670 (#PCDATA)>
<!ELEMENT tag_7xx (#PCDATA)>
<!ELEMENT tag_type00 (#PCDATA)>
<!ELEMENT tag_type10 (#PCDATA)>
<!ELEMENT tag_type11 (#PCDATA)>
<!ELEMENT tag_type30 (#PCDATA)>
HKCAN XML platform
Figure: record flow on the HKCAN XML platform
– Source: MARC records in Communication MARC format with EACC encoding
– Program to convert records from Communication MARC format to XML format, giving records in XML format with EACC encoding
– Program to convert the records from CCCII encoding to UTF-8 encoding, giving records in XML format with UTF-8 encoding
– HKCAN XML full text search server (Tamino), for full text search, records display & download
– Import the records to a relational database for index search: HKCAN index search server (SQL Anywhere 8.0)
– Web interface: full text search and index search; index search retrieves the full record from the HKCAN XML server
Record conversion
00681cz 2200193n 4504001001000000003000600010005001700016008004100033010001600074035002300090040003000113066000700143100003000150670002600180670018700206670005700393678000900450700002800459 000000001 HkCAN 19960504052613.5 800523n| acannabb| |n aaa an 50026575 a(DLC#)n 50026575a aDLC cDLC dCU dDLC-R dHkCU c$1 1 aNakayama, Shigeru, d1928- aHis Senseijutsu, 1963 aKagaku gijutsu to ekoroj{229}i, 1995: bt.p. (Nakayama Shigeru) colophon (r; b. 1928; Ph.D. (from Harvard Univ.); prof., Kanagawa Daigaku; former asst. prof., T{229}oky{229}o Daigaku) a�$1!Bs!Ci!O(':`!5=�(B, 1999: bp. 2 (�$1!04!;e!Th�(B) aSc.D 1 a�$1!04!;e!Th�(B, d1928-
From MARC record with EACC encoding
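The target layout of the conversion, a Name element holding a Leader plus Tag and Subfield elements as declared in the HKCAN DTD, can be sketched with Python's xml.etree.ElementTree. This is not the project's conversion program: the function build_record and its input format (already-parsed fields) are assumptions, and MARC parsing and character-set conversion are left out.

```python
import xml.etree.ElementTree as ET

def build_record(tag001, leader, fields):
    """Build a <Name> record from already-parsed MARC fields.

    `fields` is a list of (tagcode, ind1, ind2, value) tuples, where
    value is either a string (control field) or a list of
    (subfield_code, text) pairs. This input shape is hypothetical.
    """
    name = ET.Element("Name", {"tag001": tag001, "record_type": "00"})
    ET.SubElement(name, "Leader").text = leader
    for tagcode, ind1, ind2, value in fields:
        tag = ET.SubElement(name, "Tag", {
            "tagcode": tagcode, "record_type": "",
            "ind1": ind1, "ind2": ind2,
        })
        if isinstance(value, str):
            tag.text = value
        else:
            for code, text in value:
                sub = ET.SubElement(tag, "Subfield", {"subfield_code": code})
                sub.text = text
    return name

record = build_record(
    "000000001",
    "00681cz 2200193n 4504",
    [
        ("003", "", "", "HkCAN"),
        ("010", " ", "", [("a", "n 50026575 ")]),
        ("100", "1", "", [("a", "Nakayama, Shigeru,"), ("d", "1928-")]),
    ],
)
xml_text = ET.tostring(record, encoding="unicode")
```

The resulting xml_text can be stored as one document per authority record, which matches the document-as-unit model of a native XML database.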
Record conversion
To XML record with EACC encoding
<Name tag001 = "000000001" record_type = "00">
<Leader>00681cz 2200193n 4504</Leader>
<Tag tagcode = "003" record_type = "" ind1 = "" ind2 = "">HkCAN</Tag>
<Tag tagcode = "005" record_type = "" ind1 = "" ind2 = "">19960504052613.5</Tag>
<Tag tagcode = "008" record_type = "" ind1="" ind2="">800523n| acannabb| |n aaa </Tag>
<Tag tagcode = "010" record_type = "" ind1=" " ind2=""><Subfield subfield_code = "a">n 50026575 </Subfield></Tag>
… ..
<Tag tagcode = "670" record_type = "" ind1=" " ind2=""><Subfield subfield_code = "a">$1!Bs!Ci!O(':`!5=(B, 1999: </Subfield><Subfield subfield_code = "b">p. 2 ($1!04!;e!Th(B) </Subfield></Tag>
… … .
<tag_type00>$1!04!;e!Th(B, | 1928- | </tag_type00>
<tag_7xx>$1!04!;e!Th(B, | 1928- | </tag_7xx>
</Name>
Record search
• Index search (browse search)
• Full text search (phrase/keyword search)
• Tamino server: full text search, record display, record download
• SQL Anywhere server: index search
One stop search
• Inspired by VIAF (Virtual International Authority Files) & the LEAF (Linking and Exporting Authority Files) Projects
• Search across multiple authority files concurrently
• HKCAN, Chinese Authority Name Database (Taiwan), LC Authority File, National Library of China