Storing and Maintaining Semistructured Data Efficiently in an Object-Relational Database

Mo Yuanying and Ling Tok Wang


Contents
1. Main Accomplishment
2. Related Works
3. ORA-SS
4. Storing Algorithm
5. Comparison with Related Works
6. Conclusion

1. Main Accomplishment

This study provides efficient and consistent storage for semistructured data by developing algorithms that map an XML document first to a logical ORA-SS model and then to an object-relational data store.

2. Related Works

(1) File system
Each XML document is stored as a separate operating-system file, and a DOM or SAX parser is used whenever the document is accessed by a query.

Disadvantages:
-- XML files in ASCII format must be parsed every time they are accessed, whether for browsing or for querying.
-- With DOM, the entire parsed file must be memory-resident during query processing.
-- It is hard to build and maintain indices on documents stored this way.
-- Update operations are difficult to implement.
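To make the first disadvantage concrete, the Python sketch below keeps a small document in an ordinary file and answers each query with a fresh SAX pass, so nothing parsed by one query is reused by the next. The document contents, the `TagTextCollector` handler, and `query_file` are hypothetical names chosen for illustration.

```python
import os
import tempfile
import xml.sax


class TagTextCollector(xml.sax.ContentHandler):
    """SAX handler that collects the text of every element with a given tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.inside = False
        self.buffer = []
        self.results = []

    def startElement(self, name, attrs):
        if name == self.tag:
            self.inside = True
            self.buffer = []

    def characters(self, content):
        if self.inside:
            self.buffer.append(content)

    def endElement(self, name):
        if name == self.tag:
            self.inside = False
            self.results.append("".join(self.buffer).strip())


parse_count = 0


def query_file(path, tag):
    """Answer a single query by re-parsing the whole file from disk."""
    global parse_count
    parse_count += 1  # every query pays the full parsing cost again
    handler = TagTextCollector(tag)
    xml.sax.parse(path, handler)
    return handler.results


# Hypothetical document stored as an ordinary operating-system file.
doc = ("<suppliers><supplier><sname>Acme</sname></supplier>"
       "<supplier><sname>Bolt</sname></supplier></suppliers>")
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as f:
    f.write(doc)
    path = f.name

first = query_file(path, "sname")
second = query_file(path, "sname")  # same query, same file, parsed again
os.remove(path)
print(first, parse_count)  # ['Acme', 'Bolt'] 2
```

An index or cache would avoid the repeated work, but as the slide notes, such structures are hard to build and maintain when the data lives in plain files.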

(2) Relational DBMS
XML data is stored in relations, and the XML query language (for example, XQuery) is translated to SQL and executed by the underlying relational database system. Representative mappings:
-- The Edge Approach
-- The Attribute Approach
-- Universal Table
-- Normalized Universal Approach
-- STORED

Disadvantages:
-- A great deal of redundancy.
-- Search and update are difficult.
-- Handling multi-valued attributes is expensive.
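As an illustration of the Edge approach, the sketch below shreds a document into a single `Edge(source, ordinal, name, flag, target)` table in SQLite and then retrieves supplier names. The table layout follows the commonly described Edge mapping in simplified form; the document and the helper name `shred_edges` are assumptions. The point is that every step along a path expression becomes another self-join on the same table.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred_edges(xml_text):
    """Store every parent-child edge of the document as one row of Edge."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Edge (source INTEGER, ordinal INTEGER, "
                 "name TEXT, flag TEXT, target TEXT)")
    next_id = [1]  # node 0 is a virtual root above the document element

    def visit(elem, parent_id, ordinal):
        children = list(elem)
        if children:
            # Inner element: allocate a node id and point the edge at it.
            node_id = next_id[0]
            next_id[0] += 1
            conn.execute("INSERT INTO Edge VALUES (?, ?, ?, 'ref', ?)",
                         (parent_id, ordinal, elem.tag, str(node_id)))
            for i, child in enumerate(children, start=1):
                visit(child, node_id, i)
        else:
            # Leaf element: the edge carries the text value itself.
            conn.execute("INSERT INTO Edge VALUES (?, ?, ?, 'string', ?)",
                         (parent_id, ordinal, elem.tag, (elem.text or "").strip()))

    visit(ET.fromstring(xml_text), 0, 1)
    return conn


doc = ("<db><supplier><sname>Acme</sname><city>Paris</city></supplier>"
       "<supplier><sname>Bolt</sname><city>Oslo</city></supplier></db>")
conn = shred_edges(doc)

# The path supplier/sname already needs one self-join on Edge.
names = [row[0] for row in conn.execute(
    "SELECT e2.target FROM Edge e1 JOIN Edge e2 "
    "ON e2.source = CAST(e1.target AS INTEGER) "
    "WHERE e1.name = 'supplier' AND e1.flag = 'ref' AND e2.name = 'sname' "
    "ORDER BY e1.ordinal")]
edge_count = conn.execute("SELECT COUNT(*) FROM Edge").fetchone()[0]
print(names, edge_count)  # ['Acme', 'Bolt'] 7
```

Deeper paths add one more join per step, and the generic `name`/`target` columns carry no information about which values belong to object classes and which to relationships.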

(3) Storage manager
The XML query is parsed, translated to a suitable operator-tree representation, optimized, and then executed by an XML query engine. Examples:
-- Shore
-- B-tree

Disadvantage: search and update are inconvenient.

(4) Our approach: store ORA-SS in nested relations

Problems in existing storage approaches:
-- Flat files: querying and updating are slow and difficult.
-- Relational DBMS: these approaches cannot capture the semantic information of the data.

ORA-SS reflects the nested structure of semistructured data and distinguishes between object classes, relationship types, and attributes. It can specify the degree of n-ary relationship types and indicate whether an attribute belongs to a relationship type or to an object class. Such information is essential for designing an efficient, non-redundant storage organization for semistructured data.

In addition, nested relations handle multi-valued attributes better.

3. ORA-SS

ORA-SS is a semantically richer data model for semistructured data. It has three main concepts:
-- Object class
-- Relationship type
-- Attribute
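One minimal way to write the three ORA-SS concepts down in code, assuming Python dataclasses. The class and field names (`Attribute`, `ObjectClass`, `RelationshipType`, `identifier`, `multi_valued`) and the project/supplier/part schema are hypothetical and do not reproduce the ORA-SS diagram notation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    name: str
    multi_valued: bool = False   # e.g. several phone numbers per supplier
    identifier: bool = False     # part of the owner's identifier


@dataclass
class ObjectClass:
    name: str
    attributes: List[Attribute] = field(default_factory=list)

    def key(self) -> List[str]:
        """Names of the attributes that make up the identifier."""
        return [a.name for a in self.attributes if a.identifier]


@dataclass
class RelationshipType:
    name: str
    participants: List[ObjectClass]
    # Attributes of the relationship itself, not of any participating object class.
    attributes: List[Attribute] = field(default_factory=list)

    @property
    def degree(self) -> int:
        return len(self.participants)


project = ObjectClass("project", [Attribute("jno", identifier=True)])
supplier = ObjectClass("supplier", [Attribute("sno", identifier=True),
                                    Attribute("phone", multi_valued=True)])
part = ObjectClass("part", [Attribute("pno", identifier=True), Attribute("pname")])
supply = RelationshipType("supply", [project, supplier, part], [Attribute("qty")])
print(supply.degree, part.key())  # 3 ['pno']
```

Keeping `qty` on `RelationshipType` rather than on `ObjectClass` is exactly the distinction that, according to the slides, other semistructured models cannot express.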

Example: Binary relationship type

Example (cont.): Ternary relationship type

Example (cont.): The distinction between binary and ternary relationship types cannot be made in other semistructured data models.
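Why the degree matters can be checked on a small instance. The sketch below, using hypothetical supplier/part/project values, stores a ternary relationship type, models it instead as three binary relationship types (its projections), and joins them back. The join produces a tuple that was never in the original data, so the ternary semantics cannot be recovered from the binary ones.

```python
def project(rel, positions):
    """Relational projection of a set of tuples onto the given positions."""
    return {tuple(t[i] for i in positions) for t in rel}


# Hypothetical ternary instances (supplier, part, project).
supply = {("s1", "p1", "j2"), ("s1", "p2", "j1"), ("s2", "p1", "j1")}

# Model the same data as three binary relationship types instead.
sp = project(supply, (0, 1))
pj = project(supply, (1, 2))
js = project(supply, (2, 0))

# Natural join of the three binary relations.
rejoined = {(s, p, j)
            for (s, p) in sp
            for (p2, j) in pj
            if p2 == p and (j, s) in js}

spurious = rejoined - supply
print(sorted(spurious))  # [('s1', 'p1', 'j1')]
```

Every binary fact about ("s1", "p1", "j1") holds (s1 supplies p1, p1 is used in j1, s1 supplies j1), yet s1 never supplies p1 to j1; only a ternary relationship type records that.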

ORA-SS can:
-- specify the degree of n-ary relationship types;
-- indicate whether an attribute is an attribute of a relationship type or of an object class.

Existing semistructured data models cannot specify this information, although it is essential for storage.
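Whether an attribute belongs to an object class or to a relationship type can be sketched as a functional-dependency test. In the hypothetical supplier-part data below, `pname` is determined by `pno` alone, whereas `qty` is determined only by the pair (`sno`, `pno`). The helper `determines` is an illustrative assumption: a check on sample instances can only refute a dependency, and in ORA-SS the placement is declared in the schema rather than inferred from data.

```python
def determines(rows, lhs, rhs):
    """True if, in these rows, equal values on lhs always come with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


# Hypothetical instances of a supplier-part relationship.
rows = [
    {"sno": "s1", "pno": "p1", "pname": "bolt", "qty": 100},
    {"sno": "s2", "pno": "p1", "pname": "bolt", "qty": 250},
    {"sno": "s1", "pno": "p2", "pname": "nut", "qty": 40},
]

print(determines(rows, ["pno"], "pname"))       # True: pname can belong to part
print(determines(rows, ["pno"], "qty"))         # False: qty is not a property of part
print(determines(rows, ["sno", "pno"], "qty"))  # True: qty belongs to the relationship
```

Storing `qty` under the part object would force either conflicting values or a copy of the part per supplier, which is the redundancy the slides attribute to models that lack this distinction.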

4. Storing Algorithm

ORA-SS to OR database

An object-relational database can handle multi-valued attributes efficiently: multi-valued attributes are treated as repeating groups in nested relations.
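The difference between a repeating group and a flat (1NF) table can be sketched with plain Python lists and dictionaries, assuming a hypothetical supplier relation whose multi-valued attribute `phones` is a repeating group. The helpers `unnest` and `nest` are illustrative names. Flattening repeats the single-valued attributes once per phone number; nesting recovers the original relation.

```python
# Hypothetical supplier object relation with a repeating group "phones".
suppliers = [
    {"sno": "s1", "sname": "Acme", "phones": ["555-0101", "555-0102"]},
    {"sno": "s2", "sname": "Bolt", "phones": ["555-0201"]},
]


def unnest(rows, group, field_name):
    """Flatten a repeating group into 1NF: one output row per group element."""
    flat = []
    for row in rows:
        base = {k: v for k, v in row.items() if k != group}
        for value in row[group]:
            flat.append({**base, field_name: value})
    return flat


def nest(rows, group, field_name, key):
    """Rebuild the repeating group by collecting field values per key."""
    nested = {}
    for row in rows:
        k = row[key]
        if k not in nested:
            nested[k] = {a: v for a, v in row.items() if a != field_name}
            nested[k][group] = []
        nested[k][group].append(row[field_name])
    return list(nested.values())


flat = unnest(suppliers, "phones", "phone")
print(len(flat))                                          # 3
print(sum(r["sname"] == "Acme" for r in flat))            # 2: sname repeated in 1NF
print(nest(flat, "phones", "phone", "sno") == suppliers)  # True
```

One caveat of the flat form: a supplier with an empty `phones` list produces no row at all in `unnest`, so the nested representation also keeps information the flat one can lose.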

ORA-SS to OR database: main rules
-- Each object class, together with its attributes, forms a nested relation in which multi-valued attributes become repeating groups (object relation).
-- Each relationship type, represented by the object classes involved in it, together with its attributes forms a nested relation in which multi-valued attributes become repeating groups (relationship relation).

(1) Object class translation algorithm
O1 The identifier and candidate keys of the object class become the primary key and candidate keys of the generated relation.
O2 Each single-valued attribute of the object class is a single-valued attribute of the generated relation.
O3 Composite attributes of the object class are represented by their components: each composite attribute is replaced by its components in the generated relation.

Object class translation algorithm (cont.)
O4 Each multi-valued attribute of the object class forms a repeating group in the generated relation.
O5 Each reference becomes a foreign key in the generated relation.
O6 Each disjunctive attribute is treated as two attributes.
O7 For an ID dependency relationship type, the ID-dependent object class is translated by the same rules as a regular object class. Together with its attributes, it forms a nested relation within the relation of its parent object class.
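Rules O1 to O7 can be sketched as a small recursive function. The input format below (a dictionary with `identifier`, `candidate_keys`, `attributes`, `references`, and `id_dependents`), the function name `translate_object_class`, and the course/lesson schema are hypothetical. The output only describes the nested relation schema; it is not DDL for any particular object-relational system.

```python
def translate_object_class(oc):
    """Translate one ORA-SS object class description into a nested relation schema."""
    rel = {
        "name": oc["name"],
        "primary_key": list(oc["identifier"]),                              # O1
        "candidate_keys": [list(k) for k in oc.get("candidate_keys", [])],  # O1
        "attributes": [],
        "repeating_groups": [],
        "foreign_keys": {},
        "nested": [],
    }
    for name, kind in oc["attributes"].items():
        if kind == "single":
            rel["attributes"].append(name)               # O2
        elif kind == "multi":
            rel["repeating_groups"].append(name)         # O4
        elif isinstance(kind, tuple) and kind[0] == "composite":
            rel["attributes"].extend(kind[1])            # O3: components only
        elif isinstance(kind, tuple) and kind[0] == "disjunctive":
            rel["attributes"].extend(kind[1])            # O6: one attribute per alternative
        else:
            raise ValueError(f"unknown attribute kind for {name!r}: {kind!r}")
    for attr, target in oc.get("references", {}).items():
        rel["attributes"].append(attr)
        rel["foreign_keys"][attr] = target               # O5
    for child in oc.get("id_dependents", []):
        rel["nested"].append(translate_object_class(child))  # O7: nested in parent
    return rel


# Hypothetical object class with one ID-dependent child.
course = {
    "name": "course",
    "identifier": ["code"],
    "candidate_keys": [["title"]],
    "attributes": {
        "code": "single",
        "title": "single",
        "lecturer": ("composite", ["first_name", "last_name"]),
        "textbook": "multi",
        "venue": ("disjunctive", ["room", "online_link"]),
    },
    "references": {"dept_id": "department"},
    "id_dependents": [{
        "name": "lesson",
        "identifier": ["lesson_no"],
        "attributes": {"lesson_no": "single", "topic": "single"},
    }],
}

relation = translate_object_class(course)
print(relation["attributes"])
print(relation["repeating_groups"], relation["nested"][0]["name"])
```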

Translation Example 1

(2) Relationship type translation algorithm
R1 The identifiers of all object classes participating in the relationship type become single-valued attributes of the nested relation. The key of the relationship relation can be determined by the participation constraints of the relationship type.
R2 Each single-valued attribute of the relationship type is a single-valued attribute of the generated relation.

Relationship type translation algorithm (cont.)
R3 Composite attributes of the relationship type are represented by their components: each composite attribute is replaced by its components in the generated relation.
R4 Each multi-valued attribute of the relationship type forms a repeating group in the generated relation.
R5 A disjunctive relationship type is treated as two relationship types.
R6 ID dependency relationship types need no translation, because rule O7 already nests the ID-dependent object class within its parent.
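A matching sketch for rules R1 to R6, using the same kind of hypothetical dictionary input. The slide does not spell out how participation constraints determine the key, so the rule used here is an assumption: if some participant takes part in at most one instance of the relationship type, its identifier alone is the key; otherwise the key combines all participant identifiers. Disjunctive participants are written as `{"either": [...]}`, another assumed convention, and the function names are illustrative.

```python
def expand_disjunctive(rt):
    """R5: split a disjunctive relationship type into one type per alternative."""
    for i, p in enumerate(rt["participants"]):
        if "either" in p:
            expanded = []
            for alt in p["either"]:
                parts = rt["participants"][:i] + [alt] + rt["participants"][i + 1:]
                renamed = {**rt, "name": rt["name"] + "_" + alt["name"], "participants": parts}
                expanded.extend(expand_disjunctive(renamed))
            return expanded
    return [rt]


def translate_relationship(rt):
    """Translate one ORA-SS relationship type into nested relation schemas."""
    if rt.get("id_dependency"):
        return []  # R6: nothing to translate
    relations = []
    for r in expand_disjunctive(rt):
        ids = [a for p in r["participants"] for a in p["identifier"]]  # R1
        # Assumed key rule: a participant that takes part at most once
        # identifies the relationship instance on its own.
        once = [p for p in r["participants"] if p["max_participation"] == 1]
        key = list(once[0]["identifier"]) if once else list(ids)
        rel = {"name": r["name"], "primary_key": key,
               "attributes": list(ids), "repeating_groups": []}
        for name, kind in r.get("attributes", {}).items():
            if kind == "single":
                rel["attributes"].append(name)          # R2
            elif kind == "multi":
                rel["repeating_groups"].append(name)    # R4
            elif isinstance(kind, tuple) and kind[0] == "composite":
                rel["attributes"].extend(kind[1])       # R3
            else:
                raise ValueError(f"unknown attribute kind for {name!r}: {kind!r}")
        relations.append(rel)
    return relations


def participant(name, ident, max_participation="n"):
    return {"name": name, "identifier": [ident], "max_participation": max_participation}


supply = {"name": "supply",
          "participants": [participant("project", "jno"),
                           participant("supplier", "sno"),
                           participant("part", "pno")],
          "attributes": {"qty": "single", "delivery_date": "multi"}}
attends = {"name": "attends",
           "participants": [participant("student", "sid"),
                            {"either": [participant("course", "code"),
                                        participant("workshop", "wid")]}]}
works_in = {"name": "works_in",
            "participants": [participant("department", "dno"),
                             participant("employee", "eno", max_participation=1)]}

print(translate_relationship(supply)[0]["primary_key"])      # ['jno', 'sno', 'pno']
print([r["name"] for r in translate_relationship(attends)])  # ['attends_course', 'attends_workshop']
print(translate_relationship(works_in)[0]["primary_key"])    # ['eno']
```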

Translation Example 1

Translation for Ordering and ANY

(3) Translation for ordering
We define an additional attribute named ordinal within the ordered object class (i.e., the ordered attribute).
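Because relations are unordered, the ordinal attribute is what lets document order be restored. The sketch below is flattened into an ordinary SQLite table for illustration and assumes a hypothetical ordered multi-valued attribute `author` of a book; the table, column, and function names are not from the original.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical relation for an ordered attribute: the authors of a book.
conn.execute("CREATE TABLE book_author ("
             "isbn TEXT, ordinal INTEGER, author TEXT, "
             "PRIMARY KEY (isbn, ordinal))")


def store_ordered(isbn, authors):
    """Record the document order of each value in the extra ordinal attribute."""
    conn.executemany("INSERT INTO book_author VALUES (?, ?, ?)",
                     [(isbn, i, a) for i, a in enumerate(authors, start=1)])


def load_ordered(isbn):
    """Rebuild the original order; without ORDER BY, SQL row order is unspecified."""
    return [row[0] for row in conn.execute(
        "SELECT author FROM book_author WHERE isbn = ? ORDER BY ordinal", (isbn,))]


store_ordered("b1", ["Carol", "Alice", "Bob"])
print(load_ordered("b1"))  # ['Carol', 'Alice', 'Bob']
```

The values are deliberately not alphabetical, so the result shows document order rather than a sort on `author`.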

(4) Translation for ANY
An attribute whose structure is unknown, or whose structure may differ from instance to instance, is denoted ANY. We define a separate table (Identifier, ANY, ANY-value), where:
-- Identifier is the identifier of the object class or relationship type to which this ANY belongs;
-- ANY is the structure name (the tag) used by a given instance;
-- ANY-value is its value.
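The (Identifier, ANY, ANY-value) table can be sketched in SQLite as below. The columns are renamed `identifier`, `any_tag`, and `any_value` so they are plain SQL identifiers, the supplier data and `store_any` helper are hypothetical, and storing a nested child as serialized XML text is an assumption the slide does not specify. A single `identifier` column also assumes single-attribute identifiers.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# (Identifier, ANY, ANY-value), with columns renamed to valid SQL identifiers.
conn.execute("CREATE TABLE any_attr (identifier TEXT, any_tag TEXT, any_value TEXT)")


def store_any(identifier, any_xml):
    """Store each child of an ANY element as one (identifier, tag, value) row."""
    for child in ET.fromstring(any_xml):
        if len(child):
            value = ET.tostring(child, encoding="unicode")  # nested structure kept as XML text
        else:
            value = (child.text or "").strip()
        conn.execute("INSERT INTO any_attr VALUES (?, ?, ?)",
                     (identifier, child.tag, value))


# Hypothetical supplier instances whose "extra" element has no fixed structure.
store_any("s1", "<extra><fax>555-0100</fax></extra>")
store_any("s2", "<extra><website>example.org</website><rating>4</rating></extra>")

s2_rows = conn.execute("SELECT any_tag, any_value FROM any_attr "
                       "WHERE identifier = 's2' ORDER BY rowid").fetchall()
print(s2_rows)  # [('website', 'example.org'), ('rating', '4')]
```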

Translation Results
Following these algorithms, a normal form ORA-SS schema yields normal form nested relations: undesirable update anomalies in the semistructured database are removed, and any redundancy due to many-to-many and n-ary relationships is controlled.

5. Comparison with Related Works

Comparison

Other models: Supply(J#, S#, P#, price, Qty)

6. Conclusion

Our approach uses ORA-SS as the data model and an object-relational database as the database management system.
-- We can store and access semistructured data correctly and more efficiently, without avoidable redundancy.
-- No node IDs are needed in our approach.

Conclusion (cont.)
-- Our approach captures the semantic information that is essential for storage.
-- Our approach can represent the degree of n-ary relationship types.
-- Our approach can represent an attribute either as an attribute of an object class or as an attribute of a relationship type.