Incremental, Semi-automatic, Mapping-Based Integration of Heterogeneous
Collections into Archaeological Digital Libraries: Megiddo Case Study
ECDL 2005, Vienna, September 19, 2005
Ananth Raghavan, Naga Srinivas Vemuri, Rao Shen, Marcos André Gonçalves, Weiguo Fan, and
Edward A. Fox
http://fox.cs.vt.edu
Acknowledgements (Selected)
• Sponsors: NSF grant ITR-0325579; AOL, ASOR, CWRU, ETANA, Vanderbilt U., Virginia Tech
• Faculty/Staff: Lillian Cassel, Debra Dudley, Roger Ehrich, Manuel Perez, …
• VT (Former) Students: Doug Gorton, Aaron Krowne, Ming Luo, Hussein Suleman, Ricardo Torres, …
Acknowledgements (Selected)
• Karen Borstad, MPP
• Giorgio Buccellati, UCLA
• Douglas Clark, Walla Walla College
• Joanne Eustis, CWRU
• Nick Fischio, CWRU
• Israel Finkelstein, Tel-Aviv University
• Paul Gherman, Vanderbilt U.
• Andrew Graham, U. Toronto
• Tim Harrison, U. Toronto
• Larry Herr, Canadian University College
• Christopher Holland, LRP
• Paul Jacobs, Mississippi State U.
• Douglas Knight, Vanderbilt U.
• Stan LaBianca, Andrews U.
• David McCreery, Willamette U.
• Eric Meyers, Duke U.
• Adam Porter, Illinois College
• Jack Sasson, Vanderbilt U.
• Tom Schaub, Indiana U. of Penn.
• Randall Younker, Andrews U.
• Doug Gorton, Virginia Tech
Outline
• Problems
• Background: ETANA-DL, Megiddo
• Approaches
    • Within the 5S framework
    • Visual mapping service
    • Multi-dimensional browsing
• Conclusions
• Future Work
Problems
• Vast quantities of heterogeneous archaeological data
• Integration is a monumental task
    • Wrapper automation
    • Difficult to construct a global schema in the archaeological domain
Background
ETANA-DL Web Site
Background (Cont.)
• Megiddo Collection
    • Archaeological site in Israel
    • Contains over 30,000 records
    • 7 different types of artifacts: Wall, Locus, Pottery Bucket, Flint tool, Vessel, Lab Item, Miscellaneous Artifact
Approaches
• Within the 5S framework
• Visual mapping service
    • Semi-automatically generates a wrapper based on a visual schema mapping tool that simultaneously improves the global schema.
• Multi-dimensional browsing service
    • Extends access to newly integrated collections through the multi-dimension browsing component.
[Figure: 5SSuite (5SGraph, Mapping Tool, 5SGen) across the DL development phases: Requirements (1), Analysis (2), Design (3), Implementation (4). Diagram labels: 5S MetaModel, DL Expert, DL Designer, 5SGraph, 5SL DL Model, 5SLGen, component pool (ODLSearch, ODLBrowse, ODLRate, ODLReview, …), Tailored DL Services, Practitioner, Researcher, Teacher.]
[Figure: Integration of a local collection into ETANA-DL. Diagram labels: Structure Sub-model, Scenario Sub-model, 5S Archaeology MetaModel, ArchDL Expert, ArchDL Designer, 5SGraph, Mapping Tool, Wrapper, Local Schema, ETANA-DL Schema, Local data, Global data, Union Catalog, 5SGen, Component Pool, ETANA-DL Union Services Descriptions (Harvesting, Mapping, Searching, Browsing, …), Multi-dimension Browsing Service.]
Megiddo Site Organization in Structure Sub-model
[Figure: tree of the Megiddo site organization. Nodes: Megiddo, Area, Square, Locus, Pottery bucket, Flint tool, Vessel, Lab item, Miscellaneous artifact.]
Visual Mapping Service
• Features of visual schema mapping tool
• Scenario usage
    • Mapping Megiddo local schema into ETANA global one
• Usability evaluation
Features of Visual Schema Mapping Tool
• Schema visualization using hyperbolic trees
• Recommendation engine that uses 3 algorithms
    • Name-based matching (edit distance)
    • Rules
    • Mapping history
• Colors to distinguish between different types of schema nodes (root, leaf, non-leaf, selected, recommended, and mapped)
• Mapping table that stores mappings from local to global nodes
• Allows renaming or deleting a node, and adding a local schema sub-tree as a child in the global schema.
• Generates an XSLT style sheet as a result of the mapping process.
Features of Visual Schema Mapping Tool
Mapping Megiddo Local Schema into ETANA Global Schema
• Mapping of flint tool and vessel collections
• Name-based matching (edit distance)
• Rules
    • Area -> PARTITION
    • Square1 -> SUBPARTITION
    • OriginalBucket -> CONTAINER
    • Locus -> LOCUS
• Mapping history
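The three recommendation strategies listed above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: the function names, the order in which the strategies are tried, the case-insensitive comparison, and the `max_distance` threshold are all assumptions. Only the four rules are taken from the slide.

```python
from typing import Dict, List, Optional

# Rules from the slide (local node name -> global node name).
RULES: Dict[str, str] = {
    "Area": "PARTITION",
    "Square1": "SUBPARTITION",
    "OriginalBucket": "CONTAINER",
    "Locus": "LOCUS",
}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def recommend(local_name: str,
              global_names: List[str],
              history: Dict[str, str],
              max_distance: int = 2) -> Optional[str]:
    """Suggest a global node for a local node, or None if nothing fits.

    Assumed precedence: rules, then mapping history, then name matching.
    """
    # 1. Hand-written rules.
    target = RULES.get(local_name)
    if target in global_names:
        return target
    # 2. Mappings saved in earlier sessions (hypothetical history format).
    target = history.get(local_name)
    if target in global_names:
        return target
    # 3. Name-based matching: closest global name within the threshold.
    if not global_names:
        return None
    best = min(global_names,
               key=lambda g: edit_distance(local_name.lower(), g.lower()))
    if edit_distance(local_name.lower(), best.lower()) <= max_distance:
        return best
    return None
```

For example, with global nodes `["PARTITION", "SUBPARTITION", "CONTAINER", "LOCUS", "DESCRIPTION"]`, the local node `Area` is resolved by a rule, `Description` by name matching, and a node with no close counterpart yields no recommendation.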
Mapping Megiddo Local Schema into ETANA Global Schema (Cont.)
Initial set of mappings for flint tool based on rules and name-based matching
Mapping Megiddo Local Schema into ETANA Global Schema (Cont.)
Adding FLINT sub-tree as a child of OBJECT in the global schema
Mapping Megiddo Local Schema into ETANA Global Schema (Cont.)
Global node Description renamed to DESCRIPTION, and user choosing to Save Mappings
Mapping Megiddo Local Schema into ETANA Global Schema (Cont.)
Flint tool style sheet generated
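As a rough idea of how a mapping table could be turned into a style sheet, the sketch below builds a small XSLT 1.0 document from local-to-global node pairs. It is an assumption-laden illustration: the record element name `FlintTool`, the flat output under `OBJECT`, and the function name `build_stylesheet` are hypothetical, and the style sheet actually generated by the tool is not shown in these slides.

```python
import xml.etree.ElementTree as ET
from typing import Dict

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def build_stylesheet(record_element: str, mappings: Dict[str, str]) -> str:
    """Return an XSLT 1.0 style sheet that copies each mapped local node
    into an element named after its global node.

    Simplification (assumption): every global node is emitted as a direct
    child of a single OBJECT element, ignoring the real global hierarchy.
    """
    ET.register_namespace("xsl", XSL_NS)
    root = ET.Element(f"{{{XSL_NS}}}stylesheet", {"version": "1.0"})
    template = ET.SubElement(root, f"{{{XSL_NS}}}template",
                             {"match": record_element})
    out = ET.SubElement(template, "OBJECT")
    for local_name, global_name in mappings.items():
        target = ET.SubElement(out, global_name)
        # <xsl:value-of select="..."/> copies the text of the local node.
        ET.SubElement(target, f"{{{XSL_NS}}}value-of", {"select": local_name})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Mapping pairs taken from the rules slide; "FlintTool" is hypothetical.
    print(build_stylesheet("FlintTool", {
        "Area": "PARTITION",
        "Square1": "SUBPARTITION",
        "OriginalBucket": "CONTAINER",
        "Locus": "LOCUS",
    }))
```

The result can be applied with any XSLT 1.0 processor; the Python standard library does not include one, so the sketch only builds the style sheet.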
Mapping Megiddo Local Schema into ETANA Global Schema (Cont.)
Using the View Only Top Level Leaf Nodes option mapping Vessel Collection
Mapping Megiddo Local Schema into ETANA Global Schema (Cont.)
Name change recommendation based on mapping history
Usability Evaluation
• Claims analysis: exploring trade-offs between
    • linear representation and hyperbolic tree representation with recommendations, in terms of mapping speed;
    • scrolling involved in linear representation and re-orient actions involved in hyperbolic trees;
    • representing mappings as lines across the screen and in a separate mapping table;
    • editing in the same tool as mapping versus mapping and editing in different tools, in terms of ease of use and of editing and mapping speed.
• Benchmark Tasks (BTs) to explore the above claims
    • Comparison between Schema Mapper and MapForce for 1-1 schema mapping (as found in ETANA-DL).
Benchmark Task 1
Required the user to map 6 given nodes from the local to global schema.
Used to compare time and scrolls vs. re-orients and number of errors.
Users were asked to indicate which tool helped them locate nodes faster.
Benchmark Task 1 Quantitative Results
Benchmark Task 1 Quantitative Results (Cont.)
Benchmark Task 1 Quantitative Results (Cont.)
2 users recorded 1 error each when using Schema Mapper, no errors for MapForce.
The error was that they selected the wrong local schema node.
However, both of them realized their error because of the mapping table provided.
The mapping table thus reduces the criticality of this error.
Benchmark Task 1 Qualitative Results
• Wins
    • 8 out of 9 users felt that Schema Mapper helped locate both local schema and global schema nodes faster than MapForce.
    • The remaining user felt that both tools were equally effective for locating local schema nodes, but that Schema Mapper was superior for locating global schema nodes.
• Areas for Improvement
    • Users complained that they could not see the full node name in Schema Mapper.
Benchmark Task 2
Users were asked to map the Megiddo flint collection into ETANA-DL.
The task involves schema editing.
For comparison with Schema Mapper, the task was also carried out using MapForce for mapping and XML Spy for editing.
Used to compare efficiency between the two tools.
Benchmark Task 2 Quantitative Results
Benchmark Task 2 Quantitative Results (Cont.)
Benchmark Task 2 Quantitative Results (Cont.)
• Schema Mapper: all errors were due to the Rename feature.
    • The task required the user to rename a node to the uppercase form of its existing name.
    • The Rename box in the UI did not contain the old name.
    • Critical incident with high criticality.
    • Rectified by showing the old name in the Rename box while prompting the user to enter a new name.
• In MapForce, one user lost all of his mappings.
Benchmark Task 2 Qualitative Results
• Wins
    • All 9 users preferred the editing capability of Schema Mapper over that of MapForce and XML Spy combined.
• Areas for Improvement
    • Extend the rename functionality to the mapping table.
    • Allow a group rename by selecting multiple nodes and renaming them in a separate window.
Benchmark Task 3
Asks the users to identify mappings done in BT-2.
Compares the time taken by each tool to identify the mappings.
Compares errors in identifying mappings.
Benchmark Task 3 Quantitative Results
Benchmark Task 3 Quantitative Results (Cont.)
Benchmark Task 3 Quantitative Results (Cont.)
• Wins
    • 7 out of 9 users were faster using Schema Mapper.
    • No errors using Schema Mapper, whereas 2 users made 1 error each while using MapForce.
• Areas for Improvement
    • A sorting feature can be added to help the user locate mappings faster. (Has been subsequently added.)
Benchmark Task 3 Qualitative Results
• Wins
    • All 9 users found it easier to identify mappings with Schema Mapper than with MapForce.
Benchmark Task 4
Users were asked whether they would use the View Only Top-Level Leaf Nodes and View Only This Sub-tree features.
This question was posed mainly to find out whether an undo feature (restoring the original view with all nodes displayed) needed to be implemented.
All users agreed that they would use both features.
(Undo feature was implemented subsequently.)
Summary of Usability Evaluation
All claims justified.
Rename box modified to display old name while prompting for new name.
Undo feature implemented.
Sort feature provided for sorting the mapping table.
Multi-dimension Browsing Service
• Extend browsing service to the integrated Megiddo collection
    • Flint
    • Vessel
    • Lab item
    • Miscellaneous artifact
Multi-dimension Browsing Service
Integrated Megiddo collection
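One way to picture multi-dimension browsing over the integrated records is as filtering on several dimensions at once and counting what remains in each. The sketch below is purely illustrative: the dimension names (`object`, `area`), the sample records, and the function names are assumptions, not the service's actual data model.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical sample records; real ETANA-DL records are not shown here.
RECORDS: List[Dict[str, str]] = [
    {"id": "f1", "object": "Flint tool", "area": "A"},
    {"id": "f2", "object": "Flint tool", "area": "B"},
    {"id": "v1", "object": "Vessel", "area": "A"},
    {"id": "l1", "object": "Lab item", "area": "B"},
]


def browse(records: List[Dict[str, str]],
           selected: Dict[str, str]) -> List[Dict[str, str]]:
    """Keep the records that match every selected dimension value."""
    return [r for r in records
            if all(r.get(dim) == value for dim, value in selected.items())]


def facet_counts(records: List[Dict[str, str]], dimension: str) -> Counter:
    """Count how many records fall under each value of one dimension."""
    return Counter(r[dimension] for r in records if dimension in r)


if __name__ == "__main__":
    in_area_a = browse(RECORDS, {"area": "A"})
    print([r["id"] for r in in_area_a])
    print(facet_counts(in_area_a, "object"))
```

Selecting a value in one dimension narrows the result set, and the counts for the remaining dimensions show the user where further narrowing is possible.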
Conclusions
Demonstrated the DL integration workflow through the Megiddo case study.
Visual schema mapping tool supports integration by wrapper generation and global schema enrichment.
Positive results from initial pilot studies of the visual schema mapping tool.
Future Work
Extensive usability studies
Explore complex mappings
Enhance mapping recommendations
Questions? Comments?