September 23, 2007 NSF TANGO BYU/RPI 1
TANGO
Table Analysis for Generating Ontologies
David W. Embley (BYU) & George Nagy (RPI)
under NSF Awards 0414644 and 0414854 INFORMATION & KNOWLEDGE MANAGEMENT
Dr. Maria Zemankova
(a) Table Interpretation
(b) Query by Table
TANGO STEPS
[Diagram. Main sequence: TABLE, INTERPRETED TABLE, MINI ONTOLOGY, GROWING ONTOLOGY. Other labels: Wang Notation & XML; Wang Notation Tool; Ontology Editor; Annotated Semantic Web Pages; Standard Ontology Language (OWL); Ontology Based Web Services; Form Based Specification; Extraction Ontologies; Relational Databases; Query By Table]
[TANGO steps diagram repeated, with the part covered by this presentation marked]
(a) Table Interpretation
[Flow diagram labels: HTML web pages; Construct Wang notation; Matlab table; Extract table; Wang Notation; XML table; Confirm or correct; Confirm or correct; Mini Ontology]
Median Income table
http://www40.statcan.ca/l01/cst01/famil108a.htm?sdi=median%20income
Median Income table from Statistics Canada, displayed in the TANGO Wang Notation Tool
Wang Notation
• An abstract table is specified by an ordered pair (C, δ) - (category, delta)
• C is a finite set of labeled domains (headers, sub-headers of tables, etc.)
• δ represents each individual value within a table corresponding to C.
Categories
• The previous table has two categories.
• CATEGORY 1: (Region_Virtual, {(Canada,phi), (Newfoundland and Labrador,phi), (Prince Edward Island,phi), (Nova Scotia,phi), (New Brunswick,phi), (Quebec,phi), (Ontario,phi), (Manitoba,phi), (Saskatchewan,phi), (Alberta,phi), (British Columbia,phi), (Yukon Territory,phi), (Northwest Territories,phi), (Nunavut,phi)})
• CATEGORY 2: (Year_Virtual, {(2001,phi), (2002,phi), (2003,phi), (2004,phi), (2005,phi)})
Content (leaf) cells
• Delta Notation for two (of 15) rows:
delta({Year_Virtual.2001,Region_Virtual.Canada})=53,500
delta({Year_Virtual.2002,Region_Virtual.Canada})=55,000
delta({Year_Virtual.2003,Region_Virtual.Canada})=56,000
delta({Year_Virtual.2004,Region_Virtual.Canada})=58,100
delta({Year_Virtual.2005,Region_Virtual.Canada})=60,600
delta({Year_Virtual.2001,Region_Virtual.Newfoundland and Labrador})=41,400
delta({Year_Virtual.2002,Region_Virtual.Newfoundland and Labrador})=43,200
delta({Year_Virtual.2003,Region_Virtual.Newfoundland and Labrador})=44,800
delta({Year_Virtual.2004,Region_Virtual.Newfoundland and Labrador})=46,100
delta({Year_Virtual.2005,Region_Virtual.Newfoundland and Labrador})=47,600
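The delta notation above can be mirrored in a small data structure. The following is a minimal Python sketch, not the TANGO tool's own representation: the names `categories`, `values`, `delta`, and `lookup` are illustrative assumptions, and only the two rows listed on this slide are included.

```python
# Categories C: each category name maps to its labels.
# Only two of the region labels are included here (illustrative subset).
categories = {
    "Region_Virtual": ["Canada", "Newfoundland and Labrador"],
    "Year_Virtual": [2001, 2002, 2003, 2004, 2005],
}

# Content cells for the two rows shown on the slide, in year order.
values = {
    "Canada": [53500, 55000, 56000, 58100, 60600],
    "Newfoundland and Labrador": [41400, 43200, 44800, 46100, 47600],
}

# delta: maps an unordered set of (category, label) pairs to one content cell.
delta = {
    frozenset({("Year_Virtual", year), ("Region_Virtual", region)}): income
    for region, row in values.items()
    for year, income in zip(categories["Year_Virtual"], row)
}


def lookup(**labels):
    """Return the content cell addressed by one label per category."""
    return delta[frozenset(labels.items())]


print(lookup(Year_Virtual=2003, Region_Virtual="Canada"))  # 56000
```

Using a frozenset as the key reflects that the argument of delta is a set: the order in which the category labels are given does not matter.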
XML Representation: Schema for (1) table (2) categories (3) data cells (4) augmentation
<InterpretedTable xsi:noNamespaceSchemaLocation="G:\RPI\XML\02_TableInterface.XS.070803.xml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Table TableOID="Table2" Number="2" DocumentCitation="Wang's Thesis" Title="Wang table" Caption="Grades in 1991 and 1992">
    <CategoryNodes>
      <CategoryNode CategoryNodeOID="C1" Label="Median Total Income"></CategoryNode>
      <CategoryNode CategoryNodeOID="C11" Label="Canada"></CategoryNode>
      <CategoryNode CategoryNodeOID="C12" Label="Newfoundland and Labrador"></CategoryNode>
      <CategoryNode CategoryNodeOID="C13" Label="Prince Edward Island"></CategoryNode>
      <CategoryNode CategoryNodeOID="C14" Label="Nova Scotia"></CategoryNode>
      <CategoryNode CategoryNodeOID="C15" Label="New Brunswick"></CategoryNode>
      <CategoryNode CategoryNodeOID="C16" Label="Quebec"></CategoryNode>
      <CategoryNode CategoryNodeOID="C17" Label="Ontario"></CategoryNode>
      <CategoryNode CategoryNodeOID="C18" Label="Manitoba"></CategoryNode>
      <CategoryNode CategoryNodeOID="C19" Label="Saskatchewan"></CategoryNode>
      <CategoryNode CategoryNodeOID="C110" Label="Alberta"></CategoryNode>
      <CategoryNode CategoryNodeOID="C111" Label="British Columbia"></CategoryNode>
      <CategoryNode CategoryNodeOID="C112" Label="Yukon Territory"></CategoryNode>
      <CategoryNode CategoryNodeOID="C113" Label="Northwest Territories"></CategoryNode>
      <CategoryNode CategoryNodeOID="C114" Label="Nunavut"></CategoryNode>
      <CategoryNode CategoryNodeOID="C2" Label="Year (Virtual)"></CategoryNode>
      <CategoryNode CategoryNodeOID="C21" Label="2001"></CategoryNode>
      <CategoryNode CategoryNodeOID="C22" Label="2002"></CategoryNode>
      <CategoryNode CategoryNodeOID="C23" Label="2003"></CategoryNode>
      <CategoryNode CategoryNodeOID="C24" Label="2004"></CategoryNode>
      <CategoryNode CategoryNodeOID="C25" Label="2005"></CategoryNode>
    </CategoryNodes>
  </Table>
  <CategoryParentNodes>
    <CategoryParentNode CategoryParentNodeOID="C1">
      <CategoryNodes>
……
XML file for this table has ~350 lines of Object Identifier tags
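As an illustration of how the category nodes in such a file might be read back, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The fragment is a shortened, closed-off copy of the XML above (the slide's file is truncated, so the closing tags are added here for the example). Grouping children under a parent by OID prefix is an assumption made only for this sketch; the actual file records parent/child links in its CategoryParentNodes section, which the slide does not show in full.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed fragment modeled on the slide's XML (assumption:
# closing tags added, most nodes and the schema attributes omitted).
FRAGMENT = """
<InterpretedTable>
  <Table TableOID="Table2" Number="2">
    <CategoryNodes>
      <CategoryNode CategoryNodeOID="C1" Label="Median Total Income"></CategoryNode>
      <CategoryNode CategoryNodeOID="C11" Label="Canada"></CategoryNode>
      <CategoryNode CategoryNodeOID="C12" Label="Newfoundland and Labrador"></CategoryNode>
      <CategoryNode CategoryNodeOID="C2" Label="Year (Virtual)"></CategoryNode>
      <CategoryNode CategoryNodeOID="C21" Label="2001"></CategoryNode>
      <CategoryNode CategoryNodeOID="C22" Label="2002"></CategoryNode>
    </CategoryNodes>
  </Table>
</InterpretedTable>
"""

root = ET.fromstring(FRAGMENT.strip())

# Map every object identifier to its label.
labels = {
    node.get("CategoryNodeOID"): node.get("Label")
    for node in root.iter("CategoryNode")
}

# Assumption for this sketch only: two-character OIDs ("C1", "C2") are root
# categories, and a longer OID belongs to the root sharing its first two characters.
tree = {}
for oid, label in labels.items():
    if len(oid) == 2:
        tree.setdefault(label, [])
    else:
        tree.setdefault(labels[oid[:2]], []).append(label)

print(tree)
```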
Verification tool: category headers for a selected content cell
Verification tool: hierarchical category structure for a selected content cell
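(b) Query by Table
[Diagram. Query table: Income, with empty rows for 2002, 2003, 2004, 2005. Processing: QBT, Interpret Query Table. Sources: Ontology from many tables; Database. Filled table: Income, 2002 $4500, 2003 $3300, 2004 $1240, 2005 $3400]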
Query Table composed in MS-Excel by a person seeking information from an ontology compiled from many web tables
Display of automatically processed Query Table for human verification
URLs of tables in the Example Database
• Median Total Income: http://www40.statcan.ca/l01/cst01/famil108a.htm?sdi=median%20income*
• Number of Induced Abortions: http://www40.statcan.ca/l01/cst01/health40a.htm?sdi=abortions
• Number of Divorces: http://www40.statcan.ca/l01/cst01/famil02.htm?sdi=number%20divorces
• Infant Mortality Rate: http://www40.statcan.ca/l01/cst01/health21a.htm?sdi=infant%20mortality%20rate*
• Trips By Canadians in Canada: http://www40.statcan.ca/l01/cst01/arts26a.htm
• Number of Homicides: http://www40.statcan.ca/l01/cst01/legal12a.htm?sdi=homicide
• Population: http://www40.statcan.ca/l01/cst01/demo02a.htm?sdi=population
• Number of Persons with Diabetes: http://www40.statcan.ca/l01/cst01/health54a.htm?sdi=diabetes
• Number of Persons with Asthma: http://www40.statcan.ca/l01/cst01/health50a.htm?sdi=asthma
• University Degrees Awarded to Males: http://www40.statcan.ca/l01/cst01/educ51b.htm
• University Degrees Awarded to Females: http://www40.statcan.ca/l01/cst01/educ51c.htm
• Food services and drinking places (13 tables): http://www40.statcan.ca/l01/cst01/serv24j
Fields in the Example Database
• IDENTIFIER
• REGION
• YEAR
• NUMBER_OF_ABORTIONS
• ABORTION_RATE
• NUMBER_OF_DIVORCES
• INFANT_MORTALITY_RATE
• NUMBER_OF_TRIPS
• MEDIAN_TOTAL_INCOME
• POPULATION
• NUMBER_OF_HOMICIDES
• GENDER
• INCIDENCE_OF_DIABETES
• UNIVERSITY_DEGREES_AWARDED
• INCIDENCE_OF_ASTHMA
• RESTAURANT_OPERATING_REVENUE
• RESTAURANT_OPERATING_EXPENSES
• RESTAURANT_OPERATING_PROFIT_MARGIN
• RESTAURANT_OPERATING_WAGES
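To make the role of these fields concrete, the sketch below shows how a query table asking for income by year could be filled from a relational table that uses some of them. It is only an illustration in Python with the standard `sqlite3` module: the table name `example`, the column types, and the function `fill_query_table` are assumptions, and the slides do not describe how QBT actually generates its queries. The inserted rows are the Canada and Newfoundland and Labrador values from the Median Income table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema using a subset of the field names listed above.
conn.execute(
    "CREATE TABLE example ("
    "IDENTIFIER INTEGER PRIMARY KEY, REGION TEXT, YEAR INTEGER, "
    "MEDIAN_TOTAL_INCOME INTEGER)"
)

incomes = {
    "Canada": [53500, 55000, 56000, 58100, 60600],
    "Newfoundland and Labrador": [41400, 43200, 44800, 46100, 47600],
}
rows = [
    (region, year, income)
    for region, series in incomes.items()
    for year, income in zip(range(2001, 2006), series)
]
conn.executemany(
    "INSERT INTO example (REGION, YEAR, MEDIAN_TOTAL_INCOME) VALUES (?, ?, ?)",
    rows,
)


def fill_query_table(region, years):
    """Return {year: income} for the years named in a query table."""
    placeholders = ",".join("?" * len(years))
    sql = (
        "SELECT YEAR, MEDIAN_TOTAL_INCOME FROM example "
        f"WHERE REGION = ? AND YEAR IN ({placeholders}) ORDER BY YEAR"
    )
    return dict(conn.execute(sql, [region, *years]).fetchall())


print(fill_query_table("Canada", [2002, 2003, 2004, 2005]))
# {2002: 55000, 2003: 56000, 2004: 58100, 2005: 60600}
```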
A current puzzle
How can QBT tell that these two query tables represent the same request? NB: Although plausible, both of these tables exemplify poor layout.
Query table 1:
Year   Region    Gender   Diabetics
2002   Alberta   Male     XX
                 Female   XX
       Ontario   Male     XX
                 Female   XX

Query table 2:
Year   Region    Diabetics
                 Male   Female
2002   Alberta   XX     XX
       Ontario   XX     XX
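One way to state what "the same request" could mean is that both layouts reduce to the same set of category-label combinations. The Python sketch below illustrates that idea only; it is not the authors' solution to the puzzle. It assumes a hypothetical lookup `LABEL_CATEGORY` that already knows Male and Female belong to a Gender category, which is exactly the knowledge the second layout leaves implicit. The functions `from_long` and `from_wide` are likewise illustrative names.

```python
# Hypothetical label-to-category knowledge, e.g. drawn from the ontology.
LABEL_CATEGORY = {"Male": "Gender", "Female": "Gender"}


def from_long(rows, measure):
    """Layout 1: one row per (Year, Region, Gender) with a single value column."""
    return {
        frozenset({("Year", year), ("Region", region),
                   ("Gender", gender), ("Measure", measure)})
        for year, region, gender in rows
    }


def from_wide(rows, measure, column_labels):
    """Layout 2: one row per (Year, Region); gender labels are sub-headers."""
    cells = set()
    for year, region in rows:
        for label in column_labels:
            cells.add(frozenset({
                ("Year", year), ("Region", region),
                (LABEL_CATEGORY[label], label), ("Measure", measure),
            }))
    return cells


table1 = from_long(
    [(2002, "Alberta", "Male"), (2002, "Alberta", "Female"),
     (2002, "Ontario", "Male"), (2002, "Ontario", "Female")],
    "Diabetics",
)
table2 = from_wide(
    [(2002, "Alberta"), (2002, "Ontario")], "Diabetics", ["Male", "Female"]
)

print(table1 == table2)  # True
```

The comparison succeeds only because the category of each sub-header is supplied by hand; recovering it automatically from the layout is the open part of the puzzle.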
Next steps
• Complete the conversion of Wang/XML table descriptions to mini ontologies
• Improve the interface for generating a cumulative ontology from mini ontologies
• Implement database generation from ontology
• Embed logging routines for statistical evaluation of time/error trade-offs