1
What Do You Want—Semantic Understanding?
(You’ve Got to be Kidding)
David W. Embley, Brigham Young University
Funded in part by the National Science Foundation
2
Presentation Outline
Grand Challenge
Meaning, Knowledge, Information, Data
Fun and Games with Data
Information Extraction Ontologies
Applications
Limitations and Pragmatics
Summary and Challenges
3
Grand Challenge
Semantic Understanding
Can we quantify & specify the nature of this grand challenge?
4
Grand Challenge
Semantic Understanding
“If ever there were a technology that could generate trillions of dollars in savings worldwide …, it would be the technology that makes business information systems interoperable.”
(Jeffrey T. Pollock, VP of Technology Strategy, Modulant Solutions)
5
Grand Challenge
Semantic Understanding
“The Semantic Web: … content that is meaningful to computers [and that] will unleash a revolution of new possibilities … Properly designed, the Semantic Web can assist the evolution of human knowledge …”
(Tim Berners-Lee, …, Weaving the Web)
6
Grand Challenge
Semantic Understanding
“20th Century: Data Processing”
“21st Century: Data Exchange”
“The issue now is mutual understanding.”
(Stefano Spaccapietra, Editor in Chief, Journal on Data Semantics)
7
Grand Challenge
Semantic Understanding
“The Grand Challenge [of semantic understanding] has become mission critical. Current solutions … won’t scale. Businesses need economic growth dependent on the web working and scaling (cost: $1 trillion/year).”
(Michael Brodie, Chief Scientist, Verizon Communications)
8
Why Semantic Understanding?
Because we’re overwhelmed with data
• Point and click too slow
• “Give me what I want when I want it.”
Because it’s the key to revolutionary progress
• Automated interoperability and knowledge sharing
• Automated negotiation in e-business
• Large-scale, in-silico experiments in e-science
We succeed in managing information if we can “[take] data and [analyze] it and [simplify] it and [tell] people exactly the information they want, rather than all the information they could have.” - Jim Gray, Microsoft Research
9
What is Semantic Understanding?
Understanding: “To grasp or comprehend [what’s] intended or expressed.”
Semantics: “The meaning or the interpretation of a word, sentence, or other language form.”
- Dictionary.com
10
Can We Achieve Semantic Understanding?
“A computer doesn’t truly ‘understand’ anything.”
But computers can manipulate terms “in ways that are useful and meaningful to the human user.”
- Tim Berners-Lee
Key Point: it only has to be good enough.
And that’s our challenge and our opportunity!
…
13
Foundational Definitions
Meaning: knowledge that is relevant or activates
Knowledge: information with a degree of certainty or community agreement
Information: data in a conceptual framework
Data: attribute-value pairs
- Adapted from [Meadow92]
14
Foundational Definitions
Meaning: knowledge that is relevant or activates
Knowledge: information with a degree of certainty or community agreement (ontology)
Information: data in a conceptual framework
Data: attribute-value pairs
- Adapted from [Meadow92]
17
Data
Attribute-Value Pairs
• Fundamental for information
• Thus, fundamental for knowledge & meaning
18
Data
Attribute-Value Pairs
• Fundamental for information
• Thus, fundamental for knowledge & meaning
Data Frame
• Extensive knowledge about a data item
  - Everyday data: currency, dates, time, weights & measures
  - Textual appearance, units, context, operators, I/O conversion
• Abstract data type with an extended framework
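One way to picture a data frame is as an abstract data type that bundles a recognizer, a unit, conversions, and operators. The sketch below is only an illustration under that reading: the class name `DataFrameSketch`, its fields, and the price pattern are assumptions made here, not the talk's actual implementation.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DataFrameSketch:
    """Hypothetical container for knowledge about one kind of data item."""
    name: str                             # e.g. "Price"
    pattern: str                          # textual appearance, as a regex
    unit: str                             # unit of the internal value
    to_internal: Callable[[str], float]   # input conversion
    to_text: Callable[[float], str]       # output conversion

    def recognize(self, text: str) -> Optional[float]:
        """Return the first recognized value in internal form, or None."""
        match = re.search(self.pattern, text)
        return self.to_internal(match.group(0)) if match else None

    def less_than(self, a: float, b: float) -> bool:
        """One example operator over internal values."""
        return a < b


# Assumed price appearance: a dollar sign followed by comma-grouped digits.
price = DataFrameSketch(
    name="Price",
    pattern=r"\$\d{1,3}(?:,\d{3})*",
    unit="USD",
    to_internal=lambda s: float(s.lstrip("$").replace(",", "")),
    to_text=lambda v: f"${v:,.0f}",
)
```

For example, `price.recognize("Asking only $11,995.")` yields the internal value `11995.0`, and `price.to_text` converts it back to its textual appearance.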
20
?
Olympus C-750 Ultra Zoom
Sensor Resolution: 4.2 megapixels
Optical Zoom: 10 x
Digital Zoom: 4 x
Installed Memory: 16 MB
Lens Aperture: F/8-2.8/3.7
Focal Length min: 6.3 mm
Focal Length max: 63.0 mm
24
Digital Camera
Olympus C-750 Ultra Zoom
Sensor Resolution: 4.2 megapixels
Optical Zoom: 10 x
Digital Zoom: 4 x
Installed Memory: 16 MB
Lens Aperture: F/8-2.8/3.7
Focal Length min: 6.3 mm
Focal Length max: 63.0 mm
25
?
Year      2002
Make      Ford
Model     Thunderbird
Mileage   5,500 miles
Features  Red, ABS, 6 CD changer, keyless entry
Price     $33,000
Phone     (916) 972-9117
29
Car Advertisement
Year      2002
Make      Ford
Model     Thunderbird
Mileage   5,500 miles
Features  Red, ABS, 6 CD changer, keyless entry
Price     $33,000
Phone     (916) 972-9117
30
?
Flight #   Class  From  Time/Date           To   Time/Date          Stops
Delta 16   Coach  JFK   6:05 pm  02 01 04   CDG  7:35 am  03 01 04  0
Delta 119  Coach  CDG   10:20 am 09 01 04   JFK  1:00 pm  09 01 04  0
32
Airline Itinerary
Flight #   Class  From  Time/Date           To   Time/Date          Stops
Delta 16   Coach  JFK   6:05 pm  02 01 04   CDG  7:35 am  03 01 04  0
Delta 119  Coach  CDG   10:20 am 09 01 04   JFK  1:00 pm  09 01 04  0
33
?
Monday, October 13, 2003
Group A      W  L  T  GF  GA  Pts.
USA          3  0  0  11   1  9
Sweden       2  1  0   5   3  6
North Korea  1  2  0   3   4  3
Nigeria      0  3  0   0  11  0
Group B      W  L  T  GF  GA  Pts.
Brazil       2  0  1   8   2  7
…
35
World Cup Soccer
Monday, October 13, 2003
Group A      W  L  T  GF  GA  Pts.
USA          3  0  0  11   1  9
Sweden       2  1  0   5   3  6
North Korea  1  2  0   3   4  3
Nigeria      0  3  0   0  11  0
Group B      W  L  T  GF  GA  Pts.
Brazil       2  0  1   8   2  7
…
36
?
Calories    250 cal
Distance    2.50 miles
Time        23.35 minutes
Incline     1.5 degrees
Speed       5.2 mph
Heart Rate  125 bpm
39
Treadmill Workout
Calories    250 cal
Distance    2.50 miles
Time        23.35 minutes
Incline     1.5 degrees
Speed       5.2 mph
Heart Rate  125 bpm
40
?
Place      Bonnie Lake
County     Duchesne
State      Utah
Type       Lake
Elevation  10,000 feet
USGS Quad  Mirror Lake
Latitude   40.711°N
Longitude  110.876°W
43
Maps
Place      Bonnie Lake
County     Duchesne
State      Utah
Type       Lake
Elevation  10,100 feet
USGS Quad  Mirror Lake
Latitude   40.711°N
Longitude  110.876°W
46
What is an Extraction Ontology?
Augmented Conceptual-Model Instance
• Object & relationship sets
• Constraints
• Data frame value recognizers
Robust Wrapper (Ontology-Based Wrapper)
• Extracts information
• Works even when site changes or when new sites come on-line
47
Extraction Ontology: Example
Car [-> object];
Car [0:1] has Year [1:*];
Car [0:1] has Make [1:*];
…
Car [0:*] has Feature [1:*];
PhoneNr [1:*] is for Car [0:1];
Year matches [4]
  constant {extract “\d{2}”; context “\b’[4-9]\d\b”; …}
  …
Mileage matches [8]
  keyword {“\bmiles\b”, “\bmi\b.”, …}
  …
…
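The participation constraints above can be read as limits on how many values of each object set one Car may have. The sketch below encodes only the declarations visible on the slide. The dictionary name `MAX_PER_CAR` and the `violations` helper are hypothetical, and treating `[0:1]` as "at most one per car" is this note's reading of the notation.

```python
# Maximum number of values one Car may have per object set (None = unbounded),
# read off "Car [0:1] has Year", "Car [0:*] has Feature", and so on.
# Object sets elided on the slide ("…") are deliberately not listed.
MAX_PER_CAR = {
    "Year": 1,
    "Make": 1,
    "PhoneNr": 1,
    "Feature": None,
}


def violations(record):
    """Return the object sets whose value count exceeds the declared maximum."""
    bad = []
    for name, values in record.items():
        limit = MAX_PER_CAR.get(name)  # unknown names are treated as unbounded
        if limit is not None and len(values) > limit:
            bad.append(name)
    return bad
```

A record with two Year values would be flagged, while any number of Feature values passes.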
48
Extraction Ontologies: An Example of Semantic Understanding
“Intelligent” Symbol Manipulation
Gives the “Illusion of Understanding”
Obtains Meaningful and Useful Results
50
A Variety of Applications
Information Extraction
High-Precision Classification
Schema Mapping
Semantic Web Creation
Agent Communication
Ontology Generation
52
'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles. Previous owner heart broken! Asking only $11,995. #1415. JERRY SEINER MIDVALE, 566-3800 or 566-3888
Constant/Keyword Recognition
Descriptor/String/Position(start/end)
Year|97|2|3
Make|CHEV|5|8
Make|CHEVY|5|9
Model|Cavalier|11|18
Feature|Red|21|23
Feature|5 spd|26|30
Mileage|7,000|38|42
KEYWORD(Mileage)|miles|44|48
Price|11,995|100|105
Mileage|11,995|100|105
PhoneNr|566-3800|136|143
PhoneNr|566-3888|148|155
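A table of this shape can be produced by running one regular expression per descriptor over the ad and recording 1-based, inclusive character positions. The sketch below does this for the opening of the ad only. The patterns are simplified stand-ins chosen for this illustration, not the recognizers of the actual extraction ontology.

```python
import re

# Opening of the ad from the slide; positions are counted from 1.
AD = "'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles."

# (descriptor, pattern) pairs; the patterns are assumptions for illustration.
RECOGNIZERS = [
    ("Year", r"(?<=')\d{2}\b"),        # two digits right after an apostrophe
    ("Make", r"CHEV"),
    ("Make", r"CHEVY"),
    ("Mileage", r"\d{1,3},\d{3}"),     # comma-grouped number
    ("KEYWORD(Mileage)", r"\bmiles\b"),
]


def recognize(text):
    """Return Descriptor|String|start|end rows for every recognizer match."""
    rows = []
    for descriptor, pattern in RECOGNIZERS:
        for m in re.finditer(pattern, text):
            # m.start() is 0-based and m.end() is exclusive, so shift the start.
            rows.append(f"{descriptor}|{m.group(0)}|{m.start() + 1}|{m.end()}")
    return rows
```

Note that both `CHEV` and `CHEVY` are reported, and that a bare number pattern cannot tell a mileage from a price. These are the conflicts the heuristics are meant to resolve.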
53
Heuristics
Keyword proximity
Subsumed and overlapping constants
Functional relationships
Nonfunctional relationships
First occurrence without constraint violation
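Two of these heuristics can be sketched directly over the Descriptor/String/Position rows. The code below drops a constant whose span lies inside a longer span of the same descriptor, and picks the Mileage candidate closest to the `miles` keyword. Measuring proximity as a character gap is an assumption of this sketch, and the function names are hypothetical.

```python
# Candidate rows as (descriptor, string, start, end), taken from the slide.
MAKES = [("Make", "CHEV", 5, 8), ("Make", "CHEVY", 5, 9)]
MILEAGES = [("Mileage", "7,000", 38, 42), ("Mileage", "11,995", 100, 105)]
KEYWORD_SPAN = (44, 48)  # position of KEYWORD(Mileage) "miles"


def drop_subsumed(cands):
    """Keep candidates not contained in a different span of the same descriptor."""
    kept = []
    for d, s, a, b in cands:
        subsumed = any(
            d2 == d and a2 <= a and b <= b2 and (a2, b2) != (a, b)
            for d2, _, a2, b2 in cands
        )
        if not subsumed:
            kept.append((d, s, a, b))
    return kept


def nearest_to_keyword(cands, keyword_span):
    """Pick the candidate with the smallest character gap to the keyword."""
    ks, ke = keyword_span

    def gap(cand):
        _, _, a, b = cand
        return max(a - ke, ks - b, 0)

    return min(cands, key=gap)
```

Applied to the rows above, `CHEV` is discarded in favor of `CHEVY`, and `7,000` wins over `11,995` as the mileage.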
54
Keyword Proximity
'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles on her. Previous owner heart broken! Asking only $11,995. #1415. JERRY SEINER MIDVALE, 566-3800 or 566-3888
55
Subsumed/Overlapping Constants
'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles. Previous owner heart broken! Asking only $11,995. #1415. JERRY SEINER MIDVALE, 566-3800 or 566-3888
56
Functional Relationships
'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles on her. Previous owner heart broken! Asking only $11,995. #1415. JERRY SEINER MIDVALE, 566-3800 or 566-3888
57
Nonfunctional Relationships
'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles on her. Previous owner heart broken! Asking only $11,995. #1415. JERRY SEINER MIDVALE, 566-3800 or 566-3888
58
First Occurrence without Constraint Violation
'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles on her. Previous owner heart broken! Asking only $11,995. #1415. JERRY SEINER MIDVALE, 566-3800 or 566-3888
59
Database-Instance Generator
insert into Car values(1001, “97”, “CHEVY”, “Cavalier”, “7,000”, “11,995”, “566-3800”)
insert into CarFeature values(1001, “Red”)
insert into CarFeature values(1001, “5 spd”)
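The generated statements can be tried against an in-memory SQLite database. In the sketch below the table layout and the column name `CarId` are assumptions, and the `store` helper is hypothetical. Bound parameters stand in for the quoted literals shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: one row per car, plus one row per (car, feature) pair.
conn.execute(
    "CREATE TABLE Car (CarId INTEGER PRIMARY KEY, Year TEXT, Make TEXT, "
    "Model TEXT, Mileage TEXT, Price TEXT, PhoneNr TEXT)"
)
conn.execute("CREATE TABLE CarFeature (CarId INTEGER, Feature TEXT)")


def store(conn, car_id, record, features):
    """Insert one extracted car and its features using bound parameters."""
    conn.execute(
        "INSERT INTO Car VALUES (?, ?, ?, ?, ?, ?, ?)",
        (car_id, record["Year"], record["Make"], record["Model"],
         record["Mileage"], record["Price"], record["PhoneNr"]),
    )
    conn.executemany(
        "INSERT INTO CarFeature VALUES (?, ?)",
        [(car_id, feature) for feature in features],
    )


store(
    conn,
    1001,
    {"Year": "97", "Make": "CHEVY", "Model": "Cavalier",
     "Mileage": "7,000", "Price": "11,995", "PhoneNr": "566-3800"},
    ["Red", "5 spd"],
)
```

Keeping features in their own table mirrors the slide's split between functional values (one per car) and nonfunctional ones (many per car).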
63
Expected Values Heuristic
Document 1: Car Ads
Year: 3, Make: 2, Model: 3, Mileage: 1, Price: 1, Feature: 15, PhoneNr: 3
Document 2: Items for Sale or Rent
Year: 1, Make: 0, Model: 0, Mileage: 1, Price: 0, Feature: 0, PhoneNr: 4
64
Vector Space of Expected Values
         OV    D1  D2
Year     0.98  16   6
Make     0.93  10   0
Model    0.91  12   0
Mileage  0.45   6   2
Price    0.80  11   8
Feature  2.10  29   0
PhoneNr  1.15  15  11
D1: 0.996
D2: 0.567
[Figure: vectors ov, D1, and D2]
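The two scores on this slide are consistent with the cosine of the angle between the ontology vector and each document vector. That reading is inferred here from the numbers; the slide does not name the measure. A small sketch to check it:

```python
import math

# Order: Year, Make, Model, Mileage, Price, Feature, PhoneNr (from the slide).
OV = [0.98, 0.93, 0.91, 0.45, 0.80, 2.10, 1.15]
D1 = [16, 10, 12, 6, 11, 29, 15]
D2 = [6, 0, 0, 2, 8, 0, 11]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Rounded to three places, `cosine(OV, D1)` and `cosine(OV, D2)` give 0.996 and 0.567, matching the slide.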
65
Grouping Heuristic
Document 1: Car Ads
Year, Make, Model, Price, Year, Model, Year, Make, Model, Mileage, …
Document 2: Items for Sale or Rent
Year, Mileage, …, Mileage, Year, Price, Price, …
66
Grouping
Car Ads
Year, Year, Make, Model: 3
Price, Year, Model, Year: 3
Make, Model, Mileage, Year: 4
Model, Mileage, Price, Year: 4
…
Grouping: 0.875
Sale Items
Year, Year, Year, Mileage: 2
Mileage, Year, Price, Price: 3
Year, Price, Price, Year: 2
Price, Price, Price, Price: 1
…
Grouping: 0.500
$$\text{Expected Number in Group} = \Big\lfloor \sum_{\text{1-Max}} \text{Ave} \Big\rfloor = 4 \quad \text{(for our example)}$$
$$\text{Grouping} = \frac{\text{Sum of Distinct 1-Max Object Sets in each Group}}{\text{Number of Groups} \times \text{Expected Number in a Group}}$$
$$\text{Car Ads: } \frac{3+3+4+4}{4 \times 4} = 0.875 \qquad \text{Sale Items: } \frac{2+3+2+1}{4 \times 4} = 0.500$$
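The grouping arithmetic can be reproduced in a few lines. The sketch assumes that the 1-Max object sets are Year, Make, Model, Mileage, and Price, and that the averages are the OV values from the vector-space slide. Both are this note's reading, supported by the fact that those five averages sum to 4.07, whose floor is the slide's 4.

```python
import math

# Assumed 1-Max object sets and their average expected values (OV column).
AVERAGES = {"Year": 0.98, "Make": 0.93, "Model": 0.91, "Mileage": 0.45, "Price": 0.80}
EXPECTED = math.floor(sum(AVERAGES.values()))  # floor(4.07)

CAR_ADS = ["Year", "Year", "Make", "Model", "Price", "Year", "Model", "Year",
           "Make", "Model", "Mileage", "Year", "Model", "Mileage", "Price", "Year"]
SALE_ITEMS = ["Year", "Year", "Year", "Mileage", "Mileage", "Year", "Price", "Price",
              "Year", "Price", "Price", "Year", "Price", "Price", "Price", "Price"]


def grouping_score(sequence, expected):
    """Distinct 1-Max object sets per group, normalized by the best possible total."""
    groups = [sequence[i:i + expected] for i in range(0, len(sequence), expected)]
    distinct = sum(len(set(group) & set(AVERAGES)) for group in groups)
    return distinct / (len(groups) * expected)
```

With these inputs the car ads score 0.875 and the sale items 0.5, as on the slide.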
68
Problem: Different Schemas
Target Database Schema
{Car, Year, Make, Model, Mileage, Price, PhoneNr}, {PhoneNr, Extension}, {Car, Feature}
Different Source Table Schemas
• {Run #, Yr, Make, Model, Tran, Color, Dr}
• {Make, Model, Year, Colour, Price, Auto, Air Cond., AM/FM, CD}
• {Vehicle, Distance, Price, Mileage}
• {Year, Make, Model, Trim, Invoice/Retail, Engine, Fuel Economy}
69
Solution: Remove Internal Factoring
Discover Nesting: Make, (Model, (Year, Colour, Price, Auto, Air Cond, AM/FM, CD)*)*
Unnest: $\mu_{(\text{Model, Year, Colour, Price, Auto, Air Cond, AM/FM, CD})^*}\;\mu_{(\text{Year, Colour, Price, Auto, Air Cond, AM/FM, CD})^*}\;\text{Table}$
[Figure: source table (ACURA rows) with legend]
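Removing internal factoring amounts to flattening the discovered nesting so that every car row carries its own Make and Model. The sketch below shows the idea on a nested dictionary. The function name `unnest` and the dictionary layout are assumptions for illustration, and the sample values come from the Honda Civic example used later in the deck.

```python
# Nested form: Make -> Model -> list of per-car rows.
NESTED = {
    "Honda": {
        "Civic EX": [
            {"Year": "1995", "Colour": "White", "Price": "$6300"},
        ],
    },
}


def unnest(nested):
    """Flatten Make -> Model -> rows into one flat record per car."""
    rows = []
    for make, models in nested.items():
        for model, cars in models.items():
            for car in cars:
                rows.append({"Make": make, "Model": model, **car})
    return rows
```

Each resulting record is a complete set of attribute-value pairs for one car.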
70
Solution: Replace Boolean Values
$\beta^{\text{Yes},}_{\text{Auto}}\;\beta^{\text{Yes},}_{\text{Air Cond.}}\;\beta^{\text{Yes},}_{\text{AM/FM}}\;\beta^{\text{Yes},}_{\text{CD}}\;\text{Table}$
[Figure: source table (ACURA rows) with legend; columns Auto, Air Cond., AM/FM, and CD]
71
Solution: Form Attribute-Value Pairs
<Make, Honda>, <Model, Civic EX>, <Year, 1995>, <Colour, White>, <Price, $6300>, <Auto, Auto>, <Air Cond., Air Cond.>, <AM/FM, AM/FM>, <CD, >
72
Solution: Adjust Attribute-Value Pairs
<Make, Honda>, <Model, Civic EX>, <Year, 1995>, <Colour, White>, <Price, $6300>, <Auto>, <Air Cond>, <AM/FM>
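The boolean replacement and the pair adjustment can be sketched together: a `Yes` in a boolean column becomes a feature named after the column, and an empty cell contributes nothing. The column list and the function name `to_pairs` are assumptions of this sketch.

```python
# Columns whose cells hold Yes/empty rather than a value (assumed from the slides).
BOOLEAN_COLUMNS = {"Auto", "Air Cond.", "AM/FM", "CD"}

ROW = {
    "Make": "Honda", "Model": "Civic EX", "Year": "1995", "Colour": "White",
    "Price": "$6300", "Auto": "Yes", "Air Cond.": "Yes", "AM/FM": "Yes", "CD": "",
}


def to_pairs(row):
    """Turn one table row into attribute-value pairs and bare feature names."""
    pairs = []
    for attribute, value in row.items():
        if attribute in BOOLEAN_COLUMNS:
            if value == "Yes":
                pairs.append((attribute,))      # the column name is the feature
        else:
            pairs.append((attribute, value))
    return pairs
```

For the row above this yields the five ordinary pairs followed by the bare features Auto, Air Cond., and AM/FM, with CD omitted.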
73
Solution: Do Extraction
74
Solution: Infer Mappings
{Car, Year, Make, Model, Mileage, Price, PhoneNr}, {PhoneNr, Extension}, {Car, Feature}
Each row is a car.
$\pi_{\text{Model}}\;\mu_{(\text{Year, Colour, Price, Auto, Air Cond, AM/FM, CD})^*}\;\text{Table}$
$\pi_{\text{Make}}\;\mu_{(\text{Model, Year, Colour, Price, Auto, Air Cond, AM/FM, CD})^*}\;\mu_{(\text{Year, Colour, Price, Auto, Air Cond, AM/FM, CD})^*}\;\text{Table}$
$\pi_{\text{Year}}\;\text{Table}$
Note: Mappings produce sets for attributes. Joining to form records is trivial because we have OIDs for table rows (e.g. for each Car).
75
Solution: Infer Mappings
{Car, Year, Make, Model, Mileage, Price, PhoneNr}, {PhoneNr, Extension}, {Car, Feature}
$\pi_{\text{Model}}\;\mu_{(\text{Year, Colour, Price, Auto, Air Cond, AM/FM, CD})^*}\;\text{Table}$
76
Solution: Do Extraction
{Car, Year, Make, Model, Mileage, Price, PhoneNr}, {PhoneNr, Extension}, {Car, Feature}
$\pi_{\text{Price}}\;\text{Table}$
77
Solution: Do Extraction
{Car, Year, Make, Model, Mileage, Price, PhoneNr}, {PhoneNr, Extension}, {Car, Feature}
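$\rho_{\text{Colour} \leftarrow \text{Feature}}\,\pi_{\text{Colour}}\,\text{Table} \;\cup\; \rho_{\text{Auto} \leftarrow \text{Feature}}\,\pi_{\text{Auto}}\,\beta^{\text{Yes},}_{\text{Auto}}\,\text{Table} \;\cup\; \rho_{\text{Air Cond.} \leftarrow \text{Feature}}\,\pi_{\text{Air Cond.}}\,\beta^{\text{Yes},}_{\text{Air Cond.}}\,\text{Table} \;\cup\; \rho_{\text{AM/FM} \leftarrow \text{Feature}}\,\pi_{\text{AM/FM}}\,\beta^{\text{Yes},}_{\text{AM/FM}}\,\text{Table} \;\cup\; \rho_{\text{CD} \leftarrow \text{Feature}}\,\pi_{\text{CD}}\,\beta^{\text{Yes},}_{\text{CD}}\,\text{Table}$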
79
The Semantic Web
Make web content accessible to machines
What prevents this from working?
• Lack of content
• Lack of tools to create useful content
• Difficulty of converting the web to the Semantic Web
83
The Problem
Agents must:
1- share ontologies,
2- speak the same language,
3- pre-agree on message format.
Requiring these assumptions precludes agents from interoperating on the fly.
“The holy grail of semantic integration in architectures” is to “allow two agents to generate needed mappings between them on the fly without a priori agreement and without them having built-in knowledge of any common ontology.” [Uschold 02]
84
Solution
Agents must:
1- share ontologies,
2- speak the same language,
3- pre-agree on message format.
• Eliminate all assumptions
• This requires:
- Dynamically capturing a message’s semantics
- Matching a message with a service
- Translating (developing mutual understanding)
85
MatchMaking System (MMS)
[Figure: Agent 1 and Agent 2, each with an MMS made up of Message Handling, Message-Service Matching, Translation, and Response Handling, and each with a translation repository and a services repository. Labeled flows: Messages (Request, Response), Service call, The matched service, Response to the message. Example request: Info = FindBestBuy (“Notebook PC”)]
87
TANGO: Table Analysis for Generating Ontologies
Recognize and normalize table information
Construct mini-ontologies from tables
Discover inter-ontology mappings
Merge mini-ontologies into a growing ontology
88
Recognize Table Information
                                Religion
             Population         Albanian           Roman     Shi’a   Sunni
Country      (July 2001 est.)   Orthodox  Muslim   Catholic  Muslim  Muslim  other
Afghanistan  26,813,057                                      15%     84%     1%
Albania      3,510,484          20%       70%      30%
89
Construct Mini-Ontology
                                Religion
             Population         Albanian           Roman     Shi’a   Sunni
Country      (July 2001 est.)   Orthodox  Muslim   Catholic  Muslim  Muslim  other
Afghanistan  26,813,057                                      15%     84%     1%
Albania      3,510,484          20%       70%      30%
93
Limitations and Pragmatics
Data-Rich, Narrow Domain
Ambiguities ~ Context Assumptions
Incompleteness ~ Implicit Information
Common Sense Requirements
Knowledge Prerequisites
…
94
Busiest Airport in 2003?
Chicago - 928,735 Landings (Nat. Air Traffic Controllers Assoc.)
        - 931,000 Landings (Federal Aviation Admin.)
Atlanta - 58,875,694 Passengers (Sep., latest numbers available)
Memphis - 2,494,190 Metric Tons (Airports Council Int’l.)
97
Busiest Airport in 2003?
Chicago - 928,735 Landings (Nat. Air Traffic Controllers Assoc.)
        - 931,000 Landings (Federal Aviation Admin.)
Atlanta - 58,875,694 Passengers (Sep., latest numbers available)
Memphis - 2,494,190 Metric Tons (Airports Council Int’l.)
Ambiguous
Whom do we trust? (How do they count?)
98
Busiest Airport in 2003?
Chicago - 928,735 Landings (Nat. Air Traffic Controllers Assoc.)
        - 931,000 Landings (Federal Aviation Admin.)
Atlanta - 58,875,694 Passengers (Sep., latest numbers available)
Memphis - 2,494,190 Metric Tons (Airports Council Int’l.)
Important qualification
99
Dow Jones Industrial Average
            High      Low       Last      Chg
30 Indus    10527.03  10321.35  10409.85  +85.18
20 Transp    3038.15   2998.60   3008.16   +9.83
15 Utils      268.78    264.72    266.45   +1.72
66 Stocks    3022.31   2972.94   2993.12  +19.65
44.07
10,409.85
Graphics, Icons, …
100
Dow Jones Industrial Average
            High      Low       Last      Chg
30 Indus    10527.03  10321.35  10409.85  +85.18
20 Transp    3038.15   2998.60   3008.16   +9.83
15 Utils      268.78    264.72    266.45   +1.72
66 Stocks    3022.31   2972.94   2993.12  +19.65
44.07
10,409.85
Reported on same date
Weekly
Daily
Implicit information: weekly stated in upper corner of page; daily not stated.
102
Some Key Ideas
Data, Information, and Knowledge
Data Frames
• Knowledge about everyday data items
• Recognizers for data in context
Ontologies
• Resilient Extraction Ontologies
• Shared Conceptualizations
Limitations and Pragmatics
103
Some Research Issues
Building a library of open source data recognizers
Creating corpora of test data for extraction, integration, table understanding, …
Precisely finding and gathering relevant information
• Subparts of larger data
• Scattered data (linked, factored, implied)
• Data behind forms in the hidden web
Improving concept matching
• Indirect matching
• Calculations and unit conversions
…