MASSACHUSETTS INSTITUTE OF TECHNOLOGY
SLOAN SCHOOL OF MANAGEMENT
INFORMATION TECHNOLOGIES GROUP
SEMANTIC INTEGRATION (COIN PROJECT)
For Dr. Bob Popp, DARPA
8 April 2003
Stuart Madnick, Michael Siegel, Richard Wang
2
COntext INterchange (COIN) Project
[Diagram: Sources (databases, web pages) and Receivers (applications, browsers) connected through input processing, context mediation, and output processing, with trusted agents]
INPUT PROCESSING
- Automatic web wrapping
- Semi-structured text
- Multi-source query plan and execution
CONTEXT MEDIATION
- Automatic conflict detection and conversion
- Derived data
- Source selection
- Source attribution
OUTPUT PROCESSING
- ODBC Driver
- Web Publishing
APPLICATIONS: Financial services, electronic commerce, asset visibility, in-transit visibility.
3
Background on DARPA Support for Context Mediation Research
• Initial efforts funded as part of DARPA Intelligent Integration of Information (I3) Program
• Period: July 1993 - Sept 1998
• Started under: Gio Wiederhold
• then under: Dave Gunning & Bob Neches
Other related activity:
• MIT Total Data Quality Management (TDQM)
• Since 1991 (web.mit.edu/tdqm)
4
Multiple Perspectives . . . old lady or young lady ?
5
Role Of Context
Data: Databases, Web data, E-mail
CONTEXT VARIATIONS:
- GEOGRAPHIC (US vs. UK)
- FUNCTIONAL (CASH MGMT vs. LOANS)
- ORGANIZATIONAL (CITIBANK vs. CHASE)
[Figure: data viewed through three contexts; the currency is ambiguous (? $ £ ¥), and dates appear as 01-02-03, 03-02-01, and 02-01-03]
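One way to read the three date strings on this slide is as a single calendar date written under three conventions. The Python sketch below illustrates that reading; it is an assumption, since the slide does not say which convention each string uses, and the layout labels and function names are illustrative.

```python
from datetime import date, datetime

# Three common two-digit date layouts (labels are illustrative, not from the slide).
FORMATS = {
    "day-month-year": "%d-%m-%y",
    "year-month-day": "%y-%m-%d",
    "month-day-year": "%m-%d-%y",
}

def render(d: date) -> dict:
    """Write one calendar date in each layout."""
    return {name: d.strftime(fmt) for name, fmt in FORMATS.items()}

def interpret(text: str) -> dict:
    """Read one string under each layout; the resulting dates can differ."""
    return {name: datetime.strptime(text, fmt).date() for name, fmt in FORMATS.items()}

# 1 February 2003 yields exactly the three strings shown on the slide.
rendered = render(date(2003, 2, 1))

# Conversely, the single string "01-02-03" names three different days.
readings = interpret("01-02-03")
```

Without knowing the source's context, a receiver cannot tell which of the three readings is intended.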
6
Example: Context Differences (from multiple web sources)
Daimler Benz (DAI) Financial Data: P/E Ratio
Source        P/E Ratio
ABC           11.6
Bloomberg     5.57
DBC           19.19
MarketGuide   7.46
7
Complementary Aggregation Example
• Q: How did CO2 emissions (total, per GDP, per capita) change over time (between 1990 and 2000) in Yugoslavia?
  – User 1: YUG as a geographic region bounded before the breakup
  – User 2: YUG as a legal autonomous state
Related effort: Laboratory for Information Globalization and Harmonization Technologies (LIGHT)
8
GDP and population (GDP in billions local currency; Population in millions)
Sources: World Bank’s World Dev. Indicator DB; UN Statistics Division; Statistics Bureaus
Country   GDP 1990   Pop 1990   GDP 2000   Pop 2000
YUG       698.3      23.7       1627.8     10.6
BIH                             13.6       3.9
HRV                             266.9      4.5
MKD                             608.7      2.0
SVN                             7162       2.0

Currencies
Country                  Code   Currency         CurCode
Yugoslavia               YUG    New Yug. Dinar   YUN
Bosnia and Herzegovina   BIH    Marka            BAM
Croatia                  HRV    Kuna             HRK
Macedonia                MKD    Denar            MKD
Slovenia                 SVN    Tolar            SIT

Exchange rates
Source: Olsen (Web)
From   To    1990   2000
USD    YUG   10.5   67.267
USD    BIH          2.086
USD    HRV          8.089
USD    MKD          64.757
USD    SVN          225.93

CO2 Emission (in 1000 tons per year)
Sources: Oak Ridge’s CDIAC DB; WRI; GSSD; EPAs
Country   1990    2000
YUG       35604   15480
BIH               1279
HRV               5405
MKD               3378
SVN               3981

Results (Total CO2 in 1000 tons per year; GDP in billions USD; CO2/Capita in tons per person; CO2/GDP in tons per million USD; GDP/Capita in USD per person)
              User 1           User 2
              1990    2000     1990    2000
CO2           35604   29523    35604   15480
GDP           66.5    104.8    66.5    24.2
CO2/capita    1.5     1.28     1.5     1.46
CO2/GDP       535     282      535     640
GDP/Capita    2800    4560     2800    1100

Many sources needed: Meanings in sources & users might differ
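The year-2000 CO2 and GDP results for the two users can be reproduced from the source figures above. The Python sketch below is illustrative only: the variable and function names are assumptions and not part of COIN. It converts each GDP figure to USD using the listed exchange rate, then aggregates over the set of countries that each user means by "Yugoslavia".

```python
# Year-2000 figures as listed on the slide.
GDP_LOCAL = {"YUG": 1627.8, "BIH": 13.6, "HRV": 266.9, "MKD": 608.7, "SVN": 7162.0}  # billions, local currency
POP = {"YUG": 10.6, "BIH": 3.9, "HRV": 4.5, "MKD": 2.0, "SVN": 2.0}                   # millions
LOCAL_PER_USD = {"YUG": 67.267, "BIH": 2.086, "HRV": 8.089, "MKD": 64.757, "SVN": 225.93}
CO2 = {"YUG": 15480, "BIH": 1279, "HRV": 5405, "MKD": 3378, "SVN": 3981}             # 1000 tons per year

def aggregate(countries):
    """Sum the indicators over a set of countries, with GDP converted to USD."""
    co2 = sum(CO2[c] for c in countries)
    gdp_usd = sum(GDP_LOCAL[c] / LOCAL_PER_USD[c] for c in countries)  # billions USD
    pop = sum(POP[c] for c in countries)
    return {
        "co2": co2,                          # 1000 tons
        "gdp_usd": gdp_usd,                  # billions USD
        "co2_per_capita": co2 / pop / 1000,  # tons per person
        "co2_per_gdp": co2 / gdp_usd,        # tons per million USD
    }

# User 1: the geographic region as bounded before the breakup.
user1 = aggregate(["YUG", "BIH", "HRV", "MKD", "SVN"])
# User 2: the legal state of Yugoslavia only.
user2 = aggregate(["YUG"])
```

The same source data therefore give a CO2/GDP ratio of about 282 for User 1 and about 640 for User 2; the difference comes only from what each user means by "Yugoslavia".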
9
The 1999 Overture
Unit-of-measure mixup tied to loss of $125 Million Mars Orbiter
“NASA’s Mars Climate Orbiter was lost because engineers did not make a simple conversion from English units to metric, an embarrassing lapse that sent the $125 million craft off course. . . .
. . . The navigators ( JPL ) assumed metric units of force per second, or newtons. In fact, the numbers were in pounds of force per second as supplied by Lockheed Martin ( the contractor ).”
Source: Kathy Sawyer, Boston Globe, October 1, 1999, page 1.
10
The Context Interchange Approach
[Diagram components: Source, Receiver, Context Mediator, Source Context, Receiver Context, Shared Ontologies, Conversion Libraries, Context Transformation, Context Management Application]
Example:
- Concept: Length
- Contexts: Meters and Feet, related by a conversion function f() between meters and feet
- Query: Select partlength From catalog Where partno=“12AY”
- Value shown for part length: 17
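The length example can be sketched in a few lines of Python. This is a minimal illustration and not the COIN implementation. The class, table, and function names are hypothetical, and treating the value 17 as meters in the source context is an assumption, since the slide does not say which side the value belongs to.

```python
from dataclasses import dataclass

METERS_PER_FOOT = 0.3048  # exact, by definition of the international foot

# Conversion library: (source unit, receiver unit) -> conversion function.
CONVERSIONS = {
    ("meters", "feet"): lambda v: v / METERS_PER_FOOT,
    ("feet", "meters"): lambda v: v * METERS_PER_FOOT,
}

@dataclass(frozen=True)
class Context:
    """How one party represents the shared concept Length."""
    length_unit: str

def mediate_length(value: float, source: Context, receiver: Context) -> float:
    """Return the value as the receiver expects it, converting only on a conflict."""
    if source.length_unit == receiver.length_unit:
        return value
    convert = CONVERSIONS[(source.length_unit, receiver.length_unit)]
    return convert(value)

source_ctx = Context(length_unit="meters")   # assumed source context
receiver_ctx = Context(length_unit="feet")   # assumed receiver context
part_length_ft = mediate_length(17, source_ctx, receiver_ctx)  # roughly 55.77
```

Neither the source nor the receiver changes its representation; the difference is declared once in the contexts and resolved by the mediator.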
11
COIN Elevation Axioms (Ontology)
12
Another Context Example
[Diagram: Users & Appl. Systems, Context Mediation Services, and Wrapper Services over the sources WorldScope, Disclosure, Datastream, and the OANDA Web Server]
Source       Company Name        Net Income     Sales
WorldScope   DAIMLER-BENZ AG     346,577        56,268,168
Disclosure   DAIMLER BENZ CORP   615,000,000    97,737,000,000
Datastream   DAIMLER-BENZ        614,995        97,736,992
O&A DEM-USD Exchange Rate: 1.00 German Mark = 0.58 US Dollar as of 12/31/93
13
Some Context Differences
Context Definitions   Disclosure                       Worldscope                       DataStream
Currency Used         Country of Incorporation         USD                              Country of Incorporation
Currency Conversion   Money Amount As_Of_Date          Money Amount As_Of_Date          Money Amount As_Of_Date
Currency Symbols      3 Letters                        3 Letters                        2 Letters
Scale Factor          1                                1000                             1000
Company Names         Disclosure Names                 Worldscope Names                 DataStream Names
Date Style            American with ‘/’ as separator   American with ‘/’ as separator   European with ‘-’ as separator
Olsen (OANDA) Web Source uses 3 Letter Currency Symbols and European Date Style with ‘/’ as a separator
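Once context definitions like these are written down as data, detecting conflicts between a source and a receiver reduces to comparing them entry by entry. The Python sketch below shows the idea. The dictionary layout and the function name are assumptions made for illustration, not COIN's actual representation.

```python
# Context definitions transcribed from the slide (keys are illustrative names).
CONTEXTS = {
    "Disclosure": {
        "currency_used": "country of incorporation",
        "currency_symbols": "3 letters",
        "scale_factor": 1,
        "company_names": "Disclosure names",
        "date_style": "American, '/' separator",
    },
    "Worldscope": {
        "currency_used": "USD",
        "currency_symbols": "3 letters",
        "scale_factor": 1000,
        "company_names": "Worldscope names",
        "date_style": "American, '/' separator",
    },
    "DataStream": {
        "currency_used": "country of incorporation",
        "currency_symbols": "2 letters",
        "scale_factor": 1000,
        "company_names": "DataStream names",
        "date_style": "European, '-' separator",
    },
}

def detect_conflicts(source: str, receiver: str) -> dict:
    """Return every context entry on which the two contexts disagree."""
    src, rcv = CONTEXTS[source], CONTEXTS[receiver]
    return {key: (src[key], rcv[key]) for key in src if src[key] != rcv[key]}

conflicts = detect_conflicts("Disclosure", "DataStream")
```

Each key in the result marks a place where a conversion is needed before Disclosure data can be read in the DataStream context.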
14
Domain Model
[Diagram labels: number, string, exchangeRate, currencyType, companyFinancials, date, countryName, companyName, fromCur, toCur, txnDate, scaleFactor, currency, curTypeSym, fyEnding, company, countryIncorp, format, dateFmt, officialCurrency. Legend: Inheritance, Attribute, Modifier]
Some currency context possibilities:
• Currency is stated explicitly as part of record
• Currency not stated, but the same for all (e.g., US $)
• Currency not stated or constant, but inferred by country
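The three currency possibilities listed above can be read as a resolution order. The Python sketch below is one hedged way to express that order. The field names, the function name, and the small country table are assumptions made for illustration.

```python
from typing import Optional

# Assumed lookup table for illustration: official currency by country (1990s codes).
OFFICIAL_CURRENCY = {"Germany": "DEM", "United States": "USD"}

def resolve_currency(record: dict, context_default: Optional[str] = None) -> str:
    """Determine the currency of a money amount from the record and its context."""
    # Case 1: the currency is stated explicitly as part of the record.
    if "currency" in record:
        return record["currency"]
    # Case 2: not stated, but the context fixes one currency for all records.
    if context_default is not None:
        return context_default
    # Case 3: not stated and not constant, so infer it from the country.
    country = record.get("country_of_incorporation")
    if country in OFFICIAL_CURRENCY:
        return OFFICIAL_CURRENCY[country]
    raise ValueError("currency cannot be determined from record or context")

explicit = resolve_currency({"currency": "DEM", "amount": 615})
constant = resolve_currency({"amount": 615}, context_default="USD")
inferred = resolve_currency({"amount": 615, "country_of_incorporation": "Germany"})
```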
15
COIN System Architecture
[Diagram components, grouped by the three column headings]
SERVER PROCESSES: Web-site, Wrapper, WWW Gateway
MEDIATOR PROCESSES: COIN Repository, SQL Compiler, Context Mediator, Optimizer, Executioner, Data Store for Intermediate Results
CLIENT PROCESSES: ODBC-compliant Apps (e.g. Microsoft Excel) with ODBC-Driver, Web Client (cgi-scripts)
HTTPD-Daemon appears at several points in the diagram.
Data flow labels: SQL Query, Datalog Query, Mediated Query, Optimized Query Plan, Results
16
System Demonstration
Q6. Scenario: Using Context Interchange, the financial analyst can look at the Disclosure data using Datastream Context.
Query: Find out from Disclosure what Net Income for DAIMLER-BENZ was. Use Datastream Context.
Capabilities Demonstrated: Ability to perform Scale Factor Conversion, Date Format Conversion, Company Name Conversion.
Single Source Queries with Mediation
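The three conversions named in this scenario can be illustrated with a small Python sketch. It is not the COIN mediator: the dictionary layout, the name table, and the function name are hypothetical, and the date on the sample row is an assumed placeholder. The scale factors, date styles, and company names follow the earlier slides.

```python
from datetime import datetime

# Contexts per the "Some Context Differences" slide.
DISCLOSURE_CTX = {"scale_factor": 1, "date_format": "%m/%d/%y"}     # American, '/' separator
DATASTREAM_CTX = {"scale_factor": 1000, "date_format": "%d-%m-%y"}  # European, '-' separator

# Company name mapping between the two sources (names from the earlier example).
DISCLOSURE_TO_DATASTREAM_NAMES = {"DAIMLER BENZ CORP": "DAIMLER-BENZ"}

def mediate(record: dict, src: dict, rcv: dict, names: dict) -> dict:
    """Re-express a source record in the receiver's context."""
    # Scale factor conversion: bring the amount to base units, then to the receiver's scale.
    amount = record["net_income"] * src["scale_factor"] / rcv["scale_factor"]
    # Date format conversion: parse in the source style, print in the receiver style.
    when = datetime.strptime(record["date"], src["date_format"])
    return {
        "company_name": names.get(record["company_name"], record["company_name"]),
        "net_income": amount,
        "date": when.strftime(rcv["date_format"]),
    }

# Sample Disclosure row; the date is an assumed placeholder, not from the slides.
disclosure_row = {"company_name": "DAIMLER BENZ CORP", "net_income": 615_000_000, "date": "12/31/93"}
result = mediate(disclosure_row, DISCLOSURE_CTX, DATASTREAM_CTX, DISCLOSURE_TO_DATASTREAM_NAMES)
```

Here `result` holds the Disclosure figure expressed the way a Datastream user expects: the amount in thousands, the date in European style, and the Datastream company name.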
17
Demonstration – context2.mit.edu
Context
Source
18
Conflict Detection and Mediation
- Date convert
- Scale factor convert
- Name convert
Mediated Query in Datalog
19
Mediated SQL Query & Result
Adjust scale factor
Date format conversion
Name conversion
Final results – from Disclosure but in Datastream context
Mediated SQL Query
20
The 1805 Overture
In 1805, the Austrian and Russian Emperors agreed to join forces against Napoleon. The Russians promised that their forces would be in the field in Bavaria by Oct. 20.
The Austrian staff planned its campaign based on that date in the Gregorian calendar. Russia, however, still used the ancient Julian calendar, which in 1805 lagged 12 days behind.
The calendar difference allowed Napoleon to surround Austrian General Mack's army at Ulm and force its surrender on Oct. 21, well before the Russian forces could reach him, ultimately setting the stage for Austerlitz.
Source: David Chandler, The Campaigns of Napoleon, New York: MacMillan 1966, pg. 390.
21
Summary
• Tremendous opportunity to gather and integrate information from many diverse sources
• But … need to overcome many context challenges
• Context-type “metadata” plays a critical role
• COIN technology can be an important aid