MASSACHUSETTS INSTITUTE OF TECHNOLOGY SLOAN SCHOOL OF MANAGEMENT INFORMATION TECHNOLOGIES GROUP SEMANTIC INTEGRATION (COIN PROJECT) For Dr. Bob Popp, DARPA 8 April 2003 Stuart Madnick ([email protected]) Michael Siegel ([email protected]) Richard Wang ([email protected])


Page 1: Data bases


MASSACHUSETTS INSTITUTE OF TECHNOLOGY

SLOAN SCHOOL OF MANAGEMENT

INFORMATION TECHNOLOGIES GROUP

SEMANTIC INTEGRATION (COIN PROJECT)

For Dr. Bob Popp, DARPA

8 April 2003 Stuart Madnick ([email protected]) Michael Siegel ([email protected])

Richard Wang ([email protected])

Page 2: Data bases

COntext INterchange (COIN) Project

[Diagram labels: Sources (Data bases, Web Pages); Receivers (Applications, Browsers); TRUSTED AGENTS]

INPUT PROCESSING
• Automatic web wrapping
- Semi-structured text
- Multi-source query plan and execution

CONTEXT MEDIATION
• Automatic conflict detection and conversion
- Derived data
- Source selection
- Source attribution

OUTPUT PROCESSING
- ODBC Driver
- Web Publishing

APPLICATIONS: Financial services, electronic commerce, asset visibility, in-transit visibility.

Page 3: Data bases

Background on DARPA Support for Context Mediation Research

• Initial efforts funded as part of DARPA Intelligent Integration of Information (I3) Program

• Period: July 1993 - Sept 1998
• Started under: Gio Wiederhold
• then under: Dave Gunning & Bob Neches

Other related activity:
• MIT Total Data Quality Management (TDQM)
• Since 1991 (web.mit.edu/tdqm)

Page 4: Data bases

Multiple Perspectives . . . old lady or young lady?

Page 5: Data bases

Role Of Context

Data: Databases, Web data, E-mail

CONTEXT VARIATIONS:
- GEOGRAPHIC (US vs. UK)
- FUNCTIONAL (CASH MGMT vs. LOANS)
- ORGANIZATIONAL (CITIBANK vs. CHASE)

[Figure: three contexts reading the same data. Labels shown: currency symbols ?, $, £, ¥ and date strings 01-02-03, 03-02-01, 02-01-03.]
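The date strings on this slide can be read in more than one way depending on the receiver's context. A minimal sketch of that ambiguity follows; the three context names and their format patterns are assumptions chosen for illustration, not definitions taken from the COIN system.

```python
from datetime import datetime

# Hypothetical context labels mapped to strptime patterns (illustrative only).
DATE_CONTEXTS = {
    "US (month-day-year)": "%m-%d-%y",
    "UK (day-month-year)": "%d-%m-%y",
    "Year first (year-month-day)": "%y-%m-%d",
}

def interpret(raw: str) -> dict:
    """Parse the same raw string under each context's date convention."""
    return {
        name: datetime.strptime(raw, fmt).date()
        for name, fmt in DATE_CONTEXTS.items()
    }

readings = interpret("01-02-03")
for name, value in readings.items():
    print(f"{name}: {value.isoformat()}")
```

The same six characters yield three different calendar dates, which is why the context has to travel with the data.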

Page 6: Data bases

Example: Context Differences (from multiple web sources)

Daimler Benz (DAI) Financial Data

Source        P/E Ratio
ABC           11.6
Bloomberg     5.57
DBC           19.19
MarketGuide   7.46

Page 7: Data bases

Complementary Aggregation Example

• Q: How did CO2 emissions (total, per GDP, per capita) change over time (between 1990 and 2000) in Yugoslavia?
– User 1: YUG as a geographic region bounded before the breakup
– User 2: YUG as a legal autonomous state

Related effort:
- Laboratory for Information Globalization and Harmonization Technologies (LIGHT)

Page 8: Data bases

GDP and Population (GDP in billions local currency; Population in millions)

Country   1990 GDP   1990 Pop   2000 GDP   2000 Pop
YUG       698.3      23.7       1627.8     10.6
BIH       -          -          13.6       3.9
HRV       -          -          266.9      4.5
MKD       -          -          608.7      2.0
SVN       -          -          7162       2.0

Country and currency codes

Country                  Code   Currency         CurCode
Yugoslavia               YUG    New Yug. Dinar   YUN
Bosnia and Herzegovina   BIH    Marka            BAM
Croatia                  HRV    Kuna             HRK
Macedonia                MKD    Denar            MKD
Slovenia                 SVN    Tolar            SIT

Exchange rates

From   To    1990   2000
USD    YUG   10.5   67.267
USD    BIH   -      2.086
USD    HRV   -      8.089
USD    MKD   -      64.757
USD    SVN   -      225.93

CO2 Emission (in 1000 tons per year)

Country   1990    2000
YUG       35604   15480
BIH       -       1279
HRV       -       5405
MKD       -       3378
SVN       -       3981

Results by user

             User 1           User 2
             1990    2000     1990    2000
CO2          35604   29523    35604   15480
GDP          66.5    104.8    66.5    24.2
CO2/capita   1.5     1.28     1.5     1.46
CO2/GDP      535     282      535     640
GDP/Capita   2800    4560     2800    1100

Total CO2 in 1000 tons per year; GDP in billions USD; CO2/Capita in tons per person; CO2/GDP in tons per million USD; GDP/Capita in USD per person

Sources shown: World Bank’s World Dev. Indicator DB; UN Statistic Division; Statistics Bureaus; OAK Ridge’s CDIAC DB; WRI; GSSD; EPAs; Olsen (Web)

Many sources needed: Meanings in sources & users might differ
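The two user views for the year 2000 can be recomputed from the source tables on this slide. The sketch below is an illustration rather than the COIN implementation: the dictionary and function names are made up, and it assumes the exchange rates are local currency units per USD. It reproduces the CO2, GDP, CO2/capita, and CO2/GDP rows and leaves GDP/Capita out.

```python
# Year-2000 values transcribed from the slide tables.
gdp_local_bn = {"YUG": 1627.8, "BIH": 13.6, "HRV": 266.9, "MKD": 608.7, "SVN": 7162.0}
pop_mn = {"YUG": 10.6, "BIH": 3.9, "HRV": 4.5, "MKD": 2.0, "SVN": 2.0}
local_per_usd = {"YUG": 67.267, "BIH": 2.086, "HRV": 8.089, "MKD": 64.757, "SVN": 225.93}
co2_kt = {"YUG": 15480, "BIH": 1279, "HRV": 5405, "MKD": 3378, "SVN": 3981}

def view(countries):
    """Aggregate the indicators over the countries a user means by 'Yugoslavia'."""
    co2 = sum(co2_kt[c] for c in countries)                        # 1000 tons
    gdp_usd_bn = sum(gdp_local_bn[c] / local_per_usd[c] for c in countries)
    pop = sum(pop_mn[c] for c in countries)                        # millions
    return {
        "CO2": co2,
        "GDP": round(gdp_usd_bn, 1),                # billions USD
        "CO2/capita": round(co2 / pop / 1000, 2),   # tons per person
        "CO2/GDP": round(co2 / gdp_usd_bn),         # tons per million USD
    }

# User 1: the pre-breakup geographic region; User 2: the legal state only.
user1 = view(["YUG", "BIH", "HRV", "MKD", "SVN"])
user2 = view(["YUG"])
print(user1)
print(user2)
```

Each user asks the same question, but the set of rows to aggregate and the currency conversions needed differ with the meaning each user attaches to "YUG".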

Page 9: Data bases

The 1999 Overture

Unit-of-measure mixup tied to loss of $125 Million Mars Orbiter

“NASA’s Mars Climate Orbiter was lost because engineers did not make a simple conversion from English units to metric, an embarrassing lapse that sent the $125 million craft off course. . . .

. . . The navigators (JPL) assumed metric units of force per second, or newtons. In fact, the numbers were in pounds of force per second as supplied by Lockheed Martin (the contractor).”

Source: Kathy Sawyer, Boston Globe, October 1, 1999, page 1.

Page 10: Data bases

The Context Interchange Approach

[Diagram labels: Source; Source Context; Context Mediator; Receiver Context; Receiver; Shared Ontologies; Conversion Libraries; Context Transformation; Context Management Application]

Concept: Length (Meters, Feet); conversion f() between meters and feet

part length: 17

Select partlength From catalog Where partno="12AY"
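The length example on this slide can be sketched as a small mediator that looks up a conversion function from the source and receiver contexts. This is only a sketch: the `mediate` function, the dictionary layout, and the assumption that the source reports meters while the receiver expects feet are illustrative and not taken from the COIN code.

```python
METERS_PER_FOOT = 0.3048  # exact, by the international definition of the foot

# Conversion library: (source unit, receiver unit) -> conversion function.
CONVERSIONS = {
    ("meters", "feet"): lambda v: v / METERS_PER_FOOT,
    ("feet", "meters"): lambda v: v * METERS_PER_FOOT,
}

def mediate(value, source_context, receiver_context, concept="length"):
    """Convert a value from the source's unit to the receiver's unit."""
    src_unit = source_context[concept]
    dst_unit = receiver_context[concept]
    if src_unit == dst_unit:
        return value  # no conflict detected, pass the value through
    return CONVERSIONS[(src_unit, dst_unit)](value)

# Assumed contexts: the catalog stores meters, the receiver works in feet.
source_ctx = {"length": "meters"}
receiver_ctx = {"length": "feet"}

part_length = mediate(17, source_ctx, receiver_ctx)
print(round(part_length, 2))  # 55.77
```

Keeping the units in context declarations rather than in the query means the SQL on the slide stays unchanged when a source or receiver switches units.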

Page 11: Data bases

COIN Elevation Axioms (Ontology)

Page 12: Data bases

Another Context Example

[Diagram labels: Users & Appl. Systems; Context Mediation Services; Wrapper Services; sources WorldScope, Disclosure, Datastream, OANDA Web Server]

Source       Company Name        Net Income    Sales
WorldScope   DAIMLER-BENZ AG     346,577       56,268,168
Disclosure   DAIMLER BENZ CORP   615,000,000   97,737,000,000
Datastream   DAIMLER-BENZ        614,995       97,736,992

OANDA Web Server: O&A DEM-USD Exchange Rate: 1.00 German Mark = 0.58 US Dollar as of 12/31/93

Page 13: Data bases

Some Context Differences

Context Definitions   Disclosure                       Worldscope                       DataStream
Currency Used         Country of Incorporation         USD                              Country of Incorporation
Currency Conversion   Money Amount As_Of_Date          Money Amount As_Of_Date          Money Amount As_Of_Date
Currency Symbols      3 Letters                        3 Letters                        2 Letters
Scale Factor          1                                1000                             1000
Company Names         Disclosure Names                 Worldscope Names                 DataStream Names
Date Style            American with ‘/’ as separator   American with ‘/’ as separator   European with ‘-’ as separator

Olsen (OANDA) Web Source uses 3 Letter Currency Symbols and European Date Style with ‘/’ as a separator
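One way to picture the context definitions above is as plain data that can be compared to find conflicts between a source and a receiver. The sketch below assumes a dictionary layout and a `detect_conflicts` helper that are invented for illustration; the values are transcribed from the table.

```python
# Context definitions transcribed from the slide; the layout is an assumption.
CONTEXTS = {
    "Disclosure": {
        "currency_used": "country of incorporation",
        "currency_symbols": "3 letters",
        "scale_factor": 1,
        "company_names": "Disclosure names",
        "date_style": "American, '/' separator",
    },
    "Worldscope": {
        "currency_used": "USD",
        "currency_symbols": "3 letters",
        "scale_factor": 1000,
        "company_names": "Worldscope names",
        "date_style": "American, '/' separator",
    },
    "DataStream": {
        "currency_used": "country of incorporation",
        "currency_symbols": "2 letters",
        "scale_factor": 1000,
        "company_names": "DataStream names",
        "date_style": "European, '-' separator",
    },
}

def detect_conflicts(source: str, receiver: str) -> dict:
    """Return every context dimension on which source and receiver disagree."""
    src, rcv = CONTEXTS[source], CONTEXTS[receiver]
    return {dim: (src[dim], rcv[dim]) for dim in src if src[dim] != rcv[dim]}

conflicts = detect_conflicts("Disclosure", "DataStream")
for dim, (src_value, rcv_value) in conflicts.items():
    print(f"{dim}: {src_value} -> {rcv_value}")
```

Only the dimensions that differ need a conversion, so a query from Disclosure into the DataStream context requires no currency conversion, while one into the Worldscope context would.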

Page 14: Data bases

Domain Model

[Figure labels: number, string, exchangeRate, currencyType, companyFinancials, date, countryName, companyName, fromCur, toCur, scaleFactor, curTypeSym, currency, fyEnding, company, countryIncorp, format, dateFmt, txnDate, officialCurrency. Legend: Inheritance, Attribute, Modifier]

Some currency context possibilities:
• Currency is stated explicitly as part of record
• Currency not stated, but the same for all (e.g., US $)
• Currency not stated or constant, but inferred by country
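The three currency possibilities listed above can be sketched as a resolution function that tries each rule in turn. The record fields, the context keys, and the small country-to-currency lookup are hypothetical names introduced for this example.

```python
# Hypothetical lookup from country of incorporation to its official currency.
OFFICIAL_CURRENCY = {"Germany": "DEM", "United States": "USD"}

def resolve_currency(record: dict, context: dict) -> str:
    """Determine the currency of a money amount under a source's context."""
    # Case 1: the currency is stated explicitly as part of the record.
    if "currency" in record:
        return record["currency"]
    # Case 2: not stated, but the same for every record of the source.
    if "constant_currency" in context:
        return context["constant_currency"]
    # Case 3: not stated or constant, but inferred from the country.
    if context.get("infer_from_country"):
        return OFFICIAL_CURRENCY[record["country_incorp"]]
    raise ValueError("context does not say how to determine the currency")

explicit = resolve_currency({"amount": 10, "currency": "GBP"}, {})
constant = resolve_currency({"amount": 10}, {"constant_currency": "USD"})
inferred = resolve_currency(
    {"amount": 10, "country_incorp": "Germany"}, {"infer_from_country": True}
)
print(explicit, constant, inferred)
```

The second and third cases are the ones where the currency exists only as context, not as data, which is what makes explicit context declarations necessary.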

Page 15: Data bases

COIN System Architecture

[Figure: three tiers labeled SERVER PROCESSES, MEDIATOR PROCESSES, and CLIENT PROCESSES, connected through HTTPD daemons.]

Components shown: Web-site; Wrapper; WWW Gateway; COIN Repository; SQL Compiler; Context Mediator; Optimizer; Executioner; Data Store for Intermediate Results; ODBC-compliant Apps (e.g., Microsoft Excel); ODBC Driver; Web Client (cgi-scripts)

Flow labels shown: SQL Query; Datalog Query; Mediated Query; Optimized Query Plan; Results

Page 16: Data bases


System Demonstration

Q6. Scenario: Using Context Interchange, the financial analyst can look at the Disclosure data using Datastream Context.

Query: Find out from Disclosure what Net Income for DAIMLER-BENZ was. Use Datastream Context.

Capabilities Demonstrated: Ability to perform Scale Factor Conversion, Date Format Conversion, Company Name Conversion.

Single Source Queries with Mediation
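The three conversions named in this scenario (scale factor, date format, company name) can be sketched for a single Disclosure row viewed in the Datastream context. The name-mapping table, the row layout, and the as-of date attached to the row are assumptions for illustration; the scale factors and date styles follow the context table on slide 13.

```python
from datetime import datetime

DISCLOSURE_SCALE = 1     # Disclosure reports amounts in units
DATASTREAM_SCALE = 1000  # Datastream reports amounts in thousands

# Hypothetical mapping between the two sources' company naming conventions.
NAME_MAP = {"DAIMLER BENZ CORP": "DAIMLER-BENZ"}

def to_datastream_context(row: dict) -> dict:
    """Convert one Disclosure row into the Datastream context."""
    as_of = datetime.strptime(row["as_of"], "%m/%d/%y")  # American, '/'
    return {
        "company": NAME_MAP.get(row["company"], row["company"]),
        "net_income": row["net_income"] * DISCLOSURE_SCALE // DATASTREAM_SCALE,
        "as_of": as_of.strftime("%d-%m-%y"),             # European, '-'
    }

# The as_of value is assumed for the example.
disclosure_row = {
    "company": "DAIMLER BENZ CORP",
    "net_income": 615_000_000,
    "as_of": "12/31/93",
}
mediated = to_datastream_context(disclosure_row)
print(mediated)
```

The analyst issues the query once in the Datastream context; the conversions are applied to the Disclosure data on the way back.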

Page 17: Data bases


Demonstration – context2.mit.edu

Context

Source

Page 18: Data bases

Conflict Detection and Mediation

- Date convert
- Scale factor convert
- Name convert

Mediated Query in Datalog

Page 19: Data bases


Mediated SQL Query & Result

Adjust scale factor

Date format conversion

Name conversion

Final results – from Disclosure but in Datastream context

Mediated SQL Query

Page 20: Data bases


The 1805 Overture

In 1805, the Austrian and Russian Emperors agreed to join forces against Napoleon. The Russians promised that their forces would be in the field in Bavaria by Oct. 20.

The Austrian staff planned its campaign based on that date in the Gregorian calendar. Russia, however, still used the ancient Julian calendar, which by 1805 lagged 12 days behind.

The calendar difference allowed Napoleon to surround Austrian General Mack's army at Ulm and force its surrender on Oct. 21, well before the Russian forces could reach him, ultimately setting the stage for Austerlitz.

Source: David Chandler, The Campaigns of Napoleon, New York: MacMillan 1966, pg. 390.
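The calendar gap in this story follows a simple century rule, sketched below. Under that rule the Julian calendar trails the Gregorian by 10 days for 1582-1700 and by 12 days for 1800-1900, so a Julian Oct. 20, 1805 falls on Nov. 1 in the Gregorian calendar. The function names are illustrative, and the shortcut is not valid for January and February of century years such as 1700 or 1800.

```python
from datetime import date, timedelta

def julian_lag_days(year: int) -> int:
    """Days by which the Julian calendar trails the Gregorian in `year`.

    Century rule; not valid for Jan-Feb of century years not divisible by 400.
    """
    return year // 100 - year // 400 - 2

def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """Shift a Julian calendar date onto the Gregorian calendar."""
    # Simplification: reuse Python's Gregorian date type and add the lag.
    return date(year, month, day) + timedelta(days=julian_lag_days(year))

promised = julian_to_gregorian(1805, 10, 20)
print(julian_lag_days(1805), promised.isoformat())
```

As with the unit mixup on the Mars Orbiter slide, both parties held the same value and differed only in the context needed to interpret it.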

Page 21: Data bases

Summary

• Tremendous opportunity to gather and integrate information from many diverse sources
• But … need to overcome many context challenges
• Context-type “metadata” plays a critical role
• COIN technology can be an important aid