1
SIMTech – Invited Research Lecture
Integrating Information from Global Systems:
Knowledge Representation and Reasoning in the Context Interchange System
January 24, 2007
Stuart Madnick ([email protected])
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
SLOAN SCHOOL OF MANAGEMENT & SCHOOL OF ENGINEERING
INFORMATION TECHNOLOGIES GROUP
© MIT, 2007
2
Characteristics of Global Systems
• Large number of sources
– Manufacturers, Suppliers, Logistics, Customers, etc.
– Online comparison shopping services
• Diverse user needs
– Different organizations have different needs
• Cannot establish a single data standard
– Sometimes works, but not always
• Must get semantics right
– Adaptability, extensibility, scalability
3
Example from RFID: Types of Information in EPCGlobal Registry
Five types of Information: 162 Attributes
4
A Case Study on Context Issues
• The company is a top China-based international trading firm supplying more than 50,000 types of goods to 350 buyers located in 40 countries.
• Its trading product lines are very wide, but mostly in consumer packaged goods (CPG), apparel, and hard-line categories.
• Its buyers include many major US retailers such as Wal-Mart, Home Depot, Staples, and Target, and it once received the best supplier award from Wal-Mart.
• It is a member of the local EAN, meaning that it publishes its offered product items in a product database, LocalRegistry.
5
Some global supply-chain problem areas are caused by:
1. Measurement systems
2. Regulations: Safety & Substitutability
3. Cultural systems
4. Logistical systems
5. Trading terms
6
Scenario 1: Core Attribute Context Discrepancy: Measurement
• Context discrepancy arising from the different measurement systems used in China and the US.
– GlobalRegistry’s attributes “height”, “width” and “length” are assumed to take “inch”, while LocalRegistry’s counterpart attributes are assumed to take “cm”.
– GlobalRegistry’s attribute “FlashPointTemp”, indicating the flashpoint temperature for hazardous material, is assumed to take “Degrees Fahrenheit”, while the local convention is assumed to take “Degrees Celsius”.
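A minimal sketch of the two conversions this scenario calls for, assuming a LocalRegistry record is a plain dictionary keyed by the attribute names on the slide. The function names and the sample record are hypothetical and only illustrate the idea; they are not part of either registry.

```python
# Hypothetical sketch: convert LocalRegistry values (cm, degrees Celsius)
# into the units GlobalRegistry is assumed to use (inch, degrees Fahrenheit).

CM_PER_INCH = 2.54  # exact by definition


def cm_to_inch(value_cm: float) -> float:
    """Length in centimeters -> length in inches."""
    return value_cm / CM_PER_INCH


def celsius_to_fahrenheit(temp_c: float) -> float:
    """Temperature in degrees Celsius -> degrees Fahrenheit."""
    return temp_c * 9.0 / 5.0 + 32.0


# Attribute name -> conversion from the local to the global convention.
LOCAL_TO_GLOBAL = {
    "height": cm_to_inch,
    "width": cm_to_inch,
    "length": cm_to_inch,
    "FlashPointTemp": celsius_to_fahrenheit,
}


def to_global_context(record: dict) -> dict:
    """Convert every attribute that has a known context discrepancy."""
    return {
        name: LOCAL_TO_GLOBAL[name](value) if name in LOCAL_TO_GLOBAL else value
        for name, value in record.items()
    }


# Sample record with made-up values, for illustration only.
local_item = {"height": 25.4, "width": 50.8, "length": 101.6, "FlashPointTemp": 100.0}
global_item = to_global_context(local_item)
```

Attributes without an entry in the mapping pass through unchanged, so the table only has to list attributes whose conventions actually differ.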
7
Scenario 3: Manufacturing-specific Attribute Context Discrepancy: Cultural systems
• Context discrepancy arising from the different cultural systems used in China and the US.
– GlobalRegistry’s attribute “PackageType” is used in the apparel industry to indicate the size of an item: “S”, “M”, “L”, or “XL”.
– Contract manufacturers in China might interpret “M” as the medium size for Asians and manufacture accordingly, while US buyers mean the medium size for Americans; the two differ considerably.
8
Scenario 4: Logistics-specific Attribute Context Discrepancy: Logistic systems
• Context discrepancy arising from the different logistic systems used in China and the US.
– GlobalRegistry’s attributes “ti” and “hi” refer respectively to the number of items that can fit on a single layer on a pallet and the number of layers on a pallet.
– The issue arises because the standard pallets used in Asia (mostly 100 × 100) differ from the standard pallets used in the domestic US (100 × 120).
– Consequently, the values for “ti” and “hi” filled in by Asian suppliers based on the former pallet capacity will be misleading and cause trouble for an LSP in the US adopting the latter pallet standard (e.g., Wal-Mart's cross-docking distribution strategy).
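To make the “ti” discrepancy concrete, here is a small sketch that counts how many cartons fit on one pallet layer. It assumes the pallet dimensions on the slide are in centimeters, uses a simple grid layout with all cartons in one orientation, and uses a made-up 40 × 30 carton; none of these assumptions come from the slides.

```python
# Hypothetical sketch: "ti" (items per pallet layer) depends on the pallet size.
# Assumes a plain grid layout where all cartons share one orientation.


def items_per_layer(pallet_w: int, pallet_l: int, carton_w: int, carton_l: int) -> int:
    """Best of the two axis-aligned grid layouts (cartons straight or rotated)."""
    straight = (pallet_w // carton_w) * (pallet_l // carton_l)
    rotated = (pallet_w // carton_l) * (pallet_l // carton_w)
    return max(straight, rotated)


CARTON = (40, 30)  # made-up carton footprint, same unit as the pallets

ti_asia = items_per_layer(100, 100, *CARTON)  # pallet size common in Asia
ti_us = items_per_layer(100, 120, *CARTON)    # pallet size common in the US

print(f"ti on 100 x 100 pallet: {ti_asia}")
print(f"ti on 100 x 120 pallet: {ti_us}")
```

With this carton the two pallets give different “ti” values, so a figure entered against one pallet standard cannot be reused against the other without knowing which standard the supplier assumed.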
9
Example for analysis: Comparison Shopping: www.mysimon.com
10
Regional Comparison Shoppers
US Sweden France UK
11
Motivating Example
Global Online Comparison Shopping
– Different semantic assumptions in data
– Compare prices in the context of any source chosen by the user
– Many vendor sources in different countries
– Example: 270 potential different contexts

Semantic aspect | Number of distinctions
Currency | 10 different currencies (e.g., USD, UKP, JPY, TRL, KRW)
Scale factor | 3 different scale factors: 1, 1K, 1M
Price definition | 3 different definitions: base, base+tax, base+tax+SH
Date format | 3 different formats: mm/dd/yyyy, dd-mm-yyyy, yyyy-mm-dd

Need many conversions: 159,600 of them!
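The counts on this slide can be checked with a few lines of arithmetic. The sketch below enumerates every combination of the four semantic aspects to get 270 contexts. The 159,600 figure equals N(N-1) for N = 400 sources and receivers; N = 400 is an inference from the "2N ... 800" entry on the Flexibility and Scalability slide, not a number stated here. Currency names are placeholders because the slide lists only five of the ten.

```python
from itertools import product

# Placeholder names: the slide names only five of the ten currencies.
currencies = [f"currency_{i}" for i in range(1, 11)]
scale_factors = [1, 1_000, 1_000_000]
price_definitions = ["base", "base+tax", "base+tax+SH"]
date_formats = ["mm/dd/yyyy", "dd-mm-yyyy", "yyyy-mm-dd"]

# A context is one choice for each semantic aspect.
contexts = list(product(currencies, scale_factors, price_definitions, date_formats))
num_contexts = len(contexts)

# Pair-wise hard-wired conversions between N sources and receivers.
# N = 400 is inferred from the deck's later "2N = 800" entry (an assumption).
N = 400
pairwise_conversions = N * (N - 1)

print(num_contexts, pairwise_conversions)
```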
12
Desired Properties
• Adaptability
– Capability of accommodating changes in sources
• Extensibility
– Easy to add/remove sources
• Scalability
– Effort of enabling interoperation wrt the number of sources and the size of ontology
– Performance wrt number of sources and the size of each source (query optimization issue)
• Flexibility = Adaptability + Extensibility
13
Interoperate: hard-wired approaches
(a) BFS approach: Brute-force between pair-wise sources
(b) BFC approach: Brute-force between contexts
(c) Internal standard approach: Adopting a standard
[Figure: three diagrams, each showing six sources numbered 1 to 6; diagram (c) adds an "Internal standard" node]
context_a: currency: ‘KRW’; scaleFactor: 1000; kind: base; format: yyyy-mm-dd
context_b: currency: ‘TRL’; scaleFactor: 1e6; kind: base+tax; format: dd-mm-yyyy
context_c: currency: ‘USD’; scaleFactor: 1; kind: base+tax+SH; format: mm/dd/yyyy
14
COntext INterchange (COIN) Project
[Figure: COIN architecture, from sources through mediation to receivers]
Sources: Databases, Web Pages
INPUT PROCESSING
* Automatic web wrapping
- Semi-structured text
- Multi-source query plan and execution
CONTEXT MEDIATION
* Automatic conflict detection and conversion
- Derived data
- Source selection
- Source attribution
OUTPUT PROCESSING
ODBC Driver, Web-Publishing
Receivers: Applications, Browsers
TRUSTED AGENTS
APPLICATIONS: Financial services, electronic commerce, asset visibility, in-transit visibility.
15
Key COIN Technologies
• Web Wrapper
– Extract selected information from web (HTML+XML)
– Allows web to be treated as large relational SQL database
– Handles dynamic web sites, cookies, “login”, etc.
– Performs SQL Joins & Unions involving DB’s + Web sources
• Context Mediator
– Resolve semantic (meaning) differences
– Enable meaningful aggregation & comparison
16
Context: Multiple Perspectives . . . old lady or young lady ?
17
Role Of Context
CONTEXT VARIATIONS:
- GEOGRAPHIC (US vs. UK)
- FUNCTIONAL (CASH MGMT vs. LOANS)
- ORGANIZATIONAL (CITIBANK vs. CHASE)
Data: Databases, Web data, E-mail
[Figure: the same data viewed through three contexts; ambiguous examples shown: currency ($, £, ¥) and dates (08-07-09, 09-07-08, 07-08-09)]
18
Types of Context
Representational, Ontological, Temporal

Type | Example | Temporal
Representational | Currency: $ vs €; Scale factor: 1 vs 1000 | Francs before 2000, € thereafter
Ontological | Revenue: Includes vs excludes interest | Revenue: Excludes interest before 1994 but incl. thereafter
19
Airbus' A380 double-decker jet is two years behind schedule, sending billions of dollars in potential profits down the drain. But the reason sounds too simple to be true: Airbus factories in Germany and France were using incompatible design software, so the wiring produced in Hamburg didn't fit properly into the plane on the assembly line in Toulouse.
Point: Not just a technology issue, but also involves business strategy and organization/culture.
20
The 1999 Overture
Unit-of-measure mixup tied to loss of $125 Million Mars Orbiter
“NASA’s Mars Climate Orbiter was lost because engineers did not make a simple conversion from English units to metric, an embarrassing lapse that sent the $125 million craft off course. . . .
. . . The navigators ( JPL ) assumed metric units of force per second, or newtons. In fact, the numbers were in pounds of force per second as supplied by Lockheed Martin ( the contractor ).”
Source: Kathy Sawyer, Boston Globe, October 1, 1999, page 1.
21
COntext Interchange (COIN) Approach
[Figure: the Context Mediator sits between Source Context and Receiver Context and draws on Shared Ontologies and Conversion Libraries; a Context Management Administrator maintains them]
Concept: Length
Source: meters; Receiver: feet; conversion f() from meters to feet
Context transformation of part length: 17 → 55.79
Receiver query:
Select partlength From catalog Where partno=“12AY”
Auto-composition of conversions gives the mediated query:
Select partlength/.3048 From catalog Where partno=“12AY”
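The query rewrite on this slide can be imitated in a few lines. This is a sketch only, not the COIN mediator: it assumes the source stores lengths in meters, the receiver expects feet, and the catalog holds the single row shown on the slide. The `mediate` helper and the in-memory table are hypothetical.

```python
import sqlite3

METERS_PER_FOOT = 0.3048  # exact by definition


def mediate(column: str, source_unit: str, receiver_unit: str) -> str:
    """Return an SQL expression yielding `column` in the receiver's unit."""
    if source_unit == receiver_unit:
        return column  # same context: no conversion is injected
    if (source_unit, receiver_unit) == ("meters", "feet"):
        return f"{column}/{METERS_PER_FOOT}"
    if (source_unit, receiver_unit) == ("feet", "meters"):
        return f"{column}*{METERS_PER_FOOT}"
    raise ValueError(f"no conversion from {source_unit} to {receiver_unit}")


# Hypothetical source: one catalog row, length assumed to be in meters.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (partno TEXT, partlength REAL)")
conn.execute("INSERT INTO catalog VALUES ('12AY', 17.0)")

# The receiver asks for partlength; the mediated query adds the conversion.
expression = mediate("partlength", "meters", "feet")
mediated_sql = f"SELECT {expression} FROM catalog WHERE partno = ?"
(length_in_feet,) = conn.execute(mediated_sql, ("12AY",)).fetchone()

print(mediated_sql)
print(length_in_feet)
```

The receiver's query text never mentions units; the conversion is decided from the two context descriptions, which is the point of keeping it out of the application.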
22
COIN Conceptual Model
(Ontology)
23
Ontology and Conversion Function
context_a: currency: ‘KRW’; scaleFactor: 1000; kind: base; format: yyyy.mm.dd
context_b: currency: ‘TRL’; scaleFactor: 1e6; kind: base+tax; format: dd-mm-yyyy
context_c: currency: ‘USD’; scaleFactor: 1; kind: base+tax+SH; format: mm/dd/yyyy
context_d is_a context_b; scaleFactor: 1e3
context_e is_a context_d; format: yyyy-mm-dd
context_f is_a context_c; kind: base+tax
[Figure: ontology diagram. Labels: basic, monetaryValue, price, temporalEntity, organization, kind, currency, format, scaleFactor, taxRate. Legend: is_a relationship, attribute, modifier]
Example source: src_turkey(Product, Vendor, QuoteDate, Price)
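\[
\begin{aligned}
&x : \mathit{monetaryValue} \vdash \\
&\quad x[\mathit{cvt}(\mathit{currency}, C_2)@C_1, u \rightarrow v] \leftarrow {} \\
&\qquad x[\mathit{currency}(C_1) \rightarrow C_f],\; x[\mathit{currency}(C_2) \rightarrow C_t],\; x[\mathit{tempAttr} \rightarrow T], \\
&\qquad \mathit{olsen\_}(A, B, R, D),\; C_f \overset{C_2}{=} A,\; C_t \overset{C_2}{=} B,\; T \overset{C_2}{=} D, \\
&\qquad R[\mathit{value}(C_2) \rightarrow r],\; v = u \cdot r.
\end{aligned}
\]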
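The currency conversion on this slide looks up an exchange rate in the auxiliary source olsen for the source currency, target currency and date, then multiplies. A simplified Python reading of that rule follows. It ignores the context handling of the auxiliary source itself, and the rate table is a made-up stand-in, not real exchange-rate data.

```python
from datetime import date

# Hypothetical stand-in for the auxiliary source olsen(A, B, R, D):
# (from_currency, to_currency, date) -> rate. The rate is illustrative only.
OLSEN_RATES = {
    ("TRL", "USD", date(2007, 1, 24)): 1e-6,
}


def cvt_currency(u: float, c_from: str, c_to: str, quote_date: date,
                 rates: dict = OLSEN_RATES) -> float:
    """Convert amount u from c_from to c_to using the rate valid on quote_date."""
    if c_from == c_to:
        return u  # same currency in both contexts: nothing to do
    r = rates[(c_from, c_to, quote_date)]  # KeyError if no rate is known
    return u * r  # v = u * r


amount_usd = cvt_currency(2_500_000.0, "TRL", "USD", date(2007, 1, 24))
```

The date argument plays the role of the temporal attribute in the rule: the rate chosen depends on when the price was quoted.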
24
Demo – Same Context
No semantic differences
Meaningful data returned
25
Compose only relevant conversions (b → e)
(a) Select Vendor, Price From src_turkey Where Product=“Samsung SyncMaster 173P”;
– Conversion for scale factor
(b) Select Vendor, QuoteDate, Price From src_turkey Where Product=“Samsung SyncMaster 173P”;
– Conversion for date format
– Conversion for scale factor
context_b: currency: ‘TRL’; scaleFactor: 1e6; kind: base+tax; format: dd-mm-yyyy
context_d is_a context_b; scaleFactor: 1e3
context_e is_a context_d; format: yyyy-mm-dd
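A sketch of how the relevant conversions for queries (a) and (b) could be selected: resolve each context through its is_a chain, then keep only the modifiers that differ and that apply to a requested column. The dictionary layout and the column-to-modifier mapping are assumptions made for illustration; COIN itself derives this from the ontology by reasoning.

```python
# Contexts from the slide; "is_a" means unlisted modifiers are inherited.
CONTEXTS = {
    "context_b": {"currency": "TRL", "scaleFactor": 1e6,
                  "kind": "base+tax", "format": "dd-mm-yyyy"},
    "context_d": {"is_a": "context_b", "scaleFactor": 1e3},
    "context_e": {"is_a": "context_d", "format": "yyyy-mm-dd"},
}

# Assumed mapping of src_turkey columns to the modifiers that affect them.
COLUMN_MODIFIERS = {
    "Product": [],
    "Vendor": [],
    "QuoteDate": ["format"],
    "Price": ["currency", "scaleFactor", "kind"],
}


def resolve(name: str) -> dict:
    """Flatten a context by walking its is_a chain; children override parents."""
    spec = CONTEXTS[name]
    resolved = dict(resolve(spec["is_a"])) if "is_a" in spec else {}
    resolved.update({k: v for k, v in spec.items() if k != "is_a"})
    return resolved


def needed_conversions(columns: list, source: str, receiver: str) -> list:
    """(column, modifier) pairs whose values differ between the two contexts."""
    src, rcv = resolve(source), resolve(receiver)
    return [(col, mod)
            for col in columns
            for mod in COLUMN_MODIFIERS[col]
            if src[mod] != rcv[mod]]


query_a = needed_conversions(["Vendor", "Price"], "context_b", "context_e")
query_b = needed_conversions(["Vendor", "QuoteDate", "Price"], "context_b", "context_e")
```

Query (a) does not select QuoteDate, so the date-format difference between the two contexts never produces a conversion for it.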
26
Auto-reconciliation for auxiliary source (b → f)
Introduced because of context difference in auxiliary source
context_b: currency: ‘TRL’; scaleFactor: 1e6; kind: base+tax; format: dd-mm-yyyy
context_c: currency: ‘USD’; scaleFactor: 1; kind: base+tax+SH; format: mm/dd/yyyy
context_f is_a context_c; kind: base+tax
27
Detection and Explication (b → a)
context_a: currency: ‘KRW’; scaleFactor: 1000; kind: base; format: yyyy.mm.dd
context_b: currency: ‘TRL’; scaleFactor: 1e6; kind: base+tax; format: dd-mm-yyyy
28
Mediated Query (b → a)
[Figure: mediated query with annotated conversions]
– Date format for receiver
– Price definition: remove tax
– Scale factor
– Date format for auxiliary source olsen
– Currency
29
Flexibility and Scalability

Approach | General case | In the example
BFS | N(N-1), N := number of sources and receivers | 159,600
BFC | n(n-1), n := number of unique contexts | 72,630
ETL/GS | 2N, N := number of sources and receivers | 800
COIN | 1) Worst case: $\sum_{i=1}^{m} n_i (n_i - 1)$, n_i := number of unique values of the ith modifier, m := number of modifiers in ontology; 2) $\sum_{i=1}^{m} (n_i - 1)$, when equational relationships exist; 3) m, if all conversions can be parameterized | 1) worst: 108; 2) actual number: 5 (3 general conversions plus 2 for price)

• Not flexible: need to update/add many conversion programs
• Flexible: update the declarative knowledge base
• Why can't other approaches fully benefit from general-purpose conversions?
– The decision whether to invoke the conversion is in the conversion program
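The figures in this comparison follow directly from the formulas. The sketch below recomputes them for the comparison-shopping example, assuming four modifiers with 10, 3, 3 and 3 distinct values and N = 400 sources and receivers (N is inferred from 2N = 800). The "actual number: 5" entry depends on how the price conversions are written and is not reproduced by any of these formulas.

```python
from math import prod

modifier_values = [10, 3, 3, 3]  # currency, scale factor, price definition, date format
N = 400                          # sources and receivers, inferred from 2N = 800
n = prod(modifier_values)        # unique contexts


def bfs(num_sources: int) -> int:
    """Brute force between pair-wise sources: N(N-1)."""
    return num_sources * (num_sources - 1)


def bfc(num_contexts: int) -> int:
    """Brute force between contexts: n(n-1)."""
    return num_contexts * (num_contexts - 1)


def etl(num_sources: int) -> int:
    """Internal standard: one conversion in and one out per source, 2N."""
    return 2 * num_sources


def coin_worst(values: list) -> int:
    """COIN worst case: sum over modifiers of n_i(n_i - 1)."""
    return sum(v * (v - 1) for v in values)


def coin_equational(values: list) -> int:
    """COIN with equational relationships: sum over modifiers of (n_i - 1)."""
    return sum(v - 1 for v in values)


def coin_parameterized(values: list) -> int:
    """COIN when every conversion is parameterized: one per modifier, m."""
    return len(values)


for label, count in [("BFS", bfs(N)), ("BFC", bfc(n)), ("ETL/GS", etl(N)),
                     ("COIN worst", coin_worst(modifier_values)),
                     ("COIN equational", coin_equational(modifier_values)),
                     ("COIN parameterized", coin_parameterized(modifier_values))]:
    print(f"{label}: {count}")
```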
30
How COIN Scales
• Semantic differences cannot be standardized away
• Must be flexible and scalable
• COIN approach
– Component conversions are defined for each modifier
– Overall conversions are automatically composed by abductive reasoning engine
– Composition via symbolic equation solver and a shortest path algorithm
– Inheritance enabled
• COIN is a good solution to:
– Modularization, declarativeness
– Automatic composition of necessary conversions
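To illustrate the shortest-path idea in isolation, the sketch below composes a conversion between two values of one modifier from a few declared component conversions. It uses a plain breadth-first search over a small graph of length units, swapped in for illustration; it is not COIN's abductive engine or symbolic equation solver, and the unit graph is a hypothetical example.

```python
from collections import deque

# Declared component conversions: (from_unit, to_unit) -> function.
COMPONENTS = {
    ("cm", "inch"): lambda v: v / 2.54,
    ("inch", "cm"): lambda v: v * 2.54,
    ("inch", "foot"): lambda v: v / 12.0,
    ("foot", "inch"): lambda v: v * 12.0,
}


def shortest_chain(src: str, dst: str) -> list:
    """Breadth-first search for the fewest component conversions from src to dst."""
    if src == dst:
        return []
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        unit, path = queue.popleft()
        for (a, b) in COMPONENTS:
            if a == unit and b not in seen:
                new_path = path + [(a, b)]
                if b == dst:
                    return new_path
                seen.add(b)
                queue.append((b, new_path))
    raise ValueError(f"no conversion path from {src} to {dst}")


def convert(value: float, src: str, dst: str) -> float:
    """Apply the composed chain of component conversions."""
    for step in shortest_chain(src, dst):
        value = COMPONENTS[step](value)
    return value


chain = shortest_chain("cm", "foot")
one_foot = convert(30.48, "cm", "foot")
```

No cm-to-foot conversion is declared, yet one is available by composition; adding a new unit needs only one conversion to any unit already in the graph.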
31
The 1805 Overture
In 1805, the Austrian and Russian Emperors agreed to join forces against Napoleon. The Russians promised that their forces would be in the field in Bavaria by Oct. 20.
The Austrian staff planned its campaign based on that date in the Gregorian calendar. Russia, however, still used the ancient Julian calendar, which lagged 10 days behind.
The calendar difference allowed Napoleon to surround Austrian General Mack's army at Ulm and force its surrender on Oct. 21, well before the Russian forces could reach him, ultimately setting the stage for Austerlitz.
Source: David Chandler, The Campaigns of Napoleon, New York: MacMillan 1966, pg. 390.
32
Summary
• Tremendous opportunity to gather and integrate information from many diverse sources
• But … need to overcome many context challenges
• Context-type “metadata” plays a critical role
• COIN technology can be an important aid for semantically meaningful information integration:
- Scalable
- Extensible
- Application Domain Merging
- Reuse and extension of ontologies and contexts
References: http://web.mit.edu/smadnick/www/wp/CISL-Sloan%20WP%20spreadsheet.htm
33
Appendix: Sample Applications
• Airfare, Car Rental and Merged Travel
• Weather
• Global Price Comparison
• Airfare Aggregation
• Disaster Relief
• TASC Financial Example
• Web Services Demo
• Corporate Householding
• Aggregation/Integration of Intelligence Data
• Infrastructure for Inter-organizational RFID Systems
34
Appendix: COIN Web-Wrapper Technology
[Figure: the Web Wrapper Generator connects an SQL side to an HTML side, driven by a web page spec file*]
User or Program (via SQL Query):
Select Edgar.Net_income From Edgar Where Edgar.Ticker=intc and Edgar.Form=10-Q
Data record returned:
Ticker | Net Income
INTC | 1,983
* Spec file contains: Schema, Navigation rules, and Extraction rules.