© 2001 Microsoft Corp. 1
Generic Model Management
A Database Infrastructure for
Schema Manipulation
Philip A. Bernstein
Microsoft Corporation
September 6, 2001
The Problem
• There is 30 years of DB Research on meta data
• But we don’t have great infrastructure to offer
– Most design tools and web services store meta data in files, not DBs
– OODBMS’s are not a huge success
– Most meta data driven tools use their own infrastructure
• Goal: generic meta data manipulation infrastructure
– Reduce the amount of programming required to build meta data driven applications.
• Proposal: Model Management
– Define an algebra to manipulate meta data in large chunks, called models and mappings.
Outline
• Overview of Model Management
• Solutions to classical meta data problems
• Recent technical results
Models and Mappings
• Model – a complex information structure
– XML schema, SQL schema, OO interface, UML model, web site map, make script, …
• Mapping – a transformation from one model into another
– Map between two XML schemas
– Map a SQL schema to an XML schema
– Map data sources to a data warehouse
– Map an ER diagram to a SQL schema
– Map a process defn to a workflow script
Representation
• A model is a directed graph with one root.
• A mapping is a model each of whose nodes connects nodes of two other models.
[Figure: a relational schema Emp (E#, Dept#, Name) and an XSD Emp (E#, Dept#, Name, First, Last), connected by the mapping map1]
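The two definitions on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Model` and `Mapping`, the adjacency-list layout, the mapping node names `m_emp` and `m_name`, and the nesting of First and Last under Name are all hypothetical and do not come from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """A model: a directed graph with a single root node."""
    root: str
    edges: dict = field(default_factory=dict)  # node -> list of child nodes

    def add(self, parent, child):
        self.edges.setdefault(parent, []).append(child)
        self.edges.setdefault(child, [])

    def nodes(self):
        """All nodes reachable from the root, in depth-first order."""
        seen, stack = [], [self.root]
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.append(n)
                stack.extend(reversed(self.edges.get(n, [])))
        return seen


@dataclass
class Mapping:
    """A mapping: itself a model, each of whose nodes links a node of
    `left` to a node of `right`."""
    left: Model
    right: Model
    model: Model
    links: dict = field(default_factory=dict)  # mapping node -> (left node, right node)

    def connect(self, name, left_node, right_node):
        assert left_node in self.left.nodes() and right_node in self.right.nodes()
        self.model.add(self.model.root, name)
        self.links[name] = (left_node, right_node)


# Hypothetical rendering of the slide's example.
rel = Model("Emp")
for col in ("E#", "Dept#", "Name"):
    rel.add("Emp", col)

xsd = Model("Emp")
for el in ("E#", "Dept#", "Name"):
    xsd.add("Emp", el)
xsd.add("Name", "First")  # assumed nesting: First and Last under Name
xsd.add("Name", "Last")

map1 = Mapping(rel, xsd, Model("map1"))
map1.connect("m_emp", "Emp", "Emp")
map1.connect("m_name", "Name", "Name")
```

Keeping the mapping as a `Model` in its own right means the same graph operations apply to mappings and to the models they connect.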
Model Management Algebra
• Match
• Merge
• Compose
• Select
• Diff
• Enumerate
• ApplyFunction
• Copy
• Update operations
map = Match(M1, M2, )
• Match(M1, M2, ) returns the best mapping between M1 and M2, w.r.t. its third argument
[Figure: map1 relates M1 = Emp (E#, Dept#, Name, Addr) to M2 = Emp (E#, Dept#, Name, First, Last, Phone); two of its correspondences are labeled "="]
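As a rough illustration of what a Match implementation returns, the sketch below pairs elements purely by name similarity. It is a stand-in, not the talk's algorithm: the function name `match`, the flat element lists, and the `threshold` parameter are hypothetical, and `threshold` is not meant to represent the operator's third argument.

```python
from difflib import SequenceMatcher


def match(m1, m2, threshold=0.8):
    """Return {element of m1: best-scoring element of m2}, keeping only
    pairs whose name similarity is at least `threshold`."""
    mapping = {}
    for a in m1:
        best, best_score = None, 0.0
        for b in m2:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score > best_score:
                best, best_score = b, score
        if best is not None and best_score >= threshold:
            mapping[a] = best
    return mapping


# Element names taken from the slide's figure.
M1 = ["Emp", "E#", "Dept#", "Name", "Addr"]
M2 = ["Emp", "E#", "Dept#", "Name", "First", "Last", "Phone"]
map1 = match(M1, M2)
```

With these inputs only the identically named elements are paired; Addr has no counterpart in M2 and stays unmapped.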
M3 = Merge(M1, M2, map)
• Return the union of models M1 and M2
– Use map to guide the Merge
– If elements x = y in map, then collapse them into one element
[Figure: Emp (Addr, Name) and Emp (Name, Phone), related by mapC, merge into Emp (Name, Phone, Addr)]
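The union-and-collapse behaviour can be sketched as follows. This is a minimal illustration, assuming models are plain dictionaries from a parent element to its children and a mapping is a dictionary of equal element pairs; the function name `merge` and this representation are hypothetical.

```python
def merge(m1, m2, mapping):
    """Union two models (dict: parent -> list of children), collapsing each
    m2 element y onto the m1 element x whenever mapping[x] == y."""
    rename = {y: x for x, y in mapping.items()}  # m2 name -> m1 name
    merged = {parent: list(children) for parent, children in m1.items()}
    for parent, children in m2.items():
        target = merged.setdefault(rename.get(parent, parent), [])
        for child in children:
            child = rename.get(child, child)
            if child not in target:
                target.append(child)
    return merged


# The slide's example: Emp (Addr, Name) merged with Emp (Name, Phone).
M1 = {"Emp": ["Addr", "Name"]}
M2 = {"Emp": ["Name", "Phone"]}
M3 = merge(M1, M2, {"Emp": "Emp", "Name": "Name"})
```

The result contains a single Emp with Addr, Name, and Phone, since the two Emp roots and the two Name elements are collapsed.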
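Left Composition ( f • )
• mapC = mapA f• mapB
[Figure: mapA (nodes a1, a2, a3) connects M1 = Emp (Addr, Street, City) to M2 = Emp (Street, City); mapB (nodes b1, b2, b3) connects M2 to M3 = Emp (StAddr, Town); the composed mapC (nodes c1, c2, c3) connects M1 directly to M3. Name elements also appear in the figure.]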
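For intuition, composing two mappings can be approximated by ordinary relational composition over element pairs. The sketch below does only that; it does not attempt to model whatever distinguishes the left variant ( f • ) on this slide. The pair representation, the qualified names such as `M1.Street`, and the specific correspondences (Street to StAddr, City to Town) are assumptions read off the element names.

```python
def compose(map_a, map_b):
    """Relational composition: (x, z) is in the result when some y has
    (x, y) in map_a and (y, z) in map_b."""
    return {(x, z) for (x, y1) in map_a for (y2, z) in map_b if y1 == y2}


# Assumed correspondences between the three models on the slide.
map_a = {("M1.Emp", "M2.Emp"), ("M1.Street", "M2.Street"), ("M1.City", "M2.City")}
map_b = {("M2.Emp", "M3.Emp"), ("M2.Street", "M3.StAddr"), ("M2.City", "M3.Town")}
map_c = compose(map_a, map_b)
```

Elements of M1 that have no counterpart in M2 (Addr in this sketch) simply drop out of the result.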
Model Management Algebra
• map = Match (M1, M2, )
• M3 = Merge (M1, M2, map)
• map3 = Compose(map1, map2)
• M2 = Select(M1, pred)
• M2 = Diff(M1, map)
• list = Enumerate(M)
• ApplyFunction(M, f )
• M2 = Copy(M1)
• Update operations
They’re generic = data model independent … well … implemented on an extended ER model with an extensibility story
Example
• Given
– map1 from SQL schema rdb1 to xsd1
– xsd2, which is similar to xsd1
• Produce
– a map between xsd2 and a relational schema.
1. map2 = Match(xsd1, xsd2)
2. map3 = map1 • map2
3. <map4, rdb2> = Copy(map3)
4. Use ApplyFunction(map4) to map each x in Diff(xsd2, map4) into rdb2
[Figure: rdb1, xsd1, xsd2, and rdb2, connected by map1, map2, map3, and map4]
Theme
• Classic meta data problems can be solved using Model Management operations
– Schema integration
– Schema evolution
– Data migration
– Reverse engineering
– Data reintegration (3-way merging)
• Published solutions to these problems help us produce generic implementations of model mgmt operations
Outline
• Overview of Model Management
• Solutions to classical meta data problems
– Schema integration
– Schema evolution
– Reverse engineering
– Data reintegration (3-way merging)
– Data migration
• Recent technical results
Schema Integration
• Given
– two view schemas, V1 and V2
• Produce
– an integrated schema, S
1. map = Match(V1, V2)
2. S = Merge(V1, V2, map)
3. ApplyFunction(S) // to resolve conflicts in S, producing S
[Figure: V1 and V2 connected by map, with the integrated schema S below them]
[Figure: schema integration example. V1 and V2 are both rooted at Emp and together cover the elements E#, Dept#, Addr, Name, Phone, FirstName, and LastName; map marks the "=" correspondences between them; the merged schema S is Emp with E#, Dept#, Addr, Phone, Name, FirstName, and LastName.]
1. map = Match(V1, V2)
2. S = Merge(V1, V2, map)
3. Use ApplyFunction(S) to resolve conflicts, producing S
Merging Knowledge Bases (Ontologies)
• Same as schema integration, but applied to ontologies
• The literature on merging ontologies focuses mostly on Match.
Schema Evolution
• Given
– mapSV from schema S to view V
– a modified version S′ of S
• Produce
– a mapping mapS′V from S′ to V (i.e. a view defn for V over S′).
1. mapS′S = Match(S′, S)
2. mapS′V = mapS′S • mapSV
3. Use ApplyFunction(V) to delete elements not derivable from S′
[Figure: S, S′, and V, connected by mapSV, mapS′S, and mapS′V]
Outline
• Overview of Model Management
• Solutions to classical meta data problems
– Schema integration
– Schema evolution
– Reverse engineering
– Data reintegration (3-way merging)
– Data migration
• Recent technical results
Reverse Engineering
• Given
– Model M (e.g., an ER model)
– Model G (e.g., SQL) generated via mapMG from M
– A modified version G′ of G
• Produce
– A modified version M′ of M that generates G′
1. mapGG′ = Match(G, G′)
2. mapMG′ = mapMG • mapGG′
3. <M′, mapG′M′> = Copy(mapMG′)
4. Use ApplyFunction(mapM′G′) to reverse engineer each g in Diff(G′, mapM′G′) into M′
[Figure: M, G, G′, and M′, connected by mapMG, mapGG′, mapMG′, and the map produced by Copy]
3-Way Merge (aka Reintegration)
• Given
– a source schema S0
– two derived schemas S1 and S2
• Produce
– a schema S3 that merges the changes of S1 and S2
• In the steps below, O, A, and B appear to denote S0, S1, and S2
1. MapOA = Match(O, A) (based on OIDs)
2. MapOB = Match(O, B) (based on OIDs)
3. MapOA = ApplyFunction(MapOA) such that for e ∈ MapOA, if domain(e) = range(e), then delete e (i.e. things changed in A)
4. MapOB = ApplyFunction(MapOB) such that for e ∈ MapOB, if domain(e) = range(e), then delete e (i.e. things changed in B)
5. ChangedA = range(MapOA)
6. ChangedB = range(MapOB)
7. MapChAChB = Match(ChangedA, ChangedB)
8. MapChBChA = invert(MapChAChB)
9. A = Diff(ChangedA, ChangedB, MapChAChB) (changed in A but not changed in B)
10. B = Diff(ChangedB, ChangedA, MapChBChA)
11. MapAB = Match(A, B) (by OIDs)
12. G = Merge(A, B, MapAB)
13. MapGA = Match(G, A)
14. GA = Merge(G, A, MapGA) with preference for A
15. MapGAB = Match(GA, B)
16. GAB = Merge(GA’, B’, MapGA’B’) with preference for B
17. DeletedA = Diff(O, A, MapOA)
18. DeletedB = Diff(O, B, MapOB)
19. MapDeletedAChangedB = Match(DeletedA, ChangedB)
20. MapDeletedBChangedA = Match(DeletedB, ChangedA)
21. ShouldDeleteA = Diff(DeletedA, ChangedB, MapDeletedAChangedB)
22. ShouldDeleteB = Diff(DeletedB, ChangedA, MapDeletedBChangedA)
23. MapGABSDA = Match(GAB, ShouldDeleteA)
24. GABSDA = Diff(GAB, ShouldDeleteA, MapGABSDA)
25. MapGABSDASDB = Match(GABSDA, ShouldDeleteB)
26. Final result = Diff(GABSDA, ShouldDeleteB, MapGABSDASDB)
[Figure: source schema S0, derived schemas S1 and S2, and merged schema S3]
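The intent of the script, stripped of the algebra, can be approximated at the level of individual elements. The sketch below is a simplified analogue and not a transcription of the 26 steps: it assumes each schema is a dictionary from OID to element value, and the function name `three_way_merge`, the `prefer` parameter, and the sample data are hypothetical. It follows the same policy as the script: a change on one side wins over an untouched element on the other, and a deletion takes effect only if the other side did not change the element.

```python
_MISSING = object()  # sentinel for "OID not present in the base"


def three_way_merge(base, a, b, prefer="b"):
    """Merge two versions of `base` (dicts: OID -> value).

    When both sides change the same OID, `prefer` picks the winner.
    """
    merged = {}
    for oid in set(base) | set(a) | set(b):
        changed_a = oid in a and a[oid] != base.get(oid, _MISSING)
        changed_b = oid in b and b[oid] != base.get(oid, _MISSING)
        deleted_a = oid in base and oid not in a
        deleted_b = oid in base and oid not in b
        if changed_a and changed_b:
            merged[oid] = b[oid] if prefer == "b" else a[oid]
        elif changed_a:
            merged[oid] = a[oid]     # kept even if B deleted it
        elif changed_b:
            merged[oid] = b[oid]     # kept even if A deleted it
        elif deleted_a or deleted_b:
            continue                 # deleted on one side, untouched on the other
        else:
            merged[oid] = base[oid]  # unchanged on both sides
    return merged


# Hypothetical data: A renames element 2 and deletes 4; B changes 3 and adds 5.
base = {1: "Emp", 2: "Name", 3: "Addr", 4: "Phone"}
a = {1: "Emp", 2: "FullName", 3: "Addr"}
b = {1: "Emp", 2: "Name", 3: "Address", 4: "Phone", 5: "Dept#"}
merged = three_way_merge(base, a, b)
```

Here element 4 is dropped because A deleted it and B left it untouched, which mirrors the ShouldDeleteA computation in steps 19 to 24.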
Data Migration
• Given
– a schema S and its database D
– an evolved schema S′
• Produce
– a procedure for mapping D into an S′ database D′
1. mapSS′ = Match(S, S′)
2. Use Enum(S) to generate a data migration script
[Figure: S and S′ connected by mapSS′; Enum feeds Generate Migration Script; running the script maps D to D′]
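To make the idea of a generated migration script concrete, here is a minimal sketch that turns a column-level correspondence into one SQL statement. Everything in it is an assumption for illustration: the function name `generate_migration_script`, the flat `{old column: new column}` dictionary standing in for the mapping from step 1, and the table and column names.

```python
def generate_migration_script(old_table, new_table, column_map):
    """Emit one INSERT ... SELECT statement that copies mapped columns.

    column_map: {old column: new column}, a flat stand-in for the mapping
    between the old and the evolved schema.
    """
    if not column_map:
        raise ValueError("no corresponding columns to migrate")
    new_cols = ", ".join(column_map.values())
    old_cols = ", ".join(column_map.keys())
    return f"INSERT INTO {new_table} ({new_cols}) SELECT {old_cols} FROM {old_table};"


# Hypothetical tables and columns.
script = generate_migration_script(
    "Emp_old", "Emp_new", {"EmpNo": "EmpNo", "Name": "FullName"}
)
```

A real generator would enumerate the whole schema and handle type conversions and unmapped elements; this sketch covers only renamed and unchanged columns.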
Data Translation
• Like data migration, except S and S′ are expressed in different data models.
Outline
• Overview of Model Management
• Solutions to classical meta data problems
• Recent technical results
Status Report
• Vision
– [Bernstein, Halevy, & Pottinger, SIGMOD Record 12/00]
• Data Warehouse Examples
– [Bernstein & Rahm, ER ’00]
• Match Operation
– Survey: [Rahm & Bernstein, MSR Tech Report]
– Prototype: [Madhavan, Bernstein, & Rahm, VLDB ’01]
• Merge Operation
– coming soon …
• Theory
– [Alagić & Bernstein, DBPL ’01]
Schema Matching Approaches
• About a dozen published algorithms.
• Many good ideas, but none are robust.
[Figure: taxonomy of schema matching approaches]
• Individual matchers
– Schema-based, Per-Element, Linguistic: names, descriptions
– Schema-based, Per-Element, Constraint-based: types, keys
– Schema-based, Structural, Constraint-based: graph matching
– Content-based, Per-Element, Linguistic: IR (word frequencies, key terms)
– Content-based, Per-Element, Constraint-based: value pattern and ranges
• Combined matchers
– Hybrid
– Composite: manual composition, automatic composition
The CUPID Algorithm
[Figure: two schema trees being matched. PO has POShipTo and POBillTo, each with City and Street; PurchaseOrder has DeliverTo and InvoiceTo, with Address, City, and Street elements beneath them. The figure is annotated "ssim++".]
• Computes linguistic similarity of element pairs
• Computes structural similarity of element pairs
• Generates a mapping
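The three bullets can be sketched as a small pipeline. This is a hedged approximation, not the Cupid implementation: plain string similarity stands in for the linguistic matcher, the structural measure is a simple fraction of strongly linked leaves, and the weighted sum with `w_struct` is an assumption, since the slide names only the two similarity measures. All function names, thresholds, and the flat dictionaries of leaves are hypothetical.

```python
from difflib import SequenceMatcher


def lsim(a, b):
    """Linguistic similarity stand-in: string similarity of element names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def ssim(leaves_a, leaves_b, strong=0.8):
    """Structural similarity stand-in: fraction of leaves, across both
    subtrees, that have a strongly similar leaf in the other subtree."""
    def linked(xs, ys):
        return sum(1 for x in xs if any(lsim(x, y) >= strong for y in ys))
    total = len(leaves_a) + len(leaves_b)
    if total == 0:
        return 0.0
    return (linked(leaves_a, leaves_b) + linked(leaves_b, leaves_a)) / total


def wsim(name_a, leaves_a, name_b, leaves_b, w_struct=0.5):
    """Weighted combination of structural and linguistic similarity."""
    return w_struct * ssim(leaves_a, leaves_b) + (1 - w_struct) * lsim(name_a, name_b)


def generate_mapping(schema_a, schema_b, accept=0.5):
    """Pair each element of schema_a with its best-scoring element of schema_b."""
    mapping = {}
    for na, la in schema_a.items():
        nb, score = max(
            ((nb, wsim(na, la, nb, lb)) for nb, lb in schema_b.items()),
            key=lambda pair: pair[1],
        )
        if score >= accept:
            mapping[na] = nb
    return mapping


# Inner elements from the slide's figure, flattened to their leaves.
po = {"POShipTo": ["City", "Street"], "POBillTo": ["City", "Street"]}
purchase_order = {"DeliverTo": ["City", "Street"], "InvoiceTo": ["City", "Street"]}
mapping = generate_mapping(po, purchase_order)
```

Because all four elements share identical leaves, structure alone cannot tell ShipTo from BillTo here; picking the right partner depends on a linguistic matcher that knows, for example, that "ship" and "deliver" are related, which plain string similarity does not.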
M3 = Merge(M1, M2, map)
• [Buneman, Davidson, Kosky, EDBT 92]
– Meta-model has aggregation & generalization only
– Do a union and collapse objects having the same name
– Fix-up step for inconsistencies created by merging
[Figure: merge example over objects W, X, Y, and Z with attribute a]
– Successive fixups lead to different results
– Batch them at the end, to produce a unique minimal result
• Now enrich the meta-model (containment, complex mappings) & merge semantics (conflicts, deletes)
A Formal Semantics for Model Mgt
• Use category theory for a data-model-independent characterization of models and mappings
• Models and their DBs are categories
• Model and data transformations are morphisms
• Mappings between models & data are functors
• Utility
– Define formal semantics for Match and Merge
– Explain when Match & Merge preserve constraints.
– Check that implementation satisfies the semantics
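• Goal – a mathematical semantics of MM algebra
[Figure: diagram labeled Categories, Functor, and Theory. Match and Merge relate the schemas Schm, Sch1, Sch2, and Sch12 through the arrows f, g, p, and q; Db takes each schema to its databases: Db(Sch1), Db(Sch2), and Db(Sch12).]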
Implementation Vision
[Figure: architecture diagram. Components shown: Generic Tools (Browser, Import/export, Scripting, Editors, Catalogs); Model-Driven UI Generator; Model Manager with the MM Meta-Model and the operations Match, Merge, Compose, Copy, Apply, …; Operation Specializations; Inferencing Engine; Object-Oriented Repository; OR Mapper; SQL DBMS. Sample models shown: a process model (Order Entry, Authorize Credit, Schedule Delivery, Bill Customer, Update Marketing, Inventory), an entity model (Customer, Order, Scheduled Delivery, Product, Salesperson), and a table view (cust, emp, dept) with a "select all" control.]
Related Work
• There’s a lot of it. Apply it to model management!
• Platforms – OODBs, datalog, deductive OODBs (Telos/ConceptBase, F-Logic)
• Inferencing on mappings – AQUV, description logic
• Transitive closure and recursive QP
• Differencing – text, trees, graphs
• Data translation – algebras, schema evolution
• Data integration – schema match, view generation
Summary
• Raise the level of abstraction of meta-data programming by using:
– models and mappings as objects
– an algebra that manipulates models and mappings on a generic meta-model
• Classical meta data problems can be expressed using this algebra
• Implementations of classic problems offer guidance on implementing the algebra
References
• http://www.research.microsoft.com/~philbe
• P. Bernstein, E. Rahm, “Data Warehouse Scenarios for Model Management”, ER 2000 Conference
• P. Bernstein, A. Levy, R. Pottinger, “A Vision for Management of Complex Models”, SIGMOD Record, Dec. 2000
• E. Rahm, P. Bernstein, “On Matching Schemas Automatically”, MSR Tech Report
• J. Madhavan, P. Bernstein, E. Rahm, “Generic Schema Matching with Cupid”, VLDB 2001
• S. Alagić, P. Bernstein, “A Model Theory for Generic Schema Management”, DBPL 2001