21
SYSTEMATIC THOUGHT LEADERSHIP FOR INNOVATIVE BUSINESS QuickMig Semi-automatic Schema Matching for Data Migration Christian Drumm , SAP AG Matthias Schmitt, SAP AG Hong-Hai Do, SAP AG Erhard Rahm, University Leipzig 01.10.2007

QuickMig - uniroma1.itlenzerin/INFINT2007/material/Drumm.pdfChristian Drumm, SAP AG Matthias Schmitt, SAP AG Hong-Hai Do, SAP AG Erhard Rahm, University Leipzig 01.10.2007 Problem

  • Upload
    others

  • View
    10

  • Download
    0

Embed Size (px)

Citation preview

  • SYSTEMATIC THOUGHT LEADERSHIP FOR INNOVATIVE BUSINESS

    QuickMigSemi-automatic Schema Matchingfor Data Migration

    Christian Drumm, SAP AGMatthias Schmitt, SAP AGHong-Hai Do, SAP AGErhard Rahm, University Leipzig

    01.10.2007

  • Problem DescriptionData migration from unknown source to well-knowntarget systemData export schemas of legacy systems not designedfor interoperabilityDevelopment of mappings requires significant effort

    QuickMig:Overview

    © SAP 2007 / Page 2

    Automatic Schema MatchingCOMA++

    Used to evaluate schema based matching approachesEnables flexible combination of different matchingapproaches

    Schema-based matching approaches not applicableto data migration use case

    SolutionDevelopment of new matching algorithmsDevelopment of a new methodologyCombination of new approaches with existingCOMA++ functionalityPrototype implementation including evaluation of newapproach

  • © SAP 2007 / Page 3

    1. QuickMig1.1. Data Migration1.2. New Concepts1.3. Evaluation

    2. Open Research Questions

    Agenda

  • Data Migration:Scenario

    ScenarioData migration for „SAP Business ByDesign“

    SAP's comprehensive new mid market solutionFinancials, Customer Relationship Management, Human Resources Management,Supply Chain Management, Project Management, …

    Volume businessLegacy data migration within one week

    ChallengesCustomers lack the expertise to perform data migrationCustomers cannot afford expensive data migration projectsHigh diversity of possible source systems

    3rd party systemsCustom solutions

    Migration of any (unknown) source system to well-knowntargets system

  • Data Migration:Opportunities – Domain Knowledge

    Limited value of source schema informationData export schemas of legacy systems are not designed forinteroperabilityTechnical names, abbreviations, proprietary structures, codes,flat structures, …Low quality of meta-data (e.g. element documentation)

    Availability of domain knowledgeSoftware vendor has detailed knowledge about businesscapabilities and import interfaces of target systemCustomer has detailed knowledge about business capabilities ofsource system

    Utilize target system knowledgeLeverage business knowledge of source system

    Internal customerNumber

    First part of thecompany name

    ZIP Code of thePOBox

  • Scope of data migration projectsRich target system capabilities which can be easily configured to customers needsData Migration highly dependent on source system capabilities

    Accumulated mapping knowledgeSeveral mapping tasks in one data migration projecte.g. mapping of customer data, supplier data, purchase orders, …Schemas usually contain similar elements and sub-structurese.g. address data, bank data, currency code, …

    Flexibly reduce migration scope

    Data Migration:Opportunities – Scope

    Reuse previous mapping results

  • New Concepts:Schema Reduction

    GoalReduce complexity of matching taskSimplify verification of proposed mapping

    ApproachElectronic questionnaire to exploit source system capabilities and customer specific scopeEach question targets a specific capability of the target system

    Irrelevant parts of target schema can be removedAdditional simplifications (e.g. cardinality, time dependency, …)

    Mappings between original schema and reduced schema can be derived automatically

    CustomerNameAddress [1..1]

    StreetCity

    CustomerNameAddress [1..n]

    StreetCity

    BankDetails [1..n]AccountNumberAccountHolder

    Shall bank detailsbe migrated?Yes No

    Shall multipleaddresses bymigrated?Yes No

  • New Concepts:Sample Data

    GoalUse instance data for schema matchingAchieve high precision value (=1.0)

    ApproachPre-deliver specially prepared instance data for target structure

    For elements with explicit business content only (no technical elements, no codes, …)Highly selective content for every element (no identical values, no substrings, …)

    Enforce identical instance in source systemSample instance is provided in business terms (PDF, Excel, Email, …)Customer can easily create sample instance via standard business user interface

  • New Concepts:Domain Knowledge

    GoalUtilize detailed target system knowledgeAchieve higher recall value

    ApproachModel domain knowledge

    Alternative information (about data formats and common standards)– Different date and time formats– Phone Number, Street and House NumberAdditional information (about target system)– Code lists including texts describing the code value– Default values for required fields (context dependent)– Query attributes for resolution of foreign key associations

    Provide additional instance based matchers exploiting domain knowledge

  • New Concepts:Reuse

    GoalReuse previous mapping results within one data migration projectAchieve higher recall value

    ApproachProvide mapping between similar structures of the target systemMatch corresponding source structuresCompose required mapping by reusing previously developed correct mapping

  • New Concepts:Mapping Categories

    GoalDetermine executable mapping code (beyond simple “Move”)

    ApproachDevelop a list of semantic mapping categories covering all requiredtransformationsExtend matchers to enable determination of mapping categories byexploiting sample data and domain knowledgeExtend “Reuse” matcher to support composition of mappingcategoriesDerive mapping code in runtime system

    Generation of mapping code (“Move”, “Concatenate”, …)Generation of code templates (“Value Mapping”, “Internal ID”, …)Manual implementation (“Complex”)

    Mapping CategoriesCreate Instance

    Key Mapping

    Internal ID

    Look Up

    Move

    Value Mapping

    Default Value

    Split

    Concatenate

    Complex

  • Evaluation:Schema Reduction

    EvaluationSchemas used for evaluation* Target schema after reduction

    Scenario # of ElementsTarget Schema 4639

    SAP R/3 4.0 953

    SAP ERP 2150

    SAP B1 480

    Scenario # of Elements

    Target Schema 4639

    SAP R/3 4.0 645

    SAP ERP 612

    SAP B1 639

    Resulting target schemas differ largely depending on source systemSignificant reduction of target schema possible

    * Evaluation data sets are available at: https://www.sdn.sap.com/irj/sdn/weblogs?blog=/pub/wlg/7008

    https://www.sdn.sap.com/irj/sdn/weblogs?blog=/pub/wlg/7008

  • Evaluation:Sample Data

    EvaluationPurely schema based matching Sample data based matching

    Schema-based matching approaches not applicableSample data matcher achieves precision of 1

    Scenario Prec. Recall F-Meas.SAP R/3 4.0 1 0.27 0.43

    SAP ERP 1 0.38 0.55

    SAP B1 1 0.58 0.70

    Average 1 0.41 0.56

    Scenario Prec. Recall F-Meas.SAP R/3 4.0 0.00 0.00 0.00

    SAP ERP 0.40 0.04 0.08

    SAP B1 0.31 0.18 0.23

    Average 0.23 0.07 0.10

    Improvement of recall required

  • Evaluation:Domain Knowledge

    EvaluationSample data based matching Combination of domain knowledge

    and sample data

    Utilization of domain knowledge significantly improves recallMatcher still achieves precision of 1

    Scenario Prec. Recall F-Meas.SAP R/3 4.0 1 0.50 0.66

    SAP ERP 1 0.49 0.65

    SAP B1 1 0.65 0.79

    Average 1 0.55 0.70

    Scenario Prec. Recall F-Meas.SAP R/3 4.0 1 0.27 0.43

    SAP ERP 1 0.38 0.55

    SAP B1 1 0.58 0.70

    Average 1 0.41 0.56

  • Evaluation:Reuse

    EvaluationReuse matcher Combination of reuse, sample

    data and domain knowledge

    Reuse matcher achieves mediocre precision and recall valuesReuse matcher and sample data determine complementary matches

    Scenario Prec. Recall F-Meas.SAP R/3 4.0 0.99 0.67 0.80

    SAP ERP 0.99 0.70 0.82

    SAP B1 1 0.80 0.89

    Average 0.99 0.72 0.84

    Scenario Prec. Recall F-Meas.SAP R/3 4.0 0.80 0.50 0.61

    SAP ERP 0.81 0.43 0.56

    SAP B1 1 0.80 0.89

    Average 0.87 0.58 0.69

    Combination of all approaches delivers very good results

  • Evaluation:Mapping Categories

    Evaluation97% of mapping categories identified correctlyAutomatic code generation for more than 40% ofthe matches possible („Move“ and „Split“)

    Significant reduction of manual implementation effort possible

    Mapping Categories OccurrenceCreate Instance 0.07

    Key Mapping 0.02

    Internal ID 0.02

    Look Up 0.13

    Move 0.36

    Value Mapping 0.14

    Default Value 0.02

    Split 0.07

    Concatenate 0.00

    Complex 0.15

  • © SAP 2007 / Page 17

    1. SAP Research2. QuickMig3. Open Research Questions

    Agenda

  • Open Research Questions

    1. How can mapping code be generated automatically based on mappingcategories?

    Obvious in case of „Move“What about more complex categoriesTemplates my allready be very helpful

    2. How can (useful) user input be aquired to support the automatic mappingprocess?

    Sample data just a first stepImportant that user input is aquired non-intrusivelyWhere is the sweet-spot?

    3. How to handle codes and code listsSample data not helpful due to large number of similar valuesUnknown codes and code lists

    © SAP 2007 / Page 18

  • © SAP 2007 / Page 19

    Thank you!

  • COMA++http://dbs.uni-leipzig.de/de/Research/coma.html

    QuickMigCIKM 2007 Paper: http://dbs.uni-leipzig.de/file/QuickMig-cikm07.pdfEvaluation data sets: https://www.sdn.sap.com/irj/sdn/weblogs?blog=/pub/wlg/7008

    Contact DetailsChristian Drumm [email protected] Do [email protected]

    QuickMig:Resources

    http://dbs.uni-leipzig.de/de/Research/coma.htmlhttp://dbs.uni-leipzig.de/file/QuickMig-cikm07.pdfhttps://www.sdn.sap.com/irj/sdn/weblogs?blog=/pub/wlg/7008mailto:[email protected]:[email protected]

  • © SAP 2007 / Page 21

    Copyright 2007 SAP AGAll rights reserved

    No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changedwithout prior notice.Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.SAP, R/3, mySAP, mySAP.com, xApps, xApp, SAP NetWeaver, Duet, Business ByDesign, ByDesign, PartnerEdge and other SAP products and services mentioned herein as well as theirrespective logos are trademarks or registered trademarks of SAP AG in Germany and in several other countries all over the world. All other product and service names mentioned andassociated logos displayed are the trademarks of their respective companies. Data contained in this document serves informational purposes only. National product specifications may vary.

    The information in this document is proprietary to SAP. This document is a preliminary version and not subject to your license agreement or any other agreement with SAP. This documentcontains only intended strategies, developments, and functionalities of the SAP® product and is not intended to be binding upon SAP to any particular course of business, product strategy,and/or development. SAP assumes no responsibility for errors or omissions in this document. SAP does not warrant the accuracy or completeness of the information, text, graphics, links, orother items contained within this material. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties ofmerchantability, fitness for a particular purpose, or non-infringement.SAP shall have no liability for damages of any kind including without limitation direct, special, indirect, or consequential damages that may result from the use of these materials. This limitationshall not apply in cases of intent or gross negligence.The statutory liability for personal injury and defective products is not affected. SAP has no control over the information that you may access through the use of hot links contained in thesematerials and does not endorse your use of third-party Web pages nor provide any warranty whatsoever relating to third-party Web pages

    Weitergabe und Vervielfältigung dieser Publikation oder von Teilen daraus sind, zu welchem Zweck und in welcher Form auch immer, ohne die ausdrückliche schriftliche Genehmigung durchSAP AG nicht gestattet. In dieser Publikation enthaltene Informationen können ohne vorherige Ankündigung geändert werden.Einige von der SAP AG und deren Vertriebspartnern vertriebene Softwareprodukte können Softwarekomponenten umfassen, die Eigentum anderer Softwarehersteller sind.SAP, R/3, mySAP, mySAP.com, xApps, xApp, SAP NetWeaver, Duet, Business ByDesign, ByDesign, PartnerEdge und andere in diesem Dokument erwähnte SAP-Produkte und Servicessowie die dazugehörigen Logos sind Marken oder eingetragene Marken der SAP AG in Deutschland und in mehreren anderen Ländern weltweit. Alle anderen in diesem Dokument erwähntenNamen von Produkten und Services sowie die damit verbundenen Firmenlogos sind Marken der jeweiligen Unternehmen. Die Angaben im Text sind unverbindlich und dienen lediglich zuInformationszwecken. Produkte können länderspezifische Unterschiede aufweisen.

    Die in diesem Dokument enthaltenen Informationen sind Eigentum von SAP. Dieses Dokument ist eine Vorabversion und unterliegt nicht Ihrer Lizenzvereinbarung oder einer anderenVereinbarung mit SAP. Dieses Dokument enthält nur vorgesehene Strategien, Entwicklungen und Funktionen des SAP®-Produkts und ist für SAP nicht bindend, einen bestimmtenGeschäftsweg, eine Produktstrategie bzw. -entwicklung einzuschlagen. SAP übernimmt keine Verantwortung für Fehler oder Auslassungen in diesen Materialien. SAP garantiert nicht dieRichtigkeit oder Vollständigkeit der Informationen, Texte, Grafiken, Links oder anderer in diesen Materialien enthaltenen Elemente. Diese Publikation wird ohne jegliche Gewähr, wederausdrücklich noch stillschweigend, bereitgestellt. Dies gilt u. a., aber nicht ausschließlich, hinsichtlich der Gewährleistung der Marktgängigkeit und der Eignung für einen bestimmten Zwecksowie für die Gewährleistung der Nichtverletzung geltenden Rechts.SAP übernimmt keine Haftung für Schäden jeglicher Art, einschließlich und ohne Einschränkung für direkte, spezielle, indirekte oder Folgeschäden im Zusammenhang mit der Verwendungdieser Unterlagen. Diese Einschränkung gilt nicht bei Vorsatz oder grober Fahrlässigkeit.Die gesetzliche Haftung bei Personenschäden oder die Produkthaftung bleibt unberührt. Die Informationen, auf die Sie möglicherweise über die in diesem Material enthaltenen Hotlinkszugreifen, unterliegen nicht dem Einfluss von SAP, und SAP unterstützt nicht die Nutzung von Internetseiten Dritter durch Sie und gibt keinerlei Gewährleistungen oder Zusagen überInternetseiten Dritter ab.Alle Rechte vorbehalten.