16
Proceedings of 24th Int. Conf. on Conceptual Modeling, ER’2005 Klagenfurt, Austria, Oct. 2005 How to Tame a Very Large ER Diagram (using Link Analysis and Force-Directed Drawing Algorithms) Yannis Tzitzikas 1 and Jean-Luc Hainaut 2 1 University of Crete and FORTH-ICS, Heraklion, Greece 2 Institut d’Informatique, University of Namur (F.U.N.D.P.), Belgium email: [email protected], [email protected] Abstract. Understanding a large schema without the assistance of per- sons already familiar with it (and its associated applications), is a hard and very time consuming task that occurs very frequently in reverse engi- neering and in information integration. In this paper we describe a novel method that can aid the understanding and the visualization of very large ER diagrams that is inspired by the link analysis techniques that are used in Web Searching. Specifically, this method takes as input an ER diagram and returns a smaller (top-k) diagram that consists of the major entity and relationship types of the initial diagram. Concerning the drawing of the resulting top-k graphs in the 2D space, we propose a force-directed placement algorithm especially adapted for ER diagrams. Specifically, we describe and analyze experimentally two different force models and various configurations. The experimental evaluation on large diagrams of real world applications proved the effectiveness of this tech- nique. 1 Prologue It has been recognized long ago that the usefulness of conceptual diagrams (e.g. ER/UML diagrams) degrades rapidly as they grow in size. Understanding a large schema without the assistance of persons already familiar with it, is of- ten a nightmare. Unfortunately, large conceptual schemas are becoming more and more frequent. The integration of information systems, the development or reverse engineering of large systems, the usage of ERP (the SAP database in- cludes 30.000 tables) and the development of the Semantic Web (structured into ontologies potentially including dozens of thousands of classes) naturally lead to the building of very large schemas. Although a good drawing of a concep- tual schema could aid its understanding, and several approaches for automatic placement have been already proposed (e.g. see [33, 23, 8, 28, 10]), it is a widely accepted opinion that the automatic layout facilities offered by current UML- based CASE tools are not satisfactory even for very small diagrams (for more 1

How to Tame a Very Large ER Diagram (using Link Analysis ...tzitzik/publications/Tzitzikas_ER_2005.pdf · reverse engineering of large systems, the usage of ERP (the SAP database

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: How to Tame a Very Large ER Diagram (using Link Analysis ...tzitzik/publications/Tzitzikas_ER_2005.pdf · reverse engineering of large systems, the usage of ERP (the SAP database

Proceedings of 24th Int. Conf. on Conceptual Modeling, ER’2005

Klagenfurt, Austria, Oct. 2005

How to Tame a Very Large ER Diagram(using Link Analysis and Force-Directed

Drawing Algorithms)

Yannis Tzitzikas1 and Jean-Luc Hainaut2

1 University of Crete and FORTH-ICS, Heraklion, Greece2 Institut d’Informatique, University of Namur (F.U.N.D.P.), Belgium

email: [email protected], [email protected]

Abstract. Understanding a large schema without the assistance of per-sons already familiar with it (and its associated applications), is a hardand very time consuming task that occurs very frequently in reverse engi-neering and in information integration. In this paper we describe a novelmethod that can aid the understanding and the visualization of verylarge ER diagrams that is inspired by the link analysis techniques thatare used in Web Searching. Specifically, this method takes as input anER diagram and returns a smaller (top-k) diagram that consists of themajor entity and relationship types of the initial diagram. Concerningthe drawing of the resulting top-k graphs in the 2D space, we propose aforce-directed placement algorithm especially adapted for ER diagrams.Specifically, we describe and analyze experimentally two different forcemodels and various configurations. The experimental evaluation on largediagrams of real world applications proved the effectiveness of this tech-nique.

1 Prologue

It has been recognized long ago that the usefulness of conceptual diagrams (e.g.ER/UML diagrams) degrades rapidly as they grow in size. Understanding alarge schema without the assistance of persons already familiar with it, is of-ten a nightmare. Unfortunately, large conceptual schemas are becoming moreand more frequent. The integration of information systems, the development orreverse engineering of large systems, the usage of ERP (the SAP database in-cludes 30.000 tables) and the development of the Semantic Web (structured intoontologies potentially including dozens of thousands of classes) naturally leadto the building of very large schemas. Although a good drawing of a concep-tual schema could aid its understanding, and several approaches for automaticplacement have been already proposed (e.g. see [33, 23, 8, 28, 10]), it is a widelyaccepted opinion that the automatic layout facilities offered by current UML-based CASE tools are not satisfactory even for very small diagrams (for more

1

Page 2: How to Tame a Very Large ER Diagram (using Link Analysis ...tzitzik/publications/Tzitzikas_ER_2005.pdf · reverse engineering of large systems, the usage of ERP (the SAP database

see [14])1. Consequently, the vast majority of layouts created today are done”by hand”; a human designer makes most, if not all, of the decisions about theposition of the objects to be presented [26]. The visualization and drawing oflarge conceptual graphs is even less explored. The classical hierarchical decom-position techniques that are used for visualizing large plain graphs (for a surveysee Chapter 3 of [29]), have not been applied or tested on conceptual graphs.Consequently, only manual collapsing mechanisms (like those described in [22])are currently available for decreasing the visual clutter and for aiding the under-standing of big conceptual graphs. In addition, the techniques that have havebeen proposed for reducing the size of a conceptual graph (in order to aid itscomprehension), specifically ER clustering, either require human input [15, 34,18, 7], or they are automated but not tested on large conceptual schemas [30, 1].

We decided to devise an automatic technique for identifying the major entityand relationship types of a very large conceptual graph as a means for facilitat-ing its understanding. As Link Analysis has been proved very successful in WebSearching [6, 25] and recently in several other application domains [19, 20], wedecided to design a similar in spirit technique for one very common and impor-tant kind of conceptual schemas, namely Entity-Relationship (ER) diagrams [9].Concerning the drawing in the 2D space of the resulting top-k graphs, we de-scribe a force-directed placement algorithm especially adapted for ER diagrams.Specifically, we describe and analyze experimentally two different force modelsand various configurations.

Both of the techniques that are presented in this paper can be applied notonly to ER diagrams, but also on other kinds of conceptual graphs. The Seman-tic Web is one interesting application area because it is founded on ontologies(potentially including dozens of thousands of classes) that are exchanged in alayout-missing format. In this context, the provision of top-k diagrams and au-tomatic layout services is very important as they can aid understanding thatis very important for accomplishing tasks like semantic annotation, creation ofontology mappings, ontology specialization, etc.

This paper is structured as follows: Section 2 describes Link Analysis forER diagrams, Section 3 introduces force-directed placement algorithms for ERdiagrams and finally, Section 4 concludes this paper.

2 Link Analysis for ER Diagrams

Our objective here is to identify the major entity and relationship types of avery large ER diagram in order to facilitate its understanding. We designeda PageRank [6] style scoring method because PageRank is described in termson the entire Web, while HITS [25] is mainly applied on small collections ofpages (say those retrieved in response to a query). We view an ER diagram as atriple (E,R, I) where E = {e1, ..., eN} denotes the entity types, R = {r1, ..., rm}denotes the relationship types, and I the isA relationships over E (i.e. I ⊆1 General graph drawings algorithms (e.g. see [2]) usually make some assumptions

that are not always valid in conceptual graphs.

2

Page 3: How to Tame a Very Large ER Diagram (using Link Analysis ...tzitzik/publications/Tzitzikas_ER_2005.pdf · reverse engineering of large systems, the usage of ERP (the SAP database

E × E). For any given e ∈ E, we shall use connR(e) to denote those entitytypes that are connected with e through relationship types, connsb(e) denotethe direct subtypes of e, and connsp(e) the direct supertypes of e. We shall alsouse the following shorthands: connI(e) = connsb(e) ∪ connsp(e) and conn(e) =connR(e) ∪ connI(e). Since two entity types may be connected with more thanone relationship types we consider connR(e) as a bag for being able to recordduplicates. In addition, we shall use attrs(e) to denote the attributes of an entitytype e.

Now the score (or EntityRank) of an entity type e in E, denoted Sc(e), canbe defined as follows:

Sc(e) = q/N + (1− q) ∗∑

e′∈connR(e)

Sc(e′)|connR(e′)| (1)

where q stands for a constant less than 1 (e.g. 0.15 as in the case of Google)2. Onecan easily see that the above formula simulates a random walk in the schema.Under this view each relationship type is viewed as a bidirectional transitionand the probability of randomly jumping to an entity type is the same for allentity types (i.e. q/N). The resulting scores of the entity types correspond tothe stationary probabilities of the Markov chain.

A rising question here is how we can incorporate n-ary (n > 2) relationshiptypes into the aforementioned model. This can be achieved by replacing each n-ary relationship type (n > 2) over n entity types e1, ..., en by n(n− 1)/2 binaryrelationship types that form a complete graph3 over e1, ..., en. Consequently, ann-ary relationship type is viewed as n(n− 1)/2 binary relationships.

An alternative approach is to assume that the probability of jumping to arandom entity is not the same for all entities, but it depends on the number ofits attributes. In this case we define the score (or BEntityRank) of an entity typee in E as follows:

Sc(e) = q|attrs(e)||Attr| + (1− q) ·

e′∈connR(e)

Sc(e′)|connR(e′)| (2)

where Attr denotes the set of all attributes of all entity types (i.e. Attr =∪{ attrs(e) | e ∈ E}). This particular formula simulates a user navigating ran-domly in the schema who jumps to a random entity e with probability q|attrs(e)|

|Attr|or follows a random relationship type (on the current entity). The probability|attrs(e)||Attr| corresponds to the probability of selecting e by clicking randomly on a

list that enumerates the attributes of all entity types of the schema.The linear algebra version of EntityRank and BEntityRank is given in Ap-

pendix A.

2 If we set q = 0.15 or below then an iterative method for computing the scores (e.g.the Jacobi method) requires at most 100 iterations to convergence.

3 A complete graph is a graph in which each pair of graph vertices is connected by anedge.

3

Page 4: How to Tame a Very Large ER Diagram (using Link Analysis ...tzitzik/publications/Tzitzikas_ER_2005.pdf · reverse engineering of large systems, the usage of ERP (the SAP database

Let’s now discuss the differences between link analysis for ER diagrams andlink analysis for the Web. Firstly, Web links are directed, while relationshiptypes are not directed thus the latter are considered as bidirectional transitions.Secondly, we do not collapse all relationships types between two entity typesinto one (as it is done with Web links), and this is the reason why we considerconnR as a bag. Thirdly, in ER diagrams we should count ”self hyperlinks”(i.e. cyclic relationship types), although the Web techniques ignore them. Forexample, consider a schema consisting of two entity types {Person, City} andtwo relationship types {Person lives City, Person fatherOf Person}. If weignore the relationship type fatherOf , then both Person and City would beequally scored, a not so good choice. At last, although in the Web link analysisis exploited mainly for ranking the results of retrieval queries, in our case wedon’t need just a ranked list of entity types, but rather another diagram thatconsists of the major entity and relationship types.

We have implemented and evaluated the above scoring schemes into theDB− MAIN CASE tool (for more see [17, 21]). The designer provides a thresholdper between 0 and 100. Subsequently, all entity types with score lower thanper% ∗ ScMax, where ScMax denotes the highest score, disappear. Controllingthe visibility of entity types according to their score, and not according to theirrank, is preferred as it better handles ties. Concerning relationship types, onlythose that connect the visible entity types are displayed. The computation of thescores takes only some seconds on a conventional PC. Specifically, to compute thescores we use the Jacobi iterative algorithm. We have noticed that 50 iterationsgive quite stable orderings and their application on schemas with 1000 entityand relationship types takes less than 2 seconds in a conventional PC.0-N1-N Ve nte à cr édit valeur mensualité e n de vise innombre me nsua lité svaleur a compte e n devise intertaeg ma ximumtaeg rée lnombre ma x terme s autorisévaleur mensualité e n de vise exvaleur a compte e n devise exte r 0-N1-NTYPE D'INTERVENTION PAR DOSSIE0-N 1-1tva/ec h 0-NTRANSPORTEUR PAR CANAL

0-10-N tra duction codes/type emballag 1-1tpd/tpdliv0-Ntype de top1-1 0-Ntpbsta/toptyp 1-1 0-Ntpbmtf/tpbsta 1-1top/a ction1-1 tpb/tpbsta0-N1-1 0-Ntoptyp/tpc 0-N 1-1toptyp/tpbcli

1-1 0-N 1-1support du loge ment0-N1-N 0-N0-N SUIVRE LE DOSSIER HeureTexte suivi dos sie r SAV 0-N1-N SUIVI OPERATRICECode oper atriceCode type de délai de livraisocode type de relaisCode type de suivi opératriceCumul commandeCumul ligneCumul temps d'e ncodageCumul vale ur catalogueCumul vale ur nette0-N0-N0-N0-N0-NSUIVI DES TOPS CLIENT Nombre de Clients date de demande 0-N 0-N Suivi des jets 1-10-N0-N SUIVI COMMANDESNuméro de sé que nce suivi c ommaEtat pour syste me de mesure1-10-N suivi bnb 0-N 0-N0-NSUIVI ACTIVITECode applicationCode mouve mentCode type mouve mentsCumul c ommandeCumul c olisCumul a rtic leCumul valeur ca talogue 1-N1-N Stat par canalNumé ro de niveau de statistiqu1-N1-N STAT FOURNISSEUR MENSUELLESCode mois anne eNombre d'articles réc eptionnésChiffre d'affa ire e n deviseChiffre d'affa ireChiffre d'affa ire prix de revi0-N 0-N 0-N

0-N servic es repris Cout service à la clie ntè le inCout service à la clie ntè le1-N0-N services proposés quantité se rviceTop modification logistique0-NSERVICE PAYSCout du servic e interneCout du servic e ext erneTa ux de c hange 0-N 0-1Rés ervé e à cette ac tivité0-N0-Nrépartition/tvaTaux de répartition de la tva 0-N 1-1revalorise r 0-N re pre sente0-N re mplac e0-1e st re mplac e0-1Remplac ement0-N 0-N 0-Nda te de fin de validité0-Ndate dé but de validité0-N remise fin anné e fournisseurtaux de ris tournetype de ristourneLibe llé ristourne fournisse ur0-N 0-NREFERENCE CONTIENTnombre de piles 1-N1-1rcn/rcnlgn

0-N1-N QUANTITE EN STOCKStoc k thé orique1-1 0-Nprxre f/prxre fcat1-1 1-Nprxre f/prxpos 1-10-Nprxraycat/prxrefc at 1-1 0-Nprxray/prxref1-1 0-Nprxray/prxrayca t1-11-Nprxposca t/prxrefcat0-N1-1prxposcat/prxposexe1-10-Nprxpos/prxposcat0-N 1-1prxpgec at/prxrefcat0-N 1-1prxpayc at/prxrefca t 1-10-N prxpaycat/prxra yca t1-10-N prxpa yca t/prxpgec at1-1 0-Nprxcat /prxpaycat

0-N 0-NPROMOTION D'UNE COMMANDERéponse a u concoursNombre de promotionsCode é tatValeur promotion interneValeur promotion e xterne0-N0-N PRODUIT/CATEGORIE Nombre d'articlesTop information 0-N 0-1prmrsv/sai0-1prmrsv/prmtyp concours0-10-10-1prm/prmrsv0-N 1-NPrix de ve nte/pays Valeur ca talogue de l'a rtic le Valeur hors tva de l'article (Valeur tva de l'a rtic le (interValeur tva de l'a rtic le (e xterTaux de TVATop TVA dans prix cata logue 0-N0-N0-N POSITIONS/ACTIVITE/MOISquantité vendue par mois0-1 1-NPOSITION photo a pre ndre 0-N 0-NPOS-SEMnombre d'ar ticle par sema ine0-N0-N POS-DTEnbre art. disponiblesnbre art. suspensnbre art. autre stockencours cde frn 0-N0-N pondération taux repa rtition tai/clrtaux repa rtition clr/reftaux pondé ration long te rme ex1-N 0-N0-N 0-N pondérationtaux re partition tai/clrtaux re partition clr/reftaux pondéra tion long terme ex0-N 0-N0-N Poids max/c ana lPoids maximum du colisPoids minimum du colis1-N1-1pile, famille pile

0-N 1-1pays/quartier0-N 0-N0-NPAYS ENSEIGNE LANGUE 1-1pays du fournisseur1-1pays de provenanc e 1-1pays de la devise1-10-1 pays1-1pa y/tpbsta0-N 1-1pay/tpbcli0-N 1-1pay/sopruepays de vente1-1 0-N 0-N 0-N PARAMETRER SIMULATION/SECTEUR Déc alage envoi/r éception 0-Ntva par dé faut1-Npa r défaut code tva 0-NPAR défautCode TVA 1-1 0-Nop/sai0-N0-N 1-1 0-Ndescription0-1décrire0-NNotice 0-N 0-N0-N 0-N0-N 0-1Mouve ment de stock Numéro de mouve mentNombre de pièc esDa te du mouvementPrix ac hat sta nda rdPrix de vente de ba seTop texte associé0-N 1-Nmode livraison/pa ys ech 0-N 0-NMise en logeme ntQuantité entrée en stockCode étatCode article vpc1-11-10-10-1Da te de de pot1-10-N0-N 0-N Carte0-1sa che t1-1LOTAffra nchissement unita ire 1-N0-N1-1ligne0-1 0-Nlien e ch/pos 0-Nla ngue1-10-N lan/tpbsta 0-N 1-1lan/tpbc li 1-10-N la n/soprue0-N 1-1lan/qrt 0-N 1-1je t/tpbc li0-N individu/pays0-N 0-N0-N 0-N0-N 1-1implanté 0-N implanter le s ta illesTaille impla ntée 1-N0-N0-N habitation/pa ys1-1 0-Ngere 1-10-N frnpubptc/mrq 1-N0-N 1-1frn/rc nlgn1-N 1-1frn/pay0-N0-N 0-NFRAIS PAR DEFAUTCode fraisMontant des frais de port inteMontant des frais de port exteTa ux de c hange 0-N 1-N0-N 1-NFRAIS D'EXPEDITIONPoids maximum frais d'expéditiPoids minimum frais e xpéditionVa le ur frais d'e xpédition1-1fournisseur/pays 0-N 1-1fournisseur/langue1-1 fia /ori 0-N 1-1facture / rec ept ion 1-1 0-Nfac ture / commande 1-N

0-N E 0-N1-11-N e ta/etatyp 0-N1-1 e st stocké0-N 0-Nest réc eptionnéquantité livré e réellequantité livré e non conformeeta t lv 1-10-N est prese nt0-N1-1e st livré0-N1-1 e st livre 1-1 1-Nest géré0-N 1-1est emballé 1-1 1-N est commandé par0-N 1-1est commandé 1-10-N1-1e nvexd/lgt1-10-Nenve xd/envexdpra1-1 0-Nenvcop/jet0-N ECHANTILLON/CATEGORIENombre d'articles0-N0-N ECHANTILLON CONTIENT nombre de piles 0-N0-1e ch/photo a pre ndre Ca deau a ssocié0-N 0-N0-N 0-N EARLY BIRDNombre de cadeaux attribué s toNombre de cadeaux maximum totaNombre de cadeaux maximum/jourNombre de joursNuméro du jour en coursNombre de cadeaux du jour en c GRP-PRM#C0-NDET-PRM#C0-NDé taila rticle vpc ense mble0-Narticle vpc pièc e0-Ndécomposition d'un ense mblequantité dans ensemble 0-N dte3/tpbsta dte0-N dte 2/tpbstadte0-N dte/tpbstadtedate de réception1-1 0-Ndte /tpbsta0-Ndte/paytaux de c hange Date de dema nde1-10-N dte/je t0-N 1-1Document par langue 1-1dev/pntctl0-N 1-1date de réce ption 0-N1-11-Nc sa/csade t0-N0-Ncrédits possibles 0-N 0-Ncrédit repris/c atalogue0-NCREDIT PAR MONTANTPrix de vente compt ant externeVa le ur de l'ac ompte externeVa le ur de mensua lité externeNombre de mensua litésTaux a nnuel effect if global maTaux a nnuel global re el utilis0-N0-N1-1Correspond 1-11-10-N 1-1conc erne 0-N 0-N0-N com/lanlibe llé du c ommentaire1-1 colvatmsl/c olva tmsldet

1-11-Nc nd-cndtyp 1-N0-1 cn/rla1-Ncn/pnttyp0-N 1-1cn/pay 0-N 0-1cn/csa 1-10-1cn/controle goulotte1-1 0-Nc n/colsta 0-N1-1 cn/c ntyp0-Nc n/cnla n 0-N1-1c n/circuit0-N 1-1cn-cnd0-N 0-N0-NCIRCUIT1-1 c gp/tpbcli0-N c d/eta0-N0-1 c crtyp/envcop1-N c atref/svc Nombre de se rviceCout du se rvice à la clientèle 1-N1-1catpos pay/pay 1-10-N catpospay/dev

1-N0-N0-N CAT-PRD- SEMnombr e de référe nce c umuléeLong terme a ctualisélong terme e xtrapole0-N 0-NCAT-PRD-DTELong te rme actualisénombre de ré férence cumuléelong te rme pré vuta ux a vancement produit/c atalota ux c ouvertureta ux majorationta ux re tours produit 0-N0-1 1-1CANAL/ FERMETURE 0-N 1-1Canal livraison0-N 1-1Canal commande0-N 0-N 0-N0-Ncalcul goulotteBorne supé rie ure code postalBorne inférieure code postalcode tournée 0-N0-N 0-N 0-Nc alcul canalBorne supérieur e code postalBorne infé rie ure code postal 0-N 1-1cadeau a preleve rLibe llé a rt ic le à c ommuniquer 0-N 1-1Cadeau Sec teur ac tivite0-N1-11-N0-N

VENTE AU COMPTANTnombre me nsua lités ma ximum pouprix ve nte c omptant UNITE BUDGETAIRENuméro d'unité budgétaireLibelléTYPE STOCKCode type de stockLibellé type de stock TYPE PAIEMENT RECEPTION c ode type paieme nt réc eptionlibellé type pa ie ment réceptioTYPE INTERVENTIONCode type d'interve ntionLibellé type d'interve ntion SATYPE FRAIS D'EXPEDITIONCode type frais d'e xpéditionLibellé de frais d'expédition

TYPE ETATCode type éta tLibellé type éta tTYPE DOSSIER SAAVCode type dossie r SAAVLibellé du type dossier SAAVTYPE DE TOP PUBLICITAIRE TYPE DE TOP CLIENT TYPE DE CANALcode type de ca nallibe llé du type de canal[0-1]top c ana l de c ommande(o/n)[0-1]top c ana l de livraison(o/n)[0-1]top sa isie commandes[0-1]TYPE D'EMBALLAGEcode type e mballagelibellé TYPE CIRCULAIRECode type circula ireLibellé type circula ireCode ca talogue/cade auTop bon d'ac hatTop ristourneTop à payerTYPE CENTRE DISPATCHINGCode type ce ntr e dis patchingTYPE ARTICLE SCORINGCode type art/scoringLibellé TYPE ADRESSEc ode a dressedescription adre sseTVAc ode TVALibellé TVATa ux de tvaindice comptable de la tvaTop traite ment mise à jour TRANCHE CACA fournisseur de baseTRADUCTION DES CODEScode type de texteValeur de la cléTraductionTOP CLIENT Code Top ClientLibellé Top clie ntTope r/détoperTOMBOLA CHANCE ARTICLEnumé ro du billet SUPPORT DE STOCKAGECode support de stocka geLibe llé fra nca is du supportLongueur du supportLa rgeur du supportHauteur du supportPoids à videNombre de logements contigusNombre de logements opposés

SUIVI DES TOPS SANS COMMANDE Nombre de ClientsSTATISTIQUES DE CONSOMMATIONc ode sta tis tiques de consommatdésigna tion sta tistiques de co SERVICEc ode se rvicedésigna tion se rviceTop service dans prix a rtic le semaineannéesemaineSECTEUR ACTIVITECode secteur a ctivitélibellé secteur d'a ctivitécode enseigne Unité budgé taireCode de texte cibleCode texte stimula tionCode texte specialCode texte c alculSAISONCode saisonLibe llé de la saisonTop tra ite ment par fac turation

REVALORISATIONNuméro de reva lorisationDa te reva lorisationStock ava ntPrix avantQuantité livrée / fournisseurPrix fournisseurNouveau prix moyen pondé réRETOUR FOURNISSEURNuméro de réparationQuantité re tournée c hez le fouDate de cré ation ré pa rationDate d'envoi au fournisse ur

RESERVATION PROMOTION code promotionTexte de la promotion de reserDébut de va liditéFin de va liditéTop Sans commande pr évuAction pré vueRESERVATIONnuméro bon rés erva tionCode éta tnombre d'articles rése rvésnb articles commandés sur résepériode de rés erva tion : débutpériode de rés erva tion : fin

REGROUPEMENT STATISTIQUECode regroupementLibellé regroupeme ntREFERENCE DOUANECode réfé renc e douaneComme ntaire de la ré férence doCode unité complémenta ire douaLibe llé réfé renc e douaniè reLibe llé franca is réfé renc e dou REFERENCE ARTICLENuméro de ré férence articleLibellé franc ais de l'articleCode marque articleCode article enc ombrantCode article LDFCode tissuTop sans promotionCode plage d'attr ibution numé rTop produitType article pour promotionFournisseur pe rsonna lisationCode statistique de c onsommatiLibellé née rla nda is de l'articCode de familleCode taille groupéeTaille minimumCode mode d'e mploiNuméro de l'acheteurCode type d'e mballageTop desc ription te chniquetop fragileNombre de piece s dans le lottop emballa geTop texte pour conce ptionTop texte condition SAAVtop texte de contrôlenuméro d'a vis top instruc tion saisie cdetop desc ription du colistop instruc tion pour le pic kintop texte prix unitairetop texte pour transporteurtop texte pour show roomTop pré montageRefere nce produit decomposeRECEPTION ECHANTILLONnumé ro r éce ptiondate r éc eption rée llenom du firme trans porteurnumé ro plaque camionheure ar rivé e trans portheure dé part trans portnombre colis planifiénombre pale ttes planifiénombre colis ré elnombre pale ttes ré elstock???te xte réc eptioncode fourniss eurcode type articlenombre de ligne s de r éce pti onCode étatCode ge stion achatPoids taxableCubage (e n m3)Code type de ré ceptionDate mise e n stock de r éce ptiocode marquagecode type article r éce ptionnombre pers onne manute ntioncode de balleDate ar rivé e quai transpTop r endez vous RECEPTIONnumé ro réce ptiondate réce ption réelleDa te mise en stock de ré ceptionom du firme transporte urnumé ro plaque camionheure a rrivée transportheure dépa rt tra nsportnombre colis planifiénombre pa le ttes planifiénombre colis réelnombre pa le ttes réeltexte réc eptionnumé ro emplacement stock te mponombre de ligne s de réce ptionCode étatCode gestion achatPoids ta xableCubage (en m3)Code type de réc eptionc ode marquec ode type a rtic le ré ceptionnombre pe rsonne ma nut entionc ode deballeDa te arrivée quai transpPROPOSI TIONS D'ACHATNuméro de proposition ac hatType de proposition ac hatDa te initia le de livraisonDa te proposition d'achatHorizon en nombre de jourCode état PRODUIT TAILLEtaillelibe llé taille francaislibe llé taille en néerlandaislibe llé taille née rlandaisdescription te chniquePRODUIT FOURNISSEURréférence fournisse urlibe llé produit fournisse ur% frais de transpor t% droits de douane% commission d'agent% frais divers achat produitunité de mesurenombre article par boitequantité minimum à commandermontant minimum à c ommanderdélais de livraisontexte bon de commandenombre mois garantie sur piècenombre mois de gara ntie sur prtaux d'importationtype de fournisseurquantité / boite fournisseurtaux inspec tion PRODUIT COLORISlibellé francais coloris du prlibellé né erlandais du colorisPRIX:REFERENCE/CATALOGUEpage -seconda ire 1page -seconda ire 2page -seconda ire 3c ode tvataux de tva francaissur facetype d'offretype de prixtaux de pondera tionpage livre bla ncc ommenta iretop atop btop ctop dtop etop fquantite re alise e/pays ve ntequantite re alise fra ncec hiffre cata logue PRIX:RAYONrayonlibe lle rayonsecteur groupe PRIX:POSITIONS DU CATALOGUEprix de revie nt pre visionne l fprix de ve nte fra nca isprix de ve nte theoriqueprix de ve nte psyc holoqiqueprix de ve nte ret enuquantité réa lisée Francequantité réa lisée /pays ventequantité prévue Fra nceflag modif positionflag blocage des prixtype de prixcodificationPRIX:POSITIONc oloris ta illelibelle coloris libellé taillelibellé positionCode position mere PRIX:CATALOGUE-RAYONlibelle rayonse cteurgroupeCA TTC ré aliséCA hors taxe ré alisémarge brute hors taxe ré alisé ebf-prévisionnel calculébf-réaliséc a hors ta xe prévisionnelmarge brute hors taxe prévisiomultiplic ateur ff prévisionne lPRIX:CATALOGUE PAR PAYStaux de change taux de cessionpa rtie fixetaux de cession ldfpa rtie fixe LDFdiffusion action pa ys ve ntesaison pays venteCode devise PRIX:CATALOGUECode catalogue libelle catalogue ac tion pays d'originesaison pa ys d'origineZone action Zone de prix durée cata logue e n moisda te d'a jout PRIX: REFERENCEreferencelibe lle refe renc eref mere PRIX: PAGES DU CATALOGUEpage libe lle de la pageCA TTC réaliséCA hors taxe réaliséMarge brute hors taxe réaliséeCA hors taxe pré visionnelMarge brute hors taxe pré visio POSITION FOURNISSEURCode fournisseur de l'articlelibellé position/fournisse urtype fournisse urPILE/MARQUECode marqueCode pile dans la marquePILEN¦ Be batlibellé BebatCode format pileCode I.E.C. pilePoids de la pileHa uteur pileLongueur pileLargeur pile PARTICULARITE MOUVEMENTType de mouvementCode pa rtic ularité mouvementDate début validitéDate f in validitéLibellé pa rtic ularitépa ie ments des mensualitésNo séquentielType de recordMontant à payer (inte rne)Date du paiementMontant payé (inter ne)Libe llé paiementQui a liquidéPa rt des frais financiersPa rt de la tva dans la mensualPa rt à re mbourser (inte rne)Montant à payer (externe)Montant payé (externe)Pa rt à re mbourser (externe) OFFRE PRODUITnumé ro offredate début validitédate fin va liditémontant minimum à comma nde rquantité minimum à c ommanderdate de transmission par fournlibellé de l'offretype d'offre OFFRE POSITIONligne offreprix de base prix de l'offreprix d'a cha tunité de commande Prix de revient prévisionneltype d'offretaux ristourne 1taux ristourne 2taux ristourne 3taux ristourne 4taux ristourne 5frais d'emba llage numéro offre moisanné emoisMODE DE LIVRAISONc ode mode de livraisonlibelle mode de livraison MODE DE DEMANDE DE PERSOCode mode de demandeLibellé de mode de dema ndeMODE DE CREDITc ode c réditdésigna tion c rédit MARQUEcode marque LOGEMENTc ode logeme ntType de logementCode bloc ageCode état du logementCode pse udoniveauCode état d'inventaireDa te dernie r inve ntaireDa te inve ntaire - 1Da te inve ntaire - 2Da te inve ntaire - 3Da te inve ntaire - 4Top protec tion incendieCode type de prélèvementCode groupe de loge mentLIGNE RECEPTION ECHANTILLONnumero de l ignequantité the or iquedate fin livrai soncode e tat LIGNE RECEPTIONnuméro ligne réceptionnombre a rtic le s re ceptionnés cnombre a rtic le s ré ceptionnés ntype article livrécode état avanceme nt ligne ré ctop c ompte rendu réc eptiondate fin ré ceptionnom responsable contrôle qualinombre a rtic le s ré ception théotexte libre de réceptiontop marquage à la réce ptiontop r econditionne ment à la ré ccode secte ur d'a ctivitéLIGNE PROPOSITION D'ACHATNuméro ligne proposition d'achDate initia le de livraisonQua ntitée proposé eNombre minimum article à commaNuméro de l'offreQua ntité rée lle proposée LIGNE PHOTOnume ro de ligneNombre d'article s sur la photoc ode montage studiotop monta ge fait

LIGNE DE COLIS SANS ADRESSEIndic e de la lignecode origineRéférence articleLIEU DE LIVRAISONcode lie ulibe llé

LANGUEcode languelibe llé c ode languelibe lle langueTop la ngue pour le c lientJOURNAL DES VENTES

JOURLibellé du jourJETNuméro de je tFOURNISSEURcode fournisseurNom du fournisseurNuméro de TVAnuméro de registre de comme rceCode jour de livraison 1Top te xte libre fournisseurDélai de livraisonVa le ur minimum à commanderDate de dé but de vaca nce sDate de fin de vac anc estype de fournisseurCode jour de livraison 2Code condition d'envoiPoids minimum à commanderVolume minimum à comma nderNumero de l'a che teurDelai de confirmation bon de cnom fournisseurcode jour de livra isondélais de livraison maximumCode mode de paiementNuméro compta ble du fournisseuCode de viseTop livr aison sé paré eTop dema nde sé paré eTop fournisseur pe rsoCode mode de dema nde FILIALEc ode filialeLibellé filia leFERMETURE RELAIScode canalDate début ferme tureDate fin fermeture

FAMILLE PILEfamille Be batlibellé fa mille Beba t FACTURENuméro de factureValeur fa cture internec ode c analValeur fa cture e xterneFACTURATIONnume ro facturationligne de facturationnbre article fa cturéprix unita irec ommenta iree tat factura tionEXCEPTION ( multi)prése ntation chiffre ca taloguetop multipleprix revient previsionnel fra nprix de vente francaisquantité prévue fra ncecodifica tion ETAT D'AVANCEMENT (LIGNE OU COtype d'éta t[0-1]c ode é tatlibellé état[0-1]Libellé état en 3 cara ctère sLibellé état en 10 c aract èresNombre de jours de réte ntionCode refus ENVOIS EXEDENCE PREPARATI ONNombres d'articlesNombre de prélèveme ntsPrix de vente a exc edenceType d'envois e xce denceNuméro de sé lection exc édenceCode etat ENVOI EXCEDENCENombre d'articlesNombre de préleve mentsType d'e nvoi excéde ncecode étatnuméro de sélectionECHANTILLONnuméro échantillonlibellé née rla nda islibellé franc ais libellé fournisseurVolume d'a rtic le pour tr ansporpoidsnombre de colishauteur montélarge ur montélongue ur montéhauteur e mballélarge ur emba llélongue ur emba llévolume e mballécode mode d'emploitop fragiletailletop suiviprix achat articletop conformetop reponse courrie rtop ve nda bletop mode emploi conformetop emballa ge conformeunité compléme ntaireprix vente estiméTop attCode colori de l'articleTop test externeRefere nce c hez le fournisse urCode fournisseur de l'a rticle DOSSIER SAAVNuméro de dossier SAVDate du dos sie r SAVRéfére nce f ournisseur articleLibellé francais articleNuméro de page du docume ntCode saisonCode actionArticle cata logueNuméro de clie nt (interne)Code canalNuméro du colisDate de facturationNuméro de ré pa rationCode type d'a dresseCode étatPrix de vente de baseSe cteur d'ac tivitéNuméro lien a dresse courrierNuméro lien a dresse livraisonNuméro adresse si absentDISPONIBILITECode disponibilitéLibellé disponibilité

DETAIL DOSSIER SAAVNuméro de dossierNuméro de ligneTexte de la ligne dema nde clie ntdate de saisieCode é tattop circuitNumé ro de c ommande[0-1]Motifréponse subsidiaireCode t ype de saisieCode user de saisieDATE dateNuméro de sema ine CORRESPONDANT 3 SUISSEScn#cCONTROLE MOUVEMENTS PAIEMENTSNuméro contrôle mouveme ntDa te de réc eption bandeNombre de records sur bandeValeur totale paiement interneCode traitementDa te de paiementIdent ifica tion ba nde paiementUse r colisValeur totale paiement externeTop a rchivageValeur paiement anomalie interValeur paiement anomalie exte rCONTROLE CIRCUITCode circuitLibellé circuit CONTROLE CALCUL GOULOTTEc ode c analtop sec teur ac tivité goulottetop mode de réception goulottetop pays pour goulottec ode c anal déviation goulottec ode c anal déviation tourné eCONDITIONNEMENTCode conditionneme ntLibe llé c onditionneme ntCout unita ire COMPOSITION D'ENVOICode composition d'envoiLibellé composition d'envoiCommentaireDernier je tDate dernier jetTop envoi reta rdéDate d'envoi probableTop circuitAction à éditerValeur bon d'achat externeValeur minimale commande exterTaux de ristourne à édit erCOMPLEMENT DOSSI ER SAAVCode pa ysNom de l'a dresseNuméro de téléphoneLocalitéCode postalRue de l'adre sseNuméro de l'adre sseNuméro du colisDa te factureNuméro de répa rationNuméro de dossier COMMENTAIRENumé ro commentaireLibellé fra nca is commenta ireLibellé néerlandais comme ntairCOMMANDEnumero de commandedate livraison problabledate rende z-vousprix ac hat e n FBdate de mandequantité demandéeprix ac hat e n devisedate butoir pour lay-outtop fiche tec hniquetop tissutop boistaux de ris tourneetat cdenuméro de le ttreDate initiale de la livraisonCOLORIScode c olorislibellé coloris e n francaislibellé coloris e n né erlandais

COLIS SANS ADRESSESComposteurDate de retouropéra tric e d'ouverturePaysCode postalNuméro de localité Commenta ir e ou colis retrouvéCréate ur commentaireIndice supplémenta ireCENTRE DISPATCHINGCode ce ntr e dis patchingLibelle c entre dispatc hingDonné e e ntr eprise centre dispa CANALc ode c ana llibellé du ca nal[0-1]délai de livraison[0-1]top ca nal de commande(o/n)[0-1]top ca nal de livraison(o/n)[0-1]top ca nal de retour[0-1]type de mouvement de paiement[0-1]c ode blocage[0-1]c ode c ana l de déroutage [0-1]top saisie des comma ndes[0-1]nom du responsa ble du ca nal[0-1]numé ro de téléphone du canal[0-1]c ode c irc uit[0-1]c ode langue [0-1]c ode posta l[0-1]c ode du type de ma gasin[0-1]Nombre maximum de colisTop scission du colisTop attribution no livraisonType de bordereau de livraisonOrigine du mouve ment PARCode type centre dispatchingNuméro de code postal/localitéCode depotNombre de poste de travail/heuBICNumé ro de réfé rence de l'a rticCode filialeNumé ro BICLibellé du BIC ARTICLE VPCCode article VPClibellé francais article VPCLibellé né erlandais a rtic le VPCode coloris de l'articleDé la i de livra isonCode éditionCode exceptionLongueur emballé e de l'a rt ic lePoids de l'articleCode pli articlePrix ac hat standardVolume de l'a rtic lePrix de vente de baseUnité de ve nteNombre de colis pour l'articleTop tec hnique 1Top tec hnique 2Prix ac hat hors cessionCode article VPC interneCode article VPC externeQuantité minimum article pic kiquantité ré servé e 24 hCode taille de l'articleLargeur emballé e de l'art ic leHa uteur emballée de l'a rtic leLongueur montée de l'articleLargeur montée de l'articleHa uteur monté e de l'articleCode fournisseur LDFNombre minimum article stockCode frais Code type de remplace mentTop abandon de l'articleType de message sur bor derea uVolume article pour transportCode loge mentPrix vente soldeCode report campagne suivantTop article sur-stockTop modif prix soldetop article composé[0-1]Da te initia lisation statistiquACHETEURc ode a che teurNom acheteurgroupe achatc ode pays de provenance par dec ode pays de fabrication par d10 période s de cade nce mentnombre de jour de sécuriténumé ro de personnetaux de c ouverturePériode d'échelonnement[ 10-10]

Fig. 1. Excerpt (< 10%) of a large ER diagram drawn using a force-directed placementalgorithm

Figure 1 shows a very small part of the ER diagram of a Belgian distributioncompany. Though the schema comprises about 450 nodes and 800 edges only, the

4

Page 5: How to Tame a Very Large ER Diagram (using Link Analysis ...tzitzik/publications/Tzitzikas_ER_2005.pdf · reverse engineering of large systems, the usage of ERP (the SAP database

layout is definitely useless for understanding the schema and the correspondingapplication.

0-N

0-N

SUIVRE LE DOSSIER

1-N

0-N

SUIVI OPERATRICE

0-N

0-N

SUIVI LIVRAISON COLIS

0-N

0-N

SUIVI DES TOPS CLIENT

0-N

0-N

1-1

SUIVI COMMANDES

0-N

0-N

0-N

SUIVI ACTIVITE

1-N

1-N

STAT FOURNISSEUR MENSUELLES

0-N

0-N

simulation prix de vente

0-N

0-N

réexpédition colis

est remplace0-1 remplace

0-1

Remplacement

date début de validité0-N

date de fin de validité0-N

0-N

remise fin année fournisseur

0-N

0-N

POSITIONS/ACTIVITE/MOIS

0-N

0-NPOS-DTE

0-N

1-N

pondération

0-N

0-N

Poids max/canal

0-N

1-N

PAYS LANGUE

0-N

0-N

0-N

PAYS ENSEIGNE LANGUE

0-N

1-1

pays du fournisseur

0-N

1-1

pays de la devise

0-1

0-N

Pays d'origine

décrire0-N

description0-11-1

0-N

0-N

Notice

0-N

0-N

Mouvement de stock

0-N0-N

Date de depot1-1LOT

1-10-NLIVRAISON COLIS

1-N

1-1

HISTORIQUE COLIS

0-N0-NFRAIS PAR DEFAUT pays destination

0-N

pays d'origine0-N FRAIS DE DOUANE

1-1

0-N

fournisseur/pays

1-1

0-N

fournisseur/langue

0-N

0-N

FORMULE PAR PAYS/LANGUE

0-N

0-N

0-N

Cadeau associé0-N EARLY BIRD

article vpc pièce0-N

article vpc ensemble0-N

décomposition d'un ensemble

0-N

0-N

dte/pay

1-1

0-N

Document par langue

0-N

0-N

cumul mouvement systeme mesure

0-N

0-N

Composition

0-N

0-N

colcom/suicol

0-N

1-1

col/eta

1-10-N

cn/suicol

1-1

0-N

cn/pay

1-1

0-N

cd/suicol

0-1

0-N

cd/eta

1-1

0-N

Canal livraison

1-1

0-N

Canal commande

0-N

0-N

0-N

calcul goulotte0-N

0-N

0-N

calcul canal

1-1 0-Natvsec/cd

0-N

0-1

artvpc/frn

0-N

1-N

artref/tva

0-N

1-N

artref/douref

1-N

0-N

artref/atvsec

0-N

1-1

artref/artvpc0-N

0-N

0-N

0-N

ALIMENTER

1-1

0-N

achfrn/lan

SECTEUR ACTIVITE

REFERENCE ARTICLE

PAYS

LANGUE

FOURNISSEUR

ETAT D'AVANCEMENT (LIGNE OU CO

DATE

COMMANDE

COLISCANAL

ARTICLE VPC

Fig. 2. The diagram of the top-11 entity types of the schema of Figure 1

Now Figure 2 shows in micrography the diagram of the top-11 entity typesof the schema of Figure 1 according to EntityRank (for reasons of space, theattributes are not displayed in this figure). Although this schema has 54 rela-tionship types it is extremely more easy to visualize, and thus to understand,than the original schema. Of course, one user could start from even smaller di-agrams. For instance, Figure 3 shows the graph of the top-5 entity types of thesame schema. It indeed contains the major entity types of this application anda user can immediately understand the application domain of this schema.

In case of diagrams with big isA hierarchies, some entity types, although ma-jor, may not receive high scores because their relationship types are scatteredin several subentity types. To handle this case, we introduced a (optional) pre-processing step in which each isA hierarchy of the schema is collapsed into oneentity type that collects all the attributes and relationship types of its subentitytypes.

Concerning evaluation, at first we have to note that the evaluation of theeffectiveness of link analysis techniques for ER diagrams (for conceptual graphsin general), is more difficult than in the case of Web. In the latter case, it isnot so hard to judge whether the top ranked pages are indeed relevant to thesubmitted query. However, in the case of conceptual graphs, one has to know wellin advance the conceptual graph in order to judge whether the resulting smallgraph indeed contains the major concepts of the conceptual graph and of itsunderlying domain. Inevitably, the most reliable evaluation of such techniquescan be done only in already known conceptual graphs. For this reason we appliedthis method to almost every conceptual schema that the DBMAIN group hasproduced the last 3 years. This was a quite representative test bed as it includes

5

Page 6: How to Tame a Very Large ER Diagram (using Link Analysis ...tzitzik/publications/Tzitzikas_ER_2005.pdf · reverse engineering of large systems, the usage of ERP (the SAP database

0-N

0-N0-N

supplies

0-N0-N

Stock Movement

replaces0-1

replacedBy0-1

replacement

0-N

0-NPOSITIONS/ACTIVITIES/MONTHS

0-N

1-N

Operator

0-N

1-1description0-1 describes

0-N

Notice

0-N

0-N

Max Weight/Channel

0-N

0-N

0-N

follow-up activityBonus

0-N

0-N

0-N

EARLY BIRD

0-N 0-N

DefaultCost

partOf0-N

0-N

decomposition

origin0-N

destination0-N

Customs costs

0-N0-N

country language

0-N

1-1

cn/pay

0-N

0-N

0-N

channelCalculation

0-N 0-N

0-N

path Calculation0-N

0-N

accumulated movements

State of Progress

SectorOfActivity

Item

Country

Channel

Fig. 3. The diagram of the top-5 entity types of the schema of Figure 1

big schemas of existing (and non artificial) real world applications. We alwaysobtained surprisingly good results. For reasons of space we cannot report here theexact results of the evaluation of EntityRank (and BEntityRank) using metricscoming from the area of IR (for more [35]). In addition, it is an advantage thatthe proposed formulas for link analysis are mathematically founded and thatthe underlying model (the random walk model) is quite relevant to browsing,i.e. to the most widely used method for understanding a conceptual graph. Atlast, another evidence that link analysis is indeed appropriate for ER diagrams isthat all large schemas that we have tested have a small set of elements, usuallyless than 5% of the total ones, whose scores are significantly higher than therest. This at least indicates that big ER diagrams tend to have a well connectedkernel which, at least in our experiments, always comprised the more importantconcepts of the application domain.

3 Automatic ER Drawing

For drawing automatically the top-k ER diagrams that are derived by the pre-vious technique, we shall view them as mechanical systems. Below we presenttwo force models that combine the spring-model (proposed and developed in [13,24, 16]) with the magnetic-spring model (proposed in [32, 31]) in a way that isappropriate for ER diagrams.

3.1 Force Model A

Here entity types are viewed as equally charged particles which repel each other.Relationship types and isA relationships are viewed as springs that pull theiradjacent entity types. Moreover, we assume that the springs that correspond toisA links are all magnetized and that there is a global magnetic field that acts onthese springs. Specifically, this magnetic field is parallel (i.e. all magnetic forcesoperate in the same direction) and the isA springs are magnetized unidirection-ally, so they tend to align with the direction of the magnetic field, here upwards.Figure 4 illustrates this metaphor.

6

Page 7: How to Tame a Very Large ER Diagram (using Link Analysis ...tzitzik/publications/Tzitzikas_ER_2005.pdf · reverse engineering of large systems, the usage of ERP (the SAP database

��������

��������

��������

��������

Person CompanyworksAt

Manager Secretary

electrical repulsion

magnetic field

magnet

Fig. 4. Viewing an ER diagram as a mechanical system

Under the above force model, the force on a entity type ei is given by:

F (ei) =∑

ej∈conn(ei)

f(ej , ei) +∑

ej∈E,ej 6=ei

g(ej , ei) +∑

ej∈connI(ei)

h(ej , ei) (3)

where: f(ej , ei) is the force exerted on ei by the spring between ej and ei (notethat ei and ej are connected by a relationship type or an isA link), g(ej , ei) isthe electrical repulsion exerted on ei by the entity type ej , and h(ej , ei) is therotational force exerted on ei by the entity type ej (here ei and ej are connectedby an isA link).

Figure 5 gives some indicative examples that explain the role of the forces f ,g and h. Specifically, figure (a) justifies the spring force, figure (b) justifies theelectrical repulsion and shows that high electrical repulsion (high Ke) results insymmetrical drawings, and figure (c) illustrates how the magnetic field can beused in order to obtain the classical top-down drawings for isA hierarchies.

(c)

(b)

with f with f and g (low Ke)

with f and g (high Ke) and h

with f and g (high Ke)

with f and g (high Ke)

no force with f

(a)

no force

Fig. 5. Forces and ER Drawings

The spring force f(ej , ei) follows Hooke’s law, i.e. it is proportional to thedifference between the distance between ej and ei and the zero-energy length of

7

Page 8: How to Tame a Very Large ER Diagram (using Link Analysis ...tzitzik/publications/Tzitzikas_ER_2005.pdf · reverse engineering of large systems, the usage of ERP (the SAP database

the spring. Let d(p, p′) denote the Euclidean distance between two points p andp′ and let pi = (xi, yi) denote the position of an entity type ei. The x componentof the force f(ei) is given by:

fx(ei) =∑

ej∈conn(ei)

Ksi,j(d(pi, pj)− Li,j)

xj − xi

d(pi, pj)

where Li,j denotes the natural (zero energy) length of the spring between ei andej . This means that if d(pi, pj) = Li,j then no force is exerted by the springbetween ei and ej . Now Ks

i,j denotes the stiffness of the spring between ei andej . The larger the value of Ks

i,j , the more tendency for the distance d(pi, pj) tobe close to Li,j . The y component of the force f(ei) is defined analogously.

The electrical force g(ej , ei) follows an inverse square law. The x componentof the force g(ei) is given by:

gx(ei) =∑

ej∈E,ej 6=ei

Kei,j

d(pi, pj)2xi − xj

d(pi, pj)

where Kei,j is used to control the repulsion strength between ei and ej . The y

component of the force g(ei) is defined analogously.The magnetic force h(ej , ei) depends on the angle between the isA spring

(that connects ej and ei) and the direction of the magnetic field and it induces arotational force on that spring. For example, Figure 6 shows an isA link betweenei and ej and the exerted forces on ei and ej due to the magnetic field. The xand y components of the magnetic force h(ei) are given by:

hx(ei) =∑

ej∈connsp(ei)

Km xj − xi

Li,j+

ej∈connsb(ei)

Km xj − xi

Li,j

hy(ei) =∑

ej∈connsp(ei)

Km Li,j + yj − yi

Li,j−

ej∈connsb(ei)

Km Li,j + yi − yj

Li,j

where Km is used to control the strength of the magnetic field.

le

h (e )

h (e )

j

ei

y − y

x − x

l

Magnetic Field

x y

x

j

j i

h (e )y

j

i

i

j

i

h (e )

Fig. 6. Magnetic forces and isA links

The x and y components of the composed force F (ei) on an entity typeei are obtained by summing up, i.e.: Fx(ei) = fx(ei) + gx(ei) + hx(ei) andFy(ei) = fy(ei) + gy(ei) + hy(ei).

8

Page 9: How to Tame a Very Large ER Diagram (using Link Analysis ...tzitzik/publications/Tzitzikas_ER_2005.pdf · reverse engineering of large systems, the usage of ERP (the SAP database

As in the link analysis technique, we view an n-ary relationship type asn(n− 1)/2 springs.

3.2 Force Model B

One weakness of the above model is that the resulting drawings can have severaloverlaps. The reason is that: (a) there is no repulsion among relationship types,and (b) there is no repulsion between entity and relationship types. Figure 7illustrates this problem. This drove us to introduce a different force model whereeach relationship type is viewed as a particle too. Clearly, the resulting electricalrepulsion discourages the creation of overlaps (between entity and relationshiptypes, or between relationship types themselves). Notice that according to thisview, a relationship type does no longer correspond to one spring. Specifically,the particle of a relationship type over k entity types, is connected with onespring with each one of them. The forces on entity types and relationship typesare computed analogously to the force model A.

(b)

Without rel−rel repulsion

Without ent−rel. repulsion

With rel−rel repulsion

With ent−rel repulsion

(a)

Fig. 7. Forces and ER Drawings

3.3 The Drawing Algorithm

We can reach a drawing by an algorithm that simulates the mechanical system.Such a algorithm would seek for a configuration with locally minimal energy,i.e. a drawing in which the forces on each node is zero. A variety of numericaltechniques can be used to find an equilibrium configuration, and thus the finaldrawing. We have adopted the iterative method based on the method proposedin [13]. At first the nodes are placed at random positions. At each iteration,the force on each node is computed and then the node is moved towards thecorresponding direction by a small amount proportional to the magnitude of theforce. This can be continued until convergence, but we can also limit the numberof iterations.

Note that if we would like to find a drawing that corresponds to a statewith globally minimal energy, then we would have to resort to very general op-timization methods. For instance, a method based on simulated annealing isproposed in [11], while an approach based on genetic algorithms is describedin [4]. However the computational complexity of these techniques turns them

9

Page 10: How to Tame a Very Large ER Diagram (using Link Analysis ...tzitzik/publications/Tzitzikas_ER_2005.pdf · reverse engineering of large systems, the usage of ERP (the SAP database

not very appropriate for interactive design systems. In addition, and accordingto the results of the extensive empirical analysis of several force-directed algo-rithms (including globally minimal energy algorithms) upon plain graphs thatare reported in [3], there is no universal winner and the general approach is totry several methods and choose the best.

3.4 Experimental Evaluation

We have investigated and evaluated all these issues in the context of the CASEtool DB-MAIN. The specification of the parameters L, Ks, Ke and Km is not atrivial task as these parameters determine in a high degree how the final drawingwill look like. One flexibility of the proposed approach is that we can adjust thespring length (Li,j), spring stiffness(Ks

i,j) and electrical repulsion(Kei,j), in order

to customize the appearance of the drawing according to the semantics of theER diagram constructs. For instance, as it is desirable to keep the nodes of anisA hierarchy close enough and since between any two isA-related entity typeswe only have to draw a line (and not any hexagon-enclosed string), we can usea smaller length for isA-springs than that of relationship-springs. In any case,the user can change their value at run-time.

Figure 8 shows one drawing obtained by the algorithm using low electricalrepulsion. Although the isA hierarchy is drawn as a top-down drawing and wehave no overlaps, this drawing is not satisfying because a designer would hardlymanually place into the space occupied by an isA hierarchy an entity type thatdoes not belong to that hierarchy. After we increased the repulsion and themagnetic field we never faced again such a drawing. The lesson learned is thathigh repulsion not only results in symmetrical drawings but its combinationwith a strong magnetic field results in clear isA drawings. Another drawing ofa diagram with 4 isA hierarchies that is derived by the algorithm according toforce model A, is shown in Figure 9.

high Ke, high Kmlow Ke

Fig. 8. How to obtain clean isA drawings

A more complex case is shown in Figure 10. Figure 10.(a) shows a manuallyplaced diagram where all subentity types have been placed at the outer part ofthe drawing. Figure (b) shows the drawing obtained according to force modelB. Notice that every isA hierarchy now corresponds to a top-down drawing andthat the entire drawing is symmetrical and satisfying.

10

Page 11: How to Tame a Very Large ER Diagram (using Link Analysis ...tzitzik/publications/Tzitzikas_ER_2005.pdf · reverse engineering of large systems, the usage of ERP (the SAP database

1-10-N R_3

1-1 0-NR_2

1-10-N

R_1

1-10-NR

ENTITY_9

ENTITY_8

ENTITY_7ENTITY_6

ENTITY_5

ENTITY_4

ENTITY_3

ENTITY_2ENTITY_11

ENTITY_10

ENTITY_1

ENTITY

Fig. 9. A drawing of a diagram with 4 isA hierarchies according to force model A

1-1

0-N

R_9

1-1

0-NR_8

1-1

0-N

R_7

1-1

0-N

R_6

1-1

0-N

R_5

1-10-N R_4

1-1

0-N

R_3

1-1

0-N

R_2

1-1

0-N

R_10

1-1

0-N

R

ENTITY_9

ENTITY_8

ENTITY_7

ENTITY_6

ENTITY_5

ENTITY_4

ENTITY_3

ENTITY_2

ENTITY_1

ENTITY

(a)

1-1

0-N

R_9

1-1 0-N

R_8

1-1

0-N

R_7

1-1

0-N

R_6

1-1

0-N

R_5

1-10-N R_4

1-1

0-N

R_3

1-1

0-NR_2

1-1

0-N

R_10

1-1

0-N

R

ENTITY_9

ENTITY_8 ENTITY_7

ENTITY_6

ENTITY_5ENTITY_4

ENTITY_3ENTITY_2

ENTITY_1

ENTITY

(b)

Fig. 10. Drawing of a diagram with several IsA hierarchies

(a): manual drawing where isA links are not vertical. (b): drawing ob-tained according to force model B

The experimental evaluation showed that the drawings according to forcemodel A suffer from overlaps, while those according to force model B have a few(or none) overlaps. The difference between force model A and force model B iseven more evident in dense diagrams. Figure 11 shows the drawings obtainedby these two models when applied on the top-5 (|E| = 5, |R| = 17) diagram ofFigure 3. Again, the second drawing is evidently better. A noteworthy remarkhere is that the second diagram is more clear and intuitive than the manuallyspecified layout that is shown in Figure 3. This indicates that in certain cases(at least when the diagram is very dense) the automatically-derived drawingscan be better than the manually drawn.

However, we have to note that force model B has two weaknesses comparingto force model A: (i) it is computational more expensive, and (ii) in the resultingdrawings the tentacles of binary relationship types are in many cases unneces-sarily not aligned. This is evident in Figure 12. Although this is not a majorproblem it is an issue for further research.

Figure 13 shows the automatic layout obtained for the top-11 diagram (thatwas presented in Figure 2). The high relative number of relationships makesthe drawing almost unreadable. This example suggests that we should take intoaccount the density of a diagram, in order to reach readable and clear drawings.

Roughly, we could handle dense diagrams by considering: (i) larger springs,(ii) higher electrical repulsion, (iii) less stiff springs. For example, and assumingforce model A, Figure 14.(a) shows the drawing obtained with spring length

11

Page 12: How to Tame a Very Large ER Diagram (using Link Analysis ...tzitzik/publications/Tzitzikas_ER_2005.pdf · reverse engineering of large systems, the usage of ERP (the SAP database

0-N

0-N

0-Nsupplies

0-N

0-N

Stock Movement

replaces0-1

replacedBy0-1

replacement

0-N

0-N

POSITIONS/ACTIVITIES/MONTHS

0-N

0-N

0-N

path Calculation

0-N

1-N

Operator0-N

1-1description0-1

describes0-N

Notice

0-N

0-N

Max Weight/Channel

0-N

0-N

0-N

follow-up activity

Bonus0-N

0-N

0-N

EARLY BIRD

0-N

0-N

DefaultCost

partOf0-N

0-Ndecomposition

origin0-N

destination0-N

Customs costs

0-N

0-N

country language

0-N

1-1

cn/pay

0-N

0-N

0-N

channelCalculation

0-N

0-N

accumulated movements

State of Progress

SectorOfActivity

Item

Country

Channel

(a)

0-N

0-N0-N

supplies

0-N

0-N

Stock Movement

replaces0-1

replacedBy0-1

replacement

0-N

0-N

POSITIONS/ACTIVITIES/MONTHS

0-N0-N

0-N

path Calculation0-N

1-N

Operator

0-N

1-1description0-1

describes0-N

Notice

0-N

0-N

Max Weight/Channel

0-N

0-N

0-N

follow-up activity

Bonus0-N

0-N

0-N

EARLY BIRD

0-N0-N

DefaultCost

partOf0-N

0-N

decomposition

origin0-N

destination0-N

Customs costs

0-N0-N

country language

0-N

1-1

cn/pay 0-N

0-N0-N

channelCalculation

0-N

0-Naccumulated movements

State of Progress

SectorOfActivity

Item

Country

Channel

(b)

Fig. 11. Force model A vs force model B on a diagram with |E| = 5 and |R| = 17

1-1

0-N

R_9

1-1

0-N

R_8

1-1

0-N

R_7

1-10-N

R_6

1-1

0-N

R_51-1

0-N

R_4

1-10-N

R_3

1-1

0-N

R_21-1

0-N

R_15

1-1

0-N

R_14

1-1

0-N

R_13

1-1

0-N

R_12 1-1

0-N

R_111-1

0-N

R_10

1-1

0-N

R_1

1-1

0-N

R

ENTITY_8

ENTITY_7

ENTITY_6

ENTITY_5

ENTITY_4

ENTITY_3

ENTITY_2

ENTITY_1

ENTITY

(a)

1-1

0-N

R_9

1-1

0-NR_8

1-1

0-N

R_7

1-10-NR_6

1-1

0-NR_5

1-1

0-N

R_4

1-1

0-NR_3

1-1

0-N

R_21-1

0-N

R_15

1-1

0-N

R_14

1-1

0-N

R_13

1-1

0-N

R_12

1-1

0-N

R_11

1-1

0-N

R_10

1-1

0-NR_1

1-1

0-N

R

ENTITY_8

ENTITY_7

ENTITY_6

ENTITY_5

ENTITY_4

ENTITY_3

ENTITY_2

ENTITY_1

ENTITY

(b)

1-1

0-N

R_9

1-1

0-N

R_8

1-1

0-N

R_7

1-1 0-NR_6

1-1

0-N

R_5 1-1

0-N

R_4

1-1

0-N

R_311-1

0-N

R_30

1-10-N R_3

1-10-N R_29

1-1

0-N

R_28

1-1

0-N

R_27

1-1 0-NR_26

1-1

0-N

R_25

1-1

0-N

R_24

1-1

0-N

R_23 1-1

0-N

R_22

1-1 0-NR_21

1-1

0-N

R_20

1-1

0-N

R_2

1-1

0-N

R_19

1-10-N R_18

1-1

0-N

R_17

1-1

0-N

R_16

1-1

0-N

R_15

1-1

0-N

R_14

1-1

0-N

R_13

1-1

0-N

R_12

1-1

0-N

R_11

1-1

0-N

R_10

1-1

0-N

R_1

1-1

0-N

R

ENTITY_9

ENTITY_8

ENTITY_7

ENTITY_6

ENTITY_5

ENTITY_4

ENTITY_3

ENTITY_2

ENTITY_16

ENTITY_15

ENTITY_14

ENTITY_13 ENTITY_12

ENTITY_11

ENTITY_10

ENTITY_1

ENTITY

(c)

1-1

0-N

R_9

1-1

0-N

R_8

1-1

0-N

R_7

1-1 0-NR_6

1-1

0-N

R_5

1-1

0-N

R_4

1-1

0-NR_31

1-1

0-N

R_30

1-10-N R_3

1-10-N

R_29

1-1

0-N

R_28

1-1

0-N

R_27

1-1 0-N

R_26

1-1

0-N

R_25

1-1

0-N

R_24

1-1

0-N

R_23

1-1

0-N

R_22

1-1 0-N

R_21

1-1

0-N

R_20

1-1

0-N

R_2

1-1

0-N

R_19

1-10-N

R_18

1-1

0-N

R_17

1-1

0-N

R_16

1-1

0-N

R_15

1-1

0-N

R_14

1-1

0-N

R_13

1-1

0-N

R_12

1-1

0-N

R_11

1-1

0-N

R_10

1-1

0-N

R_1

1-1

0-N

R

ENTITY_9

ENTITY_8

ENTITY_7

ENTITY_6

ENTITY_5

ENTITY_4

ENTITY_3

ENTITY_2

ENTITY_16

ENTITY_15

ENTITY_14

ENTITY_13ENTITY_12

ENTITY_11

ENTITY_10

ENTITY_1

ENTITY

(d)

Fig. 12. Force model A vs force model B

(a): force model A. (b): force model B. (c): force model A. (d): forcemodel B.

0-N

0-N

SUIVRE LE DOSSIER

1-N

0-N

SUIVI OPERATRICE

0-N

0-N

SUIVI LIVRAISON COLIS

0-N

0-N

SUIVI DES TOPS CLIENT 0-N

0-N

1-1

SUIVI COMMANDES

0-N

0-N

0-NSUIVI ACTIVITE

1-N

1-N

STAT FOURNISSEUR MENSUELLES0-N

0-N

simulation prix de vente

0-N

0-N

réexpédition colis

est remplace0-1

remplace0-1

Remplacement

date début de validité0-N

date de fin de validité0-N

0-N remise fin année fournisseur

0-N

0-N

POSITIONS/ACTIVITE/MOIS

0-N

0-NPOS-DTE

0-N

1-N

pondération

0-N

0-N

Poids max/canal

0-N

1-N

PAYS LANGUE

0-N

0-N

0-N

PAYS ENSEIGNE LANGUE

0-N

1-1

pays du fournisseur

0-N

1-1

pays de la devise0-1

0-N

Pays d'origine décrire0-N

description0-1

1-1

0-N

0-N Notice

0-N

0-N

Mouvement de stock

0-N

0-NDate de depot1-1

LOT

1-1

0-NLIVRAISON COLIS

1-N

1-1

HISTORIQUE COLIS

0-N

0-NFRAIS PAR DEFAUT

pays destination0-N

pays d'origine0-N

FRAIS DE DOUANE

1-1

0-N

fournisseur/pays

1-1

0-N

fournisseur/langue

0-N

0-N

FORMULE PAR PAYS/LANGUE

0-N

0-N

0-N Cadeau associé0-N

EARLY BIRD

article vpc pièce0-N

article vpc ensemble0-N

décomposition d'un ensemble

0-N

0-N

dte/pay 1-1

0-N

Document par langue

0-N

0-Ncumul mouvement systeme mesure

0-N

0-N

Composition

0-N

0-N

colcom/suicol

0-N

1-1

col/eta

1-1

0-Ncn/suicol

1-1

0-Ncn/pay

1-1

0-N

cd/suicol

0-1

0-Ncd/eta1-1

0-N

Canal livraison

1-1

0-N

Canal commande0-N

0-N

0-Ncalcul goulotte

0-N

0-N

0-N

calcul canal

1-1 0-Natvsec/cd

0-N

0-1

artvpc/frn

0-N

1-N

artref/tva

0-N

1-N

artref/douref

1-N

0-N

artref/atvsec

0-N

1-1

artref/artvpc

0-N

0-N

0-N

0-NALIMENTER

1-1

0-N

achfrn/lan

SECTEUR ACTIVITE

REFERENCE ARTICLE

PAYS

LANGUE

FOURNISSEUR

ETAT D'AVANCEMENT (LIGNE OU CO

DATE

COMMANDE

COLIS

CANAL

ARTICLE VPC

Fig. 13. Dense diagram drawing

12

Page 13: How to Tame a Very Large ER Diagram (using Link Analysis ...tzitzik/publications/Tzitzikas_ER_2005.pdf · reverse engineering of large systems, the usage of ERP (the SAP database

L′ = 5L, Figure 14.(b) shows the drawing obtained with Ke′ = 100Ke, andFigure 14.(c) shows the drawing obtained with Ks′ = Ks/10000. Indeed, all arebetter than the original drawing shown in Figure 13. Another simple method thatis both effective and efficient is to scale up the entire drawing (i.e. multiply eachcoordinate by a constant c > 1). Nevertheless, an issue that is worth furtherresearch is to investigate the effectiveness of local-density adaptations, e.g. toadapt the spring lengths according to the local density of the graph. EntityRankand BEntityRank scores could be exploited for this purpose.

0-N

0-N

SUIVRE LE DOSSIER

1-N

0-N

SUIVI OPERATRICE

0-N

0-N

SUIVI LIVRAISON COLIS

0-N

0-N

SUIVI DES TOPS CLIENT

0-N

0-N

1-1

SUIVI COMMANDES

0-N

0-N

0-N

SUIVI ACTIVITE

1-N

1-N

STAT FOURNISSEUR MENSUELLES

0-N

0-N

simulation prix de vente

0-N

0-N

réexpédition colis

est remplace0-1

remplace0-1

Remplacement

date début de validité0-N

date de fin de validité0-N

0-N remise fin année fournisseur

0-N

0-N

POSITIONS/ACTIVITE/MOIS

0-N

0-NPOS-DTE

0-N

1-N

pondération

0-N

0-N

Poids max/canal

0-N

1-N

PAYS LANGUE

0-N

0-N

0-N

PAYS ENSEIGNE LANGUE

0-N

1-1

pays du fournisseur

0-N

1-1

pays de la devise

0-1

0-N

Pays d'origine

décrire0-N

description0-1

1-1

0-N

0-N Notice

0-N

0-N

Mouvement de stock

0-N

0-N

Date de depot1-1

LOT

1-1

0-N

LIVRAISON COLIS

1-N

1-1

HISTORIQUE COLIS

0-N

0-N

FRAIS PAR DEFAUT

pays destination0-N

pays d'origine0-N

FRAIS DE DOUANE

1-1

0-N

fournisseur/pays

1-1

0-N

fournisseur/langue

0-N

0-N

FORMULE PAR PAYS/LANGUE

0-N

0-N

0-N

Cadeau associé0-N

EARLY BIRD

article vpc pièce0-N

article vpc ensemble0-N

décomposition d'un ensemble

0-N

0-N

dte/pay

1-1

0-N

Document par langue

0-N

0-N

cumul mouvement systeme mesure

0-N

0-N

Composition

0-N

0-N

colcom/suicol

0-N

1-1

col/eta

1-1

0-N

cn/suicol

1-1

0-N

cn/pay

1-1

0-N

cd/suicol

0-1

0-N

cd/eta

1-1

0-N

Canal livraison

1-1

0-N

Canal commande

0-N

0-N

0-Ncalcul goulotte

0-N

0-N

0-N

calcul canal

1-1

0-N

atvsec/cd

0-N

0-1

artvpc/frn

0-N

1-N

artref/tva

0-N

1-N

artref/douref

1-N

0-N

artref/atvsec

0-N

1-1

artref/artvpc

0-N

0-N

0-N

0-N

ALIMENTER

1-1

0-N

achfrn/lan

SECTEUR ACTIVITE

REFERENCE ARTICLE

PAYS

LANGUE

FOURNISSEUR

ETAT D'AVANCEMENT (LIGNE OU CO

DATE

COMMANDE

COLIS

CANAL

ARTICLE VPC

(a)

0-N

0-NSUIVRE LE DOSSIER

1-N

0-N

SUIVI OPERATRICE

0-N

0-N

SUIVI LIVRAISON COLIS

0-N

0-N

SUIVI DES TOPS CLIENT

0-N

0-N

1-1

SUIVI COMMANDES

0-N

0-N

0-N

SUIVI ACTIVITE

1-N

1-N

STAT FOURNISSEUR MENSUELLES

0-N

0-N

simulation prix de vente

0-N

0-N

réexpédition colis

est remplace0-1

remplace0-1

Remplacement

date début de validité0-N

date de fin de validité0-N

0-Nremise fin année fournisseur

0-N

0-N

POSITIONS/ACTIVITE/MOIS

0-N

0-N

POS-DTE

0-N

1-N

pondération

0-N

0-N

Poids max/canal 0-N

1-N

PAYS LANGUE

0-N

0-N

0-N

PAYS ENSEIGNE LANGUE

0-N 1-1pays du fournisseur0-N 1-1pays de la devise

0-1

0-N

Pays d'origine

décrire0-N

description0-1

1-1

0-N

0-N Notice

0-N

0-N

Mouvement de stock

0-N

0-N

Date de depot1-1

LOT

1-1

0-N

LIVRAISON COLIS

1-N

1-1HISTORIQUE COLIS

0-N0-N

FRAIS PAR DEFAUT

pays destination0-N

pays d'origine0-N

FRAIS DE DOUANE 1-10-N fournisseur/pays

1-1

0-N

fournisseur/langue

0-N

0-N

FORMULE PAR PAYS/LANGUE

0-N

0-N

0-N

Cadeau associé0-N

EARLY BIRD

article vpc pièce0-N

article vpc ensemble0-N

décomposition d'un ensemble

0-N

0-N

dte/pay

1-10-NDocument par langue

0-N

0-N

cumul mouvement systeme mesure

0-N

0-N

Composition

0-N

0-N

colcom/suicol

0-N

1-1

col/eta

1-1

0-N

cn/suicol

1-1

0-N

cn/pay

1-1

0-Ncd/suicol

0-1

0-N

cd/eta

1-1

0-N

Canal livraison

1-1

0-N

Canal commande0-N

0-N0-N

calcul goulotte

0-N0-N

0-N

calcul canal1-1

0-N

atvsec/cd 0-N

0-1

artvpc/frn

0-N

1-N

artref/tva

0-N

1-N

artref/douref

1-N

0-N

artref/atvsec

0-N

1-1

artref/artvpc

0-N

0-N0-N

0-N

ALIMENTER

1-1

0-N

achfrn/lan

SECTEUR ACTIVITE

REFERENCE ARTICLE

PAYS

LANGUE

FOURNISSEUR

ETAT D'AVANCEMENT (LIGNE OU CO

DATE

COMMANDE

COLIS

CANAL

ARTICLE VPC

(b)

0-N

0-N

SUIVRE LE DOSSIER

1-N

0-N

SUIVI OPERATRICE

0-N

0-N

SUIVI LIVRAISON COLIS

0-N

0-N

SUIVI DES TOPS CLIENT

0-N

0-N

1-1

SUIVI COMMANDES

0-N

0-N

0-N

SUIVI ACTIVITE

1-N

1-N

STAT FOURNISSEUR MENSUELLES0-N

0-N

simulation prix de vente

0-N

0-N

réexpédition colis

est remplace0-1

remplace0-1

Remplacement

date début de validité0-N

date de fin de validité0-N

0-N remise fin année fournisseur

0-N

0-N

POSITIONS/ACTIVITE/MOIS

0-N

0-N

POS-DTE

0-N

1-N

pondération

0-N

0-N

Poids max/canal

0-N

1-N

PAYS LANGUE

0-N

0-N

0-N

PAYS ENSEIGNE LANGUE

0-N

1-1

pays du fournisseur

0-N

1-1

pays de la devise

0-1

0-N

Pays d'origine

décrire0-N

description0-1

1-1

0-N

0-N Notice

0-N

0-N

Mouvement de stock

0-N

0-NDate de depot1-1

LOT

1-1

0-N

LIVRAISON COLIS

1-N

1-1

HISTORIQUE COLIS

0-N

0-N

FRAIS PAR DEFAUT

pays destination0-N

pays d'origine0-N

FRAIS DE DOUANE

1-1

0-N

fournisseur/pays

1-1

0-N

fournisseur/langue

0-N

0-N

FORMULE PAR PAYS/LANGUE

0-N

0-N

0-N

Cadeau associé0-N

EARLY BIRD

article vpc pièce0-N

article vpc ensemble0-N

décomposition d'un ensemble

0-N

0-N

dte/pay 1-1

0-N

Document par langue

0-N

0-N

cumul mouvement systeme mesure

0-N

0-N

Composition

0-N

0-N

colcom/suicol

0-N

1-1

col/eta1-1

0-N

cn/suicol

1-1

0-N

cn/pay

1-1

0-N

cd/suicol

0-1

0-N

cd/eta

1-1

0-N

Canal livraison

1-1

0-N

Canal commande0-N

0-N

0-N

calcul goulotte

0-N

0-N

0-N

calcul canal

1-1

0-N

atvsec/cd

0-N

0-1

artvpc/frn

0-N

1-N

artref/tva

0-N

1-N

artref/douref

1-N

0-N

artref/atvsec

0-N

1-1

artref/artvpc

0-N

0-N

0-N

0-N

ALIMENTER

1-1

0-N

achfrn/lan

SECTEUR ACTIVITE

REFERENCE ARTICLE

PAYS

LANGUE

FOURNISSEUR

ETAT D'AVANCEMENT (LIGNE OU CO

DATE

COMMANDE

COLIS

CANAL

ARTICLE VPC

(c)

Fig. 14. (a): larger springs; (b): higher repulsion; (c) less stiff springs

As a final remark note that the above drawing techniques can be appliedfor drawing the structural part of ontologies expressed in RDFS [5] and OWL[12]. The only difference is that RDFS supports property specialization whichhowever will be handled correctly due to the magnetic field that is applied onspecialization/generalization links (also indicated by Figure 9).

4 Conclusion

We described a novel method for identifying the major elements of an ER dia-gram that is based on link analysis. This method can significantly aid (a) theunderstanding, (b) the visualization, and (c) the drawing of very large schemas.The proposed technique can elevate automatically the major elements and allowsexploring the schema gradually: from the more important elements to the less.Consequently, it can be very useful in reverse engineering and in information inte-gration. Moreover, the scores can be exploited for ordering the schema elementsthat match a keyword query of the user. In addition, and given the inabilityto produce automatically aesthetically satisfying layouts for large schemas, thesmall (top-k graphs) that can be derived by this technique can be visualizedeffectively and this is very useful during communication (e.g. between designersand application programmers or in requirements engineering and training). Forthis purpose we investigated a force-directed drawing algorithm and evaluatedtwo different force models upon several conceptual schemas of real applications.For small and medium sized diagrams the results were satisfying in most of thecases. In the rest cases, human intervention (moving, nailing) and rerun of thedrawing algorithm could rectify the problems.

13

Page 14: How to Tame a Very Large ER Diagram (using Link Analysis ...tzitzik/publications/Tzitzikas_ER_2005.pdf · reverse engineering of large systems, the usage of ERP (the SAP database

Acknowledgements

The first author wants to thank Tonia Dellaporta for the several fruitful and reallyenjoyable discussions on this issue. Also many thanks to Jean-Rock Maurisse, Anne-France Brogneaux and Jean Herald for their help on using the DB-MAIN toolkit.

References

1. Jacky Akoka and Isabelle Comyn-Wattiau. “Entity-Relationship and Object-Oriented Model Automatic Clustering”. Data and Knowledge Engineering,20(2):87–117, 1996.

2. Giuseppe Di Battista, Peter Eades, Roberto Tamassia, and Ioannis Tollis. “Graphdrawing: algorithms for the visualization of graphs”. Prentice Hall Englewood Cliffs(N.J.), 1999. ISBN/ISSN : 0-13-301615-3.

3. F. J. Braedenburg, M. Himsolt, and C. Rohrer. “An Experimental Comparison ofForce-Direceted and Randomized Graph Drawing Algorithms”. In Procs of GraphDrawing, GD’95, pages 76–87, 1996.

4. J. Branke, F. Bucher, and H. Schmeck. “Using Genetic Algorithms for DrawingUndirected Graphs”. In Procs of the 3rd Nordic Workshop on Genetic Algorithmsand Their Applications, 3NWGA, pages 193–2005, 1997.

5. Dan Brickley and R. V. Guha. “Resource Description Framework(RDF) Schema specification: Proposed Recommendation, W3C”, March 1999.http://www.w3.org/TR/1999/PR-rdf-schema-19990303.

6. Sergey Brin and Lawrence Page. “The Anatomy of a Large-scale HypertextualWeb Search Engine”. In Proceedings of the 7th International WWW Conference,Brisbane, Australia, April 1998.

7. L. J. Campbell, Terry A. Halpin, and Henderik Alex Proper. “Conceptual Schemaswith Abstractions: Making Flat Conceptual Schemas More Comprehensible”. Dataand Knowledge Engineering, 20(1):39–85, 1996.

8. Rodolfo Castello, Rym Mili, and I. Tollis. “A Framework for the Static and Interac-tive Visualization of Statecharts”. Journal of Graph Algorithms and Applications,6(3):313–351, 2002.

9. P. Chen. “The Entity-Relationship Model - Toward a Unified View of Data”. ACMTransactions on Database Systems, 1(1):9–36, March 1976.

10. Richard Cole. “Automatic Layout of Concept Lattices using Force Directed Place-ment and Genetic Algorithms”. In Proc. of the 23th Australiasian Computer Sci-ence Conference, pages 47–53. Australian Computer Science Communications 1,IEEE Computer Society, 2000.

11. R. Davidson and D. Harel. “Drawing Graphics Nicely Using Simulated Annealing”.ACM Trans. Graph., 15, 1996.

12. M. Dean, D. Connolly, F. van Harmelen, J. Hendler, I. Horrocks, D. L. McGuin-ness, P. F. Patel-Schneider, and L.A. Stein. “OWL Web Ontology Language 1.0Reference”, 2002. (http://www.w3c.org/TR/owl-ref).

13. P. Eades. “A Heuristic for Graph Drawing”. Congressus Numerantium, 42, 1984.14. Holger Eichelberger and Jurgen Wolff von Gudenberg. “UML Class Diagrams -

State of the Art in Layout Techniques”. In Proceeding of Vissoft 2003, InternationalWorkshop on Visualizing Software for Understanding and Analysis, pages 30–34,2003.

15. P. Feldman and D. Miller. “Entity Model Clustering: Structuring a Data Modelby Abstraction”. The Computer Journal, 29(4):348–360, 1986.

14

Page 15: How to Tame a Very Large ER Diagram (using Link Analysis ...tzitzik/publications/Tzitzikas_ER_2005.pdf · reverse engineering of large systems, the usage of ERP (the SAP database

16. T. Fruchterman and E. Reingold. “Graph Drawing by Force-directed Placement”.Software - Practice and Experience, 21(11):1129–1164, 1991.

17. F.U.N.D.P. “DB-MAIN”. (http://www.info.fundp.ac.be/∼dbm/).18. Munish Gandhi, EdwardL Robertson, and Dirk Van Gucht. “Levelled Entity Re-

lationship Model”. In Procs of the 13rd Intern. Conf. on the Entity RelationshipApproach, ER’94, pages 420–436, Manchester, U.K., December 1994.

19. Floris Geerts, Heikki Mannila, and Evimaria Terzi. “Relational Link-based rank-ing”. In Procs of the 30th Intern. Conference on Verly Large Data Bases,VLDB’2004, Toronto, Canada, August 2004.

20. Zoltan Gyongyi, Hector Garcia-Molina, and Jan Pedersen. “Combating Web Spamwith TrustRank”. In Procs of the 30th Intern. Conference on Verly Large DataBases, VLDB’2004, Toronto, Canada, August 2004.

21. Jean-Luc Hainaut. “Transformation-based Database Engineering”. In Transforma-tion of Knowledge, Information and Data: Theory and Applications. IDEA GroupPub., 2004.

22. Jouni Huotari, Kalle Lyytinen, and Marketta Niemela. “Improving Graphical Infor-mation System Model Use with Elision and Connecting Lines”. ACM Transactionson Computer-Human Interaction, 10(4), 2003.

23. Yannis E. Ioannidis, Miron Livny, Jian Bao, and Eben M. Haber. “User-OrientedVisual Layout at Multiple Granularities”. In Proc. of the 3rd International Work-shop on Advanced Visual Interfaces, pages 184–193, Gubbio, Italy, May 1996.

24. T. Kamada. ”On Visualization of Abstract Objects and Relations”. PhD thesis,Dept. of Information Science, Univ. of Tokyo, Dec 1988.

25. Jon Kleinberg. “Authoritative Sources in a Hyperlinked Environment”. In Proceed-ings of 9th ACM-SIAM Symposium on Discrete Algorithms, San Francisco, USA,1998.

26. Simon Lok and Steven Feiner. “A Survey of Automated Layout Techniques forInformation Presentations”. In Procs of the 1st. Int. Symp. on Smart Graphics,Hawthorne, NY, 2001.

27. Rajeev Motwani and Prabhakar Raghavan. Randomized algorithms. CambridgeUniversity Press, 1995.

28. Arthur Ouwerkerk and Heiner Stuckenschmidt. “Visualizing RDF Data for P2PInformation Sharing”. In Procs of the workshop on Visualizing Information inKnowledge Engineering, VIKE’03, Sanibel Island, FL, 2003.

29. Aaron J. Quigley. “Large Scale Relational Information Visualization, Clustering,and Abstraction”. PhD thesis, University of Newcastle, Australia, August 2001.(http://www.it.usyd.edu.au/∼aquigley/thesis/aquigley-thesis-mar-02.pdf).

30. O. Rauh and E. Stickel. ”Entity Tree Clustering: A Method for Simplifying ERDesign”. In Procs of the 11th Int. Conf. on Entity-Relationship Approach, ER’92.

31. K. Sugiyama and K. Misue. “A Simple and Unified Method for Drawing Graphs:Magnetic-Spring Algorithm”. In Procs of Graph Drawing Conference, GD’94, pages364–375, 1994.

32. K. Sugiyama and K. Misue. “Graph Drawing by Magnetic-Spring Model”. Journalon Visual Lang. Comput., 6(3), 1995.

33. R. Tamassia, C. Batini, and M. Talamo. “An algorithm for automatic layoutof entity-relationship diagrams”. In Procs of the 3rd International Conferenceon Entity-relationship approach to software engineering, pages 421–439. ElsevierNorth-Holland, Inc., 1983.

34. T. J. Teory, W. Guangping, D. L. Bolton, and J. A. Koenig. “ER Model Cluster-ing as an Aid for User Communication and Documentation in Database Design”.Communications of the ACM, 32(8):975–987, 1989.

15

Page 16: How to Tame a Very Large ER Diagram (using Link Analysis ...tzitzik/publications/Tzitzikas_ER_2005.pdf · reverse engineering of large systems, the usage of ERP (the SAP database

35. Yannis Tzitzikas and Jean-Luc Hainaut. ”Ranking the Elements of ConceptualDiagrams”, 2005. (submitted for publication).

A Linear Algebra Version of (B)EntityRank

Let A be the generalized adjacency matrix of an ER diagram where A[ei, ej ] equalsthe number of transitions from ei to ej . Now the probability transition matrix M isobtained by normalizing each row of A to sum to 1. EntityRank is based on a Markovchain on the entity types with transition matrix

q · U + (1− q) ·M

where U is the transition matrix of uniform transition probabilities i.e. U[ei, ej ] = 1/Nfor all i, j. The vector of the EntityRank scores, denoted by Sc, is then defined tobe the stationary distribution of this Markov chain. Equivalently, Sc is the principalright eigenvector of the transition matrix (q · U + (1 − q) · M)T , since by definitionthe stationary distribution satisfies (q · U + (1 − q) ·M)T Sc = Sc. On the other hand,BEntityRank (the biased version of EntityRank) is based on the transition matrix:

q · B + (1− q) ·M

where B[ej , ei] = |attrs(ei)||Attr| where Attr denotes the set of all attributes of all entity

types (i.e. Attr = ∪{ attrs(e) | e ∈ E}).As another remark note that in an undirected (strongly connected and non-bipartite)

graph G = (V, R), the stationary probability of a node u is given by P (u) = deg(u)2|R|

where deg(u) is the degree of u [27]. This means that in an undirected graph (or multi-graph) the stationary probabilities can be computed very efficiently and without theneed of an iterative algorithm. In our case we cannot employ the above method due tothe ”teleporting” transitions which are indispensable in our case for ensuring that thetransition graph is strongly connected (note that large ER diagrams are not alwaysconnected). Specifically, the ”teleporting” transitions of BEntityRank are not symmet-ric and this cannot be captured by an undirected graph. For instance, consider the caseof an ER diagram consisting of two entity types e1 and e2 and one relationship typebetween them, where e1 has one attribute and e2 has two attributes. According to arandom walk on the undirected graph both entity types have probability 1/2. Accord-ing to BEntityRank if q = 0 then P (e1) = P (e2) = 1/2, if q = 1 then P (e1) = 1/3 andP (e2) = 2/3, and if q = 0.5 then P (e1) = 0.44 and P (e2) = 0.55.

16