INFORMATION IS AT THE HEART OF ALL ARCHITECTURE DISCIPLINES AN ENTERPRISE ARCHITECTS WHITE PAPER ...AND THERE’S MORE TO DATA MODELLING THAN YOU THOUGHT CHRISTOPHER BRADLEY CHIEF INFORMATION ARCHITECT & ENTERPRISE SERVICES DIRECTOR

Data Modelling is NOT just for RDBMS's


DESCRIPTION

Data modelling has been around since the mid 1970's, but in many organisations there is considerable scepticism and downright distrust regarding the place data modelling should occupy. So why does data modelling still have to be "sold" in many companies, while in others people simply don't believe it's necessary ("the software package has all I need")? This paper looks at the failure of organisations to capitalise on the benefits data modelling can yield and examines where in the changing information systems landscape modelling is relevant.


2 | ENTERPRISE ARCHITECTS ©2014


CONTENTS

DATA MODELLING IS A CRITICAL TECHNIQUE AND AT THE HEART OF ALL ARCHITECTURE DISCIPLINES

DATA MODELLING INTRODUCTION

BACKGROUND & HISTORY

DIFFERENT TYPES OF MODELS FOR DIFFERENT PURPOSES AND AUDIENCES

DATA MODELLING FOR DBMS DEVELOPMENT

DATA MODELLING INCORRECTLY TAUGHT AT UNIVERSITY

BUT THIS IS WRONG? SO WHAT NEEDS TO CHANGE?

MODELLING FOR THE “NEW” TECHNOLOGIES

DEMONSTRATING BENEFITS

THE GREATEST CHANGE REQUIRED

WHAT NEEDS TO STAY THE SAME?

CONCLUSION

ABOUT THE AUTHOR

ABOUT ENTERPRISE ARCHITECTS


DATA MODELLING IS A CRITICAL TECHNIQUE AND AT THE HEART OF ALL ARCHITECTURE DISCIPLINES

Many years ago people believed the world was flat and that, if they sailed over the horizon, they would fall off the edge. They also believed that the Earth was at the centre of the heavens, and that all the other planets orbited around it.

But they were wrong.

People who believe Data Modelling is just for DBMS design are just as misinformed. Data Modelling, particularly Conceptual Data Modelling, is an absolutely critical technique and is at the heart of all architecture disciplines. Here’s why:

Since data has to be understood to be managed, it stands to reason that gaining agreement on the meaning and definition of concepts will be a key component. That is precisely what a data model provides.

But just what do I mean when I state that Data Modelling is at the heart of all architecture disciplines?

At its heart, the Data Model provides the unifying language, the lingua franca, the common vocabulary upon which everything else is based. Other modelling techniques within the complementary architecture disciplines will interact with each other, forming a supportive, cross-checked, integrated and validated set of techniques. It’s not just (sometimes it’s never) about technical DBMS design.

So to illustrate the case with a few simple examples, we see in:

The Business Architecture Domain: A Project Charter documents the rationale, the objectives, the business scope and the measures of success of the project. It uses the language of a high level data model to describe the business concepts.

The Process Architecture Domain: A Workflow Model describes the sequence of steps carried out by the actors involved in the process.

The Application & Systems Architecture Domain: A Use Case describes how an actor completes a step in the process, by interacting with a system to obtain a service. A Service Specification describes some form of business service that is initiated to complete a business event.


FIGURE 1: Data Modelling is at the heart of all architecture disciplines

The Information Architecture Domain: A Data Model depicts the critical data items, and the attributes or facts about them. This is important data that the organisation wishes to know or store information on, and is the stuff that the processes and systems act on.

Every type of model references the entities of significance in the conceptual data model, showing why conceptual data modelling is such a vital technique.

Agreement on the language and definition of the data concepts must always come first; once that is established, detail about processes can be added:

» To begin, we discover the Nouns, i.e. the items of interest to the organisation, e.g. “Product”, “Customer”, “Location”.

» Next we discover “Verb – Noun” pairs: these are the activities (processes and sub-processes) that must be performed in order for the organisation to operate, e.g. “Design Product”, “Ship Order”.

» Lastly we discover “Actor – Verb – Noun” combinations: these form the Use Cases or steps within a business process, e.g. “Lead Architect Designs New Product”.
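The three discovery steps above can be sketched in a few lines of Python. This is only an illustration of the idea that every activity and use case step should refer back to an agreed concept; the class names, the definition text and the “Warehouse Clerk” actor are assumptions made for the example, not part of any modelling standard or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A Noun: an item of interest to the organisation."""
    name: str
    definition: str


@dataclass(frozen=True)
class Activity:
    """A Verb - Noun pair, e.g. Design Product."""
    verb: str
    concept: Concept


@dataclass(frozen=True)
class UseCaseStep:
    """An Actor - Verb - Noun combination."""
    actor: str
    activity: Activity


def undefined_concepts(steps, glossary):
    """Names of concepts used by the steps that are not yet in the agreed glossary."""
    used = {step.activity.concept.name for step in steps}
    return sorted(used - set(glossary))


# Illustrative data only; the definitions are placeholders.
product = Concept("Product", "Something the organisation designs and sells")
order = Concept("Order", "Definition not yet agreed")

glossary = {"Product": product}  # only Product has an agreed definition so far
steps = [
    UseCaseStep("Lead Architect", Activity("Design", product)),
    UseCaseStep("Warehouse Clerk", Activity("Ship", order)),  # hypothetical actor
]

print(undefined_concepts(steps, glossary))
```

Here the check reports that “Order” is used by a process step before its definition has been agreed, which is exactly the situation the nouns-first ordering is meant to avoid.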

At this high level, we are seeking to gain an understanding and agreement on terms and vocabulary for the data concepts. We do not want to get bogged down in the level of excruciating detail that a detailed logical model would take us into.

Thus, high level conceptual models (often called Business Data Models) are the appropriate vehicle to use here. It can be loosely argued that they provide some of the features of an “ontology” i.e. business concepts and their relationships, although a Conceptual Data Model with its metadata extensions provides much more.


DATA MODELLING INTRODUCTION

The problem for many Data Architects is that “Data Modelling” has, in far too many companies, received a lot of bad press. Have you heard any of these?

» “It just gets in the way”

» “It takes too much time”

» “What’s the point of it”

» “It’s not relevant in today’s systems landscape”

» “I don’t need to do modelling, the package has it all”

Yet when Data Modelling first came onto the radar in the mid 1970’s the potential was enormous: We were told we would realise benefits of:

» “a single consistent definition of data”

» “master data records of reference”

» “reduced development time”

» “improved data quality”

» “impact analysis”

...to name but a few.

Do organisations today want to reap these benefits? You bet, it’s a no-brainer.

So then, why is it that now, here we are, 30+ years on and we see in many organisations that the benefits of Data Modelling still need to be “sold” and in others the big benefits simply fail to be delivered? What’s happened? What needs to change?

AS WITH MOST THINGS A LOOK BACK INTO THE PAST IS A GOOD PLACE TO START.


BACKGROUND & HISTORY

Looking back into the history of data management, we see a number of key eras.

1950’s – 70’s:

Information Technology (at that time often called Automated Data Processing (ADP)) was starting to enter the mainstream world of commerce. During this period we saw the introduction of the first database management systems such as DL1, IMS, IDMS and TOTAL. Who can remember a DBMS that could be implemented entirely on tapes? **

** It was IMS HISAM if you really want to know.

At that time the cost of disc storage was exceptionally high, and the notion of exchangeable disc packs was just coming into the data centre. The concept of “database” operations came into play and the early mentions of “corporate central databases” appeared.

1970 – 1990:

Data was “discovered”. Early mentions of managing data “as an asset” were seen and the concepts of Data Requirements Analysis and Data Modelling were introduced.

1990 – 2000:

The “Enterprise” became flavour of the decade. We saw Enterprise Data Management Coordination, Enterprise Data Integration, Enterprise Data Stewardship and Enterprise Data Use. An important change began to happen in this period: there was a dawning realisation that “technology” alone wasn’t the answer to many of the information issues, and we started to see Data Governance being talked about seriously.

2000 and beyond:

Data Quality, Data as a Service, Data Security & Compliance, Data Virtualisation, Services Oriented Architecture (SOA), governance and alignment with the business were (and still are) the data management challenges of this period.

All of this needs to be undertaken in these rapidly changing times when we have a “new” view of information: Web 2.0, Blogs, Mash-ups, Data Virtualisation. It seems anyone can create data! At the same time we have a greater dependence on “packaged” or COTS (Commercial off the shelf) applications such as the major ERPs. There is also more and more use of SOA, XML and business intelligence, and less reliance on traditional “bespoke” development.

Notice I sneaked in “mash-ups” (or web application hybrid) there? See the Wiki article Mashup_(web_application_hybrid) for more on mash-ups. There are many powerful facilities available now that enable you to create your own mash-ups. Make no mistake, these are now becoming the new “Shadow IT” of this decade.

Remember the home grown departmental Excel macros of the 90’s and onwards that became “critical” to parts of the business? Now mash-ups are doing the same thing. But just who is looking at the data definitions, the data standards, applicability etc.? Certainly not the data management group – because frequently they don’t even know that these functions are being built in departmental silos, and anyway the “data team” is pigeon holed as being only involved in DBMS development.

So that leads us on to examine the belief that many people still have (too many unfortunately) that Data Modelling is only for DBMS development.

SO WHY IS THAT?

Firstly we’ll look at Data Modelling for use in DBMS development.


FIGURE 2: Levels of Data Models

DIFFERENT TYPES OF MODELS FOR DIFFERENT PURPOSES AND AUDIENCES

In its early days Data Modelling was mostly (almost exclusively) what we now call Logical and/or Physical Data Modelling and it was primarily aimed at DBMS development.

However, there are many different levels of “Data Models” that can be developed, and they each have a different purpose and audience.

From Figure 2, we see there are many different levels of “Data Models”. The higher up the pyramid we go, the more “communication” focused the models are, whereas the further down the pyramid we go, the more “implementation” focused they are. Frequently, a higher level model is created with the sole purpose of improving communication and understanding.


FIGURE 3: Purpose of data model levels

ENTERPRISE DATA MODEL

Documents the very high level business data objects and definitions. Enterprise wide scope to provide a strategic view of enterprise data.

CONCEPTUAL DATA MODEL (SUBJECT AREA)

The business key, attributes and definitions of business data objects. Also shows the relationship between business data objects. Broader scope than LDM and may cover a subject area (also known as subject area model).

LOGICAL DATA MODEL (APPLICATION)

Documents the business key, attributes and definitions of business data objects. It also shows the relationship between business data objects. Frequently is within the scope of a defined project.

PHYSICAL DATA MODEL

Technical design e.g. tables, columns, keys, foreign keys and other constraints to be implemented in the database or XSD. May be generated from a logical data model. This model is within the scope of a defined project.


DATA MODELLING FOR DBMS DEVELOPMENT

As outlined previously, in its early years Data Modelling was primarily aimed at DBMS development, and there were two main techniques for developing a model; we’ll have a look at these in a moment.

Just to illustrate this we can look at 4 typical roles that may be considered as “customers” of the Data Modelling output:

THE ENTERPRISE DATA CUSTOMER

This might be at Director or CxO level. The accuracy of data is critical to them: they are report users, and the data “products” that data professionals produce are key to serving the needs of this type of user.

THE DATA ARCHITECT

This person knows the business and its rules. He/she manages knowledge about the data and defines the conceptual direction and requirements for capturing data.

THE DBA

This role is production oriented, manages data storage and the performance of databases. They also plan and manage data movement strategies, and play a major part in data architecture by working with architects to help optimise and implement their designs in databases.

THE DEVELOPER DBA

This role works closely with the development teams and is focused on DBMS development. They frequently move and transform data, often writing scripts and ETL to accomplish this.

Data Models (more accurately the metadata) were and are seen as the glue, or the lingua franca, for integrating IT roles through the DBMS development lifecycle. All of the roles listed above depend on metadata from at least one of the other positions.


What then are the steps for developing a DBMS and utilising Data Models? Firstly a word of warning; this could be the subject of a huge paper in its own right, but I’ll try and summarise it simply here.

There are two “main” approaches to creating DBMS’s from models: One is the “top down” or “to-be” approach and the other is termed the “bottom-up” or “as-is” approach.

TOP DOWN (TO-BE) APPROACH

STEP 1:

When speaking with business representatives, discover and document the business requirements, before agreeing on a high-level scope. The output is typically some form of Business Requirements Document (BRD). This gives a high-level understanding of the concepts involved and of where the data is used by business processes, and vice versa.

STEP 2:

Create a more detailed business requirement document with subscriber data requirements, business process and business rules.

STEP 3:

Understand and document the business keys, attributes and definitions from business subject matter experts. From this create and continually refine a logical data model. Determine what the master entities are and what is common to other business areas.

STEP 4:

Verify the logical data model with the stakeholders. Walk a number of major business use cases through and refine the model. Apply the technical design rules, using your knowledge of the technical environment on which you are going to implement the solution together with known volumetric and performance criteria, and create a first cut physical data model. Remember that the same logical model could be implemented in different ways on different technology platforms.

STEP 5:

Generate the Data Definition Language (DDL) from the physical model. Refine the physical design with DBA support and implement the DBMS using the refined physical model.
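Step 5 can be illustrated with a minimal sketch, assuming the physical model is held as plain Python dictionaries. The dictionary layout, the `generate_ddl` helper and the `customer` / `customer_order` tables are all hypothetical and do not reflect the export format of any particular modelling tool; SQLite is used only because it lets the generated DDL be checked without further setup.

```python
import sqlite3


def generate_ddl(table):
    """Render one CREATE TABLE statement from a dict-based physical model."""
    lines = []
    for col in table["columns"]:
        null = "" if col.get("nullable", True) else " NOT NULL"
        lines.append(f'    {col["name"]} {col["type"]}{null}')
    lines.append(f'    PRIMARY KEY ({", ".join(table["primary_key"])})')
    for fk in table.get("foreign_keys", []):
        lines.append(
            f'    FOREIGN KEY ({fk["column"]}) '
            f'REFERENCES {fk["ref_table"]} ({fk["ref_column"]})'
        )
    return f'CREATE TABLE {table["name"]} (\n' + ",\n".join(lines) + "\n);"


# Hypothetical first cut physical model for two entities.
customer = {
    "name": "customer",
    "columns": [
        {"name": "customer_id", "type": "INTEGER", "nullable": False},
        {"name": "customer_name", "type": "TEXT", "nullable": False},
    ],
    "primary_key": ["customer_id"],
}
customer_order = {
    "name": "customer_order",
    "columns": [
        {"name": "order_id", "type": "INTEGER", "nullable": False},
        {"name": "customer_id", "type": "INTEGER", "nullable": False},
    ],
    "primary_key": ["order_id"],
    "foreign_keys": [
        {"column": "customer_id", "ref_table": "customer", "ref_column": "customer_id"}
    ],
}

# Implement the model: run the generated DDL against an in-memory database.
conn = sqlite3.connect(":memory:")
for table in (customer, customer_order):
    ddl = generate_ddl(table)
    print(ddl)
    conn.execute(ddl)
```

In practice the DBA would refine the generated statements (storage, indexing, partitioning and so on) before implementation; the sketch only shows that the DDL is derived from the model rather than written by hand.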

This top down approach has the advantage that the “New” or “To-Be” business and data requirements are the main priority. In the early days there were not many “existing systems” to consider, which was just as well, because the approach doesn’t take into account any of the hidden nuances and rules that may be buried deep within existing systems.


BOTTOM UP (AS-IS) APPROACH

The primary purpose of the Bottom Up (or As-Is) Approach is to create a model of an existing system into which the new requirements can be added. Frequently, the bottom-up approach is used because a model of the current system simply doesn’t exist, often because the system has evolved and/or the original design staff have retired, died or moved on, and the documentation has not been kept up to date.

The main steps in the bottom-up approach are:

STEP 1:

Reverse engineer the database or file schema from the system that is already implemented. From this you will have the database catalog, table, column and index names etc. Of course these will all be in “tech” language without any business definitions.

STEP 2:

Profile the real data by browsing and analysing the data from the tables. Scan through the ETLs to find out any hidden relationships and constraints. Modern data profiling tools are invaluable here as they will allow you to gain real insight to the data, way beyond simply trying to understand from the column names. You did know that SpareField6 really has the alternative delivery location?

STEP 3:

Find out foreign key relationships between tables from IT subject matter experts, and verify the findings. The typical output here is a refined physical model.

STEP 4:

Document the meanings of columns and tables from IT subject matter experts.

STEP 5:

Try to understand the business meaning of probable attributes and entities that may be candidates for a logical data model. From here the result is a “near logical” model.

The bottom up approach is great for capturing those hidden “gotchas” that are tucked away inside the current system. However it doesn’t give any serious attention to new requirements.
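The first two bottom-up steps (reading the schema from the catalog, then profiling the real data) can be sketched as follows. This is a minimal illustration that uses SQLite as a stand-in for the implemented system; the `cust_ord` table, its rows and the helper function names are invented for the example, with only the `SpareField6` column name taken from the text above.

```python
import sqlite3


def reverse_engineer(conn):
    """Return {table name: [column names]} as read from the SQLite catalog."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return {
        t: [col[1] for col in conn.execute(f'PRAGMA table_info("{t}")')]
        for t in tables
    }


def profile_column(conn, table, column):
    """Row count, null count and number of distinct non-null values for a column."""
    rows, nulls, distinct = conn.execute(
        f'SELECT COUNT(*), SUM("{column}" IS NULL), COUNT(DISTINCT "{column}") '
        f'FROM "{table}"'
    ).fetchone()
    return {"rows": rows, "nulls": nulls, "distinct": distinct}


# Hypothetical legacy table with an uninformative column name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust_ord (ord_id INTEGER, cust_no INTEGER, SpareField6 TEXT)")
conn.executemany(
    "INSERT INTO cust_ord VALUES (?, ?, ?)",
    [(1, 10, "Dock 4, Leeds"), (2, 10, None), (3, 11, "Dock 4, Leeds")],
)

schema = reverse_engineer(conn)
print(schema)
for column in schema["cust_ord"]:
    print(column, profile_column(conn, "cust_ord", column))
```

The catalog yields only technical names; it is the profile (and a look at the actual values) that hints that `SpareField6` is carrying address-like data. A dedicated profiling tool would go much further, for example with pattern and value-frequency analysis.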

Thus, a third way is a hybrid of these two approaches that is frequently called the “Middle Out” Approach. The Middle Out Approach employs the best parts of the Top-Down and Bottom-Up Approaches. This is the approach I favour when designing a new model, which is likely to have a better chance of ultimately being used for a technology solution.


DATA MODELLING INCORRECTLY TAUGHT AT UNIVERSITY

As part of my DAMA-I education brief (and, to be honest, as a way of giving something back to the community) I am frequently asked to speak not just at conferences but also at academic institutions. Over the past 10 years or so I have been taken aback by what I have observed regarding the way in which Data Modelling is portrayed on courses at many Universities in the UK and USA (and I suspect in other places too).

Here are a few snippets I have pulled recently from 5 separate universities regarding data modelling on their Computer Science Bachelors & Masters courses:

» “The purpose of a Data model is to design a relational database system”

» “An ER Model is used to specify design and document Database design”

» “A Data model is a pictorial representation of the structure of a relational database system”

» “… it is a description of the objects represented by a computer system together with their properties and relationships”

» “ER Modelling is a Database design method”

At one of these I dug deeper and examined several of the course assignments. One assignment asked students to prepare a model to represent an office environment, and part of the detailed description within the assignment brief mentioned the “Rolodex” and “IBM Selectric” that were on the desks in this office.

Now, I’m not talking here about an assignment paper set for a course in 1975; this was one I saw in 2013!

Given all of the uses of Data Models that I have described so far, the history of Data Modelling, the way it’s still being taught in some Universities, and much of the literature from the Data Modelling tool vendors themselves, it is not surprising that many people are left with the impression that data modelling is just for DBMS’s.


BUT THIS IS WRONG? SO WHAT NEEDS TO CHANGE?

To make Data Modelling relevant for today’s systems landscape we must show that it’s relevant for the “new” technologies such as:

» ERP packages.

» SOA & XML.

» Business Intelligence.

» Data Lineage.

» Data Virtualisation.

Not forgetting that an appropriate level Data Model is an awesome communication tool, so it can be used for communicating with the business.

See also “Data Modelling For The Business – A Handbook for aligning the business with IT using high-level Data Models”; Technics Publishing; ISBN 978-0-9771400-7-7.

We also need to break away from the “you must read my detailed Data Model” mentality and make the information available in a format users can readily understand. For example this means that Data Architects need to recognise the different motivations of their users and re-purpose the model for the audience: Don’t show a business user a Data Model!

Information should be updated instantaneously, and we must make it easy for users to give feedback; after all, you’ll achieve common definitions quicker that way.

We need to recognise the real world commercial climate that we’re working in and break away from arcane academic arguments about notations, methodologies and the like. If we want to have Data Modelling play a real part in our business then it’s up to us to demonstrate and communicate the genuine benefits that can be realised. Remember, Data Modelling isn’t a belief system: just because you “get it”, don’t assume that the next person does.

The use and benefit of Data modelling is considerably greater than its current “one trick pony” press would suggest.


MODELLING FOR THE “NEW” TECHNOLOGIES

I feel I must make a confession here. The technologies are not really all that new!

It’s just that “traditionally” Data Modelling has not been seen as being relevant to these areas. To break out of this “modelling is a one trick pony” view we need to show how and why Data Modelling IS relevant for today’s varied IT landscape. Therefore we must show that it’s relevant for the “new(er)” technologies such as:

1. ERP packages.

2. SOA & XML.

3. Business Intelligence.

4. Data Lineage.

5. Data Virtualisation.

6. Communicating with the business.

1. ERP PACKAGES

As Data Architects, when faced with projects that are embarking upon the introduction of a major ERP package, have you ever heard the cry:

“We don’t need a Data Model – the package has it all”?

But, does it?

Is data part of your business requirement? Of course it is. So just how do you know whether the package meets your overall business data requirements? You did assess the data component when doing your fitness for purpose evaluation, didn’t you? A Data Model will assist in both package configuration and fitness for purpose evaluation.

How can you assess whether the ERP package has data structures, definitions and meanings that are compatible with those of your legacy systems? Again, a good Data Model will assist with this.


What about data integration, legacy data take on and master data integration – how can these readily be accomplished? You guessed it – a Data Model can help here too.

The critics say that modelling isn’t needed for ERP packages. But that’s because they are wedded to the old-world view that modelling is only used for DBMS development. It’s not.

In this case, when we are implementing ERP systems, the model will NOT be required to generate a DBMS from, however for all of the other aspects described above it IS invaluable.

So what’s the problem? Why can’t we just point our favourite Data Modelling tool at the underlying DBMS of the package? Simply put, for the most part the problem is that the Database System Catalog does not hold useful metadata.

Several well-known ERP systems do not hold any Primary Key (PK) or Foreign Key (FK) constraints in the Database itself. It’s only within their application layer that this knowledge is held. It is within the proprietary ERP Data Dictionary where anything resembling a ‘Logical View’ of the data and the definitions are held.
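The effect can be reproduced in miniature. The sketch below, again using SQLite as a stand-in, creates two tables with no declared keys (the `ZCUST` and `ZORD` names and their contents are invented, not real ERP tables), shows that the catalog reports no foreign keys, and then applies a simple value-containment check to flag a candidate relationship. The containment check is an assumption of this example and only a heuristic: it can suggest where to look, but the authoritative definitions remain in the ERP’s own data dictionary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical package tables: no PRIMARY KEY or FOREIGN KEY is declared.
    CREATE TABLE ZCUST (CUSTNO TEXT, NAME TEXT);
    CREATE TABLE ZORD  (ORDNO TEXT, CUSTNO TEXT);
    INSERT INTO ZCUST VALUES ('C1', 'Acme'), ('C2', 'Globex');
    INSERT INTO ZORD  VALUES ('O1', 'C1'), ('O2', 'C1'), ('O3', 'C2');
""")


def declared_foreign_keys(conn, table):
    """Foreign keys that the database catalog itself knows about."""
    return conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()


def is_candidate_fk(conn, child, child_col, parent, parent_col):
    """True if every non-null child value also appears in the parent column."""
    orphans = conn.execute(
        f'SELECT COUNT(*) FROM "{child}" c '
        f'WHERE c."{child_col}" IS NOT NULL AND NOT EXISTS ('
        f'SELECT 1 FROM "{parent}" p WHERE p."{parent_col}" = c."{child_col}")'
    ).fetchone()[0]
    return orphans == 0


print(declared_foreign_keys(conn, "ZORD"))                        # nothing declared
print(is_candidate_fk(conn, "ZORD", "CUSTNO", "ZCUST", "CUSTNO"))  # worth verifying
```

A reverse engineering tool pointed at this database would draw two unrelated boxes, which is essentially what a model taken directly from the DBMS of such a package looks like.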

FIGURE 4: Part of an ERP reverse engineered directly from the DBMS


What we really need is to be able to get the ERP metadata into a useful format similar to that shown in Figure 5 below.

How can we do that? Well there isn’t space in this article to go into the detail, and much of it varies from ERP to ERP. However with SAP, there is a metadata extraction facility independently available called SAPHIR. Additionally, you can also validate a model created from SAPHIR by examining key screen items such as in the example illustrated below in Figure 6.

Summary: Why develop Data Models for package implementation

So why do we need to bother undertaking Data Modelling when implementing an ERP system?

1. For requirements gathering. If your business data is part of your requirement, you need to model it.

2. For a fit for purpose evaluation. Surely you must have evaluated the suitability of the package before deciding to implement it?

3. For gap analysis. Even if you are told “it’s a done deal – we are going with package X”, the Data Model will give you rich insight into gaps in key areas of functionality. I have used this many times with clients when implementing major well known packages to help spot areas where a workaround or manual implementation will be required.

4. For configuration. Using models as a communication vehicle to demonstrate use cases is invaluable. From these the many options in the ERP system can be examined and then configured with confidence.

5. For legacy data migration and take on.

6. For master data alignment. The ERP may have its own master data sets. You can use the model to ensure correct alignment of these with your corporate master data initiative. Don’t fall into the trap of letting the tail wag the dog!

7. Fundamentally, this is the key one. It’s all about ensuring that your ERP data can integrate within your overall Information Architecture.

FIGURE 5: Useful model from an ERP

FIGURE 6: Validating an ERP model from transaction screens


2. SOA AND XML

I don’t intend to give a detailed exposition on the subject of SOA; however, it’s worth reminding ourselves of the fundamental components in the architecture.

The Bus in SOA is a “conceptual” construct, which helps to get away from point to point thinking. One approach to integrating applications via “a bus” is to use Message Oriented Middleware (MOM).

A Message Broker is a dispatcher of messages and comes in many varieties. The broker operates upon a queue of messages within the routing table.

Adapters are where the different technology worlds are translated, e.g. UNIX, Windows, OS/390 and so on.

Fundamentally, SOA is built upon a message based set of interactions, i.e. all interaction between components is through messages. These are generally XML messages, so it is true to say that XML is at the core of SOA.

But there is a potential problem.

XML is a hierarchical structure (just like in the good old days of IMS & DL1), but the real world of data is not.

Let’s illustrate this with a real world example – a book. Looking at Figure 7, we see that this book is entitled “Data Modeling for the Business”. When we look at this example we see data such as Title, Author(s), ISBN, Price, Publisher, Amazon URL and so on.

FIGURE 7: Book example


Looking at the authors (myself, Steve & Donna), there is also some information (on the back cover) relating to each of us.

We can develop a Data Model to represent this “real world” data and show it in an Entity Relationship format. Typically these ER models can represent real world data pretty accurately.

Figure 8 shows an example ER model for the “book authoring” data subject area.

A few of the business assertions that this Data Model makes are that:

A. A book can be written (authored) by at least one & possibly several writers (in this case, me, Steve and Donna).

B. A writer may be the author of many books (e.g. Steve has also written “Data Modeling Made Simple”).

C. Thus Book <> Writer is a many to many relationship. However, the intersection entity is a real world concept; it’s the “Book Authorship” entity and this is shown in Figure 8.
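Assertions A to C can be written down as a small, hedged Python sketch. The three classes mirror the Book, Writer and Book Authorship entities named in the text; the surrogate identifiers, the attribute choices and the helper functions are assumptions for illustration only, since the attributes of the actual ER model are not listed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    book_id: int  # hypothetical surrogate key
    title: str


@dataclass(frozen=True)
class Writer:
    writer_id: int  # hypothetical surrogate key
    name: str


@dataclass(frozen=True)
class BookAuthorship:
    """Intersection entity resolving the many to many Book <> Writer relationship."""
    book_id: int
    writer_id: int


books = [Book(1, "Data Modeling for the Business"), Book(2, "Data Modeling Made Simple")]
writers = [Writer(1, "Christopher"), Writer(2, "Steve"), Writer(3, "Donna")]
authorships = [
    BookAuthorship(1, 1), BookAuthorship(1, 2), BookAuthorship(1, 3),
    BookAuthorship(2, 2),
]


def writers_of(book_id):
    """Assertion A: the writers of one book (possibly several)."""
    ids = {a.writer_id for a in authorships if a.book_id == book_id}
    return [w.name for w in writers if w.writer_id in ids]


def books_by(writer_id):
    """Assertion B: the books one writer has authored (possibly many)."""
    ids = {a.book_id for a in authorships if a.writer_id == writer_id}
    return [b.title for b in books if b.book_id in ids]


def books_without_author():
    """Violations of the 'at least one writer' part of assertion A."""
    authored = {a.book_id for a in authorships}
    return [b.title for b in books if b.book_id not in authored]


print(writers_of(1))
print(books_by(2))
print(books_without_author())
```

Note that a Book Authorship row has two parents, a Book and a Writer, which is the property that a purely hierarchic representation cannot express directly.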

Now, when we want to use data in this model within an XML based system we have to remember that XML messages are hierarchic; that is a child entity can only have one parent entity; whereas an entity relationship (ER) model allows a child entity to have several parent entities. Thus we need to do something to turn the ER model representation into a hierarchic XML representation. To accomplish this we need to decide whether to make “Book” the parent of Book Authorship or to choose “Writer” to be the parent.

In Figure 9, the resultant XML model has been created after choosing Book as the parent.

Whilst simplistic (for the sake of the example), the XML model in Figure 9 now represents the XML schema we’re going to use. Within our SOA based system, we may have a transaction which utilises an XML message called “Book Details”. Figure 10 below shows how an XML message has been created from the XML schema, and is utilised (in the message queue) in our SOA solution.
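To make the “Book as parent” choice concrete, here is a minimal sketch that builds a hierarchic message with Python’s standard `xml.etree.ElementTree` module. The element names (`BookDetails`, `Book`, `BookAuthorship`, `Writer` and so on) are assumptions made for this example; the actual schema and message are those shown in Figures 9 and 10.

```python
import xml.etree.ElementTree as ET


def book_details_message(book, writer_names):
    """Build a hierarchic XML message with Book as the parent of Book Authorship."""
    root = ET.Element("BookDetails")
    book_el = ET.SubElement(root, "Book")
    ET.SubElement(book_el, "Title").text = book["title"]
    ET.SubElement(book_el, "ISBN").text = book["isbn"]
    for name in writer_names:
        # Each authorship, and the writer it points to, is nested under the book.
        authorship_el = ET.SubElement(book_el, "BookAuthorship")
        writer_el = ET.SubElement(authorship_el, "Writer")
        ET.SubElement(writer_el, "Name").text = name
    return ET.tostring(root, encoding="unicode")


book = {"title": "Data Modeling for the Business", "isbn": "978-0-9771400-7-7"}
message = book_details_message(book, ["Christopher", "Steve", "Donna"])
print(message)
```

The consequence of the design choice is visible in the output: writer details travel inside each book, so a writer who has authored several books is repeated in every corresponding message. Choosing Writer as the parent would reverse the trade-off.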

So clearly, Data Modelling IS a key component required in a SOA implementation.

It’s somewhat ironic that this “new” SOA concept and the representation of data in a hierarchic form (i.e. in XML messages), draws heavily on the approaches we had to employ when designing a database schema for IMS and DL1 which were hierarchic DBMS’s!


FIGURE 8: Book example ER model

FIGURE 9: Book XML model

FIGURE 10: Book details XML model


3. BUSINESS INTELLIGENCE

When looking at Business Intelligence and Data Warehouses, we are trying to ensure that the data utilised by the business for their queries and reports is reliable.

In order to accomplish this, not only do we need to manage the data that the business utilises, but also the metadata. We all know by now that much of this metadata is contained within the data models.

So, what are the main reasons for managing this model metadata?

1. Reduce Cost: In addition to all the other points below, the goal here is to reduce the overall cost of managing a significant part of the IT infrastructure. Managing metadata helps automate processes, reduce costly mistakes of creating redundant/non-conformant data, and reduce the length of time to change systems according to business needs.

2. Higher Data Quality: Without proper management, the same type of data may be managed differently in each place it is used, which degrades its quality and accuracy.

3. Simplified Integration: If data is understood and standardised, it reduces the need for complex and expensive coding and scripting to transform and massage data during integration.

4. Asset Inventory: Managing the knowledge about where data lives and what you store is critical for eliminating redundant creation.

5. Reporting: Creating standard definitions of data types, and making them easy for the enterprise to find, will reduce cost in application development (e.g. time to research and create new objects) as well as facilitate a general understanding of the enterprise’s data.

6. Regulatory Compliance: Without metadata management, you are not complying with regulations. Bottom line: An audit trail of data, starting with its whereabouts, is critical to complying with government mandates.


The top 5 benefits from managing this model metadata for reporting are:

#5 DATA STRUCTURE QUALITY.

Models ensure that the business design of data architecture is appropriately mapped to the logical design, providing comprehensive documentation on both sides.

#4 DATA CONSISTENCY.

By having standardised nomenclature for all data (including domains, sizing, and documentation formats), the risk of data redundancy or misalignment is greatly reduced.

#3 DATA ADVOCACY.

Models help to emphasise the critical nature of data within the organisation, indicating direction of data strategy and tying data architecture to overall enterprise architecture plans, and ultimately to the business’s objectives.

#2 DATA REUSE.

Models, and encapsulation of the metadata underpinning data structures, ensure that data is easily identified and is leveraged correctly in the first place, speeding incremental tasks through reuse and minimising the accidental building of redundant structures to manage the same content.

#1 DATA KNOWLEDGE.

Models, combined with an efficient modelling practice, enable the effective communication of metadata throughout an organisation, and ensure all stakeholders are in agreement on the most fundamental requirement: the data.


ER Models vs. Dimensional Models for Reporting

A lot has been written previously about the appropriateness of ER vs Dimensional Models for BI and Data Warehousing. To dispel any myths it’s worth looking at the key features of each type of model:

FEATURES OF A DIMENSIONAL MODEL

» “Star Schema” (or snowflake or even star flake)

» Optimised for reporting

» Business entities are de-normalised

» More data redundancy to support faster query performance

» Relationships between business entities are implicit (it’s evident that a product has a brand and manufacturer, but the nature of the relationship between these entities is not immediately obvious)

» Loosely coupled to the business model – changes to the business model can often be accommodated via graceful changes without invalidating existing data or applications.

FEATURES OF AN ER MODEL

» Optimised for transactional processing (arrival of new data)

» Normalised, typically in 3rd (or 5th) normal form

» Designed for low redundancy of data

» Relationships between business entities are explicit (e.g. Product determines brand determines manufacturer)

» Tightly coupled to current business model
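To make the contrast concrete, here is a small sketch using Python's built-in `sqlite3` module. The normalised tables follow the product, brand and manufacturer example mentioned above; the table names, column names and sample rows are assumptions for illustration, and `dim_product` is simply one plausible de-normalised dimension derived from them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- ER style (normalised): each fact is stored once and the
-- relationships are explicit as foreign keys.
CREATE TABLE manufacturer (manufacturer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE brand (brand_id INTEGER PRIMARY KEY, name TEXT,
                    manufacturer_id INTEGER REFERENCES manufacturer);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT,
                      brand_id INTEGER REFERENCES brand);
INSERT INTO manufacturer VALUES (1, 'Maker A');
INSERT INTO brand VALUES (10, 'Brand X', 1);
INSERT INTO product VALUES (100, 'Widget', 10), (101, 'Gadget', 10);

-- Dimensional style (de-normalised): one wide row per product, so
-- brand and manufacturer names are repeated and the relationship
-- between them is only implicit.
CREATE TABLE dim_product AS
SELECT p.product_id,
       p.name AS product_name,
       b.name AS brand_name,
       m.name AS manufacturer_name
FROM product p
JOIN brand b ON b.brand_id = p.brand_id
JOIN manufacturer m ON m.manufacturer_id = b.manufacturer_id;
""")

# The manufacturer name is stored once in the ER tables...
manufacturer_count = conn.execute("SELECT COUNT(*) FROM manufacturer").fetchone()[0]

# ...but appears on every row of the dimension, with no join needed to read it.
dimension_rows = conn.execute(
    "SELECT product_name, brand_name, manufacturer_name "
    "FROM dim_product ORDER BY product_id"
).fetchall()
print(manufacturer_count, dimension_rows)
```

Reporting queries against `dim_product` need no joins, at the cost of the redundancy the bullet points describe.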


4. DATA LINEAGE

The knowledge of how data is transformed is itself valuable intellectual property that should be retained within a business. Very importantly, it is also necessary for compliance with the Basel II Accord and the Sarbanes-Oxley Act (SOX): SOX requires that the lineage and transformation of financial data is recorded as it flows through business systems.

There are two key aspects of Data Lineage:

TRANSFORMATIONS:

» What has been done to the data?

BUSINESS PROCESSES:

» Which business processes can be applied to the data?

» What type of actions do those processes perform (Create, Read, Update, Delete)?

» Audit Trail – who has supplied, accessed, updated, approved and deleted the data and when?

» Which processes have acted on the data?

So where do I need Data Lineage?

For the design of ETL processes, the creation of Dimensional Models, the transformation of data to XML (typically from ER) and for workflow design.

Don’t forget Data Lineage – it’s applicable to many aspects, and now with regulatory compliance requirements in many sectors this is also a statutory need.

In BI and DW applications, mappings and transformations determine how each field in the Dimensional Model is derived. The derivations could actually drive the ETL process. In lineage, as in BI, the metadata is vital!
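The idea that derivations could drive the ETL process can be sketched in a few lines of Python. Everything named here is hypothetical: the `MAPPINGS` structure, the field names and the rules stand in for source-to-target mappings that would normally be held as model metadata.

```python
# Hypothetical source-to-target mappings of the kind a modelling tool or
# metadata repository might hold. Field names and rules are illustrative.
MAPPINGS = [
    {
        "target": "customer_name",
        "sources": ["first_name", "last_name"],
        "rule": lambda first, last: f"{first} {last}".strip(),
    },
    {
        "target": "total_revenue",
        "sources": ["net_amount", "tax_amount"],
        "rule": lambda net, tax: net + tax,
    },
]


def transform(source_row, mappings=MAPPINGS):
    """Derive every target field by applying its mapping rule to the source row."""
    target_row = {}
    for mapping in mappings:
        args = [source_row[name] for name in mapping["sources"]]
        target_row[mapping["target"]] = mapping["rule"](*args)
    return target_row


def lineage_of(target_field, mappings=MAPPINGS):
    """Return the source fields a target field is derived from."""
    for mapping in mappings:
        if mapping["target"] == target_field:
            return list(mapping["sources"])
    return []


row = transform(
    {"first_name": "Ada", "last_name": "Lovelace", "net_amount": 100, "tax_amount": 20}
)
print(row, lineage_of("total_revenue"))
```

Because the same metadata both performs the transformation and answers the lineage question, the two cannot drift apart, which is the point of holding the mappings with the model.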

Big Data Trap: Using the metadata to help understand and document Data Lineage (as well as to help with business data understanding, data glossaries and so on) is one of the areas which companies rush into. This “me too” attitude towards big data can be damaging if companies don’t tread incredibly carefully. After all, if you haven’t got your “little & medium” data strategy correct, how can you hope to succeed in the big data space?

What is the problem that lineage helps to address?

Fundamentally we need to be able to help business users to answer questions or concerns raised such as:

» That figure doesn’t look right! Where does it come from?

» How can we prove to the auditor that financial data has been handled correctly?

Not only do we need to help our primary customers (the business folks), but we also need to be able to help IT staff to answer questions such as:

» I need to integrate the data supplied from your system with the data in my system.

» How can I understand where your data has come from and what it means?

And finally, we need to be able to help systems to answer questions such as:

» When a piece of source data is updated, which items in the Data Warehouse will need to be recalculated?
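The last question is essentially an impact analysis over lineage metadata. A minimal sketch, assuming the lineage is available as a simple "feeds" graph (the `FEEDS` dictionary and all item names below are invented for the example):

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the items listed against it.
FEEDS = {
    "src.orders.net_amount": ["stg.orders.net_amount"],
    "stg.orders.net_amount": ["dw.fact_sales.total_revenue"],
    "src.orders.tax_amount": ["dw.fact_sales.total_revenue"],
    "dw.fact_sales.total_revenue": ["dw.agg_monthly.revenue"],
}


def downstream(item, feeds=FEEDS):
    """Breadth-first walk returning every item derived, directly or not, from `item`."""
    seen = set()
    queue = deque([item])
    while queue:
        current = queue.popleft()
        for target in feeds.get(current, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(seen)


affected = downstream("src.orders.net_amount")
print(affected)
```

Walking the same edges in the opposite direction would answer the business user's question, "Where does that figure come from?".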

So why does Data Lineage matter?

We aim to have an increased understanding of where data comes from and how it is used, which will lead to increased confidence in the accuracy of data.


5. DATA VIRTUALISATION

One of the great new technologies to emerge recently is Data Virtualisation. Most of us will be familiar with Storage Virtualisation and even Server Virtualisation. The purpose of virtualisation in the IT world is to mask complexity, and present a virtual representation of the thing as if it were a real instance itself.

So with Data Virtualisation, data can be federated from a very wide variety of heterogeneous environments and data storage systems, but presented to an application as if it were a real SQL table, XML message, Web service, SOAP call etc.

Figure 11 illustrates a typical data virtualisation architecture.

But what is going to be presented to the applications? We’ve got all sorts of different data formats, rules, characteristics and so on in the source data. So what are we going to show in our nice new uniform view of the data that is presented to the applications?

It’s the Data Model that is absolutely the language, the key which unlocks the potential of Data Virtualisation. The Data Model informs the federation layer of the DV toolset, and it is against the definitions & structures of the Data Model that the consuming applications access the data.

You can almost imagine Data Virtualisation as being “views on steroids”.

FIGURE 11: Typical Data Virtualisation Architecture
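The “views on steroids” analogy can be illustrated with an ordinary SQL view. In this sketch a single in-memory SQLite database stands in for what would really be several separate systems behind a DV tool's federation layer; the two source tables, their columns and the `customer` view are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Two hypothetical sources holding customers in different shapes.
CREATE TABLE crm_customer (cust_no INTEGER, full_name TEXT, country_code TEXT);
CREATE TABLE billing_client (client_id TEXT, surname TEXT, forename TEXT, country TEXT);
INSERT INTO crm_customer VALUES (1, 'Ada Lovelace', 'GB');
INSERT INTO billing_client VALUES ('B-7', 'Hopper', 'Grace', 'US');

-- The view plays the role of the entity defined in the Data Model:
-- consumers see one uniform structure, whatever the sources look like.
CREATE VIEW customer AS
SELECT 'crm-' || cust_no AS customer_id,
       full_name         AS name,
       country_code      AS country
FROM crm_customer
UNION ALL
SELECT 'billing-' || client_id,
       forename || ' ' || surname,
       country
FROM billing_client;
""")

# A consuming application queries the model-defined structure only.
rows = conn.execute(
    "SELECT customer_id, name, country FROM customer ORDER BY customer_id"
).fetchall()
print(rows)
```

The consuming query never mentions the source tables, so a source can be reshaped or replaced by changing only the view definition.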


6. COMMUNICATING WITH THE BUSINESS

Finally, Data Modelling can play a very useful role in helping to communicate with the business.

As described earlier in this paper, Data Models can be produced at different levels (Enterprise, Conceptual, Logical, Physical) and are for different audiences. At the higher levels a model is a phenomenal tool for getting across ideas, concepts and gaining a good understanding of the language and meaning of the major data concepts in the business.

At the highest level, an Enterprise Data Model documents the very high level business data objects and definitions. Its scope is Enterprise wide and is there to provide a strategic view of Enterprise data. The Enterprise Data Model is there to get across big picture, high level concepts.

In a Conceptual Data Model, the business key, attributes and definitions of major business data objects are developed. It also shows the relationships between major business data objects. It is used to communicate with the business, to give an overview of the main entities, super types, attributes, and relationships. It will contain lots of ‘Many to Many’ and multiple meaning relationships. All of this is addressed in the more detailed logical data model, after there is agreement on scope and definitions from these high level models.

Fundamentally, these high level models have different perspectives and levels of detail for different uses.


DEMONSTRATING BENEFITS

As I mentioned earlier, we constantly need to demonstrate the benefits accruing from data modelling. Nobody owes us a living, and no matter how important WE believe the place of modelling to be, it is incumbent upon us to demonstrate (and sell) the benefits within our organisations.

Remember, soft skills are becoming critically important for Information Professionals, and whilst you might not like it, the hard facts are that part of YOUR job nowadays IS marketing.

So just how can you gain traction, budget and Executive buy-in? Here are a few tips:

1. Be visible about the program:

» Identify key decision-makers in your organisation and update them on your project and its value to the organisation.

» Focus on the data that is crucial to the business first! Publish that and get buy-in before moving on (e.g. start small with a core set of data).

2. Monitor the progress of your project and show its value.

3. Define deliverables, goals and key performance indicators (KPIs).

4. Start small. Focus on core data that is highly visible in the organisation. Don’t try to “boil the ocean” initially.

5. Track and promote progress that is made.

6. Measure metrics where possible:

» “Hard data” is easy (for example, number of data elements, number of end users, money saved, etc.)

» “Softer data” is important as well (data quality, improved decision-making, etc.). Anecdotal examples help with business/executive users, e.g. “Did you realise we were using the wrong calculation for Total Revenue?” (based on data definitions)




THE GREATEST CHANGE REQUIRED

We need to recognise the real world commercial climate that we’re working in and break away from arcane academic arguments about notations, methodologies and the like. If we want to have Data Modelling play a real part in our business then it’s up to us to demonstrate and communicate the benefits that are realised.

Remember, Data Modelling isn’t a belief system. Just because you “get it”, don’t assume that the next person does.

As Information Professionals, we need to break away from the “you must read my detailed Data Model” mentality and make the appropriate model information available in a format users can readily understand.

This means, for example, that Data Architects need to recognise the different motivations of their users, and re-purpose the information they present to be suitable for the audience: Don’t show a business user a Data Model!

Information should be updated instantaneously, and we must make it easy for users to give feedback. After all, you will achieve common definitions much quicker that way.


SO WHAT CAN WE DO?

1. Provide information to users in their “Language”:

» Repurpose information into various tools: BI, ETL, DDL, etc.

» Publish to the web.

» Exploit collaboration tools / SharePoint / Wiki and so on. What about a Company Information Management Twitter channel?

» Business users like Excel, Word, Web tools, so make the relevant data available to them in these formats.

2. Document Metadata:

» Data in context (by Organisation, Project, etc.)

» Data with definitions.

3. Provide the Right Amount of Information:

» Don’t overwhelm with too much information. For business users, terms and definitions might be enough.

» Cater to your audience. Don’t show DDL to a business user or business definitions to a DBA.

4. Market, Market, Market!

» Provide visibility to your project.

» Talk to teams in the organisation that are looking for assistance.

» Provide short-term results with a subset of information, and then move on.

5. Be aware of the differences in behaviour & motivations of different types of users. For example, a DBA is typically:

» Cautious.

» Analytical.

» Structured.

» Doesn’t like to talk.

» “Just let me code!”

However, a Data Architect is:

» Analytical.

» Structured.

» Passionate.

» “Big Picture” focused.

» Likes to talk.

» “Let me tell you about my data model!”

And a Business Executive is:

» Results-Oriented.

» “Big Picture” focused.

» Has little time.

» “How is this going to help me?”

» “I don’t care about your data model.”

» “I don’t have time.”

As Information Professionals we’ve got to get these softer skills baked into ourselves and our colleagues. Some of the key things we can do as a profession are to:

» Remember, nobody owes us a living, so we must constantly demonstrate benefits. As data professionals we constantly need to fight for our existence.

» Examine professional certification (CDMP / BCS etc.). This shows we are serious about our profession.

» Develop interpersonal skills.

» Avoid methodology wars & notation bigots. Please don’t air discussions about Barker vs IE vs UML class diagrams in front of business users. Yes, sadly enough I have seen this done!


WHAT NEEDS TO STAY THE SAME?

Having highlighted the areas that need to change in order to make modelling more relevant to our business colleagues, and the information environments of today, are there any things that should stay the same?

Yes indeed.

We must keep the disciplines and best practices that have existed in the modelling community for many years. These can be categorised into 3 major areas as follows:

1. MODELLING RIGOUR

Development of Conceptual, Logical and Physical Data models with good lineage and object re-use.

Structures created in the most appropriate normal form (typically 3rd normal form), with good and consistent data definitions for all components of the data model.

2. STANDARDS & GOVERNANCE

These cover standards for both development and usage of information models, including aspects of data quality.

Data Governance including ownership, stewardship and operational control of the data.

3. OBJECT REUSE VIA A COMMON REPOSITORY

Not only used for data modelling, the metadata that is captured whilst developing Conceptual, Logical and Physical Data models is of immense use for many aspects of the business. Interestingly, several organisations are now beginning to use this metadata as the basis of their Business Data Dictionaries.

The key here is holding the metadata in a common repository and reusing the objects where appropriate.


CONCLUSION

We have seen that Data Models can be produced at different levels and for different purposes and audiences.

We have examined many aspects of Data Modelling, starting with its history, its use in DBMS development and the way it is taught in some universities, and have firmly refuted the criticism that it is only appropriate for DBMS development.

However, as data professionals, it’s up to us to make the biggest change necessary to make it appropriate to the technologies and business environments of today.

We need to grasp the nettle and engage and communicate effectively within our businesses.

Throughout this paper we have illustrated that data is at the heart of all architecture disciplines.


SO AS CAPTAIN PICARD SAID:

MR DATA - MAKE IT SO!


FOR MORE DETAILS CONTACT

CHRIS BRADLEY Chief Information Architect & Enterprise Services [email protected]

ABOUT THE AUTHOR

With 32 years’ experience in the Information Management field, Chris works with leading organisations including Total, Barclays, RBS, GSK, Disney, BP, Statoil, Riyad Bank & Aramco in Data Governance, Information Management Strategy, Data Quality & Master Data Management, Metadata Management and Business Intelligence.

He is a Director of DAMA-I, holds the CDMP Master certification, and is a Fellow of the Chartered Institute of Management Consulting (now IC), a member of the MPO, and SME Director of the DM Board.

As a columnist and frequent contributor to industry publications, Chris is a recognised thought-leader in Information Management. He leads an experts channel on the influential BeyeNETWORK, is a regular speaker at major international conferences, and is the co-author of “Data Modelling For The Business – A Handbook for aligning the business with IT using high-level data models”. He also blogs frequently on Information Management (and motorsport).

Chris is the Global Chief Information Architect and UK Enterprise Services Director of Enterprise Architects, an International strategy & architecture professional services firm, providing strategic architecture delivery and support services globally.



ABOUT ENTERPRISE ARCHITECTS

Enterprise Architects (EA) is an international professional services firm specialising in business design, strategy and enterprise architecture.

OUR HISTORY

Enterprise Architects (EA) was founded in Melbourne, Australia in 2002 by Hugh Evans, our CEO. With his background in traditional architecture, Hugh was motivated to bring the benefits of Architecture Thinking to business strategy and transformation.

EA soon became a magnet for architecture talent, providing the ideal environment to learn and access strategic projects with some of the world’s most ambitious and forward thinking organisations.

A decade on, EA stands as one of the world’s premier firms delivering strategy and architecture and remains a pioneer in the growing practice of business design. We’re delivering a new kind of architecture capability, one that drives richer business engagement, strategic alignment and fast-paced change.

OUR PHILOSOPHY

Being a services firm we are centred on the needs and experiences of the people we impact. We believe good strategy requires participants to discuss opportunities and issues on common ground – comparing apples to apples.

Through our advanced business architecture-oriented methods we bring together all parties and build consensus and real belief for the strategic roadmap ahead.

OUR APPROACH

Our strength is more than just world-class practice in business design, capability-based planning and strategic enterprise architecture.

It’s about how we engage with clients, offering a seamless extension to their existing capability, however mature, and defining the roadmaps that will bring ground-breaking competitive strategies to life.

OUR EXPERIENCE

Many of the world’s leading brands trust EA to extend their business design and strategic architecture capabilities.

We are experienced across most major industry sectors including Banking & Finance, Insurance, Tech, Energy, Oil & Gas, Telco, Health, Retail, Transport & Logistics, Professional Services, and Higher Education, as well as a broad range of government departments and agencies at local, state and federal levels.

Over the last 11 years we’ve developed architectures and supported capability for organisations across 5 continents.

Learn more about Enterprise Architects at:

enterprisearchitects.com


NEW YORK: The Seagram Building, 375 Park Avenue, Suite 2607, New York City, NY 10152, U.S.A

+1 212 634 [email protected]

LONDON: 19 Eastbourne Terrace, London, W2 6LG, United Kingdom

+44 20 8906 [email protected]

MELBOURNE: Level 46, Rialto South Tower, 525 Collins St, Melbourne VIC 3000, Australia

+61 3 9615 [email protected]

SYDNEY: Level 3, 39 Martin Place, Sydney NSW 2000, Australia

+61 2 8222 [email protected]

PERTH: Level 28, AMP Tower, 140 St Georges Terrace, Perth, WA 6000, Australia

+61 8 9278 [email protected]

BRISBANE: Level 36, Riparian Plaza, 71 Eagle Street, Brisbane, QLD 4000, Australia

+61 7 3121 [email protected]