FUTURE BRIGHT
A DATA DRIVEN REALITY
TABLE OF CONTENTS

Foreword by Bert Boers
Preface by Jeroen Dijkxhoorn
HR service provider Securex heads for 100% reliable CRM
Data integration evolves thanks to Big Data and open source
Infographic: Data Management
Rijkswaterstaat gains comprehensive insight into its performance
Ronald Damhof enables enterprise-wide discussion on data with the Data Quadrant Model
Improving insight into Belgium’s economic situation
DSM gains control of Master Data
Who is your data governor?
Jill Dyché blogs about her book The New IT
Credit insurance company integrates quality ratios in risk assessment
Data dependent on quality improvements
Data-driven decisions make the difference
Master Data Management as a foundation of your business
About SAS
FOREWORD
The seventh edition of Future Bright explores the theme of ‘A Data Driven Reality’. The data driven society is evolving more rapidly than many organizations seem to realize. This is fuelled by developments such as the Internet of Things, which generates vast new flows of data, creating new business opportunities. At the same time, market research firm Forrester has declared that we are living in the ‘Age of the Customer’. Customers leave a digital trail and expect their suppliers to use that information to provide a better and more relevant customized offering.
Both developments have a significant impact, as many organizations are now beginning to realize. But as
yet they are taking little structured action to genuinely prepare their organization for this Data Driven
Reality. That’s understandable, because this is uncharted territory. How can you go about it? Where do
you start?
Organizations are aware that they first need to get their foundation in order. At the same time they see a
lot of low-hanging fruit that they can pick with rewarding projects in the field of big data analytics. How do
those two worlds interrelate? Which investment generates the fastest return?
We aim to provide new insights with this book. Drawing on interviews with customers such as Securex,
DSM, Rijkswaterstaat and Crédito y Caución and experts such as Jill Dyché and Ronald Damhof, we show the
steps necessary to give data management a central role in your organization, so that you can get the
basics in order and can fully exploit your data to drive innovation, conversion and satisfaction.
We hope you find it inspiring reading.
Bert Boers
Vice President South-West Europe region
SAS Institute
PREFACE
Jeroen Dijkxhoorn
DIRECTOR ANALYTICAL PLATFORM CENTER OF EXCELLENCE AT SAS
The fact that data was a by-product of the process resulted in databases with a considerable number of
errors and omissions. To cope with this, the data was always validated before anyone used it. If a lot of
data was found to be incorrect, all efforts were suddenly focused on supplying missing data, correcting
incorrect data, and/or cleaning contaminated databases. Human intervention was always required.
Data automatically initiates processes
This operating method is becoming problematic now that data sources are increasingly linked and
processes are likely to start at any time. Whereas the start time of an e-mail marketing campaign
used to be determined by a marketer, it now starts when triggers are received from the customer and
you want to respond. The more you understand the customer journey, the easier it will be to respond
to those triggers and the more relevant you will be as an organization for your customers. This forces
you to set out policies stating how your organization will respond when your customer or prospect
requests certain information or signs up for an e-mail newsletter.
The process then continues entirely automatically, without human intervention and hence also with-
out the validation that used to take place. The correctness of data therefore has to be checked auto-
matically, by means of services which can be deployed throughout the process. Here we can draw a
distinction between data validation (technical correction of data in a physical data stream) and data
quality (verification of functional correctness).
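To make the distinction concrete, the sketch below shows, purely as an illustration and not as SAS functionality, how the two kinds of check could be deployed as small services that run without human intervention: one checks the technical correctness of a record in the physical data stream, the other its functional correctness against business rules. The field names and rules are assumptions.

```python
import re
from datetime import date

# Hypothetical sketch (not SAS functionality): automated checks that replace
# the manual validation step described above.
# "Validation" = technical correctness of the record in the physical data stream.
# "Quality"    = functional correctness against business rules.

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
KNOWN_COUNTRIES = {"NL", "BE", "LU", "FR"}  # assumed reference data

def validate(record):
    """Technical checks: required fields are present and well-formed."""
    errors = []
    for field in ("email", "country", "signup_date"):
        if not record.get(field):
            errors.append("missing field: " + field)
    if record.get("email") and not EMAIL_RE.match(record["email"]):
        errors.append("malformed e-mail address")
    return errors

def quality(record):
    """Functional checks: the values make sense in the business context."""
    issues = []
    if record.get("country") not in KNOWN_COUNTRIES:
        issues.append("country not in reference list")
    if record.get("signup_date") and record["signup_date"] > date.today():
        issues.append("signup date lies in the future")
    return issues

record = {"email": "j.smith@example.com", "country": "DE",
          "signup_date": date(2015, 3, 1)}
print(validate(record))  # []                                -> technically correct
print(quality(record))   # ['country not in reference list'] -> functionally suspect
```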
Data Driven Reality
Organizations used to be driven by processes, but now they are driven by data. This means any failure
to identify an error can have an immediate, major impact. Manual correction is no longer possible, so
the error will show up in multiple locations. That makes data quality monitoring much more impor-
tant. It also explains why compliance rules and regulations are imposing new data quality require-
ments. Supervisors nowadays want data, not reports. That requires a data driven organization. We are
now speeding towards a data driven reality.
The problem is not technology, but the lack of central supervision of the consistency of data defini-
tions – data governance. That is typically the task of a Chief Data Officer, which many organizations
still lack.
Data quality has been an issue ever since the first database was created. It was a subject that for a long time received little attention, for the simple reason that process efficiency was always more important than the completeness and accuracy of data. Data was a by-product of the process. That time is over. We are heading towards a data driven reality.
Age of the Customer and Internet of Things are drivers
It is high time to take action, because in the Age of the Customer you need to respond flexibly to trig-
gers from customers. This requires a 360 degree view of the customer. We have been talking about it
for many years, but still don’t have it because customer data are spread across various systems. The
lack of supervision of data definitions makes it impossible to pull the data together.
Another driver is developments resulting from the Internet of Things. This will generate a new stream
of data that you will want to use to optimize and largely automate your processes. This also requires a
good vision on data management.
Combination of different types of data
Whichever of the two stated realities is your main driver, in both situations it is increasingly important
to combine 100% reliable data with data containing a degree of uncertainty. Examples are weather
forecasts or social media sentiment analyses. How is it possible to combine these unstructured data,
often stored in Hadoop clusters, in an appropriate way with structured data that is 100% accurate,
such as route planning for truck drivers or purchasing behaviour data?
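As a hedged illustration of that combination, the sketch below keeps the reliable, structured records authoritative and attaches the uncertain signal together with an explicit confidence value, rather than mixing the two sources as if they were equally trustworthy. All field names and figures are invented for the example.

```python
# Illustrative sketch only (all field names and figures invented): combine
# authoritative, structured order data with an uncertain social-media
# sentiment signal, keeping the uncertainty explicit instead of hiding it.

orders = [  # 100% reliable source, e.g. the ERP or order system
    {"customer": "C001", "region": "North", "orders_last_quarter": 12},
    {"customer": "C002", "region": "South", "orders_last_quarter": 3},
]

sentiment = {  # uncertain source, e.g. derived from data stored in a Hadoop cluster
    "C001": {"score": -0.6, "confidence": 0.4},
    "C002": {"score": 0.8, "confidence": 0.9},
}

for row in orders:
    s = sentiment.get(row["customer"], {"score": 0.0, "confidence": 0.0})
    # Weight the soft signal by its confidence rather than treating it as a fact.
    row["churn_signal"] = round(max(0.0, -s["score"]) * s["confidence"], 2)
    print(row)
```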
From a cost point of view it is not feasible to store all that data in the same database. But that would
also be highly undesirable from an organizational point of view, as Ronald Damhof explains later in
this book. After all, there is a big difference between data with which you have to account to supervi-
sors and data which you use to experiment, in pursuit of ideas for innovation. And yet those different
ways of using data must be combined, without physically lumping all the data together.
This complexity requires a clear logical data model and clear data definitions. Without these data
definitions and good data stewardship, it is impossible to exploit the opportunities that are arising in
the market and which your competitors will respond to in droves. The question is therefore no longer
whether you will start, but when. Our advice is: act today. Data is your main asset. Act accordingly and
do something with it, before a competitor or a new market player beats you to it. ■
“Organizations used to be driven by processes, now they are driven by data”
Jeroen Dijkxhoorn
CLEANING UP THE CLIENT RELATIONS DATABASE AND THEN KEEPING IT CLEAN
HR SERVICE PROVIDER
SECUREX HEADS FOR
A 100% RELIABLE CRM
CASE
Like many companies, HR service provider Securex was witnessing severe problems with their CRM database. Chief among the problems was that marketing data was poor and becoming increasingly unreliable. They cleaned up and updated the database using a SAS Data Management platform. On top of that, this platform is also being set up as a permanent watchdog to ensure the accuracy and consistency of both batch updates and manual data manipulations. The result has been an improved database with greatly enhanced contact information.
Securex is an HR company active in Belgium, France, and Luxembourg providing services for large busi-
nesses, SMEs, self-employed professionals, and private individuals. Services include Payroll, Staff and
Insurance Management, HR Consulting, and Health and Safety Services. Securex has a staff of approxi-
mately 1,600 throughout their nearly 30 offices, serving more than a quarter of a million clients.
Data inconsistencies lead to frustrations
“We want to make sure that whenever anyone within the organization enters or modifies data, the
changes are automatically processed and rectified,” reports Securex Business Architect Jacky Decoster.
Any data inconsistencies invariably result in considerable frustration for everyone involved. Employees
are constantly updating client data, adding and changing contact information and contract data, all
while marketing teams are uploading foreign data for new campaigns and other client communi-
cations. “Each of these manipulations can produce small data errors or inconsistencies,” observes
Decoster. “Since the data is being manipulated by a multitude of persons and departments, problems
can easily arise such as duplicate entries, client records with incomplete contract information, and
missing contact information such as first name, gender, postal and e-mail address, or phone number.
This is frustrating, especially for marketing teams running a campaign: many e-mails simply bounce,
some mail is sent twice to the same person, and others are based on wrong or missing information.
This sometimes severely damaged our reputation.”
Although a centralized SAP CRM database had been in place since 2004, the problems have been
growing worse in recent years. Decoster noted that complaints about data quality were coming in
from both staff and clients. “Obviously we had to do something about it and do it effectively and con-
vincingly.”
SAS Data Management clean-up successfully launched
The data quality issue was put high on the agenda when Securex launched its comprehensive Client+
project. This change project included the migration of the highly customized SAP CRM database into
the cloud-based, standardized, scalable Salesforce.com solution. Securex decided to deploy SAS Data
Management to facilitate that migration. Decoster explains that their reasoning proved to be spot on.
“SAS Data Management enabled us to meticulously clean the data before uploading it into our new
database. The data were normalized, duplicate entries were merged, and missing information was
automatically added wherever possible. SAS Data Management has built-in tools such as data dictionary defi-
nition, fuzzy matching, full name parsing, reliable gender determination, phone number standardiza-
tion, and e-mail address analysis that comprehensively covered all of our concerns. We have already
completed the migration of our enterprise accounts in record time and the marketing department
tells us they have virtually zero complaints about data quality. It is a huge improvement that would
have been unthinkable without SAS Data Management. We are now finalizing our self-employed and
individual accounts.”
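The sketch below gives a rough impression of the kind of clean-up steps described here: normalizing contact records, standardizing phone numbers and flagging likely duplicates with a simple fuzzy match. It is not the SAS Data Management implementation; the records, threshold and matching logic are assumptions for illustration only.

```python
from difflib import SequenceMatcher

# Rough sketch of the clean-up described above (not the SAS implementation):
# normalize contact records, standardize phone numbers and flag likely
# duplicates with a simple fuzzy match on name plus city.

def normalize(rec):
    digits = "".join(ch for ch in rec["phone"] if ch.isdigit())
    if digits.startswith("00"):
        digits = digits[2:]
    return {
        "name": " ".join(rec["name"].split()).title(),
        "city": rec["city"].strip().title(),
        "phone": "+" + digits,
    }

def similarity(a, b):
    return SequenceMatcher(None, a["name"] + a["city"], b["name"] + b["city"]).ratio()

records = [
    {"name": "jacky  decoster", "city": "brussels", "phone": "+32 2 555 01 23"},
    {"name": "Decoster, Jacky", "city": "Brussels ", "phone": "0032 2 555 01 23"},
]
clean = [normalize(r) for r in records]

if similarity(clean[0], clean[1]) > 0.6:  # threshold is an assumption
    print("probable duplicate:", clean[0]["name"], "/", clean[1]["name"])
```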
A permanent watchdog for data quality
Decoster insists, however, that improving data quality is not a one-shot affair; it must be a continuous
concern within the organization. It is one reason why Securex opted for a comprehensive approach.
Their Client+ project includes the redefinition and streamlining of marketing and sales processes. Part
of this effort is sensitizing staff about the impact of their data manipulations and insisting that they
be both careful and precise. At the same time, SAS Data Management is being set up as a permanent
watchdog for data quality. Decoster explains why: “One can never be 100% sure that every single bit
of data will be entered correctly, even when people are fully trained and sensitized. That is why we
have SAS Data Management make consistency checks and updates on a regular basis, in fact every
week. Our next step will be to implement a near real time check. Whenever someone in the organiza-
tion enters or modifies data, the changed record is automatically processed and corrected by SAS Data
Management. This is a process that takes just a couple of seconds.”
Robust architecture and great flexibility
Decoster and the Securex staff have nothing but praise for the robust architecture and great flexibility
of the SAS Data Management platform. The system can be integrated into any software environment.
For example, SAS Data Management provides direct certified connectors to a variety of systems,
including Salesforce.com. This avoids the development of customized interfaces. Furthermore, all func-
tionality is offered in stored procedures, ensuring that every transaction is safe and reliable.
SAS Data Management is also easy to deploy. “It has a powerful data profiler, which enables us to
examine any available data and assess their reliability along with the risk involved in integrating
them into new applications. We use this profiler in particular to analyze all data we purchase.” The
software also provides a powerful tool to define batch jobs to clean and normalize the data, based on
the profiling statistics and information. Decoster then added a final plus. “The learning curve for SAS
Data Management is very short: after a two day training we were able to define all of the jobs we
needed.” ■
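As an illustration of what such a profiling step reports, the following minimal sketch computes per-field fill rates and distinct counts for a small purchased dataset. It is a stand-in for the idea only; the field names are assumed and a real profiler is considerably richer.

```python
# Hedged sketch of what a simple data profiling step could report (field names
# assumed): per-field fill rate and distinct-value count give a first indication
# of how risky it is to integrate a purchased dataset.

rows = [
    {"company": "Acme NV", "vat": "BE0123456789", "email": "info@acme.be"},
    {"company": "acme nv", "vat": "",             "email": "info@acme.be"},
    {"company": "Globex",  "vat": "BE0987654321", "email": None},
]

def profile(rows):
    report = {}
    for field in rows[0]:
        values = [r[field] for r in rows if r.get(field)]
        report[field] = {
            "fill_rate": round(len(values) / len(rows), 2),
            "distinct": len({str(v).lower() for v in values}),
        }
    return report

for field, stats in profile(rows).items():
    # e.g. vat: fill_rate 0.67 -> a candidate for a cleaning or enrichment job
    print(field, stats)
```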
“Marketing says that the data quality has improved dramatically, an achievement that we previously considered impossible”
Jacky Decoster
INTERVIEW

DATA INTEGRATION EVOLVES THANKS TO BIG DATA AND OPEN SOURCE

How to deal with the explosion of data and the importance of analysis

“For me, Big Data does not exist as a volume concept.” This is a remarkable statement for a data integration expert to make. “Size is relative. It reflects where you come from.” As such, you cannot define a lower threshold for the ‘big’ in Big Data, but the phenomenon does touch on the field of data integration, which itself has practically become a commoditized specialization.
These doubts about the existence of ‘big’ data are voiced by Arturo Salazar. He is Data Management
Advisor Analytical Platform at SAS. Salazar explains how ‘big’ has a whole other meaning for a small
business than it has for a large corporation such as a financial institution. So he argues that there can
be no lower threshold for the ‘big’ in Big Data.
The Big Data trend certainly has major implications for the field of data integration, as this field is now
confronted with more data and more unknown data variables. Salazar explains that data integration
has existed for some time now and is considered almost a commodity today. However, this is not to
say that all organizations feel completely at home with data integration: the importance of using and
efficiently deploying data is not understood by all. But as a specialization it has now reached adult-
hood.
Outside the organization’s comfort zone
The ‘big’ in Big Data is indeed a relative dimension, he agrees. “Big Data is all data that falls outside
the comfort zone of an organization.” It also involves the comprehensiveness of the data sources
and, even more important, the speed with which the information can be integrated in order to derive
new and deployable insights from it. The question arises whether Big Data is related to the degree
of maturity of an organization. If the data in question is only just outside the comfort zone, isn’t it a
question of simply expanding, of growing up? “Yes, it is a question of reaching maturity,” says Salazar.
Monthly reporting is inadequate
He continues: “Businesses are currently going through a technology transformation with regard to
the way information used to be collected and used.” He mentions the striking example of how data
integration was introduced, already some years ago now. “Take the web world, for example, and logs
of web servers.” The machines recorded web page visits and click-throughs in their log files, including
the IP addresses of the origin servers, cookie data, et cetera.
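For readers who want to picture what such a log actually contains, here is a small, self-contained sketch that parses one common-log-format line and extracts the fields mentioned above. The log line and pattern are illustrative assumptions, not tied to any specific web server.

```python
import re

# Illustrative sketch only: extract the fields mentioned above (client IP address,
# requested page, status) from a single line in the common web-server log format.

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)

line = '192.0.2.10 - - [01/Mar/2015:10:12:01 +0100] "GET /products/42 HTTP/1.1" 200 5123'

match = LOG_PATTERN.match(line)
if match:
    hit = match.groupdict()
    print(hit["ip"], hit["path"], hit["status"])  # who requested which page
```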
“All those clicks; that is a lot of data.” And it’s all data that could be useful. Salazar puts it into
perspective: “Most log data can actually be thrown out, but you simply don’t know which data you
should keep.” Moreover, the valuation of data on web surfing has shifted. Data that used to be erased
may now actually prove to be valuable. While it used to be of little interest which other pages on the
same website a visitor clicked to, today that data could be crucial. “Take the real-time recommenda-
tions provided by modern online retailers.” Another example is tracking surfing habits while visitors are
logged on to a site. Profiling is now a spearhead in the customer loyalty campaigns of web companies.
Growing data mountains versus storage costs
The example of logs in the web world has brought a wider awareness of the usefulness of data inte-
gration. This has in turn fed a demand to deploy data integration for a wider range of applications.
There is added value to be had from connecting the website data to ‘traditional’ information sources
such as a CRM system or a data warehouse. It has long been recognized that such a connection makes
sound business sense. Continuously evolving insights have since ensured that this is not simply limit-
ed to the one-sided import of website logs in a CRM application, for example. Efficient use of the data
requires two-way traffic and a wider scope. This means the amount of data gets bigger and bigger.
In the first instance, the rapid growth of the data that companies collect, store and correlate may
not appear to be a major problem. The capacity of storage media continues to increase, while the
price per gigabyte is being forced down. It is as if hard drives abide by their own version of Moore’s law,
the exponential growth seen in the performance of processors. However, not only is the curve for storage
capacity increasing less steeply than that of processors, the increase is also insufficient to keep ahead
of the explosive data growth.
Extracting unknown nuggets of data
An additional problem for data integration in the information explosion is the software, and more spe-
cifically database software. A very significant proportion of the looming data mountain cannot simply
be stored in a relatively expensive database or a costly data warehouse configuration. Although these
enormous mountains of data might contain gold, it is as yet unknown how much there is and where
it is. The SAS expert confirms that this is in essence a chicken and egg problem: the as yet unknown
value of the data versus the cost of finding it. But hope looms for this form of data mining. New tech-
nology is relieving the pioneers of the manual task of sieving for nuggets in the streams that flow out
of the data mountain. Nor do they have to laboriously dig mine shafts in the mountain any longer.
Going down the same road as Linux
This is where the Hadoop open source software comes into play, a cheap software solution that runs
on standard hardware and that can store and process petabytes of data. How powerful is it? Hadoop is
based on technology developed by search giant Google to index the internet. “Hadoop is going down
the same road as Linux,” explains Salazar. The market is gradually adopting it for more serious appli-
cations. “No one wants to store their logs in an expensive database.” However, a problem for many
businesses is that Hadoop is still at the beginning of the road that Linux travelled long ago. Both stem
from worlds very different from what regular businesses are used to, and both require considerable
technical knowledge, from users as well.
“Initially people were afraid of Linux too,” says Salazar. Since then, companies like Red Hat have
combined the system’s core software with business applications and offer the results as packages.
Hadoop has just started this packaging process. He points to Cloudera and Hortonworks; he thinks
these companies will do for Hadoop what Red Hat did for the adoption of Linux. “Many businesses still
consider Hadoop intimidating and too complicated,” says Salazar. They normally employ specialists for
such open source software, for installation and configuration as well as maintenance and even every-
day use. What skills are needed? Experienced programmers with coding skills and administrative
talent, alongside the knowledge and expertise normally associated with data analysts. This is a rare
and therefore expensive combination of qualities.
Bringing Hadoop to the masses
Despite its complexity, Hadoop is gaining popularity. “It offers so many advantages,” explains Sala-
zar. Business Intelligence vendor SAS is also responding to this trend. He says that the company uses
technology such as Hadoop “under the hood”. The complexity of this software is hidden within pro-
cesses and programs that the customer is familiar with. Businesses are able to focus on actually using
the tools for data integration, instead of first having to call on special experts with knowledge of the
underlying software.
In February 2015, SAS introduced a new product in its data management range to increase the
user-friendliness of Hadoop under-the-hood. Salazar explains that the new web-based application,
called SAS Data Loader for Hadoop, will make it possible to delve even deeper into the data mountain.
This application can be used to prepare and then mine the data stored in Hadoop and can be used by
data analysts and even ordinary users. Soon we will all be able to mine for gold! ■
“Although these enormous mountains of data might contain gold, it is as yet unknown how much there is and where it is”
Arturo Salazar
Arturo Salazar
DATA MANAGEMENT ADVISOR ANALYTICAL PLATFORM AT SAS
[Infographic: Data Management]
Jacorien Wouters
PROGRAMME MANAGER FOR THE NETWORK MANAGEMENT INFORMATION SYSTEM AT RIJKSWATERSTAAT
CASE
RIJKSWATERSTAAT GAINS COMPREHENSIVE INSIGHT INTO ITS PERFORMANCE
Rijkswaterstaat, the executive agency of the Dutch Ministry of Infrastructure and the Environment, is responsible for the principal highway and waterway networks and the main water system in the Netherlands. To account to the Ministry and to the lower house of the Dutch Parliament, and to manage its own operational processes, Rijkswaterstaat needs the right information at the right time and must be able to access it both internally and externally. For this purpose it developed the Network Management Information System (NIS).
Rijkswaterstaat began developing the NIS a number of years ago. The system was designed to
give an integrated insight into the performance delivered by Rijkswaterstaat, allowing tighter
control and providing a broad view of the overall achievement. In the tendering process,
Rijkswaterstaat selected SAS’ solutions because they were able to support the entire process
from source to browser. Rijkswaterstaat consequently uses SAS Business Intelligence and SAS
Data Management for the NIS.
“The NIS is now one of the most important information sources for the management of Rijkswa-
terstaat,” says Jacorien Wouters, Programme Manager for the NIS. “It has brought together the
two largest flows of information on our organization’s networks: the performances of the net-
works and data on assets such as roads and bridges, but also the Wadden Sea, for example. This
was preceded by an intensive data integration process.”
An integrated and clear view across highway and waterway networks
Better decisions
When the NIS was introduced in 2004, the data from various applications was spread across the
information systems of Rijkswaterstaat’s ten regional departments. Now, the NIS periodically obtains
data from over 40 source systems. The power of the system lies among other things in the possibility
of combining data and presenting it in charts and maps. This gives a fast and clear insight into the
performance of the individual departments and of Rijkswaterstaat as a whole. The figures in the NIS
have official status. That is very important internally, but also externally since Rijkswaterstaat reports
to the Ministry three times a year on the status of specific performance indicators, or PINs. As part of a
service level agreement, targets have been agreed for a four-year budget period.
More complex analyses
In addition to improved control, access to the information has been greatly simplified, as is clear from
the increasing number of NIS users at Rijkswaterstaat. “Information which previously could only be
obtained by a few employees from a specific data source is now available through the NIS portal to all
employees at Rijkswaterstaat,” Wouters explains.
A single version of the truth
“The insight into the underlying data helps us to operate more efficiently and hence ultimately to
save costs,” Wouters continues. “The clear reporting method also saves time. Now there is a single
version of the truth, so discussions on definitions or figures are a thing of the past. The fact that infor-
mation is more readily available in the NIS means we can also make faster, better adjustments. We
used to report on performance three times a year and matters came to light which we would have
preferred to tackle immediately. Now we can do just that.”
Developments
In implementing SAS, Rijkswaterstaat took a step towards improving data quality. It also started to use
SAS Visual Analytics. “As we simply have more insight into our data, our management can take more
forward-looking decisions,” says Wouters. “We’re making constant progress in combining information,
highlighting connections which would not previously have been visible.” ■
“The insight into the underlying data helps us to operate more efficiently and hence ultimately to save costs. The clear reporting method also saves time. Now there is a single version of the truth”
Jacorien Wouters
Ronald Damhof
INDEPENDENT CONSULTANT INFORMATION MANAGEMENT
INTERVIEW
“Make data management a live issue for discussion throughout the organization”
Independent information management consultant Ronald Damhof developed the Data Quadrant Model
The data management field is awash with jargon. Most business managers have no idea what all those terms mean, let alone how to use them in understanding the precise value of particular data and how to handle it. To allow an enterprise-wide discussion on data, Ronald Damhof developed the Data Quadrant Model.
Damhof works as an independent information management consultant for major organizations such
as Ahold, De Nederlandsche Bank, the Dutch tax authorities, Alliander, and organizations in the finan-
cial and healthcare sectors. These are data-intensive organizations which share a growing realization
that the quality of their work is increasingly determined by the quality of their data. But how do you
move from that realization to a good data strategy? A strategy which everyone in the organization
understands, from the director in the boardroom to the engineer in IT? Damhof developed a quadrant
model to make data management a live issue for discussion.
To push or pull?
Damhof starts by explaining a concept which everyone will have encountered in high school: the
‘Push Pull Point’. This concerns the extent to which demand impacts the production process. He takes
as an example the building of a luxury yacht, a process that does not start until the customer’s order
is known. The decoupling point is at the start of the production process. We can take matches as an
opposite example. If a customer wants matches, he or she goes to the supermarket and buys them.
Unless the customer wants black matches, in which case he or she is out of luck. The decoupling point is right at the end of
the production process. The production of a car, however, comprises standard parts and customized
parts. Customers can still state that they want a specific colour, leather upholstery or different wheel
rims. The decoupling point lies somewhere in the middle of the production process. “Similarly, in the
production of a report, dashboard, or analytical environment, the decoupling point lies somewhere in
that middle area,” Damhof explains.
The decoupling point divides the production process into two parts: a push and a pull side, also
referred to as a supply-driven and a demand-driven part.

[Figure: The Data Push Pull Point. Push/Supply/Source driven side: mass deployment, control over agility, repeatable and predictable processes, standardized processes, a high level of automation, relatively high IT/data expertise (all facts, fully temporal). Pull/Demand/Product driven side: piece deployment, agility over control, user-friendliness, relatively low IT expertise, domain expertise essential (truth, interpretation, context), business rules downstream.]

Push systems are aimed at achieving economies of scale as volume and demand increase, while the quality of the product and the associated
data remains guaranteed. On the other hand there are pull systems which are demand-driven. Diffe-
rent types of users want to work the data to produce ‘their’ product, their truth, on the basis of their
own expertise and context.
Opportunistic or systematic development?
On the y-axis Damhof projects the development style dimension. “By that I mean: how do you
develop an information product? You can do so systematically; the user and the developer are then
two different people and you apply defensive governance, aimed at control and compliance. This puts
into practice everything that engineers have learned in order to create software on a sound basis. You
often see this in centralized, enterprise-wide data, such as financial data and data which is reported to
regulators.” You can also use an opportunistic development style. “In that case the developer and the
user are often one and the same person. Take for example the data scientist who wants to innovate
with data, who wants to produce and test analytical models. Or situations in which speed of delivery
is essential. The governance in these cases is offensive, which means the focus is on flexibility and
adaptability.”
“A quote I have stolen from Gartner analyst Frank Buytendijk: in an average organization the car park or art collection is better managed than data”
Ronald Damhof
[Figure: The Development Style. Systematic: user and developer are separated; defensive governance with a focus on control and compliance; strong focus on non-functionals such as auditability, robustness and traceability; centralised, organisation-wide information domain; configured and controlled deployment environments (dev/tst/acc/prod). Opportunistic: user and developer are the same person or closely related; offensive governance with a focus on adaptability and agility; decentralised information domain at personal, workgroup, department or theme level; all deployment is done in production.]
Data Quadrant Model
The combination of these two dimensions produces the following picture.
“Quadrant I is where you find the hard facts,” Damhof explains. “This data can be supplied intelligibly
to quadrants II and IV in its full, raw volume. Data in quadrant I is produced by highly standardized
systems and processes, so it is entirely predictable and repeatable.”
Diagonally opposite, in quadrant IV, is data that is characterized by innovation and prototyping. “This
is the quadrant in which the data scientists work, who actually have only three demands: data,
computer power, and cool software.” Increasingly, separate departments are set up as innovation
labs giving data scientists free rein to use the data for experimentation and analysis, with the aim of
innovation. “You need this type of data management to discover and test good ideas. When a concept
works, it needs to be raised from the fourth to the second quadrant, because you can only achieve
economies of scale with data if you can generate and analyse it systematically. You can then use it
enterprise-wide.”
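One way to make the model tangible is to encode the two dimensions so that the quadrant of a data product can be named explicitly in discussions. The sketch below is a minimal rendering of the logic described above, with summarized labels; it is not an official implementation of the model.

```python
# Minimal rendering of the quadrant logic described above (labels summarized,
# not an official implementation of the model).

def quadrant(development_style, push_pull):
    systematic = development_style == "systematic"  # versus "opportunistic"
    push = push_pull == "push"                      # supply-driven versus demand-driven
    if systematic and push:
        return "I (facts, under governance)"
    if systematic and not push:
        return "II (context, information products under governance)"
    if not systematic and push:
        return "III (ad hoc sources, shadow IT)"
    return "IV (research, innovation and prototyping)"

print(quadrant("systematic", "push"))     # the foundation, e.g. regulatory data
print(quadrant("opportunistic", "pull"))  # the data scientist's sandbox
```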
“I often talk to data scientists who obtain very sound insights in a kind of sandbox environment,”
Damhof continues. “But they forget or are unable to monetize those insights in a production situation.
They cannot bring their insights from quadrant IV to quadrant II. This is where governance comes into
play.” And therein lies the major challenge for many organizations, as Damhof knows only too well.
“If you explain this model to managers and ask where their priority lies, they will all say they first
have to get their foundations in order, the first quadrant.

[Figure: A Data Deployment Quadrant. Axes: Development Style (systematic versus opportunistic) and Data Push/Pull Point (push/supply/source driven versus pull/demand/product driven). Quadrant I: facts. Quadrant II: context. Quadrant III: shadow IT, incubation, ad hoc, once off. Quadrant IV: research, innovation and prototyping/design.]

But if you ask what they are investing their money in right now, where they are innovating, it is often in the fourth quadrant. It is great that they
are engaged in this more experimental and exploratory form of data management, but that is only
possible if your foundations are right. Otherwise it is like having a hypermodern toilet that is not con-
nected to the sewer system, so it turns into a total mess.” Ask the average data scientist what takes
up most of his or her time and he or she will answer getting the data to the right qualitative level: the
aim of quadrant I. “Only a data scientist with powerful analytical software, a lot of computer power,
and high-quality data will genuinely make a difference.”
Reliability versus flexibility
“Managers insist that systems must be reliable and flexible, but these qualities are inversely related.
A highly reliable and robust system is less flexible. And in an extremely flexible system it is necessary
to lower the requirements with regard to reliability,” says Damhof. “The Data Quadrant Model makes
this clear to managers. In quadrant I reliability takes precedence over flexibility and in quadrants II
and IV flexibility takes precedence over reliability.” Quite a few different types of expertise and com-
petence are therefore required in order to make optimum use of data.
Expertise and competences
You often find that organizations require a single person to supply expertise and competences which
cover the entire quadrant. Such people do not exist. Employees in quadrant I have an engineering
profile. They are information and data engineers, trained in data architecture and data modelling.
“Note that this is not the classic IT profile. These are engineers who can carry out model-driven
development and have a solid understanding of the need for conceptual and logical modelling.” This
expertise is very scarce. Quadrants II and IV on the opposite side require people with expertise in the
respective business domain supplemented by Business Intelligence and/or analytical competences.
Facts and truth
Damhof also calls quadrant I of the model ‘the single version of the facts’. Those facts are then made
available to employees in quadrants II and IV. That enables them to create their own truths. Since the
same facts are used to create multiple truths in the right-hand half of the model – depending on the
“With organizations generating ever greater volumes of data, they can no longer be so slapdash in the way they handle it. Now is the time to make sure your data management and the associated governance are properly set up. The Data Quadrant Model helps you to achieve this”
Ronald Damhof
context and background of the data user – Damhof calls this half ‘the multiple version of the truth’.
You should bear in mind that the ‘truth’ quite often changes over time. “You often hear companies
talking about ‘the single version of the truth,’ but there is no such thing. After all, how you interpret
particular facts depends on the context, your outlook, background knowledge, and experiences.”
Quadrant III
So far, Quadrant III has received little mention, even though it is incredibly important. It is the quad-
rant of data sources which are not under governance, like an ad hoc download which you obtain from
an open data provider, a list in Excel that you want to use, or a set of verification data which you have
received on a CD. “You may even want to combine governed data from quadrant I with your own
dataset in quadrant IV, that’s fine,” says Damhof.
The journey through the quadrants
In order to get value from data, you can make various movements in the model. You can move from
fact-based data management towards a model in which the context is also important (from quadrant
I to II). “This actually is the classic journey of ‘unlock data and produce an information product,’” says
Damhof. This is often inefficient, however, because this process is based on known requirements and
wishes on the part of the user. “And the user does not really have that knowledge in advance.” Many
organizations opt for a more agile-driven form, such as from quadrant I to quadrant IV to quadrant II.
Have the employees in quadrant IV produce an information product in an iterative way using the data
in quadrant I/III. You then promote the product to quadrant II only if it is important to bring this under
management.
“People in the business world often talk about ‘the single version of the truth,’ but there is
no such thing. There is a ‘single version of the facts’ and there are multiple ‘truths’. After all, how you interpret facts depends on the
type of organization, your outlook, background knowledge, and experiences”
Ronald Damhof
It is also possible to move from quadrant III to quadrant IV. “You have your own datasets and you
want to try something? Great,” says Damhof. The only movement an organization must never make
is from quadrant III to quadrant II. “Because in that case you use data that you are not entirely sure
of, as it has not been subjected to good governance in the required way. An example is a compliance
report for the regulator which you want to produce using data which is not under governance. You
should not seek to do that.”
Make data management a live issue for discussion
In his day-to-day work Damhof finds that his Data Quadrant Model helps organizations to talk about
data management. “From my current customer, De Nederlandsche Bank, I regularly hear statements
such as, ‘I want to move this data product from quadrant IV to quadrant II;’ or, ‘We must put the data
in quadrant I first, but the submitter is really responsible for the data in quadrant I;’ or, ‘I want some
space to store data temporarily in quadrant III.’ Everyone understands what it means. That is new;
the organization has never thought about data in that way before. And that actually applies to almost
every data-intensive company. Organizations have long spoken of data as an ‘asset,’ but in practice
they handle data in a very unstructured way. As a result they never monetize that asset. With orga-
nizations generating ever greater volumes of data, they can no longer be so slapdash in the way they
handle it. Now is the time to make sure your data management is properly set up. The Data Quadrant
Model will help you to achieve this.” ■
[Figure: How we produce, process variants, plotted on the Data Quadrant Model.]
CASE
Caroline Denil
PROJECT MANAGER, BELGIAN FEDERAL PUBLIC SERVICE
IMPROVING INSIGHT INTO BELGIUM’S ECONOMIC SITUATION
Immediate access to easily comprehensible data
Vincent Vanesse
BUSINESS ANALYST, BELGIAN FEDERAL PUBLIC SERVICE
The Belgian Federal Public Service (FPS) Economy committed itself to creating a more powerful and transparent presentation of the Belgian economic situation for the general public, statisticians, and university students, among many others. Together with SAS, it created a single web portal that offers visitors direct access to the principal indicators of the Belgian economic situation.
All indicators are visualized in graphs for better comprehension and are fully customizable so that
users can immediately consult the indicators in which they are interested. The portal not only created
a user-friendly statistical environment, it also opened up new business opportunities
within other Directorates General of the Belgian federal government.
Scattered information makes research time-consuming
One of the main missions of the FPS Economy is the generation and publication of statistics and
figures characterizing the Belgian economic situation. Until recently, this information was accessible
through various sources: Statbel, be.Stat, Belgostat, and the National Bank of Belgium. In such a situ-
ation, it is difficult for students, researchers, journalists, and the many other users to find the required
information to answer their specific questions and draw accurate conclusions. Hence, the FPS Economy
initiated a project to improve the user-friendliness of economic data.
Multi-departmental collaboration improves statistics
The first goal of the project was to increase the value of information. This proved to be an intense,
but truly indispensable process bringing together FPS Economy business analysts and statisticians. The
process led to the development of graphs depicting economic information, as well as metadata that
users can consult to better understand the information being presented. “As a result, some twenty
graphs were selected and then subdivided into eight categories, including among others, energy,
gross domestic product, and consumer price index,” states Vincent Vanesse, Business Analyst at the
FPS Economy.
A single portal for all economic indicators
Next, the FPS Economy teamed up with SAS in order to make the economic indicators accessible via
a user-friendly tool. “We have been working with SAS for quite a long time now. As a result, we are
thoroughly familiar with their competence. The exceptional statistical capabilities, robustness, and
extensibility of their solutions made our choice of SAS for this particular project obvious,” notes Caro-
line Denil, Project Manager at the FPS Economy.
The collaboration resulted in the launch of a single web portal (Ecozoom) where various users can find
all of the economic indicators they need in just a few mouse clicks. “From now on, finding information
on Belgium’s economic situation is easy,” observes Denil. “The Ecozoom tool on the FPS Economy
“From now on, finding information on Belgium’s economic situation is easy. The Ecozoom tool on the FPS Economy website gives immediate access to the twenty main economic graphs”
Caroline Denil
website gives immediate access to the twenty main economic graphs. Those who want more detailed
information can still click-through to the traditional Statbel, be.Stat, and Belgostat websites.”
Visualization facilitates comprehension
The online portal presents the economic indicators as graphs that make the information much easier
to interpret quickly and accurately. Denil points out that deducing trends based on a graph is far easier
than using a table or a long series of figures.
In addition, the tool is able to visualize four graphs simultaneously. This facilitates comparisons
between various types of data to verify the magnitude of the effect of, for instance, the number of
company failures on the unemployment rate.
The old adage that “a picture is worth a thousand words” certainly holds true for the FPS Economy
confirms Denil. “Our graphs can often convey much more than a lengthy text or series of tables. In our
specific situation, the graphs certainly help users to more easily and precisely evaluate the economic
situation in Belgium.”
Customization enhances user-friendliness
Vanesse is quick to point out that the four graphs that are depicted on the Ecozoom homepage are
fully customizable. “Users can select the indicators they are most interested in and save this infor-
mation. Each time they subsequently consult the tool, they will immediately start with their desired
information.”
Opening up new opportunities
Although the Ecozoom tool has considerably increased the user-friendliness of economic data, the FPS
Economy is already looking into possibilities that will extend its user-friendliness even further. “We are
currently testing geo-visualization in order to visualize data for specific Belgian regions,” illustrates
Denil. “On top of that, we are also planning to make the tool accessible for mobile use on smart-
phones and tablets.”
The Ecozoom tool might potentially even open up new business opportunities. “The tool has gene-
rated interest among other Directorates General, up to and including the top management level. This could
intensify the collaboration between the various FPS, and even create a new type of service,” con-
cludes Denil. ■
“Users can select the indicators they are most interested in and save this information. Each time they subsequently consult the tool, they will immediately start with their desired information”
Vincent Vanesse
CASE
DSM GAINS CONTROL OF MASTER DATA

DSM introduces MDM, building on its successes with data quality

DSM is convinced of the value of good data quality. The global science-based company that operates in the field of health, nutrition and materials has already implemented data quality successfully and is building on that success with SAS Master Data Management (MDM).
MDM is a method of managing business-critical data centrally for decentralized use. Errors and dis-
crepancies in the so-called Master Data are tackled: items such as customer names, material types,
suppliers and other data used across divisions and IT systems. Consistency in that critical business data
plays a vital part in supporting efficient operations. MDM gives DSM control of the ERP systems which
it has absorbed in the wake of major acquisitions over the past few years.
From state-owned mines to chemicals and the environment
“We have 25,000 employees, a substantially higher number than five years ago due to acquisitions,”
says Bart Geurts, Manager Master Data Shared Services at DSM. Geurts cites the acquisition of Roche
Vitamins in 2003 as one of the major purchases. DSM is now the world’s largest vitamin maker, and
that involves different data requirements. “Good data quality is extremely important, for food safety
and health as well as flavour. It is less critical for bulk chemical products.” Geurts alludes to DSM’s
origins in the state-owned mines of the Netherlands. “Old businesses know that they have to reinvent
themselves in order to survive.”
DSM has reinvented itself several times, from mining to petrochemicals, and in recent years from fine
chemicals to life and material sciences. In its current form DSM focuses on emerging markets and cli-
mate & energy. Geurts cites lighter materials such as a replacement for steel in cars that reduce their
weight and make them more economical. The group also develops products that are manufactured
using enzymes rather than oil. These are different activities and different markets, so the company
has different requirements in terms of company data.
More complete organization overview
The many acquisitions involved in this transformation brought not only new activities and people, but
also many new IT systems. Geurts explains that these included a large number of ERP systems. In the
new organization, the many different IT systems were found to contain errors. Not serious errors, but
discrepancies which only came to light as a result of the combined use of data and systems.
Geurts mentions the example of a staff celebration marking the launch of the new company logo.
When sending out invitations to the company-wide event, 800 people were ‘forgotten’. This was due
to an incomplete overview in the HR environment. And he says there were more inconsistencies.
There was contamination of supplier data, for example. The same supplier may use different names in
different countries, with the result that the various systems in use in a multinational may show it as
different businesses.
“Good data quality is extremely important, for both health and safety”
Bart Geurts
Bart Geurts
MANAGER MASTER DATA SHARED SERVICES, DSM
Linking MDM to business processes
Building on previous experiences and results with data quality, DSM moved to a central MDM
approach. Geurts says the business data is good enough for transactions taking place within a par-
ticular silo, such as a country or business division. “But as soon as operations become cross-divisional,
problems are liable to emerge.” Leading ERP suppliers offer MDM solutions, Geurts says, but put too
much focus on the individual silos. That is why DSM chose the MDM solution from SAS.
Geurts stresses the importance of the link between MDM and the business processes. This link highlights
the benefit for the organization as a whole and for the individual divisions which operate efficiently
within their respective silos. Key issues are who in the organization should be the owner of the MDM
process, who plays what role, and which KPIs (key performance indicators) are used. A possible
company-wide KPI for MDM is measuring how long it takes for one customer order to be processed,
delivered and invoiced.
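By way of illustration, such a KPI can be computed directly from order records. The sketch below (field names and dates assumed) measures the elapsed days from order entry to invoicing and averages them.

```python
from datetime import date

# Hedged sketch of the company-wide KPI mentioned above (field names and dates
# assumed): elapsed days from order entry until invoicing, averaged over orders.

orders = [
    {"id": "SO-1001", "ordered": date(2015, 3, 2), "invoiced": date(2015, 3, 9)},
    {"id": "SO-1002", "ordered": date(2015, 3, 5), "invoiced": date(2015, 3, 20)},
]

lead_times = [(o["invoiced"] - o["ordered"]).days for o in orders]
print("average order-to-invoice lead time:",
      sum(lead_times) / len(lead_times), "days")  # 11.0 days
```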
Think big, act small
Establishing the MDM process and addressing the issues involved was the easiest part, according to
Geurts. He describes that as ‘devised on the sofa’. Then came the implementation phase, with the
deliberate choice of a relatively small-scale start. “We conducted a pilot in the sourcing department
based on the think big, act small precept.” The term ‘small’ needs to be put in context, however.
Worldwide, DSM has six sourcing front offices and two back offices. In this small-scale pilot, the incon-
sistencies in supplier data were tackled first. The diverse vendor data, which included duplicates, was
cleaned among other things by using different language algorithms in the SAS MDM product. “The
complexity lies in the details,” says Geurts from experience.
What data is critical for your business?
As well as tackling the contamination of supplier data, steps were taken to deal with other master
data components. The offices concerned were asked to state which data was critical to their business.
“Because we couldn’t analyse all the data in the business.” That would be too large an operation and
place too heavy a burden on those systems. By answering the question of which data was critical, the
gap between the MDM initiative and the involved business units was bridged. After all, they them-
selves specified the selection of data that is crucial for their own processes. Such a selection is neces-
sary due to the range of master data. DSM defines master data as everything that underlies processes
and transactions. At first sight, any error can cause inefficiency, but the extent to which it actually
does so depends on the type of data. “If the telephone number of a supplier’s salesperson is incorrect,
you may be able to send an email instead,” Geurts explains. But no such back-up is available if a bank
account number or a supplier’s address is incorrect.
Preventing errors
On the basis of the data which the units defined as critical, data rules were drawn up. That took
around six months, after which the implementation was completed in around three weeks. A clear
benefit which MDM has delivered for DSM is the avoidance of errors. Geurts cites the example of an
order entered in the name of the wrong department. DSM is also introducing an improvement in the
inputting of supplier data, as people sometimes make errors when searching for an existing supplier
or entering a new one. If the search is unsuccessful, a new entry is created that is essentially a dupli-
cate. An algorithm is now linked to the data input, which checks and then asks the person who enters
the data: “Is this the supplier you are looking for?”
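The sketch below illustrates the idea of such an entry-time check with a simple fuzzy lookup against existing supplier names. It is not DSM's actual algorithm; the supplier names and the matching threshold are assumptions.

```python
from difflib import get_close_matches

# Illustrative sketch (not DSM's actual algorithm): before a new supplier record
# is created, suggest existing entries that resemble the name being entered.

existing_suppliers = ["Acme Packaging GmbH", "Van den Berg Logistics", "Nordic Resins AB"]

def suggest(new_name):
    # get_close_matches performs a simple fuzzy comparison against known names.
    return get_close_matches(new_name, existing_suppliers, n=3, cutoff=0.6)

candidates = suggest("Acme Packaging")
if candidates:
    print("Is this the supplier you are looking for?", candidates)
else:
    print("No similar supplier found; creating a new entry.")
```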
Master Data Management is a continuous process
The above advantages of MDM relate to internal matters such as the staffing overview, supplier data
deduplication and error prevention. But MDM also offers external benefits for DSM. “Suppose there’s
an error in a product or in material. We want to know immediately which products are affected.”
Speed is of the essence in such cases. It is also important to continue the data checks after the initial
MDM implementation. “Keep on checking! Otherwise you’ll have new problems two or three months
down the line,” warns Geurts. After all, MDM is a continuous process that remains active to prevent
new errors that would have to be fixed later. “You don’t want that, because it would disrupt your
business process.” Making sure that all the relevant people in the organization understand this is
instrumental in ensuring success. ■
INTERVIEW
WHO IS YOUR DATA GOVERNOR?
How Data Governance can facilitate future data mining
An outsider could conclude that data quality and data governance amount to the same thing. This is
not the case, however, even though there is a strong relationship between the two data disciplines.
“Data quality involves the implementation of quality rules; data governance goes much further,”
explains Bas Dudink, Data Management Expert at SAS Netherlands. “Who is responsible for the quality
of data? Which responsibilities are involved? What agreements have been made?”
Where does your data come from?
Data quality concerns the accuracy of postal addresses and databases, for example. In order to ensure
lasting quality improvements in this area, agreements will need to be made and enforced. As such,
data quality can be implemented as a component of data governance, but the two are not inextrica-
bly linked. Data governance can come from various directions.
It may be based on a wider need felt by the organization, or it could be required by legislation and
regulations. Dudink gives the Basel agreement and the ensuing regulations for financial institutions as
an example. Banks are now required to answer the question: “Where does your data come from?” In
practice, the same applies to factories, which apply or are required to apply certain standards for the
materials they use.
Metadata, file formats and standards
Data governance goes further than the origin of data. It also encompasses metadata and standards
for the formats in which data are delivered, processed and passed on to other organizations, including
external partners, internal departments, as well as future customers. It could apply to information
applications that are presently unknown or unseen.
As such, data governance transcends departments and business processes. Data management is now
still often encapsulated in the silo of a particular business activity or division. The use of the data is
based on the current state of affairs. “The primary, everyday processes usually work just fine,” Dudink
describes the practical situation. However, modifications have to be made to enable new activities or
company-wide functions such as risk management.
Management consultancy rather than technology
Good data governance mainly concerns non-IT issues such as business processes, organizational pro-
cedures, working agreements and the enforcement thereof. There is a security component too: who
has access to which data and how is it protected? “It’s more like management consultancy: drafting
procedures,” the SAS expert explains. He estimates the technology component to be a modest 10 to
20 percent of the total work of a data governance project.
INTERVIEW
The world of data goes much further than simply collecting data and using it to make money. Only good quality data can make money, and good quality data often entails good management. Not traditional IT management, but data governance. Do you already have a data governance strategy in place? Who is your data governor?
The key question is how an organization treats its data, for example in order to manage risks. This is
a matter that needs to be arranged for the whole organization, not just a single activity or division.
Good data governance takes account of the future, so that any future customers can easily be sup-
plied using consistent data, for example for new partnerships or new business activities.
It would seem logical to link this to data quality and this can indeed produce longer term advantages.
However, solely improving data quality without involving data governance can reduce the project to
an incidental, short-term improvement. It will then turn into an operation that either does not pro-
duce a lasting result, or one that has to be constantly repeated to get a result. A data governor can
prevent this from happening.
Taking shortcuts to achieve KPIs
If an employee, supervisor or department is held accountable for certain targets then they will obvi-
ously focus on these. If these targets involve responsibility for an activity, but no accountability, then
obviously this activity will not be given priority when things are busy or resources are limited. Such
logical business and career decisions are not always good for the company.
Take the example of a call center whose KPI was the customers' waiting time in the queue. In order to keep that score looking good during a particularly busy Christmas period, they decided to skip a number of fields in the CRM system, 'only until things have calmed down again'. Is this a smart shortcut to get
results, or a lax attitude to the company’s business processes?
If no consequences are attached to this behavior, there is a risk that this measure will become perma-
nent sooner or later. After all, it saves time and so boosts the performance of the relevant department
and its manager. Even if this somber scenario does not come to pass, the damage has already been
done. Because who is going to check and possibly enter the missing data afterwards? This means
there is a gap; nuggets are missing from the data treasure chest. Good data governance would have
prevented the organization from making this error in its own call center.
Address errors upstream
Another practical example is the invoicing and payment process of a healthcare institution. There
was a long delay between the receipt of the invoice and payment, and this period was getting even
longer. An investigation revealed the cause: the invoices contained erroneous transactions. The health
insurer exposed these errors, and so the healthcare institution became embroiled in the time-con-
suming task of repairing them. Every invoice had to be double-checked, which substantially delayed
payment.
The institution decided to tackle the problem upstream rather than fixing the erroneous invoices
downstream, after they had already been generated. Now every patient-related medical procedure
is subject to various data quality tests, and the healthcare professional that enters the data is given
direct feedback so they can fix any mistakes immediately. The result is that the payment term has
been reduced considerably. An additional benefit is that management reports, analyses and the
relationship with the health insurer have all improved.
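The principle of giving the data entrant direct feedback can be pictured with a small sketch. The field names, formats and thresholds below are illustrative assumptions, not details from the healthcare case; the point is only that a record is checked against quality rules the moment it is entered, before it can reach the invoicing process.

```python
# Illustrative sketch only: field-level checks applied at the moment a medical
# procedure record is entered, so the person entering the data gets immediate
# feedback. Field names and rules are hypothetical, not taken from the case.
import re
from datetime import date

RULES = {
    "patient_id":     lambda v: bool(re.fullmatch(r"\d{8}", str(v))),
    "procedure_code": lambda v: bool(re.fullmatch(r"[A-Z]{2}\d{4}", str(v))),
    "treatment_date": lambda v: isinstance(v, date) and v <= date.today(),
    "amount_eur":     lambda v: isinstance(v, (int, float)) and 0 < v < 100_000,
}

def validate(record: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    for field, check in RULES.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not check(record[field]):
            problems.append(f"invalid value for {field}: {record[field]!r}")
    return problems

# Example: feedback is given before the record reaches the invoicing process.
entry = {"patient_id": "1234567", "procedure_code": "AB1234",
         "treatment_date": date(2015, 3, 2), "amount_eur": 480.0}
for problem in validate(entry):
    print("Please correct:", problem)   # flags the 7-digit patient_id
```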
Get more than your money back
It's actually a childishly simple, universal principle: if you invest in improvements upstream, you will
get more than your money back downstream. But if this is so obvious, why is it so rarely applied to data
management? Why do we start by building complex data warehouses as purification plants, rather
than purifying the water at the source?
Data governance can prevent gaps occurring in the data treasure chest and the resulting time-con-
suming task of repairing the errors. So how should you implement data governance? The implemen-
tation of data governance is like a tap dance, says Dudink. Step one is generally data quality, because
this is the source of the problem. But even before that first step, awareness needs to be raised within
the organization. If the awareness is there, and data quality is improved, then you still require insight
into the broader perspective. “Data quality is a step-by-step process,” continues the SAS consultant. A
holistic approach to data governance is recommended. It is, after all, about so much more than ‘sim-
ply’ IT and data management.
The timeliness of value
So data governance is not easy to implement. "It's a complex process," says Dudink. It's theory ver-
sus practice. In theory, data is always important; both in terms of quality and the control thereof. In
practice, this importance is not always reflected throughout the organization. It has not always been
identified who is the ‘governor’ of which data. This does not only concern the responsibility for the
data, but also the value that is attached to it.
Or rather: whether the value of the data is recognized in time. “Imagine your roof has a leak while
the sun is shining,” Dudink explains the timeliness aspect. The leak becomes a problem when it starts
raining, but by that time it’s too late. Data governance is a complex affair; a company-wide operation
that transcends IT and affects matters such as business operations, the corporate culture and Human
Resource Management. But in the end, good data governance offers major company-wide advantag-
es in today’s data-driven world. ■
“If you invest in improvements upstream, you will get more than your money back downstream. But if this is so obvious, why do we start by building complex data warehouses as purification plants, rather than purifying the
water at the source?” Bas Dudink
COLUMN
THE NEW IT
Jill Dyché
VICE PRESIDENT, SAS BEST PRACTICES
Someone once classified the world into two types of people: those who like categorizing people into two types and those who don’t. I used to be one of those people, the kind that saw executives as either business-focused or technology-focused.
I’ve noticed other people have this tendency too. It doesn’t matter whether I’m talking to clients about analytics, CRM, data, or digital, the question always comes up: “Who should own that, the business or IT?”
The question of ownership pervades the day-to-day at companies worldwide. It seems everyone is focused on everyone else—who should own it? Who should manage it? Who should take credit for it? Who should fund it?
But in watching the companies that were effective at bridging the proverbial business-IT divide, I noticed three common traits:
» The successful companies had leaders who realized that appointing people or changing organizational structures wasn’t enough. New ways of doing business were key, and new processes needed to be practiced in order to ensure change adoption.
» These companies met their cultures where they were, working
within the strictures of top-down or bottom-up and ensuring that
these new processes and rules of engagement were new enough
to be compelling but not so disruptive that they would encourage
inertia or sabotage.
» Leaders at these companies didn’t embrace these changes for
their own sake. Rather they were (and are) considering how
trends like digital business are forcing fresh approaches to long-
standing business functions.
Using the trend of the digital business and innovation as the key drivers for make-or-break changes to
IT, I wrote about practices that successful leaders have embraced to not only transform IT, but to le-
verage technology in new ways for business benefit. ‘The New IT: How Business Leaders are Enabling
Strategy in the Digital Age’ features change agents who have emerged from the trenches to tell their
stories.
What I’ve learned from these leaders is what I write about in the book, including:
» If your IT only has two speeds, you’re in big trouble.
» The question “What type of CIO are you?” misses the point. The real question is, “What type of
organization are you leading, and what should it look like?”
» Collaborating by getting everyone in a room isn’t good enough anymore. (In fact, it’s dange-
rous.)
» Corporate strategy and IT strategy can be aligned on one page.
» Hierarchy is being replaced with holacracy, homogeneity with diversity.
» Innovation shouldn’t be run by an elite SWAT team in a separate building with sushi lunches and
ergonomic desk chairs. Everyone should be invited to innovate!
» More people are talking about digital than doing it. Except maybe for you, if you can circum-
scribe digital delivery.
» You don’t have to be in Silicon Valley to join the revolution. In fact you might not want to be!
The leaders profiled in ‘The New IT’ - including leaders from Medtronic, Union Bank, Men’s Wear-
house, Swedish Health, Principal Financial, and Brooks Brothers, to name a few - have shown that it’s
no longer about business versus IT. Rather, it’s about business enabling IT. And vice versa. ■
It’s no longer about business versus IT. Rather, it’s about business enabling IT
CASE
CRÉDITO Y CAUCIÓN ADDS DATA QUALITY TO ITS MANAGEMENT MODEL
Credit insurance company integrates quality ratios in risk assessment
The ability to assess the risk that invoices are not paid by the customer is of vital importance to credit insurance companies. But what if you cannot rely on accurate information? At Crédito y Caución, data quality spearheads the implementation of long-term strategies. A look in their kitchen.
Crédito y Caución is the leading domestic and export credit insurance company in Spain and has held this position since its founding in 1929. With a market share in Spain of nearly 60 percent, for over 80 years the company has contributed to the growth of businesses, protecting them from payment risks associated with credit sales of goods and services. Since 2008 Crédito y Caución has been the operator of the Atradius Group in Spain, Portugal and Brazil.
Crédito y Caución’s insurance policies guarantee that its clients will be paid for invoices issued during their trading operations. The corporate risk analysis system offered by Crédito y Caución processes more than 100 million company records, updated on an ongoing basis. It carries out continuous monitoring of the solvency performance of the insureds’ client portfolio. Its systems study more than 10,000 credit transactions every day, setting a solvency limit for the client which is binding on the company. To determine that risk, Crédito y Caución requires comprehensive and accurate information on its clients’ clients. The efficiency of the service that Crédito y Caución provides to its clients largely depends on the quality of the data contained in that information.
Adapting to new regulations
Like all European insurance companies, Crédito y Caución must comply with the risk-based superviso-
ry framework for the insurance industry. The framework consists of the Solvency II Directive and its
Implementing Technical Standards and Guidelines. Besides financial requirements, Solvency II includes
requirements on the quality of the data handled by insurance companies. Under Solvency II, the accura-
cy of the information is no longer optional. Data quality is essential for decision-making and for certify-
ing compliance with the requirements of the new regulatory framework.
Crédito y Caución has approached its Solvency II compliance by pursuing a strategic vision that reaches
far beyond the contents of the EU directive. “Information is our greatest asset,” says Miguel Angel
Serantes, IT Development Manager at Crédito y Caución. “We are experts in locating it, storing it and
analysing it, as well as obtaining business intelligence from this information for our activities. The chal-
lenge posed by Solvency II created the opportunity to incorporate quality ratios into the information
management and to integrate these into our procedures. We do not simply meet the requirements,
but rather we are committed to instilling the highest quality in all our data management operations.
We have transformed Solvency II into a company process, integrating it into our business intelligence
environment.”
The first step: assessing data quality
The first step in taking on this challenge required performing an assessment of the quality of the data
handled by Crédito y Caución. “We took advantage of the assessment option provided by the SAS
solution, to perform an analysis of the quality of our own data,” adds Serantes. “The results showed us
that there was still a way to go. Keep in mind that much of our data such as company names, phone
numbers and tax codes come from public, third-party sources with varying degrees of inaccuracy. From
there we decided to design and implement a data management quality model and opted for SAS,
which we integrated into our management system.”
Miguel Ángel Serantes and his team developed the foundations for the data management policy of the
company, by establishing the essential criteria to be met by the data managed: it had to be accurate,
complete and appropriate to the operations of the company. They determined the various data owner
levels that would be responsible for its content, definition, use and administration. They established
compliance ratios for each category of data, so that it would be possible, through a system of indica-
tors, to obtain an immediate overview of the quality level of each piece of data.
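The idea of compliance ratios and indicators can be sketched in a few lines: for each category of data, count the share of records that pass its quality rule. The fields, rules and sample records below are hypothetical; Crédito y Caución’s actual model is implemented with SAS Data Management.

```python
# A minimal sketch of compliance ratios per data category: for each field,
# the share of records that pass its quality rule gives an immediate indicator
# of that field's quality level. Fields, rules and records are illustrative.
records = [
    {"company_name": "ACME SL", "tax_code": "B1234567H", "phone": "+34911234567"},
    {"company_name": "",        "tax_code": "B1234567H", "phone": None},
    {"company_name": "Foo SA",  "tax_code": "invalid",   "phone": "+34600000000"},
]

rules = {
    "company_name": lambda v: bool(v and v.strip()),
    "tax_code":     lambda v: bool(v) and len(v) == 9 and v[0].isalpha(),
    "phone":        lambda v: bool(v) and v.startswith("+34"),
}

for field, rule in rules.items():
    passed = sum(1 for r in records if rule(r.get(field)))
    ratio = passed / len(records)
    print(f"{field:13s} compliance ratio: {ratio:.0%}")
```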
A constantly evolving process
“We decided on SAS for various reasons,” says Serantes. “SAS has been our information management
solutions provider for many years. The relationship is very smooth. They have a resident team at
Crédito y Caución that works closely with our IT department. All of this has aided us in the efficient
integration of SAS Data Management into our information management system. It is a solution that
fits our needs. It makes it possible to set criteria and attributes to define the quality of data; it has
options for its assessment; it identifies problems with quality and helps to resolve inaccuracies. The
solution aids the implementation of long-term strategies and enables permanent monitoring of the
quality of our data.”
The deployment of the data quality control system at Crédito y Caución took around a year. Although,
as Serantes says, it is a never-ending process. Data quality control is constantly evolving. The advanta-
ges and benefits of this strategy and the technology solution implemented have been obvious from
the outset. “For starters, we have a data policy that is well-defined and known throughout the compa-
ny. We know what to do with the data and who is responsible for each area of data. We know where
the weaknesses are in the data and how to correct them. In addition, SAS gives us information on the
cause of inaccuracies in the data. We expect to obtain further qualitative benefits, such as the definition
of quality objectives for each piece of data. This allows us to focus on those controls that are relevant to
the business.”
The aim of Crédito y Caución is to achieve 100 percent quality in the management of the data gene-
rated by the company itself, over which it is able to establish rigorous controls. For data from external
sources over which Crédito y Caución has no direct control, the aim is to establish standardization crite-
ria in order to achieve maximum quality. ■
Miguel Angel Serantes
IT DEVELOPMENT MANAGER AT CRÉDITO Y CAUCIÓN
“Information is our most important asset”
Miguel Angel Serantes
INTERVIEW
DATA DEPENDENT ON QUALITY IMPROVEMENTS
Data usefulness hinges on quality
Time is money, and nowadays data is money too. Data is worth its weight in gold, so it’s odd that the quality of data is so often overlooked. In light of the importance of data today, including metadata and Big Data, you would think that data quality should no longer be an issue. Surely this is self-explanatory? “You would think so,” replies Bas Dudink, Data Management Expert with SAS Netherlands. “But data quality is poor by definition.”
Dudink knows why the data that businesses collect and use is often of such poor quality. It is because
everybody uses the same data. This may sound contradictory, but it’s not. On the one hand, it means
that poor quality data is continually used and reused, complete with errors, omissions and a lack of
context. Dudink is adamant on the latter point: “Data without a context is useless.”
On the other hand, using the same data for everybody entails that the same data source is used for
different uses and purposes. This may sound like a sensible approach for efficient data administration
and entry, but different applications require different quality levels of different data fields. This concerns IT applications and systems, as well as the use of data in various business activities.
The devil is in the details
If you are sending an order to a customer you will need address details but no bank account number. If you are paying invoices, it’s the other way around. This is a simple example, with data fields that are relatively recognizable and carefully administrated – or, in any case, they should be. It becomes
more complicated when smaller details are involved that can still have large consequences. Think of
deliveries between companies with several divisions, whereby the business units buy and sell services from each other. Subtle sales and/or purchasing discrepancies can result in major variations. Does
the one division or subsidiary offer a special discount, or does it have different delivery and payment
conditions? The data on prices, conditions and terms can vary and, as such, represent poor quality.
The danger of ignorance
Dudink puts his bold statement, that data quality is poor by definition, into perspective: “It’s all in the
eye of the beholder. What’s good enough for one person may not be good enough for someone else.
It’s about awareness.” The fact that the data are not in good order is not the worst part. The problem
is being unaware of this. Dudink quotes a well-known saying: “Ignorance is bliss.” The biggest danger is when you think your data are accurate and up to date and take action and make decisions on this basis. An additional problem is that many companies mistakenly think that they are not data companies. In fact, these days, every company is a data company. This even applies to the seemingly simple case of a production company that processes physical goods and turns them into new products.
Both its manufacturing process and production line design are based on data!
More of a business issue than an IT matter
Design is usually followed by optimization on the basis of practical experience. The data accrued along the way will be important if the company wants to open a new factory or optimize an existing production facility. Repeatability and improvement, summarizes Dudink, are core actions that depend on
good data.
And good data leads to good business processes. Only then is automation at the business level possible. This requires awareness of data quality within the organization. “They have to know that there
is something they don’t know.” IT suppliers that offer data solutions are thus more involved in the
consultancy business than they are in software. “We don’t only provide software, we also provide
advice.”
More than just rubbing out errors
Still, the early stages of data improvement have little to do with tools and systems. It is mainly a
question of analysis: what data is concerned, how did the company acquire it, who provided it and
how? These are quite fundamental questions that could affect established processes and procedures,
or even partners. For who dares to openly complain that a supplier or customer has provided poor
data? Yet this is quite often the case. “If the data is erroneous, then it’s not sufficient to just fix the
errors. You need to comb through the entire process: where was the error generated?” All too often,
data quality is seen as a way to rub out errors. This is sometimes egocentric, stemming from an
organization’s wish to sell itself as a well-oiled machine. Dudink calls this “keeping up appearances”,
but the right approach to data quality goes much further and much deeper.
The need for a champion
Data quality is not very popular in practice. The shining examples in data quality improvement that
Dudink can think of have been forced into it. “Thanks to regulations. If data quality is not compulsory,
then most people would rather leave things as they are.” There is a concrete argument behind the
rules and laws that make better data compulsory: risk management. It will be no surprise that banks
and financial institutions, including insurers, are among the frontrunners.
Without the regulatory stick, an organization needs to have a data quality champion. “Someone who
can see what information is going to waste and who sees the added value in doing something about
it.” The difficulty is that such a person has to have a broad overview of the organization as well as the
insight and power to implement the change. “Preferably a CIO,” Dudink summarizes succinctly.
Basic tip: start small
Despite the necessity of having someone in a senior management position to push through data
quality improvements, it is still advisable to start with a small project. ‘Small’ is, of course, a relative
term, depending on the size of the organization. Dudink recommends that the first project is used to
germinate the idea of data quality improvement; as a breeding ground and learning school. The highly placed manager that initiates the project will need to combine a top-down vision with a bottom-up
approach. This combination is required to achieve the desired improvement in business data quality.
“I think CIOs will see this too.” ■
“What’s good enough for one person may not be good enough for someone else. It’s about awareness”
Bas Dudink
INTERVIEW
DATA-DRIVEN DECISIONS MAKE THE DIFFERENCE
Analysing data streams for operational decision-making
Fast, faster, fastest; that is the motto of these modern times. IT has made things faster, but now the flood of data is threatening to inundate us. Although thorough data analysis can be time-consuming, Event Stream Processing (ESP) and Decision Management provide a way to make it faster and more efficient.
We are generating, storing and combining more and more data and we want to conduct ever more
complex data analyses. “In certain environments, the sheer volumes of data can be overwhelming,”
says Andrew Pease, Principal Business Solutions Manager with SAS Belgium’s Analytical Platform
Center of Excellence. “In some cases, it may not be practical to store it all, despite the declining costs
of data storage.” And there is more than just the storage costs at stake. Analysis costs can be pro-
hibitive as well, especially if the analysis takes place retrospectively. In some cases, the analysis will
come too late to be of any use.
The solution is to filter real-time data by means of Event Stream Processing and then apply an auto-
mated decision-making process using Decision Management functionality. This automation is carried
out on the basis of business processes which have a built-in link to the tactical and strategic deci-
sion-making levels. Decisions that affect the work floor or even direct interfaces with the customer
can now be taken automatically while keeping in line with the organization’s strategic goals.
Trends and anomalies on drilling platforms
Pease provides an example where ESP can be of use: sensor data input on drilling platforms. Some
of these sensors detect vibrations, which can be used to identify anomalies. Anomalies can point to
problems, so it is important to get the results of the data analysis fast.
A key aspect here is focus and scope. “You need to focus on what you want to know beforehand.” The
SAS expert explains that it would be pointless to analyse the vibrations every second or every
minute. A fixed time range per hour may be sufficient. The main thing is that trends and anomalies
are identified, so that prognoses can be made. ESP can process both scoring code generated by powerful predictive analytics as well as expert-written business rules to allow for these kinds of analyses to determine if pre-emptive maintenance is warranted.
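As an illustration of the streaming idea (and not of the SAS Event Stream Processing API itself), the sketch below keeps a sliding window of recent vibration readings and flags a new reading that deviates strongly from the window. The window size, threshold and simulated feed are assumptions chosen for the example.

```python
# Simplified sketch: keep only a sliding window of recent vibration readings,
# score each new reading against the window, and raise a maintenance flag when
# it deviates too far. Window size and threshold are illustrative assumptions.
from collections import deque
from statistics import mean, stdev

window = deque(maxlen=360)          # e.g. the last hour of 10-second readings

def on_reading(value: float) -> None:
    if len(window) >= 30:           # need some history before scoring
        mu, sigma = mean(window), stdev(window)
        if sigma > 0 and abs(value - mu) > 4 * sigma:
            print(f"anomaly: {value:.2f} (window mean {mu:.2f}) -> check pre-emptive maintenance")
    window.append(value)

# Feeding a simulated stream: a spike stands out against normal vibration levels.
for v in [1.0, 1.1, 0.9, 1.05] * 10 + [6.0]:
    on_reading(v)
```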
Mobile plans and extra cell towers
Another example is the use of call data by telecom operators. “Telecom operators sit on huge stock-
piles of data,” explains Pease. Each telephone conversation involves some 20 to 30 call detail records.
“But they aren’t all important for analysis.” However, the analysis of the right pieces of data can
reveal a lot of important information. An obvious example is data on the timing of a customer’s calls, which can be used to offer a better plan. Calling behaviour need not be tracked extremely
closely; it will usually be sufficient to identify the general patterns. “If the customer pays too much for
too long, the risk that he or she switches to a new operator will increase.”
Further, such analysis can indicate gaps in network coverage. The operator may also decide to install
extra cell towers on the basis of the number of dropped calls. By analysing the dropped calls, the
telecom operator can accurately identify the gaps in coverage or capacity. This means they can erect
an extra cell tower in the exact location where the most dropped calls are.
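A much-simplified sketch of that analysis: count dropped calls per cell and rank the results, so the worst locations surface as candidates for an extra tower. The record layout and identifiers are hypothetical; real call detail records contain far more fields.

```python
# Sketch of the coverage-gap idea: count dropped calls per cell; the cells with
# the most drops are candidate locations for an extra tower. Record layout and
# cell identifiers are illustrative assumptions.
from collections import Counter

call_detail_records = [
    {"cell_id": "AMS-101", "dropped": False},
    {"cell_id": "AMS-104", "dropped": True},
    {"cell_id": "AMS-104", "dropped": True},
    {"cell_id": "RTM-202", "dropped": True},
]

dropped = Counter()
for cdr in call_detail_records:
    if cdr["dropped"]:
        dropped[cdr["cell_id"]] += 1

for cell, n in dropped.most_common(3):
    print(f"{cell}: {n} dropped calls")   # AMS-104 tops the list
```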
Event Stream Processing
The added value of Event Stream Processing is the ability to monitor and analyze large quantities of, for example, transaction or sensor data in real time, which results in immediate reactions to specific situations or the required intervention.
Catching stock exchange swindlers red-handed
The financial world works at a faster pace than most other sectors. Stock trading is automated and
fast as lightning. In this fast-paced world, ESP can be used to detect stock market fraud early on, says
Pease. The speed of the trading systems makes it impossible to store all the data that goes through
them, let alone analyse all these data streams.
“The trick is to filter out the anomalies; the suspicious transactions.” Pease mentions a book that was
released this year called ‘Flash Boys’ which is about the so-called flash traders or high-frequency
traders. Author Michael Lewis paints a very sombre picture of the international money markets,
whereby the speed of data is of the essence. The faster the data, the better the price of a purchase or
sale. Although this nonfiction book has also been criticized, it does contain many interesting lessons
and truths.
Pease relates the story of a Canadian trader’s sly trick. He split up his share trading activities and
found a second server that was some two kilometres closer to the stock exchange. He then set up
a perfectly timed delay so that his various orders arrived at different clearing houses at exactly the
same time. This enabled him to trade in bulk while ensuring that his sales did not lead to a lower
stock price and hence lower yields. Cunning and almost undetectable.
Keeping up with fast flowing data streams
“These days, some data streams flow so fast that it is impossible to analyse all the data they contain,”
says Pease. This is because it takes time to process the data. Just as falling storage costs do not com-
pletely compensate for the data explosion, the advances in computing power cannot always keep up
with the analysis requirements. The cunning share trading trick using server timing is almost impos-
sible to detect, unless ESP is deployed. This is because ESP turns fast streams of data into reasonably
bite-size chunks.
There is a rising demand for fast, real-time analysis of data. This is because of the new applications,
increasing numbers of sensors that collect data and an increasing range of business models that all
depend on reliable data analysis. Where the quality of the analysis used to be paramount, speed has
become equally important. The developments affect the financial and telecoms sectors and the manu-
facturing industry the most, Pease explains.
Decision Management
Decision Management aims to improve organizations by enabling faster, smarter (operational) decisions. This can be done by utilizing all available data in combination with automated application of analytical models and derived business rules, weighing the time required and possible risks.
Beer brewing and the right cheese with your wine
Manufacturing companies are also installing more and more sensors in their production environments.
This is obviously an advantage for high-grade manufacturing of technologically complex products, but
other applications are being found as well. Beer brewers, for example, use sensors to detect wet hops
so the processing of the raw material can be timed on the basis of the sensor readings. Moreover, it
also enables them to schedule maintenance of the kettles more efficiently.
ESP is also implemented in other sectors, such as retail. The same applies to Decision Management.
Pease points to the interesting opportunities for scanner use by supermarket customers. If a customer
puts a bottle of wine in his trolley, then why not suggest a cheese to pair with this wine? The super-
market could even offer a discount to make it more attractive. The IT system knows that there is cur-
rently a surplus of cheese that will spoil within a given number of days, so it’s worthwhile selling it at a lower price. This does require a reliable stock inventory system, including information on use-by dates. It is
not the terabytes that count, but the speed: this all has to happen before the customer reaches the
checkout. “You need to make the right offer at the right time.”
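As a minimal sketch of such a real-time offer decision, the example below checks the proposed cheese discount against a small set of centrally administered business rules before the customer reaches the checkout. The rule names, values and stock details are assumptions for illustration, not SAS Decision Management functionality.

```python
# Minimal sketch of the principle that an operational offer is checked against
# centrally administered business rules before it reaches the customer.
# Rule values and the cheese/wine scenario parameters are illustrative assumptions.
from datetime import date, timedelta

BUSINESS_RULES = {
    "max_discount_pct": 15,                       # set centrally, applies everywhere
    "min_margin_pct": 5,
    "only_offer_stock_expiring_within_days": 10,  # push stock that would otherwise spoil
}

def approve_offer(discount_pct: float, margin_pct: float, expiry: date) -> bool:
    """Apply the central rules; operational systems may propose, the rules decide."""
    return (
        discount_pct <= BUSINESS_RULES["max_discount_pct"]
        and margin_pct >= BUSINESS_RULES["min_margin_pct"]
        and expiry <= date.today() + timedelta(days=BUSINESS_RULES["only_offer_stock_expiring_within_days"])
    )

# Customer puts a bottle of wine in the trolley; the system proposes a cheese discount.
print(approve_offer(discount_pct=10, margin_pct=8, expiry=date.today() + timedelta(days=4)))  # True
```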
The business strategy is paramount
The offer to the customer must also harmonize with the organization’s business strategy. The core
of Decision Management is to centralize and standardize this strategy. Every decision must concur
with the business strategy. Is a discount offered by the marketing department in line with the
policy of the risk management department? If a customer buys in bulk, then the discount could be
increased. This also has to do with the longer term value that such a customer can have for the
company. Clearly defined business rules are required if this is to be managed adequately. “The
business rules need to be entered and administered from a single central location.” The rules also
need to be consistently applied across the organization’s systems and activities. This is the only way
to guarantee that the organization’s various silos can work as a whole in order to facilitate effective
Decision Management. ■
“Where the quality of the analysis used to be paramount, speed has become equally important”
Andrew Pease
CASE
MASTER DATA MANAGEMENT AS A FOUNDATION OF YOUR BUSINESS
Getting a grip on your data
Data can seem deceptively simple. Take a basic data entity such as a supplier: the company name may vary per country, business process and/or activity. Master Data Management (MDM) is used to resolve these discrepancies and can prevent a lot of unnecessary work and follow-up.
It seems only logical that supplier data are accurate and up to date. After all, suppliers are important
business partners; goods and services are received from them and payments are made to them. But
still the data on such entities often leaves much to be desired. Alongside simple data entry errors and
data corruption, there is another cause of data disparities: data entry differences, potentially across
different systems.
Supplier A = Supplier 1 (= B2)
One business system may refer to ‘Supplier A’, while a back-end application may use the descriptor
‘Supplier 1’. Automatically finding and replacing ‘A’ with a ‘1’ or vice versa seems the obvious solu-
tion. The problem is that this can lead to unwanted side effects, such as references in the relevant
systems or from other applications which then cease to work.
Then there is the additional risk of historical discrepancies occurring. For example, an invoice sent by
Supplier A has gone missing. The reason is that it has been renamed. In theory, an interlayer could
exist that has monitored the renaming process and is able to trace it back to its origin. This leaves the
question of the impact on the system performance, because Supplier A and 1 is really too simple an
example.
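One common way out of this dilemma, sketched below under assumed system names and identifiers, is a cross-reference: each source system keeps its own label, and a mapping links every (system, local id) pair to one golden record, so nothing has to be renamed and history stays intact. This is an illustrative pattern, not the specific SAS implementation.

```python
# Sketch of a cross-reference pattern: instead of renaming 'Supplier A' to
# 'Supplier 1' in the source systems (which breaks references and history),
# a mapping links each source-system identifier to one master record.
# System names and identifiers are illustrative only.
master_suppliers = {
    "M-0001": {"name": "Acme Industrial Supplies BV"},
}
cross_reference = {
    ("ERP_NL", "Supplier A"): "M-0001",
    ("ERP_US", "Supplier 1"): "M-0001",
    ("ERP_BR", "B2"):         "M-0001",
}

def resolve(system: str, local_id: str) -> dict:
    """Look up the golden record without touching the local identifier."""
    return master_suppliers[cross_reference[(system, local_id)]]

print(resolve("ERP_NL", "Supplier A") is resolve("ERP_US", "Supplier 1"))  # True: same supplier
```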
Discrepancies due to uncontrolled growth
Discrepancies in these so-called Master Data are typically caused by the uncontrolled growth of IT sys-
tems, as Bas Dudink, Data Management Expert at SAS Netherlands, has learned from experience. The
growth is often driven by the ERP systems, the critical business environments for resource planning.
These complex, extensive systems can fulfil their invaluable roles only after a lengthy implementation
project. This means that they will only be expanded, customized or phased out if it is really necessary.
The explosive growth of IT systems can be part of an organic process, but it could also be the conse-
quence of takeovers. Company X buys company Y and Suppliers A, 1, B2, etc. encounter each other
in the combined IT environment of the newly formed division. It is often not possible to completely
integrate all systems, or it may concern a megaproject that is scheduled to happen ‘later on’, after the
impact of the takeover has been absorbed and after the business opportunities have been seized.
As many as 50 ERP systems
“Some companies have as many as 50 ERP systems,” explains Dudink. “For various countries, for
various business processes, et cetera.” This means that the Master Data can be stored in 50 different
ways and used in 50 different ways. These are not theoretical differences: Dudink sees this in practice
all too often.
“Some companies have as many as 50 ERP systems. This means that the Master Data can be stored in 50 different ways and used in 50 different ways”
Bas Dudink
Initial steps: centralization and deduplication
Dudink recommends involving the people and departments who will reap the benefits of MDM at an
early stage. This will help make the value of the project tangible. MDM can have an enormous impact,
because those 50 different forms of data storage in 50 ERP systems cause a great deal of inefficiency.
The first step is centralized implementation, which leads to considerable administrative cost savings.
You would think the logical next step would be to make the data available to the wider organization,
but it is not time for that yet. First the second, essential step in MDM must be taken.
The second step is data deduplication. This entails finding a supplier who has two different labels in
the IT environment and removing one of these, for example. This is how to make sure that Master
Data discrepancies cannot lead to an invoice getting lost for three months. Inconceivable? It actually
happened to a major corporation. An invoice for a considerable amount of money had simply ended
up in the wrong tray – except that no one knew which ‘tray’ in the immense digital environment the
invoice was in.
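Deduplication usually starts by normalizing the descriptive attributes so that differently spelled entries collapse onto the same key. The sketch below uses a deliberately crude normalization (lowercasing, dropping punctuation and legal forms); real matching in a data management suite applies far richer fuzzy rules.

```python
# Minimal sketch of the deduplication step: normalize supplier names and group
# records that collapse onto the same key, so one label can be retired.
# The normalization below is an illustrative assumption, not production matching logic.
import re
from collections import defaultdict

def normalize(name: str) -> str:
    cleaned = re.sub(r"[^a-z0-9\s]", "", name.lower())       # drop punctuation
    legal_forms = {"bv", "nv", "ltd", "inc", "gmbh"}
    return "".join(t for t in cleaned.split() if t not in legal_forms)

suppliers = ["Jansen Staal B.V.", "JANSEN STAAL BV", "Jansen-Staal", "Pietersen NV"]
groups = defaultdict(list)
for s in suppliers:
    groups[normalize(s)].append(s)

for key, members in groups.items():
    if len(members) > 1:
        print("probable duplicates:", members)
```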
Who is allowed to change what?
Discrepancies are a fact of life but they do need to be prevented as much as possible. This could be
considered step three of the MDM implementation: prevention of new data discrepancies. The key
question is: who controls the Master Data? Who has the authority to change a supplier’s bank account
number? The ownership of the data in relation to its use is important. Who owns the data that were
entered in the Netherlands and which are primarily used for business in the United States?
The identification of data ownership and use should lead to a logical review of that ownership. Once
cleaned up, the Master Data will be more deployable for the divisions and business processes that
benefit the most. The data will initially be deployed from the (now clean) central storage location and
can then be made available to the ‘lower’ levels of the organization.
MDM is more than detecting errors
Good Master Data Management should lead to more than just the detection and resolution of errors.
It can also be used to detect and prevent fraud, such as the so-called ‘ghost invoices’. These invoices
are sent by swindlers and made to appear as if they have been sent by bona fide suppliers. MDM pre-
vents these swindlers from slipping through the cracks in the system.
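A simple illustration of how clean master data supports this: before payment, the bank account on an incoming invoice is compared with the account registered on the supplier’s golden record, and any mismatch is flagged. The record structure and values below are assumptions for the example.

```python
# Sketch of how clean master data helps catch 'ghost invoices': before an
# invoice is paid, its bank account is checked against the account registered
# on the supplier's golden record. Data values are illustrative assumptions.
master_suppliers = {
    "M-0001": {"name": "Jansen Staal", "iban": "NL91ABNA0417164300"},
}

def invoice_is_suspect(invoice: dict) -> bool:
    """Flag invoices whose supplier or bank account is unknown in the master data."""
    supplier = master_suppliers.get(invoice["supplier_master_id"])
    return supplier is None or supplier["iban"] != invoice["iban"]

ghost = {"supplier_master_id": "M-0001", "iban": "NL02RABO0123456789", "amount": 18_500}
print(invoice_is_suspect(ghost))   # True: account does not match the golden record
```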
Discrepancies, errors and fraud can occur because companies and their systems have become more
and more complex. When a supplier or customer registers for the first time, the transaction will nor-
mally take place without any problems. This changes after several years and thousands of transac-
tions, processed by hundreds of people across several departments, whereby the organizational struc-
ture may have also changed. The registration of this one supplier or customer has since been subject
to countless changes, actions and exceptions. “It is quite a challenge to ensure that every transaction
is correctly processed,” as Dudink knows from experience. This challenge stands or falls with good
Master Data Management. ■
ABOUT SAS
SAS understands that almost everything is data driven. We want to help you make sure that this takes place
correctly. Is your data easily accessible, clean, integrated and correctly stored? Do you know which types of
data are used in your organization and by whom? And do you have an automated method that validates
incoming data before it is stored in your databases?
Take better decisions
Thousands or maybe even hundreds of thousands of decisions are taken daily in your organization. Everyday
decisions taken as part of a process: Can we grant this loan to this customer? What offer should I make to a
customer who calls our contact centre? Tactical decisions, such as: What is the optimum balance between
preventive and responsive maintenance of machinery? If we want to scrap five of the fifteen flavours we
offer, which should they be? But also strategic decisions on your organization’s direction. For example,
in which product-market combinations do we want to be present? Information plays a role in all these
decisions. The better the quality of the underlying data, the better the decisions you take.
Get control of your data
Making sure data is complete, accurate and timely can be a time-consuming job. Fortunately, the task can
be largely automated. Spend less time gathering and maintaining information and more time running your
business with SAS Data Management. This solution has been built on a unified platform and designed with
both the business and IT in mind. It is the fastest, easiest and most comprehensive way of getting your data
under control. SAS Data Management brings in-memory and in-database performance improvements which
give you real-time access to reliable information.
Proper process set-up
To succeed in today’s data-driven world, you’ll need more than just a robust data management platform.
Processes and behaviour also play an important role when it comes to master data management, data inte-
gration, data quality, data governance and data federation. We can help you to set these up properly too.
Because getting your data in order is not a one-off activity, but a continuous process. Whether you have a
large or not-so-large volume of data, you can transform it into great value and possibilities.
Want to know more? Visit our website www.sas.com/dm, or contact us at [email protected]
COLOPHON
Realization: SAS Nederland
Editor: Ilanite Keijsers
Photography: Eric Fecken
Authors: Jasper Bakker, Mirjam Hulsebos, Chantal Schepers
Cover: Philip van Tol
Design: Alain Cohen
Project management: SAS Nederland
The book, Future Bright – A Data Driven Reality, was commissioned by SAS Netherlands. Content from the book may only be copied or reproduced (in print, photocopy, film, on the Internet or in any other medium) with the explicit permission of SAS Netherlands and its management, and with proper acknowledgement. SAS Netherlands is not responsible for the statements made by the interviewed parties in this book.