THE IT DEPARTMENT BECOMES A DATA BROKER Nick Bakker MSc Service Management Consultant Netherlands [email protected] Paul Wu MSc Senior Business Consultant Netherlands [email protected]



2014 EMC Proven Professional Knowledge Sharing

Table of Contents

Introduction
Enabling the business with data
The challenges of managing data
Supporting the business more closely
    Determining business value of data
    Identifying data sources
    Performing governance
    Creating sandboxes for data aggregation and analysis
    Complying with legislation
The IT department becomes the data broker
Conclusion
Appendix

Disclaimer: The views, processes, or methodologies published in this article are those of the

authors. They do not necessarily reflect EMC Corporation’s views, processes, or

methodologies.


Introduction

Many books and articles have been written on the value of data. They generally conclude that data has become a valuable asset, that the way data is used has changed over time, and that “data is the new oil”. Like oil, data needs processing. Once it has been

processed, it becomes valuable in many ways. For companies, one of the reasons to perform

data analysis is to increase sales by capturing consumer purchasing patterns. Driven by the

advances in technology, we now have increasing capabilities to monitor customer behavior

and are able to take more variables into account than ever before. As a result, models

generated to predict the customer’s next purchase are becoming more and more complex.

Unlike the supply of black gold, the amount of data keeps growing exponentially. Not only has the amount of data for each product increased, the number of products has also increased. Consider

the color of an automobile. A century ago Henry Ford gave his famous quote: “Any customer

can have a Model T Ford painted any color that he wants so long as it is black.” Today,

online paint shops for Ford cars offer up to 75 variations of black, including black panther

black, tuxedo black, charcoal black, and carbon black. And this is only one component of a

car. The Opel Adam can be personalized in so many ways, that the number of variations is

almost unlimited. It’s very rare to spot two identical versions.

Besides the information regarding the characteristics of a product or behavior of the

consumer, there is also a growing amount of data about the data. This ‘metadata’ contains

characteristics about data to facilitate its management and retrieval. A document on the

computer would be considered data, whereas metadata would be the author, time of

creation, and size of the document.

With the seemingly unlimited supply of data available, more variables are taken into account

in decision making and speed is much more of the essence than before. There is more to

analyze in less time.

While data is a business asset, the management of and responsibility for data typically still lie with the IT department. This is because all data is stored on storage systems and accessed through IT facilities, either those of the company’s own IT department or those supplied by external IT organizations such as Google and Dropbox. In addition, the growing number of Bring Your

Own initiatives has increased the number of devices per person used to access company

data assets. Is the IT department still properly equipped to deal with this new-found

responsibility?


This article will explore the changing requirements of the business regarding the use of data.

We will continue with the data management challenges met by the IT department and what is

needed to support the business more closely. Finally, we will take a brief look at how to

prepare the IT department for data management.

Enabling the business with data

Although strategic decisions are still often made by intuition, data is important in the execution of most business processes. Data is used to provide the arguments for strategic decisions and to gain insight into the behavior of customers and systems.

IT infrastructure services alone no longer provide a differentiating advantage. As these services have become commodities, internal IT departments must now compete against highly specialized companies offering low-cost IT services. Data, by contrast, can differentiate: it has enabled companies such as Amazon to increase their sales.

So where does the value of the IT department shift to? To the use of the data generated by the IT services. Today, we know more about our customers and their behavior than ever before.

The company’s websites are tracked with statistics about the number of visitors per day, the

peak hours, and the most frequently visited company pages. Having this information gives

the company an idea about the behavior of the consumer. When a consumer purchases

shampoo, it is likely that they will also need conditioner. In (online) shops, these items are often placed together because this evidently increases sales.

Using information about consumer behavior to adapt product offerings to the needs of consumers is nothing new. What is new is the speed of analysis: immense amounts of data can be processed, without sampling, to arrive at a conclusion in a matter of seconds. Consider the amount of information gathered on consumer spending through online shops over the past 5 years. Imagine processing all that information to predict a customer’s next purchase in an instant, based on their browsing behavior. This would be an impossible feat for any person; the amount of data alone would be overwhelming. In the past, such data would have been processed by sampling, simply because processing the entire data set took too long.

Amazon experimented with recommending specific books to customers based

on their shopping preferences. It created a model which used all the data in the system

rather than just a sample. This technique proved to be highly successful and even worked on

products other than books. Today, a third of all of Amazon’s sales are said to result from its


recommendation and personalization systems [1]. The capability to analyze massive amounts

of data has brought Amazon to the leading position in online retailing it holds today.
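To make the idea of recommending from every recorded purchase rather than from a sample more concrete, the following is a minimal sketch of item co-occurrence counting. It is not Amazon’s actual method, which is not described here; the basket data and the function names `build_cooccurrence` and `recommend` are hypothetical and serve only as illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(baskets):
    """Count how often each pair of items appears in the same basket."""
    together = defaultdict(Counter)
    for basket in baskets:
        # A sorted set counts each item once per basket, even if listed twice.
        for a, b in combinations(sorted(set(basket)), 2):
            together[a][b] += 1
            together[b][a] += 1
    return together


def recommend(together, item, top_n=1):
    """Return the items most often bought together with `item`."""
    return [other for other, _ in together[item].most_common(top_n)]


# Hypothetical purchase history; a real system would read every order on record.
baskets = [
    ["shampoo", "conditioner"],
    ["shampoo", "conditioner", "soap"],
    ["shampoo", "soap"],
    ["shampoo", "conditioner"],
]
together = build_cooccurrence(baskets)
print(recommend(together, "shampoo"))
```

Because the counts are built from all baskets, adding more history only refines the ranking; no sampling step is involved.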

Google has used the principle of analyzing entire datasets rather than samples to predict flu outbreaks. It gathered billions of search queries from the past 5 years on its search engine and analyzed them for search patterns that indicated health-care-seeking behavior. Using specific algorithms, Google was able to predict a flu outbreak 7 to 10 days before the Centers for Disease Control and Prevention (CDC). The CDC reports are slower because they rely on data collected and compiled from thousands of health care providers, labs, and other sources [2].

Historically, data accuracy has been a major concern. Especially when the amount of data is

too big to be analyzed in its entirety, a sample of the entire population is usually taken.

Whereas this gives a fair representation of the entire data set, if the data is inaccurate, it is

prone to misleading outcomes. This is different when the entire data set is analyzed. The

inaccuracy of data becomes less relevant the more data becomes available. In the Google

flu prediction example, the result of analyzing 50 million search queries will not be influenced

by a few typing mistakes.
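The effect of a few mistakes on a small sample versus a very large data set can be illustrated with simple arithmetic. The sketch below assumes a fixed number of mistyped queries and an assumed 2% share of flu-related queries; both figures and the function names are hypothetical. If mistakes grew in proportion to the data volume, the relative error would stay constant rather than shrink, and systematic errors would not average out at all.

```python
def measured_share(total_queries, flu_queries, mistyped):
    """Share of flu-related queries when `mistyped` of them go unrecognized."""
    return (flu_queries - mistyped) / total_queries


def relative_error(total_queries, flu_queries, mistyped):
    """Relative deviation of the measured share from the true share."""
    true_share = flu_queries / total_queries
    measured = measured_share(total_queries, flu_queries, mistyped)
    return abs(true_share - measured) / true_share


# Assumed numbers: 2% of queries are flu-related and 5 of them are mistyped.
small_sample = relative_error(1_000, 20, 5)
full_dataset = relative_error(50_000_000, 1_000_000, 5)
print(f"sample of 1,000 queries: {small_sample:.1%} off")
print(f"all 50 million queries: {full_dataset:.4%} off")
```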

The challenges of managing data

Analyzing large quantities of data yields many advantages. However, realizing gains from the high-hanging “data fruit” poses data management challenges to the IT department.

Traditionally, data support services have been delivered by owned data centers. The scope

of these services was mainly limited to activities such as:

- Maintenance of the physical server environment

- Technical application management

- Backup and recovery

- Archiving

- Database administration

- File system maintenance

In the past, the provision of these services was highly predictable, as all data was centrally

stored in the data center and periodic meetings were conducted to monitor the demand for

the IT services.

The focus of the IT department is to manage the Service Level Agreements (SLAs) agreed with the business. Traditionally, these SLAs specify the type of services delivered,


agreements on the availability of systems, and the resolution time if incidents occur with a

focus on keeping the current systems running as smoothly as possible.

Best practice frameworks such as ITIL or MOF are used to ensure that IT activities are performed as efficiently and effectively as possible. ITIL describes that a service should be fit for purpose (utility, i.e. does it do what it is supposed to?) and fit for use (warranty, i.e. how well does it do it?). With the arrival of big data, these two questions are rarely posed anew in the IT department to reflect the new requirements of the business. What remains are the agreements made in the past. Whilst these suffice to keep the IT systems up and running, in the world of big data they leave the potential of the data unused.

As available data grows and becomes richer in content, the management of data requires a

professional approach. This creates some new challenges. First is the optimization of the

total cost of ownership (TCO) of storage. Although hardware costs are decreasing, the TCO

of storage will increase due to the rising volume and other cost elements. Second, agreed

service levels have to be met. Third, the availability and integrity of the data have to be maintained. Fourth, data management should comply with privacy regulations to mitigate security risks. Finally, data management should take the environmental impact of the storage services into account.
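The first challenge, storage cost rising even though hardware gets cheaper, follows from compound arithmetic. The sketch below uses assumed figures (40% yearly volume growth, 20% yearly price decline, a starting point of 100 TB at 1,000 per TB) that do not come from any measurement, and it covers hardware spend only; a full TCO would add the other cost elements mentioned above.

```python
def storage_cost_per_year(start_tb, start_price_per_tb, volume_growth, price_decline, years):
    """Project yearly storage hardware spend under compound growth and price decline."""
    costs = []
    volume, price = start_tb, start_price_per_tb
    for _ in range(years):
        costs.append(round(volume * price, 2))
        volume *= 1 + volume_growth   # data volume grows every year
        price *= 1 - price_decline    # price per TB falls every year
    return costs


# Assumed figures for illustration only.
costs = storage_cost_per_year(100, 1000.0, 0.40, 0.20, 3)
print(costs)
```

With these assumptions, each year multiplies the spend by 1.4 × 0.8 = 1.12, so cost rises as long as volume growth outpaces the price decline.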

These services are facilitated by support processes, most of the time according to the ITIL

framework. The initial challenge is to adapt the service strategy and the service design.

The traditional supply of data used to be relatively simple. With one data center, all business data was stored in a central location, which facilitated processing and presentation by applications. In the new world, the data is presented through a single user interface and cross-provisioned by several applications. Data storage is hybrid and decentralized, a combination of internal and external storage. Recent studies show that CIOs fear that, because cloud-based SaaS solutions are so easy to bring in, companies might be creating silos of data [3]. Obviously, the existence of external storage providers requires additional management attention.

The growth in the volume of digital data will accelerate in the coming years. Presently, only a small part of the digital universe is used for primary business and analytics purposes. The majority of the stored data can be classified as digital e-waste, i.e. bits and bytes nobody wants. Digital e-waste causes budget overruns, environmental problems, security issues, and performance problems. The challenge here is to redesign the archiving strategy based on new data classification rules.


Key elements of SLAs are system availability and the resolution time to fix the service when it is interrupted. Data availability is hard to measure. Although systems might be up and running, the service loses its value if the data is not refreshed or becomes corrupted. The issue is to create an overview of the end-to-end (E2E) service chains. A clear picture of the data service is a prerequisite for negotiating service levels regarding the availability of data.
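One way to express the difference between system availability and data availability is a freshness check alongside the usual uptime check. The sketch below is an illustration under assumptions: the function name `data_service_available`, the 24-hour maximum age, and the timestamps are all hypothetical, and it does not test for corruption.

```python
from datetime import datetime, timedelta


def data_service_available(system_up, last_refresh, now, max_age):
    """Treat the data service as available only if the system is up AND the data is fresh."""
    return system_up and (now - last_refresh) <= max_age


now = datetime(2014, 1, 15, 9, 0)
max_age = timedelta(hours=24)  # assumed service level for data freshness

# The system is up in both cases; only the age of the data differs.
fresh = data_service_available(True, datetime(2014, 1, 14, 22, 0), now, max_age)
stale = data_service_available(True, datetime(2014, 1, 12, 22, 0), now, max_age)
print(fresh, stale)
```

A service level defined this way can be negotiated per data source, since acceptable data age differs between, say, website statistics and financial records.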

Another difficulty is to reach agreement on the retention time of data. In cases where the

data is used for analytic purposes, it’s difficult to predict the period the data will have

business value. The business, as consumer of the data, should be able to assess the value

of the data. The Google flu prediction was only made possible because of the amount of

historical data available.

Another challenge is to comply with legal and regulatory requirements. Specifically for financial and contractual data, organizations need to know where the data is stored and whether there are agreements on physical access to the data. Also, the use of data is constrained by privacy laws, which vary per country.

“Garbage in, garbage out.” Reliable and accurate output data requires clean input. As the information assembling process gets more complex, it is becoming a real challenge to guarantee clean input. Regular quality checks or, even better, a proven data delivery process are required. This concern becomes less pressing as data volumes grow and the law of large numbers starts to apply.

The major challenge is to maintain control of the growing number of data sources.

Connecting the data sources is a technical issue. The complexity lies in managing the data

flows. As a result, only a third of businesses differentiate big data from traditional data [4],

missing an opportunity to fully realize the potential of the value of data.

Supporting the business more closely

The IT department plays a significant role in enabling the business to analyze data. Access to the information stored on the IT systems goes through the IT infrastructure, and analysis of the information can be automated by using specialized software. The role of the IT department becomes that of an intermediary that manages data: a data broker.

A data broker’s primary responsibility is to bring together information demand and data

supply. In fact, it’s a facilitator. The data broker understands the requirements of the

business and has an overview of the various data suppliers. The data broker has the right

tools to assemble the raw data into information and also to guarantee the quality of the data.


Figure 1: Data broker aligning information demand and data supply

As a data broker, it is up to the IT department to ensure availability and reliability of data to

support the decision making process in the business. Certain activities to consider include:

• Determining business value of data

• Identifying data sources

• Performing data governance

• Creating a sandbox for data aggregation and analysis

• Complying with legislation

Determining business value of data

To adequately support the business with information from data, the IT department needs to

work closely with the business. Similar to demand and supply management roles defined in

many ITSM frameworks, a demand is required for specific data that the IT department can

deliver. Determine which part of the business can benefit the most from a data analysis and

work closely together to create a plan of approach.

Identifying data sources

As the IT department manages the IT systems, most of the databases are official and known

to the organization. However, in larger organizations, it is not uncommon that a business

department runs a smaller IT support group to meet their specialized needs. Local

application servers with databases, unknown to the IT department, are installed throughout

the organization. Despite the availability of CRM, ERP, and workflow software, spreadsheets

are still commonly used to keep valuable data located on an individual’s computer. It is

important that these spreadsheets maintain their context when used for data analysis

purposes to ensure accurate comparisons.


While data held in databases and, to some extent, spreadsheets is structured and ideal for further analysis, it is reported that 80% of all data in an enterprise is unstructured. Including unstructured data in the analyses requires additional techniques that are still in development.

Another important data source is “the Internet of Things”. An increasing number of everyday

objects have the ability to collect and transmit data through the use of embedded devices or

sensors that connect with networks. Gartner predicts that 26 billion objects will be connected

to the Internet by 2020 [5].

Furthermore, with the ease of access to cloud services, business departments often use

these services to fill any gaps left open by their own IT department. Consequently, internal

data is stored on systems of external suppliers. Obtaining these databases for additional

analysis will have to go through these suppliers.

Relatively new is the use of open data made available by some governments. Enriching the

data with your own company data can give new insights. McKinsey estimates that open data

can unlock $3 trillion to $5 trillion in economic value annually across seven sectors [6].

Figure 2: Potential value in open data, $ billion

The challenge for the data broker is to locate all potentially interesting internal and external

data sources.


Performing governance

Data governance affects the whole organization. With the increase of available information,

the thirst for real-time knowledge grows. IT departments have to act accordingly to deliver

the right services. The data broker will take measures to ensure the quality of data, whether

created inside or outside the organization, by appointing data owners of sources or

facilitating audits. The value potential of the identified data sources needs to be determined.

Data that has no value potential is classified as digital e-waste. There are three classes of

digital e-waste: used data, degraded data, and unwanted data. Reducing this data will have a positive effect on the TCO of storage [7].

Data owners need to be appointed to ensure that the identified data sources remain relevant.

They will maintain the data source with the latest information and when the source becomes

obsolete, appropriate action will be taken.

Creating sandboxes for data aggregation and analysis

When the relevant data sources have been identified for analysis, they will need to be

aggregated for appropriate analysis. Using the original data sources for analysis can hamper

performance of the IT system and unnecessarily occupy storage and network capacity. By

creating a sandbox, a subset of the data source can be copied in an environment isolated

from production, where analysts can perform their work. Thus, the production environment

remains available to the business without interruption caused by the analysis of gigabytes of

data.

Sandbox users can get familiar with the data at hand and can add additional copies of data

sources to enrich the dataset the user is working with. The aim of the sandbox is to try and

learn from the combination of data sources.
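The sandbox principle can be sketched with SQLite: a filtered subset of a production table is copied into a separate in-memory database, where it can be changed freely. This is a minimal illustration, not a recommended architecture; the `orders` schema, the rows, and the helper `create_sandbox` are hypothetical, and the table name and filter are assumed to come from trusted code rather than user input.

```python
import sqlite3


def create_sandbox(production, table, where_clause, params=()):
    """Copy a filtered subset of one production table into a separate in-memory database."""
    sandbox = sqlite3.connect(":memory:")
    # Reuse the production table definition so the sandbox has the same columns.
    (ddl,) = production.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    sandbox.execute(ddl)
    # Table name and filter are interpolated, so they must come from trusted code.
    rows = production.execute(
        f"SELECT * FROM {table} WHERE {where_clause}", params
    ).fetchall()
    if rows:
        placeholders = ", ".join("?" * len(rows[0]))
        sandbox.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    sandbox.commit()
    return sandbox


# Stand-in for a production database (hypothetical schema and rows).
production = sqlite3.connect(":memory:")
production.execute("CREATE TABLE orders (id INTEGER, country TEXT, amount REAL)")
production.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "NL", 20.0), (2, "NL", 35.5), (3, "DE", 12.0)],
)
production.commit()

sandbox = create_sandbox(production, "orders", "country = ?", ("NL",))
# Analysts can modify the copy freely; production is untouched.
sandbox.execute("DELETE FROM orders WHERE amount < 30")
sandbox_count = sandbox.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
production_count = production.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(sandbox_count, production_count)
```

In practice the sandbox would live on separate infrastructure so that heavy analysis queries do not compete with production workloads for storage and network capacity.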

Complying with legislation

When it comes to legislation, there is little room for compromise. For data management,

legislation dictates that certain information is stored for a certain period. There are security

guidelines and companies are regularly being audited for compliance. International

organizations face a multitude of legislation they have to adhere to, depending on the

countries they are located in. For instance, in some countries, it is against the law to perform

certain analyses on a specific customer. Instead, all information that can be traced back to an

individual or a household has to be removed before it is analyzed any further.
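Removing traceable information before analysis could look roughly like the sketch below. The field names in `IDENTIFYING_FIELDS` and the sample orders are hypothetical; which fields count as identifying depends on local law. Dropping direct identifiers is only a first step, since combinations of the remaining fields may still point to an individual or household.

```python
from collections import defaultdict

# Hypothetical field names; which fields count as identifying depends on local law.
IDENTIFYING_FIELDS = {"name", "email", "address", "customer_id"}


def strip_identifiers(record):
    """Return a copy of the record without directly identifying fields."""
    return {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}


def spend_per_region(records):
    """Aggregate spend per region using de-identified records only."""
    totals = defaultdict(float)
    for record in map(strip_identifiers, records):
        totals[record["region"]] += record["amount"]
    return dict(totals)


orders = [
    {"customer_id": 7, "name": "A. Jansen", "email": "a@example.com", "region": "North", "amount": 40.0},
    {"customer_id": 8, "name": "B. de Vries", "email": "b@example.com", "region": "North", "amount": 10.0},
    {"customer_id": 9, "name": "C. Smit", "email": "c@example.com", "region": "South", "amount": 25.0},
]
print(spend_per_region(orders))
```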

The location of the data center in a foreign country can have significant ramifications when it

comes to ownership of the data. If a cloud storage supplier ceases to exist, it is unclear what


will happen to the data stored there. The USA PATRIOT Act allows the US government to collect information on any American system in its fight against terrorism. This has led many overseas companies to decide against using American cloud providers [8].

Furthermore, privacy awareness has grown through media coverage. Revelations about data analysis practiced by government institutions have made the public suspicious of any large organization holding personal information about customers. A breach of consumer privacy that reaches the media can have disastrous consequences for the organization.

The IT department becomes the data broker

In the ITIL framework, activities are grouped into processes. As ITIL is still the most widely used IT Service Management framework in the world, we will use it to describe the roles of the data broker. We argue that the roles according to ITIL remain the same, but we perceive a shift in emphasis towards information and data.

This is mainly the case for service strategy and service design, as these ITIL modules give

shape to the IT services and define the importance of data to the organization. The service

strategy defines the customer, added value of the IT services, and the competitive

advantage. In essence, it justifies the existence of the IT department. Service design

describes the services and activities to achieve the business objectives [9]. The other ITIL modules (service transition, service operation, and continual service improvement) are an operationalization of the strategy and the design.

The table below presents a brief overview of the focal points of the data broker for the

functions and processes of service strategy and service design.


Service Life Cycle Phase / ITIL Process or Function: focal points as a data broker

Service Strategy

Financial Management
• Estimate the business value of data and include this in the TCO of storage.
• Budget a learning curve on how to mine data.
• Keep measuring the business value of data for future reference.
• Validate the business case for data collection, storage and analytics.
• Perceive data storage as potential value as well as cost.

Service Portfolio Management (SPM)
• Include data management services such as data enrichment, data analysis and sandboxes in the portfolio.

Demand Management
• Set clear business goals on what to achieve with data analysis.
• Allow the IT department to participate in goal setting.
• Work in partnership with the IT department in discovering the potential of big data.
• Measure the business outcomes of data.

Service Design

Service Catalogue Management
• Include data sources and data value as part of the IT services. Add new attributes to these services, e.g. retention time, location, business impact classification.

Service Level Management
• Emphasize adding value to the business with data availability, accuracy, compliance and enrichment.

Capacity Management
• In addition to monitoring storage usage, include the usage of data sources.
• Reserve additional performance capacity for data analysis or spread this extra load across peak and off-peak periods.
• Classify data and clean up e-waste: unwanted, degraded and used data.

Availability Management
• Describe an overview of the E2E data services and the impact of data availability on the business.

Information Security Management
• Have a clear policy on data management.
• Allow data manipulation activities in a separate environment that can be monitored and to which access can be restricted.
• Categorize data sources for content sensitivity and, according to the category, restrict (simultaneous) access to these data sources.

Supplier Management
• Set terms on export of data in case of change of suppliers.
• Agree on usage and ownership of data.
• Explore use/purchase of 'open data' from suppliers that can be used to enrich own data.

IT (storage) professionals should be prepared for their new role as data broker, as it requires attention to different areas. A clear understanding of their objectives in data management is needed to fulfill their duties accordingly.
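The additional service catalogue attributes suggested above (retention time, location, business impact classification) could be recorded per data source along the following lines. This is a sketch only: the class `DataSourceEntry`, its field names, the impact classes, and the example values are hypothetical and not prescribed by ITIL.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DataSourceEntry:
    """One data source listed in the service catalogue (illustrative fields)."""
    name: str
    owner: str
    location: str          # e.g. internal data center or external supplier
    retention_days: int    # agreed retention time
    business_impact: str   # hypothetical classes: "low", "medium", "high"

    def retention_expired(self, created, today):
        """True when data created on `created` is past its retention time."""
        return today > created + timedelta(days=self.retention_days)


weblogs = DataSourceEntry(
    name="website statistics",
    owner="marketing",
    location="internal data center",
    retention_days=365,
    business_impact="medium",
)
print(weblogs.retention_expired(date(2013, 1, 1), date(2014, 2, 1)))
```

Recording an owner and a retention time for every source gives capacity management a concrete basis for classifying data and cleaning up e-waste.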


Conclusion

Organizations are able to analyze more data in less time. With analytics tools, organizations are gaining deeper insight into the behavior of customers and systems, which can result in higher profit. Data is an important business asset and should be managed professionally.

Traditionally, the IT department is held responsible for the management of storage where

data is implicitly included. Data management per se was limited to archiving, backup, and

restore activities. Today, the IT department faces challenges to keep up with changing

business requirements resulting from advancements in data management and analysis

techniques.

The IT department has to develop the capabilities to support the business more closely. We identified five important functions for data management: determining business value of data, identifying data sources, performing governance, creating sandboxes for data analysis, and complying with legislation.

To perform these functions, the IT department should become a data broker. The data broker

understands the demands of the business, has a clear picture of the data sources, and

possesses the tools to deliver the required data to the business. The data broker should

redefine its service strategy and service design to be able to add value to the business with

data.


Appendix

1. Big Data Revolution, Mayer-Schönberger V, Cukier K, Houghton Mifflin Harcourt Publishing Company, United States, 2013

2. Google Uses Searches to Track Flu’s Spread, Helft M, The New York Times internet, Nov 11, 2008, http://www.nytimes.com/2008/11/12/technology/internet/12flu.html

3. Survey: CIOs Bullish on Cloud Benefits, But Worry About SaaS Data Silos, McCarthy V, 2014, http://www.idevnews.com/stories/5339/Survey-CIOs-Bullish-on-Cloud-Benefits-But-Worry-About-SaaS-Data-Silos

4. Research: The Big Data Management Challenge, Biddick N, Information Week, Dec 4, 2012, http://reports.informationweek.com/abstract/81/8766/business-intelligence-and-information-management/research-the-big-data-management-challenge.html

5. Gartner Says the Internet of Things Installed Base Will Grow to 26 Billion Units By 2020, Rivera J, van der Meulen R, Gartner, Dec 12, 2013, http://www.gartner.com/newsroom/id/2636073

6. Open data: Unlocking innovation and performance with liquid information, Manyika J, Chui M, Farrell D and others, McKinsey & Company, Oct 2013, http://www.mckinsey.com/insights/business_technology/open_data_unlocking_innovation_and_performance_with_liquid_information

7. Business should care about digital e-waste, Bakker N, EMC, Feb 14, 2013

8. PATRIOT Act clouds picture for tech, Saleh Rauf D, Politico, Nov 29, 2011, http://www.politico.com/news/stories/1111/69366.html

9. De Kleine ITIL V3, Peters L, Bordewijk M, Ermers J, SDU uitgevers, 2007

EMC believes the information in this publication is accurate as of its publication date. The

information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION

MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO

THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED

WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an

applicable software license.