
Traveling the Big Data Super Highway: Realizing Enterprise-wide Adoption and Advantage

When it comes to big data, companies big and small need to determine not only the best fit of new technology with existing investments but also how to incorporate proven best practices that enable them to run better and run differently.

Executive Summary

The avalanche of data that is stressing, and often collapsing, traditional computing systems is matched only by the staggering number of technical and architectural choices available to those seeking business value from this environment. Therefore, big data platforms can be a double-edged sword. While they provide significant IT cost reductions and the power to analyze much larger data sets than previously possible with available IT capabilities, they can also unleash strong disruptive forces, driven by a general lack of understanding during planning and implementation.

Users and IT organizations alike still have a hard time understanding what big data technology actually is and how to effectively apply it. Many organizations are struggling to transition into full-scale production with their big data development platforms and are hamstrung in delivering the business value promised for such platforms.

In short, big data opportunities are not without big data challenges, the majority of which can be grouped into four major categories:

• Immature technology landscape.

• Impact on end users due to fluctuations in the current business model and shortcomings of big data technology.

• Attempts to replace existing technology components with a big data platform.

• Resource availability.

This white paper discusses the challenges of implementing big data technology and provides guidance on how to implement a big data initiative by incorporating proven best practices.

Cognizant 20-20 Insights | August 2013

Fragmented Perspectives

You can hardly pick up a magazine or browse a Web site covering business or IT trends without being bombarded by content extolling the virtues of big data. Proponents typically address the tremendous promise offered by big data tools and techniques, from gaining insight heretofore unavailable, to significantly reducing the cost and/or time necessary to achieve business benefits. Also covered is the new-found organizational ability to analyze data generated by devices and social media, as well as other unstructured and semi-structured data.

What stands out amid these abundant capability and benefit claims is the lack of a universally accepted definition of big data. According to IDC, “Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.”1 In other words, this definition incorporates all data types managed by next-generation systems that must scale to handle ever-increasing user workloads and data volumes.

On the other hand, McKinsey & Co. defines big data as, “Datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze.”2 This suggests that big data’s size is relative to the effectiveness of the technology that handles it and that what constitutes big data today will not likely be big data tomorrow.

All this said, it is little wonder that such a wide spectrum of big data approaches, and big data results, exists.

Invisible Ink

Most gurus consider the Apache Software Foundation’s Hadoop technology stack to be the quintessential big data platform (see Figure 1). This stack actually comprises a small number of components and does not completely address key issues pertaining to real-time analytics, data security and operations. Customers frequently select one of the commercially available solutions to address these issues.

The problem is that all the leading big data solution vendors are still scrambling to fill operational, visualization and information discovery gaps while also planning major product changes needed over the next six months to a year.

A hidden message is written between the lines of this big data story. Consider this:

• The capabilities available in the multitude of commercial solutions vary significantly between channels and continue to diverge.

• The current Hadoop stack is aimed at batch processing and is not tailored for real-time processing.

• Various tool sets are evolving rapidly and dramatically, and this technological progression is expected to continue.

• Support for third-party data visualization tools is currently not inherent in the Hadoop stack.

• The database capabilities of the platform do not currently provide high concurrency or ad hoc query support.

• Operational aspects of the platform are lacking, such as point-in-time recovery and data-level security.

• New organizational skill sets are required to design, build and support applications running on this platform.

Figure 1: Components of Apache’s Hadoop Platform

Elements shown: Sqoop (relational database data collector); Flume | Chukwa (log data collector); Hadoop MapReduce (distributed processing framework); HDFS (Hadoop distributed file system); R (statistics); Mahout (machine learning); Pig (data flow); Hive (data warehouse); Oozie (workflow); Zookeeper (coordination); Ambari (provisioning, managing and monitoring Hadoop clusters).

Now that everyone is talking big data, should organizations begin the implementation of a big data strategy? The answer is a resounding “yes.” The promise of big data to allow business users to analyze large data sets in ways they cannot perform today, while significantly reducing IT infrastructure costs, is real. However, it will take time to transform both business and technology organizations into a state that will deliver full business value. Companies should understand the limitations of the platform and use care when determining where the technology fits, and where it does not.

Planning the Journey

To tap big data advantages early on, companies must ask, and hopefully answer, two fundamental questions that are not mutually exclusive:

1. Is the business goal to maximize the value of data that already exists and solve current problems with better, faster, more agile or less expensive technology? That is, do we hope to “run better” after big data is implemented?

2. Is the goal to tackle long-standing unsolved problems or discover new solutions not previously considered, using new sources and new technologies? That is, do we hope to “run differently” after big data is implemented?

To counteract the disruptive forces caused by immature technology, organizations have embarked on the journey along their own big data super highways and are beginning to pass the following checkpoints:

• Awareness: Determining what big data really is and what it means to them.

• Innovation: Understanding the capabilities and limitations of big data technologies.

• Management: Harmonizing existing technologies with big data technologies to maximize the life and use of existing IT investments.

• Operating: Mobilizing resources and structuring a fluid environment.

• Transformation: Continually improving analytical capabilities through holistic adoption across the enterprise.

The “unknown-uncaptured advantage” (shown in orange in Figure 2) depicts the loss of possible business value by the organization due to its inability to identify, and thus capture, opportunities. In some ways, the orange area is the province of researchers and visionaries long before value is commercially viable. Here, companies cannot capture what they do not know exists.

Figure 2: The Big Data Journey: Lifting the Highway

Elements shown: business value (vertical axis, low to high) plotted against maturity (horizontal axis, low to high) along the big data super highway. Checkpoints: Awareness, Innovation, Managing, Operating, Transformation. Mile markers: Reference Architecture, Innovation Lab, Shared Services Model. Regions: Captured Advantage, Uncaptured Known Advantage, Unknown Uncaptured Advantage. Also labeled: Disruptive Forces, Completed Checkpoints.

The “uncaptured-known advantage,” on the other hand, (shown in green) represents an understanding of value that is not yet realized. This is where a company that makes the right technical and organizational decisions early on can gain business advantage, and therefore competitive differentiation, vis-à-vis late adopters. Discussed in depth in the following section, known mile markers are enabling companies to “lift” their big data highway and capture the value difference between the two curves. Here is where value captured by one enterprise and not another becomes a competitive advantage, or disadvantage. Here is where visionary companies can run differently.

Lastly, “captured advantage” (shown in blue) simply indicates which benefits have been obtained within the organization. Generally, this is an area where all companies can eventually be expected to participate. The blue area shows that competitive advantage has given way to competitors that have captured operational and process advantages and run better. In contrast, a company that is not even executing well in the “captured advantage” space is running a risk of obsolescence and market failure.

Stepping Stones, Bumps in the Road and Navigating

There is much value to be earned in lifting the pace of value discovery and creation and thus discovering value earlier in the business cycle. And so, companies must find a way to press on, despite the opposing disruptive forces. A series of steps that have proved quite successful is recommended for approaching and implementing big data initiatives. Likewise, there are pitfalls that come with immaturity and a lack of direction.

Marker 1: Reference Architecture

No matter what data a company needs (sales, competitive, economic, weather or demographic data), the technical journey of big data begins with the reference architecture. This addresses the practical needs required to obtain maximum reuse of existing technology investments, and it incorporates the use of both big data and traditional technology components in one overall architecture (see Figure 3).

Figure 3: Combining Technologies to Form Overarching Reference Architecture

Elements shown: internal and external data (Source 1 through Source n, unstructured data, e-mail, social media, documents, data extracts from SFDC and others); landing and staging; data integration layer and data certification; data warehouses and data marts (traditional, legacy and third-party databases, analytics appliance, NoSQL database, HDFS file system, Hive data marts, HBase warehouse); archiving and sandbox; information discovery and deep analytics; reporting, analytics and visualization (standard, OLAP and list reports; ad hoc, slice/dice and drill down; business analytics, data mining and forecasting; guided analytics, predictive modeling and BAM; brand sentiment analytics, behavioral, social media and data indexing for search); data views and semantic layer; data privacy and security (authentication and authorization, user privileges); metadata management (data governance, stewardship, ILM, DLM, data standards, data federation, lineage and insights).

This hybrid reference architecture enables companies to take advantage of greater processing speeds provided by the highly scalable Hadoop environment and positions the organization for long-term use of semi-structured and unstructured data. Companies can also minimize the cost and organizational impact of this new platform by allowing business users and current applications to continue to apply the same technologies they use today.

As a comprehensive solution, the reference architecture promotes the following benefits:

• Provides a clear path to technology maturity.

• Offers agility by allowing quick response to changing business needs through the integration and reuse of current analytics platforms and skills.

• Promotes confidence as a result of reduced risk, higher quality data and better governance.

• Explains to corporate stakeholders the role of existing and new technologies and the placement of new investments.

Marker 2: Innovation Labs

Many self-service, reporting and analytics tools to which the business is accustomed must be replaced or adapted to Hadoop. This fact alone can result in a significant learning curve. Add to this the need to replace other tools, and the curve is exaggerated further.

To address this need, innovation labs are often used to help business users understand the limitations of the Hadoop platform and begin developing the skills necessary for supporting the system. By utilizing an innovation lab, business users may even begin to gain new informational knowledge as well.

For example, business users can be introduced to new analytics tools while new analytics models are being developed. Simultaneously, data scientists can begin analyzing new data sources as platform usage matures within the organization. The business’s operational units can also begin preparing and collaborating with their suppliers on key issues, such as monitoring, point-in-time recovery, failover and data security, all of which are capabilities that are currently not part of their numerous software solutions.

Quick Take

Real-Life Reference Architectures

In recent programs that used a similar reference architecture, the following results have been achieved:

• A large financial company reduced its capital expenses by eliminating the need to upgrade existing systems. It accomplished this by relocating transformational and data aggregation processes to the big data platform. If it had continued processing on the current database system, the company estimates it would have needed to allocate over $20 million in capital expenditures.

• After having an integrated customer view on its wish list for over 10 years, an insurance giant was finally able to achieve its goal through the use of NoSQL technology. It combined data from nearly 100 separate administrative and claims systems and moved from pilot to rollout in 90 days, creating a more accurate churn model that responded much sooner to new signals. This 90-day turnaround was a refreshing outcome compared with typical insurance industry IT projects, which are measured in quarters or years.3

A Cautionary Tale

But not all big data stories end well. A global company attempted to move portions of its data acquisition, aggregation and analytical processing to Hadoop. However, the company did not understand the platform’s inherent limitations before attempting to port a broad range of functionality to it. As a result, it had to revert to its traditional platform, losing time and money and raising concern among business teams regarding the platform’s overall viability.

Innovation labs are critical to enabling the right combination of toolsets, datasets, skill sets and mindsets for better, faster, cheaper and more successful use of big data technologies. Another crucial concept of the innovation lab is to provide an environment where new technological components can be introduced as the technology matures and where IT can learn more about the limitations of those technologies.

Marker 3: Shared Services

Another critical element of success is the shared services model, which addresses concerns as to whether the innovation lab can be properly staffed and managed. Without this support, the adoption and success rate of big data programs is often dramatically reduced. The shared services model provides a resource pool to support various development and analytics needs. It also acts as a breeding ground for technology developers to learn and grow while developing the processes and procedures necessary for shifting the platform into an operational state.

The main advantages of using the shared services model include:

• Promoting the big data vision to provide the broader context for implementation success. People need direction, and without clear leadership and vision, many will find reason to resist or pull the initiative in opposing directions.

• Highly skilled team members who can help train and educate the analytics team, development staff and data scientists on the proper way to use the system.

• Mentorship as a fundamental support system. An extension of training, mentoring enables active participation in big data innovation efforts and provides an environment for leadership development.

• Use cases and usage patterns, which provide stronger design guidance than design principles alone. This is because they put the principles into practice and apply them to different contexts. Doing so better illustrates how they are to be applied and allows many design decisions to be pre-determined.

• Planning for future adoption and integration of new technologies to meet tomorrow’s business needs. Having a pre-developed plan for big data technology progression, including how existing technologies will fit into the future big data framework, provides a solid foundation for the user community.

• Specialized expertise available by project or topic, eliminating the need to hire a full-time resource for a one-time question.

• Bypassing of the “who pays first?” dilemma, which occurs when multiple groups could benefit from onboarding a new technology but no funding model exists to allow for group cost-sharing. In other cases, an individual team is stuck with the entire cost burden of the prototype. A shared services organization can leverage a small innovation fund to seed these innovation initiatives quickly and flexibly.

As an example of shared services model success, a large financial institution was able to create, process, analyze and leverage data to drive business priorities. This capability enabled it to enhance predictions of customer risk behaviors, strengthen the identification of high-value prospects and automate analysis of written customer surveys, which led to decreased service improvement time. Here, the shared services model acted as a catalyst for achieving what was possible with big data technology.

By combining people and knowledge with a uniform plan, the company was able to bring findings to fruition. Without the shared services model, this capability would have remained in the innovation lab’s Petri dish, never to advance beyond a good idea.

Marker 4: Reaching Best in Class

Technology and capabilities alone will not create a best-in-class organization or enable an organization to move from standard reporting and traditional business intelligence analytics to the next level of predictive analytics. As the organization matures in the use of its team, processes and technological components, these features become increasingly ingrained in the business.

Data scientists should form a trial-and-error system, testing one idea after another as the data history grows. This may at first seem counter-intuitive, but conducting this type of analysis at this stage of maturity typically is met with higher success rates than alternatives. The reason: Companies usually have the experience needed to quickly identify opportunities, as well as the stability to overcome challenges when testing new ideas.


As these ideas are tested and verified, the benefits of a truly agile analytics model begin to take shape. Also, promoting a higher degree of collaboration helps drive additional learning throughout the enterprise and converts into reality the goal of adopting an enterprise-wide big data platform.

From On-Ramp to Destination

The technological and organizational considerations that accompany the deployment and introduction of big data platforms are an integral part of implementing any big data strategy. Although the Hadoop platform is evolving rapidly, organizations can start achieving results now by enabling users to begin exploring the underlying technologies while addressing key present-day challenges.

Companies can maximize their investment in current and future technology — and people — by considering the following tips:

• Avoid trading today’s unsolved issues for newly invented use cases or business problems that new technology is assumed to solve. Instead, take a fresh look at current business problems that can be addressed using the existing infrastructure.

• Develop parallel strategies that immediately enable business users’ application of core technological components while correcting your organization’s operational issues.

• Refrain from replacing current technology just because you can. Instead, focus on extending the current reference architecture to blend big data components with the existing architectural stack, thereby maximizing current investments and reducing organizational change management challenges. The traditional big data platform represents only one aspect of the overall technology solution, albeit an integral part. Also, your organization’s current technology will be fully aligned with reports and analytics calculations that work well. Big data can pre-process the volumes or calculate new factors, enabling your organization to leave existing investments to the presentation layer of the calculated results.

• Remember that technology is not only maturing; it’s also evolving. Therefore, your organization should embrace new technological components and solutions in lieu of attempting to invest in a single technology stack.

Analogies abound on how companies can safely achieve value from big data. By following this checklist, business and IT leaders alike can surmount steep challenges, derive big data value more quickly, maintain advantage over late adopters and enable the enterprise to benefit from big data technologies.

Figure 4: Big Data Organizational Support Model

Elements shown: big data technology maturity (vertical axis, low to high) plotted against big data organizational maturity (horizontal axis, low to high), with business unit use cases onboarded along the way. Mile markers: Reference Architecture; Innovation Labs – For Quick Wins; Shared Services Model; Best in Class Organization. Stages and the questions they answer:

• Awareness: What is this “stuff?”

• Proof of Concepts: What are my capabilities/limitations?

• Enable Innovation: How do I find additional informational value?

• Pilot: How do I ensure expected value proposition?

• Technology Refresh: How do I leverage big data with my existing assets to revamp my current business model?

• Agile Analytics: How do I continually improve my analytics capabilities?

• Enterprise Adoption: How do I mobilize all my data and operationalize analytics at scale?

About Cognizant

Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business process outsourcing services, dedicated to helping the world’s leading companies build stronger businesses. Headquartered in Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry and business process expertise, and a global, collaborative workforce that embodies the future of work. With over 50 delivery centers worldwide and approximately 164,300 employees as of June 30, 2013, Cognizant is a member of the NASDAQ-100, the S&P 500, the Forbes Global 2000, and the Fortune 500 and is ranked among the top performing and fastest growing companies in the world. Visit us online at www.cognizant.com or follow us on Twitter: Cognizant.

World Headquarters
500 Frank W. Burr Blvd.
Teaneck, NJ 07666 USA
Phone: +1 201 801 0233
Fax: +1 201 801 0243
Toll Free: +1 888 937 3277
Email: [email protected]

European Headquarters
1 Kingdom Street
Paddington Central
London W2 6BD
Phone: +44 (0) 20 7297 7600
Fax: +44 (0) 20 7121 0102
Email: [email protected]

India Operations Headquarters
#5/535, Old Mahabalipuram Road
Okkiyam Pettai, Thoraipakkam
Chennai, 600 096 India
Phone: +91 (0) 44 4209 6000
Fax: +91 (0) 44 4209 6060
Email: [email protected]

© Copyright 2013, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is subject to change without notice. All other trademarks mentioned herein are the property of their respective owners.

About the Author

Hal Lavender is an Associate Vice President of Enterprise Architecture within Cognizant’s Enterprise Information Management Practice, leading an enterprise architecture service line that provides architectural consulting and guidance in information management and analytics. Hal has over 30 years of experience with information architecture and technology in the data management space. A graduate of the Florida Institute of Technology and University of Dallas, he holds master’s degrees in business administration and computer science. Hal can be reached at [email protected].

Footnotes

1 “Extracting Value from Chaos,” IDC, June 2011, http://www.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf.

2 “Big Data: The Next Frontier for Innovation, Competition and Productivity,” McKinsey Global Institute, May 2011, mckinsey.com/Insights/MGI/Research/Technology.

3 D. Henschen, “MetLife Uses NoSQL for Customer Service Breakthrough,” InformationWeek, http://www.informationweek.com/software/information-management/metlife-uses-nosql-for-customer-service/240154741.