Adopting Big Data through Open Source
Survey Results
Talend How Big Is Big Data Adoption? – Survey Results
Table of Contents
Survey Results
Big Data Company Strategy
Big Data Business Drivers and Benefits Received
Big Data Integration
Big Data Implementation Challenges
Big Data Implementation Technologies
About Talend
Big data represents a significant paradigm shift in enterprise technology and stands to transform
much of what the modern enterprise is today. Digital data is everywhere and global data is growing at
40% per year. Companies capture trillions of bytes of information about their customers, suppliers,
and operations, and millions of networked sensors are being embedded in the physical world in
devices such as mobile phones, energy meters and automobiles, sensing, creating, and communicating
data.¹ By collecting and analyzing all this information, companies gain insight into new business
opportunities and threats.
But what exactly is big data? Big data encompasses a complex and large set of diverse structured
and unstructured datasets that are difficult to process using traditional data management practices
and tools. There is an increasing desire to collect call detail records, web logs, data from sensor
networks, financial transactions, social media and Internet text, and then analyze them alongside existing data
sources. Conventional data management tools fail when trying to integrate, search and analyze big
datasets, which (for now) range from terabytes to multiple petabytes of information. As an example,
Walmart handles more than 1 million customer transactions every hour, which are imported into
databases estimated to contain more than 2.5 petabytes of data, the equivalent of 167 times the
information contained in all the books in the US Library of Congress.² New technologies based on the
Apache Hadoop Big Data Platform have emerged as a way to analyze large data sets through a
technique called massively parallel processing (MPP) of information.
As with any successful new paradigm, there is a technology adoption curve from innovators and early
adopters, to the early majority, to the late majority, to laggards. Early adopters are driven by
competitive advantage and innovation, take the biggest risks for success, typically use primitive tools,
and build solutions themselves. Conversely, the late majority and laggards strive for the productivity gains
others have received and take less risk by adopting proven technologies backed by robust products
and services. In the information arms race, companies that can collect and analyze more information
should be able to make faster, better-informed decisions than their competitors, e.g. by
maximizing customer wallet share, by knowing when and why customers may leave, by efficiently
creating and targeting new markets, or by deterring fraud. To date, most of the big data discussion
has been about big data technology. The goal of this survey and whitepaper is to highlight big data
adoption challenges, business objectives and benefits, as well as the big data technologies being used.
¹ “Big data: The next frontier for innovation, competition, and productivity” McKinsey & Company, May 2011.
² http://en.wikipedia.org/wiki/Big_data
Survey Results
In the summer of 2012, Talend conducted a big data adoption survey of 231 professionals involved in
the delivery of data solutions for their company. Survey respondents were closely split between North
America (49%) and EMEA (51%), with 60% of respondents in IT and 36% having business titles. The 95
respondents who reported having a big data strategy were then asked a series of questions about their
experience.
Figure 1: Survey Demographics. Region: North America 49%, EMEA 51%. Role: IT 60%, Business 36%, Other 4%.
Key findings from the survey are:
• 41% of companies have a strategy for dealing with big data, indicating the growing adoption of
big data.
• 48% of big data initiatives are driven by the business, 39% by IT, and 13% cross-functionally.
• For those without a big data strategy, the main reason (76%) is that they do not distinguish big
data from existing corporate data.
• Increasing the depth and accuracy of predictive analytics was the number one driver for big
data, reported by 68% of those who have a big data strategy.
• Using today’s definition of big data (> 10 terabytes), 71% of respondents have big data to manage.
• 62% indicated that they have achieved big data business benefits, with the primary benefits
being business process optimization (28%) and improvements in marketing and sales (24%).
• However, 24 companies reported not receiving a business benefit, which may indicate the need
for improved big data skillsets, governance and management.
• The types (inputs) of big data that are being used today include web and social media (57% of
respondents) followed by sales data (54% of respondents).
• 61% replied that their primary big data challenge was allocating sufficient time, budget and
resources, with just over half (52%) reporting a lack of big data in-house expertise.
• Open source Apache Hadoop and Hadoop-based distributions represented over 60% of big data
implementation technologies in use or considered for use.
Big Data Company Strategy
Just over 10 years ago, Doug Laney of the Meta Group (now Gartner) published a report³
on the growing volume, velocity and variety of data and on the need for organizations to look beyond
traditional approaches. The business model of big data early adopters such as Google and Facebook
required that they create a strategy to collect and analyze large volumes of data to scale their
business. Some companies have big data collection and analysis in their DNA and have built a separate
big data strategy. Others recognize that big data is part of a larger total data management function
and incorporate big data tools and practices throughout the business to manage big data as well as
enterprise data and discrete data. In 2011 there were nine companies offering products based on big
data (Apache Hadoop) technologies, and now there are over 120 vendors, a clear sign of
momentum. The survey results (Figure 2) showed that 41% of organizations do have a big data
strategy and 59% do not. Furthermore, 76% of those without a specific strategy replied that they do
not distinguish big data from existing corporate data management practices.
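The subgroup sizes quoted alongside Figures 2 and 3 (n=95 and n=136) can be recovered from the reported percentages. The following is a minimal sketch of that arithmetic; the helper name `headcount` is illustrative only, and the count for the "do not distinguish" group is an approximation derived from the rounded 76% figure rather than a number stated in the survey.

```python
# Reported figures from the survey (Figures 2 and 3).
TOTAL_RESPONDENTS = 231          # full survey sample
SHARE_WITH_STRATEGY = 0.41       # answered "Yes" to having a big data strategy
SHARE_NOT_DISTINGUISHING = 0.76  # share of the "No" group giving this reason


def headcount(total: int, share: float) -> int:
    """Convert a reported share back into an approximate respondent count."""
    return round(total * share)


with_strategy = headcount(TOTAL_RESPONDENTS, SHARE_WITH_STRATEGY)
without_strategy = TOTAL_RESPONDENTS - with_strategy
# Approximation only: derived from a rounded percentage, not reported directly.
not_distinguishing = headcount(without_strategy, SHARE_NOT_DISTINGUISHING)

print(with_strategy, without_strategy, not_distinguishing)
```

The first two values match the sample sizes given in the captions of Figure 3 and the later figures.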
For companies that do have a big data strategy (n=95), it is being driven by several company functions
(Figure 4), which indicates that big data as a core strategy has moved past an early adopter stage.
39% indicated that big data initiatives are being driven by IT, a bottom-up approach aimed at being more
efficient in collecting and analyzing large data sets. However, 48% of big data strategy is being driven
by lines of business or executives, which indicates there are compelling business reasons for big data
adoption, such as increased revenue, improved customer satisfaction, or faster time to market.
³ http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf
Figure 2: Does your organization have a strategy for dealing with big data? (n=231) Yes: 41%. No: 59%.
Figure 3: If no, why? (n=136) We don't distinguish big data from existing corporate data: 76%. Other reasons: 24%.
Figure 4: The big data initiative is primarily being driven by:
IT: 39%
Business and Consumers of Data: 26%
Executive Management: 20%
Cross-functional Team: 13%
Board of Directors: 2%
Early results show that big data is part of larger corporate data initiatives, and is being driven by business more than IT.
Big Data Business Drivers and Benefits Received
Data requirements and benefits vary by industry. For example, communications providers,
government, healthcare and retail firms all have larger amounts of unstructured data such as text,
audio and video files that can benefit from big data collection and analysis. The number one business
driver for big data (68%) is increasing the accuracy and depth of predictive analytics – or the ability to
analyze current and historical data to make future predictions. Revenue optimization (51%) and new
revenue generation (48%) were the second and third highest responses as companies seek to do more
in-depth analysis to maximize market and wallet share, e.g. improve cross-selling capabilities.
Figure 5: What are the business drivers for big data in your organization? (multiple responses, n=95)
For those that have implemented big data projects, 62% indicated that they have achieved business
benefits (Figure 6), with the primary benefits being business process optimization (28%) and
improvements in marketing and sales (24%). It is a concern that 38% responded “No” or “Unknown”;
this may be due to a lack of project governance, data quality, big data skillsets and/or available
tooling, which is typical for new paradigms.
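The 62% and 38% totals above, and the count of 24 companies reporting no benefit in the key findings, follow from the Figure 6 breakdown. A short sketch of the tally, using the Figure 6 values as input (the dictionary keys are shortened labels, not the survey's exact wording):

```python
RESPONDENTS = 95  # respondents with a big data strategy (Figure 6 sample)

# Percentage of respondents per answer, as reported in Figure 6.
benefit_shares = {
    "Yes, business process optimization": 28,
    "Yes, marketing and sales": 24,
    "Yes, crime prevention and fraud": 5,
    "Yes, customer retention": 5,
    "No": 25,
    "Unknown": 13,
}

# Sum every "Yes" answer to get the share that realized a benefit.
yes_total = sum(
    share for answer, share in benefit_shares.items() if answer.startswith("Yes")
)
no_or_unknown = benefit_shares["No"] + benefit_shares["Unknown"]
# Convert the "No" share back into an approximate number of companies.
no_count = round(RESPONDENTS * benefit_shares["No"] / 100)

print(yes_total, no_or_unknown, no_count)
```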
Figure 6: To date, have you realized any business benefit to big data? (n=95)
Yes, particularly in Business Process Optimization: 28%
Yes, particularly in Marketing and Sales: 24%
Yes, particularly in Crime Prevention and Fraud: 5%
Yes, particularly in Customer Retention: 5%
No: 25%
Unknown: 13%
Big data business benefits include business process optimization and marketing/sales improvement; however, some projects have failed to deliver a benefit, or it is too early to tell.
Big Data Integration
Common big data use cases include marketing campaign analysis, recommendation engines,
predictive analytics, sentiment analysis, risk management and fraud detection. IT is integrating
existing data warehouses and business intelligence systems with diverse sets of structured and
unstructured data for more in-depth analysis. The survey revealed that the most common applications
being integrated were financial transactions (48.4%) and social media, clickstream and Internet text
(48.4%), followed by web logs (35.8%) and call detail records (28.4%). By looking at social media and
internet text, firms can understand who the “super users” are in any social network or community,
i.e. those who have the most influence over others inside social networks. Also, by correlating
financial transactions and call detail records with clickstreams, one can generate a more complete
view of customer buying patterns and behavior.
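As an illustration of the correlation described above, the sketch below combines three small in-memory datasets into one record per customer. Everything in it is hypothetical: the field names (`customer_id`, `amount`, `duration_sec`, `page`), the sample records and the function `build_customer_view` are assumptions made for the example, and a real deployment would run this kind of join on a big data platform rather than in a single process.

```python
from collections import defaultdict

# Hypothetical sample records; field names are illustrative assumptions.
transactions = [
    {"customer_id": "c1", "amount": 120.0},
    {"customer_id": "c2", "amount": 35.5},
    {"customer_id": "c1", "amount": 80.0},
]
call_records = [
    {"customer_id": "c1", "duration_sec": 300},
    {"customer_id": "c3", "duration_sec": 45},
]
clicks = [
    {"customer_id": "c1", "page": "/pricing"},
    {"customer_id": "c2", "page": "/home"},
    {"customer_id": "c1", "page": "/checkout"},
]


def build_customer_view(transactions, call_records, clicks):
    """Merge the three sources into one summary record per customer."""
    view = defaultdict(
        lambda: {"total_spend": 0.0, "call_seconds": 0, "pages": []}
    )
    for txn in transactions:
        view[txn["customer_id"]]["total_spend"] += txn["amount"]
    for call in call_records:
        view[call["customer_id"]]["call_seconds"] += call["duration_sec"]
    for click in clicks:
        view[click["customer_id"]]["pages"].append(click["page"])
    return dict(view)


customer_view = build_customer_view(transactions, call_records, clicks)
for customer_id, summary in sorted(customer_view.items()):
    print(customer_id, summary)
```

Keying every source on the same customer identifier is the design choice that makes the combined view possible; customers who appear in only one source still get a record, with the other fields left at their defaults.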
Figure 7: Which applications are driving big data needs at your organization? (multiple responses)
Call detail records: 28%
Financial transactions: 48%
Science, research data or medical data: 25%
Sensor data: 25%
Social media, clickstream or internet search analytics: 48%
Video, imaging data: 16%
Web logs: 36%
Other: 8%
Furthermore, the types of big data being used today or considered for the future reinforce the
previous response. Web and social media are being used by 57% of respondents today, with 23%
considering them for the future. Sales data was the second highest, both for use today (54%) and
for future consideration (32%), for analyzing buying patterns and sales incentives. Biometric data was
the lowest on the list.
Figure 8: What type of big data are you involved in or considering making part of your Business Intelligence?
Web and social media are big data’s primary inputs with sales data a close second.
Categories: Web and social media (web logs, Twitter feeds, JSON); Machine generated data (RFID, GPS, phone apps and other machine generated data); Sales data (buying patterns and sales incentives); Biometric (fingerprint, voice/face recognition, DNA); Human interactions (e-mails, voice mails, call centers)
Series: Actively Involved; Considering for the Future; Not Considering
Big Data Implementation Challenges
The technical challenges in processing big data involve integrating, searching and analyzing large data
sets. However, like any new paradigm, companies must also find the right skillsets, get budget
approval, navigate company politics, and manage the unknowns. The United States alone faces a
shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and
analysts to analyze big data and make decisions based on their findings.⁴ Many early big data projects
lack an explicit project management structure; over time, companies will incorporate
standards and procedures just as they have with data management projects. In the survey, 61%
replied that their primary big data challenge was allocating sufficient time, budget and resources,
with just over half (52%) reporting a lack of in-house big data expertise. Also, 48% reported data
quality challenges, while only 11% reported a challenge getting C-level buy-in for big data projects.
Figure 9: What challenges to implementing big data are most hindering your success? (multiple responses)
⁴ “Big data: The next frontier for innovation, competition, and productivity” McKinsey & Company, May 2011.
Big data processing varies by organization, industry and the types of tools available to process the
data. It may involve collecting and analyzing terabytes of information or many petabytes of data, and
the definition of “big data” is expected to grow over time. Using today’s definition of big data
(> 10 terabytes), 71% of respondents (Figure 10) have big data to manage; 46% of respondents have
100 terabytes or more, and 12% have 2 petabytes or more.
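The cumulative shares quoted above can be reproduced from the bucket-level distribution reported in Figure 10. The sketch below sums the buckets from a given size upward; the bucket labels are abbreviated and the function name `share_at_or_above` is an illustrative choice.

```python
# Share of respondents per data-volume bucket, in ascending order (Figure 10).
volume_shares = [
    ("< 10 TB", 29),
    ("10 to 99 TB", 25),
    ("100 to 499 TB", 23),
    ("500 TB to 1 PB", 11),
    ("2 to 5 PB", 6),
    ("> 5 PB", 6),
]


def share_at_or_above(bucket_label: str) -> int:
    """Sum the shares of the named bucket and every larger bucket."""
    labels = [label for label, _ in volume_shares]
    start = labels.index(bucket_label)
    return sum(share for _, share in volume_shares[start:])


print(share_at_or_above("10 to 99 TB"))    # respondents with big data to manage
print(share_at_or_above("100 to 499 TB"))  # respondents at 100 TB and above
print(share_at_or_above("2 to 5 PB"))      # respondents at 2 PB and above
```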
Figure 10: What is the total amount of data that exists within your organization?
< 10 Terabytes: 29%
10 to 99 Terabytes: 25%
100 to 499 Terabytes: 23%
500 Terabytes to 1 Petabyte: 11%
2 to 5 Petabytes: 6%
> 5 Petabytes: 6%
A large majority of companies have over 10 terabytes of data to manage, but the biggest barriers to big data adoption are a shortage of time, budget, expertise and resources.
Big Data Implementation Technologies
Many technologies have been developed to integrate, manipulate, manage and analyze big data.
Survey respondents were asked which big data technologies they are using or considering.
Apache Hadoop (28%), an open source framework with basic big data constructs (file system,
language, and distributed system for managing large datasets) that incorporates MapReduce, was the
most popular response. Showing strong support for open source technology, Hadoop and Hadoop-based
solutions represented 62% of the responses. The large share of Other (38%) responses suggests
an early adopter, fragmented market, and included selections for big data appliances (e.g. Teradata,
Netezza) and NoSQL databases (e.g. Couchbase, MongoDB).
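For readers unfamiliar with MapReduce, the sketch below shows the general shape of the idea in plain Python: each chunk of data is summarized independently (the map step), and the partial results are then merged (the reduce step). It is a single-process illustration under simplifying assumptions, not Hadoop's API; the sample page hits and the function names `map_chunk` and `reduce_counts` are invented for the example.

```python
from collections import Counter
from functools import reduce

# Hypothetical web log page hits, already split into chunks.
# On a cluster, each chunk would live on a different node.
log_chunks = [
    ["/home", "/pricing", "/home"],
    ["/checkout", "/home"],
    ["/pricing"],
]


def map_chunk(chunk):
    """Map step: count page hits within one chunk, independent of the others."""
    return Counter(chunk)


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


# The map calls do not depend on each other, which is what allows a
# framework to run them in parallel; here they simply run one after another.
partial_counts = map(map_chunk, log_chunks)
total_hits = reduce(reduce_counts, partial_counts, Counter())

print(total_hits.most_common())
```

Because the map step touches only its own chunk, adding more machines lets more chunks be processed at once, which is the property that makes the approach suitable for very large datasets.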
Figure 11: Which implementation of big data technology are you considering or are you using already? (n=95)
Amazon Web Services: 13%
Apache Hadoop (own installation): 28%
Cloudera (CDH): 12%
Greenplum: 2%
Hortonworks Data Platform: 3%
MapR: 4%
Other: 38%
Open source Apache Hadoop and Hadoop-based distributions represented over 60% of big data implementations in use or considered for use.
About Talend
Talend is the recognized leader in open source integration solutions. The company’s holistic
integration platform helps organizations minimize costs and maximize the value of data integration,
ETL, data quality, master data management, application integration and business process
management - while supporting their shift toward big data. More than 3,500 paying customers
worldwide, including eBay, ING, The Weather Channel, Deutsche Post and Allianz, subscribe to
Talend’s solutions and services. With over 20 million downloads, Talend’s products are the most
trusted integration solutions in the world. The company has major offices in North America, Europe
and Asia, and a global network of technical and services partners.
Talend’s open source approach and flexible integration platform for big data enable users to easily
connect and analyze data from disparate systems to help drive and improve business performance.
Talend’s big data capabilities integrate with today’s big data market leaders such as Cloudera,
Hortonworks, Google, Greenplum, MapR, Teradata and Vertica, positioning Talend as a leader in the
management of big data. Talend’s goal is to democratize the big data market just as it has with data
integration, data quality, master data management, enterprise service bus and business process
management. Visit www.talend.com to learn more and download your free copy of Talend Open
Studio for Big Data.