7/29/2019 MapR Solution Brief - comScore
Overview
comScore is a global leader in measuring the digital world and the preferred source of digital marketing intelligence. comScore
provides syndicated and custom solutions in online audience measurement, e-commerce, advertising, search, video and mobile.
Advertising agencies, publishers, marketers and financial analysts rely on comScore for the industry-leading solutions needed to
craft successful digital, marketing, sales, product development and trading strategies.
comScore uses Hadoop to process over 40 billion internet and mobile records per month. These Hadoop jobs run every day,
and once they're done, the data is normalized against the comScore URL data dictionary and then batch loaded into a relational
database for analysis and reporting. comScore analysts generate reports from this data that enable comScore clients to gain
behavioral insights into their mobile and online customer base.
MapR demonstrated superior performance, availability, scalability, ease of use, and significant cost savings over comScore's
existing Cloudera distribution.
Selection Criteria
Performance: As comScore's Hadoop cluster continues to expand, performance must be maintained not only to
generate the final product faster, but also to do more with less and minimize costs.
Availability: The JobTracker has already failed with the current Hadoop distribution, so both data protection and availability of
data are concerns as the cluster grows in size.
Scalability: comScore expects to be at 100 nodes by the end of the year, and potentially as many as 350 nodes after 3 years.
Therefore, comScore needs a Hadoop platform that will enable them to maintain performance, ease of use and business continuity at scale.
Ease of Use: comScore needs things to just work, and operating the cluster at scale needs to be easy and intuitive.
MapR Evaluation Results
comScore was using the Cloudera distribution of Hadoop in production and was very familiar with its capabilities and
limitations. The comScore engineering team tested MapR on a wide variety of Hadoop workloads for over a month. The
results of this testing were as follows:
Performance: Across various benchmarks, the team saw 3 to 5 times performance gains over the current Hadoop
distribution and was convinced comScore could run on substantially less hardware with MapR.
Availability: MapR protects against cluster failures and data loss with their Distributed NameNode and JobTracker HA. Rolling
upgrades are also now possible with MapR.
Scalability: With architectural changes made possible by the MapR Distributed NameNode HA, MapR will create more files
faster, sort more data faster, and produce better streaming and random I/O results than what comScore has seen with the
current distribution of Hadoop.
Ease of Use: comScore's Vice President of Engineering said, "With MapR, things that should just work, just work." This means
there is going to be a lot less for comScore to manage with MapR. Will also said they were able to set up the
MapR cluster in an hour, and one of the things that struck him was that everything was a data node. This is a much better
utilization of hardware from his perspective since they don't have to dedicate one machine to a specific task, i.e. the NameNode on
traditional Hadoop. Easy to install, easy to manage, easy to get data in and out of the cluster.
MEDIA
Switching to MapR for Reliable Insights
into Global Internet Consumer Behavior
2012 MapR Technologies. All rights reserved. Apache Hadoop and Hadoop are trademarks of the Apache Software Foundation
and not affiliated with MapR Technologies.
MapR Technologies is the creator of the industry's fastest, most dependable and easiest to use distribution for Apache
Hadoop. MapR Technologies is dedicated to advancing the Hadoop platform and ecosystem to enable more businesses
to harness the power of big data analytics for competitive advantage. For more information, please visit
www.mapr.com.
Quotes from comScore
"MapR is the perfect distribution for the enterprise."
"MapR has built in to the design an automated DR strategy."
"When we started working with MapR within comScore there were a lot of questions about how is this going to
work. It's amazing: the developers come back saying we love working with MapR, the admins love working with
MapR, the user community prefers working with MapR versus the standard distribution."
Michael Brown, Chief Technology Officer, comScore, Inc
Cost Savings with MapR
MapR's performance advantages enable comScore to achieve faster performance on MapR clusters that are half the size of
Cloudera clusters.
Hardware and datacenter savings with MapR (3yr):

                    Other Distribution    MapR Distribution
# Servers           200                   100
Servers (CapEx)     $1,000,000            $500,000
Servers (OpEx)      $50,000               $25,000
Datacenter          $830,390              $415,195
Power               $311,040              $155,520
External Storage    $500,000              $0
Total               $2,691,430            $1,095,715
[Chart: total 3-year cost, Other Distribution vs. MapR Distribution; cost axis from $0 to $3,000,000]
Assumptions
Cost per server                           $5,000
Server lifetime (yr)                      3
Server OpEx (% of annual CapEx)           5%
Server peak power (W)                     400
Datacenter amortization ($/W/mo)          $0.2083
Datacenter OpEx ($/W/mo)                  $0.0800
Server Power ($/W/mo)                     $0.0540
PUE overhead ($/W/mo)                     $0.0540
Critical data on external storage (TB)    50
$/TB on SAN (or NAS)                      $10,000
Minimum NAS cost (NameNode needs it)      $40,000
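
The cost table can be roughly reproduced from the assumptions with a few lines of arithmetic. The following Python sketch is illustrative only and is not taken from the brief: the function and field names are hypothetical, the 3-year server lifetime and 5% OpEx rate are assumptions inferred from the published Servers (OpEx) figures, and the rule that external storage costs the larger of the per-TB price and the minimum NAS cost is likewise an assumption. Because the published datacenter amortization rate appears to be rounded, the computed Datacenter line lands within about $100 of the table rather than matching it exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assumptions:
    """Published assumptions; lifetime and OpEx rate are inferred (see lead-in)."""
    cost_per_server: float = 5_000.0
    server_lifetime_yr: int = 3            # assumed from the "(3yr)" horizon
    server_opex_pct: float = 0.05          # assumed share of annual CapEx
    server_peak_power_w: float = 400.0
    dc_amortization: float = 0.2083        # $/W/mo (rounded in the brief)
    dc_opex: float = 0.0800                # $/W/mo
    server_power: float = 0.0540           # $/W/mo
    pue_overhead: float = 0.0540           # $/W/mo
    critical_data_tb: float = 50.0
    san_cost_per_tb: float = 10_000.0
    min_nas_cost: float = 40_000.0


def three_year_cost(servers: int, needs_external_storage: bool,
                    a: Assumptions = Assumptions(), years: int = 3) -> dict:
    """Return the cost lines of the table for a cluster of `servers` nodes."""
    months = years * 12
    watts = servers * a.server_peak_power_w
    capex = servers * a.cost_per_server
    # OpEx is a percentage of annual CapEx, paid every year of the horizon.
    opex = capex / a.server_lifetime_yr * a.server_opex_pct * years
    datacenter = watts * (a.dc_amortization + a.dc_opex) * months
    power = watts * (a.server_power + a.pue_overhead) * months
    storage = 0.0
    if needs_external_storage:
        # Assumed rule: pay per TB, but never less than the minimum NAS cost.
        storage = max(a.critical_data_tb * a.san_cost_per_tb, a.min_nas_cost)
    lines = {
        "servers_capex": capex,
        "servers_opex": opex,
        "datacenter": datacenter,
        "power": power,
        "external_storage": storage,
    }
    lines["total"] = sum(lines.values())
    return lines


if __name__ == "__main__":
    other = three_year_cost(200, needs_external_storage=True)
    mapr = three_year_cost(100, needs_external_storage=False)
    for name in other:
        print(f"{name:18s} {other[name]:>14,.0f} {mapr[name]:>14,.0f}")
```

Using the table's own totals, the difference between the two columns is $1,595,715, roughly 59% of the other distribution's three-year cost.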
Summary
comScore evaluated and selected the MapR Distribution for Apache Hadoop M5 Edition to replace their existing Cloudera
distribution. MapR demonstrated superior performance, availability, scalability, and ease of use over comScore's existing
Cloudera distribution. MapR also provided significant cost savings over Cloudera.