MapR Solution Brief - comScore

  • 7/29/2019 MapR Solution Brief - comScore

    Overview

    comScore is a global leader in measuring the digital world and the preferred source of digital marketing intelligence. comScore provides syndicated and custom solutions in online audience measurement, e-commerce, advertising, search, video and mobile. Advertising agencies, publishers, marketers and financial analysts rely on comScore for the industry-leading solutions needed to craft successful digital, marketing, sales, product development and trading strategies.

    comScore uses Hadoop to process over 40 billion internet and mobile records per month. These Hadoop jobs run every day, and once they're done, the data is normalized against the comScore URL data dictionary and then batch loaded into a relational database for analysis and reporting. comScore analysts generate reports from this data that enable comScore clients to gain behavioral insights into their mobile and online customer base.

    MapR demonstrated superior performance, availability, scalability, ease of use, and significant cost savings over comScore's existing Cloudera distribution.

    Selection Criteria

    Performance: As comScore's Hadoop cluster continues to expand, performance integrity needs to be maintained not only to generate final product faster, but also to do more with less and minimize costs.

    Availability: The JobTracker has already failed with the current Hadoop distribution, so both data protection and availability of data are concerns as the cluster grows in size.

    Scalability: comScore expects to be at 100 nodes by the end of the year, and potentially as many as 350 nodes after 3 years. Therefore, comScore needs a Hadoop platform that will enable them to maintain performance, ease of use and business continuity at scale.

    Ease of Use: comScore needs things to just work, and operating the cluster at scale needs to be easy and intuitive.

    MapR Evaluation Results

    comScore was using the Cloudera distribution of Hadoop in production and was very familiar with its capabilities and limitations. The comScore engineering team tested MapR on a wide variety of Hadoop workloads for over a month. The results of this testing were as follows:

    Performance: Across various benchmarks, the team saw performance gains of 3 to 5 times over the current Hadoop distribution and was convinced comScore could run on substantially less hardware with MapR.

    Availability: MapR protects against cluster failures and data loss with its Distributed NameNode and JobTracker HA. Rolling upgrades are also now possible with MapR.

    Scalability: With architectural changes made possible by the MapR Distributed NameNode HA, MapR will create more files faster, sort more data faster, and produce better streaming and random I/O results than what comScore has seen with the current distribution of Hadoop.

    Ease of Use: comScore's Vice President of Engineering said, "With MapR, things that should just work, just work." This means there is going to be a lot less for comScore to manage with MapR. Will also said they were able to set up the MapR cluster in an hour, and one of the things that struck him was that everything was a data node. This is a much better utilization of hardware from his perspective, since they don't have to dedicate one machine to a specific task, e.g. the NameNode on traditional Hadoop. Easy to install, easy to manage, easy to get data in and out of the cluster.

    MEDIA

    Switching to MapR for Reliable Insights into Global Internet Consumer Behavior

    © 2012 MapR Technologies. All rights reserved. Apache Hadoop and Hadoop are trademarks of the Apache Software Foundation and not affiliated with MapR Technologies.

    MapR Technologies is the creator of the industry's fastest, most dependable and easiest to use distribution for Apache Hadoop. MapR Technologies is dedicated to advancing the Hadoop platform and ecosystem to enable more businesses to harness the power of big data analytics for competitive advantage. For more information, please visit www.mapr.com.

    Quotes from comScore

    "MapR is the perfect distribution for the enterprise."

    "MapR has built in to the design an automated DR strategy."

    "When we started working with MapR within comScore there were a lot of questions about how is this going to work. It's amazing: the developers come back saying we love working with MapR, the admins love working with MapR, the user community prefers working with MapR versus the standard distribution."

    Michael Brown, Chief Technology Officer, comScore, Inc

    Cost Savings with MapR

    MapR's performance advantages enable comScore to achieve faster performance on MapR clusters that are half the size of Cloudera clusters.

                            Other Distribution    MapR Distribution
    # Servers                              200                  100
    Servers (CapEx)                 $1,000,000             $500,000
    Servers (OpEx)                     $50,000              $25,000
    Datacenter                        $830,390             $415,195
    Power                             $311,040             $155,520
    External Storage                  $500,000                   $0
    Total                           $2,691,430           $1,095,715

    Hardware and datacenter savings with MapR (3yr):

    [Bar chart: Other Distribution vs. MapR Distribution; vertical axis from $0 to $3,000,000 in $500,000 steps]

    Assumptions

    Cost per server                           $5,000
    Server lifetime (yr)                      3
    Server OpEx (% of annual CapEx)           5%
    Server peak power (W)                     400
    Datacenter amortization ($/W/mo)          $0.2083
    Datacenter OpEx ($/W/mo)                  $0.0800
    Server Power ($/W/mo)                     $0.0540
    PUE overhead ($/W/mo)                     $0.0540
    Critical data on external storage (TB)    50
    $/TB on SAN (or NAS)                      $10,000
    Minimum NAS cost (NameNode needs it)      $40,000
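
The totals in the cost comparison appear to follow from these assumptions by simple arithmetic. The Python sketch below is one reading of how they combine, not a model published in the brief: the class, function and field names are hypothetical, the 3-year lifetime and 5% server OpEx are inferred from the figures, and the external storage line is assumed to be the larger of the per-TB cost and the minimum NAS cost. Under those assumptions it reproduces the server, power and storage lines, and the datacenter line to within about $100, because the amortization rate is only given to four decimals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostAssumptions:
    """Values from the Assumptions list; field names are illustrative."""
    cost_per_server: float = 5_000.0
    server_lifetime_yr: int = 3            # inferred from the figures
    server_opex_pct: float = 0.05          # share of annual CapEx, inferred
    server_peak_power_w: float = 400.0
    dc_amortization: float = 0.2083        # $/W/mo (rounded in the brief)
    dc_opex: float = 0.0800                # $/W/mo
    server_power: float = 0.0540           # $/W/mo
    pue_overhead: float = 0.0540           # $/W/mo
    critical_data_tb: float = 50.0
    cost_per_tb_san: float = 10_000.0
    min_nas_cost: float = 40_000.0


def three_year_cost(servers, needs_external_storage,
                    a=CostAssumptions(), years=3):
    """Return the cost lines for a cluster (assumed formula, see lead-in)."""
    months = 12 * years
    capex = servers * a.cost_per_server
    # OpEx is taken as a percentage of the annualized CapEx, paid each year.
    opex = capex / a.server_lifetime_yr * a.server_opex_pct * years
    watts = servers * a.server_peak_power_w
    datacenter = watts * (a.dc_amortization + a.dc_opex) * months
    power = watts * (a.server_power + a.pue_overhead) * months
    # Assumption: only the non-MapR cluster needs a SAN/NAS for critical data.
    storage = 0.0
    if needs_external_storage:
        storage = max(a.critical_data_tb * a.cost_per_tb_san, a.min_nas_cost)
    lines = {
        "servers_capex": capex,
        "servers_opex": opex,
        "datacenter": datacenter,
        "power": power,
        "external_storage": storage,
    }
    lines["total"] = sum(lines.values())
    return lines


if __name__ == "__main__":
    other = three_year_cost(servers=200, needs_external_storage=True)
    mapr = three_year_cost(servers=100, needs_external_storage=False)
    for name in other:
        print(f"{name:18s} {other[name]:>14,.0f} {mapr[name]:>14,.0f}")
    print(f"{'savings':18s} {other['total'] - mapr['total']:>14,.0f}")
```

Halving the server count halves every per-server line, and dropping the external storage line accounts for the rest of the difference between the two totals.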

    Summary

    comScore evaluated and selected the MapR Distribution for Apache Hadoop M5 Edition to replace their existing Cloudera distribution. MapR demonstrated superior performance, availability, scalability, and ease of use over comScore's existing Cloudera distribution. MapR also provided significant cost savings over Cloudera.