Matt Asay (@mjasay), VP, Business Development & Strategy, MongoDB: Essential Tools For Your Big Data Arsenal

Your Big Data Arsenal - Strata 2013


DESCRIPTION

Matt Asay presents at Strata 2013 on how NoSQL fits into the Big Data landscape, particularly how MongoDB and Hadoop work well together. Not an infomercial.


Page 1: Your Big Data Arsenal - Strata 2013

Matt Asay (@mjasay), VP, Business Development & Strategy, MongoDB

Essential Tools For Your Big Data Arsenal

Page 2: The Big Data Unknown

Page 3: Top Big Data Challenges?

Translation? Most organizations struggle to know what Big Data is, how to manage it, and who can manage it.

Source: Gartner

Page 4: Understanding Big Data – It’s Not Very “Big”

Source: Big Data Executive Summary (50+ top executives from government and Fortune 500 firms)

• 64%: ingest diverse, new data in real time
• 15%: more than 100 TB of data
• 20%: less than 100 TB (average of all? <20 TB)

Page 5: Innovation As Iteration

Page 6: “I have not failed. I've just found 10,000 ways that won't work.” ― Thomas A. Edison

Page 7: Back in 1970… Cars Were Great!

Page 8: So Were Computers!

Page 9: Lots of Great Innovations Since 1970

Page 10: Including the Relational Database

Page 11: RDBMS Makes Development Hard

Diagram: Application, Object Relational Mapping, and Relational Database layers, with code, XML config, and a DB schema to maintain.

Page 12: And Even Harder To Iterate

Diagram: a table with columns Name, Pet, Phone, and Email, with two new tables and two new columns added over time; caption: “3 months later…”
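Under a simplifying assumption, the difference can be sketched in Python: records are plain dicts standing in for schemaless documents, so a new attribute such as `email` appears only on records written after the change, and readers supply a default for older ones. The sample records and the `contact_line` helper are hypothetical; with a relational table, the equivalent step would be a schema change (adding a column) before any row could carry the new field.

```python
# Hypothetical records written before the change: no "email" field yet.
people = [
    {"name": "Ana", "pet": "cat", "phone": "555-0100"},
    {"name": "Ben", "pet": None, "phone": "555-0101"},
]

# A later release starts storing email; older records are left untouched.
people.append({"name": "Cy", "pet": "dog", "phone": "555-0102",
               "email": "cy@example.com"})


def contact_line(person):
    """Format a record, tolerating fields that older records lack."""
    email = person.get("email", "no email on file")
    return f"{person['name']}: {person['phone']}, {email}"


for p in people:
    print(contact_line(p))
```

The trade-off is that the "schema" moves into reading code such as `contact_line`, which must handle both old and new record shapes.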

Page 13: From Complexity to Simplicity

RDBMS: (diagram)

MongoDB:

{
  _id : ObjectId("4c4ba5e5e8aabf3"),
  employee_name : "Dunham, Justin",
  department : "Marketing",
  title : "Product Manager, Web",
  report_up : "Neray, Graham",
  pay_band : "C",
  benefits : [
    { type : "Health", plan : "PPO Plus" },
    { type : "Dental", plan : "Standard" }
  ]
}
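To make the contrast concrete, here is a minimal Python sketch, assuming a hypothetical normalized layout with `employees` and `benefits` tables (plain lists of dicts standing in for rows). Rebuilding the employee requires a join on a hypothetical `employee_id` key; the document form keeps the benefits embedded, so one lookup returns everything. Field values come from the slide; the table layout and helper names are illustrative assumptions, and no database is involved.

```python
# Hypothetical normalized layout: two "tables" as lists of row dicts.
employees = [
    {"employee_id": 1, "employee_name": "Dunham, Justin",
     "department": "Marketing", "title": "Product Manager, Web",
     "report_up": "Neray, Graham", "pay_band": "C"},
]
benefits = [
    {"employee_id": 1, "type": "Health", "plan": "PPO Plus"},
    {"employee_id": 1, "type": "Dental", "plan": "Standard"},
]


def load_employee_relational(employee_id):
    """Rebuild one employee by joining the two tables on employee_id."""
    row = next(e for e in employees if e["employee_id"] == employee_id)
    result = {k: v for k, v in row.items() if k != "employee_id"}
    result["benefits"] = [
        {"type": b["type"], "plan": b["plan"]}
        for b in benefits
        if b["employee_id"] == employee_id
    ]
    return result


# Document layout: benefits are embedded, so no join is needed.
employee_docs = {
    1: {"employee_name": "Dunham, Justin", "department": "Marketing",
        "title": "Product Manager, Web", "report_up": "Neray, Graham",
        "pay_band": "C",
        "benefits": [{"type": "Health", "plan": "PPO Plus"},
                     {"type": "Dental", "plan": "Standard"}]},
}


def load_employee_document(employee_id):
    """Fetch the whole employee, benefits included, with one key lookup."""
    return employee_docs[employee_id]


# Both layouts describe the same employee.
print(load_employee_relational(1) == load_employee_document(1))
```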

Page 14: So… Use Open Source

Page 15: Big Data != Big Upfront Payment

Page 16: RDBMS Is Expensive To Scale

“Clients can also opt to run zEC12 without a raised datacenter floor -- a first for high-end IBM mainframes.”

IBM press release, 28 Aug. 2012

Page 17: Spoiled for Choice

Rank  DBMS                  Database Model          Score     Change
1     Oracle                Relational DBMS         1583.84    54.23
2     MySQL                 Relational DBMS         1331.34    25.58
3     Microsoft SQL Server  Relational DBMS         1207     -106.78
4     PostgreSQL            Relational DBMS          177.01    -5.22
5     DB2                   Relational DBMS          175.83     3.58
6     MongoDB               NoSQL Document Store     149.48    -2.71
7     Microsoft Access      Relational DBMS          142.49    -4.21
8     SQLite                Relational DBMS           77.88    -4.9
9     Sybase                Relational DBMS           73.66    -1.68
10    Teradata              Relational DBMS           54.41     3.32

Source: DB-Engines.com Database Ranking

Page 18: Remember the Long Tail?

Page 19: It Didn’t Work Out So Well

Page 20: Use Popular, Well-Known Technologies

Source: Silicon Angle, 2012

Page 21: Ask the Right Questions…

“Organizations already have people who know their own data better than mystical data scientists….Learning Hadoop [or MongoDB] is easier than learning the company’s business.”

(Gartner, 2012)

Page 22: Leverage Existing Skills

Page 23: Search as a Sign?

Page 24: When To Use Hadoop, NoSQL

Page 25: Enterprise Big Data Stack

Diagram, from top to bottom:
• Applications: CRM, ERP, Collaboration, Mobile, BI
• Data Management: Online Data (RDBMS); Offline Data (RDBMS, EDW, Hadoop)
• Infrastructure: OS & Virtualization, Compute, Storage, Network
• Alongside all layers: Management & Monitoring; Security & Auditing

Page 26: Consideration – Online vs. Offline

Online
• Real-time
• Low-latency
• High availability

vs.

Offline
• Long-running
• High-latency
• Availability is a lower priority
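One way to picture the split, as a minimal sketch rather than a benchmark: an online request touches one record through an index (here a Python dict keyed by a hypothetical `user_id`), while an offline job scans every record to compute an aggregate. The data and function names are illustrative assumptions.

```python
from collections import Counter

# Hypothetical user records, keyed by user_id for online access.
users = {
    "u1": {"country": "US", "purchases": 3},
    "u2": {"country": "DE", "purchases": 0},
    "u3": {"country": "US", "purchases": 5},
}


def online_lookup(user_id):
    """Online: answer one request from one record via the key index."""
    return users.get(user_id)


def offline_purchases_by_country(records):
    """Offline: scan every record and aggregate; latency matters less."""
    totals = Counter()
    for record in records.values():
        totals[record["country"]] += record["purchases"]
    return dict(totals)


print(online_lookup("u3"))
print(offline_purchases_by_country(users))
```

The online path's cost stays roughly constant as the data grows, while the offline scan grows with the data set, which is why the two workloads tend to land on different systems.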

Page 27: Consideration – Online vs. Offline

(Diagram: Online vs. Offline)

Page 28: Hadoop Is Good for…

• Risk Modeling
• Churn Analysis
• Recommendation Engine
• Ad Targeting
• Transaction Analysis
• Trade Surveillance
• Network Failure Prediction
• Search Quality
• Data Lake

Page 29: MongoDB/NoSQL Is Good for…

• 360° View of the Customer
• Mobile & Social Apps
• Fraud Detection
• User Data Management
• Content Management & Delivery
• Reference Data
• Product Catalogs
• Machine to Machine Apps
• Data Hub

Page 30: How To Use The Two Together?

Page 31: Finding Waldo

Page 32: Customer Example: Online Travel

Travel
• Flights, hotels and cars
• Real-time offers
• User profiles, reviews
• User metadata (previous purchases, clicks, views)

Algorithms
• User segmentation
• Offer recommendation engine
• Ad serving engine
• Bundling engine

MongoDB Connector for Hadoop

Page 33: Predictive Analytics

Government
• Predictive analytics system for crime and health issues
• Diverse, unstructured (incl. geospatial) data from 30+ agencies
• Correlate data in real time

Algorithms
• Long-form trend analysis
• MongoDB data dumped into Hadoop, analyzed, and re-inserted into MongoDB for better real-time response

MongoDB + Hadoop
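The round trip described above (operational data copied out, analyzed in batch, results written back for real-time use) can be sketched in plain Python. This is an assumption-laden illustration, not the MongoDB Connector for Hadoop: the `export_snapshot`, `batch_score`, and `write_back` helpers, the per-area `incidents` field, and the rule of flagging areas above the overall mean are all hypothetical.

```python
import copy

# Hypothetical operational store: one document per area (the online side).
live_docs = {
    "area-1": {"incidents": [4, 6, 5]},
    "area-2": {"incidents": [1, 0, 2]},
    "area-3": {"incidents": [9, 8, 10]},
}


def export_snapshot(docs):
    """Copy the live data out so batch work does not touch the live store."""
    return copy.deepcopy(docs)


def batch_score(snapshot):
    """Offline pass: flag areas whose mean count exceeds the overall mean."""
    means = {k: sum(d["incidents"]) / len(d["incidents"])
             for k, d in snapshot.items()}
    overall = sum(means.values()) / len(means)
    return {k: ("high" if m > overall else "normal") for k, m in means.items()}


def write_back(docs, scores):
    """Re-insert the results so online queries can read them directly."""
    for key, risk in scores.items():
        docs[key]["risk"] = risk


write_back(live_docs, batch_score(export_snapshot(live_docs)))
print({k: d["risk"] for k, d in live_docs.items()})
```

The design point is that the expensive analysis runs on a copy, while the live documents only gain a precomputed field that a real-time request can read in one lookup.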

Page 34: Data Hub

Insurance
• Insurance policies
• Demographic data
• Customer web data
• Call center data
• Real-time churn detection

Churn Analysis
• Customer action analysis
• Churn prediction algorithms

MongoDB Connector for Hadoop
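A data hub, as the insurance example uses the term, gathers records from several sources into one view per customer. A minimal Python sketch, assuming each source is a list of dicts sharing a hypothetical `customer_id` field; the source names, fields, and values are illustrative. Each source becomes one sub-document in the customer's document, and a later row from the same source replaces an earlier one.

```python
from collections import defaultdict

# Hypothetical source feeds, each row tagged with a customer_id.
sources = {
    "policies": [{"customer_id": 7, "policy": "Auto", "premium": 900}],
    "web": [{"customer_id": 7, "last_visit": "2013-02-01"},
            {"customer_id": 8, "last_visit": "2013-02-03"}],
    "call_center": [{"customer_id": 7, "calls": 2}],
}


def build_hub(feeds):
    """Merge every feed into one document per customer, one key per source."""
    hub = defaultdict(dict)
    for source_name, rows in feeds.items():
        for row in rows:
            fields = {k: v for k, v in row.items() if k != "customer_id"}
            hub[row["customer_id"]][source_name] = fields
    return dict(hub)


hub = build_hub(sources)
print(hub[7])
```

Customers missing from a source simply lack that key, so the merged documents need not share one fixed shape.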

Page 35: Machine Learning

Ad-Serving
• Catalogs and products
• User profiles
• Clicks
• Views
• Transactions

Algorithms
• User segmentation
• Recommendation engine
• Prediction engine

MongoDB Connector for Hadoop

Page 36: MongoDB + Hadoop Connector

• Makes MongoDB a Hadoop-enabled file system
• Reads and writes live data in place
• Copies data between Hadoop and MongoDB
• Full support for data processing:
  – Hive
  – MapReduce
  – Pig
  – Streaming
  – EMR

MongoDB Connector for Hadoop
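MapReduce is one of the processing models listed. The pure-Python simulation below, assuming documents shaped like the employee example, walks through the map, shuffle, and reduce phases to count employees per department. It only illustrates the programming model; it does not use the connector's or Hadoop's APIs, and apart from the first document the sample data is hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input documents (only the fields the job needs).
docs = [
    {"employee_name": "Dunham, Justin", "department": "Marketing"},
    {"employee_name": "Doe, Jane", "department": "Engineering"},
    {"employee_name": "Roe, Rick", "department": "Marketing"},
]


def map_phase(doc):
    """Emit (key, value) pairs for one document."""
    yield doc["department"], 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key."""
    return key, sum(values)


def run_job(documents):
    pairs = chain.from_iterable(map_phase(d) for d in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(run_job(docs))
```

In a real cluster the map and reduce calls run in parallel on different machines; keeping them as pure functions of their inputs is what makes that distribution possible.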

Page 37: @mjasay