
The Top 5 Factors to Consider When Choosing a Big Data Solution


©2012 DataStax 1

Top 5 Factors to Consider When Choosing a Big Data Solution

Robin Schumacher, VP Products


• VP Products, DataStax
• Director of Product Management at MySQL, then EnterpriseDB
• VP Product Management at Embarcadero Technologies
• DBA with Oracle, Teradata, SQL Server, DB2, and others
• Database software reviewer for various magazines
• Author of 3 database books


Overview of DataStax
• Founded in April 2010
• Commercial leader in Apache Cassandra™, the popular open-source “big data” database
• 140+ customers
• 40+ employees
• Home to the Apache Cassandra Chair and most committers
• Headquartered in the San Francisco Bay Area
• Funded by prominent venture firms


• Define big data
• Identify the “must-haves” of a big data solution
• Discuss the difficulty of getting all of them, from a business and technical perspective
• Brief tour of NoSQL, Cassandra, and DataStax Enterprise


What big data is and the domains of data that need to be considered.


“Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.”

“Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”

“Datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.”

* All definitions have one thing in common: new technology is needed for big data…


1. Real-time – transactional, online, streaming, low-latency data
2. Analytic – aggregated data from real-time feeds or other sources; often batch in nature
3. Search – supporting data, both external and internal, used for locating desired information and/or objects (e.g., products, documents)


Research by McKinsey & Company shows eye-opening differences in 10-year category growth rates between businesses that use their big data smartly and those that do not.


What are the top five things to consider in a big data solution?


The characteristics that define big data are:
1. Velocity – includes the speed at which data comes in and the number of events/elements being stored
2. Variety – involves structured, semi-structured, and unstructured data
3. Volume – can range from terabytes (TB) to petabytes (PB) of data
4. Complexity – typically entails the difficulty of distributing the data (e.g., across multiple data centers or the cloud) and managing data traffic/movement (e.g., ETL, migrations)


• Data has a high rate of input
• Data has a large quantity of elements/events
• Sensor data
• Media streaming
• Mobile devices
• Financial streams
• Web clickstream
• Traffic monitoring
• Patient care


• Includes structured, semi-structured, and unstructured data
• Necessitates new data models and file formats
• Involves real-time, analytic, and search data


• TBs to PBs
• Also involves data maintenance functions (e.g., purging)


The McKinsey report found that the average investment firm with fewer than 1,000 employees has 3.8 petabytes of data stored, experiences a data growth rate of 40 percent per year, and stores structured, semi-structured, and unstructured data. Overall, McKinsey found that 15 out of 17 industry sectors in the United States have more data stored per company than the U.S. Library of Congress (which had 235 terabytes of information at the time of McKinsey’s study).
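The growth rate quoted above compounds quickly. As a rough illustration, the Python sketch below projects the quoted 3.8 petabytes forward at 40 percent per year. The function name and the five-year horizon are assumptions made for this example and do not come from the McKinsey report.

```python
def projected_petabytes(start_pb, annual_growth, years):
    """Compound growth: stored volume after `years` at a fixed annual rate."""
    return start_pb * (1.0 + annual_growth) ** years


# Figures quoted on the slide: 3.8 PB stored, growing 40 percent per year.
# The five-year horizon is an arbitrary choice for illustration.
for year in range(6):
    size = projected_petabytes(3.8, 0.40, year)
    print(f"year {year}: {size:.1f} PB")
```

At 40 percent per year the stored volume nearly doubles every two years (1.4 × 1.4 = 1.96), so a platform sized for today's volume can be outgrown within a few years.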


• Typically involves data distribution, movement, etc., across multiple data centers and geographies
• Can be on-premises, cloud, or hybrid


Getting a big data technology that provides two out of three can be challenging; finding one that supplies all three can be very hard.


NoSQL, Cassandra, and DataStax Enterprise for big data.


NoSQL is a broad class of next-generation database management systems that differ from the classic relational database management system (RDBMS) model in some significant ways. Most importantly, they:
• Sport a less rigid, more dynamic data model
• Aim to give users control over the trade-offs described by the CAP theorem
• Do not support ANSI SQL or operations such as joins
• Attempt to solve some or all of the challenges of big data


A NoSQL solution like Apache Cassandra:
• Handles high-velocity data with ease
• Uses a schema that supports broad varieties of data
• Scales from GBs to PBs with linear performance capabilities
• Is built to handle multi-location/data center use cases
• Is designed for continuous availability
• Offers quick installation and configuration for multi-node clusters
• Is open source and/or costs 80-90% less than RDBMSs


* Uses Cassandra and Hadoop for data management


YCSB Benchmark
Source: http://blog.cubrid.org/dev-platform/nosql-benchmarking/?utm_source=NoSQL+Weekly+List&utm_campaign=143fae86b2-NoSQL_Weekly_Issue_41_September_8_2011&utm_medium=email
Cassandra is:
• Nearly 4x better in writes
• Nearly 2x better in reads
• Over 12x better in reads/updates


“Cassandra was just a better design all around – more truly horizontally scalable and with less management overhead – and there’s no single point of failure. I looked at Cassandra’s architecture and thought, ‘Yeah, that’s how you do it.’”
- Matt Conway, VP of Engineering


“The hundreds of millions of web pages that contain this information are stored in a multi-terabyte cache that grows continually as we crawl the web, analyzing new pages and finding new versions of existing pages.” – Zoominfo Architect on using Cassandra


“I can create a Cassandra cluster in any region of the world in 10 minutes. When marketing guys decide we want to move into a certain part of the world, we’re ready.” - Netflix architect


• Fully integrated smart big data platform
• Production-certified Cassandra
• Continuously available analytics with Hadoop
• Scalable enterprise search with Solr
• Built-in workload isolation
• No costly and error-prone ETL operations
• Easy migration of RDBMS and log data
• Simple to install and grow
• OpsCenter management solution
• 80-90% less cost than RDBMS vendors


DataStax Enterprise Server: No ETL and Built-in Workload Isolation
• Data written to any node is automatically and transparently written to all other nodes.
• Mixed workload management is automatic; real-time, analytic, and search workloads/nodes do not compete with one another for compute or data resources.
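The two points above can be pictured with a toy model. The Python sketch below is an illustration only: the `ToyCluster` class, its method names, and the three group names are assumptions made for this example, and the slides do not describe how DataStax Enterprise implements replication or isolation internally. It shows a single application write becoming visible to every workload group without an ETL step, while each read is served only by the group dedicated to that workload.

```python
from collections import defaultdict


class ToyCluster:
    """Toy model (hypothetical): one write is copied to every workload group,
    and each read is served only by the group that matches its workload."""

    def __init__(self, groups):
        # One in-memory store per workload group of nodes.
        self.stores = {name: {} for name in groups}
        # Count reads per group to show that workloads do not share nodes.
        self.reads_served = defaultdict(int)

    def write(self, key, value):
        # The application writes once; copying to each group is automatic.
        for store in self.stores.values():
            store[key] = value

    def read(self, workload, key):
        # Only the nodes dedicated to this workload do the work.
        self.reads_served[workload] += 1
        return self.stores[workload].get(key)


cluster = ToyCluster(["realtime", "analytics", "search"])
cluster.write("user:42", {"name": "Ada", "clicks": 7})
print(cluster.read("analytics", "user:42"))
print(dict(cluster.reads_served))
```

In this model an analytics read never touches the real-time group's store, which is the isolation property the slide claims; a real cluster would also have to handle replication lag, consistency levels, and node failures, all of which the sketch ignores.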

[Diagram labels: ETL; Staff / Processes]


DataStax Enterprise Server: Multi-Data Center and Cloud Capable
• Built-in capabilities to maintain the same database cluster across many different data centers
• Easily supports on-premises data center and cloud deployment models
[Diagram: Data Center 1 and Data Center 2]


• DataStax OpsCenter is a visual management and monitoring solution for DataStax Enterprise
• Manage and monitor all Cassandra, Hadoop, and Solr operations
• Visual alerts and notifications


1. Does it handle high data velocity?
2. Can it tackle all types of data?
3. How well does it perform with large data volumes?
4. Can it handle complex distribution and implementation use cases (e.g., on-premises/cloud, multi-geo)?
5. How does it stack up in hitting the big data “bull's-eye” (i.e., cost, scalable performance, and operational ease)?
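One simple way to apply the five questions is to rate each candidate on every factor and compare the averages. The Python sketch below is a hypothetical scorecard: the factor keys, the 0-to-5 scale, the equal weighting, and the sample ratings are all assumptions made for illustration and are not part of the presentation.

```python
# Hypothetical keys, one per question in the list above.
FACTORS = ["velocity", "variety", "volume", "complexity", "bulls_eye"]


def score(ratings):
    """Average of 0-5 ratings across the five factors.

    Raises KeyError if a factor is missing, so no question can be skipped.
    """
    return sum(ratings[factor] for factor in FACTORS) / len(FACTORS)


# Made-up ratings for an imaginary candidate solution.
candidate = {
    "velocity": 5,
    "variety": 4,
    "volume": 4,
    "complexity": 3,
    "bulls_eye": 4,
}
print(score(candidate))  # 4.0
```

Equal weights are the simplest choice; a team that cares most about, say, multi-data center deployment could weight that factor more heavily.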


DataStax Enterprise is tailor-made for high-velocity, multi-variety, large-volume, and complex deployment use cases that involve big data.


Recommended Reading

http://www.datastax.com/resources/whitepapers


Next Steps
Download DataStax Enterprise and try it in your own environment.
• Go to www.datastax.com/download
• Download a copy of DataStax Enterprise
• Installs and configures in minutes
• Completely free for development use


For More Information


Move Faster.