©2012 DataStax 1
Top 5 Factors to Consider When
Choosing a Big Data Solution
Robin Schumacher, VP Products
• VP Products, DataStax
• Director of Product Management MySQL, then
EnterpriseDB
• VP Product Management at Embarcadero
Technologies
• DBA with Oracle, Teradata, SQL Server, DB2,
others…
• Database software reviewer for various magazines
• Author of 3 database books
Overview of DataStax
• Founded in April 2010
• Commercial leader in Apache Cassandra™, the
popular open-source “big data” database
• 140+ customers
• 40+ employees
• Home to Apache Cassandra Chair & most
committers
• Headquartered in the San Francisco Bay Area
• Funded by prominent venture firms
• Define big data
• Identify the “must-haves” of a big data
solution
• Discuss difficulty in getting all of them
from a business and technical
perspective
• Brief tour of NoSQL, Cassandra and
DataStax Enterprise
What big data is and the
domains of data that need to be
considered.
“Big data technologies describe a new generation of technologies and
architectures, designed to economically extract value from very large
volumes of a wide variety of data, by enabling high-velocity capture, discovery,
and/or analysis.”
“Big data is data that exceeds the processing capacity of conventional
database systems. The data is too big, moves too fast, or doesn’t fit the
strictures of your database architectures. To gain value from this data, you
must choose an alternative way to process it.”
“Datasets whose size is beyond the ability of typical database software
tools to capture, store, manage, and analyze.”
* All definitions have one thing in common: new technology is needed for big
data…
1. Real-time – transactional, online, streaming, low
latency data
2. Analytic – aggregated data from real-time feeds or
other sources; often batch in nature
3. Search – supporting data, both external and
internal, used for locating desired information and/or
objects (e.g. products, documents, etc.)
Research done by McKinsey & Company shows the eye-opening,
10-year category growth rate differences between businesses that
smartly use their big data and those that do not.
What are the top five things to
consider in a big data solution?
The characteristics that define big data are:
1. Velocity – includes the speed at which data comes in,
and the number of events/elements being stored
2. Variety – involves structured, semi-structured, and
unstructured data
3. Volume – can amount to terabytes or petabytes of data
4. Complexity – typically entails the difficulty of distributing
the data (e.g. multiple data centers, cloud) and
managing the data traffic/movement (e.g. ETL,
migrations)
• Data has high rate of input
• Data has large quantity of elements/events
• Sensor data
• Media streaming
• Mobile devices
• Financial streams
• Web clickstream
• Traffic monitoring
• Patient care
• Includes structured, semi-structured, and unstructured data
• Necessitates new data models and file formats
• Involves real-time, analytic, and search data
• Terabytes to petabytes of data
• Also involves data maintenance functions
(e.g. purging)
The McKinsey report found that the average investment firm with fewer than 1,000
employees has 3.8 petabytes of data stored, experiences a data growth rate of 40 percent
per year, and stores structured, semi-structured, and unstructured data. Overall, McKinsey
found that 15 out of 17 industry sectors in the United States have more data stored per
company than the U.S. Library of Congress (which had 235 terabytes of information at the
time of McKinsey’s study).
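To make the 40 percent annual growth figure concrete, the short sketch below compounds the 3.8 petabyte starting point quoted above. It assumes the growth rate stays constant from year to year, which the report does not claim, and the function names are illustrative only.

```python
import math


def projected_size_pb(start_pb: float, annual_growth: float, years: int) -> float:
    """Compound a starting data size (in PB) at a fixed annual growth rate."""
    return start_pb * (1 + annual_growth) ** years


def doubling_time_years(annual_growth: float) -> float:
    """Years needed for the stored data to double at a fixed annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    # Assumption: 3.8 PB today, growing a constant 40% per year.
    for year in range(6):
        print(f"year {year}: {projected_size_pb(3.8, 0.40, year):.1f} PB")
    print(f"doubling time: {doubling_time_years(0.40):.2f} years")
```

Under that assumption the data roughly doubles every two years and is more than five times its starting size after five years.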
• Typically involves data distribution,
movement, etc., across multiple data centers
and geographies
• Can be on-premise, cloud, or hybrid
Getting a big data technology that provides two out of three (low cost, scalable
performance, and operational ease) can be challenging; finding one that supplies
all three can be very hard.
NoSQL, Cassandra, and
DataStax Enterprise for big data.
NoSQL is a broad class of next-generation database management
systems that differ from the classic model of the relational database
management system (RDBMS) in some significant ways. Most
importantly, they:
• Sport a less rigid, more dynamic data model
• Look to provide user-controlled trade-offs under the CAP theorem
• Do not support ANSI SQL or operations such as joins
• Attempt to solve some or all of the challenges of big data
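One common form of the user-controlled CAP trade-off in Dynamo-style stores such as Cassandra is tunable consistency: with N replicas, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N. The sketch below only checks that arithmetic. The function and parameter names are hypothetical, and it does not call any database API.

```python
def quorum(replicas: int) -> int:
    """Smallest majority of a replica set (e.g. 2 of 3, 3 of 5)."""
    return replicas // 2 + 1


def read_overlaps_write(replicas: int, write_acks: int, read_replicas: int) -> bool:
    """True when every read set must share a replica with every acknowledged write set."""
    return read_replicas + write_acks > replicas


if __name__ == "__main__":
    n = 3  # assumed replication factor, for illustration only
    settings = {
        "write ONE, read ONE": (1, 1),
        "write QUORUM, read QUORUM": (quorum(n), quorum(n)),
        "write ALL, read ONE": (n, 1),
    }
    for name, (w, r) in settings.items():
        outcome = "overlap guaranteed" if read_overlaps_write(n, w, r) else "stale reads possible"
        print(f"{name}: {outcome}")
```

Lower values of R and W favor latency and availability; higher values favor consistency, which is the trade-off the user gets to choose per operation.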
A NoSQL solution like Apache Cassandra:
• Handles high velocity data with ease
• Uses schemas that support a broad variety of data
• Scales from gigabytes to petabytes with linear performance
scalability
• Is built to handle multi-location/data center use cases
• Is designed for continuous availability
• Offers quick installation and configuration for multi-node
clusters
• Is open source and/or costs 80-90% less than RDBMSs
* Uses Cassandra and Hadoop for data management
YCSB Benchmark
Source: http://blog.cubrid.org/dev-platform/nosql-benchmarking/?utm_source=NoSQL+Weekly+List&utm_campaign=143fae86b2-NoSQL_Weekly_Issue_41_September_8_2011&utm_medium=email
Cassandra is:
• Nearly 4x better in writes
• Nearly 2x better in reads
• Over 12x better in reads/updates
“Cassandra was just a better design all around – more truly horizontally scalable
and with less management overhead – and there’s no single point of failure. I
looked at Cassandra’s architecture and thought, ‘Yeah, that’s how you do it.’”
- Matt Conway, VP of Engineering
“The hundreds of millions of web pages that contain this information
are stored in a multi-terabyte cache that grows continually as we
crawl the web, analyzing new pages and finding new versions of
existing pages.” – Zoominfo Architect on using Cassandra
“I can create a Cassandra cluster in any region of the world in 10
minutes. When marketing guys decide we want to move into a
certain part of the world, we’re ready.” - Netflix architect
• Fully integrated smart big data platform
• Production-certified Cassandra
• Continuously available analytics with Hadoop
• Scalable enterprise search with Solr
• Built-in workload isolation
• No costly and error-prone ETL operations
• Easy migration of RDBMS and log data
• Simple to install and grow
• OpsCenter management solution
• 80-90% less cost than RDBMS vendors
DataStax Enterprise Server: No ETL and Built-in Workload Isolation
• Data written to any node is automatically and transparently written to all other
nodes.
• Mixed workload management is automatic; real-time, analytic, and search
workloads/nodes do not compete for compute or data resources with other
nodes.
DataStax Enterprise Server: Multi-Data Center and Cloud Capable
• Built-in capabilities to maintain the same database cluster across many
different data centers
• Easily supports both on-premise data center and cloud deployment models
• DataStax OpsCenter is a visual management and monitoring
solution for DataStax Enterprise
• Manage and monitor all Cassandra, Hadoop, and Solr
operations
• Visual alerts and notifications
1. Does it handle high data velocity?
2. Can it tackle all types of data?
3. How well does it perform with large data volumes?
4. Can it handle complex distribution and
implementation use cases (e.g. on-premise/cloud,
multi-geo)?
5. How does it stack up in hitting the big data “bull’s-eye”
(i.e. cost, scalable performance, and
operational ease)?
DataStax Enterprise is tailor-made for high-velocity, multi-variety,
large-volume, and complex deployment use cases that involve big
data.
Recommended Reading
http://www.datastax.com/resources/whitepapers
Next Steps
Download DataStax Enterprise and try it in your own
environment.
• Go to www.datastax.com/download
• Download a copy of DataStax Enterprise
• Installs and configures in minutes
• Completely free for development use
For More Information
Move Faster.