Clustrix Big Data Podcast


DESCRIPTION

In this slidecast, Robin Purohit of Clustrix describes the company's leading scale-out SQL database engineered for the cloud. "Clustrix provides the scale, flexibility, simplicity, availability, and raw power that have given both enterprise and fast-growth organizations the ability to innovate faster -- and drive those innovations to market sooner than their competition. As the most mature of the primary databases, Clustrix is the leading scale-out SQL database engineered for the cloud. With Clustrix, organizations can scale transactions, run real-time analytics, and simplify operations." Learn more: http://www.clustrix.com Watch the presentation video: http://inside-bigdata.com/2013/09/06/clustrix-scaleout-sql-database-engineered-cloud/


The Leading Scale-out SQL Database Engineered for the Cloud

Robin Purohit

CEO and President

SCALE-OUT DATABASES ARE THE RIGHT APPROACH

UNLESS YOU HAVE UNLIMITED MONEY TO SPEND

NoSQL NewSQL Hadoop

FOR HYPER-SCALE WEB AND MOBILE APPLICATIONS

Cloud Makes It Possible to Do This Quickly and Pay-as-you-go

Great Idea Billions of Transactions and Rows

Smarter Application

Ad Hoc Reporting

SCALE-OUT SQL DATABASE FOR OPERATIONAL DATA

MASSIVE TRANSACTION VOLUME

REAL-TIME ANALYTICS

ACID, SQL AND MYSQL

SELF-MANAGING

BUILT-IN INSTRUMENTATION

SCALE-OUT SQL

Add nodes as demand grows

Automated recovery on failure

OPERATIONAL DATABASE

E-commerce

EXAMPLE APPLICATION SEGMENTS

BATTLE-TESTED LESSONS

Consumer Web Advertising Analytics

BUSTING THE MYTH - SQL CAN SCALE

• 20 million+ users / 70,000+ TPS

• Write-heavy workload; 1TB+ writes / day

Massive Transaction Scale Real-Time Analytics

MIXED WORKLOADS

IF YOU DON’T BELIEVE US – BELIEVE GOOGLE

F1 Based on “SPANNER” for AdWords

http://www.theregister.co.uk/2013/08/30/google_f1_deepdive/

“100s of applications on over 100TB serving up 100s of thousands of requests per second + SQL queries that scan tens of trillions of data rows a day”

HOW TO CHOOSE THE RIGHT TOOL FOR THE JOB?

E-COMMERCE EXAMPLE (SQL NORMALIZATION + JOIN = GOOD)

Customers (many)

Products (many or few, and may require flexibility)

Orders (many)

Reviews (many)

Problem is naturally relational - Orders, Reviews are for products by customers

What questions do you have?

• Do you want to know all reviews for a product along with the customer who wrote each one (Product X Review X Customer)?

• What about the most popular products in San Francisco, or the last 10 orders by a customer?

What flexibility do you need?

• Maybe all products have different attributes
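The questions on this slide map directly onto joins over a normalized schema. The following is a minimal sketch, not Clustrix-specific code: it uses Python's built-in sqlite3 module as a stand-in for any SQL database, and the table names, column names, and sample rows are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical normalized schema and sample data (illustrative names only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders    (id INTEGER PRIMARY KEY,
                        customer_id INTEGER REFERENCES customers(id),
                        product_id  INTEGER REFERENCES products(id),
                        ordered_at  TEXT);
CREATE TABLE reviews   (id INTEGER PRIMARY KEY,
                        customer_id INTEGER REFERENCES customers(id),
                        product_id  INTEGER REFERENCES products(id),
                        body TEXT);
INSERT INTO customers VALUES
  (1, 'Ana', 'San Francisco'), (2, 'Ben', 'San Francisco'), (3, 'Cy', 'Austin');
INSERT INTO products VALUES (1, 'Lamp'), (2, 'Desk');
INSERT INTO orders VALUES
  (1, 1, 1, '2013-09-01'), (2, 2, 1, '2013-09-02'),
  (3, 1, 2, '2013-09-03'), (4, 3, 2, '2013-09-04');
INSERT INTO reviews VALUES (1, 1, 1, 'Bright'), (2, 3, 1, 'Too small');
""")

def reviews_for_product(product_name):
    """All reviews for a product with the reviewing customer (Product X Review X Customer)."""
    return conn.execute("""
        SELECT c.name, r.body
        FROM reviews r
        JOIN products  p ON p.id = r.product_id
        JOIN customers c ON c.id = r.customer_id
        WHERE p.name = ?
        ORDER BY r.id
    """, (product_name,)).fetchall()

def popular_products_in(city):
    """Products ranked by number of orders placed by customers in a city."""
    return conn.execute("""
        SELECT p.name, COUNT(*) AS n
        FROM orders o
        JOIN customers c ON c.id = o.customer_id
        JOIN products  p ON p.id = o.product_id
        WHERE c.city = ?
        GROUP BY p.id, p.name
        ORDER BY n DESC, p.name
    """, (city,)).fetchall()

def last_orders(customer_name, limit=10):
    """Most recent orders for one customer, newest first."""
    return conn.execute("""
        SELECT o.id, p.name, o.ordered_at
        FROM orders o
        JOIN customers c ON c.id = o.customer_id
        JOIN products  p ON p.id = o.product_id
        WHERE c.name = ?
        ORDER BY o.ordered_at DESC
        LIMIT ?
    """, (customer_name, limit)).fetchall()
```

Each question is a single declarative query. In a key-value or document store the same joins would typically have to be written in application code or avoided by denormalizing the data.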

WHAT DATA and WHAT QUESTIONS?

How SIMPLY do the QUESTIONS need to be answered?

MAP REDUCE OR SQL?

And how many lines of code?
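To make the "how many lines of code" question concrete, here is a rough sketch that answers one question (total quantity ordered per product) twice: once with hand-written map, shuffle, and reduce steps, and once with a single SQL statement. The in-process map-reduce is a toy stand-in for a framework such as Hadoop, sqlite3 stands in for a SQL database, and the sample data and function names are assumptions for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (product, quantity) pairs.
orders = [("Lamp", 2), ("Desk", 1), ("Lamp", 3), ("Chair", 4)]

# --- Map-reduce style: three explicit phases written by hand ---
def map_phase(records):
    # Emit one (key, value) pair per input record.
    for product, qty in records:
        yield product, qty

def shuffle(pairs):
    # Group all values by key, as a framework would do between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Collapse each key's values into a single total.
    return {key: sum(values) for key, values in groups.items()}

mapreduce_totals = reduce_phase(shuffle(map_phase(orders)))

# --- SQL style: the same question as one declarative statement ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (product TEXT, qty INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
sql_totals = dict(
    conn.execute("SELECT product, SUM(qty) FROM orders GROUP BY product")
)
```

Both approaches produce the same totals. The difference the slide points at is who writes the grouping and aggregation logic: the programmer in the map-reduce case, the database in the SQL case.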

WHEN do you want the QUESTIONS answered?

How COMPLEX is the Question?

NoSQL: Key-Value, Document

NewSQL: e.g. Clustrix

Warehousing Analytics: Hadoop, Vertica, Redshift

Query Complexity

In Memory Analytics

Reads and Writes Real-Time Analytics Batch Analytics

milliseconds seconds minutes hours

ETL

Hadoop Key-Value

SQL Warehousing

Vertica

SIZE and FLEXIBILITY and QUERIES

SIZE FLEXIBILITY

NewSQL 10s of TBs

100s of TBs

Petabytes Key-Value Hadoop

Document / Tabular

Relational Schema, Online schema changes

Schema-less

NEWSQL

Rows with different columns

QUERY ABILITY

Simple lookup

Indexed lookup

Joins and complex Analytics

With flexibility, you lose the sophisticated SQL query optimizer

RIGHT TOOL FOR THE JOB

NoSQL NewSQL Hadoop Columnar

OPERATIONAL DATA BATCH ANALYSIS

With a Lot More SQL

Clustrix Technical Resources

docs.clustrix.com
