The Hong Kong Big Data community had a guest speaker at our Tuesday, 18 February meeting. Chris Yuen from Demyst Data discussed his experience with three NoSQL solutions: Cassandra, MongoDB, and HBase. For more information see http://www.infoincog.com/hong-kong-big-data-meeting-tuesday-18-february/.
Real World NoSQL x Big Data
Overview
Introduction
  Motivation for NoSQL
  The NoSQL landscape
Experience sharing
  HBase
  MongoDB
  Cassandra
Tying it up: how does it really matter
Motivation
Too much data: the need to “scale out”
CAP theorem
Performance
  RDBMS joining is slow
  Denormalization
Key value data store
Alternative data representation
  Schemaless “No SQL”
Document data store
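
To make the denormalization point concrete, here is a minimal Python sketch. All names and data are hypothetical and not from the talk. It contrasts a normalized layout, where reading one user's orders means joining two tables, with a denormalized key-value layout where the same read is a single lookup.

```python
# Hypothetical data, for illustration only.

# Normalized: two "tables"; reading a user's orders requires a join.
users = [
    {"user_id": 1, "name": "alice"},
    {"user_id": 2, "name": "bob"},
]
orders = [
    {"order_id": 10, "user_id": 1, "item": "book"},
    {"order_id": 11, "user_id": 1, "item": "pen"},
    {"order_id": 12, "user_id": 2, "item": "lamp"},
]


def orders_by_join(name):
    """Nested-loop join: scan users, then scan orders for each match."""
    return [
        o["item"]
        for u in users
        if u["name"] == name
        for o in orders
        if o["user_id"] == u["user_id"]
    ]


# Denormalized: one key-value record per user, with the orders embedded.
# The join cost is paid once at write time instead of on every read.
kv_store = {}
for u in users:
    kv_store[u["name"]] = {
        "user_id": u["user_id"],
        "orders": [o["item"] for o in orders if o["user_id"] == u["user_id"]],
    }


def orders_by_key(name):
    """Single key lookup; no join at read time."""
    return kv_store[name]["orders"]
```

The trade-off in this sketch is that the embedded copy has to be kept in sync whenever an order changes, which is the price of the faster read.
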
HBase
Builds on top of HDFS
Consistent “big-data” database
Automatically scales out
HBase
… but we didn’t use it in the end
HBase
A nightmare to set up and maintain
Depends on Hadoop, HDFS, ZooKeeper
No secondary index
“Table” alteration requires downtime
Not spectacular latency for OLTP usage
MongoDB
De facto “big-data” “NoSQL” database
Document based data representation
MongoDB
A good balance of “traditional” usage and “NoSQL” usage
  Supports secondary index
  Range query
  Can do table scan
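
The three operations named on this slide can be sketched in plain Python. This is a toy in-memory model under stated assumptions, not MongoDB's implementation or API: the documents, the `age` field, and the function names are all hypothetical. A secondary index is modelled as a sorted list of (value, id) pairs, which is what makes a range query cheap; a table scan is the fallback that inspects every document.

```python
import bisect

# Hypothetical documents keyed by primary key "_id".
docs = {
    "a": {"_id": "a", "age": 31},
    "b": {"_id": "b", "age": 25},
    "c": {"_id": "c", "age": 40},
    "d": {"_id": "d", "age": 25},
}

# Secondary index on "age": a sorted list of (age, _id) pairs.
age_index = sorted((d["age"], d["_id"]) for d in docs.values())


def range_query(lo, hi):
    """Return ids of documents with lo <= age <= hi, using the index."""
    # "" sorts before every id; chr(0x10FFFF) sorts after every id.
    start = bisect.bisect_left(age_index, (lo, ""))
    end = bisect.bisect_right(age_index, (hi, chr(0x10FFFF)))
    return [doc_id for _, doc_id in age_index[start:end]]


def table_scan(predicate):
    """Fallback when no index applies: inspect every document."""
    return sorted(d["_id"] for d in docs.values() if predicate(d))
```

A strictly key/value store offers only the lookup by `_id`; the index and the scan are the "traditional" capabilities the slide refers to.
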
MongoDB
“Big-data” features: sharding, replica set
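
As a rough illustration of what sharding means, the following Python sketch spreads keys over a fixed number of in-memory shards with a stable hash. It is a generic hash-partitioning example under stated assumptions, not how MongoDB actually routes documents; `NUM_SHARDS`, `shard_for`, `put`, and `get` are hypothetical names.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a key to a shard with a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Each shard is modelled as a separate dict standing in for one machine.
shards = [dict() for _ in range(NUM_SHARDS)]


def put(key, value):
    """Write the value to the one shard that owns this key."""
    shards[shard_for(key)][key] = value


def get(key):
    """Read from the one shard that owns this key."""
    return shards[shard_for(key)].get(key)
```

With plain modulo placement like this, changing the number of shards remaps most keys, which hints at why operating a sharded cluster takes care.
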
MongoDB
… but it got ugly pretty fast
Devil’s in the details
  Replica set management fiasco
  Sharding is difficult to set up and poorly implemented
https://github.com/kizzx2/mongolab
MongoDB
Reality: it doesn’t scale beyond one machine
Replica set
Cassandra
Column Family data store
More “NoSQL” than MongoDB. Fewer features
Column data store – strictly key/value query
Cassandra
Auto-sharding just works
Replica set requires 0 configuration
Append only, LSM-tree based storage format
  Good for SSD
  High insert throughput
For storing analytic data
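
The append-only, LSM-tree idea can be sketched as a toy Python class. This is a simplified illustration under stated assumptions, not Cassandra's storage engine: `TinyLSM`, `memtable_limit`, and the in-memory "segments" are hypothetical stand-ins for on-disk files. Writes go to an in-memory table that is periodically flushed as a new sorted segment; existing segments are never modified in place, which is why inserts stay cheap.

```python
class TinyLSM:
    """Toy LSM-style store: in-memory memtable plus immutable sorted segments."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}
        self.segments = []  # oldest first; each is a sorted list of (key, value)

    def put(self, key, value):
        # Writes only touch the memtable; existing segments are never modified.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Append the memtable as a new sorted segment (a sequential write on disk).
        if self.memtable:
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Newest data wins: memtable first, then segments from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            for k, v in segment:
                if k == key:
                    return v
        return None

    def compact(self):
        # Merge all segments into one, keeping only the latest value per key.
        merged = {}
        for segment in self.segments:  # oldest first, so later values overwrite
            merged.update(segment)
        self.segments = [sorted(merged.items())] if merged else []
```

In this sketch, reads may have to consult several segments until compaction merges them; that is the read-side cost traded for high insert throughput.
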
Cassandra
Has rudimentary support for secondary index
Difficult to do table scan or range scan
Requires a substantial application / paradigm shift
Real World Implications
Why does NoSQL matter to Big Data?
Schemaless storage model
Performance
Scalability
Rapidly incorporate unstructured new data sources without extensive planning
How to Choose
Maintenance / Scalability
Supported operations
OLAP vs. OLTP
Thank You
Chris Yuen
http://cfc.kizzx2.com
http://github.com/kizzx2
@kizzx2
chris@kizzx2.com