This presentation was given at the 2013 Hadoop Summit North America by MapR Technologies.
1 ©MapR Technologies - Confidential
M7 – Enterprise-Grade NoSQL

One Platform for Big Data
…
99.999% HA
Data Protection
Disaster Recovery
Scalability & Performance
Enterprise Integration
Multi-tenancy
MapReduce, File-Based Applications, SQL, Database, Search, Stream Processing
Batch, Interactive, Real-time
Batch Log file Analysis, Data Warehouse Offload, Fraud Detection, Clickstream Analytics, Real-Time Sensor Analysis, “Twitterscraping”, Telematics, Process Optimization, Interactive Forensic Analysis, Analytic Modeling, BI User Focus

M7 – Enterprise-Grade NoSQL on Hadoop
• NoSQL Columnar Store
• HBase API
• Integrated with Hadoop
Other Distros (stack, top to bottom): HBase, JVM, HDFS, JVM, ext3/ext4, Disks
MapR M7 (stack, top to bottom): Tables/Files, Disks

Tradeoffs with Other NoSQL Solutions
Performance: Continuous low latency with horizontal scaling
Easy Administration: Easy day-to-day management with minimal learning curve
Reliability: 24x7 applications with strong data consistency

Bullet-proof NoSQL with Zero Administration
Reliability, Easy Administration, Performance
Benefit: Features
High Performance: Over 1 million ops/sec with 10 node cluster
Continuous Low Latency: No I/O storms, no compactions
24x7 Applications: Instant recovery, online schema modification, snapshots, mirroring
Zero Administration: No processes to manage, automated splits, self-tuning
High Scalability: 1 trillion tables
Low TCO: Files and tables on one platform

M7: Tables for Developers
• Users can create and manage their own tables
  – Unlimited # of tables
  – First copy local
• Tables can be created in any directory
  – Tables count towards volume and user quotas
• No admin intervention needed
  – Perform tasks on the fly
  – No stopping/restarting of servers
• Automatic data protection and disaster recovery
  – Users can recover from snapshots/mirrors on their own
M7: Volume Based Data Management

Case Study – 24x7 Applications
Web 2.0 company that optimizes email, mobile & social campaigns
Apache HBase: Service Disruptions
• 42 hours of compaction every weekend
• Long cold-starts
• “Engineering resources stuck fixing HBase problems”
24x7 Uptime
• No compactions
• Instant recovery
• Easy development and administration

Case Study – Cost Effective Scalability
Managed Security Services company that provides solutions for data security and compliance
ORACLE: Expensive Data Store
• Limited analytical tools
• No machine learning capability
Cost Effective Scalability
• Extended analytical eco-system: Machine Learning, Solr and Hive
• Similar reliability

Case Study – Consistent Superior Performance
Cloud-based predictive analytics platform
Sociocast conducted a POC with the three solutions
✗ Apache HBase
• Compactions
• Manual administration
• Poor reliability
✗ Cassandra
• Compactions
• Manual administration
• Eventual consistency
✓ M7
• No compactions
• Zero administration
• Strong consistency
• 2x Cassandra performance
• 3x HBase performance

YCSB Benchmark (ops/sec/node)
High Performance across Varying Workloads
10-Node Cluster
Workload               MapR 3.0 (M7)   CDH 4.2.0 (HBase)   M7 Advantage
50% read, 50% update   4227            1695                2.5x
95% read, 5% update    3323            602                 5.5x
Read                   5018            764                 6.6x
Scan (50 rows)         922             161                 5.7x
Hardware Configuration
CPU: Intel® Xeon® CPU E5645 2.40GHz, 12 cores x2
RAM: 48 GB
Data Disk: 12x 3TB (7200 rpm)
Size: record size = 1k, data size = 2TB
OS: CentOS Release 6.2 (Final)
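The "M7 Advantage" column for the 10-node cluster appears to be the M7 throughput divided by the HBase throughput, rounded to one decimal place. The following is a minimal sketch of that arithmetic using the figures from the slide; the names `results` and `advantage` are illustrative and not part of the presentation.

```python
# YCSB throughput in ops/sec/node from the 10-node slide: (M7, HBase).
results = {
    "50% read, 50% update": (4227, 1695),
    "95% read, 5% update": (3323, 602),
    "Read": (5018, 764),
    "Scan (50 rows)": (922, 161),
}


def advantage(m7_ops: int, hbase_ops: int) -> float:
    """Return M7 throughput as a multiple of HBase throughput (assumed definition)."""
    return m7_ops / hbase_ops


ratios = {name: advantage(m7, hbase) for name, (m7, hbase) in results.items()}

for workload, ratio in ratios.items():
    # One decimal place, matching the precision shown on the slide.
    print(f"{workload}: {ratio:.1f}x")
```

Under that assumed definition, the computed ratios reproduce the four values in the slide's advantage column.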

YCSB Benchmark (ops/sec/node)
10x Faster Reads and Scans with Flash Memory
5-Node Cluster
Workload               MapR 3.0 (M7)   CDH 4.2.1 (HBase)   M7 Advantage
50% read, 50% update   6188            2547                2.4x
95% read, 5% update    13064           2660                4.9x
Read                   18840           1605                11.8x
Scan (50 rows)         1135            116                 9.8x
Hardware Configuration
CPU: Intel® Xeon® CPU E5620 2.40GHz, 8 cores x2
Memory: 24 GB
Data Disk: 1x 1.2TB (Fusion I/O ioDrive2)
Size: record size = 1k, data size = 600GB
OS: CentOS Release 6.3 (Final)

MapR Advantages
Feature                                                    M7         Others
99.999% High Availability                                  ✓          X
Instant Recovery from Failures                             ✓          X
Continuous Low Latency (No Compactions)                    ✓          X
Zero Administration (No Processes to Manage, Self-tuning)  ✓          X
Online Data Protection (Snapshots, Mirroring)              ✓          X
Scalability (Number of Tables Supported)                   Trillion   Hundreds
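To put the 99.999% availability figure in concrete terms, the sketch below converts an availability percentage into the downtime it allows per year. It assumes a 365-day year and measures downtime over the whole year; the slides do not state how availability is measured, and the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def max_downtime_minutes(availability_pct: float) -> float:
    """Return the yearly downtime (in minutes) permitted by an availability percentage."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


# "Five nines" allows a little over five minutes of downtime per year.
print(round(max_downtime_minutes(99.999), 2))
```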