© 2014 MapR Technologies
Getting Real With Hadoop
Jim Scott, Director, Enterprise Strategy & Architecture
@kingmesal #BigDataEverywhere #Chicago - October 1st, 2014
Can’t We All Just Get Along?
We Have All Contributed…
The Reality is Architecture Matters
High Availability (HA) Everywhere
No NameNode architecture
• Distributed metadata can self-heal
• No practical limit on # of files
MapReduce/YARN HA
• Jobs are not impacted by failures
• Meet your data processing SLAs
NFS HA
• High throughput and resilience for NFS-based data ingestion, import/export and multi-client access
Instant recovery
• Files and tables are accessible within seconds of a node failure or cluster restart
Rolling upgrades
• Upgrade the software with no downtime
HA is built in
• No special configuration to enable HA
• All MapR customers operate with HA
RDBMS Hammer
Hadoop Hammer
Data Everywhere!
Social Media
Messages
Audio
Sensors
Mobile Data
Clickstream
Friends don’t let friends
run name nodes.
Too Many Files!
Volumes
100K volumes are OK,
create as many as needed
Volumes dramatically simplify
management:
• Replication factor
• Scheduled mirroring
• Scheduled snapshots
• Data placement control
• User access and tracking
• Administrative permissions
/projects
    /tahoe
    /yosemite
/user
    /msmith
    /bjohnson
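One way to picture how per-volume settings attach to the directory tree above is a longest-prefix lookup from a file path to the volume that governs it. The sketch below is only an illustration: the `VolumePolicy` class, its fields, the default values and the schedule strings are all hypothetical names chosen here, not MapR's API or configuration format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VolumePolicy:
    """Hypothetical per-volume settings, loosely following the slide's bullet list."""
    mount_path: str
    replication_factor: int = 3               # illustrative default
    snapshot_schedule: Optional[str] = None   # e.g. "daily"; illustrative only
    mirror_schedule: Optional[str] = None     # e.g. "hourly"; illustrative only


def volume_for(path: str, volumes: List[VolumePolicy]) -> Optional[VolumePolicy]:
    """Return the volume whose mount path is the longest prefix of `path`."""
    best = None
    for vol in volumes:
        prefix = vol.mount_path.rstrip("/") + "/"
        if path == vol.mount_path or path.startswith(prefix):
            if best is None or len(vol.mount_path) > len(best.mount_path):
                best = vol
    return best


# One volume per project and per user, as in the tree on the slide.
volumes = [
    VolumePolicy("/projects/tahoe", replication_factor=3, snapshot_schedule="daily"),
    VolumePolicy("/projects/yosemite", replication_factor=2),
    VolumePolicy("/user/msmith", mirror_schedule="hourly"),
    VolumePolicy("/user/bjohnson"),
]
```

With this model, `volume_for("/projects/tahoe/data/part-0", volumes)` resolves to the tahoe volume and its settings, while a path outside every volume resolves to `None`.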
MapR M7: The Best In-Hadoop Database
• NoSQL Columnar Store
• Apache HBase API
• Integrated with Hadoop
The most scalable, enterprise-grade NoSQL database that supports online applications and analytics
Storage stack, top to bottom:
Other Distros: HBase, JVM, HDFS, JVM, ext3/ext4, Disks
MapR Enterprise Database Edition (M7): MapR-DB, Tables/Files, Disks
Tradeoffs with Other NoSQL Solutions
• Reliability: 24x7 applications with strong data consistency
• Performance: Continuous low latency with horizontal scaling
• Easy Administration: Easy day-to-day management with minimal learning curve
Consistent, Low Read Latency
[Chart: M7 read latency vs. others' read latency]
MapR Integrates Security into Hadoop
Hadoop Security
Authorization to ensure the right access to files and databases
Authentication for users and user-created job requests
Encryption to ensure user credentials and data are always secure
Integration with existing security infrastructure
Fine-Grained Access Control
Full POSIX permissions on files and directories
ACLs on tables, column families and columns
ACLs on MapReduce jobs and queues
Administration ACLs on cluster and volumes
ACLs for Apache Hive, Apache Drill and Impala
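As a small illustration of the first item, POSIX permissions can be set and inspected with ordinary operating-system calls. The sketch below uses only the Python standard library on a temporary local file; the function names are made up for this example, and nothing here is specific to MapR or to its ACL features.

```python
import os
import stat
import tempfile


def describe_permissions(path: str) -> str:
    """Return the ls-style permission string (for example '-rw-r-----') for a path."""
    return stat.filemode(os.stat(path).st_mode)


def restrict_to_owner_and_group(path: str) -> None:
    """Owner may read and write, group may read, others get nothing (mode 0640)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP)


# A temporary local file stands in for a file on the cluster.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    sample_path = tmp.name

restrict_to_owner_and_group(sample_path)
sample_mode = describe_permissions(sample_path)
os.remove(sample_path)
```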
Seamless Integration with Direct Access NFS
• MapR is POSIX compliant
– Random reads/writes
– Simultaneous reading and writing to a file
– Compression is automatic and transparent
• Industry-standard NFS interface (in addition to HDFS API)
– Stream data into the cluster
– Leverage thousands of tools and applications
– Easier to use non-Java programming languages
– No need for most proprietary Hadoop connectors
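Because the cluster is presented as an ordinary POSIX file system over NFS, plain file I/O from any language should be enough to read and update data in place. The sketch below shows a random write followed by random reads using only the Python standard library. The helper names and the file name are hypothetical, and a temporary local directory stands in for the NFS mount point, whose real location depends on how the cluster is mounted.

```python
import os
import tempfile


def overwrite_at(path: str, offset: int, data: bytes) -> None:
    """Random write: replace bytes in the middle of an existing file."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def read_at(path: str, offset: int, length: int) -> bytes:
    """Random read: fetch `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Assumption: on a real cluster this would be a directory under the NFS mount;
# a temporary local directory is used here so the example runs anywhere.
mount_dir = tempfile.mkdtemp()
log_path = os.path.join(mount_dir, "events.log")

with open(log_path, "wb") as f:
    f.write(b"0123456789")

overwrite_at(log_path, 3, b"XYZ")
```

No Hadoop client library is involved: the same calls work on any mounted file system that supports seeking and in-place writes.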
© 2014 MapR Technologies 27
Disaster Recovery: Mirroring
• Flexible– Choose the volumes/directories to mirror
– You don’t need to mirror the entire cluster
– Active/active
• Fast– No performance impact
– Block-level (8KB) deltas
– Automatic compression
Production
WAN
Production Research
Datacenter 1 Datacenter 2
WAN EC2
© 2014 MapR Technologies 28
Disaster Recovery: Mirroring
• Flexible
– Choose the volumes/directories to mirror
– You don’t need to mirror the entire cluster
– Active/active
• Fast
– No performance impact
– Block-level (8KB) deltas
– Automatic compression
• Safe
– Point-in-time consistency
– End-to-end checksums
• Easy
– Graceful handling of network issues
– No third-party software
– Takes less than two minutes to configure!
[Diagram labels: Production, Research, Datacenter 1, Datacenter 2, WAN, EC2]
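The idea behind block-level deltas can be sketched in a few lines: split the data into 8 KB blocks, compare per-block checksums against the mirror, and ship only the blocks that differ. This is a generic illustration and not MapR's mirroring implementation; SHA-256 as the checksum, the function names and the in-memory byte strings are all assumptions made for the example.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 8 * 1024  # 8 KB, the delta granularity quoted on the slide


def block_digests(data: bytes) -> List[str]:
    """Checksum every fixed-size block (SHA-256 is an assumption, not MapR's choice)."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]


def changed_blocks(source: bytes, mirror_digests: List[str]) -> Dict[int, bytes]:
    """Return {block_index: block_bytes} for blocks that differ from the mirror."""
    delta = {}
    for index, digest in enumerate(block_digests(source)):
        if index >= len(mirror_digests) or mirror_digests[index] != digest:
            start = index * BLOCK_SIZE
            delta[index] = source[start:start + BLOCK_SIZE]
    return delta


def apply_delta(mirror: bytes, delta: Dict[int, bytes], new_length: int) -> bytes:
    """Patch the mirror copy with the changed blocks and resize it to match the source."""
    buf = bytearray(mirror[:new_length].ljust(new_length, b"\0"))
    for index, block in delta.items():
        start = index * BLOCK_SIZE
        buf[start:start + len(block)] = block
    return bytes(buf)


# Four blocks of data; a single byte changes inside the third block.
old_copy = bytes(BLOCK_SIZE * 4)
modified = bytearray(old_copy)
modified[BLOCK_SIZE * 2 + 5] = 1
new_copy = bytes(modified)

delta = changed_blocks(new_copy, block_digests(old_copy))
```

Here only one of the four blocks ends up in `delta`, so a quarter of the data would cross the WAN instead of the whole file.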
MapR Advantages
Feature | MapR-DB | Others
99.999% uptime | ✓ | X
Instant recovery from failures | ✓ | X
Continuous low latency (no compactions) | ✓ | X
Zero administration (no processes to manage, self-tuning) | ✓ | X
Online data protection (snapshots, mirroring) | ✓ | X
Scalability (number of tables supported) | Trillion | Hundreds
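To put the 99.999% uptime figure in concrete terms, an availability percentage can be converted into a yearly downtime budget. The short calculation below is plain arithmetic, assuming a 365-day year; the function name is just a label for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability: float) -> float:
    """Yearly downtime budget, in minutes, for an availability such as 0.99999."""
    return (1.0 - availability) * MINUTES_PER_YEAR


five_nines = allowed_downtime_minutes(0.99999)   # roughly 5.26 minutes per year
three_nines = allowed_downtime_minutes(0.999)    # roughly 525.6 minutes per year
```

Each additional nine cuts the budget by a factor of ten, which is why five nines leaves only a few minutes per year for all failures and maintenance combined.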
Packages Supported by Various Distributions
Category | Package | MapR 4.0.1 (Sep 2014) | Cloudera 5.1.2 (Aug 2014) | Hortonworks 2.1.5 (Aug 2014) | Apache Versions (Sep 12th, 2014)
Core Hadoop | Hadoop Core, YARN | 2.4.1 | 2.3.0 | 2.4.0 | 2.5.1
Batch | Map Reduce | MRv1 and MRv2 | MRv1 or MRv2 | MRv2 | MRv2
 | Hive | 0.12, 0.13 | 0.12 | 0.13 | 0.13
 | Tez | 0.4 (Dev Preview Only) | X | 0.4 | 0.5
 | Pig | 0.12 | 0.12 | 0.12 | 0.12
 | Cascading | 2.1.6 | X | X | 2.5
 | Spark | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1
Interactive SQL | Impala | 1.2.3 | 1.4 | X | 1.4
 | Drill | 0.5 | X | X | 0.5
 | SparkSQL | 1.0.2 | X | 1.0.1 (Tech Preview only) | 1.1
NoSQL and Search | HBase/NoSQL | 0.94.2, 0.98.4, MapR-DB | 0.98 | 0.98, Accumulo 1.5.1 | HBase 0.98
 | Phoenix | X | X | 4.0.0 | 4.1.0
 | AsyncHBase | 1.5 | X | X | 1.5
 | Search | LW (Solr) 2.6.1, 2.7 | Cloudera Search 1.5 | X | NA
Machine Learning and Graph | Mahout | 0.9 | 0.9 | 0.9 | 0.9
 | MLLib/MLBase | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1
 | GraphX | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1
Streaming/Messaging | Spark Streaming | 0.9.2, 1.0.2 | 1.0.0 | 1.0.1 (Tech Preview only) | 1.1
 | Storm | 0.9, 0.9.2 (Certified) | X | 0.9.1 | 0.9.2
 | Kafka | X | X | 0.8.1.1 (Tech Preview) | 0.8.1.1
Data Integration | Sqoop, Sqoop2 | 1.4.4, 1.99.3 | 1.4.4, 1.99.3 | 1.4.4 | 1.4.5
 | Flume | 1.5.0 | 1.5.0 | 1.4.0 | 1.5.0
 | Knox | X | X | 0.4 | 0.4
Coordination | Oozie | 4.0.1 | 4.0.0 | 4.0.0 | 4.0.1
 | Zookeeper | 3.4.5 | 3.4.5 | 3.4.5 | 3.4.5
GUI, Configuration, Monitoring | Management | MCS | CM | Ambari | Ambari
 | Hue | 3.5 | 3.6 | 2.5.1 | 3.6
Color key on the slide: Red – lacking; Blue – leading
Sources:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH-Version-and-Packaging-Information/cdhvd_cdh_package_tarball.html?scroll=topic_3_unique_8
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.5/bk_releasenotes_hdp_2.1/content/ch_relnotes-hdp-2.1.5-product.html
Pick the Right Tool for the Job
MapR Distribution for Apache Hadoop
APACHE HADOOP AND OSS ECOSYSTEM
EXECUTION ENGINES
• Batch: YARN, MapReduce v1 & v2, Pig, Cascading, Spark
• Streaming: Spark Streaming, Storm*
• NoSQL & Search: HBase, Solr, Accumulo*
• ML, Graph: Mahout, MLLib, GraphX
• SQL: Hive, Impala, SparkSQL, Drill
• Tez*
DATA GOVERNANCE AND OPERATIONS
• Data Integration & Access: Sqoop, Flume, HttpFS, Hue
• Security: Sentry*, Knox*
• Workflow & Data Governance: Oozie, Falcon*
• Provisioning & coordination: ZooKeeper, Whirr, Juju, Savannah*
* Certification/support planned for 2014
MapR Data Platform
• NFS, HDFS API, HBase API, JSON API
• MapR Control System (Management and Monitoring): GUI, REST API, CLI
1.65 TB WITH 298 SERVERS
1/7th the Hardware Footprint
Forrester Wave™: Big Data Hadoop Solutions, Q1 ‘14
February 2014, “The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014”
APACHE DRILL
• Pioneering Data Agility for Hadoop
• Apache open source project
• Scale-out execution engine for low-latency queries
• Unified SQL-based API for analytics & operational applications
40+ contributors
150+ years of experience building databases and distributed systems
Drill Supports Schema Discovery On-The-Fly
Schema Declared In Advance (SCHEMA ON WRITE, SCHEMA BEFORE READ)
• Fixed schema
• Leverage schema in centralized repository (Hive Metastore)
Schema Discovered On-The-Fly (SCHEMA ON THE FLY)
• Fixed schema, evolving schema or schema-less
• Leverage schema in centralized repository or self-describing data
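The contrast between the two approaches can be illustrated with self-describing JSON records: instead of declaring columns up front, a reader can derive field names and types while scanning the data. The sketch below is a toy illustration in plain Python and is not how Drill is implemented; the function name and the sample records are invented for the example.

```python
import json
from typing import Dict, Iterable, Set


def discover_schema(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each field name to the set of value types seen for it while reading."""
    schema: Dict[str, Set[str]] = {}
    for line in lines:
        record = json.loads(line)
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


# Records whose shape evolves: a new field appears and a type changes.
records = [
    '{"user": "msmith", "clicks": 3}',
    '{"user": "bjohnson", "clicks": 7, "device": "mobile"}',
    '{"user": "msmith", "clicks": 2.5}',
]
schema = discover_schema(records)
```

The resulting `schema` records that `clicks` appeared as both an integer and a float and that `device` is present only in some records, which is the kind of evolving shape a fixed, pre-declared schema would reject.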
Operational Analytics
Must Be Able to Scale
Mobile application server
Web application server
Hadoop (MapR M7)
Operational
• User profiles and state
• User interactions
• Real-time location data
• Web and mobile session state
• Comments/rankings
Real-time and Actionable Analytics
• Customer 360 dashboard
• Churn analysis (predictive analytics)
• Product/service optimization and personalization
• Real-time ad targeting
• Data exploration (SQL)
General Application Monitoring
Hard Drive Failure Rates
Recommendation Engines
20M SONGS
Media Content Recommendation Engine
Fraud Detection
104M CARD MEMBERS
Offer Serving, Credit Risk & Fraud
More than $600B
PEOPLE
100M Data Points per second
Fastest Data Ingest Rates
Speed and Intelligence…
Forrester Wave™: NoSQL Key-Value Databases, Q3 ‘14
September 2014, “The Forrester Wave™: NoSQL Key-Value Databases, Q3 2014”
MapR Editions
• Control System, NFS Access, Performance, Unlimited Nodes, Free
• All the Features of M5, Simplified Administration for HBase, Increased Performance, Consistent Low Latency, Unified Snapshots, Mirroring
• Control System, NFS Access, Performance, High Availability, Snapshots & Mirroring, 24x7 Support, Annual Subscription
Fastest On-Ramp:
MapR Sandbox for Hadoop
@mapr maprtech
Engage with us!
MapR
maprtech
mapr-technologies