The MapR Converged Community Edition

The MapR Converged Community Edition is an integrated platform consisting of Apache Hadoop, an event streaming system, a NoSQL database, and a distributed POSIX file system. It includes the latest innovations from the Hadoop 2.X and open source communities such as Apache HBase™, Apache Storm™, Apache Pig, Apache Hive™, Apache Mahout™, YARN, Apache Sqoop™, Apache Flume™, and more. It also delivers high performance, real-time operations with MapR-DB, MapR Streams, and MapR-FS.

MapR Breakthrough Innovations

• MapR-DB, the integrated multi-model NoSQL database to run existing HBase applications and JSON-based applications alongside other workloads in a single cluster

• MapR Streams for globally scalable, real-time event streaming

• Performance-optimized architecture for faster data processing and analytics

• Direct Access NFS™ and MapR-FS for real-time data access to Hadoop data

• Distributed metadata to support trillions of files in a single cluster

• Comprehensive security controls to protect sensitive data

• MapR Heatmap™ for instant cluster insights

• MapR volumes for easier policy management around security, retention, and quotas

Get started for free with the MapR Community Edition today—available for internal testing use.

MapR Converged Community Edition (MapR CE) is a free edition of the MapR Converged Data Platform, with usage restrictions specified in the MapR End User License Agreement, and with community forum support. This free version includes Apache Hadoop, Apache Spark™, MapR-DB (NoSQL database), MapR Streams (event streaming), and MapR-FS (POSIX file system). MapR CE enables distributed processing of large data sets across a cluster of servers. MapR delivers a proven platform that supports a broad set of mission-critical and real-time production uses.

If you seek enterprise-grade business continuity capabilities, please see the Apache™ Hadoop® in the MapR Converged Data Platform data sheet, the MapR-DB data sheet, and the MapR Streams data sheet for more information.

Open Choice, Open Source

Project choices. MapR supports a broad set of Hadoop projects, including the entire Apache Spark™ stack, YARN, Apache Drill, Impala, and more. MapR helps customers select the right tool for their specific requirements.

Monthly certified updates. MapR gives you access to the latest cutting-edge projects on Hadoop.

Backward compatibility. MapR lets you upgrade specific projects without needing to upgrade core Hadoop packages. Additionally, MapR lets you upgrade Hadoop and run your existing applications as is without rewriting them.

Operational Analytics with In-Hadoop NoSQL

The high-performance, integrated NoSQL database, MapR-DB, lets you run analytics on live data without data copying, and deploy multiple use cases and workloads in a single, operationally efficient cluster.

High-Throughput Event Streaming

MapR Streams lets you reliably deliver event data streams for real-time processing. With MapR Streams you can connect data producers and consumers in a high-performance publish/subscribe model.

Self-Service SQL Analytics on Hadoop

Apache Drill on MapR lets you immediately query complex datasets such as nested data, NoSQL data, and data with rapidly evolving schemas, without requiring schema preparation. ANSI SQL support lets you use your existing business intelligence tools. For more information, please see the Apache Drill data sheet.

The MapR Converged Data Platform

Product spotlight: The MapR Converged Data Platform is powered by the industry’s fastest, most reliable, secure, and open data infrastructure that dramatically lowers TCO and enables global real-time data applications.

Interoperability

Standard Hadoop tool support. MapR supports all Hadoop APIs and Hadoop data processing tools to access Hadoop data. You can move data in the MapR Distribution easily into other distributions, and vice versa.

Standards-based file access. Unlike other distributions, MapR provides true Network File System (NFS) capabilities. MapR Direct Access NFS™ lets you access Hadoop like a standard file system (via a single NFS mount point), to copy data into and out of Hadoop easily at high rates, or to access Hadoop data using common command line tools and desktop applications. The optional add-on MapR POSIX Client provides authenticated NFS access from remote nodes, along with over-the-wire compression and parallel access to boost throughput.
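As a rough illustration of what file-system-style access through an NFS mount makes possible, the Python sketch below copies a file into a cluster directory and reads it back using only ordinary file I/O. The mount path `/mapr/my.cluster.com`, the directory names, and the helper functions are assumptions made for this example; the actual path depends on how the NFS export is mounted on the client. A temporary directory stands in for the mount so the sketch runs on any machine.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical mount point (an assumption for this sketch); the real path
# depends on how the cluster's NFS export is mounted on the client machine.
DEFAULT_MOUNT = Path("/mapr/my.cluster.com")


def copy_into_cluster(local_file: Path, cluster_dir: str, mount: Path = DEFAULT_MOUNT) -> Path:
    """Copy a local file into a directory under the mount using ordinary file I/O."""
    target_dir = mount / cluster_dir.lstrip("/")
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / local_file.name
    shutil.copyfile(local_file, target)
    return target


def count_lines(cluster_path: str, mount: Path = DEFAULT_MOUNT) -> int:
    """Open a file under the mount as if it were local and count its lines."""
    with open(mount / cluster_path.lstrip("/"), "r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


if __name__ == "__main__":
    # A temporary directory stands in for the NFS mount in this demo.
    with tempfile.TemporaryDirectory() as fake_mount, tempfile.TemporaryDirectory() as work:
        source = Path(work) / "events.csv"
        source.write_text("id,value\n1,a\n2,b\n", encoding="utf-8")
        copied = copy_into_cluster(source, "/user/demo/in", mount=Path(fake_mount))
        print(copied.name, count_lines("/user/demo/in/events.csv", mount=Path(fake_mount)))
```

Nothing in the sketch uses a Hadoop client library: once the cluster appears as a mounted directory, standard tools and language runtimes can read and write it directly.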

Industry standards. MapR fully supports additional industry-standard APIs, including ODBC/JDBC, LDAP, Kerberos, HBase, JSON, HDFS, NFS, and more.

Third-party tool ecosystem. The entire ecosystem of third-party tools (BI, ETL, etc.) built for use on Hadoop works on MapR. Examples of certified tools are available at the MapR App Gallery (mapr.com/appgallery).

Portable applications. Hadoop applications built on MapR run on any other Hadoop distribution, and vice versa, with no code changes or recompilation.

Security

Kerberos and LDAP integration. MapR supports authentication services via Kerberos and/or LDAP.

Native authentication. MapR also offers a standards-based authentication system as a simpler alternative to Kerberos that leverages Linux Pluggable Authentication Modules (PAM) to provide the widest registry support.

Access control. Data is secured using standard Unix file permissions and advanced role-based access control expressions (ACEs).

Comprehensive auditing. MapR auditing logs help to analyze user behavior as well as to meet regulatory compliance requirements. MapR uses the JSON format to log accesses at the administrative, authentication, database, and file levels.

Performant wire-level encryption. MapR encrypts data sent between nodes and applications to ensure data privacy, using Intel AES-NI capabilities where available.

Multi-Tenancy

The MapR Converged Community Edition supports standard multi-tenancy beyond the capabilities in YARN via volumes and security features to let distinct user groups, data sets, and applications coexist in isolation in the same cluster. More advanced multi-tenancy capabilities on data and job placement control are available in the enterprise editions of the MapR Distribution.

Volumes. MapR supports the logical grouping of files and directories on which policies (permissions, replication factors, quotas, etc.) can be set.

Security. MapR authentication and authorization controls provide another level of user and data isolation.

Performance and Scalability

Customers can reduce their data center footprint with the MapR performance advantage by deploying as few as one-third the servers of other distributions. Faster file access and a faster optimized shuffle for MapReduce let customers get more work out of their hardware investment. A MapR cluster can scale to thousands of nodes and can store trillions of files.

MapR officially set the MinuteSort record by sorting 1.5 TB of data in under a minute on Google Compute Engine. A MapR customer has since exceeded that record by sorting 1.65 TB, with one-seventh the number of servers of the highest non-MapR record.

MapR-DB High Throughput

The integrated NoSQL database, MapR-DB, is built on the core MapR Data Platform which set records on both the TeraSort and the MinuteSort benchmarks. Recently, MapR-DB ran over 30,000 batch put operations per second per node, and showed as much as an eleven-fold speed improvement over HBase. With its in-memory feature, MapR-DB can store a database in memory for additional performance gains.

MapR-DB Continuous Low Latency

Auto-tuning and data structure innovations in MapR-DB ensure consistent low latency, even at the 95th and 99th percentile latency measurements. MapR (in red on the graph) consistently responds quickly, while the other distribution (in blue) shows many high latency spikes due to inefficient disk cleanup activities.
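To make the percentile terms concrete, here is a minimal Python sketch of how 95th and 99th percentile latencies can be computed from a set of response times, using the nearest-rank definition. The two sample sets are invented for illustration and are not measurements of MapR-DB or any other system; the function names are likewise assumptions of this example.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Median, 95th, and 99th percentile of a set of latencies in milliseconds."""
    return {name: percentile(samples_ms, pct)
            for name, pct in (("p50", 50), ("p95", 95), ("p99", 99))}


if __name__ == "__main__":
    # Invented data: one system answers every request in 2 ms, the other
    # answers 90% of requests in 2 ms and stalls for 250 ms on the rest.
    steady = [2.0] * 100
    spiky = [2.0] * 90 + [250.0] * 10
    print("steady:", latency_summary(steady))
    print("spiky: ", latency_summary(spiky))
```

Both invented data sets share the same median, so the difference between them only becomes visible at the 95th and 99th percentiles. This is why tail percentiles, rather than averages, are the usual way to show occasional latency spikes.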

Management and Monitoring

MapR Control System. The MapR Control System (MCS) is a browser-based interface for managing, administering, and monitoring your Hadoop cluster. It lets you immediately view the status of your cluster via heatmaps and drill into specific issues to investigate any problems. Alarms proactively notify you if potential problems arise.

Rolling upgrades. To minimize planned downtime, MapR allows a node-by-node Hadoop upgrade on a live cluster. With MapR backward compatibility, existing applications can still run on an upgraded Hadoop cluster with no modifications.
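The general idea of a node-by-node upgrade can be sketched in a few lines of Python. This is a generic illustration of the rolling pattern, not MapR's actual upgrade procedure or tooling: the `upgrade_node` and `is_healthy` callbacks and the node names are hypothetical stand-ins for whatever steps an administrator would really run.

```python
from typing import Callable, Iterable, List


def rolling_upgrade(
    nodes: Iterable[str],
    upgrade_node: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Upgrade nodes one at a time and stop at the first failed health check.

    Only one node is out of service at any moment, so the rest of the
    cluster keeps serving requests. Returns the upgraded nodes in order.
    """
    upgraded: List[str] = []
    for node in nodes:
        upgrade_node(node)  # hypothetical: stop services, install packages, restart
        if not is_healthy(node):  # hypothetical: confirm the node rejoined the cluster
            raise RuntimeError(
                f"{node} is unhealthy after upgrade; "
                f"{len(upgraded)} node(s) upgraded before stopping"
            )
        upgraded.append(node)
    return upgraded


if __name__ == "__main__":
    # Stand-in callbacks that only log, so the sketch runs without a cluster.
    def fake_upgrade(node: str) -> None:
        print(f"upgrading {node}")

    print(rolling_upgrade(["node1", "node2", "node3"], fake_upgrade, lambda node: True))
```

Halting at the first unhealthy node keeps a failed upgrade from spreading to the remaining nodes, which continue to run the previous version.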

[Figure: MapR-DB Low Latency. Axis label: Elapsed Time.]

© 2016 MapR Technologies, Inc.