
MapR Unique features



MapR is a complete distribution for Apache Hadoop that packages more than a dozen projects from the Hadoop ecosystem to provide a broad set of Big Data capabilities. Apache projects such as HBase, Hive, Pig, Mahout, Flume, Avro, Sqoop, Oozie, Spark and Whirr are included, along with other projects such as Cascading, Impala, Shark, Spark Streaming, MLlib and GraphX.

Open Source Commitment: Along with the architectural innovations that underpin business-critical deployments, MapR continually validates, tests, and hardens the core Apache projects before including them in the MapR distribution. MapR typically upgrades to the latest versions of newly released Apache open-source projects within 90 days of their release, with monthly releases so that customers always have the latest innovations. MapR also makes code and libraries available through GitHub and Maven repositories, and contributes to the open-source community through projects such as Apache Mahout and Apache Drill.

High Level MapR Benefits:

1. Non-Stop Operations: Our self-healing architecture protects the data and therefore the business against application downtime. Our platform has been specifically designed to provide self-healing functionality against the most severe hardware failures. Special configurations or special designs are not required. All MapR customers run High Availability as it is designed into the software.

2. Superior Performance: With approximately 100% greater performance than other vendors, MapR reduces the number of servers required. This reduces equipment costs, data centre costs and energy requirements. The 100% figure is conservative; on some workloads we see substantially higher gains. (MapR holds the TeraSort record, achieved with more than 60% fewer cores and only 20% of the disks used for the previous record.)

3. Data Protection & Recovery: MapR has consistent point-in-time snapshots built into the software as standard. Accidentally deleted or corrupted data can therefore be brought back into service quickly and easily, protecting the business from application downtime. Backup windows can also be greatly extended, because a snapshot can be used as the point-in-time target for a backup. Snapshots are useful for developers as well: tests can be run against a consistent data set and the data then restored to its starting state very easily.

4. JobTracker High Availability: MapR's JobTracker High Availability improves recovery time. Upon a hardware failure, the MapR JobTracker automatically restarts on another node in the cluster. TaskTrackers automatically pause and then reconnect to the new JobTracker. Any currently running jobs or tasks continue without failing or losing progress. The impact of not having this capability can be high: if a job takes five hours to run and suffers a JobTracker failure during the run, the job has to start again from the beginning. With MapR, the job picks up from where it stopped when the JobTracker node failed.
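
The five-hour example can be put into rough numbers. The following is a minimal sketch of that arithmetic only; the function name and the chosen failure point (four hours in) are illustrative assumptions, and the short pause while the JobTracker restarts is ignored.

```python
def total_runtime_hours(job_hours, failure_at_hours, resumes):
    """Wall-clock hours to finish a job that suffers one JobTracker failure.

    resumes=True  : progress is kept, so only the remaining work is run.
    resumes=False : the job starts again from the beginning.
    The failover pause itself is ignored (assumption).
    """
    if not 0 <= failure_at_hours <= job_hours:
        raise ValueError("the failure must occur while the job is running")
    if resumes:
        # Work done before the failure is not lost.
        return job_hours
    # Work done before the failure is wasted, then the full job reruns.
    return failure_at_hours + job_hours


# Hypothetical failure four hours into a five-hour job.
with_restart = total_runtime_hours(5.0, 4.0, resumes=False)
with_resume = total_runtime_hours(5.0, 4.0, resumes=True)
print(f"restart from beginning: {with_restart} h, resume: {with_resume} h")
```

The later the failure occurs in the run, the larger the gap between the two cases.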

5. Multi-Version Support: MapR supports backward compatibility and multi-version project support. Customers can upgrade projects and existing applications individually on their own schedule.

6. Rolling Upgrades: Upgrades to the operating system can be made without shutting the cluster down, protecting the business from operational downtime and long maintenance windows.


7. Data Integration & Interoperability: MapR supports ODBC and POSIX-compliant, industry-standard NFS on every node. We are not dependent upon gateways or other third-party hardware or software. MapR customers therefore do not have to spend time and money developing connection software to give existing or new data sets access to the cluster. Ingest rates can also be very high, as every node supports this capability.
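
Because the cluster is reachable over standard NFS, an existing application can write into it with ordinary file operations rather than a dedicated client library. The sketch below illustrates the idea under stated assumptions: the cluster is taken to be NFS-mounted on the client, the function name is hypothetical, and a temporary directory stands in for the real mount point so the example runs anywhere.

```python
import os
import tempfile


def ingest_lines(mount_dir, relative_path, lines):
    """Write text lines under a mounted directory using plain POSIX file calls."""
    target = os.path.join(mount_dir, relative_path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "w", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")
        # Push the data out of local buffers before returning.
        handle.flush()
        os.fsync(handle.fileno())
    return target


# Assumption: in a real deployment mount_dir would be the NFS mount of the
# cluster; a temporary directory is used here as a stand-in.
mount_dir = tempfile.mkdtemp()
written = ingest_lines(mount_dir, "ingest/events.log", ["event-1", "event-2"])
print("wrote", written)
```

Nothing in the function is specific to Hadoop, which is the point: any tool that can write to a file system path can load data.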

8. Unlimited Record & File Count: How many files will our customers need to ingest into the cluster and keep for operational use over the next twelve to sixty months? Most businesses will not know the answer to this question, and even if they do, it will change over time. With MapR, there are no design constraints based upon the number of files. (This constraint is associated with NameNodes and federated NameNodes.)

9. Real Time Applications: As MapR customers extend the role of this new technology from ETL offload and batch processing to new real-time applications, MapR can make the transition far easier than other vendors can. Real time is dependent upon low latency and high performance. MapR provides a dramatically simplified architecture for real-time stream computation engines such as Storm. Streaming data can be written directly to the MapR Hadoop platform for long-term storage and MapReduce processing. Because MapRFS, unlike HDFS, is a true read/write filesystem, streaming data is available as soon as it is written. This allows for strong Storm/Hadoop interoperability and a simpler implementation of these technologies on one unified platform.
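
The practical effect of a read/write filesystem is that a reader can pick up records while the writer is still appending to the same file. The following is a minimal sketch of that pattern using ordinary file calls; the function names are hypothetical, and a local temporary file stands in for a file on the cluster.

```python
import os
import tempfile


def append_record(path, record):
    """Append one newline-terminated record to the file."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(record + "\n")


def read_new_records(path, offset):
    """Return (records, new_offset) for complete lines appended after offset.

    offset is a byte position; a trailing partial line is left for next time.
    """
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    end = data.rfind(b"\n") + 1  # 0 when no complete line is present
    records = data[:end].decode("utf-8").splitlines()
    return records, offset + end


# Stand-in for a file on the cluster (assumption).
stream_path = os.path.join(tempfile.mkdtemp(), "stream.log")

append_record(stream_path, "reading-1")
first, position = read_new_records(stream_path, 0)

append_record(stream_path, "reading-2")
second, position = read_new_records(stream_path, position)
print(first, second)
```

The reader keeps only a byte offset between calls, so it never re-reads records it has already processed.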

10. Data Movement Between Different Environments: MapR supports built-in data mirroring. Mirroring is simple to use and can be scheduled, which greatly simplifies operations when copying data from one environment to another, for example from Production to online Backup, or from Pre-Production to Production.

11. Data Backup: MapR supports snapshots and data mirroring. These tools can be used to automate disk-to-disk backups from any environment to an online backup environment.

12. Disaster Recovery: As MapR customers find more applications over time that support the business using unstructured and real-time data, it is highly likely that the business will want to apply the same data protection policies to this data as it applies to its business-critical transactional data today. When that time arrives, MapR customers will be able to enable this functionality without additional cost, as the capability is already built into the MapR software. This is a common approach for MapR customers running business-critical services.

13. Security: MapR has implemented the most advanced set of security tools for Hadoop in the industry. These tools allow our customers to protect their data sets from being viewed or copied without the appropriate permission, securing customer data and protecting the customer's brand.

14. Data Compression: MapR supports automatic data compression natively. Although we have not used compression in any of our calculations for the RFI response, it is highly likely that the short-term and longer-term capacity requirements of our customers can be greatly reduced, thus reducing hardware costs. With an easily achievable 2:1 compression ratio on log or text files, considerably more data can be stored on a MapR cluster. Please note that not all files can be compressed: formats that are already compressed, such as zip, jpg, mpeg or bz2 files, will not shrink further.
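
The capacity effect of compression depends on how much of the data is actually compressible. The following is a rough sizing sketch, not a MapR tool: the function name and the example mix of data are assumptions, the 2:1 ratio is the figure quoted above, and replication overhead is deliberately left out.

```python
def logical_capacity_tb(usable_tb, compressible_fraction, ratio=2.0):
    """Estimate how many TB of source data fit in usable_tb of storage.

    compressible_fraction : share of the data (0..1) that compresses at `ratio`
    ratio                 : compression ratio for that share (2.0 means 2:1)
    Already-compressed files (zip, jpg, mpeg, bz2) are assumed to stay at 1:1.
    """
    if not 0.0 <= compressible_fraction <= 1.0:
        raise ValueError("compressible_fraction must be between 0 and 1")
    if ratio < 1.0:
        raise ValueError("ratio must be at least 1.0")
    # Storage consumed by one TB of source data after compression.
    stored_per_tb = compressible_fraction / ratio + (1.0 - compressible_fraction)
    return usable_tb / stored_per_tb


# Hypothetical cluster with 100 TB usable, half of the data being logs or text.
estimate = logical_capacity_tb(100.0, 0.5)
print(f"about {estimate:.1f} TB of source data fits in 100 TB")
```

With all data compressible the estimate doubles; with none compressible it is unchanged, which is why the data mix matters more than the headline ratio.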

15. Cluster Management: The MapR Control System (MCS) provides full visibility into cluster resources and activity. MCS includes the MapR Heatmap, which provides visual insight into node health, service status, and resource utilisation, organised by cluster topology (e.g., datacenters and racks). Designed to manage clusters with thousands of nodes, the MapR Heatmap shows the health of the entire cluster at a glance, including visibility into all hardware and software issues. One of our large customers manages over 12 Petabytes of data with a total of five administrators. This shows the efficiency of managing highly reliable clusters with enterprise-class software and management tools.

16. Multi-Tenancy: MapR provides true multi-tenancy with job isolation, volumes, quotas, and data and job placement control, including for YARN. Multi-tenancy is the ability of a single instance of software to serve multiple tenants. A tenant is a group of users that have the same view of the system. Hadoop, as an enterprise data hub, naturally demands multi-tenancy. Creating different instances of Hadoop for various users or functions is not acceptable, as it makes it harder to share data across departments and creates silos. From an administrator's perspective, multi-tenancy requirements are to:

• Ensure SLAs are met
• Guarantee isolation
• Enforce quotas
• Establish security and delegation
• Ensure low cost operations and simpler manageability

The MapR multi-tenant architecture provides a way for you to address these requirements using industry-leading capabilities.

17. Data Scientists & Data Engineers: MapR has a team of Data Scientists and Data Engineers who can help design and develop new ways to derive business value from the various data types.