1 ©MapR Technologies
Using Standard File-Based Applications and SQL-Based Tools with Hadoop
Who am I?
§ Keys Botzum
§ Senior Principal Technologist, MapR Technologies
http://www.mapr.com/company/events/speaking/dc-hug-9-18-12
The MapR Distribution for Apache Hadoop
§ The open, enterprise-grade distribution for Apache Hadoop
 – Open source components
   • Hive, Pig, Cascading, HBase, ZooKeeper, Oozie, Flume, Sqoop, Whirr, …
 – Enhancements to make Hadoop more open and enterprise-grade
§ Growing fast and a recognized leader
MapR in the Cloud
§ Available as a service with Amazon Elastic MapReduce (EMR)
 – http://aws.amazon.com/elasticmapreduce/mapr
§ Available as a service with Google Compute Engine
MapR
§ Make Hadoop more open (the subject of this presentation)
§ Make Hadoop enterprise-grade
 • High Availability
 • Scalability
 • Management tools – Web, CLI, REST
 • Data Protection – snapshots & mirroring
 • Performance
Not All Applications Use the Hadoop APIs
§ Applications and libraries that use files and/or SQL
 – 30 years, 100,000s of applications, 10,000s of libraries, 10s of programming languages
 – These are not legacy applications; they are valuable applications
§ Applications and libraries that use the Hadoop APIs
Hadoop Needs Industry-Standard Interfaces
• Hadoop API: MapReduce and HBase applications, mostly custom-built
• NFS: file-based applications, supported by most operating systems
• ODBC: SQL-based tools, supported by most BI applications and query builders
NFS
Your Data is Important
§ HDFS-based Hadoop distributions do not (cannot) properly support NFS
§ Your data is important; it drives your business, so make sure you can access it
 – Why store your data in a system that cannot be accessed by 95% of the world's applications and libraries?
§ Access to HDFS source code != access to your data
The NFS Protocol
§ RFC 1813
§ Very simple protocol
§ Random reads/writes
 – READ: read count bytes of file, starting at offset
 – WRITE: write the bytes in data to file, starting at offset
§ HDFS does not support random writes, so it cannot support NFS
/* RFC 1813 (NFS version 3): WRITE and READ procedures and their arguments */
WRITE3res NFSPROC3_WRITE(WRITE3args) = 7;
struct WRITE3args {
    nfs_fh3    file;
    offset3    offset;
    count3     count;
    stable_how stable;
    opaque     data<>;
};

READ3res NFSPROC3_READ(READ3args) = 6;
struct READ3args {
    nfs_fh3 file;
    offset3 offset;
    count3  count;
};
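To make the argument concrete, here is a minimal Python sketch (a model, not an NFS server) of what READ3args and WRITE3args ask of a storage layer. Write3Args, Read3Args, RandomAccessStore, and AppendOnlyStore are hypothetical names for illustration; stable_how is omitted and count is taken as the length of data. The append-only store stands in for a write-once file system: it accepts a WRITE only at the current end of the file, so an ordinary NFS WRITE that overwrites earlier bytes has nowhere to go.

```python
from dataclasses import dataclass


@dataclass
class Write3Args:
    """Simplified, hypothetical mirror of WRITE3args (count = len(data))."""
    file: str
    offset: int
    data: bytes


@dataclass
class Read3Args:
    """Simplified, hypothetical mirror of READ3args."""
    file: str
    offset: int
    count: int


class RandomAccessStore:
    """Storage that accepts writes at any offset, as NFS WRITE expects."""

    def __init__(self) -> None:
        self.files: dict[str, bytearray] = {}

    def write(self, args: Write3Args) -> int:
        buf = self.files.setdefault(args.file, bytearray())
        end = args.offset + len(args.data)
        if end > len(buf):
            # Writing past end-of-file extends the file; any gap is zero-filled.
            buf.extend(b"\x00" * (end - len(buf)))
        buf[args.offset:end] = args.data  # overwrite in place
        return len(args.data)

    def read(self, args: Read3Args) -> bytes:
        buf = self.files.get(args.file, bytearray())
        return bytes(buf[args.offset:args.offset + args.count])


class AppendOnlyStore(RandomAccessStore):
    """Write-once model: a WRITE succeeds only at the current end of the file."""

    def write(self, args: Write3Args) -> int:
        current_size = len(self.files.get(args.file, b""))
        if args.offset != current_size:
            raise OSError(
                f"random write at offset {args.offset} not supported "
                f"(file size is {current_size})"
            )
        return super().write(args)
```

After writing b"hello world" to a RandomAccessStore, a WRITE of b"WORLD" at offset 6 makes a READ return b"hello WORLD"; the same overwrite against AppendOnlyStore raises OSError. Even an application that writes sequentially can produce such requests, since NFS clients may flush cached pages out of order.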
Hadoop Was Designed to Support Multiple Storage Layers
§ MapReduce accesses storage through the Hadoop FileSystem API (the o.a.h.fs.FileSystem interface, where o.a.h = org.apache.hadoop)
§ Implementations include:
 – HDFS: o.a.h.hdfs.DistributedFileSystem
 – S3: o.a.h.fs.s3native.NativeS3FileSystem
 – Local file system: o.a.h.fs.LocalFileSystem
 – FTP: o.a.h.fs.ftp.FTPFileSystem
 – MapR storage layer: com.mapr.fs.MapRFileSystem, which also provides an NFS interface
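One way to picture this design is a single abstract interface whose backend is chosen by the URI scheme, which is how Hadoop selects an implementation for paths such as hdfs://, file://, or maprfs://. The Python sketch below is only an analogy: the class names echo the Hadoop classes listed above, but REGISTRY and get_filesystem are hypothetical helpers, describe is a stand-in for real file operations, and S3 and FTP are left out for brevity.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class FileSystem(ABC):
    """Plays the role of o.a.h.fs.FileSystem: one interface, many backends."""

    @abstractmethod
    def describe(self, path: str) -> str: ...


class DistributedFileSystem(FileSystem):
    def describe(self, path: str) -> str:
        return f"HDFS access for {path}"


class LocalFileSystem(FileSystem):
    def describe(self, path: str) -> str:
        return f"local disk access for {path}"


class MapRFileSystem(FileSystem):
    def describe(self, path: str) -> str:
        return f"MapR storage layer access for {path}"


# Hypothetical scheme-to-class registry, similar in spirit to Hadoop's
# fs.<scheme>.impl configuration settings.
REGISTRY: dict[str, type[FileSystem]] = {
    "hdfs": DistributedFileSystem,
    "file": LocalFileSystem,
    "maprfs": MapRFileSystem,
}


def get_filesystem(uri: str) -> FileSystem:
    """Choose the implementation from the URI scheme (no scheme means local)."""
    scheme = urlparse(uri).scheme or "file"
    try:
        return REGISTRY[scheme]()
    except KeyError:
        raise ValueError(f"no FileSystem registered for scheme {scheme!r}") from None
```

Code written against FileSystem does not change when a URI switches from hdfs:// to maprfs://; only the registry entry differs. That is the slide's point: a new storage layer can sit under MapReduce by implementing the same interface.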
One NFS Gateway
What about scalability and high availability?
Multiple NFS Gateways
Multiple NFS Gateways with Load Balancing
Multiple NFS Gateways with NFS HA (VIPs)
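The idea behind NFS HA with virtual IPs is that clients mount a VIP rather than a specific gateway, so when a gateway fails its VIPs can move to surviving gateways and clients keep using the same address. The Python sketch below shows one simple reassignment policy under that assumption; rebalance_vips is a hypothetical helper, not MapR's actual failover logic.

```python
from collections import Counter


def rebalance_vips(current: dict[str, str], live: set[str]) -> dict[str, str]:
    """Move VIPs off dead gateways onto the least-loaded live gateways.

    current maps VIP -> gateway. VIPs already held by a live gateway stay put,
    so their clients see no change at all.
    """
    if not live:
        raise RuntimeError("no live NFS gateways")
    # Keep every assignment whose gateway is still up.
    result = {vip: gw for vip, gw in current.items() if gw in live}
    # Count how many VIPs each live gateway already serves (zero if none).
    load = Counter({gw: 0 for gw in live})
    load.update(result.values())
    # Reassign orphaned VIPs one at a time; ties go to the first gateway by name.
    for vip in sorted(v for v in current if v not in result):
        target = min(sorted(live), key=lambda gw: load[gw])
        result[vip] = target
        load[target] += 1
    return result
```

For example, with six VIPs spread over gw1, gw2, and gw3, losing gw3 leaves the four VIPs on gw1 and gw2 untouched and splits gw3's two VIPs between the survivors.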
Customer Examples: Import/Export Data
§ Network security vendor
 – Network packet captures from switches are streamed into the cluster
 – New pattern definitions are loaded into the online IPS via NFS
§ Online measurement company
 – Clickstreams from application servers are streamed into the cluster
§ SaaS company
 – Exporting a database to Hadoop over NFS
§ Ad exchange
 – Bids and transactions are streamed into the cluster
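Each example above moves data in or out with ordinary file writes. A minimal Python sketch, assuming the cluster is NFS-mounted at a path such as /mapr/my.cluster.com (the cluster name is a placeholder): append_records is a hypothetical helper that appends newline-delimited records using only the standard library, with no Hadoop client involved.

```python
import os
from pathlib import Path


def append_records(mount_root: str, relative_path: str, records: list[str]) -> Path:
    """Append newline-delimited records to a file under an NFS-mounted cluster.

    Nothing here is Hadoop-specific: once the cluster is mounted, the
    ordinary file API is the ingestion API.
    """
    target = Path(mount_root) / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)  # e.g. one directory per day
    with open(target, "a", encoding="utf-8") as f:
        for record in records:
            f.write(record.rstrip("\n") + "\n")  # exactly one newline per record
        f.flush()
        os.fsync(f.fileno())  # push buffered data through to the server
    return target
```

On a gateway host this might be called as append_records("/mapr/my.cluster.com", "clickstream/web01.log", lines); in a test, any local directory works as mount_root, which is exactly the point.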
Customer Examples: Productivity and Operations
§ Retailer
 – Operational scripts are easier to write with NFS than with HDFS + MapReduce
   • chmod/chown, file system searches/greps, perl, awk, tab-complete
 – Consolidate object store with analytics
§ Credit card company
 – User and project home directories on Linux gateways
   • Local files, scripts, source code, …
   • Administrators manage quotas, snapshots/backups, …
§ Large Internet company recommendation system
 – Web servers serve MapReduce results (item relationships) directly from the cluster
§ Email marketing company
 – Object store with HBase and NFS
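The retailer's point about searches and greps is that any script that walks a directory tree works unchanged on an NFS-mounted cluster. As a sketch, grep_tree below (a hypothetical helper) is a small Python stand-in for grep -rn restricted to a filename glob; pointed at the mount path it searches cluster data, and pointed at a local directory it behaves identically.

```python
import re
from pathlib import Path
from typing import Iterator


def grep_tree(root: str, pattern: str, glob: str = "*") -> Iterator[tuple[Path, int, str]]:
    """Yield (path, line_number, line) for every line matching pattern under root."""
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob(glob)):  # sorted for stable output
        if not path.is_file():
            continue  # skip directories
        # errors="replace" keeps binary or mis-encoded files from aborting the scan
        with open(path, encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if regex.search(line):
                    yield path, lineno, line.rstrip("\n")
```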
ODBC
ODBC
§ ODBC
 – Open Database Connectivity
 – Open standard API for accessing a SQL-based backend
 – Developed by Microsoft and Simba Technologies in 1992
§ Flagship API for SQL-based BI and reporting
 – Excel, Tableau, MicroStrategy, Crystal Reports, …
§ Advanced ODBC drivers implement the ODBC 3.52 specification
MapR ODBC Driver
§ MapR provides a Hive ODBC 3.52 driver
 – Developed in partnership with Simba Technologies, co-developer of ODBC
 – Compliant with the ODBC 3.52 specification
   • 32- and 64-bit platform support
   • Windows and Linux
§ Enables direct SQL access to MapR-stored data by translating SQL to HiveQL
§ SQLizer enables seamless connectivity
 – Provides an ANSI SQL-92 front end
 – Targeted at existing apps that generate standard SQL queries
 – Transforms SQL queries into HiveQL queries
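From the application side, an ODBC driver is reached through a connection string and a standard API, so nothing Hive-specific appears in client code. The Python sketch below builds a connection string and runs a query through pyodbc, a third-party ODBC binding. Assumptions: a DSN (here the placeholder name "MapR Hive") has already been configured for the Hive ODBC driver; attribute names other than the standard DSN, DRIVER, UID, and PWD vary by driver; and values are brace-quoted with any closing brace doubled, following the usual ODBC connection-string convention.

```python
def odbc_connection_string(**attrs: str) -> str:
    """Build an ODBC connection string of key=value pairs separated by ';'.

    Values containing ODBC-reserved characters (or spaces) are wrapped in
    braces, with any '}' inside the value doubled.
    """
    special = set(";{}=[](),?*!@ ")
    parts = []
    for key, value in attrs.items():
        if any(ch in special for ch in value):
            value = "{" + value.replace("}", "}}") + "}"
        parts.append(f"{key}={value}")
    return ";".join(parts) + ";"


def run_query(conn_str: str, sql: str) -> list[tuple]:
    """Execute one query through whichever ODBC driver conn_str selects."""
    import pyodbc  # third-party; imported lazily so the builder works without it

    # Hive of this era did not support transactions, so request autocommit mode.
    conn = pyodbc.connect(conn_str, autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return [tuple(row) for row in cursor.fetchall()]
    finally:
        conn.close()  # pyodbc's context manager does not close, so close explicitly
```

A BI tool does the equivalent of run_query(odbc_connection_string(DSN="MapR Hive"), "SELECT COUNT(*) FROM web_logs"), where web_logs is a hypothetical table; per the slide, translating standard SQL into HiveQL is the driver's job, not the application's.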
Example: Tableau
Example: Open source query builder (Kaimon)
Example: Microsoft Excel
In Summary
§ Open standards are important
§ Supporting existing applications and tools that use those standards is valuable
 – Preserves investment in tools
 – Preserves investment in custom applications that preceded Hadoop
 – Leverages skills you already have
Join MapR
§ Join the fastest growing Hadoop company
§ Open positions in every discipline
 – Engineers
 – Solution Architects
 – Product Management
Time for Questions
§ Download slides or send me an email
 – http://www.mapr.com/company/events/speaking/dc-hug-9-18-12
§ Download MapR to learn more – www.mapr.com/download