MANILA* AND SAHARA*: CROSSING THE DESERT TO THE BIG DATA OASIS
Ethan Gafford, Red Hat
Jeff Applewhite, NetApp
Malini Bhandaru, Intel (covering for Wei-Ting Chen)
AGENDA
• Introduction
• Sahara Overview
• Manila Overview
• The goal for Sahara and Manila integration
• The approaches
  • Manila HDFS Driver
  • Manila NFS Share Mount
  • Manila + NetApp NFS Connector for Hadoop
• Conclusion
• Q&A
Intel | NetApp | Red Hat
Sahara: The Problem
• Hadoop* (and Spark*, Storm*…) clusters are difficult to configure
• Commodity hardware is cheap but requires frequent (costly) maintenance
• Reliable hardware is expensive, and a fixed-size cluster will cause contention
• Demand for data processing varies over time within an organization
• Bare-metal clusters go down, and can be a single point of failure
• Hadoop dev is very difficult without a real cluster

TL;DR: Data processing clusters are harder to provision and maintain than they should be, and it hurts.
Sahara: The Solution
Put it in a cloud! Then have easy-to-use, standardized interfaces:
● To create clusters (reliably and repeatedly)
● To scale clusters
● To run data processing jobs
● On any popular data processing framework
● With sensible defaults that just work
● And sophisticated configuration management for expert users
That's OpenStack* Sahara.
Sahara: The API
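As a rough illustration of the API's shape, creating a cluster with the Kilo/Liberty-era sahara CLI looks something like the sketch below. This is illustrative only: the field names follow that era's client documentation, and the UUIDs and keypair name are placeholders you would supply from your own deployment.

```shell
$ sahara cluster-create --json cluster.json
```

where `cluster.json` is roughly:

```json
{
  "name": "demo-cluster",
  "plugin_name": "vanilla",
  "hadoop_version": "2.7.1",
  "cluster_template_id": "<cluster-template-uuid>",
  "default_image_id": "<image-uuid>",
  "user_keypair_id": "<keypair-name>"
}
```

Node group templates and cluster templates are created the same way, so a working cluster definition can be captured once and re-provisioned repeatedly.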
Sahara: Architecture
Manila
Manila Overview
Manila Share and Access APIs

Share operations:
Operation | CLI Command | Description
Create | manila create | Create a Manila share of a specified size; optional name, availability zone, share type, share network, source snapshot
Delete | manila delete | Delete an existing Manila share; the manila force-delete command may be required if the share is in an error state
Edit | manila metadata | Set or unset metadata on a Manila share
List | manila list | List all Manila shares
Show | manila show | Show details about a Manila share

Access operations:
Operation | CLI Command | Description
Allow | manila access-allow | Allow access to the specified share for the specified access type and value (IP address, IP network address in CIDR notation, or Windows user name)
Deny | manila access-deny | Deny access to the specified share for the specified access type and value (IP address, IP network address in CIDR notation, or Windows user name)
List | manila access-list | List all access rules for a Manila share
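The operations above combine into a short share lifecycle. The transcript below is an illustrative sketch using the Kilo/Liberty-era python-manilaclient; the flag names and the example network are assumptions to verify against your client version.

```shell
$ manila create NFS 1 --name demo-share            # 1 GiB NFS share
$ manila list                                      # note the new share's status and id
$ manila access-allow demo-share ip 192.168.0.0/24 --access-level rw
$ manila access-list demo-share
$ manila access-deny demo-share <access-rule-id>
$ manila delete demo-share
```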
Manila & Sahara (NetApp driver enabled*)
The Goal for Sahara and Manila Integration
To support as many storage backends and protocols in Sahara as possible
Sahara Data Processing Model in Kilo*

PATTERN 1: Internal HDFS in the same node
Compute and data reside together in the same instance in your Hadoop cluster.

PATTERN 2: Internal HDFS in different nodes
Compute and data reside in different instances. This is an elastic way to manage Hadoop clusters.

PATTERN 3: Swift*
In order to persist data, Sahara supports Swift to stream the data directly.

(Diagram: each pattern shows a virtual cluster of VMs on hosts, with the computing task and HDFS co-located, separated across VMs, or replaced by Swift.)
Sahara Data Processing Model in Liberty* and the Future

PATTERN 4: External HDFS via Manila*
Sahara can support external HDFS by using the HDFS driver in Manila.

PATTERN 5: Local Storage with a Diverse Storage Backend in Manila
Use local storage in Hadoop and remote-mount any type of file storage in Manila, via the (extensible) Manila NFS driver; GlusterFS is one example backend.

PATTERN 6: NFS
The NetApp* Hadoop NFS Connector can bring NFS capability into Hadoop. This feature will be implemented in Mitaka.

(Diagram: in each pattern the computing task runs in a VM in a virtual cluster, reaching HDFS or NFS storage through the Manila service and its driver.)
Manila HDFS Driver
Use the Manila HDFS driver as external storage in Sahara
Use Case: Manila HDFS Driver

Use Case
● Use external HDFS either on the same node as the compute service or in a physical cluster

Rationales for Use
● Use the Manila HDFS driver to connect with HDFS
● Manila helps to create the HDFS share

The Advantages
● Use an existing HDFS cluster
● Centralized management of HDFS via Manila

Limitations
● Only supports non-secured HDFS, due to the account management issue between OpenStack and Hadoop

Reference: https://blueprints.launchpad.net/manila/+spec/hdfs-driver

(Diagram, steps 1–3: a Manila share backed by an HDFS name node and data nodes; VMs of Tenant A and Tenant B on Compute1–Compute3 reach it through the HDFS driver as User A / User B.)
Enable the HDFS Driver in Manila

Step 1: Set up the Manila configuration
• Edit /etc/manila/manila.conf
• Make sure the login username and password are correct
• The Manila service uses this user to log in to HDFS and create the share folder for each individual user

Step 2: Restart the Manila service

Reference: http://docs.openstack.org/developer/manila/devref/hdfs_native_driver.html

manila.conf example:

share_driver = manila.share.drivers.hdfs.hdfs_native.HDFSNativeShareDriver
hdfs_namenode_ip = <IP address of the HDFS namenode; only a single namenode is supported for now>
hdfs_namenode_port = <port of the HDFS namenode service>
hdfs_ssh_port = <HDFS namenode SSH port>
hdfs_ssh_name = <HDFS namenode SSH login name>
hdfs_ssh_pw = <HDFS namenode SSH login password; not necessary if hdfs_ssh_private_key is configured>
hdfs_ssh_private_key = <path to the private key used to SSH to the HDFS namenode>
…
Add External HDFS as a Data Source in Sahara
• Make sure the user account "hdfs" has been set up on the HDFS side
• Sahara uses the "hdfs" user to access external HDFS by default; you can still set up your own user account in Sahara as well
• Add the external HDFS location as a data source in Sahara

Limitation: no further user account setup is needed, since currently only non-secured HDFS is supported.
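Registering the external HDFS location then looks roughly like this with the Sahara CLI. This is an illustrative sketch: the flag names follow the Kilo/Liberty-era client, the hostname is a placeholder, and 8020 is merely a typical namenode RPC port.

```shell
$ sahara data-source-create --name external-hdfs-input \
    --data-source-type hdfs \
    --url hdfs://<namenode-host>:8020/user/hdfs/input
```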
NFS Share Mounting
Binary storage and input / output data from Manila-provisioned NFS shares
The Feature
• Mount Manila NFS shares to:
  • All nodes in a cluster
  • Specific node groups (name nodes, etc.)
• Currently NFS-only, but extensible to other share types
• API (see below)
  • Path and access defaults shown
  • Only the id field is needed
• Intended for non-EDP users; EDP users can use auto-mounting

"shares": [
  {
    "id": "uuid",
    "path": "/mnt/uuid",
    "access_level": "rw"
  }
]
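The defaults described above can be sketched as a small helper. This is illustrative only, not Sahara's actual code; it simply encodes the documented behavior that `id` is required, `path` defaults to `/mnt/<id>`, and `access_level` defaults to `rw`.

```python
def normalize_share(spec):
    """Fill in Sahara's documented defaults for one 'shares' entry.

    Only 'id' is required; 'path' defaults to /mnt/<id> and
    'access_level' defaults to 'rw'.
    """
    if "id" not in spec:
        raise ValueError("share spec requires an 'id' field")
    return {
        "id": spec["id"],
        "path": spec.get("path", "/mnt/%s" % spec["id"]),
        "access_level": spec.get("access_level", "rw"),
    }

print(normalize_share({"id": "uuid"}))
```

So a user who only supplies the share UUID gets a read-write mount at a predictable path, which is what the auto-mounting flow for EDP users relies on.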
Use Case: Binary Data Storage
• "Job binaries": *.jar, *.pig, etc.
  • Comparatively small size
  • Initial location irrelevant to performance
• Previous storage options in Sahara
  • Swift (still available)
  • Sahara DB (as blobs in a SQL table)
• Rationales for NFS storage
  • Version control directly on the storage FS
  • Long-term storage for use by transient clusters
  • HDFS clusters on separate networks can route to a common repository
  • Read-only access control from clusters is useful in this case
Use Case: Input / Output Data

Previous options in Sahara
● Cluster-internal HDFS
● External HDFS
● Swift

Rationales for use
● Standard FS access to data
● Convenient in many cases

Data copy necessary
● Similar to the built-in hadoop fs -put operation
● Irrelevant in the heavily reduced output or small input case
● In the large input case, network transfer is a consideration

Reference: https://blueprints.launchpad.net/sahara/+spec/manila-as-a-data-source

(Diagram, steps 1–3, using GlusterFS as an example: a Manila share backed by Gluster volumes on Gluster nodes is mounted, through any driver, into VMs with local storage across Tenant A and Tenant B on Compute1–Compute3.)
Workflow: NFS Binary Storage and Input Data
1. Create a Manila NFS share
2. Place the binary file on the share at /absolute/path/to/binary.jar
3. Create a Sahara job binary object with the path reference manila://share_uuid/absolute/path/to/binary.jar
4. Utilize the job binary in a job template (per normal)
5. Create a Sahara data source with the path reference manila://share_uuid/absolute/path/to/input_dir
6. Run a job from the template using the data source
Automatic Mounting
• The API field is only necessary to mount shares for non-EDP users
• Sahara's EDP API mounts needed shares to a long-standing cluster when a job references any data source or binary on that share
• Uses the defaults for permissions (rw) and path (/mnt/share_uuid/)
Automatic Mounting: Under the Hood

All frameworks (universal flow, per cluster node):
Check to ensure required shares are mounted. If not:
1) Install nfs-common (Debian*) or nfs-utils (Red Hat) if not present
2) Get the remote path for the share UUID from Manila
3) Manila: access-allow for each required IP in the cluster (if access does not exist)
4) mount -t nfs %(access_arg)s %(remote_path)s %(local_path)s

All frameworks (universal flow):
- Job binaries: translate manila://uuid/absolute/path to /local_path/absolute/path
- Data sources: translate manila://uuid/absolute/path to file:///local_path/absolute/path

Hadoop (w/ Oozie):
- Job binaries: hadoop fs -copyFromLocal into the workflow directory; referenced as filesystem paths in the workflow
- Data sources: use the file URL in the Oozie workflow document (as a named job parameter or positional argument)

Spark:
- Job binaries: referenced by local filesystem path in the spark-submit call
- Data sources: use the file URL in the spark-submit call (as a positional argument)

Storm:
- Job binaries: referenced as filesystem paths in the storm jar call
- Data sources: use the file URL in the storm jar call (as a positional argument)
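The universal-flow path translation can be sketched in a few lines of Python. This is an illustrative helper, not Sahara's actual implementation; it assumes the documented default mount root of /mnt/<uuid>.

```python
def manila_url_to_local(url, as_file_url=False):
    """Translate manila://<uuid>/<absolute/path> to its local mount path.

    Shares are auto-mounted at /mnt/<uuid> by default; data sources
    additionally get a file:// scheme so frameworks can consume them.
    """
    prefix = "manila://"
    if not url.startswith(prefix):
        raise ValueError("not a manila:// URL: %s" % url)
    uuid, _, rest = url[len(prefix):].partition("/")
    local = "/mnt/%s/%s" % (uuid, rest)
    return "file://" + local if as_file_url else local

print(manila_url_to_local("manila://abcd-1234/data/input"))
# /mnt/abcd-1234/data/input
```

Job binaries get the bare path form; data sources get the `as_file_url=True` form, which is what lands in the Oozie workflow document or the spark-submit / storm jar call.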
Screenshots
NetApp Hadoop NFS Connector
Future proposal: use the NetApp Hadoop NFS Connector in Sahara
30Intel NetApp RedHat
NetApp NFS Connector - Architecture Overview
● NFS client written in Java
● Implements the Hadoop filesystem API
● No changes to the Hadoop framework
● No changes to user programs
● Eliminates copying data into HDFS
● Optimized performance for NFS access
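Because the connector plugs into the Hadoop filesystem API, enabling it is roughly a core-site.xml change. The fragment below is an illustrative sketch; the property and class names are taken from memory of the connector's GitHub README and should be verified against the version you deploy.

```xml
<!-- Illustrative core-site.xml fragment; verify names against the connector README -->
<property>
  <name>fs.nfs.impl</name>
  <value>org.apache.hadoop.fs.nfs.NFSv3FileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.nfs.impl</name>
  <value>org.apache.hadoop.fs.nfs.NFSv3AbstractFilesystem</value>
</property>
```

With the filesystem registered, jobs can address data with nfs:// URIs directly, as in the terasort examples later in this deck.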
Sahara + Manila + NetApp NFS Connector

Use Case
● Use the NFS protocol to access data for Hadoop

How to use
1. Use Manila to expose the NFS share
2. Use the NetApp Hadoop NFS Connector as the "interface" to shared data

The Advantages
● NFS is one of the most common storage protocols used in IT
● A direct way to communicate with and process data instead of using HDFS

Reference: https://blueprints.launchpad.net/sahara/+spec/nfs-as-a-data-source

(Diagram, steps 1–3: a Manila share exposed through the NetApp NFS driver; each VM on Compute1–Compute3, across Tenant A and Tenant B, runs the NetApp NFS Connector and mounts an NFS folder from NFS nodes backing the share.)
NetApp NFS Connector
● Deployment choices:
  ○ NFS (v3)
  ○ HDFS + NFS
● Open source
● Snapshot, FlexClone, SnapMirror, and Manila disaster recovery (Mitaka)
NetApp Hadoop NFS Plugin
Use the NetApp NFS Connector to run Hadoop on your existing data:
• $ hadoop jar <path-to-examples>.jar terasort nfs://<nfs-server-hostname>:2049/tera/in /tera/out
• $ hadoop jar <path-to-examples>.jar terasort nfs://<nfs-server-hostname>:2049/tera/in nfs://<nfs-server-hostname>:2049/tera/out
References:
1. http://www.netapp.com/us/solutions/big-data/nfs-connector-hadoop.aspx
2. https://github.com/NetApp/NetApp-Hadoop-NFS-Connector
Summary
The choices:
a) Manila HDFS Driver
b) Manila NFS Share Mount
   https://www.netapp.com/us/media/tr-4464.pdf
c) NetApp NFS Connector for Hadoop
   https://github.com/NetApp/NetApp-Hadoop-NFS-Connector
Sahara and Manila: Access the Big Data Oasis
Participating in the Intel Passport Program?
Are you playing? Be sure to get your Passport Stamp for attending this session! See me or my helper in the back at the end!
Not Playing yet? What are you waiting for? See me or my helper in the back at the end and we can get you started!
Don’t forget to return your stamped passport to the Intel Booth #H3 to enter our raffle drawing! 3 Stamps = 1 Raffle Ticket
THANK YOU!