

Abstract

This guide walks you through the process of installing Isilon OneFS with Hadoop for use with the Hortonworks Data Platform (HDP) and Apache Ambari manager.

EMC ISILON ONEFS WITH HADOOP AND HORTONWORKS INSTALLATION GUIDE VERSION 1.07


Isilon OneFS Hadoop and Hortonworks Installation Guide

Copyright © 2017 Dell Inc. or its subsidiaries. All rights reserved.

April 2017

Dell believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS-IS.“ DELL MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. USE, COPYING, AND DISTRIBUTION OF ANY DELL SOFTWARE DESCRIBED IN THIS PUBLICATION REQUIRES AN APPLICABLE SOFTWARE LICENSE.

Dell, EMC, and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be the property of their respective owners. Published in the USA.

EMC Corporation Hopkinton, Massachusetts 01748-9103 1-508-435-1000 In North America 1-866-464-7381 www.EMC.com


Publication History

Version Date Description

1.01 July 20, 2016 Initial version.

1.02 August 23, 2016 Corrections to Ambari wizard procedures, including HTTPS instructions.

1.03 November 22, 2016 Corrections to the Customize Services screen for the HDFS service.

1.04 February 14, 2017 Added the Addendum for OneFS 8.0.1 with security features. Also, reorganized and updated information in the Installing OneFS with Hadoop and Configuring Ambari and Hortonworks sections.

1.05 March 22, 2017 Changed Ambari version “2.5 or later” to Ambari version “2.4 or later” in the Prerequisites section for Apache Ranger.

1.06 March 31, 2017 Changed the isi license activate command to isi license licenses activate.

Fixed the syntax to modify existing subnets and moved it to the Configure SmartConnect section.

Documented a setting that prevents an error in the Ambari Metrics.

1.07 October 5, 2017 Added “Updates and Additional Information on Isilon Hadoop Installs” and “Isilon OneFS cluster configuration” sections.


Contents

Introduction
Audience
Overview
Updates and additional information on Isilon Hadoop installs
Prerequisites
  Ambari and Hortonworks Data Platform
    Isilon Multi-tenant Installations
  Isilon OneFS cluster configuration
Installing OneFS with Hadoop
  Preparing OneFS
    Validate OneFS version and license activation
    Configure OneFS components
    Create an access zone
    Configure SmartConnect
    Configure DNS for Isilon
    Verify the SmartConnect configuration
    Create and configure Isilon HDFS root
    Modify the access control list (ACL) setting for OneFS
    Create HDFS users and groups
  Preparing Ambari
  Configuring Ambari and Hortonworks
  Verify the deployment
Addendum
  Apache Ranger authorization policy integration
    Prerequisites
    Deploy Apache Ranger
  Ambari metrics and alerts overview
    Prerequisites
    View Ambari metrics
    View Ambari alerts
  HDFS wire encryption overview
    Configure HDFS Wire Encryption with OneFS
  Integrating HTTPFS and Knox with OneFS
Contacting EMC Isilon Technical Support


Introduction

Hadoop is an open-source framework that enables the distributed processing of large data sets across clusters of computers. Hadoop clusters use the Hadoop Distributed File System (HDFS) to provide high-throughput access to application data. You can follow the steps in this guide to install EMC Isilon OneFS with Hadoop for use with the Hortonworks Data Platform (HDP) and the Apache Ambari manager.

Before you begin, you must install an Isilon OneFS cluster.

Audience

This guide is intended for systems administrators, IT program managers, IT architects, and IT managers who will be installing OneFS with Hadoop.

Overview

The EMC Isilon OneFS scale-out network-attached storage (NAS) platform provides Hadoop clients with direct access to Big Data through a Hadoop Distributed File System (HDFS) interface. An Isilon OneFS cluster powered by the OneFS operating system delivers a scalable pool of storage with a global namespace.

Hadoop compute clients can access the data that is stored in an Isilon OneFS cluster by connecting to any node over the HDFS protocol. All nodes configured for HDFS provide NameNode and DataNode functionality. Each node boosts performance and expands the cluster's capacity. For Hadoop analytics, the Isilon scale-out distributed architecture minimizes bottlenecks, rapidly serves Big Data, and optimizes performance for MapReduce jobs.

In a traditional Hadoop deployment, the Hadoop compute nodes run analytics jobs against large sets of data. A NameNode directs the nodes to the data stored on a series of DataNodes. The NameNode is a separate server that holds metadata for every file that is stored on the DataNode. Often data is stored in production environments and then copied to a landing zone server to be loaded on to HDFS. This process is network intensive and exposes the NameNode as a potential single point of failure.


In an Isilon OneFS cluster with Hadoop deployment, OneFS serves as the file system for Hadoop compute clients. On an Isilon OneFS cluster, every node in the cluster acts as a NameNode and DataNode, providing automated failover protection.

When a Hadoop client runs a job, the clients access the data that is stored on an Isilon OneFS cluster by connecting over HDFS. The HDFS protocol is native to the OneFS operating system, and no data migration is required.

The Hortonworks distribution is stored on the compute cluster, and the clients connect to the Isilon OneFS cluster over the HDFS protocol to store and access Hadoop data.


Updates and additional information on Isilon Hadoop installs

The rapid release of new features and versions of Hadoop projects can introduce new behaviors and requirements. We recommend that you review the Using Hadoop with Isilon - Isilon Info Hub for the latest updates and known issues while deploying Isilon and Hadoop.

Prerequisites

For supported versions, see the Hadoop Distributions and Products Supported by OneFS compatibility matrix.

Ambari and Hortonworks Data Platform

Isilon Multi-tenant Installations

If this Hadoop cluster will be a multi-tenant Isilon implementation that leverages multiple access zones with the same Active Directory provider, additional configuration and considerations must be addressed before you install HDP and Ambari. We recommend that you engage Dell EMC Professional Services to understand the requirements and deployment strategies available before deploying Ambari.

Ensure that the following requirements are met:

Ambari 2.1.x or later.

Hortonworks Data Platform (HDP) 2.3.x or later.

Password-less SSH configured. See section 4 of the Hortonworks documentation for configuring password-less SSH.

Familiarity with the Ambari and Hortonworks documentation and the installation instructions.

o To view the Ambari and the Hortonworks Data Platform (HDP) documents, go to http://docs.hortonworks.com/index.html

o Use the following table to record the components that you have installed.


Component Version

Ambari version

HDP stack version

Isilon OneFS cluster name

Ambari server (FQDN)

Isilon OneFS cluster configuration

Ensure that the following requirements are met:

An Isilon OneFS cluster running OneFS 8.0.0.1 or greater. OneFS 8.0.0.1 contains a number of updates to facilitate the integration and deployment of Hadoop. We strongly recommend that you use OneFS 8.0.0.1 or greater.

The Isilon OneFS cluster has access to your network and your network has access to the Internet. Internet access is required to download components.

SmartConnect Advanced, a separately licensed Isilon module, is activated and SmartConnect is configured on your Isilon OneFS cluster.

HDFS, a separately licensed Isilon module, is activated on your Isilon OneFS cluster. Contact your EMC Isilon sales representative for more information about receiving your license keys.

A valid Isilon SmartConnect SSIP and Domain Name System (DNS) delegation is in place to provide name resolution services for a SmartConnect zone. For more information, see Isilon External Network Connectivity Guide—Routing, Network Topologies, and Best Practices for SmartConnect.

A dedicated Isilon access zone is in use. This is not the same as the System zone.

An Isilon HDFS root directory exists in the access zone.

A simple access model exists between Hadoop and Isilon: UID and GID, with parity.

Use the following table to record the components that you have installed:

Component Version/License

Isilon OneFS

SmartConnect module

HDFS module

Isilon OneFS cluster name

Installing OneFS with Hadoop

The installation of OneFS with Hadoop can be separated into four stages: 1. Preparing OneFS, 2. Preparing Ambari, 3. Configuring Ambari, and 4. Verifying the deployment.


To complete the stages, you must perform tasks on both the Hortonworks/Ambari cluster and the Isilon OneFS cluster.

Preparing OneFS

Complete the following steps to configure your Isilon OneFS cluster for use with Ambari and the Hortonworks Data Platform. Preparing OneFS requires you to configure DNS, SmartConnect, and access zones so that the Hadoop cluster can connect to the Isilon OneFS cluster. If these preparation steps are not successful, the subsequent configuration steps might fail.

Validate OneFS version and license activation

You must validate your OneFS version, check your licenses, and confirm that they are activated.

1. From a node in your Isilon OneFS cluster, confirm that the cluster is running OneFS 8.0.0.x by typing the following command:

isi version

Note: If you are using a version of OneFS earlier than 8.0.0.0, the implementation steps in this guide will not work.

2. Confirm that licenses for HDFS and SmartConnect Advanced are operational. If these licenses are not active and valid, some commands in this guide might not work.

Run the following commands to confirm that HDFS and SmartConnect Advanced are installed:

isi license licenses list

isi license licenses view HDFS

isi license licenses view "SmartConnect Advanced"

3. If your modules are not licensed, obtain a license key from your EMC Isilon sales representative. Type the following command to activate the license:



isi license licenses activate <key>

Configure OneFS components

After you configure DNS for OneFS, set up and configure the following OneFS components.

Create an access zone

Create a SmartConnect zone

Create and configure the HDFS root in the access zone

Create users and groups

Create a basic HDFS folder structure for use with HDFS

Use the following table to record the configuration information for the integration of Isilon OneFS cluster with HDP Ambari:

Parameter Value

Access zone name

Access zone path

SmartConnect zone name (FQDN)

IP range for IP pool (ranges)

SmartConnect pool name (subnet pool)

Node and interfaces in the pool

HDFS root path

Ambari server

Ambari NameNode

Open Data Platform (ODP) version

Create an access zone

On one of the Isilon nodes, you must define an access zone on the Isilon OneFS cluster and enable the Hadoop node to connect to it.

1. On a node in the Isilon OneFS cluster, create your Hadoop access zone.

isi zone zones create --name=zone1-hdp --path=/ifs/zone1/hdp --create-path

2. Verify that the access zones are set up correctly.

isi zone zones list --verbose

Output similar to the following appears:


Name: System

Path: /ifs

Groupnet: groupnet0

Map Untrusted: -

Auth Providers: lsa-local-provider:System, lsa-file-provider:System

NetBIOS Name: -

User Mapping Rules: -

Home Directory Umask: 0077

Skeleton Directory: /usr/share/skel

Cache Entry Expiry: 4H

Zone ID: 1

--------------------------------------------------------------------------------

Name: zone1-hdp

Path: /ifs/zone1/hdp

Groupnet: groupnet0

Map Untrusted: -

Auth Providers: lsa-local-provider:zone1-hdp

NetBIOS Name: -

User Mapping Rules: -

Home Directory Umask: 0077

Skeleton Directory: /usr/share/skel

Cache Entry Expiry: 4H

Zone ID: 2

3. Create the HDFS root directory within the access zone that you created.

mkdir -p /ifs/zone1/hdp/hadoop

4. List the contents of the Hadoop access zone root directory.

ls -al /ifs/zone1/hdp

Configure SmartConnect

On a node in the Isilon OneFS cluster, add a dynamic IP address pool and associate it with the access zone you created earlier.

1. Modify your existing subnets and specify a service address.

isi network subnets modify groupnet0.subnet0 --sc-service-addr=x.x.x.x

2. Create a dynamic IP address pool and associate it with the access zone.

isi network pools create --id=<groupnet>:<subnet>:<name> --ranges=x.x.x.x-x.x.x.x --access-zone=<my-access-zone> --alloc-method=dynamic --ifaces=X-Y:<your interfaces> --sc-subnet=subnet0 --sc-dns-zone=<my-smartconnectzone-name> --description=hadoop


Where:

--id—The new IP pool, specified as <groupnet>:<subnet>:<poolname> (for example, groupnet0:subnet0:pool1)

--ranges—IP range that is assigned to the IP pool

--ifaces—Node interfaces that are added to the pool

--access-zone—Access zone that the pool is assigned to

--sc-dns-zone—SmartConnect zone name

--sc-subnet—SmartConnect service subnet that is responsible for this zone
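As a concrete illustration, filling the placeholders with the example values used elsewhere in this guide (the zone1-hdp access zone, a pool named hadoop-pool-hdp, interfaces ext-1 on nodes 1-4, and the isilonsczone-hdp.FQDN SmartConnect zone) might produce a command like the following; the IP range remains site-specific:

```shell
# Illustrative only -- the pool name, zone, and interfaces must match your cluster.
isi network pools create --id=groupnet0:subnet0:hadoop-pool-hdp \
    --ranges=x.x.x.x-x.x.x.x \
    --access-zone=zone1-hdp \
    --alloc-method=dynamic \
    --ifaces=1-4:ext-1 \
    --sc-subnet=subnet0 \
    --sc-dns-zone=isilonsczone-hdp.FQDN \
    --description=hadoop
```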

3. View the properties of the existing pool.

isi network pools view --id=groupnet0:subnet0:hadoop-pool-hdp

Output similar to the following appears:

ID: groupnet0.subnet0.hadoop-pool-hdp

Groupnet: groupnet0

Subnet: subnet0

Name: hadoop-pool-hdp

Rules: -

Access Zone: zone1-hdp

Allocation Method: dynamic

Aggregation Mode: lacp

SC Suspended Nodes: -

Description: hdp_hadoop_access_zone

Ifaces: 1:ext-1, 2:ext-1, 3:ext-1, 4:ext-1

IP Ranges: x.x.x.x-x.x.x.x

Rebalance Policy: auto

SC Auto Unsuspend Delay: 0

SC Connect Policy: round_robin

SC Zone: isilonsczone-hdp.FQDN

SC DNS Zone Aliases: -

SC Failover Policy: round_robin

SC Subnet: subnet0

SC Ttl: 0

Static Routes: -

Configure DNS for Isilon

Before you begin, the Isilon OneFS cluster must already be implemented according to EMC Isilon best practices. For more information, see the HDFS Setup section of the EMC Isilon Best Practices for Hadoop Data Storage.


Set up DNS records for a SmartConnect zone. You must create the required DNS records that are used to access your Isilon OneFS cluster from the Hadoop cluster. All hosts in your Hadoop cluster must be configured for both forward and reverse DNS lookups. Hadoop relies heavily on DNS and performs many DNS lookups during normal operation.
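The forward/reverse requirement can be spot-checked from any Hadoop host before proceeding. The following sketch is not part of the original procedure; it uses getent so that the same resolver path Hadoop uses is exercised, and the host name argument shown is the example host from this guide.

```shell
# Sketch: for each host name given, confirm that the forward lookup
# returns an address and that the reverse lookup of that address
# returns the same name.
check_dns() {
  for h in "$@"; do
    ip=$(getent hosts "$h" | awk '{print $1; exit}')
    if [ -z "$ip" ]; then
      echo "FAIL $h: no forward record"
      continue
    fi
    back=$(getent hosts "$ip" | awk '{print $2; exit}')
    if [ "$back" = "$h" ]; then
      echo "OK   $h -> $ip -> $back"
    else
      echo "FAIL $h: reverse lookup returned '$back'"
    fi
  done
}

check_dns hadoop-node.my.domain.com
```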

You can set up a SmartConnect zone for the connections from Hadoop compute clients. SmartConnect is a module that specifies how the Isilon OneFS cluster handles connection requests from clients. For additional information and best practices for SmartConnect, see the EMC Isilon External Network Connectivity Guide.

Each SmartConnect zone represents a specific pool of IP addresses. When you associate a SmartConnect zone with an access zone, OneFS allows only clients that connect through the IP addresses in the SmartConnect zone to reach the HDFS data in the access zone. A root HDFS directory is specified for each access zone. This configuration isolates data within access zones and allows you to restrict client access to the data.

A SmartConnect zone distributes NameNode requests from Hadoop compute clients across the node interfaces in the IP pool. Each node's NameNode process will reply with the IP address of any OneFS node where the client can access the data. When a Hadoop compute client makes an initial DNS request to connect to the SmartConnect zone FQDN, the Hadoop client requests are delegated to the SmartConnect Service IP, which responds with a valid node to connect to. The client will then connect to an Isilon node that serves as a NameNode. When a second Hadoop client makes a DNS request to connect to the SmartConnect zone, the SmartConnect Service routes the client connection to a different node than the one used by the previous Hadoop compute client.

When you create a SmartConnect zone, you must add a Name Server (NS) record as a delegated domain to the authoritative DNS zone that contains the Isilon OneFS cluster.

For additional information and best practices, see the "DNS delegation best practices" section of the EMC Isilon External Network Connectivity Guide.

Verify the SmartConnect configuration

You must validate that SmartConnect is set up correctly by pinging the SmartConnect zone FQDN several times from the Hadoop client.

ping isilonsczone-hdp.FQDN

When you view the output of this command, note that a different IP address is returned for each ping: each DNS response rotates round-robin through the pool's list of potential IP addresses. This validates that the SmartConnect zone FQDN is operating correctly.
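You can also observe the rotation without reading ping output by resolving the zone name repeatedly; because the pool's SmartConnect TTL is 0, each query should return a different pool address (client-side caching, if any, can mask this). A possible loop, using this guide's example zone name:

```shell
# Resolve the SmartConnect zone several times and print the address
# returned for each query; the pool IPs should rotate round-robin.
for i in 1 2 3 4; do
  getent hosts isilonsczone-hdp.FQDN | awk '{print $1; exit}'
done
```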

Create and configure Isilon HDFS root

On a node in the Isilon OneFS cluster, create and configure the HDFS root directory.

1. View the HDFS service settings.

isi hdfs settings view --zone=zone1-hdp

2. Set the HDFS root directory for the access zone.

isi hdfs settings modify --zone=zone1-hdp --root-directory=/ifs/zone1/hdp/hadoop


3. Map the HDFS user to Isilon root. Create a user mapping rule to map the HDFS user to the OneFS root account. This mapping enables the services from the Hadoop cluster to communicate with the OneFS cluster using the correct credentials.

isi zone zones modify --user-mapping-rules="hdfs=>root" --zone=zone1-hdp

4. Assign the Ambari NameNode in the access zone and associate the SmartConnect name with it.

isi hdfs settings modify --zone=zone1-hdp --ambari-namenode=<my-smartconnectzone-name>

5. Assign the Ambari server in the access zone.

Note that the host FQDN specified here is the one running the Ambari server.

isi hdfs settings modify --zone=zone1-hdp --ambari-server=hadoop-node.my.domain.com

6. Create an indicator file in the Hadoop directory to view your Isilon OneFS cluster and access zone through HDFS.

touch /ifs/zone1/hdp/hadoop/THIS_IS_ISILON-hdp.txt
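Later, once HDP services are running, this indicator file gives you a quick way to confirm that Hadoop clients are reading from the Isilon access zone rather than from a local HDFS: list the root of the file system from any Hadoop node.

```shell
# Run from a Hadoop node after HDP is deployed; the listing of the
# HDFS root should include the indicator file created above.
hadoop fs -ls /
```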

Modify the access control list (ACL) setting for OneFS

Run the following command on a node in the Isilon OneFS cluster to modify ACL settings before you create directories or files in the next section. This creates the correct permission behavior on the cluster for HDFS.

Note: ACL policies are cluster-wide, so you should understand this change before performing it on production clusters.

isi auth settings acls modify --group-owner-inheritance=parent

Validate the change:

isi auth settings acls view

Create HDFS users and groups

For each Hadoop system account that will submit HDFS jobs or access the file system, you must create local users on the Isilon OneFS cluster. You can add Hadoop users to the Isilon OneFS cluster manually or by using the script provided at: https://github.com/Isilon/isilon_hadoop_tools

Note: The script creates the service accounts that are required for Hadoop.

Important

EMC Isilon recommends that you maintain consistent names and numeric IDs for all users and groups on the Isilon OneFS cluster and your Hadoop clients. This consistency is important in multi-protocol environments because the HDFS protocol refers to users and groups by name, and NFS refers to users and groups by their numeric IDs (UIDs and GIDs). Maintaining this parity is critical to correct OneFS multiprotocol file access.

When installing Hadoop, the installer creates all of the required system accounts. For example, a Hadoop system account, yarn, is created with the UID of 502 and the GID of 500 on the Hadoop cluster nodes. Since the Hadoop installer cannot create the local accounts directly on OneFS, they need to be created manually. You must create the Isilon yarn local account user in the Isilon access zone in which yarn will access data. You need to create a local user yarn with the UID of 502 and the GID of 500 to ensure consistency of access and permissions.


For guidance and more information about maintaining parity between Isilon OneFS and Hadoop local users and UIDs, see the following blog post:

https://community.emc.com/community/products/isilon/blog/2016/06/22/isilon-and-hadoop-user-uid-parity

There are many methods of achieving UID and GID parity. You can leverage the user and directory creation scripts, perform manual matching, or create scripts that parse users and create the equivalent users. Whichever method you choose, the sequence will depend on your deployment methodology and management practices. EMC highly recommends that you achieve UID and GID consistency between the Hadoop cluster and OneFS, for example, hdfs=hdfs, yarn=yarn, hbase=hbase, and so on.
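One simple way to audit parity after accounts exist on both sides is to compare name:UID:GID triples from a Hadoop host and from OneFS. The sketch below is illustrative: the two input files are hypothetical stand-ins for a Hadoop node's /etc/passwd and an equivalent export of the access zone's local users, and the sample entries deliberately contain one mismatch.

```shell
# Build sample inputs (in practice, copy /etc/passwd from a Hadoop node
# and export the OneFS zone's local users into the same name:x:uid:gid form).
cat > hadoop_users.txt <<'EOF'
hdfs:x:501:501
yarn:x:502:500
EOF
cat > isilon_users.txt <<'EOF'
hdfs:x:501:501
yarn:x:503:500
EOF

# Reduce each file to name:uid:gid and diff the sorted sets; any line
# printed identifies an account whose IDs differ between the two systems.
awk -F: '{print $1 ":" $3 ":" $4}' hadoop_users.txt | sort > hadoop.ids
awk -F: '{print $1 ":" $3 ":" $4}' isilon_users.txt | sort > isilon.ids
comm -3 hadoop.ids isilon.ids
```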

Create users and directories on the Isilon OneFS cluster using Isilon scripts

This methodology achieves parity by executing user creation in the following sequence:

1. Create local users and groups on Isilon OneFS.

2. Collect the UIDs and GIDs of the users.

3. Pre-create local users and groups on all HDP hosts to be deployed.

If the local Hadoop system users and groups already exist on the Linux host, then the Ambari wizard does not create them. If you created them with the same UIDs and GIDs as on OneFS, you will maintain parity.

You must add a user on the Isilon OneFS cluster for each user that runs Hadoop services and user that submits jobs, and you must create any additional users that may access the Isilon OneFS cluster.

In the following steps, you can run an Isilon script to create local user and group accounts on the Isilon OneFS cluster. The Isilon script adds a list of default Hadoop users to the Isilon OneFS cluster that are mapped to the services and applications on the Hadoop cluster.

Note: If the users and groups must be defined by your directory services, such as Active Directory or LDAP, you must create the users and groups manually as outlined later in Create users on the Isilon OneFS cluster manually.

The isilon_create_users.sh script creates users that can be listed by running the following command:

cat zone1-hdp.passwd

For more information on the list of accounts you need based on services, go to http://docs.hortonworks.com/index.html

Steps to perform on the Isilon OneFS cluster

1. On a node in the Isilon OneFS cluster, create a scripts directory. You will extract the scripts to this directory.

mkdir -p /ifs/zone1/hdp/scripts

2. Change to the scripts directory that you created.

cd /ifs/zone1/hdp/scripts

3. Clone or download the latest version of the Isilon Hadoop tools as a ZIP file from https://github.com/Isilon/isilon_hadoop_tools into the /ifs/zone1/hdp/scripts directory.


4. Unzip the files.

5. Use WinSCP to transfer the isilon_create_users.sh and isilon_create_directories.sh script files to the /ifs/zone1/hdp/scripts directory or use wget to get the scripts directly.

Alternatively, run the following commands if the Isilon OneFS cluster has direct internet access:

wget https://raw.githubusercontent.com/Isilon/isilon_hadoop_tools/master/isilon_create_users.sh --no-check-certificate

wget https://raw.githubusercontent.com/Isilon/isilon_hadoop_tools/master/isilon_create_directories.sh --no-check-certificate

cd /ifs/zone1/hdp/scripts

chmod u+x *

6. To create all required local users and groups on your Isilon OneFS cluster for the Hadoop services and applications, run the following script:

Script usage:

bash isilon_create_users.sh --dist=hwx --startgid <GID> --startuid <UID> --zone=<ZONE>

Where:

dist The Hadoop distribution: cdh, hwx, phd, phd3, or bi.

startgid The beginning of the GID range for the groups that are created (the default is 1000).

startuid The beginning of the UID range for the users that are created (the default is 1000).

zone The name of the access zone where the users should be created (useful for multi-tenant environments that will use a single KDC). The System zone is the default.

Example:

bash isilon_create_users.sh --dist=hwx --startgid=501 --startuid=501 --zone=zone1

Output similar to the following appears:

Info: Hadoop distribution: hwx

Info: groups will start at GID 501

Info: users will start at UID 501

Info: will put users in zone: zone1

Info: HDFS root: /ifs/zone1/hdp

SUCCESS -- Hadoop users created successfully!

Done!
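When choosing --startuid and --startgid values, it helps to avoid IDs already used by local accounts on the Hadoop hosts. The following sketch is illustrative only (it is not part of the Isilon tools); it suggests the next free ID from a passwd- or group-format file:

```shell
# Suggest the next free numeric ID above the highest local ID in a
# passwd- or group-format file (the ID is field 3 in both formats).
# The 60000 cutoff skips system sentinels such as nobody (65534).
next_free_id() {
    awk -F: '$3 > max && $3 < 60000 { max = $3 } END { print max + 1 }' "$1"
}

next_free_id /etc/passwd   # candidate for --startuid
next_free_id /etc/group    # candidate for --startgid
```

Running this against /etc/passwd and /etc/group on each host gives candidate --startuid and --startgid values that will not collide with existing local accounts.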

7. To create directories to map to the Hadoop users with appropriate ownership and permissions, download the isilon_create_directories.sh from https://github.com/Isilon/isilon_hadoop_tools and run the following script:


Script usage:

bash isilon_create_directories.sh --dist=hwx --zone=<ZONE> --fixperm

Where:

dist The Hadoop distribution for which the script should create the HDFS directory skeleton.

zone The name of the access zone where the directory skeleton should be created.

fixperm The argument to set the permissions and ownership of the directory skeleton according to the users created by the isilon_create_users.sh script.

Example:

bash isilon_create_directories.sh --dist=hwx --fixperm --zone=zone1

Output similar to the following appears:

Info: Hadoop distribution: hwx

Info: will fix permissions and owners on existing directories

Info: will use users in zone: zone1

Info: Access Zone ID is 2

Info: HDFS root dir is /ifs/zone1/hdp

###############################################################################
## Creates Hadoop directory structure on Isilon system HDFS.
###############################################################################

DEBUG: specs dirname /; perm 755; owner hdfs; group hadoop

DEBUG: specs dirname /app-logs; perm 777; owner yarn; group hadoop

DEBUG: specs dirname /app-logs/ambari-qa; perm 770; owner ambari-qa; group hadoop

DEBUG: specs dirname /app-logs/ambari-qa/logs; perm 770; owner ambari-qa; group hadoop

DEBUG: specs dirname /tmp; perm 1777; owner hdfs; group hdfs

DEBUG: specs dirname /apps; perm 755; owner hdfs; group hadoop

DEBUG: specs dirname /apps/falcon; perm 777; owner falcon; group hdfs

DEBUG: specs dirname /apps/accumulo/; perm 750; owner accumulo; group hadoop

DEBUG: specs dirname /apps/hbase; perm 755; owner hdfs; group hadoop

DEBUG: specs dirname /apps/hbase/data; perm 775; owner hbase; group hadoop

DEBUG: specs dirname /apps/hbase/staging; perm 711; owner hbase; group hadoop

DEBUG: specs dirname /apps/hive; perm 755; owner hdfs; group hdfs

DEBUG: specs dirname /apps/hive/warehouse; perm 777; owner hive; group hdfs

DEBUG: specs dirname /apps/tez; perm 755; owner tez; group hdfs


DEBUG: specs dirname /apps/webhcat; perm 755; owner hcat; group hdfs

DEBUG: specs dirname /mapred; perm 755; owner mapred; group hadoop

DEBUG: specs dirname /mapred/system; perm 755; owner mapred; group hadoop

DEBUG: specs dirname /user; perm 755; owner hdfs; group hdfs

DEBUG: specs dirname /user/ambari-qa; perm 770; owner ambari-qa; group hdfs

DEBUG: specs dirname /user/hcat; perm 755; owner hcat; group hdfs

DEBUG: specs dirname /user/hdfs; perm 755; owner hdfs; group hdfs

DEBUG: specs dirname /user/hive; perm 700; owner hive; group hdfs

DEBUG: specs dirname /user/hue; perm 755; owner hue; group hue

DEBUG: specs dirname /user/oozie; perm 775; owner oozie; group hdfs

DEBUG: specs dirname /user/yarn; perm 755; owner yarn; group hdfs

DEBUG: specs dirname /system/yarn/node-labels; perm 700; owner yarn; group hadoop

mkdir: /ifs/zone1/hdp/system/yarn: No such file or directory

chown: /ifs/zone1/hdp/system/yarn/node-labels: No such file or directory

chown: /ifs/zone1/hdp/system/yarn/node-labels: No such file or directory

chmod: /ifs/zone1/hdp/system/yarn/node-labels: No such file or directory

SUCCESS -- Hadoop admin directory structure exists and has correct ownership and permissions

Done!

8. List the HDFS base directories.

ls -al /ifs/zone1/hdp

ls -al /ifs/zone1/hdp/user

9. On the Isilon OneFS cluster, collect the UIDs and GIDs of the local users and groups that were created.

10. List and copy the contents of the zone1-hdp.passwd and zone1-hdp.group files that are created when the isilon_create_users.sh script is executed.

cat zone1-hdp.passwd

cat zone1-hdp.group
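To make recording the IDs easier, the two files can be reduced to just names and numeric IDs. A minimal sketch (the helper names are illustrative, not part of the Isilon tools):

```shell
# Print "name uid gid" for each account in a passwd-format file, and
# "group gid" for each entry in a group-format file, for example:
#   list_passwd_ids zone1-hdp.passwd
#   list_group_ids  zone1-hdp.group
list_passwd_ids() { awk -F: '{ print $1, $3, $4 }' "$1"; }
list_group_ids()  { awk -F: '{ print $1, $3 }' "$1"; }
```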

Steps to perform on the Hadoop client

For each Hadoop client:

1. Edit the /etc/passwd file on the Hadoop client and paste the contents of the zone1-hdp.passwd file from Isilon at the end of the existing passwd file.

2. Edit the /etc/group file on the Hadoop client and paste the contents of the zone1-hdp.group file from Isilon at the end of the existing group file.

You have now created all the relevant users and groups on Isilon and on all the Hadoop clients with UID/GID parity. Because the local accounts already exist on the Hadoop clients, Ambari will not create them but will use the existing ones, maintaining ID parity with Isilon. Other methods to achieve this exist but are beyond the scope of this document.
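After pasting the entries, you can confirm UID parity between the Isilon-generated file and each client. An illustrative check (the function name is an assumption, not part of the Isilon tools):

```shell
# For each account in the Isilon-generated passwd file, compare its UID
# against the same account on this client; any mismatch breaks ID parity.
check_parity() {
    # $1: zone1-hdp.passwd from Isilon   $2: the client's /etc/passwd
    awk -F: 'NR==FNR { isilon[$1] = $3; next }
             ($1 in isilon) && isilon[$1] != $3 {
                 printf "MISMATCH %s: isilon=%s client=%s\n", $1, isilon[$1], $3
             }' "$1" "$2"
}
```

check_parity zone1-hdp.passwd /etc/passwd prints nothing when every shared account has the same UID on both sides.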


Create users on the Isilon OneFS cluster manually

In addition to the users that the Isilon script configures, you can add a user on the Isilon OneFS cluster for each additional Hadoop user that submits MapReduce jobs. The following procedures show how to manually add a single user called hduser1.

Warning

If you want the users and groups to be defined by your directory service, such as Active Directory or LDAP, do not run these commands. This section does not address setting permissions on the HDFS root files or the group membership required to run jobs; these steps create the users, but running jobs with only this configuration will likely fail.

Steps to perform on the Isilon OneFS cluster

1. Add a group to the Isilon OneFS cluster.

isi auth groups create hduser1 --zone=zone1 --provider local --gid <GID>

2. Create the user and the user's Hadoop home directories on the Isilon OneFS cluster.

isi auth users create hduser1 --primary-group hduser1 --zone=zone1 --provider local --home-directory /ifs/zone1/hdp/user/hduser1 --uid <UID>

3. Assign permissions to the user's home directory on the Hadoop cluster. The ID 2 in the example below is from when you previously ran the isi zone zones view zone1 command.

isi_run -z2 chown hduser1:hduser1 /ifs/zone1/hdp/user/hduser1

chmod 755 /ifs/zone1/hdp/user/hduser1

Steps to perform on the Hadoop client

Since you created a new user on OneFS to run jobs, you must create the same user with UID parity on any Linux hosts that the user will access to run jobs.

1. Add the user to the Hadoop cluster.

adduser hduser1 -u <UID>

Preparing Ambari

The steps in this phase will occur on the Ambari hosts which will become your Hadoop servers and clients.

Hadoop clusters and services rely heavily on DNS. All client hosts in your system must be configured for both forward and reverse DNS lookups. Validate that all hosts can resolve each other's hostnames and IP addresses.
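The forward and reverse lookup validation can be scripted. A rough sketch using getent (the host names in the commented example are placeholders for your environment):

```shell
# Check forward (name -> address) and reverse (address -> name) DNS for
# each host given on the command line; fix every FAIL line before
# installing Ambari.
check_dns() {
    for host in "$@"; do
        addr=$(getent hosts "$host" | awk '{ print $1; exit }')
        if [ -z "$addr" ]; then
            echo "FAIL forward: $host"
            continue
        fi
        name=$(getent hosts "$addr" | awk '{ print $2; exit }')
        if [ -n "$name" ]; then
            echo "OK $host -> $addr -> $name"
        else
            echo "FAIL reverse: $addr ($host)"
        fi
    done
}

# Example (placeholder host names):
# check_dns hdp-node1.example.com hdp-node2.example.com
```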

Before you begin the installation of Ambari, ensure that all your hosts meet the requirements needed by Ambari and Hortonworks to complete a successful Hadoop cluster installation. For more information and these installation guides, go to the Hortonworks website: http://docs.hortonworks.com/index.html

Steps to perform on the Hadoop client

1. To prepare Ambari for implementation, follow the instructions for your version of Ambari in the Hortonworks guide. In the Automated Install with Ambari guide (PDF), see the Install & Upgrade section.

The Automated Install with Ambari guide provides the steps that you must perform to prepare the Ambari environment, install Ambari, and install and configure HDP.


Important

Complete the steps in the Hortonworks guide in section 1, "Getting Ready", and section 2, "Installing Ambari". After you start the Ambari server, do not continue with section 3 of the Hortonworks Guide until after you have completed the Preparing OneFS section of this guide.

You must complete the following steps that are described in the Hortonworks Guide:

1. Download the Ambari repository for the operating system that runs your installation host.

2. Set up the Ambari server.

3. Start the Ambari server.

4. After you complete sections 1 and 2 of the Hortonworks Guide, Ambari should be operational, with the Ambari server running on one of the Hadoop nodes.

5. Logging in to the Ambari server will bring you to the create cluster page.

6. Do not continue to section 3 of the Hortonworks Guide until the Isilon OneFS cluster is prepared in the following steps and is ready to integrate into Ambari during the installation.

Configuring Ambari and Hortonworks

Perform the steps in section 3 of the Hortonworks Ambari HDP installation guide, "Installing, Configuring, and Deploying a HDP Cluster." An outline of the steps that you must perform to configure Ambari and Hortonworks for OneFS is as follows:

1. Specify the Hadoop clients and the Isilon SmartConnect zone as target hosts.

2. Assign NameNode and Secondary NameNode components to Isilon SmartConnect.


3. Assign DataNodes only to Isilon SmartConnect. Do not add DataNodes to any of the HDP hosts.

4. Remove any client components for Isilon.

Steps to perform in your browser from your client

1. After starting the Ambari service, open Ambari Web using a web browser.

2. Point your browser to http://<your.ambari.server>:8080, where <your.ambari.server> is the name of your Ambari server host.

3. Log in to the Ambari server using admin as the default user name and password. You can change these credentials later.

4. For a new cluster, the Ambari install wizard displays a Welcome page from which you launch the Ambari Install wizard.

5. Select Launch Install Wizard, and complete the installation steps in the Hortonworks guide, including the following steps that are specific to Isilon.


6. In the Name your cluster field, type a unique name for the cluster and then click Next. On the Select Version screen, select the desired version of the HDP stack and the repository to configure it. Consult the compatibility matrix for the version that you must configure with your OneFS cluster.

7. Click Use Public Repository or Use Local Repository, depending on the availability of internet access, and in the next section remove all the operating systems except redhat6.


8. Click Next to proceed to the Install Options screen.


9. On the Install Options screen, under Target Hosts, type the fully qualified names of all the Linux hosts that participate in the Hadoop cluster. Do not include Isilon at this time.

10. Under Host Registration Information attach the SSH private key.

The creation of this host and key was performed prior to running the Ambari wizard. See the Ambari prerequisites and Hortonworks Ambari documentation for additional details.

Important

Do not include the Isilon OneFS cluster in the target hosts.

To obtain the private key:

a. Log in to your Ambari host server.

b. Generate the private key as shown:

ssh-keygen

c. Copy the key to all the hosts as shown:

ssh-copy-id root@XXXXX

d. Change to the /root/.ssh directory.

e. Output the contents of the id_rsa file.

cat /root/.ssh/id_rsa

f. Copy the output to a file and save the file on your desktop.

g. Copy the file to the machine on which you are running the web-based Ambari Install Wizard.


h. Copy the entire contents of the id_rsa file, including the leading dash characters, into the Host Registration Information box in the Install Options screen of the Ambari wizard.

If no keys exist, see the Ambari documentation on generating a private key and setting up keyless SSH access on hosts in the cluster.

i. Select Register and Confirm.
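Steps b through g above can be sketched as a short non-interactive sequence. A minimal illustration, assuming a key path of your choosing (ssh-keygen's -N '' creates the key without a passphrase, as keyless SSH requires):

```shell
# Generate the key pair only if one does not already exist, then print
# the private key so it can be pasted into the Ambari wizard.
make_ambari_key() {
    # $1: private key path, for example /root/.ssh/id_rsa
    [ -f "$1" ] || ssh-keygen -q -t rsa -N '' -f "$1"
    cat "$1"
}

# Push the matching public key to each host (host names are placeholders):
# for h in hdp-node1 hdp-node2; do ssh-copy-id -i /root/.ssh/id_rsa.pub "root@$h"; done
```

make_ambari_key /root/.ssh/id_rsa prints the key including the leading and trailing dash lines, which the wizard requires.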

11. On the Confirm Hosts screen, after the Host’s Status shows Success, click the Back button to add the Isilon SmartConnect name.


12. On the Install Options screen, add a line in the Target Hosts box and type the fully qualified name of the Isilon SmartConnect zone name, and then select Perform manual registration on hosts and do not use SSH.

13. On the Warning screen, click OK.

14. Click Register and Confirm.

15. On the Before You Proceed screen, note that you must install Ambari Agents on each host that you want to manage before you continue.

16. Click OK.

17. All clients and the Isilon OneFS cluster should register successfully.

18. Click Next.

19. On the Choose Services screen, select or clear the Knox option based on your requirements. If you select Knox, refer to the Integrating HTTPFS and Knox with OneFS section for details.

20. Accept the default settings, and click Next.


21. On the Assign Masters screen, assign the NameNode and SNameNode components to the Isilon OneFS cluster. Remove the other Isilon OneFS cluster components.

22. Assign all other master components to Hadoop client host(s). For guidance and more information about allocation and assignment, see the Hortonworks Ambari documentation.

23. On the Assign Slaves and Clients screen, ensure that DataNode is selected for the Isilon OneFS cluster, and cleared for all the other hosts.

24. Scroll to the right, and ensure that the Client option is selected for all the other desired hosts and cleared for the Isilon OneFS cluster. Clear any other components that are assigned to the Isilon OneFS cluster.


Note: The Isilon OneFS cluster should be assigned only to the DataNode. Other hosts should have all required clients and slaves as required for your deployment. Refer to the Hortonworks deployment guide for further details.

25. Click Next.

26. On the Customize Services screen, set the passwords for the service configurations that display a red alert. Select each service tab that has an alert and enter the required information.


27. On the Customize Services screen, for the HDFS service, on the Advanced settings tab, update the following settings in the Advanced hdfs site settings section:

a. Change dfs.namenode.http-address to the FQDN of the SmartConnect zone name, and change the port from 50070 to 8082.

b. Change dfs.namenode.https-address to the FQDN of the SmartConnect zone name, and change the port from 50470 to 8080.

c. Add a property in the Custom hdfs-site field by the name dfs.client-write-packet-size.

d. Set dfs.client-write-packet-size to 131072.

e. Change the dfs.datanode.http.address port from 50075 to 8082. This setting prevents an error that generates a traceback in ambari-server.log each time you log in to the Ambari server.
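For reference, the hdfs-site changes above correspond to property entries like the following (the host name is a placeholder for your SmartConnect zone FQDN, and 131072 bytes is 128 KiB):

```xml
<!-- Illustrative hdfs-site.xml fragment; isilon.zone.example.com stands
     in for your SmartConnect zone FQDN. -->
<property>
  <name>dfs.namenode.http-address</name>
  <value>isilon.zone.example.com:8082</value>
</property>
<property>
  <name>dfs.namenode.https-address</name>
  <value>isilon.zone.example.com:8080</value>
</property>
<property>
  <name>dfs.client-write-packet-size</name>
  <value>131072</value> <!-- 128 KiB -->
</property>
<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:8082</value> <!-- port changed from the 50075 default -->
</property>
```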

28. On the Customize Services screen, for the Yarn service, on the Advanced settings tab, set yarn.scheduler.capacity.node-locality-delay to 0

For more information, see "Ambari Customizing HDP Services" in the Hortonworks guide.

29. Click Next.

You might see some warnings depending on your configuration settings. You can ignore warning messages related to HDFS.


30. On the Review screen, review the configuration for accuracy. Click Deploy.

31. On the Install, Start and Test screen, wait for the test results. The tests should result in success for both the Isilon and compute nodes. A progress indicator appears.


32. Wait until the status is 100% complete and then click Next.

Refer to the Hortonworks documentation for guidance on how to manually restart the services in case some services fail to install and start up.


33. On the Summary screen, click Complete.

Ambari starts automatically. All services on the left pane should display as a green checkmark.


34. Set the ODP version:

a. In Ambari Web, click Admin > Stack and Versions and then click the Versions tab.

b. Click Show Details under the HDP version.

c. Make note of the full HDP version number which appears within parentheses and contains a dash. For example, “2.4.3.0-37”.

d. Connect to a node in the Isilon OneFS cluster and run the following command through the OneFS command-line interface:

isi hdfs settings modify --odp-version=<f.u.l.l-version> --zone=<zone>

Note: You can also perform the above action through the OneFS web administration interface.
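If you prefer to capture the full version from a shell rather than retyping it, the value in parentheses can be extracted with sed. An illustrative helper (the sample string mirrors the example above):

```shell
# Pull the full version (the value in parentheses, with the dash) out of
# the text shown by Ambari's "Show Details" dialog.
odp_version() {
    # $1: the version string, e.g. "HDP-2.4.3.0 (2.4.3.0-37)"
    printf '%s\n' "$1" | sed -n 's/.*(\(.*\)).*/\1/p'
}

odp_version "HDP-2.4.3.0 (2.4.3.0-37)"   # prints 2.4.3.0-37
# then, on the OneFS CLI:
# isi hdfs settings modify --odp-version="2.4.3.0-37" --zone=zone1
```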

On the Versions tab, the hosts count under Current increases by one, and the previous count of 1 under Not Installed changes to 0.

You are now configured to use an Isilon OneFS cluster with Hortonworks for Apache Hadoop.

Verify the deployment

Test the basic functionality of the Isilon and Ambari integration through the following steps.

1. Browse the Isilon HDFS root.

hadoop fs -ls /

Output similar to the following appears:

[root@rs-cd1 tmp]# hadoop fs -ls /

Found 4 items


drwxr-xr-x - hbase hbase 0 2017-01-20 16:58 /hbase

drwxrwxr-x - solr solr 0 2016-06-20 12:53 /solr

drwxrwxrwt - hdfs supergroup 0 2016-06-20 12:53 /tmp

drwxr-xr-x - hdfs supergroup 0 2016-06-20 12:53 /user

2. Write to the Isilon HDFS root by creating a test directory. Output similar to the following appears:

[root@rs-cd1 tmp]# hadoop fs -mkdir /This_is_Isilon

[root@rs-cd1 tmp]# hadoop fs -ls /

Found 5 items

drwxr-xr-x - root hadoop 0 2017-01-20 17:14 /This_is_Isilon_zone1-hdp.txt

drwxr-xr-x - hbase hbase 0 2017-01-20 16:58 /hbase

drwxrwxr-x - solr solr 0 2017-01-20 12:53 /solr

drwxrwxr-x - hdfs supergroup 0 2017-01-20 12:53 /tmp

drwxr-xr-x - hdfs supergroup 0 2017-01-20 12:53 /user

3. To test MapReduce, on the Ambari dashboard click the Services tab and then click MapReduce2 > Service Actions > Run Service Check. A screen appears indicating the progress of the service check.

This completes the high level overview of deploying Ambari with OneFS.


Addendum

This section discusses some of the security, management, and troubleshooting features that are available in an installation of OneFS with Hadoop.

Apache Ranger authorization policy integration

Apache Ranger is a centralized management console that enables you to monitor and manage data security across the Hortonworks Hadoop distribution system. A Ranger administrator can define and apply authorization policies across Hadoop components including HDFS.

With releases of OneFS earlier than 8.0.1.0, you can deploy Hortonworks with Isilon as the HDFS layer and use Ranger to create authorization policies for all the supported Hadoop components, such as Hive; however, HDFS policies are not recognized by OneFS with this implementation. With OneFS 8.0.1.0 and later releases, HDFS policies that are defined in Ranger are checked before the native file access control is applied. This two-layered authorization model differs from the way standard Ranger HDFS policies are checked with Direct Attached Storage (DAS), but the model is suitable for using OneFS as a multiprotocol data lake with Hadoop. OneFS native file system ACLs allow a storage administrator to correctly set up access control for multiple workloads and for multiprotocol access to the HDFS dataset. A Ranger administrator can apply further, more restrictive Hadoop user access control to the same HDFS dataset, giving each administrator the appropriate span of control within their management domain.

In an Isilon OneFS cluster with Hadoop deployment, Ranger authorization policies serve as a filter before applying the native file access control.

Note: The Ranger Audit and Transparent Data Encryption components are not currently supported.

You can enable Apache Ranger on Isilon OneFS clusters so that OneFS checks for new authorization policies, receives HDFS requests from clients, and applies the authorization policies to those requests; the result for a request is DENY, ALLOW, or UNDETERMINED.

Note:

The Ranger DENY policy takes precedence over the ALLOW policy.

The Ranger DENY policy prevents user or group access to files and directories in OneFS that the file system would have otherwise allowed the users or groups to access.

To understand how the Ranger policies are applied, consider an example where a user in the Sales group requires access to certain files and directories that have specific HDFS file system ownership and permissions as shown:


The scenarios, their files and directories, and the HDFS file system ownership and permissions are as follows:

Scenario A: /ifs/<zone>/<hdfs-root> (the HDFS root directory for the zone). Read, write, and execute (rwx) permissions for everyone.

Scenario B: /ifs/<zone>/<hdfs-root>/corp-confidential. Neither the user nor the Sales group has read, write, and execute (rwx) permissions.

Scenario C: /ifs/<zone>/<hdfs-root>/hdfs-confidential. The Sales group owns the folder, and the folder has read, write, and execute (rwx) permissions for the group, including the user.

Scenario D: /ifs/<zone>/<hdfs-root>/hdfs-not-for-user. No read, write, and execute permissions for hidden files.

If Ranger policies are further applied, the user’s access to the files and directories changes as indicated in the following table:

For each Ranger HDFS service instance, the user's view and edit permissions per scenario are:

None (no Ranger policies): A: Yes, B: No, C: Yes, D: Yes.

One policy that provides everyone in the Hadoop group access to the root directory: A: No, B: No, C: No, D: No.

One policy that provides everyone, including the Sales group, access to the root directory: A: Yes, B: No, C: Yes, D: Yes.

Two policies, which provide everyone access to the root directory and deny the Sales group access to the hdfs-confidential directory: A: Yes, B: No, C: No, D: Yes.

Three policies, which provide everyone access to the root directory, deny the Sales group access to the hdfs-confidential directory, and deny the user access to the hdfs-not-for-user directory: A: Yes, B: No, C: No, D: No.
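The outcomes in the scenario table can be summarized as a small decision sketch. This is purely illustrative (the actual enforcement happens inside OneFS): a Ranger DENY always wins, a Ranger ALLOW still falls through to the native file system permission check, and a user not covered by any policy of a defined service instance is denied.

```shell
# Illustrative two-layer authorization decision.
authorize() {
    # $1: Ranger policy result (DENY, ALLOW, or UNDETERMINED)
    # $2: "yes"/"no" result of the native OneFS file permission check
    case $1 in
        DENY)
            echo denied ;;                 # Ranger DENY always wins
        ALLOW)
            if [ "$2" = yes ]; then        # file system permissions still apply
                echo allowed
            else
                echo denied
            fi ;;
        *)
            echo denied ;;                 # not covered by any Ranger policy
    esac
}
```

For example, authorize ALLOW no is still denied, because OneFS file system permissions apply even when a Ranger policy allows access.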


Prerequisites

To enable Ranger authorization on an Isilon OneFS cluster, make sure that you have installed the following components:

Apache Ranger 0.6.0 or later, because the Ranger DENY policy is supported only with these versions

OneFS 8.0.1.0 and later releases

Ambari version 2.4 or later

Deploy Apache Ranger

Deploy the Apache Ranger service and enable or create a deny policy using the steps outlined in the Apache Ranger deny policies with OneFS 8.0.1.0 ECN blog post.

A summary of the workflow follows:

1. On a per-access zone basis, perform the following steps using the OneFS Web administration interface or the OneFS command-line administration interface to configure Ranger:

a. Enable the Ranger plugin on OneFS.

b. Specify the URL of the Apache Ranger Management console to get the policies.

c. Specify the name of the service instance.

Refer to the OneFS 8.0.1 Web Administration Guide and OneFS 8.0.1 CLI Administration Guide for guidance and information on configuring Ranger with OneFS.

2. Make sure that a Ranger service account user is configured on OneFS within your access zone.

3. Install Ranger using the steps outlined in the Hortonworks Security Guide.

Note: Do not assign Ranger components to the OneFS host as mentioned in step 5 of section 1.3.1 in the above-mentioned guide.

4. Enable the Apache Ranger HDFS plugin using the steps outlined in the Hortonworks Security Guide.

5. If you have a Kerberos-enabled cluster, follow the instructions in the Hortonworks Security Guide to enable the Ranger HDFS plugin on the cluster.

6. Enable the Ranger Deny policy using the instructions in the Apache Ranger deny policies with OneFS 8.0.1.0 ECN blog post.

Create a new service instance for OneFS using the Create Service page in Ranger. See the Hortonworks Security Guide for details. Specify values in the following fields to create a new service instance for OneFS and make note of the values:

Specify a value in the Service Name field and make note of it because you must use the same value in OneFS.


Specify a username in the Config Properties section specific to the service instance. The Test Connection option will continue to fail until you have saved and reopened the service instance.

Specify the NameNode URL as the SmartConnect zone FQDN, for example: hdfs://onefs.smartconnect.name:8020.

A service instance is created with a default policy all - path, granting access to all the files for the user that you included in the Service Details page.

7. Add all of your groups and individual users associated with an access zone within OneFS to the default policy in order to grant access to the groups and users. If you create local users in OneFS, or use Active Directory, you must change the UserSync settings in Ambari or add the users in the Ranger interface.

Note: OneFS file system permissions take precedence even if the policy indicates that the user or group can access everything.

8. Using the Edit Policy page in the Ranger interface, specify the group or users who will have limited access to the repository within OneFS and indicate the permissions that must be denied to that path.

9. Create a DENY policy in Ranger using the steps outlined in the Hortonworks Security Guide, if required. After you have saved the policy, OneFS enforces the policy at the next download. If you attempt to take action on a path that is denied access by the Ranger policy, this will be reported in the OneFS HDFS log at /var/log/hdfs.log. For more information, refer to the Apache Ranger deny policies with OneFS 8.0.1.0 ECN blog post.

Ambari metrics and alerts overview

In a Hadoop deployment with OneFS 8.0.1.0 or later releases, a OneFS node can monitor, collect, and push metrics data at one-minute intervals to the Ambari Metrics Collector, which is one of the components of the Ambari Metrics System. All of the OneFS metrics and alert data provided to Ambari are cluster-wide. For example, for a three-node Isilon OneFS cluster, the network HDFS traffic aggregated across all three nodes is reported to Ambari.

Note: OneFS metrics for specific access zones that contain HDFS datasets are not currently supported.

Prerequisites

In order to administer Ambari with OneFS, make sure that you have installed the following components:

OneFS 8.0.1.0 or later versions.

Ambari 2.0 or later versions.

Ambari Metrics must be installed on the HDP cluster.

View Ambari metrics

In order to view the Ambari metrics, follow the steps outlined in the Ambari metrics and alerts with EMC Isilon OneFS ECN blog post. A summary of the workflow is given below:

1. Access Ambari Web by opening a supported browser and entering the Ambari Web URL.

2. Click Ambari Metrics Service > Metrics Collector to determine the host name where Ambari Metrics Collector has been installed.


3. From a node in your Isilon OneFS cluster, set the access zone and specify the name of the external Ambari host where the Ambari Metrics Collector component is installed. Refer to the OneFS 8.0.1 Web Administration Guide and OneFS 8.0.1 CLI Administration Guide for details on specifying the external Ambari host name and setting the access zone.

4. On the Ambari Web home page, click the Dashboard tab and then click one of the following widgets to view details related to the metrics data reported by OneFS:

Widget / OneFS data reported

HDFS Disk Usage: The usable storage capacity.
DataNodes Live: Always one, because OneFS represents a single DataNode.
Memory Usage:
  Swap: Always zero; OneFS does not have swap space.
  Share: Always zero; this value is not available in OneFS.
  Cache: Always zero; this value is not available in OneFS.
Cluster Load:
  Procs: The number of Isilon nodes.
  CPUs: The number of cores in the cluster.
  Nodes: The number of nodes in the cluster.
  1-min: The average workload across all OneFS nodes during the last minute.
NameNode Heap:
  Used: The sum of the current memory allocated by the HDFS process (cluster-wide).
  Total: The sum of the memory limit that the HDFS process can allocate.
NameNode CPU WIO: Always zero; this data is unavailable in OneFS.
NameNode Uptime: The uptime of the OneFS node that has been running the longest.

To view details related to the HDFS services, click the Services tab on the Ambari Web home page and then click HDFS on the left pane. Click one of the following widgets to view details related to the OneFS metrics data:

Widget / OneFS data reported

DataNodes Live: Always one, because OneFS represents a single DataNode.
Blocks, Blocks (Total), Block Errors: Always zero, because there are no blocks in OneFS.
Total Files + Directories: Always zero; the file system does not maintain the file count.
NameNode GC count: Always zero, because OneFS does not run Java.
NameNode GC time: Always zero, because OneFS does not run Java.


NameNode Heap:
  JVM heap used: The sum of the current memory allocated by the HDFS process.
  JVM heap committed: The sum of the memory limit that the HDFS process can allocate.
Failed disk volumes: Always zero, because bad drives are reported in the OneFS user interface.
Corrupted Blocks: Always zero, because OneFS maintains its own data protection mechanism.
Under Replicated Blocks: Always zero, because there are no blocks in OneFS.
HDFS Space Utilization: The usable storage capacity in OneFS.

To view the NameNode details, click NameNode in the HDFS area on the Ambari Web user interface.

Note:

The Cores (CPU) and Memory values are not supported in OneFS 8.0.1.0.

All of the Ambari metrics data points reported by OneFS are also available within the Ambari metrics Grafana dashboards that are available with Ambari 2.2.2 and later versions.

View Ambari alerts

OneFS reports some data to Ambari that can be used by the Ambari alerts system, and you can view the alerts related to OneFS in the Ambari user interface. These alerts are separate from the regular alerts that the OneFS alerting system reports. On the Ambari Web home page, click Alerts and then filter by HDFS in the Service field to view the alerts. You can enable or disable the supported OneFS alerts using the Ambari Web user interface.
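The same alerts are also available through the Ambari REST API, which is convenient for scripting the Service filter described above. A sketch, with placeholder host and cluster names:

```python
from urllib.parse import urlencode

def hdfs_alerts_url(ambari_host, cluster, port=8080):
    """URL listing current HDFS alerts, mirroring the UI's Service filter.

    The endpoint shape follows the Ambari REST API; authenticate as an
    Ambari admin user when you actually issue the GET request.
    """
    query = urlencode({"Alert/service_name": "HDFS"})
    return f"http://{ambari_host}:{port}/api/v1/clusters/{cluster}/alerts?{query}"

url = hdfs_alerts_url("ambari.example.com", "hdp_cluster")
```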

Note:

Custom alerts created for HDFS are not recognized or reported.

The NameNode Host CPU utilization alert defaults to incorrect thresholds. Manually set the threshold to a reasonable value, or set the Check Interval option to 1 minute to receive frequent updates from this alert.

For more information, refer to the Ambari metrics and alerts with EMC Isilon OneFS ECN blog post.

HDFS wire encryption overview

With OneFS 8.0.1.0 and later releases, you can protect data transmitted between an HDFS client and OneFS through data-in-flight encryption, also known as HDFS wire encryption. In a Kerberos-enabled Hadoop environment, you can enable this feature on all HDFS clients and on OneFS. Wire encryption manages the negotiation between an HDFS client and OneFS to encrypt and decrypt data.

Note:

You can enable wire encryption per access zone in OneFS.


Enabling HDFS wire encryption on an access zone can degrade HDFS traffic performance when accessing data in that zone. You should characterize the performance impact with wire encryption enabled to determine whether it is acceptable for your workload.

Configure HDFS Wire Encryption with OneFS

To use the wire encryption feature, you must deploy Kerberos on your Hadoop cluster. The following instructions assume that you have already deployed and enabled Kerberos on your cluster. You can then enable HDFS wire encryption in OneFS either using the OneFS web administration interface or the OneFS command-line administration interface.

Note: The HDFS wire encryption supported by EMC Isilon is different from the Apache HDFS Transparent Data Encryption technology. For more information, refer to the Enhanced Hadoop security with OneFS 8.0.1 and Hortonworks HDP ECN blog post.

You can configure HDFS wire encryption using the OneFS web administration interface or command-line administration interface. Refer to the OneFS 8.0.1 Web Administration Guide and OneFS 8.0.1 CLI Administration Guide for details.

Configure Apache Hadoop for HDFS wire encryption

To enable HDFS wire encryption with OneFS, edit the following attributes associated with Apache Hadoop:

Properties in core-site.xml:

  hadoop.rpc.protection: privacy

Properties in hdfs-site.xml:

  dfs.encrypt.data.transfer: true
  dfs.encrypt.data.transfer.algorithm: 3des (default value)
  dfs.encrypt.data.transfer.cipher.suites: AES/CTR/NoPadding (default value)
  dfs.encrypt.data.transfer.cipher.key.bitlength: One of 128, 192, or 256; the default value is 128.
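Rendered as configuration file fragments, the properties above look like the following; merge them into your existing core-site.xml and hdfs-site.xml rather than replacing those files:

```xml
<!-- core-site.xml -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>
</property>

<!-- hdfs-site.xml -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>
<property>
  <name>dfs.encrypt.data.transfer.algorithm</name>
  <value>3des</value> <!-- default value -->
</property>
<property>
  <name>dfs.encrypt.data.transfer.cipher.suites</name>
  <value>AES/CTR/NoPadding</value> <!-- default value -->
</property>
<property>
  <name>dfs.encrypt.data.transfer.cipher.key.bitlength</name>
  <value>128</value> <!-- one of 128, 192, or 256; default 128 -->
</property>
```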

You can also edit the preceding attributes directly in the Ambari Admin console or in Cloudera Manager, depending on how you manage your Hadoop cluster.

Save the settings in Ambari or Cloudera Manager and restart the affected services. You must also restart all ZooKeeper components even though you might not be prompted to do so. In addition, if you have installed Hive on your cluster, restart the Hive service completely, or restart the Hive Metastore component individually.

For more information, refer to the Enhanced Hadoop security with OneFS 8.0.1 and Hortonworks HDP ECN blog post.

Integrating HTTPFS and Knox with OneFS

With OneFS 8.0.0.3 and later releases, you can use HTTPFS as a gateway interface to OneFS and limit broad HDFS access from external clients to the Hadoop cluster. You can also integrate HTTPFS with Knox to provide the following benefits:


Integration with enterprise identity management solutions

Improved service-level authorization

Integration with LDAP and AD

Enhanced HDFS access security

Both HTTPFS and Knox can be installed on Kerberized and non-Kerberized Hadoop clusters. Refer to the white paper Implementing HTTPFS & Knox with Isilon OneFS to Enhance HDFS Access Security for information about integrating HTTPFS and Knox with OneFS.
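For a sense of what routing through HTTPFS looks like to a client, the sketch below builds a WebHDFS-style request URL against an HTTPFS gateway (default port 14000). The host, path, and user are placeholders, and a Kerberized cluster would use SPNEGO rather than the user.name parameter:

```python
from urllib.parse import urlencode

def httpfs_url(gateway_host, hdfs_path, op="LISTSTATUS", user=None, port=14000):
    """WebHDFS-style URL routed through an HTTPFS gateway (default port 14000)."""
    params = {"op": op}
    if user:
        params["user.name"] = user  # simple (non-Kerberos) authentication
    return (f"http://{gateway_host}:{port}/webhdfs/v1"
            f"{hdfs_path}?{urlencode(params)}")

url = httpfs_url("httpfs.example.com", "/user/hdpuser1", user="hdpuser1")
```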

Contacting EMC Isilon Technical Support

Online Support: https://support.emc.com/

Telephone Support:
United States: 800-782-4362 (800-SVC-4EMC)
Canada: 800-543-4782
Worldwide: +1-508-497-7901
Additional worldwide access numbers

Help with Online Support Tools:

For questions specific to HDFS, contact [email protected]