
IMPLEMENTING HTTPFS & KNOX WITH ISILON ONEFS TO ENHANCE HDFS ACCESS SECURITY

Boni Bruno, CISSP, CISM, CGEIT
Principal Solutions Architect, DELL EMC

ABSTRACT

This paper describes implementing HTTPFS and Knox together with Isilon

OneFS to enhance HDFS access security. This integrated solution has

been tested and certified with Hortonworks on HDP v2.4 and Isilon OneFS v8.0.0.3.


CONTENTS

Introduction

WebHDFS REST API

WebHDFS Port Assignment in Isilon OneFS

WebHDFS Examples with Isilon

WebHDFS Security Concerns

HTTPFS

Installing HTTPFS

Configuring HTTPFS

Configuring HTTPFS for Kerberos

Running and Stopping HTTPFS

Configuring HTTPFS Auto-Start

Testing HTTPFS

Knox

Installing Knox

Configuring Knox using Ambari

Configuring Knox for LDAP

Configuring Knox for Kerberos

Testing Knox and Isilon Impersonation Defense

Final Comments

Appendix

Additional Testing Results


INTRODUCTION

Hadoop provides a Java native API to support file system operations such as create, rename or delete files and

directories, open, read or write files, set permissions, etc. This is great for applications running within the Hadoop

cluster, but there may be use cases where an external application needs to make such file system operations on

files stored on HDFS as well. Hortonworks developed the WebHDFS REST API to support these requirements based

on standard REST functionalities. WebHDFS REST APIs support a complete File System / File Context interface for

HDFS.

WEBHDFS REST API

WebHDFS is based on HTTP operations like GET, PUT, POST, and DELETE. Operations like OPEN, GETFILESTATUS, and LISTSTATUS use HTTP GET, while operations like CREATE, MKDIRS, RENAME, and SETPERMISSION rely on HTTP PUT. APPEND operations are based on HTTP POST, while DELETE uses HTTP DELETE. Authentication can be based on a user name passed as a query parameter (as part of the HTTP query string) or, if security is enabled, through Kerberos.

WebHDFS is enabled in a Hadoop cluster by defining the following property in hdfs-site.xml:

<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>

If using Ambari, enable WebHDFS under the General Settings of HDFS as shown below:


When using Isilon as a centralized HDFS storage repository for a given Hadoop Cluster, all namenode and datanode

functions must be configured to run on Isilon for the entire Hadoop cluster. By design, WebHDFS needs access to

all nodes in the cluster. Before the WebHDFS interface on Isilon can be used by the Hadoop Cluster, you must

enable WebHDFS in the Protocol Settings for HDFS on the designated Access Zone in Isilon - this is easily done in

the OneFS GUI. In the example below, hdp24 is the HDFS Access Zone for the Hadoop Cluster. Note the check

mark next to ENABLE WebHDFS access.

It is not sufficient to just enable WebHDFS in Ambari; Isilon must also be configured with WebHDFS enabled so that end-to-end WebHDFS communication works in the Hadoop cluster. If multiple Access Zones are defined on Isilon, make sure to enable WebHDFS as needed on each access zone.


WEBHDFS PORT ASSIGNMENT IN ISILON ONEFS

All references to Hadoop host hdp24 in this document refer to a defined SmartConnect HDFS Access Zone on

Isilon. TCP port 8082 is the port OneFS uses for WebHDFS. It is important that the hdfs-site.xml file in the Hadoop cluster reflects the correct port designation for HTTP access to Isilon. See the Ambari screen shot below for reference.

WEBHDFS EXAMPLES WITH ISILON

Assuming the Hadoop cluster is up and running with Isilon and WebHDFS has been properly enabled for the

Hadoop cluster, we are ready to test WebHDFS. CURL is a great command line tool for transferring data using

various protocols, including HTTP/HTTPS. The examples below use curl to invoke the WebHDFS REST API available

in Isilon OneFS to conduct various file system operations. Again, all references to hdp24 used in the curl

commands below refer to the SmartConnect HDFS Access Zone on Isilon and not some edge node in the cluster.

GETTING FILE STATUS EXAMPLE

The screen shot above shows curl being used to connect to Isilon's WebHDFS interface on port 8082; the GETFILESTATUS operation is used as user hduser1 to retrieve info on the projects.txt file.

Note: The projects.txt file is a test file I created. It is not part of the Hortonworks software.
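For reference, the command captured in the screen shot takes roughly this form (hdp24 and the file path are the lab names used throughout this paper; user.name passes the simple-authentication user):

curl -i "http://hdp24:8082/webhdfs/v1/user/hduser1/projects.txt?op=GETFILESTATUS&user.name=hduser1"

The -i flag prints the HTTP response headers followed by the JSON FileStatus object in the response body.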


A web browser may also be used to get projects.txt file status from Isilon WebHDFS as shown below:

This is similar to executing hdfs dfs -ls /user/hduser1/projects.txt from a Hadoop client node n107 as shown

below:

This quick example shows the flexibility of using WebHDFS. It provides a simple way to execute Hadoop file system

operations by an external client that does not necessarily run on the Hadoop cluster itself. Let’s look at another

example.

CREATING A DIRECTORY EXAMPLE

Here the MKDIRS operation is used with HTTP PUT from a different client node, n105, to create the directory /tmp/hduser as user hduser1 on Isilon. The true Boolean result tells us the operation was successful.
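A sketch of the curl command in the screen shot, using the same lab names:

curl -i -X PUT "http://hdp24:8082/webhdfs/v1/tmp/hduser?op=MKDIRS&user.name=hduser1"

A successful call returns {"boolean":true} in the response body.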

We can also check the result by using hdfs to see the directory on Isilon as shown below:


OPEN A FILE EXAMPLE

In the example above, the OPEN operation is used with curl to display the text string “Knox HTTPFS Isilon Project”

within the /tmp/hduser1/project.txt file.
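The OPEN call in the screen shot is of roughly this form; the -L flag makes curl follow the redirect WebHDFS issues for read operations:

curl -i -L "http://hdp24:8082/webhdfs/v1/tmp/hduser1/project.txt?op=OPEN&user.name=hduser1"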

As shown before, a web browser can be used to access the file as well. Here a browser is configured to

automatically open text files in notepad, so accessing the WebHDFS API on Isilon as shown below will open the

contents of /tmp/hduser1/project.txt in notepad directly.

To validate the contents from within the cluster, we can use hdfs as shown below:


I’m only scratching the surface with the examples above; there are various operations you can execute with

WebHDFS. You can easily use WebHDFS to append data to files, rename files or directories, create new files, etc.

See the Appendix for many more examples.

It should be apparent that WebHDFS provides a simple, standard way to execute Hadoop file system operations

with external clients that do not necessarily run within the Hadoop cluster itself.

WEBHDFS SECURITY CONCERNS

Something worth pointing out with the above examples, and with WebHDFS in general: clients are directly accessing the namenodes and datanodes via predefined ports. This can be seen as a security issue for many organizations wanting to enable external WebHDFS access to their Hadoop infrastructure.

Many organizations do not want their Hadoop infrastructure accessed directly from external clients. As seen thus

far, external clients can use WebHDFS to directly access the actual ports namenodes and datanodes are listening

on in the Hadoop Cluster and leverage the WebHDFS REST API to conduct various file system operations. Although

firewalls can filter access from external clients, the ports are still being directly accessed. As a result, firewalls alone do not prohibit the execution of various WebHDFS operations.

The solution to this issue, in many cases, is to enable Kerberos in the Hadoop cluster and deploy Secure REST API

Gateways that enforce strong authentication and access control to WebHDFS. The remainder of this document

focuses on using HTTPFS and Knox in conjunction with Isilon OneFS to provide a secure WebHDFS deployment with

Hadoop. A diagram of the secure architecture is shown below for reference.


HTTPFS

The introduction section of this document provides an overview of WebHDFS and demonstrates how the

WebHDFS REST APIs support a complete File System / File Context interface for HDFS. WebHDFS is efficient as it

streams data from each datanode and can support external clients like curl or web browsers to extend data access

beyond the Hadoop cluster.

Since WebHDFS needs access to all nodes in the cluster by design, WebHDFS inherently establishes a wider footprint for HDFS access in a Hadoop cluster, since clients can access HDFS over HTTP/HTTPS. To help minimize the size of the footprint exposed to clients, a gateway solution is needed that provides a similar File System / File Context interface for HDFS, and this is where HTTPFS comes into play.

HTTPFS is a service that provides a REST HTTP gateway supporting all HDFS File System operations (read and

write). HTTPFS can be used to provide a gateway interface, i.e. choke point, to Isilon and limit broad HDFS access

from external clients to the Hadoop cluster. HTTPFS can also be integrated with Knox to improve service level

authorization, LDAP & AD integration, and overall perimeter security. See the Knox section of this document for

more details. The remainder of this section covers the installation and configuration of HTTPFS with Isilon.

INSTALLING HTTPFS

HTTPFS can be installed on the Ambari server or a worker node; for production deployments, deploying on a dedicated worker node is a best practice.

To install HTTPFS: yum install hadoop-httpfs (Note: existing HWX repos are hadoop-httpfs aware)

Note: The HTTPFS service is a Tomcat application that relies on having the Hadoop libraries and configuration available, so make sure to install HTTPFS on an edge node that is being managed by Ambari.

After you install HTTPFS, the directories below will be created on the HTTPFS server:

/usr/hdp/2.x.x.x-x/hadoop-httpfs

/etc/hadoop-httpfs/conf

/etc/hadoop-httpfs/tomcat-deployment


CONFIGURING HTTPFS

If you change directories to /usr/hdp on your HTTPFS server and list the files there, you will see a directory with

the version number of your existing HDP release. Make note of it so you can set the current version for httpfs. Set

the version for current with the following command:

hdp-select set hadoop-httpfs 2.x.x.x-x (replace the x with your HDP release)

The installation of httpfs above deploys scripts which have some hardcoded values that need to be changed.

Adjust the /usr/hdp/current/hadoop-httpfs/sbin/httpfs.sh script:

#!/bin/bash

# Autodetect JAVA_HOME if not defined

if [ -e /usr/libexec/bigtop-detect-javahome ]; then

. /usr/libexec/bigtop-detect-javahome

elif [ -e /usr/lib/bigtop-utils/bigtop-detect-javahome ]; then

. /usr/lib/bigtop-utils/bigtop-detect-javahome

fi

### Added to assist with locating the right configuration directory

export HTTPFS_CONFIG=/etc/hadoop-httpfs/conf

### Remove the original HARD CODED Version reference.

Next, you need to create the following symbolic links:

cd /usr/hdp/current/hadoop-httpfs

ln -s /etc/hadoop-httpfs/tomcat-deployment/conf conf

ln -s ../hadoop/libexec libexec


Like all the other Hadoop components, httpfs follows the use of *-env.sh files to control the startup environment. In the httpfs.sh script above we set the location of the configuration directory; this configuration directory is used to find and load the httpfs-env.sh file.

The httpfs-env.sh file needs to be modified as shown below:

# Add exports to control and set the Catalina directories for starting and finding the httpfs application

export CATALINA_BASE=/usr/hdp/current/hadoop-httpfs

export HTTPFS_CATALINA_HOME=/etc/hadoop-httpfs/tomcat-deployment

# Set a log directory that matches your standards

export HTTPFS_LOG=/var/log/hadoop/httpfs

# Set a tmp directory for httpfs to store interim files

export HTTPFS_TEMP=/tmp/httpfs

The default port for httpfs is TCP 14000. If you need to change the port for httpfs, add the following export to the

above httpfs-env.sh file on the HTTPFS server:

export HTTPFS_HTTP_PORT=<new_port>

In the Ambari web interface, add httpfs as a proxy user in core-site.xml in the HDFS > Configs > Advanced >

Custom core site section:

Note: If the properties that are referenced below do not already exist, do the following steps:

1. Click the Add Property link in the Custom core site area to open the Add Property window.

2. Add each value in the <name> part in the Key field.

3. Add each value in the <value> part in the Value field.

4. Click Add. Then click Save.


<property>

<name>hadoop.proxyuser.httpfs.groups</name>

<value>*</value>

</property>

<property>

<name>hadoop.proxyuser.httpfs.hosts</name>

<value>*</value>

</property>

Make sure to restart HDFS and related components after making the above changes to core-site.xml.

At this point HTTPFS is configured to work with a non-Kerberos Hadoop cluster. If your cluster is not secured with

Kerberos, you can skip the following section CONFIGURING HTTPFS FOR KERBEROS and proceed to RUNNING AND

STOPPING HTTPFS and TESTING HTTPFS.


CONFIGURING HTTPFS FOR KERBEROS

Ambari does not automate the configuration of HTTPFS to support Kerberos. If your Hadoop cluster was secured

with Kerberos using Ambari, you will need to create some needed keytabs and modify the httpfs-site.xml before

HTTPFS will work in a secure Kerberos Hadoop cluster.

The following assumptions are made for this section on configuring HTTPFS for Kerberos:

1. HTTPFS has been installed, configured, and verified to be working prior to enabling Kerberos.

2. Kerberos was enabled using Ambari and an MIT KDC and Isilon is configured and verified for Kerberos.

Both httpfs and HTTP service principals must be created for HTTPFS if they do not already exist.

Create the httpfs and HTTP (see note below) principals:

kadmin: addprinc -randkey httpfs/<HTTPFS_HOST>@<YOUR-REALM.COM>

kadmin: addprinc -randkey HTTP/<HTTPFS_HOST>@<YOUR-REALM.COM>

Note: HTTP principal and keytab may already exist as this is typically needed for other Hadoop services in

a secure Kerberos Hadoop cluster deployment. HTTP must be in CAPITAL LETTERS.

Create the keytab files for both httpfs and HTTP (see note above) principals:

kadmin -q "ktadd -k /etc/security/keytabs/httpfs.service.keytab httpfs/<HTTPFS_HOST>@<YOUR-REALM.COM>"

kadmin -q "ktadd -k /etc/security/keytabs/spnego.service.keytab HTTP/<HTTPFS_HOST>@<YOUR-REALM.COM>"

Note: The spnego keytab above only needs to be created if it does not already exist on the node running HTTPFS.

Merge the two keytab files into a single keytab file:


ktutil: rkt /etc/security/keytabs/httpfs.service.keytab

ktutil: rkt /etc/security/keytabs/spnego.service.keytab

ktutil: wkt /etc/security/keytabs/httpfs-http.service.keytab

ktutil: quit

The above will create a file named httpfs-http.service.keytab in /etc/security/keytabs

Test that the merged keytab file works:

klist -kt /etc/security/keytabs/httpfs-http.service.keytab

The above command should list both the httpfs and HTTP principals for the httpfs-http.service.keytab. Below is an example output from a test cluster:

Change the ownership and permissions of the /etc/security/keytabs/httpfs-http.service.keytab file:

chown httpfs:hadoop /etc/security/keytabs/httpfs-http.service.keytab

chmod 400 /etc/security/keytabs/httpfs-http.service.keytab

Edit the HTTPFS server httpfs-site.xml configuration file in the HTTPFS configuration directory by setting the

following properties:


httpfs.authentication.type: kerberos

httpfs.hadoop.authentication.type: kerberos

httpfs.authentication.kerberos.principal: HTTP/<HTTPFS_HOST>@<YOUR-REALM.COM>

httpfs.authentication.kerberos.keytab: /etc/security/keytabs/httpfs-http.service.keytab

httpfs.hadoop.authentication.kerberos.principal: httpfs/<HTTPFS_HOST>@<YOUR-REALM.COM>

httpfs.hadoop.authentication.kerberos.keytab: /etc/security/keytabs/httpfs-http.service.keytab

httpfs.authentication.kerberos.name.rules: use the value configured for 'hadoop.security.auth_to_local' in Ambari's HDFS Configs under "Advanced Core-Site".

An example httpfs-site.xml is listed below; the Kerberos-related properties are the relevant part. The name.rules value shows representative Ambari-generated auth_to_local rules for this lab's SOLARCH.LAB.EMC.COM realm, so copy the rules from your own cluster rather than reusing these verbatim:

<configuration>

<!-- HTTPFS proxy user setting -->

<property>

<name>httpfs.proxyuser.knox.hosts</name>

<value>*</value>

</property>

<property>

<name>httpfs.proxyuser.knox.groups</name>

<value>*</value>

</property>

<!-- HUE proxy user setting -->

<property>

<name>httpfs.proxyuser.hue.hosts</name>

<value>*</value>


</property>

<property>

<name>httpfs.proxyuser.hue.groups</name>

<value>*</value>

</property>

<property>

<name>httpfs.hadoop.config.dir</name>

<value>/etc/hadoop/conf</value>

</property>

<property>

<name>httpfs.authentication.type</name>

<value>kerberos</value>

</property>

<property>

<name>httpfs.hadoop.authentication.type</name>

<value>kerberos</value>

</property>

<property>

<name>kerberos.realm</name>

<value>SOLARCH.LAB.EMC.COM</value>

</property>


<property>

<name>httpfs.authentication.kerberos.principal</name>

<value>HTTP/<HTTPFS_HOST>@SOLARCH.LAB.EMC.COM</value>

</property>

<property>

<name>httpfs.authentication.kerberos.keytab</name>

<value>/etc/security/keytabs/httpfs-http.service.keytab</value>

</property>

<property>

<name>httpfs.hadoop.authentication.kerberos.principal</name>

<value>httpfs/<HTTPFS_HOST>@SOLARCH.LAB.EMC.COM</value>

</property>

<property>

<name>httpfs.hadoop.authentication.kerberos.keytab</name>

<value>/etc/security/keytabs/httpfs-http.service.keytab</value>

</property>

<property>

<name>httpfs.authentication.kerberos.name.rules</name>

<value>

RULE:[1:$1@$0](accumulo-hdp24@SOLARCH.LAB.EMC.COM)s/.*/accumulo/
RULE:[1:$1@$0](ambari-qa-hdp24@SOLARCH.LAB.EMC.COM)s/.*/ambari-qa/
RULE:[1:$1@$0](hbase-hdp24@SOLARCH.LAB.EMC.COM)s/.*/hbase/
RULE:[1:$1@$0](hdfs-hdp24@SOLARCH.LAB.EMC.COM)s/.*/hdfs/
RULE:[1:$1@$0](spark-hdp24@SOLARCH.LAB.EMC.COM)s/.*/spark/
RULE:[1:$1@$0](tracer-hdp24@SOLARCH.LAB.EMC.COM)s/.*/accumulo/
RULE:[1:$1@$0](.*@SOLARCH.LAB.EMC.COM)s/@.*//
RULE:[2:$1@$0](accumulo@SOLARCH.LAB.EMC.COM)s/.*/accumulo/
RULE:[2:$1@$0](amshbase@SOLARCH.LAB.EMC.COM)s/.*/ams/
RULE:[2:$1@$0](amszk@SOLARCH.LAB.EMC.COM)s/.*/ams/
RULE:[2:$1@$0](dn@SOLARCH.LAB.EMC.COM)s/.*/hdfs/
RULE:[2:$1@$0](falcon@SOLARCH.LAB.EMC.COM)s/.*/falcon/
RULE:[2:$1@$0](hbase@SOLARCH.LAB.EMC.COM)s/.*/hbase/
RULE:[2:$1@$0](nfs@SOLARCH.LAB.EMC.COM)s/.*/hdfs/
RULE:[2:$1@$0](hive@SOLARCH.LAB.EMC.COM)s/.*/hive/
RULE:[2:$1@$0](knox@SOLARCH.LAB.EMC.COM)s/.*/knox/
RULE:[2:$1@$0](httpfs@SOLARCH.LAB.EMC.COM)s/.*/httpfs/
RULE:[2:$1@$0](jhs@SOLARCH.LAB.EMC.COM)s/.*/mapred/
RULE:[2:$1@$0](nn@SOLARCH.LAB.EMC.COM)s/.*/hdfs/
RULE:[2:$1@$0](oozie@SOLARCH.LAB.EMC.COM)s/.*/oozie/
RULE:[2:$1@$0](rm@SOLARCH.LAB.EMC.COM)s/.*/yarn/
DEFAULT
</value>

</property>

</configuration>

This concludes the configuration work needed for HTTPFS to work in a secure Kerberos Hadoop cluster.

Follow the instructions in the next sections to start and test HTTPFS with Isilon.


RUNNING AND STOPPING HTTPFS

Executing httpfs is simple.

To start:

cd /usr/hdp/current/hadoop-httpfs/sbin

./httpfs.sh start

To stop:

./httpfs.sh stop

CONFIGURING HTTPFS AUTO-START

As the root user, create the following hadoop-httpfs script in /etc/init.d:

#!/bin/bash

prog=hadoop-httpfs

# Pin the hadoop-httpfs version for hdp-select (replace the x with your HDP release)
hdp-select set hadoop-httpfs 2.x.x.x-x

# See how we were called.
case "$1" in

start)
/usr/hdp/current/hadoop-httpfs/sbin/httpfs.sh start
;;

stop)
/usr/hdp/current/hadoop-httpfs/sbin/httpfs.sh stop
;;

restart)
/usr/hdp/current/hadoop-httpfs/sbin/httpfs.sh stop
/usr/hdp/current/hadoop-httpfs/sbin/httpfs.sh start
;;

*)
echo $"Usage: $prog {start|stop|restart}"
;;

esac


As root user:

chmod 755 /etc/init.d/hadoop-httpfs

chkconfig --add hadoop-httpfs

# Start Service

service hadoop-httpfs start

# Stop Service

service hadoop-httpfs stop

This method will run the service as the httpfs user. Ensure that the httpfs user has permissions to write to the log

directory /var/log/hadoop/httpfs. The correct permission settings are shown below:

Note: the httpfs user also needs to be created on Isilon. The httpfs user is a system account that gets created

during installation of httpfs. As with all other Hadoop server accounts, Isilon needs to have all service accounts

defined as a LOCAL PROVIDER in the appropriate HDFS Access Zone (e.g. hdp24) as shown below.


Create the httpfs user in the LOCAL HDFS Access Zone for your cluster in Isilon OneFS. Assign the httpfs user to

the hadoop primary group. Leave the httpfs account Disabled as shown above and below. The UID on Isilon does

not need to match the UID on the httpfs server.


TESTING HTTPFS

As seen in the introduction section of this document, the curl command is an excellent tool for testing WebHDFS;

the same is true for testing HTTPFS. The default port for httpfs is TCP port 14000. The tests below show how

HTTPFS and Isilon OneFS can be used together in a Hadoop cluster. The requests made on port 14000 on the

HTTPFS gateway are passed to Isilon. The HTTPFS gateway is configured for Kerberos as is the Isilon HDFS Access

Zone. The Kerberos configuration is optional, but recommended for production Hadoop deployments to improve

cluster security.

The testing below is with Kerberos enabled. So make sure you have obtained and cached an appropriate Kerberos

ticket-granting ticket before running the commands. Use klist to verify you have a ticket cached as shown below:

GETTING A USER’S HOME DIRECTORY EXAMPLE

The screen shot above shows curl being used to connect to the HTTPFS gateway on port 14000; the GETHOMEDIRECTORY operation is used as user hduser1 to retrieve the home directory info.

curl supports GSS-Negotiate authentication over HTTP. It is primarily meant to support Kerberos5 authentication but may also be used along with other authentication methods. GSS-Negotiate is specified with the --negotiate option with curl, and the -w option defines what to display on stdout after a completed and successful operation.
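Putting that together, the command in the screen shot takes roughly this form, with <HTTPFS_HOST> standing in for the HTTPFS server name used in this lab:

curl -i --negotiate -u : -w "\n%{http_code}\n" "http://<HTTPFS_HOST>:14000/webhdfs/v1?op=GETHOMEDIRECTORY"

The -u : form passes an empty user name and password so that curl takes the identity entirely from the cached Kerberos ticket.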


LIST DIRECTORY EXAMPLE

The screen shot above shows curl being used to connect to the HTTPFS gateway on port 14000; the LISTSTATUS operation is used as user hduser1 to do a directory listing on /tmp/hduser1.
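A sketch of the command, with the same <HTTPFS_HOST> placeholder:

curl -i --negotiate -u : "http://<HTTPFS_HOST>:14000/webhdfs/v1/tmp/hduser1?op=LISTSTATUS"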

CREATE DIRECTORY EXAMPLE

The screen shot above shows curl being used to connect to the HTTPFS gateway on port 14000; the MKDIRS operation is used as user hduser1 to create the directory /tmp/hduser1/test. The Boolean result of true means the command executed successfully.
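The command is of roughly this form:

curl -i --negotiate -u : -X PUT "http://<HTTPFS_HOST>:14000/webhdfs/v1/tmp/hduser1/test?op=MKDIRS"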

We can verify the creation of the directory with the hdfs command as shown below:

This concludes the HTTPFS installation, configuration, and testing section of this document. The next section

covers how to integrate Knox with HTTPFS and Isilon.


KNOX

Knox enables the integration of enterprise identity management solutions and provides numerous perimeter security features for REST/HTTP access to Hadoop services. Knox currently supports YARN, WebHCAT, Oozie, HBase, Hive, and WebHDFS Hadoop services. The focus of this paper is on the WebHDFS Hadoop service only. Just like HTTPFS, Knox can be installed on Kerberized and non-Kerberized

Hadoop clusters.

Knox by default uses WebHDFS to perform any HDFS operation, but it can also leverage HTTPFS for the same HDFS

operations. Knox with HTTPFS provides a defense-in-depth strategy around REST/HTTP access to Hadoop and Isilon

OneFS.

This section covers the installation and configuration of Knox and LDAP services to work with HTTPFS in a

Kerberized cluster to provide secure REST/HTTP communications to Hadoop and Isilon OneFS.

INSTALLING KNOX

Knox is included with Hortonworks Data Platform by default. If you unselected the Knox service during installation

of HDP, just click the Actions button in Ambari and select the Knox service as shown below and click install.

CONFIGURING KNOX USING AMBARI

Knox can be managed through Ambari. Since HTTPFS runs on port 14000, a topology change to Knox for the

WebHDFS role is needed. Change the topology within the Advanced topology section in Ambari/Knox; an example topology configuration for the WebHDFS role is shown below:


The WebHDFS role is listed as a service in the topology configuration:

<service>

<role>WEBHDFS</role>

<url>http://<HTTPFS_HOST>:14000/webhdfs</url>

</service>

The HTTPFS_HOST should be replaced with the fully qualified name of the HTTPFS server. Port 14000 is the default

port for HTTPFS. If you made a change to the HTTPFS port assignment make sure to reflect the port change in the

Knox topology configuration as well. Everything else in the topology configuration can be left alone unless you

made other port changes to other services.

In the Ambari web interface, check that knox is configured as a proxy user in core-site.xml in the HDFS > Configs >

Advanced > Custom core site section and that the fully qualified domain name of the Knox host is set.

Note: If the properties that are referenced below do not already exist, do the following steps:

1. Click the Add Property link in the Custom core site area to open the Add Property window.

2. Add each value in the <name> part in the Key field.

3. Add each value in the <value> part in the Value field.

4. Click Add. Then click Save.


<property>

<name>hadoop.proxyuser.knox.hosts</name>

<value>n105.solarch.lab.emc.com</value>

</property>

<property>

<name>hadoop.proxyuser.knox.groups</name>

<value>users</value>

</property>

Make sure to restart HDFS and related components after making the above changes to core-site.xml.

CONFIGURING KNOX FOR LDAP

Knox can easily integrate with LDAP - just add an LDAP provider and associated parameters to the topology

configuration and you are done. An example LDAP provider (within the topology file) is shown below:

<provider>

<role>authentication</role>

<name>ShiroProvider</name>

<enabled>true</enabled>

<param>

<name>sessionTimeout</name>

<value>30</value>

</param>

<param>

<name>main.ldapRealm</name>

<value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>

</param>

<param>

<name>main.ldapRealm.userDnTemplate</name>


<value>uid={0},ou=people,dc=hadoop,dc=apache,dc=org</value>

</param>

<param>

<name>main.ldapRealm.contextFactory.url</name>

<value>ldap://localhost:33389</value>

</param>

<param>

<name>main.ldapRealm.contextFactory.authenticationMechanism</name>

<value>simple</value>

</param>

<param>

<name>urls./**</name>

<value>authcBasic</value>

</param>

</provider>

The LDAP provider directs Knox to use a directory service for authentication. In the example above, a local LDAP

Provider (port 33389) is being used for basic authentication on all URLs. Make sure you use a supported LDAP service compatible with Hortonworks and Isilon, and modify the Knox topology configuration to match your deployed LDAP configuration if LDAP will be used with Knox.

Supported LDAP servers:

OpenLDAP
Active Directory w/ RFC2307 schema extension
Apple OpenDirectory (OD)
Centrify
Oracle Directory Server
ApacheDS
Red Hat Directory Server (RHDS)
RadiantLogic VDS
Novell Directory Server (NDS)


CONFIGURING KNOX FOR KERBEROS

If the Hadoop cluster is secured with Kerberos, you need to make sure Knox is configured for Kerberos as well to avoid authentication errors with the HTTPFS gateway and the backend Isilon cluster. The Kerberos configuration for Knox is done under Advanced gateway-site in Ambari. An example configuration is shown below:

The Advanced gateway-site configuration allows you to specify the Knox gateway port (e.g. 8444), the location of

the krb5.conf (Kerberos configuration file), and set the gateway to use Kerberos (set to true).

The Advanced knox-env section in Ambari allows you to set the Knox user and group accounts, Knox keytab path, and Knox

Principal Name. An example configuration is shown below:

Note: the knox user also needs to be created on Isilon. The knox user is a system account that gets created during

installation of knox. As with all other Hadoop server accounts, Isilon needs to have all service accounts defined as

a LOCAL PROVIDER in the appropriate HDFS Access Zone (e.g. hdp24) as shown below.


Create the knox user in the LOCAL HDFS Access Zone for your cluster in Isilon OneFS. Assign the knox user to the

hadoop primary group. Leave the knox account Disabled as shown above and below. The UID on Isilon does not

need to match the UID on the knox server.


TESTING KNOX AND ISILON IMPERSONATION DEFENSE

Now that Knox and HTTPFS have been installed and configured, we can begin end-to-end testing with Isilon in a

secure Kerberos Hadoop cluster deployment using either curl or a web browser.

GETTING A USER’S HOME DIRECTORY EXAMPLE

The screen shot above shows curl being used to connect to the Knox gateway on port 8444 as LDAP user ldapuser1; the GETHOMEDIRECTORY operation is used to retrieve the home directory info for the LDAP user. The network connection to the Knox gateway is secured with TLS.
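A sketch of the command; <KNOX_HOST> and the default topology name are assumptions for this lab, and -k accepts the self-signed certificate discussed later in this section:

curl -ik -u ldapuser1 "https://<KNOX_HOST>:8444/gateway/default/webhdfs/v1?op=GETHOMEDIRECTORY"

curl prompts for the LDAP password, and Knox forwards the authenticated request to HTTPFS.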

Let’s see what happens when we use the same REST HTTP operation over a web browser that connects to the

Knox gateway:

First, the Knox gateway will prompt for user authentication, after entering the correct LDAP credentials, we can

see the result of the REST HTTP GETHOMEDIRECTORY operation in the web browser as shown below:


Note that the network connection to the Knox gateway is secured with TLS as shown below:

I used self-signed certificates for this lab deployment, so there is a certificate error shown, but the network

connection is securely encrypted with TLS and a strong AES cipher.

OPENING A FILE EXAMPLE

Unlike the GETHOMEDIRECTORY operation shown in the previous test example, the OPEN operation actually

accesses data - we want to employ more security checks when data is being accessed in cases like this.


The screen shot above shows curl being used to connect to the Knox gateway on port 8444 as LDAP user ldapuser1; the OPEN operation then tries to open the contents of the project.txt file in /tmp/hduser1, but a Server

Error is encountered. Although Isilon is aware of ldapuser1, Isilon provides an added layer of security to check for

impersonation attacks.
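The blocked request in the screen shot is of roughly this form (same <KNOX_HOST> and topology assumptions as before):

curl -ik -u ldapuser1 "https://<KNOX_HOST>:8444/gateway/default/webhdfs/v1/tmp/hduser1/project.txt?op=OPEN"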

In this case, the HTTPFS gateway (which runs as the httpfs user) is acting as a proxy for ldapuser1's REST HTTP requests between Knox and Isilon. When Isilon receives the OPEN request from httpfs on behalf of ldapuser1, Isilon checks its Proxy User settings to see if httpfs is authorized to impersonate ldapuser1 or the group ldapuser1 is in, i.e., the hadoop group.

Assuming it is within policy for httpfs to impersonate anyone in the hadoop group, we can update the Proxy User

settings on Isilon so httpfs is authorized to process requests from either the ldapuser1 user specifically or anyone

in the hadoop group. The example below depicts a proxy configuration for the hadoop group:

With the proxy user setting in place, we can successfully run the previous test example to open a file:


As shown above, with the correct Isilon Proxy User Policy in place on Isilon, the OPEN operation is now allowed. Note: If the /tmp/hduser1 directory on Isilon did not have global read permissions set, this

operation would fail as shown below:

Changing the permissions on the /tmp/hduser1 directory on Isilon caused a permission denied error for

the same previous test operation. This is a testament to the embedded Isilon OneFS security features

and a benefit of using a centralized HDFS storage solution like Isilon.

CREATE DIRECTORY EXAMPLE

The screen shot above shows curl being used to connect to the Knox gateway on port 8444; the MKDIRS operation is used as user ldapuser1 to create the directory /tmp/ldaptest. The Boolean result of true means the command executed successfully.
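A sketch of the command:

curl -ik -u ldapuser1 -X PUT "https://<KNOX_HOST>:8444/gateway/default/webhdfs/v1/tmp/ldaptest?op=MKDIRS"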


We can verify the creation of the directory with the hdfs command as shown below:

This concludes the Knox installation, configuration, and testing section of this document.

Please see the Appendix for additional Knox/HTTPFS/Isilon test examples.


FINAL COMMENTS

This solution has been tested and certified by both DELL EMC and Hortonworks with success. One thing that was noticed during testing of the integrated solution is that httpfs wants the header Content-Type: application/octet-stream stipulated on data upload requests. The Content-Type header is accepted by both WebHDFS and HTTPFS, but HTTPFS will throw a 400 Bad Request error when it is missing.

For example, let's say you create a test data_file on the cluster with the CREATE operation; you will need to use the -H flag with curl to specify the Content-Type accordingly, see the example below:
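A sketch of the upload; the -L flag lets curl follow the redirect HTTPFS may issue for CREATE, and data_file is the local test file being uploaded:

curl -ik -L -u ldapuser1 -T data_file -H "Content-Type: application/octet-stream" "https://<KNOX_HOST>:8444/gateway/default/webhdfs/v1/tmp/ldaptest/data_file?op=CREATE"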

With the Content-Type specified, the data upload completes successfully with no errors. This is an HTTPFS requirement and has nothing to do with either Knox or Isilon OneFS. We can use the hdfs command to see the content of the created data_file as shown below:


Reading the file via curl does not require anything special as shown below:

The port for Knox was changed to 8444 instead of the default 8443. Be aware that when setting up HTTPS for the Ambari web interface, the default port is also 8443. To avoid port conflicts, I recommend you carefully assign a unique port to your Knox gateway; port 8444 is a safe bet.


APPENDIX


ADDITIONAL TESTING RESULTS

Below are additional testing examples for reference.

RENAMING A FILE EXAMPLE

The above curl command connects to the Knox gateway on port 8444 as LDAP user ldapuser1 to execute the RENAME operation, renaming data_file to data_file_new; the Boolean result of true means the command executed successfully.
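The command is of roughly this form:

curl -ik -u ldapuser1 -X PUT "https://<KNOX_HOST>:8444/gateway/default/webhdfs/v1/tmp/ldaptest/data_file?op=RENAME&destination=/tmp/ldaptest/data_file_new"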

We can verify further by listing the contents of the /tmp/ldaptest directory:

SETTING FILE REPLICATION EXAMPLE

The above curl command connects to the Knox gateway on port 8444 as LDAP user ldapuser1 to execute the SETREPLICATION operation, setting replication to 1 for data_file_new; the Boolean result of true means the command executed successfully.
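A sketch of the command:

curl -ik -u ldapuser1 -X PUT "https://<KNOX_HOST>:8444/gateway/default/webhdfs/v1/tmp/ldaptest/data_file_new?op=SETREPLICATION&replication=1"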


Note: Isilon will always respond with true for these kinds of requests, but in reality the Isilon OneFS file system is much more efficient than HDFS replication; Isilon uses erasure coding instead of replication to maintain high availability.

SETTING FILE PERMISSIONS EXAMPLE

The above curl command connects to the Knox gateway on port 8444 as LDAP user ldapuser1 to execute the SETPERMISSION operation, setting permissions to 777 on data_file_new; the HTTP/1.1 200 OK result means the command executed successfully. The hdfs command shows that the permissions for this data file were changed on Isilon accordingly.
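A sketch of the command:

curl -ik -u ldapuser1 -X PUT "https://<KNOX_HOST>:8444/gateway/default/webhdfs/v1/tmp/ldaptest/data_file_new?op=SETPERMISSION&permission=777"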

APPENDING DATA TO A FILE EXAMPLE


The above curl command connects to the Knox gateway on port 8444 as LDAP user ldapuser1 to execute the APPEND operation, adding ApendInfo to data_file_new; the HTTP/1.1 200 OK result means the command executed successfully. The hdfs command shows the data was appended successfully on Isilon.
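A sketch of the command; append_file is a hypothetical local file holding the text to append, and -L follows the redirect HTTPFS may issue for APPEND:

curl -ik -L -u ldapuser1 -X POST -T append_file -H "Content-Type: application/octet-stream" "https://<KNOX_HOST>:8444/gateway/default/webhdfs/v1/tmp/ldaptest/data_file_new?op=APPEND"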

RECURSIVE DELETE EXAMPLE

The above curl command connects to the Knox gateway on port 8444 as LDAP user ldapuser1 to execute the DELETE operation, recursively deleting /tmp/ldaptest; the HTTP/1.1 200 OK result means the command executed successfully. The hdfs command shows the directory and its contents were successfully removed from Isilon.
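A sketch of the command:

curl -ik -u ldapuser1 -X DELETE "https://<KNOX_HOST>:8444/gateway/default/webhdfs/v1/tmp/ldaptest?op=DELETE&recursive=true"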