© Hortonworks Inc. 2014 Securing Hadoop’s REST APIs Apache Knox Gateway Hadoop Summit 2014 Kevin Minder Larry McCay http://knox.apache.org/ user (at) knox.apache.org dev (at) knox.apache.org

Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


DESCRIPTION

Securing Hadoop's REST APIs with Apache Knox Gateway. Presented at Hadoop Summit on June 6th, 2014. Describes the overall role the Apache Knox Gateway plays in Hadoop security and briefly covers its primary features.


Page 1: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014

© Hortonworks Inc. 2014

Securing Hadoop’s REST APIs Apache Knox Gateway

Hadoop Summit 2014

Kevin Minder
Larry McCay
http://knox.apache.org/

user (at) knox.apache.org
dev (at) knox.apache.org

Page 2: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


What is Apache Knox?

• The Apache Knox Gateway is…

• an extensible reverse proxy framework

• for securely exposing REST APIs and HTTP based services at a perimeter

• out of the box it provides:

• support for several of the most common Hadoop services

• integration with enterprise authentication systems

• several other useful features

Page 3: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


What the Apache Knox Gateway isn’t

• Not an alternative to Kerberos for strong Hadoop core authentication

• Not a channel for high volume data ingest or export

Page 4: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


History and Status of the Apache Knox Gateway

• 2013-02: Accepted into Apache Incubator

• 2013-04: Released 0.2.0

• 2013-10: Released 0.3.0

• 2014-02: Graduated to Apache TLP

• 2014-04: Released 0.4.0, Included in HDP 2.1

Page 5: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Why Knox?

Simplified Access

• Kerberos encapsulation
• Extends API reach
• Single access point
• Multi-cluster support
• Single SSL certificate

Centralized Control

• Central REST API auditing
• Service-level authorization
• Alternative to SSH “edge node”

Enterprise Integration

• LDAP integration
• Active Directory integration
• SSO integration
• Apache Shiro extensibility
• Custom extensibility

Enhanced Security

• Protect network details
• Partial SSL for non-SSL services
• WebApp vulnerability filter

Page 6: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Layers Of Hadoop Security

Perimeter Level Security
• Network Security (i.e. Firewalls)
• Apache Knox (i.e. Gateways)

Authentication
• Kerberos
• Delegation Tokens

OS Security
• File Permissions
• Process Isolation

Authorization
• MR ACLs
• HDFS Permissions
• HDFS ACLs
• HiveATZ-NG
• HBase ACLs
• Accumulo Label Security
• XA Security Policies

Data Protection
• Transport
• Storage

Page 7: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


What does Perimeter Security really mean?

[Diagram: User, Firewall, Gateway, and Hadoop Services, connected by REST API calls]

• Firewall required at perimeter (today)
• Knox Gateway controls all Hadoop REST API access through firewall
• Hadoop cluster mostly unaffected
• Firewall only allows connections through specific ports from Knox host

Page 8: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


What REST APIs does Hadoop support?

Service | URL Example
WebHDFS | http://localhost:50070/webhdfs
WebHCat (aka Templeton) | http://localhost:50111/templeton
Oozie | http://localhost:11000/oozie
HBase (via Stargate) | http://localhost:60080
Hive (HiveServer2) | http://localhost:10001/cliservice
Hive (HiveServer2, JDBC) | jdbc:hive2://localhost:10001/?hive.server2.transport.mode=http;hive.server2.thrift.http.path=cliservice

Page 9: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Basic Knox Operation & Extensibility

Page 10: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Authentication and Identity Propagation

[Diagram labels: LDAP, knox keytab]

0. Configure knox user to be known as trusted proxy

1. REST API Request

2. HTTP Basic Auth Challenge (kminder:secret)

3. Authenticate kminder:secret

4. Authenticates as knox via SPNego (i.e. Kerberos)

5. REST API Request, doAs kminder
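The two client-visible pieces of the flow above can be sketched in a few lines of Python. This is only an illustration, not Knox's implementation: the function names are hypothetical, and the `doAs` query parameter spelling follows the slide's wording.

```python
import base64
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def basic_auth_header(user: str, password: str) -> str:
    """Authorization header value a client sends in answer to the Basic challenge."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def with_do_as(url: str, authenticated_user: str) -> str:
    """Return the downstream URL with doAs set to the authenticated user.

    Any doAs value supplied by the client is dropped first, so only the
    identity the gateway itself verified is asserted to the cluster.
    """
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k.lower() != "doas"]
    query.append(("doAs", authenticated_user))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

In this sketch the gateway would call `with_do_as` only after step 3 succeeds, which is why a client-supplied `doAs` is discarded rather than trusted.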

Page 11: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Scalability and Fault Tolerance

[Diagram: load balancer in front of a Knox cluster (no shared state), in front of Hadoop]

• Load balancer options: Apache HTTPD + mod_proxy_balancer, F5 BIG-IP, HAProxy
• Really any traditional web tier load balancer

Page 12: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Extensibility: Providers and Services

• Both are dynamically discovered on the class path via Java’s ServiceLoader

• Providers
  • Add new features to the gateway that can be used by Services
  • Typically result in one or more filters being added to one or more chains

• Services
  • Add new endpoints to the gateway to expose a specific service
  • Assemble filter chains to enable specific features via providers
    • Includes providing configuration to providers
      • For example URL rewrite rules
  • Associates endpoints with filter chains
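The relationship between providers, filter chains, and endpoints can be modeled conceptually as follows. Knox itself is Java and builds servlet filter chains; this Python sketch only illustrates the idea, and every name in it (`build_chain`, `authentication`, `rewrite`, `webhdfs_endpoint`) is hypothetical.

```python
from typing import Callable, Dict, List

Request = Dict[str, str]
Handler = Callable[[Request], str]
Filter = Callable[[Request, Handler], str]


def build_chain(filters: List[Filter], endpoint: Handler) -> Handler:
    """Wrap the endpoint so that filters run in list order before it."""
    def wrap(flt: Filter, nxt: Handler) -> Handler:
        return lambda req: flt(req, nxt)

    handler = endpoint
    for flt in reversed(filters):
        handler = wrap(flt, handler)
    return handler


def authentication(req: Request, nxt: Handler) -> str:
    """Provider-contributed filter: reject requests without a user."""
    if "user" not in req:
        return "401 Unauthorized"
    return nxt(req)


def rewrite(req: Request, nxt: Handler) -> str:
    """Provider-contributed filter: strip the gateway prefix from the path."""
    rewritten = dict(req, path=req["path"].replace("/gateway/sandbox", "", 1))
    return nxt(rewritten)


def webhdfs_endpoint(req: Request) -> str:
    """Stand-in for dispatching the request to the real service."""
    return f"200 OK {req['path']} as {req['user']}"


# A "service" associates its endpoint with the chain it assembled.
chain = build_chain([authentication, rewrite], webhdfs_endpoint)
```

Calling `chain({"path": "/gateway/sandbox/webhdfs/v1/tmp", "user": "guest"})` runs authentication, then the rewrite, then the endpoint.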

Page 13: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Topology Files

• Describe the services that should be exposed for a specific cluster
• Found in <GATEWAY_HOME>/conf/topologies
• Name of topology file dictates URL component
  • sandbox.xml -> http://localhost:8443/gateway/sandbox/webhdfs/…

<topology>
  <gateway>
    <!-- Selects an authentication provider implementation -->
    <provider>
      <role>authentication</role>
      <name>custom</name>
    </provider>
  </gateway>
  <service>
    <role>WEBHDFS</role>
    <!-- Location of WebHDFS in target cluster -->
    <url>http://localhost:50070</url>
  </service>
</topology>
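As a rough illustration of how a topology file name and its service entries map to gateway URLs, the sketch below parses the example topology with Python's standard XML parser. The helper name `gateway_urls` is hypothetical, and deriving the URL path segment by lower-casing the service role is an assumption that happens to fit the WEBHDFS example.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

TOPOLOGY_XML = """
<topology>
  <gateway>
    <provider>
      <role>authentication</role>
      <name>custom</name>
    </provider>
  </gateway>
  <service>
    <role>WEBHDFS</role>
    <url>http://localhost:50070</url>
  </service>
</topology>
"""


def gateway_urls(topology_file: str, xml_text: str,
                 base: str = "http://localhost:8443/gateway") -> dict:
    """Map each service role to (gateway URL, backend URL in the cluster)."""
    cluster = PurePosixPath(topology_file).stem  # sandbox.xml -> sandbox
    root = ET.fromstring(xml_text)
    urls = {}
    for service in root.findall("service"):
        role = service.findtext("role")
        backend = service.findtext("url")
        # Assumption: the path segment is the lower-cased role name.
        urls[role] = (f"{base}/{cluster}/{role.lower()}", backend)
    return urls
```

For `conf/topologies/sandbox.xml` this yields the `http://localhost:8443/gateway/sandbox/webhdfs` prefix shown on the slide, paired with the backend address `http://localhost:50070`.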

Page 14: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Enhanced Security

Page 15: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Topology Leakage: WebHDFS Example

• WebHDFS direct

curl -i -X PUT 'http://localhost:50070/webhdfs/v1/user/guest/file1?op=CREATE&user.name=guest'

HTTP/1.1 307 TEMPORARY_REDIRECT
Location: http://sandbox.hortonworks.com:50075/webhdfs/v1/user/guest/file1?op=CREATE&user.name=guest&namenoderpcaddress=sandbox.hortonworks.com:8020&overwrite=false

The HTTP address of the DataNode and the RPC address of the NameNode are leaked unnecessarily to the REST client

• WebHDFS via Knox

curl -u guest:guest-password -i -k -X PUT 'https://localhost:8443/webhdfs/v1/user/guest/file2?op=CREATE'

HTTP/1.1 307 Temporary Redirect
Location: https://localhost:8443/gateway/sandbox/webhdfs/data/v1/webhdfs/v1/user/guest/file2?_=AAAACAAAABAAAACAgUDT7-QQZlpkcm09lxrxI0Bgo9d-Egghp_qxmd4pQsmm3zvYc3M_LrDBQpMBNA48DnMS9QOhyzywCMl1WAShyX4RUETPjEcZa6x9Jwz7TMANjSRKMR6F3rKf93ME-VsI2Phe8CX72L6oiI778--8F9DQCO8LHFHzLL70iB13Hm2BLyj-x9p3tn7FOHxkbPl5d-eHxVop7Dk

Encrypted query param contains dispatch information used by the gateway when the redirect is followed
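One way to see the difference between the two redirects is to check which host names a `Location` header reveals beyond the gateway itself. The sketch below is a crude, hypothetical heuristic (it treats any query value containing a dot as a possible host name) and is not part of Knox.

```python
from urllib.parse import parse_qsl, urlsplit


def leaked_hosts(location: str, gateway_host: str) -> set:
    """Return host names visible in a redirect URL other than the gateway's."""
    parts = urlsplit(location)
    hosts = set()
    if parts.hostname and parts.hostname != gateway_host:
        hosts.add(parts.hostname)
    for _name, value in parse_qsl(parts.query):
        candidate = value.split(":")[0]  # drop a trailing :port if present
        # Heuristic assumption: dotted values look like host names.
        if "." in candidate and candidate != gateway_host:
            hosts.add(candidate)
    return hosts
```

Applied to the direct WebHDFS redirect above, the function reports `sandbox.hortonworks.com`; applied to a gateway redirect whose only query parameter is an opaque token, it reports nothing.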

Page 16: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Topology Leakage: Oozie Example

• Example of submitting an Oozie job from Apache docs
  • https://oozie.apache.org/docs/4.0.1/WebServicesAPI.html
• HTTP POST XML below to /oozie/v1/jobs

• Oozie direct

<configuration>
  <property>
    <name>oozie.wf.application.path</name>
    <value>hdfs://foo:9000/user/bansalm/myapp/</value>
  </property>
  ...
</configuration>

REST client must know RPC address of NameNode

• Oozie via Knox

<configuration>
  <property>
    <name>oozie.wf.application.path</name>
    <value>/user/bansalm/myapp/</value>
  </property>
  ...
</configuration>

Page 17: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Partial SSL for non-SSL enabled services

[Diagram: Desktop -> HTTPS (REST API) -> Gateway (DMZ) -> HTTP (REST API) -> WebHCat]

• First “hop” through public/corp networks protected with SSL
• Last “hop” within secure network non-SSL

Page 18: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


WebApp Vulnerability Filter

• The Knox WebAppSec provider allows vulnerability prevention filters to be plugged in
• Cross Site Request Forgery (CSRF) protection is currently provided
  • Uses the common required-header technique
• Later releases will include more filters based on standard techniques

<provider>
  <role>webappsec</role>
  <name>WebAppSec</name>
  <enabled>true</enabled>
  <param><name>csrf.enabled</name><value>true</value></param>
  <param><name>csrf.customHeader</name><value>X-XSRF-Header</value></param>
  <param><name>csrf.methodsToIgnore</name><value>GET,OPTIONS,HEAD</value></param>
</provider>
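The required-header technique can be sketched as a single check, using the header name and ignored methods from the configuration above as defaults. This is a minimal illustration under those assumptions, not the WebAppSec provider's actual code; the function name `csrf_allows` is hypothetical.

```python
from typing import Iterable, Mapping


def csrf_allows(method: str,
                headers: Mapping[str, str],
                custom_header: str = "X-XSRF-Header",
                methods_to_ignore: Iterable[str] = ("GET", "OPTIONS", "HEAD")) -> bool:
    """Allow a request if its method is ignored or the custom header is present.

    A browser cannot add a custom header to a cross-site form post, so the
    mere presence of the header is taken as evidence the request is not forged.
    """
    if method.upper() in {m.upper() for m in methods_to_ignore}:
        return True
    # HTTP header names are case-insensitive.
    return any(name.lower() == custom_header.lower() for name in headers)
```

A REST client talking to a gateway configured this way would therefore add the header (with any value) to every PUT, POST, or DELETE request.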

Page 19: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Simplified Access

Page 20: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Knox Service URLs vs. direct URLs

Service | Direct URL | Knox URL
WebHDFS | http://namenode-host:50070/webhdfs | https://knox-host:8443/webhdfs
WebHCat | http://webhcat-host:50111/templeton | https://knox-host:8443/templeton
Oozie | http://ooziehost:11000/oozie | https://knox-host:8443/oozie
HBase | http://hbasehost:60080 | https://knox-host:8443/hbase
Hive | http://hivehost:10001/cliservice | https://knox-host:8443/hive

• Direct URLs: masters could be on many different hosts
• Knox URLs: one host, one port
• Knox URLs: consistent paths

Page 21: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Hadoop CLIs require full server configs

/etc/hive/conf/hive-site.xml

<property>
  <name>hive.server2.thrift.http.port</name>
  <value>10001</value>
</property>
<property>
  <name>hive.server2.thrift.http.path</name>
  <value>cliservice</value>
</property>

/etc/hadoop/conf/core-site.xml

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://sandbox.hortonworks.com:8020</value>
</property>

/etc/hadoop/conf/hdfs-site.xml

<property>
  <name>dfs.namenode.http-address</name>
  <value>sandbox.hortonworks.com:50070</value>
</property>

/etc/hadoop/conf/yarn-site.xml

<property>
  <name>yarn.resourcemanager.address</name>
  <value>sandbox.hortonworks.com:8050</value>
</property>

/etc/hive-webhcat/conf/webhcat-site.xml

<property>
  <name>templeton.port</name>
  <value>50111</value>
</property>

/etc/oozie/conf/oozie-site.xml

<property>
  <name>oozie.base.url</name>
  <value>http://sandbox.hortonworks.com:11000/oozie</value>
</property>

HBase – Command line

These files may all be on different nodes on the cluster too!

Page 22: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Kerberos Encapsulation

[Diagram label: knox keytab]

0. Configure knox as trusted proxy

1. REST API Request

2. HTTP Basic Auth Challenge (kminder:secret)

3. Authenticate kminder:secret

4. Authenticates as knox via SPNego (i.e. Kerberos)

5. REST API Request, doAs kminder

The client isn’t even aware the cluster is secured with Kerberos

Page 23: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


REST API Reach: Intranet Access Model

[Diagram: Desktop -> REST API -> Gateway (DMZ) -> REST API -> Hadoop]

Users will discover novel ways to use easily accessible REST APIs

Page 24: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


REST API Reach: Middleware Access Model

“Give the APIs to the Apps”

[Diagram: Browser -> HTML/JS -> App Server (Web Tier / DMZ) -> REST -> Gateway -> REST -> Hadoop]

Most enterprises cannot deal with Kerberos in the web tier and don’t have CLI access

Page 25: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


REST API Reach: Internet Access Model

“Give the APIs to Everyone”

[Diagram: Internet -> REST API -> Gateway (DMZ) -> REST API -> Hadoop]

HaaS vendors are exposing Hadoop REST APIs to the internet. What does the API tell these clients about your cluster?

Page 26: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Multi-Cluster Support

[Diagram: one Gateway in front of two clusters]

• green (Production Cluster): http://knox:8443/gateway/green/webhdfs/v1
• blue (Research Cluster): http://knox:8443/gateway/blue/webhdfs/v1

One host, one port for many clusters
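The URL pattern on this slide can be written down as a small helper: the cluster (topology) name is the only part that changes between clusters. The function `knox_url` and its defaults are hypothetical and simply mirror the example URLs above.

```python
def knox_url(cluster: str, service: str, path: str = "",
             host: str = "knox", port: int = 8443, scheme: str = "http") -> str:
    """Build a gateway URL of the form scheme://host:port/gateway/<cluster>/<service>/<path>."""
    base = f"{scheme}://{host}:{port}/gateway/{cluster}/{service}"
    suffix = path.strip("/")
    return f"{base}/{suffix}" if suffix else base


# Same host and port, different clusters.
green = knox_url("green", "webhdfs", "v1")
blue = knox_url("blue", "webhdfs", "v1")
```

A client that targets several clusters only needs the gateway address plus the cluster names, instead of a separate set of host names and ports per cluster.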

Page 27: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Simplified Client Certificate Management

[Diagram labels: hdfscert, hivecert, hbasecert, knoxcert; knoxpubkey, hivepubkey, hbasepubkey, hdfspubkey]

• User only needs to trust Knox’s cert
• Admin only needs to manage multiple keys on Knox hosts

Page 28: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Centralized Control

Page 29: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Client Edge Node CLI Access Model

“Take the Users to the CLI”

[Diagram: Desktop -> SCP/SSH login -> Edge Node (DMZ) -> Hadoop CLIs -> Hadoop]

• Limited auditing on edge node
• CLI too hard to install on desktops

Page 30: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Improved auditing and access control

[Diagram: Desktop -> REST API login -> Gateway (DMZ) -> REST API -> Hadoop]

• All activity audited consistently
• Additional authorization control available

Page 31: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Service Level Authorization

• Control access to services by user, group or IP address

<provider>
  <role>authorization</role>
  <name>AclsAuthz</name>
  <enabled>true</enabled>
  <param>
    <name>WEBHDFS.acl</name>
    <value>*;admin;127.0.0.1</value>
  </param>
</provider>
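The ACL value in the example has three semicolon-separated parts: users, groups, and IP addresses. A minimal sketch of evaluating such a value follows. It assumes that `*` means "any" and that all three parts must match for access to be granted; both the function names and that AND semantics are assumptions for illustration, not a statement of how AclsAuthz is implemented.

```python
from typing import Iterable, Set, Tuple


def parse_acl(value: str) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split 'users;groups;ips' into three sets of comma-separated entries."""
    users, groups, ips = (
        {entry.strip() for entry in part.split(",")} for part in value.split(";")
    )
    return users, groups, ips


def acl_allows(value: str, user: str, user_groups: Iterable[str], remote_ip: str) -> bool:
    """Grant access only if the user, group, and IP parts all match (assumed AND)."""
    users, groups, ips = parse_acl(value)
    user_ok = "*" in users or user in users
    group_ok = "*" in groups or bool(groups & set(user_groups))
    ip_ok = "*" in ips or remote_ip in ips
    return user_ok and group_ok and ip_ok
```

Under these assumptions, `*;admin;127.0.0.1` reads as: any user, who is a member of the `admin` group, connecting from `127.0.0.1`.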

Page 32: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


XA Secure Integration

[Diagram labels: Policy Server, Agent]

0. Distribute policy

1. REST API Request

2. Service level authorization decision

3. REST API Request

• Agent integrated as authorization provider
• Policies authored in the portal and distributed by the policy server

Page 33: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


KNOX-250: SSH Bastion Auditing Functionality

• Community is developing an extension

• Based on Apache MINA SSHD

• Provides administrative SSH access via Knox

• Further centralizes auditing of cluster administration

• https://issues.apache.org/jira/browse/KNOX-250

Page 34: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


KNOX-250: SSH Bastion Auditing Functionality

[Diagram: Desktop -> SSH login -> Gateway (DMZ) -> Hadoop CLI -> Hadoop]

All activity audited consistently

Page 35: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Enterprise Integration

Page 36: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Apache Shiro Authentication Provider

• Apache Shiro is the primary authentication provider for Knox

• Used for both LDAP and Active Directory

• Apache Shiro is a popular JEE and JSE security framework

• Very modular and flexible architecture

• Many community extensions

• Integrated into Knox as a servlet filter

Page 37: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Apache Shiro Authentication Provider

<provider>
  <role>authentication</role>
  <name>ShiroProvider</name>
  <enabled>true</enabled>
  <param>
    <name>main.ldapRealm</name>
    <value>org.apache.shiro.realm.ldap.JndiLdapRealm</value>
  </param>
  <param>
    <name>main.ldapRealm.userDnTemplate</name>
    <value>uid={0},ou=people,dc=hadoop,dc=apache,dc=org</value>
  </param>
  <param>
    <name>main.ldapRealm.contextFactory.url</name>
    <value>ldap://localhost:33389</value>
  </param>
  <param>
    <name>main.ldapRealm.contextFactory.authenticationMechanism</name>
    <value>simple</value>
  </param>
  <param>
    <name>urls./**</name>
    <value>authcBasic</value>
  </param>
</provider>
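To make the `userDnTemplate` parameter concrete: the `{0}` placeholder stands for the login name, and the resulting distinguished name is what gets bound against the LDAP server. The sketch below reads the parameters from an abbreviated copy of the provider block and fills in the template. The helper names are hypothetical, and real code would also need to escape special characters in the user name.

```python
import xml.etree.ElementTree as ET

PROVIDER_XML = """
<provider>
  <role>authentication</role>
  <name>ShiroProvider</name>
  <param>
    <name>main.ldapRealm.userDnTemplate</name>
    <value>uid={0},ou=people,dc=hadoop,dc=apache,dc=org</value>
  </param>
  <param>
    <name>main.ldapRealm.contextFactory.url</name>
    <value>ldap://localhost:33389</value>
  </param>
</provider>
"""


def provider_params(xml_text: str) -> dict:
    """Collect the provider's <param> entries into a name -> value dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.findall("param")}


def user_dn(template: str, username: str) -> str:
    """Substitute the login name for the {0} placeholder in the DN template."""
    return template.replace("{0}", username)


params = provider_params(PROVIDER_XML)
guest_dn = user_dn(params["main.ldapRealm.userDnTemplate"], "guest")
```

With the template above, the user `guest` is bound as `uid=guest,ou=people,dc=hadoop,dc=apache,dc=org`.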

Page 38: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


SSO Integration

• Similar in concept to Hadoop’s trusted proxy model
• Preconfigured for SiteMinder use case
• HTTP Headers used to propagate pre-authenticated user and group info
• Only acceptable for use in a tightly controlled network environment

<provider>
  <role>federation</role>
  <name>HeaderPreAuth</name>
  <enabled>true</enabled>
  <param>
    <name>preauth.validation.method</name>
    <value>preauth.ip.validation</value>
  </param>
  <param>
    <name>preauth.ip.addresses</name>
    <value>127.0.*</value>
  </param>
</provider>
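The IP validation configured above can be pictured as a wildcard match of the caller's address against the allowed patterns. The sketch below uses shell-style matching from Python's standard library. Treating the value as a comma-separated list is an assumption, and the function name `ip_allowed` is hypothetical.

```python
from fnmatch import fnmatchcase


def ip_allowed(remote_ip: str, patterns: str) -> bool:
    """Return True if remote_ip matches any of the comma-separated wildcard patterns."""
    return any(
        fnmatchcase(remote_ip, pattern.strip())
        for pattern in patterns.split(",")
        if pattern.strip()
    )
```

With the pattern `127.0.*`, pre-authenticated headers would only be accepted from callers on `127.0.x.x`, which is why the slide restricts this approach to tightly controlled networks.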

Page 39: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


OAuth 2

• OAuth is becoming the de facto standard for communicating a user’s identity to REST APIs
  • It allows for explicit authorization by the user for the application to access resources
  • It has a number of ways to represent the user and authentication information sent over the wire
  • JSON Web Token (JWT) is an emerging standard for representing the various claims, attributes and scopes of an identity
  • Can be used as a bearer token, URL parameter or Header

• OAuth is also gaining popularity as a federation token for SSO integrations

Page 40: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


KNOX-393: OAuth Resource Provider

• Community investigating OAuth Federation Provider extension
• Considering Apache Oltu
• Warning: Diagram dramatically oversimplified
• There are a number of other potential flows

[Diagram label: kminder]

0. Configure knox user to be known as trusted proxy

1. requestAccessToken(JWT), return Bearer token

2. REST API Request, Authorization: Bearer <token>

3. validateAccessToken(<token>)

4. Authenticates as knox via SPNego (i.e. Kerberos)

5. REST API Request, doAs kminder
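Step 2 of the flow carries the token in an `Authorization: Bearer <token>` header. The sketch below extracts that token and, assuming it is a JWT, decodes its claims. It deliberately does not verify the signature, so it illustrates the token format only and is not a substitute for step 3; the function names are hypothetical.

```python
import base64
import json


def bearer_token(authorization: str) -> str:
    """Extract the token from an 'Authorization: Bearer <token>' header value."""
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        raise ValueError("not a Bearer authorization header")
    return token.strip()


def jwt_claims(token: str) -> dict:
    """Decode the claims (payload) of a JWT WITHOUT verifying its signature."""
    _header_b64, payload_b64, _signature = token.split(".")
    # base64url segments in a JWT omit padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

In a real deployment the claims would only be trusted after the signature and expiry have been validated by whatever component performs `validateAccessToken`.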

Page 41: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


What is next for Knox?

Jira | Assignee | Description
KNOX-393: OAuth Resource Provider for Middleware and Application Integration | COMMUNITY | OAuth 2 federation provider potentially based on Apache Oltu for external application SSO to Knox and Hadoop
KNOX-355: Support Knox Authentication Provider based on Hadoop Auth Module (SPNEGO) | KNOX Team | SPNEGO authentication support for Knox clients
KNOX-250: SSH Bastion Auditing Functionality | COMMUNITY | SSH tunneling and auditing functionality in addition to REST gateway services
KNOX-353: Support Hadoop Java Client URLs | KNOX Team | In order to be used by Hadoop CLIs that can use REST, we need to support the expected URLs. This is in addition to the extended URLs for multiple Hadoop cluster support by Knox.
KNOX-242: LDAP Authentication Enhancements | KNOX Team | Search attribute based authentication rather than simple LDAP bind
KNOX-74: Support YARN REST API | KNOX Team | Add support for the YARN REST API
KNOX-66: Support Ambari REST API access via the Gateway | KNOX Team | Add support for the Ambari REST API
TBD | TBD | What is important to you?

Page 42: Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, 2014


Interested?

• We’re hiring!
  • http://hortonworks.com/careers/open-positions/

• Especially hands on platform level development experience with
  • Kerberos
  • LDAP
  • OAuth
  • SAML
  • JAAS/GSS-API
  • Crypto