Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Page 1: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Owen O’Malley – Co-founder & Technical Fellow

Srikanth Venkat – Senior Director, Product Management

Treat Your Enterprise Data Lake Indigestion: Enterprise Ready Security And Governance For Hadoop Ecosystem

Page 2: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Presenters

Owen O’Malley

Co-Founder & Technical Fellow

Hortonworks

Srikanth Venkat

Senior Director of Product Management, Security & Governance

Apache Ranger, Apache Atlas, Apache Knox

Page 3: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Agenda

HDP Security

Authentication (Kerberos, Apache Knox)

Authorization & Audits (Apache Ranger)

Data Protection

HDP Governance: Apache Atlas Overview

Page 4: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

HDP Security: Comprehensive, Complete, Extensible

Administration: Central management and consistent security
Single administrative console to set policy across the entire cluster: Apache Ranger

Authentication: Authenticate users and systems
Authentication for perimeter and cluster; integrates with existing Active Directory and LDAP solutions: Kerberos | Apache Knox

Authorization: Provision access to data
Consistent authorization controls across all Apache components within HDP: Apache Ranger

Audit: Maintain a record of data access
Record of data access events across all components that is consistent and accessible: Apache Ranger

Data Protection: Protect data at rest and in motion
Secure data in motion and data at rest: HDFS TDE w/ Ranger KMS + HSM, Ranger Data Masking + Row Filtering, Wire encryption + Partner Solutions

Page 5: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Authentication & API Security: Apache Knox

Page 6: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Apache Knox Community Snapshot

Mar 2013: Entered Incubator
Oct 2013: 0.1.0 - 0.3.0 Incubator Releases
Feb 2014: Graduates to Apache TLP
Apr 2014: 0.4.0 TLP Release
Nov 2014: 0.5.0
May 2015: 0.6.0
Dec 2015: 0.7.0
Feb 2016: 0.8.0
Apr/Aug 2016: 0.9.0/0.9.1
Nov 2016: 0.10.0
Dec 2016: 0.11.0
Mar 2017: 0.12.0
TBD: 1.0.0 Target Release Date

• Committers: 17

• Contributors from: Hortonworks, IBM, CGI, Uber, Oracle, Blue Talon

Apache 0.12.0/HDP 2.6

• Client SDK/DSL Improvements

• Apache Zeppelin Proxying

• YARN RM UI HA Support

• Knox Token Service

• Solr API and UI

Apache 0.11.0

• LDAP Improvements

• Hadoop Group Lookup Support

• Phoenix Server Support (Avatica)

• Management UI

• Metrics

@apache_knox

Page 7: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Apache Knox Proxying Services

★ Provide access to Hadoop via proxying of HTTP resources

★ Ecosystem APIs and UIs, plus Hadoop-oriented dispatching for Kerberos, doAs (impersonation), etc.

Authentication Services

★ REST API access, WebSSO flow for UIs

★ LDAP/AD, Header based PreAuth

★ Kerberos, SAML, OAuth

Client DSL/SDK Services

★ Scripting through DSL

★ Using Knox Shell classes directly as SDK
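As a rough illustration of REST API access through the gateway, the sketch below builds the URL and HTTP Basic header a client might send to reach WebHDFS via Knox. The host name, the "default" topology, the credentials and both helper functions are hypothetical. The `gateway/{topology}/webhdfs/v1` layout and port 8443 reflect a common Knox setup and should be checked against the actual deployment. No request is sent.

```python
import base64
from urllib.parse import quote, urlencode


def knox_webhdfs_url(gateway_host, topology, hdfs_path, op,
                     port=8443, gateway_path="gateway"):
    """Build the proxied WebHDFS URL a client would call through Knox."""
    base = f"https://{gateway_host}:{port}/{gateway_path}/{topology}/webhdfs/v1"
    return f"{base}{quote(hdfs_path)}?{urlencode({'op': op})}"


def basic_auth_header(user, password):
    """HTTP Basic credentials, which the gateway can check against LDAP/AD."""
    raw = f"{user}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


# Hypothetical gateway host, topology name and credentials.
url = knox_webhdfs_url("knox.example.com", "default", "/tmp/data", "LISTSTATUS")
headers = basic_auth_header("guest", "guest-password")
print(url)
```

The client only ever sees the gateway address; Kerberos and impersonation toward the cluster are handled behind the proxy.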

Page 8: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Authentication: Kerberos

Page 9: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Background: Kerberos

⬢ Strongly authenticating and establishing a user’s identity is the basis for secure access in Hadoop

⬢ Users need to be able to reliably “identify” themselves and have identity propagated throughout the Hadoop cluster

⬢ Design & implementation of Kerberos security in native Apache Hadoop was delivered by Hortonworks co-founder Owen O’Malley!

⬢ Why Kerberos?

⬢ Establishes identity for clients, hosts and services

⬢ Prevents impersonation; passwords are never sent over the wire

⬢ Integrates with enterprise identity management tools such as LDAP & Active Directory

⬢ More granular auditing of data access/job execution

Page 10: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Automated Kerberos Setup with Ambari

Wizard-driven and automated Kerberos support (Kerberos principal creation for service accounts, keytab generation and distribution for appropriate hosts, permissions, etc.)

Removes cumbersome, time-consuming and error-prone administration of Kerberos

Works with existing Kerberos infrastructure, including Active Directory, to automate common tasks, removing the burden from the operator:

• Add/Delete Host

• Add Service

• Add/Delete Component

• Regenerate Keytabs

• Disable Kerberos

Page 11: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Kerberos + Active Directory

[Diagram: Client, Hadoop Cluster, AD / LDAP (User Store) and KDC, connected by Cross Realm Trust and Authentication]

Users: [email protected]

Hosts: [email protected]

Services: hdfs/[email protected]

Use existing directory tools to manage users

Use Kerberos tools to manage host + service principals
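The user, host and service entries above are Kerberos principal names of the general form primary[/instance]@REALM. A minimal sketch of splitting such a name into its parts follows. The `Principal` type, the `parse_principal` helper and the example names are hypothetical and are not taken from the slide.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user or service name, e.g. "hdfs"
    instance: Optional[str]  # usually the host for service principals
    realm: str               # Kerberos realm, conventionally upper case


def parse_principal(name: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its three parts."""
    local, sep, realm = name.rpartition("@")
    if not sep or not local or not realm:
        raise ValueError(f"not a full principal name: {name!r}")
    primary, _, instance = local.partition("/")
    return Principal(primary, instance or None, realm)


# Hypothetical principals for illustration only.
service = parse_principal("hdfs/nn1.example.com@HADOOP.EXAMPLE.COM")
user = parse_principal("alice@CORP.EXAMPLE.COM")
print(service)
print(user)
```

With a cross-realm trust, user principals would typically live in the directory realm while host and service principals live in the cluster realm.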

Page 12: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Authorization & Audits: Apache Ranger

Page 13: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Apache Ranger Community Snapshot

May 2014: XASecure Acquisition
July 2014: Enters Apache Incubation
Nov 2014: Ranger 0.4.0 Release
July 2015: Ranger 0.5/HDP 2.3
Aug 2016: Ranger 0.6/HDP 2.5
Nov 2016: Ranger 0.6.2/HDP 2.5.3
Jan 2017: Ranger TLP graduation!
Apr 2017: Ranger 0.7/HDP 2.6
Jun 2017: Ranger 0.7.1/HDP 2.6.1
TBD: 1.0.0 Target Release Date

• Committers: 22

• Contributors from: eBay, MSFT, Huawei, Pandora, Accenture, ING, Talend

Ranger 0.7/HDP 2.6

• Export/import of Policies

• $User and macros

• Plugin status tab

• “Show columns” and “describe extended” support

• Incremental LDAP Sync

• SmartSense Metrics

Ranger 0.6/HDP 2.5

• Classification (tag) based security (ABAC)

• Dynamic Column Masking & Row Filtering

• KMS HSM Integration (Safenet)

• Dynamic Policies & Deny Conditions

• LDAP Improvements & Audit Scalability

Page 14: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Apache Ranger

Authorization

• Centralized platform to define, administer and manage security policies consistently across Hadoop components

• HDFS, Hive, HBase, YARN, Kafka, Solr, Storm, Knox, NiFi, Atlas

• Extensible Architecture

• Custom policy conditions, user context enrichers

• Easy to add new component types for authorization

Auditing

• Central audit location for all access requests

• Support multiple destination sources (HDFS, Solr, etc.)

• Real-time visual query interface

Ranger KMS

• Store and manage encryption keys

• Support HDFS Transparent Data Encryption

• Integration with HSM

• Safenet LUNA
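To make the idea of a centrally defined policy concrete, here is a sketch of a Hive access policy expressed as the kind of JSON document that could be submitted to Ranger's policy REST endpoint. The field names follow my understanding of Ranger's public v2 policy model and should be verified against the Ranger version in use. The service name "cl1_hive", the policy name and the helper function are hypothetical. Nothing is sent over the network.

```python
import json


def hive_select_policy(service, name, database, table, groups):
    """Assemble a policy document granting SELECT on one Hive table."""
    return {
        "service": service,          # name of the Hive service registered in Ranger
        "name": name,
        "isEnabled": True,
        "isAuditEnabled": True,      # record access events for this policy
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": ["*"]},
        },
        "policyItems": [
            {
                "groups": list(groups),
                "users": [],
                "accesses": [{"type": "select", "isAllowed": True}],
                "delegateAdmin": False,
            }
        ],
    }


# Hypothetical service, policy and group names.
policy = hive_select_policy(
    "cl1_hive", "analysts-read-customers", "default", "ww_customers", ["analyst"]
)
payload = json.dumps(policy, indent=2)
print(payload)
```

Because the plugins pull policies from the central server, one document like this is enough; it does not have to be configured per component host.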

Page 15: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Ranger – Attribute Based Access Control (ABAC) Model

⬢ ABAC Model

⬢ Combination of the subject, action, resource, and environment

⬢ Uses descriptive attributes: AD group, Apache Atlas-based tags or classifications, geo-location, etc.

⬢ Ranger approach is consistent with NIST 800-162

⬢ Avoid role proliferation and manageability issues
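The ABAC model above can be sketched as a decision over the four attribute sets (subject, action, resource, environment). This is a simplified illustration, not Ranger's policy engine: the `Request` and `Policy` types, the sample policies and the rule that a matching deny overrides any allow are all assumptions of the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Request:
    subject: Dict[str, object]      # e.g. user name and AD groups
    action: str                     # e.g. "select"
    resource: Dict[str, object]     # e.g. table name and classification tags
    environment: Dict[str, object]  # e.g. geo-location


@dataclass
class Policy:
    name: str
    effect: str                     # "allow" or "deny"
    condition: Callable[[Request], bool]


def decide(policies: List[Policy], request: Request) -> str:
    """A matching deny wins; with no matching allow the answer is deny."""
    effects = [p.effect for p in policies if p.condition(request)]
    if "deny" in effects:
        return "deny"
    return "allow" if "allow" in effects else "deny"


# Hypothetical policies driven by attributes rather than per-user roles.
POLICIES = [
    Policy("analysts-read-pii", "allow",
           lambda r: r.action == "select"
           and "analyst" in r.subject["groups"]
           and "PII" in r.resource["tags"]),
    Policy("pii-stays-in-us", "deny",
           lambda r: "PII" in r.resource["tags"]
           and r.environment["location"] != "US"),
]


def request_from(location: str) -> Request:
    return Request(
        subject={"user": "joe", "groups": {"analyst"}},
        action="select",
        resource={"table": "ww_customers", "tags": {"PII"}},
        environment={"location": location},
    )


print(decide(POLICIES, request_from("US")))
print(decide(POLICIES, request_from("EU")))
```

Two attribute-based policies cover every analyst and every PII-tagged table, which is the sense in which ABAC avoids a separate role per combination.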

Page 16: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Ranger Architecture

[Diagram: Ranger Administration Portal, Ranger Policy Server, Ranger Audit Server (HDFS, Solr) and Integration API; Enterprise Users; Legacy Tools and Data Governance; Hadoop components, each with a Ranger Plugin: HDFS, HBase, Hive Server2, Knox, NiFi, Solr, Kafka, YARN, Storm, Atlas]

Page 17: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

HDP – Security & Governance

Industry First: Dynamic Tag-based Security Policies

[Diagram: Atlas tracks metadata and lineage for entities in the data lake (streams, pipelines, feeds, Hive tables, HDFS files, HBase tables) and keeps tags, assets and entities in its metastore. An Atlas client subscribes to a topic and gets metadata updates. Ranger manages access policies and audit logs; its PDP uses a resource cache and policies based on classification, prohibition, time and location.]

Page 18: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Dynamic Row Filtering & Column Masking: Apache Ranger with Apache Hive

User 1: Joe, Location: US, Group: Analyst
User 2: Ivanna, Location: EU, Group: HR

Groups: Analysts, HR, Marketing

Source table ww_customers:

Country  National ID  CC No             DOB        MRN         Name           Policy ID
US       232323233    4539067047629850  9/12/1969  8233054331  John Doe       nj23j424
US       333287465    5391304868205600  8/13/1979  3736885376  Jane Doe       cadsd984
Germany  T22000129    4532786256545550  3/5/1963   876452830A  Ernie Schwarz  KK-2345909

Ranger Policy Enforcement: query rewritten based on dynamic Ranger policies: filter rows by region & apply relevant column masking

Original Query (Joe):

SELECT country, nationalid, ccnumber, mrn, name FROM ww_customers

Country  National ID  CC No                MRN   Name
US       xxxxx3233    4539 xxxx xxxx xxxx  null  John Doe
US       xxxxx7465    5391 xxxx xxxx xxxx  null  Jane Doe

Users from the US Analyst group see data for US persons only, with CC and National ID (SSN) as masked values and MRN nullified

Original Query (Ivanna):

SELECT country, nationalid, name, mrn FROM ww_customers

Country  National ID  Name           MRN
Germany  T22000129    Ernie Schwarz  876452830A

EU HR Policy Admins can see unmasked data but are restricted by row filtering policies to see data for EU persons only
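The effect of the two policies in this example can be reproduced with a small simulation. This is not how Ranger or Hive implement enforcement (they rewrite the query). It only mimics the outcome on the three sample rows. The policy table, the helper functions and the EU country set are hypothetical.

```python
ROWS = [
    {"country": "US", "nationalid": "232323233", "ccnumber": "4539067047629850",
     "mrn": "8233054331", "name": "John Doe"},
    {"country": "US", "nationalid": "333287465", "ccnumber": "5391304868205600",
     "mrn": "3736885376", "name": "Jane Doe"},
    {"country": "Germany", "nationalid": "T22000129", "ccnumber": "4532786256545550",
     "mrn": "876452830A", "name": "Ernie Schwarz"},
]

EU_COUNTRIES = {"Germany"}  # assumption: just enough for the sample data


def show_last4(value):
    """Mask everything except the last four characters."""
    return "x" * (len(value) - 4) + value[-4:]


def show_first4_cc(value):
    """Keep the first four digits of a card number, mask the rest."""
    return value[:4] + " xxxx xxxx xxxx"


def nullify(value):
    return None


# One row filter and a set of column masks per group.
POLICIES = {
    "analyst": {
        "row_filter": lambda row: row["country"] == "US",
        "masks": {"nationalid": show_last4, "ccnumber": show_first4_cc, "mrn": nullify},
    },
    "hr": {
        "row_filter": lambda row: row["country"] in EU_COUNTRIES,
        "masks": {},
    },
}


def run_query(group, columns):
    """Return the selected columns as the given group would see them."""
    policy = POLICIES[group]
    visible = []
    for row in ROWS:
        if not policy["row_filter"](row):
            continue
        masked = {}
        for col in columns:
            mask = policy["masks"].get(col)
            masked[col] = mask(row[col]) if mask else row[col]
        visible.append(masked)
    return visible


for line in run_query("analyst", ["country", "nationalid", "ccnumber", "mrn", "name"]):
    print(line)
for line in run_query("hr", ["country", "nationalid", "name", "mrn"]):
    print(line)
```

Both users query the same table with ordinary SQL; only the policy attached to their group changes what comes back.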

Page 19: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Agenda

Data Protection

Page 20: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Data Protection in Hadoop

Data Protection must be applied at three different layers in Apache Hadoop

Storage: encrypt data while it is at rest
Transparent Data Encryption in HDFS, Ranger KMS + HSM, Partner Products (HPE Voltage, Protegrity, Dataguise)

Transmission: encrypt data as it is in motion
Native Apache Hadoop 2.0 provides wire encryption.

Upon Access: apply restrictions when accessed
Ranger (Dynamic Column Masking + Row Filtering), Partner Masking + Encryption

Page 21: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Ranger KMS

Transparent Data Encryption in HDFS

[Diagram: HDFS Client, NameNode (NN) and three DataNodes (DN) holding blocks A, B, C, D, with Ranger KMS and a SafeNet Luna HSM]

Benefits

• Selective encryption of relevant files/folders

• Prevent rogue admin access to sensitive data

• Fine grained access controls

• Transparent to end application w/o changes

• Ranger KMS integrated to external HSM (Safenet Luna) adding to reliability/security of KMS

Page 22: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Agenda

Apache Atlas: Vision & Features Overview

Page 23: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Background: DGI Community becomes Apache Atlas

Dec 2014: DGI group Kickoff
May 2015: Apache Atlas Incubation
Aug 2016: Apache 0.7 Foundation Release
Apr 2017: Apache 0.8 Release
Jun 2017: Atlas Becomes TLP!

* DGI: Data Governance Initiative

Global Financial Company

Apache Atlas 0.8/HDP 2.6

• Simplified Search UI

• Simplified APIs

• Classification-based security for HDFS, Kafka, HBase

• Knox SSO

• Performance/scalability improvements

Apache Atlas 0.7.1/HDP 2.5

• High availability support

• LDAP Authentication/Authorization

• Classification based security for Hive

• UI Redesign

• Committers: 35

• Code contributors from: Hortonworks, IBM, Aetna, Merck, Target

Page 24: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Apache Atlas Vision: Open Metadata & Governance Services

[Diagram: ATLAS METADATA connected to STRUCTURED sources (TRADITIONAL RDBMS, MPP APPLIANCES), METADATA, Kafka, Storm, Sqoop, Hive, Falcon, RANGER, Custom, Partners]

Comprehensive Enterprise Data Catalog
• Lists all of your data, where it is located, its origin (lineage), owner, structure, meaning, classification and quality
• Integrate both on-premise and cloud platforms to provide enterprise wide view

Open Enterprise Data Connectors
• Interoperable connector framework to connect to your data catalog out of the box with many vendor technologies
• No expensive population of proprietary siloed metadata repositories

Dynamic Metadata Discovery
• Metadata is added automatically to the catalog as new data is created or data is updated
• Extensible discovery processes that characterize and classify the data

Enabling Collaboration & Workflows
• Subject matter experts locate the data they need quickly and efficiently, share their knowledge about the data and its usage to help others
• Interested parties and processes are notified automatically

Automated Governance Processes
• Metadata-driven access control
• Auditing, metering, and monitoring
• Quality control and exception management
• Rights (entitlement) management

Predefined standards for glossaries, data schemas, rules and regulations

Vision:

Metadata-driven foundational governance services for enterprise data ecosystem

• Open frameworks and APIs

• Agile and secure collaboration around data and advanced analytics

• Reduce operational costs while extracting economic value of data

Page 25: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

High Level Architecture: 4 Key points

[Diagram: Atlas core (Type System, Repository, Search DSL) on Graph DB and Search, exposed through a REST API; Bridge and Connectors for Hive, Storm, Falcon, Sqoop, Kafka and Custom sources; Messaging Framework]

1 Data Lineage

Only product that captures lineage across Hadoop components at platform level.

2 Agile Data Modeling

Type system allows custom metadata structures in a hierarchical taxonomy

3 REST API

Modern, flexible access to Atlas services, HDP components, UI & external tools

4 Exchange

Leverage existing metadata / models by importing them from current tools. Export metadata to downstream systems

Page 26: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Apache Atlas: Lineage

Lineage

• Where does this data originate from (source/provenance)?

• Upstream path: Path through all data assets and processes leading up to current data asset

Impact

• How is this data being used?

• What other data assets (derivative/dependent) does this impact?

• Downstream path: Path through all data assets and processes leading out of current data asset

Used for forensics

• Impact analysis

• Auditing and Compliance
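Upstream (lineage) and downstream (impact) paths amount to graph traversals in opposite directions. The sketch below walks a small hand-written edge list. It is not Atlas's lineage API, and the asset and process names are hypothetical.

```python
from collections import deque

# Each edge means "data flows from the first item to the second".
EDGES = [
    ("raw.impressions", "etl_clean"),
    ("etl_clean", "curated.impressions"),
    ("curated.impressions", "report_daily"),
    ("curated.impressions", "ml_features"),
]


def _reachable(start, edges):
    """Breadth-first walk returning every node reachable from start."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    seen, queue = [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.append(nxt)
                queue.append(nxt)
    return seen


def downstream(asset):
    """Impact: everything derived from this asset."""
    return _reachable(asset, EDGES)


def upstream(asset):
    """Lineage: everything this asset was derived from."""
    return _reachable(asset, [(dst, src) for src, dst in EDGES])


print(upstream("curated.impressions"))
print(downstream("curated.impressions"))
```

Impact analysis for a schema change on `raw.impressions` would be `downstream("raw.impressions")`, which here reaches every other node in the sample graph.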

Page 27: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Apache Atlas: Lineage and Impact

Page 28: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Apache Atlas: Classification

• Categorize and curate data assets for easier discovery

• Associate context with data assets – Governance, Security, Business, …

GOVERNANCE

SECURITY

BUSINESS

Page 29: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Apache Atlas Classification: use case – cross component

Classification based security on cross-component data assets

Page 30: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Metadata Catalog Search : Basic Search

Search for a hive_table classified as ‘PII’ and name starting with ‘prov’

Filter by Data Asset type

Filter by Classification

Search text
Wildcards: prov*, *sum*
Logical expressions: prov* AND *sum*
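The same basic search can in principle be issued against the Atlas REST API rather than the UI. The sketch below only builds the request URL for the example on this slide. The host is hypothetical, and the `/api/atlas/v2/search/basic` path with its `typeName`, `classification` and `query` parameters is my understanding of the Atlas v2 API and should be confirmed for the Atlas version in use.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def basic_search_url(atlas_base, type_name=None, classification=None,
                     query=None, limit=25):
    """Build a basic-search URL from the three filters shown in the UI."""
    params = {}
    if type_name:
        params["typeName"] = type_name
    if classification:
        params["classification"] = classification
    if query:
        params["query"] = query
    params["limit"] = limit
    return f"{atlas_base}/api/atlas/v2/search/basic?{urlencode(params)}"


# Hypothetical Atlas host; filters taken from the slide's example.
url = basic_search_url("http://atlas.example.com:21000",
                       type_name="hive_table", classification="PII", query="prov*")
print(url)
print(parse_qs(urlsplit(url).query))
```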

Page 31: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Metadata Catalog Search : Advanced

Filter by Data asset type

Search for a hive_table named ‘employees’ and owner ‘hive’

DSL search with SQL like syntax

Select columns from impressions table in raw database

DSL query string:

hive_column where table.name='impressions' and table.db.name='raw'
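A DSL query string like the one above can be assembled from a type name and a set of attribute filters. In this sketch the helper names and the host are hypothetical, the quoting simply follows the slide, values are not escaped, and the `/api/atlas/v2/search/dsl` path is an assumption to verify against the Atlas version in use.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def dsl_query(type_name, filters):
    """Join attribute filters into '<type> where a='x' and b='y''."""
    clauses = [f"{attr}='{value}'" for attr, value in filters.items()]
    if not clauses:
        return type_name
    return f"{type_name} where " + " and ".join(clauses)


def dsl_search_url(atlas_base, query, limit=25):
    """Wrap a DSL string in a search URL; the query is URL-encoded."""
    return f"{atlas_base}/api/atlas/v2/search/dsl?{urlencode({'query': query, 'limit': limit})}"


# Columns of the impressions table in the raw database, as on the slide.
query = dsl_query("hive_column", {"table.name": "impressions", "table.db.name": "raw"})
url = dsl_search_url("http://atlas.example.com:21000", query)
print(query)
print(url)
```

The second example on the slide (a hive_table named 'employees' owned by 'hive') would be `dsl_query("hive_table", {"name": "employees", "owner": "hive"})`.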

Page 32: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

Key Takeaways

Secure APIs and UIs in Hadoop ecosystem using Apache Knox gateway

Enforce appropriate security controls to monitor data access across your businesses with Apache Ranger

– Implement fine-grained policy based controls to grant and monitor data access

– Track user activity on data using user access audit logging features to help with forensic auditing for breach notification purposes

– Protect sensitive data through anonymization and pseudonymization using dynamic masking and row filtering

Establish an Enterprise Data Catalog with Apache Atlas

– Identify and classify data

– Harvest and maintain metadata

Track and map the movement of data through your enterprise with Apache Atlas

– Maintain a “Near Real Time” view to track data movement

– Understand data proliferation (especially sensitive data) with data lineage and impact analysis

Page 33: Treat your enterprise data lake indigestion: Enterprise ready security and governance for Hadoop Ecosystem

More Information…

Coming up Next...

BoF session – Security, Governance & Cybersecurity

When: 6:00pm, Thursday September 21st 2017

Where: C4.7

Also Check out other sessions on Apache Atlas & Apache Ranger from recent DataWorks Summits

https://dataworkssummit.com/san-jose-2017/

https://dataworkssummit.com/munich-2017/

Hortonworks Product Pages

https://hortonworks.com/apache/ranger/

https://hortonworks.com/apache/atlas

Hortonworks Community Connection:

https://community.hortonworks.com/spaces/64/governance-lifecycle-track.html

https://community.hortonworks.com/spaces/62/security-track_2.html

Apache Software Foundation

http://ranger.apache.org/

http://atlas.apache.org/