
Taking Hadoop to Enterprise Security Standards


Page 1

©2014 LinkedIn Corporation. All Rights Reserved.

Taking Hadoop to Enterprise Security Standards

Karthik Ramasamy

Harsh Singhal

Arvind Mani

Page 2

Access Control

Page 3

How many of you need or have access control in Hadoop?

Page 4

Keeping Data Secure

Users First

Internal Threat

External Threat

Page 5

The more granular the access controls, the more people can have access to the data

Page 6

Hadoop – Status Quo

Multiple Query Execution Engines

Custom Code Execution

Auditing

Page 7

HDFS File Permissions

Data: User ID, Email Address, IP Address, Billing Address

Groups: Security, Customer Service, Data Scientist

HDFS file permissions are very coarse (file level)

Adding and removing group membership can take up to a few hours

Page 8

Other Access Control Solutions

Page 9

Challenges

Mixed Data

Multiple Data Processing Systems

Data for Everyone

Page 10

What do we need?

Extensible Authorization

Fine-Grained Control

Fast Changes to Authorization Rules

Page 11

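Our Solution: Access Control via Encryption

Diagram components: Apache Kafka, Key Server, ETL, and Parquet files on HDFS. The event name is sent to the Key Server, which returns a symmetric encryption key; the encrypted events pass through ETL into Parquet on HDFS.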
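A minimal sketch of this flow, assuming a hypothetical in-memory `ToyKeyServer` that keeps one symmetric key per event name (the event name `PageViewEvent` is made up). To stay dependency-free, the cipher is a toy built only from Python's standard library (HMAC-SHA256 in counter mode as a keystream); it is unauthenticated and is not what a production system should use, where a vetted AES-GCM implementation would be the usual choice. It is not the implementation described in the talk.

```python
import hashlib
import hmac
import os


class ToyKeyServer:
    """Hypothetical in-memory key server: one symmetric key per event name."""

    def __init__(self):
        self._keys = {}  # event name -> 32-byte key

    def key_for(self, event_name):
        # Create a key the first time an event name is seen, then reuse it.
        if event_name not in self._keys:
            self._keys[event_name] = os.urandom(32)
        return self._keys[event_name]


def _keystream(key, nonce, length):
    # Toy keystream: HMAC-SHA256 over (nonce, counter). Illustration only.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, b"ks" + nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key, plaintext):
    # A fresh random nonce per event, stored in front of the ciphertext.
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(key, blob):
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


# Producer side: fetch the key for this event name and encrypt before publishing.
server = ToyKeyServer()
event = b'{"memberId": 42, "page": "/jobs"}'
encrypted_event = encrypt(server.key_for("PageViewEvent"), event)

# Reader side: a job that is allowed to fetch the same key can decrypt.
recovered = decrypt(server.key_for("PageViewEvent"), encrypted_event)
```

Because the key is tied to the event name, the key server (not HDFS file permissions) decides who can read a given event stream.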

Page 12

Access Control via Encryption

Diagram components: a Producer Job, the ETL User, a Parquet file, the jobs of Users A, B, and C, and the Key Server, which holds the column grants per user:

User A: column 5
User B: columns 2, 5
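The column grants above can be modeled as a key server that keeps one key per column and releases a key to a reader only when the ACL lists that column. This is a hypothetical sketch: `ColumnKeyServer` and its methods are illustrative names, the ETL user is assumed to be trusted with every column key, and user C, who has no entry in the table, is assumed to receive no keys.

```python
import os


class ColumnKeyServer:
    """Hypothetical key server: one key per column, released according to an ACL."""

    def __init__(self, acl):
        self._acl = {user: set(columns) for user, columns in acl.items()}
        self._column_keys = {}  # column number -> 32-byte key

    def _key(self, column):
        if column not in self._column_keys:
            self._column_keys[column] = os.urandom(32)
        return self._column_keys[column]

    def keys_for_etl(self, columns):
        # Assumption: the ETL user encrypts every column, so it gets every key it asks for.
        return {column: self._key(column) for column in columns}

    def keys_for_user(self, user, columns):
        # Readers only get keys for columns listed in their ACL entry.
        allowed = self._acl.get(user, set())
        return {column: self._key(column) for column in columns if column in allowed}


# The grants from the slide: A may read column 5, B may read columns 2 and 5.
server = ColumnKeyServer({"A": [5], "B": [2, 5]})
etl_keys = server.keys_for_etl(range(1, 6))
a_keys = server.keys_for_user("A", [2, 5])
b_keys = server.keys_for_user("B", [2, 5])
c_keys = server.keys_for_user("C", [2, 5])  # no ACL entry, so no keys
```

Changing a user's access then means editing one ACL entry on the key server, rather than changing group membership or rewriting files.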

Page 13

Brief Overview of Parquet

Parquet Format: Columnar Storage

Diagram: within a row group, the values of each column (Column a, Column b) are stored together and split into pages (Page 0, Page 1, Page 2).
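To make the layout concrete, here is a toy model of how Parquet arranges a table: rows are cut into row groups, each row group holds one column chunk per column, and each chunk is split into pages. `to_columnar` is a hypothetical helper; real Parquet sizes row groups and pages in bytes and encodes each page, whereas this sketch counts values and keeps them as Python lists.

```python
def to_columnar(rows, columns, row_group_size, page_size):
    """Toy model of the Parquet layout: row groups -> column chunks -> pages.

    Real Parquet sizes row groups and pages in bytes; here both are counted in values.
    """
    row_groups = []
    for start in range(0, len(rows), row_group_size):
        group = rows[start:start + row_group_size]
        chunks = {}
        for name in columns:
            # A column chunk stores one column's values for this row group together...
            values = [row[name] for row in group]
            # ...split into pages, the unit the encryption modes operate on.
            chunks[name] = [values[i:i + page_size] for i in range(0, len(values), page_size)]
        row_groups.append(chunks)
    return row_groups


rows = [{"a": i, "b": i * 10} for i in range(7)]
layout = to_columnar(rows, ["a", "b"], row_group_size=4, page_size=2)
# layout[0]["a"] is [[0, 1], [2, 3]]; layout[1]["b"] is [[40, 50], [60]]
```

Because each page holds values of a single column, a page is a natural unit for per-column encryption keys.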

Page 14

Encryption Support in Parquet*

Modes: Field Mode | Page Mode | Hybrid Mode

*Yet to be integrated into open source Parquet

Page 15

Field Mode

Example: emails. Analysts need them to join with other tables but may not require access to individual emails.

N values (one page): each value is encrypted individually

email 1 → xxxxxxx
email 2 → yyyyyyy
email 3 (same address as email 2) → yyyyyyy
email 4 → zzzzzzz
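One way to get the behavior shown (equal emails, equal ciphertexts) is deterministic encryption. The sketch below derives each value's nonce from the value itself, in the style of a synthetic IV, so the same email always encrypts to the same bytes under the same key. It is a toy built on HMAC-SHA256 from the standard library, unauthenticated and for illustration only; the sample addresses are hypothetical, and a real system would use a vetted deterministic scheme such as AES-SIV.

```python
import hashlib
import hmac


def _keystream(key, nonce, length):
    # Toy keystream: HMAC-SHA256 over (nonce, counter). Illustration only.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, b"ks" + nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_field(key, value):
    # Synthetic IV: the nonce is derived from the value itself,
    # so the same value always yields the same ciphertext under the same key.
    nonce = hmac.new(key, b"siv" + value, hashlib.sha256).digest()[:16]
    stream = _keystream(key, nonce, len(value))
    return nonce + bytes(v ^ s for v, s in zip(value, stream))


def decrypt_field(key, blob):
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


key = bytes(32)  # fixed demo key; a real key would come from the key server
page = [b"alice@example.com", b"bob@example.com", b"bob@example.com", b"carol@example.com"]
encrypted_page = [encrypt_field(key, email) for email in page]
```

The second and third entries of `encrypted_page` are identical, mirroring the two `yyyyyyy` values above; that equality is exactly what field mode deliberately leaks.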

Page 16

Field Mode

Supports: joins, counts, distribution analysis

Drawback: no/low compression
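Why equality enables these analyses, and why compression suffers, can be shown in a few lines. Here HMAC-SHA256 tokens stand in for field-mode ciphertexts: they share the property that matters (equal input, equal output under one key) but, unlike field-mode encryption, cannot be decrypted. The tables and addresses are hypothetical.

```python
import hashlib
import hmac
import zlib
from collections import Counter


def token(key, value):
    # Deterministic keyed token (HMAC-SHA256), standing in for a field-mode ciphertext.
    return hmac.new(key, value.encode(), hashlib.sha256).digest()


key = bytes(32)  # fixed demo key

# Two hypothetical tables whose email column is stored only in encrypted form.
profiles = [("alice@example.com", "US"), ("bob@example.com", "IN")]
clicks = ["alice@example.com", "bob@example.com", "alice@example.com"]
enc_profiles = {token(key, email): country for email, country in profiles}
enc_clicks = [token(key, email) for email in clicks]

# Join, count and distribution analysis all work on the encrypted values.
clicks_per_country = Counter(enc_profiles[t] for t in enc_clicks)
distinct_users = len(set(enc_clicks))

# Compression suffers: shared structure such as "@example.com" disappears.
emails = [f"user{i}@example.com" for i in range(1000)]
plain_bytes = "".join(emails).encode()
cipher_bytes = b"".join(token(key, e) for e in emails)
plain_compressed = len(zlib.compress(plain_bytes))
cipher_compressed = len(zlib.compress(cipher_bytes))
```

The analyst never sees an email address, yet `clicks_per_country` and `distinct_users` come out the same as on plain text; the cost is that each encrypted value looks random to the compressor.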

Page 17

Page Mode

No information is leaked except the entropy of the data

Better performance than the other modes

N values (one page): Encode → Compress → Encrypt
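A minimal sketch of the page-mode pipeline, assuming a length-prefixed encoding as a stand-in for Parquet's real encodings and a toy HMAC-SHA256 keystream as a stand-in for a real cipher (unauthenticated, illustration only). The order matters: compressing after encryption would gain nothing, because ciphertext looks random. A fresh random nonce per page hides even repeated values, so roughly the only thing visible is the compressed size of the page, which is consistent with the slide's remark that only the entropy of the data leaks.

```python
import hashlib
import hmac
import os
import struct
import zlib


def _keystream(key, nonce, length):
    # Toy keystream: HMAC-SHA256 over (nonce, counter). Illustration only.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, b"ks" + nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encode(values):
    # Length-prefixed UTF-8, a stand-in for Parquet's real page encodings.
    return b"".join(struct.pack(">I", len(v.encode())) + v.encode() for v in values)


def decode(buf):
    values, pos = [], 0
    while pos < len(buf):
        (n,) = struct.unpack_from(">I", buf, pos)
        values.append(buf[pos + 4:pos + 4 + n].decode())
        pos += 4 + n
    return values


def write_page(key, values):
    # Encode -> compress -> encrypt, with a random nonce for every page.
    compressed = zlib.compress(encode(values))
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(compressed))
    return nonce + bytes(c ^ s for c, s in zip(compressed, stream))


def read_page(key, blob):
    # Decrypt -> decompress -> decode: the exact reverse of write_page.
    nonce, body = blob[:16], blob[16:]
    compressed = bytes(c ^ s for c, s in zip(body, _keystream(key, nonce, len(body))))
    return decode(zlib.decompress(compressed))


key = os.urandom(32)
values = ["alice@example.com", "bob@example.com"] * 50
blob = write_page(key, values)
```

Since the whole page is one ciphertext, a reader without the page key cannot even tell which values repeat, but also cannot join or count on them.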

Page 18

Hybrid Mode

Finer-grained control of information

Increased overhead due to double encryption/decryption

N values (one page): encrypt each value, then encrypt the page
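The two layers can be sketched by combining the field and page ideas: each value is encrypted deterministically with a field key, and the packed page is then encrypted again with a page key and a random nonce. The key names, the packing format and the toy HMAC-SHA256 cipher are all assumptions for illustration, not the format used in the talk. A reader holding only the page key gets per-value ciphertexts (still usable for joins and counts); a reader holding both keys gets plain text.

```python
import hashlib
import hmac
import os
import struct


def _keystream(key, nonce, length):
    # Toy keystream: HMAC-SHA256 over (nonce, counter). Illustration only.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, b"ks" + nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key, data, nonce=None):
    # Random nonce by default (page layer); a fixed nonce gives deterministic output.
    nonce = os.urandom(16) if nonce is None else nonce
    stream = _keystream(key, nonce, len(data))
    return nonce + bytes(d ^ s for d, s in zip(data, stream))


def decrypt(key, blob):
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


def encrypt_field(key, value):
    # Field layer: nonce derived from the value, so equal values encrypt equally.
    nonce = hmac.new(key, b"siv" + value, hashlib.sha256).digest()[:16]
    return encrypt(key, value, nonce)


def pack(items):
    # Length-prefix each item so the page can be split apart again.
    return b"".join(struct.pack(">I", len(item)) + item for item in items)


def unpack(buf):
    items, pos = [], 0
    while pos < len(buf):
        (n,) = struct.unpack_from(">I", buf, pos)
        items.append(buf[pos + 4:pos + 4 + n])
        pos += 4 + n
    return items


def write_hybrid_page(field_key, page_key, values):
    # Encrypt each value, then encrypt the whole page: two layers, two keys.
    inner = pack([encrypt_field(field_key, v.encode()) for v in values])
    return encrypt(page_key, inner)


def read_hybrid_page(page_key, blob, field_key=None):
    field_ciphertexts = unpack(decrypt(page_key, blob))
    if field_key is None:
        return field_ciphertexts  # page key only: per-value ciphertexts
    return [decrypt(field_key, c).decode() for c in field_ciphertexts]


field_key, page_key = os.urandom(32), os.urandom(32)
values = ["alice@example.com", "bob@example.com", "alice@example.com"]
blob = write_hybrid_page(field_key, page_key, values)
```

Every value passes through two encryption calls on write and two decryption calls on read, which is the overhead noted on the slide.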

Page 19

Figure: what a reader sees in Field Mode, Page Mode, and Hybrid Mode (legend: Plain Text | Encrypted Value | No Access)
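Our reading of this comparison, expressed as a lookup: which of the three outcomes a reader gets depends on the mode and on which keys they hold. The mapping is inferred from the mode descriptions on the preceding slides rather than taken from the figure itself, and `visible_as` is a hypothetical name.

```python
def visible_as(mode, has_field_key, has_page_key):
    """What a reader sees, given the storage mode and the keys they hold.

    Inferred mapping: field mode stores per-value ciphertexts, page mode
    encrypts the whole page, and hybrid mode does both.
    """
    if mode == "field":
        return "plain text" if has_field_key else "encrypted value"
    if mode == "page":
        return "plain text" if has_page_key else "no access"
    if mode == "hybrid":
        if not has_page_key:
            return "no access"
        return "plain text" if has_field_key else "encrypted value"
    raise ValueError(f"unknown mode: {mode}")


print(visible_as("hybrid", has_field_key=False, has_page_key=True))  # prints "encrypted value"
```

Hybrid mode is the only one that can serve all three audiences from the same file, which is what the slide's "finer-grained control" refers to.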

Page 20

Key Versioning

Each key is versioned and specific to a source (file or event name)

Reduces exposure in case of key leakage

Time-based access control:
– By default, all users can access only the last 30 days of data
– Users can be given access to data in a specific time period

Authentication of producers can be done separately
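A hypothetical sketch of versioned keys with time-based access: each (source, day) pair gets its own key, so a leaked key exposes one day of one source, and readers receive keys for the last 30 days by default plus any explicitly granted periods. Day-level versioning, the class and method names, and the dates are all assumptions; the slide does not say how often keys are rotated.

```python
import os
from datetime import date, timedelta

DEFAULT_WINDOW = timedelta(days=30)  # "last 30 days" default from the slide


class VersionedKeyServer:
    """Hypothetical key server: one key version per (source, day)."""

    def __init__(self):
        self._keys = {}    # (source, day) -> 32-byte key
        self._grants = {}  # (user, source) -> list of inclusive (start, end) periods

    def key(self, source, day):
        # A fresh key version per source per day; a leaked key exposes one day only.
        if (source, day) not in self._keys:
            self._keys[(source, day)] = os.urandom(32)
        return self._keys[(source, day)]

    def grant(self, user, source, start, end):
        # Explicit access to a specific (older) time period.
        self._grants.setdefault((user, source), []).append((start, end))

    def key_for_reader(self, user, source, day, today):
        recent = today - DEFAULT_WINDOW < day <= today
        granted = any(s <= day <= e for s, e in self._grants.get((user, source), []))
        if not (recent or granted):
            raise PermissionError(f"{user} may not read {source} for {day}")
        return self.key(source, day)


server = VersionedKeyServer()
today = date(2014, 6, 1)
recent_key = server.key_for_reader("alice", "PageViewEvent", date(2014, 5, 20), today)
server.grant("alice", "PageViewEvent", date(2014, 1, 1), date(2014, 1, 31))
old_key = server.key_for_reader("alice", "PageViewEvent", date(2014, 1, 15), today)
```

Without the grant, the request for January data would raise `PermissionError`; granting a period is a change on the key server only, with no data rewritten.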

Page 21

Key Server Features

Better Auditing Coverage

Retention Enforcement

Multifactor Authentication
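Two of these features can be sketched in a few lines under stated assumptions. Every key request is written to an audit log, and retention is enforced by deleting keys older than the retention period, so data encrypted under them can no longer be read (often called crypto-shredding). Whether the talk's key server enforces retention this way is our assumption; the class name and the 90-day period are hypothetical, and multifactor authentication is not modeled.

```python
import os
from datetime import date, timedelta


class AuditedKeyServer:
    """Hypothetical key server that logs every key request and destroys expired keys."""

    def __init__(self, retention_days):
        self.retention = timedelta(days=retention_days)
        self._keys = {}      # (source, day) -> 32-byte key
        self.audit_log = []  # (user, source, ISO day, key released?)

    def create_key(self, source, day):
        self._keys[(source, day)] = os.urandom(32)

    def fetch_key(self, user, source, day):
        key = self._keys.get((source, day))
        # Every request is recorded, whether or not a key was released.
        self.audit_log.append((user, source, day.isoformat(), key is not None))
        return key

    def enforce_retention(self, today):
        # Deleting a key makes all data encrypted under it unreadable.
        cutoff = today - self.retention
        expired = [k for k in self._keys if k[1] < cutoff]
        for k in expired:
            del self._keys[k]
        return len(expired)


server = AuditedKeyServer(retention_days=90)
server.create_key("PageViewEvent", date(2014, 1, 1))
server.create_key("PageViewEvent", date(2014, 5, 1))
shredded = server.enforce_retention(today=date(2014, 6, 1))
old = server.fetch_key("bob", "PageViewEvent", date(2014, 1, 1))
recent = server.fetch_key("bob", "PageViewEvent", date(2014, 5, 1))
```

Because every read must go through the key server, its log covers all access to the data, including access from engines that have no auditing of their own.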

Page 22

Pig Usage

Page 23

Thank you!