End-to-End Security and Auditing in a Big-Data-as-a-Service (BDaaS) Deployment

Nanda Vijaydev - BlueData
Abhiraj Butala - BlueData

Big-Data-as-a-Service (BDaaS)

“A mechanism for the delivery of statistical analysis tools and information that helps organizations understand and use insights gained from large information sets in order to gain a competitive advantage.”

Source: www.semantikoz.com/blog/big-data-as-a-service-definition-classification

On-Demand, Self-Service, Elastic Big Data Infrastructure, Applications, Analytics

Multi-Tenant Big-Data-as-a-Service

[Diagram: tenants Marketing, R&D, and Manufacturing (use cases: 360 Customer View, Log Analysis, Predictive Maintenance) run five compute clusters labeled Prod 2.2, Dev/Test 2.4, POC 2.3, Prod 2.3, and Dev/Test 2.4 above a shared Data/Storage layer (Data Lake, Staging).]

• Multiple compute services (Hadoop, BI, Spark)

• There is a shared Data Lake (shared HDFS)

Why BDaaS? – Compute Side Of The Story

• Set of applications that interact with Hadoop keeps growing

• Various versions of the same app/distro run in parallel

• Enterprises need to scale compute up and down based on usage

• The result is a model similar to AWS, with S3 as storage and applications on EC2

Why BDaaS? – Data Side Of The Story

• Production cluster access takes time and is generally restricted

• Staging clusters may not have all the data

• Data exists on other storage systems such as NFS; Isilon is common

• Users also want to upload arbitrary files for analysis

Hadoop – A Collection Of Services

Hadoop is a collection of storage and compute services such as HDFS, HBase, Hive, YARN, Solr, and Kafka

Security In Hadoop

• Authenticate users into the Hadoop ecosystem

– Each service has its own integration with LDAP/AD for authentication

• Authorize users and limit their actions to selected services. Authorization is granted separately for each service. Example:

– Folder “/user/customer” in HDFS grants ‘r-x’ to user ‘alice’ and ‘-wx’ to user ‘bob’

– Enable column-level access to a Hive table: “Customer.Name” & “Customer.PhoneNumber” are accessible only by some users and groups
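
The folder example above can be sketched as a lookup of per-user rwx triplets. This is only an illustration of the idea, not how HDFS stores or evaluates permissions; the `GRANTS` table and `is_allowed` helper are hypothetical names introduced here.

```python
# Hypothetical per-user grants mirroring the slide's example for /user/customer.
GRANTS = {
    "/user/customer": {"alice": "r-x", "bob": "-wx"},
}

# Position of each action inside an rwx triplet.
ACTION_INDEX = {"read": 0, "write": 1, "execute": 2}


def is_allowed(path: str, user: str, action: str) -> bool:
    """Return True if the user's rwx triplet on the path permits the action."""
    # Unknown paths or users fall back to "---", i.e. no access at all.
    triplet = GRANTS.get(path, {}).get(user, "---")
    return triplet[ACTION_INDEX[action]] != "-"


if __name__ == "__main__":
    print(is_allowed("/user/customer", "alice", "read"))   # True
    print(is_allowed("/user/customer", "bob", "read"))     # False
    print(is_allowed("/user/customer", "bob", "write"))    # True
```

Each Hadoop service keeps its own table of this kind, which is why grants have to be repeated service by service.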

Ranger – A Pluggable Security Framework

• Ranger works with a common user DB (LDAP/AD) for authentication

• Provides a plug-in for individual Hadoop services to enable authorization

• Allows users to define policies in a central location, using Web UI or APIs

• Users can define their own plug-in for a custom service and manage them centrally via Ranger Admin
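
As a rough sketch of defining a policy through the API rather than the Web UI, the snippet below builds a JSON document shaped like an HDFS policy. The field names follow the layout of Ranger's public REST policy model as best understood here and should be checked against the Ranger version in use; the service name `datalake_hdfs` and policy name `marketing-read` are hypothetical. Nothing is sent over the network.

```python
import json


def build_hdfs_policy(service, name, path, users, access_types):
    """Build a policy document shaped like a Ranger HDFS policy (assumed layout)."""
    return {
        "service": service,
        "name": name,
        "isEnabled": True,
        "isAuditEnabled": True,
        # The resource being protected: one HDFS path, applied recursively.
        "resources": {"path": {"values": [path], "isRecursive": True}},
        # Who gets which access types on that resource.
        "policyItems": [
            {
                "users": list(users),
                "accesses": [{"type": t, "isAllowed": True} for t in access_types],
            }
        ],
    }


policy = build_hdfs_policy(
    service="datalake_hdfs",   # hypothetical Ranger service name
    name="marketing-read",     # hypothetical policy name
    path="/user/customer",
    users=["alice"],
    access_types=["read", "execute"],
)

if __name__ == "__main__":
    print(json.dumps(policy, indent=2))
```

A document like this would be submitted to Ranger Admin's policy endpoint with an HTTP client; that step is left out because endpoint paths and credentials are deployment-specific.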

Defining HDFS Ranger Policies

HDFS Policy List

Marketing Policy Drill Down

Security Considerations in BDaaS

[Diagram: the multi-tenant layout of compute clusters above shared Data/Storage (Data Lake, Staging), annotated with three callouts: 1. User Identity – Data Lake; 2. User Identity - Application Level; 3. User Identity propagation to Data Layer.]

1. User identity within a Data Lake

2. User identity in application layer

3. Prevent data duplication & maintain user integrity across layers

1. Securing The Data Lake

[Diagram: the multi-tenant layout with LDAP and KDC attached to the Data/Storage layer (Data Lake, Staging). Callouts: 1. Authentication & Authorization – Data Lake; 2. User Identity - Application Level; 3. User Identity propagation to Data Layer.]

2. Securing The App Layer

[Diagram: the multi-tenant layout with LDAP and KDC, plus users Alice, Bob, and Tom at the application layer; same three callouts.]

• App containers are integrated with LDAP

3. Identity Propagation to Data Layer

[Diagram: the multi-tenant layout with LDAP, KDC, and users Alice, Bob, and Tom; same three callouts, ending with 3. User Identity propagation to Data Layer.]

User Identity Propagation

Two ways:

– Users connect directly to HDFS

• Simple Authentication

• Kerberos Authentication

– Users connect to HDFS via a super-user (impersonation)
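
The two options above differ in who authenticates, not in whose permissions apply. A minimal model of that distinction, with a hypothetical `Connection` class and a hypothetical super-user account named `datatap`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Connection:
    """Who authenticated to HDFS and on whose behalf the request runs."""

    authenticated_as: str               # identity that actually logged in
    on_behalf_of: Optional[str] = None  # set only when a super-user impersonates

    @property
    def effective_user(self) -> str:
        # Authorization and auditing are keyed on the effective user.
        return self.on_behalf_of or self.authenticated_as


# Direct connection: alice authenticates herself.
direct = Connection(authenticated_as="alice")

# Impersonation: the hypothetical super-user "datatap" acts for alice.
proxied = Connection(authenticated_as="datatap", on_behalf_of="alice")

if __name__ == "__main__":
    print(direct.effective_user, proxied.effective_user)  # alice alice
```

In both cases the effective user is alice, so a policy written for alice applies either way.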

HDFS Direct Connections

[Diagram: users Alice, Bob, and Tom on the multi-tenant compute clusters connect directly to the HDFS Data Lake; LDAP and KDC are shown alongside.]

HDFS Direct Connections..

– hdfs-audit.log

– Ranger policies are enforced for alice and bob as they are the effective users

HDFS Direct Connections..

• Single Hadoop Setup

– Ideal

• Multi-tenant, Multi-application Setup

– Kerberized HDFS needs kerberized compute and services

– May not want to kerberize Dev/QA setups

– Hadoop versions should be compatible all across

– Data duplication

HDFS Super-user Connections

• Super-users perform actions on behalf of other users (Impersonation/Proxying)

• Adding a new super-user is easy

– core-site.xml
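
Hadoop's proxy-user settings in core-site.xml take the form `hadoop.proxyuser.<name>.hosts` and `hadoop.proxyuser.<name>.groups`. The sketch below generates the two properties for one super-user; the account name `datatap`, the host, and the groups are hypothetical placeholders, not values from the talk.

```python
import xml.etree.ElementTree as ET


def proxyuser_properties(superuser, hosts, groups):
    """Return a <configuration> element holding the two proxyuser properties."""
    config = ET.Element("configuration")
    settings = {
        # Hosts from which the super-user may impersonate others.
        f"hadoop.proxyuser.{superuser}.hosts": ",".join(hosts),
        # Groups whose members may be impersonated.
        f"hadoop.proxyuser.{superuser}.groups": ",".join(groups),
    }
    for name, value in settings.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return config


config = proxyuser_properties(
    superuser="datatap",             # hypothetical super-user account
    hosts=["gateway1.example.com"],  # hypothetical host
    groups=["marketing", "rnd"],     # hypothetical groups
)
xml_text = ET.tostring(config, encoding="unicode")

if __name__ == "__main__":
    print(xml_text)
```

Keeping the host and group lists narrow limits what a compromised super-user account could do.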

HDFS Super-user Connections..

[Diagram: users Alice, Bob, and Tom on the multi-tenant compute clusters reach the HDFS Data Lake through the DataTap Caching Service, which connects via a super-user; LDAP and KDC are shown alongside.]

HDFS Super-user Connections..

– hdfs-audit.log

– Ranger Authorization policies still enforced, as alice and bob are effective users
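
To check from the audit log that the effective user is preserved, one can pull the user and any proxying super-user out of each entry. The sample lines below are made up for illustration and only modeled on the usual tab-separated key=value shape of HDFS audit entries (timestamp prefix omitted); the exact layout is an assumption and should be compared with a real hdfs-audit.log. `datatap` is again a hypothetical super-user name.

```python
import re

# Hypothetical audit entries: one direct access, one via a super-user.
DIRECT = "allowed=true\tugi=alice (auth:SIMPLE)\tip=/10.0.0.5\tcmd=open\tsrc=/user/secret/data.csv\tdst=null\tperm=null"
PROXIED = "allowed=false\tugi=bob (auth:PROXY) via datatap (auth:SIMPLE)\tip=/10.0.0.5\tcmd=open\tsrc=/user/secret/data.csv\tdst=null\tperm=null"

# "user (auth:X)" optionally followed by " via superuser (auth:Y)".
UGI_RE = re.compile(
    r"^(?P<user>\S+) \(auth:(?P<auth>\w+)\)(?: via (?P<proxy>\S+) \(auth:\w+\))?"
)


def parse_audit_line(line):
    """Extract decision, effective user, proxying super-user, command, and path."""
    fields = dict(part.split("=", 1) for part in line.split("\t") if "=" in part)
    match = UGI_RE.match(fields["ugi"])
    if match is None:
        raise ValueError(f"unrecognized ugi field: {fields['ugi']!r}")
    return {
        "allowed": fields["allowed"] == "true",
        "effective_user": match.group("user"),
        "via": match.group("proxy"),  # None for direct connections
        "cmd": fields["cmd"],
        "src": fields["src"],
    }


if __name__ == "__main__":
    for entry in (DIRECT, PROXIED):
        print(parse_audit_line(entry))
```

In the second entry the effective user is still bob even though the super-user opened the connection, which is what lets per-user policies apply.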

HDFS Super-user Connections..

Multi-tenant, Multi-application Setup

– Works for applications which don’t support Kerberos (yet)

– Dev/Test setups need not be kerberized

– DataTap service can abstract version incompatibilities

– Can help avoid data duplication

– Need tight LDAP/AD integration though!

Ranger in Action

Hue Example

HDFS Permissions on Data Lake

• Set HDFS file access for ‘/user/secret’ to strict mode

• Set umask to ‘077’
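
A umask of 077 strips every group and other permission bit from newly created files and directories. The arithmetic is shown below; the `apply_umask` helper is a name introduced here. In HDFS the umask is typically set through the `fs.permissions.umask-mode` property, though the slide does not name the property used.

```python
def apply_umask(requested_mode: int, umask: int) -> int:
    """Clear every permission bit that is set in the umask."""
    return requested_mode & ~umask


UMASK = 0o077

new_dir_mode = apply_umask(0o777, UMASK)   # directories start from rwxrwxrwx
new_file_mode = apply_umask(0o666, UMASK)  # files start from rw-rw-rw-

if __name__ == "__main__":
    print(oct(new_dir_mode), oct(new_file_mode))  # 0o700 0o600
```

With only owner bits left, any access by other users has to be granted explicitly, for example through a Ranger policy.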

HDFS Ranger Policies

DataTap Caching Service

Create Table via Hue

Query table via Hue - Success

Query table via Hue - Failure

Ranger Audit Logs

Key Takeaways

• BDaaS is more than Hadoop-as-a-Service

– Includes BI / ETL / Analytics + Data Science tools

• Security is an important consideration in BDaaS

• Data duplication is not an option

• Global user authentication using a centralized DB like LDAP/AD is a must

• Apache Ranger helps in enforcing global policies, provided user identities are propagated correctly

Q & A

www.bluedata.com

Nanda Vijaydev @nandavijaydev

Abhiraj Butala @abhirajbutala