© Cloudera, Inc. All rights reserved.
Cloudera training: secure your Cloudera cluster
The demand for Hadoop skills is high, and Hadoop is the future. Customers cannot afford to move slowly in staffing their Big Data projects. Customers are building plans to ensure their projects are staffed with skilled employees and supported by a qualified services provider.
Job Trends from Indeed.com
What are you most concerned about when it comes to your readiness for Big Data and Hadoop?
Cloudera MDP webinar poll results, July 2016
Why Cloudera training?
Aligned to best practices and the pace of change
1. Broadest range of courses: learning paths for Developer, Admin, Analyst
2. Most experienced instructors: more than 40,000 trained since 2009
3. Leader in certification: over 12,000 accredited Cloudera professionals
4. Trusted source for training: 100,000+ people have attended online courses
5. State-of-the-art curriculum: courses updated as Hadoop evolves
6. Widest geographic coverage: most classes offered, in 50 cities worldwide plus online
7. Most relevant platform & community: CDH deployed more than all other distributions combined
8. Depth of training material: hands-on labs and VMs support live instruction
9. Ongoing learning: video tutorials and e-learning complement training
10. Commitment to Big Data education: university partnerships to teach Hadoop in colleges
Creating leaders in the field
Training enables Big Data solutions and innovation
• 94% would recommend or highly recommend Cloudera training to friends or colleagues
• 66% draw on lessons from Cloudera training on at least a monthly basis
• 40% develop new apps or perform business-critical analyses as a result of training alone
• 88% indicate Cloudera training provided the Hadoop expertise their roles require
Sources: Cloudera Past Public Training Participant Study, December 2012; Cloudera Customer Satisfaction Study, January 2013.
What is available from Cloudera University?
• Private training: courses delivered at a location of the customer's choice to an internal audience
• Public training: courses regularly scheduled around the globe; schedule available on the web
• Virtual training: live training accessed via the internet; available for public and private courses
• OnDemand training: pre-recorded lectures with the same content and exercises as the live training options
• Certification: rigorously developed and meaningful bodies of knowledge
Delivery formats: OnDemand, virtual live classroom, private onsite, public live classroom
Suggested Cloudera University curricula
Developers
• Python/Scala Training
• Developer for Spark and Hadoop
• CCA: Spark and Hadoop Developer
• Spark ML & Kafka modules
• Topic-specific training (Search, HBase)
• Hands on practice
• CCP: Data Engineer
Administrators
• Cloudera Administration training
• CCA: Administrator
• Cloudera Security OnDemand
Data Analysts/Data Scientists
• Data Analyst: Using Hive, Pig & Impala
• CCA: Data Analyst
• Cloudera Data Science
Security for Hadoop
Carlo Lazzaris | Technical Instructor
Security Webinar Agenda
1. The need for Hadoop Security
Hacker news and legal regulations
2. Cloudera Security Implementation
Five levels of security
3. How to secure your Cloudera cluster
Cloudera Documentation
Cloudera professional services
Cloudera OnDemand security course
The need for Hadoop security
Unguarded data stores are the victims
Regulatory Compliance
Organizations can be fined up to 4% of annual global turnover or €20 million, whichever is greater, for breaching GDPR
Cloudera security implementation
Cloudera Enterprise CDH
The modern platform for machine learning and analytics optimized for the cloud
Diagram: core services (Data Engineering, Operational Database, Analytic Database, Data Science); Data Catalog; Ingest & Replication; Security; Governance; Workload Management; storage services (S3, ADLS, HDFS, Kudu); extensible services
Open platform services
Built for multi-function analytics | Optimized for cloud
• Unified security: protects sensitive data with consistent controls, even for transient and recurring workloads
• Consistent governance: enables secure self-service access to all relevant data and increases compliance
• Easy workload management: increases user productivity and boosts job predictability
• Flexible ingest and replication: aggregates a single copy of all data, provides disaster recovery, and eases migration
• Shared catalog: defines and preserves the structure and business context of data for new applications and partner solutions
Cloudera Enterprise-Grade Security and Governance
• Identity: validate users by membership in an enterprise directory
Technical concepts: authentication, user/group mapping
• Access: define what users and applications can do with data
Technical concepts: permissions, authorization
• Data protection: shield data in the cluster from unauthorized visibility
Technical concepts: encryption at rest and in motion
• Visibility: report on where data came from and how it's being used
Technical concepts: auditing, lineage
Products: Cloudera Manager, Apache Sentry, Cloudera Navigator, Navigator Encrypt & Key Trustee
Cloudera Certified Technology Partners
Partner categories: Data Sources (connected machines, other data sources), Data Ingest, Process, Refine & Prep, Data Discovery, Advanced Analytics
A certified product ensures it integrates securely
Authentication
• Authenticate via Kerberos or LDAP
Authorization
• Handle Apache Sentry with Hive, Impala, Search, HDFS
Encryption
• Support HDFS transport encryption and at-rest encryption; support SSL/TLS connection encryption
Vulnerability Response and Process
Vulnerability reports (upstream, internal, external) → Fix → Publish
Cluster Security Levels
Cloudera Enterprise
Enterprise Encryption Performance
Disclaimer
This talk serves as a general guideline for security implementation on Hadoop.
The actual implementation procedures and scope vary case by case and should be assessed by Cloudera's Professional Services team or certified Cloudera SI partners.
Non-secure #0
Data Free for All
Diagram: a datacenter with a firewall, Active Directory/KDC, and a Hadoop cluster (Cloudera Manager, gateway node, Cloudera worker nodes) used by applications
4 modes of identity management
1. Simple authentication
2. Kerberos
3. LDAP
4. SAML
Considerations in large enterprises:
• File group ownership
• AD integration
• SSSD or Centrify
Simple authentication: how the user is detected
Diagram: Cloudera Manager, master nodes, and worker nodes behind a firewall resolve users and groups from Active Directory through SSSD/Centrify
Simple authentication = no authentication
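With simple authentication, Hadoop trusts whatever user name the client reports, so any client can impersonate any user, including the HDFS superuser. A minimal sketch of a quick check, assuming a POSIX shell and the `hdfs` client on the PATH; `get_conf` and `check_auth_mode` are hypothetical helper names, while `hdfs getconf -confKey` and the `hadoop.security.authentication` property are standard Hadoop:

```shell
#!/bin/sh
# Hypothetical helper: read a client-side Hadoop setting with "hdfs getconf -confKey".
# Redefine get_conf to try the check without a cluster.
get_conf() {
  hdfs getconf -confKey "$1" 2>/dev/null
}

# Warn unless hadoop.security.authentication is "kerberos".
# Under "simple" (the Hadoop default) a client can claim any identity, e.g.
#   HADOOP_USER_NAME=hdfs hdfs dfs -ls /
# runs as the HDFS superuser with no password or ticket.
check_auth_mode() {
  local mode
  mode=$(get_conf hadoop.security.authentication)
  if [ "$mode" = kerberos ]; then
    echo "OK: Kerberos authentication is enabled"
    return 0
  fi
  echo "WARNING: authentication mode is '${mode:-simple}' (no real authentication)"
  return 1
}
```

A health-check script could call `check_auth_mode || exit 1` so that a cluster left in simple mode is flagged.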
Minimal Security #1
Reduce Risk Exposure
How it works: Authentication
Web UIs
• LDAP and SAML authentication options
SQL access
• LDAP/AD and Kerberos authentication options
Command lines
• Kerberos authentication
• Automation provided by Cloudera Manager to leverage Active Directory (AD)
1. User authenticates to AD or KDC (for example, user ssmith with a password)
2. Authenticated user gets a Kerberos ticket
3. Ticket grants access to services, e.g. Impala
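The command-line flow above can be sketched as follows, assuming MIT Kerberos client tools and `impala-shell` are installed. By default the script only prints the commands (a hypothetical `DRY_RUN` convention); the principal `ssmith@EXAMPLE.COM` and the host `impalad01.example.com` are placeholders:

```shell
#!/bin/sh
# Print each command instead of running it unless DRY_RUN=0 (hypothetical convention).
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# kerberos_session PRINCIPAL IMPALAD_HOST:PORT
kerberos_session() {
  run kinit "$1"                               # 1. authenticate to AD/KDC (prompts for the password)
  run klist                                    # 2. list the ticket-granting ticket in the credential cache
  run impala-shell -k -i "$2" -q "SELECT 1"    # 3. use the ticket (-k) to access Impala
}

kerberos_session ssmith@EXAMPLE.COM impalad01.example.com:21000
# + kinit ssmith@EXAMPLE.COM
# + klist
# + impala-shell -k -i impalad01.example.com:21000 -q SELECT 1
```

Running it with `DRY_RUN=0` on a Kerberos-enabled client executes the same three steps; the password never appears on the command line because `kinit` prompts for it.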
Kerberos
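Strong authentication
KDC: Key Distribution Center
• MIT
• Active Directory (more common)
Diagram: a user authenticates to the KDC of realm EXAMPLE.COM to access Hadoop; in a principal such as user@EXAMPLE.COM, "user" is the primary and "EXAMPLE.COM" is the realm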
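A Kerberos principal has the form primary/instance@REALM: user principals such as ssmith@EXAMPLE.COM usually omit the instance, while Hadoop service principals such as impala/worker1.example.com@EXAMPLE.COM carry the host name as the instance. A minimal sketch that splits a principal into these parts; `parse_principal` is a hypothetical helper and ignores escaped characters:

```shell
#!/bin/sh
# Split a principal of the form primary[/instance][@REALM] into its parts.
parse_principal() {
  local p="$1" realm="" instance="" primary
  case $p in
    *@*) realm=${p##*@}; p=${p%@*} ;;   # realm is everything after the last @
  esac
  primary=${p%%/*}                      # primary is everything before the first /
  case $p in
    */*) instance=${p#*/} ;;            # instance is everything after the first /
  esac
  printf 'primary=%s instance=%s realm=%s\n' "$primary" "$instance" "$realm"
}

parse_principal ssmith@EXAMPLE.COM
# primary=ssmith instance= realm=EXAMPLE.COM
parse_principal impala/worker1.example.com@EXAMPLE.COM
# primary=impala instance=worker1.example.com realm=EXAMPLE.COM
```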
Kerberos
Considerations in large corporations
• Time synchronization
• CM Kerberos Wizard: configure AD to create a Kerberos principal for the CM server and to delegate to CM the ability to create and manage Kerberos principals
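Time synchronization matters because Kerberos rejects requests when the client and KDC clocks differ by more than the allowed skew: MIT Kerberos defaults to 300 seconds (`clockskew` in krb5.conf), and Active Directory's default policy is also five minutes. A minimal sketch of the comparison, assuming both times are available as Unix epoch seconds; `within_skew` and `MAX_SKEW` are hypothetical names:

```shell
#!/bin/sh
# Maximum tolerated clock difference in seconds (MIT Kerberos clockskew default: 300).
MAX_SKEW=${MAX_SKEW:-300}

# within_skew LOCAL_EPOCH REFERENCE_EPOCH: succeed if the clocks are close enough.
within_skew() {
  local diff=$(( $1 - $2 ))
  if [ "$diff" -lt 0 ]; then
    diff=$(( -diff ))   # absolute difference
  fi
  [ "$diff" -le "$MAX_SKEW" ]
}
```

In practice, keeping every host synchronized through NTP or chrony against the same source as the KDC avoids the problem; a check like this only helps diagnose "Clock skew too great" errors.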
Kerberos Authentication
* LDAP over SSL
Authorization/Access Control
Access control lists (ACLs): HDFS file ACLs, YARN job submission, HBase ACLs, Oozie ACLs
Sentry-managed role-based access control (RBAC): Hive, Impala
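A sketch of both styles of access control, assuming the `hdfs` and `beeline` clients are available, HDFS ACLs are enabled on the NameNode (`dfs.namenode.acls.enabled=true`), and HiveServer2 is protected by Sentry. The path `/data/sales`, user `ssmith`, role `analyst`, database `sales`, group `analysts`, and the JDBC URL are placeholders; commands are only printed unless `DRY_RUN=0` (a hypothetical convention):

```shell
#!/bin/sh
# Print each command instead of running it unless DRY_RUN=0 (hypothetical convention).
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# HDFS file ACL: add read/execute for one user on a directory, then show the result.
grant_hdfs_acl() {
  run hdfs dfs -setfacl -m "user:$1:r-x" "$2"
  run hdfs dfs -getfacl "$2"
}

# Sentry RBAC for Hive/Impala: privileges are granted to roles, and roles to groups.
HS2_URL="jdbc:hive2://hs2.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM"

grant_sentry_select() {
  run beeline -u "$HS2_URL" \
    -e "CREATE ROLE $1; GRANT SELECT ON DATABASE $2 TO ROLE $1; GRANT ROLE $1 TO GROUP $3;"
}

grant_hdfs_acl ssmith /data/sales
grant_sentry_select analyst sales analysts
```

Because Sentry grants roles to groups rather than to individual users, the group membership resolved from the directory decides what each user can query in both Hive and Impala.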
Auditing
Backup/Disaster Recovery
Cloudera Backup/Disaster Recovery (BDR)
• A high performance data replicator
• Copies incremental data from the source cluster on a specified schedule
Supports
• Kerberos
• Data encryption
• HDFS replication to the cloud
Kerberized BDR Best Practice
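Diagram: Cloudera BDR replicates from the production cluster (realm PROD.EXAMPLE.COM) to the DR cluster (realm DR.EXAMPLE.COM); each realm has its own KDC, linked by a cross-realm trust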
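With separate realms, a cross-realm trust lets principals from one realm obtain service tickets in the other. A minimal sketch for two MIT KDCs and a two-way trust (both assumptions; an Active Directory KDC is configured with its own tools instead): the same two `krbtgt` principals must be created with the same password on both KDCs. `trust_principals` and `create_trust` are hypothetical helpers, and commands are only printed unless `DRY_RUN=0`:

```shell
#!/bin/sh
# Print each command instead of running it unless DRY_RUN=0 (hypothetical convention).
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# krbtgt/B@A lets principals from realm A obtain tickets for services in realm B.
trust_principals() {
  echo "krbtgt/$2@$1"
  echo "krbtgt/$1@$2"
}

# Run on BOTH KDCs; addprinc prompts for a password, which must be identical on each side.
create_trust() {
  local princ
  for princ in $(trust_principals "$1" "$2"); do
    run kadmin.local -q "addprinc $princ"
  done
}

create_trust PROD.EXAMPLE.COM DR.EXAMPLE.COM
# + kadmin.local -q addprinc krbtgt/DR.EXAMPLE.COM@PROD.EXAMPLE.COM
# + kadmin.local -q addprinc krbtgt/PROD.EXAMPLE.COM@DR.EXAMPLE.COM
```

Each cluster's krb5.conf must also list both realms and their KDCs, and the `hadoop.security.auth_to_local` rules must map principals from the remote realm to local user names.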
More Security #2
Managed, Secure, Protected
Data In-Motion Encryption
• RPC encryption
• Data transport encryption: supports AES-CTR with up to 256-bit key length
• HTTP TLS/SSL encryption: no self-signed certificates in production
Diagram: RPC encryption, transport encryption, and TLS/SSL between the application, master nodes, and worker nodes
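The first two layers map to standard Hadoop properties: `hadoop.rpc.protection=privacy` encrypts RPC traffic, and `dfs.encrypt.data.transfer=true` together with `dfs.encrypt.data.transfer.cipher.suites=AES/CTR/NoPadding` and `dfs.encrypt.data.transfer.cipher.key.bitlength=256` encrypts the DataNode transport with 256-bit AES-CTR. A minimal sketch that compares a client's effective settings with those values, assuming the `hdfs` client is on the PATH; `get_conf` and `check_in_motion` are hypothetical helpers, and TLS for the web UIs is checked separately:

```shell
#!/bin/sh
# Hypothetical helper: read a client-side setting with "hdfs getconf -confKey".
get_conf() {
  hdfs getconf -confKey "$1" 2>/dev/null
}

# Compare each property with the value that enables in-motion encryption;
# print one line per mismatch and fail if any setting differs.
check_in_motion() {
  local rc=0 key want got
  while read -r key want; do
    got=$(get_conf "$key" </dev/null)
    if [ "$got" != "$want" ]; then
      echo "MISMATCH $key: got '$got', want '$want'"
      rc=1
    fi
  done <<EOF
hadoop.rpc.protection privacy
dfs.encrypt.data.transfer true
dfs.encrypt.data.transfer.cipher.suites AES/CTR/NoPadding
dfs.encrypt.data.transfer.cipher.key.bitlength 256
EOF
  return "$rc"
}
```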
Data At-Rest Encryption
Transparent encryption
Supports any Hadoop application
Encryption Zone
$ hadoop key create mykey
$ hadoop fs -mkdir /zone
$ hdfs crypto -createZone -keyName mykey -path /zone
Diagram: a directory tree (/, /tmp, /zone, files foo and bar) in which /zone is the encryption zone
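Files written under /zone are encrypted and decrypted transparently, while files elsewhere (such as /tmp) are not. One practical consequence is that HDFS refuses to rename a file across an encryption zone boundary, so data must be copied into a zone instead. A minimal sketch of that rule, assuming the list of zone roots is already known (for example from `hdfs crypto -listZones`, which requires HDFS superuser rights); `zone_of` and `can_rename` are hypothetical helpers:

```shell
#!/bin/sh
# zone_of PATH ZONE...: print the encryption zone root that contains PATH, if any.
zone_of() {
  local path=$1 zone
  shift
  for zone in "$@"; do
    zone=${zone%/}
    case $path in
      "$zone" | "$zone"/*) echo "$zone"; return 0 ;;
    esac
  done
  return 1
}

# can_rename SRC DST ZONE...: allowed only within one zone or entirely outside zones.
can_rename() {
  local src=$1 dst=$2
  shift 2
  [ "$(zone_of "$src" "$@")" = "$(zone_of "$dst" "$@")" ]
}

zone_of /zone/foo /zone
# /zone
can_rename /tmp/bar /zone/bar /zone || echo "rename across the zone boundary is refused; copy instead"
# rename across the zone boundary is refused; copy instead
```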
Key Management Server Deployment (non-prod)
Diagram: the client and the HDFS NameNode use the KMS, which keeps keys in a Java keystore file
Separation of duties
• The encryption zone key (EZK) is stored in the KMS server
• The HDFS superuser cannot decrypt files
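Separation of duties is enforced through KMS ACLs in kms-acls.xml: `hadoop.kms.blacklist.DECRYPT_EEK` can list the `hdfs` user, so even the HDFS superuser cannot ask the KMS to decrypt the encrypted data encryption keys kept in the NameNode metadata, while `key.acl.<key name>.DECRYPT_EEK` lists who may decrypt for a given key. A simplified, hypothetical model of that decision (users only; real KMS ACLs also cover groups and service-level ACLs):

```shell
#!/bin/sh
# Example policy values (hypothetical): block the hdfs superuser from DECRYPT_EEK
# and allow only ssmith and etl to decrypt EDEKs for key "mykey".
BLACKLIST_DECRYPT_EEK="hdfs"

key_acl_decrypt_eek() {
  case $1 in
    mykey) echo "ssmith etl" ;;
  esac
}

# may_decrypt USER KEY: succeed if USER may decrypt EDEKs of KEY (the blacklist wins).
may_decrypt() {
  local user=$1 key=$2 u
  for u in $BLACKLIST_DECRYPT_EEK; do
    if [ "$u" = "$user" ]; then
      return 1
    fi
  done
  for u in $(key_acl_decrypt_eek "$key"); do
    if [ "$u" = "$user" ]; then
      return 0
    fi
  done
  return 1
}

may_decrypt ssmith mykey && echo "ssmith: allowed"
may_decrypt hdfs mykey || echo "hdfs: denied"
```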
Key Management Server/Key Trustee Server Deployment
Diagram: the client and the HDFS NameNode use Key Trustee KMS instances (two or more), which reach an active and a passive Key Trustee Server, kept in synchronization, through a firewall
KMS+KTS+HSM Deployment
Diagram: the client and the HDFS NameNode use HSM KMS instances (two or more) behind a firewall, with active and passive Key Trustee Servers kept in synchronization, Key HSM, and HSM devices
Troubleshooting: Encryption Performance Anomaly
• Configuration
• AES-NI hardware acceleration
• OpenSSL library
• Entropy
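A sketch of quick checks for the last three items, assuming a Linux host: the CPU advertises AES-NI with the `aes` flag in /proc/cpuinfo, `hadoop checknative -a` reports whether the native OpenSSL library (libcrypto) was loaded, and /proc/sys/kernel/random/entropy_avail shows the available kernel entropy in bits. The helper names and the 1000-bit threshold are hypothetical:

```shell
#!/bin/sh
# 1. AES-NI: Linux lists the "aes" CPU flag in /proc/cpuinfo when the instructions exist.
has_aes_ni() {
  grep -qw aes "${1:-/proc/cpuinfo}"
}

# 2. OpenSSL: "hadoop checknative -a" prints a line such as "openssl: true /usr/lib64/libcrypto.so".
openssl_loaded() {
  hadoop checknative -a 2>/dev/null | grep -qE '^openssl:[[:space:]]+true'
}

# 3. Entropy: compare available kernel entropy (bits) with a threshold (hypothetical default 1000).
entropy_ok() {
  local threshold=${1:-1000} file=${2:-/proc/sys/kernel/random/entropy_avail}
  [ "$(cat "$file")" -ge "$threshold" ]
}
```

For example, `has_aes_ni || echo "no AES-NI: expect slow software AES"` on each worker node narrows down whether a slowdown comes from the hardware or from the library setup.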
Fine Grained Access Control with Apache Sentry
Most Security #3
Secure Data Vault
Level 3 Secure Data Vault
• All data, both at rest and in transit, is encrypted
• Key management system is fault-tolerant
• Auditing mechanisms comply with industry, government, and regulatory
standards (PCI, HIPAA, NIST, for example)
• Auditing extends from EDH to the other systems that integrate with it.
• Cluster administrators are well-trained
• Security procedures have been certified by an expert
• Cluster can pass technical review
Data Redaction
Personally identifiable information (PII)
• PCI-DSS and HIPAA best practices followed
Passwords
• Stored in credential files, not in configuration files
Logs and queries
• Redaction configured in Cloudera Manager
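Two sketches for these practices. Passwords can be kept out of configuration files with the Hadoop credential provider API (`hadoop credential create <alias> -provider <path>`), and Cloudera Manager's log and query redaction applies regular-expression search-and-replace rules. The `redact` function below imitates that idea with `sed`; it is not Cloudera Manager's implementation, and its two patterns (16-digit card numbers, US Social Security numbers) are only examples. The alias, keystore path, and `DRY_RUN` convention are hypothetical:

```shell
#!/bin/sh
# Print each command instead of running it unless DRY_RUN=0 (hypothetical convention).
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Store a password in a JCEKS credential store instead of a *-site.xml file;
# "hadoop credential create" prompts for the value when -value is omitted.
store_password() {
  run hadoop credential create "$1" -provider "jceks://file$2"
}

# Regex-based redaction of card numbers and SSNs in log or query text (reads stdin).
redact() {
  sed -E \
    -e 's/[0-9]{4}[- ]?[0-9]{4}[- ]?[0-9]{4}[- ]?[0-9]{4}/XXXX-XXXX-XXXX-XXXX/g' \
    -e 's/[0-9]{3}-[0-9]{2}-[0-9]{4}/XXX-XX-XXXX/g'
}

store_password ssl.server.keystore.password /etc/hadoop/conf/creds.jceks
# + hadoop credential create ssl.server.keystore.password -provider jceks://file/etc/hadoop/conf/creds.jceks
echo "SELECT * FROM payments WHERE card = '4111-1111-1111-1111'" | redact
# SELECT * FROM payments WHERE card = 'XXXX-XXXX-XXXX-XXXX'
```

The credential store is then referenced through `hadoop.security.credential.provider.path`, so components resolve the alias at run time instead of reading a plain-text password.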
Full Encryption
Encrypt Data Spills
• MapReduce
• Impala
• Hive
• Flume
OS-level encryption
• Navigator Encrypt
How to secure your Cloudera cluster
Cloudera Documentation
Cloudera Professional Services security engagement
• Review security requirements and provide an overview of data security policies
• Audit architecture and current systems for security policies and best practices
• Custom tailor a security reference architecture
• Optimize OS and Java to take advantage of hardware-based crypto-acceleration
• Install and configure Kerberos with MIT Kerberos KDC or Active Directory
• Install and configure Sentry and Cloudera Navigator (license required)
• Install and configure Navigator Encrypt and Key Trustee with an HSM root of trust
• Review fine-grained permissions on sample data using Sentry
• Review audit and lineage on sample data using Navigator
• Use Cloudera Manager and Hue to review security integration for users
• Enable and configure HDFS encryption
https://www.cloudera.com/more/services-and-support/professional-services/security-integration-pilot.html
Cloudera OnDemand online security course
• Online, self-paced training course: https://ondemand.cloudera.com
• Launch planned for mid-February 2018
• An estimated three days' worth of content covering Cloudera security levels 1 and 2
• Currently about 375 slides, with 9 detailed chapters and 16 instructor demonstrations:
1. Security Overview
2. Security Architecture
3. Host Security
4. Encrypting Data in Motion
5. Authentication
6. Authorization
7. Encrypting Data at Rest
8. Auditing
9. Additional Considerations: Data Governance
OnDemand security course: instructor-guided demos
1. Potential attack vectors
2. Securing the cluster hosts
3. Generating and managing keys for TLS
4. Configuring Cloudera Manager for TLS
5. Encrypting data in motion
6. Hadoop default authentication
7. Kerberizing the cluster with MIT Kerberos
8. Kerberizing the cluster with Active Directory
9. Configuring authorization with Cloudera Manager
10. Controlling access to YARN
11. Controlling access to HDFS
12. Controlling access to tables
13. Enabling HDFS encryption
14. Protecting local data with Navigator Encrypt
15. Using Navigator for auditing
16. Reassessing cluster security
OnDemand security course disclaimer
THIS IS REALLY IMPORTANT:
The examples in this course are based on CM/CDH 5.12, running in a cloud-based deployment on a
cluster using the CentOS 7.2 operating system.
Given the almost limitless permutations of possible configurations, including different versions of CDH,
Cloudera Manager, operating systems, directory servers, Kerberos servers, web browsers, and other
tools, as well as variations in policies, laws, and practices that affect each organization differently, it's
impossible for a training course to cover all aspects of security.
This course is meant to provide a background that will help you to understand many important concepts
and techniques, but is not intended as a replacement for the relevant documentation or a consulting
engagement with an expert who can provide advice based on your specific requirements.
• Disclaimer needed because of the variety and permutations of security setups
• Versions used: CDH 5.12 and CentOS 7.2
OnDemand security course scenario
• Many of our demonstrations are based on a hypothetical scenario
• However, the concepts should apply to nearly any organization
• Loudacre Mobile is a fast-growing wireless carrier
• Employees serving in a variety of roles
• Data ingested from many sources, in many formats
• Data processed by many tools
OnDemand security course environment
Comprehensive demonstration cluster
Sample chapter structure: Encrypting Data in Motion
• Encryption Fundamentals
• Certificates
• Key Management
Instructor-Led Demonstration: Generating and Managing Keys for TLS
• Configuring Cloudera Manager for TLS
Instructor-Led Demonstration: Configuring Cloudera Manager for TLS
• Encrypting Hadoop’s Data in Motion
Instructor-Led Demonstration: Encrypting Hadoop’s Data in Motion
• Essential Points
Register your interest for the OnDemand security course:
Thank you