Open Source Security Tools For Big Data
Rommel Garcia (@rommelgarcia), Hortonworks
# whoami
• Global Security SME Lead @hortonworks
• Senior Solutions Engineer @hortonworks
• Book author – Virtualizing Hadoop
• Co-organizer of the Atlanta Hadoop User Group
• Regular speaker at big data conferences
Big Data Landscape
DATA – More Volume and More Types

[Figure: increasing data variety and complexity — from gigabytes (ERP) to terabytes (CRM), petabytes (Web) and exabytes (Big Data). Sources range from business data feeds, purchase and payment records, offer history, support contacts and segmentation to web logs, user clickstreams, A/B testing, search marketing, behavioral targeting, affiliate networks, social networks, SMS/MMS, sentiment, external demographics, mobile web content, product/service logs, speech-to-text and HD video.]
Big Data Ecosystem

[Figure: big data platform architecture. HDFS (Hadoop Distributed File System) at the base; YARN as the data operating system; engines for script, SQL, NoSQL, stream, search, in-memory and others; cross-cutting layers for security, operations, and governance & integration. Traditional sources (EDW, OLAP datamarts, column databases, CRM, RDBMS) and emerging/non-traditional sources (server logs, call-center records, emails, Word documents, location data, sensor data, customer sentiment, research reports) feed the data repositories, which carry lending, markets, trades, compliance, credit card, cash & equity, finance & GL and risk data. Analysis and visualization applications include risk modeling, fraud detection, compliance (AML, KYC), Bank 3.0, information security, single view of customer, trading applications and market data management.]
Compliance Adherence
• HIPAA – Health Insurance Portability and Accountability Act of 1996
• HITECH – Health Information Technology for Economic and Clinical Health Act
• PCI DSS – Payment Card Industry Data Security Standard
• SOX – Sarbanes-Oxley Act of 2002
• ISO – International Organization for Standardization
• COBIT – Control Objectives for Information and Related Technology
• Corporate security policies
Big Data Security
5 Pillars of Security
• Authentication
• Authorization
• Audit
• Data-at-rest / in-motion encryption
• Centralized administration
Big Data Ecosystem

[Figure: the same platform diagram as before, now annotated with where each open source security tool applies: (1) Knox, (2) Kerberos, (3) Ranger, (4) HDFS encryption, (5) LDAP.]
5 Pillars of Security
• Authentication → Knox, Kerberos
• Authorization → Ranger
• Audit → Ranger
• Data protection → HDFS encryption, wire encryption
• Centralized administration → Ranger
Knox
Why Knox?

Simplified Access
• Kerberos encapsulation
• Extends API reach
• Single access point
• Multi-cluster support
• Single SSL certificate

Centralized Control
• Central REST API auditing
• Service-level authorization
• Alternative to an SSH "edge node"

Enterprise Integration
• LDAP integration
• Active Directory integration
• SSO integration
• Apache Shiro extensibility
• Custom extensibility

Enhanced Security
• Protects network details
• Partial SSL for non-SSL services
• Web-app vulnerability filter
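The "single access point" idea can be sketched as a URL rewrite: instead of calling service hosts directly, every REST call goes to one gateway address. The gateway host, port 8443 and the `default` topology below are illustrative assumptions, not fixed values:

```python
# Sketch: the same WebHDFS call, direct vs. through a Knox gateway.
# Gateway host/port and the "default" topology name are hypothetical.
from urllib.parse import urlparse

def via_knox(direct_url: str, gateway: str = "knox.example.com:8443",
             topology: str = "default") -> str:
    """Rewrite a direct Hadoop REST URL into its Knox gateway equivalent."""
    u = urlparse(direct_url)
    query = f"?{u.query}" if u.query else ""
    return f"https://{gateway}/gateway/{topology}{u.path}{query}"

# Clients never learn the NameNode's host or port:
print(via_knox("http://namenode.internal:50070/webhdfs/v1/tmp?op=LISTSTATUS"))
# → https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS
```

Because only the gateway host is exposed, the firewall can be closed to everything except Knox, and internal topology changes never reach clients.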
Knox Deployment with a Hadoop Cluster

[Figure: deployment topology. A load balancer and Knox instances sit in the web tier inside the DMZ, in front of the cluster network: master nodes (NN, SNN) in rack 1 and slave nodes (DNs) across racks 2..N, connected through top-of-rack switches. The application tier, including Hadoop CLIs, reaches the cluster through the load balancer and Knox.]
What does Perimeter Security really mean?

[Figure: user → firewall → Knox gateway → Hadoop services (WebHDFS, Hive, HBase) via REST APIs.]
• A firewall is still required at the perimeter (today)
• The Knox gateway controls all Hadoop REST API access through the firewall
• The Hadoop cluster itself is mostly unaffected
• The firewall only allows connections through specific ports from the Knox host
Kerberos
Why Kerberos?
• Provides strong authentication
• Establishes identity for users, services and hosts
• Prevents impersonation of accounts by unauthorized users
• Supports a token-delegation model
• Works with existing directory services
• Basis for authorization
Don't be afraid of Kerberos…
Security Implications
$ whoami
baduser
$ hadoop fs -ls /tmp
Found 2 items
drwx-wx-wx   - ambari-qa hdfs          0 2015-07-14 18:38 /tmp/hive
drwx------   - hdfs      hdfs          0 2015-07-14 20:33 /tmp/secure
$ hadoop fs -ls /tmp/secure
ls: Permission denied: user=baduser, access=READ_EXECUTE, inode="/tmp/secure":hdfs:hdfs:drwx------
Good right?
Look Again!!!

$ HADOOP_USER_NAME=hdfs hadoop fs -ls /tmp/secure
Found 1 items
drwxr-xr-x   - hdfs hdfs          0 2015-07-14 20:35 /tmp/secure/blah

Without Kerberos, HDFS trusts whatever user name the client claims, so setting HADOOP_USER_NAME is enough to bypass the permissions above.
Kerberos Primer
Actors: client, KDC (Key Distribution Center), NameNode (NN), DataNode (DN), and the client's Kerberos ticket cache.

1. kinit – log in and get a Ticket Granting Ticket (TGT)
2. The client stores the TGT in its ticket cache
3. The client requests a NameNode service ticket (NN-ST)
4. The client stores the NN-ST in its ticket cache
5. Read/write a file given the NN-ST and file name; if access is permitted, the NameNode returns block locations, block IDs and Block Access Tokens
6. Read/write blocks on the DataNode given a Block Access Token and block ID
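Steps 1–4 above can be modeled as a toy exchange. This is illustrative only: real Kerberos tickets are encrypted, signed blobs, and the `ToyKDC` class below is hypothetical, not a real API.

```python
# Toy model of the kinit / service-ticket flow (steps 1-4). Not real Kerberos:
# plain dicts stand in for encrypted tickets.
import time

class ToyKDC:
    """Hands out a TGT at login, then service tickets against a valid TGT."""
    def __init__(self, principals):
        self.principals = principals                      # user -> password

    def kinit(self, user, password):                      # step 1
        assert self.principals.get(user) == password, "login failed"
        return {"type": "TGT", "user": user, "expires": time.time() + 36000}

    def service_ticket(self, tgt, service):               # step 3
        assert tgt["type"] == "TGT" and tgt["expires"] > time.time()
        return {"type": "ST", "user": tgt["user"], "service": service}

kdc = ToyKDC({"alice": "s3cret"})
cache = {}                                                # client ticket cache
cache["tgt"] = kdc.kinit("alice", "s3cret")               # steps 1-2
cache["nn-st"] = kdc.service_ticket(cache["tgt"], "nn/host@EXAMPLE.COM")  # steps 3-4
print(cache["nn-st"]["user"])                             # → alice
```

The key property the toy preserves: the service ticket is only issued against a valid, unexpired TGT, so the NameNode receives a proven identity rather than a self-declared user name.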
Ranger
Apache Ranger AuthZ Architecture

[Figure: Ranger plugins embedded in HDFS, Hive, HBase, YARN, Knox, Storm, Solr and Kafka. The plugins pull policies from the Policy Server and push audit events (via Log4j) to the Audit Server, which can write to a database, Solr or HDFS. An administration portal exposes REST APIs for policy management; user/group sync pulls identities from LDAP/AD; Ranger KMS manages encryption keys.]
Sample Simplified Workflow – HDFS

1. An admin sets policies for HDFS files/folders in the Policy Manager.
2. Users access HDFS data: a data scientist runs a MapReduce job, applications read through the NameNode, and IT users access HDFS through the CLI.
3. The NameNode uses the Ranger plugin for authorization.
4. Audit logs are pushed to the audit database.
5. The NameNode provides resource access to the user/client.
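The plugin's authorization decision can be modeled as a policy lookup. Ranger's real engine is far richer (wildcards, deny policies, groups, tags); the sketch below, with hypothetical policies and function names, only captures the gist:

```python
# Toy model of a Ranger-style plugin check: match the requested
# resource, user and access type against locally cached policies.
POLICIES = [  # hypothetical policies, as a plugin might cache them
    {"resource": "/apps/finance", "users": {"alice"},        "access": {"read", "write"}},
    {"resource": "/tmp",          "users": {"alice", "bob"}, "access": {"read"}},
]

def is_access_allowed(user: str, path: str, access: str) -> bool:
    for policy in POLICIES:
        if (path.startswith(policy["resource"])
                and user in policy["users"]
                and access in policy["access"]):
            return True        # allow on first matching policy
    return False               # default deny; either way the check is audited

print(is_access_allowed("alice", "/apps/finance", "write"))  # → True
print(is_access_allowed("bob",   "/apps/finance", "read"))   # → False
```

Because policies are cached in the plugin, the check happens inside the NameNode process itself, with no round trip to the Policy Server on each request.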
Ranger Stacks

• Apache Ranger 0.5 supports a stack model to make onboarding new components easier, without requiring code changes in Apache Ranger itself.

Ranger-side changes — define a service type:
• Create a JSON file describing the resources, the access types, and the configuration needed to connect to the service
• Load the JSON into Ranger

Secured-component-side changes — develop a Ranger authorization plugin:
• Include the plugin library in the secured component
• During initialization of the service, initialize RangerBasePlugin and the RangerDefaultAuditHandler class
• To authorize access to a resource, build a RangerAccessRequest and check it with isAccessAllowed()
• To support resource lookup, implement RangerBaseService.lookupResource() and RangerBaseService.validateConfig()
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741207
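The service-type JSON mentioned above might look like the following sketch. The component name, class, and values are hypothetical; the general shape follows Ranger's service-definition format (see the wiki page above for the authoritative fields):

```json
{
  "name": "myservice",
  "implClass": "org.apache.ranger.services.myservice.RangerServiceMyService",
  "resources": [
    { "name": "table", "type": "string", "level": 1,
      "mandatory": true, "lookupSupported": true }
  ],
  "accessTypes": [
    { "name": "read" },
    { "name": "write" }
  ],
  "configs": [
    { "name": "myservice.url", "type": "string", "mandatory": true }
  ]
}
```

Once loaded into Ranger Admin (typically via its REST API), the new service type appears in the portal and policies can be authored against the declared resources and access types.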
HDFS Encryption
Data Protection

Hadoop lets you apply data protection at two different layers of the stack:

• Storage — encrypt data on disk. Volume level: LUKS (Linux), BitLocker (Windows); native in Hadoop: HDFS encryption; partner tools: Voltage, Protegrity, Dataguise, Vormetric; OS-level encryption.
• Transmission — encrypt data as it moves. Native in Hadoop: SSL and SASL; AES-256 for SSL, and the data transfer protocol with SASL.
Data-at-Rest Encryption Protection

• Volume-level encryption (open source: LUKS, dm-crypt)
• OS file-level encryption (open source: eCryptfs)
• Hadoop-level encryption (HDFS transparent data encryption, Hive column-level encryption, HBase)
HDFS Encryption – How It Works

[Figure: an HDFS client reads and writes through a crypto stream using the DEK. The NameNode stores encryption-zone attributes (EZ key ID and version; HDFS-6134) and per-file attributes (EDEK and IV). Both the NameNode and the client reach the Key Management Server (KMS; HADOOP-10433) through the KeyProvider API (HADOOP-10141); the KMS holds the EZ keys and turns EDEKs back into DEKs for authorized clients.]

Acronyms:
• EZ — Encryption Zone (an HDFS directory)
• EZK — Encryption Zone Key; master key associated with all files in an EZ
• DEK — Data Encryption Key; unique key associated with each file, generated using the EZ key
• EDEK — Encrypted DEK; the NameNode only ever has access to the encrypted DEK
• IV — Initialization Vector
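The EZK/DEK/EDEK relationship is envelope encryption: the KMS wraps each file's DEK with the zone's EZK, and only the wrapped EDEK is stored in file metadata. The sketch below illustrates the idea with a hash-based XOR keystream standing in for AES; real HDFS encryption uses AES, and nothing here is the actual HDFS/KMS API:

```python
# Envelope-encryption sketch of EZK -> DEK -> EDEK (illustrative only).
import hashlib
import secrets

def xor_stream(key: bytes, data: bytes) -> bytes:
    """Symmetric toy cipher: XOR data with a keystream derived from key."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

ezk = secrets.token_bytes(16)    # encryption-zone key, held only by the KMS
dek = secrets.token_bytes(16)    # per-file data encryption key
edek = xor_stream(ezk, dek)      # KMS wraps DEK -> EDEK; NameNode stores EDEK

# An authorized client hands the EDEK to the KMS, gets the DEK back,
# and uses the DEK to encrypt/decrypt the file contents:
assert xor_stream(ezk, edek) == dek
ciphertext = xor_stream(dek, b"payroll record")
assert xor_stream(dek, ciphertext) == b"payroll record"
```

The point the sketch makes concrete: the NameNode can hand out EDEKs without ever being able to read file data, because unwrapping an EDEK requires the EZK, which never leaves the KMS.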
HDFS Encryption – Common Commands

As the HDFS admin:
• Run the KMS server:
  ./kms.sh run
• Create an encryption key (size 128, 192 or 256; 256 requires the unlimited-strength JCE policy files):
  hadoop key create key1 -size 128
• List all encryption keys:
  hadoop key list -metadata
• Create an encryption zone (as the hdfs user), pointing at an existing, empty directory:
  hdfs crypto -createZone -keyName key1 -path /secure1
• List all encryption zones:
  hdfs crypto -listZones

As an HDFS end user (not in the HDFS admin role), reads and writes are unchanged:
  hdfs dfs -copyFromLocal /tmp/vinay.txt /secure1
  hdfs dfs -cat /securehive/sal.txt
Encrypting Data In-Motion

Protocol — communication point: encryption mechanism
• REST — WebHDFS (client to cluster), client to Knox: REST over SSL; Knox gateway SSL; SPNEGO, which extends Kerberos to web applications over the standard HTTP protocol
• HTTP — NameNode/JobTracker UIs, MapReduce shuffle: HTTPS; encrypted MapReduce shuffle (MAPREDUCE-4117)
• RPC — Hadoop client (client to cluster, intra-cluster): SASL; the Hadoop RPC system implements SASL, which provides different quality-of-protection (QoP) levels, including encryption
• JDBC/ODBC — HiveServer2: SSL
• TCP/IP — data transfer (client to cluster, intra-cluster): the encrypted data transfer protocol available in Hadoop, with SASL support added to the DataTransferProtocol
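For the RPC and data-transfer rows above, wire encryption is typically switched on with Hadoop configuration properties like the following. This is a sketch of the two core switches; distributions document additional properties (cipher suites, shuffle SSL) that usually accompany them:

```xml
<!-- core-site.xml: RPC quality of protection.
     "privacy" = authentication + integrity + encryption. -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>
</property>

<!-- hdfs-site.xml: encrypt the block data transfer protocol. -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>
```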
Real-world Implementation
[Figure: real-world implementation architecture, showing the data sources feeding the platform.]
Thank You !