Big Data Consulting: doing hadoop, securely
Rob Gibbon
■ Architect @Big Industries Belgium
■ Focus on designing, deploying & integrating web scale solutions with Hadoop
■ Deliveries for clients in telco, financial services & media
Hadoop was built to survive data tsunamis
■ a response to challenges that enterprise vendors were unable to address
■ focused on data volumes and cost reduction
■ initially, the solution had some serious holes
Confidentiality, Integrity, Availability
■ early prereleases couldn’t really meet any of these three fundamental infosec objectives
■ basic controls weren’t there
the early days
■ Multiple SPoFs
■ No authentication
■ Easily spoofed authorisation
■ No encryption of data at rest or in transit
■ No accounting
enter the hadoop vendors
■ Vendors like Cloudera focus on making Apache Hadoop “enterprise ready”
■ Includes building robust infosec controls into Hadoop core
■ Multilayer security is now available for Hadoop
running a cluster in non-secure mode
■ malicious|mistaken user:
  ■ recursively delete all the data please
  ■ by the way, I’m the system superuser
■ hadoop:
  ■ oh ok then
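This is not an exaggeration: with simple (non-Kerberos) authentication, Hadoop trusts whatever identity the client claims. A sketch against a hypothetical non-secure cluster (the `/data` path is illustrative):

```shell
# HADOOP_USER_NAME overrides the client identity under simple auth
export HADOOP_USER_NAME=hdfs        # claim to be the HDFS superuser
hadoop fs -rm -r -skipTrash /data   # hadoop: “oh ok then”
```

With Kerberos enabled, the claimed identity is ignored and the caller must hold a valid ticket instead.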
bad things happen with slack controls in place
average cost of a data breach = $3.8m
running a secure cluster
■ Kerberos is one of the primary security controls you can use
■ Btw, what’s wrong with this kerberos principal?
  ■ [email protected]
kerberos continued
■ Kerberos uses a three-part principal: primary/instance@REALM
  ■ hdfs/[email protected]
  ■ hdfs/[email protected]
■ Best to use explicit mappings from kerberos principals to local users
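Those explicit mappings live in the `hadoop.security.auth_to_local` property of core-site.xml. A hedged sketch, assuming an illustrative `EXAMPLE.COM` realm:

```xml
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[2:$1@$0](hdfs@EXAMPLE\.COM)s/.*/hdfs/
    RULE:[1:$1@$0](.*@EXAMPLE\.COM)s/@.*//
    DEFAULT
  </value>
</property>
```

The first rule maps the two-part service principal hdfs/host@REALM to the local `hdfs` account; the second strips the realm from ordinary user principals; anything not matched falls through to `DEFAULT`.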
hive / impala
■ HiveServer doesn’t support Kerberos => use HiveServer2
■ Best to use Sentry to enforce role-based access controls from SQL
■ Users can upload and execute arbitrary [possibly hostile] UDFs => enable Sentry
■ Older versions of Metastore don’t enforce permissions on grant_* and revoke_* APIs => stay up to date
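Connecting to a kerberized HiveServer2 means passing the server’s service principal in the JDBC URL; host and realm below are illustrative assumptions:

```shell
# obtain a ticket, then connect to HiveServer2 with its service principal
kinit alice@EXAMPLE.COM
beeline -u "jdbc:hive2://hive1.example.com:10000/default;principal=hive/hive1.example.com@EXAMPLE.COM"
```

Note the principal in the URL identifies the *server*, not the connecting user; the user’s identity comes from the Kerberos ticket obtained with kinit.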
availability
■ Most core components now support HA
  ■ HDFS
  ■ YARN
  ■ Hive
  ■ HBase
disaster recovery
■ HDFS and HBase offer point-in-time snapshots
  ■ => consistency!
■ Vendor-tethered solutions for site-to-site replication are available
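The HDFS snapshot workflow in outline (directory and file names below are illustrative):

```shell
# an administrator must first mark the directory as snapshottable
hdfs dfsadmin -allowSnapshot /data/warehouse
# take a named point-in-time snapshot
hdfs dfs -createSnapshot /data/warehouse before-migration
# snapshots appear under a read-only .snapshot directory
hdfs dfs -ls /data/warehouse/.snapshot
# recover a deleted or corrupted file by copying it back out
hdfs dfs -cp /data/warehouse/.snapshot/before-migration/orders.csv /data/warehouse/
```

Snapshots are metadata-only and near-instant to create; they protect against accidental deletion, not against loss of the cluster itself, which is where site-to-site replication comes in.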
encryption at rest
■ HDFS encryption zones
  ■ transparent to existing applications
  ■ minimal performance overhead on Intel architecture (AES-NI offload)
  ■ key management is externalised
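Setting up an encryption zone in outline, assuming a KMS is already configured (key and path names are illustrative):

```shell
# create an encryption key in the external key management service
hadoop key create warehouse-key
# a zone can only be created on an empty directory
hdfs dfs -mkdir /secure
hdfs crypto -createZone -keyName warehouse-key -path /secure
# verify
hdfs crypto -listZones
```

Files written under /secure are then encrypted and decrypted transparently by the HDFS client; the NameNode and DataNodes never see the key material.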
wire encryption
■ SSL/TLS encryption is now available for most Hadoop services
■ Note that AES-256 for SSL and for Kerberos preauth requires the JCE Unlimited Strength policy files on the cluster
accounting
■ Vendor-tethered solutions are available for auditing
  ■ Navigator for Cloudera clusters
  ■ Ranger for HortonWorks clusters
tokenization
■ The process of substituting a sensitive data element with a non-sensitive equivalent
■ 3rd Party vendor solutions are available that integrate well with Hadoop
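The idea can be illustrated with a deliberately toy, file-based token vault (a sketch only; `vault.txt` and the `tok_` prefix are made-up conventions, and a real deployment would use one of the 3rd party products with proper access controls on the vault):

```shell
# toy token vault: swap a sensitive value for a random surrogate
card="4111-1111-1111-1111"
token="tok_$(openssl rand -hex 8)"    # non-sensitive equivalent, no relation to the original
echo "$token $card" >> vault.txt      # the mapping lives only in the protected vault
# downstream jobs see and process only $token;
# detokenize by looking the original back up via the vault
grep "^$token " vault.txt | cut -d' ' -f2-
```

Unlike encryption, the token carries no mathematical relationship to the original value, so a compromise of the Hadoop cluster alone reveals nothing without the vault.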
some places where there’s still some work to do
■ Setting up hadoop security controls is complex and time consuming
■ Not much support for SELinux around here
■ No general, coherent, policy-based framework for controlling resource access demands
  ■ Apache Knox is a starting point
  ■ => network and host resource access?
Integration
■ Integrating hadoop into an organisation’s services environment needs careful planning
■ Hadoop can conflict with established governance policies
  ■ system accounts & privileges
  ■ remote access
  ■ firewall flows
  ■ domains and trust
  ■ etc.
layered security in hadoop-core
■ Authentication: Kerberos
■ Authorisation: local unix group or LDAP mappings
■ Authorisation: Sentry RBAC for hive/impala
■ Encryption: HDFS encryption
■ Encryption: SSL encryption for most services
■ Availability: active/passive failover for HDFS, YARN, HBase
■ Integrity: HDFS block replication & CRC checksums
but what about poodle/heartbleed/shellshock/whatever...
■ underlines the need for a mature information security governance strategy & architecture
defence-in-depth
■ A layered security architecture for Hadoop clusters is doable
■ e.g. MasterCard’s Cloudera Hadoop cluster achieved PCI compliance in 2014 http://goo.gl/FP5DUt
thanks for listening
be.linkedin.com/in/robertgibbon
www.bigindustries.be