Hadoop & Security - Past, Present, Future

Published on 05-Aug-2015


Transcript

1. Hadoop & Security: Past, Present, Future
   uweseiler

2. About me
   Big Data Nerd, Travelpirate, Photography Enthusiast, Hadoop Trainer, Data Architect

3. Agenda
   Past
   Present: Authentication, Authorization, Auditing, Data Protection
   Future

4. Past

5. Hadoop & Security 2010
   Owen O'Malley @ Hadoop Summit 2010
   http://de.slideshare.net/ydn/1-hadoop-securityindetailshadoopsummit2010

6. Hadoop & Security 2010 (continued, same source)

7. Hadoop & Security (Not that long ago)
   The user connects via SSH to a gateway node and runs "hadoop fs -put" into /user/uwe/ on the Hadoop cluster.

8. Present

9. Security in Hadoop 2015
   Authentication (Who am I? Prove it!): Kerberos in native Apache Hadoop; HTTP/REST APIs secured with the Apache Knox Gateway.
   Authorization (Restrict access to explicit data): fine-grained access control for HDFS, YARN, MapReduce, Hive & HBase, Storm & Knox; centralized security administration.
   Audit (Understand who did what): centralized audit reporting; policy and access history.
   Data Protection (Encrypt data at rest & in motion): wire encryption in Hadoop; file encryption built-in since Hadoop 2.6; partner tools.

10. Typical Flow: Hive Access with Beeline CLI
    Beeline Client -> HiveServer2 -> HDFS (blocks A, B, C)

11. Typical Flow: Authenticate through Kerberos
    Client gets a Service Ticket for Hive from the KDC.
    Client uses Hive and submits the query.
    Hive gets a NameNode (NN) Service Ticket.
    Hive creates the MapReduce/Tez job using the NN.

12. Typical Flow: Authorization through Ranger
    Same flow as above, with Ranger checking the authorization of the query at HiveServer2.
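The ticket exchanges on the "Typical Flow" slides (the client gets a service ticket for Hive, Hive in turn gets one for the NameNode) can be mimicked with a toy model. This is an illustrative sketch only; the class and principal names are invented and this is not a real Kerberos implementation.

```python
class KDC:
    """Toy Key Distribution Center: issues service tickets to principals
    that hold a valid Ticket-Granting-Ticket (TGT)."""

    def __init__(self, principals):
        self.principals = set(principals)
        self.tgts = {}

    def authenticate(self, principal):
        # Real Kerberos uses a password-derived key here; the password
        # itself is never sent over the wire.
        if principal not in self.principals:
            raise PermissionError(f"unknown principal: {principal}")
        self.tgts[principal] = f"TGT({principal})"
        return self.tgts[principal]

    def service_ticket(self, principal, tgt, service):
        if self.tgts.get(principal) != tgt:
            raise PermissionError("invalid TGT")
        return f"ST({principal}->{service})"


def beeline_query(kdc, user):
    """Mirrors slides 11-12: client -> Hive -> NameNode, each hop
    presenting its own KDC-stamped service ticket."""
    tgt = kdc.authenticate(user)
    hive_ticket = kdc.service_ticket(user, tgt, "hive/host1")        # client gets ST for Hive
    hive_tgt = kdc.authenticate("hive/host1")
    nn_ticket = kdc.service_ticket("hive/host1", hive_tgt, "hdfs/host1")  # Hive gets NN ST
    return [hive_ticket, nn_ticket]
```

Note that no step ever transmits a password: each hop trusts the KDC-issued ticket, which is the point of the slides' flow.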
13. Typical Flow: Perimeter Security through Knox
    The original request with user id/password goes to Knox.
    Knox gets a Service Ticket for Hive and runs as proxy user using Hive.
    Hive gets a NameNode (NN) Service Ticket and creates the MapReduce/Tez job using the NN.
    The client gets the query result.

14. Typical Flow: Wire & File Encryption
    Same flow as above, with SSL on the client-to-Knox, Knox-to-Hive and REST legs, and SASL between Hive and HDFS.

15. Authentication: Kerberos

16. Kerberos Synopsis
    The client never sends a password; it sends a username + token instead.
    Authentication is centralized in the Key Distribution Center (KDC).
    The client receives a Ticket-Granting-Ticket, which allows the authenticated client to request access to secured services.
    Clients establish a timed session.
    Clients establish trust with services by sending KDC-stamped tickets to the service.

17. Kerberos + Active Directory/LDAP (Cross-Realm Trust)
    Users live in the AD/LDAP user store, e.g. seiler@EXAMPLE.COM; use existing directory tools to manage users.
    Hosts and services live in the cluster KDC, e.g. host1@HADOOP.EXAMPLE.COM and hdfs/host1@HADOOP.EXAMPLE.COM; use Kerberos tools to manage host + service principals.

18. Ambari & Kerberos
    Install & configure Kerberos: server on a single node, client on the rest of the nodes.
    Define principals & keytabs: a keytab (key table) is a file containing a key for a principal.
    Since there are a few dozen principals, Ambari can generate keytab data for your entire cluster as a downloadable CSV file.
    Configure user permissions.

19. Perimeter Security: Apache Knox
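The principal formats on the cross-realm trust slide (seiler@EXAMPLE.COM, hdfs/host1@HADOOP.EXAMPLE.COM) follow Kerberos' primary[/instance]@REALM convention, which a small helper can split apart. The function below is our own illustration, not part of any Kerberos library.

```python
def parse_principal(principal):
    """Split a Kerberos principal into (primary, instance, realm).

    'seiler@EXAMPLE.COM'            -> ('seiler', None, 'EXAMPLE.COM')
    'hdfs/host1@HADOOP.EXAMPLE.COM' -> ('hdfs', 'host1', 'HADOOP.EXAMPLE.COM')
    """
    name, _, realm = principal.rpartition("@")
    if not name or not realm:
        raise ValueError(f"not a principal: {principal!r}")
    # Service principals carry the host as an instance after a slash;
    # plain user principals have no instance part.
    primary, _, instance = name.partition("/")
    return (primary, instance or None, realm)
```

The realm part is what the cross-realm trust maps between: users authenticate against EXAMPLE.COM, while hosts and services live in HADOOP.EXAMPLE.COM.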
20. Knox: Core Concept
    The application layer (App X, App C, business users) reaches the Hadoop cluster (HDFS, Hive) over REST/HTTP and JDBC/ODBC through a load balancer at the edge.
    Data ingest/ETL (Falcon, Oozie, Sqoop, Flume) and admin access (SSH, RPC calls) take separate paths for the admin/data operator and the Hadoop admin.

21. Knox: Hadoop REST API
    Service   Direct URL                            Knox URL
    WebHDFS   http://namenode-host:50070/webhdfs    https://knox-host:8443/webhdfs
    WebHCat   http://webhcat-host:50111/templeton   https://knox-host:8443/templeton
    Oozie     http://oozie-host:11000/oozie         https://knox-host:8443/oozie
    HBase     http://hbase-host:60080               https://knox-host:8443/hbase
    Hive      http://hive-host:10001/cliservice     https://knox-host:8443/hive
    YARN      http://yarn-host:yarn-port/ws         https://knox-host:8443/resourcemanager

    Masters can be on many different hosts; Knox offers one host, one port, consistent paths, and SSL configuration at a single host.

22. Knox: Features
    Simplified access: Kerberos encapsulation, single access point, multi-cluster support, single SSL certificate, alternative to an SSH edge node.
    Centralized control: central REST API auditing, service-level authorization.
    Enterprise integration: LDAP/AD integration, SSO integration, Apache Shiro extensibility, custom extensibility.
    Enhanced security: protects network details, SSL for non-SSL services, web-app vulnerability filter.

23. Knox: Architecture
    REST clients pass through the firewall into a DMZ holding a load balancer and Knox instances; Knox talks to the masters (RM, NN, WebHCat, Oozie, HS2, HBase) and slaves (DN, NM) of one or more Hadoop clusters and integrates with the enterprise identity provider. An edge node with the Hadoop CLIs still uses RPC/HTTP directly.

24. Knox: What's New in Version 0.6
    Knox support for HDFS HA
    Support for the YARN REST API
    Support for SSL to Hadoop cluster services (WebHDFS, HBase, Hive & Oozie)
    Knox management REST API
    Integration with Ranger for Knox service-level authorization
    Use Ambari for install/start/stop/configuration

25. Agenda
    Past
    Present: Authentication, Authorization, Auditing, Data Protection
    Future

26. The Hadoop Layers
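The REST API table boils down to swapping many per-service host:port endpoints for one gateway base URL. A minimal sketch of that mapping, assuming the hostnames and paths from the table:

```python
# Single gateway endpoint: one host, one port, SSL in one place.
KNOX_BASE = "https://knox-host:8443"

# Gateway paths per service, reproduced from the slide's table.
SERVICE_PATHS = {
    "webhdfs": "/webhdfs",
    "webhcat": "/templeton",
    "oozie": "/oozie",
    "hbase": "/hbase",
    "hive": "/hive",
    "yarn": "/resourcemanager",
}


def knox_url(service, suffix=""):
    """Return the gateway URL for a service, with an optional
    request-specific suffix appended to the consistent base path."""
    path = SERVICE_PATHS[service.lower()]
    return f"{KNOX_BASE}{path}{suffix}"
```

Clients only ever see knox-host:8443, which is what makes the single SSL certificate and central auditing on the features slide possible.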
27. Authorization: Overview
    HDFS: permissions, ACLs
    YARN: queue ACLs
    Pig: no server component to check/enforce ACLs
    Hive: column-level ACLs
    HBase: cell-level ACLs

28. Authorization: HDFS Permissions
    hadoop fs -chown maya:sales /sales-data
    hadoop fs -chmod 640 /sales-data

29. Authorization: HDFS ACLs
    New requirements: Maya, Diana and Clark are allowed to make modifications; a new group "execs" should be able to read the sales data.

30. Authorization: HDFS ACLs
    hdfs dfs -setfacl -m group:execs:r-- /sales-data
    hdfs dfs -getfacl /sales-data
    hadoop fs -ls /sales-data

31. Authorization: HDFS Best Practices
    Start with traditional HDFS file permissions to implement most permission requirements.
    Define a small number of ACLs to handle exceptional cases.
    A file/folder with an ACL incurs an additional memory cost in the NameNode compared to a file/folder with traditional permissions.

32. Past

33. Authorization: Hive
    Hive has traditionally offered full-table access control via HDFS access control.
    Solution for column-based control:
    Let HiveServer2 check and submit the query execution.
    Make the table accessible only by a special (technical) user.
    Provide an authorization plugin to restrict UDFs and file formats.
    Use standard SQL permission constructs: GRANT/REVOKE.
    Store the ACLs in the Hive Metastore.

34. Authorization: Hive ATZ-NG
    Details: https://issues.apache.org/jira/browse/HIVE-5837

35. Authorization: Hive
    CREATE ROLE sales_role;
    GRANT ALL ON DATABASE sales-data TO ROLE sales_role;
    GRANT SELECT ON DATABASE marketing-data TO ROLE sales_role;
    CREATE ROLE sales_column_role;
    GRANT SELECT(c1, c2, c3) ON secret_table TO sales_column_role;

36. Authorization: Pig
    There is no Pig (or MapReduce) server to submit and check column-based access.
    Pig (and MapReduce) is restricted to full data access via HDFS access control.
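How the traditional mode bits and the single extra ACL entry from the /sales-data example (mode 640, owner maya, group sales, plus group:execs:r--) combine can be sketched as below. This is a deliberate simplification for illustration: it ignores default ACLs, the ACL mask, and superusers, so it is not HDFS's exact algorithm.

```python
def may_read(user, user_groups, owner, group, mode, acl=()):
    """Decide read access from POSIX-style bits plus named-group
    ACL entries as set with 'hdfs dfs -setfacl'.

    mode is the octal permission (e.g. 0o640); acl is a list of
    ('group', name, 'r--') style entries.
    """
    if user == owner:
        return bool(mode & 0o400)           # owner read bit
    for kind, name, perms in acl:
        if kind == "group" and name in user_groups and "r" in perms:
            return True                      # named-group ACL entry
    if group in user_groups:
        return bool(mode & 0o040)           # owning-group read bit
    return bool(mode & 0o004)               # "other" read bit
```

This also shows why the best-practices slide says to keep ACLs few: mode bits cover the common cases, and each ACL entry is extra state the NameNode must hold in memory.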
37. Authorization: HBase
    The HBase permission model traditionally supported ACLs defined at the namespace, table, column family and column level.
    This is sufficient to meet most requirements.
    Cell-based security was introduced with HBase 0.98, on par with the security model of Accumulo.

38. Authorization & Auditing: Apache Ranger

39. Hadoop & Security 2010
    Owen O'Malley @ Hadoop Summit 2010
    http://de.slideshare.net/ydn/1-hadoop-securityindetailshadoopsummit2010

40. Ranger: Authorization Policies

41. Ranger: Auditing

42. Ranger: Architecture

43. Ranger: What's New in Version 0.4?
    New component coverage: Storm authorization & auditing, Knox authorization & auditing.
    Deeper integration with HDP: Windows support; integration with the Hive auth API, supporting grant/revoke commands; support for grant/revoke commands in HBase.
    Enterprise readiness: REST APIs for the policy manager, audit logs stored locally in HDFS, Oracle DB support, Ambari support as part of the Ambari 2.0 release.

44. Data Protection: Encryption

45. Encryption: Data in Motion
    Hadoop client to DataNode via the Data Transfer Protocol: clients read/write to HDFS over an encrypted channel with configurable encryption strength.
    ODBC/JDBC client to HiveServer2: encryption via SASL Quality of Protection.
    Mapper to reducer during the shuffle/sort phase: shuffle is over HTTP(S), supports mutual authentication via SSL, host name verification enabled.
    REST protocols: SSL support.

46. Encryption: Data at Rest
    HDFS Transparent Data Encryption: install and run the KMS on top of HDP 2.2 and change the corresponding HDFS parameters (via Ambari).
    Create an encryption key:
    hadoop key create key1 -size 256
    hadoop key list -metadata
    Create an encryption zone using the key:
    hdfs dfs -mkdir /zone1
    hdfs crypto -createZone -keyName key1 /zone1
    hdfs crypto -listZones
    Details: http://hortonworks.com/kb/hdfs-transparent-data-encryption/

47. Future
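Ranger's core idea on the preceding slides, central policies with every decision audited ("understand who did what"), can be illustrated with a toy checker. The policy structure below is invented for illustration; it is not Ranger's actual policy model or API.

```python
# Central audit trail: every decision is recorded, allowed or not.
audit_log = []

# Centrally administered policies: who may do what on which resource.
policies = [
    {"resource": "sales-data", "groups": {"sales"}, "access": {"read", "write"}},
    {"resource": "sales-data", "groups": {"execs"}, "access": {"read"}},
]


def check_access(user, groups, resource, access):
    """Allow if any policy grants this access to one of the user's
    groups on the resource, and audit the decision either way."""
    allowed = any(
        p["resource"] == resource
        and (groups & p["groups"])
        and access in p["access"]
        for p in policies
    )
    audit_log.append((user, resource, access, "ALLOWED" if allowed else "DENIED"))
    return allowed
```

Keeping policy evaluation and auditing in one place is what the "centralized security administration" and "centralized audit reporting" bullets on the 2015 overview slide refer to.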
48. Apache Atlas: Data Classification
    Currently in incubation: https://wiki.apache.org/incubator/AtlasProposal

49. Apache Atlas: Tag-based Policies
    Data ingestion/ETL (Falcon, Oozie, Sqoop, Flume) classifies source data in the metadata server (e.g. Table1|marketing); the IT admin creates a tag policy in Ranger, which enforces it for Beeline clients going through HiveServer2 to HDFS and writes audit logs.

50. Future: More Goodies
    Dynamic, attribute-based access control (ABAC): extend Ranger to support data or user attributes in policy decisions, e.g. the geo-location of users.
    Enhanced auditing: Ranger can stream audit data through Kafka & Storm into multiple stores; use Storm for correlation of data.
    Encryption as a first-class citizen: build native encryption support into HDFS, Hive & HBase; Ranger-based key management to support encryption.

51. Contact Details
    Twitter: @uweseiler
    Mail: uwe.seiler@codecentric.de
    Phone: +49 176 1076531
    XING: https://www.xing.com/profile/Uwe_Seiler
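The tag-based policies from the Atlas slides and the attribute-based (ABAC) idea from the "More goodies" slide can be illustrated in a few lines: classification lives in a metadata store, and a single tag policy covers every resource carrying that tag. The tags, groups and geo attribute below are all made-up examples, not Atlas or Ranger data models.

```python
# Metadata store: classification tags per table (cf. "Table1|marketing").
tags = {"Table1": {"marketing"}, "Table2": {"marketing", "pii"}}

# One policy per tag, not per table. The "pii" rule is an ABAC-style
# check on a user attribute (geo-location), as suggested on slide 50.
tag_policies = {
    "marketing": lambda user: "marketing" in user["groups"],
    "pii": lambda user: user.get("geo") == "EU",
}


def allowed(user, table):
    """Allow only if every tag policy attached to the table accepts
    the user; untagged tables fall through as allowed here."""
    return all(tag_policies[t](user) for t in tags.get(table, ()))
```

The payoff is that newly ingested data only needs the right tag; no per-table policy has to be written, which is why classification happens in the ingestion/ETL path on slide 49.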
