Keys Botzum, Senior Principal Technologist, July 2014. © 2014 MapR Technologies

Securing Hadoop by Sr. Principal Technologist Keys Botzum


Presentation given at the Hadoop DC meetup in July 2014. Keys presented on why Hadoop should be secured, what it takes, and how to get security beyond the core.

Keys Botzum, Senior Principal Technologist at MapR, has over 15 years of experience in large-scale distributed system design. At MapR his primary responsibility is working with customers as a consultant, but he also teaches classes, contributes to documentation, and works with MapR engineering. Previously he was a Senior Technical Staff Member with IBM and a respected author of many articles on WebSphere Application Server as well as a book. He holds a Master's degree in Computer Science from Stanford University and a B.S. in Applied Mathematics/Computer Science from Carnegie Mellon University.


Page 1: Securing Hadoop by Sr. Principal Technologist Keys Botzum


Keys Botzum, Senior Principal Technologist July 2014

Page 2: Securing Hadoop by Sr. Principal Technologist Keys Botzum

Agenda
• What’s MapR
• Why Secure Hadoop
• Securing MapR Hadoop
• Security beyond the core

Page 3: Securing Hadoop by Sr. Principal Technologist Keys Botzum

MapR Distribution for Hadoop

[Diagram: the Apache Hadoop and OSS ecosystem layered on the MapR Data Platform (MapR-DB tables and MapR-FS files, patent pending), with a Management component alongside.]

APACHE HADOOP AND OSS ECOSYSTEM: Hue, Shark, Impala, Drill, Hive/Stinger/Tez, Sqoop, Storm, Sentry, Spark, Solr, Cascading, Mahout, Flume, Oozie, HBase, MapReduce, YARN, Pig, Whirr, Zookeeper, ...

Enterprise-grade
• High availability
• Data protection
• Disaster recovery

Inter-operability
• Standard file access
• Standard database access
• Pluggable services
• Broad developer support

Security
• Enterprise security authorization
• Wire-level authentication
• Data governance

Operational
• Ability to support predictive analytics, real-time database operations, and support high arrival rate data

Multi-tenancy
• Ability to logically divide a cluster to support different use cases, job types, user groups, and administrators

Performance
• Ability to deliver 2X to 7X performance
• Consistent low latency

Page 4: Securing Hadoop by Sr. Principal Technologist Keys Botzum


MapR: Best Product, Best Business, Best Customers, Best Partners, Best Investors

Top Ranked Exponential Growth

500+ Customers Cloud Leaders

3X bookings Q1 ‘13 – Q1 ‘14

80% of accounts expand 3X

90% software licenses

< 2% lifetime churn

> $1B in incremental revenue generated by 1 customer

Page 5: Securing Hadoop by Sr. Principal Technologist Keys Botzum

Why Secure Hadoop Now?
• Historically security wasn’t a high priority
  – Reflection of the type of data and the type of organizations using Hadoop
• Hadoop is now being used by more traditional firms as well as organizations with high security requirements
  – Highly regulated
  – Sensitive data sets
  – People with experience with security in existing enterprise technologies (e.g., databases) are asking for the same in Hadoop
• Think for a moment and imagine the value of the data in a Hadoop cluster used as a data lake
  – Much valuable operational data about your customers, systems, sales, etc.

Page 6: Securing Hadoop by Sr. Principal Technologist Keys Botzum

Typical Hadoop Deployment Weaknesses
• Client operating system is trusted to identify user (weak authentication)
  – If I can compromise client, I can run jobs or access HDFS as anyone
  – Think about virtual machines with root access
• Hadoop servers trust anyone that can reach them on the network
  – Could I falsify a data node, job tracker, etc.?
• Hive Server runs as ‘system’ user
  – All Hive Server submitted jobs run as that ‘system’ user
• Intruders can see and modify all network traffic

Page 7: Securing Hadoop by Sr. Principal Technologist Keys Botzum

Agenda
• What’s MapR
• Why Secure Hadoop
• Securing MapR Hadoop
• Security beyond the core

Page 8: Securing Hadoop by Sr. Principal Technologist Keys Botzum

MapR 3.1: Securing MapR Hadoop
• Core goals
  – Leverage Apache Hadoop security functionality
  – Authenticate network traffic
    • Users authenticate
    • Servers authenticate to each other
    • Support but do not require Kerberos
  – Encrypt network traffic
  – Authorization
    • Integrate with existing authorization functionality
    • Enhance MapR Tables authorization with fine grained controls
  – Low barrier to entry
    • Low performance overhead
    • Simple and easy to administer
    • Support, but do not require Kerberos

Page 9: Securing Hadoop by Sr. Principal Technologist Keys Botzum

MapR Native Security
• Hadoop security without Kerberos
  – But borrows heavily from Kerberos design
• Password authentication is natively supported
• Kerberos integration if desired

Page 10: Securing Hadoop by Sr. Principal Technologist Keys Botzum

Architecture
• Shared secrets like Kerberos
  – Managed at cluster level
  – Two shared keys: cldb key and server key
• Identity represented using a ticket which is issued by MapR CLDB servers (Container Location DataBase)

Page 11: Securing Hadoop by Sr. Principal Technologist Keys Botzum

Tickets
• A ticket represents a valid authenticated identity
• Contains
  – An expiration time, renewal lifetime, and creation time
  – A randomly generated secret key
  – Information about the identity – userid, group ids
• Signed and encrypted when issued by CLDB
  – CLDB key used for ‘permanent’ server tickets
  – Server key used for ephemeral tickets issued for users
• A client authenticates to trusted servers using the ticket
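The ticket contents listed above can be pictured as a small record. The following is a minimal sketch, not MapR's actual ticket format: the class name, field names, and the reading of "renewal lifetime" as a window measured from creation time are all assumptions made for illustration, and the signing and encryption step performed by the CLDB is left out.

```python
import secrets
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Ticket:
    """Illustrative stand-in for a ticket; all field names are hypothetical."""
    userid: int
    group_ids: tuple
    creation_time: float
    expiration_time: float
    renewal_lifetime: float  # assumed: seconds after creation in which renewal is allowed
    # Randomly generated secret key (32 bytes), drawn from the OS CSPRNG.
    secret_key: bytes = field(
        default_factory=lambda: secrets.token_bytes(32), repr=False
    )

    def is_expired(self, now: float) -> bool:
        return now >= self.expiration_time

    def is_renewable(self, now: float) -> bool:
        return now < self.creation_time + self.renewal_lifetime


def issue_ticket(userid, group_ids, lifetime_s, renewal_s, now=None):
    """Build an unsigned, unencrypted ticket record (hypothetical helper)."""
    now = time.time() if now is None else now
    return Ticket(userid, tuple(group_ids), now, now + lifetime_s, renewal_s)
```

For example, `issue_ticket(1001, [1001, 2000], 3600, 86400)` would describe a one-hour ticket for uid 1001 that could be renewed for a day under the assumptions above.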

Page 12: Securing Hadoop by Sr. Principal Technologist Keys Botzum

User Experience
• User invokes maprlogin
  – maprlogin connects to CLDB (over https)
    • Provide userid & password (or Kerberos ticket) for validation by CLDB
  – Ticket is returned, saved in a file in /tmp, and accessible only by the owning user – file name is /tmp/maprticket_<uid>
• All processes automatically pick up ticket (nothing to do)
  – Java and C/C++ clients implicitly look for valid ticket and use it
  – Clients optionally use existing Kerberos identity to get MapR ticket
• MapR PAM module
  – Optional MapR provided PAM module creates MapR tickets automatically during Unix login
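How a client might locate the ticket file can be sketched in a few lines. This is an illustration only, not the MapR client code: the function names are hypothetical, and the assumption that the MAPR_TICKET_LOCATION environment variable (mentioned later in this deck for tasks) takes precedence over the /tmp/maprticket_<uid> default is ours.

```python
import os
import stat


def ticket_path(uid, env=None):
    """Return where a client would look for the ticket file (hypothetical helper)."""
    env = os.environ if env is None else env
    override = env.get("MAPR_TICKET_LOCATION")  # assumed to win over the default
    if override:
        return override
    return f"/tmp/maprticket_{uid}"


def is_private_to_owner(path, uid):
    """True if the file is owned by uid and grants nothing to group or others."""
    st = os.stat(path)
    no_group_or_other = (st.st_mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
    return st.st_uid == uid and no_group_or_other
```

A client could call `ticket_path(os.getuid())` and refuse to use the file unless `is_private_to_owner` holds, which mirrors the "accessible only by the owning user" property described above.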

Page 13: Securing Hadoop by Sr. Principal Technologist Keys Botzum

Maprlogin
• Primary user visible security tool
• Actions are
  – password - authenticate to a MapR cluster using a valid password
  – kerberos - authenticate to a MapR cluster using Kerberos
  – print - print information on your existing credentials
  – authtest - test authentication as a generic client
  – end / logout - logout of cluster
  – renew - renew existing ticket

Page 14: Securing Hadoop by Sr. Principal Technologist Keys Botzum

Registry Neutral – Use the Operating System Security
• User information is obtained using PAM and Linux pwent APIs
  – Fully pluggable – any registry works
  – Passwords are verified using PAM
  – User information obtained via Unix getpw* APIs
    • nsswitch controlled
  – Basically, if it works with Linux authentication, it should work with MapR
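The getpw* lookup described above can be tried from Python, whose standard `pwd` module wraps the same C library calls and therefore follows whatever nsswitch is configured to use (files, LDAP, NIS, and so on). This is a sketch of the idea only; the function name and the returned dictionary shape are hypothetical, and password verification through PAM is not shown because the Python standard library has no PAM binding.

```python
import os
import pwd


def lookup_user(username):
    """Resolve a user through the OS account database (getpwnam under the hood)."""
    try:
        entry = pwd.getpwnam(username)
    except KeyError:
        return None  # unknown to every configured name service
    # getgrouplist returns the primary group plus all supplementary groups.
    gids = os.getgrouplist(entry.pw_name, entry.pw_gid)
    return {"userid": entry.pw_uid, "group_ids": sorted(set(gids))}
```

The result carries the same identity information a ticket holds (userid and group ids), obtained without the caller knowing which registry answered.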

Page 15: Securing Hadoop by Sr. Principal Technologist Keys Botzum

CLI Example

$ hadoop fs -ls /
Bad connection to FS. command aborted. exception: failure to login: Unable to obtain MapR credentials

$ maprlogin password
[Password for user 'fred' at cluster 'my.cluster.com': ]
MapR credentials of user 'fred' for cluster 'my.cluster.com' are written to '/tmp/maprticket_1001'

$ hadoop fs -ls /
Found 3 items
-rwxr-xr-x 3 mapr mapr 0 2013-12-10 13:25 /hbase
drwxr-xr-x - mapr mapr 1 2013-12-10 13:25 /user
drwxr-xr-x - mapr mapr 1 2013-12-10 13:25 /var

Page 16: Securing Hadoop by Sr. Principal Technologist Keys Botzum

Maprlogin – Under the Covers

[Diagram: maprlogin, MapR CLDB (backed by LDAP/Kerberos/NIS), the ticket file in /tmp, the hadoop fs -ls / command, and FileServer/CLDB.]

1. username/passwd sent on https (maprlogin to MapR CLDB)
2. uses PAM to authenticate (LDAP/Kerberos/NIS)
3. ticket + user key returned
4. ticket + key saved in file in /tmp
5. cmd (hadoop fs -ls /) picks up ticket + key from file
6. client sends RPC encrypted with user-key + ticket (to FileServer/CLDB)
7. server decrypts ticket to authenticate user and checks permissions on ACL

Page 17: Securing Hadoop by Sr. Principal Technologist Keys Botzum

Client First Contact
• Client sends the ticket and data encrypted using secret key
• Receiving server
  – Extracts and decrypts ticket to obtain secret key
  – Checks expiration
  – Uses the secret key to decrypt the data
    • This proves that the client possesses the key that corresponds to the ticket
  – Extracts identity information from ticket and uses that for authorization
  – Returns encrypted response to client
• MapR user identity is independent of host or operating system identity
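The server-side steps above can be walked through in code. This is a toy sketch under loud assumptions: the real system is described as using AES-256 in GCM mode, which the Python standard library does not provide, so `seal`/`unseal` below substitute a SHA-256 keystream with an HMAC tag purely to keep the example self-contained. They are not secure and are not MapR's wire format. The JSON ticket layout and all function names are hypothetical.

```python
import hashlib
import hmac
import json
import secrets


def _keystream(key, nonce, n):
    """Toy keystream: SHA-256 over key, nonce, and a running counter."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


def seal(key, plaintext):
    """Toy encrypt-then-MAC; a stand-in for AES-256-GCM, not for real use."""
    nonce = secrets.token_bytes(16)
    ct = bytes(a ^ b for a, b in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag


def unseal(key, blob):
    """Verify the tag, then decrypt; raises ValueError on any mismatch."""
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, nonce, len(ct))))


def handle_first_contact(server_key, sealed_ticket, sealed_request, now):
    """Server side of the exchange: ticket, expiration, data, identity, response."""
    ticket = json.loads(unseal(server_key, sealed_ticket))  # extract and decrypt ticket
    if now >= ticket["expiration_time"]:                     # check expiration
        raise PermissionError("ticket expired")
    secret_key = bytes.fromhex(ticket["secret_key"])
    request = unseal(secret_key, sealed_request)             # succeeds only if client holds the key
    identity = {"userid": ticket["userid"], "group_ids": ticket["group_ids"]}
    response = seal(secret_key, b"ok:" + request)            # encrypted response to client
    return identity, response
```

Note that the server never consults the client's host or operating system identity: everything it needs comes out of the ticket, which only a holder of the server key could have produced.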

Page 18: Securing Hadoop by Sr. Principal Technologist Keys Botzum

Server First Contact
• When a trusted server starts it uses a local server ticket to authenticate to the CLDB
  – CLDB verifies the ticket’s authenticity using secret key
  – CLDB returns the server key that is used to create and validate user tickets
  – The server is now a trusted member of the cluster

Page 19: Securing Hadoop by Sr. Principal Technologist Keys Botzum

Component Security
• Security between MapR unique components (CLDB, file server, etc.) is handled via changes to the MapR RPC layer
• Apache components support pluggable security mechanisms – typically SASL
  – We are providing a new mechanism called ‘maprsasl’
  – maprsasl secures communication following the same techniques as the MapR RPC layer
• Existing authorization code simply leverages the securely authenticated identity
  – File access
  – Job submission
  – Queue ACLs
  – And so on …

Page 20: Securing Hadoop by Sr. Principal Technologist Keys Botzum

Example: Job Tracker Integration

[Diagram: JobClient submits job to JobTracker (maprsasl); JobTracker schedules job on TaskTracker (maprsasl); all three interact with the File system.]

1. JC copies job conf securely to FS
2. JT creates user ticket
3. TT fetches ticket
4. TT launches job using ticket identity

JT can create user tickets. TT copies ticket to private job directory on local disk. taskcontroller copies it to user private local disk dir and tasks set MAPR_TICKET_LOCATION to that place.

Page 21: Securing Hadoop by Sr. Principal Technologist Keys Botzum

Out of the Box Defaults
• User experience
  – Users authenticate using maprlogin and passwords
  – User ‘mapr’ is admin as always
    • User must authenticate, however; OS identity is irrelevant
  – Operating system identity (on or off cluster) no longer relevant to MapR security
    • Obviously root user and ‘mapr’ user can read/write /opt/mapr
    • We’ve also tightened permissions for many directories under /opt/mapr
  – Web UIs require authentication
  – MapR CLIs require authentication
    • hadoop fs/mfs/jar/job/etc
    • maprcli
  – Any user can submit jobs, but can only admin their own jobs

Page 22: Securing Hadoop by Sr. Principal Technologist Keys Botzum

Out of the Box Defaults
• Cluster operations
  – All MapR servers authenticate to each other
    • Most communication paths encrypted
  – All nodes share common maprserverticket
    • Nodes can only join cluster if they have maprserverticket
  – Self-signed wildcard certificates created for HTTPS traffic
    • ssl_keystore contains certificate and private key, ssl_truststore contains certificate
  – We set JVM system property: javax.net.ssl.trustStore
    • Used by Web UIs, MCS, and maprlogin to CLDB
    • Uses hostname command to get DNS domain for cluster and put that into certificate

Page 23: Securing Hadoop by Sr. Principal Technologist Keys Botzum

Cryptography
• Encrypted using current NIST standards
  – AES-256 in GCM mode for encryption and signing
    • http://en.wikipedia.org/wiki/Galois/Counter_Mode
    • NIST standard - http://csrc.nist.gov/publications/fips/fips140-2/fips1402annexa.pdf
  – Leverage Intel hardware encryption where available, software otherwise
• Use the open source crypto++ library for our C++ cryptography – http://cryptopp.com
• Random number generation
  – Use secure random number generation as documented here: http://www.cryptopp.com/docs/ref/class_auto_seeded_random_pool.html#_details

Page 24: Securing Hadoop by Sr. Principal Technologist Keys Botzum

Let’s Build a Secure Cluster!

Node 1
apt-get install mapr….
configure.sh -C … -Z … -secure -genkeys
  – Generates all needed keys for MapR-RPC as well as for HTTPS

Node N
apt-get install mapr….
scp rootORmapr@node1:/opt/mapr/conf/{cldb.key,maprserverticket,ssl_keystore,ssl_truststore} /opt/mapr/conf
configure.sh -C … -Z … -secure

Clients
apt-get install mapr…
scp anyuser@nodeN:/opt/mapr/conf/ssl_truststore /opt/mapr/conf
configure.sh … -secure

Page 25: Securing Hadoop by Sr. Principal Technologist Keys Botzum

That Was Easy – What Happened?
• Secret keys generated for encryption of RPC traffic
• Certificates generated for HTTPS traffic
• Configuration updated so that core components authenticate traffic
  – Most traffic also encrypted
  – Bulk data transfer encryption optional on per file/directory basis
• Web UIs switched to HTTPS
• Web UIs require password authentication – JT/TT/MCS
• Hadoop Java code switched to maprsasl
• Authorization components now rely upon MapR identity

Page 26: Securing Hadoop by Sr. Principal Technologist Keys Botzum

Encrypted Shuffle (?)
• No need to special case encrypting shuffle
• MapR-FS is store for Map output
  – Shuffle inherits the same encryption, authentication, and authorization functionality of the rest of MapR-FS

Page 27: Securing Hadoop by Sr. Principal Technologist Keys Botzum

Kerberos
• Not required but can use
• Kerberos SSO
  – Explicitly using ‘maprlogin kerberos’
  – Implicitly
    • If no MapR ticket is available, client automatically detects a Kerberos ticket and uses it to obtain a MapR ticket
• Kerberos SSO requires only
  – Kerberos client on CLDB and client machines
  – Kerberos identity only for CLDB – typically 3-5 CLDBs
    • No need to manage identities for every node
• SPNEGO/Kerberos for Web UIs requires creating host Kerberos identities

Page 28: Securing Hadoop by Sr. Principal Technologist Keys Botzum

Agenda
• What’s MapR
• Why Secure Hadoop
• Securing MapR Hadoop
• Security beyond the core

Page 29: Securing Hadoop by Sr. Principal Technologist Keys Botzum

Hadoop Map Reduce Clients
• Many components simply generate Map Reduce jobs. As such they implicitly leverage the security we’ve defined for Map Reduce previously. They are:
  – Hive (except Hive Server)
  – Pig
  – Mahout
  – Sqoop

Page 30: Securing Hadoop by Sr. Principal Technologist Keys Botzum

Ecosystem Security
• All ecosystem components run securely as well in a secure MapR cluster
  – Some by default
  – Some with minor configuration
• Most Web UIs enhanced to use userid & password authentication and HTTPS
  – Can configure Kerberos SPNEGO, same as from Apache

Page 31: Securing Hadoop by Sr. Principal Technologist Keys Botzum

MapR Ecosystem Security – by Default
• By default, out of the box when security enabled
  – Hive Server 2 supports password authentication
    • Can configure Kerberos and SSL function, same as from Apache, including secure impersonation
  – Oozie supports MapR ticket authentication
    • Can configure Kerberos and SSL function, same as from Apache, including secure impersonation
• MapR Tables (HBase APIs) use native MapR security, no configuration needed
• HBase and Hive MetaServer require Kerberos to be secured

Page 32: Securing Hadoop by Sr. Principal Technologist Keys Botzum

MapR Tables Authorization
• Boolean logic constraints on access to M7 tables
  – Uses user & group information
  – Very powerful
    • ( u:bob | g:admins)
    • ( g:managers & ! g:restricted)
    • ( g:managers & g:businessunity) | g:executives
  – Settable at table, column, and column family level for various actions
  – Queries silently hide data you are not authorized to see
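To make the boolean expressions above concrete, here is a small evaluator for them. It is a sketch of the idea, not MapR's parser: the function names are hypothetical, and the operator precedence (`!` binds tightest, then `&`, then `|`) is an assumption, since the slide does not state one.

```python
import re

# One token: a parenthesis, an operator, or a u:/g: term (space after ':' tolerated).
_TOKEN = re.compile(r"\s*([()&|!]|[ug]:\s*[A-Za-z0-9_.-]+)")


def tokenize(expr):
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        m = _TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(re.sub(r"\s+", "", m.group(1)))
        pos = m.end()
    return tokens


def evaluate(expr, user, groups):
    """Return True if `user` with membership in `groups` satisfies `expr`."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # always parse the right side before combining
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        return parse_primary()

    def parse_primary():
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if tok.startswith("u:"):
            return tok[2:] == user
        if tok.startswith("g:"):
            return tok[2:] in groups
        raise ValueError(f"unexpected token {tok!r}")

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result
```

With this sketch, `evaluate("( g:managers & ! g:restricted)", "carol", {"managers"})` grants access, while adding `"restricted"` to the group set denies it.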

Page 33: Securing Hadoop by Sr. Principal Technologist Keys Botzum

MapR Hadoop Advantage
• Vastly simpler
  – Core secured by default in one step
  – No requirement for Kerberos in core and associated complexity
• Easier integration
  – Leverage existing Linux authentication (PAM and NSSwitch)
• Faster
  – Leverage Intel AES hardware cryptography

Page 34: Securing Hadoop by Sr. Principal Technologist Keys Botzum

Further Reading
• MapR
  – http://mapr.com
• MapR Native Security
  – http://www.mapr.com/blog/getting-started-mapr-security-0
  – http://www.mapr.com/press-release/mapr-technologies-integrates-security-into-hadoop
  – http://www.mapr.com/products/only-with-mapr/mapr-integrates-security-into-hadoop
• Adding Security to Apache Hadoop
  – http://hortonworks.com/wp-content/uploads/2011/10/security-design_withCover-1.pdf
• The Evolution of Hadoop’s Security Model
  – http://www.infoq.com/articles/HadoopSecurityModel/

Page 35: Securing Hadoop by Sr. Principal Technologist Keys Botzum


Q & A

@mapr maprtech


Engage with us!

MapR

maprtech

mapr-technologies