How to manage authorization rules on Hadoop cluster with Apache Ranger
Krzysztof Adamski
We deliver innovative IT services for the ING Group all over the world.
ING Services Polska
Building standard for new generation digital bank
Keywords: Social, Harmonisation, Digitalisation, Customer Call Centres, Webservices, In the Cloud, Virtual Bank, Software as a Service, Infrastructure as a Service, Seamless, Concept of ONE, No geographical boundaries, Exception Handling, APIs, My identity, Straight through processing, Customer experience, Personalisation, Automation, Standardisation, Agile, Self Service, Mobile First, Real Time, Security, 24/7, ‘Outside in and Inside out’, Omnichannel, Zero Touch, Customer journeys, Analytics, Big Data, Digitalised branches, Cloud Platform as a service, Data Centre
People matters
Employees: 554
Age groups: 20-30: 197; 31-40: 289; 41-50: 58; 50-70: 10
Average age at ISP: 33.26
Split (unlabelled in the source): 16.43% (91), 83.57% (463)
How secure is your cluster?
Ownership and permissions look fine…
That must have been a sophisticated hack…
3 x A or 4 as you wish
Hadoop authentication methods
Simple
Kerberos
Hortonworks Ring of Defense Architecture (diagram, source: hortonworks.com)
Components: Client, Apache Knox, Active Directory, KDC, Ranger, HiveServer2, HDFS (nodes labelled A, B, C)
Labelled steps:
• Original request with user id and password
• Knox gets service ticket for Hive
• Knox runs as proxy user using Hive ST
• Use Hive ST, submit query
• Hive gets NameNode (NN) service ticket
• Hive creates MapReduce job using NN ST
• Client gets query result
What is IPA?
redhat.com
AD Account mapping
redhat.com
SSSD integration
redhat.com
IPA for central UAM
• This works great for OS
• Can this be used by Hadoop?
• Can this be used by Ranger?
Installation through Ambari
hortonworks.com
HDP 2.3.4
Watch for ranger.usersync.source.impl.class property
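The property above selects where usersync reads users and groups from, so it is worth checking after an Ambari install. A minimal sketch of such a check, assuming the usersync settings are available as a Hadoop-style XML file: the function name `usersync_source`, the sample XML, and the two class names are illustrative assumptions (the class names are believed to be the Unix and LDAP builders in Ranger 0.5; verify them against your version).

```python
import xml.etree.ElementTree as ET

PROPERTY = "ranger.usersync.source.impl.class"

# Assumed class names; verify against the Ranger version shipped with your HDP.
KNOWN_SOURCES = {
    "org.apache.ranger.unixusersync.process.UnixUserGroupBuilder": "unix",
    "org.apache.ranger.ldapusersync.process.LdapUserGroupBuilder": "ldap",
}


def usersync_source(xml_text):
    """Return (class_name, short_label) for the configured usersync source."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == PROPERTY:
            value = prop.findtext("value", "").strip()
            return value, KNOWN_SOURCES.get(value, "unknown")
    # Property not present in the file at all.
    return None, "missing"


# Hypothetical config fragment used only for this example.
SAMPLE = """<configuration>
  <property>
    <name>ranger.usersync.source.impl.class</name>
    <value>org.apache.ranger.unixusersync.process.UnixUserGroupBuilder</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    cls, label = usersync_source(SAMPLE)
    print(label, cls)
```

If the label comes back as "unix" on a cluster that is meant to sync from LDAP or AD, the property was probably left at its default.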
Enable Ranger for HDFS
hortonworks.com
Ranger audit
• It is recommended that you store audits in Solr and HDFS, and disable Audit to DB.
• Otherwise you can expect performance issues
• Audit is stored in a single table
• No partitions
• No data retention
IPA as a central UAM
• This works great for OS
• Can this be used by Hadoop? Works great for PA in IPA
• Can this be used by Ranger? Not yet. You still need to bind to LDAP.
Ranger KMS
One big advantage of encryption in HDFS is that even privileged users, such as the “hdfs” superuser, can be blocked from viewing encrypted data.
Caveats
• Ranger (the same goes for Sentry) feels like slapped-on security
• User synchronization can be very slow with many users due to architecture issues
• Doesn’t manage HDFS ACLs and requires Hive user access… defeating end-to-end security
• Vulnerability scans just kill Ranger ;)
mysql> select count(*) from x_user;
+----------+
| count(*) |
+----------+
|       99 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from x_group;
+----------+
| count(*) |
+----------+
|       45 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from x_group_users;
+----------+
| count(*) |
+----------+
|   645697 |
+----------+
1 row in set (0.13 sec)

mysql> select sum(user_id) from (select count(distinct user_id) user_id from x_group_users group by p_group_id) temp;
+--------------+
| sum(user_id) |
+--------------+
|          603 |
+--------------+
1 row in set (1.21 sec)

mysql> delete from x_group_users where id not in (
         select minid from (
           select min(id) as minid from x_group_users group by p_group_id, user_id
         ) as temp
       );
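The cleanup above keeps one row per (p_group_id, user_id) pair and deletes the rest. A minimal sketch of the same idea, run against an in-memory SQLite database so it can be tried safely: the three-column table is a simplified stand-in for Ranger's real x_group_users table, and the function names and sample rows are made up for illustration. SQLite does not need the extra derived-table wrapper that MySQL requires when a DELETE reads from the table it modifies.

```python
import sqlite3


def dedupe_group_users(conn):
    """Delete duplicate (p_group_id, user_id) rows, keeping the lowest id.

    Returns the row counts before and after the cleanup.
    """
    before = conn.execute("SELECT COUNT(*) FROM x_group_users").fetchone()[0]
    conn.execute(
        """
        DELETE FROM x_group_users
        WHERE id NOT IN (
            SELECT MIN(id) FROM x_group_users GROUP BY p_group_id, user_id
        )
        """
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM x_group_users").fetchone()[0]
    return before, after


def demo():
    # Simplified, assumed schema: the real table has more columns.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE x_group_users ("
        "id INTEGER PRIMARY KEY, p_group_id INTEGER, user_id INTEGER)"
    )
    rows = [(1, 10), (1, 10), (1, 10), (1, 11), (2, 10), (2, 10)]
    conn.executemany(
        "INSERT INTO x_group_users (p_group_id, user_id) VALUES (?, ?)", rows
    )
    return dedupe_group_users(conn)


if __name__ == "__main__":
    print(demo())
```

On a real Ranger database it would be prudent to take a backup and run the statement on a copy first.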
Make it better
• https://issues.apache.org/jira/browse/RANGER-827 usersync SSSD integration (sync explicitly specified groups)
• https://issues.apache.org/jira/browse/HADOOP-12751 allow users with domain suffix (avoid naming collision)
• https://issues.apache.org/jira/browse/HIVE-12981 the same for Hive
• https://issues.apache.org/jira/browse/RANGER-842 PAM integrated authentication for Ranger
Ambari integration with IPA
• https://github.com/HariSekhon/tools/blob/master/ambari_freeipa_kerberos_setup.pl
Other upcoming features (0.6)
• Tag based policies
• Geolocation based policies
• Deny and exclude policies
• Hive Metastore plugin
Some takeaway tips
• Install updates on a regular basis
• Isolate your cluster from the rest of the network
• Kerberize your cluster
• Secure the user interfaces
• dfs.namenode.acls.enabled
• fs.permissions.umask-mode
• Watch for superusers (hadoop.proxyuser settings)
• Change the OS default umask (watch for upgrades and config permissions)
• Make sure the Hive warehouse HDFS path is protected
• Implement Ranger
• Just don’t sync your whole AD with it ;)
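Several of the tips above map onto concrete Hadoop properties, so they can be checked mechanically. A minimal sketch, assuming the properties from core-site.xml and hdfs-site.xml are already loaded into a dict: the function name `audit_hadoop_config`, the warning texts, and the thresholds (for example, flagging any umask that leaves bits open for "other") are illustrative assumptions, not recommendations taken from the talk.

```python
def audit_hadoop_config(props):
    """Return a list of warnings for a dict of Hadoop property name -> value."""
    warnings = []

    # Missing property falls back to Hadoop's default of "simple".
    auth = props.get("hadoop.security.authentication", "simple")
    if auth.lower() != "kerberos":
        warnings.append("cluster is not kerberized")

    if props.get("dfs.namenode.acls.enabled", "false").lower() != "true":
        warnings.append("HDFS ACLs are disabled")

    # Assumption: only octal umasks are handled; symbolic ones are reported.
    umask = props.get("fs.permissions.umask-mode", "022")
    try:
        # Assumed policy: all three 'other' bits should be masked out.
        if (int(umask, 8) & 0o007) != 0o007:
            warnings.append("umask %s leaves access for 'other'" % umask)
    except ValueError:
        warnings.append("cannot parse umask %s" % umask)

    # Wildcard proxyuser entries let a service impersonate anyone, anywhere.
    for name, value in sorted(props.items()):
        if name.startswith("hadoop.proxyuser.") and value.strip() == "*":
            warnings.append("wildcard proxyuser setting: " + name)

    return warnings


if __name__ == "__main__":
    sample = {
        "hadoop.security.authentication": "kerberos",
        "dfs.namenode.acls.enabled": "true",
        "fs.permissions.umask-mode": "077",
        "hadoop.proxyuser.hive.hosts": "*",
    }
    for warning in audit_hadoop_config(sample):
        print(warning)
```

A check like this only covers configuration; network isolation, patching and UI security still need to be verified separately.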
@adamskikrzysiek
http://pl.linkedin.com/in/adamskikrzysztof
And yes. We are hiring