1
Governance, Risk Management and Compliance in Hadoop
Mark Donsky, Director of Product Management, Cloudera
2
Big Data Security Breaches
3
Definitions
Governance
• How do I ensure that data is sufficiently complete and accurate?
Compliance
• How do I ensure that data is accessed according to (legal) requirements, such as PCI, HIPAA and NIST?
Risk Management
• How do I identify risks that might adversely affect my ability to govern or comply?
4
The Rise of Governance, Risk Management and Compliance (GRC) in Hadoop
Lots of data landing in Hadoop
• Huge quantities
• Many different sources: structured & unstructured
• Varying levels of sensitivity
Many users working with the data in multiple ways
• Users: Compliance Officers, Analysts, Data Scientists, Business Users
• Tools: BI tools, ETL tools, Hue, and more
Need to effectively control & consume data
• Get visibility & control over the environment
• Discover, explore and consume data
5
GRC Requirements
Auditing and Access Management
• View, grant and revoke permissions across the Hadoop stack
• Identify access to a data asset around the time of a security breach
• Generate an alert when a restricted data asset is accessed
Lineage
• Given a data set, trace back to the original source
• Understand the downstream impact of purging/modifying a data set
Metadata Tagging and Discovery
• Search through metadata to find data sets of interest
• Given a data set, view schema, metadata and policies
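The alerting requirement can be illustrated with a small Python sketch. This is not how Cloudera Navigator implements alerts; it only assumes that audit events have already been parsed into dictionaries with `username`, `operation`, `src` and `allowed` fields (the field names match the HDFS audit log excerpt later in this deck). `RESTRICTED_PREFIXES`, `is_restricted` and `restricted_access_alerts` are hypothetical names, and the sample paths are made up.

```python
from typing import Iterable, Iterator

# Hypothetical restricted HDFS paths; a real deployment would load these from policy.
RESTRICTED_PREFIXES = ("/data/pii", "/data/phi")


def is_restricted(path: str, prefixes: Iterable[str] = RESTRICTED_PREFIXES) -> bool:
    """Return True if `path` is a restricted prefix or lies underneath one."""
    return any(path == p or path.startswith(p.rstrip("/") + "/") for p in prefixes)


def restricted_access_alerts(events: Iterable[dict]) -> Iterator[str]:
    """Yield one alert message per audit event whose `src` is restricted."""
    for event in events:
        src = event.get("src") or ""
        if is_restricted(src):
            # Alert on attempts as well as successes: a denied request is still of interest.
            outcome = "allowed" if event.get("allowed") else "denied"
            yield f"ALERT: {event.get('username')} {event.get('operation')} {src} ({outcome})"


if __name__ == "__main__":
    # Hypothetical audit events shaped like HDFS audit entries.
    sample = [
        {"username": "training", "operation": "open", "src": "/data/pii/customers.csv", "allowed": True},
        {"username": "training", "operation": "getfileinfo", "src": "/user", "allowed": True},
        {"username": "mallory", "operation": "open", "src": "/data/phi/trial.csv", "allowed": False},
    ]
    for alert in restricted_access_alerts(sample):
        print(alert)
```

Matching on a path prefix plus `/` avoids flagging look-alike paths such as `/data/piix`.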
6
Auditing
7
Why is auditing important?
• Who accessed a particular file or table?
• Who was denied access to a particular file or table?
• Who ran queries on a particular table?
• What did someone try to do during a security breach?
8
Hadoop Audit Logs
Component | Location (CDH)
HDFS audit logs | /var/log/hadoop-hdfs/audit
Hive audit logs | /var/log/hive/audit
Impala audit logs | /var/log/impalad/audit
HBase audit logs | /var/log/hbase/audit
9
HDFS and Hive Audit Logs
• Logs all file system access requests
• Impala, HBase and other components use a similar format
• Implemented in log4j at the INFO level
HDFS property: log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit
HDFS Audit Log:
{ "allowed": true, "serviceName": "HDFS-1", "username": "training", "src": "/user", "eventTime": 1398544478141, "ipAddress": "10.20.187.39", "operation": "getfileinfo", "dest": null, "permissions": null, "impersonator": null, "delegationTokenId": null }
{ "allowed": false, "serviceName": "HDFS-1", "username": "training", "src": "/user/test", "eventTime": 1398544478187, "ipAddress": "10.20.187.39", "operation": "mkdirs", "dest": null, "permissions": null, "impersonator": null, "delegationTokenId": null }
Hive Audit Log:
{ "serviceName": "HIVE-1", "username": "admin", "impersonator": null, "ipAddress": "10.20.187.39", "operation": "QUERY", "eventTime": 1398402718797, "operationText": "select count(*) from salesdata", "allowed": true, "databaseName": "default", "tableName": "salesdata", "resourcePath": "/user/hive/warehouse/salesdata", "objectType": "TABLE" }
{ "serviceName": "HIVE-1", "username": "admin", "impersonator": null, "ipAddress": "10.20.187.39", "operation": "QUERY", "eventTime": 1398402762830, "operationText": "select s_zip, count(*) from salesdata group by s_zip", "allowed": true, "databaseName": "default", "tableName": "salesdata", "resourcePath": "/user/hive/warehouse/salesdata", "objectType": "TABLE" }
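Because each audit entry shown on this slide is a single JSON object, the auditing questions from slide 7 ("who accessed a file?", "who was denied?", "what happened around the time of a breach?") reduce to filtering parsed records. The Python sketch below assumes one JSON object per line; actual log files may frame entries differently (for example, with a log4j timestamp prefix), so treat the parsing step as an assumption. The helper names are hypothetical. `eventTime` is read as epoch milliseconds, which is consistent with the values shown.

```python
import json
from typing import Iterable, List, Set


def parse_audit_lines(lines: Iterable[str]) -> List[dict]:
    """Parse audit entries, assuming exactly one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def who_accessed(events: Iterable[dict], path: str) -> Set[str]:
    """Users whose requests on `path` were allowed."""
    return {e["username"] for e in events if e.get("src") == path and e.get("allowed")}


def who_was_denied(events: Iterable[dict], path: str) -> Set[str]:
    """Users whose requests on `path` were denied."""
    return {e["username"] for e in events if e.get("src") == path and e.get("allowed") is False}


def events_in_window(events: Iterable[dict], start_ms: int, end_ms: int) -> List[dict]:
    """Events whose eventTime (epoch milliseconds) lies in [start_ms, end_ms], oldest first."""
    hits = [e for e in events if start_ms <= e.get("eventTime", -1) <= end_ms]
    return sorted(hits, key=lambda e: e["eventTime"])


if __name__ == "__main__":
    # The two HDFS entries from the slide, trimmed to the fields used here.
    log = [
        '{"allowed": true, "username": "training", "src": "/user", "operation": "getfileinfo", "eventTime": 1398544478141}',
        '{"allowed": false, "username": "training", "src": "/user/test", "operation": "mkdirs", "eventTime": 1398544478187}',
    ]
    events = parse_audit_lines(log)
    print(who_accessed(events, "/user"))         # {'training'}
    print(who_was_denied(events, "/user/test"))  # {'training'}
    for e in events_in_window(events, 1398544478000, 1398544479000):
        print(e["eventTime"], e["username"], e["operation"], e["src"])
```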
10
Hue Job Status
11
Auditing Summary
• Hadoop components maintain a complete audit log, but they are:
  • Difficult to parse
  • Stored in different locations
  • Limited to chronological organization
  • Difficult to integrate with enterprise infrastructure
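One way to work around these limitations is to map each component's events onto a common record and index them by resource instead of by time. The Python sketch below uses a hypothetical normalization of our own, not a Cloudera format. It assumes HDFS events carry the path in `src` and Hive events carry it in `resourcePath`, as in the audit log excerpts on slide 9. `AuditRecord`, `normalize` and `index_by_resource` are illustrative names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class AuditRecord:
    """Hypothetical component-neutral audit record."""
    service: str
    user: str
    resource: str
    operation: str
    allowed: bool
    time_ms: int


def normalize(event: dict) -> AuditRecord:
    """Map an HDFS-style or Hive-style audit event onto AuditRecord."""
    # Hive events name the table location in resourcePath; HDFS events use src.
    resource = event.get("resourcePath") or event.get("src") or ""
    return AuditRecord(
        service=event.get("serviceName", "unknown"),
        user=event.get("username", "unknown"),
        resource=resource,
        operation=event.get("operation", ""),
        allowed=bool(event.get("allowed")),
        time_ms=int(event.get("eventTime", 0)),
    )


def index_by_resource(events: Iterable[dict]) -> Dict[str, List[AuditRecord]]:
    """Group normalized records by resource; each group is sorted by time."""
    groups: Dict[str, List[AuditRecord]] = defaultdict(list)
    for event in events:
        record = normalize(event)
        groups[record.resource].append(record)
    for records in groups.values():
        records.sort(key=lambda r: r.time_ms)
    return dict(groups)


if __name__ == "__main__":
    # Events trimmed from the HDFS and Hive audit log excerpts (deliberately out of order).
    events = [
        {"serviceName": "HIVE-1", "username": "admin", "operation": "QUERY", "allowed": True,
         "eventTime": 1398402762830, "resourcePath": "/user/hive/warehouse/salesdata"},
        {"serviceName": "HDFS-1", "username": "training", "operation": "mkdirs", "allowed": False,
         "eventTime": 1398544478187, "src": "/user/test"},
        {"serviceName": "HIVE-1", "username": "admin", "operation": "QUERY", "allowed": True,
         "eventTime": 1398402718797, "resourcePath": "/user/hive/warehouse/salesdata"},
    ]
    for resource, records in index_by_resource(events).items():
        print(resource, [(r.service, r.user, r.operation, r.allowed) for r in records])
```

Indexing by resource answers "who touched this table, through which engine" directly. A purely chronological log would have to be rescanned for each such question.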
12
Metadata and Discovery
13
Why is metadata important?
Technical Metadata
• Describes the information required to access the data, such as where the data resides or the structure of the data in its native environment
• Allows you to draw relations between disparate data sets like “emp_sal”, “salary”, “sal”
Business Metadata
• Details business-related information about the data, such as keywords related to the meta object or notes about the meta object
• Allows you to annotate data for your users and retrieve data based on business context (e.g., all data related to a clinical trial)
14
What kind of technical metadata is available?
Hive
• Query Text
• Table name
• Column name
• Data Type
• Owner
• Partitions
Pig
• Script name
• Owner
• Creation date
• Last modified date
HDFS
• Permissions
• Owner
• Group
• Creation date
• Last modified date
MapReduce, YARN
• Job ID
• Mapper Class
• Reducer Class
• Inputs
• Outputs
15
Where is technical metadata located?
Component | Metadata
HDFS | fsimage (ls -lRa /)
Hive | Hive Metastore Server (database metadata tables)
MapReduce | JobTracker
YARN | Job History Server
Oozie | Oozie Server
Pig | JobTracker, Job History Server
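The `ls -lRa /` entry refers to a recursive listing of the namespace. On HDFS, the closest shell equivalent is `hdfs dfs -ls -R /`. The Python sketch below extracts permissions, owner, group, size and modification time from one line of such a listing. It assumes the usual column order (permissions, replication or `-` for directories, owner, group, size, date, time, path); check that assumption against your Hadoop version. `HdfsEntry`, `parse_ls_line` and the sample lines are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class HdfsEntry:
    """Technical metadata for one HDFS file or directory."""
    permissions: str
    is_dir: bool
    replication: Optional[int]  # None for directories, which show "-"
    owner: str
    group: str
    size: int
    modified: datetime
    path: str


def parse_ls_line(line: str) -> HdfsEntry:
    """Parse one line of `hdfs dfs -ls -R` output (assumed column layout)."""
    # maxsplit=7 keeps a path that contains spaces in one piece.
    perms, repl, owner, group, size, date, clock, path = line.split(maxsplit=7)
    return HdfsEntry(
        permissions=perms,
        is_dir=perms.startswith("d"),
        replication=None if repl == "-" else int(repl),
        owner=owner,
        group=group,
        size=int(size),
        modified=datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M"),
        path=path,
    )


if __name__ == "__main__":
    # Hypothetical listing lines; owners, sizes and dates are made up.
    listing = [
        "drwxr-xr-x   - training supergroup          0 2014-04-26 12:34 /user/test",
        "-rw-r--r--   3 training supergroup       1366 2014-04-26 12:35 /user/test/sales.csv",
    ]
    for line in listing:
        e = parse_ls_line(line)
        print(e.path, e.permissions, e.owner, e.group, e.modified.isoformat())
```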
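16
Technical metadata: Hive Metastore
• The Hive Metastore stores the schema and location of Hive tables in a relational database, and that metadata can be queried with SQL-like commands (e.g., DESCRIBE, SHOW TABLES)
• Restricted to Hive tables, not structured HDFS files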
17
Technical metadata: HCatalog and WebHCat
• HCatalog extends the Hive metastore with non-Hive structured data stored in HDFS
• Abstracts the file location and storage format
• Makes formats available to Pig, Hive, MapReduce, etc.
• WebHCat is a RESTful interface to HCatalog
# hcat -e "describe salesdata"
s_num           float     None
s_borough       int       None
s_neighbor      string    None
s_b_class       string    None
s_c_p           string    None
s_block         string    None
s_lot           string    None
s_easement      string    None
w_c_p_2         string    None
s_address       string    None
s_app_num       string    None
s_zip           string    None
s_res_units     string    None
s_com_units     string    None
s_tot_units     int       None
s_sq_v          float     None
s_g_sq_v        float     None
s_yr_built      int       None
s_tax_c         int       None
s_b_class2      string    None
s_price         float     None
s_sales_dt      string    None
Time taken: 1.847 seconds
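The `describe` output above is whitespace-separated text, so turning it into structured technical metadata mostly comes down to splitting each line. The Python sketch below assumes each column row has a name, a type and an optional comment, as in the output shown. It skips blank lines, the `Time taken:` footer and any single-word status lines. `parse_describe` is a hypothetical helper, not part of HCatalog.

```python
from typing import Dict, List


def parse_describe(output: str) -> List[Dict[str, str]]:
    """Turn `describe` output into [{'name': ..., 'type': ..., 'comment': ...}, ...]."""
    columns = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the timing footer printed by the CLI.
        if not line or line.startswith("Time taken:"):
            continue
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # not a column row (e.g. a bare status word)
        comment = parts[2] if len(parts) == 3 else ""
        columns.append({"name": parts[0], "type": parts[1], "comment": comment})
    return columns


if __name__ == "__main__":
    sample = """s_num           float     None
s_borough       int       None
s_zip           string    None
Time taken: 1.847 seconds"""
    print([(c["name"], c["type"]) for c in parse_describe(sample)])
    # [('s_num', 'float'), ('s_borough', 'int'), ('s_zip', 'string')]
```

Nested types that contain spaces (for example, some `struct<...>` declarations) would need a real type parser. Plain split-on-whitespace is only enough for flat schemas like `salesdata`.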
18
What about Business Metadata?
• Business metadata = custom tags + key-value pairs
• Business-focused terms that make sense to end users and data custodians
• May adhere to standards such as CDISC
• Required for regulatory compliance (e.g., Basel II, SOX)
• Solves the following problems:
  • Show me everything related to clinical trial X
  • Gather all recorded customer calls about denied credit
  • Collect all credit information about customer Z
  • Provide consistent naming for similar columns (e.g., emp_salary, salary, sal → salary)
• Must integrate with existing business metadata stored in products from Informatica, Data Advantage Group, etc.
• Hadoop does not provide business metadata
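Business metadata is defined above as custom tags plus key-value pairs. A small in-memory Python model is enough to show how that answers a request like "show me everything related to clinical trial X", and how a synonym map gives similar columns one consistent name. This is an illustrative sketch only. The data set paths, tag names, the `trial` and `phi` keys and the synonym table are hypothetical, and this is not the data model of Cloudera Navigator or any other product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class DataSet:
    """A data set annotated with business metadata: free-form tags plus key-value pairs."""
    path: str
    tags: Set[str] = field(default_factory=set)
    properties: Dict[str, str] = field(default_factory=dict)


# Hypothetical synonym table giving similar physical columns one business name.
COLUMN_SYNONYMS = {"emp_salary": "salary", "salary": "salary", "sal": "salary"}


def business_name(column: str) -> str:
    """Return the agreed business term for a column, or the column name unchanged."""
    return COLUMN_SYNONYMS.get(column.lower(), column)


def find(catalog: List[DataSet], tag: Optional[str] = None, **props: str) -> List[str]:
    """Paths of data sets that carry `tag` (if given) and every requested key-value pair."""
    hits = []
    for ds in catalog:
        if tag is not None and tag not in ds.tags:
            continue
        if all(ds.properties.get(k) == v for k, v in props.items()):
            hits.append(ds.path)
    return hits


if __name__ == "__main__":
    # Hypothetical catalog entries.
    catalog = [
        DataSet("/data/trials/x/visits", {"clinical-trial"}, {"trial": "X", "phi": "true"}),
        DataSet("/data/trials/y/visits", {"clinical-trial"}, {"trial": "Y", "phi": "true"}),
        DataSet("/data/calls/denied_credit", {"customer-calls"}, {"topic": "denied credit"}),
    ]
    print(find(catalog, trial="X"))             # ['/data/trials/x/visits']
    print(find(catalog, tag="clinical-trial"))  # ['/data/trials/x/visits', '/data/trials/y/visits']
    print([business_name(c) for c in ["emp_salary", "SAL", "s_zip"]])  # ['salary', 'salary', 's_zip']
```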
19
Business metadata examples
Pharmaceuticals/Healthcare
• Trial site ID
• Participant ID
• HIPAA metadata such as PHI flag
Financial Services
• Account number
• Sensitivity level
• Data origin
• PCI metadata such as PII flag
20
Lineage
21
Why is lineage important?
• Lineage:
  • Identifies the files, tables, columns, and transformations that have an impact on a selected table or column
• Answers the following questions:
  • Impact analysis: What happens if I delete a file, table, column, etc.?
  • Governance: What analyses were performed on sensitive data?
  • Data integrity: What data sources were used to generate a particular analysis?
• However:
  • Lineage is very complex to determine in Hadoop
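Once lineage edges are known, impact analysis and data-integrity questions become graph traversals: walk downstream to see what a deletion would affect, and walk upstream to find the original sources. Capturing the edges across Hadoop components is the difficult part. The Python sketch below assumes that step has already been done. `LineageGraph` and the node names (apart from `salesdata`) are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


class LineageGraph:
    """Directed graph in which an edge (a, b) means b was derived from a."""

    def __init__(self, edges: Iterable[Tuple[str, str]] = ()):
        self.children: Dict[str, Set[str]] = defaultdict(set)
        self.parents: Dict[str, Set[str]] = defaultdict(set)
        for src, dst in edges:
            self.add_edge(src, dst)

    def add_edge(self, src: str, dst: str) -> None:
        self.children[src].add(dst)
        self.parents[dst].add(src)

    @staticmethod
    def _reach(start: str, adjacency: Dict[str, Set[str]]) -> Set[str]:
        # Breadth-first search; the `seen` set also protects against cycles.
        seen: Set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adjacency.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def upstream(self, node: str) -> Set[str]:
        """Everything `node` was derived from (data integrity: which sources fed it?)."""
        return self._reach(node, self.parents)

    def downstream(self, node: str) -> Set[str]:
        """Everything derived from `node` (impact analysis before purging or modifying it)."""
        return self._reach(node, self.children)

    def original_sources(self, node: str) -> Set[str]:
        """Upstream nodes that have no parents of their own."""
        return {n for n in self.upstream(node) if not self.parents.get(n)}


if __name__ == "__main__":
    # Hypothetical lineage: raw files -> Hive table -> aggregate -> report.
    g = LineageGraph([
        ("/raw/sales.csv", "salesdata"),
        ("salesdata", "sales_by_zip"),
        ("/raw/zipcodes.csv", "sales_by_zip"),
        ("sales_by_zip", "weekly_report"),
    ])
    print(sorted(g.original_sources("weekly_report")))  # ['/raw/sales.csv', '/raw/zipcodes.csv']
    print(sorted(g.downstream("salesdata")))            # ['sales_by_zip', 'weekly_report']
```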
22
Cloudera Navigator
23
Cloudera Enterprise Data Hub
Key Attributes
1. Secure & Compliant
• Robust access controls
• Data encryption options
• Shared security policies
2. Enterprise Data Governance
• Metadata management
• Data lineage/tethering
• Audit histories
3. Unified & Manageable
• Common storage & resource management
• On-prem, cloud & managed service
• Highly available (including DR)
4. Open Architecture
• Open source platform
• APIs & engines for multiple workloads
• Extensible for 3rd parties
[Diagram: Enterprise Data Hub. Unified scale-out storage for any type of data (elastic, fault-tolerant, self-healing, in-memory capabilities); resource management; online NoSQL DBMS, analytic MPP DBMS, search engine, batch processing, stream processing, machine learning; SQL, streaming and file system (NFS) access; system management; data management; metadata, security, audit, lineage]
24
Cloudera Navigator
Auditing and Access Management
• View, grant and revoke permissions across the Hadoop stack
• Identify access to a data asset around the time of a security breach
• Generate an alert when a restricted data asset is accessed
Lineage
• Given a data set, trace back to the original source
• Understand the downstream impact of purging/modifying a data set
Metadata Tagging and Discovery
• Search through metadata to find data sets of interest
• Given a data set, view schema, metadata and policies
25
Demo