DESCRIPTION
Big Data analytics is estimated to save over $450B in healthcare costs, and adoption of big data platforms is growing among healthcare payers and providers. Hadoop and cloud computing have emerged as among the most promising technologies for running big data healthcare workloads at scale in production, delivered as Hadoop as a service. Common considerations in the healthcare industry include privacy, data security, and the challenges of regulatory compliance with HIPAA and HITECH. Intel is contributing to a common security framework for Apache Hadoop, Project Rhino, which enables enterprises to deploy big data analytics without compromising performance or security. Join this session to learn how your enterprise can take advantage of the security capabilities in the Intel Data Platform running on AWS to analyze healthcare data while maintaining technical safeguards that help you remain in compliance.
© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Secure Hadoop as a Service
Vin Sharma, Intel
March 26, 2014
Who needs Hadoop security?
Big Data Analytics in Health and Life Sciences
Now: disparate streams of data (genomics, clinical, claims & transactions, meds & labs, patient experience, personal data)
Next: integrated computing and data (clinical analysis, genomic analysis)
Result: better decisions and outcomes at reduced cost, and a shift from population-based to person-based treatment
Cost Savings via Big Data Analytics
Stakeholders: Provider, Patient, Payer, Producer, Regulator
• Personalized medicine
• Data-driven adherence
• Proven pathways of care
• Coordinated across providers
• Shift volume to the right setting
• Reducing ER (re)admission rates
• Provider/performance transparency & payment innovation
• Accelerated approval
• Accelerated discovery
(Chart: estimated savings of $180B, $100B, $100B, and $70B across these categories.)
Compliance Requirements
• HIPAA
  – Privacy Rule
  – Security Rule
    • Administrative Safeguards
    • Physical Safeguards
    • Technical Safeguards
• Others…
Technical Safeguards
Access Control: A covered entity must implement technical policies and procedures that allow only authorized persons to access electronic protected health information (e-PHI).
Audit Controls: A covered entity must implement hardware, software, and/or procedural mechanisms to record and examine access and other activity in information systems that contain or use e-PHI.
Integrity Controls: A covered entity must implement policies and procedures to ensure that e-PHI is not improperly altered or destroyed. Electronic measures must be put in place to confirm that e-PHI has not been improperly altered or destroyed.
Transmission Security: A covered entity must implement technical security measures that guard against unauthorized access to e-PHI that is being transmitted over an electronic network.
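As a plain illustration (not HIPAA guidance), the access-control and audit-control safeguards boil down to a guard that checks authorization before releasing a record and logs every attempt. All class and method names below are hypothetical.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: authorize access to an e-PHI record and audit every attempt.
public class PhiGuard {
    private final Set<String> authorizedUsers;
    private final List<String> auditLog = new ArrayList<>();

    public PhiGuard(Set<String> authorizedUsers) {
        this.authorizedUsers = authorizedUsers;
    }

    // Access control: only authorized persons may read the record.
    // Audit control: record who attempted access, on which record, with what outcome.
    public String readRecord(String user, String recordId, String recordBody) {
        boolean allowed = authorizedUsers.contains(user);
        auditLog.add(Instant.now() + " user=" + user + " record=" + recordId
                + " outcome=" + (allowed ? "ALLOWED" : "DENIED"));
        if (!allowed) {
            throw new SecurityException("User " + user + " is not authorized for " + recordId);
        }
        return recordBody;
    }

    public List<String> auditLog() { return auditLog; }
}
```

Real deployments enforce these checks in the platform (Kerberos, HBase ACLs, HDFS audit logs) rather than in application code; the sketch only shows the shape of the requirement.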
Hadoop Security Challenges
Hadoop Security Challenges
Components of a typical Hadoop stack:
HDFS 2.0, YARN (MRv2), Sqoop, Flume, ZooKeeper, Pig, Hive (HiveQL), HCatalog, HBase (with coprocessors), Mahout, Oozie, Giraph, R connectors
Hadoop Security Challenges: components sharing an authentication framework
(Hadoop stack diagram as above, with metadata and data-flow links.)
Hadoop Security Challenges: components capable of access control
(Hadoop stack diagram as above.)
Hadoop Security Challenges: components capable of admission control
(Hadoop stack diagram as above.)
Hadoop Security Challenges: components capable of (transparent) encryption
(Hadoop stack diagram as above.)
Hadoop Security Challenges: components sharing a common policy engine
(Hadoop stack diagram as above.)
Hadoop Security Challenges: components sharing a common audit log format
(Hadoop stack diagram as above, including data-mining components.)
Hardening Hadoop from within
Project Rhino
• Encryption and key management
• Role-based access control
• Common authorization
• Consistent auditing
Deliver defense in depth:
Firewall, Gateway, AuthN, AuthZ, Encryption, Audit & alerts, Isolation
Protect Hadoop APIs
• Enforces consistent security policies across all Hadoop services
• Serves as a trusted proxy to Hadoop, HBase, and WebHDFS APIs (HCatalog, Stargate, WebHDFS)
• Common Criteria EAL4+, HSM, FIPS 140-2 certified
• Deploys as software, a virtual appliance, or a hardware appliance
• Available on AWS Marketplace
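A gateway of this kind fronts Hadoop's REST endpoints. For example, WebHDFS file reads go through URLs of the following form; the gateway host below is hypothetical, but the `/webhdfs/v1/<path>?op=<OP>` scheme is the standard WebHDFS REST API.

```java
// Sketch: building the WebHDFS request URL that a trusted proxy/gateway would
// forward to the cluster. Host and port are hypothetical.
public class WebHdfsUrl {
    public static String open(String gatewayHost, int port, String path, String user) {
        return "https://" + gatewayHost + ":" + port
                + "/webhdfs/v1" + path
                + "?op=OPEN&user.name=" + user;
    }
}
```

Because all requests funnel through one proxy endpoint, the gateway can authenticate the caller, apply a single policy set, and audit the call before anything reaches the NameNode.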
Provide role-based access control (AuthZ)
• File-, table-, and cell-level access control in HBase
• JIRA HBASE-6222: add per-KeyValue security
Provide encryption for data at rest
(Diagram: MapReduce pipeline: RecordReader, Map, Combiner, Partitioner, local merge & sort, Reduce, RecordWriter; with Decrypt on read from HDFS, Encrypt/Decrypt of intermediate (derivative) data, and Encrypt on write back to HDFS.)
• Extends the compression codec into a crypto codec
• Provides an abstract API for general use
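The decrypt-on-read / encrypt-on-write pattern above can be sketched with the JDK's standard javax.crypto stream wrappers. This is a generic illustration of the pattern, not the Project Rhino codec API.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Generic sketch of encrypt-on-write / decrypt-on-read, analogous to a crypto
// codec wrapping a record stream. Uses AES/CTR from the JDK.
public class StreamCrypto {
    public static byte[] encrypt(SecretKey key, byte[] iv, byte[] plain) throws Exception {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (CipherOutputStream out = new CipherOutputStream(sink, c)) {
            out.write(plain); // RecordWriter side: bytes are encrypted as written
        }
        return sink.toByteArray();
    }

    public static byte[] decrypt(SecretKey key, byte[] iv, byte[] cipherText) throws Exception {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        try (CipherInputStream in = new CipherInputStream(new ByteArrayInputStream(cipherText), c)) {
            return in.readAllBytes(); // RecordReader side: bytes are decrypted as read
        }
    }
}
```

A codec-based design wraps these transforms beneath the RecordReader/RecordWriter so jobs stay unchanged; only the stream factory differs.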
Provide encryption for data at rest
HBase
• Transparent table/column-family encryption (HBASE-7544)
Pig & Hive Encryption
• Pig encryption capabilities
  – Support for text file and Avro* file formats
  – Intermediate job output file protection
  – Pluggable key retrieval and key resolution
  – Protection of key distribution in the cluster
• Hive encryption capabilities
  – Support for RCFile and Avro file formats
  – Intermediate and final output data encryption
  – Encryption transparent to the end user, without changing existing SQL
Crypto Codec Framework
• Extends the compression codec
• Establishes a common abstraction at the API level that can be shared by all crypto codec implementations

// Illustrative fragment of the crypto codec API:
CryptoCodec cryptoCodec = (CryptoCodec) ReflectionUtils.newInstance(codecClass, conf);
CryptoContext cryptoContext = new CryptoContext();
...
// Attach the crypto context and wrap the input stream for decryption on read:
cryptoCodec.setCryptoContext(cryptoContext);
CompressionInputStream input = cryptoCodec.createInputStream(inputStream);
...

• Provides a foundation for other components in Hadoop* such as MapReduce or HBase* to support encryption features
Key Distribution
• Enabling the crypto codec in a MapReduce job
• Enabling different key storage or management systems
• Allowing different stages and files to use different keys
• API to integrate with external key management systems
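One common way to let different files and stages use different keys is to derive per-file keys from a single master key. The sketch below uses HMAC-SHA-256 for derivation with the JDK only; it is purely illustrative, and the actual Rhino key-management API may differ.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch: derive a distinct AES key per file from one master key,
// so stages/files get different keys without distributing each key separately.
public class PerFileKeys {
    public static SecretKeySpec deriveKey(byte[] masterKey, String fileName) throws Exception {
        Mac hmac = Mac.getInstance("HmacSHA256");
        hmac.init(new SecretKeySpec(masterKey, "HmacSHA256"));
        byte[] derived = hmac.doFinal(fileName.getBytes(StandardCharsets.UTF_8));
        // Truncate the 256-bit MAC to 128 bits for an AES key.
        return new SecretKeySpec(Arrays.copyOf(derived, 16), "AES");
    }
}
```

Derivation keeps only the master key in the external key manager; per-file keys are recomputed on demand, which matches the "pluggable key retrieval" bullet above.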
Crypto Software Optimization
Multi-Buffer Crypto
• Processes multiple independent data buffers in parallel
• Improves cryptographic performance up to 2-9X
Intel® Data Protection Technology
AES-NI
• Processor assistance for performing AES encryption
• Makes enabled encryption software faster and stronger
Secure Key (DRNG)
• Processor-based true random number generator
• More secure, standards-compliant, high performance
Data in motion: secure transactions used pervasively in e-commerce, banking, etc. (Internet)
Data in process: most enterprise and cloud applications offer encryption options to secure information and protect confidentiality
Data at rest: full-disk encryption software protects data while saving to disk
AES-NI: Advanced Encryption Standard New Instructions
Secure Key: previously known as the Intel Digital Random Number Generator (DRNG)
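The multi-buffer idea, working on several independent buffers concurrently, can be illustrated in plain Java by encrypting independent buffers in parallel. Note the caveat: real multi-buffer crypto interleaves buffers within one core's SIMD pipeline, which this thread-level sketch does not do; AES-NI acceleration in the JVM's AES intrinsics applies automatically on supported CPUs.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.util.List;
import java.util.stream.Collectors;

// Conceptual sketch only: encrypt independent buffers concurrently.
public class MultiBufferSketch {
    public static List<byte[]> encryptAll(byte[] key, List<byte[]> buffers) {
        return buffers.parallelStream().map(buf -> {
            try {
                Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
                // Fixed IV for demo symmetry only; never reuse an IV in practice.
                c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                        new IvParameterSpec(new byte[16]));
                return c.doFinal(buf);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }).collect(Collectors.toList());
    }
}
```

Because AES/CTR is symmetric under the same key and IV, applying `encryptAll` twice returns the original buffers, which makes the round trip easy to verify.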
Intel® AES-NI Accelerated Encryption
(Chart: relative speed of crypto functions, higher is better, based on Intel tests. Versus non-AES-NI software: with Intel® AES-NI, 5.3x encryption / 19.8x decryption; with Intel® AES-NI plus multi-buffer, 18.2x encryption / 19.8x decryption. Up to 20X faster crypto.)
AES-NI: Advanced Encryption Standard New Instructions
Cloud Platform for Secure Hadoop
Intel® Xeon® processors: E7, E5, and E3 families
Amazon EC2: Reserved Instances and Dedicated Instances
Amazon EC2 instances with AES-NI: more at aws.amazon.com/ec2/instance-types
Resources
For more information
• intel.com/bigdata
• intel.com/healthcare/bigdata
• github.com/intel-hadoop/project-rhino/
• aws.amazon.com/compliance/
• aws.amazon.com/ec2/instance-types/
Thank you!