DESCRIPTION
Big Data is the "next" big thing in technology and business, and Hadoop is one of its most important frameworks. Hadoop is currently used by Yahoo, eBay and hundreds of other organisations. As Big Data use cases grow, the security of Big Data technologies, solutions and applications will become extremely important. In this presentation, I describe the top 5 security challenges in developing Big Data solutions and applications.
Big Data Security
Top 5 Security Risks and Best Practices
Jitendra Chauhan
Head R&D, iViZ Security
jitendra.chauhan@gmail.com
Agenda
• Key Insights of Big Data Architecture
• Top 5 Big Data Security Risks
• Top 5 Best Practices
Key Insights of Big Data Architecture
Distributed Architecture (Hadoop as example)
• Data Partition, Replication and Distribution
• Auto-tiering
• Move the Code
Real Time, Streaming and Continuous Computation
Integration Patterns
• Real time
• Variety of Input Sources
• Adhoc Queries
Parallel & Powerful Programming Framework
Example:
• 16 TB Data
• 128 MB Chunks
• 82,000 Maps
Java vs SQL / PLSQL
Frameworks:
• MapReduce
• Storm Topology (Spouts & Bolts)
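As a rough illustration of the chunk-then-map model in the example above, the sketch below splits an input into fixed-size chunks, runs one map task per chunk, and merges the partial results in a reduce step. It is a single-process toy and not Hadoop's API; the function names `split_into_chunks`, `map_task`, `reduce_task` and `word_count`, and the tiny chunk size, are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List


def split_into_chunks(words: List[str], chunk_size: int) -> List[List[str]]:
    """Partition the input into fixed-size chunks (a stand-in for storage blocks)."""
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]


def map_task(chunk: List[str]) -> Counter:
    """One map task per chunk: count words locally, without seeing other chunks."""
    return Counter(chunk)


def reduce_task(partials: Iterable[Counter]) -> Counter:
    """Merge the per-chunk counts into a single global result."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(words: List[str], chunk_size: int) -> Counter:
    """Run the whole toy pipeline: split, map each chunk, reduce."""
    chunks = split_into_chunks(words, chunk_size)
    return reduce_task(map_task(chunk) for chunk in chunks)
```

In a real cluster each `map_task` would run on the node that already holds its chunk, which is the "move the code" idea; here everything runs in one process for clarity.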
Big Data Architecture
No Single Silver Bullet
• Hadoop is already unsuitable for many Big Data problems
• Real-time analytics
o Cloudscale, Storm
• Graph computation
o Giraph and Pregel (examples of graph computation include Shortest Paths, Degrees of Separation, etc.)
• Low latency queries
o Dremel
Top 5 Security Risks
Insecure Computation
An untrusted computation program run against sensitive information (for example, health data) can lead to:
• Information Leak
• Data Corruption
• DoS
Input Validation and Filtering
• Input Validation
o What kind of data is untrusted?
o What are the untrusted data sources?
• Data Filtering
o Filter rogue or malicious data
• Challenges
o GBs or TBs of continuous data
o Signature-based data filtering has limitations
o How to filter on the behavioral aspect of data?
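One way to picture the points above is a filter that combines a structural check, a source allow-list, and a simple behavioral rule. The sketch below is only an illustration under stated assumptions: the record fields (`source`, `user`, `ts`, `value`), the source names in `TRUSTED_SOURCES`, and the rate threshold are all hypothetical, and a per-user rate limit is just one possible stand-in for "behavior".

```python
from collections import defaultdict, deque
from typing import Deque, Dict

TRUSTED_SOURCES = {"billing-db", "web-frontend"}  # hypothetical source names
MAX_EVENTS_PER_WINDOW = 3   # hypothetical behavioral threshold
WINDOW_SECONDS = 10.0       # hypothetical window length


def is_well_formed(record: dict) -> bool:
    """Structural validation: required fields with the expected types."""
    return (
        isinstance(record.get("source"), str)
        and isinstance(record.get("user"), str)
        and isinstance(record.get("ts"), (int, float))
        and isinstance(record.get("value"), (int, float))
    )


class StreamFilter:
    """Rejects malformed records, unknown sources, and abnormally bursty users."""

    def __init__(self) -> None:
        # Timestamps of recently accepted records, kept per user.
        self._recent: Dict[str, Deque[float]] = defaultdict(deque)

    def accept(self, record: dict) -> bool:
        if not is_well_formed(record):
            return False
        if record["source"] not in TRUSTED_SOURCES:
            return False
        # Behavioral check: sliding-window rate limit per user.
        window = self._recent[record["user"]]
        while window and record["ts"] - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) >= MAX_EVENTS_PER_WINDOW:
            return False
        window.append(record["ts"])
        return True
```

The filter keeps only a short window of timestamps per user, so memory stays bounded by the number of active users rather than by the volume of the stream.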
Granular Access Controls
• Designed for performance, with almost no security in mind
• Security in Big Data is still ongoing research
• Table, row or cell level access control is missing
• Adhoc queries pose additional challenges
• Access control is disabled by default
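To make "row or cell level access control" concrete, here is a minimal deny-by-default sketch. The roles, column names, and the region-based row rule in `COLUMN_POLICY` and `visible_cells` are hypothetical; real Big Data stores expose their own policy mechanisms, which this does not model.

```python
from typing import Dict, Set

# Hypothetical policy: role -> columns that role may read.
COLUMN_POLICY: Dict[str, Set[str]] = {
    "analyst": {"age_band", "region"},
    "doctor": {"age_band", "region", "diagnosis"},
}


def visible_cells(role: str, row: Dict[str, str], user_region: str) -> Dict[str, str]:
    """Return only the cells this role may see; unknown roles see nothing."""
    allowed = COLUMN_POLICY.get(role, set())  # deny by default
    # Row-level rule (assumed): users only see rows from their own region.
    if row.get("region") != user_region:
        return {}
    # Cell-level rule: drop every column not explicitly granted to the role.
    return {col: val for col, val in row.items() if col in allowed}
```

Because the check sits in the read path rather than in each query, it also applies to adhoc queries that nobody reviewed in advance.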
Insecure Data Storage
• Data is spread across various nodes, so authentication, authorization and encryption are challenging
• Auto-tiering moves cold data to less secure media
o What if cold data is sensitive?
• Encryption of real-time data can have performance impacts
• Secure communication among nodes, middleware and end users is disabled by default
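The auto-tiering concern above can be expressed as a placement rule that looks at sensitivity as well as access age. This is a sketch under assumptions: the `Block` fields, the 30-day threshold, and the tier names `hot`, `cold` and `cold-secure` are invented for illustration and do not correspond to any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    block_id: str
    days_since_access: int
    sensitive: bool
    encrypted: bool


COLD_AFTER_DAYS = 30  # hypothetical threshold for "cold" data


def choose_tier(block: Block) -> str:
    """Pick a storage tier; sensitive cold data never lands unencrypted on the cheap tier."""
    if block.days_since_access < COLD_AFTER_DAYS:
        return "hot"
    if block.sensitive and not block.encrypted:
        # Hypothetical tier that keeps the same controls as hot storage.
        return "cold-secure"
    return "cold"
```

The point of the design is that tiering decisions consult a sensitivity label, so the question "what if cold data is sensitive?" is answered at placement time rather than after a leak.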
Privacy Concerns in Data Mining and Analytics
• Monetization of Big Data generally involves Data Mining and Analytics
• Sharing of results involves multiple challenges
o Invasion of Privacy
o Invasive Marketing
o Unintentional Disclosure of Information
• Examples
o AOL released anonymized search logs, yet users could easily be identified
o Netflix faced a similar problem
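The examples above show that removing names is not enough. One common yardstick for re-identification risk, which the slides do not name and which is introduced here only as an illustration, is k-anonymity: every combination of quasi-identifier values should be shared by at least k records. The field names in the sketch are hypothetical, and the check does not address identifying content inside free-text fields such as search queries.

```python
from collections import Counter
from typing import Dict, List, Sequence


def smallest_group(records: List[Dict[str, object]], quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    # An empty dataset has no groups; report 0 so it is never treated as safe.
    return min(groups.values()) if groups else 0


def is_k_anonymous(records: List[Dict[str, object]], quasi_identifiers: Sequence[str], k: int) -> bool:
    """True if every quasi-identifier combination appears in at least k records."""
    return smallest_group(records, quasi_identifiers) >= k
```

A release gate could call `is_k_anonymous` before any result set leaves the analytics environment and block or generalize the data when it returns `False`.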
Top 5 Best Practices
• Secure your Computation Code
o Implement access control, code signing and dynamic analysis of computational code
o Have a strategy to protect data in case of untrusted code
• Implement Comprehensive Input Validation and Filtering
o Implement validation and filtering of input data, from internal or external sources
o Evaluate the input validation and filtering of your Big Data solution
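As a sketch of the code-signing practice above, the snippet below refuses any job whose code does not match a signature produced with a trusted key. It uses HMAC-SHA256 from the Python standard library as a stand-in: this is a shared-key integrity check, not public-key code signing, and the function names `sign_job` and `verify_job` are hypothetical.

```python
import hashlib
import hmac


def sign_job(code: bytes, key: bytes) -> str:
    """Produce a hex signature over the job code with a shared secret key."""
    return hmac.new(key, code, hashlib.sha256).hexdigest()


def verify_job(code: bytes, signature: str, key: bytes) -> bool:
    """Check the signature in constant time before the job is scheduled."""
    expected = sign_job(code, key)
    return hmac.compare_digest(expected, signature)
```

A scheduler would call `verify_job` on every submitted map or reduce program and reject the submission when it returns `False`, so tampered or unsigned code never reaches the data nodes.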
Top 5 Best Practices
• Implement Granular Access Control
o Review Role and Privilege Matrix
o Review permission to execute Adhoc queries
o Enable Access Control
• Secure your Data Storage and Computation
o Sensitive Data should be segregated
o Enable Data encryption for sensitive data
o Audit Administrative Access on Data Nodes
o API Security
Top 5 Best Practices
• Review and Implement Privacy Preserving Data Mining and Analytics
o Analytics data should not disclose sensitive information
o Get the Big Data audited
Thank You
jitendra.chauhan@ivizsecurity.com
http://www.ivizsecurity.com/blog/
Big Data Architecture
Key Insights
• Distributed Architecture & Auto Tiering
• Real Time, Streaming and Continuous Computation
• Adhoc Queries
• Parallel and Powerful Computation Language
• Move the Code, Not the Data
• Non Relational Data
• Variety of Input Sources
Top 5 Security Risks
• Insecure Computation
• End Point Input Validation and Filtering
• Granular Access Control
• Insecure Data Storage and Communication
• Privacy Preserving Data Mining and Analytics