The Big Data Cloud: Are You Ready for the Zettabyte?
Steven C. Markey, MSIS, PMP, CISSP, CIPP, CISM, CISA, STS-EV, CCSK, CompTIA Cloud Essentials
Principal, nControl, LLC
Adjunct Professor
President, Cloud Security Alliance – Delaware Valley Chapter (CSA-DelVal)
• Presentation Overview
  – Why Should You Care?
  – Cloud Overview
  – Big Data Overview
  – Cloud-Based Big Data Offerings
  – Securing Cloud-Based DB Solutions
Big Data Cloud
• Why Should You Care?
  – Organizational Cost Reduction Requirements
    • Justify Investments
    • Improve Efficiencies (Productivity, Time to Market)
  – Digital Information: ~60% Annual Growth Rate (AGR)
  – Data Storage: 15-20% AGR in Capital Expense (CapEx)
  – Categorization, Classification & Retention Needs Magnify the Problem
    • Compliance, Legal & Privacy Regulations
  – Prevalent & Interconnected Business Ecosystems
    • Supply Chains
    • Business Process Outsourcers (BPO)
    • Information Technology Outsourcers (ITO)
    • Vendors' Vendors
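The two growth rates above pull in opposite directions. As a rough illustration only, assuming both rates hold constant over a hypothetical five-year horizon (the horizon and the "gap" framing are assumptions, not figures from the source), compounding them shows how quickly data volume outpaces storage spend:

```python
# Illustrative only: compounds the slide's approximate annual growth rates
# (~60% for digital information, 15-20% for storage CapEx).

def compound(rate: float, years: int) -> float:
    """Return the growth multiple after `years` at a constant annual `rate`."""
    return (1.0 + rate) ** years

YEARS = 5  # hypothetical planning horizon (an assumption)
data_growth = compound(0.60, YEARS)

for capex_rate in (0.15, 0.20):
    capex_growth = compound(capex_rate, YEARS)
    # Ratio of data growth to budget growth relative to today's baseline
    gap = data_growth / capex_growth
    print(f"data x{data_growth:.2f}, CapEx x{capex_growth:.2f} ({capex_rate:.0%}), gap x{gap:.2f}")
```

Over five years this yields roughly a 10.5x data multiple against a 2.0x to 2.5x CapEx multiple, which is why cost reduction and efficiency appear as the first driver.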
Source: IDC
Source: NIST
Service Delivery Models
Source: Swain Techs
Source: Matthew Gardiner, Computer Associates
Source: Flickr
• Big Data Overview
  – Aggregated Data from the Following Sources
    • Traditional
    • Source
    • Social
• Traditional Data
  – Database Management Systems
    • Relational Database Management Systems (RDBMS)
    • Object-Oriented Database Management Systems (OODBMS)
    • Non-Relational, Distributed DB Management Systems (NRDBMS)
    • Mobile Databases (SQLite, Oracle Lite)
  – Online Transaction Processing (OLTP)
    • Real-Time Data Warehousing
  – Online Analytical Processing (OLAP)
    • Operational Data Stores (ODS)
    • Enterprise Data Warehouse (EDW)
• Traditional Data
  – OLAP
    • Business Intelligence (BI)
      – Data Mining
      – Reporting
      – OLAP
        » Relational OLAP (ROLAP)
        » Multi-Dimensional OLAP (MOLAP)
        » Hybrid OLAP (HOLAP)
[Diagram: OLTP → ODS → EDW (Data Marts) → BI, shown for three BI uses: Data Mining, Reporting, and OLAP]
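OLAP tools summarize warehouse facts at several levels of a dimension hierarchy. A minimal ROLAP-flavored sketch, assuming a toy in-memory fact table (the region/quarter/sales columns and values are hypothetical), computes the same subtotals a SQL `GROUP BY ROLLUP(region, quarter)` would return:

```python
from collections import defaultdict

# Hypothetical fact rows extracted from an EDW / data mart: (region, quarter, sales)
FACTS = [
    ("East", "Q1", 100),
    ("East", "Q2", 150),
    ("West", "Q1", 80),
    ("West", "Q2", 120),
]

def rollup(facts):
    """Aggregate sales at every level of the (region, quarter) hierarchy.

    "ALL" marks a dimension that has been rolled up, mirroring the
    grouping sets of GROUP BY ROLLUP(region, quarter).
    """
    totals = defaultdict(int)
    for region, quarter, sales in facts:
        # ROLLUP levels: (region, quarter), (region, ALL), (ALL, ALL)
        for key in ((region, quarter), (region, "ALL"), ("ALL", "ALL")):
            totals[key] += sales
    return dict(totals)

cube = rollup(FACTS)
print(cube[("East", "ALL")], cube[("ALL", "ALL")])
```

A MOLAP engine would precompute these cells into a multidimensional array instead of aggregating relational rows on demand; HOLAP mixes the two.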
Source: Flickr
• Source Data
  – Log Files
    • Event Logs / Operating System (OS)-Level
    • Appliance / Peripherals
    • Analyzers / Sniffers
  – Multimedia
    • Image Logs
    • Video Logs
  – Web Content Management (WCM)
    • Web Logs
    • Search Engine Optimization (SEO)
  – Web Metadata
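Source data such as web logs arrives as semi-structured text and must be parsed before it can be aggregated. A minimal sketch, assuming the logs use the Apache Common Log Format (other sources such as OS event logs or sniffer captures would need their own parsers):

```python
import re
from datetime import datetime

# Apache Common Log Format: host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf(line):
    """Turn one raw web-log line into a structured record, or None if it does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    # "-" in the size field means no response body was sent
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec

sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
record = parse_clf(sample)
print(record["host"], record["status"], record["size"])
```

Returning `None` for non-matching lines lets an aggregation job count malformed entries instead of failing on them.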
• Big Data Overview
  – Aggregators
    • Mostly NRDBMS Implementations
      – NoSQL: "Not Only Structured Query Language"
    • NRDBMS Examples
      – Column Family Stores: BigTable (Google), Cassandra & HBase (Apache)
      – Key-Value Stores: App Engine Datastore (Google), DynamoDB & SimpleDB (AWS)
      – Document Databases: CouchDB, MongoDB
      – Graph Databases: Neo4j
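To make the four NRDBMS families above more concrete, here is a toy in-memory sketch using plain Python structures (no real database client is involved; all records and helper names are hypothetical) showing how one user record might be laid out in each model, plus the kind of lookup each model is built for:

```python
# Toy illustration of the four NRDBMS data models; plain Python structures only.

# Key-value store: an opaque value looked up by a single key
kv_store = {"user:42": b'{"name": "Ada", "city": "London"}'}

# Document store: the value is a nested document whose fields can be queried
doc_store = {"users": [{"_id": 42, "name": "Ada", "address": {"city": "London"}}]}

# Column-family store: row key -> column family -> column -> value
column_store = {
    "user42": {
        "profile": {"name": "Ada"},
        "location": {"city": "London"},
    }
}

# Graph store: nodes plus typed edges between them
graph = {
    "nodes": {42: {"name": "Ada"}, 7: {"name": "Charles"}},
    "edges": [(42, "KNOWS", 7)],
}

def find_docs(collection, **criteria):
    """Document-style query: match on top-level fields inside each document."""
    return [d for d in collection if all(d.get(k) == v for k, v in criteria.items())]

def neighbors(g, node, rel):
    """Graph-style traversal: follow edges of type `rel` out of `node`."""
    return [dst for src, r, dst in g["edges"] if src == node and r == rel]

print(kv_store["user:42"])
print(find_docs(doc_store["users"], name="Ada"))
print(column_store["user42"]["location"]["city"])
print(neighbors(graph, 42, "KNOWS"))
```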
• Big Data Overview
  – Serial Processing
    • Hadoop
      – Hadoop Distributed File System (HDFS)
      – Hive: Data Warehouse (DW)
      – Pig: Query Language
    • Riak
  – Parallel Processing
    • HadoopDB
  – Analytics
    • Google MapReduce
    • Apache MapReduce
    • Splunk (for Security Information / Event Management [SIEM])
Source: Cloudera
Source: Wikispaces
Source: Google
Source: Cloudera
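The MapReduce model named above splits work into a map step, a shuffle that groups intermediate pairs by key, and a reduce step. A single-process sketch of the classic word count follows; it only simulates the pattern and is not the Hadoop or EMR API (in a real cluster each split is mapped on a different node and the shuffle moves data over the network):

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit (word, 1) for every word in one input split."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle/sort: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key."""
    return key, sum(values)

def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

splits = ["big data cloud", "big data in the cloud"]
print(word_count(splits))
```

Because each reduce call sees only one key, reducers can run in parallel across keys, which is what lets the pattern scale out.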
• Cloud-Based Big Data Solutions
  – PaaS
    • DBaaS
      – Amazon Web Services (AWS)
        » DynamoDB
        » SimpleDB
        » Relational Database Service (RDS): Oracle 11g / MySQL
      – Google App Engine
        » Datastore
      – Microsoft SQL Azure
      – Oracle Public Cloud: 11g
    • Processing
      – AWS Elastic MapReduce (EMR)
      – Google App Engine MapReduce: Mapper API
      – Microsoft: Apache Hadoop for Azure
      – IBM SmartCloud Enterprise on IBM InfoSphere BigInsights Basics
• Cloud-Based Database Solutions
  – IaaS
    • Basic Components: Compute & Storage Nodes
      – AWS Elastic Compute Cloud (EC2)
      – AWS Elastic Block Store (EBS)
      – OpenStack Compute (Nova)
      – OpenStack Storage (Swift)
    • Advanced Components
      – Apache Hadoop
      – Apache Hadoop MapReduce
    • Commercial Applications
      – Cloudera
      – DataStax
      – MapR
      – Splunk
[Diagram: Internet → AWS Cloud; within an EC2 Availability Zone, EC2 instances with attached EBS volumes, whose EBS snapshots are kept in S3 storage. Source: Amazon]
• Big Data in the Cloud Use Cases
  – Public Cloud
    • AWS: EC2 Hadoop & S3
    • AWS: EC2 Hadoop, DynamoDB & EMR
    • AWS: EC2 Linux, Apache (w/ Tomcat), DynamoDB & EMR
    • AWS: EC2 Cloudera Hadoop & EMR
    • AWS: EC2 Splunk
  – Hybrid
    • Oracle Big Data Appliance & Connector, Google App Engine
    • OpenStack Swift, AWS EC2 Cloudera Hadoop & EMR
  – Private Cloud
    • OpenStack Nova & Swift, Apache Hadoop
    • OpenStack Nova & Swift, Cloudera Hadoop
Source: Flickr
• Securing Cloud-Based NRDBMS Solutions
  – General
    • Focus on Application / Middleware-Level Security
      – SQL-Style Injection Attacks Are Still Possible
      – Leverage Application IAM for NRDBMS User Rights Management (URM)
      – Leverage Application & System Logging for Authentication, Authorization & Accounting (AAA)
    • Segregation of Duties
      – Read / Write Namespaces
      – Read-Only Namespaces
  – Specific
    • Document
      – Consistency Assurance
    • Key / Value
      – Ensure Referential Integrity
• Securing Big Data in the Cloud
  – Identity & Access Management (IAM)
    • Security Assertion Markup Language (SAML)
    • Representational State Transfer (REST)
      – AWS IAM
      – Windows Azure Access Control Service (ACS)
    • Web Services Trust Language (WS-Trust)
Source: OASIS
Source: Intuit
Source: Apache
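REST-based access control, such as the AWS IAM and Azure ACS items above, commonly authenticates each API call by signing it with a shared secret. A minimal HMAC-SHA256 sketch follows; the canonical string layout, function names, and five-minute skew window are assumptions for illustration, and this is not AWS Signature Version 4 or any provider's actual scheme:

```python
import hashlib
import hmac
import time

def canonical_string(method, path, timestamp, body):
    """Hypothetical canonical form: method, path, timestamp, and body hash."""
    body_hash = hashlib.sha256(body).hexdigest()
    return "\n".join([method.upper(), path, str(timestamp), body_hash])

def sign(secret, method, path, timestamp, body=b""):
    """Client side: compute an HMAC-SHA256 signature over the canonical string."""
    msg = canonical_string(method, path, timestamp, body).encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()

def verify(secret, signature, method, path, timestamp, body=b"", max_skew=300, now=None):
    """Server side: recompute the signature and reject stale or tampered requests."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > max_skew:
        return False  # limits replay of captured requests
    expected = sign(secret, method, path, timestamp, body)
    return hmac.compare_digest(expected, signature)  # constant-time comparison

ts = int(time.time())
sig = sign(b"shared-secret", "GET", "/data", ts)
print(verify(b"shared-secret", sig, "GET", "/data", ts))
```

Federated schemes such as SAML or WS-Trust replace the shared secret with signed tokens from an identity provider, but the verify-before-serve step is the same idea.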
• Securing Big Data in the Cloud
  – Electronic Discovery (eDiscovery)
    • eDiscovery Reference Model (EDRM)
    • Legal Holds
    • Litigation Response
  – Records & Information Management (RIM)
    • Generally Accepted Recordkeeping Principles (GARP®)
    • Information Governance Reference Model (IGRM)
    • Information Lifecycle Management (ILM)
    • MIKE2.0
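Legal holds and lifecycle rules interact: a record that has reached the end of its retention period must still be preserved while a hold is open. A minimal sketch of that disposition decision, assuming a hypothetical record model with a per-record retention period and a set of open matter IDs (real retention schedules and hold workflows are considerably richer):

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

@dataclass
class Record:
    record_id: str
    created: date
    retention_days: int                            # from a (hypothetical) records schedule
    legal_holds: set = field(default_factory=set)  # IDs of open legal matters

def disposition(record, today):
    """Decide whether a record may be destroyed under a simple ILM rule."""
    if record.legal_holds:
        return "hold"     # a legal hold suspends normal disposition
    if today >= record.created + timedelta(days=record.retention_days):
        return "destroy"  # retention period has elapsed
    return "retain"

r = Record("r1", date(2010, 1, 1), retention_days=365)
print(disposition(r, date(2011, 6, 1)))
r.legal_holds.add("matter-001")  # hypothetical matter ID
print(disposition(r, date(2011, 6, 1)))
```

Checking holds before retention makes the hold take precedence regardless of how old the record is.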
• Privacy & Data Protection for Big Data Clouds
  – Jurisdictions*
    • Regional: EU DPA
    • National: PIPEDA, GLBA, HIPAA / HITECH, COPPA, Safe Harbor
    • Statutory: Bavarian, CA SB 1386 / 24, MA 201 CMR 17, NV SB 227
  – Data Flow & Jurisdictional Adherence
    • Data Sharing with Third Parties
  – Pseudonymization / De-Identification
    • Consent & Notices
  – Contract Clauses
    • Model Contracts
  – Privacy Best Practices
    • Generally Accepted Privacy Principles (GAPP)
* Not all-inclusive.
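One common way to pseudonymize direct identifiers before data is shared or moved across jurisdictions is keyed hashing, which yields stable tokens that still allow joins without exposing the raw value. A minimal sketch, assuming HMAC-SHA256 and a hypothetical key and field list (in practice the key would be held in a key management system, separate from the data):

```python
import hashlib
import hmac

# Hypothetical key; store and rotate it outside the data set being pseudonymized.
PSEUDONYM_KEY = b"replace-with-secret-from-key-store"

def pseudonymize(value: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Replace an identifier with a stable token: same input and key give the same token."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()

def deidentify(record: dict, direct_identifiers=("name", "email")) -> dict:
    """Pseudonymize direct identifiers and pass other fields through unchanged."""
    return {k: pseudonymize(v) if k in direct_identifiers else v for k, v in record.items()}

print(deidentify({"name": "Ada Lovelace", "email": "ada@example.com", "zip": "19103"}))
```

Quasi-identifiers such as the ZIP code are passed through untouched here, so this step alone is pseudonymization rather than full de-identification.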
• Presentation Take-Aways
  – Big Data in the Cloud Is Here to Stay
  – It Has to Be Secure
    • Segregation of Data
    • Access Controls
    • Separation / Segregation of Duties
    • Federated Identities
    • Logging
• Questions?
• Contact
  – Email: [email protected]
  – Twitter: markes1
  – LI: http://www.linkedin.com/in/smarkey
  – CSA-DelVal: http://www.csadelval.org/