Page 1: Hadoop online Training

Assured Cloud Computing for Assured Information Sharing

Presented By

Page 2: Hadoop online Training

OUTLINE

• Objectives
• Assured Information Sharing
• Layered Framework for a Secure Cloud
• Cloud-based Assured Information Sharing
• Cloud-based Secure Social Networking
• Other Topics
  - Secure Hybrid Cloud
  - Cloud Monitoring
  - Cloud for Malware Detection
  - Cloud for Secure Big Data
• Education
• Directions
• Related Books

www.kellytechno.com

Page 3: Hadoop online Training

TEAM MEMBERS

• Sponsor: Air Force Office of Scientific Research
• The University of Texas at Dallas
  - Dr. Murat Kantarcioglu; Dr. Latifur Khan; Dr. Kevin Hamlen; Dr. Zhiqiang Lin; Dr. Kamil Sarac
• Sub-contractors
  - Prof. Elisa Bertino (Purdue)
  - Ms. Anita Miller, Late Dr. Bob Johnson (North Texas Fusion Center)
• Collaborators
  - Late Dr. Steve Barker, Dr. Maribel Fernandez, King's College, U of London (EOARD)
  - Dr. Barbara Carminati; Dr. Elena Ferrari, U of Insubria (EOARD)

Page 4: Hadoop online Training

OBJECTIVES

• Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet. Users need not have knowledge of, expertise in, or control over the technology infrastructure in the "cloud" that supports them.
• Our research on cloud computing is based on Hadoop, MapReduce and Xen.
• Apache Hadoop is a Java software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
• Xen is a virtual machine monitor developed at the University of Cambridge, England.
• Our goal is to build a secure cloud infrastructure for assured information sharing and related applications.
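
The MapReduce model that Hadoop implements can be illustrated with a small in-process sketch. This is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical and everything runs in one Python process, whereas Hadoop distributes the same three steps across many nodes.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on an in-memory list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = run_job(["secure cloud", "assured cloud sharing"])
```

Because each map call sees only one record and each reduce call sees only one key, both steps can be spread over many machines, which is what lets the approach scale to the data sizes mentioned above.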

Page 5: Hadoop online Training

INFORMATION OPERATIONS ACROSS INFOSPHERES: ASSURED INFORMATION SHARING

Scientific/Technical Approach
• Conduct experiments as to how much information is lost as a result of enforcing security policies in the case of trustworthy partners
• Develop more sophisticated policies based on role-based and usage-control-based access control models
• Develop techniques based on game-theoretical strategies to handle partners who are semi-trustworthy
• Develop data mining techniques to carry out defensive and offensive information operations

Accomplishments
• Developed an experimental system for determining information loss due to security policy enforcement
• Developed a strategy for applying game theory for semi-trustworthy partners; simulation results
• Developed data mining techniques for conducting defensive operations for untrustworthy partners

Challenges
• Handling dynamically changing trust levels; scalability

Objectives
• Develop a framework for secure and timely data sharing across infospheres
• Investigate access control and usage control policies for secure data sharing
• Develop innovative techniques for extracting information from trustworthy, semi-trustworthy and untrustworthy partners

[Figure: components holding Data/Policy for Agency A, Agency B and Agency C each publish data/policy to the shared Data/Policy for Coalition]

Page 6: Hadoop online Training

Our Approach

• Policy-based information sharing
• Integrate the Medicaid claims data and mine the data
• Enforce policies and determine how much information has been lost (trustworthy partners)
• Application of Semantic Web technologies
• Apply game theory and probing to extract information from semi-trustworthy partners
• Conduct active defence and determine the actions of an untrustworthy partner
  - Defend ourselves from our partners using data analytics techniques
  - Conduct active defence: find out what our partners are doing by monitoring them so that we can defend ourselves in dynamic situations

Page 7: Hadoop online Training

Coalition

Policy Enforcement Prototype


Page 8: Hadoop online Training

LAYERED FRAMEWORK FOR ASSURED CLOUD COMPUTING

[Figure 2. Layered Framework for Assured Cloud. Layers shown: Applications; Hadoop/MapReduce/Storage; HIVE/SPARQL/Query; XEN/Linux/VMM; Secure Virtual Network Monitor. Side labels: Policies, XACML, RDF; Risks/Costs; QoS, Resource Allocation; Cloud Monitors]

Page 9: Hadoop online Training

SECURE QUERY PROCESSING WITH HADOOP/MAPREDUCE

• We have studied clouds based on Hadoop
• Query rewriting and optimization techniques designed and implemented for two types of data
  - (i) Relational data: secure query processing with HIVE
  - (ii) RDF data: secure query processing with SPARQL
• Demonstrated with XACML policies
• Joint demonstration with King's College and University of Insubria
  - First demo (2011): each party submits their data and policies; our cloud will manage the data and policies
  - Second demo (2012): multiple clouds

Page 10: Hadoop online Training

Fine-grained Access Control with Hive

System Architecture
• Table/view definition and loading: users can create tables as well as load data into tables. They can also upload XACML policies for the tables they are creating.
• Users can also create XACML policies for tables/views.
• Users can define views only if they have permissions for all tables specified in the query used to create the view. They can also either specify or create XACML policies for the views they are defining.

CollaborateCom 2010
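
The view-definition rule (a view may be created only by a user with permissions on every table in its defining query) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `TABLE_READERS` is a hypothetical in-memory stand-in for the XACML policies, and `tables_in_query` is a rough regular-expression extraction rather than a real HiveQL parser.

```python
import re

# Hypothetical stand-in for XACML policies: table name -> users permitted on it.
TABLE_READERS = {
    "claims": {"alice", "bob"},
    "patients": {"alice"},
}


def tables_in_query(sql):
    """Roughly extract the table names that follow FROM or JOIN (not a full SQL parser)."""
    names = re.findall(r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*)", sql, flags=re.IGNORECASE)
    return {name.lower() for name in names}


def can_create_view(user, view_query):
    """Allow the view only if the user is permitted on every table the query references."""
    tables = tables_in_query(view_query)
    return bool(tables) and all(user in TABLE_READERS.get(t, set()) for t in tables)


query = "SELECT c.id FROM claims c JOIN patients p ON c.pid = p.id"
alice_ok = can_create_view("alice", query)
bob_ok = can_create_view("bob", query)
```

Here `alice` may define the view because she is permitted on both tables, while `bob` is refused because he has no permission on `patients`. Unknown tables are treated as having no permitted users, so the check fails closed.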

Page 11: Hadoop online Training

SPARQL Query Optimizer for Secure RDF Data Processing

[Figure: system architecture. A Web Interface takes new data and queries and returns answers. The Server Backend contains a Data Preprocessor (N-Triples Converter, Prefix Generator, Predicate Based Splitter, Predicate Object Based Splitter) and a MapReduce Framework (Parser, Query Validator & Rewriter, XACML PDP, Plan Generator, Plan Executor, Query Rewriter By Policy)]

• Goals: build an efficient storage mechanism using Hadoop for large amounts of data (e.g. a billion triples); build an efficient query mechanism for data stored in Hadoop; integrate with Jena
• Developed a query optimizer and query rewriting techniques for RDF data with XACML policies, implemented on top of Jena

IEEE Transactions on Knowledge and Data Engineering, 2011
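
The idea behind a predicate-based splitter can be sketched in a few lines: triples are grouped by predicate so that a query touching one predicate only has to read one group. The function name `split_by_predicate` and the in-memory dictionary are assumptions for illustration; the slides do not describe the actual file layout used on Hadoop.

```python
from collections import defaultdict


def split_by_predicate(ntriples_lines):
    """Group (subject, object) pairs by predicate, one bucket per predicate."""
    buckets = defaultdict(list)
    for line in ntriples_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Drop the trailing " ." terminator, then split into three fields;
        # the object keeps any internal spaces (e.g. a quoted literal).
        subject, predicate, obj = line.rstrip(" .").split(None, 2)
        buckets[predicate].append((subject, obj))
    return dict(buckets)


lines = [
    '<http://ex.org/a> <http://ex.org/knows> <http://ex.org/b> .',
    '<http://ex.org/a> <http://ex.org/name> "Alice Smith" .',
    '<http://ex.org/b> <http://ex.org/knows> <http://ex.org/c> .',
]
buckets = split_by_predicate(lines)
```

In a Hadoop setting each bucket would plausibly become its own file, so a query pattern such as `?x <http://ex.org/knows> ?y` scans only the matching bucket instead of the whole data set.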

Page 12: Hadoop online Training

DEMONSTRATION: CONCEPT OF OPERATION

[Figure: Agency 1, Agency 2, ..., Agency n connect through a User Interface Layer to two components: Fine-grained Access Control with Hive (relational data) and the SPARQL Query Optimizer for Secure RDF Data Processing (RDF data)]

Page 13: Hadoop online Training

RDF-Based Policy Engine

[Figure: policies, ontologies and rules expressed in RDF; Jena RDF engine over RDF documents; inference engine/rules processor (e.g., Pellet); interface to the Semantic Web]

Technology by UT Dallas

Page 14: Hadoop online Training

RDF-BASED POLICY ENGINE ON THE CLOUD

[Figure: layered policy engine. Labels shown: User Interface Layer (High Level Specification, Policy Translator); Policy Parser Layer (Regular Expression-Query Translator, Data Controller, Provenance Controller); Policy Transformation Layer (Policy / Graph Transformation Rules; Access Control / Redaction Policy (Traditional Mechanism)); data stores (DB, RDF, XML); Query and Result]

• A testbed for evaluating different policy sets over different data representations. It also supports provenance as a directed graph and viewing policy outcomes graphically.
• Determine how access is granted to a resource as well as how a document is shared
• Users specify policies: e.g., access control, redaction, released policy
• Parse a high-level policy to a low-level representation
• Support graph operations and visualization; policies are executed as graph operations
• Execute policies as SPARQL queries over large RDF graphs on Hadoop
• Support for policies over traditional data and its provenance

IFIP Data and Applications Security, 2010; ACM SACMAT 2011

Page 15: Hadoop online Training

INTEGRATION WITH ASSURED INFORMATION SHARING

[Figure: Agency 1, Agency 2, ..., Agency n submit RDF data and policies and SPARQL queries through the User Interface Layer and receive results. Beneath it: RDF Data Preprocessor; Policy Translation and Transformation Layer; MapReduce Framework for Query Processing; Hadoop HDFS]

Page 16: Hadoop online Training

ARCHITECTURE

[Figure: Agency 1, Agency 2, ..., Agency n connect through a User Interface Layer. Labels shown: Policy Engine with Provenance; policy request over an RDF graph (RDF Graph: Model; RDF Query: SPARQL); policies Policy n-2, Policy n-1 and Policy n, each of type Access Control, Redaction or Combined; Connection Interface (RDBMS Connection: DB; Connection: Cloud; Connection: Text); Local and Cloud-based Store]

Page 17: Hadoop online Training

POLICY RECIPROCITY

• Agency 1 wishes to share its resources if Agency 2 also shares its resources with it
• Use our Combined policies
• Allow agents to define policies based on reciprocity and mutual interest amongst cooperating agencies
• SPARQL query:
  SELECT B FROM NAMED uri1 FROM NAMED uri2 WHERE P
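
The reciprocity rule itself is simple enough to sketch directly. The following is a minimal illustration, not the authors' policy engine: `sharing` is a hypothetical dictionary recording which agencies each agency shares with, and `reciprocal_access` and `release` are names invented for this example.

```python
def reciprocal_access(sharing, owner, requester):
    """Grant access only if `requester` already shares its resources with `owner`."""
    return owner in sharing.get(requester, set())


def release(sharing, owner, requester, resources):
    """Return the owner's resources if the reciprocity rule holds, otherwise nothing."""
    return list(resources) if reciprocal_access(sharing, owner, requester) else []


# Hypothetical sharing state: Agency2 shares with Agency1, Agency3 shares with nobody.
sharing = {"Agency1": set(), "Agency2": {"Agency1"}, "Agency3": set()}

to_agency2 = release(sharing, "Agency1", "Agency2", ["R1"])
to_agency3 = release(sharing, "Agency1", "Agency3", ["R1"])
```

In the SPARQL form on the slide, the two `FROM NAMED` graphs would plausibly hold each agency's data so that the `WHERE` pattern can test both sides of the condition in one query.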

Page 18: Hadoop online Training

DEVELOP AND SCALE POLICIES

• Agency 1 wishes to extend its existing policies with support for constructing policies at a finer granularity.
• The policy engine
  - Provides a policy interface that should be implemented by all policies
  - Allows newer types of policies to be added as needed
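
A common policy interface of this kind could look like the sketch below. The slides do not give the real interface, so the `Policy` base class, its `evaluate` signature and the three concrete classes are all hypothetical; they only mirror the policy types named in the deck (access control, redaction, combined).

```python
from abc import ABC, abstractmethod


class Policy(ABC):
    """Hypothetical interface that every policy type implements."""

    @abstractmethod
    def evaluate(self, agency, triples):
        """Return the (subject, predicate, object) triples that `agency` may see."""


class AccessControlPolicy(Policy):
    """Releases everything to allowed agencies and nothing to anyone else."""

    def __init__(self, allowed_agencies):
        self.allowed_agencies = set(allowed_agencies)

    def evaluate(self, agency, triples):
        return list(triples) if agency in self.allowed_agencies else []


class RedactionPolicy(Policy):
    """Masks the object of any triple whose predicate is marked as hidden."""

    def __init__(self, hidden_predicates):
        self.hidden_predicates = set(hidden_predicates)

    def evaluate(self, agency, triples):
        return [
            (s, p, "[REDACTED]" if p in self.hidden_predicates else o)
            for s, p, o in triples
        ]


class CombinedPolicy(Policy):
    """Applies several policies in order, each on the previous policy's output."""

    def __init__(self, *policies):
        self.policies = policies

    def evaluate(self, agency, triples):
        for policy in self.policies:
            triples = policy.evaluate(agency, triples)
        return list(triples)


triples = [("doc1", "author", "Agent Smith"), ("doc1", "title", "Report")]
policy = CombinedPolicy(AccessControlPolicy({"Agency2"}), RedactionPolicy({"author"}))
visible = policy.evaluate("Agency2", triples)
denied = policy.evaluate("Agency3", triples)
```

Adding a new policy type then means writing one more subclass; the engine and existing policies stay unchanged.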

Page 19: Hadoop online Training

JUSTIFICATION OF RESOURCES

• Agency 1 asks Agency 2 for a justification of resource R2
• Policy engine
  - Allows agents to define policies over provenance
  - Agency 2 can provide the provenance to Agency 1, but protect it by using access control or redaction policies

Page 20: Hadoop online Training

OTHER EXAMPLE POLICIES

• Agency 1 shares a resource with Agency 2 provided Agency 2 does not share with Agency 3
• Agency 1 shares a resource with Agency 2 depending on the content of the resource or until a certain time
• Agency 1 shares a resource R with Agency 2 provided Agency 2 does not infer sensitive data S from R (inference problem)
• Agency 1 shares a resource with Agency 2 provided Agency 2 shares the resource only with those in its organizational (or social) network

Page 21: Hadoop online Training

ANALYZING AND SECURING SOCIAL NETWORKS IN THE CLOUD

ANALYTICS
• LOCATION MINING FROM ONLINE SOCIAL NETWORKS
• PREDICTING THREATS FROM SOCIAL NETWORK DATA, SENTIMENT ANALYSIS
• CLOUD PLATFORM FOR IMPLEMENTATION

SECURITY AND PRIVACY
• PREVENTING THE INFERENCE OF PRIVATE ATTRIBUTES (LIBERAL OR CONSERVATIVE; GAY OR STRAIGHT)
• ACCESS CONTROL IN SOCIAL NETWORKS
• CLOUD PLATFORM FOR IMPLEMENTATION

Page 22: Hadoop online Training

SECURITY POLICIES FOR ON-LINE SOCIAL NETWORKS (OSN)

Security policies are expressed in SWRL (Semantic Web Rule Language); examples

Page 23: Hadoop online Training

SECURITY POLICY ENFORCEMENT

• A reference monitor evaluates the requests.
• Admin requests for access control could be evaluated by rule rewriting
• Example: assume Bob submits the following admin request
• Rewrite as the following rule

Page 24: Hadoop online Training

FRAMEWORK ARCHITECTURE

[Figure: components: Social Network Application; Reference Monitor; Policy Store; Semantic Web Reasoning Engine; SN Knowledge Base. Labelled flows: Access request; Access Decision; Policy Retrieval; Modified Access request; Reasoning Result; Knowledge Base Queries]

Page 25: Hadoop online Training

SECURE SOCIAL NETWORKING IN THE CLOUD WITH TWITTER-STORM

[Figure: Social Network 1, Social Network 2, ..., Social Network N connect through a User Interface Layer to two components: Fine-grained Access Control with Hive (relational data) and the SPARQL Query Optimizer for Secure RDF Data Processing (RDF data)]

Page 26: Hadoop online Training

SECURE STORAGE AND QUERY PROCESSING IN A HYBRID CLOUD

• The use of hybrid clouds is an emerging trend in cloud computing
  - Ability to exploit public resources for high throughput
  - Yet, better able to control costs and data privacy
• Several key challenges
  - Data design: how to store data in a hybrid cloud? The solution must account for the data representation used (unencrypted/encrypted), public cloud monetary costs and query workload characteristics
  - Query processing: how to execute a query over a hybrid cloud? The solution must provide query rewrite rules that ensure the correctness of a generated query plan over the hybrid cloud

Page 27: Hadoop online Training

HYPERVISOR INTEGRITY AND FORENSICS IN THE CLOUD

• Cloud integrity & forensics
• Secure control flow of hypervisor code
• Integrity via in-lined reference monitor
• Forensics data extraction in the cloud
  - Multiple VMs
  - De-mapping (isolating) each VM's memory from physical memory

[Figure: stack of Hardware Layer; Virtualization Layer (Xen, vSphere); guest systems Linux, Solaris, XP, MacOS. Side labels: Hypervisor, OS, Applications; integrity, forensics]

Page 28: Hadoop online Training

CLOUD-BASED MALWARE DETECTION

[Figure: a stream of known malware or benign executables is buffered; feature extraction and selection using the cloud feeds training and model update for an ensemble of classification models. An unknown executable goes through feature extraction and is classified: benign (keep) or malware (remove)]

Page 29: Hadoop online Training

CLOUD-BASED MALWARE DETECTION

• Binary feature extraction involves
  - Enumerating binary n-grams from the binaries and selecting the best n-grams based on information gain
  - For training data with 3,500 executables, the number of distinct 6-grams can exceed 200 million
  - On a single machine, this may take hours, depending on available computing resources; not acceptable for training from a stream of binaries
• We use the cloud to overcome this bottleneck
  - A cloud MapReduce framework is used to extract and select features from each chunk
  - A 10-node cloud cluster is 10 times faster than a single node
  - Very effective in a dynamic framework, where malware characteristics change rapidly
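
The feature-selection step (enumerate byte n-grams, then keep those with the highest information gain) can be sketched on a single machine as follows. This is an illustration only: the function names are hypothetical, the four sample "binaries" are made-up byte strings, and the real system distributes this work with MapReduce rather than looping in one process.

```python
import math
from collections import Counter


def ngrams(data, n):
    """Return the set of distinct byte n-grams in one binary."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def entropy(pos, neg):
    """Binary entropy (in bits) of a group with `pos` malware and `neg` benign samples."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def select_ngrams(samples, labels, n, top_k):
    """Rank n-grams by information gain w.r.t. the labels (1 = malware, 0 = benign)."""
    total = len(labels)
    total_pos = sum(labels)
    base = entropy(total_pos, total - total_pos)
    seen_all, seen_pos = Counter(), Counter()
    for data, label in zip(samples, labels):
        for gram in ngrams(data, n):
            seen_all[gram] += 1
            seen_pos[gram] += label
    gains = {}
    for gram, with_gram in seen_all.items():
        pos_with = seen_pos[gram]
        without = total - with_gram
        pos_without = total_pos - pos_with
        # Expected entropy after splitting the samples on presence of this n-gram.
        conditional = (with_gram / total) * entropy(pos_with, with_gram - pos_with)
        conditional += (without / total) * entropy(pos_without, without - pos_without)
        gains[gram] = base - conditional
    return sorted(gains, key=lambda g: (-gains[g], g))[:top_k]


# Made-up toy data: two "malware" and two "benign" byte strings.
samples = [b"\x90\x90\xcc", b"\x90\x90\xeb", b"\x00\x01\x02", b"\x00\x01\x03"]
labels = [1, 1, 0, 0]
best = select_ngrams(samples, labels, n=2, top_k=2)
```

The counting loop is the part that maps naturally onto MapReduce: mappers emit per-n-gram counts for their chunk of binaries and reducers sum them, after which the gain for each n-gram can be computed independently.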

Page 30: Hadoop online Training

IDENTITY MANAGEMENT CONSIDERATIONS IN A CLOUD

• Trust model that handles (i) various trust relationships, (ii) access control policies based on roles and attributes, (iii) real-time provisioning, (iv) authorization, and (v) auditing and accountability.
• Several technologies are being examined to develop the trust model
  - Service-oriented technologies; standards such as SAML and XACML; and identity management technologies such as OpenID.
• Does one size fit all?
  - Can we develop a trust model that will be applicable to all types of clouds such as private clouds, public clouds and hybrid clouds?
  - Identity architecture has to be integrated into the cloud architecture.

Page 31: Hadoop online Training

Big Data and the Cloud

• Big Data describes large and complex data that cannot be managed by traditional data management tools
• From petabytes to exabytes to zettabytes of data
• Need tools for capture, storage, search, sharing, analysis and visualization of big data
• Examples include
  - Web logs, RFID and surveillance data, sensor networks, social network data (graphs), text and multimedia, data pertaining to astronomy, atmospheric science, genomics, biogeochemical and biological fields, video archives
• Big Data technologies
  - Hadoop/MapReduce platform, HIVE platform, Twitter Storm platform, Google App Engine, Amazon EC2 cloud, offerings from Oracle and IBM for Big Data management; others: Cassandra, Mahout, Pig Latin
• Cloud computing is emerging as a critical tool for Big Data management
• Critical to maintain security and privacy for Big Data

Page 32: Hadoop online Training

Security and Privacy for Big Data

• Secure storage and infrastructure
  - How can technologies such as Hadoop and MapReduce be secured?
• Secure data management
  - Techniques for secure query processing
  - Examples: securing HIVE, Cassandra
• Big Data for security
  - Analysis of security data (e.g., malware analysis)
• Regulations, compliance, governance
  - What are the regulations for storing, retaining, managing, transferring and analyzing Big Data?
  - Are corporations compliant with the regulations?
  - Privacy of individuals has to be maintained not just for raw data but also for data integration and analytics
  - Roles and responsibilities must be clearly defined

Page 33: Hadoop online Training

Security and Privacy for Big Data

• Regulations stifling innovation?
  - Major concern is that too many regulations will stifle innovation
  - Corporations must take advantage of Big Data technologies to improve business
  - But this could infringe on individual privacy
  - Regulations may also interfere with privacy, for example by requiring data to be retained
  - Challenge: how can one carry out analytics and still maintain privacy?
• National Science Foundation workshop planned for Spring 2014 at the University of Texas at Dallas

Page 34: Hadoop online Training

EDUCATION ON SECURE CLOUD COMPUTING AND RELATED TECHNOLOGIES

• Secure Cloud Computing
  - NSF Capacity Building Grant on Assured Cloud Computing
  - Introduce cloud computing into several cyber security courses
• Completed courses
  - Data and Applications Security, Data Storage, Digital Forensics, Secure Web Services
  - Computer and Information Security
• Capstone Course
  - One course that covers all aspects of assured cloud computing
  - Week-long course to be given at Texas Southern University
• Analyzing and Securing Social Networks
• Big Data Analytics and Security

Page 35: Hadoop online Training

DIRECTIONS

• Secure VMM and VNM
  - Designing secure XEN VMM
  - Developing automated techniques for VMM introspection
  - Determine a secure network infrastructure for the cloud
• Integrate secure storage algorithms into Hadoop
• Identity management in the cloud
• Secure cloud-based Big Data management/social networking

Page 36: Hadoop online Training

RELATED BOOKS

• Developing and Securing the Cloud, CRC Press (Taylor and Francis), November 2013 (Thuraisingham)
• Secure Data Provenance and Inference Control with Semantic Web, CRC Press, 2014, In Print (Cadenhead, Kantarcioglu, Khadilkar, Thuraisingham)
• Analyzing and Securing Social Media, CRC Press, 2014, In preparation (Abrol, Heatherly, Khan, Kantarcioglu, Khadilkar, Thuraisingham)

Page 37: Hadoop online Training

Presented By