IEEE Project Topics | Soham Consultants
1. Enabling Secure and Efficient Ranked Keyword Search over Outsourced Cloud Data
In this paper, we define and solve the problem of secure ranked keyword search over encrypted cloud data. Ranked search greatly enhances system usability by returning results ordered by relevance instead of undifferentiated results, and further improves file retrieval accuracy.
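As a toy illustration of why ranking matters, the sketch below builds an inverted index with term-frequency scores and returns only the top-k most relevant files. All names are illustrative; a real scheme would protect the scores with order-preserving encryption, which is omitted here.

```python
def build_index(files):
    """Map each keyword to {file_id: term-frequency score}."""
    index = {}
    for file_id, text in files.items():
        for word in text.lower().split():
            postings = index.setdefault(word, {})
            postings[file_id] = postings.get(file_id, 0) + 1
    return index

def ranked_search(index, keyword, top_k=3):
    """Return file ids ranked by relevance score, most relevant first."""
    postings = index.get(keyword.lower(), {})
    return sorted(postings, key=postings.get, reverse=True)[:top_k]

files = {
    "f1": "cloud data cloud storage",
    "f2": "cloud backup",
    "f3": "local storage",
}
index = build_index(files)
print(ranked_search(index, "cloud"))  # "cloud" appears twice in f1, so f1 ranks first
```

Returning only the top-k entries is what lets the server avoid shipping every matching (encrypted) file back to the user.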
2. A Secure Erasure Code-Based Cloud Storage System with Secure Data Forwarding
We propose a threshold proxy re-encryption scheme and integrate it with a decentralized erasure code to formulate a secure distributed storage system. The distributed storage system not only supports secure and robust data storage and retrieval, but also lets a user forward his data in the storage servers to another user without retrieving the data back.
3. SeDas: A Self-Destructing Data System Based on Active Storage Framework
Personal data stored in the Cloud may contain account numbers, passwords, notes, and other important information that could be used and misused by a miscreant, a competitor, or a court of law. These data are cached, copied, and archived by Cloud Service Providers (CSPs), often without users' authorization and control. Self-destructing data mainly aims at protecting the user data's privacy. All the data and their copies become destructed or unreadable after a user-specified time, without any user intervention. In addition, the decryption key is destructed after the user-specified time. In this paper, we present SeDas, a system that meets this challenge through a novel integration of cryptographic techniques with active storage techniques based on the T10 OSD standard. We implemented a proof-of-concept SeDas prototype. Functionality and security evaluations of the prototype demonstrate that SeDas is practical to use and meets all the described privacy-preserving goals. Compared to a system without the self-destructing data mechanism, throughput for uploading and downloading with SeDas decreases acceptably by less than 72%, while latency for upload/download operations increases by less than 60%.
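The core idea, that data becomes unreadable once its key is destroyed at a user-specified time, can be sketched as below. The XOR "cipher" and the in-process key store are stand-ins for real encryption and for the distributed active-storage nodes SeDas uses; they are assumptions for illustration only.

```python
import time

class SelfDestructingStore:
    """Toy time-bounded data store: the key is destroyed after the
    user-specified lifetime, after which the ciphertext is unrecoverable."""

    def __init__(self):
        self._keys = {}   # object_id -> (key, expiry timestamp)
        self._blobs = {}  # object_id -> ciphertext

    def put(self, object_id, data: bytes, key: bytes, lifetime_s: float):
        # XOR with a repeated key stands in for real encryption.
        self._blobs[object_id] = bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
        self._keys[object_id] = (key, time.time() + lifetime_s)

    def get(self, object_id):
        entry = self._keys.get(object_id)
        if entry is None or time.time() > entry[1]:
            self._keys.pop(object_id, None)  # destroy the expired key
            return None                      # data is now unreadable
        key, _ = entry
        blob = self._blobs[object_id]
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(blob))
```

After expiry no user intervention is needed: the next access finds the key gone and only undecryptable ciphertext remains.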
4. Privacy as a Service: Privacy-Aware Data Storage and Processing in Cloud Computing Architectures
In this paper we present PasS (Privacy as a Service), a set of security protocols for
ensuring the privacy and legal compliance of customer data in cloud computing
architectures. PasS allows for the secure storage and processing of users' confidential data
by leveraging the tamper-proof capabilities of cryptographic coprocessors. Using tamper-proof facilities provides a secure execution domain in the computing cloud that is physically and logically protected from unauthorized access. The central design goal of PasS is to maximize users' control in managing the various aspects related to the privacy of
sensitive data. This is achieved by implementing user-configurable software protection
and data privacy mechanisms. Moreover, PasS provides a privacy feedback process
which informs users of the different privacy operations applied on their data and makes
them aware of any potential risks that may jeopardize the confidentiality of their sensitive
information. To the best of our knowledge, PasS is the first practical cloud computing
privacy solution that utilizes previous research on cryptographic coprocessors to solve the
problem of securely processing sensitive data in cloud computing infrastructures.
5. Trustrace: Mining Software Repositories to Improve the Accuracy of Requirement Traceability Links
Traceability is the only means to ensure that the source code of a system is consistent
with its requirements and that all and only the specified requirements have been
implemented by developers. During software maintenance and evolution, requirement
traceability links become obsolete because developers do not/cannot devote effort to
updating them. Yet, recovering these traceability links later is a daunting and costly task
for developers. Consequently, the literature has proposed methods, techniques, and tools
to recover these traceability links semi-automatically or automatically. Among the
proposed techniques, the literature showed that information retrieval (IR) techniques can
automatically recover traceability links between free-text requirements and source code.
However, IR techniques lack accuracy (precision and recall). In this paper, we show that
mining software repositories and combining mined results with IR techniques can
improve the accuracy (precision and recall) of IR techniques and we propose Trustrace, a
trust-based traceability recovery approach. We apply Trustrace on four medium-size
open-source systems to compare the accuracy of its traceability links with those
recovered using state-of-the-art IR techniques from the literature, based on the Vector
Space Model and Jensen-Shannon model. The results of Trustrace are up to 22.7 percent
more precise and have 7.66 percent better recall values than those of the other techniques,
on average. We thus show that mining software repositories and combining the mined
data with existing results from IR techniques improves the precision and recall of
requirement traceability links.
6. Toward Secure and Dependable Storage Services in Cloud Computing
We propose in this paper a flexible distributed storage integrity auditing mechanism, utilizing the homomorphic token and distributed erasure-coded data. The proposed design
allows users to audit the cloud storage with very lightweight communication and computation
cost.
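A minimal sketch of token-based integrity auditing follows, assuming the owner keeps one random challenge vector and its precomputed token; the actual design additionally uses erasure-coded data and keyed pseudorandom coefficients, which are omitted here.

```python
import random

P = 2**31 - 1  # a public prime modulus (illustrative choice)

def make_token(blocks, coeffs):
    """Owner precomputes a lightweight verification token: a random
    linear combination of the data blocks modulo a prime."""
    return sum(c * b for c, b in zip(coeffs, blocks)) % P

def server_response(stored_blocks, coeffs):
    """Server aggregates its stored blocks under the same challenge;
    it never needs to send the blocks themselves back."""
    return sum(c * b for c, b in zip(coeffs, stored_blocks)) % P

blocks = [101, 202, 303, 404]
coeffs = [random.randrange(1, P) for _ in blocks]
token = make_token(blocks, coeffs)

assert server_response(blocks, coeffs) == token      # intact storage verifies
tampered = blocks[:]
tampered[2] += 1                                     # simulate corruption
assert server_response(tampered, coeffs) != token    # corruption is detected
```

The communication cost is one scalar per challenge, which is what makes this style of auditing "very lightweight" compared to downloading the data.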
7. Enhanced data security model for cloud computing
Cloud computing is becoming the next-generation architecture of IT enterprise. In contrast to traditional solutions, cloud computing moves the application software and databases to large data centers, where the management of the data and services may not be fully trustworthy. This unique feature, however, raises many new security challenges which have not been well understood. In cloud computing, neither data nor software is fully contained on the user's computer; data security concerns arise because both user data and programs reside on provider premises. Clouds typically have a single security architecture but many customers with different demands. Every cloud provider solves this problem by encrypting the data using encryption algorithms. This paper investigates the basic problem of cloud computing data security. We present the data security model of cloud computing based on a study of the cloud architecture. We improve the data security model for cloud computing and implement software to support it. Finally, we apply this software to an Amazon EC2 Micro instance.
8. Revisiting Defenses against Large-Scale Online Password Guessing Attacks
Brute force and dictionary attacks on password-only remote login services are now
widespread and ever increasing. Enabling convenient login for legitimate users while preventing
such attacks is a difficult problem. Automated Turing Tests (ATTs) continue to be an effective,
easy-to-deploy approach to identify automated malicious login attempts with reasonable cost of
inconvenience to users. In this paper, we discuss the inadequacy of existing and proposed login
protocols designed to address large scale online dictionary attacks (e.g., from a botnet of
hundreds of thousands of nodes). We propose a new Password Guessing Resistant Protocol (PGRP), derived upon revisiting prior proposals designed to restrict such attacks. While PGRP
limits the total number of login attempts from unknown remote hosts to as low as a single
attempt per username, legitimate users in most cases (e.g., when attempts are made from
known, frequently-used machines) can make several failed login attempts before being
challenged with an ATT. We analyze the performance of PGRP with two real world data sets and
find it more promising than existing proposals.
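The throttling rule PGRP applies can be sketched roughly as follows. The thresholds and the known-host test are simplified assumptions; the real protocol also tracks browser cookies and per-user white lists.

```python
# PGRP-style login decision sketch (thresholds are illustrative).
FAILED_KNOWN_MAX = 3    # failed tries tolerated from known machines
FAILED_UNKNOWN_MAX = 1  # tries allowed from unknown hosts before an ATT

def requires_att(src_ip, username, known_pairs, failed_counts):
    """Return True if this attempt must first pass an ATT (e.g. a CAPTCHA)."""
    fails = failed_counts.get((src_ip, username), 0)
    if (src_ip, username) in known_pairs:   # host with a past successful login
        return fails >= FAILED_KNOWN_MAX
    return fails >= FAILED_UNKNOWN_MAX

known = {("10.0.0.5", "alice")}
fails = {("10.0.0.5", "alice"): 2, ("198.51.100.7", "alice"): 1}
print(requires_att("10.0.0.5", "alice", known, fails))      # False: known machine
print(requires_att("198.51.100.7", "alice", known, fails))  # True: unknown host
```

This captures the asymmetry in the abstract: legitimate users on familiar machines get several free failed attempts, while an unknown remote host is challenged after a single one.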
9. Enhanced Privacy ID: A Direct Anonymous Attestation Scheme with Enhanced Revocation Capabilities
Direct Anonymous Attestation (DAA) is a scheme that enables the remote authentication of a Trusted Platform Module (TPM) while preserving the user's privacy. A TPM can prove to a remote party that it is a valid TPM without revealing its identity and without linkability. In the DAA scheme, a TPM can be revoked only if the DAA private key in the hardware has been extracted and published widely so that verifiers obtain the corrupted private key. If the unlinkability requirement is relaxed, a TPM suspected of being compromised can be revoked even if the private key is not known. However, with the full unlinkability requirement intact, if a TPM has been compromised but its private key has not been distributed to verifiers, the TPM cannot be revoked. Furthermore, a TPM cannot be revoked by the issuer if the TPM is found to be compromised after the DAA issuing has occurred. In this paper, we present a new DAA scheme called the Enhanced Privacy ID (EPID) scheme that addresses the above limitations. While still providing unlinkability, our scheme provides a method to revoke a TPM even if the TPM private key is unknown. This expanded revocation property makes the scheme useful for other applications such as driver's licenses. Our EPID scheme is efficient and provably secure in the same security model as DAA, i.e., in the random oracle model under the strong RSA assumption and the decisional Diffie-Hellman assumption.
10. Balancing the Tradeoffs between Query Delay and Data Availability in MANETs
In mobile ad hoc networks (MANETs), nodes move freely and link/node failures are common, which leads to frequent network partitions. When a network partition occurs, mobile nodes in one partition are not able to access data hosted by nodes in other partitions, which significantly degrades the performance of data access. To deal with this problem, we apply data replication techniques. Existing data replication solutions in both wired and wireless networks aim at either reducing the query delay or improving the data availability, but not both. As both metrics are important for mobile nodes, we propose schemes to balance the tradeoffs between data availability and query delay under different system settings and requirements. Extensive simulation results show that the proposed schemes can achieve a balance between these two metrics and provide satisfying system performance.
11. M-Score: A Misuseability Weight Measure
Detecting and preventing data leakage and data misuse poses a serious challenge for
organizations, especially when dealing with insiders with legitimate permissions to access the
organization's systems and its critical data. In this paper, we present a new concept, Misuseability
Weight, for estimating the risk emanating from data exposed to insiders. This concept focuses on
assigning a score that represents the sensitivity level of the data exposed to the user and by that
predicts the ability of the user to maliciously exploit this data. Then, we propose a new measure, the
M-score, which assigns a misuseability weight to tabular data, discuss some of its properties, and
demonstrate its usefulness in several leakage scenarios. One of the main challenges in applying the
M-score measure is in acquiring the required knowledge from a domain expert. Therefore, we
present and evaluate two approaches toward eliciting misuseability conceptions from the domain
expert.
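A toy version of assigning a misuseability weight to tabular data might look like this. The per-attribute weights stand in for the expert-elicited sensitivity values the paper requires, and the real M-score also accounts for record distinguishability, which is omitted here.

```python
def m_score(rows, weights):
    """Toy misuseability score: number of exposed records times the
    sensitivity of the worst (most sensitive) row. Each row is the set
    of attribute names it exposes."""
    row_scores = [sum(weights.get(col, 0) for col in row) for row in rows]
    return len(rows) * max(row_scores, default=0)

# Hypothetical expert-assigned attribute sensitivities.
weights = {"ssn": 10, "salary": 6, "name": 1}

exposed = [{"name", "salary"}, {"name", "ssn", "salary"}]
print(m_score(exposed, weights))  # 2 records x worst-row sensitivity 17 = 34
```

Under such a measure, a query result exposing many highly sensitive attributes scores higher, and therefore flags a greater risk of insider misuse, than a large but innocuous result.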
12. Recommendation Models for Open Authorization
Major online platforms such as Facebook, Google, and Twitter allow third-party applications
such as games, and productivity applications access to user online private data. Such accesses
must be authorized by users at installation time. The Open Authorization protocol (OAuth) was
introduced as a secure and efficient method for authorizing third-party applications without
releasing a user's access credentials. However, OAuth implementations don't provide the necessary fine-grained access control, nor any recommendations, i.e., which access control
decisions are most appropriate. We propose an extension to the OAuth 2.0 authorization that
enables the provisioning of fine-grained authorization recommendations to users when granting
permissions to third-party applications. We propose a multicriteria recommendation model that
utilizes application-based, user-based, and category-based collaborative filtering mechanisms.
Our collaborative filtering mechanisms are based on previous user decisions, and application
permission requests to enhance the privacy of the overall site's user population. We
implemented our proposed OAuth extension as a browser extension that allows users to easily
configure their privacy settings at application installation time, provides recommendations on
requested privacy permissions, and collects data regarding user decisions. Our experiments on
the collected data indicate that the proposed framework efficiently enhanced the user
awareness and privacy related to third-party application authorizations.
14. Outsourced Similarity Search on Metric Data Assets
This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the most similar data objects to a query example. Outsourcing offers the data owner scalability and a low initial investment.
15. Multiparty Access Control for Online Social Networks: Model and Mechanisms
In this paper, we propose an approach to enable the protection of shared data associated with multiple users in online social networks. We formulate an access control model to capture the essence of multiparty authorization requirements, along with a multiparty policy specification scheme and a policy enforcement mechanism.
16. A Query Formulation Language for the Data Web
We present a query formulation language (called MashQL) in order to easily query
and fuse structured data on the web. The main novelty of MashQL is that it allows people
with limited IT skills to explore and query one (or multiple) data sources without prior
knowledge about the schema, structure, vocabulary, or any technical details of these sources.
More importantly, to be robust and cover most cases in practice, we do not assume that a data
source should have - an offline or inline - schema. This poses several language-design and
performance complexities that we fundamentally tackle. To illustrate the query formulation
power of MashQL, and without loss of generality, we chose the Data web scenario. We also
chose querying RDF, as it is the most primitive data model; hence, MashQL can be similarly
used for querying relational databases and XML. We present two implementations of
MashQL, an online mashup editor, and a Firefox add-on. The former illustrates how MashQL
can be used to query and mash up the Data web as simply as filtering and piping web feeds;
and the Firefox add-on illustrates using the browser as a web composer rather than only a
navigator. To end, we evaluate MashQL on querying two data sets, DBLP and DBPedia, and
show that our indexing techniques allow instant user interaction.
17. Incremental Information Extraction Using Relational Databases
Information extraction systems are traditionally implemented as a pipeline of special-purpose processing modules targeting the extraction of a particular kind of information. A major drawback of such an approach is that whenever a new extraction goal
emerges or a module is improved, extraction has to be reapplied from scratch to the entire
text corpus even though only a small part of the corpus might be affected. In this paper, we
describe a novel approach for information extraction in which extraction needs are expressed
in the form of database queries, which are evaluated and optimized by database systems.
Using database queries for information extraction enables generic extraction and minimizes
reprocessing of data by performing incremental extraction to identify which part of the data
is affected by the change of components or goals. Furthermore, our approach provides
automated query generation components so that casual users do not have to learn the query
language in order to perform extraction. To demonstrate the feasibility of our incremental
extraction approach, we performed experiments to highlight two important aspects of an
information extraction system: efficiency and quality of extraction results. Our experiments
show that in the event of deployment of a new module, our incremental extraction approach
reduces the processing time by 89.64 percent as compared to a traditional pipeline approach.
By applying our methods to a corpus of 17 million biomedical abstracts, our experiments
show that the query performance is efficient for real-time applications. Our experiments also
revealed that our approach achieves high quality extraction results.
18. Load-Balancing Multipath Switching System with Flow Slice
Multipath Switching systems (MPS) are intensely used in state-of-the-art core routers
to provide terabit or even petabit switching capacity. One of the most intractable issues in
designing MPS is how to load balance traffic across its multiple paths while not disturbing
the intraflow packet orders. Previous packet-based solutions either suffer from delay
penalties or lead to O(N^2) hardware complexity, and hence do not scale. Flow-based hashing
algorithms also perform badly due to the heavy-tailed flow-size distribution. In this paper, we
develop a novel scheme, namely, Flow Slice (FS) that cuts off each flow into flow slices at
every interflow interval larger than a slicing threshold and balances the load on a finer
granularity. Based on the studies of tens of real Internet traces, we show that setting a slicing
threshold of 1-4 ms, the FS scheme achieves comparative load balancing performance to the
optimal one. It also limits the probability of out-of-order packets to a negligible level (10^-6)
on three popular MPSes at the cost of little hardware complexity and an internal speedup up
to two. These results are proven by theoretical analyses and also validated through trace-driven prototype simulations.
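The slicing rule itself is simple and can be sketched as below. Path selection here is least-loaded rather than the hash-based choice the scheme uses (an assumption for clarity), but the ordering guarantee is the same: every packet inside one slice travels on one path.

```python
SLICE_THRESHOLD = 0.002  # seconds; the paper suggests a 1-4 ms threshold

def slice_flow(timestamps, threshold=SLICE_THRESHOLD):
    """Cut one flow's packet timestamps into slices: a new slice starts
    whenever the inter-packet gap exceeds the slicing threshold."""
    slices, current = [], [timestamps[0]]
    for prev, ts in zip(timestamps, timestamps[1:]):
        if ts - prev > threshold:
            slices.append(current)
            current = []
        current.append(ts)
    slices.append(current)
    return slices

def assign_paths(slices, n_paths):
    """Send each slice to the currently least-loaded path (illustrative
    policy; packets within a slice stay in order on that path)."""
    load = [0] * n_paths
    assignment = []
    for s in slices:
        path = load.index(min(load))
        load[path] += len(s)
        assignment.append(path)
    return assignment

ts = [0.000, 0.0005, 0.001, 0.010, 0.0105, 0.020]
slices = slice_flow(ts)
print(len(slices), assign_paths(slices, 2))
```

Because the gap between slices already exceeds the threshold, spreading different slices over different paths rarely reorders packets at the receiver, which is the key observation behind the negligible out-of-order probability.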
19. Application study of online education platform based on cloud computing
Aimed at some problems in network education resources construction at present,
we analyze the characteristics and application range of cloud computing, and present an
integrated solving scheme. On that basis, some critical technologies such as the cloud
storage, streaming media and cloud safety are analyzed in detail. Finally, the paper gives
summarization and expectation.
20. Towards temporal access control in cloud computing
Access control is one of the most important security mechanisms in cloud computing.
Attribute-based access control provides a flexible approach that allows data owners to
integrate data access policies within the encrypted data. However, little work has been done
to explore temporal attributes in specifying and enforcing the data owner's policy and the
data user's privileges in cloud-based environments. In this paper, we present an efficient
temporal access control encryption scheme for cloud services with the help of cryptographic
integer comparisons and a proxy-based re-encryption mechanism on the current time. We
also provide a dual comparative expression of integer ranges to extend the power of attribute
expression for implementing various temporal constraints. We prove the security strength of
the proposed scheme and our experimental results not only validate the effectiveness of our
scheme, but also show that the proposed integer comparison scheme performs significantly
better than previous bitwise comparison schemes.
21. Cloud intelligent track: Risk analysis and privacy data management in cloud computing
Cloud computing is a computing platform with the backbone of the internet, storing and accessing data and applications which reside in the cloud, not on the user's computer. The biggest issues which should be addressed in cloud computing are security and privacy. Outsourcing data to other companies worries internet clients about the privacy of their data. Most enterprise executives hesitate to use cloud computing systems due to their sensitive enterprise information. This paper provides data integrity and user privacy through a cloud intelligent track system. It discusses previous experiments done on privacy and data management, and proposes an architecture which provides intelligent tracking in a Privacy Manager and a Risk Manager to address the privacy issues which rule the cloud environment.
22. Measurement and utilization of customer-provided resources for cloud computing
Recent years have witnessed cloud computing as an efficient means for providing
resources as a form of utility. Driven by the strong demands, such industrial leaders as
Amazon, Google, and Microsoft have all offered practical cloud platforms, mostly datacenter
based. These platforms are known to be powerful and cost-effective. Yet, as the cloud
customers are pure consumers, their local resources, though abundant, have been largely
ignored. In this paper, we for the first time investigate a novel customer-provided cloud
platform, Spot Cloud, through extensive measurements. Complementing data centers, Spot
Cloud enables customers to contribute/sell their private resources to collectively offer cloud
services. We find that, although the capacity as well as the availability of this platform is not
yet comparable to enterprise datacenters, Spot Cloud can provide very flexible services to
customers in terms of both performance and pricing. It is friendly to the customers who often
seek to run short-term and customized tasks at minimum costs. However, different from the
standardized enterprise instances, Spot Cloud instances are highly diverse, which greatly increases the difficulty of instance selection. To solve this problem, we propose an instance
recommendation mechanism for cloud service providers to recommend short-listed instances
to the customers. Our model analysis and the real world experiments show that it can help the
customers to find the best tradeoff between benefit and cost.
23. Improving public auditability and data possession in data storage security for cloud computing
Cloud computing is an Internet-based technology where users can subscribe to high-quality services from data and software that reside solely in remote servers. This
provides many benefits for the users to create and store data in the remote servers thereby
utilizing fewer resources in the client system. However, management of the data and software may not be fully trustworthy, which poses many security challenges. One of the
security issues is the data storage security where frequent integrity checking of remotely
stored data is carried out. RSA based storage security (RSASS) method uses public
auditing of the remote data by improving existing RSA based signature generation. This
public key cryptography technique is widely used for providing strong security. Using
this RSASS method, the data storage correctness is assured and identification of
misbehaving server with high probability is achieved. This method also supports dynamic
operation on the data and tries to reduce the server computation time. The preliminary results achieved through RSASS show that the proposed scheme outperforms existing methods with improved security in data storage.
24. Implementation of a MapReduce-based image conversion module in a cloud computing environment
In recent years, the rapid advancement of the Internet and the growing number of
people using social networking services (SNSs) have facilitated the sharing of
multimedia data. However, multimedia data processing techniques such as transcoding
and transmoding impose a considerable burden on the computing infrastructure as the
amount of data increases. Therefore, we propose a MapReduce-based image-conversion
module in cloud computing environment in order to reduce the burden of computing
power. The proposed module consists of two parts: a storage system, i.e., Hadoop
distributed file system (HDFS) for image data and a MapReduce program with a Java
Advanced Imaging (JAI) library for image transcoding. It can process image data in
distributed and parallel cloud computing environments, thereby minimizing the
computing infrastructure overhead. In this paper, we describe the implementation of the
proposed module using Hadoop and JAI. In addition, we evaluate the proposed module in
terms of processing time under varying experimental conditions.
25. Distributed α-Optimal User Association and Cell Load Balancing in Wireless Networks
In this paper, we develop a framework for user association in infrastructure-based
wireless networks, specifically focused on flow-level cell load balancing under spatially
inhomogeneous traffic distributions. Our work encompasses several different user association policies: rate-optimal, throughput-optimal, delay-optimal, and load-equalizing,
which we collectively denote α-optimal user association. We prove that the optimal load
vector ρ* that minimizes a generalized system performance function is the fixed point of a
certain mapping. Based on this mapping, we propose and analyze an iterative distributed user
association policy that adapts to spatial traffic loads and converges to a globally optimal
allocation. We then address admission control policies for the case where the system
is overloaded. For an appropriate system-level cost function, the optimal admission control
policy blocks all flows at cell edges. However, providing a minimum level of connectivity
to all spatial locations might be desirable. To this end, a location-dependent random blocking
and user association policy are proposed.
26. Ensuring Distributed Accountability for Data Sharing in the Cloud
Cloud computing enables highly scalable services to be easily consumed over the Internet on
an as-needed basis. A major feature of the cloud services is that users' data are usually processed
remotely in unknown machines that users do not own or operate. While enjoying the convenience
brought by this new emerging technology, users’ fears of losing control of their own data
(particularly, financial and health data) can become a significant barrier to the wide adoption of
cloud services. To address this problem, here, we propose a novel highly decentralized information
accountability framework to keep track of the actual usage of the users’ data in the cloud. In
particular, we propose an object-centered approach that enables enclosing our logging mechanism
together with users’ data and policies. We leverage the JAR programmable capabilities to both
create a dynamic and traveling object, and to ensure that any access to users’ data will trigger
authentication and automated logging local to the JARs. To strengthen users' control, we also
provide distributed auditing mechanisms. We provide extensive experimental studies that
demonstrate the efficiency and effectiveness of the proposed approaches.
27. A Learning-Based Approach to Reactive Security
Despite the conventional wisdom that proactive security is superior to reactive security, we
show that reactive security can be competitive with proactive security as long as the reactive
defender learns from past attacks instead of myopically overreacting to the last attack. Our game-
theoretic model follows common practice in the security literature by making worst case
assumptions about the attacker: we grant the attacker complete knowledge of the defender's
strategy and do not require the attacker to act rationally. In this model, we bound the competitive
ratio between a reactive defense algorithm (which is inspired by online learning theory) and the best
fixed proactive defense. Additionally, we show that,
unlike proactive defenses, this reactive strategy is robust to a lack of information about the
attacker's incentives and knowledge.
28. Persuasive Cued Click-Points: Design, Implementation, and Evaluation of a Knowledge-Based Authentication Mechanism
This paper presents an integrated evaluation of the Persuasive Cued Click-Points
graphical password scheme, including usability and security evaluations, and
implementation considerations. An important usability goal for knowledge-based
authentication systems is to support users in selecting passwords of higher security, in the
sense of being from an expanded effective security space. We use persuasion to influence
user choice in click-based graphical passwords, encouraging users to select more random,
and hence more difficult to guess, click-points.
29. A Methodology for Direct and Indirect Discrimination Prevention in Data Mining
In this paper, we tackle discrimination prevention in data mining and propose new
techniques applicable for direct or indirect discrimination prevention, individually or both
at the same time. We discuss how to clean training datasets and outsourced datasets in
such a way that direct and/or indirect discriminatory decision rules are converted to
legitimate (non-discriminatory) classification rules.
30. Prediction of User’s Web-Browsing Behavior: Application of Markov Model
Predicting a user's behavior while surfing the Internet can be applied effectively in various critical applications. Such applications have a traditional tradeoff between modeling complexity and prediction accuracy. In this paper, we analyze and study the Markov model and the all-Kth Markov model in Web prediction. We propose a new modified Markov
model to alleviate the issue of scalability in the number of paths.
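A first-order Markov predictor over click paths can be sketched as follows; session extraction and the higher-order contexts of the all-Kth variant are omitted, and the page names are illustrative.

```python
from collections import Counter, defaultdict

def train(sessions):
    """Count page-to-page transitions over all browsing sessions."""
    model = defaultdict(Counter)
    for path in sessions:
        for cur, nxt in zip(path, path[1:]):
            model[cur][nxt] += 1
    return model

def predict(model, page):
    """Predict the most frequently following page, or None if unseen."""
    nxt = model.get(page)
    return nxt.most_common(1)[0][0] if nxt else None

sessions = [
    ["home", "catalog", "item", "cart"],
    ["home", "catalog", "search"],
    ["home", "catalog", "item"],
]
model = train(sessions)
print(predict(model, "catalog"))  # "item" follows "catalog" twice, "search" once
```

The scalability issue the abstract mentions comes from storing one context per observed path prefix in higher-order models; a first-order model like this keeps only pairwise transitions.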
31. Query Planning for Continuous Aggregation Queries over a Network of Data
Aggregators
We present a low-cost, scalable technique to answer continuous aggregation queries using a
network of aggregators of dynamic data items. In such a network of data aggregators, each
data aggregator serves a set of data items at specific coherencies.
32. Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
Preparing a data set for analysis is generally the most time consuming task in a data
mining project, requiring many complex SQL queries, joining tables, and aggregating
columns. Existing SQL aggregations have limitations to prepare data sets because they return
one column per aggregated group. In general, a significant manual effort is required to build
data sets, where a horizontal layout is required. We propose simple, yet powerful, methods to
generate SQL code to return aggregated columns in a horizontal tabular layout, returning a
set of numbers instead of one number per row. This new class of functions is called
horizontal aggregations. Horizontal aggregations build data sets with a horizontal
denormalized layout (e.g., point-dimension, observation-variable, instance-feature), which is
the standard layout required by most data mining
algorithms. We propose three fundamental methods to evaluate horizontal aggregations:
CASE: Exploiting the programming CASE construct; SPJ: Based on standard relational
algebra operators (SPJ queries); PIVOT: Using the PIVOT operator, which is offered by
some DBMSs.
Experiments with large tables compare the proposed query evaluation methods. Our CASE
method has similar speed to the PIVOT operator, and it is much faster than the SPJ
method. In general, the CASE and PIVOT methods exhibit linear scalability, whereas the
SPJ method does not.
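The CASE method can be illustrated directly in standard SQL; a toy sketch run through sqlite3, where the table, column names, and values are invented:

```python
import sqlite3

# Toy fact table: one row per (customer, product, amount).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (customer TEXT, product TEXT, amount INT)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [("ann", "tv", 100), ("ann", "radio", 40), ("bob", "tv", 70)])

# CASE method: one output column per product value, one row per customer --
# a horizontal layout suitable as a data mining input.
rows = con.execute("""
    SELECT customer,
           SUM(CASE WHEN product = 'tv'    THEN amount ELSE 0 END) AS tv,
           SUM(CASE WHEN product = 'radio' THEN amount ELSE 0 END) AS radio
    FROM sales
    GROUP BY customer
    ORDER BY customer
""").fetchall()
print(rows)  # -> [('ann', 100, 40), ('bob', 70, 0)]
```

The PIVOT variant expresses the same reshaping with a dedicated operator on DBMSs that offer it; the SPJ variant builds it from joins over per-value projections.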
33. Enabling Multilevel Trust in Privacy Preserving Data Mining
Privacy Preserving Data Mining (PPDM) addresses the problem of developing
accurate models about aggregated data without access to precise information in
individual data records. A widely studied perturbation-based PPDM approach introduces
random perturbation to individual values to preserve privacy before data are published.
Previous solutions of this approach are limited in their tacit assumption of single-level
trust on data miners. In this work, we relax this assumption and expand the scope of
perturbation-based PPDM to Multilevel Trust (MLT-PPDM). In our setting, the more
trusted a data miner is, the less perturbed copy of the data it can access. Under this
setting, a malicious data miner may have access to differently perturbed copies of the
same data through various means, and may combine these diverse copies to jointly infer
additional information about the original data that the data owner does not intend to
release. Preventing such diversity attacks is the key challenge of providing MLT-PPDM
services. We address this challenge by properly correlating perturbation across copies at
different trust levels. We prove that our solution is robust against diversity attacks with
respect to our privacy goal. That is, for data miners who have access to an arbitrary
collection of the perturbed copies, our solution prevents them from jointly reconstructing
the original data more accurately than the best effort using any individual copy in the
collection. Our solution allows a data owner to generate perturbed copies of its data for
arbitrary trust levels on demand. This feature offers data owners maximum flexibility.
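The nested-noise idea can be sketched as follows; this is only one simple way to correlate perturbation across trust levels, not the paper's exact construction, and the function names and parameters are invented:

```python
import random

def perturbed_copies(values, variances, seed=7):
    """Generate one perturbed copy per trust level (variances sorted ascending,
    most trusted first). Noise is nested: a less trusted miner's copy reuses the
    trusted copy's noise plus extra independent noise, so combining copies tells
    an attacker nothing beyond the least perturbed copy already held."""
    rng = random.Random(seed)            # fixed seed only to make the sketch reproducible
    noise = [0.0] * len(values)
    copies, prev_var = [], 0.0
    for var in variances:
        extra_sd = (var - prev_var) ** 0.5   # add just enough variance to reach `var`
        noise = [n + rng.gauss(0.0, extra_sd) for n in noise]
        copies.append([v + n for v, n in zip(values, noise)])
        prev_var = var
    return copies

# copies[0] goes to the most trusted miner, copies[-1] to the least trusted.
copies = perturbed_copies([10.0, 20.0, 30.0], variances=[0.5, 2.0, 8.0])
```

Because each less-perturbed copy's noise is literally contained in the more-perturbed one, averaging several copies cannot reduce the noise below the best single copy, which is the diversity-attack robustness the paper proves for its construction.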
34. A Genetic Programming Approach to Record Deduplication
Several systems that rely on consistent data to offer high-quality services, such as
digital libraries and e-commerce brokers, may be affected by the existence of duplicates,
quasi replicas, or near-duplicate entries in their repositories. Because of that, there have
been significant investments from private and government organizations for developing
methods for removing replicas from their data repositories. This is due to the fact that clean
and replica-free repositories not only allow the retrieval of higher quality information but
also lead to more concise data and to potential savings in computational time and
resources to process this data. In this paper, we propose a genetic programming approach
to record deduplication that combines several different pieces of evidence extracted from
the data content to find a deduplication function that is able to identify whether two
entries in a repository are replicas or not. As shown by our experiments, our approach
outperforms an existing state-of-the-art method found in the literature. Moreover, the
suggested functions are computationally less demanding since they use fewer pieces of
evidence. In addition, our genetic programming approach is capable of automatically
adapting these functions to a given fixed replica identification boundary, freeing the
user from the burden of having to choose and tune this parameter.
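What such an evolved deduplication function looks like can be sketched by hand; the similarity measures, field names, and weights below are invented stand-ins for what genetic programming would actually search over:

```python
def jaccard(a, b):
    """Word-level Jaccard similarity -- one simple piece of evidence."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def is_replica(rec1, rec2, threshold=0.75):
    """Hand-written stand-in for an evolved deduplication function: combine
    per-field evidence into one score and compare to a fixed boundary.
    GP would evolve the combination; the 0.6/0.4 weights are arbitrary here."""
    score = 0.6 * jaccard(rec1["name"], rec2["name"]) + \
            0.4 * (1.0 if rec1["email"] == rec2["email"] else 0.0)
    return score >= threshold

a = {"name": "John A. Smith", "email": "js@x.org"}
b = {"name": "John Smith", "email": "js@x.org"}
print(is_replica(a, b))  # -> True  (score 0.8: name overlap 2/3, emails equal)
```

The paper's point is that the expression combining the evidence, not the individual similarity measures, is what gets evolved and adapted to the chosen boundary.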
35. A Probabilistic Scheme for Keyword-Based Incremental Query Construction
Databases enable users to precisely express their informational needs using
structured queries. However, database query construction is a laborious and error-prone
process, which cannot be performed well by most end users. Keyword search alleviates
the usability problem at the price of query expressiveness. As keyword search algorithms
do not differentiate between the possible informational needs represented by a keyword
query, users may not receive adequate results. This paper presents IQP - a novel approach
to bridge the gap between usability of keyword search and expressiveness of database
queries. IQP enables a user to start with an arbitrary keyword query and incrementally
refine it into a structured query through an interactive interface. The enabling techniques
of IQP include: 1) a probabilistic framework for incremental query construction; 2) a
probabilistic model to assess the possible informational needs represented by a keyword
query; 3) an algorithm to obtain the optimal query construction process. This paper
presents the detailed design of IQP, and demonstrates its effectiveness and scalability
through experiments over real-world data and a user study.
36. Effective Pattern Discovery for Text Mining
Many data mining techniques have been proposed for mining useful patterns in text
documents. However, how to effectively use and update discovered patterns is still
an open research issue, especially in the domain of text mining. Since most existing text
mining methods adopted term-based approaches, they all suffer from the problems of
polysemy and synonymy. Over the years, people have often held the hypothesis that
pattern (or phrase)-based approaches should perform better than the term-based ones, but
many experiments do not support this hypothesis. This paper presents an innovative and
effective pattern discovery technique which includes the processes of pattern deploying
and pattern evolving, to improve the effectiveness of using and updating discovered
patterns for finding relevant and interesting information. Substantial experiments on
RCV1 data collection and TREC topics demonstrate that the proposed solution achieves
encouraging performance.
37. Efficient Fuzzy Type-Ahead Search in XML Data
In a traditional keyword-search system over XML data, a user composes a keyword
query, submits it to the system, and retrieves relevant answers. In the case where the user
has limited knowledge about the data, often the user feels “left in the dark” when issuing
queries, and has to use a try-and-see approach for finding information. In this paper, we
study fuzzy type-ahead search in XML data, a new information-access paradigm in which
the system searches XML data on the fly as the user types in query keywords. It allows
users to explore data as they type, even in the presence of minor errors of their keywords.
Our proposed method has the following features: 1) Search as you type: it extends
autocomplete by supporting queries with multiple keywords in XML data. 2) Fuzzy: it
can find high-quality answers that have keywords matching query keywords
approximately. 3) Efficient: our effective index structures and searching algorithms can
achieve a very high interactive speed. We study
research challenges in this new search framework. We propose effective index structures
and top-k algorithms to achieve a high interactive speed. We examine effective ranking
functions and early termination techniques to progressively identify the top-k relevant
answers. We have implemented our method on real data sets, and the experimental results
show that our method achieves high search efficiency and result quality.
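Fuzzy prefix matching, the kernel of type-ahead search, can be sketched without the paper's index structures; a brute-force illustration (real systems need trie-based indexes and top-k pruning to reach interactive speed):

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance, single rolling row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # delete from a
                                     dp[j - 1] + 1,    # insert into a
                                     prev + (ca != cb))  # substitute
    return dp[-1]

def fuzzy_complete(prefix, words, max_errors=1):
    """Return words whose same-length prefix is within `max_errors` edits --
    so a mistyped prefix still finds its completions."""
    return [w for w in words
            if edit_distance(prefix, w[:len(prefix)]) <= max_errors]

words = ["database", "datamining", "xml", "keyword"]
print(fuzzy_complete("datb", words))  # -> ['database', 'datamining']
```

In the paper's setting the candidates are XML nodes rather than a flat word list, and ranking plus early termination selects the top-k answers as each keystroke arrives.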
38. Mining Online Reviews for Predicting Sales Performance: A Case Study
in the Movie Domain
Posting reviews online has become an increasingly popular way for people to express
opinions and sentiments toward the products bought or services received. Analyzing the
large volume of online reviews available would produce useful actionable knowledge that
could be of economic value to vendors and other interested parties. In this paper, we
conduct a case study in the movie domain, and tackle the problem of mining reviews for
predicting product sales performance. Our analysis shows that both the sentiments
expressed in the reviews and the quality of the reviews have a significant impact on the
future sales performance of products in question. For the sentiment factor, we propose
Sentiment PLSA (S-PLSA), in which a review is considered as a document generated by
a number of hidden sentiment factors, in order to capture the complex nature of
sentiments. Training an S-PLSA model enables us to obtain a succinct summary of the
sentiment information embedded in the reviews. Based on S-PLSA, we propose ARSA,
an Autoregressive Sentiment-Aware model for sales prediction. We then seek to further
improve the accuracy of prediction by considering the quality factor, with a focus on
predicting the quality of a review in the absence of user-supplied indicators, and present
ARSQA, an Autoregressive Sentiment and Quality Aware model, to utilize sentiments
and quality for predicting product sales performance. Extensive experiments conducted
on a large movie data set confirm the effectiveness of the proposed approach.
39. Cloud Computing Security: From Single to Multi-Clouds
The use of cloud computing has increased rapidly in many organizations. Cloud computing
provides many benefits in terms of low cost and accessibility of data. Ensuring the security of
cloud computing is a major factor in the cloud computing environment, as users often store
sensitive information with cloud storage providers but these providers may be untrusted.
Dealing with “single cloud” providers is predicted to become less popular with customers
due to risks of service availability failure and the possibility of malicious insiders in the
single cloud. A movement towards “multi-clouds”, or in other words, “interclouds” or
“cloud-of-clouds”, has emerged recently. This paper surveys recent research related to single
and multi-cloud security and addresses possible solutions. It is found that the research into
the use of multi-cloud providers to maintain security has received less attention from the
research community than has the use of single clouds. This work aims to promote the use of
multi-clouds due to their ability to reduce security risks that affect the cloud computing user.
40. Optimization of Resource Provisioning Cost in Cloud Computing
In cloud computing, cloud providers can offer cloud consumers two provisioning
plans for computing resources, namely reservation and on-demand plans. In general, the
cost of computing resources provisioned under the reservation plan is lower than under
the on-demand plan, since the cloud consumer pays the provider in advance.
With the reservation plan, the consumer can reduce the total resource provisioning cost.
However, the best advance reservation of resources is difficult to achieve due to
uncertainty of consumer's future demand and providers' resource prices. To address this
problem, an optimal cloud resource provisioning (OCRP) algorithm is proposed by
formulating a stochastic programming model. The OCRP algorithm can provision
computing resources for use in multiple provisioning stages as well as a long-term
plan, e.g., four stages in a quarter plan and twelve stages in a yearly plan. The demand
and price uncertainty is considered in OCRP. In this paper, different approaches to obtain
the solution of the OCRP algorithm are considered including deterministic equivalent
formulation, sample-average approximation, and Benders decomposition. Numerical
studies are extensively performed in which the results clearly show that with the OCRP
algorithm, cloud consumer can successfully minimize total cost of resource provisioning
in cloud computing environments.
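The reservation decision can be sketched in its deterministic equivalent, where demand is assumed known; prices and demands below are invented, and the real OCRP handles demand and price uncertainty via stochastic programming rather than this brute force:

```python
def provisioning_cost(reserved, demands, reserve_price, ondemand_price):
    """Total cost when `reserved` instances are paid for up front in every stage
    and any excess demand is met at the (higher) on-demand rate."""
    return sum(reserved * reserve_price + max(d - reserved, 0) * ondemand_price
               for d in demands)

def best_reservation(demands, reserve_price, ondemand_price):
    """Deterministic equivalent, brute-forced: try every reservation level."""
    return min(range(max(demands) + 1),
               key=lambda r: provisioning_cost(r, demands, reserve_price, ondemand_price))

demands = [4, 10, 6, 8]          # e.g., four stages of a quarterly plan
r = best_reservation(demands, reserve_price=2, ondemand_price=5)
print(r, provisioning_cost(r, demands, 2, 5))  # -> 8 74
```

Reserving too little forces expensive on-demand top-ups; reserving too much wastes prepaid capacity. The stochastic version optimizes this trade-off over demand scenarios instead of a single known demand vector.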
41. A Secure Erasure Code-Based Cloud Storage System with Secure Data
Forwarding
A cloud storage system, consisting of a collection of storage servers, provides long-term
storage services over the Internet. Storing data in a third party’s cloud system causes serious
concern over data confidentiality. General encryption schemes protect data confidentiality,
but also limit the functionality of the storage system because only a few operations are supported
over encrypted data. Constructing a secure storage system that supports multiple functions is
challenging when the storage system is distributed and has no central authority. We propose
a threshold proxy re-encryption scheme and integrate it with a decentralized erasure code
such that a secure distributed storage system is formulated. The distributed storage system
not only supports secure and robust data storage and retrieval, but also lets a user forward his
data in the storage servers to another user without retrieving the data back. The main
technical contribution is that the proxy re-encryption scheme supports encoding operations
over encrypted messages as well as forwarding operations over encoded and encrypted
messages. Our method fully integrates encrypting, encoding, and forwarding. We analyze
and suggest suitable parameters for the number of copies of a message dispatched to storage
servers and the number of storage servers queried by a key server. These parameters allow
more flexible adjustment between the number of storage servers.
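The erasure-coding side can be illustrated with a toy single-parity code; this tolerates the loss of any one block and deliberately omits the paper's decentralized code and proxy re-encryption, so it is only a sketch of the redundancy idea:

```python
from functools import reduce

def xor(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(blocks):
    """Append one parity block (XOR of all data blocks) so that any single
    lost block, data or parity, can be rebuilt from the survivors."""
    return blocks + [reduce(xor, blocks)]

def recover(stored, missing_index):
    """Rebuild the block at `missing_index` by XOR-ing all surviving blocks."""
    survivors = [b for i, b in enumerate(stored) if i != missing_index]
    return reduce(xor, survivors)

data = [b"abcd", b"efgh", b"ijkl"]
stored = encode(data)            # dispatch these four blocks to storage servers
assert recover(stored, 1) == b"efgh"   # lose block 1, rebuild it from the rest
```

A real decentralized erasure code tolerates multiple losses and, in the paper's scheme, operates over ciphertext so that encoding, re-encryption, and forwarding compose.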
42. HASBE: A Hierarchical Attribute- Based Solution for Flexible and
Scalable Access Control in Cloud Computing
Cloud computing has emerged as one of the most influential paradigms in the IT industry in
recent years. Since this new computing technology requires users to entrust their valuable
data to cloud providers, there have been increasing security and privacy concerns on
outsourced data. Several schemes employing attribute-based encryption (ABE) have been
proposed for access control of outsourced data in cloud computing; however, most of them
suffer from inflexibility in implementing complex access control policies. In order to realize
scalable, flexible, and fine-grained access control of outsourced data in cloud computing,
in this paper, we propose hierarchical attribute-set based encryption (HASBE) by extending
ciphertext-policy attribute-set-based encryption (ASBE) with a hierarchical structure of
users. The proposed scheme not only achieves scalability due to its hierarchical structure, but
also inherits flexibility and fine-grained access control in supporting compound attributes of
ASBE. In addition, HASBE employs multiple value assignments for access expiration time
to deal with user revocation more efficiently than existing schemes. We formally prove the
security of HASBE based on the security of the ciphertext-policy attribute-based encryption
(CP-ABE) scheme by Bethencourt et al. and analyze its performance and computational
complexity. We implement our scheme and show that it is both efficient and flexible in
dealing with access control for outsourced data in cloud computing with comprehensive
experiments.
43. A Distributed Access Control Architecture for Cloud Computing
The large-scale, dynamic, and heterogeneous nature of cloud computing poses
numerous security challenges. But the cloud's main challenge is to provide a robust
authorization mechanism that incorporates multitenancy and virtualization aspects of
resources. The authors present a distributed architecture that incorporates principles from
security management and software engineering and propose key requirements and a
design model for the architecture.
44. Cloud Computing Security: From Single to Multi-clouds
The use of cloud computing has increased rapidly in many organizations. Cloud
computing provides many benefits in terms of low cost and accessibility of data.
Ensuring the security of cloud computing is a major factor in the cloud computing
environment, as users often store sensitive information with cloud storage providers but
these providers may be untrusted. Dealing with "single cloud" providers is predicted to
become less popular with customers due to risks of service availability failure and the
possibility of malicious insiders in the single cloud. A movement towards "multi-clouds",
or in other words, "interclouds" or "cloud-of-clouds" has emerged recently. This paper
surveys recent research related to single and multi-cloud security and addresses possible
solutions. It is found that the research into the use of multi-cloud providers to maintain
security has received less attention from the research community than has the use of
single clouds. This work aims to promote the use of multi-clouds due to their ability to
reduce security risks that affect the cloud computing user.
45. Scalable and Secure Sharing of Personal Health Records in Cloud
Computing using Attribute-based Encryption
Personal health record (PHR) is an emerging patient-centric model of health information
exchange, which is often outsourced to be stored at a third party, such as cloud providers.
However, there have been wide privacy concerns as personal health information could be
exposed to those third-party servers and to unauthorized parties. To assure patients'
control over access to their own PHRs, encrypting the PHRs before outsourcing is a
promising method. Yet, issues such as risks of privacy exposure, scalability in key
management, flexible access, and efficient user revocation have remained the most
important challenges
toward achieving fine-grained, cryptographically enforced data access control. In this paper,
we propose a novel patient-centric framework and a suite of mechanisms for data access
control to PHRs stored in semi-trusted servers. To achieve fine-grained and scalable data
access control for PHRs, we leverage attribute based encryption (ABE) techniques to encrypt
each patient’s PHR file. Different from previous works in secure data outsourcing, we focus
on the multiple data owner scenario, and divide the users in the PHR system into multiple
security domains, which greatly reduces the key management complexity for owners and users.
A high degree of patient privacy is guaranteed simultaneously by exploiting multi-authority
ABE. Our scheme also enables dynamic modification of access policies or file attributes,
supports efficient on-demand user/attribute revocation and break-glass access under
emergency scenarios. Extensive analytical and experimental results are presented which
show the security, scalability, and efficiency of our proposed scheme.
46. Secure and Practical Outsourcing of Linear Programming in Cloud
Computing
Cloud computing has great potential of providing robust computational power to the
society at reduced cost. It enables customers with limited computational resources to
outsource their large computation workloads to the cloud, and economically enjoy the
massive computational power, bandwidth, storage, and even appropriate software that can
be shared in a pay-per-use manner. Despite the tremendous benefits, security is the
primary obstacle that prevents the wide adoption of this promising computing model,
especially for customers when their confidential data are consumed and produced during
the computation. Treating the cloud as an intrinsically insecure computing platform from
the viewpoint of the cloud customers, we must design mechanisms that not only protect
sensitive information by enabling computations with encrypted data, but also protect
customers from malicious behaviors by enabling the validation of the computation result.
Such a mechanism of general secure computation outsourcing was recently shown to be
feasible in theory, but to design mechanisms that are practically efficient remains a very
challenging problem. Focusing on engineering computing and optimization tasks, this
paper investigates secure outsourcing of widely applicable linear programming (LP)
computations. In order to achieve practical efficiency, our mechanism design explicitly
decomposes the LP computation outsourcing into public LP solvers running on the cloud
and private LP parameters owned by the customer. The resulting flexibility allows us to
explore an appropriate security/efficiency tradeoff via a higher-level abstraction of LP
computations than the general circuit representation. In particular, by formulating private
data owned by the customer for LP problem as a set of matrices and vectors, we are able
to develop a set of efficient privacy-preserving problem transformation techniques, which
allow customers to transform the original LP problem into an arbitrary one while
protecting sensitive input/output information. To validate the computation result, we
further explore the fundamental duality theorem of LP computation and derive the
necessary and sufficient conditions that a correct result must satisfy. Such a result
verification mechanism is extremely efficient and incurs close-to-zero additional cost on
both cloud server and customers. Extensive security analysis and experiment results show
the immediate practicability of our mechanism design.
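The problem-transformation idea can be sketched with the simplest possible disguise, a secret positive diagonal scaling of the variables; the paper's actual scheme is considerably stronger (it also masks the constraint matrix and right-hand side), so treat this purely as an illustration with invented names:

```python
import random

def disguise_lp(c, A, b, seed=11):
    """Disguise min c.x s.t. A x <= b, x >= 0 via the secret substitution x = D y,
    where D is a random positive diagonal matrix kept by the customer. The cloud
    sees only (c2, A2, b) and solves in y-space."""
    rng = random.Random(seed)
    d = [rng.uniform(1.0, 10.0) for _ in c]                    # secret diagonal of D
    c2 = [ci * di for ci, di in zip(c, d)]                     # objective in y-space
    A2 = [[aij * dj for aij, dj in zip(row, d)] for row in A]  # constraints in y-space
    return c2, A2, b, d

def recover(y, d):
    """Map the cloud's solution y of the disguised LP back to x = D y."""
    return [yi * di for yi, di in zip(y, d)]
```

Because D is positive, y >= 0 exactly when x >= 0, and objective and constraint values agree point for point, so the cloud's optimum maps back to the customer's optimum; duality-based checks as in the paper can then verify the returned result.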
47. Secure and privacy preserving keyword searching for cloud storage
services
Cloud storage services enable users to remotely access data in a cloud anytime and
anywhere, using any device, in a pay-as-you-go manner. Moving data into a cloud offers
great convenience to users since they do not have to care about the large capital
investment in both the deployment and management of the hardware infrastructures.
However, allowing a cloud service provider (CSP), whose main purpose is making a
profit, to take custody of sensitive data raises underlying security and privacy issues.
To keep user data confidential against an untrusted CSP, a natural way is to apply
cryptographic approaches, by disclosing the data decryption key only to authorized users.
However, when a user wants to retrieve files containing certain keywords using a thin
client, the adopted encryption system should not only support keyword searching over
encrypted data, but also provide high performance. In this paper, we investigate the
characteristics of cloud storage services and propose a secure and privacy preserving
keyword searching (SPKS) scheme, which allows the CSP to participate in the
decipherment, and to return only files containing certain keywords specified by the users,
so as to reduce both the computational and communication overhead in decryption for
users, on the condition of preserving user data privacy and user querying privacy.
Performance analysis shows that the SPKS scheme is applicable to a cloud environment.
48. Data Security and Privacy Protection Issues in Cloud Computing
It is well-known that cloud computing has many potential advantages and many
enterprise applications and data are migrating to public or hybrid cloud. But regarding
some business-critical applications, the organizations, especially large enterprises, still
wouldn't move them to the cloud. The market share of cloud computing is still far
behind the one expected. From the consumers' perspective, cloud computing security
concerns, especially data security and privacy protection issues, remain the primary
inhibitor for adoption of cloud computing services. This paper provides a concise but
all-round analysis of data security and privacy protection issues associated with cloud
computing across all stages of data life cycle. Then this paper discusses some current
solutions. Finally, this paper describes future research work about data security and
privacy protection issues in cloud.
49. Application study of online education platform based on cloud computing
Aimed at current problems in the construction of network education resources, we analyze
the characteristics and application range of cloud computing, and present an integrated
solution. On that basis, some critical technologies such as cloud storage, streaming media,
and cloud security are analyzed in detail. Finally, the paper gives a summary and outlook.
50. Mining User Queries with Markov Chains: Application to Online Image Retrieval
We propose a novel method for automatic annotation, indexing and annotation-based
retrieval of images. The new method, that we call Markovian Semantic Indexing (MSI),
is presented in the context of an online image retrieval system. Assuming such a system,
the users' queries are used to construct an Aggregate Markov Chain (AMC) through
which the relevance between the keywords seen by the system is defined. The users'
queries are also used to automatically annotate the images. A stochastic distance between
images, based on their annotation and the keyword relevance captured in the AMC, is
then introduced. Geometric interpretations of the proposed distance are provided and its
relation to a clustering in the keyword space is investigated. By means of a new measure
of Markovian state similarity, the mean first cross passage time (CPT), optimality
properties of the proposed distance are proved. Images are modeled as points in a vector
space and their similarity is measured with MSI. The new method is shown to possess
certain theoretical advantages and also to achieve better Precision versus Recall results
when compared to Latent Semantic Indexing (LSI) and probabilistic Latent Semantic
Indexing (pLSI) methods in Annotation-Based Image Retrieval (ABIR) tasks.
51. Towards Trustworthy Resource Scheduling in Clouds
Managing the allocation of cloud virtual machines at physical resources is a key
requirement for the success of clouds. Current implementations of cloud schedulers do
not consider the entire cloud infrastructure, nor do they consider the overall user and
infrastructure properties. This results in major security, privacy, and resilience concerns.
In this paper, we propose a novel cloud scheduler which considers both user requirements
and infrastructure properties. We focus on assuring users that their virtual resources are
hosted using physical resources that match their requirements without getting users
involved with understanding the details of the cloud infrastructure. As a proof-of-
concept, we present our prototype which is built on OpenStack. The provided prototype
implements the proposed cloud scheduler. It also provides an implementation of our
previous work on cloud trust management which provides the scheduler with input about
the trust status of the cloud infrastructure.