IEEE Project Topics | Soham Consultants
1. Enabling Secure and Efficient Ranked Keyword Search over Outsourced Cloud Data
In this paper, we define and solve the problem of secure ranked keyword search over encrypted cloud data. Ranked search greatly enhances system usability by returning results ordered by relevance instead of undifferentiated results, and further improves file retrieval accuracy.
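As a toy illustration of why ranking matters, the sketch below builds an inverted index with term-frequency scores and returns only the top-k most relevant files. All names are illustrative; a real scheme would protect the scores with order-preserving encryption, which is omitted here.

```python
def build_index(files):
    """Map each keyword to {file_id: term-frequency score}."""
    index = {}
    for file_id, text in files.items():
        for word in text.lower().split():
            postings = index.setdefault(word, {})
            postings[file_id] = postings.get(file_id, 0) + 1
    return index

def ranked_search(index, keyword, top_k=3):
    """Return file ids ranked by relevance score, most relevant first."""
    postings = index.get(keyword.lower(), {})
    return sorted(postings, key=postings.get, reverse=True)[:top_k]

files = {
    "f1": "cloud data cloud storage",
    "f2": "cloud backup",
    "f3": "local storage",
}
index = build_index(files)
print(ranked_search(index, "cloud"))  # "cloud" appears twice in f1, so f1 ranks first
```

Returning only the top-k entries is what lets the server avoid shipping every matching (encrypted) file back to the user.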
2. A Secure Erasure Code-Based Cloud Storage System with Secure Data Forwarding
We propose a threshold proxy re-encryption scheme and integrate it with a decentralized erasure code to formulate a secure distributed storage system. The distributed storage system not only supports secure and robust data storage and retrieval, but also lets a user forward his data in the storage servers to another user without retrieving the data back.
3. SeDas: A Self-Destructing Data System Based on Active Storage Framework
Personal data stored in the Cloud may contain account numbers, passwords, notes, and other important information that could be used and misused by a miscreant, a competitor, or a court of law. These data are cached, copied, and archived by Cloud Service Providers (CSPs), often without users' authorization and control. Self-destructing data mainly aims at protecting the user data's privacy. All the data and their copies become destructed or unreadable after a user-specified time, without any user intervention. In addition, the decryption key is destructed after the user-specified time. In this paper, we present SeDas, a system that meets this challenge through a novel integration of cryptographic techniques with active storage techniques based on the T10 OSD standard. We implemented a proof-of-concept SeDas prototype. Functionality and security evaluations of the prototype demonstrate that SeDas is practical to use and meets all the described privacy-preserving goals. Compared to a system without the self-destructing data mechanism, throughput for uploading and downloading with SeDas decreases acceptably by less than 72%, while latency for upload/download operations increases by less than 60%.
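The core idea, that data becomes unreadable once its key is destroyed at a user-specified time, can be sketched as below. The XOR "cipher" and the in-process key store are stand-ins for real encryption and for the distributed active-storage nodes SeDas uses; they are assumptions for illustration only.

```python
import time

class SelfDestructingStore:
    """Toy time-bounded data store: the key is destroyed after the
    user-specified lifetime, after which the ciphertext is unrecoverable."""

    def __init__(self):
        self._keys = {}   # object_id -> (key, expiry timestamp)
        self._blobs = {}  # object_id -> ciphertext

    def put(self, object_id, data: bytes, key: bytes, lifetime_s: float):
        # XOR with a repeated key stands in for real encryption.
        self._blobs[object_id] = bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
        self._keys[object_id] = (key, time.time() + lifetime_s)

    def get(self, object_id):
        entry = self._keys.get(object_id)
        if entry is None or time.time() > entry[1]:
            self._keys.pop(object_id, None)  # destroy the expired key
            return None                      # data is now unreadable
        key, _ = entry
        blob = self._blobs[object_id]
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(blob))
```

After expiry no user intervention is needed: the next access finds the key gone and only undecryptable ciphertext remains.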
4. Privacy as a Service: Privacy-Aware Data Storage and Processing in Cloud Computing Architectures
In this paper we present PasS (Privacy as a Service), a set of security protocols for
ensuring the privacy and legal compliance of customer data in cloud computing
architectures. PasS allows for the secure storage and processing of users' confidential data
by leveraging the tamper-proof capabilities of cryptographic coprocessors. Using tamper-proof facilities provides a secure execution domain in the computing cloud that is physically and logically protected from unauthorized access. The central design goal of PasS is to maximize users' control in managing the various aspects related to the privacy of
sensitive data. This is achieved by implementing user-configurable software protection
and data privacy mechanisms. Moreover, PasS provides a privacy feedback process
which informs users of the different privacy operations applied on their data and makes
them aware of any potential risks that may jeopardize the confidentiality of their sensitive
information. To the best of our knowledge, PasS is the first practical cloud computing
privacy solution that utilizes previous research on cryptographic coprocessors to solve the
problem of securely processing sensitive data in cloud computing infrastructures.
5. Trustrace: Mining Software Repositories to Improve the Accuracy of Requirement Traceability Links
Traceability is the only means to ensure that the source code of a system is consistent
with its requirements and that all and only the specified requirements have been
implemented by developers. During software maintenance and evolution, requirement
traceability links become obsolete because developers do not/cannot devote effort to
updating them. Yet, recovering these traceability links later is a daunting and costly task
for developers. Consequently, the literature has proposed methods, techniques, and tools
to recover these traceability links semi-automatically or automatically. Among the
proposed techniques, the literature showed that information retrieval (IR) techniques can
automatically recover traceability links between free-text requirements and source code.
However, IR techniques lack accuracy (precision and recall). In this paper, we show that
mining software repositories and combining mined results with IR techniques can
improve the accuracy (precision and recall) of IR techniques and we propose Trustrace, a
trust-based traceability recovery approach. We apply Trustrace on four medium-size
open-source systems to compare the accuracy of its traceability links with those
recovered using state-of-the-art IR techniques from the literature, based on the Vector
Space Model and Jensen-Shannon model. The results of Trustrace are up to 22.7 percent
more precise and have 7.66 percent better recall values than those of the other techniques,
on average. We thus show that mining software repositories and combining the mined
data with existing results from IR techniques improves the precision and recall of
requirement traceability links.
6. Toward Secure and Dependable Storage Services in Cloud Computing
We propose in this paper a flexible distributed storage integrity auditing mechanism, utilizing the homomorphic token and distributed erasure-coded data. The proposed design
allows users to audit the cloud storage with very lightweight communication and computation
cost.
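A minimal sketch of token-based integrity auditing follows, assuming the owner keeps one random challenge vector and its precomputed token; the actual design additionally uses erasure-coded data and keyed pseudorandom coefficients, which are omitted here.

```python
import random

P = 2**31 - 1  # a public prime modulus (illustrative choice)

def make_token(blocks, coeffs):
    """Owner precomputes a lightweight verification token: a random
    linear combination of the data blocks modulo a prime."""
    return sum(c * b for c, b in zip(coeffs, blocks)) % P

def server_response(stored_blocks, coeffs):
    """Server aggregates its stored blocks under the same challenge;
    it never needs to send the blocks themselves back."""
    return sum(c * b for c, b in zip(coeffs, stored_blocks)) % P

blocks = [101, 202, 303, 404]
coeffs = [random.randrange(1, P) for _ in blocks]
token = make_token(blocks, coeffs)

assert server_response(blocks, coeffs) == token      # intact storage verifies
tampered = blocks[:]
tampered[2] += 1                                     # simulate corruption
assert server_response(tampered, coeffs) != token    # corruption is detected
```

The communication cost is one scalar per challenge, which is what makes this style of auditing "very lightweight" compared to downloading the data.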
7. Enhanced data security model for cloud computing
Cloud computing is becoming the next-generation architecture of IT enterprise. In contrast to traditional solutions, cloud computing moves the application software and databases to large data centers, where the management of the data and services may not be fully trustworthy. This unique feature, however, raises many new security challenges which have not been well understood. In cloud computing, neither data nor software is fully contained on the user's computer; data security concerns arise because both user data and programs reside on provider premises. Clouds typically have a single security architecture but many customers with different demands. Every cloud provider solves this problem by encrypting the data using encryption algorithms. This paper investigates the basic problem of cloud computing data security. We present the data security model of cloud computing based on a study of the cloud architecture. We improve the data security model for cloud computing and implement software to support it. Finally, we apply this software to an Amazon EC2 Micro instance.
8. Revisiting Defenses against Large-Scale Online Password Guessing Attacks
Brute force and dictionary attacks on password-only remote login services are now
widespread and ever increasing. Enabling convenient login for legitimate users while preventing
such attacks is a difficult problem. Automated Turing Tests (ATTs) continue to be an effective,
easy-to-deploy approach to identify automated malicious login attempts with reasonable cost of
inconvenience to users. In this paper, we discuss the inadequacy of existing and proposed login
protocols designed to address large scale online dictionary attacks (e.g., from a botnet of
hundreds of thousands of nodes). We propose a new Password Guessing Resistant Protocol (PGRP), derived upon revisiting prior proposals designed to restrict such attacks. While PGRP
limits the total number of login attempts from unknown remote hosts to as low as a single
attempt per username, legitimate users in most cases (e.g., when attempts are made from
known, frequently-used machines) can make several failed login attempts before being
challenged with an ATT. We analyze the performance of PGRP with two real world data sets and
find it more promising than existing proposals.
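The throttling rule PGRP applies can be sketched roughly as follows. The thresholds and the known-host test are simplified assumptions; the real protocol also tracks browser cookies and per-user white lists.

```python
# PGRP-style login decision sketch (thresholds are illustrative).
FAILED_KNOWN_MAX = 3    # failed tries tolerated from known machines
FAILED_UNKNOWN_MAX = 1  # tries allowed from unknown hosts before an ATT

def requires_att(src_ip, username, known_pairs, failed_counts):
    """Return True if this attempt must first pass an ATT (e.g. a CAPTCHA)."""
    fails = failed_counts.get((src_ip, username), 0)
    if (src_ip, username) in known_pairs:   # host with a past successful login
        return fails >= FAILED_KNOWN_MAX
    return fails >= FAILED_UNKNOWN_MAX

known = {("10.0.0.5", "alice")}
fails = {("10.0.0.5", "alice"): 2, ("198.51.100.7", "alice"): 1}
print(requires_att("10.0.0.5", "alice", known, fails))      # False: known machine
print(requires_att("198.51.100.7", "alice", known, fails))  # True: unknown host
```

This captures the asymmetry in the abstract: legitimate users on familiar machines get several free failed attempts, while an unknown remote host is challenged after a single one.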
9. Enhanced Privacy ID: A Direct Anonymous Attestation Scheme with Enhanced Revocation Capabilities
Direct Anonymous Attestation (DAA) is a scheme that enables the remote authentication of a Trusted Platform Module (TPM) while preserving the user's privacy. A TPM can prove to a remote party that it is a valid TPM without revealing its identity and without linkability. In the DAA scheme, a TPM can be revoked only if the DAA private key in the hardware has been extracted and published widely so that verifiers obtain the corrupted private key. If the unlinkability requirement is relaxed, a TPM suspected of being compromised can be revoked even if the private key is not known. However, with the full unlinkability requirement intact, if a TPM has been compromised but its private key has not been distributed to verifiers, the TPM cannot be revoked. Furthermore, a TPM cannot be revoked by the issuer if the TPM is found to be compromised after the DAA issuing has occurred. In this paper, we present a new DAA scheme called the Enhanced Privacy ID (EPID) scheme that addresses the above limitations. While still providing unlinkability, our scheme provides a method to revoke a TPM even if the TPM private key is unknown. This expanded revocation property makes the scheme useful for other applications such as driver's licenses. Our EPID scheme is efficient and provably secure in the same security model as DAA, i.e., in the random oracle model under the strong RSA assumption and the decisional Diffie-Hellman assumption.
10. Balancing the Tradeoffs between Query Delay and Data Availability in MANETs
In mobile ad hoc networks (MANETs), nodes move freely and link/node failures are common, which leads to frequent network partitions. When a network partition occurs, mobile nodes in one partition are not able to access data hosted by nodes in other partitions, which significantly degrades the performance of data access. To deal with this problem, we apply data replication techniques. Existing data replication solutions in both wired and wireless networks aim at either reducing the query delay or improving the data availability, but not both. As both metrics are important for mobile nodes, we propose schemes to balance the tradeoffs between data availability and query delay under different system settings and requirements. Extensive simulation results show that the proposed schemes can achieve a balance between these two metrics and provide satisfying system performance.
11. M-Score: A Misuseability Weight Measure
Detecting and preventing data leakage and data misuse poses a serious challenge for
organizations, especially when dealing with insiders with legitimate permissions to access the
organization's systems and its critical data. In this paper, we present a new concept, Misuseability
Weight, for estimating the risk emanating from data exposed to insiders. This concept focuses on
assigning a score that represents the sensitivity level of the data exposed to the user and by that
predicts the ability of the user to maliciously exploit this data. Then, we propose a new measure, the
M-score, which assigns a misuseability weight to tabular data, discuss some of its properties, and
demonstrate its usefulness in several leakage scenarios. One of the main challenges in applying the
M-score measure is in acquiring the required knowledge from a domain expert. Therefore, we
present and evaluate two approaches toward eliciting misuseability conceptions from the domain
expert.
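A toy version of assigning a misuseability weight to tabular data might look like this. The per-attribute weights stand in for the expert-elicited sensitivity values the paper requires, and the real M-score also accounts for record distinguishability, which is omitted here.

```python
def m_score(rows, weights):
    """Toy misuseability score: number of exposed records times the
    sensitivity of the worst (most sensitive) row. Each row is the set
    of attribute names it exposes."""
    row_scores = [sum(weights.get(col, 0) for col in row) for row in rows]
    return len(rows) * max(row_scores, default=0)

# Hypothetical expert-assigned attribute sensitivities.
weights = {"ssn": 10, "salary": 6, "name": 1}

exposed = [{"name", "salary"}, {"name", "ssn", "salary"}]
print(m_score(exposed, weights))  # 2 records x worst-row sensitivity 17 = 34
```

Under such a measure, a query result exposing many highly sensitive attributes scores higher, and therefore flags a greater risk of insider misuse, than a large but innocuous result.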
12. Recommendation Models for Open Authorization
Major online platforms such as Facebook, Google, and Twitter allow third-party applications
such as games, and productivity applications access to user online private data. Such accesses
must be authorized by users at installation time. The Open Authorization protocol (OAuth) was
introduced as a secure and efficient method for authorizing third-party applications without
releasing a user's access credentials. However, OAuth implementations don't provide the necessary fine-grained access control, nor any recommendations, i.e., which access control
decisions are most appropriate. We propose an extension to the OAuth 2.0 authorization that
enables the provisioning of fine-grained authorization recommendations to users when granting
permissions to third-party applications. We propose a multicriteria recommendation model that
utilizes application-based, user-based, and category-based collaborative filtering mechanisms.
Our collaborative filtering mechanisms are based on previous user decisions, and application
permission requests to enhance the privacy of the overall site's user population. We
implemented our proposed OAuth extension as a browser extension that allows users to easily
configure their privacy settings at application installation time, provides recommendations on
requested privacy permissions, and collects data regarding user decisions. Our experiments on
the collected data indicate that the proposed framework efficiently enhanced the user
awareness and privacy related to third-party application authorizations.
14. Outsourced Similarity Search on Metric Data Assets
This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the most similar data objects to a query example. Outsourcing offers the data owner scalability and a low initial investment.
15. Multiparty Access Control for Online Social Networks: Model and Mechanisms
In this paper, we propose an approach to enable the protection of shared data associated with multiple users in online social networks. We formulate an access control model to capture the essence of multiparty authorization requirements, along with a multiparty policy specification scheme and a policy enforcement mechanism.
16. A Query Formulation Language for the Data Web
We present a query formulation language (called MashQL) in order to easily query
and fuse structured data on the web. The main novelty of MashQL is that it allows people
with limited IT skills to explore and query one (or multiple) data sources without prior
knowledge about the schema, structure, vocabulary, or any technical details of these sources.
More importantly, to be robust and cover most cases in practice, we do not assume that a data
source should have - an offline or inline - schema. This poses several language-design and
performance complexities that we fundamentally tackle. To illustrate the query formulation
power of MashQL, and without loss of generality, we chose the Data web scenario. We also
chose querying RDF, as it is the most primitive data model; hence, MashQL can be similarly
used for querying relational databases and XML. We present two implementations of
MashQL, an online mashup editor, and a Firefox add-on. The former illustrates how MashQL
can be used to query and mash up the Data web as simply as filtering and piping web feeds;
and the Firefox add-on illustrates using the browser as a web composer rather than only a
navigator. To end, we evaluate MashQL on querying two data sets, DBLP and DBPedia, and
show that our indexing techniques allow instant user interaction.
17. Incremental Information Extraction Using Relational Databases
Information extraction systems are traditionally implemented as a pipeline of special-purpose processing modules targeting the extraction of a particular kind of information. A major drawback of such an approach is that whenever a new extraction goal
emerges or a module is improved, extraction has to be reapplied from scratch to the entire
text corpus even though only a small part of the corpus might be affected. In this paper, we
describe a novel approach for information extraction in which extraction needs are expressed
in the form of database queries, which are evaluated and optimized by database systems.
Using database queries for information extraction enables generic extraction and minimizes
reprocessing of data by performing incremental extraction to identify which part of the data
is affected by the change of components or goals. Furthermore, our approach provides
automated query generation components so that casual users do not have to learn the query
language in order to perform extraction. To demonstrate the feasibility of our incremental
extraction approach, we performed experiments to highlight two important aspects of an
information extraction system: efficiency and quality of extraction results. Our experiments
show that in the event of deployment of a new module, our incremental extraction approach
reduces the processing time by 89.64 percent as compared to a traditional pipeline approach.
By applying our methods to a corpus of 17 million biomedical abstracts, our experiments
show that the query performance is efficient for real-time applications. Our experiments also
revealed that our approach achieves high quality extraction results.
18. Load-Balancing Multipath Switching System with Flow Slice
Multipath Switching systems (MPS) are intensely used in state-of-the-art core routers
to provide terabit or even petabit switching capacity. One of the most intractable issues in
designing MPS is how to load balance traffic across its multiple paths while not disturbing
the intraflow packet orders. Previous packet-based solutions either suffer from delay
penalties or lead to O(N^2) hardware complexity, and hence do not scale. Flow-based hashing
algorithms also perform badly due to the heavy-tailed flow-size distribution. In this paper, we
develop a novel scheme, namely, Flow Slice (FS) that cuts off each flow into flow slices at
every interflow interval larger than a slicing threshold and balances the load on a finer
granularity. Based on the studies of tens of real Internet traces, we show that setting a slicing
threshold of 1-4 ms, the FS scheme achieves comparative load balancing performance to the
optimal one. It also limits the probability of out-of-order packets to a negligible level (10^-6)
on three popular MPSes at the cost of little hardware complexity and an internal speedup up
to two. These results are proven by theoretical analyses and also validated through trace-driven prototype simulations.
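The slicing rule itself is simple and can be sketched as below. Path selection here is least-loaded rather than the hash-based choice the scheme uses (an assumption for clarity), but the ordering guarantee is the same: every packet inside one slice travels on one path.

```python
SLICE_THRESHOLD = 0.002  # seconds; the paper suggests a 1-4 ms threshold

def slice_flow(timestamps, threshold=SLICE_THRESHOLD):
    """Cut one flow's packet timestamps into slices: a new slice starts
    whenever the inter-packet gap exceeds the slicing threshold."""
    slices, current = [], [timestamps[0]]
    for prev, ts in zip(timestamps, timestamps[1:]):
        if ts - prev > threshold:
            slices.append(current)
            current = []
        current.append(ts)
    slices.append(current)
    return slices

def assign_paths(slices, n_paths):
    """Send each slice to the currently least-loaded path (illustrative
    policy; packets within a slice stay in order on that path)."""
    load = [0] * n_paths
    assignment = []
    for s in slices:
        path = load.index(min(load))
        load[path] += len(s)
        assignment.append(path)
    return assignment

ts = [0.000, 0.0005, 0.001, 0.010, 0.0105, 0.020]
slices = slice_flow(ts)
print(len(slices), assign_paths(slices, 2))
```

Because the gap between slices already exceeds the threshold, spreading different slices over different paths rarely reorders packets at the receiver, which is the key observation behind the negligible out-of-order probability.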
19. Application study of online education platform based on cloud computing
Aimed at some problems in network education resources construction at present,
we analyze the characteristics and application range of cloud computing, and present an
integrated solving scheme. On that basis, some critical technologies such as the cloud
storage, streaming media and cloud safety are analyzed in detail. Finally, the paper gives
summarization and expectation.
20. Towards temporal access control in cloud computing
Access control is one of the most important security mechanisms in cloud computing.
Attribute-based access control provides a flexible approach that allows data owners to
integrate data access policies within the encrypted data. However, little work has been done
to explore temporal attributes in specifying and enforcing the data owner's policy and the
data user's privileges in cloud-based environments. In this paper, we present an efficient
temporal access control encryption scheme for cloud services with the help of cryptographic
integer comparisons and a proxy-based re-encryption mechanism on the current time. We
also provide a dual comparative expression of integer ranges to extend the power of attribute
expression for implementing various temporal constraints. We prove the security strength of
the proposed scheme and our experimental results not only validate the effectiveness of our
scheme, but also show that the proposed integer comparison scheme performs significantly
better than previous bitwise comparison schemes.
21. Cloud intelligent track: Risk analysis and privacy data management in cloud computing
Cloud computing is a computing platform with the backbone of the internet, storing and accessing data and applications which reside in the cloud, not on the user's computer. The biggest issues which should be addressed in cloud computing are security and privacy. Outsourcing data to other companies worries internet clients about the privacy of their data. Most enterprise executives hesitate to use cloud computing systems due to their sensitive enterprise information. This paper provides data integrity and user privacy through a cloud intelligent track system. It discusses previous experiments done on privacy and data management, and proposes an architecture which provides intelligent tracking in a Privacy Manager and a Risk Manager to address the privacy issues which rule the cloud environment.
22. Measurement and utilization of customer-provided resources for cloud computing
Recent years have witnessed cloud computing as an efficient means for providing
resources as a form of utility. Driven by the strong demands, such industrial leaders as
Amazon, Google, and Microsoft have all offered practical cloud platforms, mostly datacenter
based. These platforms are known to be powerful and cost-effective. Yet, as the cloud
customers are pure consumers, their local resources, though abundant, have been largely
ignored. In this paper, we for the first time investigate a novel customer-provided cloud
platform, Spot Cloud, through extensive measurements. Complementing data centers, Spot
Cloud enables customers to contribute/sell their private resources to collectively offer cloud
services. We find that, although the capacity as well as the availability of this platform is not
yet comparable to enterprise datacenters, Spot Cloud can provide very flexible services to
customers in terms of both performance and pricing. It is friendly to the customers who often
seek to run short-term and customized tasks at minimum costs. However, different from the
standardized enterprise instances, Spot Cloud instances are highly diverse, which greatly increases the difficulty of instance selection. To solve this problem, we propose an instance
recommendation mechanism for cloud service providers to recommend short-listed instances
to the customers. Our model analysis and the real world experiments show that it can help the
customers to find the best tradeoff between benefit and cost.
23. Improving public auditability and data possession in data storage security for cloud computing
Cloud computing is an Internet-based technology where users can subscribe to high-quality services from data and software that reside solely in remote servers. This
provides many benefits for the users to create and store data in the remote servers thereby
utilizing fewer resources in the client system. However, management of the data and software may not be fully trustworthy, which poses many security challenges. One of the
security issues is the data storage security where frequent integrity checking of remotely
stored data is carried out. RSA based storage security (RSASS) method uses public
auditing of the remote data by improving existing RSA based signature generation. This
public key cryptography technique is widely used for providing strong security. Using
this RSASS method, the data storage correctness is assured and identification of
misbehaving server with high probability is achieved. This method also supports dynamic
operation on the data and tries to reduce the server computation time. The preliminary results achieved through RSASS show that the proposed scheme outperforms existing methods with improved security in data storage.
24. Implementation of a MapReduce-based image conversion module in a cloud computing environment
In recent years, the rapid advancement of the Internet and the growing number of
people using social networking services (SNSs) have facilitated the sharing of
multimedia data. However, multimedia data processing techniques such as transcoding
and transmoding impose a considerable burden on the computing infrastructure as the
amount of data increases. Therefore, we propose a MapReduce-based image-conversion
module in cloud computing environment in order to reduce the burden of computing
power. The proposed module consists of two parts: a storage system, i.e., Hadoop
distributed file system (HDFS) for image data and a MapReduce program with a Java
Advanced Imaging (JAI) library for image transcoding. It can process image data in
distributed and parallel cloud computing environments, thereby minimizing the
computing infrastructure overhead. In this paper, we describe the implementation of the
proposed module using Hadoop and JAI. In addition, we evaluate the proposed module in
terms of processing time under varying experimental conditions.
25. Distributed α-Optimal User Association and Cell Load Balancing in Wireless Networks
In this paper, we develop a framework for user association in infrastructure-based
wireless networks, specifically focused on flow-level cell load balancing under spatially
inhomogeneous traffic distributions. Our work encompasses several different user association policies: rate-optimal, throughput-optimal, delay-optimal, and load-equalizing,
which we collectively denote α-optimal user association. We prove that the optimal load
vector ρ* that minimizes a generalized system performance function is the fixed point of a
certain mapping. Based on this mapping, we propose and analyze an iterative distributed user
association policy that adapts to spatial traffic loads and converges to a globally optimal
allocation. We then address admission control policies for the case where the system
is overloaded. For an appropriate system-level cost function, the optimal admission control
policy blocks all flows at cell edges. However, providing a minimum level of connectivity
to all spatial locations might be desirable. To this end, a location-dependent random blocking
and user association policy are proposed.
26. Ensuring Distributed Accountability for Data Sharing in the Cloud
Cloud computing enables highly scalable services to be easily consumed over the Internet on
an as-needed basis. A major feature of the cloud services is that users' data are usually processed
remotely in unknown machines that users do not own or operate. While enjoying the convenience
brought by this new emerging technology, users’ fears of losing control of their own data
(particularly, financial and health data) can become a significant barrier to the wide adoption of
cloud services. To address this problem, here, we propose a novel highly decentralized information
accountability framework to keep track of the actual usage of the users’ data in the cloud. In
particular, we propose an object-centered approach that enables enclosing our logging mechanism
together with users’ data and policies. We leverage the JAR programmable capabilities to both
create a dynamic and traveling object, and to ensure that any access to users’ data will trigger
authentication and automated logging local to the JARs. To strengthen users' control, we also
provide distributed auditing mechanisms. We provide extensive experimental studies that
demonstrate the efficiency and effectiveness of the proposed approaches.
27. A Learning-Based Approach to Reactive Security
Despite the conventional wisdom that proactive security is superior to reactive security, we
show that reactive security can be competitive with proactive security as long as the reactive
defender learns from past attacks instead of myopically overreacting to the last attack. Our game-
theoretic model follows common practice in the security literature by making worst case
assumptions about the attacker: we grant the attacker complete knowledge of the defender's
strategy and do not require the attacker to act rationally. In this model, we bound the competitive
ratio between a reactive defense algorithm (which is inspired by online learning theory) and the best
fixed proactive defense. Additionally, we show that,
unlike proactive defenses, this reactive strategy is robust to a lack of information about the
attacker's incentives and knowledge.
28. Persuasive Cued Click-Points: Design, Implementation, and Evaluation of a Knowledge-Based Authentication Mechanism
This paper presents an integrated evaluation of the Persuasive Cued Click-Points
graphical password scheme, including usability and security evaluations, and
implementation considerations. An important usability goal for knowledge-based
authentication systems is to support users in selecting passwords of higher security, in the
sense of being from an expanded effective security space. We use persuasion to influence
user choice in click-based graphical passwords, encouraging users to select more random,
and hence more difficult to guess, click-points.
29. A Methodology for Direct and Indirect Discrimination Prevention in Data Mining
In this paper, we tackle discrimination prevention in data mining and propose new
techniques applicable for direct or indirect discrimination prevention, individually or both
at the same time. We discuss how to clean training datasets and outsourced datasets in
such a way that direct and/or indirect discriminatory decision rules are converted to
legitimate (non-discriminatory) classification rules.
30. Prediction of User’s Web-Browsing Behavior: Application of Markov Model
Predicting a user's behavior while surfing the Internet can be applied effectively in various critical applications. Such applications have a traditional tradeoff between modeling complexity and prediction accuracy. In this paper, we analyze and study the Markov model and the all-Kth Markov model in Web prediction. We propose a new modified Markov
model to alleviate the issue of scalability in the number of paths.
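A first-order Markov predictor over click paths can be sketched as follows; session extraction and the higher-order contexts of the all-Kth variant are omitted, and the page names are illustrative.

```python
from collections import Counter, defaultdict

def train(sessions):
    """Count page-to-page transitions over all browsing sessions."""
    model = defaultdict(Counter)
    for path in sessions:
        for cur, nxt in zip(path, path[1:]):
            model[cur][nxt] += 1
    return model

def predict(model, page):
    """Predict the most frequently following page, or None if unseen."""
    nxt = model.get(page)
    return nxt.most_common(1)[0][0] if nxt else None

sessions = [
    ["home", "catalog", "item", "cart"],
    ["home", "catalog", "search"],
    ["home", "catalog", "item"],
]
model = train(sessions)
print(predict(model, "catalog"))  # "item" follows "catalog" twice, "search" once
```

The scalability issue the abstract mentions comes from storing one context per observed path prefix in higher-order models; a first-order model like this keeps only pairwise transitions.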
31. Query Planning for Continuous Aggregation Queries over a Network of Data
Aggregators
We present a low-cost, scalable technique to answer continuous aggregation queries using a
network of aggregators of dynamic data items. In such a network of data aggregators, each
data aggregator serves a set of data items at specific coherencies.
32. Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
Preparing a data set for analysis is generally the most time consuming task in a data
mining project, requiring many complex SQL queries, joining tables, and aggregating
columns. Existing SQL aggregations have limitations to prepare data sets because they return
one column per aggregated group. In general, a significant manual effort is required to build
data sets, where a horizontal layout is required. We propose simple, yet powerful, methods to
generate SQL code to return aggregated columns in a horizontal tabular layout, returning a
set of numbers instead of one number per row. This new class of functions is called
horizontal aggregations. Horizontal aggregations build data sets with a horizontal
denormalized layout (e.g., point-dimension, observation-variable, instance-feature), which is
the standard layout required by most data mining
algorithms. We propose three fundamental methods to evaluate horizontal aggregations:
CASE: Exploiting the programming CASE construct; SPJ: Based on standard relational
algebra operators (SPJ queries); PIVOT: Using the PIVOT operator, which is offered by
some DBMSs.
Experiments with large tables compare the proposed query evaluation methods. Our CASE
method has similar speed to the PIVOT operator, and it is much faster than the SPJ
method. In general, the CASE and PIVOT methods exhibit linear scalability, whereas the
SPJ method does not.
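The CASE method can be illustrated directly in standard SQL; a toy sketch run through sqlite3, where the table, column names, and values are invented:

```python
import sqlite3

# Toy fact table: one row per (customer, product, amount).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (customer TEXT, product TEXT, amount INT)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [("ann", "tv", 100), ("ann", "radio", 40), ("bob", "tv", 70)])

# CASE method: one output column per product value, one row per customer --
# a horizontal layout suitable as a data mining input.
rows = con.execute("""
    SELECT customer,
           SUM(CASE WHEN product = 'tv'    THEN amount ELSE 0 END) AS tv,
           SUM(CASE WHEN product = 'radio' THEN amount ELSE 0 END) AS radio
    FROM sales
    GROUP BY customer
    ORDER BY customer
""").fetchall()
print(rows)  # -> [('ann', 100, 40), ('bob', 70, 0)]
```

The PIVOT variant expresses the same reshaping with a dedicated operator on DBMSs that offer it; the SPJ variant builds it from joins over per-value projections.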
33. Enabling Multilevel Trust in Privacy Preserving Data Mining
Privacy Preserving Data Mining (PPDM) addresses the problem of developing
accurate models about aggregated data without access to precise information in
individual data records. A widely studied perturbation-based PPDM approach introduces
random perturbation to individual values to preserve privacy before data are published.
Previous solutions of this approach are limited in their tacit assumption of single-level
trust on data miners. In this work, we relax this assumption and expand the scope of
perturbation-based PPDM to Multilevel Trust (MLT-PPDM). In our setting, the more
trusted a data miner is, the less perturbed copy of the data it can access. Under this
setting, a malicious data miner may have access to differently perturbed copies of the
same data through various means, and may combine these diverse copies to jointly infer
additional information about the original data that the data owner does not intend to
release. Preventing such diversity attacks is the key challenge of providing MLT-PPDM
services. We address this challenge by properly correlating perturbation across copies at
different trust levels. We prove that our solution is robust against diversity attacks with
respect to our privacy goal. That is, for data miners who have access to an arbitrary
collection of the perturbed copies, our solution prevents them from jointly reconstructing
the original data more accurately than the best effort using any individual copy in the
collection. Our solution allows a data owner to generate perturbed copies of its data for
arbitrary trust levels on demand. This feature offers data owners maximum flexibility.
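The nested-noise idea can be sketched as follows; this is only one simple way to correlate perturbation across trust levels, not the paper's exact construction, and the function names and parameters are invented:

```python
import random

def perturbed_copies(values, variances, seed=7):
    """Generate one perturbed copy per trust level (variances sorted ascending,
    most trusted first). Noise is nested: a less trusted miner's copy reuses the
    trusted copy's noise plus extra independent noise, so combining copies tells
    an attacker nothing beyond the least perturbed copy already held."""
    rng = random.Random(seed)            # fixed seed only to make the sketch reproducible
    noise = [0.0] * len(values)
    copies, prev_var = [], 0.0
    for var in variances:
        extra_sd = (var - prev_var) ** 0.5   # add just enough variance to reach `var`
        noise = [n + rng.gauss(0.0, extra_sd) for n in noise]
        copies.append([v + n for v, n in zip(values, noise)])
        prev_var = var
    return copies

# copies[0] goes to the most trusted miner, copies[-1] to the least trusted.
copies = perturbed_copies([10.0, 20.0, 30.0], variances=[0.5, 2.0, 8.0])
```

Because each less-perturbed copy's noise is literally contained in the more-perturbed one, averaging several copies cannot reduce the noise below the best single copy, which is the diversity-attack robustness the paper proves for its construction.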
34. A Genetic Programming Approach to Record Deduplication
Several systems that rely on consistent data to offer high-quality services, such as
digital libraries and e-commerce brokers, may be affected by the existence of duplicates,
quasi replicas, or near-duplicate entries in their repositories. Because of that, there have
been significant investments from private and government organizations for developing
methods for removing replicas from their data repositories. This is due to the fact that clean
and replica-free repositories not only allow the retrieval of higher quality information but
also lead to more concise data and to potential savings in computational time and
resources to process this data. In this paper, we propose a genetic programming approach
to record deduplication that combines several different pieces of evidence extracted from
the data content to find a deduplication function that is able to identify whether two
entries in a repository are replicas or not. As shown by our experiments, our approach
outperforms an existing state-of-the-art method found in the literature. Moreover, the
suggested functions are computationally less demanding since they use fewer pieces of
evidence. In addition, our genetic programming approach is capable of automatically
adapting these functions to a given fixed replica identification boundary, freeing the
user from the burden of having to choose and tune this parameter.
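What such an evolved deduplication function looks like can be sketched by hand; the similarity measures, field names, and weights below are invented stand-ins for what genetic programming would actually search over:

```python
def jaccard(a, b):
    """Word-level Jaccard similarity -- one simple piece of evidence."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def is_replica(rec1, rec2, threshold=0.75):
    """Hand-written stand-in for an evolved deduplication function: combine
    per-field evidence into one score and compare to a fixed boundary.
    GP would evolve the combination; the 0.6/0.4 weights are arbitrary here."""
    score = 0.6 * jaccard(rec1["name"], rec2["name"]) + \
            0.4 * (1.0 if rec1["email"] == rec2["email"] else 0.0)
    return score >= threshold

a = {"name": "John A. Smith", "email": "js@x.org"}
b = {"name": "John Smith", "email": "js@x.org"}
print(is_replica(a, b))  # -> True  (score 0.8: name overlap 2/3, emails equal)
```

The paper's point is that the expression combining the evidence, not the individual similarity measures, is what gets evolved and adapted to the chosen boundary.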
35. A Probabilistic Scheme for Keyword-Based Incremental Query Construction
Databases enable users to precisely express their informational needs using
structured queries. However, database query construction is a laborious and error-prone
process, which cannot be performed well by most end users. Keyword search alleviates
the usability problem at the price of query expressiveness. As keyword search algorithms
do not differentiate between the possible informational needs represented by a keyword
query, users may not receive adequate results. This paper presents IQP - a novel approach
to bridge the gap between usability of keyword search and expressiveness of database
queries. IQP enables a user to start with an arbitrary keyword query and incrementally
refine it into a structured query through an interactive interface. The enabling techniques
of IQP include: 1) a probabilistic framework for incremental query construction; 2) a
probabilistic model to assess the possible informational needs represented by a keyword
query; 3) an algorithm to obtain the optimal query construction process. This paper
presents the detailed design of IQP, and demonstrates its effectiveness and scalability
through experiments over real-world data and a user study.
36. Effective Pattern Discovery for Text Mining
Many data mining techniques have been proposed for mining useful patterns in text
documents. However, how to effectively use and update discovered patterns is still
an open research issue, especially in the domain of text mining. Since most existing text
mining methods adopted term-based approaches, they all suffer from the problems of
polysemy and synonymy. Over the years, people have often held the hypothesis that
pattern (or phrase)-based approaches should perform better than the term-based ones, but
many experiments do not support this hypothesis. This paper presents an innovative and
effective pattern discovery technique which includes the processes of pattern deploying
and pattern evolving, to improve the effectiveness of using and updating discovered
patterns for finding relevant and interesting information. Substantial experiments on
RCV1 data collection and TREC topics demonstrate that the proposed solution achieves
encouraging performance.
37. Efficient Fuzzy Type-Ahead Search in XML Data
In a traditional keyword-search system over XML data, a user composes a keyword
query, submits it to the system, and retrieves relevant answers. In the case where the user
has limited knowledge about the data, often the user feels “left in the dark” when issuing
queries, and has to use a try-and-see approach for finding information. In this paper, we
study fuzzy type-ahead search in XML data, a new information-access paradigm in which
the system searches XML data on the fly as the user types in query keywords. It allows
users to explore data as they type, even in the presence of minor errors of their keywords.
Our proposed method has the following features: 1) Search as you type: it extends
autocomplete by supporting queries with multiple keywords in XML data. 2) Fuzzy: it
can find high-quality answers that have keywords matching query keywords
approximately. 3) Efficient: our effective index structures and searching algorithms can
achieve a very high interactive speed. We study
research challenges in this new search framework. We propose effective index structures
and top-k algorithms to achieve a high interactive speed. We examine effective ranking
functions and early termination techniques to progressively identify the top-k relevant
answers. We have implemented our method on real data sets, and the experimental results
show that our method achieves high search efficiency and result quality.
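Fuzzy prefix matching, the kernel of type-ahead search, can be sketched without the paper's index structures; a brute-force illustration (real systems need trie-based indexes and top-k pruning to reach interactive speed):

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance, single rolling row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # delete from a
                                     dp[j - 1] + 1,    # insert into a
                                     prev + (ca != cb))  # substitute
    return dp[-1]

def fuzzy_complete(prefix, words, max_errors=1):
    """Return words whose same-length prefix is within `max_errors` edits --
    so a mistyped prefix still finds its completions."""
    return [w for w in words
            if edit_distance(prefix, w[:len(prefix)]) <= max_errors]

words = ["database", "datamining", "xml", "keyword"]
print(fuzzy_complete("datb", words))  # -> ['database', 'datamining']
```

In the paper's setting the candidates are XML nodes rather than a flat word list, and ranking plus early termination selects the top-k answers as each keystroke arrives.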
38. Mining Online Reviews for Predicting Sales Performance: A Case Study
in the Movie Domain
Posting reviews online has become an increasingly popular way for people to express
opinions and sentiments toward the products bought or services received. Analyzing the
large volume of online reviews available would produce useful actionable knowledge that
could be of economic value to vendors and other interested parties. In this paper, we
conduct a case study in the movie domain, and tackle the problem of mining reviews for
predicting product sales performance. Our analysis shows that both the sentiments
expressed in the reviews and the quality of the reviews have a significant impact on the
future sales performance of products in question. For the sentiment factor, we propose
Sentiment PLSA (S-PLSA), in which a review is considered as a document generated by
a number of hidden sentiment factors, in order to capture the complex nature of
sentiments. Training an S-PLSA model enables us to obtain a succinct summary of the
sentiment information embedded in the reviews. Based on S-PLSA, we propose ARSA,
an Autoregressive Sentiment-Aware model for sales prediction. We then seek to further
improve the accuracy of prediction by considering the quality factor, with a focus on
predicting the quality of a review in the absence of user-supplied indicators, and present
ARSQA, an Autoregressive Sentiment and Quality Aware model, to utilize sentiments
and quality for predicting product sales performance. Extensive experiments conducted
on a large movie data set confirm the effectiveness of the proposed approach.
39. Cloud Computing Security: From Single to Multi-Clouds
The use of cloud computing has increased rapidly in many organizations. Cloud computing
provides many benefits in terms of low cost and accessibility of data. Ensuring the security of
cloud computing is a major factor in the cloud computing environment, as users often store
sensitive information with cloud storage providers but these providers may be untrusted.
Dealing with “single cloud” providers is predicted to become less popular with customers
due to risks of service availability failure and the possibility of malicious insiders in the
single cloud. A movement towards “multi-clouds”, or in other words, “interclouds” or
“cloud-of-clouds”, has emerged recently. This paper surveys recent research related to single
and multi-cloud security and addresses possible solutions. It is found that the research into
the use of multi-cloud providers to maintain security has received less attention from the
research community than has the use of single clouds. This work aims to promote the use of
multi-clouds due to their ability to reduce security risks that affect the cloud computing user.
40. Optimization of Resource Provisioning Cost in Cloud Computing
In cloud computing, cloud providers can offer cloud consumers two provisioning
plans for computing resources, namely reservation and on-demand plans. In general, the
cost of computing resources provisioned under the reservation plan is lower than under
the on-demand plan, since the cloud consumer pays the provider in advance.
With the reservation plan, the consumer can reduce the total resource provisioning cost.
However, the best advance reservation of resources is difficult to achieve due to
uncertainty of consumer's future demand and providers' resource prices. To address this
problem, an optimal cloud resource provisioning (OCRP) algorithm is proposed by
formulating a stochastic programming model. The OCRP algorithm can provision
computing resources for use in multiple provisioning stages as well as a long-term
plan, e.g., four stages in a quarter plan and twelve stages in a yearly plan. The demand
and price uncertainty is considered in OCRP. In this paper, different approaches to obtain
the solution of the OCRP algorithm are considered including deterministic equivalent
formulation, sample-average approximation, and Benders decomposition. Numerical
studies are extensively performed in which the results clearly show that with the OCRP
algorithm, cloud consumer can successfully minimize total cost of resource provisioning
in cloud computing environments.
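The reservation decision can be sketched in its deterministic equivalent, where demand is assumed known; prices and demands below are invented, and the real OCRP handles demand and price uncertainty via stochastic programming rather than this brute force:

```python
def provisioning_cost(reserved, demands, reserve_price, ondemand_price):
    """Total cost when `reserved` instances are paid for up front in every stage
    and any excess demand is met at the (higher) on-demand rate."""
    return sum(reserved * reserve_price + max(d - reserved, 0) * ondemand_price
               for d in demands)

def best_reservation(demands, reserve_price, ondemand_price):
    """Deterministic equivalent, brute-forced: try every reservation level."""
    return min(range(max(demands) + 1),
               key=lambda r: provisioning_cost(r, demands, reserve_price, ondemand_price))

demands = [4, 10, 6, 8]          # e.g., four stages of a quarterly plan
r = best_reservation(demands, reserve_price=2, ondemand_price=5)
print(r, provisioning_cost(r, demands, 2, 5))  # -> 8 74
```

Reserving too little forces expensive on-demand top-ups; reserving too much wastes prepaid capacity. The stochastic version optimizes this trade-off over demand scenarios instead of a single known demand vector.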
41. A Secure Erasure Code-Based Cloud Storage System with Secure Data
Forwarding
A cloud storage system, consisting of a collection of storage servers, provides long-term
storage services over the Internet. Storing data in a third party’s cloud system causes serious
concern over data confidentiality. General encryption schemes protect data confidentiality,
but also limit the functionality of the storage system because only a few operations are supported
over encrypted data. Constructing a secure storage system that supports multiple functions is
challenging when the storage system is distributed and has no central authority. We propose
a threshold proxy re-encryption scheme and integrate it with a decentralized erasure code
such that a secure distributed storage system is formulated. The distributed storage system
not only supports secure and robust data storage and retrieval, but also lets a user forward his
data in the storage servers to another user without retrieving the data back. The main
technical contribution is that the proxy re-encryption scheme supports encoding operations
over encrypted messages as well as forwarding operations over encoded and encrypted
messages. Our method fully integrates encrypting, encoding, and forwarding. We analyze
and suggest suitable parameters for the number of copies of a message dispatched to storage
servers and the number of storage servers queried by a key server. These parameters allow
more flexible adjustment between the number of storage servers.
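The erasure-coding side can be illustrated with a toy single-parity code; this tolerates the loss of any one block and deliberately omits the paper's decentralized code and proxy re-encryption, so it is only a sketch of the redundancy idea:

```python
from functools import reduce

def xor(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(blocks):
    """Append one parity block (XOR of all data blocks) so that any single
    lost block, data or parity, can be rebuilt from the survivors."""
    return blocks + [reduce(xor, blocks)]

def recover(stored, missing_index):
    """Rebuild the block at `missing_index` by XOR-ing all surviving blocks."""
    survivors = [b for i, b in enumerate(stored) if i != missing_index]
    return reduce(xor, survivors)

data = [b"abcd", b"efgh", b"ijkl"]
stored = encode(data)            # dispatch these four blocks to storage servers
assert recover(stored, 1) == b"efgh"   # lose block 1, rebuild it from the rest
```

A real decentralized erasure code tolerates multiple losses and, in the paper's scheme, operates over ciphertext so that encoding, re-encryption, and forwarding compose.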
42. HASBE: A Hierarchical Attribute- Based Solution for Flexible and
Scalable Access Control in Cloud Computing
Cloud computing has emerged as one of the most influential paradigms in the IT industry in
recent years. Since this new computing technology requires users to entrust their valuable
data to cloud providers, there have been increasing security and privacy concerns on
outsourced data. Several schemes employing attribute-based encryption (ABE) have been
proposed for access control of outsourced data in cloud computing; however, most of them
suffer from inflexibility in implementing complex access control policies. In order to realize
scalable, flexible, and fine-grained access control of outsourced data in cloud computing,
in this paper, we propose hierarchical attribute-set based encryption (HASBE) by extending
ciphertext-policy attribute-set-based encryption (ASBE) with a hierarchical structure of
users. The proposed scheme not only achieves scalability due to its hierarchical structure, but
also inherits flexibility and fine-grained access control in supporting compound attributes of
ASBE. In addition, HASBE employs multiple value assignments for access expiration time
to deal with user revocation more efficiently than existing schemes. We formally prove the
security of HASBE based on the security of the ciphertext-policy attribute-based encryption
(CP-ABE) scheme by Bethencourt et al. and analyze its performance and computational
complexity. We implement our scheme and show that it is both efficient and flexible in
dealing with access control for outsourced data in cloud computing with comprehensive
experiments.
43. A Distributed Access Control Architecture for Cloud Computing
The large-scale, dynamic, and heterogeneous nature of cloud computing poses
numerous security challenges. But the cloud's main challenge is to provide a robust
authorization mechanism that incorporates multitenancy and virtualization aspects of
resources. The authors present a distributed architecture that incorporates principles from
security management and software engineering and propose key requirements and a
design model for the architecture.
44. Cloud Computing Security: From Single to Multi-clouds
The use of cloud computing has increased rapidly in many organizations. Cloud
computing provides many benefits in terms of low cost and accessibility of data.
Ensuring the security of cloud computing is a major factor in the cloud computing
environment, as users often store sensitive information with cloud storage providers but
these providers may be untrusted. Dealing with "single cloud" providers is predicted to
become less popular with customers due to risks of service availability failure and the
possibility of malicious insiders in the single cloud. A movement towards "multi-clouds",
or in other words, "interclouds" or "cloud-of-clouds" has emerged recently. This paper
surveys recent research related to single and multi-cloud security and addresses possible
solutions. It is found that the research into the use of multi-cloud providers to maintain
security has received less attention from the research community than has the use of
single clouds. This work aims to promote the use of multi-clouds due to their ability to
reduce security risks that affect the cloud computing user.
45. Scalable and Secure Sharing of Personal Health Records in Cloud
Computing using Attribute-based Encryption
Personal health record (PHR) is an emerging patient-centric model of health information
exchange, which is often outsourced to be stored at a third party, such as cloud providers.
However, there have been wide privacy concerns as personal health information could be
exposed to those third-party servers and to unauthorized parties. To assure patients'
control over access to their own PHRs, encrypting the PHRs before outsourcing is a
promising method. Yet, issues such as risks of privacy exposure, scalability in key
management, flexible access, and efficient user revocation have remained the most
important challenges
toward achieving fine-grained, cryptographically enforced data access control. In this paper,
we propose a novel patient-centric framework and a suite of mechanisms for data access
control to PHRs stored in semi-trusted servers. To achieve fine-grained and scalable data
access control for PHRs, we leverage attribute based encryption (ABE) techniques to encrypt
each patient’s PHR file. Different from previous works in secure data outsourcing, we focus
on the multiple data owner scenario, and divide the users in the PHR system into multiple
security domains, which greatly reduces the key management complexity for owners and users.
A high degree of patient privacy is guaranteed simultaneously by exploiting multi-authority
ABE. Our scheme also enables dynamic modification of access policies or file attributes,
supports efficient on-demand user/attribute revocation and break-glass access under
emergency scenarios. Extensive analytical and experimental results are presented which
show the security, scalability, and efficiency of our proposed scheme.
46. Secure and Practical Outsourcing of Linear Programming in Cloud
Computing
Cloud computing has great potential of providing robust computational power to the
society at reduced cost. It enables customers with limited computational resources to
outsource their large computation workloads to the cloud, and economically enjoy the
massive computational power, bandwidth, storage, and even appropriate software that can
be shared in a pay-per-use manner. Despite the tremendous benefits, security is the
primary obstacle that prevents the wide adoption of this promising computing model,
especially for customers when their confidential data are consumed and produced during
the computation. Treating the cloud as an intrinsically insecure computing platform from
the viewpoint of the cloud customers, we must design mechanisms that not only protect
sensitive information by enabling computations with encrypted data, but also protect
customers from malicious behaviors by enabling the validation of the computation result.
Such a mechanism of general secure computation outsourcing was recently shown to be
feasible in theory, but to design mechanisms that are practically efficient remains a very
challenging problem. Focusing on engineering computing and optimization tasks, this
paper investigates secure outsourcing of widely applicable linear programming (LP)
computations. In order to achieve practical efficiency, our mechanism design explicitly
decomposes the LP computation outsourcing into public LP solvers running on the cloud
and private LP parameters owned by the customer. The resulting flexibility allows us to
explore an appropriate security/efficiency tradeoff via a higher-level abstraction of LP
computations than the general circuit representation. In particular, by formulating private
data owned by the customer for LP problem as a set of matrices and vectors, we are able
to develop a set of efficient privacy-preserving problem transformation techniques, which
allow customers to transform the original LP problem into an arbitrary one while
protecting sensitive input/output information. To validate the computation result, we
further explore the fundamental duality theorem of LP computation and derive the
necessary and sufficient conditions that a correct result must satisfy. Such a result
verification mechanism is extremely efficient and incurs close-to-zero additional cost on
both cloud server and customers. Extensive security analysis and experiment results show
the immediate practicability of our mechanism design.
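The problem-transformation idea can be sketched with the simplest possible disguise, a secret positive diagonal scaling of the variables; the paper's actual scheme is considerably stronger (it also masks the constraint matrix and right-hand side), so treat this purely as an illustration with invented names:

```python
import random

def disguise_lp(c, A, b, seed=11):
    """Disguise min c.x s.t. A x <= b, x >= 0 via the secret substitution x = D y,
    where D is a random positive diagonal matrix kept by the customer. The cloud
    sees only (c2, A2, b) and solves in y-space."""
    rng = random.Random(seed)
    d = [rng.uniform(1.0, 10.0) for _ in c]                    # secret diagonal of D
    c2 = [ci * di for ci, di in zip(c, d)]                     # objective in y-space
    A2 = [[aij * dj for aij, dj in zip(row, d)] for row in A]  # constraints in y-space
    return c2, A2, b, d

def recover(y, d):
    """Map the cloud's solution y of the disguised LP back to x = D y."""
    return [yi * di for yi, di in zip(y, d)]
```

Because D is positive, y >= 0 exactly when x >= 0, and objective and constraint values agree point for point, so the cloud's optimum maps back to the customer's optimum; duality-based checks as in the paper can then verify the returned result.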
47. Secure and privacy preserving keyword searching for cloud storage
services
Cloud storage services enable users to remotely access data in a cloud anytime and
anywhere, using any device, in a pay-as-you-go manner. Moving data into a cloud offers
great convenience to users since they do not have to care about the large capital
investment in both the deployment and management of the hardware infrastructures.
However, allowing a cloud service provider (CSP), whose main purpose is making a
profit, to take custody of sensitive data raises underlying security and privacy issues.
To keep user data confidential against an untrusted CSP, a natural way is to apply
cryptographic approaches, by disclosing the data decryption key only to authorized users.
However, when a user wants to retrieve files containing certain keywords using a thin
client, the adopted encryption system should not only support keyword searching over
encrypted data, but also provide high performance. In this paper, we investigate the
characteristics of cloud storage services and propose a secure and privacy preserving
keyword searching (SPKS) scheme, which allows the CSP to participate in the
decipherment, and to return only files containing certain keywords specified by the users,
so as to reduce both the computational and communication overhead in decryption for
users, on the condition of preserving user data privacy and user querying privacy.
Performance analysis shows that the SPKS scheme is applicable to a cloud environment.
48. Data Security and Privacy Protection Issues in Cloud Computing
It is well-known that cloud computing has many potential advantages and many
enterprise applications and data are migrating to public or hybrid cloud. But regarding
some business-critical applications, the organizations, especially large enterprises, still
wouldn't move them to the cloud. The market share of cloud computing is still far
behind the one expected. From the consumers' perspective, cloud computing security
concerns, especially data security and privacy protection issues, remain the primary
inhibitor for adoption of cloud computing services. This paper provides a concise but
all-round analysis of data security and privacy protection issues associated with cloud
computing across all stages of data life cycle. Then this paper discusses some current
solutions. Finally, this paper describes future research work about data security and
privacy protection issues in cloud.
49. Application study of online education platform based on cloud computing
Aimed at current problems in the construction of network education resources, we analyze
the characteristics and application range of cloud computing, and present an integrated
solution. On that basis, some critical technologies such as cloud storage, streaming media,
and cloud security are analyzed in detail. Finally, the paper gives a summary and outlook.
50. Mining User Queries with Markov Chains: Application to Online Image Retrieval
We propose a novel method for automatic annotation, indexing and annotation-based
retrieval of images. The new method, that we call Markovian Semantic Indexing (MSI),
is presented in the context of an online image retrieval system. Assuming such a system,
the users' queries are used to construct an Aggregate Markov Chain (AMC) through
which the relevance between the keywords seen by the system is defined. The users'
queries are also used to automatically annotate the images. A stochastic distance between
images, based on their annotation and the keyword relevance captured in the AMC, is
then introduced. Geometric interpretations of the proposed distance are provided and its
relation to a clustering in the keyword space is investigated. By means of a new measure
of Markovian state similarity, the mean first cross passage time (CPT), optimality
properties of the proposed distance are proved. Images are modeled as points in a vector
space and their similarity is measured with MSI. The new method is shown to possess
certain theoretical advantages and also to achieve better Precision versus Recall results
when compared to Latent Semantic Indexing (LSI) and probabilistic Latent Semantic
Indexing (pLSI) methods in Annotation-Based Image Retrieval (ABIR) tasks.
51. Towards Trustworthy Resource Scheduling in Clouds
Managing the allocation of cloud virtual machines at physical resources is a key
requirement for the success of clouds. Current implementations of cloud schedulers do
not consider the entire cloud infrastructure, nor do they consider the overall user and
infrastructure properties. This results in major security, privacy, and resilience concerns.
In this paper, we propose a novel cloud scheduler which considers both user requirements
and infrastructure properties. We focus on assuring users that their virtual resources are
hosted using physical resources that match their requirements without getting users
involved with understanding the details of the cloud infrastructure. As a proof-of-
concept, we present our prototype which is built on OpenStack. The provided prototype
implements the proposed cloud scheduler. It also provides an implementation of our
previous work on cloud trust management which provides the scheduler with input about
the trust status of the cloud infrastructure.