Secure Distributed De-duplication System with Improved Reliability in Cloud Computing


International Journal of Advance Foundation and Research in Computer (IJAFRC)

Volume 3, Issue 1, January 2016. ISSN 2348-4853, Impact Factor 1.317

© 2016, IJAFRC All Rights Reserved. www.ijfarc.org

Secure Distributed De-duplication System with Improved Reliability in Cloud Computing

Mr. Sagar G Khengat, Mr. Swapnil S Belorkar, Mr. Alok Y Shukla, Prof. Nilesh B Madke

Dept. of Computer Engineering, ISB & M School Of Technology, Nande Village, Tal. Mulashi, Pune, Savitribai Phule Pune University.

[email protected], [email protected], [email protected], [email protected]

ABSTRACT

De-duplication is a technique for eliminating duplicate copies of data, and it has been widely used in cloud storage to reduce storage space and upload bandwidth. However, only one copy of each file is stored in the cloud, even if such a file is owned by a huge number of users. As a result, a de-duplication system improves storage utilization while reducing reliability. Furthermore, the challenge of privacy for sensitive data also arises when it is outsourced by users to the cloud. Aiming to address these security challenges, this paper makes the first attempt to formalize the notion of a distributed reliable de-duplication system. We propose new distributed de-duplication systems with higher reliability, in which the data chunks are distributed across multiple cloud servers. The security requirements of data confidentiality and tag consistency are also achieved by introducing a deterministic secret sharing scheme in distributed storage systems, instead of using convergent encryption as in previous de-duplication systems. Security analysis demonstrates that our systems are secure in terms of the definitions specified in the proposed security model. As a proof of concept, we implement the proposed systems and demonstrate that the incurred overhead is very limited in realistic environments.

Index Terms: cloud security, de-duplication, distributed system, proof of ownership, file-level, block-level

    I. INTRODUCTION

In this setting, it is desirable to audit whether the storage service provider (SSP) meets its contractual obligations. SSPs have many motivations to fail these obligations: for example, an SSP may try to conceal data loss incidents in order to preserve its reputation, or it may discard data that is seldom accessed so that it can resell the same storage. Remote data checking (RDC) allows an auditor to challenge a server to provide a proof of data possession, in order to validate that the server holds the data that was originally stored by a client; we say that an RDC scheme seeks to provide a data possession guarantee. Archival storage presents exceptional performance demands: because file data is large and stored at remote sites, accessing an entire file is expensive in I/O costs to the storage server and in transmitting the file over a network. Reading an entire archive, even periodically, greatly limits the scalability of networked stores. Moreover, the I/O incurred to establish data possession interferes with the on-demand bandwidth needed to store and retrieve data.

We conclude that clients should be able to verify that a server has retained file data without retrieving the data from the server and without having the server access the entire file. A scheme for auditing remote data should be both lightweight and robust. Lightweight means that it does not unduly burden the SSP; this covers both the overhead (i.e., computation and I/O) at the SSP and the communication between the SSP and the auditor. This goal can be achieved by relying on spot checking, in which the auditor randomly samples small portions of the data and checks their integrity, thus minimizing the I/O at the SSP.


Spot checking lets the client detect whether a sizable fraction of the data stored at the server has been corrupted, but it cannot detect corruption of small parts of the data (e.g., 1 byte). Robust means that the auditing scheme incorporates mechanisms for mitigating arbitrary amounts of data corruption. Protecting against large corruptions ensures that the SSP has committed the contracted storage resources: little space can be reclaimed undetectably, making it unattractive to delete data to save on storage costs or to sell the same storage multiple times. Protecting against small corruptions protects the data itself, not just the storage resource. Much data has value well beyond its storage costs, making attacks that corrupt small amounts of data practical; for example, modifying a single bit may destroy an encrypted file or invalidate authentication information.
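To make the spot-checking idea concrete, the following Python sketch shows an auditor verifying a random sample of blocks against per-block MAC tags. The names (`BLOCK_SIZE`, `make_block_tags`, `spot_check`) and the use of HMAC-SHA256 tags are illustrative assumptions, not the PDP construction discussed later in this paper.

```python
import hashlib
import hmac
import random

BLOCK_SIZE = 4096  # illustrative block size, not taken from the paper

def make_block_tags(data: bytes, key: bytes) -> list[bytes]:
    """Client side: compute a keyed MAC tag for each block before upload."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    return [hmac.new(key, b, hashlib.sha256).digest() for b in blocks]

def spot_check(stored_blocks: list[bytes], tags: list[bytes],
               key: bytes, samples: int = 10) -> bool:
    """Auditor side: re-verify only a random sample of blocks
    instead of reading the whole file."""
    indices = random.sample(range(len(stored_blocks)),
                            min(samples, len(stored_blocks)))
    for i in indices:
        expected = hmac.new(key, stored_blocks[i], hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tags[i]):
            return False  # a sampled block was corrupted or replaced
    return True
```

Because only `samples` blocks are touched per challenge, the I/O at the server stays small; the flip side is that a corruption confined to a single unsampled byte goes unnoticed, which is exactly the gap that robust auditing closes with error-correcting codes.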

    II. PROPOSED SYSTEM

Input: Though the de-duplication technique saves storage space for cloud storage service providers, it reduces the reliability of the system. Data reliability is a critical issue in a de-duplication storage system, because there is only one copy of each file stored on the server, shared by all of its owners. If such a shared file or chunk is lost, a disproportionately large amount of data becomes inaccessible, because all of the files that share this chunk become unavailable. If the value of a chunk is measured by the amount of file data that would be lost were that single chunk lost, then the amount of user data lost when a chunk in the storage system is corrupted grows with the commonality of the chunk; for example, if a chunk shared by a thousand files is corrupted, every one of those files becomes unreadable. Thus, how to guarantee high data reliability in a de-duplication system is a critical problem. Most previous de-duplication systems have only been considered in a single-server setting. However, users and applications increasingly expect de-duplication and cloud storage systems to provide high reliability, especially in archival storage, where data are critical and should be preserved over long time periods. This requires that de-duplication storage systems provide reliability comparable to other highly available systems.

Output: The user logs in and uploads files to the cloud together with their de-duplication tags.
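As a rough illustration of this tag-driven upload flow, the following Python sketch shows a toy server-side index that stores a file body only the first time its tag is seen and merely records ownership afterwards. The class and method names are hypothetical, and a real deployment in this paper's setting would operate on shares spread over multiple servers rather than a single dictionary.

```python
import hashlib

class DedupStore:
    """Toy single-node index illustrating tag-based de-duplication."""
    def __init__(self):
        self.blobs = {}    # tag -> stored data
        self.owners = {}   # tag -> set of user ids

    @staticmethod
    def tag(data: bytes) -> str:
        # File-level tag: hash of the whole file.
        # Block-level de-duplication would tag each chunk instead.
        return hashlib.sha256(data).hexdigest()

    def upload(self, user: str, data: bytes) -> str:
        t = self.tag(data)
        if t not in self.blobs:      # first copy: actually store the data
            self.blobs[t] = data
        self.owners.setdefault(t, set()).add(user)  # later copies: record ownership only
        return t

store = DedupStore()
t1 = store.upload("alice", b"report-2016.pdf contents")
t2 = store.upload("bob", b"report-2016.pdf contents")
assert t1 == t2 and len(store.blobs) == 1  # one physical copy, two owners
```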

III. ADVANTAGES OF PROPOSED SYSTEM

1. Higher reliability: the data chunks are distributed across multiple cloud servers.

2. The security requirements of data confidentiality and tag consistency are achieved by introducing a deterministic secret sharing scheme in distributed storage systems.

    IV. LITERATURE SURVEY

1. Reclaiming Space from Duplicate Files in a Serverless Distributed File System:

The Farsite distributed file system provides availability by replicating each file onto multiple desktop computers. Since this replication consumes significant storage space, it is important to reclaim used space where possible. Measurement of over 500 desktop file systems shows that nearly half of all consumed space is occupied by duplicate files. We present a mechanism to reclaim space from this incidental duplication to make it available for controlled file replication. Our mechanism includes 1) convergent encryption, which enables duplicate files to be coalesced into the space of a single file, even if the files are encrypted with different users' keys, and 2) SALAD, a Self-Arranging, Lossy, Associative Database for aggregating file content and location information in a decentralized, scalable, fault-tolerant manner. Large-scale simulation experiments show that the duplicate-file coalescing system is scalable, highly effective, and fault-tolerant.
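The core trick of convergent encryption is easy to state in code: derive the key from the message itself, so identical plaintexts encrypt to identical ciphertexts regardless of who encrypts them. The sketch below is a minimal illustration assuming the third-party `cryptography` package for AES-GCM; the fixed nonce, which would be a bug in ordinary encryption, is deliberate here because determinism is the whole point and each key is bound to a single message.

```python
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

def convergent_encrypt(message: bytes) -> tuple[bytes, bytes, bytes]:
    key = hashlib.sha256(message).digest()     # K = H(M): anyone holding M derives the same key
    nonce = b"\x00" * 12                       # fixed nonce: acceptable only because each key encrypts one message
    ciphertext = AESGCM(key).encrypt(nonce, message, None)
    tag = hashlib.sha256(ciphertext).digest()  # de-duplication tag T = H(C)
    return key, ciphertext, tag

def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    return AESGCM(key).decrypt(b"\x00" * 12, ciphertext, None)

# Two independent users produce byte-identical ciphertexts,
# so the server can de-duplicate without reading any plaintext.
k1, c1, t1 = convergent_encrypt(b"shared quarterly report")
k2, c2, t2 = convergent_encrypt(b"shared quarterly report")
assert c1 == c2 and t1 == t2
```

The price of this determinism is that anyone who can guess a file can confirm the guess by encrypting it, which is the brute-force weakness the DupLESS item below addresses.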


2. DupLESS: Server-Aided Encryption for De-duplicated Storage:

Cloud storage service providers such as Dropbox, Mozy, and others perform de-duplication to save space by only storing one copy of each file uploaded. Should clients conventionally encrypt their files, however, savings are lost. Message-locked encryption (the most prominent manifestation of which is convergent encryption) resolves this tension. However, it is inherently subject to brute-force attacks that can recover files falling into a known set. We propose an architecture that provides secure de-duplicated storage resisting brute-force attacks, and realize it in a system called DupLESS. In DupLESS, clients encrypt under message-based keys obtained from a key server via an oblivious PRF protocol. It enables clients to store encrypted data with an existing service, have the service perform de-duplication on their behalf, and yet achieves strong confidentiality guarantees. We show that encryption for de-duplicated storage can achieve performance and space savings close to that of using the storage service with plaintext data.
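The following Python fragment sketches the server-aided key derivation idea in its simplest form: the message key is a keyed PRF output that cannot be computed without the key server. It deliberately drops the oblivious (blinded) part of DupLESS's actual RSA-OPRF protocol, so unlike real DupLESS the key server here learns the message hash; every name in the fragment is hypothetical.

```python
import hashlib
import hmac

SERVER_SECRET = b"long-term secret held only by the key server"  # hypothetical

def key_server_derive(message_hash: bytes) -> bytes:
    # Key-server side: PRF(secret, H(M)). In DupLESS this exchange is blinded
    # so the server never sees H(M); the blinding is omitted here for brevity.
    return hmac.new(SERVER_SECRET, message_hash, hashlib.sha256).digest()

def client_message_key(message: bytes) -> bytes:
    # Client side: hash the message, then ask the key server for the key.
    return key_server_derive(hashlib.sha256(message).digest())
```

Because an offline attacker cannot evaluate the PRF without `SERVER_SECRET`, dictionary attacks on predictable files must go through the key server, which can rate-limit or refuse suspicious query volumes.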

3. Message-Locked Encryption and Secure De-duplication:

We formalize a new cryptographic primitive, Message-Locked Encryption (MLE), where the key under which encryption and decryption are performed is itself derived from the message. MLE provides a way to achieve secure de-duplication (space-efficient secure outsourced storage), a goal currently targeted by numerous cloud-storage providers. We provide definitions both for privacy and for a form of integrity that we call tag consistency. Based on this foundation, we make both practical and theoretical contributions. On the practical side, we provide ROM security analyses of a natural family of MLE schemes that includes deployed schemes. On the theoretical side the challenge is standard-model solutions, and we make connections with deterministic encryption, hash functions secure on correlated inputs, and the sample-then-extract paradigm to deliver schemes under different assumptions and for different classes of message sources. Our work shows that MLE is a primitive of both practical and theoretical interest.
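Tag consistency has a very small code footprint, which the sketch below illustrates under the common convention (an assumption here, not a quotation of the MLE paper's schemes) that the tag is a hash of the ciphertext: the server recomputes the tag before accepting an upload, so a malicious client cannot register a bogus ciphertext under the tag of someone else's file.

```python
import hashlib

def make_tag(ciphertext: bytes) -> bytes:
    # T = H(C): any party can recompute the tag without seeing the plaintext.
    return hashlib.sha256(ciphertext).digest()

def server_accept_upload(ciphertext: bytes, claimed_tag: bytes) -> bool:
    # Without this check, a duplicate-faking attack could poison the single
    # stored copy: later owners of the real file would fetch the wrong data.
    return make_tag(ciphertext) == claimed_tag
```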

4. CDStore: Toward Reliable, Secure, and Cost-Efficient Cloud Storage via Convergent Dispersal:

We present CDStore, which disperses users' backup data across multiple clouds and provides a unified multi-cloud storage solution with reliability, security, and cost-efficiency guarantees. CDStore builds on an augmented secret sharing scheme called convergent dispersal, which supports de-duplication by using deterministic content-derived hashes as inputs to secret sharing. We present the design of CDStore and, in particular, describe how it combines convergent dispersal with two-stage de-duplication to achieve both bandwidth and storage savings and to be robust against side-channel attacks. We evaluate the performance of our CDStore prototype using real-world workloads on LAN and commercial cloud test beds. Our cost analysis also demonstrates that CDStore achieves a monetary cost saving of 70% over a baseline cloud storage solution using state-of-the-art secret sharing.
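To show what "deterministic content-derived hashes as inputs to secret sharing" can look like, here is a minimal Shamir-style (k, n) sharing over a prime field in which the polynomial coefficients are hash-derived from the secret instead of random, so every owner of the same data generates byte-identical shares that servers can de-duplicate. This is an illustrative toy, not CDStore's actual ramp-scheme construction; the field size and helper names are assumptions.

```python
import hashlib

P = 2**127 - 1  # a Mersenne prime; illustrative field choice

def _coeff(secret: int, i: int) -> int:
    # Deterministic coefficient derived from the secret itself, so all owners
    # of the same data produce identical shares (the "convergent" property).
    h = hashlib.sha256(secret.to_bytes(16, "big") + bytes([i])).digest()
    return int.from_bytes(h, "big") % P

def make_shares(secret: int, n: int, k: int) -> list[tuple[int, int]]:
    # (k, n) sharing: a degree-(k-1) polynomial with f(0) = secret,
    # evaluated at x = 1..n; any k evaluations reconstruct the secret.
    coeffs = [secret % P] + [_coeff(secret, i) for i in range(1, k)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of f(x) mod P
            y = (y * x + c) % P
        shares.append((x, y))
    return shares

def reconstruct(shares: list[tuple[int, int]]) -> int:
    # Lagrange interpolation at x = 0 over the field GF(P).
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

shares = make_shares(123456789, n=5, k=3)  # spread over 5 servers, any 3 recover
assert reconstruct(shares[:3]) == 123456789
assert reconstruct(shares[2:]) == 123456789
```

Like convergent encryption, deriving the "randomness" from the data trades information-theoretic secrecy for de-duplicability: predictable plaintexts remain guessable, which is why CDStore pairs the idea with further defenses.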

5. Secure De-duplication and Data Security with Efficient and Reliable CEKM:

Secure de-duplication is a technique for eliminating duplicate copies of stored data while providing security for them. De-duplication has been a well-known technique for reducing storage space and upload bandwidth in cloud storage, and convergent encryption has been extensively adopted to make de-duplication secure; a critical issue in making convergent encryption practical is to efficiently and reliably manage a huge number of convergent keys. The basic idea in this paper is that we can eliminate duplicate copies of stored data and limit the damage of stolen data if we decrease the value of that stolen information to the attacker.


This paper makes the first attempt to formally address the problem of achieving efficient and reliable key management in secure de-duplication. We first introduce a baseline approach in which each user holds an independent master key for encrypting the convergent keys and outsourcing them. However, such a baseline key management scheme generates an enormous number of keys as the number of users grows and requires users to dedicatedly protect their master keys. To this end, we propose Dekey together with user-behavior profiling and decoy technology. Dekey is a new construction in which users do not need to manage any keys on their own; instead, the convergent key shares are securely distributed across multiple servers to resist insider attackers. As a proof of concept, we implement Dekey using the Ramp secret sharing scheme and demonstrate that Dekey incurs limited overhead in realistic environments.

V. ARCHITECTURE OF PROPOSED SYSTEM

The following figure presents the design of the proposed system, which performs secure data de-duplication with improved reliability.

    Figure 1. Architecture of Proposed System.

    VI. CONCLUSION

We focused on the problem of auditing whether an untrusted server stores a client's data. We presented a model for provable data possession (PDP), in which it is desirable to minimize the file block accesses, the computation on the server, and the client-server communication. Our solutions for PDP fit this model: they incur a low (or even constant) overhead at the server and require a small, constant amount of communication per challenge. Key components of our schemes are the support for spot checking, which ensures that the schemes stay lightweight, and the homomorphic verifiable tags, which allow verifying data possession without access to the actual data file. We also define the notion of robust auditing, which integrates remote data checking (RDC) with forward error-correcting codes to mitigate arbitrarily small file corruptions, and we propose a generic transformation for adding robustness to any spot-checking-based RDC scheme.


Experiments demonstrate that our schemes make it practical to verify possession of large data sets. Previous schemes that do not allow sampling are not practical when PDP is used to prove possession of a large amount of data, as they impose a significant I/O and computational burden on the server.

    VII. FUTURE SCOPE

The distributed de-duplication systems improve the reliability of data. Four constructions were proposed to support file-level and block-level data de-duplication. Our de-duplication systems use the Ramp secret sharing scheme, and we demonstrated that it incurs small encoding/decoding overhead compared to the network transmission overhead in regular upload/download operations.

    VIII. REFERENCES

[1] J. Li, X. Chen, S. Tang, Y. Xiang, M. M. Hassan, and A. Alelaiwi, "Secure distributed deduplication systems with improved reliability," IEEE, 2015.

[2] J. Gantz and D. Reinsel, "The digital universe in 2020: Big data, bigger digital shadows, and biggest growth in the far east," http://www.emc.com/collateral/analyst-reports/idc-the-digital-universe-in-2020.pdf, Dec. 2014.

[3] M. O. Rabin, "Fingerprinting by random polynomials," Center for Research in Computing Technology, Harvard University, Tech. Rep. TR-CSE-03-01.

[4] J. R. Douceur, A. Adya, W. J. Bolosky, D. Simon, and M. Theimer, "Reclaiming space from duplicate files in a serverless distributed file system," in ICDCS, 2002, pp. 617–624.

[5] M. Bellare, S. Keelveedhi, and T. Ristenpart, "Message-locked encryption and secure deduplication," in EUROCRYPT, 2013, pp. 296–312.

[6] G. R. Blakley and C. Meadows, "Security of ramp schemes," in Advances in Cryptology: Proceedings of CRYPTO 84, ser. Lecture Notes in Computer Science, vol. 196, G. R. Blakley and D. Chaum, Eds. Berlin/Heidelberg: Springer-Verlag, 1985, pp. 242–268.

[7] M. O. Rabin, "Efficient dispersal of information for security, load balancing, and fault tolerance," Journal of the ACM, vol. 36, no. 2, pp. 335–348, Apr. 1989.

[8] A. Shamir, "How to share a secret," Communications of the ACM, vol. 22, no. 11, pp. 612–613, 1979.

[9] J. Li, X. Chen, M. Li, J. Li, P. Lee, and W. Lou, "Secure deduplication with efficient and reliable convergent key management," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 6, pp. 1615–1625, 2014.