US 20100333116A1

(12) Patent Application Publication (10) Pub. No.: US 2010/0333116 A1 (19) United States

Prahlad et al. (43) Pub. Date: Dec. 30, 2010

(54) CLOUD GATEWAY SYSTEM FOR MANAGING DATA STORAGE TO CLOUD STORAGE SITES

(76) Inventors: Anand Prahlad, Bangalore (IN); Marcus S. Muller, Tinton Falls, NJ (US); Rajiv Kottomtharayil, Marlboro, NJ (US); Srinivas Kavuri, Miyapur (IN); Parag Gokhale, Ocean, NJ (US); Manoj Vijayan, Marlboro, NJ (US)

Correspondence Address: PERKINS COIE LLP PATENT-SEA P.O. BOX 1247 SEATTLE, WA 98111-1247 (US)

(21) Appl. No.: 12/751,953

(22) Filed: Mar. 31, 2010

Related U.S. Application Data

(60) Provisional application No. 61/299,313, filed on Jan. 28, 2010, provisional application No. 61/221,993, filed on Jun. 30, 2009, provisional application No. 61/223,695, filed on Jul. 7, 2009.

[Front-page drawing: clients 130, each with a data agent 195, and secondary storage computing devices 165.]

Publication Classification

(51) Int. Cl. G06F 9/44 (2006.01) G06F 15/167 (2006.01) H04L 29/06 (2006.01)

(52) U.S. Cl. .......................... 719/328; 709/216; 713/153

(57) ABSTRACT

Systems and methods are disclosed for performing data storage operations, including content-indexing, containerized deduplication, and policy-driven storage, within a cloud environment. The systems support a variety of clients and cloud storage sites that may connect to the system in a cloud environment that requires data transfer over wide area networks, such as the Internet, which may have appreciable latency and/or packet loss, using various network protocols, including HTTP and FTP. Methods are disclosed for content indexing data stored within a cloud environment to facilitate later searching, including collaborative searching. Methods are also disclosed for performing containerized deduplication to reduce the strain on a system namespace, effectuate cost savings, etc. Methods are disclosed for identifying suitable storage locations, including suitable cloud storage sites, for data files subject to a storage policy. Further, systems and methods for providing a cloud gateway and a scalable data object store within a cloud environment are disclosed, along with other features.

[Front-page drawing, continued: cloud storage site A (115A), cloud storage site B (115B), and cloud storage site N (115N), with the label "http/https/ftp protocols".]

Patent Application Publication Dec. 30, 2010 Sheet 1 of 33 US 2010/0333116 A1

[Sheet 1 drawing: printed rotated; its labels are not recoverable from the extracted text.]

Patent Application Publication Dec. 30, 2010 Sheet 2 of 33 US 2010/0333116 A1

[FIG. 2: block diagram. Recoverable labels: a storage manager containing a network agent, a management agent, a management index, a jobs agent, and an interface agent; two clients 130, each with a network agent, a data agent 195, and a metabase; two secondary storage computing devices 165, each with a content indexing component, a network agent, a deduplication module 299, and a media file system agent with a cloud storage submodule; a light index; deduplication databases 297; and storage devices 115 (e.g., cloud storage sites). Other reference numerals appearing on the sheet: 105, 150, 205, 211, 220, 225, 233, 235, 236, 240, 245, 247, 255, 260, 261, 270.]

FIG. 2

Patent Application Publication Dec. 30, 2010 Sheet 3 of 33 US 2010/0333116 A1

340 Receive a file system request to write data to a target cloud storage site

350 Add data associated with received file system request to buffer

Buffer full?

Convert file system requests to vendor-specific API calls

380 Transmit buffer using vendor-specific API calls

Transmission successful?

FIG. 3A
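The buffering flow of FIG. 3A can be illustrated with a minimal Python sketch. Everything named here is hypothetical: `BufferedCloudWriter`, the `send` callable (a stand-in for whatever vendor-specific API call a cloud storage site exposes), the byte-count threshold used for "Buffer full?", and the retry-on-failure policy used for "Transmission successful?". The drawing does not state where the two decision branches lead, so both choices are assumptions.

```python
from typing import Callable


class BufferedCloudWriter:
    """Buffers file system writes and flushes them through a vendor call."""

    def __init__(self, send: Callable[[bytes], bool], capacity: int,
                 max_retries: int = 3) -> None:
        self.send = send                # hypothetical vendor-specific API call
        self.capacity = capacity        # assumed "buffer full" threshold, in bytes
        self.max_retries = max_retries  # assumed retry policy
        self.buffer = bytearray()

    def write(self, data: bytes) -> None:
        # Steps 340/350: receive a write request and add its data to the buffer.
        self.buffer.extend(data)
        # "Buffer full?": flush once the threshold is reached.
        if len(self.buffer) >= self.capacity:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        payload = bytes(self.buffer)
        # Step 380: transmit; "Transmission successful?" is modeled as a retry loop.
        for _ in range(self.max_retries):
            if self.send(payload):
                self.buffer.clear()
                return
        raise IOError("transmission failed after retries")
```

Batching small writes this way trades a little latency for fewer round trips over a wide area network, which matches the latency and packet-loss concern raised in the abstract.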

Patent Application Publication Dec. 30, 2010 Sheet 4 of 33 US 2010/0333116 A1

300

310 Receive copy of an original data set from a file system

320 Index data

330 Deduplicate data and store deduplicated data on cloud storage

Return

FIG. 3B
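As a rough illustration of steps 310 to 330 of FIG. 3B, the sketch below indexes a copy of a data set and then stores it block by block, keeping one copy of each distinct block. The function name, the fixed block size, the word-level index, and the use of SHA-256 digests as block identifiers are all assumptions made for the example; the drawing does not specify any of them, and a plain dictionary stands in for cloud storage.

```python
import hashlib
from typing import Dict, List, Set, Tuple


def index_and_deduplicate(
    files: Dict[str, bytes], block_size: int = 4
) -> Tuple[Dict[str, Set[str]], Dict[str, bytes], Dict[str, List[str]]]:
    content_index: Dict[str, Set[str]] = {}  # word -> names of files containing it
    block_store: Dict[str, bytes] = {}       # digest -> block (stand-in for cloud storage)
    recipes: Dict[str, List[str]] = {}       # file name -> ordered block digests

    for name, data in files.items():
        # Step 320: index the data (here, a naive whitespace-token index).
        for word in data.decode("utf-8", errors="ignore").split():
            content_index.setdefault(word.lower(), set()).add(name)

        # Step 330: deduplicate and store; identical blocks are stored once.
        digests: List[str] = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            block_store.setdefault(digest, block)
            digests.append(digest)
        recipes[name] = digests

    return content_index, block_store, recipes
```

A file can be reassembled by joining `block_store[d]` for each digest `d` in its recipe.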

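Patent Application Publication Dec. 30, 2010 Sheet 5 of 33 US 2010/0333116 A1

[FIG. 4: block diagram 400. Recoverable labels: clients 130 (Client 1, Client 2, ... Client n); deduplication database 297; deduplication module 299 containing components 410 and 420 (labels partly lost in extraction; the fragments "tion" and "generation" survive), identifier comparison 425, and criteria evaluation 430; storage device 115.]

FIG. 4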

Patent Application Publication Dec. 30, 2010 Sheet 6 of 33 US 2010/0333116 A1

[FIG. 5A: chunk folder 502 containing metadata file 504, N file 506, and S file 508. Other reference numerals on the sheet as extracted: 500, 510, 5151.]

FIG. 5A

[FIG. 5B: chunk folder 1 (502) containing metadata file 1 (504), N file 1 (506), and S file 1 (508); chunk folder 2 containing metadata file 2 (504) and N file 2 (506).]

FIG. 5B

Patent Application Publication Dec. 30, 2010 Sheet 7 of 33 US 2010/0333116 A1

[FIG. 5C: data stream 520 made up of alternating stream headers 522 and stream data 524: Stream Header 1, Stream Data 1, Stream Header 2, Stream Data 2, ... Stream Header n, Stream Data n.]

FIG. 5C

[FIG. 5D: items C0, C1, C2, C3, ... Cn (each labeled 542), with the values 0, 5, 10, 15, ... 65 beneath them (each labeled 544).]

FIG. 5D

Patent Application Publication Dec. 30, 2010 Sheet 8 of 33 US 2010/0333116 A1

600

Prune

605 Receive selection of an archive file to prune

610 Perform lookup of archive file

615 Does archive file have references out?

620 Delete the references out

Archive files referenced by references out have other references in?

630 Prune archive files referenced by references out

635 Does archive file have references in?

640 Delete references in

645 Add reference to archive file to deleted archive file table

650 Prune archive file

655 Add deleted time stamp to archive file table

FIG. 6

Patent Application Publication Dec. 30, 2010 Sheet 11 of 33 US 2010/0333116 A1

802

804 Chunk_001
    Metadata file 806: Non-SI data
    Metadata index file 808: Index to metadata file
    Container file 001 810: B1 B2 B3 ... Bn
    Container file 002 811: B1 B2 B3 ... Bn
    Container index file 812: entries 001_B1, 001_B2, ... 002_B1, 002_Bn, each carrying a 0 or 1 value

805 Chunk_002
    Metadata file 807: Non-SI data, Link, Link, Non-SI data
    Metadata index file 809: Index to metadata file
    Container file 001 813: B1 B2 B3 B4 B5 ... Bn
    Container index file 814: entries 001_B1, 001_B2, ... 001_Bn (values garbled in extraction)

FIG. 8

Patent Application Publication Dec. 30, 2010 Sheet 12 of 33 US 2010/0333116 A1

900

905 Receive selection of a job to be pruned

907 Determine archive file, volume folders, and chunk folders corresponding to job

910 Delete metadata files and metadata index files in chunk folders

915 Access container file in chunk folders

920 For the block in the container file, is its reference count in primary table equal [remainder of label not recovered]?

Set corresponding entry in container index file equal to zero

More blocks in container file?

932 Entries in container index file corresponding to the container equal to zero?

933 Delete container file

More container files in chunk folders?

Free up space in container files?

Free up space in container files

Return

FIG. 9
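The container-level part of the pruning flow in FIG. 9 (steps 915 to 933) might look like the sketch below. It rests on several assumptions: the truncated test in step 920 is read as "reference count equal to zero", the container index is modeled as a dictionary keyed `<container>_<block>` in the style of the FIG. 8 labels, and the function and parameter names are hypothetical. Deleting metadata files (step 910) and freeing space inside surviving container files are left out.

```python
from typing import Dict, List


def prune_containers(
    containers: Dict[str, List[str]],  # container name -> block ids it holds
    container_index: Dict[str, int],   # "<container>_<block>" -> 1 (in use) or 0
    ref_counts: Dict[str, int],        # "<container>_<block>" -> reference count
) -> List[str]:
    """Zero the index entries of unreferenced blocks; delete empty containers."""
    deleted: List[str] = []
    for name, blocks in list(containers.items()):
        for block in blocks:
            key = f"{name}_{block}"
            # Step 920 (assumed reading): reference count equal to zero?
            if ref_counts.get(key, 0) == 0:
                container_index[key] = 0
        # Step 932: are all index entries for this container zero?
        if all(container_index[f"{name}_{b}"] == 0 for b in blocks):
            # Step 933: delete the container file.
            del containers[name]
            deleted.append(name)
    return deleted
```

Keeping a per-block flag in the index lets a container survive while any one of its blocks is still referenced, so a whole-file delete happens only when nothing in it is needed.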

Patent Application Publication Dec. 30, 2010 Sheet 13 of 33 US 2010/0333116 A1

Index content

1010 Select copy of data set

1020 Identify content

1030 Update content index

Return

FIG. 10

Patent Application Publication Dec. 30, 2010 Sheet 14 of 33 US 2010/0333116 A1

[Sheet 14 drawing: printed rotated; its labels are not recoverable from the extracted text.]

Patent Application Publication Dec. 30, 2010 Sheet 15 of 33 US 2010/0333116 A1

1200

Restore

1205 Receive selection of a file to restore

1210 Determine archive file ID and offset

1215 Access secondary storage

1220 Open chunk folder

1225 Parse metadata file

1230 Determine location of file from metadata

1235 Open file

1240 Restore file

Return

FIG. 12
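The restore steps of FIG. 12 can be walked through with a small in-memory model. The data layout is invented for illustration: secondary storage is a dictionary of chunk folders keyed by archive file ID, the metadata file is a JSON string mapping an offset to a container name, start position, and length, and `restore_file` is a hypothetical name. The drawing itself does not describe the metadata format.

```python
import json
from typing import Dict, Tuple


def restore_file(
    file_id: str,
    file_index: Dict[str, Tuple[str, int]],  # file ID -> (archive file ID, offset)
    secondary_storage: Dict[str, dict],      # archive file ID -> chunk folder
) -> bytes:
    # Step 1210: determine archive file ID and offset.
    archive_file_id, offset = file_index[file_id]
    # Steps 1215/1220: access secondary storage and open the chunk folder.
    chunk_folder = secondary_storage[archive_file_id]
    # Step 1225: parse the metadata file (JSON is an assumed format).
    metadata = json.loads(chunk_folder["metadata"])
    # Step 1230: determine the location of the file from the metadata.
    entry = metadata[str(offset)]
    # Steps 1235/1240: open the container and return the file's bytes.
    container = chunk_folder[entry["container"]]
    return container[entry["start"]:entry["start"] + entry["length"]]
```

For example, with a chunk folder whose container `"N"` holds `b"helloworld!"` and metadata entries at offsets 0 (length 5) and 5 (length 6), restoring the two files yields the two halves of the container.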

Patent Application Publication Dec. 30, 2010 Sheet 16 of 33 US 2010/0333116 A1

1300

1310             1320     1330
Archive File ID  File ID  Offset
AF1              F1       OF1
                 F2       OF2
                 F3       OF3
                 ...
                 Fn       OFn

1350

1370             1380         1390
Archive File ID  Media Chunk  Start
C, J, Cycle, AF  M1, C1       AF1, OF1, Size
                 M, C2        AF1, OF2, Size
                 M2, C3       AF1, OF3, Size

FIG. 13B
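One way to read tables 1300 and 1350 together is as a two-step lookup: the first table gives a file's archive file ID and offset, and the second gives the media and chunk where a given offset range of that archive file starts. The sketch below assumes numeric offsets, chunk rows sorted by start offset, and a binary search to find the covering chunk; the table contents are illustrative values, not data from the drawing, and `locate` is a hypothetical helper.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Modeled on table 1300: file ID -> (archive file ID, offset). Values are illustrative.
FILE_TABLE: Dict[str, Tuple[str, int]] = {
    "F1": ("AF1", 0),
    "F2": ("AF1", 130),
    "F3": ("AF1", 250),
}

# Modeled on table 1350: archive file ID -> (start offset, media, chunk), sorted by start.
CHUNK_TABLE: Dict[str, List[Tuple[int, str, str]]] = {
    "AF1": [(0, "M1", "C1"), (100, "M1", "C2"), (250, "M2", "C3")],
}


def locate(file_id: str) -> Tuple[str, str, int]:
    """Return (media, chunk, offset within chunk) for a file."""
    archive_file_id, offset = FILE_TABLE[file_id]
    chunks = CHUNK_TABLE[archive_file_id]
    starts = [start for start, _, _ in chunks]
    # Last chunk whose start offset is <= the file's offset.
    pos = bisect_right(starts, offset) - 1
    start, media, chunk = chunks[pos]
    return media, chunk, offset - start
```

Here `locate("F2")` resolves to chunk `C2` on media `M1`, 30 bytes past the start of that chunk.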

Patent Application Publication Dec. 30, 2010 Sheet 17 of 33 US 2010/0333116 A1

Search Index

1410 Receive Search Request

1420 Search Content Index

1425 Generate Search Results

1430 Get Next Search Result

Archived?

Retrieve Archived Content

More Results?

1460 Provide Search Results

Return

FIG. 14
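The search loop of FIG. 14 can be sketched as follows. The inverted-index shape (term to set of document names), the all-terms-must-match rule, the `archived` set, and the `retrieve` callable are assumptions introduced for the example; the drawing only names the steps.

```python
from typing import Callable, Dict, List, Set


def search(
    query: str,
    content_index: Dict[str, Set[str]],  # term -> documents containing it
    archived: Set[str],                  # documents held in archive storage
    retrieve: Callable[[str], bytes],    # hypothetical fetch from archive storage
) -> List[dict]:
    # Steps 1410/1420: receive the request and search the content index.
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(content_index.get(t, set()) for t in terms))

    # Steps 1425/1430: generate results and visit them one at a time.
    results: List[dict] = []
    for doc in sorted(hits):
        item = {"doc": doc, "archived": doc in archived}
        # "Archived?": retrieve archived content before returning the result.
        if item["archived"]:
            item["content"] = retrieve(doc)
        results.append(item)

    # Step 1460: provide the search results.
    return results
```

Retrieval is deferred until a hit is known to be archived, so a query that matches only non-archived content never touches archive storage.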

Patent Application Publication Dec. 30, 2010 Sheet 18 of 33 US 2010/0333116 A1

[Sheet 18 drawing: printed rotated; its labels are not recoverable from the extracted text.]

Patent Application Publication Dec. 30, 2010 Sheet 19 of 33 US 2010/0333116 A1

[Sheet 19 drawing: printed rotated; its labels are not recoverable from the extracted text.]