
This document is solely for the use of Sony Pictures Entertainment (SPE) personnel. No part of it may be circulated, quoted, or reproduced for distribution outside the SPE organization without prior written approval from IBM.

© Copyright IBM Corporation 2009

Sony Pictures Entertainment

Digital Backbone File Management and Infrastructure

October 15, 2009

IBM Media & Entertainment Solutions

Sony Pictures Digital Backbone | October 15, 2009 | Confidential © Copyright IBM Corporation 2009

Agenda

Review of assumptions and figures from our last meeting
Introduction and discussion of proposed approach to file management
Review and discussion of impact new requirements will have on server and storage infrastructure
LTO Library Capacity Requirements


[Diagram, dated 10/14/09: data flows between Imageworks (CW), the DBB, other sources/destinations, and the LTO library]

Data INTO the DBB (writes):
Data FROM CW = 330MB/s
Data FROM 'other sources' = 100MB/s
Data FROM LTO Library to DBB = 86MB/s (READ from LTO, specified at 20% of 430MB/s)
Total WRITE = 516MB/s

Data OUT of the DBB (reads):
DBB TO LTO = 430MB/s
Data TO CW = undefined; assuming 100MB/s
Data TO 'other sources' = 100MB/s
Data TO other destinations = 200MB/s
Total READ = 830MB/s

Disaster recovery:
DBB TO LTO2 (DR) = 430MB/s
It was mentioned that there may be a desire to have two copies of everything on tape and/or to send a second copy through compression for storage on LTO off site. Assuming we need to keep up with 'constant ingest', this will add an additional 430MB/s for DR.

Total System Bandwidth = 1,776MB/s

Volumes:
CW to DBB = 210TB/week (35TB/day specified; see note 1)
Other sources to DBB = 70TB/week
Total expected volume to DBB (and therefore to LTO) = 280TB/week

Notes:
1. Assuming 330MB/s over 24 hours yields 28.5TB, NOT the 35TB specified.
2. Assuming 100MB/s over 24 hours yields 8.6TB, NOT the 12TB specified.

Other base assumptions:
o 1 week = 6 days
o 1 day = 24 hours
o Data retention (DBB) will vary but is assumed to be 18 months
o CW scanners generate 25TB/day (150TB/week)
o CW library masters and digital camera shots add another 50TB/week
o CW file sizes can be between 10 and 110MB
o "Other" sources to the DBB will provide 50M files/year at 20MB/file, and 70TB/week
o The original workflow called for all content to go to the BB and then be "MOVED" to tape; selected sequences would be called via a list from Sony and "COPIED" back from tape to disk
o There may be alternative workflows which are more efficient
o Sufficient network infrastructure exists within the DBB
o Other assumptions related to file management and workflows are included in this deck
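For reference, a minimal sketch of the arithmetic behind the totals above (all rates in MB/s, with 1 TB treated as 10^6 MB to match the deck's round numbers):

```python
# Sketch of the bandwidth/volume arithmetic behind the figures above.
writes = {"from CW": 330, "from other sources": 100, "from LTO (20% of 430)": 86}
reads = {"to LTO": 430, "to CW (assumed)": 100, "to other sources": 100, "to other destinations": 200}
dr_copy = 430  # second tape copy, keeping up with constant ingest

total_write = sum(writes.values())                     # 516 MB/s into the DBB
total_read = sum(reads.values())                       # 830 MB/s out of the DBB
total_bandwidth = total_write + total_read + dr_copy   # 1,776 MB/s

# Note 1 above: 330 MB/s sustained for 24 hours is ~28.5 TB, not the 35 TB/day specified.
tb_per_day_at_330 = 330 * 86400 / 1e6                  # ≈ 28.5 TB
print(total_write, total_read, total_bandwidth, round(tb_per_day_at_330, 1))
```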


Sony Backbone Archive
Assumptions and Details

Steve Kauffman
Rainer Richter
ECM Lab Services – San Jose
10/14/09


Sony Backbone Archive Overview

High speed tape access
A group of files is written as a single object on tape.
A smaller number of larger tape objects is more efficient.
A reduction in the number of objects in the tape storage system results in a smaller tape object database.
Larger objects are written to tape at a faster rate.
The map between tape objects and disk files is stored with the files, both on disk and on tape, and can also be saved independently.
Individual files can be retrieved without retrieving the entire group.
Greater flexibility in which protocols and locations may be enabled for LTO read/write
Improved organization of files on LTO
Ease of recovery from system problems
Integrated with standard TSM and GPFS products


Virtual File Repository

[Diagram: Sony Applications and Sony Scripts call the VFR API; VFR Movers transfer data between the GPFS Backbone and TSM, which tracks objects in the TSM Database and writes them to TSM tape]

Sony scripts interact with the files through the VFR API, requesting files for applications.
VFR moves files between GPFS and TSM.
VFR stores a sequence of files as one TSM object.
VFR retrieves single files, partial sequences of files, or an entire sequence of files from TSM.
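The VFR API itself is custom code and is not specified in this deck; purely as a hypothetical illustration of the calling pattern described above, a Sony script might look something like the sketch below. The function names, arguments, request format, and paths are invented for the sketch and are not the actual API.

```python
# Hypothetical sketch only: the real VFR API names and signatures are not defined in this deck.

def vfr_archive_sequence(directory):
    """Ask VFR to store every file in `directory` as one aggregated TSM object."""
    print(f"archive request queued for {directory}")

def vfr_request_files(paths, destination):
    """Ask VFR for specific frames; VFR serves each from GPFS if still on disk,
    or recalls it from its TSM tape object (partial object restore) otherwise."""
    print(f"recall of {len(paths)} frames queued to {destination}")

# A Sony script batching one client request:
vfr_archive_sequence("/Project1/00:01:00")
vfr_request_files(
    ["/Project1/00:01:00/00:01:12:18.dpx", "/Project1/00:01:00/00:01:12:19.dpx"],
    destination="/hotfolders/Project1_ClientA_Req1234",
)
```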


Phase Overview

Phase | Ingest To | Tape Mgmt | Recall From            | Comment
1     | GPFS      | None      | GPFS                   | Applications use GPFS only during TSM configuration and VFR development
2     | GPFS      | TSM Agg   | GPFS                   | Tapes are write-only pending development of recall code
3     | GPFS      | TSM Agg   | GPFS / TSM Agg         | Data is moved to tape, and the GPFS copy is deleted
4     | TSM Agg   | TSM Agg   | GPFS / TSM Agg         | Ingest directly to tape, which reduces GPFS bandwidth
5     | Durabytes | TBD       | GPFS / TSM / Durabytes | Write tapes in Durabytes format


Key Considerations

Phase 3 needs to be complete before GPFS capacity is exceeded

Workflow responsibility for WIP tracking, error notification and retries


Phase 1 – Disk Only

Write data to GPFS array
Test feeds and speeds
Sony applications are unchanged and point to GPFS
Develop VFR components

Pre-Reqs
1. GPFS installed and operational
2. Sony apps can access GPFS


Phase 2 – Tape Archive

GPFS data archived to tape in aggregated format; source files REMAIN
– Tapes will be valid for future phases, no migration will be needed
– Immediate aggregation benefit
Sony applications may still access GPFS for recalled data
Custom applications select files to archive by directory

Pre-Reqs
TSM Archive Aggregation code completed
– Mapper
  – Selects directories to be archived
  – Creates the Mapfile for each directory, saved in that directory
– Archiver
  – Locates directories with Mapfiles and data files, invokes PRouter to archive them
– PRouter – TSM API archive application
  – Aggregates a directory of files (including the Mapfile) and stores it in TSM
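As a rough illustration of the Mapper step listed above, the sketch below walks one directory, records each data file's name and size in a Mapfile saved in that directory, and leaves the actual archiving to a separate Archiver/PRouter pass. The "name size" line format and the Mapfile naming here are assumptions for illustration, not the actual format used by the project.

```python
import os

def write_mapfile(directory, project):
    """Illustrative Mapper pass: list the data files in one directory and
    save a Mapfile alongside them so a later Archiver/PRouter pass can
    aggregate the whole directory into a single TSM object."""
    entries = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and not name.endswith(".map"):
            entries.append(f"{name} {os.path.getsize(path)}")
    mapfile = os.path.join(directory, f"{project}:{os.path.basename(directory)}.map")
    with open(mapfile, "w") as f:
        f.write("\n".join(entries) + "\n")
    return mapfile
```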


Phase 3 – Tape Recall
GPFS ingest data archived to tape; source files DELETED
Sony apps link to the VFR API
– Data retrieved from GPFS or TSM
– Delivered directly to destination

Pre-Reqs
VFR API complete
– Recall processor
  – Interface to Sony app to get Request file
  – Convert Sony Request file to a Recall Mapfile, invoke PRouter
– PRouter TSM recall capability
  – Read the Recall Mapfile, recall files from TSM via partial object restore, write to destination via FTP, CIFS, NFS, etc.
Sony apps modified to access VFR
– List-based requests to optimize sequential tape reads
– Optional: recall to GPFS to preposition data for subsequent fast recall
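One way to picture the list-based recall step above is grouping a client's requested frames by the directory (i.e. the aggregated tape object) that holds them, so each TSM object is opened once and read sequentially. The sketch below is illustrative only; the request format and paths are assumptions.

```python
from collections import defaultdict
import os

def build_recall_map(requested_paths):
    """Illustrative recall step: group requested frames by the directory whose
    contents were aggregated into one tape object, so each object is recalled
    once via partial object restore rather than once per frame."""
    by_object = defaultdict(list)
    for path in sorted(requested_paths):
        by_object[os.path.dirname(path)].append(os.path.basename(path))
    return dict(by_object)

# Example: frames spanning two minute-directories become two recall groups.
req = ["/Project1/00:01:00/00:01:12:18.dpx", "/Project1/00:01:00/00:01:12:19.dpx",
       "/Project1/00:02:00/00:02:00:01.dpx"]
print(build_recall_map(req))
```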


Phase 4 – Direct Archive

Ingested data archived from source to tape
– No intermediate GPFS copy
– Reduces GPFS and network bandwidth

Pre-Reqs
Modify ingest routines to access source files
– Mapper, Archiver, PRouter
Trigger routines based on source availability
– Files must be in place before triggering
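A minimal sketch of the "files must be in place before triggering" check: a common heuristic is to wait until a source directory's listing and file sizes stop changing before firing the Mapper/Archiver. The settle time and polling interval below are assumptions, not values from the deck.

```python
import os, time

def wait_until_stable(directory, settle_seconds=30, poll=5):
    """Illustrative Phase 4 trigger: only archive once the source directory has
    stopped changing (same names and sizes for `settle_seconds`), so partially
    transferred frames are not aggregated to tape."""
    def snapshot():
        return {n: os.path.getsize(os.path.join(directory, n))
                for n in os.listdir(directory)}
    last, since = snapshot(), time.time()
    while time.time() - since < settle_seconds:
        time.sleep(poll)
        current = snapshot()
        if current != last:
            last, since = current, time.time()
    return last  # stable listing; safe to map and archive
```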


Phase 5 - Durabytes

Data archived to LTO tape in Durabytes format
– Open, portable, self-describing format
– Potential tape write speed increase
– No migration of existing TSM data
– VFR concurrently supports TSM and Durabytes repositories

Pre-Reqs
Durabytes Tape Manager
– TSM or new
PRouter converted to support Durabytes


State Transition Diagram - Phase 3

[Diagram: state flow for one directory]
Initial (0 GPFS files)
  -> Ingest (Sony) -> Ingested (1,440 files)
  -> Mapper -> Mapped (1,441 files)
  -> Archiver / PRouter -> Archived (1 file)
  -> Recall / PrePosition -> Hot Folder
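Read as a table of states, the actor that drives each transition, and the GPFS file count afterwards; the 1,440-frame figure is consistent with one minute of 24 fps DPX material (an assumption, but it matches the minute-directory examples that follow). A tracking script could encode it as a simple lookup:

```python
# Phase 3 state flow for one minute-directory, as an illustrative lookup table.
# File counts assume 24 fps x 60 s = 1,440 DPX frames plus 1 Mapfile.
PHASE3_TRANSITIONS = {
    "Initial":  {"next": "Ingested", "actor": "Ingest (Sony)",        "gpfs_files_after": 1440},
    "Ingested": {"next": "Mapped",   "actor": "Mapper",               "gpfs_files_after": 1441},
    "Mapped":   {"next": "Archived", "actor": "Archiver / PRouter",   "gpfs_files_after": 1},
    "Archived": {"next": "HotFolder", "actor": "Recall / PrePosition", "gpfs_files_after": 1},
}
```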


GPFS Contents - Initial

/Project1

00:00:00 00:01:00 00:02:00 00:03:00


GPFS Contents - Ingested

/Project1
  00:00:00/  00:00:00:01.dpx, 00:00:00:02.dpx, ..., 00:00:59:23.dpx
  00:01:00/  00:01:00:01.dpx, 00:01:00:02.dpx, ..., 00:01:59:23.dpx
  00:02:00/  00:02:00:01.dpx, 00:02:00:02.dpx, ..., 00:02:59:23.dpx
  00:03:00/  00:03:00:01.dpx, 00:03:00:02.dpx, ..., 00:03:59:23.dpx


GPFS Contents - Mapped

/Project1
  00:00:00/  Project1:00:00.map, 00:00:00:01.dpx, 00:00:00:02.dpx, ..., 00:00:59:23.dpx
  00:01:00/  Project1:00:01.map, 00:01:00:01.dpx, 00:01:00:02.dpx, ..., 00:01:59:23.dpx
  00:02:00/  Project1:00:02.map, 00:02:00:01.dpx, 00:02:00:02.dpx, ..., 00:02:59:23.dpx
  00:03:00/  Project1:00:03.map, 00:03:00:01.dpx, 00:03:00:02.dpx, ..., 00:03:59:23.dpx
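The deck does not define the Mapfile layout. Assuming the same illustrative "name size" per-line format used in the Mapper sketch earlier (an assumption, not the actual format), a recall tool could turn a Mapfile into byte offsets for partial object restore, provided the frames were concatenated in the listed order:

```python
def read_mapfile(path):
    """Illustrative only: parse a 'name size' per-line Mapfile into
    (name, offset, length) stubs, assuming frames were aggregated in
    listed order so offsets are running sums of the sizes."""
    stubs, offset = [], 0
    with open(path) as f:
        for line in f:
            name, size = line.rsplit(None, 1)
            stubs.append((name, offset, int(size)))
            offset += int(size)
    return stubs
```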


GPFS Contents - Archived

/Project1
  00:00:00/  Project1:00:00.map
  00:01:00/  Project1:00:01.map
  00:02:00/  Project1:00:02.map
  00:03:00/  Project1:00:03.map


Hot Folder Contents - Recall

/Project1_ClientA_Req1234
  00:01:12:18.dpx, 00:01:12:19.dpx, ..., 00:01:13:04.dpx


State Transition Diagram - Phase 4

[Diagram: state flow for one source directory]
Initial (1,440 source files)
  -> Mapper -> Mapped (1,441 source files)
  -> Archiver / PRouter -> Archived (1 GPFS file)
  -> Recall / PrePosition -> Hot Folder


Backbone Assumptions

1. The backbone will have disk storage and tape storage.
2. There will be a single file system view which unifies file searching and access regardless of the actual file location.
3. Files are copied to the backbone, or copied from the backbone. They are NEVER updated directly while on the backbone.
4. If a file is changed while off the backbone, it will be returned to the backbone under a different name.
   a. i.e. there will only ever be a single version of each file on the backbone, though it can be in multiple locations (disk, tape).
   b. Each file will have a unique combination of path and name, i.e. names can repeat but must be in different subdirectories. There is no support for overwrite.
5. All files arriving on the backbone disk will be copied to tape, resulting in 2 copies of the file, 1 on disk, 1 on tape.
   a. The copy to tape will be asynchronous as capacity allows.
   b. The copy on tape may be suitable for DR.
   c. Optional policies will create copies on 2 different tapes for redundancy.


Backbone Assumptions

6. Files on disk may be deleted per policy, leaving only a single copy on tape.
7. Files on tape will be deleted via manual processes on a project basis.
8. Requests for access to a file on tape will cause the file to be copied from tape to the client location.
   a. The file is NOT returned to the disk area of the backbone, since the presumption is that the files will be changed and returned under a different name.
   b. File delivery will be highly asynchronous due to request queues, tape, and tape drive access limitations.
9. Directory naming conventions will help optimize performance.
   a. Directory names will isolate projects from each other:
      i. facilitate tape reclamation
      ii. map to TSM constructs, e.g. tape pools
   b. Directory arrangements will group together data likely to be accessed together (e.g. frames within the same minute):
      i. minimize tape mounts required for file recall
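As a small illustration of assumption 9b, the helper below maps a frame's timecode-style name onto the minute directory that groups frames likely to be recalled together. The naming pattern follows the /Project1/00:01:00/ examples elsewhere in the deck; the function itself is an assumption for illustration.

```python
def minute_directory(project, frame_name):
    """Place HH:MM:SS:FF.dpx frames into a per-minute directory so frames
    recalled together end up in the same aggregated tape object."""
    hh, mm, _ss, _ff_ext = frame_name.split(":")
    return f"/{project}/{hh}:{mm}:00/{frame_name}"

print(minute_directory("Project1", "00:01:12:18.dpx"))
# -> /Project1/00:01:00/00:01:12:18.dpx
```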


Summary

Use GPFS as the index for all files
– Directly, if the file is on disk
– Indirectly via a Mapfile, if the file is on tape
– ~1500X reduction in GPFS and TSM objects
Mapfile is an aggregation of stubs
Custom code enables aggregation
– Aggregate files to TSM
– Partial object restore for efficient recall
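The ~1500X figure is consistent with the minute-directory example used throughout the deck: 1,440 frames plus one Mapfile collapse into a single tape object. A back-of-the-envelope check, assuming 24 fps material:

```python
frames_per_minute = 24 * 60              # 1,440 DPX frames in one minute directory
objects_before = frames_per_minute + 1   # frames plus the Mapfile tracked individually
objects_after = 1                        # one aggregated TSM object per directory
print(objects_before / objects_after)    # 1441, i.e. roughly the ~1500X reduction quoted
```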


Disk Performance Options for Sony Pictures Digital Backbone

October 15, 2009


Objectives

Understand aggregate storage performance requirements

Design a system to meet those requirements

Reuse existing assets




Ran disk modeling tool

Assumptions:
2 MB block size
Varied throughput until one of the system components began to be stressed

Measures:
Internal FC utilization
External Host Adapter utilization (this is where most bottlenecks occurred)
Hard drive utilization
Processor utilization
PCI bus utilization
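The modeling approach above (raise the offered throughput until some component saturates) can be sketched roughly as below. The component ceilings and the 80% threshold are made-up placeholder numbers for illustration, not values from the modeling tool.

```python
# Illustrative sweep: increase offered MB/s until the busiest component
# exceeds a utilization threshold. Ceilings here are placeholders only.
CEILINGS_MB_S = {"internal FC": 3200, "host adapters": 2400,
                 "hard drives": 2800, "processor": 3600, "PCI bus": 3000}

def max_sustainable(threshold=0.8, step=50):
    rate = 0
    while True:
        rate += step
        utilization = {k: rate / v for k, v in CEILINGS_MB_S.items()}
        component, busiest = max(utilization.items(), key=lambda kv: kv[1])
        if busiest > threshold:
            return rate - step, component  # last rate before a component was stressed

print(max_sustainable())  # e.g. (1900, 'host adapters') with these placeholder ceilings
```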


Results of modeling

[Chart: Disk Throughput Comparisons. Vertical axis: MB/sec, 0 to 3,500. Configurations compared: DS4800 with 4 x 4 Gb HAs and 192 disks; DS5100 with 8 x 4 Gb HAs and 384 disks; DS5100 with 8 x 8 Gb HAs and 384 disks; DS5300 with 16 x 8 Gb HAs and 384 disks. A reference line marks the estimated Sony DBB throughput.]


Observations

Loop optimizations
– DS4800 performs best with disk drawers in even multiples of 4
– DS5000 performs best with disk drawers in even multiples of 8

Performance optimizations
– Increasing the quantity and speed of Host Adapters had the biggest impact, as would be expected for a large-block-size workload
– SATA disk drives provide sufficient bandwidth if present in sufficient quantity

Other workloads
– GPFS metadata and the TSM/VFS database should be on separate, high performance Fibre Channel subsystems


Options for DBB

1. 2 x DS5300, each with:
   – 384 x 1 TB SATA drives
   – 16 x 8 Gb/sec Host Adapters
   – 16 GB cache
   Provides significant headroom, more than double the current required performance.
   Provides performance capacity to allow remote copy capability if desired.

2. 2 x DS5100, each with:
   – 384 x 1 TB SATA drives
   – 16 x 8 Gb/sec Host Adapters
   – 8 GB cache
   This will handle the current requirements, but will already be at about 60% of performance capacity at the beginning.


Reuse options

1. One new DS5300, and upgrade the current DS5100 to a DS5300
   – DS5300s to have 16 x 8 Gb/sec Host Adapters and 16 GB cache
   – Reuse SATA drives, adding 4 drawers per subsystem

2. One new DS5100, and upgrade the current DS5100 to have 8 x 8 Gb/sec Host Adapters and additional drive attachment (up to 448 drives)
   – Reuse SATA drives, adding 4 drawers per subsystem

For GPFS metadata and TSM/VFS, reuse the DS4800s and add 1 drawer of FC drives to each


Other considerations

SAN Switch Fabric

Additional GPFS servers

TSM/VFS servers

Backup and restore requirements

DR requirements


Projected LTO Library Capacity Requirements


Projected Library Capacity Requirements

Additional Required Frames

Existing frames (10 feet of library): L53 = 287 slots, D53 = 408, D53 = 408, S54 = 1,320; existing slot count = 2,423
Additional frames (10 more feet, 20 feet total): S54 = 1,320, S54 = 1,320, S54 = 1,320, D53 = 440; added slots = 4,400
Required slot count = 6,701; slot count with the additional frames = 6,823

Retention and stored volume:
Months = 18, weeks per month = 4.5, total weeks = 81, 70TB + 35TB per week = 105 TB/week, total TB stored = 8,505

Assumptions:

• 18 months of stored data at 105 TB of data per week.

• 8.5 PB of storage in month 18.

• The stored data is not compressible.

• The media will fill to 70% of the native capacity.

• LTO4 (560 GB consumed capacity) requires 10,600 slots.

• Current library has 2,423 slots.

• The plan includes moving to LTO5 (1.2 TB consumed capacity) upon its introduction.

• An undetermined amount of LTO4 media will remain in the library.

IBM recommendation:

Add three S54 and one D53 frames (totaling 6,700 slots) upon the introduction of LTO5.

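A quick back-of-the-envelope check of the capacity and slot figures above, using only the numbers stated in the assumptions and the frame table (the rounding is ours):

```python
# Back-of-the-envelope check of the capacity and slot figures above.
weeks = 18 * 4.5                                # 81 six-day weeks over 18 months
total_tb = weeks * (70 + 35)                    # 105 TB/week -> 8,505 TB (~8.5 PB) stored

existing_slots = 287 + 408 + 408 + 1320         # current L53 + D53 + D53 + S54 frames = 2,423
added_slots = 3 * 1320 + 440                    # recommended 3 x S54 + 1 x D53 frames = 4,400
print(total_tb, existing_slots + added_slots)   # 8505.0 TB stored; 6,823 slots vs. 6,701 required
```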