Digital Media Ingest and Storage Options on AWS
Henry Zhang, Amazon Web Services
Content has Gravity and is getting heavier …
…it’s easier to move processing to the content
4k/8k Content
Where is the problem?
More Bandwidth $$$$$
More Powerful Compute $$$$$
Way more Storage $$$$$
Some Progress (ABR, HEVC, VP10)
Where is the sliding scale on my Infrastructure?
AWS Storage options for digital media
• File: Amazon EFS
• Block: Amazon EBS, Amazon EC2 instance storage
• Object: Amazon S3, Amazon Glacier
A Concept - the Content Lake
Inspired by the Data Lake (term coined by James Dixon in 2010)
A single store of all the digital content that you create and acquire, in any form or factor
• Don’t assume any resolutions/formats (for now or the future)
• It is up to the consumer (the application consuming the content) to use the appropriate infrastructure for processing
Amazon S3 : the Content Lake
• Durable, cost-effective and fast
• Highly scalable front-end
  – Multi-part uploads (parallel writes)
  – Range-gets (parallel reads)
• No need for capacity planning or provisioning
• Use Amazon S3 with on-premises storage in a hybrid model
• Secure
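Parallel reads and writes both come down to splitting one object into byte ranges that workers handle independently. A minimal sketch of that split follows; the function names and the part size are hypothetical, and only the standard HTTP `Range: bytes=start-end` header form (inclusive on both ends) is assumed.

```python
def byte_ranges(object_size, part_size):
    """Split an object into inclusive (start, end) byte ranges.

    object_size and part_size are in bytes. The last range may be shorter.
    """
    if object_size < 0 or part_size <= 0:
        raise ValueError("object_size must be >= 0 and part_size must be > 0")
    ranges = []
    start = 0
    while start < object_size:
        end = min(start + part_size, object_size) - 1  # inclusive end offset
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start, end):
    """Format one range as an HTTP Range header value."""
    return f"bytes={start}-{end}"


if __name__ == "__main__":
    # Hypothetical example: a 10-byte object read in 4-byte parts.
    for start, end in byte_ranges(10, 4):
        print(range_header(start, end))
```

Each range can then be fetched (range-get) or uploaded (one part of a multi-part upload) by a separate worker and reassembled in order.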
S3 scalability: buckets and objects
Hydrating the Content Lake
Amazon S3
Amazon S3 (multi-part upload)
Direct Connect
N x 1G | 10G
Massively Scalable Front-end
Introducing AWS Import/Export Snowball
Scale and Speed
• Up to 50TB Capacity per device
• 10Gbps and 1Gbps connectivity
• Parallel data transfer enables PBs transferred in a week
Secure
• Tamper-resistant enclosure
• 256-bit encryption with KMS
• Secure data erasure
Simple
• Manage entire process through AWS Console
• Lightweight data transfer client
• Notifications
What is Snowball? Petabyte scale data transport
• E-ink shipping label
• Ruggedized case (“8.5G Impact”)
• All data encrypted end-to-end
• 50 TB
• 10G network
• Rain & dust resistant
• Tamper-resistant case & electronics
Can I drop it?
• No (please don’t)
• Snowball is its own box
• Has had many drop tests already
• Can handle 8.5G impacts
• Designed for shipping
How it works
What does it cost?
• $200 / job plus shipping
• Includes 10 days to fill the device at your site
• $15/day after the tenth day on site
• Standard Amazon S3 charges apply
• $0.03/GB to transfer data out
• $0.00/GB to transfer data in
How fast is that truck full of drives?
• Less than 1 day to transfer 250TB via 5x10G connections with 5 Snowballs, less than 1 week including shipping
• Number of days to transfer 250TB via the Internet at typical utilizations:

              Internet connection speed
Utilization   1Gbps   500Mbps   300Mbps   150Mbps
25%           95      190       316       632
50%           47      95        158       316
75%           32      63        105       211
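The day counts in the table can be reproduced with simple arithmetic. The sketch below assumes 1 TB = 1024 GB and 8 gigabits per GB, a convention inferred from the numbers rather than stated on the slide; the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(terabytes, link_gbps, utilization):
    """Days needed to move `terabytes` over a link of `link_gbps` at a given utilization.

    Assumes 1 TB = 1024 GB and 8 gigabits per GB (inferred, not stated in the source).
    """
    gigabits = terabytes * 1024 * 8
    seconds = gigabits / (link_gbps * utilization)
    return seconds / SECONDS_PER_DAY


def transfer_table(terabytes=250):
    """Rebuild the table: utilization -> days at 1Gbps, 500Mbps, 300Mbps, 150Mbps."""
    speeds_gbps = [1.0, 0.5, 0.3, 0.15]
    utilizations = [0.25, 0.50, 0.75]
    return {
        u: [round(transfer_days(terabytes, s, u)) for s in speeds_gbps]
        for u in utilizations
    }


if __name__ == "__main__":
    for utilization, days in transfer_table().items():
        print(f"{utilization:.0%}", days)
```

Under the same assumption, one 50TB Snowball filled over a fully used 10G link takes about half a day, which is consistent with the "less than 1 day" claim for 5 devices loaded in parallel.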
What does it cost?
Example 1:
• 250TB loaded on to 5 Snowballs
• 8 days at your site
• 5 * $200 = $1,000 plus shipping
Example 2:
• 30TB exported on to 1 Snowball
• 8 days at your site
• $200 + 30TB * $0.03/GB = $1,121.60 plus shipping
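Both examples follow from the price list above. A minimal sketch of that arithmetic, working in cents to avoid floating-point rounding; it assumes one job per device, that the per-day fee applies per device, and 1 TB = 1024 GB (which is what makes Example 2 come out to $1,121.60). Shipping and standard Amazon S3 charges are not included, and the names are hypothetical.

```python
JOB_FEE_CENTS = 200_00       # $200 per job
INCLUDED_DAYS = 10           # days on site included in the job fee
EXTRA_DAY_CENTS = 15_00      # $15 per day after the tenth day
EXPORT_CENTS_PER_GB = 3      # $0.03/GB to transfer data out; import is $0.00/GB
GB_PER_TB = 1024             # assumption inferred from Example 2


def snowball_cost_cents(devices, days_on_site, export_tb=0):
    """Snowball charges in cents, excluding shipping and standard S3 charges."""
    extra_days = max(0, days_on_site - INCLUDED_DAYS)
    per_device = JOB_FEE_CENTS + extra_days * EXTRA_DAY_CENTS
    export_fee = export_tb * GB_PER_TB * EXPORT_CENTS_PER_GB
    return devices * per_device + export_fee


if __name__ == "__main__":
    print(snowball_cost_cents(5, 8) / 100)                 # Example 1
    print(snowball_cost_cents(1, 8, export_tb=30) / 100)   # Example 2
```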
Edge Locations
Map legend: Edge Location, Availability Zone, Region
Dallas (2)
St. Louis
Miami
Jacksonville
Los Angeles (2)
Seattle
Ashburn (3)
Newark
New York (3)
Dublin
London (2)
Amsterdam (2)
Stockholm
Frankfurt (2)
Paris (2)
Singapore (2)
Hong Kong (2)
Tokyo (2)
Sao Paulo
South Bend
San Jose
Palo Alto
Hayward
Osaka
Milan
Sydney
Madrid
Seoul
Mumbai
Chennai
Regional Lakes …
S3 cross-region replication
Automated, fast, and reliable asynchronous replication of data across AWS regions
Source (Virginia) → Destination (Oregon)
• Only replicates new PUTs. Once replication is configured, all new uploads into a source bucket will be replicated
• Entire bucket or prefix based
• 1:1 replication between any 2 regions
Use cases
• Compliance - store data hundreds of miles apart
• Lower latency - distribute data to remote customers/partners
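As an illustration of "entire bucket or prefix based", the sketch below builds a replication configuration in the shape accepted by the S3 PutBucketReplication API (the older prefix-based rule form). The role ARN, bucket names, and rule ID are hypothetical placeholders, and the function only builds the dictionary; it does not call AWS.

```python
def replication_config(role_arn, dest_bucket, prefix=""):
    """Build a replication configuration dict.

    An empty prefix replicates the entire bucket; a non-empty prefix limits
    replication to keys that start with it.
    """
    return {
        "Role": role_arn,  # IAM role that S3 assumes to replicate objects
        "Rules": [
            {
                "ID": "replicate-" + (prefix or "all"),
                "Prefix": prefix,
                "Status": "Enabled",
                "Destination": {"Bucket": f"arn:aws:s3:::{dest_bucket}"},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical names throughout.
    cfg = replication_config(
        "arn:aws:iam::123456789012:role/example-replication-role",
        "example-content-lake-oregon",
        prefix="masters/",
    )
    print(cfg)
    # With boto3 installed and credentials configured, this could be applied as:
    #   boto3.client("s3").put_bucket_replication(
    #       Bucket="example-content-lake-virginia",
    #       ReplicationConfiguration=cfg)
```

Versioning has to be enabled on both the source and the destination bucket before replication can be configured.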
Consuming the Content Lake
Diagram: Amazon S3 (range-gets) behind a massively scalable S3 front-end, read by on-prem apps over Direct Connect (N x 1G | 10G) and by massively scalable compute on the AWS Cloud (EBS, Instance Store)
Object life cycle from hot to cold
S3 Standard
• Primary data
• 11 9’s of durability
• 2.75c – 3c per GB/month, $338 – $369 per TB/year
S3 – Infrequent Access
• Active archives
• Mezzanine files
• 11 9’s of durability
• 1.25c per GB/month, $154 per TB/year
• 1c per GB for retrievals
Glacier
• Deep/offline archives
• WORM-compliant data
• 11 9’s of durability
• 0.7c per GB/month, $86 per TB/year
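The per-TB/year figures follow from the per-GB/month prices. A short sketch of the conversion, assuming 1 TB = 1024 GB (the convention that reproduces the rounded numbers above); the function name is hypothetical.

```python
def dollars_per_tb_year(cents_per_gb_month):
    """Convert a price in cents per GB per month to dollars per TB per year."""
    return cents_per_gb_month / 100 * 1024 * 12


if __name__ == "__main__":
    for tier, cents in [
        ("S3 Standard (low)", 2.75),
        ("S3 Standard (high)", 3.0),
        ("S3 Infrequent Access", 1.25),
        ("Glacier", 0.7),
    ]:
        print(f"{tier}: ${dollars_per_tb_year(cents):.2f} per TB/year")
```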
Data tiering using Life Cycle Policies
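A lifecycle policy for hot-to-cold tiering might look like the sketch below, which builds a configuration in the shape accepted by the S3 PutBucketLifecycleConfiguration API. The day thresholds, rule ID, and prefix are hypothetical choices, not values from this deck, and the function only builds the dictionary.

```python
def tiering_lifecycle(prefix="", ia_days=30, glacier_days=90):
    """Build a lifecycle configuration: Standard -> Infrequent Access -> Glacier.

    ia_days and glacier_days count days since object creation (hypothetical defaults).
    """
    if glacier_days <= ia_days:
        raise ValueError("glacier_days must be greater than ia_days")
    return {
        "Rules": [
            {
                "ID": "hot-to-cold",
                "Filter": {"Prefix": prefix},  # empty prefix applies to the whole bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    print(tiering_lifecycle(prefix="mezzanine/"))
    # With boto3 installed and credentials configured, this could be applied as:
    #   boto3.client("s3").put_bucket_lifecycle_configuration(
    #       Bucket="example-content-lake",
    #       LifecycleConfiguration=tiering_lifecycle(prefix="mezzanine/"))
```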
Actual customer quote: “$0.0125?! OMG I will take all your storage!!!”
1 PB raw storage
800 TB usable storage
600 TB allocated storage
400 TB application data
S3 capacity pricing—pay only for what you use!
AWS Cloud
Storage
Securing your data on S3
• AWS alignment with the latest MPAA cloud-based application guidelines for content security (August 2015)
• VPC private endpoint for Amazon S3 – enables a true private workflow capability
• Encryption & key management capabilities
• Amazon Glacier Vault for high-value media/originals
• Preserve, retrieve, and restore every version of every object stored in your bucket
• S3 automatically adds new versions and preserves deleted objects with delete markers
• Easily control the number of versions kept by using lifecycle expiration policies
• Easy to turn on in the AWS Management Console
Diagram (Versioning Enabled): PUT Key = photo.gif; stored versions: Key = photo.gif, ID = 121212 and Key = photo.gif, ID = 111111
S3 versioning
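The behaviour described here (every PUT adds a version, a delete adds a marker instead of removing data) can be illustrated with a toy in-memory model. This is not the S3 API; the class, its methods, and the version ID format are hypothetical and exist only to show the semantics.

```python
import itertools


class VersionedBucket:
    """Toy model of a versioning-enabled bucket (illustration only)."""

    def __init__(self):
        self._versions = {}            # key -> list of (version_id, body or None)
        self._ids = itertools.count(1)

    def _add(self, key, body):
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def put(self, key, body):
        """Store a new version; earlier versions are kept."""
        return self._add(key, body)

    def delete(self, key):
        """Add a delete marker (body None); earlier versions are kept."""
        return self._add(key, None)

    def get(self, key, version_id=None):
        """Return the latest version, or a specific one if version_id is given."""
        versions = self._versions.get(key, [])
        if version_id is None:
            if not versions or versions[-1][1] is None:
                raise KeyError(key)    # missing, or latest entry is a delete marker
            return versions[-1][1]
        for vid, body in versions:
            if vid == version_id and body is not None:
                return body
        raise KeyError(version_id)


if __name__ == "__main__":
    bucket = VersionedBucket()
    first = bucket.put("photo.gif", b"old")
    bucket.put("photo.gif", b"new")
    bucket.delete("photo.gif")
    print(bucket.get("photo.gif", version_id=first))  # older version is still retrievable
```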
Amazon S3 event notifications
Delivers notifications to Amazon SNS, Amazon SQS, or AWS
Lambda when events occur in Amazon S3
Diagram: S3 events are delivered as notifications to an SNS topic, an SQS queue, or a Lambda function (Foo() { … })
• Support for notification when objects are created via Put, Post, Copy, or Multipart Upload.
• Support for notification when objects are deleted, as well as filtering on prefixes and suffixes for all types of notifications.
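Prefix and suffix filtering is expressed in the bucket's notification configuration. The sketch below builds one in the shape accepted by the S3 PutBucketNotificationConfiguration API, routing object-created events to a Lambda function. The function ARN, prefix, and suffix are hypothetical, and nothing is sent to AWS.

```python
def lambda_notification_config(function_arn, prefix="", suffix=""):
    """Build a notification configuration that sends object-created events to Lambda.

    prefix and suffix are optional key filters; empty strings mean no filter.
    """
    rules = []
    if prefix:
        rules.append({"Name": "prefix", "Value": prefix})
    if suffix:
        rules.append({"Name": "suffix", "Value": suffix})

    config = {
        "LambdaFunctionArn": function_arn,
        "Events": ["s3:ObjectCreated:*"],  # Put, Post, Copy, and completed multipart uploads
    }
    if rules:
        config["Filter"] = {"Key": {"FilterRules": rules}}
    return {"LambdaFunctionConfigurations": [config]}


if __name__ == "__main__":
    # Hypothetical ARN and filters.
    print(lambda_notification_config(
        "arn:aws:lambda:us-east-1:123456789012:function:example-start-transcode",
        prefix="incoming/",
        suffix=".mov",
    ))
```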
Reference Architecture – Content Processing Pipeline (Using Lambda)
• Ingest: S3 multi-part API
• S3 as backend storage for content files, accessible to other processing tasks
• S3 Notification: trigger a Lambda function to start a transcoding job
• Amazon Elastic Transcoder
• S3 Notification: Lambda function to generate a signed URL to share the file
• Update CMS or metadata
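The "S3 notification triggers a Lambda function" step could be sketched as below. The handler reads the bucket and key from each record of the S3 event and hands them to a transcoding starter. `start_transcode` is a hypothetical callable standing in for whatever actually submits the job (for example an Elastic Transcoder client); it is injected so the sketch runs without AWS access.

```python
from urllib.parse import unquote_plus


def objects_from_event(event):
    """Extract (bucket, key) pairs for object-created records in an S3 event."""
    found = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event messages.
        found.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return found


def make_handler(start_transcode):
    """Return a Lambda-style handler that starts one job per new object."""
    def handler(event, context):
        jobs = [start_transcode(bucket, key) for bucket, key in objects_from_event(event)]
        return {"started": jobs}
    return handler


if __name__ == "__main__":
    sample_event = {
        "Records": [
            {
                "eventName": "ObjectCreated:CompleteMultipartUpload",
                "s3": {
                    "bucket": {"name": "example-content-lake"},
                    "object": {"key": "incoming/my+clip.mov"},
                },
            }
        ]
    }
    handler = make_handler(lambda bucket, key: f"job for s3://{bucket}/{key}")
    print(handler(sample_event, None))
```

Injecting the job starter keeps the event parsing separate from the transcoding service, so the same handler shape could also drive the second notification step (generating a signed URL and updating the CMS).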
Elastic File System - Rendering in the Cloud
• Designed to support petabyte scale file systems
• Throughput scales linearly with storage
• Same latency spec across each AZ
• Thousands of concurrent NFS connections
• Works great for large I/O sizes
• Pay only for what you use, not what you provision
• Managed with multi-copy durability
Media Workloads (redefined)
Diagram labels:
• Process: Amazon EBS/EFS/EC2 Instance Store
• Archive: Amazon Glacier (Life Cycle Policies)
• Amazon S3, EFS
• Content Access / Transfer: Direct Connect
• On-Prem Apps, Partner/Affiliate/Service Provider, User Delivery/Consumption, VFX/Production
• Disposable Infrastructure: Auto-scaling, Workload specific
Q&A
Learn more at: http://aws.amazon.com/s3/
http://aws.amazon.com/glacier/
http://aws.amazon.com/importexport/