Financial Services Analytics on AWS
Infrastructure: Regions, Availability Zones, Points of Presence
Enterprise Applications: Virtual Desktops, Sharing & Collaboration
Core Services: Storage (Object, Block and Archival); Compute (VMs, Auto-scaling and Load Balancing); Databases (Relational, NoSQL, Caching); Networking (VPC, DX, DNS); CDN
Administration & Security: Access Control, Usage & Resource Tracking, Monitoring and Logs, Key Storage & Management, Identity Management, Service Catalog
Platform Services
Deployment & Management: One-click web app deployment, Dev/ops resource management, Resource Templates, Code Deploy, Code Pipeline, Code Commit
Mobile Services: Push Notifications, Identity, Sync, Mobile Analytics
App Services: Queuing & Notifications, Workflow, App streaming, Transcoding, Search
Analytics: Hadoop, Data Pipelines, Data warehouse, Real-time Streaming Data, Machine Learning
Regions shown on the map: US-EAST (Virginia), US-WEST (Oregon), US-WEST (San Francisco), GOV CLOUD, SOUTH AMERICA (Sao Paulo), EU-WEST (Ireland), EU-CENTRAL (Germany), ASIA PAC (Tokyo), ASIA PAC (Singapore), ASIA PAC (Sydney), CHINA (beta)
Other locations labelled on the map: Hong Kong, Ohio, London, India
Map legend: Current Region, Announced Region, Availability Zone
Global Infrastructure
Requirements
Store Any Amount of Data Without Capacity Planning
Perform Complex Analysis on Any Data
Scale on Demand
Store Data Securely
Move to Real Time
Realised Value
Agile Analytics, DevOps in the Warehouse
Decrease Time to Market
Build Environments Quickly
Reduce Costs
Reduce Capital Expenditure
Enable Global Reach
87% now will consider cloud for their big data
Advanced analytics closing in on BI
Issues beyond security (reality, perception, regulation) being addressed by march of technology
Building and deploying Big Data analytics or processing applications in the cloud can reduce complexity and time to market
Source: Gigaom Research data warehousing survey 2014
Ingestion…
Integration…
Retention
Diagram: a single STORAGE layer with many separate COMPUTE clusters attached
A Distributed Object Store
Not a file system
No Single Points of Failure
Eventually consistent
Availability: 99.99%
Durability: 99.999999999%
Paradigm: Object store
Performance: Very Fast
Redundancy: Across Availability Zones
Security: Public Key / Private Key
Pricing: $0.03/GB/month
Typical use case: Write once, read many
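To make the pricing and durability figures above concrete, here is a minimal arithmetic sketch. It assumes a flat rate of $0.03 per GB per month (real pricing is tiered and also charges for requests and transfer), treats 1 TB as 1,000 GB, and reads the eleven-nines durability figure as an annual per-object design target. The function names are illustrative only.

```python
# Figures quoted on the slide; both are treated as flat constants here.
PRICE_PER_GB_MONTH = 0.03      # USD per GB per month (assumed flat, no tiers)
DURABILITY = 0.99999999999     # eleven nines, read as an annual design target


def monthly_storage_cost(gigabytes: float) -> float:
    """Monthly storage cost in USD, ignoring request and transfer charges."""
    return gigabytes * PRICE_PER_GB_MONTH


def expected_annual_object_loss(object_count: int) -> float:
    """Expected number of objects lost per year at the quoted durability."""
    return object_count * (1 - DURABILITY)


# A 200 TB data set (200,000 GB) and a bucket holding ten million objects.
print(f"200 TB costs about ${monthly_storage_cost(200_000):,.0f} per month")
print(f"10M objects: {expected_annual_object_loss(10_000_000):.6f} expected losses per year")
```

At that durability, ten million stored objects correspond to roughly one expected loss every 10,000 years.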
Simple Storage Service
Highly scalable object storage for the internet
1 byte to 5TB in size
99.999999999% durability
S3 – Standard, S3 – Infrequent Access, Amazon Glacier
34 secs per terabyte
Chart: GB/Second against Reader Connections
Amazon S3 provides near linear scalability
S3 Streaming Performance: 100 VMs, 9.6GB/s, $26/hr; 350 VMs, 28.7GB/s, $90/hr
S3 Performance & Scalability
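The throughput figures above can be cross-checked with a few lines of arithmetic. This sketch assumes 1 TB = 1,000 GB and that the hourly price is billed pro rata; the helper names are made up for illustration.

```python
# (VM count) -> (aggregate GB/s, fleet price in USD per hour), as quoted above.
FLEETS = {100: (9.6, 26.0), 350: (28.7, 90.0)}


def seconds_per_terabyte(gb_per_second: float) -> float:
    """Time to stream 1 TB (taken as 1,000 GB) at the given aggregate rate."""
    return 1000 / gb_per_second


def cost_per_terabyte(gb_per_second: float, dollars_per_hour: float) -> float:
    """Fleet cost of streaming 1 TB, assuming pro-rata hourly billing."""
    return dollars_per_hour * seconds_per_terabyte(gb_per_second) / 3600


for vms, (rate, price) in FLEETS.items():
    print(
        f"{vms} VMs: {seconds_per_terabyte(rate):.1f} s/TB, "
        f"${cost_per_terabyte(rate, price):.2f}/TB, "
        f"{rate / vms * 1000:.0f} MB/s per VM"
    )
```

The 350-VM fleet works out to just under 35 seconds per terabyte, in line with the "34 secs per terabyte" figure, and per-VM throughput drops only modestly between the two fleet sizes, which is what "near linear" means here.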
AWS Security Services
Compute Storage
AWS Global Infrastructure
Database
App Services
Deployment & Administration
Networking
Analytics
IAM
Users
AWS Directory Service
AD Connector
Direct Connect
Hardware VPN
Amazon Kinesis
Managed Service for Real Time Big Data Processing
Create Streams to Produce & Consume Data
Elastically Add and Remove Shards for Performance
Use Kinesis Worker Library to Process Data
Integration with S3, Redshift and DynamoDB
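As a rough illustration of why adding shards spreads load, the sketch below maps a record's partition key to a shard. Kinesis hashes the partition key with MD5 into a 128-bit integer and assigns the record to the shard whose hash key range contains it. The assumption that the ranges are equal-sized and contiguous, and the function name `shard_for_key`, are simplifications for this example; a real stream's ranges depend on how it has been split and merged.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the 0-based shard index for a partition key.

    Assumes the hash key space is divided into `shard_count` equal ranges.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    return hash_value * shard_count // HASH_SPACE


# The same key always lands on the same shard; different keys spread out.
for key in ("account-1", "account-2", "account-3"):
    print(key, "-> shard", shard_for_key(key, 4))
```

Because the mapping depends only on the key, all records for one key stay in order on one shard, while a wide range of keys is needed to use every shard.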
Application Services
Kinesis Streams
Data Sources write to the AWS Endpoint
Stream: Shard 1, Shard 2, ... Shard N, spanning three Availability Zones
App.1 [Aggregate & De-Duplicate]
App.2 [Metric Extraction]
App.3 [Sliding Window Analysis]
App.4 [Machine Learning]
Destinations: S3, DynamoDB, Redshift
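One of the consuming applications in the diagram performs sliding window analysis. A minimal sketch of that idea, independent of any Kinesis API, is shown below: it keeps a running mean over the records seen in the last N seconds. It assumes records arrive in timestamp order; the class name is hypothetical.

```python
from collections import deque


class SlidingWindowMean:
    """Running mean of values whose timestamps fall in the last N seconds."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._items = deque()   # (timestamp, value) pairs, oldest first
        self._total = 0.0

    def add(self, timestamp: float, value: float) -> float:
        """Add a record (assumed to arrive in time order) and return the mean."""
        self._items.append((timestamp, value))
        self._total += value
        cutoff = timestamp - self.window_seconds
        # Evict records that have aged out of the window.
        while self._items and self._items[0][0] <= cutoff:
            _, old_value = self._items.popleft()
            self._total -= old_value
        return self._total / len(self._items)


window = SlidingWindowMean(window_seconds=10)
for ts, price in [(0, 10.0), (5, 20.0), (12, 30.0)]:
    print(ts, window.add(ts, price))
```

Keeping a running total means each record is added and removed once, so the cost per record stays constant however large the window is.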
Without writing an application or managing infrastructure
Batch, compress and encrypt in as little as 60 secs
Capture and submit streaming data to Firehose
Firehose loads streaming data continuously into S3 and Redshift
Analyze streaming data using your favorite BI tools
Kinesis Firehose
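The batching behaviour described above can be pictured with a small local sketch. This is not the Firehose API: `Batcher`, its parameters and the `sink` callback are hypothetical names, encryption is left out, and the time limit is only checked when a new record arrives. It buffers records, then compresses and delivers them once a size or age threshold is reached.

```python
import gzip
import time
from typing import Callable, List


class Batcher:
    """Buffer records and deliver them gzip-compressed in batches."""

    def __init__(self, sink: Callable[[bytes], None], max_bytes: int = 1_000_000,
                 max_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.sink = sink                # called with each compressed batch
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.clock = clock              # injectable so tests can control time
        self._buffer: List[bytes] = []
        self._size = 0
        self._started = 0.0

    def put(self, record: bytes) -> None:
        if not self._buffer:
            self._started = self.clock()    # age is measured from the first record
        self._buffer.append(record)
        self._size += len(record)
        too_big = self._size >= self.max_bytes
        too_old = self.clock() - self._started >= self.max_seconds
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self.sink(gzip.compress(b"\n".join(self._buffer)))
            self._buffer = []
            self._size = 0


delivered: List[bytes] = []
batcher = Batcher(delivered.append, max_bytes=32)
for n in range(10):
    batcher.put(f"trade-{n}".encode())
batcher.flush()
print(len(delivered), "batches delivered")
```

A production version would also flush on a timer so that a quiet stream does not hold records indefinitely.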
Traditional Business Intelligence…
OLAP…
Data Sources for ML
Relational Database Service
Managed Database-as-a-Service
No need to install or manage database instances
Automated Backup/Recover, Patching & Upgrade
Scalable and fault tolerant configurations
6TB & 30,000 IOPS
Managed Database
RDS, DynamoDB, Redshift, ElastiCache
Managed Data Warehouse
Redshift
Managed Massively Parallel Petabyte Scale Data Warehouse
Streaming Backup/Restore to S3
Load data from S3, DynamoDB and EMR
Extensive Security Features
Scale from 160 GB -> 1.6 PB Online
Redshift lets you start small and grow big
Extra Large Node (dc1.xl & ds2.xl): 3 spindles, 15-30GiB RAM, 2 or 4 virtual cores, 10GigE
Single Node (160GB SSD or 2TB Magnetic)
Cluster 2-32 Nodes (320GB SSD – 64TB Magnetic)
8 Extra Large Node (dc1.8xl & ds2.8xl): 24 spindles, 120-244GiB RAM, 2.56TB SSD or 16TB Magnetic, 16 or 32 virtual cores, 10GigE
Cluster 2-100 Nodes (5TB SSD – 1.6PB Magnetic)
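The cluster ranges above follow directly from per-node storage multiplied by node count. The sketch below reproduces them, using the node labels and limits exactly as they appear on the slide; the table and function names are illustrative, and current Redshift node types and limits may differ.

```python
# label -> (storage per node in TB, minimum nodes, maximum nodes), from the slide
NODE_TYPES = {
    "dc1.xl": (0.16, 1, 32),
    "ds2.xl": (2.0, 1, 32),
    "dc1.8xl": (2.56, 2, 100),
    "ds2.8xl": (16.0, 2, 100),
}


def cluster_capacity_tb(node_type: str, node_count: int) -> float:
    """Raw storage of a cluster in TB, checking the node-count limits."""
    per_node_tb, min_nodes, max_nodes = NODE_TYPES[node_type]
    if not min_nodes <= node_count <= max_nodes:
        raise ValueError(
            f"{node_type} supports {min_nodes}-{max_nodes} nodes, got {node_count}"
        )
    return per_node_tb * node_count


for label, (_, low, high) in NODE_TYPES.items():
    print(f"{label}: {cluster_capacity_tb(label, low):g} TB "
          f"to {cluster_capacity_tb(label, high):g} TB")
```

Thirty-two ds2.xl nodes give the 64TB upper bound of the smaller family, and one hundred ds2.8xl nodes give the 1.6PB ceiling quoted for the service.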
LEADING INDEX PROVIDER WITH 41,000+ INDEXES ACROSS ASSET CLASSES AND GEOGRAPHIES
Over 10,000 Corporate Clients in 60 countries
Our technology powers over 70 MARKETPLACES, regulators, CSDs and clearing-houses in over 50 COUNTRIES
100+ DATA PRODUCT OFFERINGS supporting 2.5+ million investment professionals and users IN 98 COUNTRIES
26 Markets, 3 Clearing Houses, 5 Central Securities Depositories
Lists more than 3,500 companies in 35 countries, representing more than $8.8 trillion in total market value
Exploratory Analytics…
Data Cleansing…
Advanced Data Science
Elastic MapReduce
Managed, elastic Hadoop (1.x & 2.x) cluster
Integrates with S3, DynamoDB and Redshift
Install End User Tools Automatically (Spark, Presto, Impala)
Support for EC2 Spot Instances
Transient or Always on Clusters
Managed Big Data
Elastic MapReduce
Try different configurations to find the optimal cost/performance balance
CPU: c4 family, cc2.8xlarge, d2 family
Memory: m2 family, r3 family
Disk/IO: d2 family, i2 family
General: m3 family
Choose your instance types
ETL, Machine Learning, Spark, HDFS
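One way to act on "try different configurations" is to run the same job on a few candidate clusters, record the runtime, and compare total cost. The sketch below does that comparison. Every number in it is a placeholder rather than a measured result or a real price, and the configuration names are deliberately generic.

```python
from typing import List, NamedTuple, Optional


class Trial(NamedTuple):
    """One benchmark run of the same job on a candidate cluster."""
    name: str
    nodes: int
    price_per_node_hour: float   # USD, placeholder value
    runtime_hours: float         # placeholder value

    @property
    def cost(self) -> float:
        return self.nodes * self.price_per_node_hour * self.runtime_hours


def cheapest_within(trials: List[Trial], deadline_hours: float) -> Optional[Trial]:
    """Cheapest trial that finishes within the deadline, or None if none do."""
    eligible = [t for t in trials if t.runtime_hours <= deadline_hours]
    return min(eligible, key=lambda t: t.cost) if eligible else None


# Placeholder figures for illustration only.
trials = [
    Trial("config-a", nodes=10, price_per_node_hour=0.5, runtime_hours=4.0),
    Trial("config-b", nodes=20, price_per_node_hour=0.5, runtime_hours=1.5),
    Trial("config-c", nodes=5, price_per_node_hour=0.5, runtime_hours=9.0),
]
best = cheapest_within(trials, deadline_hours=5.0)
print(best.name if best else "no configuration meets the deadline")
```

With transient clusters, a larger cluster that finishes sooner can cost less overall than a small one that runs for longer, which is why the comparison is on total cost rather than hourly price.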
Weather Insurance for Farms
Challenge: Volatile weather is deadly to crops like grapes
60 years of crop data
200 TB of S3 Data
1M government Doppler radar points
Solution: Built a predictive model based on freely available data:
150B Soil Observations
850K Precision Rainfall Grids Tracked
3M Daily Weather Measurements
50 EMR clusters process new data as it comes into S3 each day, continuously updating the model
$10-20M Savings by moving Platform to AWS
Predictive Analytics…
Easily create machine learning models
Visualize and optimize models
Put models into production in seconds
Battle-hardened technology
Machine Learning
Software Development
Introducing Amazon Machine Learning
Developing with Amazon Machine Learning
1. Build model
2. Validate & optimize
3. Make predictions
Use existing data in S3, Redshift and RDS
Automatic data visualization & exploration
Descriptive and summary statistics
Your data doesn’t have to be perfect
Missing data, malformed data records, type validation
Building a Predictive Model
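The point about imperfect data (missing values, malformed records, type validation) can be illustrated with a small profiling pass run before training. This is a generic sketch, not the Amazon Machine Learning API; the schema and field names are invented for the example.

```python
from typing import Callable, Dict, Iterable

# Hypothetical schema: field name -> function that parses a raw string.
SCHEMA: Dict[str, Callable[[str], object]] = {
    "age": int,
    "balance": float,
    "defaulted": int,
}


def profile(rows: Iterable[dict], schema: Dict[str, Callable]) -> Dict[str, int]:
    """Count rows that are usable, have a missing field, or fail type parsing."""
    report = {"ok": 0, "missing": 0, "malformed": 0}
    for row in rows:
        status = "ok"
        for field, parse in schema.items():
            raw = row.get(field)
            if raw is None or str(raw).strip() == "":
                status = "missing"
                break
            try:
                parse(raw)
            except (TypeError, ValueError):
                status = "malformed"
                break
        report[status] += 1
    return report


rows = [
    {"age": "41", "balance": "1200.50", "defaulted": "0"},
    {"age": "", "balance": "3.0", "defaulted": "1"},
    {"age": "abc", "balance": "1", "defaulted": "0"},
]
print(profile(rows, SCHEMA))
```

A report like this shows how much of a data set needs imputing or dropping before a model is built on it.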
Model Validation and Optimization Tools
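A typical validation step for a binary classifier is to see how precision and recall move as the score threshold changes. The sketch below shows that trade-off in plain Python. The scores and labels are toy values, and this is a general technique rather than a description of the service's own tooling.

```python
from typing import Sequence, Tuple


def precision_recall(scores: Sequence[float], labels: Sequence[int],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when scores >= threshold are predicted positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy model scores and true labels (1 = positive class).
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
for threshold in (0.3, 0.5, 0.7):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold {threshold}: precision {p:.2f}, recall {r:.2f}")
```

Raising the threshold trades recall for precision, so the right setting depends on whether false positives or false negatives cost more in the application.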
Batch predictions: Asynchronous predictions with trained model
Real time predictions: Synchronous, low latency, high throughput; mount API end-point with a single click
Making Predictions
Data Visualization…
Old-guard BI
Costs Too Much
Pay $ million before seeing first analysis
3 year TCO $150 to $250 per user per month
Takes Too Long
Spend 6 to 12 months of consulting and SW implementation time
A very fast, cloud-powered, BI service for 1/10th the cost of old-guard BI software
$9 per user per month
With 1 year commitment
Business user
Sign-in
First analysis in about 60 seconds
Register for preview beginning Oct 7 at aws.amazon.com/quicksight
Business User
QuickSight API
QuickSight UI: Mobile Devices, Web Browsers
Partner BI products
Data Prep, Metadata, Suggestions, Connectors, SPICE
Data sources: Amazon S3, Amazon Kinesis, Amazon DynamoDB, Amazon EMR, Amazon Redshift, Amazon RDS, Files, Third-party
Native mobile experience: iOS, Android
Full experience on tablets
Consumption experience on smart phones
Very fast response
$9 per user per month
$18 per user per month
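The "1/10th the cost" claim can be checked against the figures quoted in this section: $9 and $18 per user per month against a 3 year TCO of $150 to $250 per user per month for old-guard BI. The sketch below is only that arithmetic; the 100-user team is a hypothetical example.

```python
OLD_GUARD_RANGE = (150.0, 250.0)   # USD per user per month, as quoted above
QUICKSIGHT_PRICES = (9.0, 18.0)    # USD per user per month, as quoted above


def annual_cost(users: int, per_user_month: float) -> float:
    """Yearly cost in USD for a team at a flat per-user monthly price."""
    return users * per_user_month * 12


def cost_ratio(price: float, old_guard_price: float) -> float:
    """Fraction of the old-guard price represented by the new price."""
    return price / old_guard_price


for price in QUICKSIGHT_PRICES:
    low = cost_ratio(price, OLD_GUARD_RANGE[1])
    high = cost_ratio(price, OLD_GUARD_RANGE[0])
    print(f"${price:.0f}/user/month is {low:.1%} to {high:.1%} of old-guard TCO; "
          f"100 users cost ${annual_cost(100, price):,.0f} per year")
```

On these numbers the $9 price comes in below one tenth of the old-guard range, and the $18 price sits roughly at one tenth.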
Integrated Analytics
Pipeline stages: Receive, Ingest, Validate, Process, Output
Data states: Raw data, Common format, Validated, Processed, Output format
Receive: Get data; Manage input queue; Manage receive rules; Assign dataset identifier; Assign dataset metadata; Store original data; Store event details. Stores: Receive rules, Receive audit log, Original data store
Ingest: Manage ingestion rules; Split data into records; Assign record identifiers; Assign record metadata; Check record format; Transform to common format; Output records; Store event details (rule, stamp etc). Stores: Ingestion rules, Ingestion audit log
Validate: Validate records, recordsets or datasets; Store validation status; Manage validation rules. Stores: Abide data store, Validation rules, Validation results / log
Process: Data service endpoints; Fetch data set; Perform calculations; Save datasets; Re-validate; Store event details; Manage processing rules. Stores: Processing rules, Processing audit log
Output: Format data; Check data; Store event details; Manage output rules; Send output; Data service endpoints. Stores: Output rules, Output audit log
Legend: Storage, Service endpoint, Function, Rules, Data path, Events & logic, Optional data path
Raw data* With dataset metadata*
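The staged flow in the diagram (receive, ingest, validate, process, output, each writing to an audit log) can be sketched as a chain of small functions. This is an illustration of the pattern only, not Abide Financial's implementation: the `Dataset` class, the two-field CSV input, the field names and the rule names are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Rule = Callable[[dict], bool]


@dataclass
class Dataset:
    dataset_id: str
    raw: str                                        # original data, kept unchanged
    records: List[dict] = field(default_factory=list)
    audit_log: List[str] = field(default_factory=list)

    def log(self, stage: str, detail: str) -> None:
        self.audit_log.append(f"{stage}: {detail}")


def is_number(text: str) -> bool:
    try:
        float(text)
        return True
    except ValueError:
        return False


def receive(dataset_id: str, raw: str) -> Dataset:
    ds = Dataset(dataset_id, raw)
    ds.log("receive", f"stored {len(raw)} characters")
    return ds


def ingest(ds: Dataset) -> Dataset:
    # Split into records, assign identifiers, transform to a common format.
    for n, line in enumerate(ds.raw.splitlines(), start=1):
        trade_id, notional = line.split(",", 1)     # assumes "id,amount" lines
        ds.records.append({"record_id": f"{ds.dataset_id}-{n}",
                           "trade_id": trade_id.strip(),
                           "notional": notional.strip()})
    ds.log("ingest", f"{len(ds.records)} records")
    return ds


def validate(ds: Dataset, rules: Dict[str, Rule]) -> Dataset:
    # Store the validation status on each record.
    for rec in ds.records:
        rec["failed_rules"] = [name for name, rule in rules.items() if not rule(rec)]
    failed = sum(1 for rec in ds.records if rec["failed_rules"])
    ds.log("validate", f"{failed} records failed")
    return ds


def process(ds: Dataset) -> float:
    # Perform a calculation over the records that passed validation.
    total = sum(float(r["notional"]) for r in ds.records if not r["failed_rules"])
    ds.log("process", f"total notional {total:.2f}")
    return total


def output(ds: Dataset, total: float) -> str:
    ds.log("output", "report formatted")
    return f"{ds.dataset_id},{total:.2f}"


RULES: Dict[str, Rule] = {
    "trade_id_present": lambda r: bool(r["trade_id"]),
    "notional_is_number": lambda r: is_number(r["notional"]),
}

ds = validate(ingest(receive("D1", "T1, 100.5\nT2, abc\nT3, 50")), RULES)
print(output(ds, process(ds)))
print(ds.audit_log)
```

Keeping the original data untouched and recording an event at every stage is what allows a dataset to be re-validated or re-processed later when the rules change.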
* Visio 2013 only © Abide Financial
Thank You!