Fast-growing startups building high-scale applications demand a lot from their infrastructure, and in particular from their databases. Databases often become the bottleneck of a startup's technology stack and can inhibit fast growth, because they are not easy to set up, operate, and scale in the cloud. This webinar focuses on how to build scalable databases in the cloud and covers how to effectively combine relational, NoSQL, and even data warehouse databases, which have become a reality for startups with the launch of Amazon Redshift.
Key takeaways:
- Understand the trade-off between SQL and NoSQL and when to go for a hybrid model.
- Learn best practices for setting up your database in the AWS cloud, whether using managed services or managing it yourself.
- Learn how to minimize the costs of your database with the right architecture and pricing models.
Who should attend: DBAs, startup CTOs, developers, engineers, architects, growth hackers.
Scalable Databases for Fast Growing Startups Blair Layton Business Development Manager Database Services Amazon Web Services - APAC
Agenda Self-Managed or Managed Database Services? NoSQL or Relational? Performance Tips and Tricks How to scale from 1 to 10,000,000 users? How do I save money? Summary Q&A
Why Managed Databases? [Pie chart of database administration tasks. Segment labels: backup & recovery, data load & unload; performance tuning; scripting & coding; security planning; install, upgrade, patch and migrate; documentation, licensing & training. Values shown: 25%, 40%, 5%, 5%.]
If You Host Your Databases On-premises: you handle app optimization, scaling, high availability, database backups, DB s/w patches, DB s/w installs, OS patches, OS installation, server maintenance, rack & stack, and power, HVAC, net.
If You Host Your Databases in EC2: you handle app optimization, scaling, high availability, database backups, DB s/w patches, DB s/w installs, and OS patches. AWS handles OS installation, server maintenance, rack & stack, and power, HVAC, net.
If You Choose a Managed Database Service: you handle app optimization. The service handles scaling, high availability, database backups, DB s/w patches, DB s/w installs, OS patches, OS installation, server maintenance, rack & stack, and power, HVAC, net.
Differentiated effort increases the uniqueness of an application.
AWS Database Services: Scalable High Performance Application Storage in the Cloud. [Diagram labels: Amazon RDS, Amazon DynamoDB, Amazon Redshift, Amazon ElastiCache; Database; Compute; Storage; Networking; Application Services; Deployment & Administration; AWS Global Infrastructure]
Amazon RDS: Relational Databases. Fully managed; zero admin. MySQL, Oracle, PostgreSQL, SQL Server. Trillions of I/O requests/month.
Flipboard relies on Amazon RDS. Flipboard is an online magazine with millions of users and billions of flips per month. It uses Amazon RDS and its Multi-AZ capabilities to store mission-critical user data. "We were able to go from concept to delivered product in about six months with just a handful of engineers." - Greg Scallan, Chief Architect, Flipboard
Key Features
- Manageability: rapid deployment with pre-configured parameters; patch management; monitoring and metrics
- Availability and Data Durability: automated backups and point-in-time recovery; DB snapshots; automatic host replacement (Single-AZ); Multi-AZ deployments
- Scalability: push-button scaling of storage, memory, and compute; Read Replicas
RDS for Production Workloads. [Table: Amazon RDS configuration options (Push-Button Scaling, Multi-AZ, Read Replicas, Provisioned IOPS) against the goals Improve Availability, Increase Throughput, and Reduce Latency. Diagram labels: Region, two Availability Zones, Multi-AZ, Read Replicas, Push-Button Scaling, Provisioned IOPS]
Amazon ElastiCache: In-Memory Cache. Elastic and reliable. Memcached or Redis. Fully managed; zero admin.
ElastiCache: Fully Managed Cache Service
- Easy to Deploy: deploy a master-slave(s) configuration with a few button clicks or API calls
- Easy to Migrate: compatible with Memcached or Redis; existing code will work when you update node endpoints
- Easy to Administer: ElastiCache automatically replaces failed nodes and patches software as needed; CloudWatch enables you to monitor cache performance metrics
- Easy to Secure: supports VPC and Security Group configurations
- Easy to Scale: provides assisted scale-up and scale-out capability
Hot Items (application server): small, frequently accessed items are ideal candidates for read caching. Reduce server-side latency to [...] 3TB of storage on RDS
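A minimal sketch of read caching for hot items, assuming a cache-aside pattern with a time-to-live. The `TTLCache` class is an in-process stand-in for a Memcached or Redis node, and `load_from_database` is a hypothetical placeholder for a real database query; neither name comes from the talk.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for a Memcached/Redis-style cache (illustrative only)."""

    def __init__(self, loader: Callable[[str], Optional[str]], ttl_seconds: float = 60.0):
        self._loader = loader          # called on a cache miss
        self._ttl = ttl_seconds        # how long an entry stays fresh
        self._store: Dict[str, Tuple[float, Optional[str]]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            # Fresh entry: serve from memory and skip the database.
            self.hits += 1
            return entry[1]
        # Miss or expired entry: read from the database and cache the result.
        self.misses += 1
        value = self._loader(key)
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Call this after a write so readers do not see stale data.
        self._store.pop(key, None)


def load_from_database(key: str) -> Optional[str]:
    """Hypothetical database read; a real app would run a query here."""
    return {"user:1": "alice", "user:2": "bob"}.get(key)


cache = TTLCache(load_from_database, ttl_seconds=30.0)
cache.get("user:1")   # miss: goes to the database
cache.get("user:1")   # hit: served from the cache
```

The hit and miss counters are there so the cache hit ratio can be watched; a low ratio suggests the items being cached are not actually hot.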
NoSQL or Relational?
Spectrum of Database Options. [Diagram labels: SQL, NoSQL; Do-it Yourself, Fully Managed; Low Cost, High Cost; Not available on AWS]
Thinking About the Questions: Should I use SQL or NoSQL? Should I use MySQL or PostgreSQL? Should I use Redis, Memcached, or ElastiCache? Should I use MongoDB, Cassandra, or DynamoDB?
Actually, Thinking About the Right Questions: What are my scale and latency needs? What are my transactional and consistency needs? What are my read/write, storage, and IOPS needs? What are my time-to-market and server control needs?
Factors to Consider
Factor | SQL | NoSQL
Application | App with complex business logic? | Web app with lots of users?
Transactions | Complex transactions, joins, updates? | Simple data model, updates, queries?
Scale | Developer managed | Automatic, on-demand scaling
Performance | Developer architected | Consistent, high performance at scale
Availability | Architected for fail-over | Seamless and transparent
Core Skills | SQL + Java/Ruby/Python/PHP | NoSQL + Java/Ruby/Python/PHP
Best of both worlds: it is possible to use SQL and NoSQL models in one app.
Performance Tips and Tricks
Performance Tips and Tricks
- Understand your workload: read:write ratio, I/O requirements, CPU requirements
- Identify bottlenecks: CPU, memory, disk I/O, network latency/bandwidth; use CloudWatch and OS metrics
- Choose the right instance type: High CPU, High Memory, High Storage, etc.
- Understand EBS!
Amazon EBS Magnetic: Amazon Elastic Block Store (EBS). IOPS: ~100 IOPS steady-state, with best-effort bursts to hundreds; 40-200 IOPS in terms of variability. Throughput: variable by workload, best effort to 10s of MB/s. Latency: varies, reads typically [...]
[...] 100: First, let's separate out our single host into more than one: Web and Database. Use RDS to make your life easier. [Diagram: User, Amazon Route 53, Elastic IP, Web Instance, RDS DB Instance]
Users > 1000: Next, let's address our lack of failover and redundancy. Elastic Load Balancing; another web instance in another Availability Zone; enable Amazon RDS Multi-AZ. [Diagram: User, Amazon Route 53, Elastic Load Balancing, a Web Instance in each of two Availability Zones, RDS DB Instance Active (Multi-AZ), RDS DB Instance Standby (Multi-AZ)]
Users > 10Ks-100Ks. [Diagram: User, Amazon Route 53, Elastic Load Balancing, eight Web Instances across two Availability Zones, RDS DB Instance Active (Multi-AZ), RDS DB Instance Standby (Multi-AZ), four RDS DB Instance Read Replicas]
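One way an application might use an architecture with read replicas is to send writes to the primary and spread reads across the replicas. The sketch below assumes simple round-robin selection and a naive "starts with SELECT" check; `ReplicaRouter` and the endpoint names are hypothetical, not part of any AWS SDK.

```python
import itertools
from typing import List


class ReplicaRouter:
    """Picks a database endpoint per statement (illustrative sketch only)."""

    def __init__(self, primary: str, replicas: List[str]):
        self.primary = primary
        # Round-robin iterator over replicas; None when there are no replicas.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        # Naive classification: anything that is not a SELECT goes to the primary.
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = ReplicaRouter(
    primary="primary.example.internal",          # hypothetical endpoint names
    replicas=["replica-1.example.internal", "replica-2.example.internal"],
)
router.endpoint_for("SELECT * FROM users WHERE id = 1")      # a replica
router.endpoint_for("UPDATE users SET name = 'x' WHERE id = 1")  # the primary
```

Replication to read replicas is asynchronous, so reads that must see a write that just happened are usually safer against the primary.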
Honestly, this will take us pretty far, but we care about performance and efficiency, so let's clean this up a bit.
Shift Some Load Around: let's lighten the load on our web and database instances.
- Move static content from the web instance to Amazon S3 and CloudFront
- Move dynamic content from the Elastic Load Balancing to CloudFront
- Move session/state and DB caching to ElastiCache or DynamoDB
[Diagram: User, Amazon Route 53, Amazon CloudFront, Amazon S3, Elastic Load Balancing, Web Instance, ElastiCache, Amazon DynamoDB, RDS DB Instance Active (Multi-AZ), Availability Zone]
Users > 500K+. [Diagram: User, Amazon Route 53, Amazon CloudFront, Amazon S3, Elastic Load Balancing, DynamoDB, and two Availability Zones, each with three Web Instances, ElastiCache, and an RDS DB Instance Read Replica; RDS DB Instance Active (Multi-AZ) in one zone and RDS DB Instance Standby (Multi-AZ) in the other]
From 500K to 1 Million Users: getting serious now. Significant user base; plenty of attention if things go wrong; an interesting phase for startups with funding rounds.
Time to make some radical improvements at the web & app layers
SOAing Move services into their own tiers or modules. Treat each of these as 100% separate pieces of your infrastructure and scale them independently. Use queues! Amazon.com and AWS do this extensively! It offers flexibility and greater understanding of each component.
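To illustrate the "use queues" advice, here is a small sketch in which a web tier hands jobs to worker threads through a queue, so each side can be scaled on its own. Python's in-process `queue.Queue` stands in for a hosted queue such as Amazon SQS, and `process_jobs` and the `sent:` result format are invented for this example.

```python
import queue
import threading
from typing import List


def process_jobs(jobs: List[str], worker_count: int = 2) -> List[str]:
    """Fan jobs out to workers through a queue and collect their results."""
    work_queue: queue.Queue = queue.Queue()
    results: List[str] = []
    results_lock = threading.Lock()

    def worker() -> None:
        while True:
            job = work_queue.get()
            try:
                if job is None:
                    # Sentinel value: no more work for this worker.
                    return
                processed = f"sent:{job}"   # placeholder for real work
                with results_lock:
                    results.append(processed)
            finally:
                work_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    # The producer only enqueues; it never calls the workers directly.
    for job in jobs:
        work_queue.put(job)
    for _ in threads:
        work_queue.put(None)   # one sentinel per worker
    work_queue.join()
    for thread in threads:
        thread.join()
    return results


print(sorted(process_jobs(["welcome-email", "resize-avatar", "receipt"])))
```

Because the producer and the workers only share the queue, the worker count can change without touching the code that enqueues jobs.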
Users > 1 Million. [Diagram: User, Amazon Route 53, Amazon CloudFront, Amazon S3, Elastic Load Balancer, four Web Instances, two Worker Instances, two Internal App Instances, Amazon SQS, Amazon SES, Amazon DynamoDB, ElastiCache, Amazon CloudWatch, RDS DB Instance Active (Multi-AZ), two RDS DB Instance Read Replicas, Availability Zone]
The next big steps
From 5 to 10 Million Users: you may start to run into issues with your database around contention on the write master. How can you solve it?
- Federation (splitting into multiple DBs based on function)
- Sharding (splitting one data set up across multiple hosts)
- Moving some functionality to other types of databases
  - NoSQL for hot tables, lookup tables, leaderboards/scoring, metadata
  - Data warehouse for analytics: user behavior, performance monitoring, A/B testing results, KPIs/dashboards
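Federation and sharding can both be thought of as routing decisions made before a query is sent. The sketch below assumes a static function-to-database map for federation and hash-modulo placement for sharding; all database names, `database_for`, and `shard_for` are hypothetical.

```python
import hashlib
from typing import Dict, List

# Federation: one database per function (names are made up for illustration).
FEDERATED_DATABASES: Dict[str, str] = {
    "users": "users-db",
    "orders": "orders-db",
    "forums": "forums-db",
}

# Sharding: one data set (users) split across several hosts.
USER_SHARDS: List[str] = ["users-db-0", "users-db-1", "users-db-2", "users-db-3"]


def database_for(function: str) -> str:
    """Federation: route by what the data is for."""
    return FEDERATED_DATABASES[function]


def shard_for(user_id: str, shards: List[str] = USER_SHARDS) -> str:
    """Sharding: route by a stable hash of the key."""
    # hashlib gives the same result in every process, unlike the built-in
    # hash(), which is randomized per process for strings.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


print(database_for("orders"))
print(shard_for("user-42"))
```

Hash-modulo placement is the simplest option, but changing the number of shards moves most keys to a different shard, so the shard count is worth planning ahead or replacing with a lookup table or consistent hashing.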
How do I Save Money?
Saving $$$
- Use managed database services: focus your limited resources on the application; ElastiCache can reduce your database costs
- Understand how to scale from the start: save redesign work and unhappy customers; start and stop instances as required
- Use the AWS platform: don't reinvent the wheel, concentrate on your core competency; using CloudFront will reduce your costs on EC2 dramatically
- Purchase RIs and use Spot instances
- Constantly monitor and right-size your environment
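Whether a Reserved Instance (RI) pays off depends on how many hours the instance actually runs. The arithmetic can be sketched as below, assuming a pricing model with an upfront fee plus a discounted hourly rate; the prices in the example are placeholders chosen for easy arithmetic, not real AWS prices.

```python
def break_even_hours(upfront: float, on_demand_hourly: float, reserved_hourly: float) -> float:
    """Hours of use after which the reservation becomes cheaper than on-demand."""
    savings_per_hour = on_demand_hourly - reserved_hourly
    if savings_per_hour <= 0:
        raise ValueError("reserved hourly rate must be lower than the on-demand rate")
    return upfront / savings_per_hour


def cheaper_option(hours: float, upfront: float, on_demand_hourly: float, reserved_hourly: float) -> str:
    """Compare total cost for a given number of running hours."""
    on_demand_total = hours * on_demand_hourly
    reserved_total = upfront + hours * reserved_hourly
    return "reserved" if reserved_total < on_demand_total else "on-demand"


# Placeholder prices: 300 upfront, 0.50/hour on-demand, 0.25/hour reserved.
print(break_even_hours(300.0, 0.50, 0.25))
print(cheaper_option(8760, 300.0, 0.50, 0.25))   # instance running all year
print(cheaper_option(500, 300.0, 0.50, 0.25))    # instance running occasionally
```

An instance that is started and stopped as required may never reach the break-even point, which is why right-sizing and usage monitoring come before buying reservations.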
Sorry, How do I Scale my Database?
Summary
- Decide on self-managed or managed database services
- Choose the right database for your use case and skill sets to start with
- Use Multi-AZ for your infrastructure
- Choose the right instance family and size for your workloads
- Understand the 3 types of EBS (Magnetic, General Purpose, and PIOPS)
- Make use of self-scaling services (Elastic Load Balancing, Amazon S3, Amazon SNS, SQS, Amazon SES, etc.)
- Build in redundancy at every level
- Blend SQL & NoSQL wisely
- Use a data warehouse to offload large analytical queries from your main database
- Cache data both inside and outside your infrastructure
- Purchase RIs and use Spot instances
- Split tiers into individual services (SOA)
- Use autoscaling once you are ready for it
- Use automation tools in your infrastructure
- Make sure you have good metrics, monitoring, and logging tools in place
- Don't reinvent the wheel