On Building Public Cloud Service for PostgreSQL and MySQL

Guangzhou Zhang, Alibaba Cloud
guangzhou.zgz@alibaba-inc.com

Agenda

•  Resource isolation
    •  CPU/IO/MEM
    •  Disk
•  Privilege management
•  Critical issues
    •  Handling IO stalls
    •  Handling OOM
•  Enhance the service with Proxy
    •  Transparent switch-over with Proxy
    •  Transparent connection pool with Proxy
    •  Read-write auto routing

Self-Intro

•  Guangzhou Zhang, developer on the Alibaba Cloud RDS team
•  Alibaba Cloud RDS team
    •  Covers MySQL/PostgreSQL/SQL Server
    •  Handles infrastructure, database kernel (MySQL/PG), operations, customer support, etc.
    •  Runs one of the world's largest database instance clusters, still growing more than 100% year over year

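Architecture

[Figure: architecture diagram. User requests arrive through a VIP at the Server Load Balancer and the proxy layer, which sit in front of the database instances (master and slave, with HA).]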

Resource isolation – CPU/IO/MEM

•  Use bare-metal machines (no VMs, no network storage)
•  Place up to hundreds of instances on the same host
•  Take full control over the whole stack and minimize dependencies, for a predictable SLA and performance
•  Control the resource allocation layout

Resource isolation – CPU/IO/MEM

•  Use cgroup for CPU/IO/MEM isolation

[Figure: a host machine with one cgroup per instance. The cgroup for Instance A contains a master and its backends; the cgroup for Instance B contains a slave and its backends.]
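A minimal sketch of what the per-instance CPU and memory settings could look like. It assumes the cgroup v1 control files (`cpu.cfs_period_us`, `cpu.cfs_quota_us`, `memory.limit_in_bytes`); the slides do not say which cgroup version or which values are used, and the helper name `cgroup_limits` is hypothetical.

```python
from typing import Dict

CFS_PERIOD_US = 100_000  # a common CFS scheduling period (100 ms)


def cgroup_limits(cpu_cores: float, mem_bytes: int) -> Dict[str, str]:
    """Map cgroup v1 control files to the values for one instance.

    Hypothetical helper: it only computes the settings. Writing them under
    /sys/fs/cgroup/<controller>/<instance>/ is left to the caller.
    """
    if cpu_cores <= 0 or mem_bytes <= 0:
        raise ValueError("limits must be positive")
    return {
        "cpu.cfs_period_us": str(CFS_PERIOD_US),
        # quota / period = number of cores the cgroup may consume
        "cpu.cfs_quota_us": str(int(cpu_cores * CFS_PERIOD_US)),
        "memory.limit_in_bytes": str(mem_bytes),
    }


if __name__ == "__main__":
    # Example: an instance class with 2 cores and 4 GiB of memory.
    for name, value in cgroup_limits(2, 4 * 1024**3).items():
        print(name, value)
```

Keeping the computation separate from the file writes makes the layout easy to review before it is applied to a host.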

Resource isolation – CPU/IO/MEM

•  Use cgroup for CPU/IO/MEM isolation
•  Trick: separate the data and write-ahead logging (WAL) disks, and loosen the IOPS limit on the WAL disk

Why do we have to loosen the limit?
•  Maintenance work inside the database requires heavy IO and should finish as soon as possible
•  A strict limit on WAL IOPS on the slave could result in severe replication lag
•  A strict limit on WAL IOPS on the master could easily cause a failover switch
•  Better write performance

Why are we able to loosen the limit?
•  No one writes to the database at a high pace constantly, since database storage has an upper limit (2 TB) and is expensive
•  The storage used by WAL files is charged
•  'Hot' instances can be relocated to different hosts as needed

[Figure: the cgroup for Instance A (master and backends) with a WAL disk limited to 10000 IOPS and a data disk limited to 500 IOPS.]
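The two limits in the figure can be sketched as per-device throttle entries. This assumes the cgroup v1 blkio controller, whose `blkio.throttle.write_iops_device` and `blkio.throttle.read_iops_device` files take lines of the form `major:minor limit`. The device numbers below are made up for illustration, and the helper name is hypothetical.

```python
from typing import Dict, List


def blkio_throttle_lines(iops_by_device: Dict[str, int]) -> List[str]:
    """Return "major:minor limit" lines for blkio.throttle.*_iops_device."""
    lines = []
    for device, iops in sorted(iops_by_device.items()):
        major, minor = device.split(":")  # raises ValueError if malformed
        if not (major.isdigit() and minor.isdigit()) or iops <= 0:
            raise ValueError(f"bad entry: {device!r} -> {iops}")
        lines.append(f"{device} {iops}")
    return lines


# Hypothetical device numbers: 8:16 is the WAL disk, 8:32 is the data disk.
INSTANCE_A_LIMITS = {"8:16": 10000, "8:32": 500}

if __name__ == "__main__":
    for line in blkio_throttle_lines(INSTANCE_A_LIMITS):
        print(line)
```

Because the limit is set per device, putting WAL on its own disk is what allows it to get a much looser cap than the data files.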

Resource isolation - Disk

•  Regularly collect the disk usage of each database instance, and mark those with excessive disk usage as locked down
•  Change the database kernel to add a parameter that controls user access:
    •  Allow read access
    •  Disallow any write to the instance
    •  Allow DROP/TRUNCATE commands
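The lock-down rules above can be sketched as a statement filter. This is an illustration only: the slides describe a change inside the database kernel, which would presumably act on parsed statements, whereas this sketch just looks at the first keyword (so it misclassifies cases such as a `WITH ... INSERT`). The function names are hypothetical.

```python
import re

# Statement types that stay allowed while an instance is locked down:
# reads, plus DROP/TRUNCATE.
_ALLOWED_WHEN_LOCKED = {"SELECT", "DROP", "TRUNCATE"}


def first_keyword(sql: str) -> str:
    """Return the upper-cased leading keyword of a statement, or ""."""
    match = re.match(r"\s*([A-Za-z]+)", sql)
    return match.group(1).upper() if match else ""


def is_allowed(sql: str, locked: bool) -> bool:
    """Decide whether a statement may run given the lock-down flag."""
    if not locked:
        return True
    return first_keyword(sql) in _ALLOWED_WHEN_LOCKED


if __name__ == "__main__":
    for stmt in ("SELECT 1", "INSERT INTO t VALUES (1)", "TRUNCATE t"):
        print(stmt, "->", is_allowed(stmt, locked=True))
```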

Privilege Management

•  The superuser role is kept from end users for security reasons
    •  Prevents users from messing up critical configurations for HA or maintenance
    •  Makes it convenient for us to handle end users' roles and maintenance roles differently in code
•  Users keep asking for more of the "superuser" privilege set for managing data, monitoring, extensions, roles, etc.
•  We have to loosen privilege checks in some cases to keep users happy

Privilege Management

•  Create a new role: rds_superuser
    •  Allow this role to manage the data of all other non-superusers, kill sessions, etc.
    •  Allow it to set specific configuration parameters
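One way to picture the "specific configuration parameters" rule is a whitelist check for parameters that normally require superuser. The slides do not list which parameters are opened up, so the two names below (both superuser-only settings in stock PostgreSQL) are placeholders, and `can_set_parameter` is a hypothetical helper rather than anything from the RDS kernel.

```python
# Hypothetical whitelist; the slides do not name the actual parameters.
RDS_SUPERUSER_SETTABLE = frozenset({"log_min_duration_statement", "log_statement"})


def can_set_parameter(role: str, parameter: str) -> bool:
    """Check whether a role may set a normally superuser-only parameter."""
    if role == "superuser":
        return True
    if role == "rds_superuser":
        return parameter in RDS_SUPERUSER_SETTABLE
    return False


if __name__ == "__main__":
    print(can_set_parameter("rds_superuser", "log_statement"))
    print(can_set_parameter("rds_superuser", "archive_command"))
```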

Handling OOM

•  OOM (out of memory) cases were frequently observed
    •  Cloud users tend to buy a small instance class first, then regularly upgrade to a higher class
    •  Large objects and big temporary memory are widely used in some instances
    •  Concurrent connections are not suitably configured by the application
•  When an instance goes OOM, the Linux kernel picks one of its processes and kills it. The whole instance is then restarted and all connections are lost.
•  How can we minimize the OOM impact for users?

Handling OOM

[Figure: a Linux box with two cgroups, each containing three backend processes.]

Handling OOM

[Figure: the same Linux box with its per-instance cgroups of backend processes, plus an additional cgroup, a public cgroup, and a "kill -USR2" signal.]
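The slides show only the public cgroup and the `kill -USR2`, without spelling out the mechanism. One possible reading, which is purely an assumption here, is that a monitor signals a single backend before the instance cgroup reaches its memory limit, so that one session is affected instead of the whole instance. The sketch below covers only the victim selection under that assumption; the function name, the 90% threshold, and the "largest backend" policy are hypothetical, and the role of the public cgroup is not modeled.

```python
from typing import Dict, Optional


def pick_backend_to_signal(
    rss_by_pid: Dict[int, int], limit_bytes: int, threshold: float = 0.9
) -> Optional[int]:
    """Return the pid of the largest backend once usage nears the limit.

    rss_by_pid maps backend pid -> resident memory in bytes.
    Returns None while total usage is below threshold * limit_bytes.
    """
    used = sum(rss_by_pid.values())
    if not rss_by_pid or used < threshold * limit_bytes:
        return None
    return max(rss_by_pid, key=rss_by_pid.get)


if __name__ == "__main__":
    victim = pick_backend_to_signal({101: 100, 102: 800}, limit_bytes=1000)
    # A caller could then run: os.kill(victim, signal.SIGUSR2)
    print(victim)
```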

Handling IO stalls

•  Symptoms of IO stalls (ext4 filesystem mounted with data=ordered on Linux)
    •  Long checkpoint fsync time
    •  Nearly all operations, including setting up a new connection, hang for > 10s. All instances on the same machine are affected.
    •  But IO usage is low (< 10%) for the problematic disk.

Handling IO stalls

fsync can be slow

[Figure: backends and the checkpointer run in a cgroup on a Linux box; the checkpointer calls fsync. The ext4 journaling buffer for metadata holds File 1 and File 2 metadata, with dirty data waiting in the IO queue.]

Handling IO stalls

write() can be blocked by fsync()

[Figure: the same diagram, with a backend's write added next to the checkpointer's fsync.]

Handling IO stalls

•  Mount ext4 with the option data=writeback
•  Change the database kernel to call sync_file_range() before calling fsync() in the checkpointer process
•  Linux kernel enhancement
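The sync_file_range() step can be sketched from user space as follows. The real change lives in the database kernel's C code; this Python version only illustrates the call order. It assumes Linux, where glibc exports `sync_file_range` and `SYNC_FILE_RANGE_WRITE` is 2, and it falls back to a plain fsync elsewhere. The helper names are hypothetical. Presumably the point is that most dirty pages are already on their way to disk when fsync runs, so the fsync itself is shorter.

```python
import ctypes
import ctypes.util
import os

SYNC_FILE_RANGE_WRITE = 2  # flag value from the Linux headers


def _load_sync_file_range():
    """Return libc's sync_file_range, or None where it is unavailable."""
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
        func = libc.sync_file_range
    except (OSError, AttributeError, TypeError):
        return None
    # int sync_file_range(int fd, off64_t offset, off64_t nbytes, unsigned flags)
    func.argtypes = [ctypes.c_int, ctypes.c_int64, ctypes.c_int64, ctypes.c_uint]
    func.restype = ctypes.c_int
    return func


_sync_file_range = _load_sync_file_range()


def flush_then_fsync(fd: int) -> None:
    """Start writeback of the file's dirty pages, then fsync it."""
    if _sync_file_range is not None:
        # offset=0, nbytes=0 covers the file from start to end.
        # The call is only a hint in this sketch, so its result is ignored.
        _sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE)
    os.fsync(fd)
```

A checkpointer-like loop would call `flush_then_fsync(fd)` for each data file it has written.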

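Architecture

[Figure: architecture diagram. User requests arrive through a VIP at the Server Load Balancer and the proxy layer, which sit in front of the database instances (master and slave, with HA).]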

Transparent connection pool with Proxy

•  Connection establishment is costly
•  Performance is bad for short-connection applications

[Figure: the proxy connected to the backend threads/processes of the database instance.]

Transparent connection pool with Proxy

[Figure: the proxy keeps a connection pool in front of the backend threads/processes of the DB instance.]

Transparent connection pool with Proxy

•  The proxy needs to reset all resources held by a connection before putting it back into the connection pool
•  Authentication needs to be performed against the incoming user when a connection is reused
•  Change the database kernel to support “SET AUTH_REQ = ‘user/database/md5password/salt’” for authentication
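The three points above can be sketched as a tiny pool. `FakeBackend` stands in for a real database connection and only records the SQL it receives. The `SET AUTH_REQ` string follows the format on the slide, with the password hash and salt treated as opaque values handed over by the proxy. Using `DISCARD ALL` (PostgreSQL's session reset command) for the clean-up step is an assumption, as are all class and function names.

```python
from collections import deque
from typing import Deque, List


class FakeBackend:
    """Stand-in for a database connection; records the SQL it receives."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def execute(self, sql: str) -> None:
        self.sent.append(sql)


def auth_req_sql(user: str, database: str, md5password: str, salt: str) -> str:
    # Format taken from the slide; escaping rules are not specified there.
    return f"SET AUTH_REQ = '{user}/{database}/{md5password}/{salt}'"


class ProxyPool:
    """Reuses idle backend connections across client sessions."""

    def __init__(self) -> None:
        self._idle: Deque[FakeBackend] = deque()

    def acquire(self, user: str, database: str, md5password: str, salt: str) -> FakeBackend:
        backend = self._idle.popleft() if self._idle else FakeBackend()
        # Authenticate the incoming user on the (possibly reused) connection.
        backend.execute(auth_req_sql(user, database, md5password, salt))
        return backend

    def release(self, backend: FakeBackend) -> None:
        # Reset session state before the connection goes back to the pool.
        backend.execute("DISCARD ALL")
        self._idle.append(backend)
```

For simplicity the sketch sends `SET AUTH_REQ` on fresh connections too; a real proxy would presumably use the normal login handshake for those.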

Transparent switch-over with Proxy

•  Quite a few cases require an instance restart, which we implement as a restart with switch-over
    •  Upgrading or downgrading an instance's class (e.g. from 2 GB of memory to 4 GB)
    •  Moving an instance from one machine to another
    •  Database kernel version upgrades
•  Transparent switch-over is a switch-over that does not break existing user connections. This is key to SLA and customer satisfaction

Transparent switch-over with Proxy

[Figure: the proxy layer in front of the DB instances (master and slave, with HA).]

Transparent switch-over with Proxy

•  The proxy re-establishes connections outside transaction boundaries
•  Only for connections that have not done anything that cannot be re-created, such as:
    •  Temporary table usage
    •  Statement preparation
•  Change the kernel to support a command like “set connection_user to xxx” so the proxy can re-establish a connection without an explicit authentication process
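The eligibility rule and the re-establishment command can be sketched as follows. `SessionState` and both function names are hypothetical; how the proxy actually tracks temporary tables and prepared statements is not described in the slides, so the flags are simply assumed to be maintained elsewhere.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SessionState:
    """What the proxy is assumed to know about one client session."""

    user: str
    in_transaction: bool = False
    uses_temp_tables: bool = False
    has_prepared_statements: bool = False


def can_move(session: SessionState) -> bool:
    """A session may be moved only between transactions and only if it
    holds no state that cannot be re-created on the new backend."""
    return not (
        session.in_transaction
        or session.uses_temp_tables
        or session.has_prepared_statements
    )


def reconnect_commands(session: SessionState) -> List[str]:
    """Commands the proxy would send on the new backend connection."""
    if not can_move(session):
        raise RuntimeError("session holds state that cannot be re-created")
    return [f"set connection_user to {session.user}"]
```

Sessions that fail the check keep their existing connection in this sketch, so they would still see a disconnect when the old instance goes away.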

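Transparent switch-over with Proxy

[Figure: the application user connects through the proxy layer to the DB instances (master and slave, with HA); on the re-established connection the proxy changes the user to the app user.]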

Read-write auto routing

[Figure: the application user connects through the proxy layer to the DB instances: master, slave (with HA), and a read-only replica.]
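The slide shows only the topology, so the routing rule below is an assumption: plain SELECT statements outside a transaction go to the read-only replica, and everything else goes to the master. The keyword check is deliberately naive, and the function name and the "master"/"replica" labels are hypothetical.

```python
import re


def route(sql: str, in_transaction: bool = False) -> str:
    """Return "replica" for simple reads, "master" for everything else."""
    match = re.match(r"\s*([A-Za-z]+)", sql)
    keyword = match.group(1).upper() if match else ""
    locks_rows = re.search(r"\bfor\s+update\b", sql, re.IGNORECASE) is not None
    if keyword == "SELECT" and not in_transaction and not locks_rows:
        return "replica"
    return "master"


if __name__ == "__main__":
    for stmt in ("SELECT * FROM t", "UPDATE t SET a = 1"):
        print(stmt, "->", route(stmt))
```

Sending all statements inside a transaction to the master is the conservative choice here, since a replica may lag behind writes made earlier in the same transaction.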
