Zero-downtime Postgres upgrades
Restarting databases without the apps noticing
@ChrisSinjo
GOCARDLESS
POST /cash/monies HTTP/1.1
{ amount: 100 }
💰💰💰
High 💵 per-request
Uptime is 🔑
Good durability guarantees
Feature-cautious
Transactions are cool
“Speak to this one node.”
- Postgres
[Diagram: Client talking to a single Postgres node]
[Diagram: Client and two Postgres nodes linked by replication]
Wake a human up
[Diagram sequence: Client and two Postgres nodes; replication stops, then is re-established]
Awful time-to-recovery
Error-prone
You gotta perform:
- Many steps
- In the right order
- Perfectly
Don’t make a tired SRE think
Add automation
Pacemaker
A clustering tool
[Diagram: Client and two Postgres nodes linked by replication]
How do we know a node has failed?
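One common answer to this question is majority agreement: a node is only treated as failed when more than half of the cluster says it is unreachable. The sketch below illustrates that idea and why two nodes are not enough. It is an illustration only, not Pacemaker's actual algorithm, and the function names are hypothetical.

```python
from typing import Dict


def has_quorum(votes: int, cluster_size: int) -> bool:
    """True if strictly more than half of the cluster agrees."""
    return votes > cluster_size // 2


def may_declare_failed(reports: Dict[str, bool], cluster_size: int) -> bool:
    """Decide whether observers may act on a suspected failure.

    `reports` maps an observer's name to True when that observer
    cannot reach the suspect node. We only act when a majority of
    the whole cluster agrees the node is unreachable.
    """
    agreeing = sum(1 for unreachable in reports.values() if unreachable)
    return has_quorum(agreeing, cluster_size)
```

With two nodes, a single observer can never form a majority, so it cannot tell a dead peer from a broken network link. With three nodes, two observers that agree can.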
[Diagram sequence: a third Postgres node is added, with replication between the nodes; Pacemaker runs alongside each of the three Postgres nodes; the client connects through a VIP]
$💯
Seems hard, right?
It kinda is
You gotta know:
- Postgres
- Distributed systems
- Pacemaker
Get someone else to run it for you
[Diagram sequence: Client, VIP, and three Postgres nodes each with Pacemaker; the VIP moves between nodes]
Every move means a connection reset
Every move means dropped requests
POST /cash/monies HTTP/1.1
{ amount: 100 }
💰💰💰
POST /cash/monies HTTP/1.1
{ amount: 100 }
500 Internal Server Error
What does this mean for upgrades?
[Diagram sequence: Client, VIP, and three Postgres nodes each with Pacemaker. All three nodes start on 9.4.9; two nodes are upgraded to 9.4.10 while one stays on 9.4.9]
Every upgrade means a connection reset
Every upgrade means dropped requests
POST /cash/monies HTTP/1.1
{ amount: 100 }
500 Internal Server Error
Solution: never upgrade
🙄
Not upgrading is never an option
Solution: ???
1 thing missing
[Diagram sequence: three PgBouncer instances and an additional VIP are added to the cluster of Client, VIP, and three Postgres nodes each with Pacemaker]
PgBouncer has This One Weird Trick™
PAUSE;
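A minimal sketch of how the PAUSE; trick might be wrapped in automation, assuming the work to do while paused is passed in by the caller. PAUSE; and RESUME; are PgBouncer admin console commands; `run_admin_command` is a hypothetical callable that sends one command to that console, and is not something the slides define.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def pgbouncer_paused(run_admin_command: Callable[[str], None]) -> Iterator[None]:
    """Hold client traffic at PgBouncer for the duration of the block.

    `run_admin_command` is a hypothetical helper that sends a single
    command to the PgBouncer admin console.
    """
    # If PAUSE; itself fails, nothing was paused, so no RESUME; is sent.
    run_admin_command("PAUSE;")
    try:
        yield
    finally:
        # Always resume, even if the work inside the block raised.
        run_admin_command("RESUME;")
```

Usage would look like `with pgbouncer_paused(send): move_primary()`, where `send` and `move_primary` are supplied by the surrounding tooling. Putting RESUME; in a `finally` block means a failed move does not leave clients queued indefinitely.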
[Diagram sequence: the same cluster with PgBouncer in front of Postgres; PAUSE; is issued to PgBouncer while the VIP moves]
So what does this mean for upgrades?
[Diagram sequence: Client, VIPs, three PgBouncer instances, and three Postgres nodes on 9.4.10 / 9.4.9 / 9.4.10. PAUSE; is issued, then RESUME;. The final frame shows all three nodes on 9.4.10]
$💯
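The order of operations in the upgrade frames can be sketched as a plan: upgrade the replicas, pause traffic, move the primary, resume, then upgrade the old primary. This is a reading of the diagrams rather than the speaker's tooling; the node names are hypothetical, and "promote" stands in for whatever the cluster does to move the primary.

```python
from typing import List


def rolling_upgrade_plan(nodes: List[str], primary: str, target: str) -> List[str]:
    """Return the ordered steps for a minor-version rolling upgrade.

    Replicas are upgraded first so that an already-upgraded node is
    ready to take over during the paused window.
    """
    if primary not in nodes:
        raise ValueError(f"{primary!r} is not one of the nodes")
    replicas = [node for node in nodes if node != primary]
    if not replicas:
        raise ValueError("need at least one replica to move the primary to")

    plan = [f"upgrade {node} to {target}" for node in replicas]
    new_primary = replicas[0]  # assumption: any upgraded replica will do
    plan += [
        "PAUSE;",
        f"promote {new_primary}",
        "RESUME;",
        f"upgrade {primary} to {target}",
    ]
    return plan
```

For example, `rolling_upgrade_plan(["pg1", "pg2", "pg3"], "pg2", "9.4.10")` upgrades `pg1` and `pg3` first, which matches the 9.4.10 / 9.4.9 / 9.4.10 frame.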
Caveats
Minor versions
9.4.9 → 9.4.10
pglogical
Long-running transactions
while running_queries:
    if now > timeout:
        abandon_migration
    else:
        sleep(0.1)
promote_new_primary
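A runnable version of the loop above might look like the following. It is a sketch under assumptions: the three callables are hypothetical hooks supplied by the caller, the 5 second default timeout is an arbitrary placeholder, and abandoning is taken to mean that no promotion happens.

```python
import time
from typing import Callable


def wait_then_promote(
    has_running_queries: Callable[[], bool],
    promote_new_primary: Callable[[], None],
    abandon_migration: Callable[[], None],
    timeout_seconds: float = 5.0,   # assumption: not a value from the talk
    poll_interval: float = 0.1,     # matches sleep(0.1) in the slide
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Wait for in-flight queries to drain, then promote.

    Returns True if the new primary was promoted, False if the
    migration was abandoned because queries outlived the timeout.
    """
    deadline = clock() + timeout_seconds
    while has_running_queries():
        if clock() > deadline:
            abandon_migration()
            return False  # give up without promoting
        sleep(poll_interval)
    promote_new_primary()
    return True
```

The clock and sleep functions are parameters only so the loop can be exercised without real waiting; in normal use the defaults apply.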
Pause length
7-10s total
$💯
One more thing… (#sorrynotsorry)
We’re hiring✌❤
@ChrisSinjo @GoCardlessEng
Thank you✌❤
Questions?✌❤