PostgreSQL Enterprise Features

Michael Banck <[email protected]>

Percona Live Europe 2018

Michael Banck

• Senior Consultant / Project Manager at credativ (since 2009)
  ◦ credativ database team
• Debian Developer (since 2001)
  ◦ Debian PostgreSQL packaging team
• Several PostgreSQL patches, e.g.
  ◦ checksum validation during base backups
  ◦ exclude schemas during pg_restore
  ◦ permanent replication slot setup in pg_basebackup

Michael Banck <[email protected]> credativ GmbH 1

PostgreSQL - Overview

• “The World’s Most Advanced Open Source Relational Database”
• Extensible, object-relational database system
• Created as a research project at Berkeley; community-based development since the mid-90s
• Vendor-neutral; commercial support available from multiple companies
• “PostgreSQL Global Development Group”, core team (5 members), around 30 committers
• No copyright assignments, no open-core, no dual licensing
• BSD/MIT-style licence
• Many (also proprietary) forks

PostgreSQL - Forks

PostgreSQL - Cloud Providers

Top Feature-Requests 2009

• Simple built-in replication
• In-place upgrades
• Administration/monitoring
• Driver quality/maintenance
• Extension management
• Per-column locale/collation
• Materialized and updatable views
• Autonomous transactions
• Parallel queries
• Index-only scans
• Merge/upsert statement
• Managed partitioning
• Hot Standby
• Recursive queries and window functions

Top Feature-Requests 2009 - Status in 2018

[Figure: the 2009 feature requests listed above, marked with their implementation status as of 2018]

PostgreSQL: What the Analysts Say

• Donald Feinberg, Gartner:
  ◦ “Postgres functionality has increased greatly and is now more than sufficient to run both mission-critical and non-mission-critical applications.”
• Noel Yuhanna, Forrester:
  ◦ “PostgreSQL has the second-largest open source community; has competitive technology and features and continues to expand its growth across various industries.”
  ◦ “Performance, integration, security, unpredictable workloads, and high availability are companies’ top data management challenges.”
• Matt Aslett, 451 Group:
  ◦ “PostgreSQL is a proven database for enterprise relational application workloads”
  ◦ “Increased commercial offerings and cloud-based functionality are driving adoption”

http://2013.pgconf.de/de/talks/edb-pggermany-2013-v01.pdf

Enterprise Features - Definition

• Predictable major and patch releases, long support timeframes
• Fault tolerance and data consistency
• Enterprise-relevant security features
• Interoperability and extensibility
• Integrated operations, monitoring, backup
• Replication and high availability
• Big data analytics
• Vertical and horizontal scaling

But Wait - What about Antivirus?

pg_snakeoil - The PostgreSQL Antivirus

• Typical antivirus software on PostgreSQL has severe drawbacks
  ◦ Severely affecting performance
  ◦ Making the filesystem unreliable
  ◦ Unclear failure modes
• Running antivirus software is sometimes required by local policy
• PostgreSQL extension pg_snakeoil provides antivirus capabilities
  ◦ Leverages ClamAV to scan PostgreSQL data
  ◦ Technology preview

https://github.com/credativ/pg_snakeoil

Predictable Major and Patch Releases, Long Support Timeframes

• One major version per year, usually in September/October

Version   Release Date
11        October 18, 2018
10        October 5, 2017
9.6       September 29, 2016
9.5       January 7, 2016
9.4       December 18, 2014
9.3       September 9, 2013
9.2       September 10, 2012
9.1       September 12, 2011
9.0       September 20, 2010

Predictable Major and Patch Releases, Long Support Timeframes

• Time-based code freeze (Q1), followed by a beta phase
  ◦ Release happens when no more serious bugs are present
  ◦ Release management team (since 2016)
• Major releases are supported for 5 years (so-called back branches)
• Quarterly, predictable point releases for critical and security-relevant bugs
  ◦ Always on the second Thursday of the second month of the quarter
  ◦ Schedule: https://www.postgresql.org/developer/roadmap/
  ◦ Security team handles security issues
  ◦ Potentially out-of-band point releases in case of emergencies
• Distribution packages for all supported versions for Red Hat/CentOS/SLES and Debian/Ubuntu
  ◦ http://yum.postgresql.org, http://apt.postgresql.org
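The point-release rule above can be turned into concrete dates. This is a minimal Python sketch, assuming scheduled releases on the second Thursday of February, May, August and November and ignoring out-of-band releases; `estimated_final_release` is a hypothetical helper that only approximates a branch's last release as the first scheduled release on or after the fifth anniversary of its initial release.

```python
import datetime as dt

# Assumption: scheduled point releases fall in the second month of each quarter.
RELEASE_MONTHS = (2, 5, 8, 11)
THURSDAY = 3  # date.weekday(): Monday == 0


def second_thursday(year: int, month: int) -> dt.date:
    """Return the second Thursday of the given month."""
    first = dt.date(year, month, 1)
    days_to_first_thursday = (THURSDAY - first.weekday()) % 7
    return first + dt.timedelta(days=days_to_first_thursday + 7)


def next_scheduled_release(on_or_after: dt.date) -> dt.date:
    """First scheduled point release on or after the given date (out-of-band releases ignored)."""
    year = on_or_after.year
    while True:
        for month in RELEASE_MONTHS:
            candidate = second_thursday(year, month)
            if candidate >= on_or_after:
                return candidate
        year += 1


def estimated_final_release(initial_release: dt.date, support_years: int = 5) -> dt.date:
    """Hypothetical estimate: first scheduled point release after the 5-year window ends."""
    day = initial_release.day
    if initial_release.month == 2 and day == 29:
        day = 28  # the anniversary year may not be a leap year
    anniversary = initial_release.replace(year=initial_release.year + support_years, day=day)
    return next_scheduled_release(anniversary)


if __name__ == "__main__":
    print(next_scheduled_release(dt.date(2018, 10, 18)))   # 2018-11-08
    print(estimated_final_release(dt.date(2018, 10, 18)))  # 2023-11-09
```

The estimate is only as good as the assumed rule; the roadmap page linked above remains the authoritative schedule.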

Predictable Major and Patch Releases, Long Support Timeframes

• No bug tracker, but a bug submission form
• Reported bugs get fixed promptly
• Patch support available from companies
• “When I submitted a bug to the list, usually within an hour or two I would get an email back saying ‘confirmed that that’s a bug, I’m gonna look at it’ and for the first three or four months, I never submitted a bug that I didn’t have a fix installed and running within 24 hours of submitting the initial post. And after my experience with Sybase and MySQL/MaxDB, it was totally amazing.”

http://archives.postgresql.org/pgsql-advocacy/2007-08/msg00620.php

Fault Tolerance and Data Consistency

• “I manage thousands of databases (PostgreSQL, SQL Server, and MySQL), and this past weekend we had a massive power surge that knocked out two APC cabinets. [...] Long story short, every single PostgreSQL machine survived the failure with zero data corruption. I had a few issues with SQL Server machines, and virtually every MySQL machine has required data cleanup and table scans and tweaks to get it back to ‘production’ status.”
• “I had exactly the same experience 3 years ago. Complete power failure (the stand-by generator took fire) in one small datacenter (around 500 machines). We had Oracle, SQL Server, DB2, MySQL, Progress, and of course PostgreSQL. The only database engine that restarted with no operation required was PostgreSQL. There were very minimal problems with Oracle (typing recover on some instances), but we had quite a few problems with the other engines.”

http://archives.postgresql.org/pgsql-advocacy/2011-04/msg00085.php

Fault Tolerance and Data Consistency

• Write-ahead log (WAL) protects transactions against crashes
• Automatic replay of the transaction log during crash recovery
• Synchronous replication to standbys possible
• Data checksums protect against storage errors
  ◦ https://github.com/credativ/pg_checksums
• Verification of index/data consistency (amcheck extension)
• Regression, isolation and WAL-consistency checks during development
• Fuzz testing via sqlsmith
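The write-ahead principle behind the first two bullets can be illustrated with a toy key-value store: every change is appended to a log and flushed with fsync before it is applied, and after a crash the log is replayed. This is a minimal Python sketch, not PostgreSQL's WAL format; `ToyWalStore` and its JSON-lines log are hypothetical, and real WAL additionally relies on checkpoints, LSNs and CRC-protected records.

```python
import json
import os
import tempfile
from typing import Dict


class ToyWalStore:
    """Hypothetical key-value store that follows the write-ahead rule."""

    def __init__(self, wal_path: str) -> None:
        self.wal_path = wal_path
        self.data: Dict[str, str] = {}  # stands in for the data pages

    def commit(self, key: str, value: str) -> None:
        record = json.dumps({"key": key, "value": value})
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(record + "\n")
            wal.flush()
            os.fsync(wal.fileno())  # the log record is durable before the commit returns
        self.data[key] = value  # only now modify the "data pages"

    @classmethod
    def recover(cls, wal_path: str) -> "ToyWalStore":
        """Crash recovery: rebuild the state by replaying the log from the start."""
        store = cls(wal_path)
        if os.path.exists(wal_path):
            with open(wal_path, encoding="utf-8") as wal:
                for line in wal:
                    try:
                        record = json.loads(line)
                    except json.JSONDecodeError:
                        break  # torn record written during the crash: stop replay here
                    store.data[record["key"]] = record["value"]
        return store


def demo() -> Dict[str, str]:
    """Commit two changes, simulate a crash in the middle of a third, then recover."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy_wal.log")
        store = ToyWalStore(path)
        store.commit("balance:alice", "100")
        store.commit("balance:bob", "50")
        with open(path, "a", encoding="utf-8") as wal:
            wal.write('{"key": "balance:carol", "val')  # crash mid-write
        return ToyWalStore.recover(path).data
```

Only acknowledged commits survive recovery; the half-written record is discarded because it never became durable.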

Enterprise-Relevant Security Features

• Authentication
  ◦ Source-IP/user/database based
  ◦ LDAP
  ◦ SSL certificates
  ◦ SCRAM-SHA-256
• Database access control
  ◦ Column-based grants
  ◦ Row-level security (RLS)
  ◦ SELinux extension (sepgsql)
• Auditing
  ◦ PGAudit extension
  ◦ Object audit logging
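With SCRAM-SHA-256 the server stores only a salted verifier, never the password itself. A minimal Python sketch of how such a verifier is derived following RFC 5802/RFC 7677, and of how a server can check a client proof against it, is shown below. It assumes PostgreSQL's verifier text format `SCRAM-SHA-256$<iterations>:<salt>$<StoredKey>:<ServerKey>` with 4096 iterations and a 16-byte salt; SASLprep normalization of the password is omitted, and the function names are illustrative.

```python
import base64
import hashlib
import hmac
import os
from typing import Optional, Tuple


def _hmac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()


def _b64(raw: bytes) -> str:
    return base64.b64encode(raw).decode("ascii")


def derive_keys(password: str, salt: bytes, iterations: int) -> Tuple[bytes, bytes, bytes]:
    """Return (ClientKey, StoredKey, ServerKey) as defined by RFC 5802."""
    # Hi() in RFC 5802 is PBKDF2 with HMAC as the pseudo-random function.
    salted = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    client_key = _hmac(salted, b"Client Key")
    stored_key = hashlib.sha256(client_key).digest()
    server_key = _hmac(salted, b"Server Key")
    return client_key, stored_key, server_key


def scram_sha256_verifier(password: str, salt: Optional[bytes] = None, iterations: int = 4096) -> str:
    """Build a verifier string in the assumed PostgreSQL format (SASLprep omitted)."""
    salt = os.urandom(16) if salt is None else salt
    _, stored_key, server_key = derive_keys(password, salt, iterations)
    return f"SCRAM-SHA-256${iterations}:{_b64(salt)}${_b64(stored_key)}:{_b64(server_key)}"


def client_proof(client_key: bytes, stored_key: bytes, auth_message: bytes) -> bytes:
    """Client side: ClientProof = ClientKey XOR HMAC(StoredKey, AuthMessage)."""
    signature = _hmac(stored_key, auth_message)
    return bytes(a ^ b for a, b in zip(client_key, signature))


def server_accepts(stored_key: bytes, auth_message: bytes, proof: bytes) -> bool:
    """Server side: recover ClientKey from the proof and compare its hash with StoredKey."""
    signature = _hmac(stored_key, auth_message)
    recovered_client_key = bytes(a ^ b for a, b in zip(proof, signature))
    return hmac.compare_digest(hashlib.sha256(recovered_client_key).digest(), stored_key)
```

Because the server only holds StoredKey and ServerKey, a leaked verifier cannot be replayed directly as a password.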

Enterprise-Relevant Security Features - STIG

https://crunchydata.com/postgres-stig/PGSQL-STIG-9.5+.pdf

Interoperability and Extensibility

• Federation via Foreign Data Wrappers (FDW), SQL/MED standard
  ◦ Particularly to other Postgres instances (postgres_fdw)
  ◦ Other SQL databases: MySQL, Oracle, Informix, SQLAlchemy
• Extensions
  ◦ Available since Postgres 9.1
  ◦ Pure SQL or additional C-based libraries
  ◦ Powerful API and hooks
  ◦ Large, growing number
    ▪ Additional data types
    ▪ Procedural languages
    ▪ Administrative helpers
    ▪ Auditing/logging
    ▪ Foreign data wrappers
    ▪ New index types (since 9.6)

Extensions

Enterprise-Relevant Extensions - Examples

• pgaudit - Event auditing
• pglogical - Logical replication
• orafce - Oracle compatibility
• postgis - Spatial
• pg_partman - Partition management
• pgcrypto - Cryptographic functions (column-level encryption)
• tsearch/pg_trgm - Full text search / similarity search
• sepgsql - SELinux-based mandatory access controls
• pgstrom - GPU offloading of compute-intensive workloads

Integrated Operations, Monitoring, Backup

PostgreSQL Appliance Dashboard

https://elephant-shed.io

https://github.com/credativ/elephant-shed

Elephant-Shed

• pgAdmin4 - Web-based PostgreSQL administration
• Grafana - Monitoring dashboards
• pgBadger - Logfile analysis
• pgBackRest - Backups
• Prometheus - Monitoring metrics
• Cockpit - System and services administration
• Shell In A Box - Web-based terminal emulator

Elephant-Shed Monitoring Dashboard

Replication and High Availability

Physical (Streaming) Replication

• Transaction log streaming to standby
• Read-only queries possible on standby (Hot Standby)
• Quorum-based synchronous replication, optionally per transaction
• Consistent reads from synchronous standbys
• Crash-proof retention of required transaction logs per standby via replication slots
• Standby cloning via base backup
• Switchover, switchback, promote and remastering
• Cascading and/or delayed replication
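Quorum-based synchronous replication is configured via `synchronous_standby_names`, e.g. `ANY 2 (s1, s2, s3)` (quorum) or `FIRST 2 (s1, s2, s3)` (priority). The Python sketch below models up to which WAL position (LSN) waiting commits are confirmed under each method. It is a simplified model, not the server's code: FIRST waits for the slowest of the highest-priority connected standbys, ANY for the N-th most advanced of all listed standbys; the standby names and LSNs are made up.

```python
from typing import Dict, Optional, Sequence


def parse_lsn(text: str) -> int:
    """Convert PostgreSQL's textual LSN (e.g. "16/B374D848") into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(value: int) -> str:
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


def confirmed_lsn(method: str, num_sync: int, standby_names: Sequence[str],
                  flush_lsns: Dict[str, int]) -> Optional[int]:
    """LSN up to which commits are acknowledged, or None if commits must keep waiting.

    flush_lsns maps each *connected* standby to the LSN it has flushed.
    """
    connected = [name for name in standby_names if name in flush_lsns]  # keeps priority order
    if len(connected) < num_sync:
        return None
    if method == "FIRST":
        # Priority-based: the first num_sync connected standbys are synchronous,
        # so a commit is confirmed once the slowest of them has flushed it.
        return min(flush_lsns[name] for name in connected[:num_sync])
    if method == "ANY":
        # Quorum-based: any num_sync standbys suffice, i.e. the num_sync-th most advanced one.
        ordered = sorted((flush_lsns[name] for name in connected), reverse=True)
        return ordered[num_sync - 1]
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    lsns = {"s1": parse_lsn("0/3000060"), "s2": parse_lsn("0/3000000"), "s3": parse_lsn("0/2000000")}
    print(format_lsn(confirmed_lsn("ANY", 2, ["s1", "s2", "s3"], lsns)))  # 0/3000000
```

With ANY, a single slow standby does not hold back commits as long as enough others keep up; with FIRST, it does if it is among the highest-priority ones.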

Logical Replication - Use Cases

• Native (since 10) or via pglogical extension
• Major upgrades
• Change Data Capture
  ◦ Database changes as e.g. JSON
• Data aggregation and integration
  ◦ Individual tables
  ◦ Row/column filtering (pglogical)
• Bi-directional replication
  ◦ 3rd-party solution
  ◦ Geographically distributed cluster
  ◦ Conflict resolution handling required

High Availability - Definition

• Protection against hardware/software outages
  ◦ CPU defect
  ◦ Network card failure
  ◦ Kernel panic
  ◦ Postgres process crash
• Maintenance does not impair service
  ◦ Restart of Postgres process after patching or configuration change
  ◦ Major version upgrade of Postgres
  ◦ Operating system upgrade
• Application is continuously available
  ◦ No long-lasting locks during schema changes

High Availability - Failover Solutions

• Pacemaker/Corosync
  ◦ pgsql resource agent (standard)
  ◦ pgsqlms resource agent (PostgreSQL Automatic Failover, PAF)
• Patroni
• repmgr
• pgpool-II
• Kubernetes Operator
  ◦ Patroni
  ◦ Crunchy Data Container Suite
• Client-based failover via definition of multiple hosts
  ◦ PgJDBC (since 9.3-1100)
  ◦ libpq (since 10)
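With libpq (10 and later) several hosts can be listed in one connection string, e.g. `postgresql://db1:5432,db2:5432/app?target_session_attrs=read-write`; the hosts are tried in order and, with `read-write`, a server whose session is read-only is skipped. The Python sketch below mimics that selection loop; `probe` is a hypothetical stand-in for "connect and check `transaction_read_only`", and the host names are examples.

```python
from typing import Callable, Optional, Sequence


def multi_host_uri(hosts: Sequence[str], dbname: str, port: int = 5432,
                   target_session_attrs: str = "read-write") -> str:
    """Build a libpq connection URI that lists several hosts."""
    host_list = ",".join(f"{host}:{port}" for host in hosts)
    return f"postgresql://{host_list}/{dbname}?target_session_attrs={target_session_attrs}"


def choose_host(hosts: Sequence[str], probe: Callable[[str], Optional[bool]],
                require_read_write: bool = True) -> Optional[str]:
    """Return the first acceptable host, mimicking the client-side failover loop.

    probe(host) is hypothetical: it returns None if the host is unreachable,
    otherwise True when the session is read-only (i.e. the server is a standby).
    """
    for host in hosts:
        read_only = probe(host)
        if read_only is None:
            continue  # connection failed: try the next host
        if require_read_write and read_only:
            continue  # standby: not acceptable for a read-write session
        return host
    return None
```

This keeps failover logic out of the application: after a promotion, a new connection attempt simply lands on whichever listed host accepts writes. Existing connections are not migrated and have to be re-established.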

High Availability - Pacemaker Master/Slave Set (PAF)

• Resource agent pgsqlms, developed by Dalibo, PostgreSQL licence
• Master/slave set, streaming replication
• Controlled switchover/switchback and demote possible besides failover/promote
  ◦ Switchover only if the current primary can become a standby without any problems
  ◦ On promote, a notify event is intercepted and it is checked whether other standbys have replayed further transactions
• Relatively simple configuration
• STONITH device required; timeouts need to be tested/adjusted
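The check on promote amounts to choosing, among the surviving standbys, the one that has replayed the most WAL, so that the promoted node does not lag behind the others. The Python sketch below shows that selection rule only; it is a simplified illustration, not the pgsqlms resource agent's code, and the node names, LSN values and the tie-breaking via a preferred node are assumptions.

```python
from typing import Dict, Optional


def parse_lsn(text: str) -> int:
    """Convert PostgreSQL's textual LSN (e.g. "0/5000110") into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def promotion_candidate(replayed: Dict[str, str], preferred: Optional[str] = None) -> str:
    """Return the standby with the highest replayed LSN.

    On a tie, the (hypothetical) preferred node wins; otherwise the alphabetically
    first name, so that the result is deterministic.
    """
    if not replayed:
        raise ValueError("no standby available for promotion")
    best = max(parse_lsn(lsn) for lsn in replayed.values())
    leaders = sorted(name for name, lsn in replayed.items() if parse_lsn(lsn) == best)
    if preferred in leaders:
        return preferred
    return leaders[0]
```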

High Availability - Pacemaker Example Setup

High Availability - Patroni

• Agent that configures instances and replication and enables switchover (bot pattern)
• Uses a distributed consensus store (etcd, Consul, ZooKeeper) for leader election and split-brain avoidance
• Offers a REST API for status, health checks and configuration changes
• Optional HAProxy for master/replica service endpoints
  ◦ HTTP check against the REST API on /master and /replica, respectively
• Deployment in containers, Kubernetes, bare-metal or via Debian/Ubuntu packages
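HAProxy decides where to send traffic by polling each node's Patroni REST API: `GET /master` answers 200 only on the leader, and `GET /replica` answers 200 only on a healthy replica, with a non-200 status otherwise. The Python sketch below performs the same check with the standard library; port 8008 (Patroni's usual REST API port) and the node names are assumptions, and `status_of` can be replaced with a stub for testing.

```python
import urllib.error
import urllib.request
from typing import Callable, List, Sequence


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if the node is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. the error status from /master on a replica
    except OSError:
        return 0  # connection refused, timeout, DNS failure, ...


def healthy_backends(nodes: Sequence[str], role: str, port: int = 8008,
                     status_of: Callable[[str], int] = http_status) -> List[str]:
    """Nodes whose Patroni REST API answers 200 for /master or /replica."""
    if role not in ("master", "replica"):
        raise ValueError(f"unsupported role: {role}")
    return [node for node in nodes if status_of(f"http://{node}:{port}/{role}") == 200]
```

In HAProxy itself, the same check is expressed per backend with `option httpchk` and `http-check expect status 200`.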

High Availability - Continuous Service Maintenance

• Transparent Postgres restart
  ◦ pgBouncer: PostgreSQL connection proxy/pooler/router
  ◦ Holds incoming connections with the PAUSE command
  ◦ Postgres is restarted after all active connections have ended
  ◦ Application sees a delayed connection instead of error messages
  ◦ Requires short-lived sessions/transactions
  ◦ Incoming connection routing during major-version upgrade switchover
• Near-zero-downtime major upgrades
  ◦ Logical replication (pglogical, built-in from 10, Slony-I)
  ◦ Requires redundant hardware/storage and primary keys
• In-place upgrades with pg_upgrade
  ◦ Does not require primary keys, but a second data directory
  ◦ Hardlink mode (no possibility of switchback): downtime starting at about 10 seconds
  ◦ Scales with the number of database objects, not with database size

High Availability - Long-Lasting Locks

The following operations do not require long-lasting exclusive locks or table rewrites:

• Adding columns that are NULL or (from 11) have a non-volatile DEFAULT
• Dropping columns
• Dropping or validating constraints
• Concurrent index creation
• Foreign key creation (as NOT VALID, validated separately)
• Unique constraint creation via a concurrently built index
• Table reorganization with pg_repack
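Two of these patterns need a specific statement sequence. The Python sketch below only generates the SQL and does not connect anywhere: a unique index is built with `CREATE UNIQUE INDEX CONCURRENTLY` and then attached as a constraint, and a foreign key is added as `NOT VALID` and validated in a second step, whose lock does not block reads or writes. Table, column and constraint names are hypothetical; note that `CONCURRENTLY` cannot run inside a transaction block, so a driver has to be in autocommit mode.

```python
import re
from typing import List

# Plain lower-case identifiers of at most 63 characters, so no quoting is needed.
_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]{0,62}$")


def ident(name: str) -> str:
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def add_unique_online(table: str, column: str) -> List[str]:
    t, c = ident(table), ident(column)
    name = ident(f"{t}_{c}_key")  # hypothetical naming convention
    return [
        # Builds the index without blocking writes; must run outside a transaction block.
        f"CREATE UNIQUE INDEX CONCURRENTLY {name} ON {t} ({c})",
        # Attaches the existing index as a constraint without rebuilding it.
        f"ALTER TABLE {t} ADD CONSTRAINT {name} UNIQUE USING INDEX {name}",
    ]


def add_foreign_key_online(table: str, column: str, ref_table: str, ref_column: str) -> List[str]:
    t, c, rt, rc = ident(table), ident(column), ident(ref_table), ident(ref_column)
    name = ident(f"{t}_{c}_fkey")  # hypothetical naming convention
    return [
        # Only rows inserted or updated from now on are checked; existing rows are not scanned.
        f"ALTER TABLE {t} ADD CONSTRAINT {name} FOREIGN KEY ({c}) REFERENCES {rt} ({rc}) NOT VALID",
        # Scans the existing rows while holding a lock that allows concurrent reads and writes.
        f"ALTER TABLE {t} VALIDATE CONSTRAINT {name}",
    ]
```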

Big Data Analytics

• Declarative partitioning (since 10) allows management of huge tables
• CUBE, ROLLUP and GROUPING SETS for analytical aggregation
• Block-range indexes (BRIN) allow partition-like pruning of data at about 1% of the size of a default (B-tree) index
• The TABLESAMPLE clause returns a data sample with an upper-bounded runtime
• Parallel query allows usage of multiple cores for reporting queries
• The PL/R procedural language allows statistical analysis in R
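The small size of a BRIN index follows from what it stores: one summary per range of table blocks (by default `pages_per_range = 128`) instead of one entry per row. The Python sketch below models a min/max BRIN on a column whose values follow the physical row order, such as an insert timestamp; the page layout and values are made up, and a real BRIN scan returns candidate block ranges whose rows are then rechecked.

```python
from dataclasses import dataclass
from typing import List, Sequence

PAGES_PER_RANGE = 128  # PostgreSQL's default for the pages_per_range storage parameter


@dataclass
class RangeSummary:
    first_page: int
    min_value: int
    max_value: int


def build_brin(pages: Sequence[Sequence[int]], pages_per_range: int = PAGES_PER_RANGE) -> List[RangeSummary]:
    """Summarize each block range by the min/max of the indexed column."""
    summaries = []
    for start in range(0, len(pages), pages_per_range):
        values = [v for page in pages[start:start + pages_per_range] for v in page]
        if values:
            summaries.append(RangeSummary(start, min(values), max(values)))
    return summaries


def matching_ranges(summaries: Sequence[RangeSummary], low: int, high: int) -> List[int]:
    """First pages of the block ranges that may contain values in [low, high]."""
    return [s.first_page for s in summaries if s.max_value >= low and s.min_value <= high]
```

The model also shows the limitation: if the column's values are not correlated with the physical row order, every range has a wide min/max span and little can be skipped.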

Vertical and Horizontal Scaling

Vertical and Horizontal Scaling - Definition

• Vertical scaling: improved utilization of the server’s existing resources
  ◦ More transactions per CPU core
  ◦ Usage of multiple CPU cores for individual queries
• Horizontal scaling: load distribution across several servers
  ◦ Distributing queries to multiple servers
    ▪ Data replicated to every server: load balancing
    ▪ Data distributed between servers: sharding
  ◦ Usage of multiple servers for individual queries
    ▪ Massively parallel processing (MPP)
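Using several CPU cores for one aggregate query works by splitting the work into a partial aggregation per worker and a final combine step in the leader (visible in PostgreSQL plans as Partial Aggregate and Finalize Aggregate around a Gather node). The Python sketch below mimics that two-phase `avg()` with a thread pool; it illustrates the technique only, the chunking scheme is an assumption, and in CPython threads do not actually speed up pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Sequence, Tuple


def partial_avg(chunk: Sequence[float]) -> Tuple[float, int]:
    """Worker step: return the transition state (sum, count), not a finished average."""
    return float(sum(chunk)), len(chunk)


def finalize_avg(states: Sequence[Tuple[float, int]]) -> Optional[float]:
    """Leader step: combine partial states; like SQL avg(), return None (NULL) for no rows."""
    total = sum(s for s, _ in states)
    count = sum(c for _, c in states)
    return total / count if count else None


def parallel_avg(values: Sequence[float], workers: int = 4) -> Optional[float]:
    """Split the input into one chunk per worker, aggregate partially, then finalize."""
    size = max(1, -(-len(values) // workers))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        states: List[Tuple[float, int]] = list(pool.map(partial_avg, chunks))
    return finalize_avg(states)
```

Averaging the per-worker averages directly would be wrong whenever chunks differ in size, which is why the partial state carries the count.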

Vertical Scaling - TPC-H Benchmark 5 GB, v9.5-v11: Sequential vs. Parallel

Vertical Scaling - TPC-H Benchmark 5 GB, v9.5-v11: Selected Queries

Vertical Scaling - TPC-H Benchmark 1 TB, v9.6, 72 Cores, q1

https://blog.2ndquadrant.com/parallel-monster-benchmark/

Horizontal Scaling - Load Balancing

• Read queries get distributed
• Write queries go to the primary
• Data is replicated to all nodes
• Application-transparent
  ◦ pgpool-II
• Application support required
  ◦ HAProxy
  ◦ pgBouncer with DNS round-robin
  ◦ PgJDBC connection option loadBalanceHosts=true
• Parameter synchronous_commit = remote_apply for consistent read queries
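When load balancing is done in the application, the basic rule is to send writes to the primary and spread reads over the replicas. The Python sketch below is a deliberately naive router: it classifies statements by their first keyword only, so e.g. `SELECT ... FOR UPDATE` or functions with side effects would be misrouted. `ReadWriteRouter` and the host names are hypothetical. For read-your-writes consistency, the primary can use `synchronous_commit = remote_apply`, so that a commit returns only after the synchronous standbys have applied it.

```python
import itertools
from typing import Sequence


class ReadWriteRouter:
    """Hypothetical client-side router: writes to the primary, reads round-robin to replicas."""

    def __init__(self, primary: str, replicas: Sequence[str]) -> None:
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        words = sql.split(None, 1)
        keyword = words[0].upper() if words else ""
        if keyword == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        return self.primary  # writes, DDL and anything unrecognized
```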

Horizontal Scaling - Sharding

• Read and write queries get distributed
• Data is distributed between nodes
• Dimension tables usually replicated to all nodes for efficient joins
• Postgres-XL
• Greenplum
• CitusDB
• PL/Proxy
• FDW-based native sharding probably coming in the future

“Towards Built-in Sharding in Community PostgreSQL”

https://www.pgcon.org/2017/schedule/events/1069.en.html
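Several of these systems place a row on a node by hashing its distribution column and mapping the hash into one of several shard ranges. The Python sketch below shows that mapping with equal ranges over the signed 32-bit hash space; it uses SHA-256 as a stand-in and is not PostgreSQL's or Citus' actual hash function, and the shard count and keys are illustrative.

```python
import bisect
import hashlib
from typing import List

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def hash32(key: str) -> int:
    """Stand-in hash: first 4 bytes of SHA-256 as a signed 32-bit integer."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big", signed=True)


def shard_upper_bounds(shard_count: int) -> List[int]:
    """Split the int32 hash space into equal ranges; return each range's inclusive upper bound."""
    span = 2**32
    return [INT32_MIN + (i + 1) * span // shard_count - 1 for i in range(shard_count)]


def shard_for(key: str, bounds: List[int]) -> int:
    """Index of the shard whose hash range contains the key's hash."""
    return bisect.bisect_left(bounds, hash32(key))
```

Because the mapping depends only on the key, all rows with the same distribution value land on the same node, which is what makes joins on the distribution column local; small dimension tables are copied to every node instead.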

Thanks for your attention - Contact

• Questions?
• Michael Banck <[email protected]>
• http://www.credativ.de
• http://www.credativ.de/postgresql-competence-center
• http://www.credativ.de/jobs
• http://www.credativ.de/blog