30
www.postgrespro.ru Indexes don't mean slow inserts Anastasia Lubennikova

Indexes don't mean slow inserts

Embed Size (px)

Citation preview

Page 1: Indexes don't mean slow inserts

www.postgrespro.ru

Indexes don't meanslow inserts

Anastasia Lubennikova

Page 2: Indexes don't mean slow inserts

Agenda

1. Why do we need it?2. Write-optimisation techniques3. PostgreSQL specific4. Advanced PostgreSQL indexes5. Future of indexing in PostgreSQL

2/29

Page 3: Indexes don't mean slow inserts

PostgreSQL indexes

● Speed up search● Primary key● Constraints

● Secondaryindexes

3/29

Page 4: Indexes don't mean slow inserts

Index maintenance overhead

● Index size● INSERT slowdown● Random I/O● Index becomes

fragmented

● More indexes -more overhead

4/29

Page 5: Indexes don't mean slow inserts

Do we need write-optimisation?

UPDATE mytable SET a = a + 1;

● Heavy write load● MVCC

update = insert

5/29

Page 6: Indexes don't mean slow inserts

Do we need write-optimisation?

● 1Gb table ● Update all values:

Without index ~ 200sWith index ~ 600s

6/29

Page 7: Indexes don't mean slow inserts

DBMS trade-offs

7/29

Page 8: Indexes don't mean slow inserts

DBMS trade-offs

● CAP theorem● ACID vs BASE● Lower hardware cost vs Increase productivity● Read speed vs Write speed● Productivity vs Fault-tolerance

8/29

Page 9: Indexes don't mean slow inserts

Write-optimisation

● Writes are faster● Reads are good● Storage is fault-tolerant

9/29

Page 10: Indexes don't mean slow inserts

Insert buffer

● Accumulate data. Sort. Insert at a time.+ Avoids random I/O- Seqscan buffer- Possible data loss- Merge time

● Avoids hidden scans- only non-unique

● MySQL InnoDB Change Buffer 10/29

Page 11: Indexes don't mean slow inserts

Cache-oblivious data structures

● Approximately optimal for any hardware● Divide & Conquer

11/29

Page 12: Indexes don't mean slow inserts

LSM trees

● Cascade of B-trees● First tree is in memory

● LevelDB● BigTable ● Cassandra ● Hbase● SophiaDB● other NoSQL DBs

12/29

Page 13: Indexes don't mean slow inserts

From LSM to COLA

● Search optimisation for LSM● Leaf levels are linked by lookahead pointers

● Introduced in 2007 by founders of Tokutek

13/29

Page 14: Indexes don't mean slow inserts

Cache-oblivious lookahead arrays

● Drop internal trees nodes● Bound the scan area with lookahead pointers

14/29

Page 15: Indexes don't mean slow inserts

COLA: theory and pracrice

● Prototype shows incredible results!

15/29

Page 16: Indexes don't mean slow inserts

COLA: theory and pracrice

● VACUUM?● WAL?● Concurrency?● Index size?

● too hard =(

● Prototype shows incredible results!

16/29

Page 17: Indexes don't mean slow inserts

Fractal Tree

● Insert the message instead of the data● Send it down the tree● Apply a message

to leaf page

● TokuDB for MySQL● TokuMX for MongoDB

17/29

Page 18: Indexes don't mean slow inserts

PostgreSQL specific

18/29

Page 19: Indexes don't mean slow inserts

PostgreSQL specific (1)

● Write-Ahead-Log• WAL is not extendable

● Storage manager• 1 Relation (Heap or Index) = 1 continuous file• Free Space Map

● Block size• 8 Kb

19/29

Page 20: Indexes don't mean slow inserts

Advanced indexes

20/29

Page 21: Indexes don't mean slow inserts

Advanced PostgreSQL indexes

● Optimize the number of indexes• pg_stat_statements

● REINDEX● CREATE INDEX CONCURRENTLY

• rebuild bloated and fragmented indexes● Partial indexes● BRIN

• tiny min/max index

21/29

Page 22: Indexes don't mean slow inserts

Ideas?

22/29

Page 23: Indexes don't mean slow inserts

Covering and Unique

● To maintain constraint (Unique, Pimary key..) on A● To use IndexOnlyScan on A,B● Have to maintain2 indexes

23/29

Page 24: Indexes don't mean slow inserts

Covering + Unique

CREATE UNIQUE INDEX ON mytable USING btree(a)INCLUDING(b);

24/29

Page 25: Indexes don't mean slow inserts

Covering + Unique (1)

# CREATE INDEX covering_idx ON people (first_name) INCLUDING (last_name);

# EXPLAIN SELECT first_name, last_name FROM people WHERE first_name = 'Paul'; QUERY PLAN--------------------------------------------------------------------------- Index Only Scan using covering_idx on people (cost=0.28..1.44 rows=4 width=13) Index Cond: (first_name = 'Paul'::text)

25/29

Page 26: Indexes don't mean slow inserts

Covering + Unique (2)

● Searches are faster• You can use IndexOnlyScan now

● Indexes maintenace overhead decreased• Inserts are faster• Size is smaller

26/29

Page 27: Indexes don't mean slow inserts

Effective storage of duplicates

● Compress duplicated keys on index pageIN PROGRESS

27/29

Page 28: Indexes don't mean slow inserts

Bulk insert

● INSERT INTO mytable SELECT x FROM generate_series(0, 1000000) as x;

● 1.000.000 B-tree searches● 1.000.000 WAL records

28/29

Page 29: Indexes don't mean slow inserts

Insert Buffer

● Flexible● Recoverable

29/29

Page 30: Indexes don't mean slow inserts

www.postgrespro.ru

Thanks for attention!Any questions?

[email protected]