Why physical storage of your database tables might matter Apoorva Aggarwal

Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Download PDF Report

Upload
others
View
1
Download
0

Embed Size (px)

Citation preview

Why physical storage of your database tables might matter

Apoorva Aggarwal

About MeData Platform Engineer at Grofers helping build systems for data informed decisions

@apoorva_1993 apoorvaaggarwal

Page 3: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Page 4: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Page 5: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

The War ZoneBird’s eye view

Page 6: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Behind the scenesNarrowing down into problem space

Page 7: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

We looked at the query plan

Page 8: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Default index in postgres is of type BTree which constitutes a BTree or a balanced tree of index entries and index leaf nodes which store physical address of an index entry.

An index lookup requires three steps:

1. Tree traversal - O(log n)2. Following the leaf node chain - O(n) ~

O(200)3. Fetching the table data

Page 9: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

In postgres, location of a row is given by ctid which is a tuple. ctid is of type tid (tuple identifier), called ItemPointer in the C code. Per documentation:

This is the data type of the system column ctid. A tuple ID is a pair (block number, tuple index within block) that identifies the physical location of the row within its table.

Distribution of Data

http://www.postgresql.org/docs/current/interactive/datatype-oid.html

Page 10: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

CLUSTER command

- Physically rearranges the rows on disk based on a given column.

- Acquiring a READ WRITE lock on table and requiring 2.5x the table size made it tricky to use.

Possible solutions to root cause and exploring feasibility Insert cluster gif

https://www.postgresql.org/docs/9.1/static/sql-cluster.html

Page 11: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

An attempt to fix the problem right from the source

Page 12: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Understanding how Spark writes data to a table

Page 13: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Page 14: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Page 15: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Page 16: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Page 17: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Page 18: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Looking at the code we found that we had partitioned on product_id for a particular transformation operation.

Note how rows containing a particular product ID are in one partition

Page 19: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Testing hypothesis

Page 20: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Page 21: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Attempting to align the data

Page 22: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

We repartitioned the dataframe

Page 23: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

New data distributionAlas, the table is still not pivoted on customer_id.

What did we do wrong?

Page 24: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Page 25: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Understanding Repartitioning

Page 26: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Visualizing the dataframe

Page 27: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Bring all rows of a product together?How?

Sort them maybe!?

Page 28: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

After sorting rows within a partition by customer_id

Page 29: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Let’s write this to the database table And see distribution

Page 30: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Page 31: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Page 32: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Testing the solutionThe final test still remains. Will the query execution now be faster. Let’s see what the query planner says

Page 33: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Page 34: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

A plain index scan fetches one tuple-pointer at a time from the index, and immediately visits that tuple in the table. A bitmap scan fetches all the tuple-pointers from the index in one go, sorts them using an in-memory "bitmap" data structure, and then visits the table tuples in physical tuple-location order.

Why a bitmap scan

Page 35: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Execution time came down from ~100 ms to ~3ms.

Page 36: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

Page 37: Why physical storage of your database tables might matter · 2020-03-16 · Why physical storage of your database tables might matter Apoorva Aggarwal. About Me Data Platform Engineer

We are hiring!

Email: [email protected]

The Evolving Law On Overlapping Employment Regulatory ... · Questions: Which areas of the law might apply and on what grounds? What steps might you suggest be taken in this matter?

Documents

TAKING BACK YOUR POWER - soulmatealchemy.comThe way you live may not matter at all, But you never know, it might. And just in case it could be that another’s life, through you, might

Documents

When Does Strategic Debt Service Matter?1pages.stern.nyu.edu/~sternfin/vacharya/public_html/valn2.pdf · When Does Strategic Debt-Service Matter?.....3 impact. One might expect strategic

Documents

Writing Numerical Expressions—Hexagon Tablesmathpractices.edc.org/pdf/Writing_Numerical_Expressions-Hexagon_Tables.pdf(5) Sam: No matter how many tables there are, there will always

Documents

Understanding learning gain and why this might matter to you

Education

Diversity, Equity, and Inclusion in EECS Coursesendremad/slides/... · impostor syndrome or stereotype threat. • Discuss at your tables what you might say instead (or encourage

Documents

A Matter of Time - DB2 zOS Temporal Tables - White Paper v1.4.1

Documents

Designing Effective Tables 2.02.19 PM - CMUDesigning Effective Tables A table is an effective method for presenting data when the precise values matter – whereas figures are better

Documents

OCCASIONAL TABLES - National Office Furniture · arrowood occasional tables occasional tables 486 national office furniture—tables

Documents

FEDERAL BRITAIN - iea.org.uk · Summary xiv List of tables, ... its overseas allies, including my own country. Failure might ... FEDERAL BRITAIN. FEDERAL BRITAIN

Documents

Useful Tables PM Tables

Documents

Tables - Sauder Worshipsauderworship.com/images/resources/literature/worship_tables.pdf · Occasional tables ranging from end tables, coffee tables, corner tables, and console tables

Documents

Boardroom Tables | Conference Tables | Meeting Tables

Documents

OKSAT at NTCIR-11 RecipeSearchresearch.nii.ac.jp/ntcir/workshop/OnlineProceedings11/... · 2014. 11. 27. · 'Standard Tables of Food Composition' might be interesting. 6. CONCLUSIONS

Documents

The Kinetic Theory of Matter€¦ · A fast-moving particle might collide with another particle and lose some of its speed. A slow-moving particle might be struck by a faster one

Documents

Size Really Doesn’t Matter: In Search of a National Scale ...faculty.haas.berkeley.edu/arose/SizeOver.pdf · Reasons why Size Might Matter Much Economics: Bigger is Better “Scale

Documents

How might deep determinants matter today?

Documents

Does Social Capital Matter for Political Participationsites.duke.edu/.../10/Enhancing-Political-Participation.doc · Web viewWord Count: 9,894 (incl. notes, tables and references)

Documents

Converting truth tables into Boolean expressions - … · Converting truth tables into Boolean expressions ... or SOP, form. As you might suspect ... incinerator waste valve based

Documents

TABLES D’ADRESSAGE DISPERSÉ (HASH TABLES; TABLES DE HACHAGE)

Documents

TABLES - Jasper Group reconnect the tables no matter the order no matter the direction. Power in-feed cables can be attached to whichever end of a configuration is closest to

Documents

NATIONAL ORGANIC PROGRAM Dry Matter Demand Tables For ... · Mature Goat Does (Dairy) r Maintenance only and Breeding Table 1 r 2: Daily Dry Matter Demand Requirements in Kilograms

Documents

Pairwise Testing Techniques - RBCS, Inc · pairwise testing in situations where supposedly independent options might interact In our previous webinars, decision tables, state based

Documents

Soft Matter - · PDF fileThe zero temperature properties of frictionless soft spheres near the jamming point ... frictional systems might appear ... Soft Matter,2014,10,1519–1536

Documents

Creating Analog Behavioral Modelslumerink.com/...Creating_Analog_Behavioral_Models.pdf · matter of writing a few mathematical expressions. In other cases, it might require curve

Documents

$THE DELICATE MATTER OF ROCK BRITTLENESS - · PDF fileThe Delicate Matter of Rock Brittleness by Mehrdad Soltanzadeh ... This will let the fractures to ... there might be some justifiable$

THE DELICATE MATTER OF ROCK BRITTLENESS - · PDF fileThe Delicate Matter of Rock Brittleness by Mehrdad Soltanzadeh ... This will let the fractures to ... there might be some justifiable

Documents

Does Inflation Targeting Matter for Output Growth? · · 2017-12-14Does Inflation Targeting Matter for Output Growth? ... conclusion might still be debatable, ... also considers

Documents

Might might not

Education

Front matter, preface, tables of content

Documents

Soft Matter Simulation Notes - personalpages.manchester.ac.uk · Contents List of Figures iii List of Tables v Nomenclature vii 1 Review of Soft Matter Systems1 1.1 Introduction

Documents