Approximate Indexing with BF-Trees*
A RUM access method
Manos Athanassoulis* (Harvard SEAS), Anastasia Ailamaki (EPFL)
*work done while at EPFL



Tree indexing
… is designed for disks
wide and short trees
… which have a large size in order to minimize random accesses
This is not enough!

A brave new (storage) world
memory price varies
read performance varies
update cost varies

RUM: Read vs. Update vs. Memory
[Figure: access-method design space with three corners: Read Optimized, Update Optimized, Memory/Storage Optimized]
HDD-based access methods
Flash-aware access methods
SSD-based access methods
LA-Tree [PVLDB09], FD-Tree [PVLDB10], μ-Tree [EMSOFT10], SILT [SOSP11], MaSM [SIGMOD11], PIO B-Tree [PVLDB11], Bw-Tree [ICDE13]

Memory Price vs. Reads
[Figure: memory price versus read performance, both axes marked "Better"; labels: "High Performance, Expensive Memory" and "Low Performance, Cheap Memory"]
rethink indexing!
exchange more reads for lower size

RUM: Read vs. Update vs. Memory
[Figure: RUM design space (Read Optimized, Update Optimized, Memory/Storage Optimized) with the labels "Approximate Indexing with Bloom filter Tree" and "Future exploration"]
let's see how BF-Tree works

Flash-aware indexing
focuses on internal node organization
lazy updates
immutable data
LA-Tree [PVLDB09], FD-Tree [PVLDB10], μ-Tree [EMSOFT10], SILT [SOSP11], MaSM [SIGMOD11], PIO B-Tree [PVLDB11], Bw-Tree [ICDE13]
what about the fast reads of solid-state storage?

Approximate Tree Indexing
Design choices
  use Bloom filters for membership queries per page
  tunable tree size
  probabilistically tunable random reads
Caveat
  works well for datasets with implicit clustering
how common is implicit clustering?

Implicit Clustering
[Figure: TPC-H data: transaction dates]
data organized based on creation time

Implicit Clustering
[Figure: electricity consumption, Smart Home Dataset (SHD)]
data values correlated with creation time
how can BF-Tree index such data?

Bloom filter Trees
select desired tree size (aka BF-per-page size)
each partition
  has about the same number of unique values
  has a partition-wide min and max
  has a (variable) number of physical pages
[Figure: partitions … Pj-2 [-10,-5), Pj-1 [-5,0), Pj [1,9), Pj+1 [9,11) …]
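The partitioning rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the greedy policy (close a partition once it holds `target_unique` distinct keys), the function name `partition_pages`, and the sample pages are all hypothetical and not taken from the slides or the paper.

```python
def partition_pages(pages, target_unique):
    """Greedily group consecutive pages until a partition holds at least
    `target_unique` distinct keys (hypothetical policy, not from the slides)."""
    partitions = []
    current_pages, seen = [], set()
    for page in pages:
        current_pages.append(page)
        seen.update(page)
        if len(seen) >= target_unique:
            partitions.append(
                {"min": min(seen), "max": max(seen), "pages": current_pages}
            )
            current_pages, seen = [], set()
    if current_pages:  # leftover pages form a final, smaller partition
        partitions.append(
            {"min": min(seen), "max": max(seen), "pages": current_pages}
        )
    return partitions


# Pages in physical order; keys roughly follow insertion order
# (implicit clustering). The values are made up for illustration.
pages = [[1, 1, 2], [2, 2, 2], [2, 3, 3], [4, 5, 6], [7, 7, 8], [8, 9, 9]]
partitions = partition_pages(pages, target_unique=3)
for p in partitions:
    print(p["min"], p["max"], len(p["pages"]))
```

Each partition ends up with the same number of distinct keys but a different number of pages, which is the property the slide describes.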

Bloom filter Trees
for every page of every partition
  build a BF with the desired size (and, hence, false positive probability)
build a B+-Tree on top of all partitions using the min/max as keys
[Figure: partitions … Pj-2 [-10,-5), Pj-1 [-5,0), Pj [1,9), Pj+1 [9,11) …, each with one BF per page]
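A minimal sketch of these two build steps, assuming a textbook Bloom filter and using a sorted list with `bisect` as a stand-in for the B+-Tree levels. The class `BloomFilter`, the helpers `build_bf_leaf` and `find_partition`, the SHA-256-based hashing, and the filter parameters are illustrative assumptions, not the authors' implementation.

```python
import bisect
import hashlib


class BloomFilter:
    """Minimal Bloom filter; the bitset is stored in a Python int."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def build_bf_leaf(pages, num_bits, num_hashes):
    """Build one Bloom filter per physical page of a partition."""
    filters = []
    for page in pages:
        bf = BloomFilter(num_bits, num_hashes)
        for key in page:
            bf.add(key)
        filters.append(bf)
    return filters


def find_partition(ranges, key):
    """Return the index of the half-open [min, max) range holding key, or None.
    A sorted list plus bisect stands in for the B+-Tree (an assumption)."""
    mins = [lo for lo, _ in ranges]
    idx = bisect.bisect_right(mins, key) - 1
    if idx >= 0 and key < ranges[idx][1]:
        return idx
    return None


ranges = [(-10, -5), (-5, 0), (1, 9), (9, 11)]  # ranges shown on the slide
pages_pj = [[1, 3], [2, 5], [3, 6], [4, 8]]     # example pages for Pj
leaf = build_bf_leaf(pages_pj, num_bits=64, num_hashes=3)
print(find_partition(ranges, 3))
```

A Bloom filter never reports a stored key as absent, so the only error a probe can make is an unnecessary page read.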


Bloom filter Trees
Partition Pj with k pages
All k pages contain values between the partition-wide min and max.
[Figure: partition Pj (min: 1, max: 8) with four pages labeled 1,3 / 2,5 / 3,6 / 4,8 and one BF per page]

Bloom filter Trees
Search for 3
[Figure: partition Pj with k pages (min: 1, max: 8); pages labeled 1,3 / 2,5 / 3,6 / 4,8, one BF per page]

Bloom filter Trees
Search for 3
BF-leaf
  k Bloom filters
  k pages in the partition Pj
[Figure: partition Pj with k pages (min: 1, max: 8); pages labeled 1,3 / 2,5 / 3,6 / 4,8, one BF per page]

Bloom filter Trees
Search for 3
retrieve and search for desired value
false positives are also possible
[Figure: partition Pj with k pages (min: 1, max: 8); pages labeled 1,3 / 2,5 / 3,6 / 4,8; a page is marked "False positive"]
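The probe walked through on these slides can be sketched as follows. To make the false positive reproducible, the sketch uses a deliberately tiny single-hash filter (7 bits, `key * key % 7`); that hash, the helper names, and the reading of the page labels as page contents are assumptions made for illustration, and a real filter would use more bits and several hash functions.

```python
NUM_BITS = 7  # deliberately tiny so that a false positive shows up


def toy_hash(key):
    # Hypothetical hash, chosen only to keep the example deterministic.
    return (key * key) % NUM_BITS


def build_filter(page):
    bits = 0
    for value in page:
        bits |= 1 << toy_hash(value)
    return bits


def might_contain(bits, key):
    return bool((bits >> toy_hash(key)) & 1)


def probe_partition(pages, filters, part_min, part_max, key):
    """Return (indexes of pages holding key, number of false-positive reads)."""
    if not (part_min <= key <= part_max):
        return [], 0
    # Step 1: ask every per-page filter whether the key may be present.
    candidates = [i for i, f in enumerate(filters) if might_contain(f, key)]
    # Step 2: read each candidate page and check it for the key.
    hits = [i for i in candidates if key in pages[i]]
    return hits, len(candidates) - len(hits)


pages = [[1, 3], [2, 5], [3, 6], [4, 8]]  # partition Pj, min 1, max 8
filters = [build_filter(p) for p in pages]
hits, wasted_reads = probe_partition(pages, filters, 1, 8, 3)
print(hits, wasted_reads)
```

Here the filter of the page holding 4 and 8 also answers "maybe" for 3, so that page is read and discarded: one wasted read on top of the two pages that really contain the key.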

BF-Tree Design
Variable false positive probability (fpp)
BFs have tunable size
tunable size → variable performance
[Figure: two sets of BFs of different sizes, with a "False positive" marked]
p1 = 0.01%; if the BF size is halved, p2 = 1%
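The numbers on this slide match the standard Bloom filter sizing formula, under the assumption that the number of hash functions is chosen optimally: bits per key = -ln(p) / (ln 2)^2, so halving the filter size turns p into sqrt(p). The short check below uses hypothetical helper names and is a sketch of that arithmetic, not code from the paper.

```python
import math

LN2_SQ = math.log(2) ** 2


def bits_per_key(fpp):
    """Bits per key needed for a target fpp (optimal hash count assumed)."""
    return -math.log(fpp) / LN2_SQ


def fpp_for_bits(bits):
    """False positive probability reached with `bits` bits per key."""
    return math.exp(-bits * LN2_SQ)


def expected_wasted_reads(num_pages, fpp):
    """Expected needless page reads when none of num_pages holds the key."""
    return num_pages * fpp


full = bits_per_key(0.0001)      # size for p1 = 0.01%
halved = fpp_for_bits(full / 2)  # fpp after halving the filter
print(round(full, 1), round(halved, 4))
```

With these assumptions, p1 = 0.01% needs roughly 19 bits per key, and half of that gives p2 = 1%, which is the trade-off the slide states.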

BF-Trees in action
Datasets
  1GB synthetic with 256-byte tuples and 8-byte keys
  30GB TPC-H (SF30)
  Smart Home Dataset (SHD)
Workload
  Point queries (PK or TPC-H date or energy level)
5 storage configurations (index/data)
  mem/SSD
  mem/HDD
  SSD/SSD
  SSD/HDD
  HDD/HDD

BF-Trees for PK
[Figure: average index probe time for a 1GB relation, varying false positive probability and storage configuration. x-axis: false positive probability, 1.00E-15 to 1.00E+00; y-axis: response time (ms), 1.E-01 to 1.E+02; series: mem/SSD, mem/HDD, SSD/SSD, SSD/HDD, HDD/HDD; a second panel with the same y-axis shows B+-Tree latency; an arrow is labeled "Bigger Tree Size". Tuple size: 256 bytes; key size: 8 bytes.]
Data location matters most
Both data/index locations matter
what about the tree size?

BF-Tree vs B+-Tree: Size & Latency
[Figure: average index probe time for a 1GB relation, varying false positive probability and storage configuration. x-axis: mem/SSD, mem/HDD, SSD/SSD, SSD/HDD, HDD/HDD; y-axis: response time (ms), 1.E-01 to 1.E+02; solid bars: B+-Tree; patterned bars: BF-Tree (best). Tuple size: 256 bytes; key size: 8 bytes. Annotations: 3.8x smaller size, 12.2x, 19.4x.]
competitive performance with space savings

BF-Tree for SHD
[Figure: response time (ms), 1.E-01 to 1.E+02, and capacity gain, axis 1 to 4, per storage configuration (mem/SSD, mem/HDD, SSD/SSD, SSD/HDD, HDD/HDD); solid bars: B+-Tree; patterned bars: BF-Tree (best). Data labels in extraction order: 0.25, 6.2, 0.7, 6.2, 22, 0.31, 6.0, 0.6, 6.2, 16.]

TPC-H point queries on date
[Figure: BF-Tree resp. time normalized with B+-Tree (y-axis, 0 to 9) versus probe hit rate (0%, 5%, 10%, 25%) for mem/SSD, mem/HDD, SSD/SSD, SSD/HDD, HDD/HDD. Cardinality: 2k values.]
BF-Tree is always faster for low hit rate
High hit rate: B+-Tree is faster
Data on HDD → High overhead (unless index is slow)
Index perf. ≈ data perf. → Low overhead

Approximate Tree Indexing
tunable size → variable performance
competitive resp. time w/ 4-20x capacity savings
tailored for:
  datasets with implicit clustering
  workloads with low hit rate
more details in paper:
  analytical modeling, range scans
  updates, datasets, more comparisons

RUM Tunable Indexing
[Figure: RUM design space with corners Read Optimized, Update Optimized, Memory/Storage Optimized]
HDD-based access methods
Flash-aware access methods
SSD-based access methods
LA-Tree [PVLDB09], FD-Tree [PVLDB10], μ-Tree [EMSOFT10], SILT [SOSP11], MaSM [SIGMOD11], PIO B-Tree [PVLDB11], Bw-Tree [ICDE13]
http://daslab.seas.harvard.edu/rum-conjecture/
Thanks! Questions?