An Empirical Evaluation of Extendible Arrays

Stelios Joannou & Rajeev Raman
University of Leicester

7 May 2011, 10th International Symposium on Experimental Algorithms

Introduction

• There is increasing use of in-memory (RAM) dynamic data structures (DS) to store large dynamic data sets, e.g. succinct DS, Bloom filters and hash tables.
• Unlike traditional (pointer-based) dynamic DS, RAM DS can allocate/free variable-sized chunks of memory → memory fragmentation!


Does Fragmentation Matter?

• Many recent computers have virtual memory (VM) with a 64-bit address space = 16 exabytes of addressable memory, but:
  – Many 32-bit machines are still around
  – Java VM has a 2GB limit
  – Some OSes have no VM (Android VM)
  – Fragmentation can lead to thrashing, even when allocated memory is clearly less than available physical memory.
• Many studies of fragmentation in general (from [B. Randell. Comm. ACM ’69] to [Brodal et al. Acta Inf. ’05]), but none about specific DS.


Introduction

• Explicit memory management for dynamic DS is infeasible in practice:
  – Use fragmentation-friendly DS
• We consider the extendible array (EA) and collection of EAs (CEA).
  – CEAs can be used to construct complex DS [e.g. Raman/Rao ICALP ’03]
• Aim of this paper: study implementations of CEAs from a fragmentation perspective.


EAs and CEAs

• Dynamic arrays that can grow/shrink from one side:
  – grow/shrink ()
  – access (i)
• CEA (collection of EAs) similar to EA:
  – create ()
  – destroy (A)
  – access (i, A)
  – grow/shrink (A)
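The operations listed above can be read as a small interface. The Python sketch below is only illustrative: the class name `ListCEA`, the integer handles and the backing Python lists are assumptions made here, not something the slides specify.

```python
class ListCEA:
    """Toy collection of extendible arrays (CEA) backed by Python lists."""

    def __init__(self):
        self._arrays = {}      # handle -> list holding that EA's elements
        self._next_handle = 0  # integer handles are an assumption of this sketch

    def create(self):
        """Create an empty EA and return its handle A."""
        handle = self._next_handle
        self._next_handle += 1
        self._arrays[handle] = []
        return handle

    def destroy(self, a):
        """Remove EA `a` and release its storage."""
        del self._arrays[a]

    def access(self, i, a):
        """Return the i-th element of EA `a`."""
        return self._arrays[a][i]

    def grow(self, a, value=None):
        """Extend EA `a` by one element at its end."""
        self._arrays[a].append(value)

    def shrink(self, a):
        """Remove the last element of EA `a`."""
        self._arrays[a].pop()
```

A real CEA would replace the Python lists with one of the EA layouts discussed on the following slides; only the five operations stay the same.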


Vector EA

• Included in the C++ STL
• Data stored in an array
• When array full, do “doubling”
  – Create a new array of double the size
  – Copy everything to the new array, delete old array
• Advantages
  – access: O(1) worst-case time
  – grow/shrink: O(1) amortized time
• Disadvantages
  – It can have internal fragmentation of Θ(n) words
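The doubling strategy can be sketched in a few lines of Python. This is not the STL implementation: the class name `DoublingArray`, the `wasted_slots` helper and the rule of halving at one-quarter occupancy are assumptions chosen for illustration.

```python
class DoublingArray:
    """Extendible array that doubles its backing store when full."""

    def __init__(self):
        self._capacity = 1
        self._size = 0
        self._store = [None] * self._capacity

    def access(self, i):
        if not 0 <= i < self._size:
            raise IndexError(i)
        return self._store[i]

    def grow(self, value):
        if self._size == self._capacity:
            # "Doubling": allocate twice the space and copy everything over.
            new_store = [None] * (2 * self._capacity)
            new_store[:self._size] = self._store
            self._store = new_store
            self._capacity *= 2
        self._store[self._size] = value
        self._size += 1

    def shrink(self):
        if self._size == 0:
            raise IndexError("shrink on empty array")
        self._size -= 1
        self._store[self._size] = None
        # Halve only at one-quarter occupancy (assumed policy, avoids
        # repeated copying when grow and shrink alternate at a boundary).
        if self._capacity > 1 and self._size <= self._capacity // 4:
            self._capacity //= 2
            self._store = self._store[:self._capacity]

    def wasted_slots(self):
        """Allocated but unused slots, i.e. internal fragmentation."""
        return self._capacity - self._size
```

Right after a doubling step roughly half of the capacity is unused, which is where the Θ(n) internal fragmentation comes from. The old and new arrays also have different sizes, so the freed block cannot simply be reused for the next allocation.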


Simple EA

• Uses a constant DB size; doubles the IB when full
• DB and IB sizes are powers of 2

Index Block (IB) keeps track of DBs

Data Blocks (DB) contain data


Simple EA

• Advantages
  – access: O(1) worst-case time
  – grow/shrink: O(1) amortized time
  – most memory is allocated/freed in fixed-size blocks, so reduced external fragmentation when used in CEA (fragmentation-friendly)
• Disadvantages
  – A small DB size will lead to a huge IB
  – A big DB size may lead to internal fragmentation
  – optimal DB size may be data dependent!
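A minimal Python sketch of the Simple EA layout follows. The class name `SimpleEA`, the `db_bits` parameter and the policy of freeing a DB as soon as it is empty are assumptions of this sketch; the slides only fix the constant DB size and the doubling IB.

```python
class SimpleEA:
    """Index block (IB) of references to fixed-size data blocks (DBs)."""

    def __init__(self, db_bits=4):
        self._db_bits = db_bits              # DB size is 2**db_bits elements
        self._db_mask = (1 << db_bits) - 1
        self._ib = [None]                    # IB capacity is a power of 2
        self._num_dbs = 0
        self._size = 0

    def access(self, i):
        if not 0 <= i < self._size:
            raise IndexError(i)
        # The shift selects the DB, the mask selects the slot inside it.
        return self._ib[i >> self._db_bits][i & self._db_mask]

    def grow(self, value):
        if self._size == self._num_dbs << self._db_bits:   # all DBs are full
            if self._num_dbs == len(self._ib):              # IB full: double it
                self._ib = self._ib + [None] * len(self._ib)
            self._ib[self._num_dbs] = [None] * (1 << self._db_bits)
            self._num_dbs += 1
        db = self._ib[self._size >> self._db_bits]
        db[self._size & self._db_mask] = value
        self._size += 1

    def shrink(self):
        if self._size == 0:
            raise IndexError("shrink on empty array")
        self._size -= 1
        # Free the last DB once it becomes empty (assumed policy).
        if self._size == (self._num_dbs - 1) << self._db_bits:
            self._num_dbs -= 1
            self._ib[self._num_dbs] = None
```

Because every DB has the same size, a block freed by one EA can be reused by any other EA in the collection. The trade-off named above is visible in `db_bits`: a small value inflates the IB, a large value leaves more unused slots in the last DB.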


Self-tuning

• Self-tuning DS choose their main parameters automatically and rearrange data accordingly.
• Simple is not self-tuning.
• We want EAs that are both fragmentation-friendly and self-tuning.


Brodnik EA ([Brodnik et al. WADS '99])

Data split in super blocks (SB) of size

Further split in data blocks (DB) of size

Each SB has DBs

Index Block (IB) keeps track of DBs
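The block sizes did not survive on this slide. The sketch below assumes the layout of the cited Brodnik et al. paper as it is usually summarised: super block k holds 2^k elements, split into 2^⌊k/2⌋ DBs of 2^⌈k/2⌉ elements each. The function names `locate` and `layout` and the 0-based indexing are choices made here.

```python
def locate(i):
    """Map a 0-based element index to (DB number, offset inside that DB)."""
    r = i + 1
    k = r.bit_length() - 1              # super block that contains element i
    db_bits = (k + 1) // 2              # ceil(k/2): DBs in SB k hold 2**db_bits
    in_sb = r - (1 << k)                # position of the element inside SB k
    # DBs in all earlier super blocks: sum of 2**floor(j/2) for j < k,
    # which equals 2**ceil(k/2) + 2**floor(k/2) - 2.
    dbs_before = (1 << ((k + 1) // 2)) + (1 << (k // 2)) - 2
    return dbs_before + (in_sb >> db_bits), in_sb & ((1 << db_bits) - 1)


def layout(num_super_blocks):
    """Enumerate (DB number, offset) pairs in element order, by brute force."""
    db = 0
    for k in range(num_super_blocks):
        for _ in range(1 << (k // 2)):                 # DBs in super block k
            for offset in range(1 << ((k + 1) // 2)):  # slots in each DB
                yield db, offset
            db += 1
```

`layout` exists only to check `locate` against a direct enumeration. Under this assumed layout both the number of DBs and the size of the largest DB grow like √n, and every access needs the bit arithmetic in `locate` before the IB can be consulted.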


Brodnik EA ([Brodnik et al. WADS '99])

• Advantages
  – access: O(1) worst-case time
  – grow/shrink: O(1) amortized time
  – Wasted space is O ()
  – Self-tuning DS
• Disadvantages
  – CPU is heavily used during access (i)
  – Different DB sizes can lead to fragmentation


Bug in the paper


Modified Brodnik EA

• Combines Brodnik and Simple
• DBs have equal size; DB and IB sizes are powers of 2
• When growing: alternates doubling IB and DB
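One way to read the alternation rule is sketched below in Python. The slides do not say when a doubling is triggered or how data moves into larger DBs, so the trigger (IB and all DBs full), the full repacking step and the name `AlternatingEA` are assumptions; shrinking is left out for brevity.

```python
class AlternatingEA:
    """Equal-sized DBs; on overflow, double the DB size and the IB in turn."""

    def __init__(self):
        self._db_bits = 0              # DB size is 2**db_bits elements
        self._ib = [None]              # IB capacity is a power of 2
        self._size = 0
        self._double_db_next = True    # which of the two doubles next

    def access(self, i):
        if not 0 <= i < self._size:
            raise IndexError(i)
        return self._ib[i >> self._db_bits][i & ((1 << self._db_bits) - 1)]

    def _rebuild(self, new_bits):
        """Repack all elements into DBs of size 2**new_bits."""
        items = [self.access(i) for i in range(self._size)]
        new_size = 1 << new_bits
        new_ib = [None] * len(self._ib)
        for start in range(0, len(items), new_size):
            chunk = items[start:start + new_size]
            new_ib[start >> new_bits] = chunk + [None] * (new_size - len(chunk))
        self._db_bits = new_bits
        self._ib = new_ib

    def grow(self, value):
        if self._size == len(self._ib) << self._db_bits:  # IB and DBs full
            if self._double_db_next:
                self._rebuild(self._db_bits + 1)           # double the DB size
            else:
                self._ib = self._ib + [None] * len(self._ib)  # double the IB
            self._double_db_next = not self._double_db_next
        j = self._size >> self._db_bits
        if self._ib[j] is None:                            # allocate DB lazily
            self._ib[j] = [None] * (1 << self._db_bits)
        self._ib[j][self._size & ((1 << self._db_bits) - 1)] = value
        self._size += 1
```

In this sketch the IB size and the DB size stay within a factor of two of each other, so both are about √n, and `access` is the same shift-and-mask as in Simple. The repacking cost is linear but happens only when the capacity doubles.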


Modified Brodnik EA

• Advantages
  – access: O(1) worst-case time
  – grow/shrink: O(1) amortized time
  – Wasted space is O (
  – After a change of DB size, new DBs may be contiguous
  – Uses less CPU during access (i) than Brodnik EA
  – Fragmentation-friendly when there is only one instance of the EA
• Disadvantages
  – Fragmentation can lead to underuse of physical memory when used as part of a CEA with various EA sizes


Global Brodnik CEA

• Keeps the same DB size between individual EAs

• Tries to maintain an ideal DB size of across all EAs

  − t is the number of EAs currently created
  − N is their total size

• Array containing EAs doubles when full



Global Brodnik CEA

• Advantages
  – Self-tuning CEA
  – access: O(1) worst-case time
  – grow/shrink: O(1) amortized time
  – Fragmentation-friendly because of equal-sized DBs across EAs
• Disadvantages
  – Wasted space is O () words


Experimental Results

• Speed tests for different ratios of EAs/# of elements
• Sequential Access:
  – Vector is the fastest
  – Brodnik is the slowest
• Random Access:
  – Simple is the slowest in the first test
  – Vector is the fastest in the 2nd and 3rd cases (the other EAs pay for indirection)

EAs x Elements   DS                 Grow   Sequential   Random
16 x 16777216    Vector             2.38   0.25         22.65
                 Brodnik            2.93   1.90         28.66
                 Simple             1.87   0.31         40.53
                 Modified Brodnik   1.69   0.29         20.63
                 Global Brodnik     4.95   0.33         23.96
16384 x 16384    Vector             1.90   0.25         24.03
                 Brodnik            3.12   1.87         57.46
                 Simple             1.85   0.32         44.35
                 Modified Brodnik   2.39   0.30         48.05
                 Global Brodnik     4.93   0.34         44.46
2097152 x 128    Vector             3.12   0.29         44.69
                 Brodnik            6.31   2.09         86.45
                 Simple             2.11   0.43         56.28
                 Modified Brodnik   6.21   0.58         54.04
                 Global Brodnik     6.26   0.48         58.26


Experimental Results - 2

• 80-20 Test
  – 20% of DS contain 80% of total elements
  – Going through the CEA, shrink an EA, then grow a random one chosen according to the rule
  – Repeat as many times as the total number of elements

                   Initial              Ending
DS                 VM (MB)   RES (MB)   VM (MB)   RES (MB)
Vector             278       266        692       603
Brodnik            388       377        628       523
Simple             304       293        357       343
Modified Brodnik   328       318        577       485
Global Brodnik     328       318        372       357
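The 80-20 workload can be sketched on EA sizes alone. The Python below is a guess at the procedure, not the authors' test harness: the slide does not spell out "the rule", so the function names, the round-robin choice of the EA to shrink and the 0.8 probability of growing one of the large EAs are all assumptions.

```python
import random


def initial_sizes(num_eas, total):
    """Give 20% of the EAs 80% of the elements (sizes only, no data)."""
    heavy = num_eas // 5
    heavy_total = total * 4 // 5
    sizes = [heavy_total // heavy] * heavy
    sizes += [(total - heavy_total) // (num_eas - heavy)] * (num_eas - heavy)
    return sizes, heavy


def churn(sizes, heavy, steps, rng):
    """Shrink EAs in turn; regrow a random EA picked by the 80-20 rule."""
    n = len(sizes)
    src = 0
    for _ in range(steps):
        while sizes[src] == 0:            # skip EAs that are already empty
            src = (src + 1) % n
        sizes[src] -= 1                   # stands for shrink(A_src)
        if rng.random() < 0.8:
            dst = rng.randrange(heavy)    # one of the large 20%
        else:
            dst = rng.randrange(heavy, n)
        sizes[dst] += 1                   # stands for grow(A_dst)
        src = (src + 1) % n
```

In a real run each decrement and increment would be a `shrink` or `grow` call on an actual EA. The total number of elements stays constant throughout, so any growth in VM and RES between the "Initial" and "Ending" columns is due to fragmentation rather than to more data.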


Experimental Results - 3

• Thrashing
  – Run a speed test after the 80-20 test
  – Measured CPU time and wall time
  – Used EAs, each of size 1200 elements
  – Resulting usage close to physical memory
  – Time measured is in seconds
• Thrashing occurred in Brodnik, Mod. Brod and Vector EAs
• Simple and Global Brod. kept memory usage low, so no thrashing
• Results verified by examining CPU usage and page faults

                   Initial       Final
DS                 VM     RES    VM     RES    CPU (s)   Elapsed (s)
Vector             4.23   3.73   7.34   3.74   40.12     780
Brodnik            3.66   3.65   6.06   3.73   40.19     872
Simple             2.83   2.82   3.20   3.17   28.20     150
Modified Brodnik   3.15   3.14   5.71   3.67   43.28     1988
Global Brodnik     3.15   3.14   3.51   3.47   25.52     134
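Measuring CPU time and wall time side by side is what exposes thrashing in this table. The Python sketch below shows one way to take both readings with the standard `time` module. The helper name `timed` is hypothetical, and the slides do not say which tools the authors used.

```python
import time


def timed(workload):
    """Run `workload()` and return (cpu_seconds, elapsed_seconds)."""
    cpu_start = time.process_time()     # CPU time consumed by this process
    wall_start = time.perf_counter()    # wall-clock time
    workload()
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return cpu, elapsed


if __name__ == "__main__":
    cpu, elapsed = timed(lambda: sum(range(10 ** 6)))
    print(f"CPU {cpu:.3f} s, elapsed {elapsed:.3f} s")
```

When elapsed time far exceeds CPU time, as in the Modified Brodnik row (43.28 s of CPU against 1988 s elapsed), the process spent most of its time waiting rather than computing, which is consistent with paging.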


Conclusion

• The growing importance of RAM DS makes fragmentation important; demonstrated e.g. that thrashing occurs even in simple DSs, even when memory allocated is well below physical memory.
• Introduced and established “self-tuning” and “fragmentation-friendliness” as desirable features for dynamic RAM data structures. For CEAs:
  – The standard solution (vector) is not efficient.
  – Fragmentation-friendly and self-tuning DS seem good all-round performers.
• Further testing in real-world applications is needed.