Linux Filesystems Performance for Databases

Portland PostgreSQL Performance Pad

Selena Deckelmann [email protected]

PostgreSQL Global Development Group
twitter: @selenamarie

OSCON 2009

Do filesystems do what we expect?

We are volunteers.

We think you should run these tests.

We are:
• DBAs
• Sysadmins
• Performance tuners

How will this hardware perform?

How will this filesystem perform?

Why should you care about filesystem-specific performance?

Expectations

Where to start?

The Defaults.

Not addressing reliability

Very Narrow Use Case: A Relational Database

Need for periodic testing. (And we've got some hardware!)

★ Kernel differences
★ FS patch-level differences
★ Mount options
★ mkfs options

Focused on THROUGHPUT
(Because that’s what people who buy large systems look for)

Later:
• Response Time
• Operations per second

No, we will not be testing ZFS.

BtrFS (nope, not yet)

What do we expect?

Some conventional wisdom:

“RAID5 is the worst choice for a database.”

“LVM incurs too much overhead to use.”

“Striping doubles performance.”

“Turning off 'atime' is a big performance gain.”

“Getting rid of atime updates would give us more everyday Linux performance than all the pagecache speedups of the last 10 years, _combined_.”

“Journaling filesystems (ext3) will have worse performance than non-journaling filesystems (ext2).”

“Your read-ahead buffer is big enough.”

Now... on to the good stuff.

PostgreSQL’s Portland Performance Pad

Hosted by CommandPrompt, Inc.

Our machine:
• HP ProLiant DL380 G5
• Smart Array P800
• 72GB 15,000 RPM SAS (up to 25 disks)
• 32GB RAM
• Linux: 2.6.25-gentoo-r6
*New tests being run with 2.6.28

Our machine: Chosen because of its low, low price.

Thank you, HP.

Our tests: fio
• 64 GB working set
• 8 threads
• no fadvise
• no direct I/O
• 8KB blocksize
• I/O elevator: deadline

Our tests: fio
• read (sequential, random)
• write (sequential, random)
• read-write (50/50 mix)
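
The test parameters above map fairly directly onto fio job options. The following is a rough Python sketch that renders one job section per workload; it is not the job file used for these tests. The option names (rw, bs, size, numjobs, direct, fadvise_hint, rwmixread, directory, group_reporting) are standard fio options, but the /mnt/test directory, the even split of the 64 GB working set across 8 jobs, and running the 50/50 mix in both sequential and random form are assumptions. The deadline elevator is a kernel block-layer setting and is not configured from the job file.

```python
# Hypothetical helper: renders fio job sections that approximate the slide's
# settings. The directory and the per-job size split are assumptions.

WORKLOADS = {
    "seq-read": "read",
    "rand-read": "randread",
    "seq-write": "write",
    "rand-write": "randwrite",
    "seq-readwrite": "rw",       # assumption: the mix is run sequentially
    "rand-readwrite": "randrw",  # assumption: and also randomly
}


def fio_job(name, rw, directory="/mnt/test", threads=8, working_set_gb=64):
    """Return one fio job section as text."""
    per_job_gb = working_set_gb // threads  # fio's size applies per job
    lines = [
        f"[{name}]",
        f"rw={rw}",
        "bs=8k",                 # 8KB blocksize
        f"size={per_job_gb}g",
        f"numjobs={threads}",    # 8 concurrent workers
        "direct=0",              # no direct I/O, go through the page cache
        "fadvise_hint=0",        # no fadvise
        f"directory={directory}",
        "group_reporting",
    ]
    if rw in ("rw", "randrw"):
        lines.append("rwmixread=50")  # 50/50 read-write mix
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for name, rw in WORKLOADS.items():
        print(fio_job(name, rw))
```

Each printed section could be saved to its own file and passed to fio, one workload at a time, so the runs do not compete for the same disks.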

Our stats:
• sar
• mpstat
• iostat
• vmstat
• readprofile

Our tests: Chosen because of their relevance to PostgreSQL

Filesystems Tested:
• ext2
• ext3
• jfs
• xfs
• reiserfs
• ext4 (but had trouble)

Disk configs tested:
• Single disk
• RAID-0
• RAID-1
• RAID-5
• RAID-10
• RAID-6

The Data: http://moourl.com/fsperf

Confessions:
• May be high standard deviation with results (don’t know yet!)
• No filesystem tuning, all default create and mount options
• No software RAID comparison or LVM (volume management) test for the 2.6.28 tests
• Some xfs runs had to be repeated and some ext4 runs did not complete successfully
• Only presenting throughput
• Interested in system performance for a specific application, not code performance
• I/O profiles don’t exhibit atime or partition alignment issues
• Disk controller firmware not at the latest version in 2.6.25 tests
• Software RAID is on top of 1-disk RAID 0 devices (HP SmartArray doesn’t have JBOD option)

AUDIENCE PARTICIPATION
Higher throughput: ext2 or ext3?

Seek bundling/batching in ext3 is better?

What if we add a disk?


AUDIENCE PARTICIPATION
RAID 0 (stripe) versus RAID 1 (mirroring) performance?

What happens when we add disks to a RAID 0 (stripe) LUN?

Adding disks to a RAID 5 LUN


Only have 4 disks? What should you do?

In most cases, RAID 5 outperforms on sequential writes (xlog).

Random write performance improves only on xfs and reiserfs.

Are software RAID and LVM slow?

The Read-ahead buffer

AUDIENCE PARTICIPATION
Readahead buffer: Default is 128 KB. What do you think it should be?

And is there a cost to increasing the buffer that much?

http://moourl.com/readaheadconfirm

OLTP workload

• DBT-2 toolkit (Fair-use derivative of TPC-C)

• Used 35 drives ultimately

• pgtune: http://pgfoundry.org/projects/pgtune/

7% improvement! :)

For more info...

• See Mark Wong’s blog: http://pugs.postgresql.org/blog/92

• Takeaway: for DBT-2, increasing checkpoint_segments had the largest impact (fewer checkpoints :)
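
To make the "fewer checkpoints" point concrete, here is a small Python sketch of the arithmetic. It assumes the default 16 MB WAL segment size and counts only checkpoints triggered by segment count, ignoring checkpoint_timeout. The values 16 and 64 are illustrative, not the settings used in the DBT-2 runs; 3 was the stock default for checkpoint_segments at the time.

```python
WAL_SEGMENT_MB = 16  # assumption: default WAL segment size


def wal_mb_between_checkpoints(checkpoint_segments):
    """Approximate WAL volume (MB) written before a segment-count checkpoint."""
    return checkpoint_segments * WAL_SEGMENT_MB


if __name__ == "__main__":
    # 3 is the historical default; 16 and 64 are hypothetical tuned values.
    for segments in (3, 16, 64):
        mb = wal_mb_between_checkpoints(segments)
        print(f"checkpoint_segments={segments}: about {mb} MB of WAL per checkpoint")
```

A larger value lets more WAL accumulate between checkpoints, so a write-heavy workload checkpoints less often, at the cost of more WAL to replay after a crash.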

Future Work
• OLTP system characterization, sizing (ongoing)
• Daily OLTP regression testing
• More presentations
• P5 - PostgreSQL Portland Performance Pad PRACTICE (done!)

MOAR Hardware?

Thanks again, HP!
MSA70, DL380 in late 2009 ??

Let’s recap...


“RAID5 is the worst choice for a database.” Fast for sequential writes in our tests.

“LVM incurs too much overhead to use. Software RAID is slower.” For reads – throughput is about the same, but saw higher CPU.

“Turning off 'atime' is a big performance gain.” Not in our tests. But, 2-3% for “free”.
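
Before measuring the atime effect on your own system, it helps to confirm how each filesystem is actually mounted. The sketch below, in Python, classifies the options column of /proc/mounts-style lines. The sample devices and mount points are made up, and treating the absence of any atime-related option as plain "atime" is an assumption that suits older kernels such as the ones tested here.

```python
def atime_mode(options):
    """Classify a comma-separated mount option string."""
    opts = set(options.split(","))
    for mode in ("noatime", "relatime", "strictatime"):
        if mode in opts:
            return mode
    return "atime"  # assumption: no atime option listed means classic atime updates


def parse_mounts(text):
    """Map mount point -> (filesystem type, atime mode) from /proc/mounts-style text."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            _device, mount_point, fstype, options = fields[:4]
            result[mount_point] = (fstype, atime_mode(options))
    return result


# Hypothetical sample in the same layout as /proc/mounts.
SAMPLE = """\
/dev/sda1 / ext3 rw,errors=remount-ro 0 0
/dev/sdb1 /var/lib/postgresql xfs rw,noatime 0 0
"""

if __name__ == "__main__":
    for mount_point, (fstype, mode) in parse_mounts(SAMPLE).items():
        print(f"{mount_point}: {fstype}, {mode}")
```

On a real machine, the same function can be fed the contents of /proc/mounts instead of the sample text.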


“Journaling filesystems will have worse performance than non-journaling filesystems.” Turn the data journaling off on ext3, and you do see better performance, but there are edge cases and performance differences we could not explain.

“Striping doubles performance.” Performance is better, but nowhere near double. Why?


“Your read-ahead buffer is big enough.” Your read-ahead buffer IS NOT big enough. Make it 8MB. And can we make that the default?
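
The read-ahead size is expressed in different units depending on where you set it: blockdev --setra and --getra count 512-byte sectors, while /sys/block/<dev>/queue/read_ahead_kb takes kilobytes. A small Python sketch of the conversion for the 128 KB default and the 8MB suggestion follows; the function name is hypothetical and the device is left as a placeholder.

```python
SECTOR_BYTES = 512  # blockdev --setra / --getra count 512-byte sectors


def readahead_settings(size_kb):
    """Return the same read-ahead size in the units each interface expects."""
    return {
        "read_ahead_kb": size_kb,                               # sysfs value
        "blockdev_setra_sectors": size_kb * 1024 // SECTOR_BYTES,
    }


if __name__ == "__main__":
    for label, size_kb in (("default", 128), ("suggested", 8 * 1024)):
        s = readahead_settings(size_kb)
        print(f"{label}: read_ahead_kb={s['read_ahead_kb']}, "
              f"blockdev --setra {s['blockdev_setra_sectors']} /dev/sdX")
```

Neither setting persists across reboots on its own, so it usually needs to go into a boot script or similar.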
