23
BitRot detection in GlusterFS Venky Shankar Gaurav Garg

Bit rot detection-lce-2015

Embed Size (px)

Citation preview

Page 1: Bit rot detection-lce-2015

BitRot detection in GlusterFS

Venky ShankarGaurav Garg

Page 2: Bit rot detection-lce-2015

We are, Gluster developers at Red Hat

... participate in meetups, open source events

... hang out on #freenode: gluster, gluster-dev nick: overclk, ggarg... interact with community: [email protected] [email protected]

Page 3: Bit rot detection-lce-2015

OK, enough. Let’s get started...

Page 4: Bit rot detection-lce-2015

GlusterFS Quick Tour

Page 5: Bit rot detection-lce-2015

Where’s my data?

● Distributed● Local filesystem (brick)

○ XFS○ EXT3, EXT4○ BTRFS

● Prerequisite○ POSIX compatible○ Xattr support

Page 6: Bit rot detection-lce-2015

Understanding data corruption

Page 7: Bit rot detection-lce-2015

Corruption?How ?

● Direct “brick” manipulation○ Script bug○ Admin○ Malicious

Page 8: Bit rot detection-lce-2015

Corruption?How ?

(cont..)

● Silent corruption○ Disk itself

■ Firmware bug■ Mechanical wear■ Ageing

Page 9: Bit rot detection-lce-2015

Illustration

Page 10: Bit rot detection-lce-2015
Page 11: Bit rot detection-lce-2015

Solution: Integrity checks

Page 12: Bit rot detection-lce-2015

Integrity CheckConsistency

● Track data modifications○ Checksum (signature)○ Persistent

● Verify during access○ Recompute and check

● Repair if corrupted

Page 13: Bit rot detection-lce-2015

Enough of theory, show me how it’s done.

Page 14: Bit rot detection-lce-2015

ImplementationConstraints on choices

● Big fat-file story● Deployments

○ Distribute + Replicate○ Stripe, now [3.7+] sharding○ Erasure coded

Page 15: Bit rot detection-lce-2015

ImplementationConstraints on choices (cotd..)

● In-band data signing○ Costly○ RMW cycle○ Degraded I/O performance

● Verification○ “Ditto”

Page 16: Bit rot detection-lce-2015

ImplementationDetails

● Out-of-band data signing○ Daemon○ Asynchronous

■ Policy■ Strong hash (reason ?)

● Verification○ Daemon (scrubber)○ On-demand○ Pre-scrubbed

Page 17: Bit rot detection-lce-2015

ImplementationDetails (cotd..)

● Object versioning○ Versioned upon modification○ Versioning xattr (64 bit)○ Reflect “object state”

● Signature○ xattr○ Attached to a “version”

Page 18: Bit rot detection-lce-2015

ImplementationDetails (cotd..)

● Integrity checking○ Periodic

■ daily, weekly, etc..○ Filesystem scan

■ Signature mismates■ Matching version

○ QoS■ Controlled crunching

● Corrupted objects○ Denies access (EIO)○ Repairable

■ Replica, Codes

Page 19: Bit rot detection-lce-2015

Use cases

Page 20: Bit rot detection-lce-2015

Use Cases● Small files● Long lived data

○ Archival storage

○ WORM workload

Page 21: Bit rot detection-lce-2015

Future

● Replica consistency● Metadata checksumming● Offloading

○ BTRFS● Sharding adaption● GlusterFS 4.0 [Interesting!]

○ In-band (weaker hash)○ Checksum everything○ Default○ Lost (phantom) writes

Page 22: Bit rot detection-lce-2015

3.7

● Bitrot detection● No recovery● In comes sharding

3.7.2

● Recovery support

3.7.4

● Bug fixes● Scrub status

3.8

● Sharding ready● Bitrot adaption

4.0

● Hell of a change● Sharding by default● Checksum everything

Page 23: Bit rot detection-lce-2015

Q & A