ID generation PHP London 2012-08-02 @davegardnerisme

Unique ID generation in distributed systems


DESCRIPTION

A run through of the various options available for generating unique IDs


Page 1: Unique ID generation in distributed systems

ID generation

PHP London 2012-08-02 @davegardnerisme

Page 2: Unique ID generation in distributed systems

@davegardnerisme

hailoapp.com/dave (for a £5 discount)

Page 3: Unique ID generation in distributed systems

MySQL auto increment

[Diagram: in DC 1, a web app talks to one MySQL server, which issues IDs 1, 2, 3, 4…]

Page 4: Unique ID generation in distributed systems

MySQL auto increment

• Numeric IDs

• Go up with time

• Not resilient

Page 5: Unique ID generation in distributed systems

MySQL multi-master replication

[Diagram: in DC 1, a web app talks to two MySQL masters; one issues IDs 1, 3, 5, 7… and the other 2, 4, 6, 8…]

Page 6: Unique ID generation in distributed systems

MySQL multi-master replication

• Numeric IDs

• Do not go up with time

• Some resilience
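The odd/even split can be sketched without a database. The class below is a hypothetical, deliberately simplified model of MySQL's `auto_increment_offset` and `auto_increment_increment` settings: each master counts independently, as if replication had not yet caught up, which is an assumption made purely for illustration.

```php
<?php
// Hypothetical in-memory stand-in for one MySQL master.
// $offset and $increment mirror auto_increment_offset and
// auto_increment_increment; replication is NOT modelled.
final class AutoIncrementMaster
{
    private $next;
    private $increment;

    public function __construct(int $offset, int $increment)
    {
        $this->next = $offset;
        $this->increment = $increment;
    }

    // Returns the ID this master would assign to its next insert.
    public function insert(): int
    {
        $id = $this->next;
        $this->next += $this->increment;
        return $id;
    }
}

$a = new AutoIncrementMaster(1, 2); // 1, 3, 5, 7…
$b = new AutoIncrementMaster(2, 2); // 2, 4, 6, 8…

// Three writes land on master A, then one on master B.
$ids = [$a->insert(), $a->insert(), $a->insert(), $b->insert()];
```

In this model the fourth write receives ID 2 even though it happened after the write that received ID 5, which is why the IDs no longer go up with time.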

Page 7: Unique ID generation in distributed systems

Going global…

[Diagram: six data centres, DC 1 to DC 6]

Page 8: Unique ID generation in distributed systems

MySQL in multi DC setup

[Diagram: DC 1 holds a web app and a MySQL server issuing IDs 1, 2, 3…; DC 2 holds a web app, joined to DC 1 by a WAN link and marked "?"]

Page 9: Unique ID generation in distributed systems

Flickr MySQL ticket server

[Diagram: DC 1 and DC 2 each hold a web app and a ticket server, joined by a WAN link; one ticket server issues 1, 3, 5… and the other 4, 6, 8…]

WAN link not required to generate an ID

Page 10: Unique ID generation in distributed systems

Flickr MySQL ticket server

• Numeric IDs

• Do not go up with time

• Resilient and distributed

• ID generation separated from data store
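A minimal sketch of how a client might use a pair of ticket servers, assuming an in-memory stand-in rather than Flickr's MySQL-backed implementation. `TicketServer`, `mintTicket` and the "try the local server first" order are all hypothetical names and choices made for illustration.

```php
<?php
// Hypothetical in-memory ticket server: hands out every $step-th
// number starting at $offset. A real one would sit on top of MySQL.
final class TicketServer
{
    public $up = true;
    private $next;
    private $step;

    public function __construct(int $offset, int $step)
    {
        $this->next = $offset;
        $this->step = $step;
    }

    public function nextId(): int
    {
        if (!$this->up) {
            throw new RuntimeException('ticket server unavailable');
        }
        $id = $this->next;
        $this->next += $this->step;
        return $id;
    }
}

// Asks each server in turn and returns the first ticket it gets.
function mintTicket(array $servers): int
{
    foreach ($servers as $server) {
        try {
            return $server->nextId();
        } catch (RuntimeException $e) {
            continue; // fall through to the next server
        }
    }
    throw new RuntimeException('no ticket server reachable');
}

$local  = new TicketServer(1, 2); // odd tickets
$remote = new TicketServer(2, 2); // even tickets

$first = mintTicket([$local, $remote]);  // served locally
$local->up = false;                      // simulate an outage
$second = mintTicket([$local, $remote]); // served by the other server
```

Because the two sequences never overlap, either server can answer on its own, and the data store that eventually holds the row plays no part in choosing the ID.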

Page 11: Unique ID generation in distributed systems

The anatomy of a ticket server

[Diagram: within one DC, four web apps share a single ticket server]

Page 12: Unique ID generation in distributed systems

Making things simpler

[Diagram: within one DC, four web apps, each with its own ID generator]

Page 13: Unique ID generation in distributed systems

UUIDs

• 128 bits

• Could use type 4 (Random) or type 1 (MAC address with time component)

• Can generate on each machine with no co-ordination

Page 14: Unique ID generation in distributed systems

Type 4 – random

xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx

f47ac10b-58cc-4372-a567-0e02b2c3d479

4 = version, y = variant (8, 9, A or B)
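A type 4 UUID in this layout can be produced in a few lines of PHP. This is a sketch, assuming PHP 7 or later for `random_bytes()`; the function name `uuid4` is made up for the example.

```php
<?php
// Sketch of a type 4 (random) UUID generator.
function uuid4(): string
{
    $bytes = random_bytes(16); // 128 random bits

    // Overwrite the version nibble with 0100 (the "4" in the layout).
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);

    // Overwrite the top two variant bits with 10, so the "y" digit
    // comes out as 8, 9, a or b.
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    // Split the 32 hex digits into eight groups of four and lay them
    // out as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

$id = uuid4();
```

Nothing here needs a network call or shared state, which is what allows every machine to mint IDs without co-ordination.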

Page 15: Unique ID generation in distributed systems

5.3 x 10^36

possible values for a type 4 UUID
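The count follows from RFC 4122 fixing 6 of the 128 bits (4 for the version, 2 for the variant), which leaves 122 random bits:

```latex
\[
2^{128 - 4 - 2} = 2^{122} \approx 5.3 \times 10^{36}
\]
```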

Page 16: Unique ID generation in distributed systems

1.1 x 10^19

UUIDs we could generate per second since the Universe began

Page 17: Unique ID generation in distributed systems

2.1 x 10^27

Olympic swimming pools filled if each possible value contributed a millilitre

Page 18: Unique ID generation in distributed systems

Type 1 – MAC address

51063800-dc76-11e1-9fae-001c42000009

• Time component is based on 100 nanosecond intervals since October 15, 1582

• Timestamp fields are stored low-order first (time_low, time_mid, then time_hi), so the most significant bits of the timestamp come after the least significant ones in the UUID
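To see the time component, the three timestamp fields of a type 1 UUID can be put back in significance order and converted to Unix time. This is a sketch that assumes a 64-bit PHP build; `uuid1ToUnixSeconds` is a hypothetical helper, and the constant is the number of 100 nanosecond intervals between 15 October 1582 and 1 January 1970.

```php
<?php
// Sketch: recover the Unix time embedded in a type 1 UUID.
// Assumes 64-bit integers (PHP_INT_SIZE === 8).
function uuid1ToUnixSeconds(string $uuid): float
{
    // RFC 4122 field order: time_low - time_mid - time_hi_and_version
    list($timeLow, $timeMid, $timeHiAndVersion) = explode('-', $uuid);

    // Drop the leading version digit, keeping 12 bits of time_hi.
    $timeHi = substr($timeHiAndVersion, 1);

    // Reassemble the 60-bit count, most significant field first.
    $ticks = hexdec($timeHi . $timeMid . $timeLow);

    // Shift from the 1582 epoch to the 1970 epoch, then convert
    // 100 ns intervals to seconds.
    return ($ticks - 122192928000000000) / 10000000;
}

$seconds = uuid1ToUnixSeconds('51063800-dc76-11e1-9fae-001c42000009');
$day = gmdate('Y-m-d', (int) $seconds);
```

For the UUID on the slide, `$day` comes out as the date of the talk. Having to reorder the fields before the value makes sense is the reason the raw string does not sort by time.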

Page 19: Unique ID generation in distributed systems

Type 1 – MAC address

• The address (MAC) of the computer that generated the ID is encoded into it

• Lexical ordering essentially meaningless

• Deterministically unique

Page 20: Unique ID generation in distributed systems

There are some other options…

Page 21: Unique ID generation in distributed systems

No co-ordination needed

Deterministically unique

K-ordered (time-ordered lexically)

Page 22: Unique ID generation in distributed systems

Twitter Snowflake

• Under 64 bits

• No co-ordination (after startup)

• K-ordered

• Scala service, Thrift interface, uses Zookeeper for configuration

Page 23: Unique ID generation in distributed systems

Twitter Snowflake

41 bits: timestamp (millisecond precision, bespoke epoch)

10 bits: configured machine ID

12 bits: sequence number

Page 24: Unique ID generation in distributed systems

Twitter Snowflake

77669839702851584

= (timestamp << 22) | (machine << 12) | sequence
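The packing expression above can be reversed with shifts and masks. A sketch assuming a 64-bit PHP build; `snowflakeCompose` and `snowflakeDecompose` are hypothetical names, and the timestamp is left as milliseconds since whatever bespoke epoch the generator was configured with.

```php
<?php
// Sketch: pack and unpack a Snowflake-style ID on 64-bit PHP.
function snowflakeCompose(int $timestamp, int $machine, int $sequence): int
{
    return ($timestamp << 22) | ($machine << 12) | $sequence;
}

function snowflakeDecompose(int $id): array
{
    return [
        'timestamp' => $id >> 22,           // top 41 bits
        'machine'   => ($id >> 12) & 0x3FF, // next 10 bits
        'sequence'  => $id & 0xFFF,         // low 12 bits
    ];
}

// Take the example ID apart and put it back together.
$parts = snowflakeDecompose(77669839702851584);
$again = snowflakeCompose($parts['timestamp'], $parts['machine'], $parts['sequence']);
```

Because the timestamp occupies the most significant bits, comparing two IDs numerically compares their timestamps first, which is where the rough time ordering comes from.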

Page 25: Unique ID generation in distributed systems

Boundary Flake

• 128 bits

• No co-ordination at all

• K-ordered

• Erlang service

Page 26: Unique ID generation in distributed systems

Boundary Flake

64 bits: timestamp (millisecond precision, 1970 epoch)

48 bits: MAC address

16 bits: sequence number
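The same field layout can be sketched in PHP as a fixed-width hex string. This illustrates the layout only, not Boundary's Erlang implementation, whose output encoding may differ; `flakeHex` is a hypothetical helper, it assumes a 64-bit PHP build, and the MAC address is borrowed from the type 1 UUID example earlier in the deck.

```php
<?php
// Sketch: lay out timestamp, MAC and sequence as a 128-bit value,
// written as 32 hex digits.
function flakeHex(int $millis, string $macHex, int $sequence): string
{
    return sprintf(
        '%016x%s%04x',
        $millis,                               // 64 bits, ms since 1970
        strtolower(str_pad($macHex, 12, '0', STR_PAD_LEFT)), // 48 bits
        $sequence & 0xFFFF                     // 16 bits
    );
}

$earlier = flakeHex(1343893652143, '001c42000009', 5);
$later   = flakeHex(1343893652144, '001c42000009', 0);
```

With every field zero-padded to a fixed width, a plain string comparison orders IDs by timestamp first, so `$earlier` sorts before `$later` even though its sequence number is higher.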

Page 27: Unique ID generation in distributed systems

PHP Cruftflake

• Based on Twitter Snowflake

• No co-ordination (after startup)

• K-ordered

• PHP, ZeroMQ interface, uses Zookeeper for configuration

Page 28: Unique ID generation in distributed systems

Questions?

Page 29: Unique ID generation in distributed systems

References

Flickr distributed ticket server: http://code.flickr.com/blog/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/

UUIDs: http://tools.ietf.org/html/rfc4122

How random are random UUIDs?: http://stackoverflow.com/a/2514722/15318

Twitter Snowflake: https://github.com/twitter/snowflake

Boundary Flake: https://github.com/boundary/flake

PHP Cruftflake: https://github.com/davegardnerisme/cruftflake

Page 30: Unique ID generation in distributed systems

private function mintId64($timestamp, $machine, $sequence)
{
    $timestamp = (int)$timestamp;
    $value = ($timestamp << 22) | ($machine << 12) | $sequence;
    return (string)$value;
}

private function mintId32($timestamp, $machine, $sequence)
{
    $hi = (int)($timestamp / pow(2,10));
    $lo = (int)($timestamp * pow(2, 22));

    // stick in the machine + sequence to the low bit
    $lo = $lo | ($machine << 12) | $sequence;

    // reconstruct into a string of numbers
    $hex = pack('N2', $hi, $lo);
    $unpacked = unpack('H*', $hex);
    $value = $this->hexdec($unpacked[1]);
    return (string)$value;
}

Page 31: Unique ID generation in distributed systems

public function generate()
{
    $t = floor($this->timer->getUnixTimestamp() - $this->epoch);
    if ($t !== $this->lastTime) {
        $this->sequence = 0;
        $this->lastTime = $t;
    } else {
        $this->sequence++;
        if ($this->sequence > 4095) {
            throw new \OverflowException('Sequence overflow');
        }
    }
    if (PHP_INT_SIZE === 4) {
        return $this->mintId32($t, $this->machine, $this->sequence);
    } else {
        return $this->mintId64($t, $this->machine, $this->sequence);
    }
}