
Search @twitter

Michael Busch




Agenda

‣ Introduction

- Search Architecture

- Inverted Index 101

- Realtime Posting Lists

Introduction

• Twitter has more than 230 million monthly active users.
• 500 million tweets are sent per day.
• More than 300 billion tweets have been sent since company founding in 2006.
• Tweets-per-second world record: 33,388 TPS.
• More than 2 billion search queries per day.

Timeline 2008-2014:

• Twitter acquires Summize (MySQL-based RT search engine)
• Modified Lucene (Earlybird) ships and replaces MySQL indexes
• New Earlybird features: image/video search; index compression; efficient relevance search in time-sorted index
• Tweet archive search on SSD with vanilla Lucene
• New RT posting list format that supports arbitrary document lengths, but keeps performance optimizations for tweets


Search Architecture

[Architecture diagram. Realtime path: RT stream → Analyzer/Partitioner → analyzed tweets → RT index (Earlybird). Archive path: raw tweets → Tweet archive (HDFS) → Mapreduce Analyzer → analyzed tweets → Archive index. Blender receives search requests and searches both indexes.]

Analyzer/Partitioner

• Pre-processes Tweets for indexing

• Analyzing (tokenization/normalization) of text

• Geo-coding, URL expansion, etc.

• Hash partitioning
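
A minimal sketch of the hash-partitioning step, assuming the `docId % n` rule shown on the cluster layout slide. The names `partition_for` and `route` are illustrative assumptions, not Twitter's code.

```python
def partition_for(doc_id: int, num_partitions: int) -> int:
    """Return the index of the hash partition that should index this document."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return doc_id % num_partitions


def route(doc_ids, num_partitions):
    """Group document IDs by target partition (hypothetical helper)."""
    buckets = {p: [] for p in range(num_partitions)}
    for doc_id in doc_ids:
        buckets[partition_for(doc_id, num_partitions)].append(doc_id)
    return buckets
```

Because the layout is static, a query has to be sent to every partition; the modulo only decides where a tweet is written.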

RT index (Earlybird)

• Modified Lucene index implementation optimized for realtime search

• IndexWriter buffer is searchable (no need to flush to allow searching)

• In-memory

• Hash-partitioned, static layout

Cluster layout

• n hash partitions (docId % n)
• Several Earlybird replicas per partition
• Multiple timeslices: one writable timeslice, all others complete

Mapreduce Analyzer

• Daily jobs that process raw tweets

• Analyzes text

• Aggregates metadata and signals

Archive index


• Standard Lucene (4.4) indexes

• Reverse time-sorted (new to old)

• Cluster layout similar to realtime search cluster

• Two tiers: in-memory and on SSD
• In-memory index: contains a small number of the best tweets of all time
• SSD index: much bigger index with more tweets; lower max. QPS, limited by SSD IOPS. Only needs to be queried if the in-memory index did not yield enough results
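
The two-tier lookup can be sketched as follows. This is only an illustration: `search_tiers` and the two callables standing in for the in-memory and SSD indexes are hypothetical names.

```python
from typing import Callable, List

SearchFn = Callable[[str, int], List[int]]


def search_tiers(query: str, k: int, in_memory: SearchFn, ssd: SearchFn) -> List[int]:
    """Query the in-memory tier first; fall back to the SSD tier only if needed."""
    hits = list(in_memory(query, k))
    if len(hits) >= k:
        return hits[:k]              # the SSD tier is never touched
    seen = set(hits)
    for doc_id in ssd(query, k):
        if doc_id not in seen:       # skip a tweet already returned by the first tier
            hits.append(doc_id)
            seen.add(doc_id)
        if len(hits) == k:
            break
    return hits
```

Keeping the SSD tier off the common path is what lets the limited SSD IOPS serve the whole query load.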

Blender

• Blender is our Thrift service aggregator

• Queries multiple Earlybirds, merges results
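
A rough sketch of the merge step, assuming each Earlybird partition returns its hits already sorted newest-first and that a larger docID means a newer tweet. `merge_results` is a hypothetical name, and the Thrift transport is omitted.

```python
import heapq
from itertools import islice


def merge_results(per_partition_hits, k):
    """Merge newest-first docID lists from several partitions into the top k."""
    merged = heapq.merge(*per_partition_hits, reverse=True)
    return list(islice(merged, k))
```

`heapq.merge` is lazy, so only as many hits as needed are pulled from each partition's list.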

[Architecture diagram, extended: besides tweets arriving through the Analyzer/Partitioner, the RT index (Earlybird) also receives updates through a queue: deletes and engagement (e.g. retweets/favs).]



Inverted Index 101

1 The old night keeper keeps the keep in the town

2 In the big old house in the big old gown.

3 The house in the town had the big old keep

4 Where the old night keeper never did sleep.

5 The night keeper keeps the keep in the night

6 And keeps in the dark and sleeps in the light.

Table with 6 documents

Example from: Justin Zobel, Alistair Moffat, Inverted files for text search engines, ACM Computing Surveys (CSUR), v.38 n.2, p.6-es, 2006

term    freq  postings
and     1     <6>
big     2     <2> <3>
dark    1     <6>
did     1     <4>
gown    1     <2>
had     1     <3>
house   2     <2> <3>
in      5     <1> <2> <3> <5> <6>
keep    3     <1> <3> <5>
keeper  3     <1> <4> <5>
keeps   3     <1> <5> <6>
light   1     <6>
never   1     <4>
night   3     <1> <4> <5>
old     4     <1> <2> <3> <4>
sleep   1     <4>
sleeps  1     <6>
the     6     <1> <2> <3> <4> <5> <6>
town    2     <1> <3>
where   1     <4>

Dictionary and posting lists

Query: keeper

The dictionary lookup for "keeper" returns its posting list: <1> <4> <5>
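
The six example documents can be turned into a dictionary with posting lists in a few lines. This is a minimal sketch; the tokenizer is deliberately naive and `build_index` / `search` are illustrative names.

```python
from collections import defaultdict

DOCS = {
    1: "The old night keeper keeps the keep in the town",
    2: "In the big old house in the big old gown.",
    3: "The house in the town had the big old keep",
    4: "Where the old night keeper never did sleep.",
    5: "The night keeper keeps the keep in the night",
    6: "And keeps in the dark and sleeps in the light.",
}


def tokenize(text):
    """Lowercase each word and strip surrounding punctuation."""
    return [word.strip(".,").lower() for word in text.split()]


def build_index(docs):
    """Map each term to the sorted list of IDs of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


INDEX = build_index(DOCS)


def search(term):
    """Look the term up in the dictionary and return its posting list."""
    return INDEX.get(term.lower(), [])
```

Here the length of a posting list corresponds to the "freq" column of the table (the number of documents containing the term).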

Posting list encoding

Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090

Delta encoding: 5 10 8985 2 90998 90

VInt compression:

• Values 0 <= delta <= 127 need one byte, e.g. 5 → 00000101
• Values 128 <= delta <= 16383 need two bytes, e.g. 8985 → 11000110 00011001
• First bit indicates whether next byte belongs to the same value
• Variable number of bytes - a VInt-encoded posting cannot be written as a primitive Java type; therefore it cannot be written atomically

Read direction:

• Each posting depends on the previous one; decoding is only possible in old-to-new direction
• With recency ranking (new-to-old) no early termination is possible

• By default Lucene uses a combination of delta encoding and VInt compression
• VInts are expensive to decode
• Problem 1: How to traverse posting lists backwards?
• Problem 2: How to write a posting atomically?
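
A small sketch of delta encoding plus VInt compression, using the docIDs from the slides. The byte layout follows the slide (high-order 7-bit group first, first bit set when another byte follows); Lucene's own `writeVInt` emits the low-order group first, so its bytes come out in the opposite order. All function names are illustrative.

```python
def delta_encode(doc_ids):
    """Replace each docID by its difference to the previous one."""
    deltas, previous = [], 0
    for doc_id in doc_ids:
        deltas.append(doc_id - previous)
        previous = doc_id
    return deltas


def vint_encode(value):
    """Encode a non-negative int in 7-bit groups, high-order group first."""
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()
    # Continuation bit (0x80) on every byte except the last one.
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])


def decode_postings(data):
    """Decode a VInt stream back to absolute docIDs (forward direction only)."""
    doc_ids, current, previous = [], 0, 0
    for byte in data:
        current = (current << 7) | (byte & 0x7F)
        if not byte & 0x80:          # last byte of this value
            previous += current      # undo the delta: needs the previous docID
            doc_ids.append(previous)
            current = 0
    return doc_ids
```

The decoder makes both problems visible: each docID needs the running sum of all earlier deltas, and a single value can span several bytes.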


Realtime Posting Lists

Posting list encoding in Earlybird v1

Each posting is one int (32 bits):

• docID: 24 bits (max. 16.7M)
• textPosition: 8 bits (max. 255)
• Tweet text can only have 140 chars
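
A minimal sketch of the v1 posting as a packed 32-bit value. Placing the docID in the upper 24 bits is an assumption made for illustration; the slides give only the field widths. `pack_posting` and `unpack_posting` are hypothetical names.

```python
DOC_ID_BITS = 24
POSITION_BITS = 8
MAX_DOC_ID = (1 << DOC_ID_BITS) - 1        # 16,777,215
MAX_POSITION = (1 << POSITION_BITS) - 1    # 255


def pack_posting(doc_id, text_position):
    """Pack docID and text position into one 32-bit value (assumed bit order)."""
    if not 0 <= doc_id <= MAX_DOC_ID:
        raise ValueError("docID does not fit into 24 bits")
    if not 0 <= text_position <= MAX_POSITION:
        raise ValueError("text position does not fit into 8 bits")
    return (doc_id << POSITION_BITS) | text_position


def unpack_posting(posting):
    """Split a packed posting back into (docID, textPosition)."""
    return posting >> POSITION_BITS, posting & MAX_POSITION
```

Since a 140-character tweet cannot contain a position above 255, every posting fits into a single int.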

Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090

Earlybird encoding: 5 15 9000 9002 100000 100090

Read direction: newest to oldest (absolute docIDs are stored, not deltas)

Early query termination

• E.g. if 3 results are requested, we can terminate after reading 3 postings
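
Early termination on a list of absolute docIDs can be sketched like this. It assumes postings are appended in indexing order, so the newest posting is last; `newest_matches` is an illustrative name.

```python
def newest_matches(postings, k):
    """Read a posting list from newest to oldest and stop after k results.

    Returns the results and the number of postings that had to be read.
    """
    results, reads = [], 0
    if k <= 0:
        return results, reads
    for index in range(len(postings) - 1, -1, -1):
        reads += 1
        results.append(postings[index])
        if len(results) == k:
            break                     # the rest of the list is never read
    return results, reads
```

With delta-encoded postings the same query would have to decode the whole list from the oldest posting first.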

Inverted index components

• Dictionary
• Parallel arrays: pointer to the most recently indexed posting for a term
• Posting list storage: ?

Posting lists storage - Objectives

• Store many single-linked lists of different lengths space-efficiently
• The number of Java objects should be independent of the number of lists or number of items in the lists
• Every item should be a possible entry point into the lists for iterators, i.e. items should not be dependent on other items (e.g. no delta encoding)
• Append and read possible by multiple threads in a lock-free fashion (single append thread, multiple reader threads)
• Traversal in backwards order

Memory management

• 4 int[] pools, each made of blocks (each block is a 32K int[])
• Each pool can be grown individually by adding 32K blocks
• For simplicity we can forget about the blocks for now and think of the pools as continuous, unbounded int[] arrays
• Small total number of Java objects (each 32K block is one object)
• Slices can be allocated in each pool
• Each pool has a different, but fixed slice size: 2^1, 2^4, 2^7 and 2^11

Adding and appending to a list

• Store the first two postings in a slice of the first pool (slice size 2^1)
• When the first slice is full, allocate another one in the second pool
• Allocate a slice on each level as the list grows
• On the uppermost level one list can own multiple slices
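
The growth pattern can be sketched as follows. This is a simplification that ignores any space a slice needs for pointers or headers, so real capacities are somewhat smaller; `allocate_slices` is a hypothetical name.

```python
SLICE_SIZES = [2**1, 2**4, 2**7, 2**11]   # one fixed slice size per pool


def allocate_slices(num_postings):
    """Return the pool index of every slice a list of this length would own."""
    slices, capacity, level = [], 0, 0
    while capacity < num_postings:
        slices.append(level)
        capacity += SLICE_SIZES[level]
        # Move up one level per slice; stay on the uppermost level once reached.
        level = min(level + 1, len(SLICE_SIZES) - 1)
    return slices
```

Short lists waste at most a small slice, while long lists quickly move to the 2^11 slices and grow there.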


Addressing items

• Use 32 bit (int) pointers to address any item in any list unambiguously. Each int (32 bits) holds:
  • poolIndex: 2 bits (0-3)
  • sliceIndex: 19-29 bits (depends on pool)
  • offset in slice: 1-11 bits (depends on pool)
• Nice symmetry: Postings and address pointers both fit into a 32 bit int
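
A sketch of such a pointer, assuming the fields are laid out as poolIndex (top 2 bits), then sliceIndex, then offset in the low bits. The slides give only the field widths, so this bit order and the function names are assumptions.

```python
OFFSET_BITS = [1, 4, 7, 11]   # log2 of the slice sizes 2^1, 2^4, 2^7, 2^11
POOL_BITS = 2
POOL_SHIFT = 32 - POOL_BITS


def encode_pointer(pool, slice_index, offset):
    """Pack (pool, slice, offset) into one 32-bit value (assumed bit order)."""
    if not 0 <= pool < len(OFFSET_BITS):
        raise ValueError("pool index must be 0-3")
    offset_bits = OFFSET_BITS[pool]
    slice_bits = POOL_SHIFT - offset_bits      # 29, 26, 23 or 19 bits
    if not 0 <= offset < (1 << offset_bits):
        raise ValueError("offset does not fit into this pool's slice size")
    if not 0 <= slice_index < (1 << slice_bits):
        raise ValueError("slice index out of range for this pool")
    return (pool << POOL_SHIFT) | (slice_index << offset_bits) | offset


def decode_pointer(pointer):
    """Split a pointer back into (pool, slice_index, offset)."""
    pool = pointer >> POOL_SHIFT
    offset_bits = OFFSET_BITS[pool]
    offset = pointer & ((1 << offset_bits) - 1)
    slice_index = (pointer & ((1 << POOL_SHIFT) - 1)) >> offset_bits
    return pool, slice_index, offset
```

The pool index has to be read first, because it determines how the remaining 30 bits are split between slice index and offset.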

Linking the slices

[Diagram: the slices of one list, spread over the pools with slice sizes 2^1, 2^4, 2^7 and 2^11, are linked to each other; the dictionary holds a pointer to the last posting indexed for a term.]

Posting list encoding - Summary

• ints can be written atomically in Java

• Backwards traversal easy on absolute docIDs (not deltas)

• Every posting is a possible entry point for a searcher

• Skipping can be done without additional data structures as binary search, though there are better approaches (skip lists)

• Repeating docIDs if a term occurs multiple times in the same document only works for small docs

• Max. segment size: 2^24 = 16.7M tweets

New posting list encoding

• Objectives:
  • 32 bit positions and variable-length payloads
  • Store term frequency (TF) instead of repeating docIDs
• Keep:
  • Concurrency model
  • Space-efficiency for short documents
  • Performance

[Diagram: two parallel streams. The "DocID, termFreq" stream has a fixed length for each posting (32 bits); the "Position, Payload" stream has variable length.]

• Store TF instead of repeating the same DocID
• Store DocID/TF pairs separately from position/payloads
• Find a way to synchronously decode the two streams without storing a pointer for each posting (expensive)


• Idea: Use an embedded skip list as periodical “synchronization points”

• Keeps memory overhead for pointers low and improves search performance

Slice header

• Header contains:
  • Back-pointer to previous slice (as before)
  • Skip list
  • Slice id


Previous format, one int (32 bits):

• docID: 24 bits (max. 16.7M)
• textPosition: 8 bits (max. 255)

• Observation: Most tweets don’t need all 8 bits for text position
• Idea: Use the position “inlining” approach for short documents, but support Lucene’s 32-bit positions and variable length payloads

New format, one int (32 bits):

• docID: 24 bits (max. 16.7M)
• textPosition or termFreq: 7 bits (max. 127)
• Flag: 1 bit (0 = textPosition, 1 = termFreq)

As a storage optimization, the text position is stored with the docID if:

• termFreq == 1 (term occurs only once in the doc) AND
• textPosition <= 127 AND
• Posting has no payload AND
• Posting is not at a skip point of the docID posting list (see later).
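
The inlining rule can be sketched as below. The bit order (docID, then the 7-bit value, then the flag in the lowest bit) is an assumption, as are the function names and the choice to raise an error when termFreq exceeds 7 bits; the separate position/payload stream is not modelled.

```python
def encode_doc_posting(doc_id, term_freq, text_position,
                       has_payload=False, at_skip_point=False):
    """Pack docID with either an inlined text position or the term frequency."""
    if not 0 <= doc_id < (1 << 24):
        raise ValueError("docID does not fit into 24 bits")
    if text_position < 0 or term_freq < 1:
        raise ValueError("invalid position or term frequency")
    can_inline = (term_freq == 1 and text_position <= 127
                  and not has_payload and not at_skip_point)
    if can_inline:
        flag, value = 0, text_position   # 0 = value is the text position
    else:
        flag, value = 1, term_freq       # 1 = value is the term frequency
    if value > 127:
        raise ValueError("value does not fit into 7 bits (not handled here)")
    return (doc_id << 8) | (value << 1) | flag


def decode_doc_posting(posting):
    """Return (docID, kind, value) where kind names what the 7 bits hold."""
    kind = "termFreq" if posting & 1 else "textPosition"
    return posting >> 8, kind, (posting >> 1) & 0x7F
```

For a typical tweet term that occurs once, the posting still costs a single int and the position stream is never consulted.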

New posting list encoding - Summary

• Support for 32 bit positions and arbitrary length payloads stored in separate data structure

• Performance and space consumption very similar compared to previous encoding for tweet search

• Skip lists used for speed and synchronization points

• For short documents positions can still be inlined

Questions?

Michael Busch

Previous talk: http://vimeo.com/31195040