Near Real time Map/Reduce with Views Sarath Lakshman | Software Engineer, Couchbase

Near Real time Map-Reduce with Views: Couchbase Connect 2014


Overview

• Overview of incremental Map/Reduce
• Architecture of Couchbase views
• Database Change Protocol (DCP) for Views
• Index on-disk storage lower layer rewrites
• Faster index updates and indexing latency
• Read Your Own Writes (RYOW) for view queries

©2014 Couchbase, Inc.

Overview of Map/Reduce views

Performing indexing and aggregations

What are Map/Reduce views?

• In Couchbase, Map/Reduce is used specifically to create indexes
• Map functions are applied to JSON documents, and their output (the "emit" data) is stored in an index

Sample view for players of tetris with level > 100

Map function

    function (doc, meta) {
      if (doc.game == "tetris" && doc.level > 100) {
        // Key: [level, document id]; value: the player's score.
        emit([doc.level, meta.id], doc.score);
      }
    }
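To make the effect of this map function concrete, the sketch below runs it over a few made-up documents. The `emit` harness, `buildIndexRows`, the sample documents and the simple key comparison are all assumptions of this sketch; the real view engine supplies `emit` itself and orders keys with its own collation rules.

```javascript
// The map function from the slide, given a name so it can be called here.
function mapPlayers(doc, meta) {
  if (doc.game == "tetris" && doc.level > 100) {
    emit([doc.level, meta.id], doc.score);
  }
}

// Hypothetical harness: collects whatever the map function emits.
let rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Apply the map function to every document and return the emitted rows,
// ordered by key (level first, then document id).
function buildIndexRows(docs) {
  rows = [];
  for (const { id, body } of docs) {
    mapPlayers(body, { id: id });
  }
  return rows.sort(function (a, b) {
    if (a.key[0] !== b.key[0]) return a.key[0] - b.key[0];
    return a.key[1] < b.key[1] ? -1 : a.key[1] > b.key[1] ? 1 : 0;
  });
}

// Made-up sample documents.
const sampleDocs = [
  { id: "u1", body: { game: "tetris", level: 105, score: 10 } },
  { id: "u2", body: { game: "tetris", level: 110, score: 20 } },
  { id: "u5", body: { game: "tetris", level: 42, score: 99 } }, // level too low
  { id: "u6", body: { game: "chess", level: 300, score: 70 } }, // wrong game
  { id: "u4", body: { game: "tetris", level: 101, score: 60 } },
];

console.log(JSON.stringify(buildIndexRows(sampleDocs)));
```

Only the three tetris documents above level 100 produce index rows; the other two are skipped by the `if` condition.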

Finding maximum score

Reduce function

    function (keys, values, rereduce) {
      // The maximum of partial maxima is still the maximum, so the same
      // logic serves both the reduce and the re-reduce pass.
      // Starting from 0 assumes scores are never negative.
      var maxscore = 0;
      for (var i = 0; i < values.length; i++) {
        if (values[i] > maxscore) {
          maxscore = values[i];
        }
      }
      return maxscore;
    }

Incremental reduction

[Figure: reduce tree over the emitted values]

emitted values:  [105,u1], 10   [110,u2], 20   [105,u3], 50   [101,u4], 60
reduce:          20   60
re-reduce:       60

[Figure: the same tree on the next slide, where the first emitted value is 200]

emitted values:  [105,a], 200   [110,b], 20   [105,c], 50   [101,d], 60
reduce:          200   60
re-reduce:       200
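The two diagrams can be reproduced with a short sketch. It assumes the four emitted values are reduced in two groups of two, as the partial results 20 and 60 suggest; `reduceInGroups` and the grouping are hypothetical and only illustrate why a single changed value does not require reducing everything again.

```javascript
// Reduce function from the earlier slide: maximum of the values.
function maxScore(keys, values, rereduce) {
  var maxscore = 0;
  for (var i = 0; i < values.length; i++) {
    if (values[i] > maxscore) {
      maxscore = values[i];
    }
  }
  return maxscore;
}

// Hypothetical helper: reduce each group of emitted rows, then re-reduce
// the partial results into a single value.
function reduceInGroups(groups, reduceFn) {
  const partials = groups.map(function (rows) {
    const keys = rows.map(function (r) { return r.key; });
    const values = rows.map(function (r) { return r.value; });
    return reduceFn(keys, values, false);
  });
  return { partials: partials, total: reduceFn(null, partials, true) };
}

const groups = [
  [{ key: [105, "u1"], value: 10 }, { key: [110, "u2"], value: 20 }],
  [{ key: [105, "u3"], value: 50 }, { key: [101, "u4"], value: 60 }],
];
const before = reduceInGroups(groups, maxScore);

// One emitted value changes: only its group is reduced again, and the
// untouched partial result of the other group is reused in the re-reduce.
groups[0][0].value = 200;
const updatedPartial = maxScore(
  null,
  groups[0].map(function (r) { return r.value; }),
  false
);
const after = maxScore(null, [updatedPartial, before.partials[1]], true);

console.log(before.partials, before.total, after);
```

Because the second group's partial result (60) is reused, the update touches only the path from the changed value to the top of the reduce tree.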

Design documents

[Figure: a Couchbase bucket containing Design Document 1 (two views) and Design Document 2 (three views)]

• Indexers are allocated per design doc
• All views in a design doc are updated at the same time
• Can only access data in the bucket namespace

Querying view indexes

http://localhost:8092/default/_design/players/_view/players?limit=10
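As a sketch of how such a query URL is put together, the helper below assembles it from its parts. The function name `viewQueryUrl` and its parameter names are hypothetical; the path layout (`/<bucket>/_design/<design doc>/_view/<view>`) and the `limit` parameter are taken from the URL on the slide.

```javascript
// Hypothetical helper that builds a view query URL from its components.
function viewQueryUrl(baseUrl, bucket, designDoc, view, params) {
  const url = new URL(
    "/" + encodeURIComponent(bucket) +
      "/_design/" + encodeURIComponent(designDoc) +
      "/_view/" + encodeURIComponent(view),
    baseUrl
  );
  // Append query options such as limit as URL parameters.
  for (const [name, value] of Object.entries(params || {})) {
    url.searchParams.set(name, String(value));
  }
  return url.toString();
}

const exampleUrl = viewQueryUrl(
  "http://localhost:8092", "default", "players", "players", { limit: 10 }
);
console.log(exampleUrl);
```

With the arguments shown, the helper reproduces the URL from the slide; other query options can be passed in the same `params` object.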

View engine architecture

View engine architecture and internals

Couchbase 3.0 Architecture

[Figure: Couchbase 3.0 architecture diagram, repeated across three slides. Labels: Document update, Main/Replica, XDCR]

Copy On Write (COW) – Append-only B+ tree

[Figure: a B+ tree made of Key-Value (KV) and Key-Pointer (KP) nodes, next to the append-only file that stores it]

Tree nodes:        Root, KP1, KP2, KV1, KV2, KV3
Append-only file:  KV1 | KV2 | KV3 | KP1 | KP2 | Root

B-tree state after an update on KV3

New tree nodes:    KV3#, KP2#, Root#
Append-only file:  KV1 | KV2 | KV3 | KP1 | KP2 | Root | KV3# | KP2# | Root#
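The copy-on-write behaviour in these diagrams can be sketched with an array standing in for the append-only file. This is a toy model, not the storage engine's code: the node shapes, the `append`, `update` and `lookup` helpers, and the assumption that KV1 and KV2 sit under KP1 while KV3 sits under KP2 are all illustrative. The sketch also visits every child instead of routing by key range, to stay short.

```javascript
// The "file" is an array we only ever push to; a node's offset is its index.
const file = [];
function append(node) {
  file.push(node);
  return file.length - 1;
}

// Initial tree, written leaves first so parents can point at child offsets.
const kv1 = append({ type: "KV", key: "k1", value: "a" });
const kv2 = append({ type: "KV", key: "k2", value: "b" });
const kv3 = append({ type: "KV", key: "k3", value: "c" });
const kp1 = append({ type: "KP", children: [kv1, kv2] });
const kp2 = append({ type: "KP", children: [kv3] });
const root = append({ type: "KP", children: [kp1, kp2] });

// Copy-on-write update: append a new leaf, then new copies of every node on
// the path up to the root. Existing entries in the file are never modified.
function update(offset, key, value) {
  const node = file[offset];
  if (node.type === "KV") {
    return node.key === key
      ? append({ type: "KV", key: key, value: value })
      : offset;
  }
  const children = node.children.map(function (child) {
    return update(child, key, value);
  });
  const changed = children.some(function (c, i) {
    return c !== node.children[i];
  });
  return changed ? append({ type: "KP", children: children }) : offset;
}

// Read a key starting from any root offset.
function lookup(offset, key) {
  const node = file[offset];
  if (node.type === "KV") return node.key === key ? node.value : undefined;
  for (const child of node.children) {
    const found = lookup(child, key);
    if (found !== undefined) return found;
  }
  return undefined;
}

const newRoot = update(root, "k3", "c2"); // appends KV3#, KP2#, Root#
console.log(file.length, lookup(root, "k3"), lookup(newRoot, "k3"));
```

The old root still resolves to the old value, which is what lets readers keep using a consistent snapshot while the file grows; the superseded nodes are the fragmentation that compaction later removes.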

View engine index maintenance

Index update pipeline: Loader → Mapper → Writer

[Figure: (1) a JSON document enters the pipeline; (2) emitted values KV1 and KV2 are produced from it]

• A view index update is performed every five seconds when there are at least 5000 document changes
• The update pipeline is invoked for each design document
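A minimal sketch of the three pipeline stages and the update trigger follows. Everything here is hypothetical scaffolding: the stage functions, the `Map` used as an index (the real index is an on-disk B-tree keyed by emitted key), and passing `emit` as a third argument to the map function. Only the two thresholds come from the slide.

```javascript
// Thresholds quoted on the slide.
const UPDATE_INTERVAL_MS = 5000;
const MIN_CHANGES = 5000;

// Hypothetical check, as a periodic timer might run it.
function shouldUpdateIndex(elapsedMs, pendingChanges) {
  return elapsedMs >= UPDATE_INTERVAL_MS && pendingChanges >= MIN_CHANGES;
}

// Loader: yields changed documents one at a time.
function* load(changes) {
  for (const change of changes) {
    yield change;
  }
}

// Mapper: runs the view's map function and collects what it emits.
// Passing `emit` as an argument is a convenience of this sketch.
function* mapDocs(docs, mapFn) {
  for (const doc of docs) {
    const rows = [];
    mapFn(doc.body, { id: doc.id }, function emit(key, value) {
      rows.push({ key: key, value: value, id: doc.id });
    });
    yield { id: doc.id, rows: rows };
  }
}

// Writer: replaces the rows previously emitted for each document.
function write(mapped, index) {
  for (const { id, rows } of mapped) {
    index.set(id, rows);
  }
  return index;
}

const index = new Map();
const changes = [
  { id: "u1", body: { game: "tetris", level: 105, score: 10 } },
  { id: "u7", body: { game: "tetris", level: 7, score: 3 } },
];
const mapFn = function (doc, meta, emit) {
  if (doc.game == "tetris" && doc.level > 100) {
    emit([doc.level, meta.id], doc.score);
  }
};

write(mapDocs(load(changes), mapFn), index);
console.log(index.get("u1"), index.get("u7"));
```

Chaining generators keeps the stages streaming: each document flows from loader to mapper to writer without the whole batch being held at any one stage.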

View engine index maintenance

Index Writer: Batcher → Sorter → WAL → on-disk B-tree build/update

• Initial build: bottom-up B-tree build
• Incremental update: bulk update

View queries

[Figure: a Couchbase cluster of three nodes (Node 1, Node 2, Node 3) serving a view query, built up over three slides]

• A query request arrives at the query coordinator
• The coordinator scatters requests to the nodes in the cluster
• It gathers the results through a k-way stream merger queue
• The merged query results are returned
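The gather step can be illustrated with a small k-way merge. This is a sketch under assumptions: `kWayMerge`, `compareRows` and the per-node sample rows are hypothetical, each node is assumed to return its rows already sorted by key, and a linear scan over the stream heads is used where a real merger would more likely use a priority queue.

```javascript
// Merge already-sorted row lists from each node into one sorted list,
// stopping once `limit` rows have been produced.
function kWayMerge(nodeResults, compare, limit) {
  const cursors = nodeResults.map(function () { return 0; });
  const merged = [];
  while (merged.length < limit) {
    let best = -1;
    for (let n = 0; n < nodeResults.length; n++) {
      if (cursors[n] >= nodeResults[n].length) continue; // stream exhausted
      if (
        best === -1 ||
        compare(nodeResults[n][cursors[n]], nodeResults[best][cursors[best]]) < 0
      ) {
        best = n;
      }
    }
    if (best === -1) break; // every stream is exhausted
    merged.push(nodeResults[best][cursors[best]]);
    cursors[best]++;
  }
  return merged;
}

// Simple key comparison for [level, id] keys (an assumption of this sketch).
function compareRows(a, b) {
  if (a.key[0] !== b.key[0]) return a.key[0] - b.key[0];
  return a.key[1] < b.key[1] ? -1 : a.key[1] > b.key[1] ? 1 : 0;
}

// Made-up partial results, one sorted list per node.
const node1 = [{ key: [101, "u4"], value: 60 }, { key: [110, "u2"], value: 20 }];
const node2 = [{ key: [105, "u1"], value: 10 }];
const node3 = [{ key: [105, "u3"], value: 50 }];

const top3 = kWayMerge([node1, node2, node3], compareRows, 3);
console.log(JSON.stringify(top3.map(function (r) { return r.key; })));
```

Because each stream is consumed only as far as needed, a query with a small `limit` does not require reading every node's full result.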

Index fragmentation and compaction

[Figure: the compactor copies the View-1 and View-2 B-trees from the old index file (Index-file.1) into a compacted index file (Index-file.2)]

[Figure: deltas are then applied to the compacted index file. Labels: Apply deltas, Updater, WAL]

Indexing latency in Couchbase 3.0

Improvements in view performance


Database Change Protocol and Views

• Document updates are delivered to the view engine in near real time
• Disk reads from the document storage engine are not required
• Faster index creation
• Lower indexing latencies
• The view engine can roll back during node failures without a full index rebuild

Rewrite of index engine in storage layer

• Lock contention and bottlenecks in the Erlang VM
• The view engine needs to be faster in order to consume and process database changes through DCP
• Rewritten index builder, index updater and index compactor
• Future work will improve rebalance duration with views

Benchmark on indexing latency

Time taken (ms) for an updated document to get indexed in a view index
4 nodes, 1 bucket, 20M docs of size 2KB, 250 mutations/sec

Couchbase 2.5.1    34916 ms
Couchbase 3.0        597 ms

Query and consistency

How to achieve Read Your Own Writes (RYOW)?

Query options for staleness / consistency

stale=ok
• Lowest query latency
• Returns the query results from index storage
• By default the index is incrementally updated every 5 seconds

stale=update_after
• Similar to stale=ok
• Forces the indexer to update the index immediately as part of the query, ignoring the index update interval

stale=false
• Higher query latency
• Triggers an index update and waits for the index to be updated at least up to the current state of the document store
• Query results are returned only after the index is updated

Read Your Own Writes (RYOW)

Performing RYOW in Couchbase 2.5
• stale=false only ensured that persisted documents are available in the view query results
• The user should ensure a document has persisted by using the Observe command before querying

RYOW using Couchbase 3.0
• stale=false ensures that query results are at least up to date as of the time of the request

How the view engine ensures at least point-in-time consistency
• The document dataset is partitioned into smaller sets across the cluster
• Each partition has a sequence number that is incremented on every update operation on a document
• During a stale=false query, the view engine notes the current sequence numbers of all partitions and waits until the index is updated at least up to those sequence numbers
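The sequence-number check described for stale=false can be sketched as follows. The data shapes and the names `snapshotSeqnos`, `indexCaughtUp` and `highSeqno` are hypothetical; the sketch only illustrates the comparison between the snapshot taken at query time and the sequence numbers the index has processed.

```javascript
// Record the latest sequence number of every partition at query time.
function snapshotSeqnos(partitions) {
  const snapshot = {};
  for (const [id, partition] of Object.entries(partitions)) {
    snapshot[id] = partition.highSeqno;
  }
  return snapshot;
}

// The index is consistent with the snapshot once it has processed every
// partition at least up to the recorded sequence number.
function indexCaughtUp(indexedSeqnos, snapshot) {
  return Object.keys(snapshot).every(function (id) {
    return (indexedSeqnos[id] || 0) >= snapshot[id];
  });
}

// Made-up state: three partitions, with the index lagging on partition "1".
const partitions = {
  "0": { highSeqno: 12 },
  "1": { highSeqno: 40 },
  "2": { highSeqno: 7 },
};
const snapshot = snapshotSeqnos(partitions);

const laggingIndex = { "0": 12, "1": 35, "2": 7 };
const updatedIndex = { "0": 13, "1": 40, "2": 7 };

console.log(indexCaughtUp(laggingIndex, snapshot)); // index still behind
console.log(indexCaughtUp(updatedIndex, snapshot)); // safe to return results
```

Writes that arrive after the snapshot is taken may raise a partition's sequence number further, but the query only has to wait for the numbers recorded at request time, which is why the guarantee is "at least" point-in-time.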

Questions

[email protected]
