Near Real time Map/Reduce with Views

Sarath Lakshman | Software Engineer, Couchbase

Near Realtime MapReduce Views

DESCRIPTION

Couchbase view engine internals

§  Overview of incremental Map/Reduce

§  Architecture of Couchbase views

§  Database Change Protocol (DCP) for Views

§  Index on-disk storage lower layer rewrites

§  Faster index updates and indexing latency

§  Read Your Own Writes (RYOW) for view queries

Overview

©2014 Couchbase, Inc. 2

Overview of Map/Reduce views

Performing indexing and aggregations

What are Map/Reduce views?

•  In Couchbase, Map/Reduce is used specifically to create indexes
•  Map functions are applied to JSON documents and output, or "emit", data that is organized as an index

Map function

function (doc, meta) {
  if (doc.type == "beer" && doc.brewery_id && doc.name) {
    emit(doc.name, doc.abv);
  }
}
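
To make the effect of a map function concrete, the sketch below runs the same beer map function over a few in-memory documents and collects the emitted rows in key order. The `buildIndex` helper, the local `emit`, and the sample documents are hypothetical stand-ins for illustration only; in Couchbase, `emit` is supplied by the view engine, keys are ordered by its own collation rules, and the index is stored on disk.

```javascript
// Hypothetical stand-in for the view engine: applies a map function to each
// document and collects the emitted rows, sorted by key.
function buildIndex(docs, makeMap) {
  const rows = [];
  let currentId = null;
  // Local emit(): records the key/value pair with the id of the source document.
  const emit = (key, value) => rows.push({ id: currentId, key: key, value: value });
  const map = makeMap(emit);
  for (const { meta, doc } of docs) {
    currentId = meta.id;
    map(doc, meta);
  }
  // Plain string comparison; the real engine uses its own collation.
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return rows;
}

// The map function from the slide, wrapped so that emit can be injected.
const beerMap = (emit) => function (doc, meta) {
  if (doc.type == "beer" && doc.brewery_id && doc.name) {
    emit(doc.name, doc.abv);
  }
};

// Made-up sample documents.
const docs = [
  { meta: { id: "b1" }, doc: { type: "beer", brewery_id: "x", name: "Stout", abv: 6.5 } },
  { meta: { id: "br1" }, doc: { type: "brewery", name: "X Brewing" } },
  { meta: { id: "b2" }, doc: { type: "beer", brewery_id: "x", name: "Amber Ale", abv: 5.0 } },
];

const index = buildIndex(docs, beerMap);
console.log(index);
```

Only the two beer documents produce rows; the brewery document fails the `doc.type` check and emits nothing.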

Sample View

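function (keys, values, rereduce) {
  var maxscore = 0;
  for (var i = 0; i < values.length; i++) {
    // On rereduce, values holds maxima returned by earlier reduce calls,
    // not the emitted objects.
    var score = rereduce ? values[i] : values[i].score;
    if (score > maxscore) {
      maxscore = score;
    }
  }
  return maxscore;
}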

Reduce function
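
A reduce function may be called in two stages: first on subsets of the emitted rows, then again with `rereduce` set to true on the partial results. The sketch below illustrates this with a max-score reduce, assuming emitted values of the form `{score: n}`. The `reduceInChunks` driver, the chunk size, and the sample rows are hypothetical, and the `keys` argument is simplified compared with what the view engine passes.

```javascript
// Max-score reduce. On rereduce, `values` holds numbers returned by earlier
// reduce calls rather than the emitted {score: n} objects.
function maxScore(keys, values, rereduce) {
  let maxscore = 0;
  for (let i = 0; i < values.length; i++) {
    const score = rereduce ? values[i] : values[i].score;
    if (score > maxscore) {
      maxscore = score;
    }
  }
  return maxscore;
}

// Hypothetical driver: reduce fixed-size chunks of rows, then rereduce the
// partial results into a single value.
function reduceInChunks(rows, reduceFn, chunkSize) {
  const partials = [];
  for (let i = 0; i < rows.length; i += chunkSize) {
    const chunk = rows.slice(i, i + chunkSize);
    partials.push(reduceFn(chunk.map((r) => r.key), chunk.map((r) => r.value), false));
  }
  return reduceFn(null, partials, true);
}

// Made-up sample rows.
const rows = [
  { key: "a", value: { score: 3 } },
  { key: "b", value: { score: 9 } },
  { key: "c", value: { score: 4 } },
  { key: "d", value: { score: 7 } },
  { key: "e", value: { score: 1 } },
];

const chunked = reduceInChunks(rows, maxScore, 2);
const direct = maxScore(rows.map((r) => r.key), rows.map((r) => r.value), false);
console.log(chunked, direct);
```

Both paths should agree, which is the property a reduce function needs for the two-stage evaluation to be safe.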

Design documents

Couchbase Bucket
   Design Document 1: View, View
   Design Document 2: View, View, View

§  Indexers are allocated per design document
§  All views in a design document are updated at the same time
§  Views can only access data in the bucket namespace

Querying view indexes

http://localhost:8092/beer-sample/_design/beer/_view/brewery_beers?limit=10
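
A query URL of this shape can be assembled programmatically. The sketch below is one way to do it with the standard `URL` API; the `viewQueryUrl` helper and its parameter names are hypothetical, and it only builds the string without contacting a server.

```javascript
// Hypothetical helper: builds a view query URL in the form shown above.
function viewQueryUrl(host, bucket, designDoc, view, params) {
  const url = new URL(
    "http://" + host + ":8092/" +
      encodeURIComponent(bucket) +
      "/_design/" + encodeURIComponent(designDoc) +
      "/_view/" + encodeURIComponent(view)
  );
  // Append each query option (for example limit or stale) as a query parameter.
  for (const [name, value] of Object.entries(params)) {
    url.searchParams.set(name, String(value));
  }
  return url.toString();
}

const queryUrl = viewQueryUrl("localhost", "beer-sample", "beer", "brewery_beers", { limit: 10 });
const freshUrl = viewQueryUrl("localhost", "beer-sample", "beer", "brewery_beers", { limit: 10, stale: "false" });
console.log(queryUrl);
console.log(freshUrl);
```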

View engine architecture

View engine architecture and internals

Couchbase 3.0 Architecture

View engine index maintenance

View queries

Index fragmentation and compaction

Indexing latency in Couchbase 3.0

Improvements in view performance

§  Document updates are delivered to view engine in near real time

§  Disk reads are not required from document storage engine

§  Faster index creation

§  Lower indexing latencies

§  Ability for the view engine to roll back during node failures without a full index rebuild

Database Change Protocol and Views

§  Lock contention and bottlenecks in Erlang VM

§  View engine needs to be faster in order to consume and process database changes through DCP

§  Rewritten index builder, index updater and index compactor

§  Further work will improve rebalance duration with views

Rewrite of index engine in storage layer

Benchmark on indexing latency

[Bar chart, vertical axis 0 to 40000: Couchbase 2.5.1 = 34916 ms, Couchbase 3.0 = 359 ms]

Time taken (ms) for an updated document to get indexed in a view index

4 nodes, 1 bucket, 20M docs of size 2KB, 250 mutations/sec

Query and consistency

How to achieve RYOW?

§  Stale = ok/true
   §  Lowest query latency
   §  Returns the query results from index storage
   §  Index is by default incrementally updated every 5 seconds
§  Stale = update_after (default staleness setting)
   §  Similar to stale=ok
   §  Triggers an index update immediately after the results are returned, ignoring the index update interval
§  Stale = false
   §  Higher query latency
   §  Triggers an index update and waits for the index to be updated at least up to the current state of the document store
   §  Query results are returned only after the index is updated

Query options for staleness / consistency
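
The three settings differ mainly in when the index update runs relative to reading the results. The toy model below illustrates that ordering, assuming `update_after` triggers the update once the results have been read. The `ToyView` class is hypothetical, runs synchronously, and leaves out the periodic updater; it is not how the view engine is implemented.

```javascript
// Toy model only: pending writes plus the rows currently visible to queries.
class ToyView {
  constructor() {
    this.pending = []; // writes not yet indexed
    this.indexed = []; // rows visible to queries
  }

  write(key) {
    this.pending.push(key);
  }

  // Stands in for an index update, whether periodic or triggered by a query.
  updateIndex() {
    this.indexed.push(...this.pending);
    this.pending = [];
  }

  query(stale) {
    if (stale === "false") {
      this.updateIndex(); // update first, then read
      return this.indexed.slice();
    }
    const result = this.indexed.slice(); // read what is already indexed
    if (stale === "update_after") {
      this.updateIndex(); // update after the result has been taken
    }
    return result; // "ok": the query triggers no update
  }
}

const view = new ToyView();
view.write("doc1");
const r1 = view.query("ok");           // doc1 is not indexed yet
const r2 = view.query("update_after"); // still stale, but triggers an update
const r3 = view.query("ok");           // now sees doc1
view.write("doc2");
const r4 = view.query("false");        // waits for the update, sees doc2 as well
console.log(r1, r2, r3, r4);
```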

§  Performing RYOW in Couchbase 2.5
   §  Stale=false only ensured that persisted documents were available in the view query results
   §  User should ensure a document has persisted by using the Observe command before querying
§  RYOW using Couchbase 3.0
   §  Stale=false ensures that query results are at least up to date as of the time of the request
§  How the view engine ensures at-least point-in-time consistency
   §  The document dataset is partitioned into smaller sets across the cluster
   §  Each partition has a sequence number that is incremented on every update operation on a document
   §  The view engine notes the current sequence numbers for all partitions during a stale=false query and waits until the index is updated at least up to those sequence numbers

Read Your Own Writes (RYOW)
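
The sequence-number check described above can be sketched as follows. This is a simplified, synchronous model: the arrays hold one sequence number per partition, and `staleFalseQuery`, `indexCaughtUp`, and the `step` indexer callback are hypothetical names, not the view engine's API.

```javascript
// Snapshot of the data store's per-partition sequence numbers at query time.
function snapshotSeqnos(storeSeqnos) {
  return storeSeqnos.slice();
}

// True when the index has reached the target sequence number in every partition.
function indexCaughtUp(indexSeqnos, targetSeqnos) {
  return targetSeqnos.every((target, p) => indexSeqnos[p] >= target);
}

// Simulated stale=false query: record the target sequence numbers, then let
// the indexer run until the index has caught up with them.
function staleFalseQuery(storeSeqnos, indexSeqnos, indexOneStep) {
  const target = snapshotSeqnos(storeSeqnos);
  let steps = 0;
  while (!indexCaughtUp(indexSeqnos, target)) {
    indexOneStep(indexSeqnos, storeSeqnos);
    steps++;
  }
  return { target: target, steps: steps };
}

// Made-up state: three partitions, with the index lagging in two of them.
const store = [5, 3, 8];
const indexSeq = [5, 1, 6];

// Hypothetical indexer step: advance each lagging partition by one mutation.
const step = (idx, st) => {
  for (let p = 0; p < idx.length; p++) {
    if (idx[p] < st[p]) {
      idx[p]++;
    }
  }
};

const outcome = staleFalseQuery(store, indexSeq, step);
console.log(outcome, indexSeq);
```

Because the target is a snapshot taken when the query arrives, writes that land afterwards do not extend the wait; the query only has to observe everything written before it was issued.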

Questions
