Developing with Couchbase: App Development with Indexes and Queries

Page 1: Developing with Couchbase: App Development with Indexes and Queries

Couchbase Server 2.0 Indexing and Querying

Chris Anderson, Chief Architect

Page 2: Developing with Couchbase: App Development with Indexes and Queries

What we’ll talk about

• View basics
• Lifecycle of a view
– Index definition, build, and query phase
– Indexing details
• Replica indexes, failover and compaction
• Primary and secondary indexes
• View best practices
• Additional patterns

Page 3: Developing with Couchbase: App Development with Indexes and Queries

JSON Documents

• Map more closely to objects or entities
• CRUD operations, lightweight schema
• Stored under an identifier key

{ "fields" : ["with basic types", 3.14159, true], "like" : "your favorite language" }

client.set("mydocumentid", myDocument);
mySavedDocument = client.get("mydocumentid");

Page 4: Developing with Couchbase: App Development with Indexes and Queries

What are Views?

• Extract fields from JSON documents and produce an index of the selected information

Page 5: Developing with Couchbase: App Development with Indexes and Queries

Views – The basics

• Define materialized views on JSON documents and then query across the data set
• Using views you can define
– Primary indexes
– Simple secondary indexes (most common use case)
– Complex secondary, tertiary and composite indexes
– Aggregations (reduction)
• Documents are eventually indexed
• Queries are eventually consistent with respect to documents
• Built using Map/Reduce technology
– Map and Reduce functions are written in JavaScript
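As a rough illustration of the map step, the sketch below runs a map function over a few documents and keeps the emitted rows in key order. The `emit` function, the `rows` array, and the sample documents are local stand-ins written for this example; in Couchbase the view engine supplies `emit` and maintains the index itself. Plain JavaScript string comparison is used for ordering, which only approximates the server's collation.

```javascript
// Rows collected by the stand-in emit() below.
const rows = [];
let currentId = null;

// In Couchbase the view engine supplies emit(); this is a local stand-in.
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// A map function in the style used throughout this deck.
function map(doc, meta) {
  if (doc.type == "beer" && doc.name) {
    emit(doc.name, null);
  }
}

// Hypothetical sample documents, keyed by document ID.
const docs = {
  beer_a: { type: "beer", name: "Pale Ale" },
  beer_b: { type: "beer", name: "Amber Lager" },
  brewery_x: { type: "brewery", name: "Example Brewing" },
};

for (const id of Object.keys(docs)) {
  currentId = id;
  map(docs[id], { id: id });
}

// The index is kept in key order.
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
console.log(rows);
```

Only the two beer documents produce rows; the brewery document is skipped by the type check.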

Page 6: Developing with Couchbase: App Development with Indexes and Queries

View Lifecycle: Define -> Build -> Query

Page 7: Developing with Couchbase: App Development with Indexes and Queries

Buckets & Design docs & Views

• Create design documents on a bucket
• Create views within a design document

[Diagram: BUCKET 1 contains Design document 1 (View 1, View 2, View 3), Design document 2 (View 4, View 5) and Design document 3 (View 6, View 7); BUCKET 2 is shown alongside.]

Page 8: Developing with Couchbase: App Development with Indexes and Queries

Eventually indexed Views – Data flow

[Diagram labels: App Server; Couchbase Server Node with Managed Cache, Replication Queue (to other node), Disk Queue, Disk and View engine; Doc 1 shown at each stage.]

Page 9: Developing with Couchbase: App Development with Indexes and Queries

Distributed Indexing and Querying

• Indexing work is distributed amongst nodes
• Parallelize the effort
• Each node has the index for the data stored on it
• Queries combine the results from required nodes

[Diagram: Couchbase Server Cluster with user-configured replica count = 1. Servers 1, 2 and 3 each hold active and replica documents (Doc 1 to Doc 9). App Server 1 and App Server 2 each use the Couchbase client library with a cluster map. Arrows are labeled "Create Index / View" and "Query".]

Page 10: Developing with Couchbase: App Development with Indexes and Queries

DEFINE: Index / View Definition in JavaScript

CREATE INDEX City ON Brewery.City;
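A view definition that plays the role of the SQL-style statement above might look like the following sketch. The field names `type` and `city`, and the view name `by_city`, are assumptions made for illustration. The design document shape (a `views` object whose entries hold the map function as a string) follows the format Couchbase uses for design documents.

```javascript
// Map function: index brewery documents by their city field.
// The "type" and "city" field names are assumptions about the document shape.
function byCity(doc, meta) {
  if (doc.type == "brewery" && doc.city) {
    emit(doc.city, null);
  }
}

// A design document holds named views; each view stores its map function
// as a string. The view name "by_city" is hypothetical.
const designDoc = {
  views: {
    by_city: { map: byCity.toString() },
  },
};

console.log(JSON.stringify(designDoc, null, 2));
```

The map function is never called here, so no `emit` stand-in is needed; the snippet only builds the JSON that would be stored as the design document.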

Page 11: Developing with Couchbase: App Development with Indexes and Queries

BUILD: Distributed Index Build Phase

• Optimized for lookups, in-order access and aggregations

• View reads are from disk (different performance profile than GET/SET)

• Views built against every document on every node
– Group them in a design document

• Views are automatically kept up to date

Page 12: Developing with Couchbase: App Development with Indexes and Queries

QUERY: Dynamic Queries with Optional Aggregation

Query: ?startkey="J"&endkey="K"
{"rows":[{"key":"Juneau","value":null}]}

• Eventually consistent with respect to document updates
• Efficiently fetch a document or group of similar documents
• Queries will use cached values from B-tree inner nodes when possible
• Take advantage of in-order tree traversal with group_level queries
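The range query above can be assembled programmatically. The sketch below assumes the view REST endpoint layout `/<bucket>/_design/<design doc>/_view/<view>` on port 8092; the host, bucket, design document and view names are all hypothetical. The helper `viewUrl` is written for this example and is not part of any client library. Key parameters are JSON values, so a string key keeps its quotes and is then URL-encoded.

```javascript
// Parameters whose values must be JSON-encoded before URL-encoding.
const JSON_PARAMS = ["startkey", "endkey", "key"];

// Build a view query URL from a base address and query options.
function viewUrl(base, bucket, ddoc, view, params) {
  const query = Object.keys(params)
    .map((name) => {
      const raw = params[name];
      const value = JSON_PARAMS.includes(name) ? JSON.stringify(raw) : String(raw);
      return name + "=" + encodeURIComponent(value);
    })
    .join("&");
  return `${base}/${bucket}/_design/${ddoc}/_view/${view}?${query}`;
}

// Hypothetical host, bucket, design document and view names.
const url = viewUrl("http://localhost:8092", "beer-sample", "breweries", "by_city", {
  startkey: "J",
  endkey: "K",
});
console.log(url);
```

In application code a client library would normally build this request; the sketch only shows how the keys are encoded.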

Page 13: Developing with Couchbase: App Development with Indexes and Queries

Index building details

• All the views within a design document are incrementally updated when the view is accessed or auto-indexing kicks in
• Automatic view updates
– In addition to forcing an index build at query time, active & replica indexes are updated every 3 seconds of inactivity if there are at least 5000 new changes (configurable)
• The entire view is recreated if the view definition has changed
• Views can be conditionally updated by specifying the "stale" argument to the view query
• The index information stored on disk consists of the combination of both the key and value information defined within your view.

Page 14: Developing with Couchbase: App Development with Indexes and Queries

Queries run against stale indexes by default

• stale=update_after (default if nothing is specified)
– always get fastest response
– can take two queries to read your own writes
• stale=ok
– auto update will trigger eventually
– might not see your own writes for a few minutes
– least frequent updates -> least resource impact
• stale=false
– Use with "set with persistence" if data needs to be included in view results
– BUT be aware of delay it adds, only use when really required
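The difference between the three settings can be seen in a toy model. The `ToyView` class below is invented for this example and is not how the server is implemented: it updates the index synchronously, where the real engine works in the background, and it ignores the automatic updater entirely. It is only meant to show why `update_after` can need two queries before a write becomes visible.

```javascript
// Toy model of one view index: "indexed" is what queries can see,
// "pending" holds document changes that have not been indexed yet.
class ToyView {
  constructor() {
    this.indexed = [];
    this.pending = [];
  }
  write(key) {
    this.pending.push(key);
  }
  update() {
    this.indexed = this.indexed.concat(this.pending).sort();
    this.pending = [];
  }
  query(stale) {
    if (stale === "false") this.update(); // index first, then answer
    const result = this.indexed.slice(); // answer from the current index
    if (stale === "update_after") this.update(); // answer first, then index
    return result; // "ok": answer without triggering an update
  }
}

const view = new ToyView();
view.write("Juneau");
const first = view.query("update_after"); // the write is not visible yet
const second = view.query("update_after"); // now it is
view.write("Kodiak");
const okResult = view.query("ok"); // does not trigger an update
const fresh = view.query("false"); // updates before answering
console.log(first, second, okResult, fresh);
```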

Page 15: Developing with Couchbase: App Development with Indexes and Queries

Views and Replica indexes

• In addition to replicas for data (up to 3 copies), optionally create replicas for indexes
• Each node manages replica index data structures
• Set at a bucket level
• Replica index populated from replica data
• Replica index is used after a failover

Page 16: Developing with Couchbase: App Development with Indexes and Queries

Views and failover

• Replica indexes enabled on failover
• Replica indexes are rebuilt on replica nodes
– Automatically incrementally built based on replica data
– Updated every 3 seconds of inactivity if there are at least 5000 new changes
– Not copied or moved, so they stay consistent with the persisted replica data

Page 17: Developing with Couchbase: App Development with Indexes and Queries

View Compaction

• Compaction is ONLINE
• Reclaims empty allocated space from disk
• Indexes are stored on disk for active vBuckets on each node and updated in append-only manner
• Auto-compaction performed in the background
– Set the database fragmentation levels
– Set the index fragmentation levels
– Choose a schedule
– Global and bucket specific settings

Page 18: Developing with Couchbase: App Development with Indexes and Queries

Development vs. Production Views

• Development views index a subset of the data.
• Publishing a view builds the index across the entire cluster.
• Queries on production views are scattered to all cluster members and results are gathered and returned to the client.

Page 19: Developing with Couchbase: App Development with Indexes and Queries

Simple Primary and Secondary Indexing

Page 20: Developing with Couchbase: App Development with Indexes and Queries

Example Document

[Figure: example document; callout: Document ID]

Page 21: Developing with Couchbase: App Development with Indexes and Queries

Define a primary index on the bucket

• Look up the document ID / key by key, range, prefix, suffix

Index definition
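A primary index of this kind can be sketched by emitting `meta.id` as the key. In the example below, `emit`, `range` and `prefix` are local helpers written for illustration, and the document IDs are hypothetical. Ordering uses plain JavaScript string comparison, which only approximates the server's collation, so the `"\uffff"` upper bound for prefix lookups is an assumption tied to this harness.

```javascript
const rows = [];

// Local stand-in for the emit() that the view engine normally provides.
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Primary index: key every document by its own ID.
function map(doc, meta) {
  emit(meta.id, null);
}

// Hypothetical document IDs.
["brewery_alaskan", "beer_amber", "beer_stout", "comment_f1e62"].forEach((id) => {
  map({}, { id: id });
});
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

// Range lookup: startkey <= key <= endkey.
function range(startkey, endkey) {
  return rows.filter((r) => r.key >= startkey && r.key <= endkey).map((r) => r.key);
}

// Prefix lookup expressed as a range.
function prefix(p) {
  return range(p, p + "\uffff");
}

console.log(range("beer_amber", "beer_amber")); // single key
console.log(prefix("beer_")); // all keys starting with "beer_"
```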

Page 22: Developing with Couchbase: App Development with Indexes and Queries

Define a secondary index on the bucket

• Look up an attribute by value, range, prefix, suffix

Index definition

Page 23: Developing with Couchbase: App Development with Indexes and Queries

Find documents by a specific attribute

• Let's find beers by brewery_id!

Page 24: Developing with Couchbase: App Development with Indexes and Queries

The index definition

[Figure: index definition; callouts: Value, Key]

Page 25: Developing with Couchbase: App Development with Indexes and Queries

The result set: beers keyed by brewery_id
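The original slides show the index definition and result set as images. A plausible reconstruction, offered only as a sketch, is a map function that emits `brewery_id` as the key. Emitting the beer name as the value is an assumption, as are the sample documents and the local `emit` stand-in.

```javascript
const rows = [];
let currentId = null;

// Local stand-in for the emit() that the view engine normally provides.
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// Key: the brewery a beer belongs to. Value: the beer name (kept small).
function map(doc, meta) {
  if (doc.type == "beer" && doc.brewery_id) {
    emit(doc.brewery_id, doc.name);
  }
}

// Hypothetical sample documents.
const docs = {
  beer_amber: { type: "beer", brewery_id: "brewery_north", name: "Amber" },
  beer_stout: { type: "beer", brewery_id: "brewery_south", name: "Stout" },
  beer_porter: { type: "beer", brewery_id: "brewery_north", name: "Porter" },
  brewery_north: { type: "brewery", name: "North Brewing" },
};

for (const id of Object.keys(docs)) {
  currentId = id;
  map(docs[id], { id: id });
}
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

// Equivalent of querying the view with key="brewery_north".
const northBeers = rows.filter((r) => r.key === "brewery_north").map((r) => r.value);
console.log(northBeers);
```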

Page 26: Developing with Couchbase: App Development with Indexes and Queries

View Best Practices

Page 27: Developing with Couchbase: App Development with Indexes and Queries

View writing guidance

• Move frequently used views out to a separate design document
– All views in a design document are updated at the same time
– This can result in increased index building time if all views are in a single design document, especially for frequently accessed views.
– However, grouping views into a smaller number of design documents improves overall performance
• Try to avoid computing too many things with one view
• Use built-in reduces where possible - custom reduces are not optimized
• Check for attribute existence

function(doc, meta) {
  if (doc.ingredient) {
    emit(doc.ingredient.ingredtext, null);
  }
}

Page 28: Developing with Couchbase: App Development with Indexes and Queries

View writing guidance

• Do not include the document in the view value
– Instead either use the GET / SET API or the API that includes documents filtered by the query [example: willIncludeDocs()]
– Emit either null or the ID instead (meta.id) in your key or value data

emit(doc.name, null)

• Don't emit too much data into a view value
– Use views to filter documents
– Then use the data path to access the matched documents
• Use Document Types to make views more selective

function(doc, meta) {
  if (doc.type == "player")
    emit(doc.experience, null);
}

Page 29: Developing with Couchbase: App Development with Indexes and Queries

What impact do views have on the system?

• Complexity of the index: CPU
• Size of the value emitted and selectivity: Disk size, I/O
• Replica index: Disk size, I/O, CPU
• Number of design docs: CPU, I/O, Disk size
– 4 active and 2 replica design documents are built in parallel by default
– Can be changed using the maxParallelIndexers and maxParallelReplicaIndexers parameters
• Compaction of views: CPU, I/O
• Rebalance time: Increases with views to support consistent query results during rebalance
– Can be disabled using the indexAwareRebalanceDisabled parameter

Page 30: Developing with Couchbase: App Development with Indexes and Queries

Views and OS caching

• File system cache availability for the index has a big impact on performance
• Indexes are disk based and should have sufficient file system cache available for faster query access
• In-house performance results show that by doubling system cache availability
– query latency reduces to half
– throughput increases by 50%
• Runs based on 10 million items with 16GB bucket quota and 4GB, 8GB system RAM availability for indexes

Page 31: Developing with Couchbase: App Development with Indexes and Queries

Query Pattern: Basic Aggregations

Page 32: Developing with Couchbase: App Development with Indexes and Queries

Use a built-in reduce function with a group query

• Let's find the average abv for each brewery!

Page 33: Developing with Couchbase: App Development with Indexes and Queries

We are reducing doc.abv with _stats

Page 34: Developing with Couchbase: App Development with Indexes and Queries

Group reduce (reduce by unique key)
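The slides for this pattern are images, so the following is a sketch under assumptions rather than the original view. It emits `doc.abv` keyed by `brewery_id`, then imitates what a `_stats` reduce returns per unique key (sum, count, min, max, sumsqr). The `groupStats` function is a local stand-in for the built-in reduce queried with grouping, and the sample documents are hypothetical. The average is derived from `sum / count` in the application.

```javascript
const rows = [];

// Local stand-in for the emit() that the view engine normally provides.
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Emit each beer's abv under its brewery.
function map(doc, meta) {
  if (doc.type == "beer" && doc.brewery_id && typeof doc.abv == "number") {
    emit(doc.brewery_id, doc.abv);
  }
}

// Hypothetical sample documents.
[
  { type: "beer", brewery_id: "brewery_north", abv: 5 },
  { type: "beer", brewery_id: "brewery_south", abv: 4 },
  { type: "beer", brewery_id: "brewery_north", abv: 7 },
  { type: "brewery", name: "North Brewing" },
].forEach((doc, i) => map(doc, { id: "doc_" + i }));
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

// Stand-in for a _stats reduce grouped by unique key.
function groupStats(sortedRows) {
  const groups = new Map();
  for (const row of sortedRows) {
    const s = groups.get(row.key) ||
      { sum: 0, count: 0, min: Infinity, max: -Infinity, sumsqr: 0 };
    s.sum += row.value;
    s.count += 1;
    s.min = Math.min(s.min, row.value);
    s.max = Math.max(s.max, row.value);
    s.sumsqr += row.value * row.value;
    groups.set(row.key, s);
  }
  return Array.from(groups, ([key, value]) => ({ key: key, value: value }));
}

const stats = groupStats(rows);
for (const g of stats) {
  console.log(g.key, "average abv:", g.value.sum / g.value.count);
}
```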

Page 35: Developing with Couchbase: App Development with Indexes and Queries

Query Pattern: Time-based Rollups

Page 36: Developing with Couchbase: App Development with Indexes and Queries

Find patterns in beer comments by time

{ "type": "comment", "about_id": "beer_Enlightened_Black_Ale", "user_id": 525, "text": "tastes like college!", "updated": "2010-07-22 20:00:20"}

{ "id": "f1e62" }

[Callout: timestamp]

Page 37: Developing with Couchbase: App Development with Indexes and Queries

Query with group_level=2 to get monthly rollups

Page 38: Developing with Couchbase: App Development with Indexes and Queries

dateToArray() is your friend

• String or Integer based timestamps
• Output optimized for group_level queries
• array of JSON numbers: [2012,9,21,11,30,44]
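To make the output shape concrete, here is a small stand-in written for this example. It is not the server's built-in `dateToArray()`; the name `dateToArraySketch` is hypothetical, and it only handles "YYYY-MM-DD hh:mm:ss" strings and millisecond timestamps (read as UTC).

```javascript
// Stand-in for the built-in dateToArray() helper, written for this example.
// Accepts "YYYY-MM-DD hh:mm:ss" strings or millisecond timestamps.
function dateToArraySketch(input) {
  if (typeof input === "number") {
    const d = new Date(input);
    return [
      d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(),
      d.getUTCHours(), d.getUTCMinutes(), d.getUTCSeconds(),
    ];
  }
  // Parse the string by hand to avoid engine-specific Date parsing.
  const m = /^(\d{4})-(\d{2})-(\d{2})[ T](\d{2}):(\d{2}):(\d{2})/.exec(String(input));
  return m ? m.slice(1, 7).map(Number) : null;
}

console.log(dateToArraySketch("2012-09-21 11:30:44"));
console.log(dateToArraySketch(Date.UTC(2012, 8, 21, 11, 30, 44)));
```

Both calls yield the array shown on the slide, with the month as a 1-based number.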

Page 39: Developing with Couchbase: App Development with Indexes and Queries

group_level=2 results

• Monthly rollup
• Sorted by time
– sort the query results in your application if you want to rank by value
– no chained map-reduce

Page 40: Developing with Couchbase: App Development with Indexes and Queries

group_level=3 - daily results - great for graphing

• Daily, hourly, minute or second rollup all possible with the same index.
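The way one index serves several rollup granularities can be sketched as follows. The rows are hypothetical index entries whose keys are `dateToArray()`-style arrays, and `rollup` is a local stand-in for a counting reduce queried with `group_level=n`; the slides do not say which reduce was used, so counting is an assumption. Because rows are already in key order, equal prefixes are adjacent and a single pass is enough.

```javascript
// Hypothetical index rows: one row per comment, keyed by
// [year, month, day, hour, minute, second], already in key order.
const rows = [
  { key: [2010, 7, 22, 20, 0, 20], value: null },
  { key: [2010, 7, 22, 21, 15, 0], value: null },
  { key: [2010, 7, 23, 9, 5, 1], value: null },
  { key: [2010, 8, 1, 12, 0, 0], value: null },
];

// Stand-in for a counting reduce queried with group_level=n:
// rows whose keys share the first n elements are counted together.
function rollup(sortedRows, groupLevel) {
  const out = [];
  for (const row of sortedRows) {
    const prefix = row.key.slice(0, groupLevel);
    const last = out[out.length - 1];
    if (last && JSON.stringify(last.key) === JSON.stringify(prefix)) {
      last.value += 1;
    } else {
      out.push({ key: prefix, value: 1 });
    }
  }
  return out;
}

const monthly = rollup(rows, 2);
const daily = rollup(rows, 3);
console.log(JSON.stringify(monthly));
console.log(JSON.stringify(daily));
```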

Page 41: Developing with Couchbase: App Development with Indexes and Queries

Query Pattern: Leaderboard

Page 42: Developing with Couchbase: App Development with Indexes and Queries

Aggregate value stored in a document

• Let's find the top-rated beers!

{ "brewery": "New Belgium Brewing", "name": "1554 Enlightened Black Ale", "abv": 5.5, "description": "Born of a flood...", "category": "Belgian and French Ale", "style": "Other Belgian-Style Ales", "updated": "2010-07-22 20:00:20", "ratings" : { "jchris" : 5, "scalabl3" : 4, "damienkatz" : 1 }, "comments" : [ "f1e62", "6ad8c" ] }

[Callout: ratings]

Page 43: Developing with Couchbase: App Development with Indexes and Queries

Sort each beer by its average rating

• Let's find the top-rated beers!

[Callout: average]
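The view for this slide is an image in the original, so the code below is a guess at its shape rather than a transcription. The map function averages the `ratings` object and emits that average as the key, so the index itself is ordered by rating. The `emit` stand-in, the second and third sample documents, and the in-memory descending sort (in place of a descending query) are assumptions.

```javascript
const rows = [];

// Local stand-in for the emit() that the view engine normally provides.
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Key each beer by the average of its ratings; the value is the beer name.
function map(doc, meta) {
  if (doc.ratings) {
    let total = 0;
    let count = 0;
    for (const user in doc.ratings) {
      total += doc.ratings[user];
      count += 1;
    }
    if (count > 0) {
      emit(total / count, doc.name);
    }
  }
}

map({ name: "1554 Enlightened Black Ale", ratings: { jchris: 5, scalabl3: 4, damienkatz: 1 } }, { id: "beer_1554" });
map({ name: "Hypothetical Stout", ratings: { a: 5, b: 4 } }, { id: "beer_stout" });
map({ name: "No Ratings Yet" }, { id: "beer_new" });

// Highest average first, as a descending query on the view would return.
rows.sort((a, b) => b.key - a.key);
console.log(rows.map((r) => r.value));
```

Documents without ratings are skipped, so they never appear on the leaderboard.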

Page 44: Developing with Couchbase: App Development with Indexes and Queries

Questions?

Page 45: Developing with Couchbase: App Development with Indexes and Queries

THANK YOU

@jchris