
Couchbase Server 2.0 and Incremental Map Reduce for Real-Time Analytics



Couchbase Server 2.0 - Webinar Series

Couchbase Server 2.0 Use Cases Overview

Introducing Couchbase Server 2.0

Couchbase Server 2.0 and Indexing/Querying

Couchbase Server 2.0 and Incremental Map Reduce for Real-Time Analytics

Couchbase Server 2.0 and Cross Data Center Replication

Couchbase Server 2.0 and Full-Text Search Integration

http://www.couchbase.com/webinars

Wednesday, October 10, 2012

Incremental Map Reduce for Real-Time Analytics

Jasdeep Jaitla, Technical Evangelist

New in Two

• JSON support
• Indexing and Querying
• Cross data center replication
• Incremental Map Reduce


What we'll talk about

• Quick Relational vs Document Databases
• Why Views are Helpful
• Anatomy of Views
  • Map
  • Reduce
• Simple Example of Map Reduce
• Use Case: Analyzing Reddit in Real-Time
  • Demo
  • Breakdown
• Final Words on Views

DOCUMENT DATABASE PRIMER

Relational vs Document Data Model

Relational data model: highly structured table organization with rigidly defined data formats and record structure.

Document data model: a collection of complex documents with arbitrary, nested data formats and varying "record" format.


SQL Normalized Data

Users
KEY   First     Last     ZIP_ID
1     Jasdeep   Jaitla   2
2     Joe       Smith    2
3     Ali       Dodson   2
4     John      Doe      3

Addresses
ZIP_ID   CITY   ZIP     STATE
1        DEN    30303   CO
2        MV     94040   CA
3        CHI    60609   IL
4        NY     10010   NY

Users.ZIP_ID is a foreign key into Addresses.ZIP_ID.

To get information about a specific user, you perform a join across two tables:

SELECT * FROM Users u INNER JOIN Addresses a ON u.zip_id = a.zip_id WHERE u.key = 1


Documents are Aggregates

All data in a single document:

User row (1, Jasdeep, Jaitla) + address row (SF, 94103, CA) =

{
  "ID": 1,
  "First": "Jasdeep",
  "Last": "Jaitla",
  "ZIP": "94103",
  "CITY": "SF",
  "STATE": "CA"
}

couchbase.get("user::1")

Document data is an aggregate.
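The aggregation above can be sketched in plain JavaScript. This is a minimal illustration, not SDK code: an in-memory `Map` stands in for the bucket, the row shapes copy the values shown on this slide, and the `toUserDocument` helper is hypothetical. The real client's `get` call differs by SDK.

```javascript
// Hypothetical normalized rows, using the values shown on this slide.
const userRow = { KEY: 1, First: "Jasdeep", Last: "Jaitla", ZIP_ID: 2 };
const addressRow = { ZIP_ID: 2, CITY: "SF", ZIP: "94103", STATE: "CA" };

// Hypothetical helper: fold the joined rows into one self-contained document.
function toUserDocument(user, address) {
  return {
    ID: user.KEY,
    First: user.First,
    Last: user.Last,
    ZIP: address.ZIP,
    CITY: address.CITY,
    STATE: address.STATE,
  };
}

// An in-memory Map stands in for the bucket; keys follow the "user::<id>" style.
const bucket = new Map();
const doc = toUserDocument(userRow, addressRow);
bucket.set("user::" + doc.ID, JSON.stringify(doc));

// One key lookup returns the whole aggregate; no join is needed at read time.
const fetched = JSON.parse(bucket.get("user::1"));
console.log(fetched.CITY); // prints SF
```

The join happens once, when the document is written, instead of on every read.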


Document Database Schema is Flexible & Dynamic

Just add information to a document:

{
  "ID": 1,
  "FIRST": "Jasdeep",
  "LAST": "Jaitla",
  "ZIP": "94103",
  "CITY": "SF",
  "STATE": "CA",
  "STATUS": {
    "TEXT": "Wow!",
    "GEO_LOC": "27.4",
    "LIKES": 45
  }
}

MAP-REDUCE BASICS

Document Keys

Document keys come in many flavors:

• Human Readable
• Incremental Counter Index
• UUID
• Timestamp Based
• Social Media Account ID
• Random Numbers

Q: Does Couchbase have a mechanism for creating unique keys?

Q: If I use unique usernames or emails for keys, will I need a map query?

Q: If I use UUIDs for IDs, will I need a map-reduce to find documents?


Document Keys

A: If you cannot determine a document's key from information you already have, you will need secondary indexes, either Views (Map or Map/Reduce) or Elasticsearch, to find documents.

There are many patterns for key creation; designing your keys is both a skill and an art.


Document Keys

A: If you want to find documents based on more than one parameter, you may need Views as well.

In many cases lookups can also be done without Views, using a lookup pattern, but that is not always the case, especially for time-based or geo-based values.
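A lookup pattern can be sketched as follows. This is a minimal illustration, not SDK code: an in-memory `Map` stands in for the bucket, and the `email::<address>` key convention, the `saveUser`/`findUserByEmail` helpers, and the sample email are all hypothetical. The idea is to store a small extra document whose key can be derived from the value you search by, and whose body is the key of the main document, so the search becomes two key lookups instead of a View query.

```javascript
// In-memory stand-in for a Couchbase bucket (key -> JSON string).
const bucket = new Map();

// Hypothetical helper: store the user document plus a lookup document
// whose key is derived from the email and whose value is the user's key.
function saveUser(id, user) {
  const userKey = "user::" + id;
  bucket.set(userKey, JSON.stringify(user));
  bucket.set("email::" + user.email.toLowerCase(), JSON.stringify(userKey));
  return userKey;
}

// Hypothetical helper: two key lookups, no secondary index required.
function findUserByEmail(email) {
  const pointer = bucket.get("email::" + email.toLowerCase());
  if (pointer === undefined) return null;
  const raw = bucket.get(JSON.parse(pointer));
  return raw === undefined ? null : JSON.parse(raw);
}

saveUser(1, { first: "Jasdeep", last: "Jaitla", email: "jasdeep@example.com" });
console.log(findUserByEmail("Jasdeep@Example.com").last); // prints Jaitla
```

This only works when the lookup value is known exactly. Range questions such as "posts from the last hour" or "everything near this location" have no single derivable key, which is where Views come in.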

ANATOMY OF A VIEW

Buckets >> Design Documents >> Views

Couchbase Bucket
  Design Document 1: View, View, View
  Design Document 2: View, View

• Indexers are allocated per design document.
• All views in a design document are updated at the same time.
• Views can only access data in their bucket's namespace.
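For reference, a design document is itself a JSON object whose `views` property holds one entry per view, each with a `map` source string and an optional `reduce`. The sketch below builds one as a JavaScript object; the view names (`by_email`, `by_username`) are hypothetical, and `_count` is one of the built-in reduce functions.

```javascript
// A design document groups views; each view is a map function source string
// plus an optional reduce (a built-in such as "_count", or custom source).
const designDoc = {
  views: {
    by_email: {
      map: "function (doc, meta) { emit(doc.email, meta.id); }",
    },
    by_username: {
      map: "function (doc, meta) { emit(doc.username, doc.email); }",
      reduce: "_count",
    },
  },
};

// All views in one design document share an indexer and are updated together,
// so which views to group together is a design choice.
const viewNames = Object.keys(designDoc.views);
console.log(viewNames.join(", ")); // prints by_email, by_username

// The design document is stored as JSON, so it must serialize cleanly.
const body = JSON.stringify(designDoc);
```

Putting rarely queried or expensive views in a separate design document keeps them from slowing the index updates of the others.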

Map() function = index

Every document passes through each View's Map() function.

function (doc, meta) {
  emit(doc.username, doc.email);
}

• doc: the JSON document
• meta: the document's metadata
• emit(): creates a row in the index
• first argument (doc.username): the indexed key
• second argument (doc.email): the output value(s)
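To see what "Map() function = index" means, the sketch below runs the slide's map function over two documents outside of Couchbase. The `buildIndex` harness and the sample documents are hypothetical: the harness supplies its own `emit`, records one `{id, key, value}` row per call, and sorts rows by key using plain JavaScript string comparison, a simplification of the view engine's own collation. The real indexer also persists and incrementally updates these rows; this only mimics the output shape.

```javascript
// Rows collected while the map function runs.
let rows = [];
let currentId = null;

// Local stand-in for the emit() that Couchbase provides inside map functions.
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// The map function from the slide.
function map(doc, meta) {
  emit(doc.username, doc.email);
}

// Hypothetical harness: pass every document through map(), then sort by key.
function buildIndex(docs) {
  rows = [];
  for (const [id, doc] of Object.entries(docs)) {
    currentId = id;
    map(doc, { id: id, type: "json" });
  }
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const index = buildIndex({
  "u::1": { username: "joe", email: "joe@example.com" },
  "u::2": { username: "jasdeep", email: "jasdeep@example.com" },
});
console.log(index.map((r) => r.key).join(",")); // prints jasdeep,joe
```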


Text or Numeric Based Keys

function (doc, meta) {
  emit(doc.email, meta.id);
}

The indexed key is text (doc.email); the value is the document key (meta.id).

doc.email            meta.id
[email protected]   u::1
[email protected]   u::2
[email protected]   u::3


Array Based Index Keys

function (doc, meta) {
  emit(dateToArray(doc.timestamp), 1);
}

The indexed key is an array. Array-based index keys are sorted element by element, starting with the first element.

dateToArray(doc.timestamp)   value
[2012,10,9,18,45]            1
[2012,9,26,11,15]            1
[2012,8,13,2,12]             1
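Element-by-element ordering of array keys can be sketched with a small comparator. `compareArrayKeys` is a hypothetical helper that handles only numeric elements, which is enough for `dateToArray` output; the view engine's own collation also covers strings, objects, and mixed types. The sample keys are the ones shown on the slide.

```javascript
// Compare two arrays element by element; the first differing element decides.
// If one array is a prefix of the other, the shorter one sorts first.
function compareArrayKeys(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return a.length - b.length;
}

const keys = [
  [2012, 10, 9, 18, 45],
  [2012, 9, 26, 11, 15],
  [2012, 8, 13, 2, 12],
];

// Sorted ascending: the years tie, so the month (second element) decides.
const sorted = keys.slice().sort(compareArrayKeys);
console.log(JSON.stringify(sorted[0])); // prints [2012,8,13,2,12]

// A shorter prefix sorts before any longer key that starts with it, which is
// why a prefix such as [2012, 9] can mark the start of a key range.
console.log(compareArrayKeys([2012, 9], [2012, 9, 26, 11, 15]) < 0); // prints true
```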

Querying Views


Beer Database Example

doc:

{ "name": "Aventinus Weizenstarkbier / Doppel Weizen Bock", "abv": 8.2, "ibu": 0, "srm": 0, "upc": 0, "type": "beer", "brewery_id": "110f1f2012", "updated": "2010-07-22 20:00:20", "description": "Dark-ruby, almost black-colored and streaked with fine top-fermenting yeast, this beer has a compact and persistent head. This is a very intense wheat doppelbock with a complex spicy chocolate-like arome with a hint of banana and raisins. On the palate, you experience a soft touch and on the tongue it is very rich and complex, though fresh with a hint of caramel. It finishes in a rich soft and lightly bitter impression.", "style": "South German-Style Weizenbock", "category": "German Ale"}

meta:

{ "id": "110f37fa30", "rev": "1-000000000", "expiration": 0, "flags": 0, "type": "json"}

• abv: alcohol by volume
• brewery_id: used as the index key
• meta.id: the document key

The index definition

[Figure: the view's map function]

• emit(): creates a row
• first argument: the indexed key
• second argument: the value(s)

The result set: beers keyed by brewery_id

Each row contains:

• key: brewery_id
• id: document key (of the beer)
• value: alcohol by volume (abv)

We are reducing doc.abv with _stats

Add the _stats built-in reduction to the view.

Use a built-in reduce function with a group query

Find average alcohol by volume per brewery:

• add the _stats built-in reduction
• set group=true & reduce=true

Group reduce (reduce by unique key)

With group=true & reduce=true, each unique brewery_id gets one result row. The _stats output for that row includes the number of beers by this brewery, the max abv, and the min abv.
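The group reduce above can be imitated locally to show what `_stats` produces. This is a sketch, not Couchbase code: the map step assumes a view that emits `doc.brewery_id` as the key and `doc.abv` as the value for documents whose `type` is "beer" (consistent with the result set described above, though the exact index definition is not reproduced here), `groupStats` is a hypothetical helper, and the beer documents are made up. Each group's `_stats` object holds `sum`, `count`, `min`, `max`, and `sumsqr`; the average asked for earlier is then `sum / count`, computed by the client.

```javascript
// Hypothetical beer documents (keys and values are illustrative only).
const docs = {
  beer_a: { type: "beer", brewery_id: "brewery_1", abv: 8.2 },
  beer_b: { type: "beer", brewery_id: "brewery_1", abv: 5.4 },
  beer_c: { type: "beer", brewery_id: "brewery_2", abv: 4.5 },
  brewery_1: { type: "brewery", name: "Example Brewery" },
};

// Assumed map step: one row per beer, key = brewery_id, value = abv.
function mapRows(allDocs) {
  const rows = [];
  for (const [id, doc] of Object.entries(allDocs)) {
    if (doc.type === "beer") rows.push({ id: id, key: doc.brewery_id, value: doc.abv });
  }
  return rows;
}

// Imitation of group=true with the _stats reduce: one stats object per unique key.
function groupStats(rows) {
  const groups = new Map();
  for (const { key, value } of rows) {
    let s = groups.get(key);
    if (!s) {
      s = { sum: 0, count: 0, min: Infinity, max: -Infinity, sumsqr: 0 };
      groups.set(key, s);
    }
    s.sum += value;
    s.count += 1;
    s.min = Math.min(s.min, value);
    s.max = Math.max(s.max, value);
    s.sumsqr += value * value;
  }
  return groups;
}

const stats = groupStats(mapRows(docs));
for (const [brewery, s] of stats) {
  // Average abv per brewery is derived on the client from sum and count.
  console.log(brewery, s.count, (s.sum / s.count).toFixed(2));
}
// prints "brewery_1 2 6.80" then "brewery_2 1 4.50"
```

Returning `sum` and `count` rather than a finished average is what lets partial results be combined correctly later.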

Using Incremental Map-Reduce: Use Case Example

reddalyzer.com

Real-Time Analysis of Reddit using Couchbase & Clojure

Quick Demo


Sample Reddit Post - Document

{ "over_18": false, "banned_by": null, "is_self": false, "link_flair_text": null, "hidden": false, "edited": false, "kind": "link", "subreddit_id": "t5_2qh55", "downs": 5, "domain": "ibelieveicanfry.com", "selftext": "", "approved_by": null, "score": 5, "author": "ibelieveicanfry", "name": "t3_yph1p", "num_comments": 0, "selftext_html": null, "link_flair_css_class": null, "likes": null, "media_embed": { }, "media": null, "title": "I don't buy the bottled Thai Sweet Chili Sauce anymore...", "thumbnail": "", "permalink": "/r/food/comments/yph1p/i_dont_buy_the_bottled_thai_sweet_chili_sauce/", "url": "http://www.ibelieveicanfry.com/2012/08/thai-sweet-chili-sauce.html", "created": 1345745189, "num_reports": null, "saved": false, "subreddit": "food", "ups": 10, "created_utc": 1345745189, "author_flair_css_class": null, "id": "yph1p", "author_flair_text": null, "clicked": false}

Fields used by the view:

• "score": 5
• "subreddit": "food"
• "created_utc": 1345745189
• "kind": "link"

Map Subreddits by Name & Day

function (doc, meta) {
  // Skip documents that aren't JSON
  if (meta.type == "json") {
    // Skip docs that aren't links
    if (doc.kind == "link") {
      // created_utc is in seconds; Date expects milliseconds
      var dt = new Date(doc.created_utc * 1000);

      // Get day of week, but start the week on Saturday, not Sunday, so that
      // we can pull out the weekend easily (Saturday = 0, Sunday = 1).
      var ssday = dt.getUTCDay() + 1;
      if (ssday == 7) ssday = 0;

      emit([doc.subreddit, ssday], {hour: dt.getUTCHours(), score: doc.score});
    }
  }
}

• ensure meta.type == "json"
• ensure doc.kind == "link"
• convert doc.created_utc to a Date object
• calculate the day of week
• emit (create) a row
• key: order by doc.subreddit, then by day of week
• value: output the hour of day and the karma score
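One way to check this map function before publishing it is to call it directly on the sample post from the previous slide. In the sketch below, `emit` is a local stand-in that collects rows (inside Couchbase the view engine supplies it), only the four fields the map function reads are copied from the sample document, and the second document and both `meta` objects are hypothetical stubs. For `created_utc` 1345745189 (Thursday, 23 August 2012, 18:06 UTC) the Saturday-first numbering yields day 5 and hour 18.

```javascript
// Rows produced by the map function during this local run.
const rows = [];

// Local stand-in for the emit() that the view engine provides.
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// The map function from the slide (comments trimmed).
function map(doc, meta) {
  if (meta.type == "json") {
    if (doc.kind == "link") {
      var dt = new Date(doc.created_utc * 1000);
      var ssday = dt.getUTCDay() + 1; // Saturday = 0, Sunday = 1, ..., Friday = 6
      if (ssday == 7) ssday = 0;
      emit([doc.subreddit, ssday], { hour: dt.getUTCHours(), score: doc.score });
    }
  }
}

// Fields copied from the sample Reddit post; meta is a minimal stub.
const post = { kind: "link", subreddit: "food", created_utc: 1345745189, score: 5 };
map(post, { id: "yph1p", type: "json" });

// A hypothetical document that is not a link produces no rows.
map({ kind: "comment", subreddit: "food", created_utc: 1345745189, score: 1 },
    { id: "c1", type: "json" });

console.log(JSON.stringify(rows));
// prints [{"key":["food",5],"value":{"hour":18,"score":5}}]
```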

Map Output

emit([doc.subreddit, ssday], {hour: dt.getUTCHours(), score: doc.score});

{"id":"zx4sc","key":["funny",0],"value":{"hour":9,"score":0}},
{"id":"zxak2","key":["funny",0],"value":{"hour":13,"score":1}},
{"id":"ytw3t","key":["funny",1],"value":{"hour":0,"score":938}},
{"id":"yv3uf","key":["funny",1],"value":{"hour":19,"score":2508}},
......

• key: the indexed key, [subreddit, day of week]
• value: the output value(s), {hour, score}

Reduce Map

function (keys, values, rereduce) {
  var out = {freqs: [], score: []};
  // Prefill the arrays with zeroes, one slot per hour of the day
  for (var i = 0; i < 24; i++) {
    out.freqs[i] = 0;
    out.score[i] = 0;
  }
  for (var v = 0; v < values.length; v++) {
    if (!rereduce) {
      // Values are the output of map: count the post and add its score
      out.freqs[values[v].hour] += 1;
      out.score[values[v].hour] += values[v].score;
    } else {
      // Values are the output of earlier reduce calls: combine the arrays
      for (var h = 0; h < 24; h++) {
        out.freqs[h] += values[v].freqs[h];
        out.score[h] += values[v].score[h];
      }
    }
  }
  return out;
}

For every row, increment the post count and post score (karma) for that row's hour.

Query:

http://localhost:8092/reddalyzr/_design/reddit/_view/posthours?stale=update_after

Array of results (index = UTC hour of day):

{
  "freqs": [ 178344, 174476, 171569, 161836, 146411, 120881, 94139, 75880, 62617, 56553, 57811, 70185, 88880, 114252, 137301, 156750, 166376, 172562, 177094, 182093, 180485, 180434, 178706, 176525 ],
  "score": [ 2856922, 2688783, 2392233, 1954973, 1623642, 1355241, 1187087, 1061364, 1009152, 1165220, 1506009, 2207945, 3081796, 3868605, 4441859, 4633668, 4200795, 4291777, 3986492, 3757385, 3420142, 3032258, 3029148, 2975291 ]
}
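Because the view engine may reduce rows in batches and later combine those partial results with `rereduce = true`, a reduce function must give the same answer either way. The sketch below checks that property locally. The `reduce` body is the function from the slide with its loop variables declared; the first four mapped values come from the Map Output slide, the fifth is a hypothetical extra row, and the batch split is arbitrary.

```javascript
// Reduce function from the slide: 24 hourly buckets of post counts and scores.
function reduce(keys, values, rereduce) {
  var out = { freqs: [], score: [] };
  for (var i = 0; i < 24; i++) {
    out.freqs[i] = 0;
    out.score[i] = 0;
  }
  for (var v = 0; v < values.length; v++) {
    if (!rereduce) {
      out.freqs[values[v].hour] += 1;
      out.score[values[v].hour] += values[v].score;
    } else {
      for (var h = 0; h < 24; h++) {
        out.freqs[h] += values[v].freqs[h];
        out.score[h] += values[v].score[h];
      }
    }
  }
  return out;
}

// Map output values ({hour, score}); the last one is hypothetical.
const mapped = [
  { hour: 9, score: 0 },
  { hour: 13, score: 1 },
  { hour: 0, score: 938 },
  { hour: 19, score: 2508 },
  { hour: 19, score: 2 },
];

// Reduce everything in one pass...
const direct = reduce(null, mapped, false);

// ...or reduce two batches separately and combine them with rereduce.
const partials = [reduce(null, mapped.slice(0, 2), false), reduce(null, mapped.slice(2), false)];
const combined = reduce(null, partials, true);

console.log(JSON.stringify(direct) === JSON.stringify(combined)); // prints true
console.log(direct.freqs[19], direct.score[19]); // prints 2 2510
```

This property is what allows the index to store partial reductions and update them incrementally instead of recomputing the whole result.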

View Updating

Couchbase Bucket
  Design Document 1: View, View, View
  Design Document 2: View, View

• Views update every 3 seconds or 5000 document operations.
• This is a configurable setting.
• Client queries can also trigger an update by using the stale=false parameter.
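Query parameters such as `stale`, `group`, and `reduce` are passed on the view's URL. The sketch below assembles query URLs like the one on the Reduce Map slide using the standard `URL` class; the `viewUrl` helper is hypothetical, while the host, port, bucket, design document, and view names are taken from that slide. In Couchbase Server 2.0, `stale` accepts `ok`, `update_after`, and `false`.

```javascript
// Hypothetical helper: build a view query URL (view API on port 8092).
function viewUrl(host, bucket, ddoc, view, params) {
  const url = new URL(`http://${host}:8092/${bucket}/_design/${ddoc}/_view/${view}`);
  for (const [name, value] of Object.entries(params)) {
    url.searchParams.set(name, String(value));
  }
  return url.toString();
}

// Return the current index contents and trigger an update afterwards.
const fast = viewUrl("localhost", "reddalyzr", "reddit", "posthours", { stale: "update_after" });

// Trigger an index update and wait for it before returning results.
const fresh = viewUrl("localhost", "reddalyzr", "reddit", "posthours", { stale: false });

console.log(fast);
// prints http://localhost:8092/reddalyzr/_design/reddit/_view/posthours?stale=update_after
console.log(fresh);
// prints http://localhost:8092/reddalyzr/_design/reddit/_view/posthours?stale=false
```

`stale=false` trades latency for freshness, so it fits reads that must reflect recent writes. For dashboards like the Reddit demo, the default behavior is usually enough.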

Why is it Incremental?

View indexes are append-only B+ trees, so new data is simply added to them, and they are compacted and optimized automatically.

Views are only re-indexed if you change their definition and republish them. The original index stays available until the new, redefined index finishes indexing.
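The incremental idea can be illustrated with a toy index that remembers which rows came from which document, so a change only re-runs the map function for the affected document. This is a conceptual sketch with hypothetical names (`IncrementalIndex`, `applyChanges`) and made-up posts; it keeps rows in memory and does not model Couchbase's append-only B+ tree files, compaction, or stored reduce values.

```javascript
// Conceptual sketch: an index that re-runs map() only for documents that changed.
class IncrementalIndex {
  constructor(mapFn) {
    this.mapFn = mapFn; // (doc) => array of [key, value] pairs
    this.rowsByDoc = new Map(); // doc id -> rows emitted for that doc
    this.mapCalls = 0; // how many times map() has run
  }

  // Apply a batch of changes: updated docs are re-mapped, deleted ones (doc === null) dropped.
  applyChanges(changes) {
    for (const { id, doc } of changes) {
      if (doc === null) {
        this.rowsByDoc.delete(id);
      } else {
        this.mapCalls += 1;
        this.rowsByDoc.set(id, this.mapFn(doc).map(([key, value]) => ({ id, key, value })));
      }
    }
  }

  // All rows, sorted by key (plain string comparison as a simplification).
  rows() {
    const all = [].concat(...this.rowsByDoc.values());
    return all.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  }
}

const index = new IncrementalIndex((doc) => [[doc.subreddit, doc.score]]);
index.applyChanges([
  { id: "p1", doc: { subreddit: "funny", score: 938 } },
  { id: "p2", doc: { subreddit: "food", score: 5 } },
]);
// A later update touches one document, so map() runs once more, not twice.
index.applyChanges([{ id: "p2", doc: { subreddit: "food", score: 7 } }]);

console.log(index.mapCalls); // prints 3
console.log(index.rows().map((r) => r.key + ":" + r.value).join(",")); // prints food:7,funny:938
```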

Questions?

[email protected]

Twitter: @scalabl3