Friday, April 26, 13

NoSQL Matters 2013 - Introduction to Map Reduce with Couchbase 2.0


Introduction to Map Reduce and how it is used in Couchbase Server 2.0 to query documents


Introduction to Map Reduce with Couchbase

Tugdual Grall / @tgrall

NoSQL Matters '13 - Cologne - April 25th 2013


About Me

• Tugdual "Tug" Grall
  - Couchbase: Technical Evangelist
  - eXo: CTO
  - Oracle: Developer/Product Manager, mainly Java/SOA
  - Developer in consulting firms
• Web
  - @tgrall
  - http://blog.grallandco.com
  - tgrall
• NantesJUG co-founder
• Pet project: http://www.resultri.com


What's the Problem?

• Lots of Data / Big Data
• SaaS / Cloud Computing
• Big Users


Solution

Distribute:
• the data
• the processing of the data


Map Reduce

MapReduce is a programming model for processing large data sets, and the name of an implementation of the model by Google. MapReduce is typically used to do distributed computing on clusters of computers.

http://research.google.com/archive/mapreduce.html


In detail

• Developer specifies 2 methods:
  - map (in_key, in_value) -> list(out_key, intermediate_value)
    • Processes input data
    • Produces key/value pairs
  - reduce (out_key, list(intermediate_value)) -> list(out_value)
    • Combines all intermediate values for a particular key
    • Produces a set of merged output values
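The two signatures above can be sketched with a word count run in a single process. This is only an illustration of the programming model, not of any particular engine; the names mapFn, reduceFn and runMapReduce and the sample inputs are hypothetical.

```javascript
// map(in_key, in_value) -> list(out_key, intermediate_value)
// Here: in_key is a document id, in_value is its text; emits [word, 1] pairs.
function mapFn(docId, text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter(function (word) { return word.length > 0; })
    .map(function (word) { return [word, 1]; });
}

// reduce(out_key, list(intermediate_value)) -> list(out_value)
// Here: sums all the 1s collected for one word.
function reduceFn(word, counts) {
  var total = 0;
  for (var i = 0; i < counts.length; i++) {
    total += counts[i];
  }
  return [total];
}

// Single-process driver (hypothetical): map every input, group the
// intermediate values by key, then reduce each group.
function runMapReduce(inputs) {
  var groups = Object.create(null);
  Object.keys(inputs).forEach(function (docId) {
    mapFn(docId, inputs[docId]).forEach(function (pair) {
      var key = pair[0];
      if (!groups[key]) groups[key] = [];
      groups[key].push(pair[1]);
    });
  });
  var output = {};
  Object.keys(groups).forEach(function (key) {
    output[key] = reduceFn(key, groups[key]);
  });
  return output;
}

var wordCounts = runMapReduce({
  doc1: "map reduce with couchbase",
  doc2: "map and reduce"
});
console.log(wordCounts);
```

In a distributed setting the map calls and the per-key reduce calls can run on different machines, because neither depends on shared state.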


Execution


Most  common  use  case

©  Yahoo  inc.


What  about  Couchbase?


Couchbase Open Source Project

• Leading NoSQL database project focused on distributed database technology and surrounding ecosystem
• Supports both key-value and document-oriented use cases
• All components are available under the Apache License 2.0
• Obtained as packaged software in both enterprise and community editions


Couchbase Server Core Principles

• Easy Scalability: grow cluster without application changes, without downtime, with a single click
• Consistent High Performance: consistent sub-millisecond read and write response times with consistent high throughput
• Always On 24x365: no downtime for software upgrades, hardware maintenance, etc.
• Flexible Data Model: JSON document model with no fixed schema


Additional Couchbase Server Features

• Built-in clustering: all nodes equal
• Data replication with auto-failover
• Zero-downtime maintenance
• Built-in managed cache
• Append-only storage layer
• Online compaction
• Monitoring and admin API & UI
• SDK for a variety of languages


Couchbase Server 2.0 Architecture

[Figure: architecture diagram with two halves, Data Manager and Cluster Manager]

• Data Manager
  - Components: Query Engine, Moxi, Memcached, Couchbase EP Engine, storage interface, New Persistence Layer
  - Ports: 8092 (Query API), 11210 (Memcapable 2.0), 11211 (Memcapable 1.0)
• Cluster Manager (Erlang/OTP)
  - Components: Heartbeat, Process monitor, Global singleton supervisor, Configuration manager, Rebalance orchestrator, Node health monitor, vBucket state and replication manager, http REST management API / Web UI
  - The diagram marks some components as "on each node" and others as "one per cluster"
  - Ports: 8091 (HTTP), 4369 (Erlang port mapper), 21100 - 21199 (Distributed Erlang)


Couchbase Server 2.0 Architecture

[Figure: the architecture diagram again, annotated with implementation languages; Memcached is now shown as Object-level Cache and the New Persistence Layer as Disk Persistence]

• Server/Cluster Management & Communication (Erlang)
• RAM Cache, Indexing & Persistence Management (C & V8)
• The Unreasonable Effectiveness of C by Damien Katz


Basic Operation

[Figure: Couchbase Server cluster with three servers, each holding ACTIVE and REPLICA docs (User Configured Replica Count = 1); App Server 1 and App Server 2 each use the Couchbase client library with its cluster map to READ/WRITE/UPDATE docs]

• Docs distributed evenly across servers
• Each server stores both active and replica docs
  - Only one server active at a time
• Client library provides app with simple interface to database
• Cluster map provides map to which server doc is on
  - App never needs to know
• App reads, writes, updates docs
• Multiple app servers can access same document at same time
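The role of the cluster map can be sketched as a lookup from document key to the server holding the active copy. This is a simplified illustration under stated assumptions: toyHash is a toy hash and not the one used by the real client library, and buildClusterMap, locate, the server names and the partition count are all hypothetical.

```javascript
// Toy string hash (assumption: NOT the hash used by the real client library).
function toyHash(key) {
  var h = 0;
  for (var i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) >>> 0; // keep the value in unsigned 32-bit range
  }
  return h;
}

// Hypothetical cluster map: partition number -> servers holding the active
// copy and the single replica (replica count = 1, always on another server).
function buildClusterMap(servers, numPartitions) {
  var map = [];
  for (var p = 0; p < numPartitions; p++) {
    map.push({
      active: servers[p % servers.length],
      replica: servers[(p + 1) % servers.length]
    });
  }
  return map;
}

// The client hashes the key to a partition and reads the map entry;
// the application itself never needs to know which server is used.
function locate(clusterMap, key) {
  var partition = toyHash(key) % clusterMap.length;
  return clusterMap[partition];
}

var clusterMap = buildClusterMap(["server1", "server2", "server3"], 16);
var where = locate(clusterMap, "doc::5");
console.log("doc::5 -> active on " + where.active + ", replica on " + where.replica);
```

Because every client computes the same partition for the same key, several app servers reach the same active copy without coordinating with each other.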


How  to  access  the  data?


Couchbase.get("my-key");


Look at a document

Key -> JSON OBJECT ("DOCUMENT")

{
    "string" : "string",
    "string" : value,
    "string" : { "string" : "string",
                 "string" : value },
    "string" : [ array ]
}

• How to find a document based on its attributes?
  - get employee by email
  - get products by type
  - ...
• You need to look "into" the document/value


Create  an  index  !

How  to?


Create the index

Document:

{ "name": "Aventinus", "abv": 8.2, "ibu": 0, "srm": 0, "upc": 0, "type": "beer", "brewery_id": "110f1f2012", "updated": "2010-07-22 20:00:20", "description": "Dark-ruby, ... Weizenbock", "category": "German Ale"}

Metadata:

{ "id": "110f37fa30", "rev": "1-000000000", "expiration": 0, "flags": 0, "type": "json"}

Index built from every document:

Key          Value
Aventinus    8.2
Avenue Ale   4.1
...          ...


Concrete Example

• This map function:
  - receives the document and metadata
  - as a developer you just have to emit the K,V


Map Function
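A map function of this kind could look like the sketch below, which emits the beer name as key and the abv as value, matching the index shown earlier. In the server, emit() is provided by the view engine; here a local stand-in collects the rows so the sketch runs on its own. The second and third documents and their ids are hypothetical.

```javascript
// Stand-in for the emit() provided by the view engine: collects index rows.
var indexRows = [];
function emit(key, value) {
  indexRows.push({ key: key, value: value });
}

// View map function: called once per document with its JSON body and metadata.
function map(doc, meta) {
  if (meta.type === "json" && doc.type === "beer" && doc.name) {
    emit(doc.name, doc.abv);
  }
}

// Local stand-in for the indexer (hypothetical sample data except "Aventinus").
var documents = [
  { doc: { name: "Aventinus", abv: 8.2, type: "beer", category: "German Ale" },
    meta: { id: "110f37fa30", type: "json" } },
  { doc: { name: "Avenue Ale", abv: 4.1, type: "beer" },
    meta: { id: "hypothetical-id-2", type: "json" } },
  { doc: { name: "Some Brewery", type: "brewery" },
    meta: { id: "hypothetical-id-3", type: "json" } }
];
documents.forEach(function (d) { map(d.doc, d.meta); });

// The index keeps rows ordered by key.
indexRows.sort(function (a, b) {
  return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
});
console.log(indexRows);
```

The type check keeps non-beer documents out of the index, so the view only contains rows that queries on beer names can use.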


doc.email            meta.id
[email protected]    u::1
[email protected]    u::7
[email protected]    u::2
[email protected]    u::5
[email protected]    u::6
yeti@couchbase.com   u::4
[email protected]    u::3

?startkey="b1"&endkey="zz"

?startkey="bz"&endkey="zn"

Pulls the index keys between the UTF-8 range specified by the startkey and endkey.


doc.email            meta.id
[email protected]    u::1
[email protected]    u::7
[email protected]    u::2
[email protected]    u::5
[email protected]    u::6
yeti@couchbase.com   u::4
[email protected]    u::3

?key="[email protected]"

Match a single index key


doc.email            meta.id
[email protected]    u::1
[email protected]    u::7
[email protected]    u::2
[email protected]    u::5
[email protected]    u::6
yeti@couchbase.com   u::4
[email protected]    u::3

?keys=["[email protected]","[email protected]"]

Query multiple keys in the set (array notation)
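The three query styles (key, keys, startkey/endkey) can be sketched against a small sorted index held in memory. The e-mail addresses and ids below are hypothetical, and the comparison is plain JavaScript string ordering, which may differ from the collation the server applies.

```javascript
// Hypothetical index rows (doc.email -> meta.id), kept sorted by key.
var emailIndex = [
  { key: "abba@example.com", id: "u::1" },
  { key: "beta@example.com", id: "u::7" },
  { key: "math@example.com", id: "u::5" },
  { key: "yeti@example.com", id: "u::4" },
  { key: "zorro@example.com", id: "u::3" }
];

// ?key="..." : match a single index key.
function byKey(rows, key) {
  return rows.filter(function (r) { return r.key === key; });
}

// ?keys=[...] : match any key listed in the array.
function byKeys(rows, keys) {
  return rows.filter(function (r) { return keys.indexOf(r.key) !== -1; });
}

// ?startkey="..."&endkey="..." : range scan, both ends inclusive.
function byRange(rows, startkey, endkey) {
  return rows.filter(function (r) { return r.key >= startkey && r.key <= endkey; });
}

function ids(rows) {
  return rows.map(function (r) { return r.id; });
}

var single = ids(byKey(emailIndex, "yeti@example.com"));
var several = ids(byKeys(emailIndex, ["abba@example.com", "math@example.com"]));
var ranged = ids(byRange(emailIndex, "b1", "zz"));
console.log(single, several, ranged);
```

Note that startkey and endkey do not have to be existing keys: "b1" and "zz" only mark the boundaries of the range to scan.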


How does it work?


Indexing and Querying

[Figure: Couchbase Server cluster with three servers, each holding ACTIVE and REPLICA docs (User Configured Replica Count = 1); a Query from the app servers' Couchbase client library is sent to the cluster]

• Indexing work is distributed amongst nodes
  - Large data set possible
  - Parallelize the effort
• Each node has index for data stored on it
• Queries combine the results from required nodes
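The last two bullets can be sketched as a scatter/gather query: each node answers from its local index and the partial results are merged into one ordered list. The node names, keys and helper functions are hypothetical, and a real cluster does this over the network rather than in one process.

```javascript
// Hypothetical per-node indexes: each node only indexes the documents it stores.
var nodeIndexes = {
  server1: [{ key: "Aventinus", id: "doc2" }, { key: "Juneau", id: "doc5" }],
  server2: [{ key: "Avenue Ale", id: "doc4" }, { key: "Kodiak", id: "doc7" }],
  server3: [{ key: "Bock", id: "doc1" }]
};

// Range scan on one node's local index.
function queryNode(rows, startkey, endkey) {
  return rows.filter(function (r) { return r.key >= startkey && r.key <= endkey; });
}

// Scatter the query to every node, then gather and sort the partial results.
function queryCluster(indexes, startkey, endkey) {
  var merged = [];
  Object.keys(indexes).forEach(function (node) {
    merged = merged.concat(queryNode(indexes[node], startkey, endkey));
  });
  merged.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  return merged;
}

var clusterRows = queryCluster(nodeIndexes, "A", "C");
console.log(clusterRows.map(function (r) { return r.key; }));
```

The caller sees a single ordered result, even though no node holds the whole index.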


Couchbase Server 2.0: Views

• Views can cover a few different use cases
  - Primary index
  - Simple secondary indexes (the most common)
  - Complex secondary, tertiary and composite indexes
  - Aggregation functions (reduction)
    • Example: count the number of "North American Ales"
  - Organizing related data
• Built using Map/Reduce
  - Map function creates a matrix from document fields
  - Reduce function summarizes (reduces) information


Distributed Index Build Phase

• Optimized for lookups, in-order access and aggregations
• All view reads from disk (different performance profile)
• View builds against every document on every node
  - This is why you should group them in a design document
• Automatically kept up to date
  - "Incremental" Map Reduce


Dynamic Range Queries with Optional Aggregation

• Efficiently fetch a row or group of related rows
• Queries use cached values from B-tree inner nodes when possible
• Take advantage of in-order tree traversal with group_level queries

[Figure: three servers, each holding active docs and replica docs; the range query below is sent to the cluster]

?startkey="J"&endkey="K"

{"rows":[{"key":"Juneau","value":null}]}
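The idea behind group_level can be sketched as follows: with composite (array) keys kept in key order, rows that share the first N key elements are adjacent, so one in-order pass can aggregate them. The keys and values are hypothetical, groupReduce is an illustrative helper rather than a server function, and the aggregation shown is a simple sum.

```javascript
// Hypothetical sorted map output with composite keys [country, city] and value 1.
var mapRows = [
  { key: ["Germany", "Cologne"], value: 1 },
  { key: ["Germany", "Munich"], value: 1 },
  { key: ["USA", "Juneau"], value: 1 },
  { key: ["USA", "Juneau"], value: 1 },
  { key: ["USA", "Kodiak"], value: 1 }
];

// Sum the values of consecutive rows whose key prefix (the first groupLevel
// elements) is equal. Rows are already in key order, so one pass is enough.
function groupReduce(rows, groupLevel) {
  var out = [];
  rows.forEach(function (row) {
    var prefix = row.key.slice(0, groupLevel);
    var last = out[out.length - 1];
    if (last && JSON.stringify(last.key) === JSON.stringify(prefix)) {
      last.value += row.value;
    } else {
      out.push({ key: prefix, value: row.value });
    }
  });
  return out;
}

var byCountry = groupReduce(mapRows, 1);
var byCity = groupReduce(mapRows, 2);
console.log(JSON.stringify(byCountry));
console.log(JSON.stringify(byCity));
```

With group level 1 the rows collapse to one row per country; with group level 2 they collapse to one row per country and city.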


Append Only Index

• Disk activity is slow
• Updating disk blocks is very slow
• Appending new data to the end of the current file is fast
• Overhead of reverse reading is small
• Because existing blocks are not re-used, can lead to fragmentation
  - Couchbase will compact the index automatically

[Figure: the view processor writes changed documents to disk by appending them after the original data]
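The trade-off described above can be sketched with a tiny append-only store, where an in-memory array stands in for the index file. AppendOnlyStore and its methods are hypothetical names for illustration and do not reflect the actual on-disk format.

```javascript
// Minimal append-only store: an array stands in for the file on disk.
function AppendOnlyStore() {
  this.file = [];    // records are only ever appended, never overwritten
  this.latest = {};  // key -> position of the newest record for that key
}

// An update is written as a new record at the end of the file.
AppendOnlyStore.prototype.put = function (key, value) {
  this.file.push({ key: key, value: value });
  this.latest[key] = this.file.length - 1;
};

AppendOnlyStore.prototype.get = function (key) {
  var pos = this.latest[key];
  return pos === undefined ? undefined : this.file[pos].value;
};

// Fraction of records that are stale, i.e. superseded by a later append.
AppendOnlyStore.prototype.fragmentation = function () {
  if (this.file.length === 0) return 0;
  return 1 - Object.keys(this.latest).length / this.file.length;
};

// Compaction: copy only the live records into a fresh file.
AppendOnlyStore.prototype.compact = function () {
  var self = this;
  var fresh = new AppendOnlyStore();
  Object.keys(this.latest).forEach(function (key) {
    fresh.put(key, self.get(key));
  });
  this.file = fresh.file;
  this.latest = fresh.latest;
};

var store = new AppendOnlyStore();
store.put("Aventinus", 8.2);
store.put("Avenue Ale", 4.1);
store.put("Aventinus", 8.3); // update: appended, the old record becomes stale

var fragmentationBefore = store.fragmentation();
store.compact();
var fragmentationAfter = store.fragmentation();
console.log(fragmentationBefore, fragmentationAfter, store.get("Aventinus"));
```

Writes stay fast because nothing is rewritten in place; the cost is stale records, which compaction removes.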


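Adding a new Document

[Figure: append-only B-tree index in which every node stores a reduction (here a count) for its key range]

• Leaf keys: A B C D F G H I K L N O Q R
• Original tree
  - Leaves: A-C (3), D-F (2), G-H (2), I-L (3), N-R (4)
  - Inner nodes: A-H (7), I-R (7)
  - Root: A-R (14)
• After adding the new key M
  - New leaf: M-R (5)
  - New reductions: I-R (8)
  - New root: A-R (15)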
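The figure can be sketched as a tree where every node caches a count and an insert copies only the nodes on the path from the leaf to the root. This is a simplified illustration: the helper names are hypothetical, leaves never split, and the real index structure is more involved.

```javascript
// A leaf holds sorted keys; every node caches a reduction (here: a count).
function leaf(keys) {
  return { keys: keys, count: keys.length };
}
function inner(children) {
  var count = 0;
  children.forEach(function (c) { count += c.count; });
  return { children: children, count: count };
}
function lastKey(node) {
  return node.keys
    ? node.keys[node.keys.length - 1]
    : lastKey(node.children[node.children.length - 1]);
}

// Returns a NEW root. Only nodes on the insertion path are re-created;
// all other nodes are shared with the old tree (nothing is updated in place).
function insert(node, key) {
  if (node.keys) {
    return leaf(node.keys.concat([key]).sort());
  }
  // Descend into the first child whose last key is >= key, else the last child.
  var idx = node.children.length - 1;
  for (var i = 0; i < node.children.length; i++) {
    if (lastKey(node.children[i]) >= key) { idx = i; break; }
  }
  var children = node.children.slice();
  children[idx] = insert(node.children[idx], key);
  return inner(children);
}

var oldRoot = inner([
  inner([leaf(["A", "B", "C"]), leaf(["D", "F"]), leaf(["G", "H"])]),
  inner([leaf(["I", "K", "L"]), leaf(["N", "O", "Q", "R"])])
]);
var newRoot = insert(oldRoot, "M");
console.log(oldRoot.count, newRoot.count);
```

A count over the whole index is then just the value cached in the root, and the old root remains readable while the new one is being written.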


What about Reduce?

• Out-of-the-box functions:
  - _count()
  - _sum()
  - _stats()
• Create your own if needed

function(key, values, rereduce) {
  if (rereduce) {
    var result = 0;
    for (var i = 0; i < values.length; i++) {
      result += values[i];
    }
    return result;
  } else {
    return values.length;
  }
}


Reduce Function

• Key and arrays of values as parameters
• Written in JavaScript
• Called after the map function
• Used to reduce the result of a map of single values
• Used with grouping
• Could be ignored when querying
  - reuse the index


Reduce in Action

• Map() Result

Key                      Value
Belgian-Style Dubbel     1
Belgian-Style Dubbel     1
Belgian-Style Dubbel     1
Belgian-Style Pale Ale   1
Belgian-Style White      1
Belgian-Style White      1
...                      ...

• Reduce()

_count()

• Result

Key                      Value
Belgian-Style Dubbel     3
Belgian-Style Pale Ale   1
Belgian-Style White      2
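The same result can be reproduced with the custom counting reduce shown earlier, which also illustrates the rereduce flag: partial counts are computed per chunk of map rows and then combined. The way the rows are split into chunks is hypothetical, and the arguments passed here are simplified compared with what the server passes to a reduce function.

```javascript
// Custom counting reduce: counts rows on the first pass,
// sums partial counts when called with rereduce = true.
function reduce(key, values, rereduce) {
  if (rereduce) {
    var result = 0;
    for (var i = 0; i < values.length; i++) {
      result += values[i];
    }
    return result;
  } else {
    return values.length;
  }
}

// Hypothetical split of the map output for each key into separate chunks.
var chunks = {
  "Belgian-Style Dubbel": [[1, 1], [1]],
  "Belgian-Style Pale Ale": [[1]],
  "Belgian-Style White": [[1], [1]]
};

var totals = {};
Object.keys(chunks).forEach(function (key) {
  // First pass: reduce each chunk of map values on its own.
  var partials = chunks[key].map(function (values) {
    return reduce(key, values, false);
  });
  // Second pass: combine the partial results.
  totals[key] = reduce(key, partials, true);
});
console.log(totals);
```

A reduce function has to give the same answer however the rows are chunked, which is why the rereduce branch sums counts instead of counting them again.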


How to use it?

• Use client SDK to call the view:

View view = client.getView("beer", "by_name");
Query query = new Query();
query.setIncludeDocs(true)
     .setLimit(20)
     .setRangeStart(ComplexKey.of(startKey))
     .setRangeEnd(ComplexKey.of(startKey + "\uefff"));

ViewResponse result = client.query(view, query);
for (ViewRow row : result) {
  ....
}
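The same range query can also be expressed as a URL against the Query API port (8092). The sketch below only builds the URL and sends nothing; buildViewUrl is a hypothetical helper, and the host and bucket name ("localhost", "beer-sample") are assumptions, while the design document and view names come from the SDK call above.

```javascript
// Builds a view query URL (hypothetical helper; nothing is sent over the network).
function buildViewUrl(host, bucket, designDoc, view, params) {
  var query = Object.keys(params).map(function (name) {
    // Key parameters are JSON values, so strings keep their surrounding quotes.
    return name + "=" + encodeURIComponent(JSON.stringify(params[name]));
  }).join("&");
  return "http://" + host + ":8092/" + bucket +
         "/_design/" + designDoc + "/_view/" + view + "?" + query;
}

var startKey = "A";
var viewUrl = buildViewUrl("localhost", "beer-sample", "beer", "by_name", {
  startkey: startKey,
  endkey: startKey + "\uefff", // high code point: matches every key starting with startKey
  limit: 20
});
console.log(viewUrl);
```

Appending "\uefff" to the start key is the same trick the SDK example uses to turn a prefix search into a key range.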


Demonstration


Hadoop & Couchbase

Hadoop:
• Deal with "Big Data"
• "More" is better than "Faster"
• Batch oriented
• Usually used to "extract/transform" data
• Fully distributed
  - Map, Shuffle, Reduce

≠

Couchbase:
• Distributed
• Executed where the document is
• Deal with "indexing" data
• As fast as possible
• Used to query the data in the database


Map Reduce in Couchbase

• Like many other NoSQL databases: used for queries!
• Indexes are distributed on each node of the cluster
• Indexes are updated incrementally
• Write your Map Reduce in JavaScript


Thank you

[email protected]

@tgrall

Get Couchbase Server at http://www.couchbase.com/download
