MAP/REDUCE IN COUCHDB
Oliver Kurowski, @okurow

CouchDB Map/Reduce


DESCRIPTION

Explains the Map/Reduce functions in CouchDB with examples; also covers rereduce.


Page 1: CouchDB Map/Reduce


Page 2: CouchDB Map/Reduce

Facts about Map/Reduce
- Programming paradigm, popularized and patented by Google
- Great for parallel jobs
- No joins between documents
- In CouchDB: Map/Reduce in JavaScript (default); other languages are also possible

Workflow:
1. The map function builds a list of key/value pairs.
2. The reduce function reduces that list (to a single value).

Page 3: CouchDB Map/Reduce

Simple Map Example: A List of Cars

Id: 2, make: Audi, model: A4, year: 2009, price: 16.000
Id: 1, make: Audi, model: A3, year: 2000, price: 5.400
Id: 3, make: VW, model: Golf, year: 2009, price: 15.000
Id: 4, make: VW, model: Golf, year: 2008, price: 9.000
Id: 5, make: VW, model: Polo, year: 2010, price: 12.000

Step 1: Make a list, ordered by price:

function(doc) { emit(doc.price, doc._id); }

Step 2: Result:

Key, Value
5.400, 1
9.000, 4
12.000, 5
15.000, 3
16.000, 2
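The map step above can be imitated outside CouchDB to see why the result comes out ordered by price. This is a minimal in-memory sketch, not CouchDB itself: `runMap` is a hypothetical helper, `emit` is passed to the map function as a second argument (in CouchDB it is a global), and prices are read as 16000, 5400 and so on, assuming the slides use the German thousands separator.

```javascript
// Assumption: "16.000" in the slides means 16000 (German thousands separator).
const cars = [
  { _id: "2", make: "Audi", model: "A4", year: 2009, price: 16000 },
  { _id: "1", make: "Audi", model: "A3", year: 2000, price: 5400 },
  { _id: "3", make: "VW", model: "Golf", year: 2009, price: 15000 },
  { _id: "4", make: "VW", model: "Golf", year: 2008, price: 9000 },
  { _id: "5", make: "VW", model: "Polo", year: 2010, price: 12000 },
];

// Hypothetical helper: run a map function over every doc and collect the rows.
function runMap(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    // CouchDB provides emit as a global; here it is handed in explicitly.
    const emit = (key, value) => rows.push({ id: doc._id, key, value });
    mapFn(doc, emit);
  }
  // All keys are numbers in this example, so a numeric sort is enough.
  return rows.sort((a, b) => a.key - b.key);
}

const byPrice = runMap(cars, (doc, emit) => emit(doc.price, doc._id));
console.log(byPrice.map(r => `${r.key}, ${r.value}`).join("\n"));
```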

Page 4: CouchDB Map/Reduce

Querying Maps

Original map:
Key, Value
5.400, 1
9.000, 4
12.000, 5
15.000, 3
16.000, 2

startkey=10.000 & endkey=15.500: all keys from 10.000 up to and including 15.500
Key, Value
12.000, 5
15.000, 3

key=10.000: exact key, so no result
Key, Value
(no rows)

endkey=10.000: all keys up to and including 10.000
Key, Value
5.400, 1
9.000, 4
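The three queries can be expressed as a filter over the already sorted map. This sketch is an assumption-laden model, not CouchDB code: `query` is a hypothetical function, prices are written as plain numbers (10.000 read as 10000), and the end key is included, matching CouchDB's default `inclusive_end=true`.

```javascript
// Sorted map output of emit(doc.price, doc._id)
const rows = [
  { key: 5400, value: "1" },
  { key: 9000, value: "4" },
  { key: 12000, value: "5" },
  { key: 15000, value: "3" },
  { key: 16000, value: "2" },
];

// Hypothetical query function: key, startkey and endkey are all optional.
// The end key itself is included (CouchDB's default inclusive_end=true).
function query(rows, { key, startkey, endkey } = {}) {
  return rows.filter(r =>
    (key === undefined || r.key === key) &&
    (startkey === undefined || r.key >= startkey) &&
    (endkey === undefined || r.key <= endkey));
}

console.log(query(rows, { startkey: 10000, endkey: 15500 })); // 12000 and 15000
console.log(query(rows, { key: 10000 }));                      // no rows
console.log(query(rows, { endkey: 10000 }));                   // 5400 and 9000
```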

Page 5: CouchDB Map/Reduce

Map Function
- Has one document as input
- Can emit all JSON types as key and value (listed here in CouchDB's sort order):
  - Special values: null, false, true
  - Numbers: 1e-17, 1.5, 200
  - Strings: "+", "1", "Ab", "Audi"
  - Arrays: [1], [1,2], [1,"Audi",true]
  - Objects: {"price":1300,"sold":true}
- Results are ordered by key (or in reverse order); for mixed types, the order is the one listed above
- In CouchDB: each result row also contains the doc._id

Example result of emit(doc.make, 1):

{"total_rows":5,"offset":0,"rows":[
{"id":"1","key":"Audi","value":1},
{"id":"2","key":"Audi","value":1},
{"id":"3","key":"VW","value":1},
{"id":"4","key":"VW","value":1},
{"id":"5","key":"VW","value":1}
]}
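The ordering of mixed key types can be written as a comparator. This is a simplified sketch of CouchDB's view collation, assuming plain code-unit comparison for strings (real CouchDB uses ICU, the Unicode Collation Algorithm, so some strings sort differently); `typeRank` and `collate` are hypothetical names.

```javascript
// Type order: null < false < true < numbers < strings < arrays < objects.
function typeRank(v) {
  if (v === null) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === "number") return 3;
  if (typeof v === "string") return 4;
  if (Array.isArray(v)) return 5;
  return 6; // plain object
}

function collate(a, b) {
  const ra = typeRank(a), rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 3) return a - b;
  // Simplification: code-unit comparison instead of ICU collation.
  if (ra === 4) return a < b ? -1 : a > b ? 1 : 0;
  if (ra === 5 || ra === 6) {
    // Arrays compare element by element; objects by their [key, value] entries.
    const xa = ra === 5 ? a : Object.entries(a);
    const xb = ra === 5 ? b : Object.entries(b);
    for (let i = 0; i < Math.min(xa.length, xb.length); i++) {
      const c = collate(xa[i], xb[i]);
      if (c !== 0) return c;
    }
    return xa.length - xb.length; // a shorter prefix sorts first
  }
  return 0; // same special value
}

const sorted = [{ price: 1300, sold: true }, [1, "Audi", true], "Audi", 200, true,
  "+", null, [1], false, 1.5, "Ab", [1, 2], 1e-17, "1"].sort(collate);
console.log(sorted);
```

Shuffling the example values from the slide and sorting them with this comparator brings them back into the order listed above.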

Page 6: CouchDB Map/Reduce

Reduce Function
- Has arrays of keys and values as input
- Should reduce the result of a map to a single value
- Written in JavaScript (other languages possible)
- In CouchDB: some simple built-in native Erlang functions (_sum, _count, _stats)
- Is automatically called after the map function has finished
- Can be skipped with "reduce=false"
- Is needed for grouping
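To see what a built-in like _stats computes, here is a JavaScript reduce that returns the same fields (sum, count, min, max, sumsqr). It is an illustrative sketch only: `statsReduce` is a hypothetical name, the built-in runs natively in Erlang and is preferable in practice, and this version assumes it is never called with an empty `values` array. It already handles the `rereduce` flag covered later in the deck.

```javascript
function statsReduce(keys, values, rereduce) {
  // First pass: wrap each raw number in a one-element stats object,
  // so both passes can be merged by the same code.
  const parts = rereduce
    ? values
    : values.map(v => ({ sum: v, count: 1, min: v, max: v, sumsqr: v * v }));
  // Merge all stats objects (assumes parts is not empty).
  return parts.reduce((acc, s) => ({
    sum: acc.sum + s.sum,
    count: acc.count + s.count,
    min: Math.min(acc.min, s.min),
    max: Math.max(acc.max, s.max),
    sumsqr: acc.sumsqr + s.sumsqr,
  }));
}

// Car prices (16.000 read as 16000), reduced in two parts, then combined.
const first = statsReduce(null, [16000, 5400], false);
const second = statsReduce(null, [15000, 9000, 12000], false);
const total = statsReduce(null, [first, second], true);
console.log(total);
```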

Page 7: CouchDB Map/Reduce

Simple Map/Reduce Example: A List of Cars

Id: 2, make: Audi, model: A4, year: 2009, price: 16.000
Id: 1, make: Audi, model: A3, year: 2000, price: 5.400
Id: 3, make: VW, model: Golf, year: 2009, price: 15.000
Id: 4, make: VW, model: Golf, year: 2008, price: 9.000
Id: 5, make: VW, model: Polo, year: 2010, price: 12.000

Step 1: Make a map, ordered by make:

function(doc) { emit(doc.make, 1); }

Result (every value is 1):

Key, Value
Audi, 1
Audi, 1
VW, 1
VW, 1
VW, 1

Page 8: CouchDB Map/Reduce

Simple Map/Reduce Example

Result of the map:
Key, Value
Audi, 1
Audi, 1
VW, 1
VW, 1
VW, 1

Step 2: Write a "sum" reduce:

function(keys, values) { return sum(values); }

Result:
Key, Value
null, 5
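Without grouping, the reduce runs over the whole map and the key of the result is null. The sketch below reproduces that in plain JavaScript; the `sum` defined here is a stand-in for the `sum()` helper CouchDB provides to JavaScript views, and only the fields the view uses are kept on each car.

```javascript
// Stand-in for CouchDB's sum() helper.
const sum = values => values.reduce((a, b) => a + b, 0);

const cars = [
  { _id: "2", make: "Audi" }, { _id: "1", make: "Audi" },
  { _id: "3", make: "VW" }, { _id: "4", make: "VW" }, { _id: "5", make: "VW" },
];

// Map step: function(doc) { emit(doc.make, 1); }
const rows = cars.map(doc => ({ id: doc._id, key: doc.make, value: 1 }));

// Reduce step without grouping: one call over all rows.
// On the first pass CouchDB passes keys as [key, doc._id] pairs.
const reduceFn = (keys, values) => sum(values);
const result = {
  key: null,
  value: reduceFn(rows.map(r => [r.key, r.id]), rows.map(r => r.value)),
};
console.log(result); // { key: null, value: 5 }
```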

Page 9: CouchDB Map/Reduce

Simple Map/Reduce Example

Step 3: Querying
- key="Audi"

Key, Value
null, 2

Step 4: Grouping by keys
- group=true

Key, Value
Audi, 2
VW, 3

Step 5: Use only the map function
- reduce=false

Key, Value
Audi, 1
Audi, 1
VW, 1
VW, 1
VW, 1

Like having no reduce function.
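Steps 3 to 5 differ only in what happens after the rows are selected. A minimal sketch of the three behaviours, with `view` as a hypothetical function and `sum` standing in for CouchDB's helper: `key` filters, `group=true` runs one reduce per distinct key, and `reduce=false` returns the map rows unchanged.

```javascript
const sum = values => values.reduce((a, b) => a + b, 0);

// Map output of emit(doc.make, 1), already sorted by key.
const rows = [
  { id: "1", key: "Audi", value: 1 },
  { id: "2", key: "Audi", value: 1 },
  { id: "3", key: "VW", value: 1 },
  { id: "4", key: "VW", value: 1 },
  { id: "5", key: "VW", value: 1 },
];

function view(rows, { key, group = false, reduce = true } = {}) {
  const selected = rows.filter(r => key === undefined || r.key === key);
  if (!reduce) return selected; // reduce=false: map rows only
  if (!group) return [{ key: null, value: sum(selected.map(r => r.value)) }];
  // group=true: rows are sorted, so equal keys are adjacent.
  const groups = [];
  for (const r of selected) {
    const last = groups[groups.length - 1];
    if (last && last.key === r.key) last.values.push(r.value);
    else groups.push({ key: r.key, values: [r.value] });
  }
  return groups.map(g => ({ key: g.key, value: sum(g.values) }));
}

console.log(view(rows, { key: "Audi" }));
console.log(view(rows, { group: true }));
console.log(view(rows, { reduce: false }).length);
```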

Page 10: CouchDB Map/Reduce

Array-Key Map/Reduce Example: A List of Cars (again)

Id: 2, make: Audi, model: A4, year: 2009, price: 16.000
Id: 1, make: Audi, model: A3, year: 2000, price: 5.400
Id: 3, make: VW, model: Golf, year: 2009, price: 15.000
Id: 4, make: VW, model: Golf, year: 2008, price: 9.000
Id: 5, make: VW, model: Polo, year: 2010, price: 12.000

Step 1: Make a map, with an array as key:

function(doc) { emit([doc.make, doc.model, doc.year], 1); }

Result (with group=true):

Key, Value
["Audi","A3",2000], 1
["Audi","A4",2009], 1
["VW","Golf",2008], 1
["VW","Golf",2009], 1
["VW","Polo",2010], 1

Page 11: CouchDB Map/Reduce

Array-Key Map/Reduce: Querying

startkey=["Audi"] (&group=true):
Key, Value
["Audi","A3",2000], 1
["Audi","A4",2009], 1
["VW","Golf",2008], 1
["VW","Golf",2009], 1
["VW","Polo",2010], 1

startkey=["VW"] (&group=true):
Key, Value
["VW","Golf",2008], 1
["VW","Golf",2009], 1
["VW","Polo",2010], 1

endkey=["VW"] (&group=true):
Key, Value
["Audi","A3",2000], 1
["Audi","A4",2009], 1

Remember: an array that is a prefix of a longer array sorts first, so ["VW"] comes before ["VW","Golf",2008] and none of the VW rows are in the result list.
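The prefix rule explains all three results. This sketch uses a simplified collation (numbers, strings and arrays only, strings compared by code units rather than ICU); `collate` and `range` are hypothetical helpers, and the end key is treated as inclusive like CouchDB's default.

```javascript
// Numbers < strings < arrays; arrays compare element by element,
// and a shorter prefix sorts first.
function collate(a, b) {
  const rank = v => (typeof v === "number" ? 0 : typeof v === "string" ? 1 : 2);
  if (rank(a) !== rank(b)) return rank(a) - rank(b);
  if (rank(a) === 0) return a - b;
  if (rank(a) === 1) return a < b ? -1 : a > b ? 1 : 0;
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    const c = collate(a[i], b[i]);
    if (c !== 0) return c;
  }
  return a.length - b.length;
}

const keys = [
  ["Audi", "A3", 2000], ["Audi", "A4", 2009],
  ["VW", "Golf", 2008], ["VW", "Golf", 2009], ["VW", "Polo", 2010],
];

// startkey/endkey filter; the end key is inclusive (CouchDB default).
const range = (startkey, endkey) => keys.filter(k =>
  (startkey === undefined || collate(k, startkey) >= 0) &&
  (endkey === undefined || collate(k, endkey) <= 0));

console.log(range(["Audi"]).length);   // 5
console.log(range(["VW"]).length);     // 3
console.log(range(undefined, ["VW"])); // only the two Audi keys
```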

Page 12: CouchDB Map/Reduce

Array-Key Map/Reduce: Ranges

Step 4: Range queries:
- startkey=["VW","Golf"]&endkey=["VW","Polo"] (&group=true)

Key, Value
["VW","Golf",2008], 1
["VW","Golf",2009], 1

(["VW","Polo",2010] sorts after ["VW","Polo"], so it is not included.)

What if we do not know the next model after Golf?
- startkey=["VW","Golf"]&endkey=["VW","Golf",99999] (&group=true)
- better: endkey=["VW","Golf",{}], because an object sorts after every number, string and array

Key, Value
["VW","Golf",2008], 1
["VW","Golf",2009], 1
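Why `{}` is the better upper bound can be checked with a comparator that ranks objects above arrays. This is a simplified sketch under the same assumptions as before (no null/booleans, code-unit string comparison, and all objects treated as equal since only `{}` occurs); the key `["VW","Golf","unknown"]` is a hypothetical row, used only to show where 99999 fails.

```javascript
// Numbers < strings < arrays < objects; arrays element by element,
// shorter prefix first.
function collate(a, b) {
  const rank = v =>
    typeof v === "number" ? 0 : typeof v === "string" ? 1 : Array.isArray(v) ? 2 : 3;
  const ra = rank(a), rb = rank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 0) return a - b;
  if (ra === 1) return a < b ? -1 : a > b ? 1 : 0;
  if (ra === 3) return 0; // only {} is used here, so objects count as equal
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    const c = collate(a[i], b[i]);
    if (c !== 0) return c;
  }
  return a.length - b.length;
}

const keys = [
  ["Audi", "A3", 2000], ["Audi", "A4", 2009],
  ["VW", "Golf", 2008], ["VW", "Golf", 2009], ["VW", "Polo", 2010],
];
// Inclusive start and end key, like CouchDB's defaults.
const range = (startkey, endkey) =>
  keys.filter(k => collate(k, startkey) >= 0 && collate(k, endkey) <= 0);

const viaPolo = range(["VW", "Golf"], ["VW", "Polo"]);
const via99999 = range(["VW", "Golf"], ["VW", "Golf", 99999]);
const viaObject = range(["VW", "Golf"], ["VW", "Golf", {}]);

// Hypothetical key with a string instead of a year.
const odd = ["VW", "Golf", "unknown"];
console.log(collate(odd, ["VW", "Golf", 99999]) > 0); // true: 99999 misses it
console.log(collate(odd, ["VW", "Golf", {}]) < 0);    // true: {} still covers it
```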

Page 13: CouchDB Map/Reduce

Grouping with group_level

group=true (aka group_level=exact):
Key, Value
["Audi","A3",2000], 1
["Audi","A4",2009], 1
["VW","Golf",2008], 1
["VW","Golf",2009], 1
["VW","Polo",2010], 1

group_level=1 (no group=true needed):
Key, Value
["Audi"], 2
["VW"], 3

group_level=2 (no group=true needed):
Key, Value
["Audi","A3"], 1
["Audi","A4"], 1
["VW","Golf"], 2
["VW","Polo"], 1

group_level=3 -> group_level=exact -> group=true (here the keys have exactly three elements)
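group_level can be read as "cut every key down to its first N elements, then group equal keys". The sketch below models that; `groupLevel` is a hypothetical helper, `sum` stands in for CouchDB's helper, and comparing truncated keys via `JSON.stringify` is a shortcut that works for these simple keys.

```javascript
const sum = values => values.reduce((a, b) => a + b, 0);

// Map output of emit([doc.make, doc.model, doc.year], 1), sorted by key.
const rows = [
  { key: ["Audi", "A3", 2000], value: 1 },
  { key: ["Audi", "A4", 2009], value: 1 },
  { key: ["VW", "Golf", 2008], value: 1 },
  { key: ["VW", "Golf", 2009], value: 1 },
  { key: ["VW", "Polo", 2010], value: 1 },
];

function groupLevel(rows, level) {
  const groups = [];
  for (const r of rows) {
    const k = r.key.slice(0, level); // keep only the first `level` elements
    const last = groups[groups.length - 1];
    // Rows are sorted, so equal truncated keys are adjacent.
    if (last && JSON.stringify(last.key) === JSON.stringify(k)) last.values.push(r.value);
    else groups.push({ key: k, values: [r.value] });
  }
  return groups.map(g => ({ key: g.key, value: sum(g.values) }));
}

console.log(groupLevel(rows, 1));
console.log(groupLevel(rows, 2));
```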

Page 14: CouchDB Map/Reduce

Examples:

Get all car makes:
- group_level=1

Key, Value
["Audi"], 2
["VW"], 3

Get all models from VW:
- startkey=["VW"]&endkey=["VW",{}]&group_level=2

Key, Value
["VW","Golf"], 2
["VW","Polo"], 1

Get all years of VW Golf:
- startkey=["VW","Golf"]&endkey=["VW","Golf",{}]&group_level=3

Key, Value
["VW","Golf",2008], 1
["VW","Golf",2009], 1
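When these queries are sent over HTTP, the array keys have to be JSON-encoded and then URL-encoded. A small sketch of that: `viewUrl` is a hypothetical helper, and the view name `by_make_model_year` is an assumed name for the array-key view (the database `mapreduce` and design document `example` follow the URLs used later in the deck).

```javascript
// Hypothetical helper: JSON-encode key parameters, URL-encode everything.
function viewUrl(base, params) {
  const jsonParams = ["key", "startkey", "endkey"];
  const qs = Object.entries(params).map(([name, value]) => {
    const v = jsonParams.includes(name) ? JSON.stringify(value) : String(value);
    return `${name}=${encodeURIComponent(v)}`;
  });
  return `${base}?${qs.join("&")}`;
}

// Assumed view name for the array-key map.
const base = "http://localhost:5984/mapreduce/_design/example/_view/by_make_model_year";

// "Get all models from VW"
const url = viewUrl(base, { startkey: ["VW"], endkey: ["VW", {}], group_level: 2 });
console.log(url);
```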

Page 15: CouchDB Map/Reduce

Reduce / Rereduce: A rule for reduce functions:

A reduce function must accept as input not only the result of a map, but also its own result.

Why? A reduce function can be called more than once.

If the map is too large, it is split and each part runs through the reduce function; finally, all the partial results run through the same reduce function again.

function(doc) { emit(doc.make, 1); }

Partial results (already reduced):
Key, Value
Audi, 2
VW, 3

function(keys, values) { return sum(values); }

Key, Value
null, 5

Page 16: CouchDB Map/Reduce

WTF ?


Page 17: CouchDB Map/Reduce

Reduce / Rereduce: Example for counting values (will produce a wrong result!)

Map (1000 rows):
Key, Value
1, 12
2, 10
3, 4
…
999, 7
1000, 12

Split, each part reduced with function(keys, values) { return values.length; }:
- Part 1: 1, 12 / 2, 10 / … / 333, 23 -> null, 333
- Part 2: 334, 15 / 335, 99 / … / 666, 82 -> null, 333
- Part 3: 667, 18 / 668, 149 / … / 1000, 12 -> null, 334

The partial results run through the same function again:
function(keys, values) { return values.length; } with values [333, 333, 334] -> null, 3

Boom! 3 != 1000
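The failure can be reproduced in a few lines. This sketch uses placeholder values (only their number matters for counting) and a fixed 333/333/334 split taken from the slide; in CouchDB the actual split is decided by its B+tree, not by fixed sizes. `countReduce` is a hypothetical name, and `values.length` stands in for counting.

```javascript
// A counting reduce that ignores the rereduce case.
const countReduce = (keys, values) => values.length;

// 1000 placeholder map values.
const values = Array.from({ length: 1000 }, (_, i) => i + 1);

// Split like the slide: 333 + 333 + 334 rows.
const parts = [values.slice(0, 333), values.slice(333, 666), values.slice(666)];

const partial = parts.map(p => countReduce(null, p)); // first pass
const total = countReduce(null, partial);             // counts the 3 partials
console.log(partial, total); // [ 333, 333, 334 ] 3
```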

Page 18: CouchDB Map/Reduce

Reduce / Rereduce: Solution: the rereduce flag (not mentioned yet)

- Indicates whether the function is called on map results (rereduce=false) or on its own earlier results (rereduce=true). It is set by CouchDB.

function(keys, values, rereduce) {
  if (rereduce == false) { return values.length; }
  else { return sum(values); }
}

Map (1000 rows):
Key, Value
1, 12
2, 10
3, 4
…
999, 7
1000, 12

Split, rereduce=false (count the values):
- Part 1: 1, 12 / 2, 10 / … / 333, 23 -> null, 333
- Part 2: 334, 15 / 335, 99 / … / 666, 82 -> null, 333
- Part 3: 667, 18 / 668, 149 / … / 1000, 12 -> null, 334

rereduce=true (sum the partial counts): values [333, 333, 334] -> null, 1000

Correct!
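With the flag, the result no longer depends on how the map is split. The sketch below checks that by reducing in chunks and rereducing the partial results level by level until one value is left, roughly the way intermediate reductions are combined; `reduceInChunks` is a hypothetical helper, the chunk sizes are arbitrary, keys are passed as null for simplicity, and `sum` stands in for CouchDB's helper.

```javascript
const sum = values => values.reduce((a, b) => a + b, 0);

function countReduce(keys, values, rereduce) {
  if (!rereduce) return values.length; // first pass: count map values
  return sum(values);                  // rereduce: add up partial counts
}

// Hypothetical helper: reduce fixed-size chunks, then rereduce the
// partials repeatedly until a single value remains.
function reduceInChunks(values, chunkSize) {
  if (chunkSize < 2) throw new Error("chunkSize must be at least 2");
  let level = [];
  for (let i = 0; i < values.length; i += chunkSize)
    level.push(countReduce(null, values.slice(i, i + chunkSize), false));
  while (level.length > 1) {
    const next = [];
    for (let i = 0; i < level.length; i += chunkSize)
      next.push(countReduce(null, level.slice(i, i + chunkSize), true));
    level = next;
  }
  return level[0];
}

const values = Array.from({ length: 1000 }, (_, i) => i + 1); // placeholders
console.log(reduceInChunks(values, 334), reduceInChunks(values, 10));
```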

Page 19: CouchDB Map/Reduce

Input of a reduce function

The map:
Doc._id, Key, Value
4, "Audi", 12.000
2, "BMW", 20.000
1, "Citroen", 9.000
3, "Dacia", 6.500

The function:
function(keys, values, rereduce) { return sum(values); }

Input values 1 (rereduce=false):
- keys: [ ["Audi","4"], ["BMW","2"], ["Citroen","1"], ["Dacia","3"] ] (each entry is [key, doc._id])
- values: [12.000, 20.000, 9.000, 6.500]
- rereduce: false

Input values 2 (rereduce=true):
- keys: null
- values: [47.500]
- rereduce: true

Page 20: CouchDB Map/Reduce

Where does Map/Reduce live?

Map/Reduce functions are stored in a design document, in the "views" key:

{
  "_id": "_design/example",
  "views": {
    "simplereduce": {
      "map": "function(doc) { emit(doc.make, 1); }",
      "reduce": "function (keys, values) { return sum (values); }"
    }
  }
}

Map/Reduce functions start when a view is called:

http://localhost:5984/mapreduce/_design/example/_view/simplereduce
http://localhost:5984/mapreduce/_design/example/_view/simplereduce?key="Audi"
http://localhost:5984/mapreduce/_design/example/_view/simplereduce?key="VW"&group=true
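Because the functions are stored as strings, it is easy to break the quoting when writing the design document by hand. One option, sketched here under the assumption that you assemble the document in JavaScript before uploading it (for example with a PUT to http://localhost:5984/mapreduce/_design/example), is to write real functions and let `toString()` and `JSON.stringify` produce the strings. The functions are never executed locally, so `emit` and `sum` need not exist here.

```javascript
// Real functions, only converted to source strings (never called here).
const map = function (doc) { emit(doc.make, 1); };
const reduce = function (keys, values) { return sum(values); };

const designDoc = {
  _id: "_design/example",
  views: {
    simplereduce: { map: map.toString(), reduce: reduce.toString() },
  },
};

const body = JSON.stringify(designDoc, null, 2); // request body for the upload
console.log(body);
```

For a plain sum, the built-in "_sum" could be used as the reduce value instead of a JavaScript function.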

Page 21: CouchDB Map/Reduce

View calling
- The first time a view is called, every document in the database is passed through its map function once.
- After that, only new and changed documents are processed when the view is called again.
- The results are stored in a CouchDB-internal B+tree; the result you receive is read from that stored B+tree.

That means: the first call of a view can take some time while the tree is built. If no documents have changed, the next call returns the result instantly.

Key queries like startkey and endkey are performed on the stored B+tree; no rebuild is needed.

There are several parameters for calling a view: limit, skip, include_docs=true, key, startkey, endkey, descending, stale (ok, update_after), group, group_level, reduce (=false)
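The incremental behaviour can be modelled with a view that remembers the last update sequence it has indexed and only re-maps documents with a higher sequence. This is a hypothetical sketch, not CouchDB's implementation: `IncrementalView` is an invented class, a `Map` replaces the B+tree, the `changes` array imitates a changes feed, and deletions are not handled.

```javascript
class IncrementalView {
  constructor(mapFn) {
    this.mapFn = mapFn;         // called as mapFn(doc, emit)
    this.lastSeq = 0;           // highest update sequence already indexed
    this.rowsByDoc = new Map(); // doc _id -> rows of its latest version
    this.mapCalls = 0;          // how often the map function actually ran
  }

  // changes: [{ seq, doc }] in ascending seq order
  update(changes) {
    for (const { seq, doc } of changes) {
      if (seq <= this.lastSeq) continue; // already indexed: skip
      const rows = [];
      this.mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
      this.rowsByDoc.set(doc._id, rows); // replaces rows of the old version
      this.mapCalls++;
      this.lastSeq = seq;
    }
  }

  // Read side: all rows, sorted by key, then by doc id
  rows() {
    const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
    return [...this.rowsByDoc.values()].flat()
      .sort((a, b) => cmp(a.key, b.key) || cmp(a.id, b.id));
  }
}

const changes = [
  { seq: 1, doc: { _id: "1", make: "Audi" } },
  { seq: 2, doc: { _id: "2", make: "Audi" } },
  { seq: 3, doc: { _id: "3", make: "VW" } },
  { seq: 4, doc: { _id: "4", make: "VW" } },
  { seq: 5, doc: { _id: "5", make: "VW" } },
];
const view = new IncrementalView((doc, emit) => emit(doc.make, 1));
view.update(changes); // first call: all 5 docs are mapped
view.update(changes); // nothing new: the map function is not called
changes.push({ seq: 6, doc: { _id: "4", make: "VW" } }); // hypothetical edit of doc 4
view.update(changes); // only the changed doc is mapped
console.log(view.mapCalls); // 6
```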

Page 22: CouchDB Map/Reduce

View calling parameters
- limit: limits the number of rows returned
- skip: skips a number of rows
- include_docs=true: when no reduce is used, the full docs are sent along with the map rows
- key, startkey, endkey: should be known by now
- startkey_docid=x: used together with startkey; among rows whose key equals startkey, start at doc id x
- endkey_docid=x: used together with endkey; among rows whose key equals endkey, stop at doc id x
- descending=true: reverse order. When using startkey/endkey, they must be swapped
- stale=ok: do not start indexing, just deliver the stored result
- stale=update_after: deliver the old result, start indexing after that
- group, group_level, reduce=false: should be known by now
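Why startkey and endkey must be swapped with descending=true: the scan starts at startkey and walks in the requested direction until it passes endkey. A sketch of that rule over the price map, with `query` as a hypothetical function and an inclusive end key:

```javascript
const rows = [5400, 9000, 12000, 15000, 16000].map(key => ({ key }));

function query(rows, { startkey, endkey, descending = false } = {}) {
  const ordered = descending ? [...rows].reverse() : rows;
  // "before" means earlier in the walking direction.
  const before = (a, b) => (descending ? a > b : a < b);
  return ordered.filter(r =>
    (startkey === undefined || !before(r.key, startkey)) &&
    (endkey === undefined || !before(endkey, r.key)));
}

const asc = query(rows, { startkey: 10000, endkey: 15500 });
const descSwapped = query(rows, { startkey: 15500, endkey: 10000, descending: true });
const descNotSwapped = query(rows, { startkey: 10000, endkey: 15500, descending: true });
console.log(asc, descSwapped, descNotSwapped); // descNotSwapped is empty
```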

Page 23: CouchDB Map/Reduce

You've made it!
