Explains the Map/Reduce functions in CouchDB with examples; also covers rereduce.
MAP/REDUCE IN COUCHDB
Oliver Kurowski, @okurow
Facts about Map/Reduce
- Programming paradigm, popularized and patented by Google
- Great for parallel jobs
- No joins between documents
- In CouchDB: Map/Reduce in JavaScript (default); also possible with other languages

Workflow
1. The map function builds a list of key/value pairs
2. The reduce function reduces the list (to a single value)
Simple Map Example
A list of cars:
Id: 2, make: Audi, model: A4, year: 2009, price: 16.000
Id: 1, make: Audi, model: A3, year: 2000, price: 5.400
Id: 3, make: VW, model: Golf, year: 2009, price: 15.000
Id: 4, make: VW, model: Golf, year: 2008, price: 9.000
Id: 5, make: VW, model: Polo, year: 2010, price: 12.000

Step 1: Make a list, ordered by price
function(doc) { emit(doc.price, doc.id); }

Step 2: Result:
Key , Value
5.400 , 1
9.000 , 4
12.000 , 5
15.000 , 3
16.000 , 2
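The two steps above can be tried outside CouchDB with a small simulation. This is only a sketch under assumptions: `runMap` is a hypothetical stand-in for the view engine, `emit` is passed in as an argument (in CouchDB it is a global), the documents use `_id` as their identifier, and the slide's prices such as "5.400" are read as the plain number 5400.

```javascript
// Sample documents from the slide; prices written as plain numbers (assumption).
const cars = [
  { _id: "2", make: "Audi", model: "A4", year: 2009, price: 16000 },
  { _id: "1", make: "Audi", model: "A3", year: 2000, price: 5400 },
  { _id: "3", make: "VW", model: "Golf", year: 2009, price: 15000 },
  { _id: "4", make: "VW", model: "Golf", year: 2008, price: 9000 },
  { _id: "5", make: "VW", model: "Polo", year: 2010, price: 12000 },
];

// Hypothetical stand-in for the view engine: run the map function on every
// document, collect the emitted rows, and sort them by key.
function runMap(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows.sort((a, b) => a.key - b.key); // handles numeric keys only
}

const byPrice = runMap(cars, (doc, emit) => emit(doc.price, doc._id));
console.log(byPrice.map((r) => `${r.key} , ${r.value}`).join("\n"));
```

The printed rows come out in price order regardless of the order in which the documents were stored, which is the point of the example.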
Querying Maps
Original map:
Key , Value
5.400 , 1
9.000 , 4
12.000 , 5
15.000 , 3
16.000 , 2

startkey=10.000 & endkey=15.500: all keys from 10.000 up to 15.500
Key , Value
12.000 , 5
15.000 , 3

key=10.000: exact key, so no result
Key , Value

endkey=10.000: all keys up to 10.000
Key , Value
5.400 , 1
9.000 , 4
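The three queries can be reproduced with a small filter over the sorted rows. This is a sketch, not CouchDB's implementation: `queryRows` is a hypothetical helper, prices are again read as plain numbers, and both range ends are treated as inclusive, which is CouchDB's default behaviour.

```javascript
// Rows of the price view, already sorted by key (prices as plain numbers, an assumption).
const priceRows = [
  { key: 5400, value: "1" },
  { key: 9000, value: "4" },
  { key: 12000, value: "5" },
  { key: 15000, value: "3" },
  { key: 16000, value: "2" },
];

// Hypothetical helper mimicking key, startkey and endkey for numeric keys.
function queryRows(rows, { key, startkey, endkey } = {}) {
  return rows.filter((r) => {
    if (key !== undefined) return r.key === key; // exact match only
    if (startkey !== undefined && r.key < startkey) return false;
    if (endkey !== undefined && r.key > endkey) return false;
    return true;
  });
}

console.log(queryRows(priceRows, { startkey: 10000, endkey: 15500 }));
console.log(queryRows(priceRows, { key: 10000 }));
console.log(queryRows(priceRows, { endkey: 10000 }));
```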
Map Function
- Has one document as input
- Can emit all JSON types as key and value:
  - Special values: null, true, false
  - Numbers: 1e-17, 1.5, 200
  - Strings: "+", "1", "Ab", "Audi"
  - Arrays: [1], [1,2], [1,"Audi",true]
  - Objects: {"price":1300,"sold":true}
- Results are ordered by key (or reversed); with mixed types, the order is that of the list above
- In CouchDB: each result also carries the doc._id

{"total_rows":5,"offset":0,"rows":[
  {"id":"1","key":"Audi","value":1},
  {"id":"2","key":"Audi","value":1},
  {"id":"3","key":"VW","value":1},
  {"id":"4","key":"VW","value":1},
  {"id":"5","key":"VW","value":1}
]}
Reduce Function
- Has arrays of keys and values as input
- Should reduce the result of a map to a single value
- Written in JavaScript (other languages possible)
- In CouchDB: some simple built-in native Erlang functions (_sum, _count, _stats)
- Is automatically called after the map function has finished
- Can be skipped with "reduce=false"
- Is needed for grouping
Simple Map/Reduce Example
A list of cars (the same five documents as in the Simple Map Example)

Step 1: Make a map, ordered by make
function(doc) { emit(doc.make, 1); }

Result:
Key , Value
Audi , 1
Audi , 1
VW , 1
VW , 1
VW , 1
Simple Map/Reduce Example
Step 2: Write a "sum" reduce
function(keys, values) { return sum(values); }

Result of the map:
Key , Value
Audi , 1
Audi , 1
VW , 1
VW , 1
VW , 1

Result of the reduce:
Key , Value
null , 5
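The ungrouped reduce can be sketched in plain JavaScript. The assumptions: `sum` is defined locally because it only exists as a helper inside CouchDB's JavaScript view server, and the keys are passed as [key, id] pairs, mirroring how the input of a reduce function is described later in this deck.

```javascript
// Map rows emitted by function(doc) { emit(doc.make, 1); }
const mapRows = [
  { id: "1", key: "Audi", value: 1 },
  { id: "2", key: "Audi", value: 1 },
  { id: "3", key: "VW", value: 1 },
  { id: "4", key: "VW", value: 1 },
  { id: "5", key: "VW", value: 1 },
];

// Local stand-in for the sum() helper available to CouchDB reduce functions.
function sum(values) {
  return values.reduce((a, b) => a + b, 0);
}

// The reduce function from the slide.
const reduceFn = function (keys, values) {
  return sum(values);
};

// Without grouping, every row goes into one call and the result key is null.
const keys = mapRows.map((r) => [r.key, r.id]);
const values = mapRows.map((r) => r.value);
const reduced = { key: null, value: reduceFn(keys, values) };
console.log(reduced); // { key: null, value: 5 }
```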
Simple Map/Reduce Example
Step 3: Querying
- key="Audi"
Key , Value
null , 2

Step 4: Grouping by keys
- group=true
Key , Value
Audi , 2
VW , 3

Step 5: Use only the map function
- reduce=false (like having no reduce function)
Key , Value
Audi , 1
Audi , 1
VW , 1
VW , 1
VW , 1
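What group=true does to the map rows can be sketched as follows. `groupReduce` is a hypothetical helper, not a CouchDB API; it relies on the rows already being sorted by key and calls the reduce function once per distinct key.

```javascript
const makeRows = [
  { id: "1", key: "Audi", value: 1 },
  { id: "2", key: "Audi", value: 1 },
  { id: "3", key: "VW", value: 1 },
  { id: "4", key: "VW", value: 1 },
  { id: "5", key: "VW", value: 1 },
];

// Local stand-in for CouchDB's sum() helper.
function sum(values) {
  return values.reduce((a, b) => a + b, 0);
}

const sumReduce = (keys, values) => sum(values);

// Hypothetical helper: collect rows per key, then reduce each group separately.
function groupReduce(rows, reduceFn) {
  const groups = new Map(); // a Map keeps the (already sorted) insertion order
  for (const row of rows) {
    if (!groups.has(row.key)) groups.set(row.key, []);
    groups.get(row.key).push(row);
  }
  return [...groups].map(([key, members]) => ({
    key,
    value: reduceFn(members.map((r) => [r.key, r.id]), members.map((r) => r.value)),
  }));
}

console.log(groupReduce(makeRows, sumReduce));
// [ { key: 'Audi', value: 2 }, { key: 'VW', value: 3 } ]
```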
Array-Key Map/Reduce Example
A list of cars (again, the same five documents)

Step 1: Make a map, with an array as key
function(doc) { emit([doc.make, doc.model, doc.year], 1); }

Result (with group=true):
Key , Value
[Audi, A3, 2000] , 1
[Audi, A4, 2009] , 1
[VW, Golf, 2008] , 1
[VW, Golf, 2009] , 1
[VW, Polo, 2010] , 1
Array-Key Map/Reduce Querying
startkey=["Audi"] (&group=true)
Key , Value
[Audi, A3, 2000] , 1
[Audi, A4, 2009] , 1
[VW, Golf, 2008] , 1
[VW, Golf, 2009] , 1
[VW, Polo, 2010] , 1

startkey=["VW"] (&group=true)
Key , Value
[VW, Golf, 2008] , 1
[VW, Golf, 2009] , 1
[VW, Polo, 2010] , 1

endkey=["VW"] (&group=true)
Key , Value
[Audi, A3, 2000] , 1
[Audi, A4, 2009] , 1

Remember: every [VW, ...] key sorts after the endkey ["VW"], so no VW row is in the result list.
Array-Key Map/Reduce Ranges
Step 4: Range queries:
- startkey=["VW","Golf"]
- endkey=["VW","Polo"]
- (&group=true)
Key , Value
[VW, Golf, 2008] , 1
[VW, Golf, 2009] , 1

What if we do not know the next model after Golf?
- startkey=["VW","Golf"]
- endkey=["VW","Golf",99999]
- (&group=true)
- better: endkey=["VW","Golf",{}]
Key , Value
[VW, Golf, 2008] , 1
[VW, Golf, 2009] , 1
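Why `{}` works as an upper bound follows from the ordering of mixed types: objects sort after every number, string and array. The sketch below is a simplified comparison under stated assumptions: `compareKeys` and `rangeQuery` are hypothetical helpers, strings are compared by code unit (CouchDB uses ICU collation), and objects are not compared with each other.

```javascript
// Type order: null < false < true < numbers < strings < arrays < objects.
function rank(v) {
  if (v === null) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === "number") return 3;
  if (typeof v === "string") return 4;
  if (Array.isArray(v)) return 5;
  return 6; // any other object
}

function compareKeys(a, b) {
  const ra = rank(a);
  const rb = rank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 3) return a - b;
  if (ra === 4) return a < b ? -1 : a > b ? 1 : 0; // simplification, see lead-in
  if (ra === 5) {
    const n = Math.min(a.length, b.length);
    for (let i = 0; i < n; i++) {
      const c = compareKeys(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length; // a prefix sorts before the longer array
  }
  return 0; // equal special values; objects are not compared further here
}

// Hypothetical range filter with inclusive bounds; a missing bound is open.
function rangeQuery(rows, startkey, endkey) {
  return rows.filter(
    (r) =>
      (startkey === undefined || compareKeys(r.key, startkey) >= 0) &&
      (endkey === undefined || compareKeys(r.key, endkey) <= 0)
  );
}

const carRows = [
  { key: ["Audi", "A3", 2000], value: 1 },
  { key: ["Audi", "A4", 2009], value: 1 },
  { key: ["VW", "Golf", 2008], value: 1 },
  { key: ["VW", "Golf", 2009], value: 1 },
  { key: ["VW", "Polo", 2010], value: 1 },
];

console.log(rangeQuery(carRows, ["VW", "Golf"], ["VW", "Golf", {}]));
```

With `{}` as the last element, the end key stays correct even if a year above 99999 or a non-numeric third element ever appears in a key.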
Grouping with group_level
group=true (aka group_level=exact)
Key , Value
[Audi, A3, 2000] , 1
[Audi, A4, 2009] , 1
[VW, Golf, 2008] , 1
[VW, Golf, 2009] , 1
[VW, Polo, 2010] , 1

group_level=1 (no group=true needed)
Key , Value
[Audi] , 2
[VW] , 3

group_level=2 (no group=true needed)
Key , Value
[Audi, A3] , 1
[Audi, A4] , 1
[VW, Golf] , 2
[VW, Polo] , 1

group_level=3 -> group_level=exact -> group=true
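The effect of group_level can be sketched by truncating each array key to its first N elements and adding up the values per truncated key. `groupByLevel` is a hypothetical helper that assumes a sum-style reduce; it is not how CouchDB computes the result internally.

```javascript
const arrayKeyRows = [
  { key: ["Audi", "A3", 2000], value: 1 },
  { key: ["Audi", "A4", 2009], value: 1 },
  { key: ["VW", "Golf", 2008], value: 1 },
  { key: ["VW", "Golf", 2009], value: 1 },
  { key: ["VW", "Polo", 2010], value: 1 },
];

// Hypothetical helper: group rows by the first `level` key elements and sum the values.
function groupByLevel(rows, level) {
  const totals = new Map(); // keeps the sorted insertion order
  for (const row of rows) {
    const prefix = JSON.stringify(row.key.slice(0, level));
    totals.set(prefix, (totals.get(prefix) || 0) + row.value);
  }
  return [...totals].map(([prefix, value]) => ({ key: JSON.parse(prefix), value }));
}

console.log(groupByLevel(arrayKeyRows, 1)); // one row per make
console.log(groupByLevel(arrayKeyRows, 2)); // one row per make and model
console.log(groupByLevel(arrayKeyRows, 3)); // same as group=true for these keys
```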
Examples:
Get all car makes:
- group_level=1
Key , Value
[Audi] , 2
[VW] , 3

Get all models from VW:
- startkey=["VW"]&endkey=["VW",{}]&group_level=2
Key , Value
[VW, Golf] , 2
[VW, Polo] , 1

Get all years of VW Golf:
- startkey=["VW","Golf"]&endkey=["VW","Golf",{}]&group_level=3
Key , Value
[VW, Golf, 2008] , 1
[VW, Golf, 2009] , 1
Reduce / Rereduce:
A rule for writing reduce functions:
A reduce function must accept not only the result of a map as input, but also its own output.

Why?
- A reduce function can be called more than once.
- If the map is too large, it is split and each part runs through the reduce function; finally, all the partial results run through the same reduce function again.

function(doc) { emit(doc.make, 1); }
Key , Value
Audi , 2
VW , 3

function(keys, values) { return sum(values); }
Key , Value
null , 5

WTF?
Reduce / Rereduce: Example for counting values (will produce a wrong result!)
function(keys, values) { return count(values); }

The map (1000 rows):
Key , Value
1 , 1
2 , 10
3 , 4
…
999 , 7
1000 , 12

Split: the map is cut into three parts, and each part runs through the reduce function:
Part 1 (keys 1 … 333): Key , Value null , 333
Part 2 (keys 334 … 666): Key , Value null , 333
Part 3 (keys 667 … 1000): Key , Value null , 334

The three partial results then run through the same function again:
Key , Value
null , 3

Boom! 3 != 1000
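The failure is easy to reproduce. In this sketch `count` is a local stand-in for the helper used on the slide (it is not, as far as I know, a CouchDB built-in), and the split into three parts is done by hand to imitate what the slide describes.

```javascript
// Local stand-in for the slide's count(); simply the number of values.
function count(values) {
  return values.length;
}

// The naive reduce function from the slide: it ignores where its input comes from.
const naiveReduce = function (keys, values) {
  return count(values);
};

// 1000 map rows; the actual values do not matter for counting.
const allValues = new Array(1000).fill(1);

// Imitate the split into three parts.
const parts = [allValues.slice(0, 333), allValues.slice(333, 666), allValues.slice(666)];
const partialCounts = parts.map((part) => naiveReduce(null, part));

// Second pass: the same function now receives the three partial results.
const wrongTotal = naiveReduce(null, partialCounts);

console.log(partialCounts); // [ 333, 333, 334 ]
console.log(wrongTotal); // 3, because it counts the partial results instead of adding them
```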
Reduce / Rereduce: Solution: the rereduce flag (not mentioned yet)
- Indicates whether the function is called on the result of the map (rereduce=false) or on its own earlier results (rereduce=true). Set by CouchDB.

function(keys, values, rereduce) {
  if (rereduce == false) {
    return count(values);
  } else {
    return sum(values);
  }
}

The map (1000 rows):
Key , Value
1 , 1
2 , 10
3 , 4
…
999 , 7
1000 , 12

Split (rereduce=false): each part is counted:
Part 1 (keys 1 … 333): Key , Value null , 333
Part 2 (keys 334 … 666): Key , Value null , 333
Part 3 (keys 667 … 1000): Key , Value null , 334

Rereduce (rereduce=true): the partial counts are summed:
Key , Value
null , 1000

Correct
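A runnable version of the fix, with the chunking simulated by hand. `count`, `sum` and `reduceInChunks` are local helpers written for this sketch (only `sum` has a counterpart in CouchDB's JavaScript view server, as far as I know), and the chunk sizes are arbitrary choices, not values CouchDB uses.

```javascript
function sum(values) {
  return values.reduce((a, b) => a + b, 0);
}

function count(values) {
  return values.length;
}

// Rereduce-aware counting: count map values first, add partial counts afterwards.
function countReduce(keys, values, rereduce) {
  if (rereduce === false) {
    return count(values);
  }
  return sum(values);
}

// Hypothetical driver: reduce chunks of at most chunkSize values, then rereduce the partials.
function reduceInChunks(values, chunkSize) {
  const partials = [];
  for (let i = 0; i < values.length; i += chunkSize) {
    partials.push(countReduce(null, values.slice(i, i + chunkSize), false));
  }
  return countReduce(null, partials, true);
}

const thousandValues = new Array(1000).fill(1);
console.log(reduceInChunks(thousandValues, 333)); // 1000
console.log(reduceInChunks(thousandValues, 100)); // 1000
```

The total no longer depends on how the rows happen to be split, which is the property a reduce function needs.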
Input of a reduce function:
The map:
Doc._id , Key , Value
4 , "Audi" , 12.000
2 , "BMW" , 20.000
1 , "Citroen" , 9.000
3 , "Dacia" , 6.500

The function:
function(keys, values, rereduce) { return sum(values); }

Input values 1 (rereduce=false):
- keys: [ ["Audi",4],["BMW",2],["Citroen",1],["Dacia",3] ]
- values: [12.000, 20.000, 9.000, 6.500]
- rereduce: false

Input values 2 (rereduce=true):
- keys: null
- values: [47.500]
- rereduce: true
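The two calls can be replayed directly. In this sketch `sum` is defined locally as a stand-in for CouchDB's helper, and the slide's values such as "12.000" are read as the plain number 12000, which is an assumption about the notation.

```javascript
// Local stand-in for the sum() helper available to CouchDB reduce functions.
function sum(values) {
  return values.reduce((a, b) => a + b, 0);
}

// The reduce function from the slide.
const priceReduce = function (keys, values, rereduce) {
  return sum(values);
};

// First call: keys are [key, doc id] pairs, values are the emitted values.
const firstPass = priceReduce(
  [["Audi", 4], ["BMW", 2], ["Citroen", 1], ["Dacia", 3]],
  [12000, 20000, 9000, 6500],
  false
);

// Second call: keys are null, values hold the earlier reduce result.
const secondPass = priceReduce(null, [firstPass], true);

console.log(firstPass); // 47500
console.log(secondPass); // 47500
```

Because summing partial sums gives the same total, this function needs no special handling of the rereduce flag.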
Where does Map/Reduce live?
Map/Reduce functions are stored in a design document in the "views" key:
{
  "_id": "_design/example",
  "views": {
    "simplereduce": {
      "map": "function(doc) { emit(doc.make,1); }",
      "reduce": "function (keys, values) { return sum (values); }"
    }
  }
}

Map/Reduce functions start when a view is called:
http://localhost:5984/mapreduce/_design/example/_view/simplereduce
http://localhost:5984/mapreduce/_design/example/_view/simplereduce?key="Audi"
http://localhost:5984/mapreduce/_design/example/_view/simplereduce?key="VW"&group=true
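When such URLs are built in code, key values have to be sent as JSON (hence the quotes around "Audi") and are normally percent-encoded. The helper below is a hypothetical sketch, not part of any CouchDB client library; it JSON-encodes every parameter, so it suits JSON-valued parameters such as key, startkey, endkey, group, group_level, reduce and limit, but not bare-word ones such as stale=ok.

```javascript
// Hypothetical helper: build a view URL with JSON-encoded, percent-encoded parameters.
function viewUrl(base, db, designDoc, view, params = {}) {
  const query = Object.entries(params)
    .map(([name, value]) => `${name}=${encodeURIComponent(JSON.stringify(value))}`)
    .join("&");
  const path = `${base}/${db}/_design/${designDoc}/_view/${view}`;
  return query ? `${path}?${query}` : path;
}

const base = "http://localhost:5984";
console.log(viewUrl(base, "mapreduce", "example", "simplereduce"));
console.log(viewUrl(base, "mapreduce", "example", "simplereduce", { key: "VW", group: true }));
// http://localhost:5984/mapreduce/_design/example/_view/simplereduce?key=%22VW%22&group=true
```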
View calling
- All documents in the database are passed through the view's map function once
- After the first call: only new and changed docs are passed through the function when the view is called again
- The results are stored in CouchDB's internal B+tree
- The result that you receive is the stored B+tree result
- That means: the first call of a view can take a little time, because the tree has to be built before you get the results. If there are no changes to docs, the next call returns the result instantly
- Key queries like startkey and endkey are performed on the B+tree result; no rebuild needed
- There are several parameters for calling a view: limit, skip, include_docs=true, key, startkey, endkey, descending, stale (ok, update_after), group, group_level, reduce (=false)
View calling parameters
- limit: limits the output
- skip: skips a number of documents
- include_docs=true: when no reduce is used, the docs are sent with the map list
- key, startkey, endkey: should be known by now
- startkey_docid=x: only docs with id>=x
- endkey_docid=x: only docs with id<x
- descending=true: reverse order. When using startkey/endkey, they must be swapped
- stale=ok: do not start indexing, just deliver the stored result
- stale=update_after: deliver old results, start indexing after that
- group, group_level, reduce=false: should be known by now
You've made it!