Presentation given during Berlin Buzzwords 2011. Talk on project ZieOok: building a generic recommendation platform on top of Mahout and Hadoop.
ZieOok (‘AlsoSee’) building a generic recommendation framework for
the cultural heritage field
by Siem Vaessen - managing partner @ Zimmerman & Zimmerman
Berlin Buzzwords 2011
About Images for the Future
Preserving audiovisual heritage of the Netherlands through conservation and digitization;
Seven-year project, started in 2007, will end in 2014;
Budget of €154 million;
During the project, a total of 137,200 hours of video, 22,510 hours of film, 123,900 hours of audio, and 2.9 million photos from these archives will be restored, preserved, digitized, and disclosed through various services.
So what to do with all this data, besides digitization...?
Current status
more info @ http://imagesforthefuture.com/en/
+ loads of interfaces, applications and tools built on top of this content
Main purpose ZieOok (‘AlsoSee’)
“To create meaningful relations between assets and users by means of a recommendation engine” (June 2009)
Build an API which will fully function based on REST calls on top of the Mahout/Hadoop setup;
Develop a recommendation framework based on an existing framework;
Develop an administrator dashboard: a central hub for controlling main components of the recommendation framework (GUI);
Code developed within ZieOok needs to become open-source.
Long tail
Bringing niche content to users
The ‘market-analysis’
Identify codebase that is suitable for the project;
Make sure that codebase is sustainable.
Question: can a semantic correlation be established within the project?
1. Lexicon- or ontology-based (connecting thesauri);
2. A trust-network-based system built on the FOAF (Friend of a Friend) specification;
3. Context-adaptable system that extracts additional information from the lexicon or the ontology.
Two frameworks identified
“Duine Framework is a (collection of) software libraries that allows developers to create prediction engines.”
Telematica Instituut/Novay / version 4.0.0.0 RC1 (17/2/09)
Apache Lucene Mahout (fka: Taste)
At that time version 0.2;
An Apache foundation project;
Apache License, version 2.0.
Choice made! And now for the actual work...
Core concept ZieOok
Technical architecture ZieOok
Rails ‘front-end’ structure
ZieOok datamodel: FOAF
Friend of a Friend specification. (http://www.foaf-project.org/)
<foaf:Person>
  <foaf:gender />
  <foaf:age />
  <foaf:knows />
  <foaf:based_near />
  <foaf:made rdf:resource="some-rating-uri" />
</foaf:Person>
<zieook:rating>
  <foaf:maker rdf:resource="some-user-uri" />
  <foaf:Document rdf:resource="item-uri" />
  <rdf:DateTime />
  <zieook:value />
  <zieook:range />
  <zieook:source rdf:resource="source-uri" />
  <zieook:recom rdf:resource="recommender-uri" />
</zieook:rating>
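As a rough illustration of the rating record above, the following Python sketch mirrors its fields in a plain data class and rescales a rating by its range. The class name, the field types, the reading of zieook:range as the maximum possible value, and the `normalized` helper are all assumptions made for illustration; none of them comes from ZieOok or the FOAF specification.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Rating:
    """Hypothetical in-memory mirror of the <zieook:rating> element."""
    maker: str          # URI of the user who made the rating (foaf:maker)
    document: str       # URI of the rated item (foaf:Document)
    timestamp: datetime
    value: float        # zieook:value
    value_range: float  # zieook:range, assumed here to be the maximum value
    source: str         # URI given in zieook:source
    recommender: str    # URI given in zieook:recom

    def normalized(self) -> float:
        """Rescale to 0..1 so ratings with different ranges can be compared."""
        if self.value_range <= 0:
            raise ValueError("value_range must be positive")
        return self.value / self.value_range


rating = Rating(
    maker="some-user-uri",
    document="item-uri",
    timestamp=datetime(2011, 6, 6),
    value=4.0,
    value_range=5.0,
    source="source-uri",
    recommender="recommender-uri",
)
print(rating.normalized())
```

Normalizing by range is one simple way to mix a like/dislike source with a five-star source in the same recommender.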
ZieOok Dashboard: central hub
Import and train collections of content providers;
Grant access to the Dashboard for content providers;
Create recommenders;
Create templates for recommenders;
Provide statistics;
Provide an HTML widget for simple usage on blogs etc.;
Provide a REST API to build GUIs and recommendations;
Set filters on recommendations (date limit, use only subparts of collections).
Collections, users and ratings
Twofold way:
1. Using OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting)
http://anyplace.org/OAI?verb=GetRecord&identifier=oai:arXiv.org:hep-th/9901001&metadataPrefix=oai_czp
+ collections are updated by the content provider;
+ a variety of connectors is available (oai_dc, Dublin Core);
- no user information is stored in OAI, however: this takes a specific ZieOok job;
- cold start problem: no information on ratings or users.
2. Using the Movielens format
add collection file; add ratings file; add user file;
+ ‘ideal start’: all data available from collection, users and ratings;
- static: updates need to arrive from the content platform itself, no harvesting mechanism available.
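The two import routes above can be sketched in a few lines of Python. The first function builds an OAI-PMH GetRecord request like the example URL; the second parses one ratings line. The slides do not say which MovieLens variant ZieOok reads, so the `UserID::MovieID::Rating::Timestamp` layout of the MovieLens 1M ratings file is an assumption, and both function names are hypothetical.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlencode, urlparse


def oai_get_record_url(base_url: str, identifier: str,
                       metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord request URL for a single record."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


def parse_movielens_rating(line: str) -> Tuple[str, str, float, int]:
    """Parse one 'UserID::MovieID::Rating::Timestamp' line (assumed layout)."""
    user_id, item_id, rating, timestamp = line.strip().split("::")
    return user_id, item_id, float(rating), int(timestamp)


url = oai_get_record_url("http://anyplace.org/OAI", "oai:arXiv.org:hep-th/9901001")
params = parse_qs(urlparse(url).query)  # decode the query string again
print(params["identifier"][0])
print(parse_movielens_rating("1::1193::5::978300760"))
```

Note that `urlencode` percent-encodes the colons and slash in the identifier; the decoded value is unchanged.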
Recommendations
Two ways to render recommendations
1. Simple HTML widget
The recommendation created by ZieOok renders as unstyled HTML: top 5 recommendations; like/dislike;
2. Call on the ZieOok REST API
Get full access from the ZieOok API to build custom recommenders
import/analyse/train data;
use REST calls to the ZieOok framework;
real-time;
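To make the widget option concrete, here is a minimal Python sketch that renders the top 5 recommendations as unstyled HTML with like/dislike controls. The function name, the `(title, item_uri)` input shape and the `data-*` attributes are assumptions for illustration; the actual ZieOok widget markup is not shown in the slides.

```python
from html import escape
from typing import List, Tuple


def render_widget(recommendations: List[Tuple[str, str]], limit: int = 5) -> str:
    """Render the first `limit` (title, item_uri) pairs as an unstyled list."""
    rows = []
    for title, item_uri in recommendations[:limit]:
        safe_title = escape(title)              # escape text content
        safe_uri = escape(item_uri, quote=True)  # escape attribute values
        rows.append(
            f'<li><a href="{safe_uri}">{safe_title}</a> '
            f'<button data-item="{safe_uri}" data-vote="like">like</button> '
            f'<button data-item="{safe_uri}" data-vote="dislike">dislike</button></li>'
        )
    return "<ol>\n" + "\n".join(rows) + "\n</ol>"


items = [(f"Item {i}", f"http://example.org/item/{i}") for i in range(1, 8)]
widget_html = render_widget(items)
print(widget_html)
```

Leaving the markup unstyled lets the embedding blog or site apply its own CSS.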
Use case
Connect Dutch Broadcasting Organisation (NPO) to ZieOok (on-demand).
Front-end:
Recommend items
Rate items (like/dislike)
See similar users & connect
Back-end (Dashboard):
Set linear recommenders (between 16:00-18:00, 18:00-20:00, 20:00-00:00)
Filters (limit date on content, or only show the categories ‘sports, news’ within the collection)
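The back-end settings of this use case can be sketched as follows: pick a recommender by time slot, then keep only items from the allowed categories. The slot boundaries come from the slide; the recommender names, the item dictionaries and both function names are hypothetical.

```python
from datetime import time
from typing import Dict, Iterable, List, Optional, Set

# Time slots from the use case; the recommender names are made up.
SLOTS = [
    (time(16, 0), time(18, 0), "afternoon-recommender"),
    (time(18, 0), time(20, 0), "early-evening-recommender"),
    (time(20, 0), None, "late-evening-recommender"),  # None: runs until midnight
]


def recommender_for(now: time) -> Optional[str]:
    """Return the recommender whose slot contains `now`, or None outside all slots."""
    for start, end, name in SLOTS:
        if now >= start and (end is None or now < end):
            return name
    return None


def filter_by_category(items: Iterable[Dict[str, str]],
                       allowed: Set[str]) -> List[Dict[str, str]]:
    """Keep only items whose 'category' is in the allowed set."""
    return [item for item in items if item["category"] in allowed]


catalogue = [
    {"title": "Evening news", "category": "news"},
    {"title": "Cup final", "category": "sports"},
    {"title": "Costume drama", "category": "drama"},
]
print(recommender_for(time(17, 30)))
print([i["title"] for i in filter_by_category(catalogue, {"sports", "news"})])
```

Slots are treated as half-open intervals, so 18:00 belongs to the second slot rather than the first.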
Quality of recommendations
So what defines quality?
Quality set by a gold standard;
But also define non-quality such as:
Currently an editorial process
Also see:
X
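The slides do not say how recommendations are scored against the gold standard. One common option, shown here purely as an assumption, is precision at k: the share of the top k recommended items that also appear in an editorially curated gold set. The function name and sample data are hypothetical.

```python
from typing import Sequence, Set


def precision_at_k(recommended: Sequence[str], gold: Set[str], k: int = 5) -> float:
    """Fraction of the top-k recommended items that appear in the gold standard."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in gold)
    return hits / k


recommended = ["a", "b", "c", "d", "e"]
gold = {"a", "c", "x"}  # items an editor marked as good recommendations
print(precision_at_k(recommended, gold))
```

Dividing by k rather than by the number of returned items penalizes a recommender that returns fewer than k results.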
Roadmap
Short term
1. Bring ZieOok onstream: end of this month (June 2011);
2. Release ZieOok REST API to the community (under discussion);
3. Connect content-platforms.
Long term
1. Maintain ZieOok cluster for a 3-year period;
2. Hybrid recommender (recommend cross-platform);
3. Identify risks in development and upgrades: Mahout API changes, Hadoop changes etc.
End of presentation / Q&A
by Siem Vaessen - managing partner @ Zimmerman & Zimmerman