These slides are from my Hippo GetTogether 2013 presentation, in which I went into detail about the architecture behind our high-performance relevance platform. The talk also covers why we chose Couchbase for storage and how Elasticsearch can be used for search and analytics, and shows how we integrated both products end to end within our Hippo CMS product.
Building a relevance platform with Couchbase and Elasticsearch
Hippo GetTogether, 21 June 2013
Jeroen Reijn | @jreijn | #hgt2013
Hippo GetTogether 2013
follow the Hippo trail
About me
• Architect @ Hippo
• DevOps guy
• Blogger @ http://blog.jeroenreijn.com
OneHippo @ Goto
Relevance?
“The capability of a search engine or function to
retrieve data appropriate to a user's needs.”
http://www.thefreedictionary.com/relevance
How we deliver relevant content @Hippo
Registration
Visitor - entity making HTTP requests
Collector - records data about a visitor or his behavior
Example: location collector (GeoIPCollector)
Targeting Data - all data about a specific visitor
Example: IP address is located in Amsterdam
Matching
Characteristic - a type of fact about visitors
Example: "comes from a city", "experiences a type of weather"
Target Group - the specification of a Characteristic
Example: "comes from a European city", "comes from Amsterdam"
Persona - one or more target groups that describe a certain type of visitor
Example: "Jim, the European urban consumer",
"Alice, the Pet owner"
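To make the vocabulary of the last two slides concrete, here is a minimal sketch of how collectors, targeting data, target groups and personas could fit together. Everything in it is an assumption for illustration: the `GEO_LOOKUP` table stands in for a real GeoIP database, and the classes and the "fraction of matching target groups" scoring rule are hypothetical, not Hippo's actual API or algorithm.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a GeoIP database lookup (IP address -> location).
GEO_LOOKUP = {
    "145.0.0.1": {"country": "NL", "city": "Amsterdam", "continent": "Europe"},
}


def geo_collector(request):
    """Collector: records location data about the visitor making the request."""
    return {"geo": GEO_LOOKUP.get(request["remoteAddr"], {})}


@dataclass
class TargetGroup:
    """Specification of a characteristic, e.g. 'comes from Amsterdam'."""
    name: str
    collector_id: str   # which collector's targeting data to inspect
    field_name: str     # which field of that data to test
    accepted: set       # values that put the visitor in this group

    def matches(self, targeting_data):
        value = targeting_data.get(self.collector_id, {}).get(self.field_name)
        return value in self.accepted


@dataclass
class Persona:
    """One or more target groups describing a type of visitor."""
    name: str
    target_groups: list = field(default_factory=list)

    def score(self, targeting_data):
        """Hypothetical rule: fraction of this persona's target groups that match."""
        if not self.target_groups:
            return 0.0
        hits = sum(tg.matches(targeting_data) for tg in self.target_groups)
        return hits / len(self.target_groups)


# Usage: build "Jim, the European urban consumer" from two target groups.
european_city = TargetGroup("comes from a European city", "geo", "continent", {"Europe"})
amsterdam = TargetGroup("comes from Amsterdam", "geo", "city", {"Amsterdam"})
jim = Persona("Jim, the European urban consumer", [european_city, amsterdam])
```

A visitor from the Amsterdam address matches both target groups and scores 1.0 for Jim; an unknown address matches neither.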
What do we store?
Request log
Targeting data
Statistics
Averages, e.g. how many visitors became which persona
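The "how many visitors became which persona" statistic can be sketched as a simple aggregation: take each visitor's highest-scoring persona and compute each persona's share. The input shape (`visitorId -> {personaId: score}`) and the function names are hypothetical; the slides do not say how the statistics are actually computed.

```python
from collections import Counter


def top_persona(persona_scores):
    """Return the highest-scoring persona id, or None when nothing was scored."""
    return max(persona_scores, key=persona_scores.get) if persona_scores else None


def persona_distribution(scores_by_visitor):
    """Fraction of visitors whose top persona is each persona id.

    scores_by_visitor is a hypothetical input shape: visitorId -> {personaId: score}.
    """
    tops = [top_persona(s) for s in scores_by_visitor.values()]
    counts = Counter(t for t in tops if t is not None)  # skip unscored visitors
    total = sum(counts.values())
    return {pid: n / total for pid, n in counts.items()} if total else {}
```

With four visitors of whom three lean towards "jim", the result is `{"jim": 0.75, "alice": 0.25}`.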
BIG DATA !!
Real-time analysis
Architecture
[Diagram: an App server running the Hippo Delivery Tier and Hippo Repository on top of an RDBMS, delivering XML, JSON and (X)HTML]
Delivery Tier
Request → URL Matching → Fetch content → Compose output → Response
Delivery Tier
Request → URL Matching → Targeting Data Collection → Scoring → Fetch content → Compose output → Response
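The delivery tier with targeting can be sketched as one request handler that runs the steps in order. This is a hypothetical illustration: the function name, the simplified URL matching (strip everything up to `/site`), the "best persona wins" rule and the `(path, persona)` content lookup are all assumptions, not Hippo's delivery tier code.

```python
def handle_request(request, collectors, personas, content_by_persona, default_content):
    """Hypothetical delivery-tier pipeline with targeting steps added."""
    # 1. URL matching: map the site URL onto a content path (simplified).
    path = request["url"].split("/site", 1)[-1] or "/"

    # 2. Targeting data collection: every collector contributes its data.
    targeting_data = {cid: collect(request) for cid, collect in collectors.items()}

    # 3. Scoring: score each persona against the collected targeting data.
    scores = {pid: score(targeting_data) for pid, score in personas.items()}
    best = max(scores, key=scores.get) if scores else None
    if best is not None and scores[best] <= 0:
        best = None  # no persona matched at all

    # 4. Fetch content: persona-specific variant first, then the default.
    content = content_by_persona.get((path, best), default_content.get(path, ""))

    # 5. Compose output.
    return f"<html><body>{content}</body></html>"
```

Keeping collection and scoring as separate steps means new collectors can be plugged in without touching the scoring or content lookup.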
Scaling
Scaling out
[Diagram: two App servers, each running a Hippo Delivery Tier and Hippo Repository, sharing one RDBMS]
Scaling out
[Diagram: two App servers, each running a Delivery Tier and Repository, sharing one RDBMS plus a Targeting Datastore]
What kind of ‘storage’?
Question?
Distributed Cache?
We have a winner!
Requirements change!
NoSQL to the rescue
Suitable types
• Key-value store
• Document database
Assessment Criteria
• Maturity
• Data model
• Consistency model
• Performance
• Replication
• Caching model
• Query model
• Monitoring
• Scalability
• Reliability
• Support
Selection Criteria
• Performance
• Scalability
• Schema flexibility
• Simplicity
• Monitoring
• Support
Performance !!
Performance !!!!
Scalability
Schema flexibility
{
  "visitorId": "7a1c7e75-8539-40",
  "pageUrl": "http://localhost:8080/site/news",
  "pathInfo": "/news",
  "remoteAddr": "127.0.0.1",
  "referer": "http://localhost:8080/site/",
  "timestamp": 1371419505909,
  "collectorData": {
    "geo": {
      "country": "",
      "city": "",
      "latitude": 0,
      "longitude": 0
    },
    "returningvisitor": false,
    "channel": "English Website"
  },
  "personaIdScores": [],
  "globalPersonaIdScores": []
}
Request log document
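A request log entry like the one above can be built as a plain dictionary and serialized to JSON for storage. The field names are taken from the example document; the builder function and the `requestlog::<visitorId>::<timestamp>` key scheme are hypothetical choices for this sketch, not the platform's actual key layout. The `timestamp` is in epoch milliseconds, as in the example.

```python
import json
import time


def request_log_document(visitor_id, page_url, path_info, remote_addr, referer,
                         collector_data, timestamp_ms=None):
    """Build a request log entry with the fields shown in the example above."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # epoch milliseconds
    return {
        "visitorId": visitor_id,
        "pageUrl": page_url,
        "pathInfo": path_info,
        "remoteAddr": remote_addr,
        "referer": referer,
        "timestamp": timestamp_ms,
        "collectorData": collector_data,
        "personaIdScores": [],
        "globalPersonaIdScores": [],
    }


def request_log_key(doc):
    """Hypothetical document key: one entry per visitor per request timestamp."""
    return f"requestlog::{doc['visitorId']}::{doc['timestamp']}"
```

Because the document is plain JSON, adding a new collector only adds a key under `collectorData`; no schema migration is needed.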
{
  "geo": {
    "collectorId": "geo",
    "city": "",
    "country": "",
    "latitude": 0,
    "longitude": 0
  },
  "channel": {
    "collectorId": "channel",
    "channels": [ "English Website" ],
    "lastVisitedChannel": "English Website"
  }
}
Visitor document
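The visitor document keeps one entry per collector, and the request log carries per-request `collectorData`. One plausible way to fold a request into the visitor document is sketched below. The merge rules (latest location wins, channels accumulate without duplicates) and the function name are assumptions inferred from the two example documents, not the platform's code.

```python
def update_visitor_document(visitor_doc, collector_data):
    """Merge one request's collectorData into the visitor document (hypothetical rules)."""
    doc = dict(visitor_doc)  # shallow copy; the input document is left untouched

    geo = collector_data.get("geo")
    if geo is not None:
        # Latest known location wins, tagged with the collector that produced it.
        doc["geo"] = {"collectorId": "geo", **geo}

    channel = collector_data.get("channel")
    if channel is not None:
        previous = doc.get("channel", {})
        channels = list(previous.get("channels", []))
        if channel not in channels:
            channels.append(channel)  # remember every channel visited, once
        doc["channel"] = {
            "collectorId": "channel",
            "channels": channels,
            "lastVisitedChannel": channel,
        }
    return doc
```

Applied to an empty document with the `collectorData` from the request log example, this yields exactly the visitor document shown above.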
Simplicity
Monitoring
Support
Couchbase
Why Couchbase?
• Drop-in replacement for memcached
• Read/Write-through cache
• High throughput
• Easy scalability
• Schema flexibility
• Low latency
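The read/write-through pattern behind the second bullet can be shown with a generic cache in front of a slower store. This is the textbook pattern, not Couchbase's internals (in Couchbase the managed cache and persistence live inside the server); the class name and the dict-like backing store are stand-ins.

```python
class ReadWriteThroughCache:
    """Generic read-through / write-through cache in front of a slower store."""

    def __init__(self, store):
        self.store = store   # any dict-like backing store (a stand-in for disk or a DB)
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        # Read-through: on a miss, load from the store and populate the cache.
        self.misses += 1
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        # Write-through: update the store and the cache in the same call.
        self.store[key] = value
        self.cache[key] = value
```

The application only talks to the cache; it never has to decide when to load from or write to the backing store.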
Couchbase
• Open Source
• Document-oriented
• Easily Scalable
• Consistent High Performance
• Apache license
Performance
• Object-managed cache
• Asynchronous write queue to disk
• Avoids a cold cache
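The idea behind a write queue to disk is that a write is acknowledged as soon as it is in memory, and persisted later; after a restart the memory is warmed up from disk instead of starting cold. The sketch below illustrates that idea with a JSON file; it is not Couchbase's storage engine, `flush()` stands in for a background flusher, and keeping only the latest pending value per key is a choice of this sketch.

```python
import json
import os
import tempfile


class QueuedPersistenceStore:
    """Writes are acknowledged from memory and persisted later by flush()."""

    def __init__(self, path):
        self.path = path
        self.memory = {}
        self.pending = {}   # disk write queue; this sketch keeps the latest value per key

    def set(self, key, value):
        self.memory[key] = value
        self.pending[key] = value   # acknowledged before it reaches disk

    def get(self, key):
        return self.memory.get(key)

    def flush(self):
        """Persist queued writes; a real server would do this on a background thread."""
        on_disk = self._load()
        on_disk.update(self.pending)
        with open(self.path, "w") as f:
            json.dump(on_disk, f)
        written = len(self.pending)
        self.pending.clear()
        return written

    def warm_up(self):
        """After a restart, reload memory from disk instead of starting cold."""
        self.memory = self._load()

    def _load(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)


# Usage: two writes to one key coalesce, then a "restart" warms up from disk.
path = os.path.join(tempfile.mkdtemp(), "store.json")
store = QueuedPersistenceStore(path)
store.set("visitor::1", {"city": "Utrecht"})
store.set("visitor::1", {"city": "Amsterdam"})
store.set("visitor::2", {"city": "Rotterdam"})
written = store.flush()
restarted = QueuedPersistenceStore(path)
```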
Source: http://www.slideshare.net/Couchbase/benchmarking-couchbase Copyright © Altoros Systems, Inc.
Easily scalable
• Auto sharding
• Cross data center replication (XDCR)
• Master-master replication
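Auto sharding in Couchbase works by hashing each document key (with CRC32) onto one of a fixed number of vBuckets, 1024 by default on Linux, and mapping vBuckets to nodes; a rebalance moves whole vBuckets between nodes. The sketch below shows the idea only: the plain `crc32 % 1024` and the round-robin vBucket map are simplifications, and real Couchbase clients differ in the hashing details.

```python
import zlib

NUM_VBUCKETS = 1024  # Couchbase's default number of vBuckets on Linux


def vbucket_for(key):
    """Hash a document key onto a vBucket (simplified CRC32 modulo)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_vbucket_map(nodes):
    """Assign vBuckets to nodes round-robin (a stand-in for the cluster's vBucket map)."""
    return [nodes[vb % len(nodes)] for vb in range(NUM_VBUCKETS)]


def node_for(key, vbucket_map):
    """A client routes each key straight to the node that owns its vBucket."""
    return vbucket_map[vbucket_for(key)]
```

Since the key-to-vBucket mapping never changes, adding a node only requires moving some vBuckets and publishing a new map; no document needs to be rehashed.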
Flexible data model
• Native JSON support
• Incremental Map Reduce
• Gives power to the developer
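Couchbase views are defined as a JavaScript map function that calls `emit(key, value)` per document, plus an optional reduce such as the built-in `_count`; the index is updated incrementally, so only changed documents are re-mapped. The toy Python model below mirrors that behaviour to show the idea. It is not Couchbase code: the class and the `by_channel` map function are hypothetical, and the reduce is recomputed from stored rows rather than kept incrementally in the index.

```python
from collections import defaultdict


class IncrementalView:
    """Toy map/reduce view that re-maps only documents that changed."""

    def __init__(self, map_fn):
        self.map_fn = map_fn        # doc -> iterable of (key, value) rows, like emit()
        self.rows_by_doc = {}       # doc id -> rows that doc emitted
        self.map_calls = 0

    def update(self, doc_id, doc):
        """Called for each created or changed document; others are not re-mapped."""
        self.map_calls += 1
        self.rows_by_doc[doc_id] = list(self.map_fn(doc))

    def delete(self, doc_id):
        self.rows_by_doc.pop(doc_id, None)

    def count_by_key(self):
        """Grouped count reduce over all emitted rows (recomputed here for clarity)."""
        counts = defaultdict(int)
        for rows in self.rows_by_doc.values():
            for key, _value in rows:
                counts[key] += 1
        return dict(counts)


def by_channel(doc):
    """Map function: emit one row per request, keyed by channel."""
    channel = doc.get("collectorData", {}).get("channel")
    if channel is not None:
        yield channel, 1
```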
How we run Couchbase @Hippo
[Diagram: a Load Balancer in front of the Hippo Delivery Tier, which is backed by a Database cluster and a Couchbase cluster]
The Couchbase cluster stores:
• Request log data
• Targeting data
• Statistics data
Query capabilities
• Querying via views
• Secondary indexes via views
• Views based on Map - Reduce
• Lacks some advanced query capabilities
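A view acts as a secondary index: its emitted keys are kept sorted, and queries select a key range with the `startkey` and `endkey` parameters. The sketch below models that lookup in memory with a sorted list; the class is hypothetical and only illustrates the semantics (inclusive range, results in key order), not Couchbase's B-tree implementation. Anything beyond a key range, such as combining conditions on several fields, needs extra views, which is where the query model starts to feel limited.

```python
import bisect


class SecondaryIndex:
    """Sorted (key, doc id) index with startkey/endkey range lookups."""

    def __init__(self, key_fn):
        self.key_fn = key_fn      # doc -> index key (the value a view would emit)
        self.entries = []         # kept sorted by (key, doc id)
        self.key_of = {}          # doc id -> current key, to drop stale entries

    def index(self, doc_id, doc):
        self.remove(doc_id)
        key = self.key_fn(doc)
        bisect.insort(self.entries, (key, doc_id))
        self.key_of[doc_id] = key

    def remove(self, doc_id):
        if doc_id in self.key_of:
            self.entries.remove((self.key_of.pop(doc_id), doc_id))

    def range(self, startkey, endkey):
        """Doc ids whose key lies in [startkey, endkey], in key order."""
        keys = [k for k, _ in self.entries]
        lo = bisect.bisect_left(keys, startkey)
        hi = bisect.bisect_right(keys, endkey)
        return [doc_id for _, doc_id in self.entries[lo:hi]]
```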
Elasticsearch
• Apache Lucene
• Designed to be distributed
• Schema free
• Apache license
• RESTful API
Added value of ES
• Full text search
• Faceted search
• Geospatial search
• All in (near) real-time
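To show what faceted and geospatial search compute, here is an in-process sketch over a few hypothetical request documents: a terms facet counts documents per field value, and a geo-distance filter keeps documents within a radius using the haversine formula. Elasticsearch performs these server-side over its index; the functions here are illustrative only and are not its API.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0


def facet_counts(docs, field):
    """Count documents per distinct value of one field (a terms facet)."""
    return dict(Counter(d[field] for d in docs if field in d))


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(docs, lat, lon, max_km):
    """Keep documents within max_km of (lat, lon), like a geo-distance filter."""
    return [d for d in docs if haversine_km(lat, lon, d["lat"], d["lon"]) <= max_km]
```

For example, with visits from Amsterdam, Rotterdam and Paris, a 100 km radius around Amsterdam keeps the Amsterdam and Rotterdam visits (Rotterdam is roughly 58 km away, Paris roughly 430 km).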
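Replicating to ES
[Diagram: the Hippo Delivery Tier writes to the Couchbase Server Cluster and reads from the Elasticsearch Server Cluster via the Java API; Couchbase replicates data to Elasticsearch through XDCR and the Couchbase ES Transport plugin]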
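The replication flow can be modelled as a primary store with an ordered mutation log, a replicator that applies mutations past its last checkpoint, and a search index that only sees data once it has been replicated. This assumes, as the diagram suggests, that writes go to Couchbase and searches go to Elasticsearch; the three classes are hypothetical stand-ins for Couchbase, XDCR with the transport plugin, and Elasticsearch, and show why search results are near real-time rather than immediate.

```python
class PrimaryStore:
    """Stand-in for the Couchbase bucket: documents plus an ordered mutation log."""

    def __init__(self):
        self.docs = {}
        self.changes = []  # (sequence number, doc id, doc or None for a deletion)

    def upsert(self, doc_id, doc):
        self.docs[doc_id] = doc
        self.changes.append((len(self.changes) + 1, doc_id, doc))

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)
        self.changes.append((len(self.changes) + 1, doc_id, None))


class SearchIndex:
    """Stand-in for the Elasticsearch index that serves queries."""

    def __init__(self):
        self.docs = {}

    def apply(self, doc_id, doc):
        if doc is None:
            self.docs.pop(doc_id, None)
        else:
            self.docs[doc_id] = doc

    def search(self, field, value):
        return sorted(i for i, d in self.docs.items() if d.get(field) == value)


class Replicator:
    """Stand-in for XDCR: ships mutations after the last checkpoint to the index."""

    def __init__(self, source, target):
        self.source, self.target = source, target
        self.checkpoint = 0

    def replicate(self):
        pending = [c for c in self.source.changes if c[0] > self.checkpoint]
        for seq, doc_id, doc in pending:
            self.target.apply(doc_id, doc)
            self.checkpoint = seq
        return len(pending)
```

Until `replicate()` runs, a freshly written document is readable from the primary store but not yet searchable.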
What’s Next?
Advanced analytics
Demo time!
Thank you!
Questions?
[email protected] | @jreijn
ps. We’re hiring!