Tech talk at LinkedIn
Sensei
Volodymyr Zhabiuk
Agenda
1. History and motivation
2. High level architecture
3. Data guarantees
4. Features detailed overview
5. Quick demo
What is Sensei
• Search engine and database
• Built on top of Lucene
• Full text search, relevance, faceting
• Distributed, horizontally scalable
History
• Technology stack for LinkedIn.com's search, analytics and homepage
• Open sourced in 2009, first 1.0.0 release February 2012
• https://github.com/linkedin/sensei
• http://senseidb.com
• sensei-search Google group
• Used by Xiaomi, several other OS deployments
Why yet another Lucene based search engine?
• Indexing elevates query latency
• Hard to distribute
• Large memory overhead
• Comparatively slow
SenseiDB
• Designed for LinkedIn search use cases and the Homepage
Motivation
• Indexing/Query isolation
• Structured vs. unstructured data (e.g. fulltext search support)
• Faceted search
• Business intelligence
Sensei’s features
• Fast updates
• Rich query language - BQL
• Fulltext and faceted search
• Distributed and elastic
• Indexing and search customization
• In memory M/R
What Sensei doesn’t do
• Transactions and OLTP
• Dynamic shard rebalancing
• Multi tenancy and table joins
• Dynamic schema
Volume
• 5-100 million documents per node
• ~300K updates per minute
• Query latency < 100 ms
Deployments
• Search engine for SeaS
• Backend for USCP: 400 nodes
• More than 6 deployments in the team
• Other companies (2 deployments at Xiaomi)
Sensei’s technologies
[Diagram: Sensei and the technologies it builds on: Lucene, Zoie, Bobo, Norbert, Zookeeper]
Vocabulary
• Node
• Shard/Partition
• Replica
High level architecture
[Diagram: data injection through a gateway into a Sensei node]
• Gateway sources: Kafka, RabbitMQ, Databus, JDBC
• Every event carries a version
• A node requests only events with a version greater than the one it already holds
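The versioned catch-up described on this slide can be sketched as follows. This is a minimal illustration in Python, not Sensei's gateway code: `Event`, `EventStream` and `Node` are hypothetical names, and versions are assumed to be monotonically increasing integers.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    version: int   # assumed: increases monotonically within the stream
    doc_id: str
    payload: dict


@dataclass
class EventStream:
    """Hypothetical stand-in for a gateway source such as a Kafka topic."""
    events: list = field(default_factory=list)

    def read_after(self, version: int) -> list:
        # Return only events newer than the caller's version, oldest first.
        newer = (e for e in self.events if e.version > version)
        return sorted(newer, key=lambda e: e.version)


@dataclass
class Node:
    """Hypothetical node that remembers the last version it indexed."""
    version: int = 0
    index: dict = field(default_factory=dict)

    def catch_up(self, stream: EventStream) -> int:
        applied = 0
        for event in stream.read_after(self.version):
            self.index[event.doc_id] = event.payload  # later versions win
            self.version = event.version
            applied += 1
        return applied


stream = EventStream([
    Event(1, "a", {"color": "red"}),
    Event(2, "b", {"color": "blue"}),
    Event(3, "a", {"color": "black"}),
])
node = Node(version=1)            # e.g. a node that stopped after version 1
applied = node.catch_up(stream)   # replays only versions 2 and 3
```

Because the node asks only for versions above its own, replaying the stream after a restart is idempotent: a second `catch_up` call applies nothing.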
Data guarantees
• Availability - replication
• Eventually consistent across replicas
• Write durability - data stream
• Write consistency - data stream
Configuration
• schema.xml
o Indexed fields
o Forward index customization
• sensei.properties
o Ports, plugins, Zookeeper URLs, etc.
Features
Lucene realtime extension
Disk Index
Realtime updates
• Updates are visible less than 1 s after insertion
• Handles deletes and updates
• Indexing latency stable as index size grows
• Incremental and balanced segment merges
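One common way to get sub-second visibility on top of a disk index is to search a small in-memory buffer alongside it and flush the buffer periodically. The sketch below illustrates that idea only; it is an assumption, not a description of Zoie's internals, and `RealtimeIndex` is a hypothetical name with plain dicts standing in for Lucene indexes.

```python
class RealtimeIndex:
    """Hypothetical two-tier index: recent writes in memory, the rest on disk."""

    def __init__(self):
        self.disk = {}        # stand-in for the on-disk index
        self.memory = {}      # recent inserts and updates, searchable at once
        self.deleted = set()  # doc ids deleted since the last flush

    def update(self, doc_id, doc):
        self.memory[doc_id] = doc
        self.deleted.discard(doc_id)

    def delete(self, doc_id):
        self.memory.pop(doc_id, None)
        self.deleted.add(doc_id)

    def search(self, predicate):
        hits = {}
        for doc_id, doc in self.disk.items():
            # Skip disk copies that were deleted or superseded in memory.
            if doc_id in self.deleted or doc_id in self.memory:
                continue
            if predicate(doc):
                hits[doc_id] = doc
        for doc_id, doc in self.memory.items():
            if predicate(doc):
                hits[doc_id] = doc
        return hits

    def flush(self):
        # Fold pending deletes and writes into the disk tier.
        for doc_id in self.deleted:
            self.disk.pop(doc_id, None)
        self.disk.update(self.memory)
        self.memory.clear()
        self.deleted.clear()


index = RealtimeIndex()
index.update("car-1", {"color": "red"})
index.flush()
index.update("car-1", {"color": "black"})   # visible before any flush
black = index.search(lambda d: d["color"] == "black")
```

Query cost for the memory tier stays small because the buffer is bounded by the flush interval, which is one reason indexing latency need not grow with the size of the disk index.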
Hourglass (Time Series)
Offline indexing and archive
• Efficient M/R index generation on Hadoop over ETL'd data
• Bootstrap from HDFS
Query Engine - Bobo
• Query planning/optimization
• Access to both inverted and forward data structures
• High performance faceting
• Dynamic sorting
• Dynamic relevance support
• Map/Reduce analytics engine
Bobo (cont.)
[Diagram: three Lucene segments, each with a custom (forward) index, feeding into Result]
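To illustrate why access to both structures helps faceting: the inverted index yields the matching doc ids, and the forward index maps each doc id directly to its field value, so a facet count costs one array lookup per hit. The sketch below is a toy model with hypothetical names and made-up data, not Bobo's API.

```python
from collections import Counter

# One entry per segment. "inverted" maps a term to matching doc ids;
# "forward" maps a field name to a column indexed by doc id.
segments = [
    {"inverted": {"van": [0, 2]},
     "forward": {"color": ["red", "blue", "red"]}},
    {"inverted": {"van": [1]},
     "forward": {"color": ["blue", "blue"]}},
]


def facet_counts(segments, term, facet_field):
    """Count facet values of the documents matching `term`, across segments."""
    counts = Counter()
    for segment in segments:
        column = segment["forward"][facet_field]
        for doc_id in segment["inverted"].get(term, []):
            counts[column[doc_id]] += 1   # forward lookup, no term scan
    return counts


result = facet_counts(segments, "van", "color")
```

Each segment is counted independently and the per-segment counts are summed, so the work parallelizes across segments.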
Sensei API - BQL
SELECT color, category, year, makemodel
FROM cars
WHERE NOT MATCH(color, category) AGAINST("*van")
GROUP BY category TOP 1
LIMIT 1000
Dynamic relevance
SELECT *
FROM cars
WHERE price > 2000.00
USING RELEVANCE MODEL my_model (favoriteColor:"black", favoriteTag:"cool")
DEFINED AS (String favoriteColor, String favoriteTag)
BEGIN
    float boost = 1.0;
    if (tags.contains(favoriteTag)) boost += 0.5;
    if (color.equals(favoriteColor)) boost += 1.2;
    return _INNER_SCORE * boost;
END
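Read as ordinary code, the body of the relevance model is a per-document scoring function. The sketch below restates it in Python to show the arithmetic. It does not show how Sensei executes the model: the `inner_score` argument stands in for `_INNER_SCORE`, and the sample documents are made up.

```python
def my_model(doc, inner_score, favorite_color, favorite_tag):
    """Boost the base score for a matching tag (+0.5) and color (+1.2)."""
    boost = 1.0
    if favorite_tag in doc["tags"]:
        boost += 0.5
    if doc["color"] == favorite_color:
        boost += 1.2
    return inner_score * boost


cars = [
    {"id": 1, "color": "black", "tags": ["cool", "hybrid"]},
    {"id": 2, "color": "red", "tags": ["cool"]},
    {"id": 3, "color": "black", "tags": []},
    {"id": 4, "color": "white", "tags": []},
]

# Every document gets the same base score here so that only the boost matters.
scored = sorted(cars,
                key=lambda car: my_model(car, 1.0, "black", "cool"),
                reverse=True)
ranking = [car["id"] for car in scored]
```

With equal base scores, a color match outranks a tag match because its boost is larger, and a document matching both ranks first.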
Partial updates
• Storing data outside of Lucene
• High update rate
• Perfect for counters
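The reason a side store suits counters: Lucene's `updateDocument` replaces a document by deleting it and adding a new version, so bumping one number means reindexing the whole document. A value kept outside the index can be changed in place. The sketch below assumes one integer counter per document id; `CounterStore` is a hypothetical name, not a Sensei class.

```python
from array import array


class CounterStore:
    """Hypothetical side store: one signed 64-bit counter per document id."""

    def __init__(self, num_docs):
        self.values = array("q", [0] * num_docs)

    def increment(self, doc_id, delta=1):
        # In-place update: the indexed document itself is never rewritten.
        self.values[doc_id] += delta
        return self.values[doc_id]

    def get(self, doc_id):
        return self.values[doc_id]


clicks = CounterStore(num_docs=3)
clicks.increment(1)
clicks.increment(1, delta=4)
total = clicks.get(1)
```

An update costs a single array write, which is what makes a high update rate feasible for this kind of field.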
Sensei in memory M/R
[Diagram: a Broker in front of Node1 and Node2, each node holding Lucene segments]
• Map, over the Lucene segments on each node: map(IntArray docs, FieldAccessor, FacetCountAccessor)
• Combine, on each node: List<MapResult> combine(List<MapResult>)
• Reduce, on the Broker: JSONObject reduce(List<MapResult>)
• select distinctCount(memberId), sum(clickCount) where geo = 'US/CA/SF' group by seniority, age
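The query above can be traced through the map, combine and reduce phases with a small sketch. It is written in Python for brevity and only mirrors the shape of the Java signatures on the slides: the function bodies, the sample data and the use of exact sets for the distinct count are assumptions.

```python
from collections import defaultdict


def _empty_group():
    return {"members": set(), "clicks": 0}


def map_segment(docs):
    """Map: aggregate one segment's matching docs per (seniority, age)."""
    groups = defaultdict(_empty_group)
    for doc in docs:
        if doc["geo"] != "US/CA/SF":
            continue
        group = groups[(doc["seniority"], doc["age"])]
        group["members"].add(doc["memberId"])
        group["clicks"] += doc["clickCount"]
    return dict(groups)


def _merge(partials):
    merged = defaultdict(_empty_group)
    for partial in partials:
        for key, group in partial.items():
            merged[key]["members"] |= group["members"]
            merged[key]["clicks"] += group["clicks"]
    return dict(merged)


def combine(map_results):
    """Combine: merge per-segment results on a node before sending them on."""
    return [_merge(map_results)]


def reduce(combined):
    """Reduce: merge the node results on the broker and finalize the values."""
    return {key: {"distinctCount": len(group["members"]),
                  "sum": group["clicks"]}
            for key, group in _merge(combined).items()}


def doc(member, clicks, geo, seniority, age):
    return {"memberId": member, "clickCount": clicks, "geo": geo,
            "seniority": seniority, "age": age}


node1 = [[doc(1, 2, "US/CA/SF", "senior", 30), doc(2, 1, "US/NY", "senior", 30)],
         [doc(1, 3, "US/CA/SF", "senior", 30)]]
node2 = [[doc(3, 4, "US/CA/SF", "senior", 30), doc(4, 5, "US/CA/SF", "junior", 25)]]

per_node = [combine([map_segment(seg) for seg in node]) for node in (node1, node2)]
result = reduce(per_node[0] + per_node[1])
```

The distinct count has to carry the member ids, not just a number, until the final reduce, because the same member can appear in several segments or nodes. Sums, by contrast, can be collapsed to a single number at every stage.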
Roadmap
• Just finished
o Sensei aggregation functions
o Map/Reduce analytics engine
• Plan
o Goshawk - for business intelligence (WVMP v2, LI Impressions)
o Zoie redesign to support fixed-length in-memory segments
Sensei tweets demo
Questions?
• SeaS Homepage: http://go/seas
• Questions: ask_seas@
• Sensei homepage: senseidb.com
• Sensei Google group: sensei-search