enterprisesearchmeetup
Instant Search API
Build Unique Search Experiences
Sylvain Utard, VP of Engineering
[email protected] · @sylvainutard
Enterprise Search and Analytics
@algolia
Who am I?
5 years @ Exalead, leading the core-engine & NLP teams
• C++ • ExaScript (RIP) • Java
2 years @ Algolia, VP of Engineering
• C++ • Ruby • Java • and 10+ other languages…
A hosted search API
Replies in milliseconds
From anywhere
With intuitive relevance
Algolia Today
13 locations
800+ customers in 80+ countries
40B+ Write operations per month
4B+ User-generated queries per month
Performance is our DNA
Speed matters
Half a second delay caused a 20% drop in traffic (Google)
Every 100ms of latency costs 1% in sales (Amazon)
Behind the scenes
Unique set of constraints
High volume of Read & Write operations
High-availability
Worldwide data distribution
API Software Stack
Started as a mobile offline SDK
Written in C++
Search code embedded in Nginx as a module
Indexing is done in a separate process
Two Redis instances
API Hardware
Fast CPU (Xeon E5 >3.5GHz)
In Memory (128GB)
Backed by high-end SSDs in RAID 0 (800GB)
Specific kernel settings
Scaling horizontally
Several clusters per location
A user is assigned to one master cluster
A user can be replicated to N replica clusters
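As a rough illustration of the assignment described above, the sketch below models one user with a single master cluster and N replica clusters. The class, its fields and the cluster names are hypothetical, and the read path (master or any replica) is an assumption rather than something the slides state.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UserPlacement:
    """Hypothetical record of where one user's indices live."""
    user_id: str
    master_cluster: str                                          # exactly one master cluster
    replica_clusters: List[str] = field(default_factory=list)    # N replica clusters

    def add_replica(self, cluster: str) -> None:
        # A cluster cannot be both master and replica, and is listed only once.
        if cluster == self.master_cluster:
            raise ValueError("master cluster cannot also be a replica")
        if cluster not in self.replica_clusters:
            self.replica_clusters.append(cluster)

    def write_cluster(self) -> str:
        # Writes enter through the master cluster.
        return self.master_cluster

    def read_clusters(self) -> List[str]:
        # Assumption: reads can be served by the master or by any replica.
        return [self.master_cluster] + self.replica_clusters
```

Adding the same replica twice is a no-op, so the placement stays a set of distinct clusters.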
What is a cluster?
Master-Master
Stream of writes via Consensus
At least 3 machines
A write in practice
One of the machines accepts the write operation via the API (HTTPS)
/1/indexes/MyFirstIndex/batch
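To make the write endpoint concrete, here is a minimal sketch that builds (but does not send) a batch request for that path. The host name pattern, header names and payload shape are assumptions made for illustration; the slides only show the path itself.

```python
import json
import urllib.request


def build_batch_request(app_id, api_key, index_name, records):
    """Build, without sending, a batch write request for one index."""
    # Assumption: the API host is derived from the application ID.
    url = f"https://{app_id}.algolia.net/1/indexes/{index_name}/batch"
    # Assumption: a batch is a list of {action, body} operations.
    payload = {
        "requests": [{"action": "addObject", "body": r} for r in records]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "X-Algolia-Application-Id": app_id,   # assumed header name
            "X-Algolia-API-Key": api_key,         # assumed header name
            "Content-Type": "application/json",
        },
    )
```

Calling `urllib.request.urlopen(request)` on the result would send it; the sketch stops before that so it can be inspected offline.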
The file is saved on the three machines as a temporary file
tmp1265
tmp7864
tmp2357
Launch the consensus by contacting the Raft master
startConsensus(tmp2357, tmp7864, tmp1265)
1. The master sends the commit order to all nodes
2. Each node returns the next job ID to the master
3. If there is a majority, the file is committed
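The three steps above can be sketched as a simple majority vote on the job ID. This is a deliberate simplification and not a Raft implementation: the function name and the representation of node replies are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def commit_job(node_replies: List[Optional[int]]) -> Optional[int]:
    """Return the agreed job ID, or None if no majority was reached.

    node_replies holds the next job ID reported by each node of the
    cluster, or None for a node that did not answer.
    """
    cluster_size = len(node_replies)
    quorum = cluster_size // 2 + 1            # strict majority of the cluster
    votes = Counter(r for r in node_replies if r is not None)
    if not votes:
        return None
    job_id, count = votes.most_common(1)[0]
    return job_id if count >= quorum else None
```

With three machines the quorum is two, which is why a cluster can keep committing writes while one host is down.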
Same job ID on all hosts
Sent to the replica (slave) clusters in parallel
Processed in parallel on all hosts
job42
job42
job42
In case one host is down
Continue to accept writes
The two other hosts keep jobs
Jobs are sequential; the host catches up at restart
job42
job42
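Because job IDs are sequential, catching up after a restart amounts to replaying the missing IDs in order. The sketch below shows that idea with a hypothetical `Host` class and an in-memory job log; how the real hosts store and exchange jobs is not described in the slides.

```python
class Host:
    """Hypothetical host that applies jobs strictly in job-ID order."""

    def __init__(self, name):
        self.name = name
        self.last_applied = 0      # ID of the last job applied (0 = none yet)
        self.applied = []          # payloads, in the order they were applied

    def apply(self, job_id, payload):
        expected = self.last_applied + 1
        if job_id != expected:
            raise ValueError(f"{self.name}: expected job {expected}, got {job_id}")
        self.applied.append(payload)
        self.last_applied = job_id

    def catch_up(self, job_log):
        """Replay every job from job_log (dict: id -> payload) not applied yet."""
        while self.last_applied + 1 in job_log:
            next_id = self.last_applied + 1
            self.apply(next_id, job_log[next_id])
```

A host that was down for jobs 2 and 3 ends up in the same state as its peers once `catch_up` has replayed the log kept by the other two hosts.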
Distribution
Replicate jobs, not the result
Send to all machines in parallel
Consistent within a few seconds
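A small sketch of "replicate jobs, not the result": every cluster receives the same job and applies it to its own copy of the index, so all copies converge. The `Cluster` class, the job tuple format and the thread pool are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class Cluster:
    """Hypothetical cluster holding a tiny in-memory index."""

    def __init__(self, name):
        self.name = name
        self.index = {}

    def apply_job(self, job):
        # Each cluster runs the same deterministic job on its own copy.
        action, object_id, body = job
        if action == "add":
            self.index[object_id] = body
        elif action == "delete":
            self.index.pop(object_id, None)
        return self.name


def replicate(job, clusters):
    """Send one job to all clusters in parallel and wait for all of them."""
    with ThreadPoolExecutor(max_workers=len(clusters)) as pool:
        return list(pool.map(lambda c: c.apply_job(job), clusters))
```

Shipping the job rather than the built index keeps the payload small; the trade-off is that every cluster spends CPU applying it.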
High availability
Multi-regions in one location
13 fully independent locations
Network Optimisations
API usage moving from servers to browser and mobile apps
Get close to end users
Distributed Search Network - Worldwide Synchronization
• 13 locations = 25 datacenters
• No ideal worldwide provider
• AWS is not in India, Eastern EU, Africa…
• Need to handle several providers
• Anticipate long deliveries / customs
• Keep as few providers as possible
DNS is key
Used to find the closest location
Several DNS providers
Good anycast network
API Clients
DNS health checks are not enough
Smart retry logic in all our API Clients
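One way such client-side retry logic could look is sketched below: the client walks an ordered list of hosts and falls back to the next one on a network error. The function name, the host names in the usage note and the injected `send` callable are hypothetical; the actual API clients may behave differently.

```python
def query_with_retry(hosts, send, max_attempts_per_host=1):
    """Try each host in order until one answers.

    hosts: list of host names, preferred (closest) host first.
    send:  callable(host) -> response; expected to raise OSError
           (timeouts, refused connections, DNS failures) on network errors.
    """
    errors = []
    for host in hosts:
        for _ in range(max_attempts_per_host):
            try:
                return send(host)
            except OSError as exc:
                # Remember the failure and move on to the next attempt or host.
                errors.append((host, exc))
    raise ConnectionError(f"all hosts failed: {errors}")
```

For example, `query_with_retry(["dsn.example.com", "fallback-1.example.net"], send)` returns the answer from the fallback host if the first one times out, independently of what the DNS health checks currently report.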
Analytics
• What are my users searching for?
• Top searches
• Top searches without hits
• Top refinements
• Where do they search from?
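The first two metrics can be illustrated with a toy in-memory query log. The log format, a list of `(query, hit_count)` pairs, and the function names are assumptions; at production scale this would be an aggregation in the analytics store rather than a Python loop.

```python
from collections import Counter


def top_searches(log, n=3):
    """Most frequent queries in log, an iterable of (query, hit_count) pairs."""
    return Counter(query for query, _ in log).most_common(n)


def top_searches_without_hits(log, n=3):
    """Most frequent queries that returned zero hits."""
    return Counter(query for query, hits in log if hits == 0).most_common(n)
```

Queries that are frequent and return no hits are typically misspellings or missing content, which is why they are reported separately.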
Analytics
• Billions of user-generated queries per month
• As-you-type aggregation
• ~3 months retention
• Storing all of them in…
• Elasticsearch \o/
• … without FTS :)
• but with aggregations
• No FTS
• No source
• Doc values everywhere
• SSD only
• Custom aggregations
(deprecated since ES 1.1.0)
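A mapping along these lines would express "no FTS, no source, doc values everywhere". It is only a sketch, assuming Elasticsearch 1.x mapping syntax; the type name and field names are hypothetical and the real mapping is not shown in the slides.

```python
import json

# Hypothetical mapping for a query-log document type (Elasticsearch 1.x syntax).
QUERY_LOG_MAPPING = {
    "query_log": {
        "_source": {"enabled": False},   # "No source": the original JSON is not stored
        "_all": {"enabled": False},      # no catch-all full-text field
        "properties": {
            # not_analyzed: the value is indexed as one exact term, no full-text analysis
            "query": {"type": "string", "index": "not_analyzed", "doc_values": True},
            "country": {"type": "string", "index": "not_analyzed", "doc_values": True},
            "nb_hits": {"type": "integer", "doc_values": True},
            "timestamp": {"type": "date", "doc_values": True},
        },
    }
}

# Serialised form, as it would be sent in a put-mapping request body.
MAPPING_JSON = json.dumps(QUERY_LOG_MAPPING, indent=2)
```

Doc values keep the per-field column data on disk instead of on the JVM heap, which fits the "SSD only" point when the workload is aggregations over billions of documents.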
Top-k Aggregation
• Before
• Linear memory consumption
• Exhaustive
• After
• Constant memory consumption
• Approximate, but good enough
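The slides do not name the algorithm behind the constant-memory version. As an illustration of the trade-off, the sketch below uses the Space-Saving algorithm, a well-known approximate top-k technique chosen here as a stand-in: it keeps at most `capacity` counters no matter how many distinct queries appear, at the price of possibly overestimated counts.

```python
def space_saving(stream, capacity):
    """Approximate frequency counts using at most `capacity` counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < capacity:
            counters[item] = 1
        else:
            # Evict the item with the smallest count; the newcomer inherits
            # that count plus one, so counts never underestimate.
            victim = min(counters, key=counters.get)
            count = counters.pop(victim)
            counters[item] = count + 1
    return counters


def top_k(stream, k, capacity):
    """Return the k items with the highest (approximate) counts."""
    counters = space_saving(stream, capacity)
    ranked = sorted(counters.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]
```

An exhaustive count needs one counter per distinct query (linear memory); here memory is bounded by `capacity`, and frequent queries still rise to the top.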
Building your worldwide infra
- Is a long and difficult quest
- Is a real asset & differentiator
The Future of APIs is Distributed
Want to know more?
All the details of our architecture are on HighScalability.com
THANK YOU!
[email protected] @algolia
Build Unique Search Experiences
We are hiring in SF, NYC and Paris 😊