indeedeng
@IndeedEng March: Wednesday, March 27th
Video available: http://www.youtube.com/watch?v=MeRHetCMiHg

The goal of Indeed's aggregation engine is to find and retrieve every job in the world, as quickly and accurately as possible. As we described in our previous tech talk, we strive to build products that are simple, fast, comprehensive, and relevant. The world's most comprehensive job search site is fueled by the more than 35 million job postings we process every day, which we deliver to jobseekers within minutes of discovery.

Our original aggregation architecture was implemented using standard patterns. Our growth required levels of scalability, performance, and resilience that this architecture simply could not handle. In a case study of scaling for the web, we will discuss how we tackled this problem. We will cover the issues we saw with our original architecture, how we analyzed our options to guide a solution, how we used RabbitMQ as a key component in the new architecture, and the benchmarks we used to evaluate how successful we were.

Speaker Ketan Gangatirkar is the development manager responsible for Indeed's continuous deployment infrastructure as well as its aggregation system.

Speaker Cameron Davison is a software engineer on the aggregation team at Indeed and a graduate of UT Austin. He re-architected Indeed's aggregation pipeline using RabbitMQ to sustain high write volumes, and continues to improve products in the aggregation system to make it run more efficiently.
How to Get a Job 35 Million Times a Day Using RabbitMQ
Ketan Gangatirkar and Cameron Davison
One search. All jobs.
Aggregation gets jobs
Aggregation gets jobs so jobseekers get jobs
Aggregation != Spidering
Spiders see pages.
Aggregation sees jobs.
How spiders see job sites
[Diagram: a flat collection of undifferentiated pages]
How Indeed sees job sites
[Diagram: a Start node, Job List pages that each link to individual Job pages, and Navigation links between the lists]
Aggregation != Spidering
Job sites have structure
Job pages have semantics
Navigation is more than following links
Remember this
Agg every job
{
  Url: http://www.applytracking.com/track.aspx/3VYzR
  Title: Senior Erlang Engineer
  Company: Machine Zone
  Location: Palo Alto,CA,US, 94301
  Source Type: Employer
  Job Type: Full-time
  ...
  Description: The Senior Erlang Engineer is an integral ...
  ...
  Createdate: 2013-02-05 23:18:05
  ...
}
What's in a job
[Screenshots: job postings annotated with title, company, location, job type, salary, and description]
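The fields called out on these slides can be sketched as a small record type. This is only an illustration: the class name, field names, and the `is_complete` helper are hypothetical and do not reflect Indeed's actual schema. The values come from the sample record shown earlier, with the elided description left elided.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class JobRecord:
    """One aggregated job. Field names mirror the sample record, not a real schema."""
    url: str
    title: str
    company: str
    location: str
    source_type: str
    job_type: str
    description: str
    created: datetime
    salary: Optional[str] = None  # present on some postings, absent from the sample

    def is_complete(self) -> bool:
        """True when the fields needed to display the job are all non-empty."""
        required = (self.url, self.title, self.company, self.location, self.description)
        return all(value.strip() for value in required)


job = JobRecord(
    url="http://www.applytracking.com/track.aspx/3VYzR",
    title="Senior Erlang Engineer",
    company="Machine Zone",
    location="Palo Alto,CA,US, 94301",
    source_type="Employer",
    job_type="Full-time",
    description="The Senior Erlang Engineer is an integral ...",
    created=datetime(2013, 2, 5, 23, 18, 5),
)
print(job.title, "complete:", job.is_complete())
```

A check like `is_complete` is one simple way to flag the "missing or bad information" case before a job reaches search.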
How we build products
simple
fast
comprehensive
relevant
Simple
Tough problems, simple solutions
Fast
Discover the jobs quickly
Get them to jobseekers in minutes
10% of jobseekers sort by date
Do you want only new jobs?
20% of jobseekers want only new jobs
Daily new job emails
Speed matters
Comprehensive
Get every job
Relevant
Semantic extraction
The job is still available
Ignore non-jobs
This is a hard problem
Flaky sites
Site redesigns
JavaScript
Missing or bad information
Big N makes it even harder
Examine 38M jobs every day
Do this in minutes
[Diagram: employers, job boards, staffing firms, and recruiters feed Aggregation, which feeds Search for 100M jobseekers]
Strawman* architecture
[Diagram: engines in Datacenter A and Datacenter B, one per job site, all writing to a single MySQL in the primary datacenter]
Limitations
N connections
N concurrent writers
High latency
[Diagram, repeated for each limitation: six job sites, six engines across Datacenter A and Datacenter B, one MySQL in the primary datacenter]
Limitation: failure points
[Diagram: the same strawman architecture with two points marked X as failed]
Scaling Patterns
What has worked for us so far?
Service-Oriented Architecture
[Diagram: three engines in the remote datacenter send results to the Job Write Service, which writes to MySQL in the primary datacenter]
see http://go.indeed.com/boxcar
Standard Service Interaction
Client Service Database
Our Interaction
Client Service Database
Does this do what we need?
● Lots of workers...
● Sending lots of results...
● Over a long distance...
● That need to get processed fast...
● Reliably?
Engine Failure
[Diagram: three engines in the remote datacenter, Job Write Service and MySQL in the primary datacenter; a failure is marked X]
Engine failure fix: Buffer to disk
[Diagram: the same layout with a disk added to each engine]
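The "buffer to disk" fix can be sketched as a spool file that an engine appends to before sending, and drains whenever the downstream service is reachable. This is a minimal sketch and not Indeed's implementation: the `DiskBuffer` class, the JSON-lines file format, and the `send` callback are all assumptions made for illustration.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


class DiskBuffer:
    """Append-only spool file: results survive a crash and are drained later."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, record: Dict) -> None:
        # One JSON document per line; fsync so the record survives a process crash.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def drain(self, send: Callable[[Dict], bool]) -> int:
        """Send buffered records in order and keep whatever could not be sent."""
        if not os.path.exists(self.path):
            return 0
        with open(self.path, encoding="utf-8") as f:
            records: List[Dict] = [json.loads(line) for line in f if line.strip()]
        sent = 0
        for record in records:
            if not send(record):  # downstream unavailable: stop and retry later
                break
            sent += 1
        # Rewrite the spool with only the unsent tail, swapping it in atomically.
        # A crash before this point re-sends records, so delivery is at least once.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            for record in records[sent:]:
                f.write(json.dumps(record) + "\n")
        os.replace(tmp, self.path)
        return sent


spool = DiskBuffer(os.path.join(tempfile.mkdtemp(), "jobs.spool"))
for n in range(3):
    spool.append({"job": n})

delivered: List[Dict] = []


def flaky_send(record: Dict) -> bool:
    """Hypothetical sender that stops accepting records after two."""
    if len(delivered) >= 2:
        return False
    delivered.append(record)
    return True


def healthy_send(record: Dict) -> bool:
    delivered.append(record)
    return True


first = spool.drain(flaky_send)     # two records go out, one stays on disk
second = spool.drain(healthy_send)  # the remaining record goes out
```

Every component that needs this behaviour would have to carry its own copy of such a spool, which is the cost the following slides weigh against using an existing message queue.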
Network Failure
[Diagram: engines in the remote datacenter, Job Write Service and MySQL in the primary datacenter; a failure is marked X]
Network failure fix: Disks solve that too
[Diagram: the same layout with a disk on each engine]
Write Service Failure
[Diagram: engines in the remote datacenter, Job Write Service and MySQL in the primary datacenter; a failure is marked X]
Write Service Failure fix: Disks solve that too
[Diagram: the same layout with a disk on each engine]
Write Service Failure fix: Redundancy
[Diagram: three Job Write Service instances in front of MySQL]
Database Failure
[Diagram: engines in the remote datacenter, Job Write Service and MySQL in the primary datacenter; a failure is marked X]
Database Failure fix: Buffer to disk
[Diagram: the same layout with a disk added in the primary datacenter]
Our new architecture
[Diagram: three engines and three Job Write Service instances, six disks in total, with MySQL in the primary datacenter]
We could build this...
... maybe someone already has
We should use a message queue
Cameron Davison
Aggregation Requirements
● Durable
● Multi-Data Center (latency)
● 38 million jobs a day
● 2KB average job size
  ○ 76 GB a day
● Target peaks of 1000 jobs / second
● Programming language agnostic
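The figures in this list are consistent with one another, as a quick calculation shows. The sketch below takes "2KB" to mean 2,000 bytes, which is an assumption; the variable names are illustrative.

```python
JOBS_PER_DAY = 38_000_000
AVG_JOB_BYTES = 2_000            # "2KB", taken here as 2,000 bytes
SECONDS_PER_DAY = 24 * 60 * 60
TARGET_PEAK_PER_SECOND = 1_000

daily_gb = JOBS_PER_DAY * AVG_JOB_BYTES / 1e9
average_per_second = JOBS_PER_DAY / SECONDS_PER_DAY
peak_to_average = TARGET_PEAK_PER_SECOND / average_per_second
peak_mb_per_second = TARGET_PEAK_PER_SECOND * AVG_JOB_BYTES / 1e6

print(f"{daily_gb:.0f} GB/day")                           # 76 GB/day
print(f"{average_per_second:.0f} jobs/s on average")      # 440 jobs/s on average
print(f"peak target is {peak_to_average:.1f}x average")   # peak target is 2.3x average
print(f"{peak_mb_per_second:.0f} MB/s at peak")           # 2 MB/s at peak
```

The peak target of 1000 jobs / second is a little over twice the average rate implied by 38 million jobs a day.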
Selection
What we found
High Availability
Open Source/Free
Self-hosted
Performant
Out-of-the-box Experience
Advanced Message Queuing Protocol (AMQP)
● Open Standard
● Wire protocol
● Existing Clients in Multiple Languages
Concepts
● Confirmation and Ack
● At least once
● Asynchronous Confirms
● Persistent
● Clustering
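Confirmation and Ack
[Diagram: Producer, MQ, Consumer, with four numbered steps: msg from Producer to MQ, confirm back to Producer, msg from MQ to Consumer, ack back to MQ]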
At least once
[Diagram: MQ delivers Message to Consumer; Consumer returns Ack]
At most once
[Diagram: MQ delivers Message to Consumer with Auto Ack]
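The difference between the two delivery modes can be illustrated with a toy in-memory queue. This is a sketch of the concept only: `ToyQueue` and `run` are hypothetical names, not RabbitMQ's API, and a consumer crash is simulated by raising `RuntimeError`.

```python
from collections import deque
from typing import Callable, Deque, List


class ToyQueue:
    """In-memory stand-in for a broker queue; not RabbitMQ's API."""

    def __init__(self, messages: List[str]) -> None:
        self.ready: Deque[str] = deque(messages)

    def deliver(self, handler: Callable[[str], None], auto_ack: bool) -> None:
        """Hand the next message to handler, which may raise to simulate a crash."""
        message = self.ready.popleft()
        if auto_ack:
            # At most once: the queue forgets the message as soon as it is sent.
            try:
                handler(message)
            except RuntimeError:
                pass  # the consumer crashed and the message is lost
            return
        # At least once: the message is kept until the handler finishes (the ack).
        try:
            handler(message)
        except RuntimeError:
            self.ready.appendleft(message)  # no ack arrived, so redeliver later


def run(auto_ack: bool) -> List[str]:
    """Process three jobs with a consumer that crashes on its second delivery."""
    processed: List[str] = []
    deliveries = {"count": 0}

    def handler(message: str) -> None:
        deliveries["count"] += 1
        if deliveries["count"] == 2:
            raise RuntimeError("consumer crashed")
        processed.append(message)

    toy = ToyQueue(["job-1", "job-2", "job-3"])
    while toy.ready:
        toy.deliver(handler, auto_ack)
    return processed


print(run(auto_ack=False))  # ['job-1', 'job-2', 'job-3']
print(run(auto_ack=True))   # ['job-1', 'job-3']
```

With explicit acks nothing is lost, but a consumer that finishes its work and crashes before acking would see the same message again, so at-least-once consumers are usually written to tolerate duplicates.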
Asynchronous Confirms
[Diagram: the Producer sends messages 1 to 16 without waiting; the broker returns confirm #6]
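On the producer side, asynchronous confirms amount to keeping a table of published but unconfirmed messages and clearing entries as confirms arrive. The sketch below is a toy model with hypothetical names (`ConfirmTracker`, `on_confirm`). It assumes the confirm carries a "multiple" flag, which RabbitMQ's publisher confirms can set to acknowledge every message up to and including the given sequence number; whether the slide's "confirm #6" was such a confirm is not stated.

```python
from typing import Dict, List


class ConfirmTracker:
    """Tracks published-but-unconfirmed messages by sequence number."""

    def __init__(self) -> None:
        self.next_seq = 1
        self.unconfirmed: Dict[int, bytes] = {}

    def publish(self, body: bytes) -> int:
        # A real producer would also write the body to the channel here.
        seq = self.next_seq
        self.unconfirmed[seq] = body
        self.next_seq += 1
        return seq

    def on_confirm(self, seq: int, multiple: bool = False) -> List[int]:
        """Drop confirmed messages; multiple=True covers every seq up to seq."""
        if multiple:
            done = [s for s in self.unconfirmed if s <= seq]
        else:
            done = [seq] if seq in self.unconfirmed else []
        for s in done:
            del self.unconfirmed[s]
        return done


tracker = ConfirmTracker()
for n in range(1, 17):                             # publish 16 messages without waiting
    tracker.publish(f"job-{n}".encode())

confirmed = tracker.on_confirm(6, multiple=True)   # the slide's "confirm #6"
print(confirmed)                     # [1, 2, 3, 4, 5, 6]
print(sorted(tracker.unconfirmed))   # [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
```

Anything still in `unconfirmed` when a connection drops is what the producer would need to publish again.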
Persistent
[Animation: Producer, MQ, Consumer; one frame is marked X]
Clustering
[Diagram: Producer, Master, Slave; steps numbered 1 to 4]
Testing
Test RabbitMQ
● Send millions of 2KB messages
● 20 producers and 20 consumers
● 1000 messages / second
● Simulate multiple failures
Test Consistency
[Animation: Producers and Consumers attached to a two-node RabbitMQ cluster (Master and Slave); in some frames one node is marked X and only a Master remains, after which the pair is shown as Master and Slave again]
RabbitMQ Clustering
[Animation: a Master and Slave pair stepping through node failures (X) and recoveries, including one frame with two X marks]
Non-persistent
15,990 messages / second, 30 MB/s
Persistent
2,781 messages / second, 5.5 MB/s
Clustered and Persistent
1,262 messages / second, 2.5 MB/s
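These three results can be read against the requirement of 1000 jobs / second at peak. The short calculation below uses only the rates shown on the slides; the dictionary keys and variable names are illustrative.

```python
TARGET_PEAK = 1_000  # jobs / second, from the aggregation requirements

measured = {
    "non-persistent": 15_990,
    "persistent": 2_781,
    "clustered and persistent": 1_262,
}

# How far each configuration sits above the target peak rate.
headroom = {mode: rate / TARGET_PEAK for mode, rate in measured.items()}

# Throughput given up for each guarantee, as a ratio of the rates.
persistence_cost = measured["non-persistent"] / measured["persistent"]
clustering_cost = measured["persistent"] / measured["clustered and persistent"]

for mode, factor in headroom.items():
    print(f"{mode}: {factor:.1f}x the target peak")
print(f"persistence costs about {persistence_cost:.1f}x in throughput")
print(f"clustering costs a further {clustering_cost:.1f}x")
```

Even the slowest configuration, clustered and persistent, stays above the target peak, though with much less margin than the other two.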
Applying RabbitMQ
Unreliable High Latency Connections
[Diagram: three engines in the remote DC send to the Job Write Service in the primary DC, which writes to MySQL]
Replaced with RabbitMQ
[Animation: a RabbitMQ node is introduced between the engines (remote DC) and the Job Write Service (primary DC)]
Rabbit can talk to Rabbit
Shovel Plugin
[Diagram: Producer, RabbitMQ 1, RabbitMQ 2, Consumer; the shovel moves messages from RabbitMQ 1 to RabbitMQ 2]
Replaced with RabbitMQ
[Animation: three more RabbitMQ nodes appear alongside the three engines in the remote DC; a RabbitMQ node remains in front of the Job Write Service in the primary DC]
Parallelize Job Write Service
[Diagram: one RabbitMQ queue feeding three Job Write Service instances, which receive Job A, Job B, and Job C]
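Several writers pulling from one queue is the competing-consumers pattern: each message goes to exactly one worker. The sketch below imitates it with Python's standard `queue` and `threading` modules rather than RabbitMQ; the worker function and the `writer-N` names are hypothetical, and appending to a dictionary stands in for the database write.

```python
import queue
import threading
from typing import Dict, List

jobs: queue.Queue = queue.Queue()
handled: Dict[str, List[str]] = {}
lock = threading.Lock()


def job_write_service(name: str) -> None:
    """Worker loop: take one job at a time until the queue is empty."""
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return
        with lock:
            handled.setdefault(name, []).append(job)  # stand-in for a MySQL write
        jobs.task_done()


for job in ("Job A", "Job B", "Job C"):
    jobs.put(job)

workers = [
    threading.Thread(target=job_write_service, args=(f"writer-{i}",))
    for i in range(3)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

# Which writer handled which job varies between runs; the set of jobs does not.
written = sorted(job for batch in handled.values() for job in batch)
print(written)  # ['Job A', 'Job B', 'Job C']
```

Adding writers raises throughput without changing the producers, since they only ever see the queue.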
Replaced with RabbitMQ
[Animation: a second Job Write Service is added behind the RabbitMQ node in the primary DC]
Message Flow
[Animation: messages flow from three engines through five RabbitMQ nodes to two Job Write Service instances in the primary DC]
Jobs/minute
Jobs/minute from one site
220,000 jobs in 6 hours
611 jobs / minute
Jobs/minute from one site
251,000 jobs in 20 minutes
12,550 jobs / minute
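The two per-minute rates follow directly from the job counts and durations on the slides. The variable names below are illustrative, and reading the two slides as a comparison of the same site under the two architectures is an assumption.

```python
first_jobs, first_minutes = 220_000, 6 * 60   # 220,000 jobs in 6 hours
second_jobs, second_minutes = 251_000, 20     # 251,000 jobs in 20 minutes

first_rate = first_jobs / first_minutes       # jobs per minute
second_rate = second_jobs / second_minutes
speedup = second_rate / first_rate

print(f"{first_rate:.0f} jobs / minute")    # 611 jobs / minute
print(f"{second_rate:.0f} jobs / minute")   # 12550 jobs / minute
print(f"{speedup:.1f}x faster")             # 20.5x faster
```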
RabbitMQ
Horizontal Scale
[Diagram: three engines, six RabbitMQ nodes, and four Job Write Service instances]
Today: 1000 messages / second
RabbitMQ 3
2,486 messages / second, 5 MB/s
RabbitMQ Configuration
● Confirmations - Fire and Forget
● Persistent Messages - Durable
● Shoveling - Multi-Data Center
● Mirrored Queues in Cluster - High Reliability
Can we do more with RabbitMQ?
Aggregation Viewer
Real-time browser-based view of job stream
● Almost real-time
● Exclusive queue
● Transient messages
Aggregation Viewer Architecture
[Diagram: Agg Jobs RabbitMQ cluster, Shovel*, Agg Viewer RabbitMQ, Subscribe, Agg Viewer, Jobs over HTTP, Browser]
Resume Contacts Billing
Pay-per-contact: limited budget
Resume Contacts Billing: Original Path
[Diagram: Asia DC and US DC separated by the Pacific; components: Resume Search, Log repo, MySQL]
see http://go.indeed.com/logrepo
Resume Contacts Billing: Fast Path
[Diagram: the same layout with two RabbitMQ nodes added; one element is marked X]
Company Page Edits
User-contributed content about companies
Company Page
Company Page Edits: Implementation
Writing data AND reading it back
Company Page Edits: Single Datacenter
[Diagram: Browser, Web Server, MySQL]
Company Page Serving
[Diagram: Browser, Web Server, LSM Tree, and Memcached in the Asia Datacenter, across the Pacific]
see http://go.indeed.com/lsmtree
Company Page Edits
[Diagram: Browser, Web Server, two RabbitMQ nodes, MySQL, Memcached; Primary US Datacenter, Asia Datacenter across the Pacific, EU Datacenter across the Atlantic, et cetera]
Company Page Reads
[Diagram: MySQL, LSM Tree Builder, LSM Tree copies, Memcached; Primary US Datacenter, Asia Datacenter, EU Datacenter, et cetera]
Company Pages System
[Diagram: Browser, Web Server, two RabbitMQ nodes, MySQL, LSM Tree Builder, LSM Tree copies; Primary US Datacenter, Asia Datacenter, EU Datacenter, et cetera]
Other applications
Company Pages
Recap: The jobs must flow
● Durability
● High throughput
● Low latency
● Partition-tolerance
● Efficient use of the database
● Minimal points of failure