35
This document and its contents are proprietary and confidential of Zocdoc, Inc. and may not be reproduced or shared, in whole or in part, without the express written authorization of Zocdoc, Inc. Leveraging AWS and Machine Learning to Power Search at Zocdoc Pedro Rubio Head of Search Engineering Brian d’Alessandro Head of Data Science

Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

Embed Size (px)

Citation preview

Page 1: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

This document and its contents are proprietary and confidential of Zocdoc, Inc. and may not be reproduced or shared, in whole or in part, without the express written authorization of Zocdoc, Inc.

Leveraging AWS and Machine Learning to Power Search at ZocdocPedro Rubio Head of Search Engineering

Brian d’Alessandro Head of Data Science

Page 2: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

Agenda

- How we’re built - People and Architecture- How we’re built - the Data - Questions

Page 3: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

3

Problem Statements:

1. Patients need to find and book with a doctor, and, 2. Patients don’t often know what kind of doctor they

need.

Page 4: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

4

Page 5: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

How we’re builtAnd solving “what the patient means”

Page 6: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

6

• Cross team collaboration enabling maximum iteration

speed

• Deliver recommendations < 200 ms

• Patient satisfaction

(And our architecture plays a big role here!)

Core Optimization Problems for ZD Search

Page 7: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

7

The Search Team

EngineeringProduct

Data ScienceDesign

Page 8: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

Zocdoc Tech Stack- NodeJS, ES6, Babel, React

- AWS - Cloudformation, Docker, ECR,

EC2, ELB

- Kinesis / Firehose - S3

- reporting to data-lake

- Monitoring with Datadog

- Routes with Express

Page 9: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

Our Legacy Search

Page 10: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

Free Text (Patient Powered) Search

Page 11: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

11

Types of Intent

Name of doctor

Medical procedure

Specialty

Symptom

Page 12: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

12

Intent Parsing

Architecture

Machine Learning

Design

Page 13: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

Phase I - Auto-SuggestSearchService Pipeline

Browser

Doctor Name Retrieval

Semantic Service

NLP

Corpus Building

Specialty Retrieval

Visit Reason Retrieval

Service Handler

Semantic Retrieval

Auto-Suggest Ranking

Logging Ranking Models

Results

Phase II - Backend Search

Page 14: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

14

The structured queries comprise a reasonable percent of traffic, but are a

minority of total search terms we service.

Solving for the Long Tail

Specialties = O(10^2), Procedures = O(10^3), Names = O(10^6), Other = O(10^7)

We use Natural Language Processing (NLP) algos to map unstructured terms into our structured search set.

Page 15: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

15

Variations

• Heart beats too fast

• Heart flutters• Pulse rate too high• Irregular pulse• Heart out of

rhythm• Irregular heartbeat• Heart palpitations

Concept Interpretation

• Irregular heartbeat

• Heart palpitations

Medical Term

• Atrial Fibrillation

Different Representations of Same Concept

Page 16: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

16

ZocDoc Semantic Servicef(“presentation anxiety”) = {[{Specialty =“Psychologist”, Relevance = 0.8}, …,{Specialty

=“Psychiatrist”, Relevance = 0.7}]}

Page 17: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

Early Results (And Why You Need to Always Experiment)

Page 18: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

18

Searches that Lead to “Nephrology”

Many patients don’t know what a Nephrologist is. They don’t need to know to find one now.

Page 19: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

How We’re Built - The Data

Page 20: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

Data - Indexing so we can Search

Page 21: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

Lesson Learned with Indexing Data

Monolith Live Cache Feed Process S3 ƛ

AWSLegacy Layer Elastic.co

Lambda’s act as a mini ETL layer getting the documents ready for our retrieval stage.

- Lambda memory max 1500mb- Our data much larger- Manage state in S3 and

Elasticsearch

- Complex “stateless” ETL process that transforms this data into the data that we need in Elasticsearch

- Load piecemeal into Elasticsearch- At the very end, swap alias to use

newly uploaded indexes

Page 22: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

More complex Processing 1

2

Joined Data Set

Mapping from 1 -> 2

3

Mapping from 1 -> 3

Business Logic Application

- Spark - ETL

- Get over 1500mb limit

- Get over 5 minute runtime limit

- Easily add more data-sets

- Currently in Databricks

- Plan to migrate to EMR (Elastic Map Reduce)

New ETL - Spark

Page 23: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

Data - Event Data So we can Learn

Page 24: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

24

The MarketplaceGoal: Make it as easy as possible to match the user to the right doctor.

Considerations:• How to weight distance

vs. availability vs. experience vs. reviews?

• Does Dr. take this type of patient?

• Are we meeting regulatory requirements?

Page 25: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

25

Optimize: algo iteration speed

Subject to:

• Org too small to justify full time data scientists within

search

• Throwing models over the wall to be implemented

doesn’t work

Organizational Optimization

Page 26: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

26

Model APIQuery

RankedResults

Transformations + Model Scoring

Model dB

FilteredResults

ScoredResults

ZocDoc Prod Service

(Search)

Logs (S3/Redshift)

Research, Analysis, Model Development

(Spark/Redshift)

Production

OfflineEngineering Owned DS Owned

Agile Machine Learning

Page 27: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

27

Aqueduct: Filling the Data Lake

Some Data Lake principles:

• Allow producers to easily push data

• Allow data format changes

• Smart ETL to make consumption very easy

Page 28: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

28

Cistern: Making Datalake Drinkable

• “Raw” data lake good for exploratory research (we use Spark)

• “Clean” data lake better for analytics and quick exploration

Page 29: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

Data - Insights

Page 30: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

30

Page 31: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

31

Searches for Therapy/Therapist on ZocdocWe’ve got our fingers on the pulse of public health trends.

Page 32: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

32

How Much is that Smile Worth?Click Conversion by Search Rank and DrIsSmiling

We’re exploring AWS Rekognition to research what drives user interest in Dr. profiles.

Page 33: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s
Page 34: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

34

Page 35: Search at Zocdoc Machine Learning to Power · PDF fileSearch at Zocdoc Pedro Rubio Head of ... (And our architecture plays a big role here!) ... Legacy Layer AWS Elastic.co Lambda’s

Thank you and Questions!