Understanding and Predicting User Satisfaction with Intelligent Assistants
Julia Kiseleva, Kyle Williams, Jiepu Jiang, Ahmed Hassan Awadallah, Aidan C. Crook, Imed Zitouni, Tasos Anastasakos
Eindhoven University of Technology, Pennsylvania State University, University of Massachusetts Amherst, Microsoft
Why do we care?
[Figure: Percentage of traffic over time (2011-09 to 2015-09), Desktop vs. Mobile. Source: http://gs.statcounter.com]
Understanding User Satisfaction with Intelligent Assistants
User’s dialogue with Cortana. Task is “Finding a hotel in Chicago”:
Q1: how is the weather in Chicago
Q2: how is it this weekend
Q3: find me hotels
Q4: which one of these is the cheapest
Q5: which one of these has at least 4 stars
Q6: find me directions from the Chicago airport to number one
User’s dialogue with Cortana. Task is “Finding a pharmacy”:
Q1: find me a pharmacy nearby
Q2: which of these is highly rated
Q3: show more information about number 2
Q4: how long will it take me to get there
Q5: Thanks
Research Questions
• RQ1: What are characteristic types of scenarios of use?
Controlling Device
• Call a person
• Send a text message
• Check on-device calendar
• Open an application
• Turn on/off wi-fi
• Play music
[Figure: search result page elements: Knowledge Pane, Image Answer, Location Answer, Organic Results]
Search Dialogue
User: “Do I need to have a jacket tomorrow?”
Cortana: “You could probably go without one. The forecast shows …”
Search Dialogue
Cortana: “Here are
ten restaurants near you”
Cortana:“Here are ten restaurants
near you that have good reviews”
Cortana:“Getting you direction to the Mayuri
Indian Cuisine”
User:“show
restaurants near
me”
User:“show
the best restaurants near
me ”
User:“show
directions to the second one”
Search Dialogue
Research Questions
• RQ1: What are characteristic types of scenarios of use?
• RQ2: How can we measure different aspects of user satisfaction?
• RQ3: What are key factors determining user satisfaction for the different scenarios?
• RQ4: How to characterize abandonment in the web search scenario?
• RQ5: How does query-level satisfaction relate to overall user satisfaction for the search dialogue scenario?
USER STUDY
User Study Participants
• 60 Participants
• Age: 25.53 +/- 5.42 years
• Gender: 75% Male, 25% Female
• Language: 55% English, 45% Other
• Education: 82% Computer Science, 8% Electrical Engineering, 2% Mathematics, 8% Other
User Study Design
• Video Instructions (same for all participants)
• Tasks are realistic – mined from Cortana logs:
o Control type of tasks
o Queries where users don’t click
o Search dialogue tasks – mostly localization type of queries
Example tasks:
• Find out what is the hair color of your favorite celebrity.
• You are planning a vacation. Pick a place. Check if the weather is good enough for the period you are planning the vacation. Find a hotel that suits you. Find the driving directions to this place.
Questionnaire: Controlling Device
• Were you able to complete the task?
o Yes/No
• How satisfied are you with your experience in this task?
o 5-point Likert scale
• How well did Cortana recognize what you said?
o 5-point Likert scale
• Did you put in a lot of effort to complete the task?
o 5-point Likert scale
5 Tasks, 20 Minutes
Questionnaire: Good Abandonment
• Were you able to complete the task?
o Yes/No
• Where did you find the answer?
o Answer Box, Image, SERP, Visited Website
• Which query led you to finding the answer?
o First, Second, Third, >= Fourth
• How satisfied are you with your experience in this task?
o 5-point Likert scale
• Did you put in a lot of effort to complete the task?
o 5-point Likert scale
5 Tasks, 20 Minutes
Questionnaire: Search Dialogue
• Were you able to complete the task?
o Yes/No
• How satisfied are you with your experience in this task?
o If the task has sub-tasks, participants indicate their graded satisfaction, e.g.
  a. How satisfied are you with your experience in finding a hotel?
  b. How satisfied are you with your experience in finding directions?
• How well did Cortana recognize what you said?
o 5-point Likert scale
• Did you put in a lot of effort to complete the task?
o 5-point Likert scale
8 Tasks: 1 simple, 4 with 2 subtasks, 3 with 3 subtasks
30 Minutes
Search Dialog Dataset
• 540 tasks that incorporated 2,040 queries, of which 1,969 were unique
• The average query length is 7.07
• The simple task generated 130 queries in total
• Tasks with 2 context switches generated 685 queries
• Tasks with 3 context switches generated 1,355 queries
Factors Determining Satisfaction
RQ3: What are key factors determining user satisfaction for the different scenarios?
Results Over Scenarios
[Figure: mean satisfaction level and efforts, shown across scenarios and separately for Device Control, Web Search, and Structured Dialog]
Results ‘Good Abandonment’
RQ4: How to characterize abandonment in the web search scenario?
[Figure: mean satisfaction level by the query that led to the answer (First Query, Second Query, Third Query, >= Fourth Query) and by where the answer was found (Answer Box, Image, SERP, Visited Website)]
Search Dialogue Satisfaction
RQ5: How does query-level satisfaction relate to overall user satisfaction for the structured search dialogue scenario?
User: “show restaurants near me” (SAT?)
Cortana: “Here are ten restaurants near you” (SAT?)
User: “show the best restaurants near me” (SAT?)
Cortana: “Here are ten restaurants near you that have good reviews” (SAT?)
User: “show directions to the second one” (SAT?)
Cortana: “Getting you directions to the Mayuri Indian Cuisine” (SAT?)
Overall SAT?
Satisfaction Over Different Tasks
[Figure: number of answers per satisfaction level (1 to 5) for the Weather Task, the Mission Task (2 sub-tasks), and the Mission Task (3 sub-tasks)]
Combination of scenarios
User’s dialogue with Cortana related to the ‘stomach ache’ problem:
Q1: what do you have medicine for the stomach ache
Q2: stomach ache medicine over the counter
Q3: show me the nearest pharmacy
Q4: more information on the second one
Q5: do they have a stool softener
Q6: does Fred Meyer have stool softeners
Scenarios combined in this dialogue: General Search, Search Dialog
Conclusions (1)
• RQ1: What are characteristic types of scenarios of use?
o We proposed three main types of scenarios
• RQ2: How can we measure different aspects of user satisfaction?
o We designed a series of user studies tailored to the three scenarios
• RQ3: What are key factors determining user satisfaction for the different scenarios?
o Effort is a key component of user satisfaction across the different intelligent assistant scenarios
Conclusions (2)
• RQ4: How to characterize abandonment in the web search scenario?
o We concluded that to measure good abandonment we need to investigate other forms of interaction signals that are not based on clicks or reformulation
• RQ5: How does query-level satisfaction relate to overall user satisfaction for the search dialogue scenario?
o We looked at user satisfaction as ‘a user journey towards an information goal where each step is important,’ and showed the importance of session context
Predicting User Satisfaction with Intelligent Assistants (Good Abandonment Case)
Evaluating User Satisfaction
• We need metrics to evaluate user satisfaction
• Good abandonment [Human et al., 2009]:
o Mobile: 36% of abandoned queries were likely good
o Desktop: 14.3%
• Traditional methods use implicit signals: clicks and dwell time
• These traditional signals don’t work for abandoned queries, which have no clicks
Our Main Research Problem
In the absence of clicks, what is the relationship between a user's gestures and satisfaction, and can we use gestures to detect satisfaction and good abandonment?
Research Questions
• RQ1: What SERP elements are the sources of good abandonment in mobile search?
• RQ2: Do a user's gestures provide signals that can be used to detect satisfaction and good abandonment in mobile search?
• RQ3: Which user gestures provide the strongest signals for satisfaction and good abandonment?
USER STUDY
CROWDSOURCING
Crowdsourcing Procedure
Random sample of abandoned queries from the search logs of a personal digital assistant during one week in June 2015 (no query suggestion)
Query: Peniston
Previous Query: third eroics
Crowdsourcing Data
• Total number of queries – 3,895
• Judgment agreement (3 judgments per query) – 73%
• After filtering: SAT – 1,565 and DSAT – 1,924
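A minimal sketch of how three judgments per query could be turned into SAT/DSAT labels. The slides do not say how the filtering was done or how the 73% agreement was computed, so the majority-vote rule, the unanimity-based agreement measure, and every name below are assumptions for illustration only.

```python
from collections import Counter

def majority_label(judgments):
    """Return the label given by more than half of the judges, or None if no majority exists."""
    label, count = Counter(judgments).most_common(1)[0]
    return label if count > len(judgments) / 2 else None

def unanimous_rate(all_judgments):
    """Fraction of queries on which every judge gave the same label (one possible agreement measure)."""
    unanimous = sum(1 for j in all_judgments if len(set(j)) == 1)
    return unanimous / len(all_judgments)

# Hypothetical judgments, three per query (not data from the study)
judged = {
    "q1": ["SAT", "SAT", "DSAT"],
    "q2": ["DSAT", "DSAT", "DSAT"],
    "q3": ["SAT", "DSAT", "UNSURE"],
}
labels = {q: majority_label(j) for q, j in judged.items()}
# Filtering step: keep only queries with a clear SAT or DSAT majority
kept = {q: label for q, label in labels.items() if label in ("SAT", "DSAT")}
```

In this toy input, `q3` has no majority and is dropped, which is one way a sample of 3,895 queries could shrink after filtering.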
RQ1: Reasons for Good Abandonment
[Figure: Mean of Satisfaction]
Query and Session Features
Session Features
• Session duration
• Number of queries in session
Query Features
• Index of query within session
• Time to next query
• Query length (number of words)
• Is this query a reformulation
• Was this query reformulated
Click Features
• Click count
• Number of SAT clicks (> 30 sec)
• Number of back-click clicks (< 30 sec)
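The query and click features listed on this slide could be computed along these lines. This is only a sketch: the `QueryRecord` structure and all field names are hypothetical, and the 30-second dwell threshold is the only value taken from the slide.

```python
from dataclasses import dataclass, field
from typing import List

SAT_DWELL_SEC = 30.0  # dwell-time threshold shown on the slide

@dataclass
class QueryRecord:
    """Hypothetical log record for one query."""
    text: str
    click_dwell_secs: List[float] = field(default_factory=list)
    is_reformulation: bool = False
    was_reformulated: bool = False

def click_features(q):
    dwell = q.click_dwell_secs
    return {
        "click_count": len(dwell),
        # A dwell of exactly 30 sec falls in neither bucket, mirroring "> 30" and "< 30"
        "sat_clicks": sum(1 for d in dwell if d > SAT_DWELL_SEC),
        "back_clicks": sum(1 for d in dwell if d < SAT_DWELL_SEC),
    }

def query_features(q, index_in_session):
    return {
        "index_in_session": index_in_session,
        "query_length": len(q.text.split()),  # number of words
        "is_reformulation": q.is_reformulation,
        "was_reformulated": q.was_reformulated,
    }

example = QueryRecord("show restaurants near me", click_dwell_secs=[45.0, 5.0])
```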
Baseline 1: Click & Dwell
B1: Click, Dwell with no Reformulation
• Click > 30 sec
• No Reformulation
Baseline 2: Optimistic
B2: Optimistic
• No Click
• No Reformulation
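One possible reading of the two rule-based baselines, sketched in Python. The slides list the conditions but not the decision rule, so treating each baseline as "predict SAT when all of its conditions hold" is an assumption, as are the function and parameter names.

```python
SAT_DWELL_SEC = 30.0  # dwell-time threshold shown on the slides

def b1_click_dwell(click_dwell_secs, was_reformulated):
    """B1 (assumed rule): SAT if some click has dwell > 30 sec and the query was not reformulated."""
    has_sat_click = any(d > SAT_DWELL_SEC for d in click_dwell_secs)
    return has_sat_click and not was_reformulated

def b2_optimistic(click_dwell_secs, was_reformulated):
    """B2 (assumed rule): SAT if the query has no click and was not reformulated."""
    return len(click_dwell_secs) == 0 and not was_reformulated
```

Under this reading, B1 can never label an abandoned query as SAT, because an abandoned query has no clicks at all, while B2 labels every abandoned query without a reformulation as SAT.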
Baseline 3: Query-Session Model
B3: Query-Session Model: Training Random Forest
Gesture Features (1)
• Viewport features, swipe-related:
o up swipes and down swipes
o changes in swipe direction
o swiped distance in pixels and average swiped distance
o swipe distance divided by time spent on the SERP
• Time To Focus
o Time to Focus on Answer
o Time to Focus on Organic Search Results
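The swipe-related features could be derived from a log of swipes roughly as follows. This is a sketch under assumptions: each swipe is represented as a signed vertical distance in pixels (positive taken to mean a down swipe), and the function and key names are hypothetical.

```python
def swipe_features(swipe_deltas_px, serp_time_sec):
    """Compute swipe features from signed per-swipe distances (assumed: positive = down)."""
    down = sum(1 for d in swipe_deltas_px if d > 0)
    up = sum(1 for d in swipe_deltas_px if d < 0)
    # A direction change is a sign flip between consecutive non-zero swipes
    signs = [d > 0 for d in swipe_deltas_px if d != 0]
    direction_changes = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    total = sum(abs(d) for d in swipe_deltas_px)
    n = len(swipe_deltas_px)
    return {
        "down_swipes": down,
        "up_swipes": up,
        "direction_changes": direction_changes,
        "total_distance_px": total,
        "avg_distance_px": total / n if n else 0.0,
        "distance_per_sec": total / serp_time_sec if serp_time_sec > 0 else 0.0,
    }

# Hypothetical session: three down swipes, one up swipe, 10 seconds on the SERP
features = swipe_features([300, 200, -100, 400], serp_time_sec=10.0)
```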
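GF (2): Attributed Reading Time
[Figure: viewport time is attributed in proportion to the share of the viewport height covered: 3 seconds at 33% of the viewport, 6 seconds at 66% of the viewport, and 2 seconds at 20% of the viewport]
Attributed Reading Time: 1s + 4s + 0.4s = 5.4s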
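The arithmetic on this slide can be reproduced as a weighted sum: each interval's duration is multiplied by the share of the viewport covered during that interval. The function name and the input layout are hypothetical, and the slide's 33% and 66% are treated as 1/3 and 2/3 so that the terms match the 1s and 4s shown.

```python
def attributed_reading_time(intervals):
    """Sum of (seconds visible) x (fraction of viewport covered) over all intervals."""
    return sum(seconds * fraction for seconds, fraction in intervals)

# Values from the slide: 3 s at ~33%, 6 s at ~66%, 2 s at 20% of the viewport
intervals = [(3.0, 1 / 3), (6.0, 2 / 3), (2.0, 0.2)]
total_sec = attributed_reading_time(intervals)  # 1 s + 4 s + 0.4 s
```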
GF (3): Attributed Reading Time Per Pixel
Attributed Reading Time: 5.4s
Pixel Area: 400 pix x 300 pix
Attributed Reading Time Per Pixel = 5.4s / (400 pix x 300 pix) = 0.045 ms/pix²
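As a check of the numbers on this slide, the per-pixel value is the attributed reading time divided by the element's pixel area. The function name is hypothetical, and which of the two dimensions is the width is an assumption that does not affect the result.

```python
def reading_time_per_pixel_ms(reading_time_sec, width_px, height_px):
    """Attributed reading time in milliseconds per square pixel of the element."""
    return reading_time_sec * 1000.0 / (width_px * height_px)

# 5.4 s over a 400 x 300 pixel element, as on the slide
per_pixel = reading_time_per_pixel_ms(5.4, 400, 300)
```

Normalizing by area makes elements of different sizes comparable: a small answer box read for 5.4 seconds gets a higher per-pixel value than a large block viewed for the same time.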
Models: Detecting Good Abandonment
M1: Gesture Model: Training Random Forest based on gesture features
M2: Gesture Model + Query and Session Features: Training Random Forest based on gesture, query and session features
RQ2: Are gestures useful? (1)
On user study data, abandoned queries only: 148 SAT queries and 313 DSAT queries
RQ2: Are gestures useful? (2)
On crowdsourced data: 1,565 SAT queries and 1,924 DSAT queries
RQ2: Are gestures useful? (3)
On all user study data: 179 SAT queries and 384 DSAT queries
Gesture features are useful to detect user satisfaction in general!
Conclusions
• RQ1: What SERP elements are the sources of good abandonment in mobile search?
Answer, Images and Snippet
• RQ2: Do a user's gestures provide signals that can be used to detect satisfaction and good abandonment in mobile search?
Yes
• RQ3: Which user gestures provide the strongest signals for satisfaction and good abandonment?
Time spent interacting with Answers is positively correlated with satisfaction. Swipe actions and time spent with the SERP are negatively correlated.
• Answer, Images and Snippet are potential sources of good abandonment
• User gestures provide useful signals to detect good abandonment
• Time spent interacting with Answers is positively correlated with satisfaction. Swipe actions and time spent with the SERP are negatively correlated
Questions?