DESCRIPTION
In the online world, user engagement refers to the quality of the user experience that emphasizes the phenomena associated with wanting to use a web application longer and frequently. User engagement is a multifaceted, complex phenomenon, giving rise to a number of approaches for its measurement: self-reporting (e.g., questionnaires); observational methods (e.g., facial expression analysis, desktop actions); and web analytics using online behavior metrics. These methods represent various trade-offs between the scale of the data analyzed and the depth of understanding. For instance, surveys are hardly scalable but offer rich, qualitative insights, whereas click data can be collected at large scale but are more difficult to analyze. Still, the core research questions each type of measurement is able to answer are unclear. This talk will present various efforts aiming at combining approaches to measure engagement and seeking to provide insights into what questions to ask when measuring engagement.
Keynote at the 18th International Conference on Application of Natural Language to Information Systems (NLDB 2013), University of Salford, MediaCityUK.
Blog: http://labtomarket.wordpress.com
To be or not to be engaged: What are the questions (to ask)?
Mounia Lalmas Yahoo! Labs Barcelona [email protected]
1
About me
• Since January 2011: Visiting Principal Scientist at Yahoo! Labs Barcelona • User engagement, social media, search
• 1999-2008: Lecturer (assistant professor) to Professor at Queen Mary, University of London • XML retrieval and evaluation (INEX)
• 2008-2010: Microsoft Research/RAEng Research Professor at the University of Glasgow
• Quantum theory to model information retrieval
Blog: labtomarket.wordpress.com 2
Why is it important to engage users?
• In today’s wired world, users have enhanced expectations about their interactions with technology
… resulting in increased competition amongst the purveyors and designers of interactive systems. • In addition to utilitarian factors, such as usability, we must consider the hedonic and experiential factors of interacting with technology, such as fun, fulfillment, play, and user engagement.
• In order to make engaging systems, we need to understand what user engagement is and how to measure it.
3
Why is it important to measure and interpret user engagement well?
CTR
4
Outline • What is user engagement?
• What are the characteristics of user engagement?
• How to measure user engagement?
• What are the questions to ask?
saliency, interestingness, serendipity, relevance, sentiment, reading, news, social media, user generated content, automatic linking, aesthetics.
5
WHAT IS USER ENGAGEMENT?
6
http://thenextweb.com/asia/2013/05/03/kakao-talk-rolls-out-plus-friend-home-a-revamped-platform-to-connect-users-with-their-favorite-brands/
Engagement is on everyone’s mind
http://socialbarrel.com/70-percent-of-brand-engagement-on-pinterest-come-from-users/51032/
http://iactionable.com/user-engagement/
http://www.cio.com.au/article/459294/heart_foundation_uses_gamification_drive_user_engagement/
http://www.localgov.co.uk/index.cfm?method=news.detail&id=109512
http://www.trefis.com/stock/lnkd/articles/179410/linkedin-makes-a-90-million-bet-on-pulse-to-help-drive-user-engagement/2013-04-15 7
What is user engagement?
User engagement is a quality of the user experience that emphasizes the positive aspects of interaction – in particular the fact of being captivated by the technology (Attfield et al, 2011).
user feelings: happy, sad, excited, …
emotional, cognitive and behavioural connection that exists, at any point in time and over time, between a user and a technological resource
user interactions: click, read, comment, buy…
user mental states: involved, lost, concentrated…
8
Considerations in the measurement of user engagement • Short term (within session) and long term (across multiple sessions)
• Laboratory vs. field studies • Subjective vs. objective measurement • Large scale (e.g., dwell time of 100,000 people) vs. small scale (gaze patterns of 10 people)
• User engagement as process vs. as product
One is not better than the other; it depends on the aim.
9
CHARACTERISTICS OF USER ENGAGEMENT
10
Characteristics of user engagement (I)

Focused attention (Webster & Ho, 1997; O’Brien, 2008)
• Users must be focused to be engaged
• Distortions in the subjective perception of time used to measure it

Positive affect (O’Brien & Toms, 2008)
• Emotions experienced by the user are intrinsically motivating
• Initial affective “hook” can induce a desire for exploration, active discovery or participation

Aesthetics (Jacques et al, 1995; O’Brien, 2008)
• Sensory, visual appeal of the interface stimulates the user & promotes focused attention
• Linked to design principles (e.g. symmetry, balance, saliency)

Endurability (Read, MacFarlane, & Casey, 2002; O’Brien, 2008)
• People remember enjoyable, useful, engaging experiences and want to repeat them
• Reflected in e.g. the propensity of users to recommend an experience/a site/a product
11
Characteristics of user engagement (II)

Novelty (Webster & Ho, 1997; O’Brien, 2008)
• Novelty, surprise, unfamiliarity and the unexpected
• Appeal to users’ curiosity; encourage inquisitive behavior and promote repeated engagement

Richness and control (Jacques et al, 1995; Webster & Ho, 1997)
• Richness captures the growth potential of an activity
• Control captures the extent to which a person is able to achieve this growth potential

Reputation, trust and expectation (Attfield et al, 2011)
• Trust is a necessary condition for user engagement
• Implicit contract among people and entities which is more than technological

Motivation, interests, incentives, and benefits (Jacques et al., 1995; O’Brien & Toms, 2008)
• Why should users engage?
• Difficulties in setting up “laboratory” style experiments
12
MEASURING USER ENGAGEMENT
13
Measuring user engagement

Self-reported engagement
• Measures: questionnaire, interview, report, product reaction cards, think-aloud
• Characteristics: subjective; short- and long-term; lab and field; small-scale; product outcome

Cognitive engagement
• Measures: task-based methods (time spent, follow-on task); physiological measures (e.g. EEG, SCL, fMRI, eye tracking, mouse-tracking)
• Characteristics: objective; short-term; lab and field; small-scale and large-scale; process outcome

Interaction engagement
• Measures: web analytics metrics + models
• Characteristics: objective; short- and long-term; field; large-scale; process outcome
14
Large-scale measurements of user engagement – Web analytics

Intra-session measures
• Dwell time / session duration
• Play time (video)
• Click-through rate (CTR)
• Mouse movement
• Number of pages viewed (click depth)
• Conversion rate (mostly for e-commerce)
• Number of UGC items (comments)

Inter-session measures
• Fraction of return visits
• Time between visits (inter-session time, absence time)
• Total view time per month (video)
• Lifetime value (number of actions)
• Number of sessions per unit of time
• Total usage time per unit of time
• Number of friends on site (social networks)
• Number of UGC items (comments)

• Intra-session engagement measures our success in attracting the user to remain on our site for as long as possible.
• Inter-session engagement can be measured directly or, for commercial sites, by observing lifetime customer value.
15
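To make these metrics concrete, here is a minimal sketch (in Python, with a hypothetical visit-log format) of how a few of the intra- and inter-session measures above could be computed; it is illustrative only, not an actual production analytics pipeline.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical visit log: one record per page view (user, session, timestamp).
log = [
    {"user": "u1", "session": "s1", "ts": datetime(2013, 5, 1, 10, 0)},
    {"user": "u1", "session": "s1", "ts": datetime(2013, 5, 1, 10, 4)},
    {"user": "u1", "session": "s2", "ts": datetime(2013, 5, 3, 9, 30)},
    {"user": "u1", "session": "s2", "ts": datetime(2013, 5, 3, 9, 45)},
]

sessions = defaultdict(list)
for view in log:
    sessions[(view["user"], view["session"])].append(view["ts"])

# Intra-session: dwell time (session duration) and pages viewed (click depth).
for (user, session), times in sessions.items():
    dwell = (max(times) - min(times)).total_seconds()
    print(user, session, "dwell time (s):", dwell, "click depth:", len(times))

# Inter-session: absence time between consecutive sessions of the same user.
starts_by_user = defaultdict(list)
for (user, _), times in sessions.items():
    starts_by_user[user].append(min(times))
for user, starts in starts_by_user.items():
    starts.sort()
    absences = [(b - a).total_seconds() / 3600 for a, b in zip(starts, starts[1:])]
    print(user, "number of sessions:", len(starts), "absence times (h):", absences)
```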
Cognitive engagement • Eye tracking • Mouse movement • Facial expression • Psychophysiological measures: respiration, pulse rate, temperature, brain waves, skin conductance, …
16
WHAT ARE THE QUESTIONS TO ASK?

Signals – Signals – Signals: Five studies
Self-reported engagement
Interaction engagement
17
STUDY I • Domain: entertainment news • Study: saliency • Measurement: focused attention and affect
18 + Lori McCay-Peet + Vidhya Navalpakkam
• How the visual catchiness (saliency) of “relevant” information impacts user engagement metrics such as focused attention and emotion (affect) • focused attention refers to the exclusion of other things
• affect relates to the emotions experienced during the interaction
• Saliency model of visual attention developed by Itti & Koch (2000)
Self-reported engagement
19
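The study used the Itti & Koch model of visual attention. Purely as an illustration of what a computational saliency map looks like, the sketch below uses OpenCV’s spectral-residual saliency detector (a different, much simpler algorithm, not the Itti & Koch model); the file names are hypothetical.

```python
import cv2  # requires opencv-contrib-python for the saliency module

# Hypothetical screenshot of a news web page.
image = cv2.imread("webpage_screenshot.png")

# Spectral-residual saliency: a simple stand-in, NOT the Itti & Koch (2000) model.
detector = cv2.saliency.StaticSaliencySpectralResidual_create()
ok, saliency_map = detector.computeSaliency(image)
assert ok, "saliency computation failed"

# Scale to 0-255 and threshold to highlight the most salient regions,
# e.g. to check whether the target headline falls inside a salient area.
saliency_map = (saliency_map * 255).astype("uint8")
_, salient_regions = cv2.threshold(saliency_map, 0, 255,
                                   cv2.THRESH_BINARY | cv2.THRESH_OTSU)
cv2.imwrite("saliency_map.png", saliency_map)
cv2.imwrite("salient_regions.png", salient_regions)
```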
Manipulating saliency
[Figure: web page screenshot and its saliency maps under the salient and non-salient conditions]
(McCay-Peet et al, 2012) 20
Study design
• 8 tasks = finding the latest news or headline on a celebrity or entertainment topic
• Affect measured pre- and post-task using the Positive and Negative Affect Schedule (PANAS); positive items e.g. “determined”, “attentive”, negative items e.g. “hostile”, “afraid”
• Focused attention measured with the 7-item focused attention subscale, e.g. “I was so involved in my news tasks that I lost track of time”, “I blocked things out around me when I was completing the news tasks”, and with perceived time
• Interest level in topics (pre-task) and questionnaire (post-task), e.g. “I was interested in the content of the web pages”, “I wanted to find out more about the topics that I encountered on the web pages”
• 189 (90+99) participants from Amazon Mechanical Turk 21
PANAS (10 positive items and 10 negative items)
• “You feel this way right now, that is, at the present moment” [1 = very slightly or not at all; 2 = a little; 3 = moderately; 4 = quite a bit; 5 = extremely] [randomize items]
Negative items: distressed, upset, guilty, scared, hostile, irritable, ashamed, nervous, jittery, afraid
Positive items: interested, excited, strong, enthusiastic, proud, alert, inspired, determined, attentive, active
(Watson, Clark & Tellegen, 1988)
22
7-item focused attention subscale (part of the 31-item user engagement scale), 5-point scale (strongly disagree to strongly agree)
1. I lost myself in this news tasks experience
2. I was so involved in my news tasks that I lost track of time
3. I blocked things out around me when I was completing the news tasks
4. When I was performing these news tasks, I lost track of the world around me
5. The time I spent performing these news tasks just slipped away
6. I was absorbed in my news tasks
7. During the news tasks experience I let myself go
(O'Brien & Toms, 2010) 23
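A minimal sketch of how such self-report instruments are typically scored: PANAS positive and negative affect as the sum of the corresponding 5-point items (range 10–50 each), and the focused attention subscale as the mean of its seven items. The example responses are hypothetical; this is not the study’s analysis code.

```python
# Hypothetical responses on 5-point scales (1-5).
panas = {
    "interested": 4, "excited": 3, "strong": 3, "enthusiastic": 4, "proud": 2,
    "alert": 4, "inspired": 3, "determined": 4, "attentive": 5, "active": 3,
    "distressed": 1, "upset": 1, "guilty": 1, "scared": 1, "hostile": 1,
    "irritable": 2, "ashamed": 1, "nervous": 2, "jittery": 2, "afraid": 1,
}
POSITIVE = ["interested", "excited", "strong", "enthusiastic", "proud",
            "alert", "inspired", "determined", "attentive", "active"]
NEGATIVE = ["distressed", "upset", "guilty", "scared", "hostile",
            "irritable", "ashamed", "nervous", "jittery", "afraid"]

positive_affect = sum(panas[item] for item in POSITIVE)  # range 10-50
negative_affect = sum(panas[item] for item in NEGATIVE)  # range 10-50

# Focused attention subscale: mean of the 7 items
# (1 = strongly disagree ... 5 = strongly agree).
focused_attention_items = [4, 5, 3, 4, 4, 5, 3]
focused_attention = sum(focused_attention_items) / len(focused_attention_items)

print(positive_affect, negative_affect, round(focused_attention, 2))
```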
Saliency and positive affect
• When headlines are visually non-salient • users are slow at finding them, report more distraction due to web page features, and show a drop in affect
• When headlines are visually catchy or salient • users find them faster, report that it is easy to focus, and maintain positive affect
• Saliency is helpful in task performance, focusing/avoiding distraction and in maintaining positive affect
24
Saliency and focused attention • Adapted focused attention subscale from the online shopping domain to the entertainment news domain
• Users reported “easier to focus in the salient condition” BUT no significant improvement in the focused attention subscale or differences in perceived time spent on tasks
• User interest in web page content is a good predictor of focused attention, which in turn is a good predictor of positive affect
25
Self-reporting, crowdsourcing, saliency and user engagement
• Interaction of saliency, focused attention, and affect, together with user interest, is complex.
• Using crowdsourcing worked!
• What next? • include web page content as a quality of user engagement in the focused attention scale
• more “realistic” user (interactive) reading experience • other measurements: mouse-tracking, eye-tracking, facial expression analysis, etc.
(McCay-Peet, Lalmas & Navalpakkam, 2012) 26
STUDY II • Domain: news and user generated content
(comments) • Study: interestingness and sentiment • Measurement: focused attention, affect and gaze
27 + Ioannis Arapakis + Barla Cambazoglu + Mari-Carmen Marcos + Joemon Jose
Gaze and self-reporting
• News + comments • Sentiment, interest • 57 users (lab-based) • Reading task (114)
• Questionnaire (qualitative data) • Record eye tracking (quantitative data)
Three metrics: gaze, focused attention and positive affect
28 (Lin et al, 2007)
Interesting content promotes user engagement metrics
• All three metrics: • focused attention, positive affect & gaze
• What is the right trade-off? • news is news :)
• Can we predict? • provider, editor, writer, category, genre, visual aids, …, sentimentality, …
• Role of user-generated content (comments) • As measure of engagement? • To promote engagement? 29
Lots of sentiment but with negative connotations!
• Positive affect (and interest, enjoyment and wanting to know more) correlates
• Positively (↑) with sentimentality (lots of emotions)
• Negatively (↓) with positive polarity (happy news)
SentiStrength (from -5 to 5 per word); sentimentality: sum of absolute values (amount of sentiment); polarity: sum of values (direction of the sentiment: positive vs negative)
(Thelwall, Buckley & Paltoglou, 2012)
30
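Given per-word scores in [-5, 5] (SentiStrength-style), the two quantities defined above follow directly. A minimal sketch with hypothetical word scores; this is not the SentiStrength tool itself.

```python
# Hypothetical per-word sentiment scores in [-5, 5], SentiStrength-style.
word_scores = {"tragic": -4, "amazing": 4, "disaster": -3, "hope": 2}

def sentiment_features(words):
    scores = [word_scores.get(w.lower(), 0) for w in words]
    sentimentality = sum(abs(s) for s in scores)  # amount of sentiment
    polarity = sum(scores)                        # direction: positive vs negative
    return sentimentality, polarity

comment = "The tragic disaster gave way to amazing hope".split()
print(sentiment_features(comment))  # -> (13, -1): very emotional, slightly negative
```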
Effect of comments on user engagement
• 6 rankings of comments: • most replied, most popular, newest • sentimentality high, sentimentality low • polarity plus, polarity minus
• Longer gaze on • newest and most popular for interesting news • most replied and high sentimentality for non-interesting news
• Can we leverage this to prolong user attention? 31
Gaze, sentimentality, interest
• Interesting and “attractive” content! • Sentiment as a proxy of focused attention, positive affect and gaze?
• Next • Larger-scale study • Other domains (beyond news!) • Role of social signals (e.g. Facebook, Twitter) • Lots more data: mouse tracking, EEG, facial expression
(Arapakis et al., 2013) 32
STUDY III • Domain: news and social media (Wikipedia) • Study: interestingness, aesthetics, task • Measurement: focused attention, affect and
mouse movement
33 + David Warnock
Mouse tracking and self-reporting
• 324 users from Amazon Mechanical Turk (between-subject design)
• Two domains (BBC News and Wikipedia) • Two tasks (reading and search) • “Normal vs Ugly” interface
• Questionnaires (qualitative data): focused attention, positive affect, novelty, interest, usability, aesthetics + demographics, handedness & hardware
• Mouse tracking (quantitative data): movement speed, movement rate, click rate, pause length, percentage of time still
34
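A minimal sketch of how the mouse-tracking features listed above could be derived from a raw cursor event stream. The event format, threshold value and feature names are assumptions for illustration, not the study’s actual feature extractor.

```python
import math

# Hypothetical event stream: (seconds since task start, x, y, event type).
events = [(0.0, 100, 100, "move"), (0.5, 160, 180, "move"),
          (1.0, 160, 180, "move"), (3.0, 300, 200, "move"),
          (3.2, 300, 200, "click"), (5.0, 300, 200, "move")]

STILL_THRESHOLD = 2.0  # px between samples counted as "still" (assumed value)

total_time = events[-1][0] - events[0][0]
distance, still_time, pauses = 0.0, 0.0, []
for (t0, x0, y0, _), (t1, x1, y1, _) in zip(events, events[1:]):
    step = math.hypot(x1 - x0, y1 - y0)
    distance += step
    if step < STILL_THRESHOLD:          # cursor essentially did not move
        still_time += t1 - t0
        pauses.append(t1 - t0)

features = {
    "movement_speed_px_per_s": distance / total_time,
    "movement_rate_events_per_s": sum(e[3] == "move" for e in events) / total_time,
    "click_rate_per_s": sum(e[3] == "click" for e in events) / total_time,
    "mean_pause_length_s": sum(pauses) / len(pauses) if pauses else 0.0,
    "percent_time_still": 100 * still_time / total_time,
}
print(features)
```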
“Ugly” vs “Normal” Interface (BBC News)
35
Mouse tracking can tell about • Age
• Hardware • Mouse • Trackpad
• Task • Searching: There are many different types of phobia. What is Gephyrophobia a fear of?
• Reading: (Wikipedia) Archimedes, Section 1: Biography
36
Mouse tracking could not tell much about • focused attention and positive affect • user interests in the task/topic
• BUT BUT BUT BUT • “ugly” variant did not result in lower aesthetics scores • although BBC > Wikipedia
• BUT – the comments left … • Wikipedia: “The website was simply awful. Ads flashing everywhere, poor text colors on a dark blue background.”; “The webpage was entirely blue. I don't know if it was supposed to be like that, but it definitely detracted from the browsing experience.”
• BBC News: “The website's layout and color scheme were a bitch to navigate and read.”; “Comic sans is a horrible font.”
37
Mouse tracking and user engagement
• Task and hardware • Do we have a Hawthorne Effect??? • “Usability” vs engagement
• “Even uglier” interface? • Within- vs between-subject design?
• What next? • Sequence of movements • Automatic clustering
(Warnock & Lalmas, 2013)
38
STUDY IV
• Domain: news • Study: automatic linking • Measurement: interestingness
39 + Ioannis Arapakis + Hakan Ceylan + Pinar Domnez
Automatic linking & reading experience 40
Keeping users reading more articles
LEPA: Linker for Events to Past Articles. LEPA is a fully automated approach to constructing hyperlinks in news articles using “simple” text processing and understanding techniques
Indexer
• Processes articles over a time period by extracting features from each article and storing them to facilitate faster retrieval
Linker
• Identifies sentences that contain newsworthy events
• For each such event it retrieves from the index all the matching articles and links the top-ranked one with the event 41
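LEPA’s actual features, event detection and ranking are not described here. Purely as a toy illustration of the indexer/linker split, the sketch below indexes past articles with TF-IDF and links a sentence (assumed to describe a newsworthy event) to its best-matching past article above a similarity threshold; the article texts, threshold and matching method are assumptions, not LEPA’s implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# --- Indexer: process past articles and store features for fast retrieval ---
past_articles = {
    "art-1": "Volcanic ash cloud closes European airspace for a week.",
    "art-2": "Egypt protests intensify in Tahrir Square as government resigns.",
}
vectorizer = TfidfVectorizer(stop_words="english")
index = vectorizer.fit_transform(past_articles.values())
ids = list(past_articles.keys())

# --- Linker: for each (assumed newsworthy) sentence, embed the top match ---
THRESHOLD = 0.2  # assumed similarity cut-off; LEPA's criteria differ
new_sentences = ["Protesters returned to Tahrir Square in Cairo on Friday."]
for sentence in new_sentences:
    sims = cosine_similarity(vectorizer.transform([sentence]), index)[0]
    best = sims.argmax()
    if sims[best] >= THRESHOLD:
        print(f"link: '{sentence}' -> {ids[best]} (score {sims[best]:.2f})")
```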
Three-stage evaluation
Pilot study
Assessing reading experience
Assessing links
42
Pilot study • Rating results:
• Bad: 35.15% • Fair: 33.93% • Good: 20% • Excellent: 9.09% • Not Judged: 1.81%
• With 63.03% of the links rated fair or better:
• initial evidence that LEPA is not too far from the optimum achieved by human editors
Professional editors. A collection of system-embedded links (164 article-link combinations). 5-point Likert scale: (i) bad, (ii) fair, (iii) good, (iv) excellent, and (v) not judged
43
Assessing the links: are they related?
• 664 participants recruited through Amazon Mechanical Turk; between-group design (two groups)
• Precision = fraction of links (total = 164) that received, in terms of relatedness, a score equal to, or greater than, 3 on a 5-point Likert scale (a small sketch of this computation follows the table)

                            System-Embedded Links             Manually-Curated Links
                            Participant A  Participant B  All  Participant A  Participant B  All
Related to the main theme   49%            42%            45%  54%            51%            53%
Related to subtopic         21%            24%            22%  31%            34%            33%
Tangentially related        13%            15%            14%   9%            12%            10%
Unrelated                   15%            16%            16%   5%             1%             3%
Other                        2%             2%             2%   1%             2%             1%
44
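Following the precision definition above, a tiny sketch with hypothetical relatedness labels (the study has 164 of them):

```python
# Hypothetical 5-point relatedness labels, one per system-embedded link.
labels = [5, 3, 2, 4, 1, 3, 5, 2]

# Precision = fraction of links judged related (score >= 3).
precision = sum(score >= 3 for score in labels) / len(labels)
print(f"precision = {precision:.2f}")
```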
Assessing the Reading Experience • 120 participants recruited through Amazon Mechanical Turk; between-groups design (three groups)
• Editors + two opposite “extremes” of LEPA: • High recall: best at embedding newsworthy links & articles that provide interesting insights
• High precision: best in terms of embedding the right number of links
[Diagram (inductive, thematic coding of open-ended questions): qualities mentioned by participants – good topical coverage, informativeness, broader perspective, interesting insights, link presentation, content volume – contributing to a positive news reading experience]
45
Automatic linking and news reading experience
• Even under realistic and uncontrolled conditions, performance of LEPA comparable to that of editors, and in some cases better
• High precision vs. high recall • High precision threshold leads to a better news reading experience: less is more. “They were too many, being mostly quite long, in some cases more than half the length of the main article, and sometimes they repeated the same identical information”
46
STUDY V
• Domain: social media (Yahoo! Answers and Wikipedia)
• Study: serendipity • Measurement: relevance, unexpectedness,
interestingness
47 + Ilaria Bordino + Yelena Mejova
Entity-driven Exploratory Search
Linguistically Motivated Semantic Aggregation Engines: “transition to a truly semantic aggregation paradigm where machines understand a user’s intent, discover and organize facts, identify opinions, experiences and trends”
Entity Search
We build an entity-driven serendipitous search system based on entity networks extracted from Wikipedia and Yahoo! Answers
Serendipity: finding something good or useful while not specifically looking for it; serendipitous search systems provide relevant and interesting results
48
Yahoo! Answers vs Wikipedia

Yahoo! Answers: community-driven question & answer portal
• 67,336,144 questions & 261,770,047 answers
• January 1, 2010 – December 31, 2011
• English-language
• minimally curated: opinions, gossip, personal info; variety of points of view

Wikipedia: community-driven encyclopedia
• 3,795,865 articles • as of end of December 2011 • English Wikipedia
• curated, high-quality knowledge; variety of niche topics
49
Entity & Relationship Extraction
• entity – any well-defined concept that has a Wikipedia page • relationship – a topical relationship/similarity between a pair of entities based on document co-occurrence • related to the number of documents in which the two entities occur
50
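A minimal sketch of the kind of entity network described above: nodes are entities (Wikipedia concepts) and edge weights count the documents in which two entities co-occur. The entity-linking step (mapping text spans to Wikipedia pages) is assumed to have been done already; this is not the system’s extraction pipeline.

```python
from itertools import combinations
import networkx as nx

# Hypothetical output of an entity linker: entities mentioned per document.
docs_entities = [
    {"Steve Jobs", "IPhone", "Apple Inc."},
    {"Steve Jobs", "Steve Wozniak", "Apple Inc."},
    {"IPhone", "IPad"},
]

graph = nx.Graph()
for entities in docs_entities:
    for a, b in combinations(sorted(entities), 2):
        # Edge weight = number of documents in which the two entities co-occur.
        if graph.has_edge(a, b):
            graph[a][b]["weight"] += 1
        else:
            graph.add_edge(a, b, weight=1)

print(graph["Steve Jobs"]["Apple Inc."]["weight"])  # -> 2
```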
Dataset          # Nodes     # Edges       Density   # Isolated
Yahoo! Answers   896,799     112,595,138   0.00028   69,856
Wikipedia        1,754,069   237,058,218   0.00015   82,381

Dataset          Avg Degree  Max Degree    Size of Largest CC
Yahoo! Answers   251         231,921       826,402 (92.15%)
Wikipedia        270         346,070       1,671,241 (95.28%)

[Figure: visualizations of the Wikipedia and Yahoo! Answers entity networks]
51
Retrieval
                Wikipedia   Yahoo! Answers   Combined
Precision @ 5   0.668       0.724            0.744
MAP             0.716       0.762            0.782

Queries: Justin Bieber, Nicki Minaj, Katy Perry, Shakira, Eminem, Lady Gaga, Jose Mourinho, Selena Gomez, Kim Kardashian, Miley Cyrus, Robert Pattinson, Adele (singer), Steve Jobs, Osama bin Laden, Ron Paul, Twitter, Facebook, Netflix, IPad, IPhone, Touchpad, Kindle, Olympic Games, Cricket, FIFA, Tennis, Mount Everest, Eiffel Tower, Oxford Street, Nürburgring, Haiti, Chile, Libya, Egypt, Middle East, Earthquake, Oil spill, Tsunami, Subprime mortgage crisis, Bailout, Terrorism, Asperger syndrome, McDonald's, Vitamin D, Appendicitis, Cholera, Influenza, Pertussis, Vaccine, Childbirth
3 labels per query-result pair; gold standard quality control

Example top-5 results for the query “Steve Jobs”:
Yahoo! Answers: Jon Rubinstein, Timothy Cook, Kane Kramer, Steve Wozniak, Jerry York
Wikipedia: System 7, PowerPC G4, SuperDrive, Power Macintosh, Power Computing Corp.
• Annotator agreement (overlap): 0.85
• Average overlap in top 5 results: <1
52
• retrieve entities most related to a query entity using a random walk
• |relevant & unexpected| / |unexpected|: number of serendipitous results out of all of the unexpected results retrieved
• |relevant & unexpected| / |retrieved|: serendipitous out of all retrieved
53
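A sketch of both ideas on a toy graph: entities related to the query entity are retrieved with a random walk with restart (personalized PageRank via networkx, one plausible reading of “random walk”), and the two serendipity ratios above are computed from hypothetical relevance/unexpectedness judgments. None of this is the paper’s actual implementation.

```python
import networkx as nx

# Reusing a co-occurrence graph like the one sketched earlier.
graph = nx.Graph()
graph.add_weighted_edges_from([
    ("Steve Jobs", "Apple Inc.", 2), ("Steve Jobs", "Steve Wozniak", 1),
    ("Apple Inc.", "IPhone", 3), ("IPhone", "IPad", 1),
])

# Random walk with restart from the query entity (personalized PageRank).
query = "Steve Jobs"
scores = nx.pagerank(graph, alpha=0.85, personalization={query: 1.0}, weight="weight")
retrieved = [e for e, _ in sorted(scores.items(), key=lambda kv: -kv[1]) if e != query][:5]

# Serendipity metrics from (hypothetical) per-result judgments.
relevant = {"Steve Wozniak", "IPhone"}
unexpected = {"Steve Wozniak", "IPad"}       # not returned by a web-search baseline
rel_and_unexp = relevant & unexpected & set(retrieved)
print(len(rel_and_unexp) / len(unexpected))  # serendipitous / unexpected
print(len(rel_and_unexp) / len(retrieved))   # serendipitous / retrieved
```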
Baseline                                                   Data
Top: 5 entities that occur most frequently in the top 5    WP   0.63 (0.58)
search results from Bing and Google                        YA   0.69 (0.63)
Top – WP: same as above, but excluding the Wikipedia       WP   0.63 (0.58)
page from the results                                      YA   0.70 (0.64)
Rel: top 5 entities in the related query suggestions       WP   0.64 (0.61)
provided by Bing and Google                                YA   0.70 (0.65)
Rel + Top: union of Top and Rel                            WP   0.61 (0.54)
                                                           YA   0.68 (0.57)
Serendipity “making fortunate discoveries by accident” Serendipity = unexpectedness + relevance
“Expected” result baselines from web search
Interestingness ≠ Relevance
Interesting > Relevant
Relevant > Interesting
Oil Spill → Penguins in Sweaters (WP)
Robert Pattinson → Water for Elephants (WP)
Lady Gaga → Britney Spears (WP)
Egypt → Cairo Conference (WP)
Netflix → Blu-ray Disc (YA)
Egypt → Ptolemaic Kingdom (WP & YA)
54 (Bordino, Mejova & Lalmas, 2013)
Similarity (Kendall’s tau-b) between result sets and reference ranking
55
Question                                                          Data   tau-b
Which result is more relevant to the query?                       WP     0.162
                                                                  YA     0.336
If someone is interested in the query, would they also be         WP     0.162
interested in the result?                                         YA     0.312
Even if you are not interested in the query, is the result        WP     0.139
interesting to you personally?                                    YA     0.324
Would you learn anything new about the query from the results?    WP     0.167
                                                                  YA     0.307

Following (Arguello et al, 2011):
1. Labelers provide pairwise comparisons between results
2. Combine into a reference ranking
3. Compare result ranking to optimal ranking using Kendall’s tau
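Step 3 can be reproduced with scipy, whose kendalltau computes the tau-b variant (handling ties) by default; the two rankings below are hypothetical.

```python
from scipy.stats import kendalltau

# Hypothetical rankings of the same 5 results: position given by the system
# vs. the reference ranking aggregated from labelers' pairwise comparisons.
system_rank = [1, 2, 3, 4, 5]
reference_rank = [2, 1, 3, 5, 4]

tau_b, p_value = kendalltau(system_rank, reference_rank)  # tau-b by default
print(round(tau_b, 3), round(p_value, 3))
```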
Assessing “interestingness”
Multimedia search activities often driven by entertainment needs, not by information needs
Serendipity in multimedia search?
(Slaney, 2011) 56
What are the questions to ask?
• No one measurement is perfect or complete.
• All studies (process or product) have different constraints.
• Need to ensure methods are applied consistently with attention to reliability: what is a good signal?
• More emphasis should be placed on using mixed methods to improve the validity of the measures.
• Be careful of the WEIRD syndrome (Western, Educated, Industrialized, Rich, and Democratic)
57
Acknowledgements
• Collaborators: Ioannis Arapakis, Ilaria Bordino, Barla Cambazoglu, Hakan Ceylan, Pinar Domnez, Lori McCay-Peet, Yelena Mejova, Vidhya Navalpakkam, David Warnock, and others at Yahoo! Labs.
• This talk uses some material from a tutorial “Measuring User Engagement” given at WWW 2013, Rio de Janeiro (with Heather O’Brien and Elad Yom-Tov)
Blog: labtomarket.wordpress.com
58