DESCRIPTION
In the online world, user engagement refers to the quality of the user experience that emphasizes the phenomena associated with wanting to use a web application longer and frequently. User engagement is a multifaceted, complex phenomenon, giving rise to a number of approaches for its measurement: self-reporting (e.g., questionnaires); observational methods (e.g., facial expression analysis, desktop actions); and web analytics using online behavior metrics. These methods represent various trade-offs between the scale of the data analyzed and the depth of understanding. For instance, surveys are hardly scalable but offer rich, qualitative insights, whereas click data can be collected at large scale but are more difficult to analyze. Still, the core research questions each type of measurement is able to answer are unclear. This talk will present various efforts aiming at combining approaches to measure engagement and seeking to provide insights into what questions to ask when measuring engagement.
Keynote at the 18th International Conference on Application of Natural Language to Information Systems (NLDB 2013), University of Salford, MediaCityUK.
Blog: http://labtomarket.wordpress.com
To be or not to be engaged: What are the questions (to ask)?
Mounia Lalmas Yahoo! Labs Barcelona [email protected]
1
About me
• Since January 2011: Visiting Principal Scientist at Yahoo! Labs Barcelona • User engagement, social media, search
• 1999-2008: Lecturer (assistant professor) to Professor at Queen Mary, University of London • XML retrieval and evaluation (INEX)
• 2008-2010: Microsoft Research/RAEng Research Professor at the University of Glasgow
• Quantum theory to model information retrieval
Blog: labtomarket.wordpress.com 2
Why is it important to engage users?
• In today’s wired world, users have enhanced expectations about their interactions with technology
… resulting in increased competition amongst the purveyors and designers of interactive systems. • In addition to utilitarian factors, such as usability, we must consider the hedonic and experiential factors of interacting with technology, such as fun, fulfillment, play, and user engagement.
• In order to make engaging systems, we need to understand what user engagement is and how to measure it.
3
Why is it important to measure and interpret user engagement well?
CTR
4
Outline • What is user engagement?
• What are the characteristics of user engagement?
• How to measure user engagement?
• What are the questions to ask?
saliency, interestingness, serendipity, relevance, sentiment, reading, news, social media, user generated content, automatic linking, aesthetics.
5
WHAT IS USER ENGAGEMENT?
6
http://thenextweb.com/asia/2013/05/03/kakao-talk-rolls-out-plus-friend-home-a-revamped-platform-to-connect-users-with-their-favorite-brands/
Engagement is on everyone’s mind
http://socialbarrel.com/70-percent-of-brand-engagement-on-pinterest-come-from-users/51032/
http://iactionable.com/user-engagement/
http://www.cio.com.au/article/459294/heart_foundation_uses_gamification_drive_user_engagement/
http://www.localgov.co.uk/index.cfm?method=news.detail&id=109512
http://www.trefis.com/stock/lnkd/articles/179410/linkedin-makes-a-90-million-bet-on-pulse-to-help-drive-user-engagement/2013-04-15 7
What is user engagement?
User engagement is a quality of the user experience that emphasizes the positive aspects of interaction – in particular the fact of being captivated by the technology (Attfield et al, 2011).
user feelings: happy, sad, excited, …
emotional, cognitive and behavioural connection that exists, at any point in time and over time, between a user and a technological resource
user interactions: click, read, comment, buy…
user mental states: involved, lost, concentrated…
8
Considerations in the measurement of user engagement • Short term (within session) and long term (across multiple sessions)
• Laboratory vs. field studies • Subjective vs. objective measurement • Large scale (e.g., dwell time of 100,000 people) vs. small scale (gaze patterns of 10 people)
• User engagement as process vs. as product
One is not better than the other; it depends on the aim.
9
CHARACTERISTICS OF USER ENGAGEMENT
10
Characteristics of user engagement (I)

Focused attention (Webster & Ho, 1997; O’Brien, 2008)
• Users must be focused to be engaged
• Distortions in the subjective perception of time used to measure it

Positive affect (O’Brien & Toms, 2008)
• Emotions experienced by the user are intrinsically motivating
• Initial affective “hook” can induce a desire for exploration, active discovery or participation

Aesthetics (Jacques et al, 1995; O’Brien, 2008)
• Sensory, visual appeal of the interface stimulates the user & promotes focused attention
• Linked to design principles (e.g. symmetry, balance, saliency)

Endurability (Read, MacFarlane, & Casey, 2002; O’Brien, 2008)
• People remember enjoyable, useful, engaging experiences and want to repeat them
• Reflected in e.g. the propensity of users to recommend an experience/a site/a product
11
Characteristics of user engagement (II)

Novelty (Webster & Ho, 1997; O’Brien, 2008)
• Novelty, surprise, unfamiliarity and the unexpected
• Appeal to users’ curiosity; encourage inquisitive behavior and promote repeated engagement

Richness and control (Jacques et al, 1995; Webster & Ho, 1997)
• Richness captures the growth potential of an activity
• Control captures the extent to which a person is able to achieve this growth potential

Reputation, trust and expectation (Attfield et al, 2011)
• Trust is a necessary condition for user engagement
• Implicit contract among people and entities which is more than technological

Motivation, interests, incentives, and benefits (Jacques et al., 1995; O’Brien & Toms, 2008)
• Why should users engage?
• Difficulties in setting up “laboratory” style experiments
12
MEASURING USER ENGAGEMENT
13
Measuring user engagement

Self-reported engagement
• Measures: questionnaire, interview, report, product reaction cards, think-aloud
• Characteristics: subjective; short- and long-term; lab and field; small-scale; product outcome

Cognitive engagement
• Measures: task-based methods (time spent, follow-on task); physiological measures (e.g. EEG, SCL, fMRI, eye tracking, mouse-tracking)
• Characteristics: objective; short-term; lab and field; small-scale and large-scale; process outcome

Interaction engagement
• Measures: web analytics metrics + models
• Characteristics: objective; short- and long-term; field; large-scale; process outcome
14
Large-scale measurements of user engagement – Web analytics

Intra-session measures
• Dwell time / session duration
• Play time (video)
• Click-through rate (CTR)
• Mouse movement
• Number of pages viewed (click depth)
• Conversion rate (mostly for e-commerce)
• Number of UGC items (comments)

Inter-session measures
• Fraction of return visits
• Time between visits (inter-session time, absence time)
• Total view time per month (video)
• Lifetime value (number of actions)
• Number of sessions per unit of time
• Total usage time per unit of time
• Number of friends on site (social networks)
• Number of UGC items (comments)

• Intra-session engagement measures our success in attracting the user to remain on our site for as long as possible.
• Inter-session engagement can be measured directly or, for commercial sites, by observing lifetime customer value.
15
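To make these metrics concrete, here is a minimal sketch (in Python, with a hypothetical visit-log format) of how a few of the intra- and inter-session measures above could be computed; it is illustrative only, not an actual production analytics pipeline.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical visit log: one record per page view (user, session, timestamp).
log = [
    {"user": "u1", "session": "s1", "ts": datetime(2013, 5, 1, 10, 0)},
    {"user": "u1", "session": "s1", "ts": datetime(2013, 5, 1, 10, 4)},
    {"user": "u1", "session": "s2", "ts": datetime(2013, 5, 3, 9, 30)},
    {"user": "u1", "session": "s2", "ts": datetime(2013, 5, 3, 9, 45)},
]

sessions = defaultdict(list)
for view in log:
    sessions[(view["user"], view["session"])].append(view["ts"])

# Intra-session: dwell time (session duration) and pages viewed (click depth).
for (user, session), times in sessions.items():
    dwell = (max(times) - min(times)).total_seconds()
    print(user, session, "dwell time (s):", dwell, "click depth:", len(times))

# Inter-session: absence time between consecutive sessions of the same user.
starts_by_user = defaultdict(list)
for (user, _), times in sessions.items():
    starts_by_user[user].append(min(times))
for user, starts in starts_by_user.items():
    starts.sort()
    absences = [(b - a).total_seconds() / 3600 for a, b in zip(starts, starts[1:])]
    print(user, "number of sessions:", len(starts), "absence times (h):", absences)
```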
Cognitive engagement • Eye tracking • Mouse movement • Facial expression • Psychophysiological measures: respiration, pulse rate, temperature, brain waves, skin conductance, …
16
WHAT ARE THE QUESTIONS TO ASK?

Signals – Signals – Signals: Five studies
Self-reported engagement
Interaction engagement
17
STUDY I • Domain: entertainment news • Study: saliency • Measurement: focused attention and affect
18 + Lori McCay-Peet + Vidhya Navalpakkam
• How the visual catchiness (saliency) of “relevant” information impacts user engagement metrics such as focused attention and emotion (affect) • focused attention refers to the exclusion of other things
• affect relates to the emotions experienced during the interaction
• Saliency model of visual attention developed by Itti & Koch (2000)
Self-reported engagement
19
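The study used the Itti & Koch model of visual attention. Purely as an illustration of what a computational saliency map looks like, the sketch below uses OpenCV’s spectral-residual saliency detector (a different, much simpler algorithm, not the Itti & Koch model); the file names are hypothetical.

```python
import cv2  # requires opencv-contrib-python for the saliency module

# Hypothetical screenshot of a news web page.
image = cv2.imread("webpage_screenshot.png")

# Spectral-residual saliency: a simple stand-in, NOT the Itti & Koch (2000) model.
detector = cv2.saliency.StaticSaliencySpectralResidual_create()
ok, saliency_map = detector.computeSaliency(image)
assert ok, "saliency computation failed"

# Scale to 0-255 and threshold to highlight the most salient regions,
# e.g. to check whether the target headline falls inside a salient area.
saliency_map = (saliency_map * 255).astype("uint8")
_, salient_regions = cv2.threshold(saliency_map, 0, 255,
                                   cv2.THRESH_BINARY | cv2.THRESH_OTSU)
cv2.imwrite("saliency_map.png", saliency_map)
cv2.imwrite("salient_regions.png", salient_regions)
```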
Manipulating saliency
[Figure: web page screenshot and its saliency maps under the salient and non-salient conditions]
(McCay-Peet et al, 2012) 20
Study design
• 8 tasks = finding the latest news or headline on a celebrity or entertainment topic
• Affect measured pre- and post-task using the Positive and Negative Affect Schedule (PANAS); positive items e.g. “determined”, “attentive”, negative items e.g. “hostile”, “afraid”
• Focused attention measured with the 7-item focused attention subscale, e.g. “I was so involved in my news tasks that I lost track of time”, “I blocked things out around me when I was completing the news tasks”, and with perceived time
• Interest level in topics (pre-task) and questionnaire (post-task), e.g. “I was interested in the content of the web pages”, “I wanted to find out more about the topics that I encountered on the web pages”
• 189 (90+99) participants from Amazon Mechanical Turk 21
PANAS (10 positive items and 10 negative items)
• “You feel this way right now, that is, at the present moment” [1 = very slightly or not at all; 2 = a little; 3 = moderately; 4 = quite a bit; 5 = extremely] [randomize items]
Negative items: distressed, upset, guilty, scared, hostile, irritable, ashamed, nervous, jittery, afraid
Positive items: interested, excited, strong, enthusiastic, proud, alert, inspired, determined, attentive, active
(Watson, Clark & Tellegen, 1988)
22
7-item focused attention subscale (part of the 31-item user engagement scale), 5-point scale (strongly disagree to strongly agree)
1. I lost myself in this news tasks experience
2. I was so involved in my news tasks that I lost track of time
3. I blocked things out around me when I was completing the news tasks
4. When I was performing these news tasks, I lost track of the world around me
5. The time I spent performing these news tasks just slipped away
6. I was absorbed in my news tasks
7. During the news tasks experience I let myself go
(O'Brien & Toms, 2010) 23
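A minimal sketch of how such self-report instruments are typically scored: PANAS positive and negative affect as the sum of the corresponding 5-point items (range 10–50 each), and the focused attention subscale as the mean of its seven items. The example responses are hypothetical; this is not the study’s analysis code.

```python
# Hypothetical responses on 5-point scales (1-5).
panas = {
    "interested": 4, "excited": 3, "strong": 3, "enthusiastic": 4, "proud": 2,
    "alert": 4, "inspired": 3, "determined": 4, "attentive": 5, "active": 3,
    "distressed": 1, "upset": 1, "guilty": 1, "scared": 1, "hostile": 1,
    "irritable": 2, "ashamed": 1, "nervous": 2, "jittery": 2, "afraid": 1,
}
POSITIVE = ["interested", "excited", "strong", "enthusiastic", "proud",
            "alert", "inspired", "determined", "attentive", "active"]
NEGATIVE = ["distressed", "upset", "guilty", "scared", "hostile",
            "irritable", "ashamed", "nervous", "jittery", "afraid"]

positive_affect = sum(panas[item] for item in POSITIVE)  # range 10-50
negative_affect = sum(panas[item] for item in NEGATIVE)  # range 10-50

# Focused attention subscale: mean of the 7 items
# (1 = strongly disagree ... 5 = strongly agree).
focused_attention_items = [4, 5, 3, 4, 4, 5, 3]
focused_attention = sum(focused_attention_items) / len(focused_attention_items)

print(positive_affect, negative_affect, round(focused_attention, 2))
```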
Saliency and positive affect
• When headlines are visually non-salient • users are slow at finding them, report more distraction due to web page features, and show a drop in affect
• When headlines are visually catchy or salient • users find them faster, report that it is easy to focus, and maintain positive affect
• Saliency is helpful in task performance, focusing/avoiding distraction and in maintaining positive affect
24
Saliency and focused attention • Adapted focused attention subscale from the online shopping domain to the entertainment news domain
• Users reported “easier to focus in the salient condition” BUT no significant improvement in the focused attention subscale or differences in perceived time spent on tasks
• User interest in web page content is a good predictor of focused attention, which in turn is a good predictor of positive affect
25
Self-reporting, crowdsourcing, saliency and user engagement
• Interaction of saliency, focused attention, and affect, together with user interest, is complex.
• Using crowdsourcing worked!
• What next? • include web page content as a quality of user engagement in the focused attention scale
• more “realistic” user (interactive) reading experience • other measurements: mouse-tracking, eye-tracking, facial expression analysis, etc.
(McCay-Peet, Lalmas & Navalpakkam, 2012) 26
STUDY II • Domain: news and user generated content
(comments) • Study: interestingness and sentiment • Measurement: focused attention, affect and gaze
27 + Ioannis Arapakis + Barla Cambazoglu + Mari-Carmen Marcos + Joemon Jose
Gaze and self-reporting
• News + comments • Sentiment, interest • 57 users (lab-based) • Reading task (114)
• Questionnaire (qualitative data) • Record eye tracking (quantitative data)
Three metrics: gaze, focused attention and positive affect
28 (Lin et al, 2007)
Interesting content promotes user engagement metrics
• All three metrics: • focused attention, positive affect & gaze
• What is the right trade-off? • news is news :)
• Can we predict? • provider, editor, writer, category, genre, visual aids, …, sentimentality, …
• Role of user-generated content (comments) • As measure of engagement? • To promote engagement? 29
Lots of sentiment but with negative connotations!
• Positive affect (and interest, enjoyment and wanting to know more) correlates
• Positively (↑) with sentimentality (lots of emotions)
• Negatively (↓) with positive polarity (happy news)
SentiStrength (from -5 to 5 per word); sentimentality: sum of absolute values (amount of sentiment); polarity: sum of values (direction of the sentiment: positive vs negative)
(Thelwall, Buckley & Paltoglou, 2012)
30
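Given per-word scores in [-5, 5] (SentiStrength-style), the two quantities defined above follow directly. A minimal sketch with hypothetical word scores; this is not the SentiStrength tool itself.

```python
# Hypothetical per-word sentiment scores in [-5, 5], SentiStrength-style.
word_scores = {"tragic": -4, "amazing": 4, "disaster": -3, "hope": 2}

def sentiment_features(words):
    scores = [word_scores.get(w.lower(), 0) for w in words]
    sentimentality = sum(abs(s) for s in scores)  # amount of sentiment
    polarity = sum(scores)                        # direction: positive vs negative
    return sentimentality, polarity

comment = "The tragic disaster gave way to amazing hope".split()
print(sentiment_features(comment))  # -> (13, -1): very emotional, slightly negative
```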
Effect of comments on user engagement
• 6 rankings of comments: • most replied, most popular, newest • sentimentality high, sentimentality low • polarity plus, polarity minus
• Longer gaze on • newest and most popular for interesting news • most replied and high sentimentality for non-interesting news
• Can we leverage this to prolong user attention? 31
Gaze, sentimentality, interest
• Interesting and “attractive” content! • Sentiment as a proxy of focused attention, positive affect and gaze?
• Next • Larger-scale study • Other domains (beyond news!) • Role of social signals (e.g. Facebook, Twitter) • Lots more data: mouse tracking, EEG, facial expression
(Arapakis et al., 2013) 32
STUDY III • Domain: news and social media (Wikipedia) • Study: interestingness, aesthetics, task • Measurement: focused attention, affect and
mouse movement
33 + David Warnock
Mouse tracking and self-reporting
• 324 users from Amazon Mechanical Turk (between-subject design)
• Two domains (BBC News and Wikipedia) • Two tasks (reading and search) • “Normal vs Ugly” interface
• Questionnaires (qualitative data): focused attention, positive affect, novelty, interest, usability, aesthetics + demographics, handedness & hardware
• Mouse tracking (quantitative data): movement speed, movement rate, click rate, pause length, percentage of time still
34
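A minimal sketch of how the mouse-tracking features listed above could be derived from a raw cursor event stream. The event format, threshold value and feature names are assumptions for illustration, not the study’s actual feature extractor.

```python
import math

# Hypothetical event stream: (seconds since task start, x, y, event type).
events = [(0.0, 100, 100, "move"), (0.5, 160, 180, "move"),
          (1.0, 160, 180, "move"), (3.0, 300, 200, "move"),
          (3.2, 300, 200, "click"), (5.0, 300, 200, "move")]

STILL_THRESHOLD = 2.0  # px between samples counted as "still" (assumed value)

total_time = events[-1][0] - events[0][0]
distance, still_time, pauses = 0.0, 0.0, []
for (t0, x0, y0, _), (t1, x1, y1, _) in zip(events, events[1:]):
    step = math.hypot(x1 - x0, y1 - y0)
    distance += step
    if step < STILL_THRESHOLD:          # cursor essentially did not move
        still_time += t1 - t0
        pauses.append(t1 - t0)

features = {
    "movement_speed_px_per_s": distance / total_time,
    "movement_rate_events_per_s": sum(e[3] == "move" for e in events) / total_time,
    "click_rate_per_s": sum(e[3] == "click" for e in events) / total_time,
    "mean_pause_length_s": sum(pauses) / len(pauses) if pauses else 0.0,
    "percent_time_still": 100 * still_time / total_time,
}
print(features)
```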
“Ugly” vs “Normal” Interface (BBC News)
35
Mouse tracking can tell about • Age
• Hardware • Mouse • Trackpad
• Task • Searching: There are many different types of phobia. What is Gephyrophobia a fear of?
• Reading: (Wikipedia) Archimedes, Section 1: Biography
36
Mouse tracking could not tell much about • focused attention and positive affect • user interests in the task/topic
• BUT BUT BUT BUT • “ugly” variant did not result in lower aesthetics scores • although BBC > Wikipedia
• BUT – the comments left … • Wikipedia: “The website was simply awful. Ads flashing everywhere, poor text colors on a dark blue background.”; “The webpage was entirely blue. I don't know if it was supposed to be like that, but it definitely detracted from the browsing experience.”
• BBC News: “The website's layout and color scheme were a bitch to navigate and read.”; “Comic sans is a horrible font.”
37
Mouse tracking and user engagement
• Task and hardware • Do we have a Hawthorne Effect??? • “Usability” vs engagement
• “Even uglier” interface? • Within- vs between-subject design?
• What next? • Sequence of movements • Automatic clustering
(Warnock & Lalmas, 2013)
38
STUDY IV
• Domain: news • Study: automatic linking • Measurement: interestingness
39 + Ioannis Arapakis + Hakan Ceylan + Pinar Domnez
Automatic linking & reading experience 40
Keeping users reading more articles
LEPA: Linker for Events to Past Articles. LEPA is a fully automated approach to constructing hyperlinks in news articles using “simple” text processing and understanding techniques
Indexer
• Processes articles over a time period by extracting features from each article and storing them to facilitate faster retrieval
Linker
• Identifies sentences that contain newsworthy events
• For each such event it retrieves from the index all the matching articles and links the top-ranked one with the event 41
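LEPA’s actual features, event detection and ranking are not described here. Purely as a toy illustration of the indexer/linker split, the sketch below indexes past articles with TF-IDF and links a sentence (assumed to describe a newsworthy event) to its best-matching past article above a similarity threshold; the article texts, threshold and matching method are assumptions, not LEPA’s implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# --- Indexer: process past articles and store features for fast retrieval ---
past_articles = {
    "art-1": "Volcanic ash cloud closes European airspace for a week.",
    "art-2": "Egypt protests intensify in Tahrir Square as government resigns.",
}
vectorizer = TfidfVectorizer(stop_words="english")
index = vectorizer.fit_transform(past_articles.values())
ids = list(past_articles.keys())

# --- Linker: for each (assumed newsworthy) sentence, embed the top match ---
THRESHOLD = 0.2  # assumed similarity cut-off; LEPA's criteria differ
new_sentences = ["Protesters returned to Tahrir Square in Cairo on Friday."]
for sentence in new_sentences:
    sims = cosine_similarity(vectorizer.transform([sentence]), index)[0]
    best = sims.argmax()
    if sims[best] >= THRESHOLD:
        print(f"link: '{sentence}' -> {ids[best]} (score {sims[best]:.2f})")
```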
Three-stage evaluation
Pilot study
Assessing reading experience
Assessing links
42
Pilot study • Rating results:
• Bad: 35.15% • Fair: 33.93% • Good: 20% • Excellent: 9.09% • Not Judged: 1.81%
• With 63.03% of the links rated fair or better:
• initial evidence that LEPA is not too far from the optimum achieved by human editors
Professional editors. A collection of system-embedded links (164 article-link combinations). 5-point Likert scale: (i) bad, (ii) fair, (iii) good, (iv) excellent, and (v) not judged
43
Assessing the links: are they related?
• 664 participants recruited through Amazon Mechanical Turk; between-group design (two groups)
• Precision = fraction of links (total = 164) that received, in terms of relatedness, a score equal to, or greater than, 3 on a 5-point Likert scale (a small sketch of this computation follows the table)

                            System-Embedded Links             Manually-Curated Links
                            Participant A  Participant B  All  Participant A  Participant B  All
Related to the main theme   49%            42%            45%  54%            51%            53%
Related to subtopic         21%            24%            22%  31%            34%            33%
Tangentially related        13%            15%            14%   9%            12%            10%
Unrelated                   15%            16%            16%   5%             1%             3%
Other                        2%             2%             2%   1%             2%             1%
44
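Following the precision definition above, a tiny sketch with hypothetical relatedness labels (the study has 164 of them):

```python
# Hypothetical 5-point relatedness labels, one per system-embedded link.
labels = [5, 3, 2, 4, 1, 3, 5, 2]

# Precision = fraction of links judged related (score >= 3).
precision = sum(score >= 3 for score in labels) / len(labels)
print(f"precision = {precision:.2f}")
```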
Assessing the Reading Experience • 120 participants recruited through Amazon Mechanical Turk; between-groups design (three groups)
• Editors + two opposite “extremes” of LEPA: • High recall: best at embedding newsworthy links & articles that provide interesting insights
• High precision: best in terms of embedding the right number of links
[Diagram (inductive, thematic coding of open-ended questions): qualities mentioned by participants – good topical coverage, informativeness, broader perspective, interesting insights, link presentation, content volume – contributing to a positive news reading experience]
45
Automatic linking and news reading experience
• Even under realistic and uncontrolled conditions, performance of LEPA comparable to that of editors, and in some cases better
• High precision vs. high recall • High precision threshold leads to a better news reading experience: less is more. “They were too many, being mostly quite long, in some cases more than half the length of the main article, and sometimes they repeated the same identical information”
46
STUDY V
• Domain: social media (Yahoo! Answers and Wikipedia)
• Study: serendipity • Measurement: relevance, unexpectedness,
interestingness
47 + Ilaria Bordino + Yelena Mejova
Entity-driven Exploratory Search
Linguistically Motivated Semantic Aggregation Engines: “transition to a truly semantic aggregation paradigm where machines understand a user’s intent, discover and organize facts, identify opinions, experiences and trends”
Entity Search
We build an entity-driven serendipitous search system based on entity networks extracted from Wikipedia and Yahoo! Answers
Serendipity: finding something good or useful while not specifically looking for it; serendipitous search systems provide relevant and interesting results
48
Yahoo! Answers vs Wikipedia

Yahoo! Answers: community-driven question & answer portal
• 67,336,144 questions & 261,770,047 answers
• January 1, 2010 – December 31, 2011
• English-language
• minimally curated: opinions, gossip, personal info; variety of points of view

Wikipedia: community-driven encyclopedia
• 3,795,865 articles • as of end of December 2011 • English Wikipedia
• curated, high-quality knowledge; variety of niche topics
49
Entity & Relationship Extraction
• entity – any well-defined concept that has a Wikipedia page • relationship – a topical relationship/similarity between a pair of entities based on document co-occurrence • related to the number of documents in which the two entities occur
50
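A minimal sketch of the kind of entity network described above: nodes are entities (Wikipedia concepts) and edge weights count the documents in which two entities co-occur. The entity-linking step (mapping text spans to Wikipedia pages) is assumed to have been done already; this is not the system’s extraction pipeline.

```python
from itertools import combinations
import networkx as nx

# Hypothetical output of an entity linker: entities mentioned per document.
docs_entities = [
    {"Steve Jobs", "IPhone", "Apple Inc."},
    {"Steve Jobs", "Steve Wozniak", "Apple Inc."},
    {"IPhone", "IPad"},
]

graph = nx.Graph()
for entities in docs_entities:
    for a, b in combinations(sorted(entities), 2):
        # Edge weight = number of documents in which the two entities co-occur.
        if graph.has_edge(a, b):
            graph[a][b]["weight"] += 1
        else:
            graph.add_edge(a, b, weight=1)

print(graph["Steve Jobs"]["Apple Inc."]["weight"])  # -> 2
```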
Dataset          # Nodes     # Edges       Density   # Isolated
Yahoo! Answers   896,799     112,595,138   0.00028   69,856
Wikipedia        1,754,069   237,058,218   0.00015   82,381

Dataset          Avg Degree  Max Degree    Size of Largest CC
Yahoo! Answers   251         231,921       826,402 (92.15%)
Wikipedia        270         346,070       1,671,241 (95.28%)

[Figure: visualizations of the Wikipedia and Yahoo! Answers entity networks]
51
Retrieval
                Wikipedia   Yahoo! Answers   Combined
Precision @ 5   0.668       0.724            0.744
MAP             0.716       0.762            0.782

Queries: Justin Bieber, Nicki Minaj, Katy Perry, Shakira, Eminem, Lady Gaga, Jose Mourinho, Selena Gomez, Kim Kardashian, Miley Cyrus, Robert Pattinson, Adele (singer), Steve Jobs, Osama bin Laden, Ron Paul, Twitter, Facebook, Netflix, IPad, IPhone, Touchpad, Kindle, Olympic Games, Cricket, FIFA, Tennis, Mount Everest, Eiffel Tower, Oxford Street, Nürburgring, Haiti, Chile, Libya, Egypt, Middle East, Earthquake, Oil spill, Tsunami, Subprime mortgage crisis, Bailout, Terrorism, Asperger syndrome, McDonald's, Vitamin D, Appendicitis, Cholera, Influenza, Pertussis, Vaccine, Childbirth
3 labels per query-result pair; gold standard quality control

Example top-5 results for the query “Steve Jobs”:
Yahoo! Answers: Jon Rubinstein, Timothy Cook, Kane Kramer, Steve Wozniak, Jerry York
Wikipedia: System 7, PowerPC G4, SuperDrive, Power Macintosh, Power Computing Corp.
• Annotator agreement (overlap): 0.85
• Average overlap in top 5 results: <1
52
• retrieve entities most related to a query entity using a random walk
• |relevant & unexpected| / |unexpected|: number of serendipitous results out of all of the unexpected results retrieved
• |relevant & unexpected| / |retrieved|: serendipitous out of all retrieved
53
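A sketch of both ideas on a toy graph: entities related to the query entity are retrieved with a random walk with restart (personalized PageRank via networkx, one plausible reading of “random walk”), and the two serendipity ratios above are computed from hypothetical relevance/unexpectedness judgments. None of this is the paper’s actual implementation.

```python
import networkx as nx

# Reusing a co-occurrence graph like the one sketched earlier.
graph = nx.Graph()
graph.add_weighted_edges_from([
    ("Steve Jobs", "Apple Inc.", 2), ("Steve Jobs", "Steve Wozniak", 1),
    ("Apple Inc.", "IPhone", 3), ("IPhone", "IPad", 1),
])

# Random walk with restart from the query entity (personalized PageRank).
query = "Steve Jobs"
scores = nx.pagerank(graph, alpha=0.85, personalization={query: 1.0}, weight="weight")
retrieved = [e for e, _ in sorted(scores.items(), key=lambda kv: -kv[1]) if e != query][:5]

# Serendipity metrics from (hypothetical) per-result judgments.
relevant = {"Steve Wozniak", "IPhone"}
unexpected = {"Steve Wozniak", "IPad"}       # not returned by a web-search baseline
rel_and_unexp = relevant & unexpected & set(retrieved)
print(len(rel_and_unexp) / len(unexpected))  # serendipitous / unexpected
print(len(rel_and_unexp) / len(retrieved))   # serendipitous / retrieved
```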
Baseline                                                   Data
Top: 5 entities that occur most frequently in the top 5    WP   0.63 (0.58)
search results from Bing and Google                        YA   0.69 (0.63)
Top – WP: same as above, but excluding the Wikipedia       WP   0.63 (0.58)
page from the results                                      YA   0.70 (0.64)
Rel: top 5 entities in the related query suggestions       WP   0.64 (0.61)
provided by Bing and Google                                YA   0.70 (0.65)
Rel + Top: union of Top and Rel                            WP   0.61 (0.54)
                                                           YA   0.68 (0.57)
Serendipity “making fortunate discoveries by accident” Serendipity = unexpectedness + relevance
“Expected” result baselines from web search
Interestingness ≠ Relevance
Interesting > Relevant
Relevant > Interesting
Oil Spill → Penguins in Sweaters (WP)
Robert Pattinson → Water for Elephants (WP)
Lady Gaga → Britney Spears (WP)
Egypt → Cairo Conference (WP)
Netflix → Blu-ray Disc (YA)
Egypt → Ptolemaic Kingdom (WP & YA)
54 (Bordino, Mejova & Lalmas, 2013)
Similarity (Kendall’s tau-b) between result sets and reference ranking
55
Question                                                          Data   tau-b
Which result is more relevant to the query?                       WP     0.162
                                                                  YA     0.336
If someone is interested in the query, would they also be         WP     0.162
interested in the result?                                         YA     0.312
Even if you are not interested in the query, is the result        WP     0.139
interesting to you personally?                                    YA     0.324
Would you learn anything new about the query from the results?    WP     0.167
                                                                  YA     0.307

Following (Arguello et al, 2011):
1. Labelers provide pairwise comparisons between results
2. Combine into a reference ranking
3. Compare result ranking to optimal ranking using Kendall’s tau
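Step 3 can be reproduced with scipy, whose kendalltau computes the tau-b variant (handling ties) by default; the two rankings below are hypothetical.

```python
from scipy.stats import kendalltau

# Hypothetical rankings of the same 5 results: position given by the system
# vs. the reference ranking aggregated from labelers' pairwise comparisons.
system_rank = [1, 2, 3, 4, 5]
reference_rank = [2, 1, 3, 5, 4]

tau_b, p_value = kendalltau(system_rank, reference_rank)  # tau-b by default
print(round(tau_b, 3), round(p_value, 3))
```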
Assessing “interestingness”
Multimedia search activities often driven by entertainment needs, not by information needs
Serendipity in multimedia search?
(Slaney, 2011) 56
What are the questions to ask?
• No one measurement is perfect or complete.
• All studies (process or product) have different constraints.
• Need to ensure methods are applied consistently with attention to reliability: what is a good signal?
• More emphasis should be placed on using mixed methods to improve the validity of the measures.
• Be careful of the WEIRD syndrome (Western, Educated, Industrialized, Rich, and Democratic)
57
Acknowledgements
• Collaborators: Ioannis Arapakis, Ilaria Bordino, Barla Cambazoglu, Hakan Ceylan, Pinar Domnez, Lori McCay-Peet, Yelena Mejova, Vidhya Navalpakkam, David Warnock, and others at Yahoo! Labs.
• This talk uses some material from a tutorial “Measuring User Engagement” given at WWW 2013, Rio de Janeiro (with Heather O’Brien and Elad Yom-Tov)
Blog: labtomarket.wordpress.com
58