Is One Second Enough? Evaluating QoE for Inter-Destination Multimedia Synchronization using Human Computation
Benjamin Rainer, Stefan Petscharnig, Christian Timmerer, and Hermann Hellwagner
Alpen-Adria-Universität Klagenfurt (AAU) | Faculty of Technical Sciences (TEWI) | Department of Information Technology (ITEC) | Multimedia Communication (MMC) | Sensory Experience Lab (SELab)
http://blog.timmerer.com | http://selab.itec.aau.at/ | http://dash.itec.aau.at | christian.ti[email protected]
Chief Innovation Officer (CIO) at bitmovin GmbH | http://www.bitmovin.com | christian.ti[email protected]
Slides: http://www.slideshare.net/christian.timmerer
QoMEX 2015, May 27, 2015
Outline
• Motivation
• Our Approach
• Reaction Game for Subjective Quality Assessment
• Evaluation Methodology
• Results
• Conclusions
Motivation
• Watching multimedia content online together while geographically distributed, e.g., sport events, Twitch, online quiz shows, …
• SocialTV scenario featuring real-time communication via text, voice, video
• Inter-Destination Multimedia Synchronization (IDMS) [0]: the playout of media streams at two or more geographically distributed locations in a time-synchronized manner
[Figure: User 1 and User 2 chatting while out of sync: "Goal! Did you see the goal?" / "Which goal? Thanks for the spoiler!"]
[0] M. Montagud, F. Boronat, H. Stokking, R. Brandenburg, "Interdestination multimedia synchronization: schemes, use cases and standardization," Multimedia Systems, vol. 18, pp. 459–482, 2012.
Motivation (cont'd)
• Geerts et al.: Are we in sync? [1]
  – Watching videos online together, while using voice and text chat
  – Noticeability of asynchronism and its impact on annoyance and togetherness
  – Recommendation: 1 second is enough – we don't think so!
• What is the lower threshold on asynchronism for IDMS?
  – Alternatively: above which level of asynchronism do users realize that they are not in sync?
• How to assess QoE in SocialTV scenarios?
[1] D. Geerts, et al., "Are we in sync?: synchronization requirements for watching online video together," Proc. of SIGCHI Conference on Human Factors in Computing Systems (CHI '11), pp. 311-314, 2011.
Our Approach
• We adopt a combination of
  – Games with a purpose [2]
  – Gamification [3]
  – Crowdsourcing [4]
• We design and implement a game to evaluate the impact of asynchronism on
  – Fairness
  – Togetherness
  – Annoyance
  – QoE
[2] L. von Ahn, L. Dabbish, "Labeling images with a computer game," Proceedings of the SIGCHI Conf. on Human Factors in Computing Systems (CHI'04), pp. 319-326, 2004.
[3] E. D. Mekler, F. Brühlmann, K. Opwis, A. N. Tuch, "Do points, levels and leaderboards harm intrinsic motivation?: An empirical analysis of common gamification elements," Proceedings of the First International Conference on Gameful Design, Research, and Applications (Gamification'13), pp. 66-73, 2013.
[4] T. Hossfeld, C. Keimel, M. Hirth, B. Gardlo, J. Habigt, K. Diepold, and P. Tran-Gia, "Best Practices for QoE Crowdtesting: QoE Assessment with Crowdsourcing," IEEE Transactions on Multimedia, vol. 16, no. 2, pp. 541-558, 2014.
Reaction Game for Subjective Quality Assessments
• Aligned to use case, synchronization
• Connected to video content, not a full game
• Crowdsourceable (simulated opponent)
• Game idea: collaborative reaction game
  – Players have to react to game events
  – Collaborative aspect: bonus score whenever both players click within a given time window
  – Explicit user feedback (hit, miss, bonus)
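The game idea above can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the function name `score_event`, the millisecond timestamps, and the point values (1 per hit, 1 extra for the bonus) are assumptions chosen for the example.

```python
from typing import Optional, Tuple


def score_event(
    event_start_ms: int,
    window_ms: int,
    player_click_ms: Optional[int],
    opponent_click_ms: Optional[int],
    hit_points: int = 1,      # assumed value, not from the slides
    bonus_points: int = 1,    # assumed value, not from the slides
) -> Tuple[str, int]:
    """Return (feedback, points) for one game event.

    A click counts if it falls inside [event_start, event_start + window].
    The bonus is granted only when both players click inside that window.
    """
    def in_window(click_ms: Optional[int]) -> bool:
        return (
            click_ms is not None
            and event_start_ms <= click_ms <= event_start_ms + window_ms
        )

    if not in_window(player_click_ms):
        return ("miss", 0)
    if in_window(opponent_click_ms):
        return ("bonus", hit_points + bonus_points)
    return ("hit", hit_points)
```

The returned label corresponds to the explicit feedback shown to the player (hit, miss, bonus); the opponent's click time would come from the simulated opponent.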
Game Events
Bonus Score Example
Evaluation Procedure
• Evaluation using the WESP [5] framework
• Structured in five phases
  – Explain the experiment
  – Gather demographic data
  – Get participants used to the procedure
  – Play a game round with subsequent evaluation for each test case
  – Give feedback on the evaluation process
[5] B. Rainer, M. Waltl, C. Timmerer, "A Web based Subjective Evaluation Platform," Proceedings of the 5th International Workshop on Quality of Multimedia Experience (QoMEX'13), pp. 24–25, 2013.
Crowdsourcing
• Subjective quality assessment using crowdsourcing
  – We used the Microworkers [6] crowdsourcing platform and paid 0.5 USD for each successful participation
  – Duration about 15 minutes
  – Simulated opponent
• Implicit measures
  – Number of browser focus changes
  – Number of clicks
  – Video playback length
  – Score
  – Number of pauses
  – …
• Explicit measures
  – Fairness
  – Togetherness
  – Annoyance
  – QoE
Slider with a continuous scale from 0 (very low) to 100 (very high) with initial position at 50 (medium)
[6] http://www.microworkers.com
Stimuli and Participants
• Videos: in-game footage of
  – inFAMOUS: Second Son [7]
  – Knack [8]
• Training phase
  – inFAMOUS: Second Son 0 (00:54, 3 events)
• Main evaluation using three video sequences*
  – inFAMOUS: Second Son 1 (01:46, 6 events)
  – inFAMOUS: Second Son 2 (01:58, 8 events)
  – Knack (01:50, 4 events)
• Video sequences pre-cached to avoid any bias caused by stalls
• Display of configurations in random order
Test configuration   Asynchronism [ms]   Window length [ms]   Bonus window [ms]
Training             0                   2000                 2000
Synchronous          0                   2000                 2000
Small Async          400                 2000                 1600
Medium Async         750                 2000                 1250
Big Async            1500                2000                 500
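In every row of the table, the bonus window equals the window length minus the asynchronism. The sketch below reproduces that column from the other two; the constant and function names are assumptions made for illustration, and clamping at zero is an added safeguard that none of the listed configurations actually triggers.

```python
WINDOW_LENGTH_MS = 2000  # window length used in all test configurations

# Asynchronism per test configuration, taken from the table above.
ASYNCHRONISM_MS = {
    "Training": 0,
    "Synchronous": 0,
    "Small Async": 400,
    "Medium Async": 750,
    "Big Async": 1500,
}


def bonus_window_ms(asynchronism_ms: int,
                    window_length_ms: int = WINDOW_LENGTH_MS) -> int:
    """Time span in which both players' event windows overlap."""
    # Clamp at zero (assumption) so a very large offset cannot go negative.
    return max(0, window_length_ms - asynchronism_ms)


# Bonus window for every configuration in the table.
BONUS_WINDOWS_MS = {
    name: bonus_window_ms(offset) for name, offset in ASYNCHRONISM_MS.items()
}
```

Read this way, a larger asynchronism directly shrinks the time in which a collaborative bonus is achievable.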
[7] inFAMOUS: Second Son - Sucker Punch, http://infamous-second-son.com/
[8] Knack - SCE Japan Studio, http://www.playstation.com/en-us/games/knack-ps4/
* With a resolution of 720p, 29 fps, and approx. 2 Mbit/s
Stimuli and Participants (cont'd)
• In total, 89 microworkers participated in the study
  – The campaign was restricted to Europe, Northern America, Australia and New Zealand
• We screened out 45 participants, filtering according to:
  – Browser focus change (27)
  – Total number of clicks < 1 (16)
  – Number of clicks during any event < 1 (2)
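The screening criteria can be expressed as a small filter. The sketch below is illustrative only: the `Session` record and its field names are hypothetical, and it assumes that any browser focus change disqualifies a session and that the criteria are applied in the listed order, so each rejected session is counted once.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    """Hypothetical per-participant record of implicit measures."""
    focus_changes: int         # number of browser focus changes
    total_clicks: int          # clicks over the whole session
    clicks_during_events: int  # clicks that fell inside any game event


def rejection_reason(session: Session) -> Optional[str]:
    """Return why a session is screened out, or None if it is kept."""
    if session.focus_changes > 0:       # assumed threshold: any focus change
        return "browser focus change"
    if session.total_clicks < 1:
        return "no clicks at all"
    if session.clicks_during_events < 1:
        return "no clicks during any event"
    return None


def screen(sessions):
    """Keep only the sessions that pass all three criteria."""
    return [s for s in sessions if rejection_reason(s) is None]
```

With the counts reported above (27 + 16 + 2 = 45 of 89), such a filter would leave 44 sessions for analysis.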
Results: Togetherness & Annoyance
Significant difference in means between
• 0 ms and 750 ms (t = 1.68, p-value = 0.096, alpha = 0.1)
• 400 ms and 750 ms (t = 2.08, p-value = 0.040, alpha = 0.05)
Significant difference in means between
• 400 ms and 750 ms (t = -1.31, p-value = 0.049, alpha = 0.05)
Results: Fairness & QoE
Significant difference in means between
• 400 ms and 750 ms (t = 2.51, p-value = 0.014, alpha = 0.05)
• 400 ms and 1500 ms (t = 1.93, p-value = 0.057, alpha = 0.1)
• For the pairs of test cases (0 ms, 750 ms) and (0 ms, 1500 ms), the p-value is slightly above alpha = 0.1
Significant difference in means between
• 400 ms and 750 ms (t = 1.73, p-value = 0.087, alpha = 0.1)
• 400 ms and 1500 ms (t = 2.1, p-value = 0.039, alpha = 0.05)
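Each result above is reported against the strictest level it passes, either alpha = 0.05 or alpha = 0.1. A minimal sketch of that reporting rule follows; the function name is an assumption, and the slides do not state which t-test variant produced the p-values, so only the comparison step is shown.

```python
from typing import Optional, Sequence


def significance_level(p_value: float,
                       alphas: Sequence[float] = (0.05, 0.1)) -> Optional[float]:
    """Return the strictest alpha at which p_value is significant, else None."""
    for alpha in sorted(alphas):
        if p_value < alpha:
            return alpha
    return None


# p-values as reported on this slide, keyed by the compared test cases.
reported = {
    "400 ms vs 750 ms (first list)": 0.014,
    "400 ms vs 1500 ms (first list)": 0.057,
    "400 ms vs 750 ms (second list)": 0.087,
    "400 ms vs 1500 ms (second list)": 0.039,
}
levels = {pair: significance_level(p) for pair, p in reported.items()}
```

A pair whose p-value is slightly above 0.1, as noted for (0 ms, 750 ms) and (0 ms, 1500 ms), would map to `None` under this rule.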
Results: Game Score
• Drop in score after 400 ms
• Same tendencies as in previous results
Conclusions
• Using a game to evaluate the impact of asynchronism on QoE, fairness, togetherness, and annoyance
• Our evaluation showed that there is significantly
  – lower QoE
  – lower fairness
  – lower togetherness
  – higher annoyance
  above a threshold T (400 ms ≤ T ≤ 750 ms)
• Future work
  – More precise threshold value
  – Relationship between QoE and other variables (fairness, togetherness, annoyance)
One second is clearly not enough
Thank you for your attention
... questions, comments, etc. are welcome …
Stefan Petscharnig and Priv.-Doz. Dipl.-Ing. Dr. Christian Timmerer, Associate Professor
Alpen-Adria-Universität Klagenfurt, Department of Information Technology (ITEC)
Universitätsstrasse 65-67, A-9020 Klagenfurt, AUSTRIA
christian.ti[email protected]-klu.ac.at | http://research.timmerer.com/
Tel: +43/463/2700 3621 | Fax: +43/463/2700 3699
© Copyright: Christian Timmerer and Stefan Petscharnig