
Thesis for the Master's degree in Computer Science

Speciale for cand.scient graden i datalogi

Offscreen Interaction

    Karampis Panagiotis

    Supervisor: Sebastian Boring

    Department of Computer Science, University of Copenhagen

    Universitetsparken 1, DK-2100 Copenhagen East, Denmark

    [email protected]

    April 2015

Abstract

Modern desktop paradigms are operated through a set of keyboard combinations,

mouse clicks and even mouse-pad gestures, to which users are tied and with which,

after so many years of usage, they naturally achieve a fluent interaction. Despite the vast evolution

    of the available ways used to interact with Virtual Reality, the fundamental principle

    of interaction remains always the same: usage of the concrete, well-known physical

devices (keyboard, mouse) attached to the computer. We present Offscreen Interac-

tion, a system that utilizes the spatial space around the screen, which serves as a window

    storage area while we interact with the computer screen through a pluggable, gesture-

    recognizing device. The aim is to comprehend how users react to the existence or lack

    of any form of visual feedback and whether grouping windows while positioning af-

    fects the performance when no feedback is given. In a user study, we found that the

most efficient and effective way of interaction was when visual feedback was given;

    in the case of no visual feedback, we observed that participants achieved the highest

    performance by grouping windows or applying some subjective methodology.


Acknowledgements

To my family and Lela, who endlessly supported me through all this effort...


Contents

    1 Introduction 1

    1.1 Problem Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

    1.2 Objectives - Thesis Establishment . . . . . . . . . . . . . . . . . . . . 2

    1.3 Research Questions addressed . . . . . . . . . . . . . . . . . . . . . . . 4

    1.4 Thesis Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

    2 Related Work 7

    2.1 Using secondary means . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

    2.2 Using virtual space management . . . . . . . . . . . . . . . . . . . . . 8

    2.3 Using space around displays . . . . . . . . . . . . . . . . . . . . . . . . 10

    2.4 Mid-air Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

    2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

    3 Design 13

    3.1 Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

3.2 The offscreen area . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

    3.3 Gestures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

    3.4 Feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

    3.5 Frameworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

    4 Implementation 17

    4.1 Application Description . . . . . . . . . . . . . . . . . . . . . . . . . . 17

    4.2 Development Languages & Frameworks . . . . . . . . . . . . . . . . . 18

4.3 Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

    4.4 Connectivity & Architecture . . . . . . . . . . . . . . . . . . . . . . . . 20

    4.5 Basic workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

    4.6 Gesture algorithm workflow . . . . . . . . . . . . . . . . . . . . . . . . 22

    4.7 Matching Coordinate Systems & Mouse Movement . . . . . . . . . . . 24

    4.8 Application Identification . . . . . . . . . . . . . . . . . . . . . . . . . 24

4.9 Implementation of the offscreen and in-screen area recognition . . . . . 25

    4.9.1 The orthogonal cognitive area . . . . . . . . . . . . . . . . . . . 25

    4.9.2 Implementation of boxes . . . . . . . . . . . . . . . . . . . . . . 26

    4.10 Abstract application workflow . . . . . . . . . . . . . . . . . . . . . . . 27

    5 Experiment - User study 29

    5.1 Experimental Set-up . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

5.2 Experiment's Procedure . . . . . . . . . . . . . . . . . . . . . . . . . 30

    5.2.1 Welcome - Brief explanation of interaction . . . . . . . . . . . . 30

    5.2.2 User Learn Mode . . . . . . . . . . . . . . . . . . . . . . . . . . 31

    5.2.3 Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

    5.2.4 Demographic Information . . . . . . . . . . . . . . . . . . . . . 33

    5.2.5 Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

    5.3 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

    6 Results 37

    6.1 Completion Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

    6.2 Error Rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

    6.3 Peculiar Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

    6.4 Subjective Preferences . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

    7 Discussion 43

    8 Conclusion - Future Work 45

    8.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

8.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

    A Code snippets 53

    A.1 Coordinate translation & mouse movement . . . . . . . . . . . . . . . 53

    A.2 Screen point conversion . . . . . . . . . . . . . . . . . . . . . . . . . . 53

    A.3 Inscreen area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

A.4 Offscreen area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

    A.5 Upon grab, extract windows under cursor information . . . . . . . . . 54

A.6 Shuffle windows before experiment starts . . . . . . . . . . . . . . . . . 54

    A.7 Extract the title . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

    A.8 Generate image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

    A.9 Move window (Single Feedback) . . . . . . . . . . . . . . . . . . . . . . 56

    B Demographic Information Form 57

    C Technique Assessment Form 59

List of Figures

    1.1 Open windows arrangement before switching to a new task. . . . . . . 2

    1.2 Open windows arrangement after switching to Sublime. . . . . . . . . 3

    1.3 User points or gestures on a window. Window becomes selected. . . 3

1.4 User drags and drops the window in the off-screen area. Foreground

    is occupied by another window. . . . . . . . . . . . . . . . . . . . . . . 4

1.5 Window that has been dragged into the offscreen area is now available to be

    retrieved by reversing the process. . . . . . . . . . . . . . . . . . . . . 4

3.1 Offscreen area illustrated with dimensions. . . . . . . . . . . . . . . . . 14

    3.2 Grab gesture sequence. Release sequence is the reversed grab. . . . . . 15

3.3 Chrome application's image on box (0,1), current hand position in box

    (0,2) with red frame . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.4 Chrome application's image on box (0,1) and hand position at (0,1) . 16

4.1 Leap Motion's size and hand skeletal tracking during operation. . . . 19

    4.2 Leap Motion Controller structure. [56] . . . . . . . . . . . . . . . . . . 19

    4.3 Leap Motion Interaction box: A reversed pyramid shape formulated

    by cameras and LEDS. [2] . . . . . . . . . . . . . . . . . . . . . . . . . 20

    4.4 Leap Motion Controller architecture. [2] . . . . . . . . . . . . . . . . . 21

4.5 Offscreen Interaction - Abstract workflow of Basic Modules. . . . . . . 22

    4.6 Gesture algorithm workflow: From initial state to grab state. . . . . . 23

    4.7 Gesture algorithm workflow: From grab state to release state. . . . . . 23

    4.8 Interaction box diagram. [2] . . . . . . . . . . . . . . . . . . . . . . . . 26

    4.9 Box class diagram. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

5.1 Set-up for left-handed mouse users. . . . . . . . . . . . . . . . . . . . . 29

5.2 Set-up for right-handed mouse users. . . . . . . . . . . . . . . . . . . . 29

    5.3 Notification with remaining windows number and title. . . . . . . . . . 32

5.4 Notification showing next window to be fetched from offscreen. . . . 32

5.5 Windows after shuffling. . . . . . . . . . . . . . . . . . . . . . . . . . . 32

    5.6 Selected windows for the experiment . . . . . . . . . . . . . . . . . . . 33

    6.1 Participant 1 fully matched FF, SF, NF. . . . . . . . . . . . . . . . . . 39

6.2 Participant 1's grouping. . . . . . . . . . . . . . . . . . . . . . . . . . . 40

6.3 Participant 2's potential grouping. . . . . . . . . . . . . . . . . . . . . 41

    6.4 Mean Values assessed by participants for each feedback mode. . . . . . 42

List of Tables

    5.1 Feedback cycling for the first 3 participants. Afterwards, it repeats itself. 31

    5.2 Sample Balanced Latin Square algorithm for windows . . . . . . . . . 31

    5.3 Demographic information . . . . . . . . . . . . . . . . . . . . . . . . . 33

    6.1 Abbreviations for the feedback modes . . . . . . . . . . . . . . . . . . 37

6.2 Mean time differences between modes . . . . . . . . . . . . . . . . . . 38

6.3 Mean error differences between modes . . . . . . . . . . . . . . . . . . 39

Chapter 1

    Introduction

    Window switching is one of the most frequent tasks a user performs and can occur

    several hundred times per day. Numerous window operations are performed when we

    work and run multiple windows, such as moving, resizing and switching. Management

of such activity would help enhance users' computer experience and effectiveness.

Window switching can evolve into a really complicated task, since for example a

developer may need to switch to the browser and look for documentation, switch back

to the IDE to write code, switch to the terminal to test and push the code, check for emails

and update completed tasks. Window switching is unavoidable; even on larger screens

    users tend to consume all available screen space and create even more windows to

navigate to [23]. Generally, users rely on the Operating System's window manager to

    provide a convenient way to manage open windows according to their needs so that

    they are easy to retrieve. Switching windows is divided into two subtasks: find the

    desired window and then bring it to the foreground. Widely accepted techniques to

achieve both subtasks are performed either directly with the mouse or with keyboard

key combinations.

    1.1 Problem Description

    Interaction with objects scattered across the screens using the mouse can be trouble-

some as the mouse's movement is limited to the area of the desk. In addition, it gets even

more problematic when drag and drop is involved, where one mis-click restarts the

process. Furthermore, the mouse's cursor is limited to the screen, thus any interactions

are restricted within this plane. 3D interaction with the screen has been researched and

    several input devices have been developed that enable users to manipulate virtual

    reality (VR), for example virtual hand and depth cursor techniques. Such techniques

- along with corresponding input devices - have been considered inadequate, appro-

priate to use only under specific circumstances. Additionally, the hardware used is

far too expensive, such as the systems used to track bodies in the cinema industry [53],

    [42].

Several Window Managers (WM) have been implemented to visualize this process.

    Windows 7 presents each window on its own frame even if they are windows of the

same application. Mac's Mission Control tiles open windows so that they are visible

    at once and simultaneously stacks windows of the same application, for all applications

that belong to the same desktop or workspace. Gnome 3 assigns each window its

own frame but presents windows of the same application stacked. The way to

    navigate between open windows varies between Window Managers. The combination

of Alt+Tab (Cmd+Tab) is used across all systems for consistency reasons, although cycling

between windows of the same application varies from one Window Manager to another.

However, Robertson et al. [47] showed that stacking windows under the same application

confuses many users, because the application's windows may not be related to the

    same task, where a task is defined as a collection of applications organized around a

    particular activity.

    1.2 Objectives - Thesis Establishment

We propose a different technique for switching between tasks and involve human-

computer interaction (HCI) in the function of window switching. As there are many

HCI patterns including body posture, hand/finger gestures, speech recognition, eye

    movement and so on, we chose to use hand gestures as they are the most trivial and

    easy for users to get familiar with.

Using Mac OS X's Mission Control to switch between applications can be cumbersome.

    For example, windows are re-arranged every time another window gets focus or a new

one is created. In the following scenario, the user is writing a document, therefore

the application Pages is selected; then the user wants to switch to Sublime and activates

    the Mission Control by pressing F3 and selects Sublime from the upper left corner.

Figure 1.1 shows the windows' initial arrangement.

    Figure 1.1: Open windows arrangement before switching to a new task.

At that point, the user wants to switch back to Pages and presses F3. Figure 1.2

now shows a completely different screen, where the user has to gaze and find where

Pages is located so that they can select it.

    Figure 1.2: Open windows arrangement after switching to Sublime.

From that point on, switching between Pages and Sublime does not trigger

re-arrangement, unless a third task interleaves. Then again, the user will have to find the

    correct window in a rearranged list.

We also experienced this behavior by using the common paradigm of Cmd+Tab: after

switching between tasks using the keyboard, Mission Control has rearranged the windows.

A preliminary experiment conducted with users both familiar and unfamiliar with Mac

OS X showed that users were frustrated by the mechanic, were unable to understand

the pattern, and stated that they expected the pattern of window order to be clear at

first sight rather than having to spend time investigating how windows are ar-

ranged. All users reacted positively to arranging windows in a linear form such as in

    rows or columns.

The contribution of this research is to design a new human-computer interaction

utilizing gestures in the offscreen spatial space. This would allow a more effective

    task-switching paradigm by placing and retrieving desired windows in the spatial

    space around a computer screen. This spatial space is defined by the cone area where

    the LEAP Motion operates in. Figures 1.3, 1.4, 1.5 demonstrate this functionality.

    Figure 1.3: User points or gestures on a window. Window becomes selected.

Figure 1.4: User drags and drops the window in the off-screen area. Foreground is

    occupied by another window.

Figure 1.5: A window that has been dragged into the offscreen area is now available to be

    retrieved by reversing the process.

    1.3 Research Questions addressed

Here, we present the scientific research questions addressed by Offscreen Interaction

and, in terms of goals, we want to explore the impact and the efficiency of such a 2.5D¹

pointing system through different visual feedback modes, bringing users one step

closer to freeing themselves from operating solely with the mouse. Therefore, the main

    questions that this thesis will try to answer are:

Q1: How important is the role of visual feedback in VR systems? To which extent does it affect the performance?

    Feedback and HCI techniques are inextricably connected. For instance, we

    get feedback when we press Alt+Tab to switch windows through a dialog in

    the screen or when we move the mouse, we can see the mouse cursor in the

    screen (full feedback). What would be our experience in HCI if there was no

dialog when pressing Alt+Tab (no feedback), or if, when we hover over menus, a

change of color (the typical behavior) would only appear infrequently and occasionally

    (partial feedback)? We need to investigate this behavior in our system and

examine how feedback influences performance in the offscreen area.

Q2: Do users interact randomly with a non-visible area or develop techniques?

¹With the term 2.5D we mean that although we interact with a typical 2D computer screen, the

actual pointing is happening in a 3D space, thus the term 2.5D.

Another factor that would vastly improve the performance and the effectiveness

    is grouping according to subjective patterns or following some specific subjective

methodology when positioning windows in the offscreen area. As we explain

in experiment 5.2.3, we would like to see the existence (or not) of such

formations, which we consider would give us directions for future research.

    1.4 Thesis Structure

The structure and the content of the thesis are briefly described below. This work

consists of 8 chapters, including this one, which are structured as follows:

Chapter 2: Publications related to several forms of interaction (secondary means 2.1, virtual space management 2.2, using space around displays 2.3, mid-

air interaction 2.4) are presented. We quote papers that are highly relevant to

our work and others that are in a similar field, as this study merges a variety of

considerations. In this chapter we aim to help the reader comprehend the current

    limitations and current approaches in the area as well as introduce the reader

    to this field.

Chapter 3: We present the design of our implementation and analyze the considerations behind Offscreen Interaction while backing the choices

made with scientific facts. The major design choices (offscreen design, gesture

    choice and feedback modes), which are listed and explained, will help the reader

    to understand better our work and act as a preliminary but required step before

    reading the implementation.

Chapter 4: We present the implementation of the prototype application built to test Offscreen Interaction. We provide thorough details on the way it

    works, architecture and workflows as well as on the way the prototype was

    built. Small code snippets can be found in this section while the functions that

    we consider as the most important can be found in the appendix A.

Chapter 5: All the information regarding the User Study we performed is included. We present and explain the phases of the experiment along with its

    modes for each participant that took part in the study. Statistical demographic

    results are also shown in the end of the chapter.

Chapter 6: We present the results of the User Study, separated into corresponding categories. Each category represents a specific metric we tested. We also

    provide a form of visualization through graphs and state our observations after

    investigating the log files containing data regarding positioning and order of

    windows during the experiment.

Chapter 7: First we set our hypotheses on the results. Secondly, we discuss the results based on the findings of the ANOVA analysis performed previously and

on the participants' assessment. Finally, we comment on whether the overall

    concept and the hypotheses assumed hold.

Chapter 8: In this chapter, the final conclusions are presented along with identified and proposed research areas for future work.

Chapter 2

    Related Work

    The work described in this thesis aims to develop a prototype model for interacting

with computer displays using an external device that bridges humans' 3D environment

with the desktop metaphor of the computer's screen for performing task-switching

    operations.

    This chapter provides an overview of existing techniques, technologies and imple-

mented frameworks. Although there is no clear distinction between the several themes

    mentioned below, we categorized existing work into four themes: Using secondary

    means, space around displays, virtual space management and mid-air interaction.

    2.1 Using secondary means

    Interacting with a display by using secondary means indicates that the user can

manipulate and interact with what is shown on a display through another interface.

Such an interface can be a mobile's, PDA's or touchpad's screen, or a tabletop, by using

    gestures with the dominant or both hands.

A secondary display, such as the PDA's display, allows a user to remotely switch

    between top-level tasks and scroll in context using the non-dominant hand while

dominant hand operates the mouse. [39] While the system provides such advantages,

it is imperative to indicate that switching tasks through a PDA's screen can be cum-

bersome and prone to errors of not selecting the desired task due to the small resolution

    size. Although current technology enhances PDAs with higher resolution, fast switch-

ing between tasks is limited by having to focus on the PDA's screen, identify the correct

task from the list, select it and focus back on the computer's screen.

Numerous researchers have explored the manipulation of objects within the computer design

and animation area. Multi-point touch tablets have existed for a long time [34],

    yet the Multi-finger Cursor Techniques[38] -implemented on a modified touchpad- and

Visual Touchpad[36], which projects hand gestures onto a computer's screen, allow

a degree of hand freedom and interaction with screen objects.

    Boring et al. implemented a system that allows users to manipulate devices that are

    incapable of touch interaction. By capturing video using a smartphone, along with

techniques for targeting tasks such as Auto Zoom and Freeze, they were able to precisely

interact with an object at a distance. [11]

When interacting with secondary means, areas close to the keyboard and mouse are most

appropriate for one-handed interactions, as shown by Bi et al. [10] The Magic Desk

    integrates desktop environment with multi-touch computing and provides a set of

interaction techniques allowing the computer screen to be projected onto the desk. The

    DigitalDesk[58] is built as an ordinary desk, but with additional functionality that

emphasizes physical paper interactions. The desk uses a reversed approach compared to the one

    commonly used: Instead of representing physical objects as digitalized objects, it uses

    the computer power to enhance the functionality of a physical object. With video-

based finger tracking and the usage of two cameras to scan documents and project data

    on the desk, the DigitalDesk provides digital properties to physical papers.

    Malik et al. [37] and Keijser et al. [31] use multi-hand gestures to enable the user to

    control an object and the workspace simultaneously, thus allowing the user to bridge

the distance between objects and offer the user a wider range of gestures by allowing

    the two hands to work together. However, this has been accomplished through ad-

    ditional hardware and vision-based systems. Another approach to manipulating 3D

    objects is the Sticky tools[19] which achieves all 6DOF without additional hardware,

    allowing users to manipulate an object using one, two or three fingers by using shallow

    depth[18] technique.

Part of our goal, and therefore our differentiation from the aforementioned systems, is

a) to provide a system that supports hand - not just a limited number of fingers -

interaction, b) to avoid using external cameras, which are subject to occlusion, and c) to avoid

stationary systems or limiting our interaction experience by using secondary screens like

    mobiles or PDAs.

    2.2 Using virtual space management

    Virtual space management refers to the concept of enhancing user experience (UX)

    by applying algorithms and operations in existing or newly opened windows in the

desktop. Several studies have been conducted on optimizing the way windows

resize, categorizing windows according to user needs or even providing their own desktop

    metaphor.

The pioneer in that area, Rooms[21], is a system in which similar tasks are assigned in

    a separate virtual space (room) whilst sharing of windows between similar tasks is

    supported. The Task Gallery: A 3D window manager[48], a successor of Rooms,

    provides a 3D virtual environment for window and task management where a 3D

stage shows the current task while other tasks are placed at the edges of the stage,

namely on the floor, ceiling and walls. Each task comprises its related windows.

    To allow users to have an overview of open tasks, a navigational system is introduced

    where users can go back or go forward and thus see more tasks or focus more on one.

    While the system provides advantages and animation techniques, there are also dis-

advantages with respect to aiding users who desire to complete many different tasks

    simultaneously in a small time span. If for example a user wants to switch between

    two or more tasks, they will have to move backwards until they find the required task,

    select it, find the desired window from the loose stack and operate on it. To switch

to the previous task, the user will have to reverse the whole process. We can see that the

complexity of such navigation increases according to the number of tasks the user needs

    to perform. Furthermore the time required to switch between tasks is increased by

    the one-second animations between moving forward/backward and opening/closing

    tasks.

    Elastic Windows[29] and New Operations for Display Space Management and Win-

    dow Management[25] endorse the need for new and advanced windowing operations

    for users with many tasks. The Elastic Windows implementation is based on hierar-

chically managing and displaying windows, with the root window being the desktop. The

system supports inheritance, where some characteristics of the parent window can be

inherited by its children. The philosophy of the system is that windows under the same

parent cannot overlap but consume all the available space. Some side-effects of

    this philosophy are that when one window resizes, all of its children and windows of

the same parent are also resized, leading to a whole desktop re-arrangement with only one

resize. Another side-effect, directly arising from the no-overlapping rule, is that as

the number of child windows increases, the effectiveness of displayed information de-

    creases. Similarly, QuickSpace[24] automatically moves windows so that they will not

    be overlapped by a resizing window. All techniques rely on the existence of empty

    space, which may not often be available even on multiple monitor systems as shown

    by Hutchings et al.[23]

Scwm[8] is a parametrized window manager for the X11 windowing system and is

    based on defining constraints between windows. These constraints are presented as

    relationships between windows and are user defined. While this system provides some

    advantages such as operating on two related windows as one, it is susceptible to multi-

tasking requirements: as the number of windows increases, so does the number and

complexity of relationships that have to be defined and maintained. Yamanaka et

al.[60] approach virtual space management in a different way. Instead of creating

algorithms to adjust window space upon creation of a new window or a resize effect, the

    Switchback Cursor exploits the z-index axis of overlapped windows on a Windows 7

operating system. The mouse cursor, upon a specific movement in conjunction with a specific

    keyboard press, traverses and selects windows that are below the visible one.

One approach to address the fact that users work on different tasks in parallel and switch

back and forth between different windows is to analyze user activities and assign

windows to tasks automatically based on whether windows overlap or not[59]. Another

approach addresses the user's fast switching by analyzing the window content, relocation,

transparency and content combined [54], [35].

    The majority of techniques apply algorithms to manipulate currently open windows

when a new one is created. We would like to extend Rooms and the Task Gallery

according to our needs, leaving the underlying resizing technique as is: managed by

the operating system. Thus, instead of using implemented semantics such as move

back, move forward, go to left room, we define our own semantics which indicate where a

    window is virtually placed.

    2.3 Using space around displays

    The theme, space around displays, refers to a concept where the physical (empty)

    space close to the interaction target, such as a computer or mobile screen, is used in

    conjunction with external sensors like depth cameras and hand gesture recognizing

devices to provide an interaction channel between the human and the computer.

The Unadorned Desk[20], which is our inspiration, is an interaction system that uti-

lizes the free space on a regular desk enhanced with a sensor. It virtually places the

off-screen and secondary workspace onto the desk, providing more screen space for

    the primary workspace and thus acting as an extra input canvas. The experiments

conducted with the Unadorned Desk showed that interaction with virtual items on

the desk is possible when items are of reasonable size and number, with or without

    on-screen feedback.

The usage of a Kinect depth camera mounted on top of the desk limits the mobility

of the system and requires that the user is pinned to a specific place at the desk. As

the Kinect camera is by nature prone to false detections under sunlight, the camera has

    to be placed attentively. Although the system works well when few items are virtually

placed in the extra input canvas, the desk's surface has to be clear of physical objects,

which is not always the case as desks tend to be messy.

    Virtual Shelves[35] and Chen et al.[12] combine touch and orientation sensors to

    create a virtual spatial space around the user that allows invocation of commands,

menu selection and mobile application interaction in an eyes-free manner. Wang et

    al. [55] demonstrated the benefits of using hand tracking for manipulation of virtual

reality and 3D data in a CAD environment using two webcams to track 6DOF for both

    hands. However, such tasks are restricted to controlled, small areas targeting specific

frameworks (CAD). Usage of the Kinect camera is widespread when it comes to augmented

and virtual reality. MirageTable[9], HoloDesk[22], FaceShift[57] and KinectFusion[26] form

a set of applications supporting real-time, physics-based interactions; however, as noted

by Spindler et al.[51], tracking with depth cameras still has limited precision and

reliability. SpaceTop[33] also uses a Kinect depth camera, together with a see-through

    display. Although it allows 3D spatial input, a 2D touchscreen and keyboard are

also available for input. The unclear visual representation for guidance, as noted by the

authors, is a subject that we need to take seriously. Physical space requires extra

    sensors, extra cameras and most importantly to be free of several objects. Our goal

is to move the interaction area from solid objects such as a desk into a virtual area

    around the screen and thus eliminate usage of additional sensors and cameras which

    make the system less portable.

    2.4 Mid-air Interaction

There are occasions where interaction with displays has to be done from a greater

distance than standing in front of the screen. By mid-air interaction we mean that

the communication channel between user and display is the air/empty space, and it is done

through the usage of laser pointers, Wiimotes, virtual screens or even worn gloves.

Uses of mid-air interaction vary from point-and-click to manipulation and selection of

3D data. The latter was implemented by Gallo et al.[14] using a Wiimote controller

to manipulate and select 3D data in a medical environment. Even if the system was

not evaluated, it was able to differentiate between two states: pointing and

    manipulation. In the pointing state, the Wiimote acts as a point and click laser

pointer, whilst in the manipulation state, the Wiimote interacts with a 3D object with

    available actions of rotating, translating and scaling.

In [28], the authors utilize both the Kinect depth camera and skeletal tracking[30] to

    identify pointing gestures made by a standing person in front of the Kinect camera.

The spatial area in front of the camera is considered as a virtual touch screen where

    the user can point. To detect the direction of the pointing gesture, they detect and

    track the pointing finger using a minimum bounding rectangle and Kalman filtering.

Interaction techniques using lasers have the advantage of low cost and constitute the

best-known perspective-based technique. A laser beam can act as a pointing control

in a multi-display system. At the same time, laser beams suffer from the limitation

that there are no buttons to augment the single point of tracked input, rendering

mouse operations impossible. Additionally, laser pointer techniques suffer from poor

    accuracy and stability [41], and can be very tiring for sustained interactions. Olsen

et al.[41] proposed a non-direct input system and explored the use of dwell time to

    perform mouse operations. However, the installation cost and complexity of most

systems are prohibitive when increasing scalability.

Kim et al.[32] tried to approach these limitations by researching ways to embed sen-

sors in the body, specifically a wrist-worn device which recognizes body movements, thus

reducing the need for external sensors. The wrist-worn device consists of a depth cam-

era combined with an IMU, which precisely recognizes finger movements through the

    usage of biomechanics.

Laser interaction generally doesn't allow recognition of gestures and, although Jing et

al. [28] implemented a point-and-click system, that system is stationary. The Wiimote

suffers from the inability to identify finger gestures, whilst worn sensors and gloves

require that the user attaches an external device to the skin, which might be uncomfortable.

We propose a system that is mobile, with no skin-attached devices, which identifies

hand gestures for both hands and by extension provides the required functionality of

    augmented buttons if that is required.

    2.5 Summary

We have seen that there is a variety of ways - among them air, third interfaces and

cameras - to interact with a computer screen, using a huge range of methods

and tools such as worn devices, finger biomechanics, laser pointers, mobiles, game

    controllers and so on. Each aforementioned theme has situational advantages and

    drawbacks and comes with implementations that provide an alternative user experi-

    ence and interaction. The interaction with the computer screen comes at a cost of

    introducing either rather complex, non-mobile interaction systems or ways of window

management that rely on algorithms and not on users' desires.

    We thus propose a work that combines features from the virtual space management

and mid-air interaction by keeping the interaction medium (mid-air) as simple as possi-

    ble. At the same time we provide a trivial window managing metaphor of select and

    then show or hide, giving the user the ability to resize and position windows the way

    they want.

Chapter 3

    Design

In this chapter we describe the considerations to follow in a Virtual Reality interac-

tion according to our needs, the offscreen area, gestures, feedback modes and the choice

of frameworks that support the interaction.

    3.1 Considerations

Selection on screen: The initial event which enables the interaction by identifying the collision of a virtual object, any - partially - visible window, with the

    mouse cursor upon a grab gesture.

Selection off screen: The initial event which enables the interaction by identifying the hand coordinates in the offscreen area upon a grab gesture.

Drop in offscreen: The second-stage event that, upon a release gesture in the off-screen area, removes focus from the selected window (hide).

Drop in screen: The second-stage event that, upon a release gesture in the main screen area, gives focus to the selected window (show).

Select off screen & drop off screen: A two-stage event that cognitively moves an already hidden window to another box in the offscreen area, allowing

    organisation of windows.

    Worth mentioning is that since our system is considered as a 2.5D pointing system

    with no need to implement interaction in the Z axis, we employ manipulation of

    virtual objects with four degrees of freedom (4DOF), namely up / down, left / right.

3.2 The offscreen area

    The interaction box of the physical input device provides us enough space to extend

    cognitively the screen area on the top side; space that serves for cognitively saving

    data for assigned windows. This area has as much width as the screen width and is

scalable¹ to the screen proportions.

    Smith et al. [50] observed that the average number of open windows a user keeps per

session is 4 on a single display, while dual-monitor users keep 12 open windows.

Based on such observations, we chose to divide the offscreen area into 8 boxes, thus

allowing the state of 8 windows to be saved.

Figure 3.1: Offscreen area illustrated with dimensions.

The fact that human hand stability deteriorates with age, fatigue, caffeine and other

factors [3] indicates that the offscreen area should be designed as large as possible,

with boxes large enough to compensate for the user's unstable hand.

We thus cognitively divided the offscreen area into two rows, 4 boxes per row, giving

more freedom for hand movement without risking choosing the wrong box. Each

box's width is calculated by the formula screenWidth/4 and each box is assigned a

    number between 0-3, which indicates the position in the X axis. Since we have two

    rows we followed a Cartesian system with positions (X,Y). The X axis has domain

values 0-3 whilst the Y axis 0 and 1. Figure 3.1 illustrates this concept in detail.

    Although the interaction box is wide enough to operate outside of the screen width,

    we decided to keep the interactions strictly within the screen width and thus virtually

extend the screen only on the Y axis. The intuition behind this choice is that

we didn't want to increase the difficulty of cognitively dividing the offscreen area,

    especially when no visual feedback was provided in the experiment.

    The interaction with the application starts when the user places their left or right

    hand inside the interaction box. At that point, the user has complete control of the

    mouse cursor movement.

    3.3 Gestures

The gestures implemented are classified as concrete. They are evaluated after they have

been completely performed, e.g. Selection on screen is only valid when the hand, from

¹Up to a limit

having extended fingers (release), has become a fist (lack of extended fingers).

Research on previous works [43], [44] has shown that gestures applied on physical

    input devices are preferred to match gestures that users would normally apply on

    physical objects. Such natural gestures allow the manipulation of virtual objects and

    are tied with users knowledge and skills of the physical world.

Hand gestures are used to interact in the 3D space rather than pointing. The latter is

    commonly used in 2D view systems because of its simplicity and stable performance.

Several user-defined gestures such as pointing, pinch, grab, 2-hand stretch have been

    classified according to user experience [44]. Out of the two possible candidates, pinch

and grab, which imply natural and realistic interaction, we have selected the latter one

    because grab is closer to the natural interaction we target and secondly because pinch

is mostly used for other interactions like rotation and shrink / stretch in different axes

    [27].

    For our purpose, and based on the commonly used gestures as presented above, the

    grab is a crucial gesture to implement interactions within the Virtual Reality space.

    The prototype then defines two gestures: Grab and Release both applicable to the

    oscreen and to the in-screen area. These gestures must be performed sequentially to

    complete an action due to the fact these gestures imply that hand transits from one

    state to another.

    Figure 3.2: Grab gesture sequence. Release sequence is the reversed grab.

Furthermore, the mouse cursor can only be over one window, thus only one window

can be identified per grab. In addition, the interaction is performed by only one

hand, either the dominant or the secondary, according to user preference.

3.4 Feedback

We have designed the application to operate in three different modes: Full feedback,

    single feedback and no feedback.

The full feedback mode, as shown in Figure 3.3, provides a window with information

about all boxes, which is shown when the hand is in the offscreen area. The selected box

is visually identified by a red border and, in case a window is cognitively saved, a

    screen shot taken at the grab event is shown to help the user identify the window

    when required.

The single feedback mode (Figure 3.4) provides a window with information about the

specific box which is associated with the current hand position in the offscreen area.

Apart from the screenshot shown, the user gets extra visual help by having the

application's title, e.g. Chrome, as the window title.

    The no feedback mode provides no informational window nor any other information

when the hand is in the offscreen area.

Figure 3.3: Chrome application's

    image on box (0,1), current hand

    position in box (0,2) with red frame

Figure 3.4: Chrome application's

    image on box (0,1) and hand position

    at (0,1)

    3.5 Frameworks

The nature of this research is such that it forces us to use low-level programming lan-

    guages, frameworks and libraries as close as possible to the operating system while

using up-to-date programming paradigms (object-oriented programming). For this

reason, languages that would require a wrapper to access native calls, such as Java, or

    languages mainly targeting web development such as JavaScript have been excluded

    from consideration even if the physical input device supports them.

Chapter 4

    Implementation

Offscreen Interaction is an application that operates on a standard Mac with any

screen size which runs OS X version 10.7 or higher. Offscreen Interaction cannot be

ported to Windows or Linux based systems because a) OS X native libraries are

used and b) it targets being a replacement for Mission Control.

The implementation has been developed in the Objective-C language using Xcode. Xcode

was developed by Apple for both OS X and iOS, and contains on-the-fly documen-

tation for all libraries alongside other utilities that serve to enhance coding

efficiency and experience, such as a debugger, tester, profiler, analysis tool etc. In-

troduced on June 2, 2014, Swift is a new programming language for coding Apple software.

Xcode also supports developing AppleScript, a scripting language built into the Mac

operating system since System 7.

    Important code snippets that were crucial for the implementation can be found in the

    Appendix A.

    4.1 Application Description

Offscreen Interaction is an application which has been built to test the capabilities

and the limits of interacting with desktop applications. It aims to provide an area

above the screen plane which the user can interact with, in order to organize and switch

    between windows. The interaction with the user is performed through hand gestures

received by a motion tracking input device. Although Offscreen Interaction is not a

    complete application, it embeds all the required infrastructure needed to support the

    new interaction we are proposing. The basic scenarios are the following:

When the user grabs a window and releases it in the offscreen area, we should be able to identify which window this is, save its information in memory and hide it.

When the user grabs a window from the offscreen area and releases it in screen, we should be able to identify which window it was by checking memory data and then show it.

When the user releases a window in the offscreen area, if the box is occupied then overwrite this box with the new window's data and pop out the previously stored window.

When the user grabs a window from the offscreen area and releases it in the offscreen area, if the box is occupied then swap the saved memory data, otherwise move the window to the new box (a sketch of this dispatch follows below).
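To make the dispatch between these scenarios concrete, the following minimal sketch illustrates one possible release handler. It assumes hypothetical names (offscreenBoxes, grabbedInfo, showWindow, hideWindow) and is an illustration of the logic above, not the thesis implementation:

#import <Foundation/Foundation.h>

// offscreenBoxes maps a box key (e.g. @"1,0") to the saved window data (title, PID,
// screenshot). grabbedInfo / fromKey describe what the preceding grab picked up;
// fromKey is non-nil when the grab happened offscreen, toKey when the release did.
static void handleRelease(NSMutableDictionary *offscreenBoxes,
                          NSDictionary *grabbedInfo,
                          NSString *fromKey,
                          NSString *toKey,
                          void (^showWindow)(NSDictionary *),
                          void (^hideWindow)(NSDictionary *)) {
    if (grabbedInfo == nil) return;                    // the grab selected nothing

    if (toKey != nil && fromKey == nil) {              // screen -> offscreen: hide
        NSDictionary *occupant = offscreenBoxes[toKey];
        if (occupant) showWindow(occupant);            // occupied box pops its window out
        offscreenBoxes[toKey] = grabbedInfo;
        hideWindow(grabbedInfo);
    } else if (toKey == nil && fromKey != nil) {       // offscreen -> screen: show
        showWindow(grabbedInfo);
        [offscreenBoxes removeObjectForKey:fromKey];
    } else if (toKey != nil && fromKey != nil) {       // offscreen -> offscreen: move or swap
        NSDictionary *occupant = offscreenBoxes[toKey];
        if (occupant) offscreenBoxes[fromKey] = occupant;
        else [offscreenBoxes removeObjectForKey:fromKey];
        offscreenBoxes[toKey] = grabbedInfo;
    }
}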

Despite the fact that this may sound trivial at first glance, Apple does not provide one univer-

sal API for interacting with, manipulating and identifying windows. As a result, several

workarounds were implemented in order to overcome the limitations that were found.

Specifically, between the Accessibility API and Cocoa, the limitation was that the

    Accessibility hierarchy is independent and separate from the window hierarchy.

    4.2 Development Languages & Frameworks

Objective-C [40]. Primary programming language for developing OS X applications. Objective-C, as the name states, is an object-oriented programming (OOP)

language and a superset of C.

Cocoa [13]. High-level API that combines three other frameworks: Foundation, AppKit and CoreData, included via the header file Cocoa.h, and automates many

application features to comply with Apple's human interface guidelines. [50]

Quartz [46]. Provides access to lower-level core graphics services, composed of the Quartz 2D API and the Quartz Extreme windowing environment.

AppleScript [7]. Scripting and narrative language for the automation of repetitive tasks.

Accessibility API [5]. Extra libraries targeting assistance for users with disabilities.

Leap SDK ver 2.2.2 [49]. Provides access to the physical input device chosen for this application (LEAP Motion) through Objective-C calls.

Carbon (Legacy) [6]. Legacy API, acting as a bridge between Cocoa and the Accessibility API.

    4.3 Hardware

    For identifying gestures and bridging the real with virtual space, the Leap Motion

Controller is used. The controller is a small peripheral, 3 x 1.2 x 0.5 inches, not much

    heavier than a common USB stick. It utilizes two cameras and three infra-red LEDs

serving as light sources to capture motion and gesture information. The system is

    capable of tracking movement of hands, fingers or several other objects within an area

    of 60cm around the device in real time. It can detect small motions and has accuracy

    of 0.01mm [15].

The small cameras that the Leap Motion Controller comes with cannot extract as much

    information as other systems that come with large cameras. Because embedded al-

    gorithms extract only the data required, the computational analysis of images is less

    considerable and therefore the latency introduced by the Leap Motion Controller is

very small and negligible. The fact that the controller is small and mostly software-

based makes it suitable for embedding in other, more complicated VR systems.

Figure 4.1: Leap Motion's size and hand skeletal tracking during operation.

Although few details are known regarding the Leap Motion's algorithms and its advanced

principles, as it is protected by patent restrictions, Guna et al. [16] attempt

to analyze its precision and reliability for static and dynamic tracking. Official doc-

    umentation [1] states that it recognises hands, fingers, and tools, reporting discrete

    positions, gestures, and motion.

    Figure 4.2: Leap Motion Controller structure. [56]

J. Samuel [27] categorizes the Controller as an optical tracking system based on the stereo

vision principle; the interaction box of the controller is shown in Figure

    4.3 below. The size of the InteractionBox is determined by the Leap Motion field of

view and the user's interaction height setting (in the Leap Motion control panel). The

    controller software adjusts the size of the box based on the height to keep the bottom

corners within the field of view [1], with a maximum height of 25cm.

    Figure 4.3: Leap Motion Interaction box: A reversed pyramid shape formulated by

    cameras and LEDS. [2]

It is important to mention that the controller is accessed through an API that supports

different programming languages, amongst them Objective-C, which is our main programming

    language.

    4.4 Connectivity & Architecture

The Leap Motion runs over the USB port as a system service that receives motion tracking

data from the controller. Using dylibs (dynamic libraries) on the Mac platform exposes

    these data to the Objective-C programming language. Furthermore, the software

    supports connections with a WebSocket interface in order to communicate with web

    applications.

Figure 4.4 shows the architecture of the Leap Motion Controller, which consists of:

Leap Service, which receives and processes data from the controller over the USB bus and makes the data available to a running Leap-enabled application.

Leap control panel, which runs separately from the service, allowing configuration of the controller, calibration and troubleshooting.

Foreground application, which receives motion tracking data directly from the service while the application has focus and is in the foreground.

Background application, which allows reception of motion tracking data even if the application runs in the background, is headless or runs as a daemon.

    Figure 4.4: Leap Motion Controller architecture. [2]

    4.5 Basic workflow

The Offscreen Interaction application can be divided into the following modules:

Start Application - represents the launch of the application.

Window module - in this module the application's window controller is registered and listens to events for showing or hiding in the designated areas. This

    controller supports three modes: Full feedback, single feedback, no feedback.

View module - module responsible for visualizing the selected off-screen box.

Leap module - this module loads the application logic to handle motion tracking data from the Leap Controller.

Experiment module - this module is responsible for guiding the user when conducting the experiment, although it is not a discrete entity, as it is included in the

    Leap module.

Logger module - saves various data in a log file when the experiment is conducted.

The flowchart of those modules can be observed in Figure 4.5. We notice that al-

though the registration of modules is linear, the window module in cooperation with the Leap

module offers the variation between the three feedback modes, thus allowing the user to

get familiarized first with the cognitive offscreen area before actually conducting the

    experiment.

Figure 4.5: Offscreen Interaction - Abstract workflow of Basic Modules.

    4.6 Gesture algorithm workflow

In this work, we have implemented an algorithm that performs hide / show actions

on opened applications as well as organizing them in the offscreen area based on hand

    coordinates and the two gestures of grab and release. Figures 4.6 and 4.7 demonstrate

how the algorithm works in general before getting into details in the next sections. The

algorithm starts when a hand is recognized by the Leap service and executes per frame

    based on tracking data given (coordinates and handStrength).

It is vital to refer to a variable exposed by the API named handStrength, part

of the LeapHand class, which indicates how close a hand is to appearing as a fist

by measuring how curled the fingers are. Based on that, we can say that fingers that

aren't curled will reduce the grab strength and therefore reduce the probability of

identifying a grab gesture, whilst fingers that are curled will increase the grab strength

    and therefore reduce the probability of identifying a release gesture. This variable has

domain values [0..1]; we experimentally decided that a value >= 0.8 indicates a grab

and that values below the threshold indicate a release.
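As an illustration of how such a threshold can be turned into discrete grab and release events per frame, the following minimal sketch keeps a small state machine. The 0.8 value comes from the text; treating any drop back below it as a release is an assumption made here for illustration only:

#import <Foundation/Foundation.h>

static const float GRAB_THRESHOLD = 0.8f;   // value taken from the text above

typedef NS_ENUM(NSInteger, HandState) { HandStateOpen, HandStateGrabbing };

// Returns the event produced by this frame, if any: @"grab", @"release" or nil.
static NSString *updateHandState(float strength, HandState *state) {
    if (*state == HandStateOpen && strength >= GRAB_THRESHOLD) {
        *state = HandStateGrabbing;
        return @"grab";                     // fingers just curled into a fist
    }
    if (*state == HandStateGrabbing && strength < GRAB_THRESHOLD) {
        *state = HandStateOpen;
        return @"release";                  // fingers extended again
    }
    return nil;                             // no transition this frame
}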
4.7 Matching Coordinate Systems & Mouse Movement

The Leap Controller provides coordinates of fingers in real-world units (mm), thus

it is vital to translate the coordinates into screen pixels, according to the screen resolution.

    The SDK provides methods to normalize values in the [0..1] range and get screen

coordinates. We need to keep in mind that the top-left corner in OS X is (0,0), whilst

(0,0) in the Leap is at the bottom left; thus we need to flip the Y coordinate.

LeapInteractionBox *iBox = frame.interactionBox;
LeapVector *normalizedPoint = [iBox normalizePoint:leapPoint clamp:YES];
int appX = normalizedPoint.x * screenWidth;
int appY = normalizedPoint.y * screenHeight;
appY = screenHeight - appY;  // flip Y: Leap origin is bottom-left, screen origin is top-left

Having the screen coordinates, it is then trivial to control the mouse with the hand:

CGEventRef move1 = CGEventCreateMouseEvent(NULL, kCGEventMouseMoved,
                                           CGPointMake(appX, appY),
                                           kCGMouseButtonLeft);
CGEventPost(kCGHIDEventTap, move1);
CFRelease(move1);  // release the event after posting it

    4.8 Application Identification

The next algorithm shows the steps to identify the window under the mouse cursor, after a

    grab gesture is performed in the in-screen area:

    GrabIsFinished AND isInScreen

    Get mouse location using Cocoa

    Convert mouse location to CGPoint using Carbon API

    Get AXUIElementRef by CGPoint using Accessibility API

    Extract application PID from AXUIElementRef

    Extract application Title from AXUIElementRef

Generate image of Application given the Window Title.

    Scan windows in screen

    If kCGWindowOwnerName is equal to extracted Title

    Get the windowID

    Create bitmap of application by windowID

    Save data in custom object for using upon release.

    Listing 4.1: Abstract algorithm for identifying window.

    As we can see, three APIs (Cocoa, Carbon, Accessibility) are cooperating for a single

    window identification.
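For illustration, the following sketch shows how these three APIs can cooperate in code. It assumes the process is trusted for Accessibility, matches windows by owner PID rather than by the owner name used in Listing 4.1, and reduces error handling to early returns; it is a sketch of the steps, not the thesis implementation:

#import <Cocoa/Cocoa.h>
#import <ApplicationServices/ApplicationServices.h>

static void identifyWindowAtPoint(CGPoint point) {
    // Accessibility: element (and owning PID) under the given screen point
    AXUIElementRef systemWide = AXUIElementCreateSystemWide();
    AXUIElementRef element = NULL;
    if (AXUIElementCopyElementAtPosition(systemWide, point.x, point.y, &element) != kAXErrorSuccess) {
        CFRelease(systemWide);
        return;
    }
    pid_t pid = 0;
    AXUIElementGetPid(element, &pid);

    CFTypeRef titleValue = NULL;
    AXUIElementCopyAttributeValue(element, kAXTitleAttribute, &titleValue);
    NSString *title = (__bridge NSString *)titleValue;

    // Core Graphics: scan the on-screen window list for a window of the same owner
    NSArray *windows = CFBridgingRelease(
        CGWindowListCopyWindowInfo(kCGWindowListOptionOnScreenOnly, kCGNullWindowID));
    for (NSDictionary *info in windows) {
        if ([info[(id)kCGWindowOwnerPID] intValue] != pid) continue;
        CGWindowID windowID = [info[(id)kCGWindowNumber] unsignedIntValue];
        // Bitmap of that window, later shown as feedback in the off-screen boxes
        CGImageRef shot = CGWindowListCreateImage(CGRectNull,
                                                  kCGWindowListOptionIncludingWindow,
                                                  windowID, kCGWindowImageDefault);
        NSLog(@"window %u of pid %d (%@)", windowID, pid, title);
        if (shot) CGImageRelease(shot);
        break;
    }

    if (titleValue) CFRelease(titleValue);
    CFRelease(element);
    CFRelease(systemWide);
}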

4.9 Implementation of the offscreen and in-screen area recognition

In the Design chapter (3.2), we defined the offscreen area abstractly. Now, we show

    how this cognitive area is implemented by combining screen coordinates and the Leap

Controller's coordinate system.

    4.9.1 The orthogonal cognitive area

In section 4.7 we normalize hand coordinates to fit the screen dimensions, thus we limit

    X and Y values within the screen. In order to overcome this, we also save the hand

    coordinates (X,Y) values to variables before the normalization takes place. Given

    that the interaction box of Leap starts at 82.5mm which corresponds to pixel 0 and

    ends at 317,5mm which corresponds to screen height (fig. 4.8), we can say that pre-

    normalized values higher than 317,5 refer to pixels cognitively greater than the screen

    size. Indeed, using the mathematical formula p = (yk y0) hyny0 where p is the(cognitive) pixel, yk is the hand position in millimeters, h is the screen height (=900)

    and y0,yn the bottom and upper height of the interaction box, proves that for hand

    position of e.g. 350, we refer to the pixel 1024 which in our occasion is in oscreen

    area.
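As a quick sanity check of the formula, a one-line helper (the function and parameter names are ours):

// Cognitive pixel for a raw Leap Y position in millimetres, where y0 = 82.5 maps to
// pixel 0, yn = 317.5 maps to pixel h, and h = 900 is the screen height.
static float cognitivePixel(float yk, float y0, float yn, float h)
{
    return (yk - y0) * h / (yn - y0);   // cognitivePixel(350, 82.5f, 317.5f, 900) ≈ 1024
}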

Having explained the translation between millimetres and pixels mathematically, we can now define the offscreen and inscreen areas code-wise by combining the clamped screen coordinates with the raw (pre-normalized) hand coordinates in if statements:

if (appY == 0 && yy >= LOWER_BOUND && yy <= UPPER_BOUND && appX > 0 && appX <= screenWidth) {
    // hand is in the offscreen area
} else if (appY >= 0 && appY <= screenHeight && appX > 0 && appX <= screenWidth) {
    // hand is in the inscreen area
}

Figure 4.8: Interaction box diagram. [2]

    4.9.2 Implementation of boxes

Since the number of boxes defined in the Design chapter (3.2) is a constant (= 8), the width of each box is also constant. As per the design, the boxes are distributed over two rows, so their width is screenwidth / (numberofboxes / 2), a number which depends only on the screen width. To find which box the hand is over along the X axis, we calculate box_x = appX · numberofboxes / (2 · screenwidth) and discard the decimals. For example, when the hand is at position appX = 465 on a 1440 px wide screen, we refer to box 1 (= ⌊465 · 8 / (2 · 1440)⌋). For the Y axis, we map the yy value to row 0 when it lies in [LOWER_BOUND, LOWER_BOUND + (UPPER_BOUND − LOWER_BOUND)/2] and to row 1 when it lies in (LOWER_BOUND + (UPPER_BOUND − LOWER_BOUND)/2, UPPER_BOUND].
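A compact sketch of this mapping (the function name is ours; LOWER_BOUND and UPPER_BOUND from above are passed in as parameters):

#include <math.h>

#define NUMBER_OF_BOXES 8   // as per the design: 8 boxes in two rows of 4

// Map the clamped X coordinate (appX, in pixels) and the raw pre-normalized Y value
// (yy, in millimetres) to the (boxX, boxY) pair identifying one offscreen box.
static void boxForHandPosition(int appX, float yy, int screenWidth,
                               float lowerBound, float upperBound,
                               int *boxX, int *boxY)
{
    // X: each row holds NUMBER_OF_BOXES/2 equally wide boxes, so the index is
    // appX * NUMBER_OF_BOXES / (2 * screenWidth) with the decimals discarded.
    *boxX = (int)floorf((float)appX * NUMBER_OF_BOXES / (2.0f * screenWidth));

    // Y: split the offscreen band at its midpoint; lower half -> row 0, upper half -> row 1.
    float midpoint = lowerBound + (upperBound - lowerBound) / 2.0f;
    *boxY = (yy <= midpoint) ? 0 : 1;
}

With appX = 465 on a 1440 px wide screen the call yields boxX = 1, matching the worked example above.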


We thus have unique coordinates for any box, allowing us to build an Objective-C class and store its instances in an NSMutableDictionary using these unique coordinates as the identifying key. An instance of the box class (Figure 4.9) is created upon a release gesture in order to save window-specific data such as the title, PID and screenshot that were generated when the user performed a grab on an open window; a sketch of this follows Figure 4.9.

    Figure 4.9: Box class diagram.
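A minimal sketch of such a box record and its dictionary storage, assuming the fields in Figure 4.9 are roughly a title, a PID and a screenshot (the exact property names in the prototype may differ):

#import <Cocoa/Cocoa.h>

// Per-box record created on a release gesture in the offscreen area (cf. Figure 4.9).
@interface OSBox : NSObject
@property (nonatomic, copy)   NSString *title;       // window title captured at grab time
@property (nonatomic, assign) pid_t     pid;         // owning application PID
@property (nonatomic, strong) NSImage  *screenshot;  // bitmap of the window at grab time
@end

@implementation OSBox
@end

// Boxes are kept in a dictionary keyed by their unique "x,y" coordinates.
static NSMutableDictionary *boxes;

static void storeBox(int boxX, int boxY, NSString *title, pid_t pid, NSImage *shot)
{
    if (!boxes) boxes = [NSMutableDictionary dictionary];
    OSBox *box = [[OSBox alloc] init];
    box.title      = title;
    box.pid        = pid;
    box.screenshot = shot;
    boxes[[NSString stringWithFormat:@"%d,%d", boxX, boxY]] = box;  // stored on release
}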

    4.10 Abstract application workflow

Having defined the key algorithms and workflows above, we can now define the abstract workflow of our implementation as shown in Listing 4.4.

Initialization
    Load modules (as described in Chapter 4.5)
End

Main Loop (per frame) onFrame()
    While hand found do
        Move mouse according to normalized hand position
        If hand is in offscreen area AND feedback mode is not none
            Show feedback window
        End
        Measure hand grab strength
        When grab strength crosses 0.8 a gesture is finished
            Get gesture and position of event
            If gesture was grab
                Apply algorithm as described in Fig. 4.6
            End
            If gesture was release
                Apply algorithm as described in Fig. 4.7
                Free resources
            End
        End
    End
End

Listing 4.4: Abstract algorithm of the implementation.


Chapter 5

Experiment - User study

In this chapter we describe the user study we conducted: we present the principles of the experiment itself and thoroughly explain all the phases it consists of, from the set-up to the final phase, the assessment given by the participants. Finally, we briefly present statistical data about our participants.

    5.1 Experimental Set-up

In order to interact with the content and evaluate the interaction, we set up a testing area in which a Mac with a 17″ Retina display was placed on a table, easily accessible by the participant sitting in front of it. In front of the laptop, the Leap Motion controller is placed in such a way that a potential cable twist will not tilt the device forwards or backwards, affecting the user experience. The device is attached to either the left or the right side of the laptop, depending on the participant's dominant hand and their preference.

Figure 5.1: Set-up for left-handed mouse users.

Figure 5.2: Set-up for right-handed mouse users.


5.2 Experiment Procedure

In this section, the procedure of the experiment is discussed and thoroughly analyzed. We conducted a user study to test and analyze Offscreen Interaction. Each user participating in the study followed the instructions given to them, completed the experimental part and finally provided us with an assessment. The procedure followed for each participant consists of the following parts:

Welcome - Brief explanation of interaction - User Learn Mode - Experiment - Demographic information - Assessment

These parts are discussed, interpreted and visualized with appropriate figures below. Each experiment was calculated to last around 25 minutes, plus time for the instructor to explain the concept and for the participant to complete the final assessment, so the whole study requires approximately 40 minutes. The time actually required varied, however, as it depended on how much time the participant spent in the User Learn Mode in order to feel comfortable with the offscreen area. We also logged data other than timings and errors, data that might help us understand potential positioning patterns of windows placed in the offscreen area.

5.2.1 Welcome - Brief explanation of interaction

At the beginning of the experiment the experimenter introduces himself, welcomes the participant and explains their rights. Participants are allowed to withdraw from the experiment at any time, for example if they feel uncomfortable or tired. After that we thank them for participating in and helping with this study, and finally introduce them to its purpose as well as the goals we are trying to achieve.

At that point we start explaining the various aspects of the experiment. We first explain the ways they can interact with the system and the two gestures. Afterwards the three feedback modes are thoroughly presented, together with an in-depth explanation of the offscreen area. Before proceeding to the actual experiment, we make sure that the participant has understood the concepts of the interaction by encouraging them to use the User Learn Mode (5.2.2) in all feedback modes and familiarize themselves before the experiment.

Furthermore, participants were asked to take a close look at the specific applications they would operate on, as our prototype application could take screenshots of most applications but not all. We were thus forced to use specific applications, some of which a participant might not be familiar with.


5.2.2 User Learn Mode

For reasons we explain below (5.2.3), it is vital for the participant to spend a few minutes in the User Learn Mode, where no timings or errors are logged. In this part, the user tries in turn the Full Feedback, Single Feedback and No Feedback modes while receiving a brief explanation of the screen visualizations and their components, where applicable. When participants were ready, they could initiate the experiment by pressing the ESC key.

From this point on, the experimenter no longer interferes with the participant, as on-screen instructions are given.

5.2.3 Experiment

Principles

The experiment itself consists of cycling through the three feedback modes, where in each mode the participant is asked to operate on a random sequence of windows. When the experiment is initiated, the three modes follow the Balanced Latin Square algorithm [52]:

    Participant First Mode Second Mode Third Mode

    1 Full Feedback Single Feedback No Feedback

    2 No Feedback Full Feedback Single Feedback

    3 Single Feedback No Feedback Full Feedback

    Table 5.1: Feedback cycling for the first 3 participants. Afterwards, it repeats itself.

The reason for encouraging participants to spend time in the User Learn Mode is that even-numbered participants (2, 4, 6, 8) would have to conduct the experiment starting with the No Feedback mode, a mode that is considered hard, especially if one has not understood the positions of the boxes in the offscreen area.

The same behavior is applied to the window list. The windows to operate on are stored in an 8x8 structure where each row contains window titles following the Balanced Latin Square algorithm, as shown in Table 5.2 (a generating sketch follows the table).

    # Window Titles

    1 Wnd 1 Wnd 2 Wnd 8 Wnd 3 Wnd 7 Wnd 4 Wnd 6 Wnd 5

    2 Wnd 2 Wnd 3 Wnd 1 Wnd 4 Wnd 8 Wnd 5 Wnd 7 Wnd 6

    3 Wnd 3 Wnd 4 Wnd 2 Wnd 5 Wnd 1 Wnd 6 Wnd 8 Wnd 7

    4 Wnd 4 Wnd 5 Wnd 3 Wnd 6 Wnd 2 Wnd 7 Wnd 1 Wnd 8

    Table 5.2: Sample Balanced Latin Square algorithm for windows
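For reference, a compact sketch of the standard balanced Latin square construction; for n = 8 it reproduces the window orders of Table 5.2 row by row, and the mode ordering of Table 5.1 follows the same idea (the function name is ours, not the prototype's):

// Row r (0-based) of a balanced Latin square over n items (n even),
// returned as 1-based item indices, e.g. row 0 for n = 8 is 1 2 8 3 7 4 6 5.
static NSArray *balancedLatinSquareRow(NSUInteger n, NSUInteger r)
{
    NSMutableArray *row = [NSMutableArray arrayWithCapacity:n];
    for (NSUInteger c = 0; c < n; c++) {
        // Column offsets follow the pattern 0, 1, n-1, 2, n-2, 3, ...
        NSUInteger offset = (c % 2 == 1) ? (c + 1) / 2 : (n - c / 2) % n;
        [row addObject:@((r + offset) % n + 1)];
    }
    return row;
}

Each new participant simply receives the next row, which is why the window order in Table 5.2 shifts by one window from row to row.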

When the experiment starts, upon pressing the ESC key, a row is picked randomly and the participant is asked to move all windows cognitively into the offscreen area in any order they want. Simultaneously, they get a notification with the number of windows that are still inscreen and their titles (fig. 5.3). In order to allow participants to pick windows of their choice, all windows are shown, randomly resized and then shuffled. This ensures that windows overlap, so a participant can grab partially visible windows if they wish. Fig. 5.5 demonstrates this concept. When no windows are left visible on the desktop, the prototype asks the participant to show a specific window (fig. 5.4), based on the previously picked row of the window order. At any time the participant can ask the system which window was requested by pressing the ESC key.

Figure 5.3: Notification with the number and titles of the remaining windows.

Figure 5.4: Notification showing the next window to be fetched from offscreen.

Figure 5.5: Windows after shuffling.

To successfully complete a trial, the participant has to grab the requested application from the offscreen area and then release it in the inscreen area. Two errors can happen:

Wrong window. The participant grabbed a window other than the one requested; this error is visualized by a notification only after a release.

Nothing. The participant performed a grab gesture in a box that does not contain a window. This happens when one makes a gesture in a box which is now empty because its window has already become visible in a previous trial. The error notification is visualized immediately.

Experiment Windows List

The following applications were chosen for the participants to operate on.

(a) Chrome (b) Firefox (c) Eclipse (d) Sublime (e) Xcode (f) Pages (g) Skype (h) Calendar

Figure 5.6: Selected windows for the experiment.

We deliberately chose two programs for software development (Xcode, Eclipse), two for Internet browsing (Firefox, Chrome) and two for text editing (Sublime, Pages), because their functionality is similar, enabling us to look for patterns of grouping applications into sequential boxes in the offscreen area. We also had to narrow down our options due to the screenshot limitation described previously.

5.2.4 Demographic Information

The Demographic Information form is used to collect non-crucial personal information about the participant. Information such as name, surname and address is not asked for, to avoid any confidentiality issues. The information to be filled in is shown in Table 5.3.

Age        Based on year of birth
Gender     Male or female
Job Title  Participant's work area

Table 5.3: Demographic information

We then ask the participants to state their experience with MacOSX and especially whether they use Mission Control to navigate between windows. Finally we ask them to state how many different tasks they usually perform and to rate their experience with computers. The rating is based on a 5-point Likert scale, from Highly Inexperienced (1) to Highly Experienced (5).

This form can give us clues as to whether the factors listed below introduce a potentially large variation in the error and time variables during the experiment. In particular:

1. Familiarity with MacOSX

2. Usage of Mission Control

3. Number of tasks

4. Computer experience

The Demographic form, as a subsection of the Assessment form, can be found in Appendix B.

5.2.5 Assessment

Having completed all trials in all feedback modes, participants were instructed to fill in a questionnaire on a five-point Likert scale, where they rate and optionally comment on each of the feedback modes they encountered. The questionnaire (Appendix C) was divided into three parts, one for each feedback mode. In particular, the participants had to provide an assessment for the following types of feedback:

1) Full feedback, 2) Single feedback, 3) No feedback

The assessment has the same structure regardless of the feedback mode. Each assessment consists of 10 questions, which are explained below along with their importance.

1. Grabbing windows during the experiment was: The participant had to rate how easily they could grab a window in the specific feedback mode. This question helps us identify whether the grab gesture affects question #5.

2. Releasing windows during the experiment was: The participant had to rate how easily they could release a window in the specific feedback mode. This question helps us identify whether the release gesture affects question #6.

3. Mouse movement smoothness during the experiment was: The participant had to rate the smoothness during the experiment. Smoothness is a subjective factor, and with this question we can assess whether the smoothness of the mouse while moving the hand inscreen interferes with selecting the intended window.

4. Dividing the offscreen area cognitively was: The participant had to rate how difficult it was to imagine the offscreen area and split it into boxes. The importance of this question is that it lets us assess whether splitting the offscreen area into two rows is a feasible solution.

5. Grabbing (offscreen) and releasing (inscreen) the correct window during the experiment was: The participant had to rate how difficult it was to perform an operation from offscreen to inscreen.

6. It was easy to release (offscreen) in the area I wanted: The participant had to rate how difficult it was to target the box they wanted in the offscreen area.

Questions 5 and 6 are of high importance since they give us feedback about the nature of the interaction.


7. Arm fatigue: Level of fatigue in the arm.

8. Wrist fatigue: Level of fatigue in the wrist.

Questions 7 and 8 help us understand whether we should look for alternative ways of interaction in Future Research (Section 8.2).

9. General comfort: We ask the participant for their general impression regarding comfort. There is of course a correlation between this factor and the arm and wrist fatigue factors.

10. Overall, the interaction was: We measure the general impression of the feedback mode, on a usability scale.

Finally, at the bottom of each page there is an area designed to allow participants to comment on the interaction, list advantages and disadvantages they may have encountered, and share any thoughts on improvements. This section was not required; however, we extracted some very welcome feedback from it.

5.3 Participants

We recruited 9 participants for our study, ranging from 23 to 45 years old. We extracted the following information from their demographic forms.

As we can see from Figure 5.7a, most participants are between 26 and 30 years old. We also tried to recruit older and younger people, who would hopefully behave in interestingly different ways in the experiment. We managed to recruit 2 participants in the 41-45 range and one below 25 years old.

Figure 5.7: (a) Distribution of participants' age. (b) Distribution of participants' gender.

Participants were mostly male (56%), as can be observed in Figure 5.7b; to be precise, 5 participants were male and 4 female (44%). Four out of nine work as department managers and three were students/developers. All of them indicated that they had prior experience with a Mac, even if they do not own one. Figures 5.8a and 5.8b illustrate that all of our users are within one level of each other in their rated computer experience, and that 56% use Mission Control to switch between tasks, invoking it one way or another (mouse pad gesture, pressing the F3 key).

Finally, in Figure 5.8c we observe that the majority of participants interact with 3-5 tasks in their daily routine, whilst only one performs more than 5. However, this only


Figure 5.8: (a) Distribution of participants' experience. (b) Distribution of Mission Control usage. (c) Distribution of simultaneous tasks.

implies active interaction with these tasks; it is also possible that windows composing other tasks are open but not being used (inactive).


Chapter 6

    Results

This chapter presents the results we collected from the user study of Offscreen Interaction. We decided to focus on two basic categories, which include the most significant information we collected and analyzed: Completion Time and Error Rate. We also present and illustrate the mean values for all parts of the assessment discussed in Section 5.2.5, and we refer to the feedback modes of the trials with the abbreviations presented in Table 6.1.

    Feedback mode Abbreviation

    Full FF

    Single SF

    None NF

    Table 6.1: Abbreviations for the feedback modes

    6.1 Completion Time

Completion time is a parameter that counts the time a participant spent from the moment they were asked to show a window until they actually showed that window. That is, the timer is started by the system notification which informs the participant of the requested window title, and it stops when this window is shown on screen. When the correct window is shown, a new system notification is triggered and the participant proceeds with the next timed interaction.

We applied the ANalysis Of VAriance (ANOVA) [4] statistical model to the aggregated time values. We used ANOVA to determine whether the mean completion times per feedback mode are statistically different. We could extract basic information about the means simply by comparing them, but we want to know how these differences in the mean values affect our results and whether they are significant.

Using repeated measures ANOVA we found a significant main effect of feedback mode (F(1.064, 8.512) = 9.905, p …
type of feedback mode. As we can see, and as already stated, the mean differences between Full/Single Feedback and No Feedback are really significant.

Feedback mode (i)  Feedback mode (j)  Mean Difference (i-j)  Sig.
FF                 SF                 -0.056                 1
FF                 NF                 -0.611

behavior, as this is strengthened by the log file entries (Listing 6.1). The final result of the grouped applications is shown in Figure 6.2.

Figure 6.2: Participant 1's grouping.

To make the log entries presented below meaningful, we explain them with comments:

    1:C:Chrome :(2 ,0) //Put Chrome in box (2,0)

    1:C:Skype :(0,0) //Put Skype in box (0,0)

    1:C:Sublime Text 2:(0 ,1) //Put Sublime Text 2 in box (0,1)

    1:C:eclipse :(1,1) //Put eclipse in box (1,1)

    1:C:Pages :(2,1) //Put Pages in box (2,1)

    1:C:Firefox :(3,0) //Put Firefox in box (3,0)

    1:C:Calendar :(2,1) //Put Calendar in box (2,1), Pages pops out

    1:C:Xcode :(0,0) //Put Xcode in box (0,0), Skype pops out

    1:C:Pages :(0,0) //Put Pages in box (0,0), Xcode pops out

    1:C:Xcode :(1,0) //Put Xcode in box (1,0)

    1:C:Skype :(3,1) //Put Skype in box (3,1)

    Listing 6.1: Log entries for Participant 1, NF mode.

Participant 2, on the other hand, followed a different methodology: they filled the bottom row first and then the upper row. We were unable to identify whether positioning in the pattern shown in Figure 6.3 was accidental or deliberate, as there are no pop-outs (Listing 6.2).

    2:C:Pages :(0,0)

    2:C:Chrome :(1 ,0)

    2:C:Skype :(2,0)

    2:C:eclipse :(3,0)

    2:C:Sublime Text 2:(0 ,1)

    2:C:Firefox :(1,1)

    2:C:Calendar :(2,1)

    2:C:Xcode :(3,1)

    Listing 6.2: Log entries for Participant 2, NF mode.


Figure 6.3: Participant 2's potential grouping.

6.4 Subjective Preferences

In Figure 6.4 the mean values for all parts of the assessment presented in Section 5.2.5 are illustrated. In each diagram the vertical axis represents the mean value of the corresponding category, ranging from 1 to 5 (5-point Likert scale), whilst the X axis represents the mode. In general, higher values mean that participants rated the specific category positively.

FF mode repeatedly scored well across all categories, better than any other mode. Precisely, Full Feedback (blue) was ranked highest for the Overall Interaction Experience (3.66 / 5, Figure 6.4h) and for General Comfort (3.11 / 5, Figure 6.4g). Most importantly, however, NF (orange) performed considerably worse than the other modes across all categories except wrist fatigue (Figure 6.4d), a category in which all modes performed mediocrely, with STDmean = 0.231. Arm fatigue (Figure 6.4c), especially in NF, indicates a possible future research topic aimed at diminishing this effect. Finally, grabbing from offscreen/inscreen and releasing inscreen/offscreen respectively, as well as smoothness, which behaves uniformly (mean = 3.92, STDmean = 0.16), were rated highly in all modes (Figures 6.4e, 6.4f, 6.4a), which suggests that our prototype application is well structured and responds well to hand movement and gesture identification.

Figure 6.4: Mean values assessed by participants for each feedback mode: (a) smoothness, (b) cognitive division of the offscreen area, (c) arm fatigue, (d) wrist fatigue, (e) ease of grabbing offscreen and releasing inscreen, (f) ease of grabbing inscreen and releasing offscreen, (g) general comfort, (h) overall interaction experience.


Chapter 7

    Discussion

We conducted a study and presented the results; now we need to discuss and interpret them. What glues the experiment (Chapter 5) and the results (Chapter 6) together are the hypotheses we made, so we commence the discussion by stating the hypotheses and discussing whether they hold.

(H1) Visual feedback (Full and Single) significantly outperforms No Feedback in terms of task completion time.

In Section 6.1 we stated that the No Feedback mode is the slowest mode, being roughly 15 times and 6 times slower than the Full and Single modes respectively. In Section 6.3 we presented two cases that achieved really fast completion times in the No Feedback mode. Although their performance was remarkable, their times in No Feedback mode outperformed only one participant's completion time in Single Feedback; that participant performed so badly in all modes (e.g. 10 times slower in Single Feedback than the rest) that they are considered an outlier. We believe that although H1 still holds, future research is required to strengthen this belief.

(H2) Error rates in No Feedback are significantly higher than with any visual feedback.

In Chapter 6, Section 6.2, we noted that error rates in the No Feedback mode were huge compared to the modes that give visual information. This was expected, as in the No Feedback mode users have no indication of where in the offscreen area their hand is, or whether the requested window is under a box or the box is empty. Although applying patterns during the positioning phase (grouping tasks and a positioning methodology) affected completion time, the error rate is still high. Even if the error rate consists only of the Nothing error (5.2.3), this is still a form of error, and we therefore conclude that H2 holds.

(H3) Task completion time in No Feedback mode can be improved by using grouping or special spatial positioning.

We analyzed the results in Section 6.3, where we presented two cases varying significantly from the others. Indeed, our analysis of the data validates H3, as two participants managed to improve completion time (best = 3.73 s) by a factor of 5 compared to the fastest of the remaining participants, who achieved 15.13 s.

Taking into consideration the hypotheses discussed above, we conclude that all of them hold. It is, though, imperative to discuss the influence of H3 on the No Feedback mode before we can conclude.

Since our window list was chosen so as to allow grouping of windows, and strengthened by our findings, we could assume directly from H3 that if there is a correlation between tasks, systems that provide no feedback might perform not significantly worse than systems with visual feedback. This assumption, though, does not take into account error rates or longer-lasting usage of a No Feedback system, which, due to high arm fatigue (Fig. 6.4c), would significantly increase the error rate and user frustration, resulting in an unusable system with slow completion times that is prone to errors.

In the additional assessment users provided (Appendix C), we asked them to put the three different modes in ascending order of which was easiest to use. Users overwhelmingly rejected the No Feedback mode as the most tiring and least effective to use, which clearly depicts user preferences. Observing the results we obtained for the error and completion time parameters, we see that No Feedback is outperformed by all other modes on all measured parameters. The results indicate no significant difference between Full feedback and Single feedback, but Full feedback is nonetheless slightly more efficient.

Our findings contradict the findings of Gustafson et al. [17], who developed and studied a viable system without any visual feedback. This contradiction is explained by the fact that our interaction is a mixed interaction between a computer screen and the spatial space around it, whilst their system interacts only in the spatial space. Similarly, Po et al. [45] state, and we quote:

Our findings suggest that pointing without visual feedback is a potentially viable technique for spatial interaction with large-screen display environments that should be seriously considered.

We would like to differentiate our study from studies of large-screen displays, as our mechanics do not apply to such displays, a research area that we address in the section below.

After stating the above observations and combining H1, H2 and H3, we conclude that a system for interacting between desktop windows and the offscreen space can a) be achieved seamlessly as long as some form of visual feedback is given to the user, and b) that the existence of some (subjective) form of window grouping or positioning methodology in the offscreen area decreases the interaction time, but not the error rate, when no visual feedback is provided.


Chapter 8

    Conclusion - Future Work

This thesis has presented an alternative 2.5D interaction technique for task-switching operations on the MacOSX operating system using an external, portable device for gesture recognition. In this chapter we present our conclusions based on our findings, as well as research areas that follow from them or areas that we did not research but believe should be.

    8.1 Conclusion

In this study we proposed Offscreen Interaction, an alternative interaction technique for manipulating (hiding and showing) open windows, which provides an effective task-switching paradigm by placing and retrieving tasks in the spatial space around a computer screen. This spatial space is defined by the cone-shaped area in which the Leap Motion Controller, a depth-sensing device, operates and allows us to obtain information about hands and fingers. The main objectives were to investigate the role of visual feedback, how its absence or presence affects the interaction, and to examine whether grouping windows improves the interaction in the absence of visual feedback. Current approaches and implementations were presented at the very start of this thesis, along with the initial problem that motivated us to build this new technique for switching between windows in a MacOSX environment. Relevant work and research already conducted in the same or related fields was presented afterwards. We separated it into themes, although there is no clear distinction, and explained how our approach differs.

We then presented the design of our implementation along with important concepts, and thoroughly explained our choices by grounding them in recommendations from previous work. To test and study the offscreen interaction technique, we developed a prototype application. The implementation, architecture, description of the development frameworks, workflows and hardware specifications of the prototype application were presented in a later chapter. We continued by analyzing


the procedure and the context of the user study. We had 9 participants taking part in this study. Each participant had to go through the verbal instructions, fill in the Demographic Information form, familiarize themselves with the interaction through the User Learn Mode (5.2.2), complete the three feedback modes (3.4) of the experiment consisting of 8 open windows, and finally fill in an assessment form for each of the 3 different modes. We then presented statistical information regarding the demographics of the participants who took part in our study.

We then presented the computed results for the two basic parameters, completion time and error rate, stated our observations from the log file regarding positioning in the offscreen area, and discussed their significance. The results and the arguments for the validity of the hypotheses were also discussed.

Finally, we provided evidence supporting that all the hypotheses hold, and concluded that a) a system for interacting between desktop windows and the offscreen space is feasible given that the user receives some form of feedback, and b) no feedback is a feasible solution for