
Thesis for the Master's degree in Computer Science

Speciale for cand.scient graden i datalogi

Offscreen Interaction

    Karampis Panagiotis

    Supervisor: Sebastian Boring

    Department of Computer Science, University of Copenhagen

    Universitetsparken 1, DK-2100 Copenhagen East, Denmark

    [email protected]

    April 2015

Abstract

Modern desktop paradigms are operated through a set of keyboard combinations,

mouse clicks and even mouse-pad gestures, to which users are tied and with which,

after so many years of usage, they naturally achieve a fluent interaction. Despite the vast evolution

    of the available ways used to interact with Virtual Reality, the fundamental principle

    of interaction remains always the same: usage of the concrete, well-known physical

devices (keyboard, mouse) attached to the computer. We present Offscreen Interac-

tion, a system that utilizes the spatial space around the screen, which serves as a window

    storage area while we interact with the computer screen through a pluggable, gesture-

    recognizing device. The aim is to comprehend how users react to the existence or lack

    of any form of visual feedback and whether grouping windows while positioning af-

    fects the performance when no feedback is given. In a user study, we found that the

most efficient and effective way of interaction was when visual feedback was given;

    in the case of no visual feedback, we observed that participants achieved the highest

    performance by grouping windows or applying some subjective methodology.


Acknowledgements

To my family and Lela, who endlessly supported me through all this effort...


Contents

    1 Introduction 1

    1.1 Problem Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

    1.2 Objectives - Thesis Establishment . . . . . . . . . . . . . . . . . . . . 2

    1.3 Research Questions addressed . . . . . . . . . . . . . . . . . . . . . . . 4

    1.4 Thesis Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

    2 Related Work 7

    2.1 Using secondary means . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

    2.2 Using virtual space management . . . . . . . . . . . . . . . . . . . . . 8

    2.3 Using space around displays . . . . . . . . . . . . . . . . . . . . . . . . 10

    2.4 Mid-air Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

    2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

    3 Design 13

    3.1 Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

3.2 The offscreen area . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

    3.3 Gestures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

    3.4 Feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

    3.5 Frameworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

    4 Implementation 17

    4.1 Application Description . . . . . . . . . . . . . . . . . . . . . . . . . . 17

    4.2 Development Languages & Frameworks . . . . . . . . . . . . . . . . . 18

4.3 Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

    4.4 Connectivity & Architecture . . . . . . . . . . . . . . . . . . . . . . . . 20

    4.5 Basic workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

    4.6 Gesture algorithm workflow . . . . . . . . . . . . . . . . . . . . . . . . 22

    4.7 Matching Coordinate Systems & Mouse Movement . . . . . . . . . . . 24

    4.8 Application Identification . . . . . . . . . . . . . . . . . . . . . . . . . 24

4.9 Implementation of the offscreen and in-screen area recognition . . . . . 25

    4.9.1 The orthogonal cognitive area . . . . . . . . . . . . . . . . . . . 25

    4.9.2 Implementation of boxes . . . . . . . . . . . . . . . . . . . . . . 26

    4.10 Abstract application workflow . . . . . . . . . . . . . . . . . . . . . . . 27

    5 Experiment - User study 29

    5.1 Experimental Set-up . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

5.2 Experiment's Procedure . . . . . . . . . . . . . . . . . . . . . . . . . 30

    5.2.1 Welcome - Brief explanation of interaction . . . . . . . . . . . . 30

    5.2.2 User Learn Mode . . . . . . . . . . . . . . . . . . . . . . . . . . 31

    5.2.3 Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

    5.2.4 Demographic Information . . . . . . . . . . . . . . . . . . . . . 33

    5.2.5 Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

    5.3 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

    6 Results 37

    6.1 Completion Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

    6.2 Error Rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

    6.3 Peculiar Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

    6.4 Subjective Preferences . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

    7 Discussion 43

    8 Conclusion - Future Work 45

    8.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

8.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

    A Code snippets 53

    A.1 Coordinate translation & mouse movement . . . . . . . . . . . . . . . 53

    A.2 Screen point conversion . . . . . . . . . . . . . . . . . . . . . . . . . . 53

    A.3 Inscreen area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

A.4 Offscreen area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

    A.5 Upon grab, extract windows under cursor information . . . . . . . . . 54

A.6 Shuffle windows before experiment starts . . . . . . . . . . . . . . . . . 54

    A.7 Extract the title . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

    A.8 Generate image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

    A.9 Move window (Single Feedback) . . . . . . . . . . . . . . . . . . . . . . 56

    B Demographic Information Form 57

    C Technique Assessment Form 59

List of Figures

    1.1 Open windows arrangement before switching to a new task. . . . . . . 2

    1.2 Open windows arrangement after switching to Sublime. . . . . . . . . 3

    1.3 User points or gestures on a window. Window becomes selected. . . 3

1.4 User drags and drops the window in the off-screen area. Foreground

    is occupied by another window. . . . . . . . . . . . . . . . . . . . . . . 4

1.5 Window that has been dragged into the offscreen area is now available to be

    retrieved by reversing the process. . . . . . . . . . . . . . . . . . . . . 4

3.1 Offscreen area illustrated with dimensions. . . . . . . . . . . . . . . . . 14

    3.2 Grab gesture sequence. Release sequence is the reversed grab. . . . . . 15

3.3 Chrome application's image on box (0,1), current hand position in box

    (0,2) with red frame . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.4 Chrome application's image on box (0,1) and hand position at (0,1) . 16

4.1 Leap Motion's size and hand skeletal tracking during operation. . . . 19

    4.2 Leap Motion Controller structure. [56] . . . . . . . . . . . . . . . . . . 19

    4.3 Leap Motion Interaction box: A reversed pyramid shape formulated

    by cameras and LEDS. [2] . . . . . . . . . . . . . . . . . . . . . . . . . 20

    4.4 Leap Motion Controller architecture. [2] . . . . . . . . . . . . . . . . . 21

4.5 Offscreen Interaction - Abstract workflow of Basic Modules. . . . . . . 22

    4.6 Gesture algorithm workflow: From initial state to grab state. . . . . . 23

    4.7 Gesture algorithm workflow: From grab state to release state. . . . . . 23

    4.8 Interaction box diagram. [2] . . . . . . . . . . . . . . . . . . . . . . . . 26

    4.9 Box class diagram. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

5.1 Set-up for left-handed mouse users. . . . . . . . . . . . . . . . . . . . . 29

5.2 Set-up for right-handed mouse users. . . . . . . . . . . . . . . . . . . . 29

    5.3 Notification with remaining windows number and title. . . . . . . . . . 32

5.4 Notification showing next window to be fetched from offscreen. . . . 32

5.5 Windows after shuffling. . . . . . . . . . . . . . . . . . . . . . . . . . . 32

    5.6 Selected windows for the experiment . . . . . . . . . . . . . . . . . . . 33

    6.1 Participant 1 fully matched FF, SF, NF. . . . . . . . . . . . . . . . . . 39

6.2 Participant 1's grouping. . . . . . . . . . . . . . . . . . . . . . . . . . . 40

6.3 Participant 2's potential grouping. . . . . . . . . . . . . . . . . . . . . 41

    6.4 Mean Values assessed by participants for each feedback mode. . . . . . 42

List of Tables

    5.1 Feedback cycling for the first 3 participants. Afterwards, it repeats itself. 31

    5.2 Sample Balanced Latin Square algorithm for windows . . . . . . . . . 31

    5.3 Demographic information . . . . . . . . . . . . . . . . . . . . . . . . . 33

    6.1 Abbreviations for the feedback modes . . . . . . . . . . . . . . . . . . 37

6.2 Mean time differences between modes . . . . . . . . . . . . . . . . . . 38

6.3 Mean error differences between modes . . . . . . . . . . . . . . . . . . 39

Chapter 1

    Introduction

    Window switching is one of the most frequent tasks a user performs and can occur

    several hundred times per day. Numerous window operations are performed when we

    work and run multiple windows, such as moving, resizing and switching. Management

of such activity would help enhance users' computer experience and effectiveness.

Window switching can evolve into a really complicated task, since for example a

developer may need to switch to the browser and look for documentation, switch back

to the IDE to write code, switch to the terminal to test and push the code, check for emails

and update completed tasks. Window switching is unavoidable; even on larger screens

    users tend to consume all available screen space and create even more windows to

navigate to [23]. Generally, users rely on the Operating System's window manager to

    provide a convenient way to manage open windows according to their needs so that

    they are easy to retrieve. Switching windows is divided into two subtasks: find the

    desired window and then bring it to the foreground. Widely accepted techniques to

achieve both subtasks are performed either directly with the mouse or with keyboard

key combinations.

    1.1 Problem Description

    Interaction with objects scattered across the screens using the mouse can be trouble-

some as the mouse's movement is limited to the area of the desk. In addition, it gets even

more problematic when drag and drop is involved, where one mis-click restarts the

process. Furthermore, the mouse's cursor is limited to the screen, thus any interactions

are restricted within this plane. 3D interaction with the screen has been researched and

    several input devices have been developed that enable users to manipulate virtual

    reality (VR), for example virtual hand and depth cursor techniques. Such techniques

- along with corresponding input devices - have been considered inadequate, appro-

priate to use only under specific circumstances. Additionally, the hardware used is

far too expensive, such as the systems used to track bodies in the cinema industry [53],

    [42].

Several Window Managers (WM) have been implemented to visualize this process.

    Windows 7 presents each window on its own frame even if they are windows of the

same application. Mac's Mission Control tiles open windows so that they are visible

    at once and simultaneously stacks windows of the same application, for all applications

that belong to the same desktop or workspace. Gnome 3 assigns each window its

own frame but presents windows of the same application stacked. The way to

    navigate between open windows varies between Window Managers. The combination

of Alt+Tab (Cmd+Tab) is used across all systems for consistency reasons, although cycling

between windows of the same application varies from one Window Manager to another.

However, Robertson et al. [47] showed that stacking windows under the same application

confuses many users, because the application's windows may not be related to the

    same task, where a task is defined as a collection of applications organized around a

    particular activity.

    1.2 Objectives - Thesis Establishment

We propose a different technique for switching between tasks and involve human-

computer interaction (HCI) in the function of window switching. As there are many

HCI patterns including body posture, hand/finger gestures, speech recognition, eye

    movement and so on, we chose to use hand gestures as they are the most trivial and

    easy for users to get familiar with.

Using Mac OS X's Mission Control to switch between applications can be cumbersome.

    For example, windows are re-arranged every time another window gets focus or a new

one is created. In the following scenario, the user is writing a document, therefore

the application Pages is selected; then the user wants to switch to Sublime and activates

    the Mission Control by pressing F3 and selects Sublime from the upper left corner.

Figure 1.1 shows the windows' initial arrangement.

    Figure 1.1: Open windows arrangement before switching to a new task.

At that point, the user wants to switch back to Pages and presses F3. Figure 1.2

now shows a completely different screen, where the user has to gaze and find where

Pages is located so that they can select it.

    Figure 1.2: Open windows arrangement after switching to Sublime.

From that point on, switching between Pages and Sublime does not trigger

re-arrangement, unless a third task interleaves. Then again, the user will have to find the

    correct window in a rearranged list.

We also experienced this behavior by using the common paradigm of Cmd+Tab: after

switching between tasks using the keyboard, Mission Control has rearranged the windows.

A preliminary experiment conducted with users both familiar and unfamiliar with Mac

OS X showed that users were frustrated by the mechanic, were unable to understand

the pattern, and stated that they expected the pattern of window order to be clear at

first sight rather than having to spend time investigating how windows are ar-

ranged. All users reacted positively to arranging windows in a linear form such as in

    rows or columns.

The contribution of this research is to design a new human-computer interaction

utilizing gestures in the offscreen spatial space. This would allow a more effective

    task-switching paradigm by placing and retrieving desired windows in the spatial

    space around a computer screen. This spatial space is defined by the cone area where

    the LEAP Motion operates in. Figures 1.3, 1.4, 1.5 demonstrate this functionality.

    Figure 1.3: User points or gestures on a window. Window becomes selected.

Figure 1.4: User drags and drops the window in the off-screen area. Foreground is

    occupied by another window.

Figure 1.5: A window that has been dragged into the offscreen area is now available to be

    retrieved by reversing the process.

    1.3 Research Questions addressed

Here, we present the scientific research questions addressed by Offscreen Interaction

and, in terms of goals, we want to explore the impact and the efficiency of such a 2.5D¹

pointing system through different visual feedback modes, bringing users one step

closer to freeing themselves from operating solely with the mouse. Therefore, the main

    questions that this thesis will try to answer are:

Q1: How important is the role of visual feedback in VR systems? To which extent does it affect the performance?

    Feedback and HCI techniques are inextricably connected. For instance, we

    get feedback when we press Alt+Tab to switch windows through a dialog in

    the screen or when we move the mouse, we can see the mouse cursor in the

    screen (full feedback). What would be our experience in HCI if there was no

dialog when pressing Alt+Tab (no feedback), or if, when we hover over menus, a

change of color (the typical behavior) would only appear infrequently and occasionally

    (partial feedback)? We need to investigate this behavior in our system and

examine how feedback influences performance in the offscreen area.

Q2: Do users interact randomly with a non-visible area or develop techniques?

¹With the term 2.5D we mean that although we interact with a typical 2D computer screen, the

actual pointing is happening in a 3D space, thus the term 2.5D.

Another factor that would vastly improve the performance and the effectiveness

    is grouping according to subjective patterns or following some specific subjective

methodology when positioning windows in the offscreen area. As we explain

in experiment 5.2.3, we would like to see the existence (or not) of such

formations, which we consider would give us directions for future research.

    1.4 Thesis Structure

The structure and the content of the thesis are briefly described below. This work

consists of 8 chapters, including this one, which are structured as follows:

Chapter 2: Publications related to several forms of interaction (secondary means 2.1, virtual space management 2.2, using space around displays 2.3, mid-

air interaction 2.4) are presented. We quote papers that are highly relevant to

our work and others that are in a similar field, as this study merges a variety of

considerations. In this chapter we aim to help the reader comprehend the current

    limitations and current approaches in the area as well as introduce the reader

    to this field.

Chapter 3: We present the design of our implementation and analyze the considerations behind Offscreen Interaction while backing the choices

made with scientific facts. The major design choices (offscreen design, gesture

    choice and feedback modes), which are listed and explained, will help the reader

    to understand better our work and act as a preliminary but required step before

    reading the implementation.

Chapter 4: We present the implementation of the prototype application built to test Offscreen Interaction. We provide thorough details on the way it

    works, architecture and workflows as well as on the way the prototype was

    built. Small code snippets can be found in this section while the functions that

    we consider as the most important can be found in the appendix A.

Chapter 5: All the information regarding the User Study we performed is included. We present and explain the phases of the experiment along with its

    modes for each participant that took part in the study. Statistical demographic

    results are also shown in the end of the chapter.

Chapter 6: We present the results of the User Study, separated into corresponding categories. Each category represents a specific metric we tested. We also

    provide a form of visualization through graphs and state our observations after

    investigating the log files containing data regarding positioning and order of

    windows during the experiment.

Chapter 7: First we set our hypotheses on the results. Secondly, we discuss the results based on the findings of the ANOVA analysis performed previously and

on the participants' assessment. Finally, we comment on whether the overall

    concept and the hypotheses assumed hold.

Chapter 8: In this chapter, the final conclusions are presented along with identified and proposed research areas for future work.

Chapter 2

    Related Work

    The work described in this thesis aims to develop a prototype model for interacting

with computer displays using an external device that bridges humans' 3D environment

with the desktop metaphor of the computer's screen for performing task-switching

    operations.

    This chapter provides an overview of existing techniques, technologies and imple-

mented frameworks. Although there is no clear distinction between the several themes

    mentioned below, we categorized existing work into four themes: Using secondary

    means, space around displays, virtual space management and mid-air interaction.

    2.1 Using secondary means

    Interacting with a display by using secondary means indicates that the user can

manipulate and interact with what is shown on a display through another interface.

Such an interface can be a mobile's, PDA's or touchpad's screen, or a tabletop, by using

    gestures with the dominant or both hands.

A secondary display, such as the PDA's display, allows a user to remotely switch

    between top-level tasks and scroll in context using the non-dominant hand while

dominant hand operates the mouse. [39] While the system provides such advantages,

it is imperative to indicate that switching tasks through a PDA's screen can be cum-

bersome and prone to errors of not selecting the desired task due to the small resolution

    size. Although current technology enhances PDAs with higher resolution, fast switch-

ing between tasks is limited by having to focus on the PDA's screen, identify the correct

task from the list, select it and focus back on the computer's screen.

Numerous researchers have explored the manipulation of objects within the computer design

and animation area. Multi-point touch tablets have existed for a long time [34],

    yet the Multi-finger Cursor Techniques[38] -implemented on a modified touchpad- and

Visual Touchpad[36], which projects hand gestures onto a computer's screen, allow

a degree of hand freedom and interaction with screen objects.

    Boring et al. implemented a system that allows users to manipulate devices that are

    incapable of touch interaction. By capturing video using a smartphone, along with

techniques for targeting tasks such as Auto Zoom and Freeze, they were able to precisely

interact with an object at a distance. [11]

When interacting with secondary means, areas close to the keyboard and mouse are most

appropriate for one-handed interactions, as shown by Bi et al. [10] The Magic Desk

    integrates desktop environment with multi-touch computing and provides a set of

interaction techniques allowing the computer screen to be projected onto the desk. The

    DigitalDesk[58] is built as an ordinary desk, but with additional functionality that

emphasizes physical paper interactions. The desk uses a reversed approach compared to the one

    commonly used: Instead of representing physical objects as digitalized objects, it uses

    the computer power to enhance the functionality of a physical object. With video-

based finger tracking and the usage of two cameras to scan documents and project data

    on the desk, the DigitalDesk provides digital properties to physical papers.

    Malik et al. [37] and Keijser et al. [31] use multi-hand gestures to enable the user to

    control an object and the workspace simultaneously, thus allowing the user to bridge

the distance between objects and offer the user a wider range of gestures by allowing

    the two hands to work together. However, this has been accomplished through ad-

    ditional hardware and vision-based systems. Another approach to manipulating 3D

    objects is the Sticky tools[19] which achieves all 6DOF without additional hardware,

    allowing users to manipulate an object using one, two or three fingers by using shallow

    depth[18] technique.

Part of our goal, and therefore our differentiation from the aforementioned systems, is

a) to provide a system that supports hand - not just a limited number of fingers -

interaction, b) to avoid using external cameras, which are subject to occlusion, and c) to avoid

stationary systems or limiting our interaction experience by using secondary screens like

    mobiles or PDAs.

    2.2 Using virtual space management

    Virtual space management refers to the concept of enhancing user experience (UX)

    by applying algorithms and operations in existing or newly opened windows in the

desktop. Several studies have been conducted on optimizing the way windows

resize, categorizing windows according to user needs or even providing their own desktop

    metaphor.

The pioneer in that area, Rooms[21], is a system in which similar tasks are assigned in

    a separate virtual space (room) whilst sharing of windows between similar tasks is

    supported. The Task Gallery: A 3D window manager[48], a successor of Rooms,

    provides a 3D virtual environment for window and task management where a 3D

stage shows the current task while other tasks are placed at the edges of the stage,

namely on the floor, ceiling and walls. Each task comprises its related windows.

    To allow users to have an overview of open tasks, a navigational system is introduced

    where users can go back or go forward and thus see more tasks or focus more on one.

    While the system provides advantages and animation techniques, there are also dis-

advantages with respect to aiding users who desire to complete many different tasks

    simultaneously in a small time span. If for example a user wants to switch between

    two or more tasks, they will have to move backwards until they find the required task,

    select it, find the desired window from the loose stack and operate on it. To switch

to the previous task, the user will have to reverse the whole process. We can see that the

complexity of such navigation increases according to the number of tasks the user needs

    to perform. Furthermore the time required to switch between tasks is increased by

    the one-second animations between moving forward/backward and opening/closing

    tasks.

    Elastic Windows[29] and New Operations for Display Space Management and Win-

    dow Management[25] endorse the need for new and advanced windowing operations

    for users with many tasks. The Elastic Windows implementation is based on hierar-

chically managing and displaying windows, with the root window being the desktop. The

system supports inheritance, where some characteristics of the parent window can be

inherited by its children. The philosophy of the system is that windows under the same

parent cannot overlap but consume all the available space. Some side-effects of

    this philosophy are that when one window resizes, all of its children and windows of

the same parent are also resized, leading to a whole desktop re-arrangement with only one

resize. Another side-effect, directly arising from the no-overlapping rule, is that as

the number of child windows increases, the effectiveness of displayed information de-

    creases. Similarly, QuickSpace[24] automatically moves windows so that they will not

    be overlapped by a resizing window. All techniques rely on the existence of empty

    space, which may not often be available even on multiple monitor systems as shown

    by Hutchings et al.[23]

Scwm[8] is a parametrized window manager for the X11 windowing system and is

    based on defining constraints between windows. These constraints are presented as

    relationships between windows and are user defined. While this system provides some

    advantages such as operating on two related windows as one, it is susceptible to multi-

tasking requirements: as the number of windows increases, so does the number and

complexity of relationships that have to be defined and maintained. Yamanaka et

al.[60] approach virtual space management in a different way. Instead of creating

algorithms to adjust window space upon creation of a new window or a resize effect, the

    Switchback Cursor exploits the z-index axis of overlapped windows on a Windows 7

operating system. The mouse cursor, upon a specific movement in conjunction with a specific

    keyboard press, traverses and selects windows that are below the visible one.

One approach to address the fact that users work on different tasks in parallel and switch

back and forth between different windows is to analyze user activities and assign

windows to tasks automatically based on whether windows overlap or not[59]. Another

approach addresses the user's fast switching by analyzing the window content, relocation,

transparency and content combined [54], [35].

    The majority of techniques apply algorithms to manipulate currently open windows

when a new one is created. We would like to extend Rooms and the Task Gallery

according to our needs, leaving the underlying resizing technique as is: managed by

the operating system. Thus, instead of using implemented semantics such as move

back, move forward, go to left room, we define our own semantics which indicate where a

    window is virtually placed.

    2.3 Using space around displays

    The theme, space around displays, refers to a concept where the physical (empty)

    space close to the interaction target, such as a computer or mobile screen, is used in

    conjunction with external sensors like depth cameras and hand gesture recognizing

devices to provide an interaction channel between the human and the computer.

The Unadorned Desk[20], which is our inspiration, is an interaction system that uti-

lizes the free space on a regular desk enhanced with a sensor. It virtually places the

off-screen and secondary workspace onto the desk, providing more screen space for

    the primary workspace and thus acting as an extra input canvas. The experiments

conducted with the Unadorned Desk showed that interaction with virtual items on

the desk is possible when items are of reasonable size and number, with or without

    on-screen feedback.

The usage of a Kinect depth camera mounted on top of the desk limits the mobility

of the system and requires that the user is pinned to a specific place at the desk. As

the Kinect camera is by nature prone to false detections under sunlight, the camera has

    to be placed attentively. Although the system works well when few items are virtually

placed in the extra input canvas, the desk's surface has to be clear of physical objects,

which is not always the case as desks tend to be messy.

    Virtual Shelves[35] and Chen et al.[12] combine touch and orientation sensors to

    create a virtual spatial space around the user that allows invocation of commands,

menu selection and mobile application interaction in an eyes-free manner. Wang et

    al. [55] demonstrated the benefits of using hand tracking for manipulation of virtual

reality and 3D data in a CAD environment using two webcams to track 6DOF for both

    hands. However, such tasks are restricted to controlled, small areas targeting specific

frameworks (CAD). Usage of the Kinect camera is widespread when it comes to augmented

and virtual reality. MirageTable[9], HoloDesk[22], FaceShift[57] and KinectFusion[26] form

a set of applications supporting real-time, physics-based interactions; however, as noted

by Spindler et al.[51], tracking with depth cameras still has limited precision and

reliability. SpaceTop[33] also uses a Kinect depth camera, together with a see-through

    display. Although it allows 3D spatial input, a 2D touchscreen and keyboard are

also available for input. The unclear visual representation for guidance, as noted by the

authors, is a subject that we need to take seriously. Physical space requires extra

    sensors, extra cameras and most importantly to be free of several objects. Our goal

is to move the interaction area from solid objects such as a desk into a virtual area

    around the screen and thus eliminate usage of additional sensors and cameras which

    make the system less portable.

    2.4 Mid-air Interaction

There are occasions where interaction with displays has to be done from a greater

distance than standing in front of the screen. By mid-air interaction we mean that

the communication channel between user and display is the air/empty space, and it is done

through the usage of laser pointers, Wiimotes, virtual screens or even worn gloves.

Uses of mid-air interaction vary from point-and-click to manipulation and selection of

3D data. The latter was implemented by Gallo et al.[14] using a Wiimote controller

to manipulate and select 3D data in a medical environment. Even if the system was

not evaluated, it was able to differentiate between two states: pointing and

    manipulation. In the pointing state, the Wiimote acts as a point and click laser

pointer, whilst in the manipulation state, the Wiimote interacts with a 3D object with

    available actions of rotating, translating and scaling.

In [28], the authors utilize both the Kinect depth camera and skeletal tracking[30] to

    identify pointing gestures made by a standing person in front of the Kinect camera.

The spatial area in front of the camera is considered as a virtual touch screen where

    the user can point. To detect the direction of the pointing gesture, they detect and

    track the pointing finger using a minimum bounding rectangle and Kalman filtering.

Interaction techniques using lasers have the advantage of low cost and constitute the

best-known perspective-based technique. A laser beam can act as a pointing control

in a multi-display system. At the same time, laser beams suffer from the limitation

that there are no buttons to augment the single point of tracked input, rendering

mouse operations impossible. Additionally, laser pointer techniques suffer from poor

    accuracy and stability [41], and can be very tiring for sustained interactions. Olsen

et al.[41] proposed a non-direct input system and explored the use of dwell time to

    perform mouse operations. However, the installation cost and complexity of most

systems are prohibitive when increasing scalability.

Kim et al.[32] tried to approach these limitations by researching ways to embed sen-

sors in the body, specifically a wrist-worn device which recognizes body movements, thus

reducing the need for external sensors. The wrist-worn device consists of a depth cam-

era combined with an IMU, which precisely recognizes finger movements through the

    usage of biomechanics.

Laser interaction generally doesn't allow recognition of gestures and, although Jing et

al. [28] implemented a point-and-click system, that system is stationary. The Wiimote

suffers from the inability to identify finger gestures, whilst worn sensors and gloves

require that the user attaches an external device to the skin, which might be uncomfortable.

We propose a system that is mobile, with no skin-attached devices, which identifies

hand gestures for both hands and by extension provides the required functionality of

    augmented buttons if that is required.

    2.5 Summary

We have seen that there is a variety of ways - among them air, third interfaces and

cameras - to interact with a computer screen, using a huge range of methods

and tools such as worn devices, finger biomechanics, laser pointers, mobiles, game

    controllers and so on. Each aforementioned theme has situational advantages and

    drawbacks and comes with implementations that provide an alternative user experi-

    ence and interaction. The interaction with the computer screen comes at a cost of

    introducing either rather complex, non-mobile interaction systems or ways of window

management that rely on algorithms and not on users' desires.

    We thus propose a work that combines features from the virtual space management

and mid-air interaction by keeping the interaction medium (mid-air) as simple as possi-

    ble. At the same time we provide a trivial window managing metaphor of select and

    then show or hide, giving the user the ability to resize and position windows the way

    they want.

Chapter 3

    Design

In this chapter we describe the considerations to follow in a Virtual Reality interac-

tion according to our needs, the offscreen area, gestures, feedback modes and the choice

of frameworks that support the interaction.

    3.1 Considerations

Selection on screen: The initial event which enables the interaction by identifying the collision of a virtual object, any - partially - visible window, with the

    mouse cursor upon a grab gesture.

Selection off screen: The initial event which enables the interaction by identifying the hand coordinates in the offscreen area upon a grab gesture.

Drop in offscreen: The second-stage event that, upon a release gesture in the off-screen area, removes focus from the selected window (hide).

Drop in screen: The second-stage event that, upon a release gesture in the main screen area, gives focus to the selected window (show).

Select off screen & drop off screen: A two-stage event that cognitively moves an already hidden window to another box in the offscreen area, allowing

    organisation of windows.

    Worth mentioning is that since our system is considered as a 2.5D pointing system

    with no need to implement interaction in the Z axis, we employ manipulation of

    virtual objects with four degrees of freedom (4DOF), namely up / down, left / right.

3.2 The offscreen area

    The interaction box of the physical input device provides us enough space to extend

    cognitively the screen area on the top side; space that serves for cognitively saving

    data for assigned windows. This area has as much width as the screen width and is

scalable¹ to the screen proportions.

    Smith et al. [50] observed that the average number of open windows a user keeps per

session is 4 on a single display, while dual-monitor users keep 12 open windows.

Based on such observations, we chose to divide the offscreen area into 8 boxes, thus

allowing the state of 8 windows to be saved.

Figure 3.1: Offscreen area illustrated with dimensions.

The fact that human hand stability deteriorates with age, fatigue, caffeine and other

factors [3] indicates that the offscreen area should be designed as large as possible,

with boxes large enough to compensate for the user's unstable hand.

We thus cognitively divided the offscreen area into two rows, 4 boxes per row, giving

more freedom for hand movement without risking choosing the wrong box. Each

box's width is calculated by the formula screenWidth/4 and each box is assigned a

    number between 0-3, which indicates the position in the X axis. Since we have two

    rows we followed a Cartesian system with positions (X,Y). The X axis has domain

values 0-3 whilst the Y axis 0 and 1. Figure 3.1 illustrates this concept in detail.

    Although the interaction box is wide enough to operate outside of the screen width,

    we decided to keep the interactions strictly within the screen width and thus virtually

extend the screen only on the Y axis. The intuition behind this choice is that

we didn't want to increase the difficulty of cognitively dividing the offscreen area,

    especially when no visual feedback was provided in the experiment.

    The interaction with the application starts when the user places their left or right

    hand inside the interaction box. At that point, the user has complete control of the

    mouse cursor movement.

    3.3 Gestures

The gestures implemented are classified as concrete. They are evaluated after they have

been completely performed, e.g. Selection on screen is only valid when the hand, from

¹Up to a limit

having extended fingers (release), has become a fist (lack of extended fingers).

Research on previous works [43], [44] has shown that gestures applied on physical

    input devices are preferred to match gestures that users would normally apply on

    physical objects. Such natural gestures allow the manipulation of virtual objects and

    are tied with users knowledge and skills of the physical world.

Hand gestures are used to interact in the 3D space rather than pointing. The latter is

    commonly used in 2D view systems because of its simplicity and stable performance.

Several user-defined gestures such as pointing, pinch, grab, 2-hand stretch have been

    classified according to user experience [44]. Out of the two possible candidates, pinch

and grab, which imply natural and realistic interaction, we have selected the latter one

    because grab is closer to the natural interaction we target and secondly because pinch

is mostly used for other interactions like rotation and shrink / stretch in different axes

    [27].

    For our purpose, and based on the commonly used gestures as presented above, the

    grab is a crucial gesture to implement interactions within the Virtual Reality space.

    The prototype then defines two gestures: Grab and Release both applicable to the

    oscreen and to the in-screen area. These gestures must be performed sequentially to

    complete an action due to the fact these gestures imply that hand transits from one

    state to another.

    Figure 3.2: Grab gesture sequence. Release sequence is the reversed grab.

Furthermore, the mouse cursor can only be over one window, thus only one window

can be identified per grab. In addition, the interaction is performed by only one

hand, either the dominant or the secondary, according to user preference.

3.4 Feedback

We have designed the application to operate in three different modes: Full feedback,

    single feedback and no feedback.

The full feedback mode, as shown in Figure 3.3, provides a window with information

about all boxes, which is shown when the hand is in the offscreen area. The selected box

is visually identified by a red border and, in case a window is cognitively saved, a

    screen shot taken at the grab event is shown to help the user identify the window

    when required.

The single feedback mode (Figure 3.4) provides a window with information about the

specific box which is associated with the current hand position in the offscreen area.

Apart from the screenshot shown, the user gets extra visual help by having the

application's title, e.g. Chrome, as the window title.

    The no feedback mode provides no informational window nor any other information

when the hand is in the offscreen area.

Figure 3.3: Chrome application's

    image on box (0,1), current hand

    position in box (0,2) with red frame

Figure 3.4: Chrome application's

    image on box (0,1) and hand position

    at (0,1)

    3.5 Frameworks

The nature of this research is such that it forces us to use low-level programming lan-

    guages, frameworks and libraries as close as possible to the operating system while

using up-to-date programming paradigms (object-oriented programming). For this

reason, languages that would require a wrapper to access native calls, such as Java, or

    languages mainly targeting web development such as JavaScript have been excluded

    from consideration even if the physical input device supports them.

Chapter 4

    Implementation

Offscreen Interaction is an application that operates on a standard Mac with any

screen size which runs OS X version 10.7 or higher. Offscreen Interaction cannot be

ported to Windows or Linux based systems because a) OS X native libraries are

used and b) it targets being a replacement for Mission Control.

The implementation has been developed in the Objective-C language using Xcode. Xcode

was developed by Apple for both OS X and iOS, and contains on-the-fly documen-

tation for all libraries alongside other utilities that serve to enhance coding

efficiency and experience, such as a debugger, tester, profiler, analysis tool etc. In-

troduced on June 2, 2014, Swift is a new programming language for coding Apple software.

Xcode also supports developing AppleScript, a scripting language built into the Mac

operating system since System 7.

    Important code snippets that were crucial for the implementation can be found in the

    Appendix A.

    4.1 Application Description

Offscreen Interaction is an application which has been built to test the capabilities

and the limits of interacting with desktop applications. It aims to provide an area

above the screen plane which the user can interact with, in order to organize and switch

    between windows. The interaction with the user is performed through hand gestures

received by a motion tracking input device. Although Offscreen Interaction is not a

    complete application, it embeds all the required infrastructure needed to support the

    new interaction we are proposing. The basic scenarios are the following:

When the user grabs a window and releases it in the offscreen area, we should be able to identify which window this is, save its information in memory and hide it.

When the user grabs a window from the offscreen area and releases it in screen, we should be able to identify which window it was by checking memory data and then show it.

When the user releases a window in the offscreen area, if the box is occupied then overwrite this box with the new window's data and pop out the previously stored window.

When the user grabs a window from the offscreen area and releases it in the offscreen area, if the box is occupied then swap the saved memory data, otherwise move the window to the new box (a sketch of this dispatch follows below).
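To make the dispatch between these scenarios concrete, the following minimal sketch illustrates one possible release handler. It assumes hypothetical names (offscreenBoxes, grabbedInfo, showWindow, hideWindow) and is an illustration of the logic above, not the thesis implementation:

#import <Foundation/Foundation.h>

// offscreenBoxes maps a box key (e.g. @"1,0") to the saved window data (title, PID,
// screenshot). grabbedInfo / fromKey describe what the preceding grab picked up;
// fromKey is non-nil when the grab happened offscreen, toKey when the release did.
static void handleRelease(NSMutableDictionary *offscreenBoxes,
                          NSDictionary *grabbedInfo,
                          NSString *fromKey,
                          NSString *toKey,
                          void (^showWindow)(NSDictionary *),
                          void (^hideWindow)(NSDictionary *)) {
    if (grabbedInfo == nil) return;                    // the grab selected nothing

    if (toKey != nil && fromKey == nil) {              // screen -> offscreen: hide
        NSDictionary *occupant = offscreenBoxes[toKey];
        if (occupant) showWindow(occupant);            // occupied box pops its window out
        offscreenBoxes[toKey] = grabbedInfo;
        hideWindow(grabbedInfo);
    } else if (toKey == nil && fromKey != nil) {       // offscreen -> screen: show
        showWindow(grabbedInfo);
        [offscreenBoxes removeObjectForKey:fromKey];
    } else if (toKey != nil && fromKey != nil) {       // offscreen -> offscreen: move or swap
        NSDictionary *occupant = offscreenBoxes[toKey];
        if (occupant) offscreenBoxes[fromKey] = occupant;
        else [offscreenBoxes removeObjectForKey:fromKey];
        offscreenBoxes[toKey] = grabbedInfo;
    }
}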

Despite the fact that this may sound trivial at first glance, Apple does not provide one univer-

sal API for interacting with, manipulating and identifying windows. As a result, several

workarounds were implemented in order to overcome the limitations that were found.

Specifically, between the Accessibility API and Cocoa, the limitation was that the

    Accessibility hierarchy is independent and separate from the window hierarchy.

    4.2 Development Languages & Frameworks

Objective-C [40]. Primary programming language for developing OS X applications. Objective-C, as the name states, is an object-oriented programming (OOP)

language and a superset of C.

Cocoa [13]. High-level API that combines three other frameworks: Foundation, AppKit and CoreData, included via the header file Cocoa.h, and automates many

application features to comply with Apple's human interface guidelines. [50]

Quartz [46]. Provides access to lower-level core graphics services, composed of the Quartz 2D API and the Quartz Extreme windowing environment.

AppleScript [7]. Scripting and narrative language for the automation of repetitive tasks.

Accessibility API [5]. Extra libraries targeting assistance for users with disabilities.

Leap SDK ver 2.2.2 [49]. Provides access to the physical input device chosen for this application (LEAP Motion) through Objective-C calls.

Carbon (Legacy) [6]. Legacy API, acting as a bridge between Cocoa and the Accessibility API.

    4.3 Hardware

    For identifying gestures and bridging the real with virtual space, the Leap Motion

Controller is used. The controller is a small peripheral, 3 x 1.2 x 0.5 inches, not much

    heavier than a common USB stick. It utilizes two cameras and three infra-red LEDs

serving as light sources to capture motion and gesture information. The system is

    capable of tracking movement of hands, fingers or several other objects within an area

    of 60cm around the device in real time. It can detect small motions and has accuracy

    of 0.01mm [15].

The small cameras that the Leap Motion Controller comes with cannot extract as much

    information as other systems that come with large cameras. Because embedded al-

    gorithms extract only the data required, the computational analysis of images is less

    considerable and therefore the latency introduced by the Leap Motion Controller is

very small and negligible. The fact that the controller is small and mostly software-

based makes it suitable for embedding in other, more complicated VR systems.

Figure 4.1: Leap Motion's size and hand skeletal tracking during operation.

Although few details are known regarding the Leap Motion's algorithms and its advanced

principles, as it is protected by patent restrictions, Guna et al. [16] attempt

to analyze its precision and reliability for static and dynamic tracking. Official doc-

    umentation [1] states that it recognises hands, fingers, and tools, reporting discrete

    positions, gestures, and motion.

    Figure 4.2: Leap Motion Controller structure. [56]

J. Samuel [27] categorizes the Controller as an optical tracking system based on the stereo

vision principle; the interaction box of the controller is shown in Figure

    4.3 below. The size of the InteractionBox is determined by the Leap Motion field of

view and the user's interaction height setting (in the Leap Motion control panel). The

    controller software adjusts the size of the box based on the height to keep the bottom

corners within the field of view [1], with a maximum height of 25cm.

    Figure 4.3: Leap Motion Interaction box: A reversed pyramid shape formulated by

    cameras and LEDS. [2]

It is important to mention that the controller is accessed through an API that supports

different programming languages, amongst them Objective-C, which is our main programming

    language.

    4.4 Connectivity & Architecture

The Leap Motion runs over the USB port as a system service that receives motion tracking

data from the controller. Using dylibs (dynamic libraries) on the Mac platform exposes

    these data to the Objective-C programming language. Furthermore, the software

    supports connections with a WebSocket interface in order to communicate with web

    applications.

Figure 4.4 shows the architecture of the Leap Motion Controller, which consists of:

Leap Service, which receives and processes data from the controller over the USB bus and makes the data available to a running Leap-enabled application.

Leap control panel, which runs separately from the service, allowing configuration of the controller, calibration and troubleshooting.

Foreground application, which receives motion tracking data directly from the service while the application has focus and is in the foreground.

Background application, which allows reception of motion tracking data even if the application runs in the background, is headless or runs as a daemon.

    Figure 4.4: Leap Motion Controller architecture. [2]

    4.5 Basic workflow

The Offscreen Interaction application can be divided into the following modules:

Start Application - represents the launch of the application.

Window module - in this module the application's window controller is registered and listens to events for showing or hiding in the designated areas. This

    controller supports three modes: Full feedback, single feedback, no feedback.

View module - module responsible for visualizing the selected off-screen box.

Leap module - this module loads the application logic to handle motion tracking data from the Leap Controller.

Experiment module - this module is responsible for guiding the user when conducting the experiment, although it is not a discrete entity, as it is included in the

    Leap module.

Logger module - saves various data in a log file when the experiment is conducted.

The flowchart of those modules can be observed in Figure 4.5. We notice that al-

though the registration of modules is linear, the window module in cooperation with the Leap

module offers the variation between the three feedback modes, thus allowing the user to

get familiarized first with the cognitive offscreen area before actually conducting the

    experiment.

Figure 4.5: Offscreen Interaction - Abstract workflow of Basic Modules.

    4.6 Gesture algorithm workflow

In this work, we have implemented an algorithm that performs hide / show actions

on opened applications as well as organizing them in the offscreen area based on hand

    coordinates and the two gestures of grab and release. Figures 4.6 and 4.7 demonstrate

how the algorithm works in general before getting into details in the next sections. The

algorithm starts when a hand is recognized by the Leap service and executes per frame

    based on tracking data given (coordinates and handStrength).

It is vital to refer to a variable exposed by the API named handStrength, part

of the LeapHand class, which indicates how close a hand is to appearing as a fist

by measuring how curled the fingers are. Based on that, we can say that fingers that

aren't curled will reduce the grab strength and therefore reduce the probability of

identifying a grab gesture, whilst fingers that are curled will increase the grab strength

    and therefore reduce the probability of identifying a release gesture. This variable has

domain values [0..1]; we experimentally decided that a value >= 0.8 indicates a grab

and that values below the threshold indicate a release.
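As an illustration of how such a threshold can be turned into discrete grab and release events per frame, the following minimal sketch keeps a small state machine. The 0.8 value comes from the text; treating any drop back below it as a release is an assumption made here for illustration only:

#import <Foundation/Foundation.h>

static const float GRAB_THRESHOLD = 0.8f;   // value taken from the text above

typedef NS_ENUM(NSInteger, HandState) { HandStateOpen, HandStateGrabbing };

// Returns the event produced by this frame, if any: @"grab", @"release" or nil.
static NSString *updateHandState(float strength, HandState *state) {
    if (*state == HandStateOpen && strength >= GRAB_THRESHOLD) {
        *state = HandStateGrabbing;
        return @"grab";                     // fingers just curled into a fist
    }
    if (*state == HandStateGrabbing && strength < GRAB_THRESHOLD) {
        *state = HandStateOpen;
        return @"release";                  // fingers extended again
    }
    return nil;                             // no transition this frame
}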
4.7 Matching Coordinate Systems & Mouse Movement

The Leap Controller provides coordinates of fingers in real-world units (mm), thus

it is vital to translate the coordinates into screen pixels, according to the screen resolution.

    The SDK provides methods to normalize values in the [0..1] range and get screen

coordinates. We need to keep in mind that the top-left corner in OS X is (0,0), whilst

(0,0) in the Leap is at the bottom left; thus we need to flip the Y coordinate.

LeapInteractionBox *iBox = frame.interactionBox;
LeapVector *normalizedPoint = [iBox normalizePoint:leapPoint clamp:YES];
int appX = normalizedPoint.x * screenWidth;
int appY = normalizedPoint.y * screenHeight;
appY = screenHeight - appY;  // flip Y: Leap origin is bottom-left, screen origin is top-left

Having the screen coordinates, it is then trivial to control the mouse with the hand:

CGEventRef move1 = CGEventCreateMouseEvent(NULL, kCGEventMouseMoved,
                                           CGPointMake(appX, appY),
                                           kCGMouseButtonLeft);
CGEventPost(kCGHIDEventTap, move1);
CFRelease(move1);  // release the event after posting it

    4.8 Application Identification

The next algorithm shows the steps to identify the window under the mouse cursor, after a

    grab gesture is performed in the in-screen area:

    GrabIsFinished AND isInScreen

    Get mouse location using Cocoa

    Convert mouse location to CGPoint using Carbon API

    Get AXUIElementRef by CGPoint using Accessibility API

    Extract application PID from AXUIElementRef

    Extract application Title from AXUIElementRef

Generate image of Application given the Window Title.

    Scan windows in screen

    If kCGWindowOwnerName is equal to extracted Title

    Get the windowID

    Create bitmap of application by windowID

    Save data in custom object for using upon release.

    Listing 4.1: Abstract algorithm for identifying window.

    As we can see, three APIs (Cocoa, Carbon, Accessibility) are cooperating for a single

    window identification.
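For illustration, the following sketch shows how these three APIs can cooperate in code. It assumes the process is trusted for Accessibility, matches windows by owner PID rather than by the owner name used in Listing 4.1, and reduces error handling to early returns; it is a sketch of the steps, not the thesis implementation:

#import <Cocoa/Cocoa.h>
#import <ApplicationServices/ApplicationServices.h>

static void identifyWindowAtPoint(CGPoint point) {
    // Accessibility: element (and owning PID) under the given screen point
    AXUIElementRef systemWide = AXUIElementCreateSystemWide();
    AXUIElementRef element = NULL;
    if (AXUIElementCopyElementAtPosition(systemWide, point.x, point.y, &element) != kAXErrorSuccess) {
        CFRelease(systemWide);
        return;
    }
    pid_t pid = 0;
    AXUIElementGetPid(element, &pid);

    CFTypeRef titleValue = NULL;
    AXUIElementCopyAttributeValue(element, kAXTitleAttribute, &titleValue);
    NSString *title = (__bridge NSString *)titleValue;

    // Core Graphics: scan the on-screen window list for a window of the same owner
    NSArray *windows = CFBridgingRelease(
        CGWindowListCopyWindowInfo(kCGWindowListOptionOnScreenOnly, kCGNullWindowID));
    for (NSDictionary *info in windows) {
        if ([info[(id)kCGWindowOwnerPID] intValue] != pid) continue;
        CGWindowID windowID = [info[(id)kCGWindowNumber] unsignedIntValue];
        // Bitmap of that window, later shown as feedback in the off-screen boxes
        CGImageRef shot = CGWindowListCreateImage(CGRectNull,
                                                  kCGWindowListOptionIncludingWindow,
                                                  windowID, kCGWindowImageDefault);
        NSLog(@"window %u of pid %d (%@)", windowID, pid, title);
        if (shot) CGImageRelease(shot);
        break;
    }

    if (titleValue) CFRelease(titleValue);
    CFRelease(element);
    CFRelease(systemWide);
}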

4.9 Implementation of the offscreen and in-screen area recognition

In the Design chapter (3.2), we defined the offscreen area abstractly. Now, we show

    how this cognitive area is implemented by combining screen coordinates and the Leap

Controller's coordinate system.

    4.9.1 The orthogonal cognitive area

In section 4.7 we normalize hand coordinates to fit the screen dimensions, thus we limit

    X and Y values within the screen. In order to overcome this, we also save the hand

    coordinates (X,Y) values to variables before the normalization takes place. Given

    that the interaction box of Leap starts at 82.5mm which corresponds to pixel 0 and

    ends at 317,5mm which corresponds to screen height (fig. 4.8), we can say that pre-

    normalized values higher than 317,5 refer to pixels cognitively greater than the screen

    size. Indeed, using the mathematical formula p = (yk y0) hyny0 where p is the(cognitive) pixel, yk is the hand position in millimeters, h is the screen height (=900)

    and y0,yn the bottom and upper height of the interaction box, proves that for hand

    position of e.g. 350, we refer to the pixel 1024 which in our occasion is in oscreen

    area.
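As a quick sanity check of the formula, a one-line helper (the function and parameter names are ours):

// Cognitive pixel for a raw Leap Y position in millimetres, where y0 = 82.5 maps to
// pixel 0, yn = 317.5 maps to pixel h, and h = 900 is the screen height.
static float cognitivePixel(float yk, float y0, float yn, float h)
{
    return (yk - y0) * h / (yn - y0);   // cognitivePixel(350, 82.5f, 317.5f, 900) ≈ 1024
}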

Having explained the translation between millimetres and pixels mathematically, we can now define the offscreen and inscreen areas code-wise by combining the clamped screen coordinates with the raw (pre-normalized) hand coordinates in if statements:

if (appY == 0 && yy >= LOWER_BOUND && yy <= UPPER_BOUND && appX > 0 && appX <= screenWidth) {
    // hand is in the offscreen area
} else if (appY >= 0 && appY <= screenHeight && appX > 0 && appX <= screenWidth) {
    // hand is in the inscreen area
}

Figure 4.8: Interaction box diagram. [2]

    4.9.2 Implementation of boxes

Since the number of boxes defined in the Design chapter (3.2) is a constant (= 8), the width of each box is also constant. As per the design, the boxes are distributed over two rows, so their width is screenwidth / (numberofboxes / 2), a number which depends only on the screen width. To find which box the hand is over along the X axis, we calculate box_x = appX · numberofboxes / (2 · screenwidth) and discard the decimals. For example, when the hand is at position appX = 465 on a 1440 px wide screen, we refer to box 1 (= ⌊465 · 8 / (2 · 1440)⌋). For the Y axis, we map the yy value to row 0 when it lies in [LOWER_BOUND, LOWER_BOUND + (UPPER_BOUND − LOWER_BOUND)/2] and to row 1 when it lies in (LOWER_BOUND + (UPPER_BOUND − LOWER_BOUND)/2, UPPER_BOUND].
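A compact sketch of this mapping (the function name is ours; LOWER_BOUND and UPPER_BOUND from above are passed in as parameters):

#include <math.h>

#define NUMBER_OF_BOXES 8   // as per the design: 8 boxes in two rows of 4

// Map the clamped X coordinate (appX, in pixels) and the raw pre-normalized Y value
// (yy, in millimetres) to the (boxX, boxY) pair identifying one offscreen box.
static void boxForHandPosition(int appX, float yy, int screenWidth,
                               float lowerBound, float upperBound,
                               int *boxX, int *boxY)
{
    // X: each row holds NUMBER_OF_BOXES/2 equally wide boxes, so the index is
    // appX * NUMBER_OF_BOXES / (2 * screenWidth) with the decimals discarded.
    *boxX = (int)floorf((float)appX * NUMBER_OF_BOXES / (2.0f * screenWidth));

    // Y: split the offscreen band at its midpoint; lower half -> row 0, upper half -> row 1.
    float midpoint = lowerBound + (upperBound - lowerBound) / 2.0f;
    *boxY = (yy <= midpoint) ? 0 : 1;
}

With appX = 465 on a 1440 px wide screen the call yields boxX = 1, matching the worked example above.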


We thus have unique coordinates for any box, allowing us to build an Objective-C class and store its instances in an NSMutableDictionary using these unique coordinates as the identifying key. An instance of the box class (Figure 4.9) is created upon a release gesture in order to save window-specific data such as the title, PID and screenshot that were generated when the user performed a grab on an open window; a sketch of this follows Figure 4.9.

    Figure 4.9: Box class diagram.
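A minimal sketch of such a box record and its dictionary storage, assuming the fields in Figure 4.9 are roughly a title, a PID and a screenshot (the exact property names in the prototype may differ):

#import <Cocoa/Cocoa.h>

// Per-box record created on a release gesture in the offscreen area (cf. Figure 4.9).
@interface OSBox : NSObject
@property (nonatomic, copy)   NSString *title;       // window title captured at grab time
@property (nonatomic, assign) pid_t     pid;         // owning application PID
@property (nonatomic, strong) NSImage  *screenshot;  // bitmap of the window at grab time
@end

@implementation OSBox
@end

// Boxes are kept in a dictionary keyed by their unique "x,y" coordinates.
static NSMutableDictionary *boxes;

static void storeBox(int boxX, int boxY, NSString *title, pid_t pid, NSImage *shot)
{
    if (!boxes) boxes = [NSMutableDictionary dictionary];
    OSBox *box = [[OSBox alloc] init];
    box.title      = title;
    box.pid        = pid;
    box.screenshot = shot;
    boxes[[NSString stringWithFormat:@"%d,%d", boxX, boxY]] = box;  // stored on release
}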

    4.10 Abstract application workflow

Having defined the key algorithms and workflows above, we can now define the abstract workflow of our implementation as shown in Listing 4.4.

Initialization
    Load modules (as described in Chapter 4.5)
End

Main Loop (per frame) onFrame()
    While hand found do
        Move mouse according to normalized hand position
        If hand is in offscreen area AND feedback mode is not none
            Show feedback window
        End
        Measure hand grab strength
        When grab strength crosses 0.8 a gesture is finished
            Get gesture and position of event
            If gesture was grab
                Apply algorithm as described in Fig. 4.6
            End
            If gesture was release
                Apply algorithm as described in Fig. 4.7
                Free resources
            End
        End
    End
End

Listing 4.4: Abstract algorithm of the implementation.


Chapter 5

Experiment - User study

In this chapter we describe the user study we conducted: we present the principles of the experiment itself and thoroughly explain all the phases it consists of, from the set-up to the final phase, the assessment given by the participants. Finally, we briefly present statistical data about our participants.

    5.1 Experimental Set-up

In order to interact with the content and evaluate the interaction, we set up a testing area in which a Mac with a 17″ Retina display was placed on a table, easily accessible by the participant sitting in front of it. In front of the laptop, the Leap Motion controller is placed in such a way that a potential cable twist will not tilt the device forwards or backwards, affecting the user experience. The device is attached to either the left or the right side of the laptop, depending on the participant's dominant hand and their preference.

Figure 5.1: Set-up for left-handed mouse users.

Figure 5.2: Set-up for right-handed mouse users.


5.2 Experiment Procedure

In this section, the procedure of the experiment is discussed and thoroughly analyzed. We conducted a user study to test and analyze Offscreen Interaction. Each user participating in the study followed the instructions given to them, completed the experimental part and finally provided us with an assessment. The procedure followed for each participant consists of the following parts:

Welcome - Brief explanation of interaction - User Learn Mode - Experiment - Demographic information - Assessment

These parts are discussed, interpreted and visualized with appropriate figures below. Each experiment was calculated to last around 25 minutes, plus time for the instructor to explain the concept and for the participant to complete the final assessment, so the whole study requires approximately 40 minutes. The time actually required varied, however, as it depended on how much time the participant spent in the User Learn Mode in order to feel comfortable with the offscreen area. We also logged data other than timings and errors, data that might help us understand potential positioning patterns of windows placed in the offscreen area.

5.2.1 Welcome - Brief explanation of interaction

At the beginning of the experiment the experimenter introduces himself, welcomes the participant and explains their rights. Participants are allowed to withdraw from the experiment at any time, for example if they feel uncomfortable or tired. After that we thank them for participating in and helping with this study, and finally introduce them to its purpose as well as the goals we are trying to achieve.

At that point we start explaining the various aspects of the experiment. We first explain the ways they can interact with the system and the two gestures. Afterwards the three feedback modes are thoroughly presented, together with an in-depth explanation of the offscreen area. Before proceeding to the actual experiment, we make sure that the participant has understood the concepts of the interaction by encouraging them to use the User Learn Mode (5.2.2) in all feedback modes and familiarize themselves before the experiment.

Furthermore, participants were asked to take a close look at the specific applications they would operate on, as our prototype application could take screenshots of most applications but not all. We were thus forced to use specific applications, some of which a participant might not be familiar with.


5.2.2 User Learn Mode

For reasons we explain below (5.2.3), it is vital for the participant to spend a few minutes in the User Learn Mode, where no timings or errors are logged. In this part, the user tries in turn the Full Feedback, Single Feedback and No Feedback modes while receiving a brief explanation of the screen visualizations and their components, where applicable. When participants were ready, they could initiate the experiment by pressing the ESC key.

From this point on, the experimenter no longer interferes with the participant, as on-screen instructions are given.

5.2.3 Experiment

Principles

The experiment itself consists of cycling through the three feedback modes, where in each mode the participant is asked to operate on a random sequence of windows. When the experiment is initiated, the three modes follow the Balanced Latin Square algorithm [52]:

    Participant First Mode Second Mode Third Mode

    1 Full Feedback Single Feedback No Feedback

    2 No Feedback Full Feedback Single Feedback

    3 Single Feedback No Feedback Full Feedback

    Table 5.1: Feedback cycling for the first 3 participants. Afterwards, it repeats itself.

The reason for encouraging participants to spend time in the User Learn Mode is that even-numbered participants (2, 4, 6, 8) would have to conduct the experiment starting with the No Feedback mode, a mode that is considered hard, especially if one has not understood the positions of the boxes in the offscreen area.

The same behavior is applied to the window list. The windows to operate on are stored in an 8x8 structure where each row contains window titles following the Balanced Latin Square algorithm, as shown in Table 5.2 (a generating sketch follows the table).

    # Window Titles

    1 Wnd 1 Wnd 2 Wnd 8 Wnd 3 Wnd 7 Wnd 4 Wnd 6 Wnd 5

    2 Wnd 2 Wnd 3 Wnd 1 Wnd 4 Wnd 8 Wnd 5 Wnd 7 Wnd 6

    3 Wnd 3 Wnd 4 Wnd 2 Wnd 5 Wnd 1 Wnd 6 Wnd 8 Wnd 7

    4 Wnd 4 Wnd 5 Wnd 3 Wnd 6 Wnd 2 Wnd 7 Wnd 1 Wnd 8

    Table 5.2: Sample Balanced Latin Square algorithm for windows
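For reference, a compact sketch of the standard balanced Latin square construction; for n = 8 it reproduces the window orders of Table 5.2 row by row, and the mode ordering of Table 5.1 follows the same idea (the function name is ours, not the prototype's):

// Row r (0-based) of a balanced Latin square over n items (n even),
// returned as 1-based item indices, e.g. row 0 for n = 8 is 1 2 8 3 7 4 6 5.
static NSArray *balancedLatinSquareRow(NSUInteger n, NSUInteger r)
{
    NSMutableArray *row = [NSMutableArray arrayWithCapacity:n];
    for (NSUInteger c = 0; c < n; c++) {
        // Column offsets follow the pattern 0, 1, n-1, 2, n-2, 3, ...
        NSUInteger offset = (c % 2 == 1) ? (c + 1) / 2 : (n - c / 2) % n;
        [row addObject:@((r + offset) % n + 1)];
    }
    return row;
}

Each new participant simply receives the next row, which is why the window order in Table 5.2 shifts by one window from row to row.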

When the experiment starts, upon pressing the ESC key, a row is picked randomly and the participant is asked to move all windows cognitively into the offscreen area in any order they want. Simultaneously, they get a notification with the number of windows that are still inscreen and their titles (fig. 5.3). In order to allow participants to pick windows of their choice, all windows are shown, randomly resized and then shuffled. This ensures that windows overlap, so a participant can grab partially visible windows if they wish. Fig. 5.5 demonstrates this concept. When no windows are left visible on the desktop, the prototype asks the participant to show a specific window (fig. 5.4), based on the previously picked row of the window order. At any time the participant can ask the system which window was requested by pressing the ESC key.

Figure 5.3: Notification with the number and titles of the remaining windows.

Figure 5.4: Notification showing the next window to be fetched from offscreen.

Figure 5.5: Windows after shuffling.

To successfully complete a trial, the participant has to grab the requested application from the offscreen area and then release it in the inscreen area. Two errors can happen:

Wrong window. The participant grabbed a window other than the one requested; this error is visualized by a notification only after a release.

Nothing. The participant performed a grab gesture in a box that does not contain a window. This happens when one makes a gesture in a box which is now empty because its window has already become visible in a previous trial. The error notification is visualized immediately.

Experiment Windows List

The following applications were chosen for the participants to operate on.

(a) Chrome (b) Firefox (c) Eclipse (d) Sublime (e) Xcode (f) Pages (g) Skype (h) Calendar

Figure 5.6: Selected windows for the experiment.

We deliberately chose two programs for software development (Xcode, Eclipse), two for Internet browsing (Firefox, Chrome) and two for text editing (Sublime, Pages), because their functionality is similar, enabling us to look for patterns of grouping applications into sequential boxes in the offscreen area. We also had to narrow down our options due to the screenshot limitation described previously.

5.2.4 Demographic Information

The Demographic Information form is used to collect non-crucial personal information about the participant. Information such as name, surname and address is not asked for, to avoid any confidentiality issues. The information to be filled in is shown in Table 5.3.

Age        Based on year of birth
Gender     Male or female
Job Title  Participant's work area

Table 5.3: Demographic information

We then ask the participants to state their experience with MacOSX and especially whether they use Mission Control to navigate between windows. Finally we ask them to state how many different tasks they usually perform and to rate their experience with computers. The rating is based on a 5-point Likert scale, from Highly Inexperienced (1) to Highly Experienced (5).

This form can give us clues as to whether the factors listed below introduce a potentially large variation in the error and time variables during the experiment. In particular:

1. Familiarity with MacOSX

2. Usage of Mission Control

3. Number of tasks

4. Computer experience

The Demographic form, as a subsection of the Assessment form, can be found in Appendix B.

5.2.5 Assessment

Having completed all trials in all feedback modes, participants were instructed to fill in a questionnaire on a five-point Likert scale, where they rate and optionally comment on each of the feedback modes they encountered. The questionnaire (Appendix C) was divided into three parts, one for each feedback mode. In particular, the participants had to provide an assessment for the following types of feedback:

1) Full feedback, 2) Single feedback, 3) No feedback

The assessment has the same structure regardless of the feedback mode. Each assessment consists of 10 questions, which are explained below along with their importance.

1. Grabbing windows during the experiment was: The participant had to rate how easily they could grab a window in the specific feedback mode. This question helps us identify whether the grab gesture affects question #5.

2. Releasing windows during the experiment was: The participant had to rate how easily they could release a window in the specific feedback mode. This question helps us identify whether the release gesture affects question #6.

3. Mouse movement smoothness during the experiment was: The participant had to rate the smoothness during the experiment. Smoothness is a subjective factor, and with this question we can assess whether the smoothness of the mouse while moving the hand inscreen interferes with selecting the intended window.

4. Dividing the offscreen area cognitively was: The participant had to rate how difficult it was to imagine the offscreen area and split it into boxes. The importance of this question is that it lets us assess whether splitting the offscreen area into two rows is a feasible solution.

5. Grabbing (offscreen) and releasing (inscreen) the correct window during the experiment was: The participant had to rate how difficult it was to perform an operation from offscreen to inscreen.

6. It was easy to release (offscreen) in the area I wanted: The participant had to rate how difficult it was to target the box they wanted in the offscreen area.

Questions 5 and 6 are of high importance since they give us feedback about the nature of the interaction.


7. Arm fatigue: Level of fatigue in the arm.

8. Wrist fatigue: Level of fatigue in the wrist.

Questions 7 and 8 help us understand whether we should look for alternative ways of interaction in Future Research (Section 8.2).

9. General comfort: We ask the participant for their general impression regarding comfort. There is of course a correlation between this factor and the arm and wrist fatigue factors.

10. Overall, the interaction was: We measure the general impression of the feedback mode, on a usability scale.

Finally, at the bottom of each page there is an area designed to allow participants to comment on the interaction, list advantages and disadvantages they may have encountered, and share any thoughts on improvements. This section was not required; however, we extracted some very welcome feedback from it.

5.3 Participants

We recruited 9 participants for our study, ranging from 23 to 45 years old. We extracted the following information from their demographic forms.

As we can see from Figure 5.7a, most participants are between 26 and 30 years old. We also tried to recruit older and younger people, who would hopefully behave in interestingly different ways in the experiment. We managed to recruit 2 participants in the 41-45 range and one below 25 years old.

Figure 5.7: (a) Distribution of participants' age. (b) Distribution of participants' gender.

Participants were mostly male (56%), as can be observed in Figure 5.7b; to be precise, 5 participants were male and 4 female (44%). Four out of nine work as department managers and three were students/developers. All of them indicated that they had prior experience with a Mac, even if they do not own one. Figures 5.8a and 5.8b illustrate that all of our users are within one level of each other in their rated computer experience, and that 56% use Mission Control to switch between tasks, invoking it one way or another (mouse pad gesture, pressing the F3 key).

Finally, in Figure 5.8c we observe that the majority of participants interact with 3-5 tasks in their daily routine, whilst only one performs more than 5. However, this only


Figure 5.8: (a) Distribution of participants' experience. (b) Distribution of Mission Control usage. (c) Distribution of simultaneous tasks.

implies active interaction with these tasks; it is also possible that windows composing other tasks are open but not being used (inactive).


Chapter 6

    Results

This chapter presents the results we collected from the user study of Offscreen Interaction. We decided to focus on two basic categories, which include the most significant information we collected and analyzed: Completion Time and Error Rate. We also present and illustrate the mean values for all parts of the assessment discussed in Section 5.2.5, and we refer to the feedback modes of the trials with the abbreviations presented in Table 6.1.

    Feedback mode Abbreviation

    Full FF

    Single SF

    None NF

    Table 6.1: Abbreviations for the feedback modes

    6.1 Completion Time

Completion time is a parameter that counts the time a participant spent from the moment they were asked to show a window until they actually showed that window. That is, the timer is started by the system notification which informs the participant of the requested window title, and it stops when this window is shown on screen. When the correct window is shown, a new system notification is triggered and the participant proceeds with the next timed interaction.

We applied the ANalysis Of VAriance (ANOVA) [4] statistical model to the aggregated time values. We used ANOVA to determine whether the mean completion times per feedback mode are statistically different. We could extract basic information about the means simply by comparing them, but we want to know how these differences in the mean values affect our results and whether they are significant.

Using repeated measures ANOVA we found a significant main effect of feedback mode (F(1.064, 8.512) = 9.905, p …
type of feedback mode. As we can see, and as already stated, the mean differences between Full/Single Feedback and No Feedback are really significant.

Feedback mode (i)  Feedback mode (j)  Mean Difference (i-j)  Sig.
FF                 SF                 -0.056                 1
FF                 NF                 -0.611

behavior, as this is strengthened by the log file entries (Listing 6.1). The final result of the grouped applications is shown in Figure 6.2.

Figure 6.2: Participant 1's grouping.

To make the log entries presented below meaningful, we explain them with comments:

    1:C:Chrome :(2 ,0) //Put Chrome in box (2,0)

    1:C:Skype :(0,0) //Put Skype in box (0,0)

    1:C:Sublime Text 2:(0 ,1) //Put Sublime Text 2 in box (0,1)

    1:C:eclipse :(1,1) //Put eclipse in box (1,1)

    1:C:Pages :(2,1) //Put Pages in box (2,1)

    1:C:Firefox :(3,0) //Put Firefox in box (3,0)

    1:C:Calendar :(2,1) //Put Calendar in box (2,1), Pages pops out

    1:C:Xcode :(0,0) //Put Xcode in box (0,0), Skype pops out

    1:C:Pages :(0,0) //Put Pages in box (0,0), Xcode pops out

    1:C:Xcode :(1,0) //Put Xcode in box (1,0)

    1:C:Skype :(3,1) //Put Skype in box (3,1)

    Listing 6.1: Log entries for Participant 1, NF mode.

Participant 2, on the other hand, followed a different methodology: they filled the bottom row first and then the upper row. We were unable to identify whether positioning in the pattern shown in Figure 6.3 was accidental or deliberate, as there are no pop-outs (Listing 6.2).

    2:C:Pages :(0,0)

    2:C:Chrome :(1 ,0)

    2:C:Skype :(2,0)

    2:C:eclipse :(3,0)

    2:C:Sublime Text 2:(0 ,1)

    2:C:Firefox :(1,1)

    2:C:Calendar :(2,1)

    2:C:Xcode :(3,1)

    Listing 6.2: Log entries for Participant 2, NF mode.


Figure 6.3: Participant 2's potential grouping.

6.4 Subjective Preferences

In Figure 6.4 the mean values for all parts of the assessment presented in Section 5.2.5 are illustrated. In each diagram the vertical axis represents the mean value of the corresponding category, ranging from 1 to 5 (5-point Likert scale), whilst the X axis represents the mode. In general, higher values mean that participants rated the specific category positively.

FF mode repeatedly scored well across all categories, better than any other mode. Precisely, Full Feedback (blue) was ranked highest for the Overall Interaction Experience (3.66 / 5, Figure 6.4h) and for General Comfort (3.11 / 5, Figure 6.4g). Most importantly, however, NF (orange) performed considerably worse than the other modes across all categories except wrist fatigue (Figure 6.4d), a category in which all modes performed mediocrely, with STDmean = 0.231. Arm fatigue (Figure 6.4c), especially in NF, indicates a possible future research topic aimed at diminishing this effect. Finally, grabbing from offscreen/inscreen and releasing inscreen/offscreen respectively, as well as smoothness, which behaves uniformly (mean = 3.92, STDmean = 0.16), were rated highly in all modes (Figures 6.4e, 6.4f, 6.4a), which suggests that our prototype application is well structured and responds well to hand movement and gesture identification.

Figure 6.4: Mean values assessed by participants for each feedback mode: (a) smoothness, (b) cognitive division of the offscreen area, (c) arm fatigue, (d) wrist fatigue, (e) ease of grabbing offscreen and releasing inscreen, (f) ease of grabbing inscreen and releasing offscreen, (g) general comfort, (h) overall interaction experience.


Chapter 7

    Discussion

We conducted a study and presented the results; now we need to discuss and interpret them. What glues the experiment (Chapter 5) and the results (Chapter 6) together are the hypotheses we made, so we commence the discussion by stating the hypotheses and discussing whether they hold.

(H1) Visual feedback (Full and Single) significantly outperforms No Feedback in terms of task completion time.

In Section 6.1 we stated that the No Feedback mode is the slowest mode, being roughly 15 times and 6 times slower than the Full and Single modes respectively. In Section 6.3 we presented two cases that achieved really fast completion times in the No Feedback mode. Although their performance was remarkable, their times in No Feedback mode outperformed only one participant's completion time in Single Feedback; that participant performed so badly in all modes (e.g. 10 times slower in Single Feedback than the rest) that they are considered an outlier. We believe that although H1 still holds, future research is required to strengthen this belief.

(H2) Error rates in No Feedback are significantly higher than with any visual feedback.

In Chapter 6, Section 6.2, we noted that error rates in the No Feedback mode were huge compared to the modes that give visual information. This was expected, as in the No Feedback mode users have no indication of where in the offscreen area their hand is, or whether the requested window is under a box or the box is empty. Although applying patterns during the positioning phase (grouping tasks and a positioning methodology) affected completion time, the error rate is still high. Even if the error rate consists only of the Nothing error (5.2.3), this is still a form of error, and we therefore conclude that H2 holds.

(H3) Task completion time in No Feedback mode can be improved by using grouping or special spatial positioning.

We analyzed the results in Section 6.3, where we presented two cases varying significantly from the others. Indeed, our analysis of the data validates H3, as two participants managed to improve completion time (best = 3.73 s) by a factor of 5 compared to the fastest of the remaining participants, who achieved 15.13 s.

Taking into consideration the hypotheses discussed above, we conclude that all of them hold. It is, though, imperative to discuss the influence of H3 on the No Feedback mode before we can conclude.

Since our window list was chosen so as to allow grouping of windows, and strengthened by our findings, we could assume directly from H3 that if there is a correlation between tasks, systems that provide no feedback might perform not significantly worse than systems with visual feedback. This assumption, though, does not take into account error rates or longer-lasting usage of a No Feedback system, which, due to high arm fatigue (Fig. 6.4c), would significantly increase the error rate and user frustration, resulting in an unusable system with slow completion times that is prone to errors.

In the additional assessment users provided (Appendix C), we asked them to put the three different modes in ascending order of which was easiest to use. Users overwhelmingly rejected the No Feedback mode as the most tiring and least effective to use, which clearly depicts user preferences. Observing the results we obtained for the error and completion time parameters, we see that No Feedback is outperformed by all other modes on all measured parameters. The results indicate no significant difference between Full feedback and Single feedback, but Full feedback is nonetheless slightly more efficient.

Our findings contradict the findings of Gustafson et al. [17], who developed and studied a viable system without any visual feedback. This contradiction is explained by the fact that our interaction is a mixed interaction between a computer screen and the spatial space around it, whilst their system interacts only in the spatial space. Similarly, Po et al. [45] state, and we quote:

Our findings suggest that pointing without visual feedback is a potentially viable technique for spatial interaction with large-screen display environments that should be seriously considered.

We would like to differentiate our study from studies of large-screen displays, as our mechanics do not apply to such displays, a research area that we address in the section below.

After stating the above observations and combining H1, H2 and H3, we conclude that a system for interacting between desktop windows and the offscreen space can a) be achieved seamlessly as long as some form of visual feedback is given to the user, and b) that the existence of some (subjective) form of window grouping or positioning methodology in the offscreen area decreases the interaction time, but not the error rate, when no visual feedback is provided.


Chapter 8

    Conclusion - Future Work

This thesis has presented an alternative 2.5D interaction technique for task-switching operations on the MacOSX operating system using an external, portable device for gesture recognition. In this chapter we present our conclusions based on our findings, as well as research areas that follow from them or areas that we did not research but believe should be.

    8.1 Conclusion

In this study we proposed Offscreen Interaction, an alternative interaction technique for manipulating (hiding and showing) open windows, which provides an effective task-switching paradigm by placing and retrieving tasks in the spatial space around a computer screen. This spatial space is defined by the cone-shaped area in which the Leap Motion Controller, a depth-sensing device, operates and allows us to obtain information about hands and fingers. The main objectives were to investigate the role of visual feedback, how its absence or presence affects the interaction, and to examine whether grouping windows improves the interaction in the absence of visual feedback. Current approaches and implementations were presented at the very start of this thesis, along with the initial problem that motivated us to build this new technique for switching between windows in a MacOSX environment. Relevant work and research already conducted in the same or related fields was presented afterwards. We separated it into themes, although there is no clear distinction, and explained how our approach differs.

We then presented the design of our implementation along with important concepts, and thoroughly explained our choices by grounding them in recommendations from previous work. To test and study the offscreen interaction technique, we developed a prototype application. The implementation, architecture, description of the development frameworks, workflows and hardware specifications of the prototype application were presented in a later chapter. We continued by analyzing


the procedure and the context of the user study. We had 9 participants taking part in this study. Each participant had to go through the verbal instructions, fill in the Demographic Information form, familiarize themselves with the interaction through the User Learn Mode (5.2.2), complete the three feedback modes (3.4) of the experiment consisting of 8 open windows, and finally fill in an assessment form for each of the 3 different modes. We then presented statistical information regarding the demographics of the participants who took part in our study.

We then presented the computed results for the two basic parameters, completion time and error rate, stated our observations from the log file regarding positioning in the offscreen area, and discussed their significance. The results and the arguments for the validity of the hypotheses were also discussed.

Finally, we provided evidence supporting that all the hypotheses hold, and concluded that a) a system for interacting between desktop windows and the offscreen space is feasible given that the user receives some form of feedback, and b) no feedback is a feasible solution for