Where Have I Been - Visualizing Personal Geolocation Data
Jorge Miguel Saldanha Filipe
Thesis to obtain the Master of Science Degree in
Information Systems and Computer Engineering
Supervisor: Prof. Daniel Jorge Viegas Gonçalves
Examination Committee
Chairperson: Prof. Miguel Nuno Dias Alves Pupo Correia
Supervisor: Prof. Daniel Jorge Viegas Gonçalves
Member of the Committee: Prof. João Manuel Brisson Lopes
November 2015
For my parents
Acknowledgments
This journey would not have been possible without the support of my family, professors and friends.
To my family, thank you for encouraging me in all of my pursuits and inspiring me to follow my dreams.
I am especially grateful to my parents, who supported me emotionally and financially.
I take this opportunity to express my profound gratitude and deep regards to my supervisor, Daniel Jorge Viegas Gonçalves, for his exemplary guidance, monitoring and constant encouragement throughout the course of this thesis.
To my friends, thank you for listening, offering me advice, and supporting me through this entire
process.
Resumo
With the large number of portable devices that exist nowadays capable of gathering GPS information, large volumes of data are produced. Despite the fact that people track their mobility, most approaches to the analysis of spatio-temporal data are very complex and technical. Thus, on the one hand, it is very easy and common to collect spatio-temporal data; on the other, it is difficult to analyze that data and extract personally relevant findings.
To make the analysis of personal geolocation data easier, we devised a visual language for viewing and querying the data, including support for the personal semantics of places. This visual language was validated, before we continued developing the system, to see whether users could use and understand it.
After validation, we implemented a system that integrates our visual language with result visualization and map interaction. The evaluation showed that people could use and understand the whole system, making it clear that the initial objectives were achieved.
Keywords: movement data, spatio-temporal data, scalable visualization, geovisualization, personal semantics
Abstract
With the large number of portable devices in use nowadays that are capable of collecting GPS data, large volumes of data are being produced. Despite the fact that people track their mobility, most approaches to the analysis of spatio-temporal data are too complex and technical. So, on the one hand, it is very easy and common to collect spatio-temporal data; on the other, it is difficult to analyze this data and extract personally relevant insights.
To make the analysis of personal geolocation data easier, we devised a visual language for accessing and querying that data, including support for the personal semantics of locations. This visual language was validated, before we proceeded to develop the system, to verify that users could use and understand it.
We then implemented a system that integrated our visual language with result display and map interaction. An evaluation showed people could use and understand the whole system, confirming that we had achieved our initial objectives.
Keywords: movement data, spatio-temporal data, scalable visualization, geovisualization, personal semantics
Contents
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Resumo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi
List of Listings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
1 Introduction 1
1.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Related Work 5
2.1 Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.1 Visits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.2 AllAboard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.3 QS Spiral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.4 ST-TrajVis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.5 LifeLines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.6 Microsoft GeoFlow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.7 Geotime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.8 Visual Mobility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.9 Visual analytics of movement: an overview of methods, tools and procedures . . . 12
2.1.10 AprilZero Sport . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.11 Google Maps Timeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.12 TrajRank: Exploring Travel Behaviour on a Route by Trajectory Ranking . . . . . . 15
2.1.13 Generic mapping applications and tools . . . . . . . . . . . . . . . . . . . . . . . . 16
2.1.14 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 Specifying Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2.1 Cigales, Sketch! and Lvis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2.2 The Challenges of Specifying Intervals and Absences in Temporal Queries: A
Graphical Language Approach (TQ:AGLA) . . . . . . . . . . . . . . . . . . . . . . 18
2.2.3 VizPattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.4 TaxiVis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.5 GeoDec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.6 Time Automaton - a visual mechanism for temporal querying . . . . . . . . . . . . 22
2.2.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3 Conceptual Frameworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3.1 Towards a general theory of action and time . . . . . . . . . . . . . . . . . . . . . . 23
2.3.2 Topological relationships between complex spatial objects . . . . . . . . . . . . . . 23
2.3.3 Triad Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3.4 Conceptual Framework and Taxonomy of Techniques for Analyzing Movement . . 24
2.3.5 Visual Analytics for Analysis of Movement Data . . . . . . . . . . . . . . . . . . . . 25
2.3.6 STNexus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.4 List of requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3 Data Collection 29
3.1 Which data to collect? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1.1 GPS tracks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1.2 Semantic meaning and annotations . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2 Real data problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2.1 Problems related to GPS tracks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2.2 Problems related to semantic meaning and annotations . . . . . . . . . . . . . . . 34
3.2.3 Common to both . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3 Solving real data problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3.1 Dataset size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3.2 GPS Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.3.3 Forgotten Start/End . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3.4 Loss of signal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3.5 Battery Requirement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.3.6 Multiple meanings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.3.7 Forgetting something . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.3.8 Keeping up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.3.9 Privacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.3.10 User’s Burden . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4 Visual Queries 43
4.1 Expressiveness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.2 Visual language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.3 Visual language examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.4 Visual language validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5 Where Have I Been 55
5.1 GPX Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.2 Semantic Annotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.3 Backend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.3.1 Query translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.4 Frontend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.4.1 User interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.4.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6 Evaluation 75
6.1 Experimental protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.1.1 User profile questionnaire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.1.2 Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.1.3 Overall questionnaire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.2.1 Results by task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.2.2 Results by feature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.2.3 Questionnaire results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
7 Conclusions 85
7.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Bibliography 87
A Test protocols 91
A.1 First test protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
A.2 Final tests protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
B Questionnaires 99
B.1 User profile questionnaire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
B.2 Overview questionnaire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
B.2.1 Part One . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
B.2.2 Part Two . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
C GPX Library Documentation 102
D Evaluation results 105
D.1 User profile results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
D.2 Task results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
D.3 Overall questionnaire results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
List of Tables
2.1 Visualization Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 Specifying Queries Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.1 Spatio-Temporal classification for personal geolocation data . . . . . . . . . . . . . . . . . 45
4.2 Average number of errors made by task . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.1 Summary of collected data during 5 years . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.1 Results by task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.2 SUS summarized . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
List of Figures
2.1 Visits interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 AllAboard panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 QS Spiral interactive visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4 ST-TrajVis interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.5 LifeLines interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.6 GeoFlow visualization of U.S. power stations . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.7 Screenshot of GeoTime prototype in calendar mode (Linked time chart) showing recent
events within a smaller local area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.8 Visual Mobility UI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.9 AprilZero Sport interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.10 Google Maps Timeline interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.11 TrajRank interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.12 Cigales interface and query visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.13 LVis evolution operators between surfacic objects . . . . . . . . . . . . . . . . . . . . . . . 18
2.14 Query representing “stroke occurs during Drug A” . . . . . . . . . . . . . . . . . . . . 19
2.15 VizPattern workspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.16 TaxiVis workspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.17 User interface of GeoDec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.18 Spatial visualization of bus stops’ activity in a weekend . . . . . . . . . . . . . . . . . . . . 22
2.19 Temporal topological relationships (image from [34], original idea from [27]) . . . . . . . . 23
2.20 4 of the 82 topological relationships between two complex lines . . . . . . . . . . . . . . . 24
2.21 The interface of the interactive time filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.1 Proposal for the semantic context file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2 Common GPS problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 Comparison between an original gpx track (red) and one with RDP (green) . . . . . . . . 35
3.4 Comparison between GPX simplification algorithms . . . . . . . . . . . . . . . . . . . . . 35
3.5 Standard RDP problem explanation 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.6 Standard RDP problem explanation 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.7 Comparison between an original gpx track and one after applying the smoothing algorithm 38
3.8 Comparison between original gpx track and one after applying smoothing algorithm . . . 39
3.9 Comparison between an original gpx track and one after applying the interpolation algorithm 40
4.1 Where Have I Been query area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.2 Editing a start time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.3 Editing a temporal range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.4 Editing a duration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.5 Applying recurrence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.6 Editing a spatial range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.7 Adding a comparison route . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.8 Visual query demonstrating range of time . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.9 Visual query demonstrating absolute time . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.10 Visual query demonstrating time relative to another event . . . . . . . . . . . . . . . . . . 50
4.11 Visual query to specify spatial fuzziness . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.12 Visual query demonstrating spatial accuracy . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.13 Places in relation to other ones - simplified version . . . . . . . . . . . . . . . . . . . . . . 51
4.14 An exact place . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.15 A route within a certain distance from a baseline . . . . . . . . . . . . . . . . . . . . . . . 51
4.16 Visual query demonstrating ending times . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.17 Visual query demonstrating duration constraints and class of places . . . . . . . . . . . . 52
5.1 Solution schema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.2 Data collection edition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.3 Database Entity-Relationship diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.4 Query illustrating SQL translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.5 Aggregated result with quartiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.6 Where Have I Been user interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.7 Where Have I Been query area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.8 Editing a start time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.9 Editing a temporal range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.10 Editing a duration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.11 Editing a spatial range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.12 Where Have I Been search area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.13 Removing a query element . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.14 Results reuse and lock options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.15 Results panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.16 Where Have I Been settings interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.17 Interface input widgets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.18 Fuzziness interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.19 Disaggregated results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.20 Result aggregation example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.1 Clicks by task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.2 Time by task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
D.1 Users’ gender summarized . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
D.2 Users’ age summarized . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
D.3 Users’ studies summarized . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
D.4 Users degree summarized . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
D.5 SUS summarized . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
D.6 Global appreciation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
D.7 Smartphone possession . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
D.8 GPS usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
D.9 GPS usage situations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
D.10 Would users track location . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
D.11 User concerns on tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
Listings
3.1 Tkrajina implementation of RDP with temporal modification . . . . . . . . . . . . . . . . . 36
5.1 SQL translation example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Chapter 1
Introduction
Lately, with the massive spread of GPS-tracking devices, such as smartphones, smartwatches, tablets and dedicated devices, huge volumes of data are produced, representing not only the mobility of people, but also animal behavior or natural phenomena. Because of this massive collection of data, it is possible to say that, at some point, everyone is a spatio-temporal analyst [1], whether by planning a journey, looking for a job or searching for restaurants. With the advance of technology, people tend to plan their lives more and more: finding the days of the week that will be rainy, understanding land prices in order to build a house, or tracking epidemic contagion.
GIS (Geographic Information Systems) applications are tools that allow users to manipulate and
analyze this information. Common features include the creation of interactive queries, analysis of spatial
information, editing data in maps, and the presentation of the results of all these operations. These
tools are, however, not designed for the general public: they require some programming or expert-level
knowledge.
Despite all those features and the fact that people track their mobility, most approaches to spatio-temporal data, including GIS systems, do not consider the personal side of the data, that is, data that has its own meaning for the user (either spatial, like "my home" or "the kids' school", or temporal, like "my birthday"). There are, however, some exceptions worth mentioning: spatio-temporal location in social networks and lifelogging.
The integration between spatio-temporal location and social networks is one example involving personal data. There has been considerable interest in applying social network analysis methods to geographically embedded networks [2] such as population migration, international trade, population health behaviors, information dissemination, or human behavior in a crisis [3] [4] [5].
Lifelogging, a practice in which people track personal data generated by their own behavioral activities (like exercising, sleeping, and eating), is a growing field. Many people are interested in it; there are even enthusiasts who have created websites to show their own personal logging information. But this only shows that there is no global solution users can benefit from: those who can, implement their own ad-hoc solutions, which are often only valid for a specific geographical situation. Furthermore, these solutions tend to ignore the results of research in the GIS area.
Personal spatio-temporal data involves geographical space, time and human behavior. Several challenges arise from this complexity. However, it also enables the use of such data for different purposes: to study the properties of space and places, to understand the dynamics of personal events, to recreate and find patterns in human behavior, and so on [6]. A negative side of collecting personal individual data is the growing threat to personal privacy [7] [8].
So, on the one hand, it is very easy and common to collect spatio-temporal data; on the other, it is difficult to analyze this data and extract personally relevant insights. Most importantly, there is a lack of tools capable of visualizing and querying personal geo-temporal information.
With such tools, users could find information and patterns that really are personally important to them. For example, one of the main benefits for users is the ability to gain insights about their personal location data. Users can ask about places they have been, and find out the places where they spend the most time or go most often. Another feature that differentiates this solution is the possibility to query information using a user's own semantic meaning; namely, one could ask how often he went to his mom's house. By using these insights, a user could also make more conscious traveling decisions. One could compare the time wasted during rush hour with the time spent at times of less traffic, and use this information to optimize future decisions. Another interesting outcome is the ability to optimize traveling distances by comparing the similar routes a user takes to get to the same place; by doing this analysis a user could save money and fuel.
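As a purely illustrative sketch of that route-comparison idea (the helper functions and sample coordinates below are hypothetical, and not part of the system described in later chapters), the length of two recorded tracks between the same two places can be totaled with the haversine formula and compared:

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def route_length_km(points):
    """Sum of segment distances along a GPS track."""
    return sum(haversine_km(a, b) for a, b in zip(points, points[1:]))

# Two hypothetical tracks between the same two places in Lisbon:
# route A roughly follows the straight line, route B detours south.
route_a = [(38.7369, -9.1427), (38.7400, -9.1500), (38.7436, -9.1602)]
route_b = [(38.7369, -9.1427), (38.7300, -9.1550), (38.7436, -9.1602)]

shorter = "A" if route_length_km(route_a) < route_length_km(route_b) else "B"
print(shorter)  # route A is shorter in this example
```

The same comparison, applied to a user's own repeated trips, is what would let the tool suggest the cheaper route.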
Since a simple, specific tool that does not require expert-level knowledge from its users is missing, we propose to fill that gap. Our proposal is a tool able to query personal geolocation data in order to find specific routes, including the personal semantics of the locations. To achieve that, we must first identify the different data inputs we need to process and avoid the direct use of the SQL language in the querying system, creating a visual language instead, so the user can deal with it with ease. After that, we will create a system based on that language and test it with users.
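To make the gap concrete, the following sketch (in Python, with an illustrative table layout that is not the actual schema used later in this work) shows the kind of raw SQL a non-expert would otherwise need to write just to ask "when was I near home during a given week?":

```python
import sqlite3

# Illustrative schema: one row per GPS fix (names are assumptions,
# not the database schema described in Chapter 5).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (lat REAL, lon REAL, ts TEXT)")
conn.executemany(
    "INSERT INTO points VALUES (?, ?, ?)",
    [(38.7369, -9.1427, "2015-06-01 09:12:00"),   # near "home"
     (38.7372, -9.1431, "2015-06-01 18:40:00"),   # near "home"
     (40.4168, -3.7038, "2015-06-03 12:00:00")])  # elsewhere

# "When was I near home (within ~0.001 degrees) in the first week of June?"
rows = conn.execute("""
    SELECT ts FROM points
    WHERE ABS(lat - 38.7370) < 0.001
      AND ABS(lon - (-9.1429)) < 0.001
      AND ts BETWEEN '2015-06-01' AND '2015-06-08'
    ORDER BY ts""").fetchall()
print([r[0] for r in rows])
```

Expressing the place, the tolerance and the time window directly in SQL like this is exactly the burden the visual language is meant to remove.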
In order to be easily understandable, we believe a query must be visually represented. We will study how to create a visual language that fully supports querying geo-temporal data as well as user-assigned semantics. There is also an obvious relation between geo-temporal data and maps, so we decided to integrate maps with the query system, allowing the user both to directly select locations and to see the query results on a map. Because the language must be understood and used by users, we have to validate it with them before proceeding to the actual implementation.
After defining the visual language, we also need to know how we can offer the user a good interface for the results of a query. Since the results can extend over a large period of time, a user must be able to quickly grasp an overview of all the data. We will, therefore, study how to present the results. For both result presentation and query language definition we will extensively review the related literature, so that we can perceive existing patterns and ideas that may be applicable to our solution, as well as existing problems faced by solutions dealing with geo-spatial data.
Lastly, we evaluate the system as a whole to understand to what extent it is easy to use. There is particular interest in evaluating how users deal with the query interface, because it is the most relevant part of the solution.
1.1 Objectives
We focus only on showing the data of one person; this is not a tool related to crowd-sourcing. Despite being the data of only one person, this data can be extensive and span many years of life. There is also no concern about how the user gathers his or her data. One of the things a user should be able to do is gain insight into his or her personal mobility information and perceive patterns in daily life. Thus, our main objective is:
Create a visual query language for personal mobility data that allows users to visually query their personal spatio-temporal information, with personal semantics.
Several secondary goals arise from the analysis of the main goal. We will discuss those goals below.
One of the objectives is to analyze and study the expressiveness of the queries we want our system to support. It is essential to understand what information a user might want to know about mobility data and then translate that need into a systematized document. We will also need to analyze the best way to visually represent the resulting information (that is, the information a user wants to see about a specific track: duration, average speed, etc.). A challenge is choosing the best way to represent this information: bar chart, pie chart, etc. Perhaps the most important issue is to analyze the best way to visually query the information, that is, the query interface. It is important that this interface has enough expressiveness for a non-expert user, and that no SQL queries have to be written.
Since we are dealing with personal data, we also have the concern of privacy. There is the need to identify which information can be extracted from the various types of data [8] and to preserve the privacy of that data.
1.2 Contributions
Below we enumerate the main contributions of our work.
• Visual language expressiveness: a table showing all the studied outcomes when dealing with
personal spatial and temporal data, regarding accuracy, relativity and concreteness.
• Visual language specification: a written specification of a visual language to query personal spatio-temporal data, based on the previously mentioned table.
• Personal spatio-temporal visual query system: a browser-based system that implements the visual language described above. This system also includes viewing the results. It is available at GitHub1
• GPX Library to organize and clean GPX files: A library made to process input data, in order to
increase data quality. It is open source and available at GitHub2
1 https://github.com/jmsfilipe/where-have-i-been-frontend accessed: 09-09-2015
2 https://github.com/jmsfilipe/where-have-i-been-lib accessed: 09-09-2015
• A paper for the Special Issue "The examined life: Personal uses for personal data" of the HCI Journal
1.3 Structure
In the next chapter we will discuss several published works in three different areas: visualization of information, specifying queries, and conceptual frameworks. Then, in Chapter 3, we will analyze known problems regarding data collection and address ways to solve them. In Chapter 4 we describe our visual language, stating its expressiveness and giving examples of its usage along with the validation results. Chapter 5 describes the system implementation, covering the backend and frontend. In Chapter 6 we present the results of evaluating our solution with users. Finally, we conclude in Chapter 7 with a brief summary of our work, suggesting some guidelines for future research.
Chapter 2
Related Work
In this section we will analyze existing works regarding visualization (of personal and other data), specifying queries and conceptual frameworks. The section is divided into three subsections. In Visualization we will discuss works that aim to visualize personal geo-temporal data. Then, in the Specifying Queries section, relevant works that focus on visual queries for spatio-temporal data will be analyzed. Finally, in the last section, Conceptual Frameworks, we will show some frameworks that help provide an abstraction when manipulating spatial and temporal data.
2.1 Visualization
This work aims to provide an interface where users can view their personal mobility data. Thus, in this section we will look at applications that provide a means of visualizing information that is somehow relevant to this work.
2.1.1 Visits
Many apps and services exist that collect location data either continuously or based on check-ins. However, this data does not reflect the way people remember their trips. Human memory captures trips as narrative-like sequences of events.
Visits 1 creates a visualization of automatically collected spatio-temporal data that reflects current
knowledge about how people naturally remember autobiographical episodes, such as their journeys [9].
They developed map-timelines, a visualization technique that integrates temporal and spatial infor-
mation to display histories of trips as a series of visited places.
This concept can also be applied to other types of spatial-temporal data, such as historical records
of famous journeys, or city development.
The goal of this approach is to support the identification of: the chronological order of stays; repeated stays at the same place; and the duration of stays, while preserving the fine-grained location information of the data.
1http://v.isits.in/ accessed: 14-10-2015
The interface consists of two items, the centrally placed horizontal map-timeline, and the overview
map in the lower left corner (figure 2.1).
The map regions - showing visited places - that are visible in the map timeline above are also shown
as circles in the overview map. Curves connect these overview circles with the corresponding segment
in the map timeline.
The data to be analyzed can come from three different sources: Flickr Collections, Google Maps
history, and OpenPaths2 history.
Figure 2.1: Visits interface
One of the advantages of Visits is the way information is represented. By using a timeline, events are chronologically ordered and easy for a user to locate in time. A major drawback is the problem of scalability: if the amount of events to be shown is too large, the timeline becomes too long and the circles become too small to be visible.
2.1.2 AllAboard
With the high level of penetration of cell phones it is now possible to collect samples that are several orders of magnitude larger than manual surveys. AllAboard aims to aid city authorities in exploring urban mobility and in optimizing public transport by collecting cell phone data [10]. The data was provided by a cell phone operator in Ivory Coast: it is based on exchanged SMS, and the dataset tuples contain the following: <UserID, Day, Time, Antenna>. The key component is the antenna, which provides the means to find the GPS coordinates, thus allowing a spatial location.
Like many other applications, AllAboard has to assure its data quality. It does so with a component which processes mobile phone location data and, by applying a set of algorithms, extracts information on users’ stops, origin/destination flows, and shared route patterns.
AllAboard provides a user interface that allows the operator to explore mobility data, validate them
and evaluate different optimization strategies by interacting with maps, density heat maps and other
2https://openpaths.cc/ accessed: 19-10-2014
visual representations.
Figure 2.2: AllAboard panel
The operator can select the time interval to an-
alyze. A line chart is presented where the x-axis
represents time. Due to the large scale data, it
is possible to select the desired time interval us-
ing the time bar (figure 2.2). The flow of peo-
ple between antennas is then shown on the map.
Darker arrows represent a larger amount of trips
between the antennas.
This system has a spatio-temporal focus; however, the temporal side has a minimal role, allowing only simple time filtering. Furthermore, the only personal-data concern in this project is receiving anonymized user data. Also, the visualization challenge regarding the flow between antennas could have a better solution than just changing the arrow color.
2.1.3 QS Spiral
QS Spiral is an Android Application that allows the visualization of periodic properties of quantified self
data [11].
Figure 2.3: QS Spiral interactive visual-
ization
The software continuously acquires and logs data from
sensors (GPS and WiFi). From this data, locations are ex-
tracted to be shown to the user.
In figure 2.3 a collection of geocaptured data is shown.
The top of the screen shows a tag cloud, specifying the most
visited locations in colors corresponding to the colors shown
in the spiral. The bottom of the display contains time series
of places with map segments showing the places visited.
When a user taps a visited place, the corresponding places
will be highlighted in the spiral.
The spiral visualization allows repetitive deviations in the data to stand out clearly; events with similar periods are also aligned along similar arcs of the spiral. This allows the user to explore the existing patterns at a glance.
Since it is a mobile application, it can use interactions that a standard web application cannot, such as pinching to zoom into a particular part of the data, or swiping to pan.
This interface provides the means to answer cyclic tem-
poral queries, such as recurring temporal events like ”every Monday”.
Limitations of this system are mostly related to scalability. An extended log of data would translate
into a messy spiral, in which a user could not tap the desired part.
2.1.4 ST-TrajVis
ST-TrajVis is an application for the visualization of movement data [12]. Its main visualization focuses are 2D maps and the space-time cube.
The data used in the project consists of a subset of another project’s dataset. Each entry is represented as a sequence of time-stamped points, each containing latitude, longitude and altitude.
However, it does not use the two representations separately: it combines them into the same visualization. This way, the 2D map’s focus on spatial information is combined with the space-time cube’s focus on temporal information (figure 2.4). The representations are linked: when one point is selected, both visualizations highlight that point, providing additional information about it.
Figure 2.4: ST-TrajVis interface
It allows for data querying through filtering according to spatial and/or temporal properties: the defi-
nition of a geographical area, the start and end dates of search and the period of hours to be visualized.
It also allows for data enhancement, including the representation of speed, the trajectory’s recency
and smoothing of the trajectories.
Recently, a study suggested that users generally prefer static maps over space-time cubes [13].
Space-time cubes present occlusion problems (overlapping data) when dealing with several tracks, which may cause misunderstanding of the data.
2.1.5 LifeLines
LifeLines is a general visualization interface [14]. It was one of the first applications representing personal information in a dynamic way, serving as inspiration for many current works. Initially conceived to register personal data regarding youth justice records, it can also represent several other types of personal information.
Figure 2.5: LifeLines interface
In the case of juvenile justice, when a complaint is received, usually from the police, the workers have to find the current status and previous crime history. They propose to do so in one screen only. They used an existing dataset containing youth records of the Maryland Department of Juvenile Justice.
Although this work’s focus is neither spatial nor temporal, it has some interesting ideas regarding personal information visualization. They propose a timeline for each person (figure 2.5): varying statuses are displayed as horizontal lines and discrete events are represented as icons. Each offense is represented in that timeline. Line thickness and color are used to indicate the severity of the offense. Relationships between periods or events on the line can be highlighted.
They also include a set of techniques to allow the information to be represented in one screen only -
without scrolling vertically or horizontally. By disabling scrolling they assure users see all the information
and do not forget any result.
The visualization environment is not computationally demanding and can handle a variety of records. It is a personal record format that can be exchanged or synchronized between multiple services, thus making it scalable.
2.1.6 Microsoft GeoFlow
Figure 2.6: GeoFlow visualization of U.S. power stations
GeoFlow3 (figure 2.6) originated in Microsoft Research. This tool is an Excel module, and it allows users to:
1. Map Data: Plot rows of data from an Excel workbook, including the Excel Data Model or Power-
Pivot, in 3D on Bing maps. Visualization modes include columns, heat maps, and bubble visual-
izations.
2. Discover Insights: Discover new insights by seeing data in geographic space and seeing time-
stamped data change over time.
3. Share Stories: Capture scenes and build cinematic, guided tours that can be shared broadly,
engaging audiences.
The map data feature can plot more than a million rows of data.
2.1.7 Geotime
Figure 2.7: Screenshot of GeoTime prototype in
calendar mode (Linked time chart) showing recent
events within a smaller local area
Contrary to LifeLines, which only allows the dis-
play of events in the single dimension of time,
GeoTime, as the name suggests, aims to display
geographical and temporal information. They do
so by taking advantage of three dimensional com-
puter graphics [15].
The visualization concept of GeoTime is the space-time track. A space-time track represents a stream of time through a particular location, drawn as a literal line in space (figure 2.7).
Each unique point of interest (location) will
have one spatial timeline. Events that occur on
that location are arranged along the timeline.
There are three variations of Spatial Timelines that emphasize spatial and temporal qualities to varying extents; each variation increases the salience of time over geography. These are 3-D Z-axis Timelines, 3-D viewer-facing Timelines and linked time chart Timelines.
1. 3-D Z axis Timelines: 3-D Timelines are oriented normal to the terrain view plane. This method
places more emphasis on the geographical view.
2. 3-D viewer facing Timelines: similar to 3-D Timelines except that they rotate about the instant of
focus point so that they always remain perpendicular to the viewpoint from which the scene is
rendered.
3http://research.microsoft.com/en-us/news/features/geoflow_data_viz-041113.aspx/ accessed: 27-09-2014
3. Linked time chart Timelines: connect a 2-D grid in screen space to locations marked in the 3-D
terrain representation. More emphasis is placed on the time view (figure 2.7).
This work, however, does not scale very well. With hundreds of events on screen, the user has to see through a dense display of objects and labels.
2.1.8 Visual Mobility
Visual Mobility is a visual exploration tool for multi-modal personal mobility information that provides a
flexible filtering interface and contextual visualizations that try to extract meaningful mobility patterns
[16].
One of the objectives is to make users more ecologically aware, by analyzing their mobility patterns.
To create the tool, real mobility datasets were used - a simple data collection tool was created by the
author to help users record their data.
Regarding data collection, several issues had to be addressed:
1. GPS accuracy: Accuracy errors are the most common type of problem. They cause misrepresentation of the actual route taken and provoke excessive position wandering when the device is stationary.
2. Cold start/end: Forgetting to turn on/off data collection leads to wrong detection of trips.
3. Battery requirement: GPS data collection devices have a high battery requirement. This usually results in data loss: users abandon collecting data, gather lower quality data (fewer samples to save battery) or incomplete data (battery drained).
4. Privacy: Users can abandon data collection due to privacy concerns. They are not keen on making their daily routine publicly available. However, infrequent events, like a trip, are something users do like to share.
5. User’s burden: Mobility data collection is more than recording tracks. Raw data has to be pro-
cessed and this work can be burdensome, depending on the tool.
Since a fully automatic data collection process was not possible, these problems had to be mitigated.
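To make the first issue concrete, the GPS accuracy problem can be mitigated with a simple filter. The sketch below, in Python, drops fixes whose reported accuracy is worse than a threshold and collapses the positional jitter that occurs while stationary; the function names and thresholds are our own illustrative choices, not part of Visual Mobility itself.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates, in meters."""
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def clean_track(points, max_accuracy_m=50, min_move_m=5):
    """Drop low-accuracy fixes, then collapse stationary jitter.

    `points` is a list of (lat, lon, accuracy_m) tuples; both
    thresholds are illustrative defaults.
    """
    accurate = [p for p in points if p[2] <= max_accuracy_m]
    cleaned = []
    for p in accurate:
        if cleaned and haversine_m(cleaned[-1][0], cleaned[-1][1],
                                   p[0], p[1]) < min_move_m:
            continue  # wandering around the previous fix: skip it
        cleaned.append(p)
    return cleaned
```

Cold start/end detection would additionally require comparing timestamps against trip boundaries, which this sketch does not attempt.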
Figure 2.8: Visual Mobility UI
Focusing on the visual exploration component, the UI is composed of two main areas, the Canvas
and the Sidepanel.
The Sidepanel contains several scented widgets that allow users to filter data:
1. Slider Controller: to set a minimum and maximum value for an information space (for example, the
distance traveled during each hour of the day).
2. Toggle Controller: toggle discrete portions of the information space (for example, time spent in
each transport mode).
3. Tag List with Autocomplete: helps users select known Start and End locations of a trip. The results are ordered by the number of check-ins.
The Canvas area displays the visualizations users want to explore.
1. Map view: visualization of the tracks filtered using the sidepanel. It also provides layers that can
be activated (heatmap).
2. Scatter Plot: the Map view allows information contextualization; however, it does not expose individual track dynamics. In this visualization, it is possible to assign a specific track attribute to one of four possible scales.
3. Locations Relationships: contextualizes location check-ins in an overall perspective.
2.1.9 Visual analytics of movement: an overview of methods, tools and proce-
dures
This paper surveys relevant work and divides it into four categories [17]. Here we will focus on just one category: looking at trajectories.
1. Looking at trajectories: The focus is on moving objects. It supports the exploration of spatial
and temporal properties of individual trajectories and comparison of multiple trajectories. Three
essential areas are addressed: visualization, clustering and time transformations.
(a) Visualizing trajectories: The most common types of display for the visualization of movements
of discrete entities are static and animated maps and interactive space-time cubes.
(b) Clustering trajectories: Clustering is a popular technique used in visual analytics for han-
dling large amounts of data. Usually existing clustering methods are wrapped in interactive
visual interfaces supporting not only visual inspection but often also interactive refinement of
clustering results.
(c) Transforming times in trajectories: Comparison of dynamic properties of trajectories using
space time cube, time graph, or other temporal displays is difficult when the trajectories are
distant in time because their representations are located far from each other in a display.
This problem can be solved or alleviated by transforming times in trajectories. Two classes of
transformations are suggested:
i. Transformations that reflect the cyclic nature of time. Depending on the data and appli-
cation, trajectories can be projected in time to a single year / season / month / week / day
etc. This allows the user to uncover and study movement patterns related to temporal
cycles, e.g., find typical routes taken in the morning and see their differences from the
routes taken in the evening.
ii. Transformations with respect to the individual lifelines of trajectories. Thus, trajectories
can be shifted in time to a common start time or a common end time. This facilitates
the comparison of dynamic properties of the trajectories (particularly, spatially similar
trajectories), for example, the dynamics of the speed. Aligning both the start and end
times supports comparison of internal dynamics in trajectories irrespective of the aver-
age movement speed. Particularly, movement patterns of fast and slow movers can be
compared in this way.
2. Looking inside trajectories: considers methods that operate on the level of points and segments of trajectories.
3. Bird’s-eye view on movement: generalization and aggregation are used to uncover spatio-temporal patterns.
4. Movement in context: focuses on relations and interactions between moving objects and the environment, including various kinds of spatial and temporal objects, as well as phenomena (e.g. weather).
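The two classes of time transformations described in 1(c) can be sketched in a few lines of Python. This is our own illustrative reading of the survey, not code from the paper: class i projects timestamps onto a daily cycle, and class ii shifts trajectories to a common start time.

```python
def project_to_day(traj):
    """Class i: project timestamps onto a 24-hour cycle so that
    movements from different days can be compared by time of day.
    `traj` is a list of (t_seconds, lat, lon) tuples with absolute times.
    """
    return [(t % 86400, lat, lon) for t, lat, lon in traj]

def align_to_common_start(trajectories):
    """Class ii: shift each trajectory so its first timestamp is 0,
    making the internal dynamics (e.g. speed) of trajectories that
    are distant in time directly comparable."""
    aligned = []
    for traj in trajectories:
        t0 = traj[0][0]
        aligned.append([(t - t0, lat, lon) for t, lat, lon in traj])
    return aligned
```

Aligning both the start and end times, as the survey also suggests, would add a linear rescaling of each trajectory's time axis on top of the shift above.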
2.1.10 AprilZero Sport
This is a website created by Anand Sharma4. It started as a place where he could centralise all his personal data: heart rate, blood levels, running tracks, movement data captured by the Moves5 application, body fat, and Foursquare, Instagram and Facebook personal data. Different types of information are shown, from the heart rate to the transportation modes used. The website provides a visualization for every day of the month: it is possible to know where he was and where he went in each hour of the day (figure 2.9). The colored icons represent the place where he was, and the horizontal bar spans from the start to the end of the stay in that place.
Figure 2.9: AprilZero Sport interface
2.1.11 Google Maps Timeline
Google Timeline6 7 uses Google Maps, Google Photos and the user’s Location History. It allows users to view the places they have been on a given day, month or year. It is private and only visible to the user. It is possible to control which locations to keep: a day, or the full history, can easily be deleted at any time. It also supports semantics, since a user can edit any place that appears, including removing a specific location or giving a frequented spot a private name like Mom’s House or My Favorite Running Spot.
Its interface comprises a map, in which locations and routes are shown, and a summary, consisting of a vertical timeline representing visited places.
A downside of this application is the quality of the data present in routes. Since Location History does not continuously track the user’s position, when traveling, the GPS information will consist of points separated both spatially and temporally, thus not giving a full picture of which route the user took. An example of this problem can be seen in figure 2.10: on the map (top part) there is a long straight line between two points, with no straight road to match it. No conclusions about which route the user travelled can be inferred in this case.
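This sampling-gap problem is easy to detect programmatically. The sketch below is our own illustration with arbitrary thresholds, not anything Google provides: it flags consecutive fixes that are far apart in both time and space, i.e. segments where the route cannot be inferred.

```python
import math

def approx_dist_m(lat1, lon1, lat2, lon2):
    """Equirectangular distance approximation, adequate at city scale."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6371000.0 * math.hypot(x, y)

def find_gaps(points, max_gap_s=300, max_jump_m=500):
    """Return index pairs (i-1, i) where consecutive fixes are separated
    by more than `max_gap_s` seconds AND more than `max_jump_m` meters.
    `points` is a time-ordered list of (t_seconds, lat, lon) tuples."""
    gaps = []
    for i in range(1, len(points)):
        t0, lat0, lon0 = points[i - 1]
        t1, lat1, lon1 = points[i]
        if t1 - t0 > max_gap_s and approx_dist_m(lat0, lon0, lat1, lon1) > max_jump_m:
            gaps.append((i - 1, i))
    return gaps
```

A renderer could then draw the flagged segments as dashed lines instead of the misleading straight line.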
4http://aprilzero.com/sport/ accessed: 06-10-2014
5https://www.moves-app.com/ accessed: 07-10-2014
6http://google-latlong.blogspot.pt/2015/07/your-timeline-revisiting-world-that.html accessed: 07-08-2015
7https://www.google.com/maps/timeline accessed: 07-08-2015
Figure 2.10: Google Maps Timeline interface
2.1.12 TrajRank: Exploring Travel Behaviour on a Route by Trajectory Ranking
Figure 2.11: TrajRank interface
TrajRank provides an interactive visual analytic method for exploring taxi travel behaviour on a route [18]. It takes taxi GPS data and road network data as input. In the offline pre-processing stage, GPS trajectories are cleaned and matched to the road network. On each road segment, the travel times of different trajectories are clustered into groups, and these groups are ranked by average travel time in ascending order. The ranking is visualized and supports interactions for further exploration.
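The offline ranking step can be approximated as follows. This is a deliberately simplified sketch: TrajRank's actual clustering is more sophisticated, and the gap heuristic and its parameter below are our own illustrative choices.

```python
from statistics import mean

def rank_travel_time_groups(times_s, gap_s=30):
    """For one road segment, cluster the per-trajectory travel times
    with a simple 1-D gap heuristic, then rank the resulting groups
    by average travel time in ascending order."""
    groups, current = [], []
    for t in sorted(times_s):
        if current and t - current[-1] > gap_s:
            groups.append(current)  # large gap: close the current group
            current = []
        current.append(t)
    if current:
        groups.append(current)
    return sorted(groups, key=mean)
```

For example, `rank_travel_time_groups([62, 65, 300, 310, 58])` yields `[[58, 62, 65], [300, 310]]`: the fast group ranks first, as in TrajRank's ascending order.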
The interface consists of four views: a spatial-temporal view, a horizon graph view, a ranking view and a menu panel. In the spatial-temporal view, users interactively define spatial-temporal filters and configure route segmentation. The horizon graph view displays the temporal distribution of the selected trajectories over a day. The ranking view supports trajectory ranking analysis. It consists of three components: a ranking diagram, an occurrence temporal distribution view and a modified box-plot. The ranking diagram visualizes trajectory rankings over road segments. The temporal distribution view displays the distribution of trajectory groups with respect to occurrence time. The modified box-plot gives a statistical description of travel time on each road segment. Such statistics are also shown in the spatial view, encoded by the width of each road segment band.
2.1.13 Generic mapping applications and tools
Many mobile applications now allow for activity and location tracking. We will focus on a specific application, Moves, because it has a large set of connected visualization tools that are interesting to explore8. These tools work with the Moves API and require access to the data stored by the Moves application.
• Fluxtream 9 - is set up as an aggregation and visualization tool. It is possible to map very different
data sets, including where you were tweeting last week.
• Resvan Maps 10 - plots your places and paths, and categorizes paths depending on the activity (transport, walking, running, and cycling). Additionally, it is possible to create analysis circles and have the application compute the time you spent inside a bounded location.
• MMapper 11 - is a more technical application (requiring setup). It allows visualizing where people spend most of their time in an interactive way. One particularly interesting use is Quantified Bob - Visualizing 2 weeks of Passive Location Tracking12, which uses this application.
• Move-O-Scope 13 - makes it possible to explore maps by activity type, day of the week, and custom date ranges. It is also possible to see how many times a specific place was visited, where you come from and where you go next, what days you typically visit, and your typical time of day at that place.
2.1.14 Discussion
| Work | Scalability | Personal semantics | Collected data | Visualization method | Track simplification | Error removal |
| 1) Visits | no | no | Other apps | Timeline | no | no |
| 2) AllAboard | yes | no | GPS | map + graphics | no | yes |
| 3) QS Spiral | no | no | Wi-Fi, GPS | Spiral | no | no |
| 4) ST-TrajVis | some | no | Existing dataset | map + space-time cube | yes | yes |
| 5) LifeLines | yes | no | Existing dataset | Timeline | NA | NA |
| 6) GeoFlow | yes | no | MS Excel | map | NA | NA |
| 7) GeoTime | no | no | Existing dataset | 3D timeline | no | no |
| 8) Visual Mobility | some | yes | GPX tracks | map + graphics | yes | yes |
| 9) AprilZero Sport | yes | yes | Other apps, sensors | Timeline | no | no |
| 10) Google Timeline | yes | yes | GPS, Wi-Fi | Timeline | yes (too much) | no |
| 11) TrajRank | yes | no | GPX tracks | Timeline | no | no |

Table 2.1: Visualization Summary
8http://quantifiedself.com/2014/03/map-moves-data/ accessed: 07-10-2014
9https://fluxtream.org/ accessed: 07-10-2014
10http://resvan.com/map/ accessed: 07-10-2014
11https://github.com/feltron/MMapper accessed: 07-10-2014
12http://www.quantifiedbob.com/2013/09/what-moves-me-visualizing-2-weeks-of-passive-location-tracking/ accessed: 06-10-2014
13https://move-o-scope.halftone.co/ accessed: 07-10-2014
Table 2.1 shows a summary of the most important aspects of the works reviewed in this section.
We will not evaluate usability issues. Despite the increasing number of visualization studies, usability has been somewhat neglected [19]. It is still unclear how usable and useful these techniques are, how they can be improved and in which tasks they should be used [20]. Keeping that in mind, we are not evaluating the usability of the studied works; we are analyzing relevant features of those works.
The first row analyses scalability, a long-lasting challenge for information visualization [19] [21]. The second row refers to whether or not the information shown has personal meaning to the user (e.g. my house, my work). The third row shows how data was collected. The fourth lists the kind of visualization method used. The fifth shows whether the work applies some kind of track simplification (dividing recorded tracks into trips, etc.). The last row specifies if there is any error removal mechanism (GPS accuracy, cold start/end, etc.).
Regarding scalability, our work requires that a huge amount of data (potentially a person’s whole life) can be queried and visualized with ease. Not many works allow this: most of them rely on importing data that spans roughly a week. Beyond that, the interface becomes clogged by too much information for a user to understand. The most notable exception is 9): this work’s focus is on a person’s whole life, and it therefore displays the information in a clear and understandable way.
The field of personal semantics, that is, the support for locations and times that only make sense for the user (my house, my work, mom’s house, lunch time, etc.), is supported by only three works (8), 9) and 10)). As stated in the Objectives section, our work aims to allow users to query their personal information. This information has concepts that only make sense in the scope of a particular user (mom’s birthday, my house location), thus it is important to support this feature.
Regarding the visualization methods, we have chosen not to include any works that use automatic animation, because the widespread opinion is that it fails to be more helpful than other visualization types [22] [23]. Many usual techniques are used, such as 2D maps and space-time cubes. There is, however, one different and interesting way of representing the information, in a timeline (1), 5), 7) and 9)), that seems to work well.
Another important feature is the data quality and cleaning issue. For a successful visualization experience, it is important that the shown data is previously analyzed and divided into parts that make sense for the current context (e.g. dividing a big GPS track into several tracks that each correspond to a trip). It is also important to assure that the information is real: when working with devices that gather GPS signals, it is common to get many types of errors, including [16]: GPS accuracy (causing misrepresentation of the actual route taken) and cold start/end (when the person forgets to turn the data collection on or off, leading to wrong detection of trip ends). Despite this being an essential feature, very few works worry about dealing with it. 8) even releases a tool to process the collected data.
To summarize, the most important topics in geo-temporal visualization our solution will address are:
• Scalability (the support to potentially show a person’s whole life data) of the solution;
• Support of personal semantics while querying the data (it is one of the main goals);
• Focus on data quality - the tracks must be pre-processed for error-removal and simplification.
2.2 Specifying Queries
In this section we will present relevant works regarding visual querying.
2.2.1 Cigales, Sketch! and Lvis
Figure 2.12: Cigales interface and query visualiza-
tion
The Cigales language [24] allows users to express a visual spatial query by means of a composition of static icons.
The interface contains two windows: one to express an elementary query and the other to summarize the final query. An elementary query is built by first defining the operands, then clicking on the spatial operator icon. The system answers by displaying the iconic composition.
For example, the query Which towns are bor-
dered by forest? is represented in figure 2.12.
Figure 2.13: LVis evolution operators between sur-
facic objects
Sketch! [25] gives the user greater freedom. A
spatial query is expressed by drawing a sketch on
the screen, which is later interpreted by the sys-
tem. This means that the operators are directly
derived from the sketch, not chosen by the user.
Lvis is an extension of the Cigales language [26]. Spatio-temporal queries involve at least one spatial criterion and one temporal criterion. The spatial operators are intersection, inclusion, adjacency, disjunction and equality, while the temporal operators are based on [27]. The conjunction of both (a spatio-temporal query) originates six different operators (see figure 2.13).
2.2.2 The Challenges of Specifying Intervals and Absences in Temporal Queries:
A Graphical Language Approach (TQ:AGLA)
This work creates a temporal query interface with particular focus on two event-sequence constructs: intervals and the absence of an event [28]. It aims to determine the central difficulties of query specification and to alleviate these difficulties through graphical interaction.
The supported events are: point event, absence of point event, compacted interval event, expanded
interval event and absence of interval event (figure 2.14).
The first difficulty is specifying intervals as point events: users would attempt to specify a query for
an interval event as two separate point events (one for start, the other for end). A second difficulty is
specifying absence as non-presence. Users often assumed that by not specifying the presence of an
event, they were, by implication, specifying its absence. Another is making users understand they can
access ”does not occur” relationships.
Figure 2.14: Query representing ”stroke occurs dur-
ing Drug A”
Complex questions may require complex queries; however, a simpler approach to answering the same question is often available using ”does not occur”. The last difficulty is understanding the logic of absences. Specifying queries using absences is counter-intuitive. Furthermore, users are unaware of the different types of absences.
All these recurrent difficulties must be handled by the interface and are part of the learning process. The first difficulty, for example, is solved by forcing users to specify the full interval (not just the start or the end).
While the scope of temporal querying is very broad, this work focuses only on intervals (with start and end time) and the absence of an event.
2.2.3 VizPattern
Figure 2.15: VizPattern workspace
VizPattern is an interactive visual query environ-
ment [29]. It has no spatial focus - only temporal.
It uses a comic strip metaphor to enable users to
define and locate complex temporal patterns.
The comic strip metaphor has been previously
used by the same authors in QueryMarvel [30].
Their study concluded that comic strips are easy for users to learn and understand. Compared to traditional (form-based) ways of querying, users were more effectively able to translate queries into a comic-based representation and extract the accurate meaning.
The interface is divided into two panels. The
upper is the comic strip editor, where users compose their queries, and the other is the results panel
where answers are displayed (figure 2.15).
A comic strip consists of a series of panels laid out along an invisible time line. Each panel can
represent one event.
The temporal events supported by VizPattern are:
1. Event B happens after event A - represented as panel A followed by panel B.
2. Event A and B happen at the same time - A and B are represented in the same panel.
3. Event A happens after B with specified time interval - a text specifying the time is shown on the
upper part of the panel.
4. Event happens at a specific time - a clock on the upper right corner of a panel is shown.
5. Event happens in an absolute time range - represented using a combination of the above methods.
They also provide the means to manipulate results in order to refine queries: users can edit undesired results to form queries that return the desired results.
Although VizPattern focuses on medical data, it can be extended to many other areas.
2.2.4 TaxiVis
Figure 2.16: TaxiVis workspace
TaxiVis operates on an important urban data set:
taxi trips. It has a geographical and temporal fo-
cus in addition to multiple variables associated
with each trip.
It proposes a new visual query model that
supports complex spatio-temporal queries over
origin-destination data. Users formulate queries
visually by interacting with maps and other visual
representations. They can iteratively refine their
queries through direct manipulation of the results.
Temporal constraints are defined using a wid-
get (figure 2.16 A).
Spatial constraints are specified by polygons
and arrows on the map. The user can also link
two regions to form a directional constraint (figure
2.16 B C). Maps are also used to display query results.
The basic visual representation of the results is a point cloud: each trip is represented by a pair of
points denoting the pickup and dropoff locations. For a small number of trips this gives quick insight;
however, as the number of trips increases, the view gets cluttered very quickly. A set of alternative
visualizations is provided to the user, including an adaptive level-of-detail strategy to reduce the number
of rendered points and a heat map visualization to show the distribution of pickups/dropoffs in an area.
2.2.5 GeoDec
One of the main design goals of GeoDec is the provision of an immersive environment that enables
users to interact with GeoDec and perform a wide range of spatiotemporal queries intuitively [31] [32].
The visualization interface for GeoDec allows users to navigate and interactively query the 3D envi-
ronment in real-time.
The main elements of the GUI are as follows (figure 2.17): 1) 3D rendering of the geolocation with
superimposed interactive visualizations of query results; 2) query creation panel with available data
types, query types and spatial and temporal bounds; 3) query result layers panel where issued queries
are presented and 4) an advanced time-line that enables users to pan and zoom in time and define time
ranges.
One of the most important features of GeoDec is its querying capability. The formulation of queries
involves determining five building blocks, namely Data Types, Query Type, Spatial Bounds, Temporal
Bounds, and optionally Query Scheduling. When creating a query, users select each of the building
blocks in turn.
First, the Data Type is determined: the object(s) or information (or any combination thereof) the user
is interested in (e.g., road networks, buildings).
Next, the Query Type is specified, which can be a Fetch (Range) Query, Nearest Neighbor Query,
Shortest Path or Visibility Query.
Third, the user formulates the Spatial Bound; four modes are supported: 1) Circular bound; 2) Two
Points bound; 3) Rectangular bound and 4) Trajectory bound.
Fourth, the Temporal Bound of the query is set as the From Time and To Time, meaning all the
returned objects should have a valid lifetime overlapping this temporal bound.
Figure 2.17: The user interface of GeoDec detailing its important elements: 1) rendering area, 2) query
creation panel, 3) query results panel and 4) temporal navigation. 5) and 6) show magnified renderings
of 3) and 4) respectively.
2.2.6 Time Automaton - a visual mechanism for temporal querying
Figure 2.18: Spatial visualization of bus stops’ activ-
ity in a weekend
Time Automaton [33] is a visual temporal querying
mechanism that is capable of formulating differ-
ent types of temporal queries, including complex
ones. It is especially useful for queries involving
sequential patterns such as "every second Monday"
or "every other Monday in every 4th month in
every year".
The logic of the Time Automaton model is in-
spired by finite-state automata. A query is rep-
resented as a graph defining how an input string,
built from temporal data, is processed.
The mentioned string is a sequence of words
with temporal meaning, in which some of the
words refer to temporal markers and others to temporal data (facts).
In figure 2.18 a resulting query visualization is shown. The query is on the upper part of the image,
and the spatial result on the lower part. This query shows the bus stops' activity during a weekend.
2.2.7 Discussion
Table 2.2 shows a summary of the most important aspects regarding queries in the works studied above.
The first works analyzed in this section are considered merely from a historical perspective, since
more recent works use similar techniques in a much more developed way.
The first row specifies if the main objective of the work is to query spatial information, while the
second row specifies if it is to query temporal information. The third row, which is recurrency, refers to
whether or not that work supports recurrent queries. The personal semantics row specifies if queries
include any personal semantic mechanism, and the last row states which kind of metaphor is used when
building the visual query.
There are two types of scope in the analyzed works: Spatial and Temporal. The first one refers to
works that only allow us to query spatial features, either by specifying boundaries or selecting a zone
on a map. The second type focuses on querying temporal information, either by making use of a
timeline, some widget or some other metaphor.
Some works may make use of both temporal and spatial features, but only allow the querying of one
type. This is the case of 3), which allows the querying of temporal data, with the resulting information
shown with the help of a 2D map.
                     1) VizPattern  2) TaxiVis        3) Time Automaton  4) GeoDec        5) TQ:AGLA  6) ST-TrajVis
Spatial Scope        no             yes               no                 yes              no          yes
Temporal Scope       yes            yes               yes                yes              yes         yes
Recurrency           no             no                yes                no               no          no
Personal Semantics   no             no                no                 no               no          no
Metaphors            comic strip    map interaction/  finite automata    spatial bounds/  timeline    spatial bounds/
                                    time widget                          time widget      intervals   time widget

Table 2.2: Specifying Queries Summary

Besides these two types of scope, we also focus on the support of recurrent queries and the possi-
bility to support queries with personal semantics.
Only 3) allows something very useful for our work - the possibility to have recurrent queries. That is, a
user might want to know things like ”show all the places near a shopping mall where I went more than
three times”.
None of the studied works has personal semantics in its query system. Our work is closely related
to personal information because we need to allow the user to include places/times that have a
personal semantics (e.g., my house, lunch time).
2.3 Conceptual Frameworks
With the enhancement of spatio-temporal capabilities of GIS systems, many frameworks were developed
with the aim of providing an abstraction while manipulating spatial and temporal movement data.
The first two works analyzed cover basic topological relations. The first is an early, yet still relevant, work
regarding topological relationships in time.
The second aims to review all the possible spatial relationships.
2.3.1 Towards a general theory of action and time
Figure 2.19: Temporal topological relationships (im-
age from [34], original idea from [27])
Topological relationships define relative locations
along a timeline. It is useful to be able to express
relationships between events viewed as knots or
singularities along the timeline. These relation-
ships are shown on figure 2.19 [27] [34]. Note
that each of the seven basic relative interval rela-
tionships has an inverse.
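These interval relationships can be made concrete with a small classifier. The sketch below is ours (not from [27] or [34]): it returns one of the seven basic relations for a pair of intervals, with the inverse relations obtained by swapping the arguments.

```python
def allen_relation(a, b):
    """Classify two closed intervals a=(a0, a1) and b=(b0, b1) into one of
    the seven basic temporal relations; swap the arguments for inverses."""
    (a0, a1), (b0, b1) = a, b
    if a1 < b0:
        return "before"            # a ends before b starts
    if a1 == b0:
        return "meets"             # a ends exactly when b starts
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 == b0 and a1 < b1:
        return "starts"            # same start, a ends first
    if a1 == b1 and a0 > b0:
        return "finishes"          # same end, a starts later
    if a0 > b0 and a1 < b1:
        return "during"            # a lies fully inside b
    if a0 < b0 < a1 < b1:
        return "overlaps"          # a starts first, they overlap
    return "inverse"               # covered by swapping a and b

print(allen_relation((9, 12), (12, 14)))   # meets
print(allen_relation((10, 11), (9, 13)))   # during
```

The only relations relevant to personal mobility data, as discussed later, are the first two: a person cannot occupy two intervals that overlap.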
2.3.2 Topological relationships be-
tween complex spatial objects
From a database and GIS perspective, the development of spatial relationships has been motivated by
the need for formally defined predicates as filter conditions for queries. The resulting large number of
predicates in this paper makes them difficult for the user to handle. To overcome that, they were divided
into five groups [35]:
• relationships between two complex lines
• relationships between two complex regions
• relationships between two complex points and lines
• relationships between a complex point and a region
• relationships between complex lines and a complex region
Figure 2.20: 4 of the 82 topological relationships be-
tween two complex lines
In the scope of our work, the most important
relation is the first one, regarding complex lines (fig-
ure 2.20). This work highlights the dif-
ference between endpoints and paths, an essen-
tial concept in our work. Endpoints represent the
start/end of a track, and a path represents the way
between the start and end points.
2.3.3 Triad Framework
In a time when GIS systems were geared towards the representation and analysis of situations frozen
in time, J. Peuquet [34] suggested a framework that unified temporal and location aspects.
The framework organizes information related to where (location-based view), what (object-based view)
and when (time-based view).
The framework permits the user to ask the following questions:
1. when + where ⇒ what Which objects are present at a given location in a given time.
2. when + what ⇒ where The location(s) occupied by an object (or a set of objects) at a given time.
3. where + what ⇒ when The time (or set of times) that an object (or set of objects) occupied a given
location (or set of locations).
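A minimal sketch of these three question types, assuming movement data is stored as (what, where, when) triples; the record layout and function names below are ours, for illustration only:

```python
# Hypothetical movement records as (what, where, when) triples.
records = [
    ("user", "home", "2015-06-15 08:00"),
    ("user", "IST",  "2015-06-15 10:00"),
    ("bus",  "IST",  "2015-06-15 10:00"),
]

def what_at(where, when):    # when + where -> what
    return {o for (o, l, t) in records if l == where and t == when}

def where_at(what, when):    # when + what -> where
    return {l for (o, l, t) in records if o == what and t == when}

def when_at(what, where):    # where + what -> when
    return {t for (o, l, t) in records if o == what and l == where}

print(what_at("IST", "2015-06-15 10:00"))   # {'user', 'bus'}
```

Each query fixes two of the triad's components and returns the third, which is exactly the structure of Peuquet's three questions.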
2.3.4 Conceptual Framework and Taxonomy of Techniques for Analyzing Move-
ment
At the base of this framework are the three fundamental constituents of movement: space, time and
objects. Movement does not exist without any one of these, and no other constituent is essential
[6].
The framework also exhaustively encompasses all possible linkages between the three constituents.
It includes characteristics of objects in terms of locations and times, characteristics of locations in terms
of objects and times and characteristics of times in terms of objects and locations.
They divide the visualizations into those for non-aggregated and aggregated movement data.
Non-aggregated data allows for 14 different visualizations based on maps, time graphs, temporal bar
charts, map sequences, space-time cubes and space-time graphs.
Aggregated data allows for 17 different visualization techniques using maps, temporal display, chart
map (map with embedded charts), time graph, space-time graph, space-time cube, map sequence,
transition matrix, flow map, flow map sequence, sequence of transition matrices, transition matrices with
embedded charts, among others.
2.3.5 Visual Analytics for Analysis of Movement Data
This framework considers generic tasks that arise in the analysis of movement data [36]:
1. Data pre-processing: Enrich the data with additional fields; filter sequences corresponding to
absence of movement; etc.
2. Extraction of significant places: usually the time spent in a place indicates its significance. To
interpret these places and recognize whether they are significant, an analyst may overlay
them on a map and look for objects situated nearby.
3. Extraction of trips: a sequence of GPS records needs to be partitioned into sub-sequences
corresponding to trips.
4. Examination of trips
(a) Viewing individual trips: Individual trips can only be viewed using an interactive time filter
(figure 2.21), otherwise overlapping trips would be represented.
(b) Clustering of trips: Trips may be similar in a variety of ways, either by coinciding fully or
partially in space, by having similar shapes, common start and/or end points, etc. It is useful
to have a tool that allows the user to choose a similarity function to group related trips.
(c) Summarization of trips: Representing trips by lines does not give an accurate measure of
frequency. A method that allows representing multiple trips in a generalized and summarized
way consists of drawing arrows (which show the movement directions) whose thickness is
proportional to the number of moves.
Figure 2.21: The interface of the interactive time fil-
ter
This interface for temporal querying (figure
2.21) is designed for interval querying only. Re-
current expressions such as "every Monday" can-
not be formulated. Another limitation is the
timeline: it does not allow displaying very large
time intervals. Time properties (days, months,
etc.) are also not represented, meaning that inter-
val queries such as "from Monday to Thursday" cannot be formulated either.
2.3.6 STNexus
STNexus is a framework that supports space-time database and information visualization capabilities
[37].
It is designed around a modular architecture in order to maximize flexibility and allow a wide range
of possible scenarios.
It supports analysis of a combination of different types of data - maps, images, statistics - and raw
data about people and objects in space and time. By analyzing and inferring the data, it is possible to
assemble evidence and assess potential threats in a military and security context.
The prototype consists of a database, visualization and knowledge acquisition components.
In the scope of this work, the main focus is on the database, query and visualization components.
• Database - uses Secondo14, an extendable open source database engine.
• Queries - uses GeoTools15, an open source library that provides an abstraction layer between data
storage (spatial database) and usage (visualization).
• Visualization - uses GeoVISTA Studio16, which provides a suite of components for quickly building
geovisualization applications.
2.3.7 Discussion
In 2.3.1 we reference one of the best-known works specifying temporal relations: it allows
for seven different ones. As our work aims to allow a user to query his/her personal data, many of the
temporal relations studied are not valid in this context because, obviously, a person cannot be in two
places at the same time - the only important relations for us are X before Y and X meets Y.
The two main concepts of our solution are spatial and temporal queries. Personal information is
not considered a query type, because it will be included in the spatial and temporal strands. It is also
important to allow recurrency in our system - people want to ask things like ”show all the places near a
shopping mall where I went more than three times”.
Regarding the spatial scope, a pattern that emerges from the analysis of 2.3.2 is the difference
between endpoints and paths. Both concepts are very different: the endpoints are what is
commonly referred to as the start and end of the trip, while the path is the way that connects the start
to the end.
Our solution also relies on this approach: while formulating the spatial part of a query, it is important
to distinguish the endpoints from the path because there are attributes that an endpoint can have and a
path cannot.
14http://dna.fernuni-hagen.de/secondo/ accessed: 08-10-2014
15http://www.geotools.org/ accessed: 08-10-2014
16http://www.geovistastudio.psu.edu/ accessed: 08-10-2014
2.4 List of requirements
Here we present the requirements of our work, based on the previously analyzed works and gathered
information.
• Visual query system - In section 2.1 we analyzed a variety of works relying on visual query
interfaces. Our work will also rely on them, with the main purpose of hiding query complexity from
the user and making the user more at ease with the interface. Below we describe the query
types our system will support.
• Temporal query - As seen in the reviewed works, and as Andrienko et al. [8] state, more attention
should be given to time and to users. Geo-visual tools should be more temporal and should be de-
veloped to be usable by different types of users, not only those who possess advanced computer
competences. We will need to introduce time as part of our solution in order to specify the temporal
span, that is, the duration of the analyzed interval. Queries should have enough expressiveness
to determine which routes correspond to the time specified by the user.
• Spatial query - Supported queries will also have, as a natural consequence of manipulating GPS
data, a spatial scope. As explained in 2.3.2, spatial queries can be divided into two categories:
endpoints - locations that represent the start/end of a route - and paths - the routes used
between the start and final locations. Spatial queries should have enough expressiveness to trans-
late the user's description into a real place on the map.
• Recurrent query - As introduced by 2.1.3, we can also consider a special type of query: recurrent
ones. They emerge from the need to ask things like ”Where did I go about three times, in the first
week of November?”.
• Semantics query - One of the goals specified in the beginning of this report is the support for
personal semantics. A user should be able to specify the times when he was in a location that
makes sense to him (home, son's kindergarten, etc.), and to use that information in
the query system.
• Scalability - Dealing with personal data is a challenge for many reasons. One of them is data
size. Despite it being possible to reduce the number of points in a GPS track while keeping accuracy,
the magnitude of a few months of data collection is still very large. In order to keep the system stable
and smooth, our back-end implementation must be capable of handling large amounts of data
while providing good response times.
• Data Quality - As referenced in 2.1.8, GPS data collection brings some problems related to accu-
racy. Our work will have to manipulate and treat that data in order to remove errors, spikes in the
paths and cold start/end artifacts, and to reduce the size of the files by removing useless points.
• Privacy - Our solution will run on a local scope. Each user will have a copy of the software and run
it on a personal computer - no data will be uploaded or stored online. This decision ensures that
no personal data is compromised and gives no reason for a user to stop tracking his data because
of privacy issues.
Chapter 3
Data Collection
As explained before, Where Have I Been's main objective is to find a simple way to let users visually
explore their personal mobility data. In order to test it, we developed an application that uses a visual
specification. To understand whether we fulfilled our objectives, we had to collect personal mobility data
from two different sources, both requiring some degree of user intervention. The first is GPS data
collected by the user with the help of any capable device and then stored in GPX files. The second
is user annotations about the tracks: tracks are annotated with personal semantic context in order to
give meaning to the locations the user has been to.
Although the goal of our research is not centered on data collection, for evaluation purposes we
will need to test our work with test datasets containing personal mobility information. Therefore, in this
chapter we will study the various data collection types and discuss how we can improve this data.
Mobility datasets have many potential applications; for example, recent works have tried to understand
human behavior through the investigation of mobility data [18].
Independently of the purpose, mobility data collection has several inherent problems that can jeop-
ardize data quality or even tempt users to abandon the process. First, we will go through which
data users need to collect in section 3.1. The most common errors and problems with GPS data and
semantic annotation are described in section 3.2. Then, in section 3.3, we describe a set of
algorithms used to minimize errors in the collected data.
3.1 Which data to collect?
Our application will consume two different data sources. The first is GPS data collected by a device and
stored in files. The second is semantic data about the places visited by the user.
3.1.1 GPS tracks
Our main goal is to explore a users’ personal mobility data: for that purpose the user must collect GPS
data on a daily basis. Data collection is independent of the device as long as it produces some file
output, which will be used as input for our application.
Nowadays, many mobile applications (for every mobile operating system) have emerged as a solution for
GPS track recording. These applications have an important feature in common: they allow exporting
the recorded information to files - and, most importantly, they all support a specific format:
GPX. GPX, or GPS Exchange Format1, is an XML schema designed as a common GPS data format for
software applications, and we have chosen it as the input type for GPS data in our application.
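Reading the trackpoints out of a GPX file takes only a few lines with a standard XML parser. A sketch (the function name is ours; it assumes the GPX 1.1 namespace):

```python
import xml.etree.ElementTree as ET

GPX_NS = "{http://www.topografix.com/GPX/1/1}"  # GPX 1.1 namespace

def read_trackpoints(path):
    """Return (lat, lon, time) tuples for every <trkpt> in a GPX file."""
    root = ET.parse(path).getroot()
    points = []
    for trkpt in root.iter(GPX_NS + "trkpt"):
        time_el = trkpt.find(GPX_NS + "time")
        points.append((float(trkpt.get("lat")),
                       float(trkpt.get("lon")),
                       time_el.text if time_el is not None else None))
    return points
```

A GPX track nests points as trk → trkseg → trkpt, so iterating over all trkpt elements flattens the segments into one point list, which is sufficient for the processing described in this chapter.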
Before proceeding, there is an important clarification to make: the difference between tracks and
trips. A track is an object generated by some software, consisting of location data - in our work
we will consider tracks to simply be GPX files. A trip, on the other hand, is a concept related to user
behavior: a user makes a trip when he travels from one place to another, regardless of the
traversed distance. Therefore, there is no one-to-one relation between tracks and trips. Users may be
tracking their location and forget to turn off the software, thus generating a track that corresponds
to more than one trip. Likewise, trips may be divided into several tracks because, for example, the user
accidentally stopped recording the GPS signal.
Ideally, the user would turn on GPS data collection when moving from one place to another and turn it
off when arriving somewhere. This behavior is ideal for two reasons. First, batteries don't
last forever, and therefore reducing tracking to the essential minimum spares some battery. Second,
tracking location indoors does not work - the GPS signal cannot reach the device - so it only makes
sense to record tracks while moving outdoors. This behavior generates a single GPX file for every
period a user was moving, which is a better result than having a whole file for a single day. However,
as we will see in section 3.2, it introduces some problems. For example, a user might forget to turn off
the signal when not moving, which will feed false information to our application (we will lose the
one-to-one relation between trips and tracks). Furthermore, today's devices are not capable of
using the GPS sensor during a full day without recharging their batteries.
3.1.2 Semantic meaning and annotations
It is also important to assign personal meaning to the data collected by GPS sensors. This step
requires some degree of user interaction, for only the user knows which places and tracks have a special
personal meaning. To illustrate this idea, we could assign My home to a specific coordinate; from then
on, every time the user wanted to search for his home he would refer to it as such, instead of using direct
coordinates.
To address this question we propose a format (figure 3.1), inspired by an existing one used by some
supporters of lifelogging. The purpose of the format is to assign semantic meaning to a period
of time - in the case of this application, to assign a name to a time interval when the user was indoors.
A line starting with -- represents a new day. A date must follow, in the format yyyy_mm_dd. The
following lines represent intervals of time when the user was not moving - meaning he was at some
location with personal significance and semantics. Each of these lines consists of a range of time and
a location name. Time is represented as HHMM-HHMM, in which the first time is the arrival time and the
second the departure time. After the times, separated by a colon, the name of the place should be
written. After the day's entries, another new day is expected, as this format consists of a sequence of
days.

1http://www.topografix.com/gpx.asp accessed: 14-11-2015
This format also takes into account timezone changes. Timezones can be specified by entering,
alone in a line, something in the notation UTC(+/-)value, such as UTC-10 or UTC+1. All entries below
will be considered in said timezone until another line stating a different timezone appears. The default
timezone (when there are no timezone lines) is the user’s home timezone.
This is the main idea behind the format. There are, however, a few cases that require it to be extended.
The first is related to traveling underground: when a user enters a subway system he cannot
record GPS data due to the lack of signal. When generating the semantic file, there would be an interval
in which the user was supposedly indoors (and not moving) while he was actually traveling underground.
If we kept the format as explained before, there would be no correct way to annotate that entry. To
accommodate this exception, if the name contains an arrow -> we consider the entry a travel instead of
a place (for instance, 1234-1250:saldanha->baixa-chiado).
The second extension concerns travels that imply a timezone change. If a user travels between
two places with different timezones he could use the UTC tag, but doing so would imply that
both places were in the same timezone. What is expected is to have the origin in one timezone and
the destination in another. To accomplish this, the user can specify @UTC-10 before the travel, so that
only the destination hour is affected.
As we can see in figure 3.1, the user started the 15th of June at Home, then went to IST-Alameda,
Mcdonald's and INESC-ID, and at 18h11m made a subway trip from Saldanha to Oriente (notice the arrow
->). On the next day the user went to the airport at 10h12m and took a flight to Paris (notice the @UTC -
every hour following it, except the departure, is considered in the specified timezone). On the following
day, the user went to the train station to catch the train from Paris to Athens. Because that trip was
fully recorded, there is no need to use the arrow or the at sign. All the entries below the UTC line will be
in that timezone.
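A parser for this format can be sketched in a few lines. The function name and tuple layout below are ours, and the @UTC destination-only exception is left out for brevity:

```python
import re

def parse_semantic_file(text):
    """Parse the semantic context format: '--' date lines, 'UTC(+/-)n'
    timezone lines and 'HHMM-HHMM:name' entries, where an arrow in the
    name marks a travel. Returns (date, start, end, name, kind, tz)."""
    entries, date, tz = [], None, "home"    # 'home' = default timezone
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("--"):                   # new day
            date = line[2:].strip()
        elif re.fullmatch(r"UTC[+-]\d+", line):     # timezone change
            tz = line
        else:                                       # HHMM-HHMM:name
            times, name = line.split(":", 1)
            start, end = times.split("-", 1)
            kind = "travel" if "->" in name else "place"
            entries.append((date, start, end, name, kind, tz))
    return entries

sample = """-- 2015_06_15
0000-0836:Home
1811-1830:saldanha->oriente
UTC+1
2030-2359:Hotel
"""
for entry in parse_semantic_file(sample):
    print(entry)
```

Because the timezone is carried forward line by line, every entry after a UTC line is tagged with the new timezone, matching the rule described above.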
This file (eventually a complete interface in future projects) will be generated every time one or more
GPX files are inserted in the system, and gives information about time periods. The time intervals in a
semantic context file are, in practice, the complement of the time intervals for which there are recorded
GPS tracks. The user then has to record the location he was at during each specified time. After editing
the file, the system will update the information, reflecting the changes. After a few editing sessions, the
generated file will no longer come empty of place names - our system will understand and eventually
suggest places based on previous locations and times.
Figure 3.1: Proposal for the semantic context file
3.2 Real data problems
Dealing with real-world data is not an easy task: recording the same route from A to B will generate
a different output each time, because the collected data is subject to GPS reception
problems, among others. Below we analyze the most common problems regarding GPS track
collection and semantic annotation.
3.2.1 Problems related to GPS tracks
The following issues are based on known GPS problems either technological or generated by the user.
• Dataset size: By keeping a constant record of everyday life, a user can easily generate hundreds
of megabytes and thousands of files a year. Since we want this tool to allow searching over a user's
entire journey, we need to keep its data volume at a reasonable size.
• GPS Accuracy: GPS positional accuracy errors are the most common type of error, causing mis-
representation of the actual route taken. They also cause problems like the wandering effect (tangles),
an illusion of excessive position wandering while in a stationary position, and can
generate spikes and inconsistencies, usually when there is some physical obstacle between the
satellites and the receiver (figure 3.2 - first three images).
• Forgotten Start/End: Another very common error with GPS-based data collection is forgetting to
turn the data collection on/off, which in turn leads to a wrong detection of trip ends. Put
simply: a GPX file might contain not a single trip, but several. More generally, there is no direct
linkage between files and trips. Our application will have to analyze a GPX track and know when
to join tracks that belong to the same trip and when to keep unrelated tracks apart.
• Loss of signal: Due to loss of signal, some routes may be missing trackpoints, which will affect
the way we perceive the route. There is a need to refill those empty stretches of data with something
close to what actually happened. A simple example of this happens when a user enters a tunnel
(figure 3.2 - last image).
• Battery Requirement: GPS-based data collection has a very high battery requirement. Although
both hardware and software have drastically evolved, the requirement is still very high.
This can cause users to abandon data collection, or to gather lower quality (reducing the sampling
frequency to save battery) or incomplete data (battery totally drained in the middle of a track).
Figure 3.2: Common GPS problems
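One way to recover trip ends from the Forgotten Start/End problem above is to split a file wherever the gap between consecutive samples is large (signal lost, recording paused, or the user stationary indoors). A sketch of this idea, with an assumed 5-minute threshold that is ours, not a value from the thesis:

```python
def split_into_trips(points, max_gap_s=300):
    """Split (t_seconds, lat, lon) samples into trips wherever the time
    gap between consecutive samples exceeds max_gap_s (assumed 5 min)."""
    trips, current = [], []
    for p in points:
        if current and p[0] - current[-1][0] > max_gap_s:
            trips.append(current)       # gap found: close current trip
            current = []
        current.append(p)
    if current:
        trips.append(current)
    return trips

samples = [(0, 38.70, -9.10), (60, 38.71, -9.11),
           (4000, 38.80, -9.20), (4060, 38.81, -9.21)]
print(len(split_into_trips(samples)))   # 2
```

A complementary pass would detect long stationary stretches (many samples within a small radius) to handle the case where the device kept recording while the user was not moving.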
3.2.2 Problems related to semantic meaning and annotations
The following are problems that occur due to user intervention during the annotation of tracks.
• Multiple meanings: Users directly manipulating data can lead to flaws. In this particular case, a
user might have several names for the same place (e.g., Mom's house, Mommy's place, home) and,
despite all being the same place, our program needs the intelligence to understand
that.
• Forgetting something: Users who want to annotate some data (e.g., add an interval where they
were that was not recorded by the GPS device) might provide dubious data since, most likely, they
do not remember the precise hours.
• Keeping up: Places may change their names and names may change their place. That is, a
restaurant may change its name or move to a different location several times during a lifetime.
Similarly, people change address several times - thus ’Mom’s home’ may have several locations
during a lifetime.
3.2.3 Common to both
These are problems that exist both in GPS data collection and in track annotation.
• Privacy: Users sometimes opt to abandon a data collection methodology due to privacy concerns.
Users are not very keen on making their daily mobility information publicly available.
• User's Burden: Mobility data collection is more than just recording tracks. Besides collecting
tracks, there is the need to process the raw data by cleaning, correcting and completing it with more
contextual data, such as trip purposes or trip ends. This extra work can be very burdensome depending
on the methodology and technology used, which in turn can lead to a bigger gap between the data
collection and the processing taking place, ultimately resulting in lower quality data.
3.3 Solving real data problems
In the previous sections we detailed problems that arise from dealing with real data. Below we will
address these problems and find solutions to minimize them.
3.3.1 Dataset size
In order to keep the number of necessary files and the dataset size low, we must process the
collected data and apply some simplification. It is important to note that we want data simplification,
not visual simplification - that is, we want the path to remain the same without losing any significant data.
To reduce the number of points we tested two different algorithms: the Ramer-Douglas-Peucker algorithm2
2http://en.wikipedia.org/wiki/Ramer%E2%80%93Douglas%E2%80%93Peucker_algorithm accessed: 14-12-2014
(RDP) and the Visvalingam-Whyatt polyline simplification algorithm3. We also tested D3 simplification, but
it is not useful for us, since it is a visual simplification that does not affect the stored data, only
the visual outcome in the front-end.
Figure 3.3: Comparison between an original gpx track (red) and one with RDP (green)
We compared the algorithms in terms of resulting points (figure 3.4): for a similar coefficient, RDP
keeps more points than Visvalingam. The latter, however, takes too long to process the
information: it can take up to 12 seconds (RDP takes less than a second) for a track of this dimension
(2825 trackpoints).
Figure 3.4: Comparison between GPX simplification algorithms
Our final choice was RDP, because it allows fast processing of tracks and keeps enough points
(with enough space between them) so that future statistics will not be lost (figure 3.3).
The comparison can be viewed online4.
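For reference, the standard spatial-only RDP recursion can be sketched as follows. This is a minimal illustration on plain 2D points; the function name and the perpendicular-distance formula are ours, not taken from any particular library.

```python
def rdp(points, epsilon):
    """Standard Ramer-Douglas-Peucker: keep an interior point only if it
    deviates more than epsilon from the line joining first and last points."""
    if len(points) < 3:
        return points
    (x1, y1), (x2, y2) = points[0], points[-1]

    def dist(p):
        # Perpendicular distance of point p to the begin-end line.
        x0, y0 = p
        num = abs((y2 - y1) * x0 - (x2 - x1) * y0 + x2 * y1 - y2 * x1)
        den = ((y2 - y1) ** 2 + (x2 - x1) ** 2) ** 0.5
        return num / den if den else 0.0

    d_max, i_max = max((dist(p), i) for i, p in enumerate(points[1:-1], 1))
    if d_max <= epsilon:
        return [points[0], points[-1]]  # flatten the whole segment
    # Recurse on both halves, sharing the split point.
    return rdp(points[:i_max + 1], epsilon) + rdp(points[i_max:], epsilon)[1:]
```

Note how the recursion keeps only the point of maximum deviation at each level; this is exactly the behavior that, as discussed next, ignores the temporal side.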
This algorithm, however, only considers the spatial shape of a track; it pays no attention to the
temporal information attached to each trackpoint. It is important that the algorithm only deletes a
trackpoint if it is spatially and temporally close to another. Figures 3.5 and 3.6 help us understand
this better. Both figures represent a
3https://hydra.hull.ac.uk/assets/hull:8338/content accessed: 14-12-2014
4http://web.tecnico.ulisboa.pt/~jorge.s.filipe/gpx_comparison/ accessed: 28-12-2014
track from point A to point B, thus both have the same traveled distance. The first figure shows how the
traditional algorithm works: given a set of points, it "thinks" of a line between the first and
last point of the curve and checks which in-between point is farthest away from this line. If that point
(and consequently all other in-between points) is closer than a given distance - defined by us - it removes
all these in-between points. The result is a simpler line, as represented in the lower part of figure 3.5.
For example, if we want to know where we were at 12:15, we can interpolate between the endpoints,
generating a point halfway. Because we were moving at a constant speed, this is correct.
Figure 3.5: Standard RDP problem explanation 1
Considering the situation in figure 3.6, we can see that the user was stuck in a traffic jam (more GPS
samples in the same stretch of the path). If we applied the traditional algorithm to this track, we would
lose information and would not be able to ask where the user was at 12:15. It would provide misleading
information: the user would appear to be ahead in the path, when he was actually behind, because of traffic.
In this case, halfway does not correspond to the right place for 12:15, because we were not moving
at a constant speed. Thus, we must ensure that only areas of relatively constant speed are removed.
Figure 3.6: Standard RDP problem explanation 2
To solve this problem we modified the algorithm5 so it would take into account the speed of the user
(and therefore the temporal dimension). Our modification to the algorithm can be seen in listing 3.1.
Listing 3.1: Tkrajina implementation of RDP with temporal modification

prevSpeed = -100000

def simplify_polyline(points, max_distance, max_time):
    # Does Ramer-Douglas-Peucker algorithm for simplification of polyline

    if len(points) < 3:
        return points

    begin, end = points[0], points[-1]

    a, b, c = get_line_equation_coefficients(begin, end)
    t = end.time - begin.time

    tmp_max_distance = -1000000
    tmp_max_distance_position = None

    for point_no in range(len(points[1:-1])):
        point = points[point_no]
        d = abs(a * point.latitude + b * point.longitude + c)

        if d > tmp_max_distance:
            tmp_max_distance = d
            tmp_max_distance_position = point_no

    v = length_2d([end, begin]) / t.seconds  # m/s
    global prevSpeed
    if abs(prevSpeed - v) < 0.5:
        return (simplify_polyline(points[:tmp_max_distance_position + 2],
                                  max_distance, max_time) +
                simplify_polyline(points[tmp_max_distance_position + 1:],
                                  max_distance, max_time)[1:])
    prevSpeed = v

    real_max_distance, real_time = \
        distance_from_line(points[tmp_max_distance_position], begin, end)

    if real_max_distance < max_distance or real_time < timedelta(seconds=max_time):
        return [begin, end]

    return (simplify_polyline(points[:tmp_max_distance_position + 2],
                              max_distance, max_time) +
            simplify_polyline(points[tmp_max_distance_position + 1:],
                              max_distance, max_time)[1:])

5https://github.com/tkrajina/gpxpy accessed: 15-08-2015
Our modification is available at GitHub6.
After applying the RDP algorithm, we perform another simplification intended to remove points that
are not within a minimum distance of each other. This algorithm simply sweeps the whole track again and
removes points that are too close, in order to reduce the dataset size. This minimum separation between
points is specified in meters.
6https://github.com/jmsfilipe/where-have-i-been-lib
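This second pass can be sketched as a single sweep over the track. The sketch below uses names of our own and approximates the geodesic distance with a flat-earth formula, which is adequate at track scale; the project's real implementation may differ.

```python
import math

def enforce_min_separation(points, min_meters):
    """points: list of (lat, lon). Keep the first point, then drop every
    point closer than min_meters to the last point that was kept."""
    if not points:
        return []
    kept = [points[0]]
    for lat, lon in points[1:]:
        klat, klon = kept[-1]
        # Rough planar distance: 1 degree of latitude is ~111320 m.
        dx = (lon - klon) * 111320.0 * math.cos(math.radians(klat))
        dy = (lat - klat) * 111320.0
        if math.hypot(dx, dy) >= min_meters:
            kept.append((lat, lon))
    return kept
```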
3.3.2 GPS Accuracy
There is a need to alleviate certain inconsistencies generated by accuracy problems. The
most common issues are spikes and tangles along the path: a point may fall to the left or right of the
path, giving a mistaken impression of where the path actually was.
In order to mitigate this we will smooth the recorded path. That is, we will make the path visually
uniform, to avoid, for example, spikes and inconsistencies. This will also reduce the number of points
needed to describe the path taken without significantly reducing the information content (figure 3.7).
Despite reducing the number of points, the idea behind this is not at all related to the previous section
(reducing dataset size). Here, reducing points is a side effect of correcting and removing outlying points.
While in the previous section we wanted to perform a data simplification, here we want a visual
simplification.
Various approaches are possible, with varying levels of complexity. A straightforward but inaccurate
approach would be to assume the measurement noise is Gaussian and simply average together clusters of
points, with the difficulty of deciding what constitutes a cluster. A more evolved approach would be to fit
a spline or other parametric curve of a given complexity to the data using least squares, giving significant
control over the level of smoothing.
The approach used here, based on the implementation by Tkrajina7, focuses on two different ideas:
calculating the average distance between points to understand which points are outliers and therefore need
to be removed, and applying a ratio to the other points to achieve a smooth, well-fitting path.
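The two ideas can be sketched as follows. This is a hypothetical simplification with names and thresholds of our own; the actual gpxpy implementation differs in its details.

```python
def smooth_track(points, outlier_factor=3.0, ratio=0.25):
    """Sketch of two-step smoothing on (x, y) points: drop points much
    farther from their predecessor than the track's average step, then
    pull each remaining interior point toward its neighbours' midpoint."""
    def step(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

    if len(points) < 3:
        return list(points)
    steps = [step(points[i], points[i - 1]) for i in range(1, len(points))]
    avg = sum(steps) / len(steps)

    # 1) Outlier removal: a jump far above the average step is a spike.
    cleaned = [points[0]]
    for p in points[1:]:
        if step(p, cleaned[-1]) <= outlier_factor * avg:
            cleaned.append(p)

    # 2) Ratio smoothing: blend each interior point with the midpoint of
    #    its (original) neighbours, keeping the endpoints fixed.
    out = [cleaned[0]]
    for i in range(1, len(cleaned) - 1):
        mx = (cleaned[i - 1][0] + cleaned[i + 1][0]) / 2
        my = (cleaned[i - 1][1] + cleaned[i + 1][1]) / 2
        out.append((cleaned[i][0] * (1 - ratio) + mx * ratio,
                    cleaned[i][1] * (1 - ratio) + my * ratio))
    out.append(cleaned[-1])
    return out
```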
Figure 3.7: Comparison between an original gpx track (left) and one after applying the smoothing algo-
rithm (right)
7https://github.com/tkrajina/gpxpy/ accessed: 12-08-2015
3.3.3 Forgotten Start/End
Our data processing has to assume that there is no one-to-one relation between GPX files and trips: a
file may contain any number of trips. Furthermore, a trip may start in one file and end in a different
one (a user accidentally hits the stop recording button and starts recording again after a while). To achieve
the right behavior, our application has to know when to: join data from two files, in order to form a valid
trip; split data from one file, in order to form valid trips; find and delete irrelevant trips - trips so
short that they were most likely recorded by accident.
We implemented a way to join tracks that represent the same route but, for some reason, were split.
We also take into account the opposite case, in which a single track contains several trips and needs to
be split. GPX files are analyzed and, if the spatial and temporal gap between two different tracks is small
enough, those tracks should be a single one - thus representing one trip. Furthermore, if a file contains
more than one trip - there is a big gap in space and time within it - the track will be divided in two,
each representing a different route.
The algorithm to split tracks works as follows: if there is a large variation (given as a parameter) in
distance or time between two consecutive points, the track is divided in two - the first track ending at
that point, and the second starting at it. The algorithm to join tracks checks temporal and spatial distances
and, if those distances are small enough (given as a parameter), it implies those tracks should be the same
and they are, therefore, combined into one track.
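Both rules can be sketched as follows. This is an illustrative simplification with names of our own: points are (seconds, lat, lon) tuples, and only the temporal gap is tested, whereas the real implementation also checks the spatial distance.

```python
def split_on_gaps(points, max_gap_seconds):
    """Split one recorded track into trips wherever the time between
    two consecutive points exceeds max_gap_seconds."""
    trips, current = [], []
    for pt in points:
        if current and pt[0] - current[-1][0] > max_gap_seconds:
            trips.append(current)  # close the trip at the gap
            current = []
        current.append(pt)
    if current:
        trips.append(current)
    return trips

def join_adjacent(trips, max_gap_seconds):
    """Inverse operation: merge consecutive trips whose end/start points
    are temporally close enough to belong to the same route."""
    merged = [list(trips[0])]
    for trip in trips[1:]:
        if trip[0][0] - merged[-1][-1][0] <= max_gap_seconds:
            merged[-1].extend(trip)
        else:
            merged.append(list(trip))
    return merged
```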
Figure 3.8: Comparison between an original gpx track (left) and one after applying the track2trips and
smoothing algorithm (right). Three tracks can now be seen - the first until the red dot, the second from
the red to the green dot, and the last from the green dot.
3.3.4 Loss of signal
Should a user drive through a tunnel, for instance, GPS devices are not capable of recording such
information. In such cases, our application should try to find out which route was taken by the user (using
map matching tools like Track Matching8, for example) - however, these kinds of tools are still under early
development and do not produce a good percentage of realistic results. This said, in these cases our
application simply calculates a linear interpolation when there is a clear loss of signal. This interpolation
adds evenly spaced points (containing spatially and temporally interpolated information) between the points
where initially there was no information.
8https://mapmatching.3scale.net accessed: 12-08-2015
Figure 3.9: Comparison between an original gpx track (left) and one after applying the interpolation
algorithm (right)
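The gap-filling interpolation can be sketched like this. The sketch is illustrative only: the name is ours, points are (seconds, lat, lon) tuples, and the spacing is fixed by a target interval.

```python
def fill_gap(p1, p2, step_seconds):
    """Insert evenly spaced, linearly interpolated points between p1 and
    p2 (each a (seconds, lat, lon) tuple) covering a loss of signal."""
    t1, lat1, lon1 = p1
    t2, lat2, lon2 = p2
    span = t2 - t1
    n = int(span // step_seconds)      # how many intervals fit in the gap
    out = []
    for i in range(1, n):
        f = (i * step_seconds) / span  # fraction of the gap covered
        out.append((t1 + i * step_seconds,
                    lat1 + f * (lat2 - lat1),
                    lon1 + f * (lon2 - lon1)))
    return out
```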
3.3.5 Battery Requirement
This is the only problem that has no solution in our approach. Collecting GPS data every day is battery
expensive and there is not much we can do about it. Still, while we were collecting data daily, the
operating system's battery usage report stated that, on average, 30% of the battery consumption was due to
GPS tracking. With 70% of the battery left, there is still room to use the smartphone for other usual tasks.
We understand, however, that developments in battery technology are needed, so that batteries last longer
and users are encouraged to track their mobility data.
3.3.6 Multiple meanings
Multiple meanings are considered a data quality issue. There is no definitive solution for this, but there
are several things we can do to mitigate it. By providing a good annotation interface, a
user could be offered suggestions while writing a location name and then, instead of creating another
name representing the same thing, he would choose one that already exists. Furthermore, the application
could offer a find-duplicates feature in which places that are very close and have similar names would
be listed for the user to verify. Our application provides neither of these features, as our
annotation interface is very rudimentary and intended to be perfected in future projects.
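A find-duplicates feature of this kind could be sketched as follows. This is purely hypothetical - our application does not implement it - and all names and thresholds are ours: places that are both spatially close and textually similar are flagged for the user to review.

```python
import difflib
import math

def find_duplicate_candidates(places, max_meters=50, min_similarity=0.8):
    """places: list of (name, lat, lon). Return name pairs that are both
    spatially close and textually similar, for the user to verify."""
    def meters(a, b):
        # Rough planar distance between two (name, lat, lon) entries.
        dx = (a[2] - b[2]) * 111320.0 * math.cos(math.radians(a[1]))
        dy = (a[1] - b[1]) * 111320.0
        return math.hypot(dx, dy)

    pairs = []
    for i in range(len(places)):
        for j in range(i + 1, len(places)):
            a, b = places[i], places[j]
            name_sim = difflib.SequenceMatcher(
                None, a[0].lower(), b[0].lower()).ratio()
            if meters(a, b) <= max_meters and name_sim >= min_similarity:
                pairs.append((a[0], b[0]))
    return pairs
```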
3.3.7 Forgetting something
It is also considered a data quality problem. Since it depends on user interaction, there is no way to
fully fix it. There are, however, a few things that can be done to mitigate the problem. One of
them is finding patterns - if a user usually goes to a certain place on a certain day, it would be
natural to suggest it, helping the user complete the annotation faster and without corrupting the
information. As before, our application does not have support for patterns - it is left as
future work.
3.3.8 Keeping up
Shops, restaurants, parks and supermarkets may change name with some frequency. In case a location
changes its name, all previously stored locations referring to the same place should reflect that naming
change. The same situation happens when a friend moves to a new home: despite still being called, for
instance, John's house, its location is now another. Solving this implies providing a mechanism to associate
a location with a history of names, and a name with a history of locations.
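One possible mechanism is a bidirectional history, sketched below as plain dictionaries. This is a hypothetical design, not part of our implementation; all names are ours.

```python
class PlaceHistory:
    """Associate each name with a history of locations and each location
    with a history of names, so renames and moves stay queryable."""

    def __init__(self):
        self.by_name = {}      # name -> [(since_ts, (lat, lon)), ...]
        self.by_location = {}  # (lat, lon) -> [(since_ts, name), ...]

    def record(self, name, location, since_ts):
        self.by_name.setdefault(name, []).append((since_ts, location))
        self.by_location.setdefault(location, []).append((since_ts, name))

    def location_at(self, name, ts):
        """Where 'name' pointed at time ts (latest entry not after ts)."""
        entries = [e for e in self.by_name.get(name, []) if e[0] <= ts]
        return max(entries)[1] if entries else None
```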
3.3.9 Privacy
Our application does not rely on any external service and, thus, no data is sent to the internet. All of the
GPX files are stored on the user's hard drive and the application runs in a local scope.
3.3.10 User’s Burden
As said before, this problem is divided in two different areas: recording the tracks and annotating that
information. It is possible to make data annotation a more pleasant and faster job if we offer a good
interface which makes suggestions based on previous behavior. Our algorithm is capable of detecting
patterns and suggesting places that were visited before around the same hour. When a user fills in the
semantic information, our application stores, in a cache file, data about the specified locations (time,
name, coordinates). Each time users open the annotation interface to insert more data, our algorithm
processes all the cache data and, based on the most common relations between hours and places, suggests
the locations they were likely to have been at on the recently uploaded tracks.
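The suggestion step can be sketched as a frequency count over the cached (hour, place) pairs. This is a simplified illustration with names of our own; the real cache stores more fields than shown.

```python
from collections import Counter

def suggest_place(cache, hour):
    """cache: list of (hour, place_name) pairs built from previous
    annotations. Return the place most often visited at this hour,
    or None when there is no history for it."""
    counts = Counter(place for h, place in cache if h == hour)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```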
Chapter 4
Visual Queries
Now that we have data to feed our application, we need a way to explore that data. Below we analyze
the necessary expressiveness for a visual language and explain how to represent that language.
4.1 Expressiveness
Our first input will be loose GPX tracks, which by themselves have little usefulness for querying personal
spatio-temporal data. It is essential to systematize how to handle these tracks, so we can find relevant
information to query.
In table 4.1 it is possible to see a summary, based on the analysis of all the previous works, of how
to systematize personal data regarding space and time.
Three ways in which a user might query their data are addressed: Accuracy, Relativity and Con-
creteness. Each of these concepts can be applied to four different areas: Temporal, Recurrence, Spa-
tial Endpoint and Spatial Path. This was designed to give a fairly complete coverage of everything that
may evoke reasonable questions in this context.
Accuracy refers to whether the user wants to refer to something precise or wider. In the case
of time, a user could ask for an accurate time (9:00) or something more vague (around 12:00); in the
case of space, if it is an endpoint, a user might want to refer to a precise place (e.g. exactly at home)
or a range around a place (no more than 10 m from home); if it is a path, a user might want to refer to
a precise route (accurate - an actual GPS route) or a route within some limits (fuzzy - e.g. a route that
doesn't deviate 100 m from another route). When applying recurrence, a query is accurate if the user
wants to know something like "where I went exactly 3 times", or fuzzy if the user asks "where I went
around 3 times".
Relativity refers to comparisons between things. In the case of time, a relative time is one relative
to another event (one hour after ...) and an absolute time is the same as a concrete or accurate one
(9:00). In the case of space, if considering an endpoint, we can be relating a place to another (e.g. a
place near the shopping mall) or referring to an exact place (e.g. at home); if considering a path, something
like a route that doesn't deviate more than 100 m, overall, from route X (or endpoint X) is relative, and a
particular route (e.g. a route similar to an actual GPS route) is absolute. In relativity, recurrence is easy
to understand: a user might want to compare the number of times he went to certain places - "went
more times to X than to Y".
Concreteness defines the target. In the case of time, it can be an abstract date, such as "mom's
birthday", or a concrete time like "9:00". In the case of space, if considering an endpoint, we might give a
rough description of a place (e.g. a generic shopping mall) or its absolute position (e.g. mom's house). If
considering the path, an abstract path is a class of possible routes (e.g. highway) and a concrete one refers
to a particular route (e.g. highway A5). In the case of recurrence, we believe there is no such thing as an
"abstract" recurrence. A concrete recurrence is the same as an absolute or accurate one.
This table will serve as a basis for our query mechanism, because it specifies the expressiveness we
want the system to have. Thus, it specifies all the possible query types our system will have to support.
In section 4.4 we cover individual cases of the expression table and show how they can be translated to
our visual language.
Accuracy
• Fuzzy
– Temporal: a time interval (ex: five minutes around 12:00)
– Recurrence: some amount of times (ex: went around 3 times to ...)
– Spatial Endpoint: a range (ex: no more than 10 meters from home)
– Spatial Path: the same route, within some limits (ex: all routes taking the same roads from A to B, where "the same route" means they don't deviate, overall, more than 100 m from each other)
• Accurate
– Temporal: a specific time (ex: at 9:00)
– Recurrence: specific amount of times (ex: went exactly 3 times to ...)
– Spatial Endpoint: a specific location or distance (ex: at lat 37.543, lon: -7.432432; at Home; exactly 100 m from Home)
– Spatial Path: a particular route (ex: all routes similar to an actual GPS route)
Relativity
• Relative
– Temporal: time relative to another event (ex: one hour after ...)
– Recurrence: comparison (ex: went more times to X than to Y)
– Spatial Endpoint: places in relation to other ones (ex: a place near Oeiras Parque)
– Spatial Path: a route within a certain distance from a baseline, or with similar features (ex: a route that doesn't deviate more than 100 m, overall, from route X - or endpoint X)
• Absolute
– Temporal: concrete times / intervals (ex: at 9:00)
– Recurrence: specific amount of times (ex: went exactly X times to ...)
– Spatial Endpoint: an exact place (ex: at Home)
– Spatial Path: a particular route (ex: all routes similar to an actual GPS route)
Concreteness
• Abstract
– Temporal: a "class of times" (time description) (ex: start of a football game; christmas day)
– Recurrence: -
– Spatial Endpoint: a class of places or rough description (ex: a shopping mall)
– Spatial Path: a class of possible routes (ex: using a highway)
• Concrete
– Temporal: a concrete time* (ex: at 9:00)
– Recurrence: specific amount of times (ex: went exactly 3 times to ...)
– Spatial Endpoint: a concrete place (ex: mom's house, lat 37.432, lon: -7.5432)
– Spatial Path: a concrete route (ex: using highway A5)
* A time can be concrete and fuzzy ("around 12:00"), concrete and accurate ("12:00"), abstract and fuzzy ("around dinner time") or abstract and concrete ("at dinner time"). Concreteness defines the target (with more or less semantics for the user); fuzziness is how "tight" around that target the limits are.
Table 4.1: Spatio-Temporal classification for personal geolocation data
4.2 Visual language
In order for users to make queries we developed a visual language allowing them to search for their
personal spatial data.
As explained in section 4.1, our language will have to support three different concepts - time, space
(in two ways: place and path) and recurrence. Each of these concepts will be divided in different degrees
- accuracy, relativity and concreteness.
Our language metaphor is a timeline. Users can sketch their query in this timeline with two different
types of objects: paused time and movement time. As a guiding principle, everything related to time
will be drawn on the horizontal axis and everything related to space will be drawn on the vertical axis.
Another important consideration is that larger areas imply a larger value (either of time or space).
Routes cannot exist on their own: they must have a start point and a destination point. Thus, every
time two paused times are created, a route (movement time) connecting the first to the second is
automatically added.
Figure 4.1: Where Have I Been query area
Paused time (represented in figure 4.1 as a blue rectangle) reflects a period of time when the user
was not moving - usually means the user was at some place: home, restaurant, school, etc.
Movement time (represented in figure 4.1 as a gray line between the rectangles) is a period of time
when the user was moving. There is always a line between two rectangles.
To support each different concept (time, space, recurrence), each object has fields that can be filled
in. Every parameter is optional - if the user doesn't fill in a value, it will not be considered.
• Temporal concept language specification
Below we explain how the language was formulated to support the time expressiveness repre-
sented in table 4.1.
– Accuracy support A time is either accurate or fuzzy. To represent an accurate time a user
must fill in a time box, located at the left and right sides of a paused time (figure 4.2). By just
selecting a time, the user will be referring to precisely that time. However, a user may want
to know something like "Where was I after 17h?" - in that case the mathematical symbols (>,
<, ≥, ≤) should be used. To represent a fuzzy time, a user must first specify an accurate time
and then specify the desired range (figure 4.3). Times are shared between different objects;
for example, in figure 4.1 the first paused time shares its ending time with the route object
(which uses it as its start time). Instead of specifying a start/end time, a user might just want
to specify the duration of the route/stay. In this case no conflicting information can be provided: that is,
the duration cannot be longer than the specified (if any) start/end times allow (figure 4.4).
– Relativity support Relativity only makes sense when there is more than one object. To
achieve a relative specification a user has to specify a duration to the route connecting two
locations. For example, if the user wants to know where he was one hour after being at home,
he should draw two rectangles and specify the duration of one hour to the route connecting
both rectangles.
– Concreteness support An abstract time (a class of times) is specified in the same box as
start/end time - but as soon as the user types, a dropdown of existing abstract times is shown
for the user to select.
Figure 4.2: Editing a start time
Figure 4.3: Editing a temporal range
Figure 4.4: Editing a duration
• Recurrence concept language specification
Below we explain how the language was formulated to support the recurrency expressiveness
represented in table 4.1.
– Accuracy support Recurrence is represented by an arch above a stay. It is created by
selecting the rectangle and then the icon representing the arch. Recurrence is applied daily -
that is, results will show how many times you went to the specified place during a specific
day. As with time, recurrence can also be fuzzy. Selecting fuzziness follows the same principle
as in time, using mathematical symbols (>, <, ≥, ≤) - see figure 4.5.
Figure 4.5: Applying recurrence
• Spatial concept language specification
Below we explain how the language was formulated to support the spatial expressiveness repre-
sented in table 4.1.
– Endpoints The spatial concept is divided in Endpoints and Paths. Endpoints are actual locations,
either named by the user or represented by geographical coordinates.
∗ Accuracy support To specify an accurate location, a user can either write a location
name in the middle of the rectangle or select an existing location from a dropdown menu.
If a user wants to specify a coordinate instead of a name, he must click on the box and
then on the desired location on the map. As explained before, space is represented
vertically. So, in order to specify a fuzzy location (that is, a location with some range)
a user can vertically drag the upper and lower borders of the rectangle (or just input a
number, in meters, in the box on the upper right corner), see figure 4.6.
∗ Relativity support To specify a place in relation to another, a user can specify a location
(either by name or coordinate) and then specify a range in which the wanted location is.
∗ Concreteness support As with times, locations can also be grouped in classes. The
process to choose a class of places is similar. While writing in the location box, categories
will also appear on the dropdown menu.
Figure 4.6: Editing a spatial range
– Paths A path is the actual route taken by the user to get from one point to another.
∗ Accuracy support To specify an accurate route for comparison, a user can upload a GPX
file containing that route (see figure 4.7). To specify a fuzzy route, the user also needs
to upload a file, but then has to vertically drag the line - following the principle that larger
areas mean larger values - to specify the desired fuzziness, in meters.
∗ Relativity support Relativity in this case means using a location (coordinate) as a pass-
ing point of the route. To specify it, a user must click on the route box and pick a
coordinate from the map. Doing so will automatically add a range (defined in the settings).
∗ Concreteness support As with times and endpoints, paths can also be grouped into
classes. In order to select a class, a user needs to start typing in the route box - as usual,
a suggestion will appear on a dropdown, allowing the user to choose the desired route.
Figure 4.7: Adding a comparison route
4.3 Visual language examples
In this section we will consider a few entries of table 4.1 and show how they can be translated into visual
queries.
However, some of the cells are not implemented in this solution, mainly due to schedule constraints.
Temporal fuzziness: in this first example, the user states a starting time - 12 o’clock more or less 5
minutes - and says he was at home (figure 4.8).
Figure 4.8: Visual query demonstrating range of time
Temporal accuracy: in this example, the user specifies a start time (12 hours) and the location
(Home) (figure 4.9).
Figure 4.9: Visual query demonstrating absolute time
Temporal relativity: here the user is asking for the time when he left home and after one hour was
at School - specifying time relatively to a previous event (figure 4.10).
Figure 4.10: Visual query demonstrating time relative to another event
Spatial (endpoint) fuzziness: in this fourth example the user wants to know when he was within
100 meters from Home (figure 4.11).
Figure 4.11: Visual query to specify spatial fuzziness
Spatial (endpoint) accuracy: in this example the user wants to know when he was exactly 100
meters from Home (figure 4.12).
Figure 4.12: Visual query demonstrating spatial accuracy
Spatial (endpoint) relativity: in this example (figure 4.13) we refer to relative spatial end-
points. We implemented a simplified version of this concept: we can mark a coordinate on the map or
refer to a known location and then specify a spatial range - much like in figure 4.11.
Figure 4.13: Places in relation to other ones - simplified version
Spatial (endpoint) absoluteness: in the seventh example we refer to precise known locations already
registered by the user (figure 4.14).
Figure 4.14: An exact place
Spatial (path) relativity: here we state we want to know all the routes that passed near a given
coordinate (the default is a 200 meter range) (figure 4.15).
Figure 4.15: A route within a certain distance from a baseline
The two following examples show more complex queries, that represent the full potential of the visual
language and illustrate more realistic situations. In the first situation (figure 4.16) the user can find when
he was at least 1 hour at home close to noon.
Figure 4.16: Visual query demonstrating ending times
In the last situation (figure 4.17), users want to know every time they went to a commercial space on
the way from home to IST Alameda, but have not been there more than 30 minutes.
Figure 4.17: Visual query demonstrating duration constraints and class of places
Some cells in the table were not (yet) implemented in this solution (grey cells in table 4.2). Below is
a brief explanation.
Regarding the temporal side, we haven't implemented abstract times (classes of times), as that would
imply even more annotation from the user.
Recurrence was discarded from implementation because of schedule constraints.
Regarding the spatial scope, no class of places was implemented (we do have categories that aggregate
locations but, as of now, they only serve the purpose of coloring the results). Similarly, we do not
support named routes (such as highway).
4.4 Visual language validation
This user test was intended to evaluate the understandability of the visual query mechanism. Here we
will present a few conclusions and the changes thereby made regarding the first test results that served
as a query validation method.
Our universe of testers was composed of a group of four people who demonstrated interest in
using this kind of tool and had some experience using technological gadgets (such as smartphones,
tablets and computers), both female and male, aged between 20 and 26 years old.
Firstly, they were briefly introduced to the system's interface, and the main goals and essential
mechanics were explained. Then each user was handed a copy of the test protocol (appendix A.1) and
could consult it to clarify any doubts.
Users were asked to perform tasks speaking out loud about their doubts on how to proceed, so that
we could gather some feedback. Notes about user behavior and decisions while using the interface
were taken, in order to complement the validation. The number of errors users made while attempting
to perform each task was also recorded.
Tasks were designed to cover all implemented cells of table 4.1, from temporal concreteness and
fuzziness to spatial concreteness and fuzziness, searches with durations, searches with more than one
location, etc. Three tasks focused on the spatial side, four tasks focused on the temporal side and one
task was dedicated to exploring the interface.
To test spatial accuracy, for example, we used task 1 (in appendix A.1), which stated: "Search for
dates where I arrived at ist-alameda at precisely 9h16min".
In each task we counted the number of errors made by users, so we could have some metrics about
the successfulness of the task. Results are presented in table 4.2.
Below is the list of all the tasks presented to users.
1. Search for dates where I arrived at ist alameda at precisely 9h16min.
2. Search for dates where I was at ist alameda from around 10h20min to 12h20min and then went
somewhere else
3. Search for when I was at ist alameda and then went to ist taguspark
4. Search for where I was at ist alameda, then took route A5 to ist taguspark
5. Search when I was at ist alameda for more than 2 hours
6. Search when I spent less than 1 hour between ist alameda and ist taguspark
7. Search when I was in a 500 meter range from ist alameda
8. Search all the days I was at casa
To validate, we used our implementation so far. It was not just a comprehension test - we wanted to
know if users could really make the queries. Thus we needed to use something concrete for users to
complete the tasks.
Table 4.2: Average number of errors made by task - each cell is a task. Gray cells were not implemented
in this solution. (* this cell expected behavior is the same as another existing one)
Analyzing table 4.2 we can see that tasks 2 and 4 presented a higher number of errors. Task 2 specifies
"Search for dates where I was at ist-alameda from around 10h20min to 12h20min and then went
somewhere else". Its complex nature justifies the higher number of errors: a user must first click on the
blue arrow on the side of the rectangle, which may not be an obvious thing to do. Task 4 states "Search
for where I was at ist-alameda, then took route A5 to ist-taguspark". Because we do not support classes
of routes, to successfully finish this task a user must select a point belonging to the A5 on the map.
We also derived a list of suggested changes - divided in what we actually implemented and what we
consider future work.
Implemented issues:
• Replace all the names in aggregation results with "click to expand" instead of overlapping location
names
• Change the way to erase input. Instead of having a cross in the widget popup, show a cross in the
input box
• Add to settings the time at which results become aggregated
• Change map representation (theme) or change route line color for more contrast
Future work issues:
• Result comparison: select two queries and compare both of them
• Pop-up tutorial video
Generally, the results of this validation were positive: most people understood the main principles of the visual language. Although some tasks were more error prone, the majority of tasks were executed without difficulty, revealing that the visual encoding behind the language specification is understandable and easy to use.
Chapter 5
Where Have I Been
Here we present our approach: first by explaining the scenario in which it will be used, and then by describing all the components of our implementation.
Our typical user carries a smartphone every day - the brand does not matter, as long as it has a GPS receiver and enough battery. Each time the user moves from point A to point B, whether walking, driving or otherwise, he/she records the path with the smartphone. A very simple Android application for this is My Tracks1: it allows recording to be started and stopped with a single click, without having to type anything or go through complicated menus.
The main idea is that, each time a user moves from one place to another, a different track should be recorded. This means that if, for example, a user goes from home to a coffee-shop, spends five minutes drinking coffee, and then goes to school, the result is two different tracks: home → coffee-shop and coffee-shop → school.
As stated before, we support personal semantics for the locations a user is familiar with. For this to happen, the user has to record which place he/she was at during a certain hour - with this mechanism (which is intelligent enough to understand patterns and suggest routes that happened in similar conditions) we are able to categorize locations in a personally meaningful way.
We adopted a client-server approach in order to improve scalability. The main functions of our server
are to clean GPX files, store data in a spatial database and process the queries.
As explained before, our main objective is to be agnostic to the source of the data, but it is also important to provide quality data so that the queries and visualizations are optimal. Thus, each GPX file representing a track is processed by a library (GPX Lib) that, among other things, cleans the data, removes error points and smooths the trajectory. All the problems related to GPX files explained in section 3.3 are solved at this stage.
All our server code is written in Python. This choice was made because Python has great community support and many publicly available libraries. We provide this API on GitHub2: the library is not specific to this work, and we think it could be used by others to remove errors from GPX tracks.
After being processed, track data is stored in a PostgreSQL + PostGIS database. A common relational database would not suffice, because we are handling a particular data type: geographical data. While relational databases are designed for types like integers, dates, strings and currency, we need a database that can conveniently handle points, edges and polygons (areas). Several extensions to well-known database systems provide spatial support. We chose the PostgreSQL extension PostGIS3 because it is considered a mature system and it supports points, linestrings and polygons. Furthermore, it follows the SQL specification of the Open Geospatial Consortium4.
1https://play.google.com/store/apps/details?id=com.google.android.maps.mytracks accessed: 20-09-2015
2https://github.com/jmsfilipe/where-have-i-been-backend accessed: 15-08-2015
Communication between the client (browser) and the server is done using WebSockets. WebSockets were chosen over Ajax requests because they allow the server to push data to the browser without the browser having to request it, while the browser can still send data to the server. This is quite useful for notifications and updates: the server can send them as soon as they are available, instead of waiting for the browser to poll for them.
The front-end exchanges both queries and results with the server as JSON messages; in particular, messages that contain spatial data are encoded in GeoJSON5. We chose JSON because it is a common standard for transmitting data between a server and a web application. Furthermore, having a specific standard for spatial structures is also useful and can benefit this solution in the future, by easing integration with other projects. Another advantage of JSON is the large number of tools that already support it.
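To make the encoding concrete, the following minimal sketch builds a GeoJSON Feature for a track and serializes it with the standard library. The exact message fields the system exchanges are not specified here, so the single description property and the sample coordinates are illustrative assumptions:

```python
import json

def track_to_geojson(points, description):
    """Encode one recorded track as a GeoJSON Feature. A minimal sketch:
    the actual messages exchanged by the system may carry extra fields."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "LineString",
            # GeoJSON mandates [longitude, latitude] coordinate order
            "coordinates": [[lon, lat] for lat, lon in points],
        },
        "properties": {"description": description},
    }

# (lat, lon) pairs as they might come out of a cleaned GPX track
track = [(38.7369, -9.1387), (38.7372, -9.1390)]
message = json.dumps(track_to_geojson(track, "casa -> ist-alameda"))
```

Because the payload is plain JSON, it can be consumed directly by the browser-side libraries mentioned below.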
To implement the query and result viewing mechanisms we used JavaScript libraries such as vis.js6, moment.js7, list.js8, jQuery UI9, ClockPicker10, the Semantic UI sidebar11, the Google Maps JavaScript API12 and some Bootstrap13 features.
Figure 5.1: Solution schema
3http://postgis.net/ accessed: 14-12-2014
4http://en.wikipedia.org/wiki/Simple_Features accessed: 23-12-2014
5http://geojson.org/geojson-spec.html accessed: 12-08-2015
6http://visjs.org/docs/timeline/ accessed: 15-08-2015
7http://momentjs.com/ accessed: 15-08-2015
8http://www.listjs.com/ accessed: 15-08-2015
9https://jqueryui.com/ accessed: 15-08-2015
10http://weareoutman.github.io/clockpicker/ accessed: 15-08-2015
11http://semantic-ui.com/ accessed: 15-08-2015
12https://developers.google.com/maps/documentation/javascript/ accessed: 15-08-2015
13http://getbootstrap.com/css/ accessed: 15-08-2015
5.1 GPX Library
In chapter 3 we described our GPX manipulation library; here we explain how we use it. The first step in manipulating GPX files is to analyze each file and detect whether it corresponds to a single trip, to several trips, or to just part of a trip. After analyzing all the files, the algorithm merges and splits them so that the final output is one file per trip. All files are saved following a convention - the date on which the route took place followed by the number of the track within that day (e.g., 2014-10-09-part0.gpx). With all the necessary files ready, we start by minimizing GPS accuracy problems with the smoothing algorithm. As previously explained in section 3.3, this algorithm removes inconsistencies, spikes and tangles, resulting in the same number of files, but without visual problems. The next algorithm applied to the files is the modified Ramer-Douglas-Peucker (simplify_polyline). It performs data simplification rather than merely visual simplification (reducing the number of points while maintaining the appearance). The last algorithm (reduce_points) re-checks every track for points that can still be removed because they are either spatially or temporally too close to each other. The full documentation of the GPX Library can be read in appendix C.
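As an illustration of that last pass, the sketch below drops points that are too close, spatially or temporally, to the last point kept. The 5 m / 2 s thresholds and the equirectangular distance approximation are illustrative assumptions, not the actual values used by GPX Lib:

```python
import math

def reduce_points(track, min_meters=5.0, min_seconds=2.0):
    """Drop points that are spatially or temporally too close to the last
    point that was kept. `track` is a list of (lat, lon, t_seconds)."""
    if not track:
        return []
    kept = [track[0]]
    for lat, lon, t in track[1:]:
        plat, plon, pt = kept[-1]
        # Equirectangular approximation: accurate enough for points
        # that are at most a few hundred meters apart.
        dx = math.radians(lon - plon) * math.cos(math.radians((lat + plat) / 2))
        dy = math.radians(lat - plat)
        meters = 6371000 * math.hypot(dx, dy)
        # A point survives only if it is far enough in BOTH space and time
        if meters >= min_meters and (t - pt) >= min_seconds:
            kept.append((lat, lon, t))
    return kept

# Two duplicate-ish points collapse onto the first; the distant one survives.
raw = [(38.70, -9.14, 0.0), (38.70, -9.14, 1.0), (38.75, -9.14, 100.0)]
cleaned = reduce_points(raw)
```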
5.2 Semantic Annotation
After being processed, files are ready to be annotated by the user - annotation uses the format specified in section 3.1.2. We provide a rudimentary interface (figure 5.2) for this task; currently it only shows a text editor for the semantic locations file. Future work includes developing a user-friendly interface to correct GPX data and add personal semantics to the tracks.
Figure 5.2: Data collection edition
5.3 Backend
The backend of our solution is composed of three distinct components (figure 5.1): the Database, Query translation and Communication components.
After GPX files are manipulated and annotated, their data is stored in a spatial database. A spatial database is optimized to store and query data representing objects defined in a geometric space. In addition to typical SQL queries such as SELECT statements, spatial databases can perform a wide variety of spatial operations. The following operations, among many others, are specified by the Open Geospatial Consortium standard:
• Spatial Measurements: Computes line length, polygon area, the distance between geometric fig-
ures, etc.
• Spatial Functions: Modify existing features to create new ones (like intersecting features).
• Spatial Predicates: Allows true/false queries about spatial relationships between geometries. Examples include ”do two polygons overlap?” or ”is there a residence located within a mile of this place?”.
• Geometry Constructors: Creates new geometries, usually by specifying the vertices (points or
nodes) which define the shape. Types include Point, Linestring, Polygon, MultiPoint, MultiLinestring,
MultiPolygon and GeometryCollection.
• Observer Functions: Queries which return specific information about a feature, such as the location of the center of a circle.
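To make the first two categories concrete, here is a pure-Python sketch of a distance measurement and a derived ”within X meters” predicate. PostGIS computes such distances on the ellipsoid for geography types; the spherical haversine formula below is a deliberate simplification for illustration:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters on a spherical Earth model --
    the kind of spatial measurement behind a call such as ST_Distance
    (PostGIS geography types use the full ellipsoid instead)."""
    r = 6371008.8  # mean Earth radius, meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def within(lat1, lon1, lat2, lon2, meters):
    """Spatial predicate: true when two points lie closer than `meters`."""
    return haversine_m(lat1, lon1, lat2, lon2) < meters

# Two nearby points (illustrative coordinates near the Alameda campus)
d = haversine_m(38.7369, -9.1387, 38.7372, -9.1390)
```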
There is a large number of spatial reference systems (datums) in use, many of them optimized for one particular part of the world. A datum is a model of the Earth used in mapping: it consists of a series of numbers that define the shape and size of the reference ellipsoid and its orientation in space, chosen to give the best possible fit to the true shape of the Earth. We chose WGS-84 (4326)14 because it is the datum globally used when dealing with GPS coordinates.
In our case, a track will correspond to a Linestring in PostGIS, while an annotated location will corre-
spond to a Point. Tracks will be stored in table linestrings (figure 5.3) while locations are stored in table
places.
Route start and end times are stored in table trips. If a route has a corresponding Linestring (that is,
if the user recorded it), its corresponding data will be referenced in table linestrings. If the trip was made
without being recorded (that is, just annotated), a null entry will appear in table linestrings.
Start and end times of stays (indoor time) are stored in table stays.
Locations are stored in table places and are regularly updated: when a user annotates a location, our algorithm compares the coordinate of that annotation against all previous coordinates with the same name and calculates a new (more accurate) location by removing outliers and averaging the rest.
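A minimal sketch of that refinement step follows. The outlier rule used here (farther than three times the mean deviation from the centroid, in degrees) is an illustrative assumption, as the text does not fix a specific criterion:

```python
from statistics import mean

def refine_location(coords):
    """Recompute a place's representative coordinate from every annotation
    with the same name: drop outliers, then average what remains. The
    outlier cutoff (3x the mean deviation from the centroid) is an
    illustrative choice, not necessarily the one used in the system."""
    clat = mean(la for la, lo in coords)
    clon = mean(lo for la, lo in coords)
    # Manhattan deviation in degrees from the raw centroid
    devs = [abs(la - clat) + abs(lo - clon) for la, lo in coords]
    cutoff = 3 * mean(devs)
    kept = [c for c, dev in zip(coords, devs) if dev <= cutoff]
    return mean(la for la, lo in kept), mean(lo for la, lo in kept)

# Nine consistent annotations of 'casa' plus one GPS glitch far away
coords = [(38.70, -9.14)] * 9 + [(40.00, -8.00)]
lat, lon = refine_location(coords)
```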
14https://nsidc.org/data/atlas/epsg_4326.html accessed: 20-09-2015
Figure 5.3: Database Entity-Relationship diagram
We also load-tested our solution, to determine whether it could deal with a lifetime of geospatial personal data. We used 20835 GPX files - assuming eight files per day, this corresponds to roughly seven years of data. These files totaled 1.7GB. The assumption of eight files per day was derived from a worst-case scenario based on data collected by the author's supervisor over five years (table 5.1).
Table 5.1: Summary of collected data during 5 years
The files were first converted by our GPX library (to split/join, smooth and simplify tracks), resulting in 26505 GPX files totaling 280.8 megabytes - a size reduction of 83%. Processing took 54 minutes and 57 seconds. Despite taking almost an hour, we believe this is an acceptable result, since such an operation is not frequent: a user will typically not process seven years of data at once, but only new data as it arrives, ideally daily, more probably once a week. Applying the semantic processing algorithm (associating hours from the semantic interface) took 8 minutes and 33 seconds, and inserting all the data in the database took 13 minutes and 13 seconds. These tests were run on an Intel Core i5-430M dual-core at 2.26GHz, with 4GB of DDR3-1066 RAM and a Samsung SSD 840 Evo disk.
5.3.1 Query translation
Once all this data is stored, our server is ready to receive queries from the client. When a user enters a query in the interface and presses the search button, a JSON object containing all the fields is sent to the server. Upon receiving this object, the server translates that information into an SQL query. The translation mechanism is quite simple: each part of the query system has a direct translation to SQL, and we just need to compose all the different parts into a single query. Furthermore, different query types have different SQL templates, so generating a full SQL statement means combining all of the above. If a user builds a query based on temporal fuzzyness, we know the main structure of our statement will need a WHERE clause; and because we are dealing with fuzzyness, we will also need a BETWEEN clause. After generating the template, the properties inside the JSON object are converted to the correct types and injected into the SQL statement.
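The composition step can be sketched as follows for a single paused-time element. The helper name, the minutes-since-midnight time representation and the exact clause shapes are assumptions consistent with the schema described in this chapter; a fuzzyness of F minutes around time T becomes BETWEEN T − F/2 AND T + F/2, matching the ±15 min example given for a 30-minute fuzzyness:

```python
def build_stay_clause(fields):
    """Hypothetical sketch of translating one paused-time element of the
    JSON query into SQL over the `stays` table. Times are minutes since
    midnight; values are passed as parameters, never spliced into SQL."""
    where, params = [], []
    if fields.get("location"):
        where.append("description = %s")
        params.append(fields["location"])
    start = fields.get("start_time")
    fuzz = fields.get("fuzzyness", 0)
    if start is not None:
        if fuzz:
            # temporal fuzzyness maps onto a BETWEEN clause
            where.append("start_date BETWEEN %s AND %s")
            params += [start - fuzz // 2, start + fuzz // 2]
        else:
            where.append("start_date = %s")
            params.append(start)
    sql = "SELECT stay_id, start_date, end_date FROM stays"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params

# 10h (600 min) with 30 minutes of fuzzyness -> 9h45 to 10h15
sql, params = build_stay_clause(
    {"location": "ist-alameda", "start_time": 600, "fuzzyness": 30})
```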
For example, the query represented in figure 5.4 results in the SQL code in listing 5.1. The query states ”all the situations in which a user was around 100 meters from ist-alameda and then went to around 100 meters from ist-taguspark”. This query requires information from both the stays and trips tables. The first part of the query (from line 1 to 22) and the last part (from line 28 onwards) both return information from table stays, restricting the location name with the spatial constraint of 100 meters. The second part, from line 23 to 27, returns all trips (because there are no constraints on the trip). These three parts of the query are then joined (INNER JOIN) so that we only have results that match all three.
Figure 5.4: Query illustrating SQL translation
Listing 5.1: SQL translation example
1 SELECT q1.stay_id,
2        q1.start_date,
3        q1.end_date,
4        q2.trip_id,
5        q2.start_date,
6        q2.end_date,
7        q3.stay_id,
8        q3.start_date,
9        q3.end_date
10 FROM (WITH l AS
11          (SELECT point
12           FROM places
13           WHERE description = 'ist-alameda'), k AS
14          (SELECT description
15           FROM places,
16                l
17           WHERE ST_Distance(places.point, l.point) < '100')
18       SELECT stay_id,
19              start_date,
20              end_date
21       FROM k
22       INNER JOIN stays ON description = stays.stay_id) q1
23 INNER JOIN
24   (SELECT DISTINCT trip_id,
25                    start_date,
26                    end_date
27    FROM trips) q2 ON q1.end_date = q2.start_date
28 INNER JOIN (WITH l AS
29          (SELECT point
30           FROM places
31           WHERE description = 'ist-taguspark'), k AS
32          (SELECT description
33           FROM places,
34                l
35           WHERE ST_Distance(places.point, l.point) < '100')
36       SELECT stay_id,
37              start_date,
38              end_date
39       FROM k
40       INNER JOIN stays ON description = stays.stay_id) q3 ON q2.end_date = q3.start_date
After generating the whole SQL statement, the server executes it and processes the resulting information. To improve result readability on the client side, results are first aggregated by similarity and location name: entries having similar times and the same location are aggregated. This process produces aggregated entries and loose entries. Loose entries keep their properties (start time, end time and location), while aggregated entries have as properties a summary of the entries they contain (start and end times divided into quartiles, calculated over all contained results). The practical result is shown in figure 5.5: a summary of all the stays at casa, showing the most relevant (frequent) hours of arrival and departure. After aggregation, the results are sent to the client as a JSON object.
Figure 5.5: Aggregated result with quartiles
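The aggregation step can be sketched as follows. The grouping rule (same location name, start times within the configurable tolerance of the group's first entry) and the minimum group size of four are illustrative assumptions, since the text does not pin the algorithm down exactly:

```python
from statistics import quantiles

def aggregate(entries, tolerance_min=60):
    """Group (location, start_minute) entries that share a location and
    have similar start times; summarize groups of four or more by
    start-time quartiles, and keep the rest as loose entries."""
    groups = []
    for loc, start in sorted(entries):
        for g in groups:
            if g["location"] == loc and abs(start - g["starts"][0]) <= tolerance_min:
                g["starts"].append(start)
                break
        else:
            groups.append({"location": loc, "starts": [start]})
    out = []
    for g in groups:
        if len(g["starts"]) >= 4:
            # three cut points: lower quartile, median, upper quartile
            out.append(("aggregated", g["location"], quantiles(g["starts"], n=4)))
        else:
            out.extend(("loose", g["location"], s) for s in g["starts"])
    return out

# Four similar stays at 'casa' collapse into one summary; 'ist' stays loose
entries = [("casa", 540), ("casa", 555), ("casa", 545), ("casa", 550), ("ist", 600)]
out = aggregate(entries)
```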
5.4 Frontend
In this section we explain how we built our interface based on the previously specified visual language. We start by presenting how to use the interface and then explain some implementation decisions.
5.4.1 User interface
Our interface is divided into four different areas: the Query area, Results area, Map area and Settings area (in that order in figure 5.6). Each of these components has a different purpose. In the Query area the user can specify a query in a simple visual way, as previously explained; a careful analysis of the visual query language was given in section 4.2. Below we review each component of the interface, starting with the Query area.
Figure 5.6: Where Have I Been user interface
Query area
We have two types of sketchable objects: paused time and movement time.
Paused time, as we have seen in chapter 4, is represented by a rectangle and reflects a period of time when the user was not moving - usually meaning the user was at some place: home, a restaurant, school, etc.
Movement time, as seen in chapter 4, is represented as a gray line between the rectangles and reflects a period of time when the user was moving. There is always a line between two rectangles.
Double clicking on the timeline creates a rectangle (paused type) whose start and end times are undefined, while clicking and dragging creates a rectangle whose duration changes while dragging. Every time two paused types are created, a route (movement type) connecting the first to the second location automatically appears.
Figure 5.7: Where Have I Been query area
Each sketchable type has parameters that can be configured. Every parameter is optional - any value the user does not fill in is simply not considered. Below we explain the different parameters and their usage.
Paused type parameters:
• start time (figure 5.8) - A start time is precisely that - the time at which the stay started. It appears on the lower left side of the rectangle and, when clicked, shows a clock widget for the user to select hours and minutes. By just selecting an hour, the user refers to precisely that hour. However, a user may want to ask something like ”Where was I after 17h?” - in that case the mathematical symbols (>, <, ≥, ≤) should be used.
Figure 5.8: Editing a start time
• end time (figure 5.8) - An end time follows the same principle as a start time, except that it is located on the opposite side.
• fuzzyness in start time (figure 5.9, upper left side) - By setting this parameter, in minutes, a user can add some degree of fuzzyness to the start time. For instance, setting a fuzzy time of 30 minutes and a start time of 10h creates a valid time range from 9h45m to 10h15m.
• fuzzyness in end time (figure 5.9, upper right side) - Follows the same principle as the previous parameter, except that it is located on the right side.
Figure 5.9: Editing a temporal range
• duration (figure 5.10) - The duration of the stay. This parameter cannot conflict with the start/end times: if a user specifies a start time of 10h, an end time of 12h and a duration of 3 hours, the query is invalid and produces no results.
Figure 5.10: Editing a duration
• location (figure 5.9, upper middle) - It is possible to specify a location, either selected from a dropdown menu (populated from the annotation process) or picked as a geographic location on the map.
• spatial range (figure 5.11) - Defining a spatial range is useful because sometimes a user might not know exactly where he was; he can define a location and then a range (in meters) around that location that is also valid. It can be changed by vertically dragging the rectangle's border or by writing directly in the box.
Figure 5.11: Editing a spatial range
Movement type parameters:
• route (figure 5.7, middle) - The route parameter allows a user to specify a point on the map through which he/she may have passed. By default, this point accepts a range of 250m around itself. Using this, a user can easily find whether or not he/she drove on a specific road - just pick a coordinate on the map near that road.
• start time (figure 5.8) - There is no need to define a start time for a movement type. Since paused and movement types are intimately related, we can see in figure 5.7 that the indoor's end time will be the movement's start time.
• end time (figure 5.8) - In this case, the movement's end time will be the second indoor's start time.
• fuzzyness in start time (figure 5.9, upper left side) - Fuzzyness works as in the paused type. If someone sets a fuzzy time of 30 minutes and an end time of 10h, this creates a valid range of movement start times from 9h45m to 10h15m.
• fuzzyness in end time (figure 5.9, upper right side) - Fuzzyness at end time follows the same
principle as for start time.
• duration (figure 5.7, middle) - As with the paused type, durations in the movement type cannot conflict with start or end times.
In the Results area, the results that correspond to the query are listed; each entry represents a result that somehow matches the query. In the Map area, the highlighted result is shown on a 2D map. In the Settings area, a user can edit the settings, including assigning categories to places, so that results are shown in the category's color.
After sketching the query, a user can either start the search by pressing the search button, or clean the canvas if a mistake was made (figure 5.12). A user can also delete an individual paused type by clicking on the red cross icon (figure 5.13). It only makes sense to have an odd number of elements in the search area - that is, it makes no sense to have a paused type followed only by a movement type leading nowhere. This decision has several implications: a user can only create paused types, and all movement types are automatically added between paused types; likewise, a user can only delete paused types, and the associated movement types are deleted with them.
Figure 5.12: Where Have I Been search area
A user might also define a date to search, on the left side (figure 5.12). If no date is provided, the search is considered global (that is, it searches all recorded days), which is represented as ”–/–/—-”.
Figure 5.13: Removing a query element
Because visual queries can grow quickly (for example, if a user sketches several locations), all that information must fit on the screen. Just shrinking the sketch to fit is not an option, since the overall size would become too small. So, we decided to allow panning and zooming in all queries: panning is done by clicking on the query and horizontally dragging the mouse, while zooming uses the scroll wheel.
Inserting elements in the middle of an existing query is also possible and useful (if a user forgets to add an indoor type, for instance): one can drag the mouse between two indoor times (in the movement time zone) to add another one between the existing two.
Results area
Upon clicking the search button in the search area, our backend translates the query and sends the results to the client. In the results area (figure 5.6 - 2), however, not all results are shown at once. For performance reasons - since we plan to allow viewing an entire lifetime of information - more results are drawn only when the user scrolls down (in batches of 10). Our current implementation aggregates results on the server side, just before sending them to the client. When dealing with a large dataset, the complex nature of our aggregation algorithm causes a significant performance drawback: results take too long to be shown on the client. All data corresponding to the results of the query is stored in the client's memory; even though more results are drawn only when the user scrolls down, they all sit in browser memory and take too long to process.
Figure 5.14: Results reuse and lock options
To avoid this situation in large datasets, a future approach would be to simplify the aggregation algorithm so that it can be expressed in SQL. Results would then be aggregated in the database, and only a fraction of them sent when the user presses search; the following results would be requested as the user scrolls down, with a pagination mechanism on the server side.
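Such a mechanism, which is not part of the current implementation, could look like the following sketch; the function names are illustrative, and the in-memory helper only stands in for what the database would return:

```python
def paged_query(base_sql, page, page_size=10):
    """Build the SQL for one page of results (pages numbered from 0).
    `base_sql` must carry a deterministic ORDER BY for stable pages."""
    return f"{base_sql} LIMIT {page_size} OFFSET {page * page_size}"

def fetch_page(all_rows, page, page_size=10):
    """In-memory stand-in for what the database would return for one
    page, used here only to illustrate the scrolling behavior."""
    start = page * page_size
    return all_rows[start:start + page_size]

# The client asks for page 2 as the user scrolls down
sql = paged_query("SELECT stay_id FROM stays ORDER BY start_date", 2)
rows = fetch_page(list(range(25)), 2)
```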
Aggregated results are shown as a small summary of their content, with a stronger gradient where the user was more often. They are represented as overlapping semi-transparent rectangles, showing the durations at different locations. When an aggregated result is clicked it expands, showing its aggregated content (figure 5.15).
Users also have the option of reusing a result as a new query (magnifying glass icon in figure 5.14) - doing so cleans the results and copies the location and the start and end dates of the result to the query area, ready to be refined and searched again.
Results can also be panned and zoomed. By default, panning and zooming apply to all the results shown - that is, all results are panned/zoomed at the same time. This decision was made to allow comparison between results. If a user wants to inspect a result individually, he/she can pan and zoom it after clicking on the lock icon (figure 5.14).
Figure 5.15: Results panel
Map area
When clicking on a specific result (not a summary result), the combination of locations and tracks is shown on the map (figure 5.6 - 3). When selecting (clicking on) an aggregated result, the first entry of the aggregation is shown on the map; as a future feature, we plan to show a summary of the contained trips instead of just the first. A user can also highlight a specific location or track by clicking exactly on it after clicking on the entry; the map automatically zooms to show the full track. If there is no data (tracks or locations) associated with the result, no information is shown on the map. The map is also integrated in the search area: a user can pick any point on the map and use it in a query.
Settings area
The settings interface (figure 5.16) appears on the left side of the screen when a user presses the button represented in figure 5.6 - 4. There are two different areas in the settings. The Categories area allows the user to create categories - that is, names representing sets of places - and assign a color to each. In the Places area, a user can assign a category (and thus a color) to each of the recorded locations. This affects the way results are colored: all results belonging to a category appear in the color defined by the user. It is also possible to import and export both category and place relations from/to a CSV file.
Figure 5.16: Where Have I Been settings interface
5.4.2 Implementation
Where Have I Been's frontend was constructed entirely in pure JavaScript, without any framework. The only library used to facilitate development was Browserify15, which allows JavaScript code to be divided into modules and produces a single minimized file with all the necessary code.
Generic timeline considerations
Our implementation of the visual language was based on the vis.js16 timeline module. Vis.js data items can take place at a single date, or have start and end dates (a range). It is possible to freely move and zoom in the timeline by dragging and scrolling, and items can be created, edited and deleted. The time scale on the axis is adjusted automatically and supports scales ranging from milliseconds to years. The timeline is rendered with regular HTML DOM elements, which allows flexible customization using CSS styling. However, our visual language required more than what the library offered. Necessary interventions included: changing the way timelines were represented, so that they were vertically centered on an axis; adding a representation for movement time; support for text, date, time, duration and spatial range inputs; the option to apply temporal fuzziness to start and end times; the ability to add paused time in the middle of a movement time; and the option to vertically and horizontally expand the timeline.
15http://browserify.org/ accessed: 12-10-2015
16http://visjs.org/docs/timeline/index.html accessed: 12-10-2015
The first challenge (vertically centering the timeline and adding an axis) involved deleting unnecessary DOM components provided by the original timeline and adding new DOM elements to center our timeline and to represent a thin horizontal line. The next challenge involved creating classes for each of our types: paused and movement. Each instance of these classes has the necessary methods to draw its representation according to the data received as a parameter. These classes also take care of the dynamics behind each <input> box, which is explained in detail in the next section.
Text input
Several important decisions were made across the interface regarding text input. <input> boxes are absolutely positioned in CSS relative to the rectangle, so that, when panning, they follow its movement. We chose to include widgets for time and duration selection. The time selection widget was based on ClockPicker17. It uses a clock metaphor and allows the user to easily select an hour/minute. Because in our solution time may be relative (more than two hours, less than two hours), we appended a <div> to the widget that lets the user choose between the options (close, >, <, =, ≥, ≤) - figure 5.17 - 1.
Figure 5.17: Interface input widgets: 1 - the time selector; 2 - duration selector; 3 - spatial constraint
selector
17https://weareoutman.github.io/clockpicker/ accessed: 12-10-2015
When picking a duration, we chose to implement a picker that allows hours and minutes to be selected individually: when minutes exceed 59, an hour is incremented. This widget also includes the previously mentioned mathematical operators in the same way - an appended <div>. When dealing with spatial
constraints, there is also the need to specify whether we mean more than a value or less than a value - so we chose to use a single widget, as represented in figure 5.17 - 3. Using any of the widgets affects the input box being edited: if an operator is chosen, it appears in front of the value in the input box. When the user wants to select a specific date to constrain the query, a standard calendar picker is shown.
Dealing with fuzziness
To edit fuzzy time, an arrow must first be clicked in order for the fuzziness interface to appear (figure 5.18).
Figure 5.18: Fuzziness interface. Left side: the user clicked and activated the interface. Right side:
arrow for the user to click and activate the interface.
Fuzzy time can be edited either by filling in the <input> or by horizontally dragging the line. This line is very small (only 1px high), so we decided to enlarge the clickable area to a 10px square at each endpoint of the line. When the line is dragged horizontally, a logarithmic scale is applied: as the user starts moving the mouse, values increase slowly and, as the user keeps moving, they increase ever faster. This behavior is ideal because it allows a user to fine-tune a value when moving slowly and to jump to higher values when moving further.
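This slow-then-fast mapping can be sketched as an exponential function of drag distance (the inverse view of the value axis is logarithmic, hence the name). The growth base and the cap below are illustrative tuning constants, not the exact values used in the interface:

```python
def drag_to_minutes(pixels, base=1.05, max_minutes=240):
    """Map horizontal drag distance (in pixels) to a fuzzyness value in
    minutes. Small drags barely change the value; longer drags make it
    grow increasingly fast, up to an illustrative cap."""
    if pixels <= 0:
        return 0
    # Exponential growth, shifted so that zero drag yields zero minutes
    return min(max_minutes, round(base ** pixels - 1))
```

With base 1.05, the first dozen pixels change the value by only a minute or two, while a long drag quickly reaches the cap.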
Gesture support
To support mouse events we used HammerJS18, which facilitates integration in complex situations such as dragging and panning. To allow dragging of paused time, we created small areas on the left and right sides of the rectangle, where the user can drag in order to increase the duration. The same happens with spatial constraints: a user can vertically drag the rectangle in order to expand it. The top and bottom areas of the rectangle were enlarged so that the user can easily click and expand it; previously, in the library, those areas were so small that the user had to aim precisely at the line and succeeded only with some luck.
Results
Results are also based on the same principle as the main query: they have paused time and movement
time, but with only the start/end time and location name. As previously explained, results can be aggre-
gated. If we look at figure 5.19, we can see results are not aggregated. One immediate consequence of
showing results like this would be presenting a huge list of results to the user - showing a lot of results
¹⁸ http://hammerjs.github.io/ accessed: 12-10-2015
in which the user might not be interested. To solve this, we decided to aggregate similar results. In the
figure we can see that some location names are repeated (such as 'casa', 'ist-taguspark' or 'intermarche').
Figure 5.19: Disaggregated results
Aggregation is calculated on the server side. The main rule of thumb is to aggregate entries with
common names, and then check whether there are temporal relations between them (that is, whether
their times fall within a common range, which can be specified in the settings). If a set of results
matches those properties, they are aggregated. Other, unrelated results are shown as normal entries.
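A simplified sketch of this rule follows. Field names, the data shape and the tolerance default are assumptions for illustration; the real server-side logic may differ:

```javascript
// Group result entries by location name, then aggregate a group only when
// its start times all fall within a common tolerance (in minutes).
function aggregateResults(entries, toleranceMin = 60) {
  const byName = new Map();
  for (const e of entries) {
    if (!byName.has(e.name)) byName.set(e.name, []);
    byName.get(e.name).push(e);
  }
  const output = [];
  for (const [name, group] of byName) {
    const starts = group.map(e => e.startMin);
    const close = Math.max(...starts) - Math.min(...starts) <= toleranceMin;
    if (group.length > 1 && close) {
      output.push({ name, aggregated: true, entries: group });
    } else {
      output.push(...group.map(e => ({ ...e, aggregated: false })));
    }
  }
  return output;
}
```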
The main advantage of aggregating results is that they occupy less space. Aggregated results are
shown as a single entry (figure 5.20 - gray entry) that has to be clicked to be expanded. Once
expanded, a user can close it by clicking on it again.
Another advantage is providing insight into the user's behavior. Aggregated results are represented
as a summary of the contained results. We show a maximum of four overlapping rectangles (figure 5.20
- gray entry) that correspond to the quartiles explained in section 5.3.1. Each rectangle is
semi-transparent, so that where they overlap they appear darker, conveying higher frequency.
Furthermore, we also show start/end times. These times can overlap and lead to a confusing experience.
To avoid this, we hide part of the text when components overlap. To view the whole content, a user
can either zoom in, or hover and read the tooltip. When different location names in an aggregation
overlap, we show "expand" instead of all the names, reducing the clutter presented
to the user.
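The quartile summary driving those rectangles can be sketched as below. This uses plain linear interpolation between sorted values; the exact quartile computation from section 5.3.1 is not reproduced here, so treat this as an illustrative variant:

```javascript
// Summarize a set of times by their five-number quartile boundaries
// (min, Q1, median, Q3, max), from which up to four overlapping
// semi-transparent rectangles can be drawn.
function quartiles(values) {
  const v = [...values].sort((a, b) => a - b);
  const q = p => {
    const idx = p * (v.length - 1);
    const lo = Math.floor(idx), hi = Math.ceil(idx);
    return v[lo] + (v[hi] - v[lo]) * (idx - lo); // linear interpolation
  };
  return [q(0), q(0.25), q(0.5), q(0.75), q(1)];
}
```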
Figure 5.20 shows how results were aggregated. The first entry (with a gray background)
represents an aggregation. As can be seen, the different location names were replaced by an "expand"
placeholder. In this case, results were aggregated by end time: every ist-taguspark entry ends
between 17:00 and 18:00.
Figure 5.20: Result aggregation example
Minimum widths
Another important decision was defining minimum widths for paused time. Since a rectangle's width is
proportional to the duration it represents, we could not allow short times to become so small that they
risked not being seen. To solve this, we defined a minimum width (240px), just large enough to fit an
average-sized location name and the start/end times.
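This clamp reduces to one line; the pixels-per-minute scale below is an illustrative assumption, only the 240px minimum comes from the text:

```javascript
// Convert a duration to a rectangle width, clamped to a 240px minimum so
// that short stays still fit a location name and the start/end times.
const MIN_WIDTH_PX = 240;
function durationToWidth(durationMin, pxPerMin = 2) {
  return Math.max(durationMin * pxPerMin, MIN_WIDTH_PX);
}
```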
Time consistency
When a user sketches a query, its width serves as the standard for the corresponding results. That is,
the width of the rectangles in the results is proportional to that of the query, so that they can easily be
compared. Furthermore, all dates are aligned: in figure 5.20, the dates in each entry are vertically
aligned, meaning equal times appear on the same axis, thus facilitating comparison.
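The alignment follows from a single shared time-to-pixel scale: the query's own time span defines it, and every result rectangle reuses it, so equal times land on the same x coordinate. A sketch with illustrative names:

```javascript
// Build a shared horizontal time scale from the query's time span.
// All result entries reuse the returned function, so equal times
// align vertically across entries.
function makeTimeScale(queryStartMin, queryEndMin, widthPx) {
  const pxPerMin = widthPx / (queryEndMin - queryStartMin);
  return timeMin => (timeMin - queryStartMin) * pxPerMin;
}
```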
Map integration
To integrate the map with the results panel, we trigger a JavaScript event when a result is
clicked, so that its spatial data is correctly shown on the map. When a user specifies a spatial range in
the query, they may choose to pick a location from the map. In this case, we use the Google Maps API
to draw a circle around the location, representing the range specified by the user.
Chapter 6
Evaluation
At the beginning of this dissertation we described a series of objectives our final solution should comply
with. Many of these objectives correspond to features that must be assessed to understand how easy
they are to use. There is particular interest in evaluating how users deal with the query interface,
because it is the most relevant part of the solution. In section 4.4 we saw that people could use our
visual language; now we evaluate the system as a whole.
6.1 Experimental protocol
The main objective of the evaluation was to understand whether users, in practice, understood and could
use our visual language and whether they could, in general, use the system in a way that would be
personally relevant and useful.
To evaluate the usability, a set of users were given the system to try and comment on its use. This
evaluation consisted of a session, composed of three parts:
1. Initial form to trace the tester's profile (age, gender, studies, etc.);
2. Tasks, a set of actions to perform with the application, covering all the possible ways of solving a
given problem;
3. Final questionnaire, aimed at usability and qualitative assessment of the system.
The mobility data used to evaluate our work consisted of two months of daily tracking. The same
dataset was used for each user.
For the evaluation we asked users to complete a set of tasks. We measured the time users took to
finish each task, recorded the screen, and asked them to talk aloud about their difficulties. The tasks
relate to the issues below:
• Evaluating the temporal part of a query - users should evaluate and describe (talking aloud) the
difficulties they encounter while elaborating the temporal part of a query. There will be focus on
the different ways a temporal query can be done, such as time intervals or specific times.
• Evaluating the spatial part of a query - users should follow the same principle as before: talk aloud
about the difficulties in making a spatial query, with particular focus on a range around a place and
a specific place.
• Evaluating how results are shown - tasks will ask users to find a precise result in the result list.
Users should identify it and comment aloud about the doubts they have.
• Evaluating the way a route is shown on the map.
• Evaluating the settings interface.
Data collected during task execution includes task duration, number of errors, number of clicks and
other relevant annotations. At the end, we asked users to fill in a questionnaire that qualitatively
evaluates the implemented functionality; users commented on and rated the utility of the features
mentioned above. We also asked users to fill in a questionnaire regarding their global impression of the
solution. This questionnaire was based on the System Usability Scale (SUS)¹.
6.1.1 User profile questionnaire
To characterize the users, we used the questionnaire presented in appendix B.1, which gives a basic
characterization of the tester universe (gender, age, studies).
Tests were conducted with 21 people: 47.6% were male and 52.4% were female. 71.4% were aged
between 18 and 25 years old, 23.8% between 26 and 35, and only one person was older than 35.
Regarding studies, most of the users had a background in Science and Engineering (90.5%), while the
rest had a background in Social Sciences.
Regarding the users’ education level, 57.1% had a Bachelor’s Degree, 23.8% a Master’s Degree and
the rest finished High School.
All user profile results are shown in appendix D.1.
6.1.2 Tasks
Before starting the first stage, we explained the motivation behind our tool and asked users to briefly
explore the tool's user interface and various functionalities, in order to get comfortable with it and
reduce errors and long execution times on the first tasks.

The task set (appendix A.2) consists of 12 different tasks focusing on different areas of the interface
and different levels of query expressiveness. An example of a task related to temporal expressiveness
is "Search when I was at ist-alameda for more than 2 hours". Another example, this time related to
exploring the results panel, is "Repeat the first query. Point all the results in which I went from castanheira
do ribatejo to ist-alameda".
Besides monitoring task time, we decided to ask users to think aloud during task execution. This
decision was made so that we could understand how they reasoned about reaching their goal. This way
¹ http://www.usability.gov/how-to-and-tools/methods/system-usability-scale.html accessed: 09-09-2015
we can hear why users execute the proposed tasks the way they do, and we also get verbal feedback
about how the participant feels about what is happening. For instance, users may show frustration,
annoyance or even happiness with a task's outcome. We also collected the number of errors and the
number of clicks needed to accomplish each proposed task. We expected, and verified, that the impact
of both thinking aloud and time tracking would be of little relevance to the analysis we wanted: seeing
whether users understood and could use the system.
6.1.3 Overall questionnaire
After completing the tasks, users filled in a second questionnaire. Its objective was to assess the
system's usability and the users' satisfaction, and to gather feedback for future work. It is available in
appendix B.2.

The questionnaire was divided into two parts: the first evaluates the system's usability using the SUS
methodology, and the second contains additional domain-specific questions.
SUS scores range from 0 to 100, and to evaluate what constitutes a good result we looked into
several studies, as opinions vary slightly. In "An Empirical Evaluation of the System Usability
Scale" [38], the author analyzed 2324 assessments, with an overall average of 70.14; when the
assessments are divided up by project, the average is 69.69. The author claims that good systems
score between 70 and 80 points, and exceptional systems score 90 or more.
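For reference, the standard SUS scoring procedure (odd items contribute answer − 1, even items contribute 5 − answer, and the sum of contributions is multiplied by 2.5) can be sketched as:

```javascript
// Standard SUS scoring. `answers` holds the ten 1-5 Likert responses
// in questionnaire order; the result is a 0-100 score.
function susScore(answers) {
  if (answers.length !== 10) throw new Error('SUS needs 10 answers');
  const sum = answers.reduce(
    (acc, a, i) => acc + (i % 2 === 0 ? a - 1 : 5 - a), 0);
  return sum * 2.5;
}
```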
The rationale behind the second part of the questionnaire was to understand whether users were willing
to collect their own geolocation data and use it in our application to gain insight into their daily routes.
To understand that, we asked questions about smartphone familiarity and usage. Users could state
whether they had already used any application involving GPS, and whether they use that feature
frequently. When users chose not to track daily geolocation data, they were asked which reasons led to
that decision. We also asked users to give our application a global appreciation score.
6.2 Results
In this section we will present all the results from both questionnaires and task execution.
6.2.1 Results by task
We delivered the task guide once the initial questionnaire was filled in. The first task stated "Search for
when I was at ist-alameda and then went to ist-taguspark". By analyzing table 6.1 we can see that, on
average, users took 36 seconds to perform the task, with a rather large standard deviation of 13 seconds.
Despite being a simple task, its average time is quite high. A valid explanation for this value is the fact
that it was the first real interaction with the system. Furthermore, during this task some users dragged
the mouse to create a paused time, which leads to specifying a duration; the expected behavior was
double-clicking, which explains the errors recorded in the table.
The following task, "Search when I was at ist-alameda for more than 2 hours", despite being a little
more complex because it requires mathematical operators, presented a faster execution time and a
lower error rate - mainly because it was the second interaction with the system and the basic doubts
had already been clarified. To successfully complete the task, the user must sketch one paused time
and change its duration. Some users changed the duration but forgot to specify the "more than"
mathematical operator.
The third task is more complex in nature: "Search for dates where I was at ist-alameda from around
10h±120min to 12h±120min and then went somewhere else". In essence, this task is very similar to
the first one, except that it requires adding temporal fuzziness. Fuzziness can be specified in two
different ways: by filling in the box, or by dragging the fuzziness line. This is the task with the largest
completion time, which can be explained by the way fuzziness is shown - a user must first click on the
arrows on the side of the rectangle for the fuzziness information to appear. Since this behavior is not
obvious, users took more time to discover it. This exploration phase led to a larger completion time, as
well as more errors and clicks.
The fourth task, "Search when I spent less than 1 hour between ist-alameda and ist-taguspark", is
very similar to the second. This time, however, instead of a single paused time, it requires two paused
times and a movement time. Comparing its results to the second task's, completion time decreased
slightly, even though this is a more complex task. This is explained by users' growing familiarity with
the interface, which demonstrates that our interface is easy to understand and reuse.
The fifth task is the first to evaluate spatial expressiveness. It states "Search when I was in a 500
meter range from ist-alameda". This task can be completed in different ways - by vertically dragging the
rectangle, or by filling in the box with the specified number of meters. Users mainly opted for vertical
dragging, finding it more intuitive.
The sixth task is simple: "Search all the days I was at casa". However, users had some doubts about
how to proceed, because specifying all days did not seem trivial. All users had to do to complete the
task was leave the location field blank. Because of their doubts, users explored the interface, which led
to a high number of clicks (7) when only 3 were needed.
The seventh task is the last one evaluating the spatial scope. It appears to be a simple task but,
because it requires map interaction, it tends to take longer: "Search for where I was at ist-alameda,
then took route A5 to ist-taguspark". This task presents a large completion time and error rate, mainly
because of the way a route has to be specified. Specifying the route is done by clicking on the map
near the A5 motorway. To accomplish this, users had to sketch the query and then fill in the route of
the movement time by picking a location on the map (which means users had to zoom the map and
search for the highway).
The eighth ("Add new category School and assign any color you wish") and ninth ("Assign ist-alameda
and ist-taguspark to School") tasks evaluate the settings area. They are simple tasks of similar
difficulty, with just one way of completing them, and in both cases users completed them easily. Only a
few errors were made, which included forgetting to save the settings and trying to add a place in the
search box.
The tenth task focuses on exploring results. First, users were asked to repeat a query they had made
earlier, letting us see whether they had improved and could complete the task faster. The task states:
"Repeat the first query. Point all the results in which I went from castanheira do ribatejo to ist-alameda".
Compared with the first task, the average time decreased by 21%, the standard deviation was reduced
to 4.2 seconds, and users showed no sign of trouble about how to proceed. We can conclude that
users understood how to specify queries and could repeat the process with ease in less time. Regarding
the way results were shown, all users could easily identify the aggregated result they were asked for.
The eleventh task, "Repeat the fourth task. Which of the first three results took less time between
castanheira do ribatejo and ist-alameda", can also be used to compare previously performed tasks.
Comparing it to the fourth task, we notice users completed it with fewer clicks, in less time and with a
smaller time deviation, allowing us to conclude that users evolved and understood how to specify
queries. When exploring the results, users could easily see which result took less time, pointing out
"it's the one with the smaller line".
The twelfth task, "Explore the previous results on the map", had no metrics. It was set up so we could
understand whether users comprehended how to view results in the map area. We observed that users
were familiar with the way results were shown, as it is very similar to the way Google Maps shows a
navigation result. There was some confusion with a few results that showed nothing on the map, but
that was because there was no data for those entries.
Table 6.1: Results by task
Figure 6.1: Clicks by task
Figure 6.2: Time by task
6.2.2 Results by feature
Below we present an analysis of the results for each feature.
General
In the beginning, some users did not know how to correctly create a time: they would drag to create it,
instead of just double-clicking. They were not entirely wrong - the difference is subtle. When a user
double-clicks, a paused time is created without any extra property; when a user drags the mouse and
releases the button, the created paused time is assigned a duration proportional to the distance
dragged. After the first task this behavior was clarified to users, since they could not successfully
complete the task without understanding it.
Temporal
When dealing with the temporal scope (using it for the first time), users did not know how to add fuzziness
to the query. A user must first click on the blue arrow at the rectangle's endpoints for the fuzziness
interface to appear; however, after discovering the interface, they could use it well. Another, less
frequent mistake was forgetting to assign a mathematical operator to a time when needed - giving a
completely different meaning to the query.
Spatial
As with the temporal scope, users also forgot to add mathematical operators when dealing with spatial
constraints. Another relevant situation relates to the way a user must choose a route from the map.
To complete the seventh task, a user had to browse the map for the A5 highway and click on it - this led
to long completion times, since some users did not know exactly where the highway was.
Results
When visualizing results, users could easily spot the result the task asked them to find. They did,
however, find that scrolling down to find the result would sometimes skip the desired entry. They
then suggested that "order by" and "search" mechanisms be included in the results panel.
Settings
The only major mistake in the settings interface was a user trying to add a new entry to the categories
table by writing in the search box.
6.2.3 Questionnaire results
After executing the proposed tasks and taking notes on how long each user took on each exercise,
we asked our users to answer a final questionnaire. As mentioned earlier, this questionnaire is divided
into two parts: the first uses the System Usability Scale (SUS), and the second contains specific
questions about smartphone usage and GPS.
Table 6.2 shows the SUS scores. Our overall SUS score is 80.75, which means that we have a good
system and are on the right track.
Regarding the second part of the questionnaire (results can be seen in appendix D.3), 57% of the
users gave a global appreciation (on a scale from 1 - terrible to 5 - very good) of 4, while 33% gave the
maximum classification. As for the questions about smartphone usage, only one user did not have a
smartphone. Of those who did, 66.7% said they used their phone's GPS, most commonly (85.7% of
those users) for map navigation. When asked if they would track their daily geolocation data, 52.4% of
the users answered no; the main reason was short battery life.
Table 6.2: SUS summarized
6.3 Discussion
Generally, the results of our experiments were positive: most people considered our application helpful
and the tasks adequate. Also, a SUS score of 80.75 is a very good result, although we know there is
still a lot to improve.

Overall, the task results were good and corroborate that our efforts to build a simple and flexible
interface were well spent. Usability tests show there is still some fine-tuning to be done (mainly
related to the way results are shown and to map integration), but overall feedback from users was very
positive. The usability of the system was demonstrated by the tests, with all users completing all tasks.
Situations to be tuned include allowing the user to order results and clearly stating when an entry has
no map data.
Overall user feedback was also very positive. Users stated that they gained a great perspective
on their mobility patterns and would use the tool frequently if the collection method were simple and did
not drain the phone's battery, demonstrating the system's utility.
Regarding our objectives (initially specified in section 1.1), we can say that all of them were achieved.
Below we analyze each objective and the topics that still need intervention. The main objective (create
a visual query language for personal mobility data that allows users to visually query their personal
spatio-temporal information, with personal semantics) was successfully achieved, with only minor
issues left to correct and improve. User tests showed that our language was easily manipulated and
understood, and that users could generate complex queries in little time.
Another objective was to visually represent the resulting information. This was also accomplished,
but it requires further improvement so that results can be explored more easily. Users stated they
would like results to be more dynamic, with the possibility of ordering and searching them.
Chapter 7
Conclusions
The growing availability and massive spread of GPS tracking devices generate huge volumes of
personal mobility data. As we have seen in this report, there is no efficient way to explore mobility data
that simultaneously provides a pleasant experience for the user.
Thus we found the need to develop a solution that allows a user to query personal mobility data while
maintaining the personal semantics of the locations. First we needed to identify the essential components
of our solution: the visualization and the query system. For the query system we had to understand the
expressiveness we were looking for - we analyzed the main components (temporal, spatial and
recurrence) and built a table exemplifying the supported queries.
To understand the best compromise for the formulation of visual queries, we analyzed several works
that also use this mechanism, so we could understand the advantages and disadvantages of the
different approaches. After considering the options (comic strips, timelines and graphs), we chose the
timeline, because it makes the temporal dependence between events obvious.
We also analyzed several works to find the best visualization methods (space-time cubes, 2D maps,
timelines, etc.) and realized the best option was to integrate several of these components. Thus we
chose 2D maps to show routes, and a timeline to present all the results of a given query.
Based on the works reviewed, we proceeded to determine the query expressiveness a personal
system of this type should have. This is one of our main contributions, as detailed in chapter 4. After
specifying the visual language, we had to see whether users could actually use it, so we implemented
a solution for users to test. This validation was successful, as stated in section 4.4.
After concluding that users understood the visual language, we implemented a whole system includ-
ing that language, which is another contribution of our work. This system allows specifying queries and
exploring their results.
Our initial goals of creating a visual query language for personal mobility data that allows users to
visually query their personal spatio-temporal information, with personal semantics and visually represent
the resulting information were achieved, as our evaluation shows in chapter 6.
The system truly works. Despite having a few minor flaws to be perfected, we did manage to help
users understand their own personal mobility data.
7.1 Future Work
Regarding the current implementation, there are still several issues that need intervention:
• Interface for data annotation: as explained in section 3.1.2, our current interface for data annotation
is just a text box with a submit button in the browser. Ideally, this interface would allow selective
removal of entries, a dropdown of location suggestions and a more carefully thought-out design.
• Complete coverage of the expressiveness table: table 4.1 specifies our visual language's coverage.
We did not, however, implement all the table's cells. For instance, the column related to recurrent
queries is not present in the current implementation. Implementing the remaining cells is the next
step.
• Video tutorial: the first time a user starts using the system, there are no visual cues about where to
start. An initial video explaining the basic concepts of our application would address this problem.
• Order results: during the user tests, a few users suggested being able to order the results of a query
either alphabetically or by date.

• Allow result comparison: another interesting suggestion derived from the user tests was the
possibility of comparing results. One could analyze results side by side and find out, for example, in
which periods one was more active.
• Summary of aggregated results on the map: as explained in section 5.4.1, we decided to simplify
result aggregation on the map by showing only the first result. Ideally, when inspecting an aggregated
result, the map should show a summary of its tracks (a heat map, for example).
Bibliography
[1] G. Andrienko, N. Andrienko, U. Demsar, D. Dransch, J. Dykes, S. I. Fabrikant, M. Jern, M.-J. Kraak,
H. Schumann, and C. Tominski. Space, time and visual analytics. Int. J. Geogr. Inf. Sci., 24
(10):1577–1600, Oct. 2010. ISSN 1365-8816. doi: 10.1080/13658816.2010.508043. URL http:
//dx.doi.org/10.1080/13658816.2010.508043.
[2] B. Simpson, C. L. Giles, and A. M. MacEachren. Geodiscoverer: A search engine to integrate social
networks with geospatial information. Raytheon Technology Today, 4:12–13, 2007. URL http://
www.geovista.psu.edu/publications/2007/Simpson_GeoDiscoverer_in_RaytheonToday.pdf.
[3] W. Luo, A. M. MacEachren, P. Yin, and F. Hardisty. Spatial-social network visualization for ex-
ploratory data analysis. In 3rd ACM SIGSPATIAL International Workshop on Location-Based So-
cial Networks (LBSN 2011), Chicago, IL, November 1 2011. URL http://www.geovista.psu.edu/
publications/2011/Luo_2011_Spatial-SocialNetworkVisforEDA.pdf.
[4] W. Luo and A. M. MacEachren. Geo-social visual analytics. Journal of Spatial Information Sci-
ence, pages 27–66, 2014. doi: 5311/JOSIS.2014.8.139. URL http://www.geovista.psu.edu/
publications/2014/Luo_GeoSocial_JOSIS_2014.pdf.
[5] W. Luo. Geovisual analytics approaches for the integration of geography and social network
contexts. Master’s thesis, The Pennsylvania State University, University Park, Pennsylvania,
08/2014 2014. URL http://www.geovista.psu.edu/publications/2014/Wei_L_2014_Thesis_
Final.pdf.
[6] G. Andrienko, N. Andrienko, P. Bak, D. Keim, S. Kisilevich, and S. Wrobel. A conceptual framework
and taxonomy of techniques for analyzing movement. J. Vis. Lang. Comput., 22(3):213–232, June
2011. ISSN 1045-926X. doi: 10.1016/j.jvlc.2011.02.003. URL http://dx.doi.org/10.1016/j.
jvlc.2011.02.003.
[7] G. Andrienko, N. Andrienko, D. Keim, A. M. MacEachren, and S. Wrobel. Challenging problems of
geospatial visual analytics. Journal of Visual Languages & Computing, 22(4):251 – 256, 2011. ISSN
1045-926X. doi: http://dx.doi.org/10.1016/j.jvlc.2011.04.001. URL http://www.sciencedirect.
com/science/article/pii/S1045926X11000280. Part Special Issue on Challenging Problems in
Geovisual Analytics.
87
[8] G. Andrienko, N. Andrienko, D. Keim, A. M. MacEachren, and S. Wrobel. Editorial: Challenging
problems of geospatial visual analytics. J. Vis. Lang. Comput., 22(4):251–256, Aug. 2011. ISSN
1045-926X. doi: 10.1016/j.jvlc.2011.04.001. URL http://dx.doi.org/10.1016/j.jvlc.2011.04.
001.
[9] A. Thudt, D. Baur, and S. Carpendale. Visits: A Spatiotemporal Visualization of Location Histories.
pages 79–83. doi: 10.2312/PE.EuroVisShort.EuroVisShort2013.079-083. URL http://diglib.
eg.org/EG/DL/PE/EuroVisShort/EuroVisShort2013/079-083.pdf.
[10] G. di lorenzo, M. L. Sbodio, F. Calabrese, M. Berlingerio, R. Nair, and F. Pinelli. Allaboard: Visual
exploration of cellphone mobility data to optimise public transport. In Proceedings of the 19th
International Conference on Intelligent User Interfaces, IUI ’14, pages 335–340, New York, NY,
USA, 2014. ACM. ISBN 978-1-4503-2184-6. doi: 10.1145/2557500.2557532. URL http://doi.
acm.org/10.1145/2557500.2557532.
[11] J. Larsen, A. Cuttone, and S. Jrgensen. QS Spiral: Visualizing Periodic Quantified Self Data. 2013.
[12] T. Goncalves, A. P. Afonso, B. Martins, and D. Goncalves. St-trajvis: Interacting with trajectory
data. In Proceedings of the 27th International BCS Human Computer Interaction Conference,
BCS-HCI ’13, pages 48:1–48:6, Swinton, UK, UK, 2013. British Computer Society. URL http:
//dl.acm.org/citation.cfm?id=2578048.2578106.
[13] T. Goncalves, A. P. Afonso, and B. Martins. Visualizing human trajectories: Comparing space-time
cubes and static maps. In Proceedings of 28th British HCI Conference, HCI 2014 - Sand, Sea and
Sky - Holiday HCI (accepted for publication), BCS HCI, 2014.
[14] C. Plaisant, B. Milash, A. Rose, S. Widoff, and B. Shneiderman. Lifelines: Visualizing personal
histories. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems,
CHI ’96, pages 221–227, New York, NY, USA, 1996. ACM. ISBN 0-89791-777-4. doi: 10.1145/
238386.238493. URL http://doi.acm.org/10.1145/238386.238493.
[15] T. Kapler and W. Wright. Geotime information visualization. In Information Visualization, 2004.
INFOVIS 2004. IEEE Symposium on, pages 25–32, Oct 2004. doi: 10.1109/INFVIS.2004.27.
[16] F. F. Leandro. Visual mobility - visual exploration of personal mobility data, 2012.
[17] N. Andrienko and G. Andrienko. Visual analytics of movement: An overview of methods, tools and
procedures. Information Visualization, 2012.
[18] M. Lu, Z. Wang, and X. Yuan. Trajrank: Exploring travel behaviour on a route by trajectory ranking.
In Visualization Symposium (PacificVis), 2015 IEEE Pacific, pages 311–318, April 2015. doi: 10.
1109/PACIFICVIS.2015.7156392.
[19] C. Chen. Top 10 unsolved information visualization problems. Computer Graphics and Applications,
IEEE, 25(4):12–16, July 2005. ISSN 0272-1716. doi: 10.1109/MCG.2005.91.
88
[20] T. Goncalves, A. P. Afonso, and B. Martins. Visualization techniques of trajectory data: Challenges
and limitations. In Proceedings of the 2nd AGILE PhD School 2013, volume 1136 of CEUR Work-
shop Proceedings, http://ceur-ws.org/Vol-1136/paper3.pdf, 2014.
[21] P. C. Wong, H.-W. Shen, C. Johnson, C. Chen, and R. B. Ross. The top 10 challenges in extreme-
scale visual analytics. Computer Graphics and Applications, IEEE, 32(4):63–67, July 2012. ISSN
0272-1716. doi: 10.1109/MCG.2012.87.
[22] G. Robertson, R. Fernandez, D. Fisher, B. Lee, and J. Stasko. Effectiveness of animation in trend
visualization. IEEE Transactions on Visualization and Computer Graphics, 14(6):1325–1332, Nov.
2008. ISSN 1077-2626. doi: 10.1109/TVCG.2008.125. URL http://dx.doi.org/10.1109/TVCG.
2008.125.
[23] B. Tversky, J. B. Morrison, and M. Betrancourt. Animation: Can it facilitate? Int. J. Hum.-Comput.
Stud., 57(4):247–262, Oct. 2002. ISSN 1071-5819. doi: 10.1006/ijhc.2002.1017. URL http:
//dx.doi.org/10.1006/ijhc.2002.1017.
[24] D. Calcinelli and M. Mainguenaud. Cigales: A visual query language for geographical information
system: The user interface. Journal of Visual Languages and Computing, 5:113–132, 1994.
[25] B. Meyer. Beyond icons. In R. Cooper, editor, Interfaces to Database Systems (IDS92), Workshops
in Computing, pages 113–135. Springer London, 1993. ISBN 978-3-540-19802-4. doi: 10.1007/
978-1-4471-3423-7 8. URL http://dx.doi.org/10.1007/978-1-4471-3423-7_8.
[26] C. Bonhomme, C. Trepied, M.-A. Aufaure, and R. Laurini. A visual language for querying spatio-
temporal databases. In Proceedings of the 7th ACM International Symposium on Advances
in Geographic Information Systems, GIS ’99, pages 34–39, New York, NY, USA, 1999. ACM.
ISBN 1-58113-235-2. doi: 10.1145/320134.320144. URL http://doi.acm.org/10.1145/320134.
320144.
[27] J. F. Allen. Towards a general theory of action and time. Artif. Intell., 23(2):123–154, July
1984. ISSN 0004-3702. doi: 10.1016/0004-3702(84)90008-0. URL http://dx.doi.org/10.1016/
0004-3702(84)90008-0.
[28] M. Monroe, R. Lan, J. Morales del Olmo, B. Shneiderman, C. Plaisant, and J. Millstein. The
challenges of specifying intervals and absences in temporal queries: A graphical language ap-
proach. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems,
CHI ’13, pages 2349–2358, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-1899-0. doi:
10.1145/2470654.2481325. URL http://doi.acm.org/10.1145/2470654.2481325.
[29] J. Jin and P. Szekely. Interactive querying of temporal data using a comic strip metaphor. In Visual
Analytics Science and Technology (VAST), 2010 IEEE Symposium on, pages 163–170, Oct 2010.
doi: 10.1109/VAST.2010.5652890.
[30] J. Jin and P. Szekely. Querymarvel: A visual query language for temporal patterns using comic
strips. In Visual Languages and Human-Centric Computing, 2009. VL/HCC 2009. IEEE Symposium
on, pages 207–214, Sept 2009. doi: 10.1109/VLHCC.2009.5295262.
[31] L. Nocera, A. Rihan, S. Xing, A. Khodaei, A. Khoshgozaran, F. Banaei-Kashani, and C. Shahabi.
Geodec: A multi-layered query processing framework for spatio-temporal data. In Proceedings
of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information
Systems, GIS ’09, pages 546–547, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-649-6.
doi: 10.1145/1653771.1653869. URL http://doi.acm.org/10.1145/1653771.1653869.
[32] C. Shahabi, F. Banaei-Kashani, A. Khoshgozaran, L. Nocera, and S. Xing. Geodec: A framework
to visualize and query geospatial data for decision-making. IEEE Multimedia, 17(3):14–23, 2010.
ISSN 1070-986X. doi: http://doi.ieeecomputersociety.org/10.1109/MMUL.2010.1.
[33] L. Certo, T. Galvao, and J. Borges. Time automaton: A visual mechanism for temporal querying.
J. Vis. Lang. Comput., 24(1):24–36, Feb. 2013. ISSN 1045-926X. doi: 10.1016/j.jvlc.2012.10.001.
URL http://dx.doi.org/10.1016/j.jvlc.2012.10.001.
[34] D. J. Peuquet. It’s about time: A conceptual framework for the representation of temporal dynamics
in geographic information systems. Annals of the Association of American Geographers, 84(3):
441–461, 1994. ISSN 1467-8306. doi: 10.1111/j.1467-8306.1994.tb01869.x. URL http://dx.
doi.org/10.1111/j.1467-8306.1994.tb01869.x.
[35] M. Schneider and T. Behr. Topological relationships between complex spatial objects. ACM Trans.
Database Syst., 31(1):39–81, Mar. 2006. ISSN 0362-5915. doi: 10.1145/1132863.1132865. URL
http://doi.acm.org/10.1145/1132863.1132865.
[36] G. Andrienko, N. Andrienko, and S. Wrobel. Visual analytics tools for analysis of movement data.
SIGKDD Explor. Newsl., 9(2):38–46, Dec. 2007. ISSN 1931-0145. doi: 10.1145/1345448.1345455.
URL http://doi.acm.org/10.1145/1345448.1345455.
[37] STNexus: An Integrated Database and Visualization Environment for Space-Time Informa-
tion Exploitation, Orlando, FL, Nov. 29 - Dec. 2 2005. URL http://www.geovista.psu.edu/
publications/2005/Weaver_ARDA_05.pdf.
[38] A. Bangor, P. T. Kortum, and J. T. Miller. An empirical evaluation of the system usability scale. Int. J.
Hum. Comput. Interaction, 24(6):574–594, 2008. URL http://dblp.uni-trier.de/db/journals/
ijhci/ijhci24.html#BangorKM08.
Appendix A
Test protocols
A.1 First test protocol
Where Have I Been is an application that allows you to search your own personal geolocation data.

Interface

First, we'll start with an overview of the interface:
1. Search area

This area allows you to make your queries. A blue rectangle represents a period of time when you were indoors. A gray line represents a period of time when you were moving. Queries allow different parameters to be specified, all of which are optional.

Start/end time
Here you state that you want results starting/ending at a specific hour. Using mathematical symbols (>, <, ≥, ≤) you can specify things like "after" or "before".
Temporal range
Specifies a range around the starting/ending time. In the example below, valid start times would be from 11h55m to 12h25m.
Location name / coordinates
A place can be represented by a name ("home", "school", etc.) or by coordinates selected on the map.
Spatial range
Can be changed by vertically dragging the rectangle's border or by writing directly in the box. In the query below, all results within 329 meters of "home" would be a match.
Duration
Specifies how long the stay/trip lasted. Using mathematical signs, we can state whether the stay/trip was longer or shorter than a given duration.
Double-clicking on the timeline will create a rectangle whose start and end times are undefined. Clicking and dragging will create a rectangle whose duration changes as you drag. When the second rectangle is created, a route connecting the first and second locations will automatically appear.
2. Results area

Shows the results matching your query. Results follow the same presentation as the query. Results may be aggregated by similarity, in which case they will appear transparent and will be collapsible.
3. Map area

Shows the routes and locations of a result. When you click on a result, if there is enough data, the map will highlight the locations and routes related to that result.
4. Settings

Allows you to assign categories to locations. Different categories will be presented with different colours in the results area.
Tasks

0. Explore the interface freely.
1. Search for dates where I arrived at istalameda at precisely 9h16min.
2. Search for dates where I was at istalameda from around 10h±120min to 12h±120min and then went somewhere else.
3. Search for when I was at istalameda and then went to isttaguspark.
4. Search for when I was at istalameda, then took route A5 to isttaguspark.
5. Search for when I was at istalameda for more than 2 hours.
6. Search for when I spent less than 1 hour between istalameda and isttaguspark.
7. Search for when I was within a 500-meter range of istalameda.
8. Search for all the days I was at casa.
A.2 Final test protocol
Where Have I Been
Test protocol
First of all, thank you for taking the time to test this application.

Where Have I Been allows you to search your own personal geolocation data. In this case it
will not be your personal geolocation data, but mine.
You'll be asked to navigate around the interface so that you become familiar with it, and
then you'll perform a few tasks.
If you authorize it, we will also record the process on video, solely for research purposes.

We will also record the time, but there is no need to feel any pressure; we are recording it
only for statistical purposes.
We will not track any identifying information. Your responses are completely anonymous,
and you may be assured of complete confidentiality. The information you provide will be
stored only to track survey completion, and the data will be reported only in aggregate;
no individual will be identified.
First, we'll give you around 5 minutes to explore Where Have I Been's interface freely. You
can do anything you wish – we just want you to get a little comfortable.
After that, we ask you to perform the tasks described on the following page (in order).
We will record the screen and time each task, so that we can later analyze your performance
and count the number of clicks. We'll also take written notes regarding your execution of each task.
We ask you to express your doubts, concerns and problems out loud during the whole
process, so that we can understand whether you are making progress.
There will be a break in the middle of the tasks.
Tasks
First, we'll give you some time to explore the interface. When you think you're ready, tell us.
A) Dealing with time
1. Search for when I was at ist-alameda and then went to ist-taguspark
2. Search when I was at ist-alameda for more than 2 hours
3. Search for dates where I was at ist-alameda from around 10h±120min to
12h±120min and then went somewhere else
4. Search when I spent less than 1 hour between ist-alameda and ist-taguspark
B) Dealing with space
5. Search when I was in a 500 meter range from ist-alameda
6. Search all the days I was at casa
7. Search for when I was at ist-alameda, then took route A5 to ist-taguspark
C) Adjusting settings
8. Add new category School and assign any color you wish
9. Assign ist-alameda and ist-taguspark to School
D) Search results
10. Repeat the first query. Point out all the results in which I went from castanheira do
ribatejo to ist-alameda
11. Repeat the fourth task. Which of the first three results took the least time between
castanheira do ribatejo and ist-alameda?
12. Explore the previous results on the map
Appendix B
Questionnaires
B.1 User profile questionnaire
1. Gender
(a) Male
(b) Female
2. Age
(a) 18-25
(b) 26-35
(c) 36-50
(d) 51+
3. Academic studies
(a) Science/Engineering
(b) Humanities and Social sciences
(c) Health sciences
(d) Other
4. Academic degree
(a) Did Not Complete High School
(b) High School
(c) Bachelor's Degree
(d) Master’s Degree
(e) Advanced Graduate work or Ph.D.
B.2 Overview questionnaire
B.2.1 Part One
1. I think that I would like to use this system frequently.
2. I found the system unnecessarily complex.
3. I thought the system was easy to use.
4. I think that I would need the support of a technical person to be able to use this system.
5. I found the various functions in this system were well integrated.
6. I thought there was too much inconsistency in this system.
7. I would imagine that most people would learn to use this system very quickly.
8. I found the system very cumbersome to use.
9. I felt very confident using the system.
10. I needed to learn a lot of things before I could get going with this system.
B.2.2 Part Two
1. Global appreciation
(a) Linear scale (1 - Terrible to 5 - Very good)
2. Do you have a smartphone with GPS?
(a) Yes
(b) No
3. Do you make use of the GPS?
(a) Yes
(b) No
4. In which situations?
(a) Map navigation
(b) Track recording
(c) Photo geotagging
(d) Other
5. Would you track your daily routes?
(a) Yes
(b) No
6. What are the reasons that prevent you from tracking your routes?
(a) Privacy concerns
(b) Battery issues
(c) Other
Appendix C
GPX Library Documentation
Where Have I Been
GPX Library
Intro
This repository offers the source code of the GPX library used in Where Have I Been.
We devised a library to process GPX files, based on the work of tkrajina.
This library has three main purposes:

- Smoothing GPX files
- Dividing GPX files into tracks, each representing a moment of movement
- Reducing dataset size
Auxiliary library to process GPX tracks
Important methods, listed in calling order:
Dividing and splitting tracks
If there is an unusual variation in distance or time between two points, the track is divided in two. If two resulting tracks are too close together, they are assumed to belong to the same movement and are therefore combined into one track.
track2trip(split_on_new_track, split_on_new_track_interval, min_sameness_distance)
split_on_new_track - whether or not a track should be split into a different file
split_on_new_track_interval - temporal distance between two points in order to consider splitting it
min_sameness_distance - minimum distance (in meters) in order to consider splitting the file
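The splitting half of this rule can be sketched as follows. This is a minimal, self-contained illustration of the time-gap criterion only (the merge-back step based on `min_sameness_distance` is omitted); the function name, point format, and default threshold are illustrative, not the library's actual API:

```python
from datetime import datetime, timedelta

def split_tracks(points, max_gap_seconds=300):
    """Split a sequence of (timestamp, lat, lon) points into separate
    tracks wherever the time gap between consecutive points exceeds
    max_gap_seconds (a stand-in for split_on_new_track_interval)."""
    tracks, current = [], []
    for p in points:
        if current and (p[0] - current[-1][0]).total_seconds() > max_gap_seconds:
            tracks.append(current)   # gap too large: close the current track
            current = []
        current.append(p)
    if current:
        tracks.append(current)
    return tracks

t0 = datetime(2015, 11, 1, 12, 0, 0)
points = [
    (t0, 38.7369, -9.1427),
    (t0 + timedelta(seconds=60), 38.7371, -9.1430),
    (t0 + timedelta(seconds=700), 38.7374, -9.3026),  # 640 s gap: new track
]
tracks = split_tracks(points)
print(len(tracks))  # → 2
```

A large gap usually means the GPS logger was switched off between two moments of movement, which is why a time threshold alone already separates most trips.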
Smoothing tracks
Smooths track data, based on the implementation by tkrajina. It focuses on two ideas: calculating the average distance between points to find outliers that need to be removed, and applying a ratio to the remaining points to achieve a smooth, well-fitting path.
smooth(remove_extremes, how_much_to_smooth)
remove_extremes - remove outlying points
how_much_to_smooth - decimal value that specifies how much to smooth
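In outline, the two ideas above look like this. It is a rough sketch under assumed planar 2-D coordinates and an illustrative outlier threshold (twice the average step), not the library's real code:

```python
import math

def smooth(points, remove_extremes=True, ratio=0.25):
    """Drop outliers (interior points far from BOTH neighbours relative
    to the track's average step length), then pull each remaining
    interior point toward the average of its neighbours by `ratio`."""
    if len(points) < 3:
        return list(points)
    pts = list(points)
    if remove_extremes:
        steps = [math.hypot(b[0] - a[0], b[1] - a[1])
                 for a, b in zip(pts, pts[1:])]
        avg = sum(steps) / len(steps)
        pts = [pts[0]] + [
            p for i, p in enumerate(pts[1:-1], start=1)
            if not (steps[i - 1] > 2 * avg and steps[i] > 2 * avg)
        ] + [pts[-1]]
    out = [pts[0]]
    for prev, cur, nxt in zip(pts, pts[1:], pts[2:]):
        out.append((cur[0] * (1 - 2 * ratio) + (prev[0] + nxt[0]) * ratio,
                    cur[1] * (1 - 2 * ratio) + (prev[1] + nxt[1]) * ratio))
    out.append(pts[-1])
    return out

# A single GPS spike at (3, 50) is far from both neighbours and removed.
track = [(0, 0), (1, 0), (2, 0), (3, 50), (4, 0), (5, 0), (6, 0)]
result = smooth(track)
print(len(result))  # → 6
```

Requiring an outlier to be far from both of its neighbours avoids discarding the legitimate point that follows a spike.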
Visually simplifying tracks (Ramer–Douglas–Peucker)

Adaptation of the Ramer–Douglas–Peucker algorithm, by tkrajina, including both spatial and temporal constraints. This version takes into account the temporal distance between the original curve and the simplified curve.
simplify(max_distance, max_time)
max_distance - the expected maximum distance, in kilometers, between track points after the simplification

max_time - the expected maximum time, in seconds, between two points after the simplification
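For reference, the purely spatial core of Ramer–Douglas–Peucker (without the temporal constraint this library adds) can be written as below; coordinates and the distance threshold share the same arbitrary unit here, and the function is a textbook sketch rather than the library's implementation:

```python
import math

def rdp(points, max_distance):
    """Keep an interior point only if it deviates from the chord between
    the segment's endpoints by more than max_distance; recurse around
    the farthest such point."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    chord = math.hypot(x2 - x1, y2 - y1)
    best_i, best_d = 0, -1.0
    for i in range(1, len(points) - 1):
        px, py = points[i]
        if chord == 0:
            d = math.hypot(px - x1, py - y1)
        else:
            # perpendicular distance from (px, py) to the chord
            d = abs((x2 - x1) * (y1 - py) - (x1 - px) * (y2 - y1)) / chord
        if d > best_d:
            best_i, best_d = i, d
    if best_d <= max_distance:
        return [points[0], points[-1]]   # everything in between is negligible
    left = rdp(points[:best_i + 1], max_distance)
    right = rdp(points[best_i:], max_distance)
    return left[:-1] + right             # drop the duplicated pivot point

print(rdp([(0, 0), (1, 0.01), (2, 0), (3, 0.02), (4, 0)], 0.1))
# → [(0, 0), (4, 0)]
```

A temporal variant of the same idea replaces (or combines) the perpendicular distance with the time difference between a point and the point interpolated on the simplified chord.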
Reducing number of points
Intended to remove points that are too close to each other, keeping only points separated by at least a minimum distance. This minimum separation between points is specified in meters.
reduce_points(min_distance, min_time)
min_distance - the minimum distance between points (meters)
min_time - the minimum time between points (seconds)
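The distance half of this filter amounts to keeping a point only when it is far enough from the last point kept. A simplified sketch (planar distance instead of geodesic, and an illustrative signature without the time criterion):

```python
import math

def reduce_points(points, min_distance):
    """Keep a point only if it is at least min_distance (same unit as
    the coordinates) away from the last point that was kept."""
    if not points:
        return []
    kept = [points[0]]
    for p in points[1:]:
        last = kept[-1]
        if math.hypot(p[0] - last[0], p[1] - last[1]) >= min_distance:
            kept.append(p)
    return kept

# Points 0.5 and 0.2 units from their predecessors are dropped.
print(reduce_points([(0, 0), (0.5, 0), (2, 0), (2.2, 0), (5, 0)], 1.0))
# → [(0, 0), (2, 0), (5, 0)]
```

Unlike Ramer–Douglas–Peucker, this pass does not look at the shape of the curve; it only thins out oversampled stretches, which is why it is applied in addition to simplification.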
Appendix D
Evaluation results
D.1 User profile results
Figure D.1: Users’ gender summarized
Figure D.2: Users’ age summarized
Figure D.3: Users’ studies summarized
Figure D.4: Users' degree summarized
D.2 Task results
task user clicks time error task user clicks time error
1 9 34 0 1 16 40 0
2 12 30 0 2 20 55 1
3 15 25 0 3 26 46 1
4 10 32 0 4 13 32 0
5 11 26 0 5 15 39 0
6 13 40 1 6 25 61 2
7 18 27 0 7 15 36 0
8 19 63 2 8 20 43 0
9 15 33 0 9 26 51 0
10 11 41 1 10 23 49 1
11 9 55 2 11 20 34 0
12 12 36 0 12 25 44 0
13 15 39 0 13 13 38 0
14 10 26 0 14 26 65 4
15 11 71 3 15 20 60 2
16 11 29 0 16 13 35 0
17 13 36 0 17 16 42 1
18 8 45 1 18 29 57 2
19 13 29 0 19 28 68 4
20 14 22 0 20 13 33 0
21 18 25 0 21 30 70 4
1 7 25 0 1 15 30 0
2 9 35 0 2 14 25 0
3 9 33 0 3 12 23 0
4 10 45 1 4 16 34 0
5 7 22 0 5 12 20 0
6 11 31 0 6 20 41 1
7 7 20 0 7 12 33 0
8 15 44 0 8 12 27 0
9 20 55 2 9 22 50 2
10 12 31 0 10 21 42 1
11 9 25 0 11 13 29 0
12 7 20 0 12 20 25 0
13 12 36 0 13 16 31 0
14 8 24 0 14 19 42 1
15 21 47 1 15 17 25 0
16 11 26 0 16 18 28 0
17 8 21 0 17 17 33 0
18 7 43 1 18 19 41 1
19 15 40 1 19 12 22 0
20 10 32 0 20 15 29 0
21 13 25 0 21 13 21 0
3
4
1
2
1 7 18 0 1 12 30 0
2 11 30 1 2 14 35 0
3 8 21 0 3 22 65 2
4 10 39 1 4 11 51 2
5 9 25 0 5 16 40 1
6 12 35 1 6 11 27 0
7 8 22 0 7 12 26 0
8 10 34 1 8 15 42 1
9 8 29 0 9 22 60 2
10 8 26 0 10 11 29 0
11 12 42 2 11 12 31 0
12 10 28 1 12 16 51 2
13 8 21 0 13 15 44 1
14 7 19 0 14 25 55 2
15 15 45 2 15 23 63 2
16 8 22 0 16 16 41 1
17 8 23 0 17 11 25 0
18 8 20 0 18 13 30 1
19 8 21 0 19 11 26 0
20 8 23 0 20 20 42 1
21 7 18 0 21 26 59 2
1 5 5 0 1 6 15 0
2 10 18 1 2 6 16 0
3 5 6 0 3 8 20 0
4 5 7 0 4 7 22 0
5 11 25 1 5 6 15 0
6 21 65 3 6 6 14 0
7 5 5 0 7 7 17 0
8 5 5 0 8 8 20 0
9 5 5 0 9 10 25 1
10 5 6 0 10 11 26 1
11 7 7 0 11 6 15 0
12 16 44 2 12 7 17 0
13 5 5 0 13 10 22 1
14 5 6 0 14 12 25 1
15 5 6 0 15 6 15 0
16 5 7 0 16 7 18 0
17 12 30 1 17 6 16 0
18 5 5 0 18 6 15 0
19 5 6 0 19 7 19 0
20 5 8 0 20 9 20 1
21 5 7 0 21 6 15 0
7
8
5
6
1 6 17 0 1 12 25 0
2 6 16 0 2 13 25 0
3 6 18 0 3 12 23 0
4 6 16 0 4 14 24 0
5 7 20 0 5 12 20 0
6 6 19 0 6 12 22 0
7 6 16 0 7 12 21 0
8 6 15 0 8 14 27 0
9 7 21 0 9 13 21 0
10 6 19 0 10 13 26 0
11 10 25 1 11 13 21 0
12 6 16 0 12 12 22 0
13 6 18 0 13 16 31 1
14 6 16 0 14 12 23 0
15 11 22 1 15 13 25 0
16 12 25 1 16 12 23 0
17 11 26 1 17 15 33 1
18 6 16 0 18 13 25 0
19 6 17 0 19 12 22 0
20 6 16 0 20 11 23 0
21 6 19 0 21 13 21 0
1 9 30 0
2 12 28 0
3 11 25 0
4 10 32 0
5 11 26 0
6 13 26 0
7 10 27 0
8 15 38 1
9 11 33 0
10 11 26 0
11 9 34 0
12 12 36 0
13 11 32 0
14 10 26 0
15 11 27 0
16 11 29 0
17 9 23 0
18 8 28 0
19 13 29 0
20 9 22 0
21 11 25 0
9
10
D.3 Overall questionnaire results
Figure D.5: SUS summarized
Figure D.6: Global appreciation
Figure D.7: Smartphone possession
Figure D.8: GPS usage
Figure D.9: GPS usage situations
Figure D.10: Would users track location
Figure D.11: User concerns on tracking