Where Have I Been - Visualizing Personal Geolocation Data
Jorge Miguel Saldanha Filipe
Thesis to obtain the Master of Science Degree in
Information Systems and Computer Engineering
Supervisor: Prof. Daniel Jorge Viegas Gonçalves
Examination Committee
Chairperson: Prof. Miguel Nuno Dias Alves Pupo Correia
Supervisor: Prof. Daniel Jorge Viegas Gonçalves
Member of the Committee: Prof. João Manuel Brisson Lopes
November 2015
For my parents
Acknowledgments
This journey would not have been possible without the support of my family, professors and friends.
To my family, thank you for encouraging me in all of my pursuits and inspiring me to follow my dreams.
I am especially grateful to my parents, who supported me emotionally and financially.
I take this opportunity to express my profound gratitude and deep regards to my supervisor, Daniel Jorge Viegas Gonçalves, for his exemplary guidance, monitoring and constant encouragement throughout the course of this thesis.
To my friends, thank you for listening, offering me advice, and supporting me through this entire
process.
Resumo
With the large number of portable devices that exist nowadays capable of gathering GPS information, large volumes of data are produced. Despite the fact that people track their mobility, most approaches to the analysis of spatio-temporal data are very complex and technical. Thus, on the one hand, it is very easy and common to collect spatio-temporal data; on the other, it is difficult to analyze that data and extract personally relevant findings.
To make the analysis of personal geolocation data easier, we devised a visual language for viewing and querying the data, including support for the personal semantics of places. This visual language was validated, before we continued developing the system, to see whether users could use and understand it.
After validation, we implemented a system that integrates our visual language with result visualization and map interaction. The evaluation showed that people could use and understand the whole system, making it clear that the initial objectives were achieved.
Keywords: movement data, spatio-temporal data, scalable visualization, geovisualization, personal semantics
Abstract
With the large number of portable devices in use nowadays that are capable of collecting GPS data, large volumes of data are being produced. Despite the fact that people track their mobility, most approaches to the analysis of spatio-temporal data are too complex and technical. So, on the one hand, it is very easy and common to collect spatio-temporal data; on the other, it is difficult to analyze this data and extract personally relevant insights.
To make the analysis of personal geolocation data easier, we devised a visual language for accessing and querying that data, including support for the personal semantics of locations. This visual language was validated, before we proceeded to develop the system, to verify that users could use and understand it.
We then implemented a system that integrated our visual language with result display and map interaction. An evaluation showed people could use and understand the whole system, confirming that we had achieved our initial objectives.
Keywords: movement data, spatio-temporal data, scalable visualization, geovisualization, personal semantics
Contents
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Resumo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi
List of Listings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix
1 Introduction 1
1.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Related Work 5
2.1 Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.1 Visits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.2 AllAboard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.3 QS Spiral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.4 ST-TrajVis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.5 LifeLines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.6 Microsoft GeoFlow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.7 Geotime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.8 Visual Mobility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.9 Visual analytics of movement: an overview of methods, tools and procedures . . . 12
2.1.10 AprilZero Sport . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.11 Google Maps Timeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.12 TrajRank: Exploring Travel Behaviour on a Route by Trajectory Ranking . . . . . . 15
2.1.13 Generic mapping applications and tools . . . . . . . . . . . . . . . . . . . . . . . . 16
2.1.14 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 Specifying Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2.1 Cigales, Sketch! and Lvis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2.2 The Challenges of Specifying Intervals and Absences in Temporal Queries: A
Graphical Language Approach (TQ:AGLA) . . . . . . . . . . . . . . . . . . . . . . 18
2.2.3 VizPattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.4 TaxiVis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.5 GeoDec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.6 Time Automaton - a visual mechanism for temporal querying . . . . . . . . . . . . 22
2.2.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3 Conceptual Frameworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3.1 Towards a general theory of action and time . . . . . . . . . . . . . . . . . . . . . . 23
2.3.2 Topological relationships between complex spatial objects . . . . . . . . . . . . . . 23
2.3.3 Triad Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3.4 Conceptual Framework and Taxonomy of Techniques for Analyzing Movement . . 24
2.3.5 Visual Analytics for Analysis of Movement Data . . . . . . . . . . . . . . . . . . . . 25
2.3.6 STNexus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.4 List of requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3 Data Collection 29
3.1 Which data to collect? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1.1 GPS tracks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1.2 Semantic meaning and annotations . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2 Real data problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2.1 Problems related to GPS tracks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2.2 Problems related to semantic meaning and annotations . . . . . . . . . . . . . . . 34
3.2.3 Common to both . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3 Solving real data problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3.1 Dataset size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3.2 GPS Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.3.3 Forgotten Start/End . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3.4 Loss of signal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3.5 Battery Requirement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.3.6 Multiple meanings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.3.7 Forgetting something . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.3.8 Keeping up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.3.9 Privacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.3.10 User’s Burden . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4 Visual Queries 43
4.1 Expressiveness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.2 Visual language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.3 Visual language examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.4 Visual language validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5 Where Have I Been 55
5.1 GPX Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.2 Semantic Annotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.3 Backend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.3.1 Query translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.4 Frontend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.4.1 User interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.4.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6 Evaluation 75
6.1 Experimental protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.1.1 User profile questionnaire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.1.2 Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.1.3 Overall questionnaire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.2.1 Results by task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.2.2 Results by feature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.2.3 Questionnaire results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
7 Conclusions 85
7.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Bibliography 87
A Test protocols 91
A.1 First test protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
A.2 Final tests protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
B Questionnaires 99
B.1 User profile questionnaire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
B.2 Overview questionnaire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
B.2.1 Part One . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
B.2.2 Part Two . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
C GPX Library Documentation 102
D Evaluation results 105
D.1 User profile results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
D.2 Task results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
D.3 Overall questionnaire results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
List of Tables
2.1 Visualization Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 Specifying Queries Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.1 Spatio-Temporal classification for personal geolocation data . . . . . . . . . . . . . . . . . 45
4.2 Average number of errors made by task . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.1 Summary of collected data during 5 years . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.1 Results by task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.2 SUS summarized . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
List of Figures
2.1 Visits interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 AllAboard panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 QS Spiral interactive visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4 ST-TrajVis interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.5 LifeLines interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.6 GeoFlow visualization of U.S. power stations . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.7 Screenshot of GeoTime prototype in calendar mode (Linked time chart) showing recent
events within a smaller local area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.8 Visual Mobility UI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.9 AprilZero Sport interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.10 Google Maps Timeline interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.11 TrajRank interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.12 Cigales interface and query visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.13 LVis evolution operators between surfacic objects . . . . . . . . . . . . . . . . . . . . . . . 18
2.14 Query representing “stroke occurs during Drug A” . . . . . . . . . . . . . . . . . . . . 19
2.15 VizPattern workspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.16 TaxiVis workspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.17 User interface of GeoDec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.18 Spatial visualization of bus stops’ activity in a weekend . . . . . . . . . . . . . . . . . . . . 22
2.19 Temporal topological relationships (image from [34], original idea from [27]) . . . . . . . . 23
2.20 4 of the 82 topological relationships between two complex lines . . . . . . . . . . . . . . . 24
2.21 The interface of the interactive time filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.1 Proposal for the semantic context file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2 Common GPS problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 Comparison between an original gpx track (red) and one with RDP (green) . . . . . . . . 35
3.4 Comparison between GPX simplification algorithms . . . . . . . . . . . . . . . . . . . . . 35
3.5 Standard RDP problem explanation 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.6 Standard RDP problem explanation 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.7 Comparison between an original gpx track and one after applying the smoothing algorithm 38
3.8 Comparison between original gpx track and one after applying smoothing algorithm . . . 39
3.9 Comparison between an original gpx track and one after applying the interpolation algorithm 40
4.1 Where Have I Been query area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.2 Editing a start time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.3 Editing a temporal range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.4 Editing a duration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.5 Applying recurrence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.6 Editing a spatial range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.7 Adding a comparison route . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.8 Visual query demonstrating range of time . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.9 Visual query demonstrating absolute time . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.10 Visual query demonstrating time relative to another event . . . . . . . . . . . . . . . . . . 50
4.11 Visual query to specify spatial fuzziness . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.12 Visual query demonstrating spatial accuracy . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.13 Places in relation to other ones - simplified version . . . . . . . . . . . . . . . . . . . . . . 51
4.14 An exact place . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.15 A route within a certain distance from a baseline . . . . . . . . . . . . . . . . . . . . . . . 51
4.16 Visual query demonstrating ending times . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.17 Visual query demonstrating duration constraints and class of places . . . . . . . . . . . . 52
5.1 Solution schema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.2 Data collection edition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.3 Database Entity-Relationship diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.4 Query illustrating SQL translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.5 Aggregated result with quartiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.6 Where Have I Been user interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.7 Where Have I Been query area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.8 Editing a start time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.9 Editing a temporal range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.10 Editing a duration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.11 Editing a spatial range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.12 Where Have I Been search area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.13 Removing a query element . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.14 Results reuse and lock options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.15 Results panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.16 Where Have I Been settings interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.17 Interface input widgets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.18 Fuzziness interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.19 Disaggregated results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.20 Result aggregation example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.1 Clicks by task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.2 Time by task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
D.1 Users’ gender summarized . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
D.2 Users’ age summarized . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
D.3 Users’ studies summarized . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
D.4 Users degree summarized . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
D.5 SUS summarized . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
D.6 Global appreciation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
D.7 Smartphone possession . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
D.8 GPS usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
D.9 GPS usage situations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
D.10 Would users track location . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
D.11 User concerns on tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
Listings
3.1 Tkrajina implementation of RDP with temporal modification . . . . . . . . . . . . . . . . . 36
5.1 SQL translation example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Chapter 1
Introduction
Lately, with the massive spread of GPS-tracking devices, such as smartphones, smartwatches, tablets and dedicated devices, huge volumes of data are produced, representing not only the mobility of people, but also animal behavior or natural phenomena. Because of this massive collection of data, it is possible to say that, at some point, everyone is a spatio-temporal analyst [1], whether by planning a journey, looking for a job or searching for restaurants. With the advance of technology, people tend to plan their lives more and more: finding the days of the week that will be rainy, understanding land prices in order to build a house, or tracking epidemic contagion.
GIS (Geographic Information Systems) applications are tools that allow users to manipulate and
analyze this information. Common features include the creation of interactive queries, analysis of spatial
information, editing data in maps, and the presentation of the results of all these operations. These
tools are, however, not designed for the general public: they require some programming or expert-level
knowledge.
Despite all those features and the fact that people track their mobility, most approaches to spatio-temporal data, including GIS systems, do not consider the personal side of the data, that is, data that has its own meaning for the user (either spatial, like "my home" or "the kids' school", or temporal, like "my birthday"). There are, however, some exceptions worth mentioning: spatio-temporal location in social networks and lifelogging.
The integration between spatio-temporal location and social networks is one example involving personal data. There has been considerable interest in applying social network analysis methods to geographically embedded networks [2] such as population migration, international trade, population health behaviors, information dissemination, or human behavior in a crisis [3] [4] [5].
Lifelogging, a practice in which people track personal data generated by their own behavioral activities (like exercising, sleeping, and eating), is a growing field. Many people are interested in it; there are even enthusiasts who have created websites to show their own personal logging information. But this only shows that there is no global solution users can benefit from: those who can, implement their own ad-hoc solutions, which are often only valid for a specific geographical situation. Furthermore, these solutions tend to ignore the results of research in the GIS area.
Personal spatio-temporal data involves geographical space, time and human behavior. Several challenges arise from this complexity. However, it also enables the use of such data for different purposes: to study the properties of space and places, to understand the dynamics of personal events, to recreate and find patterns in human behavior, and so on [6]. A negative side of collecting personal individual data is the growing threat to personal privacy [7] [8].
So, on the one hand, it is very easy and common to collect spatio-temporal data; on the other, it is difficult to analyze this data and extract personally relevant insights. Most importantly, there is a lack of tools capable of visualizing and querying personal geo-temporal information.
With such tools, users could find information and patterns that really are personally important to them. For example, one of the main benefits for users is the ability to gain insights about their personal location data. Users can ask about places they have been, and find out the places where they spend the most time or go most often. Another feature that differentiates this solution is the possibility to query information using a user's own semantic meaning; namely, one could ask how often he went to his mom's house. By using these insights, a user could also make more conscious traveling decisions. One could compare the time wasted during rush hour with the time spent at times of less traffic, and use this information to optimize future decisions. Another interesting outcome is the ability to optimize traveling distances by comparing the similar routes a user takes to get to the same place; by doing this analysis a user could save money and fuel.
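As a purely illustrative sketch of that route-comparison idea (the helper functions and sample coordinates below are hypothetical, and not part of the system described in later chapters), the length of two recorded tracks between the same two places can be totaled with the haversine formula and compared:

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def route_length_km(points):
    """Sum of segment distances along a GPS track."""
    return sum(haversine_km(a, b) for a, b in zip(points, points[1:]))

# Two hypothetical tracks between the same two places in Lisbon:
# route A roughly follows the straight line, route B detours south.
route_a = [(38.7369, -9.1427), (38.7400, -9.1500), (38.7436, -9.1602)]
route_b = [(38.7369, -9.1427), (38.7300, -9.1550), (38.7436, -9.1602)]

shorter = "A" if route_length_km(route_a) < route_length_km(route_b) else "B"
print(shorter)  # route A is shorter in this example
```

The same comparison, applied to a user's own repeated trips, is what would let the tool suggest the cheaper route.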
Since a simple, specific tool that does not require expert-level knowledge from its users is missing, we propose to fill that gap. Our proposal is a tool able to query personal geolocation data in order to find specific routes, including the personal semantics of the locations. To achieve that, we must first identify the different data inputs we need to process and avoid the direct use of the SQL language in the querying system, creating a visual language instead, so the user can deal with it with ease. After that, we will create a system based on that language and test it with users.
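To make the gap concrete, the following sketch (in Python, with an illustrative table layout that is not the actual schema used later in this work) shows the kind of raw SQL a non-expert would otherwise need to write just to ask "when was I near home during a given week?":

```python
import sqlite3

# Illustrative schema: one row per GPS fix (names are assumptions,
# not the database schema described in Chapter 5).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (lat REAL, lon REAL, ts TEXT)")
conn.executemany(
    "INSERT INTO points VALUES (?, ?, ?)",
    [(38.7369, -9.1427, "2015-06-01 09:12:00"),   # near "home"
     (38.7372, -9.1431, "2015-06-01 18:40:00"),   # near "home"
     (40.4168, -3.7038, "2015-06-03 12:00:00")])  # elsewhere

# "When was I near home (within ~0.001 degrees) in the first week of June?"
rows = conn.execute("""
    SELECT ts FROM points
    WHERE ABS(lat - 38.7370) < 0.001
      AND ABS(lon - (-9.1429)) < 0.001
      AND ts BETWEEN '2015-06-01' AND '2015-06-08'
    ORDER BY ts""").fetchall()
print([r[0] for r in rows])
```

Expressing the place, the tolerance and the time window directly in SQL like this is exactly the burden the visual language is meant to remove.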
In order to be easily understandable, we believe a query must be visually represented. We will study how to create a visual language that fully supports querying geo-temporal data as well as user-assigned semantics. There is also an obvious relation between geo-temporal data and maps, so we decided to integrate maps with the query system, allowing the user both to directly select locations and to see the query results on a map. Because the language must be understood and used by users, we have to validate it with them before proceeding to the actual implementation.
After defining the visual language, we also need to know how we can offer the user a good interface for the results of a query. Since the results can extend over a large period of time, a user must be able to quickly grasp an overview of all the data. We will, therefore, study how to present the results. For both result presentation and query language definition we will extensively review the related literature, so that we can perceive existing patterns and ideas that may be applicable to our solution, as well as existing problems faced by solutions dealing with geo-spatial data.
Lastly, we evaluate the system as a whole to understand to what extent it is easy to use. There is particular interest in evaluating how users deal with the query interface, because it is the most relevant part of the solution.
1.1 Objectives
We focus only on showing the data of one person; this is not a tool related to crowd-sourcing. Despite being the data of only one person, this data can be extensive and span many years of life. There is also no concern about how the user gathers his or her data. One of the things a user should be able to do is gain insight into his or her personal mobility information and perceive patterns in daily life. Thus, our main objective is:
Create a visual query language for personal mobility data that allows users to visually query their personal spatio-temporal information, with personal semantics.
Several secondary goals arise from the analysis of the main goal. We will discuss those goals below.
One of the objectives is to analyze and study the expressiveness of the queries we want our system to support. It is essential to understand what information a user might want to know about mobility data and then translate that need into a systematized document. We will also need to analyze the best way to visually represent the resulting information (that is, the information a user wants to see about a specific track: duration, average speed, etc.). A challenge is choosing the best way to represent this information: bar chart, pie chart, etc. Perhaps the most important issue is to analyze the best way to visually query the information, that is, the query interface. It is important that this interface has enough expressiveness for a non-expert user, and that no SQL queries have to be written.
Since we are dealing with personal data, we also have the concern of privacy. There is the need to identify which information can be extracted from the various types of data [8] and to preserve the privacy of that data.
1.2 Contributions
Below we enumerate the main contributions of our work.
• Visual language expressiveness: a table showing all the studied outcomes when dealing with
personal spatial and temporal data, regarding accuracy, relativity and concreteness.
• Visual language specification: a written specification of a visual language to query personal spatio-temporal data, based on the previously mentioned table.
• Personal spatio-temporal visual query system: a browser-based system that implements the visual language described above. This system also includes viewing the results. It is available at GitHub1
• GPX Library to organize and clean GPX files: A library made to process input data, in order to
increase data quality. It is open source and available at GitHub2
1 https://github.com/jmsfilipe/where-have-i-been-frontend accessed: 09-09-2015
2 https://github.com/jmsfilipe/where-have-i-been-lib accessed: 09-09-2015
• A paper for the Special Issue "The examined life: Personal uses for personal data" of the HCI Journal
1.3 Structure
In the next chapter we will discuss several published works in three different areas: visualization of information, specifying queries, and conceptual frameworks. Then, in Chapter 3, we will analyze known problems regarding data collection and address ways to solve them. In Chapter 4 we describe our visual language, stating its expressiveness and giving examples of its usage along with the validation results. Chapter 5 describes the system implementation, covering the backend and frontend. In Chapter 6 we present the results of evaluating our solution with users. Finally, we conclude in Chapter 7 with a brief summary of our work, suggesting some guidelines for future research.
Chapter 2
Related Work
In this section we will analyze existing works regarding visualization (of personal and other data), specifying queries and conceptual frameworks. The section is divided into three subsections. In Visualization we will discuss works that aim to visualize personal geo-temporal data. Then, in the Specifying Queries section, relevant works that focus on visual queries for spatio-temporal data will be analyzed. Finally, in the last section, Conceptual Frameworks, we will show some frameworks that help provide an abstraction when manipulating spatial and temporal data.
2.1 Visualization
This work aims to provide an interface where users can view their personal mobility data. Thus, in this section we will look at applications that provide a means of visualizing information that is somehow relevant to this work.
2.1.1 Visits
Many apps and services exist that collect location data either continuously or based on check-ins. However, this data does not reflect the way people remember their trips. Human memory captures trips as narrative-like sequences of events.
Visits 1 creates a visualization of automatically collected spatio-temporal data that reflects current
knowledge about how people naturally remember autobiographical episodes, such as their journeys [9].
They developed map-timelines, a visualization technique that integrates temporal and spatial infor-
mation to display histories of trips as a series of visited places.
This concept can also be applied to other types of spatial-temporal data, such as historical records
of famous journeys, or city development.
The goal of this approach is to support the identification of: the chronological order of stays; repeated stays at the same place; and the duration of stays, while preserving the fine-grained location information of the data.
1http://v.isits.in/ accessed: 14-10-2015
The interface consists of two items, the centrally placed horizontal map-timeline, and the overview
map in the lower left corner (figure 2.1).
The map regions - showing visited places - that are visible in the map timeline above are also shown
as circles in the overview map. Curves connect these overview circles with the corresponding segment
in the map timeline.
The data to be analyzed can come from three different sources: Flickr Collections, Google Maps
history, and OpenPaths2 history.
Figure 2.1: Visits interface
One of the advantages of Visits is the way information is represented. By using a timeline, events are chronologically ordered and easy for a user to locate in time. A major drawback is the problem of scalability: if the amount of events to be shown is too large, the timeline becomes too long and the circles become too small to be visible.
2.1.2 AllAboard
With the high level of penetration of cell phones it is now possible to collect samples that are several orders of magnitude larger than manual surveys. AllAboard aims to aid city authorities in exploring urban mobility and in optimizing public transport by collecting cell phone data [10]. The data was provided by a cell phone operator in Ivory Coast: it is based on exchanged SMS, and the dataset tuples contain the following: <UserID, Day, Time, Antenna>. The key component is the antenna, which provides the means to find the GPS coordinates, thus allowing a spatial location.
Like many other applications, AllAboard has to assure its data quality. It does so with a component which processes mobile phone location data and, by applying a set of algorithms, extracts information on users’ stops, origin/destination flows, and shared route patterns.
AllAboard provides a user interface that allows the operator to explore mobility data, validate them
and evaluate different optimization strategies by interacting with maps, density heat maps and other
2https://openpaths.cc/ accessed: 19-10-2014
visual representations.
Figure 2.2: AllAboard panel
The operator can select the time interval to an-
alyze. A line chart is presented where the x-axis
represents time. Due to the large scale data, it
is possible to select the desired time interval us-
ing the time bar (figure 2.2). The flow of peo-
ple between antennas is then shown on the map.
Darker arrows represent a larger amount of trips
between the antennas.
This system has a spatio-temporal focus; however, the temporal side has a minimal role, allowing only simple time filtering. Furthermore, the only personal-data concern in this project is receiving anonymized user data. Also, the visualization challenge regarding the flow between antennas could have a better solution than just changing the arrow color.
2.1.3 QS Spiral
QS Spiral is an Android Application that allows the visualization of periodic properties of quantified self
data [11].
Figure 2.3: QS Spiral interactive visual-
ization
The software continuously acquires and logs data from
sensors (GPS and WiFi). From this data, locations are ex-
tracted to be shown to the user.
In figure 2.3 a collection of geocaptured data is shown.
The top of the screen shows a tag cloud, specifying the most
visited locations in colors corresponding to the colors shown
in the spiral. The bottom of the display contains time series
of places with map segments showing the places visited.
When a user taps a visited place, the corresponding places
will be highlighted in the spiral.
The spiral visualization allows repetitive deviations in the data to stand out clearly; events with similar periods are also aligned along similar arcs of the spiral. This allows the user to explore the existing patterns at a glance.
Since it is a mobile application, it can use interactions that a standard web application cannot, such as pinching to zoom into a particular part of the data, or swiping to pan.
This interface provides the means to answer cyclic tem-
poral queries, such as recurring temporal events like ”every Monday”.
Limitations of this system are mostly related to scalability. An extended log of data would translate
into a messy spiral, in which a user could not tap the desired part.
2.1.4 ST-TrajVis
ST-TrajVis is an application for the visualization of movement data [12]. Its main visualization focuses are 2D maps and the space-time cube.
The data used in the project consists of a subset of another project’s dataset. Each entry is represented as a sequence of time-stamped points, each containing latitude, longitude and altitude.
However, it does not use the two representations separately: it combines them into the same visualization. This way, the 2D map’s focus on spatial information is combined with the space-time cube’s focus on temporal information (figure 2.4). The representations are linked: when one point is selected, both visualizations highlight that point, providing additional information about it.
Figure 2.4: ST-TrajVis interface
It allows for data querying through filtering according to spatial and/or temporal properties: the defi-
nition of a geographical area, the start and end dates of search and the period of hours to be visualized.
It also allows for data enhancement, including the representation of speed, the trajectory’s recency
and smoothing of the trajectories.
Recently, a study suggested that users generally prefer static maps over space-time cubes [13].
Space-time cubes present occlusion problems (overlapping data) when dealing with several tracks, which may cause misunderstanding of the data.
2.1.5 LifeLines
LifeLines is a general visualization interface [14]. It was one of the first applications representing personal information in a dynamic way, serving as inspiration for many current works. Initially conceived to register personal data regarding youth justice records, it can also represent several other types of personal information.
Figure 2.5: LifeLines interface
In the case of juvenile justice, when a complaint is received, usually from the police, the workers have to find the current status and previous crime history. They propose to do so in one screen only. They used an existing dataset containing youth records of the Maryland Department of Juvenile Justice.
Although this work’s focus is neither spatial nor temporal, it has some interesting ideas regarding personal information visualization. They propose a timeline for each person (figure 2.5): varying statuses are displayed as horizontal lines and discrete events are represented as icons. Each offense is represented in that timeline. Line thickness and color are used to indicate the severity of the offense. Relationships between periods or events on the line can be highlighted.
They also include a set of techniques to allow the information to be represented in one screen only -
without scrolling vertically or horizontally. By disabling scrolling they assure users see all the information
and do not forget any result.
The visualization environment is not computationally demanding and can handle a variety of records. It is a personal record format that can be exchanged or synchronized between multiple services, thus making it scalable.
2.1.6 Microsoft GeoFlow
Figure 2.6: GeoFlow visualization of U.S. power stations
GeoFlow3 (figure 2.6) originated in Microsoft Research. This tool is an Excel module, and it allows users to:
1. Map Data: Plot rows of data from an Excel workbook, including the Excel Data Model or Power-
Pivot, in 3D on Bing maps. Visualization modes include columns, heat maps, and bubble visual-
izations.
2. Discover Insights: Discover new insights by seeing data in geographic space and seeing time-
stamped data change over time.
3. Share Stories: Capture scenes and build cinematic, guided tours that can be shared broadly,
engaging audiences.
The map data feature can plot more than a million rows of data.
2.1.7 Geotime
Figure 2.7: Screenshot of GeoTime prototype in
calendar mode (Linked time chart) showing recent
events within a smaller local area
Contrary to LifeLines, which only allows the dis-
play of events in the single dimension of time,
GeoTime, as the name suggests, aims to display
geographical and temporal information. They do
so by taking advantage of three dimensional com-
puter graphics [15].
The visualization concept of GeoTime is the space-time track. A space-time track represents a stream of time through a particular location, drawn as a literal line in space (figure 2.7).
Each unique point of interest (location) will
have one spatial timeline. Events that occur on
that location are arranged along the timeline.
There are three variations of Spatial Timelines that emphasize spatial and temporal qualities to varying extents; each variation increases the salience of time over geography. These are 3-D Z-axis Timelines, 3-D viewer-facing Timelines and linked time chart Timelines.
1. 3-D Z axis Timelines: 3-D Timelines are oriented normal to the terrain view plane. This method
places more emphasis on the geographical view.
2. 3-D viewer facing Timelines: similar to 3-D Timelines except that they rotate about the instant of
focus point so that they always remain perpendicular to the viewpoint from which the scene is
rendered.
3http://research.microsoft.com/en-us/news/features/geoflow_data_viz-041113.aspx/ accessed: 27-09-2014
3. Linked time chart Timelines: connect a 2-D grid in screen space to locations marked in the 3-D
terrain representation. More emphasis is placed on the time view (figure 2.7).
This work, however, does not scale very well. With hundreds of events on screen, the user has to see through a dense display of objects and labels.
2.1.8 Visual Mobility
Visual Mobility is a visual exploration tool for multi-modal personal mobility information that provides a
flexible filtering interface and contextual visualizations that try to extract meaningful mobility patterns
[16].
One of the objectives is to make users more ecologically aware, by analyzing their mobility patterns.
To create the tool, real mobility datasets were used - a simple data collection tool was created by the
author to help users record their data.
Regarding data collection, several issues had to be addressed:
1. GPS accuracy: Accuracy errors are the most common type of problem. They cause misrepresentation of the actual route taken and provoke excessive position wandering when the device is stationary.
2. Cold start/end: Forgetting to turn on/off data collection leads to wrong detection of trips.
3. Battery requirement: GPS data collection devices have a high battery requirement. This usually results in data loss: users abandon collecting data, gather lower quality data (fewer samples to save battery) or incomplete data (battery drained).
4. Privacy: Users can abandon data collection due to privacy concerns. They are not keen on making their daily routine publicly available. However, infrequent events, like a trip, are something users do like to share.
5. User’s burden: Mobility data collection is more than recording tracks. Raw data has to be pro-
cessed and this work can be burdensome, depending on the tool.
Since a fully automatic data collection process was not possible, these problems had to be mitigated.
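To make the first issue concrete, the GPS accuracy problem can be mitigated with a simple filter. The sketch below, in Python, drops fixes whose reported accuracy is worse than a threshold and collapses the positional jitter that occurs while stationary; the function names and thresholds are our own illustrative choices, not part of Visual Mobility itself.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates, in meters."""
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def clean_track(points, max_accuracy_m=50, min_move_m=5):
    """Drop low-accuracy fixes, then collapse stationary jitter.

    `points` is a list of (lat, lon, accuracy_m) tuples; both
    thresholds are illustrative defaults.
    """
    accurate = [p for p in points if p[2] <= max_accuracy_m]
    cleaned = []
    for p in accurate:
        if cleaned and haversine_m(cleaned[-1][0], cleaned[-1][1],
                                   p[0], p[1]) < min_move_m:
            continue  # wandering around the previous fix: skip it
        cleaned.append(p)
    return cleaned
```

Cold start/end detection would additionally require comparing timestamps against trip boundaries, which this sketch does not attempt.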
Figure 2.8: Visual Mobility UI
Focusing on the visual exploration component, the UI is composed of two main areas, the Canvas
and the Sidepanel.
The Sidepanel contains several scented widgets that allow users to filter data:
1. Slider Controller: to set a minimum and maximum value for an information space (for example, the
distance traveled during each hour of the day).
2. Toggle Controller: toggle discrete portions of the information space (for example, time spent in
each transport mode).
3. Tag List with Autocomplete: helps users select known Start and End locations of a trip. The results are ordered by the number of check-ins.
The Canvas area displays the visualizations users want to explore.
1. Map view: visualization of the tracks filtered using the sidepanel. It also provides layers that can
be activated (heatmap).
2. Scatter Plot: the Map view allows information contextualization; however, it does not expose individual track dynamics. In this visualization, it is possible to assign a specific track attribute to one of four possible scales.
3. Locations Relationships: contextualizes location check-ins in an overall perspective.
2.1.9 Visual analytics of movement: an overview of methods, tools and proce-
dures
This paper surveys relevant work and divides it into four categories [17]. Here we will focus on just one category: looking at trajectories.
1. Looking at trajectories: The focus is on moving objects. It supports the exploration of spatial
and temporal properties of individual trajectories and comparison of multiple trajectories. Three
essential areas are addressed: visualization, clustering and time transformations.
(a) Visualizing trajectories: The most common types of display for the visualization of movements
of discrete entities are static and animated maps and interactive space-time cubes.
(b) Clustering trajectories: Clustering is a popular technique used in visual analytics for han-
dling large amounts of data. Usually existing clustering methods are wrapped in interactive
visual interfaces supporting not only visual inspection but often also interactive refinement of
clustering results.
(c) Transforming times in trajectories: Comparison of dynamic properties of trajectories using
space time cube, time graph, or other temporal displays is difficult when the trajectories are
distant in time because their representations are located far from each other in a display.
This problem can be solved or alleviated by transforming times in trajectories. Two classes of
transformations are suggested:
i. Transformations that reflect the cyclic nature of time. Depending on the data and appli-
cation, trajectories can be projected in time to a single year / season / month / week / day
etc. This allows the user to uncover and study movement patterns related to temporal
cycles, e.g., find typical routes taken in the morning and see their differences from the
routes taken in the evening.
ii. Transformations with respect to the individual lifelines of trajectories. Thus, trajectories
can be shifted in time to a common start time or a common end time. This facilitates
the comparison of dynamic properties of the trajectories (particularly, spatially similar
trajectories), for example, the dynamics of the speed. Aligning both the start and end
times supports comparison of internal dynamics in trajectories irrespective of the aver-
age movement speed. Particularly, movement patterns of fast and slow movers can be
compared in this way.
2. Looking inside trajectories: considers methods that operate on the level of points and segments of trajectories.
3. Bird’s-eye view on movement: generalization and aggregation are used to uncover spatio-temporal patterns.
4. Movement in context: focuses on relations and interactions between moving objects and the environment, including various kinds of spatial and temporal objects, as well as phenomena (e.g. weather).
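The two classes of time transformations described in 1(c) can be sketched in a few lines of Python. This is our own illustrative reading of the survey, not code from the paper: class i projects timestamps onto a daily cycle, and class ii shifts trajectories to a common start time.

```python
def project_to_day(traj):
    """Class i: project timestamps onto a 24-hour cycle so that
    movements from different days can be compared by time of day.
    `traj` is a list of (t_seconds, lat, lon) tuples with absolute times.
    """
    return [(t % 86400, lat, lon) for t, lat, lon in traj]

def align_to_common_start(trajectories):
    """Class ii: shift each trajectory so its first timestamp is 0,
    making the internal dynamics (e.g. speed) of trajectories that
    are distant in time directly comparable."""
    aligned = []
    for traj in trajectories:
        t0 = traj[0][0]
        aligned.append([(t - t0, lat, lon) for t, lat, lon in traj])
    return aligned
```

Aligning both the start and end times, as the survey also suggests, would add a linear rescaling of each trajectory's time axis on top of the shift above.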
2.1.10 AprilZero Sport
This is a website created by Anand Sharma4. It started as a place where he could centralise all his personal data: heart rate, blood levels, running tracks, movement data captured by the Moves5 application, body fat, and Foursquare, Instagram and Facebook personal data. Different types of information are shown, from the heart rate to the transportation modes used. The website provides a visualization for every day of the month: it is possible to know where he was and where he went in each hour of the day (figure 2.9). The colored icons represent the place where he was, and the horizontal bar spans from the start to the end of the stay in that place.
Figure 2.9: AprilZero Sport interface
2.1.11 Google Maps Timeline
Google Timeline6 7 uses Google Maps, Google Photos and the user’s Location History. It allows users to view the places they have been on a given day, month or year. It is private and only visible to the user. It is possible to control which locations to keep: a day, or the full history, can easily be deleted at any time. It also supports semantics, since a user can edit any place that appears, including removing a specific location or giving a frequented spot a private name like Mom’s House or My Favorite Running Spot.
Its interface comprises a map, in which locations and routes are shown, and a summary, consisting of a vertical timeline representing visited places.
A downside of this application is the quality of the data present in routes. Since Location History does not continuously track the user’s position, when traveling, the GPS information will consist of points separated both spatially and temporally, thus not giving a full picture of which route the user took. An example of this problem can be seen in figure 2.10: on the map (top part) there is a long straight line between two points, with no straight road to match it. No conclusions about which route the user travelled can be inferred in this case.
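This sampling-gap problem is easy to detect programmatically. The sketch below is our own illustration with arbitrary thresholds, not anything Google provides: it flags consecutive fixes that are far apart in both time and space, i.e. segments where the route cannot be inferred.

```python
import math

def approx_dist_m(lat1, lon1, lat2, lon2):
    """Equirectangular distance approximation, adequate at city scale."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6371000.0 * math.hypot(x, y)

def find_gaps(points, max_gap_s=300, max_jump_m=500):
    """Return index pairs (i-1, i) where consecutive fixes are separated
    by more than `max_gap_s` seconds AND more than `max_jump_m` meters.
    `points` is a time-ordered list of (t_seconds, lat, lon) tuples."""
    gaps = []
    for i in range(1, len(points)):
        t0, lat0, lon0 = points[i - 1]
        t1, lat1, lon1 = points[i]
        if t1 - t0 > max_gap_s and approx_dist_m(lat0, lon0, lat1, lon1) > max_jump_m:
            gaps.append((i - 1, i))
    return gaps
```

A renderer could then draw the flagged segments as dashed lines instead of the misleading straight line.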
4http://aprilzero.com/sport/ accessed: 06-10-2014
5https://www.moves-app.com/ accessed: 07-10-2014
6http://google-latlong.blogspot.pt/2015/07/your-timeline-revisiting-world-that.html accessed: 07-08-2015
7https://www.google.com/maps/timeline accessed: 07-08-2015
Figure 2.10: Google Maps Timeline interface
2.1.12 TrajRank: Exploring Travel Behaviour on a Route by Trajectory Ranking
Figure 2.11: TrajRank interface
TrajRank provides an interactive visual analytic method for exploring taxi travel behaviour on a route [18]. It takes taxi GPS data and road network data as input. In the offline pre-processing stage, GPS trajectories are cleaned and matched to the road network. On each road segment, the travel times of different trajectories are clustered into groups, and these groups are ranked by average travel time in ascending order. The ranking is visualized and supports interactions for further exploration.
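The offline ranking step can be approximated as follows. This is a deliberately simplified sketch: TrajRank's actual clustering is more sophisticated, and the gap heuristic and its parameter below are our own illustrative choices.

```python
from statistics import mean

def rank_travel_time_groups(times_s, gap_s=30):
    """For one road segment, cluster the per-trajectory travel times
    with a simple 1-D gap heuristic, then rank the resulting groups
    by average travel time in ascending order."""
    groups, current = [], []
    for t in sorted(times_s):
        if current and t - current[-1] > gap_s:
            groups.append(current)  # large gap: close the current group
            current = []
        current.append(t)
    if current:
        groups.append(current)
    return sorted(groups, key=mean)
```

For example, `rank_travel_time_groups([62, 65, 300, 310, 58])` yields `[[58, 62, 65], [300, 310]]`: the fast group ranks first, as in TrajRank's ascending order.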
The interface consists of four views: a spatial-temporal view, a horizon graph view, a ranking view and a menu panel. In the spatial-temporal view, users interactively define spatial-temporal filters and configure route segmentation. The horizon graph view displays the temporal distribution of the selected trajectories over a day. The ranking view supports trajectory ranking analysis. It consists of three components: a ranking diagram, an occurrence temporal distribution view and a modified box-plot. The ranking diagram visualizes trajectory rankings over road segments. The temporal distribution view displays the distribution of trajectory groups with respect to occurrence time. The modified box-plot gives a statistical description of travel time on each road segment. Such statistics are also shown in the spatial view, encoded by the width of each road segment band.
2.1.13 Generic mapping applications and tools
Many mobile applications now allow for activity and location tracking. We will focus on a specific application, Moves, because it has a large set of connected visualization tools that are interesting to explore8. These tools work with the Moves API and require access to the data stored by the Moves application.
• Fluxtream 9 - is set up as an aggregation and visualization tool. It is possible to map very different
data sets, including where you were tweeting last week.
• Resvan Maps 10 - plots your places and paths, and categorizes paths depending on the activity (transport, walking, running, and cycling). Additionally, it is possible to create analysis circles and have the application compute the time you spent inside a bounded location.
• MMapper 11 - is a more technical application (requiring setup). It allows visualizing where people spend most of their time in an interactive way. One particularly interesting use is Quantified Bob - Visualizing 2 weeks of Passive Location Tracking12, which uses this application.
• Move-O-Scope 13 - makes it possible to explore maps by activity type, day of the week, and custom date ranges. It is also possible to see how many times a specific place was visited, where you come from and where you go next, what days you typically visit, and your typical time of day at that place.
2.1.14 Discussion
| Work | Scalability | Personal semantics | Collected data | Visualization method | Track simplification | Error removal |
| 1) Visits | no | no | Other apps | Timeline | no | no |
| 2) AllAboard | yes | no | GPS | map + graphics | no | yes |
| 3) QS Spiral | no | no | Wi-Fi, GPS | Spiral | no | no |
| 4) ST-TrajVis | some | no | Existing dataset | map + space-time cube | yes | yes |
| 5) LifeLines | yes | no | Existing dataset | Timeline | NA | NA |
| 6) GeoFlow | yes | no | MS Excel | map | NA | NA |
| 7) GeoTime | no | no | Existing dataset | 3D timeline | no | no |
| 8) Visual Mobility | some | yes | GPX tracks | map + graphics | yes | yes |
| 9) AprilZero Sport | yes | yes | Other apps, sensors | Timeline | no | no |
| 10) Google Timeline | yes | yes | GPS, Wi-Fi | Timeline | yes (too much) | no |
| 11) TrajRank | yes | no | GPX tracks | Timeline | no | no |

Table 2.1: Visualization Summary
8http://quantifiedself.com/2014/03/map-moves-data/ accessed: 07-10-2014
9https://fluxtream.org/ accessed: 07-10-2014
10http://resvan.com/map/ accessed: 07-10-2014
11https://github.com/feltron/MMapper accessed: 07-10-2014
12http://www.quantifiedbob.com/2013/09/what-moves-me-visualizing-2-weeks-of-passive-location-tracking/ accessed: 06-10-2014
13https://move-o-scope.halftone.co/ accessed: 07-10-2014
Table 2.1 shows a summary of the most important aspects of the works reviewed in this section.
We will not evaluate usability issues. Despite the increasing number of visualization studies, usability has been somewhat neglected [19]. It is still unclear how usable and useful these techniques are, how they can be improved and in which tasks they should be used [20]. Keeping that in mind, we are not evaluating the usability of the studied works; we are analyzing relevant features of those works.
The first row analyses scalability, a long-lasting challenge for information visualization [19] [21]. The second row refers to whether or not the information shown has personal meaning to the user (e.g. my house, my work). The third row shows how data was collected. The fourth lists the kind of visualization method used. The fifth shows whether the work applies some kind of track simplification (dividing recorded tracks into trips, etc.). The last row specifies if there is any error removal mechanism (GPS accuracy, cold start/end, etc.).
Regarding scalability, our work requires that a huge amount of data (potentially a person’s whole life) can be queried and visualized with ease. Not many works allow this: most of them rely on importing data that spans roughly a week. Beyond that, the interface becomes clogged by too much information for a user to understand. The most notable exception is 9): this work’s focus is on a person’s whole life, and it therefore displays the information in a clear and understandable way.
The field of personal semantics, that is, the support for locations and times that only make sense for the user (my house, my work, mom’s house, lunch time, etc.), is supported by only three works (8), 9) and 10)). As stated in the Objectives section, our work aims to allow users to query their personal information. This information has concepts that only make sense in the scope of a particular user (mom’s birthday, my house location), thus it is important to support this feature.
Regarding the visualization methods, we have chosen not to include any works that use automatic animation, because the widespread opinion is that it fails to be more helpful than other visualization types [22] [23]. Many usual techniques are used, such as 2D maps and space-time cubes. There is, however, one different and interesting way of representing the information, in a timeline (1), 5), 7) and 9)), that seems to work well.
Another important feature is the data quality and cleaning issue. For a successful visualization experience, it is important that the shown data is previously analyzed and divided into parts that make sense for the current context (e.g. dividing a big GPS track into several tracks that each correspond to a trip). It is also important to assure that the information is real: when working with devices that gather GPS signals, it is common to get many types of errors, including [16]: GPS accuracy (causing misrepresentation of the actual route taken) and cold start/end (when the person forgets to turn the data collection on or off, leading to wrong detection of trip ends). Despite this being an essential feature, very few works worry about dealing with it. 8) even releases a tool to process the collected data.
To summarize, the most important topics in geo-temporal visualization our solution will address are:
• Scalability (the support to potentially show a person’s whole life data) of the solution;
• Support of personal semantics while querying the data (it is one of the main goals);
• Focus on data quality - the tracks must be pre-processed for error-removal and simplification.
2.2 Specifying Queries
In this section we will present relevant works regarding visual querying.
2.2.1 Cigales, Sketch! and Lvis
Figure 2.12: Cigales interface and query visualiza-
tion
The Cigales language [24] allows users to express a visual spatial query by means of a composition of static icons.
The interface contains two windows: one to express an elementary query and the other to summarize the final query. An elementary query is built by first defining the operands, then clicking on the spatial operator icon. The system answers by displaying the iconic composition.
For example, the query Which towns are bor-
dered by forest? is represented in figure 2.12.
Figure 2.13: LVis evolution operators between sur-
facic objects
Sketch! [25] gives the user greater freedom. A
spatial query is expressed by drawing a sketch on
the screen, which is later interpreted by the sys-
tem. This means that the operators are directly
derived from the sketch, not chosen by the user.
Lvis is an extension of the Cigales language [26]. Spatio-temporal queries involve at least one spatial criterion and one temporal criterion. The spatial operators are intersection, inclusion, adjacency, disjunction and equality, while the temporal operators are based on [27]. The conjunction of both (a spatio-temporal query) originates six different operators (see figure 2.13).
2.2.2 The Challenges of Specifying Intervals and Absences in Temporal Queries:
A Graphical Language Approach (TQ:AGLA)
This work creates a temporal query interface with particular focus on two event-sequence constructs: intervals and the absence of an event [28]. It aims to determine the central difficulties of query specification and to alleviate these difficulties through graphical interaction.
The supported events are: point event, absence of point event, compacted interval event, expanded
interval event and absence of interval event (figure 2.14).
The first difficulty is specifying intervals as point events: users would attempt to specify a query for
an interval event as two separate point events (one for start, the other for end). A second difficulty is
specifying absence as non-presence. Users often assumed that by not specifying the presence of an
event, they were, by implication, specifying its absence. Another is making users understand they can
access ”does not occur” relationships.
Figure 2.14: Query representing ”stroke occurs dur-
ing Drug A”
Complex questions may require complex queries; however, a simpler approach to answering the same question is often available using ”does not occur”. The last difficulty is understanding the logic of absences. Specifying queries using absences is counter-intuitive. Furthermore, users are unaware of the different types of absences.
All these recurrent difficulties must be handled by the interface and are part of the learning process. The first difficulty, for example, is solved by forcing users to specify the full interval (not just the start or the end).
While the scope of temporal querying is very broad, this work focuses only on intervals (with start and end time) and the absence of an event.
2.2.3 VizPattern
Figure 2.15: VizPattern workspace
VizPattern is an interactive visual query environ-
ment [29]. It has no spatial focus - only temporal.
It uses a comic strip metaphor to enable users to
define and locate complex temporal patterns.
The comic strip metaphor has been previously
used by the same authors in QueryMarvel [30].
Their study concluded that comic strips are easy for users to learn and understand. Compared to traditional (form-based) ways of querying, users were more effectively able to translate queries into a comic-based representation and extract the accurate meaning.
The interface is divided into two panels. The
upper is the comic strip editor, where users compose their queries, and the other is the results panel
where answers are displayed (figure 2.15).
A comic strip consists of a series of panels laid out along an invisible time line. Each panel can
represent one event.
The temporal events supported by VizPattern are:
1. Event B happens after event A - represented as panel A followed by panel B.
2. Event A and B happen at the same time - A and B are represented in the same panel.
3. Event A happens after B with specified time interval - a text specifying the time is shown on the
upper part of the panel.
4. Event happens at a specific time - a clock on the upper right corner of a panel is shown.
5. Event happens in an absolute time range - represented using a combination of the above methods.
They also provide the means to manipulate results in order to refine queries: users can edit undesired results to form queries that return the desired results.
Although VizPattern focuses on medical data, it can be extended to many other areas.
2.2.4 TaxiVis
Figure 2.16: TaxiVis workspace
TaxiVis operates on an important urban data set:
taxi trips. It has a geographical and temporal fo-
cus in addition to multiple variables associated
with each trip.
It proposes a new visual query model that
supports complex spatio-temporal queries over
origin-destination data. Users formulate queries
visually by interacting with maps and other visual
representations. They can iteratively refine their
queries through direct manipulation of the results.
Temporal constraints are defined using a wid-
get (figure 2.16 A).
Spatial constraints are specified by polygons
and arrows on the map. The user can also link
two regions to form a directional constraint (figure
2.16 B C). Maps are also used to display query results.
The basic visual representation of the results is a point cloud: each trip is represented by a pair of
points denoting the pickup and dropoff locations. For a small number of trips this gives quick insight;
however, as the number of trips increases, the view gets cluttered very quickly. A set of alternative
visualizations is provided to the user, including an adaptive level-of-detail strategy to reduce the number
of rendered points and a heat map visualization to show the distribution of pickups/dropoffs in an area.
2.2.5 GeoDec
One of the main design goals of GeoDec is the provision of an immersive environment that enables
users to interact with GeoDec and perform a wide range of spatiotemporal queries intuitively [31] [32].
The visualization interface for GeoDec allows users to navigate and interactively query the 3D envi-
ronment in real-time.
The main elements of the GUI are as follows (figure 2.17): 1) 3D rendering of the geolocation with
superimposed interactive visualizations of query results; 2) query creation panel with available data
types, query types and spatial and temporal bounds; 3) query result layers panel where issued queries
are presented and 4) an advanced time-line that enables users to pan and zoom in time and define time
ranges.
One of the most important features of GeoDec is its querying capability. The formulation of queries
involves determining five building blocks, namely Data Types, Query Type, Spatial Bounds, Temporal
Bounds, and optionally Query Scheduling. When creating a query, users select each of the building
blocks in turn.
First, the Data Type is determined: the object(s) or information (or any combination thereof) the user
is interested in (e.g., road networks, buildings).
Next, the Query Type is specified, which can be a Fetch (Range) Query, Nearest Neighbor Query,
Shortest Path or Visibility Query.
Third, the user formulates the Spatial Bound; four modes are supported: 1) Circular bound; 2) Two
Points bound; 3) Rectangular bound and 4) Trajectory bound.
Fourth, the Temporal Bound of the query is set as the From Time and To Time, meaning all the
returned objects should have a valid lifetime overlapping this temporal bound.
Figure 2.17: The user interface of GeoDec detailing its important elements: 1) rendering area, 2) query
creation panel, 3) query results panel and 4) temporal navigation. 5) and 6) show magnified renderings
of 3) and 4) respectively.
2.2.6 Time Automaton - a visual mechanism for temporal querying
Figure 2.18: Spatial visualization of bus stops’ activ-
ity in a weekend
Time Automaton [33] is a visual temporal querying
mechanism that is capable of formulating differ-
ent types of temporal queries, including complex
ones. It is especially useful for queries involving
sequential patterns such as "every second Monday"
or "every other Monday in every 4th month in
every year".
The logic of the Time Automaton model is in-
spired by finite-state automata. A query is rep-
resented as a graph defining how an input string,
built from temporal data, is processed.
The mentioned string is a sequence of words
with temporal meaning, in which some of the
words refer to temporal markers and others to temporal data (facts).
In figure 2.18 a resulting query visualization is shown. The query is on the upper part of the image,
and the spatial result on the lower part. This query shows the bus stops' activity during a weekend.
2.2.7 Discussion
Table 2.2 shows a summary of the most important aspects regarding queries in the works studied above.
The first works analyzed in this section are considered merely from a historical perspective, since
more recent works use similar techniques in a much more developed way.
The first row specifies if the main objective of the work is to query spatial information, while the
second row specifies if it is to query temporal information. The third row, which is recurrency, refers to
whether or not that work supports recurrent queries. The personal semantics row specifies if queries
include any personal semantic mechanism, and the last row states which kind of metaphor is used when
building the visual query.
There are two types of scope in the analyzed works: Spatial and Temporal. The first one refers to
works that only allow us to query spatial features, either by specifying boundaries or selecting a zone
on a map. The second type focuses on querying temporal information, either by making use of a
timeline, some widget or some other metaphor.
Some works may make use of both temporal and spatial features, but only allow the querying of one
type. This is the case of 3), which allows the querying of temporal data, with the resulting information
shown with the help of a 2D map.
                     1) VizPattern  2) TaxiVis        3) Time Automaton  4) GeoDec        5) TQ:AGLA  6) ST-TrajVis
Spatial Scope        no             yes               no                 yes              no          yes
Temporal Scope       yes            yes               yes                yes              yes         yes
Recurrency           no             no                yes                no               no          no
Personal Semantics   no             no                no                 no               no          no
Metaphors            comic strip    map interaction/  finite automata    spatial bounds/  timeline    spatial bounds/
                                    time widget                          time widget      intervals   time widget

Table 2.2: Specifying Queries Summary

Besides these two types of scope, we also focus on the support of recurrent queries and the possi-
bility to support queries with personal semantics.
Only 3) allows something very useful for our work - the possibility to have recurrent queries. That is, a
user might want to know things like ”show all the places near a shopping mall where I went more than
three times”.
None of the studied works has personal semantics in its query system. Our work is closely related
to personal information because we need to allow the user to include places/times that have a
personal semantics (e.g., my house, lunch time).
2.3 Conceptual Frameworks
With the enhancement of spatio-temporal capabilities of GIS systems, many frameworks were developed
with the aim of providing an abstraction while manipulating spatial and temporal movement data.
The first two works analyzed cover basic topological relations. The first is an early, yet still relevant, work
regarding topological relationships in time.
The second aims to review all the possible spatial relationships.
2.3.1 Towards a general theory of action and time
Figure 2.19: Temporal topological relationships (im-
age from [34], original idea from [27])
Topological relationships define relative locations
along a timeline. It is useful to be able to express
relationships between events viewed as knots or
singularities along the timeline. These relation-
ships are shown on figure 2.19 [27] [34]. Note
that each of the seven basic relative interval rela-
tionships has an inverse.
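These interval relationships can be made concrete with a small classifier. The sketch below is ours (not from [27] or [34]): it returns one of the seven basic relations for a pair of intervals, with the inverse relations obtained by swapping the arguments.

```python
def allen_relation(a, b):
    """Classify two closed intervals a=(a0, a1) and b=(b0, b1) into one of
    the seven basic temporal relations; swap the arguments for inverses."""
    (a0, a1), (b0, b1) = a, b
    if a1 < b0:
        return "before"            # a ends before b starts
    if a1 == b0:
        return "meets"             # a ends exactly when b starts
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 == b0 and a1 < b1:
        return "starts"            # same start, a ends first
    if a1 == b1 and a0 > b0:
        return "finishes"          # same end, a starts later
    if a0 > b0 and a1 < b1:
        return "during"            # a lies fully inside b
    if a0 < b0 < a1 < b1:
        return "overlaps"          # a starts first, they overlap
    return "inverse"               # covered by swapping a and b

print(allen_relation((9, 12), (12, 14)))   # meets
print(allen_relation((10, 11), (9, 13)))   # during
```

The only relations relevant to personal mobility data, as discussed later, are the first two: a person cannot occupy two intervals that overlap.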
2.3.2 Topological relationships be-
tween complex spatial objects
From a database and GIS perspective, the development of spatial relationships has been motivated by
the need for formally defined predicates as filter conditions for queries. The resulting large number of
predicates in this paper makes them difficult for the user to handle. To overcome that, they were divided
into five groups [35]:
• relationships between two complex lines
• relationships between two complex regions
• relationships between two complex points and lines
• relationships between a complex point and a region
• relationships between complex lines and a complex region
Figure 2.20: 4 of the 82 topological relationships be-
tween two complex lines
In the scope of our work, the most important
relation is the first one, regarding complex lines (fig-
ure 2.20). This work highlights the dif-
ference between endpoints and paths, an essen-
tial concept in our work. Endpoints represent the
start/end of a track, and a path represents the way
between the start and end points.
2.3.3 Triad Framework
In a time when GIS systems were geared towards the representation and analysis of situations frozen
in time, J. Peuquet [34] suggested a framework that unified temporal and location aspects.
The framework organizes information related to where (location-based view), what (object-based view)
and when (time-based view).
The framework permits the user to ask the following questions:
1. when + where ⇒ what Which objects are present at a given location in a given time.
2. when + what ⇒ where The location(s) occupied by an object (or a set of objects) at a given time.
3. where + what ⇒ when The time (or set of times) that an object (or set of objects) occupied a given
location (or set of locations).
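A minimal sketch of these three question types, assuming movement data is stored as (what, where, when) triples; the record layout and function names below are ours, for illustration only:

```python
# Hypothetical movement records as (what, where, when) triples.
records = [
    ("user", "home", "2015-06-15 08:00"),
    ("user", "IST",  "2015-06-15 10:00"),
    ("bus",  "IST",  "2015-06-15 10:00"),
]

def what_at(where, when):    # when + where -> what
    return {o for (o, l, t) in records if l == where and t == when}

def where_at(what, when):    # when + what -> where
    return {l for (o, l, t) in records if o == what and t == when}

def when_at(what, where):    # where + what -> when
    return {t for (o, l, t) in records if o == what and l == where}

print(what_at("IST", "2015-06-15 10:00"))   # {'user', 'bus'}
```

Each query fixes two of the triad's components and returns the third, which is exactly the structure of Peuquet's three questions.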
2.3.4 Conceptual Framework and Taxonomy of Techniques for Analyzing Move-
ment
At the base of this framework are the three fundamental constituents of movement: space, time and
objects. Movement does not exist without any one of these, and no other constituent is essential
[6].
The framework also exhaustively encompasses all possible linkages between the three constituents.
It includes characteristics of objects in terms of locations and times, characteristics of locations in terms
of objects and times and characteristics of times in terms of objects and locations.
They divide the visualizations into those for non-aggregated and aggregated movement data.
Non-aggregated data allows for 14 different visualizations based on maps, time graphs, temporal bar
charts, map sequences, space-time cubes and space-time graphs.
Aggregated data allows for 17 different visualization techniques using maps, temporal display, chart
map (map with embedded charts), time graph, space-time graph, space-time cube, map sequence,
transition matrix, flow map, flow map sequence, sequence of transition matrices, transition matrices with
embedded charts, among others.
2.3.5 Visual Analytics for Analysis of Movement Data
This framework considers generic tasks that arise in the analysis of movement data [36]:
1. Data pre-processing: Enrich the data with additional fields; filter sequences corresponding to
absence of movement; etc.
2. Extraction of significant places: usually the time spent in a place indicates its significance. To
interpret these places and recognize whether they are significant, an analyst may overlay
them on a map and look for objects situated nearby.
3. Extraction of trips: a sequence of GPS records needs to be partitioned into sub-sequences
corresponding to trips.
4. Examination of trips
(a) Viewing individual trips: Individual trips can only be viewed using an interactive time filter
(figure 2.21), otherwise overlapping trips would be represented.
(b) Clustering of trips: Trips may be similar in a variety of ways, either by coinciding fully or
partially in space, by having similar shapes, common start and/or end points, etc. It is useful
to have a tool that allows the user to choose a similarity function to group related trips.
(c) Summarization of trips: Representing trips by lines does not give an accurate measure of
frequency. A method that allows representing multiple trips in a generalized and summarized
way consists of drawing arrows (which show the movement directions) whose thickness is
proportional to the number of moves.
Figure 2.21: The interface of the interactive time fil-
ter
This interface for temporal querying (figure
2.21) is designed for interval querying only. Re-
current expressions such as "every Monday" can-
not be formulated. Another limitation is the
timeline: it does not allow displaying very large
time intervals. Time properties (days, months,
etc.) are also not represented, meaning that inter-
val queries such as "from Monday to Thursday" cannot be formulated either.
2.3.6 STNexus
STNexus is a framework that supports space-time database and information visualization capabilities
[37].
It is designed around a modular architecture in order to maximize flexibility and allow a wide range
of possible scenarios.
It supports analysis of a combination of different types of data - maps, images, statistics - and raw
data about people and objects in space and time. By analyzing and inferring the data, it is possible to
assemble evidence and assess potential threats in a military and security context.
The prototype consists of a database, visualization and knowledge acquisition components.
In the scope of this work, the main focus is on the database, query and visualization components.
• Database - uses Secondo14, an extendable open source database engine.
• Queries - uses GeoTools15, an open source library that provides an abstraction layer between data
storage (spatial database) and usage (visualization).
• Visualization - uses GeoVISTA Studio16, which provides a suite of components for quickly building
geovisualization applications.
2.3.7 Discussion
In 2.3.1 we reference one of the best-known works specifying temporal relations: it allows
for seven different ones. As our work aims to allow a user to query his/her personal data, many of the
temporal relations studied are not valid in this context because, obviously, a person cannot be in two
places at the same time - the only important relations for us are X before Y and X meets Y.
The two main concepts of our solution are spatial and temporal queries. Personal information is
not considered a query type, because it will be included in the spatial and temporal strands. It is also
important to allow recurrency in our system - people want to ask things like ”show all the places near a
shopping mall where I went more than three times”.
Regarding the spatial scope, a pattern that emerges from the analysis of 2.3.2 is the difference
between endpoints and paths. Both concepts are very different: the endpoints are what is
commonly referred to as the start and end of the trip, while the path is the way that connects the start
to the end.
Our solution also relies on this approach: while formulating the spatial part of a query, it is important
to distinguish the endpoints from the path because there are attributes that an endpoint can have and a
path cannot.
14http://dna.fernuni-hagen.de/secondo/ accessed: 08-10-2014
15http://www.geotools.org/ accessed: 08-10-2014
16http://www.geovistastudio.psu.edu/ accessed: 08-10-2014
2.4 List of requirements
Here we present the requirements of our work, based on the previously analyzed works and gathered
information.
• Visual query system - In section 2.1 we analyzed a variety of works relying on visual query
interfaces. Our work will also rely on them, with the main purpose of hiding query complexity from
the user and making the user more at ease with the interface. Below we describe the query
types our system will support.
• Temporal query - As seen in the reviewed works, and as Andrienko et al. [8] state, more attention
should be given to time and to users. Geo-visual tools should be more temporal and should be de-
veloped to be usable by different types of users, not only those who possess advanced computer
competences. We will need to introduce time as part of our solution in order to specify the temporal
span, that is, the duration of the analyzed interval. Queries should have enough expressiveness
to determine which routes correspond to the time specified by the user.
• Spatial query - Supported queries will also have, as a natural consequence of manipulating GPS
data, a spatial scope. As explained in 2.3.2, spatial queries can be divided into two categories:
endpoints - locations that represent the start/end of a route - and paths - the routes used
between the start and final locations. Spatial queries should have enough expressiveness to trans-
late the user's description into a real place on the map.
• Recurrent query - As introduced by 2.1.3, we can also consider a special type of query: recurrent
ones. They emerge from the need to ask things like ”Where did I go about three times, in the first
week of November?”.
• Semantics query - One of the goals specified in the beginning of this report is the support for
personal semantics. A user should be able to specify the times when he was in a location that
makes sense to him (home, son's kindergarten, etc.), and to use that information in
the query system.
• Scalability - Dealing with personal data is a challenge for many reasons. One of them is data
size. Despite it being possible to reduce the number of points in a GPS track while keeping accuracy,
the magnitude of a few months of data collection is still very large. In order to keep the system stable
and smooth, our back-end implementation must be capable of handling large amounts of data
while providing good response times.
• Data Quality - As referenced in 2.1.8, GPS data collection brings some problems related to accu-
racy. Our work will have to manipulate and treat that data in order to remove errors, spikes in the
paths and cold start/end artifacts, and to reduce the size of the files by removing useless points.
• Privacy - Our solution will run on a local scope. Each user will have a copy of the software and run
it on a personal computer - no data will be uploaded or stored online. This decision ensures that
no personal data is compromised and gives no reason for a user to stop tracking his data because
of privacy issues.
Chapter 3
Data Collection
As explained before, Where Have I Been's main objective is to find a simple way to let users visually
explore their personal mobility data. In order to test it, we developed an application that uses a visual
specification. To understand whether we fulfilled our objectives, we had to collect personal mobility data
from two different sources, both requiring some degree of user intervention. The first is GPS data
collected by the user with the help of any capable device and then stored in GPX files. The second
is user annotations about the tracks: tracks are annotated with personal semantic context in order to
give meaning to the locations the user has been to.
Although the goal of our research is not centered on data collection, for evaluation purposes we
will need to test our work with test datasets containing personal mobility information. Therefore, in this
chapter we will study the various data collection types and discuss how we can improve this data.
Mobility datasets have many potential applications; for example, recent works have tried to understand
human behavior through the investigation of mobility data [18].
Independently of the purpose, mobility data collection has several inherent problems that can jeop-
ardize data quality or even tempt users to abandon the process. First, we will go through which
data users need to collect in section 3.1. The most common errors and problems with GPS data and
semantic annotation are described in section 3.2. Then, in section 3.3, we describe a set of
algorithms used to minimize errors in the collected data.
3.1 Which data to collect?
Our application will consume two different data sources. The first is GPS data collected by a device and
stored in files. The second is semantic data about the places visited by the user.
3.1.1 GPS tracks
Our main goal is to explore a users’ personal mobility data: for that purpose the user must collect GPS
data on a daily basis. Data collection is independent of the device as long as it produces some file
output, which will be used as input for our application.
Nowadays, many mobile applications (for every mobile operating system) have emerged as a solution for
GPS track recording. These applications have an important feature in common: they allow exporting
the recorded information to files - and, most importantly, they all support a specific format:
GPX. GPX, or GPS Exchange Format1, is an XML schema designed as a common GPS data format for
software applications, and we have chosen it as the input type for GPS data in our application.
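Reading the trackpoints out of a GPX file takes only a few lines with a standard XML parser. A sketch (the function name is ours; it assumes the GPX 1.1 namespace):

```python
import xml.etree.ElementTree as ET

GPX_NS = "{http://www.topografix.com/GPX/1/1}"  # GPX 1.1 namespace

def read_trackpoints(path):
    """Return (lat, lon, time) tuples for every <trkpt> in a GPX file."""
    root = ET.parse(path).getroot()
    points = []
    for trkpt in root.iter(GPX_NS + "trkpt"):
        time_el = trkpt.find(GPX_NS + "time")
        points.append((float(trkpt.get("lat")),
                       float(trkpt.get("lon")),
                       time_el.text if time_el is not None else None))
    return points
```

A GPX track nests points as trk → trkseg → trkpt, so iterating over all trkpt elements flattens the segments into one point list, which is sufficient for the processing described in this chapter.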
Before proceeding, there is an important clarification to make: the difference between tracks and
trips. A track is an object generated by some software, consisting of location data - in our work
we will consider tracks to simply be GPX files. A trip, on the other hand, is a concept related to user
behavior: a user makes a trip when he travels from one place to another, regardless of the
traversed distance. Therefore, there is no one-to-one relation between tracks and trips. Users may be
tracking their location and forget to turn off the software, thus generating a track that corresponds
to more than one trip. Likewise, trips may be divided into several tracks because, for example, the user
accidentally stopped recording the GPS signal.
Ideally, the user would turn on GPS data collection when moving from one place to another and turn it
off when arriving somewhere. This behavior is ideal for two reasons. First, batteries don't
last forever, and therefore reducing tracking to the essential minimum spares some battery. Second,
tracking location indoors does not work - the GPS signal cannot reach the device - so it only makes
sense to record tracks while moving outdoors. This behavior generates a single GPX file for every
period a user was moving, which is a better result than having a whole file for a single day. However,
as we will see in section 3.2, it introduces some problems. For example, a user might forget to turn off
the signal when not moving, which will feed false information to our application (we will lose the
one-to-one relation between trips and tracks). Furthermore, today's devices are not capable of
using the GPS sensor during a full day without recharging their batteries.
3.1.2 Semantic meaning and annotations
It is also important to assign personal meaning to the data collected by GPS sensors. This step
requires some degree of user interaction, for only the user knows which places and tracks have a special
personal meaning. To illustrate this idea, we could assign My home to a specific coordinate; from then
on, every time the user wanted to search for his home he would refer to it as such, instead of using direct
coordinates.
To address this question we propose a format (figure 3.1), inspired by an existing one used by some
supporters of lifelogging. The purpose of the format is to assign semantic meaning to a period
of time - in the case of this application, to assign a name to a time interval when the user was indoors.
A line starting with -- represents a new day. A date must follow, in the format yyyy_mm_dd. The
following lines represent intervals of time when the user was not moving - meaning he was at some
location with personal significance and semantics. Each of these lines consists of a range of time and
a location name. Time is represented as HHMM-HHMM, in which the first time is the arrival time and the
second the departure time. After the times, separated by a colon, the name of the place should be
written. After the day's entries, another new day is expected, as this format consists of a sequence of
days.

1http://www.topografix.com/gpx.asp accessed: 14-11-2015
This format also takes into account timezone changes. Timezones can be specified by entering,
alone in a line, something in the notation UTC(+/-)value, such as UTC-10 or UTC+1. All entries below
will be considered in said timezone until another line stating a different timezone appears. The default
timezone (when there are no timezone lines) is the user’s home timezone.
This is the main idea behind the format. There are, however, a few cases that require it to be extended.
The first is related to traveling underground: when a user enters a subway system he cannot
record GPS data due to the lack of signal. When generating the semantic file, there would be an interval
in which the user was supposedly indoors (and not moving) while he was actually traveling underground.
If we kept the format as explained before, there would be no correct way to annotate that entry. To
accommodate this exception, if the name contains an arrow -> we consider the entry a travel instead of
a place (for instance, 1234-1250:saldanha->baixa-chiado).
The second extension concerns travels that imply a timezone change. If a user travels between
two places with different timezones he could use the UTC tag, but doing so would imply that
both places were in the same timezone. What is expected is to have the origin in one timezone and
the destination in another. To accomplish this, the user can specify @UTC-10 before the travel, so that
only the destination hour is affected.
As we can see in figure 3.1, the user started the 15th of June at Home, then went to IST-Alameda,
Mcdonald's and INESC-ID, and at 18h11m made a subway trip from Saldanha to Oriente (notice the arrow
->). On the next day the user went to the airport at 10h12m and took a flight to Paris (notice the @UTC -
every hour following it, except the departure, is considered in the specified timezone). On the following
day, the user went to the train station to catch the train from Paris to Athens. Because that trip was
fully recorded, there is no need to use the arrow or the at sign. All the entries below the UTC line will be
in that timezone.
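A parser for this format can be sketched in a few lines. The function name and tuple layout below are ours, and the @UTC destination-only exception is left out for brevity:

```python
import re

def parse_semantic_file(text):
    """Parse the semantic context format: '--' date lines, 'UTC(+/-)n'
    timezone lines and 'HHMM-HHMM:name' entries, where an arrow in the
    name marks a travel. Returns (date, start, end, name, kind, tz)."""
    entries, date, tz = [], None, "home"    # 'home' = default timezone
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("--"):                   # new day
            date = line[2:].strip()
        elif re.fullmatch(r"UTC[+-]\d+", line):     # timezone change
            tz = line
        else:                                       # HHMM-HHMM:name
            times, name = line.split(":", 1)
            start, end = times.split("-", 1)
            kind = "travel" if "->" in name else "place"
            entries.append((date, start, end, name, kind, tz))
    return entries

sample = """-- 2015_06_15
0000-0836:Home
1811-1830:saldanha->oriente
UTC+1
2030-2359:Hotel
"""
for entry in parse_semantic_file(sample):
    print(entry)
```

Because the timezone is carried forward line by line, every entry after a UTC line is tagged with the new timezone, matching the rule described above.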
This file (eventually a complete interface in future projects) will be generated every time one or more
GPX files are inserted in the system, and gives information about time periods. The time intervals in a
semantic context file are, in practice, the complement of the time intervals for which there are recorded
GPS tracks. The user then has to record the location he was at during each specified time. After editing
the file, the system will update the information, reflecting the changes. After a few editing sessions, the
generated file will no longer come empty of place names - our system will understand and eventually
suggest places based on previous locations and times.
Figure 3.1: Proposal for the semantic context file
3.2 Real data problems
Dealing with real-world data is not an easy task: recording the same route from A to B will generate
a different output each time, because the collected data is subject to GPS reception
problems, among others. Below we analyze the most common problems regarding GPS track
collection and semantic annotation.
3.2.1 Problems related to GPS tracks
The following issues are based on known GPS problems either technological or generated by the user.
• Dataset size: By keeping a constant record of everyday life, a user can easily generate hundreds
of megabytes and thousands of files a year. Since we want this tool to allow searching over a user's
entire journey, we need to keep its data volume at a reasonable size.
• GPS Accuracy: GPS positional accuracy errors are the most common type of error, causing mis-
representation of the actual route taken. They also cause problems like the wandering effect (tangles),
an illusion of excessive position wandering while in a stationary position, and can
generate spikes and inconsistencies, usually when there is some physical obstacle between the
satellites and the receiver (figure 3.2 - first three images).
• Forgotten Start/End: Another very common error with GPS-based data collection is forgetting to
turn the data collection on/off, which in turn leads to a wrong detection of trip ends. Put
simply: a GPX file might contain not a single trip, but several. More generally, there is no direct
linkage between files and trips. Our application will have to analyze a GPX track and know when
to join tracks that belong to the same trip and when to keep unrelated tracks apart.
• Loss of signal: Due to loss of signal, some routes may be missing trackpoints, which will affect
the way we perceive the route. There is a need to refill those empty stretches of data with something
close to what actually happened. A simple example of this happens when a user enters a tunnel
(figure 3.2 - last image).
• Battery Requirement: GPS-based data collection has a very high battery requirement. Although
both hardware and software have drastically evolved, the requirement is still very high.
This can cause users to abandon data collection, or to gather lower quality (reducing the sampling
frequency to save battery) or incomplete data (battery totally drained in the middle of a track).
Figure 3.2: Common GPS problems
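One way to recover trip ends from the Forgotten Start/End problem above is to split a file wherever the gap between consecutive samples is large (signal lost, recording paused, or the user stationary indoors). A sketch of this idea, with an assumed 5-minute threshold that is ours, not a value from the thesis:

```python
def split_into_trips(points, max_gap_s=300):
    """Split (t_seconds, lat, lon) samples into trips wherever the time
    gap between consecutive samples exceeds max_gap_s (assumed 5 min)."""
    trips, current = [], []
    for p in points:
        if current and p[0] - current[-1][0] > max_gap_s:
            trips.append(current)       # gap found: close current trip
            current = []
        current.append(p)
    if current:
        trips.append(current)
    return trips

samples = [(0, 38.70, -9.10), (60, 38.71, -9.11),
           (4000, 38.80, -9.20), (4060, 38.81, -9.21)]
print(len(split_into_trips(samples)))   # 2
```

A complementary pass would detect long stationary stretches (many samples within a small radius) to handle the case where the device kept recording while the user was not moving.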
3.2.2 Problems related to semantic meaning and annotations
The following are problems that occur due to user intervention during the annotation of tracks.
• Multiple meanings: Users directly manipulating data can lead to flaws. In this particular case, a
user might have several names for the same place (e.g., Mom's house, Mommy's place, home) and,
despite all being the same place, our program needs the intelligence to understand
that.
• Forgetting something: Users who want to annotate some data (e.g., add an interval where they
were that was not recorded by the GPS device) might provide dubious data since, most likely, they
do not remember the precise hours.
• Keeping up: Places may change their names and names may change their place. That is, a
restaurant may change its name or move to a different location several times during a lifetime.
Similarly, people change address several times - thus ’Mom’s home’ may have several locations
during a lifetime.
3.2.3 Common to both
These are problems that exist both in GPS data collection and in track annotation.
• Privacy: Users sometimes opt to abandon a data collection methodology due to privacy concerns.
Users are not very keen on making their daily mobility information publicly available.
• User's Burden: Mobility data collection is more than just recording tracks. Besides collecting
tracks, there is the need to process the raw data by cleaning, correcting and completing it with more
contextual data, such as trip purposes or trip ends. This extra work can be very burdensome depending
on the methodology and technology used, which in turn can lead to a bigger gap between the data
collection and the processing taking place, ultimately resulting in lower quality data.
3.3 Solving real data problems
In the previous sections we detailed problems that arise from dealing with real data. Below we will
address these problems and find solutions to minimize them.
3.3.1 Dataset size
In order to keep the number of necessary files and the dataset size low, we must process the
collected data and apply some simplification. It is important to note that we want data simplification,
not visual simplification - that is, we want the path to remain the same without losing any significant data.
To reduce the number of points we tested two different algorithms: the Ramer-Douglas-Peucker algorithm2
2http://en.wikipedia.org/wiki/Ramer%E2%80%93Douglas%E2%80%93Peucker_algorithm accessed: 14-12-2014
(RDP) and the Visvalingam-Whyatt polyline simplification algorithm3. We also tested D3 simplification, but
it is not useful for us, since it is a visual simplification that does not affect the stored data, only
the visual outcome in the front-end.
Figure 3.3: Comparison between an original gpx track (red) and one with RDP (green)
We compared the algorithms in terms of resulting points (figure 3.4): for a similar coefficient, RDP
keeps more points than Visvalingam. The latter, however, takes too long to process the
information: it can take up to 12 seconds (RDP takes less than a second) for a track of this dimension
(2825 trackpoints).
Figure 3.4: Comparison between GPX simplification algorithms
Our final choice was RDP, because it allows fast processing of tracks and keeps enough points
(with enough space between them) so that future statistics will not be lost (figure 3.3).
The comparison can be viewed online4.
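For reference, the standard spatial-only RDP recursion can be sketched as follows. This is a minimal illustration on plain 2D points; the function name and the perpendicular-distance formula are ours, not taken from any particular library.

```python
def rdp(points, epsilon):
    """Standard Ramer-Douglas-Peucker: keep an interior point only if it
    deviates more than epsilon from the line joining first and last points."""
    if len(points) < 3:
        return points
    (x1, y1), (x2, y2) = points[0], points[-1]

    def dist(p):
        # Perpendicular distance of point p to the begin-end line.
        x0, y0 = p
        num = abs((y2 - y1) * x0 - (x2 - x1) * y0 + x2 * y1 - y2 * x1)
        den = ((y2 - y1) ** 2 + (x2 - x1) ** 2) ** 0.5
        return num / den if den else 0.0

    d_max, i_max = max((dist(p), i) for i, p in enumerate(points[1:-1], 1))
    if d_max <= epsilon:
        return [points[0], points[-1]]  # flatten the whole segment
    # Recurse on both halves, sharing the split point.
    return rdp(points[:i_max + 1], epsilon) + rdp(points[i_max:], epsilon)[1:]
```

Note how the recursion keeps only the point of maximum deviation at each level; this is exactly the behavior that, as discussed next, ignores the temporal side.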
This algorithm, however, only considers the spatial shape of a track; it pays no attention to the
temporal information attached to each trackpoint. It is important that the algorithm only deletes a
trackpoint if it is spatially and temporally close to another. Figures 3.5 and 3.6 help us understand
this better. Both figures represent a
3https://hydra.hull.ac.uk/assets/hull:8338/content accessed: 14-12-2014
4http://web.tecnico.ulisboa.pt/~jorge.s.filipe/gpx_comparison/ accessed: 28-12-2014
track from point A to point B, thus both have the same traveled distance. The first figure shows how the
traditional algorithm works: given a set of points, it "thinks" of a line between the first and
last point of the curve and checks which in-between point is farthest away from this line. If that point
(and consequently all other in-between points) is closer than a given distance - defined by us - it removes
all these in-between points. The result is a simpler line, as represented in the lower part of figure 3.5.
For example, if we want to know where we were at 12:15, we can interpolate between the endpoints,
generating a point halfway. Because we were moving at a constant speed, this is correct.
Figure 3.5: Standard RDP problem explanation 1
Considering the situation in figure 3.6, we can see that the user was stuck in a traffic jam (more GPS
samples in the same stretch of the path). If we applied the traditional algorithm to this track, we would
lose information and would not be able to ask where the user was at 12:15. It would provide misleading
information: the user would appear to be ahead in the path, when he was actually behind, because of traffic.
In this case, halfway does not correspond to the right place for 12:15, because we were not moving
at a constant speed. Thus, we must ensure that only areas of relatively constant speed are removed.
Figure 3.6: Standard RDP problem explanation 2
To solve this problem we modified the algorithm5 so it would take into account the speed of the user
(and therefore the temporal dimension). Our modification to the algorithm can be seen in listing 3.1.
Listing 3.1: Tkrajina implementation of RDP with temporal modification

prevSpeed = -100000

def simplify_polyline(points, max_distance, max_time):
    # Does Ramer-Douglas-Peucker algorithm for simplification of polyline

    if len(points) < 3:
        return points

    begin, end = points[0], points[-1]

    a, b, c = get_line_equation_coefficients(begin, end)
    t = end.time - begin.time

    tmp_max_distance = -1000000
    tmp_max_distance_position = None

    for point_no in range(len(points[1:-1])):
        point = points[point_no]
        d = abs(a * point.latitude + b * point.longitude + c)

        if d > tmp_max_distance:
            tmp_max_distance = d
            tmp_max_distance_position = point_no

    v = length_2d([end, begin]) / t.seconds  # m/s
    global prevSpeed
    if abs(prevSpeed - v) < 0.5:
        return (simplify_polyline(points[:tmp_max_distance_position + 2],
                                  max_distance, max_time) +
                simplify_polyline(points[tmp_max_distance_position + 1:],
                                  max_distance, max_time)[1:])
    prevSpeed = v

    real_max_distance, real_time = \
        distance_from_line(points[tmp_max_distance_position], begin, end)

    if real_max_distance < max_distance or real_time < timedelta(seconds=max_time):
        return [begin, end]

    return (simplify_polyline(points[:tmp_max_distance_position + 2],
                              max_distance, max_time) +
            simplify_polyline(points[tmp_max_distance_position + 1:],
                              max_distance, max_time)[1:])

5https://github.com/tkrajina/gpxpy accessed: 15-08-2015
Our modification is available at GitHub6.
After applying the RDP algorithm, we perform another simplification intended to remove points that
are not within a minimum distance of each other. This algorithm simply sweeps the whole track again and
removes points that are too close, in order to reduce the dataset size. This minimum separation between
points is specified in meters.
6https://github.com/jmsfilipe/where-have-i-been-lib
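This second pass can be sketched as a single sweep over the track. The sketch below uses names of our own and approximates the geodesic distance with a flat-earth formula, which is adequate at track scale; the project's real implementation may differ.

```python
import math

def enforce_min_separation(points, min_meters):
    """points: list of (lat, lon). Keep the first point, then drop every
    point closer than min_meters to the last point that was kept."""
    if not points:
        return []
    kept = [points[0]]
    for lat, lon in points[1:]:
        klat, klon = kept[-1]
        # Rough planar distance: 1 degree of latitude is ~111320 m.
        dx = (lon - klon) * 111320.0 * math.cos(math.radians(klat))
        dy = (lat - klat) * 111320.0
        if math.hypot(dx, dy) >= min_meters:
            kept.append((lat, lon))
    return kept
```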
3.3.2 GPS Accuracy
There is a need to alleviate certain inconsistencies generated by accuracy problems. The
most common issues are spikes and tangles along the path: a point may fall to the left or right of the
path, giving a mistaken impression of where the path actually was.
In order to mitigate this we will smooth the recorded path. That is, we will make the path visually
uniform, to avoid, for example, spikes and inconsistencies. This will also reduce the number of points
needed to describe the path taken without significantly reducing the information content (figure 3.7).
Despite reducing the number of points, the idea behind this is not at all related to the previous section
(reducing dataset size). Here, reducing points is a side effect of correcting and removing outlying points.
While in the previous section we wanted to perform a data simplification, here we want a visual
simplification.
Various approaches are possible, with varying levels of complexity. A straightforward but inaccurate
approach would be to assume the measurement noise is Gaussian and simply average together clusters of
points, with the difficulty of deciding what constitutes a cluster. A more evolved approach would be to fit
a spline or other parametric curve of a given complexity to the data using least squares, giving significant
control over the level of smoothing.
The approach used here, based on the implementation by Tkrajina7, focuses on two different ideas:
calculating the average distance between points to understand which points are outliers and therefore need
to be removed, and applying a ratio to the other points to achieve a smooth, well-fitting path.
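The two ideas can be sketched as follows. This is a hypothetical simplification with names and thresholds of our own; the actual gpxpy implementation differs in its details.

```python
def smooth_track(points, outlier_factor=3.0, ratio=0.25):
    """Sketch of two-step smoothing on (x, y) points: drop points much
    farther from their predecessor than the track's average step, then
    pull each remaining interior point toward its neighbours' midpoint."""
    def step(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

    if len(points) < 3:
        return list(points)
    steps = [step(points[i], points[i - 1]) for i in range(1, len(points))]
    avg = sum(steps) / len(steps)

    # 1) Outlier removal: a jump far above the average step is a spike.
    cleaned = [points[0]]
    for p in points[1:]:
        if step(p, cleaned[-1]) <= outlier_factor * avg:
            cleaned.append(p)

    # 2) Ratio smoothing: blend each interior point with the midpoint of
    #    its (original) neighbours, keeping the endpoints fixed.
    out = [cleaned[0]]
    for i in range(1, len(cleaned) - 1):
        mx = (cleaned[i - 1][0] + cleaned[i + 1][0]) / 2
        my = (cleaned[i - 1][1] + cleaned[i + 1][1]) / 2
        out.append((cleaned[i][0] * (1 - ratio) + mx * ratio,
                    cleaned[i][1] * (1 - ratio) + my * ratio))
    out.append(cleaned[-1])
    return out
```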
Figure 3.7: Comparison between an original gpx track (left) and one after applying the smoothing algo-
rithm (right)
7https://github.com/tkrajina/gpxpy/ accessed: 12-08-2015
3.3.3 Forgotten Start/End
Our data processing has to assume that there is no one-to-one relation between GPX files and trips: a
file may contain any number of trips. Furthermore, a trip may start in one file and end in a different
one (a user accidentally hits the stop recording button and starts recording again after a while). To achieve
the right behavior, our application has to know when to: join data from two files, in order to form a valid
trip; split data from one file, in order to form valid trips; find and delete irrelevant trips - trips so
short that they were most likely recorded by accident.
We implemented a way to join tracks that represent the same route but, for some reason, were split.
We also take into account the opposite case, in which a single track contains several trips and needs to
be split. GPX files are analyzed and, if the spatial and temporal gap between two different tracks is small
enough, those tracks should be a single one - thus representing one trip. Furthermore, if a file contains
more than one trip - there is a big gap in space and time within it - the track will be divided in two,
each representing a different route.
The algorithm to split tracks works as follows: if there is a large variation (given as a parameter) in
distance or time between two consecutive points, the track is divided in two - the first track ending at
that point, and the second starting at it. The algorithm to join tracks checks temporal and spatial distances
and, if those distances are small enough (given as a parameter), it implies those tracks should be the same
and they are, therefore, combined into one track.
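Both rules can be sketched as follows. This is an illustrative simplification with names of our own: points are (seconds, lat, lon) tuples, and only the temporal gap is tested, whereas the real implementation also checks the spatial distance.

```python
def split_on_gaps(points, max_gap_seconds):
    """Split one recorded track into trips wherever the time between
    two consecutive points exceeds max_gap_seconds."""
    trips, current = [], []
    for pt in points:
        if current and pt[0] - current[-1][0] > max_gap_seconds:
            trips.append(current)  # close the trip at the gap
            current = []
        current.append(pt)
    if current:
        trips.append(current)
    return trips

def join_adjacent(trips, max_gap_seconds):
    """Inverse operation: merge consecutive trips whose end/start points
    are temporally close enough to belong to the same route."""
    merged = [list(trips[0])]
    for trip in trips[1:]:
        if trip[0][0] - merged[-1][-1][0] <= max_gap_seconds:
            merged[-1].extend(trip)
        else:
            merged.append(list(trip))
    return merged
```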
Figure 3.8: Comparison between an original gpx track (left) and one after applying the track2trips and
smoothing algorithm (right). Three tracks can now be seen - the first until the red dot, the second from
the red to the green dot, and the last from the green dot.
3.3.4 Loss of signal
Should a user drive through a tunnel, for instance, GPS devices are not capable of recording such
information. In such cases, our application should try to find out which route was taken by the user (using
map matching tools like Track Matching8, for example) - however, these kinds of tools are still under early
development and do not produce a good percentage of realistic results. This said, in these cases our
application simply calculates a linear interpolation when there is a clear loss of signal. This interpolation
adds evenly spaced points (containing spatially and temporally interpolated information) between the points
where initially there was no information.
8https://mapmatching.3scale.net accessed: 12-08-2015
Figure 3.9: Comparison between an original gpx track (left) and one after applying the interpolation
algorithm (right)
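The gap-filling interpolation can be sketched like this. The sketch is illustrative only: the name is ours, points are (seconds, lat, lon) tuples, and the spacing is fixed by a target interval.

```python
def fill_gap(p1, p2, step_seconds):
    """Insert evenly spaced, linearly interpolated points between p1 and
    p2 (each a (seconds, lat, lon) tuple) covering a loss of signal."""
    t1, lat1, lon1 = p1
    t2, lat2, lon2 = p2
    span = t2 - t1
    n = int(span // step_seconds)      # how many intervals fit in the gap
    out = []
    for i in range(1, n):
        f = (i * step_seconds) / span  # fraction of the gap covered
        out.append((t1 + i * step_seconds,
                    lat1 + f * (lat2 - lat1),
                    lon1 + f * (lon2 - lon1)))
    return out
```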
3.3.5 Battery Requirement
This is the only problem that has no solution in our approach. Collecting GPS data every day is battery
expensive and there is not much we can do about it. Still, while we were collecting data daily, the
operating system's battery usage report stated that, on average, 30% of the battery consumption was due to
GPS tracking. With 70% of the battery left, there is still room to use the smartphone for other usual tasks.
We understand, however, that developments in battery technology are needed, so that batteries last longer
and users are encouraged to track their mobility data.
3.3.6 Multiple meanings
Multiple meanings are considered a data quality issue. There is no definitive solution for this, but there
are several things we can do to mitigate it. By providing a good annotation interface, a
user could be offered suggestions while writing a location name and then, instead of creating another
name representing the same thing, he would choose one that already exists. Furthermore, the application
could offer a find-duplicates feature in which places that are very close and have similar names would
be listed for the user to verify. Our application provides neither of these features, as our
annotation interface is very rudimentary and intended to be perfected in future projects.
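A find-duplicates feature of this kind could be sketched as follows. This is purely hypothetical - our application does not implement it - and all names and thresholds are ours: places that are both spatially close and textually similar are flagged for the user to review.

```python
import difflib
import math

def find_duplicate_candidates(places, max_meters=50, min_similarity=0.8):
    """places: list of (name, lat, lon). Return name pairs that are both
    spatially close and textually similar, for the user to verify."""
    def meters(a, b):
        # Rough planar distance between two (name, lat, lon) entries.
        dx = (a[2] - b[2]) * 111320.0 * math.cos(math.radians(a[1]))
        dy = (a[1] - b[1]) * 111320.0
        return math.hypot(dx, dy)

    pairs = []
    for i in range(len(places)):
        for j in range(i + 1, len(places)):
            a, b = places[i], places[j]
            name_sim = difflib.SequenceMatcher(
                None, a[0].lower(), b[0].lower()).ratio()
            if meters(a, b) <= max_meters and name_sim >= min_similarity:
                pairs.append((a[0], b[0]))
    return pairs
```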
3.3.7 Forgetting something
It is also considered a data quality problem. Since it depends on user interaction, there is no way to
fully fix it. There are, however, a few things that can be done to mitigate the problem. One of
them is finding patterns - if a user usually goes to a certain place on a certain day, it would be
natural to suggest it, helping the user complete the annotation faster and without corrupting the
information. As before, our application does not have support for patterns - it is left as
future work.
3.3.8 Keeping up
Shops, restaurants, parks and supermarkets may change name with some frequency. In case a location
changes its name, all previously stored locations referring to the same place should reflect that naming
change. The same situation happens when a friend moves to a new home: despite still being called, for
instance, John's house, its location is now another. Solving this implies providing a mechanism to associate
a location with a history of names, and a name with a history of locations.
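One possible mechanism is a bidirectional history, sketched below as plain dictionaries. This is a hypothetical design, not part of our implementation; all names are ours.

```python
class PlaceHistory:
    """Associate each name with a history of locations and each location
    with a history of names, so renames and moves stay queryable."""

    def __init__(self):
        self.by_name = {}      # name -> [(since_ts, (lat, lon)), ...]
        self.by_location = {}  # (lat, lon) -> [(since_ts, name), ...]

    def record(self, name, location, since_ts):
        self.by_name.setdefault(name, []).append((since_ts, location))
        self.by_location.setdefault(location, []).append((since_ts, name))

    def location_at(self, name, ts):
        """Where 'name' pointed at time ts (latest entry not after ts)."""
        entries = [e for e in self.by_name.get(name, []) if e[0] <= ts]
        return max(entries)[1] if entries else None
```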
3.3.9 Privacy
Our application does not rely on any external service and, thus, no data is sent to the internet. All of the
GPX files are stored on the user's hard drive and the application runs in a local scope.
3.3.10 User’s Burden
As said before, this problem is divided in two different areas: recording the tracks and annotating that
information. It is possible to make data annotation a more pleasant and faster job if we offer a good
interface which makes suggestions based on previous behavior. Our algorithm is capable of detecting
patterns and suggesting places that were visited before around the same hour. When a user fills in the
semantic information, our application stores, in a cache file, data about the specified locations (time,
name, coordinates). Each time users open the annotation interface to insert more data, our algorithm
processes all the cache data and, based on the most common relations between hours and places, suggests
the locations they were likely to have been at on the recently uploaded tracks.
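The suggestion step can be sketched as a frequency count over the cached (hour, place) pairs. This is a simplified illustration with names of our own; the real cache stores more fields than shown.

```python
from collections import Counter

def suggest_place(cache, hour):
    """cache: list of (hour, place_name) pairs built from previous
    annotations. Return the place most often visited at this hour,
    or None when there is no history for it."""
    counts = Counter(place for h, place in cache if h == hour)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```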
Chapter 4
Visual Queries
Now that we have data to feed our application, we need a way to explore that data. Below we analyze
the necessary expressiveness for a visual language and explain how to represent that language.
4.1 Expressiveness
Our first input will be loose GPX tracks, which by themselves have little usefulness for querying personal
spatio-temporal data. It is essential to systematize how to handle these tracks, so we can find relevant
information to query.
In table 4.1 it is possible to see a summary, based on the analysis of all the previous works, of how
to systematize personal data regarding space and time.
Three ways in which a user might query their data are addressed: Accuracy, Relativity and Con-
creteness. Each of these concepts can be applied to four different areas: Temporal, Recurrence, Spa-
tial Endpoint and Spatial Path. This was designed to give a fairly complete coverage of everything that
may evoke reasonable questions in this context.
Accuracy refers to whether the user wants to refer to something precise or wider. In the case
of time, a user could ask for an accurate time (9:00) or something more vague (around 12:00); in the
case of space, if it is an endpoint, a user might want to refer to a precise place (e.g. exactly at home)
or a range around a place (no more than 10 m from home); if it is a path, a user might want to refer to
a precise route (accurate - an actual GPS route) or a route within some limits (fuzzy - e.g. a route that
doesn't deviate 100 m from another route). When applying recurrence, a query is accurate if the user
wants to know something like "where I went exactly 3 times", or fuzzy if the user asks "where I went
around 3 times".
Relativity refers to comparisons between things. In the case of time, a relative time is one relative
to another event (one hour after ...) and an absolute time is the same as a concrete or accurate one
(9:00). In the case of space, if considering an endpoint, we can be relating a place to another (e.g. a
place near the shopping mall) or referring to an exact place (e.g. at home); if considering a path, something
like a route that doesn't deviate more than 100 m, overall, from route X (or endpoint X) is relative, and a
particular route (e.g. a route similar to an actual GPS route) is absolute. In relativity, recurrence is easy
to understand: a user might want to compare the number of times he went to certain places - "went
more times to X than to Y".
Concreteness defines the target. In the case of time, it can be an abstract date, such as "mom's
birthday", or a concrete time like "9:00". In the case of space, if considering an endpoint, we might give a
rough description of a place (e.g. a generic shopping mall) or its absolute position (e.g. mom's house). If
considering the path, an abstract path is a class of possible routes (e.g. highway) and a concrete one refers
to a particular route (e.g. highway A5). In the case of recurrence, we believe there is no such thing as an
"abstract" recurrence. A concrete recurrence is the same as an absolute or accurate one.
This table will serve as a basis for our query mechanism, because it specifies the expressiveness we
want the system to have. Thus, it specifies all the possible query types our system will have to support.
In section 4.4 we cover individual cases of the expression table and show how they can be translated to
our visual language.
Accuracy
• Fuzzy
– Temporal: a time interval (ex: five minutes around 12:00)
– Recurrence: some amount of times (ex: went around 3 times to ...)
– Spatial Endpoint: a range (ex: no more than 10 meters from home)
– Spatial Path: the same route, within some limits (ex: all routes taking the same roads from A to B, where "the same route" means they don't deviate, overall, more than 100 m from each other)
• Accurate
– Temporal: a specific time (ex: at 9:00)
– Recurrence: specific amount of times (ex: went exactly 3 times to ...)
– Spatial Endpoint: a specific location or distance (ex: at lat 37.543, lon: -7.432432; at Home; exactly 100 m from Home)
– Spatial Path: a particular route (ex: all routes similar to an actual GPS route)
Relativity
• Relative
– Temporal: time relative to another event (ex: one hour after ...)
– Recurrence: comparison (ex: went more times to X than to Y)
– Spatial Endpoint: places in relation to other ones (ex: a place near Oeiras Parque)
– Spatial Path: a route within a certain distance from a baseline, or with similar features (ex: a route that doesn't deviate more than 100 m, overall, from route X - or endpoint X)
• Absolute
– Temporal: concrete times / intervals (ex: at 9:00)
– Recurrence: specific amount of times (ex: went exactly X times to ...)
– Spatial Endpoint: an exact place (ex: at Home)
– Spatial Path: a particular route (ex: all routes similar to an actual GPS route)
Concreteness
• Abstract
– Temporal: a "class of times" (time description) (ex: start of a football game; christmas day)
– Recurrence: -
– Spatial Endpoint: a class of places or rough description (ex: a shopping mall)
– Spatial Path: a class of possible routes (ex: using a highway)
• Concrete
– Temporal: a concrete time* (ex: at 9:00)
– Recurrence: specific amount of times (ex: went exactly 3 times to ...)
– Spatial Endpoint: a concrete place (ex: mom's house, lat 37.432, lon: -7.5432)
– Spatial Path: a concrete route (ex: using highway A5)
* A time can be concrete and fuzzy ("around 12:00"), concrete and accurate ("12:00"), abstract and fuzzy ("around dinner time") or abstract and concrete ("at dinner time"). Concreteness defines the target (with more or less semantics for the user); fuzziness is how "tight" around that target the limits are.
Table 4.1: Spatio-Temporal classification for personal geolocation data
4.2 Visual language
In order for users to make queries we developed a visual language allowing them to search for their
personal spatial data.
As explained in section 4.1, our language will have to support three different concepts - time, space
(in two ways: place and path) and recurrence. Each of these concepts will be divided in different degrees
- accuracy, relativity and concreteness.
Our language metaphor is a timeline. Users can sketch their query in this timeline with two different
types of objects: paused time and movement time. As a guiding principle, everything related to time
will be drawn on the horizontal axis and everything related to space will be drawn on the vertical axis.
Another important consideration is that larger areas imply a larger value (either of time or space).
Routes cannot exist on their own: they must have a start point and a destination point. Thus, every
time two paused times are created, a route (movement time) connecting the first to the second is
automatically added.
Figure 4.1: Where Have I Been query area
Paused time (represented in figure 4.1 as a blue rectangle) reflects a period of time when the user
was not moving - usually means the user was at some place: home, restaurant, school, etc.
Movement time (represented in figure 4.1 as a gray line between the rectangles) is a period of time
when the user was moving. There is always a line between two rectangles.
To support each different concept (time, space, recurrence), each object has fields that can be filled
in. Every parameter is optional - if the user doesn't fill in a value, it will not be considered.
• Temporal concept language specification
Below we explain how the language was formulated to support the time expressiveness repre-
sented in table 4.1.
– Accuracy support A time is either accurate or fuzzy. To represent an accurate time a user
must fill in a time box, located at the left and right sides of a paused time (figure 4.2). By just
selecting a time, the user will be referring to precisely that time. However, a user may want
to know something like "Where was I after 17h?" - in that case the mathematical symbols (>,
<, ≥, ≤) should be used. To represent a fuzzy time, a user must first specify an accurate time
and then specify the desired range (figure 4.3). Times are shared between different objects;
for example, in figure 4.1 the first paused time shares its ending time with the route object
(which uses it as its start time). Instead of specifying a start/end time, a user might just want
to specify the duration of the route/stay. In this case no conflicting information can be provided: that is,
the duration cannot be longer than the specified (if any) start/end times allow (figure 4.4).
– Relativity support Relativity only makes sense when there is more than one object. To
achieve a relative specification a user has to specify a duration to the route connecting two
locations. For example, if the user wants to know where he was one hour after being at home,
he should draw two rectangles and specify the duration of one hour to the route connecting
both rectangles.
– Concreteness support An abstract time (a class of times) is specified in the same box as
start/end time - but as soon as the user types, a dropdown of existing abstract times is shown
for the user to select.
Figure 4.2: Editing a start time
Figure 4.3: Editing a temporal range
Figure 4.4: Editing a duration
• Recurrence concept language specification
Below we explain how the language was formulated to support the recurrency expressiveness
represented in table 4.1.
– Accuracy support Recurrence is represented by an arch above a stay. It is created by
selecting the rectangle and then the icon representing the arch. Recurrence is applied daily -
that is, results will show how many times you went to the specified place during a specific
day. As with time, recurrence can also be fuzzy. Selecting fuzziness follows the same principle
as in time, using mathematical symbols (>, <, ≥, ≤) - see figure 4.5.
Figure 4.5: Applying recurrence
• Spatial concept language specification
Below we explain how the language was formulated to support the spatial expressiveness repre-
sented in table 4.1.
– Endpoints The spatial concept is divided in Endpoints and Paths. Endpoints are actual locations,
either named by the user or represented by geographical coordinates.
∗ Accuracy support To specify an accurate location, a user can either write a location
name in the middle of the rectangle or select an existing location from a dropdown menu.
If a user wants to specify a coordinate instead of a name, he must click on the box and
then on the desired location on the map. As explained before, space is represented
vertically. So, in order to specify a fuzzy location (that is, a location with some range)
a user can vertically drag the upper and lower borders of the rectangle (or just input a
number, in meters, in the box on the upper right corner), see figure 4.6.
∗ Relativity support To specify a place in relation to another, a user can specify a location
(either by name or coordinate) and then specify a range in which the wanted location is.
∗ Concreteness support As with times, locations can also be grouped in classes. The
process to choose a class of places is similar. While writing in the location box, categories
will also appear on the dropdown menu.
Figure 4.6: Editing a spatial range
– Paths A path is the actual route taken by the user to get from one point to another.
∗ Accuracy support To specify an accurate route for comparison, a user can upload a GPX
file containing that route (see figure 4.7). To specify a fuzzy route, the user also needs
to upload a file, but then has to vertically drag the line - following the principle that larger
areas mean larger values - to specify the desired fuzziness, in meters.
∗ Relativity support Relativity in this case means using a location (coordinate) as a pass-
ing point of the route. To specify it, a user must click on the route box and pick a
coordinate from the map. Doing so will automatically add a range (defined in the settings).
∗ Concreteness support As with times and endpoints, paths can also be grouped into
classes. In order to select a class, a user needs to start typing in the route box - as usual,
a suggestion will appear on a dropdown, allowing the user to choose the desired route.
Figure 4.7: Adding a comparison route
4.3 Visual language examples
In this section we will consider a few entries of table 4.1 and show how they can be translated into visual
queries.
However, some of the cells are not implemented in this solution, mainly due to schedule constraints.
Temporal fuzziness: in this first example, the user states a starting time - 12 o’clock more or less 5
minutes - and says he was at home (figure 4.8).
Figure 4.8: Visual query demonstrating range of time
Temporal accuracy: in this example, the user specifies a start time (12 hours) and the location
(Home) (figure 4.9).
Figure 4.9: Visual query demonstrating absolute time
Temporal relativity: here the user is asking for the time when he left home and after one hour was
at School - specifying time relatively to a previous event (figure 4.10).
Figure 4.10: Visual query demonstrating time relative to another event
Spatial (endpoint) fuzziness: in this fourth example the user wants to know when he was within
100 meters from Home (figure 4.11).
Figure 4.11: Visual query to specify spatial fuzziness
Spatial (endpoint) accuracy: in this example the user wants to know when he was exactly 100
meters from Home (figure 4.12).
Figure 4.12: Visual query demonstrating spatial accuracy
Spatial (endpoint) relativity: in this example (figure 4.13) we refer to relative spatial end-
points. We implemented a simplified version of this concept: we can mark a coordinate on the map or
refer to a known location and then specify a spatial range - much like in figure 4.11.
Figure 4.13: Places in relation to other ones - simplified version
Spatial (endpoint) absoluteness: in the seventh example we refer to precise known locations already
registered by the user (figure 4.14).
Figure 4.14: An exact place
Spatial (path) relativity: here we state we want to know all the routes that passed near a given
coordinate (the default is a 200 meter range) (figure 4.15).
Figure 4.15: A route within a certain distance from a baseline
The two following examples show more complex queries, that represent the full potential of the visual
language and illustrate more realistic situations. In the first situation (figure 4.16) the user can find when
he was at least 1 hour at home close to noon.
Figure 4.16: Visual query demonstrating ending times
In the last situation (figure 4.17), users want to know every time they went to a commercial space on
the way from home to IST Alameda, but have not been there more than 30 minutes.
Figure 4.17: Visual query demonstrating duration constraints and class of places
Some cells in the table were not (yet) implemented in this solution (grey cells in table 4.2). Below is
a brief explanation.
Regarding the temporal side, we haven't implemented abstract times (classes of times), as that would
imply even more annotation from the user.
Recurrence was discarded from implementation because of schedule constraints.
Regarding the spatial scope, no class of places was implemented (we do have categories that aggregate
locations but, as of now, they only serve the purpose of coloring the results). Similarly, we do not
support named routes (such as highway).
4.4 Visual language validation
This user test was intended to evaluate the understandability of the visual query mechanism. Here we
will present a few conclusions and the changes thereby made regarding the first test results that served
as a query validation method.
Our universe of testers was composed of a group of four people who demonstrated interest in
using this kind of tool and had some experience using technological gadgets (such as smartphones,
tablets and computers), both female and male, aged between 20 and 26 years old.
Firstly, they were briefly introduced to the system's interface, and the main goals and essential
mechanics were explained. Then each user was handed a copy of the test protocol (appendix A.1) and
could consult it to clarify any doubts.
Users were asked to perform tasks speaking out loud about their doubts on how to proceed, so that
we could gather some feedback. Notes about user behavior and decisions while using the interface
were taken, in order to complement the validation. The number of errors users made while attempting
to perform each task was also recorded.
Tasks were designed to cover all implemented cells of table 4.1, from temporal concreteness and
fuzziness to spatial concreteness and fuzziness, searches with durations, searches with more than one
location, etc. Three tasks focused on the spatial side, four tasks focused on the temporal side and one
task was dedicated to exploring the interface.
To test spatial accuracy, for example, we used task 1 (in appendix A.1), which stated: "Search for
dates where I arrived at ist-alameda at precisely 9h16min".
In each task we counted the number of errors made by users, so we could have some metrics about
the successfulness of the task. Results are presented in table 4.2.
Below is the list of all the tasks presented to users.
1. Search for dates where I arrived at ist alameda at precisely 9h16min.
2. Search for dates where I was at ist alameda from around 10h20min to 12h20min and then went
somewhere else
3. Search for when I was at ist alameda and then went to ist taguspark
4. Search for where I was at ist alameda, then took route A5 to ist taguspark
5. Search when I was at ist alameda for more than 2 hours
6. Search when I spent less than 1 hour between ist alameda and ist taguspark
7. Search when I was in a 500 meter range from ist alameda
8. Search all the days I was at casa
To validate, we used our implementation so far. It was not just a comprehension test - we wanted to
know if users could really make the queries. Thus we needed to use something concrete for users to
complete the tasks.
Table 4.2: Average number of errors made by task - each cell is a task. Gray cells were not implemented
in this solution. (* this cell expected behavior is the same as another existing one)
Analyzing table 4.2 we can see that tasks 2 and 4 presented a higher number of errors. Task 2 specifies
"Search for dates where I was at ist-alameda from around 10h20min to 12h20min and then went
somewhere else". Its complex nature justifies the higher number of errors: a user must first click on the
blue arrow on the side of the rectangle, which may not be an obvious thing to do. Task 4 states "Search
for where I was at ist-alameda, then took route A5 to ist-taguspark". Because we do not support classes
of routes, to successfully finish this task a user must select a point belonging to the A5 on the map.
We also derived a list of suggested changes - divided in what we actually implemented and what we
consider future work.
Implemented issues:
• Replace all the names in aggregation results with "click to expand" instead of overlapping location
names
• Change the way to erase input. Instead of having a cross in the widget popup, show a cross in the
input box
• Add to settings the time at which results become aggregated
• Change map representation (theme) or change route line color for more contrast
Future work issues:
• Result comparison: select two queries and compare both of them
• Pop-up tutorial video
Generally, the results of this validation were positive: most people understood the main principles of the visual language. Although some tasks were more error prone, the majority of tasks were executed without difficulty, revealing that the visual encoding behind the language specification is understandable and easy to use.
Chapter 5
Where Have I Been
Here we present our approach: first by explaining the scenario in which it will be used, and then by describing all the components of our implementation.
Our typical user carries a smartphone every day - the brand does not matter, as long as it has a GPS receiver and enough battery. Each time the user moves from point A to point B, whether walking, driving or otherwise, he/she records the path with the smartphone. A very simple Android application for this is My Tracks1: it allows recording to be started and stopped with a single click, without having to type anything or go through complicated menus.
The main idea is that, each time a user moves from one place to another, a different track should be recorded. This means that if, for example, a user goes from home to a coffee-shop, spends five minutes drinking coffee, and then goes to school, the result is two different tracks: home → coffee-shop and coffee-shop → school.
As stated before, we support personal semantics for the locations a user is familiar with. For this to happen, the user has to record which place he/she was at during a certain hour - with this mechanism (which is intelligent enough to understand patterns and suggest routes that happened in similar conditions) we are able to categorize locations in a personally meaningful way.
We adopted a client-server approach in order to improve scalability. The main functions of our server
are to clean GPX files, store data in a spatial database and process the queries.
As explained before, our main objective is to be agnostic to the source of the data, but it is also important to provide quality data so that the queries and visualizations are optimal. Thus, each GPX file representing a track is processed by a library (GPX Lib) that, among other things, cleans the data, removes error points and smooths the trajectory. All the problems related to GPX files explained in section 3.3 are solved at this stage.
All our server code is written in Python. This choice was made because Python has great community support and many publicly available libraries. We provide this API on GitHub2: the library is not specific to this work, and we think it could be used by others to remove errors from GPX tracks.
After being processed, track data is stored in a PostgreSQL + PostGIS database. A common relational database would not suffice, because we are handling a particular data type: geographical data. While relational databases are designed for types like integers, dates, strings and currency, we need a database that can conveniently handle points, edges and polygons (areas). Several extensions to well-known database systems provide spatial support. We chose the PostgreSQL extension PostGIS3 because it is considered a mature system and it supports points, linestrings and polygons. Furthermore, it follows the SQL specification of the Open Geospatial Consortium4.
1https://play.google.com/store/apps/details?id=com.google.android.maps.mytracks accessed: 20-09-2015
2https://github.com/jmsfilipe/where-have-i-been-backend accessed: 15-08-2015
Communication between the client (browser) and the server is done using WebSockets. WebSockets were chosen over Ajax requests because they allow the server to push data to the browser without the browser having to request it, while the browser can still send data to the server. This is quite useful for notifications and updates: the server can send them as soon as they are available, instead of waiting for the browser to poll for them.
The front-end exchanges both queries and results with the server as JSON messages; in particular, messages that contain spatial data are encoded in GeoJSON5. We chose JSON because it is a common standard for transmitting data between a server and a web application. Furthermore, having a specific standard for spatial structures is also useful and can benefit this solution in the future, by easing integration with other projects. Another advantage of JSON is the large number of tools that already support it.
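To make the encoding concrete, the following minimal sketch builds a GeoJSON Feature for a track and serializes it with the standard library. The exact message fields the system exchanges are not specified here, so the single description property and the sample coordinates are illustrative assumptions:

```python
import json

def track_to_geojson(points, description):
    """Encode one recorded track as a GeoJSON Feature. A minimal sketch:
    the actual messages exchanged by the system may carry extra fields."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "LineString",
            # GeoJSON mandates [longitude, latitude] coordinate order
            "coordinates": [[lon, lat] for lat, lon in points],
        },
        "properties": {"description": description},
    }

# (lat, lon) pairs as they might come out of a cleaned GPX track
track = [(38.7369, -9.1387), (38.7372, -9.1390)]
message = json.dumps(track_to_geojson(track, "casa -> ist-alameda"))
```

Because the payload is plain JSON, it can be consumed directly by the browser-side libraries mentioned below.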
To implement the query and result viewing mechanisms we used JavaScript libraries such as vis.js6, moment.js7, list.js8, jQuery UI9, ClockPicker10, the Semantic UI sidebar11, the Google Maps JavaScript API12 and some Bootstrap13 features.
Figure 5.1: Solution schema
3http://postgis.net/ accessed: 14-12-2014
4http://en.wikipedia.org/wiki/Simple_Features accessed: 23-12-2014
5http://geojson.org/geojson-spec.html accessed: 12-08-2015
6http://visjs.org/docs/timeline/ accessed: 15-08-2015
7http://momentjs.com/ accessed: 15-08-2015
8http://www.listjs.com/ accessed: 15-08-2015
9https://jqueryui.com/ accessed: 15-08-2015
10http://weareoutman.github.io/clockpicker/ accessed: 15-08-2015
11http://semantic-ui.com/ accessed: 15-08-2015
12https://developers.google.com/maps/documentation/javascript/ accessed: 15-08-2015
13http://getbootstrap.com/css/ accessed: 15-08-2015
5.1 GPX Library
In chapter 3 we described our GPX manipulation library; here we explain how we use it. The first step in manipulating GPX files is to analyze each file and detect whether it corresponds to a single trip, to several trips, or to just part of a trip. After analyzing all the files, the algorithm merges and splits them so that the final output is one file per trip. All files are saved following a convention - the date on which the route took place followed by the number of the track within that day (e.g., 2014-10-09-part0.gpx). With all the necessary files ready, we start by minimizing GPS accuracy problems with the smoothing algorithm. As previously explained in section 3.3, this algorithm removes inconsistencies, spikes and tangles, resulting in the same number of files, but without visual problems. The next algorithm applied to the files is the modified Ramer-Douglas-Peucker (simplify_polyline). It performs data simplification rather than merely visual simplification (reducing the number of points while maintaining the appearance). The last algorithm (reduce_points) re-checks every track for points that can still be removed because they are either spatially or temporally too close to each other. The full documentation of the GPX Library can be read in appendix C.
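As an illustration of that last pass, the sketch below drops points that are too close, spatially or temporally, to the last point kept. The 5 m / 2 s thresholds and the equirectangular distance approximation are illustrative assumptions, not the actual values used by GPX Lib:

```python
import math

def reduce_points(track, min_meters=5.0, min_seconds=2.0):
    """Drop points that are spatially or temporally too close to the last
    point that was kept. `track` is a list of (lat, lon, t_seconds)."""
    if not track:
        return []
    kept = [track[0]]
    for lat, lon, t in track[1:]:
        plat, plon, pt = kept[-1]
        # Equirectangular approximation: accurate enough for points
        # that are at most a few hundred meters apart.
        dx = math.radians(lon - plon) * math.cos(math.radians((lat + plat) / 2))
        dy = math.radians(lat - plat)
        meters = 6371000 * math.hypot(dx, dy)
        # A point survives only if it is far enough in BOTH space and time
        if meters >= min_meters and (t - pt) >= min_seconds:
            kept.append((lat, lon, t))
    return kept

# Two duplicate-ish points collapse onto the first; the distant one survives.
raw = [(38.70, -9.14, 0.0), (38.70, -9.14, 1.0), (38.75, -9.14, 100.0)]
cleaned = reduce_points(raw)
```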
5.2 Semantic Annotation
After being processed, files are ready to be annotated by the user - annotation uses the format specified in section 3.1.2. We provide a rudimentary interface (figure 5.2) for this task; currently it only shows a text editor for the semantic locations file. Future work includes developing a user-friendly interface to correct GPX data and add personal semantics to the tracks.
Figure 5.2: Data collection edition
5.3 Backend
The backend of our solution is composed of three distinct components (figure 5.1): the Database, Query translation and Communication components.
After GPX files are manipulated and annotated, their data is stored in a spatial database. A spatial database is optimized to store and query data representing objects defined in a geometric space. In addition to typical SQL queries such as SELECT statements, spatial databases can perform a wide variety of spatial operations. The following operations, among many others, are specified by the Open Geospatial Consortium standard:
• Spatial Measurements: Computes line length, polygon area, the distance between geometric fig-
ures, etc.
• Spatial Functions: Modify existing features to create new ones (like intersecting features).
• Spatial Predicates: Allows true/false queries about spatial relationships between geometries. Examples include ”do two polygons overlap?” or ”is there a residence located within a mile of this place?”.
• Geometry Constructors: Creates new geometries, usually by specifying the vertices (points or
nodes) which define the shape. Types include Point, Linestring, Polygon, MultiPoint, MultiLinestring,
MultiPolygon and GeometryCollection.
• Observer Functions: Queries which return specific information about a feature, such as the location of the center of a circle.
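To make the first two categories concrete, here is a pure-Python sketch of a distance measurement and a derived ”within X meters” predicate. PostGIS computes such distances on the ellipsoid for geography types; the spherical haversine formula below is a deliberate simplification for illustration:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters on a spherical Earth model --
    the kind of spatial measurement behind a call such as ST_Distance
    (PostGIS geography types use the full ellipsoid instead)."""
    r = 6371008.8  # mean Earth radius, meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def within(lat1, lon1, lat2, lon2, meters):
    """Spatial predicate: true when two points lie closer than `meters`."""
    return haversine_m(lat1, lon1, lat2, lon2) < meters

# Two nearby points (illustrative coordinates near the Alameda campus)
d = haversine_m(38.7369, -9.1387, 38.7372, -9.1390)
```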
There is a large number of spatial reference systems (datums) in use, many of them optimized for one particular part of the world. A datum is a model of the Earth used in mapping: it consists of a series of numbers that define the shape and size of the reference ellipsoid and its orientation in space, chosen to give the best possible fit to the true shape of the Earth. We chose WGS-84 (4326)14 because it is the datum globally used when dealing with GPS coordinates.
In our case, a track will correspond to a Linestring in PostGIS, while an annotated location will corre-
spond to a Point. Tracks will be stored in table linestrings (figure 5.3) while locations are stored in table
places.
Route start and end times are stored in table trips. If a route has a corresponding Linestring (that is,
if the user recorded it), its corresponding data will be referenced in table linestrings. If the trip was made
without being recorded (that is, just annotated), a null entry will appear in table linestrings.
Start and end times of stays (indoor time) are stored in table stays.
Locations are stored in table places and are regularly updated: when a user annotates a location, our algorithm compares the coordinate of that annotation against all previous coordinates with the same name and calculates a new (more accurate) location by removing outliers and averaging the rest.
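A minimal sketch of that refinement step follows. The outlier rule used here (farther than three times the mean deviation from the centroid, in degrees) is an illustrative assumption, as the text does not fix a specific criterion:

```python
from statistics import mean

def refine_location(coords):
    """Recompute a place's representative coordinate from every annotation
    with the same name: drop outliers, then average what remains. The
    outlier cutoff (3x the mean deviation from the centroid) is an
    illustrative choice, not necessarily the one used in the system."""
    clat = mean(la for la, lo in coords)
    clon = mean(lo for la, lo in coords)
    # Manhattan deviation in degrees from the raw centroid
    devs = [abs(la - clat) + abs(lo - clon) for la, lo in coords]
    cutoff = 3 * mean(devs)
    kept = [c for c, dev in zip(coords, devs) if dev <= cutoff]
    return mean(la for la, lo in kept), mean(lo for la, lo in kept)

# Nine consistent annotations of 'casa' plus one GPS glitch far away
coords = [(38.70, -9.14)] * 9 + [(40.00, -8.00)]
lat, lon = refine_location(coords)
```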
14https://nsidc.org/data/atlas/epsg_4326.html accessed: 20-09-2015
Figure 5.3: Database Entity-Relationship diagram
We also load-tested our solution, to determine whether it could deal with a lifetime of geospatial personal data. We used 20835 GPX files - assuming eight files per day, this corresponds to roughly seven years of data. These files totaled 1.7GB. The assumption of eight files per day was derived from a worst-case scenario based on data collected by the author's supervisor over five years (table 5.1).
Table 5.1: Summary of collected data during 5 years
The files were first converted by our GPX library (to split/join, smooth and simplify tracks), resulting in 26505 GPX files totaling 280.8 megabytes - a size reduction of 83%. Processing took 54 minutes and 57 seconds. Despite taking almost an hour, we believe this is an acceptable result, since such an operation is not frequent: a user will typically not process seven years of data at once, but only new data as it arrives, ideally daily, more probably once a week. Applying the semantic processing algorithm (associating hours from the semantic interface) took 8 minutes and 33 seconds, and inserting all the data in the database took 13 minutes and 13 seconds. These tests were run on an Intel Core i5-430M dual-core at 2.26GHz, with 4GB of DDR3-1066 RAM and a Samsung SSD 840 Evo disk.
5.3.1 Query translation
Once all this data is stored, our server is ready to receive queries from the client. When a user enters a query in the interface and presses the search button, a JSON object containing all the fields is sent to the server. Upon receiving this object, the server translates that information into an SQL query. The translation mechanism is quite simple: each part of the query system has a direct translation to SQL, and we just need to compose all the different parts into a single query. Furthermore, different query types have different SQL templates, so generating a full SQL statement means combining all of the above. If a user builds a query based on temporal fuzzyness, we know the main structure of our statement will need a WHERE clause; and because we are dealing with fuzzyness, we will also need a BETWEEN clause. After generating the template, the properties inside the JSON object are converted to the correct types and injected into the SQL statement.
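The composition step can be sketched as follows for a single paused-time element. The helper name, the minutes-since-midnight time representation and the exact clause shapes are assumptions consistent with the schema described in this chapter; a fuzzyness of F minutes around time T becomes BETWEEN T − F/2 AND T + F/2, matching the ±15 min example given for a 30-minute fuzzyness:

```python
def build_stay_clause(fields):
    """Hypothetical sketch of translating one paused-time element of the
    JSON query into SQL over the `stays` table. Times are minutes since
    midnight; values are passed as parameters, never spliced into SQL."""
    where, params = [], []
    if fields.get("location"):
        where.append("description = %s")
        params.append(fields["location"])
    start = fields.get("start_time")
    fuzz = fields.get("fuzzyness", 0)
    if start is not None:
        if fuzz:
            # temporal fuzzyness maps onto a BETWEEN clause
            where.append("start_date BETWEEN %s AND %s")
            params += [start - fuzz // 2, start + fuzz // 2]
        else:
            where.append("start_date = %s")
            params.append(start)
    sql = "SELECT stay_id, start_date, end_date FROM stays"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params

# 10h (600 min) with 30 minutes of fuzzyness -> 9h45 to 10h15
sql, params = build_stay_clause(
    {"location": "ist-alameda", "start_time": 600, "fuzzyness": 30})
```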
For example, the query represented in figure 5.4 results in the SQL code in listing 5.1. The query states ”all the situations in which a user was around 100 meters from ist-alameda and then went to around 100 meters from ist-taguspark”. This query requires information from both the stays and trips tables. The first part of the query (from line 1 to 22) and the last part (from line 28 onwards) both return information from table stays, restricting the location name with the spatial constraint of 100 meters. The second part, from line 23 to 27, returns all trips (because there are no constraints on the trip). These three parts of the query are then joined (INNER JOIN) so that we only have results that match all three.
Figure 5.4: Query illustrating SQL translation
Listing 5.1: SQL translation example
1 SELECT q1.stay_id,
2        q1.start_date,
3        q1.end_date,
4        q2.trip_id,
5        q2.start_date,
6        q2.end_date,
7        q3.stay_id,
8        q3.start_date,
9        q3.end_date
10 FROM (WITH l AS
11          (SELECT point
12           FROM places
13           WHERE description = 'ist-alameda'), k AS
14          (SELECT description
15           FROM places,
16                l
17           WHERE ST_Distance(places.point, l.point) < '100')
18       SELECT stay_id,
19              start_date,
20              end_date
21       FROM k
22       INNER JOIN stays ON description = stays.stay_id) q1
23 INNER JOIN
24   (SELECT DISTINCT trip_id,
25                    start_date,
26                    end_date
27    FROM trips) q2 ON q1.end_date = q2.start_date
28 INNER JOIN (WITH l AS
29          (SELECT point
30           FROM places
31           WHERE description = 'ist-taguspark'), k AS
32          (SELECT description
33           FROM places,
34                l
35           WHERE ST_Distance(places.point, l.point) < '100')
36       SELECT stay_id,
37              start_date,
38              end_date
39       FROM k
40       INNER JOIN stays ON description = stays.stay_id) q3 ON q2.end_date = q3.start_date
After generating the whole SQL statement, the server executes it and processes the resulting information. To improve result readability on the client side, results are first aggregated by similarity and location name: entries having similar times and the same location are aggregated. This process produces aggregated entries and loose entries. Loose entries keep their properties (start time, end time and location), while aggregated entries have as properties a summary of the entries they contain (start and end times divided into quartiles, calculated over all contained results). The practical result is shown in figure 5.5: a summary of all the stays at casa, showing the most relevant (frequent) hours of arrival and departure. After aggregation, the results are sent to the client as a JSON object.
Figure 5.5: Aggregated result with quartiles
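The aggregation step can be sketched as follows. The grouping rule (same location name, start times within the configurable tolerance of the group's first entry) and the minimum group size of four are illustrative assumptions, since the text does not pin the algorithm down exactly:

```python
from statistics import quantiles

def aggregate(entries, tolerance_min=60):
    """Group (location, start_minute) entries that share a location and
    have similar start times; summarize groups of four or more by
    start-time quartiles, and keep the rest as loose entries."""
    groups = []
    for loc, start in sorted(entries):
        for g in groups:
            if g["location"] == loc and abs(start - g["starts"][0]) <= tolerance_min:
                g["starts"].append(start)
                break
        else:
            groups.append({"location": loc, "starts": [start]})
    out = []
    for g in groups:
        if len(g["starts"]) >= 4:
            # three cut points: lower quartile, median, upper quartile
            out.append(("aggregated", g["location"], quantiles(g["starts"], n=4)))
        else:
            out.extend(("loose", g["location"], s) for s in g["starts"])
    return out

# Four similar stays at 'casa' collapse into one summary; 'ist' stays loose
entries = [("casa", 540), ("casa", 555), ("casa", 545), ("casa", 550), ("ist", 600)]
out = aggregate(entries)
```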
5.4 Frontend
In this section we explain how we built our interface based on the previously specified visual language. We start by presenting how to use the interface and then explain some implementation decisions.
5.4.1 User interface
Our interface is divided into four different areas: the Query area, Results area, Map area and Settings area (in that order in figure 5.6). Each of these components has a different purpose. In the Query area the user can specify a query in a simple visual way, as previously explained; a careful analysis of the visual query language was given in section 4.2. Below we review each component of the interface, starting with the Query area.
Figure 5.6: Where Have I Been user interface
Query area
We have two types of sketchable objects: paused time and movement time.
Paused time, as we have seen in chapter 4, is represented by a rectangle and reflects a period of time when the user was not moving - usually meaning the user was at some place: home, a restaurant, school, etc.
Movement time, as seen in chapter 4, is represented as a gray line between the rectangles and reflects a period of time when the user was moving. There is always a line between two rectangles.
Double clicking on the timeline creates a rectangle (paused type) whose start and end times are undefined, while clicking and dragging creates a rectangle whose duration changes while dragging. Every time two paused types are created, a route (movement type) connecting the first to the second location automatically appears.
Figure 5.7: Where Have I Been query area
Each sketchable type has parameters that can be configured. Every parameter is optional - any value the user does not fill in is simply not considered. Below we explain the different parameters and their usage.
Paused type parameters:
• start time (figure 5.8) - A start time is precisely that - the time at which the stay started. It appears on the lower left side of the rectangle and, when clicked, shows a clock widget for the user to select hours and minutes. By just selecting an hour, the user refers to precisely that hour. However, a user may want to ask something like ”Where was I after 17h?” - in that case the mathematical symbols (>, <, ≥, ≤) should be used.
Figure 5.8: Editing a start time
• end time (figure 5.8) - An end time follows the same principle as a start time, except that it is located on the opposite side.
• fuzzyness in start time (figure 5.9, upper left side) - By setting this parameter, in minutes, a user can add some degree of fuzzyness to the start time. For instance, setting a fuzzy time of 30 minutes and a start time of 10h creates a valid time range from 9h45m to 10h15m.
• fuzzyness in end time (figure 5.9, upper right side) - Follows the same principle as the previous parameter, except that it is located on the right side.
Figure 5.9: Editing a temporal range
• duration (figure 5.10) - The duration of the stay. This parameter cannot conflict with the start/end times: if a user specifies a start time of 10h, an end time of 12h and a duration of 3 hours, the query is invalid and produces no results.
Figure 5.10: Editing a duration
• location (figure 5.9, upper middle) - It is possible to specify a location, either selected from a dropdown menu (populated from the annotation process) or picked as a geographic location on the map.
• spatial range (figure 5.11) - Defining a spatial range is useful because sometimes a user might not know exactly where he was; he can define a location and then a range (in meters) around that location that is also valid. It can be changed by vertically dragging the rectangle's border or by writing directly in the box.
Figure 5.11: Editing a spatial range
Movement type parameters:
• route (figure 5.7, middle) - The route parameter allows a user to specify a point on the map through which he/she may have passed. By default, this point accepts a range of 250m around itself. Using this, a user can easily find whether or not he/she drove on a specific road - just pick a coordinate on the map near that road.
• start time (figure 5.8) - There is no need to define a start time for a movement type. Since paused and movement types are intimately related, we can see in figure 5.7 that the indoor's end time will be the movement's start time.
• end time (figure 5.8) - In this case, the movement's end time will be the second indoor's start time.
• fuzzyness in start time (figure 5.9, upper left side) - Fuzzyness works as in the paused type. If someone sets a fuzzy time of 30 minutes and an end time of 10h, this creates a valid range of movement start times from 9h45m to 10h15m.
• fuzzyness in end time (figure 5.9, upper right side) - Fuzzyness at end time follows the same
principle as for start time.
• duration (figure 5.7, middle) - As with the paused type, durations in the movement type cannot conflict with start or end times.
In the Results area, the results that correspond to the query are listed; each entry represents a result that somehow matches the query. In the Map area, the highlighted result is shown on a 2D map. In the Settings area, a user can edit the settings, including assigning categories to places, so that results are shown in the category's color.
After sketching the query, a user can either start the search by pressing the search button, or clean the canvas if a mistake was made (figure 5.12). A user can also delete an individual paused type by clicking on the red cross icon (figure 5.13). It only makes sense to have an odd number of elements in the search area - that is, it makes no sense to have a paused type followed only by a movement type leading nowhere. This decision has several implications: a user can only create paused types, and all movement types are automatically added between paused types; likewise, a user can only delete paused types, and the associated movement types are deleted with them.
Figure 5.12: Where Have I Been search area
A user might also define a date to search, on the left side (figure 5.12). If no date is provided, the search is considered global (that is, it searches all recorded days), which is represented as ”–/–/—-”.
Figure 5.13: Removing a query element
Because visual queries can grow quickly (for example, if a user sketches several locations), all that information must fit on the screen. Just shrinking the sketch to fit is not an option, since the overall size would become too small. So, we decided to allow panning and zooming in all queries: panning is done by clicking on the query and horizontally dragging the mouse, while zooming uses the scroll wheel.
Inserting elements in the middle of an existing query is also possible and useful (if a user forgets to add an indoor type, for instance): one can drag the mouse between two indoor times (in the movement time zone) to add another one between the existing two.
Results area
Upon clicking the search button in the search area, our backend translates the query and sends the results to the client. In the results area (figure 5.6 - 2), however, not all results are shown at once. For performance reasons - since we plan to allow viewing an entire lifetime of information - more results are drawn only when the user scrolls down (in batches of 10). Our current implementation aggregates results on the server side, just before sending them to the client. When dealing with a large dataset, the complex nature of our aggregation algorithm causes a significant performance drawback: results take too long to be shown on the client. All data corresponding to the results of the query is stored in the client's memory; even though more results are drawn only when the user scrolls down, they all sit in browser memory and take too long to process.
Figure 5.14: Results reuse and lock options
To avoid this situation in large datasets, a future approach would be to simplify the aggregation algorithm so that it can be expressed in SQL. Results would then be aggregated in the database, and only a fraction of them sent when the user presses search; the following results would be requested as the user scrolls down, with a pagination mechanism on the server side.
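Such a mechanism, which is not part of the current implementation, could look like the following sketch; the function names are illustrative, and the in-memory helper only stands in for what the database would return:

```python
def paged_query(base_sql, page, page_size=10):
    """Build the SQL for one page of results (pages numbered from 0).
    `base_sql` must carry a deterministic ORDER BY for stable pages."""
    return f"{base_sql} LIMIT {page_size} OFFSET {page * page_size}"

def fetch_page(all_rows, page, page_size=10):
    """In-memory stand-in for what the database would return for one
    page, used here only to illustrate the scrolling behavior."""
    start = page * page_size
    return all_rows[start:start + page_size]

# The client asks for page 2 as the user scrolls down
sql = paged_query("SELECT stay_id FROM stays ORDER BY start_date", 2)
rows = fetch_page(list(range(25)), 2)
```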
Aggregated results are shown as a small summary of their content, with a stronger gradient where the user was more often. They are represented as overlapping semi-transparent rectangles, showing the durations at different locations. When an aggregated result is clicked it expands, showing its aggregated content (figure 5.15).
Users also have the option of reusing a result as a new query (magnifying glass icon in figure 5.14) - doing so cleans the results and copies the location and the start and end dates of the result to the query area, ready to be refined and searched again.
Results can also be panned and zoomed. By default, panning and zooming apply to all the results shown - that is, all results are panned/zoomed at the same time. This decision was made to allow comparison between results. If a user wants to inspect a result individually, he/she can pan and zoom it after clicking on the lock icon (figure 5.14).
Figure 5.15: Results panel
Map area
When clicking on a specific result (not a summary result), the combination of locations and tracks is shown on the map (figure 5.6 - 3). When selecting (clicking on) an aggregated result, the first entry of the aggregation is shown on the map; as a future feature, we plan to show a summary of the contained trips instead of just the first. A user can also highlight a specific location or track by clicking exactly on it after clicking on the entry; the map automatically zooms to show the full track. If there is no data (tracks or locations) associated with the result, no information is shown on the map. The map is also integrated in the search area: a user can pick any point on the map and use it in a query.
Settings area
The settings interface (figure 5.16) appears on the left side of the screen when a user presses the button represented in figure 5.6 - 4. There are two different areas in the settings. The Categories area allows the user to create categories - that is, names representing sets of places - and assign a color to each. In the Places area, a user can assign a category (and thus a color) to each of the recorded locations. This affects the way results are colored: all results belonging to a category appear in the color defined by the user. It is also possible to import and export both category and place relations from/to a CSV file.
Figure 5.16: Where Have I Been settings interface
5.4.2 Implementation
Where Have I Been's frontend was constructed entirely in pure JavaScript, without any framework. The only library used to facilitate development was Browserify15, which allows JavaScript code to be divided into modules and produces a single minimized file with all the necessary code.
Generic timeline considerations
Our implementation of the visual language was based on the vis.js16 timeline module. Vis.js data items can take place at a single date, or have start and end dates (a range). It is possible to freely move and zoom in the timeline by dragging and scrolling, and items can be created, edited and deleted. The time scale on the axis is adjusted automatically and supports scales ranging from milliseconds to years. The timeline is rendered with regular HTML DOM elements, which allows flexible customization using CSS styling. However, our visual language required more than what the library offered. Necessary interventions included: changing the way timelines were represented, so that they were vertically centered on an axis; adding a representation for movement time; support for text, date, time, duration and spatial range inputs; the option to apply temporal fuzziness to start and end times; the ability to add paused time in the middle of a movement time; and the option to vertically and horizontally expand the timeline.
15http://browserify.org/ accessed: 12-10-2015
16http://visjs.org/docs/timeline/index.html accessed: 12-10-2015
The first challenge (vertically centering the timeline and adding an axis) involved deleting unnecessary DOM components provided by the original timeline and adding new DOM elements to center our timeline and to represent a thin horizontal line. The next challenge involved creating classes for each of our types: paused and movement. Each instance of these classes has the necessary methods to draw its representation according to the data received as a parameter. These classes also take care of the dynamics behind each <input> box, which is explained in detail in the next section.
Text input
Several important decisions were made across the interface regarding text input. <input> boxes are absolutely positioned in CSS relative to the rectangle, so that, when panning, they follow its movement. We chose to include widgets for time and duration selection. The time selection widget was based on ClockPicker17. It uses a clock metaphor and allows the user to easily select an hour/minute. Because in our solution time may be relative (more than two hours, less than two hours), we appended a <div> to the widget that lets the user choose between the options (close, >, <, =, ≥, ≤) - figure 5.17 - 1.
Figure 5.17: Interface input widgets: 1 - the time selector; 2 - duration selector; 3 - spatial constraint
selector
17https://weareoutman.github.io/clockpicker/ accessed: 12-10-2015
When picking a duration, we chose to implement a picker that allows hours and minutes to be selected individually: when minutes exceed 59, an hour is incremented. This widget also includes the previously mentioned mathematical operators in the same way - an appended <div>. When dealing with spatial
constraints, there is also the need to specify whether we mean more than a value or less than a value - so we chose to use a single widget, as represented in figure 5.17 - 3. Using any of the widgets affects the input box being edited: if an operator is chosen, it appears in front of the value in the input box. When the user wants to select a specific date to constrain the query, a standard calendar picker is shown.
Dealing with fuzziness
To edit fuzzy time, an arrow must first be clicked in order for the fuzziness interface to appear (figure 5.18).
Figure 5.18: Fuzziness interface. Left side: the user clicked and activated the interface. Right side:
arrow for the user to click and activate the interface.
Fuzzy time can be edited either by filling in the <input> or by horizontally dragging the line. This line is very small (only 1px high), so we decided to enlarge the clickable area to a 10px square at each endpoint of the line. When the line is dragged horizontally, a logarithmic scale is applied: as the user starts moving the mouse, values increase slowly and, as the user keeps moving, they increase ever faster. This behavior is ideal because it allows a user to fine-tune a value when moving slowly and to jump to higher values when moving further.
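This slow-then-fast mapping can be sketched as an exponential function of drag distance (the inverse view of the value axis is logarithmic, hence the name). The growth base and the cap below are illustrative tuning constants, not the exact values used in the interface:

```python
def drag_to_minutes(pixels, base=1.05, max_minutes=240):
    """Map horizontal drag distance (in pixels) to a fuzzyness value in
    minutes. Small drags barely change the value; longer drags make it
    grow increasingly fast, up to an illustrative cap."""
    if pixels <= 0:
        return 0
    # Exponential growth, shifted so that zero drag yields zero minutes
    return min(max_minutes, round(base ** pixels - 1))
```

With base 1.05, the first dozen pixels change the value by only a minute or two, while a long drag quickly reaches the cap.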
Gesture support
To support mouse events we used HammerJS18, which facilitates integration in complex situations such as dragging and panning. To allow dragging of paused time, we created small areas on the left and right sides of the rectangle, where the user can drag in order to increase the duration. The same happens with spatial constraints: a user can vertically drag the rectangle in order to expand it. The top and bottom areas of the rectangle were enlarged so that the user can easily click and expand it; previously, in the library, those areas were so small that the user had to aim precisely at the line and succeeded only with some luck.
Results
Results are also based on the same principle as the main query: they have paused time and movement
time, but with only the start/end time and location name. As previously explained, results can be aggre-
gated. If we look at figure 5.19, we can see results are not aggregated. One immediate consequence of
showing results like this would be presenting a huge list of results to the user - showing a lot of results
¹⁸ http://hammerjs.github.io/ accessed: 12-10-2015
in which the user might not be interested. To solve this, we decided to aggregate similar results. In the
figure we can see that some location names are repeated (such as 'casa', 'ist-taguspark' or 'intermarche').
Figure 5.19: Disaggregated results
Aggregation is calculated on the server side. The main rule of thumb is to aggregate entries with
common names, and then check whether there are temporal relations between them (that is, whether
their times fall within a common range, which can be specified in the settings). If a set of results
matches those properties, they are aggregated. Other, unrelated results are shown as normal entries.
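A simplified sketch of this rule follows. Field names, the data shape and the tolerance default are assumptions for illustration; the real server-side logic may differ:

```javascript
// Group result entries by location name, then aggregate a group only when
// its start times all fall within a common tolerance (in minutes).
function aggregateResults(entries, toleranceMin = 60) {
  const byName = new Map();
  for (const e of entries) {
    if (!byName.has(e.name)) byName.set(e.name, []);
    byName.get(e.name).push(e);
  }
  const output = [];
  for (const [name, group] of byName) {
    const starts = group.map(e => e.startMin);
    const close = Math.max(...starts) - Math.min(...starts) <= toleranceMin;
    if (group.length > 1 && close) {
      output.push({ name, aggregated: true, entries: group });
    } else {
      output.push(...group.map(e => ({ ...e, aggregated: false })));
    }
  }
  return output;
}
```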
The main advantage of aggregating results is that they occupy less space. Aggregated results are
shown as a single entry (figure 5.20 - gray entry) that has to be clicked to be expanded. Once
expanded, a user can close it by clicking on it again.
Another advantage is providing insight into the user's behavior. Aggregated results are represented
as a summary of the contained results. We show a maximum of four overlapping rectangles (figure 5.20
- gray entry) that correspond to the quartiles explained in section 5.3.1. Each rectangle is
semi-transparent, so that where they overlap they appear darker, conveying higher frequency.
Furthermore, we also show start/end times. These times can overlap and lead to a confusing experience.
To avoid this, we hide part of the text when components overlap. To view the whole content, a user
can either zoom in, or hover and read the tooltip. When different location names in an aggregation
overlap, we show "expand" instead of all the names, reducing the clutter presented
to the user.
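The quartile summary driving those rectangles can be sketched as below. This uses plain linear interpolation between sorted values; the exact quartile computation from section 5.3.1 is not reproduced here, so treat this as an illustrative variant:

```javascript
// Summarize a set of times by their five-number quartile boundaries
// (min, Q1, median, Q3, max), from which up to four overlapping
// semi-transparent rectangles can be drawn.
function quartiles(values) {
  const v = [...values].sort((a, b) => a - b);
  const q = p => {
    const idx = p * (v.length - 1);
    const lo = Math.floor(idx), hi = Math.ceil(idx);
    return v[lo] + (v[hi] - v[lo]) * (idx - lo); // linear interpolation
  };
  return [q(0), q(0.25), q(0.5), q(0.75), q(1)];
}
```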
Figure 5.20 shows how results were aggregated. The first entry (with a gray background)
represents an aggregation. As can be seen, the different location names were replaced by an "expand"
placeholder. In this case, results were aggregated by end time: every ist-taguspark entry ends
between 17:00 and 18:00.
Figure 5.20: Result aggregation example
Minimum widths
Another important decision was defining minimum widths for paused time. Since a rectangle's width is
proportional to the duration it represents, we could not allow short times to become so small that they
risked not being seen. To solve this, we defined a minimum width (240px), just large enough to fit an
average-sized location name and the start/end times.
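This clamp reduces to one line; the pixels-per-minute scale below is an illustrative assumption, only the 240px minimum comes from the text:

```javascript
// Convert a duration to a rectangle width, clamped to a 240px minimum so
// that short stays still fit a location name and the start/end times.
const MIN_WIDTH_PX = 240;
function durationToWidth(durationMin, pxPerMin = 2) {
  return Math.max(durationMin * pxPerMin, MIN_WIDTH_PX);
}
```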
Time consistency
When a user sketches a query, its width serves as the standard for the corresponding results. That is,
the width of the rectangles in the results is proportional to that of the query, so that they can easily be
compared. Furthermore, all dates are aligned: in figure 5.20, the dates in each entry are vertically
aligned, meaning equal times appear on the same axis, thus facilitating comparison.
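The alignment follows from a single shared time-to-pixel scale: the query's own time span defines it, and every result rectangle reuses it, so equal times land on the same x coordinate. A sketch with illustrative names:

```javascript
// Build a shared horizontal time scale from the query's time span.
// All result entries reuse the returned function, so equal times
// align vertically across entries.
function makeTimeScale(queryStartMin, queryEndMin, widthPx) {
  const pxPerMin = widthPx / (queryEndMin - queryStartMin);
  return timeMin => (timeMin - queryStartMin) * pxPerMin;
}
```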
Map integration
To integrate the map with the results panel, we trigger a JavaScript event when a result is
clicked, so that its spatial data is correctly shown on the map. When a user specifies a spatial range in
the query, they may choose to pick a location from the map. In this case, we use the Google Maps API
to draw a circle around the location, representing the range specified by the user.
Chapter 6
Evaluation
At the beginning of this dissertation we described a series of objectives our final solution should comply
with. Many of these objectives correspond to features that must be assessed to understand how easy
they are to use. There is particular interest in evaluating how users deal with the query interface,
because it is the most relevant part of the solution. In section 4.4 we saw that people could use our
visual language; now we evaluate the system as a whole.
6.1 Experimental protocol
The main objective of the evaluation was to understand whether users, in practice, understood and could
use our visual language and whether they could, in general, use the system in a way that would be
personally relevant and useful.
To evaluate the usability, a set of users were given the system to try and comment on its use. This
evaluation consisted of a session, composed of three parts:
1. Initial form to trace the tester's profile (age, gender, studies, etc.);
2. Tasks, a set of actions to perform with the application, covering all the possible ways of solving a
given problem;
3. Final questionnaire, aimed at usability and qualitative assessment of the system.
The mobility data used to evaluate our work consisted of two months of daily tracking. The same
dataset was used for each user.
For the evaluation we asked users to complete a set of tasks. We measured the time users took to
finish each task, recorded the screen, and asked them to talk aloud about their difficulties. The tasks
relate to the issues below:
• Evaluating the temporal part of a query - users should evaluate and describe (talking aloud) the
difficulties they encounter while elaborating the temporal part of a query. There will be focus on
the different ways a temporal query can be done, such as time intervals or specific times.
• Evaluating the spatial part of a query - users should follow the same principle as before: talk aloud
about the difficulties in making a spatial query, with particular focus on a range around a place and
a specific place.
• Evaluating how results are shown - tasks will ask users to find a precise result in the result list.
Users should identify it and comment aloud about the doubts they have.
• Evaluating the way a route is shown on the map.
• Evaluating the settings interface.
Data collected during task execution includes task duration, number of errors, number of clicks and
other relevant annotations. At the end, we asked users to fill in a questionnaire that qualitatively
evaluates the implemented functionality; users commented on and rated the utility of the features
mentioned above. We also asked users to fill in a questionnaire regarding their global impression of the
solution. This questionnaire was based on the System Usability Scale (SUS)¹.
6.1.1 User profile questionnaire
To characterize the users, we used the questionnaire presented in appendix B.1, which gives a basic
characterization of the tester universe (gender, age, studies).
Tests were conducted with 21 people: 47.6% were male and 52.4% were female. 71.4% were aged
between 18 and 25 years old, 23.8% between 26 and 35, and only one person was older than 35.
Regarding studies, most of the users had a background in Science and Engineering (90.5%), while the
rest had a background in Social Sciences.
Regarding the users’ education level, 57.1% had a Bachelor’s Degree, 23.8% a Master’s Degree and
the rest finished High School.
All user profile results are shown in appendix D.1.
6.1.2 Tasks
Before starting the first stage, we explained the motivation behind our tool and asked users to briefly
explore the tool's user interface and various functionalities, in order to get comfortable with it and
reduce errors and long execution times on the first tasks.

The task set (appendix A.2) consists of 12 different tasks focusing on different areas of the interface
and different levels of query expressiveness. An example of a task related to temporal expressiveness
is "Search when I was at ist-alameda for more than 2 hours". Another example, this time related to
exploring the results panel, is "Repeat the first query. Point all the results in which I went from castanheira
do ribatejo to ist-alameda".
Besides monitoring task time, we decided to ask users to think aloud during task execution. This
decision was made so that we could understand how they reasoned about reaching their goal. This way
¹ http://www.usability.gov/how-to-and-tools/methods/system-usability-scale.html accessed: 09-09-2015
we can hear why users execute the proposed tasks the way they do, and we also get verbal feedback
about how the participant feels about what is happening. For instance, users may show frustration,
annoyance or even happiness with a task's outcome. We also collected the number of errors and the
number of clicks needed to accomplish each proposed task. We expected, and verified, that the impact
of both thinking aloud and time tracking would be of little relevance to the analysis we wanted: seeing
whether users understood and could use the system.
6.1.3 Overall questionnaire
After completing the tasks, users filled in a second questionnaire. Its objective was to assess the
system's usability and the users' satisfaction, and to gather feedback for future work. It is available in
appendix B.2.

The questionnaire was divided into two parts: the first evaluates the system's usability using the SUS
methodology, and the second contains additional domain-specific questions.
SUS scores range from 0 to 100, and to evaluate what constitutes a good result we looked into
several studies, as opinions vary slightly. In "An Empirical Evaluation of the System Usability
Scale" [38], the author analyzed 2324 assessments, with an overall average of 70.14; when the
assessments are divided up by project, the average is 69.69. The author claims that good systems
score between 70 and 80 points, and exceptional systems score 90 or more.
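For reference, the standard SUS scoring procedure (odd items contribute answer − 1, even items contribute 5 − answer, and the sum of contributions is multiplied by 2.5) can be sketched as:

```javascript
// Standard SUS scoring. `answers` holds the ten 1-5 Likert responses
// in questionnaire order; the result is a 0-100 score.
function susScore(answers) {
  if (answers.length !== 10) throw new Error('SUS needs 10 answers');
  const sum = answers.reduce(
    (acc, a, i) => acc + (i % 2 === 0 ? a - 1 : 5 - a), 0);
  return sum * 2.5;
}
```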
The rationale behind the second part of the questionnaire was to understand whether users were willing
to collect their own geolocation data and use it in our application to gain insight into their daily routes.
To understand that, we asked questions about smartphone familiarity and usage. Users could state
whether they had already used any application involving GPS, and whether they use that feature
frequently. When users chose not to track daily geolocation data, they were asked which reasons led to
that decision. We also asked users to give our application a global appreciation score.
6.2 Results
In this section we will present all the results from both questionnaires and task execution.
6.2.1 Results by task
We delivered the task guide once the initial questionnaire was filled in. The first task stated "Search for
when I was at ist-alameda and then went to ist-taguspark". By analyzing table 6.1 we can see that, on
average, users took 36 seconds to perform the task, with a rather large standard deviation of 13 seconds.
Despite being a simple task, its average time is quite high. A valid explanation for this value is the fact
that it was the first real interaction with the system. Furthermore, during this task some users dragged
the mouse to create a paused time, which leads to specifying a duration; the expected behavior was
double-clicking, which explains the errors recorded in the table.
The following task, "Search when I was at ist-alameda for more than 2 hours", despite being a little
more complex because it requires mathematical operators, presented a faster execution time and a
lower error rate - mainly because it was the second interaction with the system and the basic doubts
had already been clarified. To successfully complete the task, the user must sketch one paused time
and change its duration. Some users changed the duration but forgot to specify the "more than"
mathematical operator.
The third task is more complex in nature: "Search for dates where I was at ist-alameda from around
10h±120min to 12h±120min and then went somewhere else". In essence, this task is very similar to
the first one, except that it requires adding temporal fuzziness. Fuzziness can be specified in two
different ways: by filling in the box, or by dragging the fuzziness line. This is the task with the largest
completion time, which can be explained by the way fuzziness is shown - a user must first click on the
arrows on the side of the rectangle for the fuzziness information to appear. Since this behavior is not
obvious, users took more time to discover it. This exploration phase led to a larger completion time, as
well as more errors and clicks.
The fourth task, "Search when I spent less than 1 hour between ist-alameda and ist-taguspark", is
very similar to the second. This time, however, instead of a single paused time, it requires two paused
times and a movement time. Comparing its results to the second task's, completion time decreased
slightly, even though this is a more complex task. This is explained by users' growing familiarity with
the interface, which demonstrates that our interface is easy to understand and reuse.
The fifth task is the first to evaluate spatial expressiveness. It states "Search when I was in a 500
meter range from ist-alameda". This task can be completed in different ways - by vertically dragging the
rectangle, or by filling in the box with the specified number of meters. Users mainly opted for vertical
dragging, finding it more intuitive.
The sixth task is simple: "Search all the days I was at casa". However, users had some doubts about
how to proceed, because specifying all days did not seem trivial. All users had to do to complete the
task was leave the location field blank. Because of their doubts, users explored the interface, which led
to a high number of clicks (7) when only 3 were needed.
The seventh task is the last one evaluating the spatial scope. It appears to be a simple task but,
because it requires map interaction, it tends to take longer: "Search for where I was at ist-alameda,
then took route A5 to ist-taguspark". This task presents a large completion time and error rate, mainly
because of the way a route has to be specified. Specifying the route is done by clicking on the map
near the A5 motorway. To accomplish this, users had to sketch the query and then fill in the route of
the movement time by picking a location on the map (which means users had to zoom the map and
search for the highway).
The eighth ("Add new category School and assign any color you wish") and ninth ("Assign ist-alameda
and ist-taguspark to School") tasks evaluate the settings area. They are simple tasks of similar
difficulty, with just one way of completing them, and in both cases users completed them easily. Only a
few errors were made, which included forgetting to save the settings and trying to add a place in the
search box.
The tenth task focuses on exploring results. First, users were asked to repeat a query they had made
earlier, letting us see whether they had improved and could complete the task faster. The task states:
"Repeat the first query. Point all the results in which I went from castanheira do ribatejo to ist-alameda".
Compared with the first task, the average time decreased by 21%, the standard deviation was reduced
to 4.2 seconds, and users showed no sign of trouble about how to proceed. We can conclude that
users understood how to specify queries and could repeat the process with ease in less time. Regarding
the way results were shown, all users could easily identify the aggregated result they were asked for.
The eleventh task, "Repeat the fourth task. Which of the first three results took less time between
castanheira do ribatejo and ist-alameda", can also be used to compare previously performed tasks.
Comparing it to the fourth task, we notice users completed it with fewer clicks, in less time and with a
smaller time deviation, allowing us to conclude that users evolved and understood how to specify
queries. When exploring the results, users could easily see which result took less time, pointing out
"it's the one with the smaller line".
The twelfth task, "Explore the previous results on the map", had no metrics. It was set up so we could
understand whether users comprehended how to view results in the map area. We observed that users
were familiar with the way results were shown, as it is very similar to the way Google Maps shows a
navigation result. There was some confusion with a few results that showed nothing on the map, but
that was because there was no data for those entries.
Table 6.1: Results by task
Figure 6.1: Clicks by task
Figure 6.2: Time by task
6.2.2 Results by feature
Below we present an analysis of the results for each feature.
General
In the beginning, some users did not know how to correctly create a time: they would drag to create it,
instead of just double-clicking. They were not entirely wrong - the difference is subtle. When a user
double-clicks, a paused time is created without any extra property; when a user drags the mouse and
releases the button, the created paused time is assigned a duration proportional to the distance
dragged. After the first task this behavior was clarified to users, since they could not successfully
complete the task without understanding it.
Temporal
When dealing with the temporal scope (using it for the first time), users did not know how to add fuzziness
to the query. A user must first click on the blue arrow at the rectangle's endpoints for the fuzziness
interface to appear; however, after discovering the interface, they could use it well. Another, less
frequent mistake was forgetting to assign a mathematical operator to a time when needed - giving a
completely different meaning to the query.
Spatial
As with the temporal scope, users also forgot to add mathematical operators when dealing with spatial
constraints. Another relevant situation relates to the way a user must choose a route from the map.
To complete the seventh task, a user had to browse the map for the A5 highway and click on it - this led
to long completion times, since some users did not know exactly where the highway was.
Results
When visualizing results, users could easily spot the result the task asked them to find. They did,
however, find that scrolling down to find the result would sometimes skip the desired entry. They
then suggested that "order by" and "search" mechanisms be included in the results panel.
Settings
The only major mistake in the settings interface was a user trying to add a new entry to the categories
table by writing in the search box.
6.2.3 Questionnaire results
After executing the proposed tasks and taking notes on how long each user took on each exercise,
we asked our users to answer a final questionnaire. As mentioned earlier, this questionnaire is divided
into two parts: the first uses the System Usability Scale (SUS), and the second contains specific
questions about smartphone usage and GPS.
Table 6.2 shows the SUS scores. Our overall SUS score is 80.75, which means that we have a good
system and are on the right track.
Regarding the second part of the questionnaire (results can be seen in appendix D.3), 57% of the
users gave a global appreciation (on a scale from 1 - terrible to 5 - very good) of 4, while 33% gave the
maximum classification. As for the questions about smartphone usage, only one user did not have a
smartphone. Of those who did, 66.7% said they used their phone's GPS, most commonly (85.7% of
those users) for map navigation. When asked if they would track their daily geolocation data, 52.4% of
the users answered no; the main reason was short battery life.
Table 6.2: SUS summarized
6.3 Discussion
Generally, the results of our experiments were positive: most people considered our application helpful
and the tasks adequate. Also, a SUS score of 80.75 is a very good result, although we know there is
still a lot to improve.

Overall, the task results were good and corroborate that our efforts to build a simple and flexible
interface were well spent. Usability tests show there is still some fine-tuning to be done (mainly
related to the way results are shown and to map integration), but overall feedback from users was very
positive. The usability of the system was demonstrated by the tests, with all users completing all tasks.
Situations to be tuned include allowing the user to order results and clearly stating when an entry has
no map data.
Overall user feedback was also very positive. Users stated that they gained a great perspective
on their mobility patterns and would use the tool frequently if the collection method were simple and did
not drain the phone's battery, demonstrating the system's utility.
Regarding our objectives (initially specified in section 1.1), we can say that all of them were achieved.
Below we analyze each objective and the topics that still need intervention. The main objective (create
a visual query language for personal mobility data that allows users to visually query their personal
spatio-temporal information, with personal semantics) was successfully achieved, with only minor
issues left to correct and improve. User tests showed that our language was easily manipulated and
understood, and that users could generate complex queries in little time.
Another objective was to visually represent the resulting information. This was also accomplished,
but it requires further improvement so that results can be explored more easily. Users stated they
would like results to be more dynamic, with the possibility of ordering and searching them.
Chapter 7
Conclusions
The growing availability and massive spread of GPS tracking devices generate huge volumes of
personal mobility data. As we have seen in this report, there is no efficient way to explore mobility data
that simultaneously provides a pleasant experience for the user.
Thus we found the need to develop a solution that allows a user to query personal mobility data while
maintaining the personal semantics of the locations. First we needed to identify the essential components
of our solution: the visualization and the query system. For the query system we had to understand the
expressiveness we were looking for - we analyzed the main components (temporal, spatial and
recurrence) and built a table exemplifying the supported queries.
To understand the best compromise for the formulation of visual queries, we analyzed several works
that also use this mechanism, so we could understand the advantages and disadvantages of the
different approaches. After considering the options (comic strips, timelines and graphs), we chose the
timeline, because it makes the temporal dependence between events obvious.
We also analyzed several works to find the best visualization methods (space-time cubes, 2D maps,
timelines, etc.) and realized the best option was to integrate several of these components. Thus we
chose 2D maps to show routes, and a timeline to present all the results of a given query.
Based on the works reviewed, we proceeded to determine the query expressiveness a personal
system of this type should have. This is one of our main contributions, as detailed in chapter 4. After
specifying the visual language, we had to see whether users could actually use it, so we implemented
a solution for users to test. This validation was successful, as stated in section 4.4.
After concluding that users understood the visual language, we implemented a whole system includ-
ing that language, which is another contribution of our work. This system allows specifying queries and
exploring their results.
Our initial goals of creating a visual query language for personal mobility data that allows users to
visually query their personal spatio-temporal information, with personal semantics and visually represent
the resulting information were achieved, as our evaluation shows in chapter 6.
The system truly works. Despite having a few minor flaws to be perfected, we did manage to help
users understand their own personal mobility data.
7.1 Future Work
Regarding the current implementation, there are still several issues that need intervention:
• Interface for data annotation: as explained in section 3.1.2, our current interface for data annotation
is just a text box with a submit button in the browser. Ideally, this interface would allow selective
removal of entries, a dropdown of location suggestions and a more carefully thought-out design.
• Complete coverage of the expressiveness table: table 4.1 specifies our visual language's coverage.
We did not, however, implement all the table's cells. For instance, the column related to recurrent
queries is not present in the current implementation. Implementing the remaining cells is the next
step.
• Video tutorial: the first time a user starts using the system, there are no visual cues about where to
start. An initial video explaining the basic concepts of our application would address this problem.
• Order results: during the user tests, a few users suggested being able to order the results of a query
either alphabetically or by date.

• Allow result comparison: another interesting suggestion derived from the user tests was the
possibility of comparing results. One could analyze results side by side and find out, for example, in
which periods one was more active.
• Summary of aggregated results on the map: as explained in section 5.4.1, we decided to simplify
result aggregation on the map by showing only the first result. Ideally, when inspecting an aggregated
result, the map should show a summary of its tracks (a heat map, for example).
Bibliography
[1] G. Andrienko, N. Andrienko, U. Demsar, D. Dransch, J. Dykes, S. I. Fabrikant, M. Jern, M.-J. Kraak,
H. Schumann, and C. Tominski. Space, time and visual analytics. Int. J. Geogr. Inf. Sci., 24
(10):1577–1600, Oct. 2010. ISSN 1365-8816. doi: 10.1080/13658816.2010.508043. URL http:
//dx.doi.org/10.1080/13658816.2010.508043.
[2] B. Simpson, C. L. Giles, and A. M. MacEachren. Geodiscoverer: A search engine to integrate social
networks with geospatial information. Raytheon Technology Today, 4:12–13, 2007. URL http://
www.geovista.psu.edu/publications/2007/Simpson_GeoDiscoverer_in_RaytheonToday.pdf.
[3] W. Luo, A. M. MacEachren, P. Yin, and F. Hardisty. Spatial-social network visualization for ex-
ploratory data analysis. In 3rd ACM SIGSPATIAL International Workshop on Location-Based So-
cial Networks (LBSN 2011), Chicago, IL, November 1 2011. URL http://www.geovista.psu.edu/
publications/2011/Luo_2011_Spatial-SocialNetworkVisforEDA.pdf.
[4] W. Luo and A. M. MacEachren. Geo-social visual analytics. Journal of Spatial Information Sci-
ence, pages 27–66, 2014. doi: 5311/JOSIS.2014.8.139. URL http://www.geovista.psu.edu/
publications/2014/Luo_GeoSocial_JOSIS_2014.pdf.
[5] W. Luo. Geovisual analytics approaches for the integration of geography and social network
contexts. Master’s thesis, The Pennsylvania State University, University Park, Pennsylvania,
08/2014 2014. URL http://www.geovista.psu.edu/publications/2014/Wei_L_2014_Thesis_
Final.pdf.
[6] G. Andrienko, N. Andrienko, P. Bak, D. Keim, S. Kisilevich, and S. Wrobel. A conceptual framework
and taxonomy of techniques for analyzing movement. J. Vis. Lang. Comput., 22(3):213–232, June
2011. ISSN 1045-926X. doi: 10.1016/j.jvlc.2011.02.003. URL http://dx.doi.org/10.1016/j.
jvlc.2011.02.003.
[7] G. Andrienko, N. Andrienko, D. Keim, A. M. MacEachren, and S. Wrobel. Challenging problems of
geospatial visual analytics. Journal of Visual Languages & Computing, 22(4):251 – 256, 2011. ISSN
1045-926X. doi: http://dx.doi.org/10.1016/j.jvlc.2011.04.001. URL http://www.sciencedirect.
com/science/article/pii/S1045926X11000280. Part Special Issue on Challenging Problems in
Geovisual Analytics.
87
[8] G. Andrienko, N. Andrienko, D. Keim, A. M. MacEachren, and S. Wrobel. Editorial: Challenging
problems of geospatial visual analytics. J. Vis. Lang. Comput., 22(4):251–256, Aug. 2011. ISSN
1045-926X. doi: 10.1016/j.jvlc.2011.04.001. URL http://dx.doi.org/10.1016/j.jvlc.2011.04.
001.
[9] A. Thudt, D. Baur, and S. Carpendale. Visits: A Spatiotemporal Visualization of Location Histories.
pages 79–83. doi: 10.2312/PE.EuroVisShort.EuroVisShort2013.079-083. URL http://diglib.
eg.org/EG/DL/PE/EuroVisShort/EuroVisShort2013/079-083.pdf.
[10] G. di lorenzo, M. L. Sbodio, F. Calabrese, M. Berlingerio, R. Nair, and F. Pinelli. Allaboard: Visual
exploration of cellphone mobility data to optimise public transport. In Proceedings of the 19th
International Conference on Intelligent User Interfaces, IUI ’14, pages 335–340, New York, NY,
USA, 2014. ACM. ISBN 978-1-4503-2184-6. doi: 10.1145/2557500.2557532. URL http://doi.
acm.org/10.1145/2557500.2557532.
[11] J. Larsen, A. Cuttone, and S. Jrgensen. QS Spiral: Visualizing Periodic Quantified Self Data. 2013.
[12] T. Goncalves, A. P. Afonso, B. Martins, and D. Goncalves. St-trajvis: Interacting with trajectory
data. In Proceedings of the 27th International BCS Human Computer Interaction Conference,
BCS-HCI ’13, pages 48:1–48:6, Swinton, UK, UK, 2013. British Computer Society. URL http:
//dl.acm.org/citation.cfm?id=2578048.2578106.
[13] T. Goncalves, A. P. Afonso, and B. Martins. Visualizing human trajectories: Comparing space-time
cubes and static maps. In Proceedings of 28th British HCI Conference, HCI 2014 - Sand, Sea and
Sky - Holiday HCI (accepted for publication), BCS HCI, 2014.
[14] C. Plaisant, B. Milash, A. Rose, S. Widoff, and B. Shneiderman. Lifelines: Visualizing personal
histories. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems,
CHI ’96, pages 221–227, New York, NY, USA, 1996. ACM. ISBN 0-89791-777-4. doi: 10.1145/
238386.238493. URL http://doi.acm.org/10.1145/238386.238493.
[15] T. Kapler and W. Wright. Geotime information visualization. In Information Visualization, 2004.
INFOVIS 2004. IEEE Symposium on, pages 25–32, Oct 2004. doi: 10.1109/INFVIS.2004.27.
[16] F. F. Leandro. Visual mobility - visual exploration of personal mobility data, 2012.
[17] N. Andrienko and G. Andrienko. Visual analytics of movement: An overview of methods, tools and
procedures. Information Visualization, 2012.
[18] M. Lu, Z. Wang, and X. Yuan. Trajrank: Exploring travel behaviour on a route by trajectory ranking.
In Visualization Symposium (PacificVis), 2015 IEEE Pacific, pages 311–318, April 2015. doi: 10.
1109/PACIFICVIS.2015.7156392.
[19] C. Chen. Top 10 unsolved information visualization problems. Computer Graphics and Applications,
IEEE, 25(4):12–16, July 2005. ISSN 0272-1716. doi: 10.1109/MCG.2005.91.
88
[20] T. Goncalves, A. P. Afonso, and B. Martins. Visualization techniques of trajectory data: Challenges
and limitations. In Proceedings of the 2nd AGILE PhD School 2013, volume 1136 of CEUR Work-
shop Proceedings, http://ceur-ws.org/Vol-1136/paper3.pdf, 2014.
[21] P. C. Wong, H.-W. Shen, C. Johnson, C. Chen, and R. B. Ross. The top 10 challenges in extreme-
scale visual analytics. Computer Graphics and Applications, IEEE, 32(4):63–67, July 2012. ISSN
0272-1716. doi: 10.1109/MCG.2012.87.
[22] G. Robertson, R. Fernandez, D. Fisher, B. Lee, and J. Stasko. Effectiveness of animation in trend
visualization. IEEE Transactions on Visualization and Computer Graphics, 14(6):1325–1332, Nov.
2008. ISSN 1077-2626. doi: 10.1109/TVCG.2008.125. URL http://dx.doi.org/10.1109/TVCG.
2008.125.
[23] B. Tversky, J. B. Morrison, and M. Betrancourt. Animation: Can it facilitate? Int. J. Hum.-Comput.
Stud., 57(4):247–262, Oct. 2002. ISSN 1071-5819. doi: 10.1006/ijhc.2002.1017. URL http:
//dx.doi.org/10.1006/ijhc.2002.1017.
[24] D. Calcinelli and M. Mainguenaud. Cigales: A visual query language for geographical information
system: The user interface. Journal of Visual Languages and Computing, 5:113–132, 1994.
[25] B. Meyer. Beyond icons. In R. Cooper, editor, Interfaces to Database Systems (IDS92), Workshops
in Computing, pages 113–135. Springer London, 1993. ISBN 978-3-540-19802-4. doi: 10.1007/
978-1-4471-3423-7 8. URL http://dx.doi.org/10.1007/978-1-4471-3423-7_8.
[26] C. Bonhomme, C. Trepied, M.-A. Aufaure, and R. Laurini. A visual language for querying spatio-
temporal databases. In Proceedings of the 7th ACM International Symposium on Advances
in Geographic Information Systems, GIS ’99, pages 34–39, New York, NY, USA, 1999. ACM.
ISBN 1-58113-235-2. doi: 10.1145/320134.320144. URL http://doi.acm.org/10.1145/320134.
320144.
[27] J. F. Allen. Towards a general theory of action and time. Artif. Intell., 23(2):123–154, July
1984. ISSN 0004-3702. doi: 10.1016/0004-3702(84)90008-0. URL http://dx.doi.org/10.1016/
0004-3702(84)90008-0.
[28] M. Monroe, R. Lan, J. Morales del Olmo, B. Shneiderman, C. Plaisant, and J. Millstein. The
challenges of specifying intervals and absences in temporal queries: A graphical language ap-
proach. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems,
CHI ’13, pages 2349–2358, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-1899-0. doi:
10.1145/2470654.2481325. URL http://doi.acm.org/10.1145/2470654.2481325.
[29] J. Jin and P. Szekely. Interactive querying of temporal data using a comic strip metaphor. In Visual
Analytics Science and Technology (VAST), 2010 IEEE Symposium on, pages 163–170, Oct 2010.
doi: 10.1109/VAST.2010.5652890.
[30] J. Jin and P. Szekely. Querymarvel: A visual query language for temporal patterns using comic
strips. In Visual Languages and Human-Centric Computing, 2009. VL/HCC 2009. IEEE Symposium
on, pages 207–214, Sept 2009. doi: 10.1109/VLHCC.2009.5295262.
[31] L. Nocera, A. Rihan, S. Xing, A. Khodaei, A. Khoshgozaran, F. Banaei-Kashani, and C. Shahabi.
Geodec: A multi-layered query processing framework for spatio-temporal data. In Proceedings
of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information
Systems, GIS ’09, pages 546–547, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-649-6.
doi: 10.1145/1653771.1653869. URL http://doi.acm.org/10.1145/1653771.1653869.
[32] C. Shahabi, F. Banaei-Kashani, A. Khoshgozaran, L. Nocera, and S. Xing. Geodec: A framework
to visualize and query geospatial data for decision-making. IEEE Multimedia, 17(3):14–23, 2010.
ISSN 1070-986X. doi: http://doi.ieeecomputersociety.org/10.1109/MMUL.2010.1.
[33] L. Certo, T. Galvao, and J. Borges. Time automaton: A visual mechanism for temporal querying.
J. Vis. Lang. Comput., 24(1):24–36, Feb. 2013. ISSN 1045-926X. doi: 10.1016/j.jvlc.2012.10.001.
URL http://dx.doi.org/10.1016/j.jvlc.2012.10.001.
[34] D. J. Peuquet. It’s about time: A conceptual framework for the representation of temporal dynamics
in geographic information systems. Annals of the Association of American Geographers, 84(3):
441–461, 1994. ISSN 1467-8306. doi: 10.1111/j.1467-8306.1994.tb01869.x. URL http://dx.
doi.org/10.1111/j.1467-8306.1994.tb01869.x.
[35] M. Schneider and T. Behr. Topological relationships between complex spatial objects. ACM Trans.
Database Syst., 31(1):39–81, Mar. 2006. ISSN 0362-5915. doi: 10.1145/1132863.1132865. URL
http://doi.acm.org/10.1145/1132863.1132865.
[36] G. Andrienko, N. Andrienko, and S. Wrobel. Visual analytics tools for analysis of movement data.
SIGKDD Explor. Newsl., 9(2):38–46, Dec. 2007. ISSN 1931-0145. doi: 10.1145/1345448.1345455.
URL http://doi.acm.org/10.1145/1345448.1345455.
[37] STNexus: An Integrated Database and Visualization Environment for Space-Time Informa-
tion Exploitation, Orlando, FL, Nov. 29 - Dec. 2 2005. URL http://www.geovista.psu.edu/
publications/2005/Weaver_ARDA_05.pdf.
[38] A. Bangor, P. T. Kortum, and J. T. Miller. An empirical evaluation of the system usability scale. Int. J.
Hum. Comput. Interaction, 24(6):574–594, 2008. URL http://dblp.uni-trier.de/db/journals/
ijhci/ijhci24.html#BangorKM08.
Appendix A
Test protocols
A.1 First test protocol
Where Have I Been is an application that allows you to search your own personal geolocation data.

Interface

First, we'll start with an overview of the interface:
1. Search area

This area allows you to make your queries. A blue rectangle represents a period of time when you were indoors. A gray line represents a period of time when you were moving. Queries allow different parameters to be specified, all of which are optional.

Start/end time
Here you state that you want results starting/ending at a specific hour. Using mathematical symbols (>, <, ≥, ≤) you can specify things like "after" or "before".
Temporal range
Specifies a range around the starting/ending time. In the example below, valid start times would be from 11h55m to 12h25m.
Location name / coordinates
A place can be represented by a name ("home", "school", etc.) or by coordinates selected on the map.
Spatial range
Can be changed by vertically dragging the rectangle's border or by writing directly in the box. In the query below, all results within 329 meters of "home" would be a match.
Duration
Specifies how long the stay/trip lasted. Using mathematical signs, we can state whether the stay/trip was longer or shorter than a given duration.
Double-clicking on the timeline will create a rectangle whose start and end times are undefined. Clicking and dragging will create a rectangle whose duration changes as you drag. When the second rectangle is created, a route connecting the first and second locations will automatically appear.
2. Results area

Shows the results matching your query. Results follow the same presentation as the query. Results may be aggregated by similarity, in which case they will appear transparent and will be collapsible.
3. Map area

Shows the routes and locations of a result. When you click on a result, if there is enough data, the map will highlight the locations and routes related to that result.
4. Settings

Allows you to assign categories to locations. Different categories will be presented with different colours in the results area.
Tasks

0. Explore the interface freely.
1. Search for dates where I arrived at istalameda at precisely 9h16min.
2. Search for dates where I was at istalameda from around 10h±120min to 12h±120min and then went somewhere else.
3. Search for when I was at istalameda and then went to isttaguspark.
4. Search for when I was at istalameda, then took route A5 to isttaguspark.
5. Search for when I was at istalameda for more than 2 hours.
6. Search for when I spent less than 1 hour between istalameda and isttaguspark.
7. Search for when I was within a 500-meter range of istalameda.
8. Search for all the days I was at casa.
A.2 Final test protocol
Where Have I Been
Test protocol
First of all, thank you for taking the time to test this application.

Where Have I Been allows you to search your own personal geolocation data. In this case it
will not be your personal geolocation data, but mine.
You'll be asked to navigate around the interface so that you become familiar with it, and
then you'll perform a few tasks.
If you authorize it, we will also record the process on video, solely for research purposes.

We will also record the time, but there is no need to feel any pressure; we are recording it
only for statistical purposes.
We will not track any identifying information. Your responses are completely anonymous,
and you may be assured of complete confidentiality. The information you provide will be
stored only to track survey completion, and the data will be reported only in aggregate;
no individual will be identified.
First, we'll give you around 5 minutes to explore Where Have I Been's interface freely. You
can do anything you wish – we just want you to get a little comfortable.
After that, we ask you to perform the tasks described on the following page (in order).
We will record the screen and time each task, so that we can later analyze your performance
and count the number of clicks. We'll also take written notes regarding your execution of each task.
We ask you to express your doubts, concerns and problems out loud during the whole
process, so that we can understand whether you are making progress.
There will be a break in the middle of the tasks.
Tasks
First, we'll give you some time to explore the interface. When you think you're ready, tell us.
A) Dealing with time
1. Search for when I was at ist-alameda and then went to ist-taguspark
2. Search when I was at ist-alameda for more than 2 hours
3. Search for dates where I was at ist-alameda from around 10h±120min to
12h±120min and then went somewhere else
4. Search when I spent less than 1 hour between ist-alameda and ist-taguspark
B) Dealing with space
5. Search when I was in a 500 meter range from ist-alameda
6. Search all the days I was at casa
7. Search for when I was at ist-alameda, then took route A5 to ist-taguspark
C) Adjusting settings
8. Add new category School and assign any color you wish
9. Assign ist-alameda and ist-taguspark to School
D) Search results
10. Repeat the first query. Point out all the results in which I went from castanheira do
ribatejo to ist-alameda
11. Repeat the fourth task. Which of the first three results took the least time between
castanheira do ribatejo and ist-alameda?
12. Explore the previous results on the map
Appendix B
Questionnaires
B.1 User profile questionnaire
1. Gender
(a) Male
(b) Female
2. Age
(a) 18-25
(b) 26-35
(c) 36-50
(d) 51+
3. Academic studies
(a) Science/Engineering
(b) Humanities and Social sciences
(c) Health sciences
(d) Other
4. Academic degree
(a) Did Not Complete High School
(b) High School
(c) Bachelor's Degree
(d) Master’s Degree
(e) Advanced Graduate work or Ph.D.
B.2 Overview questionnaire
B.2.1 Part One
1. I think that I would like to use this system frequently.
2. I found the system unnecessarily complex.
3. I thought the system was easy to use.
4. I think that I would need the support of a technical person to be able to use this system.
5. I found the various functions in this system were well integrated.
6. I thought there was too much inconsistency in this system.
7. I would imagine that most people would learn to use this system very quickly.
8. I found the system very cumbersome to use.
9. I felt very confident using the system.
10. I needed to learn a lot of things before I could get going with this system.
B.2.2 Part Two
1. Global appreciation
(a) Linear scale (1 - Terrible to 5 - Very good)
2. Do you have a smartphone with GPS?
(a) Yes
(b) No
3. Do you make use of the GPS?
(a) Yes
(b) No
4. In which situations?
(a) Map navigation
(b) Track recording
(c) Photo geotagging
(d) Other
5. Would you track your daily routes?
(a) Yes
(b) No
6. What are the reasons that prevent you from tracking your routes?
(a) Privacy concerns
(b) Battery issues
(c) Other
Appendix C
GPX Library Documentation
Where Have I Been
GPX Library
Intro
This repository offers the source code of the GPX library used in Where Have I Been.
We devised a library to process GPX files, based on the work of tkrajina.
This library has three main purposes:

- Smoothing GPX files
- Dividing GPX files into tracks, each representing a moment of movement
- Reducing dataset size
Auxiliary library to process GPX tracks
Important methods, listed in calling order:
Dividing and splitting tracks
If there is an unusual variation in distance or time between two points, the track is divided in two. If two resulting tracks are too close together, they are assumed to belong to the same movement and are therefore combined into one track.
track2trip(split_on_new_track, split_on_new_track_interval, min_sameness_distance)
split_on_new_track - whether or not a track should be split into a different file
split_on_new_track_interval - temporal distance between two points in order to consider splitting it
min_sameness_distance - minimum distance (in meters) in order to consider splitting the file
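The splitting half of this rule can be sketched as follows. This is a minimal, self-contained illustration of the time-gap criterion only (the merge-back step based on `min_sameness_distance` is omitted); the function name, point format, and default threshold are illustrative, not the library's actual API:

```python
from datetime import datetime, timedelta

def split_tracks(points, max_gap_seconds=300):
    """Split a sequence of (timestamp, lat, lon) points into separate
    tracks wherever the time gap between consecutive points exceeds
    max_gap_seconds (a stand-in for split_on_new_track_interval)."""
    tracks, current = [], []
    for p in points:
        if current and (p[0] - current[-1][0]).total_seconds() > max_gap_seconds:
            tracks.append(current)   # gap too large: close the current track
            current = []
        current.append(p)
    if current:
        tracks.append(current)
    return tracks

t0 = datetime(2015, 11, 1, 12, 0, 0)
points = [
    (t0, 38.7369, -9.1427),
    (t0 + timedelta(seconds=60), 38.7371, -9.1430),
    (t0 + timedelta(seconds=700), 38.7374, -9.3026),  # 640 s gap: new track
]
tracks = split_tracks(points)
print(len(tracks))  # → 2
```

A large gap usually means the GPS logger was switched off between two moments of movement, which is why a time threshold alone already separates most trips.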
Smoothing tracks
Smooths track data, based on the implementation by tkrajina. It focuses on two ideas: calculating the average distance between points to find outliers that need to be removed, and applying a ratio to the remaining points to achieve a smooth, well-fitting path.
smooth(remove_extremes, how_much_to_smooth)
remove_extremes - remove outlying points
how_much_to_smooth - decimal value that specifies how much to smooth
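In outline, the two ideas above look like this. It is a rough sketch under assumed planar 2-D coordinates and an illustrative outlier threshold (twice the average step), not the library's real code:

```python
import math

def smooth(points, remove_extremes=True, ratio=0.25):
    """Drop outliers (interior points far from BOTH neighbours relative
    to the track's average step length), then pull each remaining
    interior point toward the average of its neighbours by `ratio`."""
    if len(points) < 3:
        return list(points)
    pts = list(points)
    if remove_extremes:
        steps = [math.hypot(b[0] - a[0], b[1] - a[1])
                 for a, b in zip(pts, pts[1:])]
        avg = sum(steps) / len(steps)
        pts = [pts[0]] + [
            p for i, p in enumerate(pts[1:-1], start=1)
            if not (steps[i - 1] > 2 * avg and steps[i] > 2 * avg)
        ] + [pts[-1]]
    out = [pts[0]]
    for prev, cur, nxt in zip(pts, pts[1:], pts[2:]):
        out.append((cur[0] * (1 - 2 * ratio) + (prev[0] + nxt[0]) * ratio,
                    cur[1] * (1 - 2 * ratio) + (prev[1] + nxt[1]) * ratio))
    out.append(pts[-1])
    return out

# A single GPS spike at (3, 50) is far from both neighbours and removed.
track = [(0, 0), (1, 0), (2, 0), (3, 50), (4, 0), (5, 0), (6, 0)]
result = smooth(track)
print(len(result))  # → 6
```

Requiring an outlier to be far from both of its neighbours avoids discarding the legitimate point that follows a spike.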
Visually simplifying tracks (Ramer–Douglas–Peucker)

Adaptation of the Ramer–Douglas–Peucker algorithm, by tkrajina, including both spatial and temporal constraints. This version takes into account the temporal distance between the original curve and the simplified curve.
simplify(max_distance, max_time)
max_distance - the expected maximum distance, in kilometers, between track points after the simplification

max_time - the expected maximum time, in seconds, between two points after the simplification
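For reference, the purely spatial core of Ramer–Douglas–Peucker (without the temporal constraint this library adds) can be written as below; coordinates and the distance threshold share the same arbitrary unit here, and the function is a textbook sketch rather than the library's implementation:

```python
import math

def rdp(points, max_distance):
    """Keep an interior point only if it deviates from the chord between
    the segment's endpoints by more than max_distance; recurse around
    the farthest such point."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    chord = math.hypot(x2 - x1, y2 - y1)
    best_i, best_d = 0, -1.0
    for i in range(1, len(points) - 1):
        px, py = points[i]
        if chord == 0:
            d = math.hypot(px - x1, py - y1)
        else:
            # perpendicular distance from (px, py) to the chord
            d = abs((x2 - x1) * (y1 - py) - (x1 - px) * (y2 - y1)) / chord
        if d > best_d:
            best_i, best_d = i, d
    if best_d <= max_distance:
        return [points[0], points[-1]]   # everything in between is negligible
    left = rdp(points[:best_i + 1], max_distance)
    right = rdp(points[best_i:], max_distance)
    return left[:-1] + right             # drop the duplicated pivot point

print(rdp([(0, 0), (1, 0.01), (2, 0), (3, 0.02), (4, 0)], 0.1))
# → [(0, 0), (4, 0)]
```

A temporal variant of the same idea replaces (or combines) the perpendicular distance with the time difference between a point and the point interpolated on the simplified chord.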
Reducing number of points
Intended to remove points that are too close to each other, keeping only points separated by at least a minimum distance. This minimum separation between points is specified in meters.
reduce_points(min_distance, min_time)
min_distance - the minimum distance between points (meters)
min_time - the minimum time between points (seconds)
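The distance half of this filter amounts to keeping a point only when it is far enough from the last point kept. A simplified sketch (planar distance instead of geodesic, and an illustrative signature without the time criterion):

```python
import math

def reduce_points(points, min_distance):
    """Keep a point only if it is at least min_distance (same unit as
    the coordinates) away from the last point that was kept."""
    if not points:
        return []
    kept = [points[0]]
    for p in points[1:]:
        last = kept[-1]
        if math.hypot(p[0] - last[0], p[1] - last[1]) >= min_distance:
            kept.append(p)
    return kept

# Points 0.5 and 0.2 units from their predecessors are dropped.
print(reduce_points([(0, 0), (0.5, 0), (2, 0), (2.2, 0), (5, 0)], 1.0))
# → [(0, 0), (2, 0), (5, 0)]
```

Unlike Ramer–Douglas–Peucker, this pass does not look at the shape of the curve; it only thins out oversampled stretches, which is why it is applied in addition to simplification.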
Appendix D
Evaluation results
D.1 User profile results
Figure D.1: Users’ gender summarized
Figure D.2: Users’ age summarized
Figure D.3: Users’ studies summarized
Figure D.4: Users' degree summarized
D.2 Task results
task user clicks time error task user clicks time error
1 9 34 0 1 16 40 0
2 12 30 0 2 20 55 1
3 15 25 0 3 26 46 1
4 10 32 0 4 13 32 0
5 11 26 0 5 15 39 0
6 13 40 1 6 25 61 2
7 18 27 0 7 15 36 0
8 19 63 2 8 20 43 0
9 15 33 0 9 26 51 0
10 11 41 1 10 23 49 1
11 9 55 2 11 20 34 0
12 12 36 0 12 25 44 0
13 15 39 0 13 13 38 0
14 10 26 0 14 26 65 4
15 11 71 3 15 20 60 2
16 11 29 0 16 13 35 0
17 13 36 0 17 16 42 1
18 8 45 1 18 29 57 2
19 13 29 0 19 28 68 4
20 14 22 0 20 13 33 0
21 18 25 0 21 30 70 4
1 7 25 0 1 15 30 0
2 9 35 0 2 14 25 0
3 9 33 0 3 12 23 0
4 10 45 1 4 16 34 0
5 7 22 0 5 12 20 0
6 11 31 0 6 20 41 1
7 7 20 0 7 12 33 0
8 15 44 0 8 12 27 0
9 20 55 2 9 22 50 2
10 12 31 0 10 21 42 1
11 9 25 0 11 13 29 0
12 7 20 0 12 20 25 0
13 12 36 0 13 16 31 0
14 8 24 0 14 19 42 1
15 21 47 1 15 17 25 0
16 11 26 0 16 18 28 0
17 8 21 0 17 17 33 0
18 7 43 1 18 19 41 1
19 15 40 1 19 12 22 0
20 10 32 0 20 15 29 0
21 13 25 0 21 13 21 0
3
4
1
2
1 7 18 0 1 12 30 0
2 11 30 1 2 14 35 0
3 8 21 0 3 22 65 2
4 10 39 1 4 11 51 2
5 9 25 0 5 16 40 1
6 12 35 1 6 11 27 0
7 8 22 0 7 12 26 0
8 10 34 1 8 15 42 1
9 8 29 0 9 22 60 2
10 8 26 0 10 11 29 0
11 12 42 2 11 12 31 0
12 10 28 1 12 16 51 2
13 8 21 0 13 15 44 1
14 7 19 0 14 25 55 2
15 15 45 2 15 23 63 2
16 8 22 0 16 16 41 1
17 8 23 0 17 11 25 0
18 8 20 0 18 13 30 1
19 8 21 0 19 11 26 0
20 8 23 0 20 20 42 1
21 7 18 0 21 26 59 2
1 5 5 0 1 6 15 0
2 10 18 1 2 6 16 0
3 5 6 0 3 8 20 0
4 5 7 0 4 7 22 0
5 11 25 1 5 6 15 0
6 21 65 3 6 6 14 0
7 5 5 0 7 7 17 0
8 5 5 0 8 8 20 0
9 5 5 0 9 10 25 1
10 5 6 0 10 11 26 1
11 7 7 0 11 6 15 0
12 16 44 2 12 7 17 0
13 5 5 0 13 10 22 1
14 5 6 0 14 12 25 1
15 5 6 0 15 6 15 0
16 5 7 0 16 7 18 0
17 12 30 1 17 6 16 0
18 5 5 0 18 6 15 0
19 5 6 0 19 7 19 0
20 5 8 0 20 9 20 1
21 5 7 0 21 6 15 0
7
8
5
6
1 6 17 0 1 12 25 0
2 6 16 0 2 13 25 0
3 6 18 0 3 12 23 0
4 6 16 0 4 14 24 0
5 7 20 0 5 12 20 0
6 6 19 0 6 12 22 0
7 6 16 0 7 12 21 0
8 6 15 0 8 14 27 0
9 7 21 0 9 13 21 0
10 6 19 0 10 13 26 0
11 10 25 1 11 13 21 0
12 6 16 0 12 12 22 0
13 6 18 0 13 16 31 1
14 6 16 0 14 12 23 0
15 11 22 1 15 13 25 0
16 12 25 1 16 12 23 0
17 11 26 1 17 15 33 1
18 6 16 0 18 13 25 0
19 6 17 0 19 12 22 0
20 6 16 0 20 11 23 0
21 6 19 0 21 13 21 0
1 9 30 0
2 12 28 0
3 11 25 0
4 10 32 0
5 11 26 0
6 13 26 0
7 10 27 0
8 15 38 1
9 11 33 0
10 11 26 0
11 9 34 0
12 12 36 0
13 11 32 0
14 10 26 0
15 11 27 0
16 11 29 0
17 9 23 0
18 8 28 0
19 13 29 0
20 9 22 0
21 11 25 0
9
10
D.3 Overall questionnaire results
Figure D.5: SUS summarized
Figure D.6: Global appreciation
Figure D.7: Smartphone possession
Figure D.8: GPS usage
Figure D.9: GPS usage situations
Figure D.10: Would users track location
Figure D.11: User concerns on tracking