Dedicated to my Parents, Family,
and
All my respected Teachers and Advisors
TABLE OF CONTENTS
Chapter 1 Introduction ..................................................................................... 1
1.1 Recommendation Systems .............................................................. 2
1.2 Location Based Recommendation Systems ..................................... 3
1.3 Food Recommendation Systems ..................................................... 4
1.4 Challenges ...................................................................................... 6
1.4.1 Scalability ............................................................................ 6
1.4.2 Cold Start ............................................................................. 6
1.4.3 Data Sparseness .................................................................... 7
1.4.4 Over Specialization Problem ................................................ 7
1.4.5 Recommendation of Popular Objects.................................... 7
1.4.6 Attacks on Recommendations .............................................. 7
1.5 Scope of Research .......................................................................... 8
1.6 Motivation ...................................................................................... 9
1.7 Contributions ................................................................................ 10
1.8 Organization of Dissertation ......................................................... 13
Chapter 2 Overview of Recommendation Systems ......................................... 14
2.1 Overview ...................................................................................... 15
2.1.1 Content-based Recommendation Methods .......................... 16
2.1.2 Collaborative Filtering based Recommendation Method .... 17
2.1.3 Hybrid Recommendation Methods ..................................... 19
2.2 Criteria for Recommendations ...................................................... 20
2.2.1 Accuracy ............................................................................ 21
2.2.2 Familiarity .......................................................................... 21
2.2.3 Novelty .............................................................................. 22
2.2.4 Diversity ............................................................................ 22
2.2.5 Context Compatibility ........................................................ 23
2.2.6 Justification of Recommendations ...................................... 23
2.2.7 Sufficiency of Information ................................................. 24
2.3 Similarity Calculations in Recommendation Systems ................... 25
2.3.1 Cosine Based Similarity ..................................................... 25
2.3.2 Correlation Based Similarity .............................................. 26
2.3.3 Adjusted Cosine Similarity ................................................. 27
2.4 Evaluation of Recommendations................................................... 27
2.4.1 Prediction Metrics .............................................................. 28
2.4.2 Quality of the set of recommendations ............................... 29
2.4.3 Quality of the List of Recommendations ............................ 30
2.4.4 Novelty and Diversity ........................................................ 30
2.4.5 Stability .............................................................................. 31
2.4.6 Reliability .......................................................................... 31
2.5 Cloud Computing in Recommendation Systems ........................... 32
2.6 Summary ...................................................................................... 35
Chapter 3 Real-Time Venue Based Recommendation System ........................ 37
3.1 Location Based Recommender Systems ........................................ 38
3.1.1 Geo Tagged Media Based................................................... 38
3.1.2 Point Location Based ......................................................... 39
3.1.3 Trajectory Based ................................................................ 40
3.2 Distinguishing Features of Locations ............................................ 40
3.2.1 Location Hierarchy ............................................................. 40
3.2.2 Distance of Locations and Users......................................... 41
3.2.3 Sequential Ordering............................................................ 41
3.3 Motivation .................................................................................... 42
3.4 Related Work ................................................................................ 45
3.4.1 Matrix Factorization Techniques ........................................ 48
3.4.2 Explicit Rating Techniques ................................................ 50
3.4.3 Implicit Rating Techniques ................................................ 51
3.4.4 Route Recommendation Techniques .................................. 51
3.4.5 Locations Recommendation Techniques ............................ 53
3.4.6 Group Recommendation Techniques .................................. 55
3.5 Quantitative Analysis .................................................................... 57
3.5.1 Datasets .............................................................................. 57
3.5.2 Techniques ......................................................................... 58
3.5.3 Experiments ....................................................................... 59
3.5.4 Observations ...................................................................... 60
3.6 Real-Time Venue Recommendation Model .................................. 66
3.6.1 System Architecture ........................................................... 70
3.6.2 Proposed Algorithm ........................................................... 70
3.6.3 Complexity of Clustering Algorithm .................................. 73
3.6.4 Complexity of Ranking Algorithm ..................................... 73
3.7 Formal Verification ...................................................................... 73
3.7.1 High Level Petri Nets ......................................................... 73
3.7.2 SMT-Lib and Z3 Solver ..................................................... 74
3.7.3 Modeling and Analysis of Proposed Algorithm .................. 76
3.7.4 Verification Property .......................................................... 78
3.8 Experimental Setup and Results .................................................... 78
3.8.1 Experimental Setup ............................................................ 78
3.8.2 Results ............................................................................... 79
3.9 Summary ...................................................................................... 90
Chapter 4 A Smart Food Recommendation System ........................................ 91
4.1 Health Recommendation Systems ................................................. 92
4.1.1 Significance ....................................................................... 95
4.2 Motivation .................................................................................... 98
4.3 Related Work .............................................................................. 100
4.4 Proposed Model .......................................................................... 101
4.4.1 Diet-Right Architecture .................................................... 102
4.4.2 Proposed Algorithm ......................................................... 103
4.4.3 Ranked List Generation .................................................... 107
4.4.4 Complexity of Proposed Algorithm .................................. 111
4.5 Formal Verification .................................................................... 112
4.5.1 Modeling and Analysis of Proposed Algorithm ................ 113
4.5.2 Verification Property ........................................................ 115
4.6 Experimental Setup and Results .................................................. 115
4.6.1 Experimental Setup .......................................................... 115
4.6.2 Results ............................................................................. 116
4.7 Summary .................................................................................... 130
Chapter 5 Conclusion and Future Work ....................................................... 131
5.1 Conclusions ................................................................................ 132
5.1.1 Real-Time Venue Recommendations ............................... 132
5.1.2 Food Recommendation ..................................................... 134
5.2 Opportunities in Recommendation Systems ................................ 136
5.2.1 Healthcare ........................................................................ 136
5.2.2 Transportation .................................................................. 136
5.2.3 Tourism ............................................................................ 136
5.2.4 Education ......................................................................... 137
5.3 Future Directions ........................................................................ 137
5.3.1 Real-World Factors and Group Recommendation ............ 137
5.3.2 Balance in Accuracy and Diversity of Recommendations . 138
5.3.3 Cold Start Problem ........................................................... 139
5.3.4 Extension in Food Recommendations ............................... 139
Chapter 6 References ................................................................................. 141
LIST OF FIGURES
Figure 1.1 Overview of Problems Tackled and Contributions .................................. 12
Figure 2.1 Components of Recommendation Systems .............................................. 15
Figure 2.2 Hierarchy of Recommendation Systems .................................................. 16
Figure 2.3 Cloud Computing ................................................................................... 34
Figure 3.1 Services offered by LBRS ....................................................................... 38
Figure 3.2 Visit patterns of users in a LBRS ............................................................ 42
Figure 3.3 Categorization of Techniques used in LBRS ........................................... 48
Figure 3.4 Concept of Matrix Factorization ............................................................. 49
Figure 3.5 Basic Concept of Implicit Rating using Check-Ins .................................. 51
Figure 3.6 Overview of Location Recommendations ................................................ 55
Figure 3.7 Precision of Gowalla Dataset .................................................................. 60
Figure 3.8 Recall of Gowalla Dataset ....................................................................... 61
Figure 3.9 F-Measure of Gowalla Dataset ................................................................ 61
Figure 3.10 Precision of Foursquare Dataset ............................................................ 62
Figure 3.11 Recall of Foursquare Dataset ................................................................ 63
Figure 3.12 F-Measure of Foursquare Dataset .......................................................... 63
Figure 3.13 Precision of MovieLens Dataset ............................................................ 64
Figure 3.14 Recall of MovieLens Dataset ................................................................ 65
Figure 3.15 F-Measure of MovieLens Dataset ......................................................... 65
Figure 3.16 System diagram for RTVR Model ......................................................... 66
Figure 3.17 User Placement in the Relevant Cluster ................................................. 69
Figure 3.18 System Architecture .............................................................................. 70
Figure 3.19 HLPN Model for the Proposed RTVR Algorithms ................................ 76
Figure 3.20 Precision with Clustering ...................................................................... 79
Figure 3.21 Recall with Clustering ........................................................................... 80
Figure 3.22 F-Measure with Clustering .................................................................... 80
Figure 3.23 Time Comparison with and without Clustering ..................................... 81
Figure 3.24 Precision without Clustering ................................................................. 82
Figure 3.25 Recall without Clustering ...................................................................... 82
Figure 3.26 F-Measure without Clustering .............................................................. 83
Figure 3.27 Difference between Precision Values .................................................... 83
Figure 3.28 Difference between Recall Values ......................................................... 84
Figure 3.29 Difference between F-Measure Values .................................................. 84
Figure 3.30 Scalability over 5 Recommendations .................................................... 85
Figure 3.31 Scalability over 10 Recommendations ................................................... 85
Figure 3.32 Scalability over 15 Recommendations ................................................... 86
Figure 3.33 Scalability over 20 Recommendations ................................................... 86
Figure 3.34 Scalability Comparison w.r.t Precision .................................................. 87
Figure 3.35 Scalability Comparison w.r.t Recall ...................................................... 87
Figure 3.36 Scalability Comparison w.r.t F-Measure ............................................... 88
Figure 3.37 Number of Selected Clusters vs Precision ............................................. 88
Figure 3.38 Number of Selected Clusters vs Recall .................................................. 89
Figure 3.39 Time Comparison between Single Node and Cloud ............................... 89
Figure 4.1 Hierarchy of Health Recommendation System ........................................ 95
Figure 4.2 Architecture of Diet-Right .................................................................... 102
Figure 4.3 Graph Representation of the Problem ................................................... 104
Figure 4.4 HLPN Model for the Proposed Diet-Right Algorithms .......................... 113
Figure 4.5 Comparison on Accuracy ...................................................................... 116
Figure 4.6 Comparison on Time Complexity ......................................................... 117
Figure 4.7 Comparison on Average RMSE ............................................................ 118
Figure 4.8 Tradeoff between Numbers of Ants to Time Complexity ...................... 118
Figure 4.9 Tradeoff between number of Ants and Average RMSE ......................... 119
Figure 4.10 Cost over Varying No. of Ants and Iterations ...................................... 120
Figure 4.11 Convergence Time of Different Diseases ............................................ 120
Figure 4.12 Cost Comparisons of Diseases ............................................................ 121
Figure 4.13 Accuracy of Recommendations ........................................................... 121
Figure 4.14 Precision @ K=10 ............................................................................... 122
Figure 4.15 Recall @ K=10 ................................................................................... 123
Figure 4.16 F-Measure @ K =10 ........................................................................... 123
Figure 4.17 Precision @ K =20 .............................................................................. 124
Figure 4.18 Recall @ K =20 .................................................................................. 124
Figure 4.19 F-Measure @ K =20 ........................................................................... 125
Figure 4.20 Average RMSE @ K =10 .................................................................... 125
Figure 4.21 Average RMSE @ K =20 .................................................................... 126
Figure 4.22 Time Comparison @ K =10 ................................................................ 126
Figure 4.23 Time Comparison @ K =20 ................................................................ 127
Figure 4.24 Precision ............................................................................................. 128
Figure 4.25 Recall ................................................................................................. 128
Figure 4.26 F-Measure ........................................................................................... 129
Figure 4.27 Convergence Time of Single Node and Cloud ..................................... 129
LIST OF TABLES
Table 2.1 Comparison of Content Based and CF Based Techniques ......................... 19
Table 2.2 Description of combinations used in Hybrid Technique ............................ 20
Table 2.3 Criteria for recommendations in different research ................................... 24
Table 2.4 User-location Matrix ................................................................................ 25
Table 2.5 Summary of Evaluation Metrics ............................................................... 32
Table 3.1 Example of Explicit Rating ...................................................................... 50
Table 3.2 Strength and Weaknesses of the Selected Techniques .............................. 56
Table 3.3 Summary of Techniques ........................................................................... 56
Table 3.4 Data type for HLPN Model ...................................................................... 75
Table 3.5 Places and mapping used in HLPN model ................................................ 75
Table 3.6 Experimental Setup .................................................................................. 78
Table 4.1 Data Types for HLPN Model ................................................................. 112
Table 4.2 Places and mapping used in HLPN model .............................................. 112
Table 4.3 Experimental Setup ................................................................................ 115
LIST OF ABBREVIATIONS
ACO Ant Colony Optimization
CF Collaborative Filtering
COFID Composition of Foods Integrated Dataset
COR Pearson's Correlation
CORC Pearson's Correlation Constrained
CSF Cerebrospinal Fluid
DCG Discounted Cumulative Gain
FRS Food Recommendation System
GA Genetic Algorithms
GPS Global Positioning System
JMSD Jaccard plus Mean Squared Difference
KNN K-Nearest Neighbor
LARS Location Aware Recommendation System
LBRS Location Based Recommendation System
LCF Location-based Collaborative Filtering
MAE Mean Absolute Error
MAS Mean Absolute Shift
MPC Most-Preferred-Category-Based
MSD Mean Squared Difference
MSE Mean Square Error
NMAE Normalized Mean Absolute Error
PCF Preference-based Collaborative Filtering
POI Point of Interest
RBC Red Blood Cells
RMSE Root Mean Square Error
ROC Receiver Operating Characteristics
RS Recommendation System
RTVR Real-Time Venue Recommendation
SR Spearman's Rank-Order Correlation
SVD Singular Value Decomposition
TF-IDF Term Frequency-Inverse Document Frequency
TOPSIS Technique for Order Preference by Similarity to Ideal Solution
UBCF User Based Collaborative Filtering
V2VCS Vehicle-to-Vehicle Communication System
VRS Venue Recommendation System
WBC White Blood Cells
Chapter 1
Introduction
1.1 Recommendation Systems
Recommendation systems were introduced in the early 1990s to deal with the
challenges of personalized and automatic data retrieval from diverse information
sources [1]. The basic principle of a recommendation system is to compute a list of
items for a user by considering the user’s preferences, likes and dislikes, ratings, and
other implicit or explicit relationships between users or between users and items.
Various knowledge discovery techniques [2-5] are applied to users’ contextual and
historical data to extract information, services, and products that have a significant
effect on users’ preferences. Once extraction is performed, the recommendation models
filter the extracted data and attempt to predict the items a user may prefer in the future.
However, a recommendation system must consider various factors, such as stability,
accuracy, diversity, and novelty, to balance the user’s preferences in the recommendations
[1].
With the evolution of smartphones and online social networking applications,
user-generated online content and feedback are continuously increasing. This
content may reflect a user’s bias towards certain products, services, food, venues,
etc. Based on user-generated content, popular e-commerce and online service
providers, such as Amazon, Facebook, Flickr, and Netflix, have integrated
recommendation systems to provide personalized recommendations to their
users. For instance, Amazon generates recommendations using a collaborative filtering
model and clustering algorithms [6]. Amazon applies collaborative filtering to
item-to-item and user-to-user relationships, based on the observation that a user U who
buys one item may also be interested in buying a second item if other users similar to
U follow a similar pattern of purchasing both items. In that case, the two items are
treated as highly similar. In the same manner, Amazon applies clustering algorithms and
numerous classification algorithms to the user-user matrix to identify sets of similar
users.
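The item-to-item idea described above can be illustrated with a minimal sketch. It is not Amazon's implementation: the purchase data, the function names (`item_buyers`, `cosine`, `recommend`), and the choice of cosine similarity over binary purchase vectors are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical purchase history (illustrative data): user -> set of purchased items.
purchases = {
    "u1": {"camera", "tripod", "sd_card"},
    "u2": {"camera", "sd_card"},
    "u3": {"camera", "tripod"},
    "u4": {"novel"},
}


def item_buyers(purchases):
    """Invert the user -> items map into an item -> set-of-buyers map."""
    buyers = {}
    for user, items in purchases.items():
        for item in items:
            buyers.setdefault(item, set()).add(user)
    return buyers


def cosine(a, b):
    """Cosine similarity of two binary purchase vectors, given as buyer sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(user, purchases, top_n=3):
    """Rank items the user does not own by summed similarity to owned items."""
    buyers = item_buyers(purchases)
    owned = purchases[user]
    scores = {}
    for candidate, candidate_buyers in buyers.items():
        if candidate in owned:
            continue
        score = sum(cosine(candidate_buyers, buyers[o]) for o in owned)
        if score > 0:
            scores[candidate] = score
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


print(recommend("u2", purchases))
```

In this toy data, user `u2` owns a camera and an SD card, and the users who bought those items also bought a tripod, so the tripod is the only candidate with a non-zero score. A user whose purchases overlap with nobody else's (here `u4`) receives no recommendation.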
Recommendation systems have been applied in numerous application areas,
including movie recommendations [7-9], product recommendations [10-12], location
(venue) recommendations [12, 13], diet recommendations [14, 15], academic
recommendations [16, 17], application recommendations [18, 19], and point of interest
(POI) recommendations [20, 21]. However, the basic models and approaches usually
remain the same, regardless of the application area in which recommendation
algorithms are applied. In this dissertation, to maintain a focused approach, the research
is narrowed down to two application areas: (a) venue recommendations and
(b) food recommendations. The rapid increase in smartphone users has shifted the focus
of recent research towards these areas in order to offer users food
and venue related recommendations ubiquitously [22-25].
In the discussion to follow, a brief overview is presented of the two application
areas of recommendation systems considered in our work: (a) location based
recommendation systems and (b) food recommendation systems.
The chapter then discusses the challenges recommendation systems face, the
motivation behind our work in addressing those challenges, and our contributions.
Finally, the organization of the dissertation is presented.
1.2 Location Based Recommendation Systems
In recent years, numerous social networking applications for location-acquisition and
wireless communications were developed for smart phones and mobile devices. The
most popular among those is Facebook, Foursquare, and Instagram. In a user’s context,
location can be considered as one of the most important information. By using the
location history of a user, one can easily extract extensive knowledge about the
preferences and behavior of that particular user [26]. Use of location-based content
helps in bridging the gap between the online social networking services and the physical
world [27]. Another advantage of geo-tagged content is the modeling of new relations
among users, locations, and both. Presently, availability of huge volumes of users’
geospatial data, for instance, Foursquare datasets [28], movies datasets [29], and taxi
trace datasets [30], has motivated the research community to focus efforts on the design
of location-based recommendation systems with models based on the data extracted by
mobile social networking applications [1, 31-33]. Such systems perform venue
recommendations based on users’ location and preferences. For instance, a tourist
entering a city may need to know about popular locations of interest, such as hotels,
restaurants, shopping malls, and other popular POIs. In this scenario, the system
generates recommendations according to the preferences of the users.
The evolving mobile social networks now allow users to provide feedback (or
a tip) on, or rate, the venues they visit or check in at [3, 26, 34]. Moreover,
a huge volume of data such as videos, music, images, and text is collected by mobile
social networks. These huge volumes of collected data are also referred to as Big Data [35].
Big Data introduced new challenges for traditional recommender systems that were
initially designed without giving much consideration to scalability factors. Moreover,
modern recommender systems need to process big data while facing challenges
such as (a) data sparseness, which arises when a limited number of ratings per item results in a sparse
user-to-item matrix [31], (b) cold start, which occurs when the system generates
recommendations for a user who is new to the system and has insufficient historical data [31,
36], and (c) scalability, which refers to the ability of a recommendation system to maintain
good performance under an increased load [36-38]. The aforementioned challenges, if
not tackled properly, may degrade the recommendation quality [1, 31].
The core objective of our study is to develop a recommendation model that not
only addresses the scalability issues of existing recommendation systems, but also uses
optimization techniques to balance multiple conflicting objectives in producing an optimal
list of recommendations. To address scalability challenges, this research leverages cloud
computing to perform real-time processing of large-scale data. The model is based on the
Hadoop MapReduce framework to enable parallel computation on multiple nodes.
Moreover, the data sparsity problem is handled by pre-processing the data to filter
insignificant/redundant data out of online recommendation. The pre-processing phase
mitigates data sparsity by reducing the dataset. To handle the cold
start problem, our model maintains a pre-computed ranked list of popular users and
venues. Using this list, our model compares the preferences of the new user with those of the
existing popular users, i.e., users with top rankings. The model then suggests
recommendations according to the top user whose preferences best match those of the
new user.
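As an illustration of the cold start strategy outlined above, the following Python sketch matches a new user against a pre-computed list of top-ranked users and borrows the venue list of the closest match. The data layout, the function names (`cosine`, `cold_start_recommend`), and the use of cosine similarity over category weights are assumptions made for illustration only; they are not taken from the model presented in this dissertation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse preference vectors (dicts)."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cold_start_recommend(new_user_prefs, top_users, n=3):
    """Return up to n venues from the top-ranked user closest to the new user.

    top_users is a list of (user_id, prefs, ranked_venues) tuples that is
    assumed to have been pre-computed offline (hypothetical layout).
    """
    if not top_users:
        return []
    best = max(top_users, key=lambda user: cosine(new_user_prefs, user[1]))
    return best[2][:n]


# Hypothetical pre-computed list of popular users and their ranked venues.
top_users = [
    ("u1", {"coffee": 0.9, "museum": 0.1}, ["Cafe A", "Cafe B", "Gallery C"]),
    ("u2", {"nightlife": 0.8, "coffee": 0.2}, ["Club D", "Bar E"]),
]
suggested = cold_start_recommend({"coffee": 1.0}, top_users, n=2)
print(suggested)
```

In this sketch the new user only states an interest in coffee, so the venue list of the coffee-oriented top user is returned. Any other similarity measure could be substituted for `cosine`.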
1.3 Food Recommendation Systems
In the last few years, recommendation systems have been applied in many areas of
the health sector. Numerous recommendation models have been proposed to target specific
health applications [39-43]. One important emerging area of health recommendations is
food/diet recommendation. Due to a lack of concise information about healthy diets,
people mostly have to rely on medications instead of adopting a preventive approach,
such as eating the required food items. The selection of a proper diet is critical for patients suffering
from various diseases. eHealth initiatives and research efforts aim to offer various
pervasive applications for novice end users to improve their health [44]. Several studies
show that inappropriate and inadequate daily dietary intake is a major cause of
various health issues and diseases [5, 45, 46]. For instance, a study conducted by the World
Health Organization (WHO) estimates that around 30% of the world’s total population
is suffering from various diseases, and that 60% of deaths each year in children are related
to malnutrition [47, 48]. Another study by WHO reports that inadequate and
imbalanced intake of food causes around 9% of heart attack deaths, about 11% of
ischemic heart disease deaths, and 14% of gastrointestinal cancer deaths worldwide
[45]. Moreover, around 0.25 billion children are suffering from Vitamin-A deficiency
[49], 0.2 billion people are suffering from iron deficiency anemia [50], and 0.7 billion
people are suffering from iodine deficiency [51]. Generally, a person remains unaware
of major causes behind deficiency or excess of various vital substances, such as
calcium, proteins, and vitamins, and how to normalize such substances through a
balanced diet.
Several works in the literature [15, 46, 52-57] proposed recommendation systems
related to food. These systems can be categorized as: (a) food recommendation systems
[46, 52], (b) menu recommendations [54], (c) diet plan recommendations [15], (d)
health recommendations for different diseases such as diabetes and cardiovascular disease [55,
56], and (e) recipe recommendations [53, 57]. The aforementioned systems either provide
recommendations for some specific disease or balance the diet without
considering information about any disease or nutrition deficiency in the users. For
instance, in [52], a food recommendation system is proposed for diabetic patients. The
system recommends food for diabetic patients without considering the diabetes level
that may fluctuate frequently. Similarly, the authors in [46] do not consider the nutrition
factors that have significant importance for a balanced diet recommendation.
Keeping in view the aforesaid facts, it is of paramount importance to maintain
a balanced intake of food, particularly for people suffering from a deficiency or excess of
certain vital ingredients. However, it is quite challenging for a common person to keep
track of personal food requirements because of the massive diversity of dietary
components in different food items. A systematic food recommendation system is
required to recommend appropriate food considering the disease of the person. The
major challenge in designing such a system is handling large volumes of data
in terms of ingredients, quantity, nutritional facts, and user preferences, while simultaneously
taking into consideration a person’s pathological reports. The system must be scalable
enough to handle recommendation queries from all over the globe. Moreover, in food
recommendations, it is critical to generate accurate and relevant
recommendations. In this dissertation, a cloud based food recommendation system
called Diet-Right is presented that considers the users’ pathological test results and
recommends a list of optimal food items. To achieve optimal results that best match
the users’ preferences, an Ant Colony Optimization (ACO) based algorithm is used.
Moreover, cloud computing is used to scale computing and storage resources
on demand.
1.4 Challenges
The existing approaches used in various recommendation systems mostly rely on
collaborative filtering (CF) [1, 27, 31]. However, most of the existing approaches commonly
face issues such as data sparseness, cold start, and scalability. This section briefly
discusses research gaps in related work that significantly affect the performance of
recommendation systems, and highlights some unsolved problems of
previous research that still affect the performance of
current location based recommendation systems (LBRS). These problems include:
1.4.1 Scalability
In memory-based CF recommendation systems, user rating data are used to apply
simple methods that compute similarities between items or users [36-38].
Neighborhood based CF (e.g., K-nearest neighbor) is one such memory-based
CF approach. However, scalability is the main issue in such systems because a major
requirement is the real-time parsing of massive volumes of data. When this requirement is not met,
efficiency and performance suffer. Consequently, such systems are not capable
of handling big data. To overcome the scalability problems, model-based CF is applied in
some of the existing techniques [26, 33, 58]. Compared to memory-based approaches,
model-based approaches are better in the sense that they help in reducing
the size of the user-item rating matrix, which decreases the online processing time [1, 27].
1.4.2 Cold Start
Cold start is the problem that arises when the system generates recommendations for a user who is
new to the system. This problem occurs in many existing CF-based recommendation
systems [31, 36]. When the system does not have enough records available for the
new user, it is almost impossible to compute similarity measures. Inadequate entries
cause similarity computations to return zero values, which degrades the performance and
quality of recommendations.
1.4.3 Data Sparseness
When users visit only a limited number of locations, the user-to-location check-in
matrix becomes sparse. The negative effect of data sparseness is the non-optimal computation of the
nearest neighbor set of users similar to a particular user, which also affects the accuracy
of recommendations. Moreover, the performance of existing LBRS is also affected
by the sparseness of the user-to-user relationship matrix when it is directly manipulated with
CF-based models [31].
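To make the effect of sparseness concrete, the short Python sketch below measures how empty a user-to-location check-in matrix is and shows that two users with no co-visited location give a neighborhood method nothing to compare. The dictionary layout and the names `sparsity` and `co_visited` are hypothetical and chosen only for this example.

```python
def sparsity(checkins, n_users, n_locations):
    """Fraction of empty cells in the user-to-location check-in matrix.

    checkins maps (user, location) -> visit count; absent pairs are empty cells.
    """
    filled = sum(1 for count in checkins.values() if count > 0)
    return 1.0 - filled / (n_users * n_locations)


def co_visited(checkins, user_a, user_b):
    """Locations visited by both users; an empty set means no basis for similarity."""
    visited_a = {loc for (user, loc), c in checkins.items() if user == user_a and c > 0}
    visited_b = {loc for (user, loc), c in checkins.items() if user == user_b and c > 0}
    return visited_a & visited_b


# Three users, four known locations, one check-in location per user.
checkins = {("u1", "cafe"): 3, ("u2", "museum"): 1, ("u3", "park"): 2}
print(sparsity(checkins, n_users=3, n_locations=4))
print(co_visited(checkins, "u1", "u2"))
```

With only three of the twelve cells filled, no pair of users shares a location, so any similarity computed over co-visited locations has no data to work with.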
1.4.4 Over Specialization Problem
The over specialization problem occurs when a recommendation system restricts users to
only those recommendations that match their preferences or that are already rated
by them [59]. The problem prevents users from discovering new items or locations.
However, many recommendation systems also focus on diversity as an important
feature of the recommendations. Diversity indicates how distinct the recommendations
are when compared to each other [60]. At the same time, the focus should also be on
maintaining the balance between diversity and similarity [3, 61, 62]. Therefore, a
tradeoff between diversity and matching user preferences is still a challenging problem
for the recommendation systems.
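One simple way to quantify how distinct the items of a recommendation list are from each other is the mean pairwise distance between them. The Python sketch below uses Jaccard distance over item feature sets; this particular measure and the name `intra_list_diversity` are assumptions for illustration and are not necessarily the definition used in [60].

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two feature sets; 0.0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def intra_list_diversity(items):
    """Mean pairwise Jaccard distance between the feature sets of the items."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Feature sets (e.g., venue categories) of two hypothetical recommendation lists.
similar_list = [{"coffee", "wifi"}, {"coffee", "wifi"}]
mixed_list = [{"coffee"}, {"museum"}, {"park"}]
print(intra_list_diversity(similar_list), intra_list_diversity(mixed_list))
```

A list of near-identical items scores close to 0 and a list of unrelated items scores close to 1, which gives a concrete quantity to trade off against how well the list matches the user's preferences.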
1.4.5 Recommendation of Popular Objects
In some cases, the focus of the recommendation systems is to recommend only those
items that are popular amongst others and are likely to be highly rated by the users. In
this case, the items or locations that are less popular may be overlooked [63]. Popular
items or locations are easy to discover by users, as compared to unpopular items.
Therefore, a recommendation list must also include the less popular items or locations
that are unlikely to be discovered by the users.
1.4.6 Attacks on Recommendations
One of the main challenges faced by recommendation systems is security.
Recommendation systems are widely used in e-commerce applications and are likely
to be targeted by malicious attacks. The attackers may try to hinder or promote some
locations or items unjustly [64]. Therefore, keeping in view the threat of attacks, a good
recommendation system should be equipped with a wide range of tools in order to
prevent different kinds of attacks. In [65], the authors exposed eight different strategies
of attackers which must be considered by the recommendation systems in their
prevention tools.
1.5 Scope of Research
In section 1.4, some of the challenges faced by recommendation systems are
highlighted. Among the challenges referenced above, this research focuses on the major
challenges mentioned in the state-of-the-art literature. The challenges addressed in this
work include: (a) cold start, (b) scalability, and (c) data sparseness. Moreover, real-time
parameters for venue recommendations are also considered. The following paragraphs
elaborate on the challenges addressed in this research.
A user’s real-time location is an important parameter for recommendations. Other
parameters may include a user’s current speed and direction. All of the aforementioned
parameters, including a user’s past interaction pattern constitute a user’s context.
Nowadays, smart phone manufacturers embed various kinds of sensors that allow real-time
tracking of user activities. Under a context that varies in real time, recommendation
is challenging because of the large diversity in users’ contexts and historic patterns [66, 67].
To generate real-time recommendations, the recommendation system needs to
consider the following factors: (a) personal preferences, (b) past check-ins, (c) current
context, such as time and location, (d) collaborative social opinions (other
individuals’ preferences), (e) ratings of the users, and (f) distance from the location. In
most of the existing literature [68-70], these features are simply ignored to keep the
model simple, at the cost of recommendation quality.
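Of the factors listed above, the distance from the user's current location is the easiest to compute directly from check-in coordinates. The Python sketch below uses the haversine great-circle formula to keep only venues within a given radius, nearest first. The function names, the venue tuple layout, and the choice of the haversine formula are assumptions made for illustration; the text does not state which distance measure the model uses.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(user_lat, user_lon, venues, radius_km):
    """Names of venues within radius_km of the user, nearest first.

    venues is a list of (name, latitude, longitude) tuples (hypothetical layout).
    """
    scored = [(haversine_km(user_lat, user_lon, lat, lon), name) for name, lat, lon in venues]
    return [name for dist, name in sorted(scored) if dist <= radius_km]


venues = [("A", 0.0, 0.01), ("B", 0.0, 1.0), ("C", 0.0, 0.005)]
print(nearby(0.0, 0.0, venues, radius_km=5.0))
```

Such a filter would typically run before the more expensive preference and similarity computations, so that only venues reachable from the user's current position are ranked.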
The cold start problem in state-of-the-art collaborative filtering (CF) recommendation
systems [71] arises when a recommendation is generated for a new user. The main
reason for this behavior is that the system has only limited information about a new user with which to
perform similarity measures. As a result, similarity computations
return zero values, which reduces the recommendation quality.
Most traditional recommendation systems were not designed to be scalable because
they handled only small amounts of historical data [36, 72, 73]. The performance of such systems
degraded as the data size increased continuously. To handle the scalability issue, a few
works considered applying data mining and machine learning techniques to
reduce/filter out insignificant data to reduce the computation time [1, 31]. However,
there exists a tradeoff between the reduced dataset size and quality of recommendation.
The reduction in dataset size may help in faster online processing, but at the cost of
decrease in recommendation quality [58]. Therefore, a careful approach needs to be
devised to enforce a balance between recommendation quality and dataset size.
Another problem faced by traditional recommendation systems is data sparsity [74-76].
Data sparseness in a recommendation system occurs when there is limited data in the
user-item matrix. Users visiting a limited number of locations, or rating only a
limited number of items, results in a sparse user-to-location check-in or user-to-item
matrix. The negative effect of data sparseness is the non-optimal computation of the
nearest neighbor set of similar users for a particular user, which in turn affects the accuracy
of recommendations. Moreover, the performance of existing recommendation systems
is also affected by the sparseness of the user-to-user relationship matrix when it is directly
manipulated with CF-based models [31].
1.6 Motivation
Popular social networks are the main driver of the explosive usage of smart
phones. Recent years have seen a prolific increase in network-enabled mobile devices.
According to a market research report, in 2016 alone a total of 1495.36 million
smart phones with Wi-Fi capability were sold, and one-third
of the world’s total population was projected to own a smartphone by 2017 [77]. One of
the popular application areas of fully connected networks is mobile social networks.
These mobile social networks offer a variety of services for smart phone users to share
their experiences and content on a massive scale. As of January 2017, Foursquare had
gathered almost 9 billion check-ins. Moreover, over 55 million users use different
Foursquare services each month [28]. Foursquare
collects different information from the users, such as users’ check-ins, which
include users’ geographical information and users’ comments about the venue. Users
of these applications are provided with options to enter “check-in” information at
different venues to share their experience and knowledge by giving a tip or feedback
[3, 26, 27]. Moreover, these services also keep track of the users’ geospatial check-in
data such as time and longitude/latitude [3]. More recently, the availability of huge
volumes of users’ geospatial data has motivated the research community to focus its
efforts on the design of various Venue Recommendation Systems (VRS) that are based
on the information and data extracted from mobile social networking applications [1,
3, 26, 27]. Such systems recommend to users different venues that are
directly related to the users’ preferences. The main focus of our research is to develop a
model that generates real-time venue recommendations for the users by incorporating
the MapReduce framework to process large-scale data. The factors that are considered in
this research include users’ preferences, current context, past check-ins, rankings,
geospatial characteristics, and collaborative social opinion.
The second area of recommendation systems targeted in our thesis is health
recommendation. With the increase of smart phone usage in recent past, the trend of
using the smart phone for health related applications has also seen a significant increase.
A recent study [78] showed that 62% of smart phone users queried health related
information. One emerging area of health related recommendation systems is
diet/food recommendation. Due to the diversity in food items/components and the large
number of dietary sources, it is a challenging task to perform a balanced selection of diet
patterns that fulfill one’s nutrition needs. In particular, the selection of a proper diet is
critical for patients suffering from various diseases. Therefore, a systematic food
recommendation system is desired to recommend appropriate food considering the
disease of the person. The major challenge in designing such a system is handling
large volumes of data in terms of ingredients, quantity, nutrition facts, and people’s
preferences, and simultaneously taking into consideration a person’s pathological
reports. The system must be scalable enough to handle recommendation queries from
all over the globe. A solution to the aforementioned challenge is the use of cloud
computing.
1.7 Contributions
Our three main contributions are: (a) a quantitative analysis of selected techniques on
different datasets, (b) the design of a real-time venue recommendation system (VRS) for
individuals, and (c) a food recommendation system for users. Popular social networks
such as Foursquare, Facebook, and Twitter are the main services used by numerous
users across the world. Similarly, a variety of other social networking sites and
applications share huge amounts of unstructured data every single minute of the day.
There are numerous challenges in handling and parsing this huge volume of data. As a
first step, a quantitative analysis is conducted of state-of-the-art recommendation
techniques over the MovieLens, Gowalla, and Foursquare datasets. The objective was
to analyze the behavior of the selected techniques on different datasets in terms of
precision, accuracy, and F-Measure. Four techniques are compared, namely: (a)
Location Aware Recommendation System (LARS), (b) Geographical Probabilistic
Factor Model (GPFM), (c) Latent Dirichlet Allocation (LDA), and (d) Random Walk
with Restart (RANDOM). Critical analysis of the aforementioned techniques has been
conducted against all of the datasets in terms of precision, recall, and F-Measure. The
analysis is presented in Section 2.8.
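For reference, the evaluation measures named above can be computed for a single recommendation list as in the following Python sketch. It uses the standard set-based definitions and the balanced F-Measure (F1); the function name and the choice of the balanced variant are assumptions, since this chapter does not give the exact formulas used in Section 2.8.

```python
def precision_recall_f1(recommended, relevant):
    """Set-based precision, recall, and balanced F-Measure for one user."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Four venues were recommended; the user actually visited three venues.
p, r, f = precision_recall_f1(["a", "b", "c", "d"], ["a", "c", "e"])
print(p, r, f)
```

Here two of the four recommended venues are relevant (precision 0.5) and two of the three relevant venues were recommended (recall 2/3), giving an F-Measure of 4/7.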
The second contribution of this dissertation is a real-time venue
recommendation system. To achieve the objective of real-time venue
recommendations, a hybrid model is developed by combining the CF method with the
K-means clustering technique and the K-Nearest Neighbor (KNN) ranking technique.
Moreover, the MapReduce model is used for clustering venues in the Foursquare
dataset, which are processed in parallel on the cloud. K-means clustering is used to
cluster potential venues, and KNN ranking is used to rank the top venues for final
recommendations. Cloud-based infrastructure is
utilized to process, compare, mine, and manage datasets for real-time recommendations
in a scalable architecture.
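The way K-means clustering maps onto the MapReduce model can be sketched in a few lines of Python: the map phase assigns every venue to its nearest centroid, and the reduce phase averages each group into a new centroid. The sketch below runs in a single process and does not use Hadoop; the function names and the two-dimensional venue coordinates are hypothetical, and the KNN ranking step is omitted.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def map_phase(points, centroids):
    """Map: emit a (cluster_index, point) pair for every venue."""
    return [(nearest(p, centroids), p) for p in points]


def reduce_phase(pairs, centroids):
    """Reduce: average the points of each cluster into a new centroid."""
    groups = defaultdict(list)
    for k, p in pairs:
        groups[k].append(p)
    updated = []
    for k, old in enumerate(centroids):
        members = groups.get(k)
        if not members:
            updated.append(old)  # an empty cluster keeps its previous centroid
        else:
            updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
    return updated


def kmeans(points, centroids, iterations=10):
    """Repeat map and reduce until the centroids stop moving."""
    for _ in range(iterations):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


venues = [(0, 0), (0, 2), (10, 10), (10, 12)]
print(kmeans(venues, centroids=[(0, 0), (10, 10)]))
```

In a distributed setting the map phase would run on the nodes holding the venue partitions and only the per-cluster sums and counts would need to travel to the reducers, which is what makes the clustering step parallelizable.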
The third contribution of the dissertation is related to food recommendations. Due
to the diversity in food components and the large number of dietary sources, it is challenging
to perform real-time selection of diet patterns that fulfill one’s nutrition needs.
In particular, the selection of a proper diet is critical for patients suffering from various
diseases. This study highlights the issue of selecting a proper diet that fulfills
patients’ nutrition requirements. To address this issue, a cloud based food
recommendation system called Diet-Right is presented for dietary recommendations
based on users’ pathological reports. The model uses an ant colony algorithm to generate
an optimal food list and recommends suitable foods considering users’ pathological
reports. A high-level overview of the problems tackled in this dissertation, along with the
proposed techniques, is provided in Figure 1.1.
Figure 1.1 Overview of Problems Tackled and Contributions
Chapters: Challenges & Contributions
Quantitative Analysis (Chapter 2)
Challenges:
• Large-scale data sets
• Lack of standardized models/formulas
Contributions:
• Quantitative analysis on the datasets
• Datasets used: Foursquare, MovieLens, and Gowalla
Real-Time Venue Recommendation System (Chapter 3)
Challenges:
• Real-time factors not considered
• Not scalable
Contributions:
• Clustering using K-Means
• Ranking using KNN
• Cloud based model for scalability
• Real-time venue recommendation
Smart Food Recommendation System (Chapter 4)
Challenges:
• Diversity in available food components and each individual's nutrition requirements for specific disease
Contributions:
• Computation of required proportion of nutrition based on pathological reports for a person's specific disease
• Use of Ant colony to compute optimal solution
• Cloud based implementation to address scalability and to generate real-time food recommendations
The list of published and submitted articles from the above mentioned contributions is
as follows.
Published Articles:
1. Rehman, F., Khalid, O., & Madani, S. A. (2017). A comparative study of
location-based recommendation systems. The Knowledge Engineering Review,
32.
2. Rehman, F., Khalid, O., Bilal, K., & Madani, S. A. (2017). Diet-Right: A Smart
Food Recommendation System. KSII Transactions on Internet & Information
Systems, 11(6).
Submitted Article:
1. Rehman, F., Khalid, O., & Madani, S. A. (2017). Real-Time Context Aware
Cloud Based Venue Recommendation System. IEEE Access.
1.8 Organization of Dissertation
The rest of the dissertation is organized as follows. Chapter 2 provides an overview of
existing works that are closely related to this research. In particular, the criteria for
recommendations are illustrated, followed by basic similarity measures and the evaluation
of recommendations. Moreover, the challenges faced by recommendation systems are
presented. Furthermore, the models and techniques proposed in the literature for LBRS
and food recommendations are critically investigated. Chapter 3 presents the
Real-Time Venue Recommendation (RTVR) model based on cloud infrastructure and
service-based interfaces in order to process, compare, mine, and manage large datasets
for real-time recommendations in a scalable architecture. This research uses
collaborative filtering along with K-means clustering to pre-process large datasets. For
ranking, K-Nearest Neighbors is used to compute the personalized ranking list of
venues. The quantitative analysis of the datasets used in the recommendation system is also
performed. Chapter 4 presents a smart, cloud based food recommendation system for dietary
recommendations based on users’ pathological reports. A hybrid technique is
used that combines a knowledge based technique with Ant Colony Optimization to
generate a ranked list of optimized food items. Chapter 5 discusses various
challenges in developing scalable recommendation systems and highlights our contributions
towards handling these challenges. Moreover, the chapter outlines some
opportunities in the area of recommendation systems and directions for future work.
Chapter 2
Overview of Recommendation Systems
2.1 Overview
Recommendation systems have a profound association with cognitive science [79] and
information retrieval [80]. For the past few years, there has been significant research in
the area of recommendation systems [31, 65, 81-90]. In the literature, authors used
different approaches to deal with the challenges of personalized data retrieval from
diverse sources of information [11, 91, 92]. The majority of the existing work utilized
techniques such as content based filtering and collaborative filtering with the intention
of balancing various factors, such as accuracy, diversity, novelty, and familiarity [27, 59].
Moreover, the existing work also attempted to deal with challenges such as
scalability, cold start, and data sparseness that are common in recommendation systems
[71, 74]. Figure 2.1 shows the components of recommendation systems in terms of types,
algorithms, and data sets. The major algorithms used in recommendation systems are
content based, collaborative filtering based, and hybrid techniques. These algorithms
are used for information filtering on different datasets that include user’s profiles, user’s
trajectories, user’s location information, and user’s preferences. Finally, different types
of recommendations are generated that include locations, routes, movies, products, and
books. Figure 2.2 shows a general hierarchy of the recommendation techniques.
Figure 2.1 Components of Recommendation Systems
Recommendations: Locations, Routes, Products, Movies, Books
Algorithms: Content-Based, CF-Based, Hybrid
Datasets: User's Profile, User's Preferences, User's Locations, User's Trajectories
Figure 2.2 Hierarchy of Recommendation Systems
2.1.1 Content-based Recommendation Methods
The main idea of content-based or cognitive methods (e.g., [89, 93-95]) is to recognize
the common characteristics of particular items that were already evaluated and rated by
users. Based on those characteristics, a content-based system finds and recommends a
new item whose characteristics match the user’s preferences. Moreover,
detailed information about the item or user is implicitly represented in the form of a
feature vector [96]. For items such as text documents [94, 95] and web
documents [89, 93], the feature vector usually comprises the Term Frequency-Inverse
Document Frequency (TF-IDF) weights of the most frequent keywords [80]. The
TF-IDF approach was also used to predict the ratings of a user for any new item [94]. Bayesian
approaches were also used in the literature for the same purpose [89, 94, 97].
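As a concrete illustration of a TF-IDF feature vector, the Python sketch below computes one weight per term for a small set of tokenized item descriptions. It uses one plain variant (relative term frequency multiplied by the logarithm of the inverse document frequency); many weighting variants exist, and the function name and sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    weight = (count of term in document / document length)
             * log(number of documents / number of documents containing term)
    """
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


docs = [["jazz", "club", "jazz"], ["club", "bar"]]
vectors = tf_idf(docs)
print(vectors[0])
```

The term "club" appears in every document and therefore receives a weight of zero, while "jazz" is frequent in the first document and absent elsewhere, so it dominates that document's feature vector.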
Hierarchy shown in Figure 2.2:
Recommendation Systems
  Content Based
  Collaborative Filtering Based
    Memory-based: User Based, Item Based
    Model-based: Latent Semantic Analysis, Bayesian Clustering, Support Vector Machines, Maximum Entropy, Latent Dirichlet Allocation
  Hybrid: Cascade, Weighted, Mixed, Switching, Feature Augmentation, Feature Combination, Meta-Level
Content-based recommendation systems suffer from two main problems: (a)
inadequate analysis of contents and (b) over specialization [98]. Inadequate analysis of
contents stems from the circumstance where the recommendation systems have limited
or incomplete information about the contents of the item or about the users. There are
numerous reasons behind such a lack of information. For example, privacy is a major
concern for many users that might restrict the provision of users’ personal
information. Similarly, information about items such as music or images is costly or difficult
to acquire [27]. Finally, in some cases, the information about the item is inadequate to
evaluate the quality of the item [99]. For instance, it is quite difficult to differentiate
between a well written article and a poorly written article when both of the articles use
the same terms. Over specialization, on the other hand, is a side effect
of the way in which recommendation systems recommend new items. The
predicted rating of a user is high for an item if the characteristics of the item are
similar to those of items already rated highly by the same user. For example, a movie recommendation
application may recommend a movie of the same category to the user if
the user has previously rated movies of that category. Similarly, a system may
recommend a movie that has the same actors as a previously rated
movie. Because of the nature of the content-based recommendation technique, the
system does not consider any other movie that is different yet might be fascinating to
the user. To address this issue, solutions were proposed that introduce diversity
in recommendations by adding some randomness to the recommendations [98] or by
filtering out items that are too similar [94, 100].
2.1.2 Collaborative Filtering based Recommendation Method
Unlike content-based methods, in which the main idea is to recognize the common
characteristics of a particular item that has already been evaluated and rated by a user,
collaborative filtering (CF) based methods depend on the ratings of a user along with
other users’ ratings in the system [85, 98, 101]. CF is the most commonly used
technique for the recommendation problem [1, 27, 31, 33]. Although CF-based methods
are frequently used with other filtering techniques such as knowledge based or content-
based, the main objective of using CF-based recommendation systems is to locate the
subset of similar users who have similar profiles and preferences. The rating by a user u_a
for an item i is expected to be similar to the rating by another user u_b, if and only if u_a and u_b followed a similar pattern in rating other items [98]. Similarly, u_a is likely to
rate two items i and j in the same manner, if similar ratings have been given to both items
by other users [88]. Such agreement among users arises from the
high similarity values among them. CF-based recommendation
systems function by matching a particular user’s item records (stored in a matrix) with
other users’ stored records. The matrix must contain users’ visited locations and the
number of visits to each location. CF-based methods present valuable
recommendations to a given user by extracting the ratings shared by similar users on
the items.
CF-based methods eliminate certain existing problems of content-based
recommendation systems. For instance, when the rating information about an item is
needed, and it is not available or difficult to acquire, CF-based models can still generate
recommendations for the users through the feedback and ratings of other users.
Moreover, in CF-based methods, users mostly rate an item keeping in consideration the
quality or ratings of the item. This is not the case in content-based methods that mostly
rely on content matching, which may lead to poor quality of recommendation.
The two generic classes of CF approaches are memory-based and model-based.
In the memory-based CF method, often referred to as neighborhood based [98] or
heuristic based [83], the user-item rating matrix stored in the system is directly accessed
and used for the prediction of ratings for new items. Memory-based models are further
categorized as item based and user based, as reflected in Figure 2.2. In item based
methods [85, 98], the prediction of a rating for a user is based on the already stored ratings
of similar items by that user. The system considers two items similar if and only if
most of the users rated those items in the same way. Alternatively, user based approaches
[84] consider a user’s interest towards an item using the previously stored ratings of
other users for the same item. The similarities are calculated among all the pairs of users
based on the users’ ratings. The similarity formulas, such as cosine similarity or Pearson
correlation are used for calculating the similarity among the users [21]. The resultant
correlated users are known as neighbors that are used for the rating prediction of the
other users. In model-based approaches, data mining and machine learning algorithms
are applied to train probabilistic models for various patterns. Compared to memory-based
approaches, model-based approaches are better in the sense that they
help in reducing the size of the user-item rating matrix, which decreases the online
processing time [1, 27]. Also, model-based methods perform categorization of users’
preferences that may include some hidden factors. For instance, without actually
defining any notion such as “suspense” or “horror”, a movie recommendation system
recommends a movie that is both suspense and horror [102]. In such a situation, model-
based approaches determine a user’s preference about a movie, without the user
explicitly stating the preference [102]. Alternatively, memory-based methods extract
associations in the user-item rating matrix. As a result, the recommendation system may
recommend a movie that is against one’s taste or a movie that is not very popular
because one of the user’s nearest neighbors highly rated that movie. Techniques
commonly used in model-based recommendation systems include Bayesian Clustering
[98], Latent Semantic Analysis [103], Latent Dirichlet Allocation [104], Maximum
Entropy [73], and Support Vector Machines [102]. Table 2.1 shows the comparison of
the above mentioned techniques.
Table 2.1 Comparison of Content Based and CF Based Techniques
Technique | Procedure | References
Content Based | Use information about the features of the items and rating provided by the users. Ratings are combined to the preferences of the users based on the features of the items already rated by the same user. | [20, 76, 94-96, 105, 106]
CF Based | Use information about the ratings of a user along with other users’ ratings in the system. The main objective is to locate the subset of similar users who have similar profiles and preferences. | [21, 65, 87, 96, 107-111]
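The user based, memory-based approach described in this section can be illustrated with a short sketch. It is a minimal example under stated assumptions and not the method of any cited work: the toy `ratings` dictionary, the helper names `cosine_sim` and `predict`, and the use of a similarity-weighted average over the k nearest neighbors are all choices made here for illustration.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}
ratings = {
    "u1": {"l1": 4, "l2": 3, "l3": 5},
    "u2": {"l1": 3, "l3": 4},
    "u3": {"l1": 3, "l2": 5},
}

def cosine_sim(a, b):
    """Cosine similarity of two users over their co-rated items."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict(user, item, k=2):
    """Similarity-weighted average of the k nearest neighbors' ratings.

    Neighbors are the other users who have rated the item (assumption).
    Returns None when no neighbor can contribute.
    """
    neighbors = [
        (cosine_sim(ratings[user], r), r[item])
        for other, r in ratings.items()
        if other != user and item in r
    ]
    neighbors.sort(reverse=True)          # most similar neighbors first
    top = neighbors[:k]
    total = sum(s for s, _ in top)
    if total == 0:
        return None
    return sum(s * r for s, r in top) / total

print(predict("u2", "l2"))
```

Because the prediction depends directly on the nearest neighbors, a single highly similar neighbor with an unusual rating can dominate the result, which is the weakness of memory-based methods noted above.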
2.1.3 Hybrid Recommendation Methods
A combination of two or more techniques forms a hybrid recommendation system.
Usually, for achieving the best performance, techniques are combined in a way that the
methods with few drawbacks are chosen [112]. Collaborative filtering is the most
commonly used technique for combination with any other technique [113]. This is
because the main objective of using CF-based recommendation systems is to locate the
subset of similar users who have similar profiles and preferences. Some of the
combination methods used for the creation of hybrid recommendation systems are shown
in Table 2.2.
Table 2.2 Description of combinations used in Hybrid Technique
Hybrid Methods | Description
Weighted [114] | The weights, votes, scores, ratings of different techniques are combined together.
Mixed [86] | Different recommendations by multiple ranked list are presented simultaneously.
Switching [115] | Switching between different techniques is done by the system according to the situation.
Cascade [116] | Refinement done by one technique on recommendations offered by other.
Feature Combination [113] | Combination of features of different recommendation data sources
Feature Augmentation [112] | Input of technique is the output of other technique
Meta-Level [117] | Input of the technique is the complete model of other technique
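As an illustration of the Weighted method in Table 2.2, the following sketch combines the scores of two techniques by a weighted sum. The function name `weighted_hybrid`, the example scores, and the weights 0.3 and 0.7 are hypothetical and are chosen only to show the mechanics.

```python
def weighted_hybrid(scores_by_technique, weights):
    """Combine per-item scores from several techniques by a weighted sum."""
    combined = {}
    for name, scores in scores_by_technique.items():
        w = weights[name]
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + w * score
    # Rank items by combined score, highest first
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical scores produced by a content-based and a CF-based technique
scores = {
    "content": {"a": 0.9, "b": 0.4, "c": 0.1},
    "cf": {"a": 0.2, "b": 0.8, "c": 0.6},
}
ranking = weighted_hybrid(scores, {"content": 0.3, "cf": 0.7})
print([item for item, _ in ranking])
```

With a larger weight on the CF scores, item b is ranked first although the content-based technique alone would prefer item a.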
2.2 Criteria for Recommendations
In recommendation systems, a user’s main concern is the quality of the
recommendations generated for an item or location. Users review the
recommendations offered by the system and rate them as appropriate or inappropriate
depending on their personalized preferences. In addition, it is important for the
recommendation systems to present such recommendations in a manner that is
acceptable to the users. Therefore, presentation of the recommendations must be
handled carefully to influence the users to accept the recommendations [118]. Several
questions must be addressed. For instance, would the recommendations include only top
ranked items or locations, and similar ones? Or would the recommendations include the
popular locations or items, and also well-proportioned highly ranked ones? Similarly,
another key aspect of recommendation systems is trust [119]. The user has more trust
in the system if the recommendations generated by the system best match the user’s
preferences. Therefore, a user’s expectation is directly dependent on trust in the system
[61, 120]. Whether the generated recommendations satisfy a user also affects the
user’s confidence in the system. The users are
more likely to accept the recommendations if their confidence in the system is high
[121]. In the following subsections, some of the criteria for recommendation systems
that must be considered to build the trust and confidence of the users are discussed.
2.2.1 Accuracy
To evaluate a recommendation system, one of the most dominant and important
criteria is accuracy [122]. Numerous works have been carried out in the past decade
to improve the accuracy of recommendations [123]. The most commonly used metric in
the rating based recommendation systems is Mean Absolute Error (MAE) [87]. MAE
measures the average absolute difference between the ratings predicted by the algorithm
and the actual ratings of the users [118]. Similarly, for content-based recommendation systems,
“switching task” is used to measure the accuracy [86]. Switching task measures the
change of user preferences before and after the recommendation [86]. If some user
changes his/her preferences after the recommended list, the accuracy of the system will
be decreased [123]. In recent years, the focus of research has shifted towards users’
perceived accuracy, which measures the degree of users’
satisfaction [112]. If the recommendations generated by the system best match
a user’s interest and preferences, the degree of the user’s perceived accuracy will be
high. Perceived accuracy has a more direct impact on the user’s trust level as compared to
objective algorithm accuracy [114]. However, it was also shown by some researchers
that both perceived accuracy and objective accuracy do not have a direct correlation
with each other [123]. It is concluded from different studies that for recommendation
systems the user-based collaborative filtering based on K-nearest neighbors has been
verified to achieve high accuracy [Refer].
2.2.2 Familiarity
Another criterion for the recommendation systems is familiarity. The most familiar item
is most likely to be recommended [118]. As compared to unfamiliar recommendations,
users’ liking for familiar recommendation is high. The study conducted in [118] showed
that when the items that are highly liked by a user are included in the recommendations,
the items increase the user’s trust in the system. The study also showed that the users
are more likely to be interested in the familiar items rather than unfamiliar ones. In the
study conducted in [37], familiar road segments were recognized and then familiar road
networks were constructed using historic trajectories to generate personalized routes.
While the preference of a user is always towards the familiar recommendations, the
user also needs new recommendations with which the user was previously unfamiliar. For
example, if an LBRS only recommends restaurants similar to those the user already knows,
the user might feel that the system is not capable of recommending new restaurants that
differ from the user’s usual taste [61].
2.2.3 Novelty
The limitation shown in the previous criterion of the recommendation systems can be
handled by introducing novelty in the recommendations. Novelty is another important
criterion that needs to be considered by the recommendation systems. The novelty of a
recommendation refers to how different the recommendation is with respect
to previously seen recommendations. The main idea of introducing novelty is to provide
users with fresh recommendations. Novelty is also sometimes referred to as
“serendipity” [122]. However, the difference between novelty and serendipity was
elaborated in [123], where novelty means “new” and serendipity means “new and
surprising”. Therefore, it can be concluded that novelty is the extent to which
recommendations are new to a user [61]. In a study conducted in [3], the researchers
discovered that users give high ratings to the music recommended by Pandora. Pandora
is considered a novel music recommendation system because it frequently provides
listeners with the latest music tracks.
2.2.4 Diversity
Another key criterion for recommendation systems is diversity [118]. Diversity and
novelty are different notions though both are closely related. As discussed earlier,
the novelty of a recommendation refers to how different the recommendation is
with respect to previously seen recommendations, whereas diversity
indicates how dissimilar the recommendations are when compared to each
other [60]. In the item-to-item collaborative filtering algorithm, a user is trapped in a
“similarity junction” that provides the user with similar recommendations according to
his/her or friends’ preferences [122]. Therefore, the focus of researchers has shifted
towards diversity in recent years. At the same time, the focus should also be on
maintaining the balance between diversity and similarity [3, 61, 62]. Moreover, highly
diverse recommendations also affect the similarity and accuracy criteria for the user.
Therefore, the diversity of the recommendations should be balanced against the
aforementioned tradeoffs [3]. One study showed that users prefer diverse
recommendations over more accurate recommendations [60]. The confidence of
a user may be affected when the user receives low diversity recommendations [118]. It
was also shown in another study that the satisfaction level of users depends on more than
accuracy and that the users are more likely to accept more diverse recommendations
[62].
2.2.5 Context Compatibility
Context compatibility is a criterion for recommendation systems in which the current
context should be considered before the final recommendation [118]. For example, if
a user wants to dine in a restaurant, the current context includes current location, food
choice, companions, and weather conditions. Similarly, if a user wants to watch a
movie, the current context should be considered because the user’s preferences for
watching a movie with friends may differ from those for watching a movie with family
[124]. One of the advantages of adding contextual considerations to
recommendation systems is that it helps new users to get recommendations instantly
without building detailed profiles [118].
2.2.6 Justification of Recommendations
For user satisfaction, it is not enough to give recommendations according to the user’s
preferences and ratings. In addition, the user must understand the criteria of selection
of items for recommendations. Studies showed that a user’s satisfaction and trust level
is enhanced in those recommendation systems that also provide good justifications with
the recommendation list [102]. Some of the primary goals of including the justifications
to recommendations are transparency, effectiveness, and smoothness [102]. After
realizing the importance of justifications of recommendations, commercial websites
such as Netflix, Pandora, and Amazon have also added features, such as “why this was
recommended”, on the web pages [118].
2.2.7 Sufficiency of Information
The last criterion for recommendation systems is to provide sufficient information with
the recommendation list to help the user and enhance the decision making
process of the user. For example, when Amazon recommends a book to a user, the
information provided is sufficient for the user. The title of the book, author name, edition,
binding (hardcover or paperback), current ratings, and price of the book, that is, all the
necessary information, are displayed with the recommended book. Studies showed that the
availability of descriptive information about the individual item is positively correlated with
the perceived effectiveness and ease of access of the recommendation systems [123]. For
example, if recommendation systems recommend a shopping mall to a user, then the
user would like to have more detailed information about the shopping mall, such as its
distance from the user, the shortest path to the shopping mall, driving routes, and the
description of different shops in the shopping mall. Therefore, the recommendation
systems must consider the criterion of adding sufficient information with the
recommendations. Table 2.3 is the summary of criteria for recommendations applied in
various research works.
Table 2.3 Criteria for recommendations in different research
Techniques | Accuracy | Familiarity | Novelty | Diversity | Context Compatibility | Justification of Recommendations | Sufficiency of Information
Gao, H., et al. [21] � � � � � � �
Chen et al. [114] � � � � � � �
Pu et. al. [118] � � � � � � �
Shani et al. [119] � � � � � � �
Rana et al. [122] � � � � � � �
Pu et al. [123] � � � � � � �
Chang et al. [37] � � � � � � �
Liu et al. [61] � � � � � � �
Adomavicius et al. [3] � � � � � � �
Verbert et al. [124] � � � � � � �
Javari et al. [62] � � � � � � �
Desrosiers et al. [102] � � � � � � �
2.3 Similarity Calculations in Recommendation Systems
One of the most critical steps in recommendation systems is to compute the similarity
between users and items. The recommendation system first computes these
similarities and then selects the
most similar items for recommendations. The process for calculating similarity is
identical in all the application areas of recommendation systems. For illustration
purposes, an example of LBRS is given, being one of the focus areas of our research.
However, similar calculations can be applied to food recommendation systems.
The basic idea in computing the similarity between two locations l1 and l2 is to
identify the users who have rated both the locations and then apply similarity
computation methods to find the similarity $sim(l_1, l_2)$. For example, in Table 2.4, users u1 and
u2 have both rated locations l1 and l3. Similarly, the similarity between two users u1, u2
can be calculated using the existing similarity computation techniques.
Table 2.4 User-location Matrix
Ratings of users on locations
User-location l1 l2 l3 . . ln
u1 4 3 5 . . -
u2 3 - 4 . . -
u3 3 5 - . . -
. . . . . . -
. . . . . . -
un - - - - - -
A variety of different techniques exist to compute the similarity between users
and locations [33, 125-127]. The most common among such techniques include cosine
based similarity, correlation based similarity, and adjusted cosine similarity.
2.3.1 Cosine Based Similarity
In cosine based similarity, two different locations are treated as two vectors in the
user-location matrix. The cosine similarity between two locations is calculated by computing
the cosine of the angle between the two location vectors [126]. Suppose an $m \times n$
user-to-location matrix as shown in Table 2.4, then the similarity between two locations $l_i$ and $l_j$ can be calculated using the following formula [101]:
$$sim(l_i, l_j) = \cos(\vec{l_i}, \vec{l_j}) = \frac{\vec{l_i} \cdot \vec{l_j}}{\|\vec{l_i}\| \times \|\vec{l_j}\|}, \tag{2.1}$$
where “$\cdot$” represents the dot product of two vectors. Cosine based similarity is
considered to be computationally tractable [128]. For non-negative ratings, cosine similarity
ranges between 0 and 1. One of the basic drawbacks of cosine similarity is that it does not show
negative values of similarity, which happen in cases when users have rated different
sets of locations. This deficiency of cosine similarity is addressed by the Pearson
correlation coefficient. If the attribute vectors $\vec{l_i}$ and $\vec{l_j}$ are normalized by subtracting
the vector means, the measure is called centered cosine similarity and is equivalent to
the Pearson correlation coefficient. Pearson correlation is discussed in the next
subsection.
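As a worked illustration of (2.1), the sketch below computes the cosine similarity between two locations of Table 2.4, restricted to the users who rated both locations as described in Section 2.3. The dictionary layout, the use of `None` for a missing rating, and the function name `location_cosine` are assumptions made for this example.

```python
from math import sqrt

# Ratings from Table 2.4 (first three users and locations); None = not rated
matrix = {
    "u1": {"l1": 4, "l2": 3, "l3": 5},
    "u2": {"l1": 3, "l2": None, "l3": 4},
    "u3": {"l1": 3, "l2": 5, "l3": None},
}

def location_cosine(li, lj):
    """Cosine similarity of Eq. (2.1) over users who rated both locations."""
    pairs = [(r[li], r[lj]) for r in matrix.values()
             if r[li] is not None and r[lj] is not None]
    dot = sum(a * b for a, b in pairs)
    norm_i = sqrt(sum(a * a for a, _ in pairs))
    norm_j = sqrt(sum(b * b for _, b in pairs))
    # Return 0.0 when there are no co-rating users (assumption)
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0

print(location_cosine("l1", "l3"))
```

For l1 and l3 the co-rating users are u1 and u2, so the result is 32 / (5 × √41), which is very close to 1 even though the two users rate on slightly different scales.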
2.3.2 Correlation Based Similarity
In the correlation based approach, the similarity between two locations $l_i$ and $l_j$ is
calculated using the Pearson-r correlation $corr_{l_i,l_j}$ [127]. The calculation of the Pearson-r
correlation starts with the isolation of different locations that are already rated by the
user u. For example, in Table 2.4, the locations $l_1$ and $l_3$ are both rated by user $u_1$. If the
set of users $u_1, u_2, u_3, \ldots, u_n$ is denoted by $U$, then the Pearson-r correlation similarity
is given by [81]:
$$sim(l_i, l_j) = \frac{\sum_{u \in U} (r_{u,l_i} - \bar{r}_{l_i})(r_{u,l_j} - \bar{r}_{l_j})}{\sqrt{\sum_{u \in U} (r_{u,l_i} - \bar{r}_{l_i})^2}\,\sqrt{\sum_{u \in U} (r_{u,l_j} - \bar{r}_{l_j})^2}}. \tag{2.2}$$
In the above equation, $r_{u,l_i}$ represents the rating of user $u$ for location $l_i$ and
$\bar{r}_{l_i}$ is the average rating of the $i$-th location. Compared to cosine similarity, Pearson
correlation yields more accurate similarity computations as it incorporates
negative similarity values as well. The negative similarity depicts how far apart the two users
are in their preferences. However, it is important to note that Pearson correlation has
two issues which must be taken into account while computing similarity between users,
items, or locations. The first issue occurs when one user has rated an item or location
but the other user has not rated the same item or location. As a result, a large set of
items could be ignored that are not commonly rated by both users, though, still those
items could have some impact on similarity computations. The other issue with Pearson
correlation is that users with only a few commonly rated items or locations could have
high similarities, despite the smaller item count. This could induce bias in the similarity
values, and the issue can be resolved by using significance weighting [129].
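The second issue can be made concrete with a small sketch. The functions `pearson` and `weighted_pearson` are hypothetical helpers, the location means are taken over the co-rating users only, and the linear devaluation with a threshold of 50 co-ratings is one common form of significance weighting; all three are assumptions for illustration and are not taken from [129].

```python
from math import sqrt

def pearson(pairs):
    """Pearson correlation of Eq. (2.2) over co-rated (r_i, r_j) pairs."""
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_i = sum(a for a, _ in pairs) / n
    mean_j = sum(b for _, b in pairs) / n
    num = sum((a - mean_i) * (b - mean_j) for a, b in pairs)
    den_i = sqrt(sum((a - mean_i) ** 2 for a, _ in pairs))
    den_j = sqrt(sum((b - mean_j) ** 2 for _, b in pairs))
    if den_i == 0 or den_j == 0:
        return 0.0
    return num / (den_i * den_j)

def weighted_pearson(pairs, threshold=50):
    """Scale the correlation down when few co-ratings support it."""
    return pearson(pairs) * min(len(pairs), threshold) / threshold

# Ratings of u1 and u2 on locations l1 and l3 in Table 2.4
pairs_l1_l3 = [(4, 5), (3, 4)]
print(pearson(pairs_l1_l3), weighted_pearson(pairs_l1_l3))
```

With only two co-ratings the plain correlation is (up to rounding) a perfect 1, whereas the weighted value is reduced to about 0.04, which reflects how little evidence supports the similarity.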
2.3.3 Adjusted Cosine Similarity
One of the main differences between the similarity computations of users and locations
is that the similarity of users is computed along the rows of the matrix and the similarity of
locations is computed along the columns of the user-to-location matrix [33, 125].
Basic cosine similarity has one limitation: it does not account for the differences in the
rating scales of the users. In adjusted cosine similarity, this drawback of basic cosine
similarity is eliminated by subtracting the corresponding user’s rating average from
each co-rated pair of locations. The similarity between locations using adjusted cosine
similarity is calculated as [130]:
$$sim(l_i, l_j) = \frac{\sum_{u \in U} (r_{u,l_i} - \bar{r}_u)(r_{u,l_j} - \bar{r}_u)}{\sqrt{\sum_{u \in U} (r_{u,l_i} - \bar{r}_u)^2}\,\sqrt{\sum_{u \in U} (r_{u,l_j} - \bar{r}_u)^2}}, \tag{2.3}$$
where $\bar{r}_u$ is the average of the $u$-th user’s ratings.
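The effect of subtracting the user averages in (2.3) can be seen on the ratings of Table 2.4. The sketch below is illustrative only: the dictionary layout and the function name `adjusted_cosine` are assumptions, and each user’s average is taken over all locations that the user has rated.

```python
from math import sqrt

# Ratings from Table 2.4 (first three users and locations); missing = not rated
matrix = {
    "u1": {"l1": 4, "l2": 3, "l3": 5},
    "u2": {"l1": 3, "l3": 4},
    "u3": {"l1": 3, "l2": 5},
}

def adjusted_cosine(li, lj):
    """Adjusted cosine of Eq. (2.3): subtract each user's mean rating."""
    num = den_i = den_j = 0.0
    for r in matrix.values():
        if li in r and lj in r:                 # co-rating users only
            mean_u = sum(r.values()) / len(r)   # user's average rating
            di, dj = r[li] - mean_u, r[lj] - mean_u
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    if den_i == 0 or den_j == 0:
        return 0.0
    return num / (sqrt(den_i) * sqrt(den_j))

print(adjusted_cosine("l1", "l3"))
```

For l1 and l3 the result is -1/√5 (about -0.447), whereas basic cosine similarity on the same ratings is close to 1. Once the user averages are removed, both users are seen to rate l1 at or below their own average and l3 above it.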
The above mentioned similarity calculations are a primary step to find out
similar users or locations. After computing the similarity, the recommendation system
selects the most similar locations and generates the recommendation list accordingly.
A variety of performance metrics are used to evaluate the resulting recommendation
lists. In the next Section, the most commonly used evaluation metrics in the field of
LBRS are discussed.
2.4 Evaluation of Recommendations
To analyze quality of recommendations, it is important to evaluate the
recommendations using evaluation metrics. The use of evaluation metrics helps in the
comparison of various solutions proposed by the researchers and as a result,
recommendations have been improved gradually [88]. The existing evaluation metrics
have a standard formulation that is used for the testing and evaluation of the
recommendations [1].
A variety of evaluation metrics [1] are used, but the most common ones are
classified as prediction metrics, set recommendation metrics, rank recommendation
metrics, and diversity metrics. Prediction metrics are used to find the accuracy of
recommendations using Mean Absolute Error (MAE) [87], Normalized Mean Absolute
Error (NMAE) [131], and Root Mean Squared Error (RMSE) [132]. Prediction
metrics are also used to find the coverage. Set recommendation metrics include Recall
[4], Precision [4], and Receiver Operating Characteristics (ROC) [133]. Rank
recommendation metrics include half-life [88] and discounted cumulative gain (DCG)
[110]. Diversity metrics include novelty and diversity of the recommendations [60].
Moreover, the validation process is completed using the most commonly used cross
validation techniques, known as k-fold cross validation [86] and random sub-sampling
validation [134]. The following subsections elaborate on each of the evaluation
metrics in detail.
2.4.1 Prediction Metrics
To find the accuracy of recommendations, researchers usually employ
the most commonly used prediction error metric, MAE, and its associated
metrics, such as NMAE, MSE, and RMSE [119].
Suppose a set of users $(u_1, u_2, u_3, \ldots, u_n)$ is denoted by $U$, a set of items
$(i_1, i_2, i_3, \ldots, i_n)$ is denoted by $I$, $r_{u,i}$ is the rating of user $u$ for item $i$, $\bullet$ denotes the lack of
a rating, that is, $r_{u,i} = \bullet$ means user $u$ has not rated item $i$, and $p_{u,i}$ is the prediction of
item $i$ for user $u$. Let $O_u$ be the set of items rated by user $u$ having prediction values,
where $O_u = \{i \in I \mid p_{u,i} \neq \bullet \wedge r_{u,i} \neq \bullet\}$. The system’s MAE and RMSE are the
averages of the users’ MAE and RMSE, respectively. The prediction error is the absolute
difference between the real value and the prediction, denoted as $|p_{u,i} - r_{u,i}|$. MAE [135] and RMSE [132] are given
by the following two formulas, respectively:
$$MAE = \frac{1}{|U|} \sum_{u \in U} \left( \frac{1}{|O_u|} \sum_{i \in O_u} |p_{u,i} - r_{u,i}| \right). \tag{2.4}$$
$$RMSE = \frac{1}{|U|} \sum_{u \in U} \sqrt{\frac{1}{|O_u|} \sum_{i \in O_u} (p_{u,i} - r_{u,i})^2}. \tag{2.5}$$
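A minimal sketch of (2.4) and (2.5) is given below. The toy `ratings` and `predictions` dictionaries and the helper `per_user` are hypothetical, and skipping users whose set $O_u$ is empty is an assumption made here to avoid a division by zero.

```python
from math import sqrt

# Hypothetical data: user -> {item: value}; a missing key means "no value"
ratings = {"u1": {"i1": 4, "i2": 2}, "u2": {"i1": 5}}
predictions = {"u1": {"i1": 3, "i2": 4}, "u2": {"i1": 5, "i2": 1}}

def per_user(metric):
    """Average a per-user error metric over users whose O_u is non-empty."""
    values = []
    for u in ratings:
        # O_u: items that have both a rating and a prediction
        o_u = set(ratings[u]) & set(predictions.get(u, {}))
        if o_u:  # assumption: users with empty O_u are skipped
            errors = [predictions[u][i] - ratings[u][i] for i in o_u]
            values.append(metric(errors))
    return sum(values) / len(values)

mae = per_user(lambda e: sum(abs(x) for x in e) / len(e))
rmse = per_user(lambda e: sqrt(sum(x * x for x in e) / len(e)))
print(mae, rmse)
```

In this example user u1 has errors of -1 and 2 and user u2 has an error of 0, so the system MAE is 0.75, while the RMSE is slightly larger because it penalizes the larger error more heavily.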
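The coverage metric measures the percentage of situations in which
at least one k-neighbor of each active user can rate an item that has not yet been rated by that
active user [119]. Let $K_{u,i}$ be the set of neighbors of a user $u$ that have rated an item $i$. The coverage of the system is the average of the users’ coverage. Let
$$C_u = \{i \in I \mid r_{u,i} = \bullet \wedge K_{u,i} \neq \emptyset\}$$
be the set of items not rated by $u$ that have been rated by at least one neighbor, and
$$D_u = \{i \in I \mid r_{u,i} = \bullet\}$$
be the set of all items not rated by $u$, then the coverage can be calculated using (2.6) [119]:
$$coverage = \frac{1}{|U|} \sum_{u \in U} \left( \frac{|C_u|}{|D_u|} \times 100 \right). \tag{2.6}$$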
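The coverage of (2.6) can be sketched as follows. The function name `coverage`, the toy data, and the decision to skip users who have already rated every item are assumptions for illustration; the neighbor lists are given directly instead of being computed by a K-nearest neighbor search.

```python
def coverage(ratings, neighbors, items):
    """Eq. (2.6): mean over users of 100 * |C_u| / |D_u|."""
    total, counted = 0.0, 0
    for u in ratings:
        d_u = [i for i in items if i not in ratings[u]]   # unrated items
        if not d_u:
            continue  # assumption: skip users with nothing left to predict
        # C_u: unrated items for which K_{u,i} is non-empty
        c_u = [i for i in d_u
               if any(i in ratings[v] for v in neighbors[u])]
        total += 100.0 * len(c_u) / len(d_u)
        counted += 1
    return total / counted if counted else 0.0

# Hypothetical toy data
items = ["i1", "i2", "i3"]
ratings = {"u1": {"i1": 4}, "u2": {"i1": 3, "i2": 5}, "u3": {"i3": 2}}
neighbors = {"u1": ["u2"], "u2": ["u1"], "u3": ["u1", "u2"]}
print(coverage(ratings, neighbors, items))
```

Here the per-user coverage is 50% for u1, 0% for u2, and 100% for u3, which gives a system coverage of 50%.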
2.4.2 Quality of the set of recommendations
A user’s satisfaction not only depends on accuracy, but it also depends on being
provided with a concise as well as diverse set of recommendations. The combination of
accuracy, diversity, and concise items list compose the quality of recommendations
[111]. The most common recommendation metrics used for quality measurement are
precision, recall, and F1 [136]. Precision is the fraction of relevant recommendations
out of the total recommendations. Recall is the fraction of relevant recommendations
out of all relevant items. F1 is the harmonic mean of recall and
precision. F1 is generally used because it considers both the values
of precision and recall in a single measure [136].
Suppose $X_u$ is the set of recommendations to user $u$ and $Z_u$ is the set of top $n$
recommendations to $u$. Let $\theta$ be the relevancy threshold, and let the recall, precision,
and F1 measures be evaluated on the recommendations obtained by making $n$ test recommendations to
user $u$. Assume that all the users take $n$ test recommendations, then precision, recall,
and F1 can be calculated by (2.7), (2.8), and (2.9), respectively [136].
$$precision = \frac{1}{|U|} \sum_{u \in U} \frac{|\{i \in Z_u \mid r_{u,i} \geq \theta\}|}{n}. \tag{2.7}$$
$$recall = \frac{1}{|U|} \sum_{u \in U} \frac{|\{i \in Z_u \mid r_{u,i} \geq \theta\}|}{|\{i \in Z_u \mid r_{u,i} \geq \theta\}| + |\{i \in Z_u^c \mid r_{u,i} \geq \theta\}|}, \tag{2.8}$$
where $Z_u^c$ denotes the items that are not in $Z_u$.
$$F1 = \frac{2 \times precision \times recall}{precision + recall}. \tag{2.9}$$
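The three measures can be sketched in a few lines. The function name `precision_recall_f1`, the toy data, and the convention of assigning a recall of 0 to a user without any relevant item are assumptions made for this example.

```python
def precision_recall_f1(recommended, ratings, theta, n):
    """Eqs. (2.7)-(2.9), with precision and recall averaged over users.

    recommended: user -> list of n recommended items (Z_u)
    ratings:     user -> {item: true rating}
    """
    precisions, recalls = [], []
    for u, z_u in recommended.items():
        # Relevant items: true rating at or above the threshold theta
        relevant = {i for i, r in ratings[u].items() if r >= theta}
        hits = len(relevant & set(z_u))
        precisions.append(hits / n)
        recalls.append(hits / len(relevant) if relevant else 0.0)
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical toy data with n = 2 and relevancy threshold 4
recommended = {"u1": ["a", "b"], "u2": ["c", "d"]}
ratings = {
    "u1": {"a": 5, "b": 2, "c": 4, "d": 1},
    "u2": {"a": 3, "b": 5, "c": 4, "d": 4},
}
print(precision_recall_f1(recommended, ratings, theta=4, n=2))
```

In this example the precision is 0.75 and the recall is about 0.583, and F1 lies between the two at 0.65625, closer to the lower value as expected of a harmonic mean.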
2.4.3 Quality of the List of Recommendations
A common issue faced by recommendation systems is that users often select only
the first item from a large list of recommendations. Ignoring the rest of the list
may affect the selection of recommendations as there may be some
quality recommendations down the list. To address the aforementioned issue, ranking
metrics are often used by researchers. Such ranking metrics include half-life and
discounted cumulative gain. Half-life [1] assumes that the interest of users, and hence the
selection probability of a relevant recommendation,
decreases exponentially as the user moves down the list away from the
recommendations at the top. Half-life is calculated using (2.10). In discounted
cumulative gain [137], it is assumed that the selection probability of a relevant
recommendation decreases logarithmically down the list, and it can be calculated using
(2.11).
$$Half\text{-}Life = \frac{1}{|U|} \sum_{u \in U} \sum_{i=1}^{n} \frac{\max(r_{u,p_i} - d, 0)}{2^{(i-1)/(\alpha-1)}}. \tag{2.10}$$
$$DCG^k = \frac{1}{|U|} \sum_{u \in U} \left( r_{u,p_1} + \sum_{i=2}^{k} \frac{r_{u,p_i}}{\log_2(i)} \right). \tag{2.11}$$
In (2.10) and (2.11), the recommendation list is represented
by $\{p_1, p_2, p_3, \ldots, p_n\}$, $r_{u,p_i}$ is the true rating of the user $u$ for the item $p_i$, $k$ is the rank
of the evaluated item, $\alpha$ is the position of the item in the list such that there is a 50%
chance that the user will rate that item, and $d$ is the default rating.
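The two ranking metrics can be sketched as follows. The function names, the toy rating lists, and the parameter values are hypothetical; each list holds the true ratings of one user for the recommended items in ranked order, and the sketch assumes non-empty lists and $\alpha > 1$.

```python
from math import log2

def half_life(true_ratings, d, alpha):
    """Inner sum of Eq. (2.10) for one user's ranked list."""
    return sum(
        max(r - d, 0) / 2 ** ((i - 1) / (alpha - 1))
        for i, r in enumerate(true_ratings, start=1)
    )

def dcg(true_ratings, k):
    """Inner term of Eq. (2.11) for one user's ranked list."""
    top = true_ratings[:k]
    return top[0] + sum(r / log2(i) for i, r in enumerate(top[1:], start=2))

def mean_over_users(metric, lists, **kwargs):
    """Outer average over users in Eqs. (2.10) and (2.11)."""
    return sum(metric(rs, **kwargs) for rs in lists) / len(lists)

# Hypothetical true ratings of two users for their ranked recommendations
lists = [[5, 3, 4], [2, 4]]
print(mean_over_users(half_life, lists, d=3, alpha=2))
print(mean_over_users(dcg, lists, k=3))
```

With $\alpha = 2$ the weight of an item halves at every position, so the exponential discount of half-life is much stronger than the logarithmic discount of DCG for items further down the list.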
2.4.4 Novelty and Diversity
Novelty metrics measure the degree of difference between the items recommended
to a user and those already known by the user. Alternatively, the
diversity metric measures the degree of differentiation among the recommended items. At present,
no standard metric has been defined for novelty and diversity. Therefore, different metrics
are proposed by researchers [60]. Most of the authors used the following mathematical
calculations to find the novelty and diversity in recommendations [100].
$$diversity_{Z_u} = \frac{1}{|Z_u|(|Z_u| - 1)} \sum_{i \in Z_u} \sum_{j \in Z_u, j \neq i} \left[1 - sim(i, j)\right]. \tag{2.12}$$
$$novelty_i = \frac{1}{|Z_u| - 1} \sum_{j \in Z_u} \left[1 - sim(i, j)\right], \quad i \in Z_u. \tag{2.13}$$
In (2.12) and (2.13), the set of $n$ recommendations to user $u$ is represented by $Z_u$, and the item-item
memory-based similarity measure is represented by $sim(i, j)$.
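A short sketch of (2.12) and (2.13) follows. The similarity values in `pairs` and the function names are hypothetical. The novelty function skips the term $j = i$, which is equivalent to (2.13) under the assumption that $sim(i, i) = 1$.

```python
def diversity(z_u, sim):
    """Eq. (2.12): mean pairwise dissimilarity within the list Z_u."""
    n = len(z_u)
    total = sum(1 - sim(i, j) for i in z_u for j in z_u if i != j)
    return total / (n * (n - 1))

def novelty(item, z_u, sim):
    """Eq. (2.13): mean dissimilarity of one item to the rest of Z_u."""
    return sum(1 - sim(item, j) for j in z_u if j != item) / (len(z_u) - 1)

# Hypothetical symmetric item-item similarities
pairs = {frozenset("ab"): 0.8, frozenset("ac"): 0.2, frozenset("bc"): 0.5}

def sim(i, j):
    return pairs[frozenset((i, j))]

z_u = ["a", "b", "c"]
print(diversity(z_u, sim), novelty("c", z_u, sim))
```

Item c is the least similar to the other two items and therefore has the highest novelty in this list (0.65, compared to a list diversity of 0.5).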
2.4.5 Stability
A user has more trust in a recommendation system when the recommendations
generated by the system best match the user’s preferences. A recommendation system
is known as stable when the recommendations generated do not deviate over a short
period [138]. The metric defined for the evaluation of the stability of recommendation
systems is Mean Absolute Shift (MAS) [124]. The MAS metric starts from a set of
known ratings $R_1$ given by the users and a set of predictions $P_1$ for the unknown ratings.
After a period of time, the users have rated some of the unknown items and new predictions $P_2$ are
generated by the system. Now, MAS can be calculated as (2.14) [124].
$$stability = MAS = \frac{1}{|P_2|} \sum_{(u,i) \in P_2} |P_2(u,i) - P_1(u,i)|. \tag{2.14}$$
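The computation in (2.14) can be sketched as follows, with the two prediction sets stored as dictionaries keyed by (user, item) pairs. The function name, the toy values, and the assumption that every pair in the later prediction set also appears in the earlier one are made only for this illustration.

```python
def mean_absolute_shift(p1, p2):
    """Eq. (2.14): average |P2(u, i) - P1(u, i)| over the pairs in P2."""
    return sum(abs(p2[key] - p1[key]) for key in p2) / len(p2)

# Hypothetical predictions before (p1) and after (p2) new ratings arrive;
# the pair ("u1", "i2") was rated in between, so it is no longer predicted
p1 = {("u1", "i2"): 3.5, ("u1", "i3"): 4.0, ("u2", "i1"): 2.0}
p2 = {("u1", "i3"): 4.5, ("u2", "i1"): 1.0}
print(mean_absolute_shift(p1, p2))
```

A value of 0 means that the predictions did not move at all after the new ratings were added, and larger values indicate a less stable system.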
2.4.6 Reliability
When a user gets a recommendation, it is important to know whether the
recommendation is valuable for the user or not. A valuable recommendation is
considered reliable for the user. The most commonly used metrics to find reliability of
the recommendations are Pearson's Correlation (COR), the Mean Squared Difference
(MSD), Pearson's Correlation Constrained (CORC), Spearman's Rank-Order
Correlation (SR), and the Jaccard plus MSD (JMSD) [139]. The reliability measures
are proposed according to the notion that “more reliable a prediction, the less liable to
be wrong” [1]. However, the reliability metric is only used to evaluate
recommendation systems based on the K-nearest neighbor algorithm. It is based on the
numeric factors $S_{u,i}$, as shown in (2.15), and $V_{u,i}$, as shown in (2.16), where $S_{u,i}$ is the
similarity of the neighbors used for computing recommendations and $V_{u,i}$ is the
dissimilarity among the ratings of the neighbors. The reliability is calculated using
(2.17) [119].
$$f_S(S_{u,i}) = 1 - \frac{\bar{s}}{\bar{s} + S_{u,i}}, \quad \text{where } S_{u,i} = \sum_{v \in K_{u,i}} sim(u,v). \tag{2.15}$$
$$f_V(V_{u,i}) = \left( \frac{max - min - V_{u,i}}{max - min} \right)^{\gamma}, \quad \gamma = \frac{\ln 0.5}{\ln \frac{max - min - \bar{v}}{max - min}}. \tag{2.16}$$
$$reliability = \frac{\sum_{v \in K_{u,i}} sim(u,v)\,(r_{v,i} - \bar{r}_v - p_{u,i} + \bar{r}_u)^2}{\sum_{v \in K_{u,i}} sim(u,v)}. \tag{2.17}$$
In (2.15), (2.16), and (2.17), $\bar{s}$ and $\bar{v}$ are the medians of $S_{u,i}$ and $V_{u,i}$,
respectively, $K_{u,i}$ is the set of neighbors of user $u$ that have rated the item $i$, and min and
max are the bounds of the discrete rating values. Table 2.5 summarizes the evaluation metrics used in
various research works.
Table 2.5 Summary of Evaluation Metrics
Criteria | Metrics | References
Accuracy | MAE, NMAE, MSE, and RMSE | [88, 119]
Quality of set of Recommendations | Precision, Recall, and F1 | [111, 136]
Quality of the List of Recommendations | Half-Life, DCG | [1, 137]
Novelty and Diversity | No standard metric defined | [60, 100]
Stability | MAS | [124]
Reliability | COR, MSD, CORC, SR, and JMSD | [1, 139]
2.5 Cloud Computing in Recommendation Systems
Cloud computing is one of the most utilized IT paradigms providing internet-based
computing. End users can access the remote cloud resources via an internet link,
eliminating the need for local dedicated computing resources. Cloud computing
provides a set of server, network, and storage devices housed in data centers for various
applications. The data centers can vary in size from a small room to the size of a football
stadium based on the requirement of the IT facility. A large number of applications
utilize cloud computing facilities including, but not limited to, image processing, big
data analysis, recommender systems, and remote file storage. The applications and their
data are hosted over cloud servers housed in the data centers. Usually, more than one
application resides over a cloud server as the cloud manages the applications flexibly
based on their resource demands. As a result, a pay-as-you-go pricing model is created
that favors both small and large business enterprises. Business enterprises avoid capital
costs of acquiring computing infrastructure and exploit cloud services with monthly or
yearly payment plans based on the overall resource utilization. Cloud computing has
gained immense popularity due to its flexible characteristics. Moreover, multiple
businesses and industries have migrated their computational requirements to the cloud
infrastructure. As a result of the increasing popularity, the cloud computing industry
had an annual revenue of $250 billion in 2016 [140]. Moreover, the cloud computing
industry consumed 2.4% of the global electricity [140].
Cloud computing is an ideal execution platform for applications that either
require large computations or store data extensively. Big data applications are an ideal
fit for the cloud computing paradigm as they require extensive data processing and
storage for analysis that reveals trends and associations among the data. The cloud data
centers housing thousands of compute and storage nodes facilitate big data applications
with flexible resource management techniques. The data for big data applications is
captured from sources, such as IoT, sensors, and mobile devices and stored over the
cloud servers [141]. Further, big data applications execute over the data to produce end
results and reports after extensive analysis.
Applications of recommender systems, image processing, e-health, bio-
informatics, and smart cities can be categorized as big data based on the scale of their
volume, variety, velocity, and value. Any scientific application integrating with cloud
computing has to develop tools for parallel processing that exploit the cloud resources
efficiently for optimizations. Big data and cloud computing have an application based
relationship in the form of Hadoop and MapReduce frameworks [142]. Hadoop is a
distributed framework for data storage and processing that can be easily mapped on the
cloud servers. The distributed data storage of Hadoop is based on Hadoop Distributed
File System (HDFS) while the compute component of Hadoop is based on the
MapReduce programming model [142]. In terms of big data, the Map element performs
a query such as sorting on the data, while the Reduce element summarizes the query
results to form a report after certain query based analysis. Conventionally, the Hadoop
and MapReduce frameworks provided scalability but did not provide efficiency when
compared to parallel databases [143]. However, as the existing big data activities are
performed on large-scale data centers with the help of distributed Hadoop and
MapReduce frameworks, they are eventually more efficient than the conventional
parallel processing compute applications [144]. To provide ease of use to the end user,
the Hadoop and MapReduce frameworks manage the task scaling, failover, and
parallelism by themselves while optimizing the overall query performance [145].
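The division of work between the Map and Reduce elements can be illustrated with a small in-memory sketch. It does not use Hadoop or any Hadoop API; the record format and the function names `map_phase`, `shuffle`, and `reduce_phase` are assumptions, and in a real deployment the framework itself would run the map and reduce functions in parallel on several servers and perform the grouping step between them.

```python
from collections import defaultdict

# Hypothetical check-in records: (user, location, rating)
records = [("u1", "l1", 4), ("u2", "l1", 3), ("u3", "l1", 3),
           ("u1", "l2", 3), ("u3", "l2", 5)]

def map_phase(record):
    """Emit a (key, value) pair for one input record."""
    _, location, rating = record
    return location, rating

def shuffle(pairs):
    """Group values by key, as the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Summarize all values of one key; here, the average rating."""
    return key, sum(values) / len(values)

mapped = map(map_phase, records)
result = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(result)
```

Since each record is mapped independently and each key is reduced independently, both phases can be distributed over many servers without changing the two functions written by the user.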
Figure 2.3 Cloud Computing
The user does not need to write sophisticated programs for parallel processing of the
data. The end user is only required to formulate a query for the data analysis as shown in
Figure 2.3. In the basic integration model of cloud and big data, the big data sources are
connected to the cloud servers that store the data in a distributed manner over multiple
servers with the help of HDFS. The big data sources including sensors and mobile
devices require internet connectivity to store data on the cloud servers. Further, the
Hadoop framework executes on the distributed data to produce the analytical reports
after operations like filter, sort, and complex analytical queries. Similar to the
distributed data storage of Hadoop, the MapReduce element also functions in parallel
while adapting to the multi-server architecture of the cloud. Three techniques can be
adopted for efficient integration of the cloud and MapReduce framework. These are:
(a) the cloud MapReduce runtime, which is an extension of the basic MapReduce
framework, (b) utilizing MapReduce as a cloud service, and (c) installing MapReduce
independently on a set of cloud servers for distributed parallel processing [145, 146].
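The division of work between the Map and Reduce elements described above can be illustrated with a minimal single-process sketch. It is not Hadoop code and does not use any Hadoop API; the check-in records and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical and chosen only for illustration. In a real deployment the framework would distribute these phases over several cloud servers.

```python
from collections import defaultdict

# Hypothetical check-in records (user, venue); not data from this thesis.
CHECKINS = [
    ("u1", "cafe"), ("u2", "cafe"), ("u1", "mall"),
    ("u3", "cafe"), ("u2", "cinema"), ("u3", "mall"),
]


def map_phase(record):
    """Map: emit one (venue, 1) pair for each check-in record."""
    _user, venue = record
    yield venue, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: summarize all values of one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce sequentially in one process."""
    intermediate = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


venue_counts = run_job(CHECKINS)
print(venue_counts)
```

Because each call to `map_phase` depends on a single record and each call to `reduce_phase` depends on a single key, both phases can run in parallel on different nodes, which is the property the frameworks exploit.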
Modern recommender systems take user preferences from sources like social networks,
sensors, and mobile devices. Due to the nature of these data sources, most of the data on
which modern recommender systems work is big data. The data extracted from these
sources can include details of locality, user preferences, and decisions. Therefore,
extensive storage resources are required to perform the tasks of recommender systems.
Moreover, recommender systems make predictions and recommendations after filtering
data and addressing issues like data diversity, accuracy, and stability [1]. Therefore, the
resources and programs required for performing the operations of recommender
systems efficiently cannot be limited to a single compute node. Hence, recommender
systems often utilize cloud computing based infrastructure for effective real-time
implementation. The MapReduce framework needs to be aware of data locality in order
to perform its tasks efficiently [143].
2.6 Summary
In this chapter, a systematic review of the scientific literature is presented that
summarizes the efforts and contributions of researchers in the area of recommendation
systems. First, basic filtering techniques such as content-based and collaborative
filtering are discussed, along with the hierarchy of these techniques in
recommendation systems. Second, the classification of criteria for recommendations
that include accuracy, familiarity, novelty, diversity, content compatibility, justification
of recommendations, and sufficiency of information is discussed in detail. The
similarity calculations and evaluation metrics used in the area of recommendation
systems are also presented in detail. Moreover, this research has critically investigated
the challenges in the design and implementation of recommendation systems.
This research summarizes the following observations: (1) the quality of
recommendations can be improved by using all kinds of additional information, such as
check-in data, geographical information, social relationships, and temporal information;
(2) model-based approaches are more effective and efficient than memory-based
approaches, and the performance of model-based approaches is also consistent; and (3)
quality recommendations can only be achieved by using user-based recommendation
approaches rather than item-based approaches. Among the criteria for recommendations,
the following observations are made: (a) accuracy alone is not sufficient for the
selection of a suitable algorithm, (b) users are more likely to be interested in familiar
items than in unfamiliar ones, and (c) users prefer diverse recommendations to
more accurate recommendations.
The next chapter attempts to overcome the problems identified in this chapter for
recommendation systems. The focus is on generating real-time recommendations while
handling the problems of cold start, data sparseness, and scalability. To address
scalability challenges, cloud computing is used to perform real-time processing of
large-scale data. The MapReduce framework is also used to enable parallel computation on
multiple nodes. Moreover, data is pre-processed to filter out insignificant or redundant
data before online recommendation. To handle the cold start problem, our model
maintains a pre-computed ranked list of popular users and venues. Using this list, our
model compares the preferences of the new user with those of the existing popular users.
The model then makes recommendations according to the top user whose preferences
best match those of the new user.
Chapter 3 Real-Time Venue Based Recommendation System
3.1 Location Based Recommender Systems
The advent of location sharing in social networking services helps
strengthen the association between real-world social networks and online social
networking services [147]. An LBRS is a system in which people share
location-embedded information with each other [148]. Users of an LBRS share location-tagged
media content such as text, videos, and photos [149]. The location parameters
comprise the current location of the user with a timestamp as well as the location
history of the given user for a certain period. When the same location is shared by two or
more users, the information also includes knowledge of the users’
common behavior, interests, and activities extracted from the users’ location history
and location tagged information [27]. Existing LBRS services can be categorized as
geo tagged media based, point location based, and trajectory-based as shown in Figure
3.1.
Figure 3.1 Services offered by LBRS
3.1.1 Geo Tagged Media Based
Geo-tagged media-based services allow users to attach a location to their media
content, such as text, videos, and photos, created in the physical world [150].
Passive tagging occurs when a user explicitly creates and adds the contents along with
the location [150]. Geo-tagged media-based services allow a user to view other users’
content in a geographical context by using digital maps on smart phones [151]. Popular
applications that provide LBRS services include Geo Twitter1, Flickr2, and Panoramio3.
It has been shown in [33] that adding only location dimensions, such as longitude and
latitude, does not necessarily attract users, as users are more interested in the actual media
content. Therefore, location information only acts as an add-on that enriches
and unifies the media content. The authors of [33] further indicated that the
location feature does not have much impact on the connections and relationships among
users; rather, it is the media content that is responsible for such connections and
relationships.
3.1.2 Point Location Based
Point location based services allow users to add and share locations such as
restaurants, shopping malls, or cinemas [20]. The most common applications that offer
such services are Foursquare, Instagram, and Facebook, which encourage users to share
their current location. Users of such applications can perform
a “check-in” at the locations they visit in their daily routine and
share experiences and knowledge by giving a tip or feedback [32, 33]. For example, a
user can share their views about a dinner with their community on an online social site
while using their smart phone. Moreover, such services also keep track of the users’
geospatial check-in data, such as time and longitude/latitude [33]. In Foursquare, after
a user checks in at different locations, the application awards badges and points to the user.
The user with the highest number of visits at a particular location is crowned
“Mayor”. One of the main advantages of services that allow real-time location
tracking is that users can discover friends near their physical locations, which
helps boost a user’s social activities in the physical world. For instance, after
discovering a friend’s physical location from his/her social network, one can invite the
friend to lunch or to go shopping. The use of tips or feedback in location-based
services allows users to share comments and suggestions that can be either positive or
negative. Such tips and feedback are pivotal in aggregating recommendations. Unlike
in geo-tagged media services, the venue in point location based services is the main
component that determines the connections between users, and media content such as
feedback, badges, and tips is associated with the point location [147].
1 http://geo-twitter.appspot.com/ 2 https://www.flickr.com/ 3 http://www.panoramio.com/
3.1.3 Trajectory Based
Trajectory based services allow users to add both point locations and the routes to those
point locations. The most common applications that offer such services are Bikely4,
Microsoft GeoLife5, and SportsDo6. Trajectory based approaches compute information
by extracting data about a user’s visit patterns at different locations, duration of stays,
and the paths selected. Such services allow users to add information such as speed,
distance, duration, and route about a specific trajectory, as well as the user’s media
contents such as tips, tags, and photos along with the given trajectory [26, 33]. Other
users of the same community can take guidance from the experiences of their friends
by following the trajectory using digital maps or smart phones. In summary, such
services offer both “how and what” along with “where and when”.
3.2 Distinguishing Features of Locations
In LBRS, the focus of the recommendations remains on the locations besides the media
content. Therefore, it is important for recommendation systems to recognize and
consider the unique features of locations in order to make recommendations that
meet the criteria of both accuracy and quality. The distinguishing features of
locations are as follows.
3.2.1 Location Hierarchy
A location can be considered at multiple scales. It can be a small
shop or restaurant, or it can be a big town or city. Such locations, smaller or bigger, form
a hierarchy in which locations at the bottom refer to smaller geographical areas [152].
For instance, a restaurant, cinema, or shopping mall may belong to a community, a
community belongs to a town, a town belongs to a country, and so on. The different levels
of the location hierarchy lead to diverse user-location and location-location graphs.
The authors of [27] suggested that even for identical location
histories, different user-location and location-location graphs will be formulated.
Considering these hierarchical relationships is essential for
recommendation systems because they play a significant role in
establishing the connections between users [67]. For example, users who share
locations such as restaurants or shopping malls, which are lower-level items
in the hierarchy, possibly have stronger connections than users who share locations such
as towns or countries, which are higher-level items. Consequently,
the hierarchical property in LBRS is unique and needs to be considered.
4 http://www.bikely.com/ 5 http://research.microsoft.com/en-us/projects/geolife/ 6 www.sportsdo.com.br/
3.2.2 Distance of Locations and Users
The second distinguishing property of locations in LBRS is distance. To find the
strength of relationship and connections between the users, distance must be
considered. The shorter the distance, the stronger the relationship. Three
geospatial distance relations are defined to compute the relationships and connections
among users [67]: the distance between two users, the distance between a location
and a user, and the distance between two locations. Each of the three can affect
recommendation systems in a different way. In the first case, the distance between
two users indicates the similarity between them. For instance, users with a similar
history of visiting the same locations are highly likely to have similar preferences and
interests [125, 152], and users who live in the same residential area are possibly friends
[153]. In the second case, the distance between a user and a location indicates the
probability of the user’s attraction to that location. For example, users visit
nearby places more frequently than places that are far away from their homes [154]. In the
last case, the distance between two locations indicates the association between
them. For instance, restaurants and shopping malls are mostly situated near each
other [21].
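The user-to-location distance relation can be made concrete with a short sketch. The text does not state which distance formula is used, so the haversine great-circle distance below is an assumption, as are the coordinates, the venue names, and the helper `nearest_venues`.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_venues(user_position, venues, limit=2):
    """Return the names of the `limit` venues closest to the user."""
    lat, lon = user_position
    ranked = sorted(venues, key=lambda name: haversine_km(lat, lon, *venues[name]))
    return ranked[:limit]


# Hypothetical venue coordinates, for illustration only.
VENUES = {"a": (0.0, 0.1), "b": (0.0, 1.0), "c": (0.5, 0.5)}
closest = nearest_venues((0.0, 0.0), VENUES)
print(closest)
```

The same function can serve all three relations (user to user, user to location, location to location), since each reduces to a distance between two coordinate pairs.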
3.2.3 Sequential Ordering
Most users visit favorite places at regular intervals. Such regular visits create a
relationship between users in a sequential order. For example, if two users of the same
company regularly dine at two different restaurants and meet again in the cinema, an
ordering of visits can be created that may show some common preferences among them
[27] or that may possibly show traffic conditions [155]. Figure 3.2 shows the visit
patterns users can have in an LBRS. A user can visit different locations and add such
locations with media content such as tips, tags, and feedback.
Figure 3.2 Visit patterns of users in a LBRS
3.3 Motivation
A rapid increase in the use of mobile social networking applications has been witnessed
in the past few years. These applications empower users to update their status while
visiting a venue. The status is updated through either a feedback-based or a tip-based
mechanism [3, 26, 34]. The amount of data gathered by such applications on a daily basis
is huge. On the basis of the data collected through mobile social networks, numerous
recommendation systems have been developed in recent years. These recommendation
systems use location-based data to suggest venues to users based on spatial
locality. However, generating real-time recommendations is still a research issue
because of the large diversity in the datasets of users’ historical check-ins [66, 67]. To
generate the optimal recommendation for a user, the recommendation system has to
consider the following factors: (a) personal preferences, (b) past check-ins, (c)
current context, such as time and location, and (d) collaborative social opinions (other
individuals’ preferences).
Numerous studies [20, 34, 70, 72, 87, 98, 101, 110, 127, 131, 148, 156, 157]
have used the collaborative filtering (CF) method to solve venue
recommendation problems. Legacy venue recommendation systems match a given
user’s venue check-in profile with the profiles pre-recorded for other
users. The profile data is stored in a user-venue check-in matrix. Users with
similar check-in patterns contribute their opinions about the venues they visited. As a
result, the recommendation system offers useful personalized recommendations for the
target user. However, the following research gaps in previous works significantly
impact the performance of legacy venue recommender systems:
Cold Start. The cold start problem in state-of-the-art CF recommendation systems [71]
arises when a venue recommendation is needed for a new user. The main reason is
that the system has only limited information about the new user with which to perform
similarity measures on the user-venue matrix. As a result, the insufficient venue records
yield zero values in the similarity computations, and these zero values substantially
degrade recommendation quality. Our proposed model addresses the cold
start problem by generating recommendations from the Top-N users whose preferences
best match those of the cold start user. The Top-N users must be nearest neighbors and
also popular among the nearest neighbors.
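The cold start strategy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: preferences are modelled as sets of category labels, the Jaccard coefficient is an assumed similarity measure, and the popular users, venues, and the function `recommend_for_new_user` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two preference sets (an assumed measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical pre-computed list of popular users with their preferences and venues.
POPULAR_USERS = {
    "p1": {"prefs": {"coffee", "books"}, "venues": ["Cafe A", "Library B"]},
    "p2": {"prefs": {"sports", "burgers"}, "venues": ["Stadium C", "Diner D"]},
    "p3": {"prefs": {"coffee", "burgers", "music"}, "venues": ["Cafe A", "Club E"]},
}


def recommend_for_new_user(new_prefs, popular_users, top_n=1):
    """Recommend the venues of the top_n popular users most similar to the new user."""
    ranked = sorted(
        popular_users.items(),
        key=lambda item: jaccard(new_prefs, item[1]["prefs"]),
        reverse=True,
    )
    recommendations = []
    for _name, profile in ranked[:top_n]:
        for venue in profile["venues"]:
            if venue not in recommendations:  # avoid duplicate venues
                recommendations.append(venue)
    return recommendations


print(recommend_for_new_user({"coffee", "music"}, POPULAR_USERS))
```

Because the list of popular users is computed offline, the online step only compares the new user against a small, fixed set of profiles rather than the full user-venue matrix.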
Data Sparseness. Users who visit a limited number of locations produce a sparse
user-to-location check-in matrix. The negative effect of data sparseness is the non-optimal
computation of the nearest neighbor set of users similar to a particular user, which
also affects the accuracy of recommendations. Moreover, the performance of existing
LBRS is also affected by the sparseness of the user-to-user relationship matrix when
it is directly manipulated with CF-based models [31]. In our proposed model, the problem
of data sparseness is handled by introducing a preprocessing phase. The preprocessing
phase uses the K-Means clustering mechanism to create clusters within the data
and assign users to the clusters. Users within the same cluster are more likely to have
similar preferences. This reduces the number of zero similarity values
that occur when data is sparse and similarity is computed among users with
non-matching preferences.
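The clustering step of the preprocessing phase can be sketched with a small, dependency-free K-Means routine. The check-in vectors are hypothetical, and seeding the centroids with the first k points is a simplifying assumption made to keep the example deterministic; it is not claimed to be the initialization used in this work.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def k_means(points, k, iterations=10):
    """Plain K-Means; centroids start at the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids


# Hypothetical check-in counts per user over four venue categories.
USER_VECTORS = [
    [5, 4, 0, 0],
    [0, 0, 4, 5],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
]
labels, centroids = k_means(USER_VECTORS, k=2)
print(labels)
```

Similarity is then computed only among users that share a cluster label, which avoids many of the zero-valued comparisons between users with unrelated check-in histories.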
Scalability. Memory-based CF recommender systems exploit rating data generated
by users and apply simple approaches to compute the similarity among users
or items [36, 72, 73]. Unfortunately, such systems suffer from scalability
issues. They parse thousands of users in the user-venue matrix in real time,
an approach that is neither efficient nor scalable. To handle the
scalability issue, a few works have applied model-based CF. Model-based
approaches use data mining and machine learning methods to find
patterns in training data, which reduces the size of the user-item rating
matrix [1, 31]. However, there is a tradeoff between the reduced dataset size and the
quality of recommendation: if the size of the dataset is reduced to achieve
fast online processing, recommendation quality degrades. The
immediate consequence of the issues listed above is suboptimal performance in CF-based
recommendation models. Consequently, it might not be viable to exclusively employ
a memory-based CF model for venue recommendations. To address the challenge of
scalability, our model uses cloud computing to perform real-time processing of
large-scale data. Moreover, the MapReduce framework is used to enable parallel computation on
multiple nodes.
The core objective of our study is to use the aforementioned factors efficiently
to produce truly real-time venue recommendations. However, several issues
negatively impact the performance of the real-time recommendation process,
mainly driven by the cost and complexity of processing large-scale diverse datasets [1,
31]. To scale efficiently, a recommendation system needs large-scale storage and
computational resources. This study describes an approach that leverages cloud
resources and offers service-based interfaces for processing, mining, comparing, and
managing large-scale datasets as required by real-time recommendations. A model-based
approach is used on cloud-based infrastructure for generating venue recommendations.
The results of our research show that real-time recommendations can be generated
by using model-based approaches on cloud infrastructure.
Moreover, the dataset reduction is achieved in the cloud environment using
MapReduce. However, the tradeoff between recommendation quality and the reduced
dataset remains a hurdle: the quality of recommendations may be affected if the dataset is
significantly reduced to improve the efficiency of online real-time processing. The
proposed model addresses the scalability issues by putting forward a cloud-based
architecture that distributes data-intensive and computationally intensive loads over cloud servers.
To handle the problem of cold start in venue recommendations, our model compares
the preferences of the new user with those of the existing top (popular) users. The model
makes recommendations according to the top user whose preferences best match
those of the new user. To handle the data sparsity problem, our model
eliminates the sparse data in the pre-processing phase while clustering the venues.
3.4 Related Work
In recent years, several surveys have summarized the commonly used
techniques and challenges in the field of recommendation systems [1, 59, 114, 118,
124, 135, 158]. The authors of [1] provided a general overview of recommender
systems and discussed various collaborative filtering algorithms. The authors also
provided a classification in terms of similarity, neighborhood, predictions, and
recommendations. Moreover, that survey discussed
various KNN schemes for recommendation systems and the cold start issues, along with
evaluations. However, the authors did not specifically discuss location based
recommendations. The articles presented in [98] and [59] focus on the
research challenges of recommendation systems. The authors provided an overview of
the major techniques, such as collaborative filtering, content based filtering, and hybrid
recommendations, along with the various challenges faced by such techniques. The
authors of [118] and [135] attempted to summarize the evaluation metrics and
techniques used in recommendation systems. In the latest survey on recommendation
systems, the main focus is on neighborhood-based recommendation methods used for
item recommendation [158].
Most of the above-mentioned literature provided a general overview of
commonly deployed techniques in recommendation systems and their research challenges. To
the best of our knowledge, no extensive study has been conducted on location
based recommendation systems (LBRS) as presented here. A related study was
presented in [27]. However, that study was conducted on the topic of
location based social networks (LBSNs), whereas this research focuses specifically on
LBRS. In [27], the authors restricted their analysis to the data sources used (e.g., user
profiles, history of user-visited locations, and history of online user activities on LBSNs),
the methodology employed for recommendation (for example, content based, collaborative
based, and link-analysis based), and the objectives of the recommendations (for
instance, users, locations, social media, and activities). In contrast, our study presents a
qualitative comparison of various techniques proposed in LBRS, not only for
individuals but also for group based location recommendations. Moreover, this research
additionally discusses numerous significant services offered by LBRS. Such services
are categorized as: (a) geo-tagged media based services, which allow users
to attach a location to their media content, such as text, videos, and photos
created in the physical world, (b) point-location based services, which allow users to add
and share locations such as restaurants, shopping malls, or cinemas, and (c)
trajectory based services, which allow users to add both destination point locations and the
routes to those destinations. Furthermore, the distinguishing features of locations
that are utilized by LBRS for recommendations are also presented. The location
features are categorized as: (a) location hierarchy, (b) distance of locations and users, and
(c) sequential ordering. Criteria for building a user’s trust and confidence in a
recommendation system are also discussed. The criteria include: (a) accuracy, (b)
familiarity, (c) novelty, (d) diversity, (e) context compatibility, (f) justification of
recommendations, and (g) sufficiency of information. Basic similarity calculations and
evaluation metrics are also presented, such as (a) cosine based similarity [126], (b)
correlation based similarity [127], and (c) adjusted cosine similarity [33, 125]. Finally,
a comparative study and a tabular summary of the existing schemes are presented.
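Two of the similarity calculations listed above, cosine based and correlation based similarity, can be sketched on plain rating vectors. The sketch assumes dense, equal-length vectors with no missing ratings, and the function names are illustrative; Pearson correlation is computed here as the cosine of the mean-centered vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a vector without ratings carries no direction
    return dot / (norm_a * norm_b)


def pearson_similarity(a, b):
    """Correlation based similarity: cosine of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


print(cosine_similarity([1, 2, 3], [2, 4, 6]))
print(pearson_similarity([1, 2, 3], [3, 2, 1]))
```

Cosine similarity ignores differences in rating scale between users, whereas the mean-centering in the correlation based variant also removes each user's rating bias, so the second call reports opposite tastes for two users whose raw vectors still have a positive cosine.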
LBRS can be divided into two main categories: (a) generic location
recommendation and (b) personalized location recommendation. In generic location
recommendations, public opinions are extracted and the system recommends the most
popular locations according to the extracted public opinions [159]. The limitation of
such systems is that they produce identical recommendations for all users because
users’ preferences are not considered. Personalized location recommendations have been
proposed to overcome this limitation. Such
systems provide users with the most relevant locations according to the preferences
given by the user [91].
Moreover, this research categorizes existing approaches [1, 31, 34, 92, 160] as
(a) trajectory based, (b) explicit rating based, and (c) check-in based approaches.
Trajectory based approaches use profiled data about a user’s
sequence of visits to various locations, the paths selected, and the duration of stays at those
locations. In [36], a trajectory-based graphical model was proposed that keeps
track of the routes most frequently traversed by users and recommends the most appropriate
route to a new user. A personalized route recommendation
method that is very similar to [36] is presented in [37]. The work
in [159] mined GPS trajectory data to extract popular locations on the
basis of users’ travel sequences. Although the aforementioned approaches propose
locations on the basis of users’ past trajectories, they are not able
to distinguish places in terms of their categories, which is considered in our
proposed RTVR framework. Rating-based recommendation exploits
existing profiled rating data to recommend the most popular venues or
travel routes in rural areas. The works in [161] and [162]
designed models using collaborative filtering that take into account users’ profiled ratings
to generate personalized venue recommendations. A few studies have proposed
recommendations based on implicit ratings. Implicit ratings use the check-ins
performed by users at different venues to estimate recommendations [33, 162]. For
instance, the work in [156] applied the random-walk-with-restart
method on a user-venue check-in matrix to generate personalized recommendations
for the target user. The authors of [34] put forward a recommendation approach that
identifies region-wise expert users in addition to venues from check-in data under
various types.
The majority of the aforementioned approaches build their recommendations
on (memory-based) CF models. These models enable the approaches to
efficiently predict a user’s future preferences based on profiled data. However, these
approaches overlook scalability issues caused by the large number of similarity computations
over the user-venue matrix during the online recommendation process [71, 88]. In
addition, these approaches suffer heavily from data divergence, real-time
recommendation, and cold start problems because of the limited data in the
user-location and user-check-in matrices [107]. These approaches also
overlook the group recommendation problem and do not take into account the impact
of real-world time-varying conditions on the recommendation results.
A variety of location recommendation approaches are available, such as matrix
factorization, explicit rating, implicit rating, route recommendations, location
recommendations, and location-based group recommendations. In the following
subsections, some of the techniques used in LBRS are discussed in detail. This section
is organized according to the different techniques used in LBRS, as depicted in Figure 3.3.
Figure 3.3 Categorization of Techniques used in LBRS
3.4.1 Matrix Factorization Techniques
Matrix factorization tries to find two smaller matrices by factorizing a
larger matrix, such that when the smaller matrices are multiplied, the result
approximates the large matrix7. Matrix factorization discovers the latent features
underlying the interactions between two different kinds of entities, which helps in
predicting ratings in collaborative filtering. Formally, let $U$ represent a set of users and
$L$ a set of locations. Let $R$ be the matrix of size $|U| \times |L|$ that contains all the ratings
that the users have assigned to locations. To find $K$ latent features, two matrices,
$P$ with dimensions $|U| \times K$ and $Q$ with dimensions $|L| \times K$, need to be
found such that, when $P$ and the transpose of $Q$ are multiplied, the result approximates
matrix $R$, as indicated in Figure 3.4 and equation (3.1). The equation for matrix
factorization can be given as
$$R \approx P \times Q^{T} = \hat{R} \qquad (3.1)$$
7 http://www.quuxlabs.com/blog/2010/09/matrix-factorization-a-simple-tutorial-and-implementation-in-python/
[Figure 3.3 content: Techniques used in LBRS. Types: individual recommendation, group recommendation. Methodologies: matrix factorization, implicit rating-based, explicit rating-based.]
Figure 3.4 Concept of Matrix Factorization
In this way, each row of the user-feature matrix $P$ represents the strength of the
association between a user and the features, and each row of the location-feature matrix
$Q$ represents the strength of the association between a location and the features.
For a user $u_i$, the rating of a location $l_j$ can be predicted by the dot product
of the two vectors corresponding to $u_i$ and $l_j$ as
$$\hat{r}_{ij} = p_i^{T} q_j = \sum_{k=1}^{K} p_{ik} q_{jk} \qquad (3.2)$$
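The factorization and the dot-product prediction described above can be sketched in a few lines of plain Python. The text does not specify a training procedure, so stochastic gradient descent with L2 regularization is an assumption here, as are the hyperparameters (`k`, `steps`, `alpha`, `beta`), the toy ratings, and all function names.

```python
import random

# Hypothetical ratings: (user index, location index) -> rating; absent pairs are unrated.
RATINGS = {
    (0, 0): 5, (0, 1): 3, (0, 3): 1,
    (1, 0): 4, (1, 3): 1,
    (2, 0): 1, (2, 1): 1, (2, 3): 5,
    (3, 1): 1, (3, 2): 5, (3, 3): 4,
}


def init_factors(n_rows, k, rng):
    """Random factor matrix with n_rows rows and k latent features."""
    return [[rng.random() for _ in range(k)] for _ in range(n_rows)]


def predict(P, Q, i, j):
    """Predicted rating: dot product of user row i of P and location row j of Q."""
    return sum(p * q for p, q in zip(P[i], Q[j]))


def squared_error(ratings, P, Q):
    """Sum of squared errors over the observed ratings only."""
    return sum((r - predict(P, Q, i, j)) ** 2 for (i, j), r in ratings.items())


def factorize(ratings, n_users, n_locations, k=2, steps=3000,
              alpha=0.005, beta=0.02, seed=0):
    """Fit P and Q by SGD on observed ratings (assumed training procedure)."""
    rng = random.Random(seed)
    P = init_factors(n_users, k, rng)
    Q = init_factors(n_locations, k, rng)
    for _ in range(steps):
        for (i, j), r in ratings.items():
            err = r - predict(P, Q, i, j)
            for f in range(k):
                p_if, q_jf = P[i][f], Q[j][f]
                P[i][f] += alpha * (2 * err * q_jf - beta * p_if)
                Q[j][f] += alpha * (2 * err * p_if - beta * q_jf)
    return P, Q


P0, Q0 = factorize(RATINGS, 4, 4, steps=0)  # untrained random factors
P, Q = factorize(RATINGS, 4, 4)             # trained factors
error_before = squared_error(RATINGS, P0, Q0)
error_after = squared_error(RATINGS, P, Q)
print(error_before, error_after, predict(P, Q, 0, 2))
```

The last call predicts a rating for a pair that was never observed, which is the purpose of the factorization: unrated user-location pairs receive an estimate from the learned latent features.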
The most common recommendation approach in the literature is
matrix factorization [163]. Matrix factorization was first used in LBRS in [164], where
interesting activities and locations were revealed for recommendations using
an experimental Global Positioning System (GPS) dataset. Among the different matrix factorization
models, the most popular model used for recommendations in LBRS is the 0/1 scheme
model [21, 33], in which the value 0 is used for non-visited locations and 1 is used for visited
locations. Using the 0/1 model, the authors of [21, 33] studied the social and geographical
influence in point-of-interest (POI) recommendation based on CF techniques.
Another model, proposed in [165], is based on the frequencies of check-ins, which are
used to compute the preferences of users for venues. With the help of this scheme, the author
developed an LBRS using the matrix factorization method. Furthermore, in [166], the
authors proposed a multi-center Gaussian model in combination with matrix
factorization. The multi-center Gaussian model was used to capture the geographical
influence, and matrix factorization was combined with social regularization to develop
an LBRS. The main disadvantage of the matrix factorization technique is the
non-convexity problem [167]. In a non-convex problem, the model has many local
optima, and many iterations are required to determine whether the problem has an optimal
solution and whether a solution found is global. Therefore, the time efficiency of
matrix factorization techniques is affected by the non-convexity problem.
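The difference between the 0/1 scheme and the check-in frequency scheme can be shown by building both matrices from the same check-in log. The log, the user and location labels, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical check-in log of (user, location) events.
CHECKIN_LOG = [
    ("u1", "l1"), ("u1", "l1"), ("u1", "l3"),
    ("u2", "l2"), ("u2", "l3"), ("u2", "l3"), ("u2", "l3"),
]
USERS = ["u1", "u2"]
LOCATIONS = ["l1", "l2", "l3"]


def frequency_matrix(log, users, locations):
    """Frequency scheme: entry (u, l) is the number of check-ins of u at l."""
    counts = Counter(log)
    return [[counts[(user, loc)] for loc in locations] for user in users]


def binary_matrix(frequencies):
    """0/1 scheme: 1 for a visited location, 0 for a non-visited one."""
    return [[1 if count > 0 else 0 for count in row] for row in frequencies]


frequencies = frequency_matrix(CHECKIN_LOG, USERS, LOCATIONS)
visited = binary_matrix(frequencies)
print(frequencies)
print(visited)
```

Either matrix can serve as the input to a factorization; the frequency scheme keeps how strongly a user prefers a venue, while the 0/1 scheme keeps only whether a visit happened.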
3.4.2 Explicit Rating Techniques
Recently, many online social services have evolved that permit users to explicitly rate
the locations they visit. In rating-based recommendation systems, existing
rating data is used to generate user-preferred recommendations. The authors of [26,
161] presented similar models based on users’ existing ratings to compute
personalized CF-based location recommendations. A similar approach was proposed in
[31] that applies an item-based CF method to all the ratings of locations in
a user’s surroundings. These techniques may capture users’
preferences precisely, but they are less effective in terms of scalability. Moreover, having
very few entries in the user-rating matrix leads to the data sparseness problem in these
approaches.
In addition to the model-based/memory-based categorization, LBRS such as [1, 27, 32,
92, 160] are also classified as trajectory based, check-in based, and explicit rating
based. A trajectory-based graphical model was proposed in [36] in which the routes
most frequently travelled by users are recorded and, in return, the system
recommends the best available route to a new user. Similarly, an approach to compute
personalized route recommendations is proposed in [37]. The extraction of the most popular
locations by mining GPS trajectories is presented in [27]. The extraction process is
based on users’ travel history. All of these techniques recommend
locations based on users’ visited routes or trajectories, but such systems cannot properly
differentiate between locations in terms of their categories, which is
the main task in our proposed framework. Moreover, most of the aforementioned
techniques rely on memory-based CF models, which allow them to
represent a user’s expectations based on the user’s previous activities. However, such
techniques are not capable of providing sufficient scalability by simultaneously
processing massive amount of real-time data. Table 3.1 shows an example of explicit
rating in which each user has rated some locatio
Table 3.1 Example of Explicit Rating
Ratings of users on locations
User-location   l1   l2   l3   l4   l5   l6
u1              4    1    3    3    -    2
u2              2    -    4    4    5    1
u3              5    3    4    3    -    5
u4              2    -    -    -    -    2
3.4.3 Implicit Rating Techniques
Some techniques have been proposed in which the models are based on implicit ratings
[26, 33]. In implicit ratings, the number of check-ins performed by a user at each of
multiple locations is recorded. In [38], the authors applied a random-walk-with-restart
approach to a user-location check-in matrix to compute personalized recommendations for a
specific user. A similar recommendation approach is proposed in [32] that
computes region-wise popular locations and users from check-in data. Figure 3.5 shows
the basic concept of implicit ratings based on the number of check-ins performed by the
users. Users first perform check-ins by visiting different locations and later rate each
location according to their experience at that location. The rating of the location,
along with the check-in data, is stored in a user-location rating matrix. Different implicit
rating models are applied to the user-location rating matrix in order to generate the final
recommendation list.
Figure 3.5 Basic Concept of Implicit Rating using Check-Ins
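The construction of a user-location check-in matrix described above can be sketched as follows. This is a minimal illustration, not the implementation of any cited work: the function name `build_checkin_matrix`, the record format (one `(user, location)` pair per check-in), and the sample data are assumptions made for the example.

```python
from collections import Counter


def build_checkin_matrix(checkins):
    """Count check-ins per (user, location) pair.

    `checkins` is assumed to be a list of (user, location) tuples,
    one tuple per check-in event (a hypothetical record format).
    Returns the sorted user ids, the sorted location ids, and a matrix
    whose entry [i][j] is the number of check-ins of user i at location j.
    """
    counts = Counter(checkins)
    users = sorted({user for user, _ in checkins})
    locations = sorted({location for _, location in checkins})
    # Counter returns 0 for pairs that never occurred, i.e. unvisited locations.
    matrix = [[counts[(user, location)] for location in locations]
              for user in users]
    return users, locations, matrix


# Hypothetical check-in log: u1 visited l1 twice and l2 once, and so on.
checkins = [
    ("u1", "l1"), ("u1", "l1"), ("u1", "l2"),
    ("u2", "l2"), ("u2", "l3"), ("u2", "l3"), ("u2", "l3"),
]
users, locations, matrix = build_checkin_matrix(checkins)
for user, row in zip(users, matrix):
    print(user, row)
```

The resulting frequencies act as implicit ratings: a zero entry means that no visit was observed, which is not the same as a negative opinion.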
3.4.4 Route Recommendation Techniques
The authors of [68] focused on the problem of traffic jams and long queues while traveling to tourist hotspots. Tourists suffer from road congestion and time wasted in long queues, and they often miss their favorite spots because of traffic conditions. These problems may reduce the interest and satisfaction of self-driving tourists towards their favorite locations and hot spots [168]. The main idea is to eliminate the traffic jam problem by utilizing personalization techniques and real-time traffic information. The techniques used to overcome the problem include a Vehicle-to-Vehicle Communication System (V2VCS), fuzzy set theory, and Genetic Algorithms (GA). V2VCS [169] is used to share real-time traffic information among self-drive tourists. Fuzzy set theory [170], along with the “Technique for Order Preference by Similarity to Ideal Solution (TOPSIS)” [171], is used to automatically score all the candidate routes instead of asking the user to rate a huge number of candidate routes. The candidate routes are scored according to the given preferences of the user and the real-time values of five route attributes: Point-of-Interest (POI), road condition, distance, fee, and traffic. A genetic algorithm [171] is used to find the most appropriate and relevant route among all the candidate routes according to the preferences given by the tourist. The GA is also used to explore the route scores and real-time traffic information. The research assumes that information about personalization and real-time context can be collected by the system instantaneously. For the evaluation of the system, a web prototype was designed to simulate self-drive tourists. The main contributions of the research are a novel technique to score all the candidate routes, a route generation technique, and the design of a personalized recommendation system that offers real-time route recommendations. However, the authors did not consider attributes such as travel time, route complexity, time of day, and location type. Also, flexibility in the route recommendations, which would allow tourists to customize their own route preferences, was not considered. Finally, V2VCS is not applicable in some countries because of the lack of infrastructure. Moreover, in many developing and underdeveloped countries, vehicles equipped with V2VCS sensors are the latest models and are not affordable for the majority of the population. Therefore, other platforms such as smartphones need to be considered for route recommendation services.
Similarly, a route recommendation technique is proposed in [69]. The authors addressed the problem that a user’s route preferences are influenced by many dynamic and latent factors that are difficult to model using existing techniques. To solve this problem, the CrowdPlanner [69] technique is used. This is a crowd-based route recommendation technique that uses the crowd’s knowledge. A large-scale real trajectory dataset and hundreds of volunteers were involved in the experiments and evaluation of the technique. The main achievement of the work is the selection of the best route, which was verified by comparing the results of the technique with previous techniques, mining algorithms, and web services. The authors of CrowdPlanner did not consider quality control of popular route mining algorithms or the mining of latent factors that may affect the driving route.
3.4.5 Locations Recommendation Techniques
Location recommendations on the basis of the preferences of users are also proposed in [34]. In addition to the user’s preferences, social opinions from local experts are considered for the final recommendations to the users. The main focus of the work is to overcome the problem of data sparsity. The user-location matrix is sparse because each user visits only a limited number of locations. Data sparsity becomes even more challenging when users travel outside their native city. A weighted category hierarchy (WCH) is used to model each user’s preferences, and a preference-aware candidate selection algorithm is used to select candidate local experts whose social opinions are taken. A real dataset collected from Foursquare was used for evaluation. The authors claim that their system is more efficient than major LBRS such as Most-Preferred-Category-based (MPC) [27], Location-based Collaborative Filtering (LCF) [33], and Preference-based Collaborative Filtering (PCF) [72]. However, the authors have not considered real-world factors such as weather conditions and temporal features to improve the quality of recommendations.
In [154], the authors proposed the Location Aware Recommendation System (LARS), a technique that uses spatial properties for location recommendation that were not considered in previous techniques. The primary recommendation technique used in the work is collaborative filtering. The secondary technique is user partitioning, which is used to maintain an adaptive pyramid structure. The datasets used for the evaluation of the technique are taken from MovieLens (footnote 8) and Foursquare (footnote 9). To improve the proposed technique, location attributes of real-time world factors should be considered.
The authors of [66] considered patterns of human mobility, which are one of the important aspects of recommendation, especially in LBRS. More than 10,000 frequent users of LBRS were studied to monitor their frequent mobility patterns. The system investigates the metadata related to the locations of the users, considering the type of location and its evolution over time. Users are then clustered based on their movement behavior, and the system predicts each user’s next movements. The users of the system perform a check-in to share their current location with the system. GPS-enabled devices are used for performing the check-ins. A Foursquare dataset is used for the evaluation of the system. The authors claimed that the proposed system predicts human mobility efficiently when compared to traditional systems. The system also predicts mobility patterns better, even when provided with only a limited history of the user. Despite these achievements, the authors did not consider location prediction using temporal pattern models.
8 MovieLens: https://movielens.org
9 Foursquare: http://foursquare.com
The authors of [38] proposed an approach for modeling human activities and geographical areas by categorizing the locations. A spectral clustering algorithm is applied to users’ similarities and the locations of two cities using a dataset from Foursquare. The approach allows urban neighborhood comparison within and across cities and the identification of communities based on visits to the same category of locations. The key achievements of the work include profiling and dividing users into communities and relating mobile phone users to their particular categories and locations. The authors did not consider temporal variations, which would allow users and areas to be characterized at certain periods of the day. User tips and comments were also not considered as a source of semantic information. Figure 3.6 shows an overview of location recommendations.
Figure 3.6 Overview of Location Recommendations
3.4.6 Group Recommendation Techniques
With the rapid increase in the use of social networking services, the significance of recommendation models that consider group preferences has also increased. However, most existing traditional recommendation schemes do not take into account group-of-“friends” scenarios [38, 66, 69]. In [70], the authors proposed a location-sensitive recommendation approach for ad-hoc social network environments. The paper proposed an approach known as spatial social union, which computes recommendations not only for a single user but also for a group of users. The approach computes the similarity between two users and generates multiple matrices derived from user-user, user-item, and user-location graphs. The online MovieLens dataset is used for evaluation. In [157], the authors proposed personalized event-based group recommendations. The main contribution of the work is group recommendations for users living in the same city. The localization property of users and groups was extracted, and a latent factor model was then integrated with explicit location features in order to provide group recommendations. The dataset used is from Meetup, an online social media site. Table 3.2 summarizes the description and weaknesses of the state-of-the-art techniques, and Table 3.3 provides a summary of some of the selected techniques.
Table 3.2 Strengths and Weaknesses of the Selected Techniques
Techniques | Description | Weaknesses
Liu, L., et al. [68] | Scoring of all the candidate routes, introducing route generation technique, and real-time recommendation of routes. | Some key route attributes were ignored such as travel time, route complexity, time-of-day, and the location. Customization options for users to plan their routes were missing. Platforms such as smart phones need to be considered.
Bao, J., et al. [34] | Location-based Collaborative Filtering (LCF), and Preference-based Collaborative Filtering (PCF). Achieved better performance than Most-Preferred-Category-based (MPC). | Real-world factors such as weather conditions and temporal features were ignored.
Levandoski, J.J., et al. [154] | Spatial properties of the users are introduced using CF-based and user partitioning techniques that were previously ignored by traditional recommendation systems. | Location attributes of real-time world factors were not considered.
Su, H., et al. [69] | Use of mining algorithms and web services to find optimal route. | Quality control of popular route mining algorithms and mining latent factors were not considered.
Preoţiuc-Pietro, et al. [66] | Better prediction of human mobility patterns. Well-handled data sparsity problem. | Location predictions using temporal pattern models were not considered. Trajectory-based information also not considered.
Noulas, A., et al. [156] | Profiling and dividing users in communities. Relating users of mobile phones with their particular category and locations. | Temporal variations for further characterization and including semantic information need to be considered.
Table 3.3 Summary of Techniques
Recommendations | Problems Addressed | Techniques | Dataset Used
Routes [68] | Traffic Jams, Long Queue | V2VCS, Fuzzy Set Theory, Genetic Algorithm | Questionnaires, V2VCS data
Location [34] | Data Sparsity, Cold Start | Hyperlink-Induced Topic Search | Foursquare
Location [37] | Uncertain Trajectories | User Explicit Ratings | Real Dataset
Location [162] | Personalized Routes | User Explicit Ratings | Questionnaire
Location [154] | Quality of Recommendations | K-Nearest Neighbor | MovieLens, Foursquare
Routes [36] | Optimal Routes | Implicit Ratings | Open Street Map Project, Spatial Dataset
Locations [38] | Un-visited Locations | Random Walk with Restart | Gowalla, Foursquare
Locations [164] | Un-visited Locations and Activities | Matrix Factorization | Real Dataset using GPS Trajectories
Locations [165] | Spot Identification | Matrix Factorization | Austin in Texas (ATX) and New York City (NYC) Datasets
Locations [166] | Un-visited Locations | Multi Centered Gaussian Model and Matrix Factorization | Gowalla
Group Recommendations [172] | Implicit Similarities | Hybrid Recommendations | Yahoo! Movie, Yahoo! Music
Locations [70] | Location Sensitive | Spatial social union | MovieLens
3.5 Quantitative Analysis
In Section 3.4, a qualitative analysis was conducted of some of the techniques proposed in the area of LBRS. This section presents a quantitative comparison of selected techniques from the literature to show how the performance of existing models is affected by different datasets. A model may exhibit good performance on one type of dataset but may not perform as well on another. Therefore, evaluating the models on different datasets makes the comparison fairer and also provides insight into how the performance of a model is affected by an increase or decrease in data sparseness or dataset size. A quantitative analysis of four state-of-the-art recommendation techniques has been performed over the Foursquare, MovieLens, and Gowalla datasets. The details of the techniques and datasets used for the comparison are discussed below.
3.5.1 Datasets
Three real datasets that are publicly available for experimentation have been selected.
Each of the selected datasets is discussed below.
(a) Foursquare
Foursquare [28] allows users from all around the world to share their location experience by performing a check-in. The Foursquare dataset has more than 10 billion check-in records with different attributes, including users, venues, ratings, social graphs, and the longitude and latitude of the location or venue.
(b) MovieLens
MovieLens [29] is a movie rating dataset made available online by a movie recommendation system (a project at the University of Minnesota). Datasets of different sizes are available on the MovieLens website for experimental purposes. The dataset used here consists of 1 million ratings for 900 movies by 700 users.
(c) Gowalla
Gowalla [173] is another dataset that is available online for experiments. The Gowalla dataset was collected by a location-based social networking website that is no longer functional. The dataset consists of 30 million check-ins by 0.3 million users at 2.8 million locations.
3.5.2 Techniques
Four state-of-the-art recommendation techniques are used for the quantitative analysis; they are discussed below.
(a) Location Aware Recommendation System (LARS)
LARS [154] is a technique that uses spatial properties for location recommendation. Spatial items that are nearest in distance are given priority for selection as the final recommendation to the user. The primary recommendation technique used in the work is collaborative filtering. The secondary technique is user partitioning, which is used to maintain an adaptive pyramid structure.
(b) Geographical Probabilistic Factor Model (GPFM)
GPFM [174] is a technique based on the preferences and mobility patterns of the users. The technique captures the geographical influence on the check-in data provided by the users. GPFM also uses check-in data and extracts users’ feedback to model user preferences.
(c) Latent Dirichlet Allocation (LDA)
LDA [104] is a technique for modeling user preferences according to the ratings provided by the users. In LDA, users are regarded as documents and rated items are regarded as the words of the documents. Item location and user information are not considered in the LDA model.
(d) Random Walk with Restart (RANDOM)
RANDOM [156] is a technique in which a random walker jumps between the nodes of the graph according to the transition probabilities. The time spent at each node differs under certain assumptions. The walk approaches a steady state, which yields a vector of steady-state probabilities over the visited nodes. Moreover, at every step a constant restart probability allows the walker to jump back to the target node, so nodes closer to the target node receive higher ranks.
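The steady-state behavior of a random walk with restart can be illustrated with a short power-iteration sketch. It is a simplified illustration under stated assumptions, not the implementation evaluated in [156]: the function name, the restart probability of 0.15, the uniform treatment of time spent at each node, and the four-node example graph are all assumptions made for the example.

```python
def random_walk_with_restart(adjacency, start, restart_prob=0.15,
                             tol=1e-10, max_iter=1000):
    """Approximate steady-state visiting probabilities by power iteration.

    `adjacency[i][j]` is a non-negative edge weight from node i to node j.
    At every step the walker returns to `start` with probability
    `restart_prob`; otherwise it follows an outgoing edge chosen in
    proportion to its weight.
    """
    n = len(adjacency)
    row_sums = [sum(row) for row in adjacency]
    scores = [0.0] * n
    scores[start] = 1.0
    for _ in range(max_iter):
        updated = [0.0] * n
        for i in range(n):
            if scores[i] == 0.0:
                continue
            if row_sums[i] == 0:
                # Node without outgoing edges: its mass returns to the start node.
                updated[start] += (1.0 - restart_prob) * scores[i]
                continue
            for j in range(n):
                if adjacency[i][j]:
                    share = adjacency[i][j] / row_sums[i]
                    updated[j] += (1.0 - restart_prob) * scores[i] * share
        # Restart mass: the scores always sum to one, so this adds restart_prob.
        updated[start] += restart_prob
        if sum(abs(a - b) for a, b in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores


# Hypothetical undirected path graph 0 - 1 - 2 - 3, with node 0 as the target node.
graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(graph, start=0)
print([round(s, 4) for s in scores])
```

In this example the scores of nodes 1, 2, and 3 decrease with their distance from the target node, which is the ranking effect described above.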
3.5.3 Experiments
To evaluate the effectiveness of our model, this research adopted the following performance metrics: (a) Precision, (b) Recall, and (c) F-measure. Precision is the ratio of correct recommendations (true positives, $tp$) to the total number of recommendations ($tp$ plus false positives, $fp$). The equation for calculating the precision is given in (3.3).
\[ \mathrm{Precision} = \frac{tp}{tp + fp} \tag{3.3} \]
Here $tp$ represents the total number of correct recommendations (i.e., true positives) and $fp$ indicates the false positives. The term $tp + fp$ captures the total number of recommendations, including true positives and false positives.
Recall is the ratio of the hit set size to the total size of the test set. It is a measure of the recommendation coverage of a given recommendation system, as given in (3.4).
\[ \mathrm{Recall} = \frac{tp}{tp + fn} \tag{3.4} \]
Here, $fn$ indicates the false negatives, that is, relevant items in the test set that the system failed to recommend.
F-measure is the harmonic mean of precision and recall, as shown in (3.5).
\[ \text{F-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3.5} \]
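The three metrics in (3.3) to (3.5) can be computed for a single Top-N recommendation list as sketched below. The function name and the sample venue identifiers are hypothetical, and the sketch assumes that the test set of a user is available as a set of relevant items.

```python
def precision_recall_f_measure(recommended, relevant):
    """Compute precision, recall, and F-measure for one recommendation list.

    `recommended` is the Top-N list returned by a recommender and
    `relevant` is the set of items the user actually visited in the test set.
    """
    recommended = list(dict.fromkeys(recommended))  # drop duplicates, keep order
    relevant = set(relevant)
    tp = sum(1 for item in recommended if item in relevant)  # correct recommendations
    fp = len(recommended) - tp                               # recommended but not relevant
    fn = len(relevant) - tp                                  # relevant but not recommended
    precision = tp / (tp + fp) if recommended else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical Top-4 list and test set for one user.
precision, recall, f_measure = precision_recall_f_measure(
    ["v1", "v2", "v3", "v4"], {"v2", "v4", "v7"}
)
print(precision, recall, f_measure)
```

Here two of the four recommended venues are relevant, so the precision is 2/4, the recall is 2/3, and the F-measure is their harmonic mean, 4/7.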
3.5.4 Observations
Figures 3.7, 3.8, and 3.9 show the performance in terms of precision, recall, and F-measure on the Gowalla dataset. The performance is calculated up to a maximum of N = 20 because most of the techniques ignore values greater than 20 for Top-N recommendations. Four techniques, namely LARS, RANDOM, LDA, and GPFM, are compared on all the datasets. In the first step, all the techniques are compared on the Gowalla dataset. It is observed that the precision, recall, and F-measure of all the selected techniques are high on the Gowalla and Foursquare datasets, as depicted in Figure 3.7 to Figure 3.12. The key reason for this behavior is that users’ online behavior is not affected by the location factor. Consequently, for the MovieLens dataset, more candidate items are probably available for the test user. As a result, these potentially related items receive higher rankings (i.e., larger ranking scores), which in turn affects the accuracy and performance. Moreover, as the number of recommended venues increases, the recall increases, whereas the precision is strongly affected, as shown in Figures 3.7, 3.10, and 3.13.
Figure 3.7 Precision of Gowalla Dataset
[Plot: Precision versus Number of Recommendations (2 to 20) for LARS, RANDOM, LDA, and GPFM]
Figure 3.8 Recall of Gowalla Dataset
As can be seen, the precision decreases as the list size increases, while the different features maintain their relative ranking in performance. In contrast, recall increases significantly as the recommendation list grows. Again, the dominating features outperform the others over the entire range of list sizes, as depicted in Figures 3.8, 3.11, and 3.14. This analysis reflects the tradeoff between precision and recall that each feature faces. Therefore, real systems should tune their results according to users’ preferences.
Figure 3.9 F-Measure of Gowalla Dataset
[Plots for Figures 3.8 and 3.9: Recall and F-Measure versus Number of Recommendations (2 to 20) for LARS, RANDOM, LDA, and GPFM]
It is observed that, among all techniques, RANDOM outperforms the others in terms of precision, recall, and F-measure, as RANDOM concurrently leverages several sources of data prior to applying encoding. Another reason RANDOM performs better than the other techniques is that it exploits user home location information. Moreover, RANDOM is a model-based technique, and recent literature has demonstrated the superiority of model-based approaches over memory-based methods [175]. It is also noticed that the performance of the GPFM technique is low on all the datasets. The reason for the low performance is that GPFM uses a Gaussian distribution to model each feature, estimated from a partitioning of space into regions. In the next stage, it uses a multinomial distribution to model the user’s mobility over the regions based on the footprints left in those regions. Therefore, GPFM recommends items with limited (or even no) rating information only if they reside inside the user’s activity area. However, GPFM performs poorly in comparison to the proposed techniques because it overlooks the user home location profile; as a result, GPFM fails to recommend items efficiently when limited rating information is available. Also, the initial increase in accuracy is explained by the data becoming denser within each location cell at low granularity. Afterward, with coarser item locations, the accuracy decreases.
Figure 3.10 Precision of Foursquare Dataset
[Plot: Precision versus Number of Recommendations (2 to 20) for LARS, RANDOM, LDA, and GPFM]
Figure 3.11 Recall of Foursquare Dataset
Besides, the results obtained for the two datasets, Gowalla and Foursquare, are similar to each other. This is notable because the two services used dissimilar interfaces and incentives for user participation. It was also noticed that neither service had a location recommendation engine in place while the data collection was performed.
Figure 3.12 F-Measure of Foursquare Dataset
[Plots for Figures 3.11 and 3.12: Recall and F-Measure versus Number of Recommendations (2 to 20) for LARS, RANDOM, LDA, and GPFM]
Figure 3.13 Precision of MovieLens Dataset
The difference in performance arises from the home distance feature, which is consistently superior in Gowalla compared with Foursquare for all the performance metrics. The main reason is that the average number of user check-ins in Gowalla is higher than that observed in Foursquare. As a result, the “home” location can be inferred more precisely and accurately. Also, for Gowalla, when all user social links and check-ins are available, random-walk-based models attain a larger performance gain due to their ability to use more high-quality data for building the network structure. The characteristics of our data differ from those of the traditional recommender system scenario. A key difference is that, in other studies, users reveal their preferences through ordinal ratings, whereas the proposed study captures check-ins only as numeric frequencies; as a result, there is almost no negative feedback from users.
[Plot for Figure 3.13: Precision versus Number of Recommendations (2 to 20) for LARS, RANDOM, LDA, and GPFM]
Figure 3.14 Recall of MovieLens Dataset
Figure 3.15 F-Measure of MovieLens Dataset
[Plots for Figures 3.14 and 3.15: Recall and F-Measure versus Number of Recommendations (2 to 20) for LARS, RANDOM, LDA, and GPFM]
Moreover, the chosen data is very sparse, and many users and venues have only a single check-in. Meanwhile, in both datasets, a few places have an extremely high number of check-ins, while most places have only a few user check-ins. Therefore, the heterogeneity of user check-ins across places is very high, whereas some venues reach high ranks of popularity. There are several possible reasons for this. Firstly, check-in data does not fully capture all the preferences of users. In fact, in contrast to web ratings, it tends to capture habitual behavior and discourages negative feedback. Secondly, like-mindedness may be sufficient to accurately model the reason for users’ venue visits.
3.6 Real-Time Venue Recommendation Model
The objective of this work is to design a real-time venue-based recommendation system for individuals in a scalable manner. The huge volume of data provided by Facebook, Foursquare, and Twitter needs refinement so that the information extracted is more specific and related to the user’s query. To achieve the objective of generating real-time recommendations in a scalable manner, the following factors have been considered: (a) users’ preferences, (b) current context such as time and location, (c) historic check-ins, (d) geospatial characteristics, (e) ratings, and (f) collaborative social opinion. A scalable cloud-based infrastructure is used for real-time recommendations. The main steps of our proposed RTVR model are depicted in Figure 3.16.
Figure 3.16 System diagram for RTVR Model
Step 1 and 2: Clustering of Venues
In steps 1 and 2, we utilize a clustering approach to reduce the dataset size. We have used the K-Means clustering algorithm to cluster the set of venues based on their geographical location. K-Means clustering is used because of its low computational cost [176]. The K-Means algorithm returns cluster labels (cluster indices) based on the distance from each point to the centroid (cluster’s center point) of each of the $k$ clusters. For each user $u_i$, we find the cluster in which the user lies. To achieve this, we compute the Euclidean distance between the user location and each of the $k$ cluster centers, as depicted in (3.6).
\[ c_{selected} = \arg\min_{1 \le j \le k} \lVert u_i - c_j \rVert \tag{3.6} \]
where $c_j$ is the location of the $j$th cluster center, $u_i$ is the location of the $i$th user, and $n$, the total number of venues, is given by (3.7).
\[ n = \sum_{j=1}^{k} n_j \tag{3.7} \]
where $n_j$ is the number of venues in the $j$th cluster, $n$ is the total number of venues over all clusters, $u_i$ is the user, and $c_j$ is the cluster center. To achieve the objective of real-time recommendations, we cluster all $n$ venues before getting the check-in information of the user, as shown in (3.7).
The reason for clustering venues is that, without clustering, we would have to consider all the
venues when generating recommendations for each user, which takes significantly longer.
With the clustering-based approach, we only consider the venues in the
cluster where the user is currently located. To improve the accuracy and quality of
recommendation, we also consider the neighboring clusters of the current cluster as
shown in (3.8).
\[ \mathit{selected}_{nbrs} = \left\{ \bigcup_{i=1}^{m} \min \left( d(c_i, c_j) = \sqrt{\sum_{l=1}^{p} (c_{il} - c_{jl})^2} \right) \right\}, \quad j = 1 \text{ to } k \tag{3.8} \]
where $m$ is the number of selected neighbor clusters, $c_i$ and $c_j$ are the $i$th and $j$th cluster centers, and $p$ is the number of parameters used to find the distance between $c_i$ and $c_j$. Reduction of data through offline partitioning into clusters leads to a significant reduction in time complexity and helps the system to generate real-time recommendations.
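The offline clustering of venues can be sketched with a small K-Means (Lloyd’s algorithm) routine. This is an illustrative sketch rather than the Map-Reduce implementation used in this work: the function name, the choice of the first $k$ points as initial centers, the toy coordinates, and the treatment of coordinates as points in a plane are assumptions made for the example.

```python
import math


def kmeans(points, k, max_iter=100):
    """Cluster 2-D points with Lloyd's algorithm.

    The first k points serve as initial centers (a simplification that
    keeps the example deterministic). Returns one label per point and
    the final cluster centers.
    """
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each venue joins the cluster with the nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_labels == labels and new_centers == centers:
            break  # converged: neither labels nor centers changed
        labels, centers = new_labels, new_centers
    return labels, centers


# Hypothetical venue coordinates forming two well-separated groups.
venues = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(venues, k=2)
print(labels)
print(centers)
```

Because this step does not depend on any user, it can be run once offline, and only the resulting labels and centers are needed when a check-in arrives.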
Step 3: Placement of User in the Relevant Cluster
The RTVR model selects the most relevant cluster for each user. Each user is placed in his/her relevant cluster based on the distance between the cluster centroid and the user. Euclidean distance is used for the placement of the user because it represents the physical distance between two points, especially when used with the K-Means clustering technique. Two points may lie in opposite directions from the centroid and still reside in the same cluster if their distance from the centroid is the same [177]. Given a point $P = (p_1, p_2, p_3, \dots, p_n)$ and a point $Q = (q_1, q_2, q_3, \dots, q_n)$, the Euclidean distance between $P$ and $Q$ is given by (3.9) [178].
\[ d(P, Q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \dots + (p_n - q_n)^2} = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \tag{3.9} \]
It is likely that the preferred venue lies in a neighboring venue cluster. Thus, the nearest neighboring clusters are also selected, based on the distance from the selected cluster center to the other cluster centers. Figure 3.17 shows an example of user placement in his/her relevant cluster. When the user performs a “check-in”, the system places him/her in the relevant cluster according to the geographical coordinates of the check-in. In the example, the user is placed in cluster 5 by the system.
Figure 3.17 User Placement in the Relevant Cluster
Step 4 and 5: Ranking of the Venue
In the last phase, the RTVR model uses the KNN algorithm to rank all the selected venues in order to recommend the top-ranked venues to the user. KNN is used because it has been shown to be the most efficient algorithm in terms of ranking performance, especially when used with Map-Reduce on large datasets [144]. Another advantage of using KNN with Map-Reduce is its low communication cost; therefore, it is frequently used for ranking on the cloud with large datasets [179]. For all venues in the selected clusters, RTVR ranks the venues based on the preferences of the user, including ratings provided by social friends, frequency of check-ins, user-to-venue distance, and average rating of the venue. Consequently, the top-ranked venues are suggested as the final recommendation to the user, as shown in (3.10) [130].
\[ Rank(u_i, V) = \frac{\lvert u_i \cdot v_j \rvert}{\lVert u_i \rVert \, \lVert v_j \rVert} \quad \forall\, v_j \in V \tag{3.10} \]
where $V$ is the set of selected candidate venues, $v_j$ is a candidate venue in $V$, and $u_i$ is the user’s preferences.
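The ranking step can be sketched under the assumption that the score in (3.10) is a cosine-style similarity between a user preference vector and a venue feature vector. The function names, the three-dimensional feature layout, and the sample venues below are hypothetical and serve only to illustrate the idea of returning the top-ranked candidates.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equally sized feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector carries no preference information
    return dot / (norm_u * norm_v)


def rank_venues(user_vector, venue_vectors, top_n):
    """Return the names of the top_n venues most similar to the user vector."""
    ranked = sorted(
        venue_vectors.items(),
        key=lambda item: cosine_similarity(user_vector, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_n]]


# Hypothetical feature vectors (for example, normalized scores per venue category).
user_preferences = [1.0, 0.0, 1.0]
candidate_venues = {
    "cafe": [1.0, 0.0, 1.0],
    "gym": [0.0, 1.0, 0.0],
    "park": [1.0, 1.0, 1.0],
}
top_venues = rank_venues(user_preferences, candidate_venues, top_n=2)
print(top_venues)
```

Restricting `candidate_venues` to the venues of the selected clusters keeps the number of similarity computations per request small.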
3.6.1 System Architecture
The proposed model generates real-time recommendations for the end user. As depicted in Figure 3.18, the first step is the user input, i.e., acquiring the user’s information from check-in data. A Foursquare dataset with over a million check-ins by different users is used. In the second step, the recommendation request of the user is dispatched to the cloud infrastructure. All user and venue information, including users’ preferences, check-ins, ratings, social graph, and venue locations, is stored on the cloud. In the third step, the clustered venue data, which has already been generated offline using K-Means clustering, is fetched. In the fourth step, the user is placed in the relevant cluster according to the location provided in the check-in. In the fifth step, the ranking of the venues is generated using KNN, and a top-ranked user-venue matrix is created to suggest the final recommendations to the user.
Figure 3.18 System Architecture
3.6.2 Proposed Algorithm
The proposed clustering-based venue recommendation algorithm is presented in Algorithm 1. The detailed working of the proposed algorithm is as follows. In the first step, the proposed algorithm reads the venues one by one and calls the MapReduce mapping function for each venue. The distance is initially set to zero. In the next step, the minimum distance from the cluster centers is calculated. The new cluster is added to the list of clusters and tested to check whether the cluster size is appropriate. Afterwards, the algorithm resizes the cluster if required. For this step, the minimum and maximum cluster sizes are found among the generated list of clusters. To achieve equal cluster sizes, a cluster is split into two or more clusters if the maximum size in the cluster list is greater than 1/k times the total size of the data. Otherwise, the clusters are merged and a new cluster list is generated.
Algorithm 1. Clustering using K-Means
Input: Venues Vmn
Output: Clustered Venues
1. Load Vmn
2. FilePointer = Map Vmn
3. Create Two Lists
4. List2 = List1
5. Call read (Vmn)
6. FilePointer = MapCluster ( )
7. Distance_Value = 0
8. read (distance_venue)
9. Distance_Value = minCenter ( )
10. New List of Cluster
11. if cluster size is too large or too small then
12. Resize the cluster
13. Clrmax = findMaxSize (ListofClusters)
14. Clrmin = findMinSize (ListofClusters)
15. end if
16. if Clrmax > 1/k * total size then
17. Resize the cluster
18. end if
19. Add Cluster to List of Clusters
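The clustering and resizing steps of Algorithm 1 can be sketched in Python as follows. This is a minimal single-machine illustration, not the MapReduce implementation used in this work: the function names (`kmeans`, `split_oversized`), the planar coordinates, and the rule of halving an oversized cluster along its sorted order are assumptions made for the example, and the merging of undersized clusters is omitted.

```python
import math
import random


def nearest_center(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain K-Means on a list of coordinate tuples; returns (centers, clusters)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        # Recompute each center as the mean of its members; keep the old
        # center when a cluster ends up empty.
        centers = [
            tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters


def split_oversized(clusters, total, k):
    """Split every cluster larger than total/k into halves (assumed rule)."""
    limit = total / k
    result = []
    for cluster in clusters:
        pending = [cluster]
        while pending:
            current = pending.pop()
            if len(current) > limit and len(current) > 1:
                ordered = sorted(current)
                mid = len(ordered) // 2
                pending += [ordered[:mid], ordered[mid:]]
            elif current:
                result.append(current)
    return result
```

Calling `split_oversized(clusters, len(points), k)` on the output of `kmeans` returns a cluster list in which no cluster exceeds 1/k of the data, which mirrors the size test in step 16 of Algorithm 1.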
The proposed KNN-based ranking of venues is presented in Algorithm 2 and works as
follows. In the first step, the algorithm loads all clusters, the social graph of the
users, the ratings provided by the users, the check-in information, and the list of all
users. The distances to all centroids are calculated and the center with the smallest
distance from the user is selected. In the next step, the k neighboring clusters are
selected. For each cluster, a feature matrix is constructed: for each venue, the average
of the ratings given by the user's friends is calculated. Similarly, for each venue, the
total number of check-ins is counted, and the distance from the user to the venue and
the average rating by all users are calculated. In the next step, ranks are generated
within each cluster and the rankings for all nodes are then combined. After the rankings
of all nodes are combined, the top-ranked venues are selected for the final
recommendation.
Algorithm 2. Ranking of venues using KNN
Input: Clusters, social graph, ratings, check-ins, users
Output: Ranked top � venues
1. Load clusters, social graph, ratings, check-ins, users
2. for each user '�
3. calculate the distance 6�and select the center with smallest distance
4. �� = /�)��� 78�'� + �( 9
5. select the neighboring cluster
6. end for
7. for each ( ������������construct the feature matrix with features
8. for each venue in ( ������������
9. calculate Average (Venue . rating (friends �) )
10. Count check-ins for venues
11. calculate distance from user '� venue "�
12. calculate average rating of venues by all users
13. Find the rank within each cluster ( ���������
14. Combine the ranking for all nodes
15. Select Top-� venues
16. end for
17. end for
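The feature construction and ranking steps of Algorithm 2 can be illustrated with the following Python sketch. It is an assumption-laden example rather than the implementation used in this work: the `Venue` record, the planar distance, and the unweighted score (sum of the three rating and check-in features minus the distance, following the verification property in Section 3.7.4) are all chosen for illustration, and no feature normalization is applied.

```python
import math
from dataclasses import dataclass


@dataclass
class Venue:
    vid: int
    location: tuple       # (x, y) in arbitrary planar units (assumption)
    friend_ratings: list  # ratings given by the user's social friends
    checkins: int         # total check-ins at the venue
    all_ratings: list     # ratings given by all users


def mean(values):
    """Average of a list, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def features(user_location, venue):
    """Feature vector: friend average, check-ins, distance, overall average."""
    return (
        mean(venue.friend_ratings),
        venue.checkins,
        math.dist(user_location, venue.location),
        mean(venue.all_ratings),
    )


def score(feature_vector):
    """Hypothetical unweighted score: feature sum minus user-to-venue distance."""
    friend_avg, checkins, distance, overall_avg = feature_vector
    return friend_avg + checkins + overall_avg - distance


def top_n(user_location, clusters, n):
    """Rank all venues of the selected clusters and return the best n venue IDs."""
    ranked = sorted(
        (venue for cluster in clusters for venue in cluster),
        key=lambda venue: score(features(user_location, venue)),
        reverse=True,
    )
    return [venue.vid for venue in ranked[:n]]
```

In practice the features would likely need to be scaled to comparable ranges before being combined, since raw check-in counts can dominate ratings.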
3.6.3 Complexity of Clustering Algorithm
The first phase of the algorithm reads the list of venues and maps each venue to a
cluster, which takes $O(n \times k \times \log k)$ time, where $n$ is the total number of
venues and $k$ is the number of clusters. In the next phase, each cluster is evaluated
and the clusters are resized considering the maximum and minimum cluster sizes, which has
complexity $k \times (k \times \log k)$. Consequently, the complexity of the clustering
algorithm is represented in (3.11), which is reduced to (3.12) and (3.13):
$O((n \times (k \times \log k)) + (k \times (k \times \log k)))$ (3.11)
$O(k \times \log k \times (n + k))$ (3.12)
$O(\log k \times (n + k))$ (3.13)
3.6.4 Complexity of Ranking Algorithm
In the first step, the algorithm calculates the distance between the user and all the
cluster centers and then selects the cluster with the least distance from the user, which
takes $O(k \times \log k)$ time, where $k$ represents the number of clusters. Afterwards,
the feature matrix is calculated for each venue in the selected cluster, which has time
complexity $O(m)$, assuming that the average number of venues in each cluster is $m$,
which can be calculated as $m = n / k$. Consequently, the complexity of the ranking
algorithm is represented by (3.14):
$O(k \times \log k + m)$ (3.14)
3.7 Formal Verification
A basic introduction to HLPN, SMT-Lib, and the Z3 solver is provided below to aid the
reader's understanding.
3.7.1 High Level Petri Nets
Petri nets are extensively applied for the mathematical and graphical modeling of a wide
variety of systems, such as parallel, distributed, stochastic, concurrent, asynchronous,
and non-deterministic systems [180]. In this work, HLPN, a variant of the conventional
Petri net, is used for the formal verification of our algorithm. HLPN is a 7-tuple
structure $N = (P, T, F, \varphi, R, L, M_0)$, where $P$ refers to the set of places, the
set of transitions is denoted by $T$ such that $P \cap T = \emptyset$, and the flow
relation is denoted by $F$ such that $F \subseteq (P \times T) \cup (T \times P)$.
Moreover, $\varphi$ maps the places $P$ to data types and the set of transition rules is
denoted by $R$. $L$ is a label on $F$, and $M_0$ represents the initial marking. The
structure of the net is given by $(P, T, F)$ and the static semantics are given by
$(\varphi, R, L)$, meaning that this information does not change throughout the system.
In HLPN, a place may contain tokens of multiple types and may also be a cross product of
two or more types. To enable a transition, its pre-conditions must be satisfied, and the
variables from the incoming flows are used to enable a particular transition. Likewise,
for transition firing, the post-condition uses variables from the outgoing flows.
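The enabling and firing behavior described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the HLPN tooling used in this work: the marking is a plain dictionary from place names to token lists, the `guard` stands for the pre-condition, the `action` stands for the post-condition, and firing consumes one token from every input place. The place names UR, CL, and nC are taken from Table 3.5, while the token contents are invented for the example.

```python
import math


def enabled(marking, inputs, guard):
    """A transition is enabled when every input place holds a token and the
    guard (pre-condition) accepts the first token of each input place."""
    if any(not marking[place] for place in inputs):
        return False
    return bool(guard(*(marking[place][0] for place in inputs)))


def fire(marking, inputs, outputs, guard, action):
    """Consume one token per input place and add the tokens produced by the
    action (post-condition) to the output places. Returns True if it fired."""
    if not enabled(marking, inputs, guard):
        return False
    tokens = [marking[place].pop(0) for place in inputs]
    for place, token in zip(outputs, action(*tokens)):
        marking[place].append(token)
    return True


# Example: a user request (a location) and a list of centroids produce the
# closest centroid in place nC.
marking = {"UR": [(1.0, 1.0)], "CL": [[(0.0, 0.0), (5.0, 5.0)]], "nC": []}
fired = fire(
    marking,
    inputs=["UR", "CL"],
    outputs=["nC"],
    guard=lambda request, centroids: len(centroids) > 0,
    action=lambda request, centroids: (
        min(centroids, key=lambda c: math.dist(request, c)),
    ),
)
```

After this call the transition has fired once, so a second call with the same arguments is no longer enabled because the input places are empty.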
3.7.2 SMT-Lib and Z3 Solver
SMT-Lib (Satisfiability Modulo Theories Library) is used to check the satisfiability of
formulas over the theories under consideration [181]. It offers a common input platform
and benchmarking framework for evaluating such systems. SMT has been applied in many
areas, including deductive software verification. SMT-Lib is used together with the Z3
solver, developed at Microsoft Research, to prove the theorems. Z3, an automated
satisfiability checker, is used to verify whether a set of formulas is satisfiable in
the built-in theories of SMT-Lib. The correctness of the system is checked during the
verification process. Bounded model checking determines whether the system terminates
after a finite number of states for any of the input parameters. In this checking, (a) a
description of the system is provided stating its properties, (b) the system is
represented by a model, and (c) a verification tool checks whether the specified
properties hold in the model. Readers interested in a more detailed introduction to
Petri nets are encouraged to see [182, 183].
The development of a Petri net model involves the identification of data types and
places, and the mapping of data types to places. Tables 3.4 and 3.5 show the data types
and their mappings, respectively. Figure 3.19 shows the HLPN model for the proposed
RTVR algorithms. The rectangular black boxes show the transitions and belong to the
transition set T, whereas the circles represent the places and belong to the place set P
in the HLPN model.
Table 3.4 Data type for HLPN Model
Data Type      Description
Lat            A number depicting latitude value
Long           A number representing longitude value
VID            A number representing logical ID of a venue
PVenue         A list containing VIDs, lat, and long of venues
ClID           A number for cluster ID
CID            A number showing centroid ID
PCl            A list containing ClID, CID, and VID
UID            A number representing user ID
PUReq          A list containing U_Req, UID, lat, long
U_Req          A string showing user request
PnClust        A list of n closest PCl
Soc_Graph      Social graph of the user
PRatings       A list containing ratings, UIDs, and VIDs
PCheck-ins     A list containing check-in frequency and VIDs
PSF_ratings    A list of UIDs, social friends, ratings (integer), and VIDs
PCI_Freq       A list of check-in frequency (integer) against VID
Distance       A number showing distance from user to venue
PAvg_Rat       List of average rating for VIDs
PRank_Ven      List of ranked VIDs from Pn clusters
Ptop_n_Ven     List of top n VIDs from PRank_Ven
Table 3.5 Places and mapping used in HLPN model
Place            Mappings
φ (VL)           Ϸ (PVenue)
φ (CL)           Ϸ (PVenue × PCl)
φ (UR)           Ϸ (PUReq)
φ (nC)           Ϸ (PnClust)
φ (HD)           Ϸ (PCheck-ins × PRatings)
φ (Features)     Ϸ (PSF_ratings × PCI_Freq × Distance × PAvg_Rat)
φ (nV)           Ϸ (PRank_Ven × Ptop_n_Ven)
Figure 3.19 HLPN Model for the Proposed RTVR Algorithms
3.7.3 Modeling and Analysis of Proposed Algorithm
In step 1, all of the venues are clustered based on their geographical locations. As
described earlier, K-Means clustering is used for this process. The following rule maps
to the transition under discussion:
�((_") ⩝ � ∈ (� , ∀ ∈ ( ⎹
( = �_AB�� (�) ˄
(ˊ = ( ∪ �(C1D , (C2D , (C3D�
Based on the user request, the closest ‘n’ clusters are selected. The following rule
describes the transition ‘C_S’:
�((_�) = ∀� ∈ (�, ∀� ∈ (�, ∀� ∈ (�⎹
(� ∶= ��B�&_���&B�� ((�,(�) ˄
(�ˊ = (� ∪ �(�C DC1D , (�C DC2D ,(�C DC3D �
Features are extracted based on the ratings of the user’s social friends, check-in
frequencies, user-to-venue distances, and average ratings for a particular venue. The
following rule maps to the transition ‘F-C’:
��>_( = ∀� ∈ (�, ∀� ∈ (�, ∀� ∈ (�, ∀� ∈ (�⎹
(� C1D ∶= (��_�>��&��) ((� C2D , (�, (�)˄
(� C2D ∶= (��_(��>�B� ((� C1D , (�,(�)˄
(� C3D ∶= (��_6��& ((�, (�)˄
(� C4D ∶= (��_�$)��& ((� C2D , (�, (�)˄
(�ˊ = (� ∪ �(�C1D , (�C2D , (�C3D ,(�C4D�
Based on the features calculated in the previous step, all venues of the selected n
clusters are ranked and sorted from best to worst. From the sorted list, the top ‘m’
venues are selected and displayed to the user. The following rule maps to the transition:
�(&_�_$) = ∀�� ∈ (��, ∀�� ∈ (��, ∀� ∈ (�⎹
(�C1D ∶= ���._���_$B��B� ((�� C1D ,(�� C2D ,(�� C3D, (�� C4D (��) ˄
(�ˊ = (� ∪ �(�C1D� ˄
(�ˊ = (� ∪ �(�C2D�
3.7.4 Verification Property
The objective of the verification was to ensure that the venues selected as top ranked
are in fact the highest-ranked venues. Because all of the features are translated to
numbers, the selected top venues must have the highest cumulative sum of feature values
together with a smaller distance. The property can be expressed as follows:
/E F��ℎ�)ℎB�&2����<&�_�_$B�C�D, <�$)_��&C�D<ℎB._���C�D,<�>_��&��)C�D− 6��&��BC�D 3G
The property was tested in the Z3 solver and was satisfied in 221 ms.
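The intent of this property can also be illustrated with a brute-force check in Python. This sketch is not the SMT-Lib encoding that was verified with Z3. It is a hypothetical helper, `top_n_property_holds`, which assumes that each venue already has a single numeric score (for instance, the cumulative feature sum minus the distance) and tests that no unselected venue outscores a selected one.

```python
def top_n_property_holds(scores, selected):
    """Return True if every selected venue scores at least as high as every
    venue that was not selected.

    scores:   dict mapping venue ID to its numeric score (assumed precomputed)
    selected: iterable of venue IDs chosen as the top-ranked venues
    """
    chosen = set(selected)
    unselected = [s for vid, s in scores.items() if vid not in chosen]
    if not chosen or not unselected:
        return True
    return min(scores[vid] for vid in chosen) >= max(unselected)
```

Unlike the solver-based verification, such a check only validates one concrete output at a time, so it is useful as a runtime sanity test rather than as a proof.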
3.8 Experimental Setup and Results
Extensive simulations are conducted to evaluate performance of the proposed system.
The experimental setup and results are discussed as follows.
3.8.1 Experimental Setup
The details of the experimental setup and parameters used for evaluation are presented
in Table 3.6.
Table 3.6 Experimental Setup
Parameters                   Values
Total Number of Users        2153470
Total Number of Venues       1143091
Total Number of Check-ins    1021967
Total Number of Ratings      2809581
Edges of Social Graph        27098488
Simulation Tool              MATLAB
Cloud Configuration          MATLAB Parallel Cloud, 16 cores
3.8.2 Results
This section presents the evaluation of the proposed cloud-based Real-Time Venue
Recommendation (RTVR) framework. For comparison, the following existing recommendation
techniques from the literature, which are closely related to our work, are selected: the
cloud-based context-aware recommender OmniSuggest [58], popularity-based ranking
(POPULAR) [156], social-based ranking (SOCIAL) [156], and Singular Value Decomposition
(SVD) matrix factorization [31].
The proposed algorithm is executed using the MATLAB cloud framework. Because RTVR
responds in real time, its response time is expected to be superior to that of the other
systems owing to the parallelization of its processes. MapReduce is used for the
parallel execution of the proposed model in the cloud environment. To evaluate the
effectiveness of the recommendation model, the following performance metrics are used:
(a) precision, (b) recall, and (c) F-measure.
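The three metrics can be computed as in the following Python sketch, which uses their standard definitions. The function name and the notion of a "relevant" venue (for example, a venue the user actually visited in held-out check-in data) are assumptions for illustration, since the exact evaluation protocol is not restated here.

```python
def precision_recall_f(recommended, relevant):
    """Precision, recall, and F-measure for one user's recommendation list.

    recommended: list of recommended venue IDs (the top-N list)
    relevant:    collection of venue IDs considered correct for the user
    """
    recommended = list(recommended)
    relevant = set(relevant)
    hits = sum(1 for vid in recommended if vid in relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

For instance, recommending venues 1, 2, 3, and 4 when venues 2, 4, and 5 are relevant gives two hits, so precision is 2/4, recall is 2/3, and the F-measure is their harmonic mean, 4/7.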
Figure 3.20 Precision with Clustering
[Plot for Figure 3.20: precision (0 to 0.8) against number of recommendations (1, 5, 10, 15, 20) for RTVR, OmniSuggest, SVD, POPULAR, and SOCIAL]
Figure 3.21 Recall with Clustering
[Plot for Figure 3.21: recall (0 to 0.7) against number of recommendations (1, 5, 10, 15, 20) for RTVR, OmniSuggest, SVD, POPULAR, and SOCIAL]
Figure 3.22 F-Measure with Clustering
[Plot for Figure 3.22: F-measure (0 to 0.7) against number of recommendations (1, 5, 10, 15, 20) for RTVR, OmniSuggest, SVD, POPULAR, and SOCIAL]
Figures 3.20, 3.21, and 3.22 show that, with clustering, the RTVR framework achieved the
best precision and recall among the compared schemes except OmniSuggest (each result is
the average of 100 random runs). The reason behind this behavior is that the objective
of the proposed model is to generate real-time recommendations by reducing the dataset,
so some accuracy is sacrificed in terms of precision, recall, and F-measure. The
behavior of the RTVR model was also analyzed without clustering to check the accuracy of
the proposed model in terms of precision, recall, and F-measure. Figures 3.24, 3.25, and
3.26 show that, without clustering, RTVR outperformed all the existing techniques.
Figure 3.23 Time Comparison with and without Clustering
However, the time required to generate the recommendations from the complete dataset is
very high, as depicted in Figure 3.23. The time without clustering shown in Figure 3.23
is constant because it does not depend on the number of selected clusters. Therefore,
the target of generating real-time recommendations by reducing the dataset is
successfully achieved. A tradeoff between recommendation quality and dataset reduction
nevertheless remains: the quality of recommendations may suffer if the dataset is
significantly reduced to improve the efficiency of online real-time processing. This is
why the precision and recall of RTVR are slightly lower than those of OmniSuggest, as
depicted in Figure 3.20 and Figure 3.21. RTVR still outperforms the other techniques in
terms of precision and recall. Commonly used and highly cited collaborative filtering
techniques, such as SVD, showed lower precision and recall than RTVR. The
popularity-based approaches, POPULAR and SOCIAL, performed better than the collaborative
filtering techniques. The main reason for this behavior is that popularity-based
approaches overlook similarity computations in the design of their models. The recall of
the RTVR framework was highest for N = 20 owing to the greater coverage of the framework
in terms of recommendations.
[Plot for Figure 3.23: time in seconds (0 to 1600) against number of selected clusters (1 to 15), with clustering and without clustering]
In comparison to the rest of the selected schemes, the proposed cloud-based RTVR model
exhibited superior performance in terms of the F-measure. On the other hand, the
performance of RANDOM remained low for all the aforementioned metrics. The main reason
for this behavior is that RANDOM only shuffles the candidate set of unvisited locations
for each user and does not perform similarity computations.
Figure 3.24 Precision without Clustering
Figure 3.25 Recall without Clustering
[Plots for Figures 3.24 and 3.25: precision (0 to 1) and recall (0 to 0.9) against number of recommendations (1, 5, 10, 15, 20) for RTVR, OmniSuggest, SVD, POPULAR, and SOCIAL]
Figure 3.26 F-Measure without Clustering
Figures 3.27, 3.28, and 3.29 depict the differences between the precision, recall, and
F-measure values, respectively. These figures show the effect of running the model with
and without clustering.
Figure 3.27 Difference between Precision Values
[Plot for Figure 3.26: F-measure (0 to 0.9) against number of recommendations (1, 5, 10, 15, 20) for RTVR, OmniSuggest, SVD, POPULAR, and SOCIAL]
[Plot for Figure 3.27: precision (0 to 0.9) against number of recommendations (1, 5, 10, 15, 20)]
Figure 3.28 Difference between Recall Values
Figure 3.29 Difference between F-Measure Values
Figures 3.30, 3.31, 3.32, and 3.33 show the results of the scalability evaluation of the
proposed model. The evaluation was conducted for 5, 10, 15, and 20 recommendations. The
proposed model exhibits consistent behavior in terms of precision, recall, and F-measure
as the number of recommendations varies. The results indicate that the proposed model
remains scalable as the number of recommendations increases.
[Plots for Figures 3.28 and 3.29: recall and F-measure (0 to 0.8) against number of recommendations (1, 5, 10, 15, 20)]
Figure 3.30 Scalability over 5 Recommendations
Figure 3.31 Scalability over 10 Recommendations
[Plots for Figures 3.30 and 3.31: precision, recall, and F-measure (ratio, 0 to 0.9) over an x-axis range of 50 to 500, labeled "No. of Users" in the first plot and "No. of Recommendations" in the second]
Figure 3.32 Scalability over 15 Recommendations
Figure 3.33 Scalability over 20 Recommendations
Figures 3.34, 3.35, and 3.36 show the results of the scalability evaluation on the basis
of precision, recall, and F-measure, respectively. The evaluation was conducted for
N = 20 (maximum) recommendations. The proposed model outperforms the other techniques in
terms of precision, recall, and F-measure as the number of users varies. Moreover, the
results indicate that the proposed RTVR model is scalable and exhibits consistent
behavior as the number of recommendations increases.
[Plots for Figures 3.32 and 3.33: precision, recall, and F-measure (ratio, 0 to 1) against number of users (50 to 500)]
Figure 3.34 Scalability Comparison w.r.t Precision
Figure 3.35 Scalability Comparison w.r.t Recall
[Plots for Figures 3.34 and 3.35: precision and recall (0 to 1) against number of users (50 to 500) for RTVR, OmniSuggest, SVD, POPULAR, and SOCIAL]
Figure 3.36 Scalability Comparison w.r.t F-Measure
Figure 3.37 and Figure 3.38 illustrate the precision and recall as the number of
selected clusters increases. The results indicate that precision and recall improve as
the number of clusters is increased up to 5 clusters, whereas they start to decline
slowly as the number of clusters is increased further. The reason behind such behavior
is that when the number of clusters increases, the number of potential venues also
increases.
Figure 3.37 Number of Selected Clusters vs Precision
[Plot for Figure 3.36: F-measure (0 to 1) against number of users (50 to 500) for RTVR, OmniSuggest, SVD, POPULAR, and SOCIAL]
Figure 3.38 Number of Selected Clusters vs Recall
Figure 3.39 compares the execution of RTVR on a single node and on the cloud. For this
experiment, the proposed model is executed using the MATLAB cloud framework. The results
show that the recommendation time is significantly reduced when RTVR is executed on the
cloud. It is noteworthy that the execution time on a single node increases rapidly with
the number of clusters. The reason behind such behavior is that on a single node the
additional clusters have to be processed sequentially, whereas on the cloud framework
each cluster is processed in parallel, which leads to a significant reduction in
execution time.
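The difference between the two execution modes can be sketched in Python as follows. This is only an illustration of the per-cluster parallel pattern, not the MATLAB cloud setup used in the experiment: `rank_cluster` is a placeholder for the per-cluster ranking work, and a thread pool is used merely to keep the example self-contained. For CPU-bound ranking, separate processes or cloud workers would be needed to obtain a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def rank_cluster(cluster):
    """Placeholder per-cluster work: sort (venue_id, score) pairs by score."""
    return sorted(cluster, key=lambda pair: pair[1], reverse=True)


def rank_sequential(clusters):
    """Single-node style: clusters are processed one after another."""
    return [rank_cluster(cluster) for cluster in clusters]


def rank_parallel(clusters, workers=4):
    """Cloud style: each cluster is handed to a separate worker."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the input order of the clusters.
        return list(pool.map(rank_cluster, clusters))
```

Both functions return the same per-cluster rankings. Only the scheduling differs, which is why the sequential time grows with the number of selected clusters while the parallel time is bounded by the slowest cluster when enough workers are available.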
Figure 3.39 Time Comparison between Single Node and Cloud
[Plot for Figure 3.39: time in seconds (0 to 450) against number of selected clusters (1 to 15) for single node and cloud]
3.9 Summary
This study has made several contributions by devising a cloud-based real-time venue
recommendation framework for users of social networks. The major contribution of this
work is the integration of knowledge engineering techniques, including K-means, KNN, and
collaborative filtering, on a cloud infrastructure for generating real-time
recommendations. The proposed RTVR framework considers the effect of dynamic real-world
physical factors in addition to the collective opinions of experienced users. RTVR
addresses the scalability issues by putting forward a cloud-based architecture that
distributes data-intensive and computationally intensive loads over cloud servers. As a
result, RTVR always considers a precompiled set of experienced users for all categories,
which enables it to recommend appropriate venues for a new user at a finer granularity.
The evaluation results show that the performance of the proposed RTVR framework is
superior to that of many of the existing schemes.
In the next chapter, an attempt is made to overcome the challenges faced in the area of
health recommendations. It is of critical importance to maintain a balanced intake of
food. However, it is quite challenging for a common person to keep track of personal
food requirements because of the massive diversity of dietary components and items.
A systematic food recommendation system is desired to recommend the appropriate
food considering the disease of the person. The major challenge in designing such a
system is the handling of greater volumes of data in terms of ingredients, quantity,
nutrition facts, people’s preferences, and simultaneously taking into consideration a
person’s pathological reports. The system must be scalable enough to handle
recommendation queries from all over the globe. A solution to the aforementioned
challenges is the use of cloud computing.
Chapter 4 A Smart Food Recommendation System
4.1 Health Recommendation Systems
Recently, an increase in health information needs and a change in information-seeking
behavior have been observed worldwide [184]. Recent studies [78, 185] report that 81% of
U.S. adults frequently use the Internet and that 59% of U.S. adults frequently seek
health information online for disease diagnoses and treatments [78]. Similarly,
according to a survey conducted by the Pew Research Center in 2013, 19% of users in
Pakistan seek health information online using their smartphones and Internet devices
[186]. The dependency on Internet-based health information seeking affects the
patient-physician relationship, because educated patients pose questions or discuss the
available treatment options [187]. As a result, the patient becomes an active
participant in the decision-making process. This change is referred to as patient
empowerment [188]. However, information overload, imprecise information, and irrelevant
material are major issues when drawing conclusions about one's personal health status
and taking adequate action [189]. Owing to the large amount of medical information
available from different sources (e.g., news sites, web forums, etc.), patients usually
get lost or feel uncertain when investigating on their own. Moreover, the diverse
medical vocabulary poses obstacles for laypeople, such as difficulty in finding relevant
information or understanding medical terminology [190]. Consequently, improved and
adapted delivery of medical content can support users in discovering relevant
information [105].
The medical information available for patient-oriented decision making has increased
considerably. However, this information is typically spread over different websites
[191]. As an alternative, personal health record systems (PHRS) were introduced to
integrate one's health data and give access to the owner and to authorized health
professionals [192]. A recommender system suggests items of interest to a particular
user of an information system or e-business system. The idea behind recommender systems
can be adapted to cope with the special requirements of the health domain. In a Health
Recommendation System (HRS), a recommendable item of interest is a piece of
non-confidential, systematically confirmed or at least generally acknowledged medical
information that in itself has no connection to an individual's medical history.
Consequently, it is possible to compute and distribute potentially relevant information
items from trustworthy health-related data repositories. Users can thus be supplied
with high-quality information to handle a certain disease or to adjust their everyday
habits.
In such a scenario, a layperson interacts with the HRS-enabled PHR system without
requiring direct support from a medical expert. In return, the system generates
layperson-friendly content based on the person's long-term medical history. The most
relevant items are offered within the user interface of the PHR system. After selecting
the highest-ranked documents, the individual is able to acquire health information.
Thus, the risk of retrieving “incomplete, misleading and inaccurate” material from
popular search engines can be minimized.
Finding the most relevant and important information is a challenging task, especially
where health-related information systems are concerned. People's search for
health-related information is often known as health information seeking [193]. The
author of [194] described why people use the Internet to find relevant material:
“desires for reassurance, for second opinions for greater understanding of existing
information and to circumvent perceived external barriers to traditional sources”.
Three issues that arise in such searches, namely whether information is (a) findable,
(b) comprehensible, and (c) context-aware, are discussed below.
(a) Findable: A relevant resource is hard to find despite the availability of Internet
resources. Users often cannot formulate a suitable search string that narrows down the
search space for the medical problem at hand. In addition, the associations between
medical concepts are difficult to understand despite the availability of hyperlinks.
Worse, many laypeople do not fully explore the search results, as “users of the Internet
explore only the first few links on general search engines when seeking health
information” [195].
(b) Comprehensible: This problem arises from the partial understanding of medical
terminology among people who lack medical knowledge. Consider a case where a layperson
has no mental model connected to some piece of information. In particular, where a
medical issue is concerned, the medical terms found in online resources are often
misread or misinterpreted by laypeople. There exist multiple instances where people
misread or misunderstood medical terminology and only later realized their mistakes
[196].
(c) Context Awareness: Health-related information is usually provided or interpreted in
a medical context, often in combination with a case or medical situation. With search
engines or encyclopedia-based sources, this kind of context is very often absent. This
absence can lead to inadequate decisions [197].
A health recommender system is expected to deliver context-based, high-quality, and
precise material. In particular, a semantically enabled HRS can deal with complex
associations among medical concepts, resolve medical abbreviations and classification
codes, and adapt to an individual's level of medical knowledge. Such a system can also
mitigate the impact of information overload [198], because it offers users only those
items that are most relevant to a given case.
Because health recommendation systems are an emerging field, limited research work has
been carried out so far. HRS can be categorized as (a) consultant health care advisor
[199, 200], (b) exercise HRS [201-203], (c) dietary HRS [15, 23, 45, 204], and (d)
disease-specific HRS [56, 202]. The consultant health care advisor category is further
subdivided into (a) online doctor-to-patient recommendation [199, 200, 205] and (b)
offline doctor-to-patient recommendation [206, 207]. Dietary HRS is further subdivided
into (a) food recommendation [46, 208], (b) recipe recommendation [57, 209], and (c)
menu recommendation [43, 210]. Disease-specific HRS is further subdivided into (a)
cardiovascular [56, 211], (b) diabetic [55, 212], and (c) obesity [14, 213]. The
hierarchy of health recommendation systems is depicted in Figure 4.1.
Figure 4.1 Hierarchy of Health Recommendation System
4.1.1 Significance
Due to a lack of information on healthy diets, people mostly rely on medicines for
various health issues instead of adopting preventive food-based measures. Various
studies show that most medicines have unwanted minor or major side effects [214].
Therefore, considering the massive growth and adoption of Information and Communication
Technology (ICT), initiatives and smart systems are desired that help novice users
address their health problems using food items instead of relying fully on medicines.
For instance, calcium deficiency may cause various diseases [215], such as bone loss
(osteoporosis), inactivity of the parathyroid gland (hypoparathyroidism), weak bones
(osteomalacia), and a muscle disease (latent tetany). A variety of medicines (dietary
supplements) is available to overcome calcium deficiency. However, such medicines also
have side effects, such as vomiting, unusual weight loss, appetite loss, muscular pain,
mood changes, increased thirst and urination, tiredness, weakness, and headaches [216].
Calcium deficiency may instead be mitigated using various food items rather than dietary
supplements. Dairy foods, such as milk, yogurt, and cheese, are a main source of
calcium. Similarly, dark leafy green vegetables such as turnip, spinach, collard greens,
and kale, as well as fortified orange juice, fortified cereals, sardines, soybeans,
breads, and waffles, are rich in calcium.
All food items have different nutrition facts with different values. People often use
dietary supplements to overcome deficiencies. For instance, one dietary supplement for
calcium deficiency is CMZ-3, which provides 1000 mg of calcium in the recommended daily
dose of three tablets. One cup (8-10 oz) of milk contains 300-350 mg of calcium, which
indicates that about three cups of milk daily are enough to overcome the calcium
deficiency. Therefore, why use dietary supplements when food items with known nutrition
facts are available?
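The arithmetic behind this comparison can be made explicit with a short Python sketch. The helper names are hypothetical, and the figures are the ones quoted above (1000 mg per daily supplement dose and 300-350 mg of calcium per cup of milk).

```python
import math


def calcium_range(cups, low_mg=300, high_mg=350):
    """Lower and upper bound of calcium (mg) supplied by a number of cups."""
    return cups * low_mg, cups * high_mg


def cups_needed(target_mg, mg_per_cup):
    """Smallest whole number of cups that reaches the target amount."""
    return math.ceil(target_mg / mg_per_cup)
```

Three cups supply between 900 mg and 1050 mg, which brackets the 1000 mg supplement dose. At the upper value of 350 mg per cup three cups suffice, whereas at the lower value of 300 mg per cup a fourth cup would be needed to reach the full 1000 mg.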
Besides medicines, other factors also influence a healthy lifestyle, including diet and
lack of exercise. Various studies have shown that regular exercise is associated with
improved mood, overall satisfaction, and emotional and bodily well-being at all ages
[213]. Therefore, people who exercise little or not at all are more often affected by
health-related issues. One of the main reasons for not exercising is the excessive use
of technology [217]. People often spend their spare time using technology, such as
social networking sites (Facebook, Twitter), video games, smartphones, and television.
The excessive use of technology may affect a healthy lifestyle [217].
Different studies [218-220] show that adults and children are taking more
calories in fast foods and other restaurant foods rather than homemade foods. These
studies show that eating more fast food than nutritious food can lead to poor health and
poor nutrition. One study [219] showed that children who eat fast food frequently (at least 3
times a week) are at high risk of rhinitis (congested nose) and asthma. Another
study [220] showed that eating fast food like hamburgers and pizza and baked items
like cake and doughnuts may be a cause of depression. The following are some of the
serious impacts of fast foods on different body systems.
(a) Carbohydrates: Most fast food items and soft drinks are rich in
carbohydrates, which ultimately yield high calories. A high amount of carbohydrates is
the main cause of spikes in blood sugar level, which disturb the
insulin response in the body. The disturbed insulin response contributes to insulin
resistance, inflamed patches of skin (especially in children), and type-2 diabetes [212,
219].
(b) Added Sugar: Added sugar has no nutritional value but is rich
in calories. The extra calories from added sugar result in weight gain,
which is one of the main causes of heart disease [221]. Consuming foods high in
sugar and carbohydrates allows the bacteria residing in the mouth to produce acids. The
acid produced by the bacteria can harm or destroy tooth enamel, which resists cavities.
Once the enamel is destroyed, it cannot be replaced [222].
(c) Sodium: Another factor affecting the health of people who eat fast food is sodium.
Excess sodium intake may increase the risk of high blood pressure,
kidney stones, stomach cancer, osteoporosis (fragile bones), and enlarged heart muscles
[204].
(d) Trans Fats: Trans fats are a type of unsaturated fat. Most fast foods are
rich in trans fats, which are a primary cause of raising low-density
lipoprotein (LDL) cholesterol levels and lowering high-density lipoprotein
(HDL) cholesterol levels. Altering both levels may cause heart disease and increase
the risk of type 2 diabetes [223].
From the above discussion, it can be concluded that the balancing of various dietary
components is of critical importance for staying healthy. The recommendation systems
can be employed to suggest a balanced diet. However, most of the traditional diet
recommendation systems do not consider a person’s deficiencies or excess of chemical
compounds in the body. Therefore, those parameters are considered in our model along
with nutritional demands, and various types of food components from diversified
dietary sources.
4.2 Motivation
One of the major factors for a healthy life is daily diet, especially for
people suffering from minor or major diseases. eHealth initiatives and research
efforts aim to offer various pervasive applications for novice end users to improve their
health [44]. Various studies show that inappropriate and inadequate daily dietary intake
is a major cause of various health issues and diseases. A study conducted by the World
Health Organization (WHO) estimates that around 30% of the total population of the
world is suffering from various diseases, and 60% of deaths each year in children are
related to malnutrition [47, 48]. Another WHO study reports that inadequate and
imbalanced intake of food causes around 9% of heart attack deaths, about 11% of
ischemic heart disease deaths, and 14% of gastrointestinal cancer deaths worldwide
[45]. Moreover, around 0.25 billion children are suffering from Vitamin-A deficiency
[49], 0.2 billion people are suffering from iron deficiency anemia [50], and 0.7 billion
people are suffering from iodine deficiency [51].
Generally, a person remains unaware of major causes behind deficiency or
excess of various vital substances, such as calcium, proteins, and vitamins, and how to
normalize such substances through balanced diet. Several works [15, 46, 52-57]
proposed different recommendation systems related to food. These systems can be
categorized as: (a) food recommendation systems [46, 52], (b) menu recommendations
[54], (c) diet plan recommendations [15], (d) health recommendations for different
diseases such as diabetes and cardiovascular disease [55, 56], and (e) recipe recommendations
[53, 57]. The aforementioned systems either provide recommendations for a
specific disease or balance the diet without considering information about any disease
or nutritional deficiency in the body. For instance, in [52], a food recommendation system
is proposed for diabetic patients. The system recommends various foods for
diabetic patients without considering the diabetes level, which may fluctuate frequently.
Similarly, the authors in [46] do not consider the nutrition factors that have significant
importance for a balanced diet recommendation.
Keeping in view the above mentioned facts and figures, it is of critical
importance to maintain a balanced intake of food. However, it is quite challenging for
a common person to keep track of personal food requirements because of the massive
diversity of dietary components and items. A systematic food recommendation system
is desired to recommend the appropriate food considering the disease of the person. The
major challenge in designing such a system is the handling of greater volumes of data
in terms of ingredients, quantity, nutrition facts, people’s preferences, and
simultaneously taking into consideration a person’s pathological reports. The system
must be scalable enough to handle recommendation queries from all over the globe. A
solution to the aforementioned challenges is the use of cloud computing. Cloud
computing is an innovative and emerging platform that enables users to scale
computing and storage resources on demand [224].
In this chapter, a cloud based food recommendation system called Diet-Right is
presented that considers the users’ pathological test results and recommends a list of
optimal food items. To achieve optimal results, an algorithm is developed based on Ant
Colony Optimization (ACO). A database of 345 pathological test reports and their normal
ranges is designed. The database was created by performing a field survey and collecting
the information about pathological reports from different laboratories [225-227]. The
collected data was verified by a pathologist of a hospital. Moreover, a database of 3,400
food items with 26 entries for the most common nutrients was taken from the official
website of the Composition of Foods Integrated Dataset (CoFID) [228]. Based on the
real-time input of the user’s parameters, Diet-Right recommends top ranked food items to
the user.
In medical practice, sometimes, pathological tests are required to identify a
particular disease. A pathological test report usually indicates deficiency or excess of
certain compounds/parameters in human body, e.g., levels of iron, calcium, or red blood
cells (RBC) count, etc. which may cause particular disease. In this dissertation, a novel
food recommendation system is presented specifically dealing with the pathological
tests results. Our system considers diseases related to pathological reports, and most
common nutrition factors in recommending the food items to the users. For this
purpose, a database of 345 pathological test reports is used to categorize various
diseases that occur due to the deviation from the normal ranges of
compounds/parameters. Moreover, a system is designed that allows users to input
values for a specific parameter. Based on the deviations of the input parameter value
from the normal ranges, the system generates a diet plan that aims to cover those
abnormalities. Furthermore, ant colony algorithm is used to train the system with the
values of various parameters’ ranges and diseases.
4.3 Related Work
Several works have been proposed for different recommendation systems related to diet
and food. These systems are used for food recommendations, menu recommendations,
diet plan recommendations, health recommendations for specific diseases, and recipe
recommendations. The majority of these recommendation systems extract users’
preferences from different sources such as users’ ratings [106, 209], recipe choices [229,
230], and browsing history [231-233]. For instance, in [230], a recipe recommendation
system is proposed using a social navigation system. The social navigation system
extracts users’ choices of recipes and in return recommends recipes. Similarly, in
[233], a recipe recommendation system is proposed that is capable of learning a similarity
measure of recipes using crowd card-sorting. The above mentioned recommendation
systems do not solve a common problem known as the cold start problem: all these
systems must wait for users to enter enough data before they can make effective
recommendations [234]. Some commercial applications such as [235, 236] offer users a quick
survey in order to obtain their preferences in a short time. For instance, the survey used
by [235] is specifically designed to match the lifestyle of the user, e.g., healthy,
sportsman, pregnant, etc. The survey also attempts to avoid foods that do not
match the user’s lifestyle. Similarly, a questionnaire is used by [236] through which a
user answers different questions about his/her lifestyle, food preferences, nutrient
intake, and habits. Once the system has extracted all the basic information, it is able to
recommend different meals on a daily and weekly basis.
A Food Recommendation System (FRS) [52] is proposed for diabetic patients
that uses K-means clustering and a Self-Organizing Map for clustering analysis of food.
The proposed system recommends substitute foods according to nutrition and food
parameters. However, FRS does not adequately address the disease level issue because
the level of diabetes may vary hourly in different situations of the patient and the food
recommendations may also vary accordingly.
Tags and latent factors are used in an Android-based food recommender system
[46]. The system recommends personalized recipes to the user based on tags and ratings
provided in user preferences. The proposed system uses latent feature vectors and
matrix factorization in its algorithm. Prediction accuracy is achieved by the use of tags
that closely match the recommendations with users’ preferences. However, the
authors do not consider the nutrition factors needed to balance the diet of the user
according to his/her needs.
A content-based food recommender system [53] is proposed that recommends
food recipes according to the preferences already given by the user. The preferred
recipes of the user are fragmented into ingredients, which are assigned ratings according
to the stored users’ preferences. The recipes with matching ingredients are
recommended. The authors do not consider the nutrition factors and the balance in the
diet. Moreover, identical recommendations are likely because the
preferences of the user may not change on a daily basis.
In [5], a knowledge-based dietary nutrition recommendation system is proposed
for obesity. The recommendations include dietary nutrition and diet menus for
individuals using a collaborative filtering technique. A mobile application is
also developed to recommend dietary nutrition and menus to the users.
Similarly, a food recommender system is proposed in [208] for patients in care
facilities. The application is designed for caregivers in the care facilities in order to
offer food according to the patients’ preferences.
The above mentioned food recommendation systems either deal
with specific diseases or aim to balance the diet. In the case of food recommendation for
specific diseases, the systems recommend different foods for patients without knowing
the level of the disease, which may vary from case to case. Similarly, in the case of food
recommendations to balance the diet, nutrition factors are ignored even though they are
essential for recommending food and balancing the diet.
4.4 Proposed Model
The main focus of this work is to provide dietary assistance to different people who are
suffering from common diseases. The proposed model recommends various foods and
nutrition to the people based on their pathological test reports. Every pathological
report has some indicators that are calculated based on the nature of the tests. For
instance, if a doctor advised a patient to take pathological test of blood, then the
common test entries include the values of hemoglobin, red blood cells (RBC), white
blood cells (WBC), plasma, and sugar. Normal ranges of the aforementioned indicators
are usually given in the test reports. In this way, the patient can identify the
abnormalities after comparing with the normal ranges. In our proposed system, a user
is provided with the complete list of the test parameters to make selection from. The
user inputs the specific values of test report in the selected parameters. The data of
normal ranges is gathered for tests including blood plasma and serum, urine, stool,
cerebrospinal fluid (CSF), and gastric and secretion tests. A matrix of 345 entries was
constructed. Each individual component of a test, e.g., a blood test, has a normal range
with a lower and an upper bound. The ranges of the same component may differ on the basis
of gender, age group, and fasting status. Our system is trained on various
age groups and their respective parameter ranges. This allows the system to
suggest diets according to the needs of the users.
4.4.1 Diet-Right Architecture
In the majority of the existing food recommendation systems, a centralized architecture is
used [15, 46, 52-57]. The main disadvantage of such systems is limited scalability when
dealing with massive amounts of data. A cloud based solution is proposed to offer
scalability and pervasiveness, where smartphone users can conveniently access
the recommendation system as depicted in Figure 4.2.
Figure 4.2 Architecture of Diet-Right
In the first step, the model takes the input values from the user. The user enters
demographic data, including gender and age, as well as the values of the pathological test
reports. The normal values of the pathological reports vary for different age groups and
genders. In the second step, these values are sent to the cloud infrastructure and compared
with the normal ranges that are stored in the database. In the third step, the abnormality
level of the pathological test reports is computed. In some cases the value may exceed
the normal range, and in other cases it may fall below it. Therefore, the
proposed algorithm must note the abnormality level carefully in order to recommend
the food items. In the fourth step, the weight assignment and matrix generation process
is carried out. Weight assignment and matrix generation are
discussed in detail in the next section. In the fifth step, ranks are calculated for each food item
and sorted in descending order. In the sixth step, the user is provided with the
recommended list of food items.
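The second and third steps (comparison with stored normal ranges and computation of the abnormality level) can be sketched as follows. This is a minimal illustration under stated assumptions: the table layout, the key structure, and the function name are hypothetical, and the only range shown is the 9 to 10.5 calcium range quoted later in this chapter.

```python
# Hypothetical normal-range table keyed by (parameter, gender, age_group).
# The layout is an assumption; the bounds reuse the calcium range from the text.
NORMAL_RANGES = {
    ("calcium", "female", "adult"): (9.0, 10.5),
    ("calcium", "male", "adult"): (9.0, 10.5),
}


def abnormality_level(parameter, gender, age_group, value):
    """Signed deviation from the normal range (0.0 if within range).

    A negative result indicates a deficiency, a positive result an excess.
    """
    low, high = NORMAL_RANGES[(parameter, gender, age_group)]
    if value < low:
        return value - low
    if value > high:
        return value - high
    return 0.0
```

A signed deviation is one possible way to distinguish deficiency from excess before weights are assigned to food items.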
4.4.2 Proposed Algorithm
In this subsection, the food recommendation process is presented using a variant of the ant
colony approach on a graph of foods to generate the optimal food set for the users. In
Diet-Right, the Ant Colony Optimization (ACO) technique is used. Other
techniques are also available for solving optimization problems, such as Particle Swarm
Optimization (PSO) [237] and Genetic Algorithm [238]. The effectiveness of the three
techniques is investigated through the comparison discussed in the next section. After
the quantitative analysis of the aforementioned techniques, ACO is selected for this
study. The ACO metaheuristic is a constructive, population-based approach that relies
on the social behavior of ants. It is recognized as one of the most powerful approaches for the
solution of combinatorial optimization problems [239]. One of the motivations for using
ACO is that it can be run as a distributed algorithm; therefore, it is suitable to run in a
cloud environment, whereas other techniques lack such ability. Another advantage
of ACO is its quick discovery of optimal solutions, which makes
it suitable for dynamic applications. The main steps used in our proposed
algorithm are explained as follows:
Each food item is placed on a node, and a strongly connected graph is generated as shown
in Figure 4.3. Each link of the graph has associated $\eta$ and $\tau$ values, where $\tau$ is the randomly
initialized pheromone, and $\eta$ is the heuristic information initialized as the inverse of the
squared sum of differences of all the ingredients $I$. In (4.1), $k$ represents the index of a food
ingredient, $i$ and $j$ represent the $i$-th and $j$-th food items, $I$ represents a single ingredient in a
certain food item, and $m$ is the total number of ingredients in a certain food item.
$$\eta_{ij} = \sum_{k=1}^{m} (I_{ik} - I_{jk})^{2} \tag{4.1}$$
where $\eta$ is used to control exploration and exploitation of ACO, and $\eta \in [0, 1]$.
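The heuristic initialization can be illustrated with a short sketch. It computes the sum of squared ingredient differences from (4.1) and then, following the prose description, takes an inverse. The `1 / (1 + d)` form is an assumption made here so that the value stays within (0, 1] and identical foods do not cause a division by zero; the function names are illustrative.

```python
def squared_difference(food_i, food_j):
    """Sum over ingredients k of (I_ik - I_jk)^2, as in (4.1)."""
    return sum((a - b) ** 2 for a, b in zip(food_i, food_j))


def heuristic(food_i, food_j):
    """Inverse-style heuristic for the link between two foods.

    Assumption: 1 / (1 + d) is used instead of 1 / d so the result is
    bounded in (0, 1] and defined for identical nutrient vectors.
    """
    return 1.0 / (1.0 + squared_difference(food_i, food_j))
```

Foods with similar nutrient vectors receive a heuristic value close to 1, and dissimilar foods a value close to 0.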
Figure 4.3 Graph Representation of the Problem
After initialization, each ant constructs its local solution by visiting nodes that
provide the best cost in terms of low error compared to the targets. The target vector represents the
amount of food ingredients required against the particular disease. The target vector is
predefined based on pathological reports; for instance, the target vector for a user with
calcium deficiency may range from 9 to 10.5. The different nodes or food items are
selected using a transition rule that selects the path with the highest transition probability.
The transition probability is given by (4.2) [240]:
$$p_{ij}^{k}(t) = \begin{cases} \dfrac{[\tau_{ij}(t)]^{\alpha} \times [\eta_{ij}(t)]^{\beta}}{\sum_{l \in S} [\tau_{il}(t)]^{\alpha} \times [\eta_{il}(t)]^{\beta}}, & \text{if } j \in S \\ 0, & \text{otherwise} \end{cases} \tag{4.2}$$
where $\tau_{ij}(t)$ represents the pheromone level at time $t$, $\eta_{ij}(t)$ is the heuristic information
at time $t$, and $\alpha$ and $\beta$ are the hyperparameters of the model used to weight the pheromone
level and the heuristic information for fine tuning. Moreover, $k$ represents the ant, $i$ represents
the initial node, $j$ is the target node, $l$ is the index of the currently selected path, and $S$
represents the solution.
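A minimal sketch of the transition rule in (4.2) is given below. The matrix representation (nested lists indexed by node), the function names, and the default values of `alpha` and `beta` are assumptions for illustration; the rule of picking the candidate with the highest probability follows the text.

```python
def transition_probabilities(i, candidates, tau, eta, alpha=1.0, beta=2.0):
    """Return {j: p_ij} over candidate nodes; other nodes implicitly get 0."""
    weights = {j: (tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in candidates}
    total = sum(weights.values())
    if total == 0:
        return {j: 0.0 for j in candidates}
    return {j: w / total for j, w in weights.items()}


def next_node(i, candidates, tau, eta, alpha=1.0, beta=2.0):
    """Select the candidate with the highest transition probability."""
    probs = transition_probabilities(i, candidates, tau, eta, alpha, beta)
    return max(probs, key=probs.get)
```

Many ACO variants sample from these probabilities instead of taking the maximum; the greedy choice here mirrors the description above.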
When an ant selects a path among all existing paths excluding the paths in the solution,
it updates the pheromone level locally as depicted in (4.3) [120].
$$\tau_{ij}(t+1) = \begin{cases} \rho \times \tau_{ij}(t) + \Delta\tau, & \text{if } j \in S^{k}(t) \\ (1 - \rho)\,\tau_{ij}(t), & \text{otherwise} \end{cases} \tag{4.3}$$
where $\tau_{ij}(t+1)$ is the new pheromone level, which is increased by the amount $\Delta\tau$, and
evaporation is governed by multiplication with the pheromone decay parameter $\rho$. Also, $S^{k}(t)$
is the solution of the $k$-th ant at time $t$.
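The local pheromone update can be sketched as below. The sketch follows the two cases of (4.3) as read here: trails leading to nodes in the ant's solution are scaled by the decay parameter and receive a deposit, and all other trails only evaporate. The function name, the nested-list representation, and the default values of `rho` and `delta_tau` are assumptions.

```python
def update_pheromone(tau, solution_nodes, rho=0.25, delta_tau=1.0):
    """Return a new pheromone matrix after one update step.

    Links whose target node j is in the solution get rho * tau + delta_tau;
    all other links evaporate to (1 - rho) * tau.
    """
    new_tau = [row[:] for row in tau]
    for i, row in enumerate(tau):
        for j, level in enumerate(row):
            if j in solution_nodes:
                new_tau[i][j] = rho * level + delta_tau
            else:
                new_tau[i][j] = (1.0 - rho) * level
    return new_tau
```

Returning a new matrix instead of mutating the input keeps the sketch easy to test; an in-place update would work equally well.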
Each ant provides a locally optimized food set based on the nutrition expert
recommendations, but here, the interest is in the globally optimized solution. In this
study, a supervised approach is used; therefore, Root Mean Squared Error (RMSE) is used
for the selection of the globally best solution. To do so, the global solution is initialized to
the empty set and a solution is computed for each ant. Initially, the solution returned by the first ant is
considered as the global solution. Afterwards, when the rest of the ants return with a
solution, their RMSE is compared with that of the current global solution, and the global
solution is replaced whenever a solution with a lower RMSE value is found. For fast convergence of the
solution, the pheromone level is updated again using the same formula, but the update
is applied only to the path of the globally optimal solution, as depicted in (4.4) [120].
$$\tau_{ij}(t+1) = \begin{cases} \rho \times \tau_{ij}(t) + \Delta\tau, & \text{if } j \in S_{g}(t) \\ (1 - \rho)\,\tau_{ij}(t), & \text{otherwise} \end{cases} \tag{4.4}$$
where $S_{g}(t)$ denotes the current global solution.
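The RMSE-based selection of the global solution can be sketched as follows. The representation of an ant's solution as a vector of achieved nutrient totals, and the function and variable names, are assumptions made for illustration.

```python
import math


def rmse(target, achieved):
    """Root mean squared error between target and achieved nutrient vectors."""
    n = len(target)
    return math.sqrt(sum((t - a) ** 2 for t, a in zip(target, achieved)) / n)


def select_global_best(target, ant_solutions):
    """Pick the ant whose achieved nutrient vector has the minimum RMSE.

    ant_solutions maps a hypothetical ant id to its achieved nutrient vector.
    """
    best_ant, best_error = None, None
    for ant, achieved in ant_solutions.items():
        error = rmse(target, achieved)
        if best_error is None or error < best_error:
            best_ant, best_error = ant, error
    return best_ant, best_error
```

The first ant's solution becomes the global solution by default, and later ants replace it only when their error is strictly lower.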
In food selection process, there is a need to select diversified foods to enhance
the acceptance of foods among different people. The heuristic information is managed
and updated in such a way that the diversity among foods is maximized. For heuristic
information update, (4.5)[120] is used.
�� =1��
� ���� ���� �1 + ��� ����� � ! ,�
����� �"��� (4.5)
Where, � is the selected number of foods, M� is the number of times a food is
selected in whole iteration, and m, M� are the parameters used to balance the solution in
terms of local and global perspective. The used heuristic information facilitates in the
selection of foods with minimal redundancy. Algorithm 1 presents food
recommendation using ACO.
Algorithm 1: Food Recommendation using ACO
Input: Dataset $(f, n)$ with $f$ foods and $n$ nutrition values, prescribed nutrition plan $T(n)$, and
maximum iterations
Output: Selection of optimized food set $R$ based on $T(n)$
1: Set initial values of the heuristic information $\eta_{i,j}$ and the level of
pheromone trails $\tau_{i,j}$ randomly
2: $S_g \leftarrow \emptyset$
3: repeat
4: Generate and randomly place ants at different nodes of the graph
5: $k \leftarrow 1$
6: for $k \le Kants$ do
7: while all nodes are not visited do
8: Calculate transition probability using (4.2)
9: Update pheromone locally $\tau_{ij}$ using (4.3)
10: Add the selected food item to $S^{k}(t)$
11: end while
12: if $S_g = \emptyset$ then
13: $S_g(t) = S^{k}(t)$
14: else if RMSE of $S_g(t)$ $\ge$ RMSE of $S^{k}(t)$ then
15: $S_g(t) = S^{k}(t)$
16: end for
17: Update pheromone globally $\tau_{ij}$ using (4.4)
18: Update heuristic information $\eta$ using (4.5)
19: Compute the RMSE $E(T, S_g)$ of the global solution against $T(n)$
20: until the stopping criterion on $E(T, S_g)$ is met or maximum iterations reached
21: Return $S_g$
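The overall loop of Algorithm 1 can be illustrated with a compact, simplified sketch. It is not the implementation used in this work: each ant builds a fixed-size food set (`set_size`) rather than visiting all nodes, nodes are sampled in proportion to the transition weights, the heuristic uses an assumed `1 / (1 + d)` form, the local update (4.3) and the heuristic update (4.5) are omitted, and all parameter defaults are illustrative.

```python
import math
import random


def rmse(target, achieved):
    return math.sqrt(sum((t - a) ** 2 for t, a in zip(target, achieved)) / len(target))


def totals(foods, chosen):
    """Sum the nutrient vectors of the chosen food indices."""
    return [sum(foods[i][k] for i in chosen) for k in range(len(foods[0]))]


def aco_select(foods, target, set_size=2, n_ants=5, max_iter=20,
               alpha=1.0, beta=1.0, rho=0.5, delta_tau=1.0, seed=0):
    rng = random.Random(seed)
    n = len(foods)
    # Randomly initialised pheromone; small offset keeps weights positive.
    tau = [[rng.random() + 0.1 for _ in range(n)] for _ in range(n)]
    # Assumed bounded heuristic: 1 / (1 + squared nutrient difference).
    eta = [[1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(foods[i], foods[j])))
            for j in range(n)] for i in range(n)]
    best, best_err = None, float("inf")
    for _ in range(max_iter):
        for _ant in range(n_ants):
            current = rng.randrange(n)
            chosen = [current]
            while len(chosen) < set_size:
                cand = [j for j in range(n) if j not in chosen]
                weights = [tau[current][j] ** alpha * eta[current][j] ** beta
                           for j in cand]
                current = rng.choices(cand, weights=weights)[0]
                chosen.append(current)
            err = rmse(target, totals(foods, chosen))
            if err < best_err:  # keep the solution with minimum RMSE
                best, best_err = sorted(chosen), err
        # Global update: reinforce trails leading to foods in the best set.
        for i in range(n):
            for j in range(n):
                if j in best:
                    tau[i][j] = rho * tau[i][j] + delta_tau
                else:
                    tau[i][j] = (1.0 - rho) * tau[i][j]
    return best, best_err
```

A call such as `aco_select([[1, 0], [0, 1], [5, 5], [2, 2]], [1, 1])` returns a pair of food indices together with the RMSE of their combined nutrient totals against the target.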
4.4.3 Ranked List Generation
Collaborative filtering is the most widely used technique in recommendation systems,
where users are likely to get similar recommendations when they have matching
preferences [98, 110]. There exist two different approaches to express user
preferences, i.e., implicit rating and explicit rating. Implicit rating is used when user
preferences are expressed in terms of the number of clicks, views, or purchases,
whereas explicit rating is used when ratings for different items are available. Various
techniques have been proposed to improve the accuracy of rating predictions, which
attempt to predict the rating of users on unseen items [70, 139, 200]. In practice,
however, a ranked list consisting of the top-K recommended items is presented to the user
[3, 109, 110, 195]. It has been shown in relevant studies that ranking based models
outperform rating-prediction based techniques in terms of higher recommendation
accuracy for the problem of top-K recommendations [62, 134]. This is mainly due to
the fact that users are likely to pay more attention to items at the top of list instead of
items at the bottom of the ranked list. Consequently, focus is shifted towards accurate
ranked list generation instead of improving the performance of rating prediction.
Therefore, ranking technique is applied on top of our proposed ACO based model for
generating top-K ranked recommendation list.
The proposed model uses ACO to generate an optimized list for the users. The proposed
ACO model generates an optimized list of 100 food items according to the pathological
reports of the users. To generate the personalized recommendation list of top-K items
for the user, the optimized list computed by ACO must be ranked considering user
preferences such that the user is provided with the top-K recommendations. For
experimentation, the value of K is varied between 1 and 20. To this end, ranking
techniques are used to generate the personalized recommendation list for the user.
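The final ranking step, in which the ACO output is reduced to a top-K list by user preference, can be sketched as below. The function name and the dictionary of predicted preference scores are hypothetical; in practice the scores would come from one of the ranking techniques described in this section.

```python
import heapq


def top_k(candidate_items, predicted_score, k):
    """Return the k candidates with the highest predicted preference score.

    Items without a score are treated as 0.0 (an assumption of this sketch).
    """
    return heapq.nlargest(
        k, candidate_items, key=lambda item: predicted_score.get(item, 0.0)
    )
```

For example, with scores `{"milk": 4.5, "kale": 3.0, "tofu": 4.8}` and `k = 2`, the candidates `["milk", "kale", "tofu", "yogurt"]` reduce to `["tofu", "milk"]`.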
A variety of ranking techniques are available to rank the final list generated by the
underlying algorithm. To select the best approach for ranking in terms of accuracy,
time, and RMSE, a qualitative analysis is performed of the state-of-the-art methods
that can generate a final ranked list of food items for the user. The techniques included
in the analysis are: (a) user-based KNN [241], (b) item-based KNN [108], (c) matrix
factorization [242], (d) Bayesian [243], and (e) most popular [109]. A dataset of Amazon
food items is used that comprises 1,297,156 ratings over 140,000 food items [244].
However, only those food items are included in the analysis for which nutrition facts are
available in the CoFID [228] dataset used in our model. The selected ranking
techniques are briefly described here.
(a) K-Nearest Neighbor
KNN is a machine learning technique that classifies data and falls in the category
of lazy learning algorithms. It takes the $K$ closest training examples in
the feature space as input. It considers a set of objects for which the classification is
known prior to the initiation of classification. This set is known as the training set for the KNN. The
feature that distinguishes KNN from K-means, another machine learning technique, is its
high sensitivity to the local structure of the data. The KNN algorithm adds the training
set to a list of training examples. Later on, for each query, it finds the $K$ nearest neighbors in
the training example list. In response to the query, it returns the class that is represented by
the highest number of instances among those $K$ neighbors. For ranking, KNN based approaches
consider labeled neighbors in the query feature space to rank the data. Two variations of KNN ranking
techniques are commonly used for recommendation, namely user-based KNN and
item-based KNN.
(i) KNN User-Based:
In the KNN user-based approach, the goal is to estimate the rating that user $u$ would give to item $i$, i.e., $\tilde{r}(u, i)$.
Let $U$ represent the set of all users and $I$ the set of all items. The
similarity between user $u$ and a similar user $\bar{u}$ is given by (4.6) [241].
$$\text{sim}(u, \bar{u}) = \frac{\sum_{i \in I(u, \bar{u})} r(u, i) \cdot r(\bar{u}, i)}{\sqrt{\sum_{i \in I(u, \bar{u})} r(u, i)^{2}}\;\sqrt{\sum_{i \in I(u, \bar{u})} r(\bar{u}, i)^{2}}} \tag{4.6}$$
In equation (4.6), $I(u, \bar{u})$ is the set of all items rated by both user $u$ and user $\bar{u}$. The set of neighbors
$N(u)$ of user $u$ is based on the similarity calculations, and its size ranges from 1 to $|U| - 1$,
i.e., all the other users. Therefore, $\tilde{r}(u, i)$ can be calculated as the adjusted weighted
sum of all the given ratings $r(\bar{u}, i)$, where $\bar{u} \in N(u)$:
$$\tilde{r}(u, i) = \bar{r}(u) + \frac{\sum_{\bar{u} \in N(u)} \text{sim}(u, \bar{u}) \cdot \left(r(\bar{u}, i) - \bar{r}(\bar{u})\right)}{\sum_{\bar{u} \in N(u)} |\text{sim}(u, \bar{u})|} \tag{4.7}$$
where $\bar{r}(u)$ is the average rating of user $u$.
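A minimal sketch of user-based KNN prediction is shown below. It uses cosine similarity over co-rated items, as in (4.6), and a mean-centred weighted sum, as commonly used for (4.7). The nested-dictionary rating format, the function names, and the choice to use every other user who rated the item as a neighbor are assumptions; ratings are assumed to be positive.

```python
import math


def cosine_similarity(ratings, u, v):
    """Cosine similarity over the items rated by both users."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def mean_rating(ratings, u):
    return sum(ratings[u].values()) / len(ratings[u])


def predict(ratings, u, item):
    """Mean-centred weighted sum over neighbours who rated the item."""
    num = den = 0.0
    for v in ratings:
        if v == u or item not in ratings[v]:
            continue
        s = cosine_similarity(ratings, u, v)
        num += s * (ratings[v][item] - mean_rating(ratings, v))
        den += abs(s)
    base = mean_rating(ratings, u)
    return base if den == 0 else base + num / den
```

When no neighbor has rated the item, the sketch falls back to the user's own average rating.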
(ii) KNN Item Based:
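The only difference between user-based KNN and item-based KNN is that the roles of
users and items are exchanged. Therefore, equations (4.6) and (4.7)
can be rewritten for item-based KNN as (4.8) and (4.9) [245]:
$$\text{sim}(i, \bar{j}) = \frac{\sum_{u \in U(i, \bar{j})} r(u, i) \cdot r(u, \bar{j})}{\sqrt{\sum_{u \in U(i, \bar{j})} r(u, i)^{2}}\;\sqrt{\sum_{u \in U(i, \bar{j})} r(u, \bar{j})^{2}}} \tag{4.8}$$
$$\tilde{r}(u, i) = \bar{r}(i) + \frac{\sum_{\bar{j} \in N(i)} \text{sim}(i, \bar{j}) \cdot \left(r(u, \bar{j}) - \bar{r}(\bar{j})\right)}{\sum_{\bar{j} \in N(i)} |\text{sim}(i, \bar{j})|} \tag{4.9}$$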
(b) Bayesian Networks:
Bayesian networks are used to characterize knowledge about a domain that is not
clearly defined. Bayesian networks are usually represented through a directed acyclic
graph (DAG). Particularly, each node in the DAG denotes a random variable, whereas
the edges between the nodes of DAG represent direct probabilistic inter-dependencies
between the connected random variables. The probabilistic dependencies among the
nodes (random variables) are calculated using known computational and statistical
techniques. Consequently, Bayesian networks bring together the principles from
probability theory, statistics, graph theory, and computer science.
Bayesian networks generate a system model based on training data where each node
represents a decision tree and edges signify user information. With
Bayesian networks, the system model is developed offline, which can take hours
or days. However, the created model is very compact, fast, and nearly as accurate as
nearest neighbor models [246]. Bayesian networks have proven to be more effective in
application areas where knowledge about user preferences does not change rapidly with
reference to the time required to develop the model. Conversely, Bayesian network
models are considered inappropriate for application scenarios wherein user preferences
are updated frequently. Consequently, Bayesian networks are selected for
experimentation in conjunction with our proposed food recommendation model
because user preferences regarding food are not expected to change frequently. For
Bayesian ranking calculations, the essentials needed are a set of variables
and a set of directed edges between these variables. Moreover, each variable has a
set of mutually exclusive states. The directed edges and the variables together form a directed
acyclic graph. For each variable $X$ with parents $Y_1, Y_2, Y_3, \ldots, Y_n$, there must be
an attached probability table $P(X \mid Y_1, Y_2, Y_3, \ldots, Y_n)$. The equation for Bayesian
ranking is then [247]
$$P(X_i \mid Y_j) = \frac{P(Y_j \mid X_i)\,P(X_i)}{P(Y_j \mid X_1)\,P(X_1) + P(Y_j \mid X_2)\,P(X_2) + \cdots + P(Y_j \mid X_n)\,P(X_n)} \tag{4.10}$$
From equation (4.10), the denominator can be written in simplified form as
$$P(Y_j) = \sum_{k=1}^{n} P(Y_j \mid X_k)\,P(X_k) \tag{4.11}$$
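Equations (4.10) and (4.11) can be checked numerically with a short sketch. The function name and the list-based representation of priors and likelihoods are assumptions for illustration.

```python
def posterior(priors, likelihoods):
    """Return the posterior probability of every state given the evidence.

    priors[i] is the prior of state i; likelihoods[i] is the probability of
    the observed evidence given state i.
    """
    # Total probability of the evidence, as in (4.11).
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    # Bayes' rule for each state, as in (4.10).
    return [l * p / evidence for l, p in zip(likelihoods, priors)]
```

The returned posteriors always sum to one, because the same evidence term normalizes every state.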
(c) Matrix Factorization
Matrix factorization, also known as matrix decomposition, factorizes a matrix into a
product of matrices. The most popular matrix factorization method is lower-upper
(LU) decomposition [248], which decomposes a matrix into a lower
triangular matrix and an upper triangular matrix. The advantage of matrix factorization is
its ability to find underlying latent factors and/or to predict missing values of the matrix
that represents some recommender system [81]. For recommender systems, the matrix
decomposition technique characterizes users and items by vectors of factors that
are estimated from item rating patterns. State-of-the-art recommendation systems
place the input data in a matrix where users are located on one dimension and items
of interest on the other. Formally, let $U$ represent the set of users and
$I$ the set of items. Let $A$ be the matrix of size $|U| \times |I|$ that contains all the
ratings assigned by the users to the items. Also, assume that $K$ latent features are
to be found. Therefore, two matrices, $P$ with dimensions $|U| \times K$ and $Q$ with dimensions $|I| \times K$, need to be found such that the product of
$P$ and $Q^{T}$ approximates matrix $A$, as indicated in equation
(4.12) [249].
$$A \approx P \times Q^{T} = \hat{A} \tag{4.12}$$
In this way, each row of matrix $P$ and $Q$ represents the strength of the
association between a user and the features, and between an item and the features, respectively. For a
user $u_i$, the prediction of the rating for item $i_j$ can be calculated as the dot product of the two
vectors corresponding to $u_i$ and $i_j$, as shown in (4.13) [249].
$$\hat{a}_{ij} = p_{i}^{T} q_{j} = \sum_{k=1}^{K} p_{ik}\, q_{kj} \tag{4.13}$$
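The rating prediction by dot product can be illustrated with a small sketch. It assumes the two factor matrices have already been learned and are stored as nested lists with one row per user and one row per item; the function names are illustrative, and no training procedure is shown.

```python
def predict_rating(P, Q, user, item):
    """Dot product of the user's and the item's latent vectors, as in (4.13)."""
    return sum(p * q for p, q in zip(P[user], Q[item]))


def reconstruct(P, Q):
    """Approximate rating matrix obtained from the two factors, as in (4.12)."""
    return [[predict_rating(P, Q, u, i) for i in range(len(Q))]
            for u in range(len(P))]
```

With two latent features, a user row `[1.0, 2.0]` and an item row `[0.0, 3.0]` give a predicted rating of 6.0.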
(d) Most Popular
As the name implies, the most popular recommendation technique recommends the items
that are most popular among other users. In particular, items that have a higher
average ranking (popular among the users) and are recommended or ranked by a higher
number of users are recommended to the user under consideration. Mathematically,
the most popular ranking [245] can be expressed as
$$\text{rank}_{MP}(i) = |U(i)|, \quad \text{where } U(i) = \{u \in U \mid \exists\, r(u, i)\} \tag{4.14}$$
where $r(u, i)$ is the rating of user $u$ on item $i$.
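A minimal sketch of the most popular ranking in (4.14) counts, for every item, the number of distinct users who rated it. The tuple-based input format and the alphabetical tie-break are assumptions of this sketch.

```python
def most_popular(ratings):
    """Rank items by the number of distinct users who rated them.

    ratings is a list of (user, item, rating) tuples. Ties are broken
    alphabetically by item name (an assumption for determinism).
    """
    users_per_item = {}
    for user, item, _rating in ratings:
        users_per_item.setdefault(item, set()).add(user)
    return sorted(users_per_item, key=lambda i: (-len(users_per_item[i]), i))
```

Using a set per item ensures that a user who rates the same item twice is counted only once.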
4.4.4 Complexity of Proposed Algorithm
In the ACO based food recommendation algorithm, the outermost loop is executed at most $t_{\max}$ times, where $t_{\max}$ represents the maximum number of iterations. The inner for loop is
executed $k$ times, where $k$ is the total number of ants, whereas the inner while loop is
executed $f$ times, where $f$ is the total number of food items. Consequently, the
complexity of the ACO based algorithm is captured by equation (4.15):
$$O(t_{\max} \times f \times k) \tag{4.15}$$
4.5 Formal Verification
A basic introduction to HLPN, SMT-Lib, and the Z3 solver is provided in Section 3.4 for a clear
understanding of the reader. Tables 4.1 and 4.2 show the data types and
their mappings, respectively. The HLPN model for Diet-Right is shown in Figure 4.4.
Table 4.1 Data Types for HLPN Model
Data Type      Description
Kants          A list representing k number of ants in the system.
Lpher_trial    A list depicting the local pheromone trail of ants.
GSol           A list showing the global solution.
Gpher_trial    A list depicting the global pheromone trail of ants.
Nut            A list showing nutritional values.
HI             A list containing heuristic information of nutrition.
Nodes          A list of nodes in the system.
Trans_prob     A list with the transition probability of every ant.

Table 4.2 Places and mapping used in HLPN model
Place          Mapping
φ (Init)       ℙ (Kants × Lpher_trial × Gpher_trial × Nut × HI × GSol)
φ (Nod)        ℙ (Nodes × Kants × Lpher_trial × Gpher_trial × Nut × HI × GSol)
φ (Prob)       ℙ (Kants × Trans_prob × Lpher_trial × Gpher_trial × Nut × HI × GSol)
φ (Update)     ℙ (Kants × Lpher_trial × Gpher_trial × GSol)
φ (GS)         ℙ (GSol)
φ (SS)         ℙ (GSol)
Figure 4.4 HLPN Model for the Proposed Diet-Right Algorithms
4.5.1 Modeling and Analysis of Proposed Algorithm
The process starts with the initialization of values, where the ants are given local and global
pheromone trails. The global solution is initialized to NULL, and the heuristic information is
initialized from the nutritional values. The following rule captures this step:
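$$
\begin{aligned}
R(I\_V) = {} & \forall c_1 \in C_1,\ \forall c_2 \in C_2,\ \forall c_3 \in C_3 \mid \\
& c_1[1] := \mathit{Init}(c_1[2], c_1[3], c_1[1]) \ \wedge \\
& c_1[5] := \mathit{Init\_HI}(c_1[4], c_1[5]) \ \wedge \\
& c_1[6] := \mathit{NULL} \ \wedge \\
& C_1' = C_1 \cup \{c_1[1], c_1[5], c_1[6]\} \ \wedge \\
& c_2[3] := c_1[2] \ \wedge\ c_2[4] := c_1[3] \ \wedge\ c_2[5] := c_1[4] \ \wedge\ c_2[6] := c_1[5] \ \wedge\ c_2[7] := c_1[6] \ \wedge \\
& C_2' = C_2 \cup \{c_2[3], c_2[4], c_2[5], c_2[6], c_2[7]\}
\end{aligned}
$$
Later, the ants are randomly assigned to the nodes:
$$
\begin{aligned}
R(A\_A) = {} & \forall c_3 \in C_3,\ \forall c_4 \in C_4,\ \forall c_5 \in C_5 \mid \\
& c_4[1] := \mathit{Assign\_Random}(c_3[2]) \ \wedge \\
& C_4' = C_4 \cup \{c_4[1]\} \ \wedge \\
& C_5' = C_5 \cup \{c_5\}
\end{aligned}
$$
The transition probability of every ant is then calculated according to equation (x):
$$
\begin{aligned}
R(T\_P) = {} & \forall c_6 \in C_6,\ \forall c_7 \in C_7 \mid \\
& c_7[2] := \mathit{Cal\_Prob}(c_6[1], c_6[2]) \ \wedge \\
& c_7[3] := c_6[3] \ \wedge\ c_7[4] := c_6[4] \ \wedge\ c_7[5] := c_6[5] \ \wedge\ c_7[6] := c_6[6] \ \wedge\ c_7[7] := c_6[7] \ \wedge \\
& C_7' = C_7 \cup \{c_7[2], c_7[3], c_7[4], c_7[5], c_7[6], c_7[7]\}
\end{aligned}
$$
The local and global pheromone trails and the global solution are then updated:
$$
\begin{aligned}
R(U\_P) = {} & \forall c_8 \in C_8,\ \forall c_9 \in C_9 \mid \\
& c_9[2] := \mathit{Update\_L}(c_8[3]) \ \wedge\ c_9[3] := \mathit{Update\_G}(c_8[4]) \ \wedge\ c_9[4] := \mathit{Update\_GS}(c_8[7]) \ \wedge \\
& C_9' = C_9 \cup \{c_9[2], c_9[3], c_9[4]\}
\end{aligned}
$$
At the end of the iterations, the global solution is returned to the user, as described in . . . .
$$
\begin{aligned}
R(G\_S) = {} & \forall c_{10} \in C_{10},\ \forall c_{11} \in C_{11} \mid \\
& c_{11} := c_{10} \ \wedge \\
& C_{11}' = C_{11} \cup \{c_{11}\}
\end{aligned}
$$
4.5.2 Verification Property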
The aim was to verify that the foods with highest nutritional values are returned. The
tested property was
/E 2��ℎ�)ℎB�&���&C�D, \�C�D, E�� 3
The property was tested in the Z3 solver and was satisfied in 397 ms.
4.6 Experimental Setup and Results
Extensive simulations are conducted to evaluate performance of the proposed system.
The experimental setup and results are discussed as follows.
4.6.1 Experimental Setup
The details of the experimental setup and parameters used for evaluation are presented
in Table 4.3.
Table 4.3 Experimental Setup
Parameters                                   Values
Total Number of Food Items                   3400
Nutrition for Each Food Item                 26
Total Number of Pathological Test Reports    345
Number of Ants used for Simulation           10-120
Maximum Iterations for each Ant              200
Simulation Tool                              MATLAB
Single Node System Configuration             RAM 16 GB, Cores 4
Cloud Configuration                          MATLAB Parallel Cloud, Cores 16
4.6.2 Results
As a first step, well-known optimization techniques, namely ACO, PSO, and GA, are compared in
terms of accuracy and time complexity in order to select one for our study. The ACO
metaheuristic is a constructive, population-based approach that relies on the social behavior
of ants. It is recognized as one of the most powerful approaches for solving combinatorial
optimization problems. PSO is a population-based stochastic approach to optimization problems,
whereas GA is used to find high-quality solutions to optimization and search problems. GA
offers no guarantee of reaching the global optimum and, most importantly, the convergence time
of GA and PSO is relatively high for real-time problems. Moreover, it is also observed that the
convergence time, accuracy, and RMSE of the aforementioned techniques are problem dependent.
Figure 4.5 compares ACO, PSO, and GA in terms of accuracy. The behavior of the system is tested
by varying the size of the dataset. It is observed that the accuracy of ACO is higher than that
of PSO and GA. The reason for this behavior is that the ants in ACO search for the optimal
solution effectively and collaboratively. Another reason for the high accuracy of ACO is that,
in the realistic domain, ants can discover their targets efficiently and rapidly owing to their
higher global search capability.
Figure 4.5 Comparison on Accuracy (y-axis: accuracy, 0 to 1; x-axis: number of inputs; series: ACO, PSO, GA)
Figure 4.6 compares the three techniques in terms of time complexity, i.e., convergence time.
The convergence time of an algorithm is the average iteration time required to converge to the
optimal solution. The same parameters are used to compute the time complexity of all the
techniques. It is evident from the figure that the convergence time of ACO is lower than that
of PSO and GA. The reason behind the lower convergence time of ACO is its local and global
pheromone update strategy. Collaboratively searching for target nodes and updating the
pheromone level both locally and globally help ACO converge to a solution in minimal time.
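For readers unfamiliar with the two update levels, the sketch below shows the standard Ant Colony System update rules as a stand-in: a local update applied when an ant traverses an edge, and a global update applied to the edges of the best solution. The exact rules and parameter values used by Diet-Right are not given in this section, so the function names, the evaporation parameters `xi` and `rho`, and the initial level `tau0` are assumptions.

```python
def local_update(tau, edge, tau0, xi=0.1):
    """Local rule: pull the pheromone on a traversed edge back toward tau0.

    `tau` maps an edge (any hashable key) to its pheromone level.
    """
    tau[edge] = (1.0 - xi) * tau[edge] + xi * tau0


def global_update(tau, best_edges, best_cost, rho=0.1):
    """Global rule: evaporate and reinforce only the edges of the best solution.

    The deposit is inversely proportional to the cost of the best solution,
    so cheaper solutions leave a stronger trail.
    """
    deposit = 1.0 / best_cost
    for edge in best_edges:
        tau[edge] = (1.0 - rho) * tau[edge] + rho * deposit
```

In this scheme the local rule discourages ants from all following the same edge within an iteration, while the global rule concentrates the search around the best solution found so far.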
Figure 4.6 Comparison on Time Complexity
Finally, all three techniques are compared on average RMSE, as depicted in Figure 4.7. It is
evident from the figure that ACO outperforms PSO and GA by providing the minimum average
RMSE. The reason for the minimum RMSE is that each ant in ACO must visit each node in search of
the optimal solution, which means more iterations and more calculations that help ACO reduce
the error rate.
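Since RMSE is used throughout the comparison, a minimal sketch of its computation is given here for reference. The function name and the list-based inputs are illustrative assumptions; the section does not specify how predicted and actual values are stored.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equally long numeric sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A value of zero means the predictions match the reference values exactly; larger values indicate larger average deviations.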
[Figure 4.6 plot: time in seconds (y-axis) versus number of inputs (x-axis); series: GA, PSO, ACO]
Figure 4.7 Comparison on Average RMSE (y-axis: average RMSE; x-axis: number of inputs; series: ACO, PSO, GA)
The behavior of the proposed algorithm is analyzed in terms of time complexity using ACO. It is
observed that increasing the number of ants brings the solution closer to its minimum cost,
but in practice it is not feasible to use a high number of ants. Moreover, using a high number
of ants to construct a solution increases the time complexity, as shown in Figure 4.8.
Figure 4.8 Tradeoff between Numbers of Ants to Time Complexity
To select the optimal number of ants for the best results irrespective of time complexity,
RMSE was estimated. It is observed that 110 ants produce the lowest error rate. As our
algorithm uses random initialization, it produced varied results. To address this variation,
the average of 10 executions with the same settings is used. It can be observed that RMSE
decreases as the number of ants increases. Figure 4.9 shows the average RMSE with respect to
the increasing number of ants.
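The averaging step described above can be sketched as follows. The callable `run_once`, which is assumed to execute one randomized ACO run with the supplied random generator and return its RMSE, is hypothetical; the fixed seeds are an assumption added only to make the sketch reproducible.

```python
import random
import statistics


def average_over_runs(run_once, runs=10, base_seed=0):
    """Average the metric returned by `run_once` over several seeded runs.

    `run_once` must accept a `random.Random` instance and return a number.
    """
    results = [run_once(random.Random(base_seed + i)) for i in range(runs)]
    return statistics.mean(results)
```

Giving each run its own generator keeps the runs independent of one another while still allowing the whole experiment to be repeated with identical results.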
Figure 4.9 Tradeoff between number of Ants and Average RMSE
To select the optimal number of ants, a best cost analysis is performed over the number of
iterations and the number of ants. It is evident from Figure 4.10 that, in our case, 80 ants
provide the best results in terms of convergence of the solution.
Figure 4.10 Cost over Varying No. of Ants and Iterations
Figure 4.11 presents the convergence time for different diseases. As shown in the previous
result, 80 ants provide the best result in terms of convergence of the solution. Therefore, 80
ants are used for the convergence time comparison between the most common diseases. The result
shows that the convergence time for a normal person is higher than for persons with a disease.
On the other hand, the convergence time to recommend foods for a hypertension patient is
significantly lower than for the others. The reason for this variance is that the number of
foods available for a normal/healthy person is much higher than the number of foods available
for a patient.
[Figure 4.10 plot: average RMSE versus iterations (20 to 200); series: 20, 40, 60, 80, and 100 ants]
Figure 4.11 Convergence Time of Different Diseases (y-axis: time in seconds; x-axis: Normal, Iron Deficiency, Kidney Disease, Diabetes, Hypertension)
Figure 4.12 illustrates the cost comparison for common diseases. It can be seen that the
least cost is achieved for hypertension, whereas the normal person exhibits the highest cost.
This suggests that the dataset used in this study is more suitable for certain diseases, such
as hypertension.
Figure 4.12 Cost Comparisons of Diseases
Figure 4.13 depicts the accuracy of recommendations relative to the number of ants. The
result shows that the highest accuracy is achieved with 110 ants. In general, the accuracy
increases as the number of ants increases. Moreover, it is observed that the accuracy remains
constant between 80 and 100 ants.
[Figure 4.12 plot: average RMSE versus iterations (20 to 200); series: Diabetes, Iron Deficiency, Kidney Disease, Hypertension, Normal]
Figure 4.13 Accuracy of Recommendations (y-axis: accuracy, 0.966 to 0.9705; x-axis: number of ants, 10 to 120)
As the last step, before generating the final recommendations for the user, the optimized food
list generated by ACO must be ranked. In order to select the best ranking technique in terms of
accuracy, a comparative analysis of the state-of-the-art ranking techniques discussed in
Section 4.4.3 is conducted. The results of the comparative analysis are discussed in the
subsequent paragraphs.
Figures 4.14, 4.15, and 4.16 compare all the selected techniques in terms of precision,
recall, and F-Measure, respectively. The behavior of the system is analyzed by varying the
value of K from 2 to 10 in this case. It is observed that the KNN variants, i.e., item-based
KNN and user-based KNN, outperform the other techniques in precision, recall, and F-Measure.
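For reference, the three metrics can be computed for a single ranked list as sketched below, using the usual top-K definitions (precision as hits divided by K, recall as hits divided by the number of relevant items). The function name and the data layout are assumptions for illustration only.

```python
def precision_recall_f1_at_k(recommended, relevant, k):
    """Return (precision, recall, F-measure) for the top-k of a ranked list.

    `recommended` is a ranked list of item identifiers and `relevant` is the
    collection of items the user actually finds relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant)
    hits = sum(1 for item in recommended[:k] if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

Averaging these per-user values over all users gives the kind of aggregate figures plotted against K.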
Figure 4.14 Precision @ K=10 (y-axis: precision; x-axis: number of food items, 2 to 10; series: KNN-ITEM, KNN-USER, MF, MP, BAYESIAN)
Figure 4.15 Recall @ K=10
Figure 4.16 F-Measure @ K =10
Figures 4.17, 4.18, and 4.19 compare the selected techniques in terms of precision, recall,
and F-Measure for K from 12 to 20. Similar behavior is observed for the evaluated techniques,
with the exception that the values of precision, recall, and F-Measure are relatively high
compared to K=10. The higher accuracy is attributed to the higher value of K: in this case,
more recommendations are generated, i.e., up to K=20, which enables the techniques to generate
a ranked list of food items with higher precision, recall, and F-Measure.
[Figure 4.15 plot: recall versus number of food items (2 to 10); series: KNN-ITEM, KNN-USER, MF, MP, BAYESIAN]
[Figure 4.16 plot: F-measure versus number of food items (2 to 10); series: KNN-ITEM, KNN-USER, MF, MP, BAYESIAN]
Figure 4.17 Precision @ K =20 (y-axis: precision; x-axis: top-K recommendations, 12 to 20; series: KNN-ITEM, KNN-USER, MF, MP, BAYESIAN)
Figure 4.18 Recall @ K =20 (y-axis: recall; x-axis: top-K recommendations, 12 to 20; series: KNN-ITEM, KNN-USER, MF, MP, BAYESIAN)
Figure 4.19 F-Measure @ K =20
Figures 4.20 and 4.21 show the average RMSE of all the selected techniques as K varies from 2
to 20. It is evident from the results that the KNN variants outperform the other techniques in
terms of average RMSE across the dataset. The reason behind the superior performance of KNN is
that the distance between a given user/item and its nearest neighbors becomes smaller because
of the smaller size of the dataset, which helps the KNN variants find the nearest neighbor with
a low error rate. It is also observed that MF and MP attain the highest RMSE values. One of the
main reasons why MF does not provide the minimum error rate is that MF techniques consider only
the Euclidean structure of the data and ignore its geometric structure. This leads to a higher
error rate for the MF technique, especially when ranking a given list.
[Figure 4.19 plot: F-measure versus top-K recommendations (12 to 20); series: KNN-ITEM, KNN-USER, MF, MP, BAYESIAN]
Figure 4.20 Average RMSE @ K =10 (y-axis: average RMSE; x-axis: top-K recommendations, 2 to 10; series: KNN-ITEM, KNN-USER, MF, MP, BAYESIAN)
Figure 4.21 Average RMSE @ K =20
Similarly, the reason why the Bayesian technique does not perform well is its inability to
select a prior (probability distribution), which is needed for accurate recommendations. The
Bayesian technique needs assistance to transform individual prior views into a mathematically
expressed prior, which results in sub-optimal performance when it is used for ranking. However,
Bayesian techniques can outperform other techniques when the probability distribution is
selected correctly for the dataset used for ranking. It should be emphasized that the
performance of the aforementioned techniques largely depends on the datasets and the
particular application scenarios.
[Figure 4.21 plot: average RMSE versus top-K recommendations (12 to 20); series: KNN-ITEM, KNN-USER, MF, MP, BAYESIAN]
Figure 4.22 Time Comparison @ K =10 (y-axis: time in seconds; x-axis: top-K recommendations, 2 to 10; series: KNN-ITEM, KNN-USER, MF, MP, BAYESIAN)
Figures 4.22 and 4.23 compare all the techniques in terms of execution time. The selected
techniques are compared while varying K from 2 to 20 and increasing the size of the dataset.
Bayesian and MP exhibit the highest execution times for all values of K. The reason for the
higher execution time of the Bayesian technique is its computational cost, especially when the
dataset has a large number of parameters, as in our case. Another reason for the higher
computational cost of these techniques is that they are mostly used with implicit ratings,
whereas in our case explicit ratings are used. This choice depends on the dataset used for the
experiments and the ultimate target of the recommendations. In our case, explicit ratings are
used because the objective of this work is to improve the accuracy of top-K recommendations.
In contrast, implicit ratings are mostly used when the performance metric is the diversity or
novelty of the recommendations. It is also noteworthy that Bayesian techniques are reported to
be computationally expensive in most of the literature [250].
Figure 4.23 Time Comparison @ K =20 (y-axis: time in seconds; x-axis: top-K recommendations, 12 to 20; series: KNN-ITEM, KNN-USER, MF, MP, BAYESIAN)
For comparison with our proposed model, Diet-Right, the following existing techniques are
selected: the Knowledge Based (KB) technique [2] and food recommendation using ontology and
heuristic based approach (FROH) [251]. The KB technique is based on explicit knowledge about
the user preferences and the recommendation criteria. Usually, KB recommendation systems
generate recommendations that are domain dependent. FROH uses TF-IDF in combination with the
cosine similarity measure to generate the food list. Moreover, FROH also uses heuristic
information to generate the final list.
Figure 4.24 Precision
Figure 4.25 Recall
Figures 4.24, 4.25, and 4.26 show that our system achieves the best precision, recall, and
F-Measure compared with the existing techniques FROH and KB. Diet-Right outperforms KB and
FROH because in ACO each ant must visit each node in search of the optimal solution, which
means that more relevant food items are selected by the ants, which ultimately increases
precision, recall, and F-Measure.
[Figure 4.24 plot: precision versus number of food items (1 to 10); series: Diet-Right, KB, FROH]
[Figure 4.25 plot: recall versus number of food items (1 to 10); series: Diet-Right, KB, FROH]
Figure 4.26 F-Measure
Figure 4.27 depicts the convergence time of single node and cloud-based execution. For this
experiment, our algorithm is executed using MATLAB's cloud framework [252]. It is evident from
the result that the convergence time is significantly reduced with cloud-based execution: on
average, it is approximately 12 times lower than that of single node execution.
[Figure 4.26 plot: F-measure versus number of food items (1 to 10); series: Diet-Right, KB, FROH]
Figure 4.27 Convergence Time of Single Node and Cloud (y-axis: time in seconds, logarithmic scale; x-axis: number of ants, 10 to 120; series: Single Node, Cloud)
4.7 Summary
In this chapter, a cloud based food recommendation system called Diet-Right is
presented. Based on user input, it recommends a list of optimal food items using an
ACO model. Diet-Right manages and updates the heuristic information in such a way
that the diversity among foods is maximized. Extensive experimentation was performed
to check the cost, accuracy, convergence time, and performance gain. Diet-Right can
play a vital role in controlling various diseases. The experimental results showed that
compared to single node execution, the convergence time of parallel execution on cloud
is approximately 12 times lower.
Chapter 5
Conclusion and Future Work
5.1 Conclusions
The overall findings of this dissertation are summarized as follows. A detailed quantitative
analysis of four state-of-the-art recommendation techniques was conducted over the Foursquare,
MovieLens, and Gowalla datasets to show how the performance of existing models is affected by
the choice of dataset. The quality of recommendations is significantly improved by using
auxiliary information, such as check-in data, geographical information, social relationships,
and temporal information. Similarly, the experimental results showed that model-based
approaches are more effective and efficient than memory-based approaches in terms of accuracy
and scalability. Moreover, the performance of model-based approaches remained scalable as the
number of recommendations increased.
In the recent past, a tremendous increase has been witnessed in the use of social networking
sites. These sites collect different kinds of information from users, such as check-ins,
geographical information (longitude and latitude), time, user ratings, and comments about
locations. To this end, Amazon, Facebook, Twitter, Google, and YouTube gather massive amounts
of data on a daily basis. For instance, statistics show that YouTube users alone upload 72
hours of new video every minute, Facebook users share approximately 2.5 million pieces of
information (pictures, videos, check-ins, and comments) every minute, and Google receives over
4 million search queries every minute [253]. These services allow users to share different
types of data, such as images, videos, tweets, ratings, and comments. Therefore, the
unstructured data created by these services for a variety of applications is huge in volume.
The collected data is used to generate a variety of recommendations according to the users'
preferences. There are numerous challenges in handling and parsing such a huge volume of data.
This dissertation focuses on the challenges faced by recommendation systems in terms of
scalability, real-time recommendations, data sparsity, and cold start.
5.1.1 Real-Time Venue Recommendations
This research started with the problem of generating real-time recommendations from large
datasets through the effective use of a scalable cloud architecture. The study makes multifold
contributions by devising a cloud-based real-time venue recommendation framework. The major
contribution of this work is the integration of knowledge engineering techniques, including
K-means, KNN, and collaborative filtering, on a cloud infrastructure for generating real-time
recommendations. The proposed RTVR framework considers the effect of dynamic real-world
physical factors in addition to the collective opinions of experienced users. RTVR addresses
the scalability issue by putting forward a cloud-based architecture that offloads data- and
computation-intensive processing to cloud servers. As a result, RTVR always considers a
precompiled set of experienced users for all categories, which enables it to recommend
appropriate venues for new users at a finer granularity. To handle the cold start problem in
venue recommendations, our model maintains a pre-computed ranked list of popular users and
venues. Using this list, the model compares the preferences of a new user with those of the
existing Top-N users, i.e., the users with the top rankings, and generates recommendations
according to the top user whose preferences best match those of the new user. The data
sparsity problem is handled in the pre-processing phase, when the model clusters the venues.
The results of the proposed RTVR model showed that this research achieved its targets with
respect to scalability, cold start, data sparsity, and real-time recommendations.
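The cold start handling described above can be sketched as a nearest-match lookup against the pre-computed top users. Cosine similarity over preference vectors is an assumption made for this example, as are all names; the section does not state which similarity measure RTVR uses.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equally long numeric vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_matching_top_user(new_user_prefs, top_users):
    """Return the name of the top-ranked user whose preferences best match.

    `top_users` maps a user name to that user's preference vector, e.g. one
    weight per venue category (hypothetical layout).
    """
    return max(top_users, key=lambda name: cosine(new_user_prefs, top_users[name]))
```

The venues favored by the returned user would then serve as the initial recommendations for the new user until enough check-in history is available.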
Moreover, an extensive comparison was performed against state-of-the-art and popular
techniques, i.e., OmniSuggest, POPULAR, SVD, and Social. As a first step, precision, recall,
and F-measure were chosen as the performance metrics to evaluate the effectiveness of the RTVR
model. The results showed that RTVR performed better than all the selected techniques except
OmniSuggest in terms of precision, recall, and F-measure. OmniSuggest performs better on these
metrics because the focus of our proposed model is on generating real-time recommendations,
which is achieved by a significant reduction of the dataset. Consequently, to generate
real-time recommendations, RTVR slightly compromises accuracy owing to the inherent tradeoff
between data size and accuracy [1]. The better performance compared to the rest of the
techniques is due to clustering the venues using the K-means clustering technique. It is
observed from the literature that most traditional techniques perform clustering based on the
user's location. In contrast, better performance is achieved by clustering the venues and
reducing the dataset so that a user is placed in his/her relevant cluster immediately after a
check-in is performed. Moreover, the proposed model was also evaluated against the other
techniques without clustering. The results showed that high precision and recall values are
achieved without clustering; however, the time complexity also increases because the proposed
algorithm is executed over the complete dataset. MapReduce is used for the parallel execution
of the selected clusters on the cloud. However, a tradeoff exists between recommendation
quality and dataset reduction: the quality of recommendations may be affected if the dataset
is significantly reduced to improve the efficiency of online real-time processing. The
evaluation results indicate that the performance of the proposed RTVR framework is superior to
that of many existing schemes proposed in the literature.
5.1.2 Food Recommendation
Next, the field of health recommendations was explored, and a cloud-based food recommendation
system called Diet-Right was presented for dietary recommendations based on users'
pathological reports. The proposed model uses the ant colony algorithm to generate an optimal
food list and recommends suitable foods according to the values in the pathological reports.
Diet-Right can play a vital role in controlling various diseases. One motivation for using ACO
is that it can be executed as a distributed algorithm and is therefore suitable for a cloud
environment. Another advantage of ACO is its quick discovery of optimal solutions, which makes
it usable in dynamic applications. Moreover, an extensive comparison was performed among
popular optimization techniques, i.e., ACO, PSO, and GA. All three techniques were compared in
terms of time complexity, accuracy, and RMSE in order to select one of them for our
experiments. The results showed that ACO outperformed PSO and GA on all three measures.
However, it must be noted that the performance of these optimization techniques is problem
dependent. The experimental results showed that, compared to single node execution, the
convergence time of parallel execution on the cloud is approximately 12 times lower. Moreover,
adequate accuracy is achieved by increasing the number of ants. Similarly, experiments were
performed to analyze the cost, accuracy, convergence time, and performance gain of the
proposed model. The experimental results indicate that the proposed model generates optimal
food recommendations for users in a scalable manner. Moreover, our model was compared with the
traditional Knowledge Based (KB) model in terms of precision, recall, F-Measure, and time
complexity, and the results showed that Diet-Right outperforms the KB model on all of these
metrics. Furthermore, the cold start problem does not exist in the proposed food
recommendation model because the historical data of the user is not required to generate
recommendations. The proposed model only requires the values of the pathological reports and
the demographic data provided by the user to generate personalized food recommendations.
The work presented in this dissertation addresses an important problem concerning real-time
recommendations in a scalable architecture. Evaluation metrics such as accuracy, diversity,
and quality of recommendation have been used to evaluate the effectiveness of the proposed
models. This dissertation provides key elements, in terms of system models and algorithms,
that can be used by other researchers for the development of novel algorithms and
architectures and for the design and performance evaluation of such systems. The state of the
art is used to position the contributions of this work, and the performance analysis is
carried out in comparison with relevant key algorithms.
The main motivation behind this study was to generate real-time recommendations when dealing
with a high volume of unstructured data. However, generating real-time recommendations in
isolation is not valuable unless other relevant parameters, such as scalability, cold start,
data sparseness, accuracy, and diversity of recommendations, are considered. The system models
and algorithms presented in this dissertation are specifically targeted at generating
real-time recommendations while balancing the challenges of scalability, data sparsity, and
cold start. Our proposed models can be used in a variety of recommendation system
applications. For venue-based recommendations, our model can be used by users searching for a
variety of locations, such as restaurants, shopping malls, hospitals, and tourist places. As
an example, a user arriving in a new area may have difficulty choosing venues to visit without
proper recommendations that match his/her preferences. Similarly, our food recommendation
model can be used in a variety of health applications, such as balanced diet and health
recommendations for different diseases.
Based on the aforementioned findings, it is concluded that cloud computing plays a vital role
in generating real-time recommendations and solving scalability challenges. Similarly,
cloud-based approaches coupled with clustering techniques help in efficiently analyzing the
huge volume of unstructured data collected by different online social network services.
Moreover, reducing the dataset helps in generating real-time recommendations; however, a
significant reduction of the dataset may affect the accuracy of the recommendations. The
tradeoff between dataset reduction and accuracy affects the quality of the recommendations
provided to the users.
5.2 Opportunities in Recommendation Systems
Recommendation systems have vast applications in areas such as healthcare, transportation,
tourism, and education. The following subsections briefly discuss some of the opportunities in
adopting the services of recommendation systems.
5.2.1 Healthcare
Healthcare is one of the main areas where recommendation systems can significantly enhance
the efficiency, reliability, and effectiveness of the system [190, 254, 255]. People from
various domains often require multiple healthcare services, such as disease specialists,
hospitals, and health insurance plans [256], that closely match their preferences.
Recommendation systems can play an important role in the healthcare industry by connecting
patients, healthcare providers, and insurance companies and by providing them with localized
recommendations.
5.2.2 Transportation
Another interesting area for the adoption of recommendation systems is transportation.
Recommendation systems can be helpful in route recommendations, e.g., for individuals driving
their personal vehicles, cab drivers, and public transporters [68, 69]. Heavy traffic during
peak hours is a significant problem all over the world. In such situations, people can use the
services of LBRS to find different routes to their destinations. Similarly, carpooling is
among the services provided by recommendation systems (e.g., www.uber.com) [257]. The
effective adoption of recommendation systems in transportation can significantly reduce fuel
costs and enhance the reliability of the services provided by LBRS.
5.2.3 Tourism
An important area where recommendation systems are actively deployed is the tourism industry,
where people want to plan in advance the locations they would like to visit. It is sometimes
difficult to choose an appropriate place when one has to decide among multiple available
choices. Recommendation systems have been used in tourism to recommend appropriate trip plans
as well as well-known nearby points of interest, such as hotels, restaurants, and shopping
malls [129, 258]. The adoption of recommendation systems in tourism can save tourists
significant time in reaching their destinations.
5.2.4 Education
Another key area where recommendation systems can play an important role is education.
Students need better institutions, such as colleges and universities, for their higher
education. Recommendation systems can be used to discover the best institutes according to the
preferences of the student (e.g., www.ratemyprofessors.com/ and https://foursquare.com/).
5.3 Future Directions
There are several research directions in which the work presented in this dissertation can be
extended. The future extensions applicable to the proposed work are listed below.
5.3.1 Real-World Factors and Group Recommendation
With the continuous evolution of social networking services, the significance of
recommendation models that consider group preferences has also increased. However, most
existing traditional recommendation schemes do not take group-of-“friends” scenarios into
account [38, 66, 69]. In group scenarios, the recommendation system must model not only the
preferences of each group member but also the location of each member in order to satisfy all
the members of the group. The individuals' preferences and their preferred locations are then
aggregated into the recommendation for the whole group [172]. In the recent past, limited work
has been carried out in the field of group recommendations, such as [70, 172, 259]. Most of
the existing techniques proposed in the literature do not specifically focus on the effects of
real-time physical factors, such as distance from the location, traffic, and weather
conditions, on group recommendations. Moreover, the complexity and cost of processing
large-scale datasets negatively affect the performance and efficiency of recommendation
systems. In the context of LBRS, limited work has been performed on group-based location
recommendations.
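One simple way to aggregate individual preferences into a group recommendation is the average strategy sketched below. This is an illustrative assumption, not a technique proposed in this dissertation; the function names and the per-member score dictionaries are hypothetical, and the real-world factors discussed in this subsection (distance, traffic, weather) are not modeled.

```python
def aggregate_group_scores(member_scores):
    """Average each location's score over all group members.

    `member_scores` is a non-empty list with one dict per member, mapping a
    location name to that member's preference score. Only locations scored
    by every member are kept, so no member is ignored.
    """
    common = set(member_scores[0]).intersection(*member_scores[1:])
    return {
        location: sum(scores[location] for scores in member_scores) / len(member_scores)
        for location in common
    }


def recommend_for_group(member_scores):
    """Return the commonly scored location with the highest average score."""
    averaged = aggregate_group_scores(member_scores)
    return max(averaged, key=averaged.get)
```

Other aggregation strategies, such as taking the minimum score per location so that no member is strongly dissatisfied, could be substituted in `aggregate_group_scores` without changing the rest of the sketch.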
The main motivation behind considering real-world parameters in group
recommendation is to include the current context of each group member in the
location recommendation process. By doing so, the selected location will be based on
the mutual consensus of the group members and will be the one that satisfies all the
members of the group. It is worth noting that providing real-time recommendations is
a highly compute-intensive task, as the workload consists of a huge volume of user data
accumulated in the system over time. When the system offers routes along with
locations, the key location and route attributes, such as location type, distance to
the location, travel time, route complexity, time of day, and real-time, real-world
factors with temporal features, need to be considered. Moreover, most of the existing
recommendation systems use only a single type of data source for recommendations.
Using diverse data sources will enable recommendation systems to provide more effective
recommendations. Diverse data sources may include distance from the location, traffic
conditions, weather conditions, multiple routes to the location, and time of the day
(morning, evening, or night).
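The aggregation of individual preferences and member locations described above can be illustrated with a minimal sketch. It is not the method proposed in this dissertation: the least-misery aggregation rule, the linear distance penalty, the `distance_weight` parameter, and the dictionary layout of members and venues are all assumptions chosen for illustration. Traffic and weather are left out and would enter as further penalty terms.

```python
from math import radians, sin, cos, asin, sqrt


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def group_score(venue, members, distance_weight=0.1):
    """Least-misery score: the group is as satisfied as its least satisfied member.

    Each member's score is the stated preference for the venue minus a
    penalty proportional to that member's current distance from it
    (hypothetical weighting, for illustration only).
    """
    per_member = []
    for member in members:
        preference = member["ratings"].get(venue["name"], 0.0)
        penalty = distance_weight * haversine_km(member["position"], venue["position"])
        per_member.append(preference - penalty)
    return min(per_member)


def recommend_for_group(venues, members, distance_weight=0.1):
    """Return the venue with the highest least-misery score."""
    return max(venues, key=lambda v: group_score(v, members, distance_weight))


# Example data (invented): two friends and two candidate venues.
members = [
    {"position": (0.0, 0.0), "ratings": {"A": 4.0, "B": 5.0}},
    {"position": (0.0, 0.1), "ratings": {"A": 3.0, "B": 5.0}},
]
venues = [
    {"name": "A", "position": (0.0, 0.05)},
    {"name": "B", "position": (0.0, 1.0)},
]
best = recommend_for_group(venues, members)
```

With the distance penalty switched off (`distance_weight=0.0`), the better-rated but distant venue B would be chosen. With the penalty, the nearby venue A is chosen, which is the effect of including each member's current context.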
5.3.2 Balance in Accuracy and Diversity of Recommendations
Researchers in the recommendation systems domain have worked for the last few
years on balancing diversity and accuracy when generating the final
recommendations. A variety of techniques have been proposed for offering diverse and
novel recommendations, but they have a negative impact on accuracy. At the same time, the
focus should also be on maintaining the balance between diversity and similarity [3, 61,
62]. Moreover, highly diverse recommendations also affect the similarity and accuracy
criteria for the user. Therefore, the diversity of the recommendations should be provided
with a balance in the aforementioned tradeoffs [3]. Studies such as [119] showed that
accuracy alone is not sufficient for selecting an appropriate algorithm. For example, a user
may intend to incorporate diversity or novelty in the recommendations along with
accuracy, so accuracy alone is not a sufficient criterion for recommendation.
Moreover, another study showed that the satisfaction level
of users supersedes the requirement of being accurate and that users are more likely
to accept more diverse recommendations [62]. In the majority of cases, however, users need a
balance between accuracy and diversity [62]. Furthermore, different frameworks of
recommendation systems have different levels of accuracy and diversity, and the levels
may change from one framework to another. Therefore, adjusting the levels
of accuracy and diversity is significant for recommendation systems. In order to
maintain the balance between metrics such as accuracy and diversity, the model used in
a recommendation system must consider hybrid techniques that minimize the
tradeoff between the aforementioned metrics.
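One common way to make the accuracy and diversity levels adjustable is a greedy re-ranking of the candidate list, in the style of maximal marginal relevance. The sketch below is an illustration of that general idea and not a technique taken from this dissertation or from the cited studies. The `trade_off` parameter, the `relevance` dictionary of predicted scores, and the category-based similarity function are assumptions.

```python
def rerank(candidates, relevance, similarity, k, trade_off=0.7):
    """Greedily pick k items, trading predicted relevance against redundancy.

    trade_off = 1.0 ranks purely by relevance (accuracy); lower values
    increasingly penalise items similar to those already selected (diversity).
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:

        def marginal(item):
            # Redundancy is the highest similarity to any item chosen so far.
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return trade_off * relevance[item] - (1 - trade_off) * redundancy

        best = max(remaining, key=marginal)
        selected.append(best)
        remaining.remove(best)
    return selected


# Example data (invented): two cafes with high predicted ratings and one museum.
category = {"cafe_1": "cafe", "cafe_2": "cafe", "museum": "museum"}
relevance = {"cafe_1": 0.9, "cafe_2": 0.85, "museum": 0.6}


def same_category(a, b):
    """Toy similarity: 1.0 if both items share a category, else 0.0."""
    return 1.0 if category[a] == category[b] else 0.0


accurate = rerank(relevance, relevance, same_category, k=2, trade_off=1.0)
balanced = rerank(relevance, relevance, same_category, k=2, trade_off=0.5)
```

Here `accurate` contains the two cafes, while `balanced` replaces the second cafe with the museum. A single parameter of this kind is one possible handle for the level adjustment discussed above.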
5.3.3 Cold Start Problem
The cold start problem in recommendation systems is still an open research area, as
there is not yet an optimal solution for cold start users and items. One solution to the cold
start problem compares the preferences of the cold start user with those of the existing
top users. As a result, the recommendations are generated based on the preferences of
the top users, identified by KNN, whose similarity best matches the cold start user. A variety
of solutions exist in the literature [71, 107, 260], but there are still some areas where
no optimal solution is provided for the cold start problem. Existing approaches mostly
rely on the neighbors of the new user and provide the new user with recommendations that
best match the neighbors’ preferences [260]. These approaches apply various
learning techniques to users’ historical data to discover the nearest neighbors of the
new user in order to provide recommendations. The techniques used for finding the
nearest neighbors or top users are applicable only when there exists a sufficient dataset with
relevant information. There can be numerous areas where either the dataset is not
sufficient or the recommendation area is new and only a limited number of users are using the
system, and hence there are no top users or neighbors. In such scenarios, traditional
solutions are inadequate to generate recommendations for a new user. Therefore, hybrid
techniques drawing on multiple disciplines, including artificial neural
networks, Bayesian networks, and machine learning techniques, need to be considered for
developing solutions that will efficiently handle the issue of cold start.
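The neighbor-based idea described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in this dissertation: each user is assumed to have a short numeric preference profile (for example, stated interest per category), similarity is assumed to be cosine similarity, and the names `cold_start_recommend`, `profile`, and `ratings` are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length preference vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cold_start_recommend(new_profile, top_users, k=2, n=3):
    """Recommend up to n items from the k top users most similar to a new user.

    Item scores are the similarity-weighted sums of the neighbours' ratings.
    """
    neighbours = sorted(
        top_users,
        key=lambda user: cosine(new_profile, user["profile"]),
        reverse=True,
    )[:k]
    scores = {}
    for user in neighbours:
        weight = cosine(new_profile, user["profile"])
        for item, rating in user["ratings"].items():
            scores[item] = scores.get(item, 0.0) + weight * rating
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Example data (invented): profiles are [interest in Asian food, interest in fast food].
top_users = [
    {"profile": [1.0, 0.0], "ratings": {"sushi": 5, "pizza": 2}},
    {"profile": [0.9, 0.1], "ratings": {"sushi": 4, "ramen": 5}},
    {"profile": [0.0, 1.0], "ratings": {"burger": 5}},
]
suggestions = cold_start_recommend([1.0, 0.0], top_users, k=2, n=2)
```

The sketch also makes the limitation discussed above visible: if `top_users` is empty or none of the profiles resembles the new user, the function has nothing meaningful to return.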
5.3.4 Extension in Food Recommendations
In this dissertation, a food recommendation system is proposed that is based on the user’s
pathological reports and is not specific to any disease. As future research on the food
recommendation system, a breakdown of the recommended diets is desired for different
times of the day, such as breakfast, lunch, and dinner. As the values of nutrition in a
specific diet play an important role in monitoring and controlling any disease,
a breakdown of the nutrition is important. For instance, a diabetic patient needs only
a specific amount of sugar in every diet. Therefore, it is important to break down
the sugar value across the complete diet taken by the patient at breakfast, lunch, and
dinner. Moreover, group food recommendation for family/friends is another interesting
research area that can be explored. There may be more than one patient in a family
suffering from different diseases. In such a case, a system is desired that generates food
recommendations which satisfy the preferences of the different groups of patients in the same
family. The system must generate an optimal food list where diversity among food
items is minimum and patients in the family can take the same food to control different
diseases.
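The per-meal breakdown described above can be sketched as a split of a daily nutrient allowance across meals, followed by a check of a proposed meal plan. The meal shares, the 50 g daily figure in the example, and the food values are placeholders invented for illustration and are not dietary guidance or values from this dissertation.

```python
# Assumed share of the daily allowance per meal, in percent (hypothetical values).
MEAL_SHARES = {"breakfast": 25, "lunch": 35, "dinner": 40}


def per_meal_limits(daily_limit_g, shares=MEAL_SHARES):
    """Split a daily nutrient allowance (in grams) across meals by percentage share."""
    if sum(shares.values()) != 100:
        raise ValueError("meal shares must add up to 100 percent")
    return {meal: daily_limit_g * pct / 100 for meal, pct in shares.items()}


def meals_over_limit(plan, daily_limit_g, shares=MEAL_SHARES):
    """Return the meals whose total nutrient content exceeds their share.

    `plan` maps each meal name to a dictionary of food item -> grams of the
    nutrient contributed by that item; every meal name must appear in `shares`.
    """
    limits = per_meal_limits(daily_limit_g, shares)
    return [meal for meal, foods in plan.items() if sum(foods.values()) > limits[meal]]


# Example (invented numbers): sugar content in grams per food item.
plan = {
    "breakfast": {"oatmeal": 6.0, "orange juice": 9.0},
    "lunch": {"rice": 1.0, "lentils": 2.0, "yogurt": 7.0},
    "dinner": {"grilled fish": 0.0, "salad": 4.0},
}
flagged = meals_over_limit(plan, daily_limit_g=50)
```

In this example only breakfast is flagged, since its 15 g exceeds the 12.5 g breakfast share even though the daily total stays within the allowance. A system of the kind described would use such a check to adjust individual meals rather than the whole day.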
Chapter 6
References
1. Bobadilla, J., et al., Recommender systems survey. Knowledge-based systems,
2013. 46: p. 109-132.
2. Colombo-Mendoza, L.O., et al., RecomMetz: A context-aware knowledge-
based mobile recommender system for movie showtimes. Expert Systems with
Applications, 2015. 42(3): p. 1202-1222.
3. Adomavicius, G. and Y. Kwon, Improving aggregate recommendation diversity
using ranking-based techniques. IEEE Transactions on Knowledge and Data
Engineering, 2012. 24(5): p. 896-911.
4. Esparza, S.G., M.P. O’Mahony, and B. Smyth, Mining the real-time web: a
novel approach to product recommendation. Knowledge-Based Systems, 2012.
29: p. 3-11.
5. Jung, H. and K. Chung, Knowledge-based dietary nutrition recommendation for
obese management. Information Technology and Management, 2016. 17(1): p.
29-42.
6. Estrela, D., et al. A Recommendation System for Online Courses. in World
Conference on Information Systems and Technologies. 2017. Springer.
7. Ivarsson, J. and M. Lindgren, Movie recommendations using matrix
factorization. 2016.
8. Zhao, L., et al. Matrix factorization+ for movie recommendation. in
Proceedings of the 25th International Joint Conference on Artificial
Intelligence (IJCAI’16). 2016.
9. Hwang, T.-G., et al., An algorithm for movie classification and recommendation
using genre correlation. Multimedia Tools and Applications, 2016. 75(20): p.
12843-12858.
10. Aerts, G., T. Smits, and P. Verlegh, The Influence Of A Product Picture And A
Prior Review On Product Recommendations And Evaluations. 2017.
11. Pöyry, E., et al. Personalized Product Recommendations: Evidence from the
Field. in Proceedings of the 50th Hawaii International Conference on System
Sciences. 2017.
12. Balasubramanian, S., et al., Product recommendations based on analysis of
social experiences. 2016, Google Patents.
13. Eravci, B., et al. Location Recommendations for New Businesses Using Check-
in Data. in Data Mining Workshops (ICDMW), 2016 IEEE 16th International
Conference on. 2016. IEEE.
14. Pasanisi, F., L. Santarpia, and C. Finelli, Diet Recommendations, in Clinical
Management of Overweight and Obesity. 2016, Springer. p. 13-21.
15. Su, C.-J., Y.-A. Chen, and C.-W. Chih, Personalized ubiquitous diet plan
service based on ontology and web services. International Journal of
Information and Education Technology, 2013. 3(5): p. 522.
16. Wilson, D., An Evaluation of Leadership Competencies and Formal Leadership
Education Recommendations for Library Leaders of the 21st Century. 2016,
Wilmington University (Delaware).
17. Howard, T.C., T.-R. Douglas, and C.A. Warren, “What Works”:
Recommendations on Improving Academic Experiences and Outcomes for
Black Males. Teachers College Record, 2016. 118(6): p. n6.
18. Mehta, B., C. Sonntag, and I. Mahaniok, Context-influenced application
recommendations. 2016, Google Patents.
19. Evans, E.Z., D.A. Markley, and J.N. Adkins III, Application recommendations
based on application and lifestyle fingerprinting. 2016, Google Patents.
20. Gao, H., et al. Content-Aware Point of Interest Recommendation on Location-
Based Social Networks. in AAAI. 2015.
21. Ye, M., et al. Exploiting geographical influence for collaborative point-of-
interest recommendation. in Proceedings of the 34th international ACM SIGIR
conference on Research and development in Information Retrieval. 2011. ACM.
22. Snashall, E. and S. Hindocha, The Use of Smartphone Applications in Medical
Education. Open Medicine Journal, 2016. 3(1).
23. Chen, J., et al., The use of smartphone health apps and other mobile health
(mHealth) technologies in dietetic practice: a three country study. Journal of
Human Nutrition and Dietetics, 2017.
24. Tussyadiah, I.P. and D. Wang, Tourists’ attitudes toward proactive smartphone
systems. Journal of Travel Research, 2016. 55(4): p. 493-508.
25. Cao, H. and M. Lin, Mining smartphone data for app usage prediction and
recommendations: A survey. Pervasive and Mobile Computing, 2017.
26. Ying, J.J.-C., et al. Mining user similarity from semantic trajectories. in
Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Location
Based Social Networks. 2010. ACM.
27. Bao, J., et al., A survey on recommendations in location-based social networks.
ACM Transactions on Intelligent Systems and Technology, 2013.
28. Foursquare. [cited 2016 January 03]; Available from:
https://foursquare.com/about.
29. MovieLens. [cited 2016 February 14]; Available from:
https://grouplens.org/datasets/movielens/.
30. Taxi Trace Dataset. [cited 2016 January 18]; Available from:
http://crawdad.org/roma/taxi/20140717/.
31. Lü, L., et al., Recommender systems. Physics Reports, 2012. 519(1): p. 1-49.
32. Sarwat, M., et al., LARS*: a scalable and efficient location-aware recommender
system. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2013.
33. Ye, M., P. Yin, and W.-C. Lee. Location recommendation for location-based
social networks. in Proceedings of the 18th SIGSPATIAL international
conference on advances in geographic information systems. 2010. ACM.
34. Bao, J., Y. Zheng, and M.F. Mokbel. Location-based and preference-aware
recommendation using sparse geo-social networking data. in Proceedings of
the 20th international conference on advances in geographic information
systems. 2012. ACM.
35. De Mauro, A., et al. What is big data? A consensual definition and a review of
key research topics. in AIP conference proceedings. 2015. AIP.
36. Doytsher, Y., B. Galon, and Y. Kanza. Storing routes in socio-spatial networks
and supporting social-based route recommendation. in Proceedings of the 3rd
ACM SIGSPATIAL International Workshop on Location-Based Social
Networks. 2011. ACM.
37. Chang, K.-P., et al. Discovering personalized routes from trajectories. in
Proceedings of the 3rd ACM SIGSPATIAL International Workshop on
Location-Based Social Networks. 2011. ACM.
38. Noulas, A., et al., Exploiting Semantic Annotations for Clustering Geographic
Areas and Users in Location-based Social Networks. The Social Mobile Web,
2011. 11(2).
39. Batchinsky, A., L.C. Cancio, and J. Salinas, Patient care recommendation
system. 2017, Google Patents.
40. Molla, Y.B., et al., Geographic information system for improving maternal and
newborn health: recommendations for policy and programs. BMC Pregnancy
and Childbirth, 2017. 17(1): p. 26.
41. Wang, S.-L., et al., Design and evaluation of a cloud-based Mobile Health
Information Recommendation system on wireless sensor networks. Computers
& Electrical Engineering, 2016. 49: p. 221-235.
42. Bates, D.W., et al., Big data in health care: using analytics to identify and
manage high-risk and high-cost patients. Health Affairs, 2014. 33(7): p. 1123-
1131.
43. Kim, J., D. Lee, and K.-Y. Chung, Item recommendation based on context-
aware model for personalized u-healthcare service. Multimedia Tools and
Applications, 2014. 71(2): p. 855-872.
44. Oh, Y., A. Choi, and W. Woo, u-BabSang: a context-aware food
recommendation system. The Journal of Supercomputing, 2010. 54(1): p. 61-
81.
45. Joint WHO/FAO Expert Consultation, Diet, nutrition and the prevention of chronic
diseases. World Health Organization Technical Report Series, 2003. 916.
46. Ge, M., et al. Using tags and latent factors in a food recommender system. in
Proceedings of the 5th International Conference on Digital Health 2015. 2015.
ACM.
47. World Health Organization, Nutrition for health and development: a global agenda for
combating malnutrition. 2000.
48. World Health Organization, Childhood nutrition and progress in implementing the
international code of marketing of Breast-milk substitute. Report by the
Secretariat. A, 2002. 55.
49. UNICEF and World Health Organization, Global prevalence of vitamin A deficiency.
1995.
50. World Health Organization, Iron deficiency anaemia: assessment, prevention and
control: a guide for programme managers. 2001.
51. World Health Organization, Progress towards the elimination of iodine deficiency
disorders (IDD). 1999.
52. Phanich, M., P. Pholkul, and S. Phimoltares. Food recommendation system
using clustering analysis for diabetic patients. in Information Science and
Applications (ICISA), 2010 International Conference on. 2010. IEEE.
53. Freyne, J. and S. Berkovsky, Evaluating recommender systems for supportive
technologies, in User Modeling and Adaptation for Daily Routines. 2013,
Springer. p. 195-217.
54. Runo, M., FooDroid: A Food Recommendation App for University Canteens.
Unpublished semester thesis, Swiss Federal Institute of Technology, Zurich, 2011.
55. Evert, A.B., et al., Nutrition therapy recommendations for the management of
adults with diabetes. Diabetes care, 2013. 36(11): p. 3821-3842.
56. LeFevre, M.L., Behavioral Counseling to Promote a Healthful Diet and
Physical Activity for Cardiovascular Disease Prevention in Adults With
Cardiovascular Risk Factors: US Preventive Services Task Force
Recommendation Statement. Annals of Internal Medicine, 2014. 161(8): p.
587-593.
57. Teng, C.-Y., Y.-R. Lin, and L.A. Adamic. Recipe recommendation using
ingredient networks. in Proceedings of the 4th Annual ACM Web Science
Conference. 2012. ACM.
58. Khalid, O., et al., Omnisuggest: A ubiquitous cloud-based context-aware
recommendation system for mobile social networks. IEEE Transactions on
Services Computing, 2014. 7(3): p. 401-414.
59. Sharma, L. and A. Gera, A survey of recommendation system: Research
challenges. International Journal of Engineering Trends and Technology
(IJETT), 2013. 4(5): p. 1989-1992.
60. Vargas, S. and P. Castells. Rank and relevance in novelty and diversity metrics
for recommender systems. in Proceedings of the fifth ACM conference on
Recommender systems. 2011. ACM.
61. Liu, Q. Accurate and Diverse Recommendations via Integrated Communities of
Interest and Trustable Neighbors. in Management of e-Commerce and e-
Government (ICMeCG), 2014 International Conference on. 2014. IEEE.
62. Javari, A. and M. Jalili, A probabilistic model to resolve diversity–accuracy
challenge of recommendation systems. Knowledge and Information Systems,
2015. 44(3): p. 609-627.
63. Yin, H., et al., Challenging the long tail recommendation. Proceedings of the
VLDB Endowment, 2012. 5(9): p. 896-907.
64. Gong, S., Research on Attack on Collaborative Filtering Recommendation
Systems. Advances in Information Sciences and Service Sciences, 2013. 5(10):
p. 938.
65. Burke, R., M.P. O’Mahony, and N.J. Hurley, Robust collaborative
recommendation, in Recommender systems handbook. 2011, Springer. p. 805-
835.
66. Preoţiuc-Pietro, D. and T. Cohn. Mining user behaviours: a study of check-in
patterns in location based social networks. in Proceedings of the 5th Annual
ACM Web Science Conference. 2013. ACM.
67. Wang, H., M. Terrovitis, and N. Mamoulis. Location recommendation in
location-based social networks using user check-in data. in Proceedings of the
21st ACM SIGSPATIAL International Conference on Advances in Geographic
Information Systems. 2013. ACM.
68. Liu, L., et al., A real-time personalized route recommendation system for self-
drive tourists based on vehicle to vehicle communication. Expert Systems with
Applications, 2014. 41(7): p. 3409-3417.
69. Su, H., et al. Crowdplanner: A crowd-based route recommendation system. in
Data Engineering (ICDE), 2014 IEEE 30th International Conference on. 2014.
IEEE.
70. Hao, F., et al., An efficient approach to generating location-sensitive
recommendations in ad-hoc social network environments. IEEE Transactions
on Services Computing, 2015. 8(3): p. 520-533.
71. Ji, K. and H. Shen, Addressing cold-start: scalable recommendation with tags
and keywords. Knowledge-based systems, 2015. 83: p. 42-50.
72. Bellogín, A., P. Castells, and I. Cantador. Improving memory-based
collaborative filtering by neighbour selection based on user preference overlap.
in Proceedings of the 10th Conference on Open Research Areas in Information
Retrieval. 2013. LE CENTRE DE HAUTES ETUDES INTERNATIONALES
D'INFORMATIQUE DOCUMENTAIRE.
73. Kaleli, C., An entropy-based neighbor selection approach for collaborative
filtering. Knowledge-Based Systems, 2014. 56: p. 273-280.
74. Hu, Y., Q. Peng, and X. Hu. A time-aware and data sparsity tolerant approach
for web service recommendation. in Web Services (ICWS), 2014 IEEE
International Conference on. 2014. IEEE.
75. Guo, G., J. Zhang, and D. Thalmann, Merging trust in collaborative filtering to
alleviate data sparsity and cold start. Knowledge-Based Systems, 2014. 57: p.
57-68.
76. Yin, H., et al. Lcars: a location-content-aware recommender system. in
Proceedings of the 19th ACM SIGKDD international conference on Knowledge
discovery and data mining. 2013. ACM.
77. Statistics of Smart Phones. [cited 2016 October 21]; Available from:
https://www.statista.com/statistics/203734/global-smartphone-penetration-per-
capita-since-2005/.
78. Pew Research Center. [cited 2016 March 21]; Available from:
http://www.pewinternet.org/2015/04/01/us-smartphone-use-in-2015/.
79. Bobrow, J., Representation and understanding: Studies in cognitive science.
2014: Elsevier.
80. Lewis, D.D. Learning in intelligent information retrieval. in Machine Learning:
Proceedings of the Eighth International Workshop. 2014.
81. Pirasteh, P., D. Hwang, and J.J. Jung, Exploiting matrix factorization to
asymmetric user similarities in recommendation systems. Knowledge-Based
Systems, 2015. 83: p. 51-57.
82. Guy, I., Social recommender systems, in Recommender Systems Handbook.
2015, Springer. p. 511-543.
83. Felfernig, A., et al., Toward the next generation of recommender systems:
applications and research challenges, in Multimedia services in intelligent
environments. 2013, Springer. p. 81-98.
84. Konstan, J.A. and J. Riedl, Recommender systems: from algorithms to user
experience. User Modeling and User-Adapted Interaction, 2012. 22(1-2): p.
101-123.
85. Park, D.H., et al., A literature review and classification of recommender systems
research. Expert Systems with Applications, 2012. 39(11): p. 10059-10072.
86. Bostandjiev, S., J. O'Donovan, and T. Höllerer. TasteWeights: a visual
interactive hybrid recommender system. in Proceedings of the sixth ACM
conference on Recommender systems. 2012. ACM.
87. Bobadilla, J., et al., Improving collaborative filtering recommender system
results and performance using genetic algorithms. Knowledge-based systems,
2011. 24(8): p. 1310-1316.
88. Cacheda, F., et al., Comparison of collaborative filtering algorithms:
Limitations of current techniques and proposals for scalable, high-performance
recommender systems. ACM Transactions on the Web (TWEB), 2011. 5(1): p.
2.
89. Jannach, D., et al., Recommender systems: an introduction. 2010: Cambridge
University Press.
90. Cantador, I., P. Castells, and A. Bellogín, An enhanced semantic layer for
hybrid recommender systems: Application to news recommendation.
International Journal on Semantic Web and Information Systems (IJSWIS),
2011. 7(1): p. 44-78.
91. Rikitianskii, A., M. Harvey, and F. Crestani. A personalised recommendation
system for context-aware suggestions. in European Conference on Information
Retrieval. 2014. Springer.
92. Chiang, H.-S. and T.-C. Huang, User-adapted travel planning system for
personalized schedule recommendation. Information Fusion, 2015. 21: p. 3-17.
93. Kolomvatsos, K., C. Anagnostopoulos, and S. Hadjiefthymiades, An efficient
recommendation system based on the optimal stopping theory. Expert Systems
with Applications, 2014. 41(15): p. 6796-6806.
94. Vanetti, M., et al. Content-based filtering in on-line social networks. in
International Workshop on Privacy and Security Issues in Data Mining and
Machine Learning. 2010. Springer.
95. Lops, P., M. De Gemmis, and G. Semeraro, Content-based recommender
systems: State of the art and trends, in Recommender systems handbook. 2011,
Springer. p. 73-105.
96. Lu, Z., et al. Content-Based Collaborative Filtering for News Topic
Recommendation. in AAAI. 2015. Citeseer.
97. Guo, G., J. Zhang, and N. Yorke-Smith. A Novel Bayesian Similarity Measure
for Recommender Systems. in IJCAI. 2013.
98. Shi, Y., M. Larson, and A. Hanjalic, Collaborative filtering beyond the user-
item matrix: A survey of the state of the art and future challenges. ACM
Computing Surveys (CSUR), 2014. 47(1): p. 3.
99. Milicevic, A.K., A. Nanopoulos, and M. Ivanovic, Social tagging in
recommender systems: a survey of the state-of-the-art and possible extensions.
Artificial Intelligence Review, 2010. 33(3): p. 187-209.
100. Hurley, N. and M. Zhang, Novelty and diversity in top-n recommendation--
analysis and evaluation. ACM Transactions on Internet Technology (TOIT),
2011. 10(4): p. 14.
101. Wei, S., et al. Item-based collaborative filtering recommendation algorithm
combining item category with interestingness measure. in Computer Science &
Service System (CSSS), 2012 International Conference on. 2012. IEEE.
102. Desrosiers, C. and G. Karypis, A comprehensive survey of neighborhood-based
recommendation methods, in Recommender systems handbook. 2011, Springer.
p. 107-144.
103. Evangelopoulos, N., X. Zhang, and V.R. Prybutok, Latent semantic analysis:
five methodological recommendations. European Journal of Information
Systems, 2012. 21(1): p. 70-86.
104. Hoffman, M., F.R. Bach, and D.M. Blei. Online learning for latent dirichlet
allocation. in Advances in Neural Information Processing Systems. 2010.
105. Agarwal, D., et al., Content recommendation on web portals. Communications
of the ACM, 2013. 56(6): p. 92-101.
106. Forbes, P. and M. Zhu. Content-boosted matrix factorization for recommender
systems: experiments with recipe recommendation. in Proceedings of the fifth
ACM conference on Recommender systems. 2011. ACM.
107. Wei, J., et al., Collaborative filtering and deep learning based recommendation
system for cold start items. Expert Systems with Applications, 2017. 69: p. 29-
39.
108. Burke, R. and F. Eskandanian. Collaborative Recommendation of Informal
Learning Experiences. in Web Intelligence Workshops (WIW), IEEE/WIC/ACM
International Conference on. 2016. IEEE.
109. Zhao, T., J. McAuley, and I. King. Leveraging social connections to improve
personalized ranking for collaborative filtering. in Proceedings of the 23rd
ACM International Conference on Conference on Information and Knowledge
Management. 2014. ACM.
110. Balakrishnan, S. and S. Chopra. Collaborative ranking. in Proceedings of the
fifth ACM international conference on Web search and data mining. 2012.
ACM.
111. Kim, H.-N., et al., Collaborative filtering based on collaborative tagging for
enhancing the quality of recommendation. Electronic Commerce Research and
Applications, 2010. 9(1): p. 73-83.
112. Cremonesi, P., et al. Comparative evaluation of recommender system quality.
in CHI'11 Extended Abstracts on Human Factors in Computing Systems. 2011.
ACM.
113. Zarrinkalam, F. and M. Kahani. A multi-criteria hybrid citation
recommendation system based on linked data. in Computer and Knowledge
Engineering (ICCKE), 2012 2nd International eConference on. 2012. IEEE.
114. Chen, L. and P. Pu, Critiquing-based recommenders: survey and emerging
trends. User Modeling and User-Adapted Interaction, 2012. 22(1-2): p. 125-
150.
115. Parra, D., P. Brusilovsky, and C. Trattner. See what you want to see: visual user-
driven approach for hybrid recommendation. in Proceedings of the 19th
international conference on Intelligent User Interfaces. 2014. ACM.
116. Lampropoulos, A.S., P.S. Lampropoulou, and G.A. Tsihrintzis, A cascade-
hybrid music recommender system for mobile services based on musical genre
classification and personality diagnosis. Multimedia Tools and Applications,
2012. 59(1): p. 241-258.
117. Khribi, M.K., M. Jemni, and O. Nasraoui, Recommendation systems for
personalized technology-enhanced learning, in Ubiquitous learning
environments and technologies. 2015, Springer. p. 159-180.
118. Pu, P., L. Chen, and R. Hu, Evaluating recommender systems from the user’s
perspective: survey of the state of the art. User Modeling and User-Adapted
Interaction, 2012. 22(4-5): p. 317-355.
119. Shani, G. and A. Gunawardana, Evaluating recommendation systems, in
Recommender systems handbook. 2011, Springer. p. 257-297.
120. Bedi, P. and R. Sharma, Trust based recommender system using ant colony for
trust computation. Expert Systems with Applications, 2012. 39(1): p. 1183-
1190.
121. Avazpour, I., et al., Dimensions and metrics for evaluating recommendation
systems, in Recommendation systems in software engineering. 2014, Springer.
p. 245-273.
122. Rana, C. and S.K. Jain, A study of the dynamic features of recommender
systems. Artificial Intelligence Review, 2015. 43(1): p. 141-153.
123. Pu, P., L. Chen, and R. Hu. A user-centric evaluation framework for
recommender systems. in Proceedings of the fifth ACM conference on
Recommender systems. 2011. ACM.
124. Verbert, K., et al., Context-aware recommender systems for learning: a survey
and future challenges. IEEE Transactions on Learning Technologies, 2012.
5(4): p. 318-335.
125. Arora, G., et al., Movie recommendation system based on users’ similarity.
International Journal of Computer Science and Mobile
Computing, 2014. 3(4): p. 765-770.
126. Zheng, Y., et al., Recommending friends and locations based on individual
location history. ACM Transactions on the Web (TWEB), 2011. 5(1): p. 5.
127. Cechinel, C., et al., Evaluating collaborative filtering recommendations inside
large learning object repositories. Information Processing & Management,
2013. 49(1): p. 34-50.
128. Lyakhov, A.O., A.R. Oganov, and M. Valle, How to predict very large and
complex crystal structures. Computer Physics Communications, 2010. 181(9):
p. 1623-1632.
129. Chen, J.-H., K.-M. Chao, and N. Shah. Hybrid recommendation system for
tourism. in e-Business Engineering (ICEBE), 2013 IEEE 10th International
Conference on. 2013. IEEE.
130. Xia, P., L. Zhang, and F. Li, Learning similarity with cosine similarity
ensemble. Information Sciences, 2015. 307: p. 39-52.
131. Dakhel, G.M. and M. Mahdavi. A new collaborative filtering algorithm using
k-means clustering and neighbors' voting. in Hybrid Intelligent Systems (HIS),
2011 11th International Conference on. 2011. IEEE.
132. Golbandi, N., Y. Koren, and R. Lempel. Adaptive bootstrapping of
recommender systems using decision trees. in Proceedings of the fourth ACM
international conference on Web search and data mining. 2011. ACM.
133. Hsu, F.-M., Y.-T. Lin, and T.-K. Ho, Design and implementation of an
intelligent recommendation system for tourist attractions: The integration of
EBM model, Bayesian network and Google Maps. Expert Systems with
Applications, 2012. 39(3): p. 3257-3264.
134. Ergu, D., et al., The analytic hierarchy process: task scheduling and resource
allocation in cloud computing environment. The Journal of Supercomputing,
2013: p. 1-14.
135. Zuva, T., et al., A survey of recommender systems techniques challenges and
evaluation metrics. International Journal of Emerging Technology and
Advanced Engineering, 2012. 2(11): p. 382-386.
136. Jäschke, R., et al., Challenges in tag recommendations for collaborative tagging
systems, in Recommender systems for the social web. 2012, Springer. p. 65-87.
137. Baltrunas, L., T. Makcinskas, and F. Ricci. Group recommendations with rank
aggregation and collaborative filtering. in Proceedings of the fourth ACM
conference on Recommender systems. 2010. ACM.
138. Adomavicius, G. and J. Zhang, Stability of recommendation algorithms. ACM
Transactions on Information Systems (TOIS), 2012. 30(4): p. 23.
139. Hernando, A., et al., Incorporating reliability measurements into the predictions
of a recommender system. Information Sciences, 2013. 218: p. 1-16.
140. Shuja, J., et al., Survey of techniques and architectures for designing energy-
efficient data centers. IEEE Systems Journal, 2016. 10(2): p. 507-519.
141. Hashem, I.A.T., et al., The rise of “big data” on cloud computing: Review and
open research issues. Information Systems, 2015. 47: p. 98-115.
142. Verma, J.P., B. Patel, and A. Patel. Big data analysis: recommendation system
with Hadoop framework. in Computational Intelligence & Communication
Technology (CICT), 2015 IEEE International Conference on. 2015. IEEE.
143. Palanisamy, B., et al. Purlieus: locality-aware resource allocation for
MapReduce in a cloud. in Proceedings of 2011 International Conference for
High Performance Computing, Networking, Storage and Analysis. 2011. ACM.
144. Pakize, S.R. and A. Gandomi, Comparative study of classification algorithms
based on MapReduce model. International Journal of Innovative Research in
Advanced Engineering, ISSN, 2014: p. 2349-2163.
145. Chen, C.P. and C.-Y. Zhang, Data-intensive applications, challenges,
techniques and technologies: A survey on Big Data. Information Sciences,
2014. 275: p. 314-347.
146. Mell, P. and T. Grance, The NIST definition of cloud computing. 2011.
147. Zheng, V.W., et al., Towards mobile intelligence: Learning from GPS history
data for collaborative recommendation. Artificial Intelligence, 2012. 184: p.
17-37.
148. Symeonidis, P., D. Ntempos, and Y. Manolopoulos, Recommender systems for
location-based social networks. 2014: Springer.
149. Zheng, Y. Tutorial on location-based social networks. in Proceedings of the
21st international conference on World wide web, WWW. 2012. Citeseer.
150. Majid, A., et al., A context-aware personalized travel recommendation system
based on geotagged social media data mining. International Journal of
Geographical Information Science, 2013. 27(4): p. 662-684.
151. Chon, J. and H. Cha, Lifemap: A smartphone-based context provider for
location-based services. IEEE Pervasive Computing, 2011. 10(2): p. 58-67.
152. Xiao, X., et al., Inferring social ties between users with human location history.
Journal of Ambient Intelligence and Humanized Computing, 2014. 5(1): p. 3-
19.
153. DeScioli, P., et al., Best friends: Alliances, friend ranking, and the MySpace
social network. Perspectives on Psychological Science, 2011. 6(1): p. 6-8.
154. Levandoski, J.J., et al. Lars: A location-aware recommender system. in Data
Engineering (ICDE), 2012 IEEE 28th International Conference on. 2012.
IEEE.
155. Tang, K.P., et al. Rethinking location sharing: exploring the implications of
social-driven vs. purpose-driven location sharing. in Proceedings of the 12th
ACM international conference on Ubiquitous computing. 2010. ACM.
156. Noulas, A., et al. A random walk around the city: New venue recommendation
in location-based social networks. in Privacy, Security, Risk and Trust
(PASSAT), 2012 International Conference on and 2012 International
Conference on Social Computing (SocialCom). 2012. IEEE.
157. Zhang, W., J. Wang, and W. Feng. Combining latent factor model with location
features for event-based group recommendation. in Proceedings of the 19th
ACM SIGKDD international conference on Knowledge discovery and data
mining. 2013. ACM.
158. Ning, X., C. Desrosiers, and G. Karypis, A comprehensive survey of
neighborhood-based recommendation methods, in Recommender systems
handbook. 2015, Springer. p. 37-76.
159. Cao, X., G. Cong, and C.S. Jensen, Mining significant semantic locations from
GPS data. Proceedings of the VLDB Endowment, 2010. 3(1-2): p. 1009-1020.
160. Oh, J., O.-R. Jeong, and E. Lee, Collective Intelligence Based Place
Recommendation System, in Advanced Infocomm Technology. 2013, Springer.
p. 169-176.
161. Wei, L.-Y., Y. Zheng, and W.-C. Peng. Constructing popular routes from
uncertain trajectories. in Proceedings of the 18th ACM SIGKDD international
conference on Knowledge discovery and data mining. 2012. ACM.
162. Chow, C.-Y., J. Bao, and M.F. Mokbel. Towards location-based social
networking services. in Proceedings of the 2nd ACM SIGSPATIAL
International Workshop on Location Based Social Networks. 2010. ACM.
163. Yang, D., et al. A sentiment-enhanced personalized location recommendation
system. in Proceedings of the 24th ACM Conference on Hypertext and Social
Media. 2013. ACM.
164. Zheng, V.W., et al. Collaborative location and activity recommendations with
GPS history data. in Proceedings of the 19th international conference on World
wide web. 2010. ACM.
165. Berjani, B. and T. Strufe. A recommendation system for spots in location-based
online social networks. in Proceedings of the 4th Workshop on Social Network
Systems. 2011. ACM.
166. Cheng, C., et al. Fused Matrix Factorization with Geographical and Social
Influence in Location-Based Social Networks. in AAAI. 2012.
167. Slawski, M., M. Hein, and P. Lutsik. Matrix factorization with binary
components. in Advances in Neural Information Processing Systems. 2013.
168. Denstadli, J.M. and J.K.S. Jacobsen, The long and winding roads: Perceived
quality of scenic tourism routes. Tourism management, 2011. 32(4): p. 780-789.
169. Cheng, X., et al., Wideband channel modeling and intercarrier interference
cancellation for vehicle-to-vehicle communication systems. IEEE Journal on
Selected Areas in Communications, 2013. 31(9): p. 434-448.
170. Ren, G., T. Long, and W. Juebo, A novel recommender system based on fuzzy
set and rough set theory. Advances in Information Sciences and Service
Sciences, 2011. 3(4).
171. Lemke, A., Technique for Order Preference by Similarity to Ideal Solution.
2014.
172. Christensen, I.A. and S.N. Schiaffino, A Hybrid Approach for Group Profiling
in Recommender Systems. J. UCS, 2014. 20(4): p. 507-533.
173. Gowalla Dataset. [cited 2016 February 16]; Available from:
http://www.yongliu.org/datasets.
174. Liu, B., et al. Learning geographical preferences for point-of-interest
recommendation. in Proceedings of the 19th ACM SIGKDD international
conference on Knowledge discovery and data mining. 2013. ACM.
175. Rehman, F., O. Khalid, and S.A. Madani, A comparative study of location-
based recommendation systems. The Knowledge Engineering Review, 2017.
32.
176. Pandya, S., et al. A novel hybrid based recommendation system based on
clustering and association mining. in Sensing Technology (ICST), 2016 10th
International Conference on. 2016. IEEE.
177. Sawant, K.B., Efficient determination of clusters in k-mean algorithm using
neighborhood distance. Int. J. Emerg. Eng. Res. Technol, 2015. 3: p. 22-27.
178. Guha, S. and N. Mishra, Clustering data streams, in Data Stream Management.
2016, Springer. p. 169-187.
179. Song, G., et al. Solutions for processing k nearest neighbor joins for massive
data on mapreduce. in Parallel, Distributed and Network-Based Processing
(PDP), 2015 23rd Euromicro International Conference on. 2015. IEEE.
180. Kahloul, L., et al., Using high level Petri nets in the modelling, simulation and
verification of reconfigurable manufacturing systems. International Journal of
Software Engineering and Knowledge Engineering, 2014. 24(03): p. 419-443.
181. Barrett, C., A. Stump, and C. Tinelli, The satisfiability modulo theories library
(SMT-LIB) (2010). SMT-LIB.org, 2016. 156.
182. Murata, T., Petri nets: Properties, analysis and applications. Proceedings of the
IEEE, 1989. 77(4): p. 541-580.
183. Jensen, K. and G. Rozenberg, High-level Petri nets: theory and application.
2012: Springer Science & Business Media.
184. Gavgani, V.Z. Health information need and seeking behavior of patients in
developing countries' context; an Iranian experience. in Proceedings of the 1st
ACM International Health Informatics Symposium. 2010. ACM.
185. Synnot, A.J., et al., Online health information seeking: how people with multiple
sclerosis find, assess and integrate treatment information to manage their
health. Health Expectations, 2016. 19(3): p. 727-737.
186. Pew Research Center Survey on Health Information Using Smart Phone in
Pakistan. [cited 2017 May 30]; Available from:
http://www.pewglobal.org/2013/05/01/spring-2013-survey/.
187. Muscat, D.M., et al., Can adults with low literacy understand shared decision
making questions? A qualitative investigation. Patient Education and
Counseling, 2016. 99(11): p. 1796-1802.
188. Ziefle, M. and A.K. Schaar, Technology acceptance by patients: empowerment
and stigma. Handbook of Smart Homes, Health Care and Well-Being, 2017: p.
167-177.
189. Dahl, S., et al., Empowering or misleading? Online health information
provision challenges. Marketing Intelligence & Planning, 2016. 34(7): p. 1000-
1020.
190. Wiesner, M. and D. Pfeifer, Health recommender systems: concepts,
requirements, technical basics and challenges. International journal of
environmental research and public health, 2014. 11(3): p. 2580-2607.
191. Johnson, J.D., Health-related information seeking: Is it worth it? Information
Processing & Management, 2014. 50(5): p. 708-717.
192. Jing, F. An empirical study on the features influencing users' adoption towards
personal health records system. in Service Systems and Service Management
(ICSSSM), 2016 13th International Conference on. 2016. IEEE.
193. Johnson, J.D. and D.O. Case, Health information seeking. 2012: Peter Lang
New York, NY.
194. Powell, J., et al., The characteristics and motivations of online health
information seekers: cross-sectional survey and qualitative interview study.
Journal of Medical Internet Research, 2011. 13(1): p. e20.
195. Pan, B., The power of search engine ranking for tourist destinations. Tourism
Management, 2015. 47: p. 79-87.
196. Nursing Stories. [cited 2017 February 16]; Available from:
http://allnurses.com/general-nursing-discussion/medical-terms-patients-
301633.html.
197. Xiao, N., et al., Factors influencing online health information search: An
empirical analysis of a national cancer-related survey. Decision Support
Systems, 2014. 57: p. 417-427.
198. Crook, B., et al., Sharing health information and influencing behavioral
intentions: The role of health literacy, information overload, and the Internet in
the diffusion of healthy heart information. Health communication, 2016. 31(1):
p. 60-71.
199. Bosslet, G.T., et al., The patient–doctor relationship and online social
networks: Results of a national survey. Journal of general internal medicine,
2011. 26(10): p. 1168-1174.
200. Kadry, B., et al., Analysis of 4999 online physician ratings indicates that most
patients give physicians a favorable rating. Journal of medical Internet research,
2011. 13(4): p. e95.
201. Sherrington, C., et al., Exercise to prevent falls in older adults: an updated
meta-analysis and best practice recommendations. New South Wales public
health bulletin, 2011. 22(4): p. 78-83.
202. Barnes, P.M. and C.A. Schoenborn, Trends in adults receiving a
recommendation for exercise or other physical activity from a physician or
other health professional. 2012: US Department of Health and Human Services,
Centers for Disease Control and Prevention, National Center for Health
Statistics.
203. Carek, P.J., S.E. Laibstain, and S.M. Carek, Exercise for the treatment of
depression and anxiety. The International Journal of Psychiatry in Medicine,
2011. 41(1): p. 15-28.
204. Farquhar, W.B., et al., Dietary Sodium and Health. Journal of the American
College of Cardiology, 2015. 65(10): p. 1042-1050.
205. Fox, S. and M. Duggan, Health online 2013. Washington, DC: Pew Internet &
American Life Project, 2013.
206. Horton, R., Offline: Clinical leadership improves health outcomes. The Lancet,
2013. 382(9896): p. 925.
207. Offline Doctor. [cited 2017 February 23]; Available from:
https://play.google.com/store/apps/details?id=appinventor.ai_wmfrhob.Doktor
_Offline&hl=en.
208. De Pessemier, T., S. Dooms, and L. Martens. A food recommender for patients
in a care facility. in Proceedings of the 7th ACM conference on Recommender
systems. 2013. ACM.
209. Freyne, J. and S. Berkovsky. Intelligent food planning: personalized recipe
recommendation. in Proceedings of the 15th international conference on
Intelligent user interfaces. 2010. ACM.
210. Kashima, T., S. Matsumoto, and H. Ishii, Decision support system for menu
recommendation using rough sets. 2011.
211. Wang, T.J., et al., Vitamin D deficiency and risk of cardiovascular disease.
Circulation, 2008. 117(4): p. 503-511.
212. Tao, Z., A. Shi, and J. Zhao, Epidemiological perspectives of diabetes. Cell
biochemistry and biophysics, 2015. 73(1): p. 181-185.
213. Melanson, E., The effect of exercise on non‐exercise physical activity and
sedentary behavior in adults. Obesity Reviews, 2017. 18(S1): p. 40-49.
214. Faasse, K. and K.J. Petrie, The nocebo effect: patient expectations and
medication side effects. Postgraduate medical journal, 2013: p. postgradmedj-
2012-131730.
215. Pawlowska, M., J. Kapeluto, and D. Kendler, A case report of osteomalacia
unmasking primary biliary cirrhosis. Osteoporosis International, 2015. 26(7):
p. 2035-2038.
216. Huang, L.-C., X. Wu, and J.Y. Chen, Predicting adverse side effects of drugs.
BMC genomics, 2011. 12(5): p. S11.
217. Ng, S.W. and B.M. Popkin, Time use and physical activity: a shift away from
movement across the globe. Obesity Reviews, 2012. 13(8): p. 659-680.
218. Powell, L.M. and B.T. Nguyen, Fast-food and full-service restaurant
consumption among children and adolescents: effect on energy, beverage, and
nutrient intake. JAMA pediatrics, 2013. 167(1): p. 14-20.
219. Ellwood, P., et al., Do fast foods cause asthma, rhinoconjunctivitis and eczema?
Global findings from the International Study of Asthma and Allergies in
Childhood (ISAAC) Phase Three. Thorax, 2013: p. thoraxjnl-2012-202285.
220. Sánchez-Villegas, A., et al., Fast-food and commercial baked goods
consumption and the risk of depression. Public health nutrition, 2012. 15(03):
p. 424-432.
221. Lustig, R.H., L.A. Schmidt, and C.D. Brindis, Public health: The toxic truth
about sugar. Nature, 2012. 482(7383): p. 27-29.
222. Sheiham, A. and W.P.T. James, A new understanding of the relationship
between sugars, dental caries and fluoride use: implications for limits on sugars
consumption. Public health nutrition, 2014. 17(10): p. 2176-2184.
223. Klug, E.Q., et al., South African Dyslipidaemia Guideline Consensus Statement:
A joint statement from the South African Heart Association (SA Heart) and the
Lipid and Atherosclerosis Society of Southern Africa (LASSA). South African
Family Practice, 2015. 57(2): p. 22-31.
224. Rittinghouse, J.W. and J.F. Ransome, Cloud computing: implementation,
management, and security. 2016: CRC press.
225. Shaukat Khanum Laboratory. [cited 2016 16 December]; Available from:
https://shaukatkhanum.org.pk/.
226. Aga Khan Laboratory. [cited 2016 16 December]; Available from:
https://www.aku.edu/labreports/Pages/default.aspx.
227. Healthways Laboratory. [cited 2016 16 December]; Available from:
http://www.healthwayslabs.net/.
228. COFID Dataset. [cited 2016 03 September]; Available from:
https://www.gov.uk/government/publications/composition-of-foods-
integrated-dataset-cofid.
229. Geleijnse, G., et al. A personalized recipe advice system to promote healthful
choices. in Proceedings of the 16th international conference on Intelligent user
interfaces. 2011. ACM.
230. Svensson, M., K. Höök, and R. Cöster, Designing and evaluating kalas: A social
navigation system for food recipes. ACM Transactions on Computer-Human
Interaction (TOCHI), 2005. 12(3): p. 374-400.
231. Nutrino. [cited 2016 10 October]; Available from: https://nutrino.co/.
232. Ueda, M., et al. Recipe recommendation method by considering the user’s
preference and ingredient quantity of target recipe. in Proceedings of the
International MultiConference of Engineers and Computer Scientists. 2014.
233. van Pinxteren, Y., G. Geleijnse, and P. Kamsteeg. Deriving a recipe similarity
measure for recommending healthful meals. in Proceedings of the 16th
international conference on Intelligent user interfaces. 2011. ACM.
234. Yang, L., et al., Yum-me: Personalized Healthy Meal Recommender System.
arXiv preprint arXiv:1605.07722, 2016.
235. ShopWell. [cited 2016 10 October]; Available from:
http://www.shopwell.com/.
236. Yummly. [cited 2016 10 October]; Available from:
http://developer.yummly.com.
237. Kennedy, J., Particle swarm optimization, in Encyclopedia of machine learning.
2011, Springer. p. 760-766.
238. Salehi, M., M. Pourzaferani, and S.A. Razavi, Hybrid attribute-based
recommender system for learning material using genetic algorithm and a
multidimensional information model. Egyptian Informatics Journal, 2013.
14(1): p. 67-78.
239. Delévacq, A., et al., Parallel ant colony optimization on graphics processing
units. Journal of Parallel and Distributed Computing, 2013. 73(1): p. 52-61.
240. Kashef, S. and H. Nezamabadi-pour, An advanced ACO algorithm for feature
subset selection. Neurocomputing, 2015. 147: p. 271-279.
241. Hariri, N., B. Mobasher, and R. Burke. Context-aware music recommendation
based on latent topic sequential patterns. in Proceedings of the sixth ACM
conference on Recommender systems. 2012. ACM.
242. Wang, Y.-X. and Y.-J. Zhang, Nonnegative matrix factorization: A
comprehensive review. IEEE Transactions on Knowledge and Data
Engineering, 2013. 25(6): p. 1336-1353.
243. Pan, W., et al., Adaptive bayesian personalized ranking for heterogeneous
implicit feedbacks. Knowledge-Based Systems, 2015. 73: p. 173-180.
244. Amazon Food Dataset. [cited 2017 January 13]; Available from:
http://jmcauley.ucsd.edu/data/amazon/.
245. Rendle, S., et al. BPR: Bayesian personalized ranking from implicit feedback.
in Proceedings of the twenty-fifth conference on uncertainty in artificial
intelligence. 2009. AUAI Press.
246. Yang, X., Y. Guo, and Y. Liu, Bayesian-inference-based recommendation in
online social networks. IEEE Transactions on Parallel and Distributed Systems,
2013. 24(4): p. 642-651.
247. Pearl, J., Bayesian networks. Department of Statistics, UCLA, 2011.
248. Koren, Y., R. Bell, and C. Volinsky, Matrix factorization techniques for
recommender systems. Computer, 2009. 42(8).
249. Shi, Y., M. Larson, and A. Hanjalic. List-wise learning to rank with matrix
factorization for collaborative filtering. in Proceedings of the fourth ACM
conference on Recommender systems. 2010. ACM.
250. Snoek, J., et al. Scalable Bayesian Optimization Using Deep Neural Networks.
in ICML. 2015.
251. El-Dosuky, M., et al. Food recommendation using ontology and heuristics. in
International Conference on Advanced Machine Learning Technologies and
Applications. 2012. Springer.
252. Matlab Parallel Cloud. [cited 2017 10 January]; Available from:
http://www.mathworks.com/products/parallel-computing/matlab-parallel-
cloud/.
253. Big Data. [cited 2017 January 16]; Available from:
https://www.entrepreneur.com/article/233344.
254. Hoens, T.R., et al., Reliable medical recommendation systems with patient
privacy. ACM Transactions on Intelligent Systems and Technology (TIST),
2013. 4(4): p. 67.
255. Middleton, B., et al., Enhancing patient safety and quality of care by improving
the usability of electronic health record systems: recommendations from AMIA.
Journal of the American Medical Informatics Association, 2013. 20(e1): p. e2-
e8.
256. Abbas, A., et al., A cloud based health insurance plan recommendation system:
A user centered approach. Future Generation Computer Systems, 2015. 43: p.
99-109.
257. Zhang, D., et al., A carpooling recommendation system for taxicab services.
IEEE Transactions on Emerging Topics in Computing, 2014. 2(3): p. 254-266.
258. Le, Q.T. and D. Pishva. An innovative tour recommendation system for tourists
in Japan. in Advanced Communication Technology (ICACT), 2016 18th
International Conference on. 2016. IEEE.
259. Masthoff, J., Group recommender systems: Combining individual models, in
Recommender systems handbook. 2011, Springer. p. 677-702.
260. Peng, F., et al., N-dimensional Markov random field prior for cold-start
recommendation. Neurocomputing, 2016. 191: p. 187-199.