Welcome to Pakistan Research Repository: Home



Dedicated to my Parents, Family,

and

All my respected Teachers and Advisors


TABLE OF CONTENTS

Chapter 1 Introduction
1.1 Recommendation Systems
1.2 Location Based Recommendation Systems
1.3 Food Recommendation Systems
1.4 Challenges
1.4.1 Scalability
1.4.2 Cold Start
1.4.3 Data Sparseness
1.4.4 Over Specialization Problem
1.4.5 Recommendation of Popular Objects
1.4.6 Attacks on Recommendations
1.5 Scope of Research
1.6 Motivation
1.7 Contributions
1.8 Organization of Dissertation

Chapter 2 Overview of Recommendation Systems
2.1 Overview
2.1.1 Content-based Recommendation Methods
2.1.2 Collaborative Filtering based Recommendation Method
2.1.3 Hybrid Recommendation Methods
2.2 Criteria for Recommendations
2.2.1 Accuracy
2.2.2 Familiarity
2.2.3 Novelty
2.2.4 Diversity
2.2.5 Context Compatibility
2.2.6 Justification of Recommendations
2.2.7 Sufficiency of Information
2.3 Similarity Calculations in Recommendation Systems
2.3.1 Cosine Based Similarity
2.3.2 Correlation Based Similarity
2.3.3 Adjusted Cosine Similarity
2.4 Evaluation of Recommendations
2.4.1 Prediction Metrics
2.4.2 Quality of the Set of Recommendations
2.4.3 Quality of the List of Recommendations
2.4.4 Novelty and Diversity
2.4.5 Stability
2.4.6 Reliability
2.5 Cloud Computing in Recommendation Systems
2.6 Summary

Chapter 3 Real-Time Venue Based Recommendation System
3.1 Location Based Recommender Systems
3.1.1 Geo Tagged Media Based
3.1.2 Point Location Based
3.1.3 Trajectory Based
3.2 Distinguishing Features of Locations
3.2.1 Location Hierarchy
3.2.2 Distance of Locations and Users
3.2.3 Sequential Ordering
3.3 Motivation
3.4 Related Work
3.4.1 Matrix Factorization Techniques
3.4.2 Explicit Rating Techniques
3.4.3 Implicit Rating Techniques
3.4.4 Route Recommendation Techniques
3.4.5 Locations Recommendation Techniques
3.4.6 Group Recommendation Techniques
3.5 Quantitative Analysis
3.5.1 Datasets
3.5.2 Techniques
3.5.3 Experiments
3.5.4 Observations
3.6 Real-Time Venue Recommendation Model
3.6.1 System Architecture
3.6.2 Proposed Algorithm
3.6.3 Complexity of Clustering Algorithm
3.6.4 Complexity of Ranking Algorithm
3.7 Formal Verification
3.7.1 High Level Petri Nets
3.7.2 SMT-Lib and Z3 Solver
3.7.3 Modeling and Analysis of Proposed Algorithm
3.7.4 Verification Property
3.8 Experimental Setup and Results
3.8.1 Experimental Setup
3.8.2 Results
3.9 Summary

Chapter 4 A Smart Food Recommendation System
4.1 Health Recommendation Systems
4.1.1 Significance
4.2 Motivation
4.3 Related Work
4.4 Proposed Model
4.4.1 Diet-Right Architecture
4.4.2 Proposed Algorithm
4.4.3 Ranked List Generation
4.4.4 Complexity of Proposed Algorithm
4.5 Formal Verification
4.5.1 Modeling and Analysis of Proposed Algorithm
4.5.2 Verification Property
4.6 Experimental Setup and Results
4.6.1 Experimental Setup
4.6.2 Results
4.7 Summary

Chapter 5 Conclusion and Future Work
5.1 Conclusions
5.1.1 Real-Time Venue Recommendations
5.1.2 Food Recommendation
5.2 Opportunities in Recommendation Systems
5.2.1 Healthcare
5.2.2 Transportation
5.2.3 Tourism
5.2.4 Education
5.3 Future Directions
5.3.1 Real-World Factors and Group Recommendation
5.3.2 Balance in Accuracy and Diversity of Recommendations
5.3.3 Cold Start Problem
5.3.4 Extension in Food Recommendations

Chapter 6 References


LIST OF FIGURES

Figure 1.1 Overview of Problems Tackled and Contributions
Figure 2.1 Components of Recommendation Systems
Figure 2.2 Hierarchy of Recommendation Systems
Figure 2.3 Cloud Computing
Figure 3.1 Services offered by LBRS
Figure 3.2 Visit patterns of users in a LBRS
Figure 3.3 Categorization of Techniques used in LBRS
Figure 3.4 Concept of Matrix Factorization
Figure 3.5 Basic Concept of Implicit Rating using Check-Ins
Figure 3.6 Overview of Location Recommendations
Figure 3.7 Precision of Gowalla Dataset
Figure 3.8 Recall of Gowalla Dataset
Figure 3.9 F-Measure of Gowalla Dataset
Figure 3.10 Precision of Foursquare Dataset
Figure 3.11 Recall of Foursquare Dataset
Figure 3.12 F-Measure of Foursquare Dataset
Figure 3.13 Precision of MovieLens Dataset
Figure 3.14 Recall of MovieLens Dataset
Figure 3.15 F-Measure of MovieLens Dataset
Figure 3.16 System diagram for RTVR Model
Figure 3.17 User Placement in the Relevant Cluster
Figure 3.18 System Architecture
Figure 3.19 HLPN Model for the Proposed RTVR Algorithms
Figure 3.20 Precision with Clustering
Figure 3.21 Recall with Clustering
Figure 3.22 F-Measure with Clustering
Figure 3.23 Time Comparison with and without Clustering
Figure 3.24 Precision without Clustering
Figure 3.25 Recall without Clustering
Figure 3.26 F-Measure without Clustering
Figure 3.27 Difference between Precision Values
Figure 3.28 Difference between Recall Values
Figure 3.29 Difference between F-Measure Values
Figure 3.30 Scalability over 5 Recommendations
Figure 3.31 Scalability over 10 Recommendations
Figure 3.32 Scalability over 15 Recommendations
Figure 3.33 Scalability over 20 Recommendations
Figure 3.34 Scalability Comparison w.r.t. Precision
Figure 3.35 Scalability Comparison w.r.t. Recall
Figure 3.36 Scalability Comparison w.r.t. F-Measure
Figure 3.37 Number of Selected Clusters vs Precision
Figure 3.38 Number of Selected Clusters vs Recall
Figure 3.39 Time Comparison between Single Node and Cloud
Figure 4.1 Hierarchy of Health Recommendation System
Figure 4.2 Architecture of Diet-Right
Figure 4.3 Graph Representation of the Problem
Figure 4.4 HLPN Model for the Proposed Diet-Right Algorithms
Figure 4.5 Comparison on Accuracy
Figure 4.6 Comparison on Time Complexity
Figure 4.7 Comparison on Average RMSE
Figure 4.8 Tradeoff between Number of Ants and Time Complexity
Figure 4.9 Tradeoff between Number of Ants and Average RMSE
Figure 4.10 Cost over Varying No. of Ants and Iterations
Figure 4.11 Convergence Time of Different Diseases
Figure 4.12 Cost Comparisons of Diseases
Figure 4.13 Accuracy of Recommendations
Figure 4.14 Precision @ K=10
Figure 4.15 Recall @ K=10
Figure 4.16 F-Measure @ K=10
Figure 4.17 Precision @ K=20
Figure 4.18 Recall @ K=20
Figure 4.19 F-Measure @ K=20
Figure 4.20 Average RMSE @ K=10
Figure 4.21 Average RMSE @ K=20
Figure 4.22 Time Comparison @ K=10
Figure 4.23 Time Comparison @ K=20
Figure 4.24 Precision
Figure 4.25 Recall
Figure 4.26 F-Measure
Figure 4.27 Convergence Time of Single Node and Cloud


LIST OF TABLES

Table 2.1 Comparison of Content Based and CF Based Techniques
Table 2.2 Description of combinations used in Hybrid Technique
Table 2.3 Criteria for recommendations in different research
Table 2.4 User-location Matrix
Table 2.5 Summary of Evaluation Metrics
Table 3.1 Example of Explicit Rating
Table 3.2 Strength and Weaknesses of the Selected Techniques
Table 3.3 Summary of Techniques
Table 3.4 Data type for HLPN Model
Table 3.5 Places and mapping used in HLPN model
Table 3.6 Experimental Setup
Table 4.1 Data Types for HLPN Model
Table 4.2 Places and mapping used in HLPN model
Table 4.3 Experimental Setup


LIST OF ABBREVIATIONS

ACO Ant Colony Optimization

CF Collaborative Filtering

COFID Composition of Foods Integrated Dataset

COR Pearson's Correlation

CORC Pearson's Correlation Constrained

CSF Cerebrospinal Fluid

DCG Discounted Cumulative Gain

FRS Food Recommendation System

GA Genetic Algorithms

GPS Global Positioning System

JMSD Jaccard plus Mean Squared Difference

KNN K-Nearest Neighbor

LARS Location Aware Recommendation System

LBRS Location Based Recommendation System

LCF Location-based Collaborative Filtering

MAE Mean Absolute Error

MAS Mean Absolute Shift

MPC Most-Preferred-Category-Based

MSD Mean Squared Difference

MSE Mean Square Error

NMAE Normalized Mean Absolute Error

PCF Preference-based Collaborative Filtering

POI Point of Interest

RBC Red Blood Cells

RMSE Root of Mean Square Error

ROC Receiver Operating Characteristics

RS Recommendation System

RTVR Real-Time Venue Recommendation

SR Spearman's Rank-Order Correlation

SVD Singular Value Decomposition

TF-IDF Term Frequency-Inverse Document Frequency

TOPSIS Technique for Order Preference by Similarity to Ideal Solution

UBCF User Based Collaborative Filtering

V2VCS Vehicle-to-Vehicle Communication System

VRS Venue Recommendation System

WBC White Blood Cells


Chapter 1

Introduction


1.1 Recommendation Systems

Recommendation systems were introduced in the early 1990s to address the challenges of personalized and automatic data retrieval from diverse information sources [1]. The basic principle of a recommendation system is to compute a list of items for a user by considering the user's preferences, likes and dislikes, ratings, and other implicit or explicit user-to-user and user-to-item relationships. Various knowledge discovery techniques [2-5] are applied to users' contextual and historical data to extract the information, services, and products that have a significant effect on users' preferences. Once extraction is complete, the recommendation model filters the extracted data and attempts to predict the items the user is likely to prefer next. However, a recommendation system must also consider factors such as stability, accuracy, diversity, and novelty to balance the user's preferences in the recommendations [1].

With the evolution of smartphones and online social networking applications, the volume of user-generated online content and feedback is continuously increasing. This content may reflect a user's bias towards certain products, services, food, venues, and so on. Building on user-generated content, popular e-commerce and online service providers, such as Amazon, Facebook, Flickr, and Netflix, have integrated recommendation systems to provide personalized recommendations to their users. For instance, Amazon generates recommendations using a collaborative filtering model and clustering algorithms [6]. Amazon applies collaborative filtering on an item-to-item and user-to-user basis, on the premise that a user U who buys an item A may also be interested in buying an item B if other users similar to U follow a similar pattern of purchasing items A and B. In that case, the two items are treated as highly similar. In the same manner, Amazon applies clustering and classification algorithms to the user-user matrix to identify sets of similar users.
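The item-to-item idea described above can be illustrated with a minimal sketch. It is not Amazon's production algorithm and is not taken from [6]; it assumes a toy purchase history, binary purchase vectors, and cosine similarity over co-purchase counts. The data and the names `purchases`, `item_similarities`, and `recommend` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical toy purchase history: user -> set of purchased items.
purchases = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"A", "C"},
    "u4": {"B"},
}

def item_similarities(purchases):
    """Cosine similarity between items, treating each item as a
    binary vector over the users who purchased it."""
    item_count = defaultdict(int)   # number of buyers per item
    pair_count = defaultdict(int)   # number of users who bought both items
    for items in purchases.values():
        for item in items:
            item_count[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] += 1

    sims = defaultdict(dict)
    for (a, b), together in pair_count.items():
        score = together / sqrt(item_count[a] * item_count[b])
        sims[a][b] = score
        sims[b][a] = score
    return sims

def recommend(user, purchases, top_n=3):
    """Rank items the user has not bought by their summed similarity
    to the items the user has already bought."""
    sims = item_similarities(purchases)
    owned = purchases[user]
    scores = defaultdict(float)
    for item in owned:
        for other, score in sims[item].items():
            if other not in owned:
                scores[other] += score
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]

print(recommend("u4", purchases))
```

In this toy data, a user who bought only item B is offered the items most often bought together with B. A real system would also need to handle the scalability, sparsity, and cold start issues that a sketch of this size ignores.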

Recommendation systems have been applied in numerous application areas

including: movie recommendations [7-9], product recommendations [10-12], location

(venue) recommendations [12, 13], diet recommendations [14, 15], academic

recommendations [16, 17], application recommendations [18, 19], and point of interest

(POI) recommendations [20, 21]. However, the basic models and approaches usually

remain the same, regardless of the application areas in which recommendation

Page 22: Welcome to Pakistan Research Repository: Home

3

algorithms are applied. In this dissertation, to have a more focused approach, research

is narrowed down to the following application areas: (a) venue recommendations and

(b) food recommendations. The rapid increase of smartphone users has shifted the focus

of recent research towards the aforementioned areas in order to offer users food and

venue recommendations ubiquitously [22-25].

In the discussion to follow, a brief overview of the two application areas of

recommendation systems that are considered in our work: (a) location based

recommendation systems, and (b) food recommendation systems, is presented.

Moreover, this chapter discusses the challenges recommendation systems face, the

motivation behind our work in addressing those challenges, and our contributions.

Finally, the organization of the thesis is presented.

1.2 Location Based Recommendation Systems

In recent years, numerous social networking applications for location-acquisition and

wireless communications were developed for smart phones and mobile devices. The

most popular among those are Facebook, Foursquare, and Instagram. In a user’s context,

location can be considered one of the most important pieces of information. By using the

location history of a user, one can easily extract extensive knowledge about the

preferences and behavior of that particular user [26]. Use of location-based content

helps in bridging the gap between the online social networking services and the physical

world [27]. Another advantage of geo-tagged content is the modeling of new relations

among users, locations, and both. Presently, availability of huge volumes of users’

geospatial data, for instance, Foursquare datasets [28], movies datasets [29], and taxi

trace datasets [30], has motivated the research community to focus efforts on the design

of location-based recommendation systems with models based on the data extracted by

mobile social networking applications [1, 31-33]. Such systems perform venue

recommendations based on users’ location and preferences. For instance, a tourist

entering a city may need to know about popular locations of interest, such as hotels,

restaurants, shopping malls, and other popular POIs. In this scenario, the system

generates recommendations according to the preferences of the users.

The evolving mobile social networks now allow users to provide feedback (or

a tip), or rate the venue they visit (or perform a check-in at a venue) [3, 26, 34]. Moreover,

huge volumes of data such as videos, music, images, and text are collected by mobile

social networks. These huge volumes of collected data are also referred to as Big Data [35].

Big Data introduced new challenges for traditional recommender systems that were

initially designed without giving much consideration to scalability factors. Moreover,

modern recommender systems need to process big data while facing challenges

such as (a) data sparseness, which arises when a limited number of ratings per item results in a sparse

user-to-item matrix [31], (b) cold start, which occurs when the system generates

recommendations for a user who is new to the system and has insufficient historic data [31,

36], and (c) scalability, which refers to the ability of a recommendation system to maintain

good performance under an increased load [36-38]. The aforementioned challenges, if

not tackled adequately, may degrade the recommendation quality [1, 31].

The core objective of our study is to develop a recommendation model that not

only addresses the scalability issues of existing recommendation systems, but also utilizes

optimization techniques to address multiple conflicting objectives in producing an optimal

list of recommendations. To address scalability challenges, this research leverages cloud

computing to perform real-time processing of large-scale data. The model is based on the

Hadoop MapReduce framework to enable parallel computation on multiple nodes.

Moreover, the data sparsity problem is handled by pre-processing the data to filter out

insignificant/redundant data before online recommendation. The pre-processing phase

successfully eliminated data sparsity by reducing the dataset. To handle the cold

start problem, our model maintains a pre-computed ranked list of popular users and

venues. Using this list, our model compares the preferences of the new user with those of the

existing popular users, i.e., users with top rankings. The model then suggests

recommendations according to the top-ranked user whose preferences best match those of the

new user.
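
The cold start strategy described above can be illustrated with a short sketch. This is a simplified illustration under stated assumptions, not the model implemented in this dissertation: preferences are represented as sets of category labels, Jaccard similarity is used for matching, and the names `jaccard` and `cold_start_recommend` as well as the sample data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two preference sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cold_start_recommend(new_prefs, popular_users, top_n=3):
    """Return the best-matching popular user and that user's top venues.

    popular_users is a list of (user_id, preferences, ranked_venues) tuples,
    assumed to be pre-computed and already ordered by popularity rank.
    max() keeps the first maximum, so ties go to the higher-ranked user.
    """
    best = max(popular_users, key=lambda u: jaccard(new_prefs, u[1]))
    return best[0], best[2][:top_n]


# Hypothetical pre-computed list of popular users.
popular = [
    ("p1", {"cafe", "museum"}, ["v1", "v2"]),
    ("p2", {"cafe", "park", "gym"}, ["v3", "v4", "v5", "v6"]),
]
print(cold_start_recommend({"park", "gym"}, popular))
```

Because the ranked list is computed offline, the online step reduces to one similarity comparison per popular user, which is what makes the approach usable for users without any check-in history.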

1.3 Food Recommendation Systems

In the last few years, recommendation systems have been applied in many areas of

the health sector. Numerous recommendation models have been proposed to target specific

health applications [39-43]. One important emerging area of health recommendations is

food/diet recommendation. Due to the lack of concise information about a healthy diet,

people mostly have to rely on medications instead of adopting a preventive approach,

such as eating the required food items. The selection of a proper diet is critical for patients suffering

from various diseases. eHealth initiatives and research efforts aim to offer various

pervasive applications for novice end users to improve their health [44]. Several studies

indicate that inappropriate and inadequate intake of daily diet is the major reason for

various health issues and diseases [5, 45, 46]. For instance, a study conducted by the World

Health Organization (WHO) estimates that around 30% of the total population of the

world is suffering from various diseases, and that 60% of deaths each year in children are related

to malnutrition [47, 48]. Another study by the WHO reports that inadequate and

imbalanced intake of food causes around 9% of heart attack deaths, about 11% of

ischemic heart disease deaths, and 14% of gastrointestinal cancer deaths worldwide

[45]. Moreover, around 0.25 billion children are suffering from Vitamin-A deficiency

[49], 0.2 billion people are suffering from iron deficiency anemia [50], and 0.7 billion

people are suffering from iodine deficiency [51]. Generally, a person remains unaware

of major causes behind deficiency or excess of various vital substances, such as

calcium, proteins, and vitamins, and how to normalize such substances through a

balanced diet.

Several works in the literature [15, 46, 52-57] proposed recommendation systems

related to food. These systems can be categorized as: (a) food recommendation systems

[46, 52], (b) menu recommendations [54], (c) diet plan recommendations [15], (d)

health recommendations for different diseases such as diabetes and cardiovascular disease [55,

56], and (e) recipe recommendations [53, 57]. The aforementioned systems either provide

recommendations for some specific disease or balance the diet without

considering information about any disease or nutritional deficiency in the users. For

instance, in [52], a food recommendation system is proposed for diabetic patients. The

system recommends food for diabetic patients without considering the diabetes level

that may fluctuate frequently. Similarly, the authors in [46] do not consider the nutrition

factors that have significant importance for a balanced diet recommendation.

Keeping in view the aforesaid facts, it is of paramount importance to maintain

a balanced intake of food, particularly for people suffering from a deficiency or excess of

certain vital ingredients. However, it is quite challenging for a common person to keep

track of personal food requirements because of the massive diversity of dietary

components in different food items. A systematic food recommendation system is

required to recommend the appropriate food considering the disease of the person. The

major challenge in designing such a system is the handling of large volumes of data

in terms of ingredients, quantity, nutritional facts, user preferences, and simultaneously

taking into consideration a person’s pathological reports. The system must be scalable

enough to handle recommendation queries from all over the globe. Moreover, in food

recommendations, it is of grave importance to generate accurate and relevant

recommendations. In this dissertation, a cloud based food recommendation system

called Diet-Right is presented that considers the users’ pathological test results and

recommends a list of optimal food items. To achieve optimal results that best match

the users’ preferences, an Ant Colony Optimization (ACO) based algorithm is used.

Moreover, cloud computing is used to enable on-demand scaling

of computing and storage resources.

1.4 Challenges

The existing approaches used in various recommendation systems mostly rely on

collaborative filtering [1, 27, 31]. However, most of the existing approaches commonly

face issues such as data sparseness, cold start, and scalability. Below, the research gaps

in related work that significantly impact the performance of

recommendation systems are briefly discussed. In particular, some of the unsolved problems of

previous research works that still affect the performance of

current location-based recommendation systems (LBRS) are highlighted. These problems include:

1.4.1 Scalability

In memory-based collaborative filtering (CF) recommendation systems, user rating data are utilized to apply

simple methods to perform similarity computations between items or users [36-38].

Neighborhood-based CF (e.g., K-nearest neighbor) is one such memory-based

CF approach. However, scalability is the main issue in such systems, as a major

requirement is the real-time parsing of massive volumes of data. Failing to meet this requirement results

in poor efficiency and performance. Consequently, such systems are not capable enough

to handle big data. To overcome the scalability problems, model-based CF is applied in

some of the existing techniques [26, 33, 58]. Compared to memory-based approaches,

model-based approaches are better in the sense that such approaches help in reducing

the size of the user-item rating matrix that decreases the online processing time [1, 27].

1.4.2 Cold Start

Cold start is the problem when the system generates recommendation for a user who is

new to the system. Such problem occurs in many existing CF-based recommendation

systems [31, 36]. When the system does not have enough records available from the

new user, it is almost impossible to compute similarity measures. Inadequate entries

cause similarity calculations to yield zero values, which degrades the performance and

quality of recommendations.
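
The effect described above can be reproduced with a small sketch. The code below is an illustrative assumption rather than the similarity measure of any particular system cited here: it uses cosine similarity over rating dictionaries, and the name `cosine_similarity` and the sample ratings are hypothetical.

```python
from math import sqrt


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity between two users' rating dictionaries {item: rating}."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        # No co-rated items: the similarity carries no information.
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = sqrt(sum(v * v for v in ratings_a.values()))
    norm_b = sqrt(sum(v * v for v in ratings_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


existing_user = {"v1": 5, "v2": 3}
new_user = {}  # a user who has just joined and has rated nothing

print(cosine_similarity(existing_user, existing_user))
print(cosine_similarity(new_user, existing_user))
```

A new user with an empty record obtains zero similarity with every other user, so a neighborhood-based method has no neighbors from which to derive recommendations.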

1.4.3 Data Sparseness

Users visiting a limited number of locations results in a sparse user-to-location check-in

matrix. The negative effect of data sparseness is the non-optimal computation of the

nearest neighbor set of users similar to a particular user, which also affects the accuracy

of recommendations. Moreover, the performance of the existing LBRS is also affected

by the sparseness of the user-to-user relationship matrix when directly manipulated with

CF-based models [31].
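
A simple way to quantify the sparseness discussed above is the fraction of empty cells in the check-in matrix. The following sketch assumes the matrix is stored as a dictionary of per-user check-in counts; the function name `sparsity` and the sample data are hypothetical.

```python
def sparsity(check_ins, n_venues):
    """Fraction of empty cells in a user-to-venue check-in matrix.

    check_ins maps each user to a dictionary {venue: number_of_visits};
    n_venues is the total number of venues (matrix columns).
    """
    n_users = len(check_ins)
    if n_users == 0 or n_venues == 0:
        return 0.0
    filled = sum(len(visits) for visits in check_ins.values())
    return 1.0 - filled / (n_users * n_venues)


# Three users, four venues, only three non-empty cells.
check_ins = {"u1": {"v1": 2}, "u2": {"v1": 1, "v3": 4}, "u3": {}}
print(sparsity(check_ins, n_venues=4))
```

In this toy example three of the twelve cells are filled, so the sparsity is 0.75. Real check-in datasets are typically far sparser, which is why neighbor sets computed directly on the raw matrix tend to be unreliable.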

1.4.4 Over Specialization Problem

The over specialization problem occurs when recommendation systems restrict a user to

only those recommendations that match the user’s preferences or that are already rated

by the user [59]. The problem prevents users from discovering new items or locations.

However, many recommendation systems also focus on diversity as an important

feature of the recommendations. Diversity indicates how distinct the recommendations

are when compared to each other [60]. At the same time, the focus should also be on

maintaining the balance between diversity and similarity [3, 61, 62]. Therefore, a

tradeoff between diversity and matching user preferences is still a challenging problem

for the recommendation systems.
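
One common way to measure how distinct the recommendations in a list are is the average pairwise dissimilarity between the recommended items. The sketch below is an illustration only and is not necessarily the measure used in [60]: it assumes each item is described by a set of features and uses Jaccard distance; the names `jaccard_distance` and `intra_list_diversity` are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 minus the Jaccard similarity of two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def intra_list_diversity(rec_list, features):
    """Average pairwise distance between the items of a recommendation list."""
    pairs = list(combinations(rec_list, 2))
    if not pairs:
        return 0.0
    total = sum(jaccard_distance(features[x], features[y]) for x, y in pairs)
    return total / len(pairs)


# Hypothetical item features: a and b are identical, c shares nothing with them.
features = {"a": {"x", "y"}, "b": {"x", "y"}, "c": {"z"}}
print(intra_list_diversity(["a", "b"], features))
print(intra_list_diversity(["a", "b", "c"], features))
```

A list of near-identical items scores close to zero, whereas a list of unrelated items scores close to one; the tradeoff mentioned above amounts to keeping this value reasonably high without sacrificing the match with user preferences.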

1.4.5 Recommendation of Popular Objects

In some cases, the focus of the recommendation systems is to recommend only those

items that are popular amongst others and are likely to be highly rated by the users. In

this case, the items or locations that are less popular may be overlooked [63]. Popular

items or locations are easy to discover by users, as compared to unpopular items.

Therefore, a recommendation list must also include the less popular items or locations

that are unlikely to be discovered by the users.

1.4.6 Attacks on Recommendations

One of the main challenges faced by recommendation systems is security.

Recommendation systems are widely used in e-commerce applications and are likely

to be targeted by malicious attacks. The attackers may try to demote or promote some

locations or items unjustly [64]. Therefore, keeping in view the threat of attacks, a good

recommendation system should be equipped with a wide range of tools in order to

prevent different kinds of attacks. In [65], the authors exposed eight different strategies

of attackers which must be considered by the recommendation systems in their

prevention tools.

1.5 Scope of Research

In section 1.4, some of the challenges faced by recommendation systems are

highlighted. Among the challenges referenced above, this research focuses on the major

challenges mentioned in the state of the art literature. The challenges addressed in this

work include: (a) cold start, (b) scalability, and (c) data sparseness. Moreover, real-time

parameters for venue recommendations are also considered. Subsequent sections

elaborate the challenges addressed in this research.

A user’s real-time location is an important parameter for recommendations. Other

parameters may include a user’s current speed and direction. All of the aforementioned

parameters, including a user’s past interaction pattern constitute a user’s context.

Nowadays, smart phone manufacturers embed various kinds of sensors that allow real-

time tracking of user activities. Considering real-time varying context, recommendation

is challenging because of large diversity in user’s context and historic patterns [66, 67].

To generate real-time recommendations, the recommendation system needs to

consider the following factors: (a) personal preferences, (b) past check-ins, (c) current

context, such as time and location, (d) collaborative social opinions (other

individuals’ preferences), (e) ratings of the users, and (f) distance from the location. In

most of the existing literature [68-70], these features are simply ignored to keep the

model simple, however, at the cost of recommendation quality.

The cold start problem in state-of-the-art collaborative filtering (CF) recommendation

systems [71] is noticed when a recommendation is generated for a new user. The main

reason for this behavior is that the system has only limited information about a new user to

compute similarity measures. As a result, the similarity computations yield zero

values, which reduces the recommendation quality.

Most of the traditional recommendation systems were not designed to be scalable due

to small amount of historical data [36, 72, 73]. The performance of such systems

degraded as the data size increased continuously. To handle the scalability issue, a few

works considered applying data mining and machine learning techniques to

reduce/filter out insignificant data to reduce the computation time [1, 31]. However,

there exists a tradeoff between the reduced dataset size and quality of recommendation.

The reduction in dataset size may help in faster online processing, but at the cost of

decrease in recommendation quality [58]. Therefore, a careful approach needs to be

devised to enforce a balance between recommendation quality and dataset size.

Another problem faced by the traditional recommendation systems is data sparsity [74-76].

Data sparseness in a recommendation system occurs when there is limited data in the

user-item matrix. Users visiting a limited number of locations, or few user ratings for a

limited number of items, result in a sparse user-to-location check-in or user-to-item

matrix. The negative effect of data sparseness is the non-optimal computation of the

nearest neighbor set of similar users for a particular user, which in turn affects the accuracy

of recommendations. Moreover, the performance of the existing recommendation systems

is also affected by the sparseness of the user-to-user relationship matrix when directly

manipulated with CF-based models [31].

1.6 Motivation

The popular social networks are the main driver of the explosive usage of smart

phones. Recent years have seen a prolific increase in network-enabled mobile devices.

According to a market research report, in the year 2016 alone a total of 1495.36 million

smart phones with Wi-Fi capability were sold, and one-third

of the world’s total population was projected to own a smartphone by 2017 [77]. One of

the popular application areas of fully connected networks is mobile social networks.

These mobile social networks offer a variety of services for smart phone users to share

their experiences and content on a massive scale. As of January 2017, Foursquare had

gathered almost 9 billion check-ins. Moreover, over 55 million users use different

services of Foursquare each month [28]. The above mentioned social networking

site collects different information from the users, such as users’ check-ins which

includes users’ geographical information and users’ comments about the venue. Users

of these applications are provided with options to enter “check-in” information at

different venues to share their experience and knowledge by giving a tip or feedback

[3, 26, 27]. Moreover, these services also keep track of the users’ geospatial check-in

data such as time and longitude/latitude [3]. More recently, the availability of huge

volumes of users’ geospatial data has motivated the research community to focus their

efforts on the design of various Venue Recommendation Systems (VRS) that are based

on the extracted information and data from mobile social networking applications [1,

3, 26, 27]. Such systems perform recommendations of different venues to users that are

directly related to the users’ preferences. The main focus of our research is to develop a

model to generate real-time venue recommendations for the users by incorporating the

MapReduce framework to process large-scale data. The factors that are considered in

this research include users’ preferences, current context, past check-ins, rankings,

geospatial characteristics, and collaborative social opinion.

The second area of recommendation systems targeted in our thesis is health

recommendation. With the increase of smart phone usage in recent past, the trend of

using the smart phone for health related applications has also seen a significant increase.

A recent study [78] showed that 62% of smart phone users queried health related

information. One of the emerging areas of health related recommendation systems is

diet/food recommendation. Due to diversity in food items/components and the large

number of dietary sources, it is a challenging task to perform balanced selection of diet

patterns that must fulfill one’s nutrition needs. Particularly, selection of proper diet is

critical for patients suffering from various diseases. Therefore, a systematic food

recommendation system is desired to recommend the appropriate food considering the

disease of the person. The major challenge in designing such a system is the handling

of greater volumes of data in terms of ingredients, quantity, nutrition facts, people’s

preferences, and simultaneously taking into consideration a person’s pathological

reports. The system must be scalable enough to handle recommendation queries from

all over the globe. A solution to the aforementioned challenge is the use of cloud

computing.

1.7 Contributions

Our three main contributions are: (a) quantitative analysis of selected techniques on

different datasets, (b) design of a real-time venue recommendation system (VRS) for

individuals, and (c) food recommendation to a user. The most popular social networks

such as Foursquare, Facebook, and Twitter are the main services used by numerous

users across the world. Similarly, a variety of other social networking sites and

applications share a huge amount of unstructured data every single minute of the day.

There are numerous challenges in handling and parsing the huge volume of data. As a

first step, a quantitative analysis is conducted on state-of-the-art recommendation

techniques over the MovieLens, Gowalla, and Foursquare datasets. The objective was

to analyze the behavior of different datasets in terms of precision, recall, and F-Measure

on the selected techniques. Four techniques are compared, namely: (a)

Location Aware Recommendation System (LARS), (b) Geographical Probabilistic

Factor Model (GPFM), (c) Latent Dirichlet Allocation (LDA), and (d) Random Walk

with Restart (RANDOM). Critical analysis of the aforementioned techniques has been

conducted against all of the datasets in terms of precision, recall, and F-Measure. The

analysis is presented in Section 2.8.
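
For reference, the three evaluation measures named above can be computed for a single recommendation list as sketched below. The standard definitions are used; the function name `precision_recall_f` and the sample data are hypothetical and are not taken from the experiments of this dissertation.

```python
def precision_recall_f(recommended, relevant):
    """Precision, recall, and F-Measure of one recommendation list.

    recommended: items returned by the system.
    relevant: items the user actually visited or liked (ground truth).
    """
    recommended = list(recommended)
    relevant = set(relevant)
    hits = sum(1 for item in recommended if item in relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Four recommendations, two of which are among the three relevant items.
p, r, f = precision_recall_f(["a", "b", "c", "d"], {"a", "c", "e"})
print(p, r, f)
```

Precision penalizes irrelevant recommendations, recall penalizes missed relevant items, and the F-Measure is their harmonic mean, which is why the three are usually reported together.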

The second contribution of this dissertation is a real-time venue

recommendation system. To achieve the objective of real-time venue

recommendations, a hybrid model is developed by combining the CF method with the K-Means

clustering technique and the K-Nearest Neighbor (KNN) ranking technique.

Moreover, the MapReduce model has been used for clustering venues in the Foursquare

dataset, which are processed in parallel on the cloud. For clustering potential venues, the K-Means

clustering technique is used, and for ranking the top venues for final

recommendations, the KNN ranking technique is used. Cloud-based infrastructure is

utilized to process, compare, mine, and manage datasets for real-time recommendations

in a scalable architecture.
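
The cluster-then-rank idea behind this contribution can be sketched as follows. The code is a deliberately simplified, single-machine illustration and not the model presented in Chapter 3: venues are assumed to be plain 2-D coordinates, initial centroids are supplied by hand, ranking uses only Euclidean distance to the user, and no MapReduce parallelism is involved. The names `dist2`, `kmeans`, and `rank_venues` are hypothetical.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iterations=10):
    """Plain K-Means with caller-supplied initial centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda c: dist2(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


def rank_venues(user_location, centroids, clusters, k=3):
    """Pick the cluster nearest to the user and return its k nearest venues."""
    nearest = min(range(len(centroids)), key=lambda c: dist2(user_location, centroids[c]))
    return sorted(clusters[nearest], key=lambda v: dist2(user_location, v))[:k]


venues = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(venues, [(0, 0), (10, 10)])
print(rank_venues((9, 9.5), centroids, clusters, k=2))
```

Clustering offline means that, at query time, only the venues of one cluster need to be ranked, which is the main reason such a two-stage design helps with real-time response.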

The third contribution of the dissertation is related to food recommendations. Due

to diversity in food components and the large number of dietary sources, it is challenging

to perform real-time selection of diet patterns that fulfill one’s nutrition needs.

Particularly, the selection of a proper diet is critical for patients suffering from various

diseases. This study highlights the issue of selecting a proper diet that fulfills

patients’ nutrition requirements. To address this issue, a cloud based food

recommendation system, called Diet-Right, is presented for dietary recommendations

based on users’ pathological reports. The model uses an ant colony algorithm to generate

an optimal food list and recommends suitable foods considering users’ pathological

reports. A high-level overview of problems tackled in this dissertation along with

proposed techniques is provided in Figure 1.1.

Figure 1.1 Overview of Problems Tackled and Contributions

Quantitative Analysis (Chapter 2)

• Challenges: large-scale datasets; lack of standardized models/formulas

• Contributions: quantitative analysis on the datasets; datasets used: Foursquare, MovieLens, and Gowalla

Real-Time Venue Recommendation System (Chapter 3)

• Challenges: real-time factors not considered; not scalable

• Contributions: clustering using K-Means; ranking using KNN; cloud based model for scalability; real-time venue recommendation

Smart Food Recommendation System (Chapter 4)

• Challenges: diversity in available food components and each individual's nutrition requirements for a specific disease

• Contributions: computation of the required proportion of nutrition based on pathological reports for a person's specific disease; use of Ant Colony to compute the optimal solution; cloud based implementation to address scalability and to generate real-time food recommendations

The list of published and submitted articles from the above mentioned contributions is

as follows.

Published Articles:

1. Rehman, F., Khalid, O., & Madani, S. A. (2017). A comparative study of

location-based recommendation systems. The Knowledge Engineering Review,

32.

2. Rehman, F., Khalid, O., Bilal, K., & Madani, S. A. (2017). Diet-Right: A Smart

Food Recommendation System. KSII Transactions on Internet & Information

Systems, 11(6).

Submitted Article:

1. Rehman, F., Khalid, O., & Madani, S. A. (2017). Real-Time Context Aware

Cloud Based Venue Recommendation System. IEEE Access.

1.8 Organization of Dissertation

The rest of the dissertation is organized as follows. Chapter 2 provides an overview of

existing works that are closely related to this research. Particularly, the criteria for

recommendations are illustrated, followed by basic similarity measures and the evaluation

of recommendations. Moreover, the challenges faced by recommendation systems are

presented. Furthermore, the models and techniques proposed in the literature for LBRS

and food recommendations are critically investigated. Chapter 3 presents the

Real-Time Venue Recommendation (RTVR) model based on cloud infrastructure and

service-based interfaces in order to process, compare, mine, and manage large datasets

for real-time recommendations in a scalable architecture. The model uses

collaborative filtering along with K-Means clustering to pre-process the large dataset. For

ranking, K-Nearest Neighbors is used to compute the personalized ranking list of

venues. The quantitative analysis of the datasets used in the recommendation system is also

performed. In Chapter 4, a smart food recommendation system is presented. This

chapter presents a cloud based food recommendation system for dietary

recommendations based on users’ pathological reports. A hybrid technique is

used that combines a knowledge based technique with Ant Colony Optimization for

generating a ranked list of optimized food items. Chapter 5 discusses various

challenges in developing scalable recommendation systems and highlights our contributions

towards handling these challenges. Moreover, the chapter also provides some

opportunities in the area of recommendation systems and directions for future work.

Chapter 2

Overview of Recommendation Systems

2.1 Overview

Recommendation systems have a profound association with cognitive science [79] and

information retrieval [80]. For the past few years, there has been significant research in

the area of recommendation systems [31, 65, 81-90]. In the literature, authors used

different approaches to deal with the challenges of personalized data retrieval from

diverse sources of information [11, 91, 92]. The majority of the existing work utilized

techniques, such as content based filtering and collaborative filtering with an intention

to balance various factors, such as accuracy, diversity, novelty, and familiarity [27, 59].

Moreover, the existing work also attempted to deal with the challenges, such as

scalability, cold start, and data sparseness that are common in recommendation systems

[71, 74]. Figure 2.1 shows the components of recommendation systems in terms of types,

algorithms, and data sets. The major algorithms used in recommendation systems are

content based, collaborative filtering based, and hybrid techniques. These algorithms

are used for information filtering on different datasets that include user’s profiles, user’s

trajectories, user’s location information, and user’s preferences. Finally, different types

of recommendations are generated that include locations, routes, movies, products, and

books. Figure 2.2 shows a general hierarchy of the recommendation techniques.

Figure 2.1 Components of Recommendation Systems

Recommendations: Locations, Routes, Products, Movies, Books

Algorithms: Content-Based, CF-Based, Hybrid

Datasets: User's Profile, User's Preferences, User's Locations, User's Trajectories

Figure 2.2 Hierarchy of Recommendation Systems

Recommendation Systems

• Content Based

• Collaborative Filtering Based

  ◦ Memory-based: User Based, Item Based

  ◦ Model-based: Latent Semantic Analysis, Bayesian Clustering, Support Vector Machines, Maximum Entropy, Latent Dirichlet Allocation

• Hybrid: Cascade, Weighted, Mixed, Switching, Feature Augmentation, Feature Combination, Meta-Level

2.1.1 Content-based Recommendation Methods

The main idea of content-based or cognitive methods (e.g., [89, 93-95]) is to recognize

the common characteristics of particular items that were already evaluated and rated by

users. Based on those characteristics, a content-based system finds and recommends a

new item whose characteristics match the user’s preferences. Moreover,

detailed information about the item or user is implicitly presented in the form of a

feature vector [96]. For other items, such as text documents [94, 95], and web

documents [89, 93], the feature vector usually comprises the Term Frequency-Inverse

Document Frequency (TF-IDF) weights of the most frequent keywords [80]. The TF-IDF

approach was also used to predict ratings of a user for any new item [94]. Bayesian

approaches were also used in the literature for the same purpose [89, 94, 97].
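
To make the TF-IDF feature vectors of content-based methods concrete, a minimal sketch is given below. It uses one common variant (term frequency normalized by document length, and the natural logarithm of N divided by the document frequency); other weighting variants exist, and the function name `tf_idf` and the toy documents are assumptions made for this example.

```python
from collections import Counter
from math import log


def tf_idf(documents):
    """Return one {term: weight} dictionary per tokenized document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical item descriptions, already split into keywords.
docs = [["cafe", "coffee", "wifi"], ["cafe", "pizza"], ["park", "trail"]]
vectors = tf_idf(docs)
print(vectors[0])
```

Terms that occur in many descriptions (here "cafe") receive a lower weight than terms that are specific to one item (here "coffee"), so the resulting vector emphasizes what distinguishes an item.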

Content-based recommendation systems suffer from two main problems: (a)

inadequate analysis of contents and (b) over specialization [98]. Inadequate analysis of

contents stems from the circumstance where the recommendation systems have limited

or incomplete information about the contents of the item or about the users. There are

numerous reasons behind such lack of information. For example, privacy is a major

concern for many users, which might restrict the provision of personal information by the

users. Similarly, information about items such as music or images is costly or difficult

to acquire [27]. Finally, in some cases, the information about the item is inadequate to

evaluate the quality of the item [99]. For instance, it is quite difficult to differentiate

between a well written article and a poorly written article when both of the articles use

the same terms. On the other hand, over specialization is a side effect

of the methodology by which recommendation systems recommend new items. The

predicted rating of a user is high for an item if the characteristics of the item are

similar to the ones already rated high by the same user. For example, a recommendation

application for movies may recommend a movie of the same category to the user, if

the user has rated movies of the same category previously. Similarly, a system may

recommend a movie to the user which has the same actors as a previously rated

movie. Because of the nature of the content-based recommendation technique, the

system does not consider any other movie that is different yet might be fascinating to

the user. For the aforementioned issue, solutions were proposed that introduce diversity

in recommendations by adding some randomness in the recommendations [98] or

filtration of too similar items [94, 100].

2.1.2 Collaborative Filtering based Recommendation Method

Unlike content-based methods, in which the main idea is to recognize the common

characteristics of a particular item that has already been evaluated and rated by a user,

collaborative filtering (CF) based methods depend on the ratings of a user along with

other users’ ratings in the system [85, 98, 101]. CF is the most commonly used

technique for the recommendation problem [1, 27, 31, 33]. Although CF-based methods

are frequently used with other filtering techniques such as knowledge based or content-

based, the main objective of using CF-based recommendation systems is to locate the

subset of similar users who have similar profiles and preferences. Rating by a user $u_1$

for an item $i$ is expected to be similar to rating by another user $u_2$, if and only if $u_1$ and $u_2$ had followed a similar pattern in rating other items [98]. Similarly, $u_1$ is likely to

rate two items i and j in a same manner, if similar rating has been given to both items

by other users [88]. Such agreement among the users follows from the

higher similarity values among those users. The CF-based recommendation

systems function by matching a particular user’s items record (stored in a matrix) with

other users’ stored record. The matrix must contain users’ visited locations and the

number of visits to each location. The CF-based methods present valuable

recommendations to a given user by extracting the ratings shared by similar users on

the items.
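The matching step described above can be sketched as follows. This is a minimal illustration rather than the method of any cited work: the toy ratings, the helper names `cosine_users` and `predict`, and the use of a similarity-weighted average are all assumptions made for the example.

```python
import math

# Hypothetical toy data: user -> {location: rating}.
ratings = {
    "u1": {"l1": 4, "l2": 3, "l3": 5},
    "u2": {"l1": 3, "l3": 4},
    "u3": {"l1": 3, "l2": 5},
}

def cosine_users(a, b):
    """Cosine similarity of two users over the items both have rated."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    dot = sum(ratings[a][i] * ratings[b][i] for i in common)
    norm_a = math.sqrt(sum(ratings[a][i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings[b][i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no similar user has rated the item.
    """
    num = den = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        weight = cosine_users(user, other)
        num += weight * their_ratings[item]
        den += weight
    return num / den if den else None

# u2 has not rated l2; the estimate is drawn from u1 and u3.
estimate = predict("u2", "l2")
```

In this toy data both neighbors are almost equally similar to `u2`, so the estimate lands close to the midpoint of their ratings for `l2`.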

CF-based methods eliminate certain existing problems of content-based

recommendation systems. For instance, when the rating information about an item is

needed, and it is not available or difficult to acquire, CF-based models can still generate

recommendations for the users through the feedback and ratings of other users.

Moreover, in CF-based methods, users mostly rate an item keeping in consideration the

quality or ratings of the item. This is not the case in content-based methods that mostly

rely on content matching, which may lead to poor quality of recommendation.

The two generic classes of CF approaches are memory-based and model-based.

In the memory-based CF method, often referred to as neighborhood based [98] or

heuristic based [83], user-item rating matrix stored in the system is directly accessed

and used for the prediction of ratings for new items. Memory-based models are further

categorized as item based and user based, as reflected in Figure 2.2. In item based

methods [85, 98], the prediction of rating for a user is based on the already stored ratings

of the similar items by that user. The system considers two items similar if and only if

most of the users rated both items in the same way. Alternatively, user based approaches

[84] consider a user’s interest towards an item using the previously stored ratings of

other users for the same item. The similarities are calculated among all the pairs of users

based on the users’ ratings. The similarity formulas, such as cosine similarity or Pearson

correlation are used for calculating the similarity among the users [21]. The resultant

correlated users are known as neighbors that are used for the rating prediction of the

other users. In model-based approaches, data mining and machine learning algorithms

are applied to train the probabilistic models for various patterns. Compared to memory-

based approaches, model-based approaches are better in a sense that these approaches

help in reducing the size of the user-item rating matrix that decreases the online

processing time [1, 27]. Also, model-based methods perform categorization of users’

preferences that may include some hidden factors. For instance, without actually

defining any notion such as “suspense” or “horror”, a movie recommendation system

recommends a movie that is both suspense and horror [102]. In such a situation, model-

based approaches determine a user’s preference about a movie, without the user

explicitly stating the preference [102]. Alternatively, memory-based methods extract

associations in the user-item rating matrix. As a result, the recommendation system may

recommend a movie that is against one’s taste or a movie that is not very popular

because one of the user’s nearest neighbors highly rated that movie. Techniques

commonly used in model-based recommendation systems include Bayesian Clustering

[98], Latent Semantic Analysis [103], Latent Dirichlet Allocation [104], Maximum

Entropy [73], and Support Vector Machines [102]. Table 2.1 shows the comparison of

the above mentioned techniques.

Table 2.1 Comparison of Content Based and CF Based Techniques

Technique | Procedure | References
Content Based | Use information about the features of the items and rating provided by the users. Ratings are combined to the preferences of the users based on the features of the items already rated by the same user. | [20, 76, 94-96, 105, 106]
CF Based | Use information about the ratings of a user along with other users’ ratings in the system. The main objective is to locate the subset of similar users who have similar profiles and preferences. | [21, 65, 87, 96, 107-111]

2.1.3 Hybrid Recommendation Methods

A combination of two or more techniques forms a hybrid recommendation system.

Usually, to achieve the best performance, techniques are combined in such a way that the

methods with few drawbacks are chosen [112]. Collaborative filtering is the most

commonly used technique for combination with any other technique [113]. This is

because the main objective of using CF-based recommendation systems is to locate the

subset of similar users who have similar profiles and preferences. Some of the

combination methods used for the creation of hybrid recommendation systems are shown

in Table 2.2.

Table 2.2 Description of combinations used in Hybrid Technique

Hybrid Methods | Description
Weighted [114] | The weights, votes, scores, ratings of different techniques are combined together.
Mixed [86] | Different recommendations by multiple ranked list are presented simultaneously.
Switching [115] | Switching between different techniques is done by the system according to the situation.
Cascade [116] | Refinement done by one technique on recommendations offered by other.
Feature Combination [113] | Combination of features of different recommendation data sources
Feature Augmentation [112] | Input of technique is the output of other technique
Meta-Level [117] | Input of the technique is the complete model of other technique

2.2 Criteria for Recommendations

In recommendation systems, a user’s main concern is the quality of the

recommendations generated for an item or location. Users review the resultant

recommendations offered by the system and rate them as appropriate or inappropriate

depending on their personalized preferences. In addition, it is important for the

recommendation systems to present such recommendations in a manner that is

acceptable to the users. Therefore, presentation of the recommendations must be

handled carefully to influence the users to accept the recommendations [118]. Several

questions must be addressed; for instance, would the recommendations include only top

ranked items or locations, and similar ones? Or would the recommendations include the

popular locations or items, and also well-proportioned highly ranked ones? Similarly,

another key aspect of recommendation systems is trust [119]. The user has more trust

in the system if the recommendations generated by the system best match the user’s

preferences. Therefore, a user’s expectation is directly dependent on trust in the system

[61, 120]. Recommendation systems also affect a user’s confidence in the system, that is,

whether the user believes the generated recommendations are satisfying. The users are

more likely to accept the recommendations if their confidence in the system is high

[121]. In the following subsections, some of the criteria for recommendation

systems that must be considered to build the trust and confidence of the users are discussed.

2.2.1 Accuracy

To evaluate a recommendation system, one of the most dominant and important

criteria is accuracy [122]. Numerous works have been carried out in the past decade

to improve accuracy of recommendations [123]. The most commonly used metric in

the rating based recommendation systems is Mean Absolute Error (MAE) [87]. MAE

measures the difference between the ratings predicted by the algorithm and the actual

ratings of the users [118]. Similarly, for content-based recommendation systems,

“switching task” is used to measure the accuracy [86]. Switching task measures the

change of user preferences before and after the recommendation [86]. If some user

changes his/her preferences after the recommended list, the accuracy of the system will

be decreased [123]. In recent years, the focus of research has shifted towards users’

perceived accuracy. Users’ perceived accuracy is a measure of the degree of users’

satisfaction [112]. If the recommendations generated by the system best match

a user’s interest and preferences, the degree of the user’s perceived accuracy will be

high. Perceived accuracy has more direct impact on user’s trust level as compared to

objective algorithm accuracy [114]. However, it was also shown by some researchers

that both perceived accuracy and objective accuracy do not have a direct correlation

with each other [123]. It is concluded from different studies that for recommendation

systems the user-based collaborative filtering based on K-nearest neighbors has been

verified to achieve high accuracy [Refer].

2.2.2 Familiarity

Another criterion for the recommendation systems is familiarity. The most familiar item

is most likely to be recommended [118]. As compared to unfamiliar recommendations,

users’ liking for familiar recommendation is high. The study conducted in [118] showed

that when the items that are highly liked by a user are included in the recommendations,

the items increase the user’s trust in the system. The study also showed that the users

are more likely to be interested in the familiar items rather than unfamiliar ones. In the

study conducted in [37], familiar road segments were recognized and then familiar road

networks were constructed using historic trajectories to generate personalized routes.

While the preference of a user is always towards the familiar recommendations, the

user also needs new recommendations to which the user was previously unfamiliar. For

example, if LBRS recommends a similar restaurant to a user, the user might feel that

the system is not capable of recommending new restaurants that are of different taste

from the user [61].

2.2.3 Novelty

The limitation shown in the previous criterion of the recommendation systems can be

handled by introducing novelty in the recommendations. Novelty is another important

criterion that needs to be considered by the recommendation systems. Novelty of the

recommendation can be referred to as how the recommendation is different with respect

to previously seen recommendations. The main idea of introducing novelty is to provide

users with fresh recommendations. Novelty is also sometimes referred to as

“serendipity” [122]. However, the difference between novelty and serendipity was

elaborated in [123], where novelty means “new” and serendipity means “new and

surprising”. Therefore, it can be concluded that novelty is the extent to which

recommendations are newer for a user [61]. In a study conducted in [3], the researchers

discovered that users give high ratings to the music recommended by Pandora. Pandora

is considered a novel music recommendation system because it frequently provides

listeners with the latest music tracks.

2.2.4 Diversity

Another key criterion for recommendation systems is diversity [118]. Diversity and

novelty are different notions though both are closely related. As discussed earlier,

novelty of the recommendation can be referred to as how the recommendation is

different with respect to previously seen recommendations. Whereas, diversity

indicates how distinctly dissimilar the recommendations are when compared to each

other [60]. In item-to-item collaborative filtering algorithm, a user is trapped in a

“similarity junction” that provides the user with similar recommendations according to

his/her or friends’ preferences [122]. Therefore, the focus of researchers has shifted

towards diversity in recent years. At the same time, the focus should also be on

maintaining the balance between diversity and similarity [3, 61, 62]. Moreover, highly

diverse recommendations also affect the similarity and accuracy criteria for the user.

Therefore, diversity should be introduced while keeping a balance in the

aforementioned tradeoffs [3]. The study in [60] showed that users prefer diverse

recommendations over more accurate recommendations. Confidence of

a user may be affected when the user receives low diversity recommendations [118]. It

was also shown in another study that the satisfaction of users goes beyond

accuracy and that users are more likely to accept more diverse recommendations

[62].

2.2.5 Context Compatibility

Context compatibility is a criterion for recommendation systems in which the current

context should be considered before the final recommendation [118]. For example, if

a user wants to dine in a restaurant, the current context includes current location, food

choice, companions, and weather conditions. Similarly, if a user wants to watch a

movie, the current context should be considered because the user’s preferences for

watching a movie with friends may differ compared to watching a movie with family

[124]. One of the advantages of adding contextual considerations in the

recommendation systems is that it helps new users to get recommendations instantly

without adding robust profiles [118].

2.2.6 Justification of Recommendations

For user satisfaction, it is not enough to give recommendations according to the user’s

preferences and ratings. In addition, the user must understand the criteria of selection

of items for recommendations. Studies showed that a user’s satisfaction and trust level

is enhanced in those recommendation systems that also provide good justifications with

the recommendation list [102]. Some of the primary goals of including the justifications

to recommendations are transparency, effectiveness, and smoothness [102]. After

realizing the importance of justifications of recommendations, commercial websites

such as Netflix, Pandora, and Amazon have also added features, such as “why this was

recommended” on the web pages [118].

2.2.7 Sufficiency of Information

The last criterion for recommendation systems is to provide sufficient information with

the recommendation list to assist the user and help enhance the decision-making

process of the user. For example, when Amazon recommends a book to a user, the

information provided to the user is sufficient: the title of the book, author name, edition,

binding (hardcover or paperback), current ratings, and price of the book are all

displayed with the recommended book. Studies showed that the availability

of descriptive information about an individual item is positively correlated with

the perceived effectiveness and ease of access of the recommendation systems [123]. For

example, if recommendation systems recommend a shopping mall to a user, then the

user would like to have more detailed information about the shopping mall, such as its

distance from the user, the shortest path to the shopping mall, driving routes, and the

description of different shops in the shopping mall. Therefore, the recommendation

systems must consider the criterion of adding sufficient information with the

recommendations. Table 2.3 is the summary of criteria for recommendations applied in

various research works.

Table 2.3 Criteria for recommendations in different research

Techniques | Accuracy | Familiarity | Novelty | Diversity | Context Compatibility | Justification of Recommendations | Sufficiency of Information

Gao, H., et al. [21] � � � � � � �

Chen et al. [114] � � � � � � �

Pu et. al. [118] � � � � � � �

Shani et al. [119] � � � � � � �

Rana et al. [122] � � � � � � �

Pu et al. [123] � � � � � � �

Chang et al. [37] � � � � � � �

Liu et al. [61] � � � � � � �

Adomavicius et al. [3] � � � � � � �

Verbert et al. [124] � � � � � � �

Javari et al. [62] � � � � � � �

Desrosiers et al. [102] � � � � � � �

2.3 Similarity Calculations in Recommendation Systems

One of the most critical steps in recommendation systems is to compute the similarity

between users and items. The recommendation system first computes these

similarities and then selects the

most similar items for recommendations. The process for calculating similarity is

identical in all the application areas of recommendation systems. For illustration

purposes, an example from LBRS is given, as LBRS is one of the focus areas of our research.

However, similar calculations can be applied to food recommendation systems.

The basic idea in computing the similarity between two locations l1 and l2 is to

identify the users who have rated both the locations and then apply similarity

computation methods to find the similarity sim(l1, l2). For example, in Table 2.4, users u1 and

u2 have both rated locations l1 and l3. Similarly, the similarity between two users u1, u2

can be calculated using the existing similarity computation techniques.

Table 2.4 User-location Matrix

Ratings of users on locations

User-location l1 l2 l3 . . ln

u1 4 3 5 . . -

u2 3 - 4 . . -

u3 3 5 - . . -

. . . . . . -

. . . . . . -

un - - - - - -

A variety of different techniques exist to compute the similarity between users

and locations [33, 125-127]. The most common among such techniques include cosine

based similarity, correlation based similarity, and adjusted cosine similarity.

2.3.1 Cosine Based Similarity

In cosine based similarity, two different locations are treated as two vectors in the

user-location matrix. The cosine similarity between two locations is calculated by computing

the cosine of the angle between the two location vectors [126]. Suppose an $m \times n$

user-to-location matrix as shown in Table 2.4; then the similarity between two locations $l_i$ and $l_j$ can be calculated using the following formula [101]:

\[ \mathrm{sim}(l_i, l_j) = \cos(\vec{l_i}, \vec{l_j}) = \frac{\vec{l_i} \cdot \vec{l_j}}{\|\vec{l_i}\| \times \|\vec{l_j}\|}, \tag{2.1} \]

where “$\cdot$” represents the dot product of two vectors. Cosine based similarity is

considered to be computationally tractable [128]. For non-negative ratings, cosine similarity ranges between 0

and 1. One of the basic drawbacks of cosine similarity is that it does not produce

negative similarity values, which occur when users have rated the

locations very differently. This deficiency of cosine similarity is taken into account in the Pearson

correlation coefficient. If the attribute vectors $\vec{l_i}$ and $\vec{l_j}$ are normalized by subtracting

the vector means, the measure is called centered cosine similarity and is equivalent to

the Pearson correlation coefficient. Pearson correlation is discussed in the next

subsection.
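As a worked illustration of (2.1), the sketch below computes the cosine similarity between locations l1 and l3 using the ratings listed in Table 2.4. The function name `cosine_similarity` and the decision to use only users who rated both locations are assumptions made for this example.

```python
import math

# Ratings on locations l1 and l3, copied from Table 2.4
# (a missing key means the user has not rated that location).
l1 = {"u1": 4, "u2": 3, "u3": 3}
l3 = {"u1": 5, "u2": 4}

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors, as in Eq. (2.1).

    Assumption: only users who rated both locations contribute.
    """
    common = sorted(set(a) & set(b))
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = math.sqrt(sum(a[u] ** 2 for u in common))
    norm_b = math.sqrt(sum(b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b)

sim_l1_l3 = cosine_similarity(l1, l3)  # equals 32 / (5 * sqrt(41))
```

Because both rating vectors point in nearly the same direction, the value is close to 1 even though the two locations received different absolute ratings.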

2.3.2 Correlation Based Similarity

In the correlation based approach, the similarity between two locations $l_i$ and $l_j$ is

calculated using the Pearson-r correlation $corr_{l_i,l_j}$ [127]. The calculation of Pearson-r

correlation starts with the isolation of the locations that are both rated by the same

user $u$. For example, in Table 2.4, the locations $l_1$ and $l_2$ are both rated by user $u_1$. If the

set of users $u_1, u_2, u_3, \ldots, u_n$ who rated both locations is denoted by $U$, then the Pearson-r correlation similarity

is given by [81]:

\[ \mathrm{sim}(l_i, l_j) = \frac{\sum_{u \in U} (r_{u,l_i} - \bar{r}_{l_i})(r_{u,l_j} - \bar{r}_{l_j})}{\sqrt{\sum_{u \in U} (r_{u,l_i} - \bar{r}_{l_i})^2} \, \sqrt{\sum_{u \in U} (r_{u,l_j} - \bar{r}_{l_j})^2}}. \tag{2.2} \]

In the above equation, $r_{u,l_i}$ represents the rating of user $u$ for location $l_i$ and

$\bar{r}_{l}$ is the average rating of the $l$-th location. Compared to cosine similarity, Pearson

correlation evaluates to more accurate similarity computations as it incorporates the

negative similarity values as well. The negative similarity depicts how far the two users

are in their preferences. However, it is important to note that Pearson correlation has

two issues which must be taken into account while computing similarity between users,

items, or locations. The first issue occurs when one user has rated an item or location

but the other user has not rated the same item or location. As a result, a large set of

items could be ignored that are not commonly rated by both users, though, still those

items could have some impact on similarity computations. The other issue with Pearson

correlation is that users with only a few commonly rated items or locations could have

high similarities, despite the smaller item count. This could induce bias in the similarity

values, and the issue can be resolved by using significance weighting [129].
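The second issue can be seen on the Table 2.4 ratings: with only two co-rating users, the Pearson correlation between l1 and l3 is exactly 1. The sketch below is illustrative only. Taking location means over the co-rating users, the cutoff `gamma = 50`, and the linear devaluation `min(n, gamma) / gamma` are assumptions for this example; the text only states that significance weighting [129] addresses the problem.

```python
import math

# Ratings on locations l1 and l3, copied from Table 2.4.
l1 = {"u1": 4, "u2": 3, "u3": 3}
l3 = {"u1": 5, "u2": 4}

def pearson(a, b):
    """Pearson correlation over co-rating users; also returns their count."""
    common = sorted(set(a) & set(b))
    n = len(common)
    if n < 2:
        return 0.0, n
    # Assumption: means are taken over the co-rating users only.
    mean_a = sum(a[u] for u in common) / n
    mean_b = sum(b[u] for u in common) / n
    num = sum((a[u] - mean_a) * (b[u] - mean_b) for u in common)
    den = math.sqrt(sum((a[u] - mean_a) ** 2 for u in common)) * \
          math.sqrt(sum((b[u] - mean_b) ** 2 for u in common))
    return (num / den if den else 0.0), n

def significance_weighted(a, b, gamma=50):
    """Devalue correlations that rest on fewer than `gamma` co-ratings."""
    r, n = pearson(a, b)
    return r * min(n, gamma) / gamma

raw, support = pearson(l1, l3)
weighted = significance_weighted(l1, l3)
```

Here the raw correlation is perfect but rests on two users only, so the weighted value shrinks to a small fraction of it.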

2.3.3 Adjusted Cosine Similarity

One of the main differences between the similarity computations of users and locations

is that the similarity of users is computed along the rows of the matrix and the similarity of

locations is computed along the columns of the user-to-location matrix [33, 125].

Basic cosine similarity has one limitation: it does not account for differences in the

rating scales of the users. In adjusted cosine similarity, the drawback of basic cosine

similarity is eliminated by subtracting the corresponding user’s rating average from

each co-rated pair of locations. The similarity between locations using adjusted cosine

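similarity is calculated as [130]:

\[ \mathrm{sim}(l_i, l_j) = \frac{\sum_{u \in U} (r_{u,l_i} - \bar{r}_u)(r_{u,l_j} - \bar{r}_u)}{\sqrt{\sum_{u \in U} (r_{u,l_i} - \bar{r}_u)^2} \, \sqrt{\sum_{u \in U} (r_{u,l_j} - \bar{r}_u)^2}}, \tag{2.3} \]

where $\bar{r}_u$ is the average of the $u$-th user’s ratings.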
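A short sketch of (2.3) on the ratings of Table 2.4 follows. The function name `adjusted_cosine` is hypothetical, and computing each user's mean over all of that user's listed ratings is an assumption made for this example.

```python
import math

# User -> {location: rating}, copied from the visible part of Table 2.4.
ratings = {
    "u1": {"l1": 4, "l2": 3, "l3": 5},
    "u2": {"l1": 3, "l3": 4},
    "u3": {"l1": 3, "l2": 5},
}

def adjusted_cosine(loc_i, loc_j):
    """Adjusted cosine similarity between two locations, as in Eq. (2.3)."""
    num = den_i = den_j = 0.0
    for user_ratings in ratings.values():
        if loc_i in user_ratings and loc_j in user_ratings:
            # Subtract the user's own average to remove rating-scale bias.
            mean_u = sum(user_ratings.values()) / len(user_ratings)
            dev_i = user_ratings[loc_i] - mean_u
            dev_j = user_ratings[loc_j] - mean_u
            num += dev_i * dev_j
            den_i += dev_i ** 2
            den_j += dev_j ** 2
    den = math.sqrt(den_i) * math.sqrt(den_j)
    return num / den if den else 0.0

sim_l1_l3 = adjusted_cosine("l1", "l3")  # equals -1 / sqrt(5)
```

Unlike basic cosine similarity on the same data, the adjusted value is negative: relative to their own averages, both co-rating users rate l3 higher than l1.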

The above mentioned similarity calculations are a primary step to find out

similar users or locations. After computing the similarity, the recommendation system

selects the most similar locations and generates the recommendation list accordingly.

A variety of performance metrics are used to evaluate the resulting recommendation

lists. In the next Section, the most commonly used evaluation metrics in the field of

LBRS are discussed.

2.4 Evaluation of Recommendations

To analyze quality of recommendations, it is important to evaluate the

recommendations using evaluation metrics. The use of evaluation metrics helps in the

comparison of various solutions proposed by the researchers and as a result,

recommendations have been improved gradually [88]. The existing evaluation metrics

have a standard formulation that is used for the testing and evaluation of the

recommendations [1].

A variety of evaluation metrics [1] are used, but the most common ones are

classified as prediction metrics, set recommendation metrics, rank recommendation

metrics, and diversity metrics. Prediction metrics are used to find the accuracy of

recommendations using Mean Absolute Error (MAE) [87], Normalized Mean Absolute

Error (NMAE) [131], and Root Mean Squared Error (RMSE) [132]. Prediction

metrics are also used to find the coverage. Set recommendation metrics include Recall

[4], Precision [4], and Receiver Operating Characteristics (ROC) [133]. Rank

Recommendation metrics include half-life [88] and discounted cumulative gain (DCG)

[110]. Diversity metrics include novelty and diversity of the recommendations [60].

Moreover, the validation process is completed using the most commonly used

validation techniques, namely k-fold cross validation [86] and random sub-sampling

validation [134]. The following subsections elaborate on each of the evaluation

metrics in detail.

2.4.1 Prediction Metrics

To find accuracy of recommendations, researchers usually employ the calculations of

the most commonly used prediction error metric such as MAE and its associated

metrics, such as NMAE, MSE, and RMSE [119].

Suppose a set of users $(u_1, u_2, u_3, \ldots, u_n)$ is denoted by $U$, a set of items

$(i_1, i_2, i_3, \ldots, i_n)$ is denoted by $I$, $r_{u,i}$ is the rating of user $u$ for item $i$, $\bullet$ denotes the lack of

a rating (that is, $r_{u,i} = \bullet$ means user $u$ has not rated item $i$), and $p_{u,i}$ is the prediction of

item $i$ for user $u$. Let $O_u$ be the set of items rated by user $u$ having prediction values,

where $O_u = \{ i \in I \mid p_{u,i} \neq \bullet \wedge r_{u,i} \neq \bullet \}$. The system’s MAE and RMSE are the

averages of the users’ MAE and RMSE, respectively. The prediction error is the absolute difference between real

values and the prediction, denoted as $|p_{u,i} - r_{u,i}|$. MAE [135] and RMSE [132] are given

by the following two formulas, respectively:

\[ \mathrm{MAE} = \frac{1}{|U|} \sum_{u \in U} \left( \frac{1}{|O_u|} \sum_{i \in O_u} |p_{u,i} - r_{u,i}| \right). \tag{2.4} \]

\[ \mathrm{RMSE} = \frac{1}{|U|} \sum_{u \in U} \sqrt{ \frac{1}{|O_u|} \sum_{i \in O_u} (p_{u,i} - r_{u,i})^2 }. \tag{2.5} \]
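The per-user averaging in (2.4) and (2.5) can be sketched as below. The toy ratings and predictions are hypothetical, and skipping users whose set of rated-and-predicted items is empty is an assumption of this example (the formulas leave that case undefined).

```python
import math

# Hypothetical data: user -> {item: value}. A missing key is a missing value.
ratings = {"u1": {"i1": 4, "i2": 2}, "u2": {"i1": 5}}
predictions = {"u1": {"i1": 3, "i2": 4}, "u2": {"i1": 5, "i2": 3}}

def per_user_errors(user):
    """Errors p - r over the items that have both a rating and a prediction."""
    rated = ratings.get(user, {})
    predicted = predictions.get(user, {})
    return [predicted[i] - rated[i] for i in rated if i in predicted]

def mae_rmse(users):
    """System MAE and RMSE as averages of the per-user values."""
    maes, rmses = [], []
    for user in users:
        errors = per_user_errors(user)
        if not errors:  # assumption: users with no comparable items are skipped
            continue
        maes.append(sum(abs(e) for e in errors) / len(errors))
        rmses.append(math.sqrt(sum(e * e for e in errors) / len(errors)))
    if not maes:
        return None, None
    return sum(maes) / len(maes), sum(rmses) / len(rmses)

mae, rmse = mae_rmse(["u1", "u2"])
```

RMSE is never smaller than MAE here because squaring penalizes the larger error on `i2` more heavily.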

The coverage metric measures the percentage of situations in which

at least one of the k-neighbors of an active user can rate an item that has not yet been rated by that

active user [119]. Let $K_{u,i}$ be the set of neighbors of a user $u$ that have rated an item $i$. The coverage of the system is the average of the users’ individual coverage. Let

\[ C_u = \{ i \in I \mid r_{u,i} = \bullet \wedge K_{u,i} \neq \emptyset \} \]

and

\[ D_u = \{ i \in I \mid r_{u,i} = \bullet \} \]

(the items that $u$ has not rated), then the coverage can be calculated using (2.6) [119]:

\[ \mathrm{coverage} = \frac{1}{|U|} \sum_{u \in U} \left( \frac{|C_u|}{|D_u|} \times 100 \right). \tag{2.6} \]
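A small sketch of (2.6) follows. The item list, ratings, and neighbor lists are hypothetical (in practice the neighbors would come from a k-nearest-neighbor step), and users with no unrated items are skipped as an assumption.

```python
# Hypothetical catalogue, ratings, and precomputed neighbor lists.
items = ["i1", "i2", "i3"]
ratings = {"u1": {"i1": 4}, "u2": {"i1": 3, "i2": 5}, "u3": {"i3": 2}}
neighbors = {"u1": ["u2"], "u2": ["u1"], "u3": ["u1", "u2"]}

def coverage():
    """Average percentage of unrated items that some neighbor has rated."""
    per_user = []
    for user, rated in ratings.items():
        unrated = [i for i in items if i not in rated]            # D_u
        if not unrated:  # assumption: nothing left to predict for this user
            continue
        predictable = [i for i in unrated
                       if any(i in ratings[v] for v in neighbors[user])]  # C_u
        per_user.append(len(predictable) / len(unrated))
    return 100.0 * sum(per_user) / len(per_user) if per_user else 0.0

system_coverage = coverage()
```

In this toy setup the three users have coverage 50%, 0%, and 100% respectively, which averages to 50%.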

2.4.2 Quality of the set of recommendations

A user’s satisfaction not only depends on accuracy, but it also depends on being

provided with a concise as well as diverse set of recommendations. The combination of

accuracy, diversity, and concise items list compose the quality of recommendations

[111]. The most common recommendation metrics used for quality measurement are

precision, recall, and F1 [136]. Precision is the fraction of relevant recommendations

out of the total recommendations. Recall is the fraction of all relevant items

that are actually recommended. F1 is the harmonic mean of recall and

precision. F1 is generally used because it considers both the values

of precision and recall and is based only on the positive (relevant) results [136].

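Suppose $X_u$ is the set of recommendations to user $u$, and $Z_u$ is the set of top $n$

recommendations to $u$. Let $\theta$ be the relevancy threshold, and let the recall, precision,

and F1 measures be evaluated for the recommendations obtained by making $n$ test recommendations to

user $u$. Assuming that all the users take $n$ test recommendations, precision, recall,

and F1 can be calculated by (2.7), (2.8), and (2.9), respectively [136], where $Z_u^c$ denotes the items not in $Z_u$:

\[ \mathrm{precision} = \frac{1}{|U|} \sum_{u \in U} \frac{|\{ i \in Z_u \mid r_{u,i} \geq \theta \}|}{n}. \tag{2.7} \]

\[ \mathrm{recall} = \frac{1}{|U|} \sum_{u \in U} \frac{|\{ i \in Z_u \mid r_{u,i} \geq \theta \}|}{|\{ i \in Z_u \mid r_{u,i} \geq \theta \}| + |\{ i \in Z_u^c \mid r_{u,i} \geq \theta \}|}. \tag{2.8} \]

\[ \mathrm{F1} = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}. \tag{2.9} \]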
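The three measures can be sketched as follows. The ratings, recommendation lists, and the function name `precision_recall_f1` are hypothetical, and treating unrated items as not relevant is an assumption of this example.

```python
# Hypothetical true ratings and top-n recommendation lists (n = 2).
ratings = {
    "u1": {"a": 5, "b": 2, "c": 4, "d": 5},
    "u2": {"a": 1, "b": 5, "c": 5},
}
recommended = {"u1": ["a", "b"], "u2": ["b", "c"]}

def precision_recall_f1(theta, n):
    """Precision (2.7), recall (2.8), and F1 (2.9) averaged over users."""
    precisions, recalls = [], []
    for user, top_n in recommended.items():
        rated = ratings[user]
        # Relevant items inside the recommendation list.
        hits = sum(1 for i in top_n if i in rated and rated[i] >= theta)
        # Relevant items that were not recommended.
        missed = sum(1 for i, r in rated.items()
                     if i not in top_n and r >= theta)
        precisions.append(hits / n)
        recalls.append(hits / (hits + missed) if hits + missed else 0.0)
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

precision, recall, f1 = precision_recall_f1(theta=4, n=2)
```

With a relevancy threshold of 4, `u1` receives one relevant item out of three available and `u2` receives both of its relevant items, so recall differs sharply between the two users even though the lists have equal length.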

2.4.3 Quality of the List of Recommendations

A common issue in recommendation systems is that users often select only

the first item from a large list of recommendations. Ignoring the rest of the list of

recommendations may affect the selection of recommendations as there may be some

quality recommendations down the list. To address the aforementioned issue, ranking

metrics are often used by researchers. Such ranking metrics include half-life and

discounted cumulative gain. Half-life [1] assumes that the interest of a user

decreases exponentially as the user moves away from the recommendations at the top,

that is, the selection probability of a relevant recommendation

decreases exponentially down the list. Half-life is calculated using (2.10). In discounted

cumulative gain [137], it is assumed that the selection probability of a relevant

recommendation decreases logarithmically down the list and can be calculated using

(2.11).

\[ \text{Half-life} = \frac{1}{|U|} \sum_{u \in U} \sum_{i=1}^{n} \frac{\max(r_{u,p_i} - d, 0)}{2^{(i-1)/(\alpha-1)}}. \tag{2.10} \]

\[ \mathrm{DCG}^k = \frac{1}{|U|} \sum_{u \in U} \left( r_{u,p_1} + \sum_{i=2}^{k} \frac{r_{u,p_i}}{\log_2(i)} \right). \tag{2.11} \]

In (2.10) and (2.11), the recommendation list is represented

by $p_1, p_2, p_3, \ldots, p_n$, $r_{u,p_i}$ is the true rating of the user $u$ for the item $p_i$, $k$ is the rank

of the evaluated item, $\alpha$ is the position of the item in the list such that there is a 50%

chance that the user will rate that item, and $d$ is the default rating.
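The two ranking metrics can be sketched as below for lists of true ratings given in recommendation order. The function names, the toy lists, and the parameter values `d = 3`, `alpha = 2`, and `k = 3` are assumptions chosen for illustration; `alpha` must be greater than 1 for the exponent to be defined.

```python
import math

def half_life(ranked_ratings, d, alpha):
    """Inner sum of Eq. (2.10): exponentially decaying utility of one list."""
    return sum(max(r - d, 0) / 2 ** ((i - 1) / (alpha - 1))
               for i, r in enumerate(ranked_ratings, start=1))

def dcg(ranked_ratings, k):
    """Inner term of Eq. (2.11): logarithmically discounted gain at rank k."""
    top = ranked_ratings[:k]
    if not top:
        return 0.0
    return top[0] + sum(r / math.log2(i)
                        for i, r in enumerate(top[1:], start=2))

def mean_over_users(metric, lists, **params):
    """Average a per-list metric over all users, as in (2.10) and (2.11)."""
    return sum(metric(l, **params) for l in lists.values()) / len(lists)

# Hypothetical true ratings of each user's recommended items, in list order.
lists = {"u1": [5, 3, 4], "u2": [2, 5, 1]}

system_half_life = mean_over_users(half_life, lists, d=3, alpha=2)
system_dcg = mean_over_users(dcg, lists, k=3)
```

Both metrics reward `u1`'s list more than `u2`'s, because `u1`'s highest-rated item sits at the top of the list where the discount is weakest.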

2.4.4 Novelty and Diversity

Novelty metrics calculate the degree of difference between the items recommended

to a user and the items already known by that user. Alternatively, the

diversity metric calculates the internal difference among the recommendations. At present,

no standard metric has been defined for novelty and diversity. Therefore, different metrics

have been proposed by researchers [60]. Most of the authors used the following mathematical

calculations to find the novelty and diversity in recommendations [100].

\[ \mathrm{diversity}_{Z_u} = \frac{1}{|Z_u| (|Z_u| - 1)} \sum_{i \in Z_u} \sum_{j \in Z_u, j \neq i} \left[ 1 - \mathrm{sim}(i, j) \right]. \tag{2.12} \]

\[ \mathrm{novelty}_i = \frac{1}{|Z_u| - 1} \sum_{j \in Z_u} \left[ 1 - \mathrm{sim}(i, j) \right], \quad i \in Z_u. \tag{2.13} \]

In (2.12) and (2.13), the set of $n$ recommendations to user $u$ is represented by $Z_u$, and the item-item

memory-based similarity measure is represented by $\mathrm{sim}(i, j)$.
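A minimal sketch of (2.12) and (2.13) is given below. The similarity table is hypothetical and stands in for whatever item-item similarity measure the system uses; `sim(i, i) = 1` is assumed, so an item contributes nothing to its own novelty.

```python
# Hypothetical symmetric item-item similarities for one recommendation list.
sim_table = {
    frozenset(("a", "b")): 0.8,
    frozenset(("a", "c")): 0.2,
    frozenset(("b", "c")): 0.5,
}

def sim(i, j):
    """Item-item similarity; an item is assumed fully similar to itself."""
    return 1.0 if i == j else sim_table[frozenset((i, j))]

def diversity(z_u):
    """Eq. (2.12): average pairwise dissimilarity within the list."""
    n = len(z_u)
    total = sum(1 - sim(i, j) for i in z_u for j in z_u if i != j)
    return total / (n * (n - 1))

def novelty(i, z_u):
    """Eq. (2.13): average dissimilarity of item i to the rest of the list."""
    return sum(1 - sim(i, j) for j in z_u) / (len(z_u) - 1)

z_u = ["a", "b", "c"]
list_diversity = diversity(z_u)
item_novelty = {i: novelty(i, z_u) for i in z_u}
```

Item `c` comes out as the most novel entry because it is the least similar to the other two recommendations. Both functions need at least two items in the list.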

2.4.5 Stability

A user has more trust in a recommendation system when the recommendations

generated by the system best match the user’s preferences. A recommendation system

is known as stable when the recommendations generated do not deviate over a short

period [138]. The metric defined for the evaluation of the stability of recommendation

systems is Mean Absolute Shift (MAS) [124]. The MAS metric considers a set of

known ratings $R_1$ given by the users and a set of predictions $P_1$ for the unknown ratings. After a period of

time, the users rate some of the previously unrated items, and new predictions $P_2$ are

generated by the system. MAS can then be calculated as (2.14) [124]:

\[ \mathrm{stability} = \mathrm{MAS} = \frac{1}{|P_2|} \sum_{(u,i) \in P_2} |P_2(u,i) - P_1(u,i)|. \tag{2.14} \]
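The shift in (2.14) can be sketched as follows. The two prediction snapshots are hypothetical, and the function assumes that every (user, item) pair in the later snapshot was also predicted in the earlier one; pairs missing from the earlier snapshot are ignored.

```python
def mean_absolute_shift(p1, p2):
    """Eq. (2.14): average absolute change between two prediction snapshots.

    p1, p2: dicts mapping (user, item) -> predicted rating.
    """
    shared = [key for key in p2 if key in p1]
    if not shared:
        return 0.0
    return sum(abs(p2[key] - p1[key]) for key in shared) / len(shared)

# Hypothetical predictions before and after users add new ratings.
# ("u1", "a") drops out of the second snapshot because u1 has now rated it.
p1 = {("u1", "a"): 4.0, ("u1", "b"): 3.0, ("u2", "a"): 2.5}
p2 = {("u1", "b"): 3.5, ("u2", "a"): 2.5}

mas = mean_absolute_shift(p1, p2)
```

A value near zero indicates a stable system: the predictions that remain after the new ratings arrive have barely moved.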

2.4.6 Reliability

When a user gets a recommendation, it is important to know whether the

recommendation is valuable for the user or not. A valuable recommendation is

considered reliable for the user. The most commonly used metrics to find reliability of

the recommendations are Pearson's Correlation (COR), the Mean Squared Difference

(MSD), Pearson's Correlation Constrained (CORC), Spearman's Rank-Order

Correlation (SR), and the Jaccard plus MSD (JMSD) [139]. The reliability measures

are proposed according to the notion that “more reliable a prediction, the less liable to

be wrong” [1]. However, the reliability metric is only used to evaluate

recommendation systems based on the K-nearest neighbor algorithm. It is based on the

numeric factors $S_{u,i}$, as shown in (2.15), and $V_{u,i}$, as shown in (2.16), where $S_{u,i}$ is the

similarity of the neighbors used for computing recommendations, and $V_{u,i}$ is the

dissimilarity among the ratings of the neighbors. The reliability is calculated using

(2.17) [119].

\[ f_S(S_{u,i}) = 1 - \frac{\bar{s}}{\bar{s} + S_{u,i}}, \quad \text{where } S_{u,i} = \sum_{v \in K_{u,i}} \mathrm{sim}(u,v). \tag{2.15} \]

\[ f_V(V_{u,i}) = \left( \frac{\max - \min - V_{u,i}}{\max - \min} \right)^{\frac{\ln 0.5}{\ln \frac{\max - \min - \bar{v}}{\max - \min}}}. \tag{2.16} \]

\[ \mathrm{Reliability} = \frac{\sum_{v \in K_{u,i}} \mathrm{sim}(u,v) \, (r_{v,i} - \bar{r}_v - p_{u,i} + \bar{r}_u)^2}{\sum_{v \in K_{u,i}} \mathrm{sim}(u,v)}. \tag{2.17} \]

In (2.15), (2.16), and (2.17), $\bar{s}$ and $\bar{v}$ are the medians of $S_{u,i}$ and $V_{u,i}$,

respectively. $K_{u,i}$ is the set of neighbors of user $u$ that have rated the item $i$, and $\max$ and

$\min$ bound the discrete rating values. Table 2.5 summarizes the evaluation metrics used in

various research works.

Table 2.5 Summary of Evaluation Metrics

Criteria                                  Metrics                          References
Accuracy                                  MAE, NMAE, MSE, and RMSE         [88, 119]
Quality of the set of recommendations     Precision, Recall, and F1        [111, 136]
Quality of the list of recommendations    Half-Life, DCG                   [1, 137]
Novelty and diversity                     No standard metric defined       [60, 100]
Stability                                 MAS                              [124]
Reliability                               COR, MSD, CORC, SR, and JMSD     [1, 139]
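The quantities in (2.15) to (2.17) can be sketched in a few lines. The code below is only an illustration under stated assumptions: it reads (2.16) with the exponent ln 0.5 / ln((max − min − median) / (max − min)) and (2.17) with a squared deviation, as in the neighborhood-based formulation of reliability; the function names, argument layout, and sample values are hypothetical.

```python
import math


def f_s(s_ui, s_bar):
    """Equation (2.15): rises from 0 toward 1 as the neighbor similarity s_ui grows."""
    return 1.0 - s_bar / (s_bar + s_ui)


def f_v(v_ui, v_bar, r_min, r_max):
    """Equation (2.16): 1 when v_ui is 0 and 0.5 when v_ui equals the median v_bar.

    Assumes 0 < v_bar < r_max - r_min so that both logarithms are defined.
    """
    span = r_max - r_min
    exponent = math.log(0.5) / math.log((span - v_bar) / span)
    return ((span - v_ui) / span) ** exponent


def weighted_rating_disagreement(neighbors, p_ui, r_bar_u):
    """Equation (2.17): similarity-weighted squared deviation of the neighbors' ratings.

    neighbors is a list of (sim_uv, r_vi, r_bar_v) tuples for the users in K_{u,i}.
    """
    numerator = sum(
        sim * (r_vi - r_bar_v - p_ui + r_bar_u) ** 2 for sim, r_vi, r_bar_v in neighbors
    )
    denominator = sum(sim for sim, _, _ in neighbors)
    return numerator / denominator


# Hypothetical neighborhood on a 1-5 rating scale.
neighbors = [(1.0, 4.0, 3.0), (1.0, 2.0, 3.0)]
disagreement = weighted_rating_disagreement(neighbors, p_ui=3.0, r_bar_u=3.0)
similarity_factor = f_s(s_ui=2.0, s_bar=2.0)
variance_factor = f_v(v_ui=disagreement, v_bar=1.0, r_min=1, r_max=5)
```

Both factors are scaled so that a value at the median maps to 0.5, which makes the similarity factor and the disagreement factor comparable.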

2.5 Cloud Computing in Recommendation Systems

Cloud computing is one of the most widely used IT paradigms for providing internet-based computing. End users can access remote cloud resources over an internet link, which eliminates the need for dedicated local computing resources. Cloud computing provides a set of server, network, and storage devices housed in data centers for various applications. The data centers can vary in size from a small room to the size of a football stadium, depending on the requirements of the IT facility. A wide range of applications utilize cloud computing facilities, including but not limited to image processing, big data analysis, recommender systems, and remote file storage. The applications and their data are hosted on cloud servers housed in the data centers. Usually, more than one application resides on a cloud server, as the cloud manages the applications flexibly based on their resource demands. This flexibility enables a pay-as-you-go pricing model that favors both small and large business enterprises. Business enterprises avoid the capital costs of acquiring computing infrastructure and use cloud services with monthly or yearly payment plans based on their overall resource utilization. Cloud computing has gained immense popularity due to these flexible characteristics, and multiple businesses and industries have migrated their computational requirements to the cloud infrastructure. As a result of the increasing popularity, the cloud computing industry had an annual revenue of $250 billion in 2016 [140]. Moreover, the cloud computing industry consumed 2.4% of global electricity [140].

Cloud computing is an ideal execution platform for applications that either require large computations or store data extensively. Big data applications are a natural fit for the cloud computing paradigm, as they require extensive data processing and storage for analyses that reveal trends and associations in the data. Cloud data centers housing thousands of compute and storage nodes support big data applications with flexible resource management techniques. The data for big data applications is captured from sources such as IoT devices, sensors, and mobile devices and is stored on the cloud servers [141]. Big data applications then execute over the data to produce end results and reports after extensive analysis.

Applications of recommender systems, image processing, e-health, bioinformatics, and smart cities can be categorized as big data applications based on the scale of their volume, variety, velocity, and value. Any scientific application integrating with cloud computing has to develop tools for parallel processing that exploit the cloud resources efficiently. Big data and cloud computing are linked at the application level through the Hadoop and MapReduce frameworks [142]. Hadoop is a distributed framework for data storage and processing that can be easily mapped onto cloud servers. The distributed data storage of Hadoop is based on the Hadoop Distributed File System (HDFS), while the compute component of Hadoop is based on the MapReduce programming model [142]. In terms of big data, the Map element performs a query, such as sorting, on the data, while the Reduce element summarizes the query results to form a report after query-based analysis. Conventionally, the Hadoop and MapReduce frameworks provided scalability but did not provide efficiency when compared to parallel databases [143]. However, as existing big data activities are performed in large-scale data centers with the help of the distributed Hadoop and MapReduce frameworks, they are ultimately more efficient than conventional parallel processing compute applications [144]. To provide ease of use to the end user, the Hadoop and MapReduce frameworks manage task scaling, failover, and parallelism by themselves while optimizing the overall query performance [145].
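The Map and Reduce roles described above can be illustrated with a single-process sketch. It does not use Hadoop; the function names and the check-in records are assumptions made for illustration, and the in-memory `shuffle` stands in for the grouping that a real framework performs across nodes.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (venue, 1) pair for one check-in record."""
    _user, venue = record
    yield venue, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(venue, counts):
    """Reduce step: summarize all values emitted for one venue."""
    return venue, sum(counts)


# Hypothetical (user, venue) check-in records.
check_ins = [("u1", "cafe"), ("u2", "cafe"), ("u1", "cinema"), ("u3", "cafe")]

mapped = [pair for record in check_ins for pair in map_phase(record)]
grouped = shuffle(mapped)
visits_per_venue = dict(reduce_phase(v, c) for v, c in grouped.items())
```

Because each `map_phase` call touches one record and each `reduce_phase` call touches one key, both steps can be distributed over many nodes without changing the user's query logic.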

Figure 2.3 Cloud Computing

The user does not need to write sophisticated programs for parallel processing of the data. The end user is only required to formulate a query for the data analysis, as shown in Figure 2.3. In the basic integration model of cloud and big data, the big data sources are connected to the cloud servers, which store the data in a distributed manner over multiple servers with the help of HDFS. The big data sources, including sensors and mobile devices, require internet connectivity to store data on the cloud servers. The Hadoop framework then executes on the distributed data to produce analytical reports after operations such as filtering, sorting, and complex analytical queries. Similar to the distributed data storage of Hadoop, the MapReduce element also functions in parallel while adapting to the multi-server architecture of the cloud. Three techniques can be adopted for efficient integration of the cloud and the MapReduce framework: (a) the cloud MapReduce runtime, which is an extension of the basic MapReduce framework, (b) utilizing MapReduce as a cloud service, and (c) installing MapReduce independently on a set of cloud servers for distributed parallel processing [145, 146].

Modern recommender systems take user preferences from sources like social networks,

sensors, and mobile devices. Due to the nature of the data sources, most of the data on

which modern recommender systems operate is big data. The data extracted from these

sources can have details of locality, user preferences, and decisions. Therefore,

extensive storage resources are required to perform the tasks of recommender systems.

Moreover, recommender systems make predictions and recommendations after filtering

data and addressing issues like data diversity, accuracy, and stability [1]. Therefore, the

resources and programs required for performing the operations of recommender

systems efficiently cannot be limited to a single compute node. Hence, recommender

systems often utilize cloud computing based infrastructure for effective real-time

implementation. The MapReduce framework needs to be aware of data locality in order

to perform its tasks efficiently [143].

2.6 Summary

In this chapter, a systematic review of the scientific literature is presented that summarizes the efforts and contributions of researchers in the area of recommendation systems. First, basic filtering techniques, such as content-based and collaborative filtering, are discussed along with the hierarchy of these techniques as used in recommendation systems. Secondly, the classification of criteria for recommendations

that include accuracy, familiarity, novelty, diversity, content compatibility, justification

of recommendations, and sufficiency of information is discussed in detail. The

similarity calculations and evaluation metrics used in the area of recommendation

systems are also presented in detail. Moreover, this research has critically investigated

the challenges in the design and implementation of recommendation systems.

This research summarizes the following observations: (1) the quality of recommendations can be improved by using all kinds of additional information, such as check-in data, geographical information, social relationships, and temporal information; (2) model-based approaches are more effective and efficient than memory-based approaches, and the performance of model-based approaches is also consistent; and (3) quality recommendations can only be achieved by using user-based recommendation approaches rather than item-based approaches. Among the criteria for recommendations, the following observations are made: (a) accuracy alone is not sufficient for the selection of a relevant algorithm, (b) users are more likely to be interested in familiar items than in unfamiliar ones, and (c) users prefer diverse recommendations over merely more accurate recommendations.

The next chapter attempts to overcome the problems identified in this chapter for recommendation systems. The focus is on generating real-time recommendations while handling the problems of cold start, data sparseness, and scalability. To address the scalability challenges, cloud computing is used to perform real-time processing of large-scale data. The MapReduce framework is also used to enable parallel computation on multiple nodes. Moreover, the data is pre-processed to filter insignificant or redundant data out of the online recommendation process. To handle the cold start problem, our model maintains a pre-computed ranked list of popular users and venues. Using this list, our model compares the preferences of the new user with those of the existing popular users. The model then generates recommendations according to the top user whose preferences best match those of the new user.


Chapter 3 Real-Time Venue Based Recommendation System


3.1 Location Based Recommender Systems

The advent of location sharing in social networking services helps strengthen the association between real-world social networks and online social networking services [147]. An LBRS is a system in which people share location-embedded information with each other [148]. Users in an LBRS share location-tagged media content such as text, videos, and photos [149]. The location parameters comprise the current location of the user with a timestamp as well as the location history of the given user for a certain period. When the same location is shared by two or more users, the information also includes knowledge of the users’ common behavior, interests, and activities extracted from the users’ location histories and location-tagged information [27]. Existing LBRS services can be categorized as geo-tagged media based, point location based, and trajectory based, as shown in Figure 3.1.

Figure 3.1 Services offered by LBRS

3.1.1 Geo Tagged Media Based

Geo-tagged media-based services allow users to attach a location to their media content, such as text, videos, and photos, that was created in the physical world [150]. Passive tagging occurs when a user explicitly creates and adds the contents along with the location [150]. Geo-tagged media-based services allow a user to view other users’ content in a geographical context by using digital maps on smart phones [151]. Popular applications that provide LBRS services include Geo Twitter¹, Flickr², and Panoramio³. It has been shown in [33] that the addition of location dimensions alone, such as longitude and latitude, does not necessarily attract users, as users are more interested in the actual media content. Therefore, location information only acts as an add-on that enriches and unifies the media content. The authors in [33] further indicated that the addition of the location feature does not have much impact on the connections and relationships among users; rather, it is the media content that is responsible for such connections and relationships.

3.1.2 Point Location Based

Point location based services allow users to add and share their locations, such as restaurants, shopping malls, or cinemas [20]. The most common applications offering such services are Foursquare, Instagram, and Facebook, which encourage users to share their current location. Users of such applications can perform a “check-in” at the locations they visit in their daily routine and share experiences and knowledge by giving a tip or feedback [32, 33]. For example, a user can share their views about a dinner with their community on an online social site while using their smart phone. Moreover, such services also keep track of the users’ geospatial check-in data, such as time and longitude/latitude [33]. In Foursquare, after a user checks in at different locations, the application awards badges and points to the user. The user with the highest number of visits to a particular location is crowned “Mayor”. One of the main advantages of services that allow real-time location tracking is that users can discover friends near their physical locations, which helps boost a user’s social activities in the physical world. For instance, after discovering a friend’s physical location from his/her social network, one can invite the friend to lunch or to go shopping. The use of tips or feedback in location-based services allows users to share comments and suggestions that can be either positive or negative. Such tips and feedback are pivotal in aggregating recommendations. Unlike in geo-tagged media services, the point location (venue) is the main component that determines the connections between users, and media content such as feedback, badges, and tips is associated with the point location [147].

¹ http://geo-twitter.appspot.com/
² https://www.flickr.com/
³ http://www.panoramio.com/

3.1.3 Trajectory Based

Trajectory based services allow users to add both point locations and the routes to those

point locations. The most common applications that offer such services are Bikely⁴,

Microsoft GeoLife⁵, and SportsDo⁶. Trajectory based approaches compute information

by extracting data about a user’s visit patterns at different locations, duration of stays,

and the paths selected. Such services allow users to add information such as speed,

distance, duration, and route about a specific trajectory, as well as the user’s media

contents such as tips, tags, and photos along with the given trajectory [26, 33]. Other

users of the same community can take guidance from the experiences of their friends

by following the trajectory using digital maps or smart phones. In summary, such

services offer both “how and what” along with “where and when”.

3.2 Distinguishing Features of Locations

In LBRS, the focus of the recommendations remains on the locations besides the media content. Therefore, it is important for recommendation systems to recognize and consider the unique features of locations in order to make recommendations that meet the criteria of both accuracy and quality. The distinguishing features of locations are as follows.

3.2.1 Location Hierarchy

There are multiple scales at which a location can be considered. A location can be a small shop or restaurant, or it can be a big town or city. Such locations, whether smaller or bigger, form a hierarchy in which locations at the bottom refer to smaller geographical areas [152]. For instance, a restaurant, cinema, or shopping mall may belong to a community, the community belongs to a town, the town belongs to a country, and so on. The different levels of the location hierarchy lead to different user-location and location-location graphs. The authors of [27] suggested that even for identical location histories, different user-location and location-location graphs will be formed at different levels of the hierarchy. Considering the hierarchical relationships is essential for recommendation systems because these relationships play a very significant role in establishing the connections between users [67]. For example, users who share locations such as restaurants or shopping malls, which are lower-level items in the hierarchy, possibly have stronger connections than users who share locations such as towns or countries, which are at a higher level of the hierarchy. As a consequence, the hierarchical property in LBRS is unique and needs to be considered.

⁴ http://www.bikely.com/
⁵ http://research.microsoft.com/en-us/projects/geolife/
⁶ www.sportsdo.com.br/

3.2.2 Distance of Locations and Users

The second distinguishing property of locations in LBRS is distance. To find the strength of the relationships and connections between users, distance must be considered. The shorter the distance, the stronger the relationship. Three geospatial distance relations are defined to compute the relationships and connections among users [67]: the distance between two users, the distance between a location and a user, and the distance between two locations. Each of these three distances affects recommendation systems in a different way. In the first case, the distance between two users indicates the similarity between the users. For instance, users who have a similar history of visiting the same locations are likely to have similar preferences and interests [125, 152], and users who live in the same residential area are possibly friends [153]. In the second case, the distance between a user and a location indicates the probability of the user being attracted to that location. For example, users visit nearby places more frequently than places that are far away from their homes [154]. In the last case, the distance between two locations indicates the association between the locations. For instance, restaurants and shopping malls are mostly situated near each other [21].
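The text does not specify how geospatial distance is measured. One common choice, shown here as an assumption rather than as the method used in the cited works, is the haversine great-circle distance between two latitude/longitude pairs. The same function applies to user-user, user-location, and location-location distances. The function name and the coordinates are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical coordinates of a user's home and a venue.
user_home = (33.6844, 73.0479)
venue = (33.7294, 73.0931)
distance = haversine_km(*user_home, *venue)
```

A recommender could, for example, discard candidate venues whose distance from the user exceeds a chosen radius before ranking the rest.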

3.2.3 Sequential Ordering

Most users visit their favorite places at regular intervals. Such regular visits create a relationship between users in the form of a sequential order. For example, if two users from the same company regularly dine at two different restaurants and then meet again at the cinema, an ordering of visits can be created that may show some common preferences among them [27] or that may reflect traffic conditions [155]. Figure 3.2 shows the visit patterns users can have in an LBRS. A user can visit different locations and annotate these locations with media content such as tips, tags, and feedback.

Figure 3.2 Visit patterns of users in a LBRS
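A sequential ordering of visits can be derived from timestamped check-ins. The sketch below is a hypothetical illustration: the record layout, the function names, and the sample data are assumptions, and counting consecutive venue pairs is only one simple way to expose shared visit patterns.

```python
from collections import Counter

# Hypothetical check-ins as (user, venue, ISO timestamp) tuples.
check_ins = [
    ("u1", "cinema", "2015-03-01T20:00"),
    ("u1", "restaurant_a", "2015-03-01T13:00"),
    ("u2", "restaurant_b", "2015-03-01T13:30"),
    ("u2", "cinema", "2015-03-01T20:10"),
    ("u1", "office", "2015-03-01T09:00"),
]


def visit_sequences(records):
    """Return each user's venues ordered by check-in time."""
    sequences = {}
    # ISO-8601 timestamps of equal length sort chronologically as strings.
    for user, venue, _ in sorted(records, key=lambda r: r[2]):
        sequences.setdefault(user, []).append(venue)
    return sequences


def transition_counts(sequences):
    """Count consecutive (from_venue, to_venue) pairs over all users."""
    counts = Counter()
    for venues in sequences.values():
        counts.update(zip(venues, venues[1:]))
    return counts


sequences = visit_sequences(check_ins)
transitions = transition_counts(sequences)
```

In this sample, both users end their day at the cinema after a restaurant visit, which is the kind of shared ordering the paragraph above describes.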

3.3 Motivation

A rapid increase in the use of mobile social networking applications has been witnessed in the past few years. These applications allow a user to update his status while visiting a venue. The status is updated through either a feedback-based or a tip-based mechanism [3, 26, 34]. The amount of data gathered by such applications on a daily basis is huge. On the basis of data collected through mobile social networks, numerous recommendation systems have been developed in recent years. These recommendation systems use location-based data to suggest venues to users based on spatial locality. However, generating real-time recommendations is still an open research issue because of the large diversity in the datasets of users’ historical check-ins [66, 67]. To generate an optimal recommendation for a user, the recommendation system has to consider the following factors: (a) personal preferences, (b) past check-ins, (c) current context, such as time and location, and (d) collaborative social opinions (other individuals’ preferences).

Numerous studies [20, 34, 70, 72, 87, 98, 101, 110, 127, 131, 148, 156, 157] have used the collaborative filtering (CF) method to solve venue recommendation problems. Legacy venue recommendation systems match a given user’s venue check-in profile with the profiles previously recorded for other users. The profile data is stored in a user-venue check-in matrix. Users with similar check-in patterns contribute their opinions on the venues they have visited. As a result, the recommendation system offers useful personalized recommendations to the target user. However, the following research gaps in previous works significantly impact the performance of legacy venue recommender systems:

Cold Start. The cold start problem in state-of-the-art CF recommendation systems [71] arises when a venue recommendation is needed for a new user. The main reason is that the system has only limited information about the new user with which to compute similarity measures on the user-venue matrix. As a result, the insufficient venue records yield zero values in the similarity computations, and these zero values substantially degrade the recommendation quality. Our proposed model addresses the cold start problem by generating recommendations from the Top-N users whose preferences best match those of the cold start user. The Top-N users must be nearest neighbors of the new user and also the most popular among those nearest neighbors.

Data Sparseness. Users who visit only a limited number of locations produce a sparse user-to-location check-in matrix. The negative effect of data sparseness is a non-optimal computation of the nearest neighbor set of users similar to a particular user, which in turn affects the accuracy of the recommendations. Moreover, the performance of existing LBRS is also affected by the sparseness of the user-to-user relationship matrix when it is directly manipulated with CF-based models [31]. In our proposed model, the problem of data sparseness is handled by introducing a preprocessing phase. The preprocessing phase uses the K-Means clustering mechanism to create clusters within the data and assign users to the clusters. Users within the same cluster are more likely to have similar preferences. This reduces the number of zero similarity values that occur when data is sparse and similarity is computed among users with non-matching preferences.

Scalability. Memory-based CF recommender systems use the rating data generated by users and apply simple approaches to compute the similarity among users or items [36, 72, 73]. Unfortunately, such systems suffer from scalability issues, because they must parse thousands of users in the user-venue matrix in real time. This approach is neither efficient nor scalable. To handle the scalability issue, a few works have applied model-based CF. Model-based approaches employ data mining and machine learning methods to find patterns in the training data and thereby reduce the size of the user-item rating matrix [1, 31]. However, there is a tradeoff between the reduced dataset size and the quality of recommendation: if the dataset is reduced to achieve fast online processing, the recommendation quality degrades. The immediate consequence of the issues listed above is suboptimal performance of CF-based recommendation models. Consequently, it might not be viable to exclusively employ a memory-based CF model for venue recommendations. To address the challenge of scalability, our model uses cloud computing to perform real-time processing of large-scale data. Moreover, the MapReduce framework is used to enable parallel computation on multiple nodes.
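The K-Means preprocessing mentioned under Data Sparseness can be sketched as follows. This is a minimal illustration, not the model's actual implementation: it assumes that each user is described by check-in counts per venue category, and the feature layout, function name, and data are hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain K-Means: returns (centroids, labels) for a list of equal-length vectors.

    Centroids are initialised from the first k points to keep the sketch
    deterministic; a real system would use a better initialisation.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        for idx, point in enumerate(points):
            labels[idx] = min(range(k), key=lambda c: math.dist(point, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical per-user check-in counts over three venue categories
# (food, shopping, entertainment).
users = {
    "u1": [9, 1, 0],
    "u2": [0, 8, 1],
    "u3": [8, 0, 1],
    "u4": [1, 9, 0],
}
names = list(users)
_, labels = kmeans([users[n] for n in names], k=2)
clusters = dict(zip(names, labels))
```

Restricting similarity computations to users in the same cluster avoids comparing users with non-matching preferences, which is where most zero similarity values come from.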

The core objective of our study is to employ the aforementioned factors efficiently in order to produce truly real-time venue recommendations. However, several issues negatively impact the performance of the real-time recommendation process, mainly driven by the cost and complexity of processing large-scale, diverse datasets [1, 31]. To scale efficiently, a recommendation system needs large-scale storage and computational resources. This study describes an approach that leverages cloud resources and offers service-based interfaces for processing, mining, comparing, and managing large-scale datasets as required by real-time recommendations. A model-based approach is used on cloud-based infrastructure to generate venue recommendations. The results of our research show that real-time recommendations can be generated by using model-based approaches on cloud infrastructure. Moreover, the dataset reduction is performed in the cloud environment using MapReduce. However, the tradeoff between recommendation quality and dataset reduction remains a hurdle: the quality of recommendations may be affected if the dataset is significantly reduced to improve the efficiency of online real-time processing. The proposed model addresses the scalability issues by putting forward a cloud-based architecture that distributes data-intensive and compute-intensive loads over cloud servers. To handle the cold start problem in venue recommendations, our model compares the preferences of the new user with those of the existing top (popular) users and generates recommendations according to the top user whose preferences best match those of the new user. To handle the data sparsity problem, our model eliminates sparse data in the pre-processing phase while clustering the venues.
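The cold start handling described above can be sketched as a lookup against a pre-computed list of popular users. The code is a hypothetical illustration: the preference vectors, the use of cosine similarity for matching, the field names, and the sample data are all assumptions and not details confirmed by the text.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend_for_new_user(new_user_prefs, popular_users, top_n=2):
    """Pick the popular user most similar to the new user and return that user's top venues."""
    best = max(popular_users, key=lambda u: cosine(new_user_prefs, u["prefs"]))
    return best["name"], best["venues"][:top_n]


# Hypothetical pre-computed list of popular users. "prefs" holds interest in
# (food, shopping, entertainment); "venues" is already ranked for that user.
popular_users = [
    {"name": "p1", "prefs": [5, 1, 0], "venues": ["cafe_a", "diner_b", "mall_c"]},
    {"name": "p2", "prefs": [0, 4, 5], "venues": ["cinema_d", "mall_c", "cafe_a"]},
]

# A new user has no check-ins, only preferences stated at sign-up (an assumption).
matched_user, suggestions = recommend_for_new_user([0, 2, 4], popular_users)
```

Because the list of popular users is computed offline, the online step is a comparison against a short list rather than against the whole user-venue matrix.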


3.4 Related Work

In recent years, several surveys have summarized commonly used techniques and challenges in the field of recommendation systems [1, 59, 114, 118, 124, 135, 158]. The authors of [1] provided a general overview of recommender systems and discussed various collaborative filtering algorithms. The authors also provided a classification in terms of similarity, neighborhood, predictions, and recommendations. Moreover, the aforementioned survey discussed various KNN schemes for recommendation systems and the cold start issues, along with evaluations. However, the authors did not specifically discuss location based recommendations. The articles presented in [98] and [59] focus on the research challenges of recommendation systems. The authors provided an overview of the major techniques, such as collaborative filtering, content based filtering, and hybrid recommendations, along with the various challenges faced by such techniques. The authors in [118] and [135] summarized the evaluation metrics and techniques used in recommendation systems. The latest survey on recommendation systems mainly focuses on neighborhood-based recommendation methods used for item recommendation [158].

Most of the above-mentioned literature provided a general overview and the research challenges of commonly deployed techniques in recommendation systems. To the best of our knowledge, no extensive study has been conducted on location based recommendation systems (LBRS) as presented here. A related study was presented in [27]. However, that study was conducted on the topic of location based social networks (LBSNs), whereas this research focuses specifically on LBRS. In [27], the authors restricted their analysis to the data sources used (e.g., user profiles, history of user-visited locations, and history of online user activities on LBSNs), the methodology employed for recommendation (for example, content based, collaborative based, and link-analysis based), and the objectives of the recommendations (for instance, users, locations, social media, and activities). In contrast, our study presents a qualitative comparison of various techniques proposed in LBRS, not only for individuals but also for group based location recommendations. Moreover, this research additionally discusses numerous significant services offered by LBRS. Such services are categorized as: (a) geo-tagged media based services, which allow users to attach a location to their media content, such as text, videos, and photos, that was created in the physical world, (b) point-location based services, which allow users to add and share their locations, such as restaurants, shopping malls, or cinemas, and (c) trajectory based services, which allow users to add both destination point locations and the routes to those destinations. Furthermore, the distinguishing features of locations that are utilized by LBRS for recommendations are also presented. The location features are categorized as: (a) location hierarchy, (b) distance of locations and users, and (c) sequential ordering. The criteria for building a user’s trust and confidence in a recommendation system are also discussed. The criteria include: (a) accuracy, (b) familiarity, (c) novelty, (d) diversity, (e) context compatibility, (f) justification of recommendations, and (g) sufficiency of information. Basic similarity calculations and evaluation metrics are also presented, such as (a) cosine based similarity [126], (b) correlation based similarity [127], and (c) adjusted cosine similarity [33, 125]. Finally, a comparative study and a tabular summary of the existing schemes are presented.

LBRS can be divided into two main categories: (a) generic location recommendation and (b) personalized location recommendation. In generic location recommendation, public opinions are extracted and the system recommends the most popular locations according to the extracted public opinions [159]. The limitation of such systems is that every user receives identical recommendations, because users’ preferences are not taken into account. Personalized location recommendation has been proposed to overcome this limitation. Such systems provide users with the most relevant locations according to the preferences given by the user [91].

Moreover, this research categorizes existing approaches [1, 31, 34, 92, 160] as (a) trajectory based, (b) explicit rating based, and (c) check-in based approaches. Trajectory based approaches use profiled data about the sequence of a user’s visits to various locations, the paths selected, and the duration of stays at those locations. In [36], a trajectory-based graphical model was proposed that keeps track of the routes most frequently traversed by users and recommends the most appropriate route to a new user. A personalized route recommendation method that is very similar to [36] is presented in [37]. The work in [159] mined GPS trajectory data to extract popular locations on the basis of users’ travel sequences. Although the aforementioned approaches propose locations on the basis of the users’ past trajectories, they are not able to distinguish places in terms of their categories, which is considered in our proposed RTVR framework. Rating-based recommendation exploits existing profiled rating data to recommend the most popular venues or travel routes in rural areas. The works in [161] and [162] designed models using collaborative filtering that take into account users’ profiled ratings to generate personalized venue recommendations. A few studies have proposed recommendations based on implicit ratings. Implicit ratings use the check-ins performed by users at different venues to estimate recommendations [33, 162]. For instance, the work in [156] applied the random-walk-with-restart method to the user-venue check-in matrix to generate personalized recommendations for the target user. The authors of [34] put forward a recommendation approach that identifies region-wise expert users, in addition to venues, from check-in data under various types.

The majority of the aforementioned approaches build their recommendations on memory-based CF models. These models enable the approaches to depict a user’s future preferences efficiently from profiled data. However, the approaches overlook the scalability issues caused by the large number of similarity computations over the user-venue matrix during the online recommendation process [71, 88]. In addition, they suffer heavily from data divergence, real-time recommendation, and cold-start problems because the user-location and user-check-in matrices contain limited data [107]. They also overlook the group recommendation problem and do not take into account the impact of real-world, time-varying conditions on the recommendation results.

A variety of location recommendation approaches are available, such as matrix factorization, explicit rating, implicit rating, route recommendation, location recommendation, and location-based group recommendation. The following subsections discuss some of the techniques used in LBRS in detail. This section is organized according to the different techniques used in LBRS, as depicted in Figure 3.3.


Figure 3.3 Categorization of Techniques used in LBRS

3.4.1 Matrix Factorization Techniques

In matrix factorization, a large matrix is factorized into two (or more) smaller matrices such that, when the smaller matrices are multiplied, the result approximates the original matrix (see footnote 7). Matrix factorization discovers the latent features underlying the interactions between two different kinds of entities, which helps in predicting ratings in collaborative filtering. Formally, let $U$ be a set of users and $L$ a set of locations. Let $R$ be the matrix of size $|U| \times |L|$ that contains all the ratings assigned by the users to the locations. To find $K$ latent features, two matrices $P$ (with dimensions $|U| \times K$) and $Q$ (with dimensions $|L| \times K$) must be found such that the product of $P$ and $Q^{T}$ approximates $R$, as indicated in Figure 3.4 and equation (3.1). The equation for matrix factorization can be given as

$$R \approx P \times Q^{T} = \hat{R} \qquad (3.1)$$

7 http://www.quuxlabs.com/blog/2010/09/matrix-factorization-a-simple-tutorial-and-implementation-in-python/

[Figure 3.3 content: Techniques used in LBRS. Types: Individual Recommendation, Group Recommendation. Methodologies: Matrix Factorization, Implicit Rating-based, Explicit Rating-based.]


Figure 3.4 Concept of Matrix Factorization

In this way, each row of $P$ represents the strength of the association between a user and the features, and each row of $Q$ represents the strength of the association between a location and the features. For a user $u_i$, the rating of a location $l_j$ is predicted by the dot product of the two vectors corresponding to $u_i$ and $l_j$ as

$$\hat{r}_{ij} = p_i^{T} q_j = \sum_{k=1}^{K} p_{ik}\, q_{jk} \qquad (3.2)$$
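The factorization in equations (3.1) and (3.2) can be sketched in a few lines of Python. This is a minimal illustration, assuming stochastic gradient descent with L2 regularization in the spirit of the tutorial cited in footnote 7. The names `factorize` and `predict`, the hyperparameters, and the toy rating matrix are assumptions made for illustration and are not taken from this research. The user-feature and location-feature matrices are called `P` and `Q` here, and zero entries are treated as missing ratings.

```python
import random


def factorize(R, K=2, steps=2000, alpha=0.01, beta=0.02, seed=0):
    """Approximate R (users x locations) by P (users x K) times Q^T.

    Entries equal to 0 are treated as unknown ratings and skipped.
    alpha is the learning rate and beta the regularization weight (assumed values).
    """
    rng = random.Random(seed)
    n_users, n_locs = len(R), len(R[0])
    P = [[rng.random() for _ in range(K)] for _ in range(n_users)]
    Q = [[rng.random() for _ in range(K)] for _ in range(n_locs)]
    for _ in range(steps):
        for i in range(n_users):
            for j in range(n_locs):
                if R[i][j] == 0:
                    continue
                # Error of the current prediction for an observed rating.
                err = R[i][j] - sum(P[i][k] * Q[j][k] for k in range(K))
                for k in range(K):
                    p_ik, q_jk = P[i][k], Q[j][k]
                    P[i][k] += alpha * (2 * err * q_jk - beta * p_ik)
                    Q[j][k] += alpha * (2 * err * p_ik - beta * q_jk)
    return P, Q


def predict(P, Q, i, j):
    """Predicted rating of user i for location j: dot product of row i of P and row j of Q."""
    return sum(p * q for p, q in zip(P[i], Q[j]))


# Toy user-location rating matrix (hypothetical values); 0 marks a missing rating.
R = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
]
P, Q = factorize(R)
print(round(predict(P, Q, 0, 0), 2))  # reconstruction of an observed rating
print(round(predict(P, Q, 0, 2), 2))  # estimate for a location user 0 has not rated
```

Because the objective is non-convex, different random seeds may lead to different factor matrices, even when the reconstructed ratings are similar.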

Matrix factorization is the most common recommendation approach in the literature [163]. It was first used in LBRS in [164], where interesting activities and locations were revealed for recommendation using an experimental Global Positioning System (GPS) dataset. Among the different matrix factorization models, the most popular in LBRS is the 0/1 scheme [21, 33], in which the value 0 denotes a non-visited location and 1 a visited location. Using the 0/1 model, the authors of [21, 33] studied social and geographical influence in point-of-interest (POI) recommendation based on CF techniques. Another model, proposed in [165], uses check-in frequencies to compute users’ preferences for venues; with this scheme, the author developed an LBRS using the matrix factorization method. Furthermore, the authors of [166] proposed a multi-center Gaussian model in combination with matrix factorization: the multi-center Gaussian model captures geographical influence, and matrix factorization is combined with social regularization to develop an LBRS. The main disadvantage of matrix factorization is the non-convexity problem [167]: the model has many local optima, and many iterations are required to determine whether a solution exists and whether it is global. Therefore, the time efficiency of matrix factorization techniques is affected by the non-convexity problem.
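The difference between the 0/1 scheme and the check-in frequency scheme can be illustrated with a short Python sketch. The check-in log, user names, and venue names below are hypothetical and serve only to show how the two user-venue matrices are built from the same data.

```python
from collections import Counter

# Hypothetical check-in log: one (user, venue) pair per check-in.
checkins = [
    ("u1", "cafe"), ("u1", "cafe"), ("u1", "park"),
    ("u2", "park"), ("u2", "museum"), ("u2", "park"), ("u2", "park"),
]

users = sorted({u for u, _ in checkins})
venues = sorted({v for _, v in checkins})
counts = Counter(checkins)

# Frequency scheme: entry = number of check-ins of the user at the venue.
frequency = [[counts[(u, v)] for v in venues] for u in users]
# 0/1 scheme: entry = 1 if the user visited the venue at least once, else 0.
binary = [[1 if c > 0 else 0 for c in row] for row in frequency]

print(venues)     # ['cafe', 'museum', 'park']
print(frequency)  # [[2, 0, 1], [0, 1, 3]]
print(binary)     # [[1, 0, 1], [0, 1, 1]]
```

Either matrix could then be passed to a factorization routine; the frequency scheme keeps information about repeated visits that the 0/1 scheme discards.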


3.4.2 Explicit Rating Techniques

Recently, many online social services have emerged that let users explicitly rate the locations they visit. In rating-based recommendation systems, existing rating data is used to generate recommendations that match user preferences. The authors of [26, 161] presented similar models that use users’ existing ratings to compute personalized CF-based location recommendations. A similar approach was proposed in [31], which applies an item-based CF method to all the ratings of locations in the user’s surroundings. These techniques may capture users’ preferences closely, but they are less effective in terms of scalability. In addition, a user-rating matrix with very few entries leads to the data sparseness problem in these approaches.

In addition to the model/memory-based categorization, LBRS such as [1, 27, 32, 92, 160] are also classified as trajectory-based, check-in-based, and explicit rating-based. A trajectory-based graphical model was proposed in [36], in which the routes most frequently travelled by users are recorded and the system recommends the best available route to a new user. Similarly, an approach for computing personalized route recommendations is proposed in [37]. The extraction of the most popular locations by mining GPS trajectories, based on users’ travel history, is presented in [27]. All of these techniques recommend locations based on the routes/trajectories users have visited, but they cannot properly differentiate locations in terms of their categories, which is the main task of our proposed framework. Moreover, most of the aforementioned techniques rely on memory-based CF models, which represent a user’s expectations based on the user’s previous activities. However, such techniques do not scale sufficiently when a massive amount of real-time data must be processed simultaneously. Table 3.1 shows an example of explicit rating in which each user has rated some locations.

Table 3.1 Example of Explicit Rating

Ratings of users on locations

User-location   l1   l2   l3   l4   l5   l6
u1              4    1    3    3    .    2
u2              2    -    4    4    5    1
u3              5    3    4    3    .    5
u4              2    .    .    .    .    2
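As an illustration of how explicit ratings such as those in Table 3.1 feed a memory-based CF prediction, the following Python sketch estimates a missing rating as a similarity-weighted average of other users’ ratings. It is a generic user-based CF sketch, not the specific method of [26], [31], or [161]. It assumes that both "." and "-" in the table denote a missing rating, and the function names `cosine` and `predict` are hypothetical.

```python
from math import sqrt

# Ratings from Table 3.1; None marks a location the user has not rated
# (assumption: both "." and "-" in the table denote a missing rating).
ratings = {
    "u1": [4, 1, 3, 3, None, 2],
    "u2": [2, None, 4, 4, 5, 1],
    "u3": [5, 3, 4, 3, None, 5],
    "u4": [2, None, None, None, None, 2],
}


def cosine(a, b):
    """Cosine similarity over the locations that both users have rated."""
    common = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not common:
        return 0.0
    dot = sum(x * y for x, y in common)
    norm_a = sqrt(sum(x * x for x, _ in common))
    norm_b = sqrt(sum(y * y for _, y in common))
    return dot / (norm_a * norm_b)


def predict(user, loc):
    """Similarity-weighted average of other users' ratings for location index loc."""
    num = den = 0.0
    for other, row in ratings.items():
        if other == user or row[loc] is None:
            continue
        sim = cosine(ratings[user], row)
        num += sim * row[loc]
        den += abs(sim)
    return num / den if den else None


print(predict("u1", 4))  # only u2 has rated l5, so the estimate follows u2's rating
print(predict("u4", 2))  # estimate for l3 from the ratings of u1, u2, and u3
```

The sketch also shows the sparseness issue mentioned above: u4 shares only two rated locations with every other user, so its similarities rest on very little evidence.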


3.4.3 Implicit Rating Techniques

Some of the proposed techniques build their models on implicit ratings [26, 33], in which the number of check-ins performed by a user at multiple locations is recorded. In [38], the authors applied a random-walk-with-restart approach to a user-location check-in matrix to compute personalized recommendations for a specific user. A similar recommendation approach is proposed in [32], which computes region-wise popular locations and users from check-in data. Figure 3.5 shows the basic concept of implicit ratings based on the number of check-ins performed by users. Users first check in when visiting different locations and later rate each location according to their experience there. The rating of the location, along with the check-in data, is stored in the user-location rating matrix. Different implicit rating models are then applied to this matrix to generate the final recommendation list.

Figure 3.5 Basic Concept of Implicit Rating using Check-Ins

3.4.4 Route Recommendation Techniques

The authors of [68] focused on the problem of traffic jams and long queues while traveling to tourist hotspots. Tourists suffer from road congestion and time lost in long queues, and they often miss their favorite spots because of traffic conditions. These problems may reduce self-driving tourists’ interest in, and satisfaction with, their favorite locations and hot spots [168]. The main idea is to eliminate the traffic jam problem by using personalization techniques and real-time traffic information. The techniques used include a Vehicle-to-Vehicle Communication System (V2VCS), fuzzy set theory, and Genetic Algorithms (GA). V2VCS [169] is used to share real-time traffic information among self-drive tourists. Fuzzy set theory [170], together with the “Technique for Order Preference by Similarity to Ideal Solution (TOPSIS)” [171], is used to score all the candidate routes automatically instead of asking the user to rate a huge number of candidate routes. The candidate routes are scored according to the user’s stated preferences and the real-time values of five route attributes: point-of-interest (POI), road condition, distance, fee, and traffic. A genetic algorithm [171] is used to find the most appropriate and relevant route among all the candidate routes according to the tourist’s preferences; the GA also explores the route scores and real-time traffic information. The research assumes that information about personalization and real-time context can be collected by the system instantaneously. To evaluate the system, a web prototype was designed to simulate self-drive tourists. The main contributions of the research are a novel technique for scoring all the candidate routes, a route generation technique, and the design of a personalized recommendation system that offers real-time route recommendations. However, the authors did not consider attributes such as travel time, route complexity, time of day, and location type. The flexibility that would allow tourists to customize their own route preferences was also not considered. Finally, V2VCS is not applicable in some countries because of the lack of infrastructure. Moreover, in many developing and underdeveloped countries, vehicles with embedded V2VCS sensors are the latest models and are not affordable for the majority of the population. Therefore, other platforms such as smartphones need to be considered for route recommendation services.
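The route scoring step described above can be illustrated with a classic (crisp) TOPSIS sketch in Python. This is not the implementation of [68]: the fuzzy set component is omitted, and the candidate routes, attribute values, and preference weights are hypothetical. The five columns follow the five route attributes named in the text, with POI and road condition treated as benefit criteria and distance, fee, and traffic as cost criteria (an assumption).

```python
from math import sqrt


def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] for each alternative (row of matrix).

    benefit[j] is True when larger values of criterion j are better.
    """
    n_crit = len(weights)
    # Vector-normalize each criterion column, then apply the weights.
    norms = [sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # Ideal and anti-ideal value per criterion.
    cols = list(zip(*v))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_best = sqrt(sum((x - b) ** 2 for x, b in zip(row, best)))
        d_worst = sqrt(sum((x - w) ** 2 for x, w in zip(row, worst)))
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical candidate routes; columns: POI count, road condition (1-5),
# distance (km), fee, traffic level (1-5).
routes = {
    "A": [5, 4, 30, 10, 2],
    "B": [3, 5, 25, 0, 4],
    "C": [1, 2, 40, 15, 5],
}
weights = [0.3, 0.2, 0.2, 0.1, 0.2]          # assumed user preference weights
benefit = [True, True, False, False, False]  # more POIs and better roads are good

scores = dict(zip(routes, topsis(list(routes.values()), weights, benefit)))
best_route = max(scores, key=scores.get)
print(scores, best_route)
```

In [68] a genetic algorithm searches the candidate routes; in this sketch the candidate set is small enough that taking the maximum score is sufficient.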

Similarly, a route recommendation technique is proposed in [69]. The authors addressed the problem that a user’s route preferences are influenced by many dynamic and latent factors that are difficult to model with existing techniques. To solve this problem, the CrowdPlanner [69] technique is used, a crowd-based route recommendation technique that draws on the crowd’s knowledge. A large-scale real trajectory dataset and hundreds of volunteers were involved in the experiments and evaluation of the technique. The main achievement of the work is the selection of the best route, which was verified by comparing the results of the technique with previous techniques, mining algorithms, and web services. The authors of CrowdPlanner did not consider quality control of popular route mining algorithms or the mining of latent factors that may affect the driving route.

3.4.5 Locations Recommendation Techniques

Location recommendations based on users’ preferences are also proposed in [34]. In addition to the user’s preferences, social opinions from local experts are considered in the final recommendations. The main focus of the work is to overcome the problem of data sparsity: the user-location matrix is sparse because users visit a limited number of locations, and the problem becomes more challenging when users travel outside their native city. A weighted category hierarchy (WCH) is used to model each user’s preferences, and a preference-aware candidate selection algorithm is used to select the candidate local experts whose social opinions are taken. A real dataset collected from Foursquare was used for evaluation. The authors claim that their system is more efficient than major LBRS such as Most-Preferred-Category-based (MPC) [27], Location-based Collaborative Filtering (LCF) [33], and Preference-based Collaborative Filtering (PCF) [72]. However, the authors did not consider real-world factors such as weather conditions and temporal features to improve the quality of recommendations.

In [154], the authors proposed the Location Aware Recommendation System (LARS), a technique that uses spatial properties for location recommendation that were not considered in previous techniques. The primary recommendation technique used in the work is collaborative filtering. The secondary technique is user partitioning, which is used to maintain an adaptive pyramid structure. The datasets used for evaluation are taken from MovieLens (footnote 8) and Foursquare (footnote 9). To improve the proposed technique, location attributes reflecting real-time, real-world factors should be considered.

The authors of [66] considered patterns of human mobility, one of the important aspects of recommendation, especially in LBRS. More than 10,000 frequent users of LBRS were studied to monitor their frequent mobility patterns. The system investigates the metadata related to the users’ locations, considering the type of location and its evolution over time. Users are then clustered based on their movement behavior, and the system predicts each user’s next movements. Users perform a check-in, using GPS-enabled devices, to share their current location with the system. A Foursquare dataset is used to evaluate the system. The authors claimed that the proposed system predicts human mobility efficiently compared with traditional systems and predicts mobility patterns better even when only a limited user history is available. However, the authors did not consider location prediction using temporal pattern models.

8 MovieLens: https://movielens.org

9 Foursquare: http://foursquare.com

The authors of [38] proposed an approach for modeling human activities and geographical areas by categorizing locations. A spectral clustering algorithm is applied to the similarities of users and locations in two cities, using a dataset from Foursquare. The approach allows urban neighborhoods to be compared within and across cities and communities to be identified based on visits to the same category of locations. The key achievements of the work include profiling users and dividing them into communities, and relating mobile phone users to their particular categories and locations. The authors did not consider temporal variations, which would allow users and areas to be characterized at certain periods of the day. User tips and comments, which could add semantic information, were also not considered. Figure 3.6 shows the overview of location recommendations.


Figure 3.6 Overview of Location Recommendations

3.4.6 Group Recommendation Techniques

With the rapid increase in the use of social networking services, the significance of recommendation models that consider group preferences has also increased. However, most traditional recommendation schemes do not take group-of-“friends” scenarios into account [38, 66, 69]. In [70], the authors proposed location-sensitive recommendations for ad-hoc social network environments. The paper proposed an approach known as spatial social union, which computes recommendations not only for a single user but also for a group of users. The approach computes the similarity between two users and generates multiple matrices derived from user-user, user-item, and user-location graphs. The online MovieLens dataset is used for evaluation. In [157], the authors proposed personalized event-based group recommendations. The main contribution of the work is group recommendation for users living in the same city. The localization property of users and groups was extracted, and a latent factor model was integrated with explicit location features to provide group recommendations. The dataset used is from Meetup, an online social media site. Table 3.2 summarizes the description and weaknesses of the state-of-the-art techniques, and Table 3.3 provides a summary of some of the selected techniques.


Table 3.2 Strength and Weaknesses of the Selected Techniques

Techniques | Description | Weaknesses
-----------|-------------|-----------
Liu, L., et al. [68] | Scoring of all the candidate routes, introducing route generation technique, and real-time recommendation of routes. | Some key route attributes were ignored, such as travel time, route complexity, time-of-day, and the location. Customization options for users to plan their routes were missing. Platforms such as smart phones need to be considered.
Bao, J., et al. [34] | Location-based Collaborative Filtering (LCF), and Preference-based Collaborative Filtering (PCF). Achieved better performance than Most-Preferred-Category-based (MPC). | Real-world factors such as weather conditions and temporal features were ignored.
Levandoski, J.J., et al. [154] | Spatial properties of the users are introduced using CF-based and user partitioning techniques that were previously ignored by traditional recommendation systems. | Location attributes of real-time world factors were not considered.
Su, H., et al. [69] | Use of mining algorithms and web services to find optimal route. | Quality control of popular route mining algorithms and mining latent factors were not considered.
Preoţiuc-Pietro, et al. [66] | Better prediction of human mobility patterns. Well-handled data sparsity problem. | Location predictions using temporal pattern models were not considered. Trajectory-based information was also not considered.
Noulas, A., et al. [156] | Profiling and dividing users in communities. Relating users of mobile phones with their particular category and locations. | Temporal variations for further characterization and including semantic information need to be considered.

Table 3.3 Summary of Techniques

Recommendations | Problems Addressed | Techniques | Dataset Used
----------------|--------------------|------------|-------------
Routes [68] | Traffic Jams, Long Queue | V2VCS, Fuzzy Set Theory, Genetic Algorithm | Questionnaires, V2VCS data
Location [34] | Data Sparsity, Cold Start | Hyperlinked Induced Topic Search | Foursquare
Location [37] | Uncertain Trajectories | User Explicit Ratings | Real Dataset
Location [162] | Personalized Routes | User Explicit Ratings | Questionnaire
Location [154] | Quality of Recommendations | K-Nearest Neighbor | Movie Lens, Foursquare
Routes [36] | Optimal Routes | Implicit Ratings | Open Street Map Project, Spatial Dataset
Locations [38] | Un-visited Locations | Random Walk with Restart | Gowalla, Foursquare
Locations [164] | Un-visited Locations and Activities | Matrix Factorization | Real Dataset using GPS Trajectories
Locations [165] | Spot Identification | Matrix Factorization | Austin, Texas (ATX) and New York City (NYC) Datasets
Locations [166] | Un-Visited Locations | Multi Centered Gaussian Model and Matrix Factorization | Gowalla
Group Recommendations [172] | Implicit Similarities | Hybrid Recommendations | Yahoo! Movie, Yahoo! Music
Locations [70] | Location Sensitive | Spatial social union | Movie Lens

3.5 Quantitative Analysis

Section 3.4 presented a qualitative analysis of some of the techniques proposed in the area of LBRS. This section presents a quantitative comparison of selected techniques from the literature to show how the performance of existing models is affected by different datasets. A model may perform well on one type of dataset but not on another. Evaluating the models on different datasets therefore gives a fairer comparison and provides insight into how a model’s performance is affected by an increase or decrease in data sparseness or dataset size. A quantitative analysis of four state-of-the-art recommendation techniques has been performed over the Foursquare, MovieLens, and Gowalla datasets. The details of the techniques and datasets used for the comparison are discussed below.

3.5.1 Datasets

Three real, publicly available datasets have been selected for experimentation. Each of the selected datasets is discussed below.


(a) Foursquare

Foursquare [28] lets users from all around the world share their location experience by performing a check-in. The Foursquare dataset has more than 10 billion check-in records with different attributes, including users, venues, ratings, social graphs, and the longitude and latitude of the location or venue.

(b) MovieLens

MovieLens [29] is a movie rating dataset made available online by a movie recommendation system (a project at the University of Minnesota). Datasets of different sizes are available on the MovieLens website for experimental purposes. The dataset used here consists of 1 million ratings for 900 movies by 700 users.

(c) Gowalla

Gowalla [173] is another dataset that is available online for experiments. It was collected by a location-based social networking website that is no longer operational. The dataset consists of 30 million check-ins by 0.3 million users at 2.8 million locations.

3.5.2 Techniques

Four state-of-the-art recommendation techniques, discussed below, are used for the quantitative analysis.

(a) Location Aware Recommendation System (LARS)

LARS [154] is a technique that uses spatial properties for location recommendation. Spatial items nearest in distance are given priority for selection as the final recommendation to the user. The primary recommendation technique used in the work is collaborative filtering. The secondary technique is user partitioning, which is used to maintain an adaptive pyramid structure.

(b) Geographical Probabilistic Factor Model (GPFM)

GPFM [174] is a technique based on user preferences and the mobility patterns of users. The technique captures the geographical influence on the check-in data provided by users. GPFM also uses check-in data and extracts users’ feedback to model user preferences.


(c) Latent Dirichlet Allocation (LDA)

LDA [104] is a technique for modeling user preferences according to the ratings provided by the users. In LDA, users are regarded as documents and rated items as the words of the documents. Item location and user information are not considered in the LDA model.

(d) Random Walk with Restart (RANDOM)

RANDOM [156] is a technique in which a random walker jumps between the nodes of a graph according to transition probabilities. The time spent on each node differs under certain assumptions. The walk approaches a steady state, which yields a vector of steady-state probabilities for the visited nodes. Moreover, at every step a constant probability allows the walker to jump back to the target node, so that nodes closer to the target node receive a higher rank.
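The random walk with restart described above can be sketched in Python as a power iteration over a user-venue check-in graph. This is a minimal illustration, not the implementation of [156]: the graph, the check-in counts, the restart probability of 0.15, and the function name `random_walk_with_restart` are assumptions. The sketch also assumes that every node has at least one edge.

```python
def random_walk_with_restart(adj, target, restart=0.15, iters=100):
    """Approximate steady-state visiting probabilities of a walker that jumps
    back to `target` with probability `restart` at every step.

    adj maps each node to a dict {neighbor: edge weight}; every node is
    assumed to have at least one edge.
    """
    nodes = list(adj)
    prob = {n: 1.0 if n == target else 0.0 for n in nodes}
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            total = sum(adj[n].values())
            for m, w in adj[n].items():
                # Follow an edge with probability proportional to its weight.
                nxt[m] += (1 - restart) * prob[n] * w / total
        nxt[target] += restart  # jump back to the target node
        prob = nxt
    return prob


# Hypothetical check-in counts: (user, venue) -> number of check-ins.
checkins = {
    ("u1", "cafe"): 3, ("u1", "park"): 1,
    ("u2", "park"): 2, ("u2", "museum"): 4,
    ("u3", "cafe"): 2, ("u3", "gym"): 1,
}

# Undirected bipartite graph with check-in counts as edge weights.
adj = {}
for (user, venue), count in checkins.items():
    adj.setdefault(user, {})[venue] = count
    adj.setdefault(venue, {})[user] = count

prob = random_walk_with_restart(adj, "u1")
venues = {venue for _, venue in checkins}
unvisited = venues - set(adj["u1"])
# Rank the venues that u1 has not visited by their steady-state probability.
ranking = sorted(unvisited, key=prob.get, reverse=True)
print(ranking)
```

Venues that are reachable from the target user through short, heavily weighted paths accumulate more probability and therefore appear earlier in the ranking.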

3.5.3 Experiments

To evaluate the effectiveness of our model, this research uses the following performance metrics: (a) precision, (b) recall, and (c) F-measure. Precision is the ratio of correct recommendations (true positives, $TP$) to the total number of recommendations ($TP$ + false positives, $FP$). The equation for calculating the precision is given in (3.3).

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (3.3)$$

Here, $TP$ represents the total number of correct recommendations (i.e., true positives) and $FP$ indicates the false positives. The term $TP + FP$ captures the total number of recommendations, including true positives and false positives.

Recall is the ratio of the hit set size to the total size of the test set. It is a measure of the recommendation coverage of a given recommendation system, as given below:

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (3.4)$$

Here, $FN$ indicates the false negatives, i.e., the relevant items in the test set that the system failed to recommend.

F-measure is the harmonic mean of precision and recall, as shown in (3.5).

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (3.5)$$
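Equations (3.3) to (3.5) can be computed for a top-N recommendation list as in the following Python sketch. The ranked list and the test set are hypothetical, and the function name `precision_recall_f` is an assumption made for illustration.

```python
def precision_recall_f(recommended, relevant):
    """Precision, recall, and F-measure of a recommendation list against a test set."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # correct recommendations
    fp = len(recommended - relevant)   # recommended but not in the test set
    fn = len(relevant - recommended)   # in the test set but not recommended
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


ranked = ["v3", "v7", "v1", "v9", "v4", "v2"]  # hypothetical ranked venues
test_set = {"v1", "v3", "v5"}                   # hypothetical held-out visits

for n in (2, 4, 6):
    p, r, f = precision_recall_f(ranked[:n], test_set)
    print(n, round(p, 3), round(r, 3), round(f, 3))
```

Evaluating the same ranked list at several values of N makes the tradeoff between the two metrics visible: a longer list can only keep or raise recall, while precision tends to fall once the additional items are not in the test set.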

3.5.4 Observations

Figures 3.7, 3.8, and 3.9 show the performance in terms of precision, recall, and F-measure on the Gowalla dataset. The performance is calculated up to a maximum of N = 20 because most techniques ignore values greater than 20 for top-N recommendation. Four techniques, namely LARS, RANDOM, LDA, and GPFM, are compared on all the datasets, starting with Gowalla. The precision, recall, and F-measure of all the selected techniques are high on the Gowalla and Foursquare datasets, as depicted in Figures 3.7 to 3.12. The key reason for this behavior is that online user behavior is not affected by the location factor. On the MovieLens dataset, therefore, more candidate items are probably available for the test user. As a result, these potentially related items receive higher rankings (i.e., larger ranking scores), which affects accuracy and performance. Moreover, as the number of recommended venues increases, recall increases, whereas precision is strongly affected, as shown in Figures 3.7, 3.10, and 3.13.

Figure 3.7 Precision of Gowalla Dataset
[Plot: Precision (y-axis, 0 to 0.9) versus Number of Recommendations (x-axis, 2 to 20) for LARS, RANDOM, LDA, and GPFM]


Figure 3.8 Recall of Gowalla Dataset

As can be seen, precision decreases as the list size increases, with the different features maintaining their relative ranking in performance. Recall shows the opposite trend and increases significantly as the recommendation list grows. Again, the dominating features outperform the others over the entire range of list sizes, as depicted in Figures 3.8, 3.11, and 3.14. This analysis illustrates the tradeoff between precision and recall that each feature faces. Real systems should therefore tune their results according to users’ preferences.

Figure 3.9 F-Measure of Gowalla Dataset
[Plots for Figures 3.8 and 3.9: Recall (y-axis, 0 to 0.9) and F-Measure (y-axis, 0 to 0.7) versus Number of Recommendations (x-axis, 2 to 20) for LARS, RANDOM, LDA, and GPFM]


Among all the techniques, RANDOM performs best in terms of precision, recall, and F-measure because it concurrently leverages several sources of data prior to applying encoding. Another reason is that RANDOM exploits user home location information. Moreover, RANDOM is a model-based technique, and recent literature has demonstrated the superiority of model-based approaches over memory-based methods [175]. The performance of the GPFM technique is low on all the datasets. The reason is that GPFM uses a Gaussian distribution to model each feature, estimated by partitioning the space into regions. In the next stage, it uses a multinomial distribution to model the user’s mobility over the regions, based on the footprints left in them. Therefore, GPFM recommends items with limited (or even no) rating information only if they lie inside the user’s activity area. However, GPFM performs poorly in comparison with the proposed techniques because it overlooks the user home location profile; as a result, it fails to recommend items efficiently when limited rating information is available. The initial increase in accuracy is explained by the data becoming denser within each location cell at low granularity; afterward, as item locations become coarse, accuracy decreases.

Figure 3.10 Precision of Foursquare Dataset
[Plot: Precision (y-axis, 0 to 0.7) versus Number of Recommendations (x-axis, 2 to 20) for LARS, RANDOM, LDA, and GPFM]


Figure 3.11 Recall of Foursquare Dataset

In addition, the results obtained for the Gowalla and Foursquare datasets are very similar to each other, which is notable because the two services use dissimilar interfaces and incentives for user participation. It was also noted that neither service had a location recommendation engine in place while the data was collected.

Figure 3.12 F-Measure of Foursquare Dataset
[Plots for Figures 3.11 and 3.12: Recall (y-axis, 0 to 0.8) and F-Measure (y-axis, 0 to 0.6) versus Number of Recommendations (x-axis, 2 to 20) for LARS, RANDOM, LDA, and GPFM]


Figure 3.13 Precision of MovieLens Dataset
[Plot: Precision (y-axis, 0 to 0.7) versus Number of Recommendations (x-axis, 2 to 20) for LARS, RANDOM, LDA, and GPFM]

One difference in performance concerns the home distance feature, which is consistently better on Gowalla than on Foursquare for all the performance metrics. The main reason is that the average number of check-ins per user is higher in Gowalla than in Foursquare, which allows the “home” location to be inferred more precisely and accurately. Also, for Gowalla, when all user social links and check-ins are available, random-walk-based models attain a larger performance gain because they can use more high-quality data to build the network structure. The characteristics of the obtained data differ from the traditional recommender system scenario. A key difference is that in other scenarios users reveal their preferences through ordinal ratings, whereas check-ins provide only numeric frequencies; as a result, negative feedback from users is almost absent.

Figure 3.14 Recall of MovieLens Dataset

Figure 3.15 F-Measure of MovieLens Dataset
[Plots for Figures 3.14 and 3.15: Recall (y-axis, 0 to 0.6) and F-Measure (y-axis, 0 to 0.6) versus Number of Recommendations (x-axis, 2 to 20) for LARS, RANDOM, LDA, and GPFM]

Moreover, the chosen data is very sparse, and many users and venues have only a single check-in. Meanwhile, in both datasets, a few places have an extremely high number of check-ins, while most places have only a few. The heterogeneity of user check-ins across places is therefore very high, and some venues reach high ranks of popularity. There are several possible reasons for this. Firstly, check-in data does not fully capture all the preferences of users. In fact, in contrast to web ratings, it tends to capture habitual behavior and discourages negative feedback. Secondly, like-mindedness may be sufficient to accurately model the reason why users visit venues.

3.6 Real-Time Venue Recommendation Model

The objective of this work is to design a real-time, venue-based recommendation system for individuals in a scalable manner. The huge volume of data provided by Facebook, Foursquare, and Twitter requires refinement so that the extracted information is specific and relevant to the user's query. To generate real-time recommendations in a scalable manner, the following factors are considered: (a) users' preferences, (b) current context such as time and location, (c) historic check-ins, (d) geospatial characteristics, (e) ratings, and (f) collaborative social opinion. A scalable cloud-based infrastructure is used for real-time recommendations. The main steps of the proposed RTVR model are depicted in Figure 3.16.

Figure 3.16 System diagram for RTVR Model


Step 1 and 2: Clustering of Venues

In steps 1 and 2, we use clustering to reduce the dataset size. We apply the K-Means clustering algorithm to the set of venues based on their geographical location. K-Means is used because of its low computational cost [176]. The K-Means algorithm returns a cluster label (cluster index) for each point based on its distance to the centroid (the cluster's center point) of each of the $k$ clusters. For each user $U_j$, we find the cluster in which the user lies by computing the Euclidean distance between the user's location and each of the $k$ cluster centers, as given in (3.6).

$$C_{clusters} = \arg\min \bigcup_{i=1}^{k} \Big\{ \sum_{j \in n_i} \lVert U_j - C_i \rVert \Big\} \qquad (3.6)$$

where $C_i$ is the location of the $i$-th cluster center, $U_j$ is the location of the $j$-th user, and $n$ is given by (3.7).

$$n = \sum_{i=1}^{k} n_i \qquad (3.7)$$

where $n_i$ is the number of venues in the $i$-th cluster and $n$ is the total number of venues over all clusters. To achieve real-time recommendations, all $n$ venues of (3.7) are clustered before the check-in information of the user is received.
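The offline clustering step can be illustrated with a minimal sketch of plain K-Means (Lloyd's iteration) on venue coordinates. This is an assumption-laden illustration, not the thesis implementation: the function name `kmeans`, the seed, the iteration count, and the sample coordinates are all hypothetical, and planar Euclidean distance is used on raw coordinate pairs.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means on coordinate pairs; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each venue goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels

# Hypothetical venue coordinates forming two well-separated groups.
venues = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, labels = kmeans(venues, k=2)
# Per-cluster venue counts n_i; their sum is the total n, as in (3.7).
sizes = [labels.count(c) for c in range(2)]
```

Because this runs before any user request arrives, its cost does not count toward the online response time.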

Venues are clustered because, without clustering, all venues must be considered when generating recommendations for each user, which takes significantly longer. With the clustering-based approach, only the venues in the cluster where the user is currently located are considered. To improve the accuracy and quality of recommendations, the neighboring clusters of the current cluster are also considered, as shown in (3.8).

�'�'$2'0���� = B �� = 1(�)��* C0��� ,�� = � � ��� − �����

�����

�D , � = � 2% 1 ,

(3.8)

where $n$ is the number of selected neighbor clusters, $C_i$ and $C_j$ are the $i$-th and $j$-th cluster centers, and $k$ is the number of parameters used to find the distance between $C_i$ and $C_j$. Reducing the data through offline partitioning into clusters significantly reduces time complexity and helps the system generate real-time recommendations.

Step 3: Placement of User in the Relevant Cluster

The RTVR model selects the most relevant cluster for each user. Each user is placed in the relevant cluster based on the distance between the user and the cluster centroid. Euclidean distance is used for this placement because it represents the physical distance between two points, particularly when used with K-Means clustering. Two points may lie in opposite directions from the centroid and still reside in the same cluster if their distances from the centroid are the same [177]. Given a point $A = (a_1, a_2, a_3, \ldots, a_n)$ and a point $B = (b_1, b_2, b_3, \ldots, b_n)$, the Euclidean distance between $A$ and $B$ is given by (3.9) [178].

$$d(A, B) = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + \cdots + (a_n - b_n)^2} = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2} \qquad (3.9)$$

The preferred venue may lie in a neighboring venue cluster. Thus, $n$ neighboring clusters are selected based on the distance from the selected cluster center to the other cluster centers. Figure 3.17 shows an example of user placement in the relevant cluster. When the user performs a check-in, the system places the user in the relevant cluster according to the geographical coordinates of the check-in. In the example, the system places the user in cluster 5.
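User placement and neighbor selection can be sketched as two nearest-center lookups. The helper names `place_user` and `neighbor_clusters`, the cluster centers, and the check-in location below are hypothetical; the sketch assumes planar Euclidean distance, as in (3.9).

```python
import math

def place_user(user_loc, centers):
    """Index of the cluster center closest to the user's check-in location."""
    return min(range(len(centers)), key=lambda i: math.dist(user_loc, centers[i]))

def neighbor_clusters(home, centers, n):
    """Indices of the n clusters whose centers are closest to the home cluster's center."""
    others = [i for i in range(len(centers)) if i != home]
    others.sort(key=lambda i: math.dist(centers[home], centers[i]))
    return others[:n]

centers = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (1.5, 0.5)]  # hypothetical cluster centers
user = (0.9, 1.2)                                            # hypothetical check-in location
home = place_user(user, centers)
# Candidate venues are later drawn from the home cluster plus its n nearest neighbors.
selected = [home] + neighbor_clusters(home, centers, n=2)
```

Only the venues in `selected` clusters need to be ranked, which is where the reduction in online work comes from.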


Figure 3.17 User Placement in the Relevant Cluster

Step 4 and 5: Ranking of the Venue

In the last phase, the RTVR model uses the KNN algorithm to rank all the selected venues and recommend the top-ranked venues to the user. KNN is used because it has been shown to perform efficiently for ranking, especially when used with MapReduce on large datasets [144]. Another advantage of using KNN with MapReduce is its low communication cost, which is why it is frequently used for ranking on the cloud with large datasets [179]. For all venues in the selected $n$ clusters, RTVR ranks the venues based on the user's preferences, including ratings provided by social friends, frequency of check-ins, user-to-venue distance, and the average rating of the venue. The top-ranked venues are then suggested as the final recommendation to the user, as shown in (3.10) [130].

$$Rank(U_j, V) = \frac{\lvert U_j \cdot v_i \rvert}{\lVert U_j \rVert \, \lVert v_i \rVert} \quad \forall\, v_i \in V \qquad (3.10)$$

where $V$ is the set of selected candidate venues and $U_j$ is the user's preferences.
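A ranking of this kind can be sketched by reading (3.10) as a cosine similarity between the user's preference vector and each candidate venue's feature vector. That reading is an assumption, and the function names, venue ids, and vectors below are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_venues(user_pref, venue_features, top_n):
    """Venue ids ordered by decreasing similarity to the user's preferences."""
    ordered = sorted(
        venue_features,
        key=lambda vid: cosine(user_pref, venue_features[vid]),
        reverse=True,
    )
    return ordered[:top_n]

user_pref = [1.0, 0.0, 1.0]  # hypothetical preference vector
venue_features = {
    "cafe": [2.0, 0.0, 2.0],
    "gym": [0.0, 3.0, 0.0],
    "park": [1.0, 1.0, 0.0],
}
top = rank_venues(user_pref, venue_features, top_n=2)
```

Cosine similarity ignores vector magnitude, so a venue with the same feature profile as the user's preferences scores 1.0 regardless of scale.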


3.6.1 System Architecture

The proposed model generates real-time recommendations for the end user. As depicted in Figure 3.18, the first step is the user input, i.e., acquiring the user's information from check-in data. A Foursquare dataset with over a million check-ins by different users is used. In the second step, the user's recommendation request is dispatched to the cloud infrastructure. All user and venue information, including users' preferences, check-ins, ratings, the social graph, and the locations of the venues, is stored on the cloud. In the third step, the clustered venue data, which has already been generated offline using K-Means clustering, is fetched. In the fourth step, the user is placed in the relevant cluster according to the location provided in the check-in. In the fifth step, the venues are ranked using KNN, and a top-ranked user-venue matrix is created to suggest the final recommendations to the user.

Figure 3.18 System Architecture

3.6.2 Proposed Algorithm

The proposed clustering-based venue recommendation algorithm is presented in Algorithm 1 and works as follows. First, the algorithm reads the venues one by one and calls the MapReduce mapping function on each venue. The distance is initially set to zero. Next, the minimum distance from the cluster centers is calculated. The new cluster is added to the list of clusters and tested for appropriate size, and the algorithm resizes the cluster if required. For this step, the minimum and maximum cluster sizes are found among the generated list of clusters. To achieve equal cluster sizes, a cluster is split into two or more clusters if the maximum size in the cluster list is greater than 1/k times the total size of the data. Otherwise, the clusters are merged and a new cluster list is generated.

Algorithm 1. Clustering using K-Means

Input: Venues Vmn

Output: Clustered Venues

1. Load Vmn
2. FilePointer = Map Vmn
3. Create Two Lists
4. List2 = List1
5. Call read (Vmn)
6. FilePointer = MapCluster ( )
7. Distance_Value = 0
8. read (distance_venue)
9. Distance_Value = minCenter ( )
10. New List of Cluster
11. if cluster size is too large or too low then
12.     Resize the cluster
13.     Clrmax = findMaxSize (ListofClusters)
14.     Clrmin = findMinSize (ListofClusters)
15. end if
16. if Clrmax > 1/k * total size then
17.     Resize the cluster
18. end if
19. Add Cluster to List of Clusters
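The map, group, and size-check steps of Algorithm 1 can be sketched in a MapReduce style as follows. This is a single-process illustration under stated assumptions: `map_venue`, `reduce_clusters`, and `oversized` are hypothetical names, the centers and venues are made up, and the actual splitting or merging of clusters is not shown.

```python
import math
from collections import defaultdict

def map_venue(venue, centers):
    """Map step: emit (index of the nearest cluster center, venue)."""
    idx = min(range(len(centers)), key=lambda i: math.dist(venue, centers[i]))
    return idx, venue

def reduce_clusters(pairs):
    """Reduce step: group venues by the cluster index emitted in the map step."""
    clusters = defaultdict(list)
    for idx, venue in pairs:
        clusters[idx].append(venue)
    return dict(clusters)

def oversized(clusters, k):
    """Clusters holding more than 1/k of all venues, i.e. candidates for splitting."""
    total = sum(len(members) for members in clusters.values())
    return [idx for idx, members in clusters.items() if len(members) > total / k]

centers = [(0.0, 0.0), (10.0, 10.0)]        # hypothetical cluster centers
venues = [(0, 1), (1, 0), (1, 1), (9, 9)]   # hypothetical venue coordinates
clusters = reduce_clusters(map_venue(v, centers) for v in venues)
to_split = oversized(clusters, k=2)
```

Here cluster 0 holds three of the four venues, which exceeds the 1/k share of two, so it is flagged for resizing.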

The proposed ranking of venues using the KNN algorithm is presented in Algorithm 2 and works as follows. First, the algorithm loads all clusters, the social graph of the users, the ratings provided by the users, the check-in information, and the list of users. The distances to all centroids are calculated, and the center with the smallest distance from the user is selected. Next, the neighboring clusters are selected. For each cluster, a feature matrix is constructed: for each venue, the average of the ratings given by the user's friends is calculated, the total number of check-ins is counted, the distance from user $U_j$ to venue $V_i$ is computed, and the average rating over all users is calculated. Ranks are then generated within each cluster, and the rankings from all nodes are combined. Finally, the top $N$ venues are selected for the final recommendation.

Algorithm 2. Ranking of venues using KNN

Input: Clusters, social graph, ratings, check-ins, users

Output: Ranked top $N$ venues

1. Load clusters, social graph, ratings, check-ins, users
2. for each user $U_j$
3.     calculate the distance $D_i$ and select the center with smallest distance
4.     $D_i = \mathrm{Argmin}\, \lVert U_j - C_i \rVert$
5.     select the neighboring cluster
6. end for
7. for each neighboring cluster $C$ construct the feature matrix with features
8.     for each venue in neighboring cluster $C$
9.         calculate Average (Venue . rating (friends $j$) )
10.         Count check-ins for venues
11.         calculate distance from user $U_j$ to venue $V_i$
12.         calculate average rating of venues by all users
13.         Find the rank within each neighboring cluster $C$
14.         Combine the ranking for all nodes
15.         Select Top-$N$ venues
16.     end for
17. end for
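The per-venue feature computation in steps 9 to 12 of Algorithm 2 can be sketched as below. The data layout is an assumption made for illustration: ratings are keyed by hypothetical (user id, venue id) pairs, check-in counts and venue locations are plain dictionaries, and planar Euclidean distance stands in for geographic distance.

```python
import math

def mean(values):
    """Arithmetic mean, or 0.0 for an empty list (for example, no friend rated the venue)."""
    return sum(values) / len(values) if values else 0.0

def venue_features(venue_id, user_loc, friends, ratings, checkins, venue_loc):
    """Return (friends' average rating, check-in count, user-to-venue distance,
    average rating over all users) for one venue."""
    all_ratings = [r for (uid, vid), r in ratings.items() if vid == venue_id]
    friend_ratings = [
        r for (uid, vid), r in ratings.items() if vid == venue_id and uid in friends
    ]
    return (
        mean(friend_ratings),
        checkins.get(venue_id, 0),
        math.dist(user_loc, venue_loc[venue_id]),
        mean(all_ratings),
    )

# Hypothetical inputs: u1 and u2 are the requesting user's friends.
ratings = {("u1", "v1"): 4, ("u2", "v1"): 2, ("u3", "v1"): 3, ("u1", "v2"): 5}
features = venue_features(
    "v1", (0.0, 0.0), {"u1", "u2"}, ratings,
    {"v1": 7}, {"v1": (3.0, 4.0), "v2": (1.0, 1.0)},
)
```

One such tuple per venue forms a row of the feature matrix that the ranking step consumes.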

3.6.3 Complexity of Clustering Algorithm

The first phase of the algorithm reads the list of venues and maps each venue to a cluster, which takes $O(n \times k \times \log k)$ time, where $n$ is the total number of venues and $k$ is the number of clusters. In the next phase, each cluster is evaluated and clusters are resized considering the maximum and minimum cluster sizes, with complexity $k \times (k \times \log k)$. Consequently, the complexity of the clustering algorithm is given by (3.11), which reduces to (3.12) and (3.13):

$$O\big((n \times (k \times \log k)) + (k \times (k \times \log k))\big) \qquad (3.11)$$

$$O\big((k \times \log k) \times (n + k)\big) \qquad (3.12)$$

$$O\big(\log k \times (n + k)\big) \qquad (3.13)$$
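The step from (3.11) to (3.12) is a factoring of the common term $k \log k$, which can be written out as follows. The remark on (3.13) is an interpretation rather than something stated in the text: dropping the leading factor $k$ appears to treat $k$ as a constant that does not grow with $n$.

```latex
\begin{aligned}
O\big(n \cdot k \log k + k \cdot k \log k\big)
  &= O\big(k \log k \,(n + k)\big)
  && \text{factor out } k \log k \\
  &= O\big(\log k \,(n + k)\big)
  && \text{assuming } k \text{ is a fixed constant}
\end{aligned}
```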

3.6.4 Complexity of Ranking Algorithm

In the first step, the algorithm calculates the distance between the user and all cluster centers and then selects the cluster with the least distance from the user, which takes $O(k \times \log k)$ time, where $k$ is the number of clusters. Afterwards, the feature matrix is calculated for each venue in the selected cluster, which takes time proportional to the average number of venues per cluster, $n/k$. Consequently, the complexity of the ranking algorithm is given by (3.14):

$$O\big(k \times \log k + n/k\big) \qquad (3.14)$$

3.7 Formal Verification

A basic introduction to HLPN, SMT-Lib, and the Z3 solver is provided below for the reader's understanding.

3.7.1 High Level Petri Nets

Petri nets are extensively applied for the mathematical and graphical modeling of a wide variety of systems, including parallel, distributed, stochastic, concurrent, asynchronous, and non-deterministic systems [180]. In this work, HLPN, a variant of conventional Petri nets, is used for the formal verification of our algorithm. An HLPN is a 7-tuple $N = (P, T, F, \varphi, R, L, M_0)$, where $P$ is the set of places, $T$ is the set of transitions such that $P \cap T = \emptyset$, and $F$ is the flow relation such that $F \subseteq (P \times T) \cup (T \times P)$. Moreover, $\varphi$ maps the places $P$ to data types, and $R$ denotes the set of transition rules. $L$ is a label on $F$, and $M_0$ represents the initial marking. The structure of the net is given by $(P, T, F)$, and the static semantics are given by $(\varphi, R, L)$, meaning that this information does not change throughout the system. In HLPN, a place may hold tokens of multiple types, and its type may be the cross product of two or more types. A transition is enabled only when its pre-condition is satisfied, and the pre-condition uses variables from the incoming flows. Likewise, when a transition fires, its post-condition uses variables from the outgoing flows.
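The enabling and firing behavior described above can be illustrated with a toy simulator. It is a deliberately simplified sketch, not an HLPN tool: the class `Net`, the token layout, and the latitude check used as a pre-condition are hypothetical, and only the place names UR and nC are borrowed from the model's place list.

```python
class Net:
    """Minimal Petri-net style simulator: places hold token lists, and a
    transition fires only when its pre-condition holds on an input token."""

    def __init__(self, places):
        self.marking = {place: list(tokens) for place, tokens in places.items()}

    def fire(self, source, target, pre, action):
        """Move the first token in `source` that satisfies `pre` to `target`,
        transformed by `action`. Returns True if the transition fired."""
        for token in self.marking[source]:
            if pre(token):
                self.marking[source].remove(token)
                self.marking[target].append(action(token))
                return True
        return False  # not enabled: no token meets the pre-condition

# Initial marking: one user request token (user id, lat, long) in place UR.
net = Net({"UR": [("u1", 33.7, 73.1)], "nC": []})
valid_latitude = lambda token: -90 <= token[1] <= 90
to_cluster = lambda token: (token[0], "cluster-5")
fired = net.fire("UR", "nC", pre=valid_latitude, action=to_cluster)
fired_again = net.fire("UR", "nC", pre=valid_latitude, action=to_cluster)
```

The second call does not fire because the input place is empty, which mirrors a transition that is not enabled.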

3.7.2 SMT-Lib and Z3 Solver

SMT-Lib (Satisfiability Modulo Theories Library) [181] is used to check the satisfiability of formulas over the theories under consideration. It offers a common input platform and benchmarking framework for evaluating such systems. SMT has been applied in many areas, including deductive software verification. Z3, a solver developed at Microsoft Research, is used with SMT-Lib to prove the theorems. As an automated satisfiability checker, Z3 verifies whether a set of formulas is satisfiable in the built-in theories of SMT-Lib. The correctness of the system is checked during the verification process. Bounded model checking determines whether the system terminates after a finite number of states for any of the input parameters. In this checking, (a) the system's description is provided, stating its properties; (b) the system is represented by a model; and (c) a verification system checks whether the specified properties hold in the model. Readers interested in a more detailed introduction to Petri nets are encouraged to see [182, 183]. The HLPN model for RTVR is shown in Figure 3.19.

The development of a Petri net model involves identifying the data types and places and mapping the data types to places. Tables 3.4 and 3.5 show the data types and their mappings, respectively. In the HLPN model for the proposed RTVR algorithms in Figure 3.19, the rectangular black boxes represent transitions and belong to the set $T$, whereas the circles represent places and belong to the set $P$.

Table 3.4 Data type for HLPN Model

| Data Type | Description |
|---|---|
| Lat | A number depicting latitude value |
| Long | A number representing longitude value |
| VID | A number representing logical ID of a venue |
| PVenue | A list containing VIDs, lat, and long of venues |
| ClID | A number for cluster ID |
| CID | A number showing centroid ID |
| PCl | A list containing ClID, CID, and VID |
| UID | A number representing user ID |
| PUReq | A list containing U_Req, UID, lat, long |
| U_Req | A string showing user request |
| PnClust | A list of n closest PCl |
| Soc_Graph | Social graph of the user |
| PRatings | A list containing ratings, UIDs, and VIDs |
| PCheck-ins | A list containing check-in frequency and VIDs |
| PSF_ratings | A list of UIDs, social friends, ratings (integer), and VIDs |
| PCI_Freq | A list of check-in frequency (integer) against VID |
| Distance | A number showing distance from user to venue |
| PAvg_Rat | List of average rating for VIDs |
| PRank_Ven | List of ranked VIDs from Pn clusters |
| Ptop_n_Ven | List of top n VIDs from PRank_Ven |

Table 3.5 Places and mapping used in HLPN model

| Place | Mappings |
|---|---|
| φ (VL) | Ϸ (PVenue) |
| φ (CL) | Ϸ (PVenue × PCl) |
| φ (UR) | Ϸ (PUReq) |
| φ (nC) | Ϸ (PnClust) |
| φ (HD) | Ϸ (PCheck-ins × PRatings) |
| φ (Features) | Ϸ (PSF_ratings × PCI_Freq × Distance × PAvg_Rat) |
| φ (nV) | Ϸ (PRank_Ven × Ptop_n_Ven) |

Figure 3.19 HLPN Model for the Proposed RTVR Algorithms

3.7.3 Modeling and Analysis of Proposed Algorithm

In step 1, all venues are clustered based on their geographical locations. As described earlier, K-Means clustering is used for this process. The following rule maps to the transition under discussion:

�((_") ⩝ � ∈ (� , ∀ ∈ ( ⎹


( = �_AB�� (�) ˄

(ˊ = ( ∪ �(C1D , (C2D , (C3D�

Based on the user request, the closest ‘n’ clusters are selected. The following rule maps to the transition ‘C_S’:

�((_�) = ∀� ∈ (�, ∀� ∈ (�, ∀� ∈ (�⎹

(� ∶= ��B�&_���&B�� ((�,(�) ˄

(�ˊ = (� ∪ �(�C DC1D , (�C DC2D ,(�C DC3D �

Features are extracted based on the ratings of the user's social friends, check-in frequencies, user-to-venue distances, and average ratings for a particular venue. The following rule maps to the transition ‘F-C’:

��>_( = ∀� ∈ (�, ∀� ∈ (�, ∀� ∈ (�, ∀� ∈ (�⎹

(� C1D ∶= (��_�>��&��) ((� C2D , (�, (�)˄

(� C2D ∶= (��_(��>�B� ((� C1D , (�,(�)˄

(� C3D ∶= (��_6��& ((�, (�)˄

(� C4D ∶= (��_�$)��& ((� C2D , (�, (�)˄

(�ˊ = (� ∪ �(�C1D , (�C2D , (�C3D ,(�C4D�

Based on the features calculated in the previous step, all venues of the selected n clusters are ranked and sorted from best to worst. From the sorted list, the top ‘m’ venues are selected and displayed to the user. The following rule maps to the transition:

�(&_�_$) = ∀�� ∈ (��, ∀�� ∈ (��, ∀� ∈ (�⎹

(�C1D ∶= ���._���_$B��B� ((�� C1D ,(�� C2D ,(�� C3D, (�� C4D (��) ˄

(�ˊ = (� ∪ �(�C1D� ˄

(�ˊ = (� ∪ �(�C2D�

3.7.4 Verification Property

The objective of the verification was to ensure that the top-ranked venues are in fact the highest-ranked venues. Because all features are translated to numbers, the selected top venues must have the highest cumulative sum of feature values together with a smaller distance. The property can be expressed as follows:

/E F��ℎ�)ℎB�&2����<&�_�_$B�C�D, <�$)_��&C�D<ℎB._���C�D,<�>_��&��)C�D− 6��&��BC�D 3G

The property was tested in the Z3 solver and was satisfied in 221 msec.
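The intent of the property can be illustrated with a brute-force check in plain Python. This is not the SMT encoding used in the verification; it is a sketch that assumes the score is the sum of the three rating and check-in features minus the distance, and the function names and venue data are hypothetical.

```python
def score(features):
    """Cumulative feature value minus distance, mirroring the stated property."""
    friend_avg, checkin_count, distance, overall_avg = features
    return friend_avg + checkin_count + overall_avg - distance

def top_is_highest(features, selected):
    """True if no unselected venue scores strictly higher than a selected one."""
    chosen = [score(features[v]) for v in selected]
    rest = [score(f) for v, f in features.items() if v not in selected]
    return not rest or not chosen or min(chosen) >= max(rest)

# Hypothetical venues: (friends' avg rating, check-ins, distance, overall avg rating).
features = {
    "a": (4, 10, 1, 4),  # score 17
    "b": (3, 2, 5, 3),   # score 3
    "c": (5, 8, 2, 4),   # score 15
}
holds = top_is_highest(features, ["a", "c"])
violated = top_is_highest(features, ["a", "b"])
```

A solver-based proof covers all admissible inputs, whereas this check only confirms the property on one concrete data set.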

3.8 Experimental Setup and Results

Extensive simulations are conducted to evaluate performance of the proposed system.

The experimental setup and results are discussed as follows.

3.8.1 Experimental Setup

The details of the experimental setup and parameters used for evaluation are presented

in Table 3.6.

Table 3.6 Experimental Setup

| Parameters | Values |
|---|---|
| Total Number of Users | 2153470 |
| Total Number of Venues | 1143091 |
| Total Number of Check-ins | 1021967 |
| Total Number of Ratings | 2809581 |
| Edges of Social Graph | 27098488 |
| Simulation Tool | MATLAB |
| Cloud Configuration | MATLAB Parallel Cloud, Cores 16 |

3.8.2 Results

This section presents the evaluation of the proposed cloud-based Real-Time Venue Recommendation (RTVR) framework. For comparison, the following existing recommendation techniques from the literature, which are closely related to our work, are selected: the cloud-based context-aware recommender OmniSuggest [58], popularity-based ranking (POPULAR) [156], social-based ranking (SOCIAL) [156], and Singular Value Decomposition (SVD) matrix factorization [31].

The proposed algorithm is executed using the MATLAB cloud framework. When responding in real time, RTVR's response time is expected to be superior to that of the other systems because its processes are parallelized. MapReduce is used for the parallel execution of the proposed model in the cloud environment. To evaluate the effectiveness of the recommendation model, the following performance metrics are used: (a) Precision, (b) Recall, and (c) F-measure.
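For reference, the three metrics can be computed for a single user's top-N list as in the sketch below. It uses the standard definitions (precision is hits over recommended items, recall is hits over relevant items, F-measure is their harmonic mean); the function name and the venue ids are hypothetical.

```python
def precision_recall_f(recommended, relevant):
    """Precision, recall, and F-measure of a recommendation list against
    the set of venues the user actually visited."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_measure

# Hypothetical top-5 list and the venues later visited by the user.
p, r, f = precision_recall_f(["v1", "v2", "v3", "v4", "v5"], ["v2", "v5", "v9"])
```

With two hits out of five recommendations and three relevant venues, precision is 0.4, recall is 2/3, and the F-measure is 0.5.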

Figure 3.20 Precision with Clustering

[Plot omitted: Precision versus No. of Recommendations (1 to 20) for RTVR, OmniSuggest, SVD, POPULAR, and Social.]

Figure 3.21 Recall with Clustering

Figure 3.22 F-Measure with Clustering

[Plots omitted: Recall and F-Measure versus No. of Recommendations (1 to 20) for RTVR, OmniSuggest, SVD, POPULAR, and Social.]

Figures 3.20, 3.21, and 3.22 show that the RTVR framework achieved the best precision and recall among the compared schemes, with the exception of OmniSuggest (each result is the average of 100 random runs). The reason is that the proposed model generates real-time recommendations by reducing the dataset, so accuracy in terms of precision, recall, and F-Measure is compromised. The behavior of the RTVR model was therefore also analyzed without clustering to check its accuracy in terms of precision, recall, and F-Measure. Figures 3.24, 3.25, and 3.26 show that, without clustering, RTVR outperformed all the existing techniques.

Figure 3.23 Time Comparison with and without Clustering

[Plot omitted: Time in Seconds (0 to 1600) versus No. of Selected Clusters (1 to 15), with clustering and without clustering.]

However, the time required to generate recommendations from the complete dataset is very high, as depicted in Figure 3.23. The time without clustering in Figure 3.23 is constant because it does not depend on the number of clusters. The target of generating real-time recommendations by reducing the dataset is therefore achieved. However, a tradeoff between recommendation quality and dataset reduction still exists: the quality of recommendations may suffer if the dataset is reduced significantly to improve the efficiency of online real-time processing. This is why the precision and recall of RTVR are slightly lower than those of OmniSuggest, as depicted in Figures 3.20 and 3.21. Nevertheless, RTVR outperforms the other techniques in terms of precision and recall. The commonly used and highly cited collaborative filtering technique SVD showed lower precision and recall than RTVR. The popularity-based approaches, POPULAR and SOCIAL, showed better performance than the collaborative filtering techniques. The main reason for this behavior is that popularity-based approaches overlook similarity computations in the design of their models. The recall of the RTVR framework was highest for N = 20 because of the greater coverage of the framework in terms of recommendations.

Compared with the rest of the selected schemes, the proposed cloud-based RTVR model exhibited superior performance in terms of the F-measure. On the other hand, the performance of RANDOM remained low for all the aforementioned metrics. The main reason is that RANDOM only shuffles the candidate set of unvisited locations for each user and performs no similarity computations.

Figure 3.24 Precision without Clustering

Figure 3.25 Recall without Clustering

[Plots omitted: Precision and Recall versus No. of Recommendations (1 to 20) for RTVR, OmniSuggest, SVD, POPULAR, and Social.]

Figure 3.26 F-Measure without Clustering

Figures 3.27, 3.28, and 3.29 depict the difference between the precision, recall, and F-Measure values, respectively, obtained with and without clustering.

Figure 3.27 Difference between Precision Values

[Plots omitted: F-Measure versus No. of Recommendations (1 to 20) for RTVR, OmniSuggest, SVD, POPULAR, and Social; Precision versus No. of Recommendations (1 to 20).]

Figure 3.28 Difference between Recall Values

Figure 3.29 Difference between F-Measure Values

Figures 3.30, 3.31, 3.32, and 3.33 show the results of evaluating the scalability of the proposed model. The evaluation was conducted for 5, 10, 15, and 20 recommendations. The proposed model exhibits consistent behavior in terms of Precision, Recall, and F-Measure as the number of recommendations varies, which indicates that the model is scalable as the number of recommendations increases.

[Plots omitted: Recall and F-Measure versus No. of Recommendations (1 to 20).]

Figure 3.30 Scalability over 5 Recommendations

Figure 3.31 Scalability over 10 Recommendations

[Plots omitted: Precision, Recall, and F-Measure ratio versus No. of Users (50 to 500).]

Figure 3.32 Scalability over 15 Recommendations

Figure 3.33 Scalability over 20 Recommendations

Figures 3.34, 3.35, and 3.36 show the results of evaluating scalability on the basis of precision, recall, and F-Measure, respectively. The evaluation was conducted for N = 20 (maximum) recommendations. The proposed model outperforms the other techniques in terms of Precision, Recall, and F-Measure as the number of users varies. Moreover, the results indicate that the proposed RTVR model is scalable and exhibits consistent behavior as the number of recommendations increases.

[Plots omitted: Precision, Recall, and F-Measure ratio versus No. of Users (50 to 500).]

Figure 3.34 Scalability Comparison w.r.t Precision

Figure 3.35 Scalability Comparison w.r.t Recall

[Plots omitted: Precision and Recall versus No. of Users (50 to 500) for RTVR, OmniSuggest, SVD, POPULAR, and Social.]

Figure 3.36 Scalability Comparison w.r.t F-Measure

Figures 3.37 and 3.38 illustrate precision and recall over an increasing number of selected clusters. Precision and recall improve as the number of clusters increases up to 5 clusters, and then decline slowly as the number of clusters increases further. The reason is that, as the number of clusters increases, the number of potential venues also increases.

Figure 3.37 Number of Selected Clusters vs Precision

[Plot omitted: F-Measure versus No. of Users (50 to 500) for RTVR, OmniSuggest, SVD, POPULAR, and Social.]

Figure 3.38 Number of Selected Clusters vs Recall

Figure 3.39 compares RTVR execution on a single node and on the cloud. For this experiment, the proposed model is executed using the MATLAB cloud framework. The results show that the recommendation time is significantly reduced when RTVR is executed on the cloud. Notably, execution time on a single node increases rapidly with the number of clusters, because on a single node the additional clusters are processed sequentially. On the cloud framework, each cluster is processed in parallel, which leads to a significant reduction in execution time.
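The difference between sequential and per-cluster parallel processing can be sketched with Python's standard thread pool. This only illustrates the structure of the computation under stated assumptions: `rank_cluster` is a hypothetical stand-in for the per-cluster ranking, the scores are made up, and the experiments themselves used the MATLAB cloud framework rather than this code.

```python
from concurrent.futures import ThreadPoolExecutor

def rank_cluster(cluster):
    """Rank one cluster's venues by score, best first (stand-in for per-cluster ranking)."""
    return sorted(cluster.items(), key=lambda pair: pair[1], reverse=True)

def merge_top(ranked_lists, top_n):
    """Combine the per-cluster rankings and keep the overall top-N venue ids."""
    merged = [pair for ranked in ranked_lists for pair in ranked]
    merged.sort(key=lambda pair: pair[1], reverse=True)
    return [venue for venue, _ in merged[:top_n]]

# Hypothetical venue scores for three selected clusters.
clusters = [{"v1": 0.9, "v2": 0.4}, {"v3": 0.7}, {"v4": 0.8, "v5": 0.1}]

# Single node: clusters are ranked one after another.
sequential = merge_top([rank_cluster(c) for c in clusters], top_n=3)

# Parallel: each cluster is handed to its own worker, then results are merged.
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel = merge_top(list(pool.map(rank_cluster, clusters)), top_n=3)
```

Both paths produce the same top-N list; the gain from parallel execution is in elapsed time only. For CPU-bound ranking in Python, a process pool or separate machines would be needed to see a real speedup, since threads share one interpreter.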

Figure 3.39 Time Comparison between Single Node and Cloud

[Plot omitted: Time in Seconds (0 to 450) versus Number of Selected Clusters (1 to 15), Single Node and Cloud.]

3.9 Summary

This study has made several contributions by devising a cloud-based real-time venue recommendation framework for users of social networks. The major contribution of this work is the integration of knowledge engineering techniques, including K-Means, KNN, and collaborative filtering, on a cloud infrastructure for generating real-time recommendations. The proposed RTVR framework considers the effect of dynamic real-world physical factors in addition to the collective opinions of experienced users. RTVR addresses scalability by putting forward a cloud-based architecture that distributes data-intensive and computation-intensive loads over cloud servers. As a result, RTVR always considers a precompiled set of experienced users for all categories, which enables it to recommend appropriate venues to a new user at finer granularity. The evaluation results show that the performance of the proposed RTVR framework is superior to many of the existing schemes chosen for comparison.

In the next chapter, an attempt is made to overcome the challenges faced in the area of

health recommendations. It is of critical importance to maintain a balanced intake of

food. However, it is quite challenging for a common person to keep track of personal

food requirements because of the massive diversity of dietary components and items.

A systematic food recommendation system is desired to recommend the appropriate

food considering the disease of the person. The major challenge in designing such a

system is the handling of greater volumes of data in terms of ingredients, quantity,

nutrition facts, people’s preferences, and simultaneously taking into consideration a

person’s pathological reports. The system must be scalable enough to handle

recommendation queries from all over the globe. A solution to the aforementioned

challenges is the use of cloud computing.


Chapter 4 A Smart Food Recommendation System


4.1 Health Recommendation Systems

Recently, an increase in health information needs and a change in information-seeking behavior have been observed worldwide [184]. Recent studies [78, 185] report that 81% of U.S. adults frequently use the Internet. Moreover, 59% of U.S. adults reported that they frequently seek health information online for disease diagnoses and treatments [78]. Similarly, according to a survey conducted by the Pew Research Center in 2013, 19% of users in Pakistan seek health information online using their smartphones and internet devices [186]. This dependency on Internet-based health information affects the patient-physician relationship, because educated patients pose questions or discuss the available treatment options [187]. As a result, the patient becomes an active participant in decision making. This change in the thinking process is referred to as patient empowerment [188]. However, information overload, imprecise information, and irrelevant material are major issues when drawing conclusions about personal health status and taking adequate action [189]. Because a large amount of medical information is available from different sources (e.g., news sites, web forums, etc.), patients are often lost or feel uncertain when investigating on their own. Moreover, the diverse and assorted medical vocabulary poses obstacles for laymen, such as difficulty in finding relevant information or understanding medical terminology [190]. Improved, personalized delivery of medical content can therefore support users in discovering relevant information [105].

The amount of medical information available for patient-oriented decision making has increased considerably. However, this information is typically spread over many different websites [191]. As an alternative, personal health record systems (PHRS) have been introduced to integrate a person’s health data and give access to the owner and to authorized health professionals [192]. A recommender system suggests items of interest to a particular user of an information system or e-business system. The idea behind recommender systems can be adapted to the special requirements of the health domain. In a Health Recommendation System (HRS), a recommendable item of interest is a piece of non-confidential, systematically confirmed or at least generally acknowledged medical information that in itself has no connection to an individual’s medical history profile. Consequently, potentially relevant information items can be computed and distributed from trustworthy health-related data repositories. Users can thus be supplied with high-quality information that helps them manage a particular disease or adjust their everyday habits.

In such a scenario, an ordinary person interacts with the HRS-enabled PHRS without requiring direct support from a medical expert. In return, the system generates layperson-friendly content based on the person’s long-term medical history profile. The most relevant items are presented inside the user interface of the PHRS, and by selecting the highest-ranked documents the individual obtains the health information. Thus, the risk of retrieving “incomplete, misleading and inaccurate” material from popular search engines can be minimized.

Finding the most relevant and important information is a challenging task, especially in health-related information systems. People’s search for information related to an individual’s health is often known as health information seeking [193]. The author of [194] summarized why people use the Internet to find relevant material: “desires for reassurance, for second opinions for greater understanding of existing information and to circumvent perceived external barriers to traditional sources”. Three aspects of this search, (a) findable, (b) comprehendible, and (c) context awareness, are discussed below.

(a) Findable: A relevant resource is hard to find even when Internet resources are available. Users often cannot formulate a search string that narrows the search space to the medical problem at hand. In addition, the associations between medical concepts are difficult to understand even when hyperlinks are provided. Worse still, many laypeople do not fully explore the search results, as “users of the Internet explore only the first few links on general search engines when seeking health information” [195].

(b) Comprehendible: This problem arises from the partial understanding of medical terminology among people who lack medical knowledge. A layperson may have no mental model connected to a given piece of information. In particular, medical terms found in online resources are often misread or misinterpreted by laypeople. There exist multiple instances in which people misread or misunderstood medical terminology and only later realized their mistakes [196].

(c) Context Awareness: Health-related information is usually provided or interpreted in a medical context, and it often comes in combination with a case or medical situation. With search engines or encyclopedia-based sources, this kind of context is very often absent. This absence can lead to inadequate decisions [197].

A health recommender system is expected to deliver context-based, high-quality, and precise material. In particular, a semantically enabled HRS can deal with complex associations among medical concepts, resolve medical abbreviations and classification codes, and adapt to an individual’s level of medical knowledge. Such a system can also mitigate the impact of information overload [198] because it offers users only those items that are most relevant to a given case.

Because health recommendation systems are an emerging field, only limited research has been carried out so far. HRS can be categorized as (a) consultant health care advisor [199, 200], (b) exercise HRS [201-203], (c) dietary HRS [15, 23, 45, 204], and (d) disease-specific HRS [56, 202]. The consultant health care advisor category is further subdivided into (a) online doctor-to-patient recommendation [199, 200, 205] and (b) offline doctor-to-patient recommendation [206, 207]. Dietary HRS is further subdivided into (a) food recommendation [46, 208], (b) recipe recommendation [57, 209], and (c) menu recommendation [43, 210]. Disease-specific HRS is further subdivided into (a) cardiovascular [56, 211], (b) diabetic [55, 212], and (c) obesity [14, 213]. The hierarchy of health recommendation systems is depicted in Figure 4.1.


Figure 4.1 Hierarchy of Health Recommendation System

[Figure 4.1: Health Recommendation Systems divide into Consultant Health Care Advisor (Online Doctor to Patient Recommendations, Offline Doctor to Patient Recommendations), Exercise, Dietary (Food Recommendations, Recipe Recommendations, Menu Recommendations), and Disease Specific (Diabetic, Cardiovascular, Obesity).]

4.1.1 Significance

Due to a lack of information on healthy diets, people mostly rely on medicines for various health issues instead of adopting preventive food-based measures. Various studies report that most medicines have minor or major unwanted side effects [214]. Therefore, given the massive growth and adoption of Information and Communication Technology (ICT), initiatives and smart systems are needed that help novice users address their health problems through food items instead of relying entirely on medicines. For instance, calcium deficiency may cause various diseases [215], such as bone loss (osteoporosis), inactivity of the parathyroid gland (hypoparathyroidism), weak bones (osteomalacia), and a muscle disease (latent tetany). A variety of medicines (dietary supplements) is available to overcome calcium deficiency. However, such medicines also have side effects, such as vomiting, unusual weight loss, appetite loss, muscular pain, mood changes, increased thirst and urination, tiredness, weakness, and headaches [216]. Calcium deficiency may instead be mitigated with food items. Dairy foods such as milk, yogurt, and cheese are a main source of calcium. Similarly, dark leafy green vegetables such as turnip, spinach, collard greens, and kale, as well as fortified orange juice, fortified cereals, sardines, soybeans, breads, and waffles, are rich in calcium.

Each food item has its own nutrition facts with different values. People often use dietary supplements to overcome deficiencies. For instance, one dietary supplement for calcium deficiency, CMZ-3, provides 1000 mg of calcium in three tablets daily, which is the recommended dose. One cup (8-10 oz) of milk contains 300-350 mg of calcium, so three cups of milk a day supply roughly 900-1050 mg, about the same amount as the supplement. There is therefore little reason to rely on dietary supplements when food items with known nutrition facts are available.

Besides medicines, other factors also influence a healthy lifestyle, including the choice of foods and a lack of exercise. Various studies have shown that regular exercise is associated with improved mood, overall satisfaction, and emotional and physical well-being at all ages [213]. People who exercise little or not at all are therefore more often affected by health-related issues. One of the main reasons for not exercising is the excessive use of technology [217]. People often spend their spare time on social networking sites (Facebook, Twitter), video games, smartphones, and television. Excessive use of technology may thus harm a healthy lifestyle [217].

Different studies [218-220] show that adults and children take in more calories from fast food and other restaurant food than from homemade food. These studies show that eating more fast food than nutritious food can lead to poor health and poor nutrition. The study in [219] showed that children who eat fast food frequently (at least three times a week) are at high risk of rhinitis (congested nose) and asthma. Another study [220] showed that eating fast food such as hamburgers and pizza and baked items such as cake and doughnuts may be a cause of depression. Some of the serious impacts of fast food on different body systems are as follows.

(a) Carbohydrates: Most fast food items and soft drinks are rich in carbohydrates, which yield high calories. A high amount of carbohydrates causes a spike in the blood sugar level that disturbs the insulin response of the body. The disturbed insulin response contributes to insulin resistance, inflamed patches of skin (especially in children), and type 2 diabetes [212, 219].

(b) Added Sugar: Added sugar has no nutritional value but is rich in calories. The extra calories from added sugar lead to weight gain, which is one of the main causes of heart disease [221]. Consuming foods high in sugar and carbohydrates allows the bacteria residing in the mouth to produce acids. These acids can harm or destroy tooth enamel, which protects against cavities. Once the enamel is destroyed, it cannot be replaced [222].

(c) Sodium: Another factor through which fast food affects health is sodium. Excess sodium intake may increase the risk of high blood pressure, kidney stones, stomach cancer, osteoporosis (fragile bones), and enlarged heart muscles [204].

(d) Trans Fats: Trans fats are a type of unsaturated fat. Most fast foods are rich in trans fats, which are a primary cause of raised low-density lipoprotein (LDL) cholesterol levels and lowered high-density lipoprotein (HDL) cholesterol levels. Both changes may cause heart disease and increase the risk of type 2 diabetes [223].

From the above discussion, it can be concluded that balancing the various dietary components is of critical importance for staying healthy. Recommendation systems can be employed to suggest a balanced diet. However, most traditional diet recommendation systems do not consider a person’s deficiencies or excesses of chemical compounds in the body. Therefore, these parameters are considered in our model along with nutritional demands and various types of food components from diversified dietary sources.

4.2 Motivation

One of the major factors in a healthy life is the daily diet, especially for people suffering from minor or major diseases. eHealth initiatives and research efforts aim to offer pervasive applications that help novice end users improve their health [44]. Various studies indicate that an inappropriate and inadequate daily diet is a major cause of various health issues and diseases. A study conducted by the World Health Organization (WHO) estimates that around 30% of the world’s population suffers from various diseases and that 60% of deaths among children each year are related to malnutrition [47, 48]. Another WHO study reports that inadequate and imbalanced food intake causes around 9% of heart attack deaths, about 11% of ischemic heart disease deaths, and 14% of gastrointestinal cancer deaths worldwide [45]. Moreover, around 0.25 billion children suffer from vitamin A deficiency [49], 0.2 billion people suffer from iron deficiency anemia [50], and 0.7 billion people suffer from iodine deficiency [51].

Generally, a person remains unaware of the major causes behind a deficiency or excess of vital substances such as calcium, proteins, and vitamins, and of how to normalize such substances through a balanced diet. Several works [15, 46, 52-57] have proposed food-related recommendation systems. These systems can be categorized as (a) food recommendation systems [46, 52], (b) menu recommendations [54], (c) diet plan recommendations [15], (d) health recommendations for specific diseases such as diabetes and cardiovascular disease [55, 56], and (e) recipe recommendations [53, 57]. All of these systems either provide recommendations for a specific disease or balance the diet without considering any disease or nutrition deficiency in the body. For instance, [52] proposes a food recommendation system for diabetic patients that recommends foods without considering the diabetes level, which may fluctuate frequently. Similarly, the authors of [46] do not consider the nutrition factors that are significant for recommending a balanced diet.


In view of the above facts and figures, maintaining a balanced intake of food is critically important. However, it is difficult for an ordinary person to keep track of personal food requirements because of the massive diversity of dietary components and items. A systematic food recommendation system is needed that recommends appropriate food in light of the person’s disease. The major challenge in designing such a system is handling large volumes of data on ingredients, quantities, nutrition facts, and people’s preferences while simultaneously taking into account a person’s pathological reports. The system must also be scalable enough to handle recommendation queries from all over the globe. Cloud computing offers a solution to these challenges: it is an innovative and emerging platform that lets users scale computing and storage resources on demand [224].

In this chapter, a cloud-based food recommendation system called Diet-Right is presented that considers the user’s pathological test results and recommends a list of optimal food items. To achieve optimal results, an algorithm is developed based on Ant Colony Optimization (ACO). A database of 345 pathological test reports and their normal ranges was built by performing a field survey and collecting information about pathological reports from different laboratories [225-227]. The collected data was verified by a hospital pathologist. Moreover, a database of 3,400 food items with 26 entries for the most common nutrients was taken from the official website of the Composition of Foods Integrated Dataset (CoFID) [228]. Based on the real-time input of the user’s parameters, Diet-Right recommends top-ranked food items to the user.

In medical practice, pathological tests are sometimes required to identify a particular disease. A pathological test report usually indicates a deficiency or excess of certain compounds or parameters in the human body, e.g., the level of iron or calcium or the red blood cell (RBC) count, which may cause a particular disease. In this dissertation, a novel food recommendation system is presented that deals specifically with pathological test results. Our system considers diseases related to pathological reports and the most common nutrition factors when recommending food items to users. For this purpose, a database of 345 pathological test reports is used to categorize the diseases that occur when compounds or parameters deviate from their normal ranges. Moreover, the system allows users to input values for a specific parameter. Based on the deviation of the input value from the normal range, the system generates a diet plan that aims to correct the abnormality. Furthermore, an ant colony algorithm is used to train the system with the ranges of the various parameters and the associated diseases.

4.3 Related Work

Several recommendation systems related to diet and food have been proposed. These systems are used for food recommendations, menu recommendations, diet plan recommendations, health recommendations for specific diseases, and recipe recommendations. The majority of these systems extract users’ preferences from sources such as user ratings [106, 209], recipe choices [229, 230], and browsing history [231-233]. For instance, [230] proposes a recipe recommendation system based on a social navigation system, which extracts users’ recipe choices and recommends recipes in return. Similarly, [233] proposes a recipe recommendation system that learns a similarity measure for recipes using crowd card-sorting. These recommendation systems do not solve the common cold start problem: they must wait for users to enter enough data before they can make effective recommendations [234]. Some commercial applications, such as [235, 236], offer users a quick survey in order to obtain their preferences in a short time. For instance, the survey used by [235] is designed to match the lifestyle of the user (e.g., healthy, sportsman, pregnant) and attempts to avoid foods that do not match that lifestyle. Similarly, [236] uses a questionnaire in which the user answers questions about his or her lifestyle, food preferences, nutrient intake, and habits. Once the system has extracted this basic information, it can recommend meals on a daily and weekly basis.

A Food Recommendation System (FRS) [52] has been proposed for diabetic patients that uses K-means clustering and a Self-Organizing Map for clustering analysis of food. The system recommends substitute foods according to nutrition and food parameters. However, FRS does not adequately address the disease level: the level of diabetes may vary hourly depending on the patient’s situation, and the food recommendations should vary accordingly.


Tags and latent factors are used in an Android-based food recommender system [46]. The system recommends personalized recipes to the user based on the tags and ratings provided in the user’s preferences, using latent feature vectors and matrix factorization. Prediction accuracy is achieved through tags that closely match the recommendations to users’ preferences. However, the authors do not consider nutrition factors in order to balance the user’s diet according to his or her needs.

A content-based food recommender system [53] has been proposed that recommends recipes according to preferences already given by the user. The user’s preferred recipes are broken down into ingredients, which are assigned ratings according to the stored user preferences, and recipes with matching ingredients are recommended. The authors do not consider nutrition factors or the balance of the diet. Moreover, identical recommendations are likely because the user’s preferences may not change on a daily basis.

In [5], a knowledge-based dietary nutrition recommendation system is proposed for obesity. The recommendations include dietary nutrition and diet menus for individuals and are generated using collaborative filtering. A mobile application is also developed to deliver the recommended dietary nutrition and menus to users. Similarly, a food recommender system for patients in care facilities is proposed in [208]. The application is designed for caregivers so that they can offer food according to patient preferences.

The food recommendation systems discussed above either target specific diseases or aim to balance the diet. Systems that target specific diseases recommend foods without knowing the level of the disease, which may vary from case to case. Systems that aim to balance the diet ignore nutrition factors, which are essential for recommending food and balancing the diet.

4.4 Proposed Model

The main focus of this work is to provide dietary assistance to people who suffer from common diseases. The proposed model recommends foods and nutrition based on a person’s pathological test reports. Every pathological report has indicators that are calculated according to the nature of the test. For instance, if a doctor advises a patient to take a blood test, the common test entries include the values of hemoglobin, red blood cells (RBC), white blood cells (WBC), plasma, and sugar. The normal ranges of these indicators are usually given in the test report, so the patient can identify abnormalities by comparing the measured values with the normal ranges. In our proposed system, the user is provided with the complete list of test parameters to select from and enters the values from the test report for the selected parameters. Normal-range data was gathered for tests including blood plasma and serum, urine, stool, cerebrospinal fluid (CSF), and gastric and secretion tests, and a matrix of 345 entries was constructed. Each individual component of a test, e.g., a blood test, has a normal range with a lower and an upper bound. The range of the same component may differ by gender, age group, and fasting or non-fasting state. Our system is trained on various age groups and their respective parameter ranges, which allows it to suggest diets according to the needs of each user.
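The comparison of a reported value with its stored normal range can be sketched as follows. This is a minimal illustration under stated assumptions, not the Diet-Right implementation: the `NormalRange` record, the `NORMAL_RANGES` table keyed by parameter, gender, and age group, and the `classify` helper are hypothetical names, and all bounds except the 9 to 10.5 calcium range mentioned in this chapter are placeholder numbers rather than clinical reference values.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class NormalRange:
    """Lower and upper bound of one test parameter for one demographic group."""
    lower: float
    upper: float


# Hypothetical lookup table keyed by (parameter, gender, age group).
# "demo_param" and its bounds are placeholders, not clinical reference values.
NORMAL_RANGES = {
    ("calcium", "any", "adult"): NormalRange(9.0, 10.5),
    ("demo_param", "female", "adult"): NormalRange(12.0, 16.0),
    ("demo_param", "male", "adult"): NormalRange(14.0, 18.0),
}


def find_range(parameter: str, gender: str, age_group: str) -> Optional[NormalRange]:
    """Return the gender-specific range if stored, else the gender-neutral one."""
    for key in ((parameter, gender, age_group), (parameter, "any", age_group)):
        if key in NORMAL_RANGES:
            return NORMAL_RANGES[key]
    return None


def classify(parameter: str, value: float, gender: str, age_group: str) -> Tuple[str, float]:
    """Return (status, deviation), where status is 'deficiency', 'excess', or 'normal'."""
    rng = find_range(parameter, gender, age_group)
    if rng is None:
        raise KeyError(f"no normal range stored for {parameter!r}")
    if value < rng.lower:
        return "deficiency", rng.lower - value
    if value > rng.upper:
        return "excess", value - rng.upper
    return "normal", 0.0
```

With this layout the same reported value can be normal for one demographic group and abnormal for another, which is the behavior the paragraph above describes for gender- and age-dependent ranges.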

4.4.1 Diet-Right Architecture

The majority of existing food recommendation systems use a centralized architecture [15, 46, 52-57]. The main disadvantage of such systems is limited scalability when dealing with massive amounts of data. A cloud-based solution is proposed to offer scalability and pervasiveness, so that smartphone users can conveniently access the recommendation system, as depicted in Figure 4.2.

Figure 4.2 Architecture of Diet-Right


In the first step, the model takes input values from the user. The user enters demographic data, including gender and age, as well as the values from the pathological test report. The normal values of pathological tests vary across age groups and genders. In the second step, the entered values are sent to the cloud infrastructure and compared with the normal ranges stored in the database. In the third step, the abnormality level of the pathological test values is computed. A value may lie above or below its normal range, so the proposed algorithm must note the abnormality level carefully in order to recommend food items. In the fourth step, the weight assignment and matrix generation process is carried out; it is discussed in detail in the next section. In the fifth step, ranks are calculated for each food item and the items are sorted in descending order. In the sixth step, the user is provided with the recommended list of food items.

4.4.2 Proposed Algorithm

In this subsection, the food recommendation process is presented. It uses a variant of the ant colony approach on a graph of foods to generate the optimal food set for the user. Diet-Right uses the Ant Colony Optimization (ACO) technique. Other techniques for solving optimization problems are also available, such as Particle Swarm Optimization (PSO) [237] and the Genetic Algorithm [238]. The effectiveness of the three techniques is compared in the next section, and ACO is selected for this study on the basis of that quantitative analysis. The ACO metaheuristic is a constructive, population-based approach that relies on the social behavior of ants. It is recognized as one of the most powerful approaches for solving combinatorial optimization problems [239]. One motivation for using ACO is that it can be run as a distributed algorithm and is therefore suitable for a cloud environment, whereas the other techniques lack this ability. Another advantage of ACO is its quick discovery of optimal solutions, which makes it usable in dynamic applications. The main steps of the proposed algorithm are explained as follows:

Each food item is placed on a node and a strongly connected graph is generated, as shown in Figure 4.3. Each link of the graph has associated $\eta$ and $\tau$ values, where $\tau$ is the randomly initialized pheromone and $\eta$ is the heuristic information, initialized as the inverse of the sum of squared differences over all ingredients $I$. In (4.1), $k$ is the index of a food ingredient, $i$ and $j$ denote the $i$th and $j$th food items, $I$ represents a single ingredient in a certain food item, and $m$ is the total number of ingredients in a certain food item.

\[
\eta_{ij} = \frac{1}{\sum_{k=1}^{m} \left( I_{ik} - I_{jk} \right)^{2}} \tag{4.1}
\]

Here, $\eta$ is used to control the exploration and exploitation of ACO, and the values of $\eta \in [0, 1]$.
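A minimal Python sketch of the heuristic initialization is given below, reading (4.1) as the inverse of the sum of squared ingredient differences between two foods. It assumes each food is a list of m nutrient amounts. The function name `heuristic_matrix`, the zero value assigned to identical foods, and the final division by the largest entry (added here only so that the values fall in [0, 1], as the text states) are assumptions of this sketch and are not taken from the source.

```python
def heuristic_matrix(foods):
    """Return eta[i][j] = 1 / sum_k (I_ik - I_jk)^2, scaled into [0, 1].

    foods: list of equal-length lists, one nutrient vector per food item.
    Identical foods (zero distance) and the diagonal get eta = 0 (assumption).
    """
    count = len(foods)
    eta = [[0.0] * count for _ in range(count)]
    for i in range(count):
        for j in range(i + 1, count):
            squared = sum((a - b) ** 2 for a, b in zip(foods[i], foods[j]))
            if squared > 0:
                eta[i][j] = eta[j][i] = 1.0 / squared
    # Scaling by the maximum is an assumption made to keep values in [0, 1].
    peak = max((value for row in eta for value in row), default=0.0)
    if peak > 0:
        eta = [[value / peak for value in row] for row in eta]
    return eta
```

Foods with similar nutrient profiles receive a larger heuristic value on the link between them, while very different foods receive a value close to zero.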

Figure 4.3 Graph Representation of the Problem

After initialization, each ant constructs its local solution by visiting the nodes that provide the best cost, i.e., the lowest error with respect to the targets. The target vector represents the amounts of food ingredients required for a particular disease. It is predefined on the basis of pathological reports; for instance, the target for a user with calcium deficiency may range from 9 to 10.5. The nodes (food items) are selected using a transition rule that chooses the path with the highest transition probability. The transition probability is given by (4.2) [240]:

\[
p_{ij}^{k}(t) =
\begin{cases}
\dfrac{[\tau_{ij}(t)]^{\alpha} \times [\eta_{ij}(t)]^{\beta}}{\sum_{l \in S^{k}} [\tau_{il}(t)]^{\alpha} \times [\eta_{il}(t)]^{\beta}}, & \text{if } j \in S^{k} \\
0, & \text{otherwise}
\end{cases}
\tag{4.2}
\]

Here, $\tau_{ij}(t)$ is the pheromone level at time $t$, $\eta_{ij}(t)$ is the heuristic information at time $t$, and $\alpha$ and $\beta$ are hyperparameters of the model used to weight the pheromone level and the heuristic information during fine tuning. Moreover, $k$ denotes the ant, $i$ the initial node, $j$ the target node, $l$ the index of the currently selected path, and $S^{k}$ the solution.
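The transition rule can be sketched in Python as follows, reading (4.2) in the usual ACO form in which pheromone and heuristic values are raised to the exponents alpha and beta and normalized over a set of candidate nodes. The function names, the default exponents, and the way the candidate set is passed in by the caller are assumptions of this sketch. The greedy `next_node` helper follows the statement that the path with the highest transition probability is selected.

```python
def transition_probabilities(tau_row, eta_row, candidates, alpha=1.0, beta=1.0):
    """Return a probability per node for moving from the current node.

    tau_row, eta_row: pheromone and heuristic values on links leaving the
    current node. candidates: indices the ant may move to (supplied by caller).
    Nodes outside the candidate set get probability 0.
    """
    weights = {j: (tau_row[j] ** alpha) * (eta_row[j] ** beta) for j in candidates}
    total = sum(weights.values())
    probabilities = [0.0] * len(tau_row)
    if total == 0:
        return probabilities  # no usable link; caller must handle this case
    for j, weight in weights.items():
        probabilities[j] = weight / total
    return probabilities


def next_node(probabilities):
    """Greedy rule: pick the node with the highest transition probability."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)
```

For example, with pheromone values `[1.0, 2.0, 1.0]`, heuristic values `[0.0, 0.25, 1.0]`, and candidates `[1, 2]`, the unnormalized weights are 0.5 and 1.0, so node 2 receives two thirds of the probability mass and is selected.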

When an ant selects a path among all existing paths, excluding the paths already in the solution, it updates the pheromone level locally as given in (4.3) [120].

\[
\tau_{ij}(t+1) =
\begin{cases}
(1-\rho) \times \tau_{ij}(t) + \Delta\tau, & \text{if } j \in S^{k}(t) \\
(1-\rho) \times \tau_{ij}(t), & \text{otherwise}
\end{cases}
\tag{4.3}
\]

Here, $\tau_{ij}(t+1)$ is the new pheromone level, which is increased by the amount $\Delta\tau$, and evaporation is governed by multiplication with the pheromone decay parameter $\rho$. Also, $S^{k}(t)$ is the solution of the $k$th ant at time $t$.
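The pheromone update can be illustrated with the short Python sketch below. It assumes the common evaporate-then-deposit form: every trail is multiplied by (1 - rho), and links that belong to the solution additionally receive a fixed deposit. The function name, the representation of a solution as a collection of (i, j) links, and the default parameter values are assumptions made for illustration only.

```python
def update_pheromone(tau, solution_links, rho=0.1, delta=1.0):
    """Return a new pheromone matrix after one update.

    tau: square matrix (list of lists) of current pheromone levels.
    solution_links: iterable of (i, j) links that belong to the solution.
    rho: pheromone decay parameter; delta: amount deposited on solution links.
    """
    links = set(solution_links)
    return [
        [
            (1.0 - rho) * tau[i][j] + (delta if (i, j) in links else 0.0)
            for j in range(len(tau[i]))
        ]
        for i in range(len(tau))
    ]
```

Passing the links of a single ant's solution gives a local update, while passing the links of the globally best solution gives the global update, since the text states that both use the same formula.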

Each ant provides a locally optimized food set based on the nutrition expert’s recommendations, but the interest here is in the globally optimized solution. Because a supervised approach is used in this study, the Root Mean Squared Error (RMSE) is used to select the globally best solution. The global solution is initialized to the empty set, and a solution is then constructed for each ant. Initially, the solution returned by the first ant is taken as the global solution. When the remaining ants return their solutions, the RMSE of each is compared with that of the current global solution, and the solution with the smaller RMSE becomes the global solution. For fast convergence, the pheromone level is updated again using the same formula, but only for the path that forms the globally optimal solution $S_{g}(t)$, as given in (4.4) [120].

\[
\tau_{ij}(t+1) =
\begin{cases}
(1-\rho) \times \tau_{ij}(t) + \Delta\tau, & \text{if } j \in S_{g}(t) \\
(1-\rho) \times \tau_{ij}(t), & \text{otherwise}
\end{cases}
\tag{4.4}
\]
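The selection of the global solution by RMSE can be sketched as follows. The sketch assumes that the nutrition supplied by a food set is the sum of the nutrient vectors of its foods, which the source does not state explicitly; the function names and the list-of-indices representation of a solution are likewise hypothetical.

```python
import math


def rmse(target, supplied):
    """Root mean squared error between the target and supplied nutrient vectors."""
    return math.sqrt(sum((t - s) ** 2 for t, s in zip(target, supplied)) / len(target))


def supplied_nutrition(solution, foods):
    """Sum the nutrient vectors of the selected foods (assumption of this sketch)."""
    width = len(foods[0])
    return [sum(foods[index][k] for index in solution) for k in range(width)]


def select_global_best(ant_solutions, foods, target):
    """Return (best_solution, best_error) over the solutions of all ants.

    The first ant's solution becomes the global solution; a later solution
    replaces it when the global error is greater than or equal to its error.
    """
    best, best_error = None, math.inf
    for solution in ant_solutions:
        error = rmse(target, supplied_nutrition(solution, foods))
        if best is None or best_error >= error:
            best, best_error = solution, error
    return best, best_error
```

For instance, with foods `[[1, 0], [0, 1], [2, 2]]` and target `[1, 1]`, the set containing the first two foods matches the target exactly and is kept as the global solution.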

In the food selection process, diversified foods need to be selected to increase the acceptance of the recommended foods among different people. The heuristic information is therefore managed and updated in such a way that the diversity among foods is maximized. The heuristic information is updated using (4.5) [120].

[Equation (4.5), the heuristic information update, is not recoverable from the extracted text.]

The update in (4.5) depends on the number of foods selected, the number of times a food is selected over the whole iteration, and parameters that balance the solution between the local and global perspectives. This heuristic information facilitates the selection of foods with minimal redundancy. Algorithm 1 presents food recommendation using ACO.

Algorithm 1: Food Recommendation using ACO

Input: Dataset (f, n) with f foods and n nutrition factors, prescribed nutrition plan T(n), and maximum iterations
Output: Optimized food set S_g selected based on T(n)

Set initial values of the heuristic information η_ij and the level of pheromone trails τ_ij
S_g ← ∅
repeat
    Generate and randomly place ants at different nodes of the graph
    k ← 1
    for k ≤ N_ants do
        while all nodes are not visited do
            Calculate transition probability using (4.2)
            Update pheromone locally τ_ij using (4.3)
            Add the selected food to S^k(t)
        end while
        if S_g = ∅ then
            S_g(t) = S^k(t)
        else if error(T, S_g) ≥ error(T, S^k) then
            S_g(t) = S^k(t)
        end if
    end for
    Update pheromone globally τ_ij using (4.4)
    Update heuristic information η using (4.5)
    Compute error(T, S_g) between the plan T and the nutrition supplied by S_g
until error(T, S_g) meets the stopping threshold or maximum iterations reached
Return S_g


4.4.3 Ranked List Generation

Collaborative filtering is the most widely used technique in recommendation systems; users receive similar recommendations when they have matching preferences [98, 110]. There are two approaches to expressing user preferences: implicit rating and explicit rating. Implicit rating is used when user preferences are expressed through the number of clicks, views, or purchases, whereas explicit rating is used when ratings for different items are available. Various techniques have been proposed to improve the accuracy of rating prediction, which attempts to predict users’ ratings of unseen items [70, 139, 200]. In practice, however, a ranked list of the top-K recommended items is presented to the user [3, 109, 110, 195]. Relevant studies have shown that ranking-based models achieve higher recommendation accuracy than rating-prediction techniques on the top-K recommendation problem [62, 134]. This is mainly because users pay more attention to items at the top of the ranked list than to items at the bottom. Consequently, the focus has shifted towards accurate ranked list generation instead of improving rating prediction. Therefore, a ranking technique is applied on top of the proposed ACO-based model to generate the top-K ranked recommendation list.

The proposed model uses ACO to generate an optimized list of 100 food items according to the pathological reports of the user. To generate the personalized list of top-K items, the optimized list computed by ACO must be ranked according to user preferences. For experimentation, the value of K is varied between 1 and 20.

A variety of ranking techniques are available to rank the final list generated by the underlying algorithm. To select the best ranking approach in terms of accuracy, time, and RMSE, a comparative analysis of the state-of-the-art methods is performed. The techniques included in the analysis are: (a) user-based KNN [241], (b) item-based KNN [108], (c) matrix factorization [242], (d) Bayesian [243], and (e) most popular [109]. The Amazon food items dataset is used, which comprises 1,297,156 ratings over 140,000 food items [244].

However, only those food items are included in the analysis for which nutrition facts are available in the CoFID [228] dataset used in our model. The selected ranking techniques are briefly described here.

(a) K-Nearest Neighbor

KNN is a machine learning technique for classification that falls in the category of lazy learning algorithms. It takes the K closest training examples in the feature space as input. The set of objects with known classes is collected before classification begins; this set is known as the training set of KNN. The feature that distinguishes KNN from K-means, another machine learning technique, is its high sensitivity to the local structure of the data. The KNN algorithm simply adds the training set to a list of training examples. Later, for each query, it finds the K nearest neighbors in the list of training examples and returns the class to which most of these K instances belong. For ranking, KNN-based approaches consider the labeled neighbors in the query feature space to rank the data. Two variations of KNN ranking are commonly used for recommendation, namely user-based KNN and item-based KNN.

(i) KNN User-Based:

In the user-based KNN approach, the goal is to estimate the rating $\hat{r}(u, i)$ that user $u$ would give to item $i$. Let $U$ represent the set of all users and $I$ the set of all items. The similarity between user $u$ and another user $u'$ is given by (4.6) [241].

\[
\mathrm{sim}(u, u') = \frac{\sum_{i \in I(u,u')} r(u, i)\, r(u', i)}{\sqrt{\sum_{i \in I(u,u')} r(u, i)^2}\;\sqrt{\sum_{i \in I(u,u')} r(u', i)^2}} \tag{4.6}
\]

In equation (4.6), $I(u, u')$ is the set of all items rated by both user $u$ and user $u'$. The set of neighbors $N(u)$ of user $u$ is chosen on the basis of the similarity values, and its size ranges from 1 to $|U| - 1$, i.e., all the other users. Therefore, $\hat{r}(u, i)$ can be calculated as the adjusted weighted sum of the given ratings $r(u', i)$, where $u' \in N(u)$:

\[
\hat{r}(u, i) = \bar{r}(u) + \frac{\sum_{u' \in N(u)} \mathrm{sim}(u, u')\,\bigl(r(u', i) - \bar{r}(u')\bigr)}{\sum_{u' \in N(u)} \lvert \mathrm{sim}(u, u') \rvert} \tag{4.7}
\]

where $\bar{r}(u)$ is the average rating of user $u$.
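The two user-based KNN formulas can be illustrated with a small Python sketch. It assumes that (4.6) is the cosine similarity over co-rated items and that (4.7) is the mean-centred weighted sum; the function names, the nested-dictionary layout of `ratings`, and the default `k=2` are hypothetical choices made for illustration only.

```python
import math

def cosine_similarity(ratings, u, v):
    """Cosine similarity over the items rated by both users, as assumed for (4.6)."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    num = sum(ratings[u][i] * ratings[v][i] for i in common)
    den_u = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    den_v = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0
    return num / (den_u * den_v)

def mean_rating(ratings, u):
    """Average of all ratings given by user u."""
    return sum(ratings[u].values()) / len(ratings[u])

def predict_rating(ratings, u, item, k=2):
    """Mean-centred weighted sum over the k most similar raters of item, as assumed for (4.7)."""
    candidates = [(cosine_similarity(ratings, u, v), v)
                  for v in ratings if v != u and item in ratings[v]]
    neighbours = sorted(candidates, reverse=True)[:k]
    norm = sum(abs(s) for s, _ in neighbours)
    if norm == 0:
        # No usable neighbour: fall back to the user's own average.
        return mean_rating(ratings, u)
    num = sum(s * (ratings[v][item] - mean_rating(ratings, v)) for s, v in neighbours)
    return mean_rating(ratings, u) + num / norm
```

Restricting the neighbourhood to users who actually rated the target item is a design choice of this sketch; the source only states that the neighbourhood is derived from the similarity values.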

(ii) KNN Item Based:

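Item-based KNN differs from user-based KNN only in that the roles of users and items are exchanged. Therefore, equations (4.6) and (4.7) can be rewritten for item-based KNN as (4.8) and (4.9) [245]:

\[
\mathrm{sim}(i, i') = \frac{\sum_{u \in U(i,i')} r(u, i)\, r(u, i')}{\sqrt{\sum_{u \in U(i,i')} r(u, i)^2}\;\sqrt{\sum_{u \in U(i,i')} r(u, i')^2}} \tag{4.8}
\]

\[
\hat{r}(u, i) = \bar{r}(i) + \frac{\sum_{i' \in N(i)} \mathrm{sim}(i, i')\,\bigl(r(u, i') - \bar{r}(i')\bigr)}{\sum_{i' \in N(i)} \lvert \mathrm{sim}(i, i') \rvert} \tag{4.9}
\]

where $U(i, i')$ is the set of users who rated both items $i$ and $i'$, $N(i)$ is the set of neighbors of item $i$, and $\bar{r}(i)$ is the average rating of item $i$.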

(b) Bayesian Networks:

Bayesian networks are used to characterize knowledge about a domain that is not

clearly defined. Bayesian networks are usually represented through a directed acyclic

graph (DAG). Particularly, each node in the DAG denotes a random variable, whereas

the edges between the nodes of DAG represent direct probabilistic inter-dependencies

between the connected random variables. The probabilistic dependencies among the

nodes (random variables) are calculated using known computational and statistical

techniques. Consequently, Bayesian networks bring together the principles from

probability theory, statistics, graph theory, and computer science.

Bayesian networks generate a system model from training data, in which each node represents a decision tree and the edges signify user information. The model can be built offline, which may take hours or days; the resulting model, however, is very compact, fast, and nearly as accurate as nearest neighbor models [246]. Bayesian networks have proven to be most effective in application areas where knowledge about user preferences changes slowly relative to the time required to build the model, and they are considered inappropriate for application scenarios in which user preferences are updated frequently. Consequently, Bayesian networks are selected for experimentation in conjunction with our proposed food recommendation model because user preferences regarding food are not expected to change frequently.

Bayesian ranking requires a set of variables and a set of directed edges between these variables. Each variable has a set of mutually exclusive states, and the variables together with the directed edges form a directed acyclic graph. To each variable $X$ with parents $B_1, B_2, B_3, \ldots, B_n$, a probability table $P(X \mid B_1, B_2, B_3, \ldots, B_n)$ must be attached. The equation for Bayesian ranking is then [247]

\[
P(X_i \mid B_j) = \frac{P(B_j \mid X_i)\, P(X_i)}{P(B_j \mid X_1)\, P(X_1) + P(B_j \mid X_2)\, P(X_2) + \cdots + P(B_j \mid X_n)\, P(X_n)} \tag{4.10}
\]

From equation (4.10), the simplified form of the denominator can be written as

\[
P(B_j) = \sum_{i=1}^{n} P(B_j \mid X_i)\, P(X_i) \tag{4.11}
\]

(c) Matrix Factorization

Matrix factorization, also known as matrix decomposition, expresses a matrix as a product of matrices. The most popular matrix factorization method is the lower-upper (LU) decomposition [248], which decomposes a matrix into a lower triangular matrix and an upper triangular matrix. The advantage of matrix factorization is its ability to find underlying latent factors and to predict the missing values of the matrix that represents a recommender system [81]. In recommender systems, matrix decomposition characterizes users and items by vectors of factors that are estimated from item rating patterns. State-of-the-art recommendation systems place the input data in a matrix in which users are located on one dimension and items of interest on the other. Formally, let $U$ represent a set of users and $I$ a set of items. Let $A$ be the matrix of size $|U| \times |I|$ that contains all the ratings assigned by the users to the items, and assume that $Z$ latent features are to be found. Two matrices, $P$ with dimensions $|U| \times Z$ and $Q$ with dimensions $|I| \times Z$, must then be found such that their product approximates matrix $A$, as indicated in equation (4.12) [249].

\[
A \approx P \times Q^{T} = \hat{A} \tag{4.12}
\]

In this way, each row of $P$ represents the strength of the association between a user and the features, and each row of $Q$ represents the strength of the association between an item and the features. For a user $u_i$, the rating of item $i_j$ can be predicted as the dot product of the two vectors corresponding to $u_i$ and $i_j$, as shown in (4.13) [249].

\[
\hat{r}_{ij} = p_i^{T} q_j = \sum_{z=1}^{Z} p_{iz}\, q_{jz} \tag{4.13}
\]
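The factorization and the dot-product prediction of (4.13) can be sketched in Python as below. The stochastic gradient descent update, the learning rate, the regularization weight, and the convention that 0 marks a missing rating are assumptions made for this illustration; the source does not state how the two factor matrices are fitted.

```python
import random

def predict(P, Q, u, i):
    """Dot product of the user and item factor vectors, as in (4.13)."""
    return sum(p * q for p, q in zip(P[u], Q[i]))

def factorize(A, n_factors=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Fit user factors P and item factors Q to the observed entries of A by SGD."""
    rng = random.Random(seed)
    n_users, n_items = len(A), len(A[0])
    P = [[rng.random() for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.random() for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(steps):
        for u in range(n_users):
            for i in range(n_items):
                if A[u][i] == 0:          # assumed convention: 0 means "not rated"
                    continue
                err = A[u][i] - predict(P, Q, u, i)
                for z in range(n_factors):
                    p, q = P[u][z], Q[i][z]
                    # Gradient step on the squared error with L2 regularization.
                    P[u][z] += lr * (2 * err * q - reg * p)
                    Q[i][z] += lr * (2 * err * p - reg * q)
    return P, Q
```

After fitting, `predict(P, Q, u, i)` can also be evaluated for entries that were missing in the input matrix, which is what makes the technique usable for ranking unseen items.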

(d) Most Popular

As the name implies, the most popular technique recommends to a user the items that are most popular among the other users. In particular, items that have a higher average rating and have been rated by a larger number of users are recommended to the user under consideration. Mathematically, the most popular ranking [245] can be expressed as

\[
\mathrm{rank}_{MP}(i) = |U(i)|, \quad \text{where } U(i) = \{\, u \in U \mid \exists\, r(u, i) \,\} \tag{4.14}
\]

where $r(u, i)$ is the rating of user $u$ on item $i$.
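A minimal Python sketch of the most popular ranking follows. It assumes that the score of an item is simply the number of distinct users who rated it, as (4.14) suggests; the function name, the `(user, item, rating)` tuple layout, and the alphabetical tie-break are hypothetical.

```python
from collections import defaultdict

def most_popular_ranking(ratings):
    """Rank items by the number of distinct users who rated them."""
    raters = defaultdict(set)
    for user, item, _rating in ratings:
        raters[item].add(user)
    # Descending rater count; ties are broken alphabetically for determinism.
    return sorted(raters, key=lambda item: (-len(raters[item]), item))
```

Because the score ignores the rating values and the target user, every user receives the same list, which is why this technique serves only as a non-personalized baseline.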

4.4.4 Complexity of Proposed Algorithm

In the ACO-based food recommendation algorithm, the outermost loop is executed at most $T$ times, where $T$ represents the maximum number of iterations. The inner for loop is executed $k$ times, where $k$ is the total number of ants, and the innermost while loop is executed $f$ times, where $f$ is the total number of food items. Consequently, the complexity of the ACO-based algorithm is given by equation (4.15):

\[
O(T \times f \times k) \tag{4.15}
\]
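As a rough worked example of (4.15), the bound can be evaluated with the settings listed in Table 4.3, writing $T$ for the maximum number of iterations, $f$ for the number of food items, and $k$ for the number of ants. The choice of 80 ants is an assumption taken from the convergence analysis reported later in this chapter, and the figures count loop-body executions only, not the cost of each step.

```latex
\begin{align*}
T \times f \times k &= 200 \times 3400 \times 80 = 54{,}400{,}000
  && \text{(80 ants)} \\
T \times f \times k &= 200 \times 3400 \times 120 = 81{,}600{,}000
  && \text{(120 ants, upper end of the tested range)}
\end{align*}
```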

4.5 Formal Verification

A basic introduction to HLPN, SMT-Lib, and the Z3 solver is provided in Section 3.4. Tables 4.1 and 4.2 show the data types and their mappings, respectively. The HLPN model for Diet-Right is shown in Figure 4.4.

Table 4.1 Data Types for HLPN Model

Data Type     Description
Kants         A list representing k number of ants in the system.
Lpher_trial   A list depicting local pheromone trail of ants.
GSol          A list showing global solution.
Gpher_trial   A list depicting global pheromone trail of ants.
Nut           A list showing nutritional values.
HI            A list containing heuristic information of nutrition.
Nodes         List of nodes in the system.
Trans_prob    A list with transitional probability of every ant.

Table 4.2 Places and mapping used in HLPN model

Place         Mapping
φ (Init)      Ϸ (Kants × Lpher_trial × Gpher_trial × Nut × HI × GSol)
φ (Nod)       Ϸ (Nodes × Kants × Lpher_trial × Gpher_trial × Nut × HI × GSol)
φ (Prob)      Ϸ (Kants × Trans_prob × Lpher_trial × Gpher_trial × Nut × HI × GSol)
φ (Update)    Ϸ (Kants × Lpher_trial × Gpher_trial × GSol)
φ (GS)        Ϸ (GSol)
φ (SS)        Ϸ (GSol)

Figure 4.4 HLPN Model for the Proposed Diet-Right Algorithms

4.5.1 Modeling and Analysis of Proposed Algorithm

The process starts with the initialization of values, where the ants are given local and global pheromone trails. The global solution is initialized to NULL, and the heuristic information is initialized against the nutrition values. The following rule describes this process:

R(I_V) =⩝ c₁ ϵ C₁,⩝ c₂ ϵ C₂ ,⩝ c₃ ϵ C₃ ⎹ (C1D ∶= ��� �(�C2D , (�C3D , (�C1D� ˄

(C5D ∶= ���&_\� �(�C4D , (�C5D� ˄

(C6D ∶= �'@@ ˄

(ˊ = (� ∪ �(C1D , (C5D , (C6D� ˄ (�C3D ∶= (C2D˄ (�C4D ∶= (C3D˄ (�C5D ∶=(C4D˄ (�C6D ∶= (C5D˄ (C7D ∶= (C6D˄

(�ˊ = (� ∪ �(�C3D , (�C4D , (C5D, (�C6D,(�C7D�

Later, the ants are randomly assigned to the nodes.

�(/_/) = ∀� ∈ (�, ∀� ∈ (�, ∀� ∈ (�⎹

(�[1] ∶= /���)� (�����((�C2D)) ˄

(�ˊ = (� ∪ �(� C1D� ˄

(�ˊ = (� ∪ �(��

The transitional probability of every ant is calculated according to equation (x). . .

�(=_<) = ∀� ∈ (�, ∀� ∈ (�⎹

(�C2D: = (��+,-.�(�C1D, (�C2D ˄

(�C3D: = (�C3D ˄ (�C4D˄ ∶= (�C4D ˄(�C5D: = (�C5D ˄(�C6D: = (�C6D ˄(�C7D: = (�C7D ˄

(�ˊ = (� ∪ �(� C2D, (� C3D, (� C4D, (� C5D, (� C6D, (� C7D�

The local and global pheromone trails and the global solution are then updated.

�('_�) = ∀� ∈ (�, ∀� ∈ (�⎹

(�C2D ≔ '���&B/�(�C3D), ˄ (�C3D ˄ ≔ '���&B_E ((�C4D)˄(�C4D≔ '���&B_E� ((�C7D)

(�ˊ = (� ∪ �(� C2D, (� C3D,(� C4D�

At the end of iterations, global solution is returned to the user as was described in . . . .

�(�_�) = ∀�� ∈ (��, ∀�� ∈ (��⎹

(�� ∶= (�� ˄

(��ˊ = (�� ∪ �(���

4.5.2 Verification Property

The aim was to verify that the foods with highest nutritional values are returned. The

tested property was

/E 2��ℎ�)ℎB�&���&C�D, \�C�D, E�� 3

The property was tested in the Z3 solver and was satisfied in 397 msec.

4.6 Experimental Setup and Results

Extensive simulations are conducted to evaluate performance of the proposed system.

The experimental setup and results are discussed as follows.

4.6.1 Experimental Setup

The details of the experimental setup and parameters used for evaluation are presented

in Table 4.3.

Table 4.3 Experimental Setup

Parameters                                  Values
Total Number of Food Items                  3400
Nutrition for Each Food Item                26
Total Number of Pathological Test Reports   345
Number of Ants used for Simulation          10-120
Maximum Iterations for each Ant             200
Simulation Tool                             MATLAB
Single Node System Configuration            RAM 16 GB, Cores 4
Cloud Configuration                         MATLAB Parallel Cloud, Cores 16

4.6.2 Results

As a first step, well-known optimization techniques are compared in terms of accuracy and time complexity in order to select one for our study. The techniques include ACO, PSO, and GA. The ACO metaheuristic is a constructive, population-based approach that relies on the social behavior of ants. It is recognized as one of the most powerful approaches for solving combinatorial optimization problems. PSO is a population-based stochastic approach to solving optimization problems, whereas GA is used to generate high-quality solutions to optimization and search problems. GA offers no guarantee of reaching the global optimum and, most importantly, the convergence time of GA and PSO is relatively high when solving real-time problems. Moreover, it is also observed that the convergence time, accuracy, and RMSE of the aforementioned techniques are problem dependent.

Figure 4.5 shows the comparison of ACO, PSO, and GA in terms of accuracy. The behavior of the system is tested by varying the size of the dataset. It is observed that the accuracy of ACO is higher than that of PSO and GA. The reason for this behavior is that the ants in ACO search for the optimal solution effectively and collaboratively. Another reason for the high accuracy of ACO is that, in a realistic domain, ants can discover their targets efficiently and rapidly owing to their higher global search capability.

Figure 4.5 Comparison on Accuracy (x-axis: No. of Inputs; y-axis: Accuracy; series: ACO, PSO, GA)

Figure 4.6 shows the comparison of the three techniques in terms of time complexity, i.e., convergence time. The convergence time of an algorithm is the average iteration time required to converge to the optimal solution. The same parameters are used to compute the time complexity of all the techniques. It is evident from the figure that the convergence time of ACO is lower than that of PSO and GA. The reason for the lower convergence time of ACO is its local and global pheromone update strategy. The collaborative search for target nodes and the local and global updates of the pheromone level help ACO converge to a solution in minimal time.

Figure 4.6 Comparison on Time Complexity

Finally, all three techniques are compared in terms of average RMSE, as depicted in Figure 4.7. It is evident from the figure that ACO outperforms PSO and GA by achieving the minimum average RMSE. The reason for the minimum RMSE is that each ant in ACO must visit each node to reach the optimal solution, which means more iterations and more calculations, and these help ACO reduce the error rate.
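RMSE is not defined in this section, so the sketch below assumes the standard root-mean-square error between predicted and reference values, together with a plain average over repeated runs. The function names and the list-based inputs are hypothetical.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equally long sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))

def average_rmse(runs):
    """Mean RMSE over several (predicted, actual) runs."""
    return sum(rmse(p, a) for p, a in runs) / len(runs)
```

Averaging over runs is useful for a randomized method such as ACO, where a single execution may not be representative.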

(Figure 4.6 plot: x-axis No. of Inputs; y-axis Time in Seconds; series GA, PSO, ACO)

Figure 4.7 Comparison on Average RMSE (x-axis: No. of Inputs; y-axis: Average RMSE; series: ACO, PSO, GA)

The behavior of the proposed algorithm is analyzed in terms of time complexity. It is observed that increasing the number of ants drives the solution towards its minimum cost; in practice, however, it is not feasible to use a very high number of ants, because using more ants to construct a solution increases the time complexity, as shown in Figure 4.8.

Figure 4.8 Tradeoff between Numbers of Ants to Time Complexity

To select the optimal number of ants for the best results irrespective of time complexity, the RMSE was estimated. It is observed that 110 ants produce the lowest error rate. As our algorithm uses random initialization, it produced varied results. To address this variation, the average of 10 executions with the same settings is used. It can be observed that the RMSE decreases as the number of ants increases. Figure 4.9 shows the average RMSE with respect to the increasing number of ants.

Figure 4.9 Tradeoff between number of Ants and Average RMSE

To select the optimal number of ants, a best-cost analysis is performed over the number of iterations versus the number of ants. It is evident from Figure 4.10 that, in our case, 80 ants provide the best results in terms of convergence of the solution.

Figure 4.10 Cost over Varying No. of Ants and Iterations

Figure 4.11 presents the convergence time for different diseases. As shown in the previous result, 80 ants provide the best result in terms of convergence of the solution. Therefore, 80 ants are used to compare the convergence time across the most common diseases. The result shows that the convergence time for a normal person is higher than for persons with a disease, whereas the convergence time to recommend foods for a hypertension patient is significantly lower than for the others. The reason for this variance is that the number of foods available for a normal/healthy person is much higher than the number of foods available for a patient.

Figure 4.11 Convergence Time of Different Diseases (x-axis: Normal, Iron Deficiency, Kidney Disease, Diabetes, Hypertension; y-axis: Time in Seconds)

(Figure 4.10 plot: x-axis Iterations; y-axis Average RMSE; series 20, 40, 60, 80, and 100 Ants)

Figure 4.12 illustrates the cost comparison of common diseases. It can be seen that the least cost is achieved for hypertension, whereas the normal person exhibits the highest cost. This suggests that the dataset used in this study is more suitable for certain diseases, such as hypertension.

Figure 4.12 Cost Comparisons of Diseases

Figure 4.13 depicts the accuracy of recommendations relative to the number of ants. The result shows that the highest accuracy is achieved with 110 ants. In general, the accuracy increases as the number of ants is increased; however, it remains constant between 80 and 100 ants.

Figure 4.13 Accuracy of Recommendations (x-axis: No. of Ants, 10 to 120; y-axis: Accuracy)

(Figure 4.12 plot: x-axis Iterations; y-axis Average RMSE; series Diabetes, Iron Deficiency, Kidney Disease, Hypertension, Normal)

As the last step before generating the final recommendations for the user, the optimized food list generated by ACO must be ranked. To select the best ranking technique in terms of accuracy, a comparative analysis of the state-of-the-art ranking techniques discussed in Section 4.4.3 is conducted. The results of the comparative analysis are discussed in the subsequent paragraphs.

Figures 4.14, 4.15, and 4.16 compare all the selected techniques in terms of precision, recall, and F-Measure, respectively. The behavior of the system is analyzed by varying the value of K from 2 to 10. It is observed that the KNN variants, i.e., item-based KNN and user-based KNN, outperform the other techniques in terms of precision, recall, and F-Measure.
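The three metrics are not defined in this section; the sketch below assumes their usual top-K definitions, where precision is the fraction of the K recommended items that are relevant, recall is the fraction of relevant items that appear in the top K, and F-Measure is their harmonic mean. The function name and argument layout are hypothetical.

```python
def precision_recall_f1_at_k(ranked_items, relevant_items, k):
    """Return (precision, recall, F-measure) for the first k ranked items."""
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant_items) if relevant_items else 0.0
    # The harmonic mean is undefined when there are no hits; report 0 instead.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1
```

For a ranked list `["a", "b", "c", "d", "e"]`, relevant set `{"a", "c", "f"}`, and K = 4, two of the four recommended items are relevant, giving a precision of 1/2, a recall of 2/3, and an F-Measure of 4/7.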

Figure 4.14 Precision @ K=10 (x-axis: No. of Food Items, 2 to 10; y-axis: Precision; series: KNN-ITEM, KNN-USER, MF, MP, BAYESIAN)

Figure 4.15 Recall @ K=10

Figure 4.16 F-Measure @ K =10

Figures 4.17, 4.18, and 4.19 compare the selected techniques in terms of precision, recall, and F-Measure for K = 12 to 20. Similar behavior is observed, with the exception that the values of precision, recall, and F-Measure are relatively higher than for K up to 10. The higher accuracy is achieved because of the higher value of K: when more recommendations are generated, e.g., K = 20, the techniques are able to generate a ranked list of food items with higher precision, recall, and F-Measure.

(Figure 4.15 plot: x-axis No. of Food Items, 2 to 10; y-axis Recall; series KNN-ITEM, KNN-USER, MF, MP, BAYESIAN)

(Figure 4.16 plot: x-axis No. of Food Items, 2 to 10; y-axis F-Measure; series KNN-ITEM, KNN-USER, MF, MP, BAYESIAN)

Figure 4.17 Precision @ K =20 (x-axis: Top-K Recommendations, 12 to 20; y-axis: Precision; series: KNN-ITEM, KNN-USER, MF, MP, BAYESIAN)

Figure 4.18 Recall @ K =20 (x-axis: Top-K Recommendations, 12 to 20; y-axis: Recall; series: KNN-ITEM, KNN-USER, MF, MP, BAYESIAN)

Figure 4.19 F-Measure @ K =20

Figures 4.20 and 4.21 show the average RMSE of all the selected techniques as K varies from 2 to 20. It is evident from the results that the KNN variants outperform the other techniques in terms of average RMSE across the whole dataset. The reason for the superior performance of KNN is that the distance between a given user/item and its nearest neighbors becomes smaller because of the smaller size of the dataset, which helps the KNN variants find the nearest neighbors with a low error rate. It is also observed that MF and MP attain the highest RMSE values. One of the main reasons that MF does not provide the minimum error rate is that MF techniques consider only the Euclidean structure of the data and ignore its geometric structure. This leads to a higher error rate for the MF technique, especially when ranking a given list.

Figure 4.20 Average RMSE @ K =10 (x-axis: Top-K Recommendations, 2 to 10; y-axis: Average RMSE; series: KNN-ITEM, KNN-USER, MF, MP, BAYESIAN)

(Figure 4.19 plot: x-axis Top-K Recommendations, 12 to 20; y-axis F-Measure; series KNN-ITEM, KNN-USER, MF, MP, BAYESIAN)

Figure 4.21 Average RMSE @ K =20

Similarly, the reason that the Bayesian technique does not perform well is the difficulty of selecting a prior (probability distribution), which is needed for accurate recommendations. The Bayesian technique needs assistance in transforming individual prior views into a mathematically expressed prior, which results in sub-optimal performance when it is used for ranking. However, Bayesian techniques can outperform other techniques if the probability distribution is selected correctly for the dataset used for ranking. It should be emphasized that the performance of the aforementioned techniques largely depends on the datasets and the particular application scenarios.

Figure 4.22 Time Comparison @ K =10 (x-axis: Top-K Recommendations, 2 to 10; y-axis: Time in Seconds; series: KNN-ITEM, KNN-USER, MF, MP, BAYESIAN)

(Figure 4.21 plot: x-axis Top-K Recommendations, 12 to 20; y-axis Average RMSE; series KNN-ITEM, KNN-USER, MF, MP, BAYESIAN)

Figures 4.22 and 4.23 compare all the techniques in terms of execution time. The selected techniques are compared by varying K from 2 to 20 while increasing the size of the dataset. Bayesian and MP exhibit the highest execution times for all values of K. The reason for the higher execution time of the Bayesian technique is its computational cost, especially when the dataset contains a large number of parameters, as in our case. Another reason for the higher computational cost of these techniques is that they are mostly used with implicit ratings, whereas in our case explicit ratings are used because the objective of this work is to improve the accuracy of top-K recommendations. The choice depends on the dataset used for the experiments and the ultimate target of the recommendations; in most cases, implicit ratings are used when the performance metric is the diversity or novelty of the recommendations. It is also noteworthy that Bayesian techniques are reported to be computationally expensive in most of the literature [250].

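Figure 4.23 Time Comparison @ K =20 (x-axis: Top-K Recommendations, 12 to 20; y-axis: Time in Seconds; series: KNN-ITEM, KNN-USER, MF, MP, BAYESIAN)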

The proposed model, Diet-Right, is compared with two existing techniques: the Knowledge Based (KB) technique [2] and food recommendation using ontology and heuristic based approach (FROH) [251]. The KB technique is based on explicit knowledge about the user preferences and the recommendation criteria. KB recommendation systems usually generate recommendations that are domain dependent. FROH uses TF-IDF in combination with the cosine similarity measure to generate the food list, and it also uses heuristic information to generate the final list.

Figure 4.24 Precision

Figure 4.25 Recall

Figures 4.24, 4.25, and 4.26 show that our system achieves the best precision, recall, and F-Measure compared with the existing techniques FROH and KB. The reason that Diet-Right outperforms KB and FROH is that, in ACO, each ant must visit each node to reach the optimal solution, which means that more relevant food items are selected by the ants; this ultimately increases precision, recall, and F-Measure.

(Figure 4.24 plot: x-axis No. of Food Items, 1 to 10; y-axis Precision; series Diet-Right, KB, FROH)

(Figure 4.25 plot: x-axis No. of Food Items, 1 to 10; y-axis Recall; series Diet-Right, KB, FROH)

Figure 4.26 F-Measure

Figure 4.27 depicts the convergence time of single-node and cloud-based execution. For this experiment, our algorithm is executed using MATLAB's cloud framework [252]. It is evident from the result that the convergence time is significantly reduced with cloud-based execution: on average, it is approximately 12 times lower than with single-node execution.

Figure 4.27 Convergence Time of Single Node and Cloud (x-axis: No. of Ants, 10 to 120; y-axis: Time in Seconds, logarithmic scale; series: Single Node, Cloud)

(Figure 4.26 plot: x-axis No. of Food Items, 1 to 10; y-axis F-Measure; series Diet-Right, KB, FROH)

4.7 Summary

In this chapter, a cloud based food recommendation system called Diet-Right is

presented. Based on user input, it recommends a list of optimal food items using an

ACO model. Diet-Right manages and updates the heuristic information in such a way

that the diversity among foods is maximized. Extensive experimentation was performed

to check the cost, accuracy, convergence time, and performance gain. Diet-Right can

play a vital role in controlling various diseases. The experimental results showed that

compared to single node execution, the convergence time of parallel execution on cloud

is approximately 12 times lower.

Chapter 5

Conclusion and Future Work

5.1 Conclusions

The overall findings of this dissertation are summarized as follows. A detailed quantitative analysis of four state-of-the-art recommendation techniques has been conducted over the Foursquare, MovieLens, and Gowalla datasets to show how the performance of existing models is affected by different datasets. The quality of recommendations is significantly improved by using auxiliary information, such as check-in data, geographical information, social relationships, and temporal information. Similarly, the experimental results showed that model-based approaches are more effective and efficient than memory-based approaches in terms of accuracy and scalability. Moreover, the performance of model-based approaches proved to be scalable as the number of recommendations increases.

In the recent past, a tremendous increase has been witnessed in the use of social networking sites. These sites collect different kinds of information from users, such as check-ins, geographical information (longitude and latitude), time, user ratings, and comments about locations. To this end, Amazon, Facebook, Twitter, Google, and YouTube gather massive amounts of data on a daily basis. For instance, statistics show that YouTube users alone upload 72 hours of new video every minute, Facebook users share approximately 2.5 million pieces of information (pictures, videos, check-ins, and comments) every minute, and Google receives over 4 million search queries every minute [253]. These services allow users to share different types of data, such as images, videos, tweets, ratings, and comments. Therefore, the unstructured data created by these services for a variety of applications is huge in volume. The collected data is used to generate a variety of recommendations according to the users' preferences. There are numerous challenges in handling and parsing this huge volume of data. This dissertation focuses on the challenges faced by recommendation systems in terms of scalability, real-time recommendations, data sparsity, and cold start.

5.1.1 Real-Time Venue Recommendations

This research is started with the problem to generate real-time recommendations from

the large datasets through effective use of scalable cloud architecture. This study has

presented multifold contributions by devising a cloud based real-time venue

recommendation framework. The major contribution of this work is the integration of

knowledge engineering techniques, including K-means, KNN, and collaborative

filtering, on a cloud infrastructure for generating real-time recommendations. The

proposed RTVR framework considers the effect of dynamic real-world physical factors

in addition to the collective opinions of experienced users. RTVR addresses the

scalability issues by putting forward a cloud-based architecture that offloads data and

computationally intensive processing to cloud servers. As a result, RTVR always

considers a precompiled set of experienced users for all categories, which enables it to

recommend appropriate venues for new users at a finer granularity. To handle the

problem of cold start in venue recommendations, our model maintains a pre-computed

ranked list of popular users and venues. Using this list, the model compares the

preferences of a new user with those of the existing popular users, i.e., the Top-N users.

The model then generates recommendations according to the top user whose

preferences best match those of the new user. The data sparsity problem is handled in

the pre-processing phase, when the model clusters the venues. The results of the

proposed RTVR model show that this research achieved its targets of scalability,

handling of cold start and data sparsity, and generation of real-time recommendations.
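The cold start handling described above, in which a new user is matched against a pre-computed list of top users, can be illustrated with a minimal sketch. The similarity measure (cosine similarity), the function names, and the sample preference vectors are assumptions made for illustration only; they are not taken from the RTVR implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length preference vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no preference information
    return dot / (norm_a * norm_b)

def recommend_for_new_user(new_user_prefs, top_users, k=3):
    """Pick the top user most similar to the new user and return up to k of
    that user's ranked venues.

    top_users maps a user id to (preference vector, ranked venue list).
    """
    best_id = max(
        top_users,
        key=lambda uid: cosine_similarity(new_user_prefs, top_users[uid][0]),
    )
    return best_id, top_users[best_id][1][:k]

# Hypothetical pre-computed top users; preference weights are over the
# assumed categories (food, shopping, outdoors).
top_users = {
    "u1": ([0.9, 0.1, 0.0], ["cafe_a", "diner_b", "bistro_c"]),
    "u2": ([0.1, 0.8, 0.1], ["mall_x", "market_y"]),
    "u3": ([0.0, 0.2, 0.8], ["park_p", "trail_q"]),
}

best, venues = recommend_for_new_user([0.2, 0.7, 0.1], top_users, k=2)
print(best, venues)
```

Because the list of top users is computed offline, the online step reduces to one similarity computation per top user, which is what keeps the lookup cheap for a user with no history.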

Moreover, an extensive comparison is performed with state-of-the-art and popular

techniques, i.e., OmniSuggest, POPULAR, SVD, and Social. As a first step, the

performance metrics of precision, recall, and F-measure are chosen to evaluate the

effectiveness of the RTVR model. The results showed that the RTVR model performed

better than all of the selected techniques except OmniSuggest in terms of precision,

recall, and F-measure. OmniSuggest performs better on these metrics because the

focus of our proposed model is to generate real-time recommendations, which is

achieved by a significant reduction of the dataset. Consequently, to generate real-time

recommendations, RTVR slightly compromises accuracy due to the inherent tradeoff

between data size and accuracy [1]. The better performance compared to the rest of the

techniques is due to clustering the venues using the K-means clustering technique. It is

observed from the literature that most of the traditional techniques perform clustering

based on the user’s location. In contrast, better performance is achieved by clustering

the venues and reducing the dataset in order to place the user in his/her relevant cluster

immediately after a check-in is performed. Moreover, the proposed model is also

evaluated against the other techniques without clustering.

Results of the experiments showed that without clustering, high precision and recall

values are achieved. However, time complexity also increases because the proposed

algorithm is executed over the complete dataset. MapReduce is used for the parallel

execution of the selected clusters on the cloud. A tradeoff therefore exists between

recommendation quality and dataset reduction: the quality of recommendations may

be affected if the dataset is significantly reduced to improve the efficiency of online

real-time processing. The evaluation results indicate that the performance of the

proposed RTVR framework is superior to that of many existing schemes proposed in the

literature.
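The precision, recall, and F-measure metrics used in the comparison above follow the standard set-based definitions, sketched below for a single user. The function name and the sample venue identifiers are hypothetical and serve only to make the arithmetic concrete.

```python
def precision_recall_f_measure(recommended, relevant):
    """Standard set-based precision, recall, and F-measure (F1) for one user.

    recommended: venues returned by the recommender
    relevant:    venues the user actually visited or liked (ground truth)
    """
    recommended = set(recommended)
    relevant = set(relevant)
    hits = len(recommended & relevant)  # correctly recommended venues
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical example: 4 venues recommended, 3 relevant, 2 in common.
p, r, f = precision_recall_f_measure(
    ["v1", "v2", "v3", "v4"], ["v2", "v4", "v7"]
)
print(p, r, f)
```

In this example two of the four recommended venues are relevant (precision 2/4) and two of the three relevant venues are recovered (recall 2/3), so the F-measure is their harmonic mean, 4/7.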

5.1.2 Food Recommendation

Next, the field of health recommendations is explored and a cloud-based food

recommendation system, called Diet-Right, is presented for dietary recommendations

based on users’ pathological reports. The proposed model uses the ant colony

optimization (ACO) algorithm to generate an optimal food list and recommends suitable

foods according to the values in the pathological reports. Diet-Right can play a vital

role in controlling various diseases. One motivation for using ACO is that it can be

executed as a distributed algorithm and is therefore suitable for a cloud environment.

Another advantage of ACO is its quick discovery of optimal solutions, which makes it

usable in dynamic applications. Moreover, an extensive comparison is performed

among popular optimization techniques, i.e., ACO, PSO, and GA. All three techniques

are compared in terms of time complexity, accuracy, and RMSE in order to select one

optimization technique for our experiments. The results showed that ACO

outperformed PSO and GA in terms of time complexity, accuracy, and RMSE. However,

it must be noted that the performance of these optimization techniques is problem

dependent. The experimental results showed that, compared to single-node execution,

parallel execution on the cloud converges approximately 12 times faster. Moreover,

adequate accuracy is achieved by increasing the number of ants. Similarly, experiments

were performed to analyze the cost, accuracy, convergence time, and performance gain

of the proposed model. The experimental results indicate that the proposed model

generates optimal food recommendations for users in a scalable manner. Moreover, our

model is compared with the traditional Knowledge Based (KB) model in terms of

precision, recall, F-measure, and time complexity. The results showed that the

Diet-Right model outperforms the KB model on all four measures. Furthermore, the

problem of cold start does

not exist in the proposed food recommendation model because the historical data of the

user is not required to generate recommendations. The proposed model requires only

the values of the pathological reports and demographic data provided by the user to

generate personalized food recommendations.

The work presented in this dissertation addresses an important problem concerning

real-time recommendations in a scalable architecture. Evaluation metrics such as

accuracy, diversity, and quality of recommendation have been used to evaluate the

effectiveness of the proposed models. This dissertation provides key elements, in terms

of system models and algorithms, that other researchers can use to develop novel

algorithms and architectures and to design and evaluate the performance of such

systems. The state of the art is used to position the contributions of this work, and the

performance of the proposed models is analyzed against relevant key algorithms.

The main motivation behind this study was to generate real-time recommendations

when dealing with a high volume of unstructured data. However, generating real-time

recommendations in isolation is not valuable unless other relevant parameters, such as

scalability, cold start, data sparseness, accuracy, and diversity in recommendations, are

considered. The system models and algorithms presented in this dissertation are

specifically targeted at generating real-time recommendations while balancing the

challenges of scalability, data sparsity, and cold start. Our proposed models can be used

in a variety of recommendation system applications. For venue-based

recommendations, our model can be used by users searching for a variety of locations,

such as restaurants, shopping malls, hospitals, and tourist places. For example, a user

arriving in a new area may have difficulty choosing venues to visit without proper

recommendations that match his/her preferences. Similarly, our food recommendation

model can be used in a variety of health applications, such as balanced diet and health

recommendations for different diseases.

Based on the aforementioned findings, it is concluded that cloud computing plays a vital

role in generating real-time recommendations and solving scalability challenges.

Similarly, cloud-based approaches coupled with clustering techniques help in

efficiently analyzing the huge volume of unstructured data collected by different

online social network services. Moreover, reducing the dataset helps in generating

real-time recommendations; however, a significant reduction of the dataset may affect the

accuracy of the recommendations. This tradeoff between dataset reduction and accuracy

affects the quality of the recommendations provided to users.

5.2 Opportunities in Recommendation Systems

Recommendation systems have vast applications in areas such as healthcare,

transportation, tourism, and education. The following subsections briefly discuss some

of the opportunities in adopting the services of recommendation systems.

5.2.1 Healthcare

Healthcare is one of the main areas where recommendation systems can significantly

enhance the efficiency, reliability, and effectiveness of the system [190, 254, 255].

People from various domains often require multiple healthcare services, such as

disease specialists, hospitals, and health insurance plans [256], that closely match

their preferences. Recommendation systems can play an important role in the

healthcare industry by connecting patients, healthcare providers, and insurance

companies and by providing them with localized recommendations.

5.2.2 Transportation

Another interesting area for the adoption of recommendation systems is transportation.

Recommendation systems can help with route recommendations, e.g., for individuals

driving their personal vehicles, cab drivers, and public transporters [68, 69]. Heavy

traffic in peak hours is a significant problem all over the world. In such situations,

people can use the services of LBRS to find different routes to their destinations.

Similarly, carpooling is among the services provided by recommendation systems

(footnote 10) [257]. Effective adoption of recommendation systems in transportation

can significantly reduce the cost of fuel and enhance the reliability of the services

provided by LBRS.

10 www.uber.com

5.2.3 Tourism

An important area where recommendation systems are actively deployed is the tourism

industry, where people want to plan in advance which locations to visit. It is sometimes

difficult to choose an appropriate place when one has to decide among multiple

available choices. Recommendation systems have been used to make tourism more

effective by recommending appropriate trip plans as well as well-known nearby points

of interest, such as hotels, restaurants, and shopping malls [129, 258]. The adoption of

recommendation systems in tourism can save tourists significant time in reaching their

destinations.

5.2.4 Education

Another key area where recommendation systems can play an important role is

education. Students need suitable institutions, such as colleges and universities, for

their higher education. Recommendation systems can be used to discover the best

institutes according to the preferences of the student (footnotes 11 and 12).

11 www.ratemyprofessors.com/ 12 https://foursquare.com/

5.3 Future Directions

There are several research directions in which the work presented in this dissertation

can be extended. Future extensions applicable to the proposed work are listed below.

5.3.1 Real-World Factors and Group Recommendation

With the continuous evolution of social networking services, the significance of

recommendation models that consider group preferences has also increased. However,

most of the existing traditional recommendation schemes do not take “group of friends”

scenarios into account [38, 66, 69]. In group scenarios, the recommendation system

must model not only the preferences of each group member but also the location of each

member in order to satisfy all the members of the group. The individuals’ preferences

and their preferred locations are then aggregated into the recommendation for the whole

group [172]. In the recent past, limited work has been carried out in the field of group

recommendations, such as [70, 172, 259]. Most of the existing techniques proposed in

the literature do not specifically focus on the effects of real-time physical factors, such

as distance from the location, traffic, and weather conditions, on group

recommendations. Moreover, the complexity and cost of processing large-scale

datasets negatively affect the performance and efficiency of recommendation systems.

In the context of LBRS, limited work has been performed on group-based location

recommendations.
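The aggregation of individual preferences into a single group recommendation, mentioned above, can be sketched with two widely used aggregation strategies: averaging the members' scores and "least misery" (taking the minimum score). This is an illustrative sketch only; the strategy names, function name, and sample scores are assumptions and do not represent the method of [172] or of this dissertation.

```python
def aggregate_group_scores(member_scores, strategy="average"):
    """Rank venues for a group from per-member venue scores.

    member_scores maps each member to a dict of venue -> score and is assumed
    to be non-empty. Only venues scored by every member are ranked.
    """
    common = set.intersection(*(set(s) for s in member_scores.values()))
    group_score = {}
    for venue in common:
        scores = [s[venue] for s in member_scores.values()]
        if strategy == "average":
            group_score[venue] = sum(scores) / len(scores)
        elif strategy == "least_misery":
            # The group score is that of the least satisfied member.
            group_score[venue] = min(scores)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    # Highest group score first; ties broken alphabetically for determinism.
    return sorted(group_score, key=lambda v: (-group_score[v], v))

# Hypothetical ratings on a 1-5 scale.
scores = {
    "alice": {"cafe": 5, "museum": 2, "park": 4},
    "bob": {"cafe": 1, "museum": 4, "park": 4},
    "carol": {"cafe": 5, "museum": 3, "park": 4},
}

print(aggregate_group_scores(scores, "average"))
print(aggregate_group_scores(scores, "least_misery"))
```

The two strategies can disagree: under averaging, a venue strongly liked by most members can outrank one that nobody dislikes, whereas least misery penalizes any venue that a single member rates poorly. Real-world factors such as each member's distance from a venue could be folded into the per-member scores before aggregation.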

The main motivation for considering real-world parameters in group recommendation

is to include the current context of each group member in the location recommendation

process. By doing so, the selected location will be based on the mutual consensus of the

group members and will be the one that satisfies all the members of the group. It is worth

noting that providing real-time recommendations is a highly compute-intensive task, as

the workload consists of a huge volume of user data accumulated in the system over

time. When the system offers routes along with locations, key location and route

attributes, such as location type, distance to the location, travel time, route complexity,

time of day, and real-world factors with temporal features, need to be considered.

Moreover, most of the existing recommendation systems use only a single type of data

source for recommendations. Using diverse data sources will enable recommendation

systems to provide effective recommendations. Diverse data sources may include

distance from the location, traffic conditions, weather conditions, multiple routes to the

location, and time of day (morning, evening, or night).

5.3.2 Balance in Accuracy and Diversity of Recommendations

Researchers in the recommendation systems domain have worked for the last few years

on balancing diversity and accuracy while generating the final recommendations. A

variety of techniques have been proposed for offering diverse and novel

recommendations, but they have a negative impact on accuracy. At the same time, the

focus should also be on maintaining the balance between diversity and similarity [3, 61,

62]. Moreover, highly diverse recommendations also affect the similarity and accuracy

criteria for the user. Therefore, the diversity of the recommendations should be balanced

against the aforementioned tradeoffs [3]. Studies such as [119] showed that accuracy

alone is not sufficient for selecting a suitable algorithm. For example, a user may intend

to incorporate diversity or novelty in the recommendations along with accuracy.

Accuracy alone is therefore not a sufficient criterion for recommendation. Moreover,

another study showed that the satisfaction level of users supersedes the requirement of

being accurate and that users are more likely to accept more diverse

recommendations [62]. In the majority of cases, however, users need a

balance between accuracy and diversity [62]. Furthermore, different recommendation

system frameworks have different levels of accuracy and diversity, and these levels may

change from one framework to another. Adjusting the levels of accuracy and diversity

is therefore significant for recommendation systems. In order to maintain the balance

between metrics such as accuracy and diversity, the model used in a recommendation

system must consider hybrid techniques that minimize the tradeoff between these

metrics.
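One simple way to make the accuracy and diversity tradeoff discussed above adjustable is a greedy re-ranking step that mixes a relevance score with a novelty bonus. The sketch below is illustrative and is not a technique proposed in this dissertation; the `trade_off` parameter, the category-based novelty term, and the sample data are all assumptions.

```python
def rerank_with_diversity(candidates, relevance, category, top_n, trade_off=0.7):
    """Greedily build a top-N list that balances relevance and diversity.

    Each step picks the item maximizing
        trade_off * relevance + (1 - trade_off) * novelty,
    where novelty is 1.0 if the item's category is not yet in the list and
    0.0 otherwise. trade_off = 1.0 reduces to a pure relevance ranking.
    """
    selected = []
    seen_categories = set()
    remaining = list(candidates)

    def score(item):
        novelty = 0.0 if category[item] in seen_categories else 1.0
        return trade_off * relevance[item] + (1 - trade_off) * novelty

    while remaining and len(selected) < top_n:
        best = max(remaining, key=score)
        selected.append(best)
        seen_categories.add(category[best])
        remaining.remove(best)
    return selected

# Hypothetical predicted relevance scores and venue categories.
relevance = {"r1": 0.9, "r2": 0.85, "m1": 0.6, "p1": 0.5}
category = {"r1": "restaurant", "r2": "restaurant", "m1": "museum", "p1": "park"}
items = ["r1", "r2", "m1", "p1"]

accuracy_only = rerank_with_diversity(items, relevance, category, 3, trade_off=1.0)
balanced = rerank_with_diversity(items, relevance, category, 3, trade_off=0.7)
print(accuracy_only, balanced)
```

With `trade_off=1.0` the list contains the two restaurants with the highest relevance, while `trade_off=0.7` replaces the second restaurant with venues from unseen categories, which illustrates how a single parameter can shift a framework between accuracy and diversity.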

5.3.3 Cold Start Problem

The problem of cold start in recommendation systems is still an open research area, as

there is not yet an optimal solution for cold start users and items. In this dissertation, a

solution for the cold start problem is provided by comparing the preferences of a cold

start user with those of the existing top users. The recommendations are then generated

based on the preferences of the top users, identified by KNN, that are most similar to the

cold start user. A variety of solutions exist in the literature [71, 107, 260], but there are

still some areas where no optimal solution is provided for the cold start problem.

Existing approaches mostly rely on the neighbors of the new user and provide

recommendations to the new user that best match the neighbors’ preferences [260].

These approaches apply various learning techniques to users’ historical data to discover

the nearest neighbors of the new user in order to provide recommendations. The

techniques used for finding the nearest neighbors or top users are applicable only when

a sufficient dataset with relevant information exists. There can be numerous areas

where either the dataset is not sufficient or the recommendation area is new and only a

limited number of users are using the system, and hence there are no top users or

neighbors. In such scenarios, traditional solutions are inadequate to generate

recommendations for a new user. Therefore, hybrid techniques from multiple

disciplines, including artificial neural networks, Bayesian networks, and machine

learning techniques, need to be considered for developing solutions that efficiently

handle the issue of cold start.

5.3.4 Extension in Food Recommendations

In this dissertation, a food recommendation system is proposed that is based on the

user’s pathological reports and is not specific to any disease. As future research for the

food recommendation system, a breakdown of the recommended diets is desired for

different times of the day, such as breakfast, lunch, and dinner. As the nutritional values in a

specific diet play an important role in monitoring and controlling any disease, a

breakdown of the nutrition is important. For instance, a diabetic patient needs only a

specific amount of sugar in every diet. Therefore, it is important to break down the sugar

value across the complete diet taken by the patient at breakfast, lunch, and dinner.

Moreover, group food recommendation for family and friends is another interesting

research area that can be explored. There may be more than one patient in a family,

each suffering from a different disease. In such a case, a system is desired that generates

food recommendations that satisfy the preferences of the different patient groups in the

same family. The system must generate an optimal food list in which diversity among

food items is minimal, so that patients in the family can take the same food to control

different diseases.
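The per-meal breakdown suggested above can be sketched as a proportional split of a daily nutrient allowance across meals. The function name, the daily amount, and the meal weights below are placeholder values chosen only to show the arithmetic; they are not values from this work.

```python
def split_daily_allowance(daily_amount, meal_weights):
    """Split a daily nutrient allowance across meals in proportion to weights.

    daily_amount: total allowance for the day (any unit, e.g. grams)
    meal_weights: dict of meal name -> relative weight (need not sum to 1)
    """
    total_weight = sum(meal_weights.values())
    if total_weight <= 0:
        raise ValueError("meal weights must sum to a positive value")
    return {
        meal: daily_amount * weight / total_weight
        for meal, weight in meal_weights.items()
    }

# Hypothetical example: 30 units per day, lunch and dinner weighted twice
# as heavily as breakfast.
plan = split_daily_allowance(30.0, {"breakfast": 1, "lunch": 2, "dinner": 2})
print(plan)
```

The per-meal amounts always sum to the daily allowance, so a recommender built on such a split could check each proposed meal against its own share rather than against the daily total.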

Chapter 6

References

1. Bobadilla, J., et al., Recommender systems survey. Knowledge-based systems,

2013. 46: p. 109-132.

2. Colombo-Mendoza, L.O., et al., RecomMetz: A context-aware knowledge-

based mobile recommender system for movie showtimes. Expert Systems with

Applications, 2015. 42(3): p. 1202-1222.

3. Adomavicius, G. and Y. Kwon, Improving aggregate recommendation diversity

using ranking-based techniques. IEEE Transactions on Knowledge and Data

Engineering, 2012. 24(5): p. 896-911.

4. Esparza, S.G., M.P. O’Mahony, and B. Smyth, Mining the real-time web: a

novel approach to product recommendation. Knowledge-Based Systems, 2012.

29: p. 3-11.

5. Jung, H. and K. Chung, Knowledge-based dietary nutrition recommendation for

obese management. Information Technology and Management, 2016. 17(1): p.

29-42.

6. Estrela, D., et al. A Recommendation System for Online Courses. in World

Conference on Information Systems and Technologies. 2017. Springer.

7. Ivarsson, J. and M. Lindgren, Movie recommendations using matrix

factorization. 2016.

8. Zhao, L., et al. Matrix factorization+ for movie recommendation. in

Proceedings of the 25th International Joint Conference on Artificial

Intelligence (IJCAI’16). 2016.

9. Hwang, T.-G., et al., An algorithm for movie classification and recommendation

using genre correlation. Multimedia Tools and Applications, 2016. 75(20): p.

12843-12858.

10. Aerts, G., T. Smits, and P. Verlegh, The Influence Of A Product Picture And A

Prior Review On Product Recommendations And Evaluations. 2017.

11. Pöyry, E., et al. Personalized Product Recommendations: Evidence from the

Field. in Proceedings of the 50th Hawaii International Conference on System

Sciences. 2017.

12. Balasubramanian, S., et al., Product recommendations based on analysis of

social experiences. 2016, Google Patents.

13. Eravci, B., et al. Location Recommendations for New Businesses Using Check-

in Data. in Data Mining Workshops (ICDMW), 2016 IEEE 16th International

Conference on. 2016. IEEE.

14. Pasanisi, F., L. Santarpia, and C. Finelli, Diet Recommendations, in Clinical

Management of Overweight and Obesity. 2016, Springer. p. 13-21.

15. Su, C.-J., Y.-A. Chen, and C.-W. Chih, Personalized ubiquitous diet plan

service based on ontology and web services. International Journal of

Information and Education Technology, 2013. 3(5): p. 522.

16. Wilson, D., An Evaluation of Leadership Competencies and Formal Leadership

Education Recommendations for Library Leaders of the 21st Century. 2016,

Wilmington University (Delaware).

17. Howard, T.C., T.-R. Douglas, and C.A. Warren, “What Works”:

Recommendations on Improving Academic Experiences and Outcomes for

Black Males. Teachers College Record, 2016. 118(6): p. n6.

18. Mehta, B., C. Sonntag, and I. Mahaniok, Context-influenced application

recommendations. 2016, Google Patents.

19. Evans, E.Z., D.A. Markley, and J.N. Adkins III, Application recommendations

based on application and lifestyle fingerprinting. 2016, Google Patents.

20. Gao, H., et al. Content-Aware Point of Interest Recommendation on Location-

Based Social Networks. in AAAI. 2015.

21. Ye, M., et al. Exploiting geographical influence for collaborative point-of-

interest recommendation. in Proceedings of the 34th international ACM SIGIR

conference on Research and development in Information Retrieval. 2011. ACM.

22. Snashall, E. and S. Hindocha, The Use of Smartphone Applications in Medical

Education. Open Medicine Journal, 2016. 3(1).

23. Chen, J., et al., The use of smartphone health apps and other mobile health

(mHealth) technologies in dietetic practice: a three country study. Journal of

Human Nutrition and Dietetics, 2017.

24. Tussyadiah, I.P. and D. Wang, Tourists’ attitudes toward proactive smartphone

systems. Journal of Travel Research, 2016. 55(4): p. 493-508.

25. Cao, H. and M. Lin, Mining smartphone data for app usage prediction and

recommendations: A survey. Pervasive and Mobile Computing, 2017.

26. Ying, J.J.-C., et al. Mining user similarity from semantic trajectories. in

Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Location

Based Social Networks. 2010. ACM.

27. Bao, J., et al., A survey on recommendations in location-based social networks.

ACM Transaction on Intelligent Systems and Technology, 2013.

28. Foursquare. [cited 2016 January 03]; Available from:

https://foursquare.com/about.

29. MovieLens. [cited 2016 February 14]; Available from:

https://grouplens.org/datasets/movielens/.

30. Taxi Trace Dataset. [cited 2016 January 18]; Available from:

http://crawdad.org/roma/taxi/20140717/.

31. Lü, L., et al., Recommender systems. Physics Reports, 2012. 519(1): p. 1-49.

32. Sarwat, M., et al., LARS*: a scalable and efficient location-aware recommender

system. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2013.

33. Ye, M., P. Yin, and W.-C. Lee. Location recommendation for location-based

social networks. in Proceedings of the 18th SIGSPATIAL international

conference on advances in geographic information systems. 2010. ACM.

34. Bao, J., Y. Zheng, and M.F. Mokbel. Location-based and preference-aware

recommendation using sparse geo-social networking data. in Proceedings of

the 20th international conference on advances in geographic information

systems. 2012. ACM.

35. De Mauro, A., et al. What is big data? A consensual definition and a review of

key research topics. in AIP conference proceedings. 2015. AIP.

36. Doytsher, Y., B. Galon, and Y. Kanza. Storing routes in socio-spatial networks

and supporting social-based route recommendation. in Proceedings of the 3rd

ACM SIGSPATIAL International Workshop on Location-Based Social

Networks. 2011. ACM.

37. Chang, K.-P., et al. Discovering personalized routes from trajectories. in

Proceedings of the 3rd ACM SIGSPATIAL International Workshop on

Location-Based Social Networks. 2011. ACM.

38. Noulas, A., et al., Exploiting Semantic Annotations for Clustering Geographic

Areas and Users in Location-based Social Networks. The Social Mobile Web,

2011. 11(2).

39. Batchinsky, A., L.C. Cancio, and J. Salinas, Patient care recommendation

system. 2017, Google Patents.

40. Molla, Y.B., et al., Geographic information system for improving maternal and

newborn health: recommendations for policy and programs. BMC Pregnancy

and Childbirth, 2017. 17(1): p. 26.

41. Wang, S.-L., et al., Design and evaluation of a cloud-based Mobile Health

Information Recommendation system on wireless sensor networks. Computers

& Electrical Engineering, 2016. 49: p. 221-235.

42. Bates, D.W., et al., Big data in health care: using analytics to identify and

manage high-risk and high-cost patients. Health Affairs, 2014. 33(7): p. 1123-

1131.

43. Kim, J., D. Lee, and K.-Y. Chung, Item recommendation based on context-

aware model for personalized u-healthcare service. Multimedia Tools and

Applications, 2014. 71(2): p. 855-872.

44. Oh, Y., A. Choi, and W. Woo, u-BabSang: a context-aware food

recommendation system. The Journal of Supercomputing, 2010. 54(1): p. 61-

81.

45. Joint WHO/FAO Expert Consultation, Diet, nutrition and the prevention of chronic

diseases. World Health Organ Tech Rep Ser, 2003. 916(i-viii).

46. Ge, M., et al. Using tags and latent factors in a food recommender system. in

Proceedings of the 5th International Conference on Digital Health 2015. 2015.

ACM.

47. World Health Organization, Nutrition for health and development: a global agenda

for combating malnutrition. 2000.

48. World Health Organization, Childhood nutrition and progress in implementing the

international code of marketing of Breast-milk substitute. Report by the

Secretariat. A, 2002. 55.

49. UNICEF and World Health Organization, Global prevalence of vitamin A deficiency.

1995.

50. World Health Organization, Iron deficiency anaemia: assessment, prevention and

control: a guide for programme managers. 2001.

51. World Health Organization, Progress towards the elimination of iodine deficiency

disorders (IDD). 1999.

52. Phanich, M., P. Pholkul, and S. Phimoltares. Food recommendation system

using clustering analysis for diabetic patients. in Information Science and

Applications (ICISA), 2010 International Conference on. 2010. IEEE.

53. Freyne, J. and S. Berkovsky, Evaluating recommender systems for supportive

technologies, in User Modeling and Adaptation for Daily Routines. 2013,

Springer. p. 195-217.

54. Runo, M., FooDroid: A Food Recommendation App for University Canteens.

Unpublished semester thesis, Swiss Federal Institute of Technology, Zurich, 2011.

55. Evert, A.B., et al., Nutrition therapy recommendations for the management of

adults with diabetes. Diabetes care, 2013. 36(11): p. 3821-3842.

56. LeFevre, M.L., Behavioral Counseling to Promote a Healthful Diet and

Physical Activity for Cardiovascular Disease Prevention in Adults With

Cardiovascular Risk Factors: US Preventive Services Task Force

Recommendation Statement. Annals of Internal Medicine, 2014. 161(8): p.

587-593.

57. Teng, C.-Y., Y.-R. Lin, and L.A. Adamic. Recipe recommendation using

ingredient networks. in Proceedings of the 4th Annual ACM Web Science

Conference. 2012. ACM.

58. Khalid, O., et al., Omnisuggest: A ubiquitous cloud-based context-aware

recommendation system for mobile social networks. IEEE Transactions on

Services Computing, 2014. 7(3): p. 401-414.

59. Sharma, L. and A. Gera, A survey of recommendation system: Research

challenges. International Journal of Engineering Trends and Technology

(IJETT), 2013. 4(5): p. 1989-1992.

60. Vargas, S. and P. Castells. Rank and relevance in novelty and diversity metrics

for recommender systems. in Proceedings of the fifth ACM conference on

Recommender systems. 2011. ACM.

61. Liu, Q. Accurate and Diverse Recommendations via Integrated Communities of

Interest and Trustable Neighbors. in Management of e-Commerce and e-

Government (ICMeCG), 2014 International Conference on. 2014. IEEE.

62. Javari, A. and M. Jalili, A probabilistic model to resolve diversity–accuracy

challenge of recommendation systems. Knowledge and Information Systems,

2015. 44(3): p. 609-627.

63. Yin, H., et al., Challenging the long tail recommendation. Proceedings of the

VLDB Endowment, 2012. 5(9): p. 896-907.

64. Gong, S., Research on Attack on Collaborative Filtering Recommendation

Systems. Advances in Information Sciences and Service Sciences, 2013. 5(10):

p. 938.

65. Burke, R., M.P. O’Mahony, and N.J. Hurley, Robust collaborative

recommendation, in Recommender systems handbook. 2011, Springer. p. 805-

835.

66. Preoţiuc-Pietro, D. and T. Cohn. Mining user behaviours: a study of check-in

patterns in location based social networks. in Proceedings of the 5th Annual

ACM Web Science Conference. 2013. ACM.

67. Wang, H., M. Terrovitis, and N. Mamoulis. Location recommendation in

location-based social networks using user check-in data. in Proceedings of the

21st ACM SIGSPATIAL International Conference on Advances in Geographic

Information Systems. 2013. ACM.

68. Liu, L., et al., A real-time personalized route recommendation system for self-

drive tourists based on vehicle to vehicle communication. Expert Systems with

Applications, 2014. 41(7): p. 3409-3417.

69. Su, H., et al. Crowdplanner: A crowd-based route recommendation system. in

Data Engineering (ICDE), 2014 IEEE 30th International Conference on. 2014.

IEEE.

70. Hao, F., et al., An efficient approach to generating location-sensitive

recommendations in ad-hoc social network environments. IEEE Transactions

on Services Computing, 2015. 8(3): p. 520-533.

71. Ji, K. and H. Shen, Addressing cold-start: scalable recommendation with tags

and keywords. Knowledge-based systems, 2015. 83: p. 42-50.

72. Bellogín, A., P. Castells, and I. Cantador. Improving memory-based

collaborative filtering by neighbour selection based on user preference overlap.

in Proceedings of the 10th Conference on Open Research Areas in Information

Retrieval. 2013. Le Centre de Hautes Etudes Internationales

d'Informatique Documentaire.

73. Kaleli, C., An entropy-based neighbor selection approach for collaborative

filtering. Knowledge-Based Systems, 2014. 56: p. 273-280.

74. Hu, Y., Q. Peng, and X. Hu. A time-aware and data sparsity tolerant approach

for web service recommendation. in Web Services (ICWS), 2014 IEEE

International Conference on. 2014. IEEE.

75. Guo, G., J. Zhang, and D. Thalmann, Merging trust in collaborative filtering to

alleviate data sparsity and cold start. Knowledge-Based Systems, 2014. 57: p.

57-68.

76. Yin, H., et al. Lcars: a location-content-aware recommender system. in

Proceedings of the 19th ACM SIGKDD international conference on Knowledge

discovery and data mining. 2013. ACM.

77. Statistics of Smart Phones. [cited 2016 October 21]; Available from:

https://www.statista.com/statistics/203734/global-smartphone-penetration-per-

capita-since-2005/.

78. Pew Research Center. [cited 2016 March 21]; Available from:

http://www.pewinternet.org/2015/04/01/us-smartphone-use-in-2015/.

79. Bobrow, J., Representation and understanding: Studies in cognitive science.

2014: Elsevier.

80. Lewis, D.D. Learning in intelligent information retrieval. in Machine Learning:

Proceedings of the Eighth International Workshop. 2014.

81. Pirasteh, P., D. Hwang, and J.J. Jung, Exploiting matrix factorization to

asymmetric user similarities in recommendation systems. Knowledge-Based

Systems, 2015. 83: p. 51-57.

82. Guy, I., Social recommender systems, in Recommender Systems Handbook.

2015, Springer. p. 511-543.

83. Felfernig, A., et al., Toward the next generation of recommender systems:

applications and research challenges, in Multimedia services in intelligent

environments. 2013, Springer. p. 81-98.

84. Konstan, J.A. and J. Riedl, Recommender systems: from algorithms to user

experience. User Modeling and User-Adapted Interaction, 2012. 22(1-2): p.

101-123.

85. Park, D.H., et al., A literature review and classification of recommender systems

research. Expert Systems with Applications, 2012. 39(11): p. 10059-10072.

86. Bostandjiev, S., J. O'Donovan, and T. Höllerer. TasteWeights: a visual

interactive hybrid recommender system. in Proceedings of the sixth ACM

conference on Recommender systems. 2012. ACM.

87. Bobadilla, J., et al., Improving collaborative filtering recommender system

results and performance using genetic algorithms. Knowledge-based systems,

2011. 24(8): p. 1310-1316.

88. Cacheda, F., et al., Comparison of collaborative filtering algorithms:

Limitations of current techniques and proposals for scalable, high-performance

recommender systems. ACM Transactions on the Web (TWEB), 2011. 5(1): p.

2.

89. Jannach, D., et al., Recommender systems: an introduction. 2010: Cambridge

University Press.

90. Cantador, I., P. Castells, and A. Bellogín, An enhanced semantic layer for

hybrid recommender systems: Application to news recommendation.

International Journal on Semantic Web and Information Systems (IJSWIS),

2011. 7(1): p. 44-78.

91. Rikitianskii, A., M. Harvey, and F. Crestani. A personalised recommendation

system for context-aware suggestions. in European Conference on Information

Retrieval. 2014. Springer.

92. Chiang, H.-S. and T.-C. Huang, User-adapted travel planning system for

personalized schedule recommendation. Information Fusion, 2015. 21: p. 3-17.

93. Kolomvatsos, K., C. Anagnostopoulos, and S. Hadjiefthymiades, An efficient

recommendation system based on the optimal stopping theory. Expert Systems

with Applications, 2014. 41(15): p. 6796-6806.

94. Vanetti, M., et al. Content-based filtering in on-line social networks. in

International Workshop on Privacy and Security Issues in Data Mining and

Machine Learning. 2010. Springer.

95. Lops, P., M. De Gemmis, and G. Semeraro, Content-based recommender

systems: State of the art and trends, in Recommender systems handbook. 2011,

Springer. p. 73-105.

96. Lu, Z., et al. Content-Based Collaborative Filtering for News Topic

Recommendation. in AAAI. 2015. Citeseer.

97. Guo, G., J. Zhang, and N. Yorke-Smith. A Novel Bayesian Similarity Measure

for Recommender Systems. in IJCAI. 2013.

98. Shi, Y., M. Larson, and A. Hanjalic, Collaborative filtering beyond the user-

item matrix: A survey of the state of the art and future challenges. ACM

Computing Surveys (CSUR), 2014. 47(1): p. 3.

99. Milicevic, A.K., A. Nanopoulos, and M. Ivanovic, Social tagging in

recommender systems: a survey of the state-of-the-art and possible extensions.

Artificial Intelligence Review, 2010. 33(3): p. 187-209.

100. Hurley, N. and M. Zhang, Novelty and diversity in top-n recommendation--

analysis and evaluation. ACM Transactions on Internet Technology (TOIT),

2011. 10(4): p. 14.

101. Wei, S., et al. Item-based collaborative filtering recommendation algorithm

combining item category with interestingness measure. in Computer Science &

Service System (CSSS), 2012 International Conference on. 2012. IEEE.

102. Desrosiers, C. and G. Karypis, A comprehensive survey of neighborhood-based

recommendation methods, in Recommender systems handbook. 2011, Springer.

p. 107-144.

103. Evangelopoulos, N., X. Zhang, and V.R. Prybutok, Latent semantic analysis:

five methodological recommendations. European Journal of Information

Systems, 2012. 21(1): p. 70-86.

104. Hoffman, M., F.R. Bach, and D.M. Blei. Online learning for latent dirichlet

allocation. in Advances in Neural Information Processing Systems. 2010.

105. Agarwal, D., et al., Content recommendation on web portals. Communications

of the ACM, 2013. 56(6): p. 92-101.

106. Forbes, P. and M. Zhu. Content-boosted matrix factorization for recommender

systems: experiments with recipe recommendation. in Proceedings of the fifth

ACM conference on Recommender systems. 2011. ACM.

107. Wei, J., et al., Collaborative filtering and deep learning based recommendation

system for cold start items. Expert Systems with Applications, 2017. 69: p. 29-

39.

108. Burke, R. and F. Eskandanian. Collaborative Recommendation of Informal

Learning Experiences. in Web Intelligence Workshops (WIW), IEEE/WIC/ACM

International Conference on. 2016. IEEE.

109. Zhao, T., J. McAuley, and I. King. Leveraging social connections to improve

personalized ranking for collaborative filtering. in Proceedings of the 23rd

ACM International Conference on Conference on Information and Knowledge

Management. 2014. ACM.

110. Balakrishnan, S. and S. Chopra. Collaborative ranking. in Proceedings of the

fifth ACM international conference on Web search and data mining. 2012.

ACM.

111. Kim, H.-N., et al., Collaborative filtering based on collaborative tagging for

enhancing the quality of recommendation. Electronic Commerce Research and

Applications, 2010. 9(1): p. 73-83.

112. Cremonesi, P., et al. Comparative evaluation of recommender system quality.

in CHI'11 Extended Abstracts on Human Factors in Computing Systems. 2011.

ACM.

113. Zarrinkalam, F. and M. Kahani. A multi-criteria hybrid citation

recommendation system based on linked data. in Computer and Knowledge

Engineering (ICCKE), 2012 2nd International eConference on. 2012. IEEE.

114. Chen, L. and P. Pu, Critiquing-based recommenders: survey and emerging

trends. User Modeling and User-Adapted Interaction, 2012. 22(1-2): p. 125-

150.

115. Parra, D., P. Brusilovsky, and C. Trattner. See what you want to see: visual user-

driven approach for hybrid recommendation. in Proceedings of the 19th

international conference on Intelligent User Interfaces. 2014. ACM.

116. Lampropoulos, A.S., P.S. Lampropoulou, and G.A. Tsihrintzis, A cascade-

hybrid music recommender system for mobile services based on musical genre

classification and personality diagnosis. Multimedia Tools and Applications,

2012. 59(1): p. 241-258.

117. Khribi, M.K., M. Jemni, and O. Nasraoui, Recommendation systems for

personalized technology-enhanced learning, in Ubiquitous learning

environments and technologies. 2015, Springer. p. 159-180.

118. Pu, P., L. Chen, and R. Hu, Evaluating recommender systems from the user’s

perspective: survey of the state of the art. User Modeling and User-Adapted

Interaction, 2012. 22(4-5): p. 317-355.

119. Shani, G. and A. Gunawardana, Evaluating recommendation systems, in

Recommender systems handbook. 2011, Springer. p. 257-297.

120. Bedi, P. and R. Sharma, Trust based recommender system using ant colony for

trust computation. Expert Systems with Applications, 2012. 39(1): p. 1183-

1190.

121. Avazpour, I., et al., Dimensions and metrics for evaluating recommendation

systems, in Recommendation systems in software engineering. 2014, Springer.

p. 245-273.

122. Rana, C. and S.K. Jain, A study of the dynamic features of recommender

systems. Artificial Intelligence Review, 2015. 43(1): p. 141-153.

123. Pu, P., L. Chen, and R. Hu. A user-centric evaluation framework for

recommender systems. in Proceedings of the fifth ACM conference on

Recommender systems. 2011. ACM.

124. Verbert, K., et al., Context-aware recommender systems for learning: a survey

and future challenges. IEEE Transactions on Learning Technologies, 2012.

5(4): p. 318-335.

125. Arora, G., et al., Movie recommendation system based on

users' similarity. International Journal of Computer Science and Mobile

Computing, 2014. 3(4): p. 765-770.

126. Zheng, Y., et al., Recommending friends and locations based on individual

location history. ACM Transactions on the Web (TWEB), 2011. 5(1): p. 5.

127. Cechinel, C., et al., Evaluating collaborative filtering recommendations inside

large learning object repositories. Information Processing & Management,

2013. 49(1): p. 34-50.

128. Lyakhov, A.O., A.R. Oganov, and M. Valle, How to predict very large and

complex crystal structures. Computer Physics Communications, 2010. 181(9):

p. 1623-1632.

129. Chen, J.-H., K.-M. Chao, and N. Shah. Hybrid recommendation system for

tourism. in e-Business Engineering (ICEBE), 2013 IEEE 10th International

Conference on. 2013. IEEE.

130. Xia, P., L. Zhang, and F. Li, Learning similarity with cosine similarity

ensemble. Information Sciences, 2015. 307: p. 39-52.

131. Dakhel, G.M. and M. Mahdavi. A new collaborative filtering algorithm using

k-means clustering and neighbors' voting. in Hybrid Intelligent Systems (HIS),

2011 11th International Conference on. 2011. IEEE.

132. Golbandi, N., Y. Koren, and R. Lempel. Adaptive bootstrapping of

recommender systems using decision trees. in Proceedings of the fourth ACM

international conference on Web search and data mining. 2011. ACM.

133. Hsu, F.-M., Y.-T. Lin, and T.-K. Ho, Design and implementation of an

intelligent recommendation system for tourist attractions: The integration of

EBM model, Bayesian network and Google Maps. Expert Systems with

Applications, 2012. 39(3): p. 3257-3264.

134. Ergu, D., et al., The analytic hierarchy process: task scheduling and resource

allocation in cloud computing environment. The Journal of Supercomputing,

2013: p. 1-14.

135. Zuva, T., et al., A survey of recommender systems techniques challenges and

evaluation metrics. International Journal of Emerging Technology and

Advanced Engineering, 2012. 2(11): p. 382-386.

136. Jäschke, R., et al., Challenges in tag recommendations for collaborative tagging

systems, in Recommender systems for the social web. 2012, Springer. p. 65-87.

137. Baltrunas, L., T. Makcinskas, and F. Ricci. Group recommendations with rank

aggregation and collaborative filtering. in Proceedings of the fourth ACM

conference on Recommender systems. 2010. ACM.

138. Adomavicius, G. and J. Zhang, Stability of recommendation algorithms. ACM

Transactions on Information Systems (TOIS), 2012. 30(4): p. 23.

139. Hernando, A., et al., Incorporating reliability measurements into the predictions

of a recommender system. Information Sciences, 2013. 218: p. 1-16.

140. Shuja, J., et al., Survey of techniques and architectures for designing energy-

efficient data centers. IEEE Systems Journal, 2016. 10(2): p. 507-519.

141. Hashem, I.A.T., et al., The rise of “big data” on cloud computing: Review and

open research issues. Information Systems, 2015. 47: p. 98-115.

142. Verma, J.P., B. Patel, and A. Patel. Big data analysis: recommendation system

with Hadoop framework. in Computational Intelligence & Communication

Technology (CICT), 2015 IEEE International Conference on. 2015. IEEE.

143. Palanisamy, B., et al. Purlieus: locality-aware resource allocation for

MapReduce in a cloud. in Proceedings of 2011 International Conference for

High Performance Computing, Networking, Storage and Analysis. 2011. ACM.

144. Pakize, S.R. and A. Gandomi, Comparative study of classification algorithms

based on MapReduce model. International Journal of Innovative Research in

Advanced Engineering, ISSN, 2014: p. 2349-2163.

145. Chen, C.P. and C.-Y. Zhang, Data-intensive applications, challenges,

techniques and technologies: A survey on Big Data. Information Sciences,

2014. 275: p. 314-347.

146. Mell, P. and T. Grance, The NIST definition of cloud computing. 2011.

147. Zheng, V.W., et al., Towards mobile intelligence: Learning from GPS history

data for collaborative recommendation. Artificial Intelligence, 2012. 184: p.

17-37.

148. Symeonidis, P., D. Ntempos, and Y. Manolopoulos, Recommender systems for

location-based social networks. 2014: Springer.

149. Zheng, Y. Tutorial on location-based social networks. in Proceedings of the

21st international conference on World wide web, WWW. 2012. Citeseer.

150. Majid, A., et al., A context-aware personalized travel recommendation system

based on geotagged social media data mining. International Journal of

Geographical Information Science, 2013. 27(4): p. 662-684.

151. Chon, J. and H. Cha, Lifemap: A smartphone-based context provider for

location-based services. IEEE Pervasive Computing, 2011. 10(2): p. 58-67.

152. Xiao, X., et al., Inferring social ties between users with human location history.

Journal of Ambient Intelligence and Humanized Computing, 2014. 5(1): p. 3-

19.

153. DeScioli, P., et al., Best friends: Alliances, friend ranking, and the MySpace

social network. Perspectives on Psychological Science, 2011. 6(1): p. 6-8.

154. Levandoski, J.J., et al. Lars: A location-aware recommender system. in Data

Engineering (ICDE), 2012 IEEE 28th International Conference on. 2012.

IEEE.

155. Tang, K.P., et al. Rethinking location sharing: exploring the implications of

social-driven vs. purpose-driven location sharing. in Proceedings of the 12th

ACM international conference on Ubiquitous computing. 2010. ACM.

156. Noulas, A., et al. A random walk around the city: New venue recommendation

in location-based social networks. in Privacy, Security, Risk and Trust

(PASSAT), 2012 International Conference on and 2012 International

Conference on Social Computing (SocialCom). 2012. IEEE.

157. Zhang, W., J. Wang, and W. Feng. Combining latent factor model with location

features for event-based group recommendation. in Proceedings of the 19th

ACM SIGKDD international conference on Knowledge discovery and data

mining. 2013. ACM.

158. Ning, X., C. Desrosiers, and G. Karypis, A comprehensive survey of

neighborhood-based recommendation methods, in Recommender systems

handbook. 2015, Springer. p. 37-76.

159. Cao, X., G. Cong, and C.S. Jensen, Mining significant semantic locations from

GPS data. Proceedings of the VLDB Endowment, 2010. 3(1-2): p. 1009-1020.

160. Oh, J., O.-R. Jeong, and E. Lee, Collective Intelligence Based Place

Recommendation System, in Advanced Infocomm Technology. 2013, Springer.

p. 169-176.

161. Wei, L.-Y., Y. Zheng, and W.-C. Peng. Constructing popular routes from

uncertain trajectories. in Proceedings of the 18th ACM SIGKDD international

conference on Knowledge discovery and data mining. 2012. ACM.

162. Chow, C.-Y., J. Bao, and M.F. Mokbel. Towards location-based social

networking services. in Proceedings of the 2nd ACM SIGSPATIAL

International Workshop on Location Based Social Networks. 2010. ACM.

163. Yang, D., et al. A sentiment-enhanced personalized location recommendation

system. in Proceedings of the 24th ACM Conference on Hypertext and Social

Media. 2013. ACM.

164. Zheng, V.W., et al. Collaborative location and activity recommendations with

gps history data. in Proceedings of the 19th international conference on World

wide web. 2010. ACM.

165. Berjani, B. and T. Strufe. A recommendation system for spots in location-based

online social networks. in Proceedings of the 4th Workshop on Social Network

Systems. 2011. ACM.

166. Cheng, C., et al. Fused Matrix Factorization with Geographical and Social

Influence in Location-Based Social Networks. in AAAI. 2012.

167. Slawski, M., M. Hein, and P. Lutsik. Matrix factorization with binary

components. in Advances in Neural Information Processing Systems. 2013.

168. Denstadli, J.M. and J.K.S. Jacobsen, The long and winding roads: Perceived

quality of scenic tourism routes. Tourism management, 2011. 32(4): p. 780-789.

169. Cheng, X., et al., Wideband channel modeling and intercarrier interference

cancellation for vehicle-to-vehicle communication systems. IEEE Journal on

Selected Areas in Communications, 2013. 31(9): p. 434-448.

170. Ren, G., T. Long, and W. Juebo, A novel recommender system based on fuzzy

set and rough set theory. Advances in Information Sciences and Service

Sciences, 2011. 3(4).

171. Lemke, A., Technique for Order Preference by Similarity to Ideal Solution.

2014.

172. Christensen, I.A. and S.N. Schiaffino, A Hybrid Approach for Group Profiling

in Recommender Systems. J. UCS, 2014. 20(4): p. 507-533.

173. Gowalla Dataset. [cited 2016 February 16]; Available from:

http://www.yongliu.org/datasets.

174. Liu, B., et al. Learning geographical preferences for point-of-interest

recommendation. in Proceedings of the 19th ACM SIGKDD international

conference on Knowledge discovery and data mining. 2013. ACM.

175. Rehman, F., O. Khalid, and S.A. Madani, A comparative study of location-

based recommendation systems. The Knowledge Engineering Review, 2017.

32.

176. Pandya, S., et al. A novel hybrid based recommendation system based on

clustering and association mining. in Sensing Technology (ICST), 2016 10th

International Conference on. 2016. IEEE.

177. Sawant, K.B., Efficient determination of clusters in k-mean algorithm using

neighborhood distance. Int. J. Emerg. Eng. Res. Technol, 2015. 3: p. 22-27.

178. Guha, S. and N. Mishra, Clustering data streams, in Data Stream Management.

2016, Springer. p. 169-187.

179. Song, G., et al. Solutions for processing k nearest neighbor joins for massive

data on mapreduce. in Parallel, Distributed and Network-Based Processing

(PDP), 2015 23rd Euromicro International Conference on. 2015. IEEE.

180. Kahloul, L., et al., Using high level Petri nets in the modelling, simulation and

verification of reconfigurable manufacturing systems. International Journal of

Software Engineering and Knowledge Engineering, 2014. 24(03): p. 419-443.

181. Barrett, C., A. Stump, and C. Tinelli, The satisfiability modulo theories library

(SMT-LIB)(2010). SMT-LIB. org, 2016. 156.

182. Murata, T., Petri nets: Properties, analysis and applications. Proceedings of the

IEEE, 1989. 77(4): p. 541-580.

183. Jensen, K. and G. Rozenberg, High-level Petri nets: theory and application.

2012: Springer Science & Business Media.

184. Gavgani, V.Z. Health information need and seeking behavior of patients in

developing countries' context; an Iranian experience. in Proceedings of the 1st

ACM International Health Informatics Symposium. 2010. ACM.

185. Synnot, A.J., et al., Online health information seeking: how people with multiple

sclerosis find, assess and integrate treatment information to manage their

health. Health Expectations, 2016. 19(3): p. 727-737.

186. Pew Research Center Survey on Health Information Using Smart Phone in

Pakistan. [cited 2017 May 30]; Available from:

http://www.pewglobal.org/2013/05/01/spring-2013-survey/.

187. Muscat, D.M., et al., Can adults with low literacy understand shared decision

making questions? A qualitative investigation. Patient Education and

Counseling, 2016. 99(11): p. 1796-1802.

188. Ziefle, M. and A.K. Schaar, Technology acceptance by patients: empowerment

and stigma. Handbook of Smart Homes, Health Care and Well-Being, 2017: p.

167-177.

189. Dahl, S., et al., Empowering or misleading? Online health information

provision challenges. Marketing Intelligence & Planning, 2016. 34(7): p. 1000-

1020.

190. Wiesner, M. and D. Pfeifer, Health recommender systems: concepts,

requirements, technical basics and challenges. International journal of

environmental research and public health, 2014. 11(3): p. 2580-2607.

191. Johnson, J.D., Health-related information seeking: Is it worth it? Information

Processing & Management, 2014. 50(5): p. 708-717.

192. Jing, F. An empirical study on the features influencing users' adoption towards

personal health records system. in Service Systems and Service Management

(ICSSSM), 2016 13th International Conference on. 2016. IEEE.

193. Johnson, J.D. and D.O. Case, Health information seeking. 2012: Peter Lang

New York, NY.

194. Powell, J., et al., The characteristics and motivations of online health

information seekers: cross-sectional survey and qualitative interview study.

Journal of Medical Internet Research, 2011. 13(1): p. e20.

195. Pan, B., The power of search engine ranking for tourist destinations. Tourism

Management, 2015. 47: p. 79-87.

196. Nursing Stories. [cited 2017 February 16]; Available from:

http://allnurses.com/general-nursing-discussion/medical-terms-patients-

301633.html.

197. Xiao, N., et al., Factors influencing online health information search: An

empirical analysis of a national cancer-related survey. Decision Support

Systems, 2014. 57: p. 417-427.

198. Crook, B., et al., Sharing health information and influencing behavioral

intentions: The role of health literacy, information overload, and the Internet in

the diffusion of healthy heart information. Health communication, 2016. 31(1):

p. 60-71.

199. Bosslet, G.T., et al., The patient–doctor relationship and online social

networks: Results of a national survey. Journal of general internal medicine,

2011. 26(10): p. 1168-1174.

200. Kadry, B., et al., Analysis of 4999 online physician ratings indicates that most

patients give physicians a favorable rating. Journal of medical Internet research,

2011. 13(4): p. e95.

201. Sherrington, C., et al., Exercise to prevent falls in older adults: an updated

meta-analysis and best practice recommendations. New South Wales public

health bulletin, 2011. 22(4): p. 78-83.

202. Barnes, P.M. and C.A. Schoenborn, Trends in adults receiving a

recommendation for exercise or other physical activity from a physician or

other health professional. 2012: US Department of Health and Human Services,

Centers for Disease Control and Prevention, National Center for Health

Statistics.

203. Carek, P.J., S.E. Laibstain, and S.M. Carek, Exercise for the treatment of

depression and anxiety. The International Journal of Psychiatry in Medicine,

2011. 41(1): p. 15-28.

204. Farquhar, W.B., et al., Dietary Sodium and Health. Journal of the American

College of Cardiology, 2015. 65(10): p. 1042-1050.

205. Fox, S. and M. Duggan, Health online 2013. Washington, DC: Pew Internet &

American Life Project, 2013.

206. Horton, R., Offline: Clinical leadership improves health outcomes. The Lancet,

2013. 382(9896): p. 925.

207. Offline Doctor. [cited 2017 February 23]; Available from:

https://play.google.com/store/apps/details?id=appinventor.ai_wmfrhob.Doktor

_Offline&hl=en.

208. De Pessemier, T., S. Dooms, and L. Martens. A food recommender for patients

in a care facility. in Proceedings of the 7th ACM conference on Recommender

systems. 2013. ACM.

209. Freyne, J. and S. Berkovsky. Intelligent food planning: personalized recipe

recommendation. in Proceedings of the 15th international conference on

Intelligent user interfaces. 2010. ACM.

210. Kashima, T., S. Matsumoto, and H. Ishii, Decision support system for menu

recommendation using rough sets. 2011.

211. Wang, T.J., et al., Vitamin D deficiency and risk of cardiovascular disease.

Circulation, 2008. 117(4): p. 503-511.

212. Tao, Z., A. Shi, and J. Zhao, Epidemiological perspectives of diabetes. Cell

biochemistry and biophysics, 2015. 73(1): p. 181-185.

213. Melanson, E., The effect of exercise on non‐exercise physical activity and

sedentary behavior in adults. Obesity Reviews, 2017. 18(S1): p. 40-49.

214. Faasse, K. and K.J. Petrie, The nocebo effect: patient expectations and

medication side effects. Postgraduate medical journal, 2013: p. postgradmedj-

2012-131730.

215. Pawlowska, M., J. Kapeluto, and D. Kendler, A case report of osteomalacia

unmasking primary biliary cirrhosis. Osteoporosis International, 2015. 26(7):

p. 2035-2038.

216. Huang, L.-C., X. Wu, and J.Y. Chen, Predicting adverse side effects of drugs.

BMC genomics, 2011. 12(5): p. S11.

217. Ng, S.W. and B.M. Popkin, Time use and physical activity: a shift away from

movement across the globe. Obesity Reviews, 2012. 13(8): p. 659-680.

218. Powell, L.M. and B.T. Nguyen, Fast-food and full-service restaurant

consumption among children and adolescents: effect on energy, beverage, and

nutrient intake. JAMA pediatrics, 2013. 167(1): p. 14-20.

219. Ellwood, P., et al., Do fast foods cause asthma, rhinoconjunctivitis and eczema?

Global findings from the International Study of Asthma and Allergies in

Childhood (ISAAC) Phase Three. Thorax, 2013: p. thoraxjnl-2012-202285.

220. Sánchez-Villegas, A., et al., Fast-food and commercial baked goods

consumption and the risk of depression. Public health nutrition, 2012. 15(03):

p. 424-432.

221. Lustig, R.H., L.A. Schmidt, and C.D. Brindis, Public health: The toxic truth

about sugar. Nature, 2012. 482(7383): p. 27-29.

222. Sheiham, A. and W.P.T. James, A new understanding of the relationship

between sugars, dental caries and fluoride use: implications for limits on sugars

consumption. Public health nutrition, 2014. 17(10): p. 2176-2184.

223. Klug, E.Q., et al., South African Dyslipidaemia Guideline Consensus Statement:

A joint statement from the South African Heart Association (SA Heart) and the

Lipid and Atherosclerosis Society of Southern Africa (LASSA). South African

Family Practice, 2015. 57(2): p. 22-31.

224. Rittinghouse, J.W. and J.F. Ransome, Cloud computing: implementation,

management, and security. 2016: CRC press.

225. Shaukat Khanum Laboratory. [cited 2016 16 December]; Available from:

https://shaukatkhanum.org.pk/.

226. Aga Khan Laboratory. [cited 2016 16 December]; Available from:

https://www.aku.edu/labreports/Pages/default.aspx.

227. Healthways Laboratory. [cited 2016 16 December]; Available from:

http://www.healthwayslabs.net/.

228. COFID Dataset. [cited 2016 03 September]; Available from:

https://www.gov.uk/government/publications/composition-of-foods-

integrated-dataset-cofid.

229. Geleijnse, G., et al. A personalized recipe advice system to promote healthful

choices. in Proceedings of the 16th international conference on Intelligent user

interfaces. 2011. ACM.

230. Svensson, M., K. Höök, and R. Cöster, Designing and evaluating kalas: A social

navigation system for food recipes. ACM Transactions on Computer-Human

Interaction (TOCHI), 2005. 12(3): p. 374-400.

231. Nutrino. [cited 2016 10 October]; Available from: https://nutrino.co/.

232. Ueda, M., et al. Recipe recommendation method by considering the user’s

preference and ingredient quantity of target recipe. in Proceedings of the

International MultiConference of Engineers and Computer Scientists. 2014.

233. van Pinxteren, Y., G. Geleijnse, and P. Kamsteeg. Deriving a recipe similarity

measure for recommending healthful meals. in Proceedings of the 16th

international conference on Intelligent user interfaces. 2011. ACM.

234. Yang, L., et al., Yum-me: Personalized Healthy Meal Recommender System.

arXiv preprint arXiv:1605.07722, 2016.

235. ShopWell. [cited 2016 10 October]; Available from:

http://www.shopwell.com/.

236. Yummly. [cited 2016 10 October]; Available from:

http://developer.yummly.com.

237. Kennedy, J., Particle swarm optimization, in Encyclopedia of machine learning.

2011, Springer. p. 760-766.

238. Salehi, M., M. Pourzaferani, and S.A. Razavi, Hybrid attribute-based

recommender system for learning material using genetic algorithm and a

multidimensional information model. Egyptian Informatics Journal, 2013.

14(1): p. 67-78.

239. Delévacq, A., et al., Parallel ant colony optimization on graphics processing

units. Journal of Parallel and Distributed Computing, 2013. 73(1): p. 52-61.

240. Kashef, S. and H. Nezamabadi-pour, An advanced ACO algorithm for feature

subset selection. Neurocomputing, 2015. 147: p. 271-279.

241. Hariri, N., B. Mobasher, and R. Burke. Context-aware music recommendation

based on latent topic sequential patterns. in Proceedings of the sixth ACM

conference on Recommender systems. 2012. ACM.

242. Wang, Y.-X. and Y.-J. Zhang, Nonnegative matrix factorization: A

comprehensive review. IEEE Transactions on Knowledge and Data

Engineering, 2013. 25(6): p. 1336-1353.

243. Pan, W., et al., Adaptive bayesian personalized ranking for heterogeneous

implicit feedbacks. Knowledge-Based Systems, 2015. 73: p. 173-180.

244. Amazon Food Dataset. [cited 2017 January 13]; Available from:

http://jmcauley.ucsd.edu/data/amazon/.

245. Rendle, S., et al. BPR: Bayesian personalized ranking from implicit feedback.

in Proceedings of the twenty-fifth conference on uncertainty in artificial

intelligence. 2009. AUAI Press.

246. Yang, X., Y. Guo, and Y. Liu, Bayesian-inference-based recommendation in

online social networks. IEEE Transactions on Parallel and Distributed Systems,

2013. 24(4): p. 642-651.

247. Pearl, J., Bayesian networks. Department of Statistics, UCLA, 2011.

248. Koren, Y., R. Bell, and C. Volinsky, Matrix factorization techniques for

recommender systems. Computer, 2009. 42(8).

249. Shi, Y., M. Larson, and A. Hanjalic. List-wise learning to rank with matrix

factorization for collaborative filtering. in Proceedings of the fourth ACM

conference on Recommender systems. 2010. ACM.

250. Snoek, J., et al. Scalable Bayesian Optimization Using Deep Neural Networks.

in ICML. 2015.

251. El-Dosuky, M., et al. Food recommendation using ontology and heuristics. in

International Conference on Advanced Machine Learning Technologies and

Applications. 2012. Springer.

252. Matlab Parallel Cloud. [cited 2017 10 January]; Available from:

http://www.mathworks.com/products/parallel-computing/matlab-parallel-

cloud/.

253. Big Data. [cited 2017 January 16]; Available from:

https://www.entrepreneur.com/article/233344.

254. Hoens, T.R., et al., Reliable medical recommendation systems with patient

privacy. ACM Transactions on Intelligent Systems and Technology (TIST),

2013. 4(4): p. 67.

255. Middleton, B., et al., Enhancing patient safety and quality of care by improving

the usability of electronic health record systems: recommendations from AMIA.

Journal of the American Medical Informatics Association, 2013. 20(e1): p. e2-

e8.

256. Abbas, A., et al., A cloud based health insurance plan recommendation system:

A user centered approach. Future Generation Computer Systems, 2015. 43: p.

99-109.

257. Zhang, D., et al., A carpooling recommendation system for taxicab services.

IEEE Transactions on Emerging Topics in Computing, 2014. 2(3): p. 254-266.

258. Le, Q.T. and D. Pishva. An innovative tour recommendation system for tourists

in Japan. in Advanced Communication Technology (ICACT), 2016 18th

International Conference on. 2016. IEEE.

259. Masthoff, J., Group recommender systems: Combining individual models, in

Recommender systems handbook. 2011, Springer. p. 677-702.

260. Peng, F., et al., N-dimensional Markov random field prior for cold-start

recommendation. Neurocomputing, 2016. 191: p. 187-199.