Similarity of mobile users based on sparse location history Pasi Fränti Radu Mariescu-Istodor Karol Waga 21.2.2019 P. Fränti, R. Mariescu-Istodor and K. Waga ”Similarity of mobile users based on sparse location history” Int. Conf. Artificial Intelligence and Soft Computing (ICAISC) Zakopane, Poland, 593-603, June 2018


Similarity of mobile users based on sparse location history

Pasi Fränti, Radu Mariescu-Istodor, Karol Waga

21.2.2019

P. Fränti, R. Mariescu-Istodor and K. Waga

”Similarity of mobile users based on sparse location history”

Int. Conf. Artificial Intelligence and Soft Computing (ICAISC)

Zakopane, Poland, 593-603, June 2018


Introduction


Motivation

Cold-start users in recommender systems

Activity in the same area

Opinions of local experts more valuable [Bao’2012]

Identify the same user


Existing methods

Complete trajectories [Ying’10]

Longest common subsequence with speed and turn points [Liu’12]

Stop points (>30 min) [Zheng’10]

Most common stops: home and work [Biagioni’13]

User-given priority for the more important parts [Wang’12]


Problems?

Full GPS trajectories not always stored

Areas of activity are often more important than the exact tracks

Explicit activities such as visits, check-ins and photo taking are widely recorded.


Test case: APR trio

[Figure: activity panels with per-location counts for each user and year. Panels: Andrei 2012, Andrei 2013, Andrei 2014, Pasi 2012, Pasi 2013, Pasi 2014, Radu 2012, Radu 2013, Radu 2014]


[Figure: four yearly panels (A11, A12, A13, A14) with per-location activity counts]

Example of User A


[Figure: four yearly panels (Pasi 2011, Pasi 2012, Pasi 2013, Pasi 2014) with per-location activity counts]

Example of User P


[Figure: four yearly panels (Radu 2011, Radu 2012, Radu 2013, Radu 2014) with per-location activity counts]

Example of User R


Two approaches

1. Histogram comparison
- Construct histogram from some common data
- Map points to the bins
- Histogram matching of the two counts

2. Clustering comparison
- Cluster each data separately
- Map centroids between the data
- Calculate similarity based on the mapping


Selected approaches

1. Histogram comparison
- Construct histogram from some common data
- Map points to the bins
- Histogram matching of the two counts

2. Clustering comparison
- Cluster each data separately
- Map centroids between the data
- Calculate similarity based on the mapping


Approach 1: Histograms


Location history

Activity points of users


Histogram matching

Create bins from popular locations


Location similarity

Count visit statistics

[Figure: histogram bins and the visit counts of each user in every bin]
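The counting step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each activity point is assigned to its nearest bin centre, and it uses planar coordinates with hypothetical bin centres and points. Real GPS data would need a geographic distance instead.

```python
from collections import Counter
from math import hypot


def nearest_bin(point, bins):
    """Index of the bin centre closest to the point (planar Euclidean distance)."""
    return min(
        range(len(bins)),
        key=lambda i: hypot(point[0] - bins[i][0], point[1] - bins[i][1]),
    )


def visit_counts(points, bins):
    """Number of points of one user that fall into each bin."""
    hits = Counter(nearest_bin(p, bins) for p in points)
    return [hits.get(i, 0) for i in range(len(bins))]


# Hypothetical bin centres and activity points of one user.
bins = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(1.0, 1.0), (9.0, 1.0), (0.5, 0.2), (1.0, 8.0)]
counts = visit_counts(points, bins)
print(counts)
```

Running the same function for a second user over the same bins gives the two count vectors that are compared in the histogram matching step.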


Histogram of the toy example

            Frequencies                   Relative frequencies
Bin     Pekka  Bartek  Julia  Total     Pekka  Bartek  Julia
1         4      3       1      8       0.44   0.50    0.14
2         2      2       0      4       0.22   0.33    0.00
3         0      0       3      3       0.00   0.00    0.43
4         1      1       1      3       0.11   0.17    0.14
5         0      0       2      2       0.00   0.00    0.28
6         1      0       0      1       0.11   0.00    0.00
7         1      0       0      1       0.11   0.00    0.00
8         0      0       0      0       0.00   0.00    0.00
Total     9      6       7
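The relative frequencies of the toy example are the visit counts divided by each user's total. A minimal sketch using the counts of Pekka, Bartek and Julia from the slide (the function name is an illustrative choice):

```python
# Visit counts per bin, taken from the toy example.
counts = {
    "Pekka":  [4, 2, 0, 1, 0, 1, 1, 0],
    "Bartek": [3, 2, 0, 1, 0, 0, 0, 0],
    "Julia":  [1, 0, 3, 1, 2, 0, 0, 0],
}


def relative_frequencies(c):
    """Normalize counts so that they sum to 1 (all zeros if the user has no visits)."""
    total = sum(c)
    return [x / total for x in c] if total else [0.0] * len(c)


rel = {user: relative_frequencies(c) for user, c in counts.items()}
for user, freqs in rel.items():
    print(user, [round(f, 2) for f in freqs])
```

Normalizing makes users with different amounts of activity comparable, since only the shape of the distribution over the bins matters.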


Distance measures


Bhattacharyya distance

$D_B(p, q) = -\ln \sum_i \sqrt{p_i q_i}$

Per-bin terms $\sqrt{p_i q_i}$, computed from the relative frequencies of the toy example:

Bin     Pekka vs Bartek   Bartek vs Julia
1            0.47              0.26
2            0.27              0.00
3            0.00              0.00
4            0.14              0.15
5            0.00              0.00
6            0.00              0.00
7            0.00              0.00
8            0.00              0.00
Sum          0.88              0.41
-ln          0.13              0.89
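The Bhattacharyya distance of the toy example can be reproduced with a short sketch. The function name and the handling of non-overlapping histograms (infinite distance) are assumptions for illustration. The code works on unrounded frequencies, so the result for Bartek vs Julia differs slightly from the slide value, which is summed from rounded per-bin terms.

```python
from math import log, sqrt


def bhattacharyya_distance(p, q):
    """-ln of the Bhattacharyya coefficient of two relative-frequency histograms."""
    coefficient = sum(sqrt(a * b) for a, b in zip(p, q))
    if coefficient == 0.0:
        return float("inf")  # assumption: no shared bins means infinitely far apart
    return -log(coefficient)


# Relative frequencies of the toy example (counts divided by user totals).
pekka = [x / 9 for x in [4, 2, 0, 1, 0, 1, 1, 0]]
bartek = [x / 6 for x in [3, 2, 0, 1, 0, 0, 0, 0]]
julia = [x / 7 for x in [1, 0, 3, 1, 2, 0, 0, 0]]

d_pb = bhattacharyya_distance(pekka, bartek)
d_bj = bhattacharyya_distance(bartek, julia)
print(round(d_pb, 2), round(d_bj, 2))
```

Pekka and Bartek share most of their bins and get a small distance, while Bartek and Julia overlap in only two bins and end up far apart.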


L1 distance

$L_1(p, q) = \sum_i |p_i - q_i|$

Per-bin terms $|p_i - q_i|$, computed from the relative frequencies of the toy example:

Bin     Pekka vs Bartek   Bartek vs Julia
1            0.06              0.36
2            0.11              0.33
3            0.00              0.43
4            0.06              0.03
5            0.00              0.28
6            0.11              0.00
7            0.11              0.00
8            0.00              0.00
Sum          0.42              1.43
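A minimal sketch of the L1 distance on the toy example follows. The function name is illustrative. The code uses unrounded frequencies, so a total can differ in the second decimal from the value shown on the slide.

```python
def l1_distance(p, q):
    """Sum of absolute differences between two relative-frequency histograms."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Relative frequencies of the toy example (counts divided by user totals).
pekka = [x / 9 for x in [4, 2, 0, 1, 0, 1, 1, 0]]
bartek = [x / 6 for x in [3, 2, 0, 1, 0, 0, 0, 0]]
julia = [x / 7 for x in [1, 0, 3, 1, 2, 0, 0, 0]]

l1_pb = l1_distance(pekka, bartek)
l1_bj = l1_distance(bartek, julia)
print(round(l1_pb, 2), round(l1_bj, 2))
```

For normalized histograms the L1 distance ranges from 0 (identical) to 2 (no shared bins).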


L2 distance

$L_2(p, q) = \sum_i (p_i - q_i)^2$

Per-bin terms $(p_i - q_i)^2$, computed from the relative frequencies of the toy example:

Bin     Pekka vs Bartek   Bartek vs Julia
1            0.36%             13%
2            1.21%             11%
3            0.00%             18%
4            0.36%              0%
5            0.00%              8%
6            1.21%              0%
7            1.21%              0%
8            0.00%              0%
Sum          0.04              0.50
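The slide totals correspond to the sum of squared differences without a square root, and the sketch below follows that reading. The function name is illustrative, and the frequencies are unrounded.

```python
def l2_distance(p, q):
    """Sum of squared differences between two relative-frequency histograms."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


# Relative frequencies of the toy example (counts divided by user totals).
pekka = [x / 9 for x in [4, 2, 0, 1, 0, 1, 1, 0]]
bartek = [x / 6 for x in [3, 2, 0, 1, 0, 0, 0, 0]]
julia = [x / 7 for x in [1, 0, 3, 1, 2, 0, 0, 0]]

l2_pb = l2_distance(pekka, bartek)
l2_bj = l2_distance(bartek, julia)
print(round(l2_pb, 2), round(l2_bj, 2))
```

Squaring shrinks the small per-bin differences, so bins with a little shared activity contribute almost nothing to the total.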


How to create histogram bins

Uniform global grid (e.g. 25×25 m)

Some known places (Mopsi services)

All user data from Mopsi

The two data as such:
- Cluster the joint activity data
- Each cluster represents one bin
- Count blues and reds in each cluster and compare the counts using any histogram matching technique
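The uniform grid option can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it uses an equirectangular approximation around a hypothetical reference latitude (62.6, roughly the Joensuu region), a 25 m cell size, and made-up sample points.

```python
from collections import Counter
from math import cos, floor, radians

EARTH_RADIUS_M = 6_371_000.0


def grid_cell(lat, lon, cell_m=25.0, ref_lat=62.6):
    """Integer (row, col) of the grid cell that contains the point.

    Uses an equirectangular projection around ref_lat, which is only a
    reasonable approximation inside a small region.
    """
    y = radians(lat) * EARTH_RADIUS_M
    x = radians(lon) * EARTH_RADIUS_M * cos(radians(ref_lat))
    return (floor(y / cell_m), floor(x / cell_m))


def grid_histogram(points, cell_m=25.0):
    """Count how many (lat, lon) points fall into each non-empty grid cell."""
    return Counter(grid_cell(lat, lon, cell_m) for lat, lon in points)


# Hypothetical activity points: two at the same spot, one about 1 km north.
points = [(62.6000, 29.7600), (62.6000, 29.7600), (62.6100, 29.7600)]
hist = grid_histogram(points)
print(hist)
```

A grid needs no prior knowledge of places, but most cells stay empty with sparse data, which is one reason to consider known places or clustering as bins instead.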


Approach 2: Clustering


Clustering of two distributions

[Figure: clusters of two distributions, labelled with the counts 6, 10, 4 and 1 and the ratios 7/1 = 7, 9/6 = 1.5, 4/0.5 = 8 and 6/0.5 = 12]


Experimental results


Collected data

Municipalities: Joensuu, Liperi, Outokumpu, Polvijärvi, Kontiolahti, Ilomantsi, Juuka

Joensuu sub-region bounding box: corners 63.44N 28.65E and 62.25N 31.58E

Histogram from 293 places (Mopsi services)

User activities until 31.12.2014:
- Photos taken
- Tracking started or ended


Summary of APR trio data 2011-2014

Distinct users: 3

Total users: 12

Sample sizes: 37 - 1263

[Figure: twelve activity panels with per-location counts, one per user and year: Andrei 2011-2014, Pasi 2011-2014, Radu 2011-2014]


Ten most popular histogram bins (frequencies shown)

[Figure: the ten most popular histogram bins; labelled bins: Work, Home, Home]


Test protocol

Expected result

True positive: two data sets of the same user, e.g. Pasi 2011 vs. Pasi 2014. Expected result of the distance comparison: SAME.

True negative: data sets of different users, e.g. Radu 2013 vs. Pasi 2011. Expected result of the distance comparison: DIFFERENT.

• 12*12 = 144 comparisons in total
• 3*4*4 = 48 true positives (33%)
• 3*4*8 = 96 true negatives (67%)
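The counts of the test protocol can be checked by enumerating all ordered pairs of the twelve yearly data sets. A small sketch, where the tuple representation of a data set is an illustrative choice:

```python
from itertools import product

users = ["Andrei", "Pasi", "Radu"]
years = [2011, 2012, 2013, 2014]

# Each yearly data set of a user is treated as its own pseudo-user.
datasets = [(user, year) for user in users for year in years]

# All ordered pairs, including a data set compared with itself.
pairs = list(product(datasets, repeat=2))
same = sum(1 for a, b in pairs if a[0] == b[0])
different = len(pairs) - same

print(len(pairs), same, different)
```

Each of the 12 data sets has 4 partners from the same user (itself included) and 8 from the other two users, which gives the 48 and 96 above.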


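Comparison of all pairs

Compute the distance of every pair (data set vs. data set) and sort the distances:

0.01 0.01 0.02 0.03 0.05 0.06 … 0.74 0.76 0.81 0.82 0.88

Algorithm result: pairs below the threshold are classified as same user, pairs above the threshold as different user.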


Classification error

Sorted distances:  0.01 0.01 0.02 0.03 0.05 0.06 … 0.74 0.76 0.81 0.82 0.88
Algorithm result:  same user (below the threshold) … different user (above the threshold)
Ground truth:      same same not same same same … not not not not not

Errors made: 3

Total comparisons: 144

Classification error: 2%
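The classification error for a given threshold can be sketched as below. The function name and the threshold value 0.5 are hypothetical, and the example uses only the eleven distances visible on the slide, not the full set of 144.

```python
def classification_error(distances, same_user, threshold):
    """Fraction of pairs where 'distance <= threshold' disagrees with the ground truth."""
    errors = sum(
        (d <= threshold) != truth for d, truth in zip(distances, same_user)
    )
    return errors / len(distances)


# Only the distances and labels that are visible on the slide.
distances = [0.01, 0.01, 0.02, 0.03, 0.05, 0.06, 0.74, 0.76, 0.81, 0.82, 0.88]
same_user = [True, True, False, True, True, True,
             False, False, False, False, False]

err = classification_error(distances, same_user, threshold=0.5)
print(f"{err:.2%}")

# The slide's overall figure: 3 errors in 144 comparisons.
overall = 3 / 144
print(f"{overall:.0%}")
```

In this subset the only error is the pair at distance 0.02, which the algorithm calls the same user although the ground truth says it is not.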


Classification results

[Figure: classification error charts; panel labels: Average distance, A priori, Top-33%, Best possible]

Fuzzy worse than crisp
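The panel labels suggest different ways of choosing the decision threshold. The sketch below shows one possible reading of three of them. These are assumptions, not definitions from the slides: "average distance" is taken as the mean of all distances, "top-33%" as the distance below which the smallest third of the pairs fall, and "best possible" as the threshold with the fewest errors given the ground truth. The "a priori" variant is left out because the slides do not define it, and the sample data are made up.

```python
def threshold_average(distances):
    """Assumed 'average distance' rule: the mean of all pairwise distances."""
    return sum(distances) / len(distances)


def threshold_top_fraction(distances, fraction=1 / 3):
    """Assumed 'top-33%' rule: largest distance among the smallest third of pairs."""
    ordered = sorted(distances)
    k = max(1, round(len(ordered) * fraction))
    return ordered[k - 1]


def threshold_best(distances, same_user):
    """Assumed 'best possible' rule: observed distance that minimizes the error count."""
    def errors(t):
        return sum((d <= t) != truth for d, truth in zip(distances, same_user))
    return min(sorted(distances), key=errors)


# Made-up distances and ground truth for illustration only.
distances = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
same_user = [True, True, False, False, False, False]

t_avg = threshold_average(distances)
t_top = threshold_top_fraction(distances)
t_best = threshold_best(distances, same_user)
print(t_avg, t_top, t_best)
```

The best possible threshold needs the ground truth, so it serves only as a lower bound on the error that a practical rule can reach.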


What if top-10 places removed?


Conclusions

Main results:
- People’s identity can be recognized with high accuracy
- All except L2 and L
- Fuzzy works worse than crisp

Future research:
- Clustering-based approach
- K-NN classifier


Thank you

Time for questions!