38
1 A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS* by Gökhan Yavaş Feb 22, 2005 *: To appear in Data and Knowledge Engineering, Elsevier

A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

  • Upload
    maya

  • View
    36

  • Download
    0

Embed Size (px)

DESCRIPTION

A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*. by G ö khan Yavaş Feb 22, 200 5. *: To appear in Data and Knowledge Engineering, Elsevier. Outline. Introduction Background Work Mobility Prediction Based On Mobility Rules Experimental Results Conclusion - PowerPoint PPT Presentation

Citation preview

Page 1: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

1

A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

by

Gökhan Yavaş

Feb 22, 2005

*: To appear in Data and Knowledge Engineering, Elsevier

Page 2: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

2

Outline

Introduction

Background Work

Mobility Prediction Based On Mobility Rules

Experimental Results

Conclusion

Future Work

Page 3: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

3

Introduction

Personal Communication Systems are becoming more

popular

Dynamic relocation of users gives rise to the problem of

Mobility Management

Methods for storing and updating the location information

of users

Mobility Prediction: the prediction of a user’s next inter-

cell movement

Page 4: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

4

Motivation

Predicted movement can be used for effectively allocating resources

instead of blindly allocating excessive resources

Benefit to the broadcast program generation [1], data items can be

broadcast to the predicted cell

Location prediction is crucial in processing of location dependent

queries [2], since answer depends on the location of user

Queries depending on future positions can be answered by effective

location prediction[1] Y. Saygin and O. Ulusoy. Exploiting Data Mining Techniques for Broadcasting Data in Mobile Computing Environments. IEEE Transactions on Knowledge and Data Engineering, 14(6): 1387-1399, 2002. [2] R. Agrawal and R. Srikant. Mining sequential patterns. In Proceedings of the IEEE Conference on Data Engineering (ICDE’95), pages 3–14, 1995.

[2] G. Gok and O. Ulusoy. Transmission of Continuous Query Results in Mobile Computing Systems. Information Sciences, 125(1-4): 37-63, 2000

Page 5: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

5

Network Model

PCS network partitioned into smaller areas called cells

Each cell has a Base Station (BS), used for broadcasting and

receiving information

Home Location Register (HLR): database which keeps the inter-cell

movement history of user

Visitor Location Register (VLR): each BS has a database which

keeps the profiles of the users located in this cell.

Page 6: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

6

Problem Definition

It is possible for us to get the movement history of a mobile user

from HLR of a user

Movement trajectories in the form of T=<(id1, t1) ... (idk, tk)>

Partitioned into subsequences, named user actual paths, UAPs

UAPs have the form of U=<c1, c2, ..., cn>

We mine UAPs to find user mobility patterns, UMPs

Page 7: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

7

Related Work

The roots of our method go back to the Apriori algorithm [3]

Association rule mining

Sequential pattern mining problem [4]

Ordering of the items in an itemset must be taken into consideration

Not appropriate for our domain, because does not take into account the

network topology

[3] R. Agrawal, R. Srikant, Fast Algorithms for mining association rules. In Proceedings of Very Large Databases Conference (VLDB’94), pages 487-499, 1994.

[4] R. Agrawal and R. Srikant. Mining sequential patterns. In Proceedings of the IEEE Conference on Data Engineering (ICDE’95), pages 3–14, 1995.

Page 8: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

8

Mobility Prediction Based On Mobility Rules

1. Mining UMPs from Graph Traversals: Movement data mined for

discovering regularities (UMP) in inter-cell movements

2. Generation of Mobility Rules: Mobility rules are extracted from

UMPs

3. Mobility Prediction: Prediction of next inter-cell movement based

on mobility rules

Page 9: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

9

Mining UMPs from Graph Traversals

Vertices of G: the cells in the coverage region

Edges of G: if two cells, A and B, are neighbors in the coverage

region, then there are two edges in G, A B and B A

An example coverage region and corresponding graph G

Page 10: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

10

Mining UMPs from Graph Traversals

Subsequence definition: Assume we have two UAPs, A = <a1, a2, ... , an> and

B = <b1, b2, ... , bm>. B is a

subsequence of A, iff all cells in B also exist in A while keeping their order in B

Example: A=<c3, c4, c0, c1, c6,

c5>, then B=<c4, c5> is a

length-2 subsequence of A. In

other words, B is contained by

A

Page 11: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

11

Mining UMPs from Graph Traversals

Every candidate has a count value that keeps the support given to

this candidate by UAPs

This is the point our work extends algorithm in [5, 6]

Method in [5, 6] increments the count value of a candidate by 1 if

this candidate is contained by a UAP

Unfair !!!

Treats in the same way a highly corrupted candidate pattern

a slightly corrupted (or even not corrupted at all) candidate pattern

[5] A. Nanopoulos, D. Katsaros, Y. Manolopoulos, A Data Mining Algorithm for Generalized Web Prefetching, IEEE Transactions on Knowledge and Data Engineering, 15(5): 1155-1169, 2003.

[6] A. Nanopoulos, D. Katsaros, Y. Manolopoulos, Effective Prediction of Web User Accesses: A Data Mining Approach, In Proceedings of the WebKDD Workshop (WebKDD’01), 2001.

Page 12: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

12

Mining UMPs from Graph Traversals

Should consider the degree of corruption for the mobile motion

prediction context

Support assigned to a candidate pattern B by a UAP A (i.e.,

suppIncsuppInc)

Page 13: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

13

Mining UMPs from Graph Traversals

Define totDisttotDist value by means of the notion of string alignment

Definition 2.1: If x and y are each single character or space, then

(x, y) denotes the score of aligning x and y. In our case, the scoring

function is defined as follows:

Page 14: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

14

Mining UMPs from Graph Traversals

Definition 2.3: Let A be a UAP and B be a pattern. A containment

alignment X' maps A and B into strings A‘ and B‘ where: |A'| = |B'|

B is contained by A, and

Removal of all spaces from A' and B' leaves A and B

Total score of the alignment X':

Page 15: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

15

Mining UMPs from Graph Traversals

For any two patterns, there may be more than one alignment Ex: Consider A=<c3, c4, c0, c1, c6, c5, c8, c5>, B=<c4, c5>

Page 16: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

16

Mining UMPs from Graph Traversals

Definition 2.4: An optimal containment alignment of UAP A and

pattern B is one that has the minimum possible value for these two

patterns

Total score of an alignment: sum of penalties

An optimal alignment should have the minimum number of mismatches,

which means the minimum score of alignment

totDist(A, B) = Score of the optimal alignment for the UAP A and

pattern B

Page 17: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

17

Mining UMPs from Graph Traversals

Example: Given UAP A=<c3, c4, c0, c1, c6, c5, c8> and pattern B=<c4,

c5 , c8 > , optimal containment alignment for these:

Score of the alignment = totDist (A, B) = 3

Support assigned to the candidate pattern B by the UAP A:

Page 18: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

18

Mining UMPs from Graph Traversals

The quality of the patterns will improve since this method is a more

accurate way of support counting Degree of corruption taken into account

This will give rise to more accurate mobility rules

Resulting in the prediction accuracy improved compared to the

accuracy by using the rules that are generated with the former way

of support counting

Application of different methods for totDist will affect the quality of

rules

Page 19: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

19

Mining UMPs from Graph Traversals

Candidate Generation:

Example: C = <c1, c2, ..., ck>

N+(ck) : the set of all nodes in G, which have an incoming edge from

the cell ck

A cell from N+(ck) is attached to the end of C to generate C'

Add C' to the set of Candidates

Page 20: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

20

Mining UMPs from Graph Traversals

Apriori Pruning can be used?

NO due to the nature of our new support counting method

Support is no longer monotonically decreasing with the increasing

size of the pattern

A length-(k-1) subpattern S of a length-k pattern P doesn’t need to

be large even if P is large

Ex: UAP <1, 6, 0, 3, 2>, P1 = <1, 0, 2> and its subpattern P2 = <1,

2>

UAP assigns a support

to P1 and to P2

)21(

1

)31(

1

Page 21: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

21

Mining UMPs from Graph Traversals

Example: Use suppmin= 1.33

Database of UAPs Set of all large Patterns (UMPs)

Page 22: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

22

Generation of Mobility Rules

Extract rules from the UMPs

For a rule:

R: < c1, c2, …, ci-1 > < ci, ci+1, ... ck >

A confidence value is calculated:

Head Tail

Page 23: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

23

Generation of Mobility Rules

The rules which have confidence higher than confmin are selected

All possible mobility rules for the UMPs given in previous example

are:

Page 24: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

24

Mobility Prediction

User has followed a path P=< c1, c2, …, ci-1 > up to now

Find the rules whose head parts are contained in P and the last cell

in their head is ci-1

Store the first cell of tail along with the (confidence + support) of rule

as a tuple

Sort these tuples w.r.t. the (confidence + support) values in

descending order

Select the first m tuples

Page 25: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

25

Mobility Prediction

Example: Assume that the current trajectory of the user is P=<2, 3,

0, 4>

Matching Rules:

<4> <0>

<4> < 5>

<3, 4> <0>

< 3, 4 > <5>

Sorted tuple array is: TupleArray = [(5, 85.83), (0, 76.5)]

If m=1, then Predicted Cells Set = {5}

If m=2, then Predicted Cells Set = {5, 0}

Page 26: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

26

Simulation Design

Mobile users travel on a 15 by 15 hexagonal shaped network

To generate UAPs, first UMPs are generated

UMPs are taken as a random walk over the network

Two types of UAPs: Outliers: a random walk over the network

Non-outliers: those which follow a UMP

o (outlier percentage): ratio of the number of outliers to the number

of non-outliers

Page 27: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

27

Simulation Design

Corruption mechanism: insert random cells between the

consecutive cells of an UMP

c (corruption ratio): denotes the ratio of the number of such random

cells to the number of cells in the corresponding UMP

Three possible outcomes of a prediction Correct prediction

Incorrect prediction

No prediction

Two performance measures:

Page 28: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

28

Algorithms Used for Comparison

Mobility Prediction Based on Transition Matrix (TM)

A cell-to-cell transition matrix formed

Select the m most probable cells from the transition matrix

Ignorant Prediction

Randomly select the m neighboring cells of the current cell

Page 29: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

29

Impact of m on Precision and Recall

Decreasing precision for both

our algorithm and TM

Increasing probability of

making some incorrect

predictions as m increases

Increasing recall for all

algorithms, but more

significant increase for TM and

Ignorant prediction

0.1

0.2

0.3

0.4

0.5

0.6

0.7

1 2 3 4 5 6

m

Pre

cis

ion UMP-Based

Ignorant

TM

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1 2 3 4 5 6

m

Recall UMP-Based

Ignorant

TM

Page 30: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

30

Impact of m on Precision and Recall

Setting m as small as possible

is convenient for our method

The increase rate in the recall

value from m values 1 to 2 is

maximum for TM

m ≥ 3 would cause excessive

network resource waste

Thus choose m = 2

0.1

0.2

0.3

0.4

0.5

0.6

0.7

1 2 3 4 5 6

m

Pre

cis

ion UMP-Based

Ignorant

TM

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1 2 3 4 5 6

m

Recall UMP-Based

Ignorant

TM

Page 31: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

31

Impact of Suppmin

Reduced recall and precision

The increase in the suppmin

value leads to a decrease in

the number of mined mobility

rules

Number of correct predictions

is reduced

Choose suppmin=0.1

0.55

0.5625

0.575

0.5875

0.6

0.6125

0.625

0.6375

0.65

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Min Support

Pre

cisi

on

0.45

0.455

0.46

0.465

0.47

0.475

0.48

0.485

0.49

0.495

0.5

0.505

0.51

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Min Support

Rec

all

Page 32: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

32

Impact of Confmin

Increasing precision Higher quality rules with the

increasing confmin

Leading to a higher decrease rate in number of predictions when compared to the decrease rate in number of correct predictions

Decreasing recall The number of mined rules is

reduced leading to a decrease in the number of correct predictions

Choose confmin=80

0.5

0.6

0.7

0.8

0.9

1

50 60 70 80 90 100

Min Conf

Pre

cis

ion

0

0.1

0.2

0.3

0.4

0.5

0.6

50 60 70 80 90 100

Min Conf

Recall

Page 33: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

33

Impact of Corruption Factor

Decreasing precision and

recall for our method and TM

For all c, better precision than

TM but worse recall than TM

For our method, as c

increases: The number of mined mobility

rules decreases No prediction in some cases

because no matching rules due to the corrupted UAPs

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 0.2 0.4 0.6 0.8

corruption factor

Pre

cis

ion UMP-Based

Ignorant

TM

0.3

0.35

0.4

0.45

0.5

0.55

0.6

0.65

0.7

0.75

0.8

0.85

0.9

0 0.2 0.4 0.6 0.8

corruption factor

Re

ca

ll UMP-Based

Ignorant

TM

Page 34: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

34

Impact of Outlier Percentage

Both performance measures

not affected significantly for all

methods

Rules extracted from outlier

UAPs not used commonly,

thus not reducing recall and

precision significantly

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

20 30 40 50 60

outlier percentage

Pre

cis

ion

UMP-Based

Ignorant

TM

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

20 30 40 50 60

outlier percentage

Re

ca

ll UMP-Based

Ignorant

TM

Page 35: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

35

Conclusion

A data mining algorithm for the prediction of user movements in a

mobile computing system Algorithm is based on

Mining the mobility patterns of users Then forming mobility rules from these patterns Finally predicting a mobile user’s next movements by using the mobility

rules

A good performance when compared to the performance of Ignorant

Method

Page 36: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

36

Conclusion

Performance when compared to the TM

Better Precision: More accurate predictions Most of its predictions made at each request are correct

Worse Recall: Our method may not make prediction in response to some of the prediction

requests Because there may not be any matching rule for the current trajectory of the

user when a prediction request is made

Page 37: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

37

Future Work

For calculating the totDist value, our method: Decrease the support given to pattern by a UAP as the number of

corrupted cells increases in pattern Other methods may be employed for calculating totDist value

No time domain of the mobility patterns and mobility rules

considered In real life, mobility patterns might be related to time Some specific rules valid for a specific time interval Extend our algorithm to include the time domain of mobility rules

A candidate pruning criterion suitable for our support counting method may be employed

Page 38: A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

38

?Questions & Comments