[IEEE 2007 16th IST Mobile and Wireless Communications Summit - Budapest, Hungary (2007.07.1-2007.07.5)] 2007 16th IST Mobile and Wireless Communications Summit - Towards self-characterization

Towards self-characterization of user mobility patterns

Mathias Boc, Anne Fladenmuller, and Marcelo Dias de Amorim

Laboratoire d’Informatique de Paris 6 (LIP6/CNRS)

Universite Pierre et Marie Curie – Paris 6

104, avenue du President Kennedy, 75016, Paris, France

Emails: {boc,fladenmu,amorim}@rp.lip6.fr

Abstract: In this paper, we investigate the individuality of mobility patterns in wireless networks composed of IEEE 802.11 access points. We propose a mobility-aware clustering algorithm that uses roaming events as the metric to evaluate the proximity of access points (APs) without using any geographical information. The contributions of this clustering algorithm are threefold. First, it provides a sanitized image of the topological mobility of individuals. Second, it categorizes clusters as belonging to agglomerations (where individuals pause) or to paths between agglomerations. Third, it proposes a classification of user mobility with regard to the number of places with social meaning for the individual according to the number of visited clusters. We analyze data collected over periods of up to 8 months and show that the differences in activeness, coverage of mobility, home locality, and size of the list of guest locations are clearly individual-related. Such results serve as a basis for the definition of future user-centric communication systems.

I. INTRODUCTION AND RELATED WORK

Wireless networks are a response to the needs of mobility

of individuals. For such networks to be efficient in providing

high-quality services, it is fundamental that the networked

infrastructure understands the reality of user displacements.

Such an investigation is important for achieving three main

goals: (1) to understand the implications of mobility and how

it can impact network services, (2) to

create realistic mobility models to evaluate the performance

of protocols and algorithms, and (3) to propose new commu-

nication solutions that can adapt to the specificities of each

user. This latter point is fundamental in the wireless context

where resources are scarce.

We can classify the study of user mobility into three

categories: network-centric, AP-centric, and user-centric.

Network-centric studies are mainly focused on the dynamics

of the population with regard to the usage of network re-

sources [1], [2]. In other words, the idea is to recognize

hotspots in order to increase the density of access points (APs)

in the area. AP-centric studies address localized solutions

mainly on traffic engineering, quality of service (QoS), as well

as resource management [3]. User-centric approaches focus on

complementary support such as mobility modelling [4], [5] and

network cache management [6].

This work has been partially supported by the European Commission project WIP under contract 27402 and by the RNRT project Airnet under contract 01205.

In this paper, we focus on user-centric solutions and, more

specifically, on how users behave in terms of mobility. We

refer to mobility as the variation of the association of nodes

to the infrastructure through APs. We will see later that,

contrary to traditional approaches that consider mobility as a

simple sequence of APs, our proposal inherently captures some

constraints of the environment without relying on geographic

information.

Understanding and classifying user mobility with a user-

centric approach requires an analysis of wireless data traces.

However, the results obtained from raw wireless data traces

are difficult to interpret due to the variations of wireless link

quality and to the geometric characteristics of the topology.

Indeed, it is difficult to map an

observed topological mobility to the real physical mobility

as the variations of signal quality can give different association

sequences. In the literature, the solutions to overcome these

obstacles are to approximate the real user location by using

building segmentations, to correlate the association sequence

with GPS information from sample users, or to use the

coordinates of the APs [3], [7]. However, such approxima-

tions introduce some distortions in the observed topological

mobility and thus tamper with the real impact on the net-

work services (e.g., network address attribution, sub network

organization, location management, and forwarding). In fact,

although several works contributed to a better understanding

of mobility in the wireless context, they generally considered

users as an ensemble. Much has to be done toward a better

understanding of the individual contributions of each node to

the global behavior of the network.

In this paper, we propose a clustering algorithm that, without

using any geographical information, returns a sanitized image

of the observed mobility on an individual basis. We rely

on measurement campaigns conducted at Dartmouth campus

that are available to the research community. Through a

gathering process of APs, we better understand how micro-

mobility can impact the services offered by the network. From

these sanitized data, we are able to differentiate APs which

might have a social meaning for an individual (we call this

a place; a similar definition, called “hubs”, has been introduced in [8])

from APs which can be interpreted as part of a

path between two places. Finally, thanks to this distinction

between APs and to a long-term study of the evolution of

individual displacements, we can distinguish different types

of mobility and thus propose a classification of individual

mobility. The resulting classification clearly underlines the fact

that users behave so differently from each other that future

communication infrastructures for wireless networks should

consider the possibility of running on a per-user basis. Our

results are in concordance with recent studies in the domain [2]

and offer further understanding of network dynamics from an

individual point of view.

The remainder of this paper is organized as follows. In Sec-

tion II, we describe the data traces we used in our experiments.

In Section III, we enumerate the different phenomena that

introduce variations in the observed topological mobility with

the physical mobility. We detail our individual mobility-based

clustering algorithm and the resulting added values in the com-

prehension of individual mobility in Section IV. In Section V,

we analyze individual-related mobility characteristics over

different periods of time. Finally, we conclude this paper in

Section VI.

II. DATA TRACES

For our analyses, we use data traces of the IEEE 802.11

wireless network of the Dartmouth campus [9]. The campus is

composed of 188 buildings, covered by 566 official wireless

access points. The total surface is about 200 acres and the

number of users is about 5500.

Our study focuses on the movement files [10] over a two-

month period (2004-01-01 to 2004-02-29). We use this period

of two months for three reasons: (1) to avoid noise due to

network events and school holidays, (2) to capture both short

and long-range mobility patterns, and (3) to observe specific

user behaviors according to how interested they are in the

network. This collection of data represents association and dis-

association events, obtained from Syslog data on anonymized

wireless cards.

III. UNDERSTANDING INDIVIDUAL MOBILITY

The comprehension of individual mobility becomes a tough

problem when it relies on raw measurement data. Indeed, the

wireless nature of the network with all the variations and

consequences it implies, plus the density of the APs in the

environment, creates variations in the observed topological

mobility.

We can cite at least four types of events which can cause

these variations:

• Ping-pong effect. It refers to the succession of

associations-disassociations between two or more APs.

It is caused by the closeness of the signal to noise ratio

(SNR) of neighboring APs and/or the aggressiveness of

the wireless card, and can be misinterpreted as mobility.

• Localized network problems. If for some technical rea-

sons, one AP becomes disabled for a certain amount of

time, the user probably associates with another neighbor-

ing AP. Without geographical information, this event can

make us suppose that the user changed his/her habits.

• Physical micro-variations. The physical mobility of the

individual can present some small variations (about a few

meters). In topologically dense areas, these micro-variations

can result in different association patterns.

• Erroneous reproducibility. There is a probability that the

same physical movement results in different association

sequences. Without geographical information like building

positions or GPS data, it is difficult to correlate these

physical movements with a topological view.

Concerning the latter point, the repetition of movements

can create junctures between those different patterns and

create sectors of micro-mobility which can be detected from a

topological standpoint. An algorithm to recognize

these junctures is therefore required to better understand the real

objectives of the observed movements. This is part of our

proposal detailed in the following.

IV. INDIVIDUAL MOBILITY-BASED CLUSTERING

We propose a clustering algorithm based on individual

mobility. By using the number of roaming events (without

disconnection) between APs as the main metric, we regroup,

for each user, APs that are close to each other in terms of

probability of being visited.

A. Description of the algorithm

The algorithm is divided into three parts: the collection of

network events, the clustering, and the categorization of the

resulting clusters into places with social meaning.

1) Collection of network events: The collection of network

events is handled by two data structures. For each individual,

the relationships with the $N$ APs in the network are represented

through the roaming matrix $R$ of size $N \times N$, where the element

$r_{ij}$ of $R$ counts the number of cumulated roaming events

between the APs i and j. The cumulated roaming events imply

that the relationship between two subsets of APs can appear

after independent sessions.

The second data structure is also created on a per-user basis.

It is an N -row table that stores general information about each

AP, such as the total number of associations, the cumulated

association duration, and the average association duration.
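The two per-user data structures described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the trace layout (a time-ordered list of `(timestamp, AP)` pairs), the `OFF` marker for a disassociation, and the function name `collect_events` are all assumptions made for the example. The roaming matrix is stored sparsely as a dictionary keyed by `(i, j)` rather than as a dense N × N array.

```python
from collections import defaultdict

OFF = "OFF"  # assumed marker for a disassociation event (hypothetical format)


def collect_events(trace):
    """Build the roaming counts and the per-AP table for one user.

    trace: time-ordered list of (timestamp_in_seconds, ap_name_or_OFF).
    Returns (roaming, stats) where roaming[(i, j)] counts direct roams
    from AP i to AP j (no disconnection in between) and stats[ap] holds
    the number of associations, cumulated and average duration.
    """
    roaming = defaultdict(int)
    stats = defaultdict(lambda: {"associations": 0, "duration": 0.0})
    prev_time, prev_ap = None, OFF
    for time, ap in trace:
        if prev_ap != OFF:
            # Close the previous association interval.
            stats[prev_ap]["duration"] += time - prev_time
            if ap != OFF and ap != prev_ap:
                roaming[(prev_ap, ap)] += 1  # roaming without disconnection
        if ap != OFF and ap != prev_ap:
            stats[ap]["associations"] += 1
        prev_time, prev_ap = time, ap
    # A trailing association with no closing event is left unaccounted for.
    for entry in stats.values():
        entry["average"] = entry["duration"] / entry["associations"]
    return dict(roaming), dict(stats)


trace = [(0, "A"), (10, "B"), (30, "A"), (35, OFF), (100, "B"), (160, OFF)]
roaming, stats = collect_events(trace)
```

In this toy trace the user roams once from A to B and once from B to A; the later association to B follows a disconnection and therefore does not count as a roaming event.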

2) The clustering algorithm: Firstly, we have to define the

terms which will be used in the algorithm:

• Link. There is a “link” $l_{ij}$ between two APs $i$ and $j$ if

there are bi-directional roaming events between these two

APs ($r_{ij} \neq 0$ and $r_{ji} \neq 0$).

• Cost of a link. The cost of a link, or “distance”, between

two APs $i$ and $j$ is equal to $r_{ij} + r_{ji}$. We thus define the

cost of a link $l_{ij}$ as $c_{ij} = c_{ji} = r_{ij} + r_{ji}$.

• Cluster. A “cluster” is an AP or a group of APs. One

AP can be in only one cluster. Two APs are eligible to

be merged in the same cluster if there exists a link between

them.

• Weight of a cluster. The weight of a cluster is the value

of the maximum cost link of the cluster.

We start with a graph representation of the network in which

vertices are APs. An edge exists between two vertices if there

is a link (as defined above) between the corresponding APs.

In order to limit the variations of intra-cluster link costs, we

define a threshold k (with 0 ≤ k ≤ 1).

[Figure 1: graph of APs A to H connected by links labelled with their costs, grouped into Cluster A (weight 50), Cluster D (weight 10), Cluster H (weight 80), and Cluster E (weight 0).]

Fig. 1. Clustering of the access-points according to the number of roaming events between them (with k = 0.5).

At the beginning, each AP $i$ becomes a cluster $c_i$ of size 1

and weight $w_i = 0$. We consider first the link with the highest

cost in the graph. If two APs $i$ (cluster $c_i$) and $j$ (cluster

$c_j$) can merge together (i.e., $c_{ij} \geq k \times \max\{w_i, w_j\}$), then

the two clusters are merged and the weight of the resulting cluster

is set to the highest link cost within the cluster.

We repeat the clustering process until there are no more

links to be considered. An example of a resulting clustered

graph representation is illustrated in Fig. 1. One can notice

that the cost of the links between AP d in cluster cd and AP f or AP

g in cluster ch is not sufficient to merge both clusters. Such

an approach serves to differentiate paths from locations where

a user stays longer.
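The merging procedure can be sketched as below. This is one possible reading of the description, under stated assumptions: links are examined once each in decreasing order of cost, a link whose endpoints already share a cluster is skipped, and clusters are tracked with a union-find structure. The function name `cluster_aps` and the sparse dictionary input are hypothetical choices for the example, not part of the paper.

```python
from collections import defaultdict


def cluster_aps(roaming, k=0.5):
    """Cluster APs from one user's roaming counts.

    roaming: dict mapping (i, j) to the number of roams from AP i to AP j.
    Returns a list of (set_of_aps, cluster_weight) pairs.
    """
    aps = {ap for pair in roaming for ap in pair}

    # A link requires roaming in both directions; its cost is r_ij + r_ji.
    links = []
    for (i, j), r_ij in roaming.items():
        r_ji = roaming.get((j, i), 0)
        if i < j and r_ij > 0 and r_ji > 0:
            links.append((r_ij + r_ji, i, j))
    links.sort(reverse=True)  # highest-cost link first

    parent = {ap: ap for ap in aps}  # every AP starts as its own cluster
    weight = {ap: 0 for ap in aps}   # initial cluster weight is 0

    def find(ap):
        # Return the representative of ap's cluster (with path halving).
        while parent[ap] != ap:
            parent[ap] = parent[parent[ap]]
            ap = parent[ap]
        return ap

    for cost, i, j in links:
        ci, cj = find(i), find(j)
        if ci == cj:
            continue  # both APs are already in the same cluster
        if cost >= k * max(weight[ci], weight[cj]):
            parent[cj] = ci
            weight[ci] = max(weight[ci], weight[cj], cost)

    clusters = defaultdict(set)
    for ap in aps:
        clusters[find(ap)].add(ap)
    return [(members, weight[root]) for root, members in clusters.items()]


roaming = {
    ("A", "B"): 25, ("B", "A"): 25,  # link A-B, cost 50
    ("B", "C"): 20, ("C", "B"): 10,  # link B-C, cost 30
    ("C", "D"): 5, ("D", "C"): 5,    # link C-D, cost 10
    ("D", "E"): 1, ("E", "D"): 1,    # link D-E, cost 2
    ("E", "F"): 3,                   # one direction only: not a link
}
clusters = cluster_aps(roaming, k=0.5)
```

With k = 0.5, the link C-D (cost 10) is below half the weight of the cluster containing C (50), so D is not absorbed; D and E then form their own low-weight cluster, which is the kind of separation between agglomerations and paths that the text describes.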

3) Categorization of the clusters: The result of this clustering

algorithm is an embedded graph composed of a list of

clusters. However, this is a flat representation of associations

because there is no temporal aspect that gives us the means

to categorize the clusters. We introduce this temporal aspect

by computing, for each cluster, the sum of the cumulated

association duration of each AP within the cluster. As we can

observe in Fig. 2, for a mobile user u1, there is a limited

number of clusters in which u1 spent most of her/his time

(around 10 APs). Similarly, there is a large number of clusters

for which the cumulated association duration does not reach

a certain threshold (100 seconds on the figure).

In order to define a threshold which depends on the mobility

of an individual, we propose to take the overall average asso-

ciation duration among all the visited APs. For each cluster

ci, we choose the AP with the highest average association

duration to represent the cluster. If this value is greater than the

threshold, the cluster will be considered as a place with social

meaning. In the other case, the cluster will be considered as

a part of a path or, simply, as a place without social meaning.
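The categorization rule can be illustrated with a short sketch. It assumes that the "overall average association duration" is the total association time divided by the total number of associations over all visited APs; the text admits other readings (for instance, the mean of the per-AP averages). The function name `categorize_clusters` and the layout of the per-AP table are hypothetical.

```python
def categorize_clusters(clusters, stats):
    """Label each cluster as a place with social meaning or not.

    clusters: iterable of sets of AP names.
    stats: dict mapping AP name to {"associations": int, "duration": float}.
    Returns (threshold, [(members, is_place), ...]).
    """
    total_duration = sum(s["duration"] for s in stats.values())
    total_associations = sum(s["associations"] for s in stats.values())
    # Assumed reading: user-wide average duration of one association.
    threshold = total_duration / total_associations

    labelled = []
    for members in clusters:
        # The AP with the highest average duration represents the cluster.
        representative = max(
            stats[ap]["duration"] / stats[ap]["associations"] for ap in members
        )
        labelled.append((members, representative > threshold))
    return threshold, labelled


stats = {
    "A": {"associations": 10, "duration": 6000.0},  # average 600 s
    "B": {"associations": 5, "duration": 500.0},    # average 100 s
    "C": {"associations": 5, "duration": 100.0},    # average 20 s
}
threshold, labelled = categorize_clusters([{"A", "B"}, {"C"}], stats)
```

Here the threshold is 330 seconds, so the cluster {A, B} is labelled a place (its representative AP averages 600 seconds) while {C} is treated as part of a path.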

V. CHARACTERISATION OF INDIVIDUAL-RELATED MOBILITY PATTERN

In this section we enumerate individual-related mobility

characteristics and study the evolution of these parameters

through different periods of time. We still use the initial

period of January to February 2004 and, for certain parameters,

extend this period to eight months (September 2003 to April

2004). We provide the same analysis for the same group of

users who were active during the initial period in order to

see the evolution of their behaviors.

[Figure 2: log-log plot of the cumulated association duration (sec) against rank (num), with two series: after clustering and before clustering.]

Fig. 2. Log-log of the cumulated associations’ durations with and without using the clustering algorithm.

A. Periods of activity and mobility coverage

The periods of activity and the association durations are

clearly individual-related. These periods are important to un-

derstand the pattern of network utilization of an individual.

Individuals that are active all the time may not show the

same network behavior as individuals that are active only on

weekday nights. Of course, these two different behaviors

introduce different requirements. The mobility coverage (the

number of different visited clusters during the period) is also

an individual-related parameter which depends on the user’s

knowledge of the network connectivity. The correlation be-

tween periods of activity and mobility coverage is individual-

related and is difficult to predict.

We compute, per day, the number of visited APs for a user

u1 for the period from September 2003 to April 2004. We

chose this specific user because of her/his regularity in the

utilization of the network during the different observed periods.

Then we compared it with the behavior of a group of users.

We observe that users are more active on weekdays than

on weekends, but the average number of visited APs among all

users remains stable (see Fig. 3(b)). During school holidays,

there are more variations in the visited APs. This is partly

due to the smaller number of users present in the network

and the different usages of the network. The period of activity

within weekdays can slightly change (the day of activity can be

different) but, even if there are great variations in the coverage

of mobility, the number of visited APs remains high (between

15 and 75).

B. Local Micro-Mobility and paths

The association durations within clusters reflect individual-

related mobility characteristics. This is also a means to classify

users’ mobility. Fig. 4 is a log-log view of the association

durations computed per week for an individual u2. We chose

u2 because she/he does show a change in her/his mobility

behavior through the observed period.

TABLE I

COMPARISON BETWEEN THE OBSERVED PERIODS. JANUARY TO FEBRUARY 2004 IS THE REFERENCE PERIOD.

| Observed period | Sept-Oct 03 | Oct-Nov 03 | Nov-Dec 03 | Dec 03-Jan 04 | Jan-Feb 04 | Feb-March 04 | March-Apr 04 |
|---|---|---|---|---|---|---|---|
| Total number of active users | -1463 | -1941 | -1748 | -1131 | 5514 | -1075 | -1565 |
| Total number of active APs | 546 | 545 | 535 | 532 | 537 | 544 | 550 |
| Users with a home location (AP basis) | 75.8% | 69.5% | 68.5% | 74% | 76.7% | 70.4% | 70.1% |
| Users with a home location (cluster basis) | 91% | 88% | 87% | 89% | 90% | 87% | 86% |

[Figure 3: two plots of the number of different visited access-points (y-axis) against Day (num) (x-axis, 0 to 238).]

(a) Number of different visited APs by the user u1 per day from 2003-09-01 to 2004-04-30.

(b) Average and standard deviation of the number of different visited APs by our group of users per day from 2003-09-01 to 2004-04-30.

Fig. 3. Periods of activity and number of visited APs

[Figure 4: log-log plot of the cumulated association duration (y-axis) against the access-points sorted by cumulated association duration (x-axis), with one series per week (week 1 to week 8).]

Fig. 4. Log-log of the cumulated association duration among the clusters of u2 per week between 2004-01-01 and 2004-02-29.

There are two important parameters in this figure which

can help classify the mobility patterns: number of visited

clusters and number of places with social meaning. The ratio

between the number of places and the total number of visited

clusters gives us information on the intensity of the mobility.

The closer to 1 the ratio is, the less mobile an individual is.

In the same way, the number of places gives us information

on the regularity of the mobility. If this number is equal to 1,

the individual is stationary.

By combining this information, we can propose four

classes of mobility:

• Stationary: 1 visited cluster. The individual is static or may

undergo micro-mobility (ping-pong effect).

• Occasional: 1 place with social meaning but a ratio lower

than 1. The individual mobility shows a prevalence

for one cluster and a number of associations in other

clusters under the average association duration.

• Regular: ratio between r and 1 (with 0 < r < 1),

and number of places with social meaning higher than 1.

The individual stays regularly associated with a small number

of places and has a limited ratio.

• Intense: ratio between 0 and r, and number of

places with social meaning higher than 1. The mobility

of this individual is complex and could be difficult to

predict.

In order to differentiate regular users from intense ones, we

have set the ratio r to 0.2. With this value, regular users have

at least 20% of places with social meanings in their list of

visited clusters.
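The four classes can be expressed as a small decision function. This is a sketch under assumptions: the function name `classify_mobility` is hypothetical, a ratio exactly equal to r is counted as regular (following the "at least 20%" wording), and a user with several clusters but no place with social meaning, a case the text does not cover, is reported as "unclassified".

```python
def classify_mobility(n_clusters, n_places, r=0.2):
    """Return the mobility class of a user.

    n_clusters: number of visited clusters.
    n_places: number of those clusters that are places with social meaning.
    r: ratio separating regular from intense users (0 < r < 1).
    """
    if n_clusters < 1 or not 0 <= n_places <= n_clusters:
        raise ValueError("expected 0 <= n_places <= n_clusters and n_clusters >= 1")
    if n_clusters == 1:
        return "stationary"
    if n_places == 0:
        return "unclassified"  # assumption: case not defined in the text
    if n_places == 1:
        return "occasional"    # one place, ratio necessarily below 1
    ratio = n_places / n_clusters
    return "regular" if ratio >= r else "intense"


examples = [(1, 1), (10, 1), (10, 3), (20, 2)]
classes = [classify_mobility(c, p) for c, p in examples]
```

For instance, a user with 10 visited clusters of which 3 are places has a ratio of 0.3 and falls in the regular class, while 2 places out of 20 clusters (ratio 0.1) is classified as intense.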

This classification is close in essence to the one proposed by

Balazinska and Castro [2] through a system of prevalence (ratio

of time staying associated with one AP). By computing the

maximum prevalence and the median prevalence, they are also

able to classify the different mobility patterns. Nevertheless,

we use here different metrics (places with social meanings

compared to the number of visited clusters).

[Figure 5: plot of the number of active wireless cards (y-axis, 0 to 3500) against Day (num) (x-axis, 0 to 238).]

Fig. 5. Number of active users per day during the observed period.

C. Home location and guest locations

In our context, we define as a possible “home location”

the cluster with the highest association duration and “guest

locations” the other clusters considered as places with social

meaning (the home location is a special case of guest location).

A home location is an AP or a group of collocated APs

where a user spends at least 50% of his/her total time in the

network [1], [2], [7]. With an algorithm based on one AP we

observe that 76% of users get a home location against 90%

with the clustering algorithm (see Table I). This result is of

the same order as the results obtained in [7] (up to 95%

in the same environment); it is important to underline that the

results shown in [7] were computed thanks to a scheme based

on the geographic coordinates of the APs. Home location

is definitely an individual-related characteristic because most

home locations do not correspond with network hotspots [7]. If

we cannot distinguish a home location for a user, it is because

of the association durations on the other guest locations or a

change of home location. A change in home location denotes

a deep modification of the user behavior while the number of

guest locations denotes the regularity of the mobility.
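The home-location test can be sketched as follows, assuming the cumulated association duration per cluster is already available. The function name `home_location` and the dictionary input are hypothetical; the 50% share comes from the definition given in the text.

```python
def home_location(cluster_durations, share=0.5):
    """Return the cluster holding at least `share` of the user's total
    association time, or None if no cluster reaches that share.

    cluster_durations: dict mapping a cluster identifier to its cumulated
    association duration (seconds).
    """
    total = sum(cluster_durations.values())
    if total <= 0:
        return None
    best = max(cluster_durations, key=cluster_durations.get)
    return best if cluster_durations[best] / total >= share else None


with_home = home_location({"c1": 700, "c2": 200, "c3": 100})
without_home = home_location({"c1": 400, "c2": 350, "c3": 250})
```

In the second example the most visited cluster holds only 40% of the total time, so no home location is reported; this corresponds to the case where the time spent at other guest locations prevents a home location from being distinguished.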

D. Cyclic mobility patterns

Users show cyclic mobility patterns. We observe that a great

majority of users is present during weekdays of school periods.

The regularity of the number of individuals associated on

weekdays and on weekends denotes this cyclic pattern (see

Fig. 5). However, the activity during school holidays presents

some great variations. This aspect can introduce complexity in

the way to manage the network. If we correlate the information

of weeks of activity of the individual u1 in Fig. 3(a) and the

number of active users in Fig. 5, we see that u1 (days 56 to

70 and 84 to 98) was not associated in the network while

the general behavior shows a constant number of associations.

Thus, it is the sum of all these different users’ activity patterns

which gives the cyclic aspect observed on a network scale.

VI. CONCLUSION AND FUTURE WORK

In this paper we proposed an individual mobility-based

clustering algorithm which uses roaming events as a metric

to evaluate the proximity of APs. Thanks to this algorithm,

which does not use any geographical information, we can

differentiate network places with social importance for each

individual and APs which are part of a path between two

destinations. We have analyzed the evolution of individual-

related mobility characteristics within an 8-month period.

We showed that the difference in activeness, the coverage of

mobility but also the existence of a home location, and the

number of guest locations are clearly individual-related. As a

consequence, defining fixed parameters among all the users

seems to be inefficient to classify all users’ behaviors.

An immediate conclusion of our study is that individual-

centric approaches should be used to build efficient

communication systems. Finally, thanks to the clustering algorithm

and the analysis of the mobility behaviors, we have been able

to classify user mobility into four categories: intense, regular,

stationary, and occasional.

Compared to other works which generally considered users

as an ensemble, this work falls under an effort to analyze the

contribution of each individual to the general dynamic of the

network. As future work, we are interested in studying the impact

of clusters on network services and how each individual could

contribute to improve network services.

REFERENCES

[1] D. Schwab and R. Bunt, “Characterising the use of a campus wireless network,” in Proc. of the 23rd Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM), Hong Kong, China, 2004, pp. 862–870.

[2] M. Balazinska and P. Castro, “Characterizing mobility and network usage in a corporate wireless local-area network,” in Proc. of MobiSys 2003, San Francisco, CA, May 2003, pp. 303–316.

[3] M. Kim and D. Kotz, “Classifying the mobility of users and the popularity of access points,” in Proceedings of the International Workshop on Location- and Context-Awareness (LoCA), ser. Lecture Notes in Computer Science, T. Strang and C. Linnhoff-Popien, Eds., vol. 3479. Germany: Springer-Verlag, May 2005, pp. 198–209.

[4] W.-J. Hsu and A. Helmy, “On modeling user associations in wireless LAN traces on university campuses,” in Proc. of the Second Workshop on Wireless Network Measurements (WiNMee 2006), Boston, MA, USA, 2006.

[5] M. Kim, D. Kotz, and S. Kim, “Extracting a mobility model from real user traces,” in Proc. of the 25th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM), Barcelona, Spain, 2006.

[6] F. Chinchilla, M. Lindsey, and M. Papadopouli, “Analysis of wireless information locality and association patterns in a campus,” in Proc. of the 23rd Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM), Hong Kong, China, 2004, pp. 906–917.

[7] T. Henderson, D. Kotz, and I. Abyzov, “The changing usage of a mature campus-wide wireless network,” in Proc. of the Tenth Annual International Conference on Mobile Computing and Networking (MobiCom), Philadelphia, PA, USA, 2004, pp. 187–201.

[8] J. Ghosh, M. J. Beal, H. Q. Ngo, and C. Qiao, “On profiling mobility and predicting locations of campus-wide wireless users,” in Proc. of the second international workshop on Multi-hop ad hoc networks: from theory to reality, Florence, Italy.

[9] CRAWDAD, “A community resource for archiving wireless data at Dartmouth.” [Online]. Available: http://crawdad.cs.dartmouth.edu

[10] D. Kotz, T. Henderson, and I. Abyzov, “CRAWDAD trace set dartmouth/campus/movement (v. 2005-03-08),” March 2005. [Online]. Available: http://crawdad.cs.dartmouth.edu/dartmouth/campus/movement/
