
Indoor Localization in Multi-Floor Environments with Reduced Effort

Hua-Yan Wang∗, Vincent W. Zheng†, Junhui Zhao‡ and Qiang Yang†

∗ Stanford University, Email: [email protected]

† Hong Kong University of Science and Technology, Email: {vincentz,qyang}@cse.ust.hk

‡ NEC Labs China, Email: [email protected]

Abstract: In pervasive computing, localizing a user in wireless indoor environments is an important yet challenging task. Among the state-of-the-art localization methods, fingerprinting has been shown to be quite successful by statistically learning the signal-to-location relations. However, a major drawback of fingerprinting is that it usually requires a lot of labeled data to train an accurate localization model. To establish a fingerprinting-based localization model in a building with many floors, we have to collect sufficient labeled data on each floor. This effort can be very burdensome. In this paper, we study how to reduce this calibration effort by collecting labeled data on only one floor, while collecting unlabeled data on the other floors. Our idea is inspired by the observation that, although the wireless signals can be quite different, the floor-plans in a building are similar. Therefore, if we co-embed these different floors' data in some common low-dimensional manifold, we are able to align the unlabeled data with the labeled data well, so that we can then propagate the labels to the unlabeled data. We conduct empirical evaluations on real-world multi-floor datasets to validate our proposed method.

Keywords-Reduced Calibration Effort, Wireless Localization, Multi-Floor Environment

I. INTRODUCTION

With the proliferation of wireless technologies, indoor localization using wireless signal strength has attracted increasing interest from both the research and industrial communities [1], [2]. Based on accurate location information, many useful services can be provided, such as object tracking, security control, etc. A number of localization methods have been proposed. Among them, fingerprinting has been shown to be quite successful. In general, given sufficient labeled data in some environment, fingerprinting methods statistically learn the relations between the received signal strengths and the locations [3], [4]. However, getting a lot of labeled data can be very expensive. This drawback poses a major difficulty for fingerprinting methods in real-world applications. For example, to support indoor localization in a building with many floors, we have to collect sufficient labeled data and train a localization model on each floor. When there are many floors in the building, it is very burdensome to collect labeled data for all of them. In this paper, we study how to reduce such effort. We show that, with our proposed method, we only have to collect labeled data on one floor. For the other floors, we only need to carry a wireless device and walk in an arbitrary way to collect some unlabeled data. After that, a sufficiently accurate localization model can be obtained for the floors without labeled data, saving a significant amount of labeling effort.

Our idea is inspired by the observation that, although the wireless signals can be quite different, the floor-plans in a building are usually similar. As an example, a floor plan for our building is shown in Figure 1. Generally, the wireless signal data are composed of the received signal strengths (RSS) from various access points (APs) in the environment. On different floors, we may detect different APs. Moreover, even for the same APs, we may receive different signal strength values on different floors. So the signal data collected on different floors can constitute different high-dimensional spaces, which can be described as high-dimensional manifolds [20]. Fortunately, although these high-dimensional data spaces vary across the floors, they are all constrained to some common low-dimensional representation, such as 2-D locations in a floor-plan. Therefore, if we can embed these different floors' data in some common low-dimensional manifold by using some correspondence-learning method, we are able to align the unlabeled data with the labeled data. We will show in Section IV-A that, by using a maximal-RSS criterion, we can identify such correspondences well. After that, we can propagate the labels from the labeled data to the unlabeled data and train a localization model for the floors with only unlabeled data. In this way, we can save a significant amount of effort in multi-floor localization.

Figure 1. An academic building at HKUST (partial)

Our contributions are summarized as follows:

• We describe a new multi-floor indoor localization problem, and analyze the traditional fingerprinting solutions on this problem.

• We provide a reduced-effort solution by using the unlabeled data. Compared to other multi-floor localization work [11], [12], [13], our method does not require data labeling on every other floor, thus greatly reducing the calibration effort. Compared to other localization methods also using unlabeled data [5], our model is capable of handling a more difficult case in which we have multiple signal datasets whose received signal strengths (i.e. data distributions) and detected access points (i.e. data feature dimensions) are different.

• We validate our proposed method in real-world multi-floor environments.

II. RELATED WORK

In recent years, there have been many studies on how to utilize the radio frequency values of wireless signals for indoor localization. In general, the proposed wireless localization methods fall into two main categories: Propagation Models and Fingerprinting Models. Propagation models can benefit from knowledge of radio propagation and of the access points' locations. By using trilateration or triangulation techniques, propagation models can compute the position of the mobile target [6], [7]. A drawback of such models is that they cannot handle signal uncertainties well. Fingerprinting models, also known as learning-based models, aim to use data mining techniques to discover the signal-location patterns [8]. Typical pattern descriptions include histograms [9], mixtures of Gaussians [10], or simply the mean value of the signal strength at different locations [1]. As fingerprinting models can take the signal uncertainties into consideration, they are usually shown to have rather satisfying localization performance. However, such models still have some drawbacks; one main drawback is that they usually require much human calibration effort to build a localization model for a given environment. When the environment changes, the learned signal-location patterns may change as well, and thus the already-built localization model may not work well anymore. Our work belongs to the category of fingerprinting models, but it goes beyond the standard fingerprinting methods by handling, with reduced effort, the environmental changes caused by multiple floors.

Some previous works also considered multiple floors in wireless localization. For example, Otsason et al. studied GSM-based indoor localization by using wide signal-strength fingerprints [11]. They tested their model on three multi-floor buildings, and showed that it can effectively differentiate between floors and achieve a satisfying accuracy for within-floor localization. Letchner et al. proposed to use a hierarchical Bayesian network for large-scale wireless localization [12]. For an indoor environment, they tested their model in a 7-floor building. Varshavsky et al. designed a GSM fingerprinting-based localization system known as SkyLoc to identify which floor a mobile phone user is on in multi-floor buildings [13]. They studied several feature-selection methods for their system, and tested them in three multi-floor buildings. However, the above works required users to collect labeled data (i.e. fingerprints) on all the floors, and are thus quite expensive in terms of human calibration. In contrast, we only need to collect labeled data on a single floor, and collect unlabeled data on all other floors, which greatly reduces the labeling cost. There is also some other work on tracking a user in a multi-floor building, such as [14], but it typically requires other specific sensors instead of using only the wireless adapters.

Besides, some works have discussed how to reduce the human labeling effort in wireless localization. For example, [15] and [16] only asked for unlabeled data to do localization; however, both of them required the access points' locations to be known in advance, which is not always possible in practice due to security control. In contrast, we do not need to know the APs' locations. [17] and [18] addressed the environmental changes with time and devices in WiFi localization with minimal effort; however, both of them required at least some labeled data in the new environments (i.e. either time or devices), while in our current model, we do not need to collect any labeled data in the new environments (i.e. floors).

III. LOCALIZATION FOR MULTI-FLOOR ENVIRONMENTS

In our proposed localization method, we only require the users to collect labeled data (i.e. fingerprints) on one floor, and collect unlabeled data on the other floors. Then, we build a localization model for the floors with unlabeled data. For clarity, we use two floors as an example to demonstrate our idea in the following sections¹. In particular, we take as input a source floor having sufficient labeled data and a target floor having only unlabeled data. After training, we assign labels to the unlabeled data on the target floor, and thus build a localization model (such as nearest neighbor [1], etc.). Finally, we test the model on some held-out (labeled) test data on the target floor. When applying the localization model in real time in a multi-floor building, when a query signal vector is sent to our localization system, we compare its detected APs with each floor's AP list (as each floor's AP list is different) and thus determine which floor the real-time signal vector may belong to. After that, we use that floor's localization model to output the predicted location.
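The floor-selection step described above can be sketched as follows. The text only says that the detected APs are compared with each floor's AP list; the Jaccard overlap score, the function name `pick_floor`, and the toy MAC strings are assumptions made for illustration.

```python
def pick_floor(detected_aps, floor_ap_lists):
    """Return the floor whose AP list overlaps most with the detected APs.

    detected_aps: iterable of MAC-address strings seen in the query vector.
    floor_ap_lists: dict mapping a floor name to an iterable of MAC addresses.
    Jaccard similarity is an assumed scoring rule, not taken from the paper.
    """
    detected = set(detected_aps)
    best_floor, best_score = None, -1.0
    for floor, aps in floor_ap_lists.items():
        aps = set(aps)
        union = detected | aps
        score = len(detected & aps) / len(union) if union else 0.0
        if score > best_score:
            best_floor, best_score = floor, score
    return best_floor


# Hypothetical AP lists for two floors.
floors = {
    "A": {"ap:01", "ap:02", "ap:03"},
    "B": {"ap:02", "ap:03", "ap:04", "ap:05"},
}
print(pick_floor({"ap:03", "ap:04", "ap:05"}, floors))
```

Once a floor is chosen, the query vector would be passed to that floor's localization model.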

Assume that, on some floor $A$, we carry a wireless device and collect a set of fingerprints from $D_A$ access points. Therefore, we have the labeled data $(X^A_{trn}, Y^A_{trn})$ on floor $A$ for training. We call $A$ the source floor. Here, $X^A_{trn}$ is an $N_1 \times D_A$ matrix with each row being a received signal strength (RSS) vector and each column corresponding to an access point. $Y^A_{trn}$ is an $N_1 \times 2$ matrix, with each row being the 2-D location where the corresponding RSS vector in $X^A_{trn}$ was collected. Similarly, on a target floor $B$, we collect some unlabeled signal data from $D_B$ access points for training. Denote these data as $X^B_{trn}$, which is an $N_2 \times D_B$ matrix. We also collect another set of test data $(X^B_{tst}, Y^B_{tst})$ for the target floor. Here, $X^B_{tst}$ is an $N_3 \times D_B$ matrix and $Y^B_{tst}$ is an $N_3 \times 2$ matrix. These test data are held out for evaluation.

¹ Our model can also naturally extend to cope with more than two floors at a time, but in our real-world experiments, we did not find that it necessarily brings any performance improvement.

In practice, the signal data for two different floors generally differ in both received signal strengths and detected access points. In other words, the columns of $X^A_{trn}$ and $X^B_{trn}$ are generally different when the floor changes, so that the labeled data $X^A_{trn}$ and their labels $Y^A_{trn}$ cannot be used directly by simply putting the two datasets together. This is the difficulty of the multi-floor localization problem. Fortunately, we find that, although the signal data can be different, the underlying floor-plans are usually similar. Furthermore, in practical scenarios, different floors in a building usually share some access points. The correspondences between these shared access points can be established by examining the MAC addresses of the access points recorded by the receiving devices. We make use of both observations to facilitate our reduced-effort localization in the multi-floor environment.

Figure 2. Multi-manifold to some latent space (e.g. 2-D location space). Panels: Signal Space for Floor A (m-dimension), Signal Space for Floor B (n-dimension), and Low-dimensional Space (2-D). Legend: signal vector, signal boundary, grid point, grid boundary. Only the signal from two access points is shown.

As shown in Figure 2, the two sets of signal vectors $X^A_{trn}$ and $X^B_{trn}$ can constitute two manifolds in two high-dimensional spaces; at the same time, they are constrained to some low-dimensional space, such as a 2-D location floor-plan in our case. For example, the signal vector $S_1$ for floor $A$ and the signal vector $S'_1$ for floor $B$, although in different signal spaces ($\{F_1, F_2\}$ vs. $\{F_1, F_3\}$), both correspond to a 2-D location $L_1$. Because we do not have any labels on floor $B$, we cannot directly align the signal $S'_1$ to location $L_1$. Similarly, we cannot align $S'_2$ to $L_2$. Fortunately, we find that the signal vectors $S_3$ and $S'_3$ have an inherent correspondence, i.e. they both have the maximal signal strength from the access point $F_1$ in their own datasets. As signal strength attenuates as the distance from the device to the access point increases, we may infer that their 2-D locations are very close, or sometimes even the same. By using such a correspondence, we can propagate the label (i.e. $L_3$) from $S_3$ to $S'_3$ (in a sophisticated way, as we will show in the following sections). Then, by using the neighborhood information, we can further propagate the needed label information to $S'_1$ and $S'_2$, finally aligning them to $L_1$ and $L_2$. The above procedure constitutes our idea to co-embed both the labeled and unlabeled signal data from the two floors into a common latent space (such as the 2-D floor-plan) for label propagation. Our algorithm flowchart is shown in Figure 3. First, we embed $X^A_{trn}$ and $X^B_{trn}$ into a common latent space, so as to infer the labels $Y^B_{trn}$ for $X^B_{trn}$. Then, we learn a localization model from $X^B_{trn}$ and $Y^B_{trn}$, which can readily be used for localization on floor $B$ (evaluated on test data $X^B_{tst}$ in our experiments).

Figure 3. A flowchart of our proposed method.

IV. LOW-COST LOCALIZATION BY CO-EMBEDDING

To make the co-embedding possible, we introduce two intra-manifold graphs and one inter-manifold graph, given the data on two floors². Recall that each floor's signal data naturally constitute one manifold in some high-dimensional space. To describe such a manifold, we build an intra-manifold graph encoding the connections (or distances) among the RSS vectors. Hence, for two floors, we can formally describe them as two intra-manifold graphs. As our aim is to co-embed these two manifolds in some common low-dimensional space for label propagation, we need to build some connections between the two manifolds. We achieve this by modeling an inter-manifold graph, which can be initialized by some correspondences between the RSS vectors on both manifolds. As discussed above, we assign such a correspondence to the two RSS vectors that have, on their respective floors, the maximal signal strength from some shared AP. After that, we further refine the inter-manifold graph by making use of the intra-manifold graphs.

Having the inter-manifold graph and the two intra-manifold graphs, we are now ready to co-embed both floors' data. There are two alternatives for doing this: (1) Co-embedding with location constraints, in which we use the known locations $Y^A_{trn}$ as constraints and co-embed the two floors' data into a 2-D space. In this case, the label information $Y^A_{trn}$ is used in both co-embedding and label propagation. (2) Co-embedding without location constraints, in which we ignore $Y^A_{trn}$ and co-embed the two floors' data into a latent space of arbitrary dimension (which can thus be higher than 2-D). In this case, the label information $Y^A_{trn}$ is only used in label propagation.

² Hence, if we want to deal with $m$ floors ($m > 2$) at a time, we will have $m$ intra-manifold graphs and $\binom{m}{2}$ inter-manifold graphs.

A. Build Graphs for Co-embedding

In order to constrain the embedding of the RSS vectors, we build graphs within each of the datasets $X^A_{trn}$ and $X^B_{trn}$ as well as between them. For the intra-dataset graphs, the distances between the RSS vectors in $X^A_{trn}$ (as well as in $X^B_{trn}$) can be readily computed (e.g. by the Euclidean distance). Thus we can simply connect each RSS vector with its $K$ nearest neighbors (which results in binary graph matrices $W^A_{N_1 \times N_1}$ and $W^B_{N_2 \times N_2}$). For the inter-dataset graph, distances between the RSS vectors across $X^A_{trn}$ and $X^B_{trn}$ cannot be directly computed due to the different feature spaces. To build the connections between the datasets, we propose to use a maximal-RSS criterion to get the correspondences for initializing the inter-dataset graph.

• Maximal-RSS Criterion: For each shared access point, we connect the two RSS vectors (from $X^A_{trn}$ and $X^B_{trn}$ respectively) with the maximum received signal strength from that access point.
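A minimal sketch of the two graph constructions above, assuming RSS values are stored so that a larger number means a stronger signal (e.g. dBm) and that each column carries the MAC address of its access point. The function names, the symmetrization of the K-nearest-neighbor graph, and the toy data are assumptions for illustration, not details given in the text.

```python
import numpy as np


def knn_graph(X, k):
    """Binary K-nearest-neighbor graph under Euclidean distance (needs k < n)."""
    n = X.shape[0]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a vector is not its own neighbor
    W = np.zeros((n, n))
    nearest = np.argsort(d, axis=1)[:, :k]
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(W, W.T)            # symmetrization is an assumption


def max_rss_graph(XA, XB, macs_a, macs_b):
    """Initial inter-dataset graph: one edge per shared access point."""
    col_b = {mac: j for j, mac in enumerate(macs_b)}
    W_ab = np.zeros((XA.shape[0], XB.shape[0]))
    for ja, mac in enumerate(macs_a):
        if mac in col_b:
            i = np.argmax(XA[:, ja])          # strongest reading on floor A
            j = np.argmax(XB[:, col_b[mac]])  # strongest reading on floor B
            W_ab[i, j] = 1.0
    return W_ab


# Toy data: floor A sees APs f1, f2; floor B sees f1, f3 (only f1 is shared).
XA = np.array([[-40.0, -90.0], [-70.0, -60.0], [-90.0, -45.0]])
XB = np.array([[-85.0, -50.0], [-42.0, -95.0]])
W_a = knn_graph(XA, k=1)
W_ab = max_rss_graph(XA, XB, ["f1", "f2"], ["f1", "f3"])
print(W_ab)
```

With a single shared AP the inter-dataset graph has a single edge, which illustrates why the initial graph tends to be sparse.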

The resulting graph (with binary graph matrix $W^{AB}_{N_1 \times N_2}$) is generally too sparse to effectively relate the embeddings of the two datasets, because the number of shared access points is usually much smaller than the number of RSS vectors in a densely sampled dataset. We therefore use the following iterative rule to enhance the inter-dataset graph by incorporating the intra-dataset graphs:

$$W^{AB} \leftarrow W^A W^{AB} W^B. \tag{1}$$

Each such update essentially propagates and enhances the inter-dataset connections according to the intra-dataset connections. In practice, (1) is executed a sufficient number of times to allow $W^{AB}$ to effectively relate the two datasets.
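The update rule (1) can be sketched as a short loop. The number of iterations `n_iter` and the rescaling by the largest entry after each step are assumptions added here to keep the values bounded; the text only states that the rule is applied a sufficient number of times.

```python
import numpy as np


def enhance_inter_graph(W_a, W_ab, W_b, n_iter=3):
    """Apply W_ab <- W_a @ W_ab @ W_b for n_iter rounds (update rule (1))."""
    W = np.asarray(W_ab, dtype=float).copy()
    for _ in range(n_iter):
        W = W_a @ W @ W_b
        peak = W.max()
        if peak > 0:
            W = W / peak   # assumed rescaling; keeps entries in [0, 1]
    return W


# Toy graphs: floor A is a 3-node path, floor B is a 2-node path.
W_a = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
W_b = np.array([[0.0, 1.0], [1.0, 0.0]])
W_ab = np.zeros((3, 2))
W_ab[0, 1] = 1.0                       # single maximal-RSS correspondence
W_two = enhance_inter_graph(W_a, W_ab, W_b, n_iter=2)
print(np.count_nonzero(W_ab), np.count_nonzero(W_two))
```

In this toy case the intra-dataset graphs have no self-loops, so each update moves connections to neighboring vectors rather than keeping the original ones; whether self-loops should be added is not specified in the text.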

Let $\{p^{(k)}\}_{k=1}^{K} \subset \mathbb{R}^{N_1+N_2}$ denote the embedding coordinates of the source and target training sets (in a $K$-dimensional latent space). For $i = 1, \cdots, N_1$, $p^{(k)}(i)$ is the $k$-th coordinate of the $i$-th RSS vector in $X^A_{trn}$, and for $j = 1, \cdots, N_2$, $p^{(k)}(N_1 + j)$ is the $k$-th coordinate of the $j$-th RSS vector in $X^B_{trn}$. The co-embedding should be constrained by the following regularization term in order to be smooth with respect to the inter- and intra-dataset graphs:

$$\sum_{i=1}^{N_1} \sum_{j=1}^{N_2} \left(p^{(k)}(i) - p^{(k)}(N_1+j)\right)^2 W^{AB}(i,j) = p^{(k)\top} L^{AB} p^{(k)}, \tag{2}$$

where $L^{AB}$ is a graph Laplacian matrix, defined as

$$L^{AB}_{(N_1+N_2)\times(N_1+N_2)} = \begin{pmatrix} D^{AB}_{N_1 \times N_1} & -W^{AB}_{N_1 \times N_2} \\ -(W^{AB})^{\top}_{N_2 \times N_1} & D^{BA}_{N_2 \times N_2} \end{pmatrix}. \tag{3}$$

Here, $D^{AB}_{N_1 \times N_1}$ is a diagonal matrix with diagonal elements $\{\sum_j W^{AB}(i,j)\}_{i=1}^{N_1}$, and $D^{BA}_{N_2 \times N_2}$ is a diagonal matrix with diagonal elements $\{\sum_i W^{AB}(i,j)\}_{j=1}^{N_2}$.

In co-embedding, the inter-dataset graph is combined with the intra-dataset graphs with some relative weight $\mu$. Finally, the composite graph Laplacian is

$$L = \begin{pmatrix} L^A & 0 \\ 0 & L^B \end{pmatrix} + \mu L^{AB}_{(N_1+N_2)\times(N_1+N_2)} \tag{4}$$

where $L^A$ and $L^B$ are the graph Laplacians of the intra-dataset graphs, defined similarly to (3). So the constraint imposed by the graphs on the co-embedding can be summarized in one term $p^{(k)\top} L p^{(k)}$ for each dimension $k \in \{1, \cdots, K\}$.
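The Laplacians in (3) and (4) can be assembled directly from the graph matrices. The sketch below assumes the intra-dataset Laplacians take the usual degree-minus-adjacency form (the text only says they are defined similarly to (3)); function names are illustrative. The final lines check identity (2) numerically on random data.

```python
import numpy as np


def inter_laplacian(W_ab):
    """Block Laplacian of equation (3) built from the N1 x N2 matrix W_ab."""
    D_ab = np.diag(W_ab.sum(axis=1))   # row sums, N1 x N1
    D_ba = np.diag(W_ab.sum(axis=0))   # column sums, N2 x N2
    return np.block([[D_ab, -W_ab], [-W_ab.T, D_ba]])


def intra_laplacian(W):
    """Assumed form L = D - W for a symmetric intra-dataset graph."""
    return np.diag(W.sum(axis=1)) - W


def composite_laplacian(W_a, W_b, W_ab, mu):
    """Composite Laplacian of equation (4)."""
    n1, n2 = W_ab.shape
    L_intra = np.block([
        [intra_laplacian(W_a), np.zeros((n1, n2))],
        [np.zeros((n2, n1)), intra_laplacian(W_b)],
    ])
    return L_intra + mu * inter_laplacian(W_ab)


# Numerical check of identity (2) on random data.
rng = np.random.default_rng(0)
W_ab = rng.random((3, 2))
p = rng.standard_normal(5)
lhs = sum((p[i] - p[3 + j]) ** 2 * W_ab[i, j]
          for i in range(3) for j in range(2))
rhs = p @ inter_laplacian(W_ab) @ p
print(np.isclose(lhs, rhs))
```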

B. Co-embedding with Location Constraints

Given the 2-D location labels $Y^A_{trn}$, we can use the 2-D floor-plan as the common low-dimensional embedding space, and the image of each RSS vector in $X^A_{trn}$ is constrained by its 2-D location [19] (as given in $Y^A_{trn}$). Specifically, the optimal embedding should satisfy:

$$p^{(k)} = \arg\min_{p^{(k)}} \left[ (p^{(k)} - y_P^{(k)})^{\top} J_P (p^{(k)} - y_P^{(k)}) + \gamma\, p^{(k)\top} L p^{(k)} \right], \quad k = 1, 2 \tag{5}$$

where

$$J_P = \begin{pmatrix} I_{N_1 \times N_1} & 0 \\ 0 & 0 \end{pmatrix}_{(N_1+N_2)\times(N_1+N_2)}, \tag{6}$$

and for $k = 1, 2$, we define $y_P^{(k)} \in \mathbb{R}^{N_1+N_2}$, whose first $N_1$ elements are the known coordinates given in $Y^A_{trn}$ and whose next $N_2$ elements can be arbitrary values. The user-specified parameter $\gamma$ controls the relative strength of the graph regularization and the known location constraints. We will study its impact in the experiments.

It turns out that (5) can be solved in closed form:

$$p^{(k)} = (J_P + \gamma L)^{-1} J_P\, y_P^{(k)}, \quad k = 1, 2 \tag{7}$$

Hence, the signal data from floors $A$ and $B$ are embedded in a common low-dimensional space. Then we can propagate the labels by assigning each RSS vector in $X^B_{trn}$ the location label of its nearest neighbor from $X^A_{trn}$ in the low-dimensional space.
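The closed form (7) follows from setting the gradient of (5) to zero, which gives $(J_P + \gamma L)\, p^{(k)} = J_P\, y_P^{(k)}$. A minimal sketch of this solve and of the nearest-neighbor label propagation is given below; it assumes $J_P + \gamma L$ is invertible (for instance, when every connected component of the composite graph contains at least one labeled vector), and the function names and toy graph are illustrative.

```python
import numpy as np


def coembed_with_locations(L, Y_a, n2, gamma):
    """Solve (J_P + gamma * L) P = J_P Y for both coordinates at once."""
    n1 = Y_a.shape[0]
    J = np.zeros((n1 + n2, n1 + n2))
    J[:n1, :n1] = np.eye(n1)
    # The last n2 rows of Y are arbitrary: J zeroes them out.
    Y = np.vstack([Y_a, np.zeros((n2, Y_a.shape[1]))])
    return np.linalg.solve(J + gamma * L, J @ Y)


def propagate_labels(P, Y_a):
    """Give each floor-B point the label of its nearest floor-A point."""
    n1 = Y_a.shape[0]
    P_a, P_b = P[:n1], P[n1:]
    d = np.linalg.norm(P_b[:, None, :] - P_a[None, :, :], axis=2)
    return Y_a[np.argmin(d, axis=1)]


# Toy composite graph: nodes 0 and 1 are on floor A, node 2 is on floor B
# and is connected only to node 1.
W = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
L = np.diag(W.sum(axis=1)) - W
Y_a = np.array([[0.0, 0.0], [10.0, 0.0]])
P = coembed_with_locations(L, Y_a, n2=1, gamma=0.1)
Y_b = propagate_labels(P, Y_a)
print(Y_b)
```

Solving the linear system with `np.linalg.solve` avoids forming the explicit inverse that appears in (7).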

C. Co-embedding without Location Constraints

One possible concern about co-embedding with location constraints is that it may unnecessarily restrict the embedding to a 2-D space, and it is unclear whether embeddings in higher-dimensional spaces could better recover the relations between the RSS vectors. So we also study the case of co-embedding without location constraints. Similar to the co-embedding method introduced in the last section, we now leave aside the locations $Y^A_{trn}$, and the objective function becomes:

$$p^{(k)} = \arg\min_{p^{(k)}} \; p^{(k)\top} L p^{(k)}. \tag{8}$$

In order to prevent this objective function value from becoming arbitrarily small, we impose a scale constraint on the embedding:

$$p^{(k)\top} p^{(k)} = 1. \tag{9}$$

Fortunately, this optimization problem is well known to be solved by the eigenvectors of $L$ with the smallest eigenvalues (excluding the zero eigenvalue) [20]. In other words, if we want to find a $K$-dimensional embedding, we simply take the second to the $(K + 1)$-th eigenvectors of $L$, ordered by increasing eigenvalue. As in the last section, we can assign each RSS vector in $X^B_{trn}$ the location label of its nearest neighbor from $X^A_{trn}$ in the low-dimensional space.
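The eigenvector-based embedding can be sketched with a symmetric eigensolver. The sketch assumes the composite graph is connected, so that $L$ has exactly one zero eigenvalue; the function name and the toy path graph are illustrative.

```python
import numpy as np


def coembed_without_locations(L, K):
    """Return the 2nd to (K+1)-th eigenvectors of the symmetric matrix L."""
    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1:K + 1]


# Toy example: Laplacian of a 4-node path graph, embedded in 2 dimensions.
W = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
L = np.diag(W.sum(axis=1)) - W
P = coembed_without_locations(L, K=2)
print(P.shape)
```

Each column of `P` has unit norm, matching the scale constraint (9), and the first column orders the path's nodes monotonically, so neighbors in the graph stay close in the embedding.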

D. Algorithm and Complexity Analysis

Here we summarize our multi-floor localization algorithm:

Offline Phase: Densely sampled labeled data are collected on one floor (the source floor).

Online Calibration and Learning: When we move to another floor (the target floor) in the same building with an almost identical layout, we carry a wireless device and walk around in an arbitrary manner to collect unlabeled data. Then we co-embed the labeled data collected in the offline phase and the unlabeled data collected in the online calibration phase. There are two ways of co-embedding: with location constraints as in Equation (5), or without location constraints as in Equation (8). After co-embedding, we can propagate the labels to the unlabeled data by finding the nearest neighbors in the common low-dimensional space. Finally, we have labeled data for the target floor.

Online Localization: We learn a localization model with the newly obtained labeled data for the target floor. Various techniques can be used to learn such a localization model. For example, we can use a nearest neighbor method: given an RSS vector from the target floor for testing, we find its nearest neighbor in the obtained labeled dataset for the target floor, and assign the corresponding location label to it.

Most of the computation cost of the procedure lies in the matrix inversion (for co-embedding with location constraints) or the eigen-decomposition (for co-embedding without location constraints), both of which are generally on the order of $O(n^3)$, where $n$ is the number of RSS vectors in our situation.

V. EXPERIMENTS

A. Datasets

To validate our approaches, we collected three sets of wireless signal data in an academic building at our university³, where each dataset was collected on one floor with a size of around 60 m × 40 m, as shown in Figure 1. Each floor is discretized into 118 grids of 1.5 m × 1.5 m on the hallways. The dataset descriptions can be found in the following table. We note that dataset 2 shares 91 APs with dataset 1; dataset 3 shares 51 APs with dataset 1 and 55 APs with dataset 2.

             # OF DIFFERENT APS    # SAMPLES
DATASET 1    118                   2080
DATASET 2    120                   1864
DATASET 3    114                   2513

Among these datasets, we can take any dataset as the source floor and another one as the target floor, thus simulating 6 calibration tasks. The target dataset is randomly divided into two halves, where one half is used as unlabeled training data and the other half is used as held-out test data for evaluating the localization accuracy. The following results are all averaged over 10 trials.

B. Impact of the Maximal-RSS criterion

In order to relate the source and target datasets, we exploit the assumption (i.e. the Maximal-RSS criterion) that, for a given access point, the maximal RSS values are received at close locations in different datasets. In this way, we can connect the data vectors with maximal RSS values from the two datasets for each shared AP. To see how such a criterion performs on real-world datasets, we plot the learned inter-dataset graphs (denoted as $W^{AB}$) obtained by using Eq. (1). As shown in Figure 4, the underlying (nearly) rectangular shapes are the experimental areas where we collected data (on different floors, though). Each blue-line circle denotes a connection between two RSS vectors, one from each dataset, at some location. The smaller a circle is, the closer are the (2-D) locations where the two connected RSS vectors were collected. As we can see, the graph between dataset 1 and dataset 2 has more connections/circles, which shows that the learned inter-manifold graph is better. This coincides with the fact that these two datasets share more APs and thus may have higher performance in co-embedding.

C. Impact of the model parameters

There are some parameters in our approach that we need to set manually, including the weight of the inter-dataset graph versus the intra-dataset graphs (denoted as $\mu$), the number of nearest neighbors used in building the intra-dataset graphs (denoted as # NN), and the weight of the graph constraints versus the location constraints (denoted as $\gamma$; only in location-constrained embedding). In all our experiments, we measure the performance by the percentage of correct predictions (i.e. accuracy) in the test set within some error distance (here, we set it to 5 meters; we study different error distances in Section V-D). We use dataset 1 as source and dataset 2 as target for illustration in Figures 6, 7 and 8; we get similar results with other pairs of datasets, but due to the space limit we do not report them.

³ We could not access the other multi-floor datasets used in the previous works [11], [12], [13] after several requests.

Figure 4. The qualities of the resulting inter-manifold graphs. The x-axis and y-axis denote the 2-D location coordinates (unit: meter). A circle is placed at some location where there is an inter-dataset connection for two RSS vectors, and the size of the circle indicates the connection quality. Panels: DATA 1 & DATA 2, DATA 2 & DATA 3, DATA 1 & DATA 3.

Table I
RESULTS FOR USING DIFFERENT PAIRS OF FLOORS. WE REPORT THE (ACCURACY ± STANDARD DEVIATION).

# SHARED APS    S TO T    WITH CONS.    WITHOUT CONS.    BASELINE-1 [1]    BASELINE-2 [5]
91              1 TO 2    65 ± 3%       69 ± 2%          11 ± 1%           7 ± 1%
91              2 TO 1    64 ± 3%       66 ± 3%          6 ± 2%            7 ± 1%
51              1 TO 3    55 ± 5%       57 ± 4%          8 ± 4%            8 ± 1%
51              3 TO 1    51 ± 3%       55 ± 5%          9 ± 3%            8 ± 1%
55              2 TO 3    53 ± 4%       52 ± 4%          3 ± 1%            9 ± 1%
55              3 TO 2    56 ± 5%       54 ± 3%          9 ± 1%            8 ± 1%

In Figure 6, we generally observe simple unimodal behaviors for the parameters $\mu$, # NN and $\gamma$, except that if we set # NN too small or too large (beyond the scales in the figure), the method sometimes "crashes" and we get degenerate embeddings. For clearer illustration, we further plot the behavior of the co-embeddings with increasing $\mu$ in Figure 7 and increasing $\gamma$ in Figure 8, respectively. As we can see from both figures, as $\mu$ and $\gamma$ approach the peaks of their unimodal curves (e.g. $\mu = 0.1$ and $\gamma = 10$), the co-embeddings of the two floors' data align better.

D. Overall results

We tested our two methods, "co-embedding with location constraints" and "co-embedding without location constraints", on the 6 localization tasks generated from the 3 datasets (each task uses 2 datasets). The localization accuracies are shown in Table I, all under a 5-meter error distance. We also compare our methods with 2 baselines. "Baseline-1" refers to the RADAR algorithm [1], which is a k-nearest-neighbor method using only labeled data for localization. As it cannot handle data with different feature dimensions (e.g. two floors' data composed of signals from different sets of APs), we only used the shared APs as the features in RADAR. In particular, we use the labeled data from the source floor $(X^A_{trn}, Y^A_{trn})$ as training data, and train a $K$-NN classifier (where $K$ is set to 1 to get the highest accuracies) for the unlabeled test data from the target floor $X^B_{tst}$. "Baseline-2" refers to the LeMan algorithm [5], which is a manifold learning method that is also able to use unlabeled data. However, LeMan requires the labeled data and the unlabeled data to follow the same data distribution and have the same feature dimensions, so that the data can be smoothly distributed in one manifold for label propagation. Because different floors' data in a multi-floor environment can have different feature dimensions and distributions, LeMan may not work as well as our method. To use LeMan for comparison, we again only use the shared APs as the data features. Specifically, we take the source floor's labeled training data $(X^A_{trn}, Y^A_{trn})$ and the target floor's unlabeled training data $X^B_{trn}$ as inputs for training a localization model. We then use the model to predict the labels for the target floor's test data $X^B_{tst}$.
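The shared-AP restriction used for baseline-1 can be sketched as follows: columns are matched by MAC address, both datasets are reduced to the shared columns, and a 1-nearest-neighbor rule on the source floor's labeled data predicts locations for the target floor. The function names and toy data are assumptions for illustration; this is not the RADAR implementation itself.

```python
import numpy as np


def shared_ap_columns(macs_a, macs_b):
    """Column indices (in A and in B) of the APs seen on both floors."""
    col_b = {mac: j for j, mac in enumerate(macs_b)}
    pairs = [(i, col_b[mac]) for i, mac in enumerate(macs_a) if mac in col_b]
    return [i for i, _ in pairs], [j for _, j in pairs]


def baseline_shared_ap_nn(XA, YA, macs_a, XB_test, macs_b):
    """1-NN prediction for floor-B queries using floor-A labels, shared APs only."""
    idx_a, idx_b = shared_ap_columns(macs_a, macs_b)
    A = XA[:, idx_a]
    B = XB_test[:, idx_b]
    d = np.linalg.norm(B[:, None, :] - A[None, :, :], axis=2)
    return YA[np.argmin(d, axis=1)]


# Toy data: only AP "f1" is shared between the two floors.
XA = np.array([[-40.0, -90.0], [-80.0, -50.0]])
YA = np.array([[0.0, 0.0], [20.0, 5.0]])
XB_test = np.array([[-60.0, -78.0]])
pred = baseline_shared_ap_nn(XA, YA, ["f1", "f2"], XB_test, ["f3", "f1"])
print(pred)
```

Because the signal strength of a shared AP can differ considerably between floors, distances computed this way may be misleading, which is consistent with the low baseline accuracies reported in Table I.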

As we can see from Table I, our methods greatly outperform both baselines, because we use co-embedding to carefully align the two floors' data, whereas the baseline methods suffer considerably from signal variations across floors. Moreover, we also find that, for the localization tasks between datasets 1 and 2, which share more access points, the performance is better, validating our motivation of co-embedding with shared APs.

We also compared the performances of our co-embedding methods using 2 datasets and using 3 datasets. The co-embeddings with 2 datasets follow the same settings as in Table I, while the co-embeddings with 3 datasets work as follows: for the task $i \to j$ (e.g. $i = 1$, $j = 2$), we also use the unlabeled data from another dataset $k$ (e.g. $k = 3$) for co-embedding. As shown in Table II, co-embedding with 2 datasets consistently outperforms co-embedding with 3 datasets. This may be because the floors' data are quite different, and adding one more unlabeled dataset to the co-embedding may interfere with the label propagation between the labeled dataset and the target unlabeled dataset. Moreover, in our experience, the model using 3 datasets tends to be more sensitive to the parameters and more likely to "crash". We will study how to better incorporate extra unlabeled datasets in the future. Note that the results in Table II are based on co-embedding without location constraints, and we obtained similar results for co-embedding with location constraints.

Table II
COMPARISON FOR CO-EMBEDDING WITH 2 DATASETS VS. 3 DATASETS, USING CO-EMBEDDING WITHOUT LOCATION CONSTRAINTS.

S TO T    CO-EMBED W/ 2 SETS    CO-EMBED W/ 3 SETS
1 TO 2    69 ± 2%               59 ± 5%
2 TO 1    66 ± 3%               58 ± 3%
1 TO 3    57 ± 4%               50 ± 5%
3 TO 1    55 ± 5%               49 ± 7%
2 TO 3    52 ± 4%               44 ± 7%
3 TO 2    54 ± 3%               44 ± 4%

We further studied our model's performance (in the form of cumulative probabilities) under varying error distances in Figure 5. Due to the space limit, we do not plot all 12 sets of results for the combinations of different dataset pairs and co-embedding strategies (with or without constraints). Instead, we only plot a set of results for a typical run of the "co-embedding without constraints" method using dataset 1 as source and dataset 2 as target. We observe similar patterns for the remaining 11 sets of results under varying error distances. Note that, to get the results in Table I and Figure 5, we simply use the nearest neighbor method after using co-embedding to propagate labels to the target floor's unlabeled training data. We may expect that, by using more sophisticated methods such as [3], [4], [8], we can further improve the localization accuracy. We also plot the corresponding co-embeddings in Figures 9, 10 and 11. The color in these figures denotes the location of each RSS vector, so it is more desirable that points with similar colors are embedded close together. As we can see from these figures, in general, when the different data manifolds are better aligned by co-embedding, the localization results are better (w.r.t. Table I).
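The accuracy measure used in Table I and Figure 5 (the fraction of test predictions whose error is within a given distance) can be sketched as below. Treating "within" as less than or equal to the threshold is an assumption, and the function name and toy coordinates are illustrative.

```python
import numpy as np


def accuracy_within(Y_pred, Y_true, thresholds):
    """Fraction of predictions whose 2-D error is at most each threshold."""
    err = np.linalg.norm(Y_pred - Y_true, axis=1)   # Euclidean error in meters
    return np.array([(err <= t).mean() for t in thresholds])


# Toy predictions with errors of 1, 4, 6 and 12 meters.
Y_true = np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 0.0], [30.0, 0.0]])
Y_pred = np.array([[1.0, 0.0], [10.0, 4.0], [20.0, 6.0], [42.0, 0.0]])
acc = accuracy_within(Y_pred, Y_true, thresholds=[1, 5, 10])
print(acc)
```

Evaluating the function over thresholds from 1 to 10 meters would give a cumulative curve of the kind described for Figure 5.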

VI. CONCLUSIONS AND FUTURE WORK

Figure 5. Accuracies under varying error distances, where the x-axis denotes the error distance threshold (unit: meter, from 1 to 10) and the y-axis denotes localization accuracy (percentage, from 0.1 to 1).

In this paper, we put forward a multi-floor indoor localization problem and a solution that can effectively estimate a user's location from the received signal strengths with much reduced effort. Our method is motivated by the everyday observation that the floor plans in a building are usually almost identical and that, although the signals can be quite different on different floors, there can be some correspondences between them. To make use of these observations, we proposed to co-embed different floors' data in a common low-dimensional space, where we can align the unlabeled data with the labeled data and then propagate the labels. In this way, we only need to collect labeled data on one floor and unlabeled data on the other floors.
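The co-embedding idea summarized above can be illustrated with a minimal sketch. It is not the exact formulation used in our experiments: it assumes unweighted k-nearest-neighbor graphs within each floor, a hypothetical list `pairs` of cross-floor correspondences weighted by `mu`, and an unnormalized graph Laplacian solved as a standard eigenproblem, which simplifies the Laplacian eigenmaps approach of [20]. Location constraints are omitted, and all function names and toy values are illustrative.

```python
import numpy as np

def knn_graph(x, k):
    """Symmetric 0/1 k-nearest-neighbor adjacency matrix for the rows of x."""
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]
    w = np.zeros_like(d2)
    w[np.repeat(np.arange(len(x)), k), idx.ravel()] = 1.0
    return np.maximum(w, w.T)

def co_embed(src, tgt, pairs, k=3, mu=0.1, dim=2):
    """Embed source and target RSS vectors in one low-dimensional space."""
    ns, nt = len(src), len(tgt)
    w = np.zeros((ns + nt, ns + nt))
    w[:ns, :ns] = knn_graph(src, k)  # intra-floor graph, source
    w[ns:, ns:] = knn_graph(tgt, k)  # intra-floor graph, target
    for i, j in pairs:               # inter-floor edges (assumed given)
        w[i, ns + j] = w[ns + j, i] = mu
    lap = np.diag(w.sum(axis=1)) - w
    _, vecs = np.linalg.eigh(lap)    # eigenvalues in ascending order
    # Skip the first eigenvector, which is constant when the joint graph
    # is connected.
    emb = vecs[:, 1:dim + 1]
    return emb[:ns], emb[ns:]

def propagate_labels(src_emb, src_loc, tgt_emb):
    """Give each target point the location of its nearest embedded source point."""
    d2 = np.sum((tgt_emb[:, None, :] - src_emb[None, :, :]) ** 2, axis=2)
    return src_loc[np.argmin(d2, axis=1)]

# Toy data (made-up): six points along a hallway on each of two floors.
t = np.arange(6, dtype=float)
src_rss = np.column_stack([-40 - 5 * t, -80 + 5 * t])
src_loc = np.column_stack([4 * t, np.zeros(6)])
tgt_rss = np.column_stack([-50 - 4 * t, -85 + 6 * t])
pairs = [(0, 0), (2, 2), (5, 5)]

src_emb, tgt_emb = co_embed(src_rss, tgt_rss, pairs, k=2, mu=1.0, dim=2)
tgt_loc_est = propagate_labels(src_emb, src_loc, tgt_emb)
print(tgt_loc_est)
```

In this sketch, `mu` plays the role of µ in Figure 7: a larger value strengthens the inter-manifold edges and pulls corresponding source and target points closer together in the embedding.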

We believe that our general methodology deserves further investigation, whereby graph embedding methods can be used to help reduce calibration effort for more general types of sensor network data. In the future, we will study how to couple more sophisticated learning methods with our co-embedding method to further improve the localization performance. Also, although different floors in the same building have similar floor plans, some floors follow more complicated structures. We are interested in extending our method to cope with such complex situations. We also expect to test and improve our methods on more datasets in larger-scale environments.

ACKNOWLEDGMENT

We would like to thank NEC Labs China for their support, and the anonymous reviewers for their insightful comments.

REFERENCES

[1] P. Bahl and V.N. Padmanabhan, RADAR: An In-Building RF-Based User Location and Tracking System, In Proc. of the Conference on Computer Communications, pp. 775-784, 2000.

[2] L.M. Ni and Y. Liu and Y.C. Lau and A.P. Patil, LANDMARC: Indoor Location Sensing Using Active RFID, In Proc. of the 1st IEEE International Conference on Pervasive Computing and Communications, 2003.

[3] A. Haeberlen and E. Flannery and A.M. Ladd and A. Rudys and D.S. Wallach and L.E. Kavraki, Practical Robust Localization over Large-Scale 802.11 Wireless Networks, In the 8th Annual International Conference on Mobile Computing and Networking, 2002.

[4] M. Youssef and A. Agrawala, The Horus WLAN location determination system, In Proc. of the 3rd international conference on Mobile systems, applications, and services, pp. 205-218, 2005.

[Figure 6 panels: accuracy (y-axis "Percentage", 0.5 to 0.7) versus µ (left, 0.04 to 0.18), versus # Nearest Neighbors (middle, 25 to 50), and versus γ (right, 0.01 to 50).]

Figure 6. The impact of the parameters. In the left and middle figures we use the "embedding without location constraints" method (similar results are obtained for the with-constraints setting, although not reported here due to the space limit). In the right figure we use the "embedding with location constraints" method, with µ fixed at 0.1 and # NN fixed at 35. Note that the γ axis is not in uniform scale.

[Figure 7 panels: co-embeddings for µ = 0.001, µ = 0.01, µ = 0.1 and µ = 1.]

Figure 7. The impact of µ on co-embedding. We use the "co-embedding with location constraints" method (similar results are obtained for the without-constraints setting, although not reported here due to the space limit). We can see that with a larger µ, the attraction between the source and target training data gets stronger, as the inter-manifold graph becomes stronger.

[Figure 8 panels: co-embeddings for γ = 0.01, γ = 0.1, γ = 1 and γ = 10.]

Figure 8. The impact of γ on co-embedding. We use the "co-embedding with location constraints" method. We can see that with a small γ (left plot), the relative weight of the location constraints is large, hence the source training data are embedded almost "right at" their ground truth locations. With increasing γ, the location constraints gradually fade out. Note that in all these 4 figures, there are some (overlapping, in warm color) data points located at location (0,0). They are actually the target training data from the right-wing hallway (warm color), and are embedded in a wrong location because there is no Maximal-RSS correspondence found in the right-wing hallway between datasets 1 and 2, as shown in Figure 4's left plot.

[5] J.J. Pan and Q. Yang and H. Chang and Dit-Yan Yeung, A Manifold Regularization Approach to Calibration Reduction for Sensor-Network Based Tracking, In Proc. of the 21st National Conference on Artificial Intelligence, 2006.

[6] N. Priyantha and A. Chakraborty and H. Balakrishnan, The Cricket Location-Support system, In Proc. of the 6th Annual ACM International Conference on Mobile Computing and Networking, 2000.

[7] A. Savvides and C. Han and M.B. Srivastava, Dynamic fine-grained localization in Ad-Hoc networks of sensors, In Proc. of the 7th International Conference on Mobile Computing and Networking, 2001.

[8] B. Ferris and D. Haehnel and D. Fox, Gaussian Processes for Signal Strength-Based Location Estimation, In Proc. of Robotics: Science and Systems, 2006.

[9] M. Youssef and A. Agrawala and U. Shankar, WLAN Location Determination via Clustering and Probability Distributions, In Proc. of the 1st IEEE International Conference on Pervasive Computing and Communications, pp. 143-150, 2003.

[10] T. Roos and P. Myllymaki and H. Tirri and P. Misikangas and J. Sievanen, A Probabilistic Approach to WLAN User Location Estimation, International Journal of Wireless Information Networks, vol. 9, no. 3, pp. 155-164, 2002.

[11] V. Otsason and A. Varshavsky and A. LaMarca and Eyal de Lara, Accurate GSM Indoor Localization, In Proc. of the 7th International Conference on Ubiquitous Computing, pp. 141-158, 2005.

[Figure 9 panels; recoverable titles: "DATA 1 -> DATA 2 w/o constraints", "DATA 1 -> DATA 2 w/ constraints" and "DATA 2 -> DATA 1 w/ constraints".]

Figure 9. The co-embedding results for localization task between datasets 1 and 2.

[Co-embedding plots; recoverable panel titles: "DATA 1 -> DATA 3 w/ constraints" and "DATA 2 -> DATA 3 w/ constraints".]

[12] J. Letchner and D. Fox and A. LaMarca, Large-Scale Localization from Wireless Signal Strength, In Proc. of the 20th National Conference on Artificial Intelligence, pp. 15-20, 2005.

[13] A. Varshavsky and A. LaMarca and J. Hightower and Eyal de Lara, The SkyLoc Floor Localization System, In Proc. of the 5th IEEE International Conference on Pervasive Computing and Communications, pp. 125-134, 2007.

[14] O. Woodman and R. Harle, Pedestrian localisation for indoor environments, In Proc. of the 10th international conference on Ubiquitous computing, pp. 114-123, 2008.

[15] D. Madigan, E. Elnahrawy, R.P. Martin, W.-H. Ju, P. Krishnan, A.S. Krishnakumar, Bayesian indoor positioning systems, In Proc. of the 24th Annual Joint Conference of the IEEE Computer and Communications Societies, pp. 1217-1227, 2005.

[16] H. Lim, L.-C. Kung, J.C. Hou, H. Luo, Zero-Configuration, Robust Indoor Localization: Theory and Experimentation, In Proc. of the 25th IEEE International Conference on Computer Communications, pp. 1-12, 2006.

[17] V.W. Zheng, S.J. Pan, Q. Yang, J.J. Pan, Transferring Multi-device Localization Models using Latent Multi-task Learning, In Proc. of the 23rd AAAI Conference on Artificial Intelligence, pp. 1427-1432, 2008.

[18] V.W. Zheng, E.W. Xiang, Q. Yang, D. Shen, Transferring Localization Models Over Time, In Proc. of the 23rd AAAI Conference on Artificial Intelligence, pp. 1421-1426, 2008.

[19] J. Hamm and D. Lee and L.K. Saul, Semisupervised alignment of manifolds, In Proc. of the 10th International Workshop on Artificial Intelligence and Statistics, pp. 120-127, 2005.

[20] M. Belkin and P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation, vol. 15, 2003.
