
1536-1233 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TMC.2015.2478451, IEEE Transactions on Mobile Computing

IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. X, NO. X, XXXX 201X 1

SemanticSLAM: Using Environment Landmarks for Unsupervised Indoor Localization

Heba Abdelnasser, Student Member, IEEE, Reham Mohamed, Student Member, IEEE, Ahmed Elgohary, Student Member, IEEE, Moustafa Farid, Student Member, IEEE, He Wang, Student Member, IEEE, Souvik Sen, Member, IEEE, Romit Roy Choudhury, Senior Member, IEEE, and Moustafa Youssef, Senior Member, IEEE

Abstract—Indoor localization using mobile sensors has gained momentum lately. Most of the current systems rely on an extensive calibration step to achieve high accuracy. We propose SemanticSLAM, a novel unsupervised indoor localization scheme that bypasses the need for war-driving. SemanticSLAM leverages the idea that certain locations in an indoor environment have a unique signature on one or more phone sensors. Climbing stairs, for example, has a distinct pattern on the phone's accelerometer; a specific spot may experience an unusual magnetic interference, while another may have a unique set of WiFi access points covering it. SemanticSLAM uses these unique points in the environment as landmarks and combines them with dead-reckoning in a new Simultaneous Localization And Mapping (SLAM) framework to reduce both the localization error and the convergence time. In particular, the phone inertial sensors are used to keep track of the user's path, while the observed landmarks are used to compensate for the accumulation of error in a unified probabilistic framework. Evaluation in two testbeds on Android phones shows that the system can achieve a median human localization error of 0.53 meters. In addition, the system can detect the location of landmarks with a median error of 0.83 meters. This is 62% better than a system that does not use SLAM. Moreover, SemanticSLAM has a 33% lower convergence time compared to the same system. This highlights the promise of SemanticSLAM as an unconventional approach for indoor localization.

Index Terms—Unconventional localization, semantic SLAM, indoor localization, unsupervised localization.


1 INTRODUCTION

Although GPS is considered a ubiquitous outdoor localization technology, there is still no equivalent indoor technology that can provide similar accuracy and scale. This can be due to a number of reasons. First, a class of indoor localization technologies, e.g. [2]–[7], depends on special hardware installation, which in turn limits their scalability. Second, WiFi-based localization systems, e.g. [8]–[17], offer ubiquitous localization; however, they require tedious calibration effort. Third, to reduce this calibration effort, a number of systems have been proposed, e.g. [16], [18], [19]; nevertheless, they usually need to sacrifice accuracy to do so.

• Heba Abdelnasser and Reham Mohamed are with the Wireless Research Center, Egypt-Japan University for Science and Technology (E-JUST), Alexandria, Egypt. E-mail: {heba.abdelnasser,reham.mohamed}@ejust.edu.eg

• Ahmed Elgohary is with the University of Maryland, USA. This work was performed when the author was at E-JUST, Egypt. E-mail: [email protected]

• Moustafa Farid is with the University of California, Los Angeles, USA. This work was performed when the author was at E-JUST, Egypt. E-mail: [email protected]

• He Wang and Romit Choudhury are with the University of Illinois at Urbana-Champaign, USA. E-mail: {hewang5,croy}@illinois.edu

• Souvik Sen is with HP Labs, USA. E-mail: [email protected]

• Moustafa Youssef is with Egypt-Japan University for Science and Technology (E-JUST), Alexandria, Egypt. Currently on sabbatical from Alexandria University, Egypt. E-mail: [email protected]

An earlier version of this paper appeared in the proceedings of the ACM Tenth International Conference on Mobile Systems, Applications, and Services (MobiSys) 2012 [1].

Recently, techniques that leverage the inertial sensors (mainly the accelerometer, gyroscope, and compass) on cell phones have been proposed [20], [21]. Such techniques depend on dead-reckoning, where the accelerometer signal is used to count the user's steps and the compass is used to determine the user's direction. However, since dead-reckoning error accumulates quickly, re-calibration of the user location is required. Outdoors, this is usually performed using GPS. GPS is unreliable indoors, however, and hence other approaches are required for error resetting.

In this paper, we propose SemanticSLAM, a system that leverages the smart phone sensors to detect unique points in the indoor environment, i.e. semantic landmarks, that can be used to reset the dead-reckoning error indoors. Starting from a building floorplan that is either manually entered or automatically generated [22]–[24], SemanticSLAM discovers the landmarks and their locations in a crowd-sensing approach based on the data collected from the building users and their dead-reckoned locations. These discovered landmarks are then used to reset the error in the dead-reckoning estimation and hence lead to better localization accuracy. Note that this recursive dependence between estimating the landmark location and the user location lends itself naturally to the Simultaneous Localization And Mapping (SLAM) framework commonly used in the robotics domain [25]. Therefore, at the core of SemanticSLAM is a novel SLAM framework that handles the characteristics of semantic landmarks while being robust to landmark recognition errors.

A semantic landmark is defined by two attributes: its sensor pattern and its physical location. Based on this, SemanticSLAM identifies two types of semantic landmarks: seed landmarks and organic landmarks. When both attributes of a semantic landmark are known a priori, the landmark is defined as a seed landmark, which can be mapped to the physical environment. For example, a person using the elevator will produce a unique, known pattern in the phone acceleration. On the other hand, the attributes of an organic landmark, i.e. its pattern and physical location, cannot be known a priori. Therefore, SemanticSLAM learns them in an unsupervised manner. For example, a point in the building with dead cellular reception can be used as an organic landmark. Note that a seed landmark's location and pattern can also be learned, if needed, in an unsupervised way using the same technique used for organic landmarks. However, entering them initially bootstraps the system and speeds up convergence.

Evaluation of SemanticSLAM using Android phones in a university building and a mall shows that the system can reach 0.53 m median accuracy for human location detection while localizing the landmarks to within 0.83 m median error. In addition, SemanticSLAM leads to a 33% enhancement in convergence time compared to systems that do not use the SLAM framework.

In summary, the contributions of this paper are:

• We present the SemanticSLAM architecture and framework that leverages smart phone sensors to both dead-reckon the user location and identify semantic landmarks. These landmarks are used in a SLAM probabilistic framework to reset the accumulated localization error.

• We present supervised and unsupervised techniques for the automatic detection of both seed and organic landmarks. We show that adequate landmarks exist in indoor environments, leading to accurate localization with no calibration.

• We implement SemanticSLAM on different Android phones and evaluate it in two different testbeds, quantifying its accuracy and fast convergence time.

The rest of the paper is organized as follows: Section 2 gives background on the SLAM framework. Section 3 gives the architectural overview and the details of landmark extraction. The proposed semantic SLAM framework is presented in Section 4. In Section 6, we present the system evaluation. Finally, we discuss related work and conclude the paper in Sections 7 and 8, respectively.

2 BACKGROUND

In this section, we provide a brief background on the Simultaneous Localization and Mapping (SLAM) framework as well as the advantages of the FastSLAM algorithm we choose as our implementation framework [26], [27].

2.1 Overview of the SLAM Framework

SLAM was originally used by mobile robots [25] to enable them to build an estimated map of an environment and, at the same time, use this map to deduce the robot location. To do that, the robot gathers information about sensed nearby landmarks and concurrently measures its own motion. Both types of measurements are noisy. SLAM provides a probabilistic framework to estimate both the map ($\Theta$) and the robot pose (location $(x_t, y_t)$ and orientation $\phi_t$). In particular, the goal in SLAM is to find the estimated pose ($s_t$) and map ($\Theta$) that maximize the following probability density function:

$$p(s_t, \Theta \mid u^t, z^t, n^t) \tag{1}$$

where $u_t$ is the robot motion update (displacement and heading) at time $t$ obtained from the robot sensors, with $u^t = u_1, \ldots, u_t$ capturing the complete history; $z^t = z_1, \ldots, z_t$ is the history of landmark position observations relative to the user position; and $n^t = n_1, \ldots, n_t$ are the data association variables, where $n_t$ specifies the identity of the landmark observed at time $t$.

The traditional approach for estimating the probability density function in Eq. 1 was to use an Extended Kalman Filter (EKF) [28], [29]. The EKF approach represents the robot's map and pose by a high-dimensional Gaussian density over all map landmarks and the robot pose. The off-diagonal elements in the covariance matrix of this multivariate Gaussian represent the correlation between errors in the robot pose and the landmarks in the map. Therefore, the EKF can accommodate the natural correlation of errors in the map.

In the EKF approach, the probability density function $p(s_t, \Theta \mid u^t, z^t, n^t)$ is factorized into two independent models: a motion model $p(s_t \mid u_t, s_{t-1})$ and a measurement model $p(z_t \mid s_t, \theta_{n_t}, n_t)$, where $\theta_{n_t}$ is the location of landmark $n_t$ observed at time $t$. The motion model describes how a control $u_t$, asserted in the time interval $[t-1, t)$, affects the user's pose. The measurement model, on the other hand, describes how measurements arise from the state. Both models are traditionally modelled by nonlinear functions with independent Gaussian noise:

$$p(s_t \mid u_t, s_{t-1}) = h(u_t, s_{t-1}) + \delta_t \tag{2}$$

$$p(z_t \mid s_t, \Theta, n_t) = g(s_t, \theta_{n_t}) + \varepsilon_t \tag{3}$$

Here $h$ and $g$ are nonlinear functions, and $\delta_t$ and $\varepsilon_t$ are Gaussian noise variables with covariance $R_t$ and $P_t$, respectively.

One limitation of the EKF-based approach is its computational complexity, which is quadratic in the number of landmarks [27]. Another key limitation is the data association problem, i.e. how to determine the identity of the detected landmarks when several of them have a similar signature (e.g. two nearby stairs, elevators, or turns), which can lead to different maps depending on the chosen association. Gaussians cannot represent such a multi-modal distribution over the different candidate landmarks. The typical approach to handle this problem in the EKF-SLAM literature is to restrict the inference to the most probable landmark given the robot's current map [30]–[32]. However, such approaches tend to fail to converge when the estimated data association is incorrect. Other approaches have been proposed that interleave the data association decisions with map building in order to revise past data association decisions [33]–[36]. However, such approaches cannot be executed in real time and hence cannot be used for human tracking.

The FastSLAM approach [26], [27] was introduced to address the issues of the EKF-SLAM approach. FastSLAM combines particle filters [37], [38] and extended Kalman filters. The idea is to exploit a structural property of the SLAM problem, where landmark estimates are conditionally independent given the robot path. In other words, correlations in the uncertainty among different landmarks arise only through robot pose uncertainty; if the robot's correct path is known, the errors in its landmark estimates are independent of each other. This observation allows FastSLAM to factor the posterior over poses and maps.

More formally, in FastSLAM the posterior over the robot path, $s^t = (s_1, \ldots, s_t)$, and the map is factored as:

$$p(s^t, \Theta \mid z^t, u^t, n^t) = p(s^t \mid z^t, u^t, n^t) \prod_{n=1}^{N_L} p(\theta_n \mid s^t, z^t, n^t) \tag{4}$$

where $N_L$ is the number of landmarks. This factorization is exact and universal.

Since the user path is not known in advance, FastSLAM estimates the first term (the robot path $s^t$) by a particle filter, where each particle represents a possible path. Conditioned on these particles, the individual map errors are independent; hence the second term (the mapping problem) can be factored into $N_L$ separate problems, one for each landmark in the map. The individual landmark location probability density function $p(\theta_n \mid s^t, z^t, n^t)$ is estimated using an EKF. More formally, the posterior of the $m$th particle ($S_t^{[m]}$) contains a path $s^{t,[m]}$ and $N_L$ landmark estimates, each denoted by the landmark type ($f_{n,t}$), mean $\mu_{n,t}^{[m]}$, and covariance $\Sigma_{n,t}^{[m]}$:

$$S_t^{[m]} = s^{t,[m]}, \underbrace{f_{1,t}, \mu_{1,t}^{[m]}, \Sigma_{1,t}^{[m]}}_{\text{Landmark } \theta_1}, \ldots, \underbrace{f_{N_L,t}, \mu_{N_L,t}^{[m]}, \Sigma_{N_L,t}^{[m]}}_{\text{Landmark } \theta_{N_L}} \tag{5}$$

Fig. 1: SemanticSLAM System architecture. [Diagram labels: users' mobile traces; feature extraction; dead-reckoning; landmark detection (seed landmark decision tree, organic landmark clustering); SLAM algorithm (position update, observation update, map update, resampling, per particle [m]); output: user pose and map.]

2.2 Advantages of the FastSLAM Algorithm

The factorization employed by FastSLAM leads to an algorithm that is logarithmic in the number of landmarks, as compared to the quadratic time complexity of EKF-SLAM. Moreover, data association decisions in FastSLAM can be made on a per-particle basis. Therefore, the algorithm maintains posteriors over multiple data associations, not just the most likely one as in the EKF-SLAM approach. This makes FastSLAM significantly more robust to data association problems [26], [27]. FastSLAM can also cope with non-linear models and is proven to converge under certain assumptions [27]. Therefore, we leverage the FastSLAM approach in SemanticSLAM due to these advantages.
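To make the per-particle state of Eq. 5 more concrete, the following is a minimal Python sketch of a particle that carries its own path hypothesis and its own set of landmark estimates, together with a weight-proportional resampling step. The class and field names (`LandmarkEstimate`, `Particle`, `resample`) are illustrative assumptions, not the authors' implementation, and the EKF update of each landmark is omitted.

```python
import copy
import random
from dataclasses import dataclass, field


@dataclass
class LandmarkEstimate:
    kind: str     # landmark type f, e.g. "elevator"
    mean: tuple   # estimated location (x, y)
    cov: tuple    # 2x2 covariance as ((sxx, sxy), (syx, syy))


@dataclass
class Particle:
    path: list = field(default_factory=list)       # pose history [(x, y, heading), ...]
    landmarks: dict = field(default_factory=dict)  # landmark id -> LandmarkEstimate
    weight: float = 1.0


def resample(particles, rng=random):
    """Draw a new particle set with probability proportional to the weights."""
    weights = [p.weight for p in particles]
    drawn = rng.choices(particles, weights=weights, k=len(particles))
    # Deep-copy so that duplicated particles can diverge after resampling.
    survivors = [copy.deepcopy(p) for p in drawn]
    for p in survivors:
        p.weight = 1.0
    return survivors
```

Because every particle owns its landmark dictionary, each one can commit to a different data association, which is the property the section above credits for FastSLAM's robustness.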

3 SYSTEM OVERVIEW

Figure 1 shows the system architecture. The system consists of four main modules: sensor data collection and feature extraction, landmark detection, dead-reckoning, and the SemanticSLAM framework. In the balance of this section, we give an overview of the different modules.




3.1 Sensor data collection and features extraction

Sensor data is collected from the users' mobile phones in a crowd-sensing manner. Collected sensors include inertial sensors (accelerometer, compass, and gyroscope) as well as WiFi and cellular access points and their associated signal strength. Note that inertial sensors have a low-energy profile, while WiFi and cellular information is available during the phone's normal operation. Therefore, SemanticSLAM has a minimal effect on the phone energy consumption.

The collected sensor data is then analyzed to extract the different features that can be used to identify the landmarks.

3.2 Dead-reckoning

Inertial sensors are combined to provide an estimate of the user location. Starting from a reference point, e.g. the last GPS location of the person outside a building, the user's next location is obtained based on the motion update measurement $u_t = \{l_t, \phi_t\}$, where $l_t$ is the displacement and $\phi_t$ is the heading change at time $t$.
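A minimal sketch of this motion update is shown below. It assumes a planar pose `(x, y, heading)` with the heading in radians measured from the x-axis, and it assumes the heading change is applied before the displacement; the function name `dead_reckon` is hypothetical.

```python
import math


def dead_reckon(pose, displacement, heading_change):
    """Advance a pose (x, y, heading) by one motion update u_t = {l_t, phi_t}."""
    x, y, heading = pose
    # Apply the heading change first, then move along the new heading.
    heading = (heading + heading_change) % (2 * math.pi)
    return (x + displacement * math.cos(heading),
            y + displacement * math.sin(heading),
            heading)


# Usage: start at a reference point and apply a sequence of (l_t, phi_t) updates.
pose = (0.0, 0.0, 0.0)
for l_t, phi_t in [(0.7, 0.0), (0.7, 0.0), (0.7, math.pi / 2)]:
    pose = dead_reckon(pose, l_t, phi_t)
```

Any error in `l_t` or `phi_t` is carried into every later pose, which is why the estimate needs to be reset at landmarks.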

3.2.1 Displacement from the accelerometer

One possible solution to obtain the displacement is to double-integrate the accelerometer readings. However, due to the noisy, cheap sensors on the phones, error accumulates quickly and can reach 100 m within seconds [39]. A better approach [21], [39] is to count steps based on the human walking pattern. We use the UPTIME approach [39], as it adapts to the user's step size based on her gait.
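As a rough illustration of why double integration drifts, the sketch below integrates a constant accelerometer bias for a phone at rest. The bias value and sampling rate are hypothetical, chosen only to show the quadratic growth of the error (approximately $\frac{1}{2} b t^2$ for a bias $b$ after $t$ seconds); they are not measurements from [39].

```python
def double_integrate(accel_samples, dt):
    """Integrate acceleration twice with simple Euler steps; return the final position."""
    velocity = 0.0
    position = 0.0
    for a in accel_samples:
        velocity += a * dt
        position += velocity * dt
    return position


BIAS = 0.5   # hypothetical constant sensor bias in m/s^2
DT = 0.01    # hypothetical sampling interval in seconds (100 Hz)

# A phone at rest for 20 s: the true displacement is zero, so the result is pure drift.
drift = double_integrate([BIAS] * 2000, DT)
```

With these assumed values the drift comes out at roughly 100 m, whereas a step-counting estimate only errs by a fraction of a step length per step.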

3.2.2 Orientation using compass/gyroscope

The magnetic field in indoor environments is very noisy, due to ferromagnetic material and electrical objects in the vicinity, which can severely degrade the dead-reckoning performance. To address this issue, we fuse the gyroscope and magnetic sensor readings. The gyroscope provides accurate short-term relative angle change, while the magnetometer provides long-term stability. In particular, we leverage the correlation between the two sensors' readings to determine the points in time where the compass reading is accurate. We use these points as reference points (landmarks) from which the gyroscope measures the relative angle until the next angular reference point is detected [40].

3.3 Landmark Detection

Even though SemanticSLAM's step counting approach reduces the dead-reckoning error accumulation, the displacement error is still unbounded, which makes dead-reckoning alone unusable for indoor tracking. Therefore, SemanticSLAM leverages a novel approach of detecting unique points in the environment, i.e. landmarks, that can be used to reset the errors. Specifically, whenever the user's phone detects a landmark based on a unique multi-modal sensor signature, her position is reset to the position of this landmark, resetting the dead-reckoning error. We define two types of landmarks: seed landmarks (SLM) and organic landmarks (OLM).

Seed landmarks are landmarks that can be mapped to physical points in the environment and are used to bootstrap the system. Examples of SLMs include stairs, elevators, escalators, etc. Those SLMs have a unique effect on one or more of the phone sensors and hence can be uniquely detected.

On the other hand, organic landmarks do not necessarily map to an object and are detected based on their unique signature on the sensors. Usually, these are detected based on consistent anomalies in one or more sensor patterns.

3.4 The SemanticSLAM Framework

The landmark location is estimated based on the user location, which in turn is a function of the detected landmark location. This recursive definition lends itself naturally to a SLAM framework. SemanticSLAM provides a novel framework that uses landmarks as observations to enhance both the user location estimation and the landmark identification.

In particular, the dead-reckoning state as well as the detected landmarks are fed into the SemanticSLAM algorithm, which calculates the current pose of the tracked entity and updates the landmark positions in a unified framework.

4 LANDMARKS DETECTION

Many points in indoor environments exhibit unique sensor signatures, which can be used as landmarks. Indoor environments are rich with ambient signals, like sound, light, magnetic field, temperature, WiFi, 3G, etc. Moreover, different building structures (e.g., stairs, doors, elevators) force humans to behave in certain ways.

In this section, we give the details of the detection of both the seed and organic landmarks.

4.1 Seed Landmarks

Seed landmarks (SLMs) are landmarks that can be associated with specific objects in the environment, such as elevators and stairs. If the building floorplan is known (which is often necessary to visualize the user's location), then we can infer the locations of doors, elevators, staircases, escalators, etc. This implies that the locations of seed landmarks are immediately known. As long as the smartphone can detect these SLMs while passing through them, it can recalibrate its location. Thus, the goal of the SLM detection module is to define sensor patterns that are global across all buildings.





Fig. 2: Classification tree for detecting SLMs. The top level separates the elevator based on its unique acceleration pattern. The second level separates the constant velocity classes (escalator and stationary) from the other two classes (walking and stairs) based on the variance of the acceleration. The third level uses the variance of the magnetic field to separate the escalator from the stationary case, and the correlation between the Z and Y acceleration components to separate the stairs and walking cases.

In this section, we discuss three inertial-sensor-based SLMs that are common in indoor environments: elevators, staircases, and escalators. Inertial sensors have the advantage of being ubiquitously installed on a large class of smart phones, having a low-energy footprint, and being always on during the phone operation (to detect the change of screen orientation). Figure 2 shows a classification tree for detecting the three classes of interest and separating them from walking and being stationary.

Elevator: A typical elevator usage trace (Figure 3) consists of a normal walking period, followed by waiting for the elevator for some time, walking into the elevator, and standing inside for a short time. An over-weight/weight-loss then occurs (depending on the direction of the elevator), followed by a stationary period whose length depends on the number of floors the elevator moves, another weight-loss/over-weight, and finally a walk-out. The accelerometer shows a distinct signature in an elevator in the form of a pair of symmetric bumps in opposite directions, as shown in Figure 3. To recognize the elevator motion pattern, we use a Finite State Machine (FSM) that depends on the observed state transitions. Different thresholds are used to move between the states.

Evaluation over 22 traces shows that the thresholds are robust to changes in the testbed and can achieve 0.6% and 0% false positive and negative rates, respectively.
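The bump, quiet ride, opposite bump pattern can be sketched as a small state machine over the acceleration magnitude. This is an illustrative simplification that covers only the ride itself (not the walking and waiting phases), and the states and thresholds (`bump`, `min_bump`, `min_quiet`) are hypothetical; the paper does not list the FSM states or threshold values it uses.

```python
from itertools import groupby

GRAVITY = 9.81  # m/s^2


def detect_elevator(magnitudes, bump=0.3, min_bump=3, min_quiet=5):
    """Return True if the trace holds a bump, a quiet ride, then an opposite bump.

    bump (m/s^2), min_bump and min_quiet (samples) are hypothetical thresholds.
    """
    # Label each sample: +1 over-weight, -1 weight-loss, 0 close to gravity.
    signs = [1 if m - GRAVITY > bump else -1 if m - GRAVITY < -bump else 0
             for m in magnitudes]
    # Collapse consecutive equal labels into (label, run_length) pairs.
    runs = [(sign, len(list(group))) for sign, group in groupby(signs)]

    state, first = "WAIT_FIRST_BUMP", 0
    for sign, length in runs:
        long_bump = sign != 0 and length >= min_bump
        if state == "WAIT_FIRST_BUMP":
            if long_bump:
                state, first = "WAIT_QUIET", sign
        elif state == "WAIT_QUIET":
            if sign == 0 and length >= min_quiet:
                state = "WAIT_SECOND_BUMP"
            elif long_bump:
                first = sign                 # a new sustained bump restarts the pattern
            else:
                state = "WAIT_FIRST_BUMP"    # short, noisy run: start over
        else:  # WAIT_SECOND_BUMP
            if long_bump and sign == -first:
                return True                  # symmetric bump in the opposite direction
            if long_bump:
                state, first = "WAIT_QUIET", sign
            else:
                state = "WAIT_FIRST_BUMP"
    return False
```

Requiring minimum run lengths is one way to keep the rapid oscillations of normal walking from being mistaken for the sustained bumps of an elevator start or stop.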

Fig. 3: Accelerometer signature inside the elevator (caused by the elevator starting and stopping). [Plot: magnitude of acceleration (m/s²) versus time in seconds, with the phases normal walking, waiting for the elevator, elevator starting, and elevator stopping marked.]

Fig. 4: Difference in magnetic variance when a user is climbing stairs and escalator. [Plot: magnetic variance versus time in seconds for stairs and escalator.]

Fig. 5: A matrix showing sensor readings collected by devices across time. Readings are clustered and the location of cluster members is computed. If all cluster members fall within a small region, an OLM is detected.

Escalator: Once the elevator has been separated, it is easy to separate the classes with constant velocity (escalator and stationary) from the other classes (walking and stairs) using the variance of acceleration. To further separate the escalator from stationarity, we found that the variance of the magnetic field can be a reliable discriminator (Figure 4) due to the motor of the escalator.

Stairs: Once the scenarios with constant speed are separated, we need to differentiate between the stairs and walking cases. The main observation here is that when the user is using the stairs, her speed increases or decreases based on whether gravity is helping or not. This creates a higher correlation between the acceleration in the direction of motion and the acceleration in the direction of gravity as compared to walking. As reported later, staircases can sometimes lead to false negatives (1.8%).
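The three levels of the classification tree in Fig. 2 can be sketched as nested threshold tests. The threshold values below are hypothetical placeholders, the elevator decision is assumed to come from a separate detector, and the direction of the magnetic test (an escalator shows the higher variance) is our reading of the motor explanation above rather than a stated rule.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / norm if norm > 0 else 0.0


def classify_window(acc_mag, mag_field, acc_z, acc_y, elevator_detected,
                    var_acc_thr=0.5, var_mag_thr=1.0, corr_thr=0.5):
    """Classify one window of samples; all three thresholds are hypothetical."""
    # Level 1: the elevator is separated by its own acceleration pattern.
    if elevator_detected:
        return "elevator"
    # Level 2: low acceleration variance means constant velocity.
    if pvariance(acc_mag) < var_acc_thr:
        # Level 3a: magnetic variance separates escalator from stationary.
        return "escalator" if pvariance(mag_field) > var_mag_thr else "stationary"
    # Level 3b: Z-Y acceleration correlation separates stairs from walking.
    return "stairs" if pearson(acc_z, acc_y) > corr_thr else "walking"
```

In practice the thresholds would have to be tuned on labelled traces, as was done for the detection rates reported above.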

4.2 Organic Landmarks

In addition to seed landmarks, some landmarks can be detected dynamically. Most indoor environments offer some ambient signatures across one or many sensing dimensions. These signatures can be in the magnetic domain, where metals in a specific location may produce unique and reproducible fluctuations on the user's magnetometer near that location. Signatures could also be WiFi-based: a spot may overhear a set of WiFi base stations, but the set may change at short distances away from that spot. A few (dead) spots inside a building may not overhear any WiFi or GSM/3G signals, which by itself is a signature. Further, even a water fountain could be a signature, since users who stop to drink water may exhibit some common patterns in the accelerometer and magnetometer domains.

The task of discovering organic landmarks (OLMs) is rooted in (1) recognizing distinct patterns from many sensed signals, and (2) testing whether a given pattern is spatially confined to a small area. Figure 5 illustrates the flow of operations. All the sensor readings are gathered in a matrix: element <i,j> of the matrix contains sensor readings from phone i at time j. These sensor readings are essentially features of the raw sensed values (from the accelerometer, compass, gyroscope, magnetometer, and WiFi). Features for the magnetic and inertial sensors include mean, max, min, variance, and mean-crossings, while for WiFi, they are MAC ID and RSSI.
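The magnetic and inertial features listed above can be computed per window of samples, as in the sketch below. The function names and the choice of population variance are assumptions made for illustration; the window length used by the system is not stated here.

```python
from statistics import mean, pvariance


def mean_crossings(samples):
    """Count how often consecutive samples fall on opposite sides of the window mean."""
    mu = mean(samples)
    centered = [s - mu for s in samples]
    # Samples exactly equal to the mean are not counted as crossings.
    return sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)


def extract_features(samples):
    """Return the five per-window features for one magnetic or inertial signal."""
    return {
        "mean": mean(samples),
        "max": max(samples),
        "min": min(samples),
        "variance": pvariance(samples),
        "mean_crossings": mean_crossings(samples),
    }
```

One such feature dictionary per phone and time window would fill element <i,j> of the matrix in Fig. 5.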

These features are normalized between [−1, 1] and fed to a K-means clustering algorithm. The clustering process is executed for each individual sensing dimension, as well as their combinations (such as accelerometer and compass together). Figure 6, for example, shows the clusters from the magnetometer readings for K = 3. The clusters were recorded for different values of K. The goal is to identify clusters that have low similarity with all other clusters; this will suggest a good signature. For this, we compute the correlation between a given cluster and all other clusters: if the maximum correlation is less than a similarity threshold, this cluster is considered a candidate landmark.

Fig. 6: Clusters identified by the K-means algorithm.
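A sketch of the normalization and candidate-selection steps is given below. Two points are assumptions rather than details from the text: features are scaled with a min-max mapping, and each cluster is represented by its centroid when computing the correlation between clusters. The K-means step itself is omitted.

```python
from math import sqrt
from statistics import mean


def normalize(column):
    """Min-max scale one feature column to the range [-1, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)   # a constant feature carries no information
    return [2 * (v - lo) / (hi - lo) - 1 for v in column]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / norm if norm > 0 else 0.0


def candidate_clusters(centroids, threshold):
    """Return indices of clusters whose maximum correlation with any other is below threshold."""
    candidates = []
    for i, c in enumerate(centroids):
        others = [pearson(c, d) for j, d in enumerate(centroids) if j != i]
        if others and max(others) < threshold:
            candidates.append(i)
    return candidates
```

A cluster that passes this test is only a candidate; it still has to pass the spatial test described next.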

To qualify the candidate cluster as an OLM, it must also be confined to a small geographical area. For this, we first test whether the members of a cluster are within the same WiFi area (i.e., they overhear the same WiFi APs). While this is necessary, it is not sufficient, because many WiFi areas are large. Therefore, for clusters within a WiFi area, we compute the locations of each of their members. If the locations of all cluster members are indeed within a small area (we use 4 m²), then we declare this cluster an OLM. For example, using the accelerometer, we found that the points inside one of the sensor clusters were scattered all over the indoor space; upon investigation, we found that this cluster roughly captured walking patterns. On the other hand, another cluster that proved to be within the 4 m² area came from a magnetic signature near an electrical service room in the building. The location of the OLM is obtained through the SemanticSLAM recursive framework (Section 4).
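The spatial test can be sketched as follows. The text only gives the area bound, so the shape of the region is an assumption: here the dead-reckoned member locations must fit inside an axis-aligned square of the given area (2 m by 2 m for 4 m²), which also rejects clusters stretched along a corridor.

```python
import math


def is_spatially_confined(locations, max_area=4.0):
    """Check that all (x, y) locations fit in an axis-aligned square of max_area m^2."""
    side = math.sqrt(max_area)   # assumed square region, e.g. 2 m x 2 m
    xs = [x for x, _ in locations]
    ys = [y for _, y in locations]
    return (max(xs) - min(xs)) <= side and (max(ys) - min(ys)) <= side
```

A bounding-box area test alone would accept a long, thin cluster, which is why both extents are checked separately in this sketch.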

While the above describes the generalized version of the OLM detection algorithm, the different sensing dimensions require some customization, discussed next.

4.2.1 WiFi Landmarks

We use the MAC addresses of WiFi APs and their corresponding RSSI values as features. Only APs with RSSI stronger than a threshold are considered. Applying K-means clustering, we identify small areas (4 m²) that have low similarity with all locations outside that area. We compute the similarity of two locations, $l_1$ and $l_2$, as follows.

Let us denote the sets of WiFi APs overheard at locations $l_1$ and $l_2$ as $A_1$ and $A_2$, respectively. Also, let $A = A_1 \cup A_2$. Let $f_i(a)$ denote the RSSI of AP $a$, $a \in A$, overheard at location $l_i$; if $a$ is not overheard at $l_i$, then $f_i(a) = 0$. We now define the similarity $S \in [0, 1]$ between locations $l_1$ and $l_2$ as:

$$S = \frac{1}{|A|} \sum_{\forall a \in A} \frac{\min(f_1(a), f_2(a))}{\max(f_1(a), f_2(a))}$$

The rationale for this equation is to add proportionally large weights to $S$ when an AP's signals are similarly strong at both locations, and vice versa. We choose a threshold of 0.4 in our system to define a WiFi landmark, indicating that all locations within the WiFi landmark need to exhibit less than 0.4 similarity with any other location outside the landmark. Figure 7 shows this tradeoff using traces from the Engineering Building. We observed that 0.4 was a reasonable cutoff point, balancing the quality and quantity of WiFi OLMs. In addition, we found that this value gives comparable performance for the two testbeds.

Fig. 7: Tradeoff between similarity threshold and number of WiFi landmarks. [Plot: number of WiFi OLMs (0 to 20) versus similarity threshold (0.1 to 0.7).]
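The similarity measure $S$ defined above translates directly into code. The sketch assumes each scan is a dictionary from AP MAC address to a positive signal strength (for example, RSSI in dBm shifted to a positive scale), since the definition sets $f_i(a) = 0$ for an AP that is not overheard; the function name and the scan representation are hypothetical.

```python
def wifi_similarity(scan1, scan2):
    """Similarity S in [0, 1] between two scans given as {mac: positive strength}."""
    aps = set(scan1) | set(scan2)          # A = A1 union A2
    if not aps:
        return 0.0
    total = 0.0
    for ap in aps:
        f1 = scan1.get(ap, 0.0)            # f_i(a) = 0 if the AP is not overheard
        f2 = scan2.get(ap, 0.0)
        hi = max(f1, f2)
        if hi > 0:
            total += min(f1, f2) / hi
    return total / len(aps)


# Usage: two locations that share one AP out of three in the union.
here = {"aa:aa": 40.0, "bb:bb": 20.0}
there = {"aa:aa": 20.0, "cc:cc": 30.0}
s = wifi_similarity(here, there)
is_distinct = s < 0.4                       # 0.4 is the threshold chosen in the text
```

An AP heard at only one of the two locations contributes zero, so locations with mostly disjoint AP sets score close to 0.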

4.2.2 Magnetic and Inertial Sensor Landmarks

Indoor environments are characterized by at least a few turns (at the end of corridors, into offices, classrooms, stairs, etc.). Since the gyroscope offers reliable angular displacements, we recognize the opportunity to use turns as organic landmarks. We design a special feature called the bending coefficient. Essentially, the coefficient captures the notion of path curvature, computed as the length of the perpendicular from the center of a walking segment to the straight line joining the end-points of the segment. We compute the bending coefficient over a sliding window on the user's walking path, and use it as a separate feature. Later, when we cluster on bending coefficient and WiFi together as features, similar turns within a WiFi area gather in the same cluster. The turns in the cluster could still be doors of adjacent classrooms in a corridor; these turns may very well lie within the same WiFi area. To avoid gathering all these turns into the same landmark, we check if the cluster is confined to within a 4 m² area; only then is the cluster declared a landmark.
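The bending coefficient described above can be sketched as a point-to-line distance. This is a minimal sketch, assuming that the "center" of a walking segment is its middle sample and that the path is given as a list of 2-D points; the function names and the window parameter are illustrative, not taken from the paper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def bending_coefficient(segment: List[Point]) -> float:
    """Perpendicular distance from the middle sample of a segment
    to the straight line joining the segment's end-points."""
    (x1, y1), (x2, y2) = segment[0], segment[-1]
    cx, cy = segment[len(segment) // 2]   # assumption: center = middle sample
    chord = math.hypot(x2 - x1, y2 - y1)
    if chord == 0.0:
        # Degenerate segment (start == end): fall back to point distance.
        return math.hypot(cx - x1, cy - y1)
    # |cross product| / |chord| gives the point-to-line distance.
    cross = (x2 - x1) * (cy - y1) - (y2 - y1) * (cx - x1)
    return abs(cross) / chord


def sliding_bending(path: List[Point], window: int) -> List[float]:
    """Bending coefficient over each window of consecutive path samples."""
    return [bending_coefficient(path[i:i + window])
            for i in range(len(path) - window + 1)]
```

A straight corridor yields coefficients near zero, while a right-angle turn in the middle of the window yields a clearly larger value, which is what makes the feature useful for clustering turns.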

| Notation | Definition |
|---|---|
| $f$ | The detected landmark type (e.g. elevator pattern) |
| $u_t$ | The control data at time $t$ (the dead-reckoning input) |
| $u^t$ | The history of control data up to time $t$ ($u^t = u_1, u_2, \ldots, u_t$) |
| $l_t$ | The estimated displacement at time $t$ from the sensors |
| $[m]$ | The particle index |
| $L_t^{[m]}$ | The sampled step length for particle $[m]$ at time $t$ |
| $\phi_t$ | The sensors' heading change estimate at time $t$ |
| $\Phi_t^{[m]}$ | The sampled heading change for particle $[m]$ |
| $s_t$ | The user's pose at time $t$ ($s_t = \{s_t^x, s_t^y, s_t^\phi\}$) |
| $\hat{s}_t$ | The predicted pose at time $t$ |
| $s^t$ | The posterior over the entire path ($s^t = s_1, s_2, \ldots, s_t$) |
| $S_t^{[m]}$ | The posterior over all path and landmark positions for one particle ($S_t^{[m]} = s^{t,[m]}, \underbrace{f_{1,t}, \mu_{1,t}^{[m]}, \Sigma_{1,t}^{[m]}}_{\text{Landmark } \theta_1}, \ldots, \underbrace{f_{N,t}, \mu_{N,t}^{[m]}, \Sigma_{N,t}^{[m]}}_{\text{Landmark } \theta_N}$) |
| $z_t$ | The measurement of landmark position at time $t$ ($z_t = 0$) |
| $\hat{z}_{n,t}^{[m]}$ | The measurement estimation of particle $[m]$ with observed landmark $n$ at time $t$ |
| $\Theta$ | The landmark map |
| $\theta_{n_t}$ | The location of the landmark $n$ at time $t$ |
| $Q_t$ | The landmark observation covariance matrix at time $t$ |
| $R_t$ | The measurement covariance at time $t$ |
| $P_t$ | The covariance matrix of the control data at time $t$ |
| $\mu_{s_t}^{[m]}$ | The mean of the estimated pose $s_t$ at time $t$ for particle $[m]$ |
| $\Sigma_{s_t}^{[m]}$ | The covariance matrix of the estimated pose $s_t$ at time $t$ for particle $[m]$ |
| $\eta$ | Normalization factor |
| $p_n$ | The likelihood correspondence of landmark $n$ with the observed pattern $f$ at time $t$ |
| $p_0$ | The likelihood of observing a landmark for the first time |
| $K_t^{[m]}$ | Kalman gain for particle $[m]$ at time $t$ |
| $\mu_{n_t,t}^{[m]}$ | The mean of the estimated Gaussian position of landmark $n_t$ at time $t$ for particle $[m]$ |
| $\Sigma_{n_t,t}^{[m]}$ | The covariance matrix of the estimated Gaussian position of landmark $n_t$ at time $t$ for particle $[m]$ |
| $w_t^{[m]}$ | The weight of particle $[m]$ at time $t$ |

TABLE 1: Notations used in the paper.

5 SEMANTICSLAM: SEMANTIC SIMULTANEOUS LOCALIZATION AND MAPPING

Once the landmarks have been detected, our system combines them with a dead-reckoning approach to estimate both the person's location and the landmark locations. For that, we propose a modified FastSLAM algorithm [27] with unknown data association, with changes to cope with semantic landmark detection. FastSLAM has the advantage of proven convergence even with a single particle, in which case it has a constant update time; it also incorporates the measurement in the motion model. This last feature allows it to handle the case when the noise in the person's motion is large relative to the measurement noise. Table 1 summarizes the notation used in this section.

The algorithm consists of three steps: sampling, map update, and re-sampling. Without loss of generality, we assume that only a single landmark is observed at each time $t$.

5.1 Sampling the user's pose

The first step of the SemanticSLAM algorithm is estimating the current pose of the tracked entity. Given the control data $u_t$ obtained from the dead-reckoning step, the measurement $z_t$, and the previous pose $s_{t-1}^{[m]}$, the current pose $s_t^{[m]}$ for each particle $[m]$ is sampled from the following probability distribution:

$$ s_t^{[m]} \sim P(s_t \mid s^{t-1,[m]}, u^t, z^t, n^t) \tag{6} $$

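This distribution can be divided into the product of two factors, the next state distribution and the probability of the measurement $z_t$, as:

$$ P(s_t \mid s^{t-1,[m]}, u^t, z^t, n^t) = \underbrace{p(s_t \mid s_{t-1}^{[m]}, u_t)}_{s_t \sim \mathcal{N}(h(s_{t-1}^{[m]}, u_t),\, P_t)} \cdot\, \eta^{[m]} \int \underbrace{p(z_t \mid \theta_{n_t}, s_t, n_t)}_{z_t \sim \mathcal{N}(g(\theta_{n_t}, s_t),\, R_t)} \; \underbrace{p(\theta_{n_t} \mid s^{t-1,[m]}, z^{t-1}, n^{t-1})}_{\theta_{n_t} \sim \mathcal{N}(\mu_{n_t,t-1}^{[m]},\, \Sigma_{n_t,t-1}^{[m]})} \, d\theta_{n_t} \tag{7} $$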

The next state distribution depends on the estimated control data $u_t = \{l_t, \phi_t\}$ with displacement $l_t$ and heading change $\phi_t$:

$$ s_t^{[m]} \sim P(s_t \mid s_{t-1}^{[m]}, u_t) \tag{8} $$

Assuming Gaussian-distributed errors for both $l_t$ and $\phi_t$, the sampled displacement and heading change for particle $[m]$ at time $t$ are calculated as:

$$ L_t^{[m]} \sim \mathcal{N}(l_t, \sigma_l) \tag{9} $$

$$ \phi_t^{[m]} \sim \mathcal{N}(\phi_t, \sigma_\phi) \tag{10} $$

where $\sigma_l$ and $\sigma_\phi$ are the variances of the displacement and heading estimation errors, respectively.

Therefore, the user's sampled pose in Eq. 8 can be rewritten as:

$$ s_t^{[m],\phi} = s_{t-1}^{[m],\phi} + \phi_t^{[m]} \tag{11} $$

$$ s_t^{[m],x} = s_{t-1}^{[m],x} + L_t^{[m]} \cos(s_t^{[m],\phi}) \tag{12} $$

$$ s_t^{[m],y} = s_{t-1}^{[m],y} + L_t^{[m]} \sin(s_t^{[m],\phi}) \tag{13} $$

Note that equations 9 through 13 are the implementation of the non-linear $h(\cdot)$ function in Eq. 2.
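Equations 9 through 13 translate almost directly into code. The sketch below is illustrative only: the function name and the explicit random generator argument are assumptions, and the noise parameters are passed as standard deviations (what Python's `random.gauss` expects), whereas the text describes $\sigma_l$ and $\sigma_\phi$ as variances.

```python
import math
import random
from typing import Tuple

Pose = Tuple[float, float, float]  # (x, y, heading in radians)


def sample_pose(prev: Pose, l_t: float, phi_t: float,
                std_l: float, std_phi: float,
                rng: random.Random) -> Pose:
    """Sample one particle's next pose from the dead-reckoning input."""
    x, y, heading = prev
    step = rng.gauss(l_t, std_l)          # Eq. 9: sampled displacement
    dphi = rng.gauss(phi_t, std_phi)      # Eq. 10: sampled heading change
    heading = heading + dphi              # Eq. 11
    x = x + step * math.cos(heading)      # Eq. 12 (uses the updated heading)
    y = y + step * math.sin(heading)      # Eq. 13
    return (x, y, heading)


# Example: 50 particles take one noisy 0.7 m step with a slight left turn.
rng = random.Random(42)
particles = [(0.0, 0.0, 0.0)] * 50
particles = [sample_pose(p, 0.7, 0.05, 0.1, 0.02, rng) for p in particles]
```

Because each particle draws its own step length and heading change, the particle cloud spreads out between landmark observations, which is the growth in uncertainty that the landmark updates later correct.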

The probability of the measurement $z_t$ involves an integration over all possible landmark locations $\theta_{n_t}$, which is not possible in the general case. To address this issue, the FastSLAM framework approximates $g(\cdot)$ (from Eq. 3) as a linear function, leading to a closed form solution as:

$$ g(\theta_{n_t}, s_t) \approx \hat{z}_{n,t}^{[m]} + G_{\theta,n} \cdot (\theta_{n_t} - \mu_{n_t,t-1}^{[m]}) + G_{s,n} \cdot (s_t - \hat{s}_{n,t}^{[m]}) \tag{14} $$

where $\hat{z}_{n,t}^{[m]} = g(\mu_{n,t-1}^{[m]}, \hat{s}_{n,t}^{[m]})$ is the predicted measurement, $\hat{s}_t^{[m]} = h(s_{t-1}^{[m]}, u_t)$ is the predicted user's pose (from Eq. 8), and $\hat{\theta}_{n,t}^{[m]} = \mu_{n,t-1}^{[m]}$ is the predicted landmark location. The matrices $G_{\theta,n}$ and $G_{s,n}$ are the Jacobians of $g(\cdot)$ with respect to $\theta$ and $s$, respectively.

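Thus, the proposal distribution in Eq. 6 is Gaussian with the parameters:

$$ \Sigma_{s_n,t}^{[m]} = \left[ G_{s,n}^T \left(Q_{n,t}^{[m]}\right)^{-1} G_{s,n} + P_t^{-1} \right]^{-1} \tag{15} $$

$$ \mu_{s_n,t}^{[m]} = \hat{s}_{n,t}^{[m]} + \Sigma_{s_n,t}^{[m]} G_{s,n}^T \left(Q_{n,t}^{[m]}\right)^{-1} (z_t - \hat{z}_{n,t}^{[m]}) \tag{16} $$

where $Q_{n,t}^{[m]} = G_{\theta,n} \Sigma_{n,t-1}^{[m]} G_{\theta,n}^T + R_t$ is the landmark observation covariance matrix, $R_t$ is the measurement covariance matrix, and $z_t$ is the actual landmark position observation.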

For the SemanticSLAM problem, we set:

$$ \hat{z}_{n,t}^{[m]} = g(\mu_{n,t-1}^{[m]}, \hat{s}_{n,t}^{[m]}) = \| \mu_{n,t-1}^{[m]} - \hat{s}_{n,t}^{[m]} \| \tag{17} $$

reflecting that the measurement is the distance between the landmark and the current user's location. Moreover, we set $z_{n,t} = 0$, indicating that the landmark is observed when the user is at the landmark location.

Therefore, the final user position is sampled from the distribution $\mathcal{N}(\mu_{s_n,t}^{[m]}, \Sigma_{s_n,t}^{[m]})$.

5.2 Map Update

Each particle has an independent map that contains the locations of the landmarks, represented by their mean ($\mu$), covariance matrix ($\Sigma$), and the associated landmark pattern ($f$). The purpose of the map update step is to update the location of the currently detected landmark. Due to the inherent sensor noise, there is ambiguity in landmark detection related to both the landmark location and its type. Therefore, in SemanticSLAM, we compute the probability of actually observing each landmark type given the detected pattern $f$. This uncertainty is represented by a confusion matrix, where each cell $(i, j)$ in the matrix represents $p(f_i \mid f_j)$ for landmark types $i, j$. Given this confusion matrix and the position uncertainty $Q_n$ for each landmark $n$, the likelihood of correspondence with landmark $n$ is calculated based on an EKF approximation [41] as:

$$ p_n = \eta \, |2\pi Q_{n,t}|^{-\frac{1}{2}} \exp\left\{ -\frac{1}{2} \hat{z}_{n,t}^T \, Q_{n,t}^{-1} \, \hat{z}_{n,t} \right\} \cdot p(f_t \mid f_t) \tag{18} $$

where $\eta$ is the normalization factor. Note that this landmark likelihood takes into account the distance between the landmark and the current user pose, the location uncertainty, and the confusion between the observed and actual landmark type.

We also assume that the probability of observing a new landmark given the detected pattern can be calculated as:

$$ p_n = \eta \, p_0 \cdot p(f_t \mid f_t) \tag{19} $$

where $p_0$, the probability of observing a new landmark, is a constant determined empirically.
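The data-association step can be sketched for the scalar distance measurement of Eq. 17, where the observation covariance $Q$ reduces to a single variance. This is a hedged sketch rather than the authors' code: the function names, the candidate tuple layout, and the choice to normalize ($\eta$) over all existing landmarks plus the new-landmark hypothesis are assumptions made for illustration.

```python
import math
from typing import List, Tuple


def landmark_likelihood(z_hat: float, q: float, p_type: float) -> float:
    """Unnormalized Eq. 18 for a scalar predicted distance z_hat,
    scalar variance q, and confusion-matrix entry p_type."""
    gauss = (2.0 * math.pi * q) ** -0.5 * math.exp(-0.5 * z_hat * z_hat / q)
    return gauss * p_type


def associate(candidates: List[Tuple[float, float, float]],
              p0: float, p_type_new: float) -> Tuple[int, List[float]]:
    """Pick the most likely correspondence.

    candidates: one (z_hat, q, p_type) tuple per existing landmark.
    Returns (index, probabilities); index == len(candidates) means
    "new landmark" (Eq. 19). Assumes p0 > 0 so the sum is non-zero.
    """
    scores = [landmark_likelihood(z, q, p) for z, q, p in candidates]
    scores.append(p0 * p_type_new)            # Eq. 19, unnormalized
    eta = 1.0 / sum(scores)                   # normalization factor
    probs = [eta * s for s in scores]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs
```

A landmark half a meter away dominates one eight meters away, and when every known landmark is far from the user the constant $p_0$ term wins, which triggers the creation of a new landmark.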

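Finally, the landmark with the greatest probability ($n_t$) is selected as the currently observed landmark and its location is updated using the standard EKF formulas as:

$$ K_t^{[m]} = \Sigma_{n_t,t-1}^{[m]} G_{\theta,n_t}^T \left(Q_{n_t,t}^{[m]}\right)^{-1} \tag{20} $$

$$ \mu_{n_t,t}^{[m]} = \mu_{n_t,t-1}^{[m]} - K_t^{[m]} \hat{z}_{n_t,t}^T \tag{21} $$

$$ \Sigma_{n_t,t}^{[m]} = (I - K_t^{[m]} G_{\theta,n_t}) \Sigma_{n_t,t-1}^{[m]} \tag{22} $$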

If it is more probable that the observed landmark is a new landmark, then a new landmark is added to the map, with the current position of the tracked entity $s_t^{[m]}$ as its location and the measurement uncertainty $R_t$ as its covariance matrix.
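For a single particle, the landmark update of Eqs. 20 to 22 with the distance measurement of Eq. 17 can be sketched as below. The sketch assumes 2-D landmark positions, a scalar measurement variance `r_t`, and the actual observation $z_t = 0$, so the innovation is simply minus the predicted distance; the function name and the data layout are illustrative assumptions.

```python
import math
from typing import List, Tuple

Vec2 = Tuple[float, float]
Mat2 = List[List[float]]


def update_landmark(mu: Vec2, sigma: Mat2, user_xy: Vec2,
                    r_t: float) -> Tuple[Vec2, Mat2]:
    """EKF update of one landmark given that the user is observed at it."""
    dx, dy = mu[0] - user_xy[0], mu[1] - user_xy[1]
    z_hat = math.hypot(dx, dy)                   # Eq. 17: predicted distance
    if z_hat == 0.0:
        return mu, sigma                         # Jacobian undefined; skip
    g = (dx / z_hat, dy / z_hat)                 # G_theta, a 1x2 Jacobian
    sg = (sigma[0][0] * g[0] + sigma[0][1] * g[1],
          sigma[1][0] * g[0] + sigma[1][1] * g[1])   # Sigma * G^T
    q = g[0] * sg[0] + g[1] * sg[1] + r_t        # Q = G Sigma G^T + R
    k = (sg[0] / q, sg[1] / q)                   # Eq. 20: Kalman gain
    new_mu = (mu[0] - k[0] * z_hat,
              mu[1] - k[1] * z_hat)              # Eq. 21
    ikg = [[1.0 - k[0] * g[0], -k[0] * g[1]],
           [-k[1] * g[0], 1.0 - k[1] * g[1]]]    # I - K G
    new_sigma = [[sum(ikg[i][c] * sigma[c][j] for c in range(2))
                  for j in range(2)] for i in range(2)]   # Eq. 22
    return new_mu, new_sigma
```

For example, a landmark estimated at (2, 0) with unit covariance, observed by a user at the origin with `r_t = 1`, moves halfway to (1, 0), and its variance along the line of sight halves.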

Note that SemanticSLAM inherits the property of FastSLAM that there are multiple hypotheses of the possible landmark, each corresponding to a different particle. In other words, each particle has its own belief in the currently observed landmark. This multi-hypothesis property provides SemanticSLAM with robustness to large errors, as discussed in Section 2.

5.3 Resampling

Each particle represents an estimate of the tracked entity pose with a weight $w^{[m]}$ reflecting the confidence of the pose associated with this particle. The weights of the particles are initialized equally and updated at each step based on the likelihood of the detected landmarks, i.e. the landmark with the highest probability for each particle, as:

$$ w_t^{[m]} = w_{t-1}^{[m]} \cdot p_{n_t}^{[m]} \tag{23} $$

The pose and the map of the particle which has the maximum weight are selected as the current estimate. Then, a resampling step is performed using the current weights in order to refine the particles and drop those that significantly deviated from the actual path.
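The weighting and resampling steps can be sketched with the standard library alone. This is a minimal sketch: the function names are illustrative, particles are treated as opaque objects, and multinomial resampling via `random.Random.choices` is one common choice that the text does not prescribe.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def update_weights(prev_weights: Sequence[float],
                   likelihoods: Sequence[float]) -> List[float]:
    """Eq. 23: multiply each weight by its particle's best-landmark likelihood."""
    return [w * p for w, p in zip(prev_weights, likelihoods)]


def best_particle(weights: Sequence[float]) -> int:
    """Index of the maximum-weight particle (the current estimate)."""
    return max(range(len(weights)), key=weights.__getitem__)


def resample(particles: Sequence[T], weights: Sequence[float],
             rng: random.Random) -> List[T]:
    """Draw len(particles) particles with probability proportional to weight."""
    return rng.choices(list(particles), weights=list(weights),
                       k=len(particles))
```

Particles whose landmark likelihoods stay low are drawn rarely and tend to disappear after a few rounds, while high-weight particles are duplicated.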

Algorithm 1 summarizes the full algorithm.

Algorithm 1 SemanticSLAM($z_t$, $u_t$, $S_{t-1}$)

for $m = 1$ to $M$ do
    retrieve $\langle s_{t-1}^{[m]}, N_{t-1}^{[m]}, w_{t-1}^{[m]}, \langle f_{1,t-1}^{[m]}, \mu_{1,t-1}^{[m]}, \Sigma_{1,t-1}^{[m]} \rangle, \ldots, \langle f_{N,t-1}^{[m]}, \mu_{N,t-1}^{[m]}, \Sigma_{N,t-1}^{[m]} \rangle \rangle$ from $S_{t-1}$
    for $n = 1$ to $N_{t-1}^{[m]}$ do    ▷ calculate sampling distribution
        $\hat{s}_t^{[m]} \sim P(s_t \mid s_{t-1}^{[m]}, u_t)$    ▷ predict pose
        $\hat{z}_{n,t}^{[m]} = g(\mu_{n,t-1}^{[m]}, \hat{s}_{n,t}^{[m]})$    ▷ predict measurement
        $\Sigma_{s_n,t}^{[m]} = [G_{s,n}^T (Q_{n,t}^{[m]})^{-1} G_{s,n} + P_t^{-1}]^{-1}$    ▷ covariance of proposal distribution
        $\mu_{s_n,t}^{[m]} = \hat{s}_{n,t}^{[m]} + \Sigma_{s_n,t}^{[m]} G_{s,n}^T (Q_{n,t}^{[m]})^{-1} (z_t - \hat{z}_{n,t}^{[m]})$    ▷ mean of proposal distribution
        $s_{n,t}^{[m]} \sim \mathcal{N}(\mu_{s_n,t}^{[m]}, \Sigma_{s_n,t}^{[m]})$    ▷ sample pose
        $p_n = \eta \, |2\pi Q_{n,t}|^{-\frac{1}{2}} \exp\{ -\frac{1}{2} \hat{z}_{n,t}^T \, Q_{n,t}^{-1} \, \hat{z}_{n,t} \} \cdot p(f_t \mid f_t)$    ▷ correspondence likelihood
    end for
    $p_{1+N_{t-1}^{[m]}} = \eta \, p_0 \cdot p(f_t \mid f_t)$    ▷ likelihood of new landmark
    $n_t = \arg\max_{n \in \{1, \ldots, 1+N_{t-1}^{[m]}\}} p_n$    ▷ ML correspondence
    $N_t^{[m]} = \max\{N_{t-1}^{[m]}, n_t\}$    ▷ new number of features
    if $n_t = 1 + N_{t-1}^{[m]}$ then    ▷ is new landmark?
        $\mu_{n_t,t}^{[m]} = s_t^{[m]}$    ▷ initialize mean
        $\Sigma_{n_t,t}^{[m]} = R_t$    ▷ initialize covariance
    else
        $K_t^{[m]} = \Sigma_{n_t,t-1}^{[m]} G_{\theta,n_t}^T (Q_{n_t,t}^{[m]})^{-1}$    ▷ calculate Kalman gain
        $\mu_{n_t,t}^{[m]} = \mu_{n_t,t-1}^{[m]} - K_t^{[m]} \hat{z}_{n_t,t}^T$    ▷ update mean
        $\Sigma_{n_t,t}^{[m]} = (I - K_t^{[m]} G_{\theta,n_t}) \Sigma_{n_t,t-1}^{[m]}$    ▷ update covariance
    end if
    $w_t^{[m]} = w_{t-1}^{[m]} \cdot p_{n_t}^{[m]}$    ▷ importance weight
end for
loop $M$ times
    draw random index $m$ with probability $\propto w^{[m]}$    ▷ resample
    add $\langle s_t^{[m]}, N_t^{[m]}, w_t^{[m]}, \langle f_{1,t}^{[m]}, \mu_{1,t}^{[m]}, \Sigma_{1,t}^{[m]} \rangle, \ldots, \langle f_{N,t}^{[m]}, \mu_{N,t}^{[m]}, \Sigma_{N,t}^{[m]} \rangle \rangle$ to $S_t$
end loop

6 PERFORMANCE EVALUATION

In this section, we present the performance evaluation of SemanticSLAM. SemanticSLAM can work in two modes of operation: offline and online. In the online mode, the user location is reported at each estimate. The offline mode is useful in applications that can tolerate delays, such as indoor user analytics. In this case, the entire path is determined based on the best particle at the end of the user movement trace, instead of taking a local decision at every instant to select the best particle.

We start by describing our testbeds, followed by the landmark and user location detection accuracy. We end the section by comparing the accuracy of the offline and online modes of operation, as well as quantifying the advantages of using the proposed SLAM framework in terms of accuracy and convergence time compared to other systems.

6.1 Experimental Testbed

SemanticSLAM is implemented on different Android phones. The inertial sensors as well as WiFi information are sampled and sent to the server for processing.

We evaluated our system in two different testbeds: the Engineering Building at Duke University and a Shopping Mall in Alexandria, Egypt. The Engineering Building area is 3000 m² and the breakdown of its landmarks is: 9 magnetic, 8 turns, and 15 WiFi OLMs, as well as 3 SLMs, as shown in Figure 8. The Shopping Mall area is 6000 m² and the breakdown of its landmarks is: 9 magnetic, 12 turns, and 12 WiFi OLMs, as well as 4 SLMs. Due to space constraints, we only present the results of the Engineering Building testbed and refer the reader to Appendix A for the detailed results of the Mall testbed.

Three different users moved in the two testbeds to collect the data. Each user walked around arbitrarily in the building for 1.5 hours, and each of them used the landmarks detected by the previous user(s). The default setting of the system is the online mode.

6.2 Landmark Type Detection Accuracy

In this section, we evaluate the ability of SemanticSLAM to detect the landmark type accurately.

6.2.1 Seed landmarks

Table 2 shows the confusion matrix for the detection of different seed landmarks. The matrix shows that some SLMs are easier to detect than others due to their unique patterns. Overall, SemanticSLAM can achieve 0.2% false positive and 1.1% false negative rates.
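As a rough cross-check of the false negative column, the per-class and overall rates can be recomputed from the counts in Table 2. The sketch below assumes that rows are the actual classes and that the false negative rate is the fraction of a row's traces assigned to any other class; the false positive convention is not stated in the text, so it is not reproduced here.

```python
from typing import List

LABELS = ["Elevator", "Stationary", "Escalator", "Walking", "Stairs"]

# Counts copied from Table 2 (rows assumed to be the actual class).
CONFUSION = [
    [24, 0, 0, 0, 0],
    [0, 31, 1, 0, 0],
    [0, 0, 22, 0, 0],
    [0, 0, 0, 39, 0],
    [0, 0, 0, 1, 52],
]


def false_negative_rates(matrix: List[List[int]]) -> List[float]:
    """Per-class fraction of traces that were assigned to another class."""
    rates = []
    for i, row in enumerate(matrix):
        total = sum(row)
        rates.append((total - row[i]) / total if total else 0.0)
    return rates


def overall_false_negative_rate(matrix: List[List[int]]) -> float:
    """Fraction of all traces that were misclassified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total


fn_by_class = dict(zip(LABELS, false_negative_rates(CONFUSION)))
```

Under these assumptions the Stationary class gives 1/32 and the overall rate is 2/170, in line with the 3.1% and 1.1% entries if the table truncates rather than rounds.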

6.2.2 Organic landmarks

It is important that landmark signatures are stable over time and unique over distance. To evaluate these two points, we collected sensor readings on multiple days. We found sound consistency in the signatures. This is expected due to our design, where we use a low “similarity threshold” to detect an organic landmark, as discussed in Section 4. Such a conservative threshold may cause a few landmarks to be missed, but missing a few affects performance much less than matching to an incorrect landmark.

6.3 Localization Accuracy

In this section, we evaluate the effect of the number of particles and landmark density on accuracy, as well as the overall user localization accuracy.

6.3.1 Effect of the number of particles

Figure 9 shows the effect of the number of particles used for pose estimation on the overall estimation accuracy. The figure shows that the performance saturates around 50 particles, achieving a median error of about 0.53 m. Therefore, we use 50 particles for the rest of this section.

6.3.2 Seed landmarks effect

Figure 10 shows the CDF of localization error for the Engineering Building testbed with and without using the seed landmarks. The case of not using seed landmarks reflects running SemanticSLAM with no prior information at all. The figure shows that, even without the seed landmarks, SemanticSLAM can achieve high accuracy. Removing the seed landmarks reduces the localization accuracy due to the reduction of the overall number of landmarks. The Engineering Building has a higher accuracy compared to the Shopping Mall (Appendix A) due to the higher spatial density of landmarks in the Engineering Building. We note, however, that other common seed landmarks (e.g. turns, doors, and windows) can also be used in single-floor buildings to compensate for the non-existence of “vertical transport landmarks”.

6.3.3 User location accuracy

To evaluate the user localization accuracy, we calculated the Euclidean distance error at each step during the simulation. Figure 11 shows the pattern of the error in the Engineering Building. From the figure, we notice a saw-tooth pattern for the error, where the error range decreases with time. The justification for this pattern is that the error increases while the user is walking because of the noise in the sensors' readings. Whenever a landmark is observed, the error is reduced. As time passes, more landmarks are observed. Therefore, the map accuracy increases and, in turn, the user location error decreases.
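The per-step error metric and the summary statistics reported in this section (median error, CDF) can be sketched as follows. The function names and the representation of traces as lists of (x, y) points are assumptions for illustration; the example data is made up and is not from the evaluation.

```python
import math
from statistics import median
from typing import List, Tuple

Point = Tuple[float, float]


def step_errors(estimated: List[Point], truth: List[Point]) -> List[float]:
    """Euclidean distance between estimated and true location at each step."""
    return [math.hypot(ex - tx, ey - ty)
            for (ex, ey), (tx, ty) in zip(estimated, truth)]


def empirical_cdf(errors: List[float], threshold: float) -> float:
    """Fraction of steps whose error is at most `threshold` meters."""
    return sum(e <= threshold for e in errors) / len(errors)


# Illustrative (made-up) three-step trace.
errors = step_errors([(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)],
                     [(0.0, 0.0), (0.0, 0.0), (1.0, 2.0)])
median_error = median(errors)
```

Evaluating `empirical_cdf` over a range of thresholds produces the kind of CDF curve shown in the figures of this section.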

6.4 Offline vs Online localization

Figure 12 shows the CDF of the offline and online SemanticSLAM in the Engineering Building. The figure shows that the offline SemanticSLAM can significantly reduce the tail of the distribution compared to the online SemanticSLAM, with a slight enhancement in median distance error.

6.5 Advantage of using a SLAM Framework

In this section, we quantify the advantage of the proposed semantic SLAM framework. For that, we compare SemanticSLAM with a previous version (our Unloc system [1]) that does not use the SLAM framework. Specifically, Unloc does not take into account the uncertainty of the landmark or the user's location, which is considered in SemanticSLAM.

Fig. 8: Testbeds used in the evaluation: (a) Testbed 1, an engineering building in a university campus; (b) Testbed 2, a shopping mall in Alexandria, Egypt. The blue, red and yellow anchors are the inertial, magnetic, and WiFi landmarks respectively.

Fig. 9: CDF of user loc. accuracy for different number of particles. (Plot: CDF versus localization error in meters; curves for 25, 50, 75, and 100 particles.)

| | Elevator | Stationary | Escalator | Walking | Stairs | FP | FN | Traces |
|---|---|---|---|---|---|---|---|---|
| Elevator | 24 | 0 | 0 | 0 | 0 | 0% | 0% | 24 |
| Stationary | 0 | 31 | 1 | 0 | 0 | 0% | 3.1% | 32 |
| Escalator | 0 | 0 | 22 | 0 | 0 | 0.6% | 0% | 22 |
| Walking | 0 | 0 | 0 | 39 | 0 | 0% | 0% | 39 |
| Stairs | 0 | 0 | 0 | 1 | 52 | 0% | 1.8% | 53 |
| Overall | | | | | | 0.2% | 1.1% | 170 |

TABLE 2: Confusion matrix for classifying different seed landmarks.

Fig. 10: CDF of loc. accuracy with and without using seed landmarks. (Plot: CDF versus localization error; curves: with SLM, without SLM.)

Fig. 11: User localization error over time. (Plot: localization error in meters versus time in minutes.)

Fig. 12: CDF of loc. accuracy for online and offline SemanticSLAM. (Plot: CDF versus localization error; curves: offline, online.)

Fig. 13: Organic landmarks (OLM) loc. accuracy improvement with time. (Plot: OLM localization accuracy in meters versus time in hours; curves: with SLAM, without SLAM.)

Fig. 14: Advantage of using the proposed SLAM framework on loc. accuracy. (Plot: CDF versus localization error; curves: with SLAM, without SLAM.)

Fig. 15: Comparison between SemanticSLAM and other fingerprinting systems. (Plot: CDF versus localization error; curves: SemanticSLAM, WiFi [9], magnetic field [42].)

6.5.1 Convergence time

To measure the accuracy of the constructed map, at every time instance $t$ we calculate the average location error of all observed landmark positions in the estimated map. To extend the simulation time until convergence, the user movement was emulated by repeating the user's path trace in the simulation. Figure 13 shows the results for the Engineering Building. The figure shows that SemanticSLAM converges within 80 minutes and saturates at an average error of about 0.83 m. This is better than the Unloc system [1], which converges after 2 hours to around 1 m average error. Note that even though SemanticSLAM requires some time to learn the organic landmarks, the effort to learn these landmarks is transparent to the user (the data is collected through crowd-sourcing), as compared to traditional fingerprinting techniques where the user has to perform the calibration manually.
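As a back-of-the-envelope check (not taken from the evaluation itself), the relative reduction in convergence time implied by the 80 minutes and 2 hours quoted above is about one third. The symbols $T_{\text{SLAM}}$ and $T_{\text{Unloc}}$ are introduced here only for this calculation:

```latex
\[
\frac{T_{\text{Unloc}} - T_{\text{SLAM}}}{T_{\text{Unloc}}}
  = \frac{120\,\text{min} - 80\,\text{min}}{120\,\text{min}}
  = \frac{1}{3} \approx 33\%
\]
```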

6.5.2 Accuracy

In Figure 14, we compare our system accuracy with and without SLAM in the Engineering Building. The figure shows that the SLAM framework proposed in this paper can significantly enhance accuracy, achieving a 62% enhancement in median distance error.




6.6 Comparison with Other Systems

In this section, we compare SemanticSLAM with two other systems: a WiFi fingerprinting system (similar to the Horus system [9]) and a magnetic fingerprinting system (similar to the MaLoc system [42]). The magnetic-based system uses a particle filter to further improve the accuracy, which is not used in the WiFi-based system. Figure 15 shows that SemanticSLAM has better median accuracy as it uses more landmarks, both seed and organic. Fingerprinting-based systems, however, have a better worst-case error due to the initial manual calibration.

7 RELATED WORK

Our work is related to prior work in both indoor localization systems and SLAM systems.

7.1 Indoor localization

Many systems have been proposed over the years to address the indoor localization problem [3], [5], [8], [9], [18], [43]–[46]. Of these, RF-based systems, especially those based on WiFi, have gained attention due to their ubiquitous deployment. Typically, these techniques build an RF fingerprint [8], [9] of the area of interest to compensate for the noisy wireless environment. However, building this fingerprint is a tedious and time-consuming process that needs to be repeated from time to time due to environment changes.

More recently, a number of systems have been proposed to reduce the calibration and re-calibration overhead of WiFi-based localization systems, e.g. [16], [18], [19]. These systems depend on installing special hardware to monitor changes in the signal strength [16], crowd-sourcing [47] with active user participation, and/or propagation modelling tools [15], [48]–[50].

Dead-reckoning using smart phones' inertial sensors has also been used in indoor localization [51]–[54]. However, dead-reckoning error quickly accumulates, leading to complete deviation from the actual path. Therefore, GPS has been used in outdoor localization systems to recalibrate the dead-reckoned location [20], [21]. Since GPS is unreliable indoors, other techniques were used to reduce the dead-reckoning errors [52], [54], such as constraining the resulting traces with the indoor floorplan layout.

SemanticSLAM, on the other hand, is unique in leveraging environment hints (i.e. landmarks) for resetting the error in dead-reckoning. These multi-sensor landmarks are learned in an organic way, with no calibration or active involvement from the user. Moreover, by combining the SLAM framework with landmarks as observations, SemanticSLAM can achieve high accuracy with guaranteed system convergence. In addition, since data is collected all the time through the system users, any changes in the organic landmarks' locations or signatures can be captured in real time, keeping the system up-to-date.

7.2 SLAM

The SLAM algorithm was originally introduced in robotics for navigation of autonomous agents in unknown environments. It has been applied recently in human tracking systems. WiFi SLAM [55] builds a landmark map of the WiFi signal strength based on a Gaussian Process Latent Variable Model (GP-LVM). GP-LVMs provide a framework for jointly modeling concurrent constraints on WiFi signal strength measurements (observation model) and a person's motion from the phone's inertial sensors (motion model).

Similarly, ActionSLAM [56] combines dead-reckoning based on special foot-mounted inertial sensors with observations of location-related actions.

SemanticSLAM generalizes the SLAM concept to work with landmarks that can be sensed by any of the phone sensors. In addition, it introduces the concepts of seed and organic landmarks. Moreover, it is based on leveraging the standard noisy cell phone sensors, without using any external hardware.

8 CONCLUSION

In this paper, we proposed SemanticSLAM as a calibration-free indoor localization system that provides a novel SLAM algorithmic framework. We provided the architecture and details of SemanticSLAM and how it combines dead-reckoning with semantic information discovered from multiple cell phone sensors about nearby landmarks to perform accurate localization and mapping simultaneously.

The system was evaluated on two testbeds: a university building and a shopping mall. Experimental results showed that SemanticSLAM can discover different multi-modal landmarks with low false positive and false negative rates (0.2% and 1.1%, respectively, for seed landmarks). In addition, it can achieve 0.53 meters median localization error in both testbeds with fast convergence time. This is further enhanced in the offline mode of operation. Compared to the state-of-the-art indoor localization systems, SemanticSLAM provides 62% enhancement in accuracy and 33% in convergence time, highlighting its promise for next generation indoor location-based services.

Currently, we are expanding the system in multiple directions, including collaborative localization of multiple persons and exploiting other sensors (such as sound and light sensors), among others.

ACKNOWLEDGMENT

This work is supported in part by a Google Research Award to E-JUST and a grant from the Egyptian Information Technology Industry Development Agency (ITAC).




REFERENCES

[1] H. Wang, S. Sen, A. Elgohary, M. Farid, M. Youssef, and R. R. Choudhury, “No need to war-drive: Unsupervised indoor localization,” in Proceedings of the 10th international conference on Mobile systems, applications, and services. ACM, 2012, pp. 197–210.

[2] R. Want, A. Hopper, V. Falcao, and J. Gibbons, “The Active Badge Location System,” ACM Transactions on Information Systems, vol. 10, no. 1, pp. 91–102, January 1992.

[3] R. Azuma, “Tracking requirements for augmented reality,” Communications of the ACM, vol. 36, no. 7, July 1993.

[4] J. Krumm et al., “Multi-camera multi-person tracking for Easy Living,” in 3rd IEEE Int’l Workshop on Visual Surveillance, Piscataway, NJ, 2000, pp. 3–10.

[5] N. B. Priyantha, A. Chakraborty, and H. Balakrishnan, “The Cricket Location-Support System,” in 6th ACM MOBICOM, Boston, MA, August 2000.

[6] R. Fontana and S. Gunderson, “Ultra-wideband precision asset location system,” in Ultra Wideband Systems and Technologies, 2002. Digest of Papers. 2002 IEEE Conference on, May 2002, pp. 147–150.

[7] D. Niculescu and B. Nath, “Vor base stations for indoor 802.11 positioning,” in MOBICOM, 2004, pp. 58–69.

[8] P. Bahl and V. N. Padmanabhan, “Radar: An in-building rf-based user location and tracking system,” in INFOCOM 2000. Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE, vol. 2. IEEE, 2000, pp. 775–784.

[9] M. Youssef and A. Agrawala, “The Horus WLAN location determination system,” in Proceedings of the 3rd international conference on Mobile systems, applications, and services. ACM, 2005, pp. 205–218.

[10] M. Youssef, M. Abdallah, and A. Agrawala, “Multivariate analysis for probabilistic wlan location determination systems,” in Mobile and Ubiquitous Systems: Networking and Services, 2005. MobiQuitous 2005. The Second Annual International Conference on. IEEE, 2005.

[11] M. Youssef and A. Agrawala, “Location-clustering techniques for WLAN location determination systems,” International Journal of Computers and Applications, vol. 28, no. 3, 2006.

[12] A. E. Kosba, A. Abdelkader, and M. Youssef, “Analysis of a device-free passive tracking system in typical wireless environments,” in New Technologies, Mobility and Security (NTMS), 2009 3rd International Conference on, 2009.

[13] M. Seifeldin, A. Saeed, A. E. Kosba, A. El-Keyi, and M. Youssef, “Nuzzer: A large-scale device-free passive localization system for wireless environments,” Mobile Computing, IEEE Transactions on, vol. 12, no. 7, 2013.

[14] M. Seifeldin and M. Youssef, “A deterministic large-scale device-free passive localization system for wireless environments,” in Proceedings of the 3rd International Conference on PErvasive Technologies Related to Assistive Environments. ACM, 2010.

[15] K. El-Kafrawy, M. Youssef, A. El-Keyi, and A. Naguib, “Propagation modeling for accurate indoor wlan rss-based localization,” in Vehicular Technology Conference Fall (VTC 2010-Fall), 2010 IEEE 72nd. IEEE, 2010, pp. 1–5.

[16] P. Krishnan, A. Krishnakumar, W.-H. Ju, C. Mallows, and S. Ganu, “A system for LEASE: Location estimation assisted by stationary emitters for indoor rf wireless networks,” in INFOCOM 2004. Twenty-third Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 2. IEEE, 2004, pp. 1001–1011.

[17] M. Youssef, A. M. Youssef, C. Rieger, A. U. Shankar, and A. K. Agrawala, “Pinpoint: An asynchronous time-based location determination system,” in MobiSys, 2006, pp. 165–176.

[18] Y.-C. Cheng, Y. Chawathe, A. LaMarca, and J. Krumm, “Accuracy characterization for metropolitan-scale wi-fi localization,” in Proceedings of the 3rd international conference on Mobile systems, applications, and services. ACM, 2005, pp. 233–245.

[19] K. Chintalapudi, A. Padmanabha Iyer, and V. N. Padmanabhan, “Indoor localization without the pain,” in Proceedings of the sixteenth annual international conference on Mobile computing and networking. ACM, 2010, pp. 173–184.

[20] M. Youssef, M. A. Yosef, and M. El-Derini, “Gac: energy-efficient hybrid gps-accelerometer-compass gsm localization,” in Global Telecommunications Conference (GLOBECOM 2010), 2010 IEEE. IEEE, 2010, pp. 1–5.

[21] I. Constandache, R. R. Choudhury, and I. Rhee, “Towards mobile phone localization without war-driving,” in INFOCOM, 2010 Proceedings IEEE. IEEE, 2010, pp. 1–9.

[22] M. Alzantot and M. Youssef, “Crowdinside: Automatic construction of indoor floorplans,” in Proceedings of the 20th International Conference on Advances in Geographic Information Systems, ser. SIGSPATIAL ’12. ACM, pp. 99–108.

[23] M. Elhamshary and M. Youssef, “CheckInside: A fine-grained indoor location-based social network,” in Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, ser. UbiComp ’14, 2014.

[24] M. Elhamshary and M. Youssef, “SemSense: Automatic construction of semantic indoor floorplans,” in Indoor Positioning and Indoor Navigation (IPIN), 2010 International Conference on, 2015.

[25] H. Durrant-Whyte and T. Bailey, “Simultaneous localization and mapping: part i,” Robotics and Automation Magazine, IEEE, vol. 13, no. 2, pp. 99–110, 2006.

[26] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit, “FastSLAM: A factored solution to the simultaneous localization and mapping problem,” in AAAI/IAAI, 2002, pp. 593–598.

[27] D. Koller, M. Montemerlo, S. Thrun, and B. Wegbreit, “FastSLAM 2.0: an improved particle filtering algorithm for simultaneous localization and mapping that provably converges,” in Proceedings of the International Joint Conference on Artificial Intelligence, 2003.

[28] P. Moutarlier and R. Chatila, “An experimental system for incremental environment modelling by an autonomous mobile robot,” in Experimental Robotics I. Springer, 1990, pp. 327–346.

[29] P. Moutarlier and R. Chatila, “Stochastic multisensory data fusion for mobile robot location and environment modeling,” in 5th Int. Symposium on Robotics Research, vol. 1. Tokyo, 1989.

[30] M. G. Dissanayake, P. Newman, S. Clark, H. F. Durrant-Whyte, and M. Csorba, “A solution to the simultaneous localization and map building (SLAM) problem,” Robotics and Automation, IEEE Transactions on, vol. 17, no. 3, pp. 229–241, 2001.

[31] T. Bailey, “Mobile robot localisation and mapping in extensive outdoor environments,” Ph.D. dissertation, University of Sydney, 2002.

[32] J. Neira and J. D. Tardos, “Data association in stochastic mapping using the joint compatibility test,” Robotics and Automation, IEEE Transactions on, vol. 17, no. 6, pp. 890–897, 2001.

[33] M. A. Fischler and R. C. Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.

[34] H. Shatkay and L. P. Kaelbling, “Learning topological maps with weak local odometric information,” in IJCAI (2), 1997, pp. 920–929.

[35] S. Thrun, W. Burgard, and D. Fox, “A probabilistic approach to concurrent mapping and localization for mobile robots,” Autonomous Robots, vol. 5, no. 3-4, pp. 253–271, 1998.

[36] A. Araneda and A. Soto, “Statistical inference in mapping and localization for mobile robots,” in Advances in Artificial Intelligence–IBERAMIA 2004. Springer, 2004, pp. 545–554.

[37] A. Doucet, N. De Freitas, and N. Gordon, Sequential Monte Carlo methods in practice. Springer, 2001.

[38] J. S. Liu and R. Chen, “Sequential Monte Carlo methods for dynamic systems,” Journal of the American Statistical Association, vol. 93, no. 443, pp. 1032–1044, 1998.

[39] M. Alzantot and M. Youssef, “UPTIME: Ubiquitous pedestrian tracking using mobile phones,” in WCNC, 2012, pp. 3204–3209.

[40] N. Mohssen, R. Momtaz, H. Aly, and M. Youssef, “It’s the human that matters: Accurate user orientation estimation for mobile computing applications,” in Proceedings of the 11th International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services, ser. MOBIQUITOUS ’14, 2014.

[41] R. C. Smith and P. Cheeseman, “On the representation and estimation of spatial uncertainty,” The International Journal of Robotics Research, vol. 5, no. 4, pp. 56–68, 1986.

[42] H. Xie, T. Gu, X. Tao, H. Ye, and J. Lv, “Maloc: a practical magnetic fingerprinting approach to indoor localization using smartphones,” in Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing. ACM, 2014, pp. 243–253.

[43] R. Want, A. Hopper, V. Falcao, and J. Gibbons, “The active badge location system,” ACM Transactions on Information Systems (TOIS), vol. 10, no. 1, pp. 91–102, 1992.

[44] A. Ward, A. Jones, and A. Hopper, “A New Location Technique for the Active Office,” IEEE Personal Communications, vol. 4, no. 5, pp. 42–47, October 1997.

[45] M. Youssef, “Towards truly ubiquitous indoor localization on a worldwide scale,” in ACM SIGSpatial, 2015.

[46] R. Elbakly and M. Youssef, “A calibration-free RF localization system,” in ACM SIGSpatial, 2015.

[47] J.-g. Park, B. Charrow, D. Curtis, J. Battat, E. Minkov, J. Hicks, S. Teller, and J. Ledlie, “Growing an organic indoor location system,” in Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services, ser. MobiSys ’10. ACM, pp. 271–284.

[48] Y. Ji, S. Biaz, S. Pandey, and P. Agrawal, “Ariadne: a dynamic indoor signal map construction and localization system,” in MobiSys, 2006, pp. 151–164.

[49] A. Eleryan, M. Elsabagh, and M. Youssef, “Synthetic generation of radio maps for device-free passive localization,” in GLOBECOM, 2011, pp. 1–5.

[50] H. Aly and M. Youssef, “New insights into wifi-based device-free localization,” in Proceedings of the 2013 ACM conference on Pervasive and ubiquitous computing adjunct publication, 2013, pp. 541–548.

[51] D. Gusenbauer, C. Isert, and J. Krosche, “Self-contained indoor positioning on off-the-shelf mobile devices,” in Indoor Positioning and Indoor Navigation (IPIN), 2010 International Conference on. IEEE, 2010, pp. 1–9.

[52] J. A. B. Link, P. Smith, N. Viol, and K. Wehrle, “Footpath: Accurate map-based indoor navigation using smartphones,” in Indoor Positioning and Indoor Navigation (IPIN), 2011 International Conference on. IEEE, 2011, pp. 1–8.

[53] F. Li, C. Zhao, G. Ding, J. Gong, C. Liu, and F. Zhao, “A reliable and accurate indoor localization method using phone inertial sensors,” in Proceedings of the 2012 ACM Conference on Ubiquitous Computing. ACM, 2012, pp. 421–430.

[54] A. Rai, K. K. Chintalapudi, V. N. Padmanabhan, and R. Sen, “Zee: zero-effort crowdsourcing for indoor localization,” in Proceedings of the 18th annual international conference on Mobile computing and networking. ACM, 2012, pp. 293–304.

[55] B. Ferris, D. Fox, and N. D. Lawrence, “WiFi-SLAM using Gaussian process latent variable models,” in IJCAI, vol. 7, 2007, pp. 2480–2485.

[56] M. Hardegger, D. Roggen, S. Mazilu, and G. Troster, “ActionSLAM: Using location-related actions as landmarks in pedestrian SLAM,” in Indoor Positioning and Indoor Navigation (IPIN), 2012 International Conference on. IEEE, 2012, pp. 1–10.

Heba Abdelnasser is with the Wireless Research Center at E-JUST, Egypt. She received her B.Sc. in Computer and Systems Engineering from Alexandria University in 2013, and is currently preparing her M.Sc. in Computer Science in the same university. Her research interests include mobile computing, location determination systems, and NLP.

Reham Mohamed is with the Wireless Research Center at E-JUST, Egypt. She received her B.Sc. in Computer and Systems Engineering from Alexandria University in 2013, and is currently preparing her M.Sc. in Computer Science in the same university. Her research interests include mobile computing, location determination systems, and machine learning.

Ahmed Elgohary received a B.Sc. degree in Computer Engineering at Alexandria University, Egypt in 2010 and an M.A.Sc. degree in Electrical and Computer Engineering at the University of Waterloo, Canada in 2014. Ahmed is the recipient of the Best Paper Award Runner-Up, the 2013 IEEE Int. Conf. on Data Mining. He is currently pursuing a Ph.D. in Computer Science at the University of Maryland. Ahmed is broadly interested in data mining and machine learning.

Moustafa Alzantot is a PhD student in computer science at the University of California, Los Angeles. His research interests include mobile sensing systems, context-aware systems, and privacy. Moustafa received his M.Sc. in Computer Engineering from Egypt-Japan University of Science and Technology, and his B.Sc. from Tanta University. He is the recipient of the 2013 COMESA Innovation Award. Contact him at [email protected].

He Wang received his B.E. degree in Electrical Engineering from Tsinghua University in 2011 and M.S. degree in Electrical and Computer Engineering from Duke University in 2013. Since 2013, he has been pursuing his Ph.D. degree in Electrical and Computer Engineering at University of Illinois at Urbana-Champaign. He is interested in mobile computing. Currently, his research focuses on designing mobile systems involving various sensors.

Souvik Sen is a Principal Research Scientist in the Networking and Mobility group of HP Labs, where he currently leads HP’s initiatives on indoor location. His research on indoor location, wireless networking and mobile systems has been published at top conferences such as MobiSys, MobiCom and NSDI. He won HP’s Leading the Way Award in 2013 and 2014. He earned his Ph.D. in Computer Science from Duke University. He is the winner of the ACM Student Research Competition in 2010 and the Duke University Outstanding Ph.D. Preliminary Exam Award in 2011.

Romit Roy Choudhury is an Associate Professor of ECE at the University of Illinois at Urbana-Champaign (UIUC). He joined UIUC in Fall 2013, prior to which he was an Associate Professor at Duke University. Romit received his PhD in the CS department of UIUC in Fall 2006. His research interests are in wireless protocol design, mainly at the PHY/MAC layer, and in mobile computing at the application layer. Along with his students, he received a few research awards, including the 2015 ACM Sigmobile Rockstar Award, the NSF CAREER Award, the Google Faculty Award, Hoffmann Krippner Award for Engineering Innovations, best paper at Personal Wireless Communications conference, etc. Visit Romit’s Systems Networking Research Group (SyNRG), at http://synrg.csl.illinois.edu

Moustafa Youssef is an Associate Professor at Alexandria University and Egypt-Japan University of Science and Technology (E-JUST), Egypt. He received his Ph.D. degree in computer science from University of Maryland, USA in 2004 and a B.Sc. and M.Sc. in computer science from Alexandria University, Egypt in 1997 and 1999 respectively. His research interests include mobile wireless networks, mobile computing, location determination technologies, pervasive computing, and network security. He has fifteen issued and pending patents. He is an associate editor for the ACM TSAS, a previous area editor of the ACM MC2R, and served on the organizing and technical committees of numerous prestigious conferences. Dr. Youssef is the recipient of the 2003 University of Maryland Invention of the Year award, the 2010 TWAS-AAS-Microsoft Award for Young Scientists, the 2012 Egyptian State Award, the 2013 and 2014 COMESA Innovation Award, multiple Google Research Awards, the 2013 ACM SIGSpatial GIS Conference Best Paper Award, among others. He is also an ACM Distinguished Speaker.
