11
Online Filtering and Uncertainty Management Techniques for RFID Data Processing RAZIA HAIDER, FEDERICA MANDREOLI, RICCARDO MARTOGLIA FIM - University of Modena and Reggio Emilia Via Campi 213/b, 41125 Modena ITALY <name.surname>@unimo.it Abstract: RFID is one of the emerging technologies for a wide-range of applications, including supply chain and asset management, healthcare and intruder localization. However, the nature of an RFID data stream is noisy, redundant and unreliable, making it unsuitable for direct use in applications. In this paper, we propose specific RFID Online Filtering and Uncertainty Management techniques that operate on unreliable and imprecise data streams in order to transform them into reliable probabilistic data that can be meaningful to the applications. Our proposal makes use of an Hidden Markov Model (HMM) that continuously infers hidden variables (locations, in case of above example) based on sensor readings. The resulting data can be directly stored in a probabilistic database table for further analysis. All the techniques presented in this paper are implemented in a complete framework and succesfully evaluated in real-world object tracking scenarios. Key–Words: RFID data streams, Hidden Markov Model, Probabilistic Data Management, Object tracking 1 Introduction Data streams are possibly infinite sources of data that stream continuously while observing a physical phe- nomenon, e.g. temperature or humidity levels, tele- phone call records or audio video streaming, and so on. Data streams could be generated in different sce- narios by different devices, such as audio and video devices, Global Positioning System (GPS), Radio Fre- quency Identification (RFID) and other types of sen- sors. Among these, RFID is one of the emerging tech- nologies for a wide-range of applications, including supply chain and asset management [11, 30], health- care [21], monitoring the location and status of pa- tients in hospital environment [18], localizing intrud- ers for alerting services [5] and so on. RFIDs offer a promising alternative to barcode identification sys- tems. In an RFID system, an environment is deployed with the RFID readers and antennas while users and objects carry RFID tags. RFID readers detect the pres- ence of tags in their vicinity and generate streams of low-level observations in the form of TREs (Tag Read Events): (tag id, antenna id, time) that show when and where tags are being sighted. These low-level ob- servations must be transformed into high-level events meaningful to applications. For example, “Tag 101 was seen at antenna 12 at 10:00” must be transformed into meaningful relation instance such as “ Alice was in her office at 10:00”. Nevertheless, the management of RFID data in transforming low-level streams in to high-level events poses a number of challenges [4, 14]. In particular, the nature of an RFID data stream is noisy, redun- dant and unreliable, making it unsuitable for direct use in applications. RFID deployments, generally, pro- duce imprecise data mainly because of the following reasons: (a) Missing Readings: Loss of reading in- stances in which RFID tags are not detected by the antenna while actually being present within its cover- age area. This is a phenomenon whose causes are en- tirely separate from the specific application scenario and the technologies used in the construction of the devices; the incidence of this phenomenon is, how- ever, high and not negligible: recent studies report that an RFID reader is usually able to detect only 60% -70% of tags that are in its vicinity [9, 15]; (b) Data- Information Mismatch: Mismatch between the infor- mation to which the application is concerned and the data produced by the sensors. Typically an applica- tion is particularly interested in high-level information such as “who is in a certain place at a given time”, “the place where he can be”, for example, a room, a spe- cific area, or near by an object. The sensors are lim- ited to providing data in form of low-level signaling i.e., “when a tag is detected by a certain antenna”. For all of these reasons the generated stream of raw data becomes unreliable for RFID applications and makes them not suitable to be directly used for WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONS Razia Haider, Federica Mandreoli, Riccardo Martoglia E-ISSN: 2224-3402 231 Volume 11, 2014

Online Filtering and Uncertainty Management Techniques for RFID Data … · 2014. 11. 6. · Key–Words: RFID data streams, Hidden Markov Model, Probabilistic Data Management, Object

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

  • Online Filtering and Uncertainty Management Techniquesfor RFID Data Processing

    RAZIA HAIDER, FEDERICA MANDREOLI, RICCARDO MARTOGLIAFIM - University of Modena and Reggio Emilia

    Via Campi 213/b, 41125 ModenaITALY

    @unimo.it

    Abstract: RFID is one of the emerging technologies for a wide-range of applications, including supply chain andasset management, healthcare and intruder localization. However, the nature of an RFID data stream is noisy,redundant and unreliable, making it unsuitable for direct use in applications. In this paper, we propose specificRFID Online Filtering and Uncertainty Management techniques that operate on unreliable and imprecise datastreams in order to transform them into reliable probabilistic data that can be meaningful to the applications. Ourproposal makes use of an Hidden Markov Model (HMM) that continuously infers hidden variables (locations,in case of above example) based on sensor readings. The resulting data can be directly stored in a probabilisticdatabase table for further analysis. All the techniques presented in this paper are implemented in a completeframework and succesfully evaluated in real-world object tracking scenarios.

    Key–Words: RFID data streams, Hidden Markov Model, Probabilistic Data Management, Object tracking

    1 Introduction

    Data streams are possibly infinite sources of data thatstream continuously while observing a physical phe-nomenon, e.g. temperature or humidity levels, tele-phone call records or audio video streaming, and soon. Data streams could be generated in different sce-narios by different devices, such as audio and videodevices, Global Positioning System (GPS), Radio Fre-quency Identification (RFID) and other types of sen-sors. Among these, RFID is one of the emerging tech-nologies for a wide-range of applications, includingsupply chain and asset management [11, 30], health-care [21], monitoring the location and status of pa-tients in hospital environment [18], localizing intrud-ers for alerting services [5] and so on. RFIDs offera promising alternative to barcode identification sys-tems. In an RFID system, an environment is deployedwith the RFID readers and antennas while users andobjects carry RFID tags. RFID readers detect the pres-ence of tags in their vicinity and generate streams oflow-level observations in the form of TREs (Tag ReadEvents): (tag id, antenna id, time) that show whenand where tags are being sighted. These low-level ob-servations must be transformed into high-level eventsmeaningful to applications. For example, “Tag 101was seen at antenna 12 at 10:00” must be transformedinto meaningful relation instance such as “ Alice wasin her office at 10:00”.

    Nevertheless, the management of RFID data intransforming low-level streams in to high-level eventsposes a number of challenges [4, 14]. In particular,the nature of an RFID data stream is noisy, redun-dant and unreliable, making it unsuitable for direct usein applications. RFID deployments, generally, pro-duce imprecise data mainly because of the followingreasons: (a) Missing Readings: Loss of reading in-stances in which RFID tags are not detected by theantenna while actually being present within its cover-age area. This is a phenomenon whose causes are en-tirely separate from the specific application scenarioand the technologies used in the construction of thedevices; the incidence of this phenomenon is, how-ever, high and not negligible: recent studies reportthat an RFID reader is usually able to detect only 60%-70% of tags that are in its vicinity [9, 15]; (b) Data-Information Mismatch: Mismatch between the infor-mation to which the application is concerned and thedata produced by the sensors. Typically an applica-tion is particularly interested in high-level informationsuch as “who is in a certain place at a given time”, “theplace where he can be”, for example, a room, a spe-cific area, or near by an object. The sensors are lim-ited to providing data in form of low-level signalingi.e., “when a tag is detected by a certain antenna”.

    For all of these reasons the generated stream ofraw data becomes unreliable for RFID applicationsand makes them not suitable to be directly used for

    WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONSRazia Haider, Federica Mandreoli, Riccardo Martoglia

    E-ISSN: 2224-3402 231 Volume 11, 2014

  • further analysis. To this end, in this paper we proposespecific RFID Online Filtering & Uncertainty Man-agement techniques that operate on unreliable andimprecise data streams in order to transform theminto reliable probabilistic data that can be meaning-ful to the applications. A common way of dealingwith such kind of imprecise data is to build a modelof the data and use stream of raw readings as inputto the model. Our proposal makes use of a tempo-ral graphical model [19], a Hidden Markov Model(HMM) [27] that continuously infers hidden variables(locations, in case of above example) based on sen-sor readings. Such a relation, becomes a probabilis-tic relation At(tagID,location,time,prob)that can be directly stored, for instance, in a (proba-bilistic) database table and queried to detect complexevents meaningful to applications [29]. An exampletuple is (101,O1,10:00,0.7), which indicatesthat tag 101 at time 10:00 was in office O1 with prob-ability 0.7.

    All the techniques presented in this paper are im-plemented in a complete framework and evaluatedunder real-cases in the context of location tracking.However, they can be applicable in other contexts ofRFID data management applications. The rest of thepaper is organized in the following way: Section 2describes some background notions about probabilis-tic graphical models, Section 3 describes the filteringand uncertainty management techniques we propose,Section 4 contextualizes the techniques in a completeRFID data acquisition and management framework,while in Section 5 we present extensive experimentsin real object tracking scenarios, showing a very goodreliability of the proposed techniques. Finally, Section6 analyzes related works and gives some concludingremarks.

    2 Background: Probabilistic Graph-ical Models

    2.1 Representation

    Graphical models [20] are the combination of prob-ability theory and graph theory. They provide a nat-ural tool for dealing with uncertainty and complexityproblems that exist in many real world applications.The graphical models basically work on the conceptof modularity; a complex system is built by combin-ing simpler parts. Probabilistic graphical models [19]are graphs in which nodes represent random variables.Arcs, or the lack of arcs, represent conditional in-dependence assumptions. Therefore, they provide acompact representation of joint probability distribu-tions.

    There are two types of probabilistic models: undi-rected and directed graphical models. DBNs are theexample of directed graphical models of stochasticprocesses. They are used to compactly represent thestochastic evolution of a set of variables over time,where the graph structure captures the complex in-terdependencies between the variables of the pro-cess. DBNs generalize HMMs and linear dynamicsystems (LDSs) [13] by representing the hidden (andobserved) state in terms of state variables, which canhave complex interdependencies. The graphical struc-ture provides an easy way to specify these conditionalindependencies, and hence, to provide a compact pa-rameterizations of the model.

    Since the solutions we present in this paper arebased on HMMs, we will now focus on HMMs.

    2.1.1 Hidden Markov Models

    HMMs have one discrete hidden node variable andone discrete or continuous observed node variable perslice. HMMs are an often used model for time seriesdata. They are used in various applications such asimage recognition, pattern recognition, data compres-sion and speech recognition. They represent probabil-ity distributions over sequences of observations.

    Definition: An HMM is formally defined as a fi-nite set of discrete states (it can be multidimen-sional), each of which is associated with a prob-ability distribution. Since the states are discrete,transitions among states are controlled by set ofprobabilities called transition probabilitiesAij =P (St+1 = S

    (i)|St = S(i)). In particular, stateobservations can be generated according to theassociated probability distributions. Only the ob-served value, not the state, is visible to an exter-nal observer; hence, states are hidden.

    Parameters of HMMs: In order to define an HMM,the following parameters are required:

    • N , the number of states in model S ={S1, S2, . . . , SN};• M , the number of possible observations;• The initial state distribution Π = {Πi}

    where Π = P {q0 = Si} , 1 ≤ i ≤ N ;• The state transition probabilities {Aij}

    where Aij = P {qt+1 = Sj |qt = Si} 1 ≤i, j ≤ N , where qt denotes the currentstate;

    • The observation/emission probabilitiesB = bi(j):

    WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONSRazia Haider, Federica Mandreoli, Riccardo Martoglia

    E-ISSN: 2224-3402 232 Volume 11, 2014

  • bi (j) = P {ot = vk|qt = j} ,1 ≤ j ≤ N, 1 ≤ k ≤M,where Vk denotes the kth observation andOt the current parameter vector.

    Given an HMM, there are two basic problems ofinterest that must be solved for the model to be usefulin real-world applications:

    • Inference: Given the observation sequenceO = o1, o2, ..., ot and a model λ =(N,M,Πi, Aij , bi(j)), how to choose a corre-sponding state sequence S = s1, s2, ...st whichis optimal in some meaningful sense (i.e. bestexplain the observation);

    • Learning: Given the observation sequenceO = o1, o2, ..., ot and a model λ =(N,M,Πi, Aij , bi(j)), how to adjust the modelparameters in order to maximize P (O|λ).

    2.2 Inference

    In various real-world data streams, the elements of in-terest may not be directly observable (e.g. locationinformation in a raw data stream coming from RFIDtracking application [29, 16, 31]), or it may be veryexpensive to measure them. A common way to pro-cess such kind of data streams is to continuously inferthe value of the hidden variables by using observeddata. Different types of methods allow us to combineprior domain knowledge about the system behaviorwith the actually observed variables to compute thebest possible estimate of the hidden variables. Thistask is known as “inference”.

    While “exact inference” algorithms can be effec-tively used in simple cases, such as linear dynamicsystems (LDSs), most of them face severe challengesfor large, densely connected models with high up-date rates. In order to handle the intractability inreal-world scenarios “approximate inference” algo-rithms have been developed. In particular, recursiveestimate techniques such as Particle Filtering [8] arememoryless inferencing techniques which are partic-ularly effective in such contexts. They are MonteCarlo sampling based techniques implementing recur-sive Bayesian filters. The basis of the method is torepresent the posterior density by a set of random par-ticles with associated weights and then compute esti-mates based on these samples and weights. The higherweights specify more probable states.

    2.3 Learning

    A probabilistic graphical model is usually representedby the Conditional Probability Distributions (CPD),

    which are referenced as parameters of the model.These CPDs are used to define the transition modelP (St|St−1) and the observation model P (Ot|St).“Learning” is the process of estimating these parame-ters from training data.

    Maximum Likelihood Estimation (MLE) [26] isone of the most widely used statistical techniques tolearn the parameters of a CPD. From the training data,this provides an estimate of the values of the param-eter θ of the CPD, which maximizing the likelihoodof observing that data. Specifically, given a data sam-ple X1, ..., Xn, assumed to be independent and iden-tically distributed (iid) from a parametric distributionwith unknown parameters, the purpose of MLE is toestimate the value of the unknown parameters.

    In particular, the MLE method allows us to de-rive the joint probability distribution P (Ot, St). Fi-nally, by applying Bayes’ Theorem, we can obtainthe conditional probability distribution of the obser-vations P (Ot|St).

    3 Online Filtering & UncertaintyManagement

    In this section, we will present in detail the techniqueswe exploit in order to provide filtering and uncertaintymanagement to RFID data. The reference scenariowill be location tracking. The techniques will then beput in context in Section 4, where they will be shownas being at the heart of a complete RFID data manage-ment framework.

    3.1 Representation

    The detailed block diagram of the involved processis shown in Figure 1. In the reference scenario, theinterest of the application is to infer the positions ofpeople and/or objects over time on the basis of theRFID readings collected by the reader. Positions arenot being directly observable and are considered as thehidden variables, while the readings are our observ-able events, or simply our “observations”. Thus, thisprocess uses an HMM to produce, at each timestamp,a distribution over each tag location (i.e. the hiddenvariables or states) based on observations, i.e. sen-sor readings. These observations include four typesof information: 1) the identifier of the tag the read-ing concerns to; 2) the identifier of the antenna(s) thetag is seen by; 3) the Received Signal Strength Indica-tor(s) (RSSIs) of the reading; 4) the timestamp of thereading. The employed model allows us to combineprior domain knowledge about the system behaviorwith the actual observations, so to compute the most

    WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONSRazia Haider, Federica Mandreoli, Riccardo Martoglia

    E-ISSN: 2224-3402 233 Volume 11, 2014

  • RFID Data Online Filtering & Uncertainty Management

    ……………….!

    (Person, LocID, Timestamp, Prob) !(Bob, L1, t1, 0.8) !

    …………!

    Hidden&Markov&Model&(HMM)&

    Figure 1: Block Diagram of the Online Filtering andUncertainty Management process

    likely values of the hidden variables. While observa-tions are directly evaluable, the prior knowledge aboutthe system is represented by Conditional ProbabilityDistributions (CPDs) which are referenced as the pa-rameters of the HMM.

    A graphical representation of designed HMM isshown in Figure 2. The nodes of the graph representthe variables (hidden states and observations) of themodeled system, while the directional arcs representthe concept of “causality”, whose degree is indicatedby the corresponding CPD. Specifically, square nodesin the graph represent the observationsO and thus cor-respond to measurements collected by RFID antennas,while round nodes represent states and thus coincidewith the location of people (for these reason, in thefollowing we will denote each of these states as L).It is noted that, according to the well-known Markovprinciple, the model assumes that the variables at timet directly depend on the variables at time t and t − 1only and, hence, two consecutive time instances aresufficient for completely representing the whole sys-tem. The other parameters of the HMM, or the CPDthat describe the relationship between the variablesthat are represented as directional arcs in Figure 2, arelisted below:

    1. the initial states distribution P (L0) encodesknowledge about the initial state of the system(i.e. at the time instant 0);

    2. the transition probability distributionP (Lt+1|Lt) encodes the knowledge of howthe state of the hidden variables at time instantt+ 1 depends on the state at time instant t;

    T!=0! T=t! T=t+1! T=t+n!

    L0!

    O0!

    …!

    …!

    Lt!

    Ot!

    Lt+1!

    Ot+1!

    …!

    …!

    Lt+n!

    Ot+n!

    P(Ot|Lt )!

    P(Lt+1|Lt )!P(L0 )!

    Figure 2: Graphical Representation of the HiddenMarkov Model Used

    3. the observation probability distributionP (Ot|Lt) encodes the knowledge of howthe observations at time instant t depend on thestate of the hidden variables at time instant t.

    3.2 Learning

    In order to maximize the effectiveness of the filteringand uncertainty management techniques in the con-sideed location tracking context, the above discussedCPDs are modeled as follows:

    1. the initial states probability P (L0): it is as-sumed to be a uniform distribution among all thepossible locations;

    2. the transition probability P (Lt|Lt−1): it is mod-eled as a matrix whose rows ad columns are as-sociated to the available locations so that eachcell [i, j] contains the probability value of havinga movement from location i to location j (as anexample, if two locations are separated by a wallthe corresponding cell will contain the value 0);

    3. finally, the observation probability P (Ot|Lt):this information is typically not available and,thus, has to be learned from training data. Tothis end, we adopt a Maximum Likelihood Es-timation (MLE) approach: given learning data,we estimate the value of the probability functionparameter that maximizes the likelihood of theobserved data (i.e. that makes the learning data“most likely”). Actually, MLE allows us to com-pute the conjunctive probability P (Ot, Lt), fromwhich observation probability P (Ot|Lt) can beeasily computed by applying the Bayes theorem.

    3.3 Inference

    Our final aim of modeling a stochastic process withan HMM is to obtain the posterior probability dis-

    WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONSRazia Haider, Federica Mandreoli, Riccardo Martoglia

    E-ISSN: 2224-3402 234 Volume 11, 2014

  • RFID processing (detailed)

    OT,t

    P(OT,t|Lt)

    Resampling

    InferenceParticle

    init

    P(Lt+1|Lt)

    P(L0)

    Prediction

    Filtering

    prevParticles

    pt-1 i

    Particlespt

    i

    RFID Processing

    filteredparticles

    PT,t(L)

    Figure 3: Detail of the steps performed for RFID On-line Filtering & Uncertainty Management inference

    tribution P (Lτit ) over the hidden variable Lτit (loca-

    tion of tag τi at time instant t) given the observedmeasurements (“inference” task). Among the others,we decided to exploit the popular Monte Carlo al-gorithm called Particle Filtering [3], usually adoptedin sample-based inference processes. The algorithmworks by computing and constantly maintaining setsof particles to describe the historical and present statesof the model. Figure 3 represents a schema of thesteps executed by the algorithm at each time instantt. Specifically, given the observed values oτit for eachidentified tag τi, the algorithm works by iteratively ex-ecuting the following steps:

    Initialization: during this phase, an initial set ofparticles is created by randomly sampling from theinitial states probability P (L0);

    Prediction: during this phase, the state of hiddenvariables at time t is estimated by using their state attime t−1 and exploiting the parameters of the HMM.More precisely, for each existing particle pit−1 at timet− 1 a new particle pit is created for time t by sam-pling from P (Lt|Lt−1);

    Filtering: in this phase, the observation ot ar-rived at time t are used to update the states previouslyestimated for time t. More precisely, each particlepit is assigned a weight based on the values of theobserved variables at time t and on the observationprobability P (Ot|Lt). This weight is proportional toP (Ot = o

    τit |Lt = λ) where λ is the location of pit;

    Re-sampling: in this phase, the particles createdin the Filtering step are re-sampled in order to gener-ate a new set of particles, all with the same weight.This task is necessary in order to avoid degeneracy,i.e. the case where a single particle has all the weight.

    Broadly speaking, each particle pit represents aguess about the location of tag τi. Then, after anumber of iterations, the inference task is performed:to compute the posterior probability P (Lτit ) we justneed to count the number of particles in each locationand divide it by the total number.

    4 Filtering Techniques in Context:the Complete RFID Data Manage-ment Framework

    In this section, we will contextualize the filtering anduncertainty management techniques presented in thispaper in a complete RFID Data Management Frame-work, the one we exploited in order to verify their ef-fectiveness in a location tracking application.

    • At the lowest part of the framework, RFID read-ers and tags, managed in a Data AcquisitionLayer (see Section 4.1), provide raw RFID data;

    • Raw data is the input to a Data Filtering Layer,at the heart of the framework, which imple-ments the techniques discussed in the previoussections. The results of their application is thetransformation of the raw data into a streamof “filtered” tuples, according to the schema(Person, Location, T ime, Probability);

    • such filtered probabilistic tuples can then bemanaged, queried and stored in a standard proba-bilistic database, as will be briefly discuss in Sec-tion 4.2.

    4.1 RFID Data Acquisition

    At the lowest level of the framework is the DataAcquisition Layer, which is populated by RFID de-vices including RFID tags and readers. RFID tagsare attached to the objects and people that have to betracked, while RFID readers receive data from thesetags in the form of radio signals and convert them indigital form to pass it to the upper levels of the frame-work. In the following, we will give some details onthe hardware configuration that we exploited (and towhich the results presented in Section 5 will refer).

    Figures 4, 5 and 6 present the RFID reader, anten-nas and tags we employed in instantiating our frame-work. These are some specifications that can be usefulin order to better understand how the proposed tech-niques work and perform:

    • Reader: we used a fixed reader that can interro-gate tags at distances of up to 300 feet (100 me-

    WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONSRazia Haider, Federica Mandreoli, Riccardo Martoglia

    E-ISSN: 2224-3402 235 Volume 11, 2014

  • Figure 4: A fixed reader of our framework

    Figure 5: An Elliptical Polarized Antenna of ourframework

    ters) (Figure 4). The reader establishes the con-nection to the host system by using the RS422 in-terface. For data exchange, a simple master/slaveprotocol is used by the reader. The protocolalso gives us some additional information such astime of data reception, signal strength and num-ber of times the tag has been read by the reader;

    • Antennas: the choice of antennas depended onthe type and requirement of the application. AnElliptical Polarized Antenna (Figure 5) has awide apex angle of (120◦), which enables it tocover large read zone. Therefore, it is capableof reading a large number of tags at one timeeven at fast speeds. The orientation of the tagsrelative to the antenna is not important. On theother hand, a Linear Polarized Antenna is moresuitable for applications in which read zones arerestricted and data collection must be selective.

    Figure 6: An active tag exploited in our framework

    This antenna has smaller apex angle of (60◦).The field of antenna is either horizontally or ver-tically polarized depending on the mounting di-rection, thus requiring the tag to have the sameorientation. Elliptical antennas are the ones mostsuited to our purposes and have been used for fi-nal experimentation;

    • Tags: we employed active RFID tags based onUHF radio frequency (Figure 6). The tags arecapable of providing long range for wireless ap-plications and can transmit data at distances of upto 300 feet (100 meters) to readers. The tags con-tinuously send static data written in their memoryat pre-programmed intervals known as ping rate.Ping rate can be one second to four minutes (onesecond in our setup). Due to the ultra-low powerconsumption of the active tags, an operationallifetime of up to 6 years can be expected mak-ing them suitable for identification and trackingapplications.

    4.2 RFID Data Storage and Querying

    .Even if not at the focus of this paper, we will

    complete the description of the framework by provid-ing a short description of how the tuples produced bythe data filtering layer can be stored and queried inthe context of a full RFID location tracking applica-tion. Since the output of our filtering and uncertaintymanagement techniques are filtered probabilistic tu-ples, they can be directly and effectively stored in aprobabilistic database management system (an exam-ple is MayBMS [1]). A probabilistic database storesdata by means of special U-relational tables, provid-ing a complete and concise representation of the largenumber of possible worlds that are generated in thepresence of probabilistic tuples [2]. It also provides anexpressive query language that supports the entire setof capabilities offered by SQL and extends it with fea-tures designed to support the probability and to work

    WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONSRazia Haider, Federica Mandreoli, Riccardo Martoglia

    E-ISSN: 2224-3402 236 Volume 11, 2014

  • --Who was at ‘L1’ in first minute?

    SELECT tagId, conf()

    FROM Pw

    WHERE LocationId='L1'

    AND instant

  • 0

    0.001

    0.002

    0.003

    0.004

    0.005

    0.006

    0.007

    0.008

    0.009

    0.01

    1 51 101 151 201 251 301 351 401 451

    Erro

    r

    Time[s]

    Avg. Location Error

    (a) Estimated vs Ground truth Error

    0

    0.001

    0.002

    0.003

    0.004

    0.005

    0.006

    0.007

    0.008

    0.009

    0.01

    1 51 101 151 201 251 301 351 401 451

    Erro

    r

    Time[s]

    Estimated Vs Groud truth Error

    (b) Location Error

    Figure 9: Case 1: Stay with 1 Tag

    mated answer reports the same location as the groundtruth (the higher the value the better).

    In the following, there is description of each caseand the obtained results from these cases.

    Case 1: Stay with 1 Tag: this case considers oneperson with an RFID tag moving between locationsbut staying for some time on each of the location. Fig-ures 9 (a,b) show the achieved results for this case.The average precision is 96.95%, which is a very sat-isfying figure.

    Case 2: Stay with 2 Tags: in this case, two peo-ple wearing RFID tags walk side by side and stay oneach location. Figures 10 (a,b) show the results of ex-periments done in this case. Both persons were walk-ing together and change their locations on the sametime instants and, again, this behavior is shown by allgraphs in overlapping results for both tags. The aver-age precision is 95.39%.

    Case 3: No Stay with 1 Tag: in this case, a per-son wearing an RFID tag that transmits every secondrapidly moves between L1, L2 and L3 and does notstay at any of them. Please note that the movementscenario of this case (and case 4) could potentially bea difficult situation for capturing the exact locations,due to the fact that people move rapidly and do notstay on one particular point. Therefore, it could not beeasy to produce stable RSSI values from the RFID an-tennas. Figures 11 (a,b) show the estimated vs groundtruth error and location error for this case, respec-

    0

    0.001

    0.002

    0.003

    0.004

    0.005

    0.006

    0.007

    0.008

    0.009

    0.01

    1 51 101 151 201

    Erro

    r

    Time[s]

    Avg. Location Error

    Tag 1

    Tag 2

    (a) Estimated vs Ground truth Error

    0

    0.001

    0.002

    0.003

    0.004

    0.005

    0.006

    0.007

    0.008

    0.009

    0.01

    1 51 101 151 201

    Erro

    r Time[s]

    Estimated Vs Ground truth Error

    Tag 1

    Tag 2

    (b) Location Error

    Figure 10: Case 2: Stay with 2 Tags

    tively. As we can see from Figure 11 (b), our tech-niques reported a wrong estimated location at onlyone second. Similarly, if we consider estimated vsground truth error, it is clear from Figure 11 (a) thatthe highest peak of error is nearly 0.008 which is awrong location according to ground truth, while allother values are lower and correspond to correct loca-tions. The resulting average precision for this case is99%, again a very satisfying result.

    Case 4: No Stay with 2 Tags: this case considersthe same movement scenario of case 3 but the num-ber of involved people is two, holding RFID tags andwalking side by side. Figure 12 (a,b) show the ob-tained results for this case. Both persons were walk-ing side by side and changing their locations togetherand this behavior of movement is very clear from theresulting graphs. In this case, the average precision is87.05%.

    6 Related Works and ConcludingRemarks

    In last few decades, RFID technology has emergedsignificantly with many real time applications, suchas product tracking and asset management, object andpeople authentication, health care etc. Nevertheless,data management in these RFID applications poses anumber of challenges [4]. Among the issues that needto be effectively faced in most RFID deployments,

    WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONSRazia Haider, Federica Mandreoli, Riccardo Martoglia

    E-ISSN: 2224-3402 238 Volume 11, 2014

  • 0

    0.001

    0.002

    0.003

    0.004

    0.005

    0.006

    0.007

    0.008

    0.009

    0.01

    1 21 41 61 81

    Erro

    r

    Time[s]

    Avg. Location Error

    (a) Estimated vs Ground truth Error

    0

    0.001

    0.002

    0.003

    0.004

    0.005

    0.006

    0.007

    0.008

    0.009

    0.01

    1 21 41 61 81

    Erro

    r

    Time[s]

    Estimated Vs Ground truth Error

    (b) Location Error

    Figure 11: Case 3: No Stay with 1 Tag

    avoiding missing/wrong readings and being able toextract high-level complex events from the huge vol-umes of low-level atomic events acquired by the sen-sors are particularly critical and challenging tasks.

    Several techniques have been proposed for theanalysis and processing of raw noisy RFID data[28]. A number of techniques propose to clean datastreams deterministically. For instance, [10] proposesa declarative framework for RFID data cleaning andprocessing which makes use of a window-based adap-tive smoothing filter, producing more reliable RFIDdata streams by interpolating missed readings.

    Other techniques, instead, exploit the probabilis-tic nature of RFID data and manage their inherentuncertainty in the form of probabilities and correla-tions, so to achieve even higher effectiveness in theapplication scenarios they are applied to [29, 31, 17].For instance, [29, 16] generate probabilistic streamsby inference on an HMM. Then, probabilistic infer-ence is required in order to extract high-level com-plex events from the low-level atomic events acquiredby the readings. For example, in tracking applica-tions, the location of the objects is unknown to thesystem and observed low level sensor data is trans-

    0

    0.01

    0.02

    0.03

    0.04

    0.05

    0.06

    0.07

    0.08

    0.09

    0.1

    1 51 101

    Erro

    r

    Time[s]

    Avg. Location Error

    Tag 1

    Tag 2

    (a) Estimated vs Ground truth Error

    0

    0.01

    0.02

    0.03

    0.04

    0.05

    0.06

    0.07

    0.08

    0.09

    0.1

    1 51 101

    Erro

    r

    Time[s]

    Estimated vs Ground truth Error

    Tag 1

    Tag 2

    (b) Location Error

    Figure 12: Case 4: No Stay with 2 Tags

    lated into precise and more reliable estimates aboutthe location of these objects [29, 17]. Note that allsuch RFID systems define locations on the basis ofactual places/areas which are of interest to the finalusers (e.g. a restricted-access room), as reflected alsoby the supported queries and the produced results (e.g.“Find out which rooms entered Paul today”).

    In [6, 7], Deshpande et al. discuss techniquesbased on probabilistic model in order to handle inputerrors and inaccuracies. These techniques are basedon temporal and spatial correlations to predict miss-ing values, to identify outliers and to approximate an-swers to queries. Most of them mainly deal with inac-curacy errors present in raw RFID data, filtering andsmoothing operations before feeding into higher levelapplications, thus not dealing (and not exploiting) the“meaning” of the managed information.

    A number of probabilistic techniques have alsobeen proposed for the analysis and transformation ofRFID low-level data streams into meaningful infor-mation in order to deal with data-information mis-match problem. These techniques, exploit the proba-bilistic nature of RFID data and manage their inherentuncertainty in the form of probabilities and correla-tions, so to achieve even higher effectiveness in the ap-plication scenarios they are applied to [17, 29, 31, 32].

    WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONSRazia Haider, Federica Mandreoli, Riccardo Martoglia

    E-ISSN: 2224-3402 239 Volume 11, 2014

  • For instance [16, 29] generate probabilistic streams byinference on an HMM. Then, probabilistic inference isrequired in order to extract high-level complex eventsfrom the low-level atomic events acquired by the read-ings. For example, in tracking applications, the loca-tion of the objects is unknown to the system and theobserved low level sensor data is translated into pre-cise and more reliable estimates about the location ofthese objects by implementing an HMM [29, 31].

    In this paper, we presented filtering and uncer-tainty management techniques for RFID probabilisticdata management which are able to convey the RFIDdata to higher level data information modules, filter-ing inaccuracy errors and smoothing the raw RFIDstreams.

    The proposed techniques, also in the light of thesuccessful experimental evaluation we performed inreal-world object tracking scenarios: (a) differentlyfrom most of the techniques available in the literature,work effectively without knowing in advance any spe-cific information characterizing data uncertainty, suchas the entire probability density function or standarderror data available; (b) achieve the ultimate goal oftransforming raw RFID data into reliable meaningfulprobabilistic data streams.

    In the future, we will continue our work in mak-ing RFID data available to high level data manage-ment modules. In particular, by extending the tech-niques developed in complementary research fields,such as semantic data sharing and querying [12, 22,23, 24, 25], toward RFID data, we will contemplatethe feasibility of querying in a uniform way multipleRFID streams together with other kinds of heteroge-neous data sources.

    References:

    [1] http://www.cs.cornell.edu/bigreddata/maybms/.

    [2] L. Antova, T. Jansen, C. Koch, and D. Olteanu.Fast and simple relational processing of uncer-tain data. In Data Engineering, 2008. ICDE2008. IEEE 24th International Conference on,pages 983–992. IEEE, 2008.

    [3] Doucet Arnaud, Nando de Freitas, and GordonNeil. Sequential Monte Carlo Methods in Prac-tice. Springer, 2005.

    [4] S. S Chawathe, V. Krishnamurthy, S. Ramachan-dran, and S. Sarma. Managing RFID data. InProceedings of the Thirtieth international con-ference on Very large data bases-Volume 30,pages 1189–1195, 2004.

    [5] R. Cucchiara, M. Fornaciari, R. Haider, F. Man-dreoli, R. Martoglia, A. Prati, and S. Sassatelli.A Reasoning Engine for Intruders’ Localizationin Wide Open Areas using a Network of Cam-eras and RFIDs. In Proceedings of 1st IEEEWorkshop on Camera Networks and Wide AreaScene Analysis. IEEE, 2011.

    [6] A. Deshpande, C. Guestrin, and S. Madden. Us-ing probabilistic models for data managementin acquisitional environments. In Proc. CIDR,pages 317–328, 2005.

    [7] A. Deshpande, C. Guestrin, S. R Madden, J. MHellerstein, and W. Hong. Model-based approx-imate querying in sensor networks. The VLDBJournal, 14(4):417–443, 2005.

    [8] A. Doucet, S. Godsill, and C. Andrieu. Onsequential monte carlo sampling methods forbayesian filtering. Statistics and computing,10(3):197–208, 2000.

    [9] C. Floerkemeier and M. Lampe. Issues withRFID usage in ubiquitous computing applica-tions. Pervasive Computing, pages 188–193,2004.

    [10] M. J Franklin, S. R Jeffery, S. Krishna-murthy, F. Reiss, S. Rizvi, E. Wu, O. Cooper,A. Edakkunni, and W. Hong. Design consid-erations for high fan-in systems: The HiFi ap-proach. In Proc. of the CIDR Conf, 2005.

    [11] H. Gonzalez, J. Han, X. Li, and D. Klabjan.Warehousing and analyzing massive RFID datasets. In 22nd International Conference on DataEngineering, ICDE’06. IEEE Computer Society,2006.

    [12] F. Grandi, F. Mandreoli, R. Martoglia,E. Ronchetti, M. R. Scalas, and P. Tiberio.Ontology-based personalization of e-government services. In Intelligent UserInterfaces: Adaptation and Personalization Sys-tems and Technologies, Constantinos Mourlasand Panagiotis Germanakos (Ed.), IGI Global,pages 167–187. 2008.

    [13] G:Welch and G.Bishop. An introduction to thekalman filter. 2002.

    [14] Y. Hu, S. Sundara, T. Chorma, and J. Srinivasan.Supporting rfid-based item tracking applicationsin oracle dbms using a bitmap datatype. In Pro-ceedings of the 31st international conference onVery large data bases, pages 1140–1151. VLDBEndowment, 2005.

    WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONSRazia Haider, Federica Mandreoli, Riccardo Martoglia

    E-ISSN: 2224-3402 240 Volume 11, 2014

  • [15] S. R Jeffery, M. Garofalakis, and M. J Franklin.Adaptive cleaning for RFID data streams. InProceedings of the 32nd international confer-ence on Very large data bases, pages 163–174,2006.

    [16] B. Kanagal and A. Deshpande. Online filtering,smoothing and probabilistic modeling of stream-ing data. In Data Engineering, 2008. ICDE2008. IEEE 24th International Conference on,pages 1160–1169, 2008.

    [17] N. Khoussainova, M. Balazinska, and D. Suciu.Probabilistic event extraction from RFID data.pages 1480–1482, 2008.

    [18] D.S. Kim, J. Kim, S.H. Kim, and S.K. Yoo. De-sign of RFID based the Patient Management andTracking System in hospital. In Engineeringin Medicine and Biology Society, EMBS. 30thAnnual International Conference of the IEEE,pages 1459–1461. IEEE, 2008.

    [19] D. Koller and N. Friedman. Probabilistic graph-ical models: principles and techniques. TheMIT Press, 2009.

    [20] S.L. Lauritzen. Graphical models, volume 17.Oxford University Press, USA, 1996.

    [21] A. Anny Leema and M. Hemalatha. An effectiveand adaptive data cleaning technique for colossalrfid data sets in healthcare. WSEAS Trans. Info.Sci. and App., 8(6):243–252, 2011.

    [22] F. Mandreoli and R. Martoglia. Knowledge-based sense disambiguation (almost) for allstructures. Information Systems (Information),36(2):406–430, 2011.

    [23] F. Mandreoli, R. Martoglia, W. Penzo, andS. Sassatelli. Data-sharing p2p networks withsemantic approximation capabilities. IEEE In-ternet Computing (IEEE), 13(5):60–70, 2009.

    [24] F. Mandreoli, R. Martoglia, W. Penzo, S. Sas-satelli, and G. Villani. Sri@work: Efficient andeffective routing strategies in a pdms. In Pro-ceedings of the 8th International Conference onWeb Information Systems Engineering, Decem-ber 2007 (WISE 2007), pages 285–297, 2007.

    [25] F. Mandreoli, R. Martoglia, and E. Ronchetti.Versatile structural disambiguation for semantic-aware applications. In Proceedings of the 14thACM International Conference on InformationKnowledge and Management, November 2005(ACM CIKM 2005), pages 209–216, Bremen,Germany, 2005.

    [26] I.J. Myung. Tutorial on maximum likelihood es-timation. Journal of Mathematical Psychology,47(1):90–100, 2003.

    [27] Lawrence R. Rabiner. Readings in speech recog-nition. chapter A Tutorial on Hidden MarkovModels and Selected Applications in SpeechRecognition, pages 267–296. 1990.

    [28] J. Rao, S. Doraiswamy, H. Thakkar, and L. SColby. A deferred cleansing method for RFIDdata analytics. In Proceedings of the 32nd in-ternational conference on Very large data bases,pages 175–186, 2006.

    [29] C. Ré, J. Letchner, M. Balazinksa, and D. Su-ciu. Event queries on correlated probabilisticstreams. In Proceedings of the 2008 ACM SIG-MOD international conference on Managementof data, pages 715–728, 2008.

    [30] Chien-Yuan Su and Jinsheng Roan. Investi-gating the impacts of rfid application on sup-ply chain dynamics with chaos theory. WSEASTrans. Info. Sci. and App., 8(1):1–17, 2011.

    [31] T. Tran, C. Sutton, R. Cocci, Y. Nie, Y. Diao, andP. Shenoy. Probabilistic inference over RFIDstreams in mobile environments. In IEEE In-ternational Conference on Data Engineering,pages 1096–1107, 2009.

    [32] E. Welbourne, N. Khoussainova, J. Letchner,Y. Li, M. Balazinska, G. Borriello, and D. Su-ciu. Cascadia: a system for specifying, detect-ing, and managing rfid events. In Proceeding ofthe 6th international conference on Mobile sys-tems, applications, and services, pages 281–294.ACM, 2008.

    WSEAS TRANSACTIONS on INFORMATION SCIENCE and APPLICATIONSRazia Haider, Federica Mandreoli, Riccardo Martoglia

    E-ISSN: 2224-3402 241 Volume 11, 2014