
    2014 IEEE Ninth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP)

    Symposium on Information Processing

    Singapore, 21–24 April 2014

    Smart Car Parking: Temporal Clustering and

    Anomaly Detection in Urban Car Parking

Yanxu Zheng∗, Sutharshan Rajasegarar†, Christopher Leckie∗, Marimuthu Palaniswami†
∗ Dept. of Computing and Information Systems, † Dept. of Electrical and Electronic Eng.
The University of Melbourne, Australia
E-mails: {yanxuz@student., sraja@, caleckie@, palani@}unimelb.edu.au

Abstract—A major challenge for modern cities is how to maximise the productivity and reliability of urban infrastructure, such as minimising road congestion by making better use of the limited car parking facilities that are available. To achieve this goal, there is growing interest in the capabilities of the emerging Internet of Things (IoT), which enables a wide range of physical objects and environments to be monitored in fine detail by using low-cost, low-power sensing and communication technologies. While there has been growing interest in the IoT for smart cities, there have been few systematic studies that can demonstrate whether practical insights can be extracted from real-life IoT data using advanced data analytics techniques. In this work, we consider a smart car parking scenario based on real-time car parking information that has been collected and disseminated by the City of San Francisco. We investigate whether useful trends and patterns can be automatically extracted from this rich and complex data set. We demonstrate that by using automated clustering and anomaly detection techniques we can identify potentially interesting trends and events in the data. To the best of our knowledge, we provide the first such analysis of the scope for clustering and anomaly detection on real-time car parking data in a major urban city.

    I. INTRODUCTION

Around 70% of the world's population is expected to live in cities and surrounding regions by 2050 [5]. Therefore, cities will need to be better managed, if only to survive as platforms that enable economic, social and environmental well-being. A "smart city" [1], [15], according to Forrester, is one that uses information and communications technologies (ICT) to make the critical infrastructure and services of a city, such as public safety, transportation and utilities, more aware, interactive and efficient [5]. Smart city management is technologically predicated on the emergent Internet of Things (IoT) [1], a radical evolution of the current Internet into a network of interconnected objects, such as sensors, parking meters, energy measuring devices and actuators, that not only harvest information from the environment (sensing) and interact with the physical world (actuation/command/control), but also use existing Internet standards to provide services for information transfer, analytics, applications and communications [12].

Wireless Sensor Networks (WSNs), seamlessly integrated into urban infrastructure (transport, health, environment), form the sensing-actuation core objects of an IoT in a smart city system, and information will be shared across diverse platforms and applications. The growth in low-cost, low-power sensing and communication technologies enables a wide range of physical objects and environments to be monitored in fine detail. The detailed, dynamic data that can be collected from devices on the IoT provides the basis for new business and government applications in areas such as public safety, transport logistics and environmental management. A key challenge in the development of such applications is how to model and interpret the large volumes of complex data streams that will be generated by the IoT. Examples of such large-scale deployments of sensors include (1) SmartSantander [1], [3], in the Spanish city of Santander, with around 12,000 sensors installed in places such as lamp posts for sensing temperature, CO, noise and light, and buried in the asphalt for parking sensing, and (2) the city of San Francisco (SF), USA, where around 8,200 wireless parking sensors are installed in on-street spaces in neighborhoods across the city, which enable real-time monitoring and also inform drivers of the available vacant parking lots and the rates in real time.

While there has been much discussion of the potential for smart cities based on the IoT, there have been few systematic studies of how data analytics can provide practical insights from IoT data. The collection of such data is intended to be used for improving traffic management, energy management, environmental protection, and public health and safety. However, urban authorities are not equipped to make use of this type of Big Data. Without suitable data analytics to detect and correlate relevant events in the urban environment, this sensing infrastructure will not be effectively utilised and these public services will remain manual tasks.

In this paper, we use the parking data collected from one such city, namely the city of San Francisco (SF), USA, and apply data analytics to infer interesting events buried in the data. Although the SF parking data provides real-time parking availability data to the public, a meaningful analysis of the data for interpretation by the authorities is lacking. In particular, we perform data clustering and anomaly detection on the collected parking data, and present several interesting practical insights from the data, which are impossible to infer without performing such machine learning tasks. To the best of our knowledge, this is the first time such an analysis has been performed in terms of clustering and anomaly detection on the SF parking dataset, which has been made available to the public by the city and can be accessed from [2].

The rest of the paper is organised as follows. Section II provides the existing related work in this domain, and Section III introduces the SF data set, the challenges in the analytics and our approach. Section IV describes the clustering and anomaly detection algorithms, and Section V discusses the outcomes. In Section VI, we provide a discussion of the results and conclude, highlighting further research directions.

    978-1-4799-2843-9/14/$31.00 © 2014 IEEE


II. RELATED WORK

Analysing parking data in terms of predicting available parking lots has received attention in the literature. The main challenges of parking availability prediction are the accuracy of long-term prediction, the interaction between the parking lots in an area, and how user behaviors affect the parking availability. In [9], Caliskan et al. built a decentralized parking guide system based on vehicular ad hoc networks (VANETs) that uses a continuous-time homogeneous Markov model for parking availability prediction. The model only targets a single parking lot and is most effective within 15 minutes. In [17], Klappenecker et al. followed the research of [9] and developed a structural solution that simplifies the computation of transition probabilities. In [8], Caicedo et al. present an aggregated approach that is combined with an Intelligent Parking Reservation (IPR) system, which lets users set their parking preferences, pay in advance, and provides real-time parking information. The system receives drivers' parking requests, allocates them to the best parking lot using a calibrated discrete choice model, and estimates future departures based on the requests of all drivers and their habits. By then using these results to predict parking availability, this method reaches a reasonable prediction accuracy in a 4-hour window. However, these analyses are done on either one of the parking lots or considering all of them together, without grouping them based on their behavior. Further, no anomaly detection has been performed to infer interesting events in the data.

Our analysis based on clustering and anomaly detection provides the means to identify interesting parking locations, such as those with extreme occupancy rates during the day, and possible faulty data. Combining our analysis with parking prediction will enable more accurate predictions in the future. Below we describe the SF parking data and our approach to clustering and anomaly detection in detail.

III. SF PARKING DATA SET AND OUR APPROACH

The city of San Francisco has deployed 8,200 parking sensors in city parking lots [2]. Sensors are installed in on-street metered parking spaces and gate-controlled off-street facilities [2]. The real-time data are collected and made available on-line for public use and research activities. These sensor data make it more convenient for drivers, bicyclists, pedestrians, visitors, residents and merchants to find vacant parking spots. In addition to the parking availability map available on the SFpark.org web site, information on parking availability is distributed via mobile apps and the regional phone system. By checking parking availability before leaving home, drivers will know where they can expect to find parking and how much it will cost.

We used the public API data feed to collect parking data over a two-week period between 13/August/2013 and 26/August/2013 at a sampling rate of 15-minute time intervals. The features collected include date and time, type of parking lot, whether it is on-street or off-street parking, parking lot name, number of spaces currently occupied (OCC rate), number of spaces currently operational for the location, and the longitude and latitude values of each location. In order to perform clustering and anomaly detection analysis, we computed the average OCC rate over the two-week period for every 15-minute interval. The resulting data set consists of 570 parking locations and 96 fifteen-minute time instances per location over a day. We perform clustering and anomaly detection on this 96-dimensional data in this paper.
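The averaging step described above can be sketched as follows; the array names and the random data are hypothetical stand-ins for the actual SFpark feed, used only to show the shape of the computation:

```python
import numpy as np

# Hypothetical stand-in for the collected feed: one OCC rate sample per
# location per 15-minute interval over the 14-day window (14 x 96 samples).
rng = np.random.default_rng(0)
n_locations, n_days, slots_per_day = 570, 14, 96
raw = rng.random((n_locations, n_days, slots_per_day))   # OCC rate in [0, 1]

# Average each 15-minute slot of the day over the two weeks, giving one
# 96-dimensional daily profile vector per parking location.
profiles = raw.mean(axis=1)
```

Each row of `profiles` is then one of the 570 data vectors that the clustering and anomaly detection algorithms operate on.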

Analysing all 570 parking location data sets purely using time series plots is cumbersome and makes it difficult to infer any useful information. However, systematic clustering analysis and anomaly detection can limit the scope of such an analysis by providing a focus in terms of potentially interesting locations. Below we provide a brief overview of the clustering and anomaly detection algorithms that we use for analysing this dataset.

IV. CLUSTERING AND ANOMALY DETECTION

Clustering is a process of finding groups of similar data vectors in a given data set. It can be a non-parametric learning technique, which does not assume any distribution over the data a priori, and an unsupervised one, which does not need prior labeling of the data as to which class or cluster each vector belongs. Consequently, it is a suitable data analytics technique for a new dataset like the SF parking data. There is a wide variety of clustering techniques available in the literature, with varying pros and cons [6], [18], [19], [27], [33]. We use a simple but effective approach based on automatically separating the farthest data vectors, called farthest first clustering, and an expectation maximisation clustering method for the analysis.

The task of detecting interesting or unusual events in a general manner is an open problem in the data mining community, and is often referred to as the anomaly detection problem. An outlier or anomaly in a set of data is defined by Barnett et al. [4] as "an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data". Anomaly or outlier detection mechanisms can be categorised into three general approaches, depending on the type of background knowledge of the data that is available. The first approach finds outliers without prior knowledge of the underlying data. This approach uses unsupervised learning or clustering, and assumes that the outliers are well separated from the data points that are normal. The second approach uses supervised classification, where a classifier is trained with labeled data, i.e., the training data is marked as normal or abnormal. The trained classifier can then be used to classify new data as either normal or abnormal. This approach requires the classifier to be retrained if the characteristics of normal and abnormal data change in the system. The third approach is novelty detection, which is analogous to semi-supervised recognition. Here a classifier learns a succinct generalisation of a given set of data, which can then be used to recognise anomalies [6], [7], [20]–[23], [25], [26], [28]. In this work we use a novelty detection approach based on a one-class support vector machine to detect anomalies in the SF parking data. Below we provide an overview of each of the algorithms we use for the analysis.

1) Farthest First (FF) Clustering: FF clustering is a simple clustering algorithm, introduced by [10], [11], [14], that performs clustering of the data given a number of clusters k a priori. It uses a farthest first traversal to find the k mutually farthest points. The steps involved in the algorithm are as follows. First, a data vector x is selected arbitrarily. Second, a data vector y is selected farthest from the first one. Third, a data vector farthest from both x and y is selected. This procedure continues until k data vectors are obtained. These k data vectors are used as the cluster centers, and the remaining data vectors are assigned to the closest of those cluster centers. We utilised the Weka software [13] for performing FF clustering on the smart car parking data.
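The traversal described above can be sketched in a few lines; this is a minimal illustration on toy data, not the Weka implementation actually used for the reported results:

```python
import numpy as np

def farthest_first(X, k, seed=0):
    """Farthest-first traversal: choose k mutually distant data vectors as
    cluster centres, then assign every vector to its closest centre."""
    rng = np.random.default_rng(seed)
    centres = [int(rng.integers(len(X)))]        # step 1: arbitrary start
    # distance of every point to its nearest centre chosen so far
    d = np.linalg.norm(X - X[centres[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(d))                  # farthest from chosen set
        centres.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    C = X[centres]
    labels = np.argmin(np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2),
                       axis=1)
    return labels, C

# Two obvious groups of toy 2-D "profiles" for demonstration.
X = np.vstack([np.zeros((5, 2)), np.full((5, 2), 10.0)])
labels, centres = farthest_first(X, k=2)
```

Because the centres are mutually farthest points, a small, distant group of data vectors naturally ends up in its own cluster, which is why FF is useful for the anomaly-oriented analysis below.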

2) Expectation Maximisation (EM) Clustering: The EM algorithm assigns a probability distribution to each data point, indicating the probability of it belonging to each of the clusters. It is an unsupervised clustering method that makes use of a finite Gaussian mixture model, where the number of mixtures is equal to the number of clusters, and each probability distribution corresponds to one cluster. The steps involved in the EM algorithm for clustering are as follows [16].

1) Initial values are arbitrarily assigned for the mean and standard deviation of each normal distribution in the model.

2) The parameters are iteratively refined using the two steps of the EM algorithm, namely the Expectation step (E) and the Maximisation step (M). In the E step, the membership probabilities for each data vector are computed based on the current parameters. In the M step, the parameters are recomputed based on the new membership probabilities found in the E step. The algorithm terminates when the distribution parameters converge or a maximum number of iterations is reached.

3) Each data vector is assigned to the cluster with which it has the maximum membership probability.

EM determines the number of clusters by cross validation, performed as follows. First, the number of clusters is set to one. Second, the training set is split into a given number of folds; in this case it is split into 10 folds. The EM procedure is performed 10 times with the 10 folds, and the log likelihood values obtained from the 10-fold procedure are averaged. If the log likelihood increases when the number of clusters is increased by one, the above procedure is repeated from the second step. We utilised the Weka software [13] for performing the EM clustering on the smart car parking data.
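A rough analogue of this cross-validation procedure can be sketched with scikit-learn's GaussianMixture standing in for Weka's EM implementation; the synthetic data and the cap on the number of clusters are illustrative assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import KFold

def cv_loglik(X, k, folds=10, seed=0):
    """Average held-out log likelihood of a k-component Gaussian mixture."""
    scores = []
    for train, test in KFold(folds, shuffle=True, random_state=seed).split(X):
        gm = GaussianMixture(n_components=k, random_state=seed).fit(X[train])
        scores.append(gm.score(X[test]))   # mean log likelihood per sample
    return float(np.mean(scores))

# Two well-separated synthetic clusters stand in for the parking profiles.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(8, 1, (100, 2))])

# Grow k while the 10-fold cross-validated log likelihood keeps improving
# (capped at 8 here purely as a practical safeguard).
k = 1
while k < 8 and cv_loglik(X, k + 1) > cv_loglik(X, k):
    k += 1
labels = GaussianMixture(n_components=k, random_state=0).fit_predict(X)
```

The held-out log likelihood stops improving once extra components only overfit, which is what lets the procedure choose the number of clusters automatically.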

3) One-class SVM: Support Vector Data Description (SVDD): A class of machine learning algorithms, called kernel methods, uses kernel functions to emulate a mapping of data measurements from the input space (the space where the data is collected) to a higher dimensional space called the feature space [24], [29]–[31]. The mapped vectors in the feature space are called image vectors. Linear or smooth surfaces in the feature space are used to classify the data as either normal or anomalous. The linear or smooth surfaces in the feature space usually yield nonlinear surfaces in the input space. The advantage of this method is that the dimension of the mapped feature space is hidden by the kernel function and is not explicitly known. This facilitates highly nonlinear and complex learning tasks without excessive algorithmic complexity. A specific class of algorithms, called one-class support vector machines (SVMs), does not require labeled data for training. In this scheme, a separating smooth surface, such as a hypersphere, is found in the feature space, such that the surface automatically separates the data vectors into normal and anomalous. In these schemes,

Fig. 1. Geometry of SVDD: Data vectors are mapped from the input space to a higher dimensional space, and a hypersphere (with center c and radius R) is fitted to the majority of the data. Data that fall outside the hypersphere are anomalous.

the proportion of data vectors considered to be anomalies is controlled by a parameter of the algorithm. Tax et al. [32] formulated the one-class SVM using a hypersphere, called support vector data description (SVDD). In this approach, a minimal-radius hypersphere is fitted around the majority of the image vectors in the feature space. The data that fall outside the hypersphere are identified as anomalous. Figure 1 shows the geometry of the SVDD. This hypersphere formulation uses quadratic programming optimisation.

Consider a data vector xi in the input space from a set of data vectors X = {xi : i = 1..n} mapped to the feature space by some non-linear mapping function φ(·), resulting in a mapped vector φ(xi) (image vector). The aim of fitting a hypersphere with minimal radius R, having a center c and encompassing a majority of the image vectors in the feature space, yields the following optimisation problem:

$$\min_{R \in \mathbb{R}^{+},\ \xi \in \mathbb{R}^{n}} \; R^{2} + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_{i}$$

$$\text{subject to: } \|\phi(x_{i}) - c\|^{2} \leq R^{2} + \xi_{i}, \quad \xi_{i} \geq 0, \quad \forall i \qquad (1)$$

where {ξi : i = 1...n} are the slack variables that allow some of the image vectors to lie outside the sphere. The parameter ν ∈ (0, 1] is the regularisation parameter, which controls the fraction of image vectors that lie outside the sphere, i.e., the fraction of image vectors that can be outliers or anomalies. Using the Lagrange technique, the above primal problem (1) is converted to the following dual problem, which is a quadratic optimisation problem:

$$\min_{\alpha \in \mathbb{R}^{n}} \; \sum_{i,j=1}^{n}\alpha_{i}\alpha_{j}k(x_{i},x_{j}) - \sum_{i=1}^{n}\alpha_{i}k(x_{i},x_{i})$$

$$\text{subject to: } \sum_{i=1}^{n}\alpha_{i} = 1, \quad 0 \leq \alpha_{i} \leq \frac{1}{\nu n}, \quad i = 1...n. \qquad (2)$$

where k(xi, xj) = φ(xi)·φ(xj) is the kernel function, and the αi are the Lagrange multipliers. The data vectors with αi > 0 are called the support vectors. Using the solution for the αi, the decision function for a data vector x can be written as

$$f(x) = \mathrm{sgn}\Big(R^{2} - \sum_{i,j=1}^{n}\alpha_{i}\alpha_{j}k(x_{i},x_{j}) + 2\sum_{i=1}^{n}\alpha_{i}k(x_{i},x) - k(x,x)\Big).$$

Anomalous data vectors are those with αi = 1/(νn), which fall outside the sphere. Data vectors with 0 ≤ αi < 1/(νn) fall inside or on the sphere, and are considered normal. The kernel function that we use in this work is the Gaussian function k(xi, xj) = exp(−‖xi − xj‖²/σ²), where σ is the kernel width parameter. A larger value for σ provides a smoother boundary around the data, while a smaller value provides a rugged boundary. It can be shown that ν is an upper bound on the fraction of anomalies and a lower bound on the fraction of support vectors. The ν and σ are the two parameters of this algorithm that need to be tuned depending on the data set [32].
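A practical sketch of this kind of one-class anomaly detection is shown below. Note that scikit-learn's OneClassSVM implements the hyperplane (Schölkopf) formulation rather than SVDD directly, although the two are equivalent for kernels such as the Gaussian where k(x, x) is constant; the synthetic data and the ν and γ = 1/σ² values here are illustrative assumptions and, as noted above, would need tuning on real data:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic stand-in for the 96-dimensional OCC profile vectors.
rng = np.random.default_rng(0)
normal = rng.normal(0.5, 0.05, (200, 96))    # typical daily OCC profiles
outliers = rng.normal(0.95, 0.01, (10, 96))  # abnormally high occupancy
X = np.vstack([normal, outliers])

# nu bounds the fraction of training points treated as anomalies;
# gamma plays the role of 1/sigma^2 and, like sigma, must be tuned.
clf = OneClassSVM(kernel="rbf", nu=0.1, gamma=0.1).fit(X)
pred = clf.predict(X)                        # +1 = normal, -1 = anomalous
scores = clf.decision_function(X)            # lower = more anomalous
```

Varying ν directly trades off how many profiles are flagged, which mirrors the role of ν in formulation (1) above.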

V. SF PARKING DATA ANALYSIS

The aim is to analyse the car parking data from a major urban center and reveal any interesting clustering structure that exists in the data. Further, we aim to identify any anomalies present in the data that are indicative of potential sensor failures or unusual behaviour.

    First we aim to identify any anomalous parking locationsin the SF region. We considered two approaches to anomalydetection. First, using the farthest first (FF) clustering algo-rithm with the number of clusters set to two. Second, usingthe one class SVM algorithm (SVDD). Figure 2 provides a

    Fig. 2. Clustering and SVDD results.

table that shows the number of data vectors assigned to each of the clusters from the FF and SVDD methods, along with the parameter values used for each of the algorithms. Figures 3(a) and 4(a) show the parking locations in the SF region corresponding to each of the clusters, denoted using different colours. The time series plots of the occupancy rate (OCC rate) vs time for the 24-hour period are shown along with the location maps. Figures 3(b) and 4(b) show the median and median absolute deviation (MAD) of the data vectors that belong to each of the clusters. Figures 3(c) and 4(c) show the mean and standard deviation of the data vectors that belong to each of the clusters. FF produced a big cluster, cluster 0, with 538 data vectors and a small cluster, cluster 1, with 31 data vectors, whereas SVDD produced 373 normal and 197 anomalous data vectors. Both methods identified similar

normal behaviors (shown in green in Figures 3(c) and 4(c)). In these normal data vectors, the time series demonstrates two periods of high occupancy, around the morning and evening peak hours. Further, a higher occupancy during the day time and a lower occupancy during the early morning can be observed in these data vectors.

In terms of anomalies, FF clustering identified a small number of locations (31 locations) that had consistently higher occupancy, even during the early morning. These parking lots are geographically distributed across the city (see Figure 3(a)), but still reasonably concentrated within each geographic region. This may be an indication of special time limitations or parking rates.

In terms of the one-class SVM (the SVDD), exactly the same normal profiles were identified as in the case of FF clustering. However, SVDD identified those locations that experienced extreme behaviour: not only abnormally high occupancy (similar to the anomalies found by FF clustering), but also abnormally low occupancy. These include parking locations with close to zero occupancy in the early morning. An open question for further investigation is the reason for such low occupancy, e.g., being in a business district or having security and safety concerns.

Figure 5 shows the clusters obtained using FF clustering with the number of clusters set to four. The four clusters show different bands of occupancy rates over the 24-hour period in the city. This is evident from Figures 5(b) and 5(c). In addition to the clusters that show consistently lower occupancy (cluster 0) and higher occupancy (cluster 1) locations, it also reveals two more clusters that have different behaviour. Cluster 2 (shown in black) shows very low occupancy rates, around 0.3, during the early morning and higher occupancy rates, around 0.7, during the day time. This shows a larger variation between the early morning and the busy hours of the day. The other cluster, cluster 3 (shown in purple), shows occupancy rates of around 0.6 (on average) throughout the 24-hour period. The identification of these bands of clusters, i.e., bands of parking locations, helps parking managers to identify consistent occupancy behaviors in the parking lots over the city region, and can potentially help devise appropriate parking strategies. For example, the parking rates for locations in cluster 3 and cluster 2 can be readjusted such that a uniform occupancy, and hence better utilisation of parking lots, is achieved throughout the region. This demonstrates the benefit of performing such clustering analysis on parking data for identifying interesting scenarios.

Finally, we consider the use of fine-grained clustering in order to identify more specific behaviours. We used EM clustering for this analysis. EM clustering automatically identifies the number of clusters in the data set; in this case it identified 16 distinct clusters in the data. Figures 6(b) and 6(c) show the median and the mean occupancy rates over the 24-hour period for each of the 16 clusters, respectively. Note that, for clarity, we omit from the graphs the MAD and the standard deviation values we computed for each cluster. Figure 6(a) shows the locations of the parking lots for each of the clusters.

When we analyse these clusters, we can again see that they highlight clusters with consistently higher or lower occupancy, and with the typical daily variation in the occupancy profiles. However, there are two clusters that are particularly noteworthy for further investigation. Cluster 2 consistently has a median of zero. Those parking lots are geographically dispersed, and are likely to indicate faulty sensors. This identification becomes useful for fault analysis. A further question that arises from this analysis is how these sensors change over time, i.e., if they are truly faulty, is there any drift in their profile that could act as an early warning?
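The zero-median fault check suggested here is simple to automate. A hypothetical sketch over the 96-dimensional profiles, with synthetic data standing in for the real feed:

```python
import numpy as np

# Synthetic (locations x 96) profile matrix standing in for the real data.
rng = np.random.default_rng(0)
profiles = rng.uniform(0.2, 0.9, (570, 96))
profiles[42] = 0.0                 # simulate one sensor that never reports

# Flag every location whose median occupancy over the day is exactly zero,
# i.e. a candidate faulty sensor for follow-up inspection.
suspect = np.where(np.median(profiles, axis=1) == 0)[0]
```

Tracking how this suspect set changes from week to week would be one way to look for the drift mentioned above.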

Cluster 4 also shows unusual behaviour, in terms of having higher occupancy during the early morning compared to during the day. Further investigation is warranted to see if these locations are affected by particular daytime activities, such as road or building construction works, which limit access to these parking spots during the day. Figure 7 shows the


Fig. 3. Farthest First clustering with two clusters. (a) Spatial locations of the parking lots in each cluster. (b) Median and median absolute deviation of the data vectors in each of the two clusters. (c) Mean and standard deviation of the data vectors in each of the two clusters.

Fig. 4. One-class classification (SVDD) with parameters ν = 0.1 and σ = 100. (a) Spatial locations of normal and anomalous parking lots. (b) Median and median absolute deviation of the normal and anomalous data vectors. (c) Mean and standard deviation of the normal and anomalous data vectors.

Fig. 5. Farthest First clustering using four clusters. (a) Spatial locations of the parking lots in each cluster. (b) Median and median absolute deviation of the data in each of the four clusters. (c) Mean and standard deviation of the data in each of the four clusters.

Fig. 7. Spatial locations of the parking lots in selected clusters (using EM clustering). (a) Cluster 2: data vectors with a mean/median value of zero (green lines). (b) Cluster 4 (red lines).

    locations of the parking lots in selected clusters (Cluster 2and Cluster 4).

VI. DISCUSSION AND CONCLUSION

Data analytics on the large data sets collected by smart cities is an important task to enable intelligent management of the infrastructure that is monitored using IoT devices. In this paper we demonstrated the importance of clustering and anomaly detection for car parking management in a major urban center, the city of San Francisco. In contrast to using simple average profiles, we have shown that we can both characterise normal temporal behavior and identify anomalous behavior. In particular, we showed that farthest first (FF) clustering identifies a small number of heavy-usage parking spots, while the one-class SVM (SVDD) identifies extreme behavior (both high and low occupancy). These findings provide a focus for further analysis of external factors that may affect parking behavior, e.g., pricing, land use (business or residential), security and safety, and adjacency to other modes of transport. Furthermore, we identified how finer-scale clustering can identify potential operational issues, such as the possibility of faulty sensors, or parking spots that are affected by external factors during specific periods of the day. Our research has also highlighted how clustering and anomaly detection can provide a focus for more detailed investigation, such as correlating observations with other sources of data,


Fig. 6. EM clustering. (a) Spatial locations of the parking lots in each cluster. (b) Median values of the data vectors from each cluster. (c) Mean values of the data vectors from each cluster.

e.g., crime statistics, other modes of transport, and construction activity. This type of detailed correlation with other data sources can be impractical if it needs to be applied to all parking locations. However, the cluster analysis can limit the scope of such an analysis by providing a focus in terms of potentially interesting locations. In particular, we have demonstrated that it is possible to find clusters that are indicative of potential sensor faults. In the future, we aim to perform clustering based on both spatial and temporal similarity in the SF data, as well as data from other platforms such as SmartSantander.

    ACKNOWLEDGMENT

We thank the Australian Research Council for its support through grants LP120100529 and LE120100129.

    REFERENCES

    [1] “IoT,” http://issnip.unimelb.edu.au/research program/Internet of Things, 2013.

    [2] “San Francisco parking data,” http://sfpark.org, 2013.

    [3] “Smart Santander,” http://www.smartsantander.eu/, 2013.

    [4] V. Barnett and T. Lewis,  Outliers in Statistical Data, 3rd ed. JohnWiley and Sons, 1994.

[5] J. Belissent, "Getting clever about smart cities: New opportunities require new business models," Forrester Research, http://www.forrester.com/rb/ Research/ getting   clever about smart cities new opportunities/q/id/ 56701/t/ 2, 2013.

    [6] J. C. Bezdek, T. Havens, J. Keller, C. Leckie, L. Park, M. Palaniswami,and S. Rajasegarar, “Clustering elliptical anomalies in sensor networks,”in   IEEE WCCI , 2010.

    [7] J. C. Bezdek, S. Rajasegarar, M. Moshtaghi, C. Leckie, M. Palaniswami,and T. Havens, “Anomaly detection in environmental monitoring net-works,”   IEEE Comp. Int. Mag., vol. 6, no. 2, pp. 52–58, 2011.

    [8] F. Caicedo, C. Blazquez, and P. Miranda, “Prediction of parking spaceavailability in real time,”  Expert Systems with Apps., vol. 39, no. 8, pp.7281 – 7290, 2012.

    [9] M. Caliskan, A. Barthels, B. Scheuermann, and M. Mauve, “Predictingparking lot occupancy in vehicular ad hoc networks,” in   IEEE VTC ,

    2007.[10] S. Dasgupta and P. M. Long, “Performance guarantees for hierarchical

    clustering,” Jnl. of Comp. and Sys. Sci., vol. 70, no. 4, pp. 555 – 569,2005.

    [11] T. F. Gonzalez, “Clustering to minimize the maximum interclusterdistance,”  Theoretical Comp. Sci., vol. 38, no. 0, pp. 293 – 306, 1985.

    [12] J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami, “Internet of Things (IoT): A Vision, Architectural Elements, and Future Directions,”

     Accepted for publ. in Future Generation Computer Systems, Jan 2013.

    [13] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, andI. H. Witten, “The WEKA data mining software: An update,”   SIGKDD

     Explorations,, vol. 11, no. 1, 2009.

    [14] D. S. Hochbaum and D. B. Shmoys, “A best possible heuristic for thek-center problem,”   Maths. of Oper. Res., vol. 10, no. 2, pp. 180–184,1985.

    [15] J. Jin, J. Gubbi, T. Luo, and M. Palaniswami, “Network architectureand QoS issues in the Internet of Things for a Smart City,” in  Proc. of the ISCIT , 2012, pp. 974–979.

    [16] X. Jin and J. Han, “Expectation maximization clustering,” in Encyc. of  Mach. Learn., C. Sammut and G. Webb, Eds., 2010, pp. 382–383.

    [17] A. Klappenecker, H. Lee, and J. L. Welch, “Finding available parkingspaces made easy,”  Ad Hoc Nets., vol. 12, no. 0, pp. 243 – 249, 2014.

    [18] M. Moshtaghi, T. Havens, L. Park, J. C. Bezdek, S. Rajasegarar,C. Leckie, M. Palaniswami, and J. Keller, “Clustering ellipses for

    anomaly detection,”   Pattern Recog., vol. 44, no. 1, pp. 55–69, 2011.

    [19] M. Moshtaghi, S. Rajasegarar, C. Leckie, and S. Karunasekera, “Anefficient hyperellipsoidal clustering algorithm for resource-constrainedenvironments,”   Pattern Recog., vol. 44, no. 9, pp. 2197–2209, 2011.

    [20] C. O’Reilly, A. Gluhak, M. A. Imran, and S. Rajasegarar, “Anomalydetection in wireless sensor networks in a non-stationary environment,”

     IEEE communications, surveys and tutorials, 2013.

    [21] ——, “Online anomaly rate parameter tracking for anomaly detectionin wireless sensor networks,” in  IEEE SECON , 2012.

    [22] S. Rajasegarar, J. C. Bezdek, C. Leckie, and M. Palaniswami, “Ellip-tical anomalies in wireless sensor networks,”   ACM Trans. on Sensor 

     Networks, vol. 6, no. 1, p. 28, Dec. 2009.

    [23] S. Rajasegarar, J. C. Bezdek, M. Moshtaghi, C. Leckie, T. C. Havens,and M. Palaniswami, “Measures for clustering and anomaly detectionin sets of higher dimensional ellipsoids,” in   IEEE WCCI , 2012.

    [24] S. Rajasegarar, C. Leckie, J. C. Bezdek, and M. Palaniswami, “Centeredhyperspherical and hyperellipsoidal one-class support vector machinesfor anomaly detection in sensor networks,”   IEEE Trans. on Info.Forensics and Sec., vol. 5, no. 3, pp. 518–533, 2010.

    [25] S. Rajasegarar, C. Leckie, and M. Palaniswami, “Anomaly detection inwireless sensor networks,”   IEEE Wireless Comms., vol. 15, no. 4, pp.34–40, 2008.

    [26] ——, “Detecting data anomalies in sensor networks,” in Security in Ad-hoc and Sensor Networks, R. Beyah, J. McNair, and C. Corbett, Eds.World Scientific Publishing, Inc, ISBN: 978-981-4271-08-0, July 2009,pp. 231–260.

    [27] ——, “Hyperspherical cluster based distributed anomaly detection inwireless sensor networks,”  Jnl. of Parallel and Distributed Computing,no. 0, pp. –, 2013.

    [28] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, “Dis-

    tributed anomaly detection in wireless sensor networks,” in  IEEE ICCS ,2006.

    [29] S. Rajasegarar, A. Shilton, C. Leckie, R. Kotagiri, and M. Palaniswami,“Distributed training of multiclass conic-segmentation support vectormachines on communication constrained networks,” in  ISSNIP, 2010,pp. 211–216.

    [30] B. Scholkopf and A. Smola, Learning with Kernels, 2002.

    [31] A. Shilton, S. Rajasegarar, and M. Palaniswami, “Combined multiclassclassification and anomaly detection for large-scale wireless sensornetworks,” in  IEEE ISSNIP), 2013, pp. 491–496.

    [32] D. M. J. Tax and R. P. W. Duin, “Support vector data description,” Machine Learning, vol. 54, no. 1, pp. 45–66, 2004.

    [33] H. Wackernagle,   Multivariate Geostatistics: An Introduction with Ap- plications, 1998.
