9
Hindawi Publishing Corporation Mathematical Problems in Engineering Volume 2013, Article ID 164810, 8 pages http://dx.doi.org/10.1155/2013/164810 Research Article Traffic Volume Data Outlier Recovery via Tensor Model Huachun Tan, 1 Jianshuai Feng, 2 Guangdong Feng, 3 Wuhong Wang, 1 and Yu-Jin Zhang 4 1 Department of Transportation Engineering, Beijing Institute of Technology, Beijing 100081, China 2 Integrated Information System Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China 3 Civil Aviation Engineering Consulting Company of China, Beijing 100621, China 4 Department of Electronic Engineering, Tsinghua University, Beijing 100084, China Correspondence should be addressed to Huachun Tan; [email protected] Received 22 September 2012; Accepted 24 February 2013 Academic Editor: Huimin Niu Copyright © 2013 Huachun Tan et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Traffic volume data is already collected and used for a variety of purposes in intelligent transportation system (ITS). However, the collected data might be abnormal due to the problem of outlier data caused by malfunctions in data collection and record systems. To fully analyze and operate the collected data, it is necessary to develop a validate method for addressing the outlier data. Many existing algorithms have studied the problem of outlier recovery based on the time series methods. In this paper, a multiway tensor model is proposed for constructing the traffic volume data based on the intrinsic multilinear correlations, such as day to day and hour to hour. en, a novel tensor recovery method, called ADMM-TR, is proposed for recovering outlier data of traffic volume data. e proposed method is evaluated on synthetic data and real world traffic volume data. Experimental results demonstrate the practicability, effectiveness, and advantage of the proposed method, especially for the real world traffic volume data. 1. Introduction In order to alleviate the traffic congestion problem and facilitate the mobility in metropolises, large amounts of traffic information are collected as a part of intelligent transporta- tion system (ITS) such as CVIS (Cooperative Vehicle Infras- tructure System) in China. ese collected traffic data have wide range of applications. e real time traffic information is provided to travelers to support their decision for making process on the optimal route choice [1]. As shown by the work of Kim et al. [2], the real time information can contribute to reduce the operation cost and maximize resource utilization. In addition to these applications, the collected data could be applied to maximize the utilization of the infrastructure for smooth flow of the traffic. One such application of real time traffic data is traffic information control [3]. On the other hand, several data mining techniques have been applied to mine time related association rules from traffic databases and their results have been used for traffic prediction such as the works of Williams et al. [4] and Xu et al. [5]. From the above discussion, it is concluded that the collected traffic data are essential for many potential applications in ITS. In real world, the collected data are always corrupted due to noise values, especially outlier value, which may be caused by detector failures, communication problems, or any other hardware/soſtware related problems. e presence of outlier data in the database would degrade significantly the quality as well as reliability of the data and might impede the effectiveness of ITS applications. erefore, it is essential to fill the gaps caused by outlier data in order to fully explore the applicability of the data and realize the ITS applications. While many different kinds of traffic data such as traffic volume, speed, and occupancy are collected, the focus of this research is on the traffic volume outlier data recovery. It is supposed that the detectors collecting traffic information are set up at road sections and the collected values represent the traffic volume for those road sections. e aim of this research is to recover the traffic volume outlier data for road sections. Literature survey in the related field shows that several filtering recovery techniques have been applied to recover the outlier traffic data [68]. Filtering methods include techniques such as singular value decomposition, wavelet analysis, immune algorithm, and spectrum subtraction. Fil- tering methods formulate the traffic volume as time series

Research Article Traffic Volume Data Outlier …downloads.hindawi.com/journals/mpe/2013/164810.pdfTraffic Volume Data Outlier Recovery via Tensor Model HuachunTan, 1 JianshuaiFeng,

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Research Article Traffic Volume Data Outlier …downloads.hindawi.com/journals/mpe/2013/164810.pdfTraffic Volume Data Outlier Recovery via Tensor Model HuachunTan, 1 JianshuaiFeng,

Hindawi Publishing CorporationMathematical Problems in EngineeringVolume 2013 Article ID 164810 8 pageshttpdxdoiorg1011552013164810

Research ArticleTraffic Volume Data Outlier Recovery via Tensor Model

Huachun Tan1 Jianshuai Feng2 Guangdong Feng3 Wuhong Wang1 and Yu-Jin Zhang4

1 Department of Transportation Engineering Beijing Institute of Technology Beijing 100081 China2 Integrated Information System Research Center Institute of Automation Chinese Academy of Sciences Beijing 100190 China3 Civil Aviation Engineering Consulting Company of China Beijing 100621 China4Department of Electronic Engineering Tsinghua University Beijing 100084 China

Correspondence should be addressed to Huachun Tan htan8wiscedu

Received 22 September 2012 Accepted 24 February 2013

Academic Editor Huimin Niu

Copyright copy 2013 Huachun Tan et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Traffic volume data is already collected and used for a variety of purposes in intelligent transportation system (ITS) However thecollected data might be abnormal due to the problem of outlier data caused by malfunctions in data collection and record systemsTo fully analyze and operate the collected data it is necessary to develop a validate method for addressing the outlier data Manyexisting algorithms have studied the problem of outlier recovery based on the time series methods In this paper a multiway tensormodel is proposed for constructing the traffic volume data based on the intrinsic multilinear correlations such as day to day andhour to hour Then a novel tensor recovery method called ADMM-TR is proposed for recovering outlier data of traffic volumedataThe proposed method is evaluated on synthetic data and real world traffic volume data Experimental results demonstrate thepracticability effectiveness and advantage of the proposed method especially for the real world traffic volume data

1 Introduction

In order to alleviate the traffic congestion problem andfacilitate themobility inmetropolises large amounts of trafficinformation are collected as a part of intelligent transporta-tion system (ITS) such as CVIS (Cooperative Vehicle Infras-tructure System) in China These collected traffic data havewide range of applications The real time traffic informationis provided to travelers to support their decision for makingprocess on the optimal route choice [1] As shown by the workof Kim et al [2] the real time information can contribute toreduce the operation cost and maximize resource utilizationIn addition to these applications the collected data could beapplied to maximize the utilization of the infrastructure forsmooth flow of the traffic One such application of real timetraffic data is traffic information control [3] On the otherhand several data mining techniques have been applied tomine time related association rules from traffic databases andtheir results have been used for traffic prediction such as theworks of Williams et al [4] and Xu et al [5] From the abovediscussion it is concluded that the collected traffic data areessential for many potential applications in ITS

In real world the collected data are always corrupteddue to noise values especially outlier value which may becaused by detector failures communication problems or anyother hardwaresoftware related problems The presence ofoutlier data in the database would degrade significantly thequality as well as reliability of the data and might impede theeffectiveness of ITS applications Therefore it is essential tofill the gaps caused by outlier data in order to fully explorethe applicability of the data and realize the ITS applications

While many different kinds of traffic data such as trafficvolume speed and occupancy are collected the focus of thisresearch is on the traffic volume outlier data recovery It issupposed that the detectors collecting traffic information areset up at road sections and the collected values represent thetraffic volume for those road sectionsThe aimof this researchis to recover the traffic volume outlier data for road sections

Literature survey in the related field shows that severalfiltering recovery techniques have been applied to recoverthe outlier traffic data [6ndash8] Filtering methods includetechniques such as singular value decomposition waveletanalysis immune algorithm and spectrum subtraction Fil-tering methods formulate the traffic volume as time series

2 Mathematical Problems in Engineering

model and smooth the traffic waveform These approachesrecover the outlier data of day by day through spectrumanalysis and feature information extracting However thetraffic data through the same location is significantly similarfrom day to day and these approaches cannot utilize suchcharacteristic Pei and Ma [6] show that similarity is animportant factor impacting on recovery performance Whilethe above methods consider only one mode similarity therecovery performance is mainly dependent on the smooththreshold Unfortunately the smooth threshold is empiricallydetermined

In order to improve the recovery performance and con-sider themultidimension characteristic of traffic dataminingthe multimode similarities will make a great contribution forrecovering outlier value Our approach is based on utilizingmultimode correlations of traffic data that is traffic data havedifferent correlation on different modes such as week modeday mode and hour mode More concretely the feature ofthe proposed method is to recover the outlier value using thetraffic volume information of the different modes But theproblem is not so simple because traffic volumes from manydays might be corrupted by outlier data simultaneously Inorder to consider the multiple outlier traffic volumes we usetensor modeling the traffic volume

In order to solve the traffic volume outlier data problemwe formulate the traffic volume recovery problem as a datarecovery problem based on the assumption that the essentialtraffic volume is low-119899-ranklow rank and the outliers aresparse That is the corrupted traffic volumes can formulatedas

A =L +S (1)

whereA is the observed traffic volumewhich is corruptedLis the recovered traffic volume andS represents the outliersIn the problem the entrances of corrupted traffic data areunknown One straight solution is optimizing the followingproblem under the assumption that the 119899-rank ofL is smalland the corrupted outliers are sparse or bounded

minLsum

119894

120583119894rank119894 (L)

st A minusL minusS119865 le 120575(2)

The tensor recovery problem of (2) has been studiedin recent years which will be detailed in Section 3 In thispaper a new data recovery method based on tensor modelcalledAlternatingDirectionMethod ofMultipliers for TensorRecovery (ADMM-TR) is proposed to handle the outliertraffic volumes

This paper makes three main contributions (1) We usetensor to model the traffic volume and take advantage ofthe multiway characteristics of tensor which could explorethe multicorrelations of different modes in traffic data and(2) we formulate the problem of the traffic volume outlierdata recovery as a tensor recovery problem (3) we proposedADMM-TR algorithm by extending ADMM from matrix totensor case to solve the formulated tensor recovery problemfor traffic volume and the convergence of ADMM-TR is

proved It also should be noted that the proposed ADMM-TRmethod is different from [9] which reported that extendedADM for tensor recovery is proposed In fact they presentedADM for tensor completion in which the data are missedand the entrances of missing data are known While inthis research the objective is to recover the data which arecorrupted including missing or noised and the entrances ofcorrupted values are unknown

The paper is organized as follows We present the reviewof tensor model in Section 2 Section 3 briefly reviews therelated data recovery methods In Section 4 tensor model fortraffic volume is constructed traffic data recovery problem isformulated and an efficient algorithm is proposed to solvethe formulation Also a simple convergence guarantee for theproposed algorithm is given In Section 5 we evaluated theproposed method on synthetic data and real world trafficvolume data Finally we provide some concluding remarksin Section 6

2 Notation and Review of Tensor Models

In this section we adopt the nomenclature of Kolda andBaderrsquos review on tensor decomposition [10] and partiallyadopt the notation in [11]

A tensor is the generalization of amatrix to higher dimen-sions We denote scalars by lowercase letters (119886 119887 119888 )vectors as bold-case lowercase letters (a b c ) andmatricesas uppercase letters (119860 119861 119862 ) Tensors are written ascalligraphic letters (ABC )

An 119899-mode tensor is denoted as A isin R1198681times1198682timessdotsdotsdottimes119868119873 Itselements are denoted as 119886

1198941sdotsdotsdot119894119896sdotsdotsdot119894119899

where 1 le 119894119896le 119868119870 1 le

119870 le 119873 The mode-119899 unfolding (also called matricizationor flattening) of a tensor A isin R1198681times1198682timessdotsdotsdottimes119868119873 is defined asunfold (A 119899) = 119860

(119899) The tensor element (119894

1 1198942 119894

119873) is

mapped to the matrix element (119894119899 119895) where

119895 = 1 +

119873

sum

119896=1

119896 = 119899

(119894119896minus 1) 119869119896 with 119869

119896=

119896minus1

prod

119898=1119898 = 119899

119868119898 (3)

Therefore 119860(119899)isin R119868119899times119869 where 119869 = prod

119873

119896=1 119896 = 119899119868119896

Accordingly its inverse operator fold can be defined asfold (119860

(119899) 119899) = A

The 119899-rank of an119873-dimensional tensorA isin R1198681times1198682timessdotsdotsdottimes119868119873 denoted by 119903

119899 is the rank of the mode-119899 unfolding matrix

119860(119899)

119903119899= rank

119899 (A) = rank (119860 (119899)) (4)

The inner product of two same-size tensors AB isin

R1198681times1198682timessdotsdotsdottimes119868119873 is defined as the sum of the products of theirentries that is

⟨AB⟩ = sum1198941

sum

1198942

sdot sdot sdotsum

119894119873

1198861198941sdotsdotsdot119894119896sdotsdotsdot119894119899

1198871198941sdotsdotsdot119894119896sdotsdotsdot119894119899

(5)

The corresponding Frobenius norm is A119865= radic⟨AA⟩

Besides the 1198970norm of a tensor A denoted by A

0 is the

number of nonzero elements inA and the 1198971norm is defined

Mathematical Problems in Engineering 3

as A1= sum1198941sdotsdotsdot119894119896sdotsdotsdot119894119899

|1198861198941sdotsdotsdot119894119896sdotsdotsdot119894119899

| It is clear that A119865= 119860(119899)119865

A0= 119860(119899)0 and A

1= 119860(119899)1for any 1 le 119899 le 119873

The 119899-mode (matrix) product of a tensorA isin R1198681times1198682timessdotsdotsdottimes119868119873with a matrix 119872 isin R119869times119868119899 is denoted by Atimes

119899119872 and is size

1198681times sdot sdot sdot times 119868

119899minus1times119869times119868

119899+1times sdot sdot sdot times 119868

119873 In terms of flattened matrix

the 119899-mode product can be expressed as

Y = Atimes119899119872lArrrArr 119884

(119899)= 119872119860

(119899) (6)

3 Review of Data Recovery Methods

Recently the problem of recovering the sparse and low-rank components with no prior knowledge about the sparsitypattern of the sparse matrix or the rank of the low-rankmatrix has been well studied Authors of [12] proposedthe concept of ldquorank-sparse incoherencerdquo and solved theproblem by an interior point solver after being reformulatedas a semidefinite problem However although interior pointmethods normally take very few iterations to converge theyhave difficulty in handling large matrices So this limitationprevented the usage of the technique in computer vision andthe traffic volume recovery in this research

To solve the problem for large scale matrices Wrightet al [13] have adopted the iterative thresholding techniqueto solve the problem and obtained scalability propertiesLin et al have proposed an accelerated proximal gradient(APG) algorithm [14] and applied techniques of augmentedLagrange multipliers (ALM) [15] to solve the problem Yuanand Yang [16] have utilized the alternating direction method(ADM) which can be regarded as a practical version ofthe classical ALM method to solve the matrix recoveryproblem The ADM method has been proved to have apleasing convergence speed and results in [16] demonstratedits excellent performance

Inspired by the idea of [16] this paper extends the sparseand low-rank recovery problem to tensor case which is dueto the fact that the multidimensional traffic data can beformulated into the form of tensor

4 ADMM-TR for Traffic Volume OutlierRecovery

In this section we show the solution of problem (2) Thetensor model is firstly constructed for traffic volume inSection 41 Then we present the tensor recovery problem inSection 42 In Section 43 the classical ADMM approach isintroduced In Section 44 we convert the original probleminto a constrained convex optimization problem which canbe solved by the extended ADMM approach and presentthe details of the proposed algorithm Also the convergenceguarantees of the proposed algorithm are given in thissection

41 Tensor Model for Traffic Volume The correlations oftraffic volume data are critical for recovering the corruptedtraffic volume data Traditional methods mostly exploit partof correlations such as historical or temporal neighboringcorrelationsThe classic methods usually utilize the temporal

Table 1 The similarity coefficient of four modes

Mode Size Similarity coefficientHour 6 times 12 09670Day 7 times 288 08654Week 7 times 288 09153Link 4 times 288 08497

correlations of traffic data from day to day For the singledetector data multiple correlations contain the relations oftraffic data from day to day hour to hour and so forth Inaddition the spatial correlations exist in multiple detectorsdata

In this paper quantitative analysis of traffic data corre-lation is analyzed based on the traffic volume data down-loaded from httppemsdotcagov The correlation coeffi-cient applied tomeasuring the data correlation is given by [17]

119904 =

sum119899ge119894gt119895ge1

119877 (119894 119895)

119899 (119899 minus 1) 2

(7)

where 119899 refers to the whole data points 119877(119894 119895) refers tothe correlation coefficient matrix Table 1 gives the resultsof correlation coefficient of four modes which is hour dayweek and month

Conventional methods usually use day-to-day matrixpattern to model the traffic data Although each mode oftraffic data has a very high similarity these methods do notutilize the multimode correlations which are ldquoDay times HourrdquoldquoWeek times Hourrdquo and ldquoLink times Hourrdquo simultaneously and thusmay result in poor recovery performance

To make full use of the multimode correlation and trafficspatial-temporal information traffic data need to be con-structed into multiway data set Fortunately tensor patternbased traffic data can be well used to model the multiwaytraffic dataThis helps keep the original structure and employenough traffic spatial-temporal information

42 The Tensor Recovery Problem The problem of (2) is NPhard since it is not convex Then we use an approximationformulation as shown in (8)

minLS Llowast + 120578S1

st A minusL minusS119865 le 120575(8)

where A isin R1198681times1198682timessdotsdotsdottimes119868119873 is the given matrix to be recoveredL isin R1198681times1198682timessdotsdotsdottimes119868119873 is the low-rank component of A S isin

R1198681times1198682timessdotsdotsdottimes119868119873 is the sparse component of A Compared with(2) (8) relaxes the constrain for recovering a low-119899-ranktensor from a high-dimensional data tensor despite bothsmall entry-wise noise and gross sparse errors

Recently Liu et al [18] have proposed the definition of thenuclear norm of an 119899-mode tensor

Xlowast =1

119899

119899

sum

119894=1

1003817100381710038171003817119883(119894)

1003817100381710038171003817lowast (9)

4 Mathematical Problems in Engineering

Based on this definition the optimization in (8) can bewritten as

minLS Llowast + 120578S1 equiv

1

119899

119899

sum

119894=1

120582119894

1003817100381710038171003817119871(119894)

1003817100381710038171003817lowast+

1

119899

119899

sum

119894=1

120578119894

1003817100381710038171003817119878(119894)

10038171003817100381710038171

st A minusL minusS119865 le 120575

(10)

In order to recover (L S) instead of directly solving (10)we solve the following dual problem

min LS

1

2120574

A minusL minusS2

119865+

119899

sum

119894=1

120582119894

1003817100381710038171003817119871(119894)

1003817100381710038171003817lowast+

119899

sum

119894=1

120578119894

1003817100381710038171003817119878(119894)

10038171003817100381710038171 (11)

The problem in (11) is still difficult to solve due to theinterdependent nuclear norm and 119897

1norm constraints To

simplify the problem the formulation can be reformulated asfollows

minLS119872

119894119873119894

1

2120574

A minusL minus S2

119865+

119899

sum

119894=1

120582119894

1003817100381710038171003817119872119894

1003817100381710038171003817lowast+

119899

sum

119894=1

120578119894

1003817100381710038171003817119873119894

10038171003817100381710038171

st 119875119894L = 119872

119894119875119894S = 119873

119894forall119894

(12)

where 119875119894is the matrix representation of mode-119894 unfolding

(note that 119875119894is a permutation matrix thus 119875

119894

119879119875119894= 119868) 119872

(119894)

and119873(119894)are additional auxiliary matrices of the same size as

the mode-119894 unfolding ofL (or S)

43The Classical ADMMApproach The classical alternatingdirection method of multipliers (ADMM) is for solvingstructured convex programs of the form

min119909isin119862119909119910isin119862119910

119891 (119909) + 119892 (119910)

st 119860119909 + 119861119910 = 119888(13)

where119891 and 119892 are convex functions defined on closed subsets119862119909and 119862

119910119860 119861 and 119888 are matrices and vector of appropriate

sizes The segmented Lagrangian function of (13) is

119871119860(119909 119910 119908) = 119891 (119909) + 119892 (119910) + ⟨119908119860119909 + 119861119910 minus 119888⟩

+

120573

2

1003817100381710038171003817119860119909 + 119861119910 minus 119888

1003817100381710038171003817

2

2

(14)

where 119908 is a Lagrangian multiplier vector and 120573 gt 0 is apenalty parameter

The approach performs one sweep of alternating mini-mization with respect to 119909 and 119910 individually then updatesthe multiplier 119908 at the iteration 119896 the steps are given by [18Equations (479)ndash(481)]

119909(119896+1)larr997888 arg min

119909isin119862119909

119871119860(119909 119910(119896) 119908(119896))

119910(119896+1)larr997888 argmin

119909isin119862119910

119871119860(119909(119896+1) 119910 119908(119896))

119908(119896+1)larr997888 119908

(119896)+ 120588120573 (119860119909

(119896+1)+ 119861119910(119896+1)minus 119888)

(15)

where 120588 is the step length A convergence proof for the aboveADMM algorithm was shown as follows

Theorem 1 (See [19 Proposition 52]) Assume that theoptimal solution set 119883lowast of (13) is nonempty Furthermoreassume that119862

119909is bounded or else the matrix119860lowast119860 is invertible

Then a sequence 119909(119896) 119910(119896) 119908(119896) generated by (15) is boundedand every limit point of 119909(119896) is an optimal solution of theoriginal of problem (13)

44 ADMM Extension to Tensor Recovery We observe (12)is well structured in the sense that the separable structureemerges in both the objective function and constraintsThuswe propose an algorithm based on an extension of theclassical ADMM approach for solving the tensor recoveryproblem by taking advantage of this favorable structure

The augmented Lagrangian of (12) is

119871119860(LS119872

119894 119873119894)

=

1

2120574

A minusL minusS2

119865

+

119899

sum

119894=1

(120582119894

1003817100381710038171003817119872119894

1003817100381710038171003817lowast+ ⟨119884119894 119875119894L minus119872

119894⟩ +

120572119894

2

1003817100381710038171003817119875119894L minus119872

119894

1003817100381710038171003817

2

119865)

+

119899

sum

119894=1

(120578119894

1003817100381710038171003817119873119894

10038171003817100381710038171+ ⟨119885119894 119875119894S minus 119873

119894⟩ +

120573119894

2

1003817100381710038171003817119875119894S minus 119873

119894

1003817100381710038171003817

2

119865)

(16)

where 119884119894 119885119894are Lagrangian multipliers and 120572

119894 120573119894gt 0 are

penalty parametersThen we can now directly apply ADMM with this aug-

mented Lagrangian function

Computing 119872119894 The optimal119872

119894can be solved with all other

variables to be constant by the following subproblem

min119872119894

120582119894

1003817100381710038171003817119872119894

1003817100381710038171003817lowast+ ⟨119884119894 119875119894L minus119872

119894⟩ +

120572119894

2

1003817100381710038171003817119875119894L minus119872

119894

1003817100381710038171003817

2

119865 (17)

As shown in [20] the optimal solution of (17) is given by

119872119894= 119880119894119863120582119894120572119894(Λ)119881119894

119879 (18)

where 119880119894Λ119881119894

119879 is the singular value decomposition given by

119880119894Λ119881119894

119879= 119875119894L +

119884119894

120572119894

(19)

and the ldquoshrinkagerdquo operator119863120591(119909) with 120591 gt 0 is defined as

119863120591 (119909) =

119909 minus 120591 if 119909 gt 120591119909 + 120591 if 119909 lt minus1205910 otherwise

(20)

Computing 119873119894 The optimal 119873

119894can be solved with all other

variables to be the constants by the following subproblem

min119873119894

120578119894

1003817100381710038171003817119873119894

10038171003817100381710038171+ ⟨119885119894 119875119894S minus 119873

119894⟩ +

120573119894

2

1003817100381710038171003817119875119894S minus 119873

119894

1003817100381710038171003817

2

119865 (21)

Mathematical Problems in Engineering 5

Input n-mode tensorAParameters 120572 120573 120574 120582 120578 120588(1) InitializationL(0) = A S(0) = 0 119872

119894=L(119894) 119873119894= 0 119896 = 1

(2) Repeat until convergence(3) for 119894 = 1 to 119899(4) 119872

(119896+1)

119894= 119880119894119863120582119894120572119894(Λ)119881

119879

119894

where 119880119894Λ119881119879

119894= 119875119894L(119896) + 119884

(119896)

119894120572119894

(5) 119873(119896+1)

119894= 119863120578119894120573119894(119875119894S(119896) + 119885

(119896)

119894120573119894)

(6) 119884(119896+1)

119894= 119884(119896)

119894+ 120572119894(119875119894L(119896) minus119872

(119896+1)

119894)

(7) 119885(119896+1)

119894= 119885(119896)

119894+ 120573119894(119875119894S(119896) minus 119873

(119896+1)

119894)

(8) end for

(9) L(119896+1) =A minus S(119896) minus 120574sum

119899

119894=1119875119879

119894(119884(119896+1)

119894minus 120572119894119872(119896+1)

119894)

1 + 120574sum119899

119894=1120572119894

(10) S(119896+1) =A minusL(119896+1) minus 120574sum

119899

119894=1119875119879

119894(119885(119896+1)

119894minus 120572119894119873(119896+1)

119894)

1 + 120574sum119899

119894=1120573119894

(11) 120572 = 120588120572 120573 = 120588120573(12) End(13) 119896 = 119896 + 1

Output n-mode tensorLS

Algorithm 1 ADMM-TR ADMM for tensor recovery

By the well-known 1198971minimization [21] the optimal

solution of (21) is

119873119894= 119863120578119894120573119894

(119875119894S +119885119894

120573119894

) (22)

where119863120591is the ldquoshrinkagerdquo operation

Computing L Now we fix all variables except L andminimize 119871

119860over L The resulting minimization problem

is the minimization of a quadratic function

minL 119871119860 (

L) =1

2120574

A minusL minus S2

119865

+

119899

sum

119894=1

(⟨119884119894 119875119894L minus119872

119894⟩ +

120572119894

2

1003817100381710038171003817119875119894L minus119872

119894

1003817100381710038171003817

2

119865)

(23)

The objective function is differentiable so the minimizerLmin is characterized by (120597119871119860(L))120597L = 0 Thus we obtain

Lmin =A minusS minus 120574sum

119899

119894=1119875119894

119879(119884119894minus 120572119894119872119894)

(1 + 120574sum119899

119894=1120572119894)

(24)

ComputingS Nowwefix all variables exceptS andminimize119871119860

over S The resulting minimization problem is theminimization of a quadratic function

minS 119871119860 (

S) =1

2120574

A minusL minusS2

119865

+

119899

sum

119894=1

(⟨119885119894 119875119894S minus 119873

119894⟩ +

120573119894

2

1003817100381710038171003817119875119894S minus 119873

119894

1003817100381710038171003817

2

119865)

(25)

The objective function is also differentiable so the min-imizer Smin is characterized by (120597119871

119860(S))120597S = 0 Thus we

have

Smin =A minusL minus 120574sum

119899

119894=1119875119894

119879(119885119894minus 120573119894119873119894)

(1 + 120574sum119899

119894=1120573119894)

(26)

For comparing with RSTD [22] we also choose thedifference of L and S in successive iterations against acertain tolerance as the stopping criterion The pseudocodeof the proposed ADMM-TR algorithm is summarized inAlgorithm 1

Theorem 2 (Convergence of ADMM-TR) Assume that theoptimal solution set 119883lowast of (11) is nonempty A sequenceL(119896)S(119896)119872

(119896)

119894 119873(119896)

119894 119884(119896)

119894 119885(119896)

119894 generated by our proposed

ADMM-TR algorithm is bounded and every limit point ofL(119896)S(119896) is an optimal solution of the original problem (11)

Proof We check the assumptions of Theorem 1 119862119909is not

bounded but 119875119894

lowast119875119894= 119868 is a constant multiple of the identity

operator Thus Theorem 1 can also be applied to ADMM-TRandTheorem 2 can be derived

5 Numerical Experiments

This section evaluates the empirical performance of the pro-posed algorithm on synthetic data and compares the resultswithRSTD (Rank Sparsity TensorDecomposition) [22] Alsoexperiments on traffic volume data outlier recovery illustratethe efficiency of the proposedmethod in traffic research filed

We use the Lanczos algorithm for computing the singularvalues decomposition and adopt the same rule for predicting

6 Mathematical Problems in Engineering

Table 2 Comparison of ADMM-TR with RSTD for synthetic data whereA0isin R40times40times40 119899-rank = [5 5 5]

119904119901119903

ADMM-TR RSTDRSE L

0

(119890-3)RSE S

0

(119890-3) iter Time (s) RSE L0

(119890-3)RSE S

0

(119890-3) iter Time (s)

005 43 43 133 137 46 46 208 178015 98 42 162 208 105 47 235 336025 123 43 514 449 159 52 676 510035 565 95 654 533 675 114 737 575

Table 3 Comparison of ADMM-TR with RSTD for synthetic data whereA0isin R40times40times40 n-rank = [10 10 10]

119904119901119903

Algorithm ADMM-TR Algorithm RSTDRSE L

0

(119890-3)RSE S

0

(119890-3) iter Time (s) RSE L0

(119890-3)RSE S

0

(119890-3) iter Time (s)

005 44 20 236 264 47 25 338 299015 47 21 417 421 52 25 603 508025 89 29 664 608 120 42 663 536035 173 52 981 806 211 65 1110 877

the dimension of the principal singular space as [22] And theparameters are set as 120572 = 120573 = [119868

1119868max 1198682119868max 119868119899119868max]

119879

and 120574 = 1sum([1198681119868max 1198682119868max 119868119899119868max]

119879 for all experi-ments where 119868max = max119868

119894 120578 is set to 1radic119868max as suggested

in [23]All the experiments are conducted and timed on the same

desktop with an Pentium (R) Dual-Core 250GHz CPU thathas 4GB memory running on Windows 7 and MATLAB

51 Synthetic Data ADMM-TR and RSTD are tested on thesynthetic data of size 40times40times40 We generate the dimension119903 of a ldquocore tensorrdquo C isin R119903timessdotsdotsdottimes119903 which we fill with Gaussiandistributed entries (sim N(0 1)) Then we generate matrices119880(1) 119880

(119873) with119880(119894) isin R119899119894times119903 whose elements are also iidGaussian random variables (simN(0 1)) and set

L0= Ctimes1119880(1)

times2sdotsdotsdottimes119873119880(119873) (27)

The entries of sparse tensor S0are independently dis-

tributed each taking on value 0 with probability 1-119904119901119903 andeach taking on impulsive value with probability 119904119901119903We applythe proposed algorithm to the tensorA

0=L0+S0to recover

L and S and compare with RSTD For these experimentstwo cases of 119899-rank are investigated 119899-rank = [5 5 5] and 119899-rank = [10 10 10] Table 1 presents the average results (across30 instances) for different 119904119901119903

The quality of recovery is measured by the relative squareerror (RSE) toL

0and S

0 defined to be

RSE L0=

10038171003817100381710038171003817

L minusL0

10038171003817100381710038171003817119865

1003817100381710038171003817L0

1003817100381710038171003817119865

RSE S0=

10038171003817100381710038171003817

S minusS0

10038171003817100381710038171003817119865

1003817100381710038171003817S0

1003817100381710038171003817119865

(28)

Tables 2 and 3 show that the proposed algorithm(ADMM-TR) is about 10 percent faster than RSTD proposed

in [22] and achieves better accuracy in terms of relative squareerror Though both of the algorithms involve computing aSVD per iteration we observe that the proposed algorithmtake much fewer iterations than RSTD to converge to theoptimal solution

The more impulsive entries are added that is for highervalue of spr the less probable it becomes for the tensorrecovery problem In addition the problem becomes moresophisticated when the 119899-rank is higher for the ground truthtensors of the same size In Table 1 different spr is set fortwo tensor cases And results show the recovered accuracydecreases as the spr grows for a certain case In particular therecovered accuracy for the tensorA

0isin R40times40times40 with 119899-rank

= [5 5 5] decreases sharply when the spr is up to 40 whilethe phenomenon occurs for tensor A

0isin R40times40times40 with 119899-

rank = [10 10 10] when the spr is about 25 This is due tothat relative low rank and low sparse ratio are precondition ofthe tensor recovery problem

52 TrafficVolumeData To evaluate the performances of theproposed method in traffic volume data outlier recovery acomplete traffic volume data set is used as the test data setWeuse the data of a fixed point in Sacramento County which isdownloaded fromhttppemsdotcagovThe traffic volumedata are recorded every 5 minutes Therefore a daily trafficvolume series for a loop detector contains 288 records andthe whole period of the data lasts for 16 days that is fromAugust 2 to August 17 2010

Based on multiple correlations of the traffic volume datawe model the data set as a tensor model of size 16 times 24 times 12which stands for 16 days 24 hours in a day and 12 sampleintervals (ie recorded by 5 minutes) per hour The ratiosof outlier data are set from 5 to 15 and the outlier dataare produced randomly All the results are averaged by 10instances

Mathematical Problems in Engineering 7

0

100

200

300

400

500

600

700

800

900

1000

Traffi

c vol

ume (

veh5

min

)

0 50 100 150 200 250 300

Raw dataOutlier data

Recovered data

Time interval (5 min)

Figure 1 Comparisons with raw traffic volume data data corruptedby outliers with 5 ratio and data recovered by ADMM-TR

0 50 100 150 200 250 3000

200

400

600

800

1000

1200

1400

Raw dataOutlier data

Recovered data

Time interval (5 min)

Traffi

c vol

ume (

veh5

min

)

Figure 2 Comparisons with raw traffic volume data data corruptedby outliers with 10 ratio and data recovered by ADMM-TR

For the real world data we mainly pay attention to thecorrectness of the recovered traffic volume data Thus thequality of recovery is measured by the relative square error(RSE) to L

0and Mean Absolute Percentage Error (MAPE)

toL0 defined to be

RSE L0=

10038171003817100381710038171003817

L minusL0

10038171003817100381710038171003817119865

1003817100381710038171003817L0

1003817100381710038171003817119865

MAPE L0=

1

119872

119872

sum

119898=1

1003816100381610038161003816100381610038161003816100381610038161003816

119905(119898)

119903minus 119905(119898)

119890

119905(119898)

119903

1003816100381610038161003816100381610038161003816100381610038161003816

(29)

0

200

400

600

800

1000

1200

1400

1600

1800

0 50 100 150 200 250 300

Raw dataOutlier data

Recovered data

Time interval (5 min)

Traffi

c vol

ume (

veh5

min

)

Figure 3 Comparisons with raw traffic volume data data corruptedby outliers with 15 ratio and data recovered by ADMM-TR

Table 4 Traffic volume outlier data before and afterbeing recovered

119904119901119903

Before recovery After recoveryRSE L

0

(119890-3)MAPE L

0

(119890-3)RSE L

0

(119890-3)MAPE L

0

(119890-3)005 05005 02136 00961 00314010 07066 05008 01417 01027015 08850 08777 02221 02351

where 119905(119898)119903

and 119905(119898)119890

are the 119898th elements which stand forthe known real value and recovered value respectively 119872denotes the number of recovered traffic volumes

Table 4 presents the relative errors of traffic volumeoutlier data before and after being recovered by ADMM-TRThe results show that the RSE L

0and MAPE L

0for traffic

volume data corrupted by outlier data are about 5 times thanthe data after being recovered by ADMM-TR Figures 1 2and 3 present the profiles of traffic volume data for a day Theresults show that ADMM-TR could recover the traffic volumeoutlier data with perfect performance

6 Conclusions

In this paper we concentrate on the mathematical problemin traffic volume outlier data recovery and proposed anovel tensor recovery method based on alternating directionmethod of multipliers (ADMM) The proposed algorithmcan automatically separate the low-119899-rank tensor data andsparse partThe experiments show that the proposedmethodis more stable and accurate in most cases and has excellentconvergence rate Experiments on real world traffic volumedata demonstrate the practicability and effectiveness of theproposed method in traffic research domain

In the future we would like to investigate how to auto-matically choose the parameters in our algorithm and explore

8 Mathematical Problems in Engineering

additional applications of our method in traffic researchdomain

Acknowledgments

This research was supported by NSFC (Grant No 6127137661171118 and 91120015) Beijing Natural Science Foundation(4122067)The authors would like to thank Professor Bin Ranfrom the University ofWisconsin-Madison and Yong Li fromthe University of Notre Dame for the suggestive discussions

References

[1] K Tamura and M Hirayama ldquoToward realization of VICSmdashvehicle information and communications systemrdquo in Pro-ceedings of the IEEE-IEE Vehicle Navigation and InformationsSystems Conference pp 72ndash77 October 1993

[2] S Kim M E Lewis and C C White III ldquoOptimal vehiclerouting with real-time traffic informationrdquo IEEE Transactionson Intelligent Transportation Systems vol 6 no 2 pp 178ndash1882005

[3] P Mirchandani and L Head ldquoA real-time traffic signal controlsystem architecture algorithms and analysisrdquo TransportationResearch Part C vol 9 no 6 pp 415ndash432 2001

[4] B M Williams P K Durvasula and D E Brown ldquoUrbanfreeway traffic flow prediction application of seasonal autore-gressive integrated moving average and exponential smoothingmodelsrdquo Transportation Research Record no 1644 pp 132ndash1411998

[5] J Xu X Li and H Shi ldquoShort-term traffic flow forecastingmodel under missing datardquo Journal of Computer Applicationsvol 30 no 4 pp 1117ndash1120 2010

[6] Y Pei and J Ma ldquoReal-time traffic data screening and recon-structionrdquo China Civil Engineering Journal vol 36 no 7 pp78ndash83 2003

[7] S Chen W Wang and W Li ldquoNoise recognition and noisereduction of real-time trafficdatardquoDongnanDaxueXuebao vol36 no 2 pp 322ndash325 2006

[8] B-G Daniel J D-P Francisco G-O David et al ldquoWavelet-based denoising for traffic volume time series forecasting withself-organizing neural networksrdquo Computer-Aided Civil andInfrastructure Engineering vol 25 no 7 pp 530ndash545 2010

[9] S Gandy B Recht and I Yamada ldquoTensor completion andlowmdashrank tensor recovery via convex optimizationrdquo InverseProblems vol 27 no 2 Article ID 025010 2011

[10] T G Kolda and B W Bader ldquoTensor decompositions andapplicationsrdquo SIAM Review vol 51 no 3 pp 455ndash500 2009

[11] A S Lewis and G Knowles ldquoImage compression using the 2-Dwavelet transformrdquo IEEE Transactions of Image Processing vol1 no 2 pp 244ndash250 1992

[12] V Chandrasekaran S Sanghavi P A Parrilo and A S WillskyldquoRank-sparsity incoherence for matrix decompositionrdquo SIAMJournal on Optimization vol 21 no 2 pp 572ndash596 2011

[13] J Wright Y Peng Y Ma and A Ganesh ldquoRobust principalcomponent analysis exact recovery of corrupted low-rankmatrices by convex optimizationrdquo in Proceedings of the 23rdAnnual Conference on Neural Information Processing Systems(NIPS rsquo09) pp 2080ndash2088 Vancouver Canada 2009

[14] A Ganesh Z Lin J Wright L Wu M Chen and Y MaldquoFast algorithms for recovering a corrupted low-rank matrixrdquo

in Proceedings of the 3rd IEEE International Workshop onComputational Advances in Multi-Sensor Adaptive Processing(CAMSAP rsquo09) pp 213ndash216 December 2009

[15] Z Lin M Chen L Wu and Y Ma ldquoThe augmented Lagrangemultiplier method for exact recovery of a corrupted low-rankmatricesrdquoMathematical Programming 2009

[16] X Yuan and J Yang ldquoSparse and low-rank matrix decomposi-tion via alternating directionmethodsrdquo Tech Rep Departmentof Mathematics Hong Kong Baptist University 2009

[17] L Qu Y Zhang J Hu L Jia and L Li ldquoA BPCA basedmissing value imputing method for traffic flow volume datardquo inProceedings of the IEEE Intelligent Vehicles Symposium (IV rsquo08)pp 985ndash990 June 2008

[18] J Liu P Musialski P Wonka and J Ye ldquoTensor completion forestimating missing values in visual datardquo in Proceedings of theInternational Conference on Computer Vision (ICCV rsquo09) 2009

[19] D Bertsekas and J Tsitsiklis Parallel and Distributed Computa-tion Prentice-Hall 1989

[20] J-F Cai E J Candes and Z Shen ldquoA singular value thresh-olding algorithm for matrix completionrdquo SIAM Journal onOptimization vol 20 no 4 pp 1956ndash1982 2010

[21] E T Hale W Yin and Y Zhang ldquoFixed-point continuationfor ℓ1-minimization methodology and convergencerdquo SIAMJournal on Optimization vol 19 no 3 pp 1107ndash1130 2008

[22] Y Li J Yan Y Zhou and J Yang ldquoOptimum subspace learningand error correction for tensorsrdquo in Proceedings of the 11thEuropean Conference on Computer Vision (ECCV rsquo10) CreteGreece 2010

[23] E J Candes X Li Y Ma and J Wright ldquoRobust principalcomponent analysisrdquo Journal of the ACM vol 58 no 3 article11 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 2: Research Article Traffic Volume Data Outlier …downloads.hindawi.com/journals/mpe/2013/164810.pdfTraffic Volume Data Outlier Recovery via Tensor Model HuachunTan, 1 JianshuaiFeng,

2 Mathematical Problems in Engineering

model and smooth the traffic waveform These approachesrecover the outlier data of day by day through spectrumanalysis and feature information extracting However thetraffic data through the same location is significantly similarfrom day to day and these approaches cannot utilize suchcharacteristic Pei and Ma [6] show that similarity is animportant factor impacting on recovery performance Whilethe above methods consider only one mode similarity therecovery performance is mainly dependent on the smooththreshold Unfortunately the smooth threshold is empiricallydetermined

In order to improve the recovery performance and con-sider themultidimension characteristic of traffic dataminingthe multimode similarities will make a great contribution forrecovering outlier value Our approach is based on utilizingmultimode correlations of traffic data that is traffic data havedifferent correlation on different modes such as week modeday mode and hour mode More concretely the feature ofthe proposed method is to recover the outlier value using thetraffic volume information of the different modes But theproblem is not so simple because traffic volumes from manydays might be corrupted by outlier data simultaneously Inorder to consider the multiple outlier traffic volumes we usetensor modeling the traffic volume

In order to solve the traffic volume outlier data problemwe formulate the traffic volume recovery problem as a datarecovery problem based on the assumption that the essentialtraffic volume is low-119899-ranklow rank and the outliers aresparse That is the corrupted traffic volumes can formulatedas

A =L +S (1)

whereA is the observed traffic volumewhich is corruptedLis the recovered traffic volume andS represents the outliersIn the problem the entrances of corrupted traffic data areunknown One straight solution is optimizing the followingproblem under the assumption that the 119899-rank ofL is smalland the corrupted outliers are sparse or bounded

minLsum

119894

120583119894rank119894 (L)

st A minusL minusS119865 le 120575(2)

The tensor recovery problem of (2) has been studiedin recent years which will be detailed in Section 3 In thispaper a new data recovery method based on tensor modelcalledAlternatingDirectionMethod ofMultipliers for TensorRecovery (ADMM-TR) is proposed to handle the outliertraffic volumes

This paper makes three main contributions (1) We usetensor to model the traffic volume and take advantage ofthe multiway characteristics of tensor which could explorethe multicorrelations of different modes in traffic data and(2) we formulate the problem of the traffic volume outlierdata recovery as a tensor recovery problem (3) we proposedADMM-TR algorithm by extending ADMM from matrix totensor case to solve the formulated tensor recovery problemfor traffic volume and the convergence of ADMM-TR is

proved It also should be noted that the proposed ADMM-TRmethod is different from [9] which reported that extendedADM for tensor recovery is proposed In fact they presentedADM for tensor completion in which the data are missedand the entrances of missing data are known While inthis research the objective is to recover the data which arecorrupted including missing or noised and the entrances ofcorrupted values are unknown

The paper is organized as follows We present the reviewof tensor model in Section 2 Section 3 briefly reviews therelated data recovery methods In Section 4 tensor model fortraffic volume is constructed traffic data recovery problem isformulated and an efficient algorithm is proposed to solvethe formulation Also a simple convergence guarantee for theproposed algorithm is given In Section 5 we evaluated theproposed method on synthetic data and real world trafficvolume data Finally we provide some concluding remarksin Section 6

2 Notation and Review of Tensor Models

In this section we adopt the nomenclature of Kolda andBaderrsquos review on tensor decomposition [10] and partiallyadopt the notation in [11]

A tensor is the generalization of amatrix to higher dimen-sions We denote scalars by lowercase letters (119886 119887 119888 )vectors as bold-case lowercase letters (a b c ) andmatricesas uppercase letters (119860 119861 119862 ) Tensors are written ascalligraphic letters (ABC )

An 119899-mode tensor is denoted as A isin R1198681times1198682timessdotsdotsdottimes119868119873 Itselements are denoted as 119886

1198941sdotsdotsdot119894119896sdotsdotsdot119894119899

where 1 le 119894119896le 119868119870 1 le

119870 le 119873 The mode-119899 unfolding (also called matricizationor flattening) of a tensor A isin R1198681times1198682timessdotsdotsdottimes119868119873 is defined asunfold (A 119899) = 119860

(119899) The tensor element (119894

1 1198942 119894

119873) is

mapped to the matrix element (119894119899 119895) where

119895 = 1 +

119873

sum

119896=1

119896 = 119899

(119894119896minus 1) 119869119896 with 119869

119896=

119896minus1

prod

119898=1119898 = 119899

119868119898 (3)

Therefore 119860(119899)isin R119868119899times119869 where 119869 = prod

119873

119896=1 119896 = 119899119868119896

Accordingly its inverse operator fold can be defined asfold (119860

(119899) 119899) = A

The 119899-rank of an119873-dimensional tensorA isin R1198681times1198682timessdotsdotsdottimes119868119873 denoted by 119903

119899 is the rank of the mode-119899 unfolding matrix

119860(119899)

119903119899= rank

119899 (A) = rank (119860 (119899)) (4)

The inner product of two same-size tensors AB isin

R1198681times1198682timessdotsdotsdottimes119868119873 is defined as the sum of the products of theirentries that is

⟨AB⟩ = sum1198941

sum

1198942

sdot sdot sdotsum

119894119873

1198861198941sdotsdotsdot119894119896sdotsdotsdot119894119899

1198871198941sdotsdotsdot119894119896sdotsdotsdot119894119899

(5)

The corresponding Frobenius norm is A119865= radic⟨AA⟩

Besides the 1198970norm of a tensor A denoted by A

0 is the

number of nonzero elements inA and the 1198971norm is defined

Mathematical Problems in Engineering 3

as A1= sum1198941sdotsdotsdot119894119896sdotsdotsdot119894119899

|1198861198941sdotsdotsdot119894119896sdotsdotsdot119894119899

| It is clear that A119865= 119860(119899)119865

A0= 119860(119899)0 and A

1= 119860(119899)1for any 1 le 119899 le 119873

The 119899-mode (matrix) product of a tensorA isin R1198681times1198682timessdotsdotsdottimes119868119873with a matrix 119872 isin R119869times119868119899 is denoted by Atimes

119899119872 and is size

1198681times sdot sdot sdot times 119868

119899minus1times119869times119868

119899+1times sdot sdot sdot times 119868

119873 In terms of flattened matrix

the 119899-mode product can be expressed as

Y = Atimes119899119872lArrrArr 119884

(119899)= 119872119860

(119899) (6)

3 Review of Data Recovery Methods

Recently the problem of recovering the sparse and low-rank components with no prior knowledge about the sparsitypattern of the sparse matrix or the rank of the low-rankmatrix has been well studied Authors of [12] proposedthe concept of ldquorank-sparse incoherencerdquo and solved theproblem by an interior point solver after being reformulatedas a semidefinite problem However although interior pointmethods normally take very few iterations to converge theyhave difficulty in handling large matrices So this limitationprevented the usage of the technique in computer vision andthe traffic volume recovery in this research

To solve the problem for large scale matrices Wrightet al [13] have adopted the iterative thresholding techniqueto solve the problem and obtained scalability propertiesLin et al have proposed an accelerated proximal gradient(APG) algorithm [14] and applied techniques of augmentedLagrange multipliers (ALM) [15] to solve the problem Yuanand Yang [16] have utilized the alternating direction method(ADM) which can be regarded as a practical version ofthe classical ALM method to solve the matrix recoveryproblem The ADM method has been proved to have apleasing convergence speed and results in [16] demonstratedits excellent performance

Inspired by the idea of [16] this paper extends the sparseand low-rank recovery problem to tensor case which is dueto the fact that the multidimensional traffic data can beformulated into the form of tensor

4 ADMM-TR for Traffic Volume OutlierRecovery

In this section we show the solution of problem (2) Thetensor model is firstly constructed for traffic volume inSection 41 Then we present the tensor recovery problem inSection 42 In Section 43 the classical ADMM approach isintroduced In Section 44 we convert the original probleminto a constrained convex optimization problem which canbe solved by the extended ADMM approach and presentthe details of the proposed algorithm Also the convergenceguarantees of the proposed algorithm are given in thissection

41 Tensor Model for Traffic Volume The correlations oftraffic volume data are critical for recovering the corruptedtraffic volume data Traditional methods mostly exploit partof correlations such as historical or temporal neighboringcorrelationsThe classic methods usually utilize the temporal

Table 1 The similarity coefficient of four modes

Mode Size Similarity coefficientHour 6 times 12 09670Day 7 times 288 08654Week 7 times 288 09153Link 4 times 288 08497

correlations of traffic data from day to day For the singledetector data multiple correlations contain the relations oftraffic data from day to day hour to hour and so forth Inaddition the spatial correlations exist in multiple detectorsdata

In this paper quantitative analysis of traffic data corre-lation is analyzed based on the traffic volume data down-loaded from httppemsdotcagov The correlation coeffi-cient applied tomeasuring the data correlation is given by [17]

119904 =

sum119899ge119894gt119895ge1

119877 (119894 119895)

119899 (119899 minus 1) 2

(7)

where 119899 refers to the whole data points 119877(119894 119895) refers tothe correlation coefficient matrix Table 1 gives the resultsof correlation coefficient of four modes which is hour dayweek and month

Conventional methods usually use day-to-day matrixpattern to model the traffic data Although each mode oftraffic data has a very high similarity these methods do notutilize the multimode correlations which are ldquoDay times HourrdquoldquoWeek times Hourrdquo and ldquoLink times Hourrdquo simultaneously and thusmay result in poor recovery performance

To make full use of the multimode correlation and trafficspatial-temporal information traffic data need to be con-structed into multiway data set Fortunately tensor patternbased traffic data can be well used to model the multiwaytraffic dataThis helps keep the original structure and employenough traffic spatial-temporal information

42 The Tensor Recovery Problem The problem of (2) is NPhard since it is not convex Then we use an approximationformulation as shown in (8)

minLS Llowast + 120578S1

st A minusL minusS119865 le 120575(8)

where A isin R1198681times1198682timessdotsdotsdottimes119868119873 is the given matrix to be recoveredL isin R1198681times1198682timessdotsdotsdottimes119868119873 is the low-rank component of A S isin

R1198681times1198682timessdotsdotsdottimes119868119873 is the sparse component of A Compared with(2) (8) relaxes the constrain for recovering a low-119899-ranktensor from a high-dimensional data tensor despite bothsmall entry-wise noise and gross sparse errors

Recently Liu et al [18] have proposed the definition of thenuclear norm of an 119899-mode tensor

Xlowast =1

119899

119899

sum

119894=1

1003817100381710038171003817119883(119894)

1003817100381710038171003817lowast (9)

4 Mathematical Problems in Engineering

Based on this definition the optimization in (8) can bewritten as

minLS Llowast + 120578S1 equiv

1

119899

119899

sum

119894=1

120582119894

1003817100381710038171003817119871(119894)

1003817100381710038171003817lowast+

1

119899

119899

sum

119894=1

120578119894

1003817100381710038171003817119878(119894)

10038171003817100381710038171

st A minusL minusS119865 le 120575

(10)

In order to recover (L S) instead of directly solving (10)we solve the following dual problem

min LS

1

2120574

A minusL minusS2

119865+

119899

sum

119894=1

120582119894

1003817100381710038171003817119871(119894)

1003817100381710038171003817lowast+

119899

sum

119894=1

120578119894

1003817100381710038171003817119878(119894)

10038171003817100381710038171 (11)

The problem in (11) is still difficult to solve due to theinterdependent nuclear norm and 119897

1norm constraints To

simplify the problem the formulation can be reformulated asfollows

minLS119872

119894119873119894

1

2120574

A minusL minus S2

119865+

119899

sum

119894=1

120582119894

1003817100381710038171003817119872119894

1003817100381710038171003817lowast+

119899

sum

119894=1

120578119894

1003817100381710038171003817119873119894

10038171003817100381710038171

st 119875119894L = 119872

119894119875119894S = 119873

119894forall119894

(12)

where 119875119894is the matrix representation of mode-119894 unfolding

(note that 119875119894is a permutation matrix thus 119875

119894

119879119875119894= 119868) 119872

(119894)

and119873(119894)are additional auxiliary matrices of the same size as

the mode-119894 unfolding ofL (or S)

43The Classical ADMMApproach The classical alternatingdirection method of multipliers (ADMM) is for solvingstructured convex programs of the form

min119909isin119862119909119910isin119862119910

119891 (119909) + 119892 (119910)

st 119860119909 + 119861119910 = 119888(13)

where119891 and 119892 are convex functions defined on closed subsets119862119909and 119862

119910119860 119861 and 119888 are matrices and vector of appropriate

sizes The segmented Lagrangian function of (13) is

119871119860(119909 119910 119908) = 119891 (119909) + 119892 (119910) + ⟨119908119860119909 + 119861119910 minus 119888⟩

+

120573

2

1003817100381710038171003817119860119909 + 119861119910 minus 119888

1003817100381710038171003817

2

2

(14)

where 119908 is a Lagrangian multiplier vector and 120573 gt 0 is apenalty parameter

The approach performs one sweep of alternating mini-mization with respect to 119909 and 119910 individually then updatesthe multiplier 119908 at the iteration 119896 the steps are given by [18Equations (479)ndash(481)]

119909(119896+1)larr997888 arg min

119909isin119862119909

119871119860(119909 119910(119896) 119908(119896))

119910(119896+1)larr997888 argmin

119909isin119862119910

119871119860(119909(119896+1) 119910 119908(119896))

119908(119896+1)larr997888 119908

(119896)+ 120588120573 (119860119909

(119896+1)+ 119861119910(119896+1)minus 119888)

(15)

where 120588 is the step length A convergence proof for the aboveADMM algorithm was shown as follows

Theorem 1 (See [19 Proposition 52]) Assume that theoptimal solution set 119883lowast of (13) is nonempty Furthermoreassume that119862

119909is bounded or else the matrix119860lowast119860 is invertible

Then a sequence 119909(119896) 119910(119896) 119908(119896) generated by (15) is boundedand every limit point of 119909(119896) is an optimal solution of theoriginal of problem (13)

44 ADMM Extension to Tensor Recovery We observe (12)is well structured in the sense that the separable structureemerges in both the objective function and constraintsThuswe propose an algorithm based on an extension of theclassical ADMM approach for solving the tensor recoveryproblem by taking advantage of this favorable structure

The augmented Lagrangian of (12) is

119871119860(LS119872

119894 119873119894)

=

1

2120574

A minusL minusS2

119865

+

119899

sum

119894=1

(120582119894

1003817100381710038171003817119872119894

1003817100381710038171003817lowast+ ⟨119884119894 119875119894L minus119872

119894⟩ +

120572119894

2

1003817100381710038171003817119875119894L minus119872

119894

1003817100381710038171003817

2

119865)

+

119899

sum

119894=1

(120578119894

1003817100381710038171003817119873119894

10038171003817100381710038171+ ⟨119885119894 119875119894S minus 119873

119894⟩ +

120573119894

2

1003817100381710038171003817119875119894S minus 119873

119894

1003817100381710038171003817

2

119865)

(16)

where 119884119894 119885119894are Lagrangian multipliers and 120572

119894 120573119894gt 0 are

penalty parametersThen we can now directly apply ADMM with this aug-

mented Lagrangian function

Computing 119872119894 The optimal119872

119894can be solved with all other

variables to be constant by the following subproblem

min119872119894

120582119894

1003817100381710038171003817119872119894

1003817100381710038171003817lowast+ ⟨119884119894 119875119894L minus119872

119894⟩ +

120572119894

2

1003817100381710038171003817119875119894L minus119872

119894

1003817100381710038171003817

2

119865 (17)

As shown in [20] the optimal solution of (17) is given by

119872119894= 119880119894119863120582119894120572119894(Λ)119881119894

119879 (18)

where 119880119894Λ119881119894

119879 is the singular value decomposition given by

119880119894Λ119881119894

119879= 119875119894L +

119884119894

120572119894

(19)

and the ldquoshrinkagerdquo operator119863120591(119909) with 120591 gt 0 is defined as

119863120591 (119909) =

119909 minus 120591 if 119909 gt 120591119909 + 120591 if 119909 lt minus1205910 otherwise

(20)

Computing 119873119894 The optimal 119873

119894can be solved with all other

variables to be the constants by the following subproblem

min119873119894

120578119894

1003817100381710038171003817119873119894

10038171003817100381710038171+ ⟨119885119894 119875119894S minus 119873

119894⟩ +

120573119894

2

1003817100381710038171003817119875119894S minus 119873

119894

1003817100381710038171003817

2

119865 (21)

Mathematical Problems in Engineering 5

Input n-mode tensorAParameters 120572 120573 120574 120582 120578 120588(1) InitializationL(0) = A S(0) = 0 119872

119894=L(119894) 119873119894= 0 119896 = 1

(2) Repeat until convergence(3) for 119894 = 1 to 119899(4) 119872

(119896+1)

119894= 119880119894119863120582119894120572119894(Λ)119881

119879

119894

where 119880119894Λ119881119879

119894= 119875119894L(119896) + 119884

(119896)

119894120572119894

(5) 119873(119896+1)

119894= 119863120578119894120573119894(119875119894S(119896) + 119885

(119896)

119894120573119894)

(6) 119884(119896+1)

119894= 119884(119896)

119894+ 120572119894(119875119894L(119896) minus119872

(119896+1)

119894)

(7) 119885(119896+1)

119894= 119885(119896)

119894+ 120573119894(119875119894S(119896) minus 119873

(119896+1)

119894)

(8) end for

(9) L(119896+1) =A minus S(119896) minus 120574sum

119899

119894=1119875119879

119894(119884(119896+1)

119894minus 120572119894119872(119896+1)

119894)

1 + 120574sum119899

119894=1120572119894

(10) S(119896+1) =A minusL(119896+1) minus 120574sum

119899

119894=1119875119879

119894(119885(119896+1)

119894minus 120572119894119873(119896+1)

119894)

1 + 120574sum119899

119894=1120573119894

(11) 120572 = 120588120572 120573 = 120588120573(12) End(13) 119896 = 119896 + 1

Output n-mode tensorLS

Algorithm 1 ADMM-TR ADMM for tensor recovery

By the well-known 1198971minimization [21] the optimal

solution of (21) is

119873119894= 119863120578119894120573119894

(119875119894S +119885119894

120573119894

) (22)

where119863120591is the ldquoshrinkagerdquo operation

Computing L Now we fix all variables except L andminimize 119871

119860over L The resulting minimization problem

is the minimization of a quadratic function

minL 119871119860 (

L) =1

2120574

A minusL minus S2

119865

+

119899

sum

119894=1

(⟨119884119894 119875119894L minus119872

119894⟩ +

120572119894

2

1003817100381710038171003817119875119894L minus119872

119894

1003817100381710038171003817

2

119865)

(23)

The objective function is differentiable so the minimizerLmin is characterized by (120597119871119860(L))120597L = 0 Thus we obtain

Lmin =A minusS minus 120574sum

119899

119894=1119875119894

119879(119884119894minus 120572119894119872119894)

(1 + 120574sum119899

119894=1120572119894)

(24)

ComputingS Nowwefix all variables exceptS andminimize119871119860

over S The resulting minimization problem is theminimization of a quadratic function

minS 119871119860 (

S) =1

2120574

A minusL minusS2

119865

+

119899

sum

119894=1

(⟨119885119894 119875119894S minus 119873

119894⟩ +

120573119894

2

1003817100381710038171003817119875119894S minus 119873

119894

1003817100381710038171003817

2

119865)

(25)

The objective function is also differentiable so the min-imizer Smin is characterized by (120597119871

119860(S))120597S = 0 Thus we

have

Smin =A minusL minus 120574sum

119899

119894=1119875119894

119879(119885119894minus 120573119894119873119894)

(1 + 120574sum119899

119894=1120573119894)

(26)

For comparing with RSTD [22] we also choose thedifference of L and S in successive iterations against acertain tolerance as the stopping criterion The pseudocodeof the proposed ADMM-TR algorithm is summarized inAlgorithm 1

Theorem 2 (Convergence of ADMM-TR) Assume that theoptimal solution set 119883lowast of (11) is nonempty A sequenceL(119896)S(119896)119872

(119896)

119894 119873(119896)

119894 119884(119896)

119894 119885(119896)

119894 generated by our proposed

ADMM-TR algorithm is bounded and every limit point ofL(119896)S(119896) is an optimal solution of the original problem (11)

Proof We check the assumptions of Theorem 1 119862119909is not

bounded but 119875119894

lowast119875119894= 119868 is a constant multiple of the identity

operator Thus Theorem 1 can also be applied to ADMM-TRandTheorem 2 can be derived

5 Numerical Experiments

This section evaluates the empirical performance of the pro-posed algorithm on synthetic data and compares the resultswithRSTD (Rank Sparsity TensorDecomposition) [22] Alsoexperiments on traffic volume data outlier recovery illustratethe efficiency of the proposedmethod in traffic research filed

We use the Lanczos algorithm for computing the singularvalues decomposition and adopt the same rule for predicting

6 Mathematical Problems in Engineering

Table 2 Comparison of ADMM-TR with RSTD for synthetic data whereA0isin R40times40times40 119899-rank = [5 5 5]

119904119901119903

ADMM-TR RSTDRSE L

0

(119890-3)RSE S

0

(119890-3) iter Time (s) RSE L0

(119890-3)RSE S

0

(119890-3) iter Time (s)

005 43 43 133 137 46 46 208 178015 98 42 162 208 105 47 235 336025 123 43 514 449 159 52 676 510035 565 95 654 533 675 114 737 575

Table 3 Comparison of ADMM-TR with RSTD for synthetic data whereA0isin R40times40times40 n-rank = [10 10 10]

119904119901119903

Algorithm ADMM-TR Algorithm RSTDRSE L

0

(119890-3)RSE S

0

(119890-3) iter Time (s) RSE L0

(119890-3)RSE S

0

(119890-3) iter Time (s)

005 44 20 236 264 47 25 338 299015 47 21 417 421 52 25 603 508025 89 29 664 608 120 42 663 536035 173 52 981 806 211 65 1110 877

the dimension of the principal singular space as [22] And theparameters are set as 120572 = 120573 = [119868

1119868max 1198682119868max 119868119899119868max]

119879

and 120574 = 1sum([1198681119868max 1198682119868max 119868119899119868max]

119879 for all experi-ments where 119868max = max119868

119894 120578 is set to 1radic119868max as suggested

in [23]All the experiments are conducted and timed on the same

desktop with an Pentium (R) Dual-Core 250GHz CPU thathas 4GB memory running on Windows 7 and MATLAB

51 Synthetic Data ADMM-TR and RSTD are tested on thesynthetic data of size 40times40times40 We generate the dimension119903 of a ldquocore tensorrdquo C isin R119903timessdotsdotsdottimes119903 which we fill with Gaussiandistributed entries (sim N(0 1)) Then we generate matrices119880(1) 119880

(119873) with119880(119894) isin R119899119894times119903 whose elements are also iidGaussian random variables (simN(0 1)) and set

L0= Ctimes1119880(1)

times2sdotsdotsdottimes119873119880(119873) (27)

The entries of sparse tensor S0are independently dis-

tributed each taking on value 0 with probability 1-119904119901119903 andeach taking on impulsive value with probability 119904119901119903We applythe proposed algorithm to the tensorA

0=L0+S0to recover

L and S and compare with RSTD For these experimentstwo cases of 119899-rank are investigated 119899-rank = [5 5 5] and 119899-rank = [10 10 10] Table 1 presents the average results (across30 instances) for different 119904119901119903

The quality of recovery is measured by the relative squareerror (RSE) toL

0and S

0 defined to be

RSE L0=

10038171003817100381710038171003817

L minusL0

10038171003817100381710038171003817119865

1003817100381710038171003817L0

1003817100381710038171003817119865

RSE S0=

10038171003817100381710038171003817

S minusS0

10038171003817100381710038171003817119865

1003817100381710038171003817S0

1003817100381710038171003817119865

(28)

Tables 2 and 3 show that the proposed algorithm(ADMM-TR) is about 10 percent faster than RSTD proposed

in [22] and achieves better accuracy in terms of relative squareerror Though both of the algorithms involve computing aSVD per iteration we observe that the proposed algorithmtake much fewer iterations than RSTD to converge to theoptimal solution

The more impulsive entries are added that is for highervalue of spr the less probable it becomes for the tensorrecovery problem In addition the problem becomes moresophisticated when the 119899-rank is higher for the ground truthtensors of the same size In Table 1 different spr is set fortwo tensor cases And results show the recovered accuracydecreases as the spr grows for a certain case In particular therecovered accuracy for the tensorA

0isin R40times40times40 with 119899-rank

= [5 5 5] decreases sharply when the spr is up to 40 whilethe phenomenon occurs for tensor A

0isin R40times40times40 with 119899-

rank = [10 10 10] when the spr is about 25 This is due tothat relative low rank and low sparse ratio are precondition ofthe tensor recovery problem

52 TrafficVolumeData To evaluate the performances of theproposed method in traffic volume data outlier recovery acomplete traffic volume data set is used as the test data setWeuse the data of a fixed point in Sacramento County which isdownloaded fromhttppemsdotcagovThe traffic volumedata are recorded every 5 minutes Therefore a daily trafficvolume series for a loop detector contains 288 records andthe whole period of the data lasts for 16 days that is fromAugust 2 to August 17 2010

Based on multiple correlations of the traffic volume datawe model the data set as a tensor model of size 16 times 24 times 12which stands for 16 days 24 hours in a day and 12 sampleintervals (ie recorded by 5 minutes) per hour The ratiosof outlier data are set from 5 to 15 and the outlier dataare produced randomly All the results are averaged by 10instances

Mathematical Problems in Engineering 7

0

100

200

300

400

500

600

700

800

900

1000

Traffi

c vol

ume (

veh5

min

)

0 50 100 150 200 250 300

Raw dataOutlier data

Recovered data

Time interval (5 min)

Figure 1 Comparisons with raw traffic volume data data corruptedby outliers with 5 ratio and data recovered by ADMM-TR

0 50 100 150 200 250 3000

200

400

600

800

1000

1200

1400

Raw dataOutlier data

Recovered data

Time interval (5 min)

Traffi

c vol

ume (

veh5

min

)

Figure 2 Comparisons with raw traffic volume data data corruptedby outliers with 10 ratio and data recovered by ADMM-TR

For the real world data we mainly pay attention to thecorrectness of the recovered traffic volume data Thus thequality of recovery is measured by the relative square error(RSE) to L

0and Mean Absolute Percentage Error (MAPE)

toL0 defined to be

RSE L0=

10038171003817100381710038171003817

L minusL0

10038171003817100381710038171003817119865

1003817100381710038171003817L0

1003817100381710038171003817119865

MAPE L0=

1

119872

119872

sum

119898=1

1003816100381610038161003816100381610038161003816100381610038161003816

119905(119898)

119903minus 119905(119898)

119890

119905(119898)

119903

1003816100381610038161003816100381610038161003816100381610038161003816

(29)

0

200

400

600

800

1000

1200

1400

1600

1800

0 50 100 150 200 250 300

Raw dataOutlier data

Recovered data

Time interval (5 min)

Traffi

c vol

ume (

veh5

min

)

Figure 3 Comparisons with raw traffic volume data data corruptedby outliers with 15 ratio and data recovered by ADMM-TR

Table 4 Traffic volume outlier data before and afterbeing recovered

119904119901119903

Before recovery After recoveryRSE L

0

(119890-3)MAPE L

0

(119890-3)RSE L

0

(119890-3)MAPE L

0

(119890-3)005 05005 02136 00961 00314010 07066 05008 01417 01027015 08850 08777 02221 02351

where 119905(119898)119903

and 119905(119898)119890

are the 119898th elements which stand forthe known real value and recovered value respectively 119872denotes the number of recovered traffic volumes

Table 4 presents the relative errors of traffic volumeoutlier data before and after being recovered by ADMM-TRThe results show that the RSE L

0and MAPE L

0for traffic

volume data corrupted by outlier data are about 5 times thanthe data after being recovered by ADMM-TR Figures 1 2and 3 present the profiles of traffic volume data for a day Theresults show that ADMM-TR could recover the traffic volumeoutlier data with perfect performance

6 Conclusions

In this paper we concentrate on the mathematical problemin traffic volume outlier data recovery and proposed anovel tensor recovery method based on alternating directionmethod of multipliers (ADMM) The proposed algorithmcan automatically separate the low-119899-rank tensor data andsparse partThe experiments show that the proposedmethodis more stable and accurate in most cases and has excellentconvergence rate Experiments on real world traffic volumedata demonstrate the practicability and effectiveness of theproposed method in traffic research domain

In the future we would like to investigate how to auto-matically choose the parameters in our algorithm and explore

8 Mathematical Problems in Engineering

additional applications of our method in traffic researchdomain

Acknowledgments

This research was supported by NSFC (Grant No 6127137661171118 and 91120015) Beijing Natural Science Foundation(4122067)The authors would like to thank Professor Bin Ranfrom the University ofWisconsin-Madison and Yong Li fromthe University of Notre Dame for the suggestive discussions

References

[1] K Tamura and M Hirayama ldquoToward realization of VICSmdashvehicle information and communications systemrdquo in Pro-ceedings of the IEEE-IEE Vehicle Navigation and InformationsSystems Conference pp 72ndash77 October 1993

[2] S Kim M E Lewis and C C White III ldquoOptimal vehiclerouting with real-time traffic informationrdquo IEEE Transactionson Intelligent Transportation Systems vol 6 no 2 pp 178ndash1882005

[3] P Mirchandani and L Head ldquoA real-time traffic signal controlsystem architecture algorithms and analysisrdquo TransportationResearch Part C vol 9 no 6 pp 415ndash432 2001

[4] B M Williams P K Durvasula and D E Brown ldquoUrbanfreeway traffic flow prediction application of seasonal autore-gressive integrated moving average and exponential smoothingmodelsrdquo Transportation Research Record no 1644 pp 132ndash1411998

[5] J Xu X Li and H Shi ldquoShort-term traffic flow forecastingmodel under missing datardquo Journal of Computer Applicationsvol 30 no 4 pp 1117ndash1120 2010

[6] Y Pei and J Ma ldquoReal-time traffic data screening and recon-structionrdquo China Civil Engineering Journal vol 36 no 7 pp78ndash83 2003

[7] S Chen W Wang and W Li ldquoNoise recognition and noisereduction of real-time trafficdatardquoDongnanDaxueXuebao vol36 no 2 pp 322ndash325 2006

[8] B-G Daniel J D-P Francisco G-O David et al ldquoWavelet-based denoising for traffic volume time series forecasting withself-organizing neural networksrdquo Computer-Aided Civil andInfrastructure Engineering vol 25 no 7 pp 530ndash545 2010

[9] S Gandy B Recht and I Yamada ldquoTensor completion andlowmdashrank tensor recovery via convex optimizationrdquo InverseProblems vol 27 no 2 Article ID 025010 2011

[10] T G Kolda and B W Bader ldquoTensor decompositions andapplicationsrdquo SIAM Review vol 51 no 3 pp 455ndash500 2009

[11] A S Lewis and G Knowles ldquoImage compression using the 2-Dwavelet transformrdquo IEEE Transactions of Image Processing vol1 no 2 pp 244ndash250 1992

[12] V Chandrasekaran S Sanghavi P A Parrilo and A S WillskyldquoRank-sparsity incoherence for matrix decompositionrdquo SIAMJournal on Optimization vol 21 no 2 pp 572ndash596 2011

[13] J Wright Y Peng Y Ma and A Ganesh ldquoRobust principalcomponent analysis exact recovery of corrupted low-rankmatrices by convex optimizationrdquo in Proceedings of the 23rdAnnual Conference on Neural Information Processing Systems(NIPS rsquo09) pp 2080ndash2088 Vancouver Canada 2009

[14] A Ganesh Z Lin J Wright L Wu M Chen and Y MaldquoFast algorithms for recovering a corrupted low-rank matrixrdquo

in Proceedings of the 3rd IEEE International Workshop onComputational Advances in Multi-Sensor Adaptive Processing(CAMSAP rsquo09) pp 213ndash216 December 2009

[15] Z Lin M Chen L Wu and Y Ma ldquoThe augmented Lagrangemultiplier method for exact recovery of a corrupted low-rankmatricesrdquoMathematical Programming 2009

[16] X Yuan and J Yang ldquoSparse and low-rank matrix decomposi-tion via alternating directionmethodsrdquo Tech Rep Departmentof Mathematics Hong Kong Baptist University 2009

[17] L Qu Y Zhang J Hu L Jia and L Li ldquoA BPCA basedmissing value imputing method for traffic flow volume datardquo inProceedings of the IEEE Intelligent Vehicles Symposium (IV rsquo08)pp 985ndash990 June 2008

[18] J Liu P Musialski P Wonka and J Ye ldquoTensor completion forestimating missing values in visual datardquo in Proceedings of theInternational Conference on Computer Vision (ICCV rsquo09) 2009

[19] D Bertsekas and J Tsitsiklis Parallel and Distributed Computa-tion Prentice-Hall 1989

[20] J-F Cai E J Candes and Z Shen ldquoA singular value thresh-olding algorithm for matrix completionrdquo SIAM Journal onOptimization vol 20 no 4 pp 1956ndash1982 2010

[21] E T Hale W Yin and Y Zhang ldquoFixed-point continuationfor ℓ1-minimization methodology and convergencerdquo SIAMJournal on Optimization vol 19 no 3 pp 1107ndash1130 2008

[22] Y Li J Yan Y Zhou and J Yang ldquoOptimum subspace learningand error correction for tensorsrdquo in Proceedings of the 11thEuropean Conference on Computer Vision (ECCV rsquo10) CreteGreece 2010

[23] E J Candes X Li Y Ma and J Wright ldquoRobust principalcomponent analysisrdquo Journal of the ACM vol 58 no 3 article11 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 3: Research Article Traffic Volume Data Outlier …downloads.hindawi.com/journals/mpe/2013/164810.pdfTraffic Volume Data Outlier Recovery via Tensor Model HuachunTan, 1 JianshuaiFeng,

Mathematical Problems in Engineering 3

as A1= sum1198941sdotsdotsdot119894119896sdotsdotsdot119894119899

|1198861198941sdotsdotsdot119894119896sdotsdotsdot119894119899

| It is clear that A119865= 119860(119899)119865

A0= 119860(119899)0 and A

1= 119860(119899)1for any 1 le 119899 le 119873

The 119899-mode (matrix) product of a tensorA isin R1198681times1198682timessdotsdotsdottimes119868119873with a matrix 119872 isin R119869times119868119899 is denoted by Atimes

119899119872 and is size

1198681times sdot sdot sdot times 119868

119899minus1times119869times119868

119899+1times sdot sdot sdot times 119868

119873 In terms of flattened matrix

the 119899-mode product can be expressed as

Y = Atimes119899119872lArrrArr 119884

(119899)= 119872119860

(119899) (6)

3 Review of Data Recovery Methods

Recently the problem of recovering the sparse and low-rank components with no prior knowledge about the sparsitypattern of the sparse matrix or the rank of the low-rankmatrix has been well studied Authors of [12] proposedthe concept of ldquorank-sparse incoherencerdquo and solved theproblem by an interior point solver after being reformulatedas a semidefinite problem However although interior pointmethods normally take very few iterations to converge theyhave difficulty in handling large matrices So this limitationprevented the usage of the technique in computer vision andthe traffic volume recovery in this research

To solve the problem for large scale matrices Wrightet al [13] have adopted the iterative thresholding techniqueto solve the problem and obtained scalability propertiesLin et al have proposed an accelerated proximal gradient(APG) algorithm [14] and applied techniques of augmentedLagrange multipliers (ALM) [15] to solve the problem Yuanand Yang [16] have utilized the alternating direction method(ADM) which can be regarded as a practical version ofthe classical ALM method to solve the matrix recoveryproblem The ADM method has been proved to have apleasing convergence speed and results in [16] demonstratedits excellent performance

Inspired by the idea of [16] this paper extends the sparseand low-rank recovery problem to tensor case which is dueto the fact that the multidimensional traffic data can beformulated into the form of tensor

4 ADMM-TR for Traffic Volume OutlierRecovery

In this section we show the solution of problem (2) Thetensor model is firstly constructed for traffic volume inSection 41 Then we present the tensor recovery problem inSection 42 In Section 43 the classical ADMM approach isintroduced In Section 44 we convert the original probleminto a constrained convex optimization problem which canbe solved by the extended ADMM approach and presentthe details of the proposed algorithm Also the convergenceguarantees of the proposed algorithm are given in thissection

41 Tensor Model for Traffic Volume The correlations oftraffic volume data are critical for recovering the corruptedtraffic volume data Traditional methods mostly exploit partof correlations such as historical or temporal neighboringcorrelationsThe classic methods usually utilize the temporal

Table 1 The similarity coefficient of four modes

Mode Size Similarity coefficientHour 6 times 12 09670Day 7 times 288 08654Week 7 times 288 09153Link 4 times 288 08497

correlations of traffic data from day to day For the singledetector data multiple correlations contain the relations oftraffic data from day to day hour to hour and so forth Inaddition the spatial correlations exist in multiple detectorsdata

In this paper quantitative analysis of traffic data corre-lation is analyzed based on the traffic volume data down-loaded from httppemsdotcagov The correlation coeffi-cient applied tomeasuring the data correlation is given by [17]

119904 =

sum119899ge119894gt119895ge1

119877 (119894 119895)

119899 (119899 minus 1) 2

(7)

where 119899 refers to the whole data points 119877(119894 119895) refers tothe correlation coefficient matrix Table 1 gives the resultsof correlation coefficient of four modes which is hour dayweek and month

Conventional methods usually use day-to-day matrixpattern to model the traffic data Although each mode oftraffic data has a very high similarity these methods do notutilize the multimode correlations which are ldquoDay times HourrdquoldquoWeek times Hourrdquo and ldquoLink times Hourrdquo simultaneously and thusmay result in poor recovery performance

To make full use of the multimode correlation and trafficspatial-temporal information traffic data need to be con-structed into multiway data set Fortunately tensor patternbased traffic data can be well used to model the multiwaytraffic dataThis helps keep the original structure and employenough traffic spatial-temporal information

42 The Tensor Recovery Problem The problem of (2) is NPhard since it is not convex Then we use an approximationformulation as shown in (8)

minLS Llowast + 120578S1

st A minusL minusS119865 le 120575(8)

where A isin R1198681times1198682timessdotsdotsdottimes119868119873 is the given matrix to be recoveredL isin R1198681times1198682timessdotsdotsdottimes119868119873 is the low-rank component of A S isin

R1198681times1198682timessdotsdotsdottimes119868119873 is the sparse component of A Compared with(2) (8) relaxes the constrain for recovering a low-119899-ranktensor from a high-dimensional data tensor despite bothsmall entry-wise noise and gross sparse errors

Recently Liu et al [18] have proposed the definition of thenuclear norm of an 119899-mode tensor

Xlowast =1

119899

119899

sum

119894=1

1003817100381710038171003817119883(119894)

1003817100381710038171003817lowast (9)

4 Mathematical Problems in Engineering

Based on this definition the optimization in (8) can bewritten as

minLS Llowast + 120578S1 equiv

1

119899

119899

sum

119894=1

120582119894

1003817100381710038171003817119871(119894)

1003817100381710038171003817lowast+

1

119899

119899

sum

119894=1

120578119894

1003817100381710038171003817119878(119894)

10038171003817100381710038171

st A minusL minusS119865 le 120575

(10)

In order to recover (L S) instead of directly solving (10)we solve the following dual problem

min LS

1

2120574

A minusL minusS2

119865+

119899

sum

119894=1

120582119894

1003817100381710038171003817119871(119894)

1003817100381710038171003817lowast+

119899

sum

119894=1

120578119894

1003817100381710038171003817119878(119894)

10038171003817100381710038171 (11)

The problem in (11) is still difficult to solve due to theinterdependent nuclear norm and 119897

1norm constraints To

simplify the problem the formulation can be reformulated asfollows

minLS119872

119894119873119894

1

2120574

A minusL minus S2

119865+

119899

sum

119894=1

120582119894

1003817100381710038171003817119872119894

1003817100381710038171003817lowast+

119899

sum

119894=1

120578119894

1003817100381710038171003817119873119894

10038171003817100381710038171

st 119875119894L = 119872

119894119875119894S = 119873

119894forall119894

(12)

where 119875119894is the matrix representation of mode-119894 unfolding

(note that 119875119894is a permutation matrix thus 119875

119894

119879119875119894= 119868) 119872

(119894)

and119873(119894)are additional auxiliary matrices of the same size as

the mode-119894 unfolding ofL (or S)

43The Classical ADMMApproach The classical alternatingdirection method of multipliers (ADMM) is for solvingstructured convex programs of the form

min119909isin119862119909119910isin119862119910

119891 (119909) + 119892 (119910)

st 119860119909 + 119861119910 = 119888(13)

where119891 and 119892 are convex functions defined on closed subsets119862119909and 119862

119910119860 119861 and 119888 are matrices and vector of appropriate

sizes The segmented Lagrangian function of (13) is

119871119860(119909 119910 119908) = 119891 (119909) + 119892 (119910) + ⟨119908119860119909 + 119861119910 minus 119888⟩

+

120573

2

1003817100381710038171003817119860119909 + 119861119910 minus 119888

1003817100381710038171003817

2

2

(14)

where 119908 is a Lagrangian multiplier vector and 120573 gt 0 is apenalty parameter

The approach performs one sweep of alternating mini-mization with respect to 119909 and 119910 individually then updatesthe multiplier 119908 at the iteration 119896 the steps are given by [18Equations (479)ndash(481)]

119909(119896+1)larr997888 arg min

119909isin119862119909

119871119860(119909 119910(119896) 119908(119896))

119910(119896+1)larr997888 argmin

119909isin119862119910

119871119860(119909(119896+1) 119910 119908(119896))

119908(119896+1)larr997888 119908

(119896)+ 120588120573 (119860119909

(119896+1)+ 119861119910(119896+1)minus 119888)

(15)

where 120588 is the step length A convergence proof for the aboveADMM algorithm was shown as follows

Theorem 1 (See [19 Proposition 52]) Assume that theoptimal solution set 119883lowast of (13) is nonempty Furthermoreassume that119862

119909is bounded or else the matrix119860lowast119860 is invertible

Then a sequence 119909(119896) 119910(119896) 119908(119896) generated by (15) is boundedand every limit point of 119909(119896) is an optimal solution of theoriginal of problem (13)

44 ADMM Extension to Tensor Recovery We observe (12)is well structured in the sense that the separable structureemerges in both the objective function and constraintsThuswe propose an algorithm based on an extension of theclassical ADMM approach for solving the tensor recoveryproblem by taking advantage of this favorable structure

The augmented Lagrangian of (12) is

119871119860(LS119872

119894 119873119894)

=

1

2120574

A minusL minusS2

119865

+

119899

sum

119894=1

(120582119894

1003817100381710038171003817119872119894

1003817100381710038171003817lowast+ ⟨119884119894 119875119894L minus119872

119894⟩ +

120572119894

2

1003817100381710038171003817119875119894L minus119872

119894

1003817100381710038171003817

2

119865)

+

119899

sum

119894=1

(120578119894

1003817100381710038171003817119873119894

10038171003817100381710038171+ ⟨119885119894 119875119894S minus 119873

119894⟩ +

120573119894

2

1003817100381710038171003817119875119894S minus 119873

119894

1003817100381710038171003817

2

119865)

(16)

where 119884119894 119885119894are Lagrangian multipliers and 120572

119894 120573119894gt 0 are

penalty parametersThen we can now directly apply ADMM with this aug-

mented Lagrangian function

Computing 119872119894 The optimal119872

119894can be solved with all other

variables to be constant by the following subproblem

min119872119894

120582119894

1003817100381710038171003817119872119894

1003817100381710038171003817lowast+ ⟨119884119894 119875119894L minus119872

119894⟩ +

120572119894

2

1003817100381710038171003817119875119894L minus119872

119894

1003817100381710038171003817

2

119865 (17)

As shown in [20] the optimal solution of (17) is given by

119872119894= 119880119894119863120582119894120572119894(Λ)119881119894

119879 (18)

where 119880119894Λ119881119894

119879 is the singular value decomposition given by

119880119894Λ119881119894

119879= 119875119894L +

119884119894

120572119894

(19)

and the ldquoshrinkagerdquo operator119863120591(119909) with 120591 gt 0 is defined as

119863120591 (119909) =

119909 minus 120591 if 119909 gt 120591119909 + 120591 if 119909 lt minus1205910 otherwise

(20)

Computing 119873119894 The optimal 119873

119894can be solved with all other

variables to be the constants by the following subproblem

min119873119894

120578119894

1003817100381710038171003817119873119894

10038171003817100381710038171+ ⟨119885119894 119875119894S minus 119873

119894⟩ +

120573119894

2

1003817100381710038171003817119875119894S minus 119873

119894

1003817100381710038171003817

2

119865 (21)

Mathematical Problems in Engineering 5

Input n-mode tensorAParameters 120572 120573 120574 120582 120578 120588(1) InitializationL(0) = A S(0) = 0 119872

119894=L(119894) 119873119894= 0 119896 = 1

(2) Repeat until convergence(3) for 119894 = 1 to 119899(4) 119872

(119896+1)

119894= 119880119894119863120582119894120572119894(Λ)119881

119879

119894

where 119880119894Λ119881119879

119894= 119875119894L(119896) + 119884

(119896)

119894120572119894

(5) 119873(119896+1)

119894= 119863120578119894120573119894(119875119894S(119896) + 119885

(119896)

119894120573119894)

(6) 119884(119896+1)

119894= 119884(119896)

119894+ 120572119894(119875119894L(119896) minus119872

(119896+1)

119894)

(7) 119885(119896+1)

119894= 119885(119896)

119894+ 120573119894(119875119894S(119896) minus 119873

(119896+1)

119894)

(8) end for

(9) L(119896+1) =A minus S(119896) minus 120574sum

119899

119894=1119875119879

119894(119884(119896+1)

119894minus 120572119894119872(119896+1)

119894)

1 + 120574sum119899

119894=1120572119894

(10) S(119896+1) =A minusL(119896+1) minus 120574sum

119899

119894=1119875119879

119894(119885(119896+1)

119894minus 120572119894119873(119896+1)

119894)

1 + 120574sum119899

119894=1120573119894

(11) 120572 = 120588120572 120573 = 120588120573(12) End(13) 119896 = 119896 + 1

Output n-mode tensorLS

Algorithm 1 ADMM-TR ADMM for tensor recovery

By the well-known 1198971minimization [21] the optimal

solution of (21) is

119873119894= 119863120578119894120573119894

(119875119894S +119885119894

120573119894

) (22)

where119863120591is the ldquoshrinkagerdquo operation

Computing L Now we fix all variables except L andminimize 119871

119860over L The resulting minimization problem

is the minimization of a quadratic function

minL 119871119860 (

L) =1

2120574

A minusL minus S2

119865

+

119899

sum

119894=1

(⟨119884119894 119875119894L minus119872

119894⟩ +

120572119894

2

1003817100381710038171003817119875119894L minus119872

119894

1003817100381710038171003817

2

119865)

(23)

The objective function is differentiable so the minimizerLmin is characterized by (120597119871119860(L))120597L = 0 Thus we obtain

Lmin =A minusS minus 120574sum

119899

119894=1119875119894

119879(119884119894minus 120572119894119872119894)

(1 + 120574sum119899

119894=1120572119894)

(24)

ComputingS Nowwefix all variables exceptS andminimize119871119860

over S The resulting minimization problem is theminimization of a quadratic function

minS 119871119860 (

S) =1

2120574

A minusL minusS2

119865

+

119899

sum

119894=1

(⟨119885119894 119875119894S minus 119873

119894⟩ +

120573119894

2

1003817100381710038171003817119875119894S minus 119873

119894

1003817100381710038171003817

2

119865)

(25)

The objective function is also differentiable so the min-imizer Smin is characterized by (120597119871

119860(S))120597S = 0 Thus we

have

Smin =A minusL minus 120574sum

119899

119894=1119875119894

119879(119885119894minus 120573119894119873119894)

(1 + 120574sum119899

119894=1120573119894)

(26)

For comparing with RSTD [22] we also choose thedifference of L and S in successive iterations against acertain tolerance as the stopping criterion The pseudocodeof the proposed ADMM-TR algorithm is summarized inAlgorithm 1

Theorem 2 (Convergence of ADMM-TR) Assume that theoptimal solution set 119883lowast of (11) is nonempty A sequenceL(119896)S(119896)119872

(119896)

119894 119873(119896)

119894 119884(119896)

119894 119885(119896)

119894 generated by our proposed

ADMM-TR algorithm is bounded and every limit point ofL(119896)S(119896) is an optimal solution of the original problem (11)

Proof We check the assumptions of Theorem 1 119862119909is not

bounded but 119875119894

lowast119875119894= 119868 is a constant multiple of the identity

operator Thus Theorem 1 can also be applied to ADMM-TRandTheorem 2 can be derived

5 Numerical Experiments

This section evaluates the empirical performance of the pro-posed algorithm on synthetic data and compares the resultswithRSTD (Rank Sparsity TensorDecomposition) [22] Alsoexperiments on traffic volume data outlier recovery illustratethe efficiency of the proposedmethod in traffic research filed

We use the Lanczos algorithm for computing the singularvalues decomposition and adopt the same rule for predicting

6 Mathematical Problems in Engineering

Table 2 Comparison of ADMM-TR with RSTD for synthetic data whereA0isin R40times40times40 119899-rank = [5 5 5]

119904119901119903

ADMM-TR RSTDRSE L

0

(119890-3)RSE S

0

(119890-3) iter Time (s) RSE L0

(119890-3)RSE S

0

(119890-3) iter Time (s)

005 43 43 133 137 46 46 208 178015 98 42 162 208 105 47 235 336025 123 43 514 449 159 52 676 510035 565 95 654 533 675 114 737 575

Table 3 Comparison of ADMM-TR with RSTD for synthetic data whereA0isin R40times40times40 n-rank = [10 10 10]

119904119901119903

Algorithm ADMM-TR Algorithm RSTDRSE L

0

(119890-3)RSE S

0

(119890-3) iter Time (s) RSE L0

(119890-3)RSE S

0

(119890-3) iter Time (s)

005 44 20 236 264 47 25 338 299015 47 21 417 421 52 25 603 508025 89 29 664 608 120 42 663 536035 173 52 981 806 211 65 1110 877

the dimension of the principal singular space as [22] And theparameters are set as 120572 = 120573 = [119868

1119868max 1198682119868max 119868119899119868max]

119879

and 120574 = 1sum([1198681119868max 1198682119868max 119868119899119868max]

119879 for all experi-ments where 119868max = max119868

119894 120578 is set to 1radic119868max as suggested

in [23]All the experiments are conducted and timed on the same

desktop with an Pentium (R) Dual-Core 250GHz CPU thathas 4GB memory running on Windows 7 and MATLAB

51 Synthetic Data ADMM-TR and RSTD are tested on thesynthetic data of size 40times40times40 We generate the dimension119903 of a ldquocore tensorrdquo C isin R119903timessdotsdotsdottimes119903 which we fill with Gaussiandistributed entries (sim N(0 1)) Then we generate matrices119880(1) 119880

(119873) with119880(119894) isin R119899119894times119903 whose elements are also iidGaussian random variables (simN(0 1)) and set

L0= Ctimes1119880(1)

times2sdotsdotsdottimes119873119880(119873) (27)

The entries of sparse tensor S0are independently dis-

tributed each taking on value 0 with probability 1-119904119901119903 andeach taking on impulsive value with probability 119904119901119903We applythe proposed algorithm to the tensorA

0=L0+S0to recover

L and S and compare with RSTD For these experimentstwo cases of 119899-rank are investigated 119899-rank = [5 5 5] and 119899-rank = [10 10 10] Table 1 presents the average results (across30 instances) for different 119904119901119903

The quality of recovery is measured by the relative squareerror (RSE) toL

0and S

0 defined to be

RSE L0=

10038171003817100381710038171003817

L minusL0

10038171003817100381710038171003817119865

1003817100381710038171003817L0

1003817100381710038171003817119865

RSE S0=

10038171003817100381710038171003817

S minusS0

10038171003817100381710038171003817119865

1003817100381710038171003817S0

1003817100381710038171003817119865

(28)

Tables 2 and 3 show that the proposed algorithm(ADMM-TR) is about 10 percent faster than RSTD proposed

in [22] and achieves better accuracy in terms of relative squareerror Though both of the algorithms involve computing aSVD per iteration we observe that the proposed algorithmtake much fewer iterations than RSTD to converge to theoptimal solution

The more impulsive entries are added that is for highervalue of spr the less probable it becomes for the tensorrecovery problem In addition the problem becomes moresophisticated when the 119899-rank is higher for the ground truthtensors of the same size In Table 1 different spr is set fortwo tensor cases And results show the recovered accuracydecreases as the spr grows for a certain case In particular therecovered accuracy for the tensorA

0isin R40times40times40 with 119899-rank

= [5 5 5] decreases sharply when the spr is up to 40 whilethe phenomenon occurs for tensor A

0isin R40times40times40 with 119899-

rank = [10 10 10] when the spr is about 25 This is due tothat relative low rank and low sparse ratio are precondition ofthe tensor recovery problem

52 TrafficVolumeData To evaluate the performances of theproposed method in traffic volume data outlier recovery acomplete traffic volume data set is used as the test data setWeuse the data of a fixed point in Sacramento County which isdownloaded fromhttppemsdotcagovThe traffic volumedata are recorded every 5 minutes Therefore a daily trafficvolume series for a loop detector contains 288 records andthe whole period of the data lasts for 16 days that is fromAugust 2 to August 17 2010

Based on multiple correlations of the traffic volume datawe model the data set as a tensor model of size 16 times 24 times 12which stands for 16 days 24 hours in a day and 12 sampleintervals (ie recorded by 5 minutes) per hour The ratiosof outlier data are set from 5 to 15 and the outlier dataare produced randomly All the results are averaged by 10instances

Mathematical Problems in Engineering 7

0

100

200

300

400

500

600

700

800

900

1000

Traffi

c vol

ume (

veh5

min

)

0 50 100 150 200 250 300

Raw dataOutlier data

Recovered data

Time interval (5 min)

Figure 1 Comparisons with raw traffic volume data data corruptedby outliers with 5 ratio and data recovered by ADMM-TR

0 50 100 150 200 250 3000

200

400

600

800

1000

1200

1400

Raw dataOutlier data

Recovered data

Time interval (5 min)

Traffi

c vol

ume (

veh5

min

)

Figure 2 Comparisons with raw traffic volume data data corruptedby outliers with 10 ratio and data recovered by ADMM-TR

For the real world data we mainly pay attention to thecorrectness of the recovered traffic volume data Thus thequality of recovery is measured by the relative square error(RSE) to L

0and Mean Absolute Percentage Error (MAPE)

toL0 defined to be

RSE L0=

10038171003817100381710038171003817

L minusL0

10038171003817100381710038171003817119865

1003817100381710038171003817L0

1003817100381710038171003817119865

MAPE L0=

1

119872

119872

sum

119898=1

1003816100381610038161003816100381610038161003816100381610038161003816

119905(119898)

119903minus 119905(119898)

119890

119905(119898)

119903

1003816100381610038161003816100381610038161003816100381610038161003816

(29)

0

200

400

600

800

1000

1200

1400

1600

1800

0 50 100 150 200 250 300

Raw dataOutlier data

Recovered data

Time interval (5 min)

Traffi

c vol

ume (

veh5

min

)

Figure 3 Comparisons with raw traffic volume data data corruptedby outliers with 15 ratio and data recovered by ADMM-TR

Table 4 Traffic volume outlier data before and afterbeing recovered

119904119901119903

Before recovery After recoveryRSE L

0

(119890-3)MAPE L

0

(119890-3)RSE L

0

(119890-3)MAPE L

0

(119890-3)005 05005 02136 00961 00314010 07066 05008 01417 01027015 08850 08777 02221 02351

where 119905(119898)119903

and 119905(119898)119890

are the 119898th elements which stand forthe known real value and recovered value respectively 119872denotes the number of recovered traffic volumes

Table 4 presents the relative errors of traffic volumeoutlier data before and after being recovered by ADMM-TRThe results show that the RSE L

0and MAPE L

0for traffic

volume data corrupted by outlier data are about 5 times thanthe data after being recovered by ADMM-TR Figures 1 2and 3 present the profiles of traffic volume data for a day Theresults show that ADMM-TR could recover the traffic volumeoutlier data with perfect performance

6 Conclusions

In this paper we concentrate on the mathematical problemin traffic volume outlier data recovery and proposed anovel tensor recovery method based on alternating directionmethod of multipliers (ADMM) The proposed algorithmcan automatically separate the low-119899-rank tensor data andsparse partThe experiments show that the proposedmethodis more stable and accurate in most cases and has excellentconvergence rate Experiments on real world traffic volumedata demonstrate the practicability and effectiveness of theproposed method in traffic research domain

In the future we would like to investigate how to auto-matically choose the parameters in our algorithm and explore

8 Mathematical Problems in Engineering

additional applications of our method in traffic researchdomain

Acknowledgments

This research was supported by NSFC (Grant No 6127137661171118 and 91120015) Beijing Natural Science Foundation(4122067)The authors would like to thank Professor Bin Ranfrom the University ofWisconsin-Madison and Yong Li fromthe University of Notre Dame for the suggestive discussions

References

[1] K Tamura and M Hirayama ldquoToward realization of VICSmdashvehicle information and communications systemrdquo in Pro-ceedings of the IEEE-IEE Vehicle Navigation and InformationsSystems Conference pp 72ndash77 October 1993

[2] S Kim M E Lewis and C C White III ldquoOptimal vehiclerouting with real-time traffic informationrdquo IEEE Transactionson Intelligent Transportation Systems vol 6 no 2 pp 178ndash1882005

[3] P Mirchandani and L Head ldquoA real-time traffic signal controlsystem architecture algorithms and analysisrdquo TransportationResearch Part C vol 9 no 6 pp 415ndash432 2001

[4] B M Williams P K Durvasula and D E Brown ldquoUrbanfreeway traffic flow prediction application of seasonal autore-gressive integrated moving average and exponential smoothingmodelsrdquo Transportation Research Record no 1644 pp 132ndash1411998

[5] J Xu X Li and H Shi ldquoShort-term traffic flow forecastingmodel under missing datardquo Journal of Computer Applicationsvol 30 no 4 pp 1117ndash1120 2010

[6] Y Pei and J Ma ldquoReal-time traffic data screening and recon-structionrdquo China Civil Engineering Journal vol 36 no 7 pp78ndash83 2003

[7] S Chen W Wang and W Li ldquoNoise recognition and noisereduction of real-time trafficdatardquoDongnanDaxueXuebao vol36 no 2 pp 322ndash325 2006

[8] B-G Daniel J D-P Francisco G-O David et al ldquoWavelet-based denoising for traffic volume time series forecasting withself-organizing neural networksrdquo Computer-Aided Civil andInfrastructure Engineering vol 25 no 7 pp 530ndash545 2010

[9] S Gandy B Recht and I Yamada ldquoTensor completion andlowmdashrank tensor recovery via convex optimizationrdquo InverseProblems vol 27 no 2 Article ID 025010 2011

[10] T G Kolda and B W Bader ldquoTensor decompositions andapplicationsrdquo SIAM Review vol 51 no 3 pp 455ndash500 2009

[11] A S Lewis and G Knowles ldquoImage compression using the 2-Dwavelet transformrdquo IEEE Transactions of Image Processing vol1 no 2 pp 244ndash250 1992

[12] V Chandrasekaran S Sanghavi P A Parrilo and A S WillskyldquoRank-sparsity incoherence for matrix decompositionrdquo SIAMJournal on Optimization vol 21 no 2 pp 572ndash596 2011

[13] J Wright Y Peng Y Ma and A Ganesh ldquoRobust principalcomponent analysis exact recovery of corrupted low-rankmatrices by convex optimizationrdquo in Proceedings of the 23rdAnnual Conference on Neural Information Processing Systems(NIPS rsquo09) pp 2080ndash2088 Vancouver Canada 2009

[14] A Ganesh Z Lin J Wright L Wu M Chen and Y MaldquoFast algorithms for recovering a corrupted low-rank matrixrdquo

in Proceedings of the 3rd IEEE International Workshop onComputational Advances in Multi-Sensor Adaptive Processing(CAMSAP rsquo09) pp 213ndash216 December 2009

[15] Z Lin M Chen L Wu and Y Ma ldquoThe augmented Lagrangemultiplier method for exact recovery of a corrupted low-rankmatricesrdquoMathematical Programming 2009

[16] X Yuan and J Yang ldquoSparse and low-rank matrix decomposi-tion via alternating directionmethodsrdquo Tech Rep Departmentof Mathematics Hong Kong Baptist University 2009

[17] L Qu Y Zhang J Hu L Jia and L Li ldquoA BPCA basedmissing value imputing method for traffic flow volume datardquo inProceedings of the IEEE Intelligent Vehicles Symposium (IV rsquo08)pp 985ndash990 June 2008

[18] J Liu P Musialski P Wonka and J Ye ldquoTensor completion forestimating missing values in visual datardquo in Proceedings of theInternational Conference on Computer Vision (ICCV rsquo09) 2009

[19] D Bertsekas and J Tsitsiklis Parallel and Distributed Computa-tion Prentice-Hall 1989

[20] J-F Cai E J Candes and Z Shen ldquoA singular value thresh-olding algorithm for matrix completionrdquo SIAM Journal onOptimization vol 20 no 4 pp 1956ndash1982 2010

[21] E T Hale W Yin and Y Zhang ldquoFixed-point continuationfor ℓ1-minimization methodology and convergencerdquo SIAMJournal on Optimization vol 19 no 3 pp 1107ndash1130 2008

[22] Y Li J Yan Y Zhou and J Yang ldquoOptimum subspace learningand error correction for tensorsrdquo in Proceedings of the 11thEuropean Conference on Computer Vision (ECCV rsquo10) CreteGreece 2010

[23] E J Candes X Li Y Ma and J Wright ldquoRobust principalcomponent analysisrdquo Journal of the ACM vol 58 no 3 article11 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 4: Research Article Traffic Volume Data Outlier …downloads.hindawi.com/journals/mpe/2013/164810.pdfTraffic Volume Data Outlier Recovery via Tensor Model HuachunTan, 1 JianshuaiFeng,

4 Mathematical Problems in Engineering

Based on this definition the optimization in (8) can bewritten as

minLS Llowast + 120578S1 equiv

1

119899

119899

sum

119894=1

120582119894

1003817100381710038171003817119871(119894)

1003817100381710038171003817lowast+

1

119899

119899

sum

119894=1

120578119894

1003817100381710038171003817119878(119894)

10038171003817100381710038171

st A minusL minusS119865 le 120575

(10)

In order to recover (L S) instead of directly solving (10)we solve the following dual problem

min LS

1

2120574

A minusL minusS2

119865+

119899

sum

119894=1

120582119894

1003817100381710038171003817119871(119894)

1003817100381710038171003817lowast+

119899

sum

119894=1

120578119894

1003817100381710038171003817119878(119894)

10038171003817100381710038171 (11)

The problem in (11) is still difficult to solve due to theinterdependent nuclear norm and 119897

1norm constraints To

simplify the problem the formulation can be reformulated asfollows

minLS119872

119894119873119894

1

2120574

A minusL minus S2

119865+

119899

sum

119894=1

120582119894

1003817100381710038171003817119872119894

1003817100381710038171003817lowast+

119899

sum

119894=1

120578119894

1003817100381710038171003817119873119894

10038171003817100381710038171

st 119875119894L = 119872

119894119875119894S = 119873

119894forall119894

(12)

where 119875119894is the matrix representation of mode-119894 unfolding

(note that 119875119894is a permutation matrix thus 119875

119894

119879119875119894= 119868) 119872

(119894)

and119873(119894)are additional auxiliary matrices of the same size as

the mode-119894 unfolding ofL (or S)

43The Classical ADMMApproach The classical alternatingdirection method of multipliers (ADMM) is for solvingstructured convex programs of the form

min119909isin119862119909119910isin119862119910

119891 (119909) + 119892 (119910)

st 119860119909 + 119861119910 = 119888(13)

where119891 and 119892 are convex functions defined on closed subsets119862119909and 119862

119910119860 119861 and 119888 are matrices and vector of appropriate

sizes The segmented Lagrangian function of (13) is

119871119860(119909 119910 119908) = 119891 (119909) + 119892 (119910) + ⟨119908119860119909 + 119861119910 minus 119888⟩

+

120573

2

1003817100381710038171003817119860119909 + 119861119910 minus 119888

1003817100381710038171003817

2

2

(14)

where 119908 is a Lagrangian multiplier vector and 120573 gt 0 is apenalty parameter

The approach performs one sweep of alternating mini-mization with respect to 119909 and 119910 individually then updatesthe multiplier 119908 at the iteration 119896 the steps are given by [18Equations (479)ndash(481)]

119909(119896+1)larr997888 arg min

119909isin119862119909

119871119860(119909 119910(119896) 119908(119896))

119910(119896+1)larr997888 argmin

119909isin119862119910

119871119860(119909(119896+1) 119910 119908(119896))

119908(119896+1)larr997888 119908

(119896)+ 120588120573 (119860119909

(119896+1)+ 119861119910(119896+1)minus 119888)

(15)

where 120588 is the step length A convergence proof for the aboveADMM algorithm was shown as follows

Theorem 1 (See [19 Proposition 52]) Assume that theoptimal solution set 119883lowast of (13) is nonempty Furthermoreassume that119862

119909is bounded or else the matrix119860lowast119860 is invertible

Then a sequence 119909(119896) 119910(119896) 119908(119896) generated by (15) is boundedand every limit point of 119909(119896) is an optimal solution of theoriginal of problem (13)

44 ADMM Extension to Tensor Recovery We observe (12)is well structured in the sense that the separable structureemerges in both the objective function and constraintsThuswe propose an algorithm based on an extension of theclassical ADMM approach for solving the tensor recoveryproblem by taking advantage of this favorable structure

The augmented Lagrangian of (12) is

119871119860(LS119872

119894 119873119894)

=

1

2120574

A minusL minusS2

119865

+

119899

sum

119894=1

(120582119894

1003817100381710038171003817119872119894

1003817100381710038171003817lowast+ ⟨119884119894 119875119894L minus119872

119894⟩ +

120572119894

2

1003817100381710038171003817119875119894L minus119872

119894

1003817100381710038171003817

2

119865)

+

119899

sum

119894=1

(120578119894

1003817100381710038171003817119873119894

10038171003817100381710038171+ ⟨119885119894 119875119894S minus 119873

119894⟩ +

120573119894

2

1003817100381710038171003817119875119894S minus 119873

119894

1003817100381710038171003817

2

119865)

(16)

where 119884119894 119885119894are Lagrangian multipliers and 120572

119894 120573119894gt 0 are

penalty parametersThen we can now directly apply ADMM with this aug-

mented Lagrangian function

Computing 119872119894 The optimal119872

119894can be solved with all other

variables to be constant by the following subproblem

min119872119894

120582119894

1003817100381710038171003817119872119894

1003817100381710038171003817lowast+ ⟨119884119894 119875119894L minus119872

119894⟩ +

120572119894

2

1003817100381710038171003817119875119894L minus119872

119894

1003817100381710038171003817

2

119865 (17)

As shown in [20] the optimal solution of (17) is given by

119872119894= 119880119894119863120582119894120572119894(Λ)119881119894

119879 (18)

where 119880119894Λ119881119894

119879 is the singular value decomposition given by

119880119894Λ119881119894

119879= 119875119894L +

119884119894

120572119894

(19)

and the ldquoshrinkagerdquo operator119863120591(119909) with 120591 gt 0 is defined as

119863120591 (119909) =

119909 minus 120591 if 119909 gt 120591119909 + 120591 if 119909 lt minus1205910 otherwise

(20)

Computing 119873119894 The optimal 119873

119894can be solved with all other

variables to be the constants by the following subproblem

min119873119894

120578119894

1003817100381710038171003817119873119894

10038171003817100381710038171+ ⟨119885119894 119875119894S minus 119873

119894⟩ +

120573119894

2

1003817100381710038171003817119875119894S minus 119873

119894

1003817100381710038171003817

2

119865 (21)

Mathematical Problems in Engineering 5

Input n-mode tensorAParameters 120572 120573 120574 120582 120578 120588(1) InitializationL(0) = A S(0) = 0 119872

119894=L(119894) 119873119894= 0 119896 = 1

(2) Repeat until convergence(3) for 119894 = 1 to 119899(4) 119872

(119896+1)

119894= 119880119894119863120582119894120572119894(Λ)119881

119879

119894

where 119880119894Λ119881119879

119894= 119875119894L(119896) + 119884

(119896)

119894120572119894

(5) 119873(119896+1)

119894= 119863120578119894120573119894(119875119894S(119896) + 119885

(119896)

119894120573119894)

(6) 119884(119896+1)

119894= 119884(119896)

119894+ 120572119894(119875119894L(119896) minus119872

(119896+1)

119894)

(7) 119885(119896+1)

119894= 119885(119896)

119894+ 120573119894(119875119894S(119896) minus 119873

(119896+1)

119894)

(8) end for

(9) L(119896+1) =A minus S(119896) minus 120574sum

119899

119894=1119875119879

119894(119884(119896+1)

119894minus 120572119894119872(119896+1)

119894)

1 + 120574sum119899

119894=1120572119894

(10) S(119896+1) =A minusL(119896+1) minus 120574sum

119899

119894=1119875119879

119894(119885(119896+1)

119894minus 120572119894119873(119896+1)

119894)

1 + 120574sum119899

119894=1120573119894

(11) 120572 = 120588120572 120573 = 120588120573(12) End(13) 119896 = 119896 + 1

Output n-mode tensorLS

Algorithm 1 ADMM-TR ADMM for tensor recovery

By the well-known 1198971minimization [21] the optimal

solution of (21) is

119873119894= 119863120578119894120573119894

(119875119894S +119885119894

120573119894

) (22)

where119863120591is the ldquoshrinkagerdquo operation

Computing L Now we fix all variables except L andminimize 119871

119860over L The resulting minimization problem

is the minimization of a quadratic function

minL 119871119860 (

L) =1

2120574

A minusL minus S2

119865

+

119899

sum

119894=1

(⟨119884119894 119875119894L minus119872

119894⟩ +

120572119894

2

1003817100381710038171003817119875119894L minus119872

119894

1003817100381710038171003817

2

119865)

(23)

The objective function is differentiable so the minimizerLmin is characterized by (120597119871119860(L))120597L = 0 Thus we obtain

Lmin =A minusS minus 120574sum

119899

119894=1119875119894

119879(119884119894minus 120572119894119872119894)

(1 + 120574sum119899

119894=1120572119894)

(24)

ComputingS Nowwefix all variables exceptS andminimize119871119860

over S The resulting minimization problem is theminimization of a quadratic function

minS 119871119860 (

S) =1

2120574

A minusL minusS2

119865

+

119899

sum

119894=1

(⟨119885119894 119875119894S minus 119873

119894⟩ +

120573119894

2

1003817100381710038171003817119875119894S minus 119873

119894

1003817100381710038171003817

2

119865)

(25)

The objective function is also differentiable so the min-imizer Smin is characterized by (120597119871

119860(S))120597S = 0 Thus we

have

Smin =A minusL minus 120574sum

119899

119894=1119875119894

119879(119885119894minus 120573119894119873119894)

(1 + 120574sum119899

119894=1120573119894)

(26)

For comparing with RSTD [22] we also choose thedifference of L and S in successive iterations against acertain tolerance as the stopping criterion The pseudocodeof the proposed ADMM-TR algorithm is summarized inAlgorithm 1

Theorem 2 (Convergence of ADMM-TR) Assume that theoptimal solution set 119883lowast of (11) is nonempty A sequenceL(119896)S(119896)119872

(119896)

119894 119873(119896)

119894 119884(119896)

119894 119885(119896)

119894 generated by our proposed

ADMM-TR algorithm is bounded and every limit point ofL(119896)S(119896) is an optimal solution of the original problem (11)

Proof We check the assumptions of Theorem 1 119862119909is not

bounded but 119875119894

lowast119875119894= 119868 is a constant multiple of the identity

operator Thus Theorem 1 can also be applied to ADMM-TRandTheorem 2 can be derived

5 Numerical Experiments

This section evaluates the empirical performance of the pro-posed algorithm on synthetic data and compares the resultswithRSTD (Rank Sparsity TensorDecomposition) [22] Alsoexperiments on traffic volume data outlier recovery illustratethe efficiency of the proposedmethod in traffic research filed

We use the Lanczos algorithm for computing the singularvalues decomposition and adopt the same rule for predicting

6 Mathematical Problems in Engineering

Table 2 Comparison of ADMM-TR with RSTD for synthetic data whereA0isin R40times40times40 119899-rank = [5 5 5]

119904119901119903

ADMM-TR RSTDRSE L

0

(119890-3)RSE S

0

(119890-3) iter Time (s) RSE L0

(119890-3)RSE S

0

(119890-3) iter Time (s)

005 43 43 133 137 46 46 208 178015 98 42 162 208 105 47 235 336025 123 43 514 449 159 52 676 510035 565 95 654 533 675 114 737 575

Table 3 Comparison of ADMM-TR with RSTD for synthetic data whereA0isin R40times40times40 n-rank = [10 10 10]

119904119901119903

Algorithm ADMM-TR Algorithm RSTDRSE L

0

(119890-3)RSE S

0

(119890-3) iter Time (s) RSE L0

(119890-3)RSE S

0

(119890-3) iter Time (s)

005 44 20 236 264 47 25 338 299015 47 21 417 421 52 25 603 508025 89 29 664 608 120 42 663 536035 173 52 981 806 211 65 1110 877

the dimension of the principal singular space as [22] And theparameters are set as 120572 = 120573 = [119868

1119868max 1198682119868max 119868119899119868max]

119879

and 120574 = 1sum([1198681119868max 1198682119868max 119868119899119868max]

119879 for all experi-ments where 119868max = max119868

119894 120578 is set to 1radic119868max as suggested

in [23]All the experiments are conducted and timed on the same

desktop with an Pentium (R) Dual-Core 250GHz CPU thathas 4GB memory running on Windows 7 and MATLAB

51 Synthetic Data ADMM-TR and RSTD are tested on thesynthetic data of size 40times40times40 We generate the dimension119903 of a ldquocore tensorrdquo C isin R119903timessdotsdotsdottimes119903 which we fill with Gaussiandistributed entries (sim N(0 1)) Then we generate matrices119880(1) 119880

(119873) with119880(119894) isin R119899119894times119903 whose elements are also iidGaussian random variables (simN(0 1)) and set

L0= Ctimes1119880(1)

times2sdotsdotsdottimes119873119880(119873) (27)

The entries of sparse tensor S0are independently dis-

tributed each taking on value 0 with probability 1-119904119901119903 andeach taking on impulsive value with probability 119904119901119903We applythe proposed algorithm to the tensorA

0=L0+S0to recover

L and S and compare with RSTD For these experimentstwo cases of 119899-rank are investigated 119899-rank = [5 5 5] and 119899-rank = [10 10 10] Table 1 presents the average results (across30 instances) for different 119904119901119903

The quality of recovery is measured by the relative squareerror (RSE) toL

0and S

0 defined to be

RSE L0=

10038171003817100381710038171003817

L minusL0

10038171003817100381710038171003817119865

1003817100381710038171003817L0

1003817100381710038171003817119865

RSE S0=

10038171003817100381710038171003817

S minusS0

10038171003817100381710038171003817119865

1003817100381710038171003817S0

1003817100381710038171003817119865

(28)

Tables 2 and 3 show that the proposed algorithm(ADMM-TR) is about 10 percent faster than RSTD proposed

in [22] and achieves better accuracy in terms of relative squareerror Though both of the algorithms involve computing aSVD per iteration we observe that the proposed algorithmtake much fewer iterations than RSTD to converge to theoptimal solution

The more impulsive entries are added that is for highervalue of spr the less probable it becomes for the tensorrecovery problem In addition the problem becomes moresophisticated when the 119899-rank is higher for the ground truthtensors of the same size In Table 1 different spr is set fortwo tensor cases And results show the recovered accuracydecreases as the spr grows for a certain case In particular therecovered accuracy for the tensorA

0isin R40times40times40 with 119899-rank

= [5 5 5] decreases sharply when the spr is up to 40 whilethe phenomenon occurs for tensor A

0isin R40times40times40 with 119899-

rank = [10 10 10] when the spr is about 25 This is due tothat relative low rank and low sparse ratio are precondition ofthe tensor recovery problem

52 TrafficVolumeData To evaluate the performances of theproposed method in traffic volume data outlier recovery acomplete traffic volume data set is used as the test data setWeuse the data of a fixed point in Sacramento County which isdownloaded fromhttppemsdotcagovThe traffic volumedata are recorded every 5 minutes Therefore a daily trafficvolume series for a loop detector contains 288 records andthe whole period of the data lasts for 16 days that is fromAugust 2 to August 17 2010

Based on multiple correlations of the traffic volume datawe model the data set as a tensor model of size 16 times 24 times 12which stands for 16 days 24 hours in a day and 12 sampleintervals (ie recorded by 5 minutes) per hour The ratiosof outlier data are set from 5 to 15 and the outlier dataare produced randomly All the results are averaged by 10instances

Mathematical Problems in Engineering 7

0

100

200

300

400

500

600

700

800

900

1000

Traffi

c vol

ume (

veh5

min

)

0 50 100 150 200 250 300

Raw dataOutlier data

Recovered data

Time interval (5 min)

Figure 1 Comparisons with raw traffic volume data data corruptedby outliers with 5 ratio and data recovered by ADMM-TR

0 50 100 150 200 250 3000

200

400

600

800

1000

1200

1400

Raw dataOutlier data

Recovered data

Time interval (5 min)

Traffi

c vol

ume (

veh5

min

)

Figure 2 Comparisons with raw traffic volume data data corruptedby outliers with 10 ratio and data recovered by ADMM-TR

For the real world data we mainly pay attention to thecorrectness of the recovered traffic volume data Thus thequality of recovery is measured by the relative square error(RSE) to L

0and Mean Absolute Percentage Error (MAPE)

toL0 defined to be

RSE L0=

10038171003817100381710038171003817

L minusL0

10038171003817100381710038171003817119865

1003817100381710038171003817L0

1003817100381710038171003817119865

MAPE L0=

1

119872

119872

sum

119898=1

1003816100381610038161003816100381610038161003816100381610038161003816

119905(119898)

119903minus 119905(119898)

119890

119905(119898)

119903

1003816100381610038161003816100381610038161003816100381610038161003816

(29)

0

200

400

600

800

1000

1200

1400

1600

1800

0 50 100 150 200 250 300

Raw dataOutlier data

Recovered data

Time interval (5 min)

Traffi

c vol

ume (

veh5

min

)

Figure 3 Comparisons with raw traffic volume data data corruptedby outliers with 15 ratio and data recovered by ADMM-TR

Table 4 Traffic volume outlier data before and afterbeing recovered

119904119901119903

Before recovery After recoveryRSE L

0

(119890-3)MAPE L

0

(119890-3)RSE L

0

(119890-3)MAPE L

0

(119890-3)005 05005 02136 00961 00314010 07066 05008 01417 01027015 08850 08777 02221 02351

where 119905(119898)119903

and 119905(119898)119890

are the 119898th elements which stand forthe known real value and recovered value respectively 119872denotes the number of recovered traffic volumes

Table 4 presents the relative errors of traffic volumeoutlier data before and after being recovered by ADMM-TRThe results show that the RSE L

0and MAPE L

0for traffic

volume data corrupted by outlier data are about 5 times thanthe data after being recovered by ADMM-TR Figures 1 2and 3 present the profiles of traffic volume data for a day Theresults show that ADMM-TR could recover the traffic volumeoutlier data with perfect performance

6 Conclusions

In this paper we concentrate on the mathematical problemin traffic volume outlier data recovery and proposed anovel tensor recovery method based on alternating directionmethod of multipliers (ADMM) The proposed algorithmcan automatically separate the low-119899-rank tensor data andsparse partThe experiments show that the proposedmethodis more stable and accurate in most cases and has excellentconvergence rate Experiments on real world traffic volumedata demonstrate the practicability and effectiveness of theproposed method in traffic research domain

In the future we would like to investigate how to auto-matically choose the parameters in our algorithm and explore

8 Mathematical Problems in Engineering

additional applications of our method in traffic researchdomain

Acknowledgments

This research was supported by NSFC (Grant No 6127137661171118 and 91120015) Beijing Natural Science Foundation(4122067)The authors would like to thank Professor Bin Ranfrom the University ofWisconsin-Madison and Yong Li fromthe University of Notre Dame for the suggestive discussions

References

[1] K Tamura and M Hirayama ldquoToward realization of VICSmdashvehicle information and communications systemrdquo in Pro-ceedings of the IEEE-IEE Vehicle Navigation and InformationsSystems Conference pp 72ndash77 October 1993

[2] S Kim M E Lewis and C C White III ldquoOptimal vehiclerouting with real-time traffic informationrdquo IEEE Transactionson Intelligent Transportation Systems vol 6 no 2 pp 178ndash1882005

[3] P Mirchandani and L Head ldquoA real-time traffic signal controlsystem architecture algorithms and analysisrdquo TransportationResearch Part C vol 9 no 6 pp 415ndash432 2001

[4] B M Williams P K Durvasula and D E Brown ldquoUrbanfreeway traffic flow prediction application of seasonal autore-gressive integrated moving average and exponential smoothingmodelsrdquo Transportation Research Record no 1644 pp 132ndash1411998

[5] J Xu X Li and H Shi ldquoShort-term traffic flow forecastingmodel under missing datardquo Journal of Computer Applicationsvol 30 no 4 pp 1117ndash1120 2010

[6] Y Pei and J Ma ldquoReal-time traffic data screening and recon-structionrdquo China Civil Engineering Journal vol 36 no 7 pp78ndash83 2003

[7] S Chen W Wang and W Li ldquoNoise recognition and noisereduction of real-time trafficdatardquoDongnanDaxueXuebao vol36 no 2 pp 322ndash325 2006

[8] B-G Daniel J D-P Francisco G-O David et al ldquoWavelet-based denoising for traffic volume time series forecasting withself-organizing neural networksrdquo Computer-Aided Civil andInfrastructure Engineering vol 25 no 7 pp 530ndash545 2010

[9] S Gandy B Recht and I Yamada ldquoTensor completion andlowmdashrank tensor recovery via convex optimizationrdquo InverseProblems vol 27 no 2 Article ID 025010 2011

[10] T G Kolda and B W Bader ldquoTensor decompositions andapplicationsrdquo SIAM Review vol 51 no 3 pp 455ndash500 2009

[11] A S Lewis and G Knowles ldquoImage compression using the 2-Dwavelet transformrdquo IEEE Transactions of Image Processing vol1 no 2 pp 244ndash250 1992

[12] V Chandrasekaran S Sanghavi P A Parrilo and A S WillskyldquoRank-sparsity incoherence for matrix decompositionrdquo SIAMJournal on Optimization vol 21 no 2 pp 572ndash596 2011

[13] J Wright Y Peng Y Ma and A Ganesh ldquoRobust principalcomponent analysis exact recovery of corrupted low-rankmatrices by convex optimizationrdquo in Proceedings of the 23rdAnnual Conference on Neural Information Processing Systems(NIPS rsquo09) pp 2080ndash2088 Vancouver Canada 2009

[14] A Ganesh Z Lin J Wright L Wu M Chen and Y MaldquoFast algorithms for recovering a corrupted low-rank matrixrdquo

in Proceedings of the 3rd IEEE International Workshop onComputational Advances in Multi-Sensor Adaptive Processing(CAMSAP rsquo09) pp 213ndash216 December 2009

[15] Z Lin M Chen L Wu and Y Ma ldquoThe augmented Lagrangemultiplier method for exact recovery of a corrupted low-rankmatricesrdquoMathematical Programming 2009

[16] X Yuan and J Yang ldquoSparse and low-rank matrix decomposi-tion via alternating directionmethodsrdquo Tech Rep Departmentof Mathematics Hong Kong Baptist University 2009

[17] L Qu Y Zhang J Hu L Jia and L Li ldquoA BPCA basedmissing value imputing method for traffic flow volume datardquo inProceedings of the IEEE Intelligent Vehicles Symposium (IV rsquo08)pp 985ndash990 June 2008

[18] J Liu P Musialski P Wonka and J Ye ldquoTensor completion forestimating missing values in visual datardquo in Proceedings of theInternational Conference on Computer Vision (ICCV rsquo09) 2009

[19] D Bertsekas and J Tsitsiklis Parallel and Distributed Computa-tion Prentice-Hall 1989

[20] J-F Cai E J Candes and Z Shen ldquoA singular value thresh-olding algorithm for matrix completionrdquo SIAM Journal onOptimization vol 20 no 4 pp 1956ndash1982 2010

[21] E T Hale W Yin and Y Zhang ldquoFixed-point continuationfor ℓ1-minimization methodology and convergencerdquo SIAMJournal on Optimization vol 19 no 3 pp 1107ndash1130 2008

[22] Y Li J Yan Y Zhou and J Yang ldquoOptimum subspace learningand error correction for tensorsrdquo in Proceedings of the 11thEuropean Conference on Computer Vision (ECCV rsquo10) CreteGreece 2010

[23] E J Candes X Li Y Ma and J Wright ldquoRobust principalcomponent analysisrdquo Journal of the ACM vol 58 no 3 article11 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 5: Research Article Traffic Volume Data Outlier …downloads.hindawi.com/journals/mpe/2013/164810.pdfTraffic Volume Data Outlier Recovery via Tensor Model HuachunTan, 1 JianshuaiFeng,

Mathematical Problems in Engineering 5

Input n-mode tensorAParameters 120572 120573 120574 120582 120578 120588(1) InitializationL(0) = A S(0) = 0 119872

119894=L(119894) 119873119894= 0 119896 = 1

(2) Repeat until convergence(3) for 119894 = 1 to 119899(4) 119872

(119896+1)

119894= 119880119894119863120582119894120572119894(Λ)119881

119879

119894

where 119880119894Λ119881119879

119894= 119875119894L(119896) + 119884

(119896)

119894120572119894

(5) 119873(119896+1)

119894= 119863120578119894120573119894(119875119894S(119896) + 119885

(119896)

119894120573119894)

(6) 119884(119896+1)

119894= 119884(119896)

119894+ 120572119894(119875119894L(119896) minus119872

(119896+1)

119894)

(7) 119885(119896+1)

119894= 119885(119896)

119894+ 120573119894(119875119894S(119896) minus 119873

(119896+1)

119894)

(8) end for

(9) L(119896+1) =A minus S(119896) minus 120574sum

119899

119894=1119875119879

119894(119884(119896+1)

119894minus 120572119894119872(119896+1)

119894)

1 + 120574sum119899

119894=1120572119894

(10) S(119896+1) =A minusL(119896+1) minus 120574sum

119899

119894=1119875119879

119894(119885(119896+1)

119894minus 120572119894119873(119896+1)

119894)

1 + 120574sum119899

119894=1120573119894

(11) 120572 = 120588120572 120573 = 120588120573(12) End(13) 119896 = 119896 + 1

Output n-mode tensorLS

Algorithm 1 ADMM-TR ADMM for tensor recovery

By the well-known 1198971minimization [21] the optimal

solution of (21) is

119873119894= 119863120578119894120573119894

(119875119894S +119885119894

120573119894

) (22)

where119863120591is the ldquoshrinkagerdquo operation

Computing L Now we fix all variables except L andminimize 119871

119860over L The resulting minimization problem

is the minimization of a quadratic function

minL 119871119860 (

L) =1

2120574

A minusL minus S2

119865

+

119899

sum

119894=1

(⟨119884119894 119875119894L minus119872

119894⟩ +

120572119894

2

1003817100381710038171003817119875119894L minus119872

119894

1003817100381710038171003817

2

119865)

(23)

The objective function is differentiable so the minimizerLmin is characterized by (120597119871119860(L))120597L = 0 Thus we obtain

Lmin =A minusS minus 120574sum

119899

119894=1119875119894

119879(119884119894minus 120572119894119872119894)

(1 + 120574sum119899

119894=1120572119894)

(24)

ComputingS Nowwefix all variables exceptS andminimize119871119860

over S The resulting minimization problem is theminimization of a quadratic function

minS 119871119860 (

S) =1

2120574

A minusL minusS2

119865

+

119899

sum

119894=1

(⟨119885119894 119875119894S minus 119873

119894⟩ +

120573119894

2

1003817100381710038171003817119875119894S minus 119873

119894

1003817100381710038171003817

2

119865)

(25)

The objective function is also differentiable so the min-imizer Smin is characterized by (120597119871

119860(S))120597S = 0 Thus we

have

Smin =A minusL minus 120574sum

119899

119894=1119875119894

119879(119885119894minus 120573119894119873119894)

(1 + 120574sum119899

119894=1120573119894)

(26)

For comparing with RSTD [22] we also choose thedifference of L and S in successive iterations against acertain tolerance as the stopping criterion The pseudocodeof the proposed ADMM-TR algorithm is summarized inAlgorithm 1

Theorem 2 (Convergence of ADMM-TR) Assume that theoptimal solution set 119883lowast of (11) is nonempty A sequenceL(119896)S(119896)119872

(119896)

119894 119873(119896)

119894 119884(119896)

119894 119885(119896)

119894 generated by our proposed

ADMM-TR algorithm is bounded and every limit point ofL(119896)S(119896) is an optimal solution of the original problem (11)

Proof We check the assumptions of Theorem 1 119862119909is not

bounded but 119875119894

lowast119875119894= 119868 is a constant multiple of the identity

operator Thus Theorem 1 can also be applied to ADMM-TRandTheorem 2 can be derived

5 Numerical Experiments

This section evaluates the empirical performance of the pro-posed algorithm on synthetic data and compares the resultswithRSTD (Rank Sparsity TensorDecomposition) [22] Alsoexperiments on traffic volume data outlier recovery illustratethe efficiency of the proposedmethod in traffic research filed

We use the Lanczos algorithm for computing the singularvalues decomposition and adopt the same rule for predicting

6 Mathematical Problems in Engineering

Table 2 Comparison of ADMM-TR with RSTD for synthetic data whereA0isin R40times40times40 119899-rank = [5 5 5]

119904119901119903

ADMM-TR RSTDRSE L

0

(119890-3)RSE S

0

(119890-3) iter Time (s) RSE L0

(119890-3)RSE S

0

(119890-3) iter Time (s)

005 43 43 133 137 46 46 208 178015 98 42 162 208 105 47 235 336025 123 43 514 449 159 52 676 510035 565 95 654 533 675 114 737 575

Table 3 Comparison of ADMM-TR with RSTD for synthetic data whereA0isin R40times40times40 n-rank = [10 10 10]

119904119901119903

Algorithm ADMM-TR Algorithm RSTDRSE L

0

(119890-3)RSE S

0

(119890-3) iter Time (s) RSE L0

(119890-3)RSE S

0

(119890-3) iter Time (s)

005 44 20 236 264 47 25 338 299015 47 21 417 421 52 25 603 508025 89 29 664 608 120 42 663 536035 173 52 981 806 211 65 1110 877

the dimension of the principal singular space as [22] And theparameters are set as 120572 = 120573 = [119868

1119868max 1198682119868max 119868119899119868max]

119879

and 120574 = 1sum([1198681119868max 1198682119868max 119868119899119868max]

119879 for all experi-ments where 119868max = max119868

119894 120578 is set to 1radic119868max as suggested

in [23]All the experiments are conducted and timed on the same

desktop with an Pentium (R) Dual-Core 250GHz CPU thathas 4GB memory running on Windows 7 and MATLAB

51 Synthetic Data ADMM-TR and RSTD are tested on thesynthetic data of size 40times40times40 We generate the dimension119903 of a ldquocore tensorrdquo C isin R119903timessdotsdotsdottimes119903 which we fill with Gaussiandistributed entries (sim N(0 1)) Then we generate matrices119880(1) 119880

(119873) with119880(119894) isin R119899119894times119903 whose elements are also iidGaussian random variables (simN(0 1)) and set

L0= Ctimes1119880(1)

times2sdotsdotsdottimes119873119880(119873) (27)

The entries of sparse tensor S0are independently dis-

tributed each taking on value 0 with probability 1-119904119901119903 andeach taking on impulsive value with probability 119904119901119903We applythe proposed algorithm to the tensorA

0=L0+S0to recover

L and S and compare with RSTD For these experimentstwo cases of 119899-rank are investigated 119899-rank = [5 5 5] and 119899-rank = [10 10 10] Table 1 presents the average results (across30 instances) for different 119904119901119903

The quality of recovery is measured by the relative squareerror (RSE) toL

0and S

0 defined to be

RSE L0=

10038171003817100381710038171003817

L minusL0

10038171003817100381710038171003817119865

1003817100381710038171003817L0

1003817100381710038171003817119865

RSE S0=

10038171003817100381710038171003817

S minusS0

10038171003817100381710038171003817119865

1003817100381710038171003817S0

1003817100381710038171003817119865

(28)

Tables 2 and 3 show that the proposed algorithm(ADMM-TR) is about 10 percent faster than RSTD proposed

in [22] and achieves better accuracy in terms of relative squareerror Though both of the algorithms involve computing aSVD per iteration we observe that the proposed algorithmtake much fewer iterations than RSTD to converge to theoptimal solution

The more impulsive entries are added that is for highervalue of spr the less probable it becomes for the tensorrecovery problem In addition the problem becomes moresophisticated when the 119899-rank is higher for the ground truthtensors of the same size In Table 1 different spr is set fortwo tensor cases And results show the recovered accuracydecreases as the spr grows for a certain case In particular therecovered accuracy for the tensorA

0isin R40times40times40 with 119899-rank

= [5 5 5] decreases sharply when the spr is up to 40 whilethe phenomenon occurs for tensor A

0isin R40times40times40 with 119899-

rank = [10 10 10] when the spr is about 25 This is due tothat relative low rank and low sparse ratio are precondition ofthe tensor recovery problem

52 TrafficVolumeData To evaluate the performances of theproposed method in traffic volume data outlier recovery acomplete traffic volume data set is used as the test data setWeuse the data of a fixed point in Sacramento County which isdownloaded fromhttppemsdotcagovThe traffic volumedata are recorded every 5 minutes Therefore a daily trafficvolume series for a loop detector contains 288 records andthe whole period of the data lasts for 16 days that is fromAugust 2 to August 17 2010

Based on multiple correlations of the traffic volume datawe model the data set as a tensor model of size 16 times 24 times 12which stands for 16 days 24 hours in a day and 12 sampleintervals (ie recorded by 5 minutes) per hour The ratiosof outlier data are set from 5 to 15 and the outlier dataare produced randomly All the results are averaged by 10instances

Mathematical Problems in Engineering 7

0

100

200

300

400

500

600

700

800

900

1000

Traffi

c vol

ume (

veh5

min

)

0 50 100 150 200 250 300

Raw dataOutlier data

Recovered data

Time interval (5 min)

Figure 1 Comparisons with raw traffic volume data data corruptedby outliers with 5 ratio and data recovered by ADMM-TR

0 50 100 150 200 250 3000

200

400

600

800

1000

1200

1400

Raw dataOutlier data

Recovered data

Time interval (5 min)

Traffi

c vol

ume (

veh5

min

)

Figure 2 Comparisons with raw traffic volume data data corruptedby outliers with 10 ratio and data recovered by ADMM-TR

For the real world data we mainly pay attention to thecorrectness of the recovered traffic volume data Thus thequality of recovery is measured by the relative square error(RSE) to L

0and Mean Absolute Percentage Error (MAPE)

toL0 defined to be

RSE L0=

10038171003817100381710038171003817

L minusL0

10038171003817100381710038171003817119865

1003817100381710038171003817L0

1003817100381710038171003817119865

MAPE L0=

1

119872

119872

sum

119898=1

1003816100381610038161003816100381610038161003816100381610038161003816

119905(119898)

119903minus 119905(119898)

119890

119905(119898)

119903

1003816100381610038161003816100381610038161003816100381610038161003816

(29)

0

200

400

600

800

1000

1200

1400

1600

1800

0 50 100 150 200 250 300

Raw dataOutlier data

Recovered data

Time interval (5 min)

Traffi

c vol

ume (

veh5

min

)

Figure 3 Comparisons with raw traffic volume data data corruptedby outliers with 15 ratio and data recovered by ADMM-TR

Table 4 Traffic volume outlier data before and afterbeing recovered

119904119901119903

Before recovery After recoveryRSE L

0

(119890-3)MAPE L

0

(119890-3)RSE L

0

(119890-3)MAPE L

0

(119890-3)005 05005 02136 00961 00314010 07066 05008 01417 01027015 08850 08777 02221 02351

where 119905(119898)119903

and 119905(119898)119890

are the 119898th elements which stand forthe known real value and recovered value respectively 119872denotes the number of recovered traffic volumes

Table 4 presents the relative errors of traffic volumeoutlier data before and after being recovered by ADMM-TRThe results show that the RSE L

0and MAPE L

0for traffic

volume data corrupted by outlier data are about 5 times thanthe data after being recovered by ADMM-TR Figures 1 2and 3 present the profiles of traffic volume data for a day Theresults show that ADMM-TR could recover the traffic volumeoutlier data with perfect performance

6 Conclusions

In this paper we concentrate on the mathematical problemin traffic volume outlier data recovery and proposed anovel tensor recovery method based on alternating directionmethod of multipliers (ADMM) The proposed algorithmcan automatically separate the low-119899-rank tensor data andsparse partThe experiments show that the proposedmethodis more stable and accurate in most cases and has excellentconvergence rate Experiments on real world traffic volumedata demonstrate the practicability and effectiveness of theproposed method in traffic research domain

In the future we would like to investigate how to auto-matically choose the parameters in our algorithm and explore

8 Mathematical Problems in Engineering

additional applications of our method in traffic researchdomain

Acknowledgments

This research was supported by NSFC (Grant No 6127137661171118 and 91120015) Beijing Natural Science Foundation(4122067)The authors would like to thank Professor Bin Ranfrom the University ofWisconsin-Madison and Yong Li fromthe University of Notre Dame for the suggestive discussions

References

[1] K Tamura and M Hirayama ldquoToward realization of VICSmdashvehicle information and communications systemrdquo in Pro-ceedings of the IEEE-IEE Vehicle Navigation and InformationsSystems Conference pp 72ndash77 October 1993

[2] S Kim M E Lewis and C C White III ldquoOptimal vehiclerouting with real-time traffic informationrdquo IEEE Transactionson Intelligent Transportation Systems vol 6 no 2 pp 178ndash1882005

[3] P Mirchandani and L Head ldquoA real-time traffic signal controlsystem architecture algorithms and analysisrdquo TransportationResearch Part C vol 9 no 6 pp 415ndash432 2001

[4] B M Williams P K Durvasula and D E Brown ldquoUrbanfreeway traffic flow prediction application of seasonal autore-gressive integrated moving average and exponential smoothingmodelsrdquo Transportation Research Record no 1644 pp 132ndash1411998

[5] J Xu X Li and H Shi ldquoShort-term traffic flow forecastingmodel under missing datardquo Journal of Computer Applicationsvol 30 no 4 pp 1117ndash1120 2010

[6] Y Pei and J Ma ldquoReal-time traffic data screening and recon-structionrdquo China Civil Engineering Journal vol 36 no 7 pp78ndash83 2003

[7] S Chen W Wang and W Li ldquoNoise recognition and noisereduction of real-time trafficdatardquoDongnanDaxueXuebao vol36 no 2 pp 322ndash325 2006

[8] B-G Daniel J D-P Francisco G-O David et al ldquoWavelet-based denoising for traffic volume time series forecasting withself-organizing neural networksrdquo Computer-Aided Civil andInfrastructure Engineering vol 25 no 7 pp 530ndash545 2010

[9] S Gandy B Recht and I Yamada ldquoTensor completion andlowmdashrank tensor recovery via convex optimizationrdquo InverseProblems vol 27 no 2 Article ID 025010 2011

[10] T G Kolda and B W Bader ldquoTensor decompositions andapplicationsrdquo SIAM Review vol 51 no 3 pp 455ndash500 2009

[11] A S Lewis and G Knowles ldquoImage compression using the 2-Dwavelet transformrdquo IEEE Transactions of Image Processing vol1 no 2 pp 244ndash250 1992

[12] V Chandrasekaran S Sanghavi P A Parrilo and A S WillskyldquoRank-sparsity incoherence for matrix decompositionrdquo SIAMJournal on Optimization vol 21 no 2 pp 572ndash596 2011

[13] J Wright Y Peng Y Ma and A Ganesh ldquoRobust principalcomponent analysis exact recovery of corrupted low-rankmatrices by convex optimizationrdquo in Proceedings of the 23rdAnnual Conference on Neural Information Processing Systems(NIPS rsquo09) pp 2080ndash2088 Vancouver Canada 2009

[14] A Ganesh Z Lin J Wright L Wu M Chen and Y MaldquoFast algorithms for recovering a corrupted low-rank matrixrdquo

in Proceedings of the 3rd IEEE International Workshop onComputational Advances in Multi-Sensor Adaptive Processing(CAMSAP rsquo09) pp 213ndash216 December 2009

[15] Z Lin M Chen L Wu and Y Ma ldquoThe augmented Lagrangemultiplier method for exact recovery of a corrupted low-rankmatricesrdquoMathematical Programming 2009

[16] X Yuan and J Yang ldquoSparse and low-rank matrix decomposi-tion via alternating directionmethodsrdquo Tech Rep Departmentof Mathematics Hong Kong Baptist University 2009

[17] L Qu Y Zhang J Hu L Jia and L Li ldquoA BPCA basedmissing value imputing method for traffic flow volume datardquo inProceedings of the IEEE Intelligent Vehicles Symposium (IV rsquo08)pp 985ndash990 June 2008

[18] J Liu P Musialski P Wonka and J Ye ldquoTensor completion forestimating missing values in visual datardquo in Proceedings of theInternational Conference on Computer Vision (ICCV rsquo09) 2009

[19] D Bertsekas and J Tsitsiklis Parallel and Distributed Computa-tion Prentice-Hall 1989

[20] J-F Cai E J Candes and Z Shen ldquoA singular value thresh-olding algorithm for matrix completionrdquo SIAM Journal onOptimization vol 20 no 4 pp 1956ndash1982 2010

[21] E T Hale W Yin and Y Zhang ldquoFixed-point continuationfor ℓ1-minimization methodology and convergencerdquo SIAMJournal on Optimization vol 19 no 3 pp 1107ndash1130 2008

[22] Y Li J Yan Y Zhou and J Yang ldquoOptimum subspace learningand error correction for tensorsrdquo in Proceedings of the 11thEuropean Conference on Computer Vision (ECCV rsquo10) CreteGreece 2010

[23] E J Candes X Li Y Ma and J Wright ldquoRobust principalcomponent analysisrdquo Journal of the ACM vol 58 no 3 article11 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 6: Research Article Traffic Volume Data Outlier …downloads.hindawi.com/journals/mpe/2013/164810.pdfTraffic Volume Data Outlier Recovery via Tensor Model HuachunTan, 1 JianshuaiFeng,

6 Mathematical Problems in Engineering

Table 2 Comparison of ADMM-TR with RSTD for synthetic data whereA0isin R40times40times40 119899-rank = [5 5 5]

119904119901119903

ADMM-TR RSTDRSE L

0

(119890-3)RSE S

0

(119890-3) iter Time (s) RSE L0

(119890-3)RSE S

0

(119890-3) iter Time (s)

005 43 43 133 137 46 46 208 178015 98 42 162 208 105 47 235 336025 123 43 514 449 159 52 676 510035 565 95 654 533 675 114 737 575

Table 3 Comparison of ADMM-TR with RSTD for synthetic data whereA0isin R40times40times40 n-rank = [10 10 10]

119904119901119903

Algorithm ADMM-TR Algorithm RSTDRSE L

0

(119890-3)RSE S

0

(119890-3) iter Time (s) RSE L0

(119890-3)RSE S

0

(119890-3) iter Time (s)

005 44 20 236 264 47 25 338 299015 47 21 417 421 52 25 603 508025 89 29 664 608 120 42 663 536035 173 52 981 806 211 65 1110 877

the dimension of the principal singular space as [22] And theparameters are set as 120572 = 120573 = [119868

1119868max 1198682119868max 119868119899119868max]

119879

and 120574 = 1sum([1198681119868max 1198682119868max 119868119899119868max]

119879 for all experi-ments where 119868max = max119868

119894 120578 is set to 1radic119868max as suggested

in [23]All the experiments are conducted and timed on the same

desktop with an Pentium (R) Dual-Core 250GHz CPU thathas 4GB memory running on Windows 7 and MATLAB

51 Synthetic Data ADMM-TR and RSTD are tested on thesynthetic data of size 40times40times40 We generate the dimension119903 of a ldquocore tensorrdquo C isin R119903timessdotsdotsdottimes119903 which we fill with Gaussiandistributed entries (sim N(0 1)) Then we generate matrices119880(1) 119880

(119873) with119880(119894) isin R119899119894times119903 whose elements are also iidGaussian random variables (simN(0 1)) and set

L0= Ctimes1119880(1)

times2sdotsdotsdottimes119873119880(119873) (27)

The entries of sparse tensor S0are independently dis-

tributed each taking on value 0 with probability 1-119904119901119903 andeach taking on impulsive value with probability 119904119901119903We applythe proposed algorithm to the tensorA

0=L0+S0to recover

L and S and compare with RSTD For these experimentstwo cases of 119899-rank are investigated 119899-rank = [5 5 5] and 119899-rank = [10 10 10] Table 1 presents the average results (across30 instances) for different 119904119901119903

The quality of recovery is measured by the relative squareerror (RSE) toL

0and S

0 defined to be

RSE L0=

10038171003817100381710038171003817

L minusL0

10038171003817100381710038171003817119865

1003817100381710038171003817L0

1003817100381710038171003817119865

RSE S0=

10038171003817100381710038171003817

S minusS0

10038171003817100381710038171003817119865

1003817100381710038171003817S0

1003817100381710038171003817119865

(28)

Tables 2 and 3 show that the proposed algorithm(ADMM-TR) is about 10 percent faster than RSTD proposed

in [22] and achieves better accuracy in terms of relative squareerror Though both of the algorithms involve computing aSVD per iteration we observe that the proposed algorithmtake much fewer iterations than RSTD to converge to theoptimal solution

The more impulsive entries are added that is for highervalue of spr the less probable it becomes for the tensorrecovery problem In addition the problem becomes moresophisticated when the 119899-rank is higher for the ground truthtensors of the same size In Table 1 different spr is set fortwo tensor cases And results show the recovered accuracydecreases as the spr grows for a certain case In particular therecovered accuracy for the tensorA

0isin R40times40times40 with 119899-rank

= [5 5 5] decreases sharply when the spr is up to 40 whilethe phenomenon occurs for tensor A

0isin R40times40times40 with 119899-

rank = [10 10 10] when the spr is about 25 This is due tothat relative low rank and low sparse ratio are precondition ofthe tensor recovery problem

52 TrafficVolumeData To evaluate the performances of theproposed method in traffic volume data outlier recovery acomplete traffic volume data set is used as the test data setWeuse the data of a fixed point in Sacramento County which isdownloaded fromhttppemsdotcagovThe traffic volumedata are recorded every 5 minutes Therefore a daily trafficvolume series for a loop detector contains 288 records andthe whole period of the data lasts for 16 days that is fromAugust 2 to August 17 2010

Based on multiple correlations of the traffic volume datawe model the data set as a tensor model of size 16 times 24 times 12which stands for 16 days 24 hours in a day and 12 sampleintervals (ie recorded by 5 minutes) per hour The ratiosof outlier data are set from 5 to 15 and the outlier dataare produced randomly All the results are averaged by 10instances

Mathematical Problems in Engineering 7

0

100

200

300

400

500

600

700

800

900

1000

Traffi

c vol

ume (

veh5

min

)

0 50 100 150 200 250 300

Raw dataOutlier data

Recovered data

Time interval (5 min)

Figure 1 Comparisons with raw traffic volume data data corruptedby outliers with 5 ratio and data recovered by ADMM-TR

0 50 100 150 200 250 3000

200

400

600

800

1000

1200

1400

Raw dataOutlier data

Recovered data

Time interval (5 min)

Traffi

c vol

ume (

veh5

min

)

Figure 2 Comparisons with raw traffic volume data data corruptedby outliers with 10 ratio and data recovered by ADMM-TR

For the real world data we mainly pay attention to thecorrectness of the recovered traffic volume data Thus thequality of recovery is measured by the relative square error(RSE) to L

0and Mean Absolute Percentage Error (MAPE)

toL0 defined to be

RSE L0=

10038171003817100381710038171003817

L minusL0

10038171003817100381710038171003817119865

1003817100381710038171003817L0

1003817100381710038171003817119865

MAPE L0=

1

119872

119872

sum

119898=1

1003816100381610038161003816100381610038161003816100381610038161003816

119905(119898)

119903minus 119905(119898)

119890

119905(119898)

119903

1003816100381610038161003816100381610038161003816100381610038161003816

(29)

0

200

400

600

800

1000

1200

1400

1600

1800

0 50 100 150 200 250 300

Raw dataOutlier data

Recovered data

Time interval (5 min)

Traffi

c vol

ume (

veh5

min

)

Figure 3 Comparisons with raw traffic volume data data corruptedby outliers with 15 ratio and data recovered by ADMM-TR

Table 4 Traffic volume outlier data before and afterbeing recovered

119904119901119903

Before recovery After recoveryRSE L

0

(119890-3)MAPE L

0

(119890-3)RSE L

0

(119890-3)MAPE L

0

(119890-3)005 05005 02136 00961 00314010 07066 05008 01417 01027015 08850 08777 02221 02351

where 119905(119898)119903

and 119905(119898)119890

are the 119898th elements which stand forthe known real value and recovered value respectively 119872denotes the number of recovered traffic volumes

Table 4 presents the relative errors of traffic volumeoutlier data before and after being recovered by ADMM-TRThe results show that the RSE L

0and MAPE L

0for traffic

volume data corrupted by outlier data are about 5 times thanthe data after being recovered by ADMM-TR Figures 1 2and 3 present the profiles of traffic volume data for a day Theresults show that ADMM-TR could recover the traffic volumeoutlier data with perfect performance

6 Conclusions

In this paper we concentrate on the mathematical problemin traffic volume outlier data recovery and proposed anovel tensor recovery method based on alternating directionmethod of multipliers (ADMM) The proposed algorithmcan automatically separate the low-119899-rank tensor data andsparse partThe experiments show that the proposedmethodis more stable and accurate in most cases and has excellentconvergence rate Experiments on real world traffic volumedata demonstrate the practicability and effectiveness of theproposed method in traffic research domain

In the future we would like to investigate how to auto-matically choose the parameters in our algorithm and explore

8 Mathematical Problems in Engineering

additional applications of our method in traffic researchdomain

Acknowledgments

This research was supported by NSFC (Grant No 6127137661171118 and 91120015) Beijing Natural Science Foundation(4122067)The authors would like to thank Professor Bin Ranfrom the University ofWisconsin-Madison and Yong Li fromthe University of Notre Dame for the suggestive discussions

References

[1] K Tamura and M Hirayama ldquoToward realization of VICSmdashvehicle information and communications systemrdquo in Pro-ceedings of the IEEE-IEE Vehicle Navigation and InformationsSystems Conference pp 72ndash77 October 1993

[2] S Kim M E Lewis and C C White III ldquoOptimal vehiclerouting with real-time traffic informationrdquo IEEE Transactionson Intelligent Transportation Systems vol 6 no 2 pp 178ndash1882005

[3] P Mirchandani and L Head ldquoA real-time traffic signal controlsystem architecture algorithms and analysisrdquo TransportationResearch Part C vol 9 no 6 pp 415ndash432 2001

[4] B M Williams P K Durvasula and D E Brown ldquoUrbanfreeway traffic flow prediction application of seasonal autore-gressive integrated moving average and exponential smoothingmodelsrdquo Transportation Research Record no 1644 pp 132ndash1411998

[5] J Xu X Li and H Shi ldquoShort-term traffic flow forecastingmodel under missing datardquo Journal of Computer Applicationsvol 30 no 4 pp 1117ndash1120 2010

[6] Y Pei and J Ma ldquoReal-time traffic data screening and recon-structionrdquo China Civil Engineering Journal vol 36 no 7 pp78ndash83 2003

[7] S Chen W Wang and W Li ldquoNoise recognition and noisereduction of real-time trafficdatardquoDongnanDaxueXuebao vol36 no 2 pp 322ndash325 2006

[8] B-G Daniel J D-P Francisco G-O David et al ldquoWavelet-based denoising for traffic volume time series forecasting withself-organizing neural networksrdquo Computer-Aided Civil andInfrastructure Engineering vol 25 no 7 pp 530ndash545 2010

[9] S Gandy B Recht and I Yamada ldquoTensor completion andlowmdashrank tensor recovery via convex optimizationrdquo InverseProblems vol 27 no 2 Article ID 025010 2011

[10] T G Kolda and B W Bader ldquoTensor decompositions andapplicationsrdquo SIAM Review vol 51 no 3 pp 455ndash500 2009

[11] A S Lewis and G Knowles ldquoImage compression using the 2-Dwavelet transformrdquo IEEE Transactions of Image Processing vol1 no 2 pp 244ndash250 1992

[12] V Chandrasekaran S Sanghavi P A Parrilo and A S WillskyldquoRank-sparsity incoherence for matrix decompositionrdquo SIAMJournal on Optimization vol 21 no 2 pp 572ndash596 2011

[13] J Wright Y Peng Y Ma and A Ganesh ldquoRobust principalcomponent analysis exact recovery of corrupted low-rankmatrices by convex optimizationrdquo in Proceedings of the 23rdAnnual Conference on Neural Information Processing Systems(NIPS rsquo09) pp 2080ndash2088 Vancouver Canada 2009

[14] A Ganesh Z Lin J Wright L Wu M Chen and Y MaldquoFast algorithms for recovering a corrupted low-rank matrixrdquo

in Proceedings of the 3rd IEEE International Workshop onComputational Advances in Multi-Sensor Adaptive Processing(CAMSAP rsquo09) pp 213ndash216 December 2009

[15] Z Lin M Chen L Wu and Y Ma ldquoThe augmented Lagrangemultiplier method for exact recovery of a corrupted low-rankmatricesrdquoMathematical Programming 2009

[16] X Yuan and J Yang ldquoSparse and low-rank matrix decomposi-tion via alternating directionmethodsrdquo Tech Rep Departmentof Mathematics Hong Kong Baptist University 2009

[17] L Qu Y Zhang J Hu L Jia and L Li ldquoA BPCA basedmissing value imputing method for traffic flow volume datardquo inProceedings of the IEEE Intelligent Vehicles Symposium (IV rsquo08)pp 985ndash990 June 2008

[18] J Liu P Musialski P Wonka and J Ye ldquoTensor completion forestimating missing values in visual datardquo in Proceedings of theInternational Conference on Computer Vision (ICCV rsquo09) 2009

[19] D Bertsekas and J Tsitsiklis Parallel and Distributed Computa-tion Prentice-Hall 1989

[20] J-F Cai E J Candes and Z Shen ldquoA singular value thresh-olding algorithm for matrix completionrdquo SIAM Journal onOptimization vol 20 no 4 pp 1956ndash1982 2010

[21] E T Hale W Yin and Y Zhang ldquoFixed-point continuationfor ℓ1-minimization methodology and convergencerdquo SIAMJournal on Optimization vol 19 no 3 pp 1107ndash1130 2008

[22] Y Li J Yan Y Zhou and J Yang ldquoOptimum subspace learningand error correction for tensorsrdquo in Proceedings of the 11thEuropean Conference on Computer Vision (ECCV rsquo10) CreteGreece 2010

[23] E J Candes X Li Y Ma and J Wright ldquoRobust principalcomponent analysisrdquo Journal of the ACM vol 58 no 3 article11 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 7: Research Article Traffic Volume Data Outlier …downloads.hindawi.com/journals/mpe/2013/164810.pdfTraffic Volume Data Outlier Recovery via Tensor Model HuachunTan, 1 JianshuaiFeng,

Mathematical Problems in Engineering 7

0

100

200

300

400

500

600

700

800

900

1000

Traffi

c vol

ume (

veh5

min

)

0 50 100 150 200 250 300

Raw dataOutlier data

Recovered data

Time interval (5 min)

Figure 1 Comparisons with raw traffic volume data data corruptedby outliers with 5 ratio and data recovered by ADMM-TR

0 50 100 150 200 250 3000

200

400

600

800

1000

1200

1400

Raw dataOutlier data

Recovered data

Time interval (5 min)

Traffi

c vol

ume (

veh5

min

)

Figure 2 Comparisons with raw traffic volume data data corruptedby outliers with 10 ratio and data recovered by ADMM-TR

For the real world data we mainly pay attention to thecorrectness of the recovered traffic volume data Thus thequality of recovery is measured by the relative square error(RSE) to L

0and Mean Absolute Percentage Error (MAPE)

toL0 defined to be

RSE L0=

10038171003817100381710038171003817

L minusL0

10038171003817100381710038171003817119865

1003817100381710038171003817L0

1003817100381710038171003817119865

MAPE L0=

1

119872

119872

sum

119898=1

1003816100381610038161003816100381610038161003816100381610038161003816

119905(119898)

119903minus 119905(119898)

119890

119905(119898)

119903

1003816100381610038161003816100381610038161003816100381610038161003816

(29)

0

200

400

600

800

1000

1200

1400

1600

1800

0 50 100 150 200 250 300

Raw dataOutlier data

Recovered data

Time interval (5 min)

Traffi

c vol

ume (

veh5

min

)

Figure 3 Comparisons with raw traffic volume data data corruptedby outliers with 15 ratio and data recovered by ADMM-TR

Table 4 Traffic volume outlier data before and afterbeing recovered

119904119901119903

Before recovery After recoveryRSE L

0

(119890-3)MAPE L

0

(119890-3)RSE L

0

(119890-3)MAPE L

0

(119890-3)005 05005 02136 00961 00314010 07066 05008 01417 01027015 08850 08777 02221 02351

where 119905(119898)119903

and 119905(119898)119890

are the 119898th elements which stand forthe known real value and recovered value respectively 119872denotes the number of recovered traffic volumes

Table 4 presents the relative errors of traffic volumeoutlier data before and after being recovered by ADMM-TRThe results show that the RSE L

0and MAPE L

0for traffic

volume data corrupted by outlier data are about 5 times thanthe data after being recovered by ADMM-TR Figures 1 2and 3 present the profiles of traffic volume data for a day Theresults show that ADMM-TR could recover the traffic volumeoutlier data with perfect performance

6 Conclusions

In this paper we concentrate on the mathematical problemin traffic volume outlier data recovery and proposed anovel tensor recovery method based on alternating directionmethod of multipliers (ADMM) The proposed algorithmcan automatically separate the low-119899-rank tensor data andsparse partThe experiments show that the proposedmethodis more stable and accurate in most cases and has excellentconvergence rate Experiments on real world traffic volumedata demonstrate the practicability and effectiveness of theproposed method in traffic research domain

In the future we would like to investigate how to auto-matically choose the parameters in our algorithm and explore

8 Mathematical Problems in Engineering

additional applications of our method in traffic researchdomain

Acknowledgments

This research was supported by NSFC (Grant No 6127137661171118 and 91120015) Beijing Natural Science Foundation(4122067)The authors would like to thank Professor Bin Ranfrom the University ofWisconsin-Madison and Yong Li fromthe University of Notre Dame for the suggestive discussions

References

[1] K Tamura and M Hirayama ldquoToward realization of VICSmdashvehicle information and communications systemrdquo in Pro-ceedings of the IEEE-IEE Vehicle Navigation and InformationsSystems Conference pp 72ndash77 October 1993

[2] S Kim M E Lewis and C C White III ldquoOptimal vehiclerouting with real-time traffic informationrdquo IEEE Transactionson Intelligent Transportation Systems vol 6 no 2 pp 178ndash1882005

[3] P Mirchandani and L Head ldquoA real-time traffic signal controlsystem architecture algorithms and analysisrdquo TransportationResearch Part C vol 9 no 6 pp 415ndash432 2001

[4] B M Williams P K Durvasula and D E Brown ldquoUrbanfreeway traffic flow prediction application of seasonal autore-gressive integrated moving average and exponential smoothingmodelsrdquo Transportation Research Record no 1644 pp 132ndash1411998

[5] J Xu X Li and H Shi ldquoShort-term traffic flow forecastingmodel under missing datardquo Journal of Computer Applicationsvol 30 no 4 pp 1117ndash1120 2010

[6] Y Pei and J Ma ldquoReal-time traffic data screening and recon-structionrdquo China Civil Engineering Journal vol 36 no 7 pp78ndash83 2003

[7] S Chen W Wang and W Li ldquoNoise recognition and noisereduction of real-time trafficdatardquoDongnanDaxueXuebao vol36 no 2 pp 322ndash325 2006

[8] B-G Daniel J D-P Francisco G-O David et al ldquoWavelet-based denoising for traffic volume time series forecasting withself-organizing neural networksrdquo Computer-Aided Civil andInfrastructure Engineering vol 25 no 7 pp 530ndash545 2010

[9] S Gandy B Recht and I Yamada ldquoTensor completion andlowmdashrank tensor recovery via convex optimizationrdquo InverseProblems vol 27 no 2 Article ID 025010 2011

[10] T G Kolda and B W Bader ldquoTensor decompositions andapplicationsrdquo SIAM Review vol 51 no 3 pp 455ndash500 2009

[11] A S Lewis and G Knowles ldquoImage compression using the 2-Dwavelet transformrdquo IEEE Transactions of Image Processing vol1 no 2 pp 244ndash250 1992

[12] V Chandrasekaran S Sanghavi P A Parrilo and A S WillskyldquoRank-sparsity incoherence for matrix decompositionrdquo SIAMJournal on Optimization vol 21 no 2 pp 572ndash596 2011

[13] J Wright Y Peng Y Ma and A Ganesh ldquoRobust principalcomponent analysis exact recovery of corrupted low-rankmatrices by convex optimizationrdquo in Proceedings of the 23rdAnnual Conference on Neural Information Processing Systems(NIPS rsquo09) pp 2080ndash2088 Vancouver Canada 2009

[14] A Ganesh Z Lin J Wright L Wu M Chen and Y MaldquoFast algorithms for recovering a corrupted low-rank matrixrdquo

in Proceedings of the 3rd IEEE International Workshop onComputational Advances in Multi-Sensor Adaptive Processing(CAMSAP rsquo09) pp 213ndash216 December 2009

[15] Z Lin M Chen L Wu and Y Ma ldquoThe augmented Lagrangemultiplier method for exact recovery of a corrupted low-rankmatricesrdquoMathematical Programming 2009

[16] X Yuan and J Yang ldquoSparse and low-rank matrix decomposi-tion via alternating directionmethodsrdquo Tech Rep Departmentof Mathematics Hong Kong Baptist University 2009

[17] L Qu Y Zhang J Hu L Jia and L Li ldquoA BPCA basedmissing value imputing method for traffic flow volume datardquo inProceedings of the IEEE Intelligent Vehicles Symposium (IV rsquo08)pp 985ndash990 June 2008

[18] J Liu P Musialski P Wonka and J Ye ldquoTensor completion forestimating missing values in visual datardquo in Proceedings of theInternational Conference on Computer Vision (ICCV rsquo09) 2009

[19] D Bertsekas and J Tsitsiklis Parallel and Distributed Computa-tion Prentice-Hall 1989

[20] J-F Cai E J Candes and Z Shen ldquoA singular value thresh-olding algorithm for matrix completionrdquo SIAM Journal onOptimization vol 20 no 4 pp 1956ndash1982 2010

[21] E T Hale W Yin and Y Zhang ldquoFixed-point continuationfor ℓ1-minimization methodology and convergencerdquo SIAMJournal on Optimization vol 19 no 3 pp 1107ndash1130 2008

[22] Y Li J Yan Y Zhou and J Yang ldquoOptimum subspace learningand error correction for tensorsrdquo in Proceedings of the 11thEuropean Conference on Computer Vision (ECCV rsquo10) CreteGreece 2010

[23] E J Candes X Li Y Ma and J Wright ldquoRobust principalcomponent analysisrdquo Journal of the ACM vol 58 no 3 article11 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 8: Research Article Traffic Volume Data Outlier …downloads.hindawi.com/journals/mpe/2013/164810.pdfTraffic Volume Data Outlier Recovery via Tensor Model HuachunTan, 1 JianshuaiFeng,

8 Mathematical Problems in Engineering

additional applications of our method in traffic researchdomain

Acknowledgments

This research was supported by NSFC (Grant No 6127137661171118 and 91120015) Beijing Natural Science Foundation(4122067)The authors would like to thank Professor Bin Ranfrom the University ofWisconsin-Madison and Yong Li fromthe University of Notre Dame for the suggestive discussions

References

[1] K Tamura and M Hirayama ldquoToward realization of VICSmdashvehicle information and communications systemrdquo in Pro-ceedings of the IEEE-IEE Vehicle Navigation and InformationsSystems Conference pp 72ndash77 October 1993

[2] S Kim M E Lewis and C C White III ldquoOptimal vehiclerouting with real-time traffic informationrdquo IEEE Transactionson Intelligent Transportation Systems vol 6 no 2 pp 178ndash1882005

[3] P Mirchandani and L Head ldquoA real-time traffic signal controlsystem architecture algorithms and analysisrdquo TransportationResearch Part C vol 9 no 6 pp 415ndash432 2001

[4] B M Williams P K Durvasula and D E Brown ldquoUrbanfreeway traffic flow prediction application of seasonal autore-gressive integrated moving average and exponential smoothingmodelsrdquo Transportation Research Record no 1644 pp 132ndash1411998

[5] J Xu X Li and H Shi ldquoShort-term traffic flow forecastingmodel under missing datardquo Journal of Computer Applicationsvol 30 no 4 pp 1117ndash1120 2010

[6] Y Pei and J Ma ldquoReal-time traffic data screening and recon-structionrdquo China Civil Engineering Journal vol 36 no 7 pp78ndash83 2003

[7] S Chen W Wang and W Li ldquoNoise recognition and noisereduction of real-time trafficdatardquoDongnanDaxueXuebao vol36 no 2 pp 322ndash325 2006

[8] B-G Daniel J D-P Francisco G-O David et al ldquoWavelet-based denoising for traffic volume time series forecasting withself-organizing neural networksrdquo Computer-Aided Civil andInfrastructure Engineering vol 25 no 7 pp 530ndash545 2010

[9] S Gandy B Recht and I Yamada ldquoTensor completion andlowmdashrank tensor recovery via convex optimizationrdquo InverseProblems vol 27 no 2 Article ID 025010 2011

[10] T G Kolda and B W Bader ldquoTensor decompositions andapplicationsrdquo SIAM Review vol 51 no 3 pp 455ndash500 2009

[11] A S Lewis and G Knowles ldquoImage compression using the 2-Dwavelet transformrdquo IEEE Transactions of Image Processing vol1 no 2 pp 244ndash250 1992

[12] V Chandrasekaran S Sanghavi P A Parrilo and A S WillskyldquoRank-sparsity incoherence for matrix decompositionrdquo SIAMJournal on Optimization vol 21 no 2 pp 572ndash596 2011

[13] J Wright Y Peng Y Ma and A Ganesh ldquoRobust principalcomponent analysis exact recovery of corrupted low-rankmatrices by convex optimizationrdquo in Proceedings of the 23rdAnnual Conference on Neural Information Processing Systems(NIPS rsquo09) pp 2080ndash2088 Vancouver Canada 2009

[14] A Ganesh Z Lin J Wright L Wu M Chen and Y MaldquoFast algorithms for recovering a corrupted low-rank matrixrdquo

in Proceedings of the 3rd IEEE International Workshop onComputational Advances in Multi-Sensor Adaptive Processing(CAMSAP rsquo09) pp 213ndash216 December 2009

[15] Z Lin M Chen L Wu and Y Ma ldquoThe augmented Lagrangemultiplier method for exact recovery of a corrupted low-rankmatricesrdquoMathematical Programming 2009

[16] X Yuan and J Yang ldquoSparse and low-rank matrix decomposi-tion via alternating directionmethodsrdquo Tech Rep Departmentof Mathematics Hong Kong Baptist University 2009

[17] L Qu Y Zhang J Hu L Jia and L Li ldquoA BPCA basedmissing value imputing method for traffic flow volume datardquo inProceedings of the IEEE Intelligent Vehicles Symposium (IV rsquo08)pp 985ndash990 June 2008

[18] J Liu P Musialski P Wonka and J Ye ldquoTensor completion forestimating missing values in visual datardquo in Proceedings of theInternational Conference on Computer Vision (ICCV rsquo09) 2009

[19] D Bertsekas and J Tsitsiklis Parallel and Distributed Computa-tion Prentice-Hall 1989

[20] J-F Cai E J Candes and Z Shen ldquoA singular value thresh-olding algorithm for matrix completionrdquo SIAM Journal onOptimization vol 20 no 4 pp 1956ndash1982 2010

[21] E T Hale W Yin and Y Zhang ldquoFixed-point continuationfor ℓ1-minimization methodology and convergencerdquo SIAMJournal on Optimization vol 19 no 3 pp 1107ndash1130 2008

[22] Y Li J Yan Y Zhou and J Yang ldquoOptimum subspace learningand error correction for tensorsrdquo in Proceedings of the 11thEuropean Conference on Computer Vision (ECCV rsquo10) CreteGreece 2010

[23] E J Candes X Li Y Ma and J Wright ldquoRobust principalcomponent analysisrdquo Journal of the ACM vol 58 no 3 article11 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 9: Research Article Traffic Volume Data Outlier …downloads.hindawi.com/journals/mpe/2013/164810.pdfTraffic Volume Data Outlier Recovery via Tensor Model HuachunTan, 1 JianshuaiFeng,

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of