
LOW-RESOLUTION PEDESTRIAN DETECTION VIA A NOVEL RESOLUTION-SCORE DISCRIMINATIVE SURFACE

Xiao Wang 1,2, Jun Chen 1,2,3, Chao Liang 1,2,3, Chen Chen 4, Zheng Wang 1,2, Ruimin Hu 1,2,3

1 National Engineering Research Center for Multimedia Software, Wuhan University, China
2 Hubei Provincial Key Laboratory of Multimedia and Network Communication Engineering, China
3 Collaborative Innovation Center of Geospatial Technology, China
4 Center for Research in Computer Vision, University of Central Florida

ABSTRACT

Pedestrian detection, an important task in video surveillance and forensics applications, has been widely studied. However, its performance remains unsatisfactory, especially under low-resolution conditions. In realistic scenarios, pedestrians often appear small in the image, which makes detection challenging. To address this problem, this paper proposes a novel resolution-score discriminative surface method that investigates how detection scores vary across different resolutions of pedestrian and non-pedestrian images. The discriminative surface consists of a series of positive and negative resolution-score lines, each of which is a connected line depicting how the detection score of a sample varies across image resolutions. On this basis, the resolution-score discriminative surface classifies a resolution-score line as pedestrian or non-pedestrian according to whether it lies in the positive or the negative region. Experimental results on two public datasets and one campus surveillance dataset demonstrate the effectiveness of the proposed method.

Index Terms— Pedestrian detection, Low-resolution, Resolution-score line, Discriminative surface

1. INTRODUCTION

Pedestrian detection, which identifies the category (pedestrian or non-pedestrian) of an object and gives its position in the image, has long played a critical role in realistic applications [1–4]. In criminal investigation especially, pedestrians are among the most informative objects and provide the most direct and important cues for solving a case. However, in real-world surveillance scenarios,

The research was supported by the National High Technology Research and Development Program of China (2015AA016306), the National Nature Science Foundation of China (61231015), the Nature Science Foundation of Jiangsu Province (BK20160386), the Technology Research Program of Ministry of Public Security (2016JSYJA12), the Fundamental Research Funds for the Central Universities (2042014kf0025), and the science and technology program of Shenzhen (JCYJ20150422150029092).

[Figure 1: miss rate (vertical axis, 0.2 to 1.0) plotted against pedestrian height in pixels (horizontal axis: 32, 46, 64, 92, 128) for VJ [5], Shapelet [6], LatSvm-V1 [7], PoseInv [8], HogLbp [9], FtrMine [5], HikSvm [10], HOG [11], MultiFtr [12], LatSvm-V2 [7], Pls [13], MultiFtr+CSS [12], FeatSynth [14], ChnFtrs [5], FPDW [5], and R-CNN [15]. Example pedestrian images are annotated with detection scores of 0.98, 0.86, 0.59, 0.45, and 0.24.]

Fig. 1. The problem of low-resolution pedestrian detection. We show the performance of state-of-the-art detectors [5–15] at different low resolutions on the Caltech dataset [5]. The horizontal axis denotes the pedestrian height (the width is half of the height) and the vertical axis denotes the average miss rate. The higher the miss rate, the worse the performance. The performance of pedestrian detection varies with the resolution of the pedestrian.

the captured pedestrian images are usually of low resolution. Because many distinctive details of pedestrian features are lost during image acquisition and transmission, detecting pedestrians in practical applications is very difficult.

Generally speaking, previous pedestrian detection methods can be divided into two categories: feature-based and classifier-based methods. In traditional pedestrian detection, the feature extractor is a hand-engineered module, such as LBP [9] or HOG [11]. The classifier then seeks an optimal decision for pedestrian detection; examples include the linear SVM [7], the kernel SVM [16], and the non-linear boosted classifier [17]. Recently, large performance improvements have been achieved by training deep ConvNets [18]. R-CNN [15], one particularly successful approach, starts with a pre-trained ImageNet [19] classification network

978-1-5090-6067-2/17/$31.00 ©2017 IEEE


[Figure 2: three panels, each plotting the transformed score (vertical axis, 2.0 to 5.0) against the resolution ratio (horizontal axis, 0.1 to 1.7, with the down-sampled and up-sampled ranges marked). (a) Resolution-score lines of I1 (positive sample) and I2 (negative sample). (b) RSS of positive samples and RSS of negative samples. (c) A positive test sample and a negative test sample.]

Fig. 2. In the above figures, the horizontal axis represents the resolution ratio, i.e., the ratio of the resized resolution of the sample image to its original resolution; the vertical axis denotes the transformed value of the detection score. (a) Two resolution-score lines. (b) The resolution-score surface. (c) Positive and negative test samples can be distinguished in the resolution-score surface according to whether their lines lie in the positive or the negative region. Therefore, the learned resolution-score surface can be used for LRPD.

and then fine-tunes the ConvNet. Despite this remarkable progress, the performance of these state-of-the-art methods [5–15] is still poor, especially when the pedestrian resolution is 46 × 23 or smaller (Fig. 1). Two pioneering studies have investigated the Low Resolution Pedestrian Detection (LRPD) problem: [20] proposed a hierarchical structure algorithm for pedestrian detection in limited low-resolution images, while [21] built a normalized volumetric gradient-based feature set that allows pedestrians to be discriminated in low-resolution videos. However, both studies resized the images to a uniform resolution and neglected the fact that detection performance varies with resolution (Fig. 1): commonly, the lower the resolution, the lower the detection score and the worse the performance.

Since the detection score of a pedestrian image varies with its resolution, a natural question is how the detection score changes as the resolution of the pedestrian image goes down or up gradually. To visualize the changes in score across resolutions, we map the association between detection scores and pedestrian resolutions and generate a resolution-score line (RSL) for each pedestrian image. Inspired by [22], which classifies the scale-distance functions formed by different image pairs for person re-identification, in this paper we propose to distinguish positive and negative pedestrian samples via a resolution-score surface (RSS), instead of using the detection score as the final evaluation.

We observe that RSLs can likewise be separated into positive and negative ones. Positive lines are generated by pedestrian images at various resolutions, while negative lines are generated by non-pedestrian images at various resolutions. Together, the positive and negative RSLs make up the RSS. This observation gives visual evidence of the discriminating power of the positive and negative RSLs. The RSS can classify an RSL as pedestrian or non-pedestrian according to whether it lies in the positive or the negative RSS region. The proposed work exploits this discriminative power of the RSS for LRPD.

2. MOTIVATION

As the pedestrian resolution affects the detection score, we attempt to uncover how the score varies as the resolution changes. To investigate this, we conducted a preliminary experiment. Two images were selected from the Caltech pedestrian dataset [5] and denoted I1 (positive sample) and I2 (negative sample). To simplify the process, a simple, uniform method was used to obtain the feature of every image. Although the image resolutions are inconsistent, RoI pooling [23] can extract fixed-length features from the images, and the detection score of an image was calculated by the softmax [24]. In this way, we could easily obtain a sequence of detection scores for an image as we down-sampled or up-sampled it step by step. We use $s$ to denote each detection score and $k$ to denote the resolution ratio of the down-sampled or up-sampled image to the original one. To visualize how $s$ changes with $k$, we drew curves from the sequences of $s$–$k$ value pairs in Fig. 2, where the horizontal axis represents the resolution ratio and the vertical axis represents the transformed score. To make observation easier, the score was transformed by $\bar{s} = e^{s \cdot k}$. We call each such line an RSL. Fig. 2(a) shows the two RSLs generated by I1 and I2, respectively. We found that the two lines can be separated for different images.

Furthermore, we randomly selected 150 pedestrian samples and 150 non-pedestrian samples from the Caltech pedestrian dataset [5]. Then, 150 RSLs for the pedestrian samples and 150 RSLs for the non-pedestrian samples were obtained, following the RSL generation method described above. We computed the variances of the RSLs for the pedestrian and non-pedestrian samples. As Fig. 2(b) shows, the red shaded area depicts the spread of the variance of the pedestrian samples, i.e., the RSS for the pedestrian samples. Likewise, the blue shaded area represents the RSS for the non-pedestrian samples. We call the RSS for pedestrian images the positive RSS, and that for non-pedestrian images the negative RSS. The figure shows that the positive and negative RSS are discriminative. The proposed method exploits this discriminative power of the positive and negative RSLs in the RSS for pedestrian detection.

Directly applying general feature-score models may not discriminate pedestrians well enough in this kind of low-resolution situation (Fig. 1). To deal with this challenge, we computed the RSL of the pedestrian image in the actual experiments. Fig. 2(c) gives an example of how to identify pedestrian and non-pedestrian images at different low resolutions. In the same way as before, we gradually down-sampled and up-sampled the test images and calculated their scores, forming two RSLs. The figure shows that the RSLs generated by the positive and negative test samples lie in the red and blue shaded areas, respectively, so the learned resolution-score surface can be used for LRPD.

3. APPROACH

This part introduces the overall scheme of the proposed pedestrian detection process (Fig. 3). It is divided into two stages: an offline training stage and an online test (detection) stage. In the offline training stage, we first learn a discriminative pedestrian model in feature space with the deep CNN method [24], one of the most classical CNN detection frameworks. The resulting deep pedestrian model generates detection scores for new test images. Then, we learn a discriminative RSS model that gives the probability that a sample RSL comes from a pedestrian. For each candidate image proposal, the parameter vector of its RSL is computed by the function representation module. In the online test stage, when a new test image arrives, the RSS representation module computes the RSLs and the image is classified as pedestrian or non-pedestrian.

3.1. RSS representation

In this step, we obtain candidate pedestrian regions using the region proposal network (RPN) of the deep CNN [24]. This part is initialized with an ImageNet-pre-trained model and fine-tuned end-to-end for the region proposal task. Then, we train a pedestrian detection network using the generated proposals. This detection network is also initialized with the ImageNet-pre-trained model. Although the resolutions of the candidate regions are inconsistent, RoI pooling can extract

[Figure 3: framework diagram. Offline training: training data (positive samples and negative samples), then RSS representation (score against resolution ratio, giving positive RSLs and negative RSLs), then RSS classifier (SVM). Online detection: test data, then RSS representation (score against resolution ratio), then RSS classifier (SVM).]

Fig. 3. The framework of our method. It includes the offline training stage and the online detection stage.

fixed-length features from each candidate region, and the detection score of an image is calculated by the learned pedestrian detection model. In this way, we can easily obtain a sequence of detection scores as we down-sample or up-sample the image step by step.

To generate the RSL, we denote the original visual feature of the sample $I$ as $F^{1}$, where the superscript 1 is the resolution ratio of the original sample. Then, we up-sample and down-sample $I$ step by step, and thus obtain a series of visual features: $F^{1.70}, F^{1.69}, \ldots, F^{0.06}, F^{0.05}$. Note that the resolution ratio is the ratio of image heights or widths. That is, a resolution ratio of $k = 0.5$ means that the height and the width of the down-sampled image $I^{0.5}$ are half those of $I$.

We can calculate a range of detection scores $s_k$, $k \in [0.05, 1.7]$. To make observation easier, each score $s_k$ is transformed into a new form, $\bar{s}_k = \exp(s_k \cdot k)$, where $s_k$ is the detection score of the sample at resolution ratio $k$, $\bar{s}_k$ is its transformed form, and $k$ is the resolution ratio. In short, for each proposal region $I_i$, the transformed score $\bar{s}_k^i$ varies with the resolution ratio $k$. In this way, the RSL is generated as the curves shown in Fig. 2(a), where the horizontal axis is the resolution ratio $k$ and the vertical axis is the transformed score. Pedestrian samples produce positive RSLs, and non-pedestrian samples produce negative RSLs.
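The RSL construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: `score_at` is a hypothetical callable standing in for "resize the proposal by ratio k and run the detector", `toy_score` is a made-up response used so the sketch runs on its own, and the 0.01 step is inferred from the feature series $F^{1.70}, F^{1.69}, \ldots, F^{0.05}$.

```python
import math


def resolution_ratios(lo=0.05, hi=1.70, step=0.01):
    """Resolution ratios from hi down to lo, i.e. 1.70, 1.69, ..., 0.05."""
    n = round((hi - lo) / step)
    return [round(hi - i * step, 2) for i in range(n + 1)]


def resolution_score_line(score_at, ratios):
    """Return (k, exp(s_k * k)) pairs for one sample.

    score_at is a hypothetical callable: it stands in for resizing the
    proposal by ratio k and running the detector to obtain s_k.
    """
    return [(k, math.exp(score_at(k) * k)) for k in ratios]


def toy_score(k):
    # Made-up detector response that grows with resolution and saturates at 1.
    return min(1.0, 0.2 + 0.6 * k)


ratios = resolution_ratios()
rsl = resolution_score_line(toy_score, ratios)
```

Because the transform multiplies the score by the ratio before exponentiating, a sample that keeps a high score at large ratios is pushed well above one whose score stays low, which is what makes the lines easier to tell apart visually.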

We randomly selected 150 positive samples and 150 negative samples from the Caltech dataset [5]. Then, 150 RSLs for pedestrians and 150 RSLs for non-pedestrians were obtained, following the RSL generation method described above. From these we obtain the mean RSL of the positive RSLs and that of the negative RSLs, and then the corresponding standard deviations. The red and blue shaded areas in Fig. 2(b) show the spread (± one standard deviation) of the positive and negative RSLs; we call them the positive and negative RSS regions, respectively. The RSS can be discriminative and can be used for pedestrian detection when pedestrians appear at low resolution.
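As a rough illustration of how such a shaded region could be computed, the sketch below takes a set of RSLs sampled on a common ratio grid and returns the mean line with a band of one standard deviation on either side. The function name `rss_band`, the use of the population standard deviation, and the tiny hand-made input lines are all assumptions for the example, not values from the experiments.

```python
import math


def rss_band(rsls):
    """Per-ratio (mean, mean - std, mean + std) over a set of RSLs.

    rsls: list of lines, each a list of transformed scores sampled on the
    same resolution-ratio grid. The population standard deviation
    (division by n) is an assumption of this sketch.
    """
    band = []
    for column in zip(*rsls):
        n = len(column)
        mean = sum(column) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
        band.append((mean, mean - std, mean + std))
    return band


# Hand-made toy lines on a three-point ratio grid (not experimental data).
positive = [[1.2, 2.0, 4.0], [1.4, 2.4, 4.4]]
negative = [[1.0, 1.1, 1.3], [1.0, 1.3, 1.5]]
pos_band = rss_band(positive)
neg_band = rss_band(negative)
```

In this toy input the two bands do not overlap at the largest ratio, which mirrors the separation between the red and blue regions that the text describes.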


3.2. Training RSS classifier

Fig. 2 shows the transformed score curve with respect to variations in pedestrian image resolution. As prior knowledge, the score is expected to decrease as the image is down-sampled to lower resolutions. Therefore, we further smooth the curve with a parametric approach. We express the RSL as $\bar{s} = a + b e^{k/c}$, where $\bar{s}$ is the response value of the RSL, $k$ is the variable of the RSL, and $a$, $b$, $c$ are the parameters of the RSL. We then fit this function to the RSL. In this way, we can use the parameter vector $\{a, b, c\}$ to denote the RSL.
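One simple way to fit $\bar{s} = a + b e^{k/c}$ is sketched below. The text does not say which fitting procedure was used, so this is an assumed stand-in: for each candidate $c$ on a grid the model is linear in $a$ and $b$, which are solved in closed form by ordinary least squares, and the $c$ with the smallest squared error is kept. The grid, the synthetic line, and the name `fit_rsl` are illustrative.

```python
import math


def fit_rsl(ks, values, c_grid):
    """Fit values ~ a + b * exp(k / c); return (a, b, c, squared_error).

    Grid search over c with closed-form least squares for a and b.
    This is an assumed procedure, not necessarily the one used in the paper.
    """
    best = None
    n = len(ks)
    for c in c_grid:
        x = [math.exp(k / c) for k in ks]
        mx = sum(x) / n
        my = sum(values) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        if sxx == 0.0:
            continue  # degenerate regressor, skip this c
        b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, values)) / sxx
        a = my - b * mx
        err = sum((a + b * xi - yi) ** 2 for xi, yi in zip(x, values))
        if best is None or err < best[3]:
            best = (a, b, c, err)
    return best


# Synthetic noise-free line generated from known parameters (illustrative only).
ks = [0.05 + 0.05 * i for i in range(34)]        # 0.05 ... 1.70
true_a, true_b, true_c = 0.5, 0.8, 1.25
values = [true_a + true_b * math.exp(k / true_c) for k in ks]
c_grid = [0.25 * j for j in range(1, 41)]        # 0.25 ... 10.0
a, b, c, err = fit_rsl(ks, values, c_grid)
```

The resulting triple `(a, b, c)` is the kind of three-dimensional descriptor that the parameter vector $\{a, b, c\}$ refers to; a finer grid or a non-linear solver could replace the grid search if more precision in $c$ is needed.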

Each RSL can thus be written as a vector composed of the parameters $a$, $b$, and $c$. Each training sample is $I_i = (x_i, y_i)$, where $y_i$ is the label (+1: positive sample, -1: negative sample). Through parameterization, the sample can be expressed as $(a_i, b_i, c_i)$ together with its label $y_i$. With this compact representation, a kernel SVM can be learned easily; we call it the RSS classifier.

3.3. RSS for pedestrian detection

In the online test stage, we decide whether an RSL comes from a pedestrian according to the probability given by the RSS classifier trained above. For example, a new image proposal first produces a series of scores as its resolution is up-sampled and down-sampled. Then, the RSL representation module transforms the generated RSL into a parameter vector (Subsection 3.2). Finally, the RSS classifier predicts the probability that the image proposal comes from a pedestrian.

4. EXPERIMENTS

In this section, the proposed approach is validated by comparison with several classical pedestrian detection methods [11, 16, 24] on two pedestrian datasets, the Low-Caltech dataset and the Low-ETHZ dataset, which are based on the public Caltech dataset [5] and ETHZ dataset [25], respectively. We chose these datasets because they present many challenges faced in practical surveillance, e.g., viewpoint, pose, and illumination changes, different backgrounds, and occlusions. Additionally, to come closer to a real monitoring environment, a practical campus surveillance dataset is also used to verify the RSS method.

4.1. Experimental Datasets

Low-Caltech dataset Our method is evaluated on the Caltech dataset [5], using subsets set00-set05 for training and set06-set10 for testing. The Caltech dataset consists of about 250,000 frames, in which a total of 350,000 bounding boxes and 2,300 unique pedestrians are annotated. The individuals in this dataset appear in many positions and orientations and against varied backgrounds. However, most observed pedestrians are not of low resolution (about 70% are taller than 50 pixels). To simulate low-resolution scenarios, we resize each image to a quarter of its original resolution, i.e., half the original height and half the original width.

Low-ETH dataset The ETH dataset [25] is often used to verify the generalization capability of models. It consists of 3 testing video sequences. Our models are evaluated on the 1804 images in the ETH dataset. Most observed pedestrians are not of low resolution (for more than 90%, the pedestrian height is 189 pixels). To simulate low-resolution scenarios, we resize each image to a quarter of its original resolution, i.e., half the original height and half the original width.

Campus surveillance dataset Additionally, to come closer to a real monitoring environment, a practical campus surveillance dataset is also used to verify the proposed method. It consists of about 14,660 frames, in which a total of 230,000 bounding boxes and 8,213 unique pedestrians are annotated. The pedestrians appear in many positions and orientations and against varied backgrounds. Our models are evaluated on 2478 images of the dataset. Most observed pedestrians (about 70%) are of low resolution (pedestrian height below 50 pixels).

[Figure 4: miss rate (vertical axis, 0.2 to 1.0) against false positives per image (FPPI; horizontal axis, 10^-3 to 10^0) for the classifier-based method [16], the feature-based method [11], the deep CNN method [24], and our RSS method.]

Fig. 4. Results on the Low-Caltech dataset.

4.2. Implementation details

To compare the scores of a pedestrian at different resolutions, we represent the feature of each pedestrian image with the same dimension. We adopt RoI pooling to extract fixed-length features from images, and the detection score of an image is calculated by the pedestrian detection model (Sec. 3.1). In more detail, the data are augmented by horizontal flipping and color altering. Dropout [18] is used on the two fully-connected layers. The learning rate starts from 0.01 and is divided by 10 when the error plateaus. Our experiments are run on a GeForce GTX Titan GPU.

To ensure a fair comparison, all datasets are divided into a training set and a testing set. The numbers of training/testing images are 42782/4024 (Low-Caltech), 2388/1804 (Low-ETHZ), and 5735/2478 (campus surveillance).

4.3. Evaluation protocol

The labels and evaluation code provided by Dollar et al. are used for evaluation, following the criteria proposed in [5]. The log-average miss rate is used to summarize detector performance; it is computed by averaging the miss rate at false positive per image (FPPI) rates evenly spaced in log-space in the range from $10^{-3}$ to $10^{1}$. For each experiment, a subset of all ground-truth data is considered according to its specific purpose. Outliers are marked with an ignore label, which means they need not be matched. We filter the detection results using an expanded filtering method [5].
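The log-average miss rate described above can be sketched as follows. This is not the reference evaluation code: the number of reference points (nine), the step-wise lookup of the miss rate at each reference FPPI, and the fallback miss rate of 1.0 when the curve has no point at or below a reference FPPI are assumptions of this sketch. The optional `geometric` flag averages in the log domain instead, which some evaluation code is understood to do; that is likewise an assumption. The curve values are invented for the example.

```python
import math


def log_average_miss_rate(fppi, miss, lo_exp=-3, hi_exp=1, n_points=9,
                          geometric=False):
    """Average the miss rate at FPPI values evenly spaced in log-space.

    fppi, miss: points of a miss-rate curve, sorted by increasing FPPI.
    n_points and the step-wise lookup are assumptions of this sketch.
    """
    span = hi_exp - lo_exp
    refs = [10 ** (lo_exp + i * span / (n_points - 1)) for i in range(n_points)]
    samples = []
    for r in refs:
        m = 1.0  # fallback when no curve point has FPPI <= r
        for f, mr in zip(fppi, miss):
            if f <= r:
                m = mr  # keep the last point not exceeding the reference
            else:
                break
        samples.append(m)
    if geometric:
        logs = [math.log(max(m, 1e-10)) for m in samples]
        return math.exp(sum(logs) / len(logs))
    return sum(samples) / len(samples)


# Invented curve points for illustration (not results from the paper).
fppi = [0.0005, 0.005, 0.05, 0.5, 5.0]
miss = [0.9, 0.8, 0.6, 0.4, 0.3]
arith = log_average_miss_rate(fppi, miss)
geo = log_average_miss_rate(fppi, miss, geometric=True)
```

A single summary number of this kind makes it possible to rank detectors whose full miss-rate curves cross each other.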

[Figure 5: miss rate (vertical axis, 0.3 to 1.0) against false positives per image (FPPI; horizontal axis, 10^-3 to 10^0) for the classifier-based method [16], the feature-based method [11], the deep CNN method [24], and our RSS method.]

Fig. 5. Results on the Low-ETH dataset.

4.4. Evaluation on the Datasets

In this subsection, we demonstrate that the classic feature-based [11], classifier-based [16], and deep CNN [24] methods gradually lose their effectiveness under low-resolution conditions. The resolution ratios of the test proposal set range from 0.05 to 1.70. Pedestrian detection performance is reported as the log-average miss rate; the results on the Caltech dataset are shown in Fig. 4, and the results on the ETHZ dataset are shown in Fig. 5. The miss rate is high (bad) when the methods [11, 16, 24] are applied to the detection images, and our RSS method performs better than these methods. Additionally, a practical campus surveillance dataset, which is closer to a real monitoring environment, is also used to verify the RSS method (Fig. 6). Our method again performs better than the other methods.

[Figure 6: miss rate (vertical axis, 0.4 to 1.0) against false positives per image (FPPI; horizontal axis, 10^-2 to 10^0) for the classifier-based method [16], the feature-based method [11], the deep CNN method [24], and our RSS method.]

Fig. 6. Results on the campus surveillance dataset.

From these observations, we make the following remarks. Detection performance varies with the resolution of the pedestrian image: commonly, the lower the resolution, the lower the detection score and the worse the performance. Hence, general feature-score models [11, 16, 24] cannot discriminate pedestrians well enough in this kind of low-resolution situation. In the new RSS, RSLs can be separated into positive and negative ones, so the decision is made on the RSL rather than by using the detection score as the final evaluation. The comparison with [11, 16, 24] shows that the RSS method is suitable for low-resolution situations.

5. CONCLUSION

This paper raises a new issue, low-resolution pedestrian detection via a novel resolution-score discriminative surface, which to the best of our knowledge has not been investigated before. Traditional models may not be suitable for this task because of their unsatisfactory performance under low-resolution conditions. We propose to learn a discriminative surface to address the problem, by mapping an RSL onto the RSS and classifying it as either a positive or a negative RSL. Experimental results illustrate that traditional models suffer a significant loss of performance when the image resolution is low, and demonstrate the effectiveness of the proposed framework.

6. REFERENCES

[1] M. Ding and G. Fan, “Articulated and generalized gaussian kernel correlation for human pose estimation,” IEEE Transactions on Image Processing, vol. 25, no. 2, pp. 776, 2016.


[2] C. Chen, R. Jafari, and N. Kehtarnavaz, “A real-time human action recognition system using depth and inertial sensor fusion,” IEEE Sensors Journal, vol. 16, no. 3, pp. 773–781, 2015.

[3] Zheng Wang, Ruimin Hu, Chao Liang, and Yi Yu, “Zero-shot person re-identification via cross-view consistency,” IEEE Transactions on Multimedia, vol. 18, no. 2, pp. 1–1, 2015.

[4] Chen Chen, R Jafari, and N Kehtarnavaz, “Improving human action recognition using fusion of depth camera and inertial sensors,” IEEE Transactions on Human-Machine Systems, vol. 45, no. 1, pp. 51–61, 2014.

[5] C. Wojek, P. Dollar, B. Schiele, and P. Perona, “Pedestrian detection: An evaluation of the state of the art,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 4, pp. 743–761, 2012.

[6] Payam Sabzmeydani and Greg Mori, “Detecting pedestrians by learning shapelet features,” 2007, pp. 1–8.

[7] Pedro F. Felzenszwalb, Ross B. Girshick, David McAllester, and Deva Ramanan, “Object detection with discriminatively trained part-based models,” IEEE Transactions on Software Engineering, vol. 32, no. 9, pp. 1627–45, 2010.

[8] Zhe Lin and Larry S. Davis, “A pose-invariant descriptor for human detection and segmentation,” in European Conference on Computer Vision, 2008, pp. 423–436.

[9] Taejae Jeon, Taeoh Kim, Hyunhye Shin, and Sangyoun Lee, “Pedestrian detection using hog, lbp and color lbp,” in ITC-CSCC: International Technical Conference on Circuits Systems, Computers and Communications, 2015.

[10] Chunhui Gu, J. J. Lim, P. Arbelaez, and J. Malik, “Recognition using regions,” 2009, pp. 1030–1037.

[11] Navneet Dalal and Bill Triggs, “Histograms of oriented gradients for human detection,” in IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 886–893.

[12] Stefan Walk, Nikodem Majer, Konrad Schindler, and Bernt Schiele, “New features and insights for pedestrian detection,” 2010, pp. 1030–1037.

[13] W. R. Schwartz, A. Kembhavi, D. Harwood, and L. S. Davis, “Human detection using partial least squares analysis,” in IEEE International Conference on Computer Vision, 2009, pp. 24–31.

[14] Aharon Barhillel, Levi Dan, Eyal Krupka, and Goldberg Chen, “Part-based feature synthesis for human detection,” in Computer Vision - ECCV 2010, European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, 2010, pp. 127–142.

[15] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jagannath Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” Computer Science, pp. 580–587, 2014.

[16] J. R. R. Uijlings, K. E. A. Van De Sande, T. Gevers, and A. W. M. Smeulders, “Selective search for object recognition,” International Journal of Computer Vision, vol. 104, no. 2, pp. 154–171, 2013.

[17] Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin, “Regionlets for generic object detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 10, pp. 2071–84, 2015.

[18] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, vol. 25, no. 2, pp. 2012, 2012.

[19] Jia Deng, Wei Dong, R. Socher, Li Jia Li, Kai Li, and Fei Fei Li, “Imagenet: A large-scale hierarchical image database,” 2009, pp. 248–255.

[20] Yun Fu Liu, Jing Ming Guo, Che Hao Chang, and Chih Hsien Hsia, “Hierarchical pedestrian detection under low resolution scenario,” in International Symposium on Intelligent Signal Processing and Communications Systems, 2012, pp. 80–84.

[21] Hisham Sager and William Hoff, “Pedestrian detection in low resolution videos,” in IEEE Winter Conference on Applications of Computer Vision, 2014, pp. 668–673.

[22] Wang Zheng, Hu Ruimin, Jiang Junjun, and Liang Chao, “Scale-adaptive low-resolution person re-identification via learning a discriminating surface,” 2016, pp. 2669–2675.

[23] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904–1916, 2015.

[24] S. Ren, K. He, R Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1, 2016.

[25] Andreas Ess, Bastian Leibe, Konrad Schindler, and Luc Van Gool, “A mobile vision system for robust multi-person tracking,” pp. 1–8, 2008.