
[IEEE 2013 1st International Conference on Orange Technologies (ICOT 2013) - Tainan (2013.3.12-2013.3.16)] 2013 1st International Conference on Orange Technologies (ICOT) - The optical




The Optical Flow-Based Analysis of Human Behavior-Specific System

Hsin-Chun Tsai, Chi-Hung Chuang*, Shin-Pang Tseng, Jhing-Fa Wang

Hsin-Chun Tsai: Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan ([email protected])
Chi-Hung Chuang: Department of Applied Informatics, Fo Guang University, I-Lan, Taiwan ([email protected])
Shin-Pang Tseng: Department of Computer Engineering and Entertainment Technology, Tajen University, Pingtung, Taiwan ([email protected])
Jhing-Fa Wang: Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan, and Department of Digital Multimedia Design, Tajen University, Pingtung, Taiwan ([email protected])

Abstract- To detect illegal behaviors, cameras are mounted in public spaces and the captured human behavior is analyzed to determine whether it is illegal. In this paper, we focus on detecting smoking and drinking as illegal behaviors in certain public spaces. Compared with other works, ours does not need to build a background model in advance to classify human behavior. The proposed system consists of four modules: face region extraction, multiple hand sample extraction, feature extraction, and behavior analysis. We extract three features from each human behavior: the touching time between the face and the hand samples, the result of smoke detection, and the result of handheld object detection. A decision tree then classifies the behavior using these three features. Experimental results demonstrate that the proposed method can be successfully applied in many environments under various conditions, such as different illumination intensities, different backgrounds, and different human habits.

Keywords- Analysis of human behavior, face detection, optical flow, particle filter, smoke detection.

I. INTRODUCTION

The World Health Organization estimates that tobacco kills nearly six million people a year, of whom more than five million are users and ex-users and more than 600,000 are nonsmokers exposed to second-hand smoke. Approximately one person dies every six seconds due to tobacco, accounting for one in ten adult deaths. Tobacco smoke contains more than 50 carcinogens, and second-hand smoke causes fatal diseases in both adults and babies. Smoking is therefore not only a health problem but also a social-order problem, and in Taiwan smoking in public spaces is prohibited. Another behavior that is illegal in certain public spaces is drinking: a growing number of public spaces, such as libraries, museums, and the MRT, prohibit drinking and eating.

To enforce these laws in certain public spaces, this paper proposes a smart video surveillance system that analyzes human behavior to determine whether it is illegal. The flowchart is shown in Fig. 1. The proposed system consists of six components: face region extraction, multiple hand sample extraction, the touching time between the face and hand samples, handheld object detection, smoke detection, and human behavior recognition.

Fig. 1: Flowchart of our proposed system.

[Fig. 1 depicts the pipeline: load next frame, then face region extraction and multiple hand sample extraction, followed by the touching time between the face and hand samples, handheld object detection, and smoke detection, all feeding human behavior recognition.]

978-1-4673-5936-8/13/$31.00 ©2013 IEEE


II. RELATED WORK

Detecting smoking behavior is a relatively new research topic. Recently, Pin Wu et al. [1] proposed a method that detects cigarettes to decide whether a smoking behavior occurs. Because a cigarette must contact the lips during smoking, their method searches for the cigarette near the lips: it uses the YCbCr color space to extract the face region, locates the lips within it, and then uses the HSV color space to detect a cigarette near the lip location. Wen-Chuan Wu et al. [2] present a smoking-behavior detection system consisting of two main modules, event clue detection and event classification. Event clue detection extracts conceptual clues, including cigarettes and hand motions, and nominates candidate time points. Event classification then uses Markov models of the event classes to decide whether a behavior is smoking.

Smoke is an important feature of smoking behavior. Most recent smoke detection work targets wildfire smoke. As reviewed by Toni Jakovčević et al. [3], wildfire smoke has many characteristics, including its color and texture. Regarding color, T. H. Chen et al. [4] use the RGB color space to detect smoke, and A. K. Jain et al. [5] use the HSV color space. D. Krstinić et al. [6] compare several color spaces (RGB, YCbCr, CIELab, HSI, and HS'I) and choose HS'I for smoke detection; they also compare several classifiers, including a lookup-table method, a naive Bayes classifier, and a classifier based on kernel density estimation of Bayes probability distributions. However, color information alone is unreliable here: the smoke generated by smoking is thinner than wildfire smoke, so its color mixes with the background color. Regarding texture, the wavelet transform is a popular feature for smoke detection. As shown by B. U. Töreyin et al. [7, 8], when wildfire smoke covers objects, the high-frequency content of the image decreases. Since this paper does not build a background model, we cannot use this decrease in wavelet energy to detect smoke; instead, we use motion information.

III. THE TOUCHING TIME

A. Face Extraction

To capture the foreground region, the proposed system first extracts the face region. Face extraction consists of two stages: face detection and face tracking. The face detection technique proposed by Viola and Jones [9] is widely used for extracting face regions, but if objects or smoke cover the face in the captured video, face detection misses, as shown in Fig. 2. To solve this problem, the proposed system uses the particle filter algorithm proposed by Nummiaro et al. [10] to track and extract the face region. In this paper, each particle is described by a 3x3x3 color histogram, and the weight of each particle is calculated from the Bhattacharyya distance.
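As a sketch of the particle weighting just described, the snippet below builds a 3x3x3 color histogram and scores a particle by Bhattacharyya distance. The Gaussian weighting with sigma = 0.1 is a common choice in color-based particle filters (following Nummiaro et al.), not a value stated in this paper.

```python
import numpy as np

def color_histogram(patch, bins=3):
    """3x3x3 color histogram of an HxWx3 uint8 patch, L1-normalized."""
    hist, _ = np.histogramdd(patch.reshape(-1, 3), bins=(bins,) * 3,
                             range=((0, 256),) * 3)
    return hist.ravel() / hist.sum()

def bhattacharyya_distance(p, q):
    """Distance in [0, 1]; 0 means identical distributions."""
    bc = np.sum(np.sqrt(p * q))          # Bhattacharyya coefficient
    return np.sqrt(max(0.0, 1.0 - bc))

def particle_weight(p, q, sigma=0.1):
    """Gaussian weighting of a particle by its histogram distance."""
    d = bhattacharyya_distance(p, q)
    return np.exp(-d * d / (2 * sigma * sigma))
```

A particle whose candidate region matches the face model gets a distance near 0 and hence a weight near 1, which biases resampling toward it.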

Fig. 2: Examples of missed face detection results.

B. Hands Extraction

After extracting the face region, the system frames the area in which hands may appear, i.e., the hand swing area. As shown in Fig. 3, the blue frame is the hand swing area. If a hand is located outside this area, the system ignores the behavior. The system then uses multiple hand samples to extract the hand region by combining hand detection with hand tracking.

Fig. 3: The blue frame is the hand swing area and the green frame is the body region.

1. Hand Swing Area

Based on the face region, this paper first defines the body region. As shown in Fig. 3, the green frame is the body region: its width is three times the face width and its height is eight times the face height. The hand swing area is then defined relative to the face: its upper bound in the y direction is two-thirds of the face height, its lower bound is one-third of the body height, and its width is three times the face width.

2. Hand Detection
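The swing-area geometry can be sketched as follows; the paper leaves the exact anchor points ambiguous, so the interpretation below (both bounds measured from the top of the face box, region horizontally centered on the face) is an assumption:

```python
def hand_swing_area(face):
    """Derive the hand swing area from a face box (x, y, w, h).

    One plausible reading of the paper's geometry: the region is
    3 face-widths wide, centered on the face, starting 2/3 of a face
    height below the face top and ending 1/3 of the body height
    (8 face heights) below it.
    """
    x, y, w, h = face
    body_h = 8 * h                      # body height: 8x face height
    top = y + (2 * h) // 3              # upper bound: 2/3 face height
    bottom = y + body_h // 3            # lower bound: 1/3 body height
    left = x - w                        # 3 face widths, centered
    return (left, top, 3 * w, bottom - top)
```

For a face box (100, 50, 30, 30) this yields a 90x60 region starting at (70, 70) under the stated assumptions.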

If a hand moves rapidly, its corresponding optical-flow velocity is large. The system therefore computes optical flow to extract this motion information. After spreading 400 features over the hand swing area, the system applies the Lucas–Kanade method



[11] to calculate the optical flow of those features. According to the directions of the resulting optical-flow vectors, the system classifies them into five classes: left, right, up, down, and no motion. Then, using the k-means algorithm, all upward vectors are clustered into two groups by vector length, and the system locates a candidate hand region by framing the longer vectors. The downward vectors are processed in the same way to locate another candidate hand region. Finally, the system applies the skin detection technique of Chiang et al. [12] to count the skin pixels in each candidate hand region; if skin pixels constitute a high proportion of a candidate region, that region is taken as a hand region. The result of hand detection is shown in Fig. 4.
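A minimal sketch of the direction classification and length-based clustering described above. The five-class bucketing and the tiny 1-D k-means (k = 2) follow the text; the magnitude threshold for "no motion" is an assumed parameter:

```python
import numpy as np

def classify_direction(dx, dy, min_mag=1.0):
    """Bucket an optical-flow vector into one of five direction classes.
    Image coordinates: y grows downward, so dy < 0 means 'up'."""
    if dx * dx + dy * dy < min_mag * min_mag:
        return "none"
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"

def longer_group(vectors):
    """Split vectors into two length clusters (a 1-D k-means with k=2)
    and return the indices of the longer cluster."""
    lengths = np.hypot(vectors[:, 0], vectors[:, 1])
    lo, hi = lengths.min(), lengths.max()
    for _ in range(20):                       # Lloyd iterations
        labels = np.abs(lengths - lo) > np.abs(lengths - hi)
        lo2 = lengths[~labels].mean() if (~labels).any() else lo
        hi2 = lengths[labels].mean() if labels.any() else hi
        if lo2 == lo and hi2 == hi:
            break
        lo, hi = lo2, hi2
    return np.flatnonzero(labels)
```

Framing a bounding box around the vectors returned by `longer_group` gives the candidate hand region; the skin-pixel check then filters false candidates.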

Fig. 4: The result of hand detection.

3. Hand Tracking

If a hand moves too slowly to be found by hand detection, the system uses the particle filter algorithm to locate the hand region. Hand tracking works like the face tracking described in Section III-A.

4. Multiple Hand Samples

To combine hand detection and hand tracking, the system maintains a hand sample whose location is updated in every frame of the video. After loading a new frame, the system analyzes all upward optical-flow vectors to detect the hand region, as described in Section III-B.2. If a hand region is detected close enough to the current hand sample, the sample's location is updated to the detected region. If no hand region can be detected from the upward vectors, the system analyzes the downward optical-flow vectors instead. If neither the upward nor the downward vectors yield a hand region, the system falls back to hand tracking to update the sample's location.

However, if the hand sample starts from a wrong location, or the hand moves too fast to be tracked or detected, a single hand sample cannot recover from the failure once it happens. This paper therefore proposes multiple hand samples to solve this problem: whenever a hand region is successfully detected from the upward optical-flow vectors, the system spawns an additional hand sample. The difference between one hand sample and multiple hand samples is shown in Fig. 5.
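The per-frame update priority described above (upward-flow detection, then downward-flow detection, then tracking) can be sketched structurally; the three strategies are passed in as hypothetical callables rather than real detectors:

```python
def update_hand_sample(sample, detect_up, detect_down, track):
    """Update one hand sample per frame, in the paper's priority order:
    upward-flow detection first, then downward-flow detection, then
    particle-filter tracking as a fallback. Each argument is a callable
    returning an (x, y) location or None; this is a structural sketch."""
    for strategy in (detect_up, detect_down, track):
        loc = strategy(sample)
        if loc is not None:
            return loc
    return sample  # nothing found: keep the previous location
```

With multiple samples, the same cascade runs per sample, and a fresh sample is spawned whenever the upward-flow detector fires.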


Fig. 5: (a) One hand sample. (b) Multiple hand samples.

C. The Touching Time

After extracting the face region and hand samples, the system computes the touching time between the face and the hand samples. To do this, it checks whether any hand sample is in contact with the face: if the velocity and position of the face and a hand sample are similar enough, the system decides that a hand touches the face in this frame and increments the touching time.
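A sketch of the contact test and the touching-time counter. The paper does not give numeric thresholds, so the distance and velocity thresholds below are illustrative assumptions:

```python
import math

def touches(face_center, face_vel, hand_center, hand_vel,
            dist_thresh=40.0, vel_thresh=5.0):
    """Declare face/hand contact when the two regions are both close
    and moving at similar velocities (thresholds are illustrative)."""
    dist = math.dist(face_center, hand_center)
    dvel = math.dist(face_vel, hand_vel)
    return dist < dist_thresh and dvel < vel_thresh

class TouchTimer:
    """Accumulate touching time, in frames, over a video."""
    def __init__(self):
        self.frames = 0
    def update(self, in_contact):
        if in_contact:
            self.frames += 1
        return self.frames
```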

IV. HANDHELD DETECTION

If any hand sample touches the face, the system applies the handheld object detection method proposed by Chuang et al. [13] to determine whether there is an object in the face region.

The flowchart of handheld detection is shown in Fig. 6. Ft-1 and Ft denote the face regions extracted from frames It-1 and It, respectively, and Ht-1 and Ht denote the color histograms computed from Ft-1 and Ft. Comparing Ht-1 with Ht gives the ratio histogram Rt, defined as

Rt(i) = Ht(i) / Ht-1(i),  (1)

where Ht(i) and Ht-1(i) are the values of the i-th bin of the color histograms of Ft and Ft-1, and Rt(i) is the i-th bin of the ratio histogram. The system then computes the value μ, the average over all bins of Rt. If the value Rt(j) of some bin j exceeds 1.5μ (the threshold shown in Fig. 6), the corresponding color is taken as a color of the handheld object.
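A sketch of the ratio-histogram test, assuming the definition in Eq. (1) and the 1.5-times-mean threshold shown in the Fig. 6 flowchart; the epsilon guard against empty bins is our addition:

```python
import numpy as np

def handheld_colors(hist_t, hist_prev, k=1.5, eps=1e-6):
    """Ratio histogram Rt(i) = Ht(i) / Ht-1(i); bins whose ratio
    exceeds k times the mean ratio are flagged as handheld-object
    colors (k = 1.5 follows the paper's flowchart)."""
    ratio = hist_t / np.maximum(hist_prev, eps)   # avoid division by 0
    mu = ratio.mean()
    return np.flatnonzero(ratio > k * mu), ratio
```

A bin that grows sharply between consecutive face crops (e.g., the color of a bottle raised to the mouth) stands out as a large ratio relative to the mean.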



Fig. 6: The flowchart of handheld detection.

V. SMOKE DETECTION

Fig. 7: The clustering result of optical-flow vectors.

When the hand samples leave the face, the system applies smoke detection. As mentioned above, recent smoke detection work targets forest fires, and the smoke generated by a forest fire is thicker than that generated by smoking. To detect the thinner smoke, this paper uses motion information to confirm whether there is smoke in the face region: when smoke is present, the optical-flow vectors in the face region become longer. The system first applies the Lucas–Kanade method [11] to compute the optical-flow vectors and then, using the k-means algorithm, clusters all vectors into two groups by vector length. Comparing the longer vectors with the shorter ones, this paper proposes two features for smoke detection: first, the difference between the lengths of the longer and shorter vectors grows when smoke is generated; second, the longer vectors cluster spatially in the smoke region. The clustering result is shown in Fig. 7, where red vectors are the longer optical-flow vectors and the others are shorter. Figs. 7(a) and 7(c) are no-smoke cases, and Figs. 7(b) and 7(d) are smoke cases; Figs. 7(a) and 7(b) show optical flow computed outside the face region, and Figs. 7(c) and 7(d) show it inside the face region.
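The two smoke clues can be sketched as below. A simple median split stands in for the paper's k-means step, and the spatial "clustering" clue is measured here as the positional standard deviation of the longer vectors, which is one plausible reading of the second feature:

```python
import numpy as np

def smoke_features(lengths, positions):
    """Two clues from flow-vector lengths: (1) the gap between the
    long and short groups, and (2) how tightly the long vectors
    cluster spatially (mean per-axis standard deviation of their
    positions; small spread suggests a compact smoke region)."""
    order = np.argsort(lengths)
    half = len(lengths) // 2
    short_idx, long_idx = order[:half], order[half:]
    gap = lengths[long_idx].mean() - lengths[short_idx].mean()
    spread = positions[long_idx].std(axis=0).mean()
    return gap, spread
```

A large `gap` together with a small `spread` would then indicate smoke in the face region under these assumptions.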

VI. HUMAN BEHAVIOR RECOGNITION

Fig. 8: The decision tree used to classify behaviors.

In this section, a decision tree combines the three features: the touching time, the result of handheld object detection, and the result of smoke detection. The decision tree is shown in Fig. 8.
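A sketch of the decision tree in Fig. 8. The paper does not state the touching-time threshold, so the 30-frame value below is an assumption; likewise, which handheld-object size maps to phoning versus drinking is our reading of the figure layout and may be the other way around:

```python
def classify_behavior(touch_frames, handheld, handheld_big, smoke,
                      long_touch=30):
    """Decision tree reconstructed from Fig. 8: a short touch is
    'others'; a long touch with a big handheld object is 'phoning',
    with a small one 'drinking'; no object but smoke detected is
    'smoking'; otherwise 'others'. Thresholds are assumptions."""
    if touch_frames < long_touch:
        return "others"
    if handheld:
        return "phoning" if handheld_big else "drinking"
    return "smoking" if smoke else "others"
```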

VII. EXPERIMENTAL RESULT

The hardware platform is an Intel Core 2 Quad Q9400 CPU @ 2.66 GHz with 2 GB of memory. The software environment is Visual C++ 2008 with OpenCV 2.0. The spatial resolution is 360 x 240 pixels per frame. The dataset contains 142 smoking behaviors, 129 drinking behaviors, 115 phoning behaviors, and 3 other behaviors.

The analysis results are shown in Table 1. The recognition rate for smoking is lower than for the other behaviors. In smoke detection, the characteristics of smoke are more obvious outside the face region, but a problem arises when something moving fast, such as a car, passes behind a person who is not performing an illegal behavior. If smoke detection were applied outside the




face region, the system might mistake the car for smoke. To avoid this, this paper analyzes motion information only inside the face region, sacrificing a small amount of recognition rate to reduce the cost of misclassification.

Table 1: The recognition rate of each behavior.

Ground truth \ Analysis result   Smoking   Drinking   Phoning   Other
Smoking                          88.02%    2.81%      0%        9.15%
Drinking                         3.10%     93.79%     1.55%     1.55%
Phoning                          1.73%     0.86%      90.43%    6.95%
Other                            0%        0%         0%        100%

The classification results for the various behaviors are shown in Figs. 9, 10, and 11. The experimental results demonstrate that the proposed method can be successfully applied in many environments.

Fig. 9: The classification result of smoking behaviors.

Fig. 10: The classification result of drinking behaviors.

Fig. 11: The classification result of phoning behaviors.

CONCLUSION

This paper presents a system to classify smoking, drinking, phoning, and other behaviors. The proposed system consists of four modules: face region extraction, multiple hand sample extraction, feature extraction, and behavior analysis. Three features are extracted from each human behavior: the touching time between the face and the hand samples, the result of smoke detection, and the result of handheld object detection. A decision tree then classifies the behavior using these three features. The experimental results demonstrate that the proposed method can be successfully applied in many environments under various conditions.

REFERENCES

[1] Pin Wu, Jun-Wei Hsieh, Jin-Cheng Cheng, Shyi-Chyi Cheng, and Shau-Yin Tseng, “Human Smoking Event Detection Using Visual Interaction Clues,” ICPR, pp. 4344–4347, 2010.

[2] Wen-Chuan Wu and Chun-Yang Chen, “Detection System of Smoking Behavior Based on Face Analysis,” International Conference on Genetic and Evolutionary Computing, pp. 184–187, 2011.

[3] Toni Jakovčević, Maja Braović, Darko Stipaničev, and Damir Krstinić, “Review of wildfire smoke detection techniques based on visible spectrum video analysis,” Proc. 7th Int. Symp. Image Signal Processing Anal., pp. 480–484, Dubrovnik, 2011.

[4] T. H. Chen, Y. H. Yin, S. F. Huang, and Y. T. Ye, “The Smoke Detection for Early Fire-Alarming System Base on Video Processing,” Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP, pp. 427-430, 2006.

[5] A. K. Jain, Fundamentals of digital image processing, Prentice-Hall, Englewood Cliffs, New Jersey, 1989.

[6] D. Krstinić, D. Stipaničev, and T. Jakovčević, “Histogram-Based Smoke Segmentation in Forest Fire Detection System,” Information Technology and Control, Kaunas, Technologija, Vol. 38, No. 3, pp. 237-244, 2009.

[7] B. U. Töreyin, Y. Dedeoglu, and A. E. Cetin, “Wavelet Based Real-Time Smoke Detection in Video,” EUSIPCO ’05, 2005.

[8] B. U. Töreyin, Y. Dedeoglu, and A. E. Cetin, “Contour Based Smoke Detection in Video Using Wavelets,” European Signal Processing Conference, 2006.

[9] Paul Viola and Michael Jones, “Rapid Object Detection using a Boosted Cascade of Simple Features,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, Kauai, Hawaii, USA, pp. 905-910, 2001.

[10] K. Nummiaro, E. Koller-Meier, and L. Van Gool, “An Adaptive Color-Based Particle Filter,” Image and Vision Computing, Vol. 21, Issue 1, pp. 99-110, Jan 2003.

[11] B. D. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” Proc. DARPA Image Understanding Workshop, pp. 121–130, 1981.

[12] C. Chiang, W. Tai, M. Yang, Y. Huang, and C. Huang, “A novel method for detecting lips, eyes and faces in real time,” Real-Time Imag., vol. 9, pp. 277–287, Aug. 2003.

[13] C.-H. Chuang, J.-W. Hsieh, L.-W. Tsai, S.-Y. Chen, and K.-C. Fan, “Carried object detection using ratio histogram and its application to suspicious event analysis,” IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 6, pp. 911–916, 2009.
