Deep Learning-Based Safety Helmet Detection in Engineering ...downloads.hindawi.com/journals/ace/2020/9703560.pdf · Deep Learning-Based Safety Helmet Detection in Engineering Management

Research ArticleDeep Learning-Based Safety Helmet Detection in EngineeringManagement Based on Convolutional Neural Networks

Yange Li1 Han Wei1 Zheng Han 1 Jianling Huang1 and Weidong Wang12

1School of Civil Engineering Central South University Changsha 410075 China2e Key Laboratory of Engineering Structures of Heavy Haul Railway Ministry of Education Changsha 410075 China

Correspondence should be addressed to Zheng Han zheng_hancsueducn

Received 8 August 2019 Revised 14 May 2020 Accepted 10 September 2020 Published 19 September 2020

Academic Editor Antonio Formisano

Copyright copy 2020Yange Li et alis is an open access article distributed under theCreative CommonsAttribution License whichpermits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Visual examination of the workplace and in-time reminder to the failure of wearing a safety helmet is of particular importance toavoid injuries of workers at the construction site Video monitoring systems provide a large amount of unstructured image dataon-site for this purpose however requiring a computer vision-based automatic solution for real-time detection Although agrowing body of literature has developed many deep learning-based models to detect helmet for the traffic surveillance aspect anappropriate solution for the industry application is less discussed in view of the complex scene on the construction site In thisregard we develop a deep learning-based method for the real-time detection of a safety helmet at the construction site epresented method uses the SSD-MobileNet algorithm that is based on convolutional neural networks A dataset containing 3261images of safety helmets collected from two sources ie manual capture from the video monitoring system at the workplace andopen images obtained using web crawler technology is established and released to the public e image set is divided into atraining set validation set and test set with a sampling ratio of nearly 8 1 1 e experiment results demonstrate that thepresented deep learning-based model using the SSD-MobileNet algorithm is capable of detecting the unsafe operation of failure ofwearing a helmet at the construction site with satisfactory accuracy and efficiency

1 Introduction

Construction is a high-risk industry where constructionworkers tend to be hurt in the work process Head injuriesare very serious and often fatal According to the accidentstatistics released by the state administration of work safetyfrom 2015 to 2018 among the recorded 78 constructionaccidents 53 events happened owing to the fact that theworkers did not wear safety helmets properly accounting for6795 of the total number of accidents [1]

In safety management at the construction site it is es-sential to supervise the safety protective equipment wearingcondition of the construction workers Safety helmets canbear and disperse the hit of falling objects and alleviate thedamage of workers falling from heights Constructionworkers tend to ignore safety helmets because of weak safetyawareness At the construction site workers that wear safetyhelmets improperly are much more likely to be injured

Traditional supervision of the workers wearing safety hel-mets on construction sites often requires manual work [2]ere are problems such as a wide range of operations anddifficult management of site workers ese factors makemanual supervision difficult and inefficient and it is difficultto track and manage the whole workers at the constructionsites accurately in real time [3] Hence it is hard to satisfy themodern requirement of construction safety managementonly relying on the traditional manual supervision In thiscontext it remains a significant issue to study on the au-tomatic detection and recognition of safety helmets wearingconditions

e automatic monitoring method can contribute tomonitoring the construction workers and confirm the safetyhelmet wearing conditions at the construction site Inparticular considering that the traditional manual super-vision of the workers is often costly time-consuming error-prone and not sufficient to satisfy the modern requirements

HindawiAdvances in Civil EngineeringVolume 2020 Article ID 9703560 10 pageshttpsdoiorg10115520209703560

of construction safety management the automatic super-vision method can be beneficial to real-time on-sitemonitoring

In this paper based on the previous studies on computervision-based object detection we develop a deep learning-based method for the real-time detection of safety helmet atthe construction siteemajor contributions are as follows(1) a dataset containing 3261 images of safety helmets col-lected from two sources ie manual capture from the videomonitoring system at the workplace and open images ob-tained using web crawler technology is established andreleased to the public (2)e SSD-MobileNet algorithm thatis based on convolutional neural networks is used to trainthe model which is verified in our study as an alternativesolution to detect the unsafe operation of failure of wearing ahelmet at the construction site e article is organized asfollows Section 2 gives a brief description of the relatedwork Section 3 describes the methodology of the researchSection 4 introduces the construction of the databaseSection 5 reports the experiment results of the studySections 6 and 7 discuss the pros and cons of the study andconclude the paper

2 Literature Review

21 Related Research into the Safety Helmets Detection Atpresent previous studies of safety helmets detection can bedivided into three parts sensor-based detection machinelearning-based detection and deep learning-based detec-tion Sensor-based detection usually locates the safety hel-mets and workers (Kelm et al [4] Torres et al [5]) emethods usually use the RFID tags and readers to locate thehelmets and workers and monitor how personal protectiveequipment is worn by workers in real time Kelm et al [4]designed a mobile Radio Frequency Identification (RFID)portal for checking personal protective equipment (PPE)compliance of personnel However the working range of theRFID readers is limited and the RFID readers can onlysuggest that the safety helmets are close to the workers butunable to confirm that the safety helmets are being properlyworn

Up to date machine learning-based object detectiontechnologies are widely used in many domains for itspowerful object detection and classification capacity (egRubaiyat et al [6] Shrestha et al [7] Waranusast et al [8]Doungmala et al [9] Jia et al [10] and Li et al [11])Remarkable studies are made by Rubaiyat et al [6] whoproposed an automatic detection method to obtain thefeatures of construction workers and safety helmets anddetect safety helmets e method combines the frequencydomain information of the image with the histogram oforiented gradient (HOG) and the circle Hough transform(CHT) extractive technique to detect the workers and thehelmets in two steps e detection methods based onmachine learning can detect safety helmets accurately andprecisely under various scenarios but also have somedrawbacks Sometimes the method can only detect safetyhelmets with a specific color and it is difficult to distinguishthe hats with similar color and shape to the safety helmets

Moreover the method cannot detect faces and safety hel-mets thoroughly under some circumstances for examplesome workers do not turn their faces towards the camera atthe construction site

22 Deep Learning-Based Object Detection e above-mentioned methods are commonly based on traditionalmachine learning to detect and classify the helmets andchoose features artificially with a strong subjectivity acomplex design process and poor generalization ability Inrecent years with the rapid development of deep learningtechnology the object detection algorithm turns to the onebased on convolutional neural networks with a great pro-motion of speed and accuracy (eg Wu et al [12])

e methods construct convolutional neural networkswith different depths to detect safety helmets Some otherstrategies such as multiscale training increasing the numberof anchors and introducing the online hard example miningare added to increase the detection accuracy (eg Xu et al[13]) However these methods have some limitations in thepreprocessing aspects of image sharpness object proportionand the color difference between background andforeground

Deep learning-based methods are very potential for thepurpose of peoplersquos unsafe behavior identification Manyprevious studies have presented a solution to this topicRemarkable studies include the following Ding et al [14]developed a hybrid deep learning model that integrates aconvolution neural network (CNN) and long short-termmemory (LSTM) that automatically recognizes workersrsquounsafe actions e results demonstrated that the model canprecisely detect safe and unsafe actions conducted byworkers on-site However some behaviors cannot be rec-ognized owing to the lack of data the small sample size usedfor training and the limited number of unsafe actions thatwere considered Fang et al [15] proposed a novel deeplearning-based framework to check whether a site worker isworking within the constraints of their certification eframework includes key video clips extraction trade rec-ognition and worker competency evaluation Resultsdemonstrate that the proposed framework offers an effectiveand feasible solution to detect noncertified work Howeversome workers cannot be detected when the workersrsquo faceshardly appear or are obstructed by the safety helmets orother equipment Also the worker close to the camera failedto be recognized Fang et al [16] integrated a Faster R-CNNand a deep CNN to detect the presence of a worker fromimages and the harness respectively which can identifywhether workers wear safety harness while working atheights or not e research is limited by the restrictedactivities working at heights and the dataset size Fang et al[17] developed a computer vision-based approach whichuses a Mask R-CNN to detect people and recognize therelationship between people and concrete supports toidentify unsafe behaviors e study has some restrictions itfocuses on a limited number of activities related to theconstruction of deep foundation-pits Luo et al [18] pro-posed an increased CNN that integrates Red-Green-Blue

2 Advances in Civil Engineering

optical flow and gray stream CNNs to monitor and assessworkersrsquo activities associated with installing reinforcementat the construction site e research is limited by occlu-sions insufficient knowledge of a time series of actionsdefinition and lack of a large-scale database

Considering its excellent ability to extract features in thepaper we use the convolutional neural network (CNN) tobuild a safety helmet detection model Automatic detectionof safety helmets worn by construction workers at theconstruction site and timely warning of workers withouthelmets can largely avoid accidents caused by workerswearing safety helmets improperly e designed CNN istrained using the TensorFlow framework e contributionsof the research include a deep learning-based safety helmetdetection model and a safety helmet image dataset forfurther research e model provides an opportunity todetect the helmets and improve safety management

Deep learning-based methods are commonly used todetect unsafe behaviors on-site Nevertheless many tradi-tional measures of safety helmet detection are commonlysensor-based and machine-based thus limited by problemssuch as sensor failure over long distances the manual andsubjective features choice and the chaotic scene interferenceBased on the previous studies we present a deep learning-based method to detect the safety helmets in the workplacewhich is supposed to avoid the abovementioned limitations

3 Methodology

A convolutional neural network (CNN) is a multilayerneural network It is a deep learning method designed forimage recognition and classification tasks It can solve theproblems of too many parameters and difficult training ofthe deep neural networks and can get better classificationeffects e structure of most CNNs consists of input layer-convolutional layer (Conv layer)-activation function-pool-ing layer-fully connected layer (FC layer) e main char-acteristics of CNNs are local connectivity and parametersharing in order to reduce the number of parameters andincrease the efficiency of detection

e Conv layer and the pooling layer are the core partsand they can extract the object features Often the con-volutional layer and the pooling layer may occur alternatelye Conv layers can extract and reinforce the object featurese pooling layers can filter multiple features remove theunimportant features and compress the features e ac-tivation layers use nonlinear activation functions to enhancethe expression ability of the neural network models and cansolve the nonlinear problems effectively e FC layerscombine the data features of objects and output the featurevalues By this means the CNNs can transfer the originalinput images from the original pixel values to the finalclassification confidence layer by layer

In order to better extract the object features and classifythe objects more precisely Hinton et al [19] proposed theconcept of deep learning which is to learn object features fromvast amounts of data using deep neural networks and thenclassify new objects according to the learned features Deep

learning algorithm based on convolutional neural networkshas achieved great results in object detection image recog-nition and image segmentation Girshick et al [20] proposedR-CNN detection framework (region with CNN features) in2014Manymodels based on R-CNNwere proposed after thatincluding SPP-net (spatial pyramid pooling network) [21]Fast R-CNN (fast region with CNN features) [22] and FasterR-CNN (faster region with CNN features) [23]

Classification-based CNN object detection algorithmssuch as Faster R-CNN are widely used methods Howeverthe detection speed is slow and cannot detect in real timeRegression-based detection algorithms are becoming in-creasingly important Redmon et al [24] proposed YOLO(You Only Look Once) algorithm in 2016 At the end of2016 Liu et al [25] combined the anchor box of FasterR-CNN with the bounding box regression of YOLO andproposed a new algorithm SSD (Single Shot MultiBoxDetector) with higher detection accuracy and faster speed

Although the SSD algorithm is not capable of the highestaccuracy the detection speed of the SSD algorithm is muchfaster and comparable to the YOLO algorithm and theprecision can be higher than that of the YOLO algorithmwhen the sizes of the input images are smaller While theFaster R-CNN algorithm tends to lead to more accuratemodels it is much slower and requires at least 100ms perimage [26] erefore considering the real-time detectionrequirements the SSD algorithm is chosen in the researchIn order to reduce greatly the calculation amount and modelthickness the MobileNet [27] model is added erefore inthe paper the SSD-MobileNet model is selected to detectsafety helmets worn by the workers

e SSD algorithm is based on a feed-forward con-volutional network to produce bounding boxes of fixed sizesand generate scores for the object class examples in theboxes A nonmaximum suppression method is used topredict the final results

e early network layers of the SSD model are called thebase network based on a standard framework to classify theimage e base network is truncated before the classifica-tion layers and the convolutional layers are added at the endof the truncated base networke sizes of the convolutionalfeature maps decrease progressively to predict the detectionsat multiple scales

e SSD algorithm sets a series of fixed and different sizedefault boxes on the cell of each feature map as shown inFigure 1 Each default box predicts two kinds of detectionsOne is the location of bounding boxes including 4 offsets(cx cy w h) which represent respectively x and y coor-dinates of the center of the bounding box and the width andheight of the bounding box the other is the score of eachclass If there are C classes of the objects the SSD algorithmpredicts a total of C+1 score including the score of thebackground

e setting of default boxes can be divided into twoaspects size and aspect ratioe sizes of the default boxes inevery feature map will be calculated as follows

SK Smin +Smax minus Smin

m minus 1k minus 1 k isin [1 m] (1)

Advances in Civil Engineering 3

In the formula Smin is 02 Smax is 095 e aspect ratiosare set different for the default boxes expressed asar isin 1 2 3 (12) (13) e width of the default boxes iscalculated as follows

wak Sk

ar

radic (2)

e height of the default boxes is calculated as follows

hak

Skar

radic (3)

When the aspect ratio is 1 a default box size is addedSkprime

SkSk+1

1113968 erefore there are six default boxes of dif-

ferent sizes for each feature celle default boxes will be matched to the ground truth

boxes Each ground truth box can choose default boxes ofdifferent locations aspect ratios and sizes to match eground truth box will be matched to the default box with thebest Jaccard overlap e Jaccard overlap is also called theIoU (Intersection over Union) or the Jaccard similaritycoefficient e IoU is the ratio of the intersection and theunion of the default box to the ground truth box eschematic illustration of IoU is shown in Figure 2

J(A B) |AcapB|

|AcupB|

|AcapB|

|A| +|B| +|AcapB| (4)

After the match process most default boxes are negativeexamples which do not match the objects but the back-grounderefore the SSD algorithm uses the hard negativemining strategy to avoid the significant imbalance betweenthe positive and negative training examples e defaultboxes are ranked in the descending order according to theconfidence error and the top ones are chosen to be thenegative examples so the ratio between the negative andpositive examples is almost 3 1

e SSD algorithm defines the total loss function as theweighted sum between localization loss and confidence loss

L(x c l g) 1N

Lconf(x c) + αLloc(x l g)( 1113857 (5)

In the prediction process the object classes and confi-dence scores will be confirmed according to the maximumclass confidence score and the prediction box that belongs tothe background will be filtered out e prediction boxes withconfidence scores below 05 are also removed As for the leftboxes the location will be obtained according to the defaultboxes e prediction boxes are ranked in the descendingorder according to the confidence score and the top ones areretained Finally the nonmaximum suppression algorithm isused to filter out the prediction boxes with higher but not thehighest IOU and the left prediction boxes are the results

Although the SSD algorithm performs well in the speedand the precision the large model and a large amount ofcalculation make the training speed a bit slowerefore thebase network of the SSDmodel is replaced by the MobileNetmodel to reduce the calculation amount and the modelthickness In the paper the SSD-MobileNet model is chosento detect the safety helmets worn by the workers

e core concept of the MobileNet model is the fac-torization of the filters e main function is to reduce thecalculation amount and the network parameters e modelis used to factorize a standard convolution into a depthwiseconvolution and a pointwise convolution e model isshown in Figure 3

e model also introduces two hyperparameters widthmultiplier and resolution multiplier to reduce the channelnumbers and reduce the image resolutions respectively enetwork model with less calculation amount can be builtHence using the SSD-MobileNet model can reduce thethickness of the SSD model effectively

4 Database

e data required for the experiment were collected by theauthor Since there are few object detection applications ofsafety helmets using deep learning and there is no off-the-shelf safety helmets dataset available part of the experi-mental data was collected using web crawler technologymaking full use of network resources By using several

(a) (b)

Loc∆ (cx cy w h)Conf (c1 c2 hellip cp)

(c)

Figure 1 Detection process (a) Image with GT box (b) 8 times 8 feature map (c) 4 times 4 feature map


keywords such as ldquoworkers wear safety helmetsrdquo andldquoworkers on the construction siterdquo python language is usedto crawl relevant pictures on the Internet

However the quality of the crawled images varies greatlyere are problems that there is an only background and noobjects in some images the size of the safety helmet is smalland the shape is blurred erefore images were also collectedmanually besides web crawling 3500 images were collected intotal e images that did not contain safety helmets duplicateimages and the images that are not in the RGB three-channelformat were eliminated and 3261 images were left forming thesafety helmet detection dataset Some images in the dataset areshown in Figure 4 To increase the detection effect of the safetyhelmet detection model in detecting helmets with differentdirections and brightness in images the image dataset waspreprocessed such as rotation cutting and zooming

en the samples in the dataset are divided into three partsrandomly training set validation set and test set Commonly

a ratio of 6 2 2 is suggested for dividing the training setvalidation set and test set in the previous machine learningstudies such as the course of AndrewNg from deeplearningaiIn deep learning the dataset scale is much larger and thevalidation and test sets tend to be a smaller percentage of thetotal data which are commonly less than 20 or 10 In thissense an adequate ratio of 8 1 1 according to the previousexperience is adopted in our study e numbers of the threesets are 2769 339 and 153 respectively All the images thatcontained safety helmets were manually prelabeled using theopen-source tool LabelImage (available in httpsgithubcomtzutalinlabelImg) In each labeled image the sizes and thelocations of the object are recorded (Figure 5)

5 Results

In the paper the open-source TensorFlow framework ischosen to train the model e pretrained

Input Kernel Feature

sum + b

sum + b

(a)

InputGroup 0 Depthwise feature

Group 1

Group 2

(b)

Depthwise feature Pointwise feature

(c)

Figure 3 Schematic illustration of MobileNet separate convolution (a) Standard convolution (b) Depthwise convolution (c) Pointwiseconvolution

A

A cap B

A cup B

IoU =

B

Figure 2 Schematic illustration of IoU


SSD_mobilenet_v1_COCOmodel with the COCO dataset isused to learn the characteristics of the safety helmet in thebuilt dataset to reduce the training time and save thecomputing resources e initial weights and the parametervalues of our own model are the same as the SSD_mobi-lenet_v1_COCO model Finally the weights and the pa-rameter values of the safety helmet detection model aretrained and obtained through the training process

Among the 3261 images 2769 images were dividedinto the training set 339 images were divided into thevalidation set and 153 images were divided into the testset e training set is used to train the model or to de-termine the parameters of the model e validation set isused to adjust the hyperparameters of the model and toevaluate the capacity of the model preliminarily e testset is used to evaluate the generalization ability of the finalmodel [28]

In the course of training the change of the mean averageprecision (mAP) and the loss function during training wasrecorded by TensorBoard As a measure index the meanaverage precision (mAP) [29] is generally used in the field ofobject detection Figure 6 illustrated that the mean Average

Precision shows an overall upward trend and the trend hasups and downs and is not a steady rise When training roundsup to 50000 the mean average precision of the detectionmodel is 3682 Figure 7 shows that the total loss valuesdecrease slowly at the beginning of the training and convergeat the end of the training e values of the loss function arethe differences between the true value and the predicted valuein general speaking e change in the values of the lossfunction represents the training process of the model esmaller the values are the better the model is trained econvergence of the loss functions demonstrates that thetraining of the model is completed Hence loss functionsmainly influence the training process but not detectionFigures 8(a) to 8(c) show the variation of the classification lossfunction localization loss function and regularization lossfunction against the steps Figures 8(a) and 8(b) demonstratethat the values of the classification loss function decreaseslowly at first and then decrease rapidly when trainingrounds up to nearly 7000 the values of the localization lossfunction decrease rapidly at first and converge at the end ofthe training e convergence of the loss functions demon-strates that the training of the model is completed

Figure 4 Image dataset

(a)

Image (1)jpgImage (1)jpgImage (1)jpgImage (1)jpgImage (1)jpgImage (1)jpg

Filename315315315315315315

Width207207207207207207

HeightHelmetHelmetHelmetHelmetHelmetHelmet

Class100162206238

146

Xmin6

3553453045

Ymin1522002292583383

Xmax516169606874

Ymax

(b)

Figure 5 (a) Manual labeling (b) Data recording


After the model was trained it was used to validate thecollected validation set by using the Spyder softwaree 153images of the validation set were input into the model andthe detected images were output e output images showthe predicted labels and the confidence scores of safetyhelmets Some validation results are shown in Figure 9

e precision and recall are the commonly used metricsto evaluate the performance and reliability of the trainedmodel Precision is the ratio of true positive (TP) to truepositive and false positive (TP + FP) TP + FP is the numberof helmets detected Recall is the ratio of true positive (TP) totrue positive and false negative (TP+ FN) TP+ FN means

the actual number of helmets ere are 250 true positiveobjects 12 false positive objects and 73 false negative objectsin the detected images e precision of the trained model is95 and the recall is 77 which demonstrates that theproposed method performs well in safety helmet detection

As the pictures above show the probabilities of recog-nizing the safety helmets worn by workers as safety helmetsare more than 80 However the output images of themodel demonstrate some errors in the detection model Forexample it is hard for the model to detect the safety helmetsof small sizes or large rotation angles It is possible torecognize the objects of the same colors in the images as the

00

40

80

120

Tota

l los

s

0 10k 20k 30kTraining steps

40k 50k

Figure 7 Total loss change during training

Classification loss

00

40

80

120

Loss

val

ue

0 10k 20k 30kSteps

40k 50k

(a)

Localization loss

00

10

20

30

Loss

val

ue

0 10k 20k 30kSteps

40k 50k

(b)

Regularization loss036

040

044

048

Loss

val

ue

0 10k 20k 30kSteps

40k 50k

(c)

Figure 8 Partial loss function value change (a) Classification loss (b) Localization loss (c) Regularization loss

040

030

020

010

00 10k 20k 30k

Training steps

Mea

n av

erag

e pre

cisio

n

40k 50k

Figure 6 Mean average precision change during training


safety helmets When the illumination intensity of theconstruction site in the images and the objects are not clearthe safety helmets are difficult to be recognized at sug-gests the detection model established in the paper is notaccurate enough

As shown in Figure 10(a) the probability predicted by themodel is 98 but the probability of recognizing the back-ground as safety helmets is 78is fake detection generatesfalse positive is is a case of false detection which predictedthe false object as correct e case of Figure 10(c) is the sameas the first one In Figure 10(b) the red helmet is missed andthis is a case of false negative e errors occur because of theinterference of the complex background the limitation of thenumber of the image dataset and the safety helmets pro-portion in the images In order to improve the performance ofthe model some measures must be taken such as increasingthe number of the image dataset and adding the pre-processing operations of the images Besides the abovemeasures ameliorating the nonmaximum suppression al-gorithm adjusting the parameters and weights and so forthcan also be a great solution to reduce the false positives

In summary there are several detection errors of themodel (1) e hats with the same shapes and colors or thebackground are recognized mistakenly as the safety helmets(2) e safety helmets of incomplete shapes and small sizesare hard to be recognized (3) e two or more helmets thatare very close to each other are often recognized as a safetyhelmet

6 Discussion

61 Effect of the Presented Method e proposed automaticdetection method based on deep learning to detect safetyhelmets worn by workers provides an effective opportunityto improve safety management on construction sites Pre-vious studies have demonstrated the effectiveness of locatingthe safety helmets and workers and detecting the helmets

However most of the studies have limitations in practicalapplication Sensor-based detection methods have a limitedread range of readers and cannot be able to confirm theposition relationship between the helmets and the workerse machine learning-based detection methods choosefeatures artificially with a strong subjectivity a complexdesign process and poor generalization ability ereforethe study proposed a method based on deep learning todetect safety helmets automatically using convolutionalneural networks e experimental results have suggestedthe effectiveness of the proposed method

In the paper the SSD-MobileNet algorithm is used tobuild the model A dataset of 3261 images containing varioushelmets is trained and tested on the modele experimentalresults demonstrate the feasibility of the model And themodel does not require the selection of handcraft featuresand has a good capacity of extracting features in the imagese high precision and recall show the great performance ofthe model e proposed model provides an opportunity todetect the helmets and improve construction safety man-agement on-site

62 Limitations However the detection model has a poorperformance when the images are not very clear the safetyhelmets are too small and obscure and the background is toocomplex as shown in Figure 10 Moreover the presentedmodel is limited by the problems that some images of thedataset are less in quantity the preprocessing operations ofthe images are confined to rotation cutting and zoomingthe manual labeling is not comprehensive and may misssome objects In some extreme cases for example only partof the head is visible and the safety helmet is obstructed themodel cannot detect the helmets accurately is is thecommon limitation of the-state-of-art algorithms Due tothe above reasons the detection performance is not goodenough and there are some detection errors

Figure 9 Validation results


e algorithm we use emphasizes the real-time detectionand fast speed However the accuracy of the detection is alsoquite important and the performance needs to be improvedHence in the ongoing studies we are working at the ex-pansion and improvement of the dataset in order to solve theproblems of inadequate data with poor quality Morecomprehensive preprocessing operations should be done toimprove the performance of the model

7 Conclusions

e paper proposed a method for detecting the wearing ofsafety helmets by the workers based on convolutional neuralnetworks e model uses the SSD-MobileNet algorithm todetect safety helmets en a dataset of 3261 images con-taining various helmets is built and divided into three partsto train and test the model e TensorFlow framework ischosen to train the model After the training and testingprocess the mean average precision (mAP) of the detectionmodel is stable and the helmet detection model is built eexperiment results demonstrate that the method can be usedto detect the safety helmets worn by the constructionworkers at the construction site e presented methodoffers an alternative solution to detect the safety helmets andimprove the safety management of the construction workersat the construction site

Data Availability

e database used to train the CNN of this study is availablefrom the corresponding author upon request

Conflicts of Interest

e authors declare that they have no conflicts of interestregarding the publication of this paper

Authorsrsquo Contributions

Professor YG Li collected the dataset and wrote themanuscript Professor Z Han designed the study H Weitrained and optimized the model Professor JL Huang andProfessor WD Wang participated in the analysis of theresults All the authors discussed the results and commentedon the manuscript

Acknowledgments

is study was financially supported by the National KeyRampD Program of China (Grant no 2018YFC1505401) theNational Natural Science Foundation of China (Grant no52078493) the Natural Science Foundation of Hunan(Grant no 2018JJ3638) and the Innovation Driven Programof Central South University (Grant no 2019CX011) esefinancial supports are gratefully acknowledged

References

[1] X Chang and X M Liu ldquoFault tree analysis of unreasonablywearing helmets for buildersrdquo Journal of Jilin Jianzhu Uni-versity vol 35 no 6 pp 67ndash71 2018

[2] Z YWangDesign and Implementation of Detection System ofWarning Helmets Based on Intelligent Video SurveillanceBeijing University of Posts and Telecommunications BeijingChina 2018

[3] H Zeng Research on Intelligent Helmets System for Engi-neering Construction Harbin Institute of Technology HarbinChina 2017

[4] A Kelm L Lauszligat A Meins-Becker et al ldquoMobile passiveradio frequency identification (RFID) portal for automatedand rapid control of personal protective equipment (PPE) onconstruction sitesrdquo Automation in Construction vol 36pp 38ndash52 2013

[5] S Barro-Torres T M Fernandez-Carames H J Perez-Iglesias and C J Escudero ldquoReal-time personal protectiveequipment monitoring systemrdquo Computer Communicationsvol 36 no 1 pp 42ndash50 2012

[6] A H M Rubaiyat T T Toma M Kalantari-Khandani et alldquoAutomatic detection of helmet uses for construction safetyrdquoin Proceedings of the 2016 IEEE ACM International Conferenceon Web Intelligence Workshops (WIW) ACM Omaha NEUSA October 2016

[7] K Shrestha P P Shrestha D Bajracharya and E A YfantisldquoHard-hat detection for construction safety visualizationrdquoJournal of Construction Engineering vol 2015 Article ID721380 8 pages 2015

[8] R Waranusast N Bundon V Timtong C Tangnoi andP Pattanathaburt ldquoMachine vision techniques for motorcyclesafety helmet detectionrdquo in Proceedings of the Image amp VisionComputing New Zealand IEEE Wellington New ZealandNovember 2013

[9] P Doungmala and K Klubsuwan ldquoHelmet wearing detectioninailand using haar like feature and circle hough transform

(a) (b) (c)

Figure 10 Detection errors


on image processingrdquo in Proceedings of the IEEE InternationalConference on Computer amp Information Technology IEEENadi Fiji December 2016

[10] J S Jia Q J Bao and H M Tang ldquoMethod for detectingsafety helmet based on deformable part modelrdquo ApplicationResearch of Computers vol 33 no 3 pp 953ndash956 2016

[11] K Li X G Zhao J Bian and M Tan ldquoAutomatic safetyhelmet wearing detectionrdquo 2018 httpsarxivorgabs180200264

[12] H Wu and J Zhao ldquoAutomated visual helmet identificationbased on deep convolutional neural networksrdquo in Proceedingsof the 13th International Symposium on Process Systems En-gineering (PSE 2018) vol 44 pp 2299ndash2304 San Diego CAUSA July 2018

[13] S Xu Y Wang Y Gu N Li L Zhuang and L Shi ldquoSafetyhelmet wearing detection study based on improved fasterRCNNrdquo Application Research of Computers vol 37 no 3pp 901ndash905 2019

[14] L Ding W Fang H Luo P E D Love B Zhong andX Ouyang ldquoA deep hybrid learning model to detect unsafebehavior integrating convolution neural networks and longshort-term memoryrdquo Automation in Construction vol 86pp 118ndash124 2018

[15] Q Fang H Li X Luo et al ldquoA deep learning-based methodfor detecting non-certified work on construction sitesrdquo Ad-vanced Engineering Informatics vol 35 pp 56ndash68 2018

[16] W Fang L Ding H Luo and P E D Love ldquoFalls fromheights a computer vision-based approach for safety harnessdetectionrdquo Automation in Construction vol 91 pp 53ndash612018

[17] W Fang B Zhong N Zhao et al ldquoA deep learning basedapproach for mitigating falls from height with computervision convolutional neural networkrdquo Advanced EngineeringInformatics vol 39 pp 179ndash177 2019

[18] H Luo C Xiong W Fang P E D Love B Zhang andX Ouyang ldquoConvolutional neural networks computer vi-sion-based workforce activity assessment in constructionrdquoAutomation in Construction vol 94 pp 282ndash289 2018

[19] G E Hinton and R R Salakhutdinov ldquoReducing the di-mensionality of data with neural networksrdquo Science vol 313no 5786 pp 504ndash507 2006

[20] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic seg-mentationrdquo in Proceedings of the IEEE Conference on Com-puter Vision and Pattern Recognition pp 580ndash587 ColumbusOH USA June 2014

[21] K He X Zhang S Ren and J Sun ldquoSpatial pyramid poolingin deep convolutional networks for visual recognitionrdquo IEEETransactions on Pattern Analysis amp Machine Intelligencevol 37 no 9 pp 1904ndash1916 2014

[22] R Girshick ldquoFast R-CNN computer sciencerdquo in Proceedingsof the 2015 IEEE International Conference on Computer Vision(ICCV) pp 1440ndash1448 Santiago Chile December 2015

[23] S Ren K He R Girshick and J Sun ldquoFaster R-CNN towardsreal-time object detection with region proposal networkrdquoIEEE Transactions on Pattern Analysis ampMachine Intelligencevol 39 no 6 p 1137 2016

[24] J Redmon S Divvala R Girshick and A Farhadi ldquoYou onlylook once unified real-time object detectionrdquo in Proceedingsof the 2016 IEEE Conference on Computer Vision and PatternRecognition (CVPR) Las Vegas NV USA June 2016

[25] W Liu D Anguelov D Erhan et al ldquoSingle shot multiboxdetectorrdquo in Proceedings of the ECCV 2016 Computer Vision-

ECCV 2016 vol 9905 pp 21ndash37 Springer Amsterdam eNetherlands October 2016

[26] J Huang V Rathod C Sun et al ldquoSpeedaccuracy trade-offsfor modern convolutional object detectorsrdquo in Proceedings ofthe 2017 IEEE Conference on Computer Vision and PatternRecognition (CVPR) pp 3296-3297 Honolulu HI USA July2017

[27] A G Howard M Zhu B Chen et al ldquoMobileNets efficientconvolutional neural networks for mobile vision applica-tionsrdquo 2017 httpsarxivorgabs170404861

[28] J Grum ldquoBook review pattern recognition and neural net-works by BD Ripleyrdquo International Journal of Microstructureand Materials Properties vol 4 no 1 p 146 2009

[29] M Everingham L V Gool and J Winn ldquoe pascal visualobject classes (VOC) challengerdquo International Journal ofComputer Vision vol 88 no 2 pp 303ndash338 2010


of construction safety management the automatic super-vision method can be beneficial to real-time on-sitemonitoring

In this paper based on the previous studies on computervision-based object detection we develop a deep learning-based method for the real-time detection of safety helmet atthe construction siteemajor contributions are as follows(1) a dataset containing 3261 images of safety helmets col-lected from two sources ie manual capture from the videomonitoring system at the workplace and open images ob-tained using web crawler technology is established andreleased to the public (2)e SSD-MobileNet algorithm thatis based on convolutional neural networks is used to trainthe model which is verified in our study as an alternativesolution to detect the unsafe operation of failure of wearing ahelmet at the construction site e article is organized asfollows Section 2 gives a brief description of the relatedwork Section 3 describes the methodology of the researchSection 4 introduces the construction of the databaseSection 5 reports the experiment results of the studySections 6 and 7 discuss the pros and cons of the study andconclude the paper

2 Literature Review

21 Related Research into the Safety Helmets Detection Atpresent previous studies of safety helmets detection can bedivided into three parts sensor-based detection machinelearning-based detection and deep learning-based detec-tion Sensor-based detection usually locates the safety hel-mets and workers (Kelm et al [4] Torres et al [5]) emethods usually use the RFID tags and readers to locate thehelmets and workers and monitor how personal protectiveequipment is worn by workers in real time Kelm et al [4]designed a mobile Radio Frequency Identification (RFID)portal for checking personal protective equipment (PPE)compliance of personnel However the working range of theRFID readers is limited and the RFID readers can onlysuggest that the safety helmets are close to the workers butunable to confirm that the safety helmets are being properlyworn

Up to date machine learning-based object detectiontechnologies are widely used in many domains for itspowerful object detection and classification capacity (egRubaiyat et al [6] Shrestha et al [7] Waranusast et al [8]Doungmala et al [9] Jia et al [10] and Li et al [11])Remarkable studies are made by Rubaiyat et al [6] whoproposed an automatic detection method to obtain thefeatures of construction workers and safety helmets anddetect safety helmets e method combines the frequencydomain information of the image with the histogram oforiented gradient (HOG) and the circle Hough transform(CHT) extractive technique to detect the workers and thehelmets in two steps e detection methods based onmachine learning can detect safety helmets accurately andprecisely under various scenarios but also have somedrawbacks Sometimes the method can only detect safetyhelmets with a specific color and it is difficult to distinguishthe hats with similar color and shape to the safety helmets

Moreover the method cannot detect faces and safety hel-mets thoroughly under some circumstances for examplesome workers do not turn their faces towards the camera atthe construction site

22 Deep Learning-Based Object Detection e above-mentioned methods are commonly based on traditionalmachine learning to detect and classify the helmets andchoose features artificially with a strong subjectivity acomplex design process and poor generalization ability Inrecent years with the rapid development of deep learningtechnology the object detection algorithm turns to the onebased on convolutional neural networks with a great pro-motion of speed and accuracy (eg Wu et al [12])

e methods construct convolutional neural networkswith different depths to detect safety helmets Some otherstrategies such as multiscale training increasing the numberof anchors and introducing the online hard example miningare added to increase the detection accuracy (eg Xu et al[13]) However these methods have some limitations in thepreprocessing aspects of image sharpness object proportionand the color difference between background andforeground

Deep learning-based methods are very potential for thepurpose of peoplersquos unsafe behavior identification Manyprevious studies have presented a solution to this topicRemarkable studies include the following Ding et al [14]developed a hybrid deep learning model that integrates aconvolution neural network (CNN) and long short-termmemory (LSTM) that automatically recognizes workersrsquounsafe actions e results demonstrated that the model canprecisely detect safe and unsafe actions conducted byworkers on-site However some behaviors cannot be rec-ognized owing to the lack of data the small sample size usedfor training and the limited number of unsafe actions thatwere considered Fang et al [15] proposed a novel deeplearning-based framework to check whether a site worker isworking within the constraints of their certification eframework includes key video clips extraction trade rec-ognition and worker competency evaluation Resultsdemonstrate that the proposed framework offers an effectiveand feasible solution to detect noncertified work Howeversome workers cannot be detected when the workersrsquo faceshardly appear or are obstructed by the safety helmets orother equipment Also the worker close to the camera failedto be recognized Fang et al [16] integrated a Faster R-CNNand a deep CNN to detect the presence of a worker fromimages and the harness respectively which can identifywhether workers wear safety harness while working atheights or not e research is limited by the restrictedactivities working at heights and the dataset size Fang et al[17] developed a computer vision-based approach whichuses a Mask R-CNN to detect people and recognize therelationship between people and concrete supports toidentify unsafe behaviors e study has some restrictions itfocuses on a limited number of activities related to theconstruction of deep foundation-pits Luo et al [18] pro-posed an increased CNN that integrates Red-Green-Blue





3 Methodology















wak Sk

ar

radic (2)


hak

Skar

radic (3)


SkSk+1




J(A B) |AcapB|

|AcupB|

|AcapB|

|A| +|B| +|AcapB| (4)



L(x c l g) 1N






4 Database


(a) (b)


(c)







5 Results



sum + b

sum + b

(a)


Group 1

Group 2

(b)


(c)


A

A cap B

A cup B

IoU =

B








(a)


Filename315315315315315315

Width207207207207207207


Class100162206238

146

Xmin6

3553453045

Ymin1522002292583383

Xmax516169606874

Ymax

(b)







00

40

80

120

Tota

l los

s


40k 50k


Classification loss

00

40

80

120

Loss

val

ue

0 10k 20k 30kSteps

40k 50k

(a)

Localization loss

00

10

20

30

Loss

val

ue

0 10k 20k 30kSteps

40k 50k

(b)


040

044

048

Loss

val

ue

0 10k 20k 30kSteps

40k 50k

(c)


040

030

020

010

00 10k 20k 30k

Training steps

Mea

n av

erag

e pre

cisio

n

40k 50k






6 Discussion








7 Conclusions


Data Availability






Acknowledgments


References










(a) (b) (c)





























3 Methodology















wak Sk

ar

radic (2)


hak

Skar

radic (3)


SkSk+1




J(A B) |AcapB|

|AcupB|

|AcapB|

|A| +|B| +|AcapB| (4)



L(x c l g) 1N






4 Database


(a) (b)


(c)







5 Results



sum + b

sum + b

(a)


Group 1

Group 2

(b)


(c)


A

A cap B

A cup B

IoU =

B








(a)


Filename315315315315315315

Width207207207207207207


Class100162206238

146

Xmin6

3553453045

Ymin1522002292583383

Xmax516169606874

Ymax

(b)







00

40

80

120

Tota

l los

s


40k 50k


Classification loss

00

40

80

120

Loss

val

ue

0 10k 20k 30kSteps

40k 50k

(a)

Localization loss

00

10

20

30

Loss

val

ue

0 10k 20k 30kSteps

40k 50k

(b)


040

044

048

Loss

val

ue

0 10k 20k 30kSteps

40k 50k

(c)


040

030

020

010

00 10k 20k 30k

Training steps

Mea

n av

erag

e pre

cisio

n

40k 50k






6 Discussion








7 Conclusions


Data Availability






Acknowledgments


References










(a) (b) (c)



























wak Sk

ar

radic (2)


hak

Skar

radic (3)


SkSk+1




J(A B) |AcapB|

|AcupB|

|AcapB|

|A| +|B| +|AcapB| (4)



L(x c l g) 1N






4 Database


(a) (b)


(c)







5 Results



sum + b

sum + b

(a)


Group 1

Group 2

(b)


(c)


A

A cap B

A cup B

IoU =

B








(a)


Filename315315315315315315

Width207207207207207207


Class100162206238

146

Xmin6

3553453045

Ymin1522002292583383

Xmax516169606874

Ymax

(b)







00

40

80

120

Tota

l los

s


40k 50k


Classification loss

00

40

80

120

Loss

val

ue

0 10k 20k 30kSteps

40k 50k

(a)

Localization loss

00

10

20

30

Loss

val

ue

0 10k 20k 30kSteps

40k 50k

(b)


040

044

048

Loss

val

ue

0 10k 20k 30kSteps

40k 50k

(c)


040

030

020

010

00 10k 20k 30k

Training steps

Mea

n av

erag

e pre

cisio

n

40k 50k






6 Discussion








7 Conclusions


Data Availability






Acknowledgments


References










(a) (b) (c)






























5 Results



sum + b

sum + b

(a)


Group 1

Group 2

(b)


(c)


A

A cap B

A cup B

IoU =

B








(a)


Filename315315315315315315

Width207207207207207207


Class100162206238

146

Xmin6

3553453045

Ymin1522002292583383

Xmax516169606874

Ymax

(b)







00

40

80

120

Tota

l los

s


40k 50k


Classification loss

00

40

80

120

Loss

val

ue

0 10k 20k 30kSteps

40k 50k

(a)

Localization loss

00

10

20

30

Loss

val

ue

0 10k 20k 30kSteps

40k 50k

(b)


040

044

048

Loss

val

ue

0 10k 20k 30kSteps

40k 50k

(c)


040

030

020

010

00 10k 20k 30k

Training steps

Mea

n av

erag

e pre

cisio

n

40k 50k






6 Discussion








7 Conclusions


Data Availability






Acknowledgments


References










(a) (b) (c)































(a)


Filename315315315315315315

Width207207207207207207


Class100162206238

146

Xmin6

3553453045

Ymin1522002292583383

Xmax516169606874

Ymax

(b)







00

40

80

120

Tota

l los

s


40k 50k


Classification loss

00

40

80

120

Loss

val

ue

0 10k 20k 30kSteps

40k 50k

(a)

Localization loss

00

10

20

30

Loss

val

ue

0 10k 20k 30kSteps

40k 50k

(b)


040

044

048

Loss

val

ue

0 10k 20k 30kSteps

40k 50k

(c)


040

030

020

010

00 10k 20k 30k

Training steps

Mea

n av

erag

e pre

cisio

n

40k 50k






6 Discussion








7 Conclusions


Data Availability






Acknowledgments


References










(a) (b) (c)






























00

40

80

120

Tota

l los

s


40k 50k


Classification loss

00

40

80

120

Loss

val

ue

0 10k 20k 30kSteps

40k 50k

(a)

Localization loss

00

10

20

30

Loss

val

ue

0 10k 20k 30kSteps

40k 50k

(b)


040

044

048

Loss

val

ue

0 10k 20k 30kSteps

40k 50k

(c)


040

030

020

010

00 10k 20k 30k

Training steps

Mea

n av

erag

e pre

cisio

n

40k 50k






6 Discussion








7 Conclusions


Data Availability






Acknowledgments


References










(a) (b) (c)





























6 Discussion








7 Conclusions


Data Availability






Acknowledgments


References










(a) (b) (c)



























7 Conclusions


Data Availability






Acknowledgments


References










(a) (b) (c)

















































Documents

Deep Learning-Based Safety Helmet Detection in Engineering ...downloads.hindawi.com/journals/ace/2020/9703560.pdf · Deep Learning-Based Safety Helmet Detection in Engineering Management