A hybrid structured deep neural network with Word2Vec for …du.diva-portal.org/smash/get/diva2:1375948/FULLTEXT01.pdf · 2019-12-06 · A hybrid structured deep neural network with

Full Terms & Conditions of access and use can be found athttps://www.tandfonline.com/action/journalInformation?journalCode=tjcm20

International Journal of Construction Management

ISSN: 1562-3599 (Print) 2331-2327 (Online) Journal homepage: https://www.tandfonline.com/loi/tjcm20

A hybrid structured deep neural network withWord2Vec for construction accident causesclassification

Fan Zhang

To cite this article: Fan Zhang (2019): A hybrid structured deep neural network with Word2Vecfor construction accident causes classification, International Journal of Construction Management,DOI: 10.1080/15623599.2019.1683692

To link to this article: https://doi.org/10.1080/15623599.2019.1683692

© 2019 The Author(s). Published by InformaUK Limited, trading as Taylor & FrancisGroup

Published online: 08 Nov 2019.

Submit your article to this journal

Article views: 85

View related articles

View Crossmark data

https://www.tandfonline.com/action/journalInformation?journalCode=tjcm20

https://www.tandfonline.com/loi/tjcm20

https://www.tandfonline.com/action/showCitFormats?doi=10.1080/15623599.2019.1683692

https://doi.org/10.1080/15623599.2019.1683692

https://www.tandfonline.com/action/authorSubmission?journalCode=tjcm20&show=instructions

https://www.tandfonline.com/action/authorSubmission?journalCode=tjcm20&show=instructions

https://www.tandfonline.com/doi/mlt/10.1080/15623599.2019.1683692

https://www.tandfonline.com/doi/mlt/10.1080/15623599.2019.1683692

http://crossmark.crossref.org/dialog/?doi=10.1080/15623599.2019.1683692&domain=pdf&date_stamp=2019-11-08


A hybrid structured deep neural network with Word2Vec for constructionaccident causes classification

Fan Zhang

Departments of Microdata Analysis and Energy Technology, Dalarna University, Falun, Sweden

ABSTRACTAccording to the latest fatal work injury rates reported by the Bureau of Labors Statistics,construction sites remain the most hazardous workplaces. In the construction sector, fatalityinvestigation summary reports are available for past accidents and by investigating suchreports, valuable insights can be gained. In this study, text mining algorithms are exploredfor automatic construction accident causes classification. To be more specific, Word2Vec skip-gram model is utilized to learn word embedding from a domain-specific corpus and a hybridstructured deep neural network is proposed by incorporating the learned word embeddingfor accident reports classification. Dataset from Occupational Safety and HealthAdministration (OSHA) is employed in the experiment to evaluate the performance of theproposed approach. Besides, five baseline models: support vector machine (SVM), linearregression (LR), K-nearest neighbor (KNN), decision tree (DT), Naive Bayes (NB) are employedto compare with the proposed approach. Experiment results show that the proposed modelachieves the highest average weighted F1 score among all models considered in this study.The result also proves the effectiveness of applying Word2Vec skip-gram algorithm forsemantic information augmentation. As a result, robustness of the model is improved whenclassifying cases of low support values.

KEYWORDSConstruction safety;construction automation;machine learning; naturallanguage processing; deepneural networks; Word2Vec

Introduction

According to the latest fatal work injury ratesreported by the Bureau of Labors Statistics, construc-tion industry contributed to the highest rate of fatalwork injuries, 971 deaths out of 4685 in 2017.Construction accidents lead to fatalities, immensefinancial loss and harms the reputation of the firmsinvolved. To enhance the construction site safety inthe long run, investigating what went wrong in thepast is essential. In the construction industry, a catas-trophe investigation report is filed which describes ofthe accident in details, generally including eventsleading to the accident and causal factors as part ofthe enforcement inspection after each catastrophicconstruction accident. By analyzing such reports,construction engineers, project managers and regula-tory bodies can identify issues in construction design,project management as well as management of fieldengineering changes. As a result, prevention

strategies can be made accordingly to mitigate poten-tial risks in the future.

Recently, deep neural networks have become popu-lar among researchers due to the capabilities of solv-ing complex problems such as time series forecasting,image analysis, etc. On the other hand, Word2Vec(Mikolov et al. 2013a, 2013b, 2013c) techniques areproved to be more efficient in capturing semanticsimilarity of words and presenting them in a morescientific way than conventional words representationapproaches. More importantly, labelling datasetrequires huge manual effort while unlabelled data iseasy to obtain. By utilizing Word2Vec algorithms,semantic information can be learned from massiveunlabelled data which results in the augmented know-ledge with minimal human effort.

However, studies of either deep neural networks orWord2Vec with respect to construction accidentscauses classification are barely found in existing

CONTACT Fan Zhang [email protected]� 2019 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed,or built upon in any way.

INTERNATIONAL JOURNAL OF CONSTRUCTION MANAGEMENThttps://doi.org/10.1080/15623599.2019.1683692


http://creativecommons.org/licenses/by/4.0/

http://creativecommons.org/licenses/by/4.0/

https://doi.org/10.1080/15623599.2019.1683692

http://www.tandfonline.com

literature. The motivation of this paper is to fill thisresearch gap and explore the state of art text miningtechniques for automatic construction accidentreports classification, which enables large text data-bases to be analyzed with minimized human laboureffort. To be more specific, Word2Vec skip-gram algo-rithm is utilized to learn word embedding using thedomain-specific dataset from the Occupational Safetyand Health Administration (OSHA). Besides, a hybridstructured deep neural network which utilizes thelearned word embedding as one of its main buildingblocks is proposed for construction accident reportsclassification. Neither Word2Vec skip-gram algorithmnor the hybrid structured deep neural network is foundwith respect to this field in any existing literature.Major contributions of this study are:

� A domain-specific corpus is composed andWord2Vec skip-gram algorithm is utilized to learnword embedding from the built corpus.

� t-distributed Stochastic Neighbor Embedding(t-SNE) with Principal component analysis (PCA)initialization algorithm is utilized for dimensionreduction and word embedding visualization.

� A hybrid structured deep neural network is pro-posed and with following three main buildingblocks, the learned word embedding, conventionallayer and Bidirectional Long short-term memory(BDLSTM) layer.

� The numeric experiment is designed using OSHAdataset and the effectiveness of the proposedapproaches is verified by the experiment results.

Rest of the paper is organized as following: relevantpast studies are reviewed in section ‘Literature review’,followed by the overview of methodologies involvedin section ‘Methodology’. In section ‘Experiment andthe proposed method’, the details of the proposedapproach are discussed. Besides, development toolsand the dataset employed in the experiment aredescribed. Experiment results are presented and dis-cussed in section ‘Experiment and the proposedmethod’ as well. The final section discusses possiblesolutions to the identified potential improving areasand summaries this study.

Literature review

In terms of the applications of text mining and NLPtechniques for construction-related accidents analysis,Random Forest (RF) and Stochastic Gradient TreeBoosting (SGTB) algorithms were utilized by Tixier

et al. (2016) to predict the injury type, body partaffected and injury severity using construction injuryreports. Tixier et al. (2016) proposed an NLP approachwhich is based on hand-crafted rules and keywordsdictionary to extract outcomes and precursors fromunstructured injury reports. Precision of the proposedmethod is 0.95. Goh and Ubeynarayana (2017) appliedsupport vector machine (SVM), linear regression (LR),random forest (RF), K-nearest neighbor (KNN), deci-sion tree (DT) and Naive Bayes (NB) algorithms forconstruction accident narrative classification. Amongwhich, SVM achieved an F1 score ranged from 0.45 to0.92 and outperformed the rest models. Chokor et al.(2016) applied a K-means based approach for injuryreports classification. Four clusters with each represent-ing a type of accident were identified. The identifiedaccident types were ‘falls’, ‘struck by objects’,‘electrocutions’ and ‘trenches collapse’. Text miningapproach was compared with case-based reasoningapproach for accidents documents retrieval by Fan andLi (2013). It was reported that the text miningapproach is superior in terms of recall and precision.Zou et al. (2017) proposed an NLP approach based onsemantic query expansion and Vector Space Model(VSM) techniques to retrieve similar accident cases.Recall of the proposed method ranged from 0.5 to 1.

Methodology

Overview of Word2Vec

Word2vec models proposed by Mikolov et al. are gen-erally used for learning word embedding. While theconventional bag of word model represents each wordin the corpus by large sparse vectors, word embed-ding approach adopts real-valued dense vectors repre-sentation, where each vector represents the projectionof the word into a continuous vector space. The pos-ition of each word in the learned vector space isregarded as its corresponding embedding. The majoradvantage of Word2Vec is that contextual similarityand semantic relationship between words can beinferred from the learned vectors (Khatua et al. 2019).There are two commonly adopted structures ofWord2Vec. Namely, continuous bag-of-words(CBOW) and skip-gram. While CBOW predicts thetarget word based on its surrounding words, theobjective of skip-gram is to predict the contexts of agiven word. Applications of Word2Vec techniquescan be found in (Zhang et al. 2015; Alshari etal. 2017).

2 F. ZHANG

Conventional neural network (CNN) overview

CNN is a type of deep neural network which hasbeen successfully applied for image classification(Pinto et al. 2017), speech recognition, feature extrac-tion Scarpa et al. (2018), Kuo and Huang (2018) andso forth. Similarly, a CNN layer is utilized in the pro-posed method for extracting features and modellingthe spatial relationships of the text data. Details of theproposed method is discussed in section ‘Overview ofthe proposed method’.

A standard CNN architecture is presented inFigure 1.

Assume the input from l � 1th layer connects tothe lth convolution layer. Size of the input volume isW � W � K and there M spatial filters of size H �H � K, output uijm is calculated by Equation (1).

uijm ¼XK�1

k¼0

XH�1

p¼0

XH�1

q¼0

Zl�1iþp, jþq, khpqkm þ bijm (1)

Where

bijm

denotes the bias.Output of the convolution layer ZðlÞ

ijm is calculatedby Equation (2).

ZðlÞijm ¼ f ðuijmÞ (2)

In terms of the pooling layer, the most commonlyused pooling approach is max pooling. Assume out-put of convolution layer consists of N � 1 patches,output of max pooling is calculated by Equation (3).

aj ¼ maxN�1ðan�1i uðn, 1ÞÞN�1

(3)

Where uðn, 1Þ is a window function to the patch ofconvolution layer output. aj is the maximum value inthe neighbourhood.

Overview of long short-term memory (LSTM) andbidirectional long short-term memory (BDLSTM)

Overview of LSTM

LSTM is a type of the recurrent neural network (RNN)(Sulehria and Zhang 2007) proposed by Hochreiterand Schmidhuber (1997). A standard topology ofLSTM is shown in Figure 2. At each iteration t, theinput of LSTM cell is xt and ht denotes its output, thecurrent cell input, output state �Ct and Ct , cell outputstate of previous time step denoted by Ct�1 are incor-porated by LSTM cell to update network parametersduring the training process. Gates mechanisms areintroduced to control cell states of LSTM by allowinginformation to pass through optionally. The input gate,forget gate and output gate are denoted by it , ft, otrespectively. Values of the cell input state and gates arecalculated by Equations (4–7).

it ¼ rgðWixt þ Uiht�1 þ biÞ (4)

ft ¼ rgðWf xt þ Uf ht�1 þ bf Þ (5)

ot ¼ rgðWoxt þ Uoht�1 þ boÞ (6)

C�t ¼ tanhðWcxt þ Uoht�1 þ bcÞ (7)

where Wi, Wf , Wo, Wc denote the weight matri-ces between the input of hidden layer, input gate, for-get gate, output gate and input cell state. Ui, Uf , Uo,Uc denote the weight matrices between previous celloutput state, input gate, forget gate, output gate andinput cell state. bi, bf , bo, bc denote the correspond-ing bias vectors.

Overview of BDLSTM

Bidirectional LSTM is a type of LSTM variation(Graves 2005). In bidirectional LSTM, sequence datais processed in both directions with forward LSTMand backward LSTM layer and these two hiddenlayers are connected to the same output layer. Astandard topology (Yildirim 2018) of bidirectionalLSTM is shown in Figure 3.

Figure 1. CNN topology.

INTERNATIONAL JOURNAL OF CONSTRUCTION MANAGEMENT 3

According to Equations (4–7), at each iteration t,cell output state Ct and LSTM layer output ht are cal-culated by Equations (8 and 9).

Ct ¼ ft�Ct�1 þ C�t�it (8)

ht ¼ ot�tanhðCtÞ (9)

Bidirectional LSTMs have been successfully appliedin the field of trajectory prediction (Xue et al. 2017;Zhao et al. 2018), speech recognition (Zheng et al.2016; Zeyer et al. 2017), biomedical event analysis(Wang et al. 2017), natural language processing (Xuet al. 2018), etc. It is reported in the literature thatbidirectional LSTM outperforms conventional LSTMin some areas such as frame wise phoneme classifica-tion, automatic speech recognition and understanding(Graves et al. 2013).

Experiment and the proposed method

Data description

Two experiments are designed in this study. In the firstexperiment, Word2Vec skip-gram model is trained to

learn word embedding using a domain-specific corpuswhile the second experiment is designed to classifycauses of accidents by a hybrid structured deep neuralnetwork utilizing the learned word embedding in thefirst experiment. The original dataset adopted in thisstudy can be downloaded from the Occupational Safetyand Health Administration (OSHA) website for free.There are 16,323 archived text reports of constructionsite accidents happened between 1983 and 2014, causesof accidents are not annotated in the original dataset.Data pre-processing process for the aforementionedtwo experiments are different. To be more specific,accidents causes classification task requires labelleddata, while to learn the word embedding, the dataset isnot necessarily to be annotated. Details of the datapre-processing steps are discussed in the correspondingexperiment sections separately.

Overview of the proposed method

In this study, a hybrid structured deep neural networkwith Word2Vec approach is proposed. There are

Figure 2. LSTM topology.

Figure 3. BDLSTM topology.

4 F. ZHANG

existing pre-trained word embeddings such asWord2Vec (contrived from Google News) or GloVe(of Stanford NLP group). However, the corpus usedfor the aforementioned pre-trained word embeddingis generic while a relatively small amount of domain-specific corpus is utilized in this study. Due to thesize of data, a skip-gram model of Word2Vec isemployed. The overall process of proposed approachconsists of two phases: training a skip-gram modelusing the domain-specific corpus, then build a hybrid

structured neural network utilizing the learned wordembedding in the previous step. To be more specific,in phase one, the first step is pre-processing the entire16,323 text summaries of the dataset. Punkt tokenizerof NLTK is utilized to detect the end of a sentenceand split each paragraph into a list of sentences. Afterthat, each sentence in the list is split into a sequenceof words, each word is normalized into lower caseand non-English letters are removed. Different to theconventional text mining data pre-processing step,

Figure 4. Overall workflow of building a Word2Vec skip-gram model to learn word embedding.


stopwords are not removed for Word2Vec training, asWord2Vec algorithms depend on the broader contextof sentences to learn word vectors better.

Then, the pre-processed data is fed to Word2Vecskip-gram to learn word embedding. There are severalhyperparameters of Word2Vec which influence thetraining time as well as the quality of the trainedmodel. For example, word vector dimensionality par-ameter specifies the number of features, more featuresresults in a better model in theory. However, thedownside of including more features is a longer train-ing time. The context parameter specifies the numberof context words to be taken into account by the

training algorithm. Besides, the minimum word countparameter is used for reducing the size of vocabularyto words which occur a certain number of timesacross all documents. More details of the hyperpara-meters of Word2Vec can be found in the Googledocument. The overall workflow of phase one isshown in Figure 4.

In phase two, a hybrid structured deep neural net-work is built for classification. Due to the fact thatmanual labelling the whole OSHA dataset is a labourintensive task, another dataset published by Goh andUbeynarayana (2017) is utilized, which consists of1000 labelled sample reports. A sample report is pre-sented in Table 1. Accidents are labelled according tothe standard of Workplace Safety and Health Institute(2016). Besides, the label is assigned according to thefirst incident if multiple incidents are involved in oneaccident. Therefore, scenarios of one case with mul-tiple labels are avoided. For instance, according to thecase summary in Table 1, the incident ‘second storycollapsed’ comes first, followed by ‘Bricks struckEmployee #10s head and neck’. In such case, the causeis labelled as ‘Collapse of object’.

As a result, the dataset is labelled with eleven dif-ferent causes. Distribution of the causes is shown inFigure 5.

Table 1. Sample labelled case.Title Firefighter dies after being struck by collapsing wall

Summary On June 17 2011 Employee #1 a volunteerfirefighter with the Du Quoin Fire Departmentwas fighting a fire in a brick structure. He waswearing full turnout gear. While Employee #1and another firefighter were removing a 35 ftextension ladder from the side of the building itssecond story collapsed. Bricks struck Employee #l0s head and neck. He was transported to ahospital where he died later that day. Thesecond firefighter was not injured.

Cause (manuallylabelled)

Collapse of object

Figure 5. Distribution of accidents causes.

6 F. ZHANG

From Figure 5, it is noticed that the dataset is notbalanced, numbers of cases for certain labels such as‘exposure to extreme temperatures’, ‘exposure tochemical substances’, etc. are small. Therefore, moremanually labelled cases are added for such causes. Asa result, a relatively balanced dataset is obtained asshown in Figure 6. There are 280 more cases areadded to the original dataset and results in a datasetconsists of 1280 cases in total.

After labelling and data balancing, tokenization isperformed to break the long sentence into separatedword and each word is normalized to lower case.However, common words across sentences with littlelexical content, i.e. stop words are removed. Then, thetokens are tagged with POS tags. After that, lemma-tization is applied to remove inflectional endings andreturn the base or dictionary form of a word.

After data pre-processing, each word is encodedinto an integer, each document is represented by asequence of integers and each sequence is padded tobe of the same length.

The overall process of the model building and val-idation is: First, the pre-processed dataset is randomlyshuffled and split into training and validation set byfive folder cross-validation approach. The training setserves as an input to the proposed hybrid structured

neural network. Meanwhile, the pre-trained wordembedding from phase one is loaded into memory.After that, the embedding layer of the proposedneural network is seeded with pre-learned wordembedding weights. After the embedding layer, a con-volution layer with rectified linear unit (ReLu) activa-tion function is stacked, followed by a max poolinglayer. Purpose of CNN layers is to extract potentialspatial features from its previous layer output. Then,a BDLSTM layer with Tanh activation function isstacked for sequence modelling, followed by a drop-out layer to mitigate the risk of overfitting. After that,a fully connected layer with Softmax activation func-tion is added for classification. Classification result isobtained in the last output layer. The validation set isutilized to evaluate the model performance and sucha procedure is repeated for each cross-validationfolder. The overall workflow of phase two is shown inFigure 7.

Results and discussion

In the first part of the experiment, a skip-gram modelis trained by the aforementioned domain-specific cor-pus and a set of different hyperparameters areexplored for training. To be more specific, word

Figure 6. Distribution of accidents causes after data balancing.


vector dimensionality is set to 35 and 55 for bothunigrams and bigrams respectively. After training theskip-model, semantic description of words in a corpus

are represented as numeric vectors which can be uti-lized to identify words that have similar semantics.Further, mathematic operations can be applied to the

Figure 7. Overall workflow of building a hybrid structured deep neural network.

8 F. ZHANG

numeric vectors to reveal potential complex relation-ships between different words to some extent.

The result of mathematics adding and subtractionoperators are shown in Table 2.

Table 2 shows that most calculation results are rea-sonable. For example, ‘fire’ þ ‘water’ results in‘water_heater’, ‘oil’ – ‘water’ leads to ‘parts’. Althoughvectors calculation results cannot be interpreted pre-cisely due to the fact that inferencing the exact mean-ing of each vector dimension in the learned dimensionspace is not feasible.

Results of identifying most similar words bylearned word vectors are shown in Table 3 for theaforementioned hyperparameter settings. The similar-ity scores are measured by cosine similarity given byEquation (10).

cosh ¼ a!:b!

a!�� b

!��

��¼

Pni¼1aibiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiPn

i¼1a2i

q ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiPni¼1b

2i

q (10)

where a!:b!¼ Pn

i¼1 aibi denotes the dot product oftwo vectors.

The results show that similar words to a givenword or a bigram word phrase along with the corre-sponding similarity scores vary slightly when specify-ing different hyperparameters of skip gram models.Moreover, the results of Tables 3–5 proves thatsemantic relationships of a single word or a wordphrase are well captured by the trained models. To bemore specific, a high similarity score is assigned to‘scaffold’ and ‘ladder’ due to the similar semanticmeaning although they are completely different wordsand not close to each other in text sentences.Similarly, the score for semantical close word phrases‘arc flash’ and ‘electrical fault’ is high.

Since the learned vector space is high dimensional,it is impossible to visualize the similar words of agiven word or word phrase in the learned vectorspace. Therefore, t-distributed Stochastic NeighborEmbedding (t-SNE) algorithm (Van Der Maaten andHinton 2008) is utilized to map the data from highdimensional vector space to a two dimensionalembedded space. To be more specific, t-SNE convertssimilarities between data points to joint probabilitiesand tries to minimize the Kullback-Leibler divergence

Table 2. Mathematics operations of two-word vectors.Word1 Word2 Word1 – Word2 Word3 Word4 Word3þWord4

Oil Water Parts Water Acid Sulfuric_acidGas Fire Air_pressure Driver Car PassengerVapor Fire Drained Vapor Water CondensateCarpenter Wood Technician Fire Water Water_heaterDriver Vehicle Operator Floor Roof Second_floorTechnician Machine Inspector Knife Hospital FingertipElectric Fire Volts Saw Tree BranchesFire Electric Flames Water Electric Sump_pumpFlash Flame Electrical Forklift Collapse OverturnRoof Collapse Fixed_ladder Tire Driver Dump_truck

Table 3. Identification of the most similar words using unig-ram models.

Most similarwords to ‘ladder’ Similarity score

Unigram 135 dimensionality

scaffold 0.774950027stairway 0.727389812stepladder 0.727131605plank 0.72421509platform 0.692705512

Unigram 155 dimensionality

stepladder 0.7566185scaffold 0.744360328plank 0.699460685stairway 0.670082092catwalk 0.663146138

Table 4. Identification of the most similar words usingbigram models.

Most similarwords to ‘ladder’ Similarity score

Bigram 135 dimensionality

stepladder 0.90226227extension_ladder 0.842015862fixed_ladder 0.837491155fire_escape 0.824131191stairway 0.814810216


stepladder 0.870609879extension_ladder 0.820408463scaffold 0.786481142fixed_ladder 0.782231569stairway 0.756931722

Table 5. Identification of the most similar word phrases usingboth unigram and bigram models.

Most similarwords to ‘arc flash’ Similarity score


electrical_fault 0.913412988flash 0.884383321arc_blast 0.874953032electrical_shock 0.820102572electric_arc 0.815538585


electrical_fault 0.892078817arc_blast 0.86829561flash 0.862579107electric_arc 0.806348085electrical_shock 0.805317938


(Kullback 1987) between the joint probabilities of thelow dimensional embedding and the high dimensionaldata. One downside of t-SNE algorithm is that thecorresponding cost function is not convex, namely,results of the algorithm depend on its initialization.To tackle this problem, Principal component analysis(PCA) (Pearson 1901) is utilized to reduce the dimen-sionality of input to two by keeping first two majorprincipal components, which speeds up the computa-tion of pairwise distances between data points as wellas suppresses noise without severely distorting theinterpoint distances. In general, PCA initialization ismore globally stable than random initializationaccording to Van Der Maaten and Hinton (2008).

Visualization result of the words which are seman-tically similar to ‘arc_flash’, ‘ladder’, ‘truck’, ‘oil’ and‘log’ in two dimensional embedded vector space isdisplayed in Figure 8.

There are five word clusters shown in Figure 7,boundaries between the five clusters are clear. Eachcluster contains ten most similar words to the givenword or word phrase which are marked by stars. Theresult also shows that, words with similar semanticalmeanings are located closer to each in the embeddedvector space, such as ‘truck’ and ‘van’, on the con-trary, words with different meanings such as ‘log’ and‘electrical_shock’ deviate far from each other in thevector space, which proves the effectiveness of captur-ing semantical relationships by the trained skip-gram model.

In phase two, the proposed neural network istrained with four different embedding layer

configurations. To be more specific, CNN, BDLSTMand other layers of the hybrid structure neural networkare fixed while the embedding layer is seeded with theleaned word embedding from Word2Vec skip modelsusing the aforementioned settings in phase one: 35word vector dimensionality with unigram, 55 wordvector dimensionality with unigram, 35 word vec-tor Fig.

In the second part of the experiment, the proposedneural network is trained with four different embed-ding layer configurations. To be more specific, CNN,BDLSTM and other layers of the hybrid structureneural network are fixed while the embedding layer isseeded with the leaned word embedding fromWord2Vec skip models using the aforementioned set-tings in phase one: 35 word vector dimensionalitywith unigram, 55 word vector dimensionality withunigram, 35 word vector dimensionality with bigramand 55 word vector dimensionality with bigramrespectively.

To evaluate the model performance, F1 score pro-posed by Buckland and Gey (1994) has been widelyemployed in literature. However, number of trueinstances for each label is not considered by F1 score.Therefore, in this study, the average weighted F1score given by Equation (11) is adopted.

Avg F1weighted ¼XNi¼1

SiT�F1i

� �(11)

where N denotes the total number of labels, Sidenotes the support of label i, T denotes the supportof all labels and F1i denotes the F1 score of label i.

Figure 8. Visualization of word clusters in the mapped vector space.

10 F. ZHANG

Apart from the average weighted F1 score, averageweighted precision and recall are calculated in thesimilar fashion and reported. In addition, averageAUC of models considered in the experiment isreported for each cause, ROC curves with the highestaveraged AUC values in 5 folder cross validationresults are presented in Appendix A foreach classifier.

Experiment results of the proposed models areshown in Tables 6–8. The overall f1 score, precisionand recall of each classifier are highlighted in boldfont in the tables.

Furthermore, five baseline classifiers: K-NearestNeighbor (KNN) (Dasarathy 1991), Naïve Bayesian

(NB) (Russell et al. 2016), Decision Tree (DT)(Breiman et al. 1984), Logistic regression (LR)(Hosmer and Lemeshow 1989) and Support VectorMachine (SVM) (Vapnik 1995) are employed in theexperiment to compare with the proposed model. Thenumber of neighbours of KNN is set to 11, RBF kernelis used for SVM and the Multinomial NB of NaïveBayesian classifiers is employed in the experiment.

Results of the baseline classifiers are shown inTables 9–12. The overall f1 score, precision and recallof each classifier are highlighted in bold font inthe tables.

From the results, the proposed neural networksoutperform each baseline models in terms of weighted

Table 7. Five folder cross-validation results of the proposed models.

Causes

unigram þ 55 dim bigram þ 55 dim

Total supportPrecision Recall f1 Precision Recall f1

Caught in/between objects 0.575 0.531 0.547 0.741 0.563 0.626 68Collapse of object 0.554 0.510 0.526 0.522 0.646 0.574 212Electrocution 0.877 0.897 0.885 0.936 0.926 0.930 108Exposure to chemical substances 0.825 0.820 0.807 0.817 0.836 0.820 73Exposure to extreme temperatures 0.769 0.631 0.680 0.778 0.738 0.756 65Falls 0.738 0.754 0.744 0.750 0.699 0.723 236Fires and explosions 0.783 0.685 0.707 0.805 0.778 0.782 53Struck by moving objects 0.580 0.714 0.632 0.795 0.705 0.743 105Struck by falling objects 0.729 0.708 0.709 0.689 0.700 0.692 120Traffic 0.690 0.644 0.658 0.814 0.707 0.756 177Others 0.710 0.746 0.722 0.663 0.760 0.704 63Weighted average/total 0.697 0.685 0.683 0.738 0.720 0.723 1280

Table 8. Average AUC of the proposed models for each cause.

Causes

Average AUC of 5 folder cross validation

unigram þ 35 dim bigram þ 35 dim unigram þ 55 dim bigram þ 55 dim

Caught in/between objects 0.910 0.944 0.900 0.922Collapse of object 0.830 0.850 0.884 0.866Electrocution 0.982 0.990 0.982 0.990Exposure to chemical substances 0.964 0.960 0.972 0.970Exposure to extreme temperatures 0.950 0.934 0.964 0.952Falls 0.924 0.926 0.932 0.932Fires and explosions 0.982 0.978 0.970 0.986Struck by moving objects 0.900 0.926 0.908 0.926Struck by falling objects 0.900 0.934 0.914 0.920Traffic 0.884 0.932 0.916 0.928Others 0.952 0.966 0.968 0.966

Table 6. Five folder cross-validation results of the proposed models.

Causes

unigram þ 35 dim bigram þ 35 dim




average f1 score. Among all models considered in theexperiment, the highest weighted average f1 score is0.723 achieved by the proposed model of embeddinglayer seeded with pre-trained word embedding usingbigram and 55 word vector dimensionality setting.Moreover, among four proposed models, better per-formance is achieved by utilizing word embeddingtrained with a higher word vector dimensionality. Inaddition, it is noted that using bigram setting whichincludes more context for word embedding trainingleads to a relatively high weighted average f1 scorethan the unigram approach. In terms of the classifier

performance with respect to each class label, the pro-posed classifiers achieve a satisfactory result for classi-fying most accident causes, except for ‘caught in/between objects’ and ‘collapse of object’. It is observedin Table 4 that support of ‘caught in/between objects’is relatively low comparing with other labels, only 68cases are due to ‘caught in/between objects’ and insuch case, the proposed classifiers using bigramembeddings still outperform rest baseline models. Theresult further proves that in the case of lackingenough data, by utilizing word embedding, more

Table 10. Five folder cross-validation results of five baseline models.

Causes

LR NB



Table 11. Five folder cross-validation results of five base-line models.

Causes

SVMTotal

supportPrecision Recall f1

Caught in/between objects 0.533 0.104 0.174 68Collapse of object 0.457 0.642 0.533 212Electrocution 0.973 0.860 0.909 108Exposure to chemical substances 0.870 0.672 0.752 73Exposure to extreme temperatures 0.902 0.769 0.824 65Falls 0.702 0.822 0.757 236Fires and explosions 0.860 0.491 0.617 53Struck by moving objects 0.872 0.667 0.747 105Struck by falling objects 0.848 0.500 0.628 120Traffic 0.504 0.740 0.599 177Others 0.778 0.524 0.610 63Weighted average/total 0.706 0.663 0.657 1280

Table 12. Average AUC of the baseline models foreach cause.

Causes

Average AUC of 5 folder cross validation

DT KNN LR NB SVM

Caught in/between objects 0.784 0.868 0.946 0.922 0.952Collapse of object 0.660 0.830 0.870 0.866 0.876Electrocution 0.894 0.964 1.000 0.992 1.000Exposure to chemical

substances0.716 0.950 0.986 0.982 0.994

Exposure to extremetemperatures

0.876 0.978 0.994 0.990 0.992

Falls 0.778 0.920 0.948 0.940 0.950Fires and explosions 0.776 0.940 0.992 0.990 0.996Struck by moving objects 0.798 0.872 0.930 0.932 0.946Struck by falling objects 0.762 0.920 0.930 0.928 0.934Traffic 0.780 0.872 0.906 0.902 0.920Others 0.756 0.934 0.982 0.984 0.984

Table 9. Five folder cross-validation results of five baseline models.

Causes

DT KNN



12 F. ZHANG

semantic information can be learned than conven-tional approaches, and incorporating the augmentedsemantic information results in a more robust classi-fier. However, it is advised that for those labels withweighted average f1 scores, additional manual checksare necessary. Another common challenge is thatsome particular labels are difficult to be classifiedeven by human. For example, ‘collapse of object’,‘struck by falling object’ and ‘struck by moving object’as reported by Goh and Ubeynarayana (2017). Themajor reason is the imprecise nature of languages,leading to diverse interpretations. One commonlymislabelled case type is that the actual label should be‘struck by moving object’, however it is misclassifiedas ‘collapse of object’. Common objects involved in

the cases are trees, beams and objects that sheared,bended, which are confused with structural collapse.

In terms of the performance of the baselines mod-els, SVM achieves an overall satisfying f1 score forclassifying most of labels, except ‘caught in/betweenobjects’ which has a relatively low support. Though incase the case of low support, recall values of mostbaseline classifiers are low, DT and KNN classifiersresults a more balanced precision and recall compar-ing with other baseline models.

Further evaluation of time cost for baseline mod-els and the proposed models is performed and pre-sented in Table 13. The last four rows are themeasure of the proposed approach. For example,‘HANN_unigram_35dim’ denotes the proposedhybrid structured neural network with embeddinglayer seeded with word embedding trained usingunigram and 35 word vector dimensionality settingand so forth. The experiment platform is a 64 bitWindows 10 Dell laptop with Intel core i5 vPro 8thgeneration processor and 16GB memory.

The result shows that setting a lower word vectordimensionality for Word2Vec training is slightly lesstime consuming while applying bigram setting resultsa higher time cost to learn the word embedding.Moreover, it is obvious that all baseline models aresignificantly more time efficient than the proposedapproach. As time aspect is a crucial factor, especially

Figure 9. Distribution of cases for each accident cause from 1983.

Table 13. Time measures of each baseline model and theproposed approach.

Classifier

5 folder crossvalidationtime (sec)

Word2Vectrainingtime (sec)

DT 1.740 NAKNN 11.726 NALR 1.313 NASVM 19.547 NANB 0.848 NAHANN_unigram_35dim 84280.033 36.311HANN_bigram_35dim 83994.167 237.994HANN_unigram_55dim 80634.557 39.228HANN_bigram_55dim 78658.603 248.504


Figure 10. Occurrences of each accident cause type over twenty years from 1994 to 2003.

Figure 11. Occurrences of each accident cause type over twenty years from 2004 to 2013.

14 F. ZHANG

for real-time systems which requires a promptresponse. In such a case, SVM is regarded as a moreoptimal alternative which requires less computationtime with a compromised weighted average f1 score.

Finally, the proposed classifier is applied to classifyall 16,323 cases since year 1983. The distribution ofcases for each cause is shown in Figure 9. Occurrencesof each accident cause type over twenty years from 1994to 2013 is plotted in Figures 10 and 11.

From Figures 9–11, ‘collapse of object’ was themajor cause of construction accidents in the past,while the number of incidents due to ‘struck by fall-ing object’ and ‘exposure to chemical substances’ wasrelatively small. However, ‘falls’ has become the majorcause of construction accidents in recent years. Inaddition, though the number of accidents slightlydecreased in year 1996, 2003 and 2008, the total num-ber of construction accidents kept an overall risingtrend over the past twenty years. However, the corre-sponding study of similar trend analysis of OSHAdataset is not found in existing literature. Besides,there is a lack of labelled cases for each cause, whichposes the challenge of results validation.

Conclusions

In this study, a two-stage text mining approach isproposed for construction accident causes classifica-tion. Firstly, Word2Vec skip-gram algorithm is uti-lized to learn the word embedding from a smalldomain-specific corpus. Then, the learned wordembedding is utilized by a hybrid structured neuralnetwork. Experiment result of the first stage provesthe effectiveness of modelling semantic meanings andrelationships between different words by the proposedapproach. Besides, without the need of an annotatedcorpus is another major advantage of Word2Vecskip-gram algorithm. Experiments results of stage twoshow that the proposed approach outperforms therest baseline models considered in this study in termsof weighted average f1 score. However, there are sev-eral possible future improvements can be considered.First of all, the size of the corpus used in this study isrelatively small, building a larger domain-specific cor-pus can be beneficial for improving the quality oflearned word embedding. Besides, a large pre-trainedcorpus such as GloVe can be explored in future stud-ies. Further, fine-tuning of Word2Vec hyperpara-meters, such as word vector dimensionality,downsampling and Context window size can bepotentially helpful. Moreover, instead of the skip-gram model employed in this study, CBOW model

with various settings such as unigram, bigram and tri-gram settings can be explored as well.

In terms of the proposed deep neural network, totackle the time inefficiency issue, graphics processingunits (GPU) and parallel computing techniques canbe considered, other potential solutions to addressthis issue are, applying early stopping criteria insteadof setting fixed number of epochs for training,increasing learning rate of the network, fine-tuningthe network architecture such as adjusting the num-ber of neurons so as to reduce the unnecessaryparameters. Moreover, data balancing technique suchas oversampling (Oqab et al. 2016) can be appliedwhen pre-processing the accident causes with lowsupport values. Last but not the least, apart from thedeep neural network, ensembling various weak classi-fiers with appropriate optimization algorithms such asGA, PSO, DE (Zhang et al. 2016, 2019) potentiallylead to time efficient and strong classifier, such tech-niques are in the future scope.

Finally, although rigorous validation of the trendanalysis in this study cannot be performed, the ana-lysis results provide valuable insights to some extentand can be served as a reference for future researchersin the similar area.

Disclosure statement

No potential conflict of interest was reported bythe author.

References

Alshari EM, Azman A, Doraisamy S, Mustapha N, AlkeshrM. 2017. Improvement of Sentiment Analysis Based onClustering of Word2Vec Features. In: 2017 28thInternational Workshop on Database and Expert SystemsApplications (DEXA), Lyon. p. 123–126. doi:10.1109/DEXA.2017.41.

Breiman L, Friedman JH, Olshen RA, Stone CJ. 1984.Classification and regression trees. Taylor & Francis.

Buckland M, Gey F. 1994. The relationship between recalland precision. J Am Soc Inf Sci. 45(1):12–19. doi:10.1002/(SICI)1097-4571(199401)45:1<12::AID-ASI2>3.0.CO;2-L.

Chokor A, Naganathan H, Chong WK, El M. 2016.Analyzing Arizona OSHA injury reports using unsuper-vised machine learning. Procedia Eng. 145:1588–1593.doi:10.1016/j.proeng.2016.04.200.

Dasarathy BV. 1991. Nearest Neighbor (NN) Norms: NNPattern Classification Techniques. Los Alamitos: IEEEComputer Society Press.

Fan H, Li H. 2013. Retrieving similar cases for alternativedispute resolution in construction accidents using textmining techniques. Autom Constr. 34:85–91. doi:10.1016/j.autcon.2012.10.014.


https://doi.org/10.1109/DEXA.2017.41

https://doi.org/10.1109/DEXA.2017.41

https://doi.org/10.1002/SICI1097-457119940145:112::AID-ASI23.0.CO;2-L

https://doi.org/10.1002/SICI1097-457119940145:112::AID-ASI23.0.CO;2-L

https://doi.org/10.1016/j.proeng.2016.04.200

https://doi.org/10.1016/j.autcon.2012.10.014


Goh YM, Ubeynarayana CU. 2017. Construction accidentnarrative classification: an evaluation of text mining tech-niques. Accid Anal Prev. 108:122–130. doi:10.1016/j.aap.2017.08.026.

Graves A, Jaitly N, Mohamed A. 2013. Hybrid speech rec-ognition with Deep Bidirectional LSTM. In: 2013 IEEEWorkshop on Automatic Speech Recognition andUnderstanding, Olomouc. p. 273–278. doi:10.1109/ASRU.2013.6707742.

Graves A. 2005. Framewise phoneme classification withbidirectional LSTM and other neural network architec-tures. Neural Netw. 18:602–610. doi:10.1016/j.neunet.2005.06.042.

Hochreiter S, Schmidhuber J. 1997. Long short-term mem-ory. Neural Comput. 9(8):1735–1780. doi:10.1162/neco.1997.9.8.1735.

Hosmer DW, Lemeshow S. 1989. Applied logistic regres-sion. New York (NY): Wiley. Print ISBN:9780471356325

Khatua A, Khatua A, Cambria E. 2019. A tale of two epi-demics: contextual Word2Vec for classifying twitterstreams during outbreaks. Inf Process Manag. 56(1):247–257.

Kullback S. 1987. Letter to the Editor: The Kullback–Leiblerdistance. Am Stat. 41(4) :40–341. JSTOR 2684769.

Kuo P, Huang C. 2018. An electricity price forecastingmodel by hybrid structured deep neural networks.Sustainability. 10(4):1280.

Mikolov T, Chen K, Corrado G, Dean J. 2013a. Distributedrepresentations of words and phrases and their composi-tionality. arXiv:1310.4546v1

Mikolov T, Corrado G, Chen K, Dean J. 2013b. Efficientestimation of word representations in vector space.arXiv:1301.3781v3

Mikolov T, Yih W, Zweig G. 2013c. Linguistic regularitiesin continuous space word representations. In:Proceedings of NAACL-HLT 2013. p. 746–751. https://www.aclweb.org/anthology/N13-1090.

Oqab R, L�opez G, Garach L. 2016. Bayes classifiers forimbalanced traffic accidents datasets. Accid Anal Prev.88:37–51.

Pearson K. 1901. On lines and planes of closest fit to sys-tems of points in space. Philos Mag. 2(11):559–572.

Pinto C, Furukawa J, Fukai H, Tamura S. 2017.Classification of Green coffee bean images basec ondefect types using convolutional neural network (CNN).In: 2017 International Conference on AdvancedInformatics, Concepts, Theory, and Applications(ICAICTA), Denpasar. p. 1–5. doi:10.1109/ICAICTA.2017.8090980.

Russell S, Norvig P. 2016. Artificial intelligence: A modernapproach. 2nd ed. Prentice Hall. ISBN: 0818689307

Scarpa G, Gargiulo M, Mazza A, Gaetano R. 2018. A CNN-based fusion method for feature extraction from sentineldata. Remote Sens. 10(2):1–21.

Sulehria HK, Zhang Y. 2007. Hopfield neural networks: asurvey. In: Proceedings of the 6th Conference on 6thWSEAS Int. Conf. on Artificial Intelligence, KnowledgeEngineering and Data Bases. Corfu Island, Greece. Vol.6, p. 125–130.

Tixier AJ, Hallowell MR, Rajagopalan B, Bowman D. 2016.Application of machine learning to construction injury

prediction. Autom Constr. 69:102–114. doi:10.1016/j.aut-con.2016.05.016.

Tixier AJ, Hallowell MR, Rajagopalan B, Bowman D. 2016.Automated content analysis for construction safety: Anatural language processing system to extract precursorsand outcomes from unstructured injury reports. AutomConstr. 62:45–56. doi:10.1016/j.autcon.2015.11.001.

Van Der Maaten L, Hinton G. 2008. Visualizing Data usingt-SNE. J Mach Learn Res. 9:2579–2605. http://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf

Vapnik V. 1995. The nature of statistical learning theory.New York (NY): Springer. ISBN 13: 9780137903955

Wang Y, Wang J, Lin H, Zhang S, Li L. 2017. Biomedicalevent trigger detection based on bidirectional LSTM andCRF. In: 2017 IEEE International Conference onBioinformatics and Biomedicine (BIBM), Kansas City,MO. p. 445–450.

Xu C, Xie L, Xiao X. 2018. A bidirectional LSTM approachwith word embeddings for sentence boundary detection.J Signal Process Syst. 90:1063–1075. doi:10.1007/s1126.

Xue H, Huynh DQ, Reynolds M. 2017. Bi-prediction: ped-estrian trajectory prediction based on bidirectional LSTMclassification. In: 2017 International Conference onDigital Image Computing: Techniques and Applications(DICTA), Sydney, NSW. p. 1–8.

Yildirim €O. 2018. A novel wavelet sequences based on deepbidirectional LSTM network model for ECG signal classi-fication. Comput Biol Med. 96:189–202.

Zeyer A, Doetsch P, Voigtlaender P, Schl€uter R, Ney H.2017. A comprehensive study of deep bidirectionalLSTM RNNS for acoustic modeling in speech recogni-tion. In: 2017 IEEE International Conference onAcoustics, Speech and Signal Processing (ICASSP), NewOrleans, LA. p. 2462–2466.

Zhang F, Deb C, Lee SE, Yang J, Shah KW. 2016. Time ser-ies forecasting for building energy consumption usingweighted Support Vector Regression with differentialevolution optimization technique. Energy Build. 126:94–103. doi:10.1016/j.enbuild.2016.05.028.

Zhang F, Fleyeh H, Wang X, Lu M. 2019. Construction siteaccident analysis using text mining and natural languageprocessing techniques. Autom Constr. 99(June 2018):238–248. doi:10.1016/j.autcon.2018.12.016.

Zhang D, Xu H, Su Z, Xu Y. 2015. Expert Systems withApplications Chinese comments sentiment classificationbased on word2vec. Expert Syst Appl. 42(4):1857–1863.doi:10.1016/j.eswa.2014.09.011.

Zhao Y, Yang R, Chevalier G, Shah R, Romijnders R. 2018.Applying deep bidirectional LSTM and mixture density net-work for basketball trajectory prediction. Optik (Stuttg).158:266–272. doi:10.1016/j.ijleo.2017.12.038.

Zheng D, Chen Z, Wu Y, Yu K. 2016. Directed automaticspeech transcription error correction using bidirectionalLSTM. In: 2016 10th International Symposium onChinese Spoken Language Processing (ISCSLP), Tianjins.p. 1–5.

Zou Y, Kiviniemi A, Jones SW. 2017. Retrieving similar casesfor construction project risk management using NaturalLanguage Processing techniques. Autom Constr. 80:66–76.doi:10.1016/j.autcon.2017.04.003.

16 F. ZHANG

https://doi.org/10.1016/j.aap.2017.08.026

https://doi.org/10.1016/j.aap.2017.08.026

https://doi.org/10.1109/ASRU.2013.6707742

https://doi.org/10.1109/ASRU.2013.6707742

https://doi.org/10.1016/j.neunet.2005.06.042

https://doi.org/10.1016/j.neunet.2005.06.042

https://doi.org/10.1162/neco.1997.9.8.1735

https://doi.org/10.1162/neco.1997.9.8.1735

https://www.aclweb.org/anthology/N13-1090

https://www.aclweb.org/anthology/N13-1090

https://doi.org/10.1109/ICAICTA.2017.8090980

https://doi.org/10.1109/ICAICTA.2017.8090980




http://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf

http://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf

https://doi.org/10.1007/s1126

https://doi.org/10.1016/j.enbuild.2016.05.028


https://doi.org/10.1016/j.eswa.2014.09.011

https://doi.org/10.1016/j.ijleo.2017.12.038


Figure A1. ROC Curves: unigram þ 35 dim.

Appendix A: ROC curves with the highest average AUC values of 5 folder cross validation foreach classifier


Figure A2. ROC Curves: bigram þ 35 dim.

Figure A3. ROC Curves: unigram þ 55 dim.

18 F. ZHANG

Figure A4. ROC Curves: bigram þ 55 dim.

Figure A5. ROC Curves: DT.


Figure A6. ROC Curves: KNN.

Figure A7. ROC Curves: LR.

20 F. ZHANG

Figure A8. ROC Curves: NB.

Figure A9. ROC Curves: SVM.


Documents

A hybrid structured deep neural network with Word2Vec for …du.diva-portal.org/smash/get/diva2:1375948/FULLTEXT01.pdf · 2019-12-06 · A hybrid structured deep neural network with