
Research Article

A Multiple Hidden Layers Extreme Learning Machine Method and Its Application

Dong Xiao,1 Beijing Li,1 and Yachun Mao2

1 Information Science & Engineering School, Northeastern University, Shenyang 110004, China
2 College of Resources and Civil Engineering, Northeastern University, Shenyang 110004, China

Correspondence should be addressed to Dong Xiao; xiaodong@ise.neu.edu.cn

Received 30 August 2017; Accepted 14 November 2017; Published 13 December 2017

Academic Editor: Mauro Gaggero

Copyright © 2017 Dong Xiao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Extreme learning machine (ELM) is a rapid learning algorithm for the single-hidden-layer feedforward neural network, which randomly initializes the weights between the input layer and the hidden layer and the bias of the hidden layer neurons and finally uses the least-squares method to calculate the weights between the hidden layer and the output layer. This paper proposes a multiple hidden layers ELM (MELM for short) which inherits the characteristics of the parameters of the first hidden layer. The parameters of the remaining hidden layers are obtained by introducing a method that makes the actual hidden layer output approach the expected hidden layer output with zero error. Based on the MELM algorithm, many experiments on regression and classification show that the MELM can achieve satisfactory results in terms of average precision and good generalization performance compared to the two-hidden-layer ELM (TELM), the ELM, and some other multilayer ELM variants.

1. Introduction

At present, artificial neural networks have been widely applied in many research fields, such as pattern recognition, signal processing, and short-term prediction. Among them, the single-hidden-layer feedforward neural network (SLFN) is the most widely used type of artificial neural network [1, 2]. Because the parameters of traditional feedforward neural networks are usually determined by gradient-based error backpropagation algorithms, the network bears a time-expensive training and testing process and easily falls into a local optimum. Many algorithms have been proposed to improve the SLFN operation rate and precision, such as the backpropagation algorithm (BP) and its improved variants [3, 4]. With the limitations of BP algorithms, the generalization ability of the networks is unsatisfactory and overlearning easily occurs. In 1989, Lowe proposed the RBF neural network [5], which indicated that the parameters of the SLFNs can also be randomly selected. In 1992, Pao Y. H. et al. proposed the theory of the random vector functional link network (RVFL) [6, 7], and they presented that only one set of parameters, the output weights, should be calculated during the training process.

In 2004, Huang G. B. proposed the extreme learning machine (ELM), reducing the training time of the network and improving the generalization performance [8-10]. Traditional neural network learning algorithms (such as BP) need to set all the training parameters and use iterative algorithms to update them, and they easily generate a local optimal solution. But the ELM only needs to randomly set the weights and bias of the hidden neurons, and the output weights are determined by using the Moore-Penrose pseudoinverse under the criterion of the least-squares method. In recent years, various ELM variants have been proposed aiming to achieve better results, such as the deep ELM with kernel based on the multilayer extreme learning machine (DELM) algorithm [11]; the two-hidden-layer extreme learning machine (TELM) [12]; a four-layered feedforward neural network [13]; the online sequential extreme learning machine [14, 15]; the multiple kernel extreme learning machine (MK-ELM) [16]; the two-stage extreme learning machine [17]; and methods using noise detection to improve classifier accuracy [18, 19].

Hindawi, Mathematical Problems in Engineering, Volume 2017, Article ID 4670187, 10 pages, https://doi.org/10.1155/2017/4670187


First, consider the DELM with kernel based on the ELM-AE algorithm (DELM) presented in [11], which uses the ELM autoencoder (ELM-AE) [20-22] as the learning algorithm in each layer. The DELM also has a multilayer network structure divided into two parts: the first part uses the ELM-AE to deep-learn the original data, aiming at obtaining the most representative new data; the second part calculates the network parameters by using the kernel ELM algorithm with a three-layer structure (the output of the first part, hidden layer, and output layer). For the MELM, however, we do not need such data processing (such as extracting the representative data from the original data); instead, we make the actual output of the hidden layers closer to the expected output of the hidden layers by calculating step by step through the multiple hidden layers.

Next, a two-hidden-layer feedforward network (TLFN) was proposed by Huang in 2003 [23]. This article demonstrates that the TLFNs could learn arbitrary N training samples with a very small training error by employing 2√((m + 3)N) hidden neurons [24]. But the changing process of the TLFN structure is very complicated. First, the TLFN has a three-layer network with L output neurons; it then adds 2L neurons (two parts) to the hidden layer, aiming at transforming the original output layer into the second hidden layer with L hidden neurons, and finally it adds an output layer to the structure. Eventually, the final network structure has one input layer, two hidden layers, and one output layer. In contrast, the MELM has a relatively simple, stable network structure and a simple calculation process, and the MELM is a time-saving algorithm compared with TLFNs.

In this paper, we propose a multiple hidden layers extreme learning machine algorithm (MELM). The MELM adds hidden layers to the original ELM network structure, randomly initializes the weights between the input layer and the first hidden layer as well as the bias of the first hidden layer, utilizes the method (make the actual output of each hidden layer approach the expected output of each hidden layer) to calculate the parameters of the hidden layers (except the first hidden layer), and finally uses the least-squares method to calculate the output weights of the network. In the following sections, we carry out many experiments with the proposed ideas. The MELM experimental results on regression problems and some popular classification problems show satisfactory advantages in terms of average accuracy compared to other ELM variants. Our experiments also study the effect of different numbers of hidden layer neurons, the compatible activation function, and different numbers of hidden layers on the same problems.

The rest of this paper is organized as follows. Section 2 reviews the original ELM. Section 3 presents the method and framework structure of the two-hidden-layer ELM. Section 4 presents the proposed MELM technique, the multihidden-layer ELM. Section 5 reports and analyzes the experimental results. Section 6 presents the conclusions.

2. Extreme Learning Machine

The extreme learning machine (ELM) proposed by Huang G. B. aims at avoiding the time-consuming iterative training process and improving the generalization performance [8-10, 25]. As a single-hidden-layer feedforward neural network (SLFN), the ELM structure includes an input layer, a hidden layer, and an output layer. Different from the traditional neural network learning algorithms (such as the BP algorithm), which set all the network training parameters and easily generate a local optimal solution, the ELM only sets the number of hidden neurons of the network, randomizes the weights between the input layer and the hidden layer as well as the bias of the hidden neurons during the algorithm execution process, calculates the hidden layer output matrix, and finally obtains the weights between the hidden layer and the output layer by using the Moore-Penrose pseudoinverse under the criterion of the least-squares method. Because the ELM has a simple network structure and concise parameter computation processes, the ELM has the advantage of a fast learning speed. The original structure of the ELM is expressed in Figure 1.

Figure 1: The structure of the ELM (the input layer X, the hidden layer with output H = g(wX + b), and the output layer Y).

Figure 1 shows the extreme learning machine network structure, which includes n input layer neurons, l hidden layer neurons, and m output layer neurons. First, consider the training samples {X, Y} = {x_i, y_i} (i = 1, 2, ..., Q), with an input feature X = [x_{i1}, x_{i2}, ..., x_{iQ}] and a desired matrix Y = [y_{j1}, y_{j2}, ..., y_{jQ}] comprised of the training samples, where the matrix X and the matrix Y can be expressed as follows:

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1Q} \\ x_{21} & x_{22} & \cdots & x_{2Q} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nQ} \end{bmatrix}, \qquad Y = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1Q} \\ y_{21} & y_{22} & \cdots & y_{2Q} \\ \vdots & \vdots & \ddots & \vdots \\ y_{m1} & y_{m2} & \cdots & y_{mQ} \end{bmatrix} \quad (1)$$


where the parameters n and m are the dimensions of the input matrix and the output matrix, respectively.

Then the ELM randomly sets the weights between the input layer and the hidden layer:

$$w = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1n} \\ w_{21} & w_{22} & \cdots & w_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ w_{l1} & w_{l2} & \cdots & w_{ln} \end{bmatrix} \quad (2)$$

where w_ij represents the weight between the jth input layer neuron and the ith hidden layer neuron.

Third, the ELM assumes the weights between the hidden layer and the output layer, which can be expressed as follows:

$$\beta = \begin{bmatrix} \beta_{11} & \beta_{12} & \cdots & \beta_{1m} \\ \beta_{21} & \beta_{22} & \cdots & \beta_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{l1} & \beta_{l2} & \cdots & \beta_{lm} \end{bmatrix} \quad (3)$$

where β_jk represents the weight between the jth hidden layer neuron and the kth output layer neuron.

Fourth, the ELM randomly sets the bias of the hidden layer neurons:

$$B = [b_1, b_2, \ldots, b_l]^T \quad (4)$$

Fifth, the ELM chooses the network activation function g(x). According to Figure 1, the output matrix T can be expressed as follows:

$$T = [t_1, t_2, \ldots, t_Q]_{m \times Q} \quad (5)$$

Each column vector of the output matrix T is as follows:

$$t_j = \begin{bmatrix} t_{1j} \\ t_{2j} \\ \vdots \\ t_{mj} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{l} \beta_{i1}\, g(w_i x_j + b_i) \\ \sum_{i=1}^{l} \beta_{i2}\, g(w_i x_j + b_i) \\ \vdots \\ \sum_{i=1}^{l} \beta_{im}\, g(w_i x_j + b_i) \end{bmatrix} \quad (j = 1, 2, 3, \ldots, Q) \quad (6)$$

Sixth, considering formulae (5) and (6), we can get

$$H\beta = T' \quad (7)$$

where T' is the transpose of T and H is the output of the hidden layer. In order to obtain the unique solution with minimum error, we use the least-squares method to calculate the weight matrix β [8, 9]:

$$\beta = H^{+} T' \quad (8)$$

Figure 2: The workflow of the TELM.

To improve the generalization ability of the network and make the results more stable, we add a regularization term to β [26]. When the number of hidden layer neurons is less than the number of training samples, β can be expressed as

$$\beta = \left(\frac{I}{\lambda} + H^{T} H\right)^{-1} H^{T} T' \quad (9)$$

When the number of hidden layer nodes is more than the number of training samples, β can be expressed as

$$\beta = H^{T} \left(\frac{I}{\lambda} + H H^{T}\right)^{-1} T' \quad (10)$$
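The whole training procedure above reduces to a few lines of linear algebra. The following is a minimal sketch in Python/NumPy (our illustration, not the authors' implementation); it assumes a logistic sigmoid activation and uses the regularized solution (9) for the common case in which the number of hidden neurons l is smaller than the number of training samples Q. The function and variable names are ours.

```python
import numpy as np

def elm_train(X, T, n_hidden, lam=1e6, rng=None):
    """Train a basic ELM. X: (n, Q) input features, T: (m, Q) targets."""
    rng = rng or np.random.default_rng(0)
    n, Q = X.shape
    # Randomly set input weights w (l x n) and hidden biases b (l x 1).
    w = rng.uniform(-1.0, 1.0, size=(n_hidden, n))
    b = rng.uniform(-1.0, 1.0, size=(n_hidden, 1))
    # Hidden layer output H = g(wX + b), stored here as (l, Q).
    H = 1.0 / (1.0 + np.exp(-(w @ X + b)))
    # Regularized least-squares output weights, formula (9):
    # beta = (I/lambda + H H^T)^(-1) H T'  (with H stored as (l, Q), T' = T.T).
    beta = np.linalg.solve(np.eye(n_hidden) / lam + H @ H.T, H @ T.T)   # (l, m)
    return w, b, beta

def elm_predict(X, w, b, beta):
    H = 1.0 / (1.0 + np.exp(-(w @ X + b)))
    return (H.T @ beta).T   # (m, Q) network outputs
```

A single non-iterative solve replaces the gradient-descent loop of BP, which is where the speed advantage discussed above comes from.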

3. Two-Hidden-Layer ELM

In 2016, B. Y. Qu and B. F. Lang proposed the two-hidden-layer extreme learning machine (TELM) in [12]. The TELM tries to make the actual hidden layer outputs approach the expected hidden layer outputs by adding a parameter-setting step for the second hidden layer. In this way, the TELM finds a better mapping between the input and output signals. The network structure of the TELM includes one input layer, two hidden layers, and one output layer, with each hidden layer having l hidden neurons. The activation function of the network is selected as g(x).

The focus of the TELM algorithm is the process of calculating and updating the weights between the first hidden layer and the second hidden layer, as well as the bias of the hidden layer and the output weights between the second hidden layer and the output layer. The workflow of the TELM architecture is depicted in Figure 2.

Consider the training sample dataset {X, T} = {x_i, t_i} (i = 1, 2, 3, ..., Q), where the matrix X contains the input samples and T the labeled samples.

The TELM first treats the two hidden layers as one hidden layer, so the output of the hidden layer can be expressed as H = g(WX + B), with the parameters of the weights W and bias B of the first hidden layer randomly initialized. Next, the output weight matrix β between the second hidden layer and the output layer can be obtained by using

$$\beta = H^{+} T \quad (11)$$

Now the TELM separates the two hidden layers merged previously, so the network has two hidden layers. According to the workflow of Figure 2, the output of the second hidden layer can be obtained as follows:

$$g(W_1 H + B_1) = H_1 \quad (12)$$


where W₁ is the weight matrix between the first hidden layer and the second hidden layer, H is the output matrix of the first hidden layer, B₁ is the bias of the second hidden layer, and H₁ is the expected output of the second hidden layer.

The expected output of the second hidden layer can be obtained by calculating

$$H_1 = T\beta^{+} \quad (13)$$

where β⁺ is the generalized inverse of the matrix β.

Now the TELM defines the matrix W_HE = [B₁ W₁], so the parameters of the second hidden layer can be easily obtained by using formula (12) and the inverse function of the activation function:

$$W_{HE} = g^{-1}(H_1)\, H_E^{+} \quad (14)$$

where H_E⁺ is the generalized inverse of H_E = [1 H]^T, 1 denotes a one-column vector of size Q whose elements are the scalar unit 1, and the notation g⁻¹(x) indicates the inverse of the activation function g(x).
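For concreteness, the two activation functions used later in the paper have closed-form inverses, which is what makes formula (14) computable. The short Python snippet below is our illustration (not the authors' code) and assumes the arguments of the inverses stay strictly inside the open range of g.

```python
import numpy as np

# Logistic sigmoid g(x) = 1 / (1 + e^(-x)) and its inverse (the logit).
def sigmoid(x):     return 1.0 / (1.0 + np.exp(-x))
def sigmoid_inv(y): return np.log(y / (1.0 - y))

# Hyperbolic-tangent-type function g(x) = (1 - e^(-x)) / (1 + e^(-x)),
# which equals tanh(x / 2); its inverse is ln((1 + y) / (1 - y)).
def tansig(x):      return (1.0 - np.exp(-x)) / (1.0 + np.exp(-x))
def tansig_inv(y):  return np.log((1.0 + y) / (1.0 - y))
```

In practice the expected hidden outputs have to be kept strictly inside (0, 1) or (−1, 1) before applying the inverse, which is one motivation for the normalization step in Algorithm 2 below.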

Selecting an appropriate activation function g(x), the TELM calculates (14), and the actual output of the second hidden layer is updated as follows:

$$H_2 = g(W_{HE} H_E) \quad (15)$$

So the weight matrix β between the second hidden layer and the output layer is updated as follows:

$$\beta_{\text{new}} = H_2^{+} T \quad (16)$$

where H₂⁺ is the generalized inverse of H₂, so the actual output of the TELM network can be expressed as

$$f(x) = H_2 \beta_{\text{new}} \quad (17)$$

To sum up, the TELM algorithm process can be expressed as follows.

Algorithm 1.
(1) Assume the training sample dataset is {X, T} = {x_i, t_i} (i = 1, 2, 3, ..., Q), where the matrix X contains the input samples and the matrix T the labeled samples; each hidden layer has l hidden neurons, with the activation function g(x).
(2) Randomly generate the weights W between the input layer and the first hidden layer and the bias B of the first hidden neurons; W_IE = [B W], X_E = [1 X]^T.
(3) Calculate the equation H = g(W_IE X_E).
(4) Obtain the weights between the second hidden layer and the output layer: β = H⁺T.
(5) Calculate the expected output of the second hidden layer: H₁ = Tβ⁺.
(6) According to formulae (12)-(14) and algorithm steps (4)-(5), calculate the weights W₁ between the first hidden layer and the second hidden layer and the bias B₁ of the second hidden neurons: W_HE = g⁻¹(H₁)H_E⁺.
(7) Obtain and update the actual output of the second hidden layer: H₂ = g(W_HE H_E).
(8) Update the weight matrix β between the second hidden layer and the output layer: β_new = H₂⁺T.
(9) Calculate the output of the network: f(x) = g[W₁ g(WX + B) + B₁]β_new.
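A compact sketch of Algorithm 1 in Python/NumPy is given below. It is our illustration rather than the authors' code: it assumes the logistic sigmoid activation, uses the Moore-Penrose pseudoinverse for every generalized inverse, and clips the argument of g⁻¹ away from 0 and 1 for numerical safety.

```python
import numpy as np

def g(x):     return 1.0 / (1.0 + np.exp(-x))   # logistic sigmoid
def g_inv(y): return np.log(y / (1.0 - y))      # inverse of the sigmoid

def telm_train(X, T, n_hidden, rng=None):
    """TELM (Algorithm 1). X: (n, Q) inputs, T: (m, Q) targets, l = n_hidden."""
    rng = rng or np.random.default_rng(0)
    n, Q = X.shape
    # (2) Random first-layer parameters; W_IE = [B W], X_E = [1; X].
    W = rng.uniform(-1, 1, (n_hidden, n))
    B = rng.uniform(-1, 1, (n_hidden, 1))
    W_IE = np.hstack([B, W])
    X_E = np.vstack([np.ones((1, Q)), X])
    # (3) First hidden layer output (stored as (l, Q)).
    H = g(W_IE @ X_E)
    # (4) Output weights with the two hidden layers merged: beta = H^+ T.
    beta = np.linalg.pinv(H.T) @ T.T                  # (l, m)
    # (5) Expected output of the second hidden layer: H1 = T beta^+.
    H1 = (T.T @ np.linalg.pinv(beta)).T               # (l, Q)
    H1 = np.clip(H1, 1e-6, 1.0 - 1e-6)                # keep g_inv finite
    # (6) Second-layer parameters: W_HE = g^(-1)(H1) H_E^+.
    H_E = np.vstack([np.ones((1, Q)), H])
    W_HE = g_inv(H1) @ np.linalg.pinv(H_E)            # (l, l+1)
    # (7) Actual output of the second hidden layer.
    H2 = g(W_HE @ H_E)
    # (8) Updated output weights: beta_new = H2^+ T.
    beta_new = np.linalg.pinv(H2.T) @ T.T
    return W_IE, W_HE, beta_new

def telm_predict(X, W_IE, W_HE, beta_new):
    # (9) f(x) = g(W_HE [1; g(W_IE [1; X])]) beta_new.
    Q = X.shape[1]
    H = g(W_IE @ np.vstack([np.ones((1, Q)), X]))
    H2 = g(W_HE @ np.vstack([np.ones((1, Q)), H]))
    return (H2.T @ beta_new).T
```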

Figure 3: The structure of the three-hidden-layer ELM (the input layer X; three hidden layers with outputs H = g(W_IE X_E), H₂ = g(W_HE H_E), and H₄ = g(W_HE1 H_E1); and the output layer f(x)).

Figure 4: The workflow of the three-hidden-layer ELM.

4. Multihidden-Layer ELM

At present, some scholars have proposed many improvements on the ELM algorithm and structure and obtained some achievements and advantages. Such advantages of neural networks motivate us to explore better ideas behind the ELM. Based on the above articles, we adjust the structure of the ELM neural network and propose an algorithm named the multiple hidden layers extreme learning machine (MELM). The structure of the MELM (taking the three-hidden-layer ELM as an example) is illustrated in Figure 3. The workflow of the three-hidden-layer ELM is illustrated in Figure 4.

Here we take a three-hidden-layer ELM as an example and analyze the MELM algorithm below. First, consider the training samples {X, T} = {x_i, t_i} (i = 1, 2, 3, ..., Q) and the three-hidden-layer network structure (each of the three hidden layers has l hidden neurons) with the activation function g(x). The structure of the three-hidden-layer ELM has an input layer, three hidden layers, and an output layer. According to the theory of the TELM algorithm, we now treat the three hidden layers as two hidden layers (the first hidden layer is still the first hidden layer, but the second hidden layer and the third hidden layer are put together as one hidden layer), so that the structure of the network is the same as the TELM network mentioned above. So we can obtain the weight matrix β_new between the second hidden layer and the output layer. According to the number of actual samples, we can use formula (9) or formula (10) to calculate the weights β, which can improve the generalization ability of the network.

Then the MELM separates the three hidden layers merged previously, so the MELM structure has three hidden layers. The expected output of the third hidden layer can thus be expressed as follows:

$$H_3 = T\beta_{\text{new}}^{+} \quad (18)$$

where β_new⁺ is the generalized inverse of the weight matrix β_new.

Third, the MELM defines the matrix W_HE1 = [B₂ W₂], so the parameters of the third hidden layer can be easily obtained by calculating formula (18) and the formula H₃ = g(W₂H₂ + B₂) = g(W_HE1 H_E1):

$$W_{HE1} = g^{-1}(H_3)\, H_{E1}^{+} \quad (19)$$

where H₂ is the actual output of the second hidden layer, W₂ is the weight matrix between the second hidden layer and the third hidden layer, B₂ is the bias of the third hidden neurons, H_E1⁺ is the generalized inverse of H_E1 = [1 H₂]^T, 1 denotes a one-column vector of size Q whose elements are the scalar unit 1, and the notation g⁻¹(x) indicates the inverse of the activation function g(x).

In order to test the performance of the proposed MELM algorithm, we adopt different activation functions for the regression and classification experiments. Generally, we adopt the logistic sigmoid function g(x) = 1/(1 + e^(−x)). The actual output of the third hidden layer is calculated as follows:

$$H_4 = g(W_{HE1} H_{E1}) \quad (20)$$

Finally, the output weight matrix β_new between the third hidden layer and the output layer is calculated as follows. When the number of hidden layer neurons is less than the number of training samples, β can be expressed as

$$\beta_{\text{new}} = \left(\frac{I}{\lambda} + H_4^{T} H_4\right)^{-1} H_4^{T} T \quad (21)$$

When the number of hidden layer neurons is more than the number of training samples, β can be expressed as

$$\beta_{\text{new}} = H_4^{T} \left(\frac{I}{\lambda} + H_4 H_4^{T}\right)^{-1} T \quad (22)$$

The actual output of the three-hidden-layer ELM network can be expressed as follows:

$$f(x) = H_4 \beta_{\text{new}} \quad (23)$$

To ensure that the actual final hidden layer output approaches the expected hidden layer output as closely as possible during the training process, the operation is effectively an optimization of the network structure parameters starting from the second hidden layer.

The above is the parameter calculation process of the three-hidden-layer ELM network, but the purpose of this paper is to calculate the parameters of the multiple-hidden-layer ELM network and the final output of the MELM network structure. We can use a cyclic calculation to illustrate the calculating process of the MELM. For a four-hidden-layer ELM network, we recalculate formula (18) to formula (22) in the calculation process of the network, obtain and record the parameters of each hidden layer, and finally calculate the final output of the MELM network. If the number of hidden layers increases further, the calculation process can be repeated in the same way. The calculation process of the MELM network can be described as follows.

Algorithm 2.
(1) Assume the training sample dataset is {X, T} = {x_i, t_i} (i = 1, 2, 3, ..., Q), where the matrix X contains the input samples and the matrix T the labeled samples. Each hidden layer has l hidden neurons, with the activation function g(x).
(2) Randomly initialize the weights W between the input layer and the first hidden layer as well as the bias B of the first hidden neurons; W_IE = [B W], X_E = [1 X]^T.
(3) Calculate the equation H = g(W_IE X_E).
(4) Calculate the weights between the hidden layers and the output layer: β = (I/λ + H^T H)^(−1) H^T T or β = H^T (I/λ + H H^T)^(−1) T.
(5) Calculate the expected output of the second hidden layer: H₁ = Tβ⁺.
(6) According to formulae (12)-(14) and algorithm steps (4)-(5), calculate the weights W₁ between the first hidden layer and the second hidden layer and the bias B₁ of the second hidden neurons: W_HE = g⁻¹(H₁)H_E⁺.
(7) Obtain and update the actual output of the second hidden layer: H₂ = g(W_HE H_E).
(8) Update the weight matrix β between the hidden layer and the output layer: β_new = (I/λ + H₂^T H₂)^(−1) H₂^T T or β_new = H₂^T (I/λ + H₂H₂^T)^(−1) T.
(9) If the number of hidden layers is three, we can calculate the parameters by executing the above operations from step (5) to step (8) once more, with β_new taking the role of β and H_E = [1 H₂]^T.
(10) Calculate the output f(x) = H₂β_new.
If the number K of hidden layers is more than three, steps (5) to (8) are executed repeatedly, (K − 1) times in total. All the H matrices (H₁, H₂, ...) must be normalized into the range between −0.9 and 0.9 whenever the maximum of the matrix is more than 1 or the minimum of the matrix is less than −1.
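The following Python/NumPy sketch (ours, not the authors' code) implements Algorithm 2 for an arbitrary number of hidden layers. It assumes the tanh-type activation g(x) = (1 − e^(−x))/(1 + e^(−x)) used for the regression experiments, applies the ±0.9 rescaling of step (10) before inverting the activation, and uses the regularized output-weight formula for the case l < Q.

```python
import numpy as np

def g(x):     return (1.0 - np.exp(-x)) / (1.0 + np.exp(-x))   # tanh-type, range (-1, 1)
def g_inv(y): return np.log((1.0 + y) / (1.0 - y))             # its inverse

def _rescale(H):
    # Step (10): squash H into [-0.9, 0.9] when it leaves [-1, 1].
    if H.max() > 1.0 or H.min() < -1.0:
        H = 1.8 * (H - H.min()) / (H.max() - H.min()) - 0.9
    return H

def melm_train(X, T, n_hidden, n_layers=3, lam=1e6, rng=None):
    """MELM (Algorithm 2): n_layers hidden layers with n_hidden neurons each."""
    rng = rng or np.random.default_rng(0)
    n, Q = X.shape
    I = np.eye(n_hidden)
    # Steps (2)-(3): random first layer, H = g(W_IE X_E).
    W_IE = rng.uniform(-1, 1, (n_hidden, n + 1))
    H = g(W_IE @ np.vstack([np.ones((1, Q)), X]))
    # Step (4): regularized output weights (case l < Q).
    beta = np.linalg.solve(I / lam + H @ H.T, H @ T.T)           # (l, m)
    hidden_weights = []                                          # one W_HE per extra layer
    for _ in range(n_layers - 1):                                # steps (5)-(8), repeated
        H_exp = (T.T @ np.linalg.pinv(beta)).T                   # expected hidden output
        H_exp = np.clip(_rescale(H_exp), -0.999, 0.999)          # keep g_inv finite
        H_E = np.vstack([np.ones((1, Q)), H])
        W_HE = g_inv(H_exp) @ np.linalg.pinv(H_E)                # inverse-activation step
        H = g(W_HE @ H_E)                                        # actual hidden output
        beta = np.linalg.solve(I / lam + H @ H.T, H @ T.T)       # updated output weights
        hidden_weights.append(W_HE)
    return W_IE, hidden_weights, beta

def melm_predict(X, W_IE, hidden_weights, beta):
    Q = X.shape[1]
    H = g(W_IE @ np.vstack([np.ones((1, Q)), X]))
    for W_HE in hidden_weights:
        H = g(W_HE @ np.vstack([np.ones((1, Q)), H]))
    return (H.T @ beta).T
```

Setting n_layers=2 recovers the TELM (up to the regularized output-weight solve), and n_layers=1 degenerates to the original ELM, which is the sense in which the MELM generalizes both.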

5. Application

In order to verify the actual effect of the algorithm proposed in this paper, we have carried out the following experiments, which are divided into three parts: regression problems, classification problems, and an application to the mineral selection industry. All the experiments are conducted in the MATLAB 2010b computational environment running on a computer with a 2.30 GHz Intel i3 CPU.


Table 1: The RMSE of the three function examples.

Algorithm                      Training RMSE    Testing RMSE
f1(x)
  TELM                         8.6233E-9        1.0797E-8
  MELM (three-hidden-layer)    8.722E-15        1.3055E-14
  MLELM                        1.8428E-6        1.5204E-5
f2(x)
  TELM                         0.2867           0.2775
  MELM (three-hidden-layer)    0.0011           0.0019
  MLELM                        0.1060           0.1098
f3(x)
  TELM                         0.3528           0.6204
  MELM (three-hidden-layer)    0.2110           0.4177
  MLELM                        0.5538           0.4591

5.1. Regression Problems. To test the performance on regression problems, several widely used functions are listed below [27]. We use these functions to generate datasets: sufficient training samples are selected at random and the remaining samples are used for testing, and the activation function is selected as the hyperbolic tangent function g(x) = (1 − e^(−x))/(1 + e^(−x)).

(1) $f_1(x) = \sum_{i=1}^{D} x_i$
(2) $f_2(x) = \sum_{i=1}^{D-1} \left(100\,(x_i^2 - x_{i+1})^2 + (x_i - 1)^2\right)$
(3) $f_3(x) = -20\, e^{-0.2\sqrt{\sum_{i=1}^{D} x_i^2 / D}} - e^{\sum_{i=1}^{D} \cos(2\pi x_i)/D} + 20$

The symbol D, set as a positive integer, represents the dimension of the function we are using. The functions f2(x) and f3(x) are complex multimodal functions. Each function experiment has 900 training samples and 100 testing samples. The evaluation criterion of the experiments is the average accuracy, namely, the root mean square error (RMSE), on the regression problems. We first conduct experiments to verify the average accuracy of the neural network structure for different numbers of hidden neurons and hidden layers. Then we can observe the optimal neural network structure, with a specific number of hidden layers and hidden neurons, compared with the results of the TELM on regression problems.
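As a concrete reading of these definitions, the three benchmark functions and the RMSE criterion can be written directly in Python/NumPy. The snippet below is our illustration; the sampling range and the choice D = 2 in the usage example are arbitrary placeholders, not the paper's experimental settings.

```python
import numpy as np

def f1(x):   # f1(x) = sum_i x_i
    return np.sum(x)

def f2(x):   # f2(x) = sum_{i=1..D-1} [100 (x_i^2 - x_{i+1})^2 + (x_i - 1)^2]  (Rosenbrock-type)
    return np.sum(100.0 * (x[:-1] ** 2 - x[1:]) ** 2 + (x[:-1] - 1.0) ** 2)

def f3(x):   # f3(x) = -20 exp(-0.2 sqrt(sum x_i^2 / D)) - exp(sum cos(2 pi x_i) / D) + 20
    D = x.size
    return (-20.0 * np.exp(-0.2 * np.sqrt(np.sum(x ** 2) / D))
            - np.exp(np.sum(np.cos(2.0 * np.pi * x)) / D) + 20.0)

def rmse(y_true, y_pred):   # evaluation criterion used throughout Section 5.1
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# Usage example: a 900/100 train/test split for f2 with D = 2.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(1000, 2))
y = np.apply_along_axis(f2, 1, X)
X_train, y_train, X_test, y_test = X[:900], y[:900], X[900:], y[900:]
```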

In Table 1 we give the average RMSE of the above three functions for the two-hidden-layer structure (TELM), the MLELM, and the three-hidden-layer structure (we carried out a series of experiments including four-hidden-layer, five-hidden-layer, and eight-hidden-layer structures, but the experiments showed that the three-hidden-layer structure achieves the best results for the three functions f1(x), f2(x), and f3(x), so we only give the RMSE values of the three-hidden-layer structure) on the regression problems. The table includes the testing RMSE and the training RMSE of the three algorithm structures. We can see that the MELM (three-hidden-layer) achieves the better results.

Table 2: The datasets used for classification.

Datasets        Training samples    Testing samples    Attributes    Classes
Mice Protein    750                 330                82            8
svmguide4       300                 312                10            6
vowel           568                 422                10            11
AutoUnix        380                 120                101           4
Iris            100                 50                 4             3

5.2. Classification Problems. To test the performance of the proposed MELM algorithm on classification datasets, we use several datasets (Mice Protein, svmguide4, vowel, AutoUnix, and Iris) collected from the University of California, Irvine repository [28] and the LIBSVM website [29]. The training data and testing data of each experiment are randomly selected from the original datasets. The information on these datasets is given in Table 2.

The classification performance criterion of the problems is the average classification accuracy on the testing data. In Figure 5, the average testing classification correct percentage for the ELM, TELM, MLELM, and MELM algorithms is shown clearly using (a) Mice Protein, (b) svmguide4, (c) vowel, (d) AutoUnix, and (e) Iris, and the different datasets (a, b, c, d, and e) have an important effect on the classification accuracy of the four algorithms (ELM, TELM, MLELM, and MELM). But from the changing trend of the average testing correct percentage of the algorithms on the same datasets, we can see that the MELM algorithm can select the optimal number of hidden layers for the network structure to adapt to the specific dataset, aiming at obtaining the better results. So the datasets Mice Protein and Forest type mapping use the three-hidden-layer ELM for the experiments, and the dataset svmguide4 uses the five-hidden-layer ELM.

5.3. The Application of the Ores Selection Industry. In this part, we use actual datasets to analyze and experiment with the model we have established and to test the performance of the MELM algorithm. Our datasets come from the AnQian Mining Group, namely, the mining areas of Ya-ba-ling and Xi-da-bei. The samples we selected are hematite (50 samples), magnetite (90 samples), granite (57 samples), phyllite (15 samples), and chlorite (32 samples). For the precise classification of the ores and the prediction of the total iron content of iron ores, people usually use the chemical analysis method. But the analysis of total iron content is a complex and long process, and the color of the solution sometimes shows no significant change at the end. Owing to the extensive use of near-infrared spectroscopy for chemical composition analysis, we can infer the structure of an unknown substance according to the position and shape of the peaks in the absorption spectra. So we can use this theory to obtain correct information about the ores with a low influence of the environment on data acquisition. An HR1024 SVC spectrometer is used to measure the spectral data of each sample, and the total iron content of the hematite and magnetite is obtained from the chemical analysis center of Northeastern University, China.

Figure 5: The average testing classification correct percentage, plotted against the number of hidden neurons, for the ELM, TELM, MLELM, and MELM using (a) Mice Protein, (b) svmguide4, (c) vowel, (d) AutoUnix, and (e) Iris.

Figure 6: Average classification accuracy of the different hidden layer structures versus the number of hidden neurons: (a) training data; (b) testing data.

Our experimental datasets are the near-infrared spectra of the iron ores, that is, the absorption rates of the different iron ores under near-infrared illumination. The wavelength of the near-infrared spectrum is 300 nm-2500 nm, so the iron ore datasets we obtained have a very high dimension (973 dimensions). We use principal component analysis to reduce the dimension of the datasets.
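As a rough illustration of that preprocessing step (not the authors' pipeline), the 973-dimensional spectra can be projected onto a few principal components with scikit-learn; the file name and the number of retained components below are arbitrary placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical (n_samples, 973) matrix of near-infrared absorption spectra.
spectra = np.load("ore_spectra.npy")            # placeholder file name

pca = PCA(n_components=10)                      # retained dimension chosen arbitrarily
spectra_reduced = pca.fit_transform(spectra)    # low-dimensional inputs for ELM/TELM/MELM
print(pca.explained_variance_ratio_.sum())      # fraction of variance kept by the projection
```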

(A) Here we use the algorithm proposed in this paper to classify the five kinds of ores and compare with the results of the ELM algorithm and the TELM algorithm. The activation function of the algorithm in the classification problems is selected as the sigmoid function g(x) = 1/(1 + e^(−x)).

Figure 6 shows the average classification accuracy of the ELM, TELM, MLELM, and MELM (three-hidden-layer ELM) algorithms for different numbers of hidden layers. After conducting a series of experiments with different numbers of hidden layers in the MELM structure, we finally select the three-hidden-layer ELM structure, which obtains the optimal classification results for the MELM algorithm. Figure 6(a) shows the average training correct percentage, and the MELM has the highest result compared with the others for the same number of hidden neurons. Figure 6(b) shows the average testing correct percentage, and the MELM has the highest result for between 20 and 60 hidden neurons. In an actual situation, we can reasonably choose the number of neurons in each hidden layer to build the model.

(B) Here we use the method proposed in this paper to predict the total iron content of the hematite and magnetite and compare with the results of the ELM algorithm and the TELM algorithm. This is the prediction of the total iron content of hematite and magnetite with the optimal number of hidden layers and the optimal number of hidden layer neurons. The activation function of the algorithm in the regression problems is selected as the hyperbolic tangent function g(x) = (1 − e^(−x))/(1 + e^(−x)).

Figures 7 and 8, respectively, show the total iron content of the hematite and the magnetite predicted by the ELM, TELM, MLELM, and MELM algorithms (the MELM uses the eight-hidden-layer structure, which is optimal, in Figure 7 and the six-hidden-layer structure, which is optimal, in Figure 8). The optimal results of each algorithm are obtained by experimenting with different numbers of hidden layer neurons. In the prediction process for the hematite, we can see that the MELM has the better results, and each hidden layer of this structure has 100 hidden neurons. In the prediction process for the magnetite, we can see that the MELM has the better results, and each hidden layer of this structure has 20 hidden neurons. So the MELM algorithm has strong performance on the ore selection problems, and it can achieve the best adaptability to the problems by adjusting the number of hidden layers and the number of hidden layer neurons to improve the ability of system regression analysis.

Figure 7: The total iron content of the hematite: (a) training set; (b) testing set.

Figure 8: The total iron content of the magnetite: (a) training set; (b) testing set.

6. Conclusion

The MELM we propose in this paper solves two problems that exist in the training process of the original ELM. The first is the stability of a single network, a disadvantage that also influences the generalization performance. The second is that the output weights are calculated by the product of the generalized inverse of the hidden layer output and the actual system output, while the randomly selected parameters of the hidden layer neurons can lead to a singular or ill-conditioned matrix.

At the same time, the MELM network structure also improves the average accuracy of training and testing performance compared to the ELM and TELM network structures. The MELM algorithm inherits the characteristic of the traditional ELM that randomly initializes the weights and bias (between the input layer and the first hidden layer), adopts part of the TELM algorithm, and uses the inverse activation function to calculate the weights and bias of the hidden layers (except the first hidden layer). Then we make the actual hidden layer output approximate the expected hidden layer output and use the parameters obtained above to calculate the actual output. On the function regression problems, this algorithm reduces the least mean square error. On the dataset classification problems, the average accuracy of the multiple classifications is significantly higher than that of the ELM and TELM network structures. In such cases, the MELM is able to improve the performance of the network structure.

Conflicts of Interest

The authors declare no conflicts of interest


Acknowledgments

This research is supported by the National Natural Science Foundation of China (Grants nos. 41371437, 61473072, and 61203214), the Fundamental Research Funds for the Central Universities (N160404008), the National Twelfth Five-Year Plan for Science and Technology Support (2015BAB15B01), China, and the National Key Research and Development Plan (2016YFC0801602).

References

[1] K. Hornik, "Approximation capabilities of multilayer feedforward networks," Neural Networks, vol. 4, no. 2, pp. 251-257, 1991.

[2] M. Leshno, V. Y. Lin, A. Pinkus, and S. Schocken, "Multilayer feedforward networks with a nonpolynomial activation function can approximate any function," Neural Networks, vol. 6, no. 6, pp. 861-867, 1993.

[3] R. Hecht-Nielsen, "Theory of the backpropagation neural network," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '89), vol. 1, pp. 593-605, Washington, DC, USA, June 1989.

[4] H.-C. Hsin, C.-C. Li, M. Sun, and R. J. Sclabassi, "An adaptive training algorithm for back-propagation neural networks," IEEE Transactions on Systems, Man, and Cybernetics, vol. 25, no. 3, pp. 512-514, 1995.

[5] D. Lowe, "Adaptive radial basis function nonlinearities and the problem of generalisation," in Proceedings of the First IEE International Conference on Artificial Neural Networks (Conf. Publ. No. 313), pp. 171-175, 1989.

[6] S. Ding, G. Ma, and Z. Shi, "A rough RBF neural network based on weighted regularized extreme learning machine," Neural Processing Letters, vol. 40, no. 3, pp. 245-260, 2014.

[7] C. Sun, W. He, W. Ge, and C. Chang, "Adaptive neural network control of biped robots," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 2, pp. 315-326, 2017.

[8] S.-J. Wang, H.-L. Chen, W.-J. Yan, Y.-H. Chen, and X. Fu, "Face recognition and micro-expression recognition based on discriminant tensor subspace analysis plus extreme learning machine," Neural Processing Letters, vol. 39, no. 1, pp. 25-43, 2014.

[9] G. Huang, "An insight into extreme learning machines: random neurons, random features and kernels," Cognitive Computation, 2014.

[10] G. B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1-3, pp. 489-501, 2006.

[11] S. Ding, N. Zhang, X. Xu, L. Guo, and J. Zhang, "Deep extreme learning machine and its application in EEG classification," Mathematical Problems in Engineering, vol. 2015, Article ID 129021, 2015.

[12] B. Qu, B. Lang, J. Liang, A. Qin, and O. Crisalle, "Two-hidden-layer extreme learning machine for regression and classification," Neurocomputing, vol. 175, pp. 826-834, 2016.

[13] S. I. Tamura and M. Tateishi, "Capabilities of a four-layered feedforward neural network: four layers versus three," IEEE Transactions on Neural Networks and Learning Systems, vol. 8, no. 2, pp. 251-255, 1997.

[14] J. Zhao, Z. Wang, and D. S. Park, "Online sequential extreme learning machine with forgetting mechanism," Neurocomputing, vol. 87, no. 15, pp. 79-89, 2012.

[15] B. Mirza, Z. Lin, and K.-A. Toh, "Weighted online sequential extreme learning machine for class imbalance learning," Neural Processing Letters, vol. 38, no. 3, pp. 465-486, 2013.

[16] X. Liu, L. Wang, G. B. Huang, J. Zhang, and J. Yin, "Multiple kernel extreme learning machine," Neurocomputing, vol. 149, part A, pp. 253-264, 2015.

[17] Y. Lan, Y. C. Soh, and G. B. Huang, "Two-stage extreme learning machine for regression," Neurocomputing, vol. 73, no. 16-18, pp. 3028-3038, 2010.

[18] C.-T. Lin, M. Prasad, and A. Saxena, "An improved polynomial neural network classifier using real-coded genetic algorithm," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 45, no. 11, pp. 1389-1401, 2015.

[19] C. F. Stallmann and A. P. Engelbrecht, "Gramophone noise detection and reconstruction using time delay artificial neural networks," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 6, pp. 893-905, 2017.

[20] E. Cambria, G.-B. Huang, and L. L. C. Kasun, "Extreme learning machines," IEEE Intelligent Systems, vol. 28, no. 6, pp. 30-59, 2013.

[21] W. Zhu, J. Miao, L. Qing, and G.-B. Huang, "Hierarchical extreme learning machine for unsupervised representation learning," in Proceedings of the International Joint Conference on Neural Networks (IJCNN 2015), Ireland, July 2015.

[22] W. Yu, F. Zhuang, Q. He, and Z. Shi, "Learning deep representations via extreme learning machines," Neurocomputing, pp. 308-315, 2015.

[23] G.-B. Huang, "Learning capability and storage capacity of two-hidden-layer feedforward networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 14, no. 2, pp. 274-281, 2003.

[24] G.-B. Huang and H. A. Babri, "Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions," IEEE Transactions on Neural Networks and Learning Systems, vol. 9, no. 1, pp. 224-229, 1998.

[25] G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, "Extreme learning machine for regression and multiclass classification," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 2, pp. 513-529, 2012.

[26] J. Wei, H. Liu, G. Yan, and F. Sun, "Robotic grasping recognition using multi-modal deep extreme learning machine," Multidimensional Systems and Signal Processing, vol. 28, no. 3, pp. 817-833, 2016.

[27] J. J. Liang, A. K. Qin, P. N. Suganthan, and S. Baskar, "Comprehensive learning particle swarm optimizer for global optimization of multimodal functions," IEEE Transactions on Evolutionary Computation, vol. 10, no. 3, pp. 281-295, 2006.

[28] University of California, Irvine, Machine Learning Repository, http://archive.ics.uci.edu/ml.

[29] LIBSVM Data website, https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.


Page 2: A Multiple Hidden Layers Extreme Learning Machine Method and Its Applicationdownloads.hindawi.com/journals/mpe/2017/4670187.pdf · 2017-08-30 · ResearchArticle A Multiple Hidden

2 Mathematical Problems in Engineering

First consider the DELM with kernel based on ELM-AEalgorithm (DELM) presented in [11] which quotes the ELMautoencoder (ELM-AE) [20ndash22] as the learning algorithm ineach layer The DELM also has multilayer network structuredivided into two parts the first part uses the ELM-AEto deep learn the original data aiming at obtaining themost representative new data the second part calculates thenetwork parameters by using the Kernel ELM algorithmwitha three-layer structure (the output of the first part hiddenlayer and output layer) But for theMELMwedo not need thedata processing (such as extract the representative data fromthe original data) but make the actual output of the hiddenlayers more closer to the expected output of the hidden layersby calculating step by step in the multiple hidden layers

Next a two-hidden-layer feedforward network (TLFN)was proposed by Huang in 2003 [23] This article demon-strates that the TLFNs could learn arbitrary 119873 trainingsamples with a very small training error by employing2radic(119898 + 3)119873 hidden neurons [24] But the changing processof the TLFNs structure is very complicated First the TLFNhas a three-layer network with 119871 output neurons then adds2119871 neurons (two parts) to the hidden layer aiming at makingthe original output layer transform into the second hiddenlayer with 119871 hidden neurons and finally adds a output layerto the structure Eventually the final network structure hasone input layer two hidden layers and one output layer Butthe MELM has a relatively simple stable network structureand the simple calculation process and the MELM is a time-saving algorithm compared with TLFNs

In the paper we propose amultiple hidden layers extremelearning machine algorithm (MELM) that the MELM addssome hidden layers to the original ELM network structurerandomly initializes the weights between the input layer andthe first hidden layer as well as the bias of the first hiddenlayer utilizes the method (make the actual each hidden layeroutput approach the expected each hidden layer output) tocalculate the parameters of the hidden layers (except the firsthidden layer) and finally uses the least square method tocalculate the output weights of the network In the followingchapters we have carried out many experiments with theideas proposed The MELM experimental results on regres-sion problems and some popular classification problems haveshown satisfactory advantages in terms of average accuracycompared to other ELM variants Our experiments also studythe effect of different numbers of the hidden layer neuronsthe compatible activation function and the different numbersof the hidden layers on the same problems

The rest of this paper is organized as follows Section 2reviews the original ELM Section 3 presents the methodand framework structure of two-hidden-layer ELM Section 4presents the proposed the MELM technique multihidden-layer ELM Section 5 reports and analyzes experimentalresults Section 6 presents the conclusions

2 Extreme Learning Machine

The extreme learning machine (ELM) proposed by HuangGB aims at avoiding time-costing iterative training process

middot middot middot

middot middot middot

Y

X

e outputlayer

e hiddenlayer

e inputlayer

H = g(wX + b)

middot middot middot

Figure 1 The structure of the ELM

and improving the generalization performance [8ndash10 25] Asa single-hidden-layer feedforward neural networks (SLFNs)the ELM structure includes input layer hidden layer andoutput layer Different from the traditional neural networklearning algorithms (such as BP algorithm) randomly settingall the network training parameters and easily generatinglocal optimal solution the ELM only sets the number ofhidden neurons of the network randomizes the weightsbetween the input layer and the hidden layer aswell as the biasof the hidden neurons in the algorithm execution processcalculates the hidden layer output matrix and finally obtainsthe weight between the hidden layer and the output layer byusing the Moore-Penrose pseudoinverse under the criterionof least-squares method Because the ELM has the simplenetwork structure and the concise parameters computationprocesses so the ELM has the advantages of fast learningspeedThe original structure of ELM is expressed in Figure 1

Figure 1 is the extreme learning machine network struc-ture which includes 119899 input layer neurons 119897 hidden layerneurons and 119898 output layer neurons First consider thetraining sample 119883Y = 119909119894 119910119894 (119894 = 1 2 Q) and there isan input feature 119883 = [1199091198941 1199091198942 sdot sdot sdot 119909119894119876] and a desired matrix119884 = [1199101198951 1199101198952 sdot sdot sdot 119910119895119876] comprised of the training sampleswhere the matrix 119883 and the matrix 119884 can be expressed asfollows

119883 =[[[[[[[[

11990911 11990912 sdot sdot sdot 119909111987611990921 11990922 sdot sdot sdot 1199092119876 d

1199091198991 1199091198992 sdot sdot sdot 119909119899119876

]]]]]]]]

119884 =[[[[[[[[

11991011 11991012 sdot sdot sdot 11991011989811987611991021 11991022 sdot sdot sdot 119910119898119876 d

1199101198981 1199101198982 sdot sdot sdot 119910119898119876

]]]]]]]]

(1)

Mathematical Problems in Engineering 3

where the parameters 119899 and 119898 are the dimension of inputmatrix and output matrix

Then the ELM randomly sets the weights between theinput layer and the hidden layer

119908 =[[[[[[[[

11990811 11990812 sdot sdot sdot 119908111989911990821 11990822 sdot sdot sdot 1199082119899 d

1199081198971 1199081198972 sdot sdot sdot 119908119897119899

]]]]]]]]

(2)

where 119908119894119895 represents the weights between the 119895th input layerneuron and 119894th hidden layer neuron

Third the ELM assumes the weights between the hiddenlayer and the output layer that can be expressed as follows

120573 =[[[[[[[

12057311 12057312 sdot sdot sdot 120573111989812057321 12057322 sdot sdot sdot 1205732119898 d

1205731198971 1205731198972 sdot sdot sdot 120573119897119898

]]]]]]] (3)

where 120573119895119896 represents the weights between the 119895th hiddenlayer neuron and 119896th output layer neuron

Fourth the ELM randomly sets the bias of the hiddenlayer neurons

119861 = [1198871 1198872 sdot sdot sdot 119887119899]119879 (4)

Fifth the ELM chooses the network activation function 119892(119909)According to Figure 1 the output matrix 119879 can be expressedas follows

119879 = [1199051 1199052 119905119876]119898times119876 (5)

Each column vector of the output matrix 119879 is as follows

119905119895 =[[[[[[[[

11990511198951199052119895

119905119898119895

]]]]]]]]

=

[[[[[[[[[[[[[[[

119897sum119894=1

1205731198941119892 (119908119894119909119895 + 119887119894)119897sum119894=1

1205731198942119892 (119908119894119909119895 + 119887119894)

119897sum119894=1

120573119894119898119892 (119908119894119909119895 + 119887119894)

]]]]]]]]]]]]]]](119895 = 1 2 3 119876)

(6)

Sixth consider formulae (5) and (6) and we can get

119867120573 = 1198791015840 (7)

where 1198791015840 is the transpose of 119879 and 119867 is the output of thehidden layer In order to obtain the unique solution withminimum-error we use least square method to calculate theweight matrix values of 120573 [8 9]

120573 = 119867+1198791015840 (8)

W

B

X

H

B1

W1H2

f(x)

Figure 2 The workflow of the TELM

To improve the generalization ability of network andmake the results more stable we add a regularization termto the 120573 [26] When the number of hidden layer neurons isless than the number of training samples 120573 can be expressedas

120573 = ( 119868120582 + 119867119879119867)minus11198671198791198791015840 (9)

When the number of hidden layer nodes is more than thenumber of training samples 120573 can be expressed as

120573 = 119867119879 ( 119868120582 + 119867119867119879)minus1 1198791015840 (10)

3 Two-Hidden-Layer ELM

In 2016 B Y Qu and B F Lang proposed the two-hidden-layer extreme learning machine (TELM) in [11] The TELMtries to make the actual hidden layers output approachexpected hidden layer outputs by adding parameter settingstep for the second hidden layer Finally the TELM finds abetter way to map the relationship between input and outputsignals which is the two-hidden-layer ELM The networkstructure of TELM includes one input layer two hiddenlayers one output layer and each hidden layer with 119897 hiddenneurons The activation function of the network is selectedfor 119892(119909)

The focus of the TELM algorithm is the process of calculating and updating the weights between the first hidden layer and the second hidden layer, the bias of the hidden layer, and the output weights between the second hidden layer and the output layer. The workflow of the TELM architecture is depicted in Figure 2.

Consider the training sample dataset $\{X, T\} = \{x_i, t_i\}\ (i = 1, 2, 3, \ldots, Q)$, where the matrix $X$ is the input samples and $T$ is the labeled samples.

The TELM first treats the two hidden layers as one hidden layer, so the output of the hidden layer can be expressed as $H = g(WX + B)$, with the weight $W$ and bias $B$ of the first hidden layer randomly initialized. Next, the output weight matrix $\beta$ between the second hidden layer and the output layer can be obtained by using

$$\beta = H^{+} T \quad (11)$$

Now the TELM separates the two hidden layers merged previously, so the network has two hidden layers. According to the workflow of Figure 2, the output of the second hidden layer can be obtained as follows:

$$g(W_1 H + B_1) = H_1 \quad (12)$$


where $W_1$ is the weight matrix between the first hidden layer and the second hidden layer, $H$ is the output matrix of the first hidden layer, $B_1$ is the bias of the second hidden layer, and $H_1$ is the expected output of the second hidden layer.

Here, the expected output of the second hidden layer can be obtained by calculating

$$H_1 = T\beta^{+} \quad (13)$$

where $\beta^{+}$ is the generalized inverse of the matrix $\beta$. Now the TELM defines the matrix $W_{HE} = [B_1\ W_1]$,

so the parameters of the second hidden layer can be easily obtained by using formula (12) and the inverse function of the activation function:

$$W_{HE} = g^{-1}(H_1)\, H_E^{+} \quad (14)$$

where $H_E^{+}$ is the generalized inverse of $H_E = [\mathbf{1}\ H]^{T}$, $\mathbf{1}$ denotes a one-column vector of size $Q$ whose elements are the scalar unit 1, and the notation $g^{-1}(x)$ indicates the inverse of the activation function $g(x)$.

By selecting an appropriate activation function $g(x)$, the TELM calculates (14), so the actual output of the second hidden layer is updated as follows:

$$H_2 = g(W_{HE} H_E) \quad (15)$$

So the weight matrix $\beta$ between the second hidden layer and the output layer is updated as follows:

$$\beta_{\mathrm{new}} = H_2^{+} T \quad (16)$$

where $H_2^{+}$ is the generalized inverse of $H_2$, so the actual output of the TELM network can be expressed as

$$f(x) = H_2 \beta_{\mathrm{new}} \quad (17)$$

To sum up, the TELM algorithm process can be expressed as follows.

Algorithm 1.
(1) Assume the training sample dataset is $\{X, T\} = \{x_i, t_i\}\ (i = 1, 2, 3, \ldots, Q)$, where the matrix $X$ is the input samples and the matrix $T$ is the labeled samples; each hidden layer has $l$ hidden neurons, and the activation function is $g(x)$.
(2) Randomly generate the weights $W$ between the input layer and the first hidden layer and the bias $B$ of the first hidden neurons: $W_{IE} = [B\ W]$, $X_E = [\mathbf{1}\ X]^{T}$.
(3) Calculate $H = g(W_{IE} X_E)$.
(4) Obtain the weights between the second hidden layer and the output layer: $\beta = H^{+} T$.
(5) Calculate the expected output of the second hidden layer: $H_1 = T\beta^{+}$.
(6) According to formulae (12)–(14) and steps (4)-(5), calculate the weights $W_1$ between the first hidden layer and the second hidden layer and the bias $B_1$ of the second hidden neurons: $W_{HE} = g^{-1}(H_1) H_E^{+}$.
(7) Obtain and update the actual output of the second hidden layer: $H_2 = g(W_{HE} H_E)$.
(8) Update the weight matrix $\beta$ between the second hidden layer and the output layer: $\beta_{\mathrm{new}} = H_2^{+} T$.
(9) Calculate the output of the network: $f(x) = g[W_1 g(WX + B) + B_1]\beta_{\mathrm{new}}$.
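As a companion to Algorithm 1, the following NumPy sketch walks through the same steps under a samples-as-rows convention with a logistic sigmoid activation; the clipping inside the inverse activation is our own assumption, added only to keep $g^{-1}$ finite, and the uniform initialization range is likewise assumed.

```python
import numpy as np

def g(x):
    return 1.0 / (1.0 + np.exp(-x))           # logistic sigmoid activation

def g_inv(y):
    y = np.clip(y, 1e-6, 1.0 - 1e-6)          # assumed clipping so the inverse stays finite
    return np.log(y / (1.0 - y))

def telm_train(X, T, n_hidden, rng=np.random.default_rng(0)):
    """TELM training sketch (steps (1)-(8) of Algorithm 1), samples as rows."""
    Q, n = X.shape
    W = rng.uniform(-1.0, 1.0, size=(n_hidden, n))
    B = rng.uniform(-1.0, 1.0, size=(n_hidden,))
    W_IE = np.hstack([B[:, None], W])                  # step (2): [B W]
    X_E = np.hstack([np.ones((Q, 1)), X])              # [1 X]
    H = g(X_E @ W_IE.T)                                # step (3)
    beta = np.linalg.pinv(H) @ T                       # step (4)
    H1 = T @ np.linalg.pinv(beta)                      # step (5): expected second-layer output
    H_E = np.hstack([np.ones((Q, 1)), H])              # [1 H]
    W_HE = (np.linalg.pinv(H_E) @ g_inv(H1)).T         # step (6): W_HE = g^-1(H1) H_E^+
    H2 = g(H_E @ W_HE.T)                               # step (7): actual second-layer output
    beta_new = np.linalg.pinv(H2) @ T                  # step (8)
    return W_IE, W_HE, beta_new

def telm_predict(X, W_IE, W_HE, beta_new):
    """Step (9): propagate new inputs through both hidden layers."""
    Q = X.shape[0]
    H = g(np.hstack([np.ones((Q, 1)), X]) @ W_IE.T)
    H2 = g(np.hstack([np.ones((Q, 1)), H]) @ W_HE.T)
    return H2 @ beta_new
```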

Figure 3: The structure of the three-hidden-layer ELM, showing the input layer, the first, second, and third hidden layers, and the output layer $f(x)$, with layer outputs $H = g(W_{IE}X_E)$, $H_2 = g(W_{HE}H_E)$, and $H_4 = g(W_{HE1}H_{E1})$.

Figure 4: The workflow of the three-hidden-layer ELM.

4. Multihidden-Layer ELM

At present, some scholars have proposed many improvements on the ELM algorithm and structure and have obtained some achievements and advantages. Such advantages of the neural network motivate us to explore better ideas behind the ELM. Based on the above articles, we adjust the structure of the ELM neural network and propose an algorithm named the multiple hidden layers extreme learning machine (MELM). The structure of the MELM (taking the three-hidden-layer ELM as an example) is illustrated in Figure 3. The workflow of the three-hidden-layer ELM is illustrated in Figure 4.

Here we take a three-hidden-layer ELM as an example and analyze the MELM algorithm below. First, the training samples $\{X, T\} = \{x_i, t_i\}\ (i = 1, 2, 3, \ldots, Q)$ and the three-hidden-layer network structure (each of the three hidden layers has $l$ hidden neurons) with the activation function $g(x)$ are given. The structure of the three-hidden-layer ELM has an input layer, three hidden layers, and an output layer. According to the theory of the TELM algorithm, we now treat the three hidden layers as two hidden layers (the first hidden layer is still the first hidden layer, but the second and third hidden layers are put together as one hidden layer), so that the structure of the network is the same as the TELM network mentioned above. So we can obtain the output weight matrix between the second hidden layer and the output layer. According to the number of actual samples, we can use formula (9) or formula (10) to calculate the weights $\beta$, which can improve the generalization ability of the network.

Then the MELM separates the three hidden layers merged previously, so the MELM structure has three hidden layers. The expected output of the third hidden layer can be expressed as follows:

$$H_3 = T\beta_{\mathrm{new}}^{+} \quad (18)$$

where $\beta_{\mathrm{new}}^{+}$ is the generalized inverse of the weight matrix $\beta_{\mathrm{new}}$. Third, the MELM defines the matrix $W_{HE1} = [B_2\ W_2]$, so the parameters of the third hidden layer can be easily obtained by calculating formula (18) and the formula $H_3 = g(H_2 W_2 + B_2) = g(W_{HE1} H_{E1})$:

$$W_{HE1} = g^{-1}(H_3)\, H_{E1}^{+} \quad (19)$$

where $H_2$ is the actual output of the second hidden layer, $W_2$ is the weights between the second hidden layer and the third hidden layer, $B_2$ is the bias of the third hidden neurons, and $H_{E1}^{+}$ is the generalized inverse of $H_{E1} = [\mathbf{1}\ H_2]^{T}$; $\mathbf{1}$ denotes a one-column vector of size $Q$ whose elements are the scalar unit 1. The notation $g^{-1}(x)$ indicates the inverse of the activation function $g(x)$.

In order to test the performance of the proposed MELM algorithm, we adopt different activation functions for the regression and classification experiments. Generally, we adopt the logistic sigmoid function $g(x) = 1/(1 + e^{-x})$. The actual output of the third hidden layer is calculated as follows:

$$H_4 = g(W_{HE1} H_{E1}) \quad (20)$$

Finally, the output weight matrix $\beta_{\mathrm{new}}$ between the third hidden layer and the output layer is calculated as follows. When the number of hidden-layer neurons is less than the number of training samples, $\beta_{\mathrm{new}}$ can be expressed as follows:

$$\beta_{\mathrm{new}} = \left(\frac{I}{\lambda} + H_4^{T}H_4\right)^{-1} H_4^{T} T \quad (21)$$

When the number of hidden-layer neurons is more than the number of training samples, $\beta_{\mathrm{new}}$ can be expressed as follows:

$$\beta_{\mathrm{new}} = H_4^{T}\left(\frac{I}{\lambda} + H_4 H_4^{T}\right)^{-1} T \quad (22)$$

The actual output of the three-hidden-layer ELM network can be expressed as follows:

$$f(x) = H_4 \beta_{\mathrm{new}} \quad (23)$$
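The per-layer cycle in formulae (18)–(22) can be written as a loop. The sketch below is our reading of it under the samples-as-rows convention, with an assumed regularization coefficient $\lambda$ and an assumed sigmoid activation; it starts from the actual output $H$ of the first hidden layer (computed as in the ELM/TELM sketches above) and appends any number of further hidden layers.

```python
import numpy as np

def melm_extra_layers(H, T, n_extra_layers, lam=1e3):
    """Cycle of formulae (18)-(22): append extra hidden layers to a trained ELM.

    H: (Q, l) actual output of the first hidden layer (samples as rows),
    T: (Q, m) targets, lam: assumed regularization coefficient lambda.
    Returns the per-layer parameters and the final output weights.
    """
    g = lambda x: 1.0 / (1.0 + np.exp(-x))
    g_inv = lambda y: np.log(np.clip(y, 1e-6, 1 - 1e-6)
                             / (1.0 - np.clip(y, 1e-6, 1 - 1e-6)))
    Q, l = H.shape
    # regularized output weights for the current network, cf. formula (9)/(21)
    beta = np.linalg.solve(np.eye(l) / lam + H.T @ H, H.T @ T)
    layer_params = []
    for _ in range(n_extra_layers):
        H_expected = T @ np.linalg.pinv(beta)                # formula (18)
        H_E = np.hstack([np.ones((Q, 1)), H])                # [1 H]
        W_HE = (np.linalg.pinv(H_E) @ g_inv(H_expected)).T   # formula (19)
        H = g(H_E @ W_HE.T)                                  # formula (20), actual layer output
        beta = np.linalg.solve(np.eye(l) / lam + H.T @ H, H.T @ T)  # formula (21)
        layer_params.append(W_HE)
    return layer_params, beta
```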

To ensure that the actual final hidden output approaches the expected hidden output more closely during the training process, the operation process is the optimization of the network structure parameters starting from the second hidden layer.

The above is the parameter calculation process of the three-hidden-layer ELM network, but the purpose of this paper is to calculate the parameters of the multiple-hidden-layer ELM network and the final output of the MELM network structure. We can use a cyclic calculation to describe the calculating process of the MELM. When a four-hidden-layer ELM network is used, we recalculate formula (18) to formula (22) in the calculation process of the network, obtain and record the parameters of each hidden layer, and finally calculate the final output of the MELM network. If the number of hidden layers increases, the calculation process can be repeated in the same way. The calculation process of the MELM network can be described as follows.

Algorithm 2.
(1) Assume the training sample dataset is $\{X, T\} = \{x_i, t_i\}\ (i = 1, 2, 3, \ldots, Q)$, where the matrix $X$ is the input samples and the matrix $T$ is the labeled samples. Each hidden layer has $l$ hidden neurons with the activation function $g(x)$.
(2) Randomly initialize the weights $W$ between the input layer and the first hidden layer as well as the bias $B$ of the first hidden neurons: $W_{IE} = [B\ W]$, $X_E = [\mathbf{1}\ X]^{T}$.
(3) Calculate $H = g(W_{IE} X_E)$.
(4) Calculate the weights between the hidden layer and the output layer: $\beta = (I/\lambda + H^{T}H)^{-1} H^{T} T$ or $\beta = H^{T}(I/\lambda + HH^{T})^{-1} T$.
(5) Calculate the expected output of the second hidden layer: $H_1 = T\beta^{+}$.
(6) According to formulae (12)–(14) and steps (4)-(5), calculate the weights $W_1$ between the first hidden layer and the second hidden layer and the bias $B_1$ of the second hidden neurons: $W_{HE} = g^{-1}(H_1) H_E^{+}$.
(7) Obtain and update the actual output of the second hidden layer: $H_2 = g(W_{HE} H_E)$.
(8) Update the weight matrix $\beta$ between the hidden layer and the output layer: $\beta_{\mathrm{new}} = (I/\lambda + H_2^{T}H_2)^{-1} H_2^{T} T$ or $\beta_{\mathrm{new}} = H_2^{T}(I/\lambda + H_2 H_2^{T})^{-1} T$.
(9) If the number of hidden layers is three, we can calculate the parameters by cyclically executing the above operations from step (5) to step (9); for the next cycle, $\beta_{\mathrm{new}} = \beta$ and $H_E = [\mathbf{1}\ H_2]^{T}$.
(10) Calculate the output $f(x) = H_2 \beta_{\mathrm{new}}$.
If the number $K$ of hidden layers is more than three, steps (5) to (9) are repeated $(K - 1)$ times. All the $H$ matrices ($H_1$, $H_2$, ...) must be normalized into the range $[-0.9, 0.9]$ when the maximum of the matrix is more than 1 and the minimum of the matrix is less than $-1$.

5. Application

In order to verify the actual effect of the algorithm proposed in this paper, we have done the following experiments, which are divided into three parts: regression problems, classification problems, and an application to the mineral selection industry. All the experiments are conducted in the MATLAB 2010b computational environment running on a computer with a 2.302 GHz Intel i3 CPU.


Table 1: The RMSE of the three function examples.

f1(x):
  TELM: training RMSE 8.6233E-9, testing RMSE 1.0797E-8
  MELM (three-hidden-layer): training RMSE 8.722E-15, testing RMSE 1.3055E-14
  MLELM: training RMSE 1.8428E-6, testing RMSE 1.5204E-5
f2(x):
  TELM: training RMSE 0.2867, testing RMSE 0.2775
  MELM (three-hidden-layer): training RMSE 0.0011, testing RMSE 0.0019
  MLELM: training RMSE 0.1060, testing RMSE 0.1098
f3(x):
  TELM: training RMSE 0.3528, testing RMSE 0.6204
  MELM (three-hidden-layer): training RMSE 0.2110, testing RMSE 0.4177
  MLELM: training RMSE 0.5538, testing RMSE 0.4591

5.1. Regression Problems. To test the performance on regression problems, several widely used functions are listed below [27]. We use these functions to generate datasets: sufficient training samples are randomly selected and the remainder is used as testing samples, and the activation function is selected as the hyperbolic tangent function $g(x) = (1 - e^{-x})/(1 + e^{-x})$.

(1) $f_1(x) = \sum_{i=1}^{D} x_i$
(2) $f_2(x) = \sum_{i=1}^{D-1} \left(100(x_i^2 - x_{i+1})^2 + (x_i - 1)^2\right)$
(3) $f_3(x) = -20 e^{-0.2\sqrt{\sum_{i=1}^{D} x_i^2 / D}} - e^{\sum_{i=1}^{D} \cos(2\pi x_i)/D} + 20$

The symbol $D$, which is set as a positive integer, represents the dimension of the function we are using. The functions $f_2(x)$ and $f_3(x)$ are complex multimodal functions. Each function experiment has 900 training samples and 100 testing samples. The evaluation criterion of the experiments is the average accuracy, namely, the root mean square error (RMSE), in the regression problems. We first conduct some experiments to verify the average accuracy of the neural network structure for different numbers of hidden neurons and hidden layers. Then we can observe the optimal neural network structure, with a specific number of hidden layers and hidden neurons, compared with the results of the TELM on regression problems.
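To make the regression setup concrete, the three benchmark functions and the RMSE criterion can be generated as below; the input dimension $D$ and the sampling range are assumptions made for illustration, since the paper does not state them here.

```python
import numpy as np

def f1(X):
    # f1(x) = sum_i x_i
    return X.sum(axis=1)

def f2(X):
    # f2(x) = sum_i (100 (x_i^2 - x_{i+1})^2 + (x_i - 1)^2), Rosenbrock-type
    return (100.0 * (X[:, :-1] ** 2 - X[:, 1:]) ** 2
            + (X[:, :-1] - 1.0) ** 2).sum(axis=1)

def f3(X):
    # f3(x) = -20 exp(-0.2 sqrt(sum x_i^2 / D)) - exp(sum cos(2 pi x_i) / D) + 20
    D = X.shape[1]
    return (-20.0 * np.exp(-0.2 * np.sqrt((X ** 2).sum(axis=1) / D))
            - np.exp(np.cos(2.0 * np.pi * X).sum(axis=1) / D) + 20.0)

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# 900 training and 100 testing samples; D = 10 and the range [-1, 1] are assumed
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(1000, 10))
y = f3(X)[:, None]
X_train, y_train, X_test, y_test = X[:900], y[:900], X[900:], y[900:]
```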

In Table 1 we give the average RMSE of the above three functions for the two-hidden-layer structure (TELM), the MLELM, and the three-hidden-layer structure on regression problems (we carried out a series of experiments including four-hidden-layer, five-hidden-layer, and eight-hidden-layer structures, but the experiments showed that the three-hidden-layer structure achieves the best results for the three functions $f_1(x)$, $f_2(x)$, $f_3(x)$, so we only give the RMSE values of the three-hidden-layer structure). The table includes the testing RMSE and the training RMSE of the three algorithm structures. We can see that the MELM (three-hidden-layer) gives the better results.

Table 2: The datasets for the classification.

Dataset        Training samples   Testing samples   Attributes   Classes
Mice Protein   750                330               82           8
svmguide4      300                312               10           6
vowel          568                422               10           11
AutoUnix       380                120               101          4
Iris           100                50                4            3

5.2. Classification Problems. To test the performance of the proposed MELM algorithm on classification datasets, we use several datasets (Mice Protein, svmguide4, vowel, AutoUnix, and Iris) collected from the University of California, Irvine repository [28] and the LIBSVM website [29]. The training data and testing data of each experiment are randomly selected from the original datasets. The information of these datasets is given in Table 2.

The classification performance criterion is the average classification accuracy on the testing data. In Figure 5, the average testing classification correct percentage of the ELM, TELM, MLELM, and MELM algorithms is shown for (a) Mice Protein, (b) svmguide4, (c) vowel, (d) AutoUnix, and (e) Iris; the different datasets have an important effect on the classification accuracy of the four algorithms (ELM, TELM, MLELM, and MELM). But from the changing trend of the average testing correct percentage of the algorithms on the same datasets, we can see that the MELM algorithm can select the optimal number of hidden layers for the network structure to adapt to the specific datasets, aiming at obtaining better results. So the Mice Protein and Forest type mapping datasets use the three-hidden-layer ELM algorithm in the experiments, and the svmguide4 dataset uses the five-hidden-layer ELM algorithm.
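Since the network produces a real-valued output vector per sample, a common way to obtain this accuracy is to one-hot encode the labels for training and take the index of the largest output as the predicted class; the helpers below illustrate that convention, which we assume here rather than take from the paper.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Encode integer class labels (0..n_classes-1) as one-hot target rows."""
    T = np.zeros((labels.size, n_classes))
    T[np.arange(labels.size), labels] = 1.0
    return T

def classification_accuracy(net_output, labels):
    """Average classification accuracy (%): predicted class = argmax of the output."""
    predicted = np.argmax(net_output, axis=1)
    return 100.0 * np.mean(predicted == labels)
```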

5.3. The Application in the Ores Selection Industry. In this part we use actual datasets to analyze and experiment with the model we have established and to test the performance of the MELM algorithm. First, our datasets come from the AnQian Mining Group, namely, the mining areas of Ya-ba-ling and Xi-da-bei. The samples we selected are hematite (50 samples), magnetite (90 samples), granite (57 samples), phyllite (15 samples), and chlorite (32 samples). For the precise classification of the ores and the prediction of the total iron content of iron ores, people usually use the chemical analysis method. But the analysis of total iron content is a complex and long process, and sometimes the color of the solution shows no significant change at the end. Due to the extensive use of near-infrared spectroscopy for chemical composition analysis, we can infer the structure of an unknown substance according to the position and shape of peaks in the absorption spectra. So we can use this theory to obtain correct information about the ores with a low influence of the environment on data acquisition. An HR1024 SVC spectrometer is used to measure the spectral data of each sample, and the total iron content of the hematite and magnetite samples is obtained from the chemical analysis center of Northeastern University of China.

Figure 5: The average testing classification correct percentage (%) versus the number of hidden neurons for the ELM, TELM, MLELM, and MELM using (a) Mice Protein, (b) svmguide4, (c) vowel, (d) AutoUnix, and (e) Iris.

Figure 6: Average classification accuracy of different hidden layers versus the number of hidden neurons: (a) training data; (b) testing data.

Our experiments' datasets are the near-infrared spectrum information of the iron ores, namely, the absorption rate of different iron ores under near-infrared illumination. The wavelength range of the near-infrared spectrum is 300 nm–2500 nm, so the iron-ore datasets we obtained have very high dimensionality (973 dimensions). We use principal component analysis to reduce the dimensionality of the datasets.
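A simple SVD-based principal component projection, sketched below, is one way to carry out this reduction; the number of retained components is an assumption, since the paper does not state it.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project samples onto the first principal components (SVD-based sketch).

    X: (Q, 973) near-infrared absorption spectra, one sample per row.
    """
    X_centered = X - X.mean(axis=0)
    # economy-size SVD; rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

# e.g. reduce the 973-dimensional spectra to an assumed 10 components:
# X_reduced = pca_reduce(X_spectra, n_components=10)
```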

(A) Here we use the algorithm proposed in this paper to classify the five kinds of ores and compare it with the results of the ELM algorithm and the TELM algorithm. The activation function of the algorithm in classification problems is selected as the sigmoid function $g(x) = 1/(1 + e^{-x})$.

Figure 6 shows the average classification accuracy of the ELM, TELM, MLELM, and MELM (three-hidden-layer ELM) algorithms with different numbers of hidden layers. After conducting a series of experiments with different numbers of hidden layers in the MELM structure, we finally select the three-hidden-layer ELM structure, which obtains the optimal classification results for the MELM algorithm. Figure 6(a) shows the average training correct percentage, where the MELM achieves the highest result compared with the others for the same number of hidden neurons. Figure 6(b) shows the average testing correct percentage, where the MELM achieves the highest result for between 20 and 60 hidden neurons. In an actual situation we can reasonably choose the number of hidden neurons per layer to build the model.

(B) Here we use the method proposed in this paper to predict the total iron content of the hematite and magnetite and compare it with the results of the ELM algorithm and the TELM algorithm. This is the prediction of the total iron content of hematite and magnetite with the optimal number of hidden layers and the optimal number of hidden-layer neurons. The activation function of the algorithm in regression problems is selected as the hyperbolic tangent function $g(x) = (1 - e^{-x})/(1 + e^{-x})$.
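For the regression experiments, this hyperbolic-tangent-type activation needs an inverse when the MELM layer parameters are computed; a small sketch of the pair is given below, using the $[-0.9, 0.9]$ rescaling mentioned in Algorithm 2 as an assumed safeguard so the logarithm stays finite.

```python
import numpy as np

def g_tanh(x):
    # g(x) = (1 - e^{-x}) / (1 + e^{-x}), i.e. tanh(x / 2)
    return (1.0 - np.exp(-x)) / (1.0 + np.exp(-x))

def g_tanh_inv(y):
    # inverse of g: x = ln((1 + y) / (1 - y)); outputs are first rescaled into
    # [-0.9, 0.9] (as suggested in Algorithm 2) so the logarithm stays finite
    m = np.max(np.abs(y))
    if m > 1.0:
        y = 0.9 * y / m
    y = np.clip(y, -0.9, 0.9)
    return np.log((1.0 + y) / (1.0 - y))
```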

Figures 7 and 8, respectively, show the total iron content of the hematite and the magnetite predicted by the ELM, TELM, MLELM, and MELM algorithms (the MELM uses the eight-hidden-layer structure, which is the optimal structure in Figure 7, and the six-hidden-layer structure, which is the optimal structure in Figure 8). The optimal results of each algorithm are obtained by repeatedly experimenting with different numbers of hidden-layer neurons. In the prediction of the hematite, the MELM has the better results, and each hidden layer of this structure has 100 hidden neurons. In the prediction of the magnetite, the MELM has the better results, and each hidden layer of this structure has 20 hidden neurons. So the MELM algorithm has strong performance on the ore selection problems; it can achieve the best adaptability to the problems by adjusting the number of hidden layers and the number of hidden-layer neurons to improve the ability of system regression analysis.

Figure 7: The total iron content of the hematite (expected output versus the ELM, TELM, MLELM, and eight-hidden-layer MELM): (a) train set output; (b) test set output.

Figure 8: The total iron content of the magnetite (expected output versus the ELM, TELM, MLELM, and six-hidden-layer MELM): (a) train set output; (b) test set output.

6. Conclusion

The MELM we proposed in this paper solves two problems which exist in the training process of the original ELM. The first is the stability of a single network, a disadvantage which also influences the generalization performance. The second is that the output weights are calculated by the product of the generalized inverse of the hidden-layer output and the system's actual output, and the randomly selected parameters of the hidden-layer neurons can lead to a singular or ill-conditioned matrix. At the same time, the MELM

network structure also improves the average accuracy of training and testing performance compared to the ELM and TELM network structures. The MELM algorithm inherits the characteristic of the traditional ELM that randomly initializes the weights and bias (between the input layer and the first hidden layer); it also adopts a part of the TELM algorithm and uses the inverse activation function to calculate the weights and bias of the hidden layers (except the first hidden layer). Then we make the actual hidden-layer output approximate the expected hidden-layer output and use the parameters

obtained above to calculate the actual output. In the function regression problems, this algorithm reduces the least mean square error. In the dataset classification problems, the average accuracy of the multiple classifications is significantly higher than that of the ELM and TELM network structures. In such cases, the MELM is able to improve the performance of the network structure.

Conflicts of Interest

The authors declare no conflicts of interest


Acknowledgments

This research is supported by the National Natural Science Foundation of China (Grants nos. 41371437, 61473072, and 61203214), the Fundamental Research Funds for the Central Universities (N160404008), the National Twelfth Five-Year Plan for Science and Technology Support (2015BAB15B01), China, and the National Key Research and Development Plan (2016YFC0801602).

References

[1] K. Hornik, "Approximation capabilities of multilayer feedforward networks," Neural Networks, vol. 4, no. 2, pp. 251–257, 1991.
[2] M. Leshno, V. Y. Lin, A. Pinkus, and S. Schocken, "Multilayer feedforward networks with a nonpolynomial activation function can approximate any function," Neural Networks, vol. 6, no. 6, pp. 861–867, 1993.
[3] R. Hecht-Nielsen, "Theory of the backpropagation neural network," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '89), vol. 1, pp. 593–605, Washington, DC, USA, June 1989.
[4] H.-C. Hsin, C.-C. Li, M. Sun, and R. J. Sclabassi, "An adaptive training algorithm for back-propagation neural networks," IEEE Transactions on Systems, Man and Cybernetics, vol. 25, no. 3, pp. 512–514, 1995.
[5] D. Lowe, "Adaptive radial basis function nonlinearities and the problem of generalization," in First IEE International Conference on Artificial Neural Networks, pp. 313–171, 1989.
[6] S. Ding, G. Ma, and Z. Shi, "A rough RBF neural network based on weighted regularized extreme learning machine," Neural Processing Letters, vol. 40, no. 3, pp. 245–260, 2014.
[7] C. Sun, W. He, W. Ge, and C. Chang, "Adaptive neural network control of biped robots," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 2, pp. 315–326, 2017.
[8] S.-J. Wang, H.-L. Chen, W.-J. Yan, Y.-H. Chen, and X. Fu, "Face recognition and micro-expression recognition based on discriminant tensor subspace analysis plus extreme learning machine," Neural Processing Letters, vol. 39, no. 1, pp. 25–43, 2014.
[9] G. Huang, "An insight into extreme learning machines: random neurons, random features and kernels," Cognitive Computation, 2014.
[10] G. B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.
[11] S. Ding, N. Zhang, X. Xu, L. Guo, and J. Zhang, "Deep extreme learning machine and its application in EEG classification," Mathematical Problems in Engineering, vol. 2015, Article ID 129021, 2015.
[12] B. Qu, B. Lang, J. Liang, A. Qin, and O. Crisalle, "Two-hidden-layer extreme learning machine for regression and classification," Neurocomputing, vol. 175, pp. 826–834, 2016.
[13] S. I. Tamura and M. Tateishi, "Capabilities of a four-layered feedforward neural network: four layers versus three," IEEE Transactions on Neural Networks and Learning Systems, vol. 8, no. 2, pp. 251–255, 1997.
[14] J. Zhao, Z. Wang, and D. S. Park, "Online sequential extreme learning machine with forgetting mechanism," Neurocomputing, vol. 87, no. 15, pp. 79–89, 2012.
[15] B. Mirza, Z. Lin, and K.-A. Toh, "Weighted online sequential extreme learning machine for class imbalance learning," Neural Processing Letters, vol. 38, no. 3, pp. 465–486, 2013.
[16] X. Liu, L. Wang, G. B. Huang, J. Zhang, and J. Yin, "Multiple kernel extreme learning machine," Neurocomputing, vol. 149, part A, pp. 253–264, 2015.
[17] Y. Lan, Y. C. Soh, and G. B. Huang, "Two-stage extreme learning machine for regression," Neurocomputing, vol. 73, no. 16-18, pp. 3028–3038, 2010.
[18] C.-T. Lin, M. Prasad, and A. Saxena, "An improved polynomial neural network classifier using real-coded genetic algorithm," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 45, no. 11, pp. 1389–1401, 2015.
[19] C. F. Stallmann and A. P. Engelbrecht, "Gramophone noise detection and reconstruction using time delay artificial neural networks," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 6, pp. 893–905, 2017.
[20] E. Cambria, G.-B. Huang, and L. L. C. Kasun, "Extreme learning machines," IEEE Intelligent Systems, vol. 28, no. 6, pp. 30–59, 2013.
[21] W. Zhu, J. Miao, L. Qing, and G.-B. Huang, "Hierarchical extreme learning machine for unsupervised representation learning," in Proceedings of the International Joint Conference on Neural Networks (IJCNN 2015), Ireland, July 2015.
[22] W. Yu, F. Zhuang, Q. He, and Z. Shi, "Learning deep representations via extreme learning machines," Neurocomputing, pp. 308–315, 2015.
[23] G.-B. Huang, "Learning capability and storage capacity of two-hidden-layer feedforward networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 14, no. 2, pp. 274–281, 2003.
[24] G.-B. Huang and H. A. Babri, "Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions," IEEE Transactions on Neural Networks and Learning Systems, vol. 9, no. 1, pp. 224–229, 1998.
[25] G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, "Extreme learning machine for regression and multiclass classification," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 2, pp. 513–529, 2012.
[26] J. Wei, H. Liu, G. Yan, and F. Sun, "Robotic grasping recognition using multi-modal deep extreme learning machine," Multidimensional Systems and Signal Processing, vol. 28, no. 3, pp. 817–833, 2016.
[27] J. J. Liang, A. K. Qin, P. N. Suganthan, and S. Baskar, "Comprehensive learning particle swarm optimizer for global optimization of multimodal functions," IEEE Transactions on Evolutionary Computation, vol. 10, no. 3, pp. 281–295, 2006.
[28] University of California, Irvine, Machine Learning Repository, http://archive.ics.uci.edu/ml.
[29] LIBSVM Data Sets, https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 3: A Multiple Hidden Layers Extreme Learning Machine Method and Its Applicationdownloads.hindawi.com/journals/mpe/2017/4670187.pdf · 2017-08-30 · ResearchArticle A Multiple Hidden

Mathematical Problems in Engineering 3

where the parameters 119899 and 119898 are the dimension of inputmatrix and output matrix

Then the ELM randomly sets the weights between theinput layer and the hidden layer

119908 =[[[[[[[[

11990811 11990812 sdot sdot sdot 119908111989911990821 11990822 sdot sdot sdot 1199082119899 d

1199081198971 1199081198972 sdot sdot sdot 119908119897119899

]]]]]]]]

(2)

where 119908119894119895 represents the weights between the 119895th input layerneuron and 119894th hidden layer neuron

Third the ELM assumes the weights between the hiddenlayer and the output layer that can be expressed as follows

120573 =[[[[[[[

12057311 12057312 sdot sdot sdot 120573111989812057321 12057322 sdot sdot sdot 1205732119898 d

1205731198971 1205731198972 sdot sdot sdot 120573119897119898

]]]]]]] (3)

where 120573119895119896 represents the weights between the 119895th hiddenlayer neuron and 119896th output layer neuron

Fourth the ELM randomly sets the bias of the hiddenlayer neurons

119861 = [1198871 1198872 sdot sdot sdot 119887119899]119879 (4)

Fifth the ELM chooses the network activation function 119892(119909)According to Figure 1 the output matrix 119879 can be expressedas follows

119879 = [1199051 1199052 119905119876]119898times119876 (5)

Each column vector of the output matrix 119879 is as follows

119905119895 =[[[[[[[[

11990511198951199052119895

119905119898119895

]]]]]]]]

=

[[[[[[[[[[[[[[[

119897sum119894=1

1205731198941119892 (119908119894119909119895 + 119887119894)119897sum119894=1

1205731198942119892 (119908119894119909119895 + 119887119894)

119897sum119894=1

120573119894119898119892 (119908119894119909119895 + 119887119894)

]]]]]]]]]]]]]]](119895 = 1 2 3 119876)

(6)

Sixth consider formulae (5) and (6) and we can get

119867120573 = 1198791015840 (7)

where 1198791015840 is the transpose of 119879 and 119867 is the output of thehidden layer In order to obtain the unique solution withminimum-error we use least square method to calculate theweight matrix values of 120573 [8 9]

120573 = 119867+1198791015840 (8)

W

B

X

H

B1

W1H2

f(x)

Figure 2 The workflow of the TELM

To improve the generalization ability of network andmake the results more stable we add a regularization termto the 120573 [26] When the number of hidden layer neurons isless than the number of training samples 120573 can be expressedas

120573 = ( 119868120582 + 119867119879119867)minus11198671198791198791015840 (9)

When the number of hidden layer nodes is more than thenumber of training samples 120573 can be expressed as

120573 = 119867119879 ( 119868120582 + 119867119867119879)minus1 1198791015840 (10)

3 Two-Hidden-Layer ELM

In 2016 B Y Qu and B F Lang proposed the two-hidden-layer extreme learning machine (TELM) in [11] The TELMtries to make the actual hidden layers output approachexpected hidden layer outputs by adding parameter settingstep for the second hidden layer Finally the TELM finds abetter way to map the relationship between input and outputsignals which is the two-hidden-layer ELM The networkstructure of TELM includes one input layer two hiddenlayers one output layer and each hidden layer with 119897 hiddenneurons The activation function of the network is selectedfor 119892(119909)

The focus of the TELM algorithm is the process ofcalculating and updating the weights between the first hiddenlayer and the second hidden layer as well as the bias of thehidden layer and the output weights between the secondhidden layer and the output layerThe workflow of the TELMarchitecture is depicted in Figure 2

Consider the training sample datasets 119883T =119909119894 119905119894 (119894 = 1 2 3 119876) where the matrix 119883 is inputsamples and 119879 is the labeled samples

The TELM first puts the two hidden layers as one hiddenlayer so the output of the hidden layer can be expressed as119867 = 119892(119882119883 + 119861) with the parameters of the weight 119882 andbias 119861 of the first hidden layer randomly initialized Next theoutput weight matrix 120573 between the second hidden layer andthe output layer can be obtained by using

120573 = 119867+119879 (11)

Now the TELM separates the two hidden layers mergedpreviously so the network has two hidden layers Accordingto the workflow of Figure 2 the output of the second hiddenlayer can be obtained as follows

119892 (1198821119867 + 1198611) = 1198671 (12)

4 Mathematical Problems in Engineering

where1198821 is theweightmatrixes between the first hidden layerand the second hidden layer119867 is the outputmatrix of the firsthidden layer 1198611 is the bias of the second hidden layer and1198671is the expected output of the second hidden layer

However the expected output of the second hidden layercan be obtained by calculating

1198671 = 119879120573+ (13)

where 120573+ is the generalized inverse of the matrix 120573Now the TELM defines the matrix 119882119867119864 = [1198611 1198821]

so the parameters of the second hidden layer can be easilyobtained by using formula (12) and the inverse function ofthe activation function

119882119867119864 = 119892minus1 (1198671)119867+119864 (14)

where 119867+119864 is the generalized inverse of 119867119864 = [1 119867]119879 1denotes a one-column vector of size 119876 and its elements arethe scalar unit 1 The notation 119892minus1(119909) indicates the inverse ofthe activation function 119892(119909)

With selecting the appropriate activation function 119892(119909)the TELM calculates (14) so the actual output of the secondhidden layer is updated as follows

1198672 = 119892 (119882119867119864119867119864) (15)

So the weights matrix 120573 between the second hidden layerand the output layer is updated as follows

120573new = 119867+2 119879 (16)

where119867+2 is the generalized inverse of1198672 so the actual outputof the TELM network can be expressed as

119891 (119909) = 1198672120573new (17)

To sum up the TELM algorithm process can be expressed asfollows

Algorithm 1 (1) Assume the training sample dataset is119883 119879 = 119909119894 119905119894 (119894 = 1 2 3 119876) where the matrix 119883 isinput samples and the matrix 119879 is the labeled samples eachhidden layer with 119897 hidden neurons the activation function119892(119909)(2) Randomly generate the weights 119882 between the inputlayer and the first hidden layer and bias 119861 of the first hiddenneurons119882119868119864 = [119861 119882] 119883119864 = [1 119883]119879(3) Calculate the equation119867 = 119892(119882119868119864119883119864)(4) Obtain the weights between the second hidden layerand output layer 120573 = 119867+119879(5) Calculate the expected output of the second hiddenlayer1198671 = 119879120573+(6) According to formulae (12)ndash(14) and the algorithmsteps (4 5) calculate the weights1198821 between the first hiddenlayer and the second hidden layer and the bias 1198611 of thesecond hidden neurons119882119867119864 = 119892minus1(1198671)119867+119864 (7) Obtain and update the actual output of the secondhidden layer1198672 = 119892(119882119867119864119867119864)(8)Update the weights matrix 120573 between the secondhidden layer and the output layer 120573new = 119867+2 119879(9)Calculate the output of the network 119891(119909) =119892[119882119867119892(119882119883 + 119861) + 1198611]120573new

e outputlayer

e thirdhidden layer

e secondhidden layer

e rsthidden layer

e inputlayer

f(x)

HQ

X

H4 = g(WHE1HE1)

H2 = g(WHEHE)

H = g(WIEXE)

middot middot middot

middot middot middot

middot middot middot

middot middot middot

middot middot middot

Figure 3 The structure of the three-hidden-layer ELM

XW

B

H

B1 B2

W1 W2 H2

Figure 4 The workflow of the three-hidden-layer ELM

4 Multihidden-Layer ELM

At present some scholars have proposed many improve-ments on ELM algorithm and structure and obtained someachievements and advantages Such advantages of neuralnetwork motivate us to explore the better ideas behind theELM Based on the above articles we adjust the structure ofELM neural network Thus we propose an algorithm namedmultiple hidden layers extreme learning machine (MELM)The structure of the MELM (select the three-hidden-layerELM for example) is illustrated in Figure 3 The workflow ofthe three-hidden-layer ELM is illustrated in Figure 4

Here we take a three-hidden-layer ELM for example andanalyze the MELM algorithm below First give the trainingsamples 119883T = 119909119894 119905119894 (119894 = 1 2 3 119876) and the three-hidden-layer network structure (each of the three-hidden-layer has 119897 hidden neurons) with the activation function 119892(119909)The structure of the three-hidden-layer ELM has input layerthree hidden layers and output layer According to the theoryof the TELMalgorithm nowwe put the three hidden layers astwo hidden layers (the first hidden layer is still the first hiddenlayer but put the second hidden layer and the third hidden

Mathematical Problems in Engineering 5

layer together as one hidden layer) so that the structure of thenetwork is the same as the TELM network mentioned aboveSo we can obtain the weights matrix 120573new between the secondhidden layer and the output layer According to the numberof the actual samples we can use formula (8) or formula (9) tocalculate the weights 120573 which can improve the generalizationability of the network

Then theMELMseparates the three hidden layersmergedpreviously so theMELMstructure has three hidden layers Sothe expected output of the third hidden layer can be expressedas follows

1198673 = 119879120573+new (18)

120573+new is generalized inverse of the weights matrix 120573newThird the MELM defines the matrix 1198821198671198641 = [1198612 1198822]

so the parameters of the third hidden layer can be easilyobtained by calculating formula (18) and the formula 1198673 =119892(11986721198822 + 1198612) = 119892(11988211986711986411198671198641)

1198821198671198641 = 119892minus1 (1198673)119867+1198641 (19)

where1198672 is the actual output of the second hidden layer1198822is the weights between the second hidden layer and the thirdhidden layer1198612 is the bias of the third hidden neurons119867+1198641 isthe generational inverse of 1198671198641 = [1 1198672]119879 1 denotes a one-column vector of size 119876 and its elements are the scalar unit1 The notation 119892minus1(119909) indicates the inverse of the activationfunction 119892(119909)

In order to test the performance of the proposed MELMalgorithm we adopt different activation functions for regres-sion and classification problems to experiment Generally weadopt the logistic sigmoid function 119892(119909) = 1(1 + 119890minus119909) Theactual output of the third hidden layer is calculated as follows

1198674 = 119892 (W11986711986411198671198641) (20)

Finally the output weights matrix 120573new between the thirdhidden layer and the output layer is calculated as followswhen the number of hidden layer neurons is less than thenumber of training samples 120573 can be expressed as follows

120573new = ( 119868120582 + 11986711987941198674)

minus11198671198794 119879 (21)

When the number of hidden layer neurons is more thanthe number of training samples120573 can be expressed as follows

120573new = 1198671198794 ( 119868120582 + 11986741198671198794 )

minus1 119879 (22)

The actual output of the three-hidden-layer ELMnetwork canbe expressed as follows

119891 (119909) = 1198674120573new (23)

To ensure that the actual final hidden output moreapproaches the expected hidden output during the trainingprocess the operation process is the optimization of the net-work structure parameters starting from the second hiddenlayer

The above is the parameter calculation process of three-hidden-layer ELM network but the purpose of this paperis to calculate the parameter of the multiple hidden layersELM network and the final output of the MELM networkstructure We can use cycle calculation theory to illustratethe calculating process of theMELMWhen the four-hidden-layer ELM network occurs we can recalculate formula (18)to formula (22) in the calculation process of the networkobtain and record the parameters of each hidden layer andfinally calculate the final output of the MELM network If thenumber of hidden layers increased the calculation processcan be recycled and executed in the samewayThe calculationprocess of MELM network can be described as follows

Algorithm 2 (1) Assume the training sample dataset is119883 119879 = 119909119894 119905119894 (119894 = 1 2 3 119876) where the matrix 119883 isthe input samples and the matrix 119879 is the labeled samplesEach hidden layer has 119897 hidden neurons with the activationfunction 119892(119909)(2) Randomly initialize the weights119882 between the inputlayer and the first hidden layer as well as the bias 119861 of the firsthidden neurons119882119868119864 = [119861 119882] 119883119864 = [1 119883]119879(3) Calculate the equation119867 = 119892(119882119868119864119883119864)(4) Calculate the weights between the hidden layers andthe output layer 120573 = (119868120582 + 119867119879119867)minus1119867119879119879 or 120573 = 119867119879(119868120582 +119867119867119879)minus1119879(5) Calculate the expected output of the second hiddenlayer1198671 = 119879120573+(6) According to formulae (12)ndash(14) and the algorithmsteps (4 5) calculate the weights1198821 between the first hiddenlayer and the second hidden layer and the bias 1198611 of thesecond hidden neurons119882119867119864 = 119892minus1(1198671)119867+119864 (7) Obtain and update the actual output of the secondhidden layer1198672 = 119892(119882119867119864119867119864)(8)Update the weights matrix 120573 between the hidden layerand the output layer 120573new = (119868120582 + 11986711987921198672)minus11198671198792 119879 or 120573new =1198671198792 (119868120582 + 11986721198671198792 )minus1119879(9) If the number of the hidden layer is three we cancalculate the parameters by recycle executing the aboveoperation from step (5) to step (9) Now 120573new is expressed asfollows 120573new = 120573 119867119864 = [1 1198672]119879(10) Calculate the output 119891(119909) = 1198672120573newIf the number119870 of the hidden layer ismore than three recycleis executing step (5) to step (9) for (119870 minus 1) times All the 119867matrix (1198671 1198672) must be normalized between the range ofminus09 and 09 when the max of the matrix is more than 1 andthe min of the matrix is less than minus15 Application

In order to verify the actual effect of the algorithm proposedin this paper we have done the following experiments whichare divided into three parts regression problems classifi-cation problems and the application of mineral selectionindustry All the experiments are conducted in the MATLAB2010b computational environment running on a computerwith a 2302GHZ in i3 CPU

6 Mathematical Problems in Engineering

Table 1 The RMSE of the three function examples

Algorithm Training RMSE Testing RMSE1198911(119909)

TELM 86233119864 minus 9 10797119864 minus 8MELM (three-hidden-layer) 8722119864 minus 15 13055119864 minus 14MLELM 18428119864 minus 6 15204119864 minus 5

1198912(119909)TELM 02867 02775MELM (three-hidden-layer) 00011 00019MELM 01060 01098

1198913(119909)TELM 03528 06204MELM (three-hidden-layer) 02110 04177MLELM 05538 04591

51 Regression Problems To test the performance of theregression problems several widely used functions are listedbelow [27]Weuse these functions to generate a datasetwhichincludes random selection of sufficient training samples andthe remaining is used as a testing samples and the activationfunction is selected as the hyperbolic tangent function 119892(119909) =(1 minus 119890minus119909)(1 + 119890minus119909)

(1) 1198911(119909) = sum119863119894=1 119909119894(2) 1198912(119909) = sum119863minus1119894=1 (100(1199092119894 minus 119909119894+1)2 + (119909119894 minus 1)2)(3) 1198913(119909) = minus20119890minus02radicsum119899119894=1 1199092119894 119863 minus 119890sum119899119894=1 cos(2120587119909119894)119863 + 20The symbol 119863 that is set as a positive integer represents

the dimensions of the function we are using The function1198912(119909) and 1198913(119909) are the complex multimodal functionEach function-experiment has 900 training samples and 100testing samples The evaluation criterion of the experimentsis the average accuracy namely the root mean square error(RMSE) in the regression problems We first need to conductsome experiments to verify the average accuracy of the neuralnetwork structure in different number of hidden neuronsand hidden layers Then we can observe the optimal neuralnetwork structure with specific hidden layers number andhidden neurons number compared with the results of theTELM on regression problems

In Table 1 we give the average RMSE of the above threefunctions in the case of the two-hidden-layer structure theMLELM and the three-hidden-layer structure (we do a seriesof experiments including the four-hidden-layer structurefive-hidden-layer structure and eight-hidden-layer struc-ture But the experiments proved that the three-hidden-layer structure for the three functions 1198911(119909) 1198912(119909) 1198913(119909) canachieve the best results so we only give the value of the RMSEof the three-hidden-layer structure) in regression problemsThe table includes the testing RMSE and the training RMSEof the three algorithm structure We can see that the MELM(three-hidden-layer) has the better results

Table 2 The three datasets for the classification

Datasets Training samples Testing samples Attributes ClassMice Protein 750 330 82 8svmguide4 300 312 10 6vowel 568 422 10 11AutoUnix 380 120 101 4Iris 100 50 4 3

52 Classification Problems To test the performance of theMELM algorithm proposed on classification datasets so wequote some datasets (such asMice Protein svmguide4 vowelAutoUnix and Iris) that are collected from the University ofCalifornia [28] and the LIBSVM website [29] The trainingdata and testing data of each experiment are randomlyselected from the original datasets This information of thedatasets introduced is given in Table 2

The classification performance criteria of the problemsare the average classification accuracy of the testing data Inthe Figure 5 the average testing classification correct percent-age for the ELM TELM MLELM and MELM algorithm isshown clearly by using (a) Mice Protein (b) svmguide4 (c)Vowel (d) AutoUnix and (e) Iris and different datasets (ab c d and e) have an important effect on the classificationaccuracy for the three algorithms (ELM TELM MLELMand MELM) But from the changing trend of the averagetesting correct percentage for the three algorithms with thesame datasets we can see that the MELM algorithm canselect the optimal number of hidden layers for the networkstructure to adapt to the specific datasets aiming at obtainingthe better results So the datasets of the Mice Protein andthe Forest type mapping use the three-hidden-layer ELMalgorithm to experiment and the datasets of the svmguide4use the five-hidden-layer ELM algorithm to experiment

53 The Application of Ores Selection Industry In this partwe use the actual datasets to analyze and experiment withthe model we have established and test the performance ofthe MELM algorithm First our datasets come from AnQianMining Group namely the mining area of Ya-ba-ling andXi-da-bei And the samples we selected are the hematite(50 samples) magnetite (90 samples) granite (57 samples)phyllite (15 samples) and chlorite (32 samples) On theprecise classification of the ores and the prediction of totaliron content of iron ores people usually use the chemicalanalysis method But the analysis process of total iron contentis a complex process a long process and the color of thesolution sometimes without the significantly change in theend Due to the extensive using of near infrared spectroscopyfor chemical composition analysis we can infer the structureof the unknown substance according to the position andshape of peaks in the absorption spectra So we can use thetheory to obtain the correct information of the ores with thelow influence of environment on data acquisition HR1024SVC spectrometer is used to test the spectral data of eachsample and the total iron content of hematite and magnetiteis obtained from the chemical analysis center of Northeastern

Mathematical Problems in Engineering 7

ELMTELM

Three-hidden-layerMLELM

30

40

50

60

70

80

90

100Av

erag

e tes

ting

corr

ect p

erce

ntag

e (

)

50 100 150 200 250 3000The number of hidden neurons

(a)

ELMTELM

Five-hidden-layerMLELM

2025303540455055606570

Aver

age t

estin

g co

rrec

t per

cent

age (

)

50 100 150 200 250 3000The number of hidden neurons

(b)

ELMTELM

Three-hidden-layerMLELM

10

20

30

40

50

60

70

80

90

Aver

age t

estin

g co

rrec

t per

cent

age (

)

50 100 150 200 250 3000The number of hidden neurons

(c)

ELMTELM

Five-hidden-layerMLELM

45

50

55

60

65

70

75

Aver

age t

estin

g co

rrec

t per

cent

age (

)

50 100 150 200 250 3000The number of hidden neurons

(d)

ELMTELM

Three-hidden-layerMLELM

55

60

65

70

75

80

85

90

95

Aver

age t

estin

g co

rrec

t per

cent

age (

)

50 100 150 200 250 3000The number of hidden neurons

(e)

Figure 5 The average testing classification correct percentage for the ELM TELM and MELM using (a) Mice Protein (b) svmguide4 (c)Vowel (d) AutoUnix and (e) Iris

8 Mathematical Problems in Engineering

Training data

ELMTELM

Three-hidden-layerMLELM

75

80

85

90

95

100

Aver

age t

rain

ing

corr

ect p

erce

ntag

e (

)

40 60 80 100 120 140 16020The number of hidden neurons

(a)

Testing data

ELMTELM

Three-hidden-layerMLELM

50

55

60

65

70

75

80

85

90

95

100

Aver

age t

estin

g co

rrec

t per

cent

age (

)

40 60 80 100 120 140 16020The number of hidden neurons

(b)

Figure 6 Average classification accuracy of different hidden layers

University of China Our experimentsrsquo datasets are the nearinfrared spectrum information of iron ores which are theabsorption rate of different iron ores under the condition ofthe near infrared spectrum The wavelength of near infraredspectrum is 300 nmndash2500 nm so the datasets of the iron oreswe obtained have the very high dimensions (973 dimensions)We use the theory of the principal component analysis toreduce the dimensions of the datasets

(A) Here we use the algorithm proposed in this paper toclassify the five kinds of ores and compare with the resultsof the ELM algorithm and TELM algorithm The activa-tion function of the algorithm in classification problems isselected as the sigmoid function 119892(119909) = 1(1 + 119890minus119909)

Figure 6 expresses the average classification accuracy ofthe ELM TELM MLELM and MELM (three-hidden-layerELM) algorithm in different hidden layers After we conducta series of experiments with different hidden layers of theMELM structure finally we select the three-hidden-layerELM structure which can obtain the optimal classificationresults in the MELM algorithm Figure 6(a) is the averagetraining correct percentage and the MELM has the highresult compared with others in the same hidden neuronsFigure 6(b) is the average testing correct percentage and theMELM has the high result in the hidden neurons between 20and 60 In the actual situation we can reasonably choose thenumber of each hidden neurons to model

(B) Here we use the method proposed in this paper totest the total iron content of the hematite and magnetite andcompare with the results of the ELM algorithm and TELMalgorithm Here is the prediction of total iron content ofhematite and magnetite with the optimal hidden layers andoptimal hidden layer neurons The activation function of the

algorithm in regression problems is selected as the hyperbolictangent function 119892(119909) = (1 minus 119890minus119909)(1 + 119890minus119909)

Figures 7 and 8 respectively expresses the total ironcontent of the hematite and magnetite by using the ELMTELM MLELM andMELM algorithm (the MELM includesthe eight-hidden-layer structure which is the optimal struc-ture in Figure 7 the six-hidden-layer structure which is theoptimal structure in Figure 8) The optimal results of eachalgorithm are obtained by constantly experimenting with thedifferent number of hidden layer neurons In the predictionprocess of the hematite we can see that the MELM has thebetter results and each hidden layer of this structure has 100hidden neurons In the prediction process of the magnetitewe can see that the MELM has the better results and eachhidden layer of this structure has 20 hidden neurons Sothe MELM algorithm has strong performance to the oresselection problems which can achieve the best adaptabilityto the problems by adjusting the number of hidden layers andthe number of hidden layer neurons to improve the ability ofsystem regression analysis

6 Conclusion

The MELM we proposed in this paper solves the two prob-lems which exist in the training process of the original ELMThe first is the stability of single network that the disadvan-tage will also influence the generalization performance Thesecond is that the outputweights are calculated by the productof the generalized inverse of hidden layer output and thesystem actual output And the parameters randomly selectedof hidden layer neurons which can lead to the singularmatrix or morbid matrix At the same time the MELM


Figure 7: The total iron content of the hematite. (a) Train set output versus train set sample; (b) test set output versus test set sample. Curves: expected output, ELM, TELM, eight-hidden-layer MELM, and MLELM.

Figure 8: The total iron content of the magnetite. (a) Train set output versus train set sample; (b) test set output versus test set sample. Curves: expected output, ELM, TELM, six-hidden-layer MELM, and MLELM.

At the same time, the MELM network structure also improves the average training and testing accuracy compared with the ELM and TELM network structures. The MELM algorithm inherits the characteristic of the traditional ELM of randomly initializing the weights and bias (between the input layer and the first hidden layer), adopts part of the TELM algorithm, and uses the inverse activation function to calculate the weights and bias of the remaining hidden layers (all except the first hidden layer). Then we make the actual output of each hidden layer approximate its expected output and use the parameters obtained above to calculate the actual network output. In the function regression problems, this algorithm reduces the root mean square error. In the dataset classification problems, the average accuracy over the multiple classes is significantly higher than that of the ELM and TELM network structures. In such cases, the MELM is able to improve the performance of the network structure.
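To make the training procedure summarized above concrete, the following is a compact sketch of a multi-hidden-layer ELM in the spirit of the paper (samples stored as rows, sigmoid activation, ridge-regularized output weights). It is an illustration under these assumptions, not the authors' original MATLAB implementation; the layer width, number of layers, and regularization constant are placeholders, and the paper's normalization of the hidden output into [-0.9, 0.9] is approximated here by simple clipping inside the inverse activation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_inv(y, eps=1e-12):
    # Inverse activation, clipped so the logarithm stays finite
    y = np.clip(y, eps, 1.0 - eps)
    return np.log(y / (1.0 - y))

def ridge_beta(H, T, lam=1e3):
    # Output weights: beta = (I/lambda + H^T H)^{-1} H^T T
    L = H.shape[1]
    return np.linalg.solve(np.eye(L) / lam + H.T @ H, H.T @ T)

def train_melm(X, T, n_hidden=100, n_layers=3, lam=1e3, seed=0):
    """Sketch of MELM training; X: (Q, d) inputs, T: (Q, m) targets."""
    rng = np.random.default_rng(seed)
    Q = X.shape[0]
    # First hidden layer: random weights and bias, exactly as in the basic ELM
    XE = np.hstack([np.ones((Q, 1)), X])                    # [1, X]
    WIE = rng.uniform(-1.0, 1.0, (n_hidden, XE.shape[1]))   # [B, W]
    H = sigmoid(XE @ WIE.T)
    beta = ridge_beta(H, T, lam)
    params = [WIE]
    # Remaining hidden layers: drive each layer's actual output toward its
    # expected output H_exp = T * pinv(beta), using the inverse activation
    for _ in range(n_layers - 1):
        H_exp = T @ np.linalg.pinv(beta)
        HE = np.hstack([np.ones((Q, 1)), H])                # [1, H]
        WHE = np.linalg.pinv(HE) @ sigmoid_inv(H_exp)       # so g(HE @ WHE) ~ H_exp
        H = sigmoid(HE @ WHE)
        beta = ridge_beta(H, T, lam)                        # updated output weights
        params.append(WHE)
    return params, beta

def predict_melm(X, params, beta):
    Q = X.shape[0]
    H = sigmoid(np.hstack([np.ones((Q, 1)), X]) @ params[0].T)
    for WHE in params[1:]:
        H = sigmoid(np.hstack([np.ones((Q, 1)), H]) @ WHE)
    return H @ beta
```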

Conflicts of Interest

The authors declare no conflicts of interest.


Acknowledgments

This research is supported by the National Natural Science Foundation of China (Grants nos. 41371437, 61473072, and 61203214), the Fundamental Research Funds for the Central Universities (N160404008), the National Twelfth Five-Year Plan for Science and Technology Support (2015BAB15B01), China, and the National Key Research and Development Plan (2016YFC0801602).

References

[1] K. Hornik, "Approximation capabilities of multilayer feedforward networks," Neural Networks, vol. 4, no. 2, pp. 251–257, 1991.

[2] M. Leshno, V. Y. Lin, A. Pinkus, and S. Schocken, "Multilayer feedforward networks with a nonpolynomial activation function can approximate any function," Neural Networks, vol. 6, no. 6, pp. 861–867, 1993.

[3] R. Hecht-Nielsen, "Theory of the backpropagation neural network," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '89), vol. 1, pp. 593–605, Washington, DC, USA, June 1989.

[4] H.-C. Hsin, C.-C. Li, M. Sun, and R. J. Sclabassi, "An adaptive training algorithm for back-propagation neural networks," IEEE Transactions on Systems, Man, and Cybernetics, vol. 25, no. 3, pp. 512–514, 1995.

[5] D. Lowe, "Adaptive radial basis function nonlinearities and the problem of generalization," in First IEE International Conference on Artificial Neural Networks, pp. 171–175, 1989.

[6] S. Ding, G. Ma, and Z. Shi, "A rough RBF neural network based on weighted regularized extreme learning machine," Neural Processing Letters, vol. 40, no. 3, pp. 245–260, 2014.

[7] C. Sun, W. He, W. Ge, and C. Chang, "Adaptive neural network control of biped robots," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 2, pp. 315–326, 2017.

[8] S.-J. Wang, H.-L. Chen, W.-J. Yan, Y.-H. Chen, and X. Fu, "Face recognition and micro-expression recognition based on discriminant tensor subspace analysis plus extreme learning machine," Neural Processing Letters, vol. 39, no. 1, pp. 25–43, 2014.

[9] G. Huang, "An insight into extreme learning machines: random neurons, random features and kernels," Cognitive Computation, 2014.

[10] G. B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.

[11] S. Ding, N. Zhang, X. Xu, L. Guo, and J. Zhang, "Deep extreme learning machine and its application in EEG classification," Mathematical Problems in Engineering, vol. 2015, Article ID 129021, 2015.

[12] B. Qu, B. Lang, J. Liang, A. Qin, and O. Crisalle, "Two-hidden-layer extreme learning machine for regression and classification," Neurocomputing, vol. 175, pp. 826–834, 2016.

[13] S. I. Tamura and M. Tateishi, "Capabilities of a four-layered feedforward neural network: four layers versus three," IEEE Transactions on Neural Networks and Learning Systems, vol. 8, no. 2, pp. 251–255, 1997.

[14] J. Zhao, Z. Wang, and D. S. Park, "Online sequential extreme learning machine with forgetting mechanism," Neurocomputing, vol. 87, no. 15, pp. 79–89, 2012.

[15] B. Mirza, Z. Lin, and K.-A. Toh, "Weighted online sequential extreme learning machine for class imbalance learning," Neural Processing Letters, vol. 38, no. 3, pp. 465–486, 2013.

[16] X. Liu, L. Wang, G. B. Huang, J. Zhang, and J. Yin, "Multiple kernel extreme learning machine," Neurocomputing, vol. 149, part A, pp. 253–264, 2015.

[17] Y. Lan, Y. C. Soh, and G. B. Huang, "Two-stage extreme learning machine for regression," Neurocomputing, vol. 73, no. 16–18, pp. 3028–3038, 2010.

[18] C.-T. Lin, M. Prasad, and A. Saxena, "An improved polynomial neural network classifier using real-coded genetic algorithm," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 45, no. 11, pp. 1389–1401, 2015.

[19] C. F. Stallmann and A. P. Engelbrecht, "Gramophone noise detection and reconstruction using time delay artificial neural networks," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 6, pp. 893–905, 2017.

[20] E. Cambria, G.-B. Huang, and L. L. C. Kasun, "Extreme learning machines," IEEE Intelligent Systems, vol. 28, no. 6, pp. 30–59, 2013.

[21] W. Zhu, J. Miao, L. Qing, and G.-B. Huang, "Hierarchical extreme learning machine for unsupervised representation learning," in Proceedings of the International Joint Conference on Neural Networks (IJCNN 2015), Ireland, July 2015.

[22] W. Yu, F. Zhuang, Q. He, and Z. Shi, "Learning deep representations via extreme learning machines," Neurocomputing, pp. 308–315, 2015.

[23] G.-B. Huang, "Learning capability and storage capacity of two-hidden-layer feedforward networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 14, no. 2, pp. 274–281, 2003.

[24] G.-B. Huang and H. A. Babri, "Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions," IEEE Transactions on Neural Networks and Learning Systems, vol. 9, no. 1, pp. 224–229, 1998.

[25] G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, "Extreme learning machine for regression and multiclass classification," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 2, pp. 513–529, 2012.

[26] J. Wei, H. Liu, G. Yan, and F. Sun, "Robotic grasping recognition using multi-modal deep extreme learning machine," Multidimensional Systems and Signal Processing, vol. 28, no. 3, pp. 817–833, 2016.

[27] J. J. Liang, A. K. Qin, P. N. Suganthan, and S. Baskar, "Comprehensive learning particle swarm optimizer for global optimization of multimodal functions," IEEE Transactions on Evolutionary Computation, vol. 10, no. 3, pp. 281–295, 2006.

[28] University of California, Irvine, Machine Learning Repository, http://archive.ics.uci.edu/ml.

[29] LIBSVM Data website, https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.




Page 5: A Multiple Hidden Layers Extreme Learning Machine Method and Its Applicationdownloads.hindawi.com/journals/mpe/2017/4670187.pdf · 2017-08-30 · ResearchArticle A Multiple Hidden

Mathematical Problems in Engineering 5

layer together as one hidden layer) so that the structure of thenetwork is the same as the TELM network mentioned aboveSo we can obtain the weights matrix 120573new between the secondhidden layer and the output layer According to the numberof the actual samples we can use formula (8) or formula (9) tocalculate the weights 120573 which can improve the generalizationability of the network

Then theMELMseparates the three hidden layersmergedpreviously so theMELMstructure has three hidden layers Sothe expected output of the third hidden layer can be expressedas follows

1198673 = 119879120573+new (18)

120573+new is generalized inverse of the weights matrix 120573newThird the MELM defines the matrix 1198821198671198641 = [1198612 1198822]

so the parameters of the third hidden layer can be easilyobtained by calculating formula (18) and the formula 1198673 =119892(11986721198822 + 1198612) = 119892(11988211986711986411198671198641)

1198821198671198641 = 119892minus1 (1198673)119867+1198641 (19)

where1198672 is the actual output of the second hidden layer1198822is the weights between the second hidden layer and the thirdhidden layer1198612 is the bias of the third hidden neurons119867+1198641 isthe generational inverse of 1198671198641 = [1 1198672]119879 1 denotes a one-column vector of size 119876 and its elements are the scalar unit1 The notation 119892minus1(119909) indicates the inverse of the activationfunction 119892(119909)

In order to test the performance of the proposed MELMalgorithm we adopt different activation functions for regres-sion and classification problems to experiment Generally weadopt the logistic sigmoid function 119892(119909) = 1(1 + 119890minus119909) Theactual output of the third hidden layer is calculated as follows

1198674 = 119892 (W11986711986411198671198641) (20)

Finally the output weights matrix 120573new between the thirdhidden layer and the output layer is calculated as followswhen the number of hidden layer neurons is less than thenumber of training samples 120573 can be expressed as follows

120573new = ( 119868120582 + 11986711987941198674)

minus11198671198794 119879 (21)

When the number of hidden layer neurons is more thanthe number of training samples120573 can be expressed as follows

120573new = 1198671198794 ( 119868120582 + 11986741198671198794 )

minus1 119879 (22)

The actual output of the three-hidden-layer ELMnetwork canbe expressed as follows

119891 (119909) = 1198674120573new (23)

To ensure that the actual final hidden output moreapproaches the expected hidden output during the trainingprocess the operation process is the optimization of the net-work structure parameters starting from the second hiddenlayer

The above is the parameter calculation process of three-hidden-layer ELM network but the purpose of this paperis to calculate the parameter of the multiple hidden layersELM network and the final output of the MELM networkstructure We can use cycle calculation theory to illustratethe calculating process of theMELMWhen the four-hidden-layer ELM network occurs we can recalculate formula (18)to formula (22) in the calculation process of the networkobtain and record the parameters of each hidden layer andfinally calculate the final output of the MELM network If thenumber of hidden layers increased the calculation processcan be recycled and executed in the samewayThe calculationprocess of MELM network can be described as follows

Algorithm 2 (1) Assume the training sample dataset is119883 119879 = 119909119894 119905119894 (119894 = 1 2 3 119876) where the matrix 119883 isthe input samples and the matrix 119879 is the labeled samplesEach hidden layer has 119897 hidden neurons with the activationfunction 119892(119909)(2) Randomly initialize the weights119882 between the inputlayer and the first hidden layer as well as the bias 119861 of the firsthidden neurons119882119868119864 = [119861 119882] 119883119864 = [1 119883]119879(3) Calculate the equation119867 = 119892(119882119868119864119883119864)(4) Calculate the weights between the hidden layers andthe output layer 120573 = (119868120582 + 119867119879119867)minus1119867119879119879 or 120573 = 119867119879(119868120582 +119867119867119879)minus1119879(5) Calculate the expected output of the second hiddenlayer1198671 = 119879120573+(6) According to formulae (12)ndash(14) and the algorithmsteps (4 5) calculate the weights1198821 between the first hiddenlayer and the second hidden layer and the bias 1198611 of thesecond hidden neurons119882119867119864 = 119892minus1(1198671)119867+119864 (7) Obtain and update the actual output of the secondhidden layer1198672 = 119892(119882119867119864119867119864)(8)Update the weights matrix 120573 between the hidden layerand the output layer 120573new = (119868120582 + 11986711987921198672)minus11198671198792 119879 or 120573new =1198671198792 (119868120582 + 11986721198671198792 )minus1119879(9) If the number of the hidden layer is three we cancalculate the parameters by recycle executing the aboveoperation from step (5) to step (9) Now 120573new is expressed asfollows 120573new = 120573 119867119864 = [1 1198672]119879(10) Calculate the output 119891(119909) = 1198672120573newIf the number119870 of the hidden layer ismore than three recycleis executing step (5) to step (9) for (119870 minus 1) times All the 119867matrix (1198671 1198672) must be normalized between the range ofminus09 and 09 when the max of the matrix is more than 1 andthe min of the matrix is less than minus15 Application

In order to verify the actual effect of the algorithm proposedin this paper we have done the following experiments whichare divided into three parts regression problems classifi-cation problems and the application of mineral selectionindustry All the experiments are conducted in the MATLAB2010b computational environment running on a computerwith a 2302GHZ in i3 CPU

6 Mathematical Problems in Engineering

Table 1 The RMSE of the three function examples

Algorithm Training RMSE Testing RMSE1198911(119909)

TELM 86233119864 minus 9 10797119864 minus 8MELM (three-hidden-layer) 8722119864 minus 15 13055119864 minus 14MLELM 18428119864 minus 6 15204119864 minus 5

1198912(119909)TELM 02867 02775MELM (three-hidden-layer) 00011 00019MELM 01060 01098

1198913(119909)TELM 03528 06204MELM (three-hidden-layer) 02110 04177MLELM 05538 04591

51 Regression Problems To test the performance of theregression problems several widely used functions are listedbelow [27]Weuse these functions to generate a datasetwhichincludes random selection of sufficient training samples andthe remaining is used as a testing samples and the activationfunction is selected as the hyperbolic tangent function 119892(119909) =(1 minus 119890minus119909)(1 + 119890minus119909)

(1) 1198911(119909) = sum119863119894=1 119909119894(2) 1198912(119909) = sum119863minus1119894=1 (100(1199092119894 minus 119909119894+1)2 + (119909119894 minus 1)2)(3) 1198913(119909) = minus20119890minus02radicsum119899119894=1 1199092119894 119863 minus 119890sum119899119894=1 cos(2120587119909119894)119863 + 20The symbol 119863 that is set as a positive integer represents

the dimensions of the function we are using The function1198912(119909) and 1198913(119909) are the complex multimodal functionEach function-experiment has 900 training samples and 100testing samples The evaluation criterion of the experimentsis the average accuracy namely the root mean square error(RMSE) in the regression problems We first need to conductsome experiments to verify the average accuracy of the neuralnetwork structure in different number of hidden neuronsand hidden layers Then we can observe the optimal neuralnetwork structure with specific hidden layers number andhidden neurons number compared with the results of theTELM on regression problems

In Table 1 we give the average RMSE of the above threefunctions in the case of the two-hidden-layer structure theMLELM and the three-hidden-layer structure (we do a seriesof experiments including the four-hidden-layer structurefive-hidden-layer structure and eight-hidden-layer struc-ture But the experiments proved that the three-hidden-layer structure for the three functions 1198911(119909) 1198912(119909) 1198913(119909) canachieve the best results so we only give the value of the RMSEof the three-hidden-layer structure) in regression problemsThe table includes the testing RMSE and the training RMSEof the three algorithm structure We can see that the MELM(three-hidden-layer) has the better results

Table 2 The three datasets for the classification

Datasets Training samples Testing samples Attributes ClassMice Protein 750 330 82 8svmguide4 300 312 10 6vowel 568 422 10 11AutoUnix 380 120 101 4Iris 100 50 4 3

52 Classification Problems To test the performance of theMELM algorithm proposed on classification datasets so wequote some datasets (such asMice Protein svmguide4 vowelAutoUnix and Iris) that are collected from the University ofCalifornia [28] and the LIBSVM website [29] The trainingdata and testing data of each experiment are randomlyselected from the original datasets This information of thedatasets introduced is given in Table 2

The classification performance criteria of the problemsare the average classification accuracy of the testing data Inthe Figure 5 the average testing classification correct percent-age for the ELM TELM MLELM and MELM algorithm isshown clearly by using (a) Mice Protein (b) svmguide4 (c)Vowel (d) AutoUnix and (e) Iris and different datasets (ab c d and e) have an important effect on the classificationaccuracy for the three algorithms (ELM TELM MLELMand MELM) But from the changing trend of the averagetesting correct percentage for the three algorithms with thesame datasets we can see that the MELM algorithm canselect the optimal number of hidden layers for the networkstructure to adapt to the specific datasets aiming at obtainingthe better results So the datasets of the Mice Protein andthe Forest type mapping use the three-hidden-layer ELMalgorithm to experiment and the datasets of the svmguide4use the five-hidden-layer ELM algorithm to experiment

53 The Application of Ores Selection Industry In this partwe use the actual datasets to analyze and experiment withthe model we have established and test the performance ofthe MELM algorithm First our datasets come from AnQianMining Group namely the mining area of Ya-ba-ling andXi-da-bei And the samples we selected are the hematite(50 samples) magnetite (90 samples) granite (57 samples)phyllite (15 samples) and chlorite (32 samples) On theprecise classification of the ores and the prediction of totaliron content of iron ores people usually use the chemicalanalysis method But the analysis process of total iron contentis a complex process a long process and the color of thesolution sometimes without the significantly change in theend Due to the extensive using of near infrared spectroscopyfor chemical composition analysis we can infer the structureof the unknown substance according to the position andshape of peaks in the absorption spectra So we can use thetheory to obtain the correct information of the ores with thelow influence of environment on data acquisition HR1024SVC spectrometer is used to test the spectral data of eachsample and the total iron content of hematite and magnetiteis obtained from the chemical analysis center of Northeastern

Mathematical Problems in Engineering 7

ELMTELM

Three-hidden-layerMLELM

30

40

50

60

70

80

90

100Av

erag

e tes

ting

corr

ect p

erce

ntag

e (

)

50 100 150 200 250 3000The number of hidden neurons

(a)

ELMTELM

Five-hidden-layerMLELM

2025303540455055606570

Aver

age t

estin

g co

rrec

t per

cent

age (

)

50 100 150 200 250 3000The number of hidden neurons

(b)

ELMTELM

Three-hidden-layerMLELM

10

20

30

40

50

60

70

80

90

Aver

age t

estin

g co

rrec

t per

cent

age (

)

50 100 150 200 250 3000The number of hidden neurons

(c)

ELMTELM

Five-hidden-layerMLELM

45

50

55

60

65

70

75

Aver

age t

estin

g co

rrec

t per

cent

age (

)

50 100 150 200 250 3000The number of hidden neurons

(d)

ELMTELM

Three-hidden-layerMLELM

55

60

65

70

75

80

85

90

95

Aver

age t

estin

g co

rrec

t per

cent

age (

)

50 100 150 200 250 3000The number of hidden neurons

(e)

Figure 5 The average testing classification correct percentage for the ELM TELM and MELM using (a) Mice Protein (b) svmguide4 (c)Vowel (d) AutoUnix and (e) Iris

8 Mathematical Problems in Engineering

Training data

ELMTELM

Three-hidden-layerMLELM

75

80

85

90

95

100

Aver

age t

rain

ing

corr

ect p

erce

ntag

e (

)

40 60 80 100 120 140 16020The number of hidden neurons

(a)

Testing data

ELMTELM

Three-hidden-layerMLELM

50

55

60

65

70

75

80

85

90

95

100

Aver

age t

estin

g co

rrec

t per

cent

age (

)

40 60 80 100 120 140 16020The number of hidden neurons

(b)

Figure 6 Average classification accuracy of different hidden layers

University of China Our experimentsrsquo datasets are the nearinfrared spectrum information of iron ores which are theabsorption rate of different iron ores under the condition ofthe near infrared spectrum The wavelength of near infraredspectrum is 300 nmndash2500 nm so the datasets of the iron oreswe obtained have the very high dimensions (973 dimensions)We use the theory of the principal component analysis toreduce the dimensions of the datasets

(A) Here we use the algorithm proposed in this paper toclassify the five kinds of ores and compare with the resultsof the ELM algorithm and TELM algorithm The activa-tion function of the algorithm in classification problems isselected as the sigmoid function 119892(119909) = 1(1 + 119890minus119909)

Figure 6 expresses the average classification accuracy ofthe ELM TELM MLELM and MELM (three-hidden-layerELM) algorithm in different hidden layers After we conducta series of experiments with different hidden layers of theMELM structure finally we select the three-hidden-layerELM structure which can obtain the optimal classificationresults in the MELM algorithm Figure 6(a) is the averagetraining correct percentage and the MELM has the highresult compared with others in the same hidden neuronsFigure 6(b) is the average testing correct percentage and theMELM has the high result in the hidden neurons between 20and 60 In the actual situation we can reasonably choose thenumber of each hidden neurons to model

(B) Here we use the method proposed in this paper totest the total iron content of the hematite and magnetite andcompare with the results of the ELM algorithm and TELMalgorithm Here is the prediction of total iron content ofhematite and magnetite with the optimal hidden layers andoptimal hidden layer neurons The activation function of the

algorithm in regression problems is selected as the hyperbolictangent function 119892(119909) = (1 minus 119890minus119909)(1 + 119890minus119909)

Figures 7 and 8 respectively expresses the total ironcontent of the hematite and magnetite by using the ELMTELM MLELM andMELM algorithm (the MELM includesthe eight-hidden-layer structure which is the optimal struc-ture in Figure 7 the six-hidden-layer structure which is theoptimal structure in Figure 8) The optimal results of eachalgorithm are obtained by constantly experimenting with thedifferent number of hidden layer neurons In the predictionprocess of the hematite we can see that the MELM has thebetter results and each hidden layer of this structure has 100hidden neurons In the prediction process of the magnetitewe can see that the MELM has the better results and eachhidden layer of this structure has 20 hidden neurons Sothe MELM algorithm has strong performance to the oresselection problems which can achieve the best adaptabilityto the problems by adjusting the number of hidden layers andthe number of hidden layer neurons to improve the ability ofsystem regression analysis

6 Conclusion

The MELM we proposed in this paper solves the two prob-lems which exist in the training process of the original ELMThe first is the stability of single network that the disadvan-tage will also influence the generalization performance Thesecond is that the outputweights are calculated by the productof the generalized inverse of hidden layer output and thesystem actual output And the parameters randomly selectedof hidden layer neurons which can lead to the singularmatrix or morbid matrix At the same time the MELM

Figure 7. The total iron content of the hematite: (a) train set output versus train set sample; (b) test set output versus test set sample (curves: expected output, ELM, TELM, MLELM, and the eight-hidden-layer MELM).

Figure 8. The total iron content of the magnetite: (a) train set output versus train set sample; (b) test set output versus test set sample (curves: expected output, ELM, TELM, MLELM, and the six-hidden-layer MELM).

At the same time, the MELM network structure also improves the average training and testing accuracy compared with the ELM and TELM network structures. The MELM algorithm inherits the characteristic of the traditional ELM of randomly initializing the weights and bias (between the input layer and the first hidden layer); it also adopts part of the TELM algorithm and uses the inverse activation function to calculate the weights and bias of the hidden layers (except the first hidden layer). We then make the actual hidden-layer output approximate the expected hidden-layer output and use the parameters obtained above to calculate the actual output. In the function regression problems, this algorithm reduces the least mean square error. In the dataset classification problems, the average accuracy of the multiclass classification is significantly higher than that of the ELM and TELM network structures. In such cases, the MELM is able to improve the performance of the network structure.
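To make the summary above concrete, the following is a rough reconstruction (ours, under the TELM-style scheme the text describes, and not the authors' code) of how one additional hidden layer can be solved for: the expected hidden output is derived from the targets and the current output weights, and the new layer's bias and weights follow from the inverse activation function and a pseudoinverse.

import numpy as np

def g(x):  # the activation used in the regression experiments
    return (1.0 - np.exp(-x)) / (1.0 + np.exp(-x))

def g_inv(y, eps=1e-7):  # its inverse, clipped so the logarithm stays finite
    y = np.clip(y, -1.0 + eps, 1.0 - eps)
    return np.log((1.0 + y) / (1.0 - y))

def add_hidden_layer(H, T, beta):
    # H: current hidden output (samples x neurons); T: targets; beta: current output weights.
    H_exp = T @ np.linalg.pinv(beta)                   # expected output of the new hidden layer
    H_aug = np.hstack([np.ones((H.shape[0], 1)), H])   # prepend a column of ones for the bias term
    W_aug = np.linalg.pinv(H_aug) @ g_inv(H_exp)       # [bias; weights] of the new hidden layer
    H_new = g(H_aug @ W_aug)                           # actual output of the new hidden layer
    beta_new = np.linalg.pinv(H_new) @ T               # recompute the output weights
    return H_new, beta_new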

Conflicts of Interest

The authors declare that there are no conflicts of interest.


Acknowledgments

This research is supported by the National Natural Science Foundation of China (Grants nos. 41371437, 61473072, and 61203214), the Fundamental Research Funds for the Central Universities (N160404008), the National Twelfth Five-Year Plan for Science and Technology Support (2015BAB15B01), China, and the National Key Research and Development Plan (2016YFC0801602).

References

[1] K. Hornik, "Approximation capabilities of multilayer feedforward networks," Neural Networks, vol. 4, no. 2, pp. 251–257, 1991.

[2] M. Leshno, V. Y. Lin, A. Pinkus, and S. Schocken, "Multilayer feedforward networks with a nonpolynomial activation function can approximate any function," Neural Networks, vol. 6, no. 6, pp. 861–867, 1993.

[3] R. Hecht-Nielsen, "Theory of the backpropagation neural network," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '89), vol. 1, pp. 593–605, Washington, DC, USA, June 1989.

[4] H.-C. Hsin, C.-C. Li, M. Sun, and R. J. Sclabassi, "An adaptive training algorithm for back-propagation neural networks," IEEE Transactions on Systems, Man, and Cybernetics, vol. 25, no. 3, pp. 512–514, 1995.

[5] D. Lowe, "Adaptive radial basis function nonlinearities and the problem of generalization," in First IEE International Conference on Artificial Neural Networks, pp. 313–171, 1989.

[6] S. Ding, G. Ma, and Z. Shi, "A rough RBF neural network based on weighted regularized extreme learning machine," Neural Processing Letters, vol. 40, no. 3, pp. 245–260, 2014.

[7] C. Sun, W. He, W. Ge, and C. Chang, "Adaptive neural network control of biped robots," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 2, pp. 315–326, 2017.

[8] S.-J. Wang, H.-L. Chen, W.-J. Yan, Y.-H. Chen, and X. Fu, "Face recognition and micro-expression recognition based on discriminant tensor subspace analysis plus extreme learning machine," Neural Processing Letters, vol. 39, no. 1, pp. 25–43, 2014.

[9] G. Huang, "An insight into extreme learning machines: random neurons, random features and kernels," Cognitive Computation, 2014.

[10] G. B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.

[11] S. Ding, N. Zhang, X. Xu, L. Guo, and J. Zhang, "Deep extreme learning machine and its application in EEG classification," Mathematical Problems in Engineering, vol. 2015, Article ID 129021, 2015.

[12] B. Qu, B. Lang, J. Liang, A. Qin, and O. Crisalle, "Two-hidden-layer extreme learning machine for regression and classification," Neurocomputing, vol. 175, pp. 826–834, 2016.

[13] S. I. Tamura and M. Tateishi, "Capabilities of a four-layered feedforward neural network: four layers versus three," IEEE Transactions on Neural Networks and Learning Systems, vol. 8, no. 2, pp. 251–255, 1997.

[14] J. Zhao, Z. Wang, and D. S. Park, "Online sequential extreme learning machine with forgetting mechanism," Neurocomputing, vol. 87, no. 15, pp. 79–89, 2012.

[15] B. Mirza, Z. Lin, and K.-A. Toh, "Weighted online sequential extreme learning machine for class imbalance learning," Neural Processing Letters, vol. 38, no. 3, pp. 465–486, 2013.

[16] X. Liu, L. Wang, G. B. Huang, J. Zhang, and J. Yin, "Multiple kernel extreme learning machine," Neurocomputing, vol. 149, part A, pp. 253–264, 2015.

[17] Y. Lan, Y. C. Soh, and G. B. Huang, "Two-stage extreme learning machine for regression," Neurocomputing, vol. 73, no. 16–18, pp. 3028–3038, 2010.

[18] C.-T. Lin, M. Prasad, and A. Saxena, "An improved polynomial neural network classifier using real-coded genetic algorithm," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 45, no. 11, pp. 1389–1401, 2015.

[19] C. F. Stallmann and A. P. Engelbrecht, "Gramophone noise detection and reconstruction using time delay artificial neural networks," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 6, pp. 893–905, 2017.

[20] E. Cambria, G.-B. Huang, and L. L. C. Kasun, "Extreme learning machines," IEEE Intelligent Systems, vol. 28, no. 6, pp. 30–59, 2013.

[21] W. Zhu, J. Miao, L. Qing, and G.-B. Huang, "Hierarchical extreme learning machine for unsupervised representation learning," in Proceedings of the International Joint Conference on Neural Networks (IJCNN 2015), Ireland, July 2015.

[22] W. Yu, F. Zhuang, Q. He, and Z. Shi, "Learning deep representations via extreme learning machines," Neurocomputing, pp. 308–315, 2015.

[23] G.-B. Huang, "Learning capability and storage capacity of two-hidden-layer feedforward networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 14, no. 2, pp. 274–281, 2003.

[24] G.-B. Huang and H. A. Babri, "Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions," IEEE Transactions on Neural Networks and Learning Systems, vol. 9, no. 1, pp. 224–229, 1998.

[25] G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, "Extreme learning machine for regression and multiclass classification," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 2, pp. 513–529, 2012.

[26] J. Wei, H. Liu, G. Yan, and F. Sun, "Robotic grasping recognition using multi-modal deep extreme learning machine," Multidimensional Systems and Signal Processing, vol. 28, no. 3, pp. 817–833, 2016.

[27] J. J. Liang, A. K. Qin, P. N. Suganthan, and S. Baskar, "Comprehensive learning particle swarm optimizer for global optimization of multimodal functions," IEEE Transactions on Evolutionary Computation, vol. 10, no. 3, pp. 281–295, 2006.

[28] University of California, Irvine, Machine Learning Repository, http://archive.ics.uci.edu/ml.

[29] LIBSVM website, https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.


Page 6: A Multiple Hidden Layers Extreme Learning Machine Method and Its Applicationdownloads.hindawi.com/journals/mpe/2017/4670187.pdf · 2017-08-30 · ResearchArticle A Multiple Hidden

6 Mathematical Problems in Engineering

Table 1 The RMSE of the three function examples

Algorithm Training RMSE Testing RMSE1198911(119909)

TELM 86233119864 minus 9 10797119864 minus 8MELM (three-hidden-layer) 8722119864 minus 15 13055119864 minus 14MLELM 18428119864 minus 6 15204119864 minus 5

1198912(119909)TELM 02867 02775MELM (three-hidden-layer) 00011 00019MELM 01060 01098

1198913(119909)TELM 03528 06204MELM (three-hidden-layer) 02110 04177MLELM 05538 04591

51 Regression Problems To test the performance of theregression problems several widely used functions are listedbelow [27]Weuse these functions to generate a datasetwhichincludes random selection of sufficient training samples andthe remaining is used as a testing samples and the activationfunction is selected as the hyperbolic tangent function 119892(119909) =(1 minus 119890minus119909)(1 + 119890minus119909)

(1) 1198911(119909) = sum119863119894=1 119909119894(2) 1198912(119909) = sum119863minus1119894=1 (100(1199092119894 minus 119909119894+1)2 + (119909119894 minus 1)2)(3) 1198913(119909) = minus20119890minus02radicsum119899119894=1 1199092119894 119863 minus 119890sum119899119894=1 cos(2120587119909119894)119863 + 20The symbol 119863 that is set as a positive integer represents

the dimensions of the function we are using The function1198912(119909) and 1198913(119909) are the complex multimodal functionEach function-experiment has 900 training samples and 100testing samples The evaluation criterion of the experimentsis the average accuracy namely the root mean square error(RMSE) in the regression problems We first need to conductsome experiments to verify the average accuracy of the neuralnetwork structure in different number of hidden neuronsand hidden layers Then we can observe the optimal neuralnetwork structure with specific hidden layers number andhidden neurons number compared with the results of theTELM on regression problems

In Table 1 we give the average RMSE of the above threefunctions in the case of the two-hidden-layer structure theMLELM and the three-hidden-layer structure (we do a seriesof experiments including the four-hidden-layer structurefive-hidden-layer structure and eight-hidden-layer struc-ture But the experiments proved that the three-hidden-layer structure for the three functions 1198911(119909) 1198912(119909) 1198913(119909) canachieve the best results so we only give the value of the RMSEof the three-hidden-layer structure) in regression problemsThe table includes the testing RMSE and the training RMSEof the three algorithm structure We can see that the MELM(three-hidden-layer) has the better results

Table 2 The three datasets for the classification

Datasets Training samples Testing samples Attributes ClassMice Protein 750 330 82 8svmguide4 300 312 10 6vowel 568 422 10 11AutoUnix 380 120 101 4Iris 100 50 4 3

52 Classification Problems To test the performance of theMELM algorithm proposed on classification datasets so wequote some datasets (such asMice Protein svmguide4 vowelAutoUnix and Iris) that are collected from the University ofCalifornia [28] and the LIBSVM website [29] The trainingdata and testing data of each experiment are randomlyselected from the original datasets This information of thedatasets introduced is given in Table 2

The classification performance criteria of the problemsare the average classification accuracy of the testing data Inthe Figure 5 the average testing classification correct percent-age for the ELM TELM MLELM and MELM algorithm isshown clearly by using (a) Mice Protein (b) svmguide4 (c)Vowel (d) AutoUnix and (e) Iris and different datasets (ab c d and e) have an important effect on the classificationaccuracy for the three algorithms (ELM TELM MLELMand MELM) But from the changing trend of the averagetesting correct percentage for the three algorithms with thesame datasets we can see that the MELM algorithm canselect the optimal number of hidden layers for the networkstructure to adapt to the specific datasets aiming at obtainingthe better results So the datasets of the Mice Protein andthe Forest type mapping use the three-hidden-layer ELMalgorithm to experiment and the datasets of the svmguide4use the five-hidden-layer ELM algorithm to experiment

53 The Application of Ores Selection Industry In this partwe use the actual datasets to analyze and experiment withthe model we have established and test the performance ofthe MELM algorithm First our datasets come from AnQianMining Group namely the mining area of Ya-ba-ling andXi-da-bei And the samples we selected are the hematite(50 samples) magnetite (90 samples) granite (57 samples)phyllite (15 samples) and chlorite (32 samples) On theprecise classification of the ores and the prediction of totaliron content of iron ores people usually use the chemicalanalysis method But the analysis process of total iron contentis a complex process a long process and the color of thesolution sometimes without the significantly change in theend Due to the extensive using of near infrared spectroscopyfor chemical composition analysis we can infer the structureof the unknown substance according to the position andshape of peaks in the absorption spectra So we can use thetheory to obtain the correct information of the ores with thelow influence of environment on data acquisition HR1024SVC spectrometer is used to test the spectral data of eachsample and the total iron content of hematite and magnetiteis obtained from the chemical analysis center of Northeastern

Mathematical Problems in Engineering 7

ELMTELM

Three-hidden-layerMLELM

30

40

50

60

70

80

90

100Av

erag

e tes

ting

corr

ect p

erce

ntag

e (

)

50 100 150 200 250 3000The number of hidden neurons

(a)

ELMTELM

Five-hidden-layerMLELM

2025303540455055606570

Aver

age t

estin

g co

rrec

t per

cent

age (

)

50 100 150 200 250 3000The number of hidden neurons

(b)

ELMTELM

Three-hidden-layerMLELM

10

20

30

40

50

60

70

80

90

Aver

age t

estin

g co

rrec

t per

cent

age (

)

50 100 150 200 250 3000The number of hidden neurons

(c)

ELMTELM

Five-hidden-layerMLELM

45

50

55

60

65

70

75

Aver

age t

estin

g co

rrec

t per

cent

age (

)

50 100 150 200 250 3000The number of hidden neurons

(d)

ELMTELM

Three-hidden-layerMLELM

55

60

65

70

75

80

85

90

95

Aver

age t

estin

g co

rrec

t per

cent

age (

)

50 100 150 200 250 3000The number of hidden neurons

(e)

Figure 5 The average testing classification correct percentage for the ELM TELM and MELM using (a) Mice Protein (b) svmguide4 (c)Vowel (d) AutoUnix and (e) Iris

8 Mathematical Problems in Engineering

Training data

ELMTELM

Three-hidden-layerMLELM

75

80

85

90

95

100

Aver

age t

rain

ing

corr

ect p

erce

ntag

e (

)

40 60 80 100 120 140 16020The number of hidden neurons

(a)

Testing data

ELMTELM

Three-hidden-layerMLELM

50

55

60

65

70

75

80

85

90

95

100

Aver

age t

estin

g co

rrec

t per

cent

age (

)

40 60 80 100 120 140 16020The number of hidden neurons

(b)

Figure 6 Average classification accuracy of different hidden layers

University of China Our experimentsrsquo datasets are the nearinfrared spectrum information of iron ores which are theabsorption rate of different iron ores under the condition ofthe near infrared spectrum The wavelength of near infraredspectrum is 300 nmndash2500 nm so the datasets of the iron oreswe obtained have the very high dimensions (973 dimensions)We use the theory of the principal component analysis toreduce the dimensions of the datasets

(A) Here we use the algorithm proposed in this paper toclassify the five kinds of ores and compare with the resultsof the ELM algorithm and TELM algorithm The activa-tion function of the algorithm in classification problems isselected as the sigmoid function 119892(119909) = 1(1 + 119890minus119909)

Figure 6 expresses the average classification accuracy ofthe ELM TELM MLELM and MELM (three-hidden-layerELM) algorithm in different hidden layers After we conducta series of experiments with different hidden layers of theMELM structure finally we select the three-hidden-layerELM structure which can obtain the optimal classificationresults in the MELM algorithm Figure 6(a) is the averagetraining correct percentage and the MELM has the highresult compared with others in the same hidden neuronsFigure 6(b) is the average testing correct percentage and theMELM has the high result in the hidden neurons between 20and 60 In the actual situation we can reasonably choose thenumber of each hidden neurons to model

(B) Here we use the method proposed in this paper totest the total iron content of the hematite and magnetite andcompare with the results of the ELM algorithm and TELMalgorithm Here is the prediction of total iron content ofhematite and magnetite with the optimal hidden layers andoptimal hidden layer neurons The activation function of the

algorithm in regression problems is selected as the hyperbolictangent function 119892(119909) = (1 minus 119890minus119909)(1 + 119890minus119909)

Figures 7 and 8 respectively expresses the total ironcontent of the hematite and magnetite by using the ELMTELM MLELM andMELM algorithm (the MELM includesthe eight-hidden-layer structure which is the optimal struc-ture in Figure 7 the six-hidden-layer structure which is theoptimal structure in Figure 8) The optimal results of eachalgorithm are obtained by constantly experimenting with thedifferent number of hidden layer neurons In the predictionprocess of the hematite we can see that the MELM has thebetter results and each hidden layer of this structure has 100hidden neurons In the prediction process of the magnetitewe can see that the MELM has the better results and eachhidden layer of this structure has 20 hidden neurons Sothe MELM algorithm has strong performance to the oresselection problems which can achieve the best adaptabilityto the problems by adjusting the number of hidden layers andthe number of hidden layer neurons to improve the ability ofsystem regression analysis

6 Conclusion

The MELM we proposed in this paper solves the two prob-lems which exist in the training process of the original ELMThe first is the stability of single network that the disadvan-tage will also influence the generalization performance Thesecond is that the outputweights are calculated by the productof the generalized inverse of hidden layer output and thesystem actual output And the parameters randomly selectedof hidden layer neurons which can lead to the singularmatrix or morbid matrix At the same time the MELM

Mathematical Problems in Engineering 9

The hematite

Expected outputELMTELM

Eight-hidden-layerMLELM

16

18

20

22

24

26

28

30

32

34

36Tr

ain

set o

utpu

t

10 20 30 40 50 600Train set sample

(a)

1 2 3 4 5 6 7 8 9 10Test set sample

The hematite

Expected outputELMTELM

Eight-hidden-layerMLELM

15

20

25

30

35

40

45

Test

set o

utpu

t(b)

Figure 7 The total iron content of the hematite

The magnetite

Expected outputELMTELM

Six-hidden-layerMLELM

26

28

30

32

34

36

38

Trai

n se

t out

put

2 3 4 5 6 7 8 91Train set sample

(a)

1 15 2 25 3 35 4 45 5Test set sample

The magnetite

Expected outputELMTELM

Six-hidden-layerMLELM

26

28

30

32

34

36

38

40

Test

set o

utpu

t

(b)

Figure 8 The total iron content of the magnetite

network structure also improves the average accuracy oftraining and testing performance compared to the ELM andTELM network structure The MELM algorithm inherits thecharacteristics of traditional ELM that randomly initializesthe weights and bias (between the input layer and the firsthidden layer) also adopts a part of the TELM algorithm anduses the inverse activation function to calculate the weightsand bias of hidden layers (except the first hidden layer)Then we make the actual hidden layer output approximateto the expected hidden layer output and use the parameters

obtained above to calculate the actual output In the functionregression problems this algorithm reduces the least meansquare error In the datasets classification problems theaverage accuracy of themultiple classifications is significantlyhigher than that of the ELM and TELMnetwork structure Insuch cases the MELM is able to improve the performance ofthe network structure

Conflicts of Interest

The authors declare no conflicts of interest

10 Mathematical Problems in Engineering

Acknowledgments

This research is supported by National Natural ScienceFoundation of China (Grants nos 41371437 61473072and 61203214) Fundamental Research Funds for the Cen-tral Universities (N160404008) National Twelfth Five-YearPlan for Science and Technology Support (2015BAB15B01)China and National Key Research and Development Plan(2016YFC0801602)

References

[1] K Hornik ldquoApproximation capabilities of multilayer feedfor-ward networksrdquoNeural Networks vol 4 no 2 pp 251ndash257 1991

[2] M Leshno V Y Lin A Pinkus and S Schocken ldquoMultilayerfeedforward networks with a nonpolynomial activation func-tion can approximate any functionrdquoNeural Networks vol 6 no6 pp 861ndash867 1993

[3] R Hecht-Nielsen ldquoTheory of the backpropagation neural net-workrdquo in Proceedings of the International Joint Conference onNeural Networks (IJCNN rsquo89) vol 1 pp 593ndash605 WashingtonDC USA June 1989

[4] H-C Hsin C-C Li M Sun and R J Sclabassi ldquoAn AdaptiveTraining Algorithm for Back-Propagation Neural NetworksrdquoIEEE Transactions on SystemsMan and Cybernetics vol 25 no3 pp 512ndash514 1995

[5] D Lowe ldquodaptive radial basis function nonlinearities and theproblemof generalizationrdquo in First IEE International Conferenceon Artificial Neural Networks pp 313-171 1989

[6] S Ding G Ma and Z Shi ldquoA Rough RBF Neural NetworkBased on Weighted Regularized Extreme Learning MachinerdquoNeural Processing Letters vol 40 no 3 pp 245ndash260 2014

[7] C SunW HeW Ge and C Chang ldquoAdaptive Neural NetworkControl of Biped Robotsrdquo IEEE Transactions on Systems Manand Cybernetics Systems vol 47 no 2 pp 315ndash326 2017

[8] S-J Wang H-L Chen W-J Yan Y-H Chen and X FuldquoFace recognition and micro-expression recognition based ondiscriminant tensor subspace analysis plus extreme learningmachinerdquo Neural Processing Letters vol 39 no 1 pp 25ndash432014

[9] G Huang ldquoAn insight into extreme learningmachines randomneurons random features and kernelsrdquo Cognitive Computation2014

[10] G B Huang Q Y Zhu and C K Siew ldquoExtreme learningmachine theory and applicationsrdquoNeurocomputing vol 70 no1ndash3 pp 489ndash501 2006

[11] S Ding N Zhang X Xu L Guo and J Zhang ldquoDeep ExtremeLearning Machine and Its Application in EEG ClassificationrdquoMathematical Problems in Engineering vol 2015 Article ID129021 2015

[12] B Qu B Lang J Liang A Qin and O Crisalle ldquoTwo-hidden-layer extreme learning machine for regression andclassificationrdquo Neurocomputing vol 175 pp 826ndash834 2016

[13] S I Tamura and M Tateishi ldquoCapabilities of a four-layeredfeedforward neural network four layers versus threerdquo IEEETransactions on Neural Networks and Learning Systems vol 8no 2 pp 251ndash255 1997

[14] J Zhao Z Wang and D S Park ldquoOnline sequential extremelearning machine with forgetting mechanismrdquo Neurocomput-ing vol 87 no 15 pp 79ndash89 2012

[15] B Mirza Z Lin and K-A Toh ldquoWeighted online sequentialextreme learning machine for class imbalance learningrdquoNeuralProcessing Letters vol 38 no 3 pp 465ndash486 2013

[16] X Liu L Wang G B Huang J Zhang and J Yin ldquoMultiplekernel extreme learning machinerdquo Neurocomputing vol 149part A pp 253ndash264 2015

[17] Y Lan Y C Soh andG BHuang ldquoTwo-stage extreme learningmachine for regressionrdquo Neurocomputing vol 73 no 16-18 pp3028ndash3038 2010

[18] C-T Lin M Prasad and A Saxena ldquoAn Improved PolynomialNeural Network Classifier Using Real-Coded Genetic Algo-rithmrdquo IEEE Transactions on Systems Man and CyberneticsSystems vol 45 no 11 pp 1389ndash1401 2015

[19] C F Stallmann and A P Engelbrecht ldquoGramophone NoiseDetection and Reconstruction Using Time Delay ArtificialNeural Networksrdquo IEEE Transactions on Systems Man andCybernetics Systems vol 47 no 6 pp 893ndash905 2017

[20] E Cambria G-BHuang and L L C Kasun ldquoExtreme learningmachinesrdquo IEEE Intelligent Systems vol 28 no 6 pp 30ndash592013

[21] W Zhu J Miao L Qing and G-B Huang ldquoHierarchicalExtreme Learning Machine for unsupervised representationlearningrdquo in Proceedings of the International Joint Conference onNeural Networks IJCNN 2015 Ireland July 2015

[22] W Yu F Zhuang Q He and Z Shi ldquoLearning deep representa-tions via extreme learningmachinesrdquoNeurocomputing pp 308ndash315 2015

[23] G-B Huang ldquoLearning capability and storage capacity oftwo-hidden-layer feedforward networksrdquo IEEE Transactions onNeural Networks and Learning Systems vol 14 no 2 pp 274ndash281 2003

[24] G-B Huang and H A Babri ldquoUpper bounds on the numberof hidden neurons in feedforward networks with arbitrarybounded nonlinear activation functionsrdquo IEEE Transactions onNeural Networks and Learning Systems vol 9 no 1 pp 224ndash2291998

[25] G-B Huang H Zhou X Ding and R Zhang ldquoExtremelearning machine for regression and multiclass classificationrdquoIEEE Transactions on Systems Man and Cybernetics Part BCybernetics vol 42 no 2 pp 513ndash529 2012

[26] JWei H Liu G Yan and F Sun ldquoRobotic grasping recognitionusing multi-modal deep extreme learning machinerdquo Multidi-mensional Systems and Signal Processing vol 28 no 3 pp 817ndash833 2016

[27] J J Liang A K Qin P N Suganthan and S BaskarldquoComprehensive learning particle swarm optimizer for globaloptimization of multimodal functionsrdquo IEEE Transactions onEvolutionary Computation vol 10 no 3 pp 281ndash295 2006

[28] University of California Irvine Machine Learning Repositoryhttparchiveicsucieduml

[29] LWebsite httpswwwcsientuedutwsimcjlinlibsvmtoolsdata-sets

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 7: A Multiple Hidden Layers Extreme Learning Machine Method and Its Applicationdownloads.hindawi.com/journals/mpe/2017/4670187.pdf · 2017-08-30 · ResearchArticle A Multiple Hidden

Mathematical Problems in Engineering 7

ELMTELM

Three-hidden-layerMLELM

30

40

50

60

70

80

90

100Av

erag

e tes

ting

corr

ect p

erce

ntag

e (

)

50 100 150 200 250 3000The number of hidden neurons

(a)

ELMTELM

Five-hidden-layerMLELM

2025303540455055606570

Aver

age t

estin

g co

rrec

t per

cent

age (

)

50 100 150 200 250 3000The number of hidden neurons

(b)

ELMTELM

Three-hidden-layerMLELM

10

20

30

40

50

60

70

80

90

Aver

age t

estin

g co

rrec

t per

cent

age (

)

50 100 150 200 250 3000The number of hidden neurons

(c)

ELMTELM

Five-hidden-layerMLELM

45

50

55

60

65

70

75

Aver

age t

estin

g co

rrec

t per

cent

age (

)

50 100 150 200 250 3000The number of hidden neurons

(d)

ELMTELM

Three-hidden-layerMLELM

55

60

65

70

75

80

85

90

95

Aver

age t

estin

g co

rrec

t per

cent

age (

)

50 100 150 200 250 3000The number of hidden neurons

(e)

Figure 5 The average testing classification correct percentage for the ELM TELM and MELM using (a) Mice Protein (b) svmguide4 (c)Vowel (d) AutoUnix and (e) Iris

8 Mathematical Problems in Engineering

Training data

ELMTELM

Three-hidden-layerMLELM

75

80

85

90

95

100

Aver

age t

rain

ing

corr

ect p

erce

ntag

e (

)

40 60 80 100 120 140 16020The number of hidden neurons

(a)

Testing data

ELMTELM

Three-hidden-layerMLELM

50

55

60

65

70

75

80

85

90

95

100

Aver

age t

estin

g co

rrec

t per

cent

age (

)

40 60 80 100 120 140 16020The number of hidden neurons

(b)

Figure 6 Average classification accuracy of different hidden layers

University of China Our experimentsrsquo datasets are the nearinfrared spectrum information of iron ores which are theabsorption rate of different iron ores under the condition ofthe near infrared spectrum The wavelength of near infraredspectrum is 300 nmndash2500 nm so the datasets of the iron oreswe obtained have the very high dimensions (973 dimensions)We use the theory of the principal component analysis toreduce the dimensions of the datasets

(A) Here we use the algorithm proposed in this paper toclassify the five kinds of ores and compare with the resultsof the ELM algorithm and TELM algorithm The activa-tion function of the algorithm in classification problems isselected as the sigmoid function 119892(119909) = 1(1 + 119890minus119909)

Figure 6 expresses the average classification accuracy ofthe ELM TELM MLELM and MELM (three-hidden-layerELM) algorithm in different hidden layers After we conducta series of experiments with different hidden layers of theMELM structure finally we select the three-hidden-layerELM structure which can obtain the optimal classificationresults in the MELM algorithm Figure 6(a) is the averagetraining correct percentage and the MELM has the highresult compared with others in the same hidden neuronsFigure 6(b) is the average testing correct percentage and theMELM has the high result in the hidden neurons between 20and 60 In the actual situation we can reasonably choose thenumber of each hidden neurons to model

(B) Here we use the method proposed in this paper totest the total iron content of the hematite and magnetite andcompare with the results of the ELM algorithm and TELMalgorithm Here is the prediction of total iron content ofhematite and magnetite with the optimal hidden layers andoptimal hidden layer neurons The activation function of the

algorithm in regression problems is selected as the hyperbolictangent function 119892(119909) = (1 minus 119890minus119909)(1 + 119890minus119909)

Figures 7 and 8 respectively expresses the total ironcontent of the hematite and magnetite by using the ELMTELM MLELM andMELM algorithm (the MELM includesthe eight-hidden-layer structure which is the optimal struc-ture in Figure 7 the six-hidden-layer structure which is theoptimal structure in Figure 8) The optimal results of eachalgorithm are obtained by constantly experimenting with thedifferent number of hidden layer neurons In the predictionprocess of the hematite we can see that the MELM has thebetter results and each hidden layer of this structure has 100hidden neurons In the prediction process of the magnetitewe can see that the MELM has the better results and eachhidden layer of this structure has 20 hidden neurons Sothe MELM algorithm has strong performance to the oresselection problems which can achieve the best adaptabilityto the problems by adjusting the number of hidden layers andthe number of hidden layer neurons to improve the ability ofsystem regression analysis

6 Conclusion

The MELM we proposed in this paper solves the two prob-lems which exist in the training process of the original ELMThe first is the stability of single network that the disadvan-tage will also influence the generalization performance Thesecond is that the outputweights are calculated by the productof the generalized inverse of hidden layer output and thesystem actual output And the parameters randomly selectedof hidden layer neurons which can lead to the singularmatrix or morbid matrix At the same time the MELM

Mathematical Problems in Engineering 9

The hematite

Expected outputELMTELM

Eight-hidden-layerMLELM

16

18

20

22

24

26

28

30

32

34

36Tr

ain

set o

utpu

t

10 20 30 40 50 600Train set sample

(a)

1 2 3 4 5 6 7 8 9 10Test set sample

The hematite

Expected outputELMTELM

Eight-hidden-layerMLELM

15

20

25

30

35

40

45

Test

set o

utpu

t(b)

Figure 7 The total iron content of the hematite

The magnetite

Expected outputELMTELM

Six-hidden-layerMLELM

26

28

30

32

34

36

38

Trai

n se

t out

put

2 3 4 5 6 7 8 91Train set sample

(a)

1 15 2 25 3 35 4 45 5Test set sample

The magnetite

Expected outputELMTELM

Six-hidden-layerMLELM

26

28

30

32

34

36

38

40

Test

set o

utpu

t

(b)

Figure 8 The total iron content of the magnetite

network structure also improves the average accuracy oftraining and testing performance compared to the ELM andTELM network structure The MELM algorithm inherits thecharacteristics of traditional ELM that randomly initializesthe weights and bias (between the input layer and the firsthidden layer) also adopts a part of the TELM algorithm anduses the inverse activation function to calculate the weightsand bias of hidden layers (except the first hidden layer)Then we make the actual hidden layer output approximateto the expected hidden layer output and use the parameters

obtained above to calculate the actual output In the functionregression problems this algorithm reduces the least meansquare error In the datasets classification problems theaverage accuracy of themultiple classifications is significantlyhigher than that of the ELM and TELMnetwork structure Insuch cases the MELM is able to improve the performance ofthe network structure

Conflicts of Interest

The authors declare no conflicts of interest

10 Mathematical Problems in Engineering

Acknowledgments

This research is supported by National Natural ScienceFoundation of China (Grants nos 41371437 61473072and 61203214) Fundamental Research Funds for the Cen-tral Universities (N160404008) National Twelfth Five-YearPlan for Science and Technology Support (2015BAB15B01)China and National Key Research and Development Plan(2016YFC0801602)

References

[1] K Hornik ldquoApproximation capabilities of multilayer feedfor-ward networksrdquoNeural Networks vol 4 no 2 pp 251ndash257 1991

[2] M Leshno V Y Lin A Pinkus and S Schocken ldquoMultilayerfeedforward networks with a nonpolynomial activation func-tion can approximate any functionrdquoNeural Networks vol 6 no6 pp 861ndash867 1993

[3] R Hecht-Nielsen ldquoTheory of the backpropagation neural net-workrdquo in Proceedings of the International Joint Conference onNeural Networks (IJCNN rsquo89) vol 1 pp 593ndash605 WashingtonDC USA June 1989

[4] H-C Hsin C-C Li M Sun and R J Sclabassi ldquoAn AdaptiveTraining Algorithm for Back-Propagation Neural NetworksrdquoIEEE Transactions on SystemsMan and Cybernetics vol 25 no3 pp 512ndash514 1995

[5] D Lowe ldquodaptive radial basis function nonlinearities and theproblemof generalizationrdquo in First IEE International Conferenceon Artificial Neural Networks pp 313-171 1989

[6] S Ding G Ma and Z Shi ldquoA Rough RBF Neural NetworkBased on Weighted Regularized Extreme Learning MachinerdquoNeural Processing Letters vol 40 no 3 pp 245ndash260 2014

[7] C SunW HeW Ge and C Chang ldquoAdaptive Neural NetworkControl of Biped Robotsrdquo IEEE Transactions on Systems Manand Cybernetics Systems vol 47 no 2 pp 315ndash326 2017

[8] S-J Wang H-L Chen W-J Yan Y-H Chen and X FuldquoFace recognition and micro-expression recognition based ondiscriminant tensor subspace analysis plus extreme learningmachinerdquo Neural Processing Letters vol 39 no 1 pp 25ndash432014

[9] G Huang ldquoAn insight into extreme learningmachines randomneurons random features and kernelsrdquo Cognitive Computation2014

[10] G B Huang Q Y Zhu and C K Siew ldquoExtreme learningmachine theory and applicationsrdquoNeurocomputing vol 70 no1ndash3 pp 489ndash501 2006

[11] S Ding N Zhang X Xu L Guo and J Zhang ldquoDeep ExtremeLearning Machine and Its Application in EEG ClassificationrdquoMathematical Problems in Engineering vol 2015 Article ID129021 2015

[12] B Qu B Lang J Liang A Qin and O Crisalle ldquoTwo-hidden-layer extreme learning machine for regression andclassificationrdquo Neurocomputing vol 175 pp 826ndash834 2016

[13] S I Tamura and M Tateishi ldquoCapabilities of a four-layeredfeedforward neural network four layers versus threerdquo IEEETransactions on Neural Networks and Learning Systems vol 8no 2 pp 251ndash255 1997

[14] J Zhao Z Wang and D S Park ldquoOnline sequential extremelearning machine with forgetting mechanismrdquo Neurocomput-ing vol 87 no 15 pp 79ndash89 2012

[15] B Mirza Z Lin and K-A Toh ldquoWeighted online sequentialextreme learning machine for class imbalance learningrdquoNeuralProcessing Letters vol 38 no 3 pp 465ndash486 2013

[16] X Liu L Wang G B Huang J Zhang and J Yin ldquoMultiplekernel extreme learning machinerdquo Neurocomputing vol 149part A pp 253ndash264 2015

[17] Y Lan Y C Soh andG BHuang ldquoTwo-stage extreme learningmachine for regressionrdquo Neurocomputing vol 73 no 16-18 pp3028ndash3038 2010

[18] C-T Lin M Prasad and A Saxena ldquoAn Improved PolynomialNeural Network Classifier Using Real-Coded Genetic Algo-rithmrdquo IEEE Transactions on Systems Man and CyberneticsSystems vol 45 no 11 pp 1389ndash1401 2015

[19] C F Stallmann and A P Engelbrecht ldquoGramophone NoiseDetection and Reconstruction Using Time Delay ArtificialNeural Networksrdquo IEEE Transactions on Systems Man andCybernetics Systems vol 47 no 6 pp 893ndash905 2017

[20] E Cambria G-BHuang and L L C Kasun ldquoExtreme learningmachinesrdquo IEEE Intelligent Systems vol 28 no 6 pp 30ndash592013

[21] W Zhu J Miao L Qing and G-B Huang ldquoHierarchicalExtreme Learning Machine for unsupervised representationlearningrdquo in Proceedings of the International Joint Conference onNeural Networks IJCNN 2015 Ireland July 2015

[22] W Yu F Zhuang Q He and Z Shi ldquoLearning deep representa-tions via extreme learningmachinesrdquoNeurocomputing pp 308ndash315 2015

[23] G-B Huang ldquoLearning capability and storage capacity oftwo-hidden-layer feedforward networksrdquo IEEE Transactions onNeural Networks and Learning Systems vol 14 no 2 pp 274ndash281 2003

[24] G-B Huang and H A Babri ldquoUpper bounds on the numberof hidden neurons in feedforward networks with arbitrarybounded nonlinear activation functionsrdquo IEEE Transactions onNeural Networks and Learning Systems vol 9 no 1 pp 224ndash2291998

[25] G-B Huang H Zhou X Ding and R Zhang ldquoExtremelearning machine for regression and multiclass classificationrdquoIEEE Transactions on Systems Man and Cybernetics Part BCybernetics vol 42 no 2 pp 513ndash529 2012

[26] JWei H Liu G Yan and F Sun ldquoRobotic grasping recognitionusing multi-modal deep extreme learning machinerdquo Multidi-mensional Systems and Signal Processing vol 28 no 3 pp 817ndash833 2016

[27] J J Liang A K Qin P N Suganthan and S BaskarldquoComprehensive learning particle swarm optimizer for globaloptimization of multimodal functionsrdquo IEEE Transactions onEvolutionary Computation vol 10 no 3 pp 281ndash295 2006

[28] University of California Irvine Machine Learning Repositoryhttparchiveicsucieduml

[29] LWebsite httpswwwcsientuedutwsimcjlinlibsvmtoolsdata-sets

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 8: A Multiple Hidden Layers Extreme Learning Machine Method and Its Applicationdownloads.hindawi.com/journals/mpe/2017/4670187.pdf · 2017-08-30 · ResearchArticle A Multiple Hidden

8 Mathematical Problems in Engineering

Training data

ELMTELM

Three-hidden-layerMLELM

75

80

85

90

95

100

Aver

age t

rain

ing

corr

ect p

erce

ntag

e (

)

40 60 80 100 120 140 16020The number of hidden neurons

(a)

Testing data

ELMTELM

Three-hidden-layerMLELM

50

55

60

65

70

75

80

85

90

95

100

Aver

age t

estin

g co

rrec

t per

cent

age (

)

40 60 80 100 120 140 16020The number of hidden neurons

(b)

Figure 6 Average classification accuracy of different hidden layers

University of China Our experimentsrsquo datasets are the nearinfrared spectrum information of iron ores which are theabsorption rate of different iron ores under the condition ofthe near infrared spectrum The wavelength of near infraredspectrum is 300 nmndash2500 nm so the datasets of the iron oreswe obtained have the very high dimensions (973 dimensions)We use the theory of the principal component analysis toreduce the dimensions of the datasets

(A) Here we use the algorithm proposed in this paper toclassify the five kinds of ores and compare with the resultsof the ELM algorithm and TELM algorithm The activa-tion function of the algorithm in classification problems isselected as the sigmoid function 119892(119909) = 1(1 + 119890minus119909)

Figure 6 expresses the average classification accuracy ofthe ELM TELM MLELM and MELM (three-hidden-layerELM) algorithm in different hidden layers After we conducta series of experiments with different hidden layers of theMELM structure finally we select the three-hidden-layerELM structure which can obtain the optimal classificationresults in the MELM algorithm Figure 6(a) is the averagetraining correct percentage and the MELM has the highresult compared with others in the same hidden neuronsFigure 6(b) is the average testing correct percentage and theMELM has the high result in the hidden neurons between 20and 60 In the actual situation we can reasonably choose thenumber of each hidden neurons to model

(B) Here we use the method proposed in this paper totest the total iron content of the hematite and magnetite andcompare with the results of the ELM algorithm and TELMalgorithm Here is the prediction of total iron content ofhematite and magnetite with the optimal hidden layers andoptimal hidden layer neurons The activation function of the

algorithm in regression problems is selected as the hyperbolictangent function 119892(119909) = (1 minus 119890minus119909)(1 + 119890minus119909)

Figures 7 and 8 respectively expresses the total ironcontent of the hematite and magnetite by using the ELMTELM MLELM andMELM algorithm (the MELM includesthe eight-hidden-layer structure which is the optimal struc-ture in Figure 7 the six-hidden-layer structure which is theoptimal structure in Figure 8) The optimal results of eachalgorithm are obtained by constantly experimenting with thedifferent number of hidden layer neurons In the predictionprocess of the hematite we can see that the MELM has thebetter results and each hidden layer of this structure has 100hidden neurons In the prediction process of the magnetitewe can see that the MELM has the better results and eachhidden layer of this structure has 20 hidden neurons Sothe MELM algorithm has strong performance to the oresselection problems which can achieve the best adaptabilityto the problems by adjusting the number of hidden layers andthe number of hidden layer neurons to improve the ability ofsystem regression analysis

6 Conclusion

The MELM we proposed in this paper solves the two prob-lems which exist in the training process of the original ELMThe first is the stability of single network that the disadvan-tage will also influence the generalization performance Thesecond is that the outputweights are calculated by the productof the generalized inverse of hidden layer output and thesystem actual output And the parameters randomly selectedof hidden layer neurons which can lead to the singularmatrix or morbid matrix At the same time the MELM

Mathematical Problems in Engineering 9

The hematite

Expected outputELMTELM

Eight-hidden-layerMLELM

16

18

20

22

24

26

28

30

32

34

36Tr

ain

set o

utpu

t

10 20 30 40 50 600Train set sample

(a)

1 2 3 4 5 6 7 8 9 10Test set sample

The hematite

Expected outputELMTELM

Eight-hidden-layerMLELM

15

20

25

30

35

40

45

Test

set o

utpu

t(b)

Figure 7 The total iron content of the hematite

The magnetite

Expected outputELMTELM

Six-hidden-layerMLELM

26

28

30

32

34

36

38

Trai

n se

t out

put

2 3 4 5 6 7 8 91Train set sample

(a)

1 15 2 25 3 35 4 45 5Test set sample

The magnetite

Expected outputELMTELM

Six-hidden-layerMLELM

26

28

30

32

34

36

38

40

Test

set o

utpu

t

(b)

Figure 8 The total iron content of the magnetite

network structure also improves the average accuracy oftraining and testing performance compared to the ELM andTELM network structure The MELM algorithm inherits thecharacteristics of traditional ELM that randomly initializesthe weights and bias (between the input layer and the firsthidden layer) also adopts a part of the TELM algorithm anduses the inverse activation function to calculate the weightsand bias of hidden layers (except the first hidden layer)Then we make the actual hidden layer output approximateto the expected hidden layer output and use the parameters

obtained above to calculate the actual output In the functionregression problems this algorithm reduces the least meansquare error In the datasets classification problems theaverage accuracy of themultiple classifications is significantlyhigher than that of the ELM and TELMnetwork structure Insuch cases the MELM is able to improve the performance ofthe network structure

Conflicts of Interest

The authors declare no conflicts of interest

10 Mathematical Problems in Engineering

Acknowledgments

This research is supported by National Natural ScienceFoundation of China (Grants nos 41371437 61473072and 61203214) Fundamental Research Funds for the Cen-tral Universities (N160404008) National Twelfth Five-YearPlan for Science and Technology Support (2015BAB15B01)China and National Key Research and Development Plan(2016YFC0801602)

References

[1] K Hornik ldquoApproximation capabilities of multilayer feedfor-ward networksrdquoNeural Networks vol 4 no 2 pp 251ndash257 1991

[2] M Leshno V Y Lin A Pinkus and S Schocken ldquoMultilayerfeedforward networks with a nonpolynomial activation func-tion can approximate any functionrdquoNeural Networks vol 6 no6 pp 861ndash867 1993

[3] R Hecht-Nielsen ldquoTheory of the backpropagation neural net-workrdquo in Proceedings of the International Joint Conference onNeural Networks (IJCNN rsquo89) vol 1 pp 593ndash605 WashingtonDC USA June 1989

[4] H-C Hsin C-C Li M Sun and R J Sclabassi ldquoAn AdaptiveTraining Algorithm for Back-Propagation Neural NetworksrdquoIEEE Transactions on SystemsMan and Cybernetics vol 25 no3 pp 512ndash514 1995

[5] D Lowe ldquodaptive radial basis function nonlinearities and theproblemof generalizationrdquo in First IEE International Conferenceon Artificial Neural Networks pp 313-171 1989

[6] S Ding G Ma and Z Shi ldquoA Rough RBF Neural NetworkBased on Weighted Regularized Extreme Learning MachinerdquoNeural Processing Letters vol 40 no 3 pp 245ndash260 2014

[7] C SunW HeW Ge and C Chang ldquoAdaptive Neural NetworkControl of Biped Robotsrdquo IEEE Transactions on Systems Manand Cybernetics Systems vol 47 no 2 pp 315ndash326 2017

[8] S-J Wang H-L Chen W-J Yan Y-H Chen and X FuldquoFace recognition and micro-expression recognition based ondiscriminant tensor subspace analysis plus extreme learningmachinerdquo Neural Processing Letters vol 39 no 1 pp 25ndash432014

[9] G Huang ldquoAn insight into extreme learningmachines randomneurons random features and kernelsrdquo Cognitive Computation2014

[10] G B Huang Q Y Zhu and C K Siew ldquoExtreme learningmachine theory and applicationsrdquoNeurocomputing vol 70 no1ndash3 pp 489ndash501 2006

[11] S Ding N Zhang X Xu L Guo and J Zhang ldquoDeep ExtremeLearning Machine and Its Application in EEG ClassificationrdquoMathematical Problems in Engineering vol 2015 Article ID129021 2015

[12] B Qu B Lang J Liang A Qin and O Crisalle ldquoTwo-hidden-layer extreme learning machine for regression andclassificationrdquo Neurocomputing vol 175 pp 826ndash834 2016

[13] S I Tamura and M Tateishi ldquoCapabilities of a four-layeredfeedforward neural network four layers versus threerdquo IEEETransactions on Neural Networks and Learning Systems vol 8no 2 pp 251ndash255 1997

[14] J Zhao Z Wang and D S Park ldquoOnline sequential extremelearning machine with forgetting mechanismrdquo Neurocomput-ing vol 87 no 15 pp 79ndash89 2012

[15] B Mirza Z Lin and K-A Toh ldquoWeighted online sequentialextreme learning machine for class imbalance learningrdquoNeuralProcessing Letters vol 38 no 3 pp 465ndash486 2013

[16] X Liu L Wang G B Huang J Zhang and J Yin ldquoMultiplekernel extreme learning machinerdquo Neurocomputing vol 149part A pp 253ndash264 2015

[17] Y Lan Y C Soh andG BHuang ldquoTwo-stage extreme learningmachine for regressionrdquo Neurocomputing vol 73 no 16-18 pp3028ndash3038 2010

[18] C-T Lin M Prasad and A Saxena ldquoAn Improved PolynomialNeural Network Classifier Using Real-Coded Genetic Algo-rithmrdquo IEEE Transactions on Systems Man and CyberneticsSystems vol 45 no 11 pp 1389ndash1401 2015

[19] C F Stallmann and A P Engelbrecht ldquoGramophone NoiseDetection and Reconstruction Using Time Delay ArtificialNeural Networksrdquo IEEE Transactions on Systems Man andCybernetics Systems vol 47 no 6 pp 893ndash905 2017

[20] E Cambria G-BHuang and L L C Kasun ldquoExtreme learningmachinesrdquo IEEE Intelligent Systems vol 28 no 6 pp 30ndash592013

[21] W Zhu J Miao L Qing and G-B Huang ldquoHierarchicalExtreme Learning Machine for unsupervised representationlearningrdquo in Proceedings of the International Joint Conference onNeural Networks IJCNN 2015 Ireland July 2015

[22] W Yu F Zhuang Q He and Z Shi ldquoLearning deep representa-tions via extreme learningmachinesrdquoNeurocomputing pp 308ndash315 2015

[23] G-B Huang ldquoLearning capability and storage capacity oftwo-hidden-layer feedforward networksrdquo IEEE Transactions onNeural Networks and Learning Systems vol 14 no 2 pp 274ndash281 2003

[24] G-B Huang and H A Babri ldquoUpper bounds on the numberof hidden neurons in feedforward networks with arbitrarybounded nonlinear activation functionsrdquo IEEE Transactions onNeural Networks and Learning Systems vol 9 no 1 pp 224ndash2291998

[25] G-B Huang H Zhou X Ding and R Zhang ldquoExtremelearning machine for regression and multiclass classificationrdquoIEEE Transactions on Systems Man and Cybernetics Part BCybernetics vol 42 no 2 pp 513ndash529 2012

[26] JWei H Liu G Yan and F Sun ldquoRobotic grasping recognitionusing multi-modal deep extreme learning machinerdquo Multidi-mensional Systems and Signal Processing vol 28 no 3 pp 817ndash833 2016

[27] J J Liang A K Qin P N Suganthan and S BaskarldquoComprehensive learning particle swarm optimizer for globaloptimization of multimodal functionsrdquo IEEE Transactions onEvolutionary Computation vol 10 no 3 pp 281ndash295 2006

[28] University of California Irvine Machine Learning Repositoryhttparchiveicsucieduml

[29] LWebsite httpswwwcsientuedutwsimcjlinlibsvmtoolsdata-sets

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 9: A Multiple Hidden Layers Extreme Learning Machine Method and Its Applicationdownloads.hindawi.com/journals/mpe/2017/4670187.pdf · 2017-08-30 · ResearchArticle A Multiple Hidden

Mathematical Problems in Engineering 9

Figure 7: The total iron content of the hematite. (a) Training set output versus training set sample; (b) test set output versus test set sample. Legend entries: expected output, ELM, TELM, eight-hidden-layer, MLELM.

Figure 8: The total iron content of the magnetite. (a) Training set output versus training set sample; (b) test set output versus test set sample. Legend entries: expected output, ELM, TELM, six-hidden-layer, MLELM.

network structure also improves the average accuracy of training and testing performance compared to the ELM and TELM network structures. The MELM algorithm inherits from the traditional ELM the random initialization of the weights and bias between the input layer and the first hidden layer; it also adopts part of the TELM algorithm and uses the inverse activation function to calculate the weights and bias of the hidden layers other than the first. We then make the actual hidden-layer output approximate the expected hidden-layer output and use the parameters obtained in this way to calculate the actual output. In function regression problems the algorithm reduces the mean square error; in dataset classification problems the average accuracy over the multiple classifications is significantly higher than that of the ELM and TELM network structures. In such cases the MELM is able to improve the performance of the network structure.
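To make the summarized training procedure concrete, the following Python/NumPy listing sketches the steps described above. It is a minimal illustration under our own assumptions rather than the authors' implementation: it assumes a sigmoid activation (whose logit is used as the inverse activation function) and uniformly random first-layer parameters, and the names melm_train, melm_predict, n_hidden, and n_layers are illustrative, not taken from the paper.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def inv_sigmoid(h, eps=1e-6):
    # Logit, the inverse of the sigmoid; clipped so log() stays finite.
    h = np.clip(h, eps, 1.0 - eps)
    return np.log(h / (1.0 - h))

def melm_train(X, T, n_hidden=20, n_layers=3, seed=0):
    # X: (N, d) inputs; T: (N, m) targets.
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    # First hidden layer: random input weights and bias, as in the standard ELM.
    W1 = rng.uniform(-1.0, 1.0, (X.shape[1], n_hidden))
    b1 = rng.uniform(-1.0, 1.0, (1, n_hidden))
    H = sigmoid(X @ W1 + b1)
    beta = np.linalg.pinv(H) @ T              # least-squares output weights
    layers = []
    for _ in range(n_layers - 1):
        # Expected output of the next hidden layer, chosen so that H_exp @ beta ~ T.
        H_exp = T @ np.linalg.pinv(beta)
        # Recover the layer's weights and bias through the inverse activation.
        H_aug = np.hstack([H, np.ones((N, 1))])            # bias column appended
        WB = np.linalg.pinv(H_aug) @ inv_sigmoid(H_exp)    # stacked [W; b]
        H = sigmoid(H_aug @ WB)                            # actual hidden-layer output
        beta = np.linalg.pinv(H) @ T                       # refresh output weights
        layers.append(WB)
    return W1, b1, layers, beta

def melm_predict(X, W1, b1, layers, beta):
    # Propagate new samples through the trained layers and the output weights.
    H = sigmoid(X @ W1 + b1)
    for WB in layers:
        H = sigmoid(np.hstack([H, np.ones((X.shape[0], 1))]) @ WB)
    return H @ beta

A typical call would be W1, b1, layers, beta = melm_train(X_train, T_train) followed by Y = melm_predict(X_test, W1, b1, layers, beta), mirroring the train/test split used in the regression experiments; the hedged choice of sigmoid and its logit can be swapped for any invertible activation.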

Conflicts of Interest

The authors declare that there are no conflicts of interest.


Acknowledgments

This research is supported by the National Natural Science Foundation of China (Grants nos. 41371437, 61473072, and 61203214), the Fundamental Research Funds for the Central Universities (N160404008), the National Twelfth Five-Year Plan for Science and Technology Support (2015BAB15B01), China, and the National Key Research and Development Plan (2016YFC0801602).

References

[1] K. Hornik, "Approximation capabilities of multilayer feedforward networks," Neural Networks, vol. 4, no. 2, pp. 251–257, 1991.

[2] M. Leshno, V. Y. Lin, A. Pinkus, and S. Schocken, "Multilayer feedforward networks with a nonpolynomial activation function can approximate any function," Neural Networks, vol. 6, no. 6, pp. 861–867, 1993.

[3] R. Hecht-Nielsen, "Theory of the backpropagation neural network," in Proceedings of the International Joint Conference on Neural Networks (IJCNN '89), vol. 1, pp. 593–605, Washington, DC, USA, June 1989.

[4] H.-C. Hsin, C.-C. Li, M. Sun, and R. J. Sclabassi, "An adaptive training algorithm for back-propagation neural networks," IEEE Transactions on Systems, Man and Cybernetics, vol. 25, no. 3, pp. 512–514, 1995.

[5] D. Lowe, "Adaptive radial basis function nonlinearities and the problem of generalization," in Proceedings of the First IEE International Conference on Artificial Neural Networks, pp. 171–175, 1989.

[6] S. Ding, G. Ma, and Z. Shi, "A rough RBF neural network based on weighted regularized extreme learning machine," Neural Processing Letters, vol. 40, no. 3, pp. 245–260, 2014.

[7] C. Sun, W. He, W. Ge, and C. Chang, "Adaptive neural network control of biped robots," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 2, pp. 315–326, 2017.

[8] S.-J. Wang, H.-L. Chen, W.-J. Yan, Y.-H. Chen, and X. Fu, "Face recognition and micro-expression recognition based on discriminant tensor subspace analysis plus extreme learning machine," Neural Processing Letters, vol. 39, no. 1, pp. 25–43, 2014.

[9] G. Huang, "An insight into extreme learning machines: random neurons, random features and kernels," Cognitive Computation, 2014.

[10] G. B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.

[11] S. Ding, N. Zhang, X. Xu, L. Guo, and J. Zhang, "Deep extreme learning machine and its application in EEG classification," Mathematical Problems in Engineering, vol. 2015, Article ID 129021, 2015.

[12] B. Qu, B. Lang, J. Liang, A. Qin, and O. Crisalle, "Two-hidden-layer extreme learning machine for regression and classification," Neurocomputing, vol. 175, pp. 826–834, 2016.

[13] S. I. Tamura and M. Tateishi, "Capabilities of a four-layered feedforward neural network: four layers versus three," IEEE Transactions on Neural Networks and Learning Systems, vol. 8, no. 2, pp. 251–255, 1997.

[14] J. Zhao, Z. Wang, and D. S. Park, "Online sequential extreme learning machine with forgetting mechanism," Neurocomputing, vol. 87, no. 15, pp. 79–89, 2012.

[15] B. Mirza, Z. Lin, and K.-A. Toh, "Weighted online sequential extreme learning machine for class imbalance learning," Neural Processing Letters, vol. 38, no. 3, pp. 465–486, 2013.

[16] X. Liu, L. Wang, G. B. Huang, J. Zhang, and J. Yin, "Multiple kernel extreme learning machine," Neurocomputing, vol. 149, part A, pp. 253–264, 2015.

[17] Y. Lan, Y. C. Soh, and G. B. Huang, "Two-stage extreme learning machine for regression," Neurocomputing, vol. 73, no. 16–18, pp. 3028–3038, 2010.

[18] C.-T. Lin, M. Prasad, and A. Saxena, "An improved polynomial neural network classifier using real-coded genetic algorithm," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 45, no. 11, pp. 1389–1401, 2015.

[19] C. F. Stallmann and A. P. Engelbrecht, "Gramophone noise detection and reconstruction using time delay artificial neural networks," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 6, pp. 893–905, 2017.

[20] E. Cambria, G.-B. Huang, and L. L. C. Kasun, "Extreme learning machines," IEEE Intelligent Systems, vol. 28, no. 6, pp. 30–59, 2013.

[21] W. Zhu, J. Miao, L. Qing, and G.-B. Huang, "Hierarchical extreme learning machine for unsupervised representation learning," in Proceedings of the International Joint Conference on Neural Networks (IJCNN 2015), Ireland, July 2015.

[22] W. Yu, F. Zhuang, Q. He, and Z. Shi, "Learning deep representations via extreme learning machines," Neurocomputing, pp. 308–315, 2015.

[23] G.-B. Huang, "Learning capability and storage capacity of two-hidden-layer feedforward networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 14, no. 2, pp. 274–281, 2003.

[24] G.-B. Huang and H. A. Babri, "Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions," IEEE Transactions on Neural Networks and Learning Systems, vol. 9, no. 1, pp. 224–229, 1998.

[25] G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, "Extreme learning machine for regression and multiclass classification," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 2, pp. 513–529, 2012.

[26] J. Wei, H. Liu, G. Yan, and F. Sun, "Robotic grasping recognition using multi-modal deep extreme learning machine," Multidimensional Systems and Signal Processing, vol. 28, no. 3, pp. 817–833, 2016.

[27] J. J. Liang, A. K. Qin, P. N. Suganthan, and S. Baskar, "Comprehensive learning particle swarm optimizer for global optimization of multimodal functions," IEEE Transactions on Evolutionary Computation, vol. 10, no. 3, pp. 281–295, 2006.

[28] University of California, Irvine, Machine Learning Repository, http://archive.ics.uci.edu/ml.

[29] LIBSVM Data Sets, https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.
