A Structured Approach to Neural

  • 8/2/2019 A Structured Approach to Neural

    1/8

    A Structured Approach to Neural Networks in Bankruptcy Prediction

    Fernando C. Almeida, FEARP, USP - Brazil

    Abstract

    This paper explores the difficulties of developing neural networks in management, more specifically in bankruptcy prediction, and proposes a structured approach to that development. It identifies the potential strategic interest of using neural networks in a firm as a decision support tool. Neural networks are developed and tested on the French merchandise transportation industry. Through the construction of an experimental plan, neural networks are tested and compared with logistic regression.

    Key words: Neural Networks, Decision Support Systems, Methodology, Logistic Regression

    Though numerous studies have been devoted to bankruptcy prediction¹, the problem of decision support and the development of information systems in bankruptcy risk and credit evaluation is rarely addressed. Many authors have explored neural nets in bankruptcy prediction ([1]; [3]; [5]; [12]; [19]; etc.), though the difficulties in using this tool and structured methods for exploring it were not examined. This paper proposes a structured method to explore bankruptcy prediction and analyses the French transportation industry.

    1. Information Systems in Context: Bankruptcy Prediction and Neural Networks

    A neural network decision support system for bankruptcy or credit evaluation has many strategic and organisational implications for a firm.

    The Strategic Information Technology

    Bankruptcy risk evaluation is usually supported by statistical tools like the ZETA model [2], developed using discriminant analysis and employed by several financial institutions [16]. The use of new information technologies scarcely explored in the credit domain can offer considerable

    ¹ See DUMONTIER (1990) for a comprehensive review of these studies.

    0-8186-8070-9/97 $10.00 © 1997 IEEE

    competitive advantages to a firm. The Chase Manhattan Bank, for instance, has developed a neural network system that automatically detects fraud in credit card use 30% more accurately than statistical methods [14]. Chase Manhattan's system has interesting strategic outcomes: its superior performance gives the bank privileged information in relation to its competitors. It will probably reduce the bank's costs from unrepaid credit. The system is also a source of differentiation: as irregular use of cards is blocked more quickly, a more satisfying service is offered to customers.

    Neural Networks in the Organisation

    In an increasingly complex organisational context, a firm must capture information from outside its boundaries in a prospective way. New information technology tools like neural networks act in this manner.

    The following points reveal the importance of information [SI:
    - Information reduces uncertainty in the decision process, improving its quality and effectiveness;
    - It adds value to products or services;
    - The effectiveness of information flows conditions the quality of the liaisons and relationships engaged in by the firm.

    From our point of view, a neural network is a tool that permits a firm to scan its environment in a privileged way and assure its endurance.

    Knowledge Base Systems and Decision Support Systems

    A Decision Support System (DSS) is a man-machine system that, through a dialogue interface, amplifies decision makers' reasoning capabilities in complex and ill-structured problem resolution ([4] p.30). Neural networks can enlarge the capabilities of the traditional DSS, which is limited to "what-if" scenario analysis. Traditional DSS used to suppose that a system's capability in generalising and analysing a greater number of alternatives could improve decision process effectiveness. However, DSS shall evolve through the use of new information technologies from passive systems to active ones having the capability of

    influencing and guiding the decision process [9]. DSS may evolve through the use of neural networks, which give it enlarged capabilities.

    R1 = Net sales/Total assets
    R2 = Total debt/Total assets
    R3 = Cash flow/Net sales
    R4 = Current ratio
    R5 = EBIT/Total interest payments
    R6 = Total income/Total capital
    R7 = Earnings before interest and taxes/Total assets
    R8 = Sales/Net plant
    R9 = Cash/Total assets
    R10 = Inventory/Sales
    R11 = Inventory/Receivables
    R12 = log(Total assets)

    2. Neural Networks

    The Backpropagation Learning Method

    Networks are constructed in this research using the backpropagation method, as proposed in RUMELHART et al.'s [15] model of the generalised delta rule:

    $$\Delta W_{ji}(n+1) = \eta\,\delta_j O_i + \alpha\,\Delta W_{ji}(n)$$

    $\Delta W_{ji}(n+1)$ is the weight adjustment introduced at time n+1 on the connection weight between neurones i and j. $\eta$ is a constant called the learning rate, which controls the rate of corrections made on connection weights. The larger the learning rate, the larger the changes introduced in weights in each iteration. $\alpha$ is a constant called the smoothing factor, which makes the learning process take into account the weight adjustment made at time n. $\delta_j$ is the error signal at the output of neurone j, and $O_i$ is the output of neurone i.
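    The update rule above can be illustrated with a short NumPy sketch. It is only an illustration under stated assumptions: the function name `delta_rule_update`, the default values of the learning rate and smoothing factor, and the toy numbers are hypothetical and are not taken from the study or from the package it used.

```python
import numpy as np

def delta_rule_update(prev_adjustment, error_signal, outputs, eta=0.5, alpha=0.9):
    """Weight adjustment at step n+1: eta * delta_j * O_i + alpha * adjustment at step n.

    prev_adjustment: array (n_receiving, n_sending), adjustments made at step n
    error_signal:    array (n_receiving,), error signals of the receiving neurones j
    outputs:         array (n_sending,), outputs of the sending neurones i
    eta, alpha:      learning rate and smoothing factor (illustrative defaults)
    """
    return eta * np.outer(error_signal, outputs) + alpha * prev_adjustment

# Toy example (made-up numbers): 2 receiving neurones, 3 sending neurones.
weights = np.zeros((2, 3))
no_history = np.zeros((2, 3))          # no adjustment exists before the first step
error_signal = np.array([0.1, -0.2])
outputs = np.array([1.0, 0.5, 0.0])

step1 = delta_rule_update(no_history, error_signal, outputs)
weights += step1
step2 = delta_rule_update(step1, error_signal, outputs)   # smoothing term now active
weights += step2
```

    With identical inputs on both steps, the second adjustment is larger in magnitude than the first because the smoothing term carries part of the previous adjustment forward.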

    The backpropagation model is a feed-forward model frequently used in management and financial applications ([10]; [SI; [5]; [12]; etc.).

    3. Bankruptcy Prediction

    Overview of the Process

    Most bankruptcy prediction models are built using a paired-sample technique: one part of the sample contains data from failing firms, the other part contains contemporaneous data from non-failing firms. Variables are then selected because of their potential relevance in detecting bankruptcy, and a statistical method is used to develop a classification model (i.e., a combination of variables that best discriminates between the two types of firms). Finally, the classification success is evaluated on a holdout sample (i.e., a sample other than the one used to derive the model).

    4. Quantitative Analysis

    Sample Selection and Data Collection

    The data sample of this study consists of 2736 French firms belonging to the transport industry, including 114 firms that failed in the period 1955-1990.

    The collection of data for bankrupt firms requires a definition of failure. The definition used here is purely legalistic: failed firms are those whose failure has been sanctioned by judicial proceedings.

    Choices in Statistical Methods

    Identifying the distinguishing characteristics between bankrupt and non bankrupt firms is critical for at least two reasons: i) because of the lack of a comprehensive theory concerning bankruptcy, some unknown relevant predictor variables may be forgotten; ii) intricate relationships among predictor variables may alter the predictive ability of the classification models. As they can work with noisy and incomplete inputs and produce the correct output by making use of context and generalising from incomplete information, neural nets are supposed to perform well in the prediction of failure risk. That is why a neural nets approach is introduced in this study. However, in order to appreciate neural nets' performance in bankruptcy prediction, their classification accuracy will be compared with that of LOGIT analysis. The logistic regression approach is here preferred over the more usual multivariate discriminant analysis because it is at least as efficient as a linear classifier, even when all the assumptions of discriminant analysis hold [13].

    Since OHLSON [11], LOGIT analysis has frequently been used to estimate the failure risk, conditioned on

    financial characteristics (i.e., ratios) of firms. The LOGIT model creates for each firm a score Z that may be used to assess the probability of failure:

    $$Z = \alpha + \sum_i \beta_i X_i$$

    where $X_i$ is the value of the ith variable (i.e., financial characteristic), and

    $$P = \frac{1}{1 + e^{-Z}}$$

    Since, by construction, P always falls between 0 and 1, it is usually interpreted as the probability of failure.

    Incorporating Prior Probabilities and Cost of Misclassification

    Prior probabilities of failure and cost of misclassification must be assigned to guarantee a successful application of the predicting model. There are two types of classification error. The first one, called type I error, consists of identifying a failed firm as non failed. The type II error consists of identifying a non failed firm as failed.

    The cut-off score is chosen so as to minimise the misclassification of the two groups. With neural nets as well as with logistic regression, this score may vary from 0 to 1, in so far as the probability of failure is equal to 1 for failing firms and to 0 for non failing firms.

    Validation of Results

    As a model generally fits best the sample from which it was derived, two sub samples were randomly selected from the entire sample of 2414 firms. The first one is used to derive the models, the second one is used to test the models' predictive accuracy.

    Neural Nets Construction Methodology

    This paper proposes the conception and execution of an experimental plan to explore the influence of the various parameters (or factors) on network performance. The plan is evaluated through a graph analysis method. The best performing networks are introduced in a portfolio to appreciate the failure risk. Performance is evaluated considering the percentage of firms correctly classified in each of the two groups of firms (failing and non failing).

    The proposed methodology includes the following steps:

    1. Definition of an experimental plan
    2. Construction of the neural nets from the experimental plan
    3. Graph analysis of neural nets results
    4. Identification of a portfolio of networks

    Definition of an Experimental Plan

    The following elements were explored through the experimental plan:

    i. The use of historical data

    Three ways of introducing historical data were used:

    H1: The introduction of the latest ratio value of each variable followed by the preceding year's value: $R_{n-1}$, $R_{n-2}$. Values from one ($R_{n-1}$) and two years ($R_{n-2}$) before bankruptcy are introduced.

    H2: The introduction of the latest ratio value of each variable and the difference between two succeeding years: $R_{n-1}$, $\Delta = R_{n-1} - R_{n-2}$. Values one year before bankruptcy ($R_{n-1}$) and the difference between two values of the same variable one and two years before failing ($\Delta$) are introduced.

    H3: The introduction of the latest ratio value of each variable and the ratio between two succeeding values: $R_{n-1}$, $\Delta = R_{n-1} / R_{n-2}$. Values of each variable one year before bankruptcy ($R_{n-1}$) and the ratio between values of the same variable one and two years before bankruptcy ($\Delta$) are introduced.
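    The three encodings H1, H2 and H3 can be sketched as one small function. This is a hedged illustration only: the function name `encode_history`, the choice to place the latest values first and the derived values second, and the sample numbers are assumptions, not details given in the study.

```python
import numpy as np

def encode_history(r_n1, r_n2, mode):
    """Build the network input vector for one firm.

    r_n1: ratio values one year before bankruptcy
    r_n2: values of the same ratios two years before bankruptcy
    mode: "H1" (both years), "H2" (latest + difference) or "H3" (latest + ratio)
    """
    r_n1 = np.asarray(r_n1, dtype=float)
    r_n2 = np.asarray(r_n2, dtype=float)
    if mode == "H1":
        second_half = r_n2
    elif mode == "H2":
        second_half = r_n1 - r_n2
    elif mode == "H3":
        second_half = r_n1 / r_n2      # assumes no value two years before is zero
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Input ordering (latest values first) is an assumption of this sketch.
    return np.concatenate([r_n1, second_half])

# Made-up values for two ratios of one firm.
latest = [0.8, 2.0]
previous = [0.5, 4.0]
h1 = encode_history(latest, previous, "H1")
h2 = encode_history(latest, previous, "H2")
h3 = encode_history(latest, previous, "H3")
```

    Each encoding doubles the number of inputs relative to using the latest values alone, which is one reason the choice interacts with the number of hidden neurones explored in the plan.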

    ii. Number of Neurones in Hidden Layers

    The number of hidden neurones and hidden layers does not have any theoretical limit. This limit is only imposed by costs, time, and computational constraints in creating a network. In this study, hidden neurones are explored as follows: 5, 10, 40 or 80 neurones per hidden layer.

    iii. Number of Hidden Layers

    As the time necessary to train a network increases with the number of layers, at most two hidden layers are used in this study, and the number of neurones is limited to 40 in networks with two hidden layers. Three two-layer configurations are tested: i) 5 neurones in the first hidden layer and 5 in the second one, ii) 10 in the first and 10 in the other one, iii) 40 neurones in each hidden layer.

    iv. Predicting Ratios

    As the most relevant ratios to predict failure are not fully known, two batteries of ratios are used.

    Construction of the Neural Nets from an Experimental Plan

    Number of Trials

    The greater the number of experiments per cell, the better the experimental plan. However, because of the time necessary to train a network, the number of experiments must be carefully selected. The number of experiments per cell was here limited to 10. Out of the 76 failing and 2338 non failing firms, 45 failing firms and 135 non failing firms were randomly selected to create each network. The remaining 31 failing firms and 93 non failing firms were used to validate the results. This random selection was made 10 times to obtain 10 sub samples. With these 10 sub samples, 10 networks were created for each of the 56 cells in the experimental plan. According to Table 2, 560 neural nets were therefore created to evaluate the influence of each parameter previously described on the networks' predictive accuracy.
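    The counts quoted above can be checked with a short sketch that enumerates the cells of the plan and draws the ten sub samples. Several points are assumptions of this illustration rather than statements from the study: a fourth "no historical data" level labelled I is included next to H1-H3, the 93 non failing validation firms are drawn at random from the firms not used for training, and the firm identifiers and the name `draw_subsample` are hypothetical.

```python
import itertools
import random

# Levels of each factor. "I" (no historical data) is an assumed fourth level.
HISTORY = ["I", "H1", "H2", "H3"]
RATIO_SETS = [1, 2]
# (hidden layers, neurones per hidden layer); two-layer nets stop at 40 neurones.
STRUCTURES = [(1, 5), (1, 10), (1, 40), (1, 80), (2, 5), (2, 10), (2, 40)]

# One label per cell, ordered layers-ratios-data-neurones.
cells = [
    f"{layers}-{ratios}-{history}-{neurones}"
    for (layers, neurones), ratios, history
    in itertools.product(STRUCTURES, RATIO_SETS, HISTORY)
]

def draw_subsample(failing_ids, non_failing_ids, rng):
    """Split firm identifiers into a training and a validation sub sample."""
    train_failing = rng.sample(failing_ids, 45)
    chosen = set(train_failing)
    test_failing = [i for i in failing_ids if i not in chosen]   # the remaining 31
    picked = rng.sample(non_failing_ids, 135 + 93)               # disjoint by construction
    return (train_failing, picked[:135]), (test_failing, picked[135:])

# Hypothetical identifiers: firms 0-75 failed, firms 76-2413 did not.
failing_ids = list(range(76))
non_failing_ids = list(range(76, 76 + 2338))
rng = random.Random(0)
subsamples = [draw_subsample(failing_ids, non_failing_ids, rng) for _ in range(10)]

n_networks = len(cells) * len(subsamples)
```

    Under these assumptions the enumeration yields 7 structures x 2 ratio sets x 4 data options = 56 cells, and 10 sub samples per cell give the 560 networks mentioned in the text.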

    The proportion between failing and non failing firms is 1:3. This proportion was chosen to increase the representation of non failing firms in the samples. It can be noticed that this proportion is not consistent with the real distribution of failing and non failing firms in the population, but the inclusion of more non failing firms would have increased the learning time and therefore the computational costs. The neural network learning algorithm could have been transformed to take the prior probability of failure into account, as suggested by TAM & KIANG [18]. Unfortunately, the package used in this study does not allow any correction of the algorithm.

    Number of Iterations

    An iteration is a complete reading of the data set during the learning process. The learning process of certain neural nets has converged (i.e., the network has learned all facts within the fixed precision of 0.1). Some networks, however, have not converged after a certain number of iterations. Based on the mean error² observed, the learning process in these cases was interrupted after 1300 iterations.

    Graph Analysis of Neural Nets Results

    A graph analysis was elaborated to evaluate the performance of the networks constructed. As mentioned earlier, when considering a cut-off score to class a firm in one of two groups (failing or non failing) there are two types of classification error: the type I error and the type II error. Through a graph analysis, different cut-off values may be considered and network performance can be observed at different error levels. In varying cut-off scores, different errors of classification are obtained for each network (i.e., different percentages of firms correctly classified are obtained in each of the two groups). The following cut-off values were used: 0.95; 0.9; 0.7; 0.5; 0.3; 0.1; 0.05; 0.01. As 10 sub samples were randomly selected to create 10 times the same configuration of network (56 configurations in the experimental plan), the mean ($\mu$) and standard deviation ($\sigma$) of the percentage of firms correctly classified in the failing and non failing groups were calculated for each of the 56 configurations. The trade-off between the failing and non failing percentages of correct classification generates 8 points, one for each cut-off value, that are plotted in a graph. Curves from different networks may be compared.
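    The eight trade-off points of one network can be computed as in the sketch below. It rests on assumptions that the study does not spell out: a firm is classed as failing when its output is at or above the cut-off, the name `tradeoff_points` is hypothetical, and the six scores are made-up values.

```python
import numpy as np

CUTOFFS = [0.95, 0.9, 0.7, 0.5, 0.3, 0.1, 0.05, 0.01]

def tradeoff_points(scores, failed, cutoffs=CUTOFFS):
    """Return one (% failing correct, % non failing correct) pair per cut-off.

    scores: network outputs in [0, 1], read as failure scores
    failed: booleans, True where the firm actually failed
    """
    scores = np.asarray(scores, dtype=float)
    failed = np.asarray(failed, dtype=bool)
    points = []
    for cutoff in cutoffs:
        predicted_failing = scores >= cutoff        # assumed decision convention
        # Failing firms classed as failing: 100 minus the type I error rate.
        pct_failing = 100.0 * np.mean(predicted_failing[failed])
        # Non failing firms classed as non failing: 100 minus the type II error rate.
        pct_non_failing = 100.0 * np.mean(~predicted_failing[~failed])
        points.append((pct_failing, pct_non_failing))
    return points

# Made-up outputs for three failed and three non failed firms.
scores = [0.97, 0.6, 0.2, 0.4, 0.02, 0.005]
failed = [True, True, True, False, False, False]
points = tradeoff_points(scores, failed)
```

    Lowering the cut-off moves along the curve: more failing firms are caught, at the price of more non failing firms being flagged.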

    ² The mean error is the error observed on each neurone divided by the number of neurones and multiplied by the total number of examples (or facts) in the training set.

    Instead of using only the mean of the correct classification percentage, the difference between mean and standard deviation is used ($(\mu - \sigma)$ for failing firms against $(\mu - \sigma)$ for non failing firms). In this way not only the predictive capability of the network structure is considered ($\mu$), but also its robustness ($\sigma$). Due to space, only the best performing networks are presented here (Chart 1). L-R-D-N represents the network configuration and data used to create it. For example, 1-2-I-5 means 1 hidden layer (1), second set of ratios (2), no historical data (I), 5 neurones in the hidden layer(s) (5).
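    The mean-minus-standard-deviation criterion can be illustrated as follows. This is a sketch under assumptions: the function name `robust_score` is hypothetical, the population standard deviation (NumPy's default) is used because the study does not say which estimator was applied, and the percentages are invented.

```python
import numpy as np

def robust_score(pct_correct):
    """Mean minus standard deviation of correct-classification percentages.

    pct_correct: array (n_subsamples, n_cutoffs) for one group of firms
                 (failing or non failing) and one network configuration.
    Returns one value per cut-off.
    """
    pct_correct = np.asarray(pct_correct, dtype=float)
    # ddof=0 (population standard deviation) is an assumption of this sketch.
    return pct_correct.mean(axis=0) - pct_correct.std(axis=0)

# Two invented configurations with the same mean (80%) at a single cut-off.
stable = robust_score([[80.0], [80.0], [80.0]])
erratic = robust_score([[70.0], [80.0], [90.0]])
```

    Both configurations average 80%, but the erratic one is penalised for its spread across sub samples, which is the robustness effect the criterion is meant to capture.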

    Table 3 - The 16 best performing networks (the first and second groups). Column headings: First set of ratios; Second set of ratios.

    Chart 1 - Trade-off between percentages of correct classification of failing and non failing firms - Group I - 6 best performing networks (Mean - Standard Dev.). Horizontal axis: % failing firms correctly classified (0 to 100). Legend entries recovered: 1-1-I-5, 1-1-H2-40, 1-2-I-40, 1-2-H2-40.

    Result of the Graph Analysis

    Observing the different curves, it can be noticed that it is not always possible to identify the best performing configuration. Tables 3 and 4 indicate 4 groups of networks obtained from the graph analysis. Configuration performance varies with the cut-off point. So a portfolio of 6 networks was identified as containing the best performing networks among the 56 structures explored.

    First group: the 6 best performing networks;
    Second group: 10 networks, less performing than the first group but more performing than the other networks;
    Third group: constituted of 11 nets;
    Fourth group: constituted of 24 networks whose performance is inferior to that of the preceding networks.

    Table 4 - Less performing networks (third and fourth groups)

    The Influence of the Type of Structure on Network Performance

    It can be noticed from these results that the systematic use of one type of network structure has not always produced the best performing network. In other words, even if some of the best performing networks are obtained using 40 neurones in the hidden layer (1-1-H2-40; 1-2-I-40; 1-2-H2-40), 40 neurones can sometimes produce badly performing networks (1-1-I-40; 1-2-H1-40; etc.). Graph analysis and Tables 3 and 4 suggest the following conclusions:

    Use of Historical Data

    It can be observed that H3 networks are almost always the worst performers. H2 has produced better results (8 networks among the 16 best networks), superior to I (Y16) or H1 (3/16). These results have interesting implications. First, they suggest that introducing historical data as ratios gives bad results (H3), and that networks are able by themselves to find the most interesting relations among different years (H2).

    Results are consistent with the hypothesis of STANLEY [17] concerning the use of historical data. STANLEY suggests that better nets may be obtained when using the difference between two years' values (H2) than the variable values themselves (H1). Finally, better networks were obtained by using H2 than by using I, which suggests the interest of using historical data.

    Number of Neurones

    Networks with 5 and 10 neurones have generated badly performing networks, though certain configurations of network with 5 or 10 neurones are among the best performing ones. This study does not permit a consistent conclusion about using only a few neurones. Moreover, other studies have obtained interesting results with only a few neurones ([3]; [18]), though they do not explore networks with more than 10 neurones.

    Number of Layers

    This study does not distinguish any interest in using two layers in bankruptcy prediction. Sometimes one-layer networks outperform two-layer networks, sometimes the opposite is observed.

    Number of Ratios

    It cannot be concluded from this study that one set of ratios was better than the other.

    Comparison of Neural Networks with Logistic Regression

    In order to compare network performance with statistical methods, the same data sample was used. Chart 2 compares the classification performance of both techniques.

    Network 1-2-H2-40, presented in Chart 2, is one of the six best performing networks. It can be noticed that the performance of both techniques was considerably similar. Both graphs were constructed using the same interval of cut-off scores (from 0.95 to 0.001). However, it can be observed that LOGIT is more sensitive to changes in the cut-off score than networks (i.e., the network curve has a smaller variance). It may be inferred that the predicting capability of nets is more stable than that of LOGIT, which changes more abruptly when the cut-off score varies. In a real decision context, when the error risks related to the critical score are unknown, predictions made with nets are more reliable than those made by LOGIT, as slight changes in the chosen cut-off score do not cause a significant change in risk evaluation.

    Chart 2 - Comparison between Networks and LOGIT (axes: percentages of firms correctly classified, 0 to 100)

    5. Conclusions

    This study brings some discussion about the use of neural networks for bankruptcy prediction. The absence of a theory of bankruptcy analysis brings the additional difficulty of properly choosing a set of predictive variables. Therefore other sets of variables might bring a better predictive quality to the neural network model.

    This paper has distinguished not one best trained network, but a portfolio of six networks. Precise distinction among the performance of different networks may not be the main concern of a decision maker. The identification of some best networks will probably be sufficient to conceive a portfolio of networks to support the decision process in bankruptcy risk evaluation. In this way a graph analysis is an interesting analysis tool.

    When choosing a neural network development package, the developer should consider the capabilities and features of the package in helping to automate the conception and execution processes. BRAINMAKER is not one of these packages, and a database development language was used to construct the experimental plan.

    This study introduces a structured manner of exploring neural networks in bankruptcy prediction. Despite the complexity of neural network development, neural nets conception in management is not fully discussed in the literature.

    Studies in neural nets use in management normally do not discuss the exploring of different parameters in nets conception. Other parameters than those presented in this study should be explored.

    This study has explored the French transport industry. Neural nets' predictive performance has not significantly surpassed that of statistical methods. It is possible that in other industries, where other variables than those used here are available, a distinguished performance of nets may be observed.

    This study concerns the problem of neural nets development in bankruptcy prediction. Other studies should explore the comprehension of failing processes through neural nets use. In fact neural nets do not have the same behaviour as logistic regression, and they might bring new perspectives in bankruptcy process comprehension ([5]). In DE ALMEIDA the problem of interpreting failing processes through neural nets is more extensively discussed. For reasons of space, this paper does not develop this point.

    REFERENCES

    [1] ADYA, M. and COLLOPY, F. Does AI Research Aid Prediction? A Review and Evaluation. Proceedings of the Sixteenth International Conference on Information Systems, Amsterdam, the Netherlands, December 10-13, p. 123-140, 1995.
    [2] ALTMAN, E., R. HALDEMAN et P. NARAYANAN (1977). Zeta analysis. Journal of Banking and Finance, June 1977, p. 29-54.
    [3] BELL, B.T., G.R. RIBAR, J.R. VERCHIO (1990). Neural nets vs. logistic regression: a comparison of each model's ability to predict commercial bank failures. Actes du congrès international de comptabilité, Tome I, Nice, December 1990.
    [4] COURBON, J-C. (1983). Les SIAD: outil, concepts et mode d'action. AFCET Interfaces, 9, July 1983, p. 30-36.
    [5] DE ALMEIDA, F.C. and LESCA, H. Administração Estratégica da Informação. Revista de Administração, V.29, n., jul-sept, p. 66-75, 1994.
    [6] DE ALMEIDA, F.C. L'Evaluation des risques de défaillance des entreprises à partir des réseaux de neurones insérés dans les systèmes d'aide à la décision. Doctoral thesis in Management, Ecole Supérieure des Affaires, Universidade de Grenoble, 1993.
    [7] DUMONTIER, P. (1990). Vices et vertus des modèles de prévision de défaillance. Papier de recherche n° 90-12, Université de Grenoble II, CERAG, 1990.

    [8] DUTTA, S., S. SHEKHAR et W.Y. WONG (1992). Decision support in non-conservative domains: generalization with neural networks. WP n° 92-31, INSEAD, 1992.
    [9] KEEN, P.G.W. et M.S. SCOTT-MORTON (1978). Decision support systems: an organisational perspective. Addison Wesley, 1978.
    [10] MAGNIER, J.P. Utilisation de réseaux de neurones pour le développement de systèmes d'aide à la décision. Montpellier, Centre de recherche en gestion des organisations, 1991.
    [11] OHLSON, J.A. Financial Ratios and the Probabilistic Prediction of Bankruptcy. Journal of Accounting Research, Spring, p. 109-131, 1980.
    [12] PODDIG, T. Bankruptcy Prediction: A Comparison with Discriminant Analysis. In Neural Networks in Capital Markets, edited by A.P. REFENES, New York, John Wiley & Sons, 1995.
    [13] PRESS D.J. & WILSON S. Choosing Between Logistic Regression and Discriminant Analysis. Journal of American Statistical Economics, 1978, p. 3-35.
    [14] ROCHESTER, J.B. (1990). New business for neurocomputing. IS/Analyser, vol. 28, n. 2, 1990, p. 1-16.
    [15] RUMELHART, D.E., J.C. McCLELLAND, PDP Research Group. Parallel Distributed Processing - Explorations in the Microstructure of Cognition. Volume 1, London, The MIT Press, 1986.
    [16] SCOTT, J. (1981). The probability of bankruptcy: a comparison of empirical predictions and theoretical models. Journal of Banking and Finance, n° 5, September 1981, p. 1-26.
    [17] STANLEY J. Introduction to Neural Networks. Sierra Madre, CA, Cal. Scientific Software, 3rd edition, 1990.
    [18] TAM, K.T. & KIANG, M.Y. Managerial Applications of Neural Networks: The Case of Bank Failure Predictions. Management Science, vol. 38, p. 926-947, 1992.
    [19] WILSON, R.L. & SHARDA, R. Bankruptcy Prediction Using Neural Networks. Decision Support Systems, vol. 11, n. 5, p. 545-557, June 1994.
