UNIVERSITATIS OULUENSIS ACTA C TECHNICA OULU 2014 C 498 Satu Tamminen MODELLING THE REJECTION PROBABILITY OF A QUALITY TEST CONSISTING OF MULTIPLE MEASUREMENTS UNIVERSITY OF OULU GRADUATE SCHOOL; UNIVERSITY OF OULU, FACULTY OF INFORMATION TECHNOLOGY AND ELECTRICAL ENGINEERING, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING; INFOTECH OULU C 498 ACTA Satu Tamminen


UNIVERSITY OF OULU, P.O. Box 8000, FI-90014 UNIVERSITY OF OULU, FINLAND

ACTA UNIVERSITATIS OULUENSIS

SERIES EDITORS

A Scientiae Rerum Naturalium: Professor Esa Hohtola
B Humaniora: University Lecturer Santeri Palviainen
C Technica: Postdoctoral research fellow Sanna Taskila
D Medica: Professor Olli Vuolteenaho
E Scientiae Rerum Socialium: University Lecturer Veli-Matti Ulvinen
F Scripta Academica: Director Sinikka Eskelinen
G Oeconomica: Professor Jari Juga

EDITOR IN CHIEF: Professor Olli Vuolteenaho
PUBLICATIONS EDITOR: Publications Editor Kirsti Nurkkala

ISBN 978-952-62-0519-9 (Paperback)
ISBN 978-952-62-0520-5 (PDF)
ISSN 0355-3213 (Print)
ISSN 1796-2226 (Online)


ACTA UNIVERSITATIS OULUENSIS
C Technica 498

SATU TAMMINEN

MODELLING THE REJECTION PROBABILITY OF A QUALITY TEST CONSISTING OF MULTIPLE MEASUREMENTS

Academic dissertation to be presented with the assent of the Doctoral Training Committee of Technology and Natural Sciences of the University of Oulu for public defence in Auditorium TS101, Linnanmaa, on 12 September 2014, at 12 noon

UNIVERSITY OF OULU, OULU 2014


Copyright © 2014
Acta Univ. Oul. C 498, 2014

Supervised by
Professor Juha Röning

Reviewed by
Professor Bogdan Filipič
Doctor Markus A. Reuter

Opponent
Docent Jaakko Hollmén

ISBN 978-952-62-0519-9 (Paperback)
ISBN 978-952-62-0520-5 (PDF)

ISSN 0355-3213 (Printed)
ISSN 1796-2226 (Online)

Cover Design
Raimo Ahonen

JUVENES PRINT
TAMPERE 2014


Tamminen, Satu, Modelling the rejection probability of a quality test consisting of multiple measurements.
University of Oulu Graduate School; University of Oulu, Faculty of Information Technology and Electrical Engineering, Department of Computer Science and Engineering; University of Oulu, Infotech Oulu
Acta Univ. Oul. C 498, 2014
University of Oulu, P.O. Box 8000, FI-90014 University of Oulu, Finland

Abstract

Quality control is an essential part of manufacturing, and the different properties of the products can be tested with standardized methods. If the decision of qualification is based on only one test specimen representing a batch of products, the testing procedure is quite straightforward. However, when the measured property has a high variability within the product, as is often the case, several test specimens are needed for the quality verification.

When a quality property is predicted, the response value of the model that most effectively finds the critical observations should naturally be selected. In this thesis, it has been shown that the LIB-transformation (Larger Is Better) is a suitable method for multiple test samples, because it effectively recognizes especially the situations where one of the measurements is very low.

The main contribution of this thesis is to show how to model the quality of phenomena that consist of several measurement samples for each observation. The process contains several steps, beginning with the selection of the model type. Prediction of the exceedance probability provides more information for decision making than prediction of the mean. This approach is more natural especially for the selected application, where the quality property has no optimal value and the interest is in an adequately high value.

With industrial applications, the assumption of constant variance should be analysed critically. In this thesis, it is shown that exceedance probability modelling can benefit from the use of an additional variance model together with a mean model in prediction. Distribution shape modelling improves the model further when the response variable may not be Gaussian. As the proposed methods are fundamentally different, the model selection criteria have to be chosen with caution. Different methods for model selection were considered and discussed, and EPS (Exceedance Probability Score) was chosen, because it is the most suitable for probability predictors.

This thesis demonstrates that especially a process with high diversity in its production and a more challenging distribution shape gains from deviation modelling, and the results can be improved further with distribution shape modelling.

Keywords: Charpy-V test, distribution shape model, exceedance probability, impact toughness, model selection, quality test, quantile regression, variance model


Tamminen, Satu, Hylkäystodennäköisyyden mallintaminen laatutestin koostuessa useasta mittauksesta (Modelling the rejection probability of a quality test consisting of multiple measurements).
University of Oulu Graduate School; University of Oulu, Faculty of Information Technology and Electrical Engineering, Department of Computer Science and Engineering; University of Oulu, Infotech Oulu
Acta Univ. Oul. C 498, 2014
University of Oulu, P.O. Box 8000, 90014 University of Oulu

Tiivistelmä (Finnish abstract, translated)

Quality control plays a central role in industrial production. The various properties of a manufactured product are measured with standardized test methods. The test is simple if the quality of the product is verified with only one test specimen. When the tested property can give highly variable results even within the same product, several test specimens are needed to verify the quality.

When the quality properties of a product are predicted, the response variable chosen for the model is the one that most effectively identifies the observations that are critical to quality. This thesis shows that the LIB-transformation (Larger Is Better) effectively identifies especially the situations in which one of the measurements is very low.

This thesis answers the question of how to model quality when several test samples are needed from the product under study. The modelling process consists of several steps, beginning with the selection of the model type. Modelling the risk of falling below the requirement provides more information to support decision making than conventional modelling of the expected value, especially if the quality factor is only required to reach a sufficiently good level rather than an optimal value.

In industrial applications, it often cannot be assumed that the deviation of the response is constant throughout the process. This thesis shows that the prediction accuracy of the risk of falling below the requirement improves when the deviation is modelled in addition to the expected value. A distribution shape model can improve the prediction accuracy when the response variable does not follow a Gaussian distribution. Because the proposed models are fundamentally different, the model selection criterion must also be chosen with care. The work shows that EPS (Exceedance Probability Score) works best with the probability-predicting models used.

This thesis shows that, especially when the production process is diverse and the distribution shape of the quality variable is challenging, the modelling benefits from the use of a deviation model, and the results can be improved with a distribution shape model.

Keywords (Asiasanat): Charpy-V test, deviation model, distribution shape model, impact toughness, model selection, probability of falling below the requirement, quality test, quantile regression


Acknowledgements

The research for this thesis was conducted in the Biomimetics and Intelligent Systems Group (BISC) at the Department of Computer Science and Engineering of the University of Oulu, Finland.

First of all, I would like to thank my supervisor, Professor Juha Röning, for having trust and faith in me through these years. I am also grateful to Professor Markus Reuter and Professor Bogdan Filipič for reviewing the dissertation manuscript.

I am truly indebted to Dr. Ilmari Juutilainen, who has been an endless source of ideas, valuable criticism and suggestions. I would like to thank Mr. Timo Koivula from Ruukki Metals, Mr. Esa Heiskala from Ovako Bar, Imatra, and their colleagues for their comments and interactive participation in model development. I would also like to thank my colleagues in BISC for a pleasant working atmosphere.

I am thankful to Dr. Susanna Pirttikangas for being a colleague and a friend with whom the lunch conversations can turn from laughter to tears and back to laughter in one sentence.

The financial support provided by the Graduate School in Electronics, Telecommunications and Automation, Metallinjalostajien rahasto and Foundation for the Promotion of Technology (TES), Technology Industries of Finland Centennial Foundation Fund for the Association of Finnish Steel and Metal Producers, and the Finnish Funding Agency for Technology and Innovation (TEKES), is gratefully acknowledged.

I thank my parents Elsa and Jorma, and my sister Sanna for their love and support over the years. I owe my thanks to my dear friends for their encouragement. I am deeply thankful to my husband Antero, my Li’l bad wolf, for his love and patience during this process, and to my sons Henri and Petri for keeping me busy with the real life, but there’s no place I’d rather be.

Oulu, April 2014 Satu Tamminen


Abbreviations

AUC      Area under ROC curve
CRPS     Continuously Ranked Probabilistic Score
CVT      Charpy-V test
DataAI   Data Analysis and Inference group
EPS      Exceedance Probability Score
FN       False Negative
FP       False Positive
GAM      Generalized Additive Models
GAMLSS   Generalized Additive Models for Location Scale and Shape
GLM      Generalized Linear Model
ISG      Intelligent Systems Group
ITT      Impact Toughness Temperature
LIB      Larger Is Better
LogS     Logarithmic Score
MAE      Mean Absolute Error
MLP      Multi-Layer Perceptron
MSE      Mean Squared Error
RM       Tensile Strength
ROC      Receiver Operating Characteristic
RP       Resilient Propagation
RP02     Yield Strength
ST3      Skew t type 3 distribution
t-GARCH  Generalized Autoregressive Conditional Heteroscedastic Model with t-distribution
TMCR     Thermo Mechanically Controlled Rolled
TN       True Negative
TP       True Positive
UI       User Interface
VaR      Value at Risk


List of original publications

This thesis is based on the following articles, which are referred to in the text by their Roman numerals (I–V):

I Tamminen S & Juutilainen I & Röning J (2008) Product Design Model for Impact Toughness Estimation in Steel Plate Manufacturing. Proc International Joint Conference on Neural Networks (IJCNN 2008), Hong Kong: 990–993.

II Tamminen S & Juutilainen I & Röning J (2010) Modelling of Charpy V Test Rejection Probability. Ironmaking & Steelmaking 37(1): 35–40.

III Tamminen S & Juutilainen I & Röning J (2010) Quantile Regression Model for Impact Toughness Estimation. Proc Industrial Conference on Data Mining (ICDM 2010), Berlin, Germany: 263–276.

IV Juutilainen I & Tamminen S & Röning J (2012) A Tutorial to Developing Statistical Models for Predicting Disqualification Probability. In: Davim JP (ed) Computational Methods for Optimizing Manufacturing Technology: Models and Techniques. IGI Global: 368–399.

V Tamminen S & Juutilainen I & Röning J (2013) Exceedance Probability Estimation for a Quality Test Consisting of Multiple Measurements. Expert Systems with Applications 40(11): 4577–4584.


Contents

Abstract
Tiivistelmä
Acknowledgements
Abbreviations
List of original publications
Contents
1 Introduction
    1.1 Background
    1.2 The scope of the thesis
    1.3 Contribution of the thesis
    1.4 Summary of original papers
2 Impact toughness testing
    2.1 Transition behaviour and impact toughness
    2.2 Charpy-V test
    2.3 Discussion
3 Modelling of a quality property with multiple measurements
    3.1 Methods
        3.1.1 Managing the quality test sets with multiple measurements
        3.1.2 Heteroscedasticity
        3.1.3 Quantile Regression
    3.2 Methods for rejection probability prediction
        3.2.1 Model candidates
        3.2.2 Rejection probability
    3.3 Model selection
        3.3.1 Performance evaluation for rejection probability estimation models
        3.3.2 Proper scoring rules for model selection
    3.4 Discussion
4 Target of application
    4.1 Model users
    4.2 The data
    4.3 Discussion
5 Application to impact toughness of steel products
    5.1 Selecting the proper transformation
    5.2 Modelling of impact toughness
    5.3 Deviation and distribution model in action
    5.4 Selecting the quantile model
    5.5 The rejection probability
    5.6 Model selection
6 Summary and conclusions
References
Original publications


1 Introduction

Every axiomatic (abstract) theory admits, as is well known, an unlimited number of concrete interpretations besides those from which it was derived. Thus we find applications in fields of science which have no relation to the concepts of random event and of probability in the precise meaning of these words.

A.N. Kolmogorov, 1933

1.1 Background

In the global market, manufacturing enterprises have to maintain their competitiveness with different strategies. Quality, cost, and cycle time are the main factors when competing against one's peers. Among them, quality is critical for gaining long-term competitive advantages (He et al. 2009). Small manufacturers may choose to compete with high quality and short delivery times instead of large volume or low price, but as a result, the process control needed to achieve optimal product properties is more difficult to maintain. Therefore, the goal is to reduce the process variability, which can be achieved with the assistance of design models for products or production. Furthermore, product parameters can be optimized with the assistance of statistical models.

According to Harding et al. (2006), knowledge is the most valuable asset of a manufacturing enterprise when defining itself in the market and competing with others. They point out several areas where manufacturing has benefited from data mining, including engineering design, manufacturing systems, decision support systems, shop floor control and layout, fault detection and quality improvement, data mining in maintenance and customer relationship management. Köksal et al. (2011) extensively review the data mining applications for quality improvement in manufacturing that were reported in the literature from 1997 to 2007. The recent development of information and sensor technology has enabled large-scale data collection during normal operation, and therefore, it is not complicated to implement different data mining applications to support manufacturing. The quality of the data in industrial measurements is also an important issue in data mining.


In this thesis, the application area is steel production. As environmental issues and sustainability are what enlightened customers demand from a product, steel research focuses on these areas as well. Steel is the world's most recycled industrial material; it is 100% recyclable, potentially infinitely, without loss of quality. The reusability and the rate of remanufacturing of steel products are high as well. Over the years, steel research has been able to reduce the need for raw materials, and the industry's goal is zero waste. Novel advanced and ultra-high strength steel grades make it possible to reduce the mass of vehicles, which leads to lower emissions (World Steel Association 2012). Production planning and steel design benefit from models that predict the steel properties for alternative production routes.

Typically, one steel plant can have hundreds of products, and yet, the customers might make enquiries about new ones. The steel properties are defined not only by the chemical composition, but also by a variety of microstructures that can be achieved by thermal treatments and controlled rolling procedures, for example. Thus, the range of achievable properties promotes steel research and the development of new, innovative steels (World Steel Association 2012). In the product design department, there is a call for a model that has good interpolation capabilities when responding to the customers' quality requirements. Rejections in qualification tests are very expensive for the company. As a result, the motivation to reduce the number of rejected plates has aroused interest in developing models that support process planning.

For many mechanical properties of steel products, the decision of qualification is based on one test specimen for each product, and then the testing procedure is quite straightforward. When the measured property has a high variability within the product, several test specimens are needed for quality verification. How these measurements should be treated depends on the rejection rule. When designing modelling tools for quality control, the response value that most effectively finds the critical points should be selected. The most obvious way is to use the average of the measurements as the target for the model, but the risk of losing critical information is high, especially if the measurements are scattered. Using the individual measurements as such may also lose information, depending on the modelling method. When the average of the measurements is not the only value that is monitored, and one single measurement can cause the rejection as well, it is clear that the measurements should be handled differently. One possibility is to find a more efficient transformation for the measurements, or to use a method that models the measurement distribution instead of the mean.


Industrial process data is often heteroscedastic and non-Gaussian in nature. In other words, it is not rare that the noise process of the model is input-dependent or that the response variable is highly skewed. If the model simply estimates the conditional mean of the target data by minimizing the sum of squared errors (SSE) function and ignores these conditions, the efficiency of the estimator will be weaker. Furthermore, when predicting extreme events, such as a rejection in a quality test, the conventional SSE model will consistently under-predict these events (Cawley et al. 2007).

There are several possibilities to take this predictive uncertainty into account. The basic mean model can work as a probabilistic predictor, but the distributional assumptions may be improved with the deviation model. The quantile regression model and the distribution shape model make it possible to include the form of the distribution in the prediction. The need for density forecasting increases when the predicted property consists of several measurements. Generally, the property itself has a high variability within the product, but what is more, the property may have a clearly non-Gaussian distribution.
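As a minimal sketch of why the deviation matters, the following Python example assumes a Gaussian response and computes the probability of falling below a limit from a predicted mean and standard deviation. The function name `prob_below`, the limit of 27 J and the two product settings are hypothetical values chosen for illustration; they are not taken from the thesis data.

```python
import math


def prob_below(limit, mean, std):
    """P(Y < limit) for Y ~ Normal(mean, std**2), via the Gaussian CDF."""
    z = (limit - mean) / (std * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


# Hypothetical requirement and predictions (joules), for illustration only.
limit = 27.0

# Two products with the same predicted mean but different predicted spread.
p_narrow = prob_below(limit, mean=40.0, std=5.0)
p_wide = prob_below(limit, mean=40.0, std=15.0)

print(f"narrow spread: {p_narrow:.4f}, wide spread: {p_wide:.4f}")
```

Both products look identical to a mean model, yet the product with the wider spread has a far higher probability of falling below the limit. A model with a constant error variance cannot express this difference.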

1.2 The scope of the thesis

Impact toughness is probably the most common example of a quality property that demands several measurements for its verification. In the literature, the general concern in impact toughness modelling is the behaviour inside the transition region (the upper and lower shelf energies, the slope between them, and the ductile-to-brittle transition temperature) (Haušild 2002, Todinov 2004). Charpy-V modelling of weld seams is probably the most widely studied application area (Bhadeshia 1999, Bhadeshia et al. 1995).

Both neural networks and traditional regression methods have been used in impact toughness modelling (Malinov et al. 2001, Kurban et al. 2007), but only a few of the studies have concentrated on how to predict the risk of disqualification and how to handle three measurements of every product. Golodnikov et al. (2005) used a quadratic regression model for CVT modelling with a small data set. The test was performed on only one grade of steel and only at one test temperature. They selected the 20th, 50th, and 80th percentiles to predict the three results of a CV test series. Although the selected quantiles proved to be good candidates for CVT estimation, they cannot be used for predicting the rejection in the test. For this purpose, the useful quantiles can be found in the lower tail of the distribution. The purpose of this thesis is not to bring new knowledge to metallurgy or to the impact toughness phenomenon itself, although a metallurgist using the model could benefit from it in this area as well. However, it is important to be aware of the development of the area.

In this thesis, the focus is on risk probability prediction. A risk probability predictor is a probabilistic forecasting method that predicts the risk of a failure, for example, in meeting the product specifications. When the predicted risk probability is combined with the manufacturing and rejection costs, decision making in quality improvement is better founded. Probabilistic prediction, i.e. density forecasting, enables the prediction of the full probability distribution of the response variable, in contrast to the predominantly used point forecasts, which offer no description of the uncertainty associated with the prediction. Probabilistic prediction has been applied, for example, to econometric applications (Diks et al. 2011, Tay & Wallis 2000), electricity consumption and price prediction (Hyndman & Fan 2010, Panagiotelis & Smith 2008), and weather forecasting (Gneiting & Raftery 2007, Laio & Tamea 2007, Little & McSharry 2009).
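One simple way to combine a predicted risk probability with costs is an expected cost comparison. The sketch below is an illustrative assumption, not the decision procedure of the thesis: the function names, the cost model (production cost plus rejection cost weighted by the rejection probability) and all numbers are hypothetical.

```python
def expected_cost(production_cost, rejection_cost, p_reject):
    """Expected cost of a route: the rejection cost is incurred with probability p_reject."""
    return production_cost + p_reject * rejection_cost


def cheapest_route(routes):
    """Return the route (a dict) with the lowest expected cost."""
    return min(
        routes,
        key=lambda r: expected_cost(
            r["production_cost"], r["rejection_cost"], r["p_reject"]
        ),
    )


# Hypothetical alternative production routes for the same order.
routes = [
    {"name": "A", "production_cost": 100.0, "rejection_cost": 400.0, "p_reject": 0.20},
    {"name": "B", "production_cost": 130.0, "rejection_cost": 400.0, "p_reject": 0.02},
]

best = cheapest_route(routes)
print(best["name"])
```

With these illustrative numbers, the route that is more expensive to produce has the lower expected cost because its predicted rejection risk is much smaller. A point forecast of the mean alone would not support this comparison.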

The benefits of deviation modelling in industrial applications have been acknowledged in several studies (Engel 1992, Carroll & Ruppert 1988, Smyth et al. 2001). A typical method is to perform heteroscedastic regression with generalised linear models (GLM), but neural networks can be used as well. The conditional probability density can be estimated by using separate networks for the mean and the variance (Boyd & White 1994). Dorling et al. (2003) predicted air quality with an MLP heteroscedastic Gaussian regression model. Distribution modelling with generalized additive models for location, scale and shape (GAMLSS) has been applied to economic, meteorological and biomedical modelling, for example (Scandroglio et al. 2013, Serinaldi & Kilsby 2012, Rees et al. 2010).

Quantile regression has been widely used by economists and ecologists, and for medical applications, survival analysis, financial economics, environmental modelling and the detection of heteroscedasticity (Koenker 2005, Yu et al. 2003). Accurate estimation of the tails of the distribution is essential for risk management tools, such as Value-at-Risk (VaR) models in economics, and promising results have been achieved with quantile regression (Taylor 2000). Chen & Chen (2005) showed that a quantile regression approach with t-GARCH provides the best estimates for the 1% and 5% VaRs of the Nikkei 225. Economic applications in particular may involve very large data sets, whereas the method is less frequently used for industrial applications. Nevertheless, many applications may benefit more from the estimation of extreme values than from the estimation of the mean.
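Quantile regression is usually fitted by minimising the pinball (check) loss instead of the squared error. The following sketch shows the loss and checks, on a tiny hypothetical sample, that the constant prediction minimising it is a lower-tail value for a small quantile level and the median for level 0.5. The function names and the sample are assumptions made for illustration.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (quantile) loss for a quantile level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (y above q) is weighted by tau,
        # over-prediction (y below q) by 1 - tau.
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def best_constant_prediction(y, tau):
    """Observed value that minimises the pinball loss as a constant prediction."""
    return min(sorted(y), key=lambda c: pinball_loss(y, [c] * len(y), tau))


# Hypothetical impact energies in joules, for illustration only.
energies = [50.0, 10.0, 40.0, 20.0, 30.0]

median_estimate = best_constant_prediction(energies, tau=0.5)
lower_tail_estimate = best_constant_prediction(energies, tau=0.1)
print(median_estimate, lower_tail_estimate)
```

Choosing a small quantile level moves the fitted value towards the lower tail, which is the region that matters when the concern is a rejection caused by low values.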

The Data Analysis and Inference Group (DataAI) in the Biomimetics and Intelligent Systems Group (BISG) at Oulu University has a long history of co-operation with the steel manufacturing industry, and several research projects have been completed. These relate to post roughing mill temperature prediction (Laurinen & Röning 2005), resistance spot welding process identification (Koskimäki et al. 2007), wedge formation prediction in hot strip rolling after continuous casting (Elsilä & Röning 2006), defect prediction in hot strip rolling (Elsilä & Röning 2004), scale defect prediction (Haapamäki & Röning 2005), hot rolled steel plate temperature prediction (Tiensuu et al. 2011), prediction of the mechanical properties of steel plates (Juutilainen & Röning 2006), and prediction of the disqualification probability related to the width of a hot rolled steel strip (the original Paper IV). Industrial processes are constantly changing, and thus, data management and model updating are important research areas as well. DataAI has studied semi-automatic maintenance of regression models (Juutilainen et al. 2011) and developed a Smart Archive Framework for data mining applications (Laurinen et al. 2005).

In this study, the goal was to develop a product design model for all grades of steeland test temperatures in production, so that the model can be utilized to predict therelated risk of rejection in the CVT for different steel products.

1.3 Contribution of the thesis

The main contribution of this thesis is to show how to model phenomena that consist of several measurements for each observation, especially when the observed sample should pass a qualification test. The process contains several steps, beginning with the selection of the model type. Prediction of the exceedance probability provides more information for decision making than prediction of the mean. This approach is more natural especially for the selected application, because the quality property has no optimal value; instead, the interest is in an adequately high value.

When modelling the rejection probability, the procedure can be viewed in its simplest form as a class probability estimation problem that can be solved with simple methods. In the case of impact toughness rejection probability estimation, however, this is not adequate. Although the end-user is mainly interested in the rejection probability, the impact toughness level is equally important. Nor can the data set be classified into qualified and disqualified products beforehand, because the customer determines the qualification requirements for the order, and the products in the data set could belong to either class depending on the requirement. The end-user thus needs a tool that produces the complete predicted distribution of the impact toughness.

The rejection rule of a quality test determines how the test measurements are judged. The situation is not simple, especially if the qualification rule is more complicated than just an average of the measurements. When a quality property is predicted, the response value of the model that most effectively finds the critical observations should be selected. In this thesis, it is shown that the mean of the measurements is not always a suitable transformation for the response variable, and the LIB transformation (Larger Is Better) is suggested instead. Especially with data sets that have high diversity, the role of a proper transformation becomes crucial. Here, the product was considered rejected if the mean of the measurements was below the requirement or if the smallest measurement was more than 30% below the requirement. This is a typical rejection rule in the steel industry, and the LIB transformation effectively recognizes especially the situations where one of the measurements is very low but the mean of the measurements is acceptable.

When modelling industrial processes, the assumption of constant variance should be considered critically. In this thesis, it is shown that Charpy-V test rejection probability modelling can benefit from the use of an additional variance model together with a mean model in prediction. The distribution of the model error may have an important role as well, depending on the diversity of the data.

Four different methods were applied for rejection probability estimation in the Charpy-V test. The basic model was a mean model with a constant model error assumption. It was improved with the deviation model. The mean model was trained with MLP networks and the deviation model was a generalized additive model (GAM). The distribution shape was modelled with two different methods: GAMLSS (Generalized Additive Models for Location, Scale and Shape) and quantile regression models. As the proposed methods are fundamentally different, the model selection criteria had to be chosen with caution. Different methods for model selection were considered and discussed, and EPS (Exceedance Probability Score) was chosen, because it is the most suitable for comparing models of different origins.

Two different cases were examined: steel bar data from Ovako, Imatra and steel plate data from Ruukki Metals, Raahe Works. The Ovako data consist of products whose mechanical properties are determined by chemical composition and tempering treatments. For the Ruukki data, the different rolling procedures affect the properties as well. The data cover most of the production assortment at both factories, and the rejection probability models are designed for product planning. The impact toughness estimation model has been in test use at both factories, and Ruukki has taken it into everyday use in product planning.

The thesis demonstrates that especially a process with high diversity in its production and a more challenging distribution shape gains from deviation modelling, and the results can be improved further with distribution shape modelling. The quantile model is a very cumbersome modelling method, but no assumptions about the model shape need to be made beforehand, and therefore the method is appealing in some cases.

1.4 Summary of original papers

This thesis consists of five publications.

Paper I presents the first step of impact toughness estimation. The focus is on the transformation of the measurements and on the challenges of modelling the quality property itself. The paper shows that it is possible to predict a complicated phenomenon like impact toughness for a large assortment of products in a steel factory.

Paper II extends the impact toughness prediction to rejection probability. In this paper, the utilization of a deviation model was introduced. It is shown that with lower rejection levels, the LIB-transformed impact toughness model together with a deviation model predicts the rejection probability in the Charpy-V test most correctly.

Paper III introduces quantile regression as an alternative for rejection probability estimation. With quantile models, no assumptions about the distribution shape are needed.

Paper IV describes the whole process of disqualification probability prediction. The selected applications that illustrate the subject were from quality tests that consist of only one measurement. These applications may benefit from deviation and distribution shape modelling as well, although the improvement may not be as dramatic as in applications with multiple measurements.

Paper V combines the results from the earlier papers, and the focus is on model selection, which is an essential part of the modelling, especially with different types of models. In this paper, there are two applications from different steel manufacturing processes, and it is shown that a process with high diversity in its production and a more challenging distribution shape gains from deviation and distribution modelling.


2 Impact toughness testing

Materials testing is an essential part of steel manufacturing. The purpose of testing is to verify that the product fulfils the customer requirements. The importance of materials testing was recognized during the industrialisation period in the second half of the 19th century, when the railway network expanded and increased the demand for different structures in the network and the vehicles. The characterisation of the brittle and ductile behaviour of materials, as well as the clarification of the ductile-to-brittle transition behaviour of metals, was driven by the large number of failures of rails and axles that led to serious accidents. (Tóth et al. 2002) The analysis of the failures in welded merchant ships after World War II showed that the ship plates where the crack had initiated generally showed a lower impact energy at the failure temperature than the plates where the fracture had arrested. This discovery launched transition temperature research. (Wallin et al. 2002)

Impact toughness (or notch toughness) describes how well the steel resists fracturing at a predefined temperature when a hard impact suddenly hits the object. The property is crucial in steel products that are used in cold and harsh environments, e.g. ships, derricks and bridges (Figure 1). From the customer's point of view, the impact toughness criterion has to be fulfilled at the lowest service temperature of the structure. (Wallin et al. 2002)

2.1 Transition behaviour and impact toughness

The ideal structural steel combines ultra-high strength, to resist failure by plastic deformation, with high toughness, to resist failure by crack propagation, but these objectives are often contradictory. Since plastic deformation is an important mechanism for relaxing stress concentrations, alloys with higher strength are less tolerant of internal flaws and liable to fracture. One of the current topics of steel research is how to maximize the toughness of high strength steel. The desired property can be achieved by affecting the microstructure with thermal treatments, for example. (Morris et al. 2001, Prawoto 2009)

Ductility is a measure of how much the product deforms plastically before fracture, but ductility does not lead to toughness by itself. Strength is the other desired property, but the stronger the steel, the lower the toughness, and this, in turn, brings a challenge to steel research. High toughness is a result of a good combination of strength and ductility.

Fig 1. Bridges and cranes stretch the requirements of the mechanical properties of steel.

Toughness can be tested with a static or dynamic load, and impact toughness is an example of the latter. At room temperature, steel can perform well in the impact toughness test, but when the temperature falls, its performance weakens. In this study, the most demanding steel qualities may be tested at temperatures as low as −100 °C.

Transition behaviour provides information about the quality of steel. Because it depends on the test temperature, the relation can be visualized with a graph derived from several tests at different temperatures for one steel quality (Figure 2).

The transition curve has a sigmoidal shape, for which a function can be selected from a wide range of standard statistical distributions. One of these is the Burr distribution, for which the function is given in terms of test temperature $T$ by

$$F(T) = \left[1 + \exp\{-(T - T_0)/\xi\}\right]^{-\nu}, \tag{1}$$

where $\xi$ is the scale parameter, $\nu$ is the shape parameter and $T_0$ is the location parameter of the curve (Windle et al. 1996, Cao et al. 2012).
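As a minimal sketch of Eq. 1, the following Python function evaluates the Burr-type transition curve. The scaling of the curve between a lower and an upper shelf energy, the function names and all parameter values are assumptions made here for illustration; they are not taken from the thesis data.

```python
import math

def burr_transition(T, T0, xi, nu):
    """Eq. 1: F(T) = [1 + exp(-(T - T0) / xi)]^(-nu), a value between 0 and 1."""
    return (1.0 + math.exp(-(T - T0) / xi)) ** (-nu)

def absorbed_energy(T, lower_shelf, upper_shelf, T0, xi, nu):
    """Hypothetical scaling of F(T) between assumed lower and upper shelf energies (J)."""
    return lower_shelf + (upper_shelf - lower_shelf) * burr_transition(T, T0, xi, nu)

# Illustrative parameter values only.
for T in (-100, -60, -40, -20, 20):
    E = absorbed_energy(T, lower_shelf=10.0, upper_shelf=200.0, T0=-40.0, xi=10.0, nu=1.0)
    print(f"{T:5d} C  {E:6.1f} J")
```

With the shape parameter set to 1 the curve reduces to a symmetric logistic function centred at T0; other values of the shape parameter make the transition asymmetric.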


Fig 2. A graph visualizing the impact toughness of a steel product as a function of test temperature (x-axis: temperature, °C; y-axis: impact toughness; marked regions: brittle behaviour, transition area, ductile behaviour).

The impact toughness can be affected by chemical composition, rolling practice and thermomechanical treatments. For example, carbon adds strength to steel, but a higher carbon concentration leads to poorer impact toughness. The effect of carbon concentration on transition behaviour is illustrated in Figure 3. The transition temperature can be effectively lowered when manganese is alloyed as well. Impurities or inclusions weaken the properties of the steel, and, for example, the sulphur concentration can be lowered by treating molten steel with different methods.

Steel's behaviour is ductile at higher temperatures (the area is called the upper shelf) and it becomes brittle at low temperatures (the lower shelf). The transition temperature is determined from the average of the upper and lower shelves. When the carbon concentration is low, the transition region from ductile to brittle is narrow, the upper shelf is high, and the slope between the shelves is steep. An increase in the carbon concentration lowers the upper shelf and also widens the transition region. The effect of other alloying elements and process parameters on transition behaviour is similar (or reversed, if the parameter has a positive effect on impact toughness). (Lindroos et al. 1986)

Steel's behaviour in the transition region brings uncertainty to modelling, because in this area the force required to break the test specimen can vary dramatically. Factors that raise the transition temperature have a negative effect on impact toughness as well. Furthermore, as the grain size is not uniform in the product, the transition temperature is affected by the largest grain size instead of the average grain size. The microstructure of a steel specimen is shown in Figure 4. The picture was taken from a low carbon Nb-microalloyed steel strip specimen with an optical microscope. The microstructure consists of ferrite and pearlite, and there is variation in grain size and crystallographic orientation, for example.

Fig 3. Effect of carbon concentration on transition behaviour (temperature in °C on the x-axis and absorbed energy in J on the y-axis; curves for 0.11, 0.21 and 0.31 % C). Reprinted with permission from Tamminen, Juutilainen & Röning (2010) ©2010 Maney Publishing.

Fig 4. Microstructure of a steel specimen.


The prediction of impact transition temperature (ITT) has been studied for decades. Several models have been proposed to predict the ITT, based on brittle-crack initiation by cracking of grain-boundary carbides or the average size of a certain fraction of the largest grains in the microstructure. However, most of the models are based on microalloyed steels that have been either heat treated or conventionally hot rolled, and it has been shown that these models are not adequate for novel thermomechanically controlled rolled (TMCR) steels. These steels have a fine grain size, and prediction of impact toughness on the basis of an average metallographic grain size is not very successful. The mesotexture (the number of grains that have a closely aligned crystallographic orientation) can have the effect of increasing the 'effective' grain size from the metallographic grain size, and it should be taken into account in the predictions together with the grain boundary carbide thickness. However, the carbide thickness cannot yet be predicted accurately with the existing models. Furthermore, the mesotexture varies from the subsurface to the surface, and from the centre to the edge of the plate, and therefore the place where the test specimen is machined has an effect on the impact toughness as well. (Bhattacharjee et al. 2003, Bhattacharjee et al. 2004)

2.2 Charpy-V test

Knott (2008) defines quality as the ability of a steel component to continue to fulfil its design intent throughout the design-life, when subjected to the threats to integrity encountered in service. When this quality is defined in a quantitative manner, it is possible to rank the products based on the quality. A low impact transition temperature equates to high quality, and the energy absorbed at a specified test temperature serves as another measure of quality. For routine testing of a given production batch of steel plates, it is necessary to use rather small amounts of material. The most realistic performance in the test demands the use of the whole end product, as a small test specimen behaves quite differently in the test, but the cost of a small test piece is reasonable compared to the cost of testing with the whole product. The results of a small test piece can be correlated with the large database of the type tests, and these experimental correlations provide a good basis for ensuring reproducible quality within known limits. The size of the databases used to establish the correlations and the extent to which type tests represent the service application define the degree of confidence of these quality tests. (Knott 2008)


Fulfilment of the impact toughness requirements of a steel plate can be verified with a Charpy-V test (CVT), which is a cost-effective material testing procedure for material selection and toughness evaluation (Tóth et al. 2002, Wallin et al. 2002). Alternative small-scale notched bend tests include those devised by Izod, Mesnager, Schnadt and Hounsfield. Larger-scale drop-weight tests, plate tests and crack arrest tests are associated with Pellini, Wells, Soete, Robertson and the U.S. Navy, for example. (Knott 2008) The CVT has standardized specifications that define the dimensions of test pieces, the type of notch (U or V), test force, testing temperature, etc. The test piece is broken with a pendulum, and the energy (in joules, J) absorbed in fracturing is measured. The test does not provide a direct measurement of the fracture toughness, but the impact energy correlates with the property. Numerous empirical correlations relating impact toughness energy to fracture toughness have been determined for a variety of materials over the past years. However, finding an empirical correlation that would be universally applicable has proven to be quite difficult, and the empirical correlations are usually very case dependent (Wallin et al. 2002). With a sufficient database that contains both test results and service experience, it is possible to specify Charpy values at prescribed test temperatures which correlate well with good experience in service (Knott 2008). When ordering the steel plates, the customer defines the required Charpy values and test conditions according to their knowledge of the final product requirements.

Because the impact toughness has a large variability within the product depending on the location of the test specimen, the test is performed on three different samples from every test unit, and the plate is accepted only if the average of the measurements is higher than the requirement and none of the measurements is more than 30% below the requirement (there are some exceptions specified by the customer). Since a Charpy specimen represents a small volume of material and the microstructure of steel is heterogeneous, the individual Charpy specimens may sample different microstructures, and the impact energy data obtained on such specimens are inherently scattered (Moskovic & Flewitt 1997). The Charpy-V test equipment is illustrated in Figure 5. The absorbed energy is calculated from the difference in the heights h0 and h1.
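The acceptance rule described above can be sketched as a small Python function. The function name, the parameter `max_shortfall` and the handling of values exactly on the limit are assumptions made here for illustration; customer-specific exceptions are not covered, and the test sets are made-up values.

```python
def is_rejected(measurements, requirement, max_shortfall=0.30):
    """Return True if a Charpy-V test set fails the rule described above.

    A set is rejected if its mean is below the requirement, or if any single
    measurement is more than `max_shortfall` (as a fraction) below it.
    """
    mean = sum(measurements) / len(measurements)
    lowest_allowed = (1.0 - max_shortfall) * requirement
    return mean < requirement or min(measurements) < lowest_allowed

print(is_rejected([300, 300, 15], requirement=27))  # True: 15 J < 0.7 * 27 J
print(is_rejected([30, 28, 25], requirement=27))    # False: mean 27.67 J, lowest 25 J
print(is_rejected([40, 20, 20], requirement=27))    # True: mean 26.67 J < 27 J
```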

Fig 5. The principle of the Charpy-V test equipment (labelled: heights h0 and h1, specimen).
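As a brief worked relation, and assuming an idealized pendulum without friction or air resistance, the energy absorbed by the specimen equals the potential energy lost by the pendulum between the heights h0 and h1. The symbols $m$ (pendulum mass) and $g$ (gravitational acceleration) are introduced here for illustration:

$$E = m g \,(h_0 - h_1).$$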

2.3 Discussion

There are other, more exact tests for determining the impact toughness of a steel product, but as the Charpy-V test is a widely used and certified quality control test for steel production, the comparison of different test methods is excluded from the thesis.

The developed model would benefit from measurements collected inside the transition area, but based on CVT measurements collected during regular production, it is not possible to draw conclusions about the transition behaviour of these steel products. Products that have high variability in measurements are more likely to be near this area, however, and the model will recognize the hidden risk of rejection. From the end user's point of view, it is important to be able to determine how cold an environment the product can safely be used in.

The chemical composition of a steel does not necessarily reveal the whole impact of the alloyed materials on the CVT. The inclusion formation can be affected by the order of the alloying as well. Chemical compounds that form during alloying are one type of inclusion, i.e. impurities present in steel, and their role is very important for impact toughness. Depending on the position and size of the inclusions in the test specimen, the Charpy-V test can produce acceptable measurements, or very poor results if an inclusion is located near the notch. In this research, information about the order of alloying was not available for the models, and furthermore, research on the mechanisms by which the order might affect inclusion formation is still on-going.


3 Modelling of a quality property with multiple measurements

Small variability is an essential quality factor of the process. Small variability in end-product properties has several advantages, including:

– The product will more probably satisfy the specifications, and the rejections caused by disqualifications will diminish.

– The product costs may be cut by using a smaller working allowance.

– Customer satisfaction will be higher when the products have uniform quality.

Statistical models for quality prediction can be useful in variability reduction. They enable the operator to test the effects of the production parameters on the quality.

When planning a product or production, it is important that the final product will satisfy the specifications and quality requirements. When the risk of failing to satisfy these requirements can be estimated, the process can be controlled in order to lower the risk. The result may differ depending on whether the decision making is based on the predicted mean of the quality property or on the exceedance probability in the quality test, and thus the selection of the model type should be done with care. The predicted risk of rejection combines naturally with the costs related to the production and can be used for process optimization, and thus variability reduction.

In quality control, the test is performed for several test specimens from a single product if the quality property is known to have high variability within the product. The prediction method should be selected accordingly, because it should be capable of dealing with the challenges posed by a more complicated deviation or distribution of the measurements.

The majority of regression model applications in industry predict only the mean (point prediction). In this chapter, some density forecasts for exceedance probability prediction for test sets with multiple measurements are covered. The methods produce a prediction for the risk of exceeding the requirements, and they allow the deviation and the distribution shape of the response variable to depend on the explanatory variables as well. Model selection between models of different types is also an essential part of the chapter.


3.1 Methods

3.1.1 Managing the quality test sets with multiple measurements

When a model for a mechanical property is designed, it is not always straightforward to choose the target variable. It depends on the test method whether the measurements can be used directly or whether they need to be transformed first. If there is more than one measurement for each product, they may have to be transformed into a single response variable, depending on the modelling method. The usability of the transformation depends on the purpose of the prediction, which in this case was the development of a model for product design to help in designing the manufacturing process for steel grades. Therefore, the objective is to predict the rejection probability in the Charpy-V test for each product, instead of understanding the impact toughness as a mechanical property of steel.

Because the CVT has two different rules for rejection, the target variable of the impact toughness model should be able to recognize both of them. In the transition region, there is a high probability of high variability between the measurements. These measurements can show upper shelf energy, lower shelf energy or something between them. The mean of the measurements would be the most obvious transformation, but because of this variability, averaging often hides the evidence of an increased risk of rejection. For example, if a Charpy-V test set included the measurements 300 J, 300 J and 15 J, the average of the measurements would be over 200 J, while the smallest value clearly causes rejection even with a low rejection level of 27 J. Thus, the average of the three measurements does not serve well as the target of modelling.

Instead, in this study, it is suggested to employ Taguchi's LIB transformation (Larger Is Better) as the response variable in model estimation. The transformation is

$$y_{\mathrm{LIB}} = -\log_{10}\left(\frac{1}{3}\sum_{i=1}^{3}\frac{1}{z_i^2}\right), \tag{2}$$

where the variable $z_i$ is the $i$th measurement in the test series of one plate. (Tong & Su 1997) The advantage of this transformation is that it better brings out the increased risk of one low Charpy-V measurement, which is important as the majority of the Charpy-V tests get high scores. In some cases, the average of three measurements would ignore these rejections completely. It is more important to try to predict rare rejections than to perform well with well-behaving observations that get results high above the rejection limit and would not be rejected anyway.
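The effect of Eq. 2 can be illustrated with a short Python sketch applied to the example test set discussed above (300 J, 300 J and 15 J). The helper `lib_to_joules`, which converts a LIB value back to the energy that three identical measurements would need, is a convenience introduced here and not part of the original method.

```python
import math

def lib_transform(measurements):
    """Eq. 2: y_LIB = -log10(mean(1 / z_i^2)) over the measurements of one test set."""
    mean_inv_sq = sum(1.0 / z ** 2 for z in measurements) / len(measurements)
    return -math.log10(mean_inv_sq)

def lib_to_joules(y_lib):
    """Energy (J) that three identical measurements would need to give this LIB value."""
    return 10.0 ** (y_lib / 2.0)

test_set = [300.0, 300.0, 15.0]          # the example set discussed above
mean_j = sum(test_set) / len(test_set)   # 205 J, far above a 27 J requirement
y = lib_transform(test_set)
print(f"mean = {mean_j:.1f} J, LIB = {y:.2f}, LIB-equivalent = {lib_to_joules(y):.1f} J")
```

For this set the LIB-equivalent energy is roughly 26 J, below the 27 J level used in the example, whereas the plain average stays above 200 J.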


The LIB transformation strongly shrinks the upper part of the measurement scale, and thus increases the relative influence of the lower part, where the rejections actually happen. The dependence between the single Charpy-V measurements and the average of the CVT set they belong to is illustrated in the left scatter plot in Figure 6. It is important to notice that the observations inside the ellipse are not recognized by the average, although a large part of them should be rejected.

Fig 6. Dependence between the Charpy-V measurements and the average (left) and LIB transformation (right) of the CVT set.

The right scatter plot gives the corresponding illustration for the LIB transformation. It can be seen that the lowest measurements form a boundary, or guarantee, for the minimum CVT value as the LIB value increases. For example, if the LIB is 4, which is equivalent to 100 J, every one of the measurements is guaranteed to be at least about 58 J (100/√3 J). Therefore, because the purpose of the model is to recognize the plates that would be rejected, the LIB transformation is a more suitable target.
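The lower bound follows directly from Eq. 2; the short derivation below is added for illustration. Since every term of the sum is positive, the smallest measurement $z_{\min}$ satisfies

$$\frac{1}{3}\cdot\frac{1}{z_{\min}^2} \le \frac{1}{3}\sum_{i=1}^{3}\frac{1}{z_i^2} = 10^{-y_{\mathrm{LIB}}} \quad\Rightarrow\quad z_{\min} \ge \frac{10^{\,y_{\mathrm{LIB}}/2}}{\sqrt{3}}.$$

For $y_{\mathrm{LIB}} = 4$ this gives $z_{\min} \ge 100/\sqrt{3} \approx 57.7$ J.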

Certainly, there are other possible transformations as well. For example,

$$y = \min\left(\operatorname{mean}(z_1, z_2, z_3),\ \frac{\min(z_1, z_2, z_3)}{0.7}\right) \tag{3}$$

would recognize the rejections according to the rejection rule, but because the allowable percentage for the lowest measurement, here 30%, can be whatever the customer states, the transformation is not generalizable.
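A minimal sketch of Eq. 3 shows why the transformation depends on the customer-specific percentage. The parameter `max_shortfall`, the requirement of 27 J and the test set are illustrative assumptions.

```python
def rule_transform(measurements, max_shortfall=0.30):
    """Eq. 3 with an adjustable shortfall: min(mean(z), min(z) / (1 - max_shortfall))."""
    mean = sum(measurements) / len(measurements)
    return min(mean, min(measurements) / (1.0 - max_shortfall))

requirement = 27.0                 # J, illustrative
test_set = [40.0, 40.0, 17.0]      # hypothetical test set
for shortfall in (0.30, 0.40):
    y = rule_transform(test_set, max_shortfall=shortfall)
    verdict = "rejected" if y < requirement else "accepted"
    print(f"allowed shortfall {shortfall:.0%}: y = {y:.2f} J, {verdict}")
```

The same test set is rejected when the lowest measurement may fall at most 30% below the requirement and accepted when 40% is allowed, so the response values, and any model trained on them, would change with the customer's rule.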


3.1.2 Heteroscedasticity

Sometimes not only the mean but also the error variance depends on the explanatory variables, and thus assuming constant variance is not satisfactory. If there were information about dispersion, the distribution of the mean could be predicted in greater detail. (Juutilainen & Röning 2004, Juutilainen 2006) Unequal variance can sometimes be corrected with a transformation (e.g. the Box-Cox transformation), but in some cases this is not adequate, and modelling of the variance is needed. One example is the log-linear variance model, which ensures that the estimates of $\sigma_i^2$ remain positive. (Aitkin 1987, Verbyla 1993)
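As an illustration, one common way to write a log-linear variance model is given below. The covariate vector $w_i$ and the coefficient vector $\lambda$ are notation assumed here and are not taken from the thesis:

$$\log \sigma_i^2 = w_i^{\mathsf T}\lambda \quad\Longleftrightarrow\quad \sigma_i^2 = \exp\left(w_i^{\mathsf T}\lambda\right) > 0.$$

Because the exponential function is positive for any real argument, the fitted variances cannot become negative.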

Dispersion modelling has been employed to analyse quality improvement experiments designed to find the process settings that minimize variance under given conditions. Models of dispersion can be used in tolerance design, for example, because the model points out the sources that produce variation in the process. (Engel 1992)

The mean and variance can be estimated together with an MLP containing two output neurons. Then, the cost function has two parameters that are minimized simultaneously (Dorling et al. 2003). The method does not allow the information of the estimated mean to be included in the variance estimation. Therefore, an iterative method where the mean and variance are modelled separately can be preferable.

The purpose of variance function estimation is to model the structure of the variances as a function of predictors. (Carroll & Ruppert 1988) The most common responses for variance modelling are $\varepsilon_i^2 = (y_i - \mu_i)^2$, as $E(y_i - \mu_i)^2 = \sigma_i^2$, and $\log \varepsilon_i^2$. In the case of a normally distributed response, squared residuals are suitable because of the result

$$\varepsilon \sim N(0, \sigma^2) \Rightarrow \varepsilon^2 \sim \mathrm{Gamma}(\sigma^2, 2), \tag{4}$$

where the notation $y \sim \mathrm{Gamma}(\mu, s)$ means that $y$ is Gamma-distributed with expectation $\mu$ and variance $s\mu^2$ (Juutilainen 2006). The assumption of a normally distributed error term is often reasonable, but not necessary. The subject of distribution shape modelling is discussed later (see Section 3.2.2).
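The moments implied by Eq. 4 can be checked numerically. The following simulation is a sketch with an arbitrary illustrative value of the standard deviation; it only verifies that squared normal errors have expectation σ² and variance 2σ⁴, as the Gamma(σ², 2) notation states.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 3.0                                   # illustrative standard deviation
eps = rng.normal(0.0, sigma, size=200_000)    # normally distributed errors
sq = eps ** 2                                 # squared residuals

mu = sq.mean()                                # should be close to sigma^2 = 9
s = sq.var() / mu ** 2                        # should be close to s = 2
print(f"mean of eps^2 = {mu:.2f}, variance / mean^2 = {s:.2f}")
```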

If the response variable of a model has a distribution belonging to the exponential family, Generalized Additive Models (GAM) can be used for modelling. They are a nonparametric extension of Generalized Linear Models (GLM), where the linear components $\sum \beta_j X_j$ can be replaced by a sum of smooth functions $\sum s_j(X_j)$. The $s_j(\cdot)$'s are unspecified functions that are estimated using a scatterplot smoother (Hastie & Tibshirani 1986, Hastie et al. 2001). The method is suitable for variance modelling when the Gamma family is selected. When the response $\log \varepsilon_i^2$ is used, the model is fitted using least squares.
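The least-squares route with the response log ε² can be sketched on simulated data. To keep the example short, a straight-line fit with `numpy.polyfit` stands in for the smoother of a GAM, and the data-generating variance function is an assumption made here for illustration. For normal errors, the expectation of the logarithm of a chi-squared variable with one degree of freedom is about −1.27, so the fitted intercept is shifted by that amount while the slope is not.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
x = rng.uniform(0.0, 1.0, size=n)
true_log_var = -1.0 + 2.0 * x                      # assumed log-linear variance
eps = rng.normal(0.0, np.exp(0.5 * true_log_var))  # residuals of a known mean model

# Least-squares fit of log(eps^2) on x; a straight line stands in for the smoother.
slope, intercept = np.polyfit(x, np.log(eps ** 2), deg=1)

# E[log(chi^2_1)] is about -1.27, so the raw intercept is biased by that amount.
print(f"slope = {slope:.2f} (true 2.0), intercept + 1.27 = {intercept + 1.27:.2f} (true -1.0)")
```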

3.1.3 Quantile Regression

Quantile regression is a method that enables the estimation of conditional quantiles of a response variable distribution, and therefore it provides information about not just the location of the distribution but also its shape. The model allows heteroscedasticity in the data, and it is suitable for non-Gaussian distributions. Methods assuming that the observations follow the Normal distribution suffer from weaker model performance if the assumption is violated. Furthermore, the estimated median is more robust to large outliers than the ordinary least squares regression estimate of the mean. (Koenker 2005)

When estimating the sample mean with ordinary least squares regression, the objective is to minimize a sum of squared residuals, whereas the median is estimated by minimizing the sum of absolute residuals. Similarly, with quantile regression, the objective is to minimize a sum of asymmetrically weighted absolute residuals, where positive and negative residuals are weighted differently. (Koenker & Hallock 2001)
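The asymmetrically weighted absolute loss can be illustrated for the simplest case, a constant prediction without explanatory variables. In the sketch below, the simulated data, the grid search and the function name `pinball_loss` are illustrative choices; the point is only that the constant minimizing the loss coincides with the empirical quantile.

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Sum of asymmetrically weighted absolute residuals for quantile level tau."""
    r = y - q
    return np.sum(np.where(r >= 0, tau * r, (tau - 1.0) * r))

rng = np.random.default_rng(2)
y = rng.exponential(scale=1.0, size=5_000)   # skewed, illustrative data

tau = 0.1
grid = np.linspace(0.0, 3.0, 3001)           # candidate constant predictions
best = grid[np.argmin([pinball_loss(y, q, tau) for q in grid])]
print(f"loss minimizer = {best:.3f}, empirical {tau} quantile = {np.quantile(y, tau):.3f}")
```

With tau = 0.5 the two weights are equal and the loss reduces to half the sum of absolute residuals, which is minimized by the median.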

The advantage of using quantile regression instead of generalized linear models when modelling the deviation is that no specification of how changes in variance are linked to the mean is required, nor is there any restriction to the exponential family of distributions. Furthermore, it is possible to detect changes in the shape of the distribution of y across the predictor variables. (Cade & Noon 2003)

Yu and Jones have studied nonparametric quantile estimation by kernel weighted local linear fitting and found that double kernel smoothing is the most preferable method for the task (Yu & Jones 1998). The conditional mean and conditional quantiles can be estimated simultaneously as well. A multilayer perceptron network with several output neurons, one for the mean and the others for the quantiles, has been used for the estimation (Takeuchi et al. 2003). Quantile estimation can be very time consuming, and Chen and Wei have developed optimal, reliable and efficient computational tools for three algorithms (simplex, interior point and smoothing) for estimating conditional quantiles (Chen & Wei 2005).

In many applications, only one quantile is inspected at a time. If several conditional quantiles are estimated, two or more of them can cross or overlap, especially in the areas where no training data is available. This phenomenon occurs because each conditional quantile function is independently estimated. Takeuchi et al. propose a model with a non-crossing constraint that prevents the phenomenon. The quantile properties may not be guaranteed, because the method tries to optimize for both the quantile property and the non-crossing property, but the constraint ensures the semantics of the quantile definition: a lower quantile level should not cross a higher quantile level (Takeuchi et al. 2006).

Cade and Noon have shown that the quantile estimates further from the centre of the distribution (the median) usually cannot be estimated as precisely, because the sampling variation increases when approaching the tails (Cade & Noon 2003).

Instead of estimating the conditional quantile function, the estimation of the conditional distribution function could sometimes be more convenient (Peracchi 2002). Local linear models proposed by Hall et al. (1999) are one such method for estimating the conditional distribution function.

3.2 Methods for rejection probability prediction

3.2.1 Model candidates

Two model candidates were trained for the transformed CVT measurements: the joint model of mean and deviation and the mean model with constant variance. The transformations for the measurements were the mean of the measurements, the LIB transformation in Eq. 2, and the transformation in Eq. 3. In this study, the proposed joint model of mean and deviation is

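$$
\begin{aligned}
\mu_i &= f(x_i) \\
\sigma_i^2 &= g(x_i, \mu_i) \\
y_i &= \mu_i + \sigma_i \varepsilon_i \\
\varepsilon_i &\sim D(x_i).
\end{aligned}
\tag{5}
$$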

The function $f$ was modelled using MLP networks. Both response variables $\varepsilon_i^2$ and $\log \varepsilon_i^2$ (see Section 3.1.2) were tested for $g$, and the response of the mean model was used as one of the explanatory variables of the deviation model. The response $\log \varepsilon_i^2$ guarantees that the estimated variances are positive (Carroll & Ruppert 1988), but the predicted variance cannot be utilized without correction, because $E \log \varepsilon_i^2 \neq \log \sigma_i^2$, and thus

$$
\hat{\sigma}_i^2 = e^{\widehat{\log \varepsilon_i^2} + 1.27}, \tag{6}
$$

where $\widehat{\log \varepsilon_i^2}$ is the prediction for the logarithm of the squared residual for the $i$th observation (Harvey 1976, Juutilainen & Röning 2004, Juutilainen 2006). The GAM model with response $\varepsilon_i^2$ proved to perform better in the rejection probability estimation, and it was selected for the final model. For the model with constant variance, $\sigma_i^2 = \sigma^2 \; \forall i$, where $\sigma^2$ is the sample variance.

In practice, the extreme values of the predicted deviation are often suspicious, and it is advisable to restrict the values to a suitable interval, which can be determined visually, for example.
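The constant 1.27 in Eq. 6 can be checked numerically. The sketch below assumes, for illustration only, that the standardized residuals are standard normal, in which case the expected logarithm of the squared residual is −(γ + log 2) ≈ −1.27, where γ is the Euler-Mascheroni constant. The function names are hypothetical and are not taken from the thesis.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_log_squared_normal():
    """Exact E[log Z^2] for Z ~ N(0, 1): digamma(1/2) + log 2 = -(gamma + log 2)."""
    return -(EULER_GAMMA + math.log(2.0))


def simulated_log_squared_normal(n=100_000, seed=1):
    """Monte Carlo estimate of the same expectation (assumes normal residuals)."""
    rng = random.Random(seed)
    return sum(math.log(rng.gauss(0.0, 1.0) ** 2) for _ in range(n)) / n


def corrected_variance(predicted_log_sq_residual, correction=1.27):
    """Eq. 6: back-transform a predicted log squared residual into a variance."""
    return math.exp(predicted_log_sq_residual + correction)
```

Under a different residual distribution the bias of the log-scale prediction would differ, so the constant should be read as tied to the normality assumption made in this sketch.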

In quantile regression, the $\theta$th quantile of variable $y$ is the value $Q(\theta)$ for which $P(y < Q(\theta)) = \theta$, and the conditional nonlinear quantile function is

$$
Q_{y_i}(\theta \mid x_i) = g(x_i, \beta_\theta), \tag{7}
$$

where $\beta_\theta$ is a vector of parameters dependent on $\theta$. The full probability distribution of $y$ can be approximated with quantile regression models corresponding to a range of values of $\theta$ ($0 < \theta < 1$). (Koenker 2005)

When a neural network model is trained by minimizing the mean squared error (MSE) criterion, an estimate of the conditional expectation of the desired response is obtained:

$$
\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2. \tag{8}
$$

Now, the conditional $\theta$th quantile will be produced if the optimization criterion is

$$
\frac{1}{N} \sum_{i=1}^{N} \left[ \delta(\hat{y}_i > y_i)\,(1-\theta)\,(\hat{y}_i - y_i) + \delta(y_i > \hat{y}_i)\,\theta\,(y_i - \hat{y}_i) \right], \tag{9}
$$

where $0 < \theta < 1$, and $\delta(y_i > \hat{y}_i) = 1$ if $y_i > \hat{y}_i$ and $0$ otherwise. If $\theta = 0.5$, the conditional median model will be produced. (Saerens 2000)
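The criterion in Eq. 9 can be illustrated with a few lines of code. This is a minimal sketch with hypothetical function names: it evaluates the asymmetrically weighted absolute error and shows, for a constant predictor, that the minimizing value sits at the empirical θ-quantile of the data.

```python
def quantile_loss(y, y_hat, theta):
    """Mean asymmetric absolute error of Eq. 9 for quantile level theta."""
    total = 0.0
    for obs, pred in zip(y, y_hat):
        if pred > obs:
            # Prediction above the observation: weight 1 - theta.
            total += (1.0 - theta) * (pred - obs)
        elif obs > pred:
            # Observation above the prediction: weight theta.
            total += theta * (obs - pred)
    return total / len(y)


def best_constant(y, theta, candidates):
    """Constant prediction (chosen from candidates) with the smallest loss."""
    return min(candidates, key=lambda c: quantile_loss(y, [c] * len(y), theta))
```

With θ = 0.5 both weights are equal and the loss reduces to half of the mean absolute error, which is why the median model is obtained in that case.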

3.2.2 Rejection probability

The aim of the proposed model is to predict the disqualification risk $R_i = 1 - P(Y_i^{\min} \le y_i \mid x_i)$ in a quality control test. The predicted disqualification risk can be calculated from the predicted conditional cumulative distribution function $F_{y_i \mid x_i} = F(s, x_i) = P(y_i < s \mid x_i)$, where $s$ is a predefined rejection or exceedance limit.

If the standardized residuals of the mean model are assumed to be normally distributed, the probability of rejection can be calculated with

$$
P_i = \Phi\left( \frac{L - \mu_i}{\sigma_i} \right), \tag{10}
$$

where $\Phi$ is the cumulative normal distribution function, $L$ is the rejection limit, $\mu_i$ is the impact toughness estimate and $\sigma_i$ is the predicted deviation for observation $i$. The rejection probability risk level that leads to a change in product design should be low enough to recognize the unsuccessful plates and high enough not to produce too many false alarms. Typically, probabilities between 0.05 and 0.2 are used.
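As a worked illustration of Eq. 10, the following sketch computes the rejection probability under the normality assumption and compares it with a risk level. The function names and the default risk level of 0.1 are assumptions made for this example; the text only states that levels between 0.05 and 0.2 are typical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rejection_probability(limit, mu, sigma):
    """Eq. 10: probability that the response falls below the rejection limit."""
    if sigma <= 0.0:
        raise ValueError("predicted deviation must be positive")
    return normal_cdf((limit - mu) / sigma)


def needs_redesign(limit, mu, sigma, risk_level=0.1):
    """Flag a design whose predicted rejection probability exceeds the risk level."""
    return rejection_probability(limit, mu, sigma) > risk_level
```

For example, a predicted mean of 60 J with a predicted deviation of 20 J against a 27 J limit gives a standardized distance of −1.65 and a rejection probability of roughly 0.05.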

The assumption of normally distributed errors is not always correct, and the t-distribution, for example, might fit better. In some cases, the shape of the error distribution might depend on the input variables, as the mean and the deviation do. GAMLSS (Generalized Additive Models for Location, Scale and Shape) provide a large selection of different distributions for the modelling. The conditional distribution of the response can be modelled using separate prediction functions for location $\mu$, scale $\sigma$, and two shape parameters $\nu$ and $\tau$ (Stasinopoulos & Rigby 2007). As presented in the original Paper IV, the GAMLSS model assumes $y_i \sim F_i$, $F_i = D(x_i) = D(\mu_i, \sigma_i, \nu_i, \tau_i)$ and

$$
\begin{aligned}
g_1(\mu_i) &= h_1(x_i, \beta_1) \\
g_2(\sigma_i) &= h_2(x_i, \beta_2) \\
g_3(\nu_i) &= h_3(x_i, \beta_3) \\
g_4(\tau_i) &= h_4(x_i, \beta_4),
\end{aligned}
\tag{11}
$$

where $g_j(\cdot)$, $j = 1, \dots, 4$ are the link functions and $h_j(\cdot)$, $j = 1, \dots, 4$ are smooth functions, i.e. additive splines or linear functions. There is no need to model the mean and the deviation jointly with the distribution shape model; $g_1$ and $g_2$ can be kept fixed, and only $g_3$ and $g_4$ will be modelled. The GAMLSS library includes over forty different distributions, and finding the distribution that best fits the data can sometimes be laborious.

To utilize the rejection probability information, quantile regression models are needed only for a few desired probability levels. The model corresponding to the median is informative for the user as well. Typically, quantiles between θ = 0.01 and θ = 0.2 are sufficient levels of rejection probability for the whole product planning. The CVT requirement of the product is compared with the estimated value of the quantile model, and as the model with θ = 0.05 indicates that 95% of the plates will have a higher CVT value than estimated, the quantile will act as the 0.05 rejection probability level, for example.

For model selection purposes, it is necessary to model the whole quantile range, which requires independent models for each quantile. The outermost quantiles that can be reliably modelled are near θ = 0.01 and θ = 0.99 depending on the data, and a sufficiently large number of quantiles should be selected between these points. As a result, the estimated rejection probability for a predefined rejection limit can be located between two quantiles, and the exact value should be interpolated, as suggested in the original Paper IV. To ensure that the interpolated probability is within [0,1], the logit transformation is preferred

$$
P(y_i < Y_i^{\min}) = \operatorname{sigmoid}\left\{ \operatorname{logit}(\alpha_k)\, \frac{Y_i^{\min} - q_{\alpha_j}(x_i)}{q_{\alpha_k}(x_i) - q_{\alpha_j}(x_i)} + \operatorname{logit}(\alpha_j) \left[ 1 - \frac{Y_i^{\min} - q_{\alpha_j}(x_i)}{q_{\alpha_k}(x_i) - q_{\alpha_j}(x_i)} \right] \right\}, \tag{12}
$$

where $q_{\alpha_j}(x_i)$ and $q_{\alpha_k}(x_i)$ are selected from the available predicted quantiles $q_\alpha(x_i)$ so that the selected quantiles are as near to $Y_i^{\min}$ as possible and satisfy $q_{\alpha_j}(x_i) \le Y_i^{\min} < q_{\alpha_k}(x_i)$. If $Y_i^{\min} < \min(q_{\alpha_1}(x_i), q_{\alpha_2}(x_i), \dots, q_{\alpha_A}(x_i))$ or $Y_i^{\min} > \max(q_{\alpha_1}(x_i), q_{\alpha_2}(x_i), \dots, q_{\alpha_A}(x_i))$, then $k$ and $j$ are the two nearest available predicted quantiles such that $q_{\alpha_j}(x_i) < q_{\alpha_k}(x_i)$ and $\alpha_j < \alpha_k$. Cannon (2011) presents an alternative for predictive density interpolation with quantiles. The disadvantage of Cannon's method is that it produces too low predicted distribution values in the tails when the distance between the two lowest or highest quantiles is very small. If the steepness of the tail is determined with these two points, the predicted density will approach zero too quickly. The problem can be solved, however, with a condition that the extrapolation should not be performed based on these points if the distance between them is too small.
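The interpolation in Eq. 12 can be sketched as follows. The code is an illustration under stated assumptions rather than the implementation used in the thesis: the function names are hypothetical, the predicted quantiles are assumed to be strictly increasing (no crossing), and limits outside the modelled range are extrapolated from the two nearest quantiles as the text describes.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))


def interpolated_rejection_probability(limit, levels, quantiles):
    """Eq. 12: interpolate P(y < limit) on the logit scale between two quantiles."""
    pairs = sorted(zip(levels, quantiles))
    if len(pairs) < 2:
        raise ValueError("at least two predicted quantiles are needed")
    for (_, q_low), (_, q_high) in zip(pairs, pairs[1:]):
        if not q_low < q_high:
            raise ValueError("predicted quantiles must be strictly increasing")
    # Largest j with q_j <= limit, kept within the modelled range so that
    # limits outside it use the two lowest or the two highest quantiles.
    j = 0
    for idx in range(len(pairs) - 1):
        if pairs[idx][1] <= limit:
            j = idx
    (alpha_j, q_j), (alpha_k, q_k) = pairs[j], pairs[j + 1]
    weight = (limit - q_j) / (q_k - q_j)
    return sigmoid(logit(alpha_k) * weight + logit(alpha_j) * (1.0 - weight))
```

Because the interpolation is done on the logit scale, the result stays strictly between 0 and 1 even when the limit lies far outside the modelled quantiles.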

3.3 Model selection

It is not simple to compare the proposed models, because the dependent variables are not originally on the same scale. Therefore, the LIB transformed measurements were transformed back into the joule scale before training. A plain back-transformation after training would lead to incorrect results, because $E(f(x)) \neq f(E(x))$, where $f(x) = \sqrt{(10x)}$. However, with certain distributional assumptions, the correct results could be achieved analytically or by using approximation after training as well.

3.3.1 Performance evaluation for rejection probability estimation models

Evaluation of the performance of different classification methods is needed when selecting the best model for rejection probability estimation. The method should be robust to imprecise class distributions and misclassification costs. Ideally, the method does not assume that the target environment will be constant and balanced (Provost & Fawcett 2001).

The decision to accept or reject the designed product is based on the rejection probability estimate. The probability level that necessitates changing the production plan has an impact on the true positive (TP) and false positive (FP) rates. The predictive analytics of decision making in classification problems can be summarized in a confusion matrix, as shown in Table 1.

Table 1. Predictive analytics of a classification problem.

                      Actual positive        Actual negative
Predicted positive    true positive (TP)     false positive (FP)
Predicted negative    false negative (FN)    true negative (TN)

An acceptable trade-off between the possible results should be found. When the rejection probability level is low, the number of TP is high, but the number of FP is high as well. When the level increases, the number of TP observations decreases, as does the number of FP observations. If the level is too high, too many defective products are accepted (false negatives), and when the level is too low, too many faultless products are rejected (false positives).

If the costs of each decision were known, they should be taken into account when selecting the probability level. Often, the cost of FP is different from the cost of FN. If a faultless product is rejected, only the production costs are paid, but if a defective product is delivered to the customer, the cost could be much higher. The proportions of these incidents can be very imbalanced when the number of faultless products is significantly larger than that of the defective products. The problem can be handled similarly to unbalanced or unknown misclassification costs (Maloof 2003).

Rejection probability models can be treated as classification models, where at a certain probability level the product can be considered to be rejected. Classification models are commonly evaluated graphically with Receiver Operating Characteristic (ROC) curves, which present the trade-off between true positive and false positive rates. The area under the ROC curve, known as the AUC, is a standard method to assess the accuracy of different classification models. The method is objective as no threshold for the decision needs to be specified. Instead, the method summarizes the overall model performance over all possible thresholds. There are, however, several reasons why it should not be used for model selection (Cook 2007, Lobo et al. 2008, Peterson et al. 2008). ROC and AUC are suitable mainly for analysing local performances at different probability levels.

The F-score is another possible criterion for model comparison. It is the harmonic mean of precision and recall

$$
F = 2\, \frac{ \dfrac{TP}{TP+FP} \cdot \dfrac{TP}{TP+FN} }{ \dfrac{TP}{TP+FP} + \dfrac{TP}{TP+FN} }, \tag{13}
$$

where $TP$ is the number of true positives, $FP$ is the number of false positives and $FN$ is the number of false negatives in disqualification. A larger F-score indicates a better model. It is possible to give more weight to the precision or recall, if one is more important than the other (Sokolova et al. 2006). The F-score can be calculated at every $p_j = q_j$, and averaged F-score values for each probability can be used for local evaluation.
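To make the link between a probability level, the confusion matrix of Table 1 and Eq. 13 concrete, here is a small sketch. The function names are hypothetical, and the convention that a product is flagged when its predicted rejection probability is at or above the level is an assumption of this example.

```python
def confusion_counts(predicted_probabilities, actually_rejected, level):
    """Count TP, FP, FN and TN when products at or above `level` are flagged."""
    tp = fp = fn = tn = 0
    for probability, rejected in zip(predicted_probabilities, actually_rejected):
        flagged = probability >= level
        if flagged and rejected:
            tp += 1
        elif flagged and not rejected:
            fp += 1
        elif rejected:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f_score(tp, fp, fn):
    """Eq. 13: harmonic mean of precision and recall (0 when there are no TPs)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)
```

Evaluating `f_score` over a range of levels gives the local comparison described above, since each level produces its own confusion matrix.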

The beanplot (Kampstra 2008) is a method for the visual inspection of different data batches. It summarizes the data while the original observations are still visible. A beanplot is a combination of a 1-dimensional scatter plot and a distribution trace of the univariate data set. In our research group, the method has been developed further to illustrate the differences between predicted and observed distributions: the predicted and observed distributions are mirrored against each other, instead of the symmetrical bean (Juutilainen et al. 2013). It is possible to visually analyse how well the observed and predicted distributions coincide in the test data or how the predicted distributions vary between different regions of input space or between different query points. Beanplots require data groups with approximately homogeneous predicted distributions, and if there are no natural groups available, the data should be clustered according to predicted mean, predicted deviation, and optionally, predicted kurtosis and skewness. Alternatively, the data can be clustered according to the input variables that have the strongest effect on the predictive distribution.

3.3.2 Proper scoring rules for model selection

When selecting a suitable ranking method, one must remember that, especially with CVT, only the probability to exceed the requirement level z is considered and the production does not aim at a certain optimal value for the property. Therefore, the proposed models predict the probability that the response variable is below or above a certain limit. In other words, the prediction of a model is a function $F(\cdot)$, $F(s) = P(y < s)$, using the cumulative distribution function, where $s$ is the predefined exceedance limit. Because the selected models produce quantile or density predictors, they have to be transformed into exceedance probability predictors that predict the conditional probability $P(y_i < s_i \mid x_i)$. This way, the comparison of different types of probability prediction models becomes possible as well.

According to Gneiting & Raftery (2007), scoring rules assess the quality of probabilistic forecasts. In terms of evaluation, scoring rules measure the quality of the probabilistic forecasts, reward probability assessors for forecasting jobs, and rank competing forecast procedures. A scoring rule S(P,y) is a function that outputs a numerical score based on the observed response y and its predictive distribution P. The scoring rules measure the out-of-sample performance of the forecasts. (Juutilainen et al. 2012)

The purpose of a score function is to rank the models according to how useful they are for the user (Hand et al. 2001). One might want to put more weight on the error of not detecting the rejection and less weight on the false alarm. Sometimes, underestimation is less harmful than overestimation, and therefore, an asymmetrical score function would be more appropriate. It is not always straightforward how to handle the extreme values, either. In this thesis, this question was not considered further.

The predictive performance of regression methods that produce the predicted density function $f(\cdot)$ of the response measurement $y$, with predicted mean and possibly other predicted parameters of the assumed distribution, can be evaluated with the Logarithmic Score (LogS)

$$
\mathrm{LogS}(P, y) = \log f(y). \tag{14}
$$

LogS is equivalent to the test data log-likelihood, and it is the most commonly used scoring method. Thus, it is selected as one of the proposed methods here. The LogS rule is insensitive to distance, and therefore, it does not reward response values that have low predicted probability but are near to a region with high probability (Juutilainen et al. 2012). With quality tests consisting of multiple measurements, the response value may differ between the models depending on the transformation, and the model comparison with LogS may be misleading. When calculating the score, y cannot be different for every model, and thus, the selected value may favour a model that has the same response variable originally. The selection may serve the target of the quality test, for example, and in that case, the transformation in Eq. 3 would be applicable.


Score functions that are based on the rejection limits are more suitable for model comparison when models have different response variables. Furthermore, they differ from LogS, because they are distance-sensitive, whereas LogS is not. The Continuous Ranked Probability Score (CRPS)

$$
\mathrm{CRPS}(F, y) = -\int_{-\infty}^{\infty} \left[ F(s) - \mathbf{1}(y < s) \right]^2 \, ds \tag{15}
$$

has been considered the optimal method for comparing probabilistic predictions that are expressed using CDFs (Gneiting & Raftery 2007). In practice, the CRPS can be approximated by

$$
S_{\mathrm{CRPS}}(F, y) = \frac{y_u - y_l}{J+1} \sum_{j=0}^{J} -\left[ F(s_j) - \mathbf{1}(y < s_j) \right]^2, \tag{16}
$$

where the equally spaced artificial exceedance limits are

$$
s_j = y_l + j\, \frac{y_u - y_l}{J} \tag{17}
$$

and $[y_l, y_u]$ is the requirement interval. (Juutilainen et al. 2012)

The Exceedance Probability Score (EPS)

$$
\mathrm{EPS}(F, y) = \int_{-\infty}^{\infty} \left\{ \mathbf{1}(y \le s) \log\left[ F(s) \right] + \mathbf{1}(y > s) \log\left[ 1 - F(s) \right] \right\} ds \tag{18}
$$

penalizes more heavily a model that produces very unlikely probabilities, and thus, it is a more reliable scoring method than CRPS (Juutilainen et al. 2012). In practice, EPS is calculated as a sum over a finite number of artificial exceedance limits that cover $[y_l, y_u]$

$$
S_{\mathrm{EPS}}(F, y) = \frac{y_u - y_l}{J+1} \sum_{j=0}^{J} \left\{ \mathbf{1}(y \le s_j) \log\left[ F(s_j) \right] + \mathbf{1}(y > s_j) \log\left[ 1 - F(s_j) \right] \right\}. \tag{19}
$$

There are differences in how LogS, CRPS and EPS penalize different types of prediction errors. In Juutilainen et al. (2012), it was shown that CRPS is very insensitive to cases with too low predicted deviation, and that LogS is relatively insensitive to cases with too high predicted deviation. EPS penalized crude errors in the predicted mean more than LogS or CRPS.
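The discrete approximations in Eqs. 16, 17 and 19 can be sketched in code as follows. The function names are hypothetical, the predicted CDF is passed in as a plain callable, and the clipping of probabilities away from 0 and 1 before taking logarithms is an addition made here for numerical safety rather than part of the published definitions.

```python
import math


def normal_cdf(s, mu, sigma):
    """CDF of a normal predictive distribution (used only as an example F)."""
    return 0.5 * (1.0 + math.erf((s - mu) / (sigma * math.sqrt(2.0))))


def exceedance_limits(y_low, y_up, n_steps):
    """Eq. 17: equally spaced artificial limits s_0, ..., s_J on [y_low, y_up]."""
    return [y_low + j * (y_up - y_low) / n_steps for j in range(n_steps + 1)]


def crps_approx(cdf, y, y_low, y_up, n_steps):
    """Eq. 16: discrete approximation of the (negatively oriented) CRPS."""
    total = 0.0
    for s in exceedance_limits(y_low, y_up, n_steps):
        indicator = 1.0 if y < s else 0.0
        total += -(cdf(s) - indicator) ** 2
    return (y_up - y_low) / (n_steps + 1) * total


def eps_approx(cdf, y, y_low, y_up, n_steps, tiny=1e-12):
    """Eq. 19: discrete approximation of the exceedance probability score."""
    total = 0.0
    for s in exceedance_limits(y_low, y_up, n_steps):
        p = min(max(cdf(s), tiny), 1.0 - tiny)  # clipping: assumption of this sketch
        total += math.log(p) if y <= s else math.log(1.0 - p)
    return (y_up - y_low) / (n_steps + 1) * total
```

Both scores are at most zero, and a larger (less negative) value indicates a better forecast; averaging them over the test observations gives the ranking criterion.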

The ranking of the probability forecasts is not reliable if only a single exceedance limit is used. Instead, a range of artificial limits should be determined. It should cover the whole range of the response values in which the accurate prediction of exceedance probability is important. For example, if a product has a lower specification limit but not an upper specification limit, the upper tail of the distribution is uninteresting and the range of the limits does not need to cover it. The spacing of the limits should be small enough to approximate the integral in Eq. 18 satisfactorily (Juutilainen et al. 2012). When using several limits, the forecasts are ranked by the average of the scores.

The difference in the prediction accuracies of the proposed models can be tested with a Diebold-Mariano type test (Diebold & Mariano 1995), which answers the question of whether the difference in LogS, CRPS or EPS is statistically significant. (Juutilainen et al. 2012)

3.4 Discussion

Risk probability predictors can be utilized in manufacturing planning, process control and optimization, when they are combined with the manufacturing and rejection costs. We have implemented a steering tool for the quench tempering process at Ovako. The tool optimizes the control variables (quenching temperature and time) of the process by utilizing the risk probability predictors for the tensile strength (RM) and yield strength (RP02), which are the most important mechanical properties of a steel product (Juutilainen et al. 2014). A graphical presentation of the system is illustrated in Figure 7. It would be interesting to include the impact toughness model in the tool, because the key variables have an opposite effect on CVT than on RM or RP02.

[Figure 7: flow diagram. Inputs: rejection cost C, probability level p, specifications (min, max), explanatory variables from the process, and quench tempering variables (a grid of all possible values, and default values). Prediction models RP02_mean(x), RP02_sd(x), RM_mean(x) and RM_sd(x) give the predicted mean and SD of each property; combined with N(0,1) or a distribution shape, these give P(rejection) for all possible values of the grid. A tempering cost c(x) is also evaluated. Output: which tempering temperature and time minimize the costs when each predicted rejection probability is less than p.]

Fig 7. A graphical presentation of a quench tempering optimization tool.
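The logic of Figure 7 can be sketched as a constrained grid search. Everything specific in this example is hypothetical: the function names, the toy mean, deviation and cost functions, and the numerical values are illustrations only, and the normal distribution is used in place of any fitted distribution shape. The sketch searches a grid of tempering temperatures and times and returns the cheapest setting for which every predicted rejection probability stays below p.

```python
import math
from itertools import product


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rejection_probability(mean, sd, spec_min, spec_max):
    """P(property outside [spec_min, spec_max]) under a normal assumption."""
    below = normal_cdf((spec_min - mean) / sd)
    above = 1.0 - normal_cdf((spec_max - mean) / sd)
    return below + above


def cheapest_feasible_setting(temperatures, times, models, specs, cost, p_max):
    """Cheapest grid point where each property's rejection probability < p_max."""
    best, best_cost = None, math.inf
    for temp, time in product(temperatures, times):
        feasible = all(
            rejection_probability(mean_fn(temp, time), sd_fn(temp, time),
                                  *specs[name]) < p_max
            for name, (mean_fn, sd_fn) in models.items()
        )
        if feasible and cost(temp, time) < best_cost:
            best, best_cost = (temp, time), cost(temp, time)
    return best  # None when no grid point satisfies the constraint


# Hypothetical toy models and costs, not fitted to any real data.
def toy_rm_mean(temp, time):
    return 1500.0 - temp


def toy_rm_sd(temp, time):
    return 20.0


def toy_cost(temp, time):
    return 0.01 * temp + 0.02 * time


toy_models = {"RM": (toy_rm_mean, toy_rm_sd)}
toy_specs = {"RM": (900.0, 1000.0)}
```

A second property such as RP02 would simply be added to the `models` and `specs` dictionaries, so that the constraint is checked for each property separately, as in the figure.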


Model integration into the process is a crucial part of data mining when process improvement is considered. It is better if several operators have access to the models, and the model implementation can restrict the usability as well, if the manufacturer is not willing to invest in expensive commercial statistical programs. In this research, the majority of the modelling has been done with the statistical program R, which is available on the Internet for free. The operators have easily learned to use the developed tools, and the command line interface has not been an obstacle. The interaction during the modelling phase has been productive, because the developed models have been delivered to test use right away, and the feedback has been available quickly, allowing the improvement of the model.


4 Target of application

In industry, a vast amount of measurements and process data is constantly collected from production. When valuable information is extracted from this data, it can be utilized in planning or optimizing process parameters. Statistical models may help to predict the quality of a product in the design phase, for example. Elovici & Braha (2003) suggest that if the value of the extracted knowledge was combined with the amount of available investments for the organization, it would improve the data utilization. In that sense, knowledge has value only if it leads to actions that increase the payoff of the decision-maker. The advantage of data mining techniques over other experimental techniques is that the required data for analysis can be collected during the normal manufacturing process (Choudhary et al. 2009).

For complicated phenomena, it is much easier to build models that concern only specific products or subgroups of the production. However, the interpolation capability of these models is inadequate. A much more useful model can be achieved by including the whole process in the modelling and thus allowing the transfer of information between products. It is more difficult to build a model for the whole process, but powerful data mining methods can help to achieve this goal.

Collaboration between experts in both data mining and manufacturing is needed for a successful industrial application (Wang 2007). Usually, the data mining expert does not have deep knowledge of the application area, and similarly, the expert of the application area may not be capable of performing powerful data analysis. As a result of successful collaboration, a well-performing modelling tool will be produced, and the operators will be motivated to use it frequently, as they have been heard during the data mining process. Thus, the effort will be turned into process improvement.

4.1 Model users

From the beginning of this research, the main goal has been to develop a model for impact toughness rejection that will be used for steel design at steel factories. Therefore, it has been essential to collaborate with the metallurgists and other experts at our collaborative factories. The design departments have played an important part in data selection, and they have provided valuable user experience in model development as well.

The operator at the product design department plans the chemical composition, possible treatments during melting (e.g. vacuum degassing) and some production requirements for heating, rolling and thermomechanical treatments for each product. The model may guide designers not to use too much working allowance when keeping the product within tolerances, and thus, to produce the desired properties in the product at lower cost. If the steel mill has only a few products and a large volume of each grade, optimization of the process parameters is an easier task than in the case where the mill competes with high quality, short delivery time and a large product range, which leads to small production batches at a time. In the latter case, tools that help product design will directly reduce the number of rejections and delays, because the amount of trial and error will be reduced.

Our research group has developed models for predicting the strength and elongation of a steel plate at Ruukki Steel, Raahe Works, Finland in earlier research projects. As a result, we have developed a tool to help the product design process. The user in a steel design department enters key values of the product into the user interface, and the model predicts the rejection probabilities for different mechanical properties. Now, the impact toughness model has been included in this system as well. The user interface (UI) with the values for one steel product can be seen in Figure 8; the estimated impact energy value and the rejection probability are located in the upper right corner of the screen. The picture of the UI was taken in actual use at Ruukki, and therefore, it is in Finnish.

Similar models have been developed for Ovako Bar, Imatra, Finland as well, and they are in test use at the moment.

These two steel factories differ from each other in their products and product diversity, but the quality test procedure was basically the same in both of them. By selecting both of them for this research, the methods of this thesis were applied to quite different situations.

4.2 The data

Typically, impact toughness research has concentrated on certain steel grades at a time, but in this thesis, the model is developed for the whole production line. Two steel cases from different factories were selected for the modelling. The first case was the steel plate production line at Ruukki Metals, Raahe Works, Finland and the second one was the steel bar production line at Ovako, Imatra, Finland. These two steel cases differ from each other in production method and production variability. At Ovako, the rolling phase is quite similar for all of the products, and only the chemical composition and the quenching and tempering parameters were included in the modelling. At Ruukki, the product range is larger and the rolling process itself has many variables. Furthermore, at Ruukki, the Charpy-V test has different variables, including the test temperature and the dimensions of the test sample. From the viewpoint of the thesis, it is interesting to see how the larger product range affects the results.

Fig 8. The rejection model in use.

The Ovako Bar data were collected during 2008-2011 and they consist of information on nearly 8,000 steel bars. The analysis was focused on quenched and tempered steel grades, and all of the CVTs were performed at −20 °C.

The Ruukki Metals data were collected during 2008-2011 and they consist of information on nearly 90,000 low-alloy steel plates and over 70 variables, covering nearly 90% of the steel grades in production at the mill. Some unique and demanding steel grades were excluded from the data set. The CVT was performed on a majority of the steel plates at −20 °C, but the test temperature ranged from +20 °C to −62 °C. In some cases, the test temperature can be as low as −100 °C, but no such tests occurred during this period. Careful pre-processing was performed in order to exclude defective observations and redundant variables, for example, unnaturally low measurements and incorrectly performed tests.

The CVT rejection limits for the steel products are given by the customer. Therefore, similar steel products can have very different requirements depending on the end product. Furthermore, there is no target value for the quality property. For model testing, a large range of artificial rejection limits is needed, because the majority of the observations would easily pass the existing requirement limits, and thus, their predicted rejection probability would be very low.

Artificial rejection limits s between 20 and 150 J were created. All the products were tested at these requirement limits, and for both of the data sets, the observation was considered to be rejected in the CVT if the mean of the measurements was below the requirement or if one of the measurements was 30% below the requirement. This is a common practice, although other rejection rules exist as well.
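The rejection rule can be written out as a short sketch. The function names are hypothetical, and reading "30% below the requirement" as "any single measurement below 0.7 times the requirement" is an interpretation made for this example. The 10 J spacing of the artificial limits is likewise an assumption, taken from the levels listed in Table 2.

```python
def rejection_reason(measurements, requirement, single_fraction=0.7):
    """Classify a CVT result as 'both', 'mean only', 'min only' or None (accepted).

    single_fraction = 0.7 encodes the assumed reading that no single
    measurement may fall more than 30% below the requirement.
    """
    mean_fails = sum(measurements) / len(measurements) < requirement
    min_fails = min(measurements) < single_fraction * requirement
    if mean_fails and min_fails:
        return "both"
    if mean_fails:
        return "mean only"
    if min_fails:
        return "min only"
    return None


def is_rejected(measurements, requirement):
    return rejection_reason(measurements, requirement) is not None


# Artificial rejection limits between 20 and 150 J (10 J spacing assumed).
ARTIFICIAL_LIMITS = list(range(20, 151, 10))
```

Applying `rejection_reason` to every observation at every artificial limit and tabulating the three categories gives a breakdown of the same form as the rejection summaries for the two data sets.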

The data sets differed from each other in how these two rules were realized at different rejection levels. In Tables 2 and 3, the differences between the data sets have been summarised. The rejection impact energy levels are in the leftmost column, followed by the percentage of the objects that were rejected because both the mean and the minimum of the measurements did not qualify. The third column gives the percentage of rejections where the mean of the measurements did not qualify but the minimum was high enough, and the rightmost column gives the percentage of rejections where only the minimum of the measurements was too low. The percentages have been calculated from the total number of rejections. As can be seen, the rejections are not distributed similarly in these data sets. It is clear that for neither of the data sets is the mean of the measurements alone an adequate tool for rejection detection. In the Ruukki data set, quite a large share of the rejections could have been recognized by the minimum measurement only, especially at the lower rejection levels, whereas in the Ovako data set such rejections hardly existed.

It can be expected that the size of the data and the diversity in the production will affect the results of Ruukki data. More importantly, the steel grades are quite different from each other. For example, the carbon (C) content in Ruukki data is much lower (0.005-0.270) than in Ovako data (0.153-0.528). As can be seen in Figure 3, a higher C content makes the transition curve more gradual, and the transition area widens. This leads to a near Gaussian distribution for impact toughness. With a low C content, the transition area can be so steep that it leads in the worst case to a bimodal impact toughness distribution, where single measurements can be located either on the upper or the


Table 2. Summary of the arbitrary rejections for Ovako data set.

Rejection    mean(CVT) and min(CVT)   mean(CVT)        min(CVT)
limit (J)    below req (%)            below req (%)    below req (%)
20           0                        0                0
30           0                        100              0
40           6.79                     93.21            0
50           25.63                    74.08            0.28
60           33.56                    66.17            0.27
70           29.33                    70.36            0.31
80           27.37                    72.37            0.26
90           25.49                    74.48            0.03
100          26.42                    73.56            0.02
110          34.07                    65.93            0
120          47.62                    52.38            0
130          63.32                    36.68            0
140          73.41                    26.59            0
150          80.94                    19.06            0

lower shelf. As a result, it can be expected that the Ruukki data in particular will benefit from the distribution modelling.

4.3 Discussion

As mentioned earlier in Chapter 2, the CVT does not provide a direct measurement of the fracture toughness; the impact energy only correlates with the property. Generally, the shape and the location of the transition curve are modelled with several samples from the area and only for specific steel grades (e.g. Haušild 2002). Therefore, when the data is collected from a regular process, it is not possible to answer the question of how near the transition area the designed product is. Instead, from an end user’s point of view, the model produces valuable information in the form of the rejection probability.

The melting phase of steel plate manufacturing produces over 20 observations in one batch, and the dependency of the observations should be taken into account


Table 3. Summary of the arbitrary rejections for Ruukki data set.

Rejection    mean(CVT) and min(CVT)   mean(CVT)        min(CVT)
limit (J)    below req (%)            below req (%)    below req (%)
20           31.33                    12.05            56.63
30           29.05                    14.19            56.76
40           24.35                    33.18            42.47
50           22.77                    50.30            26.93
60           35.81                    44.11            20.08
70           44.02                    41.71            14.27
80           49.90                    38.87            11.23
90           54.69                    36.70            8.61
100          58.62                    34.68            6.71
110          60.18                    34.71            5.11
120          59.86                    36.53            3.61
130          60.53                    36.75            2.72
140          59.83                    38.13            2.04
150          59.10                    39.35            1.56

in modelling. Similarly, the steel bar products are heat treated in batches during manufacturing, and the batches should not be divided into different data sets.

Because the models were developed mainly for product design, only the variables that are available when using the model during the manufacturing could be included in the model. The variables that cannot be planned beforehand were excluded as well. Mesotexture of the steel has an effect on impact toughness, and texture is strongly influenced by rolling practice (Bhattacharjee et al. 2004). For example, the amount of reduction and cross-reduction would have improved the prediction, but they had to be excluded, because these variables are not used in the design. Likewise, the target values of the chemical composition were used instead of the real values. As a result, the instability that exists in the decisions of the design is included in the model correctly.

The production design department is another possible end user of the models. They place the existing steel slabs into production according to the orders, and at this stage, it is possible to use the actual chemical composition and other information from the melting and casting. Currently, the production design department does not utilize the


impact toughness information, but it is possible to develop similar models for their use as well.

In industrial processes, the data is stored in different databases within the factory, and access to the data sources should be solved before the models are used on-line. Sometimes, the variables have to be selected based on the availability of the on-line data.


5 Application to impact toughness of steel products

This chapter presents the application results that illustrate the exceedance probability modelling process when product quality is tested with multiple measurements. First, the transformation selection is reviewed, followed by model performance analysis for mean and deviation models, quantile models, and rejection probability prediction models. Finally, model selection is covered.

The methods were applied for impact toughness modelling at two steel factories with different backgrounds. The proposed models were mainly trained with R, using the packages nnet for MLP networks, mgcv for GAM and gamlss for GAMLSS models. The quantile regression models were trained with Matlab using the resilient propagation algorithm. The training data contained 60% of the observations, while the rest of the data was equally divided between the test and validation data sets. The independence of the data sets was confirmed by grouping the data according to the production schedule.
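A grouped 60/20/20 split of this kind can be sketched as below. The sketch is only an illustration under stated assumptions: the batch identifiers, the function name `grouped_split` and the greedy assignment are hypothetical, and the actual grouping by production schedule may have been done differently.

```python
import random
from collections import defaultdict


def grouped_split(batch_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split observation indices into train/test/validation sets.

    All observations that share a batch id are kept in the same set.
    The greedy assignment below is an illustrative choice only.
    """
    members = defaultdict(list)
    for index, batch in enumerate(batch_ids):
        members[batch].append(index)

    batches = sorted(members)
    random.Random(seed).shuffle(batches)

    names = ("train", "test", "validation")
    targets = dict(zip(names, (f * len(batch_ids) for f in fractions)))
    split = {name: [] for name in names}
    for batch in batches:
        # Give the whole batch to the set that is furthest below its target size.
        name = max(names, key=lambda n: targets[n] - len(split[n]))
        split[name].extend(members[batch])
    return split


# Hypothetical example: 100 observations in 20 batches of five.
batch_ids = [i // 5 for i in range(100)]
split = grouped_split(batch_ids)
print({name: len(indices) for name, indices in split.items()})
```

Keeping whole batches together means that the set sizes only approximate the requested fractions when the batches are of unequal size.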

5.1 Selecting the proper transformation

In impact toughness modelling, three test samples are measured from an observed product sample. Because the rejection rule has two conditions (see Section 2.2), it should be considered carefully how these measurements should be processed. If the modelling of the distribution of the mechanical property requires transformation of the measurements into one target variable, the transformations stated in Eq. (2) and (3) in Section 3.1.1 may be considered, as well as the mean of the measurements. The following analysis shows that the LIB transformation is the most suitable for applications where the variability between the measurements is large. The models with different transformations are referred to as the AVG model for the mean of the measurements, the DISQ model for the rejection rule based transformation, and the LIB model. The effect of the deviation model was also considered for all of the transformations.

The performances of the models are presented in Table 4. The performances were calculated for the test sets in Ovako and Ruukki data. The variety of the products leads to lower overall performance in Ruukki data, but the results indicate that the performance of the AVG model is the best for both of the data sets. The transformation


cannot be selected based on these results, however, as the model’s ability to predict the average outcome of the CVT does not decisively affect the ability to predict the rejection probability of the CVT, which is a more important attribute from the users’ point of view.

Table 4. Model performance for the AVG, DISQ and LIB transformation in Ovako and Ruukki test data.

        Ovako                        Ruukki
        AVG      DISQ     LIB        AVG       DISQ      LIB
RMSE    9.8562   9.9996   9.9831     29.6777   32.4055   32.2252
R       0.9298   0.9279   0.9277     0.9034    0.8878    0.8889
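For reference, the two measures in Table 4 can be computed as in the following sketch. It assumes that R denotes the Pearson correlation between the measured and the predicted values, which is not stated explicitly in this section; the function names are illustrative.

```python
import math


def rmse(measured, predicted):
    """Root mean squared error between measured and predicted values."""
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(measured, predicted):
    """Pearson correlation coefficient (assumed meaning of R)."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


# Hypothetical impact energies in joules.
measured = [62.0, 80.0, 95.0, 120.0]
predicted = [65.0, 78.0, 99.0, 110.0]
print(rmse(measured, predicted), pearson_r(measured, predicted))
```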

Instead, the models can be compared with score functions, which measure the performance of probabilistic forecasts. The Exceedance Probability Score criterion (EPS), introduced in Section 3.3, is capable of ranking models with different response variables, because it is based on the rejection limits instead of the actual response value, as is the case with the Logarithmic Score (LogS), for example. LogS would favour the model that has the same response value as the value selected for the LogS calculation.

All transformations were analysed with and without the deviation model. Models for rejection probability with constant deviation are referred to as mean models, and models with a deviation model are referred to as dev models. The results are visualized in Figures 9 and 10, where the columns show the improvement of each model in comparison to a random classifier. For Ovako data, the effect of the transformation is not statistically significant, but for Ruukki data, it is clear that the LIB or DISQ transformations improve the results for both model types.


Fig 9. The comparison of AVG, DISQ and LIB models by using the Exceedance Probability Score (EPS) value for Ovako test data.

Fig 10. The comparison of AVG, DISQ and LIB models by using the Exceedance Probability Score (EPS) value for Ruukki test data.


The predictive accuracy of the models was evaluated with a Diebold-Mariano type test in order to find out the statistical significance of the differences between the models. The results can be seen in Table 5.

Table 5. Diebold-Mariano type test for statistical significance of model differences for Ovako and Ruukki test sets.

Models compared          Ovako                Ruukki
LIB dev, DISQ dev        3.0325 (0.0024)      1.5747 (0.1153)
LIB dev, AVG dev         1.2544 (0.2097)      8.8008 (0.0000)
LIB dev, LIB mean        5.0206 (0.0000)      12.9400 (0.0000)
LIB dev, DISQ mean       5.2140 (0.0000)      12.9018 (0.0000)
LIB dev, AVG mean        5.0658 (0.0000)      13.1565 (0.0000)
DISQ dev, AVG dev        -0.65085 (0.5151)    8.2591 (0.0000)
DISQ dev, LIB mean       4.4261 (0.0000)      12.5666 (0.0000)
DISQ dev, DISQ mean      4.8507 (0.0000)      12.8095 (0.0000)
DISQ dev, AVG mean       4.5678 (0.0000)      13.0176 (0.0000)
AVG dev, LIB mean        4.1097 (0.0000)      11.8239 (0.0000)
AVG dev, DISQ mean       4.4419 (0.0000)      12.0448 (0.0000)
AVG dev, AVG mean        4.6364 (0.0000)      13.1485 (0.0000)
LIB mean, DISQ mean      1.8875 (0.0591)      1.8671 (0.0619)
LIB mean, AVG mean       0.5473 (0.5842)      11.2443 (0.0000)
DISQ mean, AVG mean      -0.6345 (0.5258)     11.0589 (0.0000)

There is no statistically significant difference between the transformations in Ovako data, but the utilization of the deviation model clearly improves the results regardless of the transformation. With Ruukki data, the proper selection of the transformation is important as well. The average of the measurements performs considerably worse than the others, and when considering the facts stated in Section 3.1.1, the LIB transformation, as the more general transformation, was selected for the modelling.
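A Diebold-Mariano type comparison of two models can be sketched as follows. The sketch assumes one score value per test observation for each model, treats the score differences as uncorrelated and uses the normal approximation; the exact variant used for Table 5 is not specified in this section, and the function names are illustrative.

```python
import math


def two_sided_p(stat):
    """Two-sided p-value of a standard normal test statistic."""
    return math.erfc(abs(stat) / math.sqrt(2))


def diebold_mariano(scores_a, scores_b):
    """Return (statistic, two-sided p-value) for equal mean scores.

    Assumptions: paired scores, uncorrelated score differences,
    normal approximation for the test statistic.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    stat = mean / math.sqrt(var / n)
    return stat, two_sided_p(stat)


# Hypothetical per-observation scores of two models.
print(diebold_mariano([1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 1.0, 3.0]))
# p-value implied by a reported statistic under the normal approximation.
print(round(two_sided_p(3.0325), 4))
```

Under this approximation, the values in parentheses in Table 5 appear to be two-sided p-values of the reported statistics; for example, a statistic of 3.0325 gives approximately 0.0024.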

5.2 Modelling of impact toughness

The variable importance in the impact toughness models is presented for Ovako data in Table 6 and for Ruukki data in Table 7. The value in column mean(abs(effect)) shows how much the CVT prediction will change on average if the value of the variable in


question is altered. The direction of the average change can be found in the column mean(effect): a positive value means that an increase in the variable in question will increase the CVT prediction, and a negative value means that an increase will decrease the prediction. The last column, sd(effect), shows the deviation of the effect of the variable in question across the observations. If the deviation is large, the effect of the variable varies a lot between the observations as well.
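The three summary columns can be illustrated with the sketch below. How the per-observation effect was obtained is not described in this section; here it is assumed to be the change in the prediction when one variable is moved between two reference values while the other variables are kept fixed. The toy model, the variable names and the reference values are hypothetical.

```python
import statistics


def variable_effects(predict, rows, name, low, high):
    """Change in prediction per observation when `name` moves from low to high."""
    return [
        predict({**row, name: high}) - predict({**row, name: low})
        for row in rows
    ]


def summarize(effects):
    """Summary columns in the style of the variable importance tables."""
    return {
        "mean(abs(effect))": statistics.fmean([abs(e) for e in effects]),
        "mean(effect)": statistics.fmean(effects),
        "sd(effect)": statistics.stdev(effects),
    }


def toy_model(row):
    # Hypothetical stand-in for a trained CVT model: the effect of C
    # depends on another variable, so it varies between observations.
    return 100.0 - 100.0 * row["C"] * row["scale"]


rows = [{"C": 0.2, "scale": 1.0}, {"C": 0.2, "scale": 2.0}]
print(summarize(variable_effects(toy_model, rows, "C", low=0.1, high=0.3)))
```

A large sd(effect) relative to mean(abs(effect)) in such a summary indicates that the influence of the variable depends strongly on the values of the other variables.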

Table 6. Ten most important variables in CVT model for Ovako data.

Variable            mean(abs(effect))   mean(effect)   sd(effect)
V                   79.22               -79.22         18.15
Tempering (°C)      70.68               70.68          20.18
C                   70.48               -70.48         16.75
Ni                  32.95               -32.95         11.16
Al                  20.38               20.17          10.86
Cr                  18.20               -18.11         6.73
Ca                  17.66               -17.63         12.12
P                   11.54               -11.33         9.29
Mo                  10.23               -8.31          8.78
Product diameter    10.23               -1.32          12.65


Table 7. Ten most important variables in CVT model for Ruukki data.

Variable                   mean(abs(effect))   mean(effect)   sd(effect)
C                          181.67              -181.45        91.60
Product thickness          149.49              -144.93        123.23
Test specimen thickness    126.03              123.77         78.02
Test temp. (°C)            115.83              113.35         68.29
Normalizing                108.73              107.92         60.46
Si                         82.27               -31.95         98.05
End rolling temp. (°C)     81.07               -53.26         80.66
Mn                         70.96               18.90          87.52
Mo                         58.44               -53.51         50.63
V                          55.66               -53.17         36.18

5.3 Deviation and distribution model in action

It was assumed that the deviation and the distribution shape may depend on the explanatory variables, as the mean does. For deviation modelling, two different model candidates were trained (an MLP neural network and a GAM), and the latter was selected for rejection probability prediction. In Figure 11, it can be seen that the predicted deviation decreases when the carbon concentration increases in both data sets. In other words, the model error is smaller when the C concentration is high. The explanation can be found in Figure 3, where the transition curve gets more gradual when the C concentration rises. As a result, the distribution of CVT measurements gets more predictable.


[Figure 11: two panels, each plotting Predicted deviation (y-axis) against C (x-axis).]

Fig 11. The dependence between one input variable (C) and predicted deviation using Ovako (left) and Ruukki (right) test data.

It is interesting to find out how the predicted deviation depends on the predicted mean. The scatter plot of the predicted impact toughness and the predicted deviation is more informative if different product groups are visible in the plot as well. In Figure 12, the groups have been determined by the production method. Group 3 in Ovako data gets neither high predicted impact toughness estimates nor high deviation estimates. Group 2 is the least homogenous, because it is a combination of the production methods that cannot be classified otherwise. These observations get higher deviation estimates when the CVT estimate is higher, except for a few odd observations. For Ruukki data, both of the estimates have a much larger range, but the differences between the groups are more visible as well. Group 3 has the highest deviation estimates, and they are not dependent on the CVT estimate. Group 1 has only very small deviation estimates. The deviations of Groups 4 and 5 behave similarly, although Group 4 has lower CVT estimates. In both data sets, Group 1 represents products that generally have no specific requirement or a very low requirement for impact toughness. These observations have a very high carbon content and/or their production method is simple. In summary, the production method naturally has an impact on the CVT measurements, and the corresponding deviation measurements are affected as well.

For the deviation model, it is not advisable to plot the measured values against the predicted ones. Instead, the data should be divided into homogenous groups for which the typical predicted deviation and the average prediction error can be determined. In


[Figure 12: two scatter plots titled "Predicted distribution of impact toughness"; x-axis: Predicted Impact Toughness (J); y-axis: Predicted deviation (J); legend: Groups 1-4 (left panel) and Groups 1-5 (right panel).]

Fig 12. Predicted impact toughness and deviation with colour codes for different product groups using Ovako (left) and Ruukki (right) test data.

Figure 13, the data groups have been defined by the predicted distribution using k-means clustering. For both data sets, the predicted and measured deviations are well correlated.
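The group-wise comparison can be sketched as follows. The group labels are taken as given (for example from a clustering step), and the measured deviation of a group is assumed here to be the root mean squared prediction error within the group; both are assumptions made for illustration, and the function name is hypothetical.

```python
import math
from collections import defaultdict


def deviation_by_group(groups, measured, predicted_mean, predicted_dev):
    """Return {group: (mean predicted deviation, RMS prediction error)}."""
    # Per group: count, sum of predicted deviations, sum of squared errors.
    sums = defaultdict(lambda: [0, 0.0, 0.0])
    for g, y, mu, sd in zip(groups, measured, predicted_mean, predicted_dev):
        entry = sums[g]
        entry[0] += 1
        entry[1] += sd
        entry[2] += (y - mu) ** 2
    return {g: (s / n, math.sqrt(sq / n)) for g, (n, s, sq) in sums.items()}


# Hypothetical values in joules for two groups.
groups = ["a", "a", "b", "b"]
measured = [10.0, 14.0, 50.0, 70.0]
predicted_mean = [12.0, 12.0, 60.0, 60.0]
predicted_dev = [2.0, 2.0, 9.0, 11.0]
print(deviation_by_group(groups, measured, predicted_mean, predicted_dev))
```

Plotting the two numbers of each group against each other gives one point per group; points near the diagonal indicate that the predicted deviation matches the observed scatter.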

[Figure 13: two scatter plots titled "The predicted and measured deviation in groups defined by predicted distribution"; x-axis: Predicted deviation; y-axis: Measured deviation.]

Fig 13. Predicted and measured deviation in homogenous groups defined by predicted distribution using Ovako (left) and Ruukki (right) test data.

In Figure 14, the data groups have been defined by the product number. The resulting scatter plots are somewhat more spread out; especially for Ovako data, the groups defined by the product number are not very homogenous.


[Figure 14: two scatter plots titled "The predicted and measured deviation in groups defined by product number"; x-axis: Predicted deviation; y-axis: Measured deviation.]

Fig 14. Predicted and measured deviation in homogenous groups defined by the product number using Ovako (left) and Ruukki (right) test data.

The distribution shape was modelled with a GAMLSS model, and the best distribution for both data sets was the skew t type 3 (ST3), which is based on the work of Fernandez & Steel (1998). For Ovako data, the shape parameters ν and τ in the model were constant, but for Ruukki data, they were dependent on the input variables. Figure 15 shows how strongly the ν parameter depends on the thickness of the CVT specimen and the τ parameter on the C content.

[Figure 15: two panels. First panel: x-axis C, y-axis Predicted ν distribution parameter. Second panel: x-axis CVT thickness, y-axis Predicted τ distribution parameter.]

Fig 15. The predicted shape parameter values and their most important predictors for Ruukki data.


5.4 Selecting the quantile model

Because the models for different quantiles are independent, the complexity of the models can be altered if necessary. It is natural to expect that the modelling of the outermost quantiles is not very reliable because of the sparse data, and the selected model should be simple in order to avoid overfitting. When building a quantile model in practice, it is unreasonable to train many different model combinations for each quantile, and thus some guidelines for model complexity selection are outlined next.
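The independence of the quantile models follows from each of them being fitted to its own objective. The loss function used in training is not given in this section; the sketch below shows the standard quantile regression (pinball) loss as an assumed example, with an illustrative function name.

```python
def pinball_loss(measured, predicted, q):
    """Mean pinball loss of predictions for the quantile level q (0 < q < 1)."""
    total = 0.0
    for y, y_hat in zip(measured, predicted):
        error = y - y_hat
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += q * error if error >= 0 else (q - 1) * error
    return total / len(measured)


# For a high quantile, predicting too low costs more than predicting too high.
print(pinball_loss([10.0], [8.0], q=0.9))
print(pinball_loss([10.0], [12.0], q=0.9))
```

Because the loss depends on q only through the two weights, models for different quantile levels can differ in inputs and complexity without affecting each other.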

In order to find suitable model complexities, four different model types were trained. The simplest type of model can have only a few explanatory variables. Here, the model was trained with the same variables as the deviation model (quantile model 1). The predicted CVT value from the mean model was included as an additional explanatory variable for quantile model 2. Quantile model 3 had the same explanatory variables as the mean model, and the predicted CVT value was included in quantile model 4 as well.

It can be seen in Figure 16 that the F-measure varies surprisingly little between the models, especially near the median. The right model type for each quantile cannot be selected based on F-measures only. The model selection criteria presented in Section 3.3 provide a more reliable measure for quantile selection. As a result, it can be recommended that for the outermost quantiles q = 0.01, ..., 0.1 and q = 0.9, ..., 0.99, the simplest quantile model 1 is adequate; for quantiles q = 0.2, ..., 0.4 and q = 0.6, ..., 0.8, model 2 improves the results; and for the median model q = 0.5, the most complex model 4 performs the best.


Fig 16. The F-measure values for different quantile models for Ovako validation data.

If the data distribution is assumed to be other than Gaussian, the model complexity can be asymmetrical as well. In the case of Ruukki data, the best combination included more modelling power towards the lower end of the data set, which is also indicated by the long right tail of the CVT distribution. Model 1 was selected for quantiles q = 0.01 and q = 0.9, ..., 0.99, model 2 for quantiles q = 0.05, ..., 0.3 and q = 0.7, 0.8, and model 3 for the quantiles q = 0.4, ..., 0.6.

5.5 The rejection probability

From the operator’s point of view, the predicted rejection probability makes sense when it is compared to the real rejections. This can be done visually by selecting interesting product groups and by tagging the locations of the estimated rejections. The analysis has to be done separately for each rejection level and for only one homogenous group at a time. Therefore, the method cannot be used for model selection, but for the operator, it is a valuable tool for planning. Two different products from both data sets were selected for the following examples, which illustrate the subject.

In Figure 17, using the Ovako test data of product 1, the chosen risk level for disqualification is p = 0.05 and the disqualification level is 60 J. With the chosen risk, the number of true positives is almost the same for all the models, except for one observation that has been recognized only by model 1. The number of false positives differs from


model to model. In particular, model 4 would reject products that clearly perform well. The reason for this is that the 0.05 quantile model predicts high variation for these observations. The predicted mean (models 1, 2 and 3) and the predicted median (model 4) are modelled quite similarly for this product.
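How a predicted rejection follows from a chosen risk level can be sketched as below. This is a simplified illustration: it covers only a Gaussian predictive distribution and compares the predicted value directly with the disqualification level, whereas the models in this work also include a skewed distribution and a transformed response. The function names are hypothetical.

```python
import math


def gaussian_rejection_probability(mean, deviation, limit):
    """P(response < limit) for a normal predictive distribution."""
    z = (limit - mean) / deviation
    return 0.5 * math.erfc(-z / math.sqrt(2))


def flag_by_probability(mean, deviation, limit, risk):
    """Predict a rejection when the rejection probability reaches the risk level."""
    return gaussian_rejection_probability(mean, deviation, limit) >= risk


def flag_by_quantile(predicted_quantile, limit):
    """Quantile-model variant: reject when the predicted q-quantile is below the limit."""
    return predicted_quantile < limit


# Hypothetical product with a predicted mean of 80 J at a 60 J limit.
print(gaussian_rejection_probability(80.0, 10.0, 60.0))
print(flag_by_probability(80.0, 10.0, 60.0, risk=0.05))
print(flag_by_probability(80.0, 20.0, 60.0, risk=0.05))
```

With the same predicted mean, a larger predicted deviation alone can move an observation over the risk level, which is one way a model may flag products that perform well on average.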

[Figure 17: four scatter plots, (a) Model 1, (b) Model 2, (c) Model 3 and (d) Model 4; panel titles "Product 1, p-value 0.05" for (a) to (c) and "Product 1, quantile 0.05" for (d); x-axis: Predicted Impact Toughness (J); y-axis: Measured Impact Toughness (J).]

Fig 17. The scatter plot for predicted and measured impact toughness values for four models using product 1 data from Ovako. The predicted rejections are marked with solid red circles.

Product 2 has a higher CVT level, and thus the disqualification level is chosen to be 100 J. The risk level for disqualification is p = 0.2. The results can be seen in Figure 18.


The differences between models disappear almost completely at the higher disqualification levels, which can be seen in the ROC analysis in the next section as well.

[Figure 18: four scatter plots, (a) Model 1, (b) Model 2, (c) Model 3 and (d) Model 4; panel titles "Product 2, p-value 0.2" for (a) to (c) and "Product 2, quantile 0.2" for (d); x-axis: Predicted Impact Toughness (J); y-axis: Measured Impact Toughness (J).]

Fig 18. The scatter plot for predicted and measured impact toughness values for four models using product 2 data from Ovako. The predicted rejections are marked with solid red circles.

The CVT values of the Ruukki data are much higher, and although the rejection limits in production might be much lower, in the following scatter plots the rejection level has been chosen at 100 J (Figure 19) and 130 J (Figure 20). At lower levels, the number of false positives is lower, but the differences between models may be less interesting. In Figure 19, the risk level for disqualification is p = 0.2 for product 1. The


first difference between the models that stands out is the number of false positives. Models 2 and 3 come out better than the others. The number of false negatives is lower for models 1 and 4, however.

[Figure 19: four scatter plots, (a) Model 1, (b) Model 2, (c) Model 3 and (d) Model 4; panel titles "Product 1, p-value 0.2" for (a) to (c) and "Product 1, q = 0.2" for (d); x-axis: Predicted Impact Toughness (J); y-axis: Measured Impact Toughness (J).]

Fig 19. The scatter plot for predicted and measured impact toughness values for four models using product 1 data from Ruukki. The predicted rejections are marked with solid red circles.


[Figure 20: four scatter plots, (a) Model 1, (b) Model 2, (c) Model 3 and (d) Model 4; panel titles "Product 2, p-value 0.1" for (a) to (c) and "Product 2, q = 0.1" for (d); x-axis: Predicted Impact Toughness (J); y-axis: Measured Impact Toughness (J).]

Fig 20. The scatter plot for predicted and measured impact toughness values for four models using product 2 data from Ruukki. The predicted rejections are marked with solid red circles.

Product 2 has even higher CVT measurements than product 1 (Figure 20). Only the median model of model 4 is able to predict the highest values well. On the other hand, the measurements in the middle part of the scatter plot are difficult to predict with all the models. The predicted rejections with the risk level p = 0.1 look quite similar to those of product 1. Model 1 seems to recognize one rejection that is a false negative with the other models, but its number of false positives is much higher.

This kind of analysis of the results is the most meaningful for the process operator, who is interested in the specific products. This way, it is possible to find out that a


certain model performs well locally with some critical products, but it does not provide information about the overall performance.

5.6 Model selection

Commonly, the accuracy of different classification models is visualized with ROC (receiver operating characteristic) curves. The method graphically displays the trade-off between false-negative (1-true-positive) and false-positive rates obtained by varying the classification criteria (Dorling et al. 2003).
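The construction of an ROC curve from predicted rejection probabilities can be sketched as follows. The function name and the example values are illustrative; each distinct predicted risk is used once as the classification threshold.

```python
def roc_points(risks, rejected):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(1 for y in rejected if y)
    negatives = len(rejected) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(risks), reverse=True):
        # An observation is predicted to be rejected when its risk reaches the threshold.
        tp = sum(1 for r, y in zip(risks, rejected) if r >= threshold and y)
        fp = sum(1 for r, y in zip(risks, rejected) if r >= threshold and not y)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical predicted risks and observed rejections at one rejection limit.
risks = [0.9, 0.6, 0.4, 0.1]
rejected = [True, False, True, False]
print(roc_points(risks, rejected))
```

Lowering the threshold moves along the curve towards the upper right corner: more true rejections are found at the cost of more false alarms.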

In Figure 21, the ROC curves for Ovako data at 40 J and 70 J rejection limits have been visualized. Only the interesting parts of the plots are shown so that the model differences become more visible. It can be seen that the model differences disappear nearly completely with the higher rejection limit in the figure on the right. Similar behaviour in model performance was visible in Figure 18 as well. Curves for models 2 and 3 are identical.

[Figure 21: two panels titled "ROC-curve"; x-axis: False positive rate; y-axis: True positive rate; legend: Gaussian model, Gaussian model for mean and deviation, Model for distribution shape, Quantile model.]

Fig 21. ROC analysis in a test set for Ovako data with 40 J (left) and 70 J (right) rejection limits.

The ROC curves for Ruukki data in Figure 22 show that model 1 is definitely not adequate for a more diverse data set. Otherwise, the model differences disappear similarly to Ovako data when the rejection limit is raised. The area under the ROC curve (AUC) values have been calculated for different models in original Papers II and III, but because it is not advisable to use them in model selection, they have been omitted


here. According to Dorling et al. (2003), the costs related to the false positive and false negative decisions have an impact on the rejection probability level selection. If they were available, they should be utilized in decision making. The role of these costs is not particularly important in product design, but they may have an impact on production variable optimization.

[Figure 22: two panels titled "ROC-curve"; x-axis: False positive rate; y-axis: True positive rate; legend: Gaussian model, Gaussian model for mean and deviation, Model for distribution shape, Quantile model.]

Fig 22. ROC analysis in a test set for Ruukki data with 40 J (left) and 70 J (right) rejection limits.

It is advisable to check whether there are differences in how well the probability predictors are calibrated. Calibration can be inspected visually in homogeneous groups of the test data set by plotting the observed proportions of disqualifications against the predicted ones. Each group should have a homogeneous predicted distribution, and small groups should be omitted from the plot. Here, the groups have been formed with k-means clustering, and two different disqualification areas have been selected, because all of them cannot be visualized simultaneously.
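The group-wise calibration check can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the cluster labels are assumed to have been computed beforehand (for example with k-means on the input variables), and the function name `calibration_by_group`, the `min_size` parameter and the example values are hypothetical.

```python
from collections import defaultdict


def calibration_by_group(groups, risks, rejected, min_size=3):
    """Mean predicted risk and observed rejection share per group.

    Groups with fewer than min_size members are left out, because the
    observed proportion in a small group is too noisy to be informative.
    """
    members = defaultdict(list)
    for g, p, r in zip(groups, risks, rejected):
        members[g].append((p, r))
    table = {}
    for g, rows in members.items():
        if len(rows) < min_size:
            continue
        predicted = sum(p for p, _ in rows) / len(rows)
        observed = sum(1 for _, r in rows if r) / len(rows)
        table[g] = (predicted, observed)
    return table


# Hypothetical cluster labels, predicted risks and outcomes (not thesis data).
groups = ["a", "a", "a", "a", "b", "b", "b", "b", "c"]
risks = [0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.5, 0.5, 0.9]
rejected = [False, False, False, False, True, False, True, False, True]
table = calibration_by_group(groups, risks, rejected)
```

Plotting the observed share against the mean predicted risk for each remaining group gives a calibration plot of the kind described above. Points close to the diagonal indicate a well-calibrated predictor.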

The proportions of the true disqualifications and the predicted risks for Ovako data are illustrated in Figures 23 and 24. Model 1 is in the upper left corner, model 2 in the upper right corner, model 3 in the lower left corner and model 4 in the lower right corner. In general, the models are well calibrated, although model 1 seems to perform slightly worse than the others.


Fig 23. The proportion of the disqualifications and the predicted risk at 40 J for Ovako data. Panels (a) to (d) show models 1 to 4. Each panel, titled "Calibration in clusters defined by input variables", plots the measured proportion of disqualifications against the predicted risk (both axes from 0.00 to 0.20).


Fig 24. The proportion of the disqualifications and the predicted risk at 70 J for Ovako data. Panels (a) to (d) show models 1 to 4. Each panel, titled "Calibration in clusters defined by input variables", plots the measured proportion of disqualifications against the predicted risk (both axes from 0.00 to 0.30).

The calibration of the models for Ruukki data seems to be good as well (Figures 25 and 26). The size of the data set allows the use of larger groups in k-means clustering, which improves the reliability of the results.


Fig 25. The proportion of the disqualifications and the predicted risk at 40 J for Ruukki data. Panels (a) to (d) show models 1 to 4. Each panel, titled "Calibration in clusters defined by input variables", plots the measured proportion of disqualifications against the predicted risk (both axes from 0.0 to 0.4).


Fig 26. The proportion of the disqualifications and the predicted risk at 70 J for Ruukki data. Panels (a) to (d) show models 1 to 4. Each panel, titled "Calibration in clusters defined by input variables", plots the measured proportion of disqualifications against the predicted risk (both axes from 0.0 to 0.4).

The original Paper V presents the model selection with the different ranking methods introduced in Section 3.3. The EPS method has proven to be the most reliable one, and according to it, the deviation model improves the exceedance probability prediction significantly for both data sets compared to the reference method. For Ovako data, the benefits of distribution modelling are marginal, and the quantile model does not manage to outperform the other two methods, although it is better than the reference method. This may be caused by the smaller size of the data. For Ruukki data, the EPS finds no difference between the distribution shape and the quantile models, but it is clear that they both outperform the deviation model.

The results of the research show that exceedance probability prediction should not be based on mean modelling only. Depending on the application area, deviation modelling may be adequate, but if the data set comes from a more complex area, the effort of distribution modelling will pay off.


6 Summary and conclusions

The prediction of a quality property that consists of several measurements can be a complex task that may also involve methods for modelling the distribution, especially if the diversity of the products is high. In quality verification, several test specimens are needed when the measured property has high variability within the product. In this thesis, methods for several test measurements were applied to the impact toughness prediction of different steel products. The property was measured with a Charpy-V test, which is a widely used, cost-effective material testing procedure.

The exceedance probability of requirements in the CVT was selected for quality prediction. Because the interest in production is in an adequately high CVT value instead of an optimal one, the selected approach is more natural. The risk of failure is a suitable concept in product planning, although for an inexperienced operator, the idea of probability might be unfamiliar at first. The exceedance probability can easily be combined with the manufacturing and rejection costs to support decision making.
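One simple way to combine a predicted rejection probability with costs is an expected cost per production route. The sketch below is only an illustration under assumptions: the cost model (a sure manufacturing cost plus a risk-weighted rejection cost), the function name `expected_cost`, the route names and all numbers are hypothetical and do not come from the thesis.

```python
def expected_cost(manufacturing_cost, rejection_cost, rejection_probability):
    """Sure manufacturing cost plus the risk-weighted cost of a rejection."""
    return manufacturing_cost + rejection_probability * rejection_cost


# Hypothetical routes: (manufacturing cost, rejection cost, predicted risk).
routes = {
    "lean": (100.0, 400.0, 0.20),
    "conservative": (130.0, 400.0, 0.02),
}
costs = {name: expected_cost(*args) for name, args in routes.items()}
cheapest = min(costs, key=costs.get)
```

With these assumed figures, the route with the higher manufacturing cost has the lower expected cost, because its predicted rejection risk is much smaller.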

When there is more than one measurement for the response variable, a transformation into one variable may be required, depending on the modelling method. If the average value of the CVT measurements were the property that the operator utilized in product planning, the transformation of the measurements would be the mean. In this thesis, it was shown that in that case the information about the scatter of the measurements would be lost, and the possible rejection of the product could escape one's attention. It is characteristic of this quality property that the values of the measurements can vary greatly within the product, and thus, the lowest measurement is as important as the mean. As a result, the LIB transformation of the measurements was suggested. It effectively recognizes test sets consisting of one very low measurement together with other adequately high measurements.

The assumption was that it is advisable to apply density forecasts instead of point forecasts for the prediction, and thus, to include the distributional properties in the prediction. In this thesis, it is shown that Charpy-V test exceedance probability modelling can benefit from the use of an additional variance model together with a mean model in rejection probability prediction. Distribution shape modelling improves the model further when the response variable is not Gaussian.
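The benefit of a variance model alongside a mean model can be illustrated with a Gaussian response. The following minimal sketch assumes a single response value Y that is normally distributed with a predicted mean and deviation, so that the rejection probability at a limit L is Φ((L − mean) / deviation). The function names and the numbers are hypothetical, and the sketch ignores how the thesis transforms a test set of several measurements into one response.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rejection_probability(limit, mean, deviation):
    """P(Y < limit) when Y is Gaussian with the given mean and deviation."""
    return normal_cdf((limit - mean) / deviation)


# Two hypothetical products with the same predicted mean (80 J)
# but different predicted deviations, judged against a 40 J limit.
p_tight = rejection_probability(40.0, 80.0, 15.0)
p_wide = rejection_probability(40.0, 80.0, 40.0)
```

A point forecast of the mean alone would treat the two products as equal, whereas the predicted deviation separates their rejection risks clearly.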


The proposed prediction methods are fundamentally different, and thus, the selection of model ranking criteria has been considered as well. The commonly used LogS ranking method proved to be less reliable than the others. It is not suitable here, because it may favour models that have the same response variable y that is used for score calculation. It is also insensitive to distance, and thus, cannot reward probabilities predicted low near the region with high probabilities. Score functions that are based on the rejection limits instead of the response value itself are more suitable for model comparison when the models have different response variables. CRPS and EPS were compared, and EPS was selected as the most advisable method for these applications. Compared to CRPS, it penalizes more heavily a model that produces very unlikely probabilities.
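To make the difference between two of these scores concrete, the sketch below evaluates the logarithmic score and the CRPS for a Gaussian predictive distribution. It is an illustration only: the function names are hypothetical, the CRPS uses the well-known closed form for a normal distribution, and EPS is not shown, because its definition is given in Section 3.3 and Juutilainen et al. (2012) rather than here.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def log_score(y, mean, deviation):
    """Negative log predictive density at the observation (smaller is better)."""
    z = (y - mean) / deviation
    return -math.log(normal_pdf(z) / deviation)


def crps_gaussian(y, mean, deviation):
    """Closed-form CRPS of a Gaussian forecast (smaller is better)."""
    z = (y - mean) / deviation
    return deviation * (
        z * (2.0 * normal_cdf(z) - 1.0)
        + 2.0 * normal_pdf(z)
        - 1.0 / math.sqrt(math.pi)
    )


# Hypothetical observation of 60 J scored against two forecasts centred at 80 J.
scores = {
    "sharp": (log_score(60.0, 80.0, 10.0), crps_gaussian(60.0, 80.0, 10.0)),
    "wide": (log_score(60.0, 80.0, 30.0), crps_gaussian(60.0, 80.0, 30.0)),
}
```

The logarithmic score depends only on the predictive density at the observed value, whereas the CRPS integrates the squared difference between the predictive and the empirical distribution functions over the whole range.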

The data analysis and inference group (DataAI) at the University of Oulu has a long history of studying the modelling of mechanical properties of steel plates, and models of tensile strength, yield strength, and elongation have been developed earlier. Because of the complicated nature of impact toughness, the first task in this thesis was to find out if the property could be modelled to a similar extent at all. A further motivation was to include the model in a simulation tool that the product design department uses for evaluating the mechanical properties of a product and the exceedance probability for the requirements.

In this thesis, it has been shown that it is possible to construct a product design model for a whole product range, including all possible test temperatures. The model was based on a risk probability prediction in a quality test in which the test set consists of several test samples. The methods were applied to two different test cases in the steel industry.

The scale of the product assortment of the production line, as well as the steel type, determines the complexity needed in the modelling, and the differences in the results of the two test cases were significant. The test case with a wider range of products benefited more from the deviation and distribution modelling, because the assumption of Gaussian properties involved in mean modelling was more likely to be violated.

In practice, the customer sets the rejection limit for each order, and the rejection data available for the research were not useful for testing the performance of the models. The real rejection limits are often quite low, and because an optimal value for impact toughness does not exist, a large range of exceedance limits was selected instead. Here, the highest values in the right tail of the distribution were excluded, because they were not meaningful for this application, and the focus was on the interesting part of the distribution.


The prediction results of the disqualification probability of the Charpy-V quality test were improved with deviation modelling, and for Ruukki data, the distribution shape model and the quantile model improved the results further. Compared to the other methods, quantile regression is very time consuming, because every quantile needs to be modelled independently. On the other hand, the quantile model is appealing, because no assumption about the distribution needs to be made. However, this laborious and cumbersome method can be avoided if the modeller is willing to try several different distributions for GAMLSS modelling. This way, non-Gaussian data can be modelled with confidence.
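The reason why each quantile requires its own model can be seen from the check (pinball) loss that quantile regression minimizes, which is different for every quantile level. The sketch below is a minimal illustration without covariates and is not the thesis implementation: the function names and the nine impact energy values are hypothetical, and the "model" is just the single observed value that minimizes the loss.

```python
def pinball_loss(y, q, tau):
    """Check loss of the quantile prediction q at level tau for observation y."""
    diff = y - q
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def mean_pinball(values, q, tau):
    """Average check loss of a constant prediction q over all observations."""
    return sum(pinball_loss(y, q, tau) for y in values) / len(values)


def best_constant_quantile(values, tau):
    """Observed value that minimizes the mean check loss at level tau."""
    return min(sorted(set(values)), key=lambda q: mean_pinball(values, q, tau))


# Hypothetical impact energies in joules (not thesis data).
energies = [22.0, 35.0, 48.0, 51.0, 60.0, 64.0, 75.0, 90.0, 110.0]
quantiles = {
    tau: best_constant_quantile(energies, tau) for tau in (0.25, 0.5, 0.75)
}
```

Every level `tau` defines a separate minimization problem. With covariates, this becomes one regression fit per quantile, which explains the computational burden, while no distributional form has to be assumed.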

The diversity and the larger size of the Ruukki data caused differences between the models as well. The transition area of the typical product in Ovako data is more gradual compared to Ruukki data because of the higher carbon content, and this leads to a nearly Gaussian impact toughness distribution. For certain products in Ruukki data, the transition area can be so steep that it leads to a bimodal impact toughness distribution, where single measurements can be located on either the upper or the lower shelf of the transition curve, and thus, the distributional modelling improved the results. Furthermore, the performance of quantile regression for Ovako data may have suffered from the substantially smaller size of the data. The number of observations near the tails may have been too low for accurate quantile estimation, and therefore, better results were achieved with the deviation and distribution models. To summarize, in situations comparable to Ruukki data, deviation modelling is advisable, and the distribution shape model is worth trying as well.

This research showed the importance of taking into account the distributional properties of the modelled quality property. In the future, it would be interesting to try these methods in other application areas as well. There may be many challenging prediction tasks that do not necessarily relate to quality testing but still have similar difficulties in modelling. It would be desirable to gain experience with other types of distributions as well.


References

Aitkin M (1987) Modelling variance heterogeneity in normal regression using glim. Applied Statistics 36(3): 332–339.

Bhadeshia H (1999) Neural networks in materials science. ISIJ International 39(10): 966–979.

Bhadeshia H, MacKay D & Svensson LE (1995) The impact toughness of C-Mn steel arc-welds - a Bayesian neural network analysis. Materials Science and Technology 11: 1046–1051.

Bhattacharjee D, Davis C & Knott J (2003) Predictability of charpy impact toughness in thermomechanically control rolled (TMCR) microalloyed steels. Ironmaking and Steelmaking 30(3): 249–255.

Bhattacharjee D, Knott J & Davis C (2004) Charpy-impact-toughness prediction using an "effective" grain size for thermomechanically controlled rolled microalloyed steels. Metallurgical and Materials Transactions A 35A: 121–130.

Boyd J & White H (1994) Estimating data dispersion using neural networks. Proc. IEEE World Congress of Artificial Intelligence, Orlando, 2175–2178.

Cade B & Noon R (2003) A gentle introduction to quantile regression for ecologists. Frontiers in Ecology and the Environment 1(8): 412–420.

Cannon A (2011) Quantile regression neural networks: Implementation in R and application to precipitation downscaling. Computers & Geosciences 37(9): 1277–1284.

Cao L, Wu S & Flewitt P (2012) Comparison of ductile-to-brittle transition curve fitting approaches. International Journal of Pressure Vessels and Piping (93-94): 12–16.

Carroll R & Ruppert D (1988) Transformation and Weighting in Regression. Chapman and Hall, USA.

Cawley G, Janacek G, Haylock M & Dorling S (2007) Predictive uncertainty in environmental modelling. Neural Networks 20: 537–549.

Chen C & Wei Y (2005) Computational issues for quantile regression. The Indian Journal of Statistics 67, Part 2: 399–417.

Chen MY & Chen JE (2005) Application of quantile regression to estimation of value at risk. Review of Financial Risk Management 1: 1–15.

Choudhary A, Harding J & Tiwari M (2009) Data mining in manufacturing: A review based on the kind of knowledge. Journal of Intelligent Manufacturing 20(5): 501–521.

Cook N (2007) Use and misuse of the receiver operating characteristic curve in risk prediction. 115: 928–935.

Diebold F & Mariano R (1995) Comparing predictive accuracy. Journal of Business and Economic Statistics 13: 253–263.

Diks C, Panchenko V & van Dijk D (2011) Likelihood-based scoring rules for comparing density forecasts in tails. Journal of Econometrics 163(2): 215–230.

Dorling S, Foxall R, Mandic D & Cawley G (2003) Maximum likelihood cost functions for neural network models of air quality data. Atmospheric Environment 37: 3435–3443.

Elovici Y & Braha D (2003) A decision-theoretic approach to data mining. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans 33(1): 42–51.

Elsilä U & Röning J (2004) Defect prediction in hot strip rolling. Ironmaking and Steelmaking 31(3): 241–248.


Elsilä U & Röning J (2006) Analysis of wedge formation in hot strip rolling after continuous casting. Modeling of casting, welding, and advanced solidification processes. Eds. C-A. Candin & M. Bellet. USA. The minerals, metals & materials society 2006: 767–774.

Engel J (1992) Modelling variation in industrial experiments. Applied Statistics 41(3): 579–593.

Fernandez C & Steel M (1998) On Bayesian modelling of fat tails and skewness. Journal of The American Statistical Association 93(441): 359–371.

Gneiting T & Raftery A (2007) Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102(477): 359–378.

Golodnikov A, Macheret Y, Trindade A, Uryasev S & Zrazhevsky G (2005) Statistical modelling of composition and processing parameters for alloy development. Modelling and Simulation in Materials Science and Engineering 13(4): 633–644.

Haapamäki J & Röning J (2005) Generic algorithms in hot steel rolling for scale defect prediction. Proc. The Third World Enformatica Conference (WEC05), Istanbul, Turkey, 5: 1–4.

Hall P, Wolff R & Yao Q (1999) Methods for estimating a conditional distribution function. Journal of the American Statistical Association 94: 154–163.

Hand D, Mannila H & Smyth P (2001) Principles of Data Mining. The MIT Press, Cambridge, MA, USA.

Harding J, Shahbaz M, Srinivas & Kusiak A (2006) Data mining in manufacturing: A review. Journal of Manufacturing Science and Engineering 128: 969–976.

Harvey A (1976) Regression models with multiplicative heteroscedasticity. Econometrica 44(3): 461–465.

Hastie T & Tibshirani R (1986) Generalized additive models. Statistical Science 1(3): 297–318.

Hastie T, Tibshirani R & Friedman J (2001) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York.

Haušild P (2002) The influence of ductile tearing on fracture energy in the ductile-to-brittle transition temperature range. Materials Science and Engineering A335: 164–174.

He S, Wang G & Li L (2009) Quality improvement using data mining in manufacturing processes. In: Ponce J & Karahoca A (eds) Data Mining and Knowledge Discovery in Real Life Applications, 357–372. I-Tech, Austria.

Hyndman R & Fan S (2010) Density forecasting for long-term peak electricity demand. IEEE Transactions on Power Systems 25(2): 1142–1153.

Juutilainen I (2006) Modelling of conditional variance and uncertainty using industrial process data. Ph.D. thesis, University of Oulu, Finland.

Juutilainen I & Röning J (2004) Modelling the probability of rejection in a qualification test based on process data. Proc. 16th Symposium of IASC (COMPSTAT 2004), Prague, Czech Republic, 1271–1278.

Juutilainen I & Röning J (2006) Planning of strength margins using joint modelling of mean and dispersion. Materials and Manufacturing Processes 21(4): 367–373.

Juutilainen I, Tamminen S & Röning J (2012) Exceedance probability score: A novel measure for comparing probabilistic predictions. Journal of Statistical Theory and Practice 6(3): 452–467.

Juutilainen I, Tamminen S & Röning J (2013) Visualizing predicted and observed densities jointly with beanplot. Communications in Statistics - Theory and Methods (In press).

Juutilainen I, Tamminen S & Röning J (2014) Density forecast based failing probability predictors in manufacturing. European Journal of Industrial Engineering (In press).

Juutilainen I, Tuovinen L, Laurinen P, Koskimäki H & Röning J (2011) Semi-automatic maintenance of regression models: an application in the steel industry. Journal of Computing and Information Technology 19(3): 71–82.

Kampstra P (2008) Beanplot: A boxplot alternative for visual comparison of distributions. Journal of Statistical Software, Code Snippets 28(1): 1–9.

Knott J (2008) Quantifying the quality of steel. Ironmaking and Steelmaking 35(4): 264–282.

Koenker R (2005) Quantile Regression. Cambridge University Press, USA.

Koenker R & Hallock K (2001) Quantile regression. Journal of Economic Perspectives 15(4): 143–156.

Köksal G, Batmaz I & Testik C (2011) A review of data mining applications for quality improvement in manufacturing industry. Expert Systems with Applications 38: 13448–13467.

Koskimäki H, Laurinen P, Haapalainen E, Tuovinen L & Röning J (2007) Application of the extended knn method to resistance spot welding process identification and the benefits of process information. IEEE Transactions on Industrial Electronics 54(5): 2823–2830.

Kurban V, Yatsenko N & Belyakova V (2007) Feasibility of using neural networks for real-time prediction of the mechanical properties of finished rolled products. Metallurgist 51(1): 24–26.

Laio F & Tamea S (2007) Verification tools for probabilistic forecasts of continuous hydrological variables. Hydrology and Earth System Sciences 11(4): 1267–1277.

Laurinen P & Röning J (2005) An adaptive neural network model for predicting the post roughing mill temperature of steel slabs in the reheating furnace. Journal of Materials Processing Technology 168(3): 423–430.

Laurinen P, Tuovinen L & Röning J (2005) Smart archive: A component-based data mining application framework. Proc. 5th International Conference on Intelligent Systems Design and Applications, Wroclaw, Poland, 20–26.

Lindroos, Sulonen & Veistinen (1986) Uudistettu Miekk-ojan metallioppi. Otava, Finland. In Finnish.

Little M & McSharry P (2009) Generalized linear models for site-specific density forecasting of U.K. daily rainfall. Monthly Weather Review 137: 1029–1045.

Lobo J, Jiménez-Valverde A & Real R (2008) AUC: A misleading measure of the performance of predictive distribution models. Global Ecology and Biogeography 17: 145–151.

Malinov S, Sha W & McKeown J (2001) Modelling the correlation between processing parameters and properties in titanium alloys using artificial neural network. Computational Materials Science 21: 375–394.

Maloof M (2003) Learning when data sets are imbalanced and when costs are unequal and unknown. Proc. 20th International Conference on Machine Learning (ICML 2003), Workshop on Learning from Imbalanced Data Sets II, Washington, DC, USA.

Morris J, Guo Z, Krenn C & Kim YH (2001) The limits of strength and toughness in steel. ISIJ International 41(6): 599–611.

Moskovic R & Flewitt P (1997) An overview of the principles of modeling charpy impact energy data using statistical analyses. Metallurgical and Materials Transactions A 28A: 2609–2623.

Panagiotelis A & Smith M (2008) Bayesian density forecasting of intraday electricity prices using multivariate skew t distributions. International Journal of Forecasting 24: 710–727.

Peracchi F (2002) On estimating conditional quantiles and distribution functions. Computational Statistics & Data Analysis 38: 433–447.

Peterson A, Papes M & Saberón J (2008) Rethinking receiver operating characteristic analysis applications in ecological niche modeling. Ecological Modelling 213: 63–72.

Prawoto Y (2009) Designing steel microstructure based on fracture mechanics approach. Materials Science and Engineering A 507: 74–86.


Provost F & Fawcett T (2001) Robust classification for imprecise environments. Machine Learning 42: 203–231.

Rees G, Bakhshi S, Surujlal-Harry A, Stasinopoulos M & Baker A (2010) A computerised tailored intervention for increasing intakes of fruit, vegetables, brown bread and wholegrain cereals in adolescent girls. Public Health Nutrition 13(8): 1271–1278.

Saerens M (2000) Building cost functions minimizing to some summary statistics. IEEE Transactions on Neural Networks 11(6): 1263–1271.

Scandroglio G, Gori A, Vaccaro E & Voudouris V (2013) Estimating VaR and ES of the spot price of oil using feture-varying centiles. International Journal of Financial Engineering and Risk Management 1(1): 6–19.

Serinaldi F & Kilsby C (2012) A modular class of multisite monthly rainfall generators for water resource management and impact studies. Journal of Hydrology 464-465: 528–540.

Smyth G, Huele A & Verbyla A (2001) Exact and approximate REML for heteroscedastic regression. Statistical Modelling 1(3): 161–175.

Sokolova M, Japkowicz N & Szpakowicz S (2006) Beyond accuracy, F-score and ROC: A family of discriminant measures for performance evaluation. Proc. Australian Conference on Artificial Intelligence, Lecture Notes in Computer Science, 4304: 1015–1021.

Stasinopoulos D & Rigby R (2007) Generalized additive models for location scale and shape (gamlss) in r. Journal of Statistical Software 23: 1–46.

Takeuchi I, Le Q, Sears T & Smola A (2006) Nonparametric quantile estimation. Journal of Machine Learning Research 7: 1231–1264.

Takeuchi I, Yamanaka N & Furuhashi T (2003) Robust regression under asymmetric or/and non-constant variance error by simultaneously training conditional quantiles. Proc. International Joint Conference on Neural Networks (IJCNN 2003), Portland, Oregon, 4: 1729–1734.

Tay A & Wallis K (2000) Density forecasting: A survey. Journal of Forecasting 19(4): 235–254.

Taylor J (2000) A quantile regression neural network approach to estimating the conditional density of multiperiod returns. Journal of Forecasting 19: 299–311.

Tiensuu H, Juutilainen I & Röning J (2011) Modeling the temperature of hot rolled steel plate with semisupervised learning methods. Proc. 14th International Conference on Discovery Science (DS 2011), Lecture Notes in Computer Science, Springer, 6926: 351–364.

Todinov M (2004) Uncertainty and risk associated with the Charpy impact energy of multi-run welds. Nuclear Engineering and Design 231: 27–38.

Tong L & Su C (1997) The optimization of multi-response problems in the Taguchi method. International Journal of Quality & Reliability Management 14(4): 367–380.

Tóth L, Rossmanith HP & Siewert T (2002) Historical background and development of the charpy test. In: François D & Pineau A (eds) From Charpy to Present Impact Testing. Elsevier Science Ltd., UK.

Verbyla A (1993) Modelling variance heterogeneity: Residual maximum likelihood and diagnostics. Journal of the Royal Statistical Society: Series B 55(2): 493–508.

Wallin K, Nevasmaa P, Planman T & Valo M (2002) Evaluation of the charpy-v test from a quality control test to a materials evaluation tool for structural integrity assessment. In: François D & Pineau A (eds) From Charpy to Present Impact Testing. Elsevier Science Ltd., UK.

Wang K (2007) Applying data mining to manufacturing: The nature and implications. Journal of Intelligent Manufacturing 18(4): 487–495.

Windle P, Crowder M & Moskovic R (1996) A statistical model for the analysis and prediction of the effect of neutron irradiation on charpy impact energy curves. Nuclear Engineering and Design 165: 43–56.

World Steel Association (2012) Sustainable Steel: At the Core of a Green Economy. Belgium. URL: http://www.worldsteel.org/dms/internetDocumentList/bookshop/Sustainable-steel-at-the-core-of-a-green-economy/document/Sustainable-steel-at-the-core-of-a-green-economy.pdf. Cited 2012/12/19.

Yu K & Jones M (1998) Local linear quantile regression. Journal of the American Statistical Association 93(441): 228–237.

Yu K, Lu Z & Stander J (2003) Quantile regression: Applications and current research areas. The Statistician 52(3): 331–350.


Original publications

I Tamminen S & Juutilainen I & Röning J (2008) Product Design Model for Impact Toughness Estimation in Steel Plate Manufacturing. Proc International Joint Conference on Neural Networks, Hong Kong: 990–993.

II Tamminen S & Juutilainen I & Röning J (2010) Modelling of Charpy V Test Rejection Probability. Ironmaking & Steelmaking 37(1): 35–40.

III Tamminen S & Juutilainen I & Röning J (2010) Quantile Regression Model for Impact Toughness Estimation. Proc Industrial Conference on Data Mining, Berlin, Germany: 263–276.

IV Juutilainen I & Tamminen S & Röning J (2012) A tutorial to Developing Statistical Models for Predicting Disqualification Probability. In: Davim JP (ed) Computational Methods for Optimizing Manufacturing Technology: Models and Techniques. IGI Global: 368–399.

V Tamminen S & Juutilainen I & Röning J (2013) Exceedance Probability Estimation for a Quality Test Consisting of Multiple Measurements. Expert Systems with Applications 40(11): 4577–4584.

Reprinted with permission from IEEE (I), Maney Publishing (II), Springer (III), IGI Global (IV) and Elsevier (V).

Original publications are not included in the electronic version of the dissertation.


ACTA UNIVERSITATIS OULUENSIS

Book orders: Granum: Virtual book store http://granum.uta.fi/granum/

SERIES C TECHNICA

481. Peschl, Michael (2014) An architecture for flexible manufacturing systems based on task-driven agents

482. Kangas, Jani (2014) Separation process modelling : highlighting the predictive capabilities of the models and the robustness of the solving strategies

483. Kemppainen, Kalle (2014) Towards simplified deinking systems : a study of the effects of ageing, pre-wetting and alternative pulping strategy on ink behaviour in pulping

484. Mäklin, Jani (2014) Electrical and thermal applications of carbon nanotube films

485. Niemistö, Johanna (2014) Towards sustainable and efficient biofuels production : use of pervaporation in product recovery and purification

486. Liu, Meirong (2014) Efficient super-peer-based coordinated service provision

487. Väyrynen, Eero (2014) Emotion recognition from speech using prosodic features

488. Celentano, Ulrico (2014) Dependable cognitive wireless networking : modelling and design

489. Peräntie, Jani (2014) Electric-field-induced dielectric and caloric effects in relaxor ferroelectrics

490. Aapaoja, Aki (2014) Enhancing value creation of construction projects through early stakeholder involvement and integration

491. Rossi, Pekka M. (2014) Integrated management of groundwater and dependent ecosystems in a Finnish esker

492. Sliz, Rafal (2014) Analysis of wetting and optical properties of materials developed for novel printed solar cells

493. Juntunen, Jouni (2014) Enhancing organizational ambidexterity of the Finnish Defence Forces’ supply chain management

494. Hänninen, Kai (2014) Rapid productisation process : managing an unexpected product increment

495. Mehtonen, Saara (2014) The behavior of stabilized high-chromium ferritic stainless steels in hot deformation

496. Majava, Jukka (2014) Product development : drivers, stakeholders, and customer representation during early development

497. Myllylä, Teemu (2014) Multimodal biomedical measurement methods to study brain functions simultaneously with functional magnetic resonance imaging
