Using Artificial Intelligence for the Evaluation of the Movability of Insurances

Martin Aslin

June 29, 2013

Master’s Thesis in Computing Science, 30 ECTS credits

Supervisor at CS-UmU: Patrik Eklund

Examiner: Fredrik Georgsson

Umeå University

Department of Computing Science

SE-901 87 UMEÅ

SWEDEN

Abstract

Today the decision to move an insurance from one company or bank to another is made manually, so there is always a risk that an incorrect decision is made due to human error. The goal of this thesis is to evaluate the possibility of using artificial intelligence (AI) to make that decision instead. The thesis evaluates three AI techniques: fuzzy clustering, Bayesian networks and neural networks. These three techniques were compared, and fuzzy clustering was chosen as the technique to use. Even though fuzzy clustering only achieved a hit rate of 69%, it has a lot of potential. Section 4.2 discusses a few improvements which should help raise the hit rate.

Contents

1 Background and motivation

1.1 Introduction

1.2 Insurances

1.2.1 What is an insurance

1.2.2 Structure

1.2.3 Types

1.2.4 Pitfalls

1.3 Problem description

1.3.1 Goal

1.3.2 Data

1.3.3 Rules

1.3.4 The structure of the system

2 Theory

2.1 Overview of ’intelligent computing’

2.1.1 The logical approach

2.1.2 The probabilistic approach

2.1.3 The numerical approach

2.1.4 Pros and cons

2.2 The chosen method

2.3 Algorithms

2.3.1 C-Means Fuzzy Clustering Algorithm

2.3.2 Rulebase algorithms

2.3.3 Inference algorithms

3 Implementations, results and validations

3.1 Development environment

3.1.1 .Net

3.1.2 Usefull embedded namespaces

3.1.3 Database management via XML export

3.2 Result

3.3 Impact of the parameters

3.3.1 The Fuzzyness Variable

3.3.2 The Number of Runs

3.3.3 The Cluster Interval

4 Conclusion

4.1 Restrictions and Limitations

4.2 Future work

4.2.1 Better selections

4.2.2 Small and focused

4.2.3 Reinforced learning

5 Acknowledgments

References

List of Figures

2.1 The process of fuzzy controls

2.2 Inference using Mamdani’s method

2.3 K-Means clustering example

2.4 Membership values for clusters

2.5 Projection of a cluster

2.6 Simple bayesian network

2.7 Bayesian networks - order of nodes

2.8 Calculating probability in a Bayesian network

2.9 A neuron

2.10 Neural network with one hidden level

2.11 C-Means clustering algorithm

2.12 Criterion number

2.13 Create rules

2.14 Calculate the input

2.15 Process of Mamdani’s method

2.16 Mamdani - Output membership function

2.17 Centre of Gravity

2.18 Larsen - Output membership function

2.19 Takagi-Sugeno - X matrix

2.20 Takagi-Sugeno - Y vector

2.21 Takagi-Sugeno - P vector

2.22 Takagi-Sugeno - β variable

2.23 Takagi-Sugeno - matrix computation

3.1 The impact of the fuzzyness variable

3.2 The impact of the number of runs variable

3.3 The impact of the interval limits

List of Tables

2.1 Randomized Membership Matrix

2.2 Adjusted Membership Matrix

3.1 Output intervals of the inference methods

3.2 Output from the system

Chapter 1

Background and motivation

1.1 Introduction¹

Today’s banks and insurance companies can, with the customer’s consent, take over personal insurances such as capital, service and private pension insurances. Which insurance is movable, i.e. possible to take over, or which part of the insurance is movable, is decided partly by legislation and partly by the content of the insurance. The size of one’s capital can be of great significance for an insurance company or bank when it decides whether to take over the insurance. The movability is also decided by rules that each company sets internally. This makes the movability of an insurance dependent on many parameters, which can change frequently. Today the majority of the evaluation of the movability of insurances is done manually, based on manually created sets of rules. There are three main types of assessment criteria: green (the insurance is movable), yellow (the insurance might be movable or partly movable) and red (the insurance is not movable). Sometimes there are many sub-criteria within each of the main criteria, where a more detailed assessment is described. A sub-criterion can, for example, describe how to proceed in a move matter by requesting a health certificate. Manual evaluation of insurances is a time-consuming process which could benefit from becoming fully or partly automated. The goal of this thesis project is to create a system that can classify insurance information.

1.2 Insurances

This section explains what an insurance is, how an insurance is structured and what to keep in mind when checking whether the information in the insurance is correct. Keep in mind that this description is based on Swedish insurances and might differ from insurances in other countries.

1.2.1 What is an insurance

An insurance is an agreed-upon financial protection in exchange for payment. You pay a bank or company a fee, and they will give you money in case something happens, like an accident, or when you retire. Some insurances will only give you back roughly the same amount of money as you have paid in; they are called savings insurances, and an example is the pension insurance. Another type, called risk insurances, will in many cases give you a lot more money than you have paid to the insurance company, but they are only valid as long as you pay the fee, and you will not get the money back if you stop paying. The risk insurance system works because of the number of people insured and because not everyone will be in, e.g., an accident and in need of insurance money. So the customers pay for each other; if all customers suddenly needed insurance money, the company would likely go bankrupt. The insurance companies have rules by which they evaluate the risk of something happening and adjust the required fee, or simply do not approve the insurance.

¹ Most of the information about insurances in chapter 1 has been found in [16], which is an internal document at Svenska Försäkringsfabriken.

For example, let’s say an 18-year-old man wants to insure a brand new super car in a major city. He will most likely have to pay a huge yearly fee, or the insurance company might not want to sign him because of the high risk of something happening to the car. The risks that the company looks at in this case might be:

– His young age and the fact that he is male, which statistically means that he is more likely to be in some kind of accident.

– A powerful sports car is easier to lose control of and is a more desirable target for thieves.

– He lives in a major city with a big population, which means more interaction with more people and therefore a higher chance that some kind of accident occurs.

1.2.2 Structure

An insurance is made up of two main parts. The first part contains overall information about the insurance, such as the insurance number, the person that is insured, the owner of the insurance, the insurance provider, when it was signed, the cost of the insurance, etc. The second part contains information about the content of the insurance. The content of the insurance is divided into moments. An insurance can include more than one moment, so e.g. an accident insurance will contain an accident moment, and a pension insurance can contain a pension moment and a health insurance moment. Whether more moments can be added to the insurance is up to the insurance provider.

1.2.3 Types

There are a lot of different insurances, but they can all be divided into the two main types that were mentioned in section 1.2.1, namely savings insurances and risk insurances. Savings insurances are paid to the insured after a certain date, e.g. pension savings. Risk insurances are paid after accidents, sickness etc. and are only valid as long as the insurance fee is paid, so if a person never is in an accident they will not be able to use the money they have paid for the insurance. The advantage of the risk insurance is that the money you get if you, e.g., are in an accident can be much higher than the amount you have paid in.

Here are some of the types of insurances:

Pension: this is an insurance that will let you receive money during your pension over a certain period of time and at a certain interval. It can be signed by a private person and/or a company.

Survivor’s protection: this moment exists so that the husband/wife, children or other heirs of the insured get the money if the insured person dies.

Health insurance: this insurance will provide the insured person with an income in case of sickness or early retirement due to ill health, though a qualifying period of sickness might be needed.

Sickness-/prematurely-capital: this insurance will grant you a one-time amount of money if you become sick or hurt enough that you are granted sickness benefit.

Medical treatment insurance: this insurance covers costs during medical treatment/attendance.

Accident insurance: usually contains three moments:

– Medical treatment costs: costs that occur with sickness/injury. This includes costs for treatment by a doctor/dentist but also necessary travel costs during the treatment.

– Disability capital: a one-time amount of money one receives if one is afflicted with a permanent disability or decreased capacity to work.

– Death capital: a sum that is paid to the husband/wife, children or heirs of the insured person in case of death.

Premium exemption: this is an add-on for insurances that makes the insurance provider take responsibility for the payment of the premium if the insured becomes so sick that the period of sickness is greater than the qualifying period of sickness.

1.2.4 Pitfalls

There are several pitfalls to look out for when reviewing an insurance, and they differ slightly between the different types of insurances. A general one that applies to most insurances is whether the insurance number, personal number, dates, sums, status etc. are correct. Another pitfall is an insurance that has a status of continuous and a Z-time that has passed; then something is wrong. Z-time marks the last day that the insurance is valid, but that is only the case for risk type insurances. A Z-time in a savings type insurance means the date when the provider should start paying, e.g., a person’s pension.
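The Z-time pitfall can be sketched as a simple consistency check. This is only an illustration: the field names (`status`, `insurance_type`, `z_time`) and the string values are hypothetical, and the check is assumed to apply only to risk type insurances, since for savings insurances the Z-time has a different meaning.

```python
from datetime import date

def z_time_inconsistent(status, insurance_type, z_time, today):
    """Return True when a risk insurance is marked continuous
    although its Z-time (last valid day) has already passed.

    All argument names and values are hypothetical; the real XML
    fields are not given in the text.
    """
    return (
        insurance_type == "risk"
        and status == "continuous"
        and z_time < today
    )

# A continuous risk insurance whose Z-time passed in 2012 is suspicious.
suspicious = z_time_inconsistent("continuous", "risk", date(2012, 1, 1), date(2013, 6, 29))
# For a savings insurance the Z-time is the payout start, so no warning.
savings_ok = z_time_inconsistent("continuous", "savings", date(2012, 1, 1), date(2013, 6, 29))
```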

1.3 Problem description

This section describes the goal of this thesis and covers some of the problems that have to be considered and/or solved during the work.

1.3.1 Goal

Evaluating the movability of insurances is a time-consuming process, mainly because it is done manually and because the rules are complex. The goal of this thesis is to make a system that can, based on manually pre-classified insurance evaluations, learn to classify insurance information. This could potentially save a lot of time while reducing the number of human errors. There are a few requirements on the system:

– The result of the evaluation should consist of a flag and a text.


– Use data that has been serialized as XML.

– The system should be as general as possible.

– The system should be implemented in C# and .NET 4.0

1.3.2 Data

The data this system will use for training is stored in XML files. The problem with the data is that the number of elements an insurance has differs between insurances. The possibility to add additional ”moments” to most of the insurances creates a lot of possible combinations and a lot of different elements that an insurance can contain. Some of these elements can of course be filtered out with a preprocessor, since they might not affect any decision, but that can, and probably will, still leave us with a variable number of remaining elements. It would be possible to create a large neural network (NN) that takes all possible elements and just sets the ones not used to NULL or 0. But that would make it harder to train the system, since it would have to learn a more complex model. It would probably be more efficient to divide the problem into a lot of smaller NNs that are each focused on a subset of the problem, for example by giving each type of insurance its own network. They would still be large and/or complex since they can have a lot of different moments, but perhaps it is possible to divide the problem even further and maybe combine NNs with other approaches such as Bayesian networks or a fuzzy clustering network.
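The idea of one large input that covers all possible elements, with unused ones set to 0, can be sketched as follows. The element names are made up for illustration; the actual XML elements are not listed in the text.

```python
# Hypothetical element names; the real XML elements are not given in the text.
ALL_ELEMENTS = ["pension_sum", "health_sum", "accident_sum", "premium_exemption"]

def to_fixed_vector(insurance, all_elements=ALL_ELEMENTS):
    """Map the elements present in one insurance (a dict) to a
    fixed-length list, using 0.0 for every element that is absent."""
    return [float(insurance.get(name, 0.0)) for name in all_elements]

# An insurance with only two of the four possible elements.
example = {"pension_sum": 60000, "premium_exemption": 1}
vector = to_fixed_vector(example)
```

The vector length grows with every possible element, which is the reason the text gives for preferring several smaller, more focused models.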

There is another problem to consider with the data: how the date and sum values should be represented in the system. The problem is that these values are continuous, meaning that they can range, for example, from 13 January 1988 to 11 February 2013 or 20-60.000 SEK.

1.3.3 Rules

There are of course a lot of rules regarding insurances. The rules can help us filter out insurances which are incorrectly filled in, and thus pointless to evaluate, and of course help us decide whether or not the insurance is movable. A problem with the rules is that they are complex and might be hard to implement, and that companies can have different rules. Another problem is that the rules can change a lot in the insurance industry, which makes it important to build a system that can adapt to new rules or a system where it is easy to change or add rules. It would be desirable if the system could detect any ambiguity in the rules before and after any change or addition of rules.

1.3.4 The structure of the system

If we decide to make one large structure/network that should handle every case, we will end up with a behemoth of a system that needs a lot of data samples to be trained well enough. Another problem would be the training time, which would be very long since the function it has to model would be very complex.

On the other hand, we could decide to use small expert systems, e.g. one structure/network for each type of moment. That could allow us to shorten the training time, since each would model a less complex function with a smaller amount of training data. During retraining it might not be necessary to retrain all of them, which can save time. But we would have to come up with a way to train the system, since insurances can have multiple moments but only one verdict. So if an insurance has a yellow flag, should we assume that both moments were yellow, or was one of them yellow and the other green? The same has to be considered after the moments have been evaluated: if one moment is green while the other is yellow, will that make the complete verdict yellow?


Chapter 2

Theory

2.1 Overview of ’intelligent computing’

This section briefly explains three types of approaches: logical, probabilistic and numerical. They are then compared against each other and the pros and cons of each approach are presented. At the end of this section there is a more thorough explanation of the chosen approach, together with some of its algorithms.

2.1.1 The logical approach¹

The method in the logical approach that will be studied is fuzzy clustering. Before explaining fuzzy clustering it is necessary to explain fuzzy sets, fuzzy logic and fuzzy control, which are all used in fuzzy clustering.

Fuzzy sets and fuzzy logic

Let’s say we have the proposition "it is cold outside". Is this true if we know that the temperature is 17 °C? Some people will probably find it cold while others might find it warmish. The problem with the proposition is the linguistic term cold, which does not correspond to a defined line where every value below that line is considered cold. Instead, the meaning of cold is subjective and changes from person to person. This is where fuzzy sets come into the picture: they allow us to represent a value such as cold. A fuzzy set divides the cold value into degrees of cold, usually a value between 0 and 1. So if we go back to the question "is it cold outside when it is 17 °C?", a fuzzy set might represent this as 0.65 true instead of definitely stating that it is true or false.
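As a minimal sketch, a fuzzy set for cold can be written as a membership function. The linear shape and the two limits (fully cold at 10 °C, not cold at all at 30 °C) are assumptions, chosen only so that 17 °C gets the degree 0.65 used in the example above.

```python
def cold_membership(temp_c, fully_cold=10.0, not_cold=30.0):
    """Degree (0..1) to which temp_c counts as 'cold'.

    Linear between the two limits; both limits are illustrative
    assumptions, not values taken from the text.
    """
    if temp_c <= fully_cold:
        return 1.0
    if temp_c >= not_cold:
        return 0.0
    return (not_cold - temp_c) / (not_cold - fully_cold)

degree_at_17 = cold_membership(17.0)  # (30 - 17) / 20
```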

When fuzzy sets are used in logical expressions it is called fuzzy logic. Fuzzy logic describes fuzzy truth values which are a function of the truth values of the components. The standard rules for evaluating these fuzzy truth values, T, of a complex sentence are:

– $T(A \wedge B) = \min(T(A), T(B))$

– $T(A \vee B) = \max(T(A), T(B))$

– $T(\neg A) = 1 - T(A)$

¹ The majority of the information for this basic explanation of fuzzy logic, sets, control and clustering has been found in [2] and [9].

So, for example, if we have $T(\mathit{Cold}(\mathit{outside})) = 0.65$ and $T(\mathit{Freezing}(\mathit{Martin})) = 0.55$, then we have $T(\mathit{Cold}(\mathit{outside}) \wedge \mathit{Freezing}(\mathit{Martin})) = \min(0.65, 0.55) = 0.55$, which is plausible.

With these rules the result might not change when a variable is modified, even when we would have wanted it to change. For example, if we have $T(A) = 0.75$ and $T(B) = 0.33$ then $\min(T(A), T(B)) = 0.33$, but if we change to $T(A) = 0.85$ we still get $\min(T(A), T(B)) = 0.33$, even though we might want a new value since one of the values changed. There are a few ways to improve this, but they are not covered here.
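The three standard rules and the two examples above can be checked with a few lines of code. The function names are chosen here for illustration only.

```python
def fuzzy_and(a, b):
    """Truth value of a conjunction: the minimum of the two truth values."""
    return min(a, b)

def fuzzy_or(a, b):
    """Truth value of a disjunction: the maximum of the two truth values."""
    return max(a, b)

def fuzzy_not(a):
    """Truth value of a negation: one minus the truth value."""
    return 1.0 - a

# Cold(outside) = 0.65 and Freezing(Martin) = 0.55
both = fuzzy_and(0.65, 0.55)

# Raising the larger operand from 0.75 to 0.85 does not change the minimum.
before = fuzzy_and(0.75, 0.33)
after = fuzzy_and(0.85, 0.33)
```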

Fuzzy control

Fuzzy control uses rules for making decisions. A rule, $R$, is expressed as $R$: IF <fuzzy criteria> THEN <fuzzy conclusion>. Fuzzy control has a number of rules, and these rules are stored in what is called a rulebase. There are a few ways to create a rulebase:

– Have an expert write the rules.

– Observe and record the input and output data while an expert performs the actions for a period of time.

– Generate rules based on data.

In this project the last one is of most interest, since we want to minimize the number of human decisions.

Figure 2.1: The process of fuzzy controls, [2] page 58.

Figure 2.1 describes the process of fuzzy control. First the input, x, gets fuzzified, which means that x is transformed into its corresponding truth value. Then the fuzzified x gets combined by a logical conjunction, which is then combined with the output membership function of the rule. The newly created membership function is then calculated before being defuzzified.

Inference is used to create the conjunctions. In figure 2.2 we can see the Mamdani inference method, which in this case uses the minimum operation and then combines the output results by using the maximum operation. In figure 2.2 the result given by the maximum operation is the grey field in U1, since that result has a higher value than U2.
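The min/max steps described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the thesis: the triangular membership functions, the two example rules, their input degrees and the discrete output universe are all assumptions, and the centre of gravity is used as one common defuzzification step.

```python
def triangle(x, left, peak, right):
    """Triangular membership function with its maximum 1.0 at `peak`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def mamdani(rules, universe):
    """rules: list of (input_degrees, output_membership_function).

    Each rule fires with the minimum of its input degrees, its output
    membership function is clipped at that level (minimum), and the
    clipped sets are combined with the maximum. Returns the combined
    fuzzy set and its centre of gravity.
    """
    aggregated = []
    for u in universe:
        clipped = [min(min(degrees), mf(u)) for degrees, mf in rules]
        aggregated.append(max(clipped))
    total = sum(aggregated)
    centroid = sum(u * m for u, m in zip(universe, aggregated)) / total if total else None
    return aggregated, centroid

# Hypothetical example: two rules over an output universe 0.0 .. 10.0.
universe = [i / 10 for i in range(0, 101)]
rules = [
    ([0.7, 0.4], lambda u: triangle(u, 0.0, 2.5, 5.0)),   # fires with strength 0.4
    ([0.2, 0.9], lambda u: triangle(u, 5.0, 7.5, 10.0)),  # fires with strength 0.2
]
aggregated, centroid = mamdani(rules, universe)
```

Because the first rule fires more strongly, the defuzzified value lands closer to the peak of its output set than to the peak of the second one.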

Figure 2.2: Inference using Mamdani’s method, [2] page 59.

Fuzzy clustering

Figure 2.3: K-Means clustering example.²

In cluster analysis, or clustering, one strives to divide the data into different groups (clusters). The data points that are clustered together are more similar to each other than to the ones in the other clusters; figure 2.3 is an example of a set of data points that has been divided into three clusters. In fuzzy clustering, a number of data points are also divided into clusters, but now every data point belongs to each one of the clusters in different degrees, just like in fuzzy logic. The closer a point is to the center of a cluster, the more it belongs to that cluster. The degrees of belonging, or membership values, of a data point have to sum to 1.0. So a data point could have the membership values U0 = 0.17, U1 = 0.35 and U2 = 0.48, which sum to 1.0.

² http://en.wikipedia.org/w/index.php?title=File:KMeans-Gaussian-data.svg&page=1, 8 Oct 2012

Figure 2.4: Membership values for clusters, [2] page 68.

Looking at figure 2.4 we can see two clusters and a number of data points. The number above each point is the membership value of the point for that cluster. In the left square we can see the membership values for cluster 1, and as you can see even the points that are really close to cluster 2’s center, represented by the rightmost x, have a membership value for cluster 1. In fuzzy C-means clustering, which is the technique that will be used if fuzzy clustering is chosen, it is usually the distance from the center of the cluster that decides the membership value.
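A minimal sketch of how distance can decide the membership values is given below. It uses the membership formula commonly given for fuzzy C-means, which is assumed here rather than taken from this section; the parameter `m` (the fuzziness exponent) and the example coordinates are illustrative.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Membership of `point` in each cluster, from its distance to each center.

    Uses the commonly stated fuzzy C-means formula
    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)),
    so closer centers get higher membership and the values sum to 1.0.
    """
    distances = [math.dist(point, c) for c in centers]
    zeros = distances.count(0.0)
    if zeros:
        # The point lies exactly on one or more centers.
        return [1.0 / zeros if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]

# One-dimensional example: a point at 1.0 with centers at 0.0 and 4.0.
memberships = fcm_memberships((1.0,), [(0.0,), (4.0,)])
```

In this example the point is three times closer to the first center, and still keeps a small membership in the second cluster, as in figure 2.4.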

Figure 2.5: Projection of a cluster, [2] page 71.

Projections of the data points are created after the set of data points has been divided into clusters. Each cluster gets its own set of projections, as can be seen in figure 2.5. These projections are used to generate the rules of that cluster, and the rules of all clusters together form the rulebase. The creation of the rulebase marks the end of the training, and the rulebase can now be used to create output. To get this output we need something called inference, which translates any new input to output with the help of the rulebase.

2.1.2 The probabilistic approach³

This section goes through Bayesian networks, which are based on Bayes’ theorem. The theorem is explained first.

Bayes’ theorem

The product rule can be written in two forms, namely $P(A \wedge B) = P(B \mid A)P(A)$ and $P(A \wedge B) = P(A \mid B)P(B)$. By combining these two formulas we get $P(B \mid A)P(A) = P(A \mid B)P(B)$, and dividing by $P(A)$ turns this into Bayes’ theorem:

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$$

$P(B \mid A)$ is read as "what is the probability of B given A". With Bayes’ theorem it is possible to calculate an unknown probability from three known probabilities, which is useful in the common case where a few probabilities are known while the one that we need is not.

In [9] they give an example where a doctor knows $P(\mathit{symptoms} \mid \mathit{disease})$, the probability of symptoms given disease, but wants to know $P(\mathit{disease} \mid \mathit{symptoms})$. In the example the doctor knows that:

– P (s|m) = 0.7

– P (m) = 1/50000

– P (s) = 0.01

Here s means that the patient has a stiff neck and m that the patient has meningitis. By using Bayes’ theorem the doctor can calculate that the probability of a patient having meningitis when the patient has a stiff neck is:

$$P(m \mid s) = \frac{P(s \mid m)\,P(m)}{P(s)} = \frac{0.7 \cdot 1/50000}{0.01} = 0.0014$$
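The same calculation written as code, as a small check of the arithmetic. The function and variable names are chosen here for illustration.

```python
def bayes(p_a_given_b, p_b, p_a):
    """Bayes' theorem: P(B|A) = P(A|B) * P(B) / P(A)."""
    return p_a_given_b * p_b / p_a

# P(s|m) = 0.7, P(m) = 1/50000, P(s) = 0.01
p_meningitis_given_stiff_neck = bayes(0.7, 1 / 50000, 0.01)
```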

Bayesian networks

Bayesian networks are a common approach when a system might have to deal with uncertainty. Bayesian networks are based on Bayes’ theorem (section 2.1.2). A Bayesian network can be described as a probabilistic graphical model that can represent dependencies between variables, see figure 2.6.

Figure 2.6⁴ shows how a simple Bayesian network can look. It is a graphical model describing the relationships between the different nodes. Each of the nodes in the network has a probability table associated with it, and since the rain node is not dependent on any other node, only the unconditional probability of each of its states is necessary. When building a Bayesian network it is important to make a good model of the relationships, so that a node does not depend on variables that are unnecessary for that node. The order in which the nodes are introduced in the system can have a big impact on performance. If the nodes are introduced in a poor order, some nodes may get unnecessary dependencies, and sometimes this gives dependencies that are difficult to calculate. In [9] they give the following example:

³ The majority of the information for this basic explanation of Bayes’ theorem and Bayesian networks has been found in [9].

⁴ http://en.wikipedia.org/wiki/Bayesian_network, 20 Sep 2012


Figure 2.6: Simple bayesian network

Figure 2.7: Bayesian networks - order of nodes

The Bayesian networks in figure 2.7 both describe the same problem; the only difference is the order of the nodes. In network A, Alarm is dependent on Burglary and Earthquake, and Marycalls and Johncalls are dependent on Alarm. This means that if either a burglary is in progress or there is an earthquake, the alarm will go off, which will cause Mary, John or both to call the owner. In network B, Johncalls is dependent on Marycalls, and Alarm is dependent on both Marycalls and Johncalls. Burglary is dependent on Alarm, and Earthquake depends on both Burglary and Alarm. A few details from the example in the book are needed for this to make sense. In the example they state that Mary often listens to loud music, so she might not hear the alarm; if she is calling, there is a high probability that John will call as well. This makes Johncalls dependent on Marycalls. The book, [9], is quoted for the dependency between Burglary and Earthquake:

If the alarm is on, it is more likely that there has been an earthquake. (The alarm is an earthquake sensor of sorts.) But if we know that there has been a burglary, then that explains the alarm, and the probability of an earthquake would only be slightly above normal. Hence, we need both alarm and burglary as parents.


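$$
\begin{aligned}
P(R \mid G)_{TT} &= \frac{P(G,R)_{TT}}{P(G)_{T}}
= \frac{\sum_{S \in \{T,F\}} P(G=T, S, R=T)}{\sum_{S,R \in \{T,F\}} P(G=T, S, R)} \\[4pt]
&= \frac{P(G,S,R)_{TTT} + P(G,S,R)_{TFT}}{P(G,S,R)_{TTT} + P(G,S,R)_{TTF} + P(G,S,R)_{TFT} + P(G,S,R)_{TFF}} \\[4pt]
&= \frac{\big(P(G \mid S,R)P(S \mid R)P(R)\big)_{TTT} + \big(P(G \mid S,R)P(S \mid R)P(R)\big)_{TFT}}{\big(P(G \mid S,R)P(S \mid R)P(R)\big)_{TTT} + \big(P(G \mid S,R)P(S \mid R)P(R)\big)_{TTF} + \big(P(G \mid S,R)P(S \mid R)P(R)\big)_{TFT} + \big(P(G \mid S,R)P(S \mid R)P(R)\big)_{TFF}} \\[4pt]
&= \frac{(0.99 \cdot 0.01 \cdot 0.2) + (0.8 \cdot 0.99 \cdot 0.2)}{(0.99 \cdot 0.01 \cdot 0.2) + (0.9 \cdot 0.4 \cdot 0.8) + (0.8 \cdot 0.99 \cdot 0.2) + (0.0 \cdot 0.6 \cdot 0.8)} \\[4pt]
&= \frac{0.00198 + 0.1584}{0.00198 + 0.288 + 0.1584 + 0.0} \approx 35.77\%
\end{aligned}
$$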

Figure 2.8: Calculating probability in a Bayesian network.

As we can see, we get two additional dependencies when the Bayesian network is arranged like B instead of A. This could, as stated above, mean that the computations become harder.
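The difference between the two orderings can be made concrete by writing down each node's parents as described above and counting. The sketch assumes that every node is Boolean, so a node with k parents needs 2^k probabilities in its table; that assumption and the helper names are not from the text.

```python
# Parents of each node, as described for networks A and B in figure 2.7.
network_a = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "Johncalls": ["Alarm"],
    "Marycalls": ["Alarm"],
}
network_b = {
    "Marycalls": [],
    "Johncalls": ["Marycalls"],
    "Alarm": ["Marycalls", "Johncalls"],
    "Burglary": ["Alarm"],
    "Earthquake": ["Burglary", "Alarm"],
}

def count_links(parents):
    """Number of dependencies (arrows) in the network."""
    return sum(len(p) for p in parents.values())

def count_probabilities(parents):
    """Table entries needed if every node is Boolean (an assumption)."""
    return sum(2 ** len(p) for p in parents.values())

links_a, links_b = count_links(network_a), count_links(network_b)
probs_a, probs_b = count_probabilities(network_a), count_probabilities(network_b)
```

With these definitions network B has two more links than network A, and under the Boolean assumption it also needs more table entries.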

Let’s do an example⁵ that requires some calculations. Say that we have a Bayesian network that looks like figure 2.6 and we want to know: what is the probability that it is raining, given that the grass is wet? What we want to know is $P(R \mid G)$, where R means raining, G means grass wet, S means sprinkler turned on and T means true. In figure 2.8 we can see how it is solved by using Bayes’ theorem and other statistical formulas/rules.

With Bayes’ theorem and other statistical formulas/rules it is possible to describe the probability of most of the scenarios that can occur based on the probability functions, even if some parts are unknown. There is a lot more that can be said about Bayesian networks, but this is supposed to be a brief introduction. If this approach gets chosen, a more thorough explanation will be given in section 2.3.
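The calculation in figure 2.8 can be reproduced by summing the joint probability over the unknown variables. The probability tables below are read off from the numbers used in the figure; the variable and function names are chosen here for illustration.

```python
from itertools import product

P_RAIN = 0.2                                      # P(R = T)
P_SPRINKLER_GIVEN_RAIN = {True: 0.01, False: 0.4}  # P(S = T | R)
P_GRASS_GIVEN = {                                  # P(G = T | S, R), key = (S, R)
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}

def joint(grass, sprinkler, rain):
    """P(G, S, R) = P(G | S, R) * P(S | R) * P(R) for one assignment."""
    p_r = P_RAIN if rain else 1 - P_RAIN
    p_s = P_SPRINKLER_GIVEN_RAIN[rain]
    p_s = p_s if sprinkler else 1 - p_s
    p_g = P_GRASS_GIVEN[(sprinkler, rain)]
    p_g = p_g if grass else 1 - p_g
    return p_g * p_s * p_r

def p_rain_given_wet_grass():
    """P(R = T | G = T), summing out the sprinkler (and rain in the denominator)."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    denominator = sum(joint(True, s, r) for s, r in product((True, False), repeat=2))
    return numerator / denominator

result = p_rain_given_wet_grass()
```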

2.1.3 The numerical approach⁶

Some problems might be too hard for designers to solve on their own, since it can be hard (if not impossible) for a designer to predict all of the situations/states in which the system might find itself. Change over time is another thing that is hard to predict, e.g. the stock market, and sometimes the designers have no idea how to program the solution. This is where learning-based AI can be a good choice, since such a system learns to become a solution. There are many different types of learning approaches, but this project focuses on neural networks and explains how those work.

⁵ http://en.wikipedia.org/wiki/Bayesian_networks, 20 Sep 2012

⁶ The majority of the information for this basic explanation of neural networks has been found on Wikipedia and some in [9]. The information about artificial neural networks has been found in [9].


Neural networks

In the world of artificial neural networks (ANN), or simply neural networks (NN), one tries to achieve ‘intelligence’ by modelling the system after a biological neural network (BNN), like the human brain. Before I continue to explain ANN I will try to explain how a BNN works.

Disclaimer: This will be a simple explanation since I am far from an expert in the field of neuroscience.

Figure 2.9: A neuron.⁷

A BNN is a vast network of connected nerve cells called neurons. Each neuron consists of a cell body (soma) which contains a cell nucleus, which in turn contains the cell’s genetic material. Stretching out from the body are a number of dendrites, which receive signals from other neurons, and a single long fiber called the axon. The axon sends signals to other neurons. The axon and the dendrites are connected to other neurons at junctions called synapses. That is the structure of the BNN; now to explain how it works. Let’s say that you see a flower. A lot of your neurons will then start firing and sending signals to other neurons until they stop at a state where you either recognise that it is in fact a flower, maybe even what type of flower, or conclude that it is something unknown.

This is what an ANN is trying to mimic. An ANN is built from a few layers: first one input layer, then a number of hidden layers and lastly one output layer. Each layer consists of a number of nodes (neurons), and each of these nodes is connected to all the nodes in the next layer, in one direction, see figure 2.10 on the next page. The connection between two nodes, let’s say i and j, serves to propagate the activation $a_i$ from i to j, and this connection has a numeric weight, $w_{ij}$, associated with it. This weight describes the strength and sign of the connection. During the learning process the weights are updated to produce a desired signal flow. When a node derives its output it first calculates the weighted sum of all its inputs and then applies an activation function to this sum. This is called a feed-forward network, and it is the type that will be used if ANN is chosen.
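The weighted sum and activation described above can be illustrated with a minimal Python sketch. It is not part of the implementation in this project; the logistic activation function, the bias term and all names and weights are assumptions chosen only for the example.

```python
import math


def sigmoid(z):
    """Logistic activation function (one common choice, assumed here)."""
    return 1.0 / (1.0 + math.exp(-z))


def node_output(inputs, weights, bias=0.0, activation=sigmoid):
    """Weighted sum of the incoming activations, followed by the activation function."""
    weighted_sum = sum(w * a for w, a in zip(weights, inputs)) + bias
    return activation(weighted_sum)


def feed_forward(inputs, layers):
    """Propagate activations layer by layer, in one direction only.

    Each layer is a pair (weight_rows, biases) with one weight row per node.
    """
    activations = inputs
    for weight_rows, biases in layers:
        activations = [
            node_output(activations, weights, bias)
            for weights, bias in zip(weight_rows, biases)
        ]
    return activations


# Two inputs, one hidden layer with two nodes, one output node (made-up weights).
hidden = ([[0.5, -0.4], [0.3, 0.8]], [0.0, 0.0])
output = ([[1.0, -1.0]], [0.0])
result = feed_forward([1.0, 2.0], [hidden, output])
```

Training would adjust the numbers in `hidden` and `output`; the sketch only shows how a signal flows once the weights are given.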

One of the biggest risks when working with AI and systems that need to be trained is the risk of over-training them. This could mean that the system starts over-fitting, and if the network is too big it might become a big lookup table. There are a few available techniques that can help reduce over-fitting, but those will be mentioned in section 2.3 on the following page if this approach is chosen.

7 http://en.wikipedia.org/wiki/Nervous system, 18 sep 2012

Figure 2.10: Neural network with one hidden level.8

2.1.4 Pros and cons

In this section the pros and cons of each method will be evaluated and compared against the others, and based on those one method will be chosen as the most suitable for this project. The pros and cons will be based on these criteria:

– Can the method handle a variable number of inputs?

– How would it have to handle dates/sums?

– Can it handle complex rules?

– How well can it handle changes to the rules?

Can the method handle a variable number of inputs

Fuzzy Clustering and Neural Networks do not handle a variable number of inputs well, so it would be necessary to either assign the missing inputs a value such as NULL or 0, or make many small expert systems, e.g. one for every type of moment. With the first choice we end up with a big system that is harder to train and requires a lot more training data. With the second choice the expert systems are simpler to train and require less training data, and after a rule change it might only be necessary to retrain a few of them. A problem that arises, however, is that the results have to be combined if an insurance contains multiple moments, and it is then necessary to know what to do if the moments produce different results, e.g. one moment is green while another is yellow or red. We could make small expert systems for every insurance type as well, but then we would still have the problem that an insurance can contain multiple moments, i.e. the number of inputs will vary. Bayesian Networks can handle variable inputs since it is possible to calculate the probability of missing variables by using Bayes’ theorem and other statistical functions/rules. Given the number of input variables available in this project and their dependencies, this could mean that we would have to construct a big and complex model, which could lead to complex calculations.

8 http://en.wikipedia.org/wiki/Artificial neural network, 18 sep 2012

How can it handle dates and/or sums

This problem is the same for all methods. A date or sum has to be converted into a numerical representation before the system can use it, which means that we need to figure out how dates and sums should be represented.
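One possible representation, given here only as a hedged Python sketch, is to express a date as the number of days since a reference date and to scale a sum to the interval [0, 1]. The reference date, the scaling bounds and the function names are assumptions made for the example and are not taken from the implementation in this project.

```python
from datetime import date


def date_to_number(d, reference=date(2000, 1, 1)):
    """Days elapsed since an arbitrary reference date (hypothetical choice)."""
    return float((d - reference).days)


def scale(value, lowest, highest):
    """Map a value from the interval [lowest, highest] to [0, 1]."""
    return (value - lowest) / (highest - lowest)


# Example: a start date and an insured sum (made-up values).
start = date_to_number(date(2013, 6, 29))
premium = scale(25000.0, 0.0, 100000.0)
```

Whichever representation is chosen, the same conversion has to be applied to the training data and to the real input.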

Can it handle complex rules

Neural Networks are good at learning complex rules, but one can never really be sure which complex rule a network has learned. Fuzzy Clustering can also learn complex rules, but unlike Neural Networks it is possible to show how and why it makes the choices it makes. Bayesian Networks can describe complex stochastic relationships between variables.

How well can it handle rule changes

If the rules change then both the Neural Network and Fuzzy Clustering require retraining. It takes a long time to train a Neural Network and even longer for a Fuzzy Cluster. This makes it even more interesting to make one expert system for each moment, since that could help us reduce training times. The smaller expert systems would require less training data and would have to learn a less complex function, which saves time. Bayesian Networks do not require retraining, though it might be necessary to update the probability tables.

2.2 The chosen method

The method that was chosen for this project was fuzzy clustering. The two deciding reasons for choosing fuzzy clustering were:

Show how the system ’thinks’: With fuzzy clustering it is possible to show, e.g. with graphs, how the system ’thinks’, which I judged to be a strong reason for picking fuzzy clustering.

Similar problem with good results: Patrik, my supervisor at the CS-department, had done similar work with fuzzy clustering, with good results.

In section 2.3 the algorithms for fuzzy clustering will be explained.

2.3 Algorithms

In this section I will describe the fuzzy clustering algorithms used by this system.


2.3.1 C-Means Fuzzy Clustering Algorithm

This is the main algorithm for the type of fuzzy clustering that is used in this project, namely C-means fuzzy clustering. The algorithm is shown in figure 2.11 and is also described in [2], page 69.

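Step 1: Fix $c$ and $m$. Initialize $U$ to some $U^{(1)}$. Select $\varepsilon > 0$ for a stopping condition.

Step 2: Update the midpoint values $v_i$ for each cluster $c_i$.

Step 3: Compute the set $\mu_k \equiv \{\, i : 1 \le i \le c,\ \|x_k - v_i\| = 0 \,\}$ and update $U^{(\ell)}$ according to the following:

if $\mu_k = \emptyset$, then

$$u_{ik} = 1 \Big/ \Big[ \sum_{j=1}^{c} \big( \|x_k - v_i\| / \|x_k - v_j\| \big)^{2/(m-1)} \Big]$$

otherwise

$$u_{ik} = 0 \;\; \forall i \notin \mu_k \quad \text{and} \quad \sum_{i \in \mu_k} u_{ik} = 1$$

Step 4: Stop if $\|U^{(\ell+1)} - U^{(\ell)}\| < \varepsilon$, otherwise go to Step 2.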

Figure 2.11: C-Means clustering algorithm

Step 1: In this step the membership matrix U is initialized. The membership matrix is a matrix that contains all the membership values in the application. It is used to check how strongly an input is tied to a particular cluster. Creating and initializing a membership matrix is simple: create a matrix of size C × I, where C is the number of clusters and I the number of inputs, and fill it with randomized values 0 ≤ value ≤ 1, as can be seen in table 2.1. There is, however, a criterion that the values have to satisfy: all values for one cluster must sum up to 1, which is not satisfied in table 2.1. That is easily fixed by dividing all the values in a cluster by the sum of the values in that cluster. In table 2.2 on the next page the values have been divided by the sums in table 2.1 and now sum up to 1.

Table 2.1: Membership Matrix with randomized values between 0 and 1

            Inputs
Clusters    I1     I2     I3     I4     Sum (a)
Cluster1    0.05   0.53   0.48   0.16   1.22
Cluster2    0.79   0.69   0.60   0.91   2.99
Cluster3    0.67   0.80   0.40   0.35   2.22

a The sum is not part of the membership matrix, it is just there to show that it is > 1.


Table 2.2: The Membership Matrix has been adjusted to sum up to 1 by dividing the input values by the previous sum

            Inputs
Clusters    I1            I2            I3            I4            Sum (a)
Cluster1    0.040983607   0.43442623    0.393442623   0.131147541   1
Cluster2    0.264214047   0.230769231   0.200668896   0.304347826   1
Cluster3    0.301801802   0.36036036    0.18018018    0.157657658   1

a The sum is not part of the membership matrix, it is just there to show that it now sums up to 1.
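The initialization and the adjustment shown in tables 2.1 and 2.2 can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in the comments; the function names are hypothetical and the code is not the C# implementation used in this project.

```python
import random


def random_membership_matrix(n_clusters, n_inputs, rng=random):
    """Create a C x I matrix filled with random values in [0, 1)."""
    return [[rng.random() for _ in range(n_inputs)] for _ in range(n_clusters)]


def normalize_rows(matrix):
    """Divide every value by the sum of its row, so that each row sums to 1.

    This follows the description in Step 1, where the values of one
    cluster (one row) are divided by that row's sum.
    """
    return [[value / sum(row) for value in row] for row in matrix]


# The values from table 2.1; normalizing them should reproduce table 2.2.
table_2_1 = [
    [0.05, 0.53, 0.48, 0.16],
    [0.79, 0.69, 0.60, 0.91],
    [0.67, 0.80, 0.40, 0.35],
]
table_2_2 = normalize_rows(table_2_1)
```

Note that the update formula in Step 3 of figure 2.11 produces memberships that sum to 1 over the clusters for each input, so the initial matrix only serves as a starting point.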

Step 2: In this step we calculate the center, or midpoint, of a cluster. The center can be calculated with:

$$v_i = \frac{\sum_{k=1}^{n} (u_{ik})^m x_k}{\sum_{k=1}^{n} (u_{ik})^m}$$

where $u_{ik}$ is the membership value of point $x_k$ for the i:th cluster and $m$ is a fuzziness value that works as a weighting exponent.
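As a small worked example of the center formula, the following Python sketch computes one center per cluster as the mean of the points weighted by the membership values raised to m. The data layout (one row of memberships per cluster) and the names are assumptions made for the illustration.

```python
def cluster_centers(points, memberships, m=2.0):
    """Return one center v_i per cluster.

    points:      list of n data points, each a list of floats
    memberships: list of c rows, each holding the n values u_ik of one cluster
    m:           fuzziness value used as weighting exponent
    """
    dims = len(points[0])
    centers = []
    for row in memberships:
        weights = [u ** m for u in row]
        total = sum(weights)
        centers.append([
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dims)
        ])
    return centers


# Three points and two clusters with crisp memberships (made-up values).
points = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
memberships = [
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
centers = cluster_centers(points, memberships)  # [[1.0, 0.0], [10.0, 10.0]]
```

With crisp memberships the result is the ordinary mean of the points in each cluster; with fuzzy memberships every point contributes to every center, weighted by how strongly it belongs to the cluster.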

Step 3: In this step the membership matrix U is updated, and there are basically two cases that can occur. The first is that the center of a cluster is right on top of one or more points. In that case the points under the center get a membership value of 1.0 for that cluster and 0.0 for the rest of the clusters, unless two clusters have the same center, which is unlikely but still possible. If there are several clusters with the same center and a point lies on that center, the point gets a value of 1.0/(number of clusters in the same location); with two such clusters the point gets the membership value 1.0/2 = 0.5. In the other case the value is updated based on this formula:

$$u_{ik} = 1 \Big/ \Big[ \sum_{j=1}^{c} \big( \|x_k - v_i\| / \|x_k - v_j\| \big)^{2/(m-1)} \Big]$$

where $\|x_k - v_i\|$ is the Euclidean distance between the center of the current cluster, $v_i$, and the current point, $x_k$; $\|x_k - v_j\|$ is the Euclidean distance between the center of cluster j, $v_j$, and the current point; and $m$ is the fuzziness variable that is used as a weighting exponent.
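The two cases of the update can be sketched as follows. This is a hedged Python illustration of the step as described, not the code used in this project; the function name and the data layout are assumptions.

```python
import math


def update_memberships(points, centers, m=2.0):
    """Return the updated membership matrix U with one row per cluster."""
    c, n = len(centers), len(points)
    exponent = 2.0 / (m - 1.0)
    u = [[0.0] * n for _ in range(c)]
    for k, x in enumerate(points):
        dist = [math.dist(x, v) for v in centers]
        # Clusters whose center lies exactly on the point (the set mu_k).
        on_top = [i for i in range(c) if dist[i] == 0.0]
        if on_top:
            # Share the membership equally among those clusters;
            # all other clusters keep the value 0.0.
            for i in on_top:
                u[i][k] = 1.0 / len(on_top)
        else:
            for i in range(c):
                u[i][k] = 1.0 / sum(
                    (dist[i] / dist[j]) ** exponent for j in range(c)
                )
    return u


# One-dimensional example: the first point lies on the first center.
memberships = update_memberships([[0.0], [1.0], [3.0]], [[0.0], [4.0]])
```

For the point at 1.0 the distances are 1 and 3, which with m = 2 gives the memberships 0.9 and 0.1, so the closer center receives the larger value.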

Step 4: In this step we check if the difference between the old membership matrix and the new one is less than the chosen stopping condition, which also covers the case where the matrix has not changed at all. If it is not, we go back to step 2.

This algorithm runs with c set to a fixed number of clusters, so it has to be run several times with different c values in order to find the best c. It is preferable to have as small a c as possible to limit the number of rules to a reasonable amount. If the c value is too big the system becomes over-fitted; in the extreme, each data point ends up in a cluster of its own and the rule for that cluster most likely becomes that data point. When the algorithm has been run for many different c values it is time to see which c is the best. To find the best c we use something called the criterion number, which is calculated according to:

$$S(U, c) = \sum_{i=1}^{c} \sum_{k=1}^{n} (u_{ik})^m \left[ \|x_k - v_i\|^2 - \|v_i - \bar{x}\|^2 \right]$$

Figure 2.12: Criterion number

where $\bar{x}$ is the center of all data points. The goal is to find the c which generates the smallest criterion number, though having c = number of data points usually generates the best criterion number. As stated above, that is not what we want, since it would make the system over-fitted. After the best number of clusters has been found it is time to identify the rules of each of the clusters, which is described in section 2.3.2.
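The criterion number can be computed directly from the points, the centers and the membership matrix. The Python sketch below is only an illustration of the formula in figure 2.12; the function names and the example data are assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def criterion_number(points, centers, memberships, m=2.0):
    """S(U, c): compactness of the clusters minus their separation from the
    center of all data points, weighted by the memberships raised to m."""
    n, dims = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dims)]
    total = 0.0
    for v, row in zip(centers, memberships):
        separation = squared_distance(v, mean)
        for x, u in zip(points, row):
            total += (u ** m) * (squared_distance(x, v) - separation)
    return total


points = [[0.0], [2.0], [10.0], [12.0]]

# Two well separated clusters ...
two = criterion_number(points, [[1.0], [11.0]],
                       [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
# ... compared with a single cluster that covers everything.
one = criterion_number(points, [[6.0]], [[1.0, 1.0, 1.0, 1.0]])
```

In this example `two` is smaller than `one`, which matches the goal of finding the c with the smallest criterion number.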

2.3.2 Rulebase algorithms

After we have divided all of the data points into clusters we need to make projections of these clusters in order to create rules for the rulebase. Each cluster generates one rule, based on the projections of the cluster. A projection of a cluster can be seen in figure 2.5 on page 10. The closer a point is to the center the higher the projection is, so the highest point of the projection is the center. When the projection has been created we need to find a function which best fits this curve, and that function becomes the rule.

The number of projections needed depends on how many inputs a data point consists of. If we have data points with three inputs, Age, Sex and Income, we need to make three projections of the cluster, one for every input.

With these projections we can now create rules. To create a rule we use the following formula:

$$u_{ik} = e^{-\beta_p (\pi_p(x_k) - \alpha_{ip})^2}$$

Figure 2.13: Create rules

where $u_{ik}$ is the membership value of data point $x_k$ in the i:th cluster and $\pi_p$ is the projection on the p:th axis, or p:th input. The variable $\beta_p$ is expressed as $\beta_p = 1/(2\sigma^2)$, where $\sigma$ is the standard deviation of $\pi_p$, and $\alpha_{ip}$ is the center of cluster i in $\pi_p$. Running this on all clusters and all axes creates a set of rules that constitutes the rulebase. After the rulebase has been created the fuzzy clustering is completed and the system is ready for the real input.
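To make the rule creation more concrete, the sketch below builds a Gaussian membership function for one cluster on one axis. It is a minimal Python illustration and rests on two assumptions that are not stated in the text: the projection is taken to be the p:th coordinates of the data points belonging to the cluster, and the population standard deviation is used for σ. All names are hypothetical.

```python
import math
import statistics


def make_rule(projected_values, center):
    """Return a Gaussian membership function for one cluster on one axis.

    projected_values: the values of the cluster's points on this axis
    center:           the cluster center on this axis (alpha_ip)
    """
    sigma = statistics.pstdev(projected_values)  # assumed: population std dev
    beta = 1.0 / (2.0 * sigma ** 2)

    def rule(x):
        return math.exp(-beta * (x - center) ** 2)

    return rule


# A cluster projected on a hypothetical Age axis.
ages = [30.0, 35.0, 40.0, 45.0, 50.0]
age_rule = make_rule(ages, center=40.0)
```

Here `age_rule(40.0)` gives 1.0, the highest point of the projection, and the value falls off as the input moves away from the center.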

2.3.3 Inference algorithms

In order to get output from the rulebase we need something that can take the input and the rulebase and translate them into output, and for that we have inference methods. In this project two inference techniques have been implemented: Mamdani’s method and Takagi-Sugeno’s method.


Activation

Both of the inference methods use an activation function. The first thing this function does is to calculate the input using this formula:

$$e^{-\frac{(x_{kp} - v_{ip})^2}{2\sigma^2}}$$

Figure 2.14: Calculate the input.

where $x_{kp}$ is the p:th input of data point $x_k$ and $v_{ip}$ is the center of the p:th input on the i:th cluster. $\sigma$ is the i:th cluster’s $\sigma$ that was used when calculating the rules in section 2.3.2 on the previous page. The results are saved in arrays, where each array represents one of the clusters and contains the calculated values for all dimensions of the input. So if we have a data point x with the inputs age, sex and income, we get an array that looks like this: array_r1 = [calc(age), calc(sex), calc(income)]. After all of the arrays have been created we go on to the next step which, depending on the configuration chosen, selects either the smallest value (minimum) or the largest value (maximum) in each of these arrays and saves them in a new array, which represents the activation function.
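The activation step can be sketched in Python as follows. The sketch assumes that one σ is stored per cluster and per input, which is my reading of the description; the function name and the example values are hypothetical.

```python
import math


def activation(point, centers, sigmas, use_minimum=True):
    """Return one activation value per cluster for a single data point.

    point:   the inputs of the data point, e.g. [age, sex, income]
    centers: per cluster, the center on every input axis
    sigmas:  per cluster, the sigma on every input axis (assumed layout)
    """
    pick = min if use_minimum else max
    result = []
    for center, sigma in zip(centers, sigmas):
        # One Gaussian value per input, as in figure 2.14.
        per_input = [
            math.exp(-((x - v) ** 2) / (2.0 * s ** 2))
            for x, v, s in zip(point, center, sigma)
        ]
        # Keep only the smallest or the largest value for this cluster.
        result.append(pick(per_input))
    return result


point = [40.0, 1.0]
centers = [[40.0, 1.0], [50.0, 1.0]]
sigmas = [[5.0, 1.0], [5.0, 1.0]]
with_min = activation(point, centers, sigmas, use_minimum=True)
with_max = activation(point, centers, sigmas, use_minimum=False)
```

With the minimum, a cluster is only activated strongly if the point is close to its center on every axis; with the maximum, one well-matching axis is enough, which gives much less selective activation values.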

Mamdani’s method

The process of Mamdani’s method can be seen in figure 2.15 on the facing page. After the input has been calculated and the correct values have been chosen (see the Activation paragraph above), Mamdani’s method tries to find the best rule for each output. Mamdani’s membership function can be seen in equation (2.16) on the next page, where $\alpha_i$ are the activation values, $U_i$ are the output values from the rulebase and $U$ is a fuzzy set.

To clarify things we will go through figure 2.15 on the facing page step by step. In this figure we have a system that contains three rules: 1, 2 and 3. The curves represent how each rule depends on the input. In the figure input1 has been calculated to be 3 and input2 to be 8, and following the arrows we can see the results of the inputs where they hit the curves. The next step is to see which of the inputs has the biggest effect on the final result, and in Mamdani’s case we do this with the maximum function. The result of this is a fuzzy set, $U$, which is represented by the green graphs in figure 2.15 on the next page.

To make sense of this fuzzy set we need to defuzzify it. There are a few techniques for defuzzification, but the one used in this project is called Centre-of-Gravity. First we combine all values in the fuzzy set so that we get something that looks like the graph called ”Result of aggregation” in figure 2.15 on the facing page. It is on this graph we want to find the centre of gravity, which can be done by using equation (2.17) on the next page, where $\mu_U$ is the combined fuzzy set and $u_k$ is the k:th member of the fuzzy set.

There is another method that is based on Mamdani’s method but with a small alteration. The method is called Larsen’s method and uses the product as implication instead of the minimum that Mamdani’s method uses. This is mentioned because the implementation allows four different configurations: the activation has two settings, maximum or minimum, and in the Mamdani step it is possible to use Larsen’s method instead. Larsen’s membership function can be seen in equation (2.18) on the following page.

Figure 2.15: Process of Mamdani’s method.9

9 http://www.dma.fi.upm.es/java/fuzzy/fuzzyinf/mamdani3 en.htm, 15 Feb 2013

$$U = \bigvee_{i=1}^{n} (\alpha_i \wedge U_i)$$

Figure 2.16: Mamdani - Output membership function.

$$u = \frac{\sum_{k=1}^{l} u_k \cdot \mu_U(u_k)}{\sum_{k=1}^{l} \mu_U(u_k)}$$

Figure 2.17: Centre of Gravity.

$$U = \bigvee_{i=1}^{n} (\alpha_i \cdot U_i)$$

Figure 2.18: Larsen - Output membership function.
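The difference between Mamdani’s and Larsen’s implication, followed by the Centre-of-Gravity defuzzification, can be illustrated with the Python sketch below. It assumes that every output fuzzy set is sampled on a shared discrete grid of output values; the grid, the rule shapes and the function names are made up for the example and are not taken from the implementation.

```python
def infer(activations, rule_outputs, use_product=False):
    """Combine the rules into one aggregated fuzzy set.

    activations:  one activation value alpha_i per rule
    rule_outputs: per rule, the membership values U_i sampled on a shared grid
    use_product:  False gives Mamdani (minimum), True gives Larsen (product)
    """
    implied = [
        [alpha * mu if use_product else min(alpha, mu) for mu in outputs]
        for alpha, outputs in zip(activations, rule_outputs)
    ]
    # Aggregate the rules with the maximum (the "or" in the formulas).
    return [max(column) for column in zip(*implied)]


def centre_of_gravity(grid, aggregated):
    """Defuzzify the aggregated set to a single output value."""
    return sum(u * mu for u, mu in zip(grid, aggregated)) / sum(aggregated)


grid = [1.0, 2.0, 3.0, 4.0]            # possible output values u_k
rule_outputs = [
    [1.0, 0.5, 0.0, 0.0],              # rule 1 favours low outputs
    [0.0, 0.0, 0.5, 1.0],              # rule 2 favours high outputs
]
activations = [0.8, 0.2]

mamdani = centre_of_gravity(grid, infer(activations, rule_outputs))
larsen = centre_of_gravity(grid, infer(activations, rule_outputs, use_product=True))
```

Mamdani’s minimum cuts off each output set at the activation level, while Larsen’s product scales the whole set, so the two methods generally give slightly different crisp outputs for the same activations.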

Takagi-Sugeno’s method

This method uses linear functions to create an inference. The output is represented like this: $u_i = p_{i1} + p_{i2} x_1 + p_{i3} x_2$, one for each rule. The first thing that needs to be done is to compute the constants, p, for each rule. In [15], Takagi and Sugeno describe a way to calculate these constants.

Let X be an $m \times n(k+1)$ matrix (figure 2.19), Y an $m$ vector (figure 2.20) and P an $n(k+1)$ vector (figure 2.21).

$$X = \begin{bmatrix}
\beta_{11}, \ldots, \beta_{n1}, & x_{11} \cdot \beta_{11}, \ldots, x_{11} \cdot \beta_{n1}, & \ldots, & x_{k1} \cdot \beta_{11}, \ldots, x_{k1} \cdot \beta_{n1} \\
\vdots & & & \vdots \\
\beta_{1m}, \ldots, \beta_{nm}, & x_{1m} \cdot \beta_{1m}, \ldots, x_{1m} \cdot \beta_{nm}, & \ldots, & x_{km} \cdot \beta_{1m}, \ldots, x_{km} \cdot \beta_{nm}
\end{bmatrix}$$

Figure 2.19: Takagi-Sugeno - X matrix.

$$Y = [y_1, \ldots, y_m]^T$$

Figure 2.20: Takagi-Sugeno - Y vector.

$$P = [p^1_0, \ldots, p^n_0,\; p^1_1, \ldots, p^n_1,\; \ldots,\; p^1_k, \ldots, p^n_k]^T$$

Figure 2.21: Takagi-Sugeno - P vector.

Here β is defined as seen in figure 2.22, i represents the i:th rule, j and m refer to the data points in the system, k is the k:th input in a data point and n is the number of rules in the system. $A_{ik}$ means the membership value of the k:th input in rule i, $x_{kj}$ means the k:th input from the j:th data point and $y_m$ is the output of data point m. The P vector represents the constant values needed to calculate the expected output and is generated by the matrix computation seen in figure 2.23. In figure 2.21, $p^n_k$ is the constant that belongs to the k:th input in rule n. After the P vector has been calculated we use the formula $u_i = p_{i1} + p_{i2} x_1 + p_{i3} x_2$ to get the output $u_i$ of each rule, which we then use to calculate the final output with:

$$u = \frac{\sum_{i=1}^{n} \alpha_i u_i}{\sum_{i=1}^{n} \alpha_i}$$

where $\alpha_i$ is the activation function that was described in the activation paragraph.

$$\beta_{ij} = \frac{A_{i1}(x_{1j}) \wedge \cdots \wedge A_{ik}(x_{kj})}{\sum_{j} A_{i1}(x_{1j}) \wedge \cdots \wedge A_{ik}(x_{kj})}$$

Figure 2.22: Takagi-Sugeno - β variable.

$$P = (X^T X)^{-1} X^T Y$$

Figure 2.23: Takagi-Sugeno - matrix computation
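Once the constants are known, the final output is a weighted average of the linear rule outputs. The Python sketch below illustrates only this last step; the constants and activation values are made up, and the least-squares computation of P in figure 2.23 is assumed to have been done elsewhere.

```python
def takagi_sugeno_output(inputs, coefficients, activations):
    """Weighted average of the linear rule outputs.

    inputs:       the inputs x_1, x_2, ... of one data point
    coefficients: per rule, [p_i1, p_i2, p_i3, ...] (constant term first)
    activations:  one activation value alpha_i per rule
    """
    rule_outputs = [
        p[0] + sum(pj * x for pj, x in zip(p[1:], inputs))
        for p in coefficients
    ]
    weighted = sum(a * u for a, u in zip(activations, rule_outputs))
    return weighted / sum(activations)


# Two rules and two inputs with hypothetical constants.
coefficients = [
    [1.0, 0.5, 0.0],   # u_1 = 1.0 + 0.5 * x_1
    [2.0, 0.0, 0.5],   # u_2 = 2.0 + 0.5 * x_2
]
output = takagi_sugeno_output([1.0, 2.0], coefficients, [0.75, 0.25])
```

In the example the rule outputs are 1.5 and 3.0, and because the first rule is activated three times as strongly the final output, 1.875, lies closer to 1.5.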


Chapter 3

Implementations, results and validations

3.1 Development environment

3.1.1 .Net

One criterion for this thesis was that it should be implemented in the .Net 4.0 environment, since that is what they use at Acino. C# and .Net have excellent support on computers running Windows, since Microsoft develops them both. I had never used C# prior to this project and knew that it could be a hindrance. I found it very similar to Java, a language that I have used many times, which made it easier to learn. I might, however, have been influenced to program in a more Java-like style and thus might not have taken full advantage of the C# language. It was still an opportunity to learn C#, which is widely used in the business world and therefore good to know in order to be more attractive on the job market.

3.1.2 Useful embedded namespaces

A namespace in the .Net environment is basically a collection of useful methods and functions. The System.IO namespace, for example, contains useful methods for input and output, like reading and writing data streams. In this section I will mention some of the namespaces that I have used in this project. The descriptions are taken from the MSDN site.11

System.IO: namespace contains types that allow reading and writing to files and data streams, and types that provide basic file and directory support.

System.Runtime.Serialization: namespace contains classes that can be used for serializing and deserializing objects. Serialization is the process of converting an object or a graph of objects into a linear sequence of bytes for either storage or transmission to another location. Deserialization is the process of taking in stored information and recreating objects from it.

System.Linq: namespaces contain types that support queries that use Language-Integrated Query (LINQ). This includes types that represent queries as objects in expression trees.

System.Windows.Forms: namespace contains classes for creating Windows-based applications that take full advantage of the rich user interface features available in the Microsoft Windows operating system.

System.XML: namespaces contain types for processing XML. Child namespaces support serialization of XML documents or streams, XSD schemas, XQuery 1.0 and XPath 2.0, and LINQ to XML, which is an in-memory XML programming interface that enables easy modification of XML documents.

System.Text: namespaces contain types for character encoding and string manipulation. A child namespace enables you to process text using regular expressions.

11 http://msdn.microsoft.com/en-us/

3.1.3 Database management via XML export

This implementation uses XML to get the information about the insurances. Currently this information has to be downloaded to a file in order for the program to read it, but it should not be too difficult to tie the implementation to a server that hosts the insurance information. One thing to keep in mind is that there are two styles in use. At Acino, where I am doing this thesis, they have one style and at Svenska Forsakringsfabriken they have another. The biggest difference is that the latter uses codes for the different fields while Acino uses words/names, which makes the Acino format easier to read, and that is the format the implementation uses. The XML document is read by a preprocessor, which parses out the interesting parts of the data and sends them to the main program.

3.2 Result

The results were a bit of a surprise for a number of reasons. The first one was that the Mamdani methods that used the maximum function in the activation phase had a constant hit rate which never changed. Looking at a sample of the output of the functions, table 3.2 on page 28, we can see that the interval of these methods only covers two outcomes, namely 3.0 = Green and 4.0 = Gray, which explains the poor results. The method with the best results is the Takagi-Sugeno method, which covers the whole range of expected results. In table 3.1 the observed upper and lower limits of each method can be seen.

Table 3.1: The output intervals of the different inference methods

Inference        Lower limit   Upper limit
Takagi-Sugeno    1.0           3.5
T/F (a)          0.5           2.5
T/T              0.5           2.5
F/F              3.0           4.0
F/T              3.0           4.0

a T/F, (activation function: T = Minimum, F = Maximum)/(Mamdani function: T = Product, F = Minimum)


The second one is that the hit rate for all the methods was overall very stable, with only a few drops. It was expected that the results would improve, or at least change, with different configurations, but they stayed very stable. An exception is the T/T Mamdani method, which struggled when the fuzziness variable was > 12. Another surprise with that Mamdani method was that it struggled when the number of runs was > 140, which is strange since all that does is run the program again to find the best system, the one with the best criterion number.

3.3 Impact of the parameters

There are a few settings that can affect the results. In this section we will go through some of them and see what impact they have on the hit rate. But first there is one thing that needs to be explained. In the charts the Mamdani inference is named T/T, T/F, F/T or F/F, where T stands for true and F for false. The first symbol states whether the Min-function or the Max-function has been used in the activation; T means that Min was used. The second symbol states whether the Product-function or the Min-function has been used in the Mamdani step; T means that the Product-function was used.

3.3.1 The Fuzziness Variable

The fuzziness variable, m, is a weighting exponent. It is related to the weight that is given to the closest center.

Figure 3.1: The impact of the fuzziness variable

In figure 3.1 we can see that the Mamdani inference where we use the max function has a steady hit rate of 12% which never changes, and 12% is far from acceptable. The Mamdani inference where we instead use the min function performs around 40%, which still is not acceptable. There is a slight difference between the Mamdani variant that uses the Min-function and the one that uses the Product-function. The one that uses the min-function in both stages performs slightly better and is a bit more stable than the one that uses the min-function in the activation stage and the product-function in the next stage.


Table 3.2: Sample of the output from the system

Expected   Takagi-Sugeno   T/F   T/T   F/F   F/T
2          2.4             2.1   1.6   3.9   3.9
2          2.4             2.1   1.6   3.1   3.1
2          2.4             2.1   1.6   3.9   3.9
2          2.4             2.1   1.6   3.9   3.9
2          2.4             2.1   1.6   3.1   3.1
2          2.4             2.1   1.6   3.9   3.9
2          2.4             2.1   1.6   3.1   3.1
2          2.4             2.1   1.6   3.1   3.1
3          1.2             0.6   0.5   3.7   3.7
1          1.2             2.3   2.2   3.9   3.9
3          2.6             1.6   1.2   3.9   3.9
4          3.2             0.6   0.4   3.4   3.4
3          2.6             1.6   1.2   3.9   3.9
3          2.6             1.6   1.2   3.9   3.9
2          2.4             2.1   1.6   3.1   3.1
2          2.4             2.1   1.6   3.9   3.9
3          2.6             1.6   1.2   3.9   3.9
2          1.4             2.3   1.9   3.9   3.9
3          2.6             1.6   1.2   3.9   3.9
3          2.6             1.6   1.2   3.9   3.9
3          1.2             2.4   2.4   3.9   3.9
3          1.2             2.4   2.4   3.9   3.9
1          1.2             2.3   2.3   4.0   4.0
3          2.6             1.6   1.2   3.9   3.9
2          2.4             2.1   1.6   3.1   3.1
1          1.2             2.3   1.9   3.7   3.7
1          1.2             2.3   1.9   3.7   3.7
1          1.2             1.6   1.2   3.9   3.9
3          2.6             1.6   1.2   3.9   3.9
1          1.2             1.6   1.2   3.7   3.7
1          1.2             1.6   1.2   3.7   3.7
2          1.4             2.3   1.9   3.9   3.9
3          2.6             1.6   1.2   3.9   3.9
2          2.4             2.1   1.6   3.1   3.1
4          3.2             0.6   0.4   3.4   3.4
3          2.6             1.6   1.2   3.9   3.9
3          2.6             1.6   1.2   3.9   3.9
4          3.5             0.5   0.4   3.4   3.4

a The numbers displayed are the actual numbers the methods produce. During evaluation they are rounded off to the nearest integer.

The inference method with the best performance is the Takagi-Sugeno inference, which performs with a hit rate around 69% with a few drops. 69% is a lot better than the Mamdani methods but still not acceptable; the hit rate that we want to see is at least 90%.
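How a hit rate can be derived from output such as that in table 3.2 is sketched below in Python. The rounding rule (halves are rounded up) and the function names are assumptions made for the example; the six rows are a small selection of Takagi-Sugeno values from the table and do not represent the full evaluation.

```python
import math


def round_half_up(value):
    """Round to the nearest integer, with halves rounded up (assumed rule)."""
    return int(math.floor(value + 0.5))


def hit_rate(expected, outputs):
    """Fraction of outputs that equal the expected flag after rounding."""
    hits = sum(1 for e, o in zip(expected, outputs) if round_half_up(o) == e)
    return hits / len(expected)


# A few rows picked from table 3.2 (Expected and Takagi-Sugeno columns).
expected = [2, 3, 1, 3, 4, 2]
takagi_sugeno = [2.4, 1.2, 1.2, 2.6, 3.2, 1.4]
rate = hit_rate(expected, takagi_sugeno)
```

For these six rows three outputs round to the expected flag, so the hit rate of the selection is 0.5.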


3.3.2 The Number of Runs

Since the initialization of the membership matrix is random, we can see differences between training runs. This means that it is a good idea to do a number of runs in order to get the ’best’ clustering we can get.

Figure 3.2: The impact of the number of runs variable

In figure 3.2 we can see the impact of the number of runs performed. Most of the inference methods are stable, and the deviations that do occur could mean that in some runs they were lucky or unlucky. The number of runs does, however, seem to have a big impact on one of the inference methods, namely the Mamdani version where the minimum function is used in the activation and the product function is used in the Mamdani step. When the product function is used inside the Mamdani method it is known as Larsen’s method.

3.3.3 The Cluster Interval

It is preferable to have as few clusters as possible in order to avoid over-fitting the system. In this test we look for an interval that does not affect the hit rate while minimising the number of clusters we have to try. The label 20/90 means that the system will try numbers of clusters between 20% and 90% of the number of data points. So if we have one hundred data points in the system, 90 − 20 = 70 different clusterings would be created and then compared to see which one yields the best result.
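The cluster counts that an interval label stands for can be listed with a short Python sketch. It assumes that the upper limit is exclusive, which is the reading that gives 70 clusterings for one hundred data points; the function name is hypothetical.

```python
def cluster_counts(n_points, lower_pct, upper_pct):
    """Numbers of clusters to try for an interval label such as 20/90.

    The upper limit is treated as exclusive (an assumption), so that
    100 data points and the label 20/90 give 90 - 20 = 70 candidates.
    """
    lower = n_points * lower_pct // 100
    upper = n_points * upper_pct // 100
    return list(range(lower, upper))


candidates = cluster_counts(100, 20, 90)
```

Each candidate would then be clustered separately and the one with the best criterion number kept, so a narrower interval directly reduces the training time.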

In figure 3.3 on the next page we can see that the methods are relatively stable with a few drops and peaks. This could just be a coincidence, since the interval should not make any difference in performance unless an area that usually creates the best clusters is removed from it. Let’s say that the winning clustering usually comes from the system whose number of clusters is around 60% of the data points. If we only check between 20% and 50% we might miss it and get a lower hit rate. Looking at figure 3.3 on the following page we can see that the intervals between 20/60 and 20/40 seem to be the best. Keep in mind that this is mostly to reduce training time and not to improve the hit rate.


Figure 3.3: The impact of the interval limits

Chapter 4

Conclusion

There was a scheduling conflict at the beginning of the project. I had forgotten that I had planned to retake two courses at the same time, which in hindsight was not the best of ideas. When the courses started the project was cut to a pace of 50%, so half the day was spent on one of the courses and the other half on the project. It always took a while to get back into the project, so a lot of time was lost in the end. It would probably have been best not to take courses on the side, or at least only one.

The goal of this project was to create a system that could evaluate insurances using AI. In section 1.3.1 on page 3 some criteria are presented. The only criterion that is not fully fulfilled is ”The result of the evaluation should consist of a flag and a text”. The system currently only gives a flag, because I was unsure how the texts should be interpreted by the system. The system needs the input and output to be represented by floating-point numbers; it is probably possible to look up all combinations of texts and give them a numerical representation that can be combined in a good way. The system is currently not a good replacement for the manual evaluation of insurances. But even though the system only managed to get a hit rate of 69% at best, there is potential in fuzzy clustering. In section 4.2 on the following page a few suggestions are mentioned that could help boost the performance and make it a dependable replacement.

4.1 Restrictions and Limitations

There are a few restrictions and limitations in this project and those are:

Slow learning process: The process of training the system can be quite time consuming, depending on the number of data points that are used. But since this is not something that needs to be done often, it can probably be disregarded. A way to reduce the training time would be to make it parallel, so that multiple numbers of clusters can be evaluated at the same time. The fuzzy clustering process is very parallel-friendly because it can easily be divided into a number of tasks.

Requires retraining: If there is a change in the rules then the system will need to be retrained. This could mean that new training data is required, which might have to be manually evaluated and that might take some time. On top of that comes the actual training time of the system, which is mentioned in the previous block.


Variable number of inputs: The system cannot handle a variable number of inputs, and since two insurances can have a different number of values/attributes we cannot just read the whole insurance and send it to the system. We need to choose a number of important values/attributes that all insurances have and use them. The fewer that are used, the less complex the problem will be for the system and the shorter the training time. If a new value/attribute is introduced the system will have to be retrained.

No support for multiple instances: A way to improve the performance of the system would be to make different systems that each evaluate one type of insurance. This is discussed more in section 4.2. Currently the system does not support that. A few modifications would have to be made in order to support multiple instances. The first is to change the start-up so that it can start multiple instances and assign tasks to them, and the second is to make sure that each instance can save its own training data without the instances overwriting each other.

4.2 Future work

There are a few improvements that could be made in the future that might help improve the results. These improvements will be discussed in the following sections, and the proposed improvements are:

– Better selections of training data

– Small and focused

– Reinforced learning

4.2.1 Better selections

It could be that the training set currently in use is too hard or complex for the system as it is. A more thorough selection of training data could therefore give better results, since the data would be easier for the system to cluster.

4.2.2 Small and focused

At the moment the application is used as a single instance to evaluate all types of insurances. This means that the system needs to build one big and complex clustering that handles all these cases. Using many small systems that each handle only one special case of the insurances should make the clustering easier for each system. A few problems can occur when doing this, depending on how finely the insurances are divided. An insurance can contain a number of moments, and if we build a system out of many smaller systems that each focus on one moment, we would have to come up with a way to combine the results for these moments. If, for example, one moment gives Green (movable) and another gives Red (not movable), should the combined output be Red, Green or Yellow (might be movable)?
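The question of how to combine per-moment results is left open here, but two candidate policies can be sketched to make the choice concrete. Both functions and the `Movability` type are hypothetical names introduced for this example; neither policy is proposed as the correct answer.

```python
from enum import IntEnum


class Movability(IntEnum):
    """Ordered from least to most restrictive."""
    GREEN = 0   # movable
    YELLOW = 1  # might be movable
    RED = 2     # not movable


def _require_results(results):
    if not results:
        raise ValueError("at least one moment result is required")


def combine_most_restrictive(results):
    """Policy A (assumption): the most restrictive moment decides."""
    _require_results(results)
    return max(results)


def combine_disagreement_as_yellow(results):
    """Policy B (assumption): any disagreement is reported as Yellow."""
    _require_results(results)
    unique = set(results)
    return unique.pop() if len(unique) == 1 else Movability.YELLOW
```

For the example in the text, one Green and one Red moment, policy A answers Red while policy B answers Yellow. Which of these (or some weighted variant) is appropriate depends on how costly a wrongly moved insurance is compared to a manual review.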

4.2.3 Reinforced learning

During reinforced learning a system will, after training, run the training data again and see how well it performs. If the system makes a good call it is 'rewarded', and a bad call results in a 'penalty'. The system then goes back to training and tries to adapt to the 'rewards' and 'penalties' it received. This process continues until a certain limit has been reached, e.g. a hit rate of at least 90%. Introducing reinforced learning to the application could thus make it possible to improve the results.
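The train, evaluate and adapt loop described above can be sketched as follows. This is only an illustration and not the fuzzy clustering system of this thesis: as a stand-in it uses a nearest-prototype classifier with an LVQ1-style update, where a reward pulls the winning prototype toward the sample and a penalty pushes it away. All names, the step size and the round limit are assumptions.

```python
def nearest(prototypes, x):
    """Index of the prototype whose vector is closest to x."""
    return min(range(len(prototypes)),
               key=lambda i: sum((a - b) ** 2
                                 for a, b in zip(prototypes[i][0], x)))


def hit_rate(prototypes, data):
    """Fraction of samples whose nearest prototype has the right label."""
    hits = sum(1 for x, label in data
               if prototypes[nearest(prototypes, x)][1] == label)
    return hits / len(data)


def train_with_feedback(prototypes, data, target=0.9, step=0.1,
                        max_rounds=100):
    """Repeat reward/penalty rounds until the target hit rate is reached.

    prototypes: list of [vector, label] pairs, modified in place.
    data: list of (vector, label) training samples.
    Returns the final hit rate on the training data.
    """
    for _ in range(max_rounds):
        if hit_rate(prototypes, data) >= target:
            break
        for x, label in data:
            i = nearest(prototypes, x)
            vector, proto_label = prototypes[i]
            # Good call: move toward the sample. Bad call: move away.
            sign = 1.0 if proto_label == label else -1.0
            prototypes[i][0] = [v + sign * step * (a - v)
                                for v, a in zip(vector, x)]
    return hit_rate(prototypes, data)
```

The `max_rounds` limit matters in practice, since nothing guarantees that a given target hit rate is reachable on a particular training set.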

Chapter 5

Acknowledgments

I would like to thank Acino for letting me do my thesis there, and I would also like to thank all of Acino's employees for making me feel welcome. I especially want to give a big thanks to Hannes Kock, my supervisor at Acino, Patrik Eklund, my supervisor at the CS department, and Anna Theorin, my contact at Svenska Forsakringsfabriken, for helping me with this thesis.

References

[1] John Binder, Daphne Koller, Stuart Russell, Keiji Kanazawa, and Padhraic Smyth. Adaptive probabilistic networks with hidden variables. In Machine Learning, pages 213–244, 1997.

[2] Jens Bohlin, Patrik Eklund, Lena Kallin-Westin, and Tony Riissanen. Soft computing. 2007.

[3] A. Doulamis, N. Doulamis, and S.D. Kollias. On-line retrainable neural networks: improving the performance of neural networks in image analysis problems. Neural Networks, IEEE Transactions on, 11(1):137–155, 2000.

[4] Nir Friedman, Dan Geiger, and Moises Goldszmidt. Bayesian network classifiers, 1997.

[5] H.R. Berenji and P. Khedkar. Learning and tuning fuzzy logic controllers through reinforcements. Neural Networks, IEEE Transactions on, 3(5):724–740, 1992.

[6] Bipin Joshi. Beginning XML with C# 2008 - from novice to professional. 2008.

[7] S. Marinai, M. Gori, and G. Soda. Artificial neural networks for document analysis and recognition. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 27(1):23–35, 2005.

[8] Dieter Merkl and Andreas Rauber. Document classification with unsupervised artificial neural networks. In F. Crestani and G. Pasi (eds.), Soft Computing in Information Retrieval, pages 102–121. Wurzburg (Wien): Physica-Verlag, 2000.

[9] Peter Norvig and Stuart J. Russell. Artificial Intelligence: A Modern Approach. Prentice Hall, 3rd edition, 2009.

[10] Agnieszka Onisko, Marek J. Druzdzel, and Hanna Wasyluk. Learning Bayesian network parameters from small data sets: Application of noisy-or gates, 2000.

[11] Ronald R. Yager. Knowledge-based defuzzification. Fuzzy Sets Syst., 80(2):177–185, June 1996.

[12] J.A. Roubos, S. Mollov, R. Babuska, and H.B. Verbruggen. Fuzzy model-based predictive control using Takagi-Sugeno models, 1999.

[13] Han-Saem Park, Si-Ho Yoo, and Sung-Bae Cho. Evolutionary fuzzy clustering algorithm with knowledge-based evaluation and applications for gene expression profiling, 2005.

[14] M. Sugeno and T. Yasukawa. A fuzzy-logic-based approach to qualitative modeling. Fuzzy Systems, IEEE Transactions on, 1(1):7–, 1993.

[15] Tomohiro Takagi and Michio Sugeno. Fuzzy identification of systems and its applications to modeling and control. Systems, Man and Cybernetics, IEEE Transactions on, SMC-15(1):116–132, 1985.

[16] Amanda Tapini. Mislife granskning, 2012.

[17] H. E. Virtanen. A study in fuzzy logic programming. Cybernetics and Systems '94, Proceedings of the 12th European Meeting on Cybernetics and Systems Research, Ed. R. Trappl, Vienna, Austria, pages 249–256, 1994.

[18] L.-X. Wang and J.M. Mendel. Fuzzy basis functions, universal approximation, and orthogonal least-squares learning. Neural Networks, IEEE Transactions on, 3(5):807–814, 1992.

[19] Nevin Lianwen Zhang and David Poole. A simple approach to Bayesian network computations, 1994.
