

CASE BASED REASONING AS A TOOL "IN THE SEARCH FOR KNOWLEDGE IN DATABASES": PROSPECTS OF A "SUPPORT SYSTEM" FOR RESIDENTIAL VALUATION AND THE CONSTRUCTION OF RESIDENTIAL HOUSING.

Ilesh Dattani & Max Bramer¹

"I have but one lamp by which my feet are guided, and that is the lamp of experience. I know no way of judging the future but by the past." Patrick Henry (Speech in Virginia Convention, Richmond, March 23, 1775).

"Computers have promised us a fountain of wisdom but delivered a flood of data" - A frustrated MIS executive

Introduction

What are we supposed to do with these 'floods of data'? Only a very small proportion of the data will ever be seen by human eyes, and even less will be analysed and understood. Data that is intelligently analysed and presented would be a valuable resource and could be used commercially to gain a competitive advantage. The widespread exploitation of knowledge discovery has been synergistic with this realisation.

Case-based reasoning represents part of Artificial Intelligence's scientific ambitions within problem solving and the search for efficient methods to define descriptive patterns and explanations within such 'a flood of data'. Theoretically it can be used when working with large amounts of historical data and in situations where there is a need to extract order from complex data. At a simplistic level, case-based reasoning represents the ability to solve a given problem by remembering a previous similar situation and reusing information and knowledge of that situation. This approach is supported by empirical evidence and results from cognitive psychological research, and within these findings lie part of the foundations for the case-based approach. Essentially, the roots of case-based reasoning in AI are found in the works of Roger Schank² on dynamic memory and the fundamental role that a reminding of earlier situations has in problem solving and learning. For a fuller overview of the theoretical principles see [Dattani & Bramer, 95; Aamodt and Plaza, 94; Watson & Marir, 94].

Prospects for Applications

The work in progress within the domain of property valuation and construction involves the use of two datasets at present. The first of these was collated from the 1970 US Census.³

¹ Dept of Information Science, AI Research Group, University of Portsmouth, Milton Site, Portsmouth. PO4 8JF Email: [email protected] & [email protected]
² Schank, R (1982): Dynamic memory; a theory of reminding and learning in computers and people. Cambridge University Press.
³ Original dataset held in the UCI Repository of Machine Learning Databases and Domain Theories.


The data concerns housing values in the suburbs of Boston. There are 506 instances, all with 14 attributes: 13 continuous attributes (including the "class" attribute "MEDV") and one binary-valued attribute. The attributes are:

CRIM      per capita crime rate by town
ZN        proportion of land zoned for lots over 25,000 sq. ft.
INDUS     proportion of non-retail business acres per town
CHAS⁴     Charles River dummy variable (= 1 if tract bounds river; else 0)
NOX       nitric oxides concentration (parts per 10 million)
RM        average number of rooms per dwelling
AGE       average number of owner-occupied units built prior to 1940
DIS       weighted distances to five Boston employment centres
RAD       index of accessibility to radial highways
TAX       full-value property-tax rate per $10,000
PTRATIO   pupil-teacher ratio by town
B         1000(Bk - 0.63)^2 (Bk represents proportion of blacks by town)
LSTAT     % lower status of the population
MEDV      median value of owner-occupied homes in $1000's

There are no missing attribute values and all the data generated is in numerical format.

The analysis of house price data to establish the effect of variations in locational and physical attributes has been attempted with the use of statistical techniques alone: the aim is usually to establish which attributes can then be used to synthesise valuations of a range of different properties. On their own, statistical techniques appear to have achieved only a limited degree of success despite the relatively complex calculations involved. The results derived can be open to different interpretations, and any additional 'knowledge or information' used by the expert in making the valuations is not readily apparent. We are considering whether numerical analysis and CBR can complement one another within a "hybrid system". Within such a system, statistical techniques can be used to perform Exploratory Data Analysis (EDA) on large datasets, after which CBR can be used; information derived from the initial statistical tests can act as input into any CBR system, particularly when applying appropriate weightings and developing qualitative models.

Statistical Techniques

By using correlational methods to identify relationships between the attributes, one can make more informed judgements about developing qualitative models and determining the respective weightings for attributes that might be related together in order to build 'virtual q-nodes'. A 'virtual q-node' is used at one level to summarise case data into groups which then become new attributes within the 'case representation'. The virtual q-nodes represent causal relationships between the attributes that have been combined to form the 'virtual q-node'. Being able to provide precise information about these relationships is not, however, a prerequisite. At the 'top level' the qualitative model is used to represent known causal relationships between case features that might affect a solution or outcome (Barletta, 93).

The correlation coefficient is generally used when we are concerned with relationships; however, the independent variable (X) usually has many quantitative levels (i.e. X1, X2, ..., Xi) and the experimenter is interested in showing that the dependent variable is some function of the independent variables (Howell, 87).

⁴ CHAS represents a binary attribute, all the rest being continuous.

In defining the respective weightings of the 'match fields', regression analysis is a method we intend to apply in order to estimate how good a predictor X1 is of Y in comparison to Xi (where i = 2, 3, ..., n).
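One way to read the comparison of predictors described above is to fit a separate one-variable regression of Y on each candidate match field and compare the proportion of variance explained. The following is a minimal sketch under that assumption; the function names `simple_regression` and `rank_predictors` and the toy numbers are hypothetical and are not taken from the Boston data or from Remind.

```python
from math import fsum


def simple_regression(x, y):
    """Fit y = a + b*x by least squares; return (a, b, r_squared)."""
    n = len(x)
    mean_x = fsum(x) / n
    mean_y = fsum(y) / n
    sxx = fsum((xi - mean_x) ** 2 for xi in x)
    syy = fsum((yi - mean_y) ** 2 for yi in y)
    sxy = fsum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # share of variance in y explained by x
    return intercept, slope, r_squared


def rank_predictors(columns, y):
    """Rank candidate match fields by the R^2 of a one-variable regression on y."""
    scores = {name: simple_regression(x, y)[2] for name, x in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data (NOT the Boston values): two predictors and an outcome.
toy = {
    "X1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "X2": [2.0, 1.0, 4.0, 3.0, 5.0],
}
outcome = [2.1, 3.9, 6.2, 7.8, 10.1]

for name, r2 in rank_predictors(toy, outcome):
    print(f"{name}: R^2 = {r2:.3f}")
```

The resulting ranking could serve as a starting point for relative weightings; a multiple regression would additionally account for overlap between predictors, which this sketch ignores.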

The technique of Principal Component Analysis (Pearson, 1901; Hotelling, 1933) attempts to achieve some degree of economy, in that within any respective CBR model 20 or 30 original variables might be adequately represented by a significantly smaller number of principal components, at a given level of statistical significance. The steps in a principal component analysis can be stated as:

[1] Make sure that the assumptions of 'a normal distribution' and 'homogeneity of variance' can be applied to the dataset.
[2] Calculate the covariance matrix

$$\mathrm{cov}_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N - 1}$$

This would be a correlation matrix if the assumptions for step 1 can be met.
[3] Find the eigenvalues λ1, λ2, ..., λp and the corresponding eigenvectors a1, a2, ..., ap. The coefficients of the ith principal component are then given by ai, while λi is its variance.
[4] Discard any components that only account for a small proportion of the variation in the data.

Modelling in Remind⁵

The correlation matrix shows simple positive and negative relationships for variables X1, ..., X13 in relation to MEDV (Y1). This allows one to make some initial decisions about appropriate match fields for the outcome field MEDV. Weightings can also be applied based on the strength of the respective positive or negative relationships that have been identified. Within qualitative models, the correlation matrix results can again be used to determine positive and negative relationships for the virtual nodes and, inevitably, for the outcome field. Simple models using this data have been implemented in Remind as a 'test bed'. One of the next stages is to carry out/implement a multiple regression model on the above data. In this way a more robust and reliable set of indicators and/or predictors would be available when implementing the underlying CART algorithm, accompanied by the appropriate qualitative models and symbol hierarchies (Breiman et al., 84).

Symbol hierarchies represent data that can be classified and ranked. Graphically, a symbol hierarchy is a branching structure of "parents" and "children" representing generalisations and specialisations. Through such a mechanism the system is provided with knowledge about the data within the domain (Barletta, 93).
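Returning to the principal component analysis steps [2] to [4] listed earlier in this section, the following is a minimal sketch of how they might be carried out. It uses only the Python standard library, finds the eigenvalues with cyclic Jacobi rotations (one of several possible methods, chosen here for brevity), and keeps components up to a 90% share of the variance. The 90% threshold, the function names and the toy observations are all assumptions for illustration; step [1], checking the distributional assumptions, is not covered.

```python
from math import atan, cos, fsum, pi, sin


def covariance_matrix(rows):
    """Step [2]: sample covariance matrix (divisor N - 1) of observation rows."""
    n, p = len(rows), len(rows[0])
    means = [fsum(row[j] for row in rows) / n for j in range(p)]
    return [[fsum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
             for j in range(p)]
            for i in range(p)]


def symmetric_eigen(matrix, sweeps=100, tol=1e-24):
    """Step [3]: eigenvalues and eigenvectors of a symmetric matrix (Jacobi rotations).

    Returns (eigenvalues, eigenvectors) sorted by decreasing eigenvalue;
    eigenvectors[k] holds the coefficients of the k-th principal component.
    """
    p = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(sweeps):
        off = fsum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:  # off-diagonal part has vanished: a is (numerically) diagonal
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if a[i][j] == 0.0:
                    continue
                diff = a[j][j] - a[i][i]
                theta = pi / 4 if diff == 0.0 else 0.5 * atan(2.0 * a[i][j] / diff)
                c, s = cos(theta), sin(theta)
                for m in (a, v):  # rotate columns i and j of A and of V
                    for k in range(p):
                        mi, mj = m[k][i], m[k][j]
                        m[k][i], m[k][j] = c * mi - s * mj, s * mi + c * mj
                for k in range(p):  # rotate rows i and j of A
                    ai, aj = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * ai - s * aj, s * ai + c * aj
    order = sorted(range(p), key=lambda k: a[k][k], reverse=True)
    values = [a[k][k] for k in order]
    vectors = [[v[r][k] for r in range(p)] for k in order]
    return values, vectors


def retained_components(values, min_share=0.90):
    """Step [4]: number of leading components needed to reach min_share of variance."""
    total = fsum(values)
    running = 0.0
    for count, value in enumerate(values, start=1):
        running += value
        if running / total >= min_share:
            return count
    return len(values)


# Hypothetical toy observations (NOT the Boston data): three correlated variables.
toy_rows = [
    [2.5, 2.4, 1.2],
    [0.5, 0.7, 0.3],
    [2.2, 2.9, 1.0],
    [1.9, 2.2, 0.8],
    [3.1, 3.0, 1.6],
    [2.3, 2.7, 1.1],
]
cov = covariance_matrix(toy_rows)
eigenvalues, eigenvectors = symmetric_eigen(cov)
keep = retained_components(eigenvalues)
print("variances of components:", [round(x, 4) for x in eigenvalues])
print("components kept:", keep)
```

Standardising each variable before calling `covariance_matrix` would yield the correlation matrix instead, which is the usual choice when attributes are measured on very different scales, as they are in the housing data.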

⁵ Remind™: Solutions from prior experience: A case-based reasoning development shell. Copyright © 1992, Cognitive Systems, Inc.

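[Figure: strength of the relationship between each attribute and MEDV; (-) marks a negative relationship. Legible values: TAX 0.46(-), PTRATIO 0.50(-), LSTAT 0.74(-), CRIM 0.38(-), INDUS 0.48(-), CHAS 0.18. The values 0.70, 0.30, 0.30(-) and 0.36 have lost their attribute labels.]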

Nearest Neighbour retrievals involve the use of the importance editor within which weight vectors can be specified for the assigned 'match fields' (Watson, 94).

Fig 3. A Nearest Neighbour Algorithm
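The role of the weight vector in nearest-neighbour retrieval can be illustrated with a commonly used scoring form: a weighted mean of per-field similarities, sum(w_i * sim_i) / sum(w_i). Whether this matches the formula in Fig 3 or the internals of Remind exactly is an assumption. In the sketch below the three weights are the correlation magnitudes quoted in this paper for LSTAT, PTRATIO and INDUS; the field ranges, the stored cases and the query are hypothetical illustration data.

```python
def field_similarity(a, b, value_range):
    """Similarity in [0, 1] for one numeric match field, scaled by the field's range."""
    return 1.0 - min(abs(a - b) / value_range, 1.0)


def weighted_similarity(query, case, weights, ranges):
    """Weighted mean of per-field similarities: sum(w_i * sim_i) / sum(w_i)."""
    total = sum(weights.values())
    score = sum(w * field_similarity(query[f], case[f], ranges[f])
                for f, w in weights.items())
    return score / total


def retrieve(query, case_base, weights, ranges, k=1):
    """Return the k stored cases most similar to the query, best first."""
    ranked = sorted(case_base,
                    key=lambda c: weighted_similarity(query, c, weights, ranges),
                    reverse=True)
    return ranked[:k]


# Weights: absolute correlation strengths quoted in the paper for three fields.
weights = {"LSTAT": 0.74, "PTRATIO": 0.50, "INDUS": 0.48}
# Everything below is hypothetical illustration data, not taken from the dataset.
ranges = {"LSTAT": 40.0, "PTRATIO": 10.0, "INDUS": 30.0}
case_base = [
    {"id": "A", "LSTAT": 5.0, "PTRATIO": 15.0, "INDUS": 3.0, "MEDV": 33.0},
    {"id": "B", "LSTAT": 20.0, "PTRATIO": 20.0, "INDUS": 18.0, "MEDV": 14.0},
    {"id": "C", "LSTAT": 10.0, "PTRATIO": 17.0, "INDUS": 8.0, "MEDV": 23.0},
]
query = {"LSTAT": 9.0, "PTRATIO": 17.5, "INDUS": 7.0}

best = retrieve(query, case_base, weights, ranges, k=1)[0]
print(best["id"], best["MEDV"])
```

The outcome value of the retrieved case (here its MEDV) would then be the starting point for a valuation, to be adjusted for the remaining differences between the query and the stored case.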

Accessibility (5%) RAD = -0.38

Neighbourhood Aesthetic (11%) ZN = 0.36

CHAS = 0.18 DIS = 0.30 LSTAT = -0.74

These tables represent the appropriate weightings and the 'virtual q-nodes'. Although this represents work in progress, accuracy appears to improve when the weightings for the respective virtual q-nodes are increased. It is apparent that retrieval and indexing techniques can be incorporated into the system at varying levels of complexity through the use of qualitative models, virtual q-nodes and prototypes, all of which can be used to represent 'domain specific knowledge' and to accommodate knowledge-guided induction within the retrieval process. Past usage of this data has been made in the area of 'Combining Instance-Based and Model-Based Learning' (Quinlan, 93a).

The second dataset is a library of 34 cases from the "Inland Revenue Valuation Office (Southern Region)". The variables in the dataset include: [1] Location, [2] VO Code, [3] Type, [4] Format, [5] Constructed, [6] Reduced Covered Area, [7] Central Heating, [8] Garage, [9] Car space and the outcome variable [10] Value. This data has been used for valuation systems using 'artificial neural networks' (Evans, 92; Tay, 92). We are using this dataset within Remind and C4.5 (Quinlan, 93b) to further assess the effectiveness of case-based reasoning for a 'decision support system' within the respective domain.

Summary

The main purpose of this project is the evaluation of a 'hybrid system' that would involve the use of a mathematical model, namely principal component analysis (PCA), with the results being applied to a CBR system incorporating CART, ID3, C4.5 (and their respective derivatives). Steps [1] and [2] of PCA have been applied, and the results have then to some extent been applied to a CBR tool, Remind, which incorporates CART as an underlying algorithm in its 'Inductive Retrieval Engine'. Within the domain of residential valuation, the tentative results to date indicate that this could be used in the development of a 'decision support system' for applications to determine taxation valuation, particularly the new Council Tax, or for loan security purposes. It could be used as an additional tool in the valuation process, within which the system could gather comparables and adjust for differences relating to specific indicators. Such a system might also identify patterns based on similarities, interdependencies and relationships between pre-determined identifiers within the data. This might be useful for preliminary valuation prior to inspection. It would highlight non-conforming figures for further investigation and in some cases suggest 'a figure on which to work'. This might be suitable for application where bulk valuations might be required.

References

[1] Agnar Aamodt and Enric Plaza, Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches. AICOM Vol.7 Nr.1, 39-59. March 1994.
[2] Barletta, R. et al., ReMind: Developer's Reference Manual. Cognitive Systems Inc., 1993.
[3] Breiman, L. et al., Classification and Regression Trees. Belmont, CA: Wadsworth, 1984.
[4] Dattani, I. and Bramer, M.A., Case-Based Reasoning: Theoretical Principles, Development tools and the prospects for applications. Artificial Intelligence Research Group, University of Portsmouth: Technical Report, 1995.
[5] Evans, A., James, H. and Collins, A., Artificial Neural Networks: an application to Residential Valuation in the UK. Journal of Property Valuation and Investment: 11, 195-204, Computer Briefing, 1992.
[6] Harrison, D. and Rubinfeld, D.L., 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management: Vol.5, 81-102, 1978.
[7] Hotelling, H., Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology 24, 417-441, 498-520, 1933.
[8] Howell, D.C., Statistical Methods for Psychology, 2nd Edition. PWS-Kent Publishing Company, 1987.
[9] Pearson, K., On lines and planes of closest fit to a system of points in space. Philosophical Magazine 2, 557-72, 1901.
[10] Quinlan, R., Combining Instance-Based and Model-Based Learning. Proceedings of the Tenth International Conference on Machine Learning, 236-243. Morgan Kaufmann Pub Inc, 1993a.
[11] Quinlan, R., C4.5: Programs for Machine Learning. Morgan Kaufmann Pub Inc, 1993b.
[12] Schank, R., Dynamic memory; a theory of reminding and learning in computers and people. Cambridge University Press, 1982.
[13] Tay, D.P.H. and Ho, D.K.K., Artificial Intelligence and the Mass Appraisal of Residential Apartments. Journal of Property Valuation and Investment: 10, 2, 1991/2, 525-540.
[14] Watson, I., The Case for Case-Based Reasoning in Engineering Decision Support. Proceedings of Information Technology Awareness in Engineering: Informing Technologies to Support Engineering Decision Making (edited by James A. Powell), 55-64. Institute of Civil Engineers, London. November 1994.
[15] Watson, I. and Marir, F., Case-Based Reasoning: A Review. The Knowledge Engineering Review: Vol.9, No.4, 1994.

© 1995 The Institution of Electrical Engineers. Printed and published by the IEE, Savoy Place, London WC2R 0BL, UK.
