
LEVERAGING OPERATIONAL DATA FOR INTELLIGENT DECISION SUPPORT IN CONSTRUCTION EQUIPMENT MANAGEMENT

Research Proposal for the Degree of Doctor of Philosophy in Construction Engineering and Management

Hongqin Fan

Hole School of Construction Engineering, Department of Civil and Environmental Engineering

University of Alberta, February 24th, 2006

Co-supervised by

Dr. Simaan M. AbouRizk, Professor

Dept. of Civil and Environmental Engineering, The University of Alberta, Edmonton, Canada T6G 2W2,

3-133 Markin/CNRL Natural Resources Engineering Facility, Tel: +1 780/492-4235, Fax: +1 780/492-0249,

[email protected]

& Dr. Hyoungkwan Kim

Assistant Professor, School of Civil and Environmental Engineering, Yonsei University,

134 Shinchon-dong, Seodaemun-gu, Seoul 120-749, Korea, Tel: +82 2/2123-2799, Fax: +82 2/364-5300,

[email protected]



INTRODUCTION

Computerized construction equipment management has greatly simplified daily tasks such as equipment tracking, maintenance, repair and operations. Recent developments in computer hardware and software, coupled with modern controls on construction equipment, further empower contractors with automatic data collection, integrated systems, parameter-driven reporting and the automation of some managerial functions.

Research in construction equipment management has mainly focused on automation and robotic technologies, real-time data communications and information processing [Arditi et al. 1997; Chen and Liew 2003; LeBlond et al. 1998; Schexnayder and David 2002], and statistical data analysis for decision support [Gillespie and Hyde 2004; Lucko and Vorster 2002]. While commercial solutions for computerized construction equipment management are being developed at a fast pace, research in this area remains limited because such systems are designed primarily to replace routine tasks in equipment management.

This research is motivated by two problems faced by large contractors in equipment management. Firstly, the data collected by various computer systems and applications tend to be noisy, heterogeneous and scattered; as a result, retrieving information from the data is difficult. Secondly, there are no handy yet powerful computer tools for automatically uncovering implicit knowledge from the collected data for decision support. Current construction equipment management systems, although capable of generating a wide variety of customized reports, provide limited features for information retrieval and knowledge discovery.

The solutions proposed for these problems are data warehousing and data mining, two emerging technologies in computer science. Data warehousing consolidates and re-organizes enterprise data into a centralized data repository for efficient data analysis and information retrieval; its other benefits include improved data quality, integrated data, and an analysis-friendly structure. Data mining can automatically discover hidden rules, patterns or unusual behavior in the data and represent that knowledge explicitly to the user; it can also predict the future occurrence of events. From a technological viewpoint, both data warehousing and data mining techniques can be integrated seamlessly with the current equipment management system.

In partnership with a large road building contractor in Canada and based on its current construction equipment management system, “MTrack” developed by the NSERC/Alberta Construction Research Chair [NSERC/Alberta Construction Research Chair 2005], this research will accomplish the following objectives:

1. Build a prototype construction equipment data warehouse as the enterprise data source for decision support. Explore the opportunities and challenges at different stages of data warehousing, including planning, design and implementation, for equipment management.

2. Design and test a novel nonparametric outlier mining algorithm for generic problem detection in construction equipment data, as well as other engineering data. Test, evaluate and modify current data mining algorithms for decision support in construction equipment management.

3. Design and implement a prototype intelligent equipment management system using an integrated equipment data warehouse and embedded data mining models; make recommendations on system planning and design.

PROBLEM STATEMENT AND RESEARCH MOTIVATION

Academic research and industrial development in construction equipment management are largely focused on the operational level. Examples include automated control of equipment operations, real-time data collation and diagnostics, computer-aided equipment maintenance and repair control, order processing and inventory control. These technologies enable contractors to capture operational data and obtain various summary reports on equipment management efficiently. Nevertheless, the usability of the large amounts of data collected is undermined by a number of problems, as stated below:

1. Data quality is generally poor. Data in some applications or information systems, especially legacy systems, contain considerable noise due to entry errors and the lack of a mechanism for validating data input.

2. Data are scattered across different systems, applications, or departments, though they characterize the same domain problem.

3. Data are not stored in a structure efficient for data analysis. Most data are stored in relational databases, spreadsheets, text files etc. Answering unanticipated business questions based on these data repositories is technically challenging.

4. Advanced computer tools for automatically discovering knowledge from the data are lacking. Hidden rules, patterns or irregularities in the data are commonly uncovered by equipment management staff using statistical tools in a trial-and-error approach.

Most data generated in construction equipment management operations are stored in relational databases. Built on a relational database model, transactional systems such as an equipment management system are designed for efficient capture of operational data. These process-oriented systems ensure that data are added and updated efficiently during daily operations; however, they do not perform well for sophisticated data analyses. Extracting information from a transactional database requires building queries across different database objects, a task that usually calls for a database specialist. Using an operational system for decision support becomes even more inefficient with today’s increasing data volume and complexity. To tackle this problem, data warehousing technology is employed to re-package the data and present them in an integrated data repository using a multi-dimensional data model.

Compared to a relational database, the data warehouse has two distinct features that facilitate dynamic decision support: subject orientation and a multi-dimensional structure. Subject orientation means that each data model centers on one subject, such as work order cost or fuel consumption, and contains all the information on that subject. The multi-dimensional model has a star-shaped structure, with a fact table in the center and a number of dimension tables surrounding and connecting to it. Such a data structure makes it possible to analyze each subject along any combination of dimensions and at various granularities. With a single equipment data warehouse, the data collected across the enterprise are scrubbed, integrated and re-structured; the data warehouse can then answer various business questions through simple point-and-click and other visual operations. It also serves as a universal data source for automated knowledge discovery and other analysis tasks on equipment data.

Knowledge buried in the data is a valuable asset of a contractor. Traditional approaches such as statistical analysis, visualization, and mathematical modeling become inefficient for large amounts of data. Our interview with the collaborating contractor found that only a small portion of the data collected in equipment management is used for direct decision support, due to a “lack of tools”. There is an urgent need to convert data and scraps of information into knowledge using automated approaches.


Data mining is an interdisciplinary field at the confluence of statistics, machine learning, database technology, information science, etc., and is capable of “extracting interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases” [Han and Kamber 2000]. Depending on its purpose, data mining is categorized as descriptive or predictive. The former helps to better understand the data by uncovering the relationships and patterns in the database, while the latter is used to generate data-driven models for prediction, classification, forecasting, etc.

A data mining model has the following advantages over traditional mathematical models and expert systems. Firstly, the model is data-driven: it is obtained from data using specific algorithms, and is therefore based on derived facts rather than expert opinions or personal experience. Secondly, a data mining model may be the only viable solution when the system is too complex to be described by other models.

Even though data mining is a well-researched area and has been applied in various industries, applying data mining techniques to construction equipment management faces some specific challenges, such as noisy data, dynamic changes of data, lack of pre-labelled data, and its exploratory nature. This research will select some data mining algorithms and equipment management problems for in-depth investigation.

One of the data mining tasks in this research is outlier mining. Hawkins defines an outlier as “an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism” [Hawkins 1980]. Searching, sorting and ranking outliers in an equipment database can identify problems in equipment field operations, equipment performance, management decisions, etc. However, neither traditional statistical methods nor current outlier mining algorithms provide flexible and reliable solutions when applied to real-world datasets, because of their stringent assumptions about data distributions or the sensitivity of their results to input parameters. Based on the idea of resolution change used by Foss and Zaiane [2002] in a non-parametric clustering algorithm, this research will explore the design and implementation of a non-parametric outlier mining algorithm for generic problem detection in engineering data, such as equipment data.

Predicting a continuous variable from a number of known attributes, of either categorical or continuous value, is a common problem in construction equipment management. Traditional solutions, such as statistical analysis and artificial neural networks, suffer from problems such as inaccurate results, black-box models or difficult system integration. The AutoRegressive Tree (ART) technique proposed by Meek et al. [2002] offers a way to overcome these problems by using a transparent data mining model. Other data mining tasks, such as time-series forecasting of equipment cost, will also be studied in this research.

LITERATURE REVIEW

A comprehensive review of construction equipment automation was conducted by Chen and Liew [2003]. The authors pointed out that automation and robotic technologies have been an important research area in construction since 1980, aimed at overcoming problems in safety, quality, productivity and competition. Other sources have also reported on innovation in construction equipment and its influence on the industry [LeBlond et al. 1997; Jahren 2000].

A pilot research project investigating the application of data warehousing technology in construction was conducted by Chau et al. [2002] at the University of Hong Kong. The authors built a decision support system based on Online Analytical Processing (OLAP) for inventory management of construction materials. Ma et al. [2005] applied a data warehousing technique to improve document management in construction for multi-party, multi-purpose use.

In outlier mining, Knorr and Ng first introduced the distance-based outlier and the DB(p,D)-outlier mining algorithm [Knorr and Ng 1998], which can deal efficiently with large, multi-dimensional datasets. The problem with the DB-outlier definition is that it cannot cope with datasets containing clusters of significantly different densities. To overcome this deficiency, Breunig et al. proposed the Local Outlier Factor (LOF) and an LOF-based outlier mining algorithm [Breunig et al. 2000]. Other researchers have suggested improvements to the LOF method, such as the Connectivity-based Outlier Factor (COF) proposed by Tang et al. [2002] for cases of low-density patterns.
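The distance-based definition can be illustrated with a minimal Python sketch. It assumes the usual statement of the definition, namely that an object is a DB(p,D)-outlier when at least a fraction p of the objects in the dataset lie farther than distance D from it. The naive pairwise scan, the choice to count only the other objects, and the sample readings are assumptions made for illustration; this is not the efficient algorithm of Knorr and Ng.

```python
import math

def db_outliers(points, p, D):
    """Return indices of DB(p, D)-outliers by a naive O(n^2) scan.

    A point is flagged when at least a fraction p of the *other* points
    lie farther than distance D from it (illustrative simplification).
    """
    n = len(points)
    flagged = []
    for i, a in enumerate(points):
        far = sum(
            1 for j, b in enumerate(points)
            if j != i and math.dist(a, b) > D
        )
        if far >= p * (n - 1):
            flagged.append(i)
    return flagged

# Hypothetical two-attribute readings; the last one sits far from the rest.
readings = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (8.0, 9.0)]
outlier_indices = db_outliers(readings, p=0.9, D=2.0)
print(outlier_indices)
```

Because p and D are global, a single setting cannot suit both a dense and a sparse cluster in the same dataset, which is the limitation that motivates the density-based definitions.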

Many studies in the construction industry have explored the application of data mining techniques for data analysis and decision support. Soibelman and Kim [2002] conducted systematic research on data preparation and the entire Knowledge Discovery in Databases (KDD) process for construction knowledge generation; as an example, the researchers applied the decision tree algorithm C4.5 to evaluate construction delays in pipeline installation. Caldas et al. [2002] proposed an automated approach for classifying construction documents through the integration of a model-based information system with a Support Vector Machine (SVM). Lu et al. [2000] applied an artificial neural network to estimate construction productivity; Wilmot and Mei [2005] conducted research on highway cost estimation using a neural network; Lee et al. [2004] investigated the application of decision trees to classify and quantify the cumulative impact of change orders on productivity.

RESEARCH METHODOLOGY

An intelligent construction equipment management system based on MTrack is proposed as a workbench for the research. Firstly, an in-depth investigation will be conducted into the techniques and challenges of building an enterprise-wide equipment data warehouse from disparate operational data sources; secondly, automated knowledge discovery from operational data for decision support will be explored using various data mining techniques.

Data warehousing refers to all the processes needed to build up and implement the data warehouse. The procedures for data warehousing include: (1) Identification of data sources - the data may come from different operational systems, applications, or flat files; (2) Data staging, which usually involves data Extraction, Transformation and Loading (ETL) from the heterogeneous sources to a consolidated data warehouse; (3) Presentation of the data to the user through data access tools. The three steps are illustrated in Figure 1.

Figure 1. Procedures for building equipment data warehouse
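The data staging step can be sketched in a few lines of Python. The sketch below assumes a hypothetical flat-file export of work orders; the column names, the validation rules and the staging table `repair_staging` are illustrative assumptions and do not describe the MTrack schema.

```python
import csv
import io
import sqlite3

# Hypothetical legacy export; all column names and values are assumptions.
LEGACY_CSV = """unit_id,date,labor_hours,parts_cost
E-101,2005-03-01,4.5,120.00
e-101 ,2005-03-02,1.0,80.50
E-207,2005-03-02,,40.00
E-207,2005-03-05,-2.0,15.25
"""

def extract(text):
    """Extraction: read raw rows from one source (here, a flat file)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transformation: standardize keys and reject rows that fail validation."""
    clean, rejected = [], []
    for row in rows:
        try:
            hours = float(row["labor_hours"])
            parts = float(row["parts_cost"])
        except ValueError:
            rejected.append(row)  # missing or non-numeric measurement
            continue
        if hours < 0 or parts < 0:
            rejected.append(row)  # negative values treated as entry errors
            continue
        unit = row["unit_id"].strip().upper()
        clean.append((unit, row["date"], hours, parts))
    return clean, rejected

def load(conn, records):
    """Loading: write the cleaned records into a staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS repair_staging "
        "(unit_id TEXT, work_date TEXT, labor_hours REAL, parts_cost REAL)"
    )
    conn.executemany("INSERT INTO repair_staging VALUES (?, ?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
clean_rows, rejected_rows = transform(extract(LEGACY_CSV))
load(conn, clean_rows)
loaded = conn.execute("SELECT COUNT(*) FROM repair_staging").fetchone()[0]
print(loaded, "rows loaded;", len(rejected_rows), "rows rejected")
```

Keeping the rejected rows rather than silently dropping them gives a record of data quality problems in each source, which bears on the noise issue raised in the problem statement.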

The architectural design and multi-dimensional modeling of an equipment data warehouse will also be explored in this research.

Architectural design: High level planning and design of the equipment data warehouse adopts the Data Warehouse Bus (DWB) Architecture proposed by Kimball and Ross [2002]. A bus matrix depicting the whole picture of the data warehouse is used to identify subjects for operational processes within the enterprise and to obtain a master suite of standardized dimensions and facts that are uniformly interpreted across the enterprise [Kimball and Ross 2002].
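As a rough illustration of how a bus matrix is read, the Python sketch below lists subjects against the dimensions they use and picks out the dimensions shared by more than one subject, which are the candidates for standardization. The dimensions of “Repair Cost” follow the data model shown for that subject in this proposal; the other two rows are hypothetical assumptions, not results of the planned design work.

```python
# Bus matrix as a mapping: subject -> dimensions it uses.
# Only the "Repair Cost" row is taken from the proposal; the rest are assumed.
bus_matrix = {
    "Repair Cost": {"Time", "Equipment", "Department", "Manufacturer", "RepairCostType"},
    "Work Order Cost": {"Time", "Equipment", "Department"},
    "Fuel Consumption": {"Time", "Equipment"},
}

def conformed_dimensions(matrix):
    """Return the dimensions used by two or more subjects, sorted by name."""
    counts = {}
    for dimensions in matrix.values():
        for dimension in dimensions:
            counts[dimension] = counts.get(dimension, 0) + 1
    return sorted(name for name, count in counts.items() if count >= 2)

shared = conformed_dimensions(bus_matrix)
print(shared)
```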

Multidimensional modeling: Multidimensional data models are to be designed based on the data warehouse bus matrix. The preferred model structure is the star schema, which has a measurement fact table at the center and all the associated dimensions arranged around it. In the data warehouse, the star schema represents a subject of interest as a data cube, with all the numerical measurements in the central fact table and all the descriptive attributes in the surrounding dimension tables. Questions of when, where, who, etc., can be answered after the schema is transformed into a dimensional data cube. Proper modeling of each data cube, with its underlying fact table and dimension tables, enables comprehensive data analysis of an individual subject. Collectively, all the stars in the system provide an integrated view of equipment management performance.

Figure 2 shows the multidimensional data model for the subject “Repair Cost”, where the fact table in the center consolidates all cost measurements while the surrounding dimensions contain different descriptive attributes with various levels of detail.

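Dimension table the_Manufacturer: PK man_wk; attributes manID, Manufacturer

Dimension table the_Department: PK Dpt_wk; attributes DptID, Department, DivID, Division

Dimension table the_RepairCostType: PK rcType_wk; attributes rcTypeID, rcType

Dimension table the_Equipment: PK unit_wk; attributes unitID, Unit, ClassID, Class, CatID, Category

Dimension table the_Time: PK time_wk; attributes timeID, the_Day, the_DayofWeek, the_Week, the_Month, the_Quarter, the_Year

Fact table RepairCostFactTable: keys time_wk (PK, FK2), man_wk (PK, FK1), Dpt_wk (PK, FK3), rcType_wk (PK, FK4); measures Labor hours, Labor dollar amount, Parts dollar amount, Total dollar amount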

Figure 2. Multidimensional data model for subject “Repair Cost”
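A reduced version of this star schema can be sketched in Python with an in-memory SQLite database, to show how one fact table supports analysis at different granularities. Only two of the dimensions are included, the measure column names `labor_hours` and `total_dollar` are adapted from the figure labels, and all rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE the_Time (
    time_wk INTEGER PRIMARY KEY, the_Month INTEGER,
    the_Quarter INTEGER, the_Year INTEGER);
CREATE TABLE the_Department (
    Dpt_wk INTEGER PRIMARY KEY, Department TEXT, Division TEXT);
CREATE TABLE RepairCostFactTable (
    time_wk INTEGER REFERENCES the_Time(time_wk),
    Dpt_wk INTEGER REFERENCES the_Department(Dpt_wk),
    labor_hours REAL,
    total_dollar REAL);
""")
# Made-up sample rows for illustration only.
conn.executemany("INSERT INTO the_Time VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 2005), (2, 2, 1, 2005), (3, 4, 2, 2005)])
conn.executemany("INSERT INTO the_Department VALUES (?, ?, ?)",
                 [(1, "Paving", "Roads"), (2, "Grading", "Roads")])
conn.executemany("INSERT INTO RepairCostFactTable VALUES (?, ?, ?, ?)",
                 [(1, 1, 3.0, 450.0), (2, 1, 1.5, 200.0),
                  (2, 2, 4.0, 900.0), (3, 2, 2.0, 300.0)])

def repair_cost_by(level):
    """Roll up total repair cost by department at a chosen time granularity."""
    if level not in ("the_Month", "the_Quarter", "the_Year"):
        raise ValueError("unsupported time level")
    sql = f"""
        SELECT d.Department, t.{level}, SUM(f.total_dollar)
        FROM RepairCostFactTable AS f
        JOIN the_Time AS t ON t.time_wk = f.time_wk
        JOIN the_Department AS d ON d.Dpt_wk = f.Dpt_wk
        GROUP BY d.Department, t.{level}
        ORDER BY d.Department, t.{level}
    """
    return conn.execute(sql).fetchall()

by_quarter = repair_cost_by("the_Quarter")
by_year = repair_cost_by("the_Year")
print(by_quarter)
print(by_year)
```

Changing the grouping column is all that is needed to move between month, quarter and year, which is the kind of analysis along dimensions and granularities that the star schema is meant to make easy.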

A non-parametric outlier mining algorithm will be proposed in this research for generic problem detection in engineering data. Based on the clustering algorithm TURN* [Foss and Zaiane 2002], an outlier factor, the Resolution Outlier Factor (ROF), and an ROF-based outlier mining algorithm will be studied and tested using both synthetic and real-world datasets. Instead of tracking the statistical properties of each cluster, the ROF-based algorithm will track the degree of isolation of each data point based on its behavior in merging into its neighboring clusters during resolution change. Preliminary test results show that the ROF definition and the ROF-based algorithm perform better on real-world datasets than the current distance-based DB(p,D)-outlier [Knorr and Ng 1998] and density-based LOF [Breunig et al. 2000] mining algorithms. Figure 3 shows the proposed flowchart for the algorithm.

Figure 3. ROF-based outlier mining algorithm

An example of the other data mining algorithms to be investigated is the AutoRegressive Tree (ART) technique proposed by Meek et al. [2002]. ART is a hybrid of a decision tree model and regression models, in which a regression model is built at each leaf node of the decision tree to predict a continuous target variable. ART will be tested and evaluated for real-time evaluation of “estimated work orders” and time-series forecasting of equipment cost. This research will investigate and try to solve the technical problems associated with applying these algorithms. How should the algorithm parameters be tuned to get the best results? Are there similar algorithms that provide better performance? Is it possible to improve the results by modifying the algorithm?
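The general idea of a tree with regression models at its leaves can be shown with a deliberately small Python sketch. It is not the ART learning procedure of Meek et al.: it assumes a single split on the previous value of the series, chooses the threshold by least squared error, and fits a straight line in each leaf. The function names and the synthetic series are illustrative assumptions.

```python
import numpy as np

def fit_leaf(x, y):
    """Least-squares line y = a + b*x for the samples in one leaf."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def leaf_error(x, y, coef):
    """Sum of squared residuals of a leaf model."""
    return float(np.sum((y - (coef[0] + coef[1] * x)) ** 2))

def fit_one_split_tree(series, min_leaf=3):
    """Pick the threshold on the previous value that minimizes the total
    squared error of two per-leaf linear models (one split only)."""
    series = np.asarray(series, dtype=float)
    x, y = series[:-1], series[1:]  # previous value -> next value
    best = None
    for threshold in np.unique(x)[:-1]:
        left = x <= threshold
        if left.sum() < min_leaf or (~left).sum() < min_leaf:
            continue
        c_left = fit_leaf(x[left], y[left])
        c_right = fit_leaf(x[~left], y[~left])
        err = leaf_error(x[left], y[left], c_left) + leaf_error(x[~left], y[~left], c_right)
        if best is None or err < best[0]:
            best = (err, float(threshold), c_left, c_right)
    if best is None:
        raise ValueError("series too short for a split")
    return best[1:]

def predict_next(model, last_value):
    """Route the last observation to a leaf and apply that leaf's line."""
    threshold, c_left, c_right = model
    a, b = c_left if last_value <= threshold else c_right
    return float(a + b * last_value)

# Synthetic series with two regimes: +3 while the value is <= 10, else 0.5*v - 4.
series = [1.0]
for _ in range(17):
    v = series[-1]
    series.append(v + 3 if v <= 10 else 0.5 * v - 4)

model = fit_one_split_tree(series)
print(model[0], predict_next(model, 4.0), predict_next(model, 12.0))
```

Each leaf exposes an explicit equation, so the fitted model can be read and checked directly, which is the sense in which such a model is transparent rather than a black box.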

From the perspective of system implementation, the proposed equipment data warehouse will work as an “add-on” data source alongside the current transactional databases, and all the data mining algorithms will be integrated with the current equipment management system as “plug-in” components, as shown in Figure 4. The computer technologies for the high-level system integration are Microsoft Data Mining Extensions (DMX) and communication protocols such as OLE DB for Data Mining [Tang and MacLennan 2005].

EXPECTED CONTRIBUTIONS

Unlike other studies on the application of data warehousing and data mining technologies in construction, I will focus primarily on the algorithmic level of data mining, proposing novel data mining algorithms and tailoring current algorithms for mining engineering data. At the same time, I will address problems in the conceptual design of an equipment data warehouse, as well as system integration between data warehousing/data mining and the current equipment management system. The latter will greatly benefit the construction industry by facilitating the transfer of data warehousing and data mining technologies. The expected contributions of my research are as follows:

1. This research will provide guidelines for applying data warehousing technology to construction equipment management for improved decision support. These include the opportunities, challenges, and suggestions for planning and design of an equipment data warehouse, as well as software implementation.

2. A novel non-parametric outlier mining algorithm is proposed for generic problem detection in both equipment management and other engineering applications. This will contribute to the body of knowledge in the data mining community.

3. A number of current data mining algorithms, such as the family of decision tree algorithms, will be tested, evaluated and modified for intelligent decision support in construction equipment management. This research will report my findings and make recommendations on the general application of data mining technology in construction equipment management.

4. This research will summarize and make recommendations on the architectural design and implementation of an intelligent equipment management information system using combined data warehousing/data mining techniques, to meet industrial expectations.

Figure 4. Proposed intelligent equipment management system

REFERENCES

Arditi, D., Kale, S. and Tangkar, M. (1997). “Innovation in construction equipment and its flow into the construction industry.” J. Constr. Engrg. and Mgmt., ASCE, 123(4), 371-378.

Breunig, M., Kriegel, H., Ng, R., and Sander, J. (2000). “LOF: identifying density-based local outliers.” Proceedings of ACM SIGMOD 2000 International Conference on Management of Data, Dallas, TX, USA.

Caldas, C. H., Soibelman, L. and Han, J. (2002). “Automated Classification of Construction Project Documents.” ASCE Journal of Computing in Civil Engineering, 16(4), 234-243.

Chau, K.W., Cao, Y., Anson, M., and Zhang, J. (2002). “Application of Data Warehouse and Decision Support System in Construction Management.” Automation in Construction, 12, 213–224.

Chen, W. F. and Liew, J. R. (2003). Civil Engineering Handbook, second edition, Chapter 6. CRC Press, Florida, USA.

Foss, A., and Zaiane, O. (2002). “A parameter-less method for efficiently discovering clusters of arbitrary shape in large datasets.” Proceedings of 2002 IEEE International Conference on Data Mining (ICDM'02), Maebashi City, Japan.

Gillespie, J.S. and Hyde, A.S. (2004). The Replace/Repair Decision for Heavy Equipment. Virginia Transportation Research Council, Final report VTRC 05-R8.

Han, J. and Kamber, M. (2000). Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, August 2000.

Hawkins, D. (1980). Identification of Outliers. Chapman and Hall, London.

Jahren, C.T. (2000). “Transportation construction equipment.” Transportation in the New Millennium, TRB Annual meeting, January 2000.

Kimball, R. and Ross, M. (2002). The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, second edition, John Wiley & Sons, Inc., New York, pp. 13–88.

Knorr, E., and Ng, R. (1998). “Algorithms for mining distance-based outliers in large datasets.” Proceedings of Very Large Data Bases Conference, New York, USA.

LeBlond, D., Owen, F., Gibson, G. E., Haas, C. T. and Traver, A.E. (1998). “Control improvement for advanced construction equipment.” J. Constr. Engrg. and Mgmt., ASCE, 124(4), 289-296.

Lee, M., Hanna, A.S. and Loh, W.Y. (2004). “Decision Tree Approach to Classify and Quantify Cumulative Impact of Change Orders on Productivity.” J. Comp. in Civ. Engrg., ASCE, 18(2), 132

Lu, M., AbouRizk, S.M. and Hermann, U.H. (2000). “Estimating labor productivity using probability inference neural network.” J. Comp. in Civ. Engrg., ASCE, 14(4), 241-248.

Lucko, G. and Vorster, M.C. (2002). “Predicting the Residual Value of Heavy Construction Equipment.” Proceedings of the 4th Joint International Symposium on Information Technology in Civil Engineering 2003, Tennessee, USA.

Ma, Z., Wong, K.D., Heng, L. and Jun, Y. (2005). “Utilizing exchanged documents in construction projects for decision support based on data warehousing technique.” Automation in Construction, 14(3), 405-412.

NSERC/Alberta Construction Research Chair, (2005). http://irc.construction.ualberta.ca/ html/research/M-Track1.html

Schexnayder, C.J. and David S.A. (2002). “Past and Future of Construction Equipment—Part IV” J. Constr. Engrg. and Mgmt., ASCE, 128(4),279-286

Soibelman, L. and Kim, H. (2002). “Data Preparation Process for Construction Knowledge Generation through Knowledge Discovery in Databases.” ASCE Journal of Computing in Civil Engineering, 16(1), 39-48.

Tang, J., Chen, Z., Fu, A., and Cheung, D. (2002). “Enhancing effectiveness of outlier detections for low density patterns.” Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, Taipei, Taiwan, pp. 535-548.

Tang, Z. and MacLennan, J. (2005). Data Mining with SQL Server 2005. Wiley Publishing, Inc. Indianapolis, USA.

Wilmot, C.G. and Mei, B. (2005). “Neural Network Modeling of Highway Construction Costs.” J. Constr. Engrg. and Mgmt., ASCE, 131(7), 765-771.