16

Introduction to Data Mining with Case Studies · 9.8 Guidelines for OLAP Implementation447 9.9 OLAP Software 448 Summary 449 Review Questions450 Exercises450 Multiple Choice Questions450

  • Upload
    others

  • View
    74

  • Download
    6

Embed Size (px)

Citation preview

Page 1: Introduction to Data Mining with Case Studies · 9.8 Guidelines for OLAP Implementation447 9.9 OLAP Software 448 Summary 449 Review Questions450 Exercises450 Multiple Choice Questions450
Page 2: Introduction to Data Mining with Case Studies · 9.8 Guidelines for OLAP Implementation447 9.9 OLAP Software 448 Summary 449 Review Questions450 Exercises450 Multiple Choice Questions450

Introduction to Data Mining with Case Studies

Page 3: Introduction to Data Mining with Case Studies · 9.8 Guidelines for OLAP Implementation447 9.9 OLAP Software 448 Summary 449 Review Questions450 Exercises450 Multiple Choice Questions450
Page 4: Introduction to Data Mining with Case Studies · 9.8 Guidelines for OLAP Implementation447 9.9 OLAP Software 448 Summary 449 Review Questions450 Exercises450 Multiple Choice Questions450

Introduction to

Data Mining with

Case StudiesTHIRD EDITION

G.K. GUPTAAdjunct Professor of Computer Science

Monash UniversityClayton, Australia

Delhi-1100922014

Page 5: Introduction to Data Mining with Case Studies · 9.8 Guidelines for OLAP Implementation447 9.9 OLAP Software 448 Summary 449 Review Questions450 Exercises450 Multiple Choice Questions450

INTRODUCTION TO DATA MINING WITH CASE STUDIES, Third EditionG.K. Gupta

© 2014 by PHI Learning Private Limited, Delhi. All rights reserved. No part of this book may be reproduced in any form, by mimeograph or any other means, without permission in writing from the publisher.

ISBN-978-81-203-5002-1

The export rights of this book are vested solely with the publisher.

Eighth Printing (Third Edition) ... ... July, 2014

Published by Asoke K. Ghosh, PHI Learning Private Limited, Rimjhim House, 111, Patparganj Industrial Estate, Delhi-110092 and Printed by Baba Barkha Nath Printers, Bahadurgarh, Haryana-124507.

Page 6: Introduction to Data Mining with Case Studies · 9.8 Guidelines for OLAP Implementation447 9.9 OLAP Software 448 Summary 449 Review Questions450 Exercises450 Multiple Choice Questions450

To the memory of

Professor C.S. WallaceFoundation Professor and Head

Department of Computer ScienceMonash University

26 October 1933–7 August 2004

Page 7: Introduction to Data Mining with Case Studies · 9.8 Guidelines for OLAP Implementation447 9.9 OLAP Software 448 Summary 449 Review Questions450 Exercises450 Multiple Choice Questions450
Page 8: Introduction to Data Mining with Case Studies · 9.8 Guidelines for OLAP Implementation447 9.9 OLAP Software 448 Summary 449 Review Questions450 Exercises450 Multiple Choice Questions450

Preface xiiiPreface to the Second Edition xvPreface to the First Edition xvii

Chapter 1 INTRODUCTION 1–60Learning Objectives 11.1 Introduction 1Chapter Overview 21.2 What is Data Mining? 31.3 Why Data Mining Now? 41.4 The Data Mining Process—Software Development Approach 101.5 The Data Mining Process—The CRISP-DM Approach 111.6 Data Mining Applications 151.7 Data Mining Techniques 181.8 Practical Examples of Data Mining 211.9 The Future of Data Mining 281.10 Guidelines for Successful Data Mining 291.11 Limitations of Data Mining 301.12 Using WEKA Software in Class 311.13 Data Mining Software 31Summary 34Review Questions 35Exercises 35Multiple Choice Questions 36Bibliography 38Case Study 1A Data Mining Techniques for Optimizing Inventories for Electronic Commerce 41Case Study 1B Crime Data Mining: A General Framework and Some Examples 52

Contents

vii

Page 9: Introduction to Data Mining with Case Studies · 9.8 Guidelines for OLAP Implementation447 9.9 OLAP Software 448 Summary 449 Review Questions450 Exercises450 Multiple Choice Questions450

viii Contents

Chapter 2 DATA UNDERSTANDING AND DATA PREPARATION 61–90

Learning Objectives 612.1 Introduction 61Chapter Overview 622.2 Data Collection and Pre-processing 622.3 Outliers 702.4 Mining Outliers 722.5 Missing Data 742.6 Types of Data 752.7 Computing Distance 772.8 Data Summarising Using Basic Statistical Measurements 792.9 Displaying Data Graphically 822.10 Multidimensional Data Visualisation 85Summary 85Review Questions 85Exercises 86Multiple Choice Questions 87Bibliography 90

Chapter 3 ASSOCIATION RULES MINING 91–151

Learning Objectives 913.1 Introduction 91Chapter Overview 923.2 Basics 923.3 The Task and a Naïve Algorithm 943.4 The Apriori Algorithm 973.5 Improving theEfficiencyof theAprioriAlgorithm 1103.6 Apriori-TID 1113.7 Direct Hashing and Pruning (DHP) 1143.8 Dynamic Itemset Counting (DIC) 1173.9 Mining Frequent Patterns without Candidate Generation (FP–Growth) 1183.10 Performance Evaluation of Algorithms 123Summary 123Review Questions 124Exercises 125Multiple Choice Questions 127Project 1—Using ARM for Table 1.1 129Project 2—Designing a Shopping Mall 129Project 3—Distributed ARM 130Bibliography 131Rakesh Agrawal—Inventor of Association Rules Mining 133Case Study 3 Mining Customer Value: From Association Rules to Direct Marketing 134

Page 10: Introduction to Data Mining with Case Studies · 9.8 Guidelines for OLAP Implementation447 9.9 OLAP Software 448 Summary 449 Review Questions450 Exercises450 Multiple Choice Questions450

Contents ix

Chapter 4 CLASSIFICATION 152–215Learning Objectives 1524.1 Introduction 152Chapter Overview 1534.2 Decision Tree 1544.3 Building a Decision Tree—The Tree Induction Algorithm 1564.4 Split Algorithm Based on Information Theory 1574.5 Split Algorithm Based on the Gini Index 1634.6 Overfitting andPruning 1694.7 Decision Tree Rules 1694.8 Decision Tree Summary 1704.9 Naïve Bayes Method 1714.10 EstimatingPredictiveAccuracyofClassificationMethods 1744.11 ImprovingAccuracyofClassificationMethods 1774.12 OtherEvaluationCriteria forClassificationMethods 1784.13 ClassificationSoftware 179Summary 181Review Questions 181Exercises 182Multiple Choice Questions 184Projects 185Bibliography 187Ross Quinlan—Leading researcher in Decision Trees 189Case Study 4A KDD for Insurance Risk Assessment: A Case Study 190Case Study 4B A Data Mining Approach for Retailing Bank Customer Attrition Analysis 198

Chapter 5 CLUSTER ANALYSIS 216–269Learning Objectives 2165.1 Introduction 216Chapter Overview 2195.2 Desired Features of Cluster Analysis 2195.3 Types of Cluster Analysis Methods 2205.4 Partitional Methods 2215.5 Hierarchical Methods 2285.6 Density-Based Methods 2385.7 Dealing with Large Databases 2395.8 Quality and Validity of Cluster Analysis Methods 2415.9 Cluster Analysis Software 243Summary 243Review Questions 244Exercises 245Multiple Choice Questions 245Projects 247Bibliography 250Case Study 5 Efficient Clustering of Very Large Document Collections 252

Page 11: Introduction to Data Mining with Case Studies · 9.8 Guidelines for OLAP Implementation447 9.9 OLAP Software 448 Summary 449 Review Questions450 Exercises450 Multiple Choice Questions450

x Contents

Chapter 6 WEB DATA MINING 270–331Learning Objectives 2706.1 Introduction 270Overview 2726.2 Web Mining 2726.3 Web Terminology and Characteristics 2736.4 Locality and Hierarchy in the Web 2786.5 Web Content Mining 2806.6 Web Usage Mining 2866.7 Web Structure Mining 2886.8 Web Mining Software 295Summary 296Review Questions 296Exercises 297Multiple Choice Questions 297Bibliography 300Tim Berners-Lee—Inventor of the World Wide Web 303Case Study 6 Lessons and Challenges from Mining Retail E-Commerce Data 304

Chapter 7 SEARCH ENGINES AND QUERY MINING 332–381Learning Objectives 3327.1 Introduction 332Chapter Overview 3337.2 Differences between Web Search and Information Retrieval 3337.3 Characteristics of Search Engines 3347.4 Search Engine Functionality 3387.5 Search Engine Architecture 3397.6 Ranking of Web Pages 3467.7 Search Query Mining 3527.8 Individual Privacy and Query Data Mining 356Summary 357Review Questions 357Exercises 357Multiple Choice Questions 359Project 360Bibliography 362Case Study 7 The Anatomy of a Large-Scale Hypertextual Web

Search Engine 364

Chapter 8 DATA WAREHOUSING 382–425Learning Objectives 3828.1 Introduction 3828.2 Operational Data Stores 3858.3 Data Warehouses 387

Page 12: Introduction to Data Mining with Case Studies · 9.8 Guidelines for OLAP Implementation447 9.9 OLAP Software 448 Summary 449 Review Questions450 Exercises450 Multiple Choice Questions450

Contents xi

8.4 Data Warehouse Design 3928.5 Guidelines for Data Warehouse Implementation 3968.6 Data Warehouse Metadata 3988.7 Software for ODS and Data Warehousing 399Summary 400Review Questions 401Exercises 402Multiple Choice Questions 402Projects 404Bibliography 405Bill Inmon—Inventor of Data Warehouse 407Case Study 8 Data Warehouse Governance: Best Practices at Blue Cross

and Blue Shield of North Carolina 408

Chapter 9 ONLINE ANALYTICAL PROCESSING (OLAP) 426–470Learning Objectives 4269.1 Introduction 4269.2 OLAP 4279.3 Characteristics of OLAP Systems 4299.4 Motivations for Using OLAP 4329.5 Multidimensional View and Data Cube 4339.6 Data Cube Implementations 4399.7 Data Cube Operations 4439.8 Guidelines for OLAP Implementation 4479.9 OLAP Software 448Summary 449Review Questions 450Exercises 450Multiple Choice Questions 450Bibliography 453Jim Gray (1944–2007)—Pioneer in Databases and OLAP 454Case Study 9 Discovering Web Access Patterns and Trends by Applying

OLAP and Data Mining Technology on Web Logs 455

Chapter 10 INFORMATION PRIVACY AND DATA MINING 471–504Learning Objectives 47110.1 Introduction 471Chapter Overview 47210.2 What is Information Privacy? 47210.3 Basic Principles to Protect Information Privacy 47210.4 Privacy Legislation in India 47510.5 Uses and Misuses of Data Mining 47610.6 Primary Aims of Data Mining 47810.7 Pitfalls of Data Mining 479

Page 13: Introduction to Data Mining with Case Studies · 9.8 Guidelines for OLAP Implementation447 9.9 OLAP Software 448 Summary 449 Review Questions450 Exercises450 Multiple Choice Questions450

xii Contents

10.8 Why Current Privacy Principles are Ineffective for Data Mining? 48010.9 A Revised Set of Privacy Principles for Data Mining of Personal Information 48210.10 Technological Solutions 48310.11 Examples of Use of Data Mining by the US Government 485Summary 488Review Questions 488Exercises 489Multiple Choice Questions 489Bibliography 491Case Study 10 Privacy Conflicts in CRM Services for Online Shops: A Case Study 493

Answers to Multiple Choice Questions 505–506

Index 507–514

Page 14: Introduction to Data Mining with Case Studies · 9.8 Guidelines for OLAP Implementation447 9.9 OLAP Software 448 Summary 449 Review Questions450 Exercises450 Multiple Choice Questions450

The third edition of this book is a substantially revised version of the earlier editions. The firstchapter has been rewritten and expanded. A new example has been added. Whereas the second chapter is completely new to this edition. It discusses the importance of data preprocessing in data mining. A number of issues are discussed in this chapter. An interesting example is included as an exercise at the end of the chapter. This example may also be used in Chapter 5 on Clustering. Chapter 3 has been revised and a new project has been included. Chapter 4 has been revised and so has beenChapter 5.Minormodifications havebeenmade toChapter 6.WhereasChapter 7 hasbeen revised substantially. A new section on Query Data Mining has been added to this chapter. Minor modifications have been made to Chapters 8 and 9. Finally, Chapter 10 has been revisedsubstantially to focus on privacy developments in India.

Please continue to send me feedback about the book at my email address [email protected].

G.K. [email protected]

xiii

Preface

Page 15: Introduction to Data Mining with Case Studies · 9.8 Guidelines for OLAP Implementation447 9.9 OLAP Software 448 Summary 449 Review Questions450 Exercises450 Multiple Choice Questions450
Page 16: Introduction to Data Mining with Case Studies · 9.8 Guidelines for OLAP Implementation447 9.9 OLAP Software 448 Summary 449 Review Questions450 Exercises450 Multiple Choice Questions450

Introduction To Data Mining With CaseStudies

Publisher : PHI Learning ISBN : 9788120350021 Author : Gupta

Type the URL : http://www.kopykitab.com/product/10277

Get this eBook

25%OFF