Scott Burton and Richard Morris CS 676 Presentation 12 April 2011


Mining Rules from Surveys and Questionnaires


Surveys and Questionnaires
Frequently used, but they pose problems for data mining:
• Rarity
• Related and dependent questions
• Ordinal / Likert scales

Association Rule Mining

Market basket analysis

Cookies -> Milk

Customer   Milk   Cookies   Butter   Bread
A           x        x
B           x        x        x
C           x        x
D                    x                  x
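The support and confidence behind a rule like Cookies -> Milk can be sketched in a few lines. The item placements below are an illustrative assumption; the slide's table does not pin down exactly which items each customer bought.

```python
# Toy market-basket data (placements assumed for illustration).
transactions = [
    {"Milk", "Cookies"},            # customer A
    {"Milk", "Cookies", "Butter"},  # customer B
    {"Milk", "Cookies"},            # customer C
    {"Cookies", "Bread"},           # customer D
]

def support(itemset):
    """Fraction of transactions that contain every item in the set."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"Cookies", "Milk"}))       # 0.75
print(confidence({"Cookies"}, {"Milk"}))  # 0.75
```

Apriori builds on exactly these two quantities, growing frequent itemsets level by level and keeping only rules that clear a minimum support and confidence.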

Our Goal: Improve Precision
Standard algorithms/approaches:
• Apriori, MS-Apriori
• Too many rules
• Rules are not "interesting" or actionable
• Finding the needle in the haystack

Our goal:
• Improve precision
• How do you measure "interestingness"?

Interestingness Measures
• Mostly based on support or confidence
• Considered about 40 different metrics
• All seemed to favor the wrong types of rules
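Most such metrics are functions of a few estimated probabilities. The sketch below shows three common ones (support, confidence, lift) and illustrates the rarity problem the slides mention; the specific metrics and numbers are illustrative, not the authors' exact metric set.

```python
# Three common interestingness measures, all derived from the same
# probability estimates (values here are illustrative).

def measures(p_a, p_c, p_ac):
    """p_a = P(antecedent), p_c = P(consequent), p_ac = P(both)."""
    support = p_ac
    confidence = p_ac / p_a
    lift = p_ac / (p_a * p_c)
    return support, confidence, lift

# A "true but obvious" rule over very common answers scores well on
# support and confidence...
print(measures(0.9, 0.9, 0.85))

# ...while a rare but potentially actionable rule scores poorly on
# support even though its lift is high.
print(measures(0.05, 0.1, 0.04))
```

This is why support- and confidence-based rankings tend to surface frequent, unsurprising rules first.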

Our Datasets
• Smoking habits of middle school students in Mexico
  ◦ Global Youth Tobacco Survey for the Pan American Health Organization (GYTSPAHO)
  ◦ ~65 questions and 13,000 responses
• HINTS (Health Information National Trends Survey)
  ◦ hints.cancer.gov
  ◦ 2007 response data had ~475 questions and 8,000 responses
  ◦ We focused on a subset of ~100 questions

Apriori vs. MS-Apriori

Apriori (Figure 1)

MS-Apriori (Figure 2)
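The key difference between the two algorithms is the frequency test: in MS-Apriori each item carries its own minimum item support (MIS), and an itemset is frequent when its support meets the lowest MIS among its items, which lets rare answers surface. A minimal sketch, with MIS values that are illustrative rather than taken from our data:

```python
# MS-Apriori frequency test sketch: per-item minimum supports (MIS).
mis = {"smoke=yes": 0.02,   # rare answer -> low threshold
       "smoke=no": 0.30}    # common answer -> high threshold

def is_frequent(itemset, support):
    """Frequent iff support meets the lowest MIS among the items."""
    return support >= min(mis[item] for item in itemset)

print(is_frequent({"smoke=yes", "smoke=no"}, 0.05))  # True
print(is_frequent({"smoke=no"}, 0.05))               # False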

Related and Dependent Questions
True but worthless rules:
• Do you smoke=no -> Did you smoke last week=no

Our approach:
• Cluster similar questions
• Remove any intra-cluster rules
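The pruning step above can be sketched as a simple membership check: once questions are assigned to clusters, drop any rule whose antecedent and consequent questions share a cluster. The question names and cluster ids below are hypothetical.

```python
# Intra-cluster rule pruning sketch (question/cluster ids are examples).
cluster_of = {"Do you smoke": 1,
              "Did you smoke last week": 1,
              "Parents smoke": 2}

def is_intra_cluster(antecedent_qs, consequent_qs):
    """True when both sides of the rule touch the same cluster."""
    lhs = {cluster_of[q] for q in antecedent_qs}
    rhs = {cluster_of[q] for q in consequent_qs}
    return bool(lhs & rhs)

# The "true but worthless" rule from the slide gets pruned:
print(is_intra_cluster({"Do you smoke"}, {"Did you smoke last week"}))  # True
# A cross-cluster rule survives:
print(is_intra_cluster({"Parents smoke"}, {"Do you smoke"}))            # False
```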

[Diagram: example questions 1-9 grouped into clusters]

Creating Clusters
• Distance metrics
  ◦ Bi-conditional prediction
• Attribute vs. attribute-value pair
• Involving the subject matter expert

A Sample Clustering of Questions

(see handout)
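One plausible reading of a "bi-conditional prediction" distance: predict each question's answers from the other's via the majority answer in each contingency cell, average the two accuracies, and subtract from 1 so that mutually predictive questions end up close together. This is a sketch of the idea, not the exact formula from the slides; the response vectors are invented.

```python
from collections import Counter

def predict_accuracy(xs, ys):
    """Accuracy of predicting ys from xs with a majority-vote map."""
    cells = {}
    for x, y in zip(xs, ys):
        cells.setdefault(x, Counter())[y] += 1
    correct = sum(c.most_common(1)[0][1] for c in cells.values())
    return correct / len(xs)

def biconditional_distance(xs, ys):
    """Small when each question predicts the other well, in both directions."""
    return 1.0 - (predict_accuracy(xs, ys) + predict_accuracy(ys, xs)) / 2

smoke    = ["no", "no", "yes", "no", "yes"]
smoke_wk = ["no", "no", "yes", "no", "no"]   # near-duplicate question
parents  = ["no", "yes", "yes", "no", "no"]

print(biconditional_distance(smoke, smoke_wk))  # small: cluster together
print(biconditional_distance(smoke, parents))   # larger: keep apart
```

Any standard clustering method (e.g. hierarchical agglomerative clustering) can then run over the resulting distance matrix, with a subject matter expert reviewing the cut.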

Effects of Cluster Pruning
MS-Apriori (Figure 2)

After cluster pruning (Figure 3)

Similar Rules

Abstract viewpoint:
• A B -> C D
• A -> C D
• A B -> C
• A B Z -> C D

Similar Rule Pruning
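One way to realize this pruning is to treat a rule as redundant when another rule is at least as general on the left-hand side and covers at least as much of the right-hand side. The criterion below is an assumption; the slides only list the abstract rule patterns.

```python
# Similar-rule pruning sketch: rules are (antecedent, consequent) pairs
# of frozensets of attribute-value items (single letters here).

def is_redundant(rule, other):
    """True when `other` is at least as general and no weaker on the RHS."""
    (a1, c1), (a2, c2) = rule, other
    return a2 <= a1 and c1 <= c2 and rule != other

general  = (frozenset("A"), frozenset("CD"))    # A -> C D
specific = (frozenset("ABZ"), frozenset("CD"))  # A B Z -> C D

print(is_redundant(specific, general))  # True: prune the specific rule
print(is_redundant(general, specific))  # False
```

In practice one would also compare the two rules' confidences, keeping the specific rule only when it predicts noticeably better than the general one.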

Effects of Similar Rule Pruning

After cluster pruning (Figure 3)

After Similar Rule Pruning (Figure 4)

Ordinal and Likert Data
Two approaches:
• Pre-process
• Post-process


Effects of Pre-Binning (Figure 5)
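The pre-processing approach amounts to collapsing an ordinal scale into coarser bins before mining, so that adjacent answers can support the same rule. A minimal sketch for a 5-point Likert scale; the bin boundaries are an illustrative choice.

```python
# Pre-binning sketch: map a 5-point Likert scale to three coarse bins.
BINS = {1: "disagree", 2: "disagree",
        3: "neutral",
        4: "agree", 5: "agree"}

responses = [1, 2, 2, 3, 4, 5, 5]
binned = [BINS[r] for r in responses]
print(binned)
# ['disagree', 'disagree', 'disagree', 'neutral', 'agree', 'agree', 'agree']
```

The post-process alternative instead mines rules over the raw answer values and merges rules over adjacent values afterwards.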

HINTS Data

(see handout, Figures 6-10)

Other Examples

Conclusions and Future Work
Conclusions:
• Increased precision of "interesting" rules
• More work to be done

Future work:
• Tuning of existing processes
• Handle numerical data
• Handle questions not asked of everyone
• Handle questions with multiple responses
• Try other record-matching techniques for similar rule pruning
