56
Introduction to Text Mining Tom Bohannon TAIR Conference February 2013

Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

  • Upload
    vandieu

  • View
    230

  • Download
    5

Embed Size (px)

Citation preview

Page 1: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

Introduction to Text Mining

Tom Bohannon

TAIR Conference

February 2013

Page 2: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

22

Objectives Define text mining and identify text mining applications.

Survey applications of text mining.

Use an example to illustrate text mining concepts.

Examine how text mining fits into modern data mining

projects.

Page 3: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

33

What Is Text Mining ? Text mining is a process that employs a set of algorithms

for converting unstructured text into structured data

objects and the quantitative methods used to analyze

these data objects.

“SAS defines text mining as the process of investigating

a large collection of free-form documents in order to

discover and use the knowledge that exists in the

collection as a whole.” (SAS® Text Miner: Distilling

Textual Data for Competitive Business Advantage)

Page 4: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

4

Text Mining – Two General Goals Pattern Discovery (Unsupervised Learning)

– Identify naturally occurring groups (classification*).

– Derive convenient segments (clustering).

Prediction (Supervised Learning)

– Input variables are associated with values

of a target variable.

– Derive a model or set of rules that produces a

predicted target value for a given set of inputs.

* Classification with a target variable is prediction.

Page 5: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

5

Text MiningText mining has the following characteristics:

operates with respect to a corpus of documents

employs a dictionary to identify relevant terms

accommodates a variety of metrics to quantify the

contents of a document within the corpus

derives a structured vector* of measurements for each

document relative to the corpus

employs analytical methods applied to the structured

vector of measurements based on the goals of the

analysis, for example, groups documents into

segments

* Some text mining methods use a structured matrix.

Page 6: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

66

Another View of Text Mining

Text

A

Miracle

Occurs

Numbers

Page 7: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

7

Application: Document Classification

7

New

Document

Group A vs. Others

Group B vs. Others

Group C vs. OthersGroup C

Group A

Group B

Page 8: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

8

Document Categorization

Document categorization

Assign documents to pre-defined categories

Examples

Process email into work, personal, junk

Process documents from a newsgroup into “interesting”,

“not interesting”, “spam and flames”

Process transcripts of bugged phone calls into “relevant”

and “irrelevant”

Page 9: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

9

Application: Information Retrieval

9

Document Collection

Text MiningInput

Document

Matched

Documents

Page 10: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

10

IntroductionHow can we retrieve information using a search engine?.

We can represent the query and the documents as

vectors (vector space model)

– However to construct these vectors we should

perform a preliminary document preparation.

The documents are retrieved by finding the closest

distance between the query and the document vector.

Page 11: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

11

Application: Clustering

11

Document Collection

Text Mining

Group

1Group 2 Group 3 Group 4

Page 12: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

1212

SAS Text

Miner

...

Page 13: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

13

Document Classification

Document classification

Cluster documents based on similarity

Examples

Group samples of writing in an attempt to determine

author(s)

Look for “hot spots” in customer feedback

Find new trends in a document collection (outliers,

hard to classify)

Page 14: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

14

IR Applications Using Text Mining Survey Analysis

Analysis of Student Evaluations of Instructors

Predictive Modeling

Enrollment Models

Retention Models

14

Page 15: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

1515

Predictive Modeling

Input X1

Text

Input X2

Input Xk

Pre-processing

Parsing

Transformation

Input T1

Input T2

Input Tj

Model Score

Cleaning

Screening

Derivation

Transformation

Imputation

Page 16: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

1616

Obtaining the Prediction

Nominal Target

Binary/Categorical

Data

Model

Score

Rule

Prediction

Example

Binary Response: Mail (Y/N)

Age=33,Gender=F,Income=$45,000

g(Y)=f(Age,Gender,Income)

0.378

If (Score>0.255)

then Mail=Y

Mail

Page 17: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

17

Objectives Explore the general concept of decision trees.

Build a decision tree model.

Examine the model results and interpret these results.

Page 18: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

18

Fitted Decision Tree

New CaseDEBTINC = 20

PROPERTY VALUE = $500,000

DELINQUENCIES = 0

FIRST MORTGAGE = $200,000

Delinquencies

DEBTINC45

...

64%

0

1

<45

7%Property Value

< $300,000

$300,0006%

79%

Delinquencies

<6

6%

6

100%

First Mortgage

< $246,000 $246,000

92% 20%

Oldest Loan83%

53%

178< 178

64% 32%

Page 19: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

19

The Cultivation of Trees Split Search

– Which splits are to be considered?

Splitting Criterion

– Which split is best?

Stopping Rule

– When should the splitting stop?

Pruning Rule

– Should some branches be lopped off?

Page 20: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

20

Benefits of Trees Interpretability

– tree-structured presentation

Mixed Measurement Scales

– nominal, ordinal, interval

Regression trees

Robustness

Missing Values

Page 21: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

21

Simple Prediction Illustration

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x1

x2

Predict dot color

for each x1 and x2.

Training Data

...

Page 22: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

22

Simple Prediction Illustration

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x1

x2

Predict dot color

for each x1 and x2.

Training Data

...

Page 23: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

23

Decision Tree Prediction Rules

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x1

x2

40%

60%

55%

70%

x1

<0.52 ≥0.52 <0.51 ≥0.51x1

x2

<0.63 ≥0.63

root node

interior node

leaf node

...

Page 24: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

24

Decision Tree Prediction Rules

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x1

x2

40%

60%

55%

x1

<0.52 ≥0.52

<0.63

70%

<0.51 ≥0.51x1

x2

≥0.63

root node

interior node

leaf node

Predict:

...

Page 25: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

25

≥0.51

40%

60%

55%

x1

<0.52 ≥0.52

<0.63

40%

60%

55%

x1

<0.52 ≥0.52 ≥0.51

<0.63

Decision Tree Prediction Rules

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x1

x2

Decision =

Estimate = 0.70

70%

<0.51x1

x2

≥0.63

Predict:

...

Page 26: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

26

Decision Tree Prediction Rules

40%

60%

55%

x1

<0.52 ≥0.52 ≥0.51

<0.63

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x1

x2

Decision =

Estimate = 0.70

70%

<0.51x1

x2

≥0.63

Predict:

...

Page 27: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

27

Decision Tree Split Search

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

x1

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x2

Calculate the logworth

of every partition on

input x1.

left right

...

Confusion Matrix

Page 28: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

28

Decision Tree Split Search

Calculate the logworth

of every partition on

input x1.

left right

...

Confusion Matrix

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x1

x2

0.52

Page 29: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

29

Decision Tree Split Search

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x1

x2

max

logworth(x1)

0.95

0.52left right

Select the partition with

the maximum logworth.

53%

47%

42%

58%

...

Page 30: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

30

Decision Tree Split Search

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x1

x2

max

logworth(x1)

0.95

left right

53% 42%

47% 58%

Repeat for input x2.

...

Page 31: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

31

Decision Tree Split Search

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x1

x2

max

logworth(x1)

0.95

left right

53% 42%

47% 58%

bottom top

...

Page 32: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

32

Decision Tree Split Search

max

logworth(x1)

0.95

left right

53% 42%

47% 58%

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x1

x2

0.63

max

logworth(x2)

4.92

bottom top

54%

46%

35%

65%

...

Page 33: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

33

Decision Tree Split Search

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x1

x2

max

logworth(x2)

4.92

bottom top

max

logworth(x1)

0.95

left right

Compare partition

logworth ratings.54%

46%

35%

65%

53%

47%

42%

58%

...

Page 34: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

34

Decision Tree Split Search

max

logworth(x1)

0.95

left right

53% 42%

47% 58%

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x1

x2

0.63

max

logworth(x2)

4.92

bottom top

54%

46%

35%

65%

...

Page 35: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

35

Decision Tree Split Search

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x1

x2

0.63

x2<0.63 ≥0.63

Create a partition rule

from the best partition

across all inputs.

...

Page 36: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

36

Decision Tree Split Search

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x1

x2

x2<0.63 ≥0.63

Repeat the process

in each subset.

...

Page 37: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

37

Decision Tree Split Search

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x1

x2

left right

...

Page 38: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

38

Decision Tree Split Search

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x1

x2

0.52

max

logworth(x1)

5.72

left right

61%

39%

55%

45%

...

Page 39: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

39

Decision Tree Split Search

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x1

x2

max

logworth(x1)

5.72

left right

61% 55%

39% 45%

bottom top

...

Page 40: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

40

Decision Tree Split Search

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x1

x2

max

logworth(x1)

5.72

left right

61% 55%

39% 45%

0.02

max

logworth(x2)

-2.01

bottom top

38%

62%

55%

45%

...

Page 41: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

41

Decision Tree Split Search

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x1

x2

max

logworth(x2)

-2.01

bottom top

38%

62%

55%

45%

max

logworth(x1)

5.72

left right

61%

39%

55%

45%

...

Page 42: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

42

Decision Tree Split Search

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x1

x2

0.52

max

logworth(x2)

-2.01

bottom top

38% 55%

62% 45%

max

logworth(x1)

5.72

left right

61%

39%

55%

45%

...

Page 43: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

43

Decision Tree Split Search

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x1

x2

x2

x1

<0.63 ≥0.63

<0.52 ≥0.52

Create a second

partition rule.

...

Page 44: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

44

Decision Tree Split Search

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x1

x2

x2

x1

<0.63 ≥0.63

<0.52 ≥0.52

Create a second

partition rule.

...

Page 45: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

45

Repeat to form a maximal tree.

Decision Tree Split Search

0.0 0.50.1 0.2 0.3 0.4 0.6 0.7 0.8 0.9 1.0

0.0

0.5

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1.0

x1

x2

...

Page 46: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

46

Example Two Year School on Texas & Mexico Border

Strong in Mathematics and Sciences

Weak in the Arts

Half of the students are from newly emigrated families

46

Page 47: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

47

Improve Graduation Rate

Identify Students Most Likely Not to Graduate

Collect Data and Build a Predictive Model

Determine What Intervention is Approximate

47

Objective

Page 48: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

48

Sample of 1000 Students Entering Fall 2010

Determine Which Students Had Left by Fall 2013

Data Fields

1. Student ID

2. Age

3. Gender

4. Major

5. Population

6. School

7. Enrollment Statement

8. Target

48

Hypothetical Data

Page 49: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

49

49

Hypothetical Data

Page 50: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

5050

Page 51: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

5151

Page 52: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

5252

Process Flow

Page 53: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

5353

Decision Tree Model

Page 54: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

5454

Fit Statistics

Page 55: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

5555

ROC Curve

Page 56: Introduction to Text Mining - texas-air.org how text mining fits into modern data mining ... (SAS® Text Miner: Distilling Textual Data for Competitive Business Advantage) 4 ... Introduction

56

Score Students Entering in Fall 2013 With Model

Distribute Scoring Information to Approximate People

Evaluate Model After Two Years

56

Next Steps