ACIS 1504 - Introduction to Data Analytics & Business Intelligence Text Mining Data Cleaning

ACIS 1504 - Introduction to Data Analytics & Business Intelligence

Text Mining

Data Cleaning

Concept Map: Text Mining

Implementation

Mixed Cell References

Design: Accuracy

Random

Search, Left, Right, Mid, Len, &

Paste Values

Objectives

• Define text mining.

• Demonstrate Excel features that support text mining.

Segment A: Text Mining

Text Analytics / Text Mining

• Software that searches large amounts of unstructured textual data to identify patterns.

Nestle

• Nestle processes social media.

http://uk.reuters.com/article/video/idUKBRE89P07Q20121026?videoId=238680321

Segment B: Text Functions

Text Mining

• Search: SEARCH

• Parse: LEFT, MID, RIGHT, LEN

• Concatenate: &

Name Example

Open Grades Textfile.xlsx.

Divide each "Last Name, First Name" entry into two separate columns.

1. Locate the comma (SEARCH).

2. Extract all characters to the left of the comma (LEFT).

3. Find the length of the full name (LEN).

4. Extract the characters after the comma and its following space, through the end of the name (RIGHT).
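
The four steps can be sketched in Python by imitating each Excel function with ordinary string operations. This is a minimal sketch, assuming each name is a single string in "Last, First" form with exactly one comma followed by one space. In Excel the same steps would typically read =LEFT(A2, SEARCH(",", A2) - 1) for the last name and =RIGHT(A2, LEN(A2) - SEARCH(",", A2) - 1) for the first name, where cell A2 is a hypothetical location of the full name and split_full_name is a hypothetical helper.

```python
def split_full_name(full_name: str) -> tuple[str, str]:
    """Split a "Last, First" string into (last, first).

    Assumes exactly one comma followed by a single space (hypothetical format).
    """
    # Step 1 (SEARCH): 1-based position of the comma, as Excel reports it.
    comma_pos = full_name.index(",") + 1
    # Step 2 (LEFT): every character to the left of the comma.
    last = full_name[: comma_pos - 1]
    # Step 3 (LEN): total number of characters in the full name.
    total_len = len(full_name)
    # Step 4 (RIGHT): characters after the comma and the space that follows it.
    num_right = total_len - comma_pos - 1
    first = full_name[total_len - num_right :]
    return last, first


print(split_full_name("Smith, Jane"))  # ('Smith', 'Jane')
```

Subtracting 1 in step 4 is what skips the space after the comma; without it, every first name would start with a space.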

SEARCH Function
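
Excel's SEARCH(find_text, within_text, [start_num]) returns the 1-based position of the first match, ignores upper and lower case, and returns a #VALUE! error when the text is not found. The sketch below mimics those rules in Python. It is an illustration rather than a full emulation: it does not support the ? and * wildcards that SEARCH also accepts, and the name excel_search is hypothetical.

```python
def excel_search(find_text: str, within_text: str, start_num: int = 1) -> int:
    """Return the 1-based position of find_text in within_text, ignoring case.

    Hypothetical helper; Excel wildcards (? and *) are not supported here.
    """
    if start_num < 1 or start_num > len(within_text):
        raise ValueError("#VALUE!")  # Excel rejects out-of-range start positions
    # Lowercase both strings so the match is case-insensitive, like SEARCH.
    index = within_text.lower().find(find_text.lower(), start_num - 1)
    if index == -1:
        raise ValueError("#VALUE!")  # find_text is not present
    return index + 1  # convert Python's 0-based index to Excel's 1-based position


print(excel_search(",", "Smith, Jane"))     # 6
print(excel_search("JANE", "Smith, Jane"))  # 8
```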

LEFT Function
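
LEFT(text, [num_chars]) returns the first num_chars characters of text. num_chars defaults to 1, and asking for more characters than the text contains returns the whole text. A minimal Python sketch of those rules follows; the name excel_left is hypothetical, and Python slicing already handles the "more than the length" case.

```python
def excel_left(text: str, num_chars: int = 1) -> str:
    """Hypothetical stand-in for Excel's LEFT function."""
    if num_chars < 0:
        raise ValueError("#VALUE!")  # Excel rejects negative counts
    # Slicing stops at the end of the string, so oversized counts return all of text.
    return text[:num_chars]


print(excel_left("Smith, Jane", 5))  # Smith (5 is one less than the comma's position)
print(excel_left("Smith"))           # S (num_chars defaults to 1)
```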

LEN or Length Function
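
LEN(text) counts every character in text, including spaces and punctuation. The Python sketch below (the name excel_len is hypothetical) shows why this matters for parsing: a stray trailing space changes the length, which in turn changes how many characters a RIGHT formula built from LEN will return.

```python
def excel_len(text: str) -> int:
    """Hypothetical stand-in for Excel's LEN: counts every character, spaces included."""
    return len(text)


name = "Smith, Jane"
print(excel_len(name))        # 11: the comma and the space count as characters
print(excel_len(name + " "))  # 12: a stray trailing space also counts
```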

RIGHT Function
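
RIGHT(text, [num_chars]) returns the last num_chars characters of text, with num_chars defaulting to 1. In the name example, the count is the full length minus the comma's position minus one (for the space). A minimal Python sketch, with the hypothetical name excel_right:

```python
def excel_right(text: str, num_chars: int = 1) -> str:
    """Hypothetical stand-in for Excel's RIGHT function."""
    if num_chars < 0:
        raise ValueError("#VALUE!")  # Excel rejects negative counts
    if num_chars == 0:
        return ""  # text[-0:] would return the whole string, so handle 0 explicitly
    return text[-num_chars:]  # oversized counts return the whole string, as in Excel


name = "Smith, Jane"
comma_pos = name.index(",") + 1  # what SEARCH(",", name) would return: 6
print(excel_right(name, len(name) - comma_pos - 1))  # Jane
```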

MID Function

Extract the first initial of the first name.
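
MID(text, start_num, num_chars) returns num_chars characters starting at the 1-based position start_num. For a "Last, First" name, the first initial sits two characters after the comma (past the comma and the space), so an Excel formula such as =MID(A2, SEARCH(",", A2) + 2, 1) would extract it, with A2 a hypothetical cell. The Python sketch below assumes that same format; excel_mid is a hypothetical name.

```python
def excel_mid(text: str, start_num: int, num_chars: int) -> str:
    """Hypothetical stand-in for Excel's MID; start_num is 1-based."""
    if start_num < 1 or num_chars < 0:
        raise ValueError("#VALUE!")
    start = start_num - 1  # convert to Python's 0-based index
    return text[start : start + num_chars]


name = "Smith, Jane"
comma_pos = name.index(",") + 1                    # what SEARCH(",", name) would return: 6
first_initial = excel_mid(name, comma_pos + 2, 1)  # skip the comma and the space
print(first_initial)  # J
```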

Concatenate

• Combine First Name, a space, and Last Name.

• & is the concatenation operator.

• Constant text strings must be enclosed in double quotes (for example, " " for a space).
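
In Excel this would typically be written =B2 & " " & A2, assuming (hypothetically) the first name is in B2 and the last name in A2. Python's + operator plays the role of &, and the constant space is likewise a quoted string. join_name is a hypothetical helper.

```python
def join_name(first: str, last: str) -> str:
    """Combine first name, a space, and last name, like =first & " " & last."""
    # The space is a constant string, so it must be quoted, just as in Excel.
    return first + " " + last


print(join_name("Jane", "Smith"))  # Jane Smith
```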

Student ID Example

Extract each student’s PID from their email address.

Create a new student identifier by combining the first three letters of the last name with the last four digits of the student ID number.
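
Both tasks combine the functions above. A sketch, assuming the PID is everything before the "@" in an address such as the hypothetical jsmith@example.edu, and that the student ID is stored as text: the Excel versions would be roughly =LEFT(C2, SEARCH("@", C2) - 1) and =LEFT(A2, 3) & RIGHT(D2, 4), where the cells C2, A2, and D2 and the student ID 905123456 are all hypothetical.

```python
def extract_pid(email: str) -> str:
    """Return everything before the "@" (Excel: =LEFT(email, SEARCH("@", email) - 1))."""
    at_pos = email.index("@") + 1  # 1-based position, as SEARCH reports it
    return email[: at_pos - 1]


def make_identifier(last_name: str, student_id: str) -> str:
    """First three letters of the last name joined to the last four ID digits.

    Excel equivalent: =LEFT(last_name, 3) & RIGHT(student_id, 4).
    """
    return last_name[:3] + student_id[-4:]


print(extract_pid("jsmith@example.edu"))      # jsmith
print(make_identifier("Smith", "905123456"))  # Smi3456
```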

Segment C: Data Cleaning & Generation

Data Cleaning

• Delete Unnecessary Columns & Rows

• Resize Columns

• Format Numeric Values

• Separate Distinct Values

• Shorten Lengthy Values

• Data Validation for Future Entries

• Generate Values

Favorite Pie Example

Favorite Pie Example

1. Ensure the pie flavor data is consistent.

2. Replace each confidential clicker ID # with a randomly generated 6-digit number.

3. Ensure each new ID number is static (it does not change when the worksheet recalculates) and unique.
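
The three steps can be sketched in Python. This assumes, hypothetically, that flavor responses differ only in spacing and capitalization, and that the clicker IDs are short strings such as "A1B2C3". Drawing the new numbers with random.sample (sampling without replacement) guarantees uniqueness, which repeated RANDBETWEEN(100000, 999999) calls alone do not. Computing the numbers once and storing them keeps them static, much as Paste Special - Values does in Excel. The helper names are hypothetical.

```python
import random


def clean_flavor(raw: str) -> str:
    """Collapse extra spaces and standardize capitalization.

    Assumes responses differ only in spacing and case (hypothetical rule).
    """
    # split() drops leading, trailing, and repeated spaces; title() fixes case.
    return " ".join(raw.split()).title()


def assign_new_ids(clicker_ids: list[str]) -> dict[str, int]:
    """Map each confidential clicker ID to a unique random 6-digit number."""
    # random.sample draws without replacement from 100000..999999,
    # so no two students receive the same number.
    new_numbers = random.sample(range(100000, 1000000), len(clicker_ids))
    # Storing the numbers once keeps them static, like Paste Special > Values.
    return dict(zip(clicker_ids, new_numbers))


print(clean_flavor("  pecan   PIE "))  # Pecan Pie
print(assign_new_ids(["A1B2C3", "D4E5F6"]))  # two different random 6-digit numbers
```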

Favorite Pie Example

[Figure: pie flavor data shown three ways: Original, Sorted, Consistent]

Random Number Functions

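• =RAND() returns a random decimal number greater than or equal to 0 and less than 1.

• =RANDBETWEEN(low#, high#) returns a random integer between low# and high#, inclusive.

Both functions generate new numbers every time the worksheet recalculates.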
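
Python's standard random module has direct counterparts: random.random() returns a float in the range [0.0, 1.0), like RAND, and random.randint(a, b) includes both endpoints, like RANDBETWEEN. The wrapper names below are hypothetical. One difference from Excel: a Python value stays fixed once computed and changes only if the function is called again.

```python
import random


def excel_rand() -> float:
    """Like =RAND(): a random decimal >= 0 and < 1."""
    return random.random()


def excel_randbetween(low: int, high: int) -> int:
    """Like =RANDBETWEEN(low#, high#): a random integer from low to high, inclusive."""
    return random.randint(low, high)


six_digit = excel_randbetween(100000, 999999)  # a random 6-digit number
print(excel_rand(), six_digit)
```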

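Paste Special - Values

Pasting values replaces formulas (such as RANDBETWEEN) with their current results, so the numbers stop changing.

Mac: Edit Menu, Paste Special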

Exam Feedback Example

Open Exam Feedback.xlsx.