18
Dept. of Computing Science, University of Aberdeen 1 CS4031/CS5012 Data Mining and Visualization Yaji Sripada

Dept. of Computing Science, University of Aberdeen1 CS4031/CS5012 Data Mining and Visualization Yaji Sripada

Embed Size (px)

Citation preview

Page 1: Dept. of Computing Science, University of Aberdeen1 CS4031/CS5012 Data Mining and Visualization Yaji Sripada

Dept. of Computing Science, University of Aberdeen 1

CS4031/CS5012 Data Mining and Visualization

Yaji Sripada

Page 2: Dept. of Computing Science, University of Aberdeen1 CS4031/CS5012 Data Mining and Visualization Yaji Sripada

Dept. of Computing Science, University of Aberdeen 2

Time table

• Lectures– 2 lectures

• 11:00 -12:00 Tuesdays in Taylor A21• 10:00 – 11:00 Wednesdays in Meston 311

• Practicals– 1 two hour practical on Mondays in

Meston 311 (Room changed since the last lecture)• 15:00-17:00

Page 3: Dept. of Computing Science, University of Aberdeen1 CS4031/CS5012 Data Mining and Visualization Yaji Sripada

Dept. of Computing Science, University of Aberdeen 3

Assessment

• Course is worth 15 credits• Two components

– 25% continuous assessment– 75% end of term exam

• Continuous assessment– Issued in Week 6/7– Due on the Friday of Week 10/11

Page 4: Dept. of Computing Science, University of Aberdeen1 CS4031/CS5012 Data Mining and Visualization Yaji Sripada

Dept. of Computing Science, University of Aberdeen 4

Course Organization

• Lectures– Discuss topics

including• Data Mining

– General Data– Time series– Spatial data

• Information Visualization (InfoVis)

• Case Studies

• Practicals– 10 weeks

• Mostly using some existing software

• Developing your own software (PHP)

– 8th week assessment

Page 5: Dept. of Computing Science, University of Aberdeen1 CS4031/CS5012 Data Mining and Visualization Yaji Sripada

Dept. of Computing Science, University of Aberdeen 5

Reading

• Mostly lecture notes and some research papers

• I will provide all the required reading material

Page 6: Dept. of Computing Science, University of Aberdeen1 CS4031/CS5012 Data Mining and Visualization Yaji Sripada

Dept. of Computing Science, University of Aberdeen 6

Introduction

Page 7: Dept. of Computing Science, University of Aberdeen1 CS4031/CS5012 Data Mining and Visualization Yaji Sripada

Dept. of Computing Science, University of Aberdeen 7

Overgrowth of Data• Humans accumulate large volumes of data

in many domains– Business

• Transactional data

– Scientific• Complete sequence data from Human Genome

Project– of 3 billion DNA units

– Engineering• 100s of sensors on a gas turbine taking

measurements every second

– And many more?

Page 8: Dept. of Computing Science, University of Aberdeen1 CS4031/CS5012 Data Mining and Visualization Yaji Sripada

Dept. of Computing Science, University of Aberdeen 8

Information Hidden in Data

• Data are raw facts • Humans routinely ‘dig’ useful abstractions from

raw data– An example abstraction ‘mined’ from past exam results– No coursework submitted => will fail the exam as well

• For small data sets (a few hundred bytes)– Simple and manual data analysis OK (Even preferred!!!)– Statistics

• For large data sets (a few Gigabytes or more)– Manual analysis is impossible– Computer Assistance needed

Page 9: Dept. of Computing Science, University of Aberdeen1 CS4031/CS5012 Data Mining and Visualization Yaji Sripada

Dept. of Computing Science, University of Aberdeen 9

Two views of computer assistance

• Data Mining View– Machines can automatically (or semi-

automatically) extract meaningful and useful information from heaps of raw data

• Information Visualization (InfoVis) View– Humans themselves can make sense of

data if data are presented visually• We learn both these views in this

course

Page 10: Dept. of Computing Science, University of Aberdeen1 CS4031/CS5012 Data Mining and Visualization Yaji Sripada

Dept. of Computing Science, University of Aberdeen 10

Data Mining

• Process of automatically (or semi-automatically) discovering useful, novel and meaningful patterns from substantial quantities of data.

• Tasks– Classification– Clustering– Association Rule Mining

Page 11: Dept. of Computing Science, University of Aberdeen1 CS4031/CS5012 Data Mining and Visualization Yaji Sripada

Dept. of Computing Science, University of Aberdeen 11

Typical Applications

• Customer Relationship Management (CRM)

• Linking gene variations among individuals to common illnesses (e.g. Cancer)

• Identifying abnormal conditions in an operational gas turbine

• More ?

Page 12: Dept. of Computing Science, University of Aberdeen1 CS4031/CS5012 Data Mining and Visualization Yaji Sripada

Dept. of Computing Science, University of Aberdeen 12

Information Visualization• Process of representing data in such a way

(usually involves visual presentations) that enable users to gain useful insights into the data

• Focus is on designing a data representation scheme that makes in underlying ‘information’ visible to the user

• For rendering the representation scheme– Computer graphics technology is exploited

• Good InfoVis techniques are based on– Good understanding of the information structures

underlying the data– Good understanding of the human perception and

cognition– Good graphics

Page 13: Dept. of Computing Science, University of Aberdeen1 CS4031/CS5012 Data Mining and Visualization Yaji Sripada

Dept. of Computing Science, University of Aberdeen 13

Typical Applications

• Newsmap• Touchgraph• ManyEyes• CountryScape

Page 14: Dept. of Computing Science, University of Aberdeen1 CS4031/CS5012 Data Mining and Visualization Yaji Sripada

Dept. of Computing Science, University of Aberdeen 14

SVG• Scalable Vector Graphics

– An XML-based Web Language to textually specify vector graphics

– E.g. SVG specification of a circle<svg width="300" height="200"

xmlns="http://www.w3.org/2000/svg"> <circle cx="125" cy="110" r="20" fill="red" />

</svg> • W3C Recommendation• Browser support for SVG content

– Firefox provides built in support– IE needs an Adobe Plug-in

• SVG content can be created– Using text editors (static)– Programmatically (dynamic)

Page 15: Dept. of Computing Science, University of Aberdeen1 CS4031/CS5012 Data Mining and Visualization Yaji Sripada

Dept. of Computing Science, University of Aberdeen 15

SVG in an HTML page

• Three methods• Using the <embed> tag

<embed src=“circle.svg" width="300" height="100"type="image/svg+xml"pluginspage="http://www.adobe.com/svg/viewer/install/" />

• Using the <object> tag– <object data="rect.svg" width="300" height="100"

type="image/svg+xml"codebase="http://www.adobe.com/svg/viewer/install/" />

• Using the <iframe> tag– <iframe src="rect.svg" width="300" height="100">

</iframe>

Page 16: Dept. of Computing Science, University of Aberdeen1 CS4031/CS5012 Data Mining and Visualization Yaji Sripada

Dept. of Computing Science, University of Aberdeen 16

Learning SVG

• In this course focus is on designing effective visualizations– We assume knowledge of SVG for

rendering the visualizations

• Fundamentals of SVG covered in lectures

• Some practicals involve SVG creation– No practicals exclusively for learning

SVG

Page 17: Dept. of Computing Science, University of Aberdeen1 CS4031/CS5012 Data Mining and Visualization Yaji Sripada

Dept. of Computing Science, University of Aberdeen 17

Lectures

Data

Exploratory Data Analysis (EDA)

Classification – I

Classification – II

Clustering – I

Clustering – II

InfoVis – I

InfoVis – II

InfoVis – III

Association rule mining

Time Series – I

Time Series – II

Time Series – III

Sequence Mining

Spatial Data

GIS

Spatial Data Mining

Issues with Data mining and InfoVis

Users

Detailed Course PlanPracticals

Exploratory Data Analysis (EDA)

Classification

Clustering

HCE

Tree Maps

Association Rule Mining

Assignment

Time Searcher

Time Series Representations

GIS

Exploring spatio-temporal Data

Page 18: Dept. of Computing Science, University of Aberdeen1 CS4031/CS5012 Data Mining and Visualization Yaji Sripada

Dept. of Computing Science, University of Aberdeen 18

Summary

• All modern organizations– possess large volumes of data and– Users want to understand these data

• You learn technologies to– Extract and/Or present information from

large data sets• Analytical methods• Visualization methods