Upload
paula-thornton
View
212
Download
0
Embed Size (px)
Citation preview
Dept. of Computing Science, University of Aberdeen 1
CS4031/CS5012 Data Mining and Visualization
Yaji Sripada
Dept. of Computing Science, University of Aberdeen 2
Time table
• Lectures– 2 lectures
• 11:00 -12:00 Tuesdays in Taylor A21• 10:00 – 11:00 Wednesdays in Meston 311
• Practicals– 1 two hour practical on Mondays in
Meston 311 (Room changed since the last lecture)• 15:00-17:00
Dept. of Computing Science, University of Aberdeen 3
Assessment
• Course is worth 15 credits• Two components
– 25% continuous assessment– 75% end of term exam
• Continuous assessment– Issued in Week 6/7– Due on the Friday of Week 10/11
Dept. of Computing Science, University of Aberdeen 4
Course Organization
• Lectures– Discuss topics
including• Data Mining
– General Data– Time series– Spatial data
• Information Visualization (InfoVis)
• Case Studies
• Practicals– 10 weeks
• Mostly using some existing software
• Developing your own software (PHP)
– 8th week assessment
Dept. of Computing Science, University of Aberdeen 5
Reading
• Mostly lecture notes and some research papers
• I will provide all the required reading material
Dept. of Computing Science, University of Aberdeen 6
Introduction
Dept. of Computing Science, University of Aberdeen 7
Overgrowth of Data• Humans accumulate large volumes of data
in many domains– Business
• Transactional data
– Scientific• Complete sequence data from Human Genome
Project– of 3 billion DNA units
– Engineering• 100s of sensors on a gas turbine taking
measurements every second
– And many more?
Dept. of Computing Science, University of Aberdeen 8
Information Hidden in Data
• Data are raw facts • Humans routinely ‘dig’ useful abstractions from
raw data– An example abstraction ‘mined’ from past exam results– No coursework submitted => will fail the exam as well
• For small data sets (a few hundred bytes)– Simple and manual data analysis OK (Even preferred!!!)– Statistics
• For large data sets (a few Gigabytes or more)– Manual analysis is impossible– Computer Assistance needed
Dept. of Computing Science, University of Aberdeen 9
Two views of computer assistance
• Data Mining View– Machines can automatically (or semi-
automatically) extract meaningful and useful information from heaps of raw data
• Information Visualization (InfoVis) View– Humans themselves can make sense of
data if data are presented visually• We learn both these views in this
course
Dept. of Computing Science, University of Aberdeen 10
Data Mining
• Process of automatically (or semi-automatically) discovering useful, novel and meaningful patterns from substantial quantities of data.
• Tasks– Classification– Clustering– Association Rule Mining
Dept. of Computing Science, University of Aberdeen 11
Typical Applications
• Customer Relationship Management (CRM)
• Linking gene variations among individuals to common illnesses (e.g. Cancer)
• Identifying abnormal conditions in an operational gas turbine
• More ?
Dept. of Computing Science, University of Aberdeen 12
Information Visualization• Process of representing data in such a way
(usually involves visual presentations) that enable users to gain useful insights into the data
• Focus is on designing a data representation scheme that makes in underlying ‘information’ visible to the user
• For rendering the representation scheme– Computer graphics technology is exploited
• Good InfoVis techniques are based on– Good understanding of the information structures
underlying the data– Good understanding of the human perception and
cognition– Good graphics
Dept. of Computing Science, University of Aberdeen 13
Typical Applications
• Newsmap• Touchgraph• ManyEyes• CountryScape
Dept. of Computing Science, University of Aberdeen 14
SVG• Scalable Vector Graphics
– An XML-based Web Language to textually specify vector graphics
– E.g. SVG specification of a circle<svg width="300" height="200"
xmlns="http://www.w3.org/2000/svg"> <circle cx="125" cy="110" r="20" fill="red" />
</svg> • W3C Recommendation• Browser support for SVG content
– Firefox provides built in support– IE needs an Adobe Plug-in
• SVG content can be created– Using text editors (static)– Programmatically (dynamic)
Dept. of Computing Science, University of Aberdeen 15
SVG in an HTML page
• Three methods• Using the <embed> tag
<embed src=“circle.svg" width="300" height="100"type="image/svg+xml"pluginspage="http://www.adobe.com/svg/viewer/install/" />
• Using the <object> tag– <object data="rect.svg" width="300" height="100"
type="image/svg+xml"codebase="http://www.adobe.com/svg/viewer/install/" />
• Using the <iframe> tag– <iframe src="rect.svg" width="300" height="100">
</iframe>
Dept. of Computing Science, University of Aberdeen 16
Learning SVG
• In this course focus is on designing effective visualizations– We assume knowledge of SVG for
rendering the visualizations
• Fundamentals of SVG covered in lectures
• Some practicals involve SVG creation– No practicals exclusively for learning
SVG
Dept. of Computing Science, University of Aberdeen 17
Lectures
Data
Exploratory Data Analysis (EDA)
Classification – I
Classification – II
Clustering – I
Clustering – II
InfoVis – I
InfoVis – II
InfoVis – III
Association rule mining
Time Series – I
Time Series – II
Time Series – III
Sequence Mining
Spatial Data
GIS
Spatial Data Mining
Issues with Data mining and InfoVis
Users
Detailed Course PlanPracticals
Exploratory Data Analysis (EDA)
Classification
Clustering
HCE
Tree Maps
Association Rule Mining
Assignment
Time Searcher
Time Series Representations
GIS
Exploring spatio-temporal Data
Dept. of Computing Science, University of Aberdeen 18
Summary
• All modern organizations– possess large volumes of data and– Users want to understand these data
• You learn technologies to– Extract and/Or present information from
large data sets• Analytical methods• Visualization methods