Students Doing BIG STUFF with BIG DATA Dan Matthews – Trine University

Data Mining and Data Visualization – Tools to Allow Students to do BIG STUFF with BIG DATA - Course Technology Computing Conference


Presenter: Dan Matthews, Trine University

When beginners first hear the term “data mining” they wonder, “What kind of mining could a computer possibly do? It must be awfully hard. What would the end product of data mining look like?” Data mining (analytics) is becoming a core skill for an unprecedented number of professions. Software environments exist that help make the process efficient for the data miner.

Tableau is one of the systems I use in my data mining class. Its intuitive drag-and-drop technology helps accelerate the process of converting data not just to information but to knowledge: you stop worrying about how to connect to data and spend your time answering questions and forming relationships (knowledge) using critical thinking and creative association. With Tableau's speed and ease of use, students find themselves doing more complex analyses in less time.

Tableau has an academic program that gives full-time students professional-grade analytics software, in the form of Tableau Desktop, to help prepare them for working in an increasingly data-driven world. Students use Tableau Desktop for class work and extracurricular projects. Tableau also offers instructors free access to Tableau Desktop to equip them to teach the next generation of data scientists (miners) and analysts. Beyond software, Tableau recognizes that materials and support are essential to teaching with a tool, and to that end it offers a variety of solutions for different classrooms. Dozens of universities are using Tableau in data mining classes.

I want to share how I use the resources available to me to deliver quality instruction in this very important new technology discipline. I will define data mining (as best I can), discuss why the subject is so important, and discuss a variety of applications. Most of all, I will demonstrate some fun things students can do by mining the big data sets available in the cloud.

Page 1:

Students Doing BIG STUFF with BIG DATA Dan Matthews – Trine University

Page 2:

Trine University – Angola, Indiana

Department of Informatics and Cybersecurity

Page 3:


“The success of computing is in the resolution of problems, found in areas that are predominately outside of computing.”

Page 4:

Data Mining AKA:

Information Harvesting

Knowledge Mining

Knowledge Discovery

Data Dredging

Data Pattern Processing

Data Archaeology

Database Mining

Siftware Analytics

Business Intelligence

And more…

Page 5:


• The process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of stored data, using pattern recognition technologies and statistical and mathematical techniques.
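
As a small illustration of the "correlations" and "statistical and mathematical techniques" in this definition, here is a minimal Python sketch that computes a Pearson correlation coefficient by hand. The function name `pearson` and the two columns of numbers are made up for illustration only; they are not data from the talk.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations.
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)

# Hypothetical, made-up records (NOT real data): two numeric columns.
age = [35, 42, 50, 58, 63, 70]
cholesterol = [180, 195, 210, 230, 228, 250]

r = pearson(age, cholesterol)
print(f"Pearson r between age and cholesterol: {r:.2f}")
```

A value of r near +1 or -1 suggests a strong linear relationship worth a closer look; a value near 0 suggests none. Real data mining tools apply this kind of measure, and far richer ones, across many columns and many more rows.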

Page 6:

A number of technology skills are needed:

Data Mining

Database Management


Machine Learning

Artificial Intelligence

Analysis of Algorithms



Data Warehousing


Technology Ethics

Page 7:

“In order to discover anything, you must be looking for something.”

Laws of Serendipity

Page 8:

I had to mine this data the hard way.

Page 9:

What I won’t talk about today, although these concepts are important to learn in a class on data mining.

Page 10:

Having fun “playing” with and mining data!

Page 11:

Visualization to gain insight and knowledge

David McCandless Data Visualization TED Talk

Page 12:

WEKA: the software

• Machine learning/data mining software written in Java (distributed under the GNU General Public License)
• Used for research, education, and applications
• Complements “Data Mining” by Witten & Frank
• Main features:
– Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
– Graphical user interfaces (incl. data visualization)
– Environment for comparing learning algorithms

Page 13:

Page 14:

@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}


WEKA only deals with “flat” files
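
To make the structure of this header concrete, here is a minimal Python sketch that reads the declarations from the slide and reports which attributes are numeric and which are nominal. It is an illustration only, not part of WEKA: the function name `parse_arff_header` is hypothetical, and the sketch assumes simple unquoted attribute names and handles only the `numeric` and `{...}` forms that appear on the slide.

```python
import re

# The header from the slide, one declaration per line.
ARFF_HEADER = """\
@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
"""

def parse_arff_header(text):
    """Return (relation_name, attributes).

    attributes maps each attribute name to the string 'numeric' or to a
    list of nominal values. Assumes unquoted names without spaces.
    """
    relation = None
    attributes = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and ARFF comments
            continue
        lower = line.lower()
        if lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            nominal = re.fullmatch(r"\{(.*)\}", spec)
            if nominal:
                # Nominal attribute: a brace-enclosed, comma-separated value list.
                attributes[name] = [v.strip() for v in nominal.group(1).split(",")]
            else:
                attributes[name] = spec.lower()
    return relation, attributes

relation, attributes = parse_arff_header(ARFF_HEADER)
print(relation)
for name, kind in attributes.items():
    print(f"{name}: {kind}")
```

Each attribute becomes one column of the “flat” file, and each data row that follows the header supplies one value per attribute, in the same order.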

Page 15:

Visual Analytics


Tableau 8

Any Data

Web & Mobile Authoring

Page 16:

Visual Analytics


Tableau 8

Any Data

Web & Mobile Authoring


Sets and visual groups

Shared Filters

Treemaps, bubble charts, word clouds

New marks card

Freeform dashboards

Data Blending improvements

Parallelized dashboards

Faster quick filters

Data Engine & Extract performance

Fast graphics and calculations

Performance recorder


Google Analytics & Google BigQuery

Cloudera Impala, Cassandra, HortonWorks, Hadapt, Karmasphere


Data Extract API

JavaScript API

Data Server Security

Server Auditing

Distributed Data Engine

Web Authoring

iPad and Android authoring

Local rendering


Page 17:

Tableau for Academia

Page 18:

Time to play!

Page 19:

Dan Matthews – Associate Professor – Trine University
[email protected]