DESCRIPTION
Data Mining and Data Visualization – Tools to Allow Students to do BIG STUFF with BIG DATA - Course Technology Computing Conference
Presenter: Dan Matthews, Trine University

When beginners first hear the term “data mining,” they wonder: “What kind of mining could a computer possibly do? It must be awfully hard. What would the end product of data mining look like?” Data mining (analytics) is becoming a core skill for an unprecedented number of professions, and software environments exist that help make the process efficient for the data miner.

Tableau is one of the systems I use in my data mining class. The software accelerates the process of converting data not just into information but into knowledge. Its intuitive drag & drop technology lets you stop worrying about how to connect to data and spend your time answering questions and forming relationships (knowledge) using critical thinking and creative association. With Tableau's speed and ease of use, students find themselves doing more complex analyses in less time.

Tableau has an academic program that gives full-time students professional-grade analytics software, in the form of Tableau Desktop, to help prepare them for working in an increasingly data-driven world. Students use Tableau Desktop for class work and extracurricular projects. Tableau also offers instructors free access to Tableau Desktop to equip them to teach the next generation of data scientists (miners) and analysts. Beyond software, Tableau recognizes that materials and support are essential to teaching with a tool, and to that end it offers a variety of solutions for different classrooms. Dozens of universities are using Tableau in data mining classes.

I want to share how I use the resources available to me to deliver quality instruction in this very important new technology discipline. I will define data mining (as best I can), discuss why the subject is so important, and discuss a variety of applications. Most of all, I will demonstrate some fun things students can do by mining the big data sets available in the cloud.
Students Doing BIG STUFF with BIG DATA Dan Matthews – Trine University
Trine University – Angola, Indiana
Department of Informatics and Cybersecurity
INFORMATICS – OUR WAY
“The success of computing is in the resolution of problems, found in areas that are predominately outside of computing.”
Data Mining AKA:
Information Harvesting
Knowledge Mining
Knowledge Discovery
Data Dredging
Data Pattern Processing
Data Archaeology
Database Mining
Siftware
Analytics
Business Intelligence
And more…
A DECENT DEFINITION
• The process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of stored data, using pattern recognition technologies and statistical and mathematical techniques.
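To make "discovering correlations using statistical techniques" concrete, here is a minimal sketch in plain Python that measures the linear correlation between two columns. The `pearson` helper and the toy columns `hours_studied` and `exam_score` are assumptions invented for illustration; they are not taken from the talk or from any particular tool.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations around each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data, made up for this example only.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 58, 65, 70, 79]

r = pearson(hours_studied, exam_score)
print(f"correlation = {r:.3f}")
```

A value near +1 or -1 suggests a strong linear relationship and a value near 0 suggests none. In a real data mining task this check would be run across many column pairs, and a strong correlation is a lead to investigate, not proof of cause.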
A number of technology skills are needed:
Data Mining
Database Management
Machine Learning
Artificial Intelligence
Analysis of Algorithms
Statistics
Visualization
Data Warehousing
Security
Technology Ethics
“In order to discover anything, you must be looking for something.”
Laws of Serendipity
I had to mine this data the hard way.
What I won’t talk about today, although these concepts are important to learn in a class on data mining.
Having fun “playing” with and mining data!
Visualization to gain insight and knowledge
David McCandless Data Visualization TED Talk
WEKA: the software
• Machine learning/data mining software written in Java (distributed under the GNU General Public License)
• Used for research, education, and applications
• Complements “Data Mining” by Witten & Frank
• Main features:
– Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
– Graphical user interfaces (incl. data visualization)
– Environment for comparing learning algorithms
@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
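The file above is in WEKA's ARFF format: a header that declares each attribute, then comma-separated data rows in which `?` marks a missing value. As a rough sketch of how such a file is read, the Python below parses the sample into plain dictionaries. The `parse_arff` function is a hypothetical helper written for this example, not part of WEKA, and it deliberately ignores quoted values, sparse rows, and date attributes.

```python
def parse_arff(text):
    """Parse a small, dense ARFF string into (relation, attributes, rows)."""
    relation = None
    attributes = []  # (name, kind); kind is "numeric", a list of labels, or "other"
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and ARFF comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            if len(values) != len(attributes):
                raise ValueError(f"wrong number of fields: {line!r}")
            row = {}
            for (name, kind), value in zip(attributes, values):
                if value == "?":              # ARFF marker for a missing value
                    row[name] = None
                elif kind == "numeric":
                    row[name] = float(value)
                elif isinstance(kind, list):  # nominal: must be a declared label
                    if value not in kind:
                        raise ValueError(f"unknown label {value!r} for {name}")
                    row[name] = value
                else:
                    row[name] = value
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = [label.strip() for label in spec.strip("{}").split(",")]
            elif spec.lower() in ("numeric", "real", "integer"):
                kind = "numeric"
            else:
                kind = "other"
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows

# The four complete rows from the slide; the trailing "..." is left out.
SAMPLE = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""

relation, attributes, rows = parse_arff(SAMPLE)
known = [r["cholesterol"] for r in rows if r["cholesterol"] is not None]
mean_cholesterol = sum(known) / len(known)
print(relation, len(rows), "rows; mean cholesterol over known values:", mean_cholesterol)
```

Mapping `?` to `None` keeps the missing cholesterol reading out of the average instead of silently treating it as zero, which is the kind of pre-processing decision a data miner has to make before any learning algorithm runs.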
WEKA only deals with “flat” files
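In practice, "flat" means one table with one row per example, so data spread across related tables has to be joined before mining. The sketch below shows one way that flattening step might look in Python. The `patients` and `tests` tables, their field names, and the `flatten` and `to_csv` helpers are all hypothetical and invented for illustration; the `?` missing-value marker mirrors the ARFF sample above.

```python
import csv
import io

# Two hypothetical related tables, as they might sit in a relational database.
patients = [
    {"patient_id": 1, "age": 63, "sex": "male"},
    {"patient_id": 2, "age": 38, "sex": "female"},
]
tests = [
    {"patient_id": 1, "cholesterol": 233},
    {"patient_id": 2, "cholesterol": None},  # reading was not recorded
]

def flatten(patients, tests):
    """Join each test row to its patient, giving one flat row per test."""
    by_id = {p["patient_id"]: p for p in patients}
    flat = []
    for t in tests:
        p = by_id[t["patient_id"]]
        flat.append({"age": p["age"], "sex": p["sex"], "cholesterol": t["cholesterol"]})
    return flat

def to_csv(rows, missing="?"):
    """Write flat rows as CSV text, using `missing` for absent values."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({k: (missing if v is None else v) for k, v in row.items()})
    return buf.getvalue()

flat_rows = flatten(patients, tests)
flat_file = to_csv(flat_rows)
print(flat_file)
```

The join keys (`patient_id` here) are dropped from the output because they identify a record and carry no pattern worth learning.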
Tableau 8
Visual Analytics
Business Integration
Any Data
Fast Performance
Web & Mobile Authoring
Forecasting
Sets and visual groups
Shared Filters
Treemaps, bubble charts, word clouds
New marks card
Freeform dashboards
Data Blending improvements
Parallelized dashboards
Faster quick filters
Data Engine & Extract performance
Fast graphics and calculations
Performance recorder
Salesforce.com
Google Analytics & Google BigQuery
Cloudera Impala, Cassandra, HortonWorks, Hadapt, Karmasphere
SAP HANA
Data Extract API
JavaScript API
Data Server Security
Server Auditing
Distributed Data Engine
Web Authoring
iPad and Android authoring
Local rendering
Subscriptions
Tableau for Academia
Time to play!
Dan Matthews – Associate Professor – Trine University [email protected]