RapidMiner Resources & Book presentation
Andrew Chisholm
Rapidminerresources 1
TOPICS
• About us - Andrew Chisholm
• Book - Exploring Data with RapidMiner
• About us – Dr Markus Hofmann
• RapidMiner Resources background
• RapidMiner Resources videos now
• Future plans
• A mini survey to help me focus
About us – Andrew Chisholm
• By day
  • Product manager for an active test and measurement product used extensively in the telecoms world
• By night
  • Crime fighting super hero
  • Data mining hobbyist
• Recent Masters degree in Data Mining and Business Intelligence
• Certified RapidMiner Master (#007)
• RapidMiner blog at http://rapidminernotes.blogspot.com
• Author of “Exploring Data with RapidMiner”
Exploring data with RapidMiner
• 90% [1] of data mining is
  • Cleaning
  • Reformatting
  • Summarizing
  • Understanding
• …Exploratory Data Analysis…
• RapidMiner is good at helping with this
• …so I decided to write a book
• Practical examples within a process context
[1] Ingo Mierswa – verbal communication RapidMiner World Conference 20/8/14 09:43
About us - Dr Markus Hofmann
• PhD from Trinity College Dublin
• Lecturer in Informatics at the Institute of Technology, Blanchardstown
• Editor with Ralf Klinkenberg “RapidMiner: Data Mining Use Cases and Business Analytics Applications”
• Editor for an upcoming text mining book
• Extensive knowledge in the data mining domain
Background
• RapidMiner is a truly powerful product
• The visual method of creating processes means it is more accessible to visual learners
• There is a learning curve, and videos are a good way to help with it because they match the visual method of creating processes
• RapidMiner videos initially, with other collaborators in the future
• We do charge to cover costs and hopefully make some beer money
RapidMiner Resources - now
• http://rapidminerresources.com
• Approximately 60 videos ~15 minutes duration each
• ~15 hours total length
• Organised as “basic”, “advanced” and “RapidMiner Server”
Basic idea
• Most videos focus on one operator and show it being used with a mini case study
• Additional context and operators are required to help explanations
• Processes and data accompany the videos so users can “sing along”
• Tips and tricks as well as gotchas pop up from time to time
• More advanced videos tend to focus on broader concepts
  • “taming messy data”
  • “regular expressions”
  • “macros”
  • “dates”
• The idea is that they can be used to help learn initially and act as a refresher later
Operators
Videos
Juicy images
The future
• There is so much we could do
• 5 candidate areas
  • Groovy Dark Arts
  • Text Mining
  • Web Mining
  • RapidMiner Server
  • Time Series in more detail
• The challenge is what is the priority?
A mini survey
• Pretend you have $100k to spend
• I’m going to give you a link to a survey which will ask you to spend your money across different choices
• You can put all the money on one choice or spread it out across all of them
• Hopefully we will get an interesting result
• It will help us to decide what to focus on
http://goo.gl/vLgy96
What the survey will look like
• Simply enter money in the 6 boxes (it has to add up to 100)
• Optionally give more detail and your name (don’t worry, no salesman will call)
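The "has to add up to 100" rule can be illustrated with a minimal Python sketch. It is not the survey's actual implementation: the function name `validate_allocation` and the sample amounts are made up for illustration, and the function accepts any number of boxes because the slides do not name all six.

```python
def validate_allocation(allocation, total=100):
    """Check that a spend across choices is non-negative and adds up to `total`.

    `allocation` maps a choice name to the amount placed on it.
    Hypothetical helper, not part of the real survey.
    """
    if any(amount < 0 for amount in allocation.values()):
        raise ValueError("amounts must not be negative")
    spent = sum(allocation.values())
    if spent != total:
        raise ValueError(f"allocation adds up to {spent}, expected {total}")
    return allocation


# Example: everything spread across the five candidate areas (amounts invented).
example = validate_allocation({
    "Groovy Dark Arts": 40,
    "Text Mining": 20,
    "Web Mining": 10,
    "RapidMiner Server": 10,
    "Time Series in more detail": 20,
})
```

Putting all the money on one choice is equally valid, as long as the total is still 100.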
Groovy Dark Arts
• Reading from databases
• Getting details from models – for example SVD eigenvectors as example sets
• Multiple inputs and multiple outputs
• Regular expressions for parsing data
• Reading data files efficiently
• Checking example sets to assert correctness
Text mining
• Word vectors
• Pruning
• Filtering
• Meta data
• Word lists
• Windowing documents
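To give a flavour of what "word vectors", "pruning" and "word lists" mean, here is a minimal sketch in plain Python rather than RapidMiner operators. The function names, the `min_docs` pruning threshold and the sample documents are assumptions made for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case a document and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_word_vectors(documents, min_docs=2):
    """Return (word_list, vectors) using simple term counts.

    Words that occur in fewer than `min_docs` documents are pruned,
    which keeps the word list (and the vectors) small.
    """
    tokenized = [tokenize(doc) for doc in documents]
    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    word_list = sorted(w for w, n in doc_freq.items() if n >= min_docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[w] for w in word_list])
    return word_list, vectors


# Invented sample documents.
docs = [
    "RapidMiner helps with text mining",
    "Text mining needs word vectors",
    "Word vectors feed the mining step",
]
word_list, vectors = build_word_vectors(docs, min_docs=2)
```

Each row of `vectors` lines up with `word_list`, so every document becomes one row of numbers that a learner could consume.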
Web mining
• Browsing and crawling
• Xpath
• Enrichment from external sources
• JSON
• XML
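As a rough illustration of the XPath, XML and JSON topics, the sketch below uses only the Python standard library rather than RapidMiner operators. The XML and JSON snippets, the `level` attribute and the function names are hypothetical; the video titles are borrowed from the topics mentioned earlier.

```python
import json
import xml.etree.ElementTree as ET

# Invented sample data for illustration.
XML_DOC = """
<catalog>
  <video level="basic"><title>dates</title></video>
  <video level="advanced"><title>macros</title></video>
  <video level="basic"><title>regular expressions</title></video>
</catalog>
"""

JSON_DOC = '{"videos": [{"title": "dates", "minutes": 15}, {"title": "macros", "minutes": 14}]}'


def titles_for_level(xml_text, level):
    """Select video titles with an XPath-style query on the `level` attribute."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a limited XPath subset, including attribute predicates.
    nodes = root.findall(f".//video[@level='{level}']/title")
    return [node.text for node in nodes]


def total_minutes(json_text):
    """Parse a JSON document and add up the `minutes` field of each video."""
    data = json.loads(json_text)
    return sum(video["minutes"] for video in data["videos"])


basic_titles = titles_for_level(XML_DOC, "basic")
minutes = total_minutes(JSON_DOC)
```

The same pattern (fetch a document, select the interesting nodes, turn them into rows) is what enrichment from external sources usually boils down to.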
RapidMiner Server
• Installing
• Passing parameters
• Creating services
• Reports
• Schedules
Time Series in more detail
• Extracting features from series
• Windowing
• Fourier analysis
• Wavelet transforms
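Windowing and feature extraction can be sketched in a few lines of plain Python. This is an illustrative assumption about what those topics involve, not RapidMiner's own operators; the function names, the chosen features and the sample series are invented.

```python
def sliding_windows(series, width, step=1):
    """Cut a series into overlapping windows of `width` values."""
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]


def window_features(window):
    """Extract a few simple summary features from one window."""
    return {
        "mean": sum(window) / len(window),
        "min": min(window),
        "max": max(window),
        "change": window[-1] - window[0],  # last value minus first value
    }


# Invented sample series.
series = [1, 2, 4, 7, 11]
windows = sliding_windows(series, width=3)
features = [window_features(w) for w in windows]
```

Each window becomes one example with its own features, which turns a single long series into a table that ordinary learners can work with.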
The survey…
Groovy Dark Arts
• Reading from databases
• Getting details from models – for example SVD eigenvectors as example sets
• Multiple inputs and multiple outputs
• Regular expressions for parsing data
• Reading data files efficiently
• Checking example sets to assert correctness
Text Mining
• Word vectors
• Pruning
• Filtering
• Meta data
• Word lists
• Windowing documents
Web Mining
• Browsing and crawling
• Xpath
• Enrichment from external sources
• JSON
• XML
RapidMiner Server
• Installing
• Passing parameters
• Creating services
• Reports
• Schedules
Time Series in more detail
• Extracting features from series
• Windowing
• Fourier analysis
• Wavelet transforms
http://goo.gl/vLgy96
Questions…
Thank you…