Want Awesome Models? Build Awesome Training Data!

Expertly Prepared Data Produces Better Models, Faster!
Tom Ott
Marketing Data Scientist, RapidMiner

2016 RapidMiner, Inc. All rights reserved.

Today's Agenda
  • Introduction
  • Challenges of Dirty Data
  • Data Prep Overview
  • Data Exploration
  • Data Blending
  • Data Cleansing
  • Demo
  • Q&A

Unified Platform Accelerates Time to Value

  • Data Prep: Speed & optimize ALL data exploration, blending & cleansing tasks
  • Operationalize: Easily deploy & maintain models and embed analytic results
  • Model & Validate: Apply machine learning to rapidly prototype & confidently validate predictive models

  • Embed results in all types of business apps & data visualization tools
  • Incorporate all types of data

So this is the "who is RapidMiner" slide. Everyone should know by now that RapidMiner has an awesome visual design environment and tons of machine learning algos, so you can build awesome predictive models. What you might not know is that we also have kickass data prep. And not just any old data prep: we cover your basic blending and cleansing techniques, plus some pretty cool advanced techniques that help improve model accuracy and performance. Today I am going to show you some of these techniques.

Talk about the visual design environment here, and the 1500 algos and functions. Highlight: not only powerful data prep, but fast and easy data prep.

Data in the Real World is Dirty
  • Incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
      e.g., occupation=""
  • Noisy: containing errors or outliers
      e.g., Salary=-10, Age=222
  • Inconsistent: containing discrepancies in codes or names
      e.g., Age=42, Birthday=03/07/1997
      e.g., was rating 1, 2, 3, now rating A, B, C
      e.g., discrepancy between duplicate records
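
The three kinds of dirty data on this slide can be checked for mechanically. The following is a minimal Python sketch, not a RapidMiner process: the record layout, the field names, the 0 to 120 age range, and the fixed reference date are all assumptions made for illustration, and the birthday 03/07/1997 is read as March 7.

```python
from datetime import date

# Hypothetical records echoing the slide's examples; field names are assumptions.
records = [
    {"id": 1, "occupation": "", "salary": -10, "age": 42, "birthday": date(1997, 3, 7)},
    {"id": 2, "occupation": "engineer", "salary": 52000, "age": 222, "birthday": date(1980, 5, 1)},
    {"id": 3, "occupation": "analyst", "salary": 61000, "age": 30, "birthday": date(1986, 1, 15)},
]


def age_on(birthday, today):
    """Whole years elapsed between birthday and today."""
    years = today.year - birthday.year
    if (today.month, today.day) < (birthday.month, birthday.day):
        years -= 1
    return years


def find_issues(record, today):
    """Return a list of data-quality problems found in one record."""
    issues = []
    if not record["occupation"]:
        issues.append("incomplete: occupation is empty")
    if record["salary"] < 0:
        issues.append("noisy: negative salary")
    if not 0 <= record["age"] <= 120:  # assumed plausible range
        issues.append("noisy: implausible age")
    elif abs(record["age"] - age_on(record["birthday"], today)) > 1:
        issues.append("inconsistent: age does not match birthday")
    return issues


# Assumed "today", fixed so the report is reproducible.
REFERENCE_DATE = date(2016, 6, 1)
report = {r["id"]: find_issues(r, REFERENCE_DATE) for r in records}
for record_id, issues in report.items():
    print(record_id, issues)
```

Record 1 trips all three checks, record 2 only the age range check, and record 3 comes back clean.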

Time Consuming
  • Every real-world dataset needs some kind of data pre-processing
      • Deal with missing values
      • Correct erroneous values
      • Select relevant attributes
      • Adapt data set format to the model type
  • In general, data prep or pre-processing consumes greater than 60% of a data science project effort
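
As an illustration of the first two pre-processing tasks, here is a small Python sketch that treats out-of-range values as missing and then fills every gap with the median of the remaining values. The column contents, the valid range, and the choice of median imputation are assumptions for the example; other strategies (mean, model-based fills, dropping rows) are equally valid depending on the data.

```python
from statistics import median

# Hypothetical "age" column: None marks a missing value, 222 is erroneous.
ages = [34, None, 222, 41, 29, None, 38]


def clean_ages(values, low=0, high=120):
    """Mask out-of-range values, then impute all gaps with the median."""
    # Step 1: correct erroneous values by treating them as missing.
    masked = [v if v is not None and low <= v <= high else None for v in values]
    # Step 2: deal with missing values using the median of what is left.
    observed = [v for v in masked if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in masked]


cleaned = clean_ages(ages)
print(cleaned)
```

The median is used here because, unlike the mean, it is not pulled around by any extreme values that survive the range check.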

Reduces Model Accuracy & Performance

It's Time to Wrangle Some Data!

Data Exploration
  • Discovery through Stats, Charts and Graphs

Data Blending
  • Attribute Selection & Generation
  • Data Types & Conversions
  • Filters, Sorts & Joins
  • Sampling

Data Cleansing
  • Missing Values
  • Transformation - Normalization
  • Outliers
  • Feature Selection
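
To make the three stages concrete outside of RapidMiner, the sketch below touches one item from each group in plain Python: a join and a sort (blending), summary statistics (exploration), and z-score normalization with an outlier flag (cleansing). The two tables, the column names, and the 1.5 threshold are assumptions chosen for a tiny example, not recommended defaults.

```python
from statistics import mean, pstdev

# Hypothetical source tables; names and values are illustrative only.
customers = [
    {"customer_id": 1, "region": "east"},
    {"customer_id": 2, "region": "west"},
    {"customer_id": 3, "region": "east"},
    {"customer_id": 4, "region": "west"},
    {"customer_id": 5, "region": "east"},
]
spend = {1: 120.0, 2: 95.0, 3: 110.0, 4: 105.0, 5: 900.0}

# Blending: inner join on customer_id, then sort by spend, highest first.
blended = [
    dict(c, spend=spend[c["customer_id"]])
    for c in customers
    if c["customer_id"] in spend
]
blended.sort(key=lambda row: row["spend"], reverse=True)

# Exploration: basic summary statistics for the numeric attribute.
values = [row["spend"] for row in blended]
mu, sigma = mean(values), pstdev(values)
print(f"mean={mu:.1f} stdev={sigma:.1f} min={min(values)} max={max(values)}")

# Cleansing: z-score normalization, flagging rows far from the mean.
Z_THRESHOLD = 1.5  # assumed cutoff, kept low because the sample is tiny
for row in blended:
    row["spend_z"] = (row["spend"] - mu) / sigma
    row["is_outlier"] = abs(row["spend_z"]) > Z_THRESHOLD

outlier_ids = [row["customer_id"] for row in blended if row["is_outlier"]]
print("outliers:", outlier_ids)
```

Only customer 5 is flagged here. In practice the outlier would usually be handled before normalizing, since a single extreme value inflates both the mean and the standard deviation.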

So you've got your data. It's dirty, it's messy, it's structured and unstructured, and it's in various different locations. It's time to wrangle some data!

As most of you know, preparing the data set to suit a data mining task is the most time-consuming part of the process. Data is very rarely available in the form required by the predictive modeling process. Most algorithms require data to be structured in a tabular format, with records in rows and attributes in columns. If the data is in any other format, then it needs to be transformed. What if there is incorrect data? Or missing values?
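
One common case of "any other format" is long data, where each line holds a single (record, attribute, value) triple. The Python sketch below pivots such data into the tabular layout described above, one row per record and one column per attribute. The input triples and the helper name to_table are hypothetical, and attributes a record lacks are left as None so the missing values stay visible.

```python
# Hypothetical long-format input: one (record, attribute, value) per entry.
triples = [
    ("r1", "age", 42),
    ("r1", "salary", 52000),
    ("r2", "age", 35),
    ("r3", "salary", 61000),
    ("r3", "age", 29),
]


def to_table(triples):
    """Pivot triples into {record_id: {attribute: value}} with fixed columns."""
    attributes = sorted({attr for _, attr, _ in triples})
    rows = {}
    for record_id, attr, value in triples:
        # Start each new record with every attribute set to None.
        rows.setdefault(record_id, dict.fromkeys(attributes))[attr] = value
    return attributes, rows


columns, table = to_table(triples)
print(["id"] + columns)
for record_id, row in table.items():
    print([record_id] + [row[c] for c in columns])
```

Record r2 has no salary entry, so its salary cell stays empty and can be handled in the cleansing step.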

Today we will discuss some of the activities performed during the data prep stage and how to overcome common challenges. We are going to look at how you can get your data expertly formatted to get the most accurate results from your predictive models, and how RapidMiner makes it fast and easy to do this.

Demonstration

We are #1 product
Uniquely: only player who can deliver pilot projects, marketplace of experts
The key to innovation and competitive advantage lies in the power of data science. Invest in the right tools and the right skills to uncover new opportunities and change the world.

Next Steps

Resources
  • RapidMiner Blog: rapidminer.com/resources/blog/
  • RapidMiner Community: community.rapidminer.com

On-Demand Demos
  • Advanced Data Prep: rapidminer.com/resource/advanced-data-prep/
  • Data Prep Subprocess: rapidminer.com/resource/creating-data-prep-Subprocess

Training Videos
  • Data Exploration: rapidminer.com/training/videos/
  • Data Prep: rapidminer.com/training/videos/

Contact Us

inquiries@rapidminer.com
@RapidMiner
www.rapidminer.com

Q & A

Discuss Data Prep in the Community
community.rapidminer.com
