


Using ORCL as an Oracle

- an intro to how YOU can get gold from your data

Going from Zero to 60 with Oracle Advanced Analytics

Audience

I'm addressing those who:

Haven't used Oracle Advanced Analytics or similar products before (RapidMiner, MATLAB, Actian, Weka, etc.)

Want to find out if and how predictive analytics can benefit them

We'll start at the beginning and point you in the right direction (hopefully).

Image courtesy of marcolm at FreeDigitalPhotos.net

What do I mean by "gold"?

Ore > Refining Process > Gold > Value

Data > Patterns > Models > Value

Img: courtesy Mcstrother at wikimedia commons

Where's the Value?

Answers!*

*Just make sure they're right

Answers to WHAT???

What is the output value given a set of inputs?

What kinds of groups could this data be organized into?

What similarities exist inside this data set?

What is the sentiment of this sentence?

What inputs to this model really matter?

What data points in this set just don't fit?

When you leave

Ask yourselves: what can YOU do with Advanced Analytics?

Can you start asking some of the types of questions we'll ask today?

Can your company benefit from the answers to those questions?

Try some of these capabilities out in a sandbox:

Use your own enterprise sandbox

VirtualBox

apex.oracle.com => this is awesome!

Oracle Advanced Analytics

Available as an extra-cost add-on to the Enterprise Edition

Sorry, ONLY the Enterprise Edition (afaik)

Integrated with the architectures of several Oracle applications:

OBIEE: generic and customizable integration

HCM Fusion: employee performance predictions

CRM Fusion: sales opportunity predictions (what to sell, when, and for how much)

ORCA: Market Basket and Next Offer analysis

Industry-specific models: communications, airline, etc.

The GUI is SQL Developer

We'll focus on this today

IMG: Oracle

Getting things running

Images courtesy of ecee.colorado.edu

It's NOT that complicated

Oracle By Example (OBE) has some great tutorials

Google "Oracle Data Mining 12c OBE Series" and you'll find it

These are great 0-60 tutorials that show you exactly how to get SQL Developer, Oracle Advanced Analytics, and Oracle R Enterprise up and running.

The steps (50,000 ft view)

Get your data

Feed it to a model

Tweak it until it's accurate

Use your model

Does order matter?

Everyone has an opinion

Lots of Paradigms

KDD (www.kdd.org)

SEMMA (Wikipedia: SEMMA)

Five A's (Google "SPSS")

CRISP-DM (Wikipedia: Cross Industry Standard Process for Data Mining)

All are similar and contain the phases we'll talk about today.

Just remember about the NFL

NFL

No Free Lunch Theorem:

No one algorithm (or defined process) is always better than another.

Sometimes one process is better, sometimes it's not

IT DEPENDS

We'll be talking about different algorithms and ways of doing things. This theorem states that a boilerplate approach to every data mining task won't guarantee an optimal outcome, so I encourage you to learn multiple approaches to data mining and apply whichever best fits your data mining effort.

Common DM Steps

Pre-Process Data

Create a Model

Evaluate Performance

Use the Model

Tune the Model

Steps of Data Mining

Pre-Process

This can be a headache

Pre-Processing involves getting your data ready for analysis.

PL/SQL and SQL can be used to further prepare your data.

We'll go over how Oracle Advanced Analytics makes Pre-Processing fast and easy

Common Pre-Processing tasks

Get your data (it's closer than you think)

Format it (use SQL and PL/SQL)

Sample it

Bin it

Normalize it

Outlier it

(ok, no more made-up words)

Pre-Processing Overview

Pre-Process Demo

It doesn't have to be a headache anymore!

Sampling: sometimes you don't want it all

Binning: group numbers into categories

Normalization: put data on the same scale

Deal with outliers

Deal with missing values

Sample It (who doesn't love samples?)

Sometimes you don't want it all

OAA provides several sampling options

Sample size can be a % or a given #

Sample types can be Random, Stratified, or Top N

Sampling creates many smaller data sets from a single big one

Sampling at the field level
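To make the three sample types concrete, here is a minimal sketch in plain Python. It is not how OAA implements sampling; the `rows` data, the `state` field, and the function names are hypothetical and exist only for illustration.

```python
import random

def random_sample(rows, fraction, seed=0):
    """Random sample: keep a fixed fraction of all rows."""
    rng = random.Random(seed)
    return rng.sample(rows, round(len(rows) * fraction))

def stratified_sample(rows, key, fraction, seed=0):
    """Stratified sample: take the same fraction from every group."""
    rng = random.Random(seed)
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    sample = []
    for members in groups.values():
        # Keep at least one row per group so small groups are not lost.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

def top_n(rows, n):
    """Top N: the first n rows in their existing order."""
    return rows[:n]

# Hypothetical data: 80 members in one state, 20 in another.
rows = [{"id": i, "state": "UT" if i < 80 else "ID"} for i in range(100)]

print(len(random_sample(rows, 0.10)))
print(len(stratified_sample(rows, "state", 0.10)))
print([r["id"] for r in top_n(rows, 5)])
```

The stratified version keeps the 80/20 mix of the original data, which a purely random sample of the same size is not guaranteed to do.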

Bin it

Binning takes scalar values (say 0.1 through 99.0) and groups them into discrete bins or categories

For example: 10 FICOs (-999, 403, 428, 446, 698, 700, 740, 782, 812, 849)

Bin them into 3 categories (Yes, No, Maybe):

No: -999, 403, 428, 446

Maybe: 698, 700, 740

Yes: 782, 812, 849

These bins can be used in algorithms (models) that can't work on scalar values
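The FICO example can be sketched in a few lines of Python. The cut points (650 and 760) are assumptions chosen only so that the ten scores land in the bins listed above; the slides do not state the actual boundaries, and `bin_fico` is a hypothetical helper, not an OAA function.

```python
def bin_fico(score):
    """Map a FICO score to a category. Cut points are assumed for illustration."""
    if score < 650:
        return "No"
    if score < 760:
        return "Maybe"
    return "Yes"

ficos = [-999, 403, 428, 446, 698, 700, 740, 782, 812, 849]

# Collect each score under the bin it falls into.
bins = {"No": [], "Maybe": [], "Yes": []}
for score in ficos:
    bins[bin_fico(score)].append(score)

for label, members in bins.items():
    print(label, members)
```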

Normalize it

This isn't your 3NF normalization

Normalization means adjusting values measured on different scales to a common one (usually 0-1)

Example: 2 fields called Rate and Amount

Rate has a scale of 1% to 29%

Amount has a scale of -9,999,999 to 9,999,999

A change of .10 on the Rate scale has a bigger impact than a change of .10 on the Amount scale, because it is a much larger share of the field's range

OAA has several methods built in (Min Max, Z Score, Linear, and others)
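As a rough illustration of two of the methods named above, min-max scaling maps each value to (v - min) / (max - min), and a z-score maps it to (v - mean) / standard deviation. The sketch below uses plain Python with made-up Rate and Amount values; it assumes the values in a field are not all identical (otherwise the divisor is zero).

```python
from statistics import mean, pstdev

def min_max(values):
    """Rescale values to the 0-1 range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical sample values for the two fields.
rates = [1.0, 8.0, 15.0, 29.0]
amounts = [-9_999_999, 0, 5_000_000, 9_999_999]

print(min_max(rates))
print(min_max(amounts))
print(z_score(rates))
```

After min-max scaling, both fields run from 0 to 1, so a model no longer treats Amount as more important simply because its raw numbers are larger.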

Outlier It

OAA will detect outliers for you

You can use various definitions of outliers: standard deviations, percent ranges, and arbitrary value ranges

You can replace outlier values with null, edge values, etc.

Example: FICO scores usually fall between about 300 and about 850, but sometimes they come in as negatives, 999, or some (seemingly) randomly generated very large number.
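A minimal sketch of the "arbitrary value range" definition, using the 300 to 850 FICO range from the example. The function name, the `strategy` parameter, and the sample scores are hypothetical; OAA exposes this through its own transform settings rather than code like this.

```python
def handle_outliers(scores, low=300, high=850, strategy="null"):
    """Treat anything outside [low, high] as an outlier.

    strategy="null" replaces outliers with None;
    strategy="edge" clamps them to the nearest edge of the range.
    """
    cleaned = []
    for s in scores:
        if low <= s <= high:
            cleaned.append(s)
        elif strategy == "null":
            cleaned.append(None)
        elif strategy == "edge":
            cleaned.append(low if s < low else high)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return cleaned

scores = [-999, 403, 698, 849, 999, 123456]
print(handle_outliers(scores, strategy="null"))
print(handle_outliers(scores, strategy="edge"))
```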

Automatic Data Preparation

Some algorithms need data put into certain formats

OAA has options to prepare this data for you automatically

OAA supports Binning, Normalization, Missing Value Replacement, etc.

When you test a model or apply it to new data, ADP applies the same transformations that were used to build it
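Why the "same transformations" point matters can be shown with a small sketch: the scaling parameters are learned once from the training data and then reused, unchanged, on anything scored later. The `FittedMinMax` class and the sample values are hypothetical and only illustrate the idea; they are not ADP's implementation.

```python
class FittedMinMax:
    """Min-max scaler that remembers the range it was fitted on."""

    def fit(self, values):
        self.lo, self.hi = min(values), max(values)
        return self

    def transform(self, values):
        # Reuse the stored range; never recompute it from new data.
        return [(v - self.lo) / (self.hi - self.lo) for v in values]

# Hypothetical training rates, then new rates seen at scoring time.
train_rates = [1.0, 15.0, 29.0]
new_rates = [8.0, 29.0, 36.0]

scaler = FittedMinMax().fit(train_rates)
print(scaler.transform(train_rates))
print(scaler.transform(new_rates))
```

A new value outside the training range (36.0 here) lands outside 0-1, which is expected: rescaling the new data on its own range would silently change what the model's inputs mean.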

Create a Model

You have your questions

What kinds of answers do you want?

I've found 14 different model types (and subtypes) that Advanced Analytics natively offers

Model Types

Clustering: Automated Grouping

Feed a Clustering model data and it will sort the records into groups and tell you:

The various groups that exist inside the data you gave it

How the groups differ from each other

Why it put any given data point in the group it did

Once you've got a model you like, you can use Advanced Analytics to assign a new data point to a group

Let's use this to segment members (account holders)

Demo 1: Member Segmentation

Question: What groups do members (account holders) fall into?

Classification: Supervised Grouping

Similar to Clustering, but you pick the group(s) you want.

Predicts one column from your dataset by looking at the other columns.

Let's use this to predict loans more likely to be written off

Demo 2: Write-Off Classification

Question: Given details from loan applications, which loans are more likely to be written off than others?

Regression: Predicting X because you know Y

Similar to Classification, but the predicted value is a scalar (number) value, not a discrete (group) value.

Attempts to find a function that fits the data given to it

Training data builds the model; testing data shows how good the model really is

Let's use this to look at a simple Payment Amount Function
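The train/test idea can be sketched with an ordinary least-squares line fitted in plain Python. This is a stand-in for illustration, not the algorithm OAA uses, and the loan amounts and payments below are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_abs_error(slope, intercept, xs, ys):
    """Average absolute gap between predicted and actual values."""
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical (loan amount, monthly payment) pairs.
loans = [(1000, 32), (2000, 61), (3000, 95), (4000, 118), (5000, 152), (6000, 181)]
train, test = loans[:4], loans[4:]

# Training data builds the model...
slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])

# ...and held-out testing data shows how good it really is.
error = mean_abs_error(slope, intercept, [x for x, _ in test], [y for _, y in test])
print(slope, intercept, error)
```

The error is measured only on rows the model never saw during fitting, which is what makes it an honest estimate.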

Classic Regression

Demo 3: Payment Amount Model

Question: Given details from loan applications, what payment amount range can be expected?

Association Discovery: Correlating stuff

A.K.A. Market Basket Analysis/Discovery

Give this model groups of data and it will output the patterns it detects

Examples:

Amazon: Items Recommended for You

Netflix: Movies You Might Like

Wal-Mart's classic (and untrue) finding that people buy beer and diapers on Thursdays

Target's famous (and true) ability to detect pregnant women based upon purchases

Let's use this to build a Next Product Suggestion model

Demo 4: Product Suggestion

Question: Which products commonly go together?
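One common way to answer "which products go together" is to count how often pairs appear in the same basket. The sketch below uses the standard market-basket terms support (share of baskets containing both items) and confidence (share of baskets with the first item that also contain the second); neither term appears in the slides, and the baskets, product names, and `pair_rules` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.5):
    """Return (if_item, then_item, support, confidence) for frequent pairs."""
    n = len(baskets)
    item_counts, pair_counts = Counter(), Counter()
    for basket in baskets:
        items = sorted(set(basket))  # dedupe and fix the pair ordering
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return rules

# Hypothetical products held by four members.
baskets = [
    ["checking", "savings", "auto loan"],
    ["checking", "savings"],
    ["checking", "credit card"],
    ["checking", "savings", "credit card"],
]

for if_item, then_item, support, confidence in pair_rules(baskets):
    print(f"{if_item} -> {then_item}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule such as "savings -> checking" with high confidence is the kind of pattern a next-product suggestion would be built on.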

Anomaly Detection: Find the oddballs

Ever played "One of these things is not like the others"?

This model type finds data points that are outside the norm

Useful for fraud detection

Sorry, no demo for this one. Check out YouTube for one, though (search for "Oracle Advanced Analytics Anomaly Detection")

Text Analysis: Get the Gist

Text strings can be broken down into Tokens and Themes

Example: "When I started Oracle, what I wanted to do was to create an environment where I would enjoy working. That was my primary goal." (Larry Ellison)

[started], [Oracle], [wanted], [create], [environment], [enjoy], [working], [primary], [goal]

[when], [I], [what], [to], [do], [was], [to], [an], [where], [would], [that], [my]

These can be stemmed using dictionary operations

work, worked, working, works => [work]
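The token lists above can be reproduced with a short Python sketch: split the quote into words, drop the common words in the second list (often called stop words, a term not used in the slides), and look each remaining word up in a stem dictionary. The `STOP_WORDS` set and the tiny `STEMS` dictionary are assumptions taken from this one example, not OAA's actual lexicons.

```python
import re

# Common words to drop, taken from the second list in the example.
STOP_WORDS = {"when", "i", "what", "to", "do", "was", "an",
              "where", "would", "that", "my"}

# Tiny hypothetical stem dictionary covering only the example words.
STEMS = {"worked": "work", "working": "work", "works": "work"}

def tokenize(text):
    """Split text into alphabetic word tokens."""
    return re.findall(r"[A-Za-z']+", text)

def keep_tokens(text):
    """Drop stop words, keeping the remaining tokens in order."""
    return [t for t in tokenize(text) if t.lower() not in STOP_WORDS]

def stem(token):
    """Dictionary lookup; unknown words are returned unchanged."""
    return STEMS.get(token.lower(), token)

quote = ("When I started Oracle, what I wanted to do was to create an "
         "environment where I would enjoy working. That was my primary goal")

print(keep_tokens(quote))
print([stem(w) for w in ["work", "worked", "working", "works"]])
```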

Let's use this to gauge general satisfaction from surveys

Demo 5: Comment Sentiment

Question: What is the sentiment of comments given in feedback surveys?

Maintaining your Model

Models become outdated as they age

New data should be added and the model reprocessed

If the data structure changes or new fields are used, reprocess your model

Questions?

What will your next steps be?

What questions can you ask?

Check out YouTube for some great tutorials!

Oracle Docs: Oracle Data Mining Concepts

Credits

All images, where not attributed, courtesy of istockphoto.com or are otherwise used with permission.

Attributed images are copyright or trademark (TM) of their respective owner.

No sponsorship or endorsement shall be implied by the educational fair use of these images.

Answer Type                                  Model
Generate Grouping or Organization            Clustering
Discrete Value or Predefined Category        Classification
A Number                                     Regression
Free-form text details, Comment Sentiment    Text Processing
Anomalies or data sets that aren't normal    Anomaly Detection