Using Machine Learning in OBIEE for Actionable BI
By Lakshman Bulusu
Mitchell Martin Inc./Bank of America
• Using Machine Learning (ML) via Oracle R Technologies in OBIEE
• Implementing an R-based predictive analytics algorithm of ML in Oracle DB 12c
• Output of prediction visually integrated in OBIEE dashboards for actionable insight
• Key takeaways:
  • An outline of Oracle R technologies
  • Building and scoring a model using the Random Forest ML classification algorithm for in-database predictive analytics in DB 12c
  • End-to-end implementation steps of the above and its integration with OBIEE dashboards, backed by a business use case
Outline of Oracle R Technologies
• Open Source R – ready-to-use R functions, related code, and data in the form of packages, plus views called CRAN Task Views for ML and Statistical Learning (CRAN stands for Comprehensive R Archive Network, available at http://cran.r-project.org or http://www.R-project.org/)
• Oracle's R technologies
  • Oracle R Distribution
  • ROracle
  • Oracle R Enterprise (ORE)
  • Oracle R Advanced Analytics for Hadoop
• Oracle R Distribution – v.3.2.0
  • R software redistribution of open source R (freely available) – supports dynamic loading of high-performance math libraries for multi-threaded execution, e.g., the Intel Math Kernel Library
  • Enhancements to open source R for scalability across users and database for embedded R execution
• ROracle – an open source R package managed by Oracle that comes with a DBI-compliant Oracle driver for R using the OCI library, re-engineered and optimized for DB connectivity between R and Oracle
  • SQL statement execution from the R interface
  • Transactional support for DML operations
  • Scalable across all data types, including the Oracle NUMBER, VARCHAR2, TIMESTAMP, and RAW data types, as well as large result sets
  • Connect to Oracle DB from Oracle R Distribution – Oracle Instant Client or the Oracle standard DB client must be installed. The SQL*Plus interactive SQL interface can be used with the former.
  • ROracle v.3-1.11 availability as of this writing
• Oracle R Advanced Analytics for Hadoop (ORAAH)
  • A component of the Oracle Big Data Connectors software suite
  • Available as a Hadoop cluster integrating Hadoop and Hive with Oracle DB (Advanced Analytics option)
  • Data sourced in HDFS can be harnessed using Hive-based SQL, as well as via HDFS data mappings as direct input to ML routines – executed as MapReduce jobs or using Apache Spark. Hive enables data preparation, joins, and view creation.
  • Source data in HDFS as CSV, as a Hadoop table, or cached into Apache Spark as an RDD (Resilient Distributed Dataset)
  • Details of primary ML algorithms at http://www.oracle.com/technetwork/database/database-technologies/bdc/r-advanalytics-for-hadoop/overview/index.html and examples of use cases based on these algorithms at https://blogs.oracle.com/R/entry/oracle_r_advanced_analytics_for
Oracle R technologies – Oracle R Enterprise (ORE)
• Oracle R Enterprise (ORE) – the main component of Oracle R technologies, available with the Oracle Advanced Analytics option
• Architecture
  • Database server machine with Oracle DB installed – contains libraries and PL/SQL programs for the ORE client
  • R engine installed on Oracle DB for embedded R execution – in-database execution of statistics and machine learning functions and models. Each DB R engine has ORE Server and ORE Client packages.
  • Multiple R engines spawned for data parallelism
  • Native Oracle DB features for SQL and PL/SQL
  • Oracle R Distribution
  • ROracle for database connectivity
  • Client R engine with client ORE packages, open source R (or Oracle R Distribution), and ROracle
  • In-DB R Script Repository that stores R scripts – callable by name directly from SQL
  • In-DB R datastore that stores R objects
• Advantages –
  • In-database execution on stored data directly
  • Minimal latency, high performance, and multi-threaded and parallel capabilities for data and models
  • Scalability
  • Minimal memory usage
• ORE extends open source R in the following manner:
  • ORE Transparency Layer
  • Embedded R Execution (via the Embedded R Engine) – both R interface and SQL interface
  • Predictive Analytics
• As of this writing, the latest version of Oracle R Enterprise is 1.5.0.
• Further details and additional enhancements to ORE can be found at http://www.oracle.com/technetwork/database/database-technologies/r/r-enterprise/overview/index.html.
• ORE Transparency Layer –
  • In-database computation and analysis of data through R is done transparently.
  • Basic data operations in R can be coded against R data.frames; the corresponding R functions are exposed as overloaded function definitions and their execution is pushed to Oracle DB, where the statistical and machine learning operations are executed on data in Oracle tables.
  • ORE provides ore.frame objects (a subclass of data.frame) that get substituted for database tables, so that SQL is generated for in-DB execution on data stored in Oracle.
  • This Transparency Layer provides reduced latency and optimal performance in terms of operational efficiency, including at big-data scale.
• Embedded R Execution –
  • Embedded R engine(s) spawned by Oracle DB allow data and task parallelism via its Embedded R Execution support. Embedded R Execution is exposed through both R and SQL APIs. The SQL APIs allow seamless integration with other applications, such as OBIEE RPDs and dashboards, by passing results from R for business intelligence and advanced analytics. Table 2 lists these APIs (both R interface and SQL interface) for ORE v1.5.0.
  • Advantages include more memory availability and database-enabled data parallelism.
  • Lights-out processing – Embedded R execution enables execution of R scripts (including those based on open source CRAN packages) embedded in SQL and PL/SQL routines.
  • R scripts can be stored in the database R Script Repository and invoked by name in calling SQL statements. This can be done dynamically too. The output of such execution can be structured data, an XML representation of R objects and graphics, or PNG graphics via BLOB columns in Oracle table(s).
• Predictive Analytics – ORE supports three types of predictive analytics – ore.eda for exploratory data analysis, ore.dm for data mining, and ore.predict for additional predictive analytics. More information can be found at http://www.oracle.com/technetwork/database/database-technologies/r/r-enterprise/overview/index.html and in the R Enterprise User's Guide.
ORE – Primary R objects
• data.frame – data participating in R operations
• ore.frame – objects of a subclass of data.frame, unique to ORE, that get substituted for database tables
• Transparent semantic mapping from Oracle tables to R data.frame objects and vice versa
• R scripts in the ORE script repository – R routines that are stored in-database and get executed via Embedded R execution. They are called using either the ORE R APIs or the SQL APIs.
• R datastore that stores R datasets in Oracle DB
• ore.push and ore.pull – ore.push sends data.frame objects to Oracle DB as ore.frame objects, and ore.pull retrieves ore.frame objects as data.frame R objects
• The primary ORE packages are listed in Table 1 and the ORE embedded execution APIs are listed in Table 2.
Table 1 Primary ORE packages

Package                 Functionality
DBI                     Supplemental package for the Database Interface for R
ORE                     Oracle R Enterprise
OREbase                 Functionality related to R's base package
OREgraphics             Functionality related to R's graphics package
OREstats                Functionality related to R's statistics package
OREeda                  Package for exploratory data analysis
OREdm                   Package for data mining (Oracle Data Mining)
OREpredict              Package for scoring data in Oracle database using R model predictions
ORExml                  Functionality for XML generation within R
bitops                  Functions for bitwise operations
png                     Supplemental package for reading/writing PNG images from/to R and Oracle DB
OBIEEAdvancedAnalytics  Advanced Analytics and Machine Learning R functions supported in ORE for integration with OBIEE

Table 2 Embedded R Execution ORE and SQL APIs

In the descriptions below, the function f refers to the R function associated with each API.

API R Interface   API SQL Interface   Brief Description
ore.doEval        rqEval              Executes function f without an input data argument
ore.tableApply    rqTableApply        Executes function f with an ore.frame input data argument provided via the first parameter to f (as a data.frame)
ore.groupApply    rqGroupApply        Executes function f by partitioning data as per the values in an index column. Each data partition is provided as a data.frame argument to the first parameter of f. Parallel execution of each call of f is enabled
ore.rowApply      rqRowEval           Executes function f by passing a chunk of rows of the provided input data. Each input chunk of data is passed as a data.frame to the first parameter of f. Parallel execution of each call of f is enabled
ore.indexApply    N/A                 Executes function f without an input data argument but with an index of the execution calls 1 through n, where n is the number of function calls. Parallel execution of each call of f is enabled
ore.scriptCreate  sys.rqScriptCreate  Stores the R script in the ORE R script repository with the given name and the associated function
ore.scriptDrop    sys.rqScriptDrop    Drops the R script with the given name from the ORE R script repository
ore.scriptLoad    N/A                 Loads the R script with the given name from the ORE R script repository for subsequent execution of the function associated with the R script
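The partition-and-apply behaviour that Table 2 describes for ore.groupApply can be illustrated locally. What follows is a minimal base-R sketch and not ORE code: split() and lapply() stand in for the database-side partitioning, and the data frame `sales`, its columns, and the function `f` are hypothetical names invented for this illustration.

```r
# Hypothetical data, invented for illustration only.
sales <- data.frame(
  Source = c("SourceA", "SourceA", "SourceB", "SourceB", "SourceC"),
  amount = c(10, 20, 5, 15, 30),
  stringsAsFactors = FALSE
)

# f receives one partition as a data.frame in its first parameter,
# mirroring how Table 2 describes the function passed to ore.groupApply.
f <- function(part) {
  data.frame(
    Source = part$Source[1],
    n = nrow(part),
    mean_amount = mean(part$amount),
    stringsAsFactors = FALSE
  )
}

# Local stand-in for partitioning by an index column. With ORE, the
# partitions would be handled by database-spawned R engines instead.
parts <- split(sales, sales$Source)
result_list <- lapply(parts, f)

# Combine the per-partition results into a single data.frame.
result <- do.call(rbind, result_list)
print(result)
```

With ORE, the corresponding call would pass an ore.frame and the index column to ore.groupApply; the exact arguments should be checked against the ORE documentation.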
Building and scoring a model using Random Forest ML classification
• Building a Random Forest R classification model for predicting Product Source
• Input – CSV containing Product-related data with one predictor (independent) variable called Product rank and one response (outcome) variable called Source (i.e., the source of the product). The Product Source is predicted based on the product rank and classified as one of SourceA, SourceB, or SourceC
• Methodology
  • Use R's randomForest() ML algorithm, executed inside one of ORE's spawned R engines
  • Model importance variables MeanDecreaseAccuracy and MeanDecreaseGini
  • Plot the model, the importance variables, and a pairs plot of the 'Source' vs. the model 'prediction'
• ORE code and output plots are shown in the demo – Demo1.R.
library(ORE)
ore.connect(user = "<username>", sid = "<SID>", host = "localhost", password = "<pw>")
ore.doEval(function() {
  library(randomForest)
  productdata <- read.csv("productdata.csv", header = TRUE, sep = ',')
  head(productdata)
  # Derive the Source label from the product rank
  productdata$Source <- ifelse(productdata$rank == 1, 'SourceA',
                        ifelse(productdata$rank == 2, 'SourceB',
                        ifelse(productdata$rank == 3, 'SourceC', '')))
  productdata$Source <- as.factor(productdata$Source)
  # 70/30 split into training and test sets
  set.seed(123)
  sample_size <- floor(0.70 * nrow(productdata))
  sampleset <- sample(seq_len(nrow(productdata)), sample_size)
  training_set <- productdata[sampleset, ]
  test_set <- productdata[-sampleset, ]
  formula <- Source ~ . - rank
  # Building the Random Forest model
  Source.rf <- randomForest(formula, data = training_set, ntree = 100,
                            importance = TRUE, proximity = TRUE)
  head(Source.rf)
  print(Source.rf)        # Print RF model to see important features
  plot(Source.rf)         # Plot RF model to see the corresponding RF model graph
  importance(Source.rf)   # See the importance of the variables
  varImpPlot(Source.rf)   # Plot RF model to see the variable importance
  SourcePred <- predict(Source.rf, newdata = test_set)  # Scoring the model
  head(SourcePred)
  table(SourcePred, test_set$Source)
  plot(margin(Source.rf, test_set$Source))  # Plot margin of error for accuracy
  library(AppliedPredictiveModeling)
  transparentTheme(trans = .4)
  pairs(table(SourcePred, test_set$Source), main = "Product Source Prediction")
}, ore.graphics = TRUE, ore.png.height = 600, ore.png.width = 500)
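The split and scoring steps in the demo code can be checked in isolation. Below is a minimal base-R sketch, assuming nothing beyond base R: the data frame `toy` and the label vectors `actual` and `predicted` are hypothetical stand-ins for productdata, test_set$Source, and SourcePred. It shows a 70/30 split and how overall accuracy and per-class recall can be read off the table() output.

```r
# Hypothetical stand-in for productdata, invented for illustration.
toy <- data.frame(id = 1:10)

# 70/30 split: sample row indices for training, keep the rest for testing.
set.seed(123)
train_idx <- sample(seq_len(nrow(toy)), floor(0.70 * nrow(toy)))
train <- toy[train_idx, , drop = FALSE]
test <- toy[-train_idx, , drop = FALSE]

# Hypothetical predicted and actual labels; in the demo these would be
# SourcePred and test_set$Source.
lv <- c("SourceA", "SourceB", "SourceC")
actual <- factor(c("SourceA", "SourceA", "SourceB", "SourceB", "SourceC", "SourceC"),
                 levels = lv)
predicted <- factor(c("SourceA", "SourceB", "SourceB", "SourceB", "SourceC", "SourceA"),
                    levels = lv)

# Confusion matrix: rows are predictions, columns are actual classes.
conf_mat <- table(predicted, actual)

# Overall accuracy is the share of observations on the diagonal.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# Per-class recall: correct predictions divided by actual class counts.
recall <- diag(conf_mat) / colSums(conf_mat)

print(conf_mat)
print(accuracy)
print(recall)
```

Reading the confusion matrix this way complements the margin plot in the demo, since it shows which Source classes are confused with each other.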
End-to-end implementation of Random Forest algorithm using the business use case of sourcing Product
1. Create a PL/SQL block that contains an R script that gets stored in the ORE R script repository. This script builds and scores the Random Forest model for predicting Product Source using the ORE SQL interface. The corresponding code is in file Demo2.sql.
2. Create a second PL/SQL block that prepares the input data and calls the above script using the ORE SQL API – the output of which can be used to generate a PNG graph of the prediction. The corresponding code is in file Demo3.sql.
3. To verify the output of CallSourcePred, run the SQL below to get tabular output:
select * from table(rqEval(
  NULL,
  'select CAST(''a'' as VARCHAR2(50)) "SourcePred", CAST(''b'' AS VARCHAR2(50)) "Var2", 1 as "Freq" from dual',
  'CallSourcePred'));
4. Run a query that generates an IMAGE column that stores the RF output as a graph in PNG format:
select * from table(rqEval(
  NULL,
  'PNG',
  'CallSourcePred'));
The graph output is the same as the one obtained earlier.
Integration with OBIEE
Pre-config steps required for OBIEE 12c integration
Before creating a new repository or downloading the SampleAppLite repository from the OBIEE 12c server, a few pre-config steps must be done so that OBIEE 12c can communicate with the test database and schema.
1. In the NQSConfig.INI file located in the <FusionMiddlewareHome>\user_projects\domains\bi\config\fmwconfig\biconfig\OBIS folder, make the following changes in the [ADVANCE_ANALYTICS_SCRIPT] section:
   i. Set the value of TARGET = "ORE";
   ii. Set the value of the connection pool as CONNECTION_POOL = "ORCL"."<uname>/<pwd>:1521";
   Make sure the ORCL alias is defined in the tnsnames.ora file of the Oracle DB 12c server.
2. Restart the WebLogic server and the associated services by issuing the stop.cmd command followed by the start.cmd command. These are found in the <FusionMiddlewareHome>\user_projects\domains\bi\bitools\bin folder.
3. Restart the Oracle Business Intelligence service (from Task Manager -> Services if on Windows).
4. Next, download the server RPD file corresponding to SampleAppLite using the following command:
<FusionMiddlewareHome>\user_projects\domains\bi\bitools\bin\datamodel.cmd downloadrpd -O obieenew.rpd -W <Adminpw> -U weblogic -P <weblogic_password> -SI ssi
• Integration steps in a nutshell – shown in detail in the demo
1. Open the downloaded .rpd in the OBIEE Admin Tool. In the Physical Layer, choose New Database and specify values; in the Connection Pools tab, create a new connection pool with the data source as the complete TNS descriptor string for the alias ORCL and the username/password.
2. Create a new Physical Schema followed by a New Physical Table. In the New Physical Table dialog box, specify the name as 'CallSourcePred', the table type as 'select', and the Initialization String as the following query:
select id, image from table(rqEval(NULL, 'PNG', 'CallSourcePred'))
3. Click the new schema created and drag-and-drop it to the Business Model and Mapping Layer. Edit the lookup expression builder for the image column as follows:
lookup("orcl_db".""."<newschemaname>"."CallSourcePred"."image", "orcl_db".""."<newschemaname>"."CallSourcePred"."id")
4. The remaining steps to complete the Presentation Layer are as shown in the demo.
5. Save the .rpd and upload it to OBIEE using the uploadrpd command.
6. Open OBIEE Presentation Services. Click Administration -> Reload Files and Metadata. Then create an analysis by clicking New -> Analysis. The new schema created in the OBIEE Admin Tool now appears under the available Subject Areas.
7. Drag-and-drop the id and image columns from CallSourcePred under the new schema subject area to the Selected Columns area. Click Results. The pairs plot of the Product Source Prediction is displayed.
• To create a new dashboard, click New -> Dashboard, specify <newschemaname> Dashboard as the name, and choose the folder where it will be saved. A folder in My Folders or Shared Folders can be chosen for this.
• In the Catalog section at the lower left, choose the analysis named "Pairs Plot of Product Source Prediction" and drag-and-drop it onto the empty area to the right of the Dashboard. Click Run to view the Dashboard. The pairs plot is displayed as the output.
Uses of the BI Dashboard for actionable decision-making
• Predicting Product source by analyzing the pairs plot enables the business to:
1. Influence the propensity to buy
2. Forecast yield based on Source
3. Forecast Product Quality
4. Recommend Product usage
Propensity to buy can be predicted by using another ML algorithm called Logistic Regression or the Generalized Linear Model (GLM). This can be done in OBIEE using the built-in Regression analytics function, or by using an R- or ORE-based logistic regression model and then using the probabilistic value to determine the outcome.
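The idea of using the probabilistic value to determine the outcome can be sketched with base R's glm(). This is a minimal illustration on synthetic data, not the author's model: the predictor `visits`, the outcome `bought`, the simulated coefficients, and the 0.5 cutoff are all assumptions made for the example.

```r
# Synthetic data invented for illustration; `visits` and `bought`
# are hypothetical variable names.
set.seed(42)
n <- 200
visits <- rpois(n, lambda = 3)

# Assumed relationship, used only to simulate outcomes.
p_true <- plogis(-2 + 0.6 * visits)
bought <- rbinom(n, size = 1, prob = p_true)
customers <- data.frame(visits = visits, bought = bought)

# Logistic regression is a GLM with a binomial family (logit link).
fit <- glm(bought ~ visits, data = customers, family = binomial())

# Predicted probability of buying for new, hypothetical customers.
new_customers <- data.frame(visits = c(0, 3, 8))
prob <- predict(fit, newdata = new_customers, type = "response")

# Turn the probability into an outcome using an assumed 0.5 cutoff.
will_buy <- prob >= 0.5
print(data.frame(new_customers, prob = round(prob, 3), will_buy = will_buy))
```

The cutoff is a business choice; a lower threshold flags more prospects at the cost of more false positives.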
Forecasting yield can also be done right in OBIEE using past buying trends in combination with time series analysis, or using the Evaluate_Script analytics function with a SQL-based model obtained by scoring the model. An R or ORE function can also be used, similar to what was illustrated for predicting Product Source. Predicting Product Quality can be done using the same Random Forest ML model based on the Source predictor variable. Recommending Product usage can be done by using an ML algorithm used for recommendation engines and integrating the same with OBIEE dashboards.
Demo
Thanks
Thanks for coming
Further Reading
• Foster Provost & Tom Fawcett, Data Science for Business, O'Reilly Media, Inc., 2013
• Oracle R Enterprise Documentation Media Library, Release 1.5, R Enterprise User's Guide, https://docs.oracle.com/cd/E67822_01/OREUG/toc.htm
• http://www.oracle.com/technetwork/database/database-technologies/r/r-technologies/documentation/documentation-2166653.html
• http://analyticsvidhya.com/
• https://datascienceplus.com/
• https://blogs.oracle.com/r/
Q and A