18
Big Data is also Big Compute Better Price Forecasting using Machine Learning Avishkar Misra PhD – [email protected] Big Data & Analytics Platform Team

Big Data is also Big Compute - Analytics and Data Summit€¦ · •Oracle Big Data Cloud Services, including: Cloudera Hadoop Dist., Spark, Jupyter, Connectors, Data Integrator,

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Big Data is also Big Compute - Analytics and Data Summit€¦ · •Oracle Big Data Cloud Services, including: Cloudera Hadoop Dist., Spark, Jupyter, Connectors, Data Integrator,

BigDataisalso BigComputeBetterPriceForecastingusingMachineLearning

AvishkarMisraPhD– [email protected]&AnalyticsPlatformTeam

Page 2: Big Data is also Big Compute - Analytics and Data Summit€¦ · •Oracle Big Data Cloud Services, including: Cloudera Hadoop Dist., Spark, Jupyter, Connectors, Data Integrator,

OracleBigDataApplianceandOracleBigDataCloudServiceofferafantasticcomputationplatformtoscaleupexperimentationandspeeduptimetoinsight.TheOracleBigData&AnalyticsPlatformteam,workedwithaleadingNorthAmericancommodityproducerin2016,tohelpimprovethegranularityandaccuracyoftheirpredictionforthepriceofthecommodity.Anaccuratepriceforecastmodelhasapotentialbenefitof$25.6Movera3yearperiodforthecommodityproducer.WeusedPythonandPySpark runningonOracleBigDataAppliancetotryout39,936modelingcombinationsinamatterofminutestofindabetteralgorithm.Themoreaccuratealgorithmwasabletopredictthepricewithin+/-5%oftheactualprice73%ofthetimecomparedto40%forthealgorithmthecustomerhaddevelopedandfine-tunedovermanyyears.Oracle’sBigDataplatformsofferawaytodramaticallyscalethenumberofexperiments,helpingthescientistsandanalystsquicklynarrowtheirfocusonthemostrelevantandeffectivetechniquestoimpacttheirbusiness’sbottomline.

BlogPost:https://a-misra.com/2016/08/01/big-data-is-also-big-compute/

Abstract

Page 3: Big Data is also Big Compute - Analytics and Data Summit€¦ · •Oracle Big Data Cloud Services, including: Cloudera Hadoop Dist., Spark, Jupyter, Connectors, Data Integrator,

BigData&DataScienceAdvisoryServicesCustomerEngagementsthatbringindustryexperienceandbestpracticestodemonstrateandprototypesolutionsforcustomerinlinewiththeirstrategicbusinessgoals.

2

SolutionArchitecture

3

DataScience&MachineLearning

4

AnalyticsDesignHub

1

BusinessValueDefinition

Teamskill-sets:• DataScientists• DataWranglers• Architects• BusinessAnalysts

Teambackgroundsfrom:• Amazon• MSFT• Deloitte• IBM• Teradata• etc.

Page 4: Big Data is also Big Compute - Analytics and Data Summit€¦ · •Oracle Big Data Cloud Services, including: Cloudera Hadoop Dist., Spark, Jupyter, Connectors, Data Integrator,

Define,design,developanddemonstratethevalueofasolutionquickly.

ExecutiveCommitment

DiscoverySession

ExecutiveReadout Deploy

BusinessCase

AnalyticSprint

Architecture&

Roadmap

4-6weeks

Agriculture:aerialimageanalysis1

2 Utilities:frauddetection

Tree

BigDataAnalyticsSprint

Page 5: Big Data is also Big Compute - Analytics and Data Summit€¦ · •Oracle Big Data Cloud Services, including: Cloudera Hadoop Dist., Spark, Jupyter, Connectors, Data Integrator,

AlargeNorthAmericanLumberproducerlookingtoimproveweeklypriceforecasts

Today:CommodityPriceForecasting

Spotmarkettradersspeculatedonweeklycommoditypriceusing:

priorweek’smarketprice90-dayforecast– Tunedover10yearsgutinstinct

Decide:sell vshold inventory

MoreaccuratepricepredictioncanÛMargin

Page 6: Big Data is also Big Compute - Analytics and Data Summit€¦ · •Oracle Big Data Cloud Services, including: Cloudera Hadoop Dist., Spark, Jupyter, Connectors, Data Integrator,

LCBQ - KDWesternSpruce-Pine-Fir#2&Btr2x4random.

Commonbenchmark,sinceithashighcorrelationwithothertypesoflumber.

Lumberlumbereverywhere

Page 7: Big Data is also Big Compute - Analytics and Data Summit€¦ · •Oracle Big Data Cloud Services, including: Cloudera Hadoop Dist., Spark, Jupyter, Connectors, Data Integrator,

Normalizethedataforweeklypredictionsby

• over-sampling

• aggregation

• sub-sampling

Whatfactorsinfluencetheprice?

CurrentPrice Weekly(everyFriday)

EconomicActivityinConstruction&Building Monthly (3-monthlag)

CurrencyExchangeRates Daily

Weather*new Hourly

Page 8: Big Data is also Big Compute - Analytics and Data Summit€¦ · •Oracle Big Data Cloud Services, including: Cloudera Hadoop Dist., Spark, Jupyter, Connectors, Data Integrator,

Focus:59majorpopulationcenters– wherepeopleliveandlikelytorenovate+build.

WeatherData:

• WeeklyTotalPrecipitation

• WeeklyAverageTemperature

Weather,butforwhere?

Page 9: Big Data is also Big Compute - Analytics and Data Summit€¦ · •Oracle Big Data Cloud Services, including: Cloudera Hadoop Dist., Spark, Jupyter, Connectors, Data Integrator,

Time-seriesprediction,socannotusetraditionalk-foldvalidation.

EvaluationMethodology

...

2-years

For104weeks:• Trainmodelsforeachweek.• Predictfornextweek.• Evaluatepredictionaccuracy

Page 10: Big Data is also Big Compute - Analytics and Data Summit€¦ · •Oracle Big Data Cloud Services, including: Cloudera Hadoop Dist., Spark, Jupyter, Connectors, Data Integrator,

Q:MachineLearningpipelineoptions– Whichdowepick?

FeatureGeneration None,RollingWindow {7,13,26,52},PolynomialInteractions,All 6

FeatureNormalizer None, Min-Max 2

FeatureSelector None, PCA 2

ModelingObjective Classifier, Regression 2

ModelingAlgorithm LogisticRegressionDecisionTreesExtraTreesAda BoostGradientBoostingRandomForest

LinearRegressionRidgeRegressionLassoRegressionDecisionTreeRegressorAda BoostTreeRegressorExtraTreeRegressorGradientBoostingRegressorRandomForestRegressor

6 |8

Page 11: Big Data is also Big Compute - Analytics and Data Summit€¦ · •Oracle Big Data Cloud Services, including: Cloudera Hadoop Dist., Spark, Jupyter, Connectors, Data Integrator,

Answer:Experiment!

OracleBigDataCloudService

DataScientists

Ifyoudoublethenumberofexperimentsyoudoperyearyou’regoingtodoubleyourinventiveness.

– JeffBezos

Page 12: Big Data is also Big Compute - Analytics and Data Summit€¦ · •Oracle Big Data Cloud Services, including: Cloudera Hadoop Dist., Spark, Jupyter, Connectors, Data Integrator,

Trainedandevaluated- 14,976Classifiers

Classification– Canwepredictifthepricewillgoup?

Good

Poor

Page 13: Big Data is also Big Compute - Analytics and Data Summit€¦ · •Oracle Big Data Cloud Services, including: Cloudera Hadoop Dist., Spark, Jupyter, Connectors, Data Integrator,

Trainedandevaluated- 22,464Regressors

Regressors – Canwepredicttheprice?

Poor

Good

Page 14: Big Data is also Big Compute - Analytics and Data Summit€¦ · •Oracle Big Data Cloud Services, including: Cloudera Hadoop Dist., Spark, Jupyter, Connectors, Data Integrator,

Evaluated37,440modelsinlessthan15mins!!

Takeaways

Classifiers:• Ensemblearebetter- ExtraTrees,GradientBoosting,RandomForest,AdaBoost.• Min-Maxnormalizationoffeaturevaluesnotuseful• PrincipalComponentAnalysisveryhelpful

Regressors:• Lotofmodelover-fitting!• ExtraTreeRegressors workverywell• PrincipalComponentAnalysisnothelpful

Page 15: Big Data is also Big Compute - Analytics and Data Summit€¦ · •Oracle Big Data Cloud Services, including: Cloudera Hadoop Dist., Spark, Jupyter, Connectors, Data Integrator,

Insightsà Abetterpricepredictionalgorithm

Target isActualPrice+/- 5%

LinearRegression:40% oftimeintarget.

Extra-TreeRegression:73% oftime intarget.

Û $19.4mto$25.6mmarginover3years

Page 16: Big Data is also Big Compute - Analytics and Data Summit€¦ · •Oracle Big Data Cloud Services, including: Cloudera Hadoop Dist., Spark, Jupyter, Connectors, Data Integrator,

PriceForecastingforCommodityGoodsObjective:Improvepriceforecastingonfinishedcommoditygoodstooptimizemargindollarsacrossallchannelsandenhancecustomerupsellingcapabilitiesfortraders

• OracleBigDataCloudServices,including:ClouderaHadoopDist.,Spark,Jupyter,Connectors,DataIntegrator,OracleRAdvancedAnalyticsforHadoop,Spatial&Graph,Storage

• SalesReps(spotmarkettraders)reliedonpriorweekmarketpricingandlong-range(90)dataforecaststomakea‘gutinstinct’decision–weeklypriceforecastsdidnotexist

• BusinessAnalystsmanuallypulledinformationfromavarietyofsourcesintoanExcelworkbook– oftenhavingtoretypemarkettrendinformationfromPDFs

• Whilethecurrentprocesswasdeemedhelpfulforlongrangestrategicforecasting,tradersneededshort-termforecastsinordertobetterpricegoodsanddeterminewhethertosellorholdinventoryonaweeklybasis

BusinessChallenge/Opportunity

ProposedSolution

Example

Leading North American Lumber Producer for Builders

and Home Improvement Centers

PotentialBenefits• Overa3yearhorizon,potentialincrementalgrossmarginfromimprovedpricingandreducedoperatingcostsrangedfrom$19.4mto$25.6minincrementalbenefits,anda4,342% ROI

AnalyticSprintResults• Demonstratedtheinclusionofexternaldatasetssuchasweather intothemixofdatatoimproveforecastaccuracy

• Demonstratednewmachinelearningtechniquesandtheabilitytoscaleclientexperimentationwithforecastmodelingto37k analyticmodelsinamatterofminutes

• Improvedpredictiongranularity frommonthlytoweekly,andaccuracytowithin+/- 5%oftheactualmarketprice73% ofthetime,comparedwith40% forthealgorithmusedbytheclient

Page 17: Big Data is also Big Compute - Analytics and Data Summit€¦ · •Oracle Big Data Cloud Services, including: Cloudera Hadoop Dist., Spark, Jupyter, Connectors, Data Integrator,

BigData&DataScienceAdvisoryServicesGetintouchwithus!

Email:[email protected]

Blog:http://a-misra.com

OracleExternal:https://www.oracle.com/big-data

OracleInternal:http://bdcoe.us.oracle.com

Thiswork:https://a-misra.com/2016/08/01/big-data-is-also-big-compute/

Page 18: Big Data is also Big Compute - Analytics and Data Summit€¦ · •Oracle Big Data Cloud Services, including: Cloudera Hadoop Dist., Spark, Jupyter, Connectors, Data Integrator,