Big Data friends Scala and FP


DESCRIPTION

Introducing Scala and FP (in 45 minutes) for Big Data. Scala from Function to Future via Lazy. Samples are distribution-friendly implementations of mathematical models. This tal


Page 1: Scala and-fp-in-big-data

Big Data friends Scala and FP

Page 2: Scala and-fp-in-big-data

a.k.a. Noootsab
Proud husband and father

pol'bon Lidjeu

Have to wear glasses since Maths graduation in '03
Learn to dress well since CS graduation in '05

Lost myself since expertise in Geomatic and GIS
Risking myself in (GIS, Big Data and Scala)

Public interest work: co-founded
Helper and organizer of

Scala

NextLab
Wajug

Devoxx4Kids trainer

Page 3: Scala and-fp-in-big-data

WHY
I mean it and others do...

Scala has a reputation for being accessible
It eases the maths (mostly [matrix] algebra)

The CS world is changing (fast)
It shifts from the cloud to analysis
That is, from IT needs to market opportunities

Page 4: Scala and-fp-in-big-data

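Reused knowledge

Fact: syntax close to Java, C#, Ruby, ...
Cause: Object Oriented

sealed trait Gender
case object Male extends Gender
case object Female extends Gender

case class Person(name: String, first: String, age: Double, gender: Gender,
                  father: Option[Person], mother: Option[Person],
                  children: List[Person] = Nil) {
  def incAge(n: Int): Person = copy(age = age + n)
  // Returns the updated child and the updated parent (both are new, immutable copies)
  def newSon(child: Person): (Person, Person) = {
    val newChild = this.gender match {
      case Male   => child.copy(father = Some(this))
      case Female => child.copy(mother = Some(this))
    }
    (newChild, this.copy(children = newChild :: children))
  }
}

// Sandrine, Arcangelo and Nadine are Person values defined elsewhere
val _Noah = Person("Petrella", "Noah", age = 4, gender = Male, mother = Some(Sandrine), father = None)

val boringNoootsab = Person("Petrella", "Andy", 32, Male, father = Some(Arcangelo), mother = Some(Nadine))

// lower-case names, so the pattern binds new values instead of matching existing ones
val (noah, happyNoootsab) = boringNoootsab.newSon(_Noah)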

Page 5: Scala and-fp-in-big-data

Following the wave

Fact: Functional Programming ftw

Cause: Scalable Language

Please, bear with me...

Page 6: Scala and-fp-in-big-data

WHO

a lot

mainly data fans

Page 7: Scala and-fp-in-big-data

Coursera

10⁶ online students
PHP → Scala

Concurrency primitives
Play

Type safety
Ecosystem

Page 8: Scala and-fp-in-big-data

Twitter

REPL
case classes

productivity gains
concise code

Scala school
Tens of open source libs

Page 9: Scala and-fp-in-big-data

Netflix

Billion devices
Historical events
Real-time analytics

Proper API (Option)
Async (Try)

Scalatra + ScalaTest

Page 10: Scala and-fp-in-big-data

And more

AirBnB

Snips (smart cities, ...)

Tuplejump (analytic platform)

eBay (analytics)

BBC (Future Media project)

Virdata (IoT analytic platform)

Ooyala (video analytic platform)

LinkedIn

Page 11: Scala and-fp-in-big-data

Functional Programming in a nutshell

INPUT x

FUNCTION f:

OUTPUT f(x)

source wikipedia: http://en.wikipedia.org/wiki/Function_(mathematics)

Page 12: Scala and-fp-in-big-data

Input x can be a function...

Defines a general process that could behave differently

listOfNames map { name => DB.getByName(name) }

listOfPersons flatMap { person => person.friends }

listOfFriends filter { (f: Friend) => f.met moreThan (10 years) }

listOfOldFriends.count(_.person.gender != me.gender)
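The four calls above can be tried end to end with a small in-memory model. This is only a sketch: `Friend`, `Person`, the `db` map and the `yearsKnown` field are hypothetical stand-ins for the slide's `DB.getByName` and `met moreThan (10 years)` DSL, which are not shown in the deck.

```scala
object HigherOrderSketch {
  // Hypothetical stand-ins for the slide's Person/Friend model.
  final case class Friend(name: String, gender: String, yearsKnown: Int)
  final case class Person(name: String, gender: String, friends: List[Friend])

  // Hypothetical in-memory "DB" replacing DB.getByName.
  val db: Map[String, Person] = Map(
    "Ann" -> Person("Ann", "F", List(Friend("Bob", "M", 12), Friend("Cat", "F", 3))),
    "Dan" -> Person("Dan", "M", List(Friend("Eve", "F", 15), Friend("Fay", "F", 20)))
  )

  val me: Person = db("Dan")

  // Each step takes a function as input and decides only *how* to apply it.
  val persons: List[Person]    = List("Ann", "Dan").map(name => db(name))
  val friends: List[Friend]    = persons.flatMap(person => person.friends)
  val oldFriends: List[Friend] = friends.filter((f: Friend) => f.yearsKnown > 10)
  val otherGender: Int         = oldFriends.count(_.gender != me.gender)
}
```

`map`, `flatMap`, `filter` and `count` stay the same; only the function passed in changes the behavior.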

Page 13: Scala and-fp-in-big-data

Output x bah... can be a function as well...

Prepares a process that will be available for later usage

def authentication(manager: SecurityManager): User => Authentication

def source(url: String): Authentication => DataRepo => Data

// [...]

val authenticate = authentication(FakeSecurityManager)
val settings = source("/settings")
def request = {
  val user = //...
  val auth = authenticate(user)
  val settingsFetcher = settings(auth)
  // and so on
}
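To make the signatures above concrete, here is a minimal sketch with bodies filled in. Every type and body is an assumption made for illustration (the slide only gives signatures); in particular the result type is simplified to `Option[String]` instead of the slide's `Data`.

```scala
object ReturnedFunctionSketch {
  // All types below are hypothetical stand-ins for the slide's names.
  final case class User(name: String)
  final case class Authentication(user: User, granted: Boolean)
  final case class DataRepo(entries: Map[String, String])

  trait SecurityManager { def allows(user: User): Boolean }
  object FakeSecurityManager extends SecurityManager {
    def allows(user: User): Boolean = user.name.nonEmpty
  }

  // Returns a function: the manager is fixed now, the user comes later.
  def authentication(manager: SecurityManager): User => Authentication =
    user => Authentication(user, manager.allows(user))

  // Curried result: url now, then the authentication, then the repository.
  def source(url: String): Authentication => DataRepo => Option[String] =
    auth => repo => if (auth.granted) repo.entries.get(url) else None

  val authenticate: User => Authentication = authentication(FakeSecurityManager)
  val settings: Authentication => DataRepo => Option[String] = source("/settings")

  def request(user: User, repo: DataRepo): Option[String] = {
    val auth = authenticate(user)
    val settingsFetcher = settings(auth)
    settingsFetcher(repo)
  }
}
```

`authenticate` and `settings` are prepared once and reused by every `request`, which is the "available for later usage" point of the slide.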

Page 14: Scala and-fp-in-big-data

Show me

def lm(x: List[Double], y: List[Double]): ((Double, Double), Double => Double) = {
  val n = x.size
  val ẍ = x.sum.toDouble / n
  val ÿ = y.sum.toDouble / n
  val Sp = ((x ·- ẍ) ·* (y ·- ÿ) sum) / (n - 1)
  val Sx2 = ((x ·- ẍ) ·^ 2 sum) / (n - 1)
  val ß1 = Sp / Sx2
  val ß0 = ÿ - ß1 * ẍ
  val coefs = (ß0, ß1)
  val predict = (d: Double) => ß0 + ß1 * d
  (coefs, predict)
}

def test(ß0: Double = 18.1d, ß1: Double = 6d, error: Int => List[Double]) = {
  val n = 10000
  val x: List[Double] = -n.toDouble to n by 1 toList
  val e = error(2 * n + 1)
  val y: List[Double] = ß0 ·+: (ß1 ·*: x) ·+: e
  lm(x, y)
}
val error = rnorm(mean = 0, sigma = 5) // gen gaussian nbs
val model = test(103, 7, error)

on github
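The slide's `lm` relies on vector operators (`·-`, `·*`, `·^`) that are defined outside the deck. As a rough equivalent, the same simple least-squares fit can be sketched with standard collection methods only. The names `meanX`, `meanY`, `b0` and `b1` are substitutes chosen here for `ẍ`, `ÿ`, `ß0` and `ß1`.

```scala
object LinearModelSketch {
  // Simple least-squares fit of y = b0 + b1 * x.
  // Returns the coefficients and a prediction function, as on the slide.
  def lm(x: List[Double], y: List[Double]): ((Double, Double), Double => Double) = {
    require(x.size == y.size && x.size > 1, "need two samples of equal size")
    val n = x.size
    val meanX = x.sum / n
    val meanY = y.sum / n
    // Sample covariance of x and y, then sample variance of x.
    val sp  = x.zip(y).map { case (xi, yi) => (xi - meanX) * (yi - meanY) }.sum / (n - 1)
    val sx2 = x.map(xi => math.pow(xi - meanX, 2)).sum / (n - 1)
    val b1 = sp / sx2
    val b0 = meanY - b1 * meanX
    ((b0, b1), (d: Double) => b0 + b1 * d)
  }
}
```

With noise-free data generated from `y = 18.1 + 6 * x`, the fit should recover both coefficients up to rounding, and the returned function predicts new points without exposing how it was computed.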

Page 15: Scala and-fp-in-big-data

Lazy
yeah yeah... I'll do it

lazy val app: App = initializeApp()

def logDebug(m: => String) = if (LOG.debugEnabled) LOG.debug(m) else ()

Avoid computations
Delayed initialization
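Both effects can be observed with counters. The sketch below is an illustration under assumptions: `initializeApp` returns a plain `String`, logging is a `println`, and the two counters exist only to show when evaluation happens.

```scala
object LazySketch {
  var initCount = 0      // counts how often the expensive init runs
  var messagesBuilt = 0  // counts how often a log message is built

  // Hypothetical expensive initialization.
  def initializeApp(): String = { initCount += 1; "app" }

  // Evaluated on first access only, then cached.
  lazy val app: String = initializeApp()

  val debugEnabled = false

  // `m` is by-name: the string is only built if debug is enabled.
  def logDebug(m: => String): Unit =
    if (debugEnabled) println(m) else ()

  def expensiveMessage(): String = { messagesBuilt += 1; "state: " + app }
}
```

Calling `LazySketch.logDebug(LazySketch.expensiveMessage())` with debug disabled builds no message and does not touch `app`; reading `LazySketch.app` twice runs `initializeApp` once.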

Page 16: Scala and-fp-in-big-data

Sooo laaazy
Come back... in a potential future

TL;DW

val app: Future[App] = initializeApp()

val http: Future[HttpClient] = app.map(_.http.client)

def isOk(url: String): Future[Boolean] =
  http.flatMap(client => client.get(url))
    .map(_.code)
    .filter(_ == 200)
    .map(_ => true) // filter keeps the Int code; turn it into the Boolean result
    .recoverWith { case x: CommunicationException => isOk(url) }
    .recover { case e: Throwable => false }
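The retry-then-fallback chain can be exercised without a network. In this sketch `FakeClient`, `Response` and `CommunicationException` are hypothetical stubs, not the HTTP client used in the talk; the fake client fails a fixed number of times before answering.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureSketch {
  // Hypothetical failure type and response; no real network involved.
  final class CommunicationException(msg: String) extends RuntimeException(msg)
  final class Response(val code: Int)

  // Fake client: fails on the first `failures` calls, then answers.
  final class FakeClient(failures: Int) {
    private val calls = new java.util.concurrent.atomic.AtomicInteger(0)
    def get(url: String): Future[Response] = Future {
      if (calls.incrementAndGet() <= failures) throw new CommunicationException("down: " + url)
      else new Response(if (url.endsWith("/missing")) 404 else 200)
    }
  }

  def isOk(client: FakeClient, url: String): Future[Boolean] =
    client.get(url)
      .map(_.code)
      .filter(_ == 200)   // any other code fails the future (NoSuchElementException)
      .map(_ => true)
      .recoverWith { case _: CommunicationException => isOk(client, url) } // retry
      .recover { case _: Throwable => false }                             // give up

  // Blocking helper for demos and tests only.
  def await[A](f: Future[A]): A = Await.result(f, 5.seconds)
}
```

Note that the retry here is unbounded, as on the slide; production code would usually cap the number of attempts.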

Page 17: Scala and-fp-in-big-data

Code... now (I promised)

class LazyCons[+A](a: A, t: => Lazy[A]) extends Lazy[A] {
  val head = Some(a)
  lazy val tail = t
}

def fetch(file: String): Lazy[Future[String]] = {
  val texts = io.Source.fromFile(new java.io.File(file)).getLines
  def readLine(texts: Iterator[String]): Lazy[Future[String]] = //...
  readLine(texts)
}

for the fun →
val fibs: Stream[Int] = 0 #:: 1 #:: ((fibs zip fibs.drop(1)) map ((_: Int) + (_: Int)).tupled)

on github
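The `Lazy` trait itself is not shown on the slide, so the following is one possible minimal shape for it, not the author's definition. `Empty`, `take` and the two-argument `fibs` are names assumed here; `LazyCons` follows the slide.

```scala
object LazyListSketch {
  // Minimal lazy list; `Lazy`'s members and `Empty` are assumptions.
  sealed trait Lazy[+A] {
    def head: Option[A]
    def tail: Lazy[A]
    // Materializes at most n elements; tails are only forced as needed.
    def take(n: Int): List[A] = {
      @annotation.tailrec
      def loop(rest: Lazy[A], left: Int, acc: List[A]): List[A] =
        rest.head match {
          case Some(a) if left > 0 => loop(rest.tail, left - 1, a :: acc)
          case _                   => acc.reverse
        }
      loop(this, n, Nil)
    }
  }

  case object Empty extends Lazy[Nothing] {
    val head: Option[Nothing] = None
    def tail: Lazy[Nothing] = this
  }

  // The tail is passed by name and cached by `lazy val`, as on the slide.
  final class LazyCons[+A](a: A, t: => Lazy[A]) extends Lazy[A] {
    val head: Option[A] = Some(a)
    lazy val tail: Lazy[A] = t
  }

  // Infinite Fibonacci sequence: each tail is computed on demand.
  def fibs(a: BigInt = 0, b: BigInt = 1): Lazy[BigInt] =
    new LazyCons(a, fibs(b, a + b))
}
```

Because the recursive call to `fibs` sits in a by-name position, building the sequence terminates immediately; only `take` decides how much of it is ever computed.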

Page 18: Scala and-fp-in-big-data

Mashup

A function could either
→ be called on data (method, sync)

→ be sent to the data (message, async)

A function composes

A function is a delayed computation

...

...
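A small sketch of "a function composes" and "a function is a delayed computation": the pipeline is built first and only runs when it is handed to the data. The `partitions` list is a hypothetical stand-in for data spread over several nodes; nothing here is distributed.

```scala
object ComposeSketch {
  // Three small steps, each a plain function.
  val parse: String => Int = _.trim.toInt
  val square: Int => Int   = n => n * n
  val label: Int => String = n => "value=" + n

  // Composition builds a new function; nothing runs yet.
  val pipeline: String => String = parse andThen square andThen label

  // Hypothetical "partitions" standing in for data spread over nodes.
  val partitions: List[List[String]] = List(List("1", " 2"), List("3 "))

  // The composed function is handed to each partition in one pass.
  def run(f: String => String): List[List[String]] = partitions.map(_.map(f))
}
```

`ComposeSketch.run(ComposeSketch.pipeline)` traverses each partition once, applying all three steps per element.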

Page 19: Scala and-fp-in-big-data

Spark...

What if I compose all the computations

Then I send the whole shebang to where the data are?

Map/Reduce: degenerate case

Spark: generalized case (see next talks)

.↓.↓.

Page 20: Scala and-fp-in-big-data

Funky code

trait Data {
  def dependent: List[Double]
  def observed: Matrix
  def bootstrap(proportion: Double): Future[Data]
}

trait Model {
  type Coefs
  def apply(data: Data): Future[(Coefs, List[Double] => Future[Double])]
}

def bagging(model: Model)(agg: Aggregation[model.Coefs], n: Int)(data: Data): Future[model.Coefs] = {
  def exec: Future[model.Coefs] =
    for {
      sample <- data.bootstrap(0.6)
      (coefs, _) <- model(sample)
    } yield coefs
  val execs: List[Future[model.Coefs]] = List.fill(n)(exec)
  val coefsList: Future[List[model.Coefs]] = Future.sequence(execs)

  val result: Future[model.Coefs] = coefsList map agg
  result
}

on github
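`Matrix` and `Aggregation` are not defined in the deck, so the bagging idea is sketched below on a deliberately simplified problem: the data is a `Vector[Double]`, the "model" is the sample mean, and the aggregation is an average. All of these simplifications are assumptions; only the structure (bootstrap, fit, `Future.sequence`, aggregate) follows the slide.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Random

object BaggingSketch {
  // Simplified stand-in for the slide's Model: the "model" is the sample mean.
  type Coefs = Double
  def fit(sample: Vector[Double]): Future[Coefs] = Future(sample.sum / sample.size)

  // Bootstrap: draw about `proportion * size` points with replacement.
  def bootstrap(data: Vector[Double], proportion: Double, rnd: Random): Future[Vector[Double]] = {
    val k = math.max(1, (data.size * proportion).toInt)
    val idx = Vector.fill(k)(rnd.nextInt(data.size)) // drawn eagerly, on the caller's thread
    Future(idx.map(data))
  }

  def bagging(agg: List[Coefs] => Coefs, n: Int)(data: Vector[Double], rnd: Random): Future[Coefs] = {
    // `exec` is a def, so List.fill(n)(exec) starts n independent runs.
    def exec: Future[Coefs] =
      for {
        sample <- bootstrap(data, 0.6, rnd)
        coefs  <- fit(sample)
      } yield coefs
    Future.sequence(List.fill(n)(exec)).map(agg)
  }

  val average: List[Coefs] => Coefs = cs => cs.sum / cs.size

  // Blocking helper for demos and tests only.
  def await[A](f: Future[A]): A = Await.result(f, 5.seconds)
}
```

The n fits run concurrently on the global execution context, and the caller only blocks (if at all) on the single aggregated future.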

Page 21: Scala and-fp-in-big-data

Enough! Thanks ^_^

Poke me:
→ for Scala training
→ for fun with Data
→ with book ideas