andy-petrella
DESCRIPTION
Introducing Scala and FP (in 45 minutes) for Big Data. Scala from Function to Future via Lazy. Samples are distribution-friendly implementations of mathematical models. This tal
Big Data friends Scala and FP
a.k.a. Noootsab
Proud husband and father
pol' bon Lidjeu
Have to wear glasses since Maths graduation in '03
Learn to dress well since CS graduation in '05
Lost myself since expertise in Geomatics and GIS
Risking myself in (GIS, Big Data and Scala)
Public interest work: co-founded
Helper and organizer of
Scala
NextLab
Wajug
Devoxx4Kids trainer
WHY
I mean it and others do...
Scala has a reputation for being accessible
It eases the maths (mostly [matrix] algebra)
The CS world is changing (fast)
It shifts from the cloud to analysis
That is, from IT needs to Market opportunities
Reused knowledge
Fact: syntax close to Java, C#, Ruby, ...
Cause: Object Oriented
case class Person(
  name: String, first: String, age: Double, gender: Gender,
  father: Option[Person], mother: Option[Person],
  children: List[Person] = Nil
) {
  // copy returns a new Person; the original is left untouched
  def incAge(n: Int): Person = copy(age = age + n)
  def newSon(child: Person): (Person, Person) = {
    val newChild = this.gender match {
      case Male   => child.copy(father = Some(this))
      case Female => child.copy(mother = Some(this))
    }
    (newChild, this.copy(children = newChild :: children))
  }
}
val _Noah = Person("Petrella", "Noah", age = 4, Male, mother = Some(Sandrine), father = None)
val boringNoootsab = Person("Petrella", "Andy", 32, Male, father = Some(Arcangelo), mother = Some(Nadine))
// lowercase name: a capitalised name in a pattern is read as an existing value, not a new binding
val (noah, happyNoootsab) = boringNoootsab.newSon(_Noah)
Following the wave
Fact: Functional Programming ftw
Cause: Scalable Language
Please, bear with me...
WHO
a lot
mainly data fans
Coursera
10⁶ online students
PHP → Scala
Concurrency primitives
Play
Type safety
Ecosystem
REPL
case classes
productivity gains
concise code
Scala school
Tens of open source libs
Netflix
Billion devices
Historical events
Real-time analytics
Proper API (Option)
Async (Try)
Scalatra + ScalaTest
And more
AirBnB
Snips (smart cities, ...)
Tuplejump (analytic platform)
eBay (analytics)
BBC (Future Media project)
Virdata (IoT analytic platform)
Ooyala (video analytic platform)
Functional Programming in a nutshell
INPUT x
FUNCTION f:
OUTPUT f(x)
source wikipedia: http://en.wikipedia.org/wiki/Function_(mathematics)
Input x can be a function...
Defines a general process that could behave differently
listOfNames map { name => DB.getByName(name) }
listOfPersons flatMap { person => person.friends }
listOfFriends filter { (f: Friend) => f.met moreThan (10 years) }
listOfOldFriends.count(_.person.gender != me.gender)
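A minimal, self-contained sketch of the same idea: one collection, several behaviours, each selected by the function passed in. The `Contact` type and the sample data are hypothetical stand-ins for the slide's `DB`, `Person` and `Friend`.

```scala
object HigherOrderDemo {
  // Hypothetical data, standing in for the slide's DB and Friend types.
  final case class Contact(name: String, yearsKnown: Int, friends: List[String])

  val people: List[Contact] = List(
    Contact("Ann", 12, List("Bob", "Cid")),
    Contact("Bob", 3, List("Ann")),
    Contact("Cid", 15, Nil)
  )

  // The traversal is generic; the function argument decides what it does.
  val names: List[String]        = people.map(p => p.name)
  val friendsOfAll: List[String] = people.flatMap(p => p.friends)
  val oldFriends: List[Contact]  = people.filter(p => p.yearsKnown > 10)
  val oldWithFriends: Int        = oldFriends.count(_.friends.nonEmpty)
}
```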
Output x bah... can be a function as well...
Prepares a process that will be available for later usage
def authentication(manager: SecurityManager): User => Authentication
def source(url: String): Authentication => DataRepo => Data
// [...]
val authenticate = authentication(FakeSecurityManager)
val settings = source("/settings")
def request = {
  val user = // ...
  val auth = authenticate(user)
  val settingsFetcher = settings(auth)
  // and so on
}
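A runnable sketch of functions that return functions, assuming toy versions of the types involved. `User`, `Authentication`, `SecurityManager` and the string-based "repository" below are all hypothetical; only the shape of the signatures follows the slide.

```scala
object CurryDemo {
  // Hypothetical stand-ins for the slide's User, Authentication and SecurityManager.
  final case class User(name: String)
  final case class Authentication(user: User, granted: Boolean)

  trait SecurityManager { def allows(user: User): Boolean }
  object FakeSecurityManager extends SecurityManager {
    def allows(user: User): Boolean = user.name.nonEmpty
  }

  // Returns a function: the manager is fixed now, the user is supplied later.
  def authentication(manager: SecurityManager): User => Authentication =
    user => Authentication(user, manager.allows(user))

  // Returns a function that returns a function: each argument can arrive at a different time.
  def source(url: String): Authentication => String => String =
    auth => key => if (auth.granted) s"$url/$key" else "denied"

  val authenticate: User => Authentication         = authentication(FakeSecurityManager)
  val settings: Authentication => String => String = source("/settings")

  def request(user: User, key: String): String = settings(authenticate(user))(key)
}
```

The prepared values `authenticate` and `settings` can be passed around and reused without carrying the manager or the URL along.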
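Show me
// ·-, ·*, ·^, ·+: and ·*: are assumed to be element-wise list operators defined elsewhere in the talk's code
def lm(x: List[Double], y: List[Double]): ((Double, Double), Double => Double) = {
  val n = x.size
  val ẍ = x.sum.toDouble / n
  val ÿ = y.sum.toDouble / n
  val Sp = ((x ·- ẍ) ·* (y ·- ÿ) sum) / (n - 1)   // sample covariance
  val Sx2 = ((x ·- ẍ) ·^ 2 sum) / (n - 1)         // sample variance of x
  val ß1 = Sp / Sx2
  val ß0 = ÿ - ß1 * ẍ
  val coefs = (ß0, ß1)
  val predict = (d: Double) => ß0 + ß1 * d
  (coefs, predict)
}
def test(ß0: Double = 18.1d, ß1: Double = 6d, error: Int => List[Double]) = {
  val n = 10000
  val x: List[Double] = -n.toDouble to n by 1 toList
  val e = error(2 * n + 1)
  val y: List[Double] = ß0 ·+: (ß1 ·*: x) ·+: e
  lm(x, y)
}
val error = rnorm(mean = 0, sigma = 5) // generates Gaussian numbers
val model = test(103, 7, error)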
on GitHub
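For readers without the custom operators at hand, here is a sketch of the same simple linear regression using only standard collections. It is an illustration under stated assumptions, not the talk's implementation: the names `LmDemo`, `xs` and `ys` are hypothetical, and the data are noise-free so that the fit can be checked by eye.

```scala
object LmDemo {
  // Ordinary least squares for y = b0 + b1 * x, using plain List[Double].
  def lm(x: List[Double], y: List[Double]): ((Double, Double), Double => Double) = {
    require(x.size == y.size && x.size > 1, "need two samples of equal size > 1")
    val n     = x.size
    val meanX = x.sum / n
    val meanY = y.sum / n
    // Sample covariance of (x, y) and sample variance of x.
    val sp  = x.zip(y).map { case (a, b) => (a - meanX) * (b - meanY) }.sum / (n - 1)
    val sx2 = x.map(a => (a - meanX) * (a - meanX)).sum / (n - 1)
    val b1  = sp / sx2
    val b0  = meanY - b1 * meanX
    ((b0, b1), (d: Double) => b0 + b1 * d)
  }

  // Noise-free data generated from y = 18.1 + 6 x; the fit should recover both coefficients.
  val xs: List[Double] = (-5 to 5).map(_.toDouble).toList
  val ys: List[Double] = xs.map(v => 18.1 + 6.0 * v)
  val ((b0, b1), predict) = lm(xs, ys)
}
```

Note that `lm` returns both the coefficients and a prediction function, so the fitted model is itself a value that can be passed on.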
Lazy
yeah yeah... I'll do it
lazy val app: App = initializeApp()
def logDebug(m: => String) = if (LOG.debugEnabled) LOG.debug(m) else ()
Avoid computations
Delayed initialization
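A small sketch that makes both effects observable, assuming a hypothetical logger that is switched off. The counters exist only to show when evaluation happens.

```scala
object LazyDemo {
  var initCount     = 0
  var messagesBuilt = 0

  // Evaluated at most once, and only on first access.
  lazy val app: String = { initCount += 1; "app-started" }

  val debugEnabled = false

  // `m` is passed by name: its expression runs only if it is actually used.
  def logDebug(m: => String): Unit =
    if (debugEnabled) println(m) else ()

  // Hypothetical expensive message; counts how often it is really built.
  def expensiveMessage(): String = { messagesBuilt += 1; "state dump" }
}
```

Calling `LazyDemo.logDebug(LazyDemo.expensiveMessage())` with debugging off never builds the message, and reading `LazyDemo.app` twice initializes it once.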
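Sooo laaazy
Come back... in a potential future
TL;DW
val app: Future[App] = initializeApp()
val http: Future[HttpClient] = app.map(_.http.client)
def isOk(url: String): Future[Boolean] =
  http
    .flatMap(client => client.get(url))
    .map(_.code)
    .filter(_ == 200)
    .map(_ => true) // a code other than 200 fails the filter and is handled by recover
    .recoverWith { case x: CommunicationException => isOk(url) } // retry on communication failure
    .recover { case e: Throwable => false }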
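A runnable sketch of the same `map` / `recover` chain on a `Future`, assuming a fake `get` in place of a real HTTP client. The blocking `Await` is there only so the result can be inspected in a demo.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureDemo {
  // Hypothetical stand-in for an HTTP call: yields a status code or fails.
  def get(url: String): Future[Int] = Future {
    if (url.startsWith("http")) 200
    else throw new IllegalArgumentException(s"bad url: $url")
  }

  // map transforms the eventual value; recover turns a failure into a value.
  def isOk(url: String): Future[Boolean] =
    get(url).map(_ == 200).recover { case _: IllegalArgumentException => false }

  // Blocking is for the demo only; real code would keep composing Futures.
  def check(url: String): Boolean = Await.result(isOk(url), 5.seconds)
}
```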
Code... now (I promised)
class LazyCons[+A](a: A, t: => Lazy[A]) extends Lazy[A] {
  val head = Some(a)
  lazy val tail = t
}
def fetch(file: String): Lazy[Future[String]] = {
  val texts = io.Source.fromFile(new java.io.File(file)).getLines()
  def readLine(texts: Iterator[String]): Lazy[Future[String]] = // ...
  readLine(texts)
}
for the fun →
val fibs: Stream[Int] = 0 #:: 1 #:: ((fibs zip fibs.drop(1)) map ((_: Int) + (_: Int)).tupled)
on GitHub
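The slide shows `LazyCons` without its parent. The sketch below fills in one possible `Lazy` trait so the idea can be run end to end; the trait, the `Empty` object, `take` and `fibsFrom` are assumptions made for illustration, not the talk's code.

```scala
object LazyListDemo {
  // Hypothetical completion of the Lazy / LazyCons pair.
  sealed trait Lazy[+A] {
    def head: Option[A]
    def tail: Lazy[A]
    // Forces only the first n cells.
    def take(n: Int): List[A] = head match {
      case Some(a) if n > 0 => a :: tail.take(n - 1)
      case _                => Nil
    }
  }

  case object Empty extends Lazy[Nothing] {
    val head: Option[Nothing] = None
    def tail: Lazy[Nothing]   = this
  }

  final class LazyCons[+A](a: A, t: => Lazy[A]) extends Lazy[A] {
    val head: Option[A]    = Some(a)
    lazy val tail: Lazy[A] = t // evaluated on first access, then cached
  }

  // Infinite Fibonacci sequence: each tail is built only when asked for.
  def fibsFrom(a: BigInt, b: BigInt): Lazy[BigInt] = new LazyCons(a, fibsFrom(b, a + b))
  val fibs: Lazy[BigInt] = fibsFrom(BigInt(0), BigInt(1))
}
```

Because the tail is a by-name parameter stored in a `lazy val`, defining `fibs` terminates even though the sequence has no end.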
Mashup
A function could either
→ be called on data (method, sync)
→ be sent to the data (message, async)
A function composes
A function is a delayed computation
...
...
Spark......
What if I compose all the computations
Then I send the whole shebang to where the data are?
Map/Reduce: degenerate case
Spark: generalized case (see next talks)
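A toy sketch of "compose first, then ship the function to the data", using in-memory lists as hypothetical partitions. Nothing here is Spark API; it only illustrates the shape of the idea with standard collections.

```scala
object ComposeDemo {
  // Three small steps, composed into one function before any data is touched.
  val parse: String => Int         = _.trim.toInt
  val square: Int => Int           = n => n * n
  val keepEven: Int => Option[Int] = n => if (n % 2 == 0) Some(n) else None

  val pipeline: String => Option[Int] = parse.andThen(square).andThen(keepEven)

  // Hypothetical "partitions": in a cluster these would live on different nodes.
  val partitions: List[List[String]] = List(List("1", "2"), List(" 3", "4 "))

  // The composed function is applied where each partition lives,
  // then the partial results are combined.
  val partials: List[Int] = partitions.map(p => p.flatMap(s => pipeline(s)).sum)
  val total: Int          = partials.sum
}
```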
.↓.↓.
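Funky code
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global // required by map and the for-comprehension on Future

trait Data {
  def dependent: List[Double]
  def observed: Matrix
  def bootstrap(proportion: Double): Future[Data]
}
trait Model {
  type Coefs
  def apply(data: Data): Future[(Coefs, List[Double] => Future[Double])]
}
def bagging(model: Model)(agg: Aggregation[model.Coefs], n: Int)(data: Data): Future[model.Coefs] = {
  // one run: draw a bootstrap sample, fit the model, keep only the coefficients
  def exec: Future[model.Coefs] =
    for {
      sample     <- data.bootstrap(0.6)
      (coefs, _) <- model(sample)
    } yield coefs
  // exec is a def, so List.fill starts n independent runs
  val execs: List[Future[model.Coefs]] = List.fill(n)(exec)
  val coefsList: Future[List[model.Coefs]] = Future.sequence(execs)
  val result: Future[model.Coefs] = coefsList map agg
  result
}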
on GitHub
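A self-contained sketch of the bagging pattern with `Future.sequence`, assuming the simplest possible "model": the coefficient is the sample mean. The `fit`, `bootstrap` and `mean` helpers and the fixed seeds are hypothetical choices made so the example is reproducible.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Random

object BaggingDemo {
  // Hypothetical "model": the single coefficient is the sample mean.
  def fit(sample: Vector[Double]): Future[Double] = Future(sample.sum / sample.size)

  // Draws proportion * size points with replacement, using a fixed seed per draw.
  def bootstrap(data: Vector[Double], proportion: Double, seed: Long): Future[Vector[Double]] =
    Future {
      val rnd  = new Random(seed)
      val size = math.max(1, (data.size * proportion).toInt)
      Vector.fill(size)(data(rnd.nextInt(data.size)))
    }

  // Fits n models on n bootstrap samples concurrently, then aggregates the coefficients.
  def bagging(agg: List[Double] => Double, n: Int)(data: Vector[Double]): Future[Double] = {
    val runs: List[Future[Double]] = List.tabulate(n) { i =>
      bootstrap(data, 0.6, seed = i.toLong).flatMap(fit)
    }
    Future.sequence(runs).map(agg)
  }

  val data: Vector[Double] = Vector.tabulate(1000)(i => (i % 10).toDouble) // true mean: 4.5
  def mean(xs: List[Double]): Double = xs.sum / xs.size

  // Blocking is for the demo only.
  def run(): Double = Await.result(bagging(mean, 50)(data), 30.seconds)
}
```

`Future.sequence` turns the list of pending fits into one pending list, so the aggregation is expressed once and runs when all fits have completed.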
Enough! Thanks ^_^
Poke me:
→ for Scala training
→ for fun with Data
→ with book ideas