7/23/2019 ML in Big data
http://slidepdf.com/reader/full/ml-in-big-data 1/25
*Machine learning
and Neural networks
-Impact on Big data
Big Data –
*3Vs
*Generating Buzz - Scientific data exponential growth
*2012 - the year of Big Data
*2013 - the year of Big Data analytics
Machine learning –
*Branch of AI
*Focus on the study and construction of systems - predictions on unseen data
*Applications in search engines, stock market analysis, speech recognition, information retrieval etc.
*Machine Learning &
Big Data
*ML History
*1952 - Arthur Samuel - first game-playing program, trained to play checkers
*Machine learning - gained momentum in the early 90's
*Learning algorithms – come into commercial systems (Bayesian networks)
*Supervised and unsupervised learning Algorithm
Logistic Regression
Neural Network
Support Vector Machine
Clustering
Other popular algorithms:-
Random forest, Lasso, K-means, SVD etc.
*Tools used for machine learning
*Open source tools
– miner, Python, Weka, GraphLab, H2O, Octave
*Commercial Tools
Mahout, SAS, Matlab
Weka detail, Scikit Python
How does a learning Algorithm work?
Linear Regression
Logistic Regression, Neural network, Adaline, Perceptron, K-Means, SVM
Lasso
Cost Function Minimization
Gradient Descent
https://www.coursera.org/course/ml
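As a minimal sketch of cost function minimization by gradient descent, the example below fits a one-feature linear regression. The squared-error cost, the learning rate `alpha`, the step count and the toy data are all assumptions made for illustration; the slides do not specify them.

```python
def predict(theta, x):
    """Linear hypothesis h(x) = theta0 + theta1 * x."""
    return theta[0] + theta[1] * x


def cost(theta, xs, ys):
    """Squared-error cost J(theta), averaged over the whole training set."""
    m = len(xs)
    return sum((predict(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs, ys, alpha=0.05, steps=2000):
    """Batch gradient descent: every step sums over all m examples."""
    theta = [0.0, 0.0]
    m = len(xs)
    for _ in range(steps):
        errors = [predict(theta, x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        theta = [theta[0] - alpha * grad0, theta[1] - alpha * grad1]
    return theta


# Toy data generated from y = 1 + 2x (hypothetical, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
theta = gradient_descent(xs, ys)
```

Note that each step touches every training example, which is the part that becomes expensive as the data grows.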
How easy is it to compute the gradient?
What if m is 3,000,000?
In-memory computation of the summation is not possible
Traditional learning algorithms do not work for big data
Solution:
Scale up your learning algorithm for Big Data
https://www.coursera.org/course/ml
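To make the cost of the summation concrete, here is the gradient written out, assuming the squared-error cost of linear regression (the slide does not state which cost it has in mind). Here m is the number of training examples and h_θ is the hypothesis:

```latex
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2,
\qquad
\frac{\partial J}{\partial \theta_j}
  = \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,x_j^{(i)}
```

Every single update of θ needs the full sum over all m examples, so the work per step grows linearly with m.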
How do I scale up?
Stochastic Gradient Descent
•More time to converge
•In-memory computing possible
10^5 training examples and more than 10^5 features
Easy Job!!!
sofia, Shortgun - rJava - ingpipe, Shortgun - Python
https://www.coursera.org/course/ml
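A minimal sketch of stochastic gradient descent, assuming the same one-feature linear regression setting: each update uses a single example, so the full summation is never needed. The learning rate, epoch count, seed and toy data are hypothetical choices, not values from the slides.

```python
import random


def sgd(examples, alpha=0.05, epochs=500, seed=0):
    """Stochastic gradient descent for h(x) = theta0 + theta1 * x.

    Each update looks at one (x, y) pair only, so the training set can be
    streamed instead of summed over in memory.
    """
    rng = random.Random(seed)
    theta0, theta1 = 0.0, 0.0
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a fresh random order
        for x, y in data:
            error = (theta0 + theta1 * x) - y
            theta0 -= alpha * error
            theta1 -= alpha * error * x
    return theta0, theta1


# Toy data generated from y = 1 + 2x (hypothetical, for illustration only).
examples = [(float(x), 1.0 + 2.0 * x) for x in range(5)]
theta0, theta1 = sgd(examples)
```

The trade-off matches the bullets above: individual updates are cheap, but many passes may be needed before the parameters settle.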
My data is even bigger and it takes a lot of time!!!
Map Reduce approach
300,000,000 training data
Data Split By Map Reduce
Computer 1
Computer 2
Computer 3
Computer 4
Combine results
Stochastic Gradient Descent
Guest Lecture on ML - Max Linn
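The split-and-combine idea can be sketched in plain Python, assuming the additive quantity is the gradient sum of a one-feature linear regression. The function names, the round-robin split and the toy data are hypothetical; a real deployment would run the map step on separate machines through a framework such as Hadoop.

```python
def partial_gradient(theta, chunk):
    """Map step: gradient sums over one machine's share of the data."""
    g0 = g1 = 0.0
    for x, y in chunk:
        error = (theta[0] + theta[1] * x) - y
        g0 += error
        g1 += error * x
    return g0, g1, len(chunk)


def combine(partials):
    """Reduce step: add the partial sums and divide by the total count."""
    total0 = sum(p[0] for p in partials)
    total1 = sum(p[1] for p in partials)
    m = sum(p[2] for p in partials)
    return total0 / m, total1 / m


def split(data, n_parts):
    """Deal the examples out to n_parts chunks, round-robin."""
    return [data[i::n_parts] for i in range(n_parts)]


# Toy data generated from y = 1 + 2x (hypothetical, for illustration only).
data = [(float(x), 1.0 + 2.0 * x) for x in range(8)]
theta = [0.0, 0.0]
chunks = split(data, 4)  # four "computers", as in the diagram
gradient = combine([partial_gradient(theta, c) for c in chunks])
```

Because a sum can be computed in pieces and then added, the combined result equals the gradient computed on one machine over the whole data set.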
What if the algorithm is not additive?
Distributed Learning with Bagging
Guest Lecture on ML - Max Linn
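A minimal sketch of distributed learning with bagging: each partition trains its own model on a bootstrap resample, and predictions are combined by majority vote, so no partial sums need to be added together. The base learner (a one-dimensional threshold "stump"), the three-way split and the toy data are assumptions chosen to keep the example short.

```python
import random
from collections import Counter


def train_stump(sample):
    """Base learner: pick the 1-D threshold with the fewest training errors."""
    best_t, best_err = None, None
    for t, _ in sample:
        err = sum((x >= t) != label for x, label in sample)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def bagging_fit(partitions, seed=0):
    """Train one stump per partition on a bootstrap resample of it."""
    rng = random.Random(seed)
    models = []
    for part in partitions:
        boot = [rng.choice(part) for _ in part]  # sample with replacement
        models.append(train_stump(boot))
    return models


def bagging_predict(models, x):
    """Combine the independently trained models by majority vote."""
    votes = Counter(x >= t for t in models)
    return votes.most_common(1)[0][0]


# Toy data: label is True when x >= 5 (hypothetical, for illustration only).
data = [(float(x), x >= 5) for x in range(10)]
partitions = [data[i::3] for i in range(3)]  # odd count avoids tied votes
models = bagging_fit(partitions)
```

Each model only ever sees its own partition, which is why this approach suits algorithms whose training cannot be split into additive pieces.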
What kinds of implementation are we working on?
R and JAQL Bridge, similar Bridge for Weka, Python
R-JAQL Bridge
HaLoop
•HaLoop inherits map reduce from Hadoop. It adds various modifications in order to support iterative map reduce tasks.
•HaLoop has an API for easily writing iterative data analysis programs.
•There is a loop control module in the master of HaLoop which starts new map reduce jobs and controls exit.
•In case of failure in an iterative task, the task scheduler and task trackers facilitate recovery and allow the iterative data analysis to continue.
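The loop control idea can be illustrated in plain Python. This is not HaLoop's API; it is only a sketch of an iterative map reduce job in which a driver starts one map/reduce round per iteration and exits once the result stops changing. One-dimensional K-means is used as the iterative task, and all function names, the tolerance and the data are hypothetical.

```python
from collections import defaultdict


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce: average the points assigned to each centroid."""
    groups = defaultdict(list)
    for key, p in pairs:
        groups[key].append(p)
    # A centroid with no assigned points keeps its previous value.
    return [sum(groups[i]) / len(groups[i]) if groups[i] else c
            for i, c in enumerate(centroids)]


def loop_control(points, centroids, tol=1e-9, max_iters=50):
    """Start one map/reduce job per iteration; exit once the result is stable."""
    for iteration in range(1, max_iters + 1):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        shift = max(abs(a - b) for a, b in zip(updated, centroids))
        centroids = updated
        if shift < tol:
            break
    return centroids, iteration


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, iterations = loop_control(points, [0.0, 5.0])
# centroids == [2.0, 11.0] after 3 iterations
```

In plain Hadoop the driver logic in `loop_control` has to be written by hand around separate jobs; the bullets above describe HaLoop moving that responsibility into the framework.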
Apache Mahout
R-Hadoop
HaLoop Details, Mahout Details, R-Hadoop
Implementations of Machine learning
in Big data
*Neural Networks
Application in Big Data
Manufacturing and industry | Government | Banking and finance | Science and medicine
Quality control | Missile targeting | Loan underwriting | Specimen identification
Six Sigma | Criminal behaviour prediction | Credit scoring | Protein sequencing
Beer and wine flavour prediction | Credit card fraud detection | Tumour and tissue diagnosis
Natural language Processing | Energy price prediction | Heart attack diagnosis
Telecommunication line fault detection | Real-estate appraisal | New drug effectiveness
Prediction of air and sea currents
Air and water quality
*Sample implementation
*NVIDIA –
Built the largest artificial neural network - purpose: to simulate and learn the behavior of the human brain.
Nearly 6.5 times larger than the one developed by Google in 2012
*NUANCE –
Leader in Natural Language Processing and speech recognition
*NETFLIX –
Uses neural networks on big chunks of user data generated through websites - to predict better recommendations for its users.
*Predicting India Volatility Index –
Employed on big data generated by the stock market for online learning. Used to forecast the upward or downward motion in the next trading day's volatility using India VIX (a volatility index based on the NIFTY Index Option prices) based indicators.
- Location Graph
http://blog.jiwire.com/how-big-data-enables-jiwire-to-deliver-30-or-more-lift-in-campaign-performance/
Combines first-party data – the "big data" – with a platform based on machine learning
Aim - Reaching the right audience at the right time
Offline machine learning process based on Hive - data preparation, Weka & Hadoop Mahout - machine learning activities & R - statistics analysis and data visualization.
Graph analytics
Commercial products
GraphLab
YarcData
Other Projects
Grappa - University of Washington Project
Twitter Cassovary
Neo4j
Giraph - an open source, Hadoop-based Pregel clone developed at Facebook
http://gigaom.com/2013/05/1M/were-witnessing-the-rise-of-the-graph-in-big-data/
Big data, crowdsourcing and machine learning tackle Parkinson's
http://successfulworkplace.com/2013/0N/31/big-data-crowdsourcing-and-machine-learning-tackle-parkinsons/
LIONsolver - was able to differentiate Parkinson's patients from healthy individuals
- shows the trend in symptoms of the disease over time
University of Cambridge researcher Anastasios Noulas - choosing the best retail location.
*Apps that rely on machine learning to work their magic:
http://gigaom.com/2012/11/03/5-trends-that-are-changing-how-we-do-big-data/
raised $30.6 million for its "Insight Discovery" technology - Inked deals with General Electric, Citigroup, Merck, Anadarko, U.S. Food and Drug Administration, Centers for Disease Control and Prevention, the University of California San Francisco, Mount Sinai Hospital, Texas A&M University and Harvard Medical School.
Big Data + Machine Learning + Crowd sourcing
High Usage of Machine Learning
Random Forest, Neural Network
High predictive accuracy of more than 95%
http://www.kaggle.com/
Big Data + Machine Learning + Quantum Computing
[Diagram labels: Big Data; Training set; Large Data Set; Exponential Compression; Machine Learning; Real time Search of Big Data]
http://www.eetimes.com/document.asp?doc_id=1319059
*Thank You