Introduction to Machine Learning: Bishop PRML Ch. 1
Alireza Ghane / Torsten Moller / Greg Mori
Outline
Course Info.: People, References, Resources
Machine Learning: What, Why, and How?
Curve Fitting: (e.g.) Regression and Model Selection
Decision Theory: ML, Loss Function, MAP
Probability Theory: (e.g.) Probabilities and Parameter Estimation
Conclusion
Course Info.
Home: http://vda.univie.ac.at/Teaching/ML/15s
Discussions: https://moodle.univie.ac.at/
Dr. Torsten Moller
Alireza Ghane
Registration
Course max participants: 25
Registered: 61; Number of seats: 48; Excess: 13
Sign your name on the sheet
If you miss the first two sessions, you will automatically be SIGNED OFF from the course!
References
Main Textbook: Pattern Recognition and Machine Learning, Christopher M. Bishop, Springer, 2006.
Other Useful Resources:
• The Elements of Statistical Learning, Trevor Hastie, Robert Tibshirani, and Jerome Friedman.
• Machine Learning, Tom Mitchell.
• Pattern Classification (2nd ed.), Richard O. Duda, Peter E. Hart, and David G. Stork.
• Machine Learning: An Algorithmic Perspective, Stephen Marsland.
• The Top Ten Algorithms in Data Mining, X. Wu and V. Kumar.
• Learning from Data, V. Cherkassky and F. Mulier.
Online Courses: Andrew Ng: http://ml-class.org/
Grading
• Grading:
  • Assignments / Labs (50%): 5 assignments, 10% each
  • Final Exam (40%)
  • Class Feedback (10%)
• Assignment late policy:
  • 5 grace days for all assignments together
  • After the grace days, 25% penalty for each day
• Programming with:
  • MATLAB (licensed): http://de.mathworks.com/
  • Octave (free): https://www.gnu.org/software/octave/
Course Topics
We will cover techniques in the standard ML toolkit: maximum likelihood, regularization, support vector machines (SVM), Fisher’s linear discriminant (LDA), boosting, principal components analysis (PCA), Markov random fields (MRF), neural networks, graphical models, belief propagation, expectation-maximization (EM), mixture models, mixtures of experts (MoE), hidden Markov models (HMM), particle filters, Markov chain Monte Carlo (MCMC), Gibbs sampling, ...
Background
• Calculus:
  E = mc^2 ⇒ ∂E/∂c = 2mc
• Linear algebra (PRML Appendix C):
  A ui = λi ui;   ∂(x^T a)/∂x = a
• Probability (PRML Ch. 1.2):
  p(X) = Σ_Y p(X, Y);   p(x) = ∫ p(x, y) dy;   Ex[f] = ∫ p(x) f(x) dx
It will be possible to refresh, but if you’ve never seen these before, this course will be very difficult.
What is Machine Learning (ML)?
• Algorithms that automatically improve performance through experience
• Often this means defining a model by hand, and using data to fit its parameters
Why ML?
• The real world is complex – difficult to hand-craft solutions.
• ML is the preferred framework for applications in many fields:
  • Computer Vision
  • Natural Language Processing, Speech Recognition
  • Robotics
  • . . .
Hand-written Digit Recognition
!"#$%&'"()#*$#+ !,- )( !.,$#/+ +%((/#/01/! 233456 334
7/#()#-! 8/#9 */:: )0 "'%! /$!9 +$"$;$!/ +,/ ") "'/ :$1< )(
8$#%$"%)0 %0 :%&'"%0& =>?@ 2ABC D,!" -$</! %" ($!"/#56E'/ 7#)")"97/ !/:/1"%)0 $:&)#%"'- %! %::,!"#$"/+ %0 F%&6 GH6
C! !//0I 8%/*! $#/ $::)1$"/+ -$%0:9 ()# -)#/ 1)-7:/J1$"/&)#%/! *%"' '%&' *%"'%0 1:$!! 8$#%$;%:%"96 E'/ 1,#8/-$#</+ 3BK7#)") %0 F%&6 L !')*! "'/ %-7#)8/+ 1:$!!%(%1$"%)07/#()#-$01/ ,!%0& "'%! 7#)")"97/ !/:/1"%)0 !"#$"/&9 %0!"/$+)( /.,$::9K!7$1/+ 8%/*!6 M)"/ "'$" */ );"$%0 $ >6? 7/#1/0"
/##)# #$"/ *%"' $0 $8/#$&/ )( )0:9 (),# "*)K+%-/0!%)0$:
8%/*! ()# /$1' "'#//K+%-/0!%)0$: );D/1"I "'$0<! ") "'/
(:/J%;%:%"9 7#)8%+/+ ;9 "'/ -$"1'%0& $:&)#%"'-6
!"# $%&'() *+,-. */0+12.33. 4,3,5,6.
N,# 0/J" /J7/#%-/0" %08):8/! "'/ OAPQKR !'$7/ !%:'),/""/
+$"$;$!/I !7/1%(%1$::9 B)#/ PJ7/#%-/0" BPK3'$7/KG 7$#" SI
*'%1' -/$!,#/! 7/#()#-$01/ )( !%-%:$#%"9K;$!/+ #/"#%/8$:
=>T@6 E'/ +$"$;$!/ 1)0!%!"! )( GI?HH %-$&/!U RH !'$7/
1$"/&)#%/!I >H %-$&/! 7/# 1$"/&)#96 E'/ 7/#()#-$01/ %!
-/$!,#/+ ,!%0& "'/ !)K1$::/+ V;,::!/9/ "/!"IW %0 *'%1' /$1'
!"# $%%% &'()*(+&$,)* ,) -(&&%') ()(./*$* ()0 1(+2$)% $)&%..$3%)+%4 5,.6 784 ),6 784 (-'$. 7997
:;<6 #6 (== >? @AB C;DE=FDD;?;BG 1)$*& @BD@ G;<;@D HD;I< >HJ CB@A>G KLM >H@ >? "94999N6 &AB @BO@ FP>QB BFEA G;<;@ ;IG;EF@BD @AB BOFCR=B IHCPBJ
?>==>SBG PT @AB @JHB =FPB= FIG @AB FDD;<IBG =FPB=6
:;<6 U6 M0 >PVBE@ JBE><I;@;>I HD;I< @AB +,$.W79 GF@F DB@6 +>CRFJ;D>I >?@BD@ DB@ BJJ>J ?>J **04 *AFRB 0;D@FIEB K*0N4 FIG *AFRB 0;D@FIEB S;@A!!"#$%&$' RJ>@>@TRBD K*0WRJ>@>N QBJDHD IHCPBJ >? RJ>@>@TRB Q;BSD6 :>J**0 FIG *04 SB QFJ;BG @AB IHCPBJ >? RJ>@>@TRBD HI;?>JC=T ?>J F==>PVBE@D6 :>J *0WRJ>@>4 @AB IHCPBJ >? RJ>@>@TRBD RBJ >PVBE@ GBRBIGBG >I@AB S;@A;IW>PVBE@ QFJ;F@;>I FD SB== FD @AB PB@SBBIW>PVBE@ D;C;=FJ;@T6
:;<6 "96 -J>@>@TRB Q;BSD DB=BE@BG ?>J @S> G;??BJBI@ M0 >PVBE@D ?J>C @AB+,$. GF@F DB@ HD;I< @AB F=<>J;@AC GBDEJ;PBG ;I *BE@;>I !676 X;@A @A;DFRRJ>FEA4 Q;BSD FJB F==>EF@BG FGFR@;QB=T GBRBIG;I< >I @AB Q;DHF=E>CR=BO;@T >? FI >PVBE@ S;@A JBDRBE@ @> Q;BS;I< FI<=B6
Belongie et al. PAMI 2002
• Difficult to hand-craft rules about digits
Hand-written Digit Recognition
xi = [28 × 28 pixel image of a handwritten digit]
ti = (0, 0, 0, 1, 0, 0, 0, 0, 0, 0)
• Represent the input image as a vector xi ∈ R^784 (a small sketch follows below)
• Suppose we have a target vector ti
  • This is supervised learning
  • Discrete, finite label set: perhaps ti ∈ {0, 1}^10, a classification problem
• Given a training set {(x1, t1), . . . , (xN, tN)}, the learning problem is to construct a “good” function y(x) from these
  • y : R^784 → R^10
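As a minimal sketch of this representation in NumPy (a random array stands in for a real digit image; all names are ours):

```python
import numpy as np

# Stand-in for a real 28x28 grayscale digit image (28*28 = 784 pixels).
image = np.random.rand(28, 28)

# Input vector x_i in R^784: flatten the image.
x_i = image.reshape(-1)            # shape (784,)

# One-hot target vector t_i in {0,1}^10 for the label "3".
t_i = np.zeros(10)
t_i[3] = 1.0                       # (0, 0, 0, 1, 0, 0, 0, 0, 0, 0)
```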
Face Detection
Schneiderman and Kanade, IJCV 2002
• Classification problem
• ti ∈ {0, 1, 2}, non-face, frontal face, profile face.
Spam Detection
• Classification problem
• ti ∈ {0, 1}, non-spam, spam
• xi counts of words, e.g. Viagra, stock, outperform, multi-bagger
Caveat - Horses (source?)
• Once upon a time there were two neighboring farmers, Jed and Ned. Each owned a horse, and the horses both liked to jump the fence between the two farms. Clearly the farmers needed some means to tell whose horse was whose.
• So Jed and Ned got together and agreed on a scheme for discriminating between horses. Jed would cut a small notch in one ear of his horse. Not a big, painful notch, but just big enough to be seen. Well, wouldn’t you know it, the day after Jed cut the notch in his horse’s ear, Ned’s horse caught on the barbed wire fence and tore his ear the exact same way!
• Something else had to be devised, so Jed tied a big blue bow on the tail of his horse. But the next day, Jed’s horse jumped the fence, ran into the field where Ned’s horse was grazing, and chewed the bow right off the other horse’s tail. Ate the whole bow!
• Finally, Jed suggested, and Ned concurred, that they should pick a feature that was less apt to change. Height seemed like a good feature to use. But were the heights different? Well, each farmer went and measured his horse, and do you know what? The brown horse was a full inch taller than the white one!
Moral of the story: ML provides theory and tools for setting parameters. Make sure you have the right model and features. Think about your “feature vector x.”
Stock Price Prediction
• Problems in which ti is continuous are called regression
• E.g. ti is a stock price; xi contains company profit, debt, cash flow, gross sales, number of spam emails sent, . . .
Clustering Images
Wang et al., CVPR 2006
• Only xi is defined: unsupervised learning
• E.g. xi describes an image; find groups of similar images
Types of Learning Problems
• Supervised Learning
  • Classification
  • Regression
• Unsupervised Learning
  • Density estimation
  • Clustering: k-means, mixture models, hierarchical clustering
  • Hidden Markov models
• Reinforcement Learning
An Example - Polynomial Curve Fitting
[Plot: training data points (xn, tn); x ranges over [0, 1], t over [−1, 1]]
• Suppose we are given a training set of N observations (x1, . . . , xN) and (t1, . . . , tN), xi, ti ∈ R
• Regression problem: estimate y(x) from these data
Polynomial Curve Fitting
• What form is y(x)?
• Let’s try polynomials of degree M:
  y(x,w) = w0 + w1 x + w2 x^2 + . . . + wM x^M
  • This is the hypothesis space.
• How do we measure success?
  • Sum of squared errors:
    E(w) = (1/2) Σ_{n=1}^N {y(xn,w) − tn}^2
• Among functions in the class, choose that which minimizes this error
[Plot: the error sums the squared vertical distances between the curve y(x,w) and the targets tn at each xn]
Polynomial Curve Fitting
• Error function:
  E(w) = (1/2) Σ_{n=1}^N {y(xn,w) − tn}^2
• Best coefficients:
  w* = argmin_w E(w)
• Found using the pseudo-inverse (more later; a small sketch follows below)
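As an illustrative sketch (not the course’s reference implementation), the minimizer can be computed with the pseudo-inverse of the polynomial design matrix; the function name is ours:

```python
import numpy as np

def fit_polynomial(x, t, M):
    """Least-squares fit of a degree-M polynomial via the pseudo-inverse."""
    # Design matrix: row n is (1, xn, xn^2, ..., xn^M).
    Phi = np.vander(x, M + 1, increasing=True)
    # w* = pinv(Phi) @ t minimizes E(w) = (1/2) sum_n (y(xn,w) - tn)^2.
    return np.linalg.pinv(Phi) @ t

# Toy data: noisy samples of sin(2*pi*x), in the spirit of PRML Ch. 1.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(10)
w_star = fit_polynomial(x, t, M=3)
```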
Which Degree of Polynomial?
[Plots: degree-M polynomial fits to the same data for several choices of M; low M under-fits, while M = 9 passes through every point]
• A model selection problem
• M = 9 → E(w*) = 0: this is over-fitting
Generalization
[Plot: ERMS versus polynomial degree M for the training and test sets]
• Generalization is the holy grail of ML
  • Want good performance for new data
• Measure generalization using a separate set
  • Use root-mean-squared (RMS) error: ERMS = √(2E(w*)/N)
Controlling Over-fitting: Regularization
[Plot: M = 9 fit oscillating wildly between the training points]
• As the order of the polynomial M increases, so do the coefficient magnitudes
• Penalize large coefficients in the error function (a sketch of the resulting fit follows below):
  E(w) = (1/2) Σ_{n=1}^N {y(xn,w) − tn}^2 + (λ/2) ||w||^2
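A minimal sketch of the regularized (ridge) fit, assuming the same design matrix as in the earlier sketch; setting the gradient of the penalized error to zero gives the closed form below:

```python
import numpy as np

def fit_polynomial_regularized(x, t, M, lam):
    """Minimize (1/2) sum_n (y(xn,w) - tn)^2 + (lam/2) ||w||^2."""
    Phi = np.vander(x, M + 1, increasing=True)
    # Zero gradient gives the normal equations (Phi^T Phi + lam I) w = Phi^T t.
    A = Phi.T @ Phi + lam * np.eye(M + 1)
    return np.linalg.solve(A, Phi.T @ t)
```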
Controlling Over-fitting: Regularization
[Plots: M = 9 fits for two values of the regularization parameter; a small ln λ gives a good fit, a large ln λ over-smooths]
Controlling Over-fitting: Regularization
[Plot: ERMS versus ln λ (roughly −35 to −20) for the training and test sets]
• Note the ERMS for the training set: a perfect match between the model and the training set is a result of over-fitting
• Training and test error show a similar trend
Over-fitting: Dataset size
[Plots: M = 9 fits with a small and a larger training set; more data tames the oscillations]
• With more data, a more complex model (M = 9) can be fit
• Rule of thumb: 10 data points for each parameter
Validation Set
• Split the training data into a training set and a validation set
• Train different models (e.g. different order polynomials) on the training set
• Choose the model (e.g. the order of polynomial) with minimum error on the validation set
Cross-validation
[Diagram: 4 cross-validation runs; each run holds out a different group of the data for validation]
• Data are often limited
• Cross-validation creates S groups of data; use S − 1 to train, the other to validate (see the sketch below)
• Extreme case, leave-one-out cross-validation (LOO-CV): S is the number of training data points
• Cross-validation is an effective method for model selection, but can be slow
  • Models with multiple complexity parameters: exponential number of runs
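A minimal sketch of S-fold cross-validation for choosing the polynomial degree, reusing the hypothetical fit_polynomial from the earlier sketch:

```python
import numpy as np

def cross_validation_error(x, t, M, S=4):
    """Mean held-out RMS error of a degree-M polynomial over S folds."""
    folds = np.array_split(np.random.permutation(len(x)), S)
    errors = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(x)), held_out)
        w = fit_polynomial(x[train], t[train], M)
        y = np.vander(x[held_out], M + 1, increasing=True) @ w
        errors.append(np.sqrt(np.mean((y - t[held_out]) ** 2)))
    return float(np.mean(errors))

# Choose the degree with the lowest cross-validation error,
# then retrain that model on all the training data:
# best_M = min(range(10), key=lambda M: cross_validation_error(x, t, M))
```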
Summary
• Want models that generalize to new data
  • Train the model on the training set
  • Measure performance on a held-out test set
• Performance on the test set is a good estimate of performance on new data
Summary - Model Selection
• Which model to use? E.g. which degree polynomial?
  • Training set error is lower with a more complex model
  • Can’t just choose the model with the lowest training error
• Peeking at test error is unfair, e.g. picking the polynomial with the lowest test error
  • Performance on the test set is then no longer a good estimate of performance on new data
Summary - Solutions I
• Use a validation set
  • Train models on the training set, e.g. different degree polynomials
  • Measure performance on the held-out validation set
  • Measure performance of that model on the held-out test set
• Can use cross-validation on the training set instead of a separate validation set if there is little data and lots of time
  • Choose the model with the lowest error over all cross-validation folds (e.g. the polynomial degree)
  • Retrain that model using all training data (e.g. the polynomial coefficients)
Summary - Solutions II
• Use regularization
  • Train a complex model (e.g. a high order polynomial) but penalize it for being “too complex” (e.g. large weight magnitudes)
  • Need to balance error vs. regularization (λ)
  • Choose λ using cross-validation
• Get more data
Decision Theory
For a sample x, decide which class (Ck) it is from.
Ideas:
• Maximum Likelihood
• Minimum Loss/Cost (e.g. misclassification rate)
• Maximum A Posteriori (MAP)
Decision: Maximum Likelihood
• Inference step: determine statistics from the training data:
  p(x, t) OR p(x|Ck)
• Decision step: determine the optimal t for a test input x:
  t = argmax_k p(x|Ck)
  (the quantity maximized is the likelihood; a sketch follows below)
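A minimal sketch of this two-step recipe, assuming (our choice, not the slide’s) a one-dimensional Gaussian class-conditional p(x|Ck) per class:

```python
import numpy as np
from scipy.stats import norm

# Inference step: fit a Gaussian p(x|C_k) per class from toy training data.
class_params = {}
for k, samples in enumerate([np.array([1.0, 1.2, 0.8]),
                             np.array([3.0, 3.3, 2.9])]):
    class_params[k] = (samples.mean(), samples.std() + 1e-6)

def ml_decision(x):
    """Decision step: pick the class maximizing the likelihood p(x|C_k)."""
    return max(class_params, key=lambda k: norm.pdf(x, *class_params[k]))

print(ml_decision(1.1))  # -> 0
```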
Decision: Minimum Misclassification Rate
p(mistake) = p(x ∈ R1, C2) + p(x ∈ R2, C1)
           = ∫_{R1} p(x, C2) dx + ∫_{R2} p(x, C1) dx

More generally, for K classes:

p(mistake) = Σ_k Σ_{j≠k} ∫_{Rj} p(x, Ck) dx

[Plot: joint densities p(x, C1) and p(x, C2) over x, with decision regions R1, R2; x̂ marks a decision boundary and x0 the optimal decision boundary]

x0 : argmin_{R1} {p(mistake)}
Decision: Minimum Loss/Cost
• Misclassification rate:
  R : argmin_{ {Ri | i ∈ {1,··· ,K}} } Σ_k Σ_j L(Rj, Ck)
• Weighted loss/cost function:
  R : argmin_{ {Ri | i ∈ {1,··· ,K}} } Σ_k Σ_j Wj,k L(Rj, Ck)
  This is useful when:
  • The populations of the classes differ
  • The failure cost is non-symmetric
  • ···
Decision: Maximum A Posteriori (MAP)
Bayes’ Theorem: P(A|B) = P(B|A) P(A) / P(B)

p(Ck|x) ∝ p(x|Ck) p(Ck)
(posterior ∝ likelihood × prior)

• Provides an a posteriori belief for the estimation, rather than a single point estimate.
• Can utilize a priori information in the decision (a sketch follows below).
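Continuing the hypothetical Gaussian sketch from the maximum-likelihood decision above, only the decision rule changes; the class priors p(Ck) are assumed numbers for illustration:

```python
# Assumed class priors p(C_k); these values are illustrative only.
priors = {0: 0.7, 1: 0.3}

def map_decision(x):
    """Pick the class maximizing the posterior p(C_k|x) ∝ p(x|C_k) p(C_k)."""
    return max(class_params,
               key=lambda k: norm.pdf(x, *class_params[k]) * priors[k])
```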
Coin Tossing
• Let’s say you’re given a coin, and you want to find out P(heads), the probability that if you flip it, it lands as “heads”.
• Flip it a few times: H H T
• P (heads) = 2/3
• Hmm... is this rigorous? Does this make sense?
Coin Tossing - Model
• Bernoulli distribution: P(heads) = µ, P(tails) = 1 − µ
• Assume coin flips are independent and identically distributed (i.i.d.)
  • i.e. all are separate samples from the Bernoulli distribution
• Given data D = {x1, . . . , xN}, heads: xi = 1, tails: xi = 0, the likelihood of the data is:
  p(D|µ) = ∏_{n=1}^N p(xn|µ) = ∏_{n=1}^N µ^{xn} (1 − µ)^{1−xn}
Maximum Likelihood Estimation
• Given D with h heads and t tails
• What should µ be?
• Maximum Likelihood Estimation (MLE): choose the µ which maximizes the likelihood of the data:
  µML = argmax_µ p(D|µ)
• Since ln(·) is monotone increasing:
  µML = argmax_µ ln p(D|µ)
Maximum Likelihood Estimation
• Likelihood:
  p(D|µ) = ∏_{n=1}^N µ^{xn} (1 − µ)^{1−xn}
• Log-likelihood:
  ln p(D|µ) = Σ_{n=1}^N [xn ln µ + (1 − xn) ln(1 − µ)]
• Take the derivative and set it to 0 (a numerical check follows below):
  (d/dµ) ln p(D|µ) = Σ_{n=1}^N [xn (1/µ) − (1 − xn) (1/(1 − µ))] = (1/µ) h − (1/(1 − µ)) t = 0
  ⇒ µML = h / (t + h)
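A quick numerical sanity check of this closed form, using made-up flips and a grid search over the log-likelihood:

```python
import numpy as np

flips = np.array([1, 1, 0, 1, 0, 1, 1])   # made-up data: 1 = heads, 0 = tails
h, t = int(flips.sum()), int(len(flips) - flips.sum())

mu_mle = h / (h + t)                       # analytic MLE: h / (t + h)

# Numerical check: maximize the log-likelihood on a grid.
mu_grid = np.linspace(1e-3, 1 - 1e-3, 999)
loglik = h * np.log(mu_grid) + t * np.log(1 - mu_grid)
assert abs(mu_grid[np.argmax(loglik)] - mu_mle) < 1e-2
```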
Bayesian Learning
• Wait, does this make sense? What if I flip 1 time and it lands heads? Do I believe µ = 1?
• Learn µ the Bayesian way:
  P(µ|D) = P(D|µ) P(µ) / P(D)
  P(µ|D) ∝ P(D|µ) P(µ)
  (posterior ∝ likelihood × prior)
• The prior encodes knowledge that most coins are 50-50
• A conjugate prior makes the math simpler and gives an easy interpretation
  • For the Bernoulli, the beta distribution is its conjugate
Beta Distribution
• We will use the Beta distribution to express our prior knowledge about coins:
  Beta(µ|a, b) = [Γ(a + b) / (Γ(a) Γ(b))] µ^{a−1} (1 − µ)^{b−1}
  where the ratio of Γ functions is the normalization constant
• Parameters a and b control the shape of this distribution
Posterior
P(µ|D) ∝ P(D|µ) P(µ)
       ∝ [∏_{n=1}^N µ^{xn} (1 − µ)^{1−xn}] × [µ^{a−1} (1 − µ)^{b−1}]   (likelihood × prior)
       ∝ µ^h (1 − µ)^t µ^{a−1} (1 − µ)^{b−1}
       ∝ µ^{h+a−1} (1 − µ)^{t+b−1}

• The simple form of the posterior is due to the use of a conjugate prior (see the sketch below)
• Parameters a and b act as extra observations
• Note that as N = h + t → ∞, the prior is ignored
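In code the conjugate update is a one-liner: the posterior is Beta(h + a, t + b). A sketch with SciPy, using the one-heads example from the previous slide and an assumed prior a = b = 2:

```python
from scipy.stats import beta

a, b = 2, 2            # prior pseudo-counts: gently favor mu near 0.5
h, t = 1, 0            # observed flips: a single head

posterior = beta(h + a, t + b)   # Beta(mu | h+a, t+b) = Beta(3, 2)
print(posterior.mean())          # 0.6, pulled toward 0.5 by the prior
```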
Maximum A Posteriori
• Given the posterior P(µ|D) we could compute a single value, known as the Maximum A Posteriori (MAP) estimate, for µ:
  µMAP = argmax_µ P(µ|D)
• Known as point estimation
• However, the correct Bayesian thing to do is to use the full distribution over µ
  • i.e. compute
    Eµ[f] = ∫ p(µ|D) f(µ) dµ
  • This integral is usually hard to compute (though for the Beta posterior it is analytic; see below)
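For the Beta posterior the point estimates are easy to compare analytically; a sketch continuing the numbers above (h = 1, t = 0, a = b = 2):

```python
a_post, b_post = 1 + 2, 0 + 2                     # Beta(3, 2) posterior

mu_map = (a_post - 1) / (a_post + b_post - 2)     # posterior mode: 2/3
mu_mean = a_post / (a_post + b_post)              # posterior mean E[mu]: 3/5
```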
Polynomial Curve Fitting: What We Did
• Recall: we chose polynomials of degree M as the hypothesis space:
  y(x,w) = w0 + w1 x + w2 x^2 + . . . + wM x^M
• Success was measured by the sum of squared errors:
  E(w) = (1/2) Σ_{n=1}^N {y(xn,w) − tn}^2
• Among functions in the class, we chose the one minimizing this error
Curve Fitting: Probabilistic Approach
[Plot: at each input x0 the model defines a Gaussian p(t|x0,w, β) over t, centered on y(x0,w) with spread 2σ]

p(t|x,w, β) = ∏_{n=1}^N N(tn | y(xn,w), β^{−1})

ln p(t|x,w, β) = −(β/2) Σ_{n=1}^N {y(xn,w) − tn}^2 + (N/2) ln β − (N/2) ln(2π)

The first term is −βE(w); the remaining terms are constant in w.

Maximize the log-likelihood ⇔ Minimize E(w). Can optimize for β as well.
Curve Fitting: Bayesian Approach
[Plot: the same Gaussian conditional p(t|x0,w, β) around the curve y(x,w)]

p(t|x,w, β) = ∏_{n=1}^N N(tn | y(xn,w), β^{−1})

Posterior distribution: p(w|x, t, α, β) ∝ p(t|x,w, β) p(w|α)

Minimize the negative log-posterior:

(β/2) Σ_{n=1}^N {y(xn,w) − tn}^2 + (α/2) w^T w

where the first term is βE(w) and the second is the regularization term.
Curve Fitting: Bayesian
p(t|x0,w, β, α) = N(t | Pw(x0), Qw,β,α(x0))

p(t|x, x, t) = N(t | m(x), s^2(x))

m(x) = φ(x)^T S Σ_{n=1}^N φ(xn) tn

s^2(x) = β^{−1} (1 + φ(x)^T S φ(x))

S^{−1} = (α/β) I + Σ_{n=1}^N φ(xn) φ(xn)^T

[Plots: the Gaussian conditional at x0, and the resulting predictive mean m(x) with its uncertainty band over x ∈ [0, 1]]
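A minimal sketch of these predictive equations, following the slide’s definitions of m(x), s^2(x), and S, with a polynomial basis φ and α, β treated as known (function names are ours):

```python
import numpy as np

def bayesian_predict(x_new, x, t, M, alpha, beta):
    """Predictive mean m(x) and variance s^2(x) for Bayesian curve fitting."""
    Phi = np.vander(x, M + 1, increasing=True)            # rows are phi(xn)^T
    phi = np.vander(np.atleast_1d(x_new), M + 1, increasing=True)
    # S^{-1} = (alpha/beta) I + sum_n phi(xn) phi(xn)^T
    S = np.linalg.inv((alpha / beta) * np.eye(M + 1) + Phi.T @ Phi)
    m = phi @ S @ (Phi.T @ t)                             # m(x)
    s2 = (1.0 + np.einsum('ij,jk,ik->i', phi, S, phi)) / beta
    return m, s2
```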
Conclusion
• Readings: Chapter 1.1, 1.3, 1.5, 2.1
• Types of learning problems
  • Supervised: regression, classification
  • Unsupervised
• Learning as optimization
  • Squared error loss function
  • Maximum likelihood (ML)
  • Maximum a posteriori (MAP)
• Want generalization, avoid over-fitting
  • Cross-validation
  • Regularization
  • Bayesian prior on model parameters