UVA CS 4501: Machine Learning Lecture 1: Introduction€¦ · – Stock market transactions, corporate sales, airline traffic, … Entertainment OUR DATA-RICH WORLD. What can we do

UVACS4501:MachineLearning

Lecture1:Introduction

Dr.Yanjun Qi

UniversityofVirginiaDepartmentofComputerScience

Welcome• CS4501MachineLearning

– TuTh 3:30pm-4:45pm,– RiceHall130

• YourUVAcollab forAssignments:• CourseWebsite:

– https://qiyanjun.github.io/2018fUVA-CS4501MachineLearning/

9/4/18

YanjunQi/UVACS

2

Today

q CourseLogisticsqMachineLearningBasicsqMachineLearningHistoryq RoughPlanofCourseContent

9/4/18

YanjunQi/UVACS

3

CourseStaff

• Instructor:Prof.Yanjun Qi– QI:/ch ee/– Youcancallme“professor”,“professorQi”;– IhavebeenteachingGraduate-levelandUnder-LevelMachineLearningcourseforfiveyears!

– Myresearchisaboutmachinelearning

• TAandOfficeHourinformation@CourseWeb

9/4/18

YanjunQi/UVACS

4

CourseLogistics

• Q0- Quizfortheminimumbackgroundtest!!!!

• Courseemaillisthasbeensetup.Youshouldhavereceivedemailsalready!

• Policy,thegradewillbecalculatedasfollows:– Assignments(60%,Sixtotal,each~10%)– Midterm exam (20%)– Final exam (20%)

9/4/18

YanjunQi/UVACS

5

CourseLogistics

• Midterm:75mins• Final:75mins

• Sixassignments(each10%)– Three extensiondayspolicy(checkcoursewebsite)

• AlllateHomeworkshouldbesubmittedto [email protected]

9/4/18

YanjunQi/UVACS

6

HomeworkPolicy

• Policy,– HomeworkshouldbesubmittedelectronicallythroughUVaCollab

– Homeworkshouldbefinishedindividually– Dueatmidnightontheduedate

– Inordertopassthecourse,theaverageofyourmidtermandfinalmustalsobe"pass".

9/4/18

YanjunQi/UVACS

7

LateHomeworkPolicy

• Eachstudenthas three extensiondaystobeusedathisorherowndiscretionthroughouttheentirecourse.Yourgradeswouldbediscountedby15%perdaywhenyouusethese3latedays.Youcouldusethe3daysinwhatevercombinationyoulike.Forexample,all3dayson1assignment(foramaximumgradeof55%)or1eachdayover3assignments(foramaximumgradeof85%oneach).Afteryou'veusedall3days,youcannotgetcreditforanythingturnedinlate.

9/4/18

YanjunQi/UVACS

8

CourseMaterial

• Textbooksforthisclassis:– NONE

• Myslides– ifitisnotmentionedinmyslides,itisnotanofficialtopicofthecourse

9/4/18

YanjunQi/UVACS

9

CourseBackgroundNeeded

• BackgroundNeeded– Calculus,Basiclinearalgebra,BasicprobabilityandBasicAlgorithm

– Statisticsisrecommended.– Studentsshouldalreadyhavegoodprogrammingskills,i.e.python isrequiredforallprogrammingassignments

– Wewillreview“algebra”and“probability”inclass

9/4/18

YanjunQi/UVACS

10

Today


9/4/18

YanjunQi/UVACS

11

9/4/18

YanjunQi/UVACS

12

• Biomedicine– Patient records, brain imaging, MRI & CT scans, …– Genomic sequences, bio-structure, drug effect info, …

• Science– Historical documents, scanned books, databases from

astronomy, environmental data, climate records, …

• Social media– Social interactions data, twitter, facebook records, online

reviews, …

• Business– Stock market transactions, corporate sales, airline traffic,

…

Entertainment

OUR DATA-RICH WORLD

Whatcanwedowiththedatawealth?è REAL-WORLDIMPACT

§ Businessefficiencies§ Scientificbreakthroughs§ Improvequality-of-life:§ healthcare,§ energysaving/generation,§ environmentaldisasters,§ nursinghome,§ transportation,§ …

9/4/18

MedicalImages

GenomicData

TransportationData

Braincomputerinteraction(BCI)

Devicesensordata

YanjunQi/UVACS

13

9/4/18

YanjunQi/UVACS

14

• Data capturing (sensor, smart devices, medical instruments, et al.)

• Data transmission • Data storage • Data management • High performance data processing• Data visualization• Data security & privacy (e.g. multiple

individuals)• ……

• Data analytics¢How can we analyze this big data wealth ?¢E.g. Machine learning and data mining

BIG DATA CHALLENGES

this course

e.g.HCI

e.g.cloudcomputing

MACHINE LEARNING IS CHANGING THE WORLD

9/4/18 Manymore!

YanjunQi/UVACS

15

BASICS OF MACHINE LEARNING

• “The goal of machine learning is to build computer systems that can learn and adapt from their experience.” – Tom Dietterich

• “Experience” in the form of available dataexamples (also called as instances, samples)

• Available examples are described with properties (data points in feature space X)

9/4/18

YanjunQi/UVACS

16

9/4/18

YanjunQi/UVACS

17

e.g. SUPERVISED LEARNING• Find function to map input space X to

output space Y

• So that the difference between y and f(x)of each example x is small.

Ibelievethatthisbookisnotatallhelpfulsinceitdoesnotexplainthoroughlythematerial.itjustprovidesthereaderwithtablesandcalculationsthatsometimesarenoteasilyunderstood…

x

y-1

InputX:e.g.apieceofEnglishtext

OutputY:{1/Yes,-1/No}e.g.Isthisapositiveproduct review?

e.g.

SUPERVISED Linear Binary Classifier

• NowletuscheckoutaVERYSIMPLEcaseof

9/4/18

YanjunQi/UVACS

18

e.g.:Binaryy /Linearf/XasR2f x y

f(x,w,b) = sign(wT x + b)

X =(x_1,x_2)


f x y


wT x +b<0

CourtesyslidefromProf.AndrewMoore’stutorial

wTx +b>0

denotes +1 pointdenotes -1 point

9/4/18

YanjunQi/UVACS

19

X =(x_1,x_2)

x_1

X_2


f x y


wT x +b<0

CourtesyslidefromProf.AndrewMoore’stutorial

?

?

wTx +b>0

denotes +1 pointdenotes -1 pointdenotes future points

?

9/4/18

YanjunQi/UVACS

20

X =(x_1,x_2)

x_1

X_2

9/4/18

YanjunQi/UVACS

21

• Training (i.e. learning parameters w,b ) – Training set includes

• available examples x1,…,xL

• available corresponding labels y1,…,yL

– Find (w,b) by minimizing loss(i.e. difference between y and f(x) on available examples in training set)

(W, b) = argminW, b

Basic Concepts

• Testing (i.e. evaluating performance on “future” points)– Difference between true y? and the predicted f(x?) on a

set of testing examples (i.e. testing set)– Key: example x? not in the training set

• Generalisation:learnfunction/hypothesisfrompastdatainorderto“explain”,“predict”,“model”or“control”new dataexamples

9/4/18

Basic Concepts YanjunQi/UVACS

22

9/4/18

YanjunQi/UVACS

23

• Loss function – e.g. hinge loss for binary

classification task

– e.g. pairwise ranking loss for ranking task (i.e. ordering examples by preference)

• Regularization – E.g. additional information addedon loss function to control f

Basic Concepts

TYPICAL MACHINE LEARNING SYSTEM

9/4/18

Low-level sensing

Pre-processing

Feature Extract

Feature Select

Inference, Prediction, Recognition

Label Collection

YanjunQi/UVACS

24

Evaluation

“BigData”ChallengesforMachineLearning

9/4/18

üLargesizeofsamplesüHighdimensionalfeatures

Not the focus, being covered in my advanced-level course

YanjunQi/UVACS

25

Large-Scale Machine Learning: SIZE MATTERS

26

9/4/18

• One thousand data instances

• One million data instances

• One billion data instances

• One trillion data instancesThosearenotdifferentnumbers,

thosearedifferentmindsets !!!

YanjunQi/UVACS

BIG DATA CHALLENGES FOR MACHINE LEARNING

9/4/18

Thevariationsofboth X(feature,representation) andY(labels)arecomplex!

Most of this

courseüComplexityofXüComplexityofY

YanjunQi/UVACS

27

TYPICAL MACHINE LEARNING SYSTEM

9/4/18

Low-level sensing

Pre-processing

Feature Extract

Feature Select

Inference, Prediction, Recognition

Label Collection

Data Complexity of X

Data Complexity

of Y

YanjunQi/UVACS

28

Evaluation

UNSUPERVISED LEARNING : [ COMPLEXITY in Y ]

• No labels are provided (e.g. No Y provided)• Find patterns from unlabeled data, e.g. clustering

9/4/18

e.g.clustering=>tofind“natural” groupingofinstancesgivenun-labeleddata

YanjunQi/UVACS

29

STRUCTURAL OUTPUT LEARNING : [ COMPLEXITY OF Y ]

• Many prediction tasks involve output labels having structured correlations or constraints among instances

9/4/18

Manymorepossible structuresbetweeny_i ,e.g.spatial,temporal, relational…

Thedogchasedthecat

APAFSVSPASGACGPECA…

TreeSequence GridStructured Dependency between Examples’ Y

Input

Output

CCEEEEECCCCCHHHCCC…

YanjunQi/UVACS

30

Original Space Feature Space

STRUCTURAL INPUT : Kernel Methods [ COMPLEXITY OF X ]

e.g.Graphs,Sequences,3Dstructures,

9/4/18

YanjunQi/UVACS

31

�

�

�

MORE RECENT: FEATURE LEARNING[ COMPLEXITY OF X ]

Deep Learning Supervised Embedding

9/4/18

Layer-wise Pretraining

YanjunQi/UVACS

32

DEEP LEARNING / FEATURE LEARNING : [ COMPLEXITY OF X ]

Feature Engineering ü Most critical for accuracy ü Account for most of the computation for testing ü Most time-consuming in development cycle ü Often hand-craft and task dependent in practice

Feature Learning ü Easily adaptable to new similar tasks ü Layerwise representation ü Layer-by-layer unsupervised trainingü Layer-by-layer supervised training 339/4/18

YanjunQi/UVACS

Whylearnfeatures?

34

When to use Machine Learning (Adaptto/learn fromdata) ?

– 1. Extract knowledge from data– Relationships and correlations can be hidden within large

amounts of data– The amount of knowledge available about certain tasks is

simply too large for explicit encoding (e.g. rules) by humans

– 2. Learn tasks that are difficult to formalise– Hard to be defined well, except by examples, e.g., face

recognition

– 3. Create software that improves over time– New knowledge is constantly being discovered. – Rule or human encoding-based system is difficult to

continuously re-design “by hand”.

9/4/18

YanjunQi/UVACS

35

Today


9/4/18

YanjunQi/UVACS

36

MACHINE LEARNING IN COMPUTER SCIENCE

• Machine learning is already the preferred approach for – Speech recognition, natural language processing– Computer vision– Medical outcome analysis– Robot control …

• Why growing ?– Improved machine learning algorithms– Improved CPU / GPU powers – Increased data capture, new sensors, networking – Systems/Software too complex to control manually– Demand to self-customization for user, environment, ….

9/4/18

YanjunQi/UVACS

37

HISTORY OF MACHINE LEARNING

• 1950s– Samuel’s checker player– Selfridge’s Pandemonium

• 1960s: – Neural networks: Perceptron– Pattern recognition – Learning in the limit theory– Minsky and Papert prove limitations of Perceptron

• 1970s: – Symbolic concept induction– Winston’s arch learner– Expert systems and the knowledge acquisition bottleneck– Quinlan’s DT ID3– Michalski’s AQ and soybean diagnosis– Scientific discovery with BACON– Mathematical discovery with AM

9/4/18

YanjunQi/UVACS

38AdaptedFromProf.RaymondJ.Mooney’s slides

HISTORY OF MACHINE LEARNING (CONT.)

• 1980s:– Advanced decision tree and rule learning– Explanation-based Learning (EBL)– Learning and planning and problem solving– Utility problem– Analogy– Cognitive architectures– Resurgence of neural networks (connectionism, backpropagation)– Valiant’s PAC Learning Theory– Focus on experimental methodology

• 1990s– Data mining– Adaptive software agents and web applications– Text learning– Reinforcement learning (RL)– Inductive Logic Programming (ILP)– Ensembles: Bagging, Boosting, and Stacking– Bayes Net learning9/4/18

YanjunQi/UVACS



• 2000s– Support vector machines– Kernel methods– Graphical models– Statistical relational learning– Transfer learning– Sequence labeling– Collective classification and structured outputs– Computer Systems Applications

• Compilers• Debugging• Graphics• Security (intrusion, virus, and worm detection)

– Email management– Personalized assistants that learn– Learning in robotics and vision

9/4/18

YanjunQi/UVACS



• 2010s– Speech translation, voice recognition (e.g. SIRI)– Google search engine uses numerous machine learning

techniques (e.g. grouping news, spelling corrector, improving search ranking, image retrieval, …..)

– 23 and me (scan sample of person genome, predict likelihood of genetic disease, … )

– DeepMind, Google Brain, …– IBM waston QA system– Machine Learning as a service (e.g. google prediction API,

bigml.com, cloud autoML . ) – IBM healthcare analytics– ……

9/4/18

YanjunQi/UVACS


42

HISTORY OF MACHINE LEARNING

(CONT.)

43

RELATED DISCIPLINES• Artificial Intelligence• Data Mining• Probability and Statistics• Information theory• Numerical optimization• Computational complexity theory• Control theory (adaptive)• Psychology (developmental, cognitive)• Neurobiology• Linguistics• Philosophy9/4/18

YanjunQi/UVACS

WhatarethegoalsofAIresearch?

ArtifactsthatACTlikeHUMANS

ArtifactsthatTHINKlikeHUMANS

ArtifactsthatTHINKRATIONALLY

ArtifactsthatACTRATIONALLY

44From:M.A.Papalaskar

9/4/18

YanjunQi/UVACS

Howcanwebuildmoreintelligentcomputer/machine?

• Ableto– perceivetheworld– understandtheworld– reacttotheworld

• Thisneeds– Basicspeechcapabilities– Basicvisioncapabilities– Language/semanticunderstanding– Userbehavior/emotionunderstanding– Abletoact– Abletothink??

9/4/18 45

YanjunQi/UVACS

Howcanwebuildmoreintelligentcomputer/machine?: Milestones in

Recent Vision/AI Fields

9/4/18

YanjunQi/UVACS

46

ImageNet Competition:

[ Training on 1.2 million images [X]vs. 1000 different word labels [Y] ]

Detour:threeplannedprogrammingassignmentsaboutAItasks

• HW:Semanticlanguageunderstanding(sentimentclassificationonmoviereviewtext)

• HW:Visualobjectrecognition(labelingimagesabouthandwrittendigits)

• HW:Audiospeechrecognition(unsupervisedlearningbasedspeechrecognitiontask)

9/4/18 47

YanjunQi/UVACS

Today


9/4/18

YanjunQi/UVACS

48

CourseContentPlanèFivemajorsectionsofthiscourse

q Regression(supervised)q Classification(supervised)q Unsupervisedmodelsq Learningtheoryq Graphicalmodels

9/4/18 49

YanjunQi/UVACS

Summary

• Thisisnotacourseabouthowtouseatoolbox

• Wefocusonlearningfundamentalprinciples,mathematicalformulation,algorithmdesignandlearningtheory.

9/4/18

YanjunQi/UVACS

50

SomenegativecommentsfromlastSpring

• Classwasboring,…• Theinstructorstatedthatthecoursewasgoingtobemath-heavywhich90%ofstudentsdidnotwant,andeventheremaining10%wereprobablyblownawayathowintensiveitreallywas…

9/4/18

YanjunQi/UVACS

51

AFEWSAMPLESLIDES

9/4/18

YanjunQi/UVACS

52

9/4/18 53

Dr.YanjunQi/UVACS

L4

L12:DerivingtheMaximumLikelihoodEstimateforBernoulli

!!L(p)= px(1− p)n−x

maximize

0.0

00.0

40.0

8

potential HIV prevalences

like

liho

od

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

!!log(L(p)= log px(1− p)n−x⎡⎣ ⎤⎦

maximize

!2

50

!1

50

!5

00

potential HIV prevalences

log(lik

elih

ood)

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

−l(p)= − log px(1− p)n−x⎡⎣ ⎤⎦

Minimize the negative log-likelihood

05

01

00

20

0potential HIV prevalences

!lo

g(lik

elih

ood)

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

Dr.YanjunQi/UVACS

p

p

p

Task: y

Next lesson: Machine Learning in a Nutshell

Representation: x

Score Function: L()

Search/Optimization : argmin()

Models, Parameters : f, w, b

9/4/18 55

MLgrewoutofworkinAI

Optimizeaperformance criterionusing example dataorpast experience,

Aiming to generalize tounseen data

YanjunQi/UVACS

Next lesson: Review of linear algebra and basic calculus

References

q Prof.AndrewMoore’stutorialsq Prof.RaymondJ.Mooney’sslidesq Prof.AlexanderGray’sslidesq Prof.EricXing’sslidesq http://scikit-learn.org/q Hastie,Trevor,etal.Theelementsofstatisticallearning.Vol.2.No.1.NewYork:Springer,2009.

q Prof.M.A.Papalaskar’s slides

9/4/18

YanjunQi/UVACS

56

Documents

UVA CS 4501: Machine Learning Lecture 1: Introduction€¦ · – Stock market transactions, corporate sales, airline traffic, … Entertainment OUR DATA-RICH WORLD. What can we do