UVA CS 4501: Machine Learning
Lecture 1: Introduction
Dr. Yanjun Qi
University of Virginia, Department of Computer Science
Welcome
• CS 4501 Machine Learning
– TuTh 3:30pm-4:45pm
– Rice Hall 130
• Your UVA Collab for assignments
• Course website:
– https://qiyanjun.github.io/2018fUVA-CS4501MachineLearning/
9/4/18
Yanjun Qi / UVA CS
Today
• Course Logistics
• Machine Learning Basics
• Machine Learning History
• Rough Plan of Course Content
Course Staff
• Instructor: Prof. Yanjun Qi
– QI: /ch ee/
– You can call me “professor” or “professor Qi”
– I have been teaching graduate-level and undergraduate-level machine learning courses for five years!
– My research is about machine learning
• TA and office hour information: see the course website
Course Logistics
• Q0: Quiz for the minimum background test!!!!
• Course email list has been set up. You should have received emails already!
• Policy: the grade will be calculated as follows:
– Assignments (60%, six total, each ~10%)
– Midterm exam (20%)
– Final exam (20%)
Course Logistics
• Midterm: 75 mins
• Final: 75 mins
• Six assignments (each 10%)
– Three extension days policy (check course website)
• All late homework should be submitted to [email protected]
Homework Policy
• Policy:
– Homework should be submitted electronically through UVa Collab
– Homework should be finished individually
– Due at midnight on the due date
– In order to pass the course, the average of your midterm and final must also be "pass".
Late Homework Policy
• Each student has three extension days to be used at his or her own discretion throughout the entire course. Your grade will be discounted by 15% per day when you use these 3 late days. You can use the 3 days in whatever combination you like. For example, all 3 days on 1 assignment (for a maximum grade of 55%) or 1 day on each of 3 assignments (for a maximum grade of 85% on each). After you've used all 3 days, you cannot get credit for anything turned in late.
Course Material
• Textbook for this class:
– NONE
• My slides
– If it is not mentioned in my slides, it is not an official topic of the course
Course Background Needed
• Background needed
– Calculus, basic linear algebra, basic probability and basic algorithms
– Statistics is recommended.
– Students should already have good programming skills; Python is required for all programming assignments
– We will review “algebra” and “probability” in class
Today
• Course Logistics
• Machine Learning Basics
• Machine Learning History
• Rough Plan of Course Content
OUR DATA-RICH WORLD
• Biomedicine
– Patient records, brain imaging, MRI & CT scans, …
– Genomic sequences, bio-structure, drug effect info, …
• Science
– Historical documents, scanned books, databases from astronomy, environmental data, climate records, …
• Social media
– Social interactions data, Twitter, Facebook records, online reviews, …
• Business
– Stock market transactions, corporate sales, airline traffic, …
• Entertainment
What can we do with the data wealth? → REAL-WORLD IMPACT
• Business efficiencies
• Scientific breakthroughs
• Improve quality-of-life:
– healthcare,
– energy saving / generation,
– environmental disasters,
– nursing home,
– transportation,
– …
[Figure panels: Medical Images, Genomic Data, Transportation Data, Brain computer interaction (BCI), Device sensor data]
BIG DATA CHALLENGES
• Data capturing (sensor, smart devices, medical instruments, et al.)
• Data transmission
• Data storage
• Data management
• High performance data processing
• Data visualization
• Data security & privacy (e.g. multiple individuals)
• ……
• Data analytics
– How can we analyze this big data wealth?
– E.g. machine learning and data mining
[Callouts on the slide: this course; e.g. HCI; e.g. cloud computing]
MACHINE LEARNING IS CHANGING THE WORLD
Many more!
BASICS OF MACHINE LEARNING
• “The goal of machine learning is to build computer systems that can learn and adapt from their experience.” – Tom Dietterich
• “Experience” in the form of available data examples (also called instances or samples)
• Available examples are described with properties (data points in feature space X)
e.g. SUPERVISED LEARNING
• Find a function f to map input space X to output space Y
• So that the difference between y and f(x) of each example x is small.
Input X: e.g. a piece of English text
Output Y: {1 / Yes, -1 / No}, e.g. Is this a positive product review?
Example x: "I believe that this book is not at all helpful since it does not explain thoroughly the material. It just provides the reader with tables and calculations that sometimes are not easily understood …"
Example y: -1
• Now let us check out a VERY SIMPLE case of a SUPERVISED linear binary classifier
e.g.: binary y / linear f / X as $\mathbb{R}^2$
x → f → y
$f(x, w, b) = \mathrm{sign}(w^T x + b)$
$x = (x_1, x_2)$
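SUPERVISED Linear Binary Classifier
x → f → y
$f(x, w, b) = \mathrm{sign}(w^T x + b)$
[Figure: scatter plot over axes x_1 and x_2, with x = (x_1, x_2). One marker denotes +1 points and another denotes -1 points; a line separates the region $w^T x + b > 0$ from the region $w^T x + b < 0$.]
Courtesy slide from Prof. Andrew Moore’s tutorial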
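SUPERVISED Linear Binary Classifier
x → f → y
$f(x, w, b) = \mathrm{sign}(w^T x + b)$
[Figure: the same scatter plot over axes x_1 and x_2, with x = (x_1, x_2). Markers denote +1 points, -1 points, and future points (shown as "?"); a line separates the region $w^T x + b > 0$ from the region $w^T x + b < 0$.]
Courtesy slide from Prof. Andrew Moore’s tutorial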
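A minimal sketch of the decision rule f(x, w, b) = sign(w^T x + b) for two-dimensional inputs. The weights w, the bias b and the sample points are hypothetical values chosen for illustration, not taken from the lecture. Mapping a score of exactly 0 to +1 is also an assumption of this sketch.

```python
def predict(x, w, b):
    """Linear binary classifier: returns +1 or -1 from the sign of w^T x + b."""
    score = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    # Assumption: a score of exactly 0 is mapped to +1.
    return 1 if score >= 0 else -1


# Hypothetical parameters and points in R^2 (illustration only).
w = (1.0, -1.0)
b = 0.5
points = [(2.0, 0.0), (0.0, 3.0), (1.0, 1.0)]

labels = [predict(x, w, b) for x in points]
print(labels)
```

Each "future" point marked "?" in the figure would be labeled the same way: compute its score and read off the side of the line it falls on.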
Basic Concepts
• Training (i.e. learning parameters w, b)
– Training set includes
• available examples $x_1, \dots, x_L$
• available corresponding labels $y_1, \dots, y_L$
– Find (w, b) by minimizing loss (i.e. difference between y and f(x) on available examples in training set)
$(w, b) = \arg\min_{w, b} \sum_{i=1}^{L} \ell(f(x_i), y_i)$, where $\ell$ denotes the chosen loss
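Finding (w, b) by minimizing a training loss can be illustrated with a brute-force search. This is only a sketch under stated assumptions: the loss is the 0-1 loss (number of mistakes), the four training examples are made up, and the candidate grid of parameter values is hypothetical. Practical learners use smarter optimization than exhaustive search.

```python
from itertools import product


def predict(x, w, b):
    """f(x, w, b) = sign(w^T x + b), with a score of 0 mapped to +1 (assumption)."""
    score = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    return 1 if score >= 0 else -1


def training_loss(w, b, xs, ys):
    """0-1 loss: number of training examples where f(x) differs from y."""
    return sum(1 for x, y in zip(xs, ys) if predict(x, w, b) != y)


# Hypothetical training set: examples x_1..x_L with labels y_1..y_L.
xs = [(2.0, 1.0), (3.0, 2.5), (-1.0, -2.0), (-2.5, -1.0)]
ys = [1, 1, -1, -1]

# Hypothetical candidate grid for (w_1, w_2, b).
grid = [-1.0, 0.0, 1.0]
candidates = [((w1, w2), b) for w1, w2, b in product(grid, grid, grid)]

# argmin over candidates of the training loss.
best_w, best_b = min(candidates, key=lambda wb: training_loss(wb[0], wb[1], xs, ys))
print(best_w, best_b, training_loss(best_w, best_b, xs, ys))
```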
Basic Concepts
• Testing (i.e. evaluating performance on “future” points)
– Difference between true y? and the predicted f(x?) on a set of testing examples (i.e. testing set)
– Key: example x? not in the training set
• Generalisation: learn function / hypothesis from past data in order to “explain”, “predict”, “model” or “control” new data examples
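A small sketch of the training versus testing distinction, assuming a classifier whose parameters (w, b) have already been learned. All numbers are hypothetical. The point is that the test examples are disjoint from the training examples, so the test error estimates performance on "future" points.

```python
def predict(x, w, b):
    """f(x, w, b) = sign(w^T x + b), with a score of 0 mapped to +1 (assumption)."""
    score = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    return 1 if score >= 0 else -1


def error_rate(w, b, xs, ys):
    """Fraction of examples whose predicted label differs from the true label."""
    mistakes = sum(1 for x, y in zip(xs, ys) if predict(x, w, b) != y)
    return mistakes / len(xs)


# Hypothetical learned parameters.
w, b = (1.0, 1.0), 0.0

# Hypothetical training set and a disjoint testing set.
train_xs = [(2.0, 1.0), (3.0, 2.5), (-1.0, -2.0), (-2.5, -1.0)]
train_ys = [1, 1, -1, -1]
test_xs = [(1.5, 0.5), (-0.5, -1.5), (0.5, -1.0)]
test_ys = [1, -1, 1]

train_err = error_rate(w, b, train_xs, train_ys)
test_err = error_rate(w, b, test_xs, test_ys)
print(train_err, test_err)
```

In this toy setup the classifier makes no mistakes on the training set but misclassifies one of the three test points, which is why performance is reported on examples the learner has not seen.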
Basic Concepts
• Loss function
– e.g. hinge loss for binary classification task
– e.g. pairwise ranking loss for ranking task (i.e. ordering examples by preference)
• Regularization
– e.g. additional information added on loss function to control f
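A hedged sketch of how a loss and a regularizer combine into one training objective. The hinge loss is the one named on the slide. The L2 penalty on w, the weight lam, and the data are assumptions made for illustration, since the slide does not say which regularizer is used.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hinge_loss(score, y):
    """Hinge loss max(0, 1 - y * score) for a label y in {+1, -1}."""
    return max(0.0, 1.0 - y * score)


def objective(w, b, xs, ys, lam):
    """Average hinge loss plus an L2 penalty lam * ||w||^2 (penalty choice is an assumption)."""
    data_term = sum(hinge_loss(dot(w, x) + b, y) for x, y in zip(xs, ys)) / len(xs)
    penalty = lam * dot(w, w)
    return data_term + penalty


# Hypothetical training data and parameters.
xs = [(2.0, 1.0), (3.0, 2.5), (-1.0, -2.0), (-2.5, -1.0)]
ys = [1, 1, -1, -1]
value = objective((1.0, 1.0), 0.0, xs, ys, lam=0.1)
print(value)
```

With these numbers every example has a margin of at least 1, so the data term is zero and the objective reduces to the penalty alone. The penalty discourages large weights, which is one way of "controlling f".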
TYPICAL MACHINE LEARNING SYSTEM
[Diagram stages:]
Low-level sensing
Pre-processing
Feature Extract
Feature Select
Inference, Prediction, Recognition
Label Collection
Evaluation
“Big Data” Challenges for Machine Learning
✓ Large size of samples
✓ High dimensional features
Not the focus, being covered in my advanced-level course
Large-Scale Machine Learning: SIZE MATTERS
• One thousand data instances
• One million data instances
• One billion data instances
• One trillion data instances
Those are not different numbers, those are different mindsets!!!
BIG DATA CHALLENGES FOR MACHINE LEARNING
The variations of both X (feature, representation) and Y (labels) are complex!
✓ Complexity of X
✓ Complexity of Y
Most of this course
TYPICAL MACHINE LEARNING SYSTEM
[Diagram stages:]
Low-level sensing
Pre-processing
Feature Extract
Feature Select
Inference, Prediction, Recognition
Label Collection
Evaluation
[Diagram annotations: Data Complexity of X; Data Complexity of Y]
UNSUPERVISED LEARNING : [ COMPLEXITY in Y ]
• No labels are provided (e.g. no Y provided)
• Find patterns from unlabeled data, e.g. clustering
e.g. clustering => to find “natural” grouping of instances given un-labeled data
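One way to make "finding a natural grouping" concrete is k-means clustering. The slide only says "clustering", so the choice of k-means, the six 2D points, and the initial centers below are all assumptions of this sketch.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(p[d] for p in cluster) / n for d in range(len(cluster[0])))


def kmeans(points, centers, n_iters=10):
    """Plain k-means: alternate nearest-center assignment and center update."""
    for _ in range(n_iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [centroid(c) if c else centers[k] for k, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical unlabeled data: no Y is given, only X.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers)
```

No labels are used anywhere: the two groups emerge purely from distances between the instances.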
STRUCTURAL OUTPUT LEARNING : [ COMPLEXITY OF Y ]
• Many prediction tasks involve output labels having structured correlations or constraints among instances
Structured Dependency between Examples’ Y: Tree, Sequence, Grid
Input: The dog chased the cat
Input: APAFSVSPASGACGPECA…
Output: CCEEEEECCCCCHHHCCC…
Many more possible structures between y_i, e.g. spatial, temporal, relational …
STRUCTURAL INPUT : Kernel Methods [ COMPLEXITY OF X ]
e.g. Graphs, Sequences, 3D structures
[Figure: Original Space → Feature Space]
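The idea of moving from an original space to a feature space can be sketched with a degree-2 polynomial kernel on 2D vectors. This particular kernel and the two sample vectors are assumptions for illustration; the slide's own examples are structured inputs such as graphs and sequences. The kernel value computed in the original space equals a dot product in an explicit 3D feature space.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel (x . z)^2, computed in the original 2D space."""
    return dot(x, z) ** 2


def feature_map(x):
    """Explicit 3D feature map whose dot product reproduces poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


# Hypothetical input vectors.
x = (1.0, 2.0)
z = (3.0, 4.0)

k_original = poly2_kernel(x, z)
k_feature = dot(feature_map(x), feature_map(z))
print(k_original, k_feature)
```

A kernel method only ever needs the kernel values, so it never has to build the feature vectors explicitly.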
MORE RECENT: FEATURE LEARNING [ COMPLEXITY OF X ]
Deep Learning Supervised Embedding
Layer-wise Pretraining
DEEP LEARNING / FEATURE LEARNING : [ COMPLEXITY OF X ]
Feature Engineering
✓ Most critical for accuracy
✓ Accounts for most of the computation for testing
✓ Most time-consuming in development cycle
✓ Often hand-crafted and task dependent in practice
Feature Learning
✓ Easily adaptable to new similar tasks
✓ Layerwise representation
✓ Layer-by-layer unsupervised training
✓ Layer-by-layer supervised training
Why learn features?
When to use Machine Learning (adapt to / learn from data)?
• 1. Extract knowledge from data
– Relationships and correlations can be hidden within large amounts of data
– The amount of knowledge available about certain tasks is simply too large for explicit encoding (e.g. rules) by humans
• 2. Learn tasks that are difficult to formalise
– Hard to be defined well, except by examples, e.g., face recognition
• 3. Create software that improves over time
– New knowledge is constantly being discovered.
– Rule or human encoding-based system is difficult to continuously re-design “by hand”.
Today
• Course Logistics
• Machine Learning Basics
• Machine Learning History
• Rough Plan of Course Content
MACHINE LEARNING IN COMPUTER SCIENCE
• Machine learning is already the preferred approach for
– Speech recognition, natural language processing
– Computer vision
– Medical outcome analysis
– Robot control …
• Why growing?
– Improved machine learning algorithms
– Improved CPU / GPU power
– Increased data capture, new sensors, networking
– Systems / software too complex to control manually
– Demand for self-customization to user, environment, …
HISTORY OF MACHINE LEARNING
• 1950s
– Samuel’s checker player
– Selfridge’s Pandemonium
• 1960s:
– Neural networks: Perceptron
– Pattern recognition
– Learning in the limit theory
– Minsky and Papert prove limitations of Perceptron
• 1970s:
– Symbolic concept induction
– Winston’s arch learner
– Expert systems and the knowledge acquisition bottleneck
– Quinlan’s DT ID3
– Michalski’s AQ and soybean diagnosis
– Scientific discovery with BACON
– Mathematical discovery with AM
Adapted from Prof. Raymond J. Mooney’s slides
HISTORY OF MACHINE LEARNING (CONT.)
• 1980s:
– Advanced decision tree and rule learning
– Explanation-based Learning (EBL)
– Learning and planning and problem solving
– Utility problem
– Analogy
– Cognitive architectures
– Resurgence of neural networks (connectionism, backpropagation)
– Valiant’s PAC Learning Theory
– Focus on experimental methodology
• 1990s
– Data mining
– Adaptive software agents and web applications
– Text learning
– Reinforcement learning (RL)
– Inductive Logic Programming (ILP)
– Ensembles: Bagging, Boosting, and Stacking
– Bayes Net learning
HISTORY OF MACHINE LEARNING (CONT.)
• 2000s
– Support vector machines
– Kernel methods
– Graphical models
– Statistical relational learning
– Transfer learning
– Sequence labeling
– Collective classification and structured outputs
– Computer Systems Applications
• Compilers
• Debugging
• Graphics
• Security (intrusion, virus, and worm detection)
– Email management
– Personalized assistants that learn
– Learning in robotics and vision
HISTORY OF MACHINE LEARNING (CONT.)
• 2010s
– Speech translation, voice recognition (e.g. Siri)
– Google search engine uses numerous machine learning techniques (e.g. grouping news, spelling corrector, improving search ranking, image retrieval, …)
– 23andMe (scan sample of person genome, predict likelihood of genetic disease, …)
– DeepMind, Google Brain, …
– IBM Watson QA system
– Machine Learning as a service (e.g. Google Prediction API, bigml.com, Cloud AutoML)
– IBM healthcare analytics
– ……
HISTORY OF MACHINE LEARNING (CONT.)
RELATED DISCIPLINES
• Artificial Intelligence
• Data Mining
• Probability and Statistics
• Information theory
• Numerical optimization
• Computational complexity theory
• Control theory (adaptive)
• Psychology (developmental, cognitive)
• Neurobiology
• Linguistics
• Philosophy
What are the goals of AI research?
• Artifacts that ACT like HUMANS
• Artifacts that THINK like HUMANS
• Artifacts that THINK RATIONALLY
• Artifacts that ACT RATIONALLY
From: M. A. Papalaskar
How can we build more intelligent computer / machine?
• Able to
– perceive the world
– understand the world
– react to the world
• This needs
– Basic speech capabilities
– Basic vision capabilities
– Language / semantic understanding
– User behavior / emotion understanding
– Able to act
– Able to think ??
How can we build more intelligent computer / machine?: Milestones in Recent Vision/AI Fields
ImageNet Competition:
[ Training on 1.2 million images [X] vs. 1000 different word labels [Y] ]
Detour: three planned programming assignments about AI tasks
• HW: Semantic language understanding (sentiment classification on movie review text)
• HW: Visual object recognition (labeling images about handwritten digits)
• HW: Audio speech recognition (unsupervised learning based speech recognition task)
Today
• Course Logistics
• Machine Learning Basics
• Machine Learning History
• Rough Plan of Course Content
Course Content Plan → Five major sections of this course
• Regression (supervised)
• Classification (supervised)
• Unsupervised models
• Learning theory
• Graphical models
Summary
• This is not a course about how to use a toolbox
• We focus on learning fundamental principles, mathematical formulation, algorithm design and learning theory.
Some negative comments from last Spring
• Class was boring, …
• The instructor stated that the course was going to be math-heavy which 90% of students did not want, and even the remaining 10% were probably blown away at how intensive it really was …
A FEW SAMPLE SLIDES
L4
L12: Deriving the Maximum Likelihood Estimate for Bernoulli
$L(p) = p^{x} (1 - p)^{n - x}$
maximize
[Plot: likelihood vs. potential HIV prevalences p, from 0.0 to 1.0]
$\log L(p) = \log\left[ p^{x} (1 - p)^{n - x} \right]$
maximize
[Plot: log(likelihood) vs. potential HIV prevalences p, from 0.0 to 1.0]
$-l(p) = -\log\left[ p^{x} (1 - p)^{n - x} \right]$
Minimize the negative log-likelihood
[Plot: -log(likelihood) vs. potential HIV prevalences p, from 0.0 to 1.0]
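A short numerical check of the Bernoulli maximum likelihood estimate. Setting the derivative of log L(p) = x log p + (n - x) log(1 - p) to zero gives the closed form p = x / n. The sketch below compares that closed form with a grid search over the negative log-likelihood. The counts x and n and the grid resolution are hypothetical choices, not values from the slides.

```python
import math


def neg_log_likelihood(p, x, n):
    """-log L(p) for L(p) = p^x (1 - p)^(n - x), with 0 < p < 1."""
    return -(x * math.log(p) + (n - x) * math.log(1.0 - p))


# Hypothetical data: x positives out of n trials.
x, n = 3, 10

# Candidate values of p strictly inside (0, 1).
grid = [i / 1000 for i in range(1, 1000)]

p_hat_grid = min(grid, key=lambda p: neg_log_likelihood(p, x, n))
p_hat_closed = x / n
print(p_hat_grid, p_hat_closed)
```

Maximizing the likelihood, maximizing the log-likelihood and minimizing the negative log-likelihood all pick the same p, because the logarithm is monotonic.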
Next lesson: Machine Learning in a Nutshell
• Task: y
• Representation: x
• Score Function: L()
• Search/Optimization: argmin()
• Models, Parameters: f, w, b
ML grew out of work in AI
Optimize a performance criterion using example data or past experience,
aiming to generalize to unseen data
Next lesson: Review of linear algebra and basic calculus
References
• Prof. Andrew Moore’s tutorials
• Prof. Raymond J. Mooney’s slides
• Prof. Alexander Gray’s slides
• Prof. Eric Xing’s slides
• http://scikit-learn.org/
• Hastie, Trevor, et al. The Elements of Statistical Learning. Vol. 2. No. 1. New York: Springer, 2009.
• Prof. M. A. Papalaskar’s slides