Yann LeCun
Large-Scale Machine Learning
John Langford (Microsoft Research), Yann LeCun (Courant Institute)
What is Data Science?
Data Science: automatically extracting knowledge from data
Mathematics & Statistics
Machine Learning
Domain Expertise
Applications in Business: lots and lots
Applications in the Sciences: Astronomy, Cosmology; High-energy Physics; Biology, Genomics; Neuroscience; The Social Sciences
Medicine
Government
[Figure: Venn diagram after Drew Conway's Data Science Venn Diagram. Circles: Mathematics & Statistics, Computation, Domain Expertise. Labeled regions: Machine Learning, conventional research, Danger Zone!, and Data Science at the center.]
Large-Scale Machine Learning
Class website: http://cilvr.cs.nyu.edu/doku.php?id=courses:bigdata:start (http://cilvr.cs.nyu.edu → courses → big data)
Forum, discussion, Q&A on Piazza: https://piazza.com/class#spring2013/csciga3033002
Evaluation: programming assignments, project, final exam
Computing infrastructure: 100-node cluster, 8 CPUs/node, Hadoop (donated by Yahoo! Labs)
Software:
Torch: http://www.torch.ch/
Vowpal Wabbit: https://github.com/JohnLangford/vowpal_wabbit/wiki
Big Data?
Data often comes in the form of a table:
N: dimension of each vector (possibly very sparse)
T: number of training samples (possibly infinite)
Big Data is large T, or large N, or both:
Large T, small N: great!
Infinite T, small N: online / streaming
Small T, large N: hell!
Problems:
(Distributed) data storage and access
Can't use algorithms that are super-linear in T
Large N: overfitting
Parallelizing
Dealing with unbalanced sets
Representing high-dimensional data
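To make the "infinite T, small N" and "can't use algorithms super-linear in T" points concrete, here is a minimal sketch of single-pass online learning over a stream of sparse samples. It assumes a linear model with squared loss trained by plain stochastic gradient descent, and it maps feature names into a fixed-size weight table with the hashing trick so memory does not grow with the number of distinct features. The function names (`hashed_index`, `online_sgd`, `predict`), the `crc32` hash, and the default `num_bits` and `lr` values are illustrative assumptions, not the course's code or Vowpal Wabbit's implementation.

```python
import zlib
from typing import Dict, Iterable, List, Tuple

# Hypothetical sample format: a sparse dict {feature_name: value} plus a real-valued label.
Sample = Tuple[Dict[str, float], float]


def hashed_index(feature: str, num_bits: int) -> int:
    # Hashing trick (assumed choice): map a feature name into 2**num_bits slots.
    # crc32 is used because Python's built-in hash() of str is randomized per process.
    return zlib.crc32(feature.encode("utf-8")) & ((1 << num_bits) - 1)


def online_sgd(stream: Iterable[Sample], num_bits: int = 18, lr: float = 0.1) -> List[float]:
    # Memory is O(2**num_bits) no matter how large T is; each sample costs
    # O(number of nonzero features), so one pass over the stream is linear in T.
    w = [0.0] * (1 << num_bits)
    for features, y in stream:
        idx = [(hashed_index(f, num_bits), v) for f, v in features.items()]
        pred = sum(w[i] * v for i, v in idx)
        err = pred - y  # derivative of 0.5 * (pred - y) ** 2 with respect to pred
        for i, v in idx:
            w[i] -= lr * err * v  # SGD step on each active weight
    return w


def predict(w: List[float], features: Dict[str, float], num_bits: int = 18) -> float:
    # num_bits must match the value used during training.
    return sum(w[hashed_index(f, num_bits)] * v for f, v in features.items())


if __name__ == "__main__":
    # Usage: a generator stands in for an unbounded data stream; samples are never stored.
    stream = (({"x": x}, 3.0 * x) for _ in range(1000) for x in [1.0, 2.0, -1.0, 0.5])
    w = online_sgd(stream, lr=0.05)
    print(round(predict(w, {"x": 1.0}), 4))
```

Because the stream is consumed lazily, the same loop works whether T is a few thousand or unbounded; the trade-off of hashing is that distinct features can occasionally collide in the same slot.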
Syllabus
Intro
Online Linear learning
2nd order optimization methods
LBFGS
Online Non-linear learning
Boosted Decision Trees
Hadoop, Allreduce
Parallel learning, OpenMP, CUDA
Inverted Indices & Predictive Indexing
Hashing, LSH, linear/non-linear dimensionality reduction
Feature Learning, deep learning
Many Classes
Active Learning
Exploration and Learning