Topics in Algorithms and Data Science
Introduction
Omid Etesami
Early Computer Science (according to John Hopcroft)
• CS in the 1960s: emphasis on programming languages, compilers, and operating systems
• CS theory in the 1960s: finite automata, regular expressions, context-free languages, computability
• CS in the 1970s: making computers more useful for well-defined tasks
• CS theory in the 1970s: algorithms became an important addition
Modern CS
• More focus on applications
• Merging of computing and communication
• Much more data being collected in the natural sciences, commerce, …
• Web, social networks
• All of this requires understanding data
Modern CS theory
Not only discrete mathematics but also probability, statistics, and numerical methods
Textbook for the course
• Foundations of Data Science (draft of a new book as of May 2015) by Avrim Blum, John Hopcroft, Ravi Kannan
• We will cover the first four chapters.
• Available online: http://www.cs.cornell.edu/jeh/bookMay2015.pdf
Outline of the course
• Random graphs
• High-dimensional geometry
• Singular value decomposition
Random Graphs
• Models for web and social networks
• Simplest model: the Erdős-Rényi random graph model
• Understanding global phenomena, such as the giant connected component, in terms of local choices
• Other models of random graphs: non-uniform models, growth models with or without preferential attachment, small-world graphs
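Below is a minimal Python sketch of the Erdős-Rényi model G(n, p). It assumes the standard definition, in which each of the n(n-1)/2 possible edges appears independently with probability p. The function name largest_component_size and the parametrization p = c/n are illustrative and do not come from the course. Every edge is a purely local random choice, yet the size of the largest component changes sharply as c passes 1:

```python
import random


def largest_component_size(n: int, p: float, seed: int = 0) -> int:
    """Sample G(n, p) and return the size of its largest connected component,
    merging endpoints of sampled edges with a union-find structure."""
    rng = random.Random(seed)
    parent = list(range(n))

    def find(v: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Each possible edge {u, v} is present independently with probability p.
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv

    sizes = {}
    for v in range(n):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values())


if __name__ == "__main__":
    n = 1000
    for c in (0.5, 1.0, 1.5, 2.0):
        size = largest_component_size(n, c / n)
        print(f"c = {c}: largest component has {size} of {n} vertices")
```

With n = 1000, the largest component should stay small for c < 1. For c > 1 it should contain a constant fraction of the vertices. Exact sizes depend on the seed.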
Random graphs (continued)
• Random constraint satisfaction problems (like 3-SAT)
• Non-uniform random graphs and their relation to modern coding theory (like fountain codes)
Figure: 3-SAT solution space (height represents the number of unsatisfied constraints)
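The sketch below computes the quantity the figure plots. It assumes the usual uniform random 3-SAT model, in which each clause picks 3 distinct variables and negates each one independently with probability 1/2. The names random_3sat, unsatisfied, and min_unsatisfied are hypothetical helpers. unsatisfied() gives the "height" of one assignment in the landscape. min_unsatisfied() brute-forces the lowest point, so it is only practical for a small number of variables:

```python
import random
from itertools import product
from typing import List, Tuple

# A literal k > 0 stands for x_k; k < 0 stands for NOT x_k.
Clause = Tuple[int, int, int]


def random_3sat(n_vars: int, n_clauses: int, seed: int = 0) -> List[Clause]:
    """Uniform random 3-SAT: 3 distinct variables per clause, random signs."""
    rng = random.Random(seed)
    clauses = []
    for _ in range(n_clauses):
        chosen = rng.sample(range(1, n_vars + 1), 3)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in chosen))
    return clauses


def unsatisfied(clauses: List[Clause], assignment: List[bool]) -> int:
    """Number of clauses with no true literal; assignment[k - 1] is x_k."""
    def lit_true(lit: int) -> bool:
        value = assignment[abs(lit) - 1]
        return value if lit > 0 else not value

    return sum(1 for c in clauses if not any(lit_true(lit) for lit in c))


def min_unsatisfied(clauses: List[Clause], n_vars: int) -> int:
    """Lowest height over all 2**n_vars assignments (exhaustive search)."""
    return min(unsatisfied(clauses, list(a))
               for a in product([False, True], repeat=n_vars))
```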
High-dimensional geometry
• Represent data as vectors with many components (e.g., in search or machine learning)
• Intuition built in two or three dimensions does not carry over to high dimensions!
Figures: a sphere in 3 dimensions; a stereographic projection of a sphere in 4 dimensions
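Here is one concrete case where low-dimensional intuition fails. Two independent random directions in R^d are nearly orthogonal with high probability once d is large. For Gaussian vectors, the cosine of the angle between them is typically of order 1/√d. The small Python sketch below assumes vectors with i.i.d. standard Gaussian coordinates. cosine_of_random_pair is a hypothetical helper name:

```python
import math
import random


def cosine_of_random_pair(d: int, rng: random.Random) -> float:
    """Cosine of the angle between two independent random vectors in R^d
    whose coordinates are i.i.d. standard Gaussians."""
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    y = [rng.gauss(0.0, 1.0) for _ in range(d)]
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


if __name__ == "__main__":
    rng = random.Random(0)
    for d in (2, 3, 100, 10_000):
        cosines = [abs(cosine_of_random_pair(d, rng)) for _ in range(20)]
        print(f"d = {d:>6}: mean |cos| over 20 pairs = {sum(cosines) / len(cosines):.3f}")
```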
Singular value decomposition (SVD)
• To deal with high-dimensional data, we need matrix algebra and matrix algorithms
• Singular value decomposition is an important tool
• Applications of SVD:
  - Principal component analysis
  - Clustering mixtures of Gaussian probability densities
  - Discrete optimization problems such as Max-Cut
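To make the SVD concrete, here is a minimal pure-Python sketch. It approximates the top singular value σ₁ of a nonzero matrix A, together with its singular vectors, by power iteration on AᵀA. Power iteration is one simple method, assumed here for illustration rather than taken from the course. The returned vectors converge to the top singular vectors when σ₁ > σ₂ and the random start is not orthogonal to v₁. Real code would call a library routine instead. For PCA, A would be the data matrix with one row per data point and centered columns, and v is then the first principal direction:

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def top_singular_triplet(a: Matrix, iters: int = 500,
                         seed: int = 0) -> Tuple[float, List[float], List[float]]:
    """Approximate (sigma_1, u_1, v_1) for a nonzero matrix A by power
    iteration on A^T A: repeatedly set v <- A^T (A v), then renormalize."""
    rows, cols = len(a), len(a[0])

    def times_a(v: List[float]) -> List[float]:  # A v
        return [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]

    def times_a_t(u: List[float]) -> List[float]:  # A^T u
        return [sum(a[i][j] * u[i] for i in range(rows)) for j in range(cols)]

    def normalize(x: List[float]) -> List[float]:
        norm = math.sqrt(sum(t * t for t in x))
        return [t / norm for t in x]

    # A random start is almost surely not orthogonal to the top right singular vector.
    rng = random.Random(seed)
    v = normalize([rng.gauss(0.0, 1.0) for _ in range(cols)])
    for _ in range(iters):
        v = normalize(times_a_t(times_a(v)))

    av = times_a(v)
    sigma = math.sqrt(sum(t * t for t in av))  # ||A v_1|| = sigma_1
    u = [t / sigma for t in av]                # u_1 = A v_1 / sigma_1
    return sigma, u, v
```

For example, top_singular_triplet([[3.0, 0.0], [0.0, 1.0]]) returns σ₁ ≈ 3 with v ≈ ±(1, 0).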
Grading
• Around 7 points for homework and quizzes
• Around 5 points for midterm
• Around 8 points for final
• Additional points for presentation and project
Homework
• Late homework is NOT accepted. Prepare early.
• You may work on homework together, but you should acknowledge your collaborators, and your write-up should be your own. (If you do not acknowledge them, you may receive negative points.)
• If you use the internet, you should acknowledge your sources.
Prerequisites
• Probability, including problem-solving skills and basic inequalities
• Linear algebra, including eigenvalues and eigenvectors
• Asymptotic analysis of algorithms
• Basic discrete math and basic calculus
• Most importantly, mathematical maturity, such as the ability to prove things rigorously
A few teasers (reflecting the background you need for the course)
Sex bias in graduate admissions
• 8442 men applied (44% admitted)
• 4321 women applied (35% admitted)
• In each department, (% of women applicants admitted) ≥ (% of men applicants admitted)
Can this happen?
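The teaser is an instance of Simpson's paradox. The small Python check below uses purely hypothetical two-department numbers, not the real admissions data behind the slide. In each department, women are admitted at a higher rate (85% vs 80% and 25% vs 20%). Overall, however, 84 of 120 men (70%) and 42 of 120 women (35%) are admitted, because most women apply to the department with the low admission rate. DATA, rate, and overall_rate are illustrative names:

```python
from fractions import Fraction
from typing import Dict, Tuple

# Hypothetical data, NOT the real admissions figures:
# department -> {group: (applied, admitted)}
DATA: Dict[str, Dict[str, Tuple[int, int]]] = {
    "A": {"men": (100, 80), "women": (20, 17)},
    "B": {"men": (20, 4), "women": (100, 25)},
}


def rate(applied: int, admitted: int) -> Fraction:
    """Exact admission rate admitted/applied."""
    return Fraction(admitted, applied)


def overall_rate(data: Dict[str, Dict[str, Tuple[int, int]]], group: str) -> Fraction:
    """Admission rate for one group after pooling all departments."""
    applied = sum(dept[group][0] for dept in data.values())
    admitted = sum(dept[group][1] for dept in data.values())
    return rate(applied, admitted)


if __name__ == "__main__":
    for name, dept in DATA.items():
        men, women = rate(*dept["men"]), rate(*dept["women"])
        print(f"dept {name}: men {float(men):.0%}, women {float(women):.0%}")
    print(f"overall: men {float(overall_rate(DATA, 'men')):.0%}, "
          f"women {float(overall_rate(DATA, 'women')):.0%}")
```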
Generating a random permutation
for i = 1 to n
    j = random between 1 and n
    swap(x[i], x[j])
How can you prove that the above algorithm does not generate a uniformly random permutation for any n ≥ 3?
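The sketch below does not prove the claim, but it makes the bias visible for small n. It runs the loop above for every one of the n^n equally likely sequences of choices j, using 0-based indices instead of 1..n, and counts how often each permutation appears. naive_shuffle_counts is a hypothetical helper name:

```python
from collections import Counter
from itertools import product


def naive_shuffle_counts(n: int) -> Counter:
    """Run the swap loop for every sequence of random choices j (each of the
    n**n sequences is equally likely) and tally the resulting permutations."""
    counts: Counter = Counter()
    for choices in product(range(n), repeat=n):
        x = list(range(n))
        for i, j in enumerate(choices):
            x[i], x[j] = x[j], x[i]  # swap(x[i], x[j])
        counts[tuple(x)] += 1
    return counts


if __name__ == "__main__":
    for perm, count in sorted(naive_shuffle_counts(3).items()):
        print(perm, f"{count}/27")
```

For n = 3, the six permutations do not all receive the same count out of 27.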
Matrix rank
Why is the maximum number of linearly independent rows exactly equal to the maximum number of linearly independent columns?
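Here is a numerical sanity check of the statement, not a proof. It computes the rank of a matrix and of its transpose with exact Gaussian elimination over the rationals. The helpers rank and transpose are illustrative names. Fraction is used only to avoid floating-point round-off:

```python
from fractions import Fraction
from typing import List, Sequence


def rank(rows: Sequence[Sequence[int]]) -> int:
    """Row rank via exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    r = 0  # number of pivot rows found so far
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate column c from every row below the pivot row.
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def transpose(rows: Sequence[Sequence[int]]) -> List[List[int]]:
    """Columns of the matrix become rows."""
    return [list(col) for col in zip(*rows)]
```

For example, rank(A) and rank(transpose(A)) agree for A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]], where the second row is twice the first.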
Volume of the sphere
Can you work out the volume of the sphere in 3 dimensions?