Topics in Algorithms and Data Science Introduction Omid Etesami

Source: ce.sharif.edu/.../root/slides/Introduction.pdf


Topics in Algorithms and Data Science

Introduction

Omid Etesami


Early Computer Science (according to John Hopcroft)

• CS in the 1960s: emphasis on programming languages, compilers, operating systems

• CS theory in the 1960s: finite automata, regular expressions, context-free languages, computability

• CS in the 1970s: making computers more useful for well-defined tasks

• CS theory in the 1970s: important addition of algorithms


Modern CS

• More focus on applications

• Merging of computing and communication

• More collected data in natural sciences, commerce, …

• Web, social networks

• Requires understanding data


Modern CS theory

Not only discrete mathematics

but also

probability, statistics, numerical methods


Textbook for the course

• Foundations of Data Science (draft of a new book as of May 2015) by Avrim Blum, John Hopcroft, Ravi Kannan

• We will cover the first four chapters.

• Available online: http://www.cs.cornell.edu/jeh/bookMay2015.pdf


Outline of the course

• Random graphs

• High-dimensional geometry

• Singular value decomposition


Random Graphs

• Models for web and social networks

• Simplest model: Erdős–Rényi random graph model

• Understanding global phenomena, such as the giant connected component, in terms of local choices

• Other models of random graphs: non-uniform models, growth models with or without preferential attachment, small-world graphs
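To make the "global phenomenon from local choice" point concrete, here is a minimal sketch that samples an Erdős–Rényi graph G(n, p), in which each possible edge is included independently with probability p, and measures the largest connected component. The function name `largest_component_fraction`, the graph size, and the tested values of c = np are illustrative assumptions, not part of the slides. The known threshold for the giant component is p = 1/n, so values of c below and above 1 should behave very differently.

```python
import random
from collections import Counter

def largest_component_fraction(n, p, seed=0):
    """Sample G(n, p) and return the fraction of vertices in its largest component."""
    rng = random.Random(seed)
    parent = list(range(n))

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Local choice: each possible edge {u, v} appears independently with probability p.
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                parent[find(u)] = find(v)

    sizes = Counter(find(v) for v in range(n))
    return max(sizes.values()) / n

n = 1000  # illustrative size, small enough to enumerate all vertex pairs
for c in (0.5, 1.0, 1.5, 2.0):
    # p = c / n, so c is the expected degree (up to a factor of (n - 1) / n).
    print(c, largest_component_fraction(n, c / n))
```

The quadratic loop over vertex pairs keeps the sketch short; for much larger n one would sample edges more cleverly.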


Random graphs (continued)

• Random constraint satisfaction problems (like 3-SAT)

• Non-uniform random graphs and their relation to modern coding theory (like fountain codes)

[Figure: 3-SAT solution space; height represents the number of unsatisfied constraints]


High-dimensional geometry

• Represent data with vectors of many components (e.g. in search or machine learning)

• Intuition from two or three dimensions does not carry over to high dimensions!

[Figures: sphere in 3 dimensions; stereographic projection of a sphere in 4 dimensions]
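One example of high-dimensional intuition failing is that two independent random directions are almost orthogonal when the dimension is large. The sketch below estimates the average |cos θ| between pairs of random Gaussian vectors; the helper name `mean_abs_cosine`, the sample size, and the chosen dimensions are assumptions made for illustration.

```python
import math
import random

def mean_abs_cosine(d, pairs=200, seed=0):
    """Average |cos(angle)| between independent random Gaussian vectors in d dimensions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = [rng.gauss(0.0, 1.0) for _ in range(d)]
        dot = sum(a * b for a, b in zip(x, y))
        norm_x = math.sqrt(sum(a * a for a in x))
        norm_y = math.sqrt(sum(b * b for b in y))
        total += abs(dot / (norm_x * norm_y))
    return total / pairs

for d in (3, 30, 300, 1000):
    print(d, mean_abs_cosine(d))
```

In 3 dimensions the average is noticeably far from zero, while in 1000 dimensions it should be close to zero, on the order of 1/sqrt(d).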


Singular value decomposition (SVD)

• To deal with high-dimensional data, we need matrix algebra and matrix algorithms

• Singular value decomposition is an important tool

• Applications of SVD:

  • Principal component analysis

  • Clustering statistical mixtures of Gaussian probability densities

  • Discrete optimization, like Max-Cut
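As a rough illustration of what the SVD computes, the sketch below finds the top singular value and singular vectors of a small matrix by power iteration on AᵀA. This is one simple technique chosen for brevity, not necessarily the method used in the course; the function name `top_singular_triplet` and the example matrix are assumptions, and in practice a library routine would be used instead.

```python
import math

def top_singular_triplet(A, iterations=200):
    """Power iteration on A^T A. Returns (sigma, u, v) with A v ~ sigma * u.

    Assumes A is a nonzero matrix given as a list of rows, and that the
    starting vector is not orthogonal to the top right singular vector.
    """
    rows, cols = len(A), len(A[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        # w = A^T (A v), then renormalize.
        Av = [sum(A[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(A[i][j] * Av[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = [sum(A[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in Av))
    u = [x / sigma for x in Av]
    return sigma, u, v

# Hypothetical 2x2 example whose singular values are 4 and 3.
sigma, u, v = top_singular_triplet([[3.0, 0.0], [0.0, 4.0]])
print(sigma, u, v)
```

The vector v is the direction along which the rows of A have the largest total squared projection, which is the idea behind principal component analysis.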


Grading

• Around 7 points for homework and quizzes.

• Around 5 points for midterm

• Around 8 points for final

• Additional points for presentation and project


Homework

• Late homework is NOT accepted. Prepare early.

• You can work on homework together, but you should acknowledge your collaborators, and your write-up should be your own. (If you do not acknowledge them, you can receive negative points.)

• If you use the internet, you should acknowledge your sources.


Prerequisites

• Probability, including problem-solving skills and basic inequalities

• Linear algebra, including eigenvalues and eigenvectors

• Asymptotic analysis of algorithms

• Basic discrete math, basic calculus

• Most importantly, mathematical maturity, such as being able to prove things rigorously.


A few teasers (reflecting the background you need for the course)


Sex bias in graduate admissions

• 8442 men applied (44% admitted)

• 4321 women applied (35% admitted)

• In each department: (% of women applicants admitted) ≥ (% of men applicants admitted)

Can this happen?
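One way to explore the question is to try small made-up numbers. The sketch below uses a hypothetical two-department data set (these figures are invented for illustration and are not the admissions data quoted above) and checks the per-department and overall admission rates. The key feature of the made-up data is that most women apply to the department that admits few applicants.

```python
# Hypothetical data, invented for illustration: (applicants, admitted) per group.
departments = {
    "A": {"men": (800, 500), "women": (100, 70)},   # department with a high admission rate
    "B": {"men": (200, 40), "women": (900, 200)},   # department with a low admission rate
}

def rate(applied, admitted):
    """Admission rate for one group in one department."""
    return admitted / applied

def overall_rate(group):
    """Admission rate for a group, pooled over all departments."""
    applied = sum(d[group][0] for d in departments.values())
    admitted = sum(d[group][1] for d in departments.values())
    return admitted / applied

for name, d in departments.items():
    print(name, "men:", rate(*d["men"]), "women:", rate(*d["women"]))
print("overall men:", overall_rate("men"), "women:", overall_rate("women"))
```

In this invented data set women have the higher rate in each department, yet the pooled rate is higher for men, so the situation described on the slide is at least arithmetically possible.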


Generating a random permutation

for i = 1 to n
    j = random integer between 1 and n
    swap(x[i], x[j])

How can you prove the above algorithm does not generate a uniformly random permutation (for all n >= 3)?
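A hint toward one possible proof: the algorithm makes n independent choices of j, so there are nⁿ equally likely runs, and a uniform output would require n! to divide nⁿ. For n ≥ 3, n − 1 divides n! but shares no factor with n, so this is impossible. The sketch below checks the claim by brute force for small n, using 0-based indices; the function name `outcome_counts` is an assumption for illustration.

```python
from collections import Counter
from itertools import product
from math import factorial

def outcome_counts(n):
    """Run the swap loop for every possible sequence of choices of j and
    count how often each final arrangement occurs."""
    counts = Counter()
    for choices in product(range(n), repeat=n):
        x = list(range(n))
        for i, j in enumerate(choices):
            x[i], x[j] = x[j], x[i]
        counts[tuple(x)] += 1
    return counts

for n in (3, 4):
    counts = outcome_counts(n)
    print(n, "runs:", n ** n, "permutations:", factorial(n),
          "min count:", min(counts.values()), "max count:", max(counts.values()))
```

If the minimum and maximum counts differ, some permutations are more likely than others.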


Matrix rank

Why is the maximum number of linearly independent rows exactly equal to the maximum number of linearly independent columns?
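A numerical check is not a proof, but it can make the statement concrete. The sketch below computes the rank of a matrix and of its transpose by Gaussian elimination over exact fractions; the helper names `rank` and `transpose` and the example matrix are assumptions for illustration.

```python
from fractions import Fraction

def rank(matrix):
    """Number of pivots found by Gaussian elimination with exact arithmetic."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear the entries below the pivot.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

# Hypothetical example: the second row is twice the first.
A = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 0, 1, 0]]
print(rank(A), rank(transpose(A)))
```

The rank of the transpose is the column rank of the original matrix, so the two printed numbers are the row rank and the column rank of A.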


Volume of the sphere

Can you work out the volume of the sphere in 3 dimensions?
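One route to the answer is to slice the ball of radius r into thin disks: the disk at height z has area π(r² − z²), and summing over z from −r to r gives the volume. The sketch below does this sum numerically with the midpoint rule and compares it with the closed form (4/3)πr³; the function name `ball_volume_by_slices` and the number of slices are illustrative assumptions.

```python
import math

def ball_volume_by_slices(radius=1.0, slices=100_000):
    """Approximate the volume of a 3-dimensional ball by summing thin disks."""
    dz = 2.0 * radius / slices
    total = 0.0
    for k in range(slices):
        z = -radius + (k + 0.5) * dz  # midpoint of the k-th slice
        total += math.pi * (radius * radius - z * z) * dz
    return total

approx = ball_volume_by_slices()
exact = 4.0 / 3.0 * math.pi
print(approx, exact)
```

The same slicing idea, applied recursively, is one way to approach volumes in higher dimensions.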