Info 2950

Tue/Thu 10:10-11:25

Intro to Data Science

Kennedy Hall 116 (Call Auditorium)

https://courses.cit.cornell.edu/info2950_2017sp/

Instructor: Paul Ginsparg (242 Gates Hall)

Only permitted to use middle section of auditorium

Info 2950 Intro to Data Science - Cornell University4. Power Law data (need exponential and logarithms …) 5. Linear and Logistic regression, Pearson and Spearman correlators 6. Markov




0. Review of basic Python / Jupyter notebook
1. Counting and probability (factorial, binomial coefficients, conditional probability, Bayes' Theorem); Real Data: text classifier, etc. [baby machine learning]
2. Statistics: mean, variance; binomial, Gaussian, Poisson distributions
3. Graph theory (nodes, edges), networks (cf. Info 2040), graph algorithms
4. Power Law data (need exponential and logarithms …)
5. Linear and Logistic regression, Pearson and Spearman correlators
6. Markov and other correlated data

Rosen chapters 2, 6, 7, 10, 11
Easley and Kleinberg chapters 3, 18
+ many other online resources [mentioned, e.g., Berkeley "Foundations of Data Science": https://data-8.appspot.com/sp16/course]


Problem sets will involve both programming and non-programming problems.

Problem sets are not group projects. You are expected to abide by the Cornell University Code of Academic Integrity. It is your responsibility to understand and follow these policies. (In particular, the work you submit for course assignments must be your own. You may discuss homework assignments with other students at a high level, for example by discussing general methods or strategies to solve a problem, but you must cite the other student in your submission. Any work you submit must reflect your own understanding of the solution, the details of which you personally and individually worked out, and must be written in your own words.)

You'll be penalized if you copy an IPython notebook, OR if yours is copied.


to be posted later today:

will include instructions for installing Anaconda; we'll standardize on Python 3 due to minor Python 2.7/3.5 compatibility issues (though you're welcome to use Python 2)

known problem with Python installations: CS 1110 unfortunately recommends misconfigured software that violates standard practice by adding environment variables to the ~/.bashrc file. (link to instructions for removing)
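As a rough way to check whether an installer has added environment variables to ~/.bashrc, one can list the lines of that file that begin with `export`. This is only a sketch: the function name `find_exports` is hypothetical, and it catches plain `export NAME=...` lines, not every way a shell startup file can set variables.

```python
from pathlib import Path


def find_exports(path):
    """Return (line_number, text) pairs for lines in a shell startup file
    that begin with 'export', i.e. lines that set environment variables.
    Commented-out lines (starting with '#') are ignored."""
    results = []
    p = Path(path).expanduser()  # turns "~/.bashrc" into an absolute path
    if not p.exists():
        return results  # no such file, nothing to report
    for number, line in enumerate(p.read_text().splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("export "):
            results.append((number, stripped))
    return results


# Example: for number, text in find_exports("~/.bashrc"): print(number, text)
```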

“Problem Set 0”, due in one week


https://www.continuum.io/downloads


Definition. A set is a collection of objects.

The objects of a set are called elements of the set.

$x \in S$, or $x \notin S$

Examples:

X = {1, 2, 3, 4, 5}

C = {Ithaca,Boston,Chicago}

Stuff = {1, snow, Cornell, y}

empty set = $\emptyset$

Can also be defined by rule or equation:

Example: E is the set of even numbers. E = {x | x is even}

Cardinality |S| is the number of elements of S

Examples: $|X| = 5$, $|C| = 3$, $|\text{Stuff}| = 4$, $|\emptyset| = 0$
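Python's built-in `set` type mirrors these definitions closely, which is convenient for the notebooks. A minimal sketch, reusing the slide's set names; `E_small` is a hypothetical finite stand-in for E, since Python cannot store the infinite set of all even numbers.

```python
# Sets from the slide, written as Python set literals
X = {1, 2, 3, 4, 5}
C = {"Ithaca", "Boston", "Chicago"}
Stuff = {1, "snow", "Cornell", "y"}
empty = set()  # note: {} would create an empty dict, not an empty set

# Membership: x in S / x not in S
print(3 in X)           # True
print("snow" not in X)  # True

# A set defined by a rule (set-builder notation), restricted to 0..10
E_small = {x for x in range(0, 11) if x % 2 == 0}

# Cardinality |S| is len(S)
print(len(X), len(C), len(Stuff), len(empty))  # 5 3 4 0
```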


A subset T of a set S is a set of elements all of which are contained in S.

$T \subset S$ (proper subset) or $T \subseteq S$

empty set: $\emptyset \subseteq S$ for all S

Examples:

C' = {Ithaca, Chicago}

$C' \subset C$

X' = {x | x is a whole number between 2 and 5} is a subset of X
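Python's comparison operators on sets correspond to the subset relations: `<=` (or `issubset`) is $\subseteq$ and `<` is proper subset $\subset$. A short sketch with the slide's examples; reading "between 2 and 5" as including both endpoints is an assumption.

```python
C = {"Ithaca", "Boston", "Chicago"}
C_prime = {"Ithaca", "Chicago"}
X = {1, 2, 3, 4, 5}
# Assumption: "between 2 and 5" includes both 2 and 5
X_prime = {x for x in range(2, 6)}

print(C_prime <= C)         # True: C' is a subset of C
print(C_prime < C)          # True: proper subset, since C' != C
print(C < C)                # False: no set is a proper subset of itself
print(C <= C)               # True: every set is a subset of itself
print(set() <= C)           # True: the empty set is a subset of every set
print(X_prime.issubset(X))  # True
```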

The power set P(S) of a set S is the set of all subsets of S.

Example: For the set A = {1, 2, 3}, $P(A) = \{\emptyset, \{1\}, \{2\}, \{3\}, \{1, 2\}, \{2, 3\}, \{1, 3\}, \{1, 2, 3\}\}$

For a set S with n elements, what is |P(S)|?
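One way to build a power set in Python is to collect the subsets of every size k with `itertools.combinations`; counting the results for small n lets you check your answer to the question above. A sketch, with `power_set` a hypothetical helper name; the inner sets are `frozenset`s because elements of a Python set must be hashable and ordinary sets are not.

```python
from itertools import chain, combinations


def power_set(s):
    """Return the power set of s as a set of frozensets."""
    items = list(s)
    # all subsets of size 0, 1, ..., len(items)
    all_subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return {frozenset(subset) for subset in all_subsets}


A = {1, 2, 3}
P_A = power_set(A)
print(sorted(sorted(t) for t in P_A))
# [[], [1], [1, 2], [1, 2, 3], [1, 3], [2], [2, 3], [3]]

# Number of subsets of a set with n elements, for n = 0..4
for n in range(5):
    print(n, len(power_set(range(n))))
```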


For two sets to be the same, they must have the same elements.

$A = B$ means that $\forall x$ we have $x \in A$ iff $x \in B$

(Equivalently, $A = B$ means that $A \subseteq B$ and $B \subseteq A$)
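In Python, `==` on sets compares elements only, so order and repetition in the literal do not matter. A small sketch showing both forms of the definition, with arbitrary example sets:

```python
A = {1, 2, 3}
B = {3, 2, 1, 1}  # order and repeated elements are irrelevant

print(A == B)             # True: same elements
print(A <= B and B <= A)  # True: the equivalent "subset both ways" test
print(A == {1, 2})        # False: 3 is in A but not in {1, 2}
```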


difference of two sets $A - B = \{x \mid x \in A \text{ and } x \notin B\}$

Examples:

$X - \text{Stuff} = \{2, 3, 4, 5\}$

$\text{Stuff} - X = \{\text{snow}, \text{Cornell}, \text{y}\}$

$C - \emptyset = \{\text{Ithaca}, \text{Boston}, \text{Chicago}\}$

$X - E = \{1, 3, 5\}$
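Python writes set difference with the `-` operator (or the `difference` method). A sketch reproducing the examples above; `E_small` is again a hypothetical finite stand-in for the even numbers, which is enough here because X only contains 1 through 5.

```python
X = {1, 2, 3, 4, 5}
C = {"Ithaca", "Boston", "Chicago"}
Stuff = {1, "snow", "Cornell", "y"}
E_small = {x for x in range(0, 11) if x % 2 == 0}  # finite stand-in for E

print(sorted(X - Stuff))                      # [2, 3, 4, 5]
print(Stuff - X == {"snow", "Cornell", "y"})  # True
print(C - set() == C)                         # True: removing nothing
print(sorted(X - E_small))                    # [1, 3, 5]
```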

symmetric difference $A \triangle B = \{x \mid x \in A \text{ or } x \in B, \text{ and } x \notin A \cap B\}$

Examples:

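$X \triangle \text{Stuff} = \{2, 3, 4, 5, \text{snow}, \text{Cornell}, \text{y}\}$

$C \triangle \emptyset = \{\text{Ithaca}, \text{Boston}, \text{Chicago}\}$

$X \triangle A = \{4, 5\}$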
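Python's `^` operator (or `symmetric_difference`) computes the symmetric difference. The sketch below also rebuilds it directly from the definition, "in A or in B, but not in both", to show the two agree:

```python
X = {1, 2, 3, 4, 5}
A = {1, 2, 3}
Stuff = {1, "snow", "Cornell", "y"}

print(sorted(X ^ A))                                      # [4, 5]
print(X ^ Stuff == {2, 3, 4, 5, "snow", "Cornell", "y"})  # True

# Same result from the definition: union minus intersection
print((X | Stuff) - (X & Stuff) == X ^ Stuff)  # True
print(X.symmetric_difference(A) == X ^ A)      # True
```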

Cartesian product of two sets $A \times B = \{(x, y) \mid x \in A \text{ and } y \in B\}$

Example:

$A \times A = \{(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)\}$
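The Cartesian product is available as `itertools.product`, which yields ordered pairs (tuples). A sketch for the $A \times A$ example; the set comprehension on the last lines is an equivalent way to write the same definition.

```python
from itertools import product

A = {1, 2, 3}
AxA = set(product(A, A))  # all ordered pairs (x, y) with x in A and y in A
print(sorted(AxA))
# [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]

print(len(AxA) == len(A) * len(A))  # True: |A x B| = |A| * |B|

# Equivalent set comprehension, read straight off the definition
print(AxA == {(x, y) for x in A for y in A})  # True
```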


Set Operations

union of two sets $A \cup B = \{x \mid x \in A \text{ or } x \in B\}$

Examples:

$X \cup \text{Stuff} = \{1, 2, 3, 4, 5, \text{snow}, \text{Cornell}, \text{y}\}$

$C \cup \emptyset = \{\text{Ithaca}, \text{Boston}, \text{Chicago}\}$

$A \cup X = \{1, 2, 3, 4, 5\}$ (In this case, $A \cup X = X$.)
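Python writes union as `|` (or the `union` method). A minimal sketch reusing the slide's example sets, with A = {1, 2, 3} as defined on the power set slide:

```python
X = {1, 2, 3, 4, 5}
A = {1, 2, 3}
C = {"Ithaca", "Boston", "Chicago"}
Stuff = {1, "snow", "Cornell", "y"}

print(X | Stuff == {1, 2, 3, 4, 5, "snow", "Cornell", "y"})  # True
print(C | set() == C)        # True: union with the empty set changes nothing
print(A | X == X)            # True, because A is a subset of X
print(A.union(X) == A | X)   # True: method form of the same operation
```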

intersection of two sets $A \cap B = \{x \mid x \in A \text{ and } x \in B\}$

Examples:

$X \cap \text{Stuff} = \{1\}$

$C \cap \emptyset = \emptyset$

$X \cap E = \{2, 4\}$

$A \cap X = \{1, 2, 3\}$ (In this case, $A \cap X = A$.)
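Intersection is the `&` operator (or `intersection`) in Python. A sketch of the slide's examples; as before, `E_small` is a hypothetical finite stand-in for the set of even numbers E.

```python
X = {1, 2, 3, 4, 5}
A = {1, 2, 3}
C = {"Ithaca", "Boston", "Chicago"}
Stuff = {1, "snow", "Cornell", "y"}
E_small = {x for x in range(0, 11) if x % 2 == 0}  # finite stand-in for E

print(X & Stuff)            # {1}
print(C & set())            # set()
print(sorted(X & E_small))  # [2, 4]
print(A & X == A)           # True, because A is a subset of X
```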
