42
Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes [email protected] Introduction

Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes [email protected] Introduction

Embed Size (px)

Citation preview

Page 1: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

1

Info 2950Mathematical Methods for Information Science

Prof. Carla [email protected]

Introduction

Page 2: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

2

Overview of this Lecture

• Course Administration

• What is it about?

• Course Themes, Goals, and Syllabus

Page 3: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

3

Course Administration

Page 4: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

4

Lectures: Tuesdays and Thursdays --- 1:25 – 2:40Location: 315 Upson Hall

Lecturer: Prof. Carla Gomes Office: 5133 Upson HallPhone: 255 9189Email: [email protected]

Course Assistant: Megan McDonald ([email protected])    

Web Site: http://www.infosci.cornell.edu/courses/info2950/2011sp/.

Page 5: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

5

Grades

The final grade for the course will be determined as follows:

Participation 5%Homework: 30% Midterm: 30% (In-class, specific details TBA) Final: 35% (TBA)

 Note: The lowest homework grade will be dropped before the final grade is computed.

Page 6: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

6

Homework

• Homework is very important. It is the best way for you to learn the material. The midterm and the final will include one or two questions from the homework assignments.

• Your lowest homework grade will be dropped before the final grade is computed.

• You can discuss the problems with your classmates, but all work handed in should be original, written by you in your own words.

• Homework should be handed in in class.

• No late homework will be accepted.

Page 7: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

7

Textbook

Discrete Mathematics and Its Applications by Kenneth H. Rosen

Use lecture notes as study guide.

Page 8: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

8

Overview of this Lecture

• Course Administration

• What is INFO 2950 about?

• Course Themes, Goals, and Syllabus

Page 9: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

9

Focus: Discrete Structures

Why is it relevant to information science?

What is Info 2950 about?

Discrete Mathematics and Its Applications

Page 10: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

Information Science

Main Focus : Information in digital form

• studies the creation, representation, organization, access, and analysis of information in digital form

• examines the social, cultural, economic, historical, legal, and political contexts in which information systems are employed

Page 11: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

11

Continuous vs. Discrete Mathematics

Continuous Mathematics

It considers objects that vary continuously;

Example: analog wristwatch (separate hour, minute, and second hands).

From an analog watch perspective, between 1 :25 p.m. and 1 :26 p.m.

there are infinitely many possible different times as the second hand moves

around the watch face.

Real-number system --- core of continuous mathematics;

Continuous mathematics --- models and tools for analyzing real-world

phenomena that change smoothly over time. (Differential equations etc.)

http://ja0hxv.calico.jp/pai/epivalue.html (one trillion digits below the decimal point)

Page 12: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

12

Discrete vs. Continuous Mathematics

Discrete MathematicsIt considers objects that vary in a discrete way.

Example: digital wristwatch.

On a digital watch, there are only finitely many possible different times

between 1 :25 P.m. and 1:27 P.m. A digital watch does not show split

seconds: no time between 1 :25:03 and 1 :25:04. The watch moves from one

time to the next.

Integers --- core of discrete mathematics

Discrete mathematics --- models and tools for analyzing real-world

phenomena that change discretely over time and therefore ideal for studying

computer science – computers are digital! (numbers as finite bit strings; data structures, all discrete! )

Page 13: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

13

Why is Discrete Mathematics relevant to computer science and information science?

(examples)

What is INFO 2950 about?

Page 14: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

Logic:Web Page Searching - Boolean Searches

14George Boole

Page 15: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

Logic:Hardware and software specifications

One-bit Full Adder with

Carry-In and Carry-Out

4-bit full adder

Example 1: Adder

Formal: Input_wire_A value in {0, 1}

Logic – formal language for representing and reason about information:Syntax; Semantics; and Inference rules.

Hardware: Digital Circuits

Page 16: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

Logic:Digital circuits

One-bit full adder

•Input bitsto be added

Carry bit

XOR gates

AND gates OR gate

Sum

Carry bit for next adder

01

1

1 0

0

11

Page 17: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

17

Logic:Software specifications

Example 2: System Specification:

–The router can send packets to the edge system only if it supports the new address space.

– For the router to support the new address space it’s necessary that the latest software release be installed.

–The router can send packets to the edge system if the latest software release is installed.

–The router does not support the new address space.

How to write these specifications in a rigorous / formal way? Use Logic.

Page 18: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

Sudoku

9 8 4 2 6 3

4 3 9 7 5 2

2 3 1 5 8 4 9

1 6 2 7 3 4 9 5

8 6 1 5 3

5 3 4 2 6 7 1

3 1 6 2 8 9

2 4 9 8 3

8 9 3 2

How can we encode this problem and solve it? Use Logic!!!!

Page 19: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

Sudoku

9 8 5 4 2 6 3 1 7

4 1 6 3 9 7 5 2 8

7 2 3 1 5 8 4 6 9

1 6 2 7 3 4 9 8 5

8 9 7 6 1 5 2 4 3

5 3 4 2 8 9 6 7 1

3 7 1 5 6 2 8 9 4

2 4 9 8 7 3 1 5 6

6 5 8 9 4 1 7 3 2

Page 20: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

20

Automated Proofs:

EQP - Robbin’s Algebras are all Boolean

[An Argonne lab program] has come up with a major mathematical proof that would have been called creative if a human had thought of it. New York Times, December, 1996

http://www-unix.mcs.anl.gov/~mccune/papers/robbins/

A mathematical conjecture (Robbins conjecture) unsolved for decades.First non-trivial mathematical theorem proved automatically.

The Robbins problem was to determine whether one particular set of rules is powerful enough to capture all of the laws of Boolean algebra. One way to state the Robbins problem in mathematical terms is:Can the equation not(not(P))=P be derived from the following three equations? [1] P or Q = Q or P,[2] (P or Q) or R = P or (Q or R), [3] not(not(P or Q) or not(P or not(Q))) = P.

Page 21: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

21

Probability

Importance of concepts from probability is rapidly increasing in CS and Information Science:

• Machine Learning / Data Mining: Find statistical regularities in large amounts of data. (e.g. Naïve Bayes alg.)

• Natural language understanding: dealing with the ambiguity of language (words have multiple meanings, sentences have multiple parsings --- key: find the most likely (i.e., most probable) coherent interpretation of a sentence (the “holy grail” of NLU).

• Randomized algorithms: e.g. Google’s PageRank, “just” a random walk on the web! Also primality testing; randomized search algorithms, such as simulated annealing. In computation, having a few random bits really helps!

Page 22: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

22

Probability:Bayesian Reasoning

Bayesian networks provide a means of expressing joint probability over many interrelated hypotheses and therefore reason about them.

Bayesian networks have been successfully applied in diverse fields such as medical diagnosis,image recognition, language understanding, search algorithms, and many others.

Bayes Rule

Example of Query:what is the most likely

diagnosis for the infection given all the symptoms?

“18th-century theory is new force in computing” CNET ’07

Page 23: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

Naïve Bayes SPAM Filters

23

Formula used by Spam filters derived from “Bayes Rule”, basedon independence assumptions:

Key idea: some words are more likely to appear in spam email than in legitimate email. The filter doesn't know the probabilities of different words in advance, and must first be trained so it can build them up. Users manually flag SPAM email.

Page 24: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

24

Probability and Chance, cont.

Back to checking proofs...

Imagine a mathematical proof that is several thousands pages long. (e.g., the classification of so-called finite simple groups, also called the enormous theorem, 5000+ pages).

How would you check it to make sure it’s correct? Hmm…

Page 25: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

Probability and Chance, cont.Computer scientist have recently found a remarkable way to do this:

“holographic proofs”

Ask the author of the proof to write it down in a special encoding (size increases to, say, 50,000 pages of 0 / 1 bits). You don’t need to see the encoding! Instead, you ask the author to give you the values of 50 randomly picked bits of the proof. (i.e., “spot check the proof”). With almost absolute certainty, you can now determine whether the proof is correct of not! (works also for 100 trillion page proofs, use eg 100 bits.) Aside: Do professors ever use “spot checking”?

Started with results from the early nineties (Arora et al. ‘92) with recent refinements (Dinur ’06). Combines ideas from coding theory, probability, algebra, computation, and

graph theory. It’s an example of one of the latest advances in discrete mathematics.See Bernard Chazelle, Nature ’07.

Page 26: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

26

Graph Theory

Page 27: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

27

Graphs and Networks

•Many problems can be represented by a graphical network representation.

•Examples:

– Distribution problems

– Routing problems

– Maximum flow problems

– Designing computer / phone / road networks

– Equipment replacement

– And of course the Internet

Aside: finding the rightproblem representationis one of the key issues in this course.

Page 28: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

28

Sub-Category GraphNo Threshold

New Science of Networks

NYS Electric Power Grid(Thorp,Strogatz,Watts)

Cybercommunities(Automatically discovered)

Kleinberg et al

Network of computer scientistsReferralWeb System(Kautz and Selman)

Neural network of the nematode worm C- elegans

(Strogatz, Watts)

Networks arepervasive

Utility Patent network 1972-1999

(3 Million patents)Gomes and Lesser

Page 29: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

30

Example: Coloring a Map

How to color this map so that no two adjacent regions have the same color?

What does it have to do with discrete math?

Page 30: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

31

Graph representation

Coloring the nodes of the graph:What’s the minimum number of colors such that any two nodes connected by an edge have different colors?

Abstract theessential info:

Page 31: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

32

Four Color Theorem

The chromatic number of a graph is the least number of colors

that are required to color a graph.

The Four Color Theorem – the chromatic number of a planar graph

is no greater than four. (quite surprising!)

Proof: Appel and Haken 1976; careful case analysis performed by computer; proof

reduced the infinitude of possible maps to 1,936 reducible configurations (later

reduced to 1,476) which had to be checked one by one by computer. The computer

program ran for hundreds of hours. The first significant computer-assisted

mathematical proof. Write-up was hundreds of pages including code!

Four color map.

Page 32: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

33

Examples of Applications of Graph Coloring

Page 33: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

34

Scheduling of Final Exams

How can the final exams at Cornell be scheduled so that no student has

two exams at the same time? (Note not obvious this has anything to do

with graphs or graph coloring!)

Graph:A vertex correspond to a course.An edge between two vertices denotes that there is at least one common student in the courses they represent.Each time slot for a final exam is represented by a different color.

A coloring of the graph corresponds to a valid schedule of the exams.

1

7 2

36

5 4

Page 34: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

35

Scheduling of Final Exams

1

7 2

36

5 4

What are the constraints between courses?Find a valid coloring

1

7 2

36

5 4

TimePeriod

IIIIIIIV

Courses

1,62

3,54,7

Why is mimimumnumber of colorsuseful?

Page 35: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

38

Example 2:Traveling Salesman

Find a closed tour of minimum length visiting all the cities.

TSP lots of applications: Transportation related: scheduling deliveriesMany others: e.g., Scheduling of a machine to drill holes in a circuit board ;Genome sequencing; etc

Page 36: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

39

13,509 cities in the US

13508!= 1.4759774188460148199751342753208e+49936

Page 37: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

The optimal tour!

Page 38: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

41

Course Themes, Goals, and Course Outline

Page 39: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Goals of Info 2950Introduce students to a range of mathematical tools from discrete

mathematics that are key in Information Science

Mathematical SophisticationHow to write statements rigorouslyHow to read and write theorems, lemmas, etc.

How to write rigorous proofs

Areas we will cover:

Logic and proofsSet TheoryCounting and Probability Theory Graph Theory Models of computationa

Aside: We’re not after the shortest or most elegant proofs;verbose but rigorous is just fine!

Practice works!

Actually, only practice works!

Note: Learning to do proofs fromwatching the slides is like trying tolearn to play tennis from watchingit on TV! So, do the exercises!

Page 40: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Topics Info 2950

Logic and Methods of Proof

Propositional Logic --- SAT as an encoding language!

Predicates and Quantifiers

Methods of Proofs

Sets

Sets and Set operations

Functions

Counting

Basics of counting

Pigeonhole principle

Permutations and Combinations

Page 41: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Topics CS 2950

Probability

Probability Axioms, events, random variable

Independence, expectation, example distributions

Birthday paradox

Monte Carlo method

Graphs and Trees

Graph terminology

Example of graph problems and algorithms:

graph coloring

TSP

shortest path

Automata theory

Languages

Page 42: Carla Gomes INFO 2950 1 Info 2950 Mathematical Methods for Information Science Prof. Carla Gomes gomes@cs.cornell.edu Introduction

Carla GomesINFO 2950

45

The END