Applications: Artificial Intelligence and Simulations

Page 1: Applications

Applications

Artificial Intelligence and Simulations

Page 2: Applications

Applications

Palette of data structures and algorithms

Choose which to use and combine them to form your model of reality

Reality to model: the modeler and the computer

You are the artist and the computer is your canvas

Page 3: Applications

Knowledge Representation: Abstraction

You choose how to represent reality

The choice is not unique

It depends on what aspect of reality you want to represent and how

Page 4: Applications

Applications: Acquisition, management and use of knowledge

Theme of lecture:

Abstraction of reality through knowledge engineering

Page 5: Applications

Applications: Acquisition, management and use of knowledge

• Storage and management of Information
• Making Sense of Knowledge
• Acquisition of knowledge
  – Feature Acquisition
  – Concept Abstraction
• Use of knowledge in and as models
  – Problem Solving
  – Simulations

Page 6: Applications

Storing and Managing Information

Table of data

Database Management Systems (DBMS): storage and retrieval of the properties of objects

Spreadsheets: manipulation of and calculations with the data in the table

Each row is a particular object

Each column is a property associated with that object

Two examples/paradigms of management systems

Page 7: Applications

Database Management System (DBMS)

Organizes data in sets of tables

Page 8: Applications

Relational Database Management System (RDBMS)

Table A
Name        Address          Parcel #
John Smith  18 Lawyers Dr.   756554
T. Brown    14 Summers Tr.   887419

Table B
Parcel #    Assessed Value
887419      152,000
446397      100,000

Provides relationships between the data in the tables

Page 9: Applications

Using SQL – Structured Query Language

• SQL is a standard database protocol, adopted by most 'relational' databases
• Provides syntax for data:
  – Definition
  – Retrieval
  – Functions (COUNT, SUM, MIN, MAX, etc.)
  – Updates and deletes
• SELECT list FROM table WHERE condition
  – list: a list of items, or * for all items
  – WHERE: a logical expression limiting the records selected; conditions can be combined with Boolean logic: AND, OR, NOT
  – ORDER BY may be used to sort the results
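A minimal sketch of this syntax in practice (an assumption, not from the slides: Python's built-in sqlite3 module and an illustrative parcels table modeled on Table B of the RDBMS example):

    import sqlite3

    # In-memory database; the parcels table mirrors Table B above.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parcels (parcel_no INTEGER, assessed_value INTEGER)")
    conn.executemany("INSERT INTO parcels VALUES (?, ?)",
                     [(887419, 152000), (446397, 100000)])

    # SELECT list FROM table WHERE condition, sorted with ORDER BY.
    rows = conn.execute(
        "SELECT parcel_no, assessed_value FROM parcels "
        "WHERE assessed_value > 120000 "
        "ORDER BY assessed_value DESC").fetchall()
    print(rows)   # [(887419, 152000)]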

Page 10: Applications

Spreadsheets

Every row is a different "object" with a set of properties

Every column is a different property of the row object

Page 11: Applications

Spreadsheet: organization of elements

Columns A, B, C, … and rows 1, 2, 3, … form the row and column indices.

Cells have addresses, e.g. A7, B4, C10, D5, used to access each cell.

Page 12: Applications

Spreadsheet Formulas

Formula: Combination of values or cell references and mathematical operators such as +, -, /, *

The formula displays in the entry bar. This formula is used to add the values in the four cells; the sum is displayed in cell B7 (e.g. =SUM(B3:B6)).

The results of a formula display in the cell.

With cell, row and column functions, e.g. AVERAGE, SUM, MIN, MAX.

Page 13: Applications

Visualizing data: Charts

Page 14: Applications

Applications: Acquisition, management and use of knowledge

• Storage and management of Information
• Making Sense of Knowledge
• Acquisition of knowledge
  – Feature Acquisition
  – Concept Abstraction
• Use of knowledge in and as models
  – Problem Solving
  – Simulations

Page 15: Applications

Making Sense of Knowledge

"Time flies like an arrow" – proverb
"Fruit flies like a banana" – Groucho Marx

There are semantics and context behind all words

Flies:
1. The act of flying
2. The insect

Like:
1. Similar to
2. Is fond of

There is also the elusive "Common Sense"

1. One type of fly, the fruit fly, is fond of bananas
2. Fruit, in general, flies through the air just like a banana
3. One type of fly, the fruit fly, is just like a banana

A bit complicated, because we are speaking metaphorically: time is not really an object, like a bird, that flies

Translation is not just a one-to-one lookup in a dictionary; complex search is not just searching for individual words

Google Translate

Page 16: Applications

Adding Semantics: Ontologies

Concept: conceptual entity of the domain
Attribute: property of a concept
Relation: relationship between concepts or properties
Axiom: coherent description of concepts / properties / relations via logical expressions

[Example ontology diagram: an isA hierarchy (taxonomy) with Person (attributes: name, email) specialized into Student (attribute: student nr.) and Professor (attribute: research field); a Student attends a Lecture (attributes: topic, lecture nr.), which a Professor holds.]

Structuring of:
• Background Knowledge
• "Common Sense" knowledge

Page 17: Applications

Structure of an Ontology

Ontologies typically have two distinct components:

Names for important concepts in the domain
– Elephant is a concept whose members are a kind of animal
– Herbivore is a concept whose members are exactly those animals who eat only plants or parts of plants
– Adult_Elephant is a concept whose members are exactly those elephants whose age is greater than 20 years

Background knowledge/constraints on the domain
– Adult_Elephants weigh at least 2,000 kg
– All Elephants are either African_Elephants or Indian_Elephants
– No individual can be both a Herbivore and a Carnivore

Page 18: Applications

Ontology Definition

"Formal, explicit specification of a shared conceptualization" [Gruber93]

– shared: commonly accepted understanding
– conceptualization: conceptual model of a domain (ontological theory)
– explicit: unambiguous terminology definitions
– formal: machine-readability with computational semantics

Page 19: Applications


The Semantic Web: Ontology implementation

"The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." -- Tim Berners-Lee

[Figure: the Semantic Web layer stack, "the wedding cake"]

Page 20: Applications

Applications: Acquisition, management and use of knowledge

• Storage and management of Information
• Making Sense of Knowledge
• Acquisition of knowledge
  – Feature Acquisition
  – Concept Abstraction
• Use of knowledge in and as models
  – Problem Solving
  – Simulations

Page 21: Applications

Abstracting Knowledge

Several levels of, and reasons for, abstracting knowledge:

Feature abstraction: simplifying "reality" so the knowledge can be used in computer data structures and algorithms

Concept abstraction: organizing and making sense of the immense amount of data/knowledge we have

Modeling abstraction: making usable and predictive models of reality

Page 22: Applications

Feature Abstraction: simplifying "reality" so the knowledge can be used in computer data structures and algorithms

A photograph of a face: a set of pixels

Is it a face? Whose face?

Page 23: Applications

Feature Abstraction: simplifying "reality" so the knowledge can be used in computer data structures and algorithms

A photograph of a face. Is it a face? Whose face?

The eye sees the pixels; in the visual cortex, features are detected

Page 24: Applications

Feature Abstraction: simplifying "reality" so the knowledge can be used in computer data structures and algorithms

[Figure: pixel grid of the photograph]

A photograph is made up of pixels. The pixels need to be converted to data structures the algorithms can understand.

Page 25: Applications

Feature Abstraction: Boundary Detection

• Is this a boundary?

Page 26: Applications

Feature Detection

"flat" region: no change in all directions

"edge": no change along the edge direction

"corner": significant change in all directions

Harris Detector: Intuition

From a square sampling of pixels

Page 27: Applications

Principal Component Analysis (PCA)

• Finds a map of the principal components (PCs) of the data into an orthogonal space
• Method: find the eigenvalues and eigenvectors (in practice, of the data's covariance matrix):
  – The eigenvectors are the principal components
  – The eigenvalues rank the components
• PCs
  – Variables with the largest variances
  – Orthogonality (each coordinate is orthogonal)
  – Linearity
  – Optimal least mean-square error
• Limitations?
  – Strict linearity – specific distribution
  – Large variance assumption

[Figure: data in (x1, x2) coordinates with principal axes PC 1 and PC 2 – PCA rotates the coordinate system]
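As a minimal sketch of the method (assuming the usual formulation via the eigenvectors of the covariance matrix, which the slide leaves implicit), the following Python/NumPy code finds the principal components of a 2-D point cloud:

    import numpy as np

    rng = np.random.default_rng(0)
    # Synthetic 2-D data, stretched along the x1 axis.
    pts = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])

    centered = pts - pts.mean(axis=0)        # PCA assumes centered data
    cov = np.cov(centered, rowvar=False)     # 2x2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: for symmetric matrices

    order = np.argsort(eigvals)[::-1]        # rank components by variance
    print("eigenvalues:", eigvals[order])    # largest variance first
    print("PC 1:", eigvecs[:, order[0]])     # direction of largest variance

    rotated = centered @ eigvecs[:, order]   # "rotates the coordinate system"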

Page 28: Applications

Feature Detection

Intensity change in a shifting window: eigenvalue analysis

E(u, v) ≈ [u v] M [u v]^T

λ1, λ2 – eigenvalues of M

The ellipse E(u, v) = const has its axes along the direction of the fastest change and the direction of the slowest change, with axis lengths (λmax)^(-1/2) and (λmin)^(-1/2).

Harris Detector: mathematics of the analysis of pixels – a transformation of coordinates

Principal component analysis
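A small sketch of this eigenvalue analysis (a simplified Harris-style classifier, assuming a grayscale patch as a NumPy array; the window size and thresholds are illustrative choices, not from the slides):

    import numpy as np

    def classify_window(patch):
        """Classify a patch as flat/edge/corner from the eigenvalues of its
        structure tensor M = sum over the window of [Ix^2 IxIy; IxIy Iy^2]."""
        iy, ix = np.gradient(patch.astype(float))
        m = np.array([[np.sum(ix * ix), np.sum(ix * iy)],
                      [np.sum(ix * iy), np.sum(iy * iy)]])
        l1, l2 = np.linalg.eigvalsh(m)        # ascending: l1 <= l2
        if l2 < 1e-2:                         # both small: no change
            return "flat"
        if l1 / l2 < 0.1:                     # one dominant direction
            return "edge"
        return "corner"                       # both large and comparable

    edge = np.zeros((9, 9)); edge[:, 5:] = 1.0       # vertical step edge
    corner = np.zeros((9, 9)); corner[5:, 5:] = 1.0  # one bright quadrant
    print(classify_window(edge), classify_window(corner))  # edge corner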

Page 29: Applications

Can reduce the set of coordinates

One coordinate remains; the other coordinate is noise (all points are "shifted" onto the principal component)

Page 30: Applications

Harris Detector: Mathematics

[Classification in the (λ1, λ2) plane:]

"Corner": λ1 and λ2 are large, λ1 ~ λ2; E increases in all directions

"Flat" region: λ1 and λ2 are small; E is almost constant in all directions

"Edge": λ1 >> λ2 (or λ2 >> λ1)

Classification of the new coordinates

Page 31: Applications

PCA: Feature from pixels

[Classification in the (λ1, λ2) plane, as above]

One principal component lies along the line; the other component is small ("edge": λ1 >> λ2).

Note that the line can be in any direction: the principal component follows the line, so the feature is rotation invariant.

Page 32: Applications

PCA: Feature from pixels

[Classification in the (λ1, λ2) plane, as above]

There is no line, hence no principal component ("flat" region: λ1 and λ2 are both small).

Page 33: Applications

PCA: Feature from pixels

[Classification in the (λ1, λ2) plane, as above]

There are two lines in (almost) orthogonal (perpendicular) directions: two principal components ("corner": λ1 ~ λ2, both large).

Page 34: Applications

Feature Detection

Ellipse rotates but its shape (i.e. eigenvalues) remains the same

Corner response R is invariant to image rotation

Important property: Rotationally invariant

Page 35: Applications

SIFT Descriptor

• A 16x16 gradient window is taken, partitioned into 4x4 subwindows
• Histogram of the 4x4 samples in 8 directions
• Gaussian weighting around the center (σ is 0.5 times the scale of the keypoint)
• 4x4x8 = 128-dimensional feature vector

Another localized feature from the pixels

Page 36: Applications

Feature Detection

• Use the scale/orientation determined by the detector to define a normalized frame
• Compute a descriptor in this frame

Scale example:
• moments integrated over an adapted window
• derivatives adapted to the scale: s·Ix

Scale & orientation example:
• resample all points/regions to 11x11 pixels
• PCA coefficients: principal components of all points

SIFT descriptors are also invariant to scale/orientation

Page 37: Applications

Feature Abstraction: simplifying "reality" so the knowledge can be used in computer data structures and algorithms

[Figure: pixel grid of the photograph]

New "features" represented in data structures that can be used in algorithms

Page 38: Applications

Hierarchy of analysis

Hierarchy of features

Simple primitive features

Complex combinations of simple features

Face detection

Page 39: Applications

Example: Face Detection

• Scan window over image

• Classify window as either:
  – Face
  – Non-face

[Figure: Window → Classifier → Face / Non-face]

From the established features

Page 40: Applications

Face Detection Algorithm

[Pipeline, input image to output image:]

Input Image
→ Lighting Compensation
→ Color Space Transformation
→ Skin Color Detection
→ Variance-based Segmentation
→ Connected Component & Grouping   (Face Localization)
→ Eye/Mouth Detection
→ Face Boundary Detection
→ Verifying/Weighting Eyes-Mouth Triangles   (Facial Feature Detection)
→ Output Image

Page 41: Applications

Applications: Acquisition, management and use of knowledge

• Storage and management of Information
• Making Sense of Knowledge
• Acquisition of knowledge
  – Feature Acquisition
  – Concept Abstraction
• Use of knowledge in and as models
  – Problem Solving
  – Simulations

Page 42: Applications

Concept Abstraction: organizing and making sense of the immense amount of data/knowledge we have

Generalization

The ability of an algorithm to perform accurately on new, unseen examples after having trained on a learning data set

Page 43: Applications

Generalization

Consider the following regression problem: predict the real value on the y-axis from the real value on the x-axis. You are given 6 examples: {Xi, Yi}.

What is the y-value for a new query X*?

Page 44: Applications

Generalization

What is the y-value for a new query X*?

Page 45: Applications

Generalization

What is the y-value for a new query X*?

Page 46: Applications

Generalization: which curve is best?

What is the y-value for a new query X*?

Page 47: Applications

Generalization

Occam's razor: prefer the simplest hypothesis consistent with the data.

We have to find a balance of constraints.
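A minimal sketch of this trade-off (illustrative data; NumPy's polyfit stands in for the regression): the degree-5 polynomial passes through all six examples exactly, yet the simpler fits are usually more trustworthy at a new query x*.

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])                  # 6 examples
    y = 0.5 * x + np.array([0.1, -0.2, 0.15, -0.1, 0.2, -0.05])   # noisy line

    x_star = 2.5                                   # new query
    for degree in (1, 3, 5):
        coeffs = np.polyfit(x, y, deg=degree)      # least-squares fit
        print(f"degree {degree}: y(x*) = {np.polyval(coeffs, x_star):.3f}")
    # The degree-5 fit has zero training error but is the most
    # sensitive to the noise between the training points.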

Page 48: Applications

Two Schools of Thought

1. Statistical "Learning"
The data is reduced to vectors of numbers, and statistical techniques are used for the tasks to be performed.

2. Structural "Learning"
The data is converted to a discrete structure (such as a grammar or a graph), and the techniques are related to computer science subjects (such as parsing and graph matching).

Page 49: Applications

A spectrum of machine learning tasks

Toward the Artificial Intelligence end:
• High-dimensional data (e.g. more than 100 dimensions)
• The noise is not sufficient to obscure the structure in the data if we process it right
• There is a huge amount of structure in the data, but the structure is too complicated to be represented by a simple model
• The main problem is figuring out a way to represent the complicated structure so that it can be learned

Toward the Statistics end:
• Low-dimensional data (e.g. less than 100 dimensions)
• Lots of noise in the data
• There is not much structure in the data, and what structure there is can be represented by a fairly simple model
• The main problem is distinguishing true structure from noise

Statistics --------------------- Artificial Intelligence

Page 50: Applications

Concept Acquisition:
• Supervised learning
• Unsupervised learning
• Statistics

Page 51: Applications

Supervised Learning

Learning with the presence of an expert: the data is labelled with a class or value

Goal:: predict the class or value label

[Figure: labelled groups c1, c2, c3 in feature space]

Learn the properties of a classification; decision making
Predict (classify) a sample → a discrete set of class labels
e.g. C = {object 1, object 2, …} for a recognition task
e.g. C = {object, !object} for a detection task (Spam / No-Spam)

Page 52: Applications

Unsupervised Learning

Learning without the presence of an expert: the data is not labelled with a class or value

Goal:: determine data patterns/groupings and the properties of that classification

Association or clustering:: grouping a set of instances by attribute similarity
e.g. image segmentation

Key concept: Similarity

Page 53: Applications

Statistical Methods

Learning within the constraints of the method: the data is basically an n-dimensional set of numerical attributes

Deterministic/mathematical algorithms based on probability distributions

Regression:: predict sample → associated real (continuous) value
e.g. data fitting

Principal Component Analysis:: transform to a new (simpler) set of coordinates
e.g. find the major component of the data

[Figure: data in (x1, x2) with principal components PC1, PC2]

Page 54: Applications

Pattern Recognition: another name for machine learning

• A pattern is an object, process or event that can be given a name.

• A pattern class (or category) is a set of patterns sharing common attributes and usually originating from the same source.

• During recognition (or classification) given objects are assigned to prescribed classes.

• A classifier is a machine which performs classification.

"The assignment of a physical object or event to one of several prespecified categories" -- Duda & Hart

Page 55: Applications

Cross-Validation

In the mathematics of statistics there is a mathematical definition of the error: a function of the probability distribution (e.g. average, standard deviation).

In machine learning, no such distribution exists.

[Diagram: the full data set is split into a training set, used to build the ML data structure, and a test set, used to determine the error.]
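A minimal sketch of this train/test protocol (illustrative data and a deliberately trivial nearest-class-mean classifier; any classifier from the list below could be plugged in):

    import numpy as np

    rng = np.random.default_rng(1)
    # Two labelled 2-D clusters (class 0 and class 1).
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)

    idx = rng.permutation(len(X))          # shuffle, then hold out 30%
    split = int(0.7 * len(X))
    train, test = idx[:split], idx[split:]

    # "Build the ML data structure": here, just the per-class means.
    means = np.array([X[train][y[train] == c].mean(axis=0) for c in (0, 1)])

    # "Determine error" on the held-out test set.
    dists = np.linalg.norm(X[test][:, None, :] - means[None, :, :], axis=2)
    pred = dists.argmin(axis=1)
    print("test error:", np.mean(pred != y[test]))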

Page 56: Applications

Classification algorithms
– Fisher linear discriminant
– KNN
– Decision tree
– Neural networks
– SVM
– Naïve Bayes
– AdaBoost
– Many, many more …

Each one has its own properties w.r.t. bias, speed, accuracy, transparency, …

Page 57: Applications

Feature extraction

Task: to extract features which are good for classification.

Good features:
• Objects from the same class have similar feature values
• Objects from different classes have different values

[Figure: "good" features vs. "bad" features]

Page 58: Applications

Similarity: two objects belong to the same classification if they are "close", i.e. the distance between them is small.

[Figure: points in (x1, x2) with unlabelled query points "?"]

We need a function F(object1, object2) = the "distance" between them.

Page 59: Applications

Similarity measure: distance metric

• How do we measure what it means to be “close”?

• Depending on the problem we should choose an appropriate distance metric.

For example: Least squares distance

Page 60: Applications

Types of Model

Discriminative Generative

Generative vs. Discriminative

Page 61: Applications

Overfitting and underfitting

Problem: how rich a class of classifiers q(x; θ) to use.

[Figure: underfitting – good fit – overfitting]

Problem of generalization: a small empirical risk R_emp does not imply a small true expected risk R.

Page 62: Applications

Generative: Cluster Analysis

Create "clusters" depending on the distance metric

Hierarchical: based on "how close" objects are

Page 63: Applications

KNN – K nearest neighbors

[Figure: points in (x1, x2) with query points "?"]

– Find the k nearest neighbors of the test example, and infer its class from their known classes.
– E.g. k = 3
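A minimal KNN sketch (illustrative data; Euclidean distance and a majority vote, matching the slide's k = 3):

    import numpy as np
    from collections import Counter

    def knn_classify(X_train, y_train, query, k=3):
        """Majority vote among the k nearest training points (Euclidean)."""
        dists = np.linalg.norm(X_train - query, axis=1)
        nearest = np.argsort(dists)[:k]
        return Counter(y_train[nearest]).most_common(1)[0][0]

    X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9]])
    y_train = np.array(["c1", "c1", "c2", "c2"])
    print(knn_classify(X_train, y_train, np.array([0.9, 1.0])))  # c2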

Page 64: Applications

Discriminative: Support Vector Machine

• Q: How do we draw the optimal linear separating hyperplane? A: By maximizing the margin
• Margin maximization
  – The distance between H+1 and H−1 is 2/||w||
  – Thus, ||w|| should be minimized

[Figure: separating hyperplane with margin]
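A small sketch using scikit-learn's linear SVM (an assumed dependency, not named on the slides) on toy data; the learned normal vector w and the margin 2/||w|| can be read off the fitted model:

    import numpy as np
    from sklearn.svm import SVC

    # Two linearly separable classes.
    X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]])
    y = np.array([-1, -1, 1, 1])

    clf = SVC(kernel="linear", C=1e6)   # large C approximates a hard margin
    clf.fit(X, y)

    w = clf.coef_[0]                    # normal vector of the hyperplane
    print("w =", w, "margin =", 2 / np.linalg.norm(w))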

Page 65: Applications

Prediction Based on Bayes’ Theorem

• Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:

  P(H|X) = P(X|H) P(H) / P(X)

• Informally, this can be viewed as: posterior = likelihood × prior / evidence
• Predict that X belongs to Ci iff the probability P(Ci|X) is the highest among all the P(Ck|X) for the k classes
• Practical difficulty: it requires initial knowledge of many probabilities, involving significant computational cost

Page 66: Applications

Naïve Bayes Classifier

age      income   student  credit_rating  buys_computer
<=30     high     no       fair           no
<=30     high     no       excellent      no
31…40    high     no       fair           yes
>40      medium   no       fair           yes
>40      low      yes      fair           yes
>40      low      yes      excellent      no
31…40    low      yes      excellent      yes
<=30     medium   no       fair           no
<=30     low      yes      fair           yes
>40      medium   yes      fair           yes
<=30     medium   yes      excellent      yes
31…40    medium   no       excellent      yes
31…40    high     yes      fair           yes
>40      medium   no       excellent      no

Class:
C1: buys_computer = 'yes'
C2: buys_computer = 'no'

P(buys_computer = “yes”) = 9/14 = 0.643

P(buys_computer = “no”) = 5/14= 0.357

X = (age <= 30 , income = medium, student = yes, credit_rating = fair)

Page 67: Applications

Naïve Bayes Classifier

age      income   student  credit_rating  buys_computer
<=30     high     no       fair           no
<=30     high     no       excellent      no
31…40    high     no       fair           yes
>40      medium   no       fair           yes
>40      low      yes      fair           yes
>40      low      yes      excellent      no
31…40    low      yes      excellent      yes
<=30     medium   no       fair           no
<=30     low      yes      fair           yes
>40      medium   yes      fair           yes
<=30     medium   yes      excellent      yes
31…40    medium   no       excellent      yes
31…40    high     yes      fair           yes
>40      medium   no       excellent      no

Class:
C1: buys_computer = 'yes'
C2: buys_computer = 'no'

Want to classify

X = (age <= 30 , income = medium, student = yes, credit_rating = fair)

Will X buy a computer?

Page 68: Applications

Naïve Bayes Classifier

Key: Conditional probability

P(X|Y): the probability that X is true, given Y

P(not rain | sunny) > P(rain | sunny)
P(not rain | not sunny) < P(rain | not sunny)

Classifier: we have to include the probability of the condition

P(not rain | sunny) * P(sunny): how often did it really not rain, given that it was actually sunny

Page 69: Applications

Naïve Bayes Classifier

Class:
C1: buys_computer = 'yes'
C2: buys_computer = 'no'

Want to classify

X = (age <= 30 , income = medium, student = yes, credit_rating = fair)

Will X buy a computer?

Which “conditional probability” is greater?

P(X|C1)*P(C1) > P(X|C2)*P(C2) → X will buy a computer

P(X|C1)*P(C1) < P(X|C2)*P(C2) → X will not buy a computer

Page 70: Applications

Naïve Bayes Classifier

age      income   student  credit_rating  buys_computer
<=30     high     no       fair           no
<=30     high     no       excellent      no
31…40    high     no       fair           yes
>40      medium   no       fair           yes
>40      low      yes      fair           yes
>40      low      yes      excellent      no
31…40    low      yes      excellent      yes
<=30     medium   no       fair           no
<=30     low      yes      fair           yes
>40      medium   yes      fair           yes
<=30     medium   yes      excellent      yes
31…40    medium   no       excellent      yes
31…40    high     yes      fair           yes
>40      medium   no       excellent      no

Class:
C1: buys_computer = 'yes'
C2: buys_computer = 'no'

X = (age <= 30, income = medium, student = yes, credit_rating = fair)

P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6

Page 71: Applications

Naïve Bayes Classifier

• Compute P(X|Ci) for each class:

P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6
P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
P(income = "medium" | buys_computer = "no") = 2/5 = 0.4
P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
P(student = "yes" | buys_computer = "no") = 1/5 = 0.2
P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
P(credit_rating = "fair" | buys_computer = "no") = 2/5 = 0.4

Page 72: Applications

Naïve Bayes Classifier

P(X|Ci):
P(X | buys_computer = "yes") = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X | buys_computer = "no") = 0.6 x 0.4 x 0.2 x 0.4 = 0.019

P(X|Ci)*P(Ci):
P(X | buys_computer = "yes") * P(buys_computer = "yes") = 0.028   ← bigger
P(X | buys_computer = "no") * P(buys_computer = "no") = 0.007

Therefore, X belongs to class "buys_computer = yes"
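A minimal sketch that reproduces these numbers from the table above (pure Python; the per-attribute product is valid only under the "naïve" conditional-independence assumption):

    # (age, income, student, credit_rating, buys_computer) rows from the table
    rows = [
        ("<=30","high","no","fair","no"), ("<=30","high","no","excellent","no"),
        ("31…40","high","no","fair","yes"), (">40","medium","no","fair","yes"),
        (">40","low","yes","fair","yes"), (">40","low","yes","excellent","no"),
        ("31…40","low","yes","excellent","yes"), ("<=30","medium","no","fair","no"),
        ("<=30","low","yes","fair","yes"), (">40","medium","yes","fair","yes"),
        ("<=30","medium","yes","excellent","yes"), ("31…40","medium","no","excellent","yes"),
        ("31…40","high","yes","fair","yes"), (">40","medium","no","excellent","no"),
    ]
    X = ("<=30", "medium", "yes", "fair")          # the sample to classify

    for c in ("yes", "no"):
        in_class = [r for r in rows if r[-1] == c]
        prior = len(in_class) / len(rows)          # P(Ci)
        likelihood = 1.0
        for i, value in enumerate(X):              # naive independence
            matches = sum(1 for r in in_class if r[i] == value)
            likelihood *= matches / len(in_class)  # P(attribute = value | Ci)
        print(c, round(likelihood * prior, 3))     # P(X|Ci) * P(Ci)
    # yes 0.028, no 0.007  →  classify as buys_computer = "yes"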

Page 73: Applications

Decision Tree Classifier

Ross Quinlan

[Figure: scatter plot of Antenna Length vs. Abdomen Length, both axes 1–10]

Abdomen Length > 7.1?
  no  → Antenna Length > 6.0?
          no  → Grasshopper
          yes → Katydid
  yes → Katydid

Page 74: Applications

Antennae shorter than body?
  yes → Grasshopper
  no  → 3 Tarsi?
          no  → Cricket
          yes → Foretibia has ears?
                  yes → Katydids
                  no  → Camel Cricket

Decision trees predate computers

Page 75: Applications

• Decision tree
  – A flow-chart-like tree structure
  – Internal node denotes a test on an attribute
  – Branch represents an outcome of the test
  – Leaf nodes represent class labels or class distribution
• Decision tree generation consists of two phases
  – Tree construction
    • At start, all the training examples are at the root
    • Partition examples recursively based on selected attributes
  – Tree pruning
    • Identify and remove branches that reflect noise or outliers
• Use of decision tree: classifying an unknown sample
  – Test the attribute values of the sample against the decision tree

Decision Tree Classification

Page 76: Applications

• Basic algorithm (a greedy algorithm)
  – Tree is constructed in a top-down recursive divide-and-conquer manner
  – At start, all the training examples are at the root
  – Attributes are categorical (if continuous-valued, they can be discretized in advance)
  – Examples are partitioned recursively based on selected attributes
  – Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
• Conditions for stopping partitioning
  – All samples for a given node belong to the same class
  – There are no remaining attributes for further partitioning – majority voting is employed for classifying the leaf
  – There are no samples left

How do we construct the decision tree?

Page 77: Applications

Information Gain as a Splitting Criterion

• Select the attribute with the highest information gain (information gain is the expected reduction in entropy)
• Assume there are two classes, P and N
  – Let the set of examples S contain p elements of class P and n elements of class N
  – The amount of information needed to decide if an arbitrary example in S belongs to P or N is defined as

  E(S) = - (p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))

0 log(0) is defined as 0

Page 78: Applications

Information Gain in Decision Tree Induction

• Assume that, using attribute A, the current set will be partitioned into some number of child sets
• The encoding information that would be gained by branching on A:

  Gain(A) = E(current set) - Σ E(all child sets)

Note: entropy is at its minimum (zero) when the collection of objects is completely uniform, i.e. all belong to one class
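A minimal sketch of these two formulas (pure Python; the child sets are weighted by their share of the parent set, exactly as in the worked examples on the next slides):

    from math import log2

    def entropy(p, n):
        """E(S) for p elements of class P and n of class N; 0*log(0) := 0."""
        total, e = p + n, 0.0
        for count in (p, n):
            if count:                        # 0 log(0) is defined as 0
                frac = count / total
                e -= frac * log2(frac)
        return e

    def gain(parent, children):
        """Gain(A) = E(current set) - weighted sum of E(child sets)."""
        p, n = parent
        size = p + n
        child_term = sum((cp + cn) / size * entropy(cp, cn)
                         for cp, cn in children)
        return entropy(p, n) - child_term

    # The "Hair Length <= 5" split from the next slide, as (F, M) counts:
    print(round(gain((4, 5), [(1, 3), (3, 2)]), 4))   # 0.0911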

Page 79: Applications

Person   Hair Length   Weight   Age   Class
Homer    0"            250      36    M
Marge    10"           150      34    F
Bart     2"            90       10    M
Lisa     6"            78       8     F
Maggie   4"            20       1     F
Abe      1"            170      70    M
Selma    8"            160      41    F
Otto     10"           180      38    M
Krusty   6"            200      45    M

Comic    8"            290      38    ?

Page 80: Applications

Hair Length <= 5?   (yes / no)

Entropy(S) = - (p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))

Entropy(4F, 5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
Entropy(1F, 3M) = -(1/4)log2(1/4) - (3/4)log2(3/4) = 0.8113   (yes branch)
Entropy(3F, 2M) = -(3/5)log2(3/5) - (2/5)log2(2/5) = 0.9710   (no branch)

Gain(A) = E(current set) - Σ E(all child sets)

Gain(Hair Length <= 5) = 0.9911 - (4/9 * 0.8113 + 5/9 * 0.9710) = 0.0911

Let us try splitting on Hair length

Page 81: Applications

Weight <= 160?   (yes / no)

Entropy(S) = - (p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))

Entropy(4F, 5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
Entropy(4F, 1M) = -(4/5)log2(4/5) - (1/5)log2(1/5) = 0.7219   (yes branch)
Entropy(0F, 4M) = -(0/4)log2(0/4) - (4/4)log2(4/4) = 0        (no branch)

Gain(A) = E(current set) - Σ E(all child sets)

Gain(Weight <= 160) = 0.9911 - (5/9 * 0.7219 + 4/9 * 0) = 0.5900

Let us try splitting on Weight

Page 82: Applications

age <= 40?   (yes / no)

Entropy(S) = - (p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))

Entropy(4F, 5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
Entropy(3F, 3M) = -(3/6)log2(3/6) - (3/6)log2(3/6) = 1        (yes branch)
Entropy(1F, 2M) = -(1/3)log2(1/3) - (2/3)log2(2/3) = 0.9183   (no branch)

Gain(A) = E(current set) - Σ E(all child sets)

Gain(Age <= 40) = 0.9911 - (6/9 * 1 + 3/9 * 0.9183) = 0.0183

Let us try splitting on Age

Page 83: Applications

Weight <= 160?   (yes / no)
  yes → Hair Length <= 2?   (yes / no)

Of the 3 features we had, Weight was best. But while people who weigh over 160 are perfectly classified (as males), the under 160 people are not perfectly classified… So we simply recurse!

This time we find that we can split on Hair length, and we are done!

Page 84: Applications

Weight <= 160?
  no  → Male
  yes → Hair Length <= 2?
          yes → Male
          no  → Female

We don't need to keep the data around, just the test conditions.

How would these people be classified?
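The finished tree is just nested test conditions. As a minimal sketch (mirroring the tree above, not Quinlan's full induction algorithm), here it is as code, classifying the unlabeled "Comic" row:

    def classify(weight, hair_length):
        """The learned tree: only the test conditions are kept, not the data."""
        if weight <= 160:
            return "Male" if hair_length <= 2 else "Female"
        return "Male"

    print(classify(weight=290, hair_length=8))   # Comic → Male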

Page 85: Applications

Applications: Acquisition, management and use of knowledge

• Storage and management of Information
• Making Sense of Knowledge
• Acquisition of knowledge
  – Feature Acquisition
  – Concept Abstraction
• Use of knowledge in and as models
  – Problem Solving
  – Simulations

Page 86: Applications

Using Knowledge

Problem Solving: searching for a solution

Simulations: combining models to form a large comprehensive model

Page 87: Applications

Problem Solving

Graph Search: searching for a solution through all possible solutions – a fundamental algorithm in artificial intelligence

Basis of the search: the order in which nodes are evaluated and expanded, determined by two lists:

OPEN: list of unexpanded nodes
CLOSED: list of expanded nodes

Page 88: Applications

Abstraction: State of a system

Examples: chess, tic-tac-toe, the water jug problem, the traveling salesman problem

In problem solving: search for the steps leading to the solution. The individual steps are the states of the system.

Page 89: Applications

Solution Space

The set of all states of the problem, including the goal state(s)

Examples: all possible board combinations, all possible reference points, all possible combinations

Page 90: Applications

Search Space

Each system state (a node) is connected by rules (connections) on how to get from one state to another

Page 91: Applications

Search Space

How the states are connected

Legal moves, paths between points, possible operations

Page 92: Applications

Strategies to Search Space of System States

• Breadth first search
• Depth first search
• Best first search

These determine the order in which the states are searched to find a solution

Page 93: Applications

Breadth-first searching

• A breadth-first search (BFS) explores nodes nearest the root before exploring nodes further away
• For example, after searching A, then B, then C, the search proceeds with D, E, F, G
• Nodes are explored in the order A B C D E F G H I J K L M N O P Q
• J will be found before N

[Example tree: root A with children B, C; B → D, E; C → F, G; E → H, I; G → J, K; H → L, M, N; I → O, P; K → Q]

Page 94: Applications

Depth-first searching

• A depth-first search (DFS) explores a path all the way to a leaf before backtracking and exploring another path
• For example, after searching A, then B, then D, the search backtracks and tries another path from B
• Nodes are explored in the order A B D E H L M N I O P C F G J K Q
• N will be found before J

[Same example tree as above]

Page 95: Applications

Breadth First Search

[Figure: the OPEN queue at successive steps; items between the red bars are siblings]

The search continues until the goal is reached or OPEN is empty.

Expand A to new nodes B, C, D
Expand B to new nodes E, F
New nodes are sent to the back of the queue

Queue: FIFO (first in, first out)

Page 96: Applications

Depth First Search

Expand A to new nodes B, C, D
Expand B to new nodes E, F
New nodes are sent to the front of the stack

Stack: LIFO (last in, first out)
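A minimal sketch of both searches (illustrative graph; the only difference is which end of the OPEN list is popped, queue vs. stack):

    from collections import deque

    # Illustrative tree: root A with children B and C, etc.
    graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"],
             "D": [], "E": [], "F": [], "G": []}

    def search(start, mode="bfs"):
        open_list, closed = deque([start]), []    # OPEN / CLOSED lists
        while open_list:                          # until OPEN is empty
            node = open_list.popleft() if mode == "bfs" else open_list.pop()
            closed.append(node)                   # node is now expanded
            open_list.extend(graph[node])         # children join OPEN
        return closed

    print(search("A", "bfs"))   # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
    print(search("A", "dfs"))   # ['A', 'C', 'G', 'F', 'B', 'E', 'D']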

Page 97: Applications

Best First Search

Breadth first search: queue (FIFO)
Depth first search: stack (LIFO)

These are uninformed searches: no knowledge of how good the current solution is (are we on the right track?)

Best first search: priority queue

Associated with each node is a heuristic:
F(node) = the quality of the node for leading to a final solution

Page 98: Applications

A* search

Idea: avoid expanding paths that are already expensive

• Evaluation function f(n) = g(n) + h(n)
• g(n) = cost so far to reach n
• h(n) = estimated cost from n to goal (this is the hard/unknown part)
• f(n) = estimated total cost of the path through n to the goal

If h(n) is an underestimate, then the algorithm is guaranteed to find an optimal solution
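A minimal A* sketch (illustrative weighted graph and heuristic values; Python's heapq serves as the priority queue, ordered by f(n) = g(n) + h(n)):

    import heapq

    # Illustrative weighted graph and an (assumed admissible) heuristic h.
    graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 6)],
             "B": [("G", 3)], "G": []}
    h = {"S": 4, "A": 3, "B": 2, "G": 0}    # never overestimates cost to G

    def a_star(start, goal):
        open_list = [(h[start], 0, start, [start])]   # (f, g, node, path)
        closed = set()
        while open_list:
            f, g, node, path = heapq.heappop(open_list)  # cheapest f first
            if node == goal:
                return path, g
            if node in closed:
                continue
            closed.add(node)
            for child, cost in graph[node]:
                g2 = g + cost                            # cost so far
                heapq.heappush(open_list,
                               (g2 + h[child], g2, child, path + [child]))
        return None

    print(a_star("S", "G"))   # (['S', 'A', 'B', 'G'], 6)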

Page 99: Applications

Admissible heuristics

• A heuristic h(n) is admissible if for every node n,h(n) ≤ h*(n), where h*(n) is the true cost to reach the goal state from n.

• An admissible heuristic never overestimates the cost to reach the goal, i.e., it is optimistic

• Example: h_SLD(n), the straight-line distance (never overestimates the actual road distance)

• Theorem: If h(n) is admissible, A* using TREE-SEARCH is optimal

Page 100: Applications

Graph Search: Several Structures Used

Graph search: the graph as the search space

Breadth first search: queue
Depth first search: stack
Best first search: priority queue

Stacks and queues, depending on the search strategy

Page 101: Applications

Applications: Acquisition, management and use of knowledge

• Storage and management of Information
• Making Sense of Knowledge
• Acquisition of knowledge
  – Feature Acquisition
  – Concept Abstraction
• Use of knowledge in and as models
  – Problem Solving
  – Simulations

Page 102: Applications

Problem Solving
Simulations

Example: Climate Simulation

Page 103: Applications

Climate Model

Climate Modeling

A multitude of sub-models

[Diagram: a web of many interconnected submodels]

Many stemming from the techniques discussed previously

Page 104: Applications

Physical processes regulating climate

Physical models representing all the interactions that can occur

Page 105: Applications

Radiation: even one physical quantity can have many source models, sink models and interaction models

Page 106: Applications

“Earth System Model”

An ocean model, sea-ice model, land surface model, etc…

[Diagram of coupled components: 3D atmosphere, 3D ocean, 2D sea ice, atmospheric CO2, 2D land surface, land biogeochemistry, ocean biogeochemistry, ocean sediments, 3D ice sheets]

Page 107: Applications

Mathematical models representing physical principles

Page 108: Applications

Meteorological Primitive Equations

• Applicable to a wide range of scales of motion: > 1 hour, > 100 km

Page 109: Applications

Global Climate Model Physics

Terms F, Q, and Sq represent physical processes

• Equations of motion, F
  – turbulent transport, generation, and dissipation of momentum
• Thermodynamic energy equation, Q
  – convective-scale transport of heat
  – convective-scale sources/sinks of heat (phase change)
  – radiative sources/sinks of heat
• Water vapor mass continuity equation, Sq
  – convective-scale transport of water substance
  – convective-scale water sources/sinks (phase change)

Page 110: Applications

Model Physical Parameterizations

Physical processes breakdown:

• Moist Processes
  – Moist convection, shallow convection, large-scale condensation
• Radiation and Clouds
  – Cloud parameterization, radiation
• Surface Fluxes
  – Fluxes from land, ocean and sea ice (from data or models)
• Turbulent mixing
  – Planetary boundary layer parameterization, vertical diffusion, gravity wave drag

Page 111: Applications

Process Models and Parameterization

• Boundary Layer
• Clouds
  – Stratiform
  – Convective
• Microphysics

Page 112: Applications

Evolution of Global Climate Models (GCMs)

… increasing complexity.

Due to demand (we want/need to model more complex systems)

Increased computing power enables more complex models

Page 113: Applications

http://www.usgcrp.gov/usgcrp/images/ocp2003/ocpfy2003-fig3-4.htm

The past, present and future of climate models

During the last 25 years, different components have been added to the climate model to better represent our climate system

Page 114: Applications

Grid Discretizations

The equations are discretized on a grid over the sphere

• Different grid approaches:
  – Rectilinear (lat-lon)
  – Reduced grids
  – 'Equal area' grids: icosahedral, cubed sphere
  – Spectral transforms
• Different numerical methods for the solution:
  – Spectral transforms
  – Finite element
  – Lagrangian (semi-Lagrangian)
• Vertical discretization:
  – Terrain following (sigma)
  – Pressure
  – Isentropic
  – Hybrid sigma-pressure (most common)

The heart of Computational Fluid Dynamics (CFD)
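To make grid discretization concrete, here is a minimal sketch (not from the slides: a 1-D linear advection equation, dq/dt + u dq/dx = 0, on a periodic grid, stepped with a first-order upwind finite-difference scheme):

    import numpy as np

    # Periodic 1-D grid: a toy stand-in for one latitude circle.
    nx, u, dx, dt = 100, 1.0, 1.0, 0.5        # CFL = u*dt/dx = 0.5 (stable)
    q = np.exp(-0.01 * (np.arange(nx) - 50.0) ** 2)   # initial blob at x = 50

    for _ in range(60):                        # march forward in time
        # Upwind difference for u > 0: dq/dx ~ (q[i] - q[i-1]) / dx
        q = q - u * dt / dx * (q - np.roll(q, 1))

    print("blob centre is now near x =", q.argmax())   # advected by u*t = 30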

Page 115: Applications

Different time and spatial scales: microscopic properties intermingling with macroscopic properties

Fast processes (e.g. molecular reactions) interacting with very slow processes (e.g. transport/movement of molecules to other regions)

This often makes solving the problems mathematically very difficult

Page 116: Applications

How did I get here?

[Scales: the planetary scale (~10^7 m), the cloud cluster scale (~10^5 m), the cloud scale (~10^3 m), and the cloud microphysical scale (~10^-6 m to 1 m)]

Page 117: Applications

Scales of Atmospheric Motions/Processes

(Anthes et al., 1975)

[Figure: resolved scales – global models, future global models, and cloud/mesoscale/turbulence models, down to cloud drops, microphysics and chemistry]

Page 118: Applications

[Figure: atmospheric phenomena and the models that resolve them, from mm to 10,000 km – cloud microphysics (mm; DNS); turbulence (~10–100 m; Large Eddy Simulation (LES) models); cumulus and cumulonimbus clouds (~1–10 km; Cloud System Resolving Models, CSRM); mesoscale convective systems and extratropical cyclones (~100–1000 km; Numerical Weather Prediction, NWP, models); planetary waves (~10,000 km; Global Climate Models)]

No single model can encompass all relevant processes

Page 119: Applications

Knowledge Representation: Abstraction

You choose how to represent reality

The choice is not unique

It depends on what aspect of reality you want to represent and how

Page 120: Applications

Applications: Acquisition, management and use of knowledge

• Storage and management of Information
• Making Sense of Knowledge
• Acquisition of knowledge
  – Feature Acquisition
  – Concept Abstraction
• Use of knowledge in and as models
  – Problem Solving
  – Simulations

Page 121: Applications

Storage and management of Information

Table A
Name        Address          Parcel #
John Smith  18 Lawyers Dr.   756554
T. Brown    14 Summers Tr.   887419

Table B
Parcel #    Assessed Value
887419      152,000
446397      100,000

Page 122: Applications

Making Sense of Knowledge

Concept: conceptual entity of the domain
Attribute: property of a concept
Relation: relationship between concepts or properties
Axiom: coherent description of concepts / properties / relations via logical expressions

[Example ontology diagram: an isA hierarchy (taxonomy) with Person (attributes: name, email) specialized into Student (attribute: student nr.) and Professor (attribute: research field); a Student attends a Lecture (attributes: topic, lecture nr.), which a Professor holds.]

Page 123: Applications

Acquisition of knowledge: Feature Acquisition

"flat" region: no change in all directions

"edge": no change along the edge direction

"corner": significant change in all directions

From a square sampling of pixels

Page 124: Applications

Acquisition of knowledge: Concept Abstraction

P(X|C1)*P(C1) > P(X|C2)*P(C2) → X will buy a computer

Abdomen Length > 7.1?
  no  → Antenna Length > 6.0?
          no  → Grasshopper
          yes → Katydid
  yes → Katydid

Page 125: Applications

Use of knowledge in and as models: Problem Solving

[Example search tree, as above]

Breadth first search: queue
Depth first search: stack
Best first search: priority queue

Page 126: Applications

Use of knowledge in and as models: Simulations

Page 127: Applications

Applications

You choose how to represent reality