Applications: Artificial Intelligence and Simulations
Applications
A palette of data structures and algorithms: choose which to use, and combine them, to form your model of reality.
From reality to the model: the modeler and the computer.
You are the artist and the computer is your canvas.
Knowledge Representation: Abstraction
You choose how to represent reality
The choice is not unique
It depends on what aspect of reality you want to represent and how
Applications: Acquisition, management and use of knowledge
Theme of lecture:
Abstraction of reality through knowledge engineering
Applications: Acquisition, management and use of knowledge
• Storage and management of Information
• Making Sense of Knowledge
• Acquisition of knowledge
  – Feature Acquisition
  – Concept Abstraction
• Use of knowledge in and as models
  – Problem Solving
  – Simulations
Storing and Managing Information
Table of data
Database Management Systems (DBMS): storage and retrieval of properties of objects.
Spreadsheets: manipulations of and calculations with the data in the table.
Each row is a particular object; each column is a property associated with that object.
Two examples/paradigms of management systems.
Database Management System (DBMS)
Organizes data in sets of tables
Relational Database Management System (RDBMS)
Table A
Name        Address          Parcel #
John Smith  18 Lawyers Dr.   756554
T. Brown    14 Summers Tr.   887419

Table B
Parcel #    Assessed Value
887419      152,000
446397      100,000
Provides relationships between data in the tables.
Using SQL – Structured Query Language
• SQL is a standard database protocol, adopted by most 'relational' databases
• Provides syntax for data:
  – Definition
  – Retrieval
  – Functions (COUNT, SUM, MIN, MAX, etc.)
  – Updates and deletes
• SELECT list FROM table WHERE condition
  – list: a list of items, or * for all items
  – WHERE: a logical expression limiting the number of records selected; can be combined with Boolean logic: AND, OR, NOT
  – ORDER may be used to format results
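As a concrete illustration of the SELECT syntax above, here is a minimal sketch using Python's built-in sqlite3 module to recreate Table A and Table B and join them on Parcel #. The table and column names (owners, parcels, assessed_value) are our own naming, not taken from the slides.

```python
import sqlite3

# In-memory database holding the two example tables from the slides.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE owners (name TEXT, address TEXT, parcel INTEGER)")
con.execute("CREATE TABLE parcels (parcel INTEGER, assessed_value INTEGER)")
con.executemany("INSERT INTO owners VALUES (?, ?, ?)",
                [("John Smith", "18 Lawyers Dr.", 756554),
                 ("T. Brown", "14 Summers Tr.", 887419)])
con.executemany("INSERT INTO parcels VALUES (?, ?)",
                [(887419, 152000), (446397, 100000)])

# SELECT list FROM table WHERE condition, using the parcel # relationship:
rows = con.execute(
    "SELECT o.name, p.assessed_value "
    "FROM owners o, parcels p "
    "WHERE o.parcel = p.parcel AND p.assessed_value > 100000 "
    "ORDER BY o.name").fetchall()
print(rows)  # [('T. Brown', 152000)]
```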
Spreadsheets
Every row is a different "object" with a set of properties.
Every column is a different property of the row object.
Spreadsheet: Organization of elements
Columns (A, B, C, …) and rows (1, 2, 3, …) serve as row and column indices.
Cells with addresses such as A7, B4, C10, D5 are used to access each cell.
Spreadsheet Formulas
Formula: Combination of values or cell references and mathematical operators such as +, -, /, *
The formula displays in the entry bar. This formula is used to add the values in the four cells. The sum is displayed in cell B7.
The results of a formula display in the cell.
With cell, row and column functions, e.g. AVERAGE, SUM, MIN, MAX.
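To make the formula idea concrete, here is a minimal Python sketch (our own illustration, not any spreadsheet's actual engine) that stores cells in a dictionary keyed by address and evaluates the equivalent of =SUM(B3:B6):

```python
# Cells keyed by address, as in a spreadsheet grid.
cells = {"B3": 10, "B4": 20, "B5": 5, "B6": 7}

def cell_range(start, end):
    """Expand a single-column range like B3:B6 into cell addresses."""
    col, r0, r1 = start[0], int(start[1:]), int(end[1:])
    return [f"{col}{r}" for r in range(r0, r1 + 1)]

# Equivalent of putting =SUM(B3:B6) in cell B7.
cells["B7"] = sum(cells[a] for a in cell_range("B3", "B6"))
print(cells["B7"])  # 42
```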
Visualizing data: Charts
Applications: Acquisition, management and use of knowledge
• Storage and management of Information
• Making Sense of Knowledge
• Acquisition of knowledge
  – Feature Acquisition
  – Concept Abstraction
• Use of knowledge in and as models
  – Problem Solving
  – Simulations
Making Sense of Knowledge
"Time flies like an arrow." (proverb)
"Fruit flies like a banana." (Groucho Marx)
There are semantics and context behind all words.
"Flies": 1. the act of flying, 2. the insect.
"Like": 1. similar to, 2. are fond of.
There is also the elusive “Common Sense”
1. One type of fly, the fruit fly, is fond of bananas.
2. Fruit, in general, flies through the air just like a banana.
3. One type of fly, the fruit fly, is just like a banana.
A bit complicated, because we are speaking metaphorically: time is not really an object, like a bird, which flies.
Translation is not just a one-to-one search in the dictionary. Complex search is not just searching for individual words.
Google translate
Adding Semantics: Ontologies
Concept: a conceptual entity of the domain.
Attribute: a property of a concept.
Relation: a relationship between concepts or properties.
Axiom: a coherent description of Concepts / Properties / Relations via logical expressions.
[Diagram: example ontology. An isA hierarchy (taxonomy) in which Student and Professor are subclasses of Person (attributes: name, email); a Student (student nr.) attends a Lecture (topic, lecture nr.); a Professor (research field) holds a Lecture.]
Structuring of:
• Background Knowledge
• "Common Sense" knowledge
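A minimal sketch of how such an ontology might be written down in code, assuming the rdflib Python library and an invented example.org namespace (the slides do not prescribe any particular tooling):

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/uni#")
g = Graph()

# isA hierarchy (taxonomy): Student and Professor are kinds of Person.
g.add((EX.Student, RDFS.subClassOf, EX.Person))
g.add((EX.Professor, RDFS.subClassOf, EX.Person))

# Relations between concepts: attends and holds link people to lectures.
g.add((EX.attends, RDFS.domain, EX.Student))
g.add((EX.attends, RDFS.range, EX.Lecture))
g.add((EX.holds, RDFS.domain, EX.Professor))
g.add((EX.holds, RDFS.range, EX.Lecture))

# An instance with an attribute.
g.add((EX.anna, RDF.type, EX.Student))
g.add((EX.anna, EX.attends, EX.ai_lecture))
g.add((EX.anna, EX.name, Literal("Anna")))

print(g.serialize(format="turtle"))
```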
17
Structure of an Ontology
Ontologies typically have two distinct components:
1. Names for important concepts in the domain
   – Elephant is a concept whose members are a kind of animal
   – Herbivore is a concept whose members are exactly those animals who eat only plants or parts of plants
   – Adult_Elephant is a concept whose members are exactly those elephants whose age is greater than 20 years
2. Background knowledge/constraints on the domain
   – Adult_Elephants weigh at least 2,000 kg
   – All Elephants are either African_Elephants or Indian_Elephants
   – No individual can be both a Herbivore and a Carnivore
Ontology Definition
"Formal, explicit specification of a shared conceptualization" [Gruber93]
– shared: commonly accepted understanding
– conceptualization: conceptual model of a domain (ontological theory)
– explicit: unambiguous terminology definitions
– formal: machine-readability with computational semantics
The Semantic Web: Ontology implementation
"The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." -- Tim Berners-Lee
[Figure: "the wedding cake", the layered Semantic Web architecture.]
Applications: Acquisition, management and use of knowledge
• Storage and management of Information
• Making Sense of Knowledge
• Acquisition of knowledge
  – Feature Acquisition
  – Concept Abstraction
• Use of knowledge in and as models
  – Problem Solving
  – Simulations
Abstracting Knowledge
Several levels of, and reasons for, abstracting knowledge:
Feature abstraction: simplifying "reality" so the knowledge can be used in computer data structures and algorithms.
Concept abstraction: organizing and making sense of the immense amount of data/knowledge we have.
Modeling abstraction: making usable and predictive models of reality.
Feature Abstraction: Simplifying "reality" so the knowledge can be used in computer data structures and algorithms.
A photograph of a face is a set of pixels.
Is it a face? Whose face?
Feature Abstraction: Simplifying "reality" so the knowledge can be used in computer data structures and algorithms.
A photograph of a face. Is it a face? Whose face?
The eye sees the pixels; in the visual cortex, features are detected.
Feature Abstraction: Simplifying "reality" so the knowledge can be used in computer data structures and algorithms.
A photograph is made up of pixels. The pixels need to be converted to data structures the algorithms can understand.
Feature Abstraction: Boundary Detection
• Is this a boundary?
Feature Detection
• "flat" region: no change in all directions
• "edge": no change along the edge direction
• "corner": significant change in all directions
Harris Detector: Intuition
From a square sampling of pixels
Principal Component Analysis (PCA)
• Finds a map of the principal components (PCs) of the data into an orthogonal space
• Method: find the set of eigenvalues in a vector space:
  – The eigenvectors are the principal components
  – The eigenvalues give the ranking of the vectors
• PCs:
  – Variables with the largest variances
  – Orthogonality (each coordinate is orthogonal)
  – Linearity
  – Optimal least mean-square error
• Limitations?
  – Strict linearity, specific distribution
  – Large variance assumption
[Figure: data in the (x1, x2) plane with principal components PC 1 and PC 2. PCA rotates the coordinate system.]
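A minimal sketch of PCA via eigendecomposition of the covariance matrix, using NumPy on illustrative data of our own making:

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative 2-D data, stretched along one direction and then rotated.
x = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.3]])
x = x @ np.array([[np.cos(0.5), -np.sin(0.5)], [np.sin(0.5), np.cos(0.5)]])

x = x - x.mean(axis=0)            # center the data
cov = np.cov(x, rowvar=False)     # 2x2 covariance matrix
vals, vecs = np.linalg.eigh(cov)  # eigenvalues ascending, eigenvectors in columns

order = np.argsort(vals)[::-1]    # rank PCs by variance (largest eigenvalue first)
vals, vecs = vals[order], vecs[:, order]

scores = x @ vecs                 # rotate the coordinate system onto the PCs
print("variance along PC1, PC2:", vals)
```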
Feature Detection
( , ) ,u
E u v u v Mv
Intensity change in shifting window: eigenvalue analysis
1, 2 – eigenvalues of M
direction of the slowest change
direction of the fastest change
(max)-1/2
(min)-1/2
Ellipse E(u,v) = const
Harris Detector: Mathematics of the analysis of pixels, a transformation of coordinates.
Principal component analysis can reduce the set of coordinates to one coordinate; the other coordinate is noise (all points are "shifted" to the principal component).
Harris Detector: Mathematics
Classification of the new coordinates in the (λ1, λ2) plane:
• "Flat" region: λ1 and λ2 are small; E is almost constant in all directions
• "Edge": λ1 >> λ2 (or λ2 >> λ1)
• "Corner": λ1 and λ2 are large, λ1 ~ λ2; E increases in all directions
PCA: Feature from pixels: an edge.
One principal component lies along the line; the other component is small.
Note that the line can be in any direction: the principal component follows the line, so the measure is rotation invariant.
PCA: Feature from pixels: a "flat" region.
There is no line, and no dominant principal component (λ1 and λ2 are both small).
PCA: Feature from pixels: a corner.
There are two lines in (almost) orthogonal (perpendicular) directions: two principal components (λ1 and λ2 are both large).
Feature Detection
The ellipse rotates, but its shape (i.e. its eigenvalues) remains the same, so the corner response R is invariant to image rotation.
Important property: rotational invariance.
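A minimal sketch of the Harris corner response, assuming NumPy and SciPy are available; the constant k = 0.04 is a conventional choice, not taken from the slides:

```python
import numpy as np
from scipy import ndimage

def harris_response(image, sigma=1.0, k=0.04):
    """Corner response R = det(M) - k * trace(M)^2 at each pixel."""
    ix = ndimage.sobel(image, axis=1, output=float)   # horizontal gradient
    iy = ndimage.sobel(image, axis=0, output=float)   # vertical gradient
    # Structure tensor M, averaged over a Gaussian window.
    ixx = ndimage.gaussian_filter(ix * ix, sigma)
    iyy = ndimage.gaussian_filter(iy * iy, sigma)
    ixy = ndimage.gaussian_filter(ix * iy, sigma)
    det = ixx * iyy - ixy ** 2        # = lambda1 * lambda2
    trace = ixx + iyy                 # = lambda1 + lambda2
    return det - k * trace ** 2       # large only where both eigenvalues are large

# Toy image with a bright square: its corners give the strongest response.
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
r = harris_response(img)
print(np.unravel_index(np.argmax(r), r.shape))
```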
SIFT Descriptor
• A 16x16 gradient window is taken and partitioned into 4x4 subwindows.
• A histogram of the 4x4 samples is built in 8 directions.
• Gaussian weighting around the center (σ is 0.5 times the scale of the keypoint).
• 4x4x8 = 128-dimensional feature vector.
Another localized feature from the pixels.
Feature Detection
• Use the scale/orientation determined by the detector to define a normalized frame.
• Compute a descriptor in this frame.
Scale example:
• moments integrated over an adapted window
• derivatives adapted to scale: s·Ix
Scale & orientation example: resample all points/regions to 11x11 pixels
• PCA coefficients
• Principal components of all points.
SIFT descriptors are also invariant to scale/orientation.
Feature Abstraction: Simplifying "reality" so the knowledge can be used in computer data structures and algorithms.
The new "features" are represented in data structures that can be used in algorithms.
Hierarchy of analysis
Hierarchy of features: from simple primitive features to complex combinations of simple features.
Example: Face Detection
• Scan a window over the image.
• Classify each window as either: face or non-face.
[Diagram: window → classifier → face / non-face]
Built from the established features.
Face Detection Algorithm
[Flowchart: Input Image → Lighting Compensation → Color Space Transformation → Skin Color Detection → Variance-based Segmentation → Connected Component & Grouping (Face Localization); then Facial Feature Detection: Eye/Mouth Detection → Face Boundary Detection → Verifying/Weighting Eyes-Mouth Triangles → Output Image.]
Applications: Acquisition, management and use of knowledge
• Storage and management of Information
• Making Sense of Knowledge
• Acquisition of knowledge
  – Feature Acquisition
  – Concept Abstraction
• Use of knowledge in and as models
  – Problem Solving
  – Simulations
Concept Abstraction: Organizing and making sense of the immense amount of data/knowledge we have.
Generalization
The ability of an algorithm to perform accurately on new, unseen examples after training on a learning data set.
Generalization
Consider the following regression problem: predict the real value on the y-axis from the real value on the x-axis. You are given 6 examples {Xi, Yi}. What is the y-value for a new query x*?
Several different curves can be drawn through the 6 examples; which curve is best?
Generalization
Occam's razor: prefer the simplest hypothesis consistent with the data.
Have to find a balance of constraints.
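A minimal sketch of the trade-off, assuming NumPy: fitting the same 6 points with polynomials of increasing degree. The degree-5 curve passes through every training point, yet may generalize poorly to a new query x* (the data and x* here are our own invention):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 6)                       # 6 training examples {Xi, Yi}
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 6)

x_star = 0.45                                  # new query point
truth = np.sin(2 * np.pi * x_star)
for degree in (1, 3, 5):
    coeffs = np.polyfit(x, y, degree)          # least-squares polynomial fit
    pred = np.polyval(coeffs, x_star)
    print(f"degree {degree}: prediction {pred:+.3f} (truth {truth:+.3f})")
```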
Two Schools of Thought
1. Statistical "Learning": the data is reduced to vectors of numbers, and statistical techniques are used for the tasks to be performed.
2. Structural "Learning": the data is converted to a discrete structure (such as a grammar or a graph), and the techniques are related to computer science subjects (such as parsing and graph matching).
A spectrum of machine learning tasks: Statistics --------------------- Artificial Intelligence
At the Statistics end:
• Low-dimensional data (e.g. fewer than 100 dimensions)
• Lots of noise in the data
• There is not much structure in the data, and what structure there is can be represented by a fairly simple model
• The main problem is distinguishing true structure from noise
At the Artificial Intelligence end:
• High-dimensional data (e.g. more than 100 dimensions)
• The noise is not sufficient to obscure the structure in the data if we process it right
• There is a huge amount of structure in the data, but the structure is too complicated to be represented by a simple model
• The main problem is figuring out a way to represent the complicated structure so that it can be learned
Concept Acquisition
Three families: supervised learning, unsupervised learning, and statistical methods.
Supervised learning: learning with the presence of an expert. The data is labelled with a class or value; the goal is to predict the class or value label.
[Figure: labelled data points in three classes c1, c2, c3.]
Learn the properties of a classification for decision making: predict (classify) a sample into a discrete set of class labels,
e.g. C = {object 1, object 2, …} for a recognition task; C = {object, !object} for a detection task; spam vs. no-spam.
Unsupervised learning: learning without the presence of an expert. The data is unlabelled; the goal is to determine data patterns/groupings and the properties of that classification.
Association or clustering: grouping a set of instances by attribute similarity, e.g. image segmentation.
Key concept: Similarity
Statistical Methods
Learning within the constraints of the method: the data is basically an n-dimensional set of numerical attributes, and the algorithms are deterministic/mathematical, based on probability distributions.
Regression: predict for a sample an associated real (continuous) value, e.g. data fitting.
Principal Component Analysis: transform to a new (simpler) set of coordinates, e.g. find the major component of the data.
[Figure: data in the (x1, x2) plane with principal components PC1 and PC2.]
Pattern Recognition: another name for machine learning.
• A pattern is an object, process or event that can be given a name.
• A pattern class (or category) is a set of patterns sharing common attributes and usually originating from the same source.
• During recognition (or classification), given objects are assigned to prescribed classes.
• A classifier is a machine which performs classification.
"The assignment of a physical object or event to one of several prespecified categories" -- Duda & Hart
Cross-Validation
In the mathematics of statistics there is a mathematical definition of the error as a function of the probability distribution (e.g. average, standard deviation). In machine learning, no such distribution exists.
Instead, split the full data set into a training set (used to build the ML data structure) and a test set (used to determine the error).
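A minimal sketch of a train/test split and k-fold cross-validation, assuming scikit-learn, with its bundled iris data used purely as a stand-in data set:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Single split: build the model on the training set, measure error on the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier().fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))

# 5-fold cross-validation: every example serves once as test data.
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=5)
print("cross-validated accuracy:", scores.mean())
```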
Classification algorithms:
– Fisher linear discriminant
– KNN
– Decision tree
– Neural networks
– SVM
– Naïve Bayes
– AdaBoost
– Many, many more…
Each one has its properties w.r.t. bias, speed, accuracy, transparency…
Feature extraction
Task: to extract features which are good for classification.
Good features:
• Objects from the same class have similar feature values.
• Objects from different classes have different values.
[Figure: "good" features separate the classes; "bad" features overlap.]
Similarity
Two objects belong to the same classification if they are "close", i.e. the distance between them is small.
We need a function F(object1, object2) = the "distance" between them.
Similarity measure / distance metric
• How do we measure what it means to be "close"?
• Depending on the problem, we should choose an appropriate distance metric.
For example: least squares distance
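A minimal sketch of one such distance function, the least-squares (Euclidean) distance, with NumPy:

```python
import numpy as np

def distance(a, b):
    """Least-squares (Euclidean) distance between two feature vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sqrt(np.sum((a - b) ** 2))

print(distance([0, 0], [3, 4]))  # 5.0: "close" means this value is small
```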
Types of Model: Generative vs. Discriminative
Overfitting and underfitting
Problem: how rich a class of classifiers q(x; θ) to use.
[Figure: underfitting / good fit / overfitting.]
Problem of generalization: a small empirical risk Remp does not imply a small true expected risk R.
Generative: Cluster Analysis
Create "clusters" depending on the distance metric.
Hierarchical clustering: based on "how close" objects are.
KNN – K nearest neighbors
Find the k nearest neighbors of the test example, and infer its class from their known classes, e.g. K = 3.
[Figure: query point "?" in the (x1, x2) plane, classified by its 3 nearest labelled neighbors.]
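A minimal KNN sketch with NumPy, on toy 2-D data of our own making and K = 3 as in the slide:

```python
import numpy as np
from collections import Counter

def knn_classify(query, points, labels, k=3):
    """Infer the query's class from its k nearest labelled neighbors."""
    dists = np.sqrt(((points - query) ** 2).sum(axis=1))  # Euclidean distances
    nearest = np.argsort(dists)[:k]                        # indices of the k closest
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.8]])
labels = ["c1", "c1", "c1", "c2", "c2"]
print(knn_classify(np.array([1.1, 0.9]), points, labels))  # c1
```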
Discriminative: Support Vector Machine
• Q: How do we draw the optimal linear separating hyperplane? A: By maximizing the margin.
• Margin maximization: the distance between H+1 and H-1 is 2/||w||, so ||w|| should be minimized.
[Figure: separating hyperplane with the margin between H+1 and H-1.]
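A minimal sketch using scikit-learn's SVC with a linear kernel on toy separable data (our own example); the margin width 2/||w|| is computed from the fitted weights:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters.
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C approximates a hard margin
w = clf.coef_[0]
print("margin width 2/||w|| =", 2 / np.linalg.norm(w))
print("prediction for (2, 2):", clf.predict([[2, 2]]))
```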
Prediction Based on Bayes' Theorem
• Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:
  P(H|X) = P(X|H) P(H) / P(X)
• Informally, this can be viewed as: posterior = likelihood x prior / evidence
• Predict that X belongs to Ci iff the probability P(Ci|X) is the highest among all the P(Ck|X) for all k classes
• Practical difficulty: it requires initial knowledge of many probabilities, involving significant computational cost
Naïve Bayes Classifier
Training data:
age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31…40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no

Classes: C1: buys_computer = 'yes'; C2: buys_computer = 'no'
P(buys_computer = "yes") = 9/14 = 0.643
P(buys_computer = "no") = 5/14 = 0.357
X = (age <= 30, income = medium, student = yes, credit_rating = fair)
Naïve Bayes Classifier (same training table as above)
Classes: C1: buys_computer = 'yes'; C2: buys_computer = 'no'
We want to classify X = (age <= 30, income = medium, student = yes, credit_rating = fair).
Will X buy a computer?
Naïve Bayes Classifier
Key: conditional probability. P(X|Y) is the probability that X is true, given Y:
P(not rain | sunny) > P(rain | sunny)
P(not rain | not sunny) < P(rain | not sunny)
Classifier: we also have to include the probability of the condition itself:
P(not rain | sunny) * P(sunny): how often did it really not rain, given that it was actually sunny?
Naïve Bayes Classifier
Classes: C1: buys_computer = 'yes'; C2: buys_computer = 'no'
We want to classify X = (age <= 30, income = medium, student = yes, credit_rating = fair). Will X buy a computer?
Which "conditional probability" is greater?
P(X|C1) * P(C1) > P(X|C2) * P(C2): X will buy a computer
P(X|C1) * P(C1) < P(X|C2) * P(C2): X will not buy a computer
Naïve Bayes Classifier (same training table as above)
Classes: C1: buys_computer = 'yes'; C2: buys_computer = 'no'
X = (age <= 30, income = medium, student = yes, credit_rating = fair)
P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6
Naïve Bayes Classifier
• Compute P(X|Ci) for each class:
P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6
P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
P(income = "medium" | buys_computer = "no") = 2/5 = 0.4
P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
P(student = "yes" | buys_computer = "no") = 1/5 = 0.2
P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
P(credit_rating = "fair" | buys_computer = "no") = 2/5 = 0.4

P(X|Ci):
P(X | buys_computer = "yes") = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X | buys_computer = "no") = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci) * P(Ci):
P(X | buys_computer = "yes") * P(buys_computer = "yes") = 0.028
P(X | buys_computer = "no") * P(buys_computer = "no") = 0.007
Therefore, since 0.028 > 0.007, X belongs to class "buys_computer = yes".
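A minimal sketch that reproduces the slide's computation from the training table, in plain Python with no libraries beyond the standard collections module:

```python
from collections import Counter

# (age, income, student, credit_rating, buys_computer) rows from the table.
data = [
    ("<=30", "high", "no", "fair", "no"), ("<=30", "high", "no", "excellent", "no"),
    ("31…40", "high", "no", "fair", "yes"), (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"), (">40", "low", "yes", "excellent", "no"),
    ("31…40", "low", "yes", "excellent", "yes"), ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"), (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"), ("31…40", "medium", "no", "excellent", "yes"),
    ("31…40", "high", "yes", "fair", "yes"), (">40", "medium", "no", "excellent", "no"),
]
x = ("<=30", "medium", "yes", "fair")  # the query X

class_counts = Counter(row[-1] for row in data)          # {'yes': 9, 'no': 5}
scores = {}
for c, n_c in class_counts.items():
    p = n_c / len(data)                                  # prior P(Ci)
    for i, value in enumerate(x):                        # naïve independence:
        match = sum(1 for row in data if row[-1] == c and row[i] == value)
        p *= match / n_c                                 # P(attribute_i = value | Ci)
    scores[c] = p

print(scores)                       # yes: ~0.028, no: ~0.007
print(max(scores, key=scores.get))  # yes
```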
Decision Tree Classifier (Ross Quinlan)
[Scatter plot: Antenna Length vs. Abdomen Length (each on a 1–10 scale) for grasshoppers and katydids.]
Abdomen Length > 7.1?
  yes → Katydid
  no  → Antenna Length > 6.0?
          yes → Katydid
          no  → Grasshopper
Decision trees predate computers.
[Diagram: an insect identification key with tests such as "Antennae shorter than body?", "3 Tarsi?" and "Foretibia has ears?" (yes → Katydids, no → Camel Cricket), with Cricket and other insects at the leaves.]
• Decision tree
  – A flow-chart-like tree structure
  – Internal nodes denote a test on an attribute
  – Branches represent outcomes of the test
  – Leaf nodes represent class labels or class distributions
• Decision tree generation consists of two phases
  – Tree construction
    • At the start, all the training examples are at the root
    • Partition the examples recursively based on selected attributes
  – Tree pruning
    • Identify and remove branches that reflect noise or outliers
• Use of a decision tree: classifying an unknown sample
  – Test the attribute values of the sample against the decision tree

Decision Tree Classification
• Basic algorithm (a greedy algorithm)
  – The tree is constructed in a top-down recursive divide-and-conquer manner
  – At the start, all the training examples are at the root
  – Attributes are categorical (if continuous-valued, they can be discretized in advance)
  – Examples are partitioned recursively based on selected attributes
  – Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
• Conditions for stopping partitioning
  – All samples for a given node belong to the same class
  – There are no remaining attributes for further partitioning (majority voting is employed for classifying the leaf)
  – There are no samples left
How do we construct the decision tree?
Information Gain as a Splitting Criterion
• Select the attribute with the highest information gain (information gain is the expected reduction in entropy).
• Assume there are two classes, P and N
  – Let the set of examples S contain p elements of class P and n elements of class N
  – The amount of information needed to decide if an arbitrary example in S belongs to P or N is the entropy
    E(S) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))
  – 0 log(0) is defined as 0
Information Gain in Decision Tree Induction
• Assume that, using attribute A, the current set will be partitioned into some number of child sets
• The encoding information that would be gained by branching on A:
  Gain(A) = E(current set) - E(all child sets)
Note: entropy is at its minimum if the collection of objects is completely uniform.
Person   Hair Length  Weight  Age  Class
Homer    0"           250     36   M
Marge    10"          150     34   F
Bart     2"           90      10   M
Lisa     6"           78      8    F
Maggie   4"           20      1    F
Abe      1"           170     70   M
Selma    8"           160     41   F
Otto     10"          180     38   M
Krusty   6"           200     45   M

Comic    8"           290     38   ?
Let us try splitting on Hair Length:
Hair Length <= 5? (yes / no)
Entropy(4F,5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
Entropy(1F,3M) = -(1/4)log2(1/4) - (3/4)log2(3/4) = 0.8113
Entropy(3F,2M) = -(3/5)log2(3/5) - (2/5)log2(2/5) = 0.9710
Gain(Hair Length <= 5) = 0.9911 - (4/9 x 0.8113 + 5/9 x 0.9710) = 0.0911
Let us try splitting on Weight:
Weight <= 160? (yes / no)
Entropy(4F,5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
Entropy(4F,1M) = -(4/5)log2(4/5) - (1/5)log2(1/5) = 0.7219
Entropy(0F,4M) = -(0/4)log2(0/4) - (4/4)log2(4/4) = 0
Gain(Weight <= 160) = 0.9911 - (5/9 x 0.7219 + 4/9 x 0) = 0.5900
Let us try splitting on Age:
Age <= 40? (yes / no)
Entropy(4F,5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
Entropy(3F,3M) = -(3/6)log2(3/6) - (3/6)log2(3/6) = 1
Entropy(1F,2M) = -(1/3)log2(1/3) - (2/3)log2(2/3) = 0.9183
Gain(Age <= 40) = 0.9911 - (6/9 x 1 + 3/9 x 0.9183) = 0.0183
Of the 3 features we had, Weight was best. But while people who weigh over 160 are perfectly classified (as males), the under-160 people are not perfectly classified… So we simply recurse!
This time we find that we can split on Hair Length, and we are done:
Weight <= 160?
  no  → Male
  yes → Hair Length <= 2?
          yes → Male
          no  → Female
We don't need to keep the data around, just the test conditions.
How would these people be classified?
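A minimal sketch that reproduces the three gain computations above, in plain Python over the table of 9 labelled people:

```python
from math import log2

people = [  # (hair_length, weight, age, class)
    (0, 250, 36, "M"), (10, 150, 34, "F"), (2, 90, 10, "M"),
    (6, 78, 8, "F"), (4, 20, 1, "F"), (1, 170, 70, "M"),
    (8, 160, 41, "F"), (10, 180, 38, "M"), (6, 200, 45, "M"),
]

def entropy(rows):
    """E(S) = -(p/(p+n))log2(p/(p+n)) - (n/(p+n))log2(n/(p+n)); 0 log(0) = 0."""
    if not rows:
        return 0.0
    p = sum(1 for r in rows if r[-1] == "F") / len(rows)
    return sum(-q * log2(q) for q in (p, 1 - p) if q > 0)

def gain(rows, attr, threshold):
    """Gain(A) = E(current set) - weighted E(child sets) for the test attr <= threshold."""
    yes = [r for r in rows if r[attr] <= threshold]
    no = [r for r in rows if r[attr] > threshold]
    children = (len(yes) / len(rows)) * entropy(yes) + (len(no) / len(rows)) * entropy(no)
    return entropy(rows) - children

print(gain(people, 0, 5))    # Hair Length <= 5  -> ~0.0911
print(gain(people, 1, 160))  # Weight <= 160     -> ~0.5900
print(gain(people, 2, 40))   # Age <= 40         -> ~0.0183
```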
Applications: Acquisition, management and use of knowledge
• Storage and management of Information
• Making Sense of Knowledge
• Acquisition of knowledge
  – Feature Acquisition
  – Concept Abstraction
• Use of knowledge in and as models
  – Problem Solving
  – Simulations
Using Knowledge: Problem Solving and Simulations
Problem solving: searching for a solution.
Simulations: combining models to form a large comprehensive model.
Problem Solving
Searching for a solution through all possible solutions: a fundamental algorithm in artificial intelligence.
The basis of the search is the order in which nodes are evaluated and expanded, determined by two lists:
OPEN: list of unexpanded nodes
CLOSED: list of expanded nodes
Graph Search
Abstraction: the state of a system.
Examples: chess, tic-tac-toe, the water jug problem, the traveling salesman problem.
In problem solving: search for the steps leading to the solution. The individual steps are the states of the system.
Solution Space: the set of all states of the problem, including the goal state(s); all possible board combinations, all possible reference points, all possible combinations.
Search Space
Each system state (node) is connected by rules (connections) on how to get from one state to another: legal moves, paths between points, possible operations.
Strategies to Search the Space of System States
• Breadth first search
• Depth first search
• Best first search
The strategy determines the order in which the states are searched to find a solution.
Breadth-first searching
• A breadth-first search (BFS) explores nodes nearest the root before exploring nodes further away
• For example, after searching A, then B, then C, the search proceeds with D, E, F, G
• Nodes are explored in the order A B C D E F G H I J K L M N O P Q
• J will be found before N
[Tree used in both examples: A → B, C; B → D, E; C → F, G; E → H, I; G → J, K; H → L, M, N; I → O, P; K → Q.]
Depth-first searching
• A depth-first search (DFS) explores a path all the way to a leaf before backtracking and exploring another path
• For example, after searching A, then B, then D, the search backtracks and tries another path from B
• Nodes are explored in the order A B D E H L M N I O P C F G J K Q
• N will be found before J
[Same tree as above.]
Breadth First Search
Expand A to new nodes B, C, D; expand B to new nodes E, F; send new nodes to the back of the queue. Repeat until the goal is reached or OPEN is empty. (In the diagram, items between red bars are siblings.)
Queue: FIFO (first in, first out)

Depth First Search
Expand A to new nodes B, C, D; expand B to new nodes E, F; send new nodes to the front of the stack.
Stack: LIFO (last in, first out)
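A minimal sketch of both strategies over the example tree, assuming a Python dict as the adjacency structure; the only difference between BFS and DFS is which end of the OPEN list new nodes go to:

```python
from collections import deque

# The example tree from the slides, as an adjacency map.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"], "E": ["H", "I"],
        "G": ["J", "K"], "H": ["L", "M", "N"], "I": ["O", "P"], "K": ["Q"]}

def search(start, breadth_first=True):
    open_list, closed = deque([start]), []
    while open_list:                       # until the goal is reached or OPEN is empty
        node = open_list.popleft()         # always take from the front
        closed.append(node)
        children = tree.get(node, [])
        if breadth_first:
            open_list.extend(children)     # queue (FIFO): children go to the back
        else:
            open_list.extendleft(reversed(children))  # stack (LIFO): to the front
    return closed

print("".join(search("A", True)))   # ABCDEFGHIJKLMNOPQ
print("".join(search("A", False)))  # ABDEHLMNIOPCFGJKQ
```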
Best First Search
Breadth first search: queue (FIFO). Depth first search: stack (LIFO).
These are uninformed searches: no knowledge of how good the current solution is (are we on the right track?).
Best First Search: priority queue. Associated with each node is a heuristic F(node) = the quality of the node to lead to a final solution.
A* search
Idea: avoid expanding paths that are already expensive.
• Evaluation function f(n) = g(n) + h(n)
• g(n) = cost so far to reach n
• h(n) = estimated cost from n to goal; this is the hard/unknown part
• f(n) = estimated total cost of the path through n to the goal
If h(n) is an underestimate of the true cost, then the algorithm is guaranteed to find an optimal solution.
Admissible heuristics
• A heuristic h(n) is admissible if for every node n, h(n) ≤ h*(n), where h*(n) is the true cost to reach the goal state from n.
• An admissible heuristic never overestimates the cost to reach the goal, i.e., it is optimistic.
• Example: hSLD(n), the straight-line distance (never overestimates the actual road distance).
• Theorem: If h(n) is admissible, A* using TREE-SEARCH is optimal.
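A minimal A* sketch with Python's heapq, on a small weighted graph of our own invention; the heuristic h never overestimates the remaining cost, so the returned path is optimal:

```python
import heapq

def a_star(graph, h, start, goal):
    """graph: node -> list of (neighbor, step_cost); h: admissible heuristic."""
    open_list = [(h[start], 0, start, [start])]   # priority = f = g + h
    closed = set()
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return g, path
        if node in closed:
            continue
        closed.add(node)
        for nbr, cost in graph.get(node, []):
            heapq.heappush(open_list, (g + cost + h[nbr], g + cost, nbr, path + [nbr]))
    return None

graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 12)], "B": [("G", 3)]}
h = {"S": 6, "A": 5, "B": 3, "G": 0}   # never overestimates the true cost
print(a_star(graph, h, "S", "G"))       # (6, ['S', 'A', 'B', 'G'])
```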
Graph Search: Several structures are used
The graph is the search space; stacks and queues are used, depending on the search strategy:
Breadth first search: queue
Depth first search: stack
Best first search: priority queue
Applications: Acquisition, management and use of knowledge
• Storage and management of Information
• Making Sense of Knowledge
• Acquisition of knowledge
  – Feature Acquisition
  – Concept Abstraction
• Use of knowledge in and as models
  – Problem Solving
  – Simulations
Simulations
Example: Climate Simulation
Climate Model
Climate Modeling
A multitude of sub-models
[Diagram: a climate model as a network of many interconnected submodels.]
Many stemming from the techniques discussed previously.
Physical processes regulating climate: physical models representing all the interactions that can occur.
Radiation: even one physical quantity can have many source models, sink models and interaction models.
"Earth System Model": an ocean model, sea-ice model, land surface model, etc.
[Diagram: Earth System Model components. 3D atmosphere, 3D ocean, 2D sea ice, atmospheric CO2, 2D land surface, land biogeochemistry, ocean biogeochemistry, ocean sediments, 3D ice sheets.]
Mathematical models representing physical principles
Meteorological Primitive Equations
• Applicable to a wide scale of motions: > 1 hour, > 100 km
Global Climate Model Physics
Terms F, Q, and Sq represent physical processes:
• Equations of motion, F
  – turbulent transport, generation, and dissipation of momentum
• Thermodynamic energy equation, Q
  – convective-scale transport of heat
  – convective-scale sources/sinks of heat (phase change)
  – radiative sources/sinks of heat
• Water vapor mass continuity equation
  – convective-scale transport of water substance
  – convective-scale water sources/sinks (phase change)
Model Physical Parameterizations
Physical processes breakdown:
• Moist Processes
  – Moist convection, shallow convection, large scale condensation
• Radiation and Clouds
  – Cloud parameterization, radiation
• Surface Fluxes
  – Fluxes from land, ocean and sea ice (from data or models)
• Turbulent mixing
  – Planetary boundary layer parameterization, vertical diffusion, gravity wave drag

Process Models and Parameterization
• Boundary layer
• Clouds (stratiform, convective)
• Microphysics
Evolution of Global Climate Models (GCMs)
…increasing complexity, due to demand (we want/need to model more complex systems); increased computing power enables more complex models.
http://www.usgcrp.gov/usgcrp/images/ocp2003/ocpfy2003-fig3-4.htm
The past, present and future of climate models: during the last 25 years, different components have been added to the climate model to better represent our climate system.
Grid Discretizations
Equations are distributed on a sphere.
• Different grid approaches:
  – Rectilinear (lat-lon)
  – Reduced grids
  – 'Equal area' grids: icosahedral, cubed sphere
  – Spectral transforms
• Different numerical methods for solution:
  – Spectral transforms
  – Finite element
  – Lagrangian (semi-Lagrangian)
• Vertical discretization:
  – Terrain following (sigma)
  – Pressure
  – Isentropic
  – Hybrid sigma-pressure (most common)
This is the heart of Computational Fluid Dynamics (CFD).
Different time and spatial scales: microscopic properties intermingling with macroscopic properties.
Fast processes (e.g. molecular reactions) interact with very slow processes (e.g. transport/movement of molecules to other regions).
This often makes mathematically solving the problems very difficult.
[Figure: Scales of Atmospheric Motions/Processes (Anthes et al., 1975). The planetary scale ~10^7 m; cloud cluster scale ~10^5 m; cloud scale ~10^3 m; cloud microphysical scale ~10^-6 m to 1 m.]
[Figure: Resolved scales of different model families, on an axis from 10 m to 10000 km: turbulence, cumulus clouds, cumulonimbus clouds, mesoscale convective systems, extratropical cyclones, planetary waves. DNS resolves the smallest (mm) scales and cloud microphysics; Large Eddy Simulation (LES) models and Cloud System Resolving Models (CSRM) cover cloud/mesoscale/turbulence; Numerical Weather Prediction (NWP) models and Global Climate Models (including future global models) cover the largest scales.]
No single model can encompass all relevant processes.
Knowledge Representation: Abstraction
You choose how to represent reality
The choice is not unique
It depends on what aspect of reality you want to represent and how
Applications: Acquisition, management and use of knowledge
• Storage and management of Information
• Making Sense of Knowledge
• Acquisition of knowledge
  – Feature Acquisition
  – Concept Abstraction
• Use of knowledge in and as models
  – Problem Solving
  – Simulations
Storage and management of Information
Table A
Name        Address          Parcel #
John Smith  18 Lawyers Dr.   756554
T. Brown    14 Summers Tr.   887419

Table B
Parcel #    Assessed Value
887419      152,000
446397      100,000
Making Sense of Knowledge
Concept: a conceptual entity of the domain. Attribute: a property of a concept. Relation: a relationship between concepts or properties. Axiom: a coherent description of Concepts / Properties / Relations via logical expressions.
[Diagram: the Person / Student / Professor / Lecture example ontology from above.]
Acquisition of knowledge: Feature Acquisition
• "flat" region: no change in all directions
• "edge": no change along the edge direction
• "corner": significant change in all directions
From a square sampling of pixels.
Acquisition of knowledge: Concept Abstraction
P(X|C1) * P(C1) > P(X|C2) * P(C2): X will buy a computer
Abdomen Length > 7.1?
  yes → Katydid
  no  → Antenna Length > 6.0?
          yes → Katydid
          no  → Grasshopper
Use of knowledge in and as models: Problem Solving
[Tree: the BFS/DFS example from above.]
Breadth first search: queue
Depth first search: stack
Best first search: priority queue
Use of knowledge in and as models: Simulations
Applications
You choose how to represent reality.