Information Inference

Mimicking human text-based reasoning

P.D. Bruza & D. Song

Information Ecology Project

Distributed Systems Technology Centre

Penguin Books U.K

Why Linus chose a penguin

Surfing the Himalayas

Introductory remarks

Information inference is a common and real phenomenon.

It can be modelled by symbolic inference, but this isn’t satisfying.

The inferences are often latent associations triggered by seeing a word (or words) in the context of other words, so inference is not deductive but about producing implicit associations appropriate to the context.

We need to look at the problem from a cognitive perspective…

Since last time….

(Philosophical) positioning of the work is clearer.

Some encouraging experimental results using information inference to derive query models.

Some initial ideas about how information inference fits into an abductive logic for text-based knowledge discovery.

Dretske’s Information Content

To a person with prior knowledge K, r being F carries the information that s is G if and only if the conditional probability of s being G, given that r is F, is 1 (and less than 1 given K alone).

We can say that s being G is inferred (informationally) from r being F and K.

T = “Why Linus chose a penguin”

K = {Linus Torvalds invented Linux; the Linux logo is a penguin; Linus is a cartoon character in “Peanuts”}

$\Pr(\text{“Linus” being “Linus Torvalds”} \mid K) < 1$

$\Pr(\text{“Linus” is “Linus Torvalds”} \mid K, \text{“Linus” is with “penguin” in } T) < 1$

So Dretske’s definition does not permit the inference “Linus” is “Linus Torvalds”, though a human being may proceed under this “hasty” judgment.

Dretske’s information content “sets too high a standard” (Barwise & Seligman)

Inferential information content (Barwise & Seligman)

To a person with prior knowledge K, r being F carries the information that s is G, if the person could legitimately infer that s is G from r being F together with K (but could not from K alone).

T = “Why Linus chose a penguin”

K = {Linus Torvalds invented Linux; the Linux logo is a penguin; Linus is a cartoon character in “Peanuts”}

“Linus” being “Linus Torvalds” can’t be legitimately inferred from K alone.

“Linus” being with “penguin” in T, together with K, carries the information that “Linus” is “Linus Torvalds”

Barwise & Seligman (cont.)

“… by relativizing information flow to human inference, this definition makes room for different standards in what sorts of inferences the person is able and willing to make”

Remarks:

- Psychologistic stance taken

- Onerous from an engineering standpoint: “different standards” implies “nonmonotonicity”. Consider “Linux Online: Why Linus chose a penguin” (willing) vs. “Why Linus chose a penguin” (not willing).

Consequences of psychologism

Representations of information need not be propositional.

Semantics is not a model-theoretic issue, but a cognitive one: the “meanings” stored and manipulated by the system should accord with what we have in our heads.

Gärdenfors’ cognitive model

symbolic level: propositional representation

conceptual level: geometric representation

associationist (sub-conceptual) level: connectionist representation

Conceptual spaces: the property “red”

Dimensions: hue, chromaticity, brightness

Properties and concepts are dimensional (geometric) objects. Dimensions may be integral: the value in one dimension (or dimensions) determines the value in another.

red(x)

Barwise & Seligman’s real valued state spaces

red: < hue: 445, chrom: 6.0, brightness: 7.0 >

Observation function

Gärdenfors’ cognitive model: how we realize it

symbolic level: propositional representation

conceptual level: geometric representation

associationist (sub-conceptual) level: connectionist representation

keywords

Geometric representations of words via Hyperspace Analogue to Language (HAL)

reagan = < administration: 0.45, bill: 0.05, budget: 0.07, house: 0.06, president: 0.83, reagan: 0.21, trade: 0.05, veto: 0.06, … >

This example demonstrates how a word is represented as a weighted vector whose dimensions comprise other words.

The weights represent the strengths of association between “reagan” and other words seen in the same context(s)

How HAL vectors are constructed

…….Kemp urges Reagan to oppose stock tax…..

Slide a window of width n across the corpus.

Per word: compute the weight of association with the other words within the window; the weight is inversely proportional to distance.

HAL space: each word in the corpus is represented by a multi-dimensional vector, a weighted sum of the contexts the word appeared in. (Burgess et al. refer to it as a “high dimensional context space” or a “high dimensional semantic space”.)
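The windowing procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `build_hal` and `normalize` are hypothetical, the weight `window - d + 1` is the linear distance weighting commonly associated with HAL (the slide only says that the weight falls off with distance), preceding and following contexts are summed into one vector, and stop-word removal and stemming are omitted.

```python
from collections import defaultdict
import math


def build_hal(tokens, window=5):
    """Build sparse HAL-style vectors: {word: {context_word: weight}}.

    Assumption: a co-occurrence at distance d (1 <= d <= window) contributes
    window - d + 1, so adjacent words get the largest weight.
    """
    hal = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            weight = window - d + 1
            other = tokens[j]
            hal[word][other] += weight   # 'other' follows 'word'
            hal[other][word] += weight   # 'word' precedes 'other'
    return {w: dict(ctx) for w, ctx in hal.items()}


def normalize(vec):
    """Scale a sparse vector to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {k: w / norm for k, w in vec.items()} if norm else dict(vec)


if __name__ == "__main__":
    tokens = "kemp urges reagan to oppose stock tax".split()
    hal = build_hal(tokens, window=3)
    print(normalize(hal["reagan"]))
```

Summing both directions into a single vector keeps the representation compact; keeping them apart would instead require two weights per dimension.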

Remarks about HAL

A HAL space is easy to construct.

Cognitive compatibility with human information processing

– “word representations learned by HAL account for a variety of semantic phenomena” (Burgess et al.)

– Therefore a good candidate for representing “meanings” in accord with our psychologistic stance

A HAL space is a real-valued state space, thus opening the door to driving information inference according to Barwise & Seligman’s definition

– A HAL vector represents a word’s “state” in the context of the text corpus it was derived from

Differences with Burgess et al.

We (often) normalize the weights.

Pre- and post-vectors are added into a single vector.

HAL vectors derived from small text corpora (e.g., Reuters-21578) seem to be OK.

HAL vectors are “summed” representations, similar in spirit to “prototypical concepts” (which are averaged representations).

Reagan traces

President Reagan was ignorant about much of the Iran arms scandal

Reagan says U.S. to offer missile treaty

REAGAN SEEKS MORE AID FOR CENTRAL AMERICA

Kemp urges Reagan to oppose stock tax

Prototypical concepts

Prototypical “Reagan” = average of vectors from traces

president: 3.23, administration: 1.82, trade: 0.40, budget: 0.37, veto: 0.34, bill: 0.31, congress: 0.31, tax: 0.29, …
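Averaging trace vectors into a prototype can be sketched as follows. The function name `prototype` and the toy trace weights are illustrative assumptions; the weights are not taken from the Reuters data behind the slide.

```python
def prototype(traces):
    """Average a list of sparse vectors dimension by dimension.

    A dimension missing from a trace counts as weight 0 for that trace.
    """
    if not traces:
        return {}
    totals = {}
    for vec in traces:
        for dim, weight in vec.items():
            totals[dim] = totals.get(dim, 0.0) + weight
    return {dim: total / len(traces) for dim, total in totals.items()}


if __name__ == "__main__":
    # Hypothetical vectors for two traces of "reagan".
    traces = [
        {"president": 4.0, "scandal": 2.0},
        {"president": 2.0, "treaty": 1.0},
    ]
    print(prototype(traces))
```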

Concept combination: “Pink Elephant”

Elephant = < , , …… >

Heuristic concept combination: “Star wars”

star = <trek: 0.2, episode: 0.05, soviet: 0.3, bush: 0.4, missile: 0.25>

wars = <soviet: 0.1, missile: 0.2, iran: 0.33, iraq: 0.28, gulf: 0.4>

starwars = < trek: 0.3, episode: 0.15, soviet: 0.6, bush: 0.53, missile: 0.65, iran: 0.2, iraq: 0.18, gulf: 0.25>

Observation: “star” dominates “wars”

How to weight dimensions appropriately according to context?

Weights are affected by how one concept appears in the light of another concept: intersecting dimensions are emphasized, and weights are adjusted according to degree of dominance.

(NB: moving prototypical concepts in the HAL space is a cleaner way of dealing with context.)
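One possible realization of such a heuristic is sketched below. It is an assumption-laden illustration, not the exact procedure behind the “Star wars” numbers: the function `combine` and the parameters `l1`, `l2` (rescaling levels for the dominant and the subordinate concept) and `alpha` (boost for shared dimensions) are hypothetical, and the result is not expected to reproduce the weights shown on the slide.

```python
import math


def combine(dominant, other, l1=0.5, l2=0.4, alpha=2.0):
    """Combine two sparse concept vectors, favouring the dominant one.

    Assumed steps: (1) rescale each vector into [level, 2 * level], with a
    higher level for the dominant concept; (2) add the vectors and multiply
    dimensions shared by both concepts by alpha; (3) normalize to unit length.
    """
    def rescale(vec, level):
        top = max(vec.values())
        return {k: level + level * w / top for k, w in vec.items()}

    a = rescale(dominant, l1)
    b = rescale(other, l2)
    shared = a.keys() & b.keys()

    combined = {}
    for dim in a.keys() | b.keys():
        weight = a.get(dim, 0.0) + b.get(dim, 0.0)
        if dim in shared:
            weight *= alpha  # emphasize intersecting dimensions
        combined[dim] = weight

    norm = math.sqrt(sum(w * w for w in combined.values()))
    return {dim: w / norm for dim, w in combined.items()}


if __name__ == "__main__":
    star = {"trek": 0.2, "episode": 0.05, "soviet": 0.3, "bush": 0.4, "missile": 0.25}
    wars = {"soviet": 0.1, "missile": 0.2, "iran": 0.33, "iraq": 0.28, "gulf": 0.4}
    for dim, w in sorted(combine(star, wars).items(), key=lambda kv: -kv[1]):
        print(f"{dim}: {w:.2f}")
```

With these settings the shared dimensions (“soviet”, “missile”) end up with the largest weights, and the top dimension of the dominant concept outranks the top dimension of the other concept.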

Theoretical background: Information inference via HAL-based information flow computations

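Barwise & Seligman: state-based “information flow”

$\text{on}, \text{live} \vdash \text{light}$ iff $s(\text{on}) \cap s(\text{live}) \subseteq s(\text{light})$

HAL-based “information flow”

$i_1, \ldots, i_n \vdash j$ iff $\mathrm{degree}(\oplus c_i \lhd c_j) > \lambda$

Example: $\text{reagan}, \text{iran} \vdash \text{scandal}$

Levels involved: symbolic, conceptual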

Degree of inclusion (flow) computation

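$$\mathrm{degree}(c_i \lhd c_j) = \frac{\sum_{p_l \in QP_\mu(c_i) \wedge QP(c_j)} w_{c_i p_l}}{\sum_{p_k \in QP_\mu(c_i)} w_{c_i p_k}}$$

Here $c_i$ is the source concept and $c_j$ is the target concept.

Consider the “quality properties” with above-mean weight in the source concept. (Intuition: how much of the salient aspects of the source are contained in the target.)

Compute the ratio of intersecting dimensions between the source and target concepts to the dimensions in the source concept.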

Visualizing degree of inclusion between HAL vectors

source: A B C D F G K L M

target: A . F . K . . Q

Many of the above-average “quality properties” of the source concept are present in the target, so the degree of inclusion will be high.
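The degree-of-inclusion computation can be sketched as below. This is a minimal reading of the description, with assumptions made explicit: the names `quality_properties` and `degree_of_inclusion` are hypothetical, “above mean” is taken as strictly greater than the mean weight of the source vector, a dimension counts as present in the target when its weight there is positive, and the toy weights are invented for illustration.

```python
def quality_properties(vec, above_mean=False):
    """Return the dimensions of a sparse vector with positive weight,
    or only those strictly above the mean weight if above_mean is set."""
    if not vec:
        return set()
    threshold = sum(vec.values()) / len(vec) if above_mean else 0.0
    return {dim for dim, w in vec.items() if w > threshold}


def degree_of_inclusion(source, target):
    """Share of the source's salient weight that is also present in the target."""
    salient = quality_properties(source, above_mean=True)
    if not salient:
        return 0.0  # e.g. all source weights are equal
    shared = salient & quality_properties(target)
    return sum(source[d] for d in shared) / sum(source[d] for d in salient)


if __name__ == "__main__":
    source = {"a": 0.8, "b": 0.6, "c": 0.1, "d": 0.1}   # salient: a, b
    target = {"a": 0.3, "q": 0.5}
    print(degree_of_inclusion(source, target))
```

Note that the measure is asymmetric: only the source's weights enter the ratio, so the flow from source to target can differ from the flow in the opposite direction.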

Information Inference in practice: deriving query models

Construct HAL vectors for all vocabulary terms from the document collection

Given a query such as “space program”, compute the information flows from it and use these to expand the query, e.g.

$\text{space} \oplus \text{program} \vdash \text{nasa}$

Query expansion term derived via information flow computation

(We used the top 80 information flows for expansion without feedback, 65 with feedback)
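Query expansion by information flow can be sketched as a ranking of vocabulary terms by their degree of inclusion with respect to the query concept. The sketch below is illustrative only: `degree`, `expand_query`, the `exclude` parameter and the toy HAL space are assumptions, and the query vector is taken as given (for example, the result of combining the vectors of the query terms).

```python
def degree(source, target):
    """Share of the source's above-mean weight found on dimensions that are
    also present (positive weight) in the target."""
    if not source:
        return 0.0
    mean = sum(source.values()) / len(source)
    salient = {d: w for d, w in source.items() if w > mean}
    if not salient:
        return 0.0
    shared = sum(w for d, w in salient.items() if target.get(d, 0.0) > 0.0)
    return shared / sum(salient.values())


def expand_query(query_vec, hal_space, top_k=80, exclude=()):
    """Return up to top_k (term, degree) pairs with the strongest flows."""
    flows = [
        (term, degree(query_vec, vec))
        for term, vec in hal_space.items()
        if term not in exclude
    ]
    flows.sort(key=lambda pair: pair[1], reverse=True)
    return [(term, d) for term, d in flows[:top_k] if d > 0.0]


if __name__ == "__main__":
    # Hypothetical combined vector for the query "space program".
    query_vec = {"shuttle": 0.7, "launch": 0.6, "budget": 0.1}
    # Hypothetical HAL space with three vocabulary terms.
    hal_space = {
        "nasa": {"shuttle": 0.5, "launch": 0.4, "agency": 0.3},
        "tax": {"budget": 0.6, "bill": 0.4},
        "moon": {"launch": 0.2, "apollo": 0.5},
    }
    print(expand_query(query_vec, hal_space, top_k=2, exclude=("space", "program")))
```

The cut-off `top_k` plays the role of the number of flows used for expansion; the original query terms are passed in `exclude` so that they are not proposed as their own expansions.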

The experiments

Associated Press 88/89 collections

TREC topics 1-50, 101-150, 151-200 (titles only)

Models for comparison: Baseline, Composition, Relevance Model, Markov chain model

Baseline Model

BM-25 term weighting (terms were stemmed)

Replication of Lafferty & Zhai’s baseline (SIGIR 2001)

Dot product matching function

Composition model

Combine the HAL vectors of individual query terms by recursively applying the concept combination heuristic; query terms ranked according to idf (dominance ranking)

starwars = < trek: 0.3, episode: 0.15, soviet: 0.6, bush: 0.53, missile: 0.65, iran: 0.2, iraq: 0.18, gulf: 0.25>

Results

Measure | Baseline Model | Composition Model | Info flow Model

AvgPr | 0.182 | 0.197 (+8%) | 0.247 (+35%)

InitPr | 0.476 | 0.520 (+10%) | 0.544 (+14%)

Recall | 1667/3301 | 1996/3301 (+15%) | 2269/3301 (+35%)

The effect of information inference

26% of the 35% improvement in precision of the HAL-based information flow model is due to information inference

For example, consider the query “space program”. The information flow model infers query expansion terms such as “Reagan”, “satellites”, “scientists”, “pentagon”, “mars”, “moon”.

These are real inferences with respect to “space program”, as these terms do not appear as dimensions in the HAL vector of the concept combination $\text{space} \oplus \text{program}$.

Comparison with probabilistic query language models

MC: Markov chain model (Lafferty & Zhai, SIGIR 2001)

MC | IM | MCwP | IMwP

0.201 | 0.247 | 0.232 | 0.258

Scores are average precision

Comparison with probabilistic query language models (cont.)

RM: Relevance model (Lavrenko & Croft, SIGIR 2001)

Topics (collection) | IM | IMwP | RM

101-150 (AP) | 0.265 | 0.301 | 0.261

151-200 (AP) | 0.298 | 0.344 | 0.319

Scores are average precision

Text-based scientific discovery

Fish oil → B1: Blood viscosity; B2: Platelet aggregation; B3: Vascular reactivity → Raynaud

Weeber et al., “Using Concepts in Literature-Based Discovery”, JASIST 52(7):548-557

“.., he made the connection between these literatures and formulated the hypothesis that fish oil may be used for treating Raynaud’s disease..”

Logic of Abduction (Gabbay & Woods)

Abductive logic | Logic of discovery | Logic of justification | Hypothesis testing

Candidate mechanism | HAL-based info flow | ? | ?

Raw material for abduction? Information flows from “Raynaud”

Raynaud

Raynaud: 1.0, myocardial: 0.56, coronary: 0.54, renal: 0.52, ventricular: 0.52, …, oil: 0.23, …, fish: 0.20, …

Some promise, but the lack of representation of integral dimensions is a problem.

Index expressions

“Beneficial effects of fish oil on blood viscosity”

beneficial

effects

viscosity

Power index expressions for representing integral dimensions

fish oil effects blood viscosity

eff of fish oil eff on blood viscosity

Information flows are single terms; power index expressions determine how they may be combined into higher order syntactic structures.

Initial results from using information flow computations as a logic of discovery

27 ventricular (0.52) infarction (0.46)

27 thromboplastin (0.17)

27 pulmonary (0.51) arteries (0.25)

27 placental (0.19) protein (0.42)

27 monoamine (0.17) oxidase (0.18)

27 lupus (0.37) nephritis (0.17)

27 instruments (0.17)

27 coagulant (0.21)

27 blood (0.63) coagulation (0.29)

26 umbilical (0.24) vein (0.32)

25 fish (0.20)

23 viscosity (0.21)

23 cigarette (0.26) smokers (0.22)

4 fish (0.20) oil (0.23)

Summary

Barwise & Seligman and Gärdenfors take a very similar stance with respect to the “human stance” (Gabbay and Woods also)… psychologism is alive…

An integration of a primitive approximation of a conceptual space with an information inference mechanism driven by information flow computations

An initial attempt towards realizing Gärdenfors’ conceptual spaces

– A HAL space is only a primitive approximation

– We are looking at Voronoi tessellations

A tiny contribution to Barwise & Seligman’s call for a “distinctively different model of human reasoning”

(We are looking beyond IR)
