The dynamics of iterated learning
Tom Griffiths, UC Berkeley
with Mike Kalish, Steve Lewandowsky, Simon Kirby, and Mike Dowman


Page 1

The dynamics of iterated learning

Tom Griffiths, UC Berkeley

with Mike Kalish, Steve Lewandowsky, Simon Kirby, and Mike Dowman

Page 2

Learning


Page 3


Iterated learning (Kirby, 2001)

What are the consequences of learners learning from other learners?

Page 4

Objects of iterated learning

• Knowledge communicated across generations through provision of data by learners

• Examples:
  – religious concepts
  – social norms
  – myths and legends
  – causal theories
  – language

Page 5

Language

• The languages spoken by humans are typically viewed as the result of two factors:
  – individual learning
  – innate constraints (biological evolution)

• This limits the possible explanations for different kinds of linguistic phenomena

Page 6

Linguistic universals

• Human languages possess universal properties
  – e.g. compositionality

(Comrie, 1981; Greenberg, 1963; Hawkins, 1988)

• Traditional explanation:
  – linguistic universals reflect innate constraints specific to a system for acquiring language (e.g., Chomsky, 1965)

Page 7

Cultural evolution

• Languages are also subject to change via cultural evolution (through iterated learning)

• Alternative explanation:
  – linguistic universals emerge as the result of the fact that language is learned anew by each generation (using general-purpose learning mechanisms)

(e.g., Briscoe, 1998; Kirby, 2001)

Page 8

Analyzing iterated learning


PL(h|d): probability of inferring hypothesis h from data d

PP(d|h): probability of generating data d from hypothesis h

[Diagram: a chain of learners, each applying PL(h|d) to the data it receives and PP(d|h) to generate data for the next learner]

Page 9

Markov chains

x → x → x → x → x → x → x → x

Transition matrix: T = P(x(t+1) | x(t))

• Variables x(t+1) independent of history given x(t)

• Converges to a stationary distribution under easily checked conditions (i.e., if it is ergodic)

Page 10

Stationary distributions

• Stationary distribution:

  $$\pi_i = \sum_j P(x^{(t+1)} = i \mid x^{(t)} = j)\,\pi_j = \sum_j T_{ij}\,\pi_j$$

• In matrix form: $\pi = T\pi$
  – $\pi$ is the first eigenvector of the matrix T
  – easy to compute for finite Markov chains
  – can sometimes be found analytically
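As a concrete sketch of this computation, here is a minimal example (assuming NumPy and a made-up 3-state transition matrix, not anything from the talk) that reads off the stationary distribution as the eigenvector of T with eigenvalue 1 and checks it by iterating the chain:

```python
import numpy as np

# Hypothetical column-stochastic transition matrix:
# T[i, j] = P(x(t+1) = i | x(t) = j); each column sums to 1.
T = np.array([[0.90, 0.20, 0.10],
              [0.05, 0.70, 0.30],
              [0.05, 0.10, 0.60]])

# The stationary distribution satisfies pi = T pi, so it is the
# eigenvector of T associated with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi /= pi.sum()                      # normalize (and fix the sign) to get probabilities

# Sanity check: repeatedly applying T to any starting distribution
# should converge to the same vector, since this chain is ergodic.
p = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    p = T @ p

print(pi)   # stationary distribution
print(p)    # should agree with pi up to numerical error
```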

Page 11

Analyzing iterated learning

d0 → h1 → d1 → h2 → d2 → h3 → …
(each learner applies PL(h|d) to the data it receives, then PP(d|h) to generate data for the next learner)

h1 → h2 → h3 → … : a Markov chain on hypotheses

d0 → d1 → d2 → … : a Markov chain on data

Page 12

Bayesian inference

[Portrait: Reverend Thomas Bayes]

• Rational procedure for updating beliefs

• Foundation of many learning algorithms

• Lets us make the inductive biases of learners precise

Page 13

Bayes’ theorem

$$P(h \mid d) = \frac{P(d \mid h)\,P(h)}{\sum_{h' \in H} P(d \mid h')\,P(h')}$$

(posterior probability = likelihood × prior probability, normalized by a sum over the space of hypotheses H)

h: hypothesis
d: data
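As a small worked illustration (the numbers below are made up for this example, not taken from the talk), here is Bayes' theorem applied over a two-hypothesis space:

```python
# Toy two-hypothesis space with an assumed prior and likelihoods for one observed datum d.
prior      = {"h1": 0.7, "h2": 0.3}    # P(h)
likelihood = {"h1": 0.1, "h2": 0.6}    # P(d | h)

# P(h | d) = P(d | h) P(h) / sum over h' of P(d | h') P(h')
evidence  = sum(likelihood[h] * prior[h] for h in prior)
posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}

print(posterior)   # {'h1': 0.28, 'h2': 0.72}: the data favor h2 despite its lower prior
```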

Page 14

Iterated Bayesian learning


Assume learners sample from their posterior distribution:

$$P_L(h \mid d) = \frac{P_P(d \mid h)\,P(h)}{\sum_{h' \in H} P_P(d \mid h')\,P(h')}$$

Page 15

Stationary distributions

• Markov chain on h converges to the prior, P(h)

• Markov chain on d converges to the “prior predictive distribution”

$$P(d) = \sum_h P(d \mid h)\,P(h)$$

(Griffiths & Kalish, 2005)
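A minimal simulation sketch of this result, using an assumed two-hypothesis, two-datum world (the prior and likelihood values are illustrative, not from the paper): each learner samples a hypothesis from its posterior given the previous learner's data and then generates data for the next learner, and the long-run frequencies of hypotheses approach the prior.

```python
import numpy as np

rng = np.random.default_rng(0)

prior = np.array([0.8, 0.2])        # P(h) over two hypotheses
like  = np.array([[0.9, 0.1],       # like[h, d] = P(d | h)
                  [0.3, 0.7]])

def posterior(d):
    """P(h | d) for a Bayesian learner who saw datum d."""
    p = like[:, d] * prior
    return p / p.sum()

h_counts = np.zeros(2)
d = 0                               # arbitrary starting datum
for _ in range(100_000):
    h = rng.choice(2, p=posterior(d))   # learner samples a hypothesis from its posterior
    d = rng.choice(2, p=like[h])        # and generates a datum for the next learner
    h_counts[h] += 1

print(h_counts / h_counts.sum())    # approaches the prior, [0.8, 0.2]
```

Tracking the visited data instead of the hypotheses gives frequencies that approach the prior predictive distribution P(d).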

Page 16

Explaining convergence to the prior


• Intuitively: data acts once, prior many times

• Formally: iterated learning with Bayesian agents is a Gibbs sampler for P(d,h)

(Griffiths & Kalish, submitted)

Page 17

Gibbs sampling

For variables $x = x_1, x_2, \ldots, x_n$:

Draw $x_i^{(t+1)}$ from $P(x_i \mid \mathbf{x}_{-i})$, where $\mathbf{x}_{-i} = \big(x_1^{(t+1)}, \ldots, x_{i-1}^{(t+1)}, x_{i+1}^{(t)}, \ldots, x_n^{(t)}\big)$

Converges to $P(x_1, x_2, \ldots, x_n)$

(a.k.a. the heat bath algorithm in statistical physics)

(Geman & Geman, 1984)
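A generic sketch of this update; the target here is an assumed bivariate Gaussian with correlation rho (chosen only because its conditional distributions are known in closed form), not an example from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8                     # correlation of the target bivariate standard Gaussian
x1, x2 = 0.0, 0.0             # arbitrary initial state
samples = []

for t in range(10_000):
    # Draw each variable from its conditional given the current values of the others:
    # for this target, x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for x2 | x1.
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))
    samples.append((x1, x2))

samples = np.array(samples)
print(np.corrcoef(samples.T)[0, 1])   # close to rho, as expected for draws from the target
```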

Page 18

Gibbs sampling

(MacKay, 2003)

Page 19

Explaining convergence to the prior


When the target distribution is P(d,h), the conditional distributions are PL(h|d) and PP(d|h)

Page 20

Implications for linguistic universals

• When learners sample from P(h|d), the distribution over languages converges to the prior
  – identifies a one-to-one correspondence between inductive biases and linguistic universals

Page 21

Iterated Bayesian learning


Assume learners sample from their posterior distribution:

$$P_L(h \mid d) = \frac{P_P(d \mid h)\,P(h)}{\sum_{h' \in H} P_P(d \mid h')\,P(h')}$$

Page 22

From sampling to maximizing

Let learners choose hypotheses with probability proportional to the posterior raised to a power r:

$$P_L(h \mid d) \propto \left[\frac{P_P(d \mid h)\,P(h)}{\sum_{h' \in H} P_P(d \mid h')\,P(h')}\right]^r$$

r = 1 corresponds to sampling from the posterior; larger r (r = 2, …) increasingly favors high-posterior hypotheses; in the limit r = ∞ the learner maximizes, choosing the most probable hypothesis.

Page 23

From sampling to maximizing

• General analytic results are hard to obtain
  – (r = ∞ gives a stochastic version of the EM algorithm)

• For certain classes of languages, it is possible to show that the stationary distribution gives each hypothesis h probability proportional to P(h)^r
  – the ordering identified by the prior is preserved, but not the corresponding probabilities

(Kirby, Dowman, & Griffiths, submitted)
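A toy illustration of this contrast (the two-hypothesis numbers below are assumptions for the sake of the example, not from the paper): build the transition matrix on hypotheses for a posterior-sampling learner (r = 1) and for a maximizing learner (r = ∞), then read off the stationary distribution as before.

```python
import numpy as np

prior = np.array([0.6, 0.4])           # a weak prior bias toward h0
like  = np.array([[0.55, 0.45],        # like[h, d] = P(d | h): two noisy "languages"
                  [0.45, 0.55]])

def stationary(choose):
    """Stationary distribution over hypotheses for a given learning rule."""
    post = like * prior[:, None]
    PL = choose(post / post.sum(axis=0))   # P_L(h_new | d), indexed [h_new, d]
    T = PL @ like.T                        # T[h_new, h_old] = sum_d P_L(h_new | d) P_P(d | h_old)
    vals, vecs = np.linalg.eig(T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    return pi / pi.sum()

sampler   = lambda p: p                                     # r = 1: sample from the posterior
maximizer = lambda p: np.eye(2)[np.argmax(p, axis=0)].T     # r = infinity: pick the MAP hypothesis

print(stationary(sampler))     # [0.6, 0.4]: exactly the prior
print(stationary(maximizer))   # [1.0, 0.0]: the weak bias becomes an absolute "universal"
```

With these particular likelihoods the maximizing learner prefers h0 whatever data it sees, so a mild prior preference turns into a categorical outcome; the precise degree of exaggeration depends on the noise in the languages.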

Page 24

Implications for linguistic universals

• When learners sample from P(h|d), the distribution over languages converges to the prior
  – identifies a one-to-one correspondence between inductive biases and linguistic universals

• As learners move towards maximizing, the influence of the prior is exaggerated
  – weak biases can produce strong universals
  – cultural evolution is a viable alternative to traditional explanations for linguistic universals

Page 25

Analyzing iterated learning

• The outcome of iterated learning is determined by the inductive biases of the learners
  – hypotheses with high prior probability ultimately appear with high probability in the population

• Clarifies the connection between constraints on language learning and linguistic universals…

• …and provides formal justification for the idea that culture reflects the structure of the mind

Page 26

Conclusions

• Iterated learning provides a lens for magnifying the inductive biases of learners
  – small effects for individuals are big effects for groups

• When cognition affects culture, studying groups can give us better insight into individuals


Page 27