Using auto-encoders to model early infant categorization: results, predictions and insights


Overview

• An odd categorization asymmetry was observed in 3-4 month old infants.

• We explain this asymmetry using a connectionist auto-encoder model.

• Our model made a number of predictions, which turned out to be correct.

• We used a more neurobiologically plausible encoding for the stimuli.

• The model can now show how young infants’ reduced visual acuity may actually help them do basic-level categorization.

Background on infant statistical category-learning

Quinn, Eimas, & Rosenkrantz (1993) noticed a rather surprising categorization asymmetry in 3-4 month old infants:

– Infants familiarized on cats are surprised by novel dogs

– BUT infants familiarized on dogs are bored by novel cats.

How their experiment worked

Familiarization phase: infants saw 6 pairs of pictures of animals, say, cats, from one category (i.e., a total of 12 different animals)

Test phase: infants saw a pair consisting of a new cat and a new dog. Their gaze time was measured for each of the two novel animals.

[Figure: familiarization trials (pairs of pictures from one category shown to the infant), followed by the test phase (a new cat paired with a new dog); looking times to the two test stimuli are compared.]

Results (Quinn et al., 1993): The categorization asymmetry

– Infants familiarized on cats look significantly longer at the novel dog in the test phase than the novel cat.

– BUT infants familiarized on dogs show no significant difference in looking time between a novel cat and a novel dog.

Our hypothesis

We assume that infants are hard-wired to be sensitive to novelty (i.e., they look longer at novel objects than at familiar objects).

Cats, on the whole, are less varied, and their feature values are included within the range of values for Dogs.

Thus, when they have seen a number of cats, a dog is perceived as novel. But, when they have seen a number of dogs, the new cat is perceived as “just another dog.”

Statistical distributions of patterns are what count

The infants are becoming sensitive to the statistical distributions of the patterns they are observing.

Consider the distribution of values of a particular characteristic for Cats and Dogs

[Figure: overlapping distributions of a feature value (0 to 1) for cats and dogs.]

Note that the distribution for Cats is:

– narrower than that of Dogs

– included in that of Dogs

Suppose an infant has become familiarized with the distribution for cats


And then sees a dog

Chances are the new stimulus will fall outside of the familiarized range of values

On the other hand, if an infant has become familiarized with

the distribution for Dogs


And then sees a cat

Chances are the new stimulus will be inside the familiarized range of values
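The inclusion argument above can be checked with a toy simulation. This is an illustrative sketch only: the two feature ranges below are invented for the example (they are not measurements from the actual stimuli), with the Cat range narrow and included within the broad Dog range.

```python
import random

random.seed(0)

# Hypothetical (made-up) feature ranges: Cats narrow, Dogs broad,
# with the Cat range included within the Dog range.
CAT_RANGE = (0.4, 0.6)
DOG_RANGE = (0.2, 1.0)

def sample(rng):
    lo, hi = rng
    return random.uniform(lo, hi)

def fraction_outside(familiar_range, novel_range, n=10_000):
    """Fraction of novel exemplars falling outside the familiarized range."""
    lo, hi = familiar_range
    outside = sum(1 for _ in range(n)
                  if not (lo <= sample(novel_range) <= hi))
    return outside / n

# Familiarized on cats, then shown a dog: the dog often falls outside.
dog_after_cats = fraction_outside(CAT_RANGE, DOG_RANGE)
# Familiarized on dogs, then shown a cat: the cat never falls outside.
cat_after_dogs = fraction_outside(DOG_RANGE, CAT_RANGE)

print(f"novel dog outside familiarized cat range: {dog_after_cats:.2f}")
print(f"novel cat outside familiarized dog range: {cat_after_dogs:.2f}")
```

With these ranges, most novel dogs fall outside the familiarized cat range, while no novel cat falls outside the familiarized dog range, reproducing the direction of the asymmetry.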

How could we model this asymmetry?

We based our connectionist model on a model of infant categorization proposed by Sokolov (1963).

Sokolov’s (1963) model

Stimulus in the environment → Encode → Decode and Compare: equal?

If not equal: Adjust the internal representation and encode the stimulus again.

Continue looping…

…until the internal representation corresponds to the external stimulus

Using an autoassociator to simulate the Sokolov model

Stimulus from the environment → encode → decode → compare.

If the decoded output does not match the stimulus: adjust the weights and encode again.

Continue looping…

…until the internal representation corresponds to the external stimulus

Infant looking time ≈ network error

In the Sokolov model, an infant continues to look at the image until the discrepancy between the image and the internal representation of the image drops below a certain threshold.

In the auto-encoder model, the network continues to process the input until the discrepancy between the input and the (decoded) internal representation of the input drops below a certain (error) threshold.

Input to our model

We used a three-layer, 10-8-10, non-linear auto-encoder (i.e., a network that tries to reproduce on output what it sees on input) to model the data.

The inputs were ten feature values, normalized between 0 and 1.0 across all of the images, taken from the original stimuli used by Quinn et al. (1993). They were head length, head width, eye separation, ear separation, ear length, nose length, nose width, leg length, vertical extent, and horizontal extent.
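A 10-8-10 auto-encoder of the kind described above can be sketched in plain Python. This is a minimal illustration, not the authors' actual implementation: the learning rate is an arbitrary choice (the deck later reports rate 0.1 with momentum for a different network), the weight initialization is invented, and "looking time" is operationalized simply as the number of training passes before reconstruction error drops below a threshold.

```python
import math
import random

random.seed(1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class AutoEncoder:
    """Minimal 10-8-10 sigmoid auto-encoder trained with plain backprop."""
    def __init__(self, n_in=10, n_hid=8, lr=0.5):
        self.lr = lr
        # +1 column per layer for a bias weight
        self.w1 = [[random.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hid)]
        self.w2 = [[random.uniform(-0.5, 0.5) for _ in range(n_hid + 1)]
                   for _ in range(n_in)]

    def forward(self, x):
        xb = x + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return h, y

    def train_step(self, x):
        """One backprop pass; returns squared reconstruction error."""
        h, y = self.forward(x)
        d_out = [(t - o) * o * (1 - o) for t, o in zip(x, y)]
        d_hid = [hj * (1 - hj) * sum(d_out[i] * self.w2[i][j]
                                     for i in range(len(d_out)))
                 for j, hj in enumerate(h)]
        hb, xb = h + [1.0], x + [1.0]
        for i, di in enumerate(d_out):
            for j in range(len(hb)):
                self.w2[i][j] += self.lr * di * hb[j]
        for j, dj in enumerate(d_hid):
            for k in range(len(xb)):
                self.w1[j][k] += self.lr * dj * xb[k]
        return sum((t - o) ** 2 for t, o in zip(x, y))

def looking_time(net, stimulus, threshold=0.05, max_steps=5000):
    """Training passes until error < threshold: the model's analogue
    of how long the infant keeps looking at the stimulus."""
    for step in range(1, max_steps + 1):
        if net.train_step(stimulus) < threshold:
            return step
    return max_steps

net = AutoEncoder()
stimulus = [random.random() for _ in range(10)]  # hypothetical feature vector
print("looking time (training passes):", looking_time(net, stimulus))
```

A familiar stimulus (one the weights already encode) yields a short looking time; a novel one, a long looking time, mirroring the Sokolov-style loop above.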

The distributions – and, especially, the amount of inclusion – of these features are shown in the following graphs.

Comparing the distributions of the input features

[Figure: distributions of head length, head width, eye separation, ear separation, ear length, and vertical extent for Dogs and Cats; the Cat distributions fall largely inside the Dog distributions.]

Results of Our Simulation

[Figure: network error for a novel cat vs. a novel dog, for networks that learned "cats" first or "dogs" first.]

A Prediction of the auto-encoder model

• If we were to reverse the inclusion relationship between Dogs and Cats, we should be able to reverse the asymmetry.

• We selected the new stimuli from dog- and cat-breeder books (and very slightly morphed some of these stimuli).

• We created a set of Cats and Dogs, such that Cats now included Dogs – i.e., the Cat category was the broad category and the Dog category was the narrow category.

Reversing the Inclusion Relationship

[Figure: eye-separation and ear-length distributions. Old distributions: Dogs include Cats. "Reversed" distributions: Cats include Dogs.]

Results

[Figure: left, prediction by the model (network error for a new cat vs. a new dog after familiarization on Cats or Dogs); right, 3-4 month old infant data (attention to the new cat vs. the new dog).]

Removing the inclusion relationship: Another prediction from the model

Our model also predicts that, regardless of the variance of each category, if we remove the inclusion relationship, we should eliminate the categorization asymmetry.

A new set of cat/dog stimuli was created in which there is no inclusion relationship between the Cat and Dog feature distributions.


Prediction and Empirical Results: The categorization asymmetry disappears.

[Figure: left, prediction of the auto-encoder (average error for novel dogs vs. novel cats after familiarization on Dogs or Cats); right, infant data (attention % for novel dogs vs. novel cats); no asymmetry in either case.]

A critique of our methodology: The use of explicit features

• We used explicit features (head length, leg length, ear separation, nose length, etc.) to characterize the animals (we hand-measured the values using the photos shown to the infants).

• We decided instead to characterize the pictures using only Gabor-filtered spatial-frequency information.

The Forest and the Trees: What are "spatial frequencies"?

As progressively higher spatial-frequency bands are added, more detail becomes visible:

– Very low spatial frequencies: the Forest from 10 miles away

– Low spatial frequencies: the Forest from 5 miles away

– Medium spatial frequencies: the Forest from 1 mile away

– Medium-high spatial frequencies: the Forest from 1/2 mile away; outline of some Trees

– High spatial frequencies: the Forest from 200 m away; Trees visible, but no branches or leaves

– Very high spatial frequencies: 50 m away; Forest no longer visible, Trees with branches visible but no leaves

– Extremely high spatial frequencies: 10 m away; Forest no longer visible, Trees with branches and individual leaves visible

The Forest and the Trees: Combining spatial frequencies to obtain the full image

Combining all of the spatial-frequency bands yields the full image.

Cats: infant-to-adult visual acuity

[Figure: the same cat images at increasing ranges of spatial frequencies: two-month old vision (very low spatial frequencies), 3-4 month old vision, and (almost) adult vision with the full range of spatial frequencies.]

Spatial frequency maps of images with Gabor filtering 

We "cover" this map with spatial-frequency ovals along various orientations of the image. (Each oval is normalized to have approximately the same energy.) This allows us to characterize each dog/cat image with a 26-unit vector.

[Figure: spatial-frequency map of an image, from low to high frequencies.]
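The Gabor-filtering step can be sketched as follows. This is an illustrative toy version in plain Python, not the authors' actual pipeline: the kernel size, wavelengths, orientations, and the toy image are arbitrary choices, and a real implementation would pool filter energy over whole regions of the spatial-frequency map rather than sampling a single response.

```python
import math

def gabor_kernel(size=9, wavelength=4.0, theta=0.0, sigma=2.0):
    """2-D Gabor filter: a sinusoidal grating under a Gaussian window.
    wavelength sets the preferred spatial frequency, theta the orientation."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)  # rotate to theta
            g = math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            row.append(g * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel

def filter_energy(image, kernel):
    """Squared response of one filter over the image -- a stand-in for
    the energy in one oval of the spatial-frequency map."""
    resp = sum(k * p for krow, prow in zip(kernel, image)
               for k, p in zip(krow, prow))
    return resp * resp

# Toy 9x9 "image" and a small bank of frequencies x orientations.
image = [[((i * j) % 5) / 4.0 for j in range(9)] for i in range(9)]
features = [filter_energy(image, gabor_kernel(wavelength=w, theta=t))
            for w in (2.0, 4.0, 8.0)
            for t in (0.0, math.pi / 4, math.pi / 2)]
print(len(features))  # 9 responses here; the actual model used a 26-unit vector
```

Each (wavelength, orientation) pair contributes one element of the feature vector, which then serves as input to the auto-encoder in place of the hand-measured features.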

This is an experiment.

Consider the following image.

Moral of the story:

Sometimes too much detail hinders categorization (even for adults!)

The same is true for infants: Reducing high-frequency information improves category discrimination for distinct categories

Reducing the range of the spatial frequencies from the retinal map to V1 decreases within-category variance.

This decreases the difference between two exemplars of the same category, but increases the difference between exemplars from two different categories.

This will make learning “distant” basic-level or super-ordinate category distinctions easier (but subordinate-level category distinctions will be more difficult).

In other words, reduced visual acuity might actually be good for infant categorization.

• Visual acuity in infants is not the same as that of adults. They do not perceive high spatial frequencies (i.e., fine details), or perceive them only poorly.

• This reduced visual acuity may actually improve perceptual efficiency by eliminating the “information overload” caused by too many extraneous fine details likely to overwhelm their cognitive system.

• Thus, distant basic-level category and super-ordinate level category learning may actually be facilitated by reduced visual acuity.

Reducing visual acuity in our model to simulate young-infant vision by removing high spatial frequencies

The high spatial frequencies have been removed. The auto-encoder will work with input from these images, thereby simulating early infant vision.
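As a toy illustration of why discarding high spatial frequencies can help, the sketch below low-pass filters noisy exemplars of two hypothetical categories with a 3x3 mean filter. Everything here is invented for the example (the prototypes, noise level, and image size are not the actual stimuli); the point is only that blurring shrinks within-category differences far more than between-category differences.

```python
import random

random.seed(2)

def box_blur(img):
    """Crude low-pass filter: 3x3 mean, discarding high spatial frequencies."""
    n = len(img)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            vals = [img[x][y]
                    for x in range(max(0, i - 1), min(n, i + 2))
                    for y in range(max(0, j - 1), min(n, j + 2))]
            out[i][j] = sum(vals) / len(vals)
    return out

def distance(a, b):
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def make_exemplar(prototype, noise=0.3):
    """Category exemplar = prototype + high-frequency pixel noise."""
    return [[p + random.uniform(-noise, noise) for p in row] for row in prototype]

n = 16
cat_proto = [[0.3] * n for _ in range(n)]  # hypothetical category prototypes
dog_proto = [[0.7] * n for _ in range(n)]

cat1, cat2 = make_exemplar(cat_proto), make_exemplar(cat_proto)
dog1 = make_exemplar(dog_proto)

within_sharp = distance(cat1, cat2)
within_blur = distance(box_blur(cat1), box_blur(cat2))
between_sharp = distance(cat1, dog1)
between_blur = distance(box_blur(cat1), box_blur(dog1))

# Blur shrinks within-category spread...
print(within_blur < within_sharp)
# ...so the between/within separation ratio improves under "infant vision".
print(between_blur / within_blur > between_sharp / within_sharp)
```

In this toy version the absolute between-category distance also shrinks slightly, but the ratio of between- to within-category distance grows sharply, which is what makes the blurred categories easier to tell apart.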

Two simulations with Gabor-filtered input

• Reproducing previous results: Using vectors of the 26 weighted spatial-frequency values, instead of explicit feature values, produces auto-encoder network results similar to those produced by infants tested on the same images.

• Reduced visual acuity: Largely eliminating high spatial-frequency information from the input (i.e., "blurry" vision) actually significantly improves the network's ability to categorize the images presented to it.

Reproducing previous results (Cats are the more variable category)

[Figure: network generalization errors with Gabor-filtered spatial-frequency information, results for 3-4 month old infants, and results with explicit feature values (French et al., 2001). Familiarization on cats produces a large jump in error for the new dog; familiarization on dogs produces very little jump in error for the new cat.]

Conclusion about the use of Gabor-filtered input instead of explicit feature measurements

• Spatial frequency data in the model produces a reasonable fit to empirical data.

• We avoid the thorny issue of using a particular set of “high-level” feature measurements (ear length, eye separation, etc.) to characterize the images used in the simulations.

Reduced visual acuity

Reduced perceptual acuity in 3-4 month old infants produces an advantage for differentiating perceptually distant basic-level categories and super-ordinate categories.

Simulation 2: The advantage in 3-4 month old infants of reduced visual acuity

The frequencies removed or reduced were:

• Above 3-4 cycles/degree: very little contribution

• Above 7.1 cycles/degree: no contribution

Network used: a 26-16-26 feedforward BP auto-encoder network (learning rate: 0.1, momentum: 0.9)

Close categories vs. Very dissimilar categories

When a network is familiarized on one category (say, Cat), reduced visual acuity decreases errors (i.e., improves generalization) for novel exemplars in the same category or very similar categories (like Dog).

But it should help in discriminating dissimilar categories. For example, reduced visual acuity should produce a greater jump in error for a network (or increased attention for an infant) familiarized on Cats when exposed to Cars.

When trained on one category (Cats), errors on dissimilar categories (Cars) are increased by reduced visual acuity (i.e., better category discrimination).

The larger the error, the better the discrimination.

[Figure: jump in error for Cars following familiarization on Cats, under adult vision vs. infant vision; the jump is larger with infant vision.]

A Prediction of the Model: Consider Quinn et al. (1993)

– Familiarized on Cats, then tested on a novel Dog: jump in interest.

– Familiarized on Dogs, then tested on a novel Cat: no jump in interest.

But what if we took this test Cat and, by adding only high spatial-frequency information, transformed it into this Dog?

– Familiarized on Cats, then tested on this transformed Dog: prediction: no jump in interest.

– Familiarized on Dogs, then tested on a novel Cat: no jump in interest.

Presumably what the 3-month old infant would see is this:

The asymmetry would disappear, even though adults would perceive a series of cats followed by a dog and would expect a jump in infants’ interest, as there usually is for a novel dog following familiarization on cats.

Modeling Dogs and Cats: Conclusions

A simple connectionist auto-encoder does a good job of reproducing certain surprising infant categorization data.

This model makes testable predictions… …that have subsequently been confirmed in infants.

Gabor-filtered spatial-frequency input is neurobiologically plausible and produces a good approximation to infant categorization data.

A counter-intuitive learning advantage for categorizing distant basic-level categories and super-ordinate categories arises from reduced-acuity input.

This supports a statistical, perceptually based, on-line categorization mechanism in young infants
