
Adam Houston 1, Chris Westbury 1 & Morton Gernsbacher 2
1 Department of Psychology, University of Alberta, Canada; 2 Department of Psychology, University of Wisconsin, USA

Objectifying subjectivity: A quantitative approach to subjective familiarity

1.) Computational Estimation of Subjective Familiarity Ratings

We used a computational method known as genetic programming (GP) to develop a mathematical model of subjective familiarity ratings in terms of a non-linear combination of well-defined lexical variables. GP uses natural selection to evolve mathematical equations (see sidebar, top right) using any number of input variables and without making assumptions about distribution or linearity. We evolved functions that maximized the correlation between each equation’s output and human subjective familiarity ratings for 126 five-letter words of frequency 0 (Gernsbacher, 1984). The input variables were 52 lexical variables from the WordMine Database (Buchanan & Westbury, 2000). These included measures of orthographic and phonological neighborhood size and frequency, and length-controlled and length-uncontrolled biphone/bigram size and frequencies. None of those 52 variables was significantly linearly correlated with the subjective familiarity measures (all ps > 0.05).
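To make the fitness criterion concrete, here is a minimal Python sketch of the kind of correlation-based fitness such a GP run would use. The function name, data shapes, and stand-in data are our illustrative assumptions, not the authors' actual code.

```python
import numpy as np

def correlation_fitness(predicted, ratings):
    """Fitness of one candidate equation: the Pearson correlation between
    its outputs and the human subjective familiarity ratings."""
    predicted = np.asarray(predicted, dtype=float)
    ratings = np.asarray(ratings, dtype=float)
    if predicted.std() == 0 or ratings.std() == 0:
        return 0.0  # constant output: correlation undefined, score it 0
    return float(np.corrcoef(predicted, ratings)[0, 1])

# Hypothetical usage with stand-in data for the 126 zero-frequency words:
rng = np.random.default_rng(0)
ratings = rng.normal(size=126)             # stand-in for human ratings
outputs = ratings + rng.normal(size=126)   # stand-in for an equation's outputs
print(correlation_fitness(outputs, ratings))
```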

GP is stochastic and cannot guarantee that any solution it finds is the best possible one. Accordingly, we repeated the process many times and selected the best result: 25 runs of 2500 equations each, evolving for 75 generations. The best evolved equation combined seven of the input variables (shown in Table 1) to generate estimates that correlated with the subjective familiarity measures at r = 0.57 (see Figure 2).

How does genetic programming work?
Any mathematical equation can be expressed as a tree. For example, consider the tree at the top left in Figure 1. It expresses the equation w + ((y * z) / x). The one beside it expresses the equation (a / b) * log(w). We can mate any two equations by randomly swapping the subtrees that compose them to produce children: equations that have the same elements as their parents. The two trees at the bottom are children of the two at the top. GP ensures that only the best parents are allowed to mate: in this case, the equations whose outputs best predict the human familiarity ratings. This selectivity ensures that the children produced will contain elements that may be useful for the problem at hand. Across many generations of selective breeding, average and best fitness increase. Since fitness is determined here by utility for solving the problem, increases in fitness mean better solutions to the problem of interest. The process is formally identical to selective breeding in biology, where the breeder decides which animals are good enough to be allowed to breed. Following repeated breeding sessions, we select the best solution that has evolved.
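The subtree swap is easy to picture in code. Below is a minimal, self-contained Python sketch of expression trees and one crossover step, using the two parent equations from Figure 1. All names here (Node, crossover, and so on) are our own illustrative choices, not the software the authors used.

```python
import copy
import random

class Node:
    """An expression-tree node: an operator with children, or a leaf symbol."""
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

    def __repr__(self):
        if not self.children:
            return str(self.value)
        if len(self.children) == 1:                # unary operator, e.g. log
            return f"{self.value}({self.children[0]!r})"
        left, right = self.children
        return f"({left!r} {self.value} {right!r})"

def all_nodes(root):
    """Every node in the tree; each is a candidate subtree to donate."""
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(node.children)
    return out

def child_slots(root):
    """All (parent, index) pairs: places where a donated subtree can be grafted."""
    return [(node, i) for node in all_nodes(root)
            for i in range(len(node.children))]

def crossover(tree_a, tree_b, rng=random):
    """Return a child: a copy of tree_a with one randomly chosen subtree
    replaced by a copy of a randomly chosen subtree of tree_b."""
    child = copy.deepcopy(tree_a)
    slots = child_slots(child)
    if not slots:                      # tree_a is a bare leaf: nothing to swap
        return child
    parent, i = rng.choice(slots)
    parent.children[i] = copy.deepcopy(rng.choice(all_nodes(tree_b)))
    return child

# The two parent equations from Figure 1:
p1 = Node('+', [Node('w'),
                Node('/', [Node('*', [Node('y'), Node('z')]), Node('x')])])
p2 = Node('*', [Node('/', [Node('a'), Node('b')]), Node('log', [Node('w')])])

random.seed(2)
print(crossover(p1, p2))   # one possible child of the two parents
```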

2.) Testing the Computational Estimation of Subjective Familiarity Ratings

To test the evolved equation, we used it to select a large number of words and nonwords that it predicted to be either high or low familiarity. None of the words appeared in the original GP input set, and all were known to be recognized by at least 70% of undergraduate subjects. Generating two disjoint and unique stimulus sets for every subject, we asked 34 right-handed native English speakers to undertake two randomly ordered tasks.
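As a sketch of that selection step (our own illustrative reconstruction, not the authors' code): score each candidate string with the evolved function and keep the extremes of the predicted-familiarity distribution.

```python
import numpy as np

def select_extremes(strings, evolved_fn, k):
    """Score each candidate string with the evolved equation and return the
    k lowest- and k highest-scoring items (predicted low / high familiarity)."""
    scores = np.array([evolved_fn(s) for s in strings])
    order = np.argsort(scores)
    low = [strings[i] for i in order[:k]]
    high = [strings[i] for i in order[-k:]]
    return low, high

# Hypothetical usage with a toy scoring function standing in for the
# evolved equation (any callable mapping a string to a number works):
toy_fn = lambda s: sum(c in "aeiou" for c in s) / len(s)
low, high = select_extremes(["blape", "ngvrk", "stine", "xqzjf"], toy_fn, 1)
print("predicted low:", low, "predicted high:", high)
```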

Task A: Rating Subjective Familiarity
One task was a rating task (Figure 3), which required subjects to use a Likert scale to rate a set of words and nonwords on subjective familiarity. The input file also included some very unword-like strings that contained either only vowels or no vowels, in order to encourage subjects to use the full range of the scale. Subjects rated the nonwords selected as ‘low familiarity’ by the evolved equation as significantly less familiar than the nonwords selected as ‘high familiarity’ (t(33) = 4.72; p < 0.0001). There was no effect for words (p > 0.05).

Task B: Lexical Decision
The second task was a lexical decision task (Figure 4), designed to determine whether predicted familiarity would correlate with recognition latencies for words and nonwords predicted to be high or low in subjective familiarity. The input file again included very unword-like strings that contained either only vowels or no vowels. Subjects were significantly slower to reject nonwords that had been predicted to be high familiarity than those that had been predicted to be low familiarity (t(33) = 2.82; p < 0.01). There was no effect for words (p > 0.05), perhaps because the nonwords included highly word-like strings.
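The reported statistics (t(33) with 34 subjects) are consistent with paired, by-subject comparisons. A minimal SciPy sketch with hypothetical per-subject mean RTs, purely to illustrate the test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical per-subject mean RTs (ms) for the two nonword conditions;
# one value per subject, 34 subjects as in the poster.
rt_high_fam_nw = rng.normal(850, 80, size=34)  # predicted-familiar nonwords
rt_low_fam_nw = rng.normal(800, 80, size=34)   # predicted-unfamiliar nonwords

# Paired t-test across subjects: 34 subjects gives df = 33,
# matching the reported t(33).
t, p = stats.ttest_rel(rt_high_fam_nw, rt_low_fam_nw)
print(f"t(33) = {t:.2f}, p = {p:.4f}")
```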

[Figure 1: Some equations as trees. Two parent trees, expressing w + ((y * z) / x) and (a / b) * log(w), and two child trees produced by swapping subtrees between them.]

[Figure 3 chart: familiarity ratings (0-100) for GOOD NWS, BAD NWS, LF W, and HF W stimuli, split by predicted familiar vs. predicted unfamiliar; ** marks significant differences.]

Figure 3: Familiarity ratings by human subjects, as predicted in advance by the evolved GP equation. Bars are standard errors.

[Figure 4 chart: lexical decision RTs (600-1050 ms) for GOOD NWS, BAD NWS, LF W, and HF W stimuli, split by predicted familiar vs. predicted unfamiliar; ** marks significant differences. Correct response rates per bar: 86%, 92%, 99%, 99%, 89%, 88%, 98%, 98%.]

Figure 4: LD RTs for correct responses by human subjects, as predicted in advance by the evolved GP equation. Bars are standard errors. Percentages are correct response rates.

3.) What does it mean? Discussion and conclusions

We have analyzed the evolved equation and simplified it as much as possible. However, it remains complex even in its simplest form. Essentially, it operates by breaking the variance into two roughly orthogonal components that are added together. The first component uses a function of five variables (heavily weighted toward the uncontrolled bigram frequency) if there are no orthographic neighbours; if there are orthographic neighbours, it uses a function of two controlled biphone measures. The second component is a function of uncontrolled bigram frequency and controlled biphone frequency, modulated in rare circumstances when there is a high-frequency phonological neighbour.
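The structure just described can be summarized as a conditional sum. The sketch below is purely schematic: the component functions, variable names, and numeric stand-ins are our placeholders for the actual evolved expression, which we do not reproduce here.

```python
# Stand-in component functions: the real pieces are complex nonlinear
# expressions that we do not reproduce; these exist only to show control flow.
def f_no_neighbours(v):
    # five-variable branch, dominated by uncontrolled bigram frequency
    return v["uncontrolled_bigram_freq"]

def f_with_neighbours(v):
    # branch using two controlled biphone measures
    return v["controlled_biphone_size"] + v["controlled_biphone_freq"]

def second_component(v):
    out = v["uncontrolled_bigram_freq"] * v["controlled_biphone_freq"]
    if v["high_freq_phonological_neighbour"]:
        out *= 0.5   # arbitrary stand-in for the rare modulation
    return out

def predicted_familiarity(v):
    """Schematic shape of the evolved equation: two roughly orthogonal
    components summed, the first branching on orthographic neighbourhood."""
    if v["orthographic_neighbours"] == 0:
        first = f_no_neighbours(v)
    else:
        first = f_with_neighbours(v)
    return first + second_component(v)

example = {"orthographic_neighbours": 0, "uncontrolled_bigram_freq": 1.2,
           "controlled_biphone_size": 0.4, "controlled_biphone_freq": 0.9,
           "high_freq_phonological_neighbour": False}
print(predicted_familiarity(example))
```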

We should not be surprised that most of the variables in the evolved equation are sublexical, since most of the input variables were. What is of interest is that such variables could account for such a large proportion of the variance in measures of subjective familiarity. We have shown that it is possible to objectively capture a significant portion of the variance in the notion of ‘subjective familiarity’, at least for nonwords. We do not wish to claim that our evolved equation is the ‘true’ function used by human decision makers making familiarity judgments. However, if our equation can be used to infer anything about human decision making, it suggests that humans have a subtle sensitivity to a complex set of subword frequency measures. Some experimental evidence directly manipulating small subword features suggests that this is true (Westbury & Buchanan, 2002).

Gernsbacher, M.A. (1984). Resolving 20 years of inconsistent interactions between lexical familiarity and orthography, concreteness, and polysemy. Journal of Experimental Psychology: General, 113(2), 256-281.

Buchanan, L. & Westbury, C. (2000). WordMine Database: Probabilistic values for all four to seven letter words in the English language. http://www.wordmine.org

Westbury, C. & Buchanan, L. (2002). The probability of the least likely non-length-controlled bigram affects lexical decision RTs. Brain and Language, 81(1-3), 66-78.

Abstract: Gernsbacher (1984) showed that low-frequency words could vary in their subjective familiarity, as measured by subjects’ ratings. She also demonstrated that manipulating stimuli by subjective familiarity rating had an effect on word recognition latencies. The finding of such consistency in the subjective ratings of words of equal low frequency raises the question: what aspects of a word are subjects using to make their judgments? The current work addresses this question. We used genetic programming to develop an explicitly specified mathematical model of Gernsbacher’s subjective familiarity ratings for words with an orthographic frequency of 0. We tested the model experimentally using two tasks. It was able to account for a significant amount of the variance in Gernsbacher’s subjective familiarity ratings, and to predict familiarity judgments and recognition latencies for nonwords.

Figure 2: Regression line of the best evolved equation for predicting subjective familiarity. [Scatterplot: standardized ratings (y-axis) vs. standardized evolved estimates (x-axis); r = 0.57.]

Table 1: The 7 input variables used by the best evolved equation for predicting subjective familiarity
