
JOURNAL OF VERBAL LEARNING AND VERBAL BEHAVIOR 7, 707-713 (1968)


Words, Associations, and Networks

George R. Kiss

Birkbeck College, University of London¹

The purpose of this paper is to call attention to the usefulness of graph theory as a tool in the structural analysis of relationships between words. Some elementary concepts of graph theory are briefly described. The concept of a connector node is introduced. The relationship between graphs and matrices is illustrated, with particular reference to Markov chains. Signal-flow graph methods are briefly described and their use is illustrated in the context of a new theory of word association. The ability of graph-theoretical language to replace the ambiguous and awkward terminology of S-R formulations is demonstrated by translating the definitions of six measures of word relatedness into this language. It is shown that some inconsistencies and weaknesses thereby become apparent.

The analysis of structural relationships between the elements of a set is an important problem in many areas of psychology. Examples are provided by sociometry, psychological [...], and the syntactic organization of sentences. Within the field of verbal behavior notions of structural analysis have been implicit in the diagrams drawn by S-R psychologists for a long time. Systematic and formal attempts at making use of structural analysis, however, are relatively recent. Probably Deese (1962) was the first to emphasize the fact that structural methods, like factor analysis, are appropriate tools in the study of associative structure. The same point was made, even more forcefully, by Deese (1965) in his recent book. A different approach has been pursued by Kiss (1965), and by Pollio (1963, 1966). The present paper is concerned with a general consideration of graph theory as the major tool of structural analysis, and with some applications of this theory in the context of word-association behavior. A specific branch of graph theory, that of signal flow graphs, originally introduced by Mason (1953) in electrical engineering, is described.

Graph Theory

An excellent introduction to graph theory for social scientists is now available in the book by Harary, Norman, and Cartwright (1965). Previous expositions of the subject were either relatively inaccessible, or were aimed at the mathematically sophisticated reader (Berge, 1962). The availability of the book by Harary et al. makes it possible to restrict the discussion here to the introduction of the most basic concepts only. The interested reader can find a goldmine of further ideas in that book, or in the rest of the literature. The exposition here will be informal rather than rigorous. The terminology used is a mixture from various sources.

A graph consists of a set of nodes interconnected by lines. If the lines have a direction, usually shown by an arrowhead, then the graph is a directed graph. If the directed lines also have numbers associated with them, then the graph is called a network. An example of a network is shown in Fig. 1a. This network consists of the nodes x1, x2, x3, x4, and x5, and the directed lines (arcs) connecting them, each having the value shown by the integer written next to it.

A path is a sequence of arcs. A path is formed in Fig. 1a, for example, by the sequence of arcs x1x2, x2x4, x4x5. A circuit is a closed path; i.e., its initial and terminal nodes are the same. The simplest case is the loop, shown at node x4; it consists of the arc x4x4 only. A more complex circuit is formed by the arcs connecting the nodes x1, x2, x4, x3, x1. The length of a path is the number of arcs in it. A network is complete if every node pair is connected by an arc in at least one of the two possible directions.
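As a minimal sketch, such a valued directed network can be held as a mapping from each node to its outgoing arcs. The arc values below are assumptions chosen only to be consistent with the features mentioned in the text (the path x1x2, x2x4, x4x5; the loop at x4); they are not the actual values of Fig. 1a.

```python
# A small valued directed network: each node maps to {successor: arc value}.
# The values are invented for illustration.
network = {
    "x1": {"x2": 5, "x5": 6},
    "x2": {"x4": 1},
    "x3": {"x1": 4},
    "x4": {"x1": 2, "x3": 1, "x4": 2, "x5": 3},   # note the loop x4 -> x4
    "x5": {},
}

def is_path(net, nodes):
    """True if consecutive nodes are joined by arcs, i.e. the sequence is a path."""
    return all(b in net[a] for a, b in zip(nodes, nodes[1:]))

def is_complete(net):
    """A network is complete if every node pair is joined in at least one direction."""
    nodes = list(net)
    return all(b in net[a] or a in net[b]
               for i, a in enumerate(nodes) for b in nodes[i + 1:])

print(is_path(network, ["x1", "x2", "x4", "x5"]))  # the path x1x2, x2x4, x4x5 -> True
print(is_complete(network))                        # False: e.g. x2 and x3 are not joined
```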

¹ Now at M.R.C. Speech and Communication Research Unit, 31 Buccleuch Place, Edinburgh.

A concept which will turn out to be useful later is that of a connector node.


This is best shown by an example. In Fig. 1a node x5 is a connector node: it connects nodes x1 and x4. This concept can be extended to any number of connected nodes. A connector node receives arcs from each of the nodes which it connects.
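A connector node can also be identified mechanically: it is a node receiving an arc from every member of the set it connects. The same invented network as above is used; only the fact that x5 receives arcs from x1 and x4 follows Fig. 1a.

```python
network = {
    "x1": {"x2": 5, "x5": 6},
    "x2": {"x4": 1},
    "x3": {"x1": 4},
    "x4": {"x1": 2, "x3": 1, "x4": 2, "x5": 3},
    "x5": {},
}

def connectors(net, members):
    """Nodes (outside the given set) that receive an arc from every node in the set."""
    members = set(members)
    return [node for node in net
            if node not in members and all(node in net[m] for m in members)]

print(connectors(network, ["x1", "x4"]))   # ['x5']
```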

A special example of a network is provided by the transition diagrams of Markov chains. In this case the value of each arc is a number m such that 1 > m > 0. The values of the arcs coming out of any node are normalized so that they sum to 1. When a network is the representation of a Markov chain, the nodes correspond to the states of a system and the arcs depict transitions between states. The value of an arc is the conditional probability (or transition probability) that if the system is now in the state corresponding to the initial node of an arc, then next it will go into the state corresponding to the terminal node of the arc.

Fig. 1. (a) Example of a network: a directed graph with values assigned to the arcs. (b) The value matrix of the network in (a).

There is an intimate connection between graphs and matrices. Indeed, one of the attractive features of this field is that one can shift between the graphical, intuitively more meaningful, and the matrix, sometimes analytically more powerful, methods of dealing with problems. The relationship is illustrated in Fig. 1b, showing the value matrix of the network of Fig. 1a. The rows and columns of this matrix correspond to the nodes of the network and for this reason the matrix is square. The entry at the intersection of row x1 and column x2 is the value of the arc going from node x1 to node x2 in the network.

Some useful concepts can be readily illustrated. The number of entries in a row is the outdegree of the corresponding node (i.e., the number of arcs emerging from it). The sum of the entries in a row is the outdegree value of the node. (Notice that for the network of a Markov chain the value matrix is a probability matrix, i.e., the entries are nonnegative and the rows sum to 1.) Analogously, the number of entries in a column, and the sum of these entries, define the indegree and the indegree value of the relevant node. For example, in Fig. 1 the indegree and outdegree of node x1 are both 2, while the indegree value is 6 and the outdegree value is 11.
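These degree concepts translate directly into a short computation. The network below is the same invented one as before; its values happen to reproduce the figures quoted for x1 (indegree and outdegree both 2, indegree value 6, outdegree value 11), but it is not Fig. 1a itself.

```python
network = {
    "x1": {"x2": 5, "x5": 6},
    "x2": {"x4": 1},
    "x3": {"x1": 4},
    "x4": {"x1": 2, "x3": 1, "x4": 2, "x5": 3},
    "x5": {},
}
nodes = sorted(network)

# Value matrix: entry (i, j) is the value of the arc i -> j, or 0 if there is none.
value_matrix = [[network[i].get(j, 0) for j in nodes] for i in nodes]
for row in value_matrix:
    print(row)

def outdegree(net, node):
    return len(net[node])                       # number of arcs emerging from the node

def outdegree_value(net, node):
    return sum(net[node].values())              # sum of the values of those arcs

def indegree(net, node):
    return sum(1 for i in net if node in net[i])

def indegree_value(net, node):
    return sum(net[i][node] for i in net if node in net[i])

print(outdegree(network, "x1"), outdegree_value(network, "x1"))  # 2 11
print(indegree(network, "x1"), indegree_value(network, "x1"))    # 2 6
```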

The powers of the value matrix have a rather interesting interpretation. This will be illustrated by means of the stochastic (probability) matrix of a Markov chain. Let us denote the matrix of transition probabilities by M. The entries m_ij of M are, as before, the transition probabilities of going from state i to state j. Now the powers of a probability matrix are also probability matrices. The interpretation of M^n is that it gives the n-step probabilities, m_ij(n), of going from state i to state j in exactly n transitions.
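The n-step interpretation is easy to check numerically. The two-state transition matrix below is an arbitrary assumption used only to illustrate the point.

```python
import numpy as np

# A toy 2-state Markov chain; rows sum to 1.
M = np.array([[0.8, 0.2],
              [0.5, 0.5]])

# Entry (i, j) of M^n is the probability of being in state j exactly n steps
# after starting in state i.
for n in (1, 2, 3):
    Mn = np.linalg.matrix_power(M, n)
    print(f"n = {n}:\n{Mn}")
    print("row sums:", Mn.sum(axis=1))   # each power is again a probability matrix
```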

Matrix algebra can be used for the solution of many interesting problems in graph theory. It also has the advantage that its operations can be readily adapted for digital computers. When the graphs to be treated become large, however, solution by matrix manipulation can become time consuming and wasteful of storage space. In such cases, and also when certain transformations of the graph under consideration are required, it is more economical, and often also intuitively more insightful, to use methods based directly on the topology of the graph. One of the most widely used such methods was developed, in the context of engineering systems, by Mason (1953). These techniques are known in that field by the name of signal flow graphs (Robichaud, Boisvert, and Robert, 1962; Lorens, 1964; and many others).

Signal-Flow Graphs

Signal-flow graphs are networks in which the nodes represent variables and the arcs represent the functional relationships among them. As introduced by Mason, the variables are signals traveling along the arcs (branches) of the network. The signals are modified by the characteristics of the branches traversed. The values of the branches are called transmittances and they represent the coefficients of the equations describing the functional relationships between the signals. This is illustrated in Fig. 2. Every node combines all the incoming signals and transmits the resulting signal along all of the outgoing branches.

It can be shown that the topological transformations of a signal-flow graph correspond to algebraic transformations of the corresponding system of equations.


Fig. 2. The basic linear flow graph.

It is then possible to "solve" the graph for certain variables, or to transform it into some residual form in which only the nodes of special interest are retained. The reduction of a flow graph to some essential elements proceeds by the successive elimination of inessential nodes. This operation is the analogue of the elimination of a variable in a system of equations.

Fig. 3. Elementary signal flow-graph transformations.

The elementary transformations which can be used in a reduction process are shown in Fig. 3. Transformations a and c eliminate nodes, while transformation b reduces the number of branches. Transformation d shows the elimination of a loop. This last transformation is needed because the reduction of signal-flow graphs containing circuits will sooner or later lead to the appearance of loops. Such flow graphs are sometimes called feedback graphs, in distinction from those which do not contain circuits, which are called cascade graphs. From transformation a it can be seen that the transmittance of a path is the product of the component branch transmittances. Transformation b shows that the transmittance resulting from two parallel branches is the sum of those two transmittances. In a cascade signal-flow graph, therefore, the transmittance between any two nodes can be evaluated by obtaining the transmittances of all the paths connecting those nodes and then summing these values. This calculation is equivalent to the reduction of the graph to a single branch connecting the two nodes. An example of this calculation is shown in Fig. 4.

Fig. 4. Example of signal flow-graph reduction.
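The path-summing rule for cascade graphs can be carried out mechanically. The small graph below is an assumption (the intermediate nodes u and v are hypothetical), arranged so that its transmittance from x1 to x2 comes out as ae + b(d + ce), the kind of expression produced by a reduction like that of Fig. 4.

```python
# Branch transmittances of a small cascade (circuit-free) flow graph.
# Symbol names a..e are kept only for readability; numeric values do the work.
values = {"a": 0.4, "b": 0.3, "c": 0.5, "d": 0.2, "e": 0.6}
graph = {
    "x1": {"u": "a", "v": "b"},
    "v":  {"u": "c", "x2": "d"},
    "u":  {"x2": "e"},
    "x2": {},
}

def paths(g, start, end, seen=()):
    """All directed paths from start to end (the graph is assumed circuit-free)."""
    if start == end:
        yield [start]
        return
    for nxt in g[start]:
        if nxt not in seen:
            for rest in paths(g, nxt, end, seen + (start,)):
                yield [start] + rest

def transmittance(g, start, end):
    """Sum over all paths of the product of branch transmittances."""
    total = 0.0
    for p in paths(g, start, end):
        prod = 1.0
        for i, j in zip(p, p[1:]):
            prod *= values[g[i][j]]
        total += prod
    return total

# Paths x1-u-x2, x1-v-x2 and x1-v-u-x2: a*e + b*d + b*c*e = ae + b(d + ce)
print(transmittance(graph, "x1", "x2"))   # 0.4*0.6 + 0.3*0.2 + 0.3*0.5*0.6 = 0.39
```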

Application in a Theory of Word Association

It is now time to turn to words and associations. In the study of associative structure it is quite clearly convenient to use a network in which the nodes are words and the arcs represent an associative relationship. Word-association norms usually specify the "strength" of an association by giving the relative frequency of one word as a response to another. It seems natural, therefore, to use these relative frequencies (estimates of probabilities) as the values in the network. The network of associative connections among a set of words can be ascertained by using each word in the set as a stimulus with a group of Ss, or with the same S repeatedly. The data from such an experiment can then be used for the construction of a network which is descriptive for that particular set of words and set of Ss on that particular occasion.
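Assembling such a network from raw response counts is straightforward. The miniature norms below are invented for illustration; real normative tables would be substituted.

```python
# Hypothetical response counts from a word-association experiment:
# for each stimulus word, how many Ss gave each response.
counts = {
    "GLOVE": {"HAND": 55, "FINGER": 20, "WARM": 10, "MITTEN": 15},
    "HAND":  {"FINGER": 40, "GLOVE": 30, "ARM": 30},
}

def association_network(counts):
    """Turn counts into arc values: relative frequencies (probability estimates)."""
    net = {}
    for stimulus, responses in counts.items():
        total = sum(responses.values())
        net[stimulus] = {word: n / total for word, n in responses.items()}
    return net

net = association_network(counts)
print(net["GLOVE"]["HAND"])        # 0.55
print(sum(net["HAND"].values()))   # outdegree value is 1.0, as for a Markov chain
```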

The resemblance of such a network to a Markov chain is worth noting. The outdegree values of the nodes are properly normalized, and the restriction on the values of the arcs is met. However, the specification of a system for which such a graph is the state transition diagram is not simple. Elsewhere (Kiss, 1967) the author has presented a detailed account of a model which can provide a possible rationalization of such networks and relate them to the word-selection process in a single individual on a single occasion. For lack of space it will have to suffice here merely to restate the conclusions reached in that paper.

An association network based on data obtained from a group of Ss essentially shows the probabilities that if any S at random is interrogated with one of the words in the set, then within at most a few minutes he will respond with the word at the terminating node of the corresponding arc. The time he spends in selecting his response is a random variable. The essence of the theory presented in the paper mentioned above is that the S spends this time on going through the transitions of a stochastic process (a branching process) during which the activities of the internal representations of words vary. The process terminates when certain threshold conditions are reached. The variation of activities is a result, among other things, of the transmission of excitation between the neural representations of words along associative links. This transmission is a stochastic affair, which essentially determines the character of the word-selection process. The most important quantity turns out to be the expected (mean) amount of activity produced in word j by a unit activity in word i after n units of time. When these expectation values are suitably normalized they form a Markov process, called the expectation process. It is argued that the empirical word-association networks are related to this expectation process.

Since in the usual word-association experiment the time limit for giving a response is at most a few minutes, the resulting probabilities can be interpreted as the sums of the 1-step, 2-step, ..., n-step transition probabilities, up to some limit. If the 1-step transition probabilities were known, from this information one could obtain the n-step probabilities.

The practical importance of this is that the word-association norms, i.e., the actual words which occur and their probabilities, could be predicted from the knowledge of the direct (1-step) connections, since all others are generated by longer and longer transition sequences.
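A sketch of this summation, using an invented sub-stochastic matrix of direct connections (rows sum to less than 1 because weak responses are assumed to have been discarded), is:

```python
import numpy as np

# Invented 1-step (direct) association probabilities among three words.
M1 = np.array([[0.0, 0.5, 0.2],
               [0.4, 0.0, 0.3],
               [0.1, 0.3, 0.0]])

def up_to_n_steps(M, n):
    """Sum of the 1-step, 2-step, ..., n-step transition probabilities."""
    total = np.zeros_like(M)
    power = np.eye(M.shape[0])
    for _ in range(n):
        power = power @ M
        total += power
    return total

# Predicted norm probabilities if responses may be reached in up to 3 steps.
print(up_to_n_steps(M1, 3))
```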

The 1-step probabilities can be assessed approximately by making use of Marbe's law, which states that the words given with the shortest latency (short transition chains) are the ones which have the highest commonality (i.e., probability of occurrence among a sample of Ss). Some approximation to the 1-step probabilities can be obtained, therefore, by taking, say, the top ten responses (according to frequency rank) to each stimulus in the normative tables. From the 1954 Minnesota norms the tenth word has a mean probability of 0.014. (The arbitrary value of ten responses is chosen by noting that the first ten responses account for about 80% of the total number of responses in the norms.)

If such data are now obtained for each of the 100 words on the Kent-Rosanoff list, and then for each of the new words occurring among the first ten responses to each stimulus word, and so on, up to some limit, then the resulting network should be able to provide very good predictions of the full normative tables for any of the words which are not too far from the center of the network. From some evidence presented earlier (Kiss, 1967), it seems unlikely that more than three or four steps would be required in growing the network. The author is at present engaged in the collection of such data. When available, it should make it possible to calculate the response probability of any word in the network to any other word, by flow graph or matrix methods. These data will therefore have important predictive value for any experiments in which single words are selected as responses to verbal stimuli.

An alternative test of the theory has been given in the earlier paper (Kiss, 1967). In that test the association norms were successfully predicted from the knowledge of small-sample data on the corresponding network. In particular, the network was mapped out by using 50 Ss, giving one response to each word, and then these data were used to generate the full association norms of the word at the center of the network. Correlations between predicted and observed values were highly significant and had an average value of .61. In view of the simplifications involved (due to restricted computing facilities), and the prediction of American norms from British data, these results can be regarded as strong evidence in favor of the model.

Applications in Measuring Word Relatedness

A large number of different measures of word relatedness based on normative word-association data has been defined in the literature. Six of them were reviewed by Marshall and Cofer (1963). All of these are implicitly based on graph-theoretical concepts; namely, the indegree and outdegree values of nodes in a network, with various restrictions. Their ad hoc nature, and the weaknesses inherent in some of them, are mainly due to the fact that the relationship of the indices to the total network in which the relevant items are embedded is not explicitly recognized. The usefulness of the graph-theoretical language is best demonstrated by showing how these indices can be expressed in this language.

Measure of Relatedness (MR). This index measures the associative relatedness of two words S1 and S2. Marshall and Cofer define it as the sum of the responses in common to both S1 and S2, divided by the total number of responses to S1 and S2. In network terminology this can be expressed as the sum of the indegree values over the nodes which connect S1 and S2, divided by the sum of the outdegree values of S1 and S2. The calculation of the indegree values is here rather arbitrarily changed from a simple summation to taking the "overlap" value (i.e., the smaller of the two values if they are unequal). It is worth noting that in calculating this index self-loops are introduced at both S1 and S2, the values of which are arbitrarily assumed to be 100%. The actual value of this feedback can be evaluated by flow-graph methods as the transmittance leading back to the node via all circuits in the network.
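Read this way, MR is mechanical to compute from a network. The sketch below follows the rewording above (overlap values over connector nodes, divided by the sum of the two outdegree values); the arc values are invented, and the arbitrary 100% self-loops discussed above are omitted for simplicity.

```python
def overlap_relatedness(net, s1, s2):
    """MR in network terms: sum over connector nodes of the 'overlap'
    (the smaller of the two arc values from s1 and s2), divided by the
    sum of the outdegree values of s1 and s2."""
    connectors = set(net[s1]) & set(net[s2])
    overlap = sum(min(net[s1][c], net[s2][c]) for c in connectors)
    out_total = sum(net[s1].values()) + sum(net[s2].values())
    return overlap / out_total

# Invented arc values (relative response frequencies).
net = {
    "GLOVE": {"HAND": 0.55, "FINGER": 0.20, "WARM": 0.10, "MITTEN": 0.15},
    "HAND":  {"FINGER": 0.40, "GLOVE": 0.30, "ARM": 0.30},
}
print(overlap_relatedness(net, "GLOVE", "HAND"))  # overlap on FINGER: 0.20 / 2.0 = 0.1
```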

Fig. 5. Network for the calculation of the Index of Generalization. GLOVE is the "tested" word, HAND is the "trained" word; FINGER, HOLD, and WARM are connector words.

Index of Generalization (I.G.). This index purports to measure the amount of generalization to be expected between a "trained" and a "tested" word. It is defined as "the sum of frequencies to tested word of R's common to tested and trained word, divided by the sum of frequencies of all responses to test word" (Marshall and Cofer, 1963, p. 413). In network terminology this can be expressed as: the "overlap" outdegree value of the tested word taken over arcs leading to connectors to the trained word, divided by the outdegree value of the tested word. It is worth drawing the corresponding graph for this index, because one deficiency of it becomes apparent from it. This is shown in Fig. 5 for the example used by Marshall and Cofer. The tested word is GLOVE. Its outdegree value can be divided into two parts, one of which results from arcs leading to connectors to the trained word, HAND, and the other which results from arcs to other words. The I.G. index is defined as the ratio of these two parts. It would be reasonable to expect, however, that the arcs leading from the trained word to other words will also have some influence on the generalization process and should be taken into account. Another inconsistency in the calculation of I.G. is related to the self-loops at GLOVE and HAND. It has already been mentioned that the actual values of these loops could be evaluated from the full graph of the network, instead of making the assumption (based on the idea of the "implicit response") that these have a value of 100%. But apart from this, the calculation of I.G. takes into account the self-loop at the trained word, but not at the tested word. This asymmetry is illogical, since GLOVE (the tested word) is just as much a connector node in this graph as HAND (the trained word) is.
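The corresponding sketch for I.G., under the same invented values and again without the arbitrary self-loops, is:

```python
def index_of_generalization(net, tested, trained):
    """I.G. in network terms: the 'overlap' outdegree value of the tested word,
    taken over arcs leading to connectors with the trained word, divided by
    the outdegree value of the tested word."""
    connectors = set(net[tested]) & set(net[trained])
    overlap = sum(min(net[tested][c], net[trained][c]) for c in connectors)
    return overlap / sum(net[tested].values())

net = {
    "GLOVE": {"HAND": 0.55, "FINGER": 0.20, "WARM": 0.10, "MITTEN": 0.15},
    "HAND":  {"FINGER": 0.40, "GLOVE": 0.30, "ARM": 0.30},
}
print(index_of_generalization(net, tested="GLOVE", trained="HAND"))  # 0.20 / 1.0 = 0.2
```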

The next four indices all relate to the measurement of the relationship between a set of words, or between one element of the set and the rest of it.

Inter-Item Associative Strength (I.I.A.S.). This index measures the extent to which a set of words elicit each other as responses. It turns out to be the mean outdegree value taken over the network of the set of words. Marshall and Cofer give three verbal formulations of I.I.A.S. On page 414 they say that it "is the mean percentage that list members elicit each other as responses." This is misleading, since the mean is actually obtained by summing the values of all associative connections and dividing by the number of words in the list (rather than by the number of associative connections). On page 415, line 18, the word "times" should read "divided by." The description given in the first paragraph of this page is correct. However, the formula at the foot of the page contains the symbol DRc, defined as "direct associates in common." It is not clear to what these associates are in common, and in any case the original description given by Deese (1959) does not seem to make a restriction of this kind.
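Taking the graph-theoretical reading of I.I.A.S. (the sum of arc values among members of the set, divided by the number of words), a sketch with an invented three-word list is:

```python
def iias(net, word_set):
    """Mean outdegree value over the set: sum of the values of all arcs running
    between members of the set, divided by the number of words in the set."""
    words = set(word_set)
    total = sum(value
                for w in words
                for target, value in net.get(w, {}).items()
                if target in words and target != w)
    return total / len(words)

# Invented relative frequencies among a three-word list.
net = {
    "BREAD":  {"BUTTER": 0.45, "FOOD": 0.10},
    "BUTTER": {"BREAD": 0.40, "YELLOW": 0.15},
    "FOOD":   {"EAT": 0.30, "BREAD": 0.05},
}
print(iias(net, ["BREAD", "BUTTER", "FOOD"]))  # (0.45 + 0.10 + 0.40 + 0.05) / 3 ≈ 0.33
```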

Index of Total Association (I.T.A.). This index aims to take into account also the indirect connections between words in the network. Although it is called an index of total association, it in fact deals with indirect connections of length 2 only. The definition given by Marshall and Cofer is again ambiguous (the phrase "associations in common" and the corresponding symbol Rc is used with a different meaning from that in the definition of I.G.). Nevertheless, this index seems to be the mean outdegree value of the set, calculated for arcs leading to any connectors or other items of the set, divided by the sum of the outdegree values of the set. Notice here that the network on which the calculation is based has more nodes than just the set for which the index is obtained.

Index of Concept Cohesiveness (I.C.C.). This is defined as the sum of the indegree values of the nodes which connect all words in the set, divided by the sum of the indegree values of all connectors of the set.

Measure of Stimulus Equivalence (S.E.). This is defined as the number of nodes in the network which connect two or more words in the set, with the restriction that the values of the arcs must be over 3%.
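S.E. then reduces to counting connector nodes subject to a threshold on the arc values. A sketch with invented values:

```python
def stimulus_equivalence(net, word_set, threshold=0.03):
    """Number of nodes that receive arcs of value above `threshold`
    from two or more words of the set."""
    words = set(word_set)
    receivers = {}
    for w in words:
        for target, value in net.get(w, {}).items():
            if value > threshold:
                receivers.setdefault(target, set()).add(w)
    return sum(1 for sources in receivers.values() if len(sources) >= 2)

net = {
    "GLOVE": {"HAND": 0.55, "FINGER": 0.20, "WARM": 0.10},
    "SCARF": {"WARM": 0.25, "NECK": 0.30, "FINGER": 0.02},
}
# WARM connects GLOVE and SCARF above the 3% cut-off; FINGER does not (0.02 < 0.03).
print(stimulus_equivalence(net, ["GLOVE", "SCARF"]))  # 1
```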

Enough has been said now to show how graph-theoretical concepts can be used to advantage in replacing the ambiguous and awkward language of S-R terminology. Apart from this purely notational advantage, the use of graph theory directs attention to some conceptual inconsistencies in the definitions of some of these measures. The usefulness of these indices in making predictions for performance in tasks like free recall, paired-associate learning, mediated transfer, and others, is of course an empirical matter, currently under study by a number of investigators. Some of the author's experiments along these lines, to be published elsewhere, indicate that a measure of relatedness based on the total flow-graph transmittance existing between any two nodes of a network (see Fig. 4 for an example) may turn out to be useful in predicting recall-like processes, i.e., those where actual elicitation of words is involved.
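When the value matrix of the network is available and the arc values are small enough for the path series to converge (for example, a sub-stochastic matrix of retained responses), such a total transmittance can be obtained for every node pair at once with a matrix inverse. The three-word matrix below is invented, and this is only one way of carrying out the calculation.

```python
import numpy as np

# Invented sub-stochastic 1-step association matrix (rows sum to < 1).
M = np.array([[0.0, 0.5, 0.2],
              [0.4, 0.0, 0.3],
              [0.1, 0.3, 0.0]])

# Total transmittance between every pair of nodes, counting directed pathways
# of every length (including those passing around circuits):
#   M + M^2 + M^3 + ... = M (I - M)^(-1),  valid when the series converges.
I = np.eye(M.shape[0])
total_transmittance = M @ np.linalg.inv(I - M)

print(total_transmittance)   # entry (i, j): total flow from word i to word j
```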

The use of flow-graph methods entails a conceptual background in which the main preoccupation is with the amount of excitation flowing into and out of nodes of a network and with directional pathways of any length existing between nodes. This approach is to be contrasted with the conceptual background of S-R terminology, where the main preoccupation is with "responses elicited in common by two or more stimuli," leading to a consideration of connectors instead of paths. There are some indications in the literature that this approach is more successful in predicting generalization-like processes.

References

Berge, C. The theory of graphs. London: Methuen, 1962.

Deese, J. Influence of inter-item associative strength upon immediate free recall. Psychol. Rep., 1959, 5, 305-312.

Deese, J. On the structure of associative meaning. Psychol. Rev., 1962, 69, 161-175.

Deese, J. The structure of associations in language and thought. Baltimore: The Johns Hopkins Press, 1965.

Harary, F., Norman, R. Z., and Cartwright, D. Structural models: an introduction to the theory of directed graphs. New York: Wiley, 1965.

Kiss, G. R. Clustering of words in association networks and in free recall. Bull. Brit. Psychol. Soc., 1965, 18, No. 58, p. 7A.

Kiss, G. R. Steps towards a model of word selection. Bull. Brit. Psychol. Soc., 1967, 20, No. 66, p. 6A.

Lorens, C. S. Flow graphs: for the modelling and analysis of linear systems. New York: McGraw-Hill, 1964.

Marshall, G. R., and Cofer, C. N. Associative indices as measures of word relatedness: a summary and comparison of ten methods. J. verb. Learn. verb. Behav., 1963, 1, 408-421.

Mason, S. J. Feedback theory: some properties of signal flow graphs. Proc. I.R.E., 1953, 41, 1144-1156.

Pollio, H. R. A simple matrix analysis of associative structure. J. verb. Learn. verb. Behav., 1963, 2, 166-169.

Pollio, H. R. The structural basis of word association. The Hague: Mouton, 1966.

Robichaud, L. P. A., Boisvert, M., and Robert, J. Signal flow graphs and applications. London: Prentice-Hall International, 1962.

(Received August 22, 1967)