Six-Year Developmental Course of Internalizing and Externalizing Problem Behaviors


FRANK C. VERHULST, M.D., AND JAN VAN DER ENDE, M.S.

Abstract. The 6-year developmental course of parent-reported problem behavior in an epidemiological sample of 936 children assessed with the Child Behavior Checklist at 2-year intervals was determined. Children who were scored in the deviant range of the total problem score at time 1 were nine times more likely to be scored deviant 6 years later than were children who were not deviant at time 1 (odds ratio 9.0). Of the deviant children at time 1, 33% were deviant at time 4. There was no difference in the persistence of externalizing versus internalizing problems. This underscores the notion that internalizing problems should not be disregarded. Although this study demonstrated moderate stability of problem behaviors across a 6-year interval, children's problem behaviors should not be regarded as static. Many children showed changes in their level of functioning across time. However, extreme changes were the exception rather than the rule. J. Am. Acad. Child Adolesc. Psychiatry, 1992, 31, 5:924-931. Key Words: epidemiology, follow-up, child psychopathology, longitudinal.

The developmental approach to questions of child psychopathology is characterized by two basic qualities. The first is that development involves change. Children undergo rapid changes in biological, cognitive, emotional, and social functioning. Without children's ability to show developmental changes, questions concerning the etiology and course of maladaptive behaviors would not be relevant. The second quality of developmental processes is that although children show many changes in their functioning, there must be some stability or connectedness. The developmental theories of Freud and Piaget, for example, rely on the assumption that successive stages are causally and coherently linked. Information on the developmental stability of children's problem behaviors is important from a practical as well as theoretical point of view. If problem behaviors in children fail to show persistence across time, this would cast doubt on the necessity of treatment. However, if stability of serious problem behaviors proves to be substantial, this would argue against a wait-and-see policy and for adequate intervention.

Often, longitudinal research in the field of child psychopathology is handicapped by a lack of standardized assessment procedures and a lack of operational definitions for diagnostic categories. In previous studies, 2- and 4-year stability of parent reported problem behaviors, and 4-year stability of teacher reported problem behaviors in an epidemiological

Accepted December 2, 1991.

Dr. Verhulst is professor and director of child and adolescent psychiatry at Sophia Children's Hospital, Erasmus University, Rotterdam, The Netherlands, where Mr. Van der Ende is research psychologist.

This research was financially supported by grants from the Sophia Foundation for Medical Research and by the Dutch National Programme for Stimulation of Health Research. Portions of this study were presented at the Society for Research in Child and Adolescent Psychopathology, Zandvoort, Holland, June 20, 1991. We wish to thank Dr. Thomas Achenbach for his critique on this manuscript.

Reprint requests to Dr. Verhulst, Erasmus University, Rotterdam, Sophia Children's Hospital, Gondelweg 160, 3038 GE Rotterdam, The Netherlands.

0890-8567/92/3105-0924$03.00/0 © 1992 by the American Academy of Child and Adolescent Psychiatry.


sample of children was assessed (Verhulst and Althaus, 1988; Verhulst et al., 1990; Verhulst and van der Ende, 1991). Problem behaviors were assessed with the Child Behavior Checklist (CBCL) (Achenbach and Edelbrock, 1983) and the teacher version of this instrument, the Teacher's Report Form (TRF) (Achenbach and Edelbrock, 1986). Using these standardized assessment procedures, high stability was found for the level of parent reported problem behaviors (r = 0.66 for CBCL-total problem scores across a 4-year interval), whereas the stability for teacher reported problem behaviors was medium (r = 0.37 for TRF-total problem scores across the 4-year interval). Stability of both teacher and parent reported problem behavior in children aged 4 to 12 years did not differ significantly by age. Stability was equal for both sexes in parent reported problems, whereas teacher ratings showed more stability for girls than boys.

In a previous report on the 4-year follow-up of parent reported problem behaviors (Verhulst et al., 1990), an overview was given of prospective studies published before 1988 in which the stability of problem behaviors was assessed in children from the general population. The studies by Richman et al. (1982) and Graham and Rutter (1973) are most relevant for the study to be reported here. In both these prospective longitudinal studies of epidemiological samples, caseness was defined in terms of categorical criteria. Richman et al. (1982) reported the prevalence of psychiatric disorder in a sample of 705 3-year-old children from a London borough. For a subsample of 185 children, information was obtained at ages 4 and 8. It was found that 61% of the problematic children at age 3 still showed considerable difficulties on a clinical rating 5 years later; 38% were labeled antisocial and 23% neurotic by clinical ratings of the type of deviance. In the 4-year follow-up of a general population sample of 10- to 11-year-olds on the Isle of Wight, Graham and Rutter (1973) found that of children with time 1 clinical diagnoses of conduct disorder, 75% remained deviant at time 2. Of the children with time 1 diagnoses of emotional disorder, 46% remained deviant at time 2.

Two recent studies reported on the stability of problem behaviors in general population samples. Esser et al. (1990)

J. Am. Acad. Child Adolesc. Psychiatry, 31:5, September 1992

studied the persistence of psychiatric disorder in a general population sample of 399 children assessed at ages 8 and 13. Of the 71 children initially diagnosed as having a psychiatric disorder, approximately 50% remained disturbed across the 5-year interval. The prognosis for emotional disorders was found to be more favorable than that for conduct disorders. Berden et al. (1990) studied the relationship between life events and problem behavior in Dutch children from the general population assessed longitudinally across a 2-year interval, using the CBCL. A significant but small effect, accounting for 2.9% of variance, was found between total scores for life events and CBCL total problem scores at the second time of assessment, with initial CBCL scores partialled out.

In general, most studies showed modest stability of problem behaviors. Firm conclusions cannot be drawn, however, because of several limitations of the studies, such as the restriction of the sample to a single locality, the use of selected samples, the use of different assessment instruments at different times, and the use of very broad categories of functioning.

A number of studies compared the prognosis of externalizing versus internalizing disorders. These studies showed conflicting results, however. In the Isle of Wight 4-year longitudinal study (Graham and Rutter, 1973), emotional disorders tended to be somewhat less persistent than conduct disorders. However, Verhulst and Althaus (1988) found little difference in the 2-year persistence of deviance among children from the general population who scored above the 90th percentile on the CBCL internalizing scale or the externalizing scale. Of the children who were initially deviant on the externalizing scale, 28% remained deviant on this scale, whereas 24% of the children who were initially deviant on the internalizing scale remained so. McConaughy et al. (1992) also reported little difference in the persistence of parent reported externalizing (52%) versus internalizing (44%) problems in a general population sample across a 3-year interval. A number of studies investigated the persistence of depressive disorders in clinic samples and reported considerable continuity across time (Harrington et al., 1990; Kovacs et al., 1984a, b).

Although it has often been assumed that internalizing disorders have a rather good prognosis, few studies have tested this over a reasonably long period. The present study enabled us to test the difference in persistence of parent reported externalizing versus internalizing disorders across a 6-year period in children from the general population assessed at 2-year intervals with the same instrument.

The main aims of the present study were: (1) to determine the course of parent-reported problem behavior in individual children from the general population assessed with the CBCL at 2-year intervals across a 6-year period; and (2) to identify differences in the stability of problem behaviors in individual children according to the type of problem and the sex and age of the child.

Method

Instrument

The CBCL (Achenbach and Edelbrock, 1983) was used to obtain standardized parents' reports of children's problem behaviors. It consists of 20 competence items and 120 problem items. Only the findings from the problem section will be reported. Parents are requested to circle a 0 if the problem item is not true of the child, a 1 if the item is somewhat or sometimes true, and a 2 if it is very true or often true. A total problem score is computed by summing all 0s, 1s, and 2s.
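The scoring rule just described can be sketched in a few lines. This is a minimal illustration, not the published scoring procedure: the function name `total_problem_score` and the example ratings are hypothetical, and the sketch assumes every problem item contributes its raw 0, 1, or 2 rating to the sum.

```python
def total_problem_score(item_scores):
    """Sum the 0/1/2 ratings over all problem items.

    Assumption of this sketch: each rating is used as-is, with no
    weighting or item exclusions.
    """
    for score in item_scores:
        if score not in (0, 1, 2):
            raise ValueError(f"item scores must be 0, 1, or 2, got {score!r}")
    return sum(item_scores)


# Hypothetical ratings for six items; a completed form would have one
# rating per problem item.
example_ratings = [0, 1, 2, 0, 0, 1]
print(total_problem_score(example_ratings))  # 4
```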

The CBCL was translated into Dutch with the help of a linguist. The good reliability and discriminative validity established by Achenbach and Edelbrock (1983) were confirmed by the authors' studies using this translation (Verhulst et al., 1985a, b).

A correlation of 0.70 has been obtained between CBCL total problem scores and problem scores based on clinical interviews with parents 313 (SD = 66) days after they completed the CBCL (Verhulst and Van der Ende, 1991).

Achenbach (1991) has constructed eight so-called cross-informant syndromes that were similar across the parent, teacher, and self-report instruments (the CBCL, the TRF, and the Youth Self-Report, respectively). The syndromes were empirically derived via factor analyses. The eight cross-informant syndromes described by Achenbach (1991) were used. Two broad-band groups of syndromes, designated as externalizing and internalizing, were also used in the analyses. Externalizing problems reflect conflicts with other people and their expectations of the child, whereas internalizing problems reflect internal distress. The internalizing group consists of the anxious/depressed, somatic complaints, and withdrawn syndromes. The externalizing group consists of the aggressive and delinquent behavior syndromes.

Description of the Sample

The original sample consisted of 4- to 16-year-olds from the Dutch province of Zuid-Holland in 1983 (time 1). For the present study, only those children who were 4 to 11 years old in 1983 were included. The original target sample consisted of 1,498 children aged 4 to 11 years. At time 1, usable CBCLs were obtained on 1,315 (87.8%) of the children. Parents were interviewed by trained interviewers who recorded the parents' answers to each CBCL question. The same procedure was followed at time 3 (1987) and time 4 (1989). At time 2 (1985), because of financial constraints, CBCLs were obtained by mail. Usable CBCLs were obtained for 936 children on all four occasions. This is 71.2% of the children in the time 1 sample, and 62.5% of the original target population. The sample included 238 boys and 240 girls who were initially aged 4 to 7, plus 221 boys and 237 girls who were initially aged 8 to 11. For a more detailed description of the original time 1 sample, see Verhulst et al. (1985a).

A problem with longitudinal studies is the attrition of the sample during the study. The present study was not an exception, because for nearly 29% of the children on whom there was time 1 information, the follow-up data for the three subsequent assessments were not complete. These children were excluded from the present study. Dropouts (N = 375) and remainers (N = 936) were compared with respect to sex, age, socioeconomic status (SES), and CBCL total problem scores. Dropouts and remainers did not differ significantly in the level of total problem scores (24.5 for dropouts; 22.5 for remainers; t = 1.8, NS), indicating that the dropouts did not form a group with especially problematic children. Dropouts also did not differ from remainers with respect to age or sex. However, parents of lower SES were somewhat overrepresented in the group of dropouts (40.6% of lower SES in the group of dropouts, and 32.2% in the group of remainers), whereas in the group of remainers, parents from higher SES were overrepresented (28.6% of higher SES in the group of dropouts, and 34.3% in the group of remainers) (chi square = 21.1; df = 5; p < 0.01). Because lower SES children may be subjected to somewhat poorer environmental influences, there is a possibility that the present study's findings slightly underestimate the level of persistence of children's problem behavior.

Results

Total Problem Scores

To assess the persistence of problem behaviors across the 6-year time interval, first children were identified who could be classified as deviant at time 1. The 90th percentile (P90) of the cumulative frequency distribution of total problem scores was chosen as the clinical cutoff above which children can be considered deviant. Because attrition of the sample across the 6-year interval may have caused selective loss of subjects, the P90 was based on the original time 1 Dutch normative sample for both sexes and age-groups 4 to 11 and 12 to 16 separately (Verhulst et al., 1985a). This normative sample consisted of a representative sample from the general population, excluding those children who had received mental health services within the preceding 12 months (N = 2033). In this way, a normative sample of children considered to be healthy was obtained. Using the score associated with the P90 of the normative sample as the clinical cutoff, 92 children could be identified who were scored above this cutoff at time 1.

Because movements of children scoring above the cutoff toward scores just below the cutoff can hardly be regarded as significant improvement in children's functioning, the 50th percentile (P50) was chosen as the border below which children were regarded as functioning well.

The P90 and P50 cutoff scores make it possible to detect significant improvement or worsening in children's functioning across time. Although it was also possible to determine cutoffs for each time of assessment separately to take account of possible retest and method effects, it was decided to use the cutoff scores based on the Dutch normative sample, rather than different cutoffs for each specific time. The main reason for this choice was that the general population sample of 936 children that was followed up was not truly normative because of attrition and of the fact that the sample also contained children who had been referred for mental health services. By applying the normative cutoff scores, it was likely that children scoring above the P90 showed behaviors that were deviant enough to be regarded as clinically relevant.
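The cutoff logic above can be illustrated with a short sketch. Several details are assumptions made only for the example and are not stated in the text: the nearest-rank percentile definition, the handling of scores exactly equal to a cutoff (placed in the middle band here), the band label `"M"`, and the use of a single pair of cutoffs rather than separate cutoffs by sex and age-group. The normative scores are invented.

```python
def percentile_nearest_rank(values, p):
    """Nearest-rank percentile for an integer p between 1 and 100.

    Integer arithmetic is used for the rank to avoid floating-point
    rounding at exact percentile boundaries.
    """
    ordered = sorted(values)
    rank = max(1, -(-p * len(ordered) // 100))  # ceil(p * n / 100)
    return ordered[rank - 1]


def band(score, p50, p90):
    """Return 'D' (above P90), 'L' (below P50), or 'M' (in between).

    Assumption: a score exactly equal to a cutoff falls in the middle band.
    """
    if score > p90:
        return "D"
    if score < p50:
        return "L"
    return "M"


# Hypothetical normative total problem scores, invented for illustration.
normative_scores = list(range(1, 101))
p50 = percentile_nearest_rank(normative_scores, 50)
p90 = percentile_nearest_rank(normative_scores, 90)
print(p50, p90)  # 50 90
print(band(95, p50, p90), band(60, p50, p90), band(10, p50, p90))  # D M L
```

Fixing the cutoffs on the normative sample, as the text describes, means `p50` and `p90` are computed once and then applied unchanged to the scores from every later assessment.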


Figure 1 shows the pathways of the 92 children who were scored in the deviant range of total problem scores by their parents at time 1 (upper left of the figure). The four times of assessment are indicated at the top of the figure. The figure should be read from the left to the right, from time 1 to time 4. Table entries indicate numbers of children.

Figure 1 shows the movements across the borders of the 90th and the 50th percentile, indicated as P90 and P50. Following the total problem scores of the 92 children who could be regarded deviant at time 1, 39 were still scored in the deviant range at time 2 (above the P90 line), whereas 46 children improved somewhat (scoring between the P90 and P50 at time 2), and seven could be regarded to have been considerably improved (scoring below the P50 at time 2). Next, it is possible to track the children's scores from time 2 to time 3, and from time 3 to time 4.

Fifteen children (16% of the children who were scored deviant at time 1) were scored in the deviant range at all four assessments (upper right corner). Thirty children (33%) of those who were scored in the deviant range at time 1 were scored in the deviant range again 6 years later at time 4. This percentage is indicated on the right side at the top. Fifteen children (16%) made a temporary detour with scores at one or two times of assessment below the P90 and then

[Figure 1. Total Problem Score: pathway diagram with columns Time 1, Time 2, Time 3, Time 4.]

FIG. 1. Pathways of the 92 children with initial total problem scores in the clinical range across the 6-year follow-up period. The four times of assessment are indicated at the top. The figure should be read from the left to the right. Dotted lines indicate the P90 border (top) or the P50 border (bottom). Entries indicate numbers of children, except where percentages are given on the right side indicating the percentage of children scoring in the deviant range at time 1 who are respectively (from top to bottom) scored above the P90, between the P90 and the P50, and below the P50.



back again to the deviant range at time 4. Only one child had a total problem score below the P50 (at times 2 and 3) and again above the P90 at time 4. Although 67% of the deviant children at time 1 showed improvement, for the majority this improvement was only slight; 52% were scored below the P90 and above the P50 (percentage indicated in the figure on the right side in the middle). Only 15% of the deviant children at time 1 showed a significant improvement, that is, they were scored below the P50 at time 4 (indicated on the right side at the bottom).

Table 1 shows the percentages of children from the total sample falling into four groups representing different patterns of longitudinal course that are of interest. These groups consisted of the following: (1) children who were scored in the deviant (D) range (above the P90) at each of the four assessments; (2) children who scored low (L; below the P50) at each of the four assessments; (3) children who changed from low to deviant scores (L-D); and (4) children who changed from deviant to low scores (D-L) during the 6-year time interval. The table shows that the percentages of children in each group, except the LLLL group, were very small. The percentages of children who made significant changes in their scores across the 6-year time interval were the smallest. The majority of children showed less extreme changes from time 1 to time 4 in their scores.

TABLE 1. Percentage of Children in Each Category Representing Different Patterns of Longitudinal Course

Category      DDDD    D-L     LLLL    L-D     R     Total (N = 936)
Percentage    1.6     1.5     26.8    1.1     69    100

Note: DDDD = scores above the P90 at all four assessments (D = deviant), LLLL = scores below the P50 at all four assessments (L = low), D-L = scores above the P90 at time 1 and below the P50 at time 4, L-D = scores below the P50 at time 1 and above the P90 at time 4, R = the remainder of children.

To determine the possible effects of sex and age on the level of persistence, Table 2 shows the percentage of children scored deviant at time 1 who were also scored deviant at each of the three subsequent assessments.

The percentages in Table 2 indicate that the persistence for girls was somewhat greater than that for boys. However, this difference was not significant (chi square = 1.73, df = 1, NS). Older children seemed to show somewhat more persistence in their problem behavior than younger children, but this difference was not significant (chi square = 1.51, df = 1, NS).

TABLE 2. Distribution by Sex and Age-Group of Children Scored Deviant at Time 1 and Children Who Were Scored Deviant at All Four Assessments

                                                   Sex                   Age-Group (years)
                                                   Boys      Girls       4-7       8-11
                                                   N (%)     N (%)       N (%)     N (%)
Children scored deviant at time 1                  51 (55)   41 (45)     44 (48)   48 (52)
Children scored deviant at all four assessments     6 (40)    9 (60)      5 (33)   10 (67)

Figure 2 shows what happened to the children in the sample who at time 1 were scored below the P50. These children can be regarded as functioning well. The left corner at the bottom of Figure 2 shows that this was the case for 450 children at time 1. Of these children, only six showed a considerable increase in problem behavior from below the P50 at time 1 to above the P90 at time 2. For 75 children, a somewhat smaller increase in problem behavior was found (from below the P50 at time 1 to the scoring range between the P50 and the P90 at time 2).

[Figure 2. Total Problem Score: pathway diagram with columns Time 1, Time 2, Time 3, Time 4.]

FIG. 2. Pathways of the 450 children with initial total problem scores below the P50 across the 6-year follow-up period. The four times of assessment are indicated at the top. The figure should be read from the left to the right. Dotted lines indicate the P90 border (top) or the P50 border (bottom). Entries indicate numbers of children, except where percentages are given on the right side indicating the percentage of children scoring below the P50 at time 1 who are respectively (from top to bottom) scored above the P90, between the P90 and the P50, and below the P50.

The majority of children (56%) who at time 1 could be regarded as functioning well continued to be scored below the P50 at each of the three subsequent assessments. Of the children who scored below the P50 at time 1, 67% were scored below the P50 at time 4 (indicated on the right side at the bottom); 51 (11%) were scored above the P50 at time 2 and/or time 3, but returned below the P50 at time 4. The majority of children scored below the P50 at time 1 who showed an increase in problem behavior (scores above the P50) at subsequent assessments did not obtain scores above the P90. Only 2% (indicated in the figure on the right side



[Figure 4. Internalizing: pathway diagram with columns Time 1, Time 2, Time 3, Time 4.]

FIG. 4. Pathways of the 104 children with initial internalizing scores in the clinical range across the 6-year follow-up period. Explanation of figure entries similar to that for Figure 1.

at the top) of the well-functioning children at time 1 were deviant at time 4.

To determine the degree to which deviance at time 1 predicted deviance 6 years later, odds ratios were computed. The risk of deviance at time 4 was computed for children who were scored in the deviant range of total problem scores at time 1 relative to the risk for children who were not deviant at time 1. The odds ratio was 9.0 (95% confidence interval 5.7 to 14.3). In other words, children who had been deviant at time 1 were nine times more likely to be deviant on the total problem scale at time 4 than were children who had not been deviant on the total problem scale at time 1. The odds ratio for girls of 14.2 (7.4 to 27.3) was greater than that for boys, which was 6.1 (3.2 to 11.8). However, this difference was not significant (Breslow-Day test for homogeneity of the odds ratios; chi square = 2.4; df = 1; NS). The odds ratio for the 4- to 7-year-old cohort of 9.1 (4.9 to 17.0) was very similar to that for 8- to 11-year-olds, which was 9.8 (4.9 to 19.7).
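As an illustration of how an odds ratio and its confidence interval are obtained from a 2 x 2 table, the sketch below uses the log-based (Woolf) interval. The text does not state which interval method was used, so this choice is an assumption, as are the function name `odds_ratio_with_ci` and the example counts, which are invented and are not the study's data.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-based) confidence interval.

    a: deviant at time 1, deviant at time 4
    b: deviant at time 1, not deviant at time 4
    c: not deviant at time 1, deviant at time 4
    d: not deviant at time 1, not deviant at time 4
    z: normal quantile for the interval (1.96 for 95%)
    """
    odds_ratio = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * standard_error)
    upper = math.exp(math.log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper


# Hypothetical counts, invented for illustration only.
odds_ratio, lower, upper = odds_ratio_with_ci(20, 80, 10, 390)
print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.1f} to {upper:.1f}")
```

An odds ratio compares odds rather than probabilities, so it matches a ratio of risks only approximately, and the approximation is closest when the outcome is uncommon in both groups.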

Patterns of Longitudinal Course for Different Syndromes

The patterns of movements across time for children's externalizing and internalizing scores were determined. Figures 3 and 4 show what happened to the children who at time 1 were scored above the P90 for the externalizing and internalizing scores respectively. The figures can be read in the same way as Figure 1.

[Figure 3. Externalizing: pathway diagram with columns Time 1, Time 2, Time 3, Time 4.]

FIG. 3. Pathways of the 104 children with initial externalizing scores in the clinical range across the 6-year follow-up period. Explanation of figure entries similar to that for Figure 1.

As can be seen from these figures, the patterns of persistence across time for both sets of scores were very similar to each other. There was no difference between externalizing versus internalizing scores in the distribution across the three time 4 categories divided by the P50 and P90 of the children who were deviant at time 1 (chi-square = 3.03, df = 2, NS).

The odds ratio for predicting deviance at time 4 given a deviant score at time 1 for externalizing scores was 9.5 (6.0 to 14.8), which differed little from the odds ratio of 10.1 (6.6 to 15.6) for the internalizing scores.

Table 3 shows the percentages of children who were scored deviant or low at time 1 in each category representing different longitudinal patterns. Percentages are shown for the two broad-band externalizing and internalizing syndrome groupings as well as for the specific syndromes described by Achenbach (1991). The percentages for total problems scores are also shown for purposes of comparison. No percentages are shown for the somatic complaints and thought problems syndromes, because it was not possible to determine the P50 and P90 for these syndromes. These two syndromes consist of relatively few items. As a result the percentiles corresponding with each successive score did not show a sequence which allowed the determination of a meaningful P50 and/or P90. The intervals between the successive percentiles were too wide and uneven. Because the number of children who were scored in the deviant or low range at time 1 differed somewhat between the different syndromes, the percentage of children who were in the deviant or low group is shown instead of the percentage of children from the total sample.
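The category labels used in Tables 1 and 3 follow mechanically from the band a child occupies at each of the four assessments. The sketch below is one hedged way to express that mapping; the function name `course_category` and the middle-band label `"M"` are assumptions introduced for the example, while the category definitions are taken from the table notes.

```python
def course_category(bands):
    """Map four band labels (times 1 to 4) to a longitudinal course category.

    bands: sequence of four labels, each 'D' (above P90), 'L' (below P50),
    or 'M' (between the two cutoffs; a label assumed for this sketch).
    """
    if len(bands) != 4:
        raise ValueError("expected one band per assessment (four in total)")
    if all(label == "D" for label in bands):
        return "DDDD"
    if all(label == "L" for label in bands):
        return "LLLL"
    # D-L and L-D depend only on the first and last assessment.
    if bands[0] == "D" and bands[3] == "L":
        return "D-L"
    if bands[0] == "L" and bands[3] == "D":
        return "L-D"
    return "R"


print(course_category(["D", "D", "D", "D"]))  # DDDD
print(course_category(["D", "M", "M", "L"]))  # D-L
print(course_category(["L", "M", "L", "L"]))  # R
```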

TABLE 3. Percentages of Children Scored Deviant or Low at Time 1 in Each Category Representing Different Patterns of Longitudinal Course

                         Percentage in Each Category
                         DDDD         D-L          L-D
                         N (%)        N (%)        N (%)
Withdrawn                119 (16)     119 (14)     372 (4)
Anxious/depressed        119 (16)     119 (11)     382 (4)
Social problems          109 (16)     109 (25)     403 (5)
Attention problems       116 (21)     116 (21)     402 (3)
Delinquent behavior      120 (13)     120 (24)     416 (6)
Aggressive behavior      105 (12)     105 (12)     435 (2)
Externalizing            104 (13)     104 (11)     446 (2)
Internalizing            104 (17)     104 (17)     407 (3)
Total problems            92 (16)      92 (15)     450 (2)

Note: DDDD = scores above the P90 at all four assessments (D = deviant), D-L = scores above the P90 at time 1 and below the P50 at time 4, L-D = scores below the P50 at time 1 and above the P90 at time 4.

Percentages are not given for the syndromes somatic complaints and thought problems because cutoff scores corresponding with the P90 and P50 could not be adequately determined.

To determine the differences between the different syndromes in the proportions of children in each category, chi squares were computed. To reduce the probability of chance findings with many tests of difference, a p value of 0.01 was chosen. Differences between the narrow band syndromes and differences between the externalizing and internalizing syndromes were considered separately.
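For 2 x 2 comparisons, the chi-square statistic used throughout the Results can be checked by hand. The sketch below is an illustration rather than the authors' procedure: it assumes an uncorrected Pearson chi-square (the text does not say whether a continuity correction was applied), and the function name `chi_square_2x2` is hypothetical. The counts are those of Table 2, split into children who were deviant at all four assessments and children who were deviant at time 1 only.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square, without continuity correction, for [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Counts from Table 2: (deviant at all four assessments, remaining children
# who were deviant at time 1).
boys = (6, 51 - 6)
girls = (9, 41 - 9)
younger = (5, 44 - 5)   # aged 4 to 7 at time 1
older = (10, 48 - 10)   # aged 8 to 11 at time 1

print(round(chi_square_2x2(*boys, *girls), 2))     # 1.73
print(round(chi_square_2x2(*younger, *older), 2))  # 1.51
```

Both results agree with the values reported for the sex and age comparisons of persistence, which is consistent with, though not proof of, the uncorrected statistic having been used.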

The proportions of children in the DDDD group did not differ significantly between the different narrow band syndromes and between the two externalizing and internalizing scores. The proportion of children changing from the deviant range at time 1 to below the P50 at time 4 was significantly lower for the anxious/depressed syndrome than for the social problems syndrome (chi-square = 7.54, df = 1, p < 0.01), as well as for the delinquent behavior syndrome (chi-square = 7.2, df = 1, p < 0.01). In other words, children with the anxious/depressed syndrome at time 1 showed more stability than those with delinquent behavior and/or with social problems. No other differences were significant for the D-L group. The proportions of children in the L-D group did not differ significantly between the different syndromes. Table 3 shows that the percentages of children for each syndrome were lower in the L-D than in the D-L group.

Discussion

Information on the developmental course of child psychiatric disorders is of importance to both child and adult psychiatry. For adult psychiatry, it is relevant to know to what extent disorders have their origin in childhood. In child psychiatry, clinicians are often confronted with questions concerning the prognosis of childhood disorders. Information is therefore needed on the course of problem behavior in children across relatively long time intervals. The present study tracked the developmental process of parent reported problem behavior in individual children across a 6-year period assessed at 2-year intervals with the same standardized procedure.



In an earlier study investigating the 4-year course of parent reported problem behavior, stability was assessed in terms of correlation coefficients between problem scores at two times of assessment (Verhulst et al., 1990). In the present study, the variation of problem behavior in children could be tracked along pathways with 2-year anchor points. In this way, it was not only possible to assess the stability of problem behaviors across a rather long time interval, but it was also possible to assess what happened in the intervening time.

Correlations can tell us to what extent children tend to preserve their rank orders over time. The stability coefficient for CBCL total problem scores for the present study's sample across the 6-year period was r = 0.56. This means that 31% of the variance of scores at time 1 was shared by the variance at time 4. Although the 6-year stability of CBCL total problem scores was thus large according to Cohen's (1977) criteria, it is difficult to translate this finding to a more individual level. In the present study, a somewhat different approach was taken in which the behavior of individual children was tracked.
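The step from the stability coefficient to shared variance is simply squaring the correlation. The short sketch below shows the arithmetic together with Cohen's conventional benchmarks for correlations (0.10 small, 0.30 medium, 0.50 large); the function names and the label "negligible" for values below 0.10 are assumptions of the example.

```python
def shared_variance(r):
    """Proportion of variance shared by two measures correlated at r."""
    return r ** 2


def cohen_label(r):
    """Conventional effect-size label for a correlation coefficient."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"  # label assumed for this sketch


r_six_year = 0.56  # 6-year stability coefficient reported in the text
print(f"{shared_variance(r_six_year):.0%}")  # 31%
print(cohen_label(r_six_year))               # large
```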

Total Problem Scores

The majority of children in the present study who could be regarded as deviant at time 1 showed an improvement in their functioning. However, this improvement was only slight. Only 15% of the deviant children at time 1 showed a marked improvement. Ghodsian et al. (1980) followed up a sample of 16,000 children born in England, Scotland, and Wales during a week in March 1958. Parent reports were obtained for the children at ages 7, 11, and 16 years. The authors used different assessment instruments at different ages. They also computed cutoff scores based on the scoring distribution at each time of assessment. They found that 1.8% of the total sample of children dropped from above the P87 of the distribution of total problem scores to below the P50 across the 9-year interval. This percentage is not much different from the 1.5% in our sample representing children with scores changing from above the P90 at time 1 to below the P50 6 years later.

Of the deviant children in the present study at time 1, one-third were still deviant 6 years later, irrespective of their scores in the intervening period. Ghodsian et al. (1980) found that 25% of the children who were scored in the top 13% of problem scores at age 7 were scored above the P87 at age 16. The somewhat lower stability found by Ghodsian et al. (1980) than the 33% found in the present study is not surprising in view of the 3-year difference in time interval. In the study by Esser et al. (1990), 50% of German children who were deviant at age 8 were still deviant at age 13. The 5-year persistence in that study was higher than the 33% persistence in the present study across the 6-year interval and the 41% persistence across the 4-year interval. However, differences in methodology make direct comparison difficult. The data in the present study were based on parents' reports obtained through highly standardized procedures, whereas in the German study clinical interviews, clinical severity ratings, and ICD diagnostic classifications were employed. Children with moderate and severe psychiatric disorders were regarded as deviant, although the authors stated that children with moderate disorders did not necessarily require professional treatment. At age 8, 16.2% of the children in the German general population sample were regarded as disturbed. This is more than the statistically defined 10% cutoff for deviance employed in the present study. The broader band of behavioral variation in the German study may have increased the level of persistence.
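The comparison with Ghodsian et al. (1980) mixes two denominators: percentages of the total sample (1.8% and 1.5%) and percentages of the deviant group (15%). The following minimal sketch puts the figures on a common footing. It assumes that the deviant group at baseline is exactly the top 10% (present study) or top 13% (Ghodsian et al.) of the sample; the function name is hypothetical and the calculation is an illustration, not part of the original analyses.

```python
def share_of_deviant_group(pct_of_total_sample, pct_deviant_at_baseline):
    """Re-express a percentage of the whole sample as a percentage of the
    children who were deviant at baseline (assumes an exact cutoff share)."""
    return 100.0 * pct_of_total_sample / pct_deviant_at_baseline


# Present study: 1.5% of the sample moved from above the P90 to below the P50 (6 years).
present_study = share_of_deviant_group(1.5, 10.0)

# Ghodsian et al. (1980): 1.8% of the sample moved from above the P87 to below the P50 (9 years).
ghodsian = share_of_deviant_group(1.8, 13.0)

print(f"Present study:   {present_study:.1f}% of initially deviant children")
print(f"Ghodsian et al.: {ghodsian:.1f}% of initially deviant children")
```

Under this assumption the two studies yield similar shares of initially deviant children showing marked improvement, which is in line with the statement that the percentages are not much different.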

The largest decrease in the number of deviant children from one assessment to the next took place from the first to the second assessment: 42% of the deviant children at time 1 were still deviant at time 2; 64% of those who were deviant at time 1 and time 2 were deviant at time 3; and 60% of those who were deviant at times 1, 2, and 3 were deviant at time 4. This decline from the first to the second assessment, sharper than that between the subsequent assessments, may be the result of a retest effect that operates mainly at a first retest and less so at subsequent retests. In the present study, the mean total problem scores for the four assessments were 22.5, 18.2, 18.1, and 17.7, respectively. This decrease in scores across time, which was mainly caused by the decrease from time 1 to time 2, was significant in a repeated measures analysis of variance (F = 65.78, df = 3, p < 0.001). A retest effect in the assessment of psychopathology has been described by other authors as well (Achenbach and Edelbrock, 1983; Robins, 1985; Helzer et al., 1985). Another factor possibly involved is the fact that at time 2 a mailing procedure was used, whereas at the other times of assessment interviews were used. It may be that parents are less readily prepared to report problem behaviors of their children through a mail survey than through an interview.
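The three conditional percentages quoted above can be chained to estimate the share of initially deviant children who were deviant at every assessment. The sketch below is illustrative arithmetic on the rounded published percentages, not a reanalysis of the data; the function name is hypothetical.

```python
def cumulative_persistence(conditional_rates):
    """Multiply successive conditional persistence rates and return the
    running share of time-1 deviant children still deviant at each step."""
    shares = []
    remaining = 1.0
    for rate in conditional_rates:
        remaining *= rate
        shares.append(remaining)
    return shares


# Rates from the text: deviant at time 2 given time 1; at time 3 given
# times 1 and 2; at time 4 given times 1, 2, and 3.
rates = [0.42, 0.64, 0.60]
shares = cumulative_persistence(rates)

for time_point, share in zip((2, 3, 4), shares):
    print(f"Deviant at every assessment through time {time_point}: {share:.1%}")
```

The final share (roughly one in six) is lower than the 33% who were deviant at time 4 irrespective of intervening scores, because the latter figure also counts children who left and later re-entered the deviant range.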

The percentage of children (15%) who moved from the deviant range (above the P90) at time 1 to the low range (below the P50) at time 4 was greater than the percentage of children (2%) who moved from low to high. The retest effect discussed above may partly explain this difference. A second factor possibly involved is regression to the mean, which is stronger in the more extreme ranges of the scoring distribution than in ranges closer to the mean. Scores above the P90 will therefore show a larger regression to the mean than scores below the P50. A third factor involved is that a number of children scoring in the deviant range are likely to receive professional help that may have a downward effect on the problem scores. Koot and Verhulst (1992) reported that children from the general population with CBCL total problem scores above the P90 were five times more likely than children scoring below the P90 to be referred to a mental health service and two times more likely to be referred for special education across a 4-year time interval.
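The regression-to-the-mean argument can be illustrated with a small simulation. The sketch below is not based on the study data: it assumes standardized, normally distributed scores (CBCL scores are in fact skewed), a hypothetical test-retest correlation of 0.5, and no true change between assessments. All names and parameter values are illustrative.

```python
import random
import statistics


def simulate_regression_to_mean(n=20000, r=0.5, seed=1):
    """Return the mean shift toward the mean for children scoring above the
    P90 and below the P50 at time 1, when no true change is built in."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        t1 = rng.gauss(0.0, 1.0)
        # Time-2 score correlates r with time 1; the rest is independent noise.
        t2 = r * t1 + (1.0 - r * r) ** 0.5 * rng.gauss(0.0, 1.0)
        pairs.append((t1, t2))

    ranked = sorted(t1 for t1, _ in pairs)
    p50 = ranked[n // 2]
    p90 = ranked[int(0.9 * n)]

    high = [(t1, t2) for t1, t2 in pairs if t1 > p90]
    low = [(t1, t2) for t1, t2 in pairs if t1 < p50]

    drop_high = statistics.mean(t1 - t2 for t1, t2 in high)  # downward shift
    rise_low = statistics.mean(t2 - t1 for t1, t2 in low)    # upward shift
    return drop_high, rise_low


drop_high, rise_low = simulate_regression_to_mean()
print(f"Mean drop for scores above the P90: {drop_high:.2f} SD")
print(f"Mean rise for scores below the P50: {rise_low:.2f} SD")
```

Under these assumptions the group above the P90 moves further toward the mean than the group below the P50, simply because its scores start further from the mean. This is consistent with, but does not by itself establish, the asymmetry between the 15% and 2% reported above.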

There was no significant difference between the stabilities for the 4- to 7- versus the 8- to 11-year-olds, indicating that stability is similar for older and younger children in the present sample. Although the stability for problem behaviors in girls was somewhat larger than that in boys, this difference was not significant.

Stability of Internalizing versus Externalizing Problems

Externalizing scores consisted of the aggressive and delinquent behavior syndromes, whereas internalizing scores consisted of the anxious/depressed, somatic complaints, and withdrawn syndromes. Although correlational analyses across 2 and 4 years (Verhulst and Althaus, 1988; Verhulst et al., 1990) revealed significantly, though slightly, higher stability for externalizing than internalizing scores, the categorical analyses in the present study covering a 6-year time interval did not show significant differences between the stabilities of these scores. Across the 6-year interval, 32% of the children with externalizing and 36% of the children with internalizing scores in the deviant range at time 1 were still scored in the deviant range at time 4. These findings did not support the findings of Esser et al. (1990), Rutter et al. (1976), and Graham and Rutter (1973), who reported greater persistence for conduct than for neurotic disorders. The similarity in level of persistence for externalizing and internalizing scores does not necessarily imply similarity in the consequences for other areas of functioning. Again, the differences in methodologies across studies should be emphasized. The studies by Esser et al. (1990) and Rutter et al. (1976) employed clinical interviews and severity ratings. It is not known to what extent severity ratings of the level of psychiatric disturbance, and hence the level of persistence, can be influenced by factors other than the symptoms constituting the particular syndrome or groups of syndromes, such as reactions from the environment.

The finding in the present study that internalizing problems show considerable persistence underscores the notion that internalizing problems cannot be disregarded. Persistence of externalizing problems may be linked with persistence in the suffering by the environment as a consequence of the child's behavior. However, in the case of internalizing problems, it is the child or adolescent who may well suffer from their persistence. Adults, such as parents, teachers, and mental health professionals, who interact with children should be aware of the risk that internalizing problems show continuity to a considerable degree. Children who suffer from internalizing problems do not seem to grow out of their problems more rapidly than children showing externalizing problems.

In conclusion, this study demonstrated moderate stability of problem behaviors across a 6-year interval. However, children's problem behaviors should not be regarded as static. Many children showed changes in their level of functioning across time, although extreme changes were the exception rather than the rule.

References

Achenbach, T. M. (1991), Integrative Guide for the 1991 CBCL/4-18, YSR, and TRF Profiles. Burlington, VT: University of Vermont Department of Psychiatry.

-- & Edelbrock, C. (1983), Manual for the Child Behavior Checklist and Revised Profile. Burlington, VT: University of Vermont Department of Psychiatry.

---- (1986), Manual for the Teacher's Report Form. Burlington, VT: University of Vermont Department of Psychiatry.

Berden, G. F. M. G., Althaus, M. & Verhulst, F. C. (1990), Major life events and changes in the behavioral functioning of children. J. Child Psychol. Psychiatry, 31:949-959.

Cohen, J. (1977), Statistical Power Analysis for the Behavioral Sciences (Rev. Ed.). New York: Academic Press.

Esser, G., Schmidt, M. H. & Woerner, W. (1990), Epidemiology and course of psychiatric disorders in school-age children - Results of a longitudinal study. J. Child Psychol. Psychiatry, 31:243-263.

Ghodsian, M., Fogelman, K., Lambert, L. & Tibbenham, A. (1980), Changes in behaviour ratings of a national sample of children. Br. J. Soc. Clin. Psychol., 19:247-256.

Graham, P. & Rutter, M. (1973), Psychiatric disorder in the young adolescent: a follow-up study. Proc. R. Soc. Med., 66:1226-1229.

Harrington, R., Fudge, H., Rutter, M., Pickles, A. & Hill, J. (1990), Adult outcomes of childhood and adolescent depression: I. Psychiatric status. Arch. Gen. Psychiatry, 47:465-473.

Helzer, J. E., Robins, L. N., McEvoy, L. T., Spitznagel, E. L., Stolzman, R. K., Farmer, A. & Brockington, I. F. (1985), A comparison of clinical and diagnostic interview schedule diagnoses: physician reexamination of lay-interviewed cases in the general population. Arch. Gen. Psychiatry, 42:657-666.

Koot, H. M. & Verhulst, F. C. (1992), Prediction of children's referral to mental health and special educational services from earlier adjustment. J. Child Psychol. Psychiatry, 33:717-729.

Kovacs, M., Feinberg, T. L., Crouse-Novak, M. A., Paulauskas, S. L. & Finkelstein, R. (1984a), Depressive disorders in childhood: I. A longitudinal prospective study of characteristics and recovery. Arch. Gen. Psychiatry, 41:229-237.

---- Pollack, M. & Finkelstein, R. (1984b), Depressive disorders in childhood: II. A longitudinal study of the risk for a subsequent major depression. Arch. Gen. Psychiatry, 41:643-649.

McConaughy, S. H., Stanger, C. & Achenbach, T. M. (in press), Three-year course of and concurrent agreement among informants. J. Am. Acad. Child Adolesc. Psychiatry.

Richman, N., Stevenson, J. & Graham, P. J. (1982), Preschool to School: A Behavioral Study. New York: Academic Press.

Robins, L. N. (1985), Epidemiology: reflections on testing the validity of psychiatric interviews. Arch. Gen. Psychiatry, 42:918-924.

Rutter, M., Graham, P., Chadwick, O. F. D. & Yule, W. (1976), Adolescent turmoil: fact or fiction? J. Child Psychol. Psychiatry, 17:35-56.

Verhulst, F. C. & Althaus, M. (1988), Persistence and change in behavioral/emotional problems reported by parents of children aged 4-14: an epidemiological study. Acta Psychiatr. Scand. [Suppl. 339] 77.

-- & van der Ende, J. (1991), Assessment of child psychopathology: relations between different methods, different informants and clinical judgment of severity. Acta Psychiatr. Scand., 24:155-159.

---- (1991), Four-year follow-up of teacher reported problems. Psychol. Med., 21:965-977.

-- Akkerhuis, G. W. & Althaus, M. (1985a), Mental health in Dutch children: I. A cross cultural comparison. Acta Psychiatr. Scand. [Suppl. 323] 72.

-- Berden, G. & Sanders-Woudstra, J. A. R. (1985b), Mental health in Dutch children: II. Prevalence of psychiatric disorder and relationships between measures. Acta Psychiatr. Scand. [Suppl. 324] 72.

-- Koot, J. M. & Berden, G. F. M. G. (1990), Four-year follow-up of an epidemiological sample. J. Am. Acad. Child Adolesc. Psychiatry, 29:440-448.
