
BICS and CALP

Jim Cummins, University of Toronto

The acronyms BICS and CALP refer to a distinction introduced by Cummins (1979) between basic interpersonal communicative skills and cognitive academic language proficiency. The distinction was intended to draw attention to the very different time periods typically required by immigrant children to acquire conversational fluency in their second language as compared to grade-appropriate academic proficiency in that language. Conversational fluency is often acquired to a functional level within about two years of initial exposure to the second language, whereas at least five years is usually required to catch up to native speakers in academic aspects of the second language (Collier, 1987; Klesmer, 1994; Cummins, 1981a). Failure to take account of the BICS/CALP (conversational/academic) distinction has resulted in discriminatory psychological assessment of bilingual students and in premature exit from language support programs (e.g., bilingual education in the United States) into mainstream classes (Cummins, 1984).

Origins of the BICS/CALP Distinction

Skutnabb-Kangas and Toukomaa (1976) brought attention to the fact that Finnish immigrant children in Sweden often appeared to educators to be fluent in both Finnish and Swedish but still showed levels of verbal academic performance in both languages considerably below grade/age expectations. Similarly, analysis of psychological assessments administered to minority students showed that teachers and psychologists often assumed that children who had attained fluency in English had overcome all difficulties with English (Cummins, 1984). Yet these children frequently performed poorly on English academic tasks as well as in psychological assessment situations. Cummins (1981a) provided further evidence for the BICS/CALP distinction in a reanalysis of data from the Toronto Board of Education. Despite teacher observation that peer-appropriate conversational fluency in English developed rapidly, a period of 5-7 years was required, on average, for immigrant students to approach grade norms in academic aspects of English.

The distinction was elaborated into two intersecting continua (Cummins, 1981b) that highlighted the range of cognitive demands and contextual support involved in particular language tasks or activities (context-embedded/context-reduced, cognitively undemanding/cognitively demanding). The BICS/CALP distinction was maintained within this elaboration and related to the theoretical distinctions of several other theorists (e.g., Bruner's [1975] communicative and analytic competence, Donaldson's [1978] embedded and disembedded language, and Olson's [1977] utterance and text). The terms used by different investigators have varied, but the essential distinction refers to the extent to which the meaning being communicated is supported by contextual or interpersonal cues (such as gestures, facial expressions, and intonation present in face-to-face interaction) or dependent on linguistic cues that are largely independent of the immediate communicative context.

The BICS/CALP distinction also served to qualify John Oller's (1979) claim that all individual differences in language proficiency could be accounted for by just one underlying factor, which he termed global language proficiency. Oller synthesized a considerable amount of data showing strong correlations between performance on cloze tests of reading, standardized reading tests, and measures of oral verbal ability (e.g. vocabulary measures). Cummins (1979, 1981b) pointed out that not all aspects of language use or performance could be incorporated into one dimension of global language proficiency. For example, if we take two monolingual English-speaking siblings, a 12-year-old and a six-year-old, there are enormous differences in these children's ability to read and write English and in their knowledge of vocabulary, but minimal differences in their phonology or basic fluency. The six-year-old can understand virtually everything that is likely to be said to her in everyday social contexts, and she can use language very effectively in these contexts, just as the 12-year-old can. Similarly, as noted above, in second language acquisition contexts, immigrant children typically require very different time periods to catch up to their peers in everyday face-to-face aspects of proficiency as compared to academic aspects.

Critique

Early critiques of the conversational/academic distinction were advanced by Carole Edelsky and her colleagues (Edelsky et al., 1983) and in a volume edited by Charlene Rivera (1984). Edelsky (1990) later reiterated and reformulated her critique, and further critiques were advanced by Martin-Jones and Romaine (1986) and Wiley (1996).

The major criticisms are as follows:

1. The conversational/academic language distinction reflects an autonomous perspective on language that ignores its location in social practices and power relations (Edelsky et al., 1983; Wiley, 1996).

2. CALP, or academic language proficiency, represents little more than test-wiseness; it is an artifact of the inappropriate way in which it has been measured (Edelsky et al., 1983).

3. The notion of CALP promotes a deficit theory insofar as it attributes the academic failure of bilingual/minority students to low cognitive/academic proficiency rather than to inappropriate schooling (Edelsky, 1990; Edelsky et al., 1983; Martin-Jones & Romaine, 1986).

In response to these critiques, Cummins (Cummins & Swain, 1983; Cummins, in press) pointed to the elaborated sociopolitical framework within which the BICS/CALP distinction was placed (Cummins, 1986, 1996), where underachievement among subordinated students was attributed to coercive relations of power operating in the society at large and reflected in schooling practices. He also invoked the work of Biber (1986) and Corson (1995) as evidence of the linguistic reality of the distinction. Corson highlighted the enormous lexical differences between typical conversational interactions in English and academic or literacy-related uses of English. Similarly, Biber's analysis of more than one million words of English speech and written text revealed underlying dimensions very consistent with the distinction between conversational and academic aspects of language proficiency. Cummins also pointed out that the construct of academic language proficiency does not depend on test scores for either its construct validity or its relevance to education, as illustrated by the analyses of Corson and Biber.

Conclusion

The distinction between BICS and CALP has exerted a significant impact on a variety of educational policies and practices in both North America and the United Kingdom (e.g., Cline & Frederickson, 1996). In particular, the distinction has highlighted specific ways in which educators' misunderstanding of the nature of language proficiency has contributed to the academic difficulties of bilingual students. At a theoretical level, however, the distinction is likely to remain controversial, reflecting the fact that there is no cross-disciplinary consensus regarding the nature of language proficiency and its relationship to academic achievement.

References

Biber, D. (1986) Spoken and written textual dimensions in English: Resolving the contradictory findings. Language, 62, 384-414.

Bruner, J.S. (1975) Language as an instrument of thought. In A. Davies (ed.), Problems of language and learning. London: Heinemann.

Cline, T. & Frederickson, N. (eds.) (1996) Curriculum related assessment: Cummins and bilingual children. Clevedon: Multilingual Matters.

Collier, V. P. (1987) Age and rate of acquisition of second language for academic purposes. TESOL Quarterly, 21, 617-641.

Corson, D. (1995) Using English words. New York: Kluwer.

Cummins, J. (1979) Cognitive/academic language proficiency, linguistic interdependence, the optimum age question and some other matters. Working Papers on Bilingualism, No. 19, 121-129.

Cummins, J. (1981a) Age on arrival and immigrant second language learning in Canada: A reassessment. Applied Linguistics, 2, 132-149.

Cummins, J. (1981b) The role of primary language development in promoting educational success for language minority students. In California State Department of Education (Ed.), Schooling and language minority students: A theoretical framework. Evaluation, Dissemination and Assessment Center, California State University, Los Angeles.

Cummins, J. (1984) Bilingualism and special education: Issues in assessment and pedagogy. Clevedon, England: Multilingual Matters.

Cummins, J. (1986) Empowering minority students: A framework for intervention. Harvard Educational Review, 56, 18-36.

Cummins, J. (1996) Negotiating identities: Education for empowerment in a diverse society. Los Angeles: California Association for Bilingual Education.

Cummins, J. (in press) Putting language proficiency in its place: Responding to critiques of the conversational/academic language distinction, in J. Cenoz and U. Jessner (eds.) English in Europe: The acquisition of a third language. Clevedon: Multilingual Matters.

Cummins, J. and Swain, M. (1983) Analysis-by-rhetoric: Reading the text or the reader's own projections? A reply to Edelsky et al. Applied Linguistics, 4, 22-41.

Donaldson, M. (1978) Children's minds. Glasgow: Collins.

Edelsky, C. (1990) With literacy and justice for all: Rethinking the social in language and education. London: The Falmer Press.

Edelsky, C., Hudelson, S., Altwerger, B., Flores, B., Barkin, F., & Jilbert, K. (1983) Semilingualism and language deficit. Applied Linguistics, 4(1), 1-22.

Klesmer, H. (1994) Assessment and teacher perceptions of ESL student achievement. English Quarterly, 26(3), 5-7.

Martin-Jones, M., and Romaine, S. (1986) Semilingualism: A half-baked theory of communicative competence. Applied Linguistics, 7(1), 26-38.

Oller, J. (1979) Language tests at school: A pragmatic approach. London: Longman.

Olson, D.R. (1977) From utterance to text: The bias of language in speech and writing. Harvard Educational Review, 47, 257-281.

Rivera, C. (Ed.). (1984) Language proficiency and academic achievement. Clevedon, England: Multilingual Matters.

Skutnabb-Kangas, T. and Toukomaa, P. (1976) Teaching migrant children's mother tongue and learning the language of the host country in the context of the sociocultural situation of the migrant family. Helsinki: The Finnish National Commission for UNESCO.

Wiley, T. G. (1996) Literacy and language diversity in the United States. Washington, DC: Center for Applied Linguistics and Delta Systems.

Why Bilingual Education? ERIC Digest.

by Stephen Krashen

Bilingual education continues to receive criticism in the national media. This Digest examines some of that criticism and its effect on public opinion, which is often based on misconceptions about bilingual education's goals and practice. The Digest explains the rationale underlying good bilingual education programs and summarizes research findings about their effectiveness.

When schools provide children quality education in their primary language, they give them two things: knowledge and literacy. The knowledge that children get through their first language helps make the English they hear and read more comprehensible. Literacy developed in the primary language transfers to the second language. The reason is simple: Because we learn to read by reading--that is, by making sense of what is on the page (Smith, 1994)--it is easier to learn to read in a language we understand. Once we can read in one language, we can read in general.

The combination of first language subject matter teaching and literacy development that characterizes good bilingual programs indirectly but powerfully aids students as they strive for a third factor essential to their success: English proficiency. Of course, we also want to teach in English directly, via high quality English-as-a-Second Language (ESL) classes, and through sheltered subject matter teaching, where intermediate-level English language acquirers learn subject matter taught in English.

The best bilingual education programs include all of these characteristics: ESL instruction, sheltered subject matter teaching, and instruction in the first language. Non-English-speaking children initially receive core instruction in the primary language along with ESL instruction. As children grow more proficient in English, they learn subjects using more contextualized language (e.g., math and science) in sheltered classes taught in English, and eventually in mainstream classes. In this way, the sheltered classes function as a bridge between instruction in the first language and in the mainstream. In advanced levels, the only subjects done in the first language are those demanding the most abstract use of language (social studies and language arts). Once full mainstreaming is complete, advanced first language development is available as an option. Gradual exit plans, such as these, avoid problems associated with exiting children too early (before the English they encounter is comprehensible) and provide instruction in the first language where it is most needed. These plans also allow children to have the advantages of advanced first language development.

SUCCESS WITHOUT BILINGUAL EDUCATION?

A common argument against bilingual education is the observation that many people have succeeded without it. This has certainly happened. In these cases, however, the successful person got plenty of comprehensible input in the second language, and in many cases had a de facto bilingual education program. For example, Rodriguez (1982) and de la Pena (1991) are often cited as counter-evidence to bilingual education.

Rodriguez (1982) tells us that he succeeded in school without a special program and acquired a very high level of English literacy. He had two crucial advantages, however, that most limited-English-proficient (LEP) children do not have. First, he grew up in an English-speaking neighborhood in Sacramento, California, and thus got a great deal of informal comprehensible input from classmates. Many LEP children today encounter English only at school; they live in neighborhoods where Spanish prevails. In addition, Rodriguez became a voracious reader, which helped him acquire academic language. Most LEP children have little access to books.

De la Pena (1991) reports that he came to the United States at age nine with no English competence and claims that he succeeded without bilingual education. He reports that he acquired English rapidly, and "by the end of my first school year, I was among the top students." De la Pena, however, had the advantages of bilingual education: In Mexico, he was in the fifth grade, and was thus literate in Spanish and knew subject matter. In addition, when he started school in the United States he was put back two grades. His superior knowledge of subject matter helped make the English input he heard more comprehensible.

Children who arrive with a good education in their primary language have already gained two of the three objectives of a good bilingual education program--literacy and subject matter knowledge. Their success is good evidence for bilingual education.

WHAT ABOUT LANGUAGES OTHER THAN SPANISH?

Porter (1990) states that "even if there were a demonstrable advantage for Spanish-speakers learning to read first in their home language, it does not follow that the same holds true for speakers of languages that do not use the Roman alphabet" (p. 65). But it does. The ability to read transfers across languages, even when the writing systems are different.

There is evidence that reading ability transfers from Chinese to English (Hoover, 1982), from Vietnamese to English (Cummins, Swain, Nakajima, Handscombe, Green, & Tran, 1984), from Japanese to English (Cummins et al., 1984), and from Turkish to Dutch (Verhoeven, 1991). In other words, those who read well in one language read well in the second language (as long as length of residence in the country is taken into account, because loss of the first language is common).

BILINGUAL EDUCATION AND PUBLIC OPINION

Opponents of bilingual education tell us that the public is against bilingual education. This impression is a result of the way the question is asked. One can easily get a near-100-percent rejection of bilingual education when the question is biased. Porter (1990), for example, states that "Many parents are not committed to having the schools maintain the mother tongue if it is at the expense of gaining a sound education and the English-language skills needed for obtaining jobs or pursuing higher education" (p. 8). Who would support mother tongue education at such a price?

However, when respondents are simply asked whether or not they support bilingual education, the degree of support is quite strong: from 60 to 99 percent of samples of parents and teachers say they support bilingual education (Krashen, 1996). In a series of studies, Shin (Shin, 1994; Shin & Gribbons, 1996) examined attitudes toward the principles underlying bilingual education. Shin found that many respondents agree with the idea that the first language can be helpful in providing background knowledge, most agree that literacy transfers across languages, and most support the principles underlying continuing bilingual education (economic and cognitive advantages).

The number of people opposed to bilingual education is probably even smaller than these results suggest; many people who say they are opposed to bilingual education are actually opposed to certain practices (e.g., inappropriate placement of children) or to regulations connected to bilingual education (e.g., forcing teachers to acquire another language to keep their jobs).

Despite what is presented to the public in the national media, research has revealed much support for bilingual education. McQuillan and Tse (in press) reviewed publications appearing between 1984 and 1994 and reported that 87 percent of academic publications supported bilingual education, whereas newspaper and magazine opinion articles tended to oppose it, with only 45 percent supporting bilingual education. One wonders what public support would look like if bilingual education were more clearly defined in such articles and editorials.

THE RESEARCH DEBATE

It is sometimes claimed that research does not support the efficacy of bilingual education. Its harshest critics, however (e.g., Rossell & Baker, 1996), do not claim that bilingual education does not work; instead, they claim there is little evidence that it is superior to all-English programs. Nevertheless, the evidence used against bilingual education is not convincing. One major problem is in labeling. Several critics, for example, have claimed that English immersion programs in El Paso and McAllen, Texas, were shown to be superior to bilingual education. In each case, however, programs labeled immersion were really bilingual education, with a substantial part of the day taught in the primary language. In another study, Gersten (1985) claimed that all-English immersion was better than bilingual education. However, the sample size was small and the duration of the study was short; also, no description of "bilingual education" was provided. For a detailed discussion, see Krashen (1996).

On the other hand, a vast number of other studies have shown that bilingual education is effective, with children in well-designed programs acquiring academic English at least as well and often better than children in all-English programs (Cummins, 1989; Krashen, 1996; Willig, 1985). Willig concluded that the better the experimental design of the study, the more positive were the effects of bilingual education.

IMPROVING BILINGUAL EDUCATION

Bilingual education has done well, but it can do much better. The biggest problem, in this author's view, is the absence of books--in both the first and second languages--in the lives of students in these programs. Free voluntary reading can help all components of bilingual education: It can be a source of comprehensible input in English or a means for developing knowledge and literacy through the first language, and for continuing first language development.

Limited-English-proficient Spanish-speaking children have little access to books at home (about 22 books per home for the entire family according to Ramirez, Yuen, Ramey, & Pasta, 1991) or at school (an average of one book in Spanish per Spanish-speaking child in some school libraries in schools with bilingual programs, according to Pucci, 1994). A book flood in both languages is clearly called for. Good bilingual programs have brought students to the 50th percentile on standardized tests of English reading by grade five (Burnham-Massey & Pina, 1990). But with a good supply of books in both first and second languages, students can go far beyond the 50th percentile. It is possible that we might then have the Lake Wobegon effect, where all of the children are above average, and we can finally do away with the tests (and put the money saved to much better use).

REFERENCES

Burnham-Massey, L., & Pina, M. (1990). Effects of bilingual instruction on English academic achievement of LEP students. Reading Improvement, 27(2), 129-132.

Cummins, J. (1989). Empowering minority students. Sacramento, CA: California Association for Bilingual Education.

Cummins, J., Swain, M., Nakajima, K., Handscombe, J., Green, D., & Tran, C. (1984). Linguistic interdependence among Japanese and Vietnamese immigrant students. In C. Rivera (Ed.), Communicative competence approaches to language proficiency assessment: Research and application, pp. 60-81. Clevedon, England: Multilingual Matters. (ED 249 793)

de la Pena, F. (1991). Democracy or Babel? The case for official English in the United States. Washington, DC: U.S. English.

Gersten, R. (1985). Structured immersion for language-minority students: Results of a longitudinal evaluation. Educational Evaluation and Policy Analysis, 7(3), 187-196.

Hoover, W. (1982). Language and literacy learning in bilingual education: Preliminary report. Cantonese site analytic study. Austin, TX: Southwest Educational Development Laboratory. (ED 245 572)

Krashen, S. (1996). Under attack: The case against bilingual education. Culver City, CA: Language Education Associates.

McQuillan, J., & Tse, L. (in press). Does research matter? An analysis of media opinion on bilingual education, 1984-1994. Bilingual Research Journal.

Porter, R. P. (1990). Forked tongue: The politics of bilingual education. New York: Basic Books.

Pucci, S. L. (1994). Supporting Spanish language literacy: Latino children and free reading resources in schools. Bilingual Research Journal, 18(1-2), 67-82.

Ramirez, J. D., Yuen, S., Ramey, D., & Pasta, D. (1991). Longitudinal study of structured English immersion strategy, early-exit and late-exit bilingual education programs for language-minority children (Final Report, Vols. 1 & 2). San Mateo, CA: Aguirre International. (ED 330 216)

Rodriguez, R. (1982). Hunger of memory: The education of Richard Rodriguez. An autobiography. Boston: D. R. Godine.

Rossell, C., & Baker, K. (1996). The educational effectiveness of bilingual education. Research in the Teaching of English, 30(1), 7-74.

Shin, F. (1994). Attitudes of Korean parents toward bilingual education. BEOutreach Newsletter, California State Department of Education, 5(2), pp. 47-48.

Shin, F., & Gribbons, B. (1996). Hispanic parents' perceptions and attitudes of bilingual education. Journal of Mexican-American Educators, 16-22.

Smith, F. (1994). Understanding reading: A psycholinguistic analysis of reading and learning to read (5th ed.). Hillsdale, NJ: L. Erlbaum.

Verhoeven, L. (1991). Acquisition of literacy. Association Internationale de Linguistique Appliquee (AILA) Review, 8, 61-74.

Willig, A. (1985). A meta-analysis of selected studies on the effectiveness of bilingual education. Review of Educational Research, 55(3), 269-317.

Resources

Online Resources: Frequently Asked Questions

How effective is bilingual education?

Elizabeth Howard, Center for Applied Linguistics

The effectiveness of bilingual education is a strongly debated topic in the United States. Evaluation studies attempt to determine how the English acquisition and academic achievement of students in bilingual education programs compare with those of students in other types of programs. These evaluations are complicated, however, by the difficulties in formulating a strong research design. For example, it is difficult, if not impossible, to randomly assign children to different types of programs. In addition, there is a great deal of variation among bilingual education programs, just as there is among mainstream programs. These research design issues seriously limit the ability of large comparative studies to make definitive claims about the effectiveness of bilingual education.

Some research reviews of bilingual program evaluations have concluded that bilingual education makes no difference in the English language development and academic achievement of language minority students (Baker & DeKanter, 1981; Rossell & Ross, 1986; Rossell & Baker, 1996); that is, they found no difference between the English language development and academic achievement of students in bilingual programs versus students who received instruction in English only. Other reviews of bilingual evaluation studies have reached the opposite conclusion (Willig, 1985; Greene, 1998): that is, that there is a positive effect of bilingual education, such that language minority students in bilingual programs outperform their peers in monolingual English programs.

In summarizing the findings from large-scale national studies and research reviews on the effectiveness of bilingual education, a panel of experts convened by the National Research Council of the National Academy of Sciences recommended that future research focus on pinpointing features of effective programs for language minority students, rather than continuing to debate whether or not bilingual education as a whole is effective (August & Hakuta, 1997). Not only would these smaller studies be easier to design and implement, but they would also contribute more to our collective understanding of how to educate language minority students effectively. Given the diversity of the language minority student population in this country and the tremendous variation in local conditions, it is more useful to investigate features of classrooms and programs that are effective in specific contexts than to assume that a single model would be best in all situations.

August, D., & Hakuta, K. (Eds.) (1997). Improving schooling for language minority children: A research agenda. Washington, DC: National Academy Press.

Baker, K. A., & DeKanter, A. A. (1981). Effectiveness of bilingual education: A review of the literature. Washington, DC: U.S. Department of Education.

Greene, J. (1998). A meta-analysis of the effectiveness of bilingual education. Available: http://ourworld.compuserve.com/homepages/JWCRAWFORD/greene.htm

Rossell, C., & Baker, K. (1996). The educational effectiveness of bilingual education. Research in the Teaching of English, 30(1), 7-74.

Rossell, C., & Ross, M. (1986). The social science evidence on bilingual education. Journal of Law and Education, 15(4), 385-419.

Willig, A. (1985). A meta-analysis of selected studies on the effectiveness of bilingual education. Review of Educational Research, 55(3), 269-317.

SETTING EXPECTED GAINS for Non and Limited English Proficient Students

Edward De Avila, Ph.D. November 1997

The primary instructional vehicle in the schools is language. Unfortunately, however, there are large numbers of students in the US who find it difficult to benefit from instruction because of limited proficiency in the language of the classroom, usually English. These children come from a wide variety of environments. There are several million such children, and even more adults. What they share is that they have not been part of the mainstream linguistic environment and, as a result, have been excluded from both its educational and its social benefits. They may be economically poor and therefore linguistically isolated from the mainstream, or they may come from backgrounds where the home language or dialect does not match that of the schools. In either case, and for whatever reason, they find it difficult to benefit from the mainstream instruction offered in the schools and do not seem to learn.

Over the past twenty years, programs designed to improve the language proficiency of Limited English Proficient (LEP) students have unfortunately met with mixed results. Largely because of a lack of adequate evaluative documentation, results have been equivocal regardless of program quality. Much of the confusion has come out of variable approaches to the concept of growth and to what can be expected from programs of this type.

Recently, "expected gain" has become an important concept in documenting the educational development of Limited and Non English Proficient speaking students. An understanding of this concept requires an analysis of the relationship between quality of instruction and measurable student outcomes. There are three key factors which both underlie this relationship and provide the necessary foundation upon which expectations for learning that can be derived or generated in a meaningful and defensible manner.

Assessment Requirements

The concept of expected gain, as implemented in programs for Non and Limited English Proficient (LEP) students, assumes a direct empirical relationship between student gains in language proficiency and their probable success in a mainstream program. In this context, valid and reliable assessment of growth in language proficiency, sensitive to gains that are attributable to program and instruction, is essential. Tests failing to exhibit this relationship will not provide a stable basis for either setting expectations or demonstrating growth.

Setting Expectations and Sensitivity to Growth

Setting a reasonable "expectation" for student performance must be done on an individual basis, beginning with a determination of where the student enters the program and measuring growth in increments sensitive to substantive linguistic changes.

Instructional Practices and Programs

The process of setting expectations for growth necessarily assumes exposure to an effective instructional program. Obviously, without a quality program, expectation levels are meaningless, regardless of how they are created. On the other hand, properly set expectations, coupled with effective programs, can go a long way toward creating a positive educational environment as well as documenting and validating programs.

Psychometric and pedagogical implications follow from each factor. Before discussing these implications, however, it is important to first clarify what is meant by language proficiency, since much of the confusion over the effectiveness and purposes of programs aimed at "remediating" limited English proficiency results from the lack of a clear definition. As will be seen, the distinction between language proficiency and academic achievement is critical to this and any discussion regarding language minority children.

Language Proficiency Defined

Inasmuch as the concept of language proficiency in this context is directly related to the concept of expected gain, it needs to be defined both conceptually and empirically. The term "language proficiency" as it is used here refers to those linguistic elements necessary for successful communication within the school environment. It is a broader concept than "academic achievement", though it underlies success in school. Thus, while language proficiency is viewed as a necessary element in defining academic success in the mainstream, it is not, in itself, sufficient to guarantee success as defined by performance indistinguishable from that of mainstream students.

Defined as communication, language proficiency consists of both receptive and productive skills, input and output, information sent and received. It is made up of both oral and literacy skills: listening, speaking, reading and writing. Proficiency in each of the four domains is viewed as a necessary element to language proficiency, as it contributes to academic success in the specific sense. Language proficiency is a necessary element to success in the general sense but not sufficient in the specific sense of guaranteeing success in school.

Knowing that a student is linguistically proficient tells us that s/he is able to benefit from instruction in the language of the classroom. While a test of language proficiency tells us nothing about how well a student will perform on a test of American history, it will, however, tell us that s/he can understand or comprehend (listen to) oral instruction on American history. Moreover, it will tell us whether s/he can be expected to comprehend and obtain textual information (reading) on American history, as well as write and speak about what s/he has learned about American history.

Relationship between student gains in language proficiency and probability of academic success in mainstream classrooms

Language proficiency is made up of both oral and literacy skills. Let us first consider oral skills. Several studies apply to the present discussion. The first was conducted in 1978 under contract to the California State Department of Education. In this study, De Avila, Duncan and Cervantes (1978) hypothesized a linear relationship between five levels of a widely used test of oral language proficiency and academic performance as measured by the CTBS-U (see Figure 1).

As predicted, oral language proficiency was found to be a significant predictor of academic performance. The researchers found that students scoring at Levels 4 and 5 on the oral test passed the CTBS-U at or above the 36th percentile (see Figure 2). In other words, oral proficiency was found to be a necessary element for success.

The results of this study were, in part, used by several State Departments of Education to set "Reclassification" or cut-off scores for determining student eligibility for bilingual programs.

In a further attempt to test the assumed relationship between academic performance and literacy (reading and writing), the above study was replicated in 1988 using a reading and writing test in place of the oral test used in the 1978 study. Results from both studies are shown in Figure 2; the similarity of the results is striking.

The same fundamental results were obtained as in the first study. This time, however, a direct relationship between language proficiency, as measured by literacy, and academic performance was found, paralleling the relationship between oral proficiency and academic achievement found in the first study. Moreover, it was found that of the students passing the reading and writing test at the "Competent Literate" level (3), over 90 percent passed the CTBS-U at or above the 36th percentile (see Figure 2).

These studies, along with numerous other studies conducted by various researchers over the past fifteen years, provide ample support for the hypothesized relationship between language proficiency and school performance as well as the justification or basis for examining the extent of "expected growth" over time in relation to academic performance.

It should be obvious that the choice of a valid and reliable test of language proficiency is critical. Tests failing to "predict" performance in the manner discussed here would certainly be problematic in actually setting an expectation level or score.

There are, therefore, two primary requirements that must be placed on whatever test of language proficiency is used to measure growth. First, it must predict, or be related to, the programmatic criterion, whether that is defined as achievement on statewide standards or on a nationally normed test. Second, it must measure growth in units that are reflective of learning and educationally meaningful.

Where the Student Enters the Program

It cannot be assumed that all students will learn at the same rate or to the same extent. In large measure, extent of growth is limited by how far along the student is on the learning curve when s/he enters or begins a program.

It is of key importance to understand that there is a difference between expected growth and possible growth when setting expectations. Stated another way, if a student begins at zero on a 0-100 scale, possible growth is 100 points. Conversely, a student who begins at 99 can only grow to 100, an improvement of only one point, in sharp contrast to the student who had the possibility of gaining 100 points.

It is exactly because possible growth is a direct function of where along the program/measurement continuum a student begins that it is essential to establish that point before setting "expected growth". It would be foolish to expect the same growth for all students regardless of entry point, program type, quality or effectiveness.

Given that not all students can be expected to show the same amount of growth in the same time frame, it becomes essential to establish the point at which the student begins. In addition, regardless of what measurement device is used, it must be sensitive enough to show growth in meaningful increments. Thus, the choice of an appropriate "metric" becomes critically important: an improper metric, such as a categorical or nominal scale, can obscure growth. Figure 3 illustrates three examples where student progress is tracked both by level of proficiency (1 to 5) and by continuous score (0 to 100).

Consider Student One, who begins knowing absolutely no English at all, a Level 1 student on a 1 to 5 scale. As shown in Figure 3, Student One made no change in proficiency level between pre- and post-tests. The total score for Student One, however, shows a gain of 20 points along the proficiency continuum. Student Two, in contrast, gained a full level between pre- and post-tests but only 10 points along the proficiency continuum. Similarly, Student Three gained a full level but showed only a five-point increase between tests.

"Level", in the above, indicated that only two of the three examples showed gain. Examination of the total point scored revealed, perhaps ironically, that the student who showed the greatest gain (in points) made no gain or change in "Level".

The above example illustrates the importance of why the choice of an appropriate metric to indicate change, gain or learning, becomes important. The use of a metric such as "level" in the above example, may be insensitive to show actual growth or change.
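
To make the contrast concrete, here is a minimal Python sketch of how a categorical "level" metric can mask continuous growth. The point gains and level changes mirror the three students above; the exact pre/post scores and the level cut-offs are hypothetical, since the text fixes only the Level 1 range (0 to 54, stated in the worked example later for the oral test).

    LEVEL_FLOORS = [0, 55, 70, 80, 90]  # assumed minimum scores for Levels 1-5

    def level(score):
        """Map a continuous 0-100 score onto a categorical 1-5 proficiency level."""
        return sum(score >= floor for floor in LEVEL_FLOORS)

    # Pre/post continuous scores patterned after Figure 3's three students:
    # gains of +20, +10, and +5 points (the endpoints themselves are invented).
    students = {"One": (20, 40), "Two": (50, 60), "Three": (67, 72)}

    for name, (pre, post) in students.items():
        print(f"Student {name}: +{post - pre} points, "
              f"Level {level(pre)} -> Level {level(post)}")

    # Output: Student One gains the most points yet shows no level change,
    # while Students Two and Three each gain a level on far fewer points.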

Given different entry or starting points, what would be an acceptable "expected gain"? The following study shows how the above ideas have been applied in a large urban context. Los Angeles Unified School District has examined the "rate of growth" within the district's Bilingual and ESL programs. These "gains" can be thought of as the result of a year's program intervention. Consistent with the discussion above, it should be noted that absolute growth is to a large extent a function of initial level. Thus, one would expect greater gains for an entering student than would be expected for a student further along. Certainly, one cannot expect the same growth indefinitely; any learning curve will exhibit diminishing returns.

The data used to generate the expected gains shown in Table 1 were based largely on work conducted by Toni Marsnik at LAUSD, in which data on several thousand elementary level students were examined (see Figure 4). It should be noted that the "gain" scores shown below are based on continuous total scores, which are more sensitive, and not on levels, which are less able to show change.

Table 1. Average Expected Gain as a Function of Initial or Entering Level

    Level      Oral    Literacy
    Level 1     20        30
    Level 2     10        15
    Level 3      5         -
    Level 4      -         -
    Level 5      -         -

Gains beyond Level 3 on the oral test are difficult to anticipate, since scores at or above Level 4 indicate "native-like" proficiency and are not as subject to program intervention or change as scores at the lower levels. In other words, the test reaches a "ceiling" at this point; it was not designed to discriminate between students' achievement beyond this level of proficiency, and language proficiency ceases to be a predictor of achievement. Low achievement beyond this level can therefore no longer be attributed to limited proficiency. Similarly, while there may be slight differences in proficiency at these levels, they are not necessarily predictive of differences in achievement.

Growth in reading and writing is more difficult to anticipate than changes in oral proficiency since changes in literacy are more directly tied to instruction and less a function of "informal" instruction. Growth in this area is therefore slower, as shown on Figure 4.

One of the more important results of these studies is that skills in reading, writing, listening and speaking cannot be assumed to improve at the same rate. Moreover, growth in language proficiency is largely a function of program participation and program quality. Unfortunately, data on these variables were not available: in the above studies, data were collapsed (averaged) across programs and levels of participation. In effect, these data might best be described as reflecting "random" treatment effects. Certainly more detailed investigation is warranted in which both student and program characteristics are examined.

Quality of Instruction

Several inferences can be drawn from the above discussion. Perhaps the most important pedagogical implication concerns the average time a student can be expected to require special program treatment. Take, for example, a student who enters a program knowing no English whatsoever (see Figure 5), a student at the "entry level" of an ESL program. This would be a student with a language proficiency index of 1/1, scoring at chance levels on both the oral and literacy tests. Consider first oral development, since it precedes literacy development.

According to the values shown in Table 1, a 1/1 student would be expected to gain approximately 20 points in oral development in the first year, moving, for example, from 15 (approximately chance) to 35 points, still Level 1 (Level 1 = 0 to 54). In the second year, the student would gain another 20 points for a total of 55, a low Level 2. In the third year, the student would move from a low Level 2 to a high Level 2, or a total of about 65 points. In the fourth year, the student could be expected to move into Level 3 with a total of approximately 70 points. Finally, full native-like oral proficiency would not be "expected" until completion of the fifth year.
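
The projection above amounts to a simple iteration: each year, look up the expected gain for the student's current level (Table 1, oral column) and add it to the running total. A minimal sketch follows, assuming Level 2 and Level 3 floors (55 and 70) inferred from the worked example; since the article's running totals are approximate, the sketch reproduces the pattern rather than the exact figures.

    # Year-by-year oral projection for a student entering at chance level.
    # Gains come from Table 1 (oral column); the Level 2 and Level 3 floors
    # are inferred from the worked example above, not stated as rules.
    ORAL_GAIN_BY_LEVEL = {1: 20, 2: 10, 3: 5}
    LEVEL_FLOORS = [0, 55, 70]  # assumed minimum scores for Levels 1-3

    def level(score):
        return sum(score >= floor for floor in LEVEL_FLOORS)

    score = 15  # approximately chance on the oral test
    for year in range(1, 6):
        score += ORAL_GAIN_BY_LEVEL.get(level(score), 0)
        print(f"End of year {year}: total ~{score}, Level {level(score)}")

    # Pattern: ~35 (Level 1), ~55 (Level 2), ~65 (Level 2), ~75 (Level 3),
    # ~80 -- native-like oral proficiency is not expected until year five.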

Literacy development would follow the same basic pattern. Our research (cited above), however, shows that the development of literacy skills is somewhat slower at the lower levels. Once minimal oral skills have been established, students move quickly through the middle levels, as shown by the slope of the curves in Figure 5.

It is noteworthy that the sum of the expected gain values across the different levels is 35 (20+10+5) for oral and 45 (30+15) for reading and writing, which averages out to approximately 13 points per year. Thirteen points translates into approximately one proficiency level per year. The critical point, however, is that while it may seem reasonable to set expectations on the basis of a level per year, doing so would be misleading, as discussed above.

Finally, it is also worth bearing in mind that an approach based on differential expectations can offer a powerful metric for evaluating both student progress and programmatic effectiveness. Student growth has been discussed. Programmatic effectiveness in this context becomes a matter of counting the number or percentage of students meeting or exceeding their individual goals. In the ideal program, all students would reach their "expected gain".

Implications and Cautions

There are several important points to be taken from the above exercise. The first is to recognize that what has been said applies to any language and any set of tests. The relationships described above are the same for all languages. In terms of program evaluation, the approach works equally well in a dual language program as in a program directed toward the improvement of one language. Similarly, any test should be held to the same standards of validity, reliability and ability to make distinctions as described above. Tests should provide predictive information while, at the same time, measuring change in meaningful increments. Tests that do not meet these two conditions should be used neither to set expectations nor to measure growth.

As was seen in Figure 4, there is very little growth in literacy skills in the first two levels of ESL study. On the other hand, growth in oral skills is rapid, particularly in listening. It would therefore appear that English as a second language develops in a non-linear fashion. According to these data, initial programmatic emphasis, at least at the elementary level, should be directed toward the development of beginning oral skills before reading and writing are seriously undertaken.

In this connection it is critical to note that the data cited above were all based on elementary level students, and we would not expect the same values to hold at the secondary level. In fact, based on work conducted in 1988, there is reason to believe that proficiency develops somewhat differently at the secondary level than at the elementary level. For example, it has been found that elementary level students who are unable to speak a language (English or Spanish) are seldom (almost never) able to read and write in that language. At the secondary level, on the other hand, there are significant numbers of students who are able to read and write in a second language without being able to converse in it. These students tend 1) to be recent arrivals as opposed to second- and third-generation students, 2) to have been educated in the home language, and 3) to have received instruction in the second language in the homeland. The predictions for these students would be very different from those for others.

It is also important to note that, in the final analysis, full-scale language proficiency requires proficiency in all four of the linguistic domains discussed. Since language proficiency, as the term has been used here, includes both speaking and writing skills, it would not be surprising to find a good many students from English-only backgrounds who are of "limited English" proficiency.

Note that the above "expected gain" values are based on averages and cannot be used to draw conclusions about, or chart, the progress of an individual student. Stated another way, group data cannot be proved or disproved by a single example; results must be evaluated on an "average" basis. The implicit model underlying the approach taken here is based on probabilities and is accurate only to the extent of the probability value. To say that the "prediction" is accurate nine times out of ten is no different from saying that it is inaccurate one time out of ten. Thus, the approach, as well as any program predicated on it, must be evaluated on total group performance and cannot be proven or disproven on the basis of a single example or an individual's test scores.

As a further caution, it should also be borne in mind that even though two tests may employ or report scores using the same metric, there may be distinct differences depending on the norming sample. Thus, the 40th percentile on one test may not have the same meaning as the 40th percentile on another test. Without a common reference group, such comparisons are, at best, difficult, and, possibly misleading at worst.

On the practical side, determining program eligibility or placement based on percentile values taken from a test different from the one used by a local district is problematic, particularly if one or the other has not been validated against an external criterion.

Thus, it is quite possible for a child to be exited from a program before s/he is ready, or denied access to programs s/he still needs. For example, the State Department of Education guidelines in one state mandate that students performing above the 30th percentile on a test of language proficiency are either ineligible or no longer eligible for services. The assumption is that the 30th percentile on the language proficiency test is somehow equivalent to, or predictive of, a score on an achievement test. While it may be the case that the particular test used to set the 30th percentile "cut-score" corresponds to the desired level of performance on the criterion test, there is no guarantee that the same relationship would hold for the proficiency test used locally. In other words, without equating the scales from different tests, there is a strong probability that educational decisions would be based on misaligned rulers.

While the preceding discussion of the various relationships between proficiency and academic performance over time has been suggestive, it is not definitive. What is needed to ensure that the above "expectations" are reasonable is a series of longitudinal studies across age, time in program, program type and measurement instrumentation.

The studies cited in the above discussion have not included any information on either program differences or methods of measurement. Thus, it would be difficult to conclude that the same cut scores or expectations would hold across all variations in programs and measurement techniques.

A possible approach to the problem would be to locate districts with sufficient longitudinal data on which to conduct a series of post-hoc analyses in which "growth" is plotted over time. A major problem, of course, would be the extent to which the data are available in forms and formats amenable to analysis. Moreover, there would be no control over the quality of the data supplied by individual districts. While not ideal, however, a post-hoc approach would certainly be an improvement over the current situation.

Finally, these results, coupled with those of future studies, would help resolve the recent debate over how long it takes to become proficient in English. The present data suggest that it takes approximately five to seven years. However, until further work is completed, the issue remains subject to debate.

REFERENCES

De Avila, E., Cervantes, R., & Duncan, S. (1978). CABE Research Journal, 1(2).

Language Assessment Scales. (1988). Oral technical report. Monterey, CA: CTB/McGraw-Hill.

ADDITIONAL RESOURCES

Anstrom, K. (1997). Academic achievement for language minority students: Standards, measures, and promising practices. Washington, DC: National Clearinghouse for Bilingual Education. [www.ncela.gwu.edu/pubs/acadach.html]

Council of Chief State School Officers Stanford Working Group (1996). Systemic reform and limited English proficient students. Washington, DC: Author. [www.ccsso.org/pdfs/srandlep.pdf]

Del Vecchio, A. et al. (1994). Whole-school bilingual education programs: Approaches for sound assessment. Washington, DC: National Clearinghouse for Bilingual Education. [www.ncela.gwu.edu/pubs/pigs/pig8.html]

Geisinger, K.F. (1992). Testing limited English proficient students for minimum competency and high school graduation. Proceedings of the second national research symposium on LEP student issues: Focus on evaluation and measurement. Volume 2. Washington, DC: United States Department of Education, Office of Bilingual Education and Minority Languages Affairs.

Hernandez, R.D. (Winter, 1994). Reducing bias in the assessment of culturally and linguistically diverse populations. The Journal of Educational Issues of Language Minority Students, 14, pp. 269-300. [www.ncela.gwu.edu/micpubs/jeilms/vol14/hernand.html]

Hopstock, P.J. (1995). Recommendations on student outcome variables for limited English proficient (LEP) students. Arlington, VA: Development Associates, Special Issues Analysis Center. [www.ncela.gwu.edu/micpubs/siac/outcome/index.html]

McLaughlin, B. et al. (1995). Assessing language development in bilingual preschool children. Washington, DC: National Clearinghouse for Bilingual Education. [www.ncela.gwu.edu/pubs/pigs/pig22.html]

National Clearinghouse for Bilingual Education. (1997). High stakes assessment: A research agenda for English language learners. Symposium summary. Washington, DC: Author. [www.ncela.gwu.edu/pubs/eacwest/perform.html]

Navarrete, C. and Gustke, C. (1996). A guide to performance assessment for linguistically diverse students. Albuquerque, NM: EAC West, New Mexico Highlands University. [www.ncela.gwu.edu/pubs/eacwest/perform.html]

Olson, J.F. and Goldstein, A.A. (1997). The inclusion of students with disabilities and limited English proficient students in large-scale assessments: A summary of recent progress. Washington, DC: National Center for Education Statistics, U.S. Department of Education. [www.ed.gov.NCES/pubs97/97482.html]

Pierce, L.V. and O'Malley, J.M. (1992). Performance and portfolio assessment for language minority students. Washington, DC: National Clearinghouse for Bilingual Education. [www.ncela.gwu.edu/pubs/pigs/pig9.html]

Saville-Troike, M. (1991). Teaching and testing for academic achievement: The role of language development. Washington, DC: National Clearinghouse for Bilingual Education. [www.ncela.gwu.edu/pubs/focus/focus4.html]

Short, D. (1993). Assessing integrated language and content instruction. TESOL Quarterly, 27(4). [www.ncela.gwu.edu/micspubs/list/chapter1.html]

Zehler, A. (1994). An examination of assessment of limited English proficient students. Arlington, VA: Development Associates, Special Issues Analysis Center. [www.ncela.gwu.edu/pubs/siac/lepasses.html]

To get the latest information relating to funding opportunities, legislation, current research, and upcoming meetings, subscribe to NCELA's electronic newsletter, Newsline. To subscribe, send an e-mail message to [email protected] with "subscribe newsline" in the body of the message.

EPILOGUE

The following comments represent the author's thoughts related to "Setting Expected Gains for Non and Limited English Proficient Students".

How Long Does It Really Take to Learn English?

Ed De Avila, Ph.D.

Students from non-English-speaking backgrounds are normally classified as non-proficient, limited, or proficient speakers of English according to their level of English proficiency. An important question for schools is how many students move from one level to the next, and how long it takes them to do so.

In a recent brief article, De Avila (1997) hypothesized an inverse linear relationship between expected growth in English language proficiency and initial proficiency. That is, the amount of expected gain between two test administrations was to a large extent a function of initial proficiency: the greater the initial proficiency, the less the expected growth. On this view, categories or levels of proficiency become increasingly insensitive to growth as proficiency increases. It was further argued that units of change had to be based on equal-interval scales, with units or scores sensitive enough to detect small as well as gross changes in proficiency. The common practice of expecting growth of one level per year is therefore perhaps unreasonable and tends to obscure actual growth. Limited data were presented in support of the hypothesis; the data in the earlier study were restricted to categorical proficiency levels only. While observing levels only tended to limit the results, a number of important implications were suggested concerning the educational treatment of children from non- and limited-English-speaking backgrounds.

The purpose of the current small study was twofold: first, to examine the relationship between "time" and "oral language proficiency," as assessed by the Pre-LAS, a commonly used test of oral language proficiency; and second, to examine "expected gain" across "time" as a function of initial starting point, as hypothesized by De Avila (1997).

In the earlier study, De Avila worked backwards from pre-post data collected by Toni Marsnik and her colleagues in Los Angeles over the past few years. Using these data, which were limited to proficiency-level categories, it was hypothesized that students beginning at LAS Level 1 would gain approximately 20 raw score points, students at Level 2 approximately ten raw score points, and students at Level 3 about five raw score points. While the hypothesis was supported in the sense of establishing the linear relationship between expected growth and time, the score predictions were little more than educated guesses.

The data collected in the present study included total scores as well as proficiency level categories, enabling a far more precise estimate as to "expected gain." Thus, the data reported below represent a small-scale longitudinal sample.

Method: A total of 203 children between the ages of 54 and 80 months were administered two versions of the Pre-LAS test of oral language development. The interval between the two test administrations ranged from three to sixteen months. Data on the first test were collected from school records; data on the second test were collected during the norming of a new parallel version of the test (Pre-LAS 2000).

The statistical procedure used to generate the "expected gains" involved two steps. In the first, a "difference score" was produced by calculating the difference between the two test administrations (T2 - T1).

In the second step, the "difference score" was regressed against first-test standardized total scores. Scores generated from the resulting equations were then plotted against total LAS scores, as shown in Figure 1. The horizontal axis of Figure 1 represents the "expected" LAS total scores, calculated by adding "expected gains" to initial scores.
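For readers who wish to reproduce the computation, the two-step procedure can be sketched as follows. This is a minimal illustration, not the study's actual analysis: the score arrays are hypothetical stand-ins, since the raw data are not reproduced here.

    import numpy as np

    # Hypothetical first- and second-administration Pre-LAS total scores;
    # the study's actual raw data are not reproduced in this paper.
    t1 = np.array([5.0, 20.0, 35.0, 50.0, 65.0, 75.0])
    t2 = np.array([45.0, 57.0, 63.0, 68.0, 72.0, 80.0])

    # Step 1: the "difference score" (T2 - T1).
    diff = t2 - t1

    # Step 2: regress the difference score on the first-test score.
    slope, intercept = np.polyfit(t1, diff, deg=1)

    def expected_gain(initial):
        """Expected gain as a linear function of initial proficiency."""
        return slope * initial + intercept

    def expected_total(initial):
        """Expected total score = initial score + expected gain."""
        return initial + expected_gain(initial)

The worked examples in the next paragraph follow this pattern: a low initial score yields a large expected gain, and expected gains shrink as the initial score rises.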

Given the dimensions of "time" (indicated by two separate test administrations) and "initial score" (indicated by scores on the horizontal axis) the following holds: a student with an initial score on the Pre-LAS of 5 can, on average, be "expected" to gain somewhere around 40 raw score points. Adding the expected gain to the initial score results in an expected score of 45. A student with an initial score of 65 standard score points would be expected to gain approximately seven standard score points for a total of 72. A student with an initial score of 75 would be expected to gain only about 5 standard score points and have an expected score of 80 points.

Note that the possible LAS total scores shown on Figure 1 range from zero to 85 points, whereas the actual test range is from zero to 100 points. That scores in the present context range only up to 85 indicates that the model holds only to about 80 points. Beyond this level, first-test results cease to predict "gains." This is no doubt due to the inherent ceiling effect in the tests; 80 percent is basic proficiency.

It would be interesting to speculate on how long it would take to become a proficient speaker if one were to start at zero. According to these data, assuming a student began with zero proficiency, one would expect a gain of 40 points in the first year, 20 points in the second year (total = 60), and 13 points in the third (total = 73), which would leave the student just below the "proficient" speaker category cut-off score (77 at age 4, 82 at ages 5-6).
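This projection can be restated as a short computation; the yearly gains below are the speculated values from the paragraph above, not outputs of the regression itself.

    # Projected cumulative Pre-LAS total for a student starting at zero,
    # using the yearly gains speculated above (40, 20, 13).
    yearly_gains = [40, 20, 13]
    cutoffs = {"age 4": 77, "ages 5-6": 82}  # proficiency cut-offs quoted in the text

    total = 0
    for year, gain in enumerate(yearly_gains, start=1):
        total += gain
        print(f"End of year {year}: total = {total}")

    # After year 3 the projected total (73) remains below both cut-offs.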

While all of these findings are encouraging, as well as consistent with the original model described in the earlier paper, a number of cautions must be borne in mind. A further breakdown of these data is instructive. Of particular interest is the effect of the length of the interval between test administrations. This issue can be addressed from several angles. The first concerns the psychometric issue of test-retest reliability. The second, which is of greater importance in the present context, concerns the general accuracy of the model across time.

It is important to bear in mind that gain in this study is difficult to determine, since the test interval was short for some students (more test-retest than longitudinal) and long for others. One way to test the extent of consistency across time is to examine the correlation between the two test administrations as well as the residuals from the regression analyses. For example, the overall correlation between the two test administrations was .82 and .83 for Forms C and D respectively.

When broken down into three test intervals, 12 months or less, 13 to 18 months, and 19 months or more, a somewhat more detailed picture emerges.

Correlations between test administrations (Form C / Form D):

    Interval              r (C / D)      N
    Total Sample          .76 / .77     203
    0 to 12 Months        .95 / .95      68
    13 to 18 Months       .70 / .67      93
    19 Months or More     .57 / .60      36

There were six cases in which the testing interval exceeded reasonable limits. These six were dropped from the analyses.

Though perhaps not unexpected, it is noteworthy that the correlation between test administrations decreases as the interval increases: scores vary more the longer the student has been in a program of instruction in English. Correlations for the total sample were .76 and .77 for the two forms of the second test (both forms were administered). More interestingly, the correlation for the first group (0 to 12 months) was .95 for both forms, almost as high as the correlation between the two forms of the second test themselves (Forms C/D, r = .98). Finally, the importance of these findings is not that they were unexpected but that they move the field toward being able to establish empirically based expectations across a number of critical dimensions.

Given the progression indicated above, it could be inferred that pre-school children who begin school with virtually no English (NEP) would take about three years to master sufficient oral skills to be virtually indistinguishable from their mainstream counterparts. However, while such a student may have mastered sufficient oral skills to participate fully in an English-speaking environment, there is no guarantee that he or she has mastered literacy or other academically related skills.

In an attempt to determine the extent of movement between test levels or placement categories across the two test administrations, the following analysis was conducted. Results summarized below were limited to students for whom the testing interval was between 9 and 12 months, roughly a school year.

Data were available for a total of 92 students. Of these, 18 moved from the non-proficient to the limited category, 9 moved from non-proficient to proficient, 38 remained unchanged, and 5 showed a loss, moving from the limited to the non-proficient category. Finally, of the 28 students initially identified as proficient, 7 moved from proficient to limited. In summary, 27 of the 38 (71%) students initially identified as either non-proficient or limited gained at least one level on the five-point scale used to categorize them.
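Restated as a simple tally (a sketch only; the transition counts are taken from the summary above):

    # (initial category, final category): count, as reported above.
    transitions = {
        ("non", "limited"): 18,
        ("non", "proficient"): 9,
        ("limited", "non"): 5,
        ("proficient", "limited"): 7,
    }
    ORDER = {"non": 0, "limited": 1, "proficient": 2}

    gained = sum(n for (a, b), n in transitions.items() if ORDER[b] > ORDER[a])
    lost = sum(n for (a, b), n in transitions.items() if ORDER[b] < ORDER[a])
    print(f"Gained a category: {gained}; lost a category: {lost}")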

Perhaps one of the more important findings illustrated above concerns the students who showed a loss in proficiency. Several points can be made here. First, the losses described above may well illustrate the original point regarding the use of categorical levels in contrast to continuous scores. The students who showed losses in proficiency level or category may initially have had continuous scores very close to the cut-off scores, within the grey area above or below the cut-off created by the standard error of measurement. As argued above, a very small difference in scores between the initial and second test can lead to significant changes in level identification which, in turn, can be very misleading. It is entirely possible that level identification in this context could be affected by simple regression towards the mean. Second, we know virtually nothing about the treatment of the students in the sample or the extent to which their exposure to English was constant throughout the testing interval. For example, children often migrate between the U.S. and Mexico or Puerto Rico during the course of a school year, which would certainly have an impact on their acquisition of English. Without further detailed study it would be impossible to fully explain these results.

This study was limited to examining gains independent of the nature of exposure to English during the interval between the two test administrations; this information was not available. Similarly, little is known about changes in oral proficiency attributable to maturational effects independent of second-language differences. Nevertheless, the results suggest that program evaluations must be based on time in the program, initial proficiency, and a combination of student and program characteristics. Though not addressed specifically here, the relationship between program characteristics and the development of English language proficiency is critical. Certainly, further study is needed on the characteristics of bilingual and ESL programs, just as further study of the longitudinal relationship between oral and literacy development is needed for all children.

It is also unfortunate that the current study was limited to one age group and one LAS test level. It would be critical to further examine the above relationships across age. Previous studies by the author and his colleagues have demonstrated clear age differences in language proficiency and literacy (see De Avila & Duncan, 1988).

The model underlying the present data reflects the diminishing returns found in any learning curve, where initial rapid learning ultimately gives way to slower learning. It is in this sense that the approach mirrors, or is analogous to, normal development. It also shows how the level of effort needed to move from one point or level to the next may be greater at the upper end of the learning curve. This point may seem counterintuitive in that it implies greater expense in moving from the "limited" to the "proficient" category than from "non" to "limited." The financial implications would seem all too obvious.

Finally, the present approach is but one way to examine some of the important relationships in designing, implementing and evaluating programs for limited and non English speaking students. Since the model presented above is unabashedly empirical, it is subject to empirical test and refinement; only further research will determine its utility.

About the Author

Dr. Ed De Avila received his B.A. in psychology from the University of California at Berkeley, a master's degree in clinical psychology from the University of Colorado at Boulder, and a Ph.D. in developmental psychology from York University in Toronto, Ontario, Canada. Since 1976 he has been President of the Linguametrics Group, an educational research and development organization specializing in assessment.

De Avila has lectured throughout the United States and abroad. He has taught courses, served as professor, and conducted research at such institutions as the University of California, the University of Colorado, Stanford University, and Columbia University, among others. Additionally, he has sat on a number of editorial boards, including those of the National Association of Bilingual Education Journal, Journal of International Psychology, American Journal of Mathematics Instruction, Journal of Applied Developmental Psychology, American Journal of Mental Deficiency, and others.

He has published numerous articles, several books, films, and other educational materials, including the Language Assessment Scales, probably the most widely used language proficiency tests in the country.

Over the past twenty years, De Avila has served as a consultant to a number of federal agencies both in the U.S. and abroad, as well as to numerous state departments of education, school districts, foundations, and private companies.

SETTING EXPECTED GAINS for Non and Limited English Proficient Students

Edward De Avila, Ph.D. November, 1997

The primary instructional vehicle in the schools is language. Unfortunately, however, there are large numbers of students in the US who find it difficult to benefit from instruction because of limited proficiency in the language of the classroom, usually English. These children come from a wide variety of environments; there are several million such children and even more adults. They share the fact that they have not been a part of the mainstream linguistic environment and, as a result, have been excluded from both its educational and social benefits. They may be economically poor and therefore linguistically isolated from the mainstream, or they may come from backgrounds where the home language or dialect does not match that of the schools. In either case, and for whatever reason, they find it difficult to benefit from the mainstream instruction offered in the schools and do not seem to learn.

Over the past twenty years, programs designed to improve the language proficiency of Limited English Proficient (LEP) students have unfortunately met with mixed results. Largely because of a lack of adequate evaluative documentation, results have been equivocal regardless of program quality. Much of the confusion has arisen from variable approaches to the concept of growth and to what can be expected from programs of this type.

Recently, "expected gain" has become an important concept in documenting the educational development of Limited and Non English Proficient speaking students. An understanding of this concept requires an analysis of the relationship between quality of instruction and measurable student outcomes. There are three key factors which both underlie this relationship and provide the necessary foundation upon which expectations for learning that can be derived or generated in a meaningful and defensible manner.

Assessment Requirements

The concept of expected gain, as implemented in programs for non and Limited English Proficient (LEP) students, assumes a direct empirical relationship between student gains in language proficiency and their probable success in a mainstream program. In this context, valid and reliable assessment of growth in language proficiency, sensitive to gains attributable to program and instruction, is essential. Tests failing to exhibit this relationship will not provide a stable basis for either setting expectations or demonstrating growth.

Setting Expectations and Sensitivity to Growth

Setting a reasonable "expectation" for student performance must be done on an individual basis, beginning with a determination of where the student enters the program and measuring growth in increments sensitive to substantive linguistic changes.

Instructional Practices and Programs

The process of setting expectations for growth necessarily assumes exposure to an effective instructional program. Obviously, without a quality program, expectation levels are meaningless, regardless of how they are created. On the other hand, properly set expectations, coupled with effective programs, can go a long way toward creating a positive educational environment as well as documenting and/or validating programs.

Psychometric and pedagogical implications follow from each factor. Before discussing these implications, however, it is important to first clarify what is meant by language proficiency, since much of the confusion over the effectiveness and purposes of programs aimed at "remediating" limited English proficiency results from the lack of a clear definition. As will be seen, the distinction between language proficiency and academic achievement is critical to this and any discussion regarding language minority children.

Language Proficiency Defined

Inasmuch as the concept of language proficiency in this context is directly related to the concept of expected gain, it needs to be defined both conceptually and empirically. The term "language proficiency" as used here refers to those linguistic elements necessary for successful communication within the school environment. It is a broader concept than "academic achievement," though it underlies success in school. Thus, while language proficiency is viewed as a necessary element in defining academic success in the mainstream, it is not, in itself, sufficient to guarantee performance indistinguishable from that of mainstream students.

Defined as communication, language proficiency consists of both receptive and productive skills: input and output, information sent and received. It is made up of both oral and literacy skills: listening, speaking, reading, and writing. Proficiency in each of the four domains is viewed as a necessary element of language proficiency and contributes to academic success. Language proficiency is necessary for success in the general sense, but not sufficient in the specific sense of guaranteeing success in school.

Knowing that a student is linguistically proficient tells us that s/he is able to benefit from instruction in the language of the classroom. While a test of language proficiency tells us nothing about how well a student will perform on a test of American history, it does tell us that s/he can understand or comprehend (listen to) oral instruction on American history. Moreover, it tells us whether s/he can be expected to comprehend and obtain textual information (reading) on American history, as well as write and speak about what s/he has learned.

Relationship between student gains in language proficiency and probability of academic success in mainstream classrooms

Language proficiency is made up of both oral and literacy skills. Let us first consider oral skills. There are several studies that apply to the present discussion. The first was conducted in 1978 under contract to the California State Department of Education. In this study, De Avila, Duncan and Cervantes (1978) hypothesized a linear relationship between five levels of a widely used test of oral language proficiency and academic performance as measured by the CTBS-U (see Figure 1).

As predicted, oral language proficiency was found to be a significant predictor of academic performance. Students scoring at Levels 4 and 5 on the oral test passed the CTBS-U at or above the 36th percentile (see Figure 2). In other words, oral proficiency was found to be a necessary element for success.

The results of this study were, in part, used by several State Departments of Education to set "Reclassification" or cut-off scores for determining student eligibility for bilingual programs.

In a further attempt to test the assumed relationship between academic performance and literacy (reading & writing), the above study was replicated in 1988 using a reading and writing test in place of the oral test used in the 1978 study. Results from both studies are shown on Figure 2. The similarity of results is striking.

The same fundamental results were obtained as in the first study. This time, however, a direct relationship was found between language proficiency, as measured by literacy, and academic performance, paralleling the relationship between oral proficiency and academic achievement found in the first study. Moreover, of the students passing the reading and writing test at the "Competent Literate" level (Level 3), over 90 percent passed the CTBS-U at or above the 36th percentile (see Figure 2).

These studies, along with numerous other studies conducted by various researchers over the past fifteen years, provide ample support for the hypothesized relationship between language proficiency and school performance as well as the justification or basis for examining the extent of "expected growth" over time in relation to academic performance.

It should be obvious that the choice of a valid and reliable test of language proficiency is critical. Tests failing to "predict" performance in the manner discussed here would certainly be problematic in actually setting an expectation level or score.

There are, therefore, two primary requirements for whatever test of language proficiency is used to measure growth. First, it must predict or be related to programmatic criteria, whether defined as achievement on statewide standards or on a nationally normed test. Second, it must measure growth in units that reflect learning and are educationally meaningful.

Where the Student Enters the Program

It cannot be assumed that all students will learn at the same rate or to the same extent. In large measure, extent of growth is limited by how far along the student is on the learning curve when s/he enters or begins a program.

It is of key importance to understand that there is a difference between expected growth and possible growth when setting expectations. Stated another way, if a student begins at zero on a 0-100 scale, possible growth is 100 points. Conversely, a student who begins at 99 can only grow to 100, an improvement of only one point, in sharp contrast to the student who had the possibility of gaining 100 points.

It is exactly because possible growth is a direct function of where along the program/measurement continuum a student begins that it is essential to establish that point before setting "expected growth". It would be foolish to expect the same growth for all students regardless of entry point, program type, quality or effectiveness.

Given that not all students can be expected to show the same amount of growth in the same time frame, it becomes essential to establish the point at which the student begins. In addition, regardless of what measurement device is used, it must be sensitive enough to show growth in meaningful increments. Thus, the choice of an appropriate "metric" becomes critically important; an improper metric, such as a categorical or nominal scale, can obscure growth. Figure 3 illustrates three examples where student progress is tracked both by level of proficiency (1 to 5) and by continuous score (0 to 100).

Consider Student One, who begins knowing absolutely no English at all, a Level 1 student on a 1-to-5 scale. As shown on Figure 3, Student One made no change in proficiency level between pre- and post-tests; the total score, however, shows a gain of 20 points along the proficiency continuum. Student Two, in contrast, gained a full level between pre- and post-tests but only 10 points along the proficiency continuum. Similarly, Student Three gained a full level but showed only a five-point increase between tests.

"Level", in the above, indicated that only two of the three examples showed gain. Examination of the total point scored revealed, perhaps ironically, that the student who showed the greatest gain (in points) made no gain or change in "Level".

The above example illustrates why the choice of an appropriate metric to indicate change, gain, or learning is important. A metric such as "level" may be too insensitive to show actual growth or change.
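The point can be made concrete with a small sketch. The pre/post scores below are hypothetical stand-ins for the three students in Figure 3, and only the Level 1/2 boundary (54/55) is quoted later in this paper; the remaining cut-offs are assumed for illustration.

    def level(score, cutoffs=(0, 55, 65, 70, 85)):
        """Map a continuous 0-100 score to a proficiency level (1-5).
        Only the Level 1/2 boundary (55) is quoted in the text; the
        other cut-offs are assumed for illustration."""
        lvl = 1
        for i, cut in enumerate(cutoffs, start=1):
            if score >= cut:
                lvl = i
        return lvl

    # Hypothetical pre/post continuous scores for the three students.
    students = {"One": (10, 30), "Two": (50, 60), "Three": (63, 68)}
    for name, (pre, post) in students.items():
        print(f"Student {name}: {post - pre} points gained, "
              f"{level(post) - level(pre)} level(s) gained")

Student One gains the most points (20) yet registers no level change, while Students Two and Three each gain a level on far smaller point gains.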

Given different entry or starting points, what would be an acceptable "expected gain"? The following study shows how the above ideas have been applied in a large urban context. Los Angeles Unified School District has examined the "rate of growth" within the district's bilingual and ESL programs; these "gains" can be thought of as the result of a year's program intervention. Consistent with the discussion above, absolute growth is to a large extent a function of initial level. Thus, one would expect greater gains for an entering student than for a student further along. Certainly, one cannot expect the same growth indefinitely; any learning curve will exhibit diminishing returns.

The data used to generate the expected gains shown on Table 1 were based largely on work conducted by Toni Marsnik at LAUSD, in which data on several thousand elementary-level students were examined (see Figure 4). Note that the "gain" scores shown below are based on continuous total scores, which are more sensitive, and not on levels, which are less able to show change.

Table 1. Average Expected Gain as a Function of Initial or Entering Level

    Entering Level     Oral    Literacy
    Level 1             20        30
    Level 2             10        15
    Level 3              5        --
    Level 4             --        --
    Level 5             --        --

    (-- = no estimate; see the discussion of ceiling effects below)

Gains beyond Level 3 on the oral test are difficult to anticipate, since scores at or above Level 4 indicate "native-like" proficiency and are not as subject to program intervention or change as scores at the lower levels of proficiency. In other words, the test reaches a "ceiling" at this point: it was not designed to discriminate among students' achievement levels beyond this level of proficiency, and language proficiency ceases to be a predictor of achievement. Low achievement beyond this level can therefore no longer be attributed to limited proficiency, and while there may still be slight differences in proficiency, they are not necessarily predictive of differences in achievement performance.

Growth in reading and writing is more difficult to anticipate than changes in oral proficiency, since changes in literacy are more directly tied to formal instruction and less a function of "informal" learning. Growth in this area is therefore slower, as shown on Figure 4.

One of the more important results of these studies is that skills in reading, writing, listening, and speaking cannot be assumed to improve at the same rate. Moreover, growth in language proficiency is largely a function of program participation and program quality. Unfortunately, data on these factors were not available: in the above studies, data were collapsed (averaged) across programs and levels of participation. In effect, these data might well be described as reflecting "random" treatment effects. Certainly, more detailed investigation is warranted in which both student and program characteristics are examined.

Quality of Instruction

Several inferences can be drawn from the above discussion. Perhaps the most important pedagogical implication concerns the average time a student can be expected to require special program treatment. Take, for example, the student who enters a program knowing no English whatsoever (see Figure 5): a student at the "entry level" of an ESL program, with a language proficiency index of 1/1, who scored at chance levels on both oral and literacy tests. Consider first oral development, since it precedes literacy development.

According to the values shown on Table 1, a 1/1 student would be expected to gain approximately 20 points in oral development in the first year, moving, for example, from 15 (approximately chance) to 35 points, still Level 1 (Level 1 = 0 to 54). In the second year, the student would gain another 20 points for a total of 55, a low Level 2. In the third year, the student would move from a low Level 2 to a high Level 2, or a total of about 65 points. In the fourth year the student could be expected to move into Level 3 with a total of approximately 70 points. Finally, full native-like oral proficiency would not be "expected" until completion of the fifth year.
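This trajectory can be sketched as a small simulation. The oral gains come from Table 1; the chance-level starting score of 15 is taken from the worked example above, while the level boundaries above Level 1 are assumed for illustration.

    # Oral expected gain per entering level (Table 1); Levels 4-5 have
    # no estimate and are treated here as zero.
    ORAL_GAIN = {1: 20, 2: 10, 3: 5, 4: 0, 5: 0}

    def oral_level(score):
        # Level 1 = 0-54 is quoted in the text; higher boundaries are assumed.
        if score <= 54: return 1
        if score <= 69: return 2
        if score <= 79: return 3
        return 4

    score = 15  # approximately chance
    for year in range(1, 6):
        score += ORAL_GAIN[oral_level(score)]
        print(f"End of year {year}: about {score} points, Level {oral_level(score)}")

The simulated totals (35, 55, 65, 75, 80) track the narrative above within a few points; the small discrepancies reflect the approximations in both.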

Literacy development would follow the same basic pattern or steps. Our research (cited above), however, shows that the development of literacy skills is somewhat slower at the lower levels. Once minimal oral skills have been established, however, students move quickly through the middle levels, as shown by the slope of the curves on Figure 5.

It is noteworthy that the sum of the expected gain values across the different levels is 35 (20+10+5) for oral and 45 (30+15) for reading and writing, which averages out to approximately 13 total points. Thirteen points translates into approximately one proficiency level per year. The critical point, however, is that while it may seem reasonable to set expectations on the basis of a level per year, doing so would be misleading, as discussed above.

Finally, it is also worth bearing in mind that an approach based on differential expectations offers a powerful metric for evaluating both student progress and programmatic effectiveness. Student growth has been discussed. Programmatic effectiveness in this context becomes a matter of counting the number or percentage of students meeting or exceeding their individual goals. In the ideal program, all students would reach their "expected gain".
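Under this framing, program effectiveness reduces to a one-line computation; the (observed, expected) gain pairs below are hypothetical.

    # Hypothetical (observed gain, expected gain) pairs for a program's students.
    gains = [(22, 20), (8, 10), (6, 5), (15, 15), (3, 5)]

    met = sum(observed >= expected for observed, expected in gains)
    print(f"{met} of {len(gains)} students ({100 * met // len(gains)}%) met their expected gain")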

Implications and Cautions

There are several important points to be taken from the above exercise. The first is to recognize that what has been said applies to any language and any set of tests; the relationships described above are the same for all languages. In terms of program evaluation, the approach works equally well in a dual language program and in a program directed toward the improvement of one language. Similarly, any test should be held to the same standards of validity, reliability, and ability to make distinctions described above. Tests should provide predictive information while, at the same time, measuring change in meaningful increments. Tests that do not meet these two conditions should be used neither to set expectations nor to measure growth.

As was seen on Figure 4, there is very little growth in literacy skills in the first two levels of ESL study. On the other hand, growth in oral skills is rapid, particularly listening skills. It would therefore appear that the acquisition of English as a second language develops in a non-linear fashion. According to these data, initial programmatic emphasis, at least at the elementary level, should be directed toward the development of beginning oral skills before seriously undertaking reading and writing.

In this connection it is critical to note that the data cited above were all based on elementary-level students, and we would not expect the same values to hold at the secondary level. In fact, based on work conducted in 1988, there is reason to believe that proficiency develops somewhat differently at the secondary level than at the elementary level. For example, it has been found that elementary-level students who are unable to speak a language (English or Spanish) are seldom (almost never) able to read and write in that language. At the secondary level, on the other hand, there are significant numbers of students who are able to read and write in a second language while being unable to converse in it. These students tend 1) to be recent arrivals as opposed to second- and third-generation students, 2) to have been educated in the home language, and 3) to have received instruction in the second language in the homeland. The predictions for these students would be very different from those for others.

It is also important to note that, in the final analys