Strand 11 Evaluation and assessment of student learning and development

ESERA eBook Part 11



    CONTENTS

Chapter  Title  Page

1   Introduction
    Robin Millar, Jens Dolin  1

2   Performance assessment of practical skills in science in teacher training programs useful in school
    Ann Mutvei Berrez, Jan-Eric Mattsson  3

3   Development of an instrument to measure children's systems thinking
    Kyriake Constantinide, Michalis Michaelides, Costantinos P. Constantinou  13

4   Development of a two-tier test instrument for geometrical optics
    Claudia Haagen, Martin Hopf  24

5   Strengthening assessment in high school inquiry classrooms
    Chris Harrison  31

6   Analysis of student concept knowledge in kinematics
    Andreas Lichtenberger, Andreas Vaterlaous, Clemens Wagner  38

7   Measuring experimental skills in large-scale assessments: Developing a simulation-based test instrument
    Martin Dickmann, Bodo Eickhorst, Heike Theyssen, Knut Neumann, Horst Schecker, Nico Schreiber  50

8   The notion of authenticity according to PISA: An empirical analysis
    Laura Weiss, Andreas Mueller  59

9   Examining whether secondary school students make changes suggested by expert or peer assessors in the science web-portfolio
    Olia Tsivitanidou, Zacharias Zacharia, Tasos Hovardas  68

10  Sources of difficulties in PISA science items
    Florence Le Hebel, Andree Tiberghien, Pascale Montpied  76

11  In-context items in a nationwide examination: Which knowledge and skills are actually assessed?
    Nelio Bizzo, Ana Maria Santos Gouw, Paulo Sergio Garcia, Paulo Henrique Nico Monteiro, Luiz Caldeira Brant de Tolentino-Neto  85

12  Predicting success of freshmen in chemistry using moderated multiple linear regression analysis
    Katja Freyer, Matthias Epple, Elke Sumfleth  93

13  Testing student conceptual understanding of electric circuits as a system
    Hildegard Urban-Woldron  101

14  Process-oriented and product-oriented assessment of experimental skills in physics: A comparison
    Nico Schreiber, Heike Theyssen, Horst Schecker  112

15  Modelling and assessing experimental competence: An interdisciplinary progress model for hands-on assessments
    Susanne Metzger, Christoph Gut, Pitt Hild, Josiane Tardent  120

16  Effects of self-evaluation on students' achievements in chemistry education
    Inga Kallweit, Insa Melle  128


INTRODUCTION

Strand 11 focuses on the evaluation and assessment of student learning and development. Many studies presented in other conference strands, of course, involve the assessment of student learning or of affective characteristics and outcomes, such as students' attitudes or interests, and use existing instruments or new ones developed for the study in hand. In such studies, assessment instruments are tools used to explore and answer other questions of interest. In Strand 11, the emphasis is on the development, validation and use of assessment instruments; the focus is on the instrument itself. These can include standardized tests, achievement tests, high-stakes tests, and instruments for measuring attitudes, interests, beliefs, self-efficacy, science process skills, conceptual understandings, and so on. They may be developed with a view to making assessment more authentic in some sense, to facilitate formative assessment, or to improve summative assessment of student learning.

Fifteen papers presented in this strand are included in this book of e-proceedings. Four of them discuss the development of new or modified instruments to assess students' conceptual understanding of a science topic. Two use the two-tier multiple-choice format that many researchers have found valuable for probing understanding, to explore the topics of electric circuits and geometrical optics. Another explores the factors that may underlie the observed patterns in students' responses, trying to tease out the relative importance of mathematical and physical ideas in determining performance on questions about kinematics. A fourth paper begins the exploration of a relatively new science domain, systems thinking. Here assessment items have a particularly significant role to play in helping to define the domain in operational terms and in facilitating discussion within the science education research community.

Four papers explore issues concerning the assessment of practical competence and skills. One looks at the general issue of developing a model to describe progress in carrying out hands-on activities; another focuses more specifically on experimental skills in physics; and a third considers performance assessment in the context of initial teacher education. The fourth paper looks at the potential use of simulations as surrogates for bench practical activities. Work in this domain is important, as science educators seek a better understanding of the factors that lead to variation in students' responses to practical tasks.

Three papers look in different ways at the influence of contexts on students' answers and responses to tasks. Two take the PISA studies as their starting point, looking in detail at the thinking of students as they respond to PISA tasks and questioning the extent to which the PISA interpretation of authenticity enhances student interest and engagement with assessment tasks. Both point to the value of listening to students talking about their thinking as they answer questions, and suggest that this may be quite different from what we would expect, and perhaps hope. A third paper with an interest in the effects of contextualisation presents data from a study in Brazil comparing students' answers to sets of parallel questions with fuller and more abridged contextual information. The findings have implications for item design, and suggest that reading demands should be kept carefully in check if we aim to assess science learning.


Three papers in this section explore the formative use of assessment. One focuses on the assessment of learning that results from inquiry-based science teaching. Another looks at the ways in which students respond to formative feedback on their work. The context for this study is web portfolios, but the research question has wider applicability to other forms of feedback and across science content more generally. The third uses an experimental design to explore the impact on student learning, in a topic on chemical reactions, of a self-evaluation instrument that asks students to monitor their own learning and to take action to address areas in which they judge themselves to be weak.

All of the papers described above collect data from students of secondary school age or prospective teachers. The final paper in this strand looks at the potential use of an attitude assessment instrument to predict undergraduate students' success in chemistry learning.

    The set of papers highlights the key role of assessment items and instruments as

    operational definitions of intended learning outcomes, bringing greater clarity to the constructs used and to our understanding of learning in the domains that they study.

    Jens Dolin and Robin Millar


PERFORMANCE ASSESSMENT OF PRACTICAL SKILLS IN SCIENCE IN TEACHER TRAINING PROGRAMS USEFUL IN SCHOOL

    Ann Mutvei and Jan-Eric Mattsson

School of Natural Sciences, Technology and Environmental Studies, Södertörn University, Sweden.

Abstract: There is a general shift towards an understanding of knowledge not as a question of remembering facts but as the skill to use what is learnt under different circumstances. On this view, knowledge should be useful on different occasions, also outside school. This shift may also be identified in the development of new tests performed in order to assess knowledge.

In courses in biology, chemistry and physics focused on didactics, we have developed performance assessments aimed at assessing the understanding of general scientific principles through simple practical investigations. Although designed to assess whether specific goals are attained, we discovered how small alterations of the performance assessments promoted the development of didactic skills. Performance assessments may thus act as tools for the academic teacher and the school teacher, and for enhancement of student understanding of the theory.

This workshop focused on performance assessments of the ability to present skills and to develop new ideas. Together with the participants, we presented, discussed and explained a practical approach to performance assessments in science education. The emphasis was on demonstrating this assessment tool and giving participants experience of it.

We performed elaborative tasks as they may be used by teachers working at different levels, assessed the performances and evaluated the learning outcome of the activity. Different assessment rubrics were presented and tested at the workshop. Learning by doing filled the major part of the workshop, but there were also opportunities for discussion, sharing ideas and suggestions for further development. The activities performed may be seen as models open to further development into new assessments.

    Keywords: assessment, rubric, practical skills, knowledge requirement

    INTRODUCTION

During the last ten or fifteen years there has been a general shift towards an understanding of knowledge not as a question of remembering facts but as the skill to use what is learnt under different, more or less practical, circumstances. On this view, knowledge should be useful on different occasions, also outside school. Traditional textbooks often had facts arranged in a linear and hierarchical order. More recent books focus on the development of the thoughts and ideas of the student by presenting general principles underpinned by good examples,


diagnoses, questions to discuss, reflective tasks without any presentation of a correct answer, etc. (cf. Audesirk et al. 2008, Hewitt et al. 2008, Reece et al. 2011, Trefil & Hazen 2010). A similar development can be found in teacher training programs, where lectures and traditional text seminars have to some extent been replaced by more interactive forms of teaching. We also found this development in examinations at our own university, where tests performed in order to assess knowledge of literature content have been replaced by tests in which students have to show their capacity to use their knowledge.

Practical performance assessments are important when assessing the abilities or skills of students in teacher training programs. In science courses in biology, chemistry and physics focused on didactics, we have for several years developed performance assessments focused on understanding of general scientific principles, but based on simple practical investigations or studies. Although designed to assess whether students reached the goals of a specific course, we have often discovered how small alterations of these performance assessments have promoted the development of the didactic skills of the student. Thus, they may act as assessment tools for the academic teacher, as models for assessments in school, and as enhancement of the students' theoretical understanding of the subject and theory. The assessments may be made on oral or written reports, during guided excursions, museum visits or practical experiments, on traditional or aesthetic diaries, or on self-diagnoses or diagnoses made by other students based on certain criteria.

We have worked for several years with teacher training programs focused on work in primary and secondary schools, with further education for teachers, and with university students studying biology and chemistry. The wide range of courses and students has given us experience of how to work with different contents adapted to the different ages of students at school. From this we have identified basic problems and needs of understanding, some shared across subjects and some specific to each. These experiences also give us the opportunity to contribute to national seminars and conferences.

THE CURRICULUM IN SWEDISH SCHOOLS

The new curriculum in Sweden for the primary and lower secondary schools (Skolverket 2010), as well as the new one for the upper secondary school, puts the emphasis on the students' skills rather than knowledge (facts). It is the ability to use the knowledge that is to be assessed. This development is a global trend; see e.g. Eurasian Journal of Mathematics, Science & Technology Education 8(1). This is a great change compared to earlier curricula, especially when compared to the common interpretation and implementation of these at the local level. A similar development has occurred in the universities in Sweden. Today the intended learning outcomes should be described in the syllabi as abilities the student can show after finishing the course, together with how this should be done.

Many teachers have problems with this view, as they are used to assessing the students' ability to reproduce facts. These teachers find it hard to understand how to work with performance assessments instead of tests targeting knowledge of facts. They often ask for clear directions and expect strict answers instead of guidelines on how to improve their own ability to work with performance assessments.


Teaching according to these new curricula starts with the design of performance assessments suitable for the assessment of a specific skill and the creation of a rubric for the assessment. Thereafter the teacher plans the exercises beneficial for student development, and finally decides the time needed and plans the activities accordingly.

    Figure 1. How to plan learning situations.

    As an example of how teachers may work with this method we designed a practical

    assessment of practical skills and presented it as a workshop at ESERA 2013.

    HOW TO DESIGN A PERFORMANCE ASSESSMENT OF PRACTICAL

    SCIENCE SKILLS

In order to design a workshop on performance assessments of these skills, we tried to do as teachers are supposed to do at school. The emphasis was on demonstrating the assessment tool and giving participants the opportunity to experience it under realistic conditions. Thus, these performance assessments are constructed in accordance with the curriculum in Sweden from 2011 (Skolverket 2010), but they are probably useful for anyone who wants to assess abilities or skills rather than memories of facts or texts. We tried to present and explain a practical approach to performance assessments in science education at school and to familiarize the participants with it.

The skill of assessment has to be learned. When teachers are used to assessing skills, these are normally of a more or less theoretical kind: they are used to assessing the quality of the language used or the correctness of a mathematical calculation. Assessment of practical skills does not have to be more complicated, but it has to be trained. According to the Swedish curriculum, 150-200 assessments of each student, in each of about 15 school subjects, should be made at the end of years 6 and 9, and many of these refer to practical skills. In order to simplify this monstrous task it is possible, and necessary, to assess several skills in more than one subject on one occasion.

We had prepared four similar activities, all with the same material (candle, wick, and matchbox) but with different purposes. They were supposed to represent studies of


mass transfer, energy transformation, technical design, and phase changes. The latter is presented here in detail.

    General principles of performance assessments

In the preparations we followed the directions of the Swedish curriculum for the compulsory school (Skolverket 2010). We selected the core content and the knowledge requirements relevant for phase transitions as the foundation for the development of the performance assessment. Usually teachers start with the knowledge requirements, interpret these and design tests for assessing the students' skills according to the requirements, design suitable learning situations or practical training of the skills, and finally decide what parts of the core content should be used (Figure 1). Here we started with the core content, as there were some specific areas of knowledge we wanted to study. When the core content had been selected, the assessment rubric was developed by interpreting and dissecting the knowledge requirements.

    Core content

The teaching in science studies should, in this case, according to the curriculum for primary and secondary school (Skolverket 2010), deal with the core content presented in Table 1.

Table 1
Core content in the Swedish compulsory school curriculum relevant for phase transitions and scientific studies.

In years 1-3:
- Various forms of water: solids, liquids and gases. Transition between the forms: evaporation, boiling, condensation, melting and solidification.
- Simple scientific studies.

In years 4-6:
- Simple particle model to describe and explain the structure, recycling and indestructibility of matter. Movements of particles as an explanation for transitions between solids, liquids and gases.
- Simple systematic studies. Planning, execution and evaluation.

In years 7-9:
- Particle models to describe and explain the properties of phases, phase transitions and distribution processes for matter in air, water and the ground.
- Systematic studies. Formulating simple questions, planning, execution and evaluation.
- The relationship between chemical experiments and the development of concepts, models and theories.

    Knowledge requirements

The knowledge requirements are related to the age of the students and show a clear progression through school. At the end of the third, sixth and ninth year there are clearly defined knowledge requirements (Table 2). Grades are introduced in the sixth year, and levels for grades E (lowest), C, and A (highest) are described in the curriculum. Grades D and B are also used: grade D or B means that the knowledge requirements for grade E or C, respectively, and most of those for the next higher grade, are satisfied.
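The rule for grades D and B is, in effect, a small decision procedure. A minimal sketch of it, purely as our own illustration and not part of the curriculum (the function name and the reading of "most" as "more than half" are assumptions):

```python
# Hypothetical sketch of the grade scale described above: E, C and A have
# explicit criteria; D (or B) is awarded when all criteria for E (or C)
# and most of the criteria for the next level are met.
# Assumption: "most" is read here as "more than half".

def grade(e_met: float, c_met: float, a_met: float) -> str:
    """e_met, c_met, a_met: fraction (0.0-1.0) of the criteria met at
    levels E, C and A. Returns the resulting grade, or 'F' if the
    lowest level is not fully reached."""
    most = 0.5  # assumed threshold for "most of" the next level's criteria
    if e_met < 1.0:
        return "F"
    if c_met < 1.0:
        return "D" if c_met > most else "E"
    if a_met < 1.0:
        return "B" if a_met > most else "C"
    return "A"
```

For example, a pupil meeting all E criteria and 80% of the C criteria would, on this reading, receive a D.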


Table 2
Knowledge requirements for different years and grades

Year 3: Based on clear instructions, pupils can carry out [...] simple studies dealing with nature and people, power and motion, and also water and air.

Year 6, Grade E: Pupils can talk about and discuss simple questions concerning energy. Pupils can carry out simple studies based on given plans and also contribute to formulating simple questions and planning which can be systematically developed. In their work, pupils use equipment in a safe and basically functional way. Pupils can [...] contribute to making proposals that can improve the study.

Year 6, Grade C: Pupils can talk about and discuss simple questions concerning energy. Pupils can carry out simple studies based on given plans and also formulate simple questions and planning which after some reworking can be systematically developed. In their work, pupils use equipment in a safe and appropriate way. Pupils can [...] make proposals which after some reworking can improve the study.

Year 6, Grade A: Pupils can talk about and discuss simple questions concerning energy. Pupils can carry out simple studies based on given plans and also formulate simple questions and planning which after some reworking can be systematically developed. In their work, pupils use equipment in a safe, appropriate and effective way. Pupils can [...] make proposals which can improve the study.

Year 9, Grade E: Pupils can talk about and discuss questions concerning energy. Pupils can carry out studies based on given plans and also contribute to formulating simple questions and planning which can be systematically developed. In their studies, pupils use equipment in a safe and basically functional way. Pupils apply simple reasoning about the plausibility of their results and contribute to making proposals on how the studies can be improved. Pupils have basic knowledge of energy, matter, [...] and show this by giving examples and describing these with some use of the concepts, models and theories.

Year 9, Grade C: Pupils can talk about and discuss questions concerning energy. Pupils can carry out studies based on given plans and also formulate simple questions and planning which after some reworking can be systematically developed. In their studies, pupils use equipment in a safe and appropriate way. Pupils apply developed reasoning about the plausibility of their results and make proposals on how the studies can be improved. Pupils have good knowledge of energy, matter, [...] and show this by explaining and showing relationships with relatively good use of the concepts, models and theories.

Year 9, Grade A: Pupils can talk about and discuss questions concerning energy. Pupils can carry out studies based on given plans and also formulate simple questions and planning that can be systematically developed. In their investigations, pupils use equipment in a safe, appropriate and effective way. Pupils apply well developed reasoning concerning the plausibility of their results in relation to possible sources of error, make proposals on how the studies can be improved, and identify new questions for further study. Pupils have very good knowledge of energy, matter, [...] and show this by explaining and showing relationships between them and some general characteristics with good use of the concepts, models and theories.


Assessments of knowledge requirements

The knowledge requirements were interpreted and dissected into smaller units in order to construct an assessment rubric adapted to the inquiry. Five main skills were selected from the knowledge requirements: Use of theory, Improvement of the experiment, Explanations, Relate, and Discuss. In order to make the assessment rubric more general, we decided not to use the grades of the curriculum but recognized three levels of skill: Sufficient, Good, and Better, corresponding to the grades E, C and A respectively. In all cases we also gave examples of relevant student answers. This is a more or less necessary requirement in order to make sure that the performer, assessor or teacher really understands what is meant by a specific requirement (Arter & McTighe 2001, Jönsson 2011).

As an example of this we can look at the knowledge requirement "Pupils can carry out studies based on given plans and also contribute to formulating simple questions and planning which can be systematically developed. In their studies, pupils use equipment in a safe and basically functional way. Pupils apply simple reasoning about the plausibility of their results and contribute to making proposals on how the studies can be improved." (Year 9, level E). This requirement contains information that may be dissected into several units.

First it is necessary to look at the five skills of the students that are going to be assessed and at the suitable requirements for each skill. The students are supposed to "carry out studies based on given plans". In this case the experiment is very simple (light and observe a burning candle) and hardly useful for assessing this specific skill. They shall also "contribute to formulating simple questions and planning which can be systematically developed". This requirement can be further developed to suit the five skills.

In order to show this skill it is necessary to have some knowledge of the theory and to use it in a suitable way. The skill "use of theory" is a necessary condition for this and may be formulated as "The student draws simple conclusions partly related to chemical models and theories." This criterion is also in concordance with the requirement to apply "simple reasoning about the plausibility of their results and contribute to making proposals on how the studies can be improved", which may be formulated as "the student discusses the observations and contributes with suggestions of improvements" in the rubric entry for improvement of the experiment.

In a similar way the assessment of the remaining three skills may be developed into more specific criteria adapted to this experiment (Table 3).

In order to make it possible for the student to understand what is expected, it is necessary to clarify the requirement criteria and give realistic examples of these requirements. The meaning of words differs between disciplines, not only in the academic world but also in school (cf. Chanock 2000). This has consequences when students get feedback, as they often do not understand the academic discourse with its specific concepts and fail to use the feedback later (Lea & Street 2006). Criteria combined with explicit examples are necessary to solve this problem (Sadler 1987). This is also important when designing assessment rubrics (Busching 1998, Arter & McTighe 2001). Thus, for every criterion there has to be at least one example given. In Table 3 this is exemplified for every combination of skill and grade requirement.


Table 3
Assessment rubric for assessing skills in an experiment on phase changes

Use of theory
- Sufficient: The student draws simple conclusions partly related to chemical models and theories. (I can see stearic acid in solid, liquid and gas phase.)
- Good: The student draws conclusions based on chemical models and theories. (The heat of the candle causes the transfer between the phases.)
- Better: The student draws well-founded conclusions from chemical models and theories. (Stearic acid must be in gas phase and mix with oxygen to burn.)

Improvement of the experiment
- Sufficient: The student discusses the observations and contributes suggestions for improvements. (Observe more burning candles.)
- Good: The student discusses different interpretations of the observations and suggests improvements. (Remove the wick and relight the candle.)
- Better: The student discusses well-founded interpretations of the observations and whether they are reasonable, and based on these suggests improvements which allow enquiry into new questions. (Heat a small amount of stearic acid and try to light the gas phase above.)

Explanations
- Sufficient: The student gives simple and relatively well-founded explanations. (The stearic acid melts by heat produced by the flame.)
- Good: The student gives developed and well-founded explanations. (Also the change from liquid phase to gaseous phase depends on the heat from the flame.)
- Better: The student presents theoretically developed and well-founded explanations. (All phase changes from solid to liquid or liquid to gaseous need energy.)

Relate
- Sufficient: The student gives examples of processes similar to those in the experiment, related to questions about energy, environment, health and society. (The warmth of the sun melts the ice on the lake at the end of the winter.)
- Good: The student generalizes and describes the occurrence of phenomena similar to those in the experiment, related to questions about energy, environment, health and society. (In the frying pan it is hot enough for butter to melt, and in the sauna water vaporizes.)
- Better: The student discusses the occurrence of the observed phenomena in everyday life, their use, and their impact on environment, health and society. (The phase change from liquid to gaseous phase cools you down when you are sweating.)

Discuss
- Sufficient: The student contributes to a discussion of the occurrence of the phenomena studied in society, makes statements partly based on facts, and describes some possible consequences. (Gases are often transported in a liquid phase, which has a lower volume.)
- Good: The student describes and discusses the occurrence of the phenomena studied in society and makes statements based on facts and fairly complicated physical relations and theories. (The bottle of a gas stove holds fuel mainly in liquid phase, but it is transported in the hose and burnt in gaseous phase.)
- Better: The student uses the experiment as a model, discusses the occurrence of the phenomena studied in society, and makes statements and draws consequences based on facts and complicated physical relations and theories. (The phase change from liquid to gaseous phase cools you down when you are sweating.)


WORKSHOP

We had prepared four similar activities, all with the same material (candle, wick, and matchbox) but with different purposes. The activities represented studies of mass transfer, energy transformation, technical design, and phase changes. At the workshop three groups were formed, omitting the study of technical design. The three groups were not informed about the differences between the aims of their experiments. The groups were constructed to include people with backgrounds as varied as possible; thus, participants from the same country or from similar fields, such as chemistry or physics, were allocated to different groups. They performed elaborative tasks similar to those used by teachers working at different levels, assessed the performance and evaluated the learning outcome of the activity. Within each group one person was selected to assess the activities the others carried out. The person assessing the work was to focus not only on the results of the discussions within the group but also to try to evaluate the process, as the aim was to assess the skills of the participants rather than the content of their knowledge.

    Discussion

    The aim was to demonstrate how peer reviewing within the group may be used to produce several kinds of information beneficial for the performance assessment of science education at school. Discussions arose among the participants about how an integrated approach, especially in relation to other subjects in school, improved the usefulness of the methods. Learning by doing followed by discussions became the major part of the workshop, with sharing of ideas and suggestions for further development.

    Most of the participants had weak knowledge of the assessment of practical skills, expressed their astonishment at the positive result of the workshop, and showed curiosity about using the method. Some of the participants also showed didactic skills when explaining to the others the different aspects of the experiment they mastered, a good example of the importance of variation in the skills of group members.

    The persons who made the assessments expressed the need for further practice. They realized the complexity of assessing different skills at the same time as deciding on a grade. They also expressed a willingness to develop this ability, as they realized the strength of assessing several skills on one occasion. Further, the participants noted the importance of questions like the last one in the instructions (Appendix) in order to assess the quality of the relation between theory and practice.

    Conclusion

    Although based on a simple experiment with a burning candle, the workshop gave an opportunity to discuss and understand theories regarded as difficult to understand from the viewpoint of the student, or difficult to teach from the teacher's point of view. The experiments, although similar, were of different character, thus reflecting a wide spectrum of possibilities.

    Thus, the activities performed may be seen as models or examples from which new assessments can be further developed according to the content of the subject.


  • REFERENCES

    Arter, J. A. & McTighe, J. (2001). Scoring rubrics in the classroom. Corwin.

    Audesirk, T., Audesirk, G., & Byers, G. B. (2008). Life on earth, 5th ed. San Francisco: Pearson Education.

    Busching, B. (1998). Grading inquiry projects. New Directions for Teaching and Learning, 74, 89-96.

    Chanock, K. (2000). Comments on essays: Do students understand what tutors write? Teaching in Higher Education, 5(1), 95-105.

    Hewitt, P. G., Suchocki, J. & Hewitt, L. A. (2008). Conceptual physical science, 4th ed. San Francisco: Pearson Education.

    Jönsson, A. (2011). Lärande bedömning. Gleerups.

    Lea, M. R. & Street, B. V. (2006). The academic literacies model: Theory and applications. Theory into Practice, 45(4), 368-377.

    Reece, J. B., Urry, L. A., Cain, M. L., Wasserman, S. A., Minorsky, P. V. & Jackson, R. B. (2011). Campbell biology, global edition. Pearson.

    Sadler, D. R. (1987). Specifying and promulgating achievement standards. Oxford Review of Education, 13(2), 191-209.

    Skolverket (Swedish National Agency for Education) (2010). Curriculum for the compulsory school, preschool class and the recreation centre 2011. Skolverket.

    Trefil, J. & Hazen, R. M. (2010). The sciences: An integrated approach. Wiley.

    Eurasian Journal of Mathematics, Science & Technology Education, 8(1).


  • APPENDIX

    INQUIRY OF A BURNING CANDLE

    This is an experiment of phase changes

    1. Light the candle and observe the change of phases.

    2. Which changes of phase can you observe?

    3. Where do they occur?

    4. Why do they occur?

    5. What happens in the different phases?

    6. How may you improve the experiment?

    7. Give examples of phase changes in daily life and in society.

    INQUIRY OF A BURNING CANDLE

    This is an experiment of energy transformation

    1. Light the candle and observe the energy transformations.

    2. Which changes of energy forms can you observe?

    3. Where do they occur?

    4. Why do they occur?

    5. What happens during the different energy transformations?

    6. How may you improve the experiment?

    7. Give examples of energy transformations in daily life and in society.

    INQUIRY OF A BURNING CANDLE

    This is an experiment of mass transfer

    1. Light the candle and observe mass transfer.

    2. Which types of mass transfer can you observe?

    3. Where do they occur?

    4. Why do they occur?

    5. What happens to the candle due to this mass transfer?

    6. How may you improve the experiment?

    7. Give examples of mass transfer in daily life and in society.

    INQUIRY OF A BURNING CANDLE

    This is an experiment of candle design

    1. Light the candle and discuss the design of the candle.

    2. Which different parts can you observe in the candle?

    3. Where are they and how are they united?

    4. What function do the different parts have?

    5. Why is the candle created in that way?

    6. How may you improve the experiment?

    7. Give examples of similar designs in daily life and in society.


  • DEVELOPMENT OF AN INSTRUMENT TO MEASURE

    CHILDREN'S SYSTEMS THINKING

    Kyriake Constantinide, Michalis Michaelides and Costas P. Constantinou

    University of Cyprus

    Abstract: Systems thinking is a higher order thinking skill required to meet the demands of social, environmental, technological and scientific advancements. Science abounds in systems and makes system function a core object of investigation and analysis. As a consequence, science teaching can be a valuable framework for developing systems thinking. In order to approach this methodically, it becomes important to specify the aspects that constitute the systems thinking construct, design curriculum materials to help students develop these aspects, and develop instruments for evaluating students' competence and monitoring the learning process. The present study aims at the development of an instrument for the standardized assessment of systems thinking. It draws on a methodology that follows a cyclic procedure for instrument development and validation, in which the literature, experts, students and educators all contribute. Currently, the assessment instrument is in its second cycle of field testing: we have collected data from about 900 students and used these to develop a first version of a validated test and a scale for measuring 10-14-year-old children's systems thinking. The test consists of multiple-choice scenario items that draw their content from everyday life. We present the methodology we are following, providing some examples of multiple-choice items to demonstrate their development and transformation throughout the process.

    Keywords: systems thinking, assessment, test development

    BACKGROUND

    The rate of advancements in scientific knowledge and technology and the widespread

    demands on young people to participate actively in solving problems in almost every

    aspect of our lives have reoriented the role of education in general and science teaching

    in particular. Nowadays, science teaching aims at developing scientifically literate people

    with flexible thinking skills and an ability to participate critically in meaningful

    discourse. More specifically, it aims at helping students acquire positive attitudes towards

    learning and science, a variety of experiences, conceptual understanding, epistemological

    awareness, practical and scientific skills and creative thinking skills (Constantinide,

    Kalyfommatou & Constantinou, 2001).

    The definitions of systems thinking described in the literature (e.g., Senge, 1990; Thier &

    Knott, 1992; Booth Sweeney, 2001; Ben-Zvi Assaraf & Orion, 2005) include thinking

    about a system, meaning a number of interacting items that produce a result over a period

    of time. According to the Benchmarks for Science Literacy (AAAS, 1993), systems

    thinking is an essential component of higher order thinking, whereas Kali, Orion and


    Eylon (2003) refer to systems thinking as a high-order thinking skill required in

    scientific, technological, and everyday domains. Senge (1990) claims that systemic

    thinkers are able to change their own mental models, control their way of thinking and

    their problem-solving process. Therefore, defining, promoting through curricula, and

    measuring systems thinking should be an essential priority for education. Science

    teaching and learning can be a valuable framework for developing such skills, since it

    abounds in systems and science makes system function a core object of investigation and

    analysis.

    Several structured teaching attempts to promote systems thinking are reported in the literature, making the development of instruments for measuring systems thinking and for evaluating the effectiveness of such curricula a necessity. The most common means of evaluating systems thinking reported thus far include tests (e.g. Riess & Mischo, 2009), interviews (e.g. Hmelo-Silver & Pfeffer, 2004) and computer simulations and logs (e.g. Sheehy, Wylie, McGuinness & Orchard, 2000). Some researchers, in order to triangulate their data, used a combination of various data sources (e.g. Ben-Zvi Assaraf & Orion, 2005). Almost all means include tasks where a problem is introduced and the subjects have to propose solutions or predict the behavior of the system and its elements. Nevertheless, to date there is no validated instrument, and prior research has not provided a scale for measuring the systems thinking of children aged 10-14 years. The purpose of this paper is to describe the ongoing development process of the Systems Thinking Assessment (STA), a test designed to assess systems thinking.

    RESEARCH METHODOLOGY

    Systems Thinking Assessment (STA): purpose and specifications

    The STA will be used to measure the quality of thinking about systems by children aged

    10-14 and the effectiveness of curricula designed to promote systems thinking. It consists

    of multiple-choice items in the context of everyday phenomena, familiar to the children

    of the specific age range. The stems of the items include a scenario and children are

    asked to choose the best possible answer, amongst four alternatives.

    Multiple-choice items have advantages and disadvantages. Provided every other criterion is taken into account, grading a multiple-choice test is objective, since any grader would mark an item in the same way as anybody else. Besides, only a short amount of time is needed to administer many items, in order to cover the content domain under study sufficiently. They are also more reliable than other forms of questions since, in a possible readministration of a test, a subject is more likely to produce the same answers if the questions are multiple choice than if they are open-ended. A basic disadvantage of multiple-choice questions is that they do not provide much information on the subjects' thinking processes, namely the reasons for which they answer each item the way they do. Nevertheless, the procedure of the test's development and Rasch analysis minimize the effect of this disadvantage on the results.


    In order to be able to make generalizations, there was an intentional effort to include items that utilize various systems: physical-biological systems (such as the water cycle, a forest, a dam or food webs), mechanical-electrical systems (such as a bicycle or a car) and socioeconomic systems (such as a family, a village or a store). Moreover, where possible, a picture or a diagram was added to an item's wording, so as to make the item clearer and the test more visually appealing.

    We have adopted the following operational definition of systems thinking, which relies on four strands:

    (a) "System definition" includes identifying the essential elements of a system, its temporal boundaries and its emergent phenomena.
    (b) "System interactions" includes reasoning about causes and effects when interactions are taking place within the system.
    (c) "System balance" refers to the ability to recognize the relation between interactions and the system's balance.
    (d) "Flows" refers to reasoning about the relation of inflows and outflows in a system and recognizing cyclic flows of matter or energy.

    STA's cyclic development procedure

    Figure 1 presents the cyclic nature of the STA development. The definition of systems thinking at the center of the cycle is specified in regard to both the abilities that constitute it and the items that measure it. The involved parties (experts, educators, students and the existing literature) provide feedback on the systems thinking definition through data that establish the test's validity and reliability.

    Figure 1. Development procedure for STA. [The figure shows the systems thinking construct (abilities and items) at the center of a cycle, receiving feedback from the literature (content validity), experts (content validity), educators (face validity) and students (test administration and interview data: construct, criterion and face validity, reliability).]


    The STA has already undergone its first cycle of development. Reviewing the literature led to 13 abilities that seemed to define systems thinking. The original items were developed and administered to a small number of 10-year-old students. Qualitative and quantitative data led to modifications (content and wording changes) and the development of new items. Two experts gave feedback on the test's content validity. Further improvements were carried out, and two educators with experience with children aged 10-14 years examined the face validity of the test. The revised version was once again administered to a small number of 10-year-old students and, after the necessary modifications, the final form of the test with 52 multiple-choice items was administered to 900 students. Rasch modeling led to a scale showing the items' difficulty and the students' ability.

    Based on a broader literature review and the development of separate examples regarding each ability, the second development cycle began with revising the 13-ability schema, reducing the abilities to 10 and the items to 41. The revised test was given to approximately 90 students aged 10-14. Test and item difficulty indices, item discrimination indices and frequencies were calculated, and items were either modified or replaced. Afterwards, 16 students participated in interviews, answering the items while following a think-aloud protocol (Ericsson & Simon, 1998). Non-effective items were replaced or modified.
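    The kind of classical item statistics mentioned above (difficulty and discrimination indices, as reported later for the pre-pilot items) can be sketched as follows. This is an illustrative sketch, not the authors' code; the `responses` matrix and the upper-lower 27% split are assumptions for demonstration.

    ```python
    # Illustrative sketch (not the authors' code): classical item statistics
    # of the kind reported for the STA pre-pilot phase.
    # `responses` is a hypothetical 0/1 matrix: rows = students, columns = items.

    def item_difficulty(responses, item):
        """Difficulty index: proportion of students answering the item correctly."""
        scores = [row[item] for row in responses]
        return sum(scores) / len(scores)

    def item_discrimination(responses, item, fraction=0.27):
        """Upper-lower discrimination index: difficulty in the top-scoring group
        minus difficulty in the bottom-scoring group (classical 27% split)."""
        ranked = sorted(responses, key=lambda row: sum(row), reverse=True)
        n = max(1, round(fraction * len(ranked)))
        upper = sum(row[item] for row in ranked[:n]) / n
        lower = sum(row[item] for row in ranked[-n:]) / n
        return upper - lower

    if __name__ == "__main__":
        # Tiny fabricated data set: 6 students, 3 items.
        responses = [
            [1, 1, 1],
            [1, 1, 0],
            [1, 0, 1],
            [0, 1, 0],
            [0, 0, 1],
            [0, 0, 0],
        ]
        for item in range(3):
            print(item,
                  round(item_difficulty(responses, item), 2),
                  round(item_discrimination(responses, item), 2))
    ```

    A near-zero or negative discrimination index, as found for the "apple tree" item below, signals that high-scoring students do no better on the item than low-scoring ones, which is why such items are replaced.
    
    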

    The latest version of the items is under evaluation by independent experts. Graduate and PhD students in learning in science, academics specialized in science teaching or psychology, and international researchers with experience in measuring systems thinking will provide feedback on the test, first by solving it and then by judging its efficiency based on a structured protocol. Finally, an expert panel will be convened, in which any problems will be discussed until the panel reaches consensus. The revised test will then be given to four educators to evaluate its face validity. The test will next be administered to 100 students aged 10-14 to statistically assess its clarity and its developmental validity. The improved test will finally be administered to 500 students and the data will be analyzed using Rasch modeling. Confirmatory factor analysis will be carried out in order to assess the 10-ability structure of the construct.

    RESULTS

    At the final stage of the first cycle of the STA development, the test was administered to about 900 students. The Rasch statistical model provided a scale for the 52 items of the STA, on which both the subjects' scores and the items' degrees of difficulty are presented (Figure 2).

    It is evident that the 52 items of the test fit the model well. Both the students' scores and the items' degrees of difficulty are distributed uniformly on the scale. Students' scores vary between -2.16 and 2.37 logits, whereas the items' degrees of difficulty vary between -2.41 and 2.53 logits.
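    The logit scale referred to here comes from the dichotomous Rasch model, which places person ability and item difficulty on one common scale. A minimal sketch of the model (not the authors' analysis; the numeric inputs are illustrative):

    ```python
    # Illustrative sketch: the dichotomous Rasch model places person ability
    # (theta) and item difficulty (b) on a common logit scale; the probability
    # of a correct response depends only on their difference.
    import math

    def rasch_probability(theta, b):
        """P(correct) = exp(theta - b) / (1 + exp(theta - b))."""
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    if __name__ == "__main__":
        # A person whose ability equals the item's difficulty has a 50% chance.
        print(rasch_probability(0.5, 0.5))  # 0.5
        # An item far above a person's ability (e.g. 2.53 vs -2.16 logits,
        # the extremes reported above) is answered correctly only rarely.
        print(round(rasch_probability(-2.16, 2.53), 3))
    ```

    When the distributions of person and item parameters cover the same range, as reported here, the test is well targeted at the sample.
    
    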


    (* Each symbol on the scale represents 4 students.)

    Figure 2. Scale of the STA (at the end of the first cycle)


    Table 1
    Statistical values for the 52 STA items for the whole sample and the four groups

                                        Total     5th gr.   6th gr.   1st gr.   2nd gr.
    Statistical indices                 sample    Primary   Primary   Second.   Second.
                                        (n=848)   (n=219)   (n=249)   (n=137)   (n=243)
    Mean (items*)                        0.00      0.00      0.00      0.00      0.00
         (persons)                      -0.01     -0.30     -0.05      0.14      0.21
    Standard deviation (items)           0.97      0.96      0.97      1.08      1.03
         (persons)                       0.72      0.66      0.73      0.73      0.68
    Separability** (items)               0.99      0.97      0.98      0.96      0.98
         (persons)                       0.81      0.77      0.81      0.81      0.78
    Mean Infit mean square (items)       1.00      1.00      1.00      1.00      1.00
         (persons)                       1.00      1.00      1.00      1.00      1.00
    Mean Outfit mean square (items)      1.01      1.02      1.02      1.01      1.01
         (persons)                       1.01      1.02      1.02      1.01      1.01
    Infit t (items)                     -0.12     -0.13     -0.03      0.00      0.04
         (persons)                      -0.04     -0.07     -0.03     -0.02     -0.01
    Outfit t (items)                     0.09      0.05      0.09      0.05      0.08
         (persons)                       0.02      0.04      0.03     -0.01      0.02

    * L = 52 items
    ** Separability: a value of 1 indicates high reliability, whereas a value of 0 indicates very low reliability

    Table 1 shows the statistical values of the Rasch model for the whole sample and for the four subgroups (5th and 6th primary grades and 1st and 2nd secondary grades) separately. It is evident that, for the whole sample and the subgroups, item reliability values are over .95, whereas subject reliability values are over .76. Although the generally accepted values for such a scale are over .90 (Wright, 1985), the subject reliability may be accepted. Furthermore, the mean infit mean square for both items and subjects equals 1 for the whole sample and the subgroups, while the mean outfit mean square is either 1.01 or 1.02. Infit t and outfit t range from -0.13 to 0.09. The subjects' standard deviation is rather small (SD=0.72), indicating uniformity in the sample's behavior: students aged 10-14 respond to the test as an unvarying group. Besides, the subjects' mean score increases with age, suggesting the developmental validity of the test. Rasch analysis also showed that the items receive infit values from .87 to 1.18, which fit the generally accepted range of .77-1.30 (Adams & Khoo, 1993). Three of the items have an outfit value over 1.30, but since the difference between the infit and outfit values for these items is small, they remain in the test.
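    The infit and outfit mean squares summarized above can be sketched for a single item as follows. This is an illustrative sketch under standard Rasch-fit definitions, not the authors' computation; the ability values `thetas` and the responses are hypothetical.

    ```python
    # Illustrative sketch: infit and outfit mean square statistics for one item
    # under the Rasch model. Values near 1 indicate good fit to the model.
    import math

    def expected_p(theta, b):
        """Rasch probability of a correct response."""
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    def fit_mean_squares(responses, thetas, b):
        """Return (infit, outfit) for one item of difficulty b.
        outfit: unweighted mean of squared standardized residuals.
        infit: squared residuals weighted by the response variance p(1-p)."""
        z2, var = [], []
        for x, theta in zip(responses, thetas):
            p = expected_p(theta, b)
            w = p * (1 - p)              # model variance of the response
            z2.append((x - p) ** 2 / w)  # squared standardized residual
            var.append(w)
        outfit = sum(z2) / len(z2)
        infit = sum(z * w for z, w in zip(z2, var)) / sum(var)
        return infit, outfit

    if __name__ == "__main__":
        thetas = [-1.5, -0.5, 0.0, 0.5, 1.5]   # hypothetical person abilities
        responses = [0, 0, 1, 1, 1]            # observed 0/1 answers to the item
        infit, outfit = fit_mean_squares(responses, thetas, b=0.0)
        print(round(infit, 2), round(outfit, 2))
    ```

    Because the outfit statistic weights all residuals equally, it is more sensitive to surprising answers from persons far from the item's difficulty, which is why an item can show an acceptable infit and an inflated outfit at the same time, as for the three items retained above.
    
    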


    This is an on-going study and, at the moment, the test is in its second cycle of development. Test administration, interviews with students, and feedback from experts and educators provide data to validate the items. The way the data from each stage were analyzed is indicated in Tables 2 and 3, presented in the next subsection. At the end of the second cycle, Rasch analysis as well as confirmatory factor analysis will be conducted and the results will be published.

    Two examples of item development

    The development of two items through the STA construction cycles can be seen in Tables 2 and 3. The "bicycle" item presented in Table 2 refers to the strand "System definition", and more specifically to the ability of identifying the essential elements of a system; during the procedure it has been revised. The "apple tree" item presented in Table 3 refers to the strand "System balance", and more specifically to the ability of identifying reinforcing and balancing loops. It has been replaced by a different item because of problematic item statistics during the pre-pilot phase of the second development cycle.

    Table 2
    The development of the "bicycle" item

    1st cycle

    Pre-pilot
      Item (translation in English): Which are the least elements that a bicycle that can roll should have?
        A. frame, two wheels, pedals, chain
        B. frame, two wheels, gears, handle bar
        C. frame, two wheels, pedals, seat
        D. frame, two wheels
      Comments: Students did not understand the wording of the stem.
        Frequencies per alternative: A 0.41, B 0.09, C 0.32, D 0.18
      Action: Change wording of the stem and the alternatives.

    To experts
      Item: Which are the elements that a bicycle SHOULD have in order to roll, when someone is pushing it?
        A. frame, two wheels, chain, pedals, handle bar
        B. frame, two wheels, chain, pedals
        C. frame, two wheels, chain
        D. frame, two wheels
      Comments: The experts related the item to two initially separate abilities (abilities 1.1 and 1.2 were afterwards unified).
      Action: Keep as is.

    To educators
      Item: (as above)
      Comments: OK.
      Action: Keep as is.

    Pilot
      Item: (as above)
      Comments: Frequencies per alternative: A 0.56, B 0.19, C 0.00, D 0.25
      Action: Revise distractor.

    Final administration
      Item: (as above, with C changed to: frame, two wheels, chain, handle bar)
      Comments: Frequencies per alternative: A 0.58, B 0.07, C 0.16, D 0.18
      Action: Change wording of the stem.

    2nd cycle

    Pre-pilot
      Item: Which are the elements that a bicycle SHOULD NECESSARILY have in order to roll, when someone is pushing it?
        A. frame, two wheels, chain, pedals, handle bar
        B. frame, two wheels, chain, pedals
        C. frame, two wheels, chain, handle bar
        D. frame, two wheels
      Comments: Difficulty index 0.21; discrimination index 0.3; alternatives OK.
        Frequencies per alternative: A 0.40, B 0.15, C 0.24, D 0.21
      Action: Keep as is.

    Interviews (first set)
      Item: (as above)
      Comments: Correct answer with correct reasoning (4/11); wrong answer (7/11); suggestion of other alternatives (2/11: wheels, pedals, handle bar); alternative B not chosen by anyone.
      Action: Change alternative content.

    Interviews (second set)
      Item: (as above, with B changed to: frame, two wheels, pedals, handle bar)
      Comments: Correct answer with correct reasoning (1/5); wrong answer (4/5).
      Action: Keep as is.

    Table 3
    The development of the "apple tree" item

    1st cycle

    Pre-pilot: -
    To experts: -

    To educators
      Item (translation in English): Mr George planted a small apple tree 10 years ago. Now the apple tree is quite big. As the apple tree grows,
        A. it needs more water.
        B. it needs less water.
        C. the tree's need for water does not change.
        D. it does not need extra water, since it has already grown.
      Action: Keep as is.

    Pilot
      Item: (as above)
      Comments: Frequencies per alternative: A 0.38, B 0.31, C 0.13, D 0.19
      Action: Keep as is.

    Final administration
      Item: (as above)
      Comments: Frequencies per alternative: A 0.48, B 0.18, C 0.25, D 0.07
      Action: Keep as is.

    2nd cycle

    Pre-pilot
      Item: (as above)
      Comments: Difficulty index 0.43 (OK); discrimination index -0.3.
        Frequencies per alternative: A 0.43, B 0.21, C 0.28, D 0.07
      Action: Item replaced.


  • CONCLUSION

    Systems thinking is a higher order skill, important in dealing with everyday phenomena and in solving problems. At the same time, science is a field with plenty of systems to analyze and model. Despite the widespread research on curriculum development for systems thinking, no validated tests have been developed to evaluate the effectiveness of such curricula. The STA is being developed following a cyclic and iterative procedure. It aspires to be a useful instrument for assessing a curriculum designed to promote systems thinking in upper-primary and lower-secondary school students.

    REFERENCES

    Adams, R. J. & Khoo, S. T. (1993). Quest: The interactive test analysis system. Camberwell, Victoria: ACER.

    American Association for the Advancement of Science (1993). Benchmarks for science literacy. New York: Oxford University Press.

    Ben-Zvi Assaraf, O. & Orion, N. (2005). Development of system thinking skills in the context of earth system education. Journal of Research in Science Teaching, 42(5), 518-560.

    Booth Sweeney, L. (2001). When a butterfly sneezes. Waltham: Pegasus Communications.

    Constantinide, K., Kalyfommatou, N. & Constantinou, C. P. (2001). The development of modeling skills through computer based simulation of an ant colony. In Proceedings of the Fifth International Conference on Computer Based Learning in Science, July 7th-12th 2001, Masaryk University, Faculty of Education, Brno, Czech Republic.

    Ericsson, K. A. & Simon, H. A. (1998). How to study thinking in everyday life: Contrasting think-aloud protocols with descriptions and explanations of thinking. Mind, Culture and Activity, 5, 178-186.

    Hmelo-Silver, C. E. & Pfeffer, M. G. (2004). Comparing expert and novice understanding of a complex system from the perspective of structures, behaviors, and functions. Cognitive Science, 28, 127-138.

    Kali, Y., Orion, N., & Eylon, B. (2003). The effect of knowledge integration activities on students' perception of the earth's crust as a cyclic system. Journal of Research in Science Teaching, 40, 545-565.

    Riess, W., & Mischo, C. (2009). Promoting systems thinking through biology lessons. International Journal of Science Education, 1-21.

    Senge, P. (1990). The fifth discipline: The art and practice of the learning organization. New York: Doubleday.

    Sheehy, N., Wylie, J., McGuinness, C. & Orchard, G. (2000). How children solve environmental problems: Using computer simulations to investigate systems thinking. Environmental Education Research, 6(2), 109-126.

    Thier, H. D. & Knott, R. C. (1992). Subsystems and variables. Teacher's guide, Level 3, Science Curriculum Improvement Study. Hudson: Delta Education.


  • DEVELOPMENT OF A TWO-TIER TEST-INSTRUMENT

    FOR GEOMETRICAL OPTICS

    Claudia Haagen and Martin Hopf

    University of Vienna, AECCP, Vienna, Austria

    Abstract: Light is part of our everyday life. Nevertheless, students face enormous difficulties in explaining everyday optical phenomena with the help of scientific concepts. Usually they rely on alternative conceptions deduced from everyday experience, which are often in conflict with scientific views. The identification of such alternative conceptions is one of the most important prerequisites for promoting conceptual change (Duit & Treagust, 2003). Investigating students' conceptions with interviews is quite time consuming and difficult to handle in school settings. Multiple-choice tests, on the other hand, frequently depict the conceptual knowledge base in a superficial way. The main aim of our project is to develop a two-tier multiple-choice test which reliably and validly diagnoses year-8 students' understanding of geometrical optics. So far, we have developed and empirically tested a first (N=643) and a second test version (N=367), partly based on items from the literature. Though the overall results are promising, the quality of the items differs a lot: there are a number of items which do not have appropriate distractors for the second tier. In addition, students' and teachers' feedback on the test indicates that some items pose problems due to their wording or the kind of representation chosen. For a closer analysis of these problematic items, the qualitative method of student interviews was chosen. Semi-structured, problem-based interviews were conducted with 29 year-8 students after their formal instruction in optics. Based on the results of these interviews, test items were revised and extended.

    Keywords: geometrical optics, two-tier multiple choice test, test development

    INTRODUCTION

    Despite everyday experience with light, understanding geometrical optics turns out to be

    difficult for students. Physics education research shows that students hold numerous

    conceptions about optics which differ from scientifically adequate concepts (Duit 2009).

    Alternative conceptions are very stable. Research shows that formal instruction is

frequently not able to transform them into scientifically accepted ideas (Andersson & Kärrqvist 1983; Fetherstonhaugh & Treagust 1992; Galili 1996; Langley et al. 1997).

Teachers' knowledge about their students' learning difficulties is an important prerequisite for the design of successful instruction. Exploring students' conceptual knowledge base can provide important feedback: it can support students in their individual learning process and can serve as a basis for further teaching decisions.

In general, two main methods are used for examining students' conceptual knowledge: interviews and open-ended questionnaires. The most effective methods, such as interviews, are very time-consuming and difficult for teachers to handle in classroom situations. In search of a way out of this dilemma, we encountered the method of

two-tier tests as used, for example, by Treagust (2006) and Law and Treagust (2008). Two-tiered test items are items "that require an explanation or defence for the answer […]" (Wiggins & McTighe 1998, p. 14, cited in Treagust 2006). Each item consists of two parts, called tiers. The first tier is a multiple-choice question whose distractors include

    known student alternative conceptions. In the second part of each item, students have to

    justify the choice made in step one by choosing among several given reasons (Treagust

    2006).
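The two-tier format described above lends itself to a simple representation and scoring rule. The sketch below assumes hypothetical item content and the common strict rule that credit is given only when both tiers are answered correctly; it is an illustration, not the authors' instrument.

```python
# Minimal sketch of a two-tier item with strict scoring.
# The item text, options, and reasons below are invented placeholders;
# they are not taken from the actual test instrument.

from dataclasses import dataclass

@dataclass
class TwoTierItem:
    question: str         # first tier: the content question
    options: list         # first-tier answer choices (including distractors)
    reasons: list         # second tier: possible justifications
    correct_option: int   # index of the scientifically correct answer
    correct_reason: int   # index of the scientifically correct reason

def score(item, chosen_option, chosen_reason):
    """Strict two-tier scoring: credit only if BOTH tiers are correct."""
    return int(chosen_option == item.correct_option
               and chosen_reason == item.correct_reason)

item = TwoTierItem(
    question="Does light from a campfire reach the same distance by day and by night?",
    options=["Yes, the same distance", "No, farther at night", "No, farther by day"],
    reasons=["Light propagates continuously until it is absorbed",
             "Darkness lets light travel farther",
             "Daylight outshines and blocks the firelight"],
    correct_option=0,
    correct_reason=0,
)

print(score(item, 0, 0))  # both tiers correct -> 1
print(score(item, 0, 1))  # correct answer, wrong reason -> 0
```

Under such a rule, a correct answer with a wrong justification earns no credit, which is what separates superficial factual knowledge from deeper conceptual knowledge.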

    Research on alternative conceptions in optics has mainly used the methods of interviews or

open-answer questionnaires (Andersson & Kärrqvist 1983; Driver et al. 1985; Guesne 1985; Viennot 2003). In addition, multiple-choice tests have been developed (Bardar et al. 2006; Chen et al. 2002; Chu et al. 2009; Fetherstonhaugh & Treagust 1992). These

    tests focus on various age-groups and on different content areas within geometrical optics.

We have, however, not found a psychometrically valid test instrument designed to portray the basic conceptions in geometrical optics of students at the lower secondary level.

    Our main research objective is the development of a multiple-choice test-instrument for

year-8 students which is able to portray the students' conceptions in geometrical optics.

    DEVELOPMENT OF THE TEST INSTRUMENT

The test instrument has so far been developed in two phases. In the first phase, the content area of the test was identified based on the Austrian year-8 curriculum. Then students' conceptions related to the key ideas of this content area were investigated through an intensive literature review. Finally, items were selected from already existing assessment tools for geometrical optics and, where possible, adapted to the two-tier structure. Where a second tier was added to an existing item, the distractors for this second tier were taken from research on students' conceptions. Additionally, some items were newly developed. The resulting version of the test was tried out

    with N=643 year-8 students.

    The results of this first test phase were used to revise the first test version. The second test

    version was tested with N=367 year-8 students, after their conventional instruction in

    geometrical optics in year-8. This version consisted of 20 two-tier items and 6 items with

only one tier, which were partly taken from the literature (Fetherstonhaugh & Treagust 1992;

    Kutluay 2005; Bardar et al. 2006; Chu et al. 2009). The results of the statistical analysis

with SPSS, together with students' and teachers' feedback on the test, indicated potential for improvement. Some items did not have appropriate distractors for the second tier, while

    others seemed to pose problems due to their wording or the kind of representations (Colin

    et al. 2002) chosen.

Consequently, semi-structured, problem-based interviews were conducted with year-8

    students, after their instruction in geometrical optics. These interviews were carried out for

    the following reasons: Firstly, we wanted to make sure that the distractors which had been

    taken from literature were exhaustive. Secondly, the interviews should investigate the

    response space of the newly developed items. Finally, the language and the graphical

    representations used in the items should be validated by students.

    Participants and Setting

    We interviewed 29 students (17 female, 12 male) after their instruction in geometrical

optics. The students attended year-8 classes in 5 different schools; they came from 8 different classes and thus had 8 different physics teachers. The sample covered all school types available in Austria at year-8 level.

    The interviews were conducted in the school setting. Each student was interviewed

    individually. The average duration of the interviews was 19.5 minutes.

    METHOD

We carried out semi-structured, problem-based interviews (Lamnek, 2002; Mayring, 2002;

    Witzel, 1985). The interviews were based on seven selected items of the second test

    version. The students were just given the item task without any distractors. The interview

    followed a four step structure for each item. The students had to:

1. paraphrase the task of the item
2. describe the graphical representation used in the item
3. answer the item
4. account for the answer given

    Figure 1. Flow chart of the structure of the interviews

    Data analysis

    The interviews were recorded and transcribed. Afterwards they were analysed with

    MAXQDA following the method of qualitative content analysis by Mayring (2010) and

Gropengießer (2008).

The data were analysed with respect to three main categories: language issues, the forms of visual representation used, and students' conceptions related to the content of the items. As far as language issues are concerned, we were interested in how students interpreted the task of the item on the basis of the text given. Additionally, we tried to identify unfamiliar words and expressions as well as overly long or complicated sentences.

    For the visual representations our main aim was to find out if the students were able to

    grasp the content or the situation represented in visual form.

The final category, on students' conceptions, was intended to analyse the response space of the problems posed and thus to provide a good overview of students' conceptions related to them.


FINDINGS

The findings presented here are results of the empirical testing of the second test version (N=367). The reliability of the test was established with a Cronbach's alpha coefficient of α = 0.77. An overview of the test and item statistics for the 20 two-tier items is given in figure 2.

    Figure 2. Test and item statistics of the second test version
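For readers unfamiliar with the reliability coefficient reported above, Cronbach's alpha can be computed from a students-by-items score matrix as k/(k-1) · (1 − sum of item variances / variance of total scores). The sketch below uses a small invented score matrix for illustration, not the study's data.

```python
# Sketch of Cronbach's alpha on a synthetic score matrix
# (rows = students, columns = dichotomously scored items).
# The data below are invented, not the study's actual scores.

def cronbach_alpha(scores):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))."""
    k = len(scores[0])  # number of items

    def var(xs):        # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [var([row[j] for row in scores]) for j in range(k)]
    total_var = var([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(round(cronbach_alpha(data), 2))  # -> 0.8
```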

Two-tier items were answered correctly in only 37.2% of cases on average. In contrast, one-tier items were solved in 47.41% of cases on average. The solution frequencies of one-tier items (8.5%-88.0%) were higher than those of two-tier items, which varied between 3.0% and 57.2%. This effect is well known from research using two-tier items. Among other factors, it is mainly caused by the fact that the probability of guessing is reduced by the necessity of accounting in the second tier for the choice made in the first (cf. e.g. Tan & Treagust 2002).
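The reduced guessing probability is easy to quantify: if the two tiers are guessed independently, the chance of a correct two-tier response is the product of the per-tier chances. The option counts below (4 first-tier choices, 5 second-tier reasons) are assumed for illustration; the actual items may differ.

```python
# Back-of-envelope chance-level success probabilities, assuming
# hypothetical option counts and independent guesses per tier.

def p_guess_one_tier(n_options):
    # One-tier item: one blind guess among n options.
    return 1 / n_options

def p_guess_two_tier(n_options, n_reasons):
    # Two-tier item: both tiers must be guessed correctly.
    return 1 / (n_options * n_reasons)

print(p_guess_one_tier(4))     # -> 0.25
print(p_guess_two_tier(4, 5))  # -> 0.05
```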

This is also supposed to be one way of distinguishing students who possess only superficial factual knowledge of phenomena from students with deeper conceptual knowledge, as the latter are not only able to give a correct answer in the first tier of a multiple-choice item but can also give a correct reason for their choice. As reported elsewhere (cf. Haagen & Hopf 2012), a more detailed analysis of the items indicated that most of the two-tier items used had a higher potential for portraying students' conceptions in detail than one-tier items.

The second part of this section concentrates on the findings of the interviews. As mentioned above, the interviews were used to find appropriate distractors for items lacking a second tier. For this paper, the focus is on this issue, and in the following, one example of adding a second tier with the help of the interview results is reported.

For the topic of continuous propagation of light, the item shown in figure 3

    was used.


Figure 3. One-tier item of test version two concerning the key idea of continuous

    propagation of light

For those students who indicated in the first tier that they assumed a different propagation distance of light from the campfire during day and night, we obtained 6 different categories

    of reasons as shown in figure 4.

    Figure 4. Reasons for a different propagation distance of light from a campfire during day

    and night

Each of these categories was translated back into students' language, either by taking a student statement directly from the interviews or by modifying one slightly in order to

    fulfil psychometric guidelines for distractor construction. This procedure led to the second

    tier for this item as presented below in figure 5.


Figure 5. Two-tier item of test version two concerning the key idea of continuous

    propagation of light
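The tallying step behind this procedure, from coded interview answers to candidate distractor categories ranked by frequency, can be sketched as follows. The category labels are invented placeholders, not the study's actual coding scheme.

```python
# Sketch: rank coded interview answers by frequency so that the most
# common alternative conceptions become second-tier distractor candidates.
# Category labels are invented placeholders.

from collections import Counter

coded_answers = [
    "darkness_helps", "darkness_helps", "sun_outshines",
    "light_weaker_day", "darkness_helps", "sun_outshines",
]

freq = Counter(coded_answers)
distractor_order = [cat for cat, _ in freq.most_common()]
print(distractor_order)
# -> ['darkness_helps', 'sun_outshines', 'light_weaker_day']
```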

    CONCLUSION

    In conclusion, the analysis of the second test version showed that two-tier items of the test

are well able to portray several types of students' conceptions known from the literature. On the other hand, the results indicated that some items still needed revision and improvement. The results obtained from the interviews were integrated into a third test version, which remains to be tested.

    REFERENCES

Andersson, B.; Kärrqvist, C. (1983): How Swedish pupils, aged 12-15 years, understand light and its properties. In: IJSE 5 (4), pp. 387-402.

Bardar, E. M.; Prather, E. E.; Brecher, K.; Slater, T. F. (2006): Development and validation of the light and spectroscopy concept inventory. In: Astronomy Education Review 5, p. 103.

Chu, H. E.; Treagust, D.; Chandrasegaran, A. L. (2009): A stratified study of students' understanding of basic optics concepts in different contexts using two-tier multiple-choice items. In: RSTE 27, pp. 253-265.

Colin, P.; Chauvet, F.; Viennot, L. (2002): Reading images in optics: students' difficulties and teachers' views. In: IJSE 24 (3), pp. 313-332.

Driver, R.; Guesne, E.; Tiberghien, A. (Eds.) (1985): Children's ideas in science. Buckingham: Open University Press.

Duit, R. (2009): Bibliography STCSE: Students' and teachers' conceptions and science education. Retrieved October 20, 2009.

Duit, R.; Treagust, D. F. (2003): Conceptual change: a powerful framework for improving science teaching and learning. In: IJSE 25 (6), pp. 671-688.


Fetherstonhaugh, T.; Treagust, D. F. (1992): Students' understanding of light and its properties: Teaching to engender conceptual change. In: SE 76 (6), pp. 653-672.

Galili, I. (1996): Students' conceptual change in geometrical optics. In: IJSE 18 (7), pp. 847-868.

Guesne, E. (1985): Light. In: R. Driver, E. Guesne & A. Tiberghien (Eds.): Children's ideas in science. 1993 edition. Buckingham: Open University Press, pp. 10-32.

Langley, D.; Ronen, M.; Eylon, B. S. (1997): Light propagation and visual patterns: Preinstruction learners' conceptions. In: JRST 34 (4), pp. 399-424.

Law, J. F.; Treagust, D. F. (2008): Diagnosis of student understanding of content-specific

    science areas using on-line two-tier diagnostic tests. Curtin University of

    Technology.

    Mayring, P. (2010): Qualitative Inhaltsanalyse. Weinheim: Beltz.

Treagust, D. F. (2006): Diagnostic assessment in science as a means to improving teaching, learning and retention. In: UniServe Science Symposium Proceedings: Assessment in science teaching and learning. Sydney: UniServe Science.

Treagust, D. F.; Glynn, S. M.; Duit, R. (1995): Diagnostic assessment of students' science knowledge. In: Learning science in the schools: Research reforming practice 1, pp. 327-436.

Viennot, L. (2003): Teaching physics. With the collaboration of U. Besson, F. Chauvet, P. Colin, F. Hirn-Chaine, W. Kaminski & S. Rainson. Springer Netherlands.


STRENGTHENING ASSESSMENT IN HIGH SCHOOL

    INQUIRY CLASSROOMS

    Chris Harrison

King's College London

    Abstract: Inquiry provides both the impetus and experience that helps students

    acquire problem solving and lifelong learning skills. Teachers on the Strategies for

    Assessment of Inquiry Learning in Science Project (SAILS) strengthened their

    inquiry pedagogy, through focusing on seeking assessment evidence for formative

    action. Observing learners in the classroom as they carry out investigations, listening

    to learners piece together evidence in a group discussion, reading through answers to

    homework questions and watching learners respond to what is being offered as

    possible solutions to problems all provide plentiful and rich assessment data for

    teachers.

    Keywords: Inquiry, Assessment, Teacher change

    BACKGROUND

    The European Parliament and Council (2006) identified and defined the key

    competencies necessary for personal fulfillment, active citizenship, social inclusion

    and employability in our modern day society. These included communication skills

    both in mother tongue and foreign languages, mathematical, scientific, digital and

    technological competencies, social and civic competencies, cultural awareness and

    expression, entrepreneurship and learning to learn. These key competencies formed

    the foundation for the approach that our European Framework 7 project (EUFP7)

    Strategies for Assessment of Inquiry Learning in Science Project (SAILS) took to

    developing, researching and understanding how teachers might strengthen their

    teaching of inquiry-based science education.

    Since the Rocard Report (2007) recommended that school science teaching should

    move from a deductive to an inquiry approach to science learning, there have been

    several EUFP7 projects such as S-TEAM, ESTABLISH, Fibonacci, PRIMAS and

Pathway, whose remit has been to support groups of teachers across Europe in

    bringing about this radical change in practice. These projects have been successful in

    highlighting the importance of IBSE across Europe. They also have enabled us to

    determine the range of understanding of what the term inquiry means to teachers

    across Europe, and to establish to what extent skills and competencies that are

developed through inquiry practices have been identified.

    Inquiry-based science education (IBSE) has proved its efficacy at both primary and

secondary levels in increasing children's and students' interest and attainment levels (Minner et al., 2009; Osborne et al., 2008), while at the same time stimulating teacher


motivation (Wilson et al., 2010). One area that has remained problematic for teachers

    and cited as one of the areas limiting the development of IBSE within schools has

been assessment (Wellcome, 2011). The EUFP7 project Strategies for Assessment of

    Inquiry Learning in Science (SAILS) aims to prepare science teachers, not only to be

    able to teach science through inquiry, but also to be confident and competent in the

assessment of their students' learning through inquiry. The literature on teacher change suggests that teacher change is a slow and often difficult process, and never more so than when the initiative requires teachers to review and change their

    assessment practices (Harrison, 2012).

    Part of the reason for this slow implementation of IBSE in science classrooms is the

    time lag that happens between introducing ideas and the training of teachers at both

in-service and pre-service levels. While this situation should improve over the next few

    years, there is a fundamental problem with an IBSE approach and this lies with

    assessment. While the many EU IBSE projects have produced teaching materials,

    they have not produced support materials to help teachers with the assessment of this

    approach. Linked to this is the low level of IBSE type items in national and

    international assessments which gives the message to teachers that IBSE is not

    considered important in terms of skills in science education. It is clear that there is a

    need to produce an assessment model and support materials to help teachers assess

    IBSE learning in their classrooms if this approach is to be further developed and

    sustained in classrooms across Europe.

    Inquiry Skills

    Inquiry skills are what learners use to make sense of the world around them. These

    skills are important both to create citizens that can make sense of the science in the

    world they live in so that they make informed decisions and also to develop scientific

    reasoning for those undertaking future scientific careers or careers that require the

    logical approach that science encourages. An inquiry approach not only helps

    youngsters develop a set of skills such as critical thinking that they may find useful in

a variety of contexts; it can also help them develop their conceptual understanding of science. Inquiry-based science education (IBSE) thereby encourages students' motivation and engagement with science.

The term inquiry has figured prominently in science education, yet it refers to at least three distinct categories of activities: what scientists do (e.g., conducting investigations using scientific methods), how students learn (e.g., actively inquiring through thinking and doing into a phenomenon or problem, often mirroring the processes used by scientists), and a pedagogical approach that teachers employ (e.g., designing or using curricula that allow for extended investigations) (Minner et al., 2009). However, whether it is the scientist, student, or teacher who is doing or

    supporting inquiry, the act itself has some core components.

    Inquiry based science education is an approach to teaching and learning science that is

    conducted through the process of raising questions and seeking answers (Wenning,

2005, 2007). An inquiry approach fits within a constructivist paradigm in that it

    requires the learner to take note of new ideas and contexts and question how these fit

    with their existing understanding. It is not about the teacher delivering a curriculum

    of knowledge to the learner but rather about the learner building an understanding

    through guidance and challenge from their teacher and from their peers.


Some of the key characteristics of inquiry-based learning are:

- Students are engaged with a difficult problem or situation that is open-ended to such a degree that a variety of solutions or responses are conceivable.
- Students have control over the direction of the inquiry and the methods or approaches that are taken.
- Students draw upon their existing knowledge and they identify what their learning needs are.
- The different tasks stimulate curiosity in the students, which encourages them to continue to search for new data or evidence.
- The students are responsible for the analysis of the evidence and also for presenting evidence in an appropriate manner which defends their solution to the initial problem (Kahn & O'Rourke, 2005).

    In our view, these inquiry skills are developed and experienced through working

    collaboratively with others