Diagnostic Mathematics Assessments: Main Ideas
Now typically assess knowledge and skills on subsets of the 10 standards specified by the National Council of Teachers of Mathematics
Designed to identify specific strengths and weaknesses in skill development
Attempt to assess a wide variety of skills
Fewer diagnostic math assessments than reading assessments, since math is more clear-cut
Purpose for Assessing Math
Provide detailed information so that teachers and interventionists can determine a student’s mastery of skills and plan individualized math instruction
Provide teachers with specific information on the kinds of items that students pass or fail
Give insight into how curriculum and instruction are working in the class
Also allow for modification of the curriculum
Purpose for Assessing Math
Teachers need to know if students have mastered facts and concepts
Occasionally used to make exceptionality and eligibility decisions
Often used to establish special learning needs and eligibility for programs for children with learning disabilities in math
National Council of Teachers of Mathematics
Suggests that a curriculum follow these standards in every grade, just at different levels.
Content Standards
Process Standards
National Council of Teachers of Mathematics
Content Standards- followed at all grades
Numbers and Operations
Algebra
Geometry
Measurement
Data Analysis and Probability
National Council of Teachers of Mathematics
So, you ask, what would these look like in first grade?
Numbers and Operations- 3 + 1+
Algebra- 3 + ☐ = 4
Geometry- What shape is + __________
Measurement- measure the temperature, time, etc.
Data Analysis and Probability- Graph how many people have teddy bears and how many have teddy dogs, teddy rabbits
National Council of Teachers of Mathematics
Process Standards
Problem Solving
Reasoning and Proof
Communication
Connections
Representation
National Council of Teachers of Mathematics
What does it look like in first grade for Process Standards?
Reasoning and Proof: Complete the patter …
Group Mathematics Assessment and Diagnostic Evaluation (G-MADE)
Group-administered, norm-referenced, standards-based test for assessing the math skills of students in K-12
Purpose: to identify specific math skill development strengths and weaknesses and to lead to teaching strategies
Test materials include a CD that provides a cross-reference between specific math skills and teaching resources
Diagnosis of skills is broad
G-MADE Subtests
Concepts and Communication: measures student knowledge of the language, vocabulary, and representations of math
Operations and Computation: measures skills in using the basic operations of addition, subtraction, multiplication, and division
Process and Applications: measures skill in taking in the language and concepts of math and applying the appropriate operations and computations to solve a word problem
G-MADE Scores
Raw scores can be converted to standard scores with a mean of 100 and a standard deviation of 15
Growth Scale Values are provided to track growth of math skills
Can track growth over one year or from year to year
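As a rough illustration of what a standard score with a mean of 100 and a standard deviation of 15 means, the sketch below rescales a raw score linearly and looks up a percentile rank under a normal curve. It is not the G-MADE conversion procedure: published tests convert raw scores through norm tables, and the norm-group mean and standard deviation used here (`norm_mean`, `norm_sd`) are hypothetical values chosen only for the example.

```python
from statistics import NormalDist

def to_standard_score(raw, norm_mean, norm_sd, mean=100.0, sd=15.0):
    """Linearly rescale a raw score to a standard-score metric.

    norm_mean and norm_sd are hypothetical norm-group statistics;
    real tests use published norm tables instead of this formula.
    """
    z = (raw - norm_mean) / norm_sd   # distance from the norm mean in SD units
    return mean + sd * z

def percentile_rank(standard_score, mean=100.0, sd=15.0):
    """Percent of a normal distribution falling at or below the score."""
    return 100.0 * NormalDist(mean, sd).cdf(standard_score)

# Hypothetical example: raw score 30, norm group mean 24, SD 6
score = to_standard_score(30, norm_mean=24, norm_sd=6)
print(score)                            # 115.0
print(round(percentile_rank(score)))    # 84
```

A score one standard deviation above the mean (115) falls at roughly the 84th percentile, which is why standard scores and percentile ranks are often reported side by side.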
Test Materials
Teacher’s Manual
Student Booklets
Answer Sheets
Hand-Scoring Template
Technical Manual
Age-Based Norms and Grade-Based Out-of-Level Norms Supplement
Scoring and Reporting Software
Reliability
All reliabilities exceed .74, with more than 90% exceeding .80
The only low reliabilities are 7th grade Concepts and Communication, and Process and Applications at all grades beyond 4th
Internal consistency and stability are sufficient for using the test to make decisions about individuals
Validity
Content is based on NCTM standards
Created based on a year-long study of standards, curriculum benchmarks, the scope and sequence commonly used in math textbooks, and a review of research on best practices for teaching math concepts and skills
Many studies support the criterion-related validity of the test
In comparison with KeyMath, all correlations were in excess of .80, making the 2 tests highly comparable
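Criterion-related validity figures such as the correlations above .80 are typically Pearson correlations between scores on two tests taken by the same students. The sketch below shows how such a coefficient could be computed; the score lists are made up for illustration and are not data from either test, and `pearson_r` is a helper name introduced here.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up standard scores for five students on two tests (illustration only)
test_a = [85, 92, 100, 108, 121]
test_b = [88, 90, 103, 105, 118]

r = pearson_r(test_a, test_b)
print(r > 0.80)   # True for these made-up scores
```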
Other Information
Test is not timed since it is meant to test power, not speed
Older students can complete the test in a single one-hour session; most students finish in about 45 minutes
With younger students, multiple short testing sessions are recommended
KeyMath-3 Diagnostic Assessment (KeyMath-3 DA)
An untimed, individually administered, norm-referenced test designed to provide a comprehensive assessment of essential math concepts and skills in individuals ages 4 years, 6 months through 21 years
Time: 30-40 minutes in lower elementary and 70-90 minutes for older students
Provides a means of monitoring individual’s progress over time with 2 parallel forms that can be administered in alternating sequence every 3 months
Also provides Growth Scale Values (GSVs), a type of developmental scale score
Uses for KeyMath-3 DA
Assess math proficiency by providing comprehensive coverage of concepts and skills taught in regular math instruction
Assess student progress in math
Support instructional planning
Support educational placement decisions
KeyMath-3 DA
2 parallel forms (A and B) of the test
Each test has 372 items divided into the following subtests:
Numeration
Algebra
Geometry
Measurement
Data Analysis and Probability
Mental Computation and Estimation
Addition and Subtraction
Multiplication and Division
Foundations of Problem Solving
Applied Problem Solving
KeyMath-3 DA Resources
Manual
Two free-standing easels for either Form A or B
25 record forms with detachable Written Computation Examinee Booklets
Two additional products that are available:
ASSIST Scoring and Reporting Software Program
KeyMath-3 DA Essential Resources Instructional Program
KeyMath-3 DA Scores
Can be hand scored or scored using software
Relative Standing: scale scores, standard scores, percentile rank
Developmental Scores: grade and age equivalents, growth scale values
Composite Scores: basic concepts, operations, application
Software can produce progress reports, narrative summaries, and parent reports, and can export scores to Excel
Reliability
Internal Consistency – low in K and 1st grade, but exceeds .80 at other ages
Alternate Form – exceeds .80, with the exception of different forms for Geometry and Data Analysis and Probability
Adjusted Test-Retest – based on 103 students in grades K-12; generally exceeds .80, with the exception of the Foundations of Problem Solving (.70) and Geometry (.78) subtests
Adequate for screening and diagnostic purposes
Validity
Correlates very highly with scores on the KeyMath-Revised normative update and with scores on the Kaufman Test of Educational Achievement, Measures of Academic Progress (MAP), and G-MADE
Evidence for content validity is good, based on alignment with state and NCTM standards
Weaknesses for Diagnostic Math Assessments
Recurring issue of curriculum match
Selecting the appropriate test for the type of decision to be made
Do not test a sufficiently detailed sample of math concepts and facts – must generalize
Due to these weaknesses, tests are not very useful in assessing readiness or strengths and weaknesses in order to plan instructional programs
Preferred practice is for teachers to develop curriculum-based achievement tests that exactly parallel the curriculum being taught
Goal of Oral and Written Language Assessments
“The assessment of language competence should include evaluation of a student’s ability to process, both in comprehension and in expression, language in a spoken or written format.”
Major Communication Processes
1. Oral Comprehension – listening and comprehending speech
2. Written Comprehension – reading
3. Oral Expression – speaking
4. Written Expression – writing
Related Terminology
Language Component | Reception/Comprehension | Expression/Production
Phonology | Hearing and discriminating speech sounds | Articulating speech sounds
Morphology and Syntax | Understanding the grammatical structure of language | Using the grammatical structure of language
Semantics | Understanding vocabulary, meaning, and concepts | Using vocabulary, meaning, and concepts
Pragmatics and Supralinguistics | Understanding a speaker’s or writer’s intentions | Using awareness of social aspects of language
Considerations in Assessing Oral Language
Cultural Diversity: birthplace, pronunciations, comparing with the same language community
Developmental Considerations: sounds, linguistic structures, and some semantic elements are developmental
Considerations in Assessing Written Language
Content – Production: formulating, elaborating, sequencing, clarifying, and precise word choice to convey meaning
Form: penmanship, spelling, and style rules
Observing Language Behavior
The following are the three main procedures for gathering a sample of a student’s language behavior.
Spontaneous Language
Imitation
Elicited Language
Observing Language Behavior
Advantages to Spontaneous Language
Spontaneity is the best and most natural indicator of everyday language performance.
Informality makes assessment easy; there is no formal testing atmosphere.
Observing Language Behavior
Disadvantages of Spontaneous Language
The data collected with this approach are non-standard in nature.
Collecting data this way can take a very long time.
Observing Language Behavior
Advantages of Imitation
Overcomes many of the problems associated with the spontaneous approach.
Assesses many different language elements to give a representative view of child’s language system
Structure of the test allows examiner to know all elements of language being assessed.
Test can be administered much more quickly than with spontaneous tests.
Observing Language Behavior
Disadvantages of Imitation
Children’s auditory memory may affect the results – a child can score well by imitation without demonstrating productive knowledge of the language structures being tested.
A child can repeat exactly what is said if the utterance or sentence is too short, requiring no memory processing.
Children become very bored and can’t sit still. There are no stimuli like pictures or toys present, just the repetition of 50 to 100 sentences after the examiner.
Observing Language Behavior
Advantages to Elicited Language
Pictures can be structured to test desired language elements while retaining some of the spontaneous language samples.
Allows children to create language on their own.
There is no time limit so results do not depend on child’s word retention ability.
Observing Language Behavior
Disadvantages of Elicited Language
Difficult to find pictures to guarantee exact word or sentence response.
Child may not produce or attempt to produce the desired language structure.
Tests
Test of Written Language – 4th edition (TOWL-4)
Test of Language Development: Primary – 4th edition (TOLD-P:4)
Test of Language Development: Intermediate – 4th edition (TOLD-I:4)
Oral and Written Language Scales (OWLS)
Test of Auditory Reasoning and Processing Skills (TARPS)
Six Subtests
Sentence combining. The child is required to form one compound or complex sentence from two or more simple sentences spoken by the examiner.
Picture vocabulary. The child points to the picture that best represents a series of two-word items.
Word ordering. The child forms a complete, correct sentence from a randomly-ordered string of words, ranging from three to seven in length.
Relational vocabulary. The child tells how three words, spoken by the examiner, are alike.
Morphological comprehension. The child distinguishes between grammatically correct and incorrect sentences.
Multiple meanings. The examiner says a word and the student responds by saying as many different meanings for that word as he/she can think of.
Reliability and Validity
TOLD-I:4 appears to meet and often exceed the standards for reliability for making screening and diagnostic decisions.
The coefficients for reliability exceed 0.90
Unlike the TOLD-P:4, this test has good evidence for construct validity: it is based on oral language ability, which is known to be related to literacy, and it correlates highly with reading and writing abilities.
Oral and Written Language Scales (OWLS)
Individually administered assessment of receptive and expressive language.
Test includes three scales:
- Listening Comprehension
- Oral Expression
- Written Expression
Recommended uses (ages 3 – 21):
To determine broad levels of language skills and specific performance in listening, speaking, and writing.
To create intervention plans and monitor student progress.
Scores can be converted to obtain age equivalents, percentiles, etc.
Listening Comprehension
Takes approx. 5 – 15 min.
Measures understanding of spoken language.
111 items – the examiner reads aloud a verbal stimulus. The student has to identify which of 4 pictures is the best response to the stimulus.
Oral Expression
Takes approx. 5 – 15 min.
Measures understanding of and use of spoken language.
96 items – the examiner reads aloud a verbal stimulus and shows a picture.
The student responds orally by answering a question, completing a sentence, or generating one or more sentences.
Written Expression
Timed response test
Measures the ability of students 5 – 21 years old to use spelling, punctuation, and syntax (sentence structure, phrases, etc.) and to communicate with appropriate content, coherence, organization, etc. The student responds to direct writing prompts given by the examiner.
Reliability and Validity
There are wide ranges in reliability coefficients for this test.
Results of this test are sufficient to use as a screening device but are not sufficient to use in making important decisions about individual students.
Authors of this test report that the validity studies comparing these subtests to established criterion-measured tests were similar in performance and within the expected range of validity.
Theory of multiple intelligences
Heredity
Learn through experiences
Today most theorists recognize the importance of both heredity and experience.
Intelligence test results are used to determine eligibility for special services.
School Psychologists are trained professionals who administer Intelligence Tests.
IQ tests are helpful in providing general information as to how to pace instruction.
• An inferred ability used to explain differences in present behavior and to predict differences in future behavior.
• It is a general ability that enables people to do many different things.
A child’s background experiences and the learning opportunities they have already had
Culture
Experiences available in one’s environment
Age
…that may influence the psychological demands presented by the test.
***Failure is NOT due to an inability to comprehend or solve a problem, but to a deficiency in background experience***
Discrimination: identify the item that is different from the others
Generalization: given a stimulus, identify from a group the one that goes with the stimulus
Motor Behavior: requires motor response in duplicating a geometric design using blocks, tracing a path through a maze, or reconstructing designs from memory.
General Knowledge: factual questions
Vocabulary: naming pictures or reading a definition and selecting a picture (depending on age)
Induction: State a rule or principle from a series of objects
Comprehension: 3 types: those related to directions, to printed material, or to social customs and mores.
Sequencing: identify the response that continues a series
Detail Recognition: identify the missing parts of a picture
Analogical Reasoning: how things are related to each other, e.g. “A : B :: C : _____?”
Pattern Completion: completing a pattern or identifying a missing part of a pattern
Abstract Reasoning: identify the absurdity in a picture or verbal statement
Memory: many different assessments are used to measure memory, ex. verbatim repetition of a sentence or series of numbers
Three types of Intelligence Tests
Individual Tests: given one on one by a certified evaluator; most commonly used for educational placement decisions.
Group Tests: may be used as a screening tool for individual students, or to gain information about groups of students.
Nonverbal Intelligence Tests: Picture-Vocabulary test; administered to non-readers, ELLs, and hearing-impaired students.
* This test measures only one aspect of intelligence (receptive vocabulary) and should not be used to determine eligibility for special services.
Wechsler Intelligence Scale for Children-IV (WISC-IV)
Developed by David Wechsler in 1949, it has since had several revisions.
Wechsler states, “intelligence is the overall capacity of an individual to understand and cope with the world around him.”
The test is a measure of the cognitive ability and problem-solving processes of a person aged 6 years to 16 years, 11 months.
Subtests (core and supplemental*):
Verbal Comprehension Index (VCI): Similarities, Vocabulary, Comprehension, Information*, Word Reasoning*
Perceptual Reasoning Index (PRI): Block Design, Picture Concepts, Matrix Reasoning, Picture Completion*
Working Memory Index (WMI): Digit Span, Letter-Number Sequencing, Arithmetic*
Processing Speed Index (PSI): Coding, Symbol Search, Cancellation*
The full-scale IQ (FSIQ) is reliable enough to make important educational decisions. There is not enough information gathered from the subtests alone to make the educational decisions.
When using the WISC-IV to determine educational needs for a student, examiners should only use the FSIQ.
Block Design sample: timed test, 2 minutes, 9 blocks
Picture Concepts sample: Pick one picture from each row with common characteristics
Picture Completion sample: Look at this picture. What part is missing?
Woodcock-Johnson III (WJ-III)
Measures general intellectual ability, specific cognitive abilities, scholastic aptitudes, oral language, and achievement.
Individually administered and norm-referenced
For ages 2-90+
Computer scored
Each Test Record contains a seven-category Test Session Observation Checklist to rate a student’s conversational proficiency, cooperation, activity, attention and concentration, self-confidence, care in responding, and response to difficult tasks.
20 subtests measuring broad and narrow abilities: comprehension-knowledge, long-term retrieval, visual-spatial thinking, auditory processing, fluid reasoning, processing speed, short-term memory.
Subtests can be combined to create additional clusters for verbal ability, thinking ability, cognitive efficiency, phonemic awareness, and working memory.
Additional supplemental subtests create more clusters: broad attention, cognitive fluency, and executive processes
22 tests can be combined to form several clusters.
Subtests and clusters from the standard battery can be combined to form scores for broad areas in reading, math and writing.
Oral expression, listening comprehension, basic reading skills, reading comprehension, phoneme/grapheme knowledge, math calculation skills, math reasoning, written expression
Individual tests are combined to provide clusters for educational decision making
Cluster reliabilities for some age groups are less than .90, but all median reliabilities across age groups for the standard and broad cognitive and achievement clusters exceed .90
Careful item selection is consistent with claims for the content validity of both tests
Studies using a broad range of individuals provide evidence for validity
For the Cognitive Ability Tests, the correlations between the WJ-III General Intellectual Ability score and the WISC-III Full-Scale IQ range from .69 to .73
For the Achievement Tests, the pattern and magnitude of correlations with the Wechsler individual tests suggest that the WJ-III measures skills similar to those measured by other achievement tests.
Peabody Picture Vocabulary Test, Fourth Edition (PPVT-4)
A non-timed test primarily given to younger children and ELLs
Assesses the receptive (hearing) vocabulary of examinees
It consists of sets of 12 stimulus items, and examinees are tested at their ability or age level
As part of a broader assessment, can be useful in evaluating language competence, selecting the level and content of instruction, and measuring learning
The assessment of vocabulary is also useful when evaluating the effects of injury or disease
It is individually administered using an easel
Available in Spanish
Examinees earn a raw score based on the number of pictures correctly identified between basal and ceiling items
Basal - the lowest set administered that contains one or no errors
Ceiling – the highest set administered that contains eight or more errors
Testing is discontinued once a ceiling is established
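The basal and ceiling rules above can be sketched in a few lines. This is a simplified illustration of the rules as stated here, not the scoring procedure from the test manual, which may credit items differently; the function names and the sample error counts are hypothetical.

```python
ITEMS_PER_SET = 12  # each stimulus set has 12 items, per the description above

def find_basal_and_ceiling(errors_by_set):
    """Return (basal_set, ceiling_set) from a {set_number: error_count} mapping.

    Basal: lowest administered set with one or no errors.
    Ceiling: highest administered set with eight or more errors.
    Either value is None if no administered set meets the rule.
    """
    basal_candidates = [s for s, e in errors_by_set.items() if e <= 1]
    ceiling_candidates = [s for s, e in errors_by_set.items() if e >= 8]
    basal = min(basal_candidates) if basal_candidates else None
    ceiling = max(ceiling_candidates) if ceiling_candidates else None
    return basal, ceiling

def correct_between(errors_by_set, basal, ceiling):
    """Count correctly identified pictures in sets basal..ceiling (inclusive)."""
    return sum(
        ITEMS_PER_SET - errors
        for set_number, errors in errors_by_set.items()
        if basal <= set_number <= ceiling
    )

# Hypothetical record: errors made in each administered set
errors = {3: 0, 4: 1, 5: 4, 6: 8}
basal, ceiling = find_basal_and_ceiling(errors)
print(basal, ceiling)                           # 3 6
print(correct_between(errors, basal, ceiling))  # 35
```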
Multiple kinds of reliability are reported
The scores of a PPVT-4 test are very precise and consistent
Data are also included on the testing and performance of students with disabilities
Five studies were conducted and indicate that there is adequate validity
Slightly lower correlations were found on assessments that measured broader areas of language than primarily vocabulary
Data is also provided on how students with speech and language impairments, hearing impairments, specific learning disabilities, mental retardation, giftedness, emotional/behavioral disturbances and ADHD, perform in relation to the general population
Results indicate the value of the PPVT-4 in assessing these special populations
Assessing children’s IQ is controversial
Intelligence tests assess samples of behavior
Different intelligence tests sample different behaviors
Educators must always ask “IQ on what test?”
Test authors have their own definitions of intelligence and therefore test those items/behaviors they feel represent their definition
When interpreting intelligence scores, avoid making judgments that suggest that the score represents much more than the specific behaviors sampled
The quality of measurement can be affected by several different types of student characteristics, which must therefore be taken into consideration
“Many of the behaviors sampled on intelligence tests are more indicative of actual achievement than ability to achieve.”
For example, “students who have had more opportunities to learn and achieve are likely to perform better than those who have had less exposure to information, even if they both have the same overall potential to learn.”
“Intelligence tests are by no means a pure representation of a student’s ability to learn.”