Diagnostic Mathematics Assessments: Main Ideas

Now typically assess knowledge and skills on subsets of the 10 standards specified by the National Council of Teachers of Mathematics (NCTM)

Designed to identify specific strengths and weaknesses in skill development

Attempt to assess a wide variety of skills

There are fewer diagnostic math assessments than diagnostic reading assessments, since math is more clear-cut

Purpose for Assessing Math

Provide detailed information so that teachers and interventionists can determine a student’s mastery of skills and plan individualized math instruction

Provide teachers with specific information on the kinds of items that students pass or fail

Gives insight into how curriculum and instruction are working in the class

Also allows for modification of the curriculum

Purpose for Assessing Math

Teachers need to know whether students have mastered facts and concepts

Occasionally used to make exceptionality and eligibility decisions

Often used to establish special learning needs and eligibility for programs for children with learning disabilities in math

National Council of Teachers of Mathematics

Suggests that a curriculum follow these standards in every grade, just at different levels:

Content Standards

Process Standards

Content Standards (followed at all grades)

Numbers and Operations

Algebra

Geometry

Measurement

Data Analysis and Probability

So, you ask, what would these look like in first grade?

Numbers and Operations: 3 + 1 =

Algebra: 3 + ☐ = 4

Geometry: What shape is this? __________

Measurement: measure the temperature, time, etc.

Data Analysis and Probability: graph how many people have teddy bears and how many have teddy dogs or teddy rabbits

Process Standards

Problem Solving

Reasoning and Proof

Communication

Connections

Representation

What do the Process Standards look like in first grade?

Reasoning and Proof: Complete the pattern …

Group Mathematics Assessment and Diagnostic Evaluation (G-MADE)

Group-administered, norm-referenced, standards-based test for assessing the math skills of students in grades K-12

Purpose: to identify specific math skill development strengths and weaknesses and to lead to teaching strategies

Test materials include a CD that provides a cross-reference between specific math skills and teaching resources

Diagnosis of skills is broad

G-MADE Subtests

Concepts and Communication: measures student knowledge of the language, vocabulary, and representations of math

Operations and Computation: measures skills in using the basic operations of addition, subtraction, multiplication, and division

Process and Applications: measures skill in taking the language and concepts of math and applying the appropriate operations and computations to solve a word problem

G-MADE Scores

Raw scores can be converted to standard scores with a mean of 100 and a standard deviation of 15

Growth Scale Values are provided to track growth of math skills

Growth can be tracked over one year or from year to year
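A standard score on this scale can be read as a distance from the mean in standard-deviation units. The sketch below only illustrates that arithmetic, assuming scores are normally distributed. Actual raw-to-standard conversions come from the publisher's norm tables, and the function names here are hypothetical.

```python
from statistics import NormalDist

# Scale described on the slide: mean 100, standard deviation 15.
MEAN = 100.0
SD = 15.0


def z_score(standard_score: float) -> float:
    """Distance of a standard score from the mean, in SD units."""
    return (standard_score - MEAN) / SD


def percentile_rank(standard_score: float) -> float:
    """Approximate percentile rank, ASSUMING a normal distribution.

    Real tests report percentile ranks from norm tables; this is only
    an approximation for interpretation.
    """
    return 100.0 * NormalDist(mu=MEAN, sigma=SD).cdf(standard_score)


if __name__ == "__main__":
    for score in (85, 100, 115):
        print(score, round(z_score(score), 2), round(percentile_rank(score), 1))
```

Under this assumption, a standard score of 85 sits one standard deviation below the mean and a score of 115 sits one standard deviation above it.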

Test Materials

Teacher’s Manual

Student Booklets

Answer Sheets

Hand-Scoring Template

Technical Manual

Age-Based Norms and Grade-Based Out-of-Level Norms Supplement

Scoring and Reporting Software

Reliability

All reliabilities exceed .74, with more than 90% exceeding .80

The only low reliabilities are for Concepts and Communication in 7th grade and for Process and Applications at all grades beyond 4th

Internal consistency and stability are sufficient for using the test to make decisions about individuals

Validity

Content is based on NCTM standards

Created from a year-long study of standards, curriculum benchmarks, the scope and sequence commonly used in math textbooks, and a review of research on best practices for teaching math concepts and skills

Many studies support the criterion-related validity of the test

In comparison with KeyMath, all correlations were in excess of .80, making the two tests highly comparable
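Correlations of this kind are typically Pearson coefficients computed between the scores the same students earn on two tests. The sketch below shows that computation on made-up scores. The data and variable names are hypothetical and are not taken from any G-MADE or KeyMath study.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a list has no variance")
    return cross / (spread_x * spread_y)


# HYPOTHETICAL standard scores for six students on two math tests.
test_a_scores = [85, 92, 100, 104, 110, 118]
test_b_scores = [88, 90, 97, 108, 107, 120]

r = pearson_r(test_a_scores, test_b_scores)

if __name__ == "__main__":
    print(f"r = {r:.2f}")
```

A coefficient near 1 means students are ranked in nearly the same order by both tests, which is the sense in which two tests are called comparable.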

Other Information

The test is not timed, since it is meant to test power, not speed

Older students can complete the test in a single one-hour session; most students finish in about 45 minutes

With younger students, multiple short testing sessions are recommended

KeyMath-3 Diagnostic Assessment (KeyMath-3 DA)

An untimed, individually administered, norm-referenced test designed to provide a comprehensive assessment of essential math concepts and skills in individuals ages 4 years, 6 months through 21 years

Time: 30-40 minutes in lower elementary and 70-90 minutes for older students

Provides a means of monitoring an individual’s progress over time, with two parallel forms that can be administered in alternating sequence every 3 months

Also provides Growth Scale Values (GSVs), a type of developmental scale score

Uses for KeyMath-3 DA

Assess math proficiency by providing comprehensive coverage of the concepts and skills taught in regular math instruction

Assess student progress in math

Support instructional planning

Support educational placement decisions

KeyMath-3 DA

Two parallel forms (A and B) of the test

Each form has 372 items divided into the following subtests:

Numeration

Algebra

Geometry

Measurement

Data Analysis and Probability

Mental Computation and Estimation

Addition and Subtraction

Multiplication and Division

Foundations of Problem Solving

Applied Problem Solving

KeyMath-3 DA Resources

Manual

Two free-standing easels for either Form A or B

25 record forms with detachable Written Computation Examinee Booklets

Two additional products are available:

ASSIST Scoring and Reporting Software Program

KeyMath-3 DA Essential Resources Instructional Program

KeyMath-3 DA Scores

Can be scored by hand or with software

Relative Standing: scale scores, standard scores, percentile ranks

Developmental Scores: grade and age equivalents, growth scale values

Composite Scores: basic concepts, operations, applications

Software can produce progress reports, narrative summaries, and parent reports, and can export scores to Excel

Reliability

Internal Consistency: low in K and 1st grade, but exceeds .80 at other ages

Alternate Form: exceeds .80, with the exception of Geometry and Data Analysis and Probability

Adjusted Test-Retest: based on 103 students in grades K-12; generally exceeds .80, with the exception of the Foundations of Problem Solving (.70) and Geometry (.78) subtests

Adequate for screening and diagnostic purposes

Validity

Correlates very highly with scores on the KeyMath-Revised normative update, the Kaufman Test of Educational Achievement, Measures of Academic Progress (MAP), and the G-MADE

Evidence for content validity is good based on alignment with state and NCTM standards

Weaknesses for Diagnostic Math Assessments

Recurring issue of curriculum match

Selecting an appropriate test for the type of decision to be made

Tests do not sample math concepts and facts in sufficient detail, so examiners must generalize

Because of these weaknesses, the tests are not very useful for assessing readiness or strengths and weaknesses in order to plan instructional programs

Preferred practice is for teachers to develop curriculum-based achievement tests that exactly parallel curriculum being taught

Goal of Oral and Written Language Assessments

“The assessment of language competence should include evaluation of a student’s ability to process, both in comprehension and in expression, language in a spoken or written format.”

Major Communication Processes

1. Oral Comprehension – listening and comprehending speech

2. Written Comprehension – reading

3. Oral Expression – speaking

4. Written Expression – writing

Related Terminology

Language Component | Reception/Comprehension | Expression/Production

Phonology | Hearing and discriminating speech sounds | Articulating speech sounds

Morphology and Syntax | Understanding the grammatical structure of language | Using the grammatical structure of language

Semantics | Understanding vocabulary, meaning, and concepts | Using vocabulary, meaning, and concepts

Pragmatics and Supralinguistics | Understanding a speaker’s or writer’s intentions | Using awareness of social aspects of language

Considerations in Assessing Oral Language

Cultural Diversity: birthplace, pronunciations, comparing with the same language community

Developmental Considerations: sounds, linguistic structures, and some semantic elements are developmental

Considerations in Assessing Written Language

Content (Production): formulating, elaborating, sequencing, clarifying, and precise word choice to convey meaning

Form: penmanship, spelling, and style rules

Observing Language Behavior

The following are the three main procedures for gathering a sample of a student’s language behavior.

Spontaneous Language

Imitation

Elicited Language


Advantages to Spontaneous Language

Spontaneity is the best and most natural indicator of everyday language performance.

Informality makes assessment easy, no formal testing atmosphere.


Disadvantages of Spontaneous Language

The data collected with this approach are non-standard in nature.

Collecting data this way can take a very long time.


Advantages of Imitation

Overcomes many of the problems associated with the spontaneous approach.

Assesses many different language elements to give a representative view of child’s language system

Structure of the test allows examiner to know all elements of language being assessed.

Test can be administered much more quickly than with spontaneous tests.


Disadvantages of Imitation

Children’s auditory memory may affect the results: a child can score well by imitation without demonstrating productive knowledge of the language structures being tested.

A child can repeat exactly what is said if the utterance or sentence is too short, because repeating it requires no memory processing.

Children become very bored and cannot sit still. There are no stimuli such as pictures or toys present, just the repetition of 50 to 100 sentences after the examiner.


Advantages to Elicited Language

Pictures can be structured to test desired language elements while retaining some of the spontaneous language samples.

Allows children to create language on their own.

There is no time limit so results do not depend on child’s word retention ability.


Disadvantages of Elicited Language

It is difficult to find pictures that guarantee an exact word or sentence response.

Child may not produce or attempt to produce the desired language structure.

Tests

Test of Written Language – 4th edition (TOWL-4)

Test of Language Development: Primary – 4th edition (TOLD-P:4)

Test of Language Development: Intermediate – 4th edition (TOLD-I:4)

Oral and Written Language Scales (OWLS)

Test of Auditory Reasoning and Processing Skills (TARPS)

TOLD-I:4: Six Subtests

Sentence combining. The child is required to form one compound or complex sentence from two or more simple sentences spoken by the examiner.

Picture vocabulary. The child points to the picture that best represents a series of two-word items.

Word ordering. The child forms a complete, correct sentence from a randomly-ordered string of words, ranging from three to seven in length.

Relational vocabulary. The child tells how three words, spoken by the examiner, are alike.

Morphological comprehension. The child distinguishes between grammatically correct and incorrect sentences.

Multiple meanings. The examiner says a word and the student responds by saying as many different meanings for that word as he/she can think of.

Reliability and Validity

TOLD-I:4 appears to meet and often exceed the standards for reliability for making screening and diagnostic decisions.

The coefficients for reliability exceed 0.90

Unlike the TOLD-P:4, there is good evidence for the construct validity of this test. It is based on oral language ability, which is known to be related to literacy, and it correlates highly with reading and writing abilities.

Oral and Written Language Scales (OWLS)

Individually administered assessment of receptive and expressive language.

Test includes three scales:

- Listening Comprehension

- Oral Expression

- Written Expression

Recommended uses:

Ages 3 – 21

To determine broad levels of language skills and specific performance in listening, speaking, and writing

To create intervention plans and monitor student progress

Scores can be converted to obtain age equivalents, percentiles, etc.

Listening Comprehension

Takes approx. 5 – 15 min

Measures understanding of spoken language

111 items: the examiner reads aloud a verbal stimulus, and the student identifies which of four pictures is the best response to the stimulus.

Oral Expression

Takes approx. 5 – 15 min

Measures understanding of and use of spoken language.

96 items: the examiner reads aloud a verbal stimulus and shows a picture.

Student responds orally by either answering a question, completing a sentence, or generating one or more sentences.

Written Expression

Timed response test

Measures the ability of students 5 – 21 years old to use spelling, punctuation, and syntax (sentence structure, phrases, etc.) and to communicate with appropriate content, coherence, organization, etc. The student responds to direct writing prompts given by the examiner.

Reliability and Validity

There are wide ranges in reliability coefficients for this test.

Results of this test are sufficient to use as a screening device but are not sufficient to use in making important decisions about individual students.

The authors of this test report that, in validity studies comparing these subtests with established criterion measures, performance was similar and within the expected range of validity.

Theory of multiple intelligences

Heredity

Learning through experiences

Today most theorists recognize the importance of both heredity and experience.

Intelligence test results are used to determine eligibility for special services.

School Psychologists are trained professionals who administer Intelligence Tests.

IQ tests are helpful in providing general information as to how to pace instruction.

• An inferred ability, used to explain differences in present behavior and to predict differences in future behavior.

• It is a general ability that enables people to do many different things.

A child’s background experiences and the learning opportunities they have already had

Culture

Experiences available in one’s environment

Age

…these may influence the psychological demands presented by the test.

***Failure is NOT due to an inability to comprehend or solve a problem, but to a deficiency in background experience***

Discrimination: identify the item that is different from the others

Generalization: given a stimulus, identify from a group the one that goes with the stimulus

Motor Behavior: requires motor response in duplicating a geometric design using blocks, tracing a path through a maze, or reconstructing designs from memory.

General Knowledge: factual questions

Vocabulary: naming pictures or reading a definition and selecting a picture (depending on age)

Induction: State a rule or principle from a series of objects

Comprehension: 3 types: those related to directions, to printed material, or to social customs and mores.

Sequencing: identify the response that continues a series

Detail Recognition: identify the missing parts of a picture

Analogical Reasoning: how things are related to each other, “A : B :: C : _____?”

Pattern Completion: completing a pattern or identifying a missing part of a pattern

Abstract Reasoning: identify the absurdity in a picture or verbal statement

Memory: many different assessments are used to measure memory, e.g., verbatim repetition of a sentence or a series of numbers

Three Types of Intelligence Tests

Individual Tests: given one-on-one by a certified evaluator; most commonly used for educational placement decisions.

Group Tests: may be used as a screening tool for individual students, or to gain information about groups of students.

Nonverbal Intelligence Tests: picture-vocabulary tests; administered to non-readers, ELLs, and hearing-impaired students.

* A picture-vocabulary test measures only one aspect of intelligence (receptive vocabulary) and should not be used to determine eligibility for special services.

The Wechsler Intelligence Scale for Children (WISC) was developed by David Wechsler in 1949; it has since had several revisions.

Wechsler states, “intelligence is the overall capacity of an individual to understand and cope with the world around him.”

The test is a measure of the cognitive ability and problem-solving process of a person ages 6 years to 16 years, 11 months.

Wechsler Intelligence Scale for Children-IV (WISC-IV) Subtests, Core and Supplemental*:

Verbal Comprehension Index (VCI): Similarities, Vocabulary, Comprehension, Information*, Word Reasoning*

Perceptual Reasoning Index (PRI): Block Design, Picture Concepts, Matrix Reasoning, Picture Completion*

Working Memory Index (WMI): Digit Span, Letter-Number Sequencing, Arithmetic*

Processing Speed Index (PSI): Coding, Symbol Search, Cancellation*

The full-scale IQ (FSIQ) is reliable enough to make important educational decisions. There is not enough information gathered from the subtests alone to make the educational decisions.

When using the WISC-IV to determine educational needs for a student, examiners should only use the FSIQ.

Block Design: timed test (sample: 2 minutes, 9 blocks)

Picture Concepts: pick one picture from each row with common characteristics

Picture Completion: “Look at this picture. What part is missing?”

The Woodcock-Johnson III (WJ-III) measures general intellectual ability, specific cognitive abilities, scholastic aptitudes, oral language, and achievement.

Individually administered and norm-referenced

For ages 2-90+

Computer scored

Each Test Record contains a seven-category Test Session Observation Checklist to rate a student’s conversational proficiency, cooperation, activity, attention and concentration, self-confidence, care in responding, and response to difficult tasks.

20 subtests measuring broad and narrow abilities: comprehension-knowledge, long-term retrieval, visual-spatial thinking, auditory processing, fluid reasoning, processing speed, short-term memory.

Subtests can be combined to create additional clusters for verbal ability, thinking ability, cognitive efficiency, phonemic awareness and working memory.

Additional supplemental subtests create more clusters: broad attention, cognitive fluency, and executive processes

22 tests can be combined to form several clusters.

Subtests and clusters from the standard battery can be combined to form scores for broad areas in reading, math and writing.

Oral expression, listening comprehension, basic reading skills, reading comprehension, phoneme/grapheme knowledge, math calculation skills, math reasoning, written expression

Individual tests are combined to provide clusters for educational decision making

Cluster reliabilities for some age groups are less than .90, but all median reliabilities across age groups for the standard and broad cognitive and achievement clusters exceed .90

Careful item selection is consistent with claims for the content validity of both tests

Studies using a broad range of individuals provide evidence for validity

For the Cognitive Ability Tests, the correlations between the WJ-III General Intellectual Ability score and the WISC-III Full-Scale IQ range from .69 to .73

For the Achievement Tests, the pattern and magnitude of correlations with the Wechsler individual tests suggest that the WJ-III measures skills similar to those measured by other achievement tests.

The Peabody Picture Vocabulary Test, 4th edition (PPVT-4) is an untimed test primarily given to younger children and ELLs

Assesses the receptive (hearing) vocabulary of examinees

It consists of stimulus sets of 12 items, and examinees are tested at their ability or age level

As part of a broader assessment, it can be useful in evaluating language competence, selecting the level and content of instruction, and measuring learning

The assessment of vocabulary is also useful when evaluating the effects of injury or disease

It is individually administered using an easel

Available in Spanish

Examinees earn a raw score based on the number of pictures correctly identified between basal and ceiling items

Basal - the lowest set administered that contains one or no errors

Ceiling – the highest set administered that contains eight or more errors

Testing is discontinued once a ceiling is established
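The basal and ceiling rules above can be sketched as a small lookup over the errors recorded in each administered set. This is a minimal illustration, not the publisher's scoring procedure: the function names and the sample record are hypothetical, and the thresholds are simply the ones stated above (sets of 12, basal at one or no errors, ceiling at eight or more errors).

```python
from typing import Dict, Optional

SET_SIZE = 12             # items per stimulus set, as stated above
BASAL_MAX_ERRORS = 1      # basal: one or no errors in the set
CEILING_MIN_ERRORS = 8    # ceiling: eight or more errors in the set


def basal_set(errors_by_set: Dict[int, int]) -> Optional[int]:
    """Lowest administered set with one or no errors, or None if absent."""
    candidates = [s for s, e in errors_by_set.items() if e <= BASAL_MAX_ERRORS]
    return min(candidates) if candidates else None


def ceiling_set(errors_by_set: Dict[int, int]) -> Optional[int]:
    """Highest administered set with eight or more errors, or None if absent."""
    candidates = [s for s, e in errors_by_set.items() if e >= CEILING_MIN_ERRORS]
    return max(candidates) if candidates else None


def should_discontinue(errors_in_current_set: int) -> bool:
    """Testing stops once the current set meets the ceiling rule."""
    return errors_in_current_set >= CEILING_MIN_ERRORS


# HYPOTHETICAL record: set number -> errors made in that set.
record = {5: 0, 6: 2, 7: 5, 8: 9}

if __name__ == "__main__":
    print("basal set:", basal_set(record))
    print("ceiling set:", ceiling_set(record))
```

In this made-up record, set 5 is the basal set and set 8 is the ceiling set, so testing would stop after set 8.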

Multiple kinds of reliability are reported

The scores of the PPVT-4 are very precise and consistent

Data are also included on the testing and performance of students with disabilities

Five studies were conducted and indicate that there is adequate validity

Slightly lower correlations were found on assessments that measured broader areas of language than primarily vocabulary

Data are also provided on how students with speech and language impairments, hearing impairments, specific learning disabilities, mental retardation, giftedness, emotional/behavioral disturbances, and ADHD perform in relation to the general population

Results indicate the value of the PPVT-4 in assessing these special populations

Assessing children’s IQ is controversial

Intelligence tests assess samples of behavior

Different intelligence tests sample different behaviors

Educators must always ask “IQ on what test?”

Test authors have their own definitions of intelligence and therefore test those items/behaviors they feel represent their definition

When interpreting intelligence scores, avoid making judgments that suggest that the score represents much more than the specific behaviors sampled

Several different types of student characteristics can affect the quality of measurement and must therefore be taken into consideration

“Many of the behaviors sampled on intelligence tests are more indicative of actual achievement than ability to achieve.”

For example, “students who have had more opportunities to learn and achieve are likely to perform better than those who have had less exposure to information, even if they both have the same overall potential to learn.”

“Intelligence tests are by no means a pure representation of a student’s ability to learn.”
