Assessment Terminology
- Educational Assessment (Testing, or Measurement) – A process by which educators use students' responses to specially created or naturally occurring stimuli in order to make inferences about students' knowledge, skill, or affective status.
- Curricular Aim - A set of educationally relevant knowledge, skills, or affect that you want students to attain.
- Assessment Inference – An interpretation based on students’ assessment performances regarding students’ curricular-aim mastery.
- Educational Leader – An educator whose decisions influence the decisions of those who are responsible for educating students.
- Aptitude Tests – Assessments of examinees' intellectual potential, often employed to make predictions about success in future academic settings.
- Achievement Tests – Assessments of the knowledge and/or skills possessed by students.
- Content standards – The knowledge and skills students should learn.
- Performance standards – The level of proficiency at which content standards should be mastered.
- Relative Interpretation – Giving meaning to a test result by comparing it to the results of other test takers.
- Norm-Referenced Interpretation – Giving meaning to a test result by comparing it to the results of other test takers.
- Percentile – The percent of norm-group students who scored lower than the student being assessed.
- Absolute Interpretation – Giving meaning to a test result by comparing it with a defined curricular aim.
- Raw Score – The “untreated” score earned by a student on an assessment.
- Criterion-Referenced Interpretation – Giving meaning to a test result by comparing it with a defined curricular aim.
- Cognitive Assessment – Measurement of students' knowledge and/or intellectual skills.
- Psychomotor Assessment – Measurement of students’ small-muscle or large-muscle skills.
- Affective Assessment – Measurement of students’ attitudes, interests, and/or values.
- Item Content – The substance of the tasks contained in an assessment instrument.
- Curriculum – The ends – that is, the learning objectives sought for students.
- Instruction – The means – that is, the teaching activities intended to accomplish curricular ends.
- High-Stakes Tests – Assessments used to make important decisions about students or to judge the effectiveness of educators.
- Educational Accountability – The imposition of required student tests as a way of holding educators responsible for the quality of schooling.
- Second Level Inferences – Inferences that are drawn from score-based inferences about students’ status with respect to their mastery of a curricular aim.
- Assessment Validity – The degree to which test-based inferences about students are accurate.
- Content-Related Validity Evidence – Evidence indicating that an assessment suitably reflects the curricular aim it represents.
- Criterion-Related Validity Evidence – Evidence demonstrating the systematic relationship of test scores to a criterion variable.
- Criterion Variable – An external variable that serves as the target for a predictor test.
- Construct-Related Validity Evidence – Empirical evidence that (1) supports the posited existence of a hypothetical construct and (2) indicates that an assessment device does, in fact, measure that construct.
- Consequential Validity – A concept, disputed by some, focused on the appropriateness of a test’s social consequences.
- Assessment Reliability – The consistency of results produced by measurement devices.
- Stability Reliability – The consistency of assessment results over time.
- Classification Consistency – A representation of the proportion of students who are placed in the same category on two testing occasions or two test forms (see the classification sketch after this list).
- False Positive – Classifying a student as having mastered what's being measured when, in fact, the student hasn't.
- False Negative – Classifying a student as not having mastered what's being measured when, in fact, the student has.
- Alternate-Form Reliability – The consistency of measured results yielded by different forms of the same test.
- Stability and Alternate-Form Reliability – The consistency of measured results over time using two different test forms.
- Internal Consistency Reliability – The degree to which a test's items are functioning in a homogeneous fashion (see the reliability sketch after this list).
- Dichotomous Items – Test items that are scored either right or wrong.
- Polytomous Items – Test items whose responses can receive more than two score points.
- Standard Error of Measurement – An estimate of the consistency of an individual's test performance: how much the student's score would be expected to vary over repeated testing (also computed in the reliability sketch).
- Disparate Impact – When the test scores of different groups are decidedly different.
- Offensiveness – A test item is offensive when it contains elements that would insult any group of test takers on the basis of their personal characteristics.
- Unfair Penalization – Test items unfairly penalize test takers when there are elements in an item that would inequitably disadvantage any group because of its personal characteristics.
- p Value – The proportion of students who answer a test item correctly.
- Normal Curve – A unique test-score distribution whose properties are helpful in making relative interpretations of students’ performance.
- Standard Score – A way of describing, in standard-deviation units, a raw score's distance from its distribution's mean (illustrated in the score-conversion sketch after this list).
- Normal Curve Equivalent (NCE) – A standard score that, based on a raw score’s percentile, indicates the raw score’s standard-deviation distance from a distribution’s mean if the distribution had been normal.
- Stanine – A normalized standard score based on dividing a distribution into nine units, each spanning one-half standard deviation.
- Scale Score – A score based on the conversion of raw scores to a new numerical scale; a student's relative performance is reported as a scale score on the converted scale.
- Item Response Theory (IRT) – A scale-score system that, by using considerable computer analysis, creates a new scale based on the properties of each test item.
- Grade Equivalent Score – Score-reporting estimates of how a student’s performance relates to the average performance of students in a given grade and month of the school year.
- Norm Group – The group of test takers whose scores are used to make relative interpretations of others' test performances.
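
The classification sketch referenced above is a minimal Python illustration of classification consistency, false positives, and false negatives. All of the student data are invented for illustration, and in practice a student's true mastery status is unknowable; the comparison against it here is purely didactic.

```python
# Hypothetical mastery classifications for the same ten students on two testing
# occasions (True = classified as having mastered the curricular aim).
occasion_1 = [True, True, False, True, False, True, False, False, True, True]
occasion_2 = [True, False, False, True, False, True, True, False, True, True]

# Classification consistency: the proportion of students placed in the same
# category on both occasions (or on two test forms).
same = sum(a == b for a, b in zip(occasion_1, occasion_2))
consistency = same / len(occasion_1)

# Against an invented "true" mastery status: a false positive classifies a
# non-master as a master; a false negative classifies a master as a non-master.
true_status = [True, True, False, True, False, True, False, False, True, False]
false_positives = sum(obs and not truth for obs, truth in zip(occasion_2, true_status))
false_negatives = sum(truth and not obs for obs, truth in zip(occasion_2, true_status))

print(f"classification consistency: {consistency:.0%}")   # 80%
print(f"false positives: {false_positives}, false negatives: {false_negatives}")
```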
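The reliability sketch below shows, with invented dichotomous item responses, how an item's p-value, one internal-consistency index (KR-20, a common choice for right/wrong items), and the standard error of measurement can be computed. Treat it as a sketch of the standard formulas, not as a procedure prescribed by these notes.

```python
import math

# Hypothetical scored responses: rows = students, columns = dichotomous items
# (1 = right, 0 = wrong). Data are invented for illustration.
responses = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1],
]

n_items = len(responses[0])
totals = [sum(row) for row in responses]

# Item p-value: the proportion of students who answer each item correctly.
p_values = [sum(row[i] for row in responses) / len(responses) for i in range(n_items)]

# Variance of total raw scores (population form).
mean_total = sum(totals) / len(totals)
var_total = sum((t - mean_total) ** 2 for t in totals) / len(totals)

# KR-20: an internal-consistency reliability estimate for dichotomous items.
kr20 = (n_items / (n_items - 1)) * (1 - sum(p * (1 - p) for p in p_values) / var_total)

# Standard error of measurement: SEM = SD * sqrt(1 - reliability).
sem = math.sqrt(var_total) * math.sqrt(1 - kr20)

print(f"item p-values: {[round(p, 2) for p in p_values]}")
print(f"KR-20 reliability: {kr20:.2f}")
print(f"SEM: {sem:.2f} raw-score points")
```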
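Finally, the score-conversion sketch below shows how one raw score can be expressed as a standard (z) score, percentile, normal curve equivalent, and stanine. The norm-group mean and standard deviation are assumed values, and the percentile computation presumes a normal score distribution.

```python
from statistics import NormalDist

# Assumed norm-group parameters and raw score (not taken from the notes).
norm_mean, norm_sd = 50.0, 10.0
raw_score = 63.0

# Standard score (z): distance from the mean in standard-deviation units.
z = (raw_score - norm_mean) / norm_sd

# Percentile: percent of a normally distributed norm group scoring lower.
percentile = NormalDist().cdf(z) * 100

# Normal curve equivalent (NCE): a normalized standard score with mean 50 and
# SD 21.06, so NCEs of 1, 50, and 99 coincide with those percentiles.
nce = 50 + 21.06 * NormalDist().inv_cdf(percentile / 100)

# Stanine: nine bands of one-half standard deviation, mean 5, clipped to 1-9.
stanine = min(9, max(1, round(z * 2 + 5)))

print(f"z = {z:.2f}, percentile = {percentile:.0f}, NCE = {nce:.1f}, stanine = {stanine}")
```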
Assessment Terminology II
- Educational assessment: Allows educators to make inferences about student status with respect to a curricular aim (the curricular aim is represented by the educational assessment)
- Indicator of educational effectiveness: Parents and other citizens tend to treat test results as the indicator of educational effectiveness.
- NCLB: The latest in a series of legislative initiatives that have transformed students' performances on important tests into the single most important factor in determining educational quality.
- 1st lesson: Understand the proper and improper uses of test results as an indicator of educational effectiveness.
- 2nd lesson: Understand how assessment (test results) can improve the instructional process. Tests are not just a way to determine who gets the A's in your class.
- Assessment inference = interpretation of test results = making sense out of the students' test results
- Norm-referenced tests = nationally standardized tests (technically, a test is not norm-referenced; it is the inferences, or result-based interpretations, that are norm-referenced)
- Curricular aim = criterion = target (e.g., a set of 500 grade-level spelling words)
- Norm-referenced interpretations: require less precise descriptions of what's being measured (do not need clearly defined objectives)
- A test created to provide criterion-referenced interpretations usually does not do a good job of providing norm-referenced inferences, and vice versa.
- Categories of curricular aims: Cognitive, Psychomotor, Affective
- 4 steps to determine what should be measured: 1. Identify the decision 2. Choose the interpretation 3. Identify sources of item content 4. Determine what to measure
- 3 Types of decisions: selection, evaluation, instruction
- Fixed-quota setting: More applicants than openings; calls for norm-referenced interpretations (compare the 500 applicants and pick the best 100)
- Requisite-skill/knowledge setting: Who is qualified? Don't want to let 25% of the students get their white coat and stethoscope if they are not qualified (criterion-referenced)
- Single most important factor for judging educational tests: the instructional contribution those tests are likely to make. Will the assessment help teachers design and deliver better instruction?
- Curricular magnets: Whatever the tests measured began to occupy more importance in the curriculum.
- NCLB: The most recent reauthorization of ESEA (Elementary and Secondary Education Act of 1965)
  - Requires math & reading tests annually in grades 3-8 and once in grades 10-12
  - Requires science tests once in each grade span: 3-5, 6-9, & 10-12
  - States select their own state standards to test
  - 3 levels of performance – Basic, proficient, advanced
  - Adequate yearly progress (AYP): all children will be proficient or advanced
- Types of assessment that can impact instruction:
  - Pre-assessments: help determine what to teach
  - Progress-monitoring tests: decide whether to continue or cease instruction
  - Diploma-denial exams: focus teachers on what needs to be learned
  - End-of-year final exams: help teachers decide if alterations need to be made for next year
  - NCLB assessments: teachers will try to use instructional approaches that lead to AYP
- Instruction-influenced assessment: Curriculum was determined, instruction was planned, and after instruction was delivered, assessment took place (traditional method)
- Assessment-influenced instruction: Curriculum is determined, the assessment is created, and then instruction is planned
- Professional ethics guideline: No test-preparation practice should violate the ethical norms of the education profession.
- Teachers have an ethical responsibility to serve as models of moral behavior for children.
- Educational defensibility guideline: No test-preparation practice should increase students' test scores without simultaneously increasing students' mastery of the curricular aim being assessed.
- Validity: the accuracy of the inferences or interpretations that are made based on students’ performances on measurement devices.
- Relationship between validity and reliability: For a test to be valid, it needs to be reliable; but just because a test is reliable does not guarantee the validity of the inferences made. (A vocabulary test may be reliable, yet reveal nothing about a student's ability to pole vault.)