Assessment Terminology
- Educational Assessment (Testing, or Measurement) – A process by which educators use students' responses to specially created or naturally occurring stimuli in order to make inferences about students' knowledge, skill, or affective status.
- Curricular Aim - A set of educationally relevant knowledge, skills, or affect that you want students to attain.
- Assessment Inference – An interpretation based on students’ assessment performances regarding students’ curricular-aim mastery.
- Educational Leader – An educator whose decisions influence the decisions of those who are responsible for educating students.
- Aptitude Tests – Assessments of examinees' intellectual potential, often employed to make predictions about success in future academic settings.
- Achievement Tests – Assessments of the knowledge and/or skills possessed by students.
- Content standards – The knowledge and skills students should learn.
- Performance standards – The level of proficiency at which content standards should be mastered.
- Relative Interpretation – Giving meaning to a test result by comparing it to the results of other test takers.
- Norm-Referenced Interpretation – Giving meaning to a test result by comparing it to the results of other test takers.
- Percentile – The percent of norm-group students who scored lower than the student being assessed.
- Absolute Interpretation – Giving meaning to a test result by comparing it with a defined curricular aim.
- Raw Score – The “untreated” score earned by a student on an assessment.
- Criterion-Referenced Interpretation – Giving meaning to a test result by comparing it with a defined curricular aim.
- Cognitive Assessment – Measurement of students' knowledge and/or intellectual skills.
- Psychomotor Assessment – Measurement of students’ small-muscle or large-muscle skills.
- Affective Assessment – Measurement of students’ attitudes, interests, and/or values.
- Item Content – The substance of the tasks contained in an assessment instrument.
- Curriculum – The ends – that is, the learning objectives sought for students.
- Instruction – The means – that is, the teaching activities intended to accomplish curricular ends.
- High-Stakes Tests – Assessments used to make important decisions about students or to judge the effectiveness of educators.
- Educational Accountability – The imposition of required student tests as a way of holding educators responsible for the quality of schooling.
- Second Level Inferences – Inferences that are drawn from score-based inferences about students’ status with respect to their mastery of a curricular aim.
- Assessment Validity – The degree to which test-based inferences about students are accurate.
- Content-Related Validity Evidence – Evidence indicating that an assessment suitably reflects the curricular aim it represents.
- Criterion-Related Validity Evidence – Evidence demonstrating the systematic relationship of test scores to a criterion variable.
- Criterion Variable – An external variable that serves as the target for a predictor test.
- Construct-Related Validity Evidence – Empirical evidence that (1) supports the posited existence of a hypothetical construct and (2) indicates that an assessment device does, in fact, measure that construct.
- Consequential Validity – A concept, disputed by some, focused on the appropriateness of a test’s social consequences.
- Assessment Reliability – The consistency of results produced by measurement devices.
- Stability Reliability – The consistency of assessment results over time.
- Classification Consistency – A representation of the proportion of students who are placed in the same category on two testing occasions or two test forms (see the classification sketch after this list).
- False Positive – Classifying a student as having mastered what's being measured when, in fact, the student hasn't.
- False Negative – Classifying a student as not having mastered what's being measured when, in fact, the student has.
- Alternate-Form Reliability – The consistency of measured results yielded by different forms of the same test.
- Stability and Alternate-Form Reliability – The consistency of measured results over time using two different test forms.
- Internal Consistency Reliability – The degree to which a test's items are functioning in a homogeneous fashion (see the reliability sketch after this list).
- Dichotomous Items – Test items that are scored either right or wrong.
- Polytomous Items – Test items whose responses can receive more than two score points.
- Standard Error of Measurement – An estimate of the consistency of an individual's test performance: how much the student's score would be expected to vary over repeated testing (also computed in the reliability sketch).
- Disparate Impact – When the test scores of different groups are decidedly different.
- Offensiveness – A test item is offensive when it contains elements that would insult any group of test takers on the basis of their personal characteristics.
- Unfair Penalization – Test items unfairly penalize test takers when there are elements in an item that would inequitably disadvantage any group because of its personal characteristics.
- p Value – The proportion of students who answer a test item correctly.
- Normal Curve – A unique test-score distribution whose properties are helpful in making relative interpretations of students’ performance.
- Standard Score – A way of describing, in standard-deviation units, a raw score's distance from its distribution's mean (illustrated in the score-conversion sketch after this list).
- Normal Curve Equivalent (NCE) – A standard score that, based on a raw score’s percentile, indicates the raw score’s standard-deviation distance from a distribution’s mean if the distribution had been normal.
- Stanine – A normalized standard score based on dividing a distribution into nine units, each spanning one-half standard deviation.
- Scale Score – A score based on the conversion of raw scores to a new numerical scale; a student's relative performance is reported as a scale score on the converted scale.
- Item Response Theory (IRT) – A scale-score system that, by using considerable computer analysis, creates a new scale based on the properties of each test item.
- Grade Equivalent Score – Score-reporting estimates of how a student’s performance relates to the average performance of students in a given grade and month of the school year.
- Norm Group – The group of test takers whose scores are used to make relative interpretations of others' test performances.
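
The classification sketch referenced above is a minimal Python illustration of classification consistency, false positives, and false negatives. All of the student data are invented for illustration, and in practice a student's true mastery status is unknowable; the comparison against it here is purely didactic.

```python
# Hypothetical mastery classifications for the same ten students on two testing
# occasions (True = classified as having mastered the curricular aim).
occasion_1 = [True, True, False, True, False, True, False, False, True, True]
occasion_2 = [True, False, False, True, False, True, True, False, True, True]

# Classification consistency: the proportion of students placed in the same
# category on both occasions (or on two test forms).
same = sum(a == b for a, b in zip(occasion_1, occasion_2))
consistency = same / len(occasion_1)

# Against an invented "true" mastery status: a false positive classifies a
# non-master as a master; a false negative classifies a master as a non-master.
true_status = [True, True, False, True, False, True, False, False, True, False]
false_positives = sum(obs and not truth for obs, truth in zip(occasion_2, true_status))
false_negatives = sum(truth and not obs for obs, truth in zip(occasion_2, true_status))

print(f"classification consistency: {consistency:.0%}")   # 80%
print(f"false positives: {false_positives}, false negatives: {false_negatives}")
```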
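The reliability sketch below shows, with invented dichotomous item responses, how an item's p-value, one internal-consistency index (KR-20, a common choice for right/wrong items), and the standard error of measurement can be computed. Treat it as a sketch of the standard formulas, not as a procedure prescribed by these notes.

```python
import math

# Hypothetical scored responses: rows = students, columns = dichotomous items
# (1 = right, 0 = wrong). Data are invented for illustration.
responses = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1],
]

n_items = len(responses[0])
totals = [sum(row) for row in responses]

# Item p-value: the proportion of students who answer each item correctly.
p_values = [sum(row[i] for row in responses) / len(responses) for i in range(n_items)]

# Variance of total raw scores (population form).
mean_total = sum(totals) / len(totals)
var_total = sum((t - mean_total) ** 2 for t in totals) / len(totals)

# KR-20: an internal-consistency reliability estimate for dichotomous items.
kr20 = (n_items / (n_items - 1)) * (1 - sum(p * (1 - p) for p in p_values) / var_total)

# Standard error of measurement: SEM = SD * sqrt(1 - reliability).
sem = math.sqrt(var_total) * math.sqrt(1 - kr20)

print(f"item p-values: {[round(p, 2) for p in p_values]}")
print(f"KR-20 reliability: {kr20:.2f}")
print(f"SEM: {sem:.2f} raw-score points")
```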
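Finally, the score-conversion sketch below shows how one raw score can be expressed as a standard (z) score, percentile, normal curve equivalent, and stanine. The norm-group mean and standard deviation are assumed values, and the percentile computation presumes a normal score distribution.

```python
from statistics import NormalDist

# Assumed norm-group parameters and raw score (not taken from the notes).
norm_mean, norm_sd = 50.0, 10.0
raw_score = 63.0

# Standard score (z): distance from the mean in standard-deviation units.
z = (raw_score - norm_mean) / norm_sd

# Percentile: percent of a normally distributed norm group scoring lower.
percentile = NormalDist().cdf(z) * 100

# Normal curve equivalent (NCE): a normalized standard score with mean 50 and
# SD 21.06, so NCEs of 1, 50, and 99 coincide with those percentiles.
nce = 50 + 21.06 * NormalDist().inv_cdf(percentile / 100)

# Stanine: nine bands of one-half standard deviation, mean 5, clipped to 1-9.
stanine = min(9, max(1, round(z * 2 + 5)))

print(f"z = {z:.2f}, percentile = {percentile:.0f}, NCE = {nce:.1f}, stanine = {stanine}")
```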
Assessment Terminology II
- Educational assessment: Allows educators to make inferences about student status with respect to a curricular aim (the curricular aim is represented by the educational assessment)
- Indicator of educational effectiveness: Parents and other citizens tend to treat test results as the indicator of educational effectiveness.
- NCLB: The latest in a series of legislative initiatives that have transformed students' performances on important tests into the single most important factor in determining educational quality.
- 1st lesson: Understand the proper and improper uses of test results as an indicator of educational effectiveness.
- 2nd lesson: Understand how assessment (test results) can improve the instructional process. Tests are not just a way to determine who gets the A's in your class.
- Assessment inference = interpretation of test results = making sense out of the students' test results
- Norm-referenced tests = nationally standardized tests (technically, a test is not norm-referenced; it is the inferences, or result-based interpretations, that are norm-referenced)
- Curricular aim = criterion = target (e.g., a set of 500 grade-level spelling words)
- Norm-referenced interpretations: require less precise descriptions of what's being measured (do not need clearly defined objectives)
- A test created to provide criterion-referenced interpretations usually does not do a good job of providing norm-referenced inferences, and vice versa.
- Categories of curricular aims: Cognitive, Psychomotor, Affective
- 4 steps to determine what should be measured: 1. Identify the decision 2. Choose the interpretation 3. Identify sources of item content 4. Determine what to measure
- 3 Types of decisions: selection, evaluation, instruction
- Fixed-quota setting: More applicants than openings; calls for norm-referenced interpretations (compare the 500 applicants and pick the best 100)
- Requisite-skill/knowledge setting: Who is qualified? Don't want to let 25% of the students get their white coat and stethoscope if they are not qualified (criterion-referenced)
- Single most important factor for judging educational tests: the instructional contribution those tests are likely to make. Will the assessment help teachers design and deliver better instruction?
- Curricular magnets: Whatever the tests measured began to occupy more importance in the curriculum.
- NCLB: The most recent reauthorization of ESEA (Elementary and Secondary Education Act of 1965)
  - Requires math & reading tests annually in grades 3-8 and once in grades 10-12
  - Requires science tests once in each grade span: 3-5, 6-9, & 10-12
  - States select their own state standards to test
  - 3 levels of performance – Basic, proficient, advanced
  - Adequate yearly progress (AYP): all children will be proficient or advanced
- Types of assessment that can impact instruction:
  - Pre-assessments: help determine what to teach
  - Progress-monitoring tests: decide whether to continue or cease instruction
  - Diploma-denial exams: focus teachers on what needs to be learned
  - End-of-year final exams: help teachers decide if alterations need to be made for next year
  - NCLB assessments: teachers will try to use instructional approaches that lead to AYP
- Instruction-influenced assessment: Curriculum was determined, instruction was planned, and after instruction was delivered, assessment took place (traditional method)
- Assessment-influenced instruction: Curriculum is determined, the assessment is created, and then instruction is planned
- Professional ethics guideline: No test-preparation practice should violate the ethical norms of the education profession.
- Teachers have an ethical responsibility to serve as models of moral behavior for children.
- Educational defensibility guideline: No test-preparation practice should increase students' test scores without simultaneously increasing students' mastery of the curricular aim being assessed.
- Validity: the accuracy of the inferences or interpretations that are made based on students’ performances on measurement devices.
- Relationship between validity and reliability: For a test to be valid, it needs to be reliable; but just because a test is reliable does not guarantee the validity of the inferences made. (A vocabulary test may be reliable, yet reveal nothing about a student's ability to pole vault.)