Questions Reading Mode
Correct Answer Logic:
Concurrent validity is a form of criterion validity in which two measures are administered at the same time (concurrently) and their scores are correlated. If scores on the new test align with scores on the validated test, this is evidence of concurrent validity. It is neither reliability (which involves repeating the same test or splitting it) nor predictive validity (which requires a future criterion).
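The correlation step can be sketched as a plain Pearson r between the two sets of scores. The function name and the scores are hypothetical; how high r must be to count as adequate evidence is a judgment call the explanation does not fix.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation between two paired lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length score lists (n >= 2)")
    mx, my = mean(x), mean(y)
    # Population covariance, paired with population standard deviations below.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        raise ValueError("scores must vary to compute a correlation")
    return cov / (sx * sy)

# Hypothetical scores for five students who took both tests in the same session.
new_test = [12, 15, 9, 18, 14]
validated_test = [50, 58, 45, 62, 49]
print(round(pearson_r(new_test, validated_test), 2))
```

A strong positive r here would support concurrent validity; the same calculation against a criterion collected later would instead address predictive validity.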
Uploaded by: Fani Warraich
Correct Answer Logic:
Norm-referenced tests require substantial score variance to rank examinees accurately. Items with p-values between 0.40 and 0.60 are near optimal for NRT because they generate the widest spread of scores. Items with very high p-values (0.85 to 0.90) produce ceiling effects and minimal variance, which reduces the test's discriminating power and makes such items less suitable for NRT purposes.
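The reason mid-range p-values spread scores most can be checked numerically: a dichotomous (0/1) item has variance p(1 − p), which peaks at p = 0.50. The helper name is hypothetical.

```python
def item_variance(p):
    """Variance of a 0/1-scored item whose proportion correct is p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    return p * (1 - p)

# Compare the NRT-friendly range with the very easy items mentioned above.
for p in (0.40, 0.50, 0.60, 0.85, 0.90):
    print(f"p = {p:.2f}  variance = {item_variance(p):.4f}")
```

The variance is 0.25 at p = 0.50 but only about 0.09 at p = 0.90, which is the "minimal variance" the explanation refers to.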
Correct Answer Logic:
Diagnostic decisions address the causes of persistent learning difficulties, whether intellectual, physical, emotional, or environmental. When standard instructional remediation has failed, a diagnostic battery (not a routine achievement test) is needed to pinpoint the underlying cause. This is distinct from placement (where to put the student) or selection (whether to admit).
Correct Answer Logic:
A core rule of MCQ construction is that all distractors must be plausible to students who lack the relevant knowledge. If a distractor is not selected by any student in a field test, it contributes nothing to the item's measurement function and should be replaced with a more compelling incorrect option.
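Spotting non-functioning distractors in field-test data amounts to counting how often each option was chosen. A minimal sketch, assuming responses are recorded as option letters; the function name and data are hypothetical.

```python
from collections import Counter

def nonfunctioning_distractors(responses, options, key):
    """Return distractors that no examinee selected in the field test."""
    counts = Counter(responses)
    # Counter returns 0 for options that never appear.
    return [opt for opt in options if opt != key and counts[opt] == 0]

# Hypothetical field-test responses to one item keyed "B".
responses = ["B", "A", "B", "C", "B", "A", "B", "C", "B", "A"]
print(nonfunctioning_distractors(responses, ["A", "B", "C", "D"], key="B"))
```

Here option "D" attracted no one, so it is the candidate for replacement with a more plausible incorrect option.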
Teaching
QUESTION #6771
Question 345
Carey (1988) identified six elements for developing a Table of Specification. Which combination is MOST essential to ensure both content representativeness AND cognitive depth?
Correct Answer Logic:
Content representativeness is ensured by balancing the weighting across all instructional goals (objectives). Cognitive depth is ensured by explicitly including items at both lower-order (knowledge, comprehension) and higher-order (application, analysis, synthesis) levels of Bloom's Taxonomy. These two elements working together prevent a test from oversampling easy recall items at the expense of complex thinking.
Correct Answer Logic:
At the Extended Abstract level of the SOLO taxonomy, students transcend the given content domain, make connections to other areas, and think hypothetically. Key indicator verbs include theorize, generalize, hypothesize, reflect, and generate. The response described (linking to another theory, generalizing the principle, and proposing a new idea) matches these verbs precisely.
Teaching
QUESTION #6773
Question 347
What is the fundamental difference between 'speed tests' and 'power tests' in terms of their design and the construct they measure?
Correct Answer Logic:
By definition, power tests use liberal time limits so that virtually all examinees can attempt every item; the items are difficult and measure depth of knowledge. Speed tests use strict time limits that prevent completion; the items are easy and measure how quickly and accurately examinees can process and respond. The two types measure fundamentally different constructs.
Correct Answer Logic:
Split-half reliability correlates two halves of the test, but a half-test is less reliable than the full test. The Spearman-Brown formula corrects for this by estimating the reliability of the full-length test from the split-half correlation. This is a fundamental principle: longer tests, all else equal, are more reliable because they sample the domain more broadly.
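The Spearman-Brown correction can be written as ρ* = nρ / (1 + (n − 1)ρ), where n is the factor by which test length changes; n = 2 steps a split-half correlation up to full length. A short sketch, assuming the two halves are parallel (the usual Spearman-Brown assumption); the half-test correlation below is hypothetical.

```python
def spearman_brown(r, n=2.0):
    """Predicted reliability when test length is multiplied by n.

    With n = 2 this converts a split-half correlation into an
    estimate of full-length reliability.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return n * r / (1 + (n - 1) * r)

half_r = 0.70  # hypothetical correlation between the two halves
print(round(spearman_brown(half_r), 3))  # 0.824
```

The stepped-up value is higher than the half-test correlation, which is the "longer tests are more reliable" principle in numeric form.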
Teaching
QUESTION #6775
Question 349
Which of the following actions MOST directly threatens the consequence validity of a high-stakes examination?
Correct Answer Logic:
Consequence validity evaluates the intended and unintended effects of using assessment results. When teaching to the test occurs, the unintended consequence is a narrowed curriculum: students learn the test content rather than the broader domain. This undermines the fundamental educational purpose of assessment and is a direct threat to consequence validity.
Correct Answer Logic:
In CRT, the test is designed to assess mastery of a defined set of skills; it does not aim to spread students along a distribution. It is entirely acceptable, even ideal, for most students to pass if they have mastered the content. An item difficulty (p-value) of around 0.80, meaning about 80% of students answer the item correctly, is the expected CRT item difficulty. The purpose is mastery verification, not comparison or ranking.
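Item difficulty here is simply the proportion of examinees answering each item correctly. A minimal sketch over a hypothetical 0/1 score matrix (rows are examinees, columns are items); the function name is illustrative.

```python
def item_p_values(scores):
    """Proportion correct for each item; scores is examinees x items, 0/1."""
    n_examinees = len(scores)
    # zip(*scores) walks the matrix column by column, i.e. item by item.
    return [sum(col) / n_examinees for col in zip(*scores)]

# Hypothetical 0/1 scores: 5 examinees (rows) x 3 items (columns).
scores = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]
for j, p in enumerate(item_p_values(scores), start=1):
    print(f"item {j}: p = {p:.2f}")
```

Items 1 and 2 sit at the 0.80 level described above; an item far below it (item 3 here) is worth a closer look before the test is used for mastery decisions.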
Correct Answer Logic:
This is the fourth approach to holistic scoring (ranking papers relative to each other), which aligns with norm-referenced or relative-standard grading. Its critical limitation is that it cannot be applied to large sets of papers, because it requires reading and comparing all papers simultaneously, and scores depend on the composition of the specific group rather than on absolute quality.
Correct Answer Logic:
In IRT's 3-parameter logistic (3PL) model, 'a' is discrimination (the steepness of the ICC), 'b' is difficulty, and 'c' is the pseudo-guessing parameter (the lower asymptote of the ICC). A low 'a' means the item poorly differentiates between ability levels; a high 'c' means even low-ability examinees have a substantial probability of answering correctly, suggesting that guessing is inflating scores. Such items should be revised or dropped.
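The 3PL item characteristic curve is P(θ) = c + (1 − c) / (1 + e^(−a(θ − b))). The sketch below evaluates it for two hypothetical items; it omits the optional scaling constant D = 1.7 that some texts place inside the exponent, so parameters from a D-scaled calibration would need that factor added.

```python
import math

def p_correct_3pl(theta, a, b, c):
    """3PL item characteristic curve: probability of a correct response.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: pseudo-guessing parameter (lower asymptote).
    The scaling constant D = 1.7 is omitted (logistic metric).
    """
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# Hypothetical parameters: a well-behaved item vs. a low-a, high-c item.
good = dict(a=1.5, b=0.0, c=0.10)
weak = dict(a=0.3, b=0.0, c=0.35)
for theta in (-2.0, 0.0, 2.0):
    print(f"theta={theta:+.1f}  good={p_correct_3pl(theta, **good):.2f}  "
          f"weak={p_correct_3pl(theta, **weak):.2f}")
```

The weak item gives low-ability examinees a much higher chance of success and changes little across the ability range, which is the pattern the explanation flags for revision.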
Correct Answer Logic:
Suggestion 7 in MCQ construction warns against verbal associations between the stem or correct answer and memorized material. When a keyword in the answer mirrors a specific phrase from the definition, students can identify the correct option through rote linguistic matching rather than conceptual understanding. This undermines the item's validity as a measure of comprehension.
Correct Answer Logic:
The Table of Specification guideline states: percent of instruction time = percent of examination value (within ±2 percentage points). Topic A received 25% of instructional time and was allocated 13 of 50 items (13/50 = 26%). Since 26% is within ±2 points of 25%, this allocation is acceptable. In a 50-item test each item is worth 2%, so the nearest allocations outside the band are 14 items (28%) and 11 items (22%); either would require revision.
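The ±2-point rule is a simple arithmetic check. A sketch with a hypothetical helper that returns the exam weight and whether it falls inside the tolerance band:

```python
def allocation_ok(items_for_topic, total_items, instruction_pct, tolerance=2.0):
    """Compare a topic's exam weight with its share of instruction time.

    Returns (exam percent, True if within +/- tolerance percentage points).
    """
    exam_pct = 100.0 * items_for_topic / total_items
    return exam_pct, abs(exam_pct - instruction_pct) <= tolerance

# Topic A: 25% of instruction time, 50-item test.
for n in (11, 12, 13, 14):
    pct, ok = allocation_ok(n, 50, 25.0)
    print(f"{n}/50 items = {pct:.0f}%  acceptable: {ok}")
```

Only 12 or 13 items keep Topic A inside the 23% to 27% band.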
Teaching
QUESTION #6781
Question 355
Which of the following BEST explains why portfolios have very low reliability as an assessment tool compared to standardized tests?
Correct Answer Logic:
Reliability requires consistency across administrations, scorers, and conditions. Portfolios are diverse in content (student-selected), judged subjectively by different assessors using loosely defined criteria, and lack uniform conditions. All of these factors introduce variability into scores, which by definition reduces reliability. This is listed as a known weakness of portfolios.
Correct Answer Logic:
Suggestion 8 in MCQ construction states: the relative length of alternatives should not provide a clue to the answer. Correct answers tend to require qualification to be unambiguously true, making them longer. The fix is to deliberately add similar qualifying phrases to the distractors to equalize length, removing the length clue while preserving plausibility.
Correct Answer Logic:
NEAS (National Education Assessment System) is a system-level monitoring body. It conducts large-scale assessments to give federal policymakers a picture of education quality, monitor how well the curriculum translates into learning, and identify factors affecting achievement. It does not certify individual students. BISE (Board of Intermediate and Secondary Education), in contrast, conducts high-stakes examinations that certify individual students (SSC at grade 10, HSSC at grade 12) and directly determine their academic credentials.
Correct Answer Logic:
If high-scoring students are more likely than low-scoring students to answer an item incorrectly, the item discrimination index D = (Upper Group % Correct) − (Lower Group % Correct) will be negative. Negative D values are the most serious warning sign in item analysis, typically indicating a keying error, ambiguous wording, or a misleading clue that favors lower-ability guessers over informed higher-ability students.
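The index can be computed by ranking examinees on total score and comparing the top and bottom groups. This sketch expresses D as a difference of proportions (range −1 to +1) and uses upper and lower 27% groups, a common convention assumed here; the records are hypothetical.

```python
def discrimination_index(records, fraction=0.27):
    """D = proportion correct (upper group) - proportion correct (lower group).

    records: list of (total_score, item_correct) pairs, item_correct in {0, 1}.
    fraction: share of examinees placed in each extreme group.
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    ranked = sorted(records, key=lambda r: r[0], reverse=True)
    k = max(1, round(len(ranked) * fraction))
    p_upper = sum(correct for _, correct in ranked[:k]) / k
    p_lower = sum(correct for _, correct in ranked[-k:]) / k
    return p_upper - p_lower

# Hypothetical item that top scorers tend to miss.
records = [(48, 0), (45, 0), (44, 1), (40, 1), (38, 0),
           (35, 1), (30, 1), (28, 1), (25, 0), (20, 1)]
print(round(discrimination_index(records), 2))
```

A negative result like this one is the cue to check the answer key and wording first.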
Correct Answer Logic:
In the Revised Bloom's Taxonomy by Anderson and Krathwohl: (1) All category names were changed from nouns to verbs (Knowledge → Remembering; Comprehension → Understanding; and so on). (2) The two highest levels were reversed: Creating (formerly Synthesis) became the highest level, above Evaluating. This reflects the view that generating new ideas is cognitively more demanding than judging existing ones.
Teaching
QUESTION #6786
Question 360
A test packaging decision involves arranging items 'from easy to hard'. Which of the following is the MOST psychologically sound rationale for this arrangement?
Correct Answer Logic:
The test administration literature supports easy-to-hard arrangement primarily for psychological reasons: it provides a positive start, reduces anxiety, builds confidence, and motivates students to continue engaging with the test. This is particularly beneficial for test-anxious or lower-ability students and does not penalize stronger students who find the early items trivially easy.