📚 Questions Reading Mode

Study questions platform-wide or filter by specific tests with correct answers revealed.

Correct Answer Logic:
Concurrent validity is a form of criterion validity where two measures are administered at the same time (concurrently) and their scores are correlated. If the new test scores align with the validated test, this is evidence of concurrent validity. It is not reliability (which involves the same test repeated or split) nor predictive validity (which requires a future criterion).
Uploaded by: Fani Warraich
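The correlation step described in the explanation above can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the helper name `pearson_r` and the six pairs of scores are hypothetical, and the Pearson coefficient is assumed here as the correlation measure.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary on both tests")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for six examinees who took both tests in one sitting.
new_test = [12, 15, 9, 20, 17, 11]
validated_test = [14, 16, 10, 19, 18, 12]

r = pearson_r(new_test, validated_test)
print(f"concurrent validity coefficient: {r:.2f}")
```

A coefficient close to 1 for these made-up scores would count as evidence of concurrent validity; a coefficient near 0 would not.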
Teaching QUESTION #6768
Question 342
A teacher constructs a 50-item MCQ test. Items 1–25 cover knowledge-level outcomes (p-values around 0.85–0.90) and items 26–50 cover application-level outcomes (p-values around 0.40–0.55). Which statement about this test's suitability for norm-referenced interpretation is MOST accurate?
  • Both halves are equally suitable for NRT because they cover all Bloom's levels
  • The application-level items (items 26–50) are more suitable for NRT because their moderate difficulty creates greater score variance, enabling better discrimination among examinees ✔️
  • The knowledge-level items are better for NRT because easier items reduce test anxiety
  • NRT requires all items to have the same difficulty level, so the test is unsuitable
Correct Answer Logic:
Norm-referenced tests require substantial score variance to rank examinees accurately. Items with p-values between 0.40 and 0.60 are near optimal for NRT because they generate the widest spread of scores. Items with very high p-values (0.85–0.90) produce ceiling effects and minimal variance, which reduces the test's discriminating power and makes them less suitable for NRT purposes.
Uploaded by: Fani Warraich
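The variance argument in the Question 342 explanation can be checked numerically. For an item scored 0 or 1, the item-score variance is p(1 − p), which peaks at p = 0.50. The sketch below is illustrative only: the function name `item_variance` is hypothetical, and the sample p-values are simply the endpoints of the ranges quoted in the question, not data from a real test.

```python
def item_variance(p: float) -> float:
    """Variance of a 0/1-scored item whose proportion correct is p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return p * (1.0 - p)


# Hypothetical p-values: endpoints of the ranges quoted in the question.
knowledge_items = [0.85, 0.90]
application_items = [0.40, 0.55]

for label, p_values in (("knowledge", knowledge_items),
                        ("application", application_items)):
    variances = [round(item_variance(p), 4) for p in p_values]
    print(label, variances)
```

The application-level items come out at roughly 0.24 to 0.25 per item, against roughly 0.09 to 0.13 for the knowledge-level items, which is the spread the explanation refers to.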
Teaching QUESTION #6769
Question 343
A teacher notices that a student's persistent academic failure is unrelated to motivation or teaching methods and suspects sensory or cognitive processing issues. Which type of educational decision, and what corresponding assessment tool, is MOST appropriate?
  • Grading decision; teacher-made achievement test
  • Diagnostic decision; specialized standardized diagnostic battery to identify root causes ✔️
  • Placement decision; aptitude test
  • Selection decision; criterion-referenced test
Correct Answer Logic:
Diagnostic decisions address the causes of persistent learning difficulties: intellectual, physical, emotional, or environmental. When standard instructional remediation has failed, a diagnostic battery (not a routine achievement test) is needed to pinpoint the underlying cause. This is distinct from placement (where to put the student) or selection (whether to admit).
Uploaded by: Fani Warraich
Teaching QUESTION #6770
Question 344
In developing MCQ distractors, a teacher writes: 'Which country first used nuclear weapons in warfare? a) USA b) Soviet Union c) Germany d) France'. Only option (a) is a credible answer; the other options are unlikely to attract any examinee. What principle of MCQ construction is violated?
  • The stem should not use negative phrasing
  • Distractors should be plausible to uninformed students; implausible distractors do not contribute to item functioning and should be revised ✔️
  • The stem should present a definite problem
  • All options should be grammatically consistent with the stem
Correct Answer Logic:
A core rule of MCQ construction is that all distractors must be plausible to students who lack the relevant knowledge. If a distractor is not selected by any student in a field test, it contributes nothing to the item's measurement function and should be replaced with a more compelling incorrect option.
Uploaded by: Fani Warraich
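The field-test check mentioned in the Question 344 explanation (a distractor that no student selects) can be sketched as a simple tally. The function name `unused_distractors` and the response data are hypothetical, made up purely for illustration.

```python
from collections import Counter


def unused_distractors(responses, options, key):
    """Return the distractors (non-key options) that no examinee selected."""
    counts = Counter(responses)
    return [opt for opt in options if opt != key and counts[opt] == 0]


# Hypothetical field-test responses from 20 students to a four-option item.
responses = ["a"] * 18 + ["b"] * 2
print(unused_distractors(responses, options=["a", "b", "c", "d"], key="a"))
```

Any option returned by this check attracted no one in the sample and is a candidate for replacement with a more plausible alternative.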
Teaching QUESTION #6771
Question 345
Carey (1988) identified six elements for developing a Table of Specification. Which combination is MOST essential to ensure both content representativeness AND cognitive depth?
  • Total number of items and test format
  • Balance among goals selected for the exam AND balance among levels of learning (higher and lower order) ✔️
  • Enabling skills and number of items per goal
  • Test format and difficulty level
Correct Answer Logic:
Content representativeness is ensured by balancing the weighting across all instructional goals (objectives). Cognitive depth is ensured by explicitly including items at both lower-order (knowledge, comprehension) and higher-order (application, analysis, synthesis) levels of Bloom's Taxonomy. These two elements working together prevent a test from oversampling easy recall items at the expense of complex thinking.
Uploaded by: Fani Warraich
Correct Answer Logic:
At the Extended Abstract level, students transcend the given content domain, make connections to other areas, and think hypothetically. Key indicator verbs include: theorize, generalize, hypothesize, reflect, and generate. The response described (linking to another theory, generalizing the principle, and proposing a new idea) precisely matches these verbs.
Uploaded by: Fani Warraich
Teaching QUESTION #6773
Question 347
What is the fundamental difference between 'speed tests' and 'power tests' in terms of their design and the construct they measure?
  • Speed tests use harder items while power tests use easier items, measuring the same construct
  • Power tests have generous time limits and harder items, measuring maximum depth of knowledge; speed tests have strict time limits and easier items, measuring processing speed and efficiency ✔️
  • Speed tests are used in CRT while power tests are used in NRT
  • Power tests are always essay-based while speed tests are always MCQ-based
Correct Answer Logic:
By definition, power tests use liberal time limits so virtually all examinees can attempt every item; the items are difficult and measure depth of knowledge. Speed tests use strict time limits that prevent completion; the items are easy and measure how quickly and accurately examinees can process and respond. They measure fundamentally different constructs.
Uploaded by: Fani Warraich
Teaching QUESTION #6774
Question 348
A test developer wants to use the Split-Half method to estimate reliability. After splitting the test into odd and even items and correlating the halves, they get r = 0.70. However, this underestimates the reliability of the full test. Which formula corrects for this, and why is the correction necessary?
  • The KR-20 formula; because internal consistency must account for item difficulty
  • The Spearman-Brown prophecy formula; because reliability increases with test length, and the correlation between two halves reflects reliability of a test only half as long ✔️
  • The inter-rater reliability coefficient; because two independent scorers are effectively two test halves
  • The KR-21 formula; because it handles non-dichotomous scoring
Correct Answer Logic:
Split-half reliability correlates two halves of the test, but a half-test is less reliable than the full test. The Spearman-Brown formula corrects for this by estimating the reliability of the full-length test from the split-half correlation. This is a fundamental principle: longer tests, all else equal, are more reliable because they sample the domain more broadly.
Uploaded by: Fani Warraich
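For Question 348, the Spearman-Brown correction can be worked through with the r = 0.70 given in the stem. The sketch below uses the general form r_k = k·r / (1 + (k − 1)·r), which reduces to 2r / (1 + r) for the split-half case (k = 2). The function name and the `length_factor` parameter are naming assumptions for this illustration.

```python
def spearman_brown(r_half: float, length_factor: float = 2.0) -> float:
    """Projected reliability when a test is lengthened by length_factor.

    With length_factor = 2 this is the split-half correction:
    r_full = 2 * r_half / (1 + r_half).
    """
    if not 0.0 <= r_half <= 1.0:
        raise ValueError("r_half must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


# Split-half correlation from the question stem.
r_full = spearman_brown(0.70)
print(f"estimated full-test reliability: {r_full:.2f}")
```

With r = 0.70 the corrected estimate is 1.4 / 1.7, or about 0.82, which shows how much the uncorrected half-test correlation understates full-length reliability.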
Teaching QUESTION #6775
Question 349
Which of the following actions MOST directly threatens the consequence validity of a high-stakes examination?
  • Using moderately difficult items with acceptable discrimination indices
  • Teaching exclusively to the specific test items used in past exams (teaching to the test), which artificially narrows the curriculum and constrains student learning ✔️
  • Administering equivalent forms of the test across different testing centers
  • Using a Table of Specification to ensure content balance
Correct Answer Logic:
Consequence validity evaluates the intended and unintended effects of using assessment results. When teaching to the test occurs, the unintended consequence is a narrowed curriculum: students learn test content, not the broader domain. This undermines the fundamental educational purpose of assessment, representing a direct threat to consequence validity.
Uploaded by: Fani Warraich
Teaching QUESTION #6776
Question 350
In a criterion-referenced test (CRT) context, a cut score of 70% is set to distinguish 'master' from 'non-master'. After the exam, 85% of students pass. Which statement about this result is MOST consistent with CRT principles?
  • This result is anomalous because CRT normally produces a normal distribution of scores
  • This result is expected and acceptable in CRT; mastery tests are designed with relatively easy items, and it is desirable that most students demonstrate mastery ✔️
  • This outcome shows the test lacked discriminating power and should be revised for better spread
  • The cut score should be raised to 90% to reduce the pass rate to a more appropriate level
Correct Answer Logic:
In CRT, the test is designed to assess mastery of a defined set of skills; it does not aim to spread students along a distribution. It is entirely acceptable, even ideal, for most students to pass if they have mastered the content. Around 80% correct per item is the expected CRT item difficulty. The purpose is mastery verification, not comparison or ranking.
Uploaded by: Fani Warraich
Teaching QUESTION #6777
Question 351
A teacher asks students to sort unseen essay papers by quality (best to worst), then assigns grades based on relative rank. According to holistic rubric theory, which grading philosophy does this approach embody, and what is its key limitation?
  • Criterion-referenced philosophy; it lacks discriminating power
  • Norm-referenced philosophy; papers are ranked relative to each other rather than against absolute quality criteria, which makes it unsuitable for large numbers of papers ✔️
  • Absolute standard philosophy; it can be applied consistently to any number of papers
  • Analytic philosophy; sub-criteria are implicitly weighted differently
Correct Answer Logic:
This is the fourth approach to holistic scoring (ranking papers relative to each other), which aligns with norm-referenced or relative standard grading. Its critical limitation: it cannot be applied to large sets of papers because it requires reading and comparing all papers simultaneously, and scores depend on the composition of the specific group rather than absolute quality.
Uploaded by: Fani Warraich
Teaching QUESTION #6778
Question 352
A researcher applies Item Response Theory to a set of test items and finds that one item has a very low 'a' (discrimination) parameter and a relatively high 'c' (pseudo-guessing) parameter. What practical recommendation follows from this?
  • Retain the item because low guessing should be prioritized over discrimination
  • The item likely does not differentiate ability levels effectively and may be susceptible to correct responses through guessing; it should be revised or removed from the bank ✔️
  • Increase the item difficulty to reduce the guessing parameter
  • Use this item as a warm-up item since it is accessible to all ability levels
Correct Answer Logic:
In IRT's 3-parameter logistic model: 'a' is discrimination (steepness of ICC), 'b' is difficulty, and 'c' is the pseudo-guessing parameter (lower asymptote of the ICC). A low 'a' means the item poorly differentiates ability levels; a high 'c' means even low-ability examinees have a substantial probability of answering correctly, suggesting guessing is inflating scores. Such items should be revised or dropped.
Uploaded by: Fani Warraich
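The roles of 'a', 'b', and 'c' in the Question 352 explanation can be seen by evaluating the 3PL item characteristic curve, P(θ) = c + (1 − c) / (1 + e^(−a(θ − b))). The sketch below is a rough illustration: the parameter values for the two items are hypothetical, and the optional scaling constant (1.7) that some texts include in the exponent is omitted by assumption.

```python
import math


def icc_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical items: one weak (low a, high c) and one sound (higher a, lower c).
weak_item = {"a": 0.3, "b": 0.0, "c": 0.35}
sound_item = {"a": 1.5, "b": 0.0, "c": 0.15}

for name, item in (("weak", weak_item), ("sound", sound_item)):
    p_low = icc_3pl(-2.0, **item)   # low-ability examinee
    p_high = icc_3pl(2.0, **item)   # high-ability examinee
    print(f"{name}: P(low)={p_low:.2f}  P(high)={p_high:.2f}  gap={p_high - p_low:.2f}")
```

For the weak item, the low-ability examinee already has a better-than-even chance of success and the gap between the two ability levels is small, which is the pattern that motivates revising or dropping the item.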
Teaching QUESTION #6779
Question 353
A teacher includes the following stem in a test: 'Photosynthesis is the process by which: a) plants absorb water from soil; b) plants produce food using sunlight; c) plants release oxygen at night; d) plants absorb carbon dioxide for cellular respiration.' Upon analysis, option (b) contains a keyword ('food') that also appears in the course definition students memorized. Which MCQ construction flaw does this represent?
  • Including a distractor that is too difficult
  • Verbal association between the stem/correct answer and the course definition: the word 'food' provides an irrelevant linguistic clue to the answer without requiring genuine understanding ✔️
  • The stem is stated as an incomplete sentence, which is a poor format
  • All options are grammatically inconsistent with the stem
Correct Answer Logic:
Suggestion 7 in MCQ construction warns against verbal associations between the stem or correct answer and memorized material. When a keyword in the answer mirrors a specific phrase from the definition, students can identify the correct option through rote linguistic matching rather than conceptual understanding. This undermines the item's validity as a measure of comprehension.
Uploaded by: Fani Warraich
Teaching QUESTION #6780
Question 354
In a Table of Specification, after calculating that 25% of instructional time was devoted to Topic A, a teacher allocates 13 marks out of 50 total marks to Topic A. Is this allocation within acceptable limits?
  • No, the allocation must be exact: 12.5 marks, and no rounding is permitted
  • Yes, the acceptable tolerance is ±2 percentage points; 13/50 = 26%, which is within ±2 percentage points of 25% ✔️
  • No, the allocation should always round down to avoid over-testing any topic
  • Yes, but only if Topic A contains application-level questions
Correct Answer Logic:
The Table of Specification guideline states: Percent of instruction time = Percent of examination value (within ±2 percent). Topic A received 25% instructional time. Allocated: 13/50 = 26%. Since 26% is within ±2 percentage points of 25%, this is acceptable. If the allocation fell outside the 23%–27% band (for example, 14 marks = 28%), revision would be needed.
Uploaded by: Fani Warraich
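The arithmetic in the Question 354 explanation can be expressed as a small check. The function name `within_tolerance` and its parameter names are hypothetical; the ±2 percentage point tolerance is the one stated in the explanation.

```python
def within_tolerance(instruction_pct: float, marks: float, total_marks: float,
                     tolerance_pts: float = 2.0):
    """Compare a topic's share of exam marks with its share of instruction time.

    Returns (is_acceptable, exam_pct), where exam_pct is the topic's
    percentage of total marks.
    """
    if total_marks <= 0:
        raise ValueError("total_marks must be positive")
    exam_pct = 100.0 * marks / total_marks
    return abs(exam_pct - instruction_pct) <= tolerance_pts, exam_pct


# Values from the question: 25% instruction time, 13 of 50 marks.
ok, exam_pct = within_tolerance(25.0, 13, 50)
print(ok, exam_pct)
```

Running the same check with 14 marks gives 28%, which falls outside the tolerance and would call for revising the allocation.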
Teaching QUESTION #6781
Question 355
Which of the following BEST explains why portfolios have very low reliability as an assessment tool compared to standardized tests?
  • Portfolios always favor high-ability students because they choose their own best work
  • Portfolio scoring lacks the standardized criteria, uniform conditions, and objective scoring procedures that characterize reliable measurement; different assessors apply different criteria to different content ✔️
  • Portfolios contain too many items, which statistically reduces internal consistency
  • Portfolios are only valid for formative purposes and reliability is not applicable to them
Correct Answer Logic:
Reliability requires consistency across administrations, scorers, and conditions. Portfolios are diverse in content (student-selected), judged subjectively by different assessors using loosely defined criteria, and lack uniform conditions. All of these factors introduce variability into scores, which by definition reduces reliability. This is listed as a known weakness of portfolios.
Uploaded by: Fani Warraich
Teaching QUESTION #6782
Question 356
A test developer notices that one MCQ option is consistently longer and more qualified than the other three options, and it is also the correct answer. What test construction error has occurred and how should it be corrected?
  • The stem does not present a definite problem; rewrite the stem to be a direct question
  • The relative length of the correct answer provides an unintentional clue; revise distractors to be approximately equal in length to the correct answer by adding qualifying phrases ✔️
  • There are too few distractors; add a fifth option
  • The item uses a negative stem; convert to a positive format
Correct Answer Logic:
Suggestion 8 in MCQ construction states: the relative length of alternatives should not provide a clue to the answer. Correct answers tend to require qualification to be unambiguously true, making them longer. The fix is to deliberately add similar qualifying phrases to the distractors to equalize length, removing the length clue while preserving plausibility.
Uploaded by: Fani Warraich
Teaching QUESTION #6783
Question 357
The National Education Assessment System (NEAS) was established in Pakistan primarily with funding from the World Bank and DfID in 2003. What was its PRIMARY assessment purpose, and how does this differ from the purpose of the Board of Intermediate and Secondary Education (BISE)?
  • NEAS certifies individual student performance for promotion; BISE monitors national education standards
  • NEAS conducts large-scale national assessments to inform policy, monitor curriculum implementation standards, and identify achievement correlates at the system level; BISE conducts high-stakes individual certification examinations (SSC, HSSC) at grades 10 and 12 ✔️
  • NEAS administers competitive entrance examinations for public sector jobs; BISE focuses on diagnostic assessment
  • Both serve identical purposes but operate at different administrative levels
Correct Answer Logic:
NEAS is a system-level monitoring body. It conducts large-scale assessments to give federal policymakers a picture of education quality, monitor curriculum translation into learning, and identify factors affecting achievement. It does not certify individual students. BISE, in contrast, conducts individual-certification high-stakes examinations (SSC at grade 10, HSSC at grade 12) that directly determine students' academic credentials.
Uploaded by: Fani Warraich
Teaching QUESTION #6784
Question 358
A test item reads: 'Which of the following is NOT an example of formative assessment?' with correct answer option 'Final examination'. A student incorrectly answers 'Weekly quiz'. According to item analysis in CTT, if students with high total test scores systematically choose 'weekly quiz' as their answer, what does the discrimination index likely indicate?
  • A high positive D value, indicating the item works well
  • A negative D value, indicating that the item functions in reverse (high achievers are misled while lower achievers answer correctly) and needs immediate review ✔️
  • A D value near 0, indicating the item has no differential effect
  • A D value above 0.40, indicating excellent item quality
Correct Answer Logic:
If high-scoring students are more likely to answer incorrectly than low-scoring students, the item discrimination index D = (Upper Group % Correct) − (Lower Group % Correct) will be negative. Negative D values are the most serious item analysis warning sign, typically indicating a keying error, ambiguous wording, or a misleading clue that favors lower-ability guessers over informed higher-ability students.
Uploaded by: Fani Warraich
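The discrimination index from the Question 358 explanation can be computed directly from upper-group and lower-group results. The sketch below expresses D as a difference of proportions (the percentage form divided by 100). The function name and the two groups of ten 0/1 scores are hypothetical; how the upper and lower groups are formed is left outside this sketch.

```python
def discrimination_index(upper_scores, lower_scores):
    """D = proportion correct in the upper group minus that in the lower group.

    Each argument is a non-empty list of 0/1 item scores for one group.
    """
    if not upper_scores or not lower_scores:
        raise ValueError("both groups must be non-empty")
    p_upper = sum(upper_scores) / len(upper_scores)
    p_lower = sum(lower_scores) / len(lower_scores)
    return p_upper - p_lower


# Hypothetical item results: high scorers are mostly misled by the item.
upper_group = [0, 0, 1, 0, 0, 1, 0, 0, 0, 0]   # 2 of 10 correct
lower_group = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]   # 6 of 10 correct

d = discrimination_index(upper_group, lower_group)
print(f"D = {d:.2f}")
```

Here D works out to 0.2 − 0.6, about −0.40: the negative sign is the warning described in the explanation.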
Teaching QUESTION #6785
Question 359
Bloom's original (1956) Taxonomy listed 'Synthesis' above 'Analysis' and 'Evaluation' as the highest level. Anderson and Krathwohl's revised taxonomy (2001) made two significant changes. Which accurately describes BOTH changes?
  • They added a seventh level called 'Innovation' and renamed 'Knowledge' to 'Remembering'
  • They reversed the two highest categories (Synthesis, renamed Creating, moved above Evaluation) and changed all category names from nouns to verbs ✔️
  • They collapsed the taxonomy to four levels and merged Synthesis with Evaluation
  • They added the Psychomotor and Affective domains to the Cognitive domain
Correct Answer Logic:
In the Revised Bloom's Taxonomy by Anderson and Krathwohl: (1) All category names were changed from nouns to verbs (Knowledge → Remembering; Comprehension → Understanding; etc.). (2) The two highest levels were reversed: Creating (formerly Synthesis) became the highest level, above Evaluating. This reflects the view that generating new ideas is cognitively more demanding than judging existing ones.
Uploaded by: Fani Warraich
Teaching QUESTION #6786
Question 360
A test packaging decision involves arranging items 'from easy to hard'. Which of the following is the MOST psychologically sound rationale for this arrangement?
  • Easier items have higher discrimination and should be answered first to maximize test reliability
  • Beginning with accessible items reduces test anxiety, builds examinee confidence, and motivates engagement, benefiting students who struggle with initial performance anxiety without disadvantaging stronger students ✔️
  • Harder items lose marks if unattempted, so they must be placed last to ensure all items are reached
  • This arrangement ensures the test follows a norm-referenced scoring pattern
Correct Answer Logic:
The test administration literature supports easy-to-hard arrangement primarily for psychological reasons: it provides a positive start, reduces anxiety, builds confidence, and motivates students to continue engaging with the test. This is particularly beneficial for test-anxious or lower-ability students and does not penalize stronger students who find the early items trivially easy.
Uploaded by: Fani Warraich