What Makes a High-Quality Exam Question in Health Sciences Education?

The Foundation of Fair Assessment
Multiple choice questions are the backbone of summative assessment in health sciences education. They determine whether students advance, graduate, and ultimately enter clinical practice. Because the stakes are so high, the quality of these questions is not just an academic detail; it is fundamental to upholding professional standards.
Poorly constructed questions create two dangerous outcomes. On one hand, a competent student may fail due to a confusing question, leading to delayed progression and unnecessary tuition costs. On the other, an unprepared student may pass by exploiting test-wise cues, eventually entering patient care without the required competence.
What Defines a High-Quality Multiple Choice Question?
High-quality exam questions share four essential characteristics that work together to create fair, valid, and reliable assessment.
1. Clarity and Focus
The question stem must present a clear, focused problem that students can understand without ambiguity. Unfocused or unclear stems force students to guess what the question is really asking.
If a student cannot demonstrate their knowledge because the question is confusing or misleading, the exam fails to accurately measure student learning.
A focused stem presents one specific scenario or problem. It avoids vague wording, unnecessary complexity, or multiple unrelated concepts packed into a single question. Students should immediately understand what cognitive task is required.
2. Alignment with Learning Outcomes
Quality questions directly assess specific, stated learning objectives from the curriculum. When questions test trivial details, obscure facts, or content never covered in class, they fail to measure meaningful competence.
Each question should map clearly to a course outcome and assess knowledge, application, or analysis at the appropriate cognitive level. This alignment ensures that exam results reflect actual learning rather than random recall of peripheral information.
3. Freedom from Test-Wise Cues
Poorly worded questions inadvertently give away the correct answer through grammatical inconsistencies, absolute terms, or recognizable patterns. Students who recognize these test-wise cues can answer correctly without actually knowing the content.
Common cues include stems that grammatically match only one answer, correct answers that are consistently longer or more detailed than distractors, or the systematic use of "all of the above" and "none of the above" options. These flaws reward "test-wise" students instead of actually testing their clinical knowledge.
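Several of these cues are mechanical enough to check with simple heuristics. The sketch below is purely illustrative (it is not how any particular tool works, and the `flag_cues` helper and its 1.5x length threshold are invented for this example); it flags a few of the patterns described above:

```python
import statistics

def flag_cues(stem: str, options: list[str], answer_index: int) -> list[str]:
    """Heuristically flag common test-wise cues in a multiple-choice item."""
    flags = []
    lowered = [o.lower() for o in options]
    # "All/none of the above" options reward elimination strategies.
    for phrase in ("all of the above", "none of the above"):
        if any(phrase in o for o in lowered):
            flags.append(f'contains "{phrase}"')
    # A correct answer much longer than the distractors cues test-wise students.
    lengths = [len(o) for o in options]
    others = [l for i, l in enumerate(lengths) if i != answer_index]
    if lengths[answer_index] > 1.5 * statistics.mean(others):
        flags.append("correct answer is noticeably longer than distractors")
    # Absolute terms are disproportionately associated with incorrect options.
    for i, o in enumerate(lowered):
        if any(w in o.split() for w in ("always", "never", "all", "none")):
            flags.append(f"option {i} uses an absolute term")
    return flags
```

Heuristics like these catch only surface-level cues; judging whether a stem is focused or a distractor is plausible still requires content expertise.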
4. Homogeneous and Plausible Answer Choices
All answer choices should belong to the same category and appear equally plausible to students who haven't mastered the content. Heterogeneous answer choices that mix different types of responses make distractors obviously wrong.
Quality distractors represent common misconceptions or errors that learners actually make, not implausible options that serve as obvious fillers.
For example, if three answers are specific drug names and one is a drug class, most students will quickly eliminate the outlier.
Studies consistently show that 50% or more of multiple choice questions in health sciences education contain item-writing flaws. These flaws reduce reliability, introduce bias, and prevent exams from accurately measuring student competence. The result? Assessment that fails its fundamental purpose.
How Quality Impacts Psychometric Performance
High-quality questions demonstrate superior psychometric values that indicate a more valid, reliable assessment.
Validity: Measuring What Matters
Validity refers to whether a question actually measures the intended learning outcome. Flawed questions often test reading comprehension, test-wiseness, or trivial recall instead of meaningful clinical knowledge. When questions lack clarity or focus, validity suffers because student performance reflects confusion rather than competence.
Reliability: Consistent Results
Reliability indicates whether an exam produces consistent, reproducible results. Questions with ambiguous wording, grammatical cues, or heterogeneous answers introduce random error that reduces reliability. This makes it harder to distinguish prepared from unprepared students: scores vary for reasons unrelated to what examinees actually know about the course content.
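For exams scored dichotomously (each item right or wrong), internal-consistency reliability is commonly estimated with the Kuder-Richardson 20 (KR-20) coefficient. As a minimal sketch of the calculation (the `kr20` helper is written for this example, not taken from any particular tool):

```python
def kr20(responses: list[list[int]]) -> float:
    """Kuder-Richardson 20 reliability for dichotomously scored items.

    `responses` is one list of 0/1 item scores per student.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    # Proportion answering each item correctly (p); pq = p * (1 - p).
    p = [sum(r[i] for r in responses) / n_students for i in range(n_items)]
    pq_sum = sum(pi * (1 - pi) for pi in p)
    # Population variance of total scores.
    totals = [sum(r) for r in responses]
    mean_total = sum(totals) / n_students
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students
    return (n_items / (n_items - 1)) * (1 - pq_sum / var_total)
```

Items that add only random noise pull the coefficient down, which is one concrete way flawed questions degrade an exam's overall reliability.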
Discrimination: Separating Strong from Weak Students
Quality questions discriminate between students who have mastered content and those who haven't. Point-biserial correlations and discrimination indices quantify this separation. Flawed questions show poor discrimination for two reasons: test-wise students answer correctly despite limited knowledge, and knowledgeable students miss confusing questions as often as unprepared students do. Either way, the question fails to separate competence from its absence.
Why Do Faculty Write Flawed Questions?
Despite extensive resources on item-writing best practices, faculty across health sciences programs consistently produce questions containing multiple flaws. The reasons are not a lack of effort or intelligence but systemic challenges:
- Inadequate Training: Most health sciences faculty have never received formal training in writing assessment items. Item writing is a specialized skill; without that training, even experienced educators rely on intuition and repeat flawed patterns they've seen on previous exams.
- Severe Time Constraints: Health sciences faculty juggle teaching, research, service, and clinical responsibilities. Exam writing often happens under deadline pressure, leaving little time for careful revision or peer review. It's faster to write questions that "feel right" than to methodically apply item-writing principles.
- Lack of Systematic Quality Control: In many programs, questions go directly from author to exam without structured review, feedback, or empirical analysis. The same flaws therefore persist year after year, embedded in question banks that are reused without revision.
The Challenge of Identifying Flaws in Your Own Work
Even faculty who understand item-writing principles struggle to identify flaws in their own questions. Psychological factors make self-review inherently limited.
As question authors, faculty know what they intended to ask. This makes it difficult to see ambiguity, unclear wording, or unintended cues that confuse students. The author's knowledge of the "right answer" creates blind spots that prevent objective evaluation.
Additionally, reviewing one's own work requires switching from a creative to a critical mindset. After investing time crafting a question, it's psychologically difficult to identify fundamental problems that require substantial revision. The natural tendency is to see questions as better than they actually are.
This is why peer review and external feedback prove so valuable. Fresh eyes catch problems that authors miss.
Automated Quality Analysis for Every Question
ExamEval provides AI-powered analysis that identifies item-writing flaws and quality issues across entire exams in minutes. Rather than relying on scarce peer reviewer time or faculty self-review, educators receive immediate, comprehensive feedback on every question.
The system analyzes each item against established best practices, flagging specific problems like unfocused stems, test-wise cues, heterogeneous answers, and grammatical inconsistencies. This automated review catches flaws that even experienced educators miss in their own work.
Objective Feedback Without the Awkwardness
ExamEval delivers constructive criticism without the interpersonal dynamics that complicate traditional peer review. There's no risk of offending colleagues, straining professional relationships, or receiving superficial feedback from reviewers who want to avoid conflict.
The analysis is thorough, specific, and actionable. Rather than vague suggestions to "improve clarity," ExamEval identifies exactly what makes a stem unclear and suggests concrete revisions. This specificity accelerates improvement and builds faculty skills over time.
Supporting Faculty Development at Scale
By providing detailed rationale for each identified flaw, ExamEval serves as an ongoing professional development tool. Health sciences faculty learn item-writing principles through repeated exposure to specific feedback on their own questions, which is a far more effective approach than generic workshops.
Over time, health sciences educators internalize these principles and write higher-quality questions from the start. The system scales what would require extensive peer reviewer training and coordination, making systematic quality improvement feasible even for large health sciences programs with limited assessment expertise.
Preserving Faculty Autonomy
ExamEval provides expert feedback, but health sciences faculty retain full control over final questions. The system flags potential issues and suggests improvements, but authors make all final decisions about whether to accept, modify, or reject recommendations.
This respects faculty expertise and program-specific curricular context while providing the external perspective needed to catch flaws. It's collaborative enhancement, not automated replacement of professional judgment.
By making rigorous quality analysis accessible and efficient, ExamEval enables health sciences faculty to create exams that truly measure student competence. The result is fairer assessment, better data on student learning, and confidence that progression decisions reflect actual readiness for clinical practice.
Summary: Key Takeaways for High-Quality Questions
Writing effective multiple-choice questions is a critical skill for any health sciences educator. High-quality questions are clear, aligned with learning outcomes, free of test-wise cues, and feature plausible, homogeneous distractors. By avoiding common flaws and leveraging tools like ExamEval for automated analysis, faculty can create fairer, more reliable assessments that accurately measure student competence and uphold the highest standards of professional education.
References
- Downing SM. The effects of violating standard item writing principles on tests and students: the consequences of using flawed test items on achievement examinations in medical education. Adv Health Sci Educ Theory Pract. 2005;10(2):133-143. doi:10.1007/s10459-004-4019-5
- Tarrant M, Knierim A, Hayes SK, Ware J. The frequency of item writing flaws in multiple-choice questions used in high stakes nursing assessments. Nurse Educ Today. 2006;26(8):662-671. doi:10.1016/j.nedt.2006.07.006
- Haladyna TM, Downing SM, Rodriguez MC. A review of multiple-choice item-writing guidelines for classroom assessment. Appl Meas Educ. 2002;15(3):309-334. doi:10.1207/S15324818AME1503_5
- Hye T, Hossain MF, Roni MA. Interventions to Improve the Quality of Multiple-Choice Questions: A Systematic Review and Meta-Analysis. Am J Pharm Educ. 2025;89(11):101872. doi:10.1016/j.ajpe.2025.101872
