GradeHub is now a part of the Turnitin family. To grade assessments, including bubble sheets, try Gradescope by Turnitin.

Designing Exams

Class size, subject matter, and purpose all shape how faculty design exams. Whatever the situation, exams should be valid, reliable, relevant, and realistic. The items (i.e., test questions) used in an exam can range from true/false questions to essays. A well-written exam helps the teacher understand what students have learned and gives both teacher and student feedback that can inform instruction.

Large class sections often constrain exam design. For example, the grading effort that essays require, combined with the need for timely feedback, often rules out essay-only exams in lecture halls. The subject also matters: in some courses, such as one on writing strategies, selected-response questions (e.g., true/false or multiple-choice) are rarely used. Purpose is the third consideration: is the exam a quiz to keep students on track, or a final to make judgments about student learning? Quizzes give faculty and students informative feedback, and such formative assessments generally benefit from a quick turnaround. Summative assessments, such as midterms and finals, are not inherently time-sensitive, but they often become so when final grades are due to the registrar or district office.

Designing Exams – Characteristics

Exams should have the following characteristics:

Validity – refers to the degree to which assessment scores can be interpreted as a meaningful indicator of the skills or abilities of interest.

Reliability – refers to the extent to which an assessment yields the same results on different occasions. Ideally, if an assessment is given to two groups of test-takers with equal ability under the same testing conditions, the results of the two assessments should be the same, or very similar.

Relevance – refers to the degree to which an assessment addresses the information taught in class. Conversely, irrelevance occurs when an assessment requires background knowledge or cognitive abilities that were not part of instruction (for example, the ability to read a bubble chart).

Realism – refers to whether students can reasonably complete the exam within the time allotted.

There should be no surprises: an exam should cover the most important information taught in the course. One way to achieve this is to document the course's learning outcomes and align the assessment with them. As a general rule, assessments that focus too heavily on details (e.g., isolated facts and figures) “will probably lead to better student retention of the footnotes at the cost of the main points” (Halpern & Hakel, 2003, p. 40). As Table 1 summarizes, each type of exam item may be better suited to measuring some learning outcomes than others, and each has its advantages and disadvantages in terms of ease of design, implementation, and scoring.

Designing Exams – Question Types

Table 1. Advantages and disadvantages of common exam item types

True-False
Advantages: Many items can be administered in a relatively short time. Moderately easy to write; easily scored.
Disadvantages: Limited primarily to testing knowledge of information. Easy to guess correctly on many items, even if the material has not been mastered.

Multiple-Choice
Advantages: Can be used to assess a broad range of content in a brief period. Skillfully written items can measure higher-order cognitive skills. Can be scored quickly.
Disadvantages: Difficult and time-consuming to write good items. Possible to assess higher-order cognitive skills, but most items assess only knowledge. Some correct answers can be guesses.

Matching
Advantages: Items can be written quickly. A broad range of content can be assessed. Scoring can be done efficiently.
Disadvantages: Higher-order cognitive skills are difficult to assess.

Short Answer or Completion
Advantages: Many items can be administered in a brief amount of time. Relatively efficient to score. Moderately easy to write.
Disadvantages: Difficult to identify defensible criteria for correct answers. Limited to questions that can be answered or completed in very few words.

Essay
Advantages: Can be used to measure higher-order cognitive skills. Questions are relatively easy to write. Difficult for respondents to get the correct answer by guessing.
Disadvantages: Time-consuming to administer and score. Difficult to identify reliable criteria for scoring. Only a limited range of content can be sampled during any one testing period.

Piontek, M. E. “Best Practices for Designing and Grading Exams.” CRLT, Center for Research on Learning and Teaching, University of Michigan.
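To put the point about guessing in numbers, here is a small Python sketch. It assumes a hypothetical 20-item quiz with a hypothetical passing threshold of 12 correct, and a student who guesses blindly and independently on every item, with each option equally likely (a simple binomial model). Real guessing is rarely that blind, since partial knowledge lets students eliminate options, so treat the results as a rough illustration rather than a measurement.

```python
from math import comb


def expected_guess_score(num_items: int, num_options: int) -> float:
    """Expected number of correct answers from pure random guessing."""
    return num_items / num_options


def prob_at_least(num_items: int, num_options: int, threshold: int) -> float:
    """Probability of getting at least `threshold` items right by guessing.

    Assumes independent guesses, each option equally likely (binomial model).
    """
    p = 1 / num_options  # chance of guessing a single item correctly
    return sum(
        comb(num_items, k) * p**k * (1 - p) ** (num_items - k)
        for k in range(threshold, num_items + 1)
    )


# Hypothetical quiz: 20 items, passing threshold of 12 correct (60%).
NUM_ITEMS = 20
PASS_THRESHOLD = 12

for label, options in [("True-False", 2), ("Multiple-Choice (4 options)", 4)]:
    expected = expected_guess_score(NUM_ITEMS, options)
    chance = prob_at_least(NUM_ITEMS, options, PASS_THRESHOLD)
    print(f"{label}: expected {expected:.1f} correct, P(pass) = {chance:.4f}")
```

Under these assumptions, a student who knows none of the material has roughly a one-in-four chance of reaching 12 of 20 on the true-false version, while on the four-option multiple-choice version that chance falls well under 1 percent.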