Item difficulty is used to screen out questions that are too easy or too difficult. For a standardized test, where the main goal is to distinguish high performers from low performers, a very easy or very difficult item does not serve much of a purpose because nearly everyone gets it right or nearly everyone gets it wrong, so it provides no information about who is a high performer and who is a low performer. However, such an item might serve a purpose on a test that is specifically geared toward content mastery. You may want to ensure that students have mastered a very easy concept, and so you include an item that specifically addresses this content.
You can figure out how difficult an item is simply by tallying the number of people who got the item correct and dividing this number by the total number of people who took the test. This gives you the proportion of test takers who got the item right. The technical term for this proportion is the p-value (not to be confused with the p-value used in significance testing). Low p-values mean that the item was difficult and high p-values mean that it was easy.
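As a rough illustration of the calculation just described, here is a minimal Python sketch. The function name `item_p_values` and the 0/1 scoring layout (one row per test taker, one column per item) are assumptions made for this example; they do not reflect how GradeHub stores or computes results.

```python
def item_p_values(responses):
    """Return the proportion of test takers who answered each item correctly.

    `responses` is a list of rows, one per test taker. Each row holds
    1 (correct) or 0 (incorrect) for every item, in the same item order.
    This layout is an assumption for the example.
    """
    if not responses:
        raise ValueError("need at least one test taker")
    n_takers = len(responses)
    n_items = len(responses[0])
    if any(len(row) != n_items for row in responses):
        raise ValueError("every test taker must have a score for every item")
    # For each item: number correct divided by number of test takers.
    return [sum(row[i] for row in responses) / n_takers for i in range(n_items)]


# Hypothetical data: four test takers, three items.
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 1, 0],
]
p_values = item_p_values(scores)
print(p_values)  # [1.0, 0.75, 0.25]
```

In this made-up data set, everyone answered the first item correctly (p = 1.0, very easy), while only one of four answered the third item correctly (p = 0.25, difficult).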
Generally, professional test developers like to see items that range in difficulty from just above chance performance (for example, p-values no lower than 0.30) to just below the ceiling (for example, p-values no higher than 0.90). A good spread of difficulties across items ensures that your test is able to measure all along the continuum of mastery.
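The screening range above could be applied along these lines. This is only a sketch: the function name `flag_items`, the item labels, and the p-values are hypothetical, and the 0.30 and 0.90 cutoffs are the example thresholds from the text, not fixed rules.

```python
def flag_items(p_values, low=0.30, high=0.90):
    """Flag items whose p-value falls outside the [low, high] range.

    `p_values` maps an item label to its proportion correct.
    The default cutoffs are the example values from the text.
    """
    flags = {}
    for item, p in p_values.items():
        if p < low:
            flags[item] = "too difficult"
        elif p > high:
            flags[item] = "too easy"
    return flags


# Hypothetical p-values for four items.
example = {"Q1": 0.95, "Q2": 0.75, "Q3": 0.25, "Q4": 0.30}
flagged = flag_items(example)
print(flagged)  # {'Q1': 'too easy', 'Q3': 'too difficult'}
```

An item sitting exactly on a cutoff (Q4 here) is kept, matching the "no lower than" and "no higher than" wording. A flagged item is a candidate for review rather than automatic removal, since a very easy item may be intentional on a mastery-oriented test.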
All that said, item difficulty analysis can give you ideas about which types of items were easier or more difficult for students, and this information can improve future item writing. You can compare item difficulty with your test blueprint and see if your difficulty distribution was in the ballpark. If not, you will know what kind of items to include the next time around.
In GradeHub, you can review an item’s difficulty summarized in our Item Matrix report and its p-value in our Item Analysis report.