Traditionally, for an item-based educational assessment, the item discrimination index has been used to indicate how well individual items discriminate between candidates of different abilities, as reflected in their overall performance on the assessment.
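(For a dichotomously scored item, a common form of this index is, if I understand correctly, $D = p_U - p_L$, the proportion of a high-scoring group answering the item correctly minus the proportion of a low-scoring group doing so; an item-total correlation is another common variant. I mention this only to make clear what I mean by "discrimination" in the item-based setting.)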
Where an assessment is not item-based, however, an educator may still want to determine whether the distribution of total marks looks healthier this year (in the sense of appearing more discriminatory) than in a previous year when performance was heavily skewed towards high marks.
Basic methods could be used to compare total scores across two consecutive years: comparing histograms and boxplots, and comparing medians, ranges, and minimum and maximum scores, with a view to finding evidence of improved score properties such as a closer approximation to Normality (if that were a preferred outcome). A rough sketch of the kind of comparison I have in mind is below.
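For concreteness, here is a minimal sketch in Python with made-up data; the year labels, distribution parameters, and the choice of summaries (IQR, skewness, Shapiro-Wilk) are purely illustrative of the basic approach, not a proposed standard.

```python
# Minimal sketch: compare two years' total-score distributions with
# simple summaries and histograms. The data below are simulated
# placeholders purely for illustration.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
scores_prev = rng.normal(78, 6, 200).clip(0, 100)   # earlier year: bunched towards high marks
scores_curr = rng.normal(65, 12, 200).clip(0, 100)  # current year: wider spread

def summarise(label, x):
    # Location, spread, extremes, skewness, and a rough Normality check
    print(f"{label}: median={np.median(x):.1f}, IQR={stats.iqr(x):.1f}, "
          f"min={x.min():.1f}, max={x.max():.1f}, "
          f"skew={stats.skew(x):.2f}, Shapiro-Wilk p={stats.shapiro(x).pvalue:.3f}")

summarise("previous year", scores_prev)
summarise("current year", scores_curr)

fig, axes = plt.subplots(1, 2, sharex=True, sharey=True)
axes[0].hist(scores_prev, bins=20)
axes[0].set_title("Previous year totals")
axes[1].hist(scores_curr, bins=20)
axes[1].set_title("Current year totals")
plt.show()
```

This is essentially the ad hoc approach I described above, which is exactly why I am asking whether something more principled exists.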
However, I would be interested to hear from people involved in educational assessment, including researchers on this topic, who have a well-defined system in place for monitoring the quality of an assessment based on an analysis of total scores across a cohort. Specifically, I mean cases where the test is not item-based but there is nevertheless an interest in obtaining evidence of the test's discriminatory power from the standalone test scores alone (rather than from a comparison with other scores alleged to represent true ability).
I appreciate that there are multiple confounders that could influence apparent differences in discriminatory power from one year to another, including factors relating to the exam candidates themselves. However, my intention is to set that discussion aside for now, thanks, and start from an 'all other factors being equal' perspective.