Dear fellow researchers,

While drafting an article about the importance and interplay of previous knowledge and learning with (multiple) external representations (combinations of text and pictures, diagrams, etc.), I repeatedly stumbled over cases of the Expertise-Reversal-Effect that do not seem to be explainable in the conventional terms of Cognitive Load Theory (CLT).

So, I would like to share these findings with you and to invite you to think about alternative explanations.

What is the Expertise-Reversal-Effect (ERE)?

The core idea behind CLT is that the better one's previous knowledge is organized (as chunks), the less one's working memory is loaded when solving problems or learning new content. This holds true for most experimental observations. However, in some cases, high previous knowledge (HPK) leads to lower performance than that of participants with low previous knowledge (LPK). This effect is called the Expertise-Reversal-Effect (ERE): HPK learners profit less, or even not at all, from a specific treatment than their LPK counterparts.

How is the ERE explained in terms of the Cognitive Load Theory (CLT)?

To explain this contradiction, CLT also proposes an executive role of previous knowledge, which guides search-and-find processes. These cognitive procedures can conflict with the instructional format, just as previous knowledge can conflict with the presented content. So, as Slava Kalyuga states, "if external guidance is provided to learners who have sufficient knowledge base for dealing with the same units of information, learners would have to relate and reconcile the related components of available long-term memory base and externally provided guidance. Such integration processes may impose an additional working memory load and reduce resources available for learning new knowledge."

Article Expertise Reversal Effect and Its Implications for Learner-T...

The fact that previous knowledge may induce additional cognitive load would explain lower (absolute) learning outcomes of HPK learners with a specific treatment in comparison to their HPK counterparts without treatment. It would also explain lower learning gains compared to their LPK counterparts with treatment (provided ceiling effects can be excluded).

However, this explanation is hard to follow for the case in which HPK learners with treatment show lower (absolute) learning outcomes than their LPK counterparts with treatment, as this implies (on the CLT interpretation) that the instructional treatment must have had an enormous effect on cognitive load, overcompensating any advantage of previous knowledge.
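
To keep the three comparisons above apart, here is a minimal sketch in Python. It is only an illustration: the container `CellMeans`, the function `ere_patterns` and all numbers are hypothetical and not taken from any of the cited studies, and the sketch compares raw cell means without any significance testing.

```python
from dataclasses import dataclass


@dataclass
class CellMeans:
    """Mean learning outcomes in a 2x2 design (hypothetical container)."""
    hpk_treated: float   # high previous knowledge, with treatment
    hpk_control: float   # high previous knowledge, without treatment
    lpk_treated: float   # low previous knowledge, with treatment
    lpk_control: float   # low previous knowledge, without treatment


def ere_patterns(m: CellMeans) -> dict:
    """Flag which of the three ERE comparisons holds for the given means."""
    hpk_gain = m.hpk_treated - m.hpk_control
    lpk_gain = m.lpk_treated - m.lpk_control
    return {
        # HPK learners do worse with the treatment than without it
        "hpk_harmed_by_treatment": m.hpk_treated < m.hpk_control,
        # HPK learners gain less from the treatment than LPK learners
        "hpk_gains_less_than_lpk": hpk_gain < lpk_gain,
        # HPK learners end up below LPK learners in absolute terms
        "hpk_below_lpk_with_treatment": m.hpk_treated < m.lpk_treated,
    }


# Made-up means, chosen only so that all three patterns occur at once
example = CellMeans(hpk_treated=60.0, hpk_control=70.0,
                    lpk_treated=65.0, lpk_control=50.0)
flags = ere_patterns(example)
for name, present in flags.items():
    print(f"{name}: {present}")
```

The third flag is the case that is hard to reconcile with CLT: it requires the treatment to cost HPK learners more than their entire knowledge advantage.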

What evidence for, and limitations of, the explanation given by CLT have been observed?

There are convincing examples that clearly trigger a cognitive conflict between participants' mental models and the presented information, as in Schnotz & Bannert:

Article Construction and interference in learning from multiple representation

However, these experiments heavily (and intentionally) manipulated previous or presented knowledge to yield their effects. Most treatments are much less intrusive, and therefore their effects in terms of interference between previous knowledge and presented content (including treatment) should be milder. Furthermore, CLT does not take into account the ability to ignore treatments such as signaling by color coding, although this ability has been demonstrated in an eye-tracking study by Richter and Scheiter:

Article Studying the expertise reversal of the multimedia signaling ...

In this study, the recall performance of HPK and LPK learners with that simple treatment is equal, whereas without treatment it differs significantly, as expected (cf. Fig. 3). The same holds for the comprehension measures in Richter, Wehrle & Scheiter (cf. Fig. 3):

Article How the poor get richer: Signaling guides attention and fost...

Even more intriguing are the findings of Kragten, Admiraal & Rijlaarsdam, who report, in an analysis of diagram difficulty without any treatment, that diagrams with low cognitive demand (i.e. diagrams with low complexity) are even slightly better understood by LPK than by HPK learners. (Diagrams with high complexity, by contrast, show the expected pattern.) Moreover, diagrams that use unfamiliar conventions AND pose high cognitive demands are understood significantly better than those that are either complex or use unfamiliar conventions (cf. Fig. 2):

Article Diagrammatic Literacy in Secondary Science Education

These are some of the ERE findings that are particularly surprising and, in my humble opinion, cannot be explained in a plausible way within the framework of CLT.

Is a Dual Processing hypothesis a sufficient candidate for explaining these findings?

Reading the book “Thinking, Fast and Slow” by Daniel Kahneman, I came across the hypothesis (to my knowledge originated by Stanovich and West) that two kinds of cognitive processes govern problem solving and decision making in economics. According to that theory, most cognitive processes in daily life (and learning) run on an automated basis, relying on previously acquired cognitive schemata (system 1). These processes require minimal mental effort but are prone to errors. However, if system 1 processes do not lead to a sufficient solution, or if attention is intentionally shifted to the given problem, system 2 kicks in and starts deeper elaboration processes. So, perceiving a problem as hard to solve, or being forced to focus attention on a given problem, should significantly decrease the error rate. Also see:

Article Dual-Process Theories of Higher Cognition

This theory has recently been applied to several fields; to my knowledge, however, it has not yet been applied to learning and teaching, and especially not to multimedia instructional design and external representations.

So my Questions for Discussion:

  • In your opinion, is there a need for an alternative explanation of the Expertise-Reversal-Effect? (And why do you think so?)
  • In your opinion, are Dual Processing Theories a good candidate to explain the given data, or are there even better ways to do so?
  • In your opinion, how can an ERE be predicted before the experiment, based on CLT or any other theory?