This is a difficult issue. Stan Dehaene has some data showing that as individuals become skilled readers, activation for other classes of visual objects is reduced in this region; responses to faces and objects, for example, decline there. I have unpublished data, collected in conjunction with the published meta-analysis on reading in different languages (Bolger, Perfetti, & Schneider, 2005, HBM), showing that the locus for faces and other visual objects tended to lie about 10 mm medial to the typical visual word form area (VWFA) reported in that paper.
Strange as it may seem, visual word form plays hardly any role in fluent reading, and certainly none in early reading. All extant word recognition models are based on the individual constituent letters of a word and their positions, the latter being especially critical. Dependence on visual word form can occur, however, in cases where the letter information is so poor that it precludes fluent reading. Reading then becomes considerably slower and more error-prone, and visual word form is really the last resort. Readers confronted with degraded text that forces dependence on word form generally stop reading altogether. Correspondingly, there appears to be a lower bound on reading speed of about 20 words per minute.