Literally speaking, I would say no, as the McGurk effect specifically refers to the integration of auditory speech information and visual lip information. But of course, there are many other situations in which different modalities provide information that can be integrated. The ventriloquism illusion is one case in point (see e.g. https://www.researchgate.net/publication/6886086_The_aftereffects_of_ventriloquism_patterns_of_spatial_generalization?ev=srch_pub), but there are many others.
I think the direct realist theory of speech perception would locate integration at the level of the directly perceived motion, whether or not that view is right. Maybe you can find something useful in the discussion of Tadoma here: Oerlemans, M. and P. Blamey (1998). Touch and auditory-visual speech perception. Hearing by Eye: Part 2, The Psychology of Speechreading and Auditory-visual Speech. R. Campbell, B. Dodd and D. Burnham. Hillsdale, NJ, Erlbaum: 245-281.