The McGurk illusion effectively demonstrates the audiovisual nature of speech perception. When an auditory syllable is dubbed onto an incongruent visual syllable, the resulting percept is usually neither of the components alone but their combination or fusion. The present experiment investigated the persistence of the McGurk effect when the facial configuration context of the audiovisual stimuli was manipulated. Two congruent and two incongruent audiovisual syllables were created from spoken /ipi/ and /iki/. These audiovisual tokens were uttered by seven facial configurations of five talkers. All facial configurations produced a clear McGurk effect (reported /iti/ for heard /ipi/ + seen /iki/), but the effect was significantly weaker when the syllables were uttered by an asymmetrically scrambled face configuration. The results also showed significant differences between talkers in the persistence of the McGurk effect. In sum, facial configural information can be used in the audiovisual integration of certain speech segments. This information is not necessary for integration to occur, but the integration process can be disrupted by face stimuli that violate normal configural information.