The temporal and spatial aspects of neural face processing have been investigated rigorously, but few studies have combined these dimensions to reveal the spatio-temporal dynamics postulated by models of face processing. We used support vector machine (SVM) decoding and representational similarity analysis (RSA) to combine information from different brain locations (fMRI), time windows (EEG), and theoretical models. By correlating representational dissimilarity matrices (RDMs) derived from multiple pairwise classifications of neural responses to different facial expressions (neutral, happy, fearful, angry), we found that early EEG time windows (starting around 130 ms) matched fMRI data from primary visual cortex (V1), whereas later time windows (starting around 190 ms) matched data from lateral occipital cortex, the fusiform face complex, and the temporal-parietal-occipital junction (TPOJ). According to model comparisons, the EEG classification results were better explained by low-level visual features than by expression intensity or category. In fMRI, model comparisons revealed a change along the processing hierarchy, from coding of low-level visual features in V1 to coding of expression intensity in the right TPOJ. These results highlight the importance of a multimodal approach for understanding the functional roles of different brain regions in face processing.
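The EEG-fMRI fusion described above can be illustrated with a minimal Python sketch. It assumes NumPy, SciPy, and scikit-learn. The function names (`decoding_rdm`, `rdm_similarity`, `fusion`), the linear-kernel SVM, 5-fold cross-validation, and Spearman correlation over the RDM upper triangle are illustrative choices, not necessarily the study's exact pipeline. Each RDM entry is the cross-validated accuracy of a classifier trained on one pair of expressions: the higher the accuracy, the more distinguishable (dissimilar) the two response patterns. The demo data are synthetic and bear no relation to the study's recordings.

```python
import itertools

import numpy as np
from scipy.stats import spearmanr
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Condition order used for RDM rows/columns (integer codes 0..3).
CONDITIONS = ["neutral", "happy", "fearful", "angry"]


def decoding_rdm(patterns, labels, n_conditions, cv=5):
    """Return an n_conditions x n_conditions RDM of pairwise SVM decoding accuracies.

    patterns: (n_trials, n_features), e.g. EEG channels in one time window
              or voxels in one fMRI region of interest (hypothetical layout).
    labels:   (n_trials,) integer condition codes 0..n_conditions-1.
    """
    rdm = np.zeros((n_conditions, n_conditions))
    for i, j in itertools.combinations(range(n_conditions), 2):
        mask = (labels == i) | (labels == j)
        # Mean cross-validated accuracy of a linear SVM separating conditions i and j.
        acc = cross_val_score(SVC(kernel="linear"), patterns[mask], labels[mask], cv=cv).mean()
        rdm[i, j] = rdm[j, i] = acc  # higher accuracy = more dissimilar responses
    return rdm


def rdm_similarity(rdm_a, rdm_b):
    """Spearman correlation between the upper triangles (diagonal excluded) of two RDMs."""
    iu = np.triu_indices_from(rdm_a, k=1)
    rho, _ = spearmanr(rdm_a[iu], rdm_b[iu])
    return float(rho)


def fusion(eeg_rdms, roi_rdms):
    """Correlate every EEG time-window RDM with every fMRI ROI RDM.

    Returns {roi_name: [rho for each time window, in order]}.
    """
    return {roi: [rdm_similarity(eeg, roi_rdm) for eeg in eeg_rdms]
            for roi, roi_rdm in roi_rdms.items()}


if __name__ == "__main__":
    # Synthetic data only: 40 trials per condition, 10 features,
    # with condition means spaced along one random direction.
    rng = np.random.default_rng(0)
    labels = np.repeat(np.arange(len(CONDITIONS)), 40)
    direction = rng.normal(size=10)
    patterns = labels[:, None] * 0.5 * direction + rng.normal(size=(labels.size, 10))
    print(np.round(decoding_rdm(patterns, labels, len(CONDITIONS)), 2))
```

The same `rdm_similarity` call can also compare a neural RDM with a model RDM (for example, one built from low-level image features or from rated expression intensity), which is one common way to carry out model comparisons in RSA.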