Deepfakes can spread misinformation, defamation, and propaganda by faking videos of public speakers. We assume that future deepfakes will be visually indistinguishable from real video, and will also fool current deepfake detection methods. As such, we posit a social verification system that instead validates the truth of an event via a set of videos. To confirm which, if any, videos are being faked at any point in time, we check for consistent facial geometry across videos. We demonstrate that by comparing mouth movement across views using a combination of PCA and hierarchical clustering, we can detect a deepfake with subtle mouth manipulations out of a set of six videos with high accuracy. Using our new multi-view dataset of 25 speakers, we show that our performance degrades gracefully as we increase the number of identically faked videos from different input views.
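As a rough illustration of this idea, the sketch below builds a per-view mouth-motion descriptor with PCA and then groups the views with hierarchical clustering, flagging the minority cluster as suspect. The array shapes, descriptor choice, file-free interface, and clustering parameters are our own assumptions for illustration and are not the system's actual implementation.

```python
# Minimal sketch (not the authors' code): flag the disagreeing camera view by
# projecting each view's mouth-landmark motion with PCA and clustering the
# resulting per-view descriptors hierarchically. Landmarks are assumed to be
# already normalized for face scale/position across views.
import numpy as np
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def view_descriptor(mouth_landmarks, n_components=5):
    """mouth_landmarks: (frames, points, 2) array of 2D mouth landmarks.
    Returns a fixed-length descriptor of that view's mouth motion."""
    frames = mouth_landmarks.reshape(len(mouth_landmarks), -1)  # (frames, points*2)
    frames = frames - frames.mean(axis=0)                       # remove per-view offset
    coeffs = PCA(n_components=n_components).fit_transform(frames)
    # Summarize the motion by the spread of each principal component over time.
    return coeffs.std(axis=0)

def find_fake_views(landmarks_per_view):
    """landmarks_per_view: list of (frames, points, 2) arrays, one per camera.
    Returns the indices of views in the minority cluster (suspected fakes)."""
    descriptors = np.stack([view_descriptor(lm) for lm in landmarks_per_view])
    # Agglomerative clustering on pairwise distances between view descriptors,
    # cut into (at most) two groups: consistent views vs. the odd ones out.
    labels = fcluster(linkage(pdist(descriptors), method="average"),
                      t=2, criterion="maxclust")
    minority_label = np.argmin(np.bincount(labels)[1:]) + 1
    return np.where(labels == minority_label)[0].tolist()
```

Treating the minority cluster as fake is what makes performance degrade as more views are faked identically: once the faked views outnumber the real ones, the split no longer isolates them.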
Eleanor also wrote an article on deepfakes and this approach in XRDS: Crossroads, the ACM's student magazine. The article's DOI is here.
We have two repositories:
Download: Here from Brown University's library.
We captured 24 participants speaking arbitrary sentences from 6 time-synchronized DSLR cameras. These input videos were then turned into a set of deepfakes by shuffling audio between videos and synthesizing new matching mouth motions via LipGAN. We also include facial landmarks extracted from both the real and fake videos, which are the input to our video matching code.
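For orientation only, a hypothetical loading snippet is shown below. The per-camera .npy layout, file paths, and 68-point landmark convention are assumptions for illustration; the actual landmark format in the release may differ. It reuses find_fake_views from the sketch above.

```python
# Hypothetical usage on the released landmarks; the file layout below
# (one .npy of 68-point face landmarks per camera) is an assumption,
# not the dataset's documented format.
import numpy as np

MOUTH = slice(48, 68)  # mouth points in the standard 68-landmark layout

views = []
for cam in range(6):
    lm = np.load(f"participant01/cam{cam}_landmarks.npy")  # (frames, 68, 2), assumed path
    views.append(lm[:, MOUTH, :])

print("suspected fake view(s):", find_fake_views(views))
```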
PLEASE NOTE: These data are for preliminary experiments and are released for scientific reproducibility only. While we have captured some diversity in human appearance across our 24 participants, these data are heavily biased. Nobody should expect experimental findings generated with these data to hold across wider populations, and nobody should train a machine learning model on these data and deploy it anywhere.