DeepFake Video Detection using Vision Transformer

Document Type : Original Article

Authors

1 Computer Science, Faculty of Computer & Information, Fayoum University

2 Department of Computer Science, Faculty of Computers and Artificial Intelligence, Fayoum University Egypt

Abstract

Technology is always a double-edged sword, and with the rapid advances in technology, the DeepFake problem is expected to become more common and more serious. DeepFake has recently caused considerable trouble because its harms outweigh its benefits. It contributes to the deception of individuals, the instability of principles, and the falsification of evidence. Beyond affecting individuals, it has led to multiple incidents that damaged the image of entire nations. In this work, a model is built to mitigate the negative effects of DeepFake and protect individuals' reputations by detecting alterations in people's photographs and videos. The model integrates two vision transformer architectures, Deep-ViT and Cross-ViT, and processes pre-extracted faces from the FF++ dataset. It distinguishes real from fake faces from two perspectives: subclass detection for each manipulation method and overall detection across all manipulation types. The proposed model achieves strong results, with its highest accuracy, 98%, on the FaceSwap manipulation method.

Keywords