Abstract
Rapid progress in adversarial learning has enabled the generation of realistic-looking fake visual content. Several detection techniques have been proposed to distinguish fake visual content from real, but the performance of most of them drops significantly when the test and training data are sampled from different distributions. This motivates efforts to improve the generalization of fake detectors. Since current fake content generation techniques do not accurately model the frequency spectrum of natural images, we observe that the frequency spectrum of fake visual data contains discriminative characteristics that can be used to detect fake content. We also observe that the information captured in the frequency spectrum differs from that in the spatial domain. Using these insights, we propose to complement frequency- and spatial-domain features using a two-stream convolutional neural network architecture called TwoStreamNet. We demonstrate the improved generalization of the proposed two-stream network to several unseen generation architectures, datasets, and techniques. The proposed detector significantly outperforms current state-of-the-art fake content detectors, and fusing the frequency and spatial streams further improves its generalization.
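To illustrate the kind of input a frequency-domain stream operates on, the following is a minimal sketch (not the paper's implementation; the function name and parameters are illustrative assumptions) of computing a shifted log-magnitude spectrum of an image, in which the periodic upsampling artifacts of generative models typically appear as distinctive peaks:

```python
import numpy as np

def frequency_features(image: np.ndarray) -> np.ndarray:
    """Return the shifted log-magnitude spectrum of a grayscale image.

    Upsampling in generative pipelines tends to leave periodic artifacts
    that appear as peaks in this spectrum, which a frequency-stream CNN
    can learn to discriminate.
    """
    spectrum = np.fft.fft2(image)
    spectrum = np.fft.fftshift(spectrum)    # move the DC component to the center
    magnitude = np.log1p(np.abs(spectrum))  # compress the dynamic range
    return magnitude

# Example: a pure horizontal cosine concentrates spectral energy at two
# symmetric peaks on either side of the (centered) DC component.
h, w = 64, 64
x = np.cos(2 * np.pi * 8 * np.arange(w) / w)  # spatial frequency 8 along width
image = np.tile(x, (h, 1))
feat = frequency_features(image)
peak = np.unravel_index(np.argmax(feat), feat.shape)  # offset of 8 from center
```

Such a spectrum (or a radially averaged profile of it) can be fed to the frequency stream alongside the raw pixels consumed by the spatial stream.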






Notes
The eyes and the mouth are used as the mesoscopic features for forgery detection in Deepfake videos.
Funding
The authors did not receive support from any organization for the submitted work.
Ethics declarations
Conflict of interest
We wish to confirm that there are no known conflicts of interest associated with this publication.
Ethical approval
We confirm that the manuscript has been read and approved by all named authors and that there are no other persons who satisfied the criteria for authorship but are not listed. We further confirm that the order of authors listed in the manuscript has been approved by all of us. We understand that the Corresponding Author is the sole contact for the Editorial process (including Editorial Manager and direct communications with the office).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Yousaf, B., Usama, M., Sultani, W. et al. Fake visual content detection using two-stream convolutional neural networks. Neural Comput & Applic 34, 7991–8004 (2022). https://doi.org/10.1007/s00521-022-06902-5