In recent years consumer imaging technology (digital cameras, mobile phones, etc.) has become ubiquitous, allowing people the world over to take and share images and video instantaneously. Mirroring this rise in digital imagery is the associated ability for even relatively unskilled users to manipulate and distort the message of the visual media. While many manipulations are benign, performed for fun or for artistic value, others are for adversarial purposes, such as propaganda or misinformation campaigns.
This manipulation of visual media is enabled by the wide-scale availability of sophisticated image and video editing applications that permit editing in ways that are very difficult to detect either visually or with current image analysis and visual media forensics tools. The problem isn’t limited to the fashion and cosmetics industries where photos are “touched-up” and “augmented” to make models look better and the results of skin-care products look (instantly) appealing — it’s spread to politics and now even business.
The best-known manipulation or tampering technique in this regard is the deepfake. Deepfakes are synthetic media in which a person in an existing image or video is replaced with someone else’s likeness. While the act of faking content is not new, deepfakes leverage powerful techniques from machine learning and artificial intelligence to manipulate or generate visual and audio content with a high potential to deceive. Deepfakes have garnered widespread attention for their uses in celebrity pornographic videos, revenge porn, fake news, hoaxes, and financial fraud. This has elicited responses from both industry and government to detect and limit their use.
Deepfake technology also has beneficial applications: the gaming, advertising, and film industries stand to gain from it, because deepfakes allow an actor’s dialogue or expressions to be replaced synthetically, which not only eases editing work and saves time but can also reduce production costs. Deepfakes can also be used in a variety of social applications, such as remote teaching, speech therapy, virtual or personalized digital assistants, and real-time language translation. There are also potential applications of deepfake techniques in medical technology.
However, ready access to deepfake technology also allows cybercriminals, political activists and nation-states to quickly create cheap, realistic forgeries. Deepfakes are also used to create fake videos of politicians in order to spread fake news. Deepfakes rose to prominence in July 2017, when researchers at the University of Washington released a fabricated video of former U.S. President Barack Obama, warning the general public about the potential disruptive interference of deepfake technology. Later, in May 2018, a low-quality deepfake video of President Donald Trump urging Belgians to withdraw from the Paris climate agreement was uploaded to social media. These incidents showed that the technology is continually evolving and has the ability to mislead a large segment of the public.
Deep fakes will pose the most risk when combined with other technologies and social trends: they will enhance cyberattacks, accelerate the spread of propaganda and disinformation online, and exacerbate declining trust in democratic institutions. There are increasing instances of social media being abused to interfere with elections.
Face manipulation can be used for bullying, revenge porn, political sabotage, falsified video evidence, blackmail, and even propaganda. The spread of deep fakes and the services they facilitate can potentially lead to the suppression of information and a general breakdown in confidence in, and trust of, public authorities. Photographs and videos are often used as evidence in police investigations and in the courtroom to resolve legal cases, since they are considered reliable sources. However, increasingly sophisticated technology has led to new video and photo editing techniques that have potentially made these pieces of evidence unreliable. In short, the deepfake has become a major problem in today’s society.
Deep fakes will provide new tools to cyber attackers, for example in sophisticated phishing attacks. The ability to mimic a voice could be used to supplement cyber-attacks by automating, for example, a voicemail that suggests opening a spear-phishing email. In March 2019, in the first reported use of deep fakes in a cybercrime operation, criminals used AI to impersonate an executive’s voice, duping the CEO of a UK energy firm into transferring them €220,000. There is also evidence that deep fake content can fool biometric scanners, such as facial recognition systems.
Information Warfare tool
This technology lowers the costs of engaging in information warfare at scale and broadens the range of actors able to engage in it. Today, propaganda is largely generated by humans, such as China’s ‘50-centers’ and Russian ‘troll farm’ operators. However, improvements in deep fake technology, especially text-generation tools, could help take humans ‘out of the loop’. The key reason for this isn’t that deep fakes are more authentic than human-generated content, but rather that they can produce ‘good enough’ content faster, and more economically, than current models for information warfare. Deep fake technology will be a particular value-add to the so-called Russian model of propaganda, which emphasises volume and rapidity of disinformation over plausibility and consistency in order to overwhelm, disorient and divide a target.
“Where things get especially scary is the prospect of malicious actors combining different forms of fake content into a seamless platform,” Andrew Grotto at the Center for International Security at Stanford University said. “Researchers can already produce convincing fake videos, generate persuasively realistic text, and deploy chatbots to interact with people. Imagine the potential persuasive impact on vulnerable people that integrating these technologies could have: an interactive deepfake of an influential person engaged in AI-directed propaganda on a bot-to-person basis.”
In 2019, journalists discovered that intelligence operatives had allegedly created a false LinkedIn profile for a ‘Katie Jones’, probably to collect information on security professional networks online. Researchers exposed the Katie Jones fake through technical photo analysis and a rather old-fashioned mechanism: asking the employer listed on LinkedIn (the Center for Strategic and International Studies) if such a person worked for it.
Deepfake technology
Image and video manipulation techniques such as deepfakes rely on artificial intelligence, specifically machine learning, a branch of AI that enables a system to learn from available data.
Deep fake processes can be applied to the full spectrum of digital media:
Face swapping: Users insert the face of a target onto another body. This process can be applied to both still images and video. Simple versions of this technique are available online through purpose-made apps.
Re-enactment: The face from a target source is mapped onto a user, allowing the faker to manipulate the target’s facial movements and expressions.
Lip syncing: Users copy mouth movements over a target video. Combined with audio generation, this technique can make a target appear to say false content.
Motion transfer: The body movements of a person in a source video can be transferred to a target in an authentic video recording.
Image generation: A user can create entirely new images; for example, faces, objects, landscapes or rooms.
Audio generation: Users create a synthesized voice from a small audio sample of an authentic voice.
Text generation: A user can generate artificial text, including short-form ‘comments’ on social media or web forums, or long-form news or opinion articles. Artificially generated comments are particularly effective, as there’s a wide margin for acceptable error for this type of online content.
Some deepfake programs use Google Image Search and various social media sites to gather source data and then swap faces automatically. Because the programs are based on machine learning, human interaction is not needed, even for supervision. Deep learning techniques are also used to enhance the performance of image compression: generative models and dimensionality-reducing autoencoders are used to create compact representations of images. Autoencoders, by minimizing a reconstruction loss, are able to extract a compressed representation of an image, and can therefore maintain good compression performance compared with existing image compression methods.
The system builds a model of a person saying or doing something by using large datasets containing recordings, videos or photos. The manipulation is achieved by using hundreds or thousands of photos of the targeted person as a dataset. Two sets of training images are needed to train the deepfake program. The first set consists of sample images of the face that is to be replaced; these samples can easily be obtained from video. To achieve better and more realistic results, the first set can be further extended with photos from other sources. The second set consists of photos of the face that will be exchanged into the video.
An encoder first encodes each image using a deep convolutional neural network, and a decoder then decodes it to reconstruct the image. Training continues, using backpropagation, until the output closely matches the input. Since the process is time-consuming, graphics processing units (GPUs) are used. After training is complete, the faces are swapped frame by frame: the face of person A is extracted using face detection and fed to the encoder, but instead of being passed back to its original decoder, the resulting representation is reconstructed by person B’s decoder. In this way, the expressions and pose of person A in the original video are rendered onto person B’s face. Finally, the newly created face is merged back into the original frame.
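As a minimal sketch of the shared-encoder, per-identity-decoder arrangement described above, the PyTorch code below trains one encoder and two decoders by reconstruction, then swaps faces by routing A’s code through B’s decoder. The network sizes, 64x64 inputs, and training details are illustrative assumptions, not the exact architecture of any specific tool.

```python
# Shared encoder + two decoders (one per identity), trained by reconstruction.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 256),                           # latent code
        )
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(256, 64 * 16 * 16)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),   # 16 -> 32
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(), # 32 -> 64
        )
    def forward(self, z):
        return self.net(self.fc(z).view(-1, 64, 16, 16))

encoder = Encoder()
decoder_a, decoder_b = Decoder(), Decoder()   # one decoder per identity
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder_a.parameters()) + list(decoder_b.parameters()),
    lr=1e-4,
)
loss_fn = nn.L1Loss()

def train_step(faces_a, faces_b):
    """One reconstruction step: each decoder learns to rebuild its own identity."""
    opt.zero_grad()
    loss = loss_fn(decoder_a(encoder(faces_a)), faces_a) + \
           loss_fn(decoder_b(encoder(faces_b)), faces_b)
    loss.backward()
    opt.step()
    return loss.item()

# Inference-time swap: route A's latent code through B's decoder.
# fake_b = decoder_b(encoder(face_a))
```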
Generative adversarial networks (GANs) can shorten and automate the training process. In this approach, two neural networks compete against one another to produce a deep fake: the generator creates fake videos or photos, while the discriminator attempts to identify them as fake. Each time the discriminator identifies output as fake, it effectively gives the generator feedback on what to correct in the next attempt. The more flaws the discriminator finds, the better the generator becomes; as the discriminator gets better at spotting fake videos, the generator gets better at making them. GAN models are now widely accessible, and many are available for free online.
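The sketch below illustrates this generator-versus-discriminator loop in PyTorch on flattened toy images. The latent size, network widths, and learning rates are illustrative assumptions rather than the settings of any particular deepfake system.

```python
# Minimal GAN training loop: discriminator separates real/fake, generator tries to fool it.
import torch
import torch.nn as nn

latent_dim, image_dim = 64, 784              # e.g. flattened 28x28 toy images

generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, image_dim), nn.Tanh(),
)
discriminator = nn.Sequential(
    nn.Linear(image_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),                        # single real/fake logit
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def gan_step(real_images):
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1) Discriminator learns to separate real samples from generated ones.
    fake_images = generator(torch.randn(batch, latent_dim)).detach()
    loss_d = bce(discriminator(real_images), real_labels) + \
             bce(discriminator(fake_images), fake_labels)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Generator is updated so the discriminator labels its output "real".
    fake_images = generator(torch.randn(batch, latent_dim))
    loss_g = bce(discriminator(fake_images), real_labels)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```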
Deepfake tools are available commercially
There are also tools in development for creating deepfakes for voice. Software from Adobe that can reliably mimic a speaker based on 20 minutes of voice data has been demonstrated publicly, but the prototype appears to be unreleased as of this writing. Voice mimicking software that only requires one minute of voice data is publicly available but produces output with notable artifacts and a robotic tone.
On Reddit, an app called FakeApp was released that guided users through the essential steps of the deepfake algorithm. With such applications, even users with limited knowledge of machine learning and programming can create a deepfake image or video.
Free software and trending smartphone applications such as FaceSwap or Zao allow everyday users to create and distribute content. Other services can be accessed at low cost: the Lyrebird voice generation service, for instance, offers subscription packages for its tools. In short: deep fake technology has been democratized.
Phone applications, such as FaceApp, permit manipulation of facial characteristics. Video software libraries such as DeepFaceLab and FaceSwap are available as public or open-source systems.
False news stories and so-called deepfakes are increasingly sophisticated, making them harder for data-driven software to spot. Although the software that makes deepfakes possible is inexpensive and easy to use, existing video analysis tools are not yet up to the task of identifying what is real and what has been cooked up. In the near future, deepfake techniques may also provide full-fledged digital avatars as vehicles for self-expression for people who need them.
Deepfake detection
While earlier deepfake technology can create a video that will pass the “first glance” test, it will contain telling artifacts on closer examination (including unnatural mouth and eyebrow movements). Newer techniques are more powerful and can capture integrated head position and rotation movements, facial expressions (including eyebrow movements and blinks), and eye movements.
Although the use of machine learning and artificial intelligence has made deepfake technology efficient, there are still defects in its algorithms that can be exploited. Various detection strategies like CNNs, multimedia forensics, and watermarking can be used to efficiently detect deepfakes. Detection allows appropriate action to be taken to either delete the content or to flag the content as tampered.
Although these algorithms can create credible altered images or videos, they often still fail to reproduce small details such as natural eye blinking. Differences in eye color are also used to detect deepfakes.
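As a hedged illustration of one classic blink-based cue, the sketch below computes the eye-aspect-ratio (EAR) heuristic and counts blinks across a clip; an unnaturally low blink rate is one possible red flag. The landmark coordinates are assumed to come from an external facial-landmark detector (for example a 68-point model), and the threshold values are illustrative assumptions.

```python
# Eye-aspect-ratio (EAR) blink counting from per-frame eye landmarks.
import numpy as np

def eye_aspect_ratio(eye):
    """eye: (6, 2) array of landmarks around one eye, ordered p1..p6."""
    vertical = np.linalg.norm(eye[1] - eye[5]) + np.linalg.norm(eye[2] - eye[4])
    horizontal = np.linalg.norm(eye[0] - eye[3])
    return vertical / (2.0 * horizontal)

def count_blinks(ear_per_frame, threshold=0.2, min_consecutive=2):
    """Count blinks as runs of low-EAR frames; a long clip with almost no blinks
    is one heuristic signal that the face may be synthesized."""
    blinks, run = 0, 0
    for ear in ear_per_frame:
        if ear < threshold:
            run += 1
        else:
            if run >= min_consecutive:
                blinks += 1
            run = 0
    if run >= min_consecutive:
        blinks += 1
    return blinks
```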
Image forensics interprets subtle parameters such as pixel correlation, image continuity, and lighting. Multimedia forensics relies on every phase of an image’s history, including storage in a compressed or different format, the acquisition process, and any post-processing steps, each of which may leave a unique trace in the data, like a fingerprint. Using this fingerprint, multimedia forensics examines the information and determines whether the data or a feature has been altered.
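One simple heuristic in this family is error-level analysis, which exploits compression traces of the kind described above: re-compressing an image at a known quality and inspecting where the re-compression error is unusually uneven can highlight regions edited after the original save. The sketch below, using Pillow, is a minimal illustration; the file names and quality setting are assumptions.

```python
# Error-level analysis: compare an image against its own re-compressed copy.
from PIL import Image, ImageChops
import io

def error_level_analysis(path, quality=90):
    original = Image.open(path).convert("RGB")
    buffer = io.BytesIO()
    original.save(buffer, "JPEG", quality=quality)   # re-compress at a known quality
    buffer.seek(0)
    recompressed = Image.open(buffer)
    # Per-pixel absolute difference; regions edited after the original
    # compression often show a different error level than their surroundings.
    return ImageChops.difference(original, recompressed)

# Hypothetical usage:
# ela_map = error_level_analysis("suspect.jpg")
# ela_map.save("suspect_ela.png")
```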
Watermarking allows identifying whether editing has taken place: when watermarked content is manipulated, tell-tale traces are left behind. These traces remain detectable even if the content is shared on social media or forums, so the modified or altered elements carry that evidence with them and can alert recipients that the content is fake.
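One common interpretation of this idea is a fragile watermark: a known bit pattern embedded in the image that is destroyed by later editing, so a failed check flags tampering. The numpy sketch below is only an illustration of that concept; the seed and the least-significant-bit embedding scheme are assumptions, not a description of any specific watermarking product.

```python
# Fragile LSB watermark: embed a pseudo-random bit pattern, then verify it later.
import numpy as np

def embed_watermark(image, seed=42):
    """image: uint8 array. Overwrite the least-significant bit of every pixel
    with a pseudo-random bit derived from a known seed."""
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, size=image.shape, dtype=np.uint8)
    return (image & 0xFE) | bits

def watermark_intact(image, seed=42):
    """Re-generate the same bit pattern and check it still matches the LSBs;
    editing the watermarked pixels breaks the pattern and the check fails."""
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, size=image.shape, dtype=np.uint8)
    return np.array_equal(image & 1, bits)
```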
Compared to detection done by humans, convolutional neural networks (CNNs) and similar approaches work on the principles of machine learning and can detect deepfake content through powerful image analysis. These algorithms can be deployed on information-sharing platforms and embedded in social media, running in the background to continually monitor uploaded content and determine whether it is authentic or fake. This allows fake content to be flagged to users or removed in a timely manner, before it is widely disseminated.
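The sketch below shows what such a CNN-based screening step might look like in PyTorch: a small binary classifier over face crops whose output probability is thresholded to decide whether an upload should be flagged. The architecture, 128x128 input size, and threshold are illustrative assumptions, not any particular production detector.

```python
# Small CNN real/fake classifier plus a simple upload-screening helper.
import torch
import torch.nn as nn

class DeepfakeClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 128 -> 64
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 64 -> 32
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 1)   # single "fake" logit

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def screen_upload(model, face_crop, threshold=0.5):
    """face_crop: (3, 128, 128) tensor. Return True if the crop should be
    flagged as likely manipulated (probability above the chosen threshold)."""
    model.eval()
    with torch.no_grad():
        prob_fake = torch.sigmoid(model(face_crop.unsqueeze(0))).item()
    return prob_fake > threshold
```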
The Defense Advanced Research Projects Agency (DARPA), through its MediFor program, is researching the possibility of creating an integrated media forensics platform—essentially, a deepfake detection toolkit. Such a toolkit could make the production of deepfakes more computationally expensive, which may reduce their use until the cost of computation decreases significantly.
U.S. Army’s Successive Subspace Learning, or SSL, technology is a game-changer for deepfake detection
Researchers at the U.S. Army Combat Capabilities Development Command, known as DEVCOM, Army Research Laboratory, in collaboration with Professor C.-C. Jay Kuo’s research group at the University of Southern California, set out to tackle the significant threat that deepfakes pose to our society and national security. The result is an innovative technological solution called DefakeHop. The researchers worked under the laboratory director’s Research Award for External Collaborative Initiative and the Army AI Innovation Institute. Their work is featured in the paper titled “DefakeHop: A light-weight high-performance deepfake detector,” which was presented at the IEEE International Conference on Multimedia and Expo 2021 in July.
Deepfake refers to artificial intelligence-synthesized, hyper-realistic video content that falsely depicts individuals saying or doing something, said ARL researchers Dr. Suya You and Dr. Shuowen (Sean) Hu. Most state-of-the-art deepfake video detection and media forensics methods are based upon deep learning, which have many inherent weaknesses in terms of robustness, scalability and portability.
“Due to the progression of generative neural networks, AI-driven deepfake advances so rapidly that there is a scarcity of reliable techniques to detect and defend against deepfakes,” You said. “There is an urgent need for an alternative paradigm that can understand the mechanism behind the startling performance of deepfakes and develop effective defense solutions with solid theoretical support.”
Combining team member experience with machine learning, signal analysis and computer vision, the researchers developed an innovative theory and mathematical framework, the Successive Subspace Learning, or SSL, as an innovative neural network architecture. SSL is the key innovation of DefakeHop, researchers said.
“SSL is an entirely new mathematical framework for neural network architecture developed from signal transform theory,” Kuo said. “It is radically different from the traditional approach, offering a new signal representation and process that involves multiple transform matrices in cascade. It is very suitable for high-dimensional data that have short-, mid- and long-range covariance structures. SSL exploits such a property naturally in its design. It is a completely data-driven, unsupervised framework that offers a brand new tool for image processing and understanding tasks such as face biometrics.”
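As a loose, conceptual illustration only, the numpy sketch below shows the general idea of cascaded, data-driven subspace transforms learned without backpropagation: each stage learns a PCA-like basis from the previous stage’s output and keeps the leading components. This is not the Saab transform or the actual DefakeHop implementation; the patch size, stride, and component counts are arbitrary assumptions.

```python
# Cascaded data-driven subspace transforms (rough conceptual sketch).
import numpy as np

def extract_patches(images, size=4, stride=4):
    """images: (n, h, w) grayscale. Return an (n_patches, size*size) patch matrix."""
    n, h, w = images.shape
    patches = [
        images[i, y:y + size, x:x + size].reshape(-1)
        for i in range(n)
        for y in range(0, h - size + 1, stride)
        for x in range(0, w - size + 1, stride)
    ]
    return np.array(patches)

def learn_subspace(vectors, keep=8):
    """One unsupervised stage: a PCA basis of the input space via SVD
    (no labels, no backpropagation), keeping the `keep` leading directions."""
    centered = vectors - vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:keep]                         # (keep, input_dim) transform matrix

def apply_subspace(vectors, basis):
    return (vectors - vectors.mean(axis=0)) @ basis.T

# Two successive stages: the second transform is learned on the first stage's output.
# images = np.random.rand(32, 32, 32)        # toy data, illustrative only
# p1 = extract_patches(images); b1 = learn_subspace(p1); f1 = apply_subspace(p1, b1)
# b2 = learn_subspace(f1, keep=4);            f2 = apply_subspace(f1, b2)
```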
Most current state-of-the-art techniques for deepfake video detection and media forensics are based on the deep learning mechanism, You said. According to the team, DefakeHop has several significant advantages over the current state of the art, including:
-It is built upon the entirely new SSL signal representation and transform theory. It is mathematically transparent, since its internal modules and processing are explainable.
-It is a weakly supervised approach, providing a one-pass learning mechanism (without needing backpropagation) that saves labeling cost and has significantly lower training complexity.
-It generates significantly smaller model sizes and parameter counts. Its complexity is much lower than that of the state of the art, and it can be effectively implemented on tactical edge devices and platforms.
-It is robust to adversarial attacks. Deep learning based approaches are vulnerable to adversarial attacks; this research provides a robust spatial-spectral representation that purifies adversarial inputs, so adversarial perturbations can be effectively and efficiently defended against.
This research supports the Army’s and lab’s AI and ML research efforts by introducing and studying an innovative machine learning theory and its computational algorithms applied to intelligent perception, representation and processing, You said. “We expect future Soldiers to carry intelligent yet extremely low size-weight-power vision-based devices on the battlefield,” You said. “Today’s machine learning solution is too sensitive to a specific data environment. When data are acquired in a different setting, the network needs to be re-trained, which is difficult to conduct in an embedded system. The developed solution has quite a few desired characteristics, including a small model size, requiring limited training data, with low training complexity and capable of processing low-resolution input images. This can lead to game-changing solutions with far reaching applications to the future Army.”
The researchers successfully applied the SSL principle to resolve several face biometrics and general scene understanding problems. Coupled with the DefakeHop work, they developed a novel approach called FaceHop, based on the SSL principle, for a challenging problem: recognition and classification of face gender under low image quality and low-resolution conditions. The team continues to develop novel solutions and scientific breakthroughs for face biometrics and for general scene understanding, for example target detection, recognition and semantic scene understanding.
“We all have seen AI’s substantial impact on society, both good and bad, and AI is transforming many things,” Hu said. “Deepfake is an adverse example. The creation of sophisticated computer-generated imagery has been demonstrated for decades through the use of various visual effects in the entertainment industry, but recent advances in AI and machine learning have led to a dramatic increase in the realism of fake content and the ease of access to these tools.” The research team has the opportunity to address these challenging issues, which have both military and everyday impact.
“We see this research as new, novel, timely and technically feasible today,” You said. “It is a high risk, high innovation effort with transformative potential. We anticipate that this research will provide solutions with significant advantages over current techniques, and add important new knowledge to the sciences of artificial intelligence, computer vision, intelligent scene understanding and face biometrics.”
Indian Researchers Develop Deepfake Detection Technology
The Indian Institute of Technology in Ropar (IIT-Ropar) and Monash University in Australia have developed ‘FakeBuster’, a deepfake detector to identify and prevent imposters from attending video conferences and manipulating faces on social media. Deepfakes use artificial intelligence (deep learning) to manipulate images, audio, and video on the Internet.
FakeBuster is a deep learning-based solution that helps detect if a video is manipulated during a video-conference meeting. Amid the pandemic, where the majority of work or meetings are online, this standalone solution enables a user to detect if another person’s video is manipulated or spoofed during a video conferencing. The software is independent of video conferencing solutions and has been tested for its effectiveness on Skype and Zoom. It also detects deepfakes where faces have been manipulated on social media, according to a report.
Sophisticated artificial intelligence techniques have spurred a dramatic increase in the manipulation of media content. Such techniques keep evolving and becoming more realistic, which makes detection difficult, noted Dr Abhinav Dhall, one of the members of the four-member team that developed FakeBuster. The other members are Assistant Professor Ramanathan Subramanian and two students, Vineet Mehta and Parul Gupta. The team claims that the tool has over 90% accuracy.
The tool was presented at the 26th International Conference on Intelligent User Interfaces, held in the US in April 2021. FakeBuster can function online and offline. It uses a 3D convolutional neural network to predict segment-wise fakeness scores for a video, and has been extensively trained on datasets such as Deeperforensics, DFDC, VoxCeleb, and deepfake videos created from locally captured images (for video-conferencing scenarios).
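As a hedged sketch of segment-wise fakeness scoring with a 3D CNN, in the spirit of the description above, the PyTorch code below scores non-overlapping 16-frame segments and averages them into a clip-level score. The layer sizes, segment length, input resolution, and averaging policy are illustrative assumptions and not the FakeBuster implementation itself.

```python
# Small 3D CNN that scores video segments, plus clip-level aggregation.
import torch
import torch.nn as nn

class Segment3DCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.head = nn.Linear(32, 1)    # per-segment "fake" logit

    def forward(self, segment):         # segment: (batch, 3, frames, h, w)
        return self.head(self.features(segment).flatten(1))

def clip_fakeness(model, video, segment_len=16):
    """video: (3, total_frames, h, w). Score each non-overlapping segment and
    average the per-segment fakeness probabilities into one clip-level score."""
    model.eval()
    scores = []
    with torch.no_grad():
        for start in range(0, video.shape[1] - segment_len + 1, segment_len):
            seg = video[:, start:start + segment_len].unsqueeze(0)
            scores.append(torch.sigmoid(model(seg)).item())
    return sum(scores) / len(scores) if scores else 0.0
```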
Dr Dhall said that the usage of manipulated media content in spreading fake news has been widely observed with major repercussions. He said such manipulations have recently found their way into video-calling platforms through spoofing tools based on the transfer of facial expressions. These fake facial expressions are often convincing to the human eye and can have serious implications. These real-time mimicked visuals, or deepfakes, can even be used during online examinations and job interviews.
Since the tool can presently be attached to laptops and desktops only, the team aims to make the network smaller and lighter so that it can run on mobile phones as well, Subramanian said. He added that the team is also working on using the tool to detect fake audio. The solution will help fight deepfakes, which are a rising concern: their spread will make it increasingly difficult for the public to distinguish between what is real and what is fake, a situation that insidious actors will inevitably exploit, with potentially devastating consequences.
References and Resources also include
https://www.eurekalert.org/pub_releases/2021-04/uarl-bat042921.php
http://www.jatit.org/volumes/Vol97No22/7Vol97No22.pdf
https://www.aspi.org.au/report/weaponised-deep-fakes
https://opengovasia.com/indian-researchers-develop-deepfake-detection-technology/