In recent years consumer imaging technology (digital cameras, mobile phones, etc.) has become ubiquitous, allowing people the world over to take and share images and video instantaneously. This rise in digital imagery has been mirrored by the ability of even relatively unskilled users to manipulate visual media and distort its message. Such manipulation is enabled by the wide-scale availability of sophisticated image and video editing applications that permit editing in ways that are very difficult to detect, either visually or with current image analysis and visual media forensics tools.
The problem isn’t limited to the fashion and cosmetics industries where photos are “touched-up” and “augmented” to make models look better and the results of skin-care products look (instantly) appealing — it’s spread to politics and now even business. While many manipulations are benign, performed for fun or for artistic value, others are for adversarial purposes, such as propaganda or misinformation campaigns.
Both the bipartisan Senate Intelligence Committee and Special Counsel Robert Mueller found that Russia’s social-media influence operation started in 2014 and included dispatching Russian intelligence operatives to the United States to study how to maximize the effectiveness of Moscow’s social-media campaign to divide Americans and give one presidential candidate an advantage over another.
Recently, advances in AI have made it possible to “edit” videos just as seamlessly as pictures. Though the software that makes deepfakes possible is inexpensive and easy to use, existing video analysis tools aren’t yet up to the task of identifying what’s real and what’s been cooked up. “Deepfake” photographs, videos, and audio are becoming highly realistic, difficult to authenticate, widely available, and easy to create.
The most infamous form of this kind of content is the category called “deepfakes” — usually pornographic video that superimposes a celebrity or public figure’s likeness into a compromising scene. Actress Scarlett Johansson, who has unwittingly starred in a supposedly leaked pornographic video that the Washington Post says has been viewed more than 1.5 million times on a major porn site, feels the situation is hopeless. “Nothing can stop someone from cutting and pasting my image or anyone else’s onto a different body and making it look as eerily realistic as desired,” she told the Post in a story published a few days ago. “The Internet is a vast wormhole of darkness that eats itself.”
Threat of AI-generated fake video and audio
Falsifying photos and videos used to take a lot of work. Either you used CGI to generate photorealistic images from scratch (both challenging and expensive) or you needed some mastery of Photoshop—and a lot of time—to convincingly modify existing pictures. Thanks to advances in artificial intelligence, it’s now possible to create fake video and audio messages that are incredibly difficult to distinguish from the real thing.
The term “deepfakes” describes the use of artificial intelligence and computer-generated tricks to make a person (usually a well-known celebrity or politician) appear to do or say “fake” things. For example, actor Alden Ehrenreich’s face was recently replaced by Harrison Ford’s face in footage from “Solo: A Star Wars Story.” MIT Technology Review senior AI editor Will Knight used off-the-shelf software to forge his own fake video of US senator Ted Cruz. Two artists and a small technology start-up created a deepfake of Mark Zuckerberg and posted it on Instagram. In it, the phony Zuckerberg brags, on what looks like a CBS News program, about his power to rule the world.
The technique could be meant simply for entertainment or for more sinister purposes. The more convincing deepfakes become, the more unease they create among AI scientists, and military and intelligence communities. As a result, new methods are being developed to help combat the technology.
These “deepfakes” could be a boon to hackers in a couple of ways. AI-generated “phishing” e-mails that aim to trick people into handing over passwords and other sensitive data have already been shown to be more effective than ones generated by humans. Now hackers will be able to throw highly realistic fake video and audio into the mix, either to reinforce instructions in a phishing e-mail or as a standalone tactic.
Cybercriminals could also use the technology to manipulate stock prices by, say, posting a fake video of a CEO announcing that a company is facing a financing problem or some other crisis. There’s also the danger that deepfakes could be used to spread false news in elections and to stoke geopolitical tensions.
Turek ticked off other possible uses for deepfakes, such as to misrepresent products being sold online, or to fake out insurance companies over car accidents. He said there’s been evidence of researchers using manipulated images so their scientific findings can get published. And he predicts that eventually, tools could get so good that a series of deepfakes will be able to convince people of significant events that didn’t happen — cue the conspiracy theorists.
Deepfakes also offer a new tool with a multiplier effect on deception techniques used in espionage and warfare. A realistic-seeming video showing an invasion, or a clandestine nuclear program, could trigger war between nations. A country’s population is galvanized as newspaper headlines call for war. “We must strike first,” they say in response to alleged footage of another country’s president declaring war on their nation. Is the footage real? Or was it their own country’s intelligence services trying to create a pretext for war?
Such ploys would once have required the resources of a big movie studio, but now they can be pulled off by anyone with a decent computer and a powerful graphics card.
Challenges of detecting deepfakes
The manufactured videos are hard to disprove and are only getting better as the technology behind them advances, making pictures and videos harder to trust. Researchers are now on the defensive, busily looking for ways to spot a fake. Many startups are developing technology to detect deepfakes, but it’s unclear how effective their efforts will be. The forensic tools used today lack robustness and scalability and address only some aspects of media authentication; an end-to-end platform that performs a complete, automated forensic analysis does not exist. In the meantime, the only real line of defense is security awareness training to sensitize people to the risk.
“There’s been a significant change in the last year or so as automated manipulations have become more convincing,” Turek said. Artificial intelligence and machine learning have helped make tools to create deepfakes more powerful, and researchers are using AI to try to fight back.
Wired notes that computer scientist Siwei Lyu felt his team’s deepfake videos, created via a machine learning algorithm, also “felt eerie” and not quite right. Examining them closer, he realized the digital human’s eyes were always open, because “the images that the program learned from didn’t include many with closed eyes,” which created a bias.
Lyu later wrote that such deepfake programs may well miss “physiological signals intrinsic to human beings … such as breathing at a normal rate, or having a pulse.” Within weeks of putting a draft of his results online, the team got “anonymous emails with links to deeply faked YouTube videos whose stars opened and closed their eyes more normally.” Lyu stated that “blinking can be added to deepfake videos by including face images with closed eyes or using video sequences for training.”
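Lyu’s blink cue can be illustrated with a toy check. The sketch below is a hypothetical illustration, not his team’s actual method: it assumes a face-landmark detector has already produced a per-frame eye-aspect-ratio (EAR) score, and it flags clips whose blink rate falls outside a roughly typical human range. All function names and thresholds are illustrative assumptions.

```python
def count_blinks(ear_series, threshold=0.2, min_frames=2):
    """Count blinks in a sequence of per-frame eye-aspect-ratio (EAR) values.

    A blink is a run of at least `min_frames` consecutive frames whose EAR
    falls below `threshold` (eyes closed). Both numbers are illustrative.
    """
    blinks = 0
    closed_run = 0
    for ear in ear_series:
        if ear < threshold:
            closed_run += 1
        else:
            if closed_run >= min_frames:
                blinks += 1
            closed_run = 0
    if closed_run >= min_frames:  # clip ends mid-blink
        blinks += 1
    return blinks

def blink_rate_suspicious(ear_series, fps=30, expected_per_minute=(8, 30)):
    """Flag footage whose blink rate falls outside a typical human range."""
    minutes = len(ear_series) / fps / 60
    if minutes == 0:
        return True
    rate = count_blinks(ear_series) / minutes
    low, high = expected_per_minute
    return not (low <= rate <= high)
```

On a synthetic 30 fps clip that blinks eight times in about 30 seconds (roughly 16 blinks per minute), the check passes; a clip whose subject never closes its eyes is flagged, exactly the tell Lyu’s early deepfakes exhibited.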
“The challenge is to create algorithms to keep up and stay ahead of the technology that’s out there,” Turek said. “Deepfakes could alter the way we trust image and video as a society,” Turek said.
Industry developing solutions
Media forensics is the science and practice of determining the authenticity and establishing the integrity of an audio and visual media asset for a variety of use cases such as litigation, fraud investigation, etc. For computer researchers, media forensics is an interdisciplinary approach to detect and identify digital media alterations using forensic techniques based on computer vision, machine learning, media imaging, statistics, etc. to identify evidence (or indicators) supporting or refuting the authenticity of a media asset.
The most commonly discussed approach is that we should counter deepfakes with software that forensically analyses video. Software like this may examine the characteristics of the audio and video data itself, looking for artifacts, abnormal compression signatures, or camera or microphone noise patterns. Aside from the characteristics of the data, AI may also analyze the video metadata, or even perform behavior pattern analysis on the subjects of the video.
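As a rough illustration of the artifact-based analysis described above, the sketch below estimates per-block noise variance in a grayscale image and flags blocks whose noise signature deviates sharply from the rest of the frame, since a spliced region often carries a different noise or compression fingerprint than its host image. This is a toy example, not any shipping forensic tool; the block size and deviation ratio are arbitrary assumptions.

```python
import statistics

def block_noise_scores(pixels, block=8):
    """For each block x block tile of a grayscale image (list of rows),
    score 'noise' as the variance of pixel deviations from the tile mean."""
    h, w = len(pixels), len(pixels[0])
    scores = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            vals = [pixels[y][x]
                    for y in range(by, by + block)
                    for x in range(bx, bx + block)]
            mean = sum(vals) / len(vals)
            scores[(by, bx)] = sum((v - mean) ** 2 for v in vals) / len(vals)
    return scores

def flag_anomalous_blocks(scores, ratio=4.0):
    """Flag tiles whose noise variance differs from the image-wide median
    by more than `ratio`x, a crude cue that the region may be spliced in."""
    med = statistics.median(scores.values())
    return [pos for pos, s in scores.items()
            if med > 0 and (s > med * ratio or s < med / ratio)]
```

Real forensic systems use far richer statistics (sensor noise patterns, JPEG quantization traces), but the principle is the same: an inserted region rarely matches the host image’s noise profile exactly.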
In parallel with the development of deepfake technology, AI is also being developed to counter this threat: machines trained to detect malicious alterations in video for the inevitable future where we find ourselves unable to detect the forgeries ourselves.
Bob Bolles, program director at the AI center’s perception program at SRI recently showed off how he and his team are trying to detect when video has been tampered with. The team looks for inconsistencies between video and the audio track — for example, watching whether a person’s lip movements match the sounds of the video. Think videos that aren’t as obvious as those old, badly dubbed kung fu movies.
The team also tries to detect sounds that may not jibe with the background, said Aaron Lawson, assistant director of the speech lab at SRI. For example, the video may show a desert scene but reverberations can indicate the sound was recorded indoors. “We need a suite of tests” to keep up with hackers who are bound to keep figuring out new ways to keep fooling people, Bolles said.
Most research to date counters the looming threat of fake video by forensically analyzing a video’s characteristics to detect malicious alterations. But instead of focusing on a remedy after the fact, there is another solution, writes Shamir Allibhai: authentication, which can prevent undetected alteration in the first place.
Video authentication processes video data through a hashing algorithm that maps a collection of video data (for instance, a file) to a small string of text, or “fingerprint”. A video file’s fingerprint can travel with it throughout the life of that video, from capture through to distribution. At playback, the fingerprint is recomputed and compared, proving the authenticity of the video data by confirming it is the same video that was originally recorded.
This fingerprint could also be digitally signed by the recording device, providing evidence of where the content originally came from, along with device details and other metadata, whether the source is a CCTV camera, a first responder’s body camera, a journalist’s registered equipment, or the mobile app of a concerned citizen.
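The fingerprint-and-sign flow Allibhai describes can be sketched in a few lines. This is a simplified illustration under stated assumptions: SHA-256 serves as the fingerprint, and an HMAC with a device-held secret stands in for the public-key signature a real capture device would apply; all names are hypothetical.

```python
import hashlib
import hmac

def fingerprint(video_bytes: bytes) -> str:
    """Map raw video data to a short hex 'fingerprint' (SHA-256 here)."""
    return hashlib.sha256(video_bytes).hexdigest()

def sign_fingerprint(fp: str, device_key: bytes) -> str:
    """Sign the fingerprint with a device-held secret. HMAC is a stand-in
    for the asymmetric signature a real recording device would use."""
    return hmac.new(device_key, fp.encode(), hashlib.sha256).hexdigest()

def verify(video_bytes: bytes, fp: str, sig: str, device_key: bytes) -> bool:
    """At playback: recompute the fingerprint and check that both the hash
    and the device signature still match the data as originally recorded."""
    if fingerprint(video_bytes) != fp:
        return False  # content changed since capture
    return hmac.compare_digest(sign_fingerprint(fp, device_key), sig)
```

Any change to the video bytes, even a single frame, produces a different fingerprint, so the playback check fails and the tampering is exposed without needing to analyze the content itself.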
Deep Neural Network Fights Deepfakes
Research led by Amit Roy-Chowdhury’s Video Computing Group at the University of California, Riverside has developed a deep neural network architecture that can identify manipulated images at the pixel level with high precision. Roy-Chowdhury is a professor of electrical and computer engineering and the Bourns Family Faculty Fellow in the Marlan and Rosemary Bourns College of Engineering.
A deep neural network is what artificial intelligence researchers call computer systems that have been trained to do specific tasks, in this case, recognize altered images. These networks are organized in connected layers; “architecture” refers to the number of layers and structure of the connections between them.
Objects in images have boundaries, and whenever an object is inserted into or removed from an image, its boundary has different qualities than the boundaries of objects that occur in the image naturally. Someone with good Photoshop skills will do their best to make an inserted object look as natural as possible by smoothing these boundaries.
While this might fool the naked eye, when examined pixel by pixel, the boundaries of an inserted object are different; inserted boundaries are often smoother, for example, than those of natural objects. By detecting the boundaries of inserted and removed objects, a computer should be able to identify altered images.
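A toy version of this pixel-level boundary check can make the idea concrete. The sketch below is illustrative only, not the UCR team’s neural network: it scores the mean intensity gradient across a set of boundary pixels and treats a boundary much smoother than a known-natural edge as suspicious. The 0.5 ratio is an arbitrary assumption.

```python
def boundary_sharpness(pixels, boundary):
    """Mean gradient magnitude across a list of (y, x) boundary pixels in a
    grayscale image (list of rows). Feathered splice boundaries tend to
    score lower than the crisp edges of objects native to the image."""
    total = 0.0
    for y, x in boundary:
        gx = pixels[y][x + 1] - pixels[y][x - 1]
        gy = pixels[y + 1][x] - pixels[y - 1][x]
        total += (gx * gx + gy * gy) ** 0.5
    return total / len(boundary)

def looks_feathered(pixels, boundary, natural_boundary, ratio=0.5):
    """Flag a suspect boundary that is much smoother than a known-natural
    edge in the same image; `ratio` is an arbitrary illustrative cutoff."""
    return (boundary_sharpness(pixels, boundary)
            < ratio * boundary_sharpness(pixels, natural_boundary))
```

On a synthetic image containing one hard 0-to-200 edge and one gradual ramp, the ramp scores well under half the sharpness of the hard edge and is flagged, mimicking how a smoothed paste-in differs from a native object boundary.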
The researchers labeled nonmanipulated images and the relevant pixels in boundary regions of manipulated images in a large dataset of photos. The aim was to teach the neural network general knowledge about the manipulated and natural regions of photos. They tested the neural network with a set of images it had never seen before, and it detected the altered ones most of the time. It even spotted the manipulated region.
“We trained the system to distinguish between manipulated and nonmanipulated images, and now if you give it a new image it is able to provide a probability that that image is manipulated or not, and to localize the region of the image where the manipulation occurred,” Roy-Chowdhury said. The researchers are working on still images for now, but they point out that this can also help them detect deepfake videos.
“If you can understand the characteristics in a still image, in a video it’s basically just putting still images together one after another,” Roy-Chowdhury said. “The more fundamental challenge is probably figuring out whether a frame in a video is manipulated or not.” Even a single manipulated frame would raise a red flag.
He said completely automated deepfake detection might not be achievable in the near future. “If you want to look at everything that’s on the internet, a human can’t do it on the one hand, and an automated system probably can’t do it reliably. So it has to be a mix of the two,” Roy-Chowdhury said.
Deep neural network architectures can produce lists of suspicious videos and images for people to review. Automated tools can reduce the amount of data that people — like Facebook content moderators — have to sift through to determine if an image has been manipulated. “That probably is something that these technologies will contribute to in a very short time frame, probably in a few years,” Roy-Chowdhury said.
“It’s a challenging problem,” Roy-Chowdhury said. “This is kind of a cat and mouse game. This whole area of cybersecurity is in some ways trying to find better defense mechanisms, but then the attacker also finds better mechanisms.”
The paper, “Hybrid LSTM and Encoder–Decoder Architecture for Detection of Image Forgeries,” is published in the July 2019 issue of IEEE Transactions on Image Processing and was funded by DARPA. Other authors include Jawadul H. Bappy, Cody Simons, Lakshmanan Nataraj, and B. S. Manjunath.