In recent years consumer imaging technology (digital cameras, mobile phones, etc.) has become ubiquitous, allowing people the world over to take and share images and video instantaneously. Mirroring this rise in digital imagery is the associated ability for even relatively unskilled users to manipulate and distort the message of the visual media. While many manipulations are benign, performed for fun or for artistic value, others are for adversarial purposes, such as propaganda or misinformation campaigns.
The most infamous form of this kind of content is the category called “deepfakes”: usually pornographic video that superimposes a celebrity or public figure’s likeness into a compromising scene. Though software that makes deepfakes possible is inexpensive and easy to use, existing video analysis tools aren’t yet up to the task of identifying what’s real and what’s been cooked up.
The problem isn’t limited to the fashion and cosmetics industries where photos are “touched-up” and “augmented” to make models look better and the results of skin-care products look (instantly) appealing — it’s spread to politics and now even business. Manipulated videos and images that may be manually indistinguishable from the real thing present a series of real-world problems, including election and evidence tampering, blackmail, general propaganda and targeted social media misinformation efforts.
This manipulation of visual media is enabled by the wide-scale availability of sophisticated image and video editing applications that permit editing in ways that are very difficult to detect, either visually or with current image analysis and visual media forensics tools. DARPA’s four-year, $4.4 million Media Forensics, or MediFor, initiative will underwrite research to, for example, develop better algorithms that can be used to spot fake images. The tools would then allow analysts to conduct forensic investigations to determine precisely how and why images were manipulated.
Tools already exist to scan Internet images, but not on the scale required by U.S. intelligence agencies. The forensic tools used today lack robustness and scalability, and address only some aspects of media authentication; an end-to-end platform to perform a complete and automated forensic analysis does not exist.
“A key aspect of this project is its focus on gleaning useful information from massive troves of data by means of data-driven techniques instead of just developing small laboratory solutions for a handful of cases,” Walter Scheirer, a principal investigator at Notre Dame, noted in a statement.
Researchers noted that such a capability would require specialized machine-learning platforms designed to automatically perform processes needed to verify the authenticity of millions of videos and images.
“You would like to be able to have a system that will take the images, perform a series of tests to see whether they are authentic and then produce a result,” explained Edward Delp, director of Purdue’s Video and Image Processing Laboratory. “Right now you have little pieces that perform different aspects of this task, but plugging them all together and integrating them into a single system is a real problem.”
In rolling out the program, officials said MediFor would attempt to integrate machine learning and image analysis technologies into a forensic-based platform to “detect manipulations, provide analysts and decision makers with detailed information about the types of manipulations performed, how they were performed… in order to facilitate decisions regarding the intelligence value of the image [and] video.”
DARPA’s MediFor program brings together world-class researchers to attempt to level the digital imagery playing field, which currently favors the manipulator, by developing technologies for the automated assessment of the integrity of an image or video and integrating these in an end-to-end media forensics platform.
If successful, the MediFor platform will automatically detect manipulations, provide detailed information about how these manipulations were performed, and reason about the overall integrity of visual media to facilitate decisions regarding the use of any questionable image or video.
DARPA hopes the fruits of the MediFor program will be distributed far and wide, picked up by tech companies such as Facebook and YouTube, which handle a big fraction of the world’s user-generated videos.
Media forensics is the science and practice of determining the authenticity and establishing the integrity of an audio and visual media asset for a variety of use cases such as litigation, fraud investigation, etc. For computer researchers, media forensics is an interdisciplinary approach to detect and identify digital media alterations using forensic techniques based on computer vision, machine learning, media imaging, statistics, etc. to identify evidence (or indicators) supporting or refuting the authenticity of a media asset.
Existing media manipulation detection technologies forensically analyze media content for indicators using a variety of information sources and techniques, such as the Exchangeable Image File Format (EXIF) header, the camera Photo Response Non-Uniformity (PRNU) model, manipulation operation (e.g. splice, copy-clone) detection, compression anomalies, and physics-based and semantic-based consistency approaches.
Media forensics and anti-forensics techniques have developed quickly in recent years, but the evaluation and analysis of state-of-the-art manipulation detectors are impeded by the diversity and complexity of the technologies and by the limitations of existing datasets, which include but are not limited to: (i) lack of rich metadata (annotations) essential to systematic evaluations and analysis; (ii) missing structured representation of manipulation history reconstruction; (iii) insufficient detail to generate diverse evaluation metadata and ground-truth (e.g. image format, manipulation semantic meaning, camera information, and manipulated image masks) for specific detectors given the same manipulated media (image or video).
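To make one of these indicators concrete, here is a minimal sketch of copy-clone detection by exact block matching. It is an illustration only, not a MediFor tool: the function name `find_cloned_blocks`, the 4x4 block size and the toy 12x12 image are all assumptions made for this example. Exact matching only catches clones that were pasted without resampling or recompression; practical detectors rely on more robust features.

```python
from collections import defaultdict


def find_cloned_blocks(image, block=4):
    """Return pairs of top-left (row, col) coordinates whose block x block
    patches are pixel-for-pixel identical and do not overlap.

    `image` is a list of rows of grayscale integers (hypothetical input format).
    """
    height, width = len(image), len(image[0])
    seen = defaultdict(list)
    for y in range(height - block + 1):
        for x in range(width - block + 1):
            patch = tuple(tuple(image[y + dy][x:x + block]) for dy in range(block))
            # Skip flat patches: uniform regions (sky, walls) match everywhere.
            if len({value for row in patch for value in row}) == 1:
                continue
            seen[patch].append((y, x))

    pairs = []
    for coords in seen.values():
        for i in range(len(coords)):
            for j in range(i + 1, len(coords)):
                (y1, x1), (y2, x2) = coords[i], coords[j]
                # Keep only patches that are at least one block apart.
                if abs(y1 - y2) >= block or abs(x1 - x2) >= block:
                    pairs.append((coords[i], coords[j]))
    return pairs


# Toy image in which every pixel value is unique, then clone one 4x4 region.
image = [[y * 12 + x for x in range(12)] for y in range(12)]
for dy in range(4):
    for dx in range(4):
        image[7 + dy][6 + dx] = image[1 + dy][1 + dx]

print(find_cloned_blocks(image))  # [((1, 1), (7, 6))]
```

Hashing whole patches keeps the sketch short, but it scales poorly and is brittle; it is meant only to show what "copy-clone detection" looks for, namely a region of the image that appears twice.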
The primary objective for the MediFor data collection and evaluation team is to create benchmark datasets that advance current technologies and drive technological developments by understanding the key factors of this domain. The data collection and media manipulation team provides various kinds of data, metadata, and annotations supporting the program evaluations, while the evaluation team designs the data collection requirements, validates the quality of the collected data, and assembles the evaluation datasets.
The ultimate goal of the MediFor program is to gain a deep understanding of the performance of different technologies based on the properties of the media, their manipulations, and their relationships with each other. In order to meet this goal, the program requires a large amount of highly diverse imagery with ground-truth labels and metadata covering an enormous spectrum of media itself, and manipulation types from the diverse image editing software and tools.
Digital integrity, physical integrity and semantic integrity
To undertake automatic media manipulation detection, the MediFor program has established a framework of three tiers of information, including digital integrity, physical integrity and semantic integrity.
Digital integrity asks whether pixels in an image or video are consistent, said Turek. Physical integrity asks whether the laws of physics appear to be violated in visual media being examined. For example, with an image of a boat moving in the water, physical integrity information could be applied to see if that boat is producing a wake that corresponds to a real boat wake, Turek explained.
“The third level of information is semantic integrity, and that asks the question: is a hypothesis about a visual asset disputable?” Turek said. “This is where we can bring to bear outside information. It might be other media assets or other things that we know are true.”
For example, the MediFor program can use information on the semantic level to estimate weather properties like temperature and humidity based solely on pixels in an image. Using images with accurate metadata, the MediFor program leverages that metadata to gather weather data available in the cloud to train a deep neural network. That deep network estimates weather properties based solely on pixels. By comparing the accurate weather data from the former with the estimations of the deep network, the MediFor program can fine-tune the deep network to improve its performance.
Digital, physical and semantic integrity provide different avenues to verify the authenticity of visual media. They make it more difficult, if not impossible, for manipulators to alter visual media properly across all three levels.
The MediFor program is also combining information from these three tiers to generate one integrity score that encompasses an entire piece of visual media, be it a video or an image, according to Turek.
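How the program actually fuses the three tiers is not described here, so the following is only a hedged sketch of the idea. It assumes each tier produces a score between 0 (clearly manipulated) and 1 (fully consistent) and combines them with a weighted average; the names `TierScores` and `fuse_integrity`, the score range and the weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TierScores:
    """Hypothetical per-tier scores: 0.0 = clearly manipulated, 1.0 = consistent."""
    digital: float
    physical: float
    semantic: float


def fuse_integrity(scores, weights=(1.0, 1.0, 1.0)):
    """Combine the three tier scores into one integrity score in [0, 1].

    A weighted average is an assumption made for illustration; it is not
    the fusion method used by MediFor.
    """
    values = (scores.digital, scores.physical, scores.semantic)
    if any(not 0.0 <= value <= 1.0 for value in values):
        raise ValueError("tier scores must lie in [0, 1]")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * v for w, v in zip(weights, values)) / total_weight


# Pixels look consistent, the physics is doubtful, the semantics contradict the claim.
print(fuse_integrity(TierScores(digital=1.0, physical=0.5, semantic=0.0)))  # 0.5
```

A weakest-link rule (taking the minimum of the three scores) would be an equally plausible choice when a single failed tier should be enough to flag an asset.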
“In coming years, you may see technologies developed by the MediFor program that enable accurate detection of manipulated media. Such capabilities could enable filtering and flagging on internet platforms, for instance, that happens automatically and at scale and that are presented to you as the user. That gives us the ability to have much more comprehensive protection from media manipulation,” Turek said.
Matt Turek, program director for MediFor at the Defense Advanced Research Projects Agency in Washington, D.C., said 14 prime contractors are working on the project, which was started in summer 2016. Those contractors have subcontractors working on MediFor, too.
Image research has been divided among several U.S. universities along with investigators in Brazil and Italy. The multidisciplinary team includes researchers from the University of Notre Dame, New York University, Purdue University and the University of Southern California.
At DEIB-PoliMi, the Image and Sound Processing Group (ISPG) has an active research line (among others) on Image and Video Forensics, with the aim of developing automatic tools for identifying forged contents.
In addition to techniques aiming at identifying forgeries through the analysis of single images or videos, in the last few years, forensic researchers have shown the possibility of performing deeper analysis jointly considering multiple connected multimedia objects.
Indeed, the proliferation of high-level user-generated contents available through social media and sharing platforms, in conjunction with the increasing use of web by traditional media (e.g., TV channels and newspaper), has determined a massive diffusion of visual contents covering different events of interest (e.g., public speeches, interviews, etc.).
These images and videos are often obtained by re-editing pre-existing material already published online, and offer a valuable source of information to be exploited by forensic methodologies. In this context, ISPG is active in the field of Multi-Media Phylogeny, in which the relationships among contents relative to the same event are automatically determined to establish how content has evolved in time during its diffusion on the web.
DARPA funded SRI International through 2020 to work closely with researchers at the University of Amsterdam. The Biometrics Security and Privacy Group of the Idiap Research Institute in Switzerland is focusing on four techniques to identify the kinds of audiovisual discrepancies present in a video, in support of lip-sync analysis, speaker inconsistency detection and scene inconsistency detection.
Bob Bolles, program director at the AI center’s perception program at SRI, recently showed off how he and his team are trying to detect when video has been tampered with. The team looks for inconsistencies between the video and the audio track, for example by watching whether a person’s lip movements match the sounds of the video. Think videos that aren’t as obvious as those old, badly dubbed kung fu movies.
The team also tries to detect sounds that may not jibe with the background, said Aaron Lawson, assistant director of the speech lab at SRI. For example, the video may show a desert scene but reverberations can indicate the sound was recorded indoors. “We need a suite of tests” to keep up with hackers who are bound to keep figuring out new ways to keep fooling people, Bolles said.
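The lip-sync check described above can be sketched as a correlation test between two per-frame signals. This is a simplified illustration, not SRI's method: it assumes some upstream step has already measured how open the speaker's mouth is in each video frame and how loud the audio is over the same frame, and the function names `pearson` and `best_sync` are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def best_sync(mouth_opening, audio_energy, max_lag=5):
    """Return (lag, correlation) for the frame lag with the highest correlation.

    A positive lag means the audio trails the video by that many frames.
    """
    best = (0, -1.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            mouth = mouth_opening[:len(mouth_opening) - lag]
            audio = audio_energy[lag:]
        else:
            mouth = mouth_opening[-lag:]
            audio = audio_energy[:len(audio_energy) + lag]
        n = min(len(mouth), len(audio))
        if n < 2:
            continue
        r = pearson(mouth[:n], audio[:n])
        if r > best[1]:
            best = (lag, r)
    return best


# Made-up signals: the audio is the mouth signal delayed by two frames.
mouth = [0, 3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
audio = [0, 0] + mouth[:-2]
lag, corr = best_sync(mouth, audio)
print(lag, round(corr, 3))
```

In this toy case the best lag is two frames with a correlation close to 1. A genuine recording would be expected to peak near zero lag, while a dubbed or spliced track would tend to show a weak peak or a large offset; any threshold for "suspicious" would have to be tuned on real data.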
Prof. Anderson Rocha leads a group at the Institute of Computing, University of Campinas, Campinas, SP, Brazil, that develops cutting-edge research on machine learning and digital forensics. Examples of previously developed solutions include algorithms for open-set recognition, meta-recognition, multimedia phylogeny, spoofing detection and forgery detection in images. His group recently partnered with PoliMi and others worldwide in the DARPA MediFor project in an effort to detect multimedia forgeries and hint at media provenance in the 21st century.
Purdue’s piece of the project focuses on using tools like image analysis to determine whether media has been faked, what tools were used and what portions of an image or video were actually modified. “The biggest challenge is going to be the scalability, to go from a sort of theoretical academic tool to something that can actually be used,” Delp said.
USC ISI’s DiSPARITY project aims to characterize signs of manipulated images and video. To achieve this, the team considers important indicators such as pixel-level attributes, the physics of the scene, and the semantics and genealogy of the image or video asset.
Challenges include the wide variety of image-capturing devices, increasing sophistication of manipulation tools and techniques (including rapid advances in computer graphics technology such as Photoshop), the emergence of generative adversarial networks (GANs), and the sheer volume of analyzable data.
Adobe and UC Berkeley make MediFor a success
“While still in its early stages, this collaboration between Adobe Research and UC Berkeley, is a step towards democratizing image forensics, the science of uncovering and analyzing changes to digital images,” said Adobe.
“We started by showing image pairs (an original and an alteration) to people who knew that one of the faces was altered. For this approach to be useful, it should be able to perform significantly better than the human eye at identifying edited faces,” explained Adobe’s Wang.
By the end of the project, human eyes were able to identify the altered face 53 percent of the time, while the tool they developed achieved results as high as 99 percent.
Overall, in the future, experts believe media forensics will be tied into all social media platforms that want to avoid fake news and all kinds of media dissemination tools that want to ensure people know what they’re watching — their biggest challenge, however, will be balancing creative or artistic license with fact.