Trending News
Home / Cyber / DARPA Safe Documents (SafeDocs) developing new cyber tools for to improve security of electronic data formats

DARPA Safe Documents (SafeDocs) developing new cyber tools for to improve security of electronic data formats

Electronic documents are ubiquitous and essential to all aspects of modern life. Individuals and organizations must routinely engage with electronic documents received from a variety of unauthenticated or potentially compromised sources, comprising a growing variety of electronic data formats. Even if the immediate provider of the data can be authenticated, the data may derive from an untrusted source. We expect pictures, charts, spreadsheets, maps, audio, video, as well as rich messages potentially including any and all of these, to be received with a click of a button, DARPA researchers point out. However, the complexity of managing such electronic data results in software vulnerable to attack. This situation is unsustainable, DARPA experts claim.


On December 23, 2015, the Ukrainian Kyivoblenergo, a regional electricity distribution company, reported service outages to customers. Shortly after the attack, Ukrainian government officials claimed the outages were caused by a cyber attack, and that Russian security services were responsible for the incidents. Following these claims, investigators in Ukraine, as well as private companies and the U.S. government, performed analysis and offered assistance to determine the root cause of the outage.


The study and analysis found that the adversaries weaponized Microsoft Office documents (Excel and Word) by embedding BlackEnergy 3 within the documents. During the cyber intrusion stage of Delivery, Exploit, and Install, the malicious Office documents were delivered via email to individuals in the administrative or IT network of the electricity companies. When these documents were opened, a popup was displayed to users to encourage them to enable the macros in the document. Enabling the macros allowed the malware to Exploit Office macro functionality to install BlackEnergy 3 on the victim system. Upon the Install step, the BlackEnergy 3 malware connected to command and control (C2) IP addresses to enable communication by the adversary with the malware and the infected systems. These pathways allowed the adversary to gather information from the environment and enable access.


Current software that processes electronic data such as documents, messages, and data streams is error-prone and vulnerable to exploitation by malicious inputs. According to MITRE’s Common Vulnerability Enumeration data, over 80% of yearly reported vulnerabilities occur in code that handles input data. Such code converts a given bit stream representing the data into memory objects and validates that these objects have expected structure and relationships.


Exploitation of input-handling vulnerabilities leverages inaccurate programmer assumptions regarding the extent to which input data has been validated by input-handling code. Code that behaves correctly under certain assumptions (and may even be proven correct under these assumptions) will typically not behave correctly if any of these assumptions do not hold. Attackers can induce incorrect behaviors by presenting vulnerable software with maliciously crafted input data that violates unchecked assumptions. The programmer assumes that validated input data contains certain objects in certain relationships, and writes code under these assumptions. However, should any of these assumptions not hold, the code will not behave correctly. A single missing or incorrect check can create a vulnerability, as was the case with the Heartbleed vulnerability (CVE-2014-0160), in which code acting on an unchecked assumption exposed sensitive memory content to remote attackers.


Parsing or checking code itself contains exploitable flaws and behaviors. Such flaws are particularly insidious, as they require little or no human interaction for the attack to succeed or lead to pre-authentication vulnerabilities.


Today, code for input data validation is typically written manually in an ad-hoc manner. For commonly-used electronic data formats, input validation is, at a minimum, a problem of scale whereby specifications of these formats comprise hundreds to thousands of pages. Input validation thus translates to thousands or more conditions to be checked against the input data before the data can be safely processed. Manually writing the code to parse and validate input, and then manually auditing whether that code implements all the necessary checks completely and correctly, does not scale. Moreover, manual parser coding and auditing typically fails even for electronic data formats specifically designed to be easier to perform such tasks, e.g., JSON and XML. A variety of critical vulnerabilities have been found in major parser implementations for these formats.


Widely deployed mitigations against crafted input attacks include (a) trying to prevent the flow of untrusted data to vulnerable software; and (b) testing software with randomized inputs to find and patch flaws that could be triggered by maliciously created inputs. Unfortunately, neither of these approaches offer security assurance guarantees.


Mitigations for preventing the flow of untrusted data to vulnerable software, which can be implemented via network or host-based measures such as firewalls, application proxies, antivirus scanners, etc., neither remove the underlying vulnerability from the target, nor encode complete knowledge of document or message format internals. Attacker bypasses of such mitigations exploit incompleteness of the mitigations’ understanding of the data format to exploit the still-vulnerable targets.


The effectiveness of fuzzing methods for testing of software with randomized inputs to find and fix flaws depends on whether randomly generated inputs can emulate maliciously crafted inputs closely enough to trigger all relevant code flaws. Although modern fuzzing methods incorporate feedback from tracing the execution of the code as it consumes crafted inputs, they also employ symbolic and concolic execution of code in their exploration of the space of potential crafted inputs. As a result, these methods are still essentially heuristic. There is no guarantee that attackers, who also use fuzzing to locate and develop vulnerabilities, will not cover a more substantial and more productive portion of the input space with a different set of heuristics.


DARPA is soliciting innovative research proposals in the area of secure processing of untrusted electronic data. Proposed research should investigate innovative approaches that radically improve software’s ability to recognize and safely reject invalid and maliciously crafted input data, while preserving essential functionality of legacy electronic data formats. Proposals should build on an existing base of knowledge of electronic document, message, and streaming formats and the nature of security vulnerabilities associated with these formats.


Safe Documents (SafeDocs) program

The Safe Documents (SafeDocs) program will develop novel verified programming methodologies for building high assurance parsers for extant electronic data formats, and novel methodologies for comprehending, simplifying, and reducing these formats to their safe, unambiguous, verification-friendly subsets (“safe sub-setting”). SafeDocs will address the ambiguity and complexity obstacles that hinder the application of verified programming posed by extant electronic data formats. SafeDocs’ multi-pronged approach will combine:

  • extracting the extant formats’ de facto syntax (including any non-compliant syntax deliberately accepted and substantially used in the wild);
  • identifying a syntactically simpler subset of this syntax that yields itself to use in verified programming while preserving the format’s essential functionality; and
  • creating software construction kits for building secure, verified parsers for this syntactically simpler subset, and high-assurance translators for converting extant instances of the format to this subset.

The parser construction kits developed by SafeDocs will be usable by industry programmers who understand the syntax of electronic data formats but lack the theoretical background in verified programming. These tools will enable developers to construct verifiable parsers for new electronic data formats as well as extant ones. The tools will guide the syntactic design of new formats, by making verification-friendly format syntax easy to express, and vice versa.


Safedocs awards

Officials of the U.S. Defense Advanced Research Projects Agency (DARPA) in Arlington, Va., announced contracts Wednesday to Galois Inc. in Portland, Ore., and to the Northrop Grumman Corp. Technology Services segment in Herndon, Va. for the Safe Documents (SafeDocs) program.


BAE Systems to develop new cyber tools for DARPA to improve security of electronic data formats

BAE Systems will develop new cyber tools designed to help prevent vulnerabilities in electronic files that can lead to cyber attacks.

BAE Systems has been awarded a contract by the U.S. Defense Advanced Research Projects Agency (DARPA) to develop new cyber tools designed to help prevent vulnerabilities in electronic files that can lead to cyberattacks. Development of these tools will be part of DARPA’s Safe Documents (SafeDocs) program, which aims to more effectively identify and reject malicious data in a variety of electronic formats.


Every day, individuals and organizations in military, government and commercial industries receive electronic content, such as Portable Document Format (PDF) and digital media files, from unauthorized or potentially compromised sources, which creates security risks. As part of the SafeDocs program, BAE Systems’ FAST Labs™ research and development team will create two different cyber tools. The first tool seeks to recover, simplify, and automatically select safe feature subsets within electronic data formats to help encode the data safely and unambiguously, while the second is a toolkit to help software developers avoid vulnerabilities in the software they create to process complex electronic data.


“Research on the SafeDocs program will leverage BAE Systems’ expertise in cyber, algorithmic, and systems engineering domains to give developers tools that currently don’t exist in government or commercial markets to more easily and efficiently ensure the security of electronic documents,” said Anne Taylor, product line director of the Cyber Technology group at BAE Systems. “As the creation and use of electronic documents continues to grow every day, so does the risk for potential cyberattacks, making it essential we create solutions that are built with security in mind to help keep content safe.”


The research for Phase 1 of the SafeDocs program, which is being developed with funding from DARPA, adds to BAE Systems’ cyber technology portfolio. Work for the program will be completed with teammate American University and will take place at the company’s facilities in Arlington, Virginia and Burlington, Massachusetts.


Penn State to increase computer security by developing more secure parsers.

A parser, the element in a computer system that converts data inputs into an understandable format, is the first line of defense for cybersecurity. A multi-institute group of researchers that includes Gang Tan, James F. Will Career Development Associate Professor of Electrical Engineering and Computer Science and a co-hire at the Institute for Computational and Data Sciences (ICDS), has received an $8 million grant that allots $1 million for Penn State’s part of the research to increase computer security by developing more secure parsers.


The research project, “SPARTA: The Secure Parser Toolkit for Assurance,” is funded by the Defense Advanced Research Projects Agency (DARPA) and is a collaboration among Penn State and Galois Inc., Cornell University and Purdue University researchers. The role of a parser in a computer system is to take outside data inputs and convert them into internal representations. Parsers are considered a critical security piece in many systems because they should be able to identify adversarial elements and warn a system user that the program in question may be taking malicious input. However, a cyber attacker could feed malformed data that would trigger bugs in the parser to take over the system. Tan and his research team aim to create parsers that have provable guarantees about safety and are not susceptible to the many bugs that parsers commonly have now.


“There are tools you can use to manually write those parsers, but, in the end, you don’t get many guarantees,” Tan said. “You just rely on the competence of the programmers, and often, these parsers are very complex. Programmers make mistakes, and as a result those mistakes cause vulnerability in computer systems.” For example, at the time that Tan submitted his research proposal, over 1,000 parser bugs were reported for the popular suite of Mozilla products, impacting the security of many common file formats including PDF, ZIP, PNG and JPG.


Tan said that he hopes that with the creation of the SPARTA system, he will potentially be able to develop the most secure parsers to date with a novel parser language and rigorous formal methods. The researchers are focusing specifically on a program called SafeDocs that is geared toward safely opening PDFs. “PDF is a format with a lot of features, and some features are harder to handle than other features,” Tan said. “This parser would warn you if this PDF document is not obeying some safe subset of the format. If this parser agrees to open it, you’re guaranteed to be safe. There’s a provable way of saying it’s safe.” While the project’s focus is on parsers for PDF security, the researchers hope their new system can be applied to other formats, including for videos and images.


“The topic of parsing has been there since the early days of programming, but people have been mostly focusing on functionality, saying, ‘I can build a parser that can parse this kind of data,’ but haven’t paid too much attention to correctness, that is, how do you convince the world that this parser is doing the right thing?” Tan said. “And that turns out to be quite important given the cybersecurity threat. I think what is most exciting about our research is it could give some provable guarantees.”


Cite This Article

International Defense Security & Technology (September 28, 2022) DARPA Safe Documents (SafeDocs) developing new cyber tools for to improve security of electronic data formats. Retrieved from
"DARPA Safe Documents (SafeDocs) developing new cyber tools for to improve security of electronic data formats." International Defense Security & Technology - September 28, 2022,
International Defense Security & Technology June 7, 2020 DARPA Safe Documents (SafeDocs) developing new cyber tools for to improve security of electronic data formats., viewed September 28, 2022,<>
International Defense Security & Technology - DARPA Safe Documents (SafeDocs) developing new cyber tools for to improve security of electronic data formats. [Internet]. [Accessed September 28, 2022]. Available from:
"DARPA Safe Documents (SafeDocs) developing new cyber tools for to improve security of electronic data formats." International Defense Security & Technology - Accessed September 28, 2022.
"DARPA Safe Documents (SafeDocs) developing new cyber tools for to improve security of electronic data formats." International Defense Security & Technology [Online]. Available: [Accessed: September 28, 2022]

About Rajesh Uppal

Check Also

DHS Cybersecurity Strategy for full spectrum of response options

President Biden has made cybersecurity, a critical element of the Department of Homeland Security’s (DHS) …

error: Content is protected !!