Today, military personnel are expected to perform a growing number of complex tasks while interacting with increasingly sophisticated machines and platforms. Mechanics, for example, are asked to repair more types of equipment than ever before, and medics are asked to perform an expanded number of procedures over extended periods of time. Artificial intelligence (AI)-enabled assistants have the potential to aid users as they work to expand their skillsets and increase their productivity. However, today's virtual assistants are not designed to provide advanced levels of individual support or real-time knowledge sharing.
DARPA launched the Perceptually-enabled Task Guidance (PTG) program to enable mechanics, medics, and other specialists to perform tasks within and beyond their skillsets by providing just-in-time feedback and instructions for physical tasks, using AI technology that perceives the environment, reasons about physical tasks, and models the user. The goal of the PTG program is to make soldiers more versatile, by increasing their knowledge and expanding their skillsets, and more proficient, by reducing their errors and enhancing their productivity and efficiency.
“In the not too distant future, you can envision military personnel having a number of sensors on them at any given time – a microphone, a head-mounted camera – and displays like augmented reality (AR) headsets,” said Dr. Bruce Draper, a program manager in DARPA’s Information Innovation Office (I2O). “These sensor platforms generate tons of data around what the user is seeing and hearing, while AR headsets provide feedback mechanisms to display and share information or instructions. What we need in the middle is an assistant that can recognize what you are doing as you start a task, has the prerequisite know-how to accomplish that task, can provide step-by-step guidance, and can alert you to any mistakes you’re making.”
The PTG program will explore the development of methods, techniques, and technology for AI assistants capable of helping users perform complex physical tasks.
“Increasingly we seek to develop technologies that make AI a true, collaborative partner with humans,” said Draper. “Developing virtual assistants that can provide substantial aid to human users as they complete tasks will require advances across a number of machine learning and AI technology focus areas, including knowledge acquisition and reasoning.”
The goal is to develop virtual “task guidance” assistants that can provide just-in-time visual and audio feedback to help human users expand their skillsets and minimize their errors.
To accomplish its objectives, PTG is divided into two primary research areas. The first is focused on fundamental research into addressing a set of interconnected problems: knowledge transfer, perceptual grounding, perceptual attention, and user modeling. The second is focused on integrated demonstrations of those fundamental research outputs on militarily-relevant use case scenarios. Specifically, the program will explore how the task guidance assistants could aid in mechanical repair, battlefield medicine, and/or pilot guidance.
PTG technology will exploit recent advances in deep learning for video and speech analysis, automated reasoning for task and/or plan monitoring, and augmented reality for human-computer interfaces. However, these technologies by themselves are insufficient. To create task guidance assistants, PTG is looking for novel, integrated approaches that address four key (and interconnected) problems:
- Knowledge Transfer. Assistants need to automatically acquire task knowledge from instructions intended for humans, with an emphasis on checklists, illustrated manuals, and training videos;
- Perceptual Grounding. Assistants need to be able to align their perceptual inputs (objects, settings, actions, sounds, and recognized words) with the terms they use to describe and model tasks, so that observations (percepts) can be mapped to task knowledge (concepts);
- Perceptual Attention. Assistants must pay attention to percepts that are relevant to current tasks, while ignoring extraneous stimuli. Assistants must also respond to unexpected, but salient, events that may alter a user’s goals or suggest a new task; and,
- User Modeling. PTG assistants must be able to determine how much information to share with a user and when to do so. This requires developing and integrating an epistemic model of what the user knows, a physical model of what the user is doing, and a model of their attentional and emotional states.
Because these four problems are not independent of each other, the PTG program does not divide task guidance technology into four separate research areas; instead, it expects integrated approaches and solutions that collectively take on all four challenges. To give just one example, there is a strong interaction between knowledge transfer and perceptual grounding: if knowledge transfer translates instructions into a small, predetermined library of terms, perceptual grounding becomes easy, whereas if knowledge transfer adopts whatever terms appear in a manual, perceptual grounding is challenging.
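As an illustration of the grounding problem, the mapping from percepts to concepts can be thought of as aligning recognizer labels with the vocabulary of a task model. The sketch below is a minimal, hypothetical example; the labels, synonym table, and matching rule are invented for illustration and do not describe any PTG performer's system:

```python
# Minimal sketch of perceptual grounding: mapping recognizer outputs
# (percepts) onto the terms used in a task model (concepts).
# All labels and the synonym table are invented for illustration.

# Terms the task model uses, each with recognizer labels that may
# correspond to it. With a small fixed vocabulary this mapping is easy;
# with free text lifted from an arbitrary manual it becomes the hard part.
CONCEPT_VOCAB = {
    "torque_wrench": {"wrench", "torque wrench", "ratchet"},
    "oil_filter": {"filter", "oil filter", "cartridge"},
    "drain_plug": {"plug", "drain plug", "bolt"},
}

def ground(percept_label: str):
    """Map a raw recognizer label to a task-model concept, if any."""
    label = percept_label.lower().strip()
    for concept, synonyms in CONCEPT_VOCAB.items():
        if label in synonyms:
            return concept
    # No match: an extraneous stimulus the assistant should ignore
    # (this is where perceptual attention comes in).
    return None

if __name__ == "__main__":
    for seen in ["Torque Wrench", "cartridge", "coffee mug"]:
        print(seen, "->", ground(seen))
```

The fixed-vocabulary version shown here is the easy case described above; replacing `CONCEPT_VOCAB` with terms scraped from an arbitrary manual is what couples knowledge transfer to grounding.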
PTG is not interested in supporting the development of new sensors, computing hardware, or augmented reality headsets; small and potentially wearable devices are already available on the commercial market.
The development of AI-enabled agents is not new territory for DARPA. In addition to investing in the advancement of AI technology for more than 50 years, DARPA funded the creation of the technologies that underlie today’s virtual assistants, such as Siri. In the early 2000s, DARPA launched the Personalized Assistant that Learns (PAL) program. Under PAL, researchers created cognitive computing systems to make military decision-making more efficient and more effective at multiple levels of command.
AMIGOS, a new DARPA-funded program, aims to use AI and augmented reality to build real-time training manuals.
In December 2021, Xerox’s research division PARC, in collaboration with the University of California Santa Barbara, the University of Rostock, and augmented reality company Patched Reality, was awarded a $5.8 million contract by DARPA, the Pentagon’s blue-sky projects wing. The goal is to make a program that can guide users through complex operations beyond their existing knowledge, like letting a mechanic repair a machine they’ve never seen before.
“Augmented reality, computer vision, language processing, dialogue processing and reasoning are all AI technologies that have disrupted a variety of industries individually but never in such a coordinated and synergistic fashion,” Charles Ortiz, the principal investigator for AMIGOS, said in a Xerox release. “By leveraging existing instructional materials to create new AR guidance, the AMIGOS project stands to accelerate this movement, making real-time task guidance and feedback available on-demand.”
AMIGOS falls under the broader goals of DARPA’s Perceptually-enabled Task Guidance research, which operates with the understanding that humans, with finite capacity, cannot possibly learn everything about every physical task they’ll be asked to perform before they are sent into the field. And yet, those humans will have to treat novel injuries, or repair unfamiliar machines, almost by definition of military work.
For now, the program will work on creating two component parts for AMIGOS. Xerox describes the first as an offline tool that scans text from manuals and other instructional material, such as videos, and turns it into the step-by-step guide needed for a task. The second component is online: drawing on the ingested material, it uses AI to adapt those instructions into an updated, real-time guide for the user. It is, at once, an exercise in learning and a teaching tool, offering results at the speed of interaction.
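That offline/online division of labor can be sketched roughly as follows. This is a toy illustration under stated assumptions, not a description of AMIGOS internals: the numbered-step manual format, the parsing rule, and the class names are all invented for the example.

```python
# Illustrative sketch of the offline/online split: an offline ingester
# extracts ordered steps from instructional text, and an online guide
# serves them to the user at interaction speed. The numbered-step
# format and all names here are invented for this example.
import re

def ingest_manual(text: str) -> list:
    """Offline component: scan instructional text and extract an
    ordered step-by-step guide (here, lines like '1. Do X')."""
    steps = re.findall(r"^\s*\d+\.\s*(.+)$", text, flags=re.MULTILINE)
    return [s.strip() for s in steps]

class OnlineGuide:
    """Online component: walk the user through the ingested steps,
    advancing as each step is reported complete."""
    def __init__(self, steps):
        self.steps = steps
        self.index = 0

    def current_instruction(self) -> str:
        if self.index >= len(self.steps):
            return "Task complete."
        return self.steps[self.index]

    def step_done(self) -> None:
        self.index += 1

if __name__ == "__main__":
    manual = """
    1. Drain the oil.
    2. Replace the filter.
    3. Refill and check for leaks.
    """
    guide = OnlineGuide(ingest_manual(manual))
    print(guide.current_instruction())  # Drain the oil.
    guide.step_done()
    print(guide.current_instruction())  # Replace the filter.
```

In the real system the online half would be driven by perception of the user's progress rather than an explicit `step_done()` call, but the data flow, ingest once offline, serve step-by-step online, is the same shape.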