DARPA’s LORELEI for All-Language Real-Time Intelligence Analysis to Support Disaster Relief and Worldwide Military Operations

The U.S. Government operates globally, and understanding local languages is essential for effective situational awareness in military operations, particularly in humanitarian assistance and disaster relief efforts that require immediate and close coordination with local communities.

“In Nigeria, foreign troops hunt Boko Haram terrorists through areas using as many as 44 distinct languages. Ebola aid workers must try to treat patients in 19 distinct African languages. Even in the United States, Central American refugee children have been found to speak more than 20 languages,” says Graham Templeton in Motherboard.

DARPA’s Low Resource Languages for Emergent Incidents (LORELEI) program aims to change this state of affairs by providing real-time essential information in any language to support emergent missions such as humanitarian assistance/disaster relief, peacekeeping and infectious disease response.

The agency has publicly announced the award of Phase 1 contracts to 13 organizations for the development of the LORELEI program. Seven of these contracts together amount to nearly $26 million.

Dr. Bonnie Dorr, a computational linguist at the Institute for Human and Machine Cognition, said that the challenge for low-resource analysis is twofold. You can’t just build a dataset—you have to understand it, too. “In [low resource] speech, you have no idea what’s coming your way,” she told Motherboard. “If you find documents, you have no idea what the nature of those documents are.”

The program’s extremely ambitious rapid machine translation toolkit is expected to understand enough of virtually any of the world’s 7,000 languages that U.S. personnel can effectively coordinate an operation anywhere. The goal is to be able to “digest” any language and learn how to provide helpful machine-translated material “as quickly as 24 hours after an incident occurs,” and even go so far as “fully automated language capabilities within days or weeks after that.”

Disaster Management and Humanitarian Relief

“After the January 2010 quake, the Haiti community used cellular technology to tell the international community what they needed. Haitians sent hundreds of thousands of text messages in through social media sites,” said the UN report “Disaster Relief 2.0: The Future of Information Sharing in Humanitarian Emergencies.”

“At the same time, the scale and scope of the tragedy created an unprecedented volume of information flowing between humanitarian personnel. Humanitarian field staff had neither the tools nor capacity to listen to the new flow of requests arriving directly from Haitian citizens.”

“This gap did not go unnoticed. Working in communities, thousands of volunteers from around the world aggregated, analyzed, and mapped the flow of messages coming from Haiti. Using Internet collaboration tools and modern practices, they wrote software, processed satellite imagery, built maps, and translated reports between the three languages of the operation: Creole, French, and English.”

 

Military and Intelligence Requirements

As the volume and pace of information on the battlefield increase, turning information into understanding is key. With more than 7,000 languages spoken worldwide, however, the U.S. military frequently encounters so-called “low-resource” languages for which translators are rare and no automated translation capabilities exist.

Intelligence analysts face a multitude of challenges in today’s Big Data environment, including more non-English web content to analyze and dramatically reduced decision-making time. Businesses as well as defense and security organizations must integrate real-time translation capabilities into multilingual data processing systems that can spot crucial information in any language.

 

Challenges for Low-Resource Languages

The conventional system of developing automated language technology—which requires years of effort and tens of millions of dollars to manually translate, transcribe and annotate individual words and phrases for each language—is adequate for languages in widespread use or in high demand. It is neither flexible enough to meet constantly changing language needs, however, nor specialized enough to account for the specific communication challenges involved in military-level emergency response.

With more than 7,000 languages in the world and the difficulty of predicting the next language for which technology will be needed, universal human language technology coverage by existing means is an unattainable goal.

“People don’t say what they mean. They change what they mean from day to day… where they are, and even what the major happenings of the day have been,” Dorr added.

 

DARPA’s Low Resource Languages for Emergent Incidents (LORELEI) program

“The global diversity of languages makes it virtually impossible to ensure that U.S. personnel will be able to understand the situation on the ground when they go into new environments,” said Boyan Onyshkevych, DARPA program manager. “Through LORELEI, we envision a system that could quickly pick out key information—things such as names, events, sentiment and relationships—from public news and social media sources in any language, based on the system’s understanding of other languages. The goal is to provide immediate, evolving situational awareness that helps decision makers assess and respond as intelligently as possible to dynamic, difficult situations.”

The program would apply these automated capabilities via an easy-to-use interface that would assimilate, integrate and analyze real-time incident data in the local language(s). The envisioned system would provide useful response-related material as quickly as 24 hours after an incident occurs and fully automated language capabilities within days or weeks after that.

While LORELEI technologies may include partial or full Automated Speech Recognition and/or Machine Translation, the overall goal will not be translating foreign language material into English, but providing situational awareness by identifying elements of information in foreign language and English sources, such as topics, names, events, sentiment, and relationships.

“Our goal with LORELEI isn’t rote translation based on libraries, but instead to provide idiomatic understanding of language as a whole, and specifically disaster-response vocabulary, to improve cooperation and speed response to dangerous situations worldwide,” Onyshkevych said.

LORELEI seeks to dramatically advance computational linguistics and human language technology to identify the elements that different languages have in common, and use that knowledge to enable rapid, low-cost development of automated language capabilities for low-resource languages.

 

Concept of Operations

LORELEI technology is expected to be applicable to any incident in which a sudden need emerges for assimilation of information by U.S. Government entities about a region of the world where low-resource languages are frequently used in formal and/or informal media.

The LORELEI program will support a Concept of Operations in which, after a language need emerges (at time T0), LORELEI helps Government analysts create analytic products, including mission plans and situational awareness documents, based on information entered through the run-time system’s analytic interface. Each product is due within a certain timeframe and has a specific focus.

LORELEI capabilities will be exercised to provide situational awareness based on information from any language, in support of emergent missions such as humanitarian assistance/disaster relief, peacekeeping or infectious disease response.

 

  • One-day example focus: identify the specific hotspots most relevant to the crisis, or locations requiring the greatest assistance.
  • One-week example focus: identify viable transportation or supply routes, avoiding blocked roads, etc.
  • One-month example focus: develop an understanding of the human terrain in the crisis area, representing organizations, significant people, etc.
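The three example timeframes above can be read as a simple schedule measured from T0. The following is only an illustrative sketch of that reading, not part of any published LORELEI design: the `Milestone` type, the function names and the hour offsets (24 hours, 7 days, 30 days) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Milestone:
    label: str
    hours_after_t0: int   # assumed offset from T0, in hours
    example_focus: str


# Assumption: one day = 24 h, one week = 7 days, one month = 30 days.
MILESTONES = (
    Milestone("one-day", 24, "hotspots and locations requiring the greatest assistance"),
    Milestone("one-week", 7 * 24, "viable transportation or supply routes"),
    Milestone("one-month", 30 * 24, "human terrain: organizations, significant people"),
)


def foci_due(hours_since_t0: float) -> List[str]:
    """Labels of the milestones whose deadline has been reached."""
    return [m.label for m in MILESTONES if hours_since_t0 >= m.hours_after_t0]


def next_milestone(hours_since_t0: float) -> Optional[Milestone]:
    """The first milestone still ahead, or None once all are due."""
    for m in MILESTONES:
        if hours_since_t0 < m.hours_after_t0:
            return m
    return None
```

For instance, 30 hours after T0 the one-day product is due and `next_milestone(30)` points at the one-week focus on transportation and supply routes.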

 

LORELEI plans to explore three principal technical areas:

 

Algorithm Research and Development Environment:

LORELEI plans to target research and development of human language technology that would reduce the current reliance on huge, manually translated, transcribed or annotated bodies of knowledge. Instead, LORELEI would leverage what related and unrelated languages have in common and take advantage of a broad range of language-specific resources.

The program also seeks to develop the LORELEI Technology Development Environment (LTDE), which would synthesize language data and integrate it with Web services that would provide named-entity recognition, topic spotting and other language technology capabilities.

The LTDE will depend on resources such as:

  • Data streams in the incident language (IL) that include incident-related text and speech mixed in with data on other topics
  • An English-language news or newswire feed that includes incident-related documents in addition to data on other topics
  • A monolingual dictionary
  • A monolingual grammar book
  • A parallel grammar book (IL->English, may be hardcopy)
  • A monolingual primer book (IL)
  • A monolingual gazetteer

The resources also include a scenario model in English specific to the incident type (for example, earthquake, infectious disease outbreak or tsunami), including lists of terms relevant to that scenario. The scenario model does not include any foreign-language terms or other material.
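To make the idea of an English-only scenario model concrete, here is a minimal sketch that uses per-incident term lists to pick incident-related items out of a mixed English news feed. The term lists, the `min_hits` threshold and all function names are hypothetical and chosen for illustration. Because the scenario model contains no foreign-language terms, simple matching like this applies only to the English feed, which is why incident-language data needs the more advanced techniques the program is researching.

```python
import re
from typing import Dict, Iterable, List, Set

# Hypothetical scenario models: English-only term lists keyed by incident type.
SCENARIO_MODELS: Dict[str, Set[str]] = {
    "earthquake": {"earthquake", "aftershock", "collapsed", "rubble", "trapped"},
    "infectious disease outbreak": {"outbreak", "quarantine", "infection", "symptoms", "clinic"},
    "tsunami": {"tsunami", "wave", "flooding", "evacuation", "coast"},
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def scenario_hits(text: str, incident_type: str) -> Set[str]:
    """Scenario terms for the given incident type that occur in the text."""
    return set(tokenize(text)) & SCENARIO_MODELS[incident_type]


def filter_feed(documents: Iterable[str], incident_type: str, min_hits: int = 2) -> List[str]:
    """Keep documents that mention at least min_hits distinct scenario terms."""
    return [doc for doc in documents
            if len(scenario_hits(doc, incident_type)) >= min_hits]


# Made-up feed mixing incident-related items with other topics.
FEED = [
    "Rescuers pulled survivors from the rubble after the earthquake.",
    "The football final ended in a draw.",
    "An aftershock left more families trapped overnight.",
]
```

With this made-up feed, `filter_feed(FEED, "earthquake")` keeps the first and third items and drops the sports story.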

 

Run-time Framework Development:

The program aims to develop a prototype tool, the LORELEI Run-Time Framework (LRTF), which would pull together various open-source data feeds in English and incident languages and send this data compilation through the LTDE’s Web services. The processed results would return to the Framework, where numerous analytics tools would aggregate, summarize and organize them.

The LRTF would not produce reports or situational awareness documents automatically, but would present users with easy-to-understand summaries, visualizations and other useful products that would greatly help in the creation of such documents. The Framework would be able to generate initial results 24 hours after an incident and provide progressively more detailed results at one-week and one-month intervals.
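The data flow described above (feeds in, language services applied, results aggregated for the analyst) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the LRTF design: the `Item` type, the `Annotator` callable standing in for an LTDE Web service, and the gazetteer substring lookup used in place of real named-entity recognition are all hypothetical, and the sample texts are made up.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Item:
    source: str     # e.g. "newswire" or "social"
    language: str   # "en" or an incident-language code
    text: str


# An annotator stands in for one LTDE Web service call (hypothetical interface).
Annotator = Callable[[Item], List[str]]


def make_gazetteer_annotator(gazetteer: Iterable[str]) -> Annotator:
    """Toy name spotter: case-insensitive substring lookup against a gazetteer."""
    names = [name.lower() for name in gazetteer]

    def annotate(item: Item) -> List[str]:
        lowered = item.text.lower()
        return [name for name in names if name in lowered]

    return annotate


def run_framework(items: Iterable[Item], annotator: Annotator,
                  top_n: int = 3) -> List[Tuple[str, int]]:
    """Send every item through the annotator and rank names by mention count."""
    counts: Counter = Counter()
    for item in items:
        counts.update(annotator(item))
    return counts.most_common(top_n)


# Made-up sample items in English and French.
ITEMS = [
    Item("newswire", "en", "Aid convoys cannot reach Jacmel."),
    Item("social", "fr", "Les routes vers Jacmel sont bloquées."),
    Item("social", "fr", "Carrefour manque d'eau potable."),
]
SUMMARY = run_framework(ITEMS, make_gazetteer_annotator(["Jacmel", "Carrefour"]))
```

The ranked counts in `SUMMARY` are the kind of summary an analyst might use as a starting point for a one-day "hotspots" product. Consistent with the description above, the sketch produces inputs for a report rather than the report itself.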

 

Linguistic Resource Creation:

LORELEI plans to collect, create and annotate linguistic resources in multiple languages to support the work in the first two technical areas listed above. These resources would include standard language resources (dictionaries, etc.), subject-specific resources (disaster relief terminology, etc.) and other data enabling research, development and evaluation.

DARPA has awarded Phase 1 contracts for LORELEI to the following organizations: Appen, Carnegie Mellon University, Columbia University, Johns Hopkins University, Next Century Corporation, Raytheon BBN, University of Illinois at Urbana-Champaign, University of Massachusetts, University of Pennsylvania, University of Pennsylvania Linguistic Data Consortium, University of Texas at El Paso, University of Washington, and University of Southern California Information Sciences Institute.

 

The article sources also include

http://motherboard.vice.com/read/how-darpa-plans-to-decrypt-the-languages-that-computers-still-dont-understand

 

 


About Rajesh Uppal
