AI is a field of computer science dedicated to the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, translation between languages, decision-making, and problem-solving. Commonly used AI techniques include Machine Learning (ML), encompassing Artificial Neural Networks (ANN) and Deep Learning (DL) with Deep Neural Networks (DNN), as well as Fuzzy Logic, Genetic Algorithms, and others.
Machine learning is a sub-field of AI comprising algorithms that can “learn” from data, i.e., progressively improve performance on a specific task. In contrast with other computer software, machine learning algorithms do not require explicit instructions from humans. Instead, they extract patterns and learn implicit rules from a large number of examples in a database.
Deep learning is, in turn, a sub-field of machine learning that deals with a smaller family of algorithms known as neural networks. These are algorithms inspired by the human brain that seek to learn from large amounts of data by performing a task repeatedly, each time making minor modifications to their internal parameters to improve the outcome.
Deep Neural Networks (DNNs), large virtual networks of simple information-processing units loosely modeled on the anatomy of the human brain, have been responsible for many exciting advances in artificial intelligence in recent years. Deep learning networks typically use many layers—sometimes more than 100—and often a large number of units at each layer, enabling the recognition of extremely complex, precise patterns in data such as images, sound, and text. Over the past decade, DNNs have become the state-of-the-art machine learning algorithms for speech recognition, computer vision, natural language processing, and many other tasks.
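The repeat-and-adjust loop described above can be sketched in miniature. The toy example below uses a single-layer model on synthetic data (not a deep network, and not any particular face recognition system): on each pass it makes a minor modification to its weights to improve the outcome.

```python
import numpy as np

# Toy sketch: a model "performs the task repeatedly, each time making
# minor modifications" to its weights to improve the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))              # 100 synthetic 2-feature samples
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # separable labels

w = np.zeros(2)
b = 0.0
for step in range(500):                    # repeat the task many times
    p = 1.0 / (1.0 + np.exp(-(X @ w + b))) # sigmoid prediction
    grad_w = X.T @ (p - y) / len(y)        # cross-entropy gradient
    grad_b = np.mean(p - y)
    w -= 0.5 * grad_w                      # minor modification each pass
    b -= 0.5 * grad_b

accuracy = np.mean((p > 0.5) == (y > 0.5))
```

A deep network stacks many such layers and trains them with the same gradient-based loop, which is why large data and GPU capacity matter so much.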
This was made possible by advances in Big Data and Deep Learning, and by the exponential increase in chip processing capabilities, especially GPGPUs. Big Data is a term used to signify the exponential growth of data: an estimated 90% of the data in the world today was created in the last two years alone.
Visible light face recognition
The objective of any face recognition system is to find and learn features that are distinctive among people. While it is important to learn distinctive features, it is equally important to minimize the differences between images of the same person captured under varying conditions. These variations include the distance between camera and person, lighting changes, pose changes, and occlusions.
Visible light face recognition has been well researched, and performance has reached acceptable levels, but variations in pose and illumination remain a problem that prevents the algorithms from reaching 100% accuracy. Visible light face recognition systems are based on local and global features of the face that are discriminative under controlled environments.
Recently, deep learning has made rapid progress in face recognition. The key ingredient is convolutional filters. Various deep architectures have been proposed to improve face recognition performance under factors such as pose, low resolution, distance, and illumination.
A facial recognition dataset might be a collection of photos of human faces — along with some photos of animal faces and face-like objects that are not faces at all. Each photo in the dataset is annotated with metadata that specifies its real contents, and that metadata is used to (in)validate the guesses of a learning facial recognition algorithm. Compiling the datasets used by a machine learning system is often far more time-consuming and expensive than actually using those datasets to train the system itself.
Recently, with the development of devices, technologies, and algorithms, object detection in complex surveillance environments has become possible. In particular, the accuracy of human detection and recognition has been improved by applying convolutional neural networks (CNNs). In addition, the emergence of affordable, powerful GPUs has enabled researchers working with massive databases to develop increasingly deep neural networks for all aspects of human recognition tasks — for example, person detection and feature representation classification, contributing to verification and identification solutions.
The disadvantage of these methods is that they require huge amounts of data for training. This has been partially overcome by transfer learning, but they still need a substantial amount of data and learning time. Some researchers have used deep neural networks (DNNs) only to compute features, which can then be readily classified by various classifiers for better performance. This reduces training time, since the DNN is required only to compute the features.
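One reading of this "DNN as feature extractor" idea can be sketched as follows. Here the frozen network is just a fixed random ReLU projection standing in for a real pre-trained DNN, and the gallery data and subject labels are entirely synthetic; a simple nearest-centroid classifier then does the recognition with no network training at all.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(64, 16))              # frozen stand-in "pretrained" weights

def extract_features(images):
    # Stand-in for a frozen DNN: ReLU of one fixed projection.
    return np.maximum(images @ W, 0.0)

# Hypothetical gallery: 3 subjects, 5 images each, 64-pixel vectors.
gallery = {p: rng.normal(loc=4.0 * p, size=(5, 64)) for p in range(3)}
centroids = {p: extract_features(imgs).mean(axis=0)
             for p, imgs in gallery.items()}

def identify(image):
    # Classify a probe by nearest centroid in feature space.
    f = extract_features(image[None, :])[0]
    return min(centroids, key=lambda p: np.linalg.norm(f - centroids[p]))

probe = rng.normal(loc=8.0, size=64)       # new image of subject 2
pred = identify(probe)
```

Because only the light-weight classifier depends on the gallery, enrolling new subjects does not require retraining the network, which is the training-time saving the text describes.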
Commercial applications include airport and public facility security and surveillance. Facial recognition is rapidly gaining favor as an efficient security feature in airports, cutting down the time and effort it takes to go through security for both passengers and airport personnel. The commercial AI-based software used by these airports appears to deliver what it promises.
Facial Recognition Limitations
The application of AI in facial recognition technology has matured considerably, and its use has been rapidly increasing both in commercial products and by law enforcement. All of these systems take in data – often an image – from an unknown person, extract details involving the shape of features like lips, noses, and eyes, and attempt to match them to existing entries in a database of known people’s faces or voices. In the last few years, several groups have announced that their facial recognition systems have achieved near-perfect accuracy rates, performing better than humans at picking the same face out of a crowd.
However, studies have also found weaknesses in DNNs. One study revealed that changing an image (e.g. of a lion) in a way imperceptible to humans can cause a DNN to label the image as something else entirely (e.g. mislabeling a lion as a library).
Another study, by a trio of researchers in the U.S., found that deep neural networks (DNNs) can be tricked into “believing” that an image they are analyzing shows something recognizable to humans when in fact it doesn’t. They showed that it is easy to produce images that are completely unrecognizable to humans yet that state-of-the-art DNNs believe to be recognizable objects with 99.99% confidence (e.g. labeling white noise static as a lion with certainty).
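The imperceptible-change attack described above can be illustrated on a toy linear classifier. The weights and input below are made up, but the perturbation along the sign of the gradient is the same principle used to fool DNNs on images.

```python
import numpy as np

# A tiny "trained" linear classifier is flipped by a small step along
# the sign of its gradient (the adversarial-example principle).
w = np.array([1.0, -2.0, 0.5])      # hypothetical learned weights
x = np.array([0.3, -0.2, 0.1])      # input, classified as class 1: w@x > 0

def predict(v):
    return int(w @ v > 0)

eps = 0.4                            # small per-feature change
x_adv = x - eps * np.sign(w)         # push against the decision direction

original = predict(x)
fooled = predict(x_adv)
```

For an image, `eps` would be so small that the perturbed picture looks identical to a human, yet the classifier's decision flips, which is exactly the lion-to-library failure mode.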
University of Washington researchers held the MegaFace Challenge, the world’s first competition aimed at evaluating and improving the performance of face recognition algorithms at the million-person scale. All of the algorithms suffered in accuracy when confronted with more distractors, but some fared much better than others.
“We need to test facial recognition on a planetary scale to enable practical applications — testing on a larger scale lets you discover the flaws and successes of recognition algorithms,” said Ira Kemelmacher-Shlizerman, a UW assistant professor of computer science and the project’s principal investigator. “We can’t just test it on a very small scale and say it works perfectly.”
The UW team first developed a dataset with one million Flickr images from around the world that are publicly available under a Creative Commons license, representing 690,572 unique individuals. Then they challenged facial recognition teams to download the database and see how their algorithms performed when they had to distinguish between a million possible matches.
Google’s FaceNet showed the strongest performance on one test, dropping from near-perfect accuracy when confronted with a smaller number of images to 75 percent on the million-person test. A team from Russia’s N-TechLab came out on top on another test set, dropping to 73 percent.
“Some of the hard problems are recognizing people across different ages, which is an unsolved problem, identifying people from their doppelgängers, and matching people in varying poses, like side views to frontal views,” said Kemelmacher-Shlizerman. The paper also analyzes age and pose invariance in face recognition when evaluated at scale.
In general, algorithms that “learned” how to find correct matches out of larger image datasets outperformed those that only had access to smaller training datasets. But the SIAT MMLab algorithm developed by a research team from China, which learned on a smaller number of images, bucked that trend by outperforming many others.
Thermal Infrared Face Recognition
Recently, night vision devices such as passive infrared cameras have been introduced. These passive thermal infrared cameras capture the radiation emitted by objects at wavelengths between 3 and 14 µm. The use of thermal infrared imagery for face recognition has been growing in critical security applications, and several methods have been proposed for it.
The infrared spectrum is divided into four bandwidths: Near IR (NIR), Short-wave IR (SWIR), Medium-wave IR (MWIR), and Long-wave IR (Thermal IR). Thermal IR has received the most attention due to its robustness.
Thermal IR sensors measure the heat energy emitted from an object rather than reflected energy. The thermal spectrum has several advantages. Thermal images of the face can be obtained under any lighting condition, even in completely dark environments. Thermal energy emitted from the face is less affected by scattering and absorption from smoke or dust. Thermal IR images also reveal anatomical information of the face, which makes it possible to detect disguises.
However, human detection at night remains a challenging problem. The features needed for detection are often unavailable, making it difficult to find human regions that are invisible even to the human eye, on top of a large amount of visual noise. Therefore, some researchers have developed human detection systems using far-infrared (FIR) cameras, performing the additional processing with modern machine learning libraries whose GPU acceleration offers optimized results at comparable frame rates.
Limitations of Thermal IR face recognition
Thermal face recognition suffers from several problems. One of the major problems of the thermal spectrum is occlusion due to the opacity of eyeglasses at these wavelengths. Eyeglasses occlude a large portion of the face, causing the loss of important discriminative information. Moreover, persons with identical facial parameters might have radically different thermal signatures.
MWIR and LWIR images are sensitive to the environmental temperature, as well as to the emotional, physical, and health condition of the person. Alcohol consumption changes a person’s thermal signature, which can lead to performance degradation.
Some of the methods used in visible light face recognition are also applied to thermal IR face recognition. These include local-feature-based methods, which extract distinctive local structural information. In fusion-based methods, images, features, or decisions are fused when both visible and thermal images are available. Cross-modality-based methods match visible and thermal imagery using distinctive features. Deep neural network (DNN) based methods find a mapping between thermal and visible imagery and features.
Fusion of thermal and Visible images
In situations where both visible and thermal imagery are available for training, the face recognition system can use both sources of information for better learning. The visible and infrared images can be merged, or fused, into a common image that incorporates information from both modalities, so that the system works reliably whether a given test image is visible or infrared.
The fusion of thermal and visible images can be done at the image level, feature level, match-score level, or decision level. Of these, the simplest form of fusing visible and infrared images is concatenating the feature vectors. Some authors have proposed fusing the images in eigenspace.
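The feature-level case, fusion by concatenation, is simple enough to show directly. The feature vectors below are hypothetical, and the per-modality normalization is one common design choice (so that neither modality dominates the joint descriptor), not something prescribed by the methods above.

```python
import numpy as np

# Feature-level fusion in its simplest form: normalize each modality's
# feature vector and concatenate them into one joint descriptor.
def fuse(visible_feat, thermal_feat):
    v = visible_feat / np.linalg.norm(visible_feat)
    t = thermal_feat / np.linalg.norm(thermal_feat)
    return np.concatenate([v, t])          # joint visible+thermal vector

vis = np.array([0.2, 0.9, 0.4])            # hypothetical visible features
thm = np.array([5.0, 1.0])                 # hypothetical thermal features
joint = fuse(vis, thm)                     # 5-dimensional fused descriptor
```

Any classifier or matcher can then operate on `joint` exactly as it would on a single-modality feature vector.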
Bebis et al. studied the sensitivity of thermal IR imagery to facial occlusions caused by eyeglasses. Specifically, their experimental results illustrate that recognition performance in the IR spectrum degrades seriously when eyeglasses are present in the probe image but not in the gallery image, and vice versa. To address this serious limitation of IR, the authors fused IR with visible imagery.
This technology has applications in the Military including ship recognition and electronic warfare-specific emitter identification.
In a military context, its purpose is to identify, classify, verify, and if needed, neutralize any perceived threat. In the interest of security, this may seem a reasonable application. However, a major concern with facial recognition is its accuracy. When the data is not available to train the algorithm to recognize patterns sufficiently or the target image is fuzzy or taken under unfavorable conditions, this impairs the ability of the software to attain a high level of accuracy. These are problems in real-world situations.
The US Secret Service recently tested facial recognition software using footage from the existing closed-circuit television (CCTV) system in the White House, with a test population of Secret Service volunteers. The test was part of the Facial Recognition Pilot (FRP) program to ascertain the accuracy of the software in identifying the volunteers in public spaces. The volunteers represented the known “subjects of interest” that the software will eventually have to identify.
To counteract this, the Naval Air Warfare Center Weapons Division at China Lake developed adaptive facial recognition software. The brainchild of Katia Estabrides, the software requires far less data to train the algorithm and has the ability to incorporate and use new data as it comes in.
Adaptive facial recognition software
Scientists at China Lake have developed a new facial recognition software algorithm that uses biometric functionality and requires very little training data to be effective. The invention relates to facial recognition generally and, more particularly, to a new approach to facial recognition processing that needs only one image for initial training, allowing the invention to simultaneously classify and learn from unlabeled data.
The software constantly adjusts as new test data becomes available in real time. The new algorithm can tolerate changes in viewpoint as well as misalignments and changes in illumination and scale. It is particularly effective where data from multiple sensors is available. Access to data can also be customized and limited. The developed methodology focused on facial recognition; however, the techniques are expandable to other domains such as ship and target recognition.
The system includes a photo album tool with which a user can select a face or faces to be found within the data set. The system and method for adaptive face recognition include at least one electronic processor having a central processing unit. At least one database containing a plurality of pixelated face images of known subjects of interest is associated with the processor. At least one test image of a new subject of interest is configured for input into the electronic processor. A classification processing tool is associated with the electronic processor and is configured to build a dictionary and provide a classification match of the test image with one of the plurality of pixelated face images of known subjects of interest. At least one device is associated with the processor and configured to output the classification match in a tangible medium.
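One plausible reading of "build a dictionary and provide a classification match" is nearest-subspace matching: each known subject's gallery images form a class dictionary, and a probe is assigned to the class whose dictionary reconstructs it with the smallest least-squares residual. The patent does not publish its algorithm, so the sketch below, with entirely hypothetical data and dimensions, is only an illustration of that general dictionary-classification idea.

```python
import numpy as np

rng = np.random.default_rng(2)

def residual(D, x):
    # Distance from probe x to the span of class dictionary D (columns).
    coef, *_ = np.linalg.lstsq(D, x, rcond=None)
    return np.linalg.norm(x - D @ coef)

# Hypothetical gallery: 3 subjects, 4 images each, 32-pixel vectors.
dictionaries = {s: rng.normal(size=(32, 4)) for s in range(3)}
# Probe constructed to lie in subject 1's span (a best-case test image).
probe = dictionaries[1] @ np.array([0.5, -0.2, 0.8, 0.1])

# Match = class with the smallest reconstruction residual.
match = min(dictionaries, key=lambda s: residual(dictionaries[s], probe))
```

A scheme like this adapts naturally as new data arrives: enrolling an image just appends a column to a dictionary, with no retraining step.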
A good use for this type of software is identifying friend from foe. The military typically keeps multiple identification records of its personnel, which certainly include photos. The software can quickly identify faces that are not in its database, tag them, and alert personnel to the presence of unauthorized individuals. This can be a valuable asset on an active battlefield. While the software focuses on facial recognition, and in this guise has military applications in identifying authorized personnel, it may also be adapted to recognize ships and electronic warfare-specific emitters or signals as needed.
Visible light face recognition methods fail completely when there is insufficient light or complete darkness. Infrared imaging can capture facial images even in complete darkness, and the use of thermal IR imaging makes face recognition invariant to even extreme illumination changes.
US Army develops face recognition technology that works in the dark
Army researchers have developed an artificial intelligence and machine learning technique that produces a visible face image from a thermal image of a person’s face captured in low-light or nighttime conditions. This development could lead to enhanced real-time biometrics and post-mission forensic analysis for covert nighttime operations.
Thermal cameras like FLIR, or Forward Looking Infrared, sensors are actively deployed on aerial and ground vehicles, in watch towers and at check points for surveillance purposes. More recently, thermal cameras are becoming available for use as body-worn cameras.
Drs. Benjamin S. Riggan, Nathaniel J. Short, and Shuowen “Sean” Hu of the U.S. Army Research Laboratory have developed the technology to perform automatic face recognition at nighttime using such thermal cameras. This is beneficial for informing a Soldier that an individual is someone of interest, such as someone who may be on a watch list. The motivation for this technology is to enhance both automatic and human-matching capabilities.
“This technology enables matching between thermal face images and existing biometric face databases/watch lists that only contain visible face imagery,” said Riggan, a research scientist. “The technology provides a way for humans to visually compare visible and thermal facial imagery through thermal-to-visible face synthesis.”
He said under nighttime and low-light conditions, there is insufficient light for a conventional camera to capture facial imagery for recognition without active illumination such as a flash or spotlight, which would give away the position of such surveillance cameras; however, thermal cameras that capture the heat signature naturally emanating from living skin tissue are ideal for such conditions.
“When using thermal cameras to capture facial imagery, the main challenge is that the captured thermal image must be matched against a watch list or gallery that only contains conventional visible imagery from known persons of interest,” Riggan said. “Therefore, the problem becomes what is referred to as cross-spectrum, or heterogeneous, face recognition. In this case, facial probe imagery acquired in one modality is matched against a gallery database acquired using a different imaging modality.”
This approach leverages advanced domain adaptation techniques based on deep neural networks. The fundamental approach is composed of two key parts: a non-linear regression model that maps a given thermal image into a corresponding visible latent representation, and an optimization problem that projects that latent representation back into the image space.
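The two-stage structure can be sketched with linear stand-ins. The actual ARL models are non-linear deep networks, and everything below (the encoder, the paired data, the dimensions) is hypothetical; the sketch only mirrors the pipeline's shape: fit a thermal-to-latent regression, then solve an optimization that projects a predicted latent code back into image space.

```python
import numpy as np

rng = np.random.default_rng(3)
E = rng.normal(size=(4, 8))                # stand-in "visible encoder"

# Hypothetical paired training data: thermal inputs, visible images.
T = rng.normal(size=(50, 8))               # thermal feature vectors
V = rng.normal(size=(50, 8))               # visible images (as vectors)
Z = V @ E.T                                # their visible latent codes

# Stage 1: fit the thermal -> latent regression by least squares.
M, *_ = np.linalg.lstsq(T, Z, rcond=None)

# Stage 2: for a new thermal input, find an image x minimizing
# ||E x - z||^2, i.e. project the predicted latent back to pixels.
t_new = rng.normal(size=8)
z_pred = t_new @ M
x_synth, *_ = np.linalg.lstsq(E, z_pred, rcond=None)
```

In the real system both the regression and the encoder are deep networks and stage 2 is solved iteratively, but the division of labor, predict a latent code then recover an image consistent with it, is the same.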
Details of this work were presented in March in a technical paper, “Thermal to Visible Synthesis of Face Images using Multiple Regions,” at the IEEE Winter Conference on Applications of Computer Vision (WACV) in Lake Tahoe, Nevada, a technical conference comprising scholars and scientists from academia, industry, and government.
At the conference, Army researchers demonstrated that combining global information, such as features from across the entire face, with local information, such as features from discriminative fiducial regions (for example, eyes, nose, and mouth), enhanced the discriminability of the synthesized imagery. They showed how the thermal-to-visible mapped representations from both global and local regions of the thermal face signature could be used together to synthesize a refined visible face image.
The optimization problem for synthesizing an image attempts to jointly preserve the shape of the entire face and appearance of the local fiducial details. Using the synthesized thermal-to-visible imagery and existing visible gallery imagery, they performed face verification experiments using a common open source deep neural network architecture for face recognition. The architecture used is explicitly designed for visible-based face recognition. The most surprising result is that their approach achieved better verification performance than a generative adversarial network-based approach, which previously showed photo-realistic properties.
Riggan attributes this result to the fact that the game-theoretic objective of GANs immediately seeks to generate imagery that is sufficiently similar in dynamic range and photo-like appearance to the training imagery, while sometimes neglecting to preserve identifying characteristics, he said. The approach developed by ARL preserves identity information to enhance discriminability, for example, increased recognition accuracy for both automatic face recognition algorithms and human adjudication.
As part of the paper presentation, ARL researchers showcased a near real-time demonstration of this technology. The proof of concept demonstration included the use of a FLIR Boson 320 thermal camera and a laptop running the algorithm in near real-time. This demonstration showed the audience that a captured thermal image of a person can be used to produce a synthesized visible image in situ. This work received a best paper award in the faces/biometrics session of the conference, out of more than 70 papers presented.
Riggan said he and his colleagues will continue to extend this research under the sponsorship of the Defense Forensics and Biometrics Agency to develop a robust nighttime face recognition capability for the Soldier.