Deep learning neural networks (DNN): the basis of advances in object recognition, face detection, fraud detection and autonomous military systems

Artificial Intelligence technologies aim to develop computers or robots that surpass the abilities of human intelligence in tasks such as learning and adaptation, reasoning and planning, decision making and autonomy, creativity, and extracting knowledge and making predictions from data.


AI includes both logic-based and statistical approaches. Machine Learning (ML) is a large subfield of AI that attempts to endow computers with the capacity to learn from data, so that explicit programming is not necessary to perform a task.


ML uses algorithms that learn how to perform classification or problem solving via a process called training, so that they can handle each new problem. Algorithms such as neural networks, support vector machines, or reinforcement learning extract information and infer patterns from recorded data, so computers can learn from previous examples to make good predictions about new ones.


This was made possible by advances in Big Data and Deep Learning and by the exponential increase in chip processing capabilities, especially GPGPUs. Big Data is the term used to signify the exponential growth of data taking place, as 90% of the data in the world today has been created in the last two years alone.


In an article for the World Economic Forum, Marc Benioff, chairman and CEO of Salesforce, explains that the convergence of big data, machine learning and increased computing power will soon make artificial intelligence “ubiquitous”. “AI follows Albert Einstein’s dictum that genius renders simplicity from complexity,” he writes. “So, as the world itself becomes more complex, AI will become the defining technology of the twenty-first century, just as the microprocessor was in the twentieth century.”


The promise of AI, including its ability to improve the speed and accuracy of everything from logistics and battlefield planning to human decision making, is driving militaries around the world to accelerate research and development. A new Harvard Kennedy School study concludes AI could revolutionize war as much as nuclear weapons have done. An AI race has ensued between countries such as the US, China and Russia to take the lead in this strategic technology.


China has overtaken the United States to become the world leader in deep learning research, a branch of artificial intelligence (AI) inspired by the human brain, according to White House reports that aim to help prepare the US for the growing role of artificial intelligence in society. The National Artificial Intelligence Research and Development Strategic Plan lays out the strategy for AI funding and development in the US.


Machine learning

Machine learning is one of the most important technical approaches to AI and the basis of many recent advances and commercial applications of AI. Modern machine learning is a statistical process that starts with a body of data and tries to derive a rule or procedure that explains the data or can predict future data. ML has now become a pervasive technology, underlying many modern applications including internet search, fraud detection, gaming, face detection, image tagging, brain mapping, check processing and computer server health monitoring.


Learning is a procedure that consists of estimating the model parameters so that the learned model (algorithm) can perform a specific task. For example, in Artificial Neural Networks (ANN), the parameters are the weight matrices. Deep learning (DL), on the other hand, places several layers between the input and output layers. This allows many stages of non-linear information processing units, arranged in hierarchical architectures, to be exploited for feature learning and pattern classification.
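The idea that learning means estimating model parameters can be illustrated with a minimal sketch. The example below is not taken from any system described in this article: it assumes a toy model with just two parameters (a weight `w` and an offset `b`), a mean squared error objective, and plain gradient descent. The function name `fit_line`, the learning rate and the data are all hypothetical choices made for illustration.

```python
def fit_line(xs, ys, lr=0.05, epochs=2000):
    """Estimate w and b so that w*x + b approximates y (illustrative sketch).

    Uses gradient descent on the mean squared error; lr and epochs are
    assumed values chosen so that this toy problem converges.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = (w * x + b) - y          # prediction error for one example
            grad_w += 2.0 * err * x / n    # d(MSE)/dw contribution
            grad_b += 2.0 * err / n        # d(MSE)/db contribution
        w -= lr * grad_w                   # move parameters against the gradient
        b -= lr * grad_b
    return w, b


# Toy training data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

In a neural network the same principle applies, except that the parameters are weight matrices with many entries rather than two numbers.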


For training, traditional ML methods require data sets covering hundreds or even thousands of relevant features. For this reason, painstaking selection, extraction, and curation of feature sets for learning is often required.


Neural Networks

Neural networks are part of a subfield of machine learning often referred to as brain-inspired computation: programs or algorithms that take some aspects of their basic form or functionality from the way we understand the brain to work.


The main computational element of the brain is the neuron. There are approximately 86 billion neurons in the average human brain. The neurons are connected together, with a number of elements entering them called dendrites and an element leaving them called an axon. The neuron accepts the signals entering it via the dendrites, performs a computation on those signals, and generates a signal on the axon. These input and output signals are referred to as activations. The axon of one neuron branches out and is connected to the dendrites of many other neurons. The connection between a branch of the axon and a dendrite is called a synapse. There are estimated to be 10^14 to 10^15 synapses in the average human brain.


Neural networks take their inspiration from the notion that a neuron’s computation involves a weighted sum of the input values. These weighted sums correspond to the value scaling performed by the synapses and the combining of those values in the neuron. Thus, by analogy with the brain, neural networks apply a nonlinear function to the weighted sum of the input values.
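A single artificial neuron of this kind can be sketched in a few lines. The sigmoid used as the nonlinear function and the added bias term are assumptions made for illustration, since the text does not name a particular function; the names `sigmoid` and `neuron` are likewise hypothetical.

```python
import math


def sigmoid(z):
    """One common choice of nonlinear function (an assumption here)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs, followed by a nonlinear function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


# Two input activations scaled by two "synapse" weights
activation = neuron([1.0, 2.0], [0.5, -0.25], 0.0)
print(activation)
```

Here the weighted sum is 0.5 * 1.0 + (-0.25) * 2.0 = 0, and the sigmoid of 0 is 0.5, so the neuron outputs an activation of 0.5.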


The traditional neural networks had three layers. The neurons in the input layer receive some values and propagate them to the neurons in the middle layer of the network, which is also frequently called a “hidden layer.” The weighted sums from one or more hidden layers are ultimately propagated to the output layer, which presents the final outputs of the network to the user.


Deep learning & Deep Neural Networks (DNN)

Within the domain of neural networks, there is an area called deep learning, in which the neural networks have more than three layers, i.e., more than one hidden layer. The number of network layers used in deep learning typically ranges from five to more than a thousand.


Deep learning is a type of machine learning in which a model learns to perform classification tasks directly from images, text, or sound. Deep learning is usually implemented using a neural network architecture. Deep learning approaches can be categorized as follows: Supervised, semi-supervised or partially supervised, and unsupervised. In addition, there is another category of learning approach called Reinforcement Learning (RL).

Supervised learning is a learning technique that uses labeled data. There are different supervised learning approaches for deep learning, including Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), the latter including Long Short Term Memory (LSTM) and Gated Recurrent Units (GRU). Semi-supervised learning is learning that occurs based on partially labeled datasets.
Unsupervised learning systems are ones that can learn without the presence of data labels. In this case, the agent learns the internal representation or important features to discover unknown relationships or structure within the input data. Clustering, dimensionality reduction, and generative techniques are often considered unsupervised learning approaches.
A deep neural network combines multiple nonlinear processing layers, using simple elements operating in parallel and inspired by biological nervous systems. It consists of an input layer, several hidden layers, and an output layer. The layers are interconnected via nodes, or neurons, with each hidden layer using the output of the previous layer as its input.
The term “deep” refers to the number of layers in the network—the more layers, the deeper the network. Traditional neural networks contain only 2 or 3 layers, while deep networks can have hundreds. Each layer in the network takes in data from the previous layer, transforms it, and passes it on. The network increases the complexity and detail of what it is learning from layer to layer.
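The layer-by-layer flow described above, where each layer uses the output of the previous layer as its input, can be sketched as a simple forward pass. This is a toy illustration only: the layer sizes, the weights, and the use of tanh as the nonlinear function are assumptions, and the helper names `dense` and `forward` are hypothetical.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: weighted sums followed by tanh."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the data through each layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations


# Each entry is (weight matrix, bias vector); values are arbitrary.
layers = [
    ([[0.5, -0.5], [0.25, 0.75], [-1.0, 1.0]], [0.0, 0.0, 0.0]),  # 2 inputs -> 3 units
    ([[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]], [0.0, 0.0]),            # 3 units -> 2 units
    ([[1.0, -1.0]], [0.0]),                                       # 2 units -> 1 output
]
output = forward([1.0, 1.0], layers)
print(output)
```

Making the network deeper in this sketch amounts to appending more (weights, biases) pairs to `layers`; in practice the weights would be learned from data rather than written by hand.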

DNNs are capable of learning high-level features with more complexity and abstraction than shallower neural networks. DNNs also address a limitation of traditional machine learning: with dozens of layers, they learn not only classifications but also the relevant features. This capability allows deep learning systems to be trained using relatively unprocessed data (e.g., image, video, or audio data) rather than feature-based training sets.


In 2016, Google’s computers roundly beat the world-class Go champion Lee Sedol, marking a milestone in artificial intelligence. The winning computer program, created by researchers at Google DeepMind in London, used an artificial neural network that took advantage of what’s known as deep learning, a strategy by which neural networks involving many layers of processing are configured in an automated fashion to solve the problem at hand.

Deep learning systems therefore often consume very large amounts of raw input data. They process this data through many layers of nonlinear transformations in order to calculate a target output. Deep learning (DL) algorithms allow high-level abstraction from the data, which is helpful for automatic feature extraction and for pattern analysis and classification.


Deep learning is especially well-suited to identification applications such as object recognition, face recognition, text translation, voice recognition, and advanced driver assistance systems, including lane classification and traffic sign recognition.


Advanced tools and techniques have dramatically improved deep learning algorithms, to the point where they can outperform humans at classifying images, win against the world’s best Go player, or enable a voice-controlled assistant like Amazon Echo or Google Home to find and download that new song you like.


DL is employed in several situations where machine intelligence would be useful:

  • Absence of a human expert (navigation on Mars)
  • Humans are unable to explain their expertise (speech recognition, vision, and language understanding)
  • The solution to the problem changes over time (tracking, weather prediction, preference, stock, price prediction)
  • Solutions need to be adapted to the particular cases (biometrics, personalization).
  • The problem size is too vast for our limited reasoning capabilities (calculating webpage ranks, matching ads on Facebook, sentiment analysis).

At present, DL is being applied in almost all areas. As a result, this approach is often called a universal learning approach.


Factors accounting for success of Deep Neural networks

Over the past decade, DNNs have become the state-of-the-art algorithms of Machine Learning in speech recognition, computer vision, natural language processing and many other tasks. DNNs are employed in a myriad of applications from self-driving cars, to detecting cancer to playing complex games.


Deep learning networks typically use many layers (sometimes more than 100) and often use a large number of units at each layer, to enable the recognition of extremely complex, precise patterns in data. Data come in and are divided up among the nodes in the bottom layer. Each node manipulates the data it receives and passes the results on to nodes in the next layer, which manipulate the data they receive and pass on the results, and so on. The output of the final layer yields the solution to some computational problem. Although these networks outperform more conventional algorithms on many visual-processing tasks, they require much greater computational resources.


Three technology enablers make this degree of accuracy possible:

Easy access to massive sets of labeled data

Researchers now have access to large datasets to feed the algorithms and “train” them. These datasets contain millions of images, each annotated by humans with different levels of identification. For example, Farfade trained his face-detection algorithm using a database of 200,000 images featuring faces shown at various angles and orientations, plus 20 million images that did not contain faces. The PIPER method was evaluated on a dataset of over 60,000 instances of 2,000 individuals collected from public Flickr photo albums, with only about half of the person images containing a frontal face.


Data sets such as ImageNet and PASCAL VoC are freely available, and are useful for training on many different types of objects.

Increased computing power

There has been a significant leap in the availability of computational processing power. Researchers have been taking advantage of graphics processing units (GPUs), chips designed for high performance in processing the huge amount of visual content needed for video games.


High-performance GPUs accelerate training on the massive amounts of data needed for deep learning, reducing training time from weeks to hours.


Pretrained models built by experts

Models such as AlexNet can be retrained to perform new recognition tasks using a technique called transfer learning.
While AlexNet was trained on 1.3 million high-resolution images to recognize 1000 different objects, accurate transfer learning can be achieved with much smaller datasets.


Challenges of DL

The superior accuracy of DNNs, however, comes at the cost of high computational complexity. While general-purpose compute engines, especially graphics processing units (GPUs), have been the mainstay for much DNN processing, increasingly there is interest in providing more specialized acceleration of the DNN computation.


Accordingly, designing efficient hardware architectures for deep neural networks is an important step towards enabling the wide deployment of DNNs in AI systems. The industry is exploring next-generation chip architectures such as on-chip memories or neuromorphic chips, to reduce the significant costs of data exchange.


Neuromorphic computing is a method of computer engineering in which elements of a computer are modeled after systems in the human brain and nervous system. The term refers to the design of both hardware and software computing elements.


There are several challenges for DL that researchers have addressed:

  • Big data analytics using DL: As the amount of data increases beyond a certain point, the performance of traditional machine learning approaches plateaus, whereas the performance of DL approaches continues to improve as the amount of data grows.
  • Scalability of DL approaches: As data explodes in velocity, variety, veracity, and volume, it is getting increasingly difficult to scale compute performance using enterprise-class servers and storage in step with the increase. High-performance embedded computing (HPEC) platforms have come a long way, not only in handling deep learning algorithms but also in meeting size, weight, power, and cost (SWaP)-constrained system requirements.
  • Ability to generate data: This is important where data is not available for training the system (especially for computer vision tasks such as inverse graphics). Generative models are another challenge for deep learning. One example is the generative adversarial network (GAN), an outstanding approach for generating data with the same distribution as the training data.
  • Energy-efficient techniques for special-purpose devices, including mobile intelligence: A lot of research has been conducted on energy-efficient deep learning approaches with respect to network architectures and hardware.
  • Multi-task and transfer learning, or multi-module learning: This means learning from different domains or with different models together. Google’s “One Model To Learn Them All” is a good example. This approach can learn from different application domains, including ImageNet, multiple translation tasks, image captioning (MS-COCO dataset), a speech recognition corpus and an English parsing task.
  • Dealing with causality in learning: A learning system with causality has been presented, in the form of a graphical model that defines how one may infer a causal model from data. Recently a DL-based approach has been proposed for solving this type of problem.


MIT study finds that Deep Neural Networks can match primate brain in object recognition

A study from MIT neuroscientists has found that one of the latest generations of these so-called “deep neural networks” matches the primate brain at recognizing objects, a task that the human brain performs very accurately and quickly.


For vision-based neural networks, scientists have been inspired by the hierarchical representation of visual information in the brain. As visual input flows from the retina into primary visual cortex and then inferotemporal (IT) cortex, it is processed at each level and becomes more specific until objects can be identified.


To mimic this, neural network designers create several layers of computation in their models. Each level performs a mathematical operation, such as a linear dot product. At each level, the representations of the visual object become more and more complex, and unneeded information, such as an object’s location or movement, is cast aside.


For this study, the researchers first measured the brain’s object recognition ability. Led by Hong and Majaj, they implanted arrays of electrodes in the IT cortex as well as in area V4, a part of the visual system that feeds into the IT cortex. This allowed them to see the neural representation — the population of neurons that respond — for every object that the animals looked at.


The researchers could then compare this with representations created by the deep neural networks, which consist of a matrix of numbers produced by each computational element in the system. Each image produces a different array of numbers. The accuracy of the model is determined by whether it groups similar objects into similar clusters within the representation. The best network was one that was developed by researchers at New York University, which classified objects as well as the macaque brain.



Helm.ai Pioneers Breakthrough “Deep Teaching” of Neural Networks

Helm.ai, a developer of next-generation AI software, announced a breakthrough in unsupervised learning technology. This new methodology, called Deep Teaching, enables Helm.ai to train neural networks without human annotation or simulation for the purpose of advancing AI systems. According to the company, Deep Teaching has far-reaching implications for the future of computer vision and autonomous driving, as well as industries including aviation, robotics, manufacturing and even retail.


Supervised learning refers to the process of training neural networks to perform certain tasks using training examples, typically provided by a human annotator or a synthetic simulator. Unsupervised learning is the process of enabling AI systems to learn from unlabelled information, infer inputs and produce solutions without the assistance of pre-established input and output patterns.


Deep Teaching is the next cutting-edge development in AI. It enables Helm.ai to train neural networks in an unsupervised fashion, delivering computer vision capabilities that surpass state-of-the-art performance with unprecedented development speed and accuracy. When applied to autonomous driving, Deep Teaching allows Helm.ai to train on vast volumes of data more efficiently, without a need for large-scale fleets or armies of human annotators, edging closer to fully self-driving systems.


For example, as the first use-case of Helm.ai’s Deep Teaching technology, it trained a neural network to detect lanes on tens of millions of images from thousands of different dashcam videos from across the world without any human annotation or simulation. The resulting neural network is robust out of the box to a slew of corner cases well known to be difficult in the autonomous driving industry, such as rain, fog, glare, faded/missing lane markings and various illumination conditions. As a sanity check, using this neural network, Helm.ai has topped out public computer vision benchmarks with minimal engineering effort.


In addition, Helm.ai has built a full-stack autonomous vehicle that is able to steer autonomously on steep and curvy mountain roads using only one camera and one GPU (no maps, no Lidar and no GPS), without ever training on data from these roads, and performing well above today’s state-of-the-art production systems. Since then, Helm.ai has applied Deep Teaching throughout the entire AV stack, including semantic segmentation for dozens of object categories, monocular vision depth prediction, pedestrian intent modeling, Lidar-Vision fusion and automation of HD mapping. Deep Teaching is agnostic to the object categories or sensors at hand, making it applicable well beyond autonomous driving.


Helm.ai has very quickly achieved numerous breakthroughs in autonomous driving technologies, producing systems that offer higher levels of accuracy, agility and safety, and solving corner cases at a small fraction of the cost and time required by traditional deep learning methods. “Traditional AI approaches that rely upon manually annotated data are wholly unsuited to meet the needs of autonomous driving and other safety-critical systems that require human-level computer vision accuracy,” said Helm.ai CEO Vlad Voroninski. “Deep Teaching is a breakthrough in unsupervised learning that enables us to tap into the full power of deep neural networks by training on real sensor data without the burden of human annotation nor simulation.”






About Rajesh Uppal
