Digital Signal Processing (DSP) algorithms and Digital Signal Processors

Rajesh Uppal July 29, 2022 Comm. & NW, Electronics & EW

Digital Signal Processors (DSP) take real-world signals like voice, audio, video, temperature, pressure, or position that have been digitized and then mathematically manipulate them. Signals need to be processed so that the information that they contain can be displayed, analyzed, or converted to another type of signal that may be of use. Although real-world signals can be processed in their analog form, processing signals digitally provides the advantages of high speed and accuracy.

Signals may be compressed so that they can be transmitted quickly and more efficiently from one place to another (e.g. teleconferencing can transmit speech and video via telephone lines). Signals may also be enhanced or manipulated to improve their quality or provide information that is not sensed by humans (e.g. echo cancellation for cell phones or computer-enhanced medical images).

DSP technology uses specially designed programs and algorithms to manipulate digitized signals and produce a signal that is higher-quality, less prone to degradation, or easier to transmit. This typically requires the DSP to perform a large number of simple mathematical operations (addition, subtraction, multiplication, division, and the like) within a fixed or constrained time frame.

A DSP is a specialized microprocessor chip that is optimized for the task of digital signal processing. Most general-purpose microprocessors can also execute digital signal processing algorithms successfully, but may not be able to keep up with such processing continuously in real time. A specialized DSP tends to provide a lower-cost solution, with better performance, lower latency, and no requirement for specialized cooling or large batteries. Dedicated DSPs are therefore better suited to portable devices such as mobile phones, where power consumption is constrained.

Such performance improvements have led to the introduction of digital signal processing in commercial communications satellites. Receiving and processing the uplinked signals and readying them for downlinking traditionally requires hundreds or even thousands of analog filters, switches, frequency converters and so on. These can be replaced with specialised DSPs, with significant benefits to the satellites’ weight, power consumption, complexity and cost of construction, reliability, and flexibility of operation. For example, the SES-12 and SES-14 satellites from operator SES, launched in 2018, were both built by Airbus Defence and Space with 25% of capacity using DSP.

DSPs are widely used in audio signal processing, telecommunications, digital image processing, radar, sonar and speech recognition systems, and in common consumer electronic devices such as mobile phones, disk drives and high-definition television (HDTV) products.

A typical digital signal processing system follows a basic architecture that facilitates the digital conversion and manipulation of an analog signal. The first requirement for DSP is always a signal source – there must be a signal such as sound, light, temperature or pressure to filter, measure, or compress. The first step in processing the signal is to convert the analog signal into a digital signal using an analog-to-digital converter (ADC). An ADC converts an input analog voltage to a digital measurement of that voltage.

Following the conversion of the signal to the digital format, the data can be passed through a DSP microprocessor chip where the signal may be filtered, compressed or otherwise manipulated according to application-specific requirements. Once the digital signal has been suitably modified, it may be converted back into analog format with the use of a digital-to-analog converter (DAC). The end result will be a new analog signal that represents a digital modification of the original input signal.
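The chain described above (ADC, digital processing, DAC) can be sketched in a few lines of Python. This is only an idealized illustration: the function names `adc`, `dac` and `process`, the 8-bit resolution, the 1.0 V reference and the "halve every sample" processing stage are all assumptions made for the example, not properties of any particular converter or DSP.

```python
import math

def adc(voltage, v_ref=1.0, bits=8):
    """Ideal ADC: quantize a voltage in [0, v_ref) to an unsigned integer code."""
    levels = 1 << bits
    code = int(voltage / v_ref * levels)
    return max(0, min(levels - 1, code))  # clamp out-of-range inputs

def dac(code, v_ref=1.0, bits=8):
    """Ideal DAC: convert an integer code back to a voltage."""
    return code * v_ref / (1 << bits)

def process(codes):
    """Placeholder digital stage: apply a gain of 0.5 by shifting right one bit."""
    return [c >> 1 for c in codes]

# A slowly varying "analog" input, sampled at 8 points.
analog_in = [0.5 + 0.4 * math.sin(2 * math.pi * n / 8) for n in range(8)]
digital = [adc(v) for v in analog_in]            # analog -> digital
analog_out = [dac(c) for c in process(digital)]  # process, then digital -> analog
```

With these assumptions an input of 0.5 V becomes code 128, is halved to 64, and leaves the DAC as 0.25 V.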

Typical DSP Algorithms

DSP architecture has been shaped by the requirements of predictable and accurate real-time digital signal processing. An example is the Finite Impulse Response (FIR) filter. The main component of a filter algorithm is the ‘multiply and accumulate’ operation, typically referred to as MAC. Coefficient data must be retrieved from memory, and the whole operation must be executed in a predictable and fast way so as to sustain a high throughput rate. Finally, high accuracy should typically be guaranteed. These requirements are common to many other algorithms performed in digital signal processing, such as Infinite Impulse Response (IIR) filters and Fourier Transforms.
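To make the role of the MAC operation concrete, here is a minimal sketch of a direct-form FIR filter, y[n] = sum over k of h[k] * x[n-k], written in Python. The function name `fir_filter` and the example coefficients are illustrative assumptions; a real DSP would run the inner loop as one hardware MAC instruction per tap rather than as interpreted code.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter: each output is a sum of coefficient * past sample."""
    out = []
    for n in range(len(samples)):
        acc = 0  # accumulator, cleared for every output sample
        for k, h in enumerate(coeffs):
            if n - k >= 0:                 # samples before the start count as zero
                acc += h * samples[n - k]  # one multiply-accumulate (MAC) per tap
        out.append(acc)
    return out

# 4-tap moving average applied to a short ramp.
print(fir_filter([4, 8, 12, 16, 20], [0.25, 0.25, 0.25, 0.25]))
# prints [1.0, 3.0, 6.0, 10.0, 14.0]
```

Each output sample costs one MAC per coefficient, which is why sustained MAC throughput dominates filter performance.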

Some DSP algorithms and important DSP computations include correlation, convolution, and digital filters; the stochastic-gradient and least-mean-square (LMS) adaptive filters; block matching algorithm for motion estimation (ME), discrete cosine transform (DCT) and vector quantization (VQ) for image processing and compression; Viterbi algorithm and dynamic programming; decimator and interpolator, and wavelets and filter banks for multirate signal processing.

DSP Algorithms	System Applications
Speech coding and decoding	Digital cellular phones, personal communication systems, digital cordless phones, multimedia computers, secure communications
Speech encryption and decryption	Digital cellular phones, personal communication systems, digital cordless phones, secure communications
Speech recognition	Advanced user interfaces, multimedia workstations, robotics and automotive applications, digital cellular phones, personal communication systems, digital cordless phones
Speech synthesis	Multimedia PCs, advanced user interfaces, robotics
Modem algorithms	Digital cellular phones, personal communication systems, digital cordless phones, digital audio broadcast, multimedia computers, wireless computing, navigation, data/facsimile modems, secure communications
Noise cancellation	Professional

Digital Signal Processor (DSP)

A digital signal processor (DSP) is a specialized microprocessor chip, with its architecture optimized for the operational needs of digital signal processing.

Digital Signal Processors (DSPs) are microprocessors with the following characteristics:
a) Real-time digital signal processing capabilities. DSPs typically have to process data in real time, i.e., the correctness of the operation depends heavily on the time when the data processing is completed.
b) High throughput. DSPs can sustain processing of high-speed streaming data, such as audio and multimedia data processing.
c) Deterministic operation. The execution time of DSP programs can be foreseen accurately, thus guaranteeing a repeatable, desired performance.
d) Re-programmability by software. Different system behaviour might be obtained by re-coding the algorithm executed by the DSP instead of by hardware modifications.

DSPs are fabricated on MOS integrated circuit chips.

A digital signal processing chip contains four main components:

Program Memory – DSP chips contain two types of memory. The first type, the program memory, stores the programs and algorithms that the chip will use to process data. Programming for DSP chips varies significantly by application.
Data Memory – The second type of memory used in DSP chips is known as data memory. This is where the chip stores the data it receives and that will be processed on the chip. Data is typically received as a digital signal that was previously converted from an analog signal.
Compute Engine – The compute engine is the central processing unit of the DSP chip. This is where the computational power for the chip lives and where the algorithms from program memory will be applied to process data.
Input/Output – A DSP chip may possess a number of different types of ports, including serial ports, timers, host ports, external ports, LINK ports, and other types. Ports allow the DSP to exchange data with other devices, such as ADCs or DACs. A DSP may also be incorporated into a larger computer system through port connections.

Hardware Architecture

In engineering, hardware architecture refers to the identification of a system’s physical components and their interrelationships. This description, often called a hardware design model, allows hardware designers to understand how their components fit into a system architecture and gives software component designers the information they need for software development and integration.

A general-purpose processor (GPP) is not designed to perform intensive multiplication tasks. Even some modern GPPs require multiple instruction cycles to perform a multiplication, while a DSP uses specialized hardware to implement single-cycle multiplication. DSPs also provide accumulator registers, which are usually wider than the other registers, to hold the sum of multiple products. In addition, almost all DSP instruction sets contain explicit MAC instructions, which take full advantage of the specialized multiply-accumulate hardware.

Memory architecture

Traditionally, GPPs use the von Neumann memory structure. In this structure, a single memory space is connected to the processor core through one set of buses (an address bus and a data bus). A multiplication then normally requires four memory accesses (for example, fetching the instruction, reading the two operands, and writing a value back), which take at least four instruction cycles.

DSPs are usually optimized for streaming data and use special memory architectures that are able to fetch multiple data or instructions at the same time.

Most DSPs use the Harvard architecture, which has two memory spaces, one for programs and one for data. Each space has its own set of buses connected to the processor core, allowing both to be accessed simultaneously. This arrangement doubles the bandwidth of the processor memory and, more importantly, provides data and instructions to the processor core at the same time. With this layout, the DSP is able to implement a single-cycle MAC instruction.

Circular buffers are limited memory regions where data are stored in a First-In First-Out (FIFO) way; these memory regions are managed in a ‘wrap-around’ way, i.e., the last memory location is followed by the first memory location. Two sets of pointers are used, one for reading and one for writing; the length of the step at which successive memory locations are accessed is called the ‘stride’. Address generator units allow striding through the circular buffers without requiring dedicated instructions to work out the next memory location to access, to detect errors, and so on. Circular buffers allow bursts or continuous streams of data to be stored and processed in the order in which they arrived.
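The wrap-around behaviour can be illustrated with a small software model. The class below is a hypothetical sketch (the name `CircularBuffer` and its methods are assumptions for this example) with a fixed stride of 1; on a DSP the modulo update of the pointers is done by the address generator unit rather than by explicit instructions.

```python
class CircularBuffer:
    """Fixed-size FIFO whose read and write pointers wrap around (stride of 1)."""

    def __init__(self, size):
        self.data = [0] * size
        self.size = size
        self.read_idx = 0   # next location to read
        self.write_idx = 0  # next location to write
        self.count = 0      # number of unread items

    def write(self, value):
        if self.count == self.size:
            raise OverflowError("buffer full")
        self.data[self.write_idx] = value
        # Modulo addressing: the last location is followed by the first.
        self.write_idx = (self.write_idx + 1) % self.size
        self.count += 1

    def read(self):
        if self.count == 0:
            raise IndexError("buffer empty")
        value = self.data[self.read_idx]
        self.read_idx = (self.read_idx + 1) % self.size
        self.count -= 1
        return value

buf = CircularBuffer(3)
for sample in (1, 2, 3):
    buf.write(sample)
first = buf.read()  # 1: the oldest sample leaves first
buf.write(4)        # reuses the slot that held 1
```

After the last write the buffer holds 2, 3, 4 in arrival order even though 4 is physically stored in the first memory location.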

Software architecture

By the standards of general-purpose processors, DSP instruction sets are often highly irregular; while traditional instruction sets are made up of more general instructions that allow them to perform a wider variety of operations, instruction sets optimized for digital signal processing contain instructions for common mathematical operations that occur frequently in DSP calculations.

Instruction sets

Multiply–accumulate (MAC) operations, including fused multiply–add (FMA), are used extensively in all kinds of matrix operations, including convolution for filtering, dot products, and polynomial evaluation.
Fundamental DSP algorithms such as FIR filters and the fast Fourier transform (FFT) depend heavily on multiply–accumulate performance.
Related instruction set architectures: SIMD and VLIW.
Specialized instructions for modulo addressing in ring buffers and bit-reversed addressing mode for FFT cross-referencing. In fact, many implementations of the Fourier transforms require a re-ordering of either the input or the output data that corresponds to reversing the order of the bits in the array index
DSPs sometimes use time-stationary encoding to simplify hardware and increase coding efficiency.
Multiple arithmetic units may require memory architectures to support several accesses per instruction cycle – typically supporting reading 2 data values from 2 separate data buses and the next instruction (from the instruction cache, or a 3rd program memory) simultaneously.
Special loop controls, such as zero-overhead looping and hardware loop buffers, which provide architectural support for executing a few instruction words in a very tight loop without overhead for instruction fetches or exit testing.
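The bit-reversed re-ordering mentioned in the list above can be shown with a short Python sketch. The helper names `bit_reverse` and `bit_reversed_order` are assumptions for this illustration, and the transform length is assumed to be a power of two; a DSP with a bit-reversed addressing mode produces these indices in hardware instead of computing them in a loop.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the lowest bit of index onto result
        index >>= 1
    return result

def bit_reversed_order(n):
    """Index order for a length-n array (n assumed to be a power of two)."""
    bits = n.bit_length() - 1  # number of address bits, e.g. 3 for n = 8
    return [bit_reverse(i, bits) for i in range(n)]

print(bit_reversed_order(8))
# prints [0, 4, 2, 6, 1, 5, 3, 7]
```

For example, index 1 (binary 001) maps to 4 (binary 100), while index 2 (binary 010) maps to itself.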

Data instructions

Saturation arithmetic, in which operations that produce overflows will accumulate at the maximum (or minimum) values that the register can hold rather than wrapping around (maximum+1 doesn’t overflow to minimum as in many general-purpose CPUs, instead it stays at maximum). Sometimes various sticky bits operation modes are available.
Fixed-point arithmetic is often used to speed up arithmetic processing
Single-cycle operations to increase the benefits of pipelining
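The difference between saturating and wrapping arithmetic, and the idea of fixed-point representation, can be sketched as follows. The 16-bit Q15 format (values in [-1, 1) scaled by 32768) and all function names here are assumptions chosen for the example; actual DSPs support a range of word lengths and formats.

```python
Q15_MAX = 32767   # largest signed 16-bit value
Q15_MIN = -32768  # smallest signed 16-bit value

def saturate(x):
    """Clamp x to the signed 16-bit range instead of letting it wrap."""
    return max(Q15_MIN, min(Q15_MAX, x))

def sat_add(a, b):
    """Saturating addition: overflow sticks at the maximum or minimum."""
    return saturate(a + b)

def wrap_add(a, b):
    """Two's-complement 16-bit addition as on many general-purpose CPUs."""
    return ((a + b + 32768) & 0xFFFF) - 32768

def to_q15(x):
    """Convert a real number in [-1, 1) to Q15; out-of-range values saturate."""
    return saturate(int(round(x * 32768)))

def q15_mul(a, b):
    """Multiply two Q15 values; the shift rescales the 30-fractional-bit product."""
    return saturate((a * b) >> 15)

print(sat_add(Q15_MAX, 1))   # prints 32767
print(wrap_add(Q15_MAX, 1))  # prints -32768
print(q15_mul(to_q15(0.5), to_q15(0.5)))  # prints 8192, i.e. 0.25 in Q15
```

In signal terms, saturation turns an overflow into clipping, whereas wrap-around flips the sample to the opposite extreme, which is usually far more audible or visible.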

Program flow

Floating-point unit integrated directly into the datapath
Pipelined architecture
Highly parallel multiplier–accumulators (MAC units)
Hardware-controlled looping, to reduce or eliminate the overhead required for looping operations

Fixed-point DSP Instruction Set

The fixed-point DSP instruction set is designed with two goals:

(1) To enable the processor to complete multiple operations in each instruction cycle, thereby improving the computing efficiency of each instruction cycle.

(2) To minimize the memory space for storing DSP programs. This is particularly important in cost-sensitive DSP applications.

Software programs

(1) Most widely used high-level languages, such as C, are not suitable for describing typical DSP algorithms.

(2) The complexity of the DSP structure, such as multiple memory spaces and buses, irregular instruction sets, highly specialized hardware, etc., makes it difficult to write efficient compilers.

Even if the C source code is compiled into the assembly code of the DSP with a compiler, the optimization is still a heavy task. Typical DSP applications have a large number of calculation requirements and strict overhead restrictions, making the optimization of the program essential.

Software tools

The improvement of DSP software tools from the early days until now has been spectacular. Code compilers have evolved greatly to be able to deal with the underlying hardware complexity and the enhanced DSP architectures. At the same time, they allow the developer to program more and more efficiently in high-level languages as opposed to assembly coding. This considerably shortens code development time and makes the code itself more portable across different platforms.

Advanced tools now allow the programming of DSPs graphically, i.e., by interconnecting pre-defined blocks that are then converted to DSP code. Examples of these tools are MATLAB Code Generation and embedded target products and National Instruments’ LabVIEW DSP Module.

High-performance simulators, emulators and debugging facilities give the developer high visibility into the DSP with little or no interference with program execution. Additionally, multiple DSPs can be accessed in the same JTAG chain for both code development and debugging.