large amount of data that has to be processed by the same, relatively simple, mathematical program. SIMD programs exploit data-level parallelism, but require that the exact same instruction be applied to each of the data elements.
VLIW relaxes this constraint in that it allows different instructions (opcodes) to be packed together in a Very
Long Instruction Word, and every instruction therein processes a different datum concurrently. Many DSPs
are VLIW architectures. The types of instructions that are allowed together within one VLIW (and thus will
be executed in parallel) depend on the functional units that can operate in parallel. For example, if a DSP has
two fixed-point MAC units and two floating-point MAC units, then at most two fixed-point MAC operations
can be placed into the same VLIW. This constraint is relaxed even further in so-called MIMD machines,
where multiple identical processors can independently execute arbitrary instructions on non-dependent data.
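To make the distinction concrete, here is a small, hypothetical C sketch (not tied to any particular DSP or compiler): the first function is SIMD-friendly because the same operation is applied to every element, while the second contains four independent, dissimilar operations of the kind a VLIW compiler could pack into the slots of one long instruction word.

    #include <stddef.h>

    /* SIMD-friendly: the SAME operation (multiply-add) is applied to every
     * element, so one instruction can operate on several data lanes at once. */
    void scale_bias(float *out, const float *in, float a, float b, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            out[i] = a * in[i] + b;
    }

    /* VLIW-friendly: four DIFFERENT, independent operations with no data
     * dependences between them; a VLIW compiler may place them in separate
     * slots of one very long instruction word and issue them in one cycle. */
    void mixed_ops(int *x, float *y, int *z, float *w)
    {
        *x = *x + 1;        /* integer add    */
        *y = *y * 2.0f;     /* float multiply */
        *z = *z << 3;       /* integer shift  */
        *w = *w - 0.5f;     /* float subtract */
    }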
You might note that modern CPUs and their multiple-dispatch pipelines do exactly that: they schedule multiple instructions concurrently. With DSPs, however, there is no such intelligent pipeline. Instead, the burden of scheduling is on the compiler: it has to co-schedule instructions for independent data operations and optimize the packing of instructions in width (for example, four instructions per word) and in sequence (control flow). DSPs do not perform such complex CPU operations as branch prediction or instruction reordering. Here, too, the compiler has to perform the optimizations.
DSP programs are relatively small (tens or hundreds of lines of code), with few branch and control instructions, as opposed to entire operating systems running on general-purpose CPUs. Frequently, a single, tight, and heavily optimized loop is executed once for every data element or set thereof.
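For illustration, the following C sketch (a generic, hypothetical kernel, not vendor code) shows what such a loop typically looks like: a FIR filter computing one output sample as a sequence of multiply-accumulate (MAC) operations. Splitting the sum across two independent accumulators makes the absence of data dependences explicit, so a compiler targeting the two-MAC-unit DSP mentioned above could fill both MAC slots of each instruction word.

    #include <stddef.h>

    /* Minimal FIR filter kernel: one tight loop of multiply-accumulate (MAC)
     * operations over the filter coefficients for every output sample.
     * Two independent accumulators expose the fact that successive MACs do
     * not depend on each other, so a VLIW compiler can schedule them onto
     * two MAC units in parallel. */
    float fir_sample(const float *coeff, const float *history, size_t taps)
    {
        float acc0 = 0.0f, acc1 = 0.0f;

        for (size_t i = 0; i + 1 < taps; i += 2) {
            acc0 += coeff[i]     * history[i];      /* MAC unit 0 */
            acc1 += coeff[i + 1] * history[i + 1];  /* MAC unit 1 */
        }
        if (taps & 1)                               /* odd tap count */
            acc0 += coeff[taps - 1] * history[taps - 1];

        return acc0 + acc1;
    }

In a real system, a function like this would be invoked once per incoming sample, with history holding the most recent input values.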
Since DSPs usually execute small programs on huge amounts or endless streams of data, these two pieces
of information are stored in separate memory blocks, often accessible through separate buses. This is called
a Harvard architecture, as opposed to the GPP’s von Neumann architecture, in which both program and data
are stored in the same memory. Since the program does not change (firmware!), many DSPs provide on-
chip ROM (typically in the order of 10kB) for program storage, and a small but efficient RAM hierarchy for
data storage. Frequently, an embedded system also includes a separate non-volatile memory chip such as an
EEPROM or flash memory.
DSPs lack just a few operations, mostly operating-system-specific instructions. Otherwise, they can perform the same operations as CPUs, but they perform them faster, need less power, dissipate less heat, have shorter start-up times, can operate over a wider temperature range, and are less expensive because the chip contains only the necessary components.
Manufacturers of DSPs include Agere Systems, Analog Devices, Infineon, Lucent Technologies, Motorola (Freescale Semiconductor), Philips Electronics, Texas Instruments, and Zilog. Bruno Paillard wrote
a good introduction to DSPs; it can be found at http://www.softdb.com/media/DSP_Introduction_en.pdf. A
textbook resource by Wiley is Lynn and Fuerst's "Introductory Digital Signal Processing with Computer Applications." The USENET group comp.dsp might also be of interest to the reader.
3.2 Field Programmable Gate Arrays
A Field Programmable Gate Array, or FPGA, is a semiconductor device in which the actual logic can be modified to the application builder's needs. The chip is a relatively inexpensive, off-the-shelf device that can be programmed in the "field" and not in the semiconductor fab. It is important to note the difference between software programming and logic programming, or logic design as it is usually called: a software program always
needs to run on some microcontroller with an appropriate instruction set architecture (ISA), whereas a logic
program is the microcontroller. In fact, this logic program can specify a controller that accepts as input a
particular ISA, for example, the ISA of an ARM CPU, effectively turning the FPGA into an ARM CPU.
This is a so-called soft core, built from general-purpose logic blocks. These soft cores, or rather the right to use the intellectual property, can be purchased from companies such as Xilinx and Altera. They are then "downloaded" to the FPGA, where they implement the desired functionality. Some modern FPGAs integrate platform or hard multi-purpose processors alongside the logic, such as a PowerPC, an ARM, or a DSP