FPGAs DEMYSTIFIED
by Eddie Insam
edinsam@eix.co.uk
copyright © 2004
Electronics World May-June 2004
ABSTRACT
Starting on any new technology can be difficult if the first few steps are not very clear. A typical case is getting to know programmable Gate Arrays. Eddie Insam explains some of the facts, and describes a step by step guide on getting started and building your own simple programming development kit.
A week hardly passes by without us being bombarded with information on new products, new techniques and new devices. The many adverts and press releases are peppered with words like “benefits” and “solutions” which to me at least, don't seem to convey much information. Perhaps this is because I cannot see the problem they are trying to solve in the first place! A situation like this can be very confusing to a newcomer or to a recent graduate. The basic chicken and egg dilemma is that if we know little or nothing about a concept, it is unlikely that we will gain much information from material presented in an abstract way. We won't understand what they are trying to say, or appreciate their benefits.
One such area is Gate Arrays (GA), a technology also known under various other names such as PLDs and FPGAs. This is the kind of topic that can put a dead stop in anybody's learning curve. The main barriers are a lack of appreciation of the uses they can be put to, a perceived high cost of entry and the difficulty in knowing where to start. Unless you are lucky or rich enough to attend a course or seminar, the only practical way in is by reading existing literature. Unfortunately, a lot of the information available seems to assume you know all about the subject and have a lab full of equipment and software tools, most having a purpose in life that appears to be completely unknown. No wonder many designers who have not been brought up on the subject from college days are avoiding or refusing to get involved in this magic world.
This article will dispel some of the myths. Even if you are not interested in using GAs, the article will induce some confidence by introducing some of the jargon. For those wishing to become more involved, there will be a step by step guide, including a circuit for a device programmer that won't cost a fortune. This, together with the free tools available for download from the Internet, will make the inroad into GA know-how accessible to anyone.
Traditionally, GAs are used as a replacement for glue logic. This is the term given to the various logic gates, counters and flip-flops that are dispersed around a PCB to provide general interfacing, buffering and address decoding. Their advantage, from a commercial point of view, is that they are cheaper and occupy less board space than the discrete ICs they replace. In order to make them work, they need to be programmed (not unlike a PROM). The fact that they can be re-programmed in place means designs can be tweaked and modified after the PCB has been committed to production.
As GAs became more powerful, they started to be used as sub-modules or as complete designs in their own right. For example, as purpose built interfaces to microcontrollers, as fast UARTs or as complex communications controllers for Ethernet or encryption systems. The latest generation of GAs are powerful enough to include their own built in CPU cores, which can be used for standard computing or for specialist applications such as Digital Signal Processing (DSP). Developing your own CPU may sound a bit like re-inventing the wheel, but there are major advantages such as adding your own custom instructions to an otherwise standard set. For example, complex procedures that require several "C" language statements in a standard CPU could be performed within a single clock cycle in a customised set. This can result in a serious increase in performance, an important factor in fast applications such as voice or video communications.
Like any good old engineering textbook we shall start at the beginning. Programmable logic arrays have been with us for quite a few years. The original concept was first conceived in the early days of custom integrated circuit design. A few thoughtful manufacturers decided to offer IC "pizza bases" consisting of gates, flip-flops and other components laid out in neat substrate arrays but without the final metallisation layer topping (that is, the wiring connection between the devices). Users only needed to plan a design for this final layer and submit a CAD file containing the node interconnections to the manufacturer for final phase production. The users did not need to get involved in the design of the IC gates, nor did they need to know how they worked apart from knowing there were so many flip-flops, gates or whatever per chip.
Some of you may also remember the venerable 82S32 fusible link PROM (still available, amazingly enough!). This was nothing more than a 32x8 cross array of open collector transistors forming a 256 bit memory cell in a standard dual in line package. The device was "programmed" by applying high voltages to some of the pins, which made the internal die cross links "blow up", ending up with the equivalent of a one time programmable 32 byte ROM. The device could operate at nanosecond switching speeds, very fast in those days, making it useful in applications such as address decoders and dynamic RAM timing generators.
The next evolutionary step saw gates and flip-flops being added within the interconnected fuse arrays. These have the generic name of Programmable Logic Devices (PLDs), also known as PALs or GALs. A typical design has a number of islands or blocks of standard flip-flop layouts with programmable feedback paths and programmable random logic inputs (FIG 1). A typical device may have 8, 10 or more of these macrocells within a dual in line package. The inputs to the flip-flops are fed from a matrix array of combinatorial inputs, so that various Boolean logic combinations can be programmed to drive the registers' D inputs, either from the outside world or from other flip-flops within the package. This makes PLDs useful as simple address decoders, adders, counters, shift registers, timers, parity generators etc. A separate clock pin usually drives all the flip-flops within the chip, and most devices include separate pins for clear, preset and tri-state output enables. Typical part numbers have names like 16V8, 20L8, 22V10 etc, where the first two digits indicate the number of logic input pins, the last digit or digits the number of outputs (and so the number of macrocells), and the middle letter or letters the output type or technology used, with some minor variation among manufacturers. Each of these devices is designed to replace 4-5 equivalent CMOS or TTL packages. Not a world shattering improvement, but useful in context as PCBs get more and more crowded.

FIG 1 CAPTION: A typical PLD consists of a number of macrocells linked to a common cross-wire bus (A). The connections are defined by fuse links that are programmed to be either on or off, thus establishing the final operation of the device. Each macrocell consists of a standard circuit using a D flip-flop and a number of combinatorial AND-OR logic inputs (B).
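As a rough illustration of the macrocell in FIG 1, the Python sketch below models the AND-OR array as a list of product terms feeding a D flip-flop, with the flip-flop's own output fed back as an extra input. The class name, the way the "fuses" are encoded and the example equation are all assumptions made for illustration; they do not correspond to the fuse layout of any real device.

```python
class Macrocell:
    """Toy model of one PLD macrocell: AND-OR array feeding a D flip-flop."""

    def __init__(self, product_terms):
        # Each product term maps an input name to the level it must have.
        # Inputs that are not listed play no part in that term (don't care).
        self.product_terms = product_terms
        self.q = False  # registered output

    def d_input(self, inputs):
        # OR together all product terms; each term is the AND of its literals.
        return any(
            all(inputs[name] == level for name, level in term.items())
            for term in self.product_terms
        )

    def clock(self, inputs):
        # On a clock edge the flip-flop captures the combinatorial result.
        # The cell's own Q is made available as a feedback input named "q".
        self.q = self.d_input({**inputs, "q": self.q})
        return self.q


# Hypothetical example: toggle when enabled, hold otherwise.
# D = (en AND NOT q) OR (NOT en AND q)
cell = Macrocell([{"en": True, "q": False}, {"en": False, "q": True}])
trace = [cell.clock({"en": en}) for en in (True, True, False, True)]
print(trace)
```

Changing the list of product terms is the software equivalent of blowing a different set of fuses: the flip-flop and the surrounding structure stay the same, only the connections change.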
Most of the newer PLD families use CMOS technology, whilst the older ones use established MOS techniques mainly devised to achieve fast switching speeds, with propagation delays of 5 ns or less. One hidden aspect of this technology is that static power dissipation can be quite high, a factor to consider when calculating overall board costs. From the programming point of view, devices are either one-time programmable (in situ or at the factory) or re-programmable by the use of electrically erasable memory cells and special dedicated programming pins. Depending on technology, special voltages and pulse patterns may be required, and as programming information is not always openly or freely available from manufacturers, special commercial programmers may need to be obtained.
Programming information takes the form of fuse co-ordinate listings. These are collections of ones and zeros that more or less mimic the fuse map geography within the device. These files can be generated by hand (a rather laborious task) or with the help of computers and compilers using descriptive languages, which take as input node allocations in the form of text and logic equations. More on this later.
The next step in the development ladder is Complex Programmable Logic Devices (CPLDs). These are, in the main, evolutionary variations on the PLD theme. The register flip-flops are placed in a two dimensional row and column grid with the interconnections straddling them, just like the streets on a city map (FIG 2). This allows many more registers to be placed on a die. Cell designs can also be more complex: typical CPLD devices can have 256 registers or more in a single package, making them reasonably powerful. The basic flavour of the month technology is CMOS, which has the advantage of low power consumption. For most modern devices, programming is done at standard logic levels, doing away with the need for special programming voltages or hardware (even a microprocessor can do it).

FIG 2 CAPTION The more advanced CPLDs and FPGAs consist of hundreds of Logic Array Blocks linked together in a 2D array of cross-connecting links (A). Each Logic Block consists of one or more Logic Element Blocks or macrocells (B). Each macrocell consists of a flip-flop plus programmable feedback logic (C). Various extra circuits can be added to the die, e.g. PLLs, clock multipliers, special purpose I/Os, RAM and even full CPU cores.
Beyond CPLDs are the generic Gate Arrays (GAs), which include Field Programmable Gate Arrays (FPGAs). The generic FPGA architecture is similar to that of CPLDs, but uses much smaller transistor cells and corresponding miniature MOS technologies (CMOS requires massive surface areas per gate in comparison). FPGA macrocells can be much smaller in size and their building blocks can be far more complex. Figures of several million transistors on a die are not uncommon.
One small price to pay with most current FPGA technology is that programming is not permanent. In other words, the fuse link information is stored in RAM within the device and must be re-loaded every time power is applied. This requires external boot loading circuitry known as configuration devices, which are nothing more than special purpose flash EEPROMs holding an image of the fuse map.
Improved processing techniques now allow whole sections within an FPGA device to be allocated for special tasks such as RAM, special purpose I/O or even complete embedded micro-controllers. As an example of current technology, the Xilinx Virtex II/PRO series contains the equivalent of 4Mb of dual port RAM, over 45,000 registers, I/O that can operate in the Gigahertz range and four (yes four) embedded PowerPC CPUs, all in a single package.
If you already know that "Spartan delivers Serdes at Gigabit speeds" has nothing to do with Greeks spreading nasty diseases, you may perhaps want to skip this section. Mythical and obscure names are rife in the Gate Array world, possibly one of the reasons why the uninitiated shy away from this Hobbit-like world. By the way, do not expect these names to be acronyms for anything (at least as far as I know). Just like car model names in TV commercials, they apparently sound good to the ear and convey an element of wizardry to the people who program them.
Four manufacturers dominate the field, of which two: Xilinx and Altera are the brand leaders. The other two, Lattice and Atmel, command a smaller section of the market, but provide good competition by the introduction of innovative features and originality in their devices.
Devices are grouped into "families" with heroic sounding names such as Spartan, Acex or Stratix. These are in the main, variations of a particular fabrication technology, MOS Type, or cell size in microns. Devices are arranged by size, number of devices and external package outline. The more complex devices are assumed to be used in applications where a large number of I/O pins are required, so they are usually fitted into the larger packages. This is not always the case, and most FPGA designs end up leaving most of their I/O pins unused or unallocated.
Families are also divided into CPLDs and FPGAs as described above. In general, most CPLDs include permanent program storage, and FPGAs require external configuration devices. This division is not always strict: Atmel, for example, have FPGA technology that includes permanent program storage. In the simplest of terms, CPLDs will be used in applications requiring up to 512 registers (flip-flops) and FPGAs where more than these are needed.
The main parameters distinguishing members of a family are the number of gates or macrocells per package, the number of I/O pins, the package type (which limits the total number of I/O pins) and other features such as on chip PLLs and special purpose I/O drivers. Subtle differences in the way the macrocells are designed allow for some manufacturers to offer "better" implementation of commonly used logic blocks, for example patented ultra-fast carry look-ahead adders, and tricks to improve performance, such as on-chip clock frequency multipliers.
Families do not always complement each other in a logical way but overlap widely. This may seem confusing and make device choosing rather complicated. The reason for this is partly because the market is heavily technology driven. New IC fabrication techniques quickly make previous families obsolete. With such short design cycle times, many users out there will still be designing using "older" families, which could result in a lot of confusion and aggravation. Manufacturers are keen to continue support for previous families, while at the same time nudging users to move to newer processes, which are usually cheaper and more powerful. This makes sense from the manufacturer's point of view, as they do not want to be lumbered with many legacy manufacturing processes. From the user's point of view, however, this policy can be a disaster, especially when it comes to maintaining a multitude of end products using a range of different devices. Some families have become more settled and popular than others. This is reflected in prices and stock levels from the various suppliers, an important factor to consider when selecting devices.
THE STORY SO FAR
Here is a summary of current status. Of course, no guarantees that this will all be superseded by the time you read this! The list given below is by no means complete and only offered as a general guide. Readers should refer to the published literature and manufacturer's websites for the latest information.
CPLD:
The current Altera CPLD families are the MAX7xxx and MAX3xxx. The last three digits indicate the number of flip-flops or registers per device. For example the 7128 is a 128 register device. The 7xxx series is now a "mature" family (another word for obsolete), and it is one of the few remaining parts that can still operate from 5 volt as well as 3.3 volt power supplies. The newer 3xxx family, recommended for new designs, uses 3.3 volt supplies only but its I/O pins are 5 volt tolerant. Xilinx CPLD families are the 9500 series (with up to 288 registers), and Coolrunner, noted for its low power consumption. Lattice CPLD families are the Mach1, Mach2, Mach4 and the latest Mach4000 series, which offer up to 1024 registers.
Prices for CPLD devices start from less than a dollar each for the smaller devices, making CPLDs very cost effective as a replacement for random glue logic. In general, CPLD prices (and power consumption) are directly proportional to the number of gates in the die whether they are used or not. So it does not pay to over specify a design using a CPLD that is larger than necessary.
FPGA:
The more mature Altera FPGA families are the FLEX6000, FLEX10K and ACEX1K series. The latest Cyclone family is targeted at superseding these, although there is still plenty of life left in the existing ones. More advanced families include Excalibur, with a built in ARM 922T CPU core; Mercury, aimed at high speed I/O intensive products; Hardcopy, aimed at bridging the transition between FPGAs and custom ASICs; and APEX, a high power, general purpose device, culminating with STRATIX at the top end. Xilinx offerings are less complex: their mature technology includes the XC series topped by the SPARTAN series in various disguises, and VIRTEX at the top end, again in various disguises (see table). Top end devices can offer serious processing power, with data bandwidths in the Gigabit per second range and computing power, e.g. DSP, with equivalent performance in the 2 GigaMAC range.
| Manuf | Type | Family | Supply V | I/O pins | FFs / logic elems | RAM bits | Equiv gates | Package options |
|---|---|---|---|---|---|---|---|---|
| Altera | CPLD | MAX7000 | 2.5/3.3/5 | 36-212 | 21-512 | | | PLCC,TQFP,PQFP,BGA |
| Altera | CPLD | MAX3000 | 3.3 | 34-158 | 32-256 | | | PLCC,TQFP |
| Xilinx | CPLD | XC9500 | 2.5/3.3 | 36-192 | 36-288 | | | PLCC,PQFP,TQFP,BGA |
| Xilinx | CPLD | Coolrunner | 1.8/3.3 | 33-270 | 32-512 | | | PLCC,PQFP,TQFP,BGA |
| Altera | FPGA | FLEX6000 | 3.3/5 | 71-218 | 880-2000 | | 10k-24k | TQFP,PQFP,BGA |
| Altera | FPGA | FLEX10K | 2.5/3.3/5 | 59-470 | 576-12000 | 6k-41k | 10k-250k | PLCC,TQFP,PQFP,BGA |
| Altera | FPGA | ACEX1K | 2.5 | 66-333 | 576-5000 | 13k-50k | 10k-100k | TQFP,PQFP,BGA |
| Altera | FPGA | APEX II | 1.5 | 492-1060 | 16k-67k | 420k-1.1M | | BGA |
| Altera | FPGA | STRATIX | 1.5 | 340-1300 | 10k-114k | 920k-10M | | BGA |
| Altera | FPGA | APEX20K | 1.8/2.5 | 92-808 | 1200-52k | 24k-442k | 30k-1.5M | TQFP,PQFP,BGA |
| Altera | FPGA | EXCALIBUR | 1.8 | 186-711 | 4k-38k | 53k-327k | 100k-1M | BGA |
| Altera | FPGA | HARDCOPY | 1.5/1.8 | 275-1060 | 16k-62k | 220k-2.4M | 400k-3M | BGA |
| Altera | FPGA | MERCURY | 1.8 | 303-486 | 5k-14k | 49k-115k | 120k-350k | BGA |
| Altera | FPGA | CYCLONE | 1.5 | 65-301 | 3k-20k | 60k-300k | | TQFP,PQFP,BGA |
| Xilinx | FPGA | SPARTAN II | 2.5 | 86-284 | 432-5300 | 6k-74k | 15k-200k | PQFP,TQFP,BGA |
| Xilinx | FPGA | SPARTAN IIE | 1.8 | 182-330 | 1700-7000 | 24k-96k | 50k-300k | PQFP,TQFP,BGA |
| Xilinx | FPGA | VIRTEX E | 1.8 | 176-804 | 1.7k-73k | 64k-832k | 72k-4M | PQFP,TQFP,BGA |
| Xilinx | FPGA | VIRTEX II/PRO | 1.5 | 88-1200 | 3k-125k | 216k-10M | 40k-8M | BGA |
ON POWER SUPPLIES AND INTERFACES
Here is a little quiz: a large integrated circuit may contain several million transistors; assume each has a pull up resistor of about a megohm. Simple calculations (assuming a 50% on/off ratio) tell us that the average resulting resistance between VCC and GND is less than an ohm. With a five volt supply, the corresponding power dissipation will be about 25 watts. In order to reduce this large sink, manufacturers have developed MOS technologies working at lower voltage supplies. Since a device that runs at half the voltage consumes only a quarter of the power, this strategy is very sound. Most of the newer FPGAs (and CPLDs) operate at 3.3, 2.5, 1.8 or even 1.5 volts. Many FPGAs have two separate power supplies, one for the internal core (at say 2.5v), and one for the external I/O drivers (at say 3.3v). This makes them compatible with external discrete logic circuitry even though internally they are operating at a lower voltage.
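The arithmetic behind the quiz can be checked with a few lines of Python. This is only a back-of-envelope sketch: the figure of two million transistors is an assumption chosen to give round numbers (a larger count gives a lower resistance and a higher dissipation), and the function name is made up for the example.

```python
def static_power(n_transistors, pullup_ohms, on_fraction, supply_volts):
    """Return (equivalent resistance in ohms, dissipation in watts)."""
    conducting = n_transistors * on_fraction
    # N identical resistors in parallel behave like a single resistor R / N.
    r_total = pullup_ohms / conducting
    # Ohm's law: P = V^2 / R
    return r_total, supply_volts ** 2 / r_total


# Assumed: 2 million transistors, 1 megohm pull-ups, half of them conducting.
r, p_5v = static_power(2_000_000, 1e6, 0.5, 5.0)
_, p_2v5 = static_power(2_000_000, 1e6, 0.5, 2.5)
print(r, p_5v, p_2v5)
```

With these figures the equivalent resistance comes out at one ohm and the dissipation at 25 watts on a 5 volt supply. Halving the supply to 2.5 volts brings it down to 6.25 watts, a quarter of the original figure.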
Most CPLDs and FPGAs are not normally compatible with external 5 volt logic, even though some devices with 3.3v I/O drivers can accept 5 volt logic level inputs (a 3.3v output from a GA device can correctly drive a 5v CMOS or TTL logic gate).
Because of their design, CPLDs have a relatively static current consumption, independent of the number of gates "active" and rising only slightly at higher clock speeds. On the other hand, the current consumption of an FPGA is directly dependent on the number of gates active and the clock speed (i.e. the number of transitions). Power dissipation also depends on the way the design is programmed, for example whether it is a synchronous system where all the clocks are fed from the same high speed source, and whether clock enables are used, which may or may not reduce clock dissipation.
The moral of the story is simple: before selecting a device for a project, make sure you have read all the documentation and application notes on power supplies and I/O interfacing. Most give charts of power consumption versus speed and number of gates in use.
LINE DRIVING
FPGAs are heavily used in telecommunications, display panels and CPU bus drivers. No wonder many column inches of advertising are dedicated to profiling the type of interfacing these devices can handle. In order to reduce external component count, bus interface, logic level and impedance matching components are sometimes built into the GAs themselves. This allows the devices to be connected directly to PCI busses and external transmission lines. Devices can include options for differential outputs and inputs, impedance terminating resistors and various voltage threshold arrangements, including clock recovery circuits. This provides a complete in-out (no external component) interface for external loads.
In general, data transmission is effected as a combination of various options of voltage levels and transmission impedance, in single ended or differential form, and in serial or parallel form. The electronics to perform the serialisation and de-serialisation is known as a SERDES, and is built into some FPGA devices as block modules, some of these working at up to Gigabit speeds.
The common standards for single ended I/O interfacing are LVTTL (3.3v), LVCMOS (3.3/2.5/1.8 and 1.5v), PCI (33 and 66MHz busses), GTL and GTL+ (Xerox 0.8v open drain), HSTL (IBM 1.5/1.8v) and SSTL (IBM-Hitachi 2.5/3.3v). Differential standards include LVDS (2.5v) and its variations BLVDS (bi-directional) and LDT (AMD 2.5v).
The GA can be programmed to implement any of these standards by the use of programmable impedance drivers and terminators, input/output levels and slice thresholds.
PACKAGES
Forget the convenient 0.1" spacing DIP standard packages we all know. Most CPLDs and FPGAs are shipped in a vast collection of pinout sizes and shapes, nearly all in surface mount form. There is some standardisation, but with the vast number of options regarding package size, pin separation, outline shape and footprint arrangements, we have to assume that each device is unique. The bottom line is that you should not expect footprint designs to be catalogue items. The best strategy is to read the data sheet carefully together with the application notes covering package layouts. If using a PCB layout CAD program, do make a visual check to ensure that any library footprint dimensions correspond to the information on the datasheet of the device you are using. Better safe than sorry.
Prototyping and soldering surface mount devices requires a masterful combination of bravery and steady hands. However, with some practice this is possible, even with the very fine 0.5mm pin spacing devices. There are many guidelines on how to do this on the Internet, some with step-by-step pictures. Enter keywords “soldering SMDs” in a search engine such as Google to find out more.
PROGRAMMING
All GAs have a number of dedicated pins for device programming. In general there are two programming methods, serial and parallel. With parallel programming, the fuse file is presented as a series of parallel eight bit bytes clocked in one at a time; an internal auto-increment counter loads the data at their right locations in the fuse map. In the serial method, data is clocked in one bit at a time, using a simple four wire serial protocol. The protocol is simple enough to be implemented with a microprocessor, but clever enough to include facilities to allow more than one device to be programmed in daisy chain fashion from the same source. The more advanced devices include functional JTAG interfaces, which can be shared for programming and for on-site debugging.
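To give a feel for the serial method, the sketch below shows how a microprocessor might turn a block of fuse bytes into a sequence of data and clock pin states. It is a minimal sketch only: the bit order (least significant bit first), the clock polarity and the function name are assumptions, and the other handshake wires of a real four wire interface are left out. The data sheet for the target device is the authority on all of these.

```python
def serialise(fuse_bytes, lsb_first=True):
    """Yield (data, clock) pin states that shift the bytes out one bit at a time."""
    for byte in fuse_bytes:
        order = range(8) if lsb_first else range(7, -1, -1)
        for bit_index in order:
            bit = (byte >> bit_index) & 1
            yield (bit, 0)  # present the data bit with the clock low
            yield (bit, 1)  # raise the clock: the device samples the bit here


# One example byte, 0xA5 (binary 1010 0101), sent least significant bit first.
pin_states = list(serialise(bytes([0xA5])))
bits_sent = [data for data, clock in pin_states if clock == 1]
print(bits_sent)
```

In a real programmer each yielded pair would be written to two output port pins in turn. Because the device only sees a stream of bits, several devices can be chained and fed from the same source, as described above.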
CMOS based CPLDs can keep the stored program indefinitely. However, one constraint is that the number of times some of these devices can be reliably programmed is relatively small (about 100 times). FPGAs can be programmed any number of times, but the information is lost when the device is powered off. As already mentioned, the permanent storage has to be kept outside the device in an external EEPROM and transferred across at power-on.
During development, device programming is usually done via a PC and an interface cable connected to the USB or parallel port. The programming files take many formats: raw binary, Intel style hex or comma separated lists of decimal numbers.
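Of the formats mentioned, Intel hex is the easiest to inspect by hand. Each line is a record holding a byte count, a 16 bit address, a record type, the data bytes and a checksum. The sketch below decodes one such record; the function name is hypothetical and the sample record is a generic illustration of the format, not real fuse data from any device.

```python
def parse_intel_hex_record(line):
    """Decode one Intel hex record into (address, record_type, data bytes)."""
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:].strip())
    # All bytes, including the trailing checksum, must sum to zero modulo 256.
    if sum(raw) & 0xFF != 0:
        raise ValueError("checksum mismatch")
    length = raw[0]
    if len(raw) != length + 5:
        raise ValueError("length field does not match record size")
    address = (raw[1] << 8) | raw[2]
    record_type = raw[3]  # 0x00 = data, 0x01 = end of file
    data = raw[4:4 + length]
    return address, record_type, data


address, record_type, data = parse_intel_hex_record(":0300300002337A1E")
print(hex(address), record_type, data.hex())
```

The checksum test is worth keeping even in a quick tool, since a single corrupted character in a programming file could otherwise end up as a wrongly set fuse.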
FIG 3 shows the circuit diagram of a typical programmer for Altera devices using the parallel port of a PC. The multitude of resistors is required to provide voltage level compatibility with both 5 volt and 3.3 volt devices. The programmer can be used to program any device in any of the Altera families, although different value resistors (and supplies) will be required to program the lower voltage parts. The equivalent programmer for Xilinx devices is very similar, but uses different pinouts. Note that two sets of outputs are shown, one for JTAG and the other for PSS (passive serial). Some devices require JTAG format for programming, others PSS (or both).

FIG 3 CAPTION This simple circuit is all that is required to program most CPLDs or FPGAs in the Altera family. The resistors are required to provide compatibility with some of the lower voltage parts. The same circuit is used both for serial programming via the PSS port or for JTAG debugging. A similar layout (but with different pinouts) is used for Xilinx or Lattice devices.
In the next part of the article, I shall describe how to use the development environment and how a simple development system can be put together.
PART II
Following from last month's introduction to Gate Arrays, Eddie Insam will show how to implement a cheap development kit to get started using your own designs.
DEVELOPING PROJECTS USING GAs
Starting from the hardware point of view, the first thing to do is to decide how much of the application you have in mind is going to be implemented using GA technology. Given a large enough device, it may be theoretically possible to fit in all of your design, but this may be impractical in terms of development time, cost and power requirements. An ideal design would have a neat balance of responsibilities between microprocessors and gate array devices (there can be more than one in a design). A major decision will be whether to use CPLDs or FPGAs. This will be mainly dictated by the size and complexity of the application. As a simple rule of thumb, anything that requires less than 500 registers should be done with CPLDs. For anything that requires between 500 and 1000 registers, consider the use of more than one CPLD rather than going up to an FPGA design.
One of the most flexible parts of GA hardware design is the allocation of I/O pins. Unless you are concerned about speed, you can literally design the PCB first and dedicate any device I/O pins without giving much consideration to their function, as they are all more or less the same and can be allocated later on from the software. As most GAs have so many spare pins it is also common practice to use these as simple interconnect, i.e. connect all the pins of a micro to one side of the GA, and the rest of the circuit to the other side. This will make things like board and system level testing much easier, and also reduce the probability of interconnection errors in the PCB design.
GAs are in general very, very fast, and have correspondingly sharp rise times of the order of a few nanoseconds. A lot of care needs to be taken when designing PCB layouts. It is common practice to add low value series resistors on any long tracks carrying data or control signals to remote locations to control rise time problems. Do read manufacturers' recommendations on PCB layouts and tracking. Some devices have software programmable rise time control and some of the larger devices also include built in series resistors on their I/O lines.
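The rule of thumb above is simple enough to write down as a few lines of Python. The function name and the wording of the returned suggestions are made up for illustration, and the thresholds are the rough figures from the text rather than hard limits.

```python
def suggest_device(registers_needed):
    """Apply the rough CPLD/FPGA rule of thumb to an estimated register count."""
    if registers_needed < 500:
        return "single CPLD"
    if registers_needed <= 1000:
        # Borderline designs: several CPLDs may still be simpler than an FPGA.
        return "more than one CPLD"
    return "FPGA"


for estimate in (128, 750, 5000):
    print(estimate, "->", suggest_device(estimate))
```

In practice the estimate itself is the hard part, so it pays to leave a generous margin before committing to the smaller option.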
So much for the hardware. The real effort is in the software, i.e. in the generation of the fuse files that dictate the internal interconnects. It is theoretically possible to generate these files by hand, however as one can imagine, there are software tools that can generate program files from text editors or graphical interfaces. Most manufacturers provide free integrated development environments and tools (IDEs). To somebody who may be used to programming a microprocessor such as PIC or 8051, the development procedures will look familiar; you enter some text using a text editor, you press the "compile" button, and if everything is OK, you press the "program" button to transfer the compiled fuse file to the target. There are of course a few more interim steps here, mainly to do with optimisations, but you will not need to know about these in depth until you move into serious development.
A typical GA design will be composed of a number of modules or "black boxes" interconnected together via wires or bus lines, just like in a real hardware system. Each of these sub-modules can be designed independently, and can be tested or simulated using the various software tools provided. This makes the design approach very modular.
A pleasant surprise to those used to writing standard computer code is that GA design entry can take many different forms: you can for example draw a circuit diagram, using the mouse and a drag and drop CAD like program. You can write code in text form, much resembling traditional software listings. You can enter input in the form of tables, or state transition tables. You can even define your inputs and outputs as waveforms, inputted as timing traces using a graphics waveform editor. They will all produce the same binary compiled output. This allows each sub-module to be written in the most suitable form for its purpose, whether in graphics or text form.
Another major time saving element is that the compiler will do all the optimisation and code reductions. No need to get involved in logic reduction, Karnaugh maps or any of those Boolean simplification techniques learnt at College. The compiler will take your statements, no matter how devious and long handed, and produce the correctly reduced arrangement of gates and registers. This is a major benefit, as it allows designers to enter their definitions in plain language, and not worry about the tedious details of structure and simplification. The following pseudo-code shows, for example, how a designer could define the decoding logic for a seven-segment driver. Note how no attempt has been made in the source to simplify the logic; the compiler will do it all.
IF in==H"0" THEN out=B"1110111";
ELSIF in==H"1" THEN out=B"0010010";
ELSIF in==H"2" THEN out=B"1011101";
ELSIF in==H"3" THEN out=B"1011011";
ELSIF in==H"4" THEN out=B"0111010";
ELSIF in==H"5" THEN out=B"1101011";
ELSIF in==H"6" THEN out=B"1101111";
ELSIF in==H"7" THEN out=B"1010010";
ELSIF in==H"8" THEN out=B"1111111";
ELSIF in==H"9" THEN out=B"1111010";
END IF;
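To see what the compiler has to work with, the same table can be written in Python and turned inside out: for each of the seven outputs, list the input values that drive it high. This is only a sketch of the starting point for logic reduction, not of the reduction itself. The names are hypothetical, and since the listing does not say which bit drives which segment, the outputs are simply numbered from the left.

```python
# The same input/output pairs as the IF/ELSIF chain, as a lookup table.
SEGMENT_TABLE = {
    0x0: "1110111",
    0x1: "0010010",
    0x2: "1011101",
    0x3: "1011011",
    0x4: "0111010",
    0x5: "1101011",
    0x6: "1101111",
    0x7: "1010010",
    0x8: "1111111",
    0x9: "1111010",
}


def decode(value):
    """Return the 7 bit output pattern for a digit 0 to 9."""
    return SEGMENT_TABLE[value]


def minterms(output_bit):
    """List the digits for which one output bit (0 = leftmost) is high."""
    return [v for v, pattern in SEGMENT_TABLE.items() if pattern[output_bit] == "1"]


for bit in range(7):
    print("output", bit, "is high for digits", minterms(bit))
```

Each of these lists is the unsimplified sum-of-products equation for one output. The compiler's job is to boil each one down to the smallest set of AND-OR terms that will fit the macrocells of the chosen device.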
The availability of High Level Design tools has given rise to languages that describe the operation wanted rather than the logic functions required to implement them. For example, the high level description FF[].d = FF[].q +1 describes an operation where the D input of a set of flip-flops is connected back to their own Q outputs but with the value arithmetically incremented by one, in other words, a simple add by one, or binary counter. The statement does not describe how the gates or flip-flops need to be wired together to perform this function. This is the job for the software, which releases the designer from this responsibility. The compiler will know which is the optimum way of implementing this function for each particular device in a family. It does this by a combination of algorithms and techniques and by referring to internal databases. Some devices, for example, may include fast look-ahead carry elements that can be included in the design. Admittedly, some commercial compiler tools will be better than others at this task!
The point here is that designers do not need to concern themselves with optimisations or with the particular use of gates or registers. This allows them to write their inputs in more abstract descriptive ways. The most commonly used high level languages are Verilog and VHDL. Some manufacturers such as Altera have developed a convenient short form version of the above (AHDL), mainly to reduce verbosity. I shall not delve too much into details on the use of these languages, as there is plenty of literature and information available.
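As an illustration of the gap between the one line description and the gates behind it, the Python sketch below expands the "add one" into individual XOR and AND operations, one pair per flip-flop, and checks that clocking the result counts as expected. The ripple carry arrangement shown is just one possible expansion, chosen for clarity; it is an assumption, not what any particular compiler would generate.

```python
def next_state_gate_level(q_bits):
    """Compute the D inputs of a binary counter using only XOR and AND gates.

    q_bits holds the flip-flop outputs as 0/1 values, least significant first.
    """
    d_bits = []
    carry = 1  # adding one is the same as a carry into the lowest bit
    for q in q_bits:
        d_bits.append(q ^ carry)  # XOR gate: the bit toggles when carry is set
        carry = q & carry         # AND gate: carry ripples on while bits are 1
    return d_bits


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


WIDTH = 4
states = [0]
for _ in range(17):  # 17 clock edges: enough to see the 4 bit counter wrap
    states.append(from_bits(next_state_gate_level(to_bits(states[-1], WIDTH))))
print(states)
```

A device with dedicated carry logic would implement the carry chain differently, which is exactly the kind of device specific choice the compiler makes on the designer's behalf.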
An integrated development environment (IDE) consists of various tools such as compilers, libraries and simulators, wrapped in a user-friendly graphical shell. Xilinx's free development software comes under the general name of WEBPACK and is available either as a (pretty massive) download from the Internet, or as a CD obtainable from any of their distributors. The CD also contains a full collection of data sheets and application notes, and is a must for anybody involved in Xilinx development. Altera also ships regular CDs containing data sheets, application notes and their two main free IDE suites: Max+Plus Baseline and Quartus II Web. Traditionally, Max is aimed at CPLDs and the lower end FPGA devices, with Quartus covering the higher end FPGA devices. Later versions of Quartus (from version 2) cover both ranges.
Each of the above free development suites is a complete integrated development environment, ideal for simple projects or for training purposes. The commercial versions are very similar but also cover the top range devices and provide access to wider libraries and comprehensive online and telephone customer support.
Unlike development suites for microprocessors, software tools for GAs are closely linked to the devices they are aimed at. It is therefore important to obtain a tool that includes the full set of specifications (software, electrical and functional) for the device you are working on. This implies getting hold of the latest version available.
ADVANCED TOOLS
As a designer moves into more advanced projects, requiring more gates, faster timing or better performance, they may begin to find that their designs run too slowly or cannot be fitted into the device (even though there appears to be plenty of space left). This is because the simple basic tools provided by the free IDEs may not be doing their job at full efficiency.
At this point the designer may benefit from a number of third party tools. These replace the relatively simple intrinsic tools provided by the IDE with more sophisticated synthesis, fitting and simulation algorithms. Some of these third party tools can be rather expensive but can be worth the effort and make the difference between a fast design and one that does not work at all.
INTELLECTUAL PROPERTY
This is the generic name given to sections of software developed by third parties and available for sale, usually on a licence basis. These take the form of modular "black boxes" that can be dragged and dropped into your project. Examples include FFT modules, MPEG decoders, cryptography and of course CPU cores.
HARD CORE OR SOFT CORE?
Most GAs are so large that it is quite feasible to include a full CPU implementation in the design and still have plenty of spare space left. These are not software emulations of a CPU, but complete, full speed, bit-for-bit implementations of the real thing. You can embed a well known architecture such as that of an 8051, a PIC or even an Intel PC. Xilinx's Virtex II Pro has four hard wired PowerPC cores, and Altera's Excalibur includes one 32 bit ARM processor. The reason for embedding a CPU is that some applications are better implemented using traditional Turing type computer technology than by using hardwired logic. The embedded CPU, together with embedded RAM and ROM, performs the work of an equivalent microcomputer, but usually much faster. Some manufacturers have developed computer architectures more specifically suited to GA implementation, for example Altera's NIOS, and Xilinx's Microblaze and Picoblaze. As already mentioned, a major advantage of embedding CPUs is the ability to add your own custom instructions to the existing set. The same argument can apply to Digital Signal Processing (DSP) emulation. GAs can easily outperform DSPs for individual designs. Some tools are available to simplify design, such as Altera's DSP Builder, which converts Matlab and Simulink output directly into VHDL code.
SO YOU WANT TO GET INTO GA?
So you want to have a go, but don't want to spend a fortune getting a development kit from one of the suppliers. Or maybe you don't mind getting one but don't know which one to choose?
Here are the basics. For a start, avoid spending any money on software, as all the tools you require initially are available for free, either as CDs or as Internet downloads. Next, decide which manufacturer you are going to go for. This can be a tricky decision but it is not a very important one: the general techniques to be learnt are the same for all manufacturers, so this is mainly a matter of personal choice.
The next stage is to obtain the IDE tools. Don't worry about hardware at the moment; you won't need any for a while. I shall describe the general procedure for the Altera IDE, as it is easier to use for the novice and the hardware parts are more accessible for initial development. A similar procedure will be applicable for Xilinx.
You can download the Altera development suite from their website www.Altera.com. This will be Quartus II web version 3. Unless you have a very fast Internet connection, it is better to send away for the CD, which is always a good idea as it contains a large number of useful data sheets and application notes. You do this by filling in the online application form, or by contacting any of their distributors. You will also need a free licence to operate the software. You obtain this online by filling in an application form (press the licensing button on the front page). The licence lasts for six months, after which you will have to ask for a new one; note that no cost is involved. Why they do this I don't know, but I suspect it provides some form of marketing feedback. Xilinx's WEBPACK software does not require registration as such.
When you receive your CD, proceed with the installation of Quartus II. This will take a few minutes, and may require the installation of a software system driver for the parallel port. This latter installation may be the hardest part for some PC configurations. You will also need to ensure that the licence file licence.dat, as emailed to you by Altera, is in the same directory as the program executable.
Now fire up Quartus and proceed straight to the Tutorial. You will not need a device programmer or any hardware at this stage. Make sure you have gone through most of the tutorial before you even think of proceeding to the next stage. The learning curve at this point is quite steep, but definitely worth it. You will really need to know all the processes involved, so do not skip any of the sections! The tutorial basically shows you how to use the software, what all the coloured buttons do and what the pretty pictures mean. With the tutorial behind you, you will be able to enter designs, compile and simulate them and see the results on the screen (who needs hardware!).
There is a strange sense of achievement in being able to draw a circuit on a screen, press a few buttons and see the waveforms it produces. But then again, there is nothing like the real thing. So at this point, you may be itching to do something with real hardware, something that flashes real lights that is. The quick way in is to purchase a development board, which can cost anything from £400. The cheap way in is to obtain your own components and design from scratch.
You will need an actual device. First the bad news: soldering SMD packages can be a nightmare, so maybe the ready made development board was not such a bad idea after all! However, there is still a way to get started with a minimum of outlay. Some of the older (mature) devices are still available in convenient .1" pitch PLCC socketed packages, and operate from 5 volt supplies. You will not miss anything by using semi-obsolete devices for experimenting or for learning purposes, as most GAs contain more or less the same elements and work the same way.
A couple of suitable devices are shown below. These are easily available from suppliers such as Farnell and RS, and can be fitted into standard PLCC84 pin sockets.
Altera MAX EPM7128SLC84-15 128 register CPLD
Connection diagrams for both are shown in figures 4 and 5. Note how the CPLD uses the JTAG interface and the FPGA uses the PSS interface for programming. Most devices use one or the other method, and the IDE is capable of generating programming files for either. It so happens that the MAX family can only be programmed via the JTAG interface. Both devices can take two master clock inputs, but only one is shown. Also note that the series resistors to the 244 have been omitted, as the programmer will only be used with a 5 volt device.

FIG 4 CAPTION Simple development circuit for Altera 128 register CPLD. The device fits in a standard .1” pitch 84 pin PLCC socket and can be easily hand wired for prototyping. All spare I/O pins are available for the user’s application. The device is programmed from the PC parallel port via the 25-way D-type connector shown. A 5 volt 100mA supply is also required.

FIG 5 CAPTION Equivalent circuit for Altera FPGA. An optional configuration device will be required if you want to maintain the stored program after switch off. This would be connected between the 244 and the FPGA.
Although both devices feature high clock rates, it is recommended that a low frequency master clock (anything below 25MHz) be used for development. The reason may not be too obvious. The signal routing inside the device is generated by the software, and some of the resulting paths may be quite long. If the device is used near its limits, the delay along some of these paths may exceed the clock period, causing strange effects and faults. Seasoned designers would use the simulator and floor-plan editor to recognise and work around these by hand, but this is an activity we will not want to get involved with at this early stage.
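The arithmetic behind this advice is simple enough to sketch in a few lines of Python. The path delays used in the example calls are made-up figures for illustration only, not data sheet values, and the check is deliberately simplified: a real timing analysis would also allow for flip-flop setup time and clock skew.

```python
def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a clock frequency in MHz."""
    return 1000.0 / freq_mhz


def path_meets_timing(path_delay_ns, freq_mhz):
    """Simplified check: the signal must arrive within one clock period.

    Ignores setup time and clock skew, which a real timing analyser
    would take into account.
    """
    return path_delay_ns < clock_period_ns(freq_mhz)


if __name__ == "__main__":
    # A hypothetical 30 ns routed path is safe at 25 MHz (40 ns period)
    # but would fail at 50 MHz (20 ns period).
    print(clock_period_ns(25))
    print(path_meets_timing(30, 25))
    print(path_meets_timing(30, 50))
```

Keeping the clock slow simply buys a generous margin, so that an unlucky placement by the fitter does not turn into a puzzling fault on the bench.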
All pins not shown are used for general purpose I/O (i.e. for your application). Leave all unused I/O pins unconnected, as some devices have the strange habit of outputting internal node data to unprogrammed pins.
As mentioned earlier, the FPGA program will only stay as long as power is applied to the device. To make the program permanent, you need to attach an optional configuration device to the FPGA; this is basically a special purpose EEPROM with some extra logic to generate the right programming pulses. Altera's configuration devices are the EPC1, EPC1441 and EPC2. Note that the first two are one-time programmable only, and only the EPC2 can be reprogrammed in-circuit via its own JTAG interface. It is also possible to use a standard EPROM or flash EEPROM with a small microcontroller to generate the right timing pulses.
DESIGN TIPS AND TECHNIQUES
Those of you with a hardware background can quickly get started by using the schematic editor and drawing your circuits directly on the screen. Altera libraries even include old time favourites such as 7400 gates, 7493 counters and 7474 flip-flops. However, it is not a good idea to rely on these legacy components, and it pays in the end to use the more modern building blocks, which are also optimised for the devices being used. Blocks include gates, multiplexers, RAM, counters and various other basic abstract sub-modules.
It is also important to know that the logic circuitry inside GAs is built differently and behaves differently from its discrete logic counterparts. Avoid "trick" circuits using gates as delays, external RC components, and even things like ripple counters that depend on propagation delays. Although you can still use these techniques, the results may be unpredictable and difficult to debug. The reason is very simple: one compilation may place two similar gates next to each other on the die, while the next compilation may place them at opposite ends of the chip, with different propagation delays. Gate combinations may even be replaced with a completely different logic arrangement, for example with a RAM or a look up table.
Learn to use synchronous logic wherever possible (this is where all registers are clocked from the same master clock). At first sight this may appear to preclude the construction of counters operating at different clock rates, but this is where techniques such as "clock enable" come into action. Consider the following pseudo-code:
ff1[3..0].clk= clk;
ff1[3..0].ena= enable;
IF (ff1[3..0].q == 9) THEN ff1[3..0].d=H"0";
ELSE ff1[3..0].d=ff1[3..0].q + 1;
END IF;
This describes a synchronous divide by ten counter using four flip-flops. The first program line states that the clocks of all four D flip-flops are connected to the master clock line (the reverse N..0 notation is common in HDLs). The third line describes a data multiplexer: when the count has reached nine, the D inputs are connected to ground, which has the effect of resetting the counter on the next clock pulse. Otherwise they are connected back to their Q outputs arithmetically incremented by one.
The counter is toggled from the master clock, and without the second program line it would count at this rate. In order to make the counter advance at a different rate, we use the clock enable input, which is driven from an external source that generates a pulse one clock period wide once in every N master clocks.
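The behaviour just described can be modelled with a short Python sketch. It is a software model of the pseudo-code above, not synthesisable code: each call to decade_counter_step stands for one master clock edge, and enable_pulses is a hypothetical stand-in for the external source that provides one enable pulse in every N clocks.

```python
def decade_counter_step(q, enable):
    """Model one master clock edge of the divide by ten counter."""
    if not enable:
        return q  # clock enable low: the flip-flops hold their value
    # Data multiplexer: load zero after nine, otherwise load q + 1.
    return 0 if q == 9 else q + 1


def enable_pulses(n, clocks):
    """Hypothetical enable source: high for one clock in every n."""
    return [(t % n) == n - 1 for t in range(clocks)]


def run(n, clocks):
    """Return the counter value after each of the given master clocks."""
    q = 0
    trace = []
    for enable in enable_pulses(n, clocks):
        q = decade_counter_step(q, enable)
        trace.append(q)
    return trace


if __name__ == "__main__":
    # With one enable pulse every 4 clocks, the count advances every 4th clock.
    print(run(4, 12))
```

Every flip-flop still sees the same master clock on every edge; only the enable decides whether the count changes, which is what keeps the design fully synchronous.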
A PRACTICAL EXAMPLE
Fig 6 shows a typical example of what can be done with a medium end FPGA costing less than 10 pounds. The circuit shows a complete teletext generator or inserter, as may be used in a hotel video distribution system. The chip, in this case an Altera EP1K50, takes as input mixed video sync at logic levels. Its only output is the teletext serial stream at about 7Mbps, which is inserted back into the video signal using a separate analogue mixer. An external microprocessor is used to transfer teletext page content in ASCII form; everything else is done within the chip. A serial RS232 link to a PC could have been designed in instead.
Each of the functional blocks shown in the figure was developed as a separate module in the VHDL language. The device is clocked from a single 64MHz oscillator, and an internal Direct Digital Synthesis (DDS) generator is used to synthesise the basic teletext bit rate and provide other timing pulses as needed for sync separation and line counting. The interface to the microprocessor is via a standard data bus arrangement. To the microprocessor, the FPGA looks like a collection of registers. These are used to set working variables such as starting and end line number, and various other options. Text data for the pages is fed via writes to a single register using an internal auto-increment counter. From this point of view, the FPGA looks no different to the microprocessor from any other parallel driven peripheral. As already mentioned, a serial UART interface could have been used instead for direct interfacing to a PC without the need for a microprocessor. The internal RAM available within the FPGA (about 5kbytes) is used to store about 100 lines of text; a ready-made dual port RAM library module was used here. The other modules perform parallel to serial conversion, generate parity and produce the timing sequences. Placing all the fast logic in the FPGA releases the microprocessor from any timing dependent duties.
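For readers unfamiliar with DDS, the principle is a phase accumulator: a fixed tuning word is added to a register on every master clock, and each overflow of the register marks one period of the output. The Python sketch below models this for the 64MHz clock mentioned above. The 32 bit accumulator width and the exact bit rate of 6.9375 Mbit/s (444 times the 15625Hz line rate of 625-line systems) are assumptions made for this sketch; the article only says "7Mbps" and does not describe the internals of the actual design.

```python
CLOCK_HZ = 64_000_000    # master oscillator, from the text
BIT_RATE_HZ = 6_937_500  # assumed teletext bit rate (444 x 15625 Hz)
ACC_BITS = 32            # assumed accumulator width


def tuning_word(f_out, f_clk, bits):
    """Nearest integer to f_out / f_clk * 2**bits, using integer maths."""
    return (2 * f_out * (1 << bits) + f_clk) // (2 * f_clk)


def count_bit_ticks(word, bits, clocks):
    """Count accumulator overflows over a number of master clocks."""
    mask = (1 << bits) - 1
    acc = 0
    ticks = 0
    for _ in range(clocks):
        acc += word
        if acc > mask:   # carry out of the top bit: one output period
            ticks += 1
            acc &= mask  # keep only the accumulator's own bits
    return ticks


if __name__ == "__main__":
    word = tuning_word(BIT_RATE_HZ, CLOCK_HZ, ACC_BITS)
    print(word)
    # 111 bit periods in every 1024 master clocks: 64 MHz x 111 / 1024
    print(count_bit_ticks(word, ACC_BITS, 1024))
```

Because each bit period is quantised to whole master clocks, the output edges jitter by up to one clock period, so a practical design would have to keep this within whatever the receiving decoder tolerates.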
Each module was tested and debugged separately; for example, extra code was added during development to the DDS section to output some of its internal nodes to spare I/O pins, which were connected to a scope to verify timings. After the tests, the extra lines of code were simply removed from the listing.
One of the major advantages of a design like this is that the format of transmission can be changed or redesigned simply by rewriting the software, for example to produce an NTSC version, or to handle a different format of vertical interval insertion codes such as time-codes.

FIG 6 CAPTION Gate Arrays in action: a complete teletext inserter using a single FPGA device costing less than 10 pounds. The external microcontroller is only used to initialise the controller and to download pages in ASCII form at a leisurely rate. All high speed operations are performed within the FPGA.
TAKING IT FROM HERE
Next time you are working on a project that requires a few logic packages, do seriously contemplate the option of slapping in a GA device instead. Apart from the savings in component cost, you may be saving on testing time, PCB modifications and general hassle. If your project is one of those where a microprocessor is "not fast enough", think of using a GA as a peripheral handling the fast logic. An ideal application would be one where you need a special fast interface for a standard micro, say a very fast UART, counter or signal generator, or something that requires fast special purpose processing. Programming a GA to behave like an add-on peripheral is a relatively easy task, with plenty of freely available module library support.
The Author
Started his electronics career with his first two transistor radio using 2N35 and CK722 transistors. Things have come a long way since then, and he is now a consultant in innovative applications of telecommunications and signal processing. He can be reached on edinsam@eix.co.uk.
www.xilinx.com
www.altera.com
www.lattice.com
www.atmel.com