Audio D/A and A/D converters:

do they do what it says on the tin?

Copyright Eddie Insam (c) 2004

edinsam@eix.co.uk

 

 

 

Strap

The operation of modern audio D/A converters is a mystery to many people. Eddie Insam attempts to shine some light onto the subject.

 

 

Intro

Another article about audio! Well, not quite. Some of the material here may be relevant to people wanting to use cheaply available, high-resolution audio A/D and D/A converters in other applications. So please, even if you are not into audio, do not skip these pages just yet!

 

First, I should explain that this article is not about how good or how bad D/A converters sound to the ear. I shall only be covering the technicalities of the conversion processes. Whether a particular method or technique improves or worsens the audio quality, I shall never know for sure, and I have no intention of getting involved in that argument. All juicy letters to the editor please, who will be more than happy to publish anything of relevance.

 

 

Precision Engineering

Ask the average engineer, and they will argue that electronics is a pretty deterministic science. The operation of most electronic circuits can be described to a high degree of accuracy using the most basic of maths: Ohm's law, standard algebra, differential equations and so on.

 

However, there are the odd bits that are more difficult to explain, partly because no one knows exactly what is going on, or partly because the maths are intractable. A third, less revered reason is marketing hype, which can shroud the simplest of concepts with the most complicated of explanations. But let's put this last one aside for the moment.

 

Some time ago, I was involved in the design of broadcast audio equipment. This brought me into contact with audio D/A and A/D converters. Although these have been around for years, this was the first time I had used them for real. More recently, I tried to use a top of the range 24-bit audio A/D converter in a vibration analyser. The performance was not much better than that of a simple 14-bit parallel converter I had used previously. Cat-like curiosity made me try to find out why this was, and also how these converters worked. This was the start of my troubles.

 

My starter question was rather simple: the manufacturing of accurate D/A and A/D converters has always been a difficult practical science. So how come all of a sudden, cheap, high-resolution converters can do the job previous types such as successive approximation, binary ladder and others could not manage? Is there a catch?

 

After staring at a number of Audio Engineering Society reprints, IEEE papers and manufacturers' literature, I had the distinct feeling I was not getting very far. There were mentions of noise shaping, Z transforms and multi pole filters, but no descriptions of what was going on. Is this a cover up, or is it simply that nobody actually knows how these things really work?

 

 

The problem, if you can call it that

Digital audio techniques are now commonplace in studio and professional equipment. The various standards cover transmission and storage at 16 to 24 bits resolution using standard PCM encoding, at rates of 44.1, 48, 96 or even 192 thousand samples per second. Operating in full digital PCM mode offers obvious advantages when it comes to switching, buffering and processing the data without loss of quality or dynamic range. The price to pay is higher bandwidth, which can easily be met by modern cabling, digital storage and processing devices. The few analogue components in a typical all-digital studio are now limited to the one or two op-amps used in microphone preamplifiers before the A/D converter.

 

At the user end, the typical CD player generates a serial 16-bit PCM stream at 44100 samples per second. This is usually fed to a low cost audio D/A converter, which recovers the analogue audio stream, which then gets amplified and fed into the loudspeakers.

 

Now for some simple theory. A 16-bit D/A converter's job is to turn each digital input code into one of 65536 discrete analogue amplitude levels. At the standard audio rate of 44100 samples per second, this corresponds to one amplitude sample being generated regularly on the dot, every 23 µs or so. We shall stick to these numbers as reference values for the examples that follow.

 

Resistor ladder D/A converters use weighted resistors. Each resistor is associated with a digital bit and is arranged to generate a current proportional to its binary weight. The analogue voltage is generated as the sum of the current contributions from each of the resistors. For 16-bit accuracy these resistors need to be matched to within about 0.001% (one step in 65536 is roughly 0.0015% of full scale), not an easy thing to do in consumer product manufacturing. Another standard method of D/A conversion, the slope converter, requires an integrator using a capacitor with at least 0.001% linearity. Again, not an easy thing to do.
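To put a number on the matching requirement, here is a minimal sketch. It assumes a simple first-order model in which a fractional error in the most significant bit's resistor produces the same fractional error in that bit's current; the function names are purely illustrative.

```python
def lsb_percent(bits):
    """Size of one LSB as a percentage of full scale."""
    return 100.0 / (1 << bits)


def msb_error_in_lsbs(bits, tolerance_percent):
    """Output error, in LSBs, when the MSB resistor is off by tolerance_percent.

    The MSB contributes half of full scale, i.e. 2**(bits - 1) LSBs, so any
    fractional error in its current is magnified by that amount.
    """
    return (tolerance_percent / 100.0) * (1 << (bits - 1))


print(f"One LSB of a 16-bit converter = {lsb_percent(16):.5f}% of full scale")
for tol in (1.0, 0.1, 0.01, 0.001):
    err = msb_error_in_lsbs(16, tol)
    print(f"{tol:>6}% mismatch -> {err:8.2f} LSB error at the MSB step")
```

Under this model, only the 0.001% row keeps the error at the major carry below half an LSB, which is what guarantees monotonic steps.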

 

Audio D/A converters claim to need no such tolerance dependent resistors or capacitors. They generate the analogue voltage by an internal process that converts the incoming parallel 16 bit data into a serial one bit stream consisting of just ones and zeros, but toggling at a very fast rate. On an oscilloscope, this data stream appears as logic noise (like random ones and zeros). In other words, audio D/A converters do not use resistor ladders, capacitor ramps, or any other analogue strategies to recover the audio. Audio is recovered by the simple method of (somehow) averaging, or low pass filtering this fast bit stream. The serial sequence has, as its basic property, that its on-off ratio or average or density is proportional to the digital codeword value at the input.

 

By avoiding the use of precision analogue components, the converter avoids problems normally associated with conventional methods, such as non linearity and uneven steps. The result, in principle, is a near perfect monotonic D/A conversion.

 

 

 

 

FIG1 Caption: A simple PWM converter, both in digital and analogue form. A regular ramp is compared with the input signal to produce a rectangular wave with variable width directly proportional to the input code

 

 

 

How is this variable density signal generated? As a bit of background, FIG1a shows a circuit many readers will be familiar with, a pulse width modulator (PWM). A PWM converter generates a regular pulse pattern with an on/off (density) ratio proportional to the value of the digital code fed to it. The frame or repetition rate of the output pulse is constant, but the on/off ratio is variable and proportional to the ratio between the digital input code and the full-scale size of the binary counter. A PWM generator can be implemented in analogue form, or in software or hardware form using a standard counter and a digital arithmetic comparator as shown in the figure. The input digital word is compared with the counter and an output toggle flip flop changes value when the two levels meet. The output, when fed to an averaging (read: low pass) filter, will result in our final analogue signal.
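The counter and comparator arrangement can be sketched in a few lines. This is only an illustration of the idea, using a 4-bit counter to keep the output readable; the names pwm_frame and average are mine, and the comparator is modelled as "output high while the counter is below the input code".

```python
def pwm_frame(code, bits):
    """One PWM frame: the output is 1 while the counter is below the code."""
    return [1 if count < code else 0 for count in range(1 << bits)]


def average(stream):
    """Ideal averaging filter over exactly one frame."""
    return sum(stream) / len(stream)


frame = pwm_frame(code=5, bits=4)
print("".join(str(b) for b in frame))  # 1111100000000000
print(average(frame))                  # 0.3125, i.e. 5/16
```

The average over one complete frame is exactly code divided by the counter length, which is the property the low pass filter relies on.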

 

I should also add at this stage that there is no conceptual difference between a D/A converter and an A/D converter. You can create a converter of one type by including one of the other type in a closed loop. Please don't think about this too hard, as it is just not worth it. I apologise if I have created confusion by mixing the two names throughout this text.

 

 

 

Fig2 Caption: A variation on Fig1, where, instead of a simple binary counter, we have used a random number generator to generate the same comparison amplitudes but "not necessarily in the right order". End results are similar, but the outgoing pulse stream is more randomly spread in time.

 

Fig 2 shows an alternative implementation. Instead of a simple binary counter we have used a random number generator, which can be implemented in practice using a pseudo random binary sequence generator (PRBS). We can visualise such a generator as a version of a normal binary counter that will "count through all the right numbers, but not necessarily in the right order".

 

For both counters (binary and PRBS) the output codes produced will have an average on-off ratio directly proportional to the digital word they are being compared against. In other words, both comparators produce the same average output, only that the output of the version using the PRBS counter will look "noisier" on an oscilloscope.
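A small sketch makes the point. As a stand-in for a hardware PRBS generator I have assumed a simple full-period congruential sequence, x -> (5x + 3) mod 16, because it visits every 4-bit value exactly once per frame; a real shift register PRBS would behave similarly but skips one state. All names here are illustrative.

```python
def scrambled_counter(bits, seed=0):
    """Visit every value 0 .. 2**bits - 1 exactly once, in scrambled order."""
    n = 1 << bits
    x = seed % n
    values = []
    for _ in range(n):
        values.append(x)
        x = (5 * x + 3) % n  # full period for any power-of-two modulus
    return values


def binary_frame(code, bits):
    """Comparator output against a plain binary counter (ordinary PWM)."""
    return [1 if count < code else 0 for count in range(1 << bits)]


def prbs_frame(code, bits):
    """Comparator output against the scrambled counter."""
    return [1 if value < code else 0 for value in scrambled_counter(bits)]


for name, frame in (("binary", binary_frame(5, 4)), ("PRBS  ", prbs_frame(5, 4))):
    print(name, "".join(str(b) for b in frame), "ones:", sum(frame))
```

Both frames contain five ones out of sixteen, so the averaged output is identical; only the position of the ones within the frame differs.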

 

Coming back to our audio example, the frame period of both PWM and PRBS converters has to be 23 µs. This is because each time slot (or frame) needs to contain at least one complete "on/off" sequence cycle to correspond to each input audio sample. Our perfect averaging filter will also need to integrate the PWM output over one frame exactly.

 

In order to generate 65536 different amplitude levels, the graininess of the ramp, i.e. the number of possible widths in the on-off sequence, also needs to be 65536. A bit of simple maths will tell us that in order to get this kind of resolution, the counters will have to be clocked at 65536 times 44100, or roughly 2.9 Gigahertz.

 

And here comes our first little conundrum. In order to produce 65536 accurate amplitude steps, our binary (or PRBS) counter will have to be clocked at well over 2Gigahertz. This is a hard job, even when using the latest silicon technology.
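The arithmetic behind this is easy to check. The constant names below are my own; the figures are the article's reference values of 16 bits and 44100 samples per second.

```python
SAMPLE_RATE = 44_100   # audio samples per second
BITS = 16              # resolution of each sample

frame_period_us = 1e6 / SAMPLE_RATE         # time available per sample
steps_per_frame = 1 << BITS                 # 65536 possible pulse widths
clock_hz = steps_per_frame * SAMPLE_RATE    # counter clock needed

print(f"Frame period   : {frame_period_us:.2f} us")
print(f"Steps per frame: {steps_per_frame}")
print(f"Counter clock  : {clock_hz / 1e9:.2f} GHz")
```

With these values the counter clock works out at a little under 2.9 GHz.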

 

 

 

Deltas and Sigmas

Delta is a term used in mathematics to denote the difference between two numbers. Sigma is another term used to denote the sum or accumulation of a set of numbers. Audio A/D and D/A converters are usually called Delta Sigma converters. 

 

A delta sigma (DS) converter is an ingenious form of pulse density generator. Fig3 shows the basic layout. Its basic purpose is to generate a single bit wide pulse stream from a multi valued data input. Delta Sigma converters can be implemented both in digital and in analogue form. In the analogue version, the components in Fig3 will be analogue: op amps, analogue filters, and integrators. The digital version will use counters, arithmetic comparators, adders etc. The converter takes the difference (Delta) of the input and the predicted value, and then integrates the result (Sigma) before quantising it.

 

 

Fig3 Caption: A simplified Delta Sigma converter. In its simplest form the box labelled "filter" is just an integrator, but can be more complex

 

 

Note there are no reset signals. The main difference between a DS and the PWM generator is that there are no fixed time frames associated with the conversion. The output stream is a density value proportional to the input multilevel signal. The output pattern will repeat after some time, but the repetition rate will be dictated by the maths of the conversion and the input level, and not by an external frame reset signal.

 

The table below shows typical output serial patterns generated for different input multilevel (static) signals. Consider the simple example of an exact mid value input voltage (0x8000 in digital terms for a 16 bit converter). The output of the D Flip flop will eventually settle to a repetitive pattern 0101010101. This pattern repeats after only two bits. If the input voltage is a quarter of FSD (0x4000 in digital terms) the output pattern would now be 1000100010001000. This pattern repeats after only four bits.

 

Compared to a standard PWM, where the averaging filter has to wait for the full 65536 clock pulses to recover an analogue sample, the Delta Sigma method sounds promising. Shorter repetition times mean the filter can recover samples earlier, and that we can use a much lower clock frequency. But are we right?

 

Consider now the digital word pattern 8001hex. The resulting bit stream will look like a fast repeating 010101 sequence, but wait a minute, after a while, an extra one will appear in the stream. That's right, if we do the sums, we shall count 32767 zeroes and 32769 ones. And guess what, we still have to wait for the full 65536 clock times before we can see the difference.

 

In other words, in order to accurately define a voltage level from a Delta Sigma converter, we still have to clock it at the same rate as an equivalent PWM. That is, at nearly 3GHz (65536 times 44100) for our audio example. It just so happens that for some codewords, the wait time to average is less.

 

DIGITAL WORD    OUTPUT PATTERN

8000hex         0101010101010101...repeat

4000hex         0001000100010001...repeat

2000hex         0000000100000001...repeat

8001hex         01010101010101......01011010101
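The patterns in the table can be reproduced with a very small model. I am assuming here the simplest all-digital first-order form, an accumulator whose carry is the output bit, which is one common way of realising the difference, integrate and quantise loop; it is a sketch, not a description of any particular chip.

```python
def delta_sigma_bits(code, n_out, bits=16):
    """First-order digital delta sigma: add the code, output the carry."""
    full_scale = 1 << bits
    acc = 0
    out = []
    for _ in range(n_out):
        acc += code                 # sigma: accumulate the input
        if acc >= full_scale:       # one bit quantiser
            acc -= full_scale       # delta: subtract the fed back full scale
            out.append(1)
        else:
            out.append(0)
    return out


for code in (0x8000, 0x4000, 0x2000):
    bits_out = delta_sigma_bits(code, 16)
    print(f"{code:04X}hex  " + "".join(str(b) for b in bits_out))

stream = delta_sigma_bits(0x8001, 65536)
print("8001hex  ones in 65536 clocks:", sum(stream))
print("8001hex  around the extra one:",
      "".join(str(b) for b in stream[32760:32776]))
```

For 8001hex the stream looks like a plain 0101 pattern for half a frame, and only then does the pair of adjacent ones appear that accounts for the extra count.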

 

 
 

 

 

 

 

 

 

 

 

 

 


Statistics to the rescue

Using our audio example, if we want a precision 16-bit conversion using PWM, PRBS or DS, we must clock the converter at nearly 3GHz. None of these methods has an advantage over the others here. However, and this is a big however, we could get away with clocking at much lower rates if we accept a few facts of life. Many commercial converters quote clocking rates of x256 or even less. So how do they do it?

 

The first thing is to define what precision means. An electronics engineer would argue that an A/D or D/A converter with a precision of 16 bits means exactly that. That it should output voltage levels with that kind of accuracy at every sample presented, and that is that. An audiophile may argue that as long as the ear can't tell the difference, it doesn't matter too much what the output looks like.

 

We may have to accept that the conversion can be lossy. That is, accept that the output will not be an accurate bit for bit, dot for dot voltage representation of the sampled input. If we could make the differences as invisible (or imperceptible) to the ear as possible, we may be onto something.

 

Take the simple analogy of a JPEG compressed image. The idiosyncrasies of the encoding process allow for large bit savings by adjusting the image in ways our primitive eyes don't notice, even though the compressed image may contain large entropy differences when compared to the original. A visitor from outer space may not be too impressed, as their eyes may be built differently, and see all the differences that we don't see.

 

So much for blabbering, and now for some technical discussion. How can we relate to what is going on? Still working on our DS converter, let's assume the input signal is static in amplitude, or is changing slowly, as when fed by a very low frequency sinewave.

 

At the start, we shall find that all of the amplitude levels will be decoded correctly. This is because their output repetition times are short enough for full frame patterns to be completed and for the averaging filter to perform its job. As the input frequency is slowly raised, a few of the codes will not be decoded correctly as their repetition rates start to get truncated. The output levels will not be the ones corresponding to the digital code at the input. We can visualise this by assuming that a form of "pattern dependent noise" has been added to the output signal.
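The effect of cutting the averaging time short can be tried out numerically. The sketch below again assumes the simple accumulator-and-carry model of a first-order DS converter and a plain averaging filter over a fixed window; the function names are illustrative only.

```python
def ds_stream(code, n_out, bits=16):
    """First-order digital delta sigma bit stream (accumulator with carry)."""
    full_scale = 1 << bits
    acc, out = 0, []
    for _ in range(n_out):
        acc += code
        carry = 1 if acc >= full_scale else 0
        acc -= carry * full_scale
        out.append(carry)
    return out


def window_error_lsb(code, window, bits=16):
    """Error of a plain average over `window` clocks, in 16-bit LSBs."""
    stream = ds_stream(code, window, bits)
    ideal = code / (1 << bits)
    return abs(sum(stream) / window - ideal) * (1 << bits)


for code in (0x8000, 0x8001, 0x1234):
    for window in (256, 4096, 65536):
        err = window_error_lsb(code, window)
        print(f"{code:04X}hex  window {window:5d}  error {err:6.1f} LSB")
```

Codes with short patterns decode exactly even with a short window, while the others show an error that depends on the code itself and only vanishes when the window reaches the full 65536 clocks.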

 

To put some numbers in, assume the converter is being clocked at 6MHz (a typical frequency used in commercial ICs). Input sine waves lower than about 90Hz will result in precision conversion with no noise added (90Hz just happens to be the rate that corresponds to one frame of 65536 clocks at 6MHz). As soon as we shift slightly higher than 90Hz, a good proportion of the codewords will start giving errors (Fig4). The higher we go in frequency, the more codewords will burst into error. Codewords such as 8000hex, 4000hex etc, with short repetition lengths, will be the last ones to fail in this context. It is a bit unfortunate that not many codewords have this property.

 

A more capable mathematician than myself may now get his calculator out and work out a lovely chart of the proportion of codewords that will be correct or in error against input slew rate or frequency. Without getting too boring we can make some simple assertions, for example: all even codewords can stand a cut in sample time to a half; all codewords that divide by four, a cut to a quarter, and so on. All odd codewords (those with no factor in common with 65536) will get a penalty hit straight away.
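A computer can stand in for the calculator. The sketch below assumes the simple first-order accumulator model, for which the output pattern of a static codeword repeats after 65536 divided by the greatest common divisor of the codeword and 65536; the 6MHz clock is the example figure used above, and the names are mine.

```python
from collections import Counter
from math import gcd

BITS = 16
FULL = 1 << BITS
CLOCK_HZ = 6_000_000   # example clock rate


def pattern_length(code):
    """Clocks before the first-order DS output pattern repeats."""
    return FULL // gcd(code, FULL)


# How many of the non-zero codewords need each pattern length?
histogram = Counter(pattern_length(code) for code in range(1, FULL))

for length in sorted(histogram, reverse=True)[:5]:
    share = 100.0 * histogram[length] / (FULL - 1)
    rate = CLOCK_HZ / length
    print(f"length {length:5d}: {histogram[length]:5d} codewords "
          f"({share:4.1f}%), complete at up to {rate:7.1f} frames/s")
```

Half of all codewords (the odd ones) need the full 65536 clocks, a quarter need 32768, and so on, which is why so few codes survive once the averaging time is cut.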

 

 

Fig4 Caption: Error patterns produced by the conversion process as the frequency is increased, This is due to the decoder not being able to average correctly over the full input data samples. The net effect is to add a pattern dependent noise component to the signal.

 

 

So those who thought DS converters would give them fantastic performance as advertised may be disappointed. A top frequency for precision conversion of 90 Hertz doesn't sound very healthy for audio work. But the story doesn't need to end there. Hands up those who think they must get rid of noise at all cost? Whether A/D converters are used for video, radar processing or audio, converter noise is a fact of life. We might as well build it into the system.

 

 

So where's the noise gone?

Trying to analyse the behaviour of our converter by following the above logic is not going to get us very far (remember the argument about intractable maths). One way forward is to stop trying to analyse the converter in the time domain or in terms of lost codewords, and look elsewhere, where the maths may be easier.

 

This is where manufacturer's literature and those AES convention papers come in handy. They've done it all before. A long time ago designers realised that the noise produced by DS converters can be massaged in such a way so as to make it less annoying. To figure why, we must look at the frequency domain.

 

Let us go back to our PWM encoder of Fig1. If we plot the frequency spectrum of the (unfiltered) output we shall see sharp peaks not only at baseband, where the input audio signal lies, but also at the frequency corresponding to the frame repetition rate (Fig 5a). The plot will have peaks at F, 3F, 5F etc. just like the plot of a square wave pulse generator at the frame rate. The analogue modulation, i.e. variation in widths, results in sidebands added to these peaks.
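The peaks can be seen in a toy example. This sketch assumes a 4-bit PWM with a static mid-scale input, so the output is a plain square wave at the frame rate, and uses a direct (slow) discrete Fourier transform over four frames; with a static input there is no modulation, so no sidebands appear. The names are illustrative.

```python
import cmath


def pwm_stream(code, bits, frames):
    """Several identical PWM frames for a static input code."""
    frame = [1 if count < code else 0 for count in range(1 << bits)]
    return frame * frames


def dft_magnitudes(samples):
    """Magnitude of each DFT bin from DC up to half the clock rate."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(v * cmath.exp(-2j * cmath.pi * k * i / n)
                  for i, v in enumerate(samples))
        mags.append(abs(acc) / n)
    return mags


FRAMES = 4
mags = dft_magnitudes(pwm_stream(code=8, bits=4, frames=FRAMES))
for k, m in enumerate(mags):
    if m > 1e-9:
        print(f"{k / FRAMES:g} x frame rate: magnitude {m:.4f}")
```

Apart from the DC term, energy shows up only at 1, 3, 5 and 7 times the frame rate. For other duty ratios the even multiples appear as well, but the energy still sits only at multiples of the frame rate.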

 

Fig5 Caption: Frequency spectra of (a) simple PWM, (b) PRBS version of PWM, (c) Delta Sigma converter. All converters return the same total energy; it is just dispersed differently in frequency.

 

 

The frequency plot of the PRBS version at Fig5b is rather different. Here we do not see single peaks but a relatively flat noise spectrum spread evenly throughout the band, with the baseband signal staying in the same place as before. Here we show a possible advantage of using PRBS methods in PWM applications. The outputs are easier to filter, as there are no high energy peaks, but at the same time, there will be some noise within the baseband.

 

Now look at the spectrum of the DS generator in Fig5c. Notice how the noise energy is not flat, but concentrated away from the low frequency end. Refer to Fig6 for a simple explanation. This is just Fig3 redrawn as an analogue system, with the quantiser replaced by a noise source. This is a very dodgy assumption, as it requires us to believe that the noise is random and uncorrelated with the input signal (not true), but let's accept this for now. As we are only interested in the effect of the noise, we set the input signal to zero and calculate the output voltage in terms of noise voltage only. Readers conversant with Laplace transforms and analogue filters will quickly recognise the circuit. The transfer function is Vo(s) = Vn(s) / (1 + H(s)), which, when H(s) is an integrator, is a simple high pass filter. In other words, the DS converter will reduce the noise within the baseband, a definite improvement over the PRBS version, which has a flat noise response. If we do the same analysis for the input signal instead of the noise, we find the DS converter acts like a low pass filter, exactly what we want.
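For the simplest case the two responses can be evaluated directly. The sketch assumes the loop filter is an ideal integrator, H(s) = 1/(s tau), so that the noise sees 1/(1 + H) and the signal sees H/(1 + H); tau and the function names are illustrative choices, not values from any data sheet.

```python
def loop_filter(w, tau=1.0):
    """Ideal integrator evaluated at s = jw: H(jw) = 1 / (jw * tau)."""
    return 1.0 / (1j * w * tau)


def noise_gain(w, tau=1.0):
    """|1 / (1 + H)|: what the quantiser noise sees (high pass)."""
    return abs(1.0 / (1.0 + loop_filter(w, tau)))


def signal_gain(w, tau=1.0):
    """|H / (1 + H)|: what the input signal sees (low pass)."""
    h = loop_filter(w, tau)
    return abs(h / (1.0 + h))


for w in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"w*tau = {w:6.2f}  noise gain {noise_gain(w):.4f}  "
          f"signal gain {signal_gain(w):.4f}")
```

Well below the corner frequency the noise is attenuated in proportion to frequency while the signal passes almost untouched; well above it the situation is reversed.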

 

What does this mean in terms of discrete levels and code errors? It means that the error voltages resulting from the time truncations have mostly high frequency components. Another way of saying this is that the noise voltages are not very correlated from sample to sample. It also means they have a high energy content, a factor that must not be underestimated. The DS converter is not removing any noise; it is just pushing it under the carpet at the high frequency end.

 

 

Fig6 Caption: Analogue equivalent of a Delta Sigma converter. For analysis purposes, the quantiser is replaced by an equivalent uncorrelated noise source.

 

 

 

The chips out there

How well behaved are real life A/D and D/A converters? In order to get better performance, manufacturers have improved greatly on the basic DS scheme. They have used complex higher order H(s) filters, and they have also used multi level quantisers. All this has resulted in schemes that push noise corner frequencies higher up still. Results to date are quite impressive, but the core limitations are still there to haunt their performance.

 

Fig7 shows the AES 24-bit serial output stream of a Cirrus CS5396, one of the best 24-bit professional A/D converter chips around. The analogue inputs have been shorted internally, so what is shown is conversion noise only. The figure shows a number of data scans superimposed to form an "eye" pattern. Note how the last 6-7 serial bits overlap and dance about. Clean conversion is only performed up to about 17-18 bits; the rest is just noise. The Cirrus CS5396 is not really a 24-bit converter; in reality it is a 17-bit converter. To get full 24-bit "noise free" performance, the CS5396 would have to be sampled at 1500 samples per second, not quite HI-FI.

 

Another top range A/D converter, the AKM AK5394A, performs similarly. Among the more modest range, the Texas 24-bit, 96 thousand samples per second PCM1804 only produces 14 "real" bits.

 

Some manufacturers such as Texas are now specifying their converters in "noise free bits", a parameter highly dependent on conversion rate. For example, the 24-bit, 30 thousand samples per second ADS1255 A/D is specified as operating at 23 noise free bits at 10 samples per second, and 17 bits when operated at its advertised rate of 30 thousand samples per second. This is a far more realistic description of their performance.

 

 

Fig7 Caption: Eye pattern output from the 24 bit decoded stream from an A/D converter. What is advertised as a 24-bit converter can only provide in practice 14 noise free bits.

 

 

The noise is always there, the total noise power is the same, but pushed up into the higher frequencies. The amplitude components are huge. Just because we can't hear them, it doesn't mean they are not there. These high levels of high frequency noise bring other problems, more associated with circuit layout and design. What we may hear as a hiss is actually a cacophony of repeated pattern birdies and square wave sounds. These are at frequencies above hearing, but in a practical PCB construction, these may interact with one of the many clock signals present on the board to produce in band interference. A fixed DC offset in the converter's input for example, can produce in-band repeated patterns or whistles. A bad PCB layout can easily show all these problems.

 

 

 

What About SACD? It sounds so much better than CD!

Oh dear, am I going to get some flak here... The SACD philosophy is to use the one bit stream from a Delta Sigma converter, and store it directly onto the disc without further processing, i.e. without converting it into the normal 16-bit 44100 data format. Similarly when playing back, the one bit stream, directly from the disc, is fed to a low pass filter to recover the audio. Simplicity itself. The makers also claim superior audio quality.

 

As the technical description goes, a SACD signal is a one bit stream at a 2.8MHz bit rate. Decoding is performed with a simple low pass filter. When compared with a 16-bit CD quality signal at the 44100 rate, SACD contains 64 one-bit samples in the same sample period. Because of its higher sampling rate of 2.8MHz, audio frequencies of up to 100kHz can be discerned. Superficially it sounds impressive: 64 samples sounds a lot more than the 16 bits of a normal CD sample, and the high sampling rate of 2.8MHz can only be good.

 

But then just a second: if the output filter is just a low pass filter, the 64 bits can only be combined together to form 64 or so different levels. Unless I am missing something, this is equivalent to a 6-bit resolution converter at the same 44100 rate. This is nowhere near the 16-bit resolution of a CD.

 

As real music is not all composed of 22kHz bursts, we could argue that given a longer settling time, the conversion quality will increase. This makes sense, but still, putting numbers into perspective, in order to obtain say 10-bit accuracy, the effective sample rate needs to go down to under 3000 samples per second (2.8MHz divided by 1024), a bit below telephone quality. Are we missing something?
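The sums behind this argument are short enough to write down. The sketch follows the article's deliberately simple model of a plain averaging filter, in which resolving 2**n levels needs a window of 2**n bits; it takes the SACD bit rate as 64 times 44100, and the names are mine.

```python
import math

CD_RATE = 44_100
DSD_RATE = 64 * CD_RATE   # 2,822,400 bits per second, the "2.8MHz" stream


def bits_from_window(window_bits):
    """Resolution, in bits, of a plain average over window_bits one-bit samples."""
    return math.log2(window_bits)


def output_rate_for_bits(resolution_bits, bit_rate=DSD_RATE):
    """Independent output samples per second at a given resolution."""
    return bit_rate / (1 << resolution_bits)


print(f"Bits per CD sample period: {DSD_RATE // CD_RATE}")
print(f"Resolution at 44100 rate : {bits_from_window(DSD_RATE // CD_RATE):.0f} bits")
for n in (6, 10, 16):
    print(f"{n:2d}-bit averaging -> {output_rate_for_bits(n):9.1f} samples per second")
```

On this model, 10-bit accuracy leaves about 2756 samples per second and 16-bit accuracy only a few tens, which is the gap that noise shaping is supposed to bridge.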

 

Oddly enough, there is very little in depth discussion of the operation of SACD in the literature. There are many papers discussing its basics, its noise behaviour, and plenty on subjective appraisals. We know from the discussion above, that the precision of the conversion is doubtful. There is no way, by using a simple low pass filter, that the correct analogue levels can be recovered precisely.

 

Of course we are still thinking like engineers, don't forget. This is the audio world, if you don't mind. If audiophiles say SACD sounds good, we can only accept that, even if the voltage outputs are completely different to the inputs.

 

 

SACD has also been promoted for use as a studio resource. However, the case against using bit stream techniques in studio environments has been argued quite heavily by a few parties. As a lossy technique, it does not lend itself to studio operations such as mixing, filtering or addition of dither. It makes sense to think that the place for SACD is at the end of the playback chain, with just the one conversion between the stream and the listener.

 

 

Conclusion

What this article shows is that the techniques used for audio A/D and D/A conversion result in real devices that are not necessarily suitable for applications other than audio. Real precision can be considerably lower than the claimed specifications, and most of the articles written have concentrated on one aspect of the analysis, noise shaping, without really covering how these converters work.

 

Commercial chips are not so magical after all. Still, they are quite impressive and perform well in an audio environment, where manufacturers have put a lot of effort and ingenuity into squeezing every ounce of performance out of Delta Sigma techniques.

 

 

 

For more information

The best way to find these is to Google the relevant keywords mentioned:

 

Crystal Semiconductor Audio Databook. A mine of information on devices, techniques and many application notes. Also available in CDROM form from www.cirrus.com

 

Fujimori, "A Multibit Delta Sigma A/D" IEEE Journal Solid State Circuits Vol 35, nr 8. Aug 2000. A typical example of current trends in DS converter design.

 

Intersil, application note AN9504 "A brief introduction to Sigma Delta Conversion"

 

Analog Devices application note AN283 "Introduction to Sigma Delta Conversion"

 

S. Lipshitz, Derek Reefman and others. AES Journals and Conventions. Various papers on the pros and cons of SACD. Search Google on "Lipshitz" and "Reefman"