Every audiophile and home theater owner understands how video quality is defined using parameters such as aspect ratio, resolution, black level, contrast, color balance, bit-rate, and so forth. Online and print magazine reviews usually include detailed technical data for projectors and flat-screen displays to help readers make informed decisions. Unfortunately, the methods used to spec and test audio gear are a mystery to many people, and too often reviews include vague subjective prose that’s mostly meaningless. What is a “forward” sound? What do “etched highs” sound like? Are these properties desirable or not?
Only four parameters are needed to define everything that affects audio fidelity: frequency response, distortion, noise, and time-based errors. In truth, these are really parameter categories, and most contain several subsets. For example, distortion refers to the addition of new frequencies not present in the original source. Noise includes analog tape hiss, hum and buzz from ground loops or light dimmers, and vinyl record clicks and pops. Time-based errors include turntable wow, analog tape flutter, and digital jitter. This article explains these parameter categories in detail, and relates the amount of each to its audibility.
THE FOUR PARAMETERS
Frequency response defines how uniformly an audio device passes different frequencies within the audible range. Our ears can perceive sound from slightly below 20 Hz to around 20 KHz, so passing frequencies within this range uniformly is expected for audio devices that strive for high fidelity. Frequencies lower than 20 Hz can be heard, but only if they’re very loud. Further, the upper limit becomes lower as we age. At the time of this writing I’m 64 years old, and I can hear up to about 14 KHz. But younger people can hear the full range, so it’s important for audio gear to pass all frequencies equally.
A typical frequency response spec for a power amplifier might be 10 Hz to 50 KHz +/- 1 dB, similar to the graph shown in Figure 1. This means that all frequencies within that range pass through the device equally, varying in volume less than 1 dB louder or softer compared to other frequencies in that range. As this graph shows, electronic circuits are typically very flat, falling off in level at the low and high frequency extremes. The smallest response change most people can detect is around 0.3 dB, but that’s only at midrange frequencies under ideal listening conditions. So having a response flat within half a dB or even 1 dB over the audible range is generally considered sufficient for high fidelity.
Figure 1: A typical power amplifier passes all frequencies in the range between 10 Hz and 50 KHz without deviating by more than 1 dB.
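To put the tolerance figures above into more familiar terms, here is a minimal Python sketch that converts a level difference in dB to an amplitude (voltage) ratio. The function name is made up for this illustration, and the 20-times-log10 relationship is the standard one for voltage levels.

```python
def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to an amplitude (voltage) ratio."""
    return 10 ** (db / 20)

# Tolerances mentioned in the text: 0.3 dB, half a dB, and 1 dB.
for tolerance_db in (0.3, 0.5, 1.0):
    up = db_to_amplitude_ratio(tolerance_db)
    down = db_to_amplitude_ratio(-tolerance_db)
    print(f"+/- {tolerance_db} dB -> amplitude between {down:.3f}x and {up:.3f}x")
```

So a spec of +/- 1 dB allows the output amplitude to vary by roughly 12 percent up or 11 percent down across the stated range.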
Some audio enthusiasts believe that a frequency response beyond the 20 KHz limit of human hearing is necessary, but that’s never been proven scientifically. Our ears simply don’t respond to ultrasonic frequencies, nor is there a known mechanism by which we could otherwise perceive them. A study by Tsutomu Oohashi from 1999 is often cited by audiophiles as proof that ultrasonic content can be heard. However, that study was flawed because different high frequency tones were played through a single tweeter. IM distortion (see below) in the tweeter created difference frequencies that were in the audible range, which is what the test subjects heard. When the test was repeated using separate tweeters for each ultrasonic component, none of the test subjects were able to tell when the ultrasonic tones were turned on and off.
The two main types of distortion are harmonic and intermodulation, and both are almost always present together. Harmonic distortion adds new frequencies that are mathematically related to the source, and their magnitude can be expressed as either a percentage or some number of dB softer than the original. A factor of 10 represents a 20 dB level difference, so 10 percent distortion means the added components are 20 dB softer than the source, 1 percent is 40 dB softer, and 0.1 percent is 60 dB down. If you play a 100 Hz sine wave through a preamp that adds 1 percent second harmonic distortion, the output will contain the original 100 Hz, as well as an added 200 Hz component that’s 40 dB softer.
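The percent-to-dB relationship described above can be checked with a short Python sketch. The function names are invented for this example; the math is just the usual 20-times-log10 rule for amplitude ratios.

```python
import math

def percent_to_db(percent):
    """Distortion given as a percentage of the source, expressed in dB."""
    return 20 * math.log10(percent / 100)

def db_to_percent(db):
    """Inverse conversion: a level in dB relative to the source, as a percent."""
    return 100 * 10 ** (db / 20)

for percent in (10, 1, 0.1):
    print(f"{percent} percent distortion is {percent_to_db(percent):.0f} dB relative to the source")
```

Each factor of 10 in the percentage moves the result by another 20 dB, which is why 0.01 percent would correspond to 80 dB down.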
Figure 2 shows a Fast Fourier Transform (FFT) of typical harmonic distortion components added to a 100 Hz sine wave when the input is overdriven slightly. An FFT display lets you assess added noise and distortion artifacts, to see exactly how much was added at every frequency. In this figure the level of each individual artifact is shown, so the total distortion amount is the sum of them all, which requires adding up many numbers. Seeing each component is useful for circuit designers, but tedious when you just want to obtain a single distortion figure. Therefore, a different method is used to assess distortion in audio equipment. A pure sine wave having very low distortion is sent through the device being tested, then a notch filter inserted at the output removes just that one frequency. Whatever remains is the sum of all noise and distortion added by the device.
Figure 2: An FFT display shows volume level versus frequency, and it’s commonly used to assess the amount and nature of noise and distortion artifacts added by audio equipment.
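As a rough illustration of what "adding up" the individual components involves, the sketch below combines a set of harmonic levels into one total figure. It assumes the common root-sum-square convention, where the powers of the harmonics are summed, and the levels listed are hypothetical values, not readings taken from Figure 2.

```python
import math

def thd_from_harmonics_db(harmonic_levels_db):
    """Combine harmonic levels (dB relative to the fundamental) into one ratio.

    Assumes the harmonics add as powers (root-sum-square).
    """
    power_sum = sum(10 ** (level / 10) for level in harmonic_levels_db)
    return math.sqrt(power_sum)

# Hypothetical levels for the 2nd through 5th harmonics of a 100 Hz tone.
levels_db = [-40.0, -46.0, -60.0, -66.0]
ratio = thd_from_harmonics_db(levels_db)
print(f"Total: {100 * ratio:.2f} percent, or {20 * math.log10(ratio):.1f} dB")
```

Note how the total is dominated by the loudest component. The notch filter method described above arrives at a similar single number directly, with the noise included.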
Most audio devices add distortion at both odd and even harmonic frequencies. In this 100 Hz example the even harmonics are 200 Hz, 400 Hz, 600 Hz, and so forth, while the odd harmonics are 300 Hz, 500 Hz, and 700 Hz, etc. Some people believe that even-order distortion is “musical” while odd-order distortion is harsh sounding, but that’s not really how it works. Most musical instruments naturally contain both even and odd harmonics, so a device that adds either harmonic type merely changes the instrument’s inherent tone color by some amount. What really matters is the amount of the artifacts.
Intermodulation distortion also adds new frequency components, but these components are not necessarily multiples of the source frequencies. So IM distortion can be less “musical” than harmonic distortion. As a simple example, consider a two-note A Major chord with an A note at 440 Hz and the C# above at 554 Hz. When these two frequencies pass through a device that adds IM distortion, new inharmonic tones are created at the sum and difference frequencies:
554 Hz + 440 Hz = 994 Hz
554 Hz – 440 Hz = 114 Hz
In this case 994 Hz is not a standard musical tone, falling partway between the notes B and C, so it’s in tune with neither. Likewise, 114 Hz isn’t a standard note frequency, falling about halfway between a low A at 110 Hz and the A# above at 117 Hz. If you’re bothered by the sound of an amateur musician playing wrong notes on a piano that’s out of tune, that’s not unlike the sound of IM distortion. As with harmonic distortion, the sum and difference frequencies also occur at numerically related multiples. So besides 114 Hz and 994 Hz, additional difference tones occur at 228 Hz, 342 Hz, and 456 Hz, as well as sum components at 1,988 Hz, 2,982 Hz, and 3,976 Hz.
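The arithmetic above is easy to reproduce. This Python sketch lists the sum and difference products and their multiples as described in the text, then reports the nearest equal-tempered note for each. The function names are hypothetical, and the note lookup assumes standard tuning with A at 440 Hz.

```python
import math

NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

def im_products(f1, f2, max_multiple=4):
    """Sum and difference frequencies, plus their multiples as described above."""
    difference = abs(f2 - f1)
    total = f1 + f2
    return {
        "difference": [n * difference for n in range(1, max_multiple + 1)],
        "sum": [n * total for n in range(1, max_multiple + 1)],
    }

def nearest_note(frequency, reference_a=440.0):
    """Nearest equal-tempered note name and the offset from it in cents."""
    semitones = 12 * math.log2(frequency / reference_a)
    nearest = round(semitones)
    cents_off = 100 * (semitones - nearest)
    return NOTE_NAMES[nearest % 12], cents_off

products = im_products(440, 554)
for kind, frequencies in products.items():
    for frequency in frequencies:
        name, cents = nearest_note(frequency)
        print(f"{kind:10s} {frequency:5d} Hz  nearest note {name:2s} ({cents:+.0f} cents)")
```

An offset of zero cents would mean the product lands exactly on a note; anything else is out of tune by that amount.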
Another type of inharmonic distortion is called aliasing. This occurs in sound cards and outboard digital converters, and is caused by inadequate input or output filtering. With aliasing, the sum and difference frequencies are related to the musical source frequencies and the sample rate. However, aliasing in all modern converters is so low it’s not audible. Indeed, one important goal when designing audio circuits is keeping all of the inevitable artifacts soft enough that they won’t be heard.
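To see how an alias relates to the source frequency and the sample rate, here is a small sketch of an idealized sampler with no input filtering at all. That is an assumption made purely for illustration, since real converters filter these components out, and the function name is invented.

```python
def alias_frequency(frequency, sample_rate):
    """Where a tone lands after ideal sampling with no anti-alias filter."""
    nyquist = sample_rate / 2
    folded = frequency % sample_rate
    # Anything above half the sample rate folds back down below it.
    return sample_rate - folded if folded > nyquist else folded

sample_rate = 44100  # CD sample rate
for tone in (10000, 25000, 30000):
    print(f"{tone} Hz sampled at {sample_rate} Hz appears at {alias_frequency(tone, sample_rate)} Hz")
```

A 30 KHz tone sampled at 44.1 KHz shows up at 14.1 KHz, the difference between the sample rate and the source, which has no musical relation to the original.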
Like distortion, noise is also a broad category that encompasses many subsets. These include AC power-related hum, between-station radio noises, electronic crackling, and left-right channel bleed-through (cross-talk). Because there are so many different types of artifacts that audio devices can add to the music, and just as many causes, it’s common for manufacturers to lump them all together into a single number called “distortion plus noise” which is then spec’d as either a percentage or number of dB below the output signal.
Time-based errors affect the pitch and tempo of mechanical analog playback devices such as turntables and tape recorders. Digital equipment also has a type of time instability known as jitter. But jitter occurs at such a fast rate – equivalent to GHz frequencies – that it’s not heard as pitch or tempo changes. Instead, the result is “frequency modulation noise” that may be either random or harmonically related to the source frequencies in the music, depending on what causes the jitter. Thankfully, jitter artifacts in most modern audio gear are at least 100 dB below the music, so they’re never heard either.
The last type of time-based error is phase shift, which is a short delay whose timing changes with frequency. But this too is inaudible except in contrived situations. The only time phase shift can be heard is when it’s different in the left and right channels, or while the amount is being changed, or if it’s extremely severe (thousands of degrees). This never happens with normal audio circuits.
If a device has a frequency response that’s within 0.1 dB across the audible range, with distortion and other artifacts at least 80 dB below the source, and time-based errors too small to hear, then that device can be considered audibly transparent. By definition, one audibly transparent device sounds identical to all other such devices. This brings us to audibility thresholds, the masking effect, and Fletcher-Munson – all three affect the level at which artifacts can be perceived.
I often see claims in magazine reviews and audio forums that connecting a digital converter to an external clock improves the sound compared to using its internal clock. The theory is that the amount of jitter is reduced, which is claimed to increase spaciousness, fullness, and overall clarity. However, jitter noise is typically 120 dB below the music, which puts it more than 20 dB below the noise floor of a CD. Further, fullness is a frequency response issue that has nothing to do with background noise and is easily measured. Spaciousness is determined by volume and timing differences in the left and right channels, as well as room ambience embedded in the music. When recording music, mix engineers create a sense of spaciousness by placing two or more microphones at different locations in the room, or by adding artificial reverb.
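The comparison with the CD noise floor can be sanity-checked with the usual rule of thumb of about 6 dB per bit. The sketch below assumes a 16-bit CD and ignores dither and noise shaping, so treat the result as approximate. The 120 dB jitter figure is taken from the text, and the function name is made up.

```python
import math

def dynamic_range_db(bits):
    """Approximate ratio of full scale to one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)

cd_floor_db = dynamic_range_db(16)  # CDs store 16-bit samples
jitter_db = 120.0                   # typical jitter level from the text
print(f"CD noise floor: about {cd_floor_db:.1f} dB below full scale")
print(f"Jitter at {jitter_db:.0f} dB down sits {jitter_db - cd_floor_db:.1f} dB below that floor")
```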
To test these claims for myself, and for you, I’ve created Wave files that show at what level various types of artifacts can be heard, and several such tests are available for download from the articles page of my web site. One great feature of these tests is that you can download and play them on your own system, rather than take my word for what I hear through my speakers and headphones. One such test mixes a very obnoxious sounding noise under different types of music at various levels. To make the noise even more obvious it pulses on and off every two seconds. Most people would be hard pressed to hear this noise when it’s only 60 dB below the music, which is 1,000 times louder than typical jitter!
Absolute volume level is not the only factor that determines how loud an artifact must be in order to hear it while music plays. The masking effect is a psychoacoustic phenomenon that hides soft sounds in the presence of louder sounds at similar frequencies. It’s easy to hear the hiss from a cassette tape during a bass solo, but not so much while a cymbal or violin section is playing. Likewise for AC hum—you can’t miss it while a solo flute is playing, but it’s difficult to hear under a bass drum unless the hum is very loud. Therefore, noise or distortion that has a frequency spectrum similar to the music will not be heard as readily as artifacts consisting of frequencies several octaves away.
Another factor that affects audibility is the Fletcher-Munson equal-loudness curves, developed back in the 1930s and refined slightly in the years since. These curves describe how loud various frequencies must be in order to sound the same volume. Our ears are more sensitive in the midrange than at very low or very high frequencies. So if the noise or distortion of a preamp consists mostly of very low frequencies, it will not be heard as readily as artifacts at frequencies around 2 to 4 KHz where our ears are most sensitive.
To better relate artifact levels to their audibility, audio equipment manufacturers apply weighting curves to their measurements. The most common curve is A Weighting, shown in Figure 3. A filter having this response is applied to the measurements to emphasize the midrange and reduce the level at the frequency extremes. This way the stated amount of noise (or distortion) more closely relates to how loud it actually sounds.
Figure 3: A-weighting intentionally reduces the contribution of low and very high frequencies, so noise measurements will correspond more closely to their audibility.
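For readers who want numbers rather than a graph, the A-weighting curve can be computed from the analog response formula published in the IEC 61672 sound level meter standard. This article doesn't give the formula, so take the sketch below as an outside reference rather than part of the original measurements.

```python
import math

def a_weighting_db(frequency):
    """A-weighting gain in dB at a given frequency (IEC 61672 analog formula)."""
    f2 = frequency ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The 2.00 dB offset normalizes the curve to 0 dB at 1 KHz.
    return 20 * math.log10(numerator / denominator) + 2.00

for frequency in (20, 100, 1000, 3000, 20000):
    print(f"{frequency:6d} Hz: {a_weighting_db(frequency):+6.1f} dB")
```

The curve passes through 0 dB at 1 KHz and drops steeply at low frequencies, so hum at 60 Hz counts for far less in an A-weighted noise spec than hiss in the midrange.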
DO YOU HEAR WHAT I HEAR?
Earlier I stated that one transparent device will sound identical to any other such device, and this is certainly true! But people sometimes underestimate the frailty of hearing perception, and are tricked into believing the sound changed even when it didn’t. This is why blind testing is mandatory to avoid the placebo effect and expectation bias. Loudspeakers really do vary enough for most people to hear a difference, but modern electronic devices are sufficiently transparent to sound the same, or at least very similar. A null test is even more conclusive when such a test is possible. This type of test reveals the difference between two signals, such as the input and output of a preamp, by using a circuit that “subtracts” one signal from the other. If the residual that remains is at least 70 or 80 dB softer than the original source, the device is sufficiently transparent.
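A null test can also be done digitally. The sketch below is a simplified illustration, not a description of any particular test rig. It assumes the two signals are already sample-aligned and level-matched, and the "device" is a hypothetical one that adds only a second harmonic 80 dB below the source.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def null_residual_db(reference, device_output):
    """Level of (output minus input) relative to the input, in dB."""
    residual = [out - ref for ref, out in zip(reference, device_output)]
    residual_rms = rms(residual)
    if residual_rms == 0:
        return float("-inf")  # perfect null
    return 20 * math.log10(residual_rms / rms(reference))

sample_rate = 48000
count = 4800  # 0.1 second, a whole number of cycles for both tones
reference = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(count)]

# Hypothetical device: adds a 2 KHz second harmonic at 0.01 percent (-80 dB).
device_output = [
    s + 0.0001 * math.sin(2 * math.pi * 2000 * n / sample_rate)
    for n, s in enumerate(reference)
]

print(f"Null residual: {null_residual_db(reference, device_output):.1f} dB")
```

In practice even a tiny gain or timing mismatch between the two signals will dominate the residual, so the levels and alignment must be matched carefully before subtracting.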
But sometimes people believe they hear a difference even when no difference is likely. There are two reasons for this. One is the short-term nature of our memory for sound. If it takes five minutes to swap the signal wires from your CD player to your receiver, it’s difficult to recall the exact tonality. The other is that we can focus on only part of the total sound at a given time. You might notice the tone quality of the electric bass on one playing, but ignore how the saxophone sounds. Is that delicate cymbal ping really clearer after replacing an RCA or speaker wire, or does it just seem that way? In my opinion, the best way to fairly compare two sources is to switch quickly back and forth several times playing the same section of music, while listening carefully for the same details each time. Many audio editor programs can do this easily, such as Sony’s Sound Forge, which I use.
However, sometimes the sound really does change even when the equipment and wires are the same. This is due to acoustic comb filtering, a frequency response error characterized by a series of peaks and deep nulls. If you move your head even one inch while listening to loudspeakers, the sound waves reaching your ears will have a very different frequency response. Figure 4 shows the response I measured in a small room at two locations only four inches apart, which is smaller than the distance between an adult’s ears. At mid and high frequencies the disparity is even larger. So unless you literally clamp your head in a vise while you listen, or wear earphones, the frequency spectrum you hear will change quite a lot as you move around slightly in your seat.
Figure 4: This graph shows the low frequency response measured at two locations four inches apart in a room that’s 16 by 11-1/2 by 8 feet high. Even over such a small physical span, the response changes substantially at many frequencies.
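A simplified model shows why such a small move matters. The sketch below considers only the direct sound plus a single reflection, which is far simpler than a real room. The speed of sound (1,130 feet per second), the reflection strength, and the path differences are all assumed values chosen for illustration, not measurements from Figure 4.

```python
import math

SPEED_OF_SOUND_FT_S = 1130.0  # assumed, roughly room temperature

def null_frequencies(path_difference_ft, count=3):
    """First few comb filter nulls for a reflection with this extra path length."""
    return [
        (2 * k + 1) * SPEED_OF_SOUND_FT_S / (2 * path_difference_ft)
        for k in range(count)
    ]

def comb_response_db(frequency, path_difference_ft, reflection_gain=0.7):
    """Level of direct sound plus one reflection, relative to direct sound alone."""
    phase = 2 * math.pi * frequency * path_difference_ft / SPEED_OF_SOUND_FT_S
    magnitude = math.sqrt(1 + reflection_gain ** 2 + 2 * reflection_gain * math.cos(phase))
    return 20 * math.log10(magnitude)

# Hypothetical: the reflection travels 2 feet farther at one spot,
# and 4 inches more than that at a nearby spot.
for extra_path_ft in (2.0, 2.0 + 4 / 12):
    nulls = ", ".join(f"{f:.0f} Hz" for f in null_frequencies(extra_path_ft))
    print(f"Path difference {extra_path_ft:.2f} ft: nulls near {nulls}")
```

With these assumed numbers the first null moves from about 283 Hz to about 242 Hz, and the higher nulls shift by proportionally more.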
We don’t usually notice these changes when moving around because each ear receives a different response, so we perceive more of an average. A null at one ear may not be present at the other ear, and vice versa. Further, all rooms behave this way, so we’re accustomed to hearing these response differences and don’t usually notice them. However, the change in response over distance is real, and it’s audible if you listen carefully. If you cover one ear it’s even easier to notice because the frequencies missing in one ear are not filled in at the other ear.
When listening to music some frequencies are harsh sounding, such as the range between 2 and 4 KHz. Other frequencies are full sounding (below 200 Hz), and yet others have a pleasant “airy” quality (above 5 KHz). If you listen in a location that emphasizes harsh sounding frequencies, then change a speaker wire and sit down again two or three inches away where harshness happens to be suppressed, it’s not unreasonable to conclude that the new wire was responsible for the improvement. Likewise, exchanging a power amplifier or CD player might seem to affect fullness even though the low frequency response change was due entirely to positioning. I’m convinced this is one reason some people believe the sound improved after applying various audiophile “tweaks” such as speaker wire elevators, power “conditioner” products, vinyl LP “demagnetizers,” and so forth.
All domestic-size rooms not only have many peaks and deep nulls as shown in Figure 4, but the peaks also resonate, causing those frequencies to sustain over time. Figure 5 shows a waterfall plot that displays both the low frequency response and ringing measured in the same room as Figure 4. In this type of graph the “mountains” come forward over time. The same acoustic reflections that create peaks also cause the peak frequencies to linger after the original source stops, which further muddies the sound. Likewise, echoes at mid and high frequencies occur in rooms that lack absorbing materials such as drapes, carpet, and soft furniture. Indeed, the room you listen in arguably has much more effect on what you hear than any component, including the loudspeakers!
Figure 5: This waterfall graph shows both the low frequency response and ringing in a typical small room. Ringing is caused by acoustic waves bouncing repeatedly between opposing walls, causing the sound at some frequencies to continue for half a second or longer after the source stops.
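The frequencies most prone to ringing between opposing surfaces can be estimated from the room dimensions. This sketch computes only the simplest (axial) resonances for the 16 by 11-1/2 by 8 foot room mentioned in the Figure 4 caption. It assumes a speed of sound of 1,130 feet per second and perfectly rigid, parallel walls, so the measured peaks in a real room will not line up exactly.

```python
SPEED_OF_SOUND_FT_S = 1130.0  # assumed, roughly room temperature

def axial_modes(dimension_ft, count=3):
    """First few resonances between two parallel surfaces this far apart."""
    return [n * SPEED_OF_SOUND_FT_S / (2 * dimension_ft) for n in range(1, count + 1)]

room_ft = {"length": 16.0, "width": 11.5, "height": 8.0}
for name, size in room_ft.items():
    modes = ", ".join(f"{f:.1f} Hz" for f in axial_modes(size))
    print(f"{name:6s} ({size} ft): {modes}")
```

Real rooms also have resonances involving four or all six surfaces, which this sketch leaves out.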
WHO’S ON FIRST?
You’ll realize the biggest improvement in audio quality by attacking the weakest links first, and clearly the room you listen in is the weakest link. Even minimal attention to room acoustics and loudspeaker placement will make a very real improvement. It’s pointless to obsess over things that barely matter, such as low-jitter converters and ultra-high sample rates, when the acoustics of your listening room degrade the sound so much more severely. The timing errors in even the finest analog tape recorder or turntable are literally 1,000 times worse than the jitter in a $15 sound card, yet I don’t recall hearing people complain about wow and flutter.
Likewise for distortion and frequency response. The distortion added by modest electronic gear is typically below 0.1 percent, while the distortion of loudspeakers can easily reach 5 percent or even higher at low frequencies and loud volumes. This is not to say that all electronic devices sound identical, but modern competent audio gear varies a lot less than some salespeople and magazine reviewers would have you believe. In the grand scheme of things, understanding what affects audio quality, and by how much, can only help us to be smarter consumers. And that’s definitely a Good Thing!
Ethan Winer is a reformed rock ‘n’ roll guitarist who sold his successful software business in 1992 to take up the cello. Ethan has, at various times, earned a living as a recording engineer, studio musician, computer programmer, composer/arranger, circuit designer, technical writer, and college instructor. He has written more than 100 feature articles for numerous audio, music, and computer magazines. Ethan now designs acoustic treatment products for his Connecticut company RealTraps, and his new book The Audio Expert from Focal Press explains advanced audio principles and theory in plain English with minimal math. Connect with Ethan at www.realtraps.com and www.ethanwiner.com.