In response to a recent article, a reader wrote:
Is there still any need for compressed data? As far as I can tell, no one has ever in a double-blind test been able to recognize anything better than CD-quality uncompressed audio. MPEG adds features that can be identified with practice. In theory, FLAC doesn't, but the mere act of decompression can lead to jitter. As in this day and age there is no need to compress CD quality, then why bother? It may be worth having higher bitrate and depth on audio recording/processing, but for listening purposes, CD is as good as you need or indeed can get.
These are all interesting points and well worthy of consideration. Let's take them point-by-point.
Do we need compressed audio data?
For me, at least, the answer is a clear yes. I do most of my day-to-day work on my fine, gracefully aging, System76 Gazelle laptop. Some years ago, I replaced the 500GB 7200RPM hard drive with a 480GB SSD. As far as I could determine at the time, my best bet, in terms of performance and reliability, was a SanDisk ExtremePro, which especially appealed to me because of the included 10-year warranty.
I also travel a lot. When I'm away I really enjoy listening to my music collection, which is mostly FLAC, quite a bit of it ripped from CDs, but also higher resolution stuff, generally up to 24-bit / 96 kHz pulse-code modulation (PCM) data, with a small scattering of MP3 files that were the only options for acquiring certain music. In total, this collection takes up:
clh@avignon:~$ du -s --block-size=G Music
237G	Music
So, of the 480GB provided by my SSD, 237GB is taken up by music. That leaves me another 200GB for the system and my work, with a bit of a buffer. If I moved from FLAC to WAV, that would more or less double the amount of space needed, which would leave me no room for work (or worse). I could buy a 1TB SSD, or I could replace the CD/DVD-ROM drive with a second drive, but I want to keep the optical drive so I can rip the occasional CD and create the occasional DVD-RW backup. Practically speaking, therefore, I'm going to be living with FLAC files on my laptop for now.
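To put rough numbers on that, here is a small sketch of the arithmetic. The raw PCM data rates follow directly from sample rate, bit depth, and channel count; the FLAC-to-WAV ratio of one half is only my "more or less double" rule of thumb turned into an assumed constant, and the function names are made up for illustration. It also treats the whole 237GB as FLAC, which is not quite true of my collection.

```python
def pcm_bytes_per_second(sample_rate_hz, bit_depth, channels=2):
    """Raw PCM data rate: samples per second x bytes per sample x channels."""
    return sample_rate_hz * (bit_depth // 8) * channels


def pcm_megabytes_per_minute(sample_rate_hz, bit_depth, channels=2):
    """Uncompressed size of one minute of audio, in megabytes (10^6 bytes)."""
    return pcm_bytes_per_second(sample_rate_hz, bit_depth, channels) * 60 / 1_000_000


# Assumption for illustration only: FLAC averages about half the size of raw PCM.
ASSUMED_FLAC_RATIO = 0.5


def estimated_wav_gb(flac_gb, flac_ratio=ASSUMED_FLAC_RATIO):
    """Rough uncompressed size of a collection currently stored as FLAC."""
    return flac_gb / flac_ratio


if __name__ == "__main__":
    print("16/44.1 stereo, MB per minute:", pcm_megabytes_per_minute(44100, 16))
    print("24/96 stereo, MB per minute:", pcm_megabytes_per_minute(96000, 24))
    print("237GB of FLAC as WAV, GB (estimate):", estimated_wav_gb(237))
```

Under that assumed ratio, the uncompressed collection would land right at the capacity of the 480GB SSD, which is why WAV is a non-starter for me.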
Recently, I bought a wonderful standalone digital audio player, the xDuoo X3 II, from Massdrop. This is a fine-sounding device, built around an AKM AK4490 DAC chip. In terms of functionality, it's basic (no touch screen, no WiFi, no Android), but it works well and sounds great. It plays PCM (FLAC, WAV, whatever) up to 32-bit / 384 kHz, as well as DSD stream files (DSF) up to DSD128, so it more than covers my needs. It uses a MicroSD card to store music files and my laptop has a MicroSD writer, so I'm good to go. I can copy my music library from my laptop to my new 256GB MicroSD card and enjoy my music without having to convert it to MP3 or some other lossy format. But that 256GB MicroSD card is pretty darned full, so it's a good thing I'm using compressed audio data, i.e., FLAC format.
In summary, I need around 250GB of storage for my music collection, and 500GB SSDs and 256GB MicroSD cards sit at a decent price point. So, yes, compressed audio still matters to me.
In a double-blind test, can anyone recognize better than CD quality?
There are two issues here. The first is the idea that double-blind testing is the ultimate way of detecting differences in audio reproduction. The second is being able to recognize differences between CD-quality and higher resolutions. Let's take them one at a time.
I get the importance of double-blind testing in general, but I also get that it's very simple to demand it as the gold standard in detecting audible differences (between different input data, between different equipment, between different rooms…) without necessarily being aware of all the moving parts.
First, let's think about what "music enjoyment" means. Evidently, lots of people enjoy their music over $5 earbuds and smartphones. At the other extreme, some people spend many thousands of dollars to equip their listening environments. I submit that not one of these people spends a great deal of their time doing double-blind A/B/X testing to assure themselves that their source files or equipment or listening environment or the phase of the moon is making any difference to their pleasure. Instead, they are in a zone, perhaps taking the bus or exercising or cooking to background music, perhaps listening intently to the music and avoiding other distractions.
So how does the act of double-blind listening mimic, in any meaningful way, the act of enjoying music?
Turning that question around, who feels comfortable knowing that a panel of people with different and unspecified levels of education, musical interest, familiarity with instruments' sound (be they real or electronic or something else), interest in listening critically, and "taste" (whatever that is) can somehow tell us that there are (or aren't) meaningful differences between component A and B or file X and Y?
As I get older, I find I really detest background music except when I'm cooking—how could a person possibly make empanadas that look this good without Violeta Parra playing in the kitchen—or when people stop mid-sentence and say something like "Wow! I'm really digging this music you're playing." Instead, I save my music listening for when I'm alone and can savor what I'm hearing. Moreover, I recognize my process of music appreciation has changed, and I have very little interest in whether a panel composed of people who listen to Spotify or MP3s on cheap earbuds can tell the difference between a CD-quality version of a given album and the high-resolution equivalent. It's just not relevant to me.
I've played guitar since I was seven years old and I'm almost 63. I know what a guitar sounds like as my head is hanging over it and I hear what I'm playing. I know that it sounds wonderful in the bathroom, less so in the bedroom. When I hear a well-recorded guitar, whether it's on LP, or CD, or high-resolution digital, I know that it's well-recorded. When it's not well-recorded… no surprise, it doesn't sound like a guitar; it sounds like a mediocre recording of a guitar. But I don't know what a cello, or a violin, or a Moog XYZ really sounds like. I haven't spent much time listening to these instruments, and I certainly haven't critically or carefully listened to recordings of them to detect differences. Have those focus groups listened critically and carefully and with a lot of prior experience? Who knows?
Beyond that, what about recordings made, let's say, without great care? Is the quality of those recordings in 24/96 going to be much better than the 16/44.1 versions? Is there a true difference in the content between the high-resolution and the CD version?
And finally, what about the equipment? Is it really up to reproducing the differences that could be detected in a double-blind test? Could, as some people suggest, the equipment's bandwidth make the high-resolution recording sound worse (rather than the same or better)?
So, double-blind testing. From my perspective, unless all the variables (including ones I haven't considered) are carefully and properly controlled, who knows? When someone figures out a way to double-blind test the way I use music, maybe I'll be more convinced.
Has anyone recognized a difference?
Here we are on more solid ground. Apparently, people have recognized the differences, as suggested by Joshua Reiss' AES meta-analysis. There's also "Inaudible high-frequency sounds affect brain activity: hypersonic effect," an article I find fascinating perhaps because it's so wonderfully clinical.
Reiss' paper is particularly interesting to me because it (scientifically) discards an early paper that purports to show there is no audible difference in medium- and high-resolution files. It says, "[r]esults showed a small but statistically significant ability of test subjects to discriminate high-resolution content, and this effect increased dramatically when test subjects received extensive training." In other words, across the pooled studies, listeners did slightly but measurably better than chance at telling high-resolution content apart, and people who had invested in carefully training their hearing did markedly better.
Why we may (or may not) hear differences in so-called high-resolution files
Before we can determine whether we can hear a difference between a high-res, CD, and MP3 version of the same recording, we need to ask: "Is there actually a difference?"
Open source to the rescue! Spectrum analyzer Spek is one of my favorite tools for checking the so-called high-resolution music files I buy to make sure they can legitimately be called high-resolution (and yes, I have complained to the download stores when I've found their high-res files wanting). I don't know if Spek's Fourier transforms are perfectly executed (although because it's open source, I could find out if I wanted to). Nevertheless, Spek tells some interesting tales.
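For anyone who wants a scriptable sanity check, here is a much cruder cousin of that idea. This is not how Spek works (Spek draws a full spectrogram); it is a minimal sketch that uses the Goertzel algorithm to estimate the level of a single frequency in a block of samples. It assumes the audio has already been decoded to floats between -1.0 and 1.0, and the function names and the synthetic test signal are invented for illustration.

```python
import math


def tone_level_db(samples, sample_rate_hz, freq_hz):
    """Estimate the level (dB relative to full scale) at one frequency (Goertzel)."""
    w = 2 * math.pi * freq_hz / sample_rate_hz
    coeff = 2 * math.cos(w)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    # Squared magnitude of the DFT evaluated at freq_hz.
    power = s1 * s1 + s2 * s2 - coeff * s1 * s2
    # A sine of amplitude A gives a DFT magnitude of A * N / 2.
    amplitude = 2 * math.sqrt(max(power, 0.0)) / len(samples)
    return 20 * math.log10(max(amplitude, 1e-12))  # floor avoids log10(0)


def synthetic_track(n=9600, rate_hz=96000):
    """Hypothetical test signal: a 1kHz tone plus a 30kHz overtone at -60dB."""
    return [
        math.sin(2 * math.pi * 1000 * i / rate_hz)
        + 0.001 * math.sin(2 * math.pi * 30000 * i / rate_hz)
        for i in range(n)
    ]


if __name__ == "__main__":
    track = synthetic_track()
    for freq in (1000, 30000, 40000):
        print(freq, "Hz:", round(tone_level_db(track, 96000, freq), 1), "dB")
```

On real music you would want to probe many frequencies over many short blocks, at which point you have more or less reinvented the spectrogram, so I'll stick with Spek for day-to-day checking.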
Here is Orchestra Baobab's "Woulinewa" at 24/96:
Look at the overtones out there up to 40kHz+. Think about filtering those to fit into 16/44.1 PCM. Now think about analog-to-digital and digital-to-analog conversions that might—or might not—cope well with the overtones at 15kHz to 25kHz. Slow filters (that don't get rid of aliasing errors), fast filters (that get rid of aliasing errors but may introduce phase shifts)… who knows what happens when producing a CD-quality version of this file?
But there's also the reverse problem. Here's Trentemøller's "One Eye Open" at 24/96:
Wow, look at that cutoff just above 20kHz! There has to be some serious filtering happening to make this track suitable for 16/44.1 CD quality.
Let's think about this a bit more. First, in both music files, the overtones are at a pretty low level: -80dB or below. I hear skeptics out there grumbling "of course we can't hear that." But we need to look at the differences between the high-level and low-level music to put this in context. A lot of the fundamentals are in the -40dB to -45dB range, so the difference between the lower-level fundamentals, at -45dB, and the higher-level overtones, at -80dB, is only 35dB. In voltage terms, every 6dB is more or less a doubling of amplitude, so 35dB is a factor of about 56: the signal at -80dB has roughly 1/56th the amplitude of the signal at -45dB.
Moreover, we are accustomed to higher frequencies at a lower level; look at the "green stuff," which is in the -60dB to -70dB range, that is, only 10dB to 15dB louder than the overtones (again in voltage terms, a factor of roughly three to six). Do we hear the green stuff? The yellow stuff? The cyan stuff? I don't know.
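Decibel arithmetic is easy to get wrong in your head, so here is a minimal sketch of the standard conversions between a dB difference and a voltage (amplitude) or power ratio. The function names are my own; the formulas are the usual 20·log10 for amplitude and 10·log10 for power.

```python
import math


def db_to_amplitude_ratio(db):
    """Voltage (amplitude) ratio corresponding to a level difference in dB."""
    return 10 ** (db / 20)


def db_to_power_ratio(db):
    """Power ratio corresponding to a level difference in dB."""
    return 10 ** (db / 10)


def amplitude_ratio_to_db(ratio):
    """Level difference in dB corresponding to a voltage (amplitude) ratio."""
    return 20 * math.log10(ratio)


if __name__ == "__main__":
    for db in (6, 10, 15, 35):
        print(db, "dB -> amplitude x", round(db_to_amplitude_ratio(db), 2))
```

By this reckoning, a 6dB gap is very nearly a doubling of amplitude, a 10dB to 15dB gap is a factor of roughly three to six, and a 35dB gap is a factor of about 56.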
I could paste a bunch more Spek screenshots that show content above 20kHz, as well as ones that show a brick wall filter. But my main point here is: Don't expect your 24/96 (or higher) files to have a ton of high-resolution content—sometimes, they don't!
According to this Drummer World report, cymbal frequencies run way out to the inaudible. Cymbals make musical overtones that—in theory—only bats can hear. But my point is not that ultra-high-frequency stuff is audible; it's that the things we do to get rid of it could create lots of nasty side effects. Of course, the theory says it won't. But that's the theory. Can I trust that all ADC and DAC implementations get this right? Hmmm…
What about comparing CD-quality and high-resolution versions of the same song?
I don't have a lot of music in both 16/44.1 and high-resolution versions, but I do have some Led Zeppelin duplicates. I used Spek to analyze the 16/44.1 version of "Immigrant Song" from the album Mothership and the 24/96 version from Led Zeppelin III (Remastered). I also stretched the Spek window for the 24/96 version so the 20kHz lines are at the same vertical position. Here's what it looks like:
In the 24/96 version, there is quite a bit of content above 20kHz that lies between -60dB and -70dB, which is not all that far down. That content was filtered out in the 16/44.1 version to get it below the Nyquist limit.
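As a reminder of why that filtering is not optional, here is a small sketch of what the sampling theorem says happens to a pure tone above the Nyquist limit if it is sampled without being filtered out first: it folds back ("aliases") into the audible band. The function names are hypothetical, and real music is of course not a single pure tone.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def alias_frequency_hz(freq_hz, sample_rate_hz):
    """Frequency at which a pure tone appears after sampling with no filter."""
    folded = freq_hz % sample_rate_hz
    if folded > sample_rate_hz / 2:
        folded = sample_rate_hz - folded
    return folded


if __name__ == "__main__":
    print("Nyquist limit at 44.1kHz:", nyquist_hz(44100))
    print("30kHz tone sampled at 96kHz appears at:", alias_frequency_hz(30000, 96000))
    print("30kHz tone sampled at 44.1kHz appears at:", alias_frequency_hz(30000, 44100))
```

A 30kHz overtone that is harmless at 96kHz would show up at 14.1kHz on a CD if nothing removed it, which is squarely in the range most of us can hear.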
So, using the Spek test, this track looks like a decent candidate for comparing CD quality and high resolution. However, there could be other differences between the two versions.
For example, one version could be compressed more than the other or its overall level could be set higher. Compression (here meaning dynamic range compression, which moves the signal level closer to the 0dB maximum differentially, so very quiet passages are louder and loud passages are right at the limit) is often done to make tracks sound louder on broadcast media, which is deemed by some broadcasters to be appealing (see Wikipedia's entry on loudness wars). This is apparent in the version of "Communication Breakdown" on Mothership, where the peaks during the intro appear between -3dB and 0dB, compared to the one on Led Zeppelin I (Remastered), where the peaks appear between -6dB and -3dB. In other words, the intro on Mothership is about 3dB hotter than the intro on I, which is roughly twice the signal power, or about 1.4 times the amplitude. You can see this difference by using a great open source audio player (like Guayadeque) that has level meters.
Another example: Older music was often mastered with the bass turned down because heavy bass was not playable on cheap record players in the 1970s, and when it was remastered the bass may have been turned back up. This can be tested with a spectrum analyzer that shows levels at a given instant. Audacity has a plugin that appears to offer this kind of measurement.
Or, there could be other changes that make two song versions sound quite different. "Communication Breakdown" on Mothership has been remixed so that the intro guitar riff is solidly in the center of the stereo image, rather than off to the left as it is on I (Remastered). It's relatively simple, and unambiguous, to detect this balance change with my ears, but it also shows up as level differences between channels when comparing one version to another.
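The level comparisons above can also be done with a few lines of code rather than by watching meters. This is a minimal sketch, assuming the passage of interest has already been decoded to float samples between -1.0 and 1.0 (one list per channel or per version); the function names are invented, and it only looks at peaks, which is a blunter instrument than a proper loudness measurement.

```python
import math


def peak_dbfs(samples):
    """Peak level of a block of samples, in dB relative to full scale."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(peak)


def level_difference_db(samples_a, samples_b):
    """How many dB hotter the peaks in samples_a are than those in samples_b."""
    return peak_dbfs(samples_a) - peak_dbfs(samples_b)


if __name__ == "__main__":
    # Hypothetical snippets standing in for the same passage from two masters.
    louder_master = [0.0, 0.9, -1.0, 0.3]
    quieter_master = [0.0, 0.45, -0.5, 0.15]
    print("Difference in dB:", round(level_difference_db(louder_master, quieter_master), 2))
```

Feeding it the left and right channels of one track gives a rough balance check; feeding it the same passage from two releases gives a rough idea of how much hotter one master is than the other.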
In summary, it's going to be really difficult to ensure that "the same song" pulled from different sources at different resolutions sounds different only because one is CD quality and the other is high resolution. A better approach might be to start with a high-resolution track with real high-resolution content, then filter and downsample it to CD quality. But this only addresses the question "should I buy the CD-quality or high-resolution version?" when the CD-quality version really is the high-resolution original, with true high-resolution content, filtered and downsampled.
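To make "filter and downsample" a little more concrete, here is a minimal sketch of the idea in plain Python: a Hann-windowed sinc low-pass filter evaluated directly at each 44.1kHz output instant. Everything about it is an assumption chosen for illustration (the 20kHz cutoff, the 32-sample half-width, the function names), it ignores the 24-bit to 16-bit requantization and dithering step entirely, and it is far slower and less refined than the resamplers in real audio tools.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def downsample(samples, src_rate=96000, dst_rate=44100,
               cutoff_hz=20000.0, half_width=32):
    """Low-pass filter and resample in one pass (windowed-sinc interpolation)."""
    rc = 2.0 * cutoff_hz / src_rate  # cutoff as a fraction of the source Nyquist
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for n in range(n_out):
        t = n * src_rate / dst_rate  # output instant, in source-sample units
        k_lo = max(0, math.ceil(t - half_width))
        k_hi = min(len(samples) - 1, math.floor(t + half_width))
        acc = 0.0
        for k in range(k_lo, k_hi + 1):
            u = t - k
            window = 0.5 * (1.0 + math.cos(math.pi * u / half_width))  # Hann
            acc += samples[k] * rc * sinc(rc * u) * window
        out.append(acc)
    return out


def sine(freq_hz, n=9600, rate_hz=96000):
    """Hypothetical test tone at full scale."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]


if __name__ == "__main__":
    audible = downsample(sine(1000))
    ultrasonic = downsample(sine(30000))
    print("1kHz peak after conversion:", round(max(abs(v) for v in audible[500:3900]), 3))
    print("30kHz peak after conversion:", round(max(abs(v) for v in ultrasonic[500:3900]), 5))
```

The point of the sketch is only that a 1kHz tone should come through essentially untouched while a 30kHz tone should all but vanish; whether a particular filter design does that without audible side effects is exactly the question I can't answer by looking at code.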
My conclusions, for now…
There isn't an easy way to sort this out, at least from my point of view, but I prefer to be cautious: I'd rather buy high-resolution files when possible (yes, I always buy my music). My absolute favorite combo is 1) buying an excellent LP, and 2) finding a coupon inside for a high-resolution digital copy as a part of the purchase price (thank you, Nils Frahm and others).
Then, one day when I have lots of spare time and fancy equipment, maybe I'll be able to convince myself I can hear the difference. In the meantime, I know I'm not being cheated of the full glory of the performance, as I might be if I had selected the 16/44.1 version instead of the 24/96 one.
And more music
OK, enough ranting about high res vs. CD vs. MP3. Let's talk about the music because that's what really matters, right?
I've mentioned Lenine in the past, specifically his album O Dia Em Que Faremos Contato. My first contact with Lenine was a CD I picked up at Starbucks "quite some time ago" (if you remember buying CDs in your coffee shop, you can probably figure out how long ago this was), which included "A rede," a track from another Lenine album, Na Pressão. Give this a listen; what a fantastic track, what a fine album! Mine was originally on CD, but I've since ripped it to FLAC and stored the backup in the basement.
If you (like me) have a special spot in your heart for Jimi Hendrix, you really need to hear Tangerine Dream's version of "Purple Haze." Oh dear, this is beyond crazy! You can find it on their 1992 tour album, Live in America, on 7digital, a wonderful, Linux-friendly digital download site.
Finally, after listening to a radio edit of "Sharing" on a compilation album for a number of years, I picked up the Bugge Wesseltoft album Sharing on 7digital; although it appears to have vanished in Canada, it's still available, at least in the United States and New Zealand. This is beyond fine music; I encourage you to check out the rest of Bugge's work, as anyone who offers a New Conception of Jazz merits some serious study.