Why I choose FLAC for audio

Sheet music with geometry graphic

Photo by Jen Wike Huger

In this article, I focus on music in digital formats. Moreover, because I am a Linux kind-of-guy, I'm going to take a Linux kind-of-perspective on this topic.

Most people have heard of the MP3 format. It's an example of two things: First, it is not an open format, as a number of organizations claim patents on it. And second, it is a "lossy" format. Lossy formats compress the original signal by throwing out some of the signal components. The original rationale for this compression was to make music files smaller and more easily distributed. In contrast, there are also "lossless" formats, which can be compressed (without throwing away the original signal) or not. Digital music presented on the Compact Disc (CD) is an example of a lossless format (assuming it's an audio CD, not a data CD with MP3s saved on it).

It is also worth mentioning that there are two main ways to encode digital music: pulse code modulation (PCM) and delta-sigma modulation (DSM). Until recently, most digital music has been encoded using PCM; but Sony and Philips established a DSM-based standard called DSD and implemented it on Super Audio CD (SACD) disks. A small but growing amount of music is available for download in this standard. We'll leave it to Wikipedia to explain the difference in more detail.

Those of us who are concerned about software freedom should prefer completely free formats like Ogg Vorbis (lossy) and FLAC (lossless, compressed). We should particularly avoid file formats that include options for digital rights management (DRM). In theory, one might think that DRM is just a mechanism to prevent the unauthorized use (theft?) of someone's intellectual property. However, certain vendors use DRM to force their customers to use their software, and sometimes hardware. Once again, Wikipedia has a nice detailed article about this whole format business.

But deciding on a format is not, or at least should not be, a primary concern. Rather, each of us has a different set of objectives with respect to the use of music. I'm going to explain my objectives, and then further explain how those objectives influence my decision on file formats.

First, and for emphasis, I am a big supporter of software freedom. This means I prefer the Ogg or FLAC formats for digital music. Any format with limited access due to a patent or trade secret is of little or no interest to me.

Second, my music collection stretches back to the 1960s. I still have most of the LPs I bought years ago (sometimes to my embarrassment), and one of the things that gives me great pleasure is how good some of those old LPs still sound on modern analog playback equipment. I like to think that a good-sounding LP like Dave Brubeck's Time Out, originally recorded in 1959, still sounds incredibly fresh and clear in part because the people who recorded it did an excellent and careful job with their equipment. And so when I buy music now, whether on LP or as a music download, I try to get the very best quality of recording I can.

Therefore, I buy digital lossless in strong preference to lossy. In fact, if something is only available in a lossy format, I usually don't bother buying it. And not only do I buy lossless, but I buy it at a higher resolution than "CD standard" when available. And for sure my preferred lossless format is FLAC!

Let's talk about resolution for a minute. Music on a CD is presented at a sampling rate of 44.1kHz and with a word length of 16 bits. In theory this means the loudest sound recorded on a CD is 2^16, or 65,536, times as loud as the softest sound. This means that if you have a recording that shows this full dynamic range and turn up your volume to the point where you can just hear the quietest parts, then the loudest parts will be so loud as to cross the auditory pain threshold.
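As a rough check on those numbers, here is a minimal Python sketch that converts a word length into the number of distinct sample values and the corresponding amplitude ratio in decibels. The function name `dynamic_range_db` is made up for illustration, and the figure is the simple theoretical ratio between full scale and one quantization step, not a statement about perceived loudness.

```python
import math

def dynamic_range_db(bits):
    """Ratio between full scale and one quantization step, in decibels."""
    return 20 * math.log10(2 ** bits)

for bits in (16, 24):
    levels = 2 ** bits
    print(f"{bits}-bit: {levels:,} levels, about {dynamic_range_db(bits):.1f} dB")
```

Each extra bit doubles the number of levels, which adds roughly 6dB to the theoretical range.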

Moreover, the Nyquist-Shannon sampling theorem tells us that the 44.1kHz sampling rate is more than ample to preserve sound frequencies up to 20kHz (the "kHz" is an abbreviation for "kilohertz", or thousands of cycles per second), which is said to be the upper limit of audibility for humans with excellent hearing.

So why do I think I need higher resolution than the CD standard?

Simple. A recording presented at a sampling rate of 96kHz and with a word length of 24 bits provides a great deal more "room" to fit in the original analog signal—not just the loudest sound and the softest sound—than does the 44.1/16 version. This means a recording need not be at a level so close to the maximum that it occasionally exceeds it. (A signal that exceeds the maximum is said to be "clipped," and clipping introduces all sorts of nasty sounds not present in the original recording.) Moreover, quiet sounds in the music have more bits to represent them.

For example, Marconi Union's Breathing Retake is regularly 25dB below maximum. A dB, or decibel, expresses the ratio between the actual level (-25dB in this case) and a reference level, 0dB, on a logarithmic scale; each bit of word length accounts for roughly 6dB. A signal that is 25dB below reference therefore has roughly the four most significant bits set to zero. So music in a 16-bit word length at -25dB only has about 12 bits' worth of signal, whereas in a 24-bit word length it has about 20 bits' worth of signal. Eric Whitacre's Sainte-Chapelle as performed by the Tallis Scholars chugs along at -35dB to -40dB, which gives 10 or fewer bits for the signal in the case of a 16-bit word length. The 24-bit word length gives the recording engineer much more freedom to record the music as played, without having to compress the music to fit it into the 16-bit dynamic range.
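The bit-counting above can be sketched in a few lines of Python. This is only an approximation under the usual rule of thumb of about 6.02dB per bit; the helper name `bits_in_use` is hypothetical, and the levels are the ones quoted for the two recordings.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB for each bit of word length

def bits_in_use(word_length, level_db):
    """Approximate bits exercised by a signal sitting level_db below full scale."""
    unused_bits = abs(level_db) / DB_PER_BIT
    return word_length - unused_bits

for level_db in (-25, -35, -40):
    b16 = bits_in_use(16, level_db)
    b24 = bits_in_use(24, level_db)
    print(f"{level_db} dB: about {b16:.1f} of 16 bits, about {b24:.1f} of 24 bits")
```

The fractional results are why the figures in the text are best read as "about" so many bits.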

As for the sampling rate, the 96kHz sample rate can be used for audio frequencies up to 45kHz or so, and 192kHz for frequencies up to 90kHz or so, well beyond what is thought to be the top end of human capability. However, having that extra bandwidth available means that the filtering that must be applied to the analog signal before it is digitized can be much more gentle than in the case of the 44.1kHz sampling rate. Gentle filters are generally preferred to more abrupt filters for their audio characteristics. The Well Tempered Computer has several nice articles on this topic.
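To put numbers on the "gentle filter" argument, the sketch below computes the Nyquist frequency (half the sample rate) and the width of the band a filter has to roll off in, assuming it must pass everything up to 20kHz. The 20kHz figure comes from the article; the function names are invented for this example, and real filter designs involve more than this single number.

```python
AUDIBLE_LIMIT_HZ = 20_000  # upper limit of hearing cited in the article

def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2

def transition_band_hz(sample_rate_hz, passband_hz=AUDIBLE_LIMIT_HZ):
    """Room between the top of the passband and the Nyquist frequency."""
    return nyquist_hz(sample_rate_hz) - passband_hz

for rate in (44_100, 48_000, 88_200, 96_000, 192_000):
    print(f"{rate} Hz: Nyquist {nyquist_hz(rate):.0f} Hz, "
          f"transition band {transition_band_hz(rate):.0f} Hz")
```

At 44.1kHz the filter has about 2kHz to do its work; at 96kHz it has 28kHz, which is what allows a much shallower slope.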

And one more reason to buy the high-res stuff: my experience shows me that when music is released in high-res format, it is often well cared for in the production chain and preserves the original dynamic range (loud is LOUD and quiet is ...) and life of the music, without introducing a bunch of artifacts—noise!—into the music.

In conclusion: When I buy digital music downloads, I buy them in FLAC format and try to get 24-bit files and 88.2kHz or 96kHz sample rates. My music files cost money. Why would I be willing to accept poor quality lossy files? And why would I be willing to let the vendor lock me into a particular software and hardware platform?

Chris Hermansen portrait Temuco Chile
Seldom without a computer of some sort since graduating from the University of British Columbia in 1978, I have been a full-time Linux user since 2005, a full-time Solaris and SunOS user from 1986 through 2005, and UNIX System V user before that.

40 Comments

It's worth reading this, where you might be surprised to learn that high-res audio isn't all it's cracked up to be: https://xiph.org/~xiphmont/demo/neil-young.html

Also note that you have no idea of the source from which those files were created. If it's the original studio masters, that's one thing. But I suspect a large number of those available for download were created from 44.1/16 originals. At which point, they're just wasting space.

FWIW, I have been ripping my own music to FLAC for the last decade or so.

Thanks for the comment. I have actually read that article, and it doesn't surprise me. What it fails to mention is that when music is recorded at higher resolution, the conversion to lower resolution is an unnecessary and information-destroying step. I prefer the originals, and I prefer to acquire equipment designed to handle the higher resolution content.

It's easy to tell if your music was upsampled from lower resolution sources - the spectrogram shows you. In which case, I say ask for a refund.

In reply to by Tet (not verified)

Great article. I tend to let my ears be the judge of what needs what kind of sampling rate, so I have no rule except the rule of thumb that my ears should be pleased with the depth and scope of the sound. Some of my favourite recordings are vinyl transfers that I myself made and encoded to flac at 48khz.

Great article, and good luck trying to convert people to an audiophile worldview ;-)

Thanks for the comment, Seth!

I wonder if I accidentally "reported" you instead of "replying" - if so, how embarrassing.

Your comments about converting people to an audiophile worldview - how prophetic!

In reply to by sethkenlon

There are several substantial errors and omissions in this article, reflecting a lack of understanding of digital audio coding and psychoacoustics.

The article correctly states that the CD is a lossless format, then praises an LP release of a decades-old recording, without stating that a CD can reproduce the recording on the original master tape much more faithfully (less noise, distortion, etc.) than the LP.

"The loudest sound recorded on a CD is 65,536 times as loud as the softest sound" -- This does not correctly describe how loudness is perceived or measured. For one thing, loudness is subjective, depends also on frequency, and is roughly logarithmic, not linear in proportion to data level. For another, the CD signal swings positive and negative, from +32767 to -32768. And also, in digital recording, low-level random noise, called "dither," is added to the audio signal (and may already be present due to microphone preamplifier noise) to actually allow signals below the 1-bit level to be reproduced. The noise can be placed in a high frequency range to which humans are insensitive, significantly extending the dynamic range.

And as to 24-bit word length and higher data rates beyond that necessary to reproduce the highest audible frequency, both have essentially the same effect in increasing the potential dynamic range which can be recorded, by increasing the number of bits, and can be useful to a recording engineer who must deal with different levels, and mix tracks together -- but make no audible difference in a final release. At any normal listening level, 16 bits cover the range from subjective silence to deafening loudness, and most likely the microphone preamplifier hiss recorded along with the audio already is well above the lowest level which 16 bits can reproduce.

If a recording does not exercise the top 4 or more bits, then very simply it is poorly mastered.

20 kHz give or take one or two kHz IS, not "is thought to be" the human upper limit of audibility, even for young people whose hearing has not been subjected to wear and tear. Suggesting that higher frequencies might be important is a frequent tactic of audio equipment makers who are overselling their customers, unless these happen to be dogs or bats!

Abrupt filters can be badly designed so they produce audible distortions (echo-like) but even the 44.1 kHz CD sample rate (theoretical cutoff at 22.05 kHz) allows carrying response up to 20 kHz while the rolloff remains gentle enough that no such distortion becomes audible. Yes, a higher sample rate may be used in digital filters in recording and playback systems, but the sample rate of the recording need not be as high as that used in the filters.

Whew.

John, thanks for the comments.

I don't feel completely comfortable turning this forum into a debating instrument. That said, you make a few statements that I cannot let stand.

First, "a recording that does not exercise the top 4 or more bits is poorly mastered". The recordings I mention DO exercise the topmost bits, but their level is generally much lower. This is not an indication of defect but rather an indication of the intended dynamics of the composer.

Second, the point of this is that in music that is largely quiet with occasional dynamic peaks, fewer bits are used to represent the low-level signals; and using 16 bits to capture the dynamic peaks leaves far fewer to capture the low-level signals; and that in turn implies that the distortion is much more significant in those lower level signals than the same recording at 24 bits. From another perspective, if the music was recorded at 24 bits, why should I be satisfied with a 16 bit rendition?

Third, your comment about higher sample rate for filters than for the recording... the filtering happens before discretization in order to reduce aliasing.

Finally, I think you're generally misinterpreting my point: I prefer the higher rate / higher bit depth mostly because they are generally - in my experience anyway - indicative of more care being taken in the production chain. There is so much music "out there" that is more an artefact of the production process and less of the original performance, and that shows in the compression applied (for the FM and AM radio broadcast market??? and for people who equate listening to walking down the noisy street with their earbuds in). I am advocating listening as the primary activity and therefore the use of high-quality software and tools.

In reply to by John S. Allen (not verified)

About the top 4 or more bits: point taken; and the volume level compression and outright clipping in many pop recordings are very unfortunate. However, 4 bits down is only 24 decibels down, and about 68 dB remain above the noise level in a simply dithered recording, or 90 dB in a recording with optimally noise-shaped dither (as in the Sony super bitmapped recordings). It is likely that microphone preamp noise and room noise will mask dither noise even with the simple dither; with 90 dB, the dither would be completely inaudible even at a very loud listening level.

"and that in turn implies that the distortion is much more significant in those lower level signals than the same recording at 24 bits." Distortion at low levels occurred in early digital recordings made with converters that had a nonlinear transfer function and/or poor or no dither. Talk of distortion continues to circulate, but there is *no* distortion in proper digitization. The audio signal sinks down into the very low-level hiss from the dither. This has been proven mathematically and demonstrated in practice.

"From another perspective, if the music was recorded at 24 bits, why should I be satisfied with a 16 bit rendition?" Because there is no audible difference, and because the higher data rate is wasteful of bandwidth.

"Third, your comment about higher sample rate for filters than for the recording... the filtering happens before discretization in order to reduce aliasing." Correct, analog filtering can be less steep when a high sample rate is used in recording. However, then steep digital filtering can reduce the sample rate in the transmitted or stored recording to as little as 44.1 k without audible effect. On playback, the sample rate can again be multiplied in order to use the inverse process: steep digital filter, mild analog filter. The filtering is cheap: it is the data storage or streaming which is expensive.

"Finally, I think you're generally misinterpreting my point: I prefer the higher rate / higher bit depth mostly because they are generally - in my experience anyway - indicative of more care being taken in the production chain." As I agreed in my first comment, it makes sense to use the higher bit depth and sample rate during recording, when dealing with unpredictable level changes and mixing multiple tracks -- but there is no need for the higher bit depth and sample rate in the data which is recorded on a CD or streamed. And beyond this, even the top-quality lossy data compression methods have been shown -- through listening tests and by subtractive comparison with the uncompressed signal -- to produce changes which are completely inaudible.

As to your preference for open-source compression over MP3, doesn't the LAME encoder (which stands for "LAME Ain't an MP3 Encoder") stand in the same relation to the Fraunhofer encoder as Linux does to proprietary Unix? I use LAME, the files are MP3-compatible, and nobody ever came after me for a royalty payment.

In reply to by clhermansen

Thanks for your comments, John.

The purpose of my article is not to defend the intrinsic properties of 96/24 vs 44.1/16 but really to suggest to people that they consider the higher resolution stuff because - in my experience anyway - it is often better-mastered, in a manner more faithful to the original, than stuff wedged onto CD or into MP3 or whatever.

Nevertheless, a few brief comments and then I promise not to say any more.

MP3 files: I don't want 'em. First because they use lossy compression, second because they are not an open standard. In another comment you will see that they may no longer be encumbered by patents. If that is the case, the issue would be moot, I guess; but I'm not smart enough to understand whether they are or are not free. And as to whether someone has chased you for royalty payments, that's not my issue. Some closed-source software companies have threatened or sued some open-source software companies for creating open-source software that contravenes patents held by the former. That limits my freedom as a would-be consumer of open-source and so whenever possible I try to use software that is not impacted by patents. Perhaps I am being quixotic; so be it.

"There is no distortion in proper digitization" - here you are wrong. There is discretization error in all digitization, which arises from representing more-or-less continuous data (analog) with a finite number of bits. The smaller the number of bits, the greater the discretization error. Dithering turns discretization error into noise, and noise is distortion. Here is a nice quote from the Wikipedia article https://en.wikipedia.org/wiki/Quantization_%28signal_processing%29

"The calculations [signal-to-noise quantization error] above, however, assume a completely filled input channel. If this is not the case - if the input signal is small - the relative quantization distortion can be very large. To circumvent this issue, analog compressors and expanders can be used, but these introduce large amounts of distortion as well, especially if the compressor does not match the expander. The application of such compressors and expanders is also known as companding"

Note the mention of "if the input channel is not full".
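For anyone who wants to see the effect the quote describes, here is a small Python experiment. It quantizes a sine tone at a chosen level with plain rounding and no dither, then reports the ratio of signal power to quantization-error power. Everything here (the function names, the 997Hz test tone, one second of signal) is an assumption made for the sketch, and it says nothing about what is audible.

```python
import math

def quantize(x, bits):
    """Round x (nominally within -1.0..1.0) to the nearest step of a signed word."""
    scale = 2 ** (bits - 1)
    return round(x * scale) / scale

def snr_db(bits, level_db, tone_hz=997.0, fs=44_100.0, n=44_100):
    """Signal-to-quantization-error ratio for an undithered sine at level_db."""
    amplitude = 10 ** (level_db / 20)
    signal_power = 0.0
    error_power = 0.0
    for i in range(n):
        x = amplitude * math.sin(2 * math.pi * tone_hz * i / fs)
        err = quantize(x, bits) - x
        signal_power += x * x
        error_power += err * err
    return 10 * math.log10(signal_power / error_power)

for level_db in (-25, -40):
    print(f"{level_db} dB tone: 16-bit {snr_db(16, level_db):.1f} dB, "
          f"24-bit {snr_db(24, level_db):.1f} dB")
```

The ratio drops as the tone gets quieter, and the 24-bit figures stay roughly 48dB ahead of the 16-bit ones, which is the 8 extra bits at about 6dB each.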

"steep digital filtering can reduce the sample rate in the transmitted or stored recording to as little as 44.1 k without audible effect" and "the filtering is cheap: it is the data storage or streaming which is expensive" and "the higher data rate is wasteful of bandwidth" and "there is no need for the higher bit depth and sample rate in the data which is recorded on a CD or streamed" - well, what can I say; your priorities are to quantize at a high rate and then apply a FIR filter or similar to save storage and/or facilitate streaming. Mine aren't; mine are to acquire my music in a format that is as close to the original as possible - just don't bother to apply the FIR filter and give me the original, at 24 bits as well, please. I really don't want to buy music at 44.1/16 or 256kbps MP3 if they are available in 96/24. If I need those formats for some reason, I can always downsample myself.

My life experience with music is that there has always been someone making beautiful music, and there has almost always been someone getting between that beautiful music and my appreciation of it by compromising its fidelity in order to package it - high-speed cassette duplication; 8 track tapes; 10th generation masters used to make poor-quality LPs that use lots of recycled waste vinyl; carelessly converted analog music put hastily on CDs; music compressed to within an inch of its life to make it sound louder on AM radio; music converted to 128kbps AAC files so that it will fit on a device I don't own and don't want to own; and so on and so on. Now that someone is willing to give me something very close to the master, I want to buy it and hear it in all its glory. I want to support that market place. I want Pono and HDTracks and ProStudioMasters to succeed (though I wish they would have a more open downloading process) because they are generally concerned with making high-quality music available. If part of their chain is 96/24 or 192/24 that's fine with me.

My concern with my music is to listen to it and love it, not to worry about how much disk space it takes up. People who tell me I must prefer music that fits on a CD or can be streamed in a download over a 96/24 remastering that hasn't been squeezed and squashed...

In reply to by John S. Allen (not verified)

Interesting because a few days ago I went through a Xiph article from 2011, titled “24/192 Music Downloads ...and why they make no sense”: http://people.xiph.org/~xiphmont/demo/neil-young.html

In addition, you don't speak about the equipment you're using to listen to these FLAC files. I believe it requires quite specific equipment (and great ears) to be able to listen to those files in the proper way. Is there an upcoming article about this?

Pierre, thanks for the comment.

I'm going to talk about my equipment experiences in the next two articles - first about laptops, then about "the home stereo".

The Xiph article you mention takes the perspective that "our ears don't need this higher resolution stuff". My main point is that the higher resolution stuff is often indicative of a production chain that is more oriented to producing a high quality result. My secondary point is that the music I want to listen to is often recorded at this higher resolution, so why would I want someone to throw it out before selling it to me?

Finally, the point in the Xiph article about poor equipment performing badly in the presence of higher resolution signals has a great solution - don't buy poor equipment!

In reply to by Pierre E. (not verified)

Thanks for your reply, Chris! Looking forward to reading your next articles.

In reply to by clhermansen

Surprisingly, you don't need high end audio equipment to hear a difference. I recently installed an aux input in my 10 year old car. The car did come with a "premium" audio system, but all that means is that there is an extra amplifier and Infinity speakers instead of non-amplified regular speakers.

There is a very noticeable difference between 256k MP3s and FLACs of the same music. So much so, in fact, that I spent days re-ripping my CD collection in FLAC. I am not one to spend thousands of dollars on audio equipment, but I do know that I will be using FLAC wherever I can.

In reply to by Pierre E. (not verified)

Thanks for the comment, Josh.

One thing about equipment is you can buy it and sell it, trade up if you have the inclination and budget. In theory better equipment should make all your music sound better (though some music is so poorly recorded I think it doesn't count). Hard to trade up MP3 downloads when you decide you want something better. I commend you for re-ripping your CDs as FLACs; I hope you enjoy your "new music"!

In reply to by Josh (not verified)

Thanks for the comment, Seth!

I am on your side on the "let ears be the judge" but for sure we could start an argument with that kind of statement!

As to converting people to an audiophile worldview, better people than I have tried! But I would hope that in a forum founded on the concept of "open", at least some readers will be open-minded enough to give better sound quality a chance...

With respect to your vinyl rips, what did you do? A standalone USB-phono pre-amplifier, or connect your sound card to your stereo tape out, or...? What software?

Thanks again!

Great article. Really liked the music you used as examples too.

24/192 Music Downloads ...and why they make no sense:
https://people.xiph.org/~xiphmont/demo/neil-young.html

Thanks for the comment, Erik.

Others have mentioned this above, and in fact I read the article some time ago and re-read it before writing mine.

I'm not really interested in trying to argue against that article, except in respect to the comment at the end about equipment being unable to handle higher resolution music, to which I respond "then get better equipment".

What I am advocating is 1) get the music in the best form you can, which might be by ripping a CD or an LP, and 2) consider higher-resolution downloads because in my experience they often speak to a better production chain that is more faithful to the original than the mass-market MP3 or AAC outcomes, or even CDs. Of course this isn't always true. It's just worth considering.

In reply to by Erik (not verified)

There is merit, especially during recording, to the view that 24 bit audio allows more "headroom" to compensate for not getting the volume just right. This should no longer be necessary once music has been mastered, but you can always argue that there's no guarantee that the master is perfect.

However, the idea that the digital filters used in recording audio (finite impulse response filters) can cause phase deviation is incorrect. Analog low pass filters can do this, but FIR filters do not do this, no matter how close to a "brick wall" you manage to make them. Tests with oscilloscopes prove this. There is no relation between sample rate and what kind of analog filter you might be using. Some early CD players had "brick wall" analog filters which caused distortion due to phase deviation. This was resolved with the use of digital filters and gentler analog filters.

I noticed also in "The Well Tempered Computer" article that they claim that 44.1 kHz audio is upsampled to 48 kHz through "interpolation." However, that is a mischaracterization. Interpolation implies that the calculation you make of where the sample would be is only an estimate. Sound waves are predictable, so the calculation of where it will be between any two known samples is not an estimate; it's actual. I suppose that theoretically, you couldn't be sure of which sample a sound wave ended with, but as long as it's within the Nyquist frequency, it's not going to make any difference anyway.

CFWhitman, thanks for the comment.

A digital filter, even a good one such as a FIR filter, can only be applied once the signal is digitized.

And a signal that contains content above 1/2 the discretization rate will have aliasing present. Hence the need to apply an analog filter prior to discretization to remove (or at least drastically reduce) analog content above the Nyquist frequency.

To be specific, if you have an analog signal that contains energy above 20kHz (such as is generated by cymbals), you MUST apply an analog filter that reduces or eliminates that energy BEFORE you convert it to digital at say 44.1kHz. You cannot use a FIR filter for this as FIR filters are digital.
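The aliasing problem can be shown with a few lines of Python. In this sketch, a 25kHz tone sampled at 44.1kHz with no filtering produces exactly the same sample values (apart from a sign flip) as a 19.1kHz tone, so once sampled the two cannot be told apart. The tone frequency and the helper names are chosen just for illustration.

```python
import math

FS = 44_100.0  # CD sample rate

def sample_tone(freq_hz, n_samples, fs=FS):
    """Sample a sine of the given frequency at rate fs, with no filtering."""
    return [math.sin(2 * math.pi * freq_hz * n / fs) for n in range(n_samples)]

def alias_frequency(freq_hz, fs=FS):
    """Frequency between 0 and fs/2 that freq_hz lands on after sampling."""
    folded = freq_hz % fs
    return fs - folded if folded > fs / 2 else folded

ultrasonic = sample_tone(25_000.0, 64)
alias_hz = alias_frequency(25_000.0)
audible = sample_tone(alias_hz, 64)

# The alias comes out phase-inverted, so the two sequences sum to zero.
max_diff = max(abs(u + a) for u, a in zip(ultrasonic, audible))
print(f"25000 Hz sampled at {FS:.0f} Hz aliases to {alias_hz:.0f} Hz "
      f"(max sample difference {max_diff:.2e})")
```

This is why the energy above half the sample rate has to be removed before the converter, not after.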

With respect, you are incorrect in your interpretation of 44.1 to 48kHz conversion. The only known values are at the sample points. Any attempt to determine an intermediate value - between sample points - is interpolation, and that is what you need to do to convert a signal from one sample rate to a higher one.

Consider this: with 44.1kHz sampling, a sample is collected every 0.0000226757... seconds. That is, we have samples collected at:

1/44100 = 0.00002267573696145124 seconds
2/44100 = 0.00004535147392290249 seconds
3/44100 = 0.00006802721088435374 seconds
4/44100 = 0.00009070294784580498 seconds
5/44100 = 0.00011337868480725623 seconds

and so on.

In order to provide a 48kHz signal, we need to provide values at

1/48000 = 0.00002083333333333333 seconds
2/48000 = 0.00004166666666666666 seconds
3/48000 = 0.00006250000000000000 seconds
4/48000 = 0.00008333333333333333 seconds
5/48000 = 0.00010416666666666666 seconds

The only way to generate those values is to interpolate between the 44.1kHz samples. For instance, we could use linear interpolation. In that case, the value at the 2/48000 "sample" point would be

S(2/48000) = S(1/44100) + (2/48000 - 1/44100) * (S(2/44100) - S(1/44100)) / (2/44100 - 1/44100)

where S(2/44100) represents the sample signal taken at t = 2/44100 and S(1/44100) represents the sample taken at t = 1/44100.

Of course other interpolation schemes could be used.
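The linear scheme can be sketched in Python as follows. Each output value is the earlier of the two neighbouring source samples plus the matching fraction of the difference to the later one. This is a bare-bones illustration, not how a production resampler works; the function name is hypothetical, it needs at least two input samples, and unlike the listing above it places the first sample at t = 0.

```python
def linear_resample(samples, src_rate, dst_rate):
    """Resample by drawing straight lines between neighbouring source samples.

    Index 0 is taken to sit at t = 0 (an assumption of this sketch).
    """
    last = len(samples) - 1
    out = []
    k = 0
    while True:
        pos = k * src_rate / dst_rate  # output sample k, in source-sample units
        if pos > last:
            break
        i = min(int(pos), last - 1)    # index of the earlier neighbour
        frac = pos - i                 # how far along the line to the later one
        out.append(samples[i] + frac * (samples[i + 1] - samples[i]))
        k += 1
    return out

source = [0.0, 0.2, 0.5, 0.1]          # made-up sample values
print(linear_resample(source, 44_100, 48_000))
```

The third output value sits at 2/48000 seconds, which falls 83.75% of the way between the source samples at 1/44100 and 2/44100 seconds.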

In reply to by CFWhitman

As to sample rates, recording has been done at high sample rates for a long time. My comments were only relevant to the playback. Once the recording has been digitized, you can safely apply a digital filter and master it at a much lower sample rate with no loss of relevant data.

As to interpolation, your comments are irrelevant (except that linear interpolation would never be used; that would be a guess, and a bad one). As long as you have enough sample points to accurately recreate a sound wave, you can accurately predict any sample point along the sound wave.

As I mentioned in a comment on the last article, it's similar in principle to predicting points along a straight line. Once you have the endpoints of a straight line, you can predict any point in between with relatively simple math. With sound waves, once you have enough samples to accurately describe the sound wave, you can predict any point in between two samples with perfect accuracy, though the math is not so simple.
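For readers who want to try the "not so simple" math, the sketch below compares linear interpolation with a truncated Whittaker-Shannon (sinc) sum for a 10kHz tone sampled at 44.1kHz. It assumes the signal is already band-limited below half the sample rate, and the finite window means the sinc result is close but not exact. All names and parameters are chosen for this illustration, and it says nothing about content that was filtered out before sampling.

```python
import math

def sinc(u):
    """Normalized sinc: sin(pi*u) / (pi*u), with sinc(0) = 1."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

def linear_interp(samples, pos):
    """Value at fractional index pos on the line between its two neighbours."""
    i = int(pos)
    return samples[i] + (pos - i) * (samples[i + 1] - samples[i])

def sinc_interp(samples, pos):
    """Truncated Whittaker-Shannon sum over the samples that are available."""
    return sum(s * sinc(pos - n) for n, s in enumerate(samples))

FS = 44_100.0
TONE_HZ = 10_000.0
samples = [math.sin(2 * math.pi * TONE_HZ * n / FS) for n in range(2001)]

pos = 1000.5  # halfway between two samples, in the middle of the window
actual = math.sin(2 * math.pi * TONE_HZ * pos / FS)
linear_error = abs(linear_interp(samples, pos) - actual)
sinc_error = abs(sinc_interp(samples, pos) - actual)
print(f"linear error {linear_error:.4f}, sinc error {sinc_error:.6f}")
```

For this band-limited test tone the sinc sum lands much closer to the underlying sine than the straight-line estimate does.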

In reply to by clhermansen

With respect, my comments on interpolation are not irrelevant. "Interpolation" is the correct mathematical term for determining unknown values that exist between known ones. Your use of the term "predict" as in "you can accurately predict any sample point along the sound wave" or "it's similar in principle to predicting points along a straight line" or "you can predict any point in between two samples with perfect accuracy" is in fact precisely "interpolation".

Moreover, choosing the order of interpolation - linear, quadratic, cubic, etc - each of which requires more points to solve for the interpolation coefficients, whether piecewise polynomials or other, does not add accuracy to the prediction of the values at the interpolation points. In fact, the choice of the shape of the interpolating functions is entirely aesthetic.

When you applied a low-pass filter to band-limit the original signal, and then sampled that band-limited signal, you threw away the actual value that originally occurred at the point of interpolation, so you have no way to state the accuracy of your result to the original signal. All you know is that your measurement of the filtered signal at the sample points is exact. When you predict a point between two samples, all you are doing is guessing. If you are lucky, your guesses will be euphonic. They won't be accurate.

With respect, I suggest that you study the matter of interpolation further, perhaps at https://en.wikipedia.org/wiki/Interpolation; or if you prefer I still have a collection of my numerical analysis textbooks and can provide you with chapter and verse.

In reply to by CFWhitman

From a mathematical standpoint, what you are doing when you fill in this data set is interpolation. However, from an engineering standpoint, it is a mischaracterization (note that this is the same word I used before). That is, when the word "interpolation" is used in mathematics it means calculating missing data in a set; it's pure mathematics, and who is to say that the data is inaccurate? From an engineering standpoint for filling in samples in a set, "interpolation" generally means guessing or estimating with a mathematical formula points which cannot be accurately predicted. The word "interpolation" is not generally used in engineering for calculating data that can be accurately predicted with mathematics.

You don't have to guess missing points in a sound wave. Calculations of those points are not guesses; they are exact figures. Sound waves can only have one shape per frequency. It does not vary. You can use trigonometry to predict all points between the known samples. If sound waves could vary in shape randomly between two known points, the Nyquist frequency would not even begin to be high enough to accurately record them. You would need to sample them much, much faster to even start to accurately represent them. Basically, digital recording as it exists today would simply not work.

This is not to say that there can't be errors in this process, but that is what oversampling is for. Oversampling does a better job of eliminating or compensating for errors than increasing sampling frequency does, with none of the drawbacks that increasing sampling frequency can have.

In reply to clhermansen

CFWhitman, perhaps we are arguing overmuch here, but you are incorrect in your statements.

First of all, you seem to feel that using "trigonometry to predict all points between the known samples" is somehow different than interpolation. In fact it is exactly and precisely linear interpolation. And finding a point along the line joining two other points isn't "accurate", it's "precise" at best. Accuracy would require knowing the value that actually occurred at that point in time, and we have already thrown it away by filtering and sampling. So we will never be accurate.

Second, "If sound waves could vary in shape randomly between two known points, the Nyquist frequency would not even begin to be high enough to accurately record them" - this is precisely the problem! Sound waves DO vary between two known sample points, and not necessarily in a linear pattern. When filtering is applied, we are eliminating those variations. The Nyquist frequency is only high enough to sample the filtered music, not necessarily the original.

Think about providing a recording of percussion instruments at 44.1kHz or 48kHz. If you check out this article http://www.drummerworld.com/forums/showthread.php?t=66957 you will see that cymbals have spectral content out to 40kHz or even higher. So the only way to fit that into a 44.1 or 48kHz signal is to apply a low-pass filter. Now think about doing so at 44.1kHz, with a filter that rolls off at say 20kHz. You have eliminated all the spectral content - signal - above 20kHz. If you then try to produce a 48kHz signal from your 44.1kHz signal, you can easily see that the predicted values at the new "sample" points lack the necessary spectral information that was thrown away in the filtering and therefore will only ever agree with the original signal by luck.

You can turn around and say "bah but this is all above the human range of hearing", but that's not the point. The point is that predicting the values between sample points is never accurate; it's an approximation to the original wave form and we never know how good it is going to be.

Third, your comment "You would need to sample them much, much faster to even start to accurately represent them" is precisely the point - you DO need to sample the music at a much higher rate than 44.1kHz to get an accurate APPROXIMATION to the music. Again, whether you or I or my dog can hear that is beside the point.

Oversampling has nothing to do with the matter of computing a decent approximation to signal values between sample points.

In reply to CFWhitman

I don't want to go on forever on this, but I will try to make it more clear.

Regarding interpolation, I don't want to argue about semantics. I want to be clear on what's happening. Technically, any time you calculate data you didn't collect, that's interpolation from a mathematical point of view, just like technically all squares are rectangles even though referring to a square as a rectangle out of context would be misleading.

"In fact it is exactly and precisely linear interpolation."

No, that's not correct. A sine wave is not calculated linearly. There is more to trigonometry than linear interpolation. Sound waves are predictable. Missing data can be calculated accurately as long as you have enough data to accurately describe the wave.
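A small numerical experiment may help separate the two meanings of "interpolation" in this exchange. The sketch below is an illustration only, not something either commenter posted. It samples a 10 kHz sine at 44.1 kHz and then estimates the values halfway between samples twice: once with straight-line (linear) interpolation, and once with the Whittaker-Shannon (sinc) formula that the sampling theorem is built on. The tone frequency, the sample count, and all function names are assumptions chosen for the demo, and the sinc sum is truncated to a finite window, so its result is close rather than exact.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def sinc_interpolate(samples, fs, t):
    """Whittaker-Shannon estimate at time t (truncated to the samples we have)."""
    return sum(s * sinc(fs * t - n) for n, s in enumerate(samples))

def linear_interpolate(samples, fs, t):
    """Straight line between the two samples that surround time t."""
    pos = fs * t
    i = int(math.floor(pos))
    frac = pos - i
    return samples[i] * (1.0 - frac) + samples[i + 1] * frac

FS = 44100.0       # sample rate in Hz
TONE_HZ = 10000.0  # hypothetical test tone, well below FS / 2
N = 4001           # number of samples (assumption; more samples = less truncation error)

samples = [math.sin(2 * math.pi * TONE_HZ * n / FS) for n in range(N)]

worst_sinc = 0.0
worst_linear = 0.0
middle = N // 2
for k in range(middle, middle + 50):
    t = (k + 0.5) / FS  # halfway between sample k and sample k + 1
    truth = math.sin(2 * math.pi * TONE_HZ * t)
    worst_sinc = max(worst_sinc, abs(sinc_interpolate(samples, FS, t) - truth))
    worst_linear = max(worst_linear, abs(linear_interpolate(samples, FS, t) - truth))

print(f"worst sinc error:   {worst_sinc:.6f}")
print(f"worst linear error: {worst_linear:.6f}")
```

On this band-limited test tone the sinc estimate stays within a small fraction of a percent of the true value, while the linear estimate is off by a sizeable part of the signal's amplitude. The experiment says nothing about content that was filtered out before sampling, which is the other half of the disagreement.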

"You can turn around and say 'bah but this is all above the human range of hearing,' but that's not the point. The point is that predicting the values between sample points is never accurate; it's an approximation to the original wave form and we never know how good it is going to be."

You just threw out modern digital recording theory, which has been in application for over thirty years. Sound waves above the range that we are trying to capture are irrelevant to the discussion. Why bring them in? You can capture them if you want to, but you need to go up to the Nyquist rate for those frequencies to do so (incidentally, in my last post I should have said Nyquist rate, not frequency, which is related but not really the term I meant). If you don't, then you can't even begin to calculate their waveform. For practical purposes the parts of a cymbal clash that you can't hear are different sound waves than the ones you are capturing.

Perhaps you should look further into what oversampling accomplishes.

I will sum it up. If you are correct, then digital recording theory, which has been in application for over thirty years, is complete rubbish. The Nyquist rate is meaningless, and people who claim that you need 192kHz files to get accurate sound are probably underestimating. If, on the other hand, digital recording theory is correct, then you only need as many sampling points as the Nyquist rate gives you to accurately predict all the ones in between.

In reply to clhermansen

CFWhitman, I would never throw out modern digital recording theory. You keep trying to "put words in my mouth" that I never uttered. I have no reason to criticize the Nyquist Shannon theorem or anything related to sampling signals.

That theorem tells us at what rate we need to sample a bandwidth-limited signal in order to be able to reconstruct the same bandwidth-limited signal without error. But when we first filter the input signal to limit its bandwidth, we are throwing away any signal higher than the filter frequency, so that we don't end up with aliasing in the result. Thus the reconstructed signal may in fact be identical (or nearly so given real-world limitations and word length) to the band-limited signal that was sampled.
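To make the aliasing point concrete, here is a minimal sketch, assuming a pure 30 kHz tone that reaches a 44.1 kHz sampler with no low-pass filter in front of it. The function name and the test frequencies are hypothetical choices for illustration. It computes where the tone "folds" to and then checks that the sampled values really are indistinguishable from those of the lower-frequency tone.

```python
import math

def alias_frequency(tone_hz, sample_rate_hz):
    """Apparent frequency (0..fs/2) of an unfiltered tone after sampling."""
    nearest_multiple = round(tone_hz / sample_rate_hz) * sample_rate_hz
    return abs(tone_hz - nearest_multiple)

FS = 44100.0             # sample rate in Hz
ULTRASONIC_HZ = 30000.0  # hypothetical tone above FS / 2

alias_hz = alias_frequency(ULTRASONIC_HZ, FS)

# Cosines are used so the two sample sequences match in sign as well as size
# (with sines the alias would appear inverted).
max_diff = max(
    abs(math.cos(2 * math.pi * ULTRASONIC_HZ * n / FS)
        - math.cos(2 * math.pi * alias_hz * n / FS))
    for n in range(1000)
)

print(f"a {ULTRASONIC_HZ:.0f} Hz tone sampled at {FS:.0f} Hz looks like {alias_hz:.0f} Hz")
print(f"largest difference between the two sample sequences: {max_diff:.2e}")
```

Once sampled, the two tones cannot be told apart, which is why the filtering has to happen before sampling rather than after.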

However if we digitize the original signal at say 44.1kHz and 88.2kHz, there is a bunch of spectral information in the 88.2 signal that is not present in the 44.1 signal. Therefore when we try to convert the 44.1 signal into an 88.2 signal, we don't get the same result (there is no way of "getting back" the info that was eliminated by the filtering).

Put another way, if we were able to arrange the two digitizations so that the 44.1 samples fell exactly on every other 88.2 sample, there is no way we could determine the other half of the 88.2 samples from the 44.1 samples. We could "guess" (interpolate) a value, using linear interpolation or using sinc functions or any other kind of reconstruction filter but in the end we threw out the info and there is no way to get it back.
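The scenario in the paragraph above can be sketched numerically. This is an illustration under loud assumptions: the "original" is a 5 kHz tone plus a weaker 30 kHz tone, the anti-alias filter before the 44.1 kHz capture is an idealized brick wall that removes the 30 kHz part completely and leaves the 5 kHz part untouched, and reconstruction uses a truncated sinc sum. All names and values are hypothetical. The code estimates the 88.2 kHz sample instants that fall between 44.1 kHz samples and compares the estimates with both the band-limited signal and the unfiltered original.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, fs, t):
    """Truncated Whittaker-Shannon reconstruction at time t."""
    return sum(s * sinc(fs * t - n) for n, s in enumerate(samples))

FS_LOW = 44100.0            # the 44.1 kHz capture
AUDIBLE_HZ = 5000.0         # survives the (idealized) low-pass filter
ULTRASONIC_HZ = 30000.0     # removed by the filter before 44.1 kHz sampling
ULTRASONIC_AMPLITUDE = 0.3  # assumption, exaggerated to make the effect visible
N = 4001

def band_limited(t):
    return math.sin(2 * math.pi * AUDIBLE_HZ * t)

def original(t):
    return band_limited(t) + ULTRASONIC_AMPLITUDE * math.sin(2 * math.pi * ULTRASONIC_HZ * t)

samples_44k = [band_limited(n / FS_LOW) for n in range(N)]

err_vs_band_limited = 0.0
err_vs_original = 0.0
middle = N // 2
for k in range(middle, middle + 50):
    t = (k + 0.5) / FS_LOW  # an 88.2 kHz instant between two 44.1 kHz samples
    estimate = reconstruct(samples_44k, FS_LOW, t)
    err_vs_band_limited = max(err_vs_band_limited, abs(estimate - band_limited(t)))
    err_vs_original = max(err_vs_original, abs(estimate - original(t)))

print(f"worst error vs band-limited signal: {err_vs_band_limited:.6f}")
print(f"worst error vs unfiltered original: {err_vs_original:.6f}")
```

Under these assumptions both commenters' claims show up in the numbers: the in-between points of the band-limited signal are recovered almost exactly, while the mismatch against the unfiltered original is roughly the size of the ultrasonic component that was filtered away.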

Are the differences between the two significant? I'm not going there, that's for Neil Young to argue.

Which reminds me! I'm not going to say any more about this because it's a looooong way from the point I have been trying to make all along - I prefer to have my music as close to the original as possible (and not downsampled to fit on a CD or into a size that is convenient for downloading over a 56kbps modem).

In reply to CFWhitman

I'm not putting words in your mouth. The point is that if the Nyquist Shannon theorem is correct, then the audible frequencies are exactly reproduced. Thus when you upsample what you've captured to any higher rate, you don't have to guess the sample points of the higher rate for the audible part of the signal (the part intended to be captured). The theorem says that you have captured that part of the signal perfectly, and thus can predict any point on the captured sound waves with perfect accuracy (that's theoretically, anyway). If you can't predict those points, then the theory is incorrect.

The signals above 20kHz (or above wherever the Nyquist rate you're using limited you to) are not accurately captured at all. You don't have enough sample points to even guess at the missing ones (it would be like trying to recreate a straight line with only one endpoint). However, there is no reason to care about those missing parts. It is still possible to upsample the audible part of a 44.1kHz capture to 48 kHz with perfect accuracy (no guessing involved). Nobody cares about the missing (inaudible) parts. They were filtered out. The article linked indicates that you can't do that without "guessing" the missing sample points because they are in between the ones you've captured. That's not correct. Of course any article that puts up a stair step soundwave graph for purposes other than mocking it wasn't written by someone who understands the Nyquist Shannon theorem in the first place.

In reply to clhermansen

I was reading a little bit about this and realized that some of what I said in my last post is out of date. It reflected the state of affairs when CDs were introduced 30 years ago. However, current digital recording equipment can record audio without ever passing it through an analog filter of any kind. Instead, the signal is digitized at a high sampling rate and passed through a digital filter before ever being saved at a more practical sampling rate. Apparently a lot of newer music is never even originally recorded at a higher rate than 48kHz (though 24 bit is of course used, and necessary, for the stages before mastering).

In reply to clhermansen

CFWhitman, I thank you again for your offered opinions with respect to 24 bit word length or higher-than-CD resolution being unnecessary for proper enjoyment of musical playback. I trust that other readers of this article can come to their own judgement using their own criteria for so doing.

Meanwhile, I will exercise this opportunity to remind you and others that my perspective on all of this is that 1) very dynamic music, of which I have several examples, may not be well-served with a 16 bit word length; 2) it's my experience that recordings released at higher resolutions and sampling rates may have been prepared in such a way as to maximize their musical value; and 3) I choose not to buy music in a format that was designed to be less consumptive of bandwidth and storage in the days of 56kbps modems and 20MB hard drives. If others don't share my perspective, so be it!
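For readers who want a number to go with the word-length point, the sketch below computes the textbook signal-to-quantization-noise figure for a full-scale sine wave with ideal, undithered uniform quantization (often quoted as roughly 6.02 dB per bit plus 1.76 dB). This is a theoretical ceiling used here for illustration, not a measurement of any recording or converter mentioned in this thread, and the function name is hypothetical.

```python
import math

def ideal_dynamic_range_db(bits):
    """Theoretical SQNR of a full-scale sine with ideal uniform quantization."""
    return 20.0 * math.log10(2 ** bits) + 10.0 * math.log10(1.5)

for bits in (16, 24):
    print(f"{bits}-bit: about {ideal_dynamic_range_db(bits):.1f} dB")
```

That works out to roughly 98 dB for 16 bits and 146 dB for 24 bits. Real-world equipment falls short of both figures, and whether the difference matters for a given recording is exactly the judgement call described above.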

In reply to CFWhitman

My point here had nothing to do with 16 bit vs 24 bit files. I only mentioned in passing that before mastering, you really need to use 24 bit files. After that it's debatable (note that I'm not saying you absolutely don't).

In reply to clhermansen

FYI: MP3 is almost certainly patent free at this point. The last US patents that were filed before the MP3 specification was published have expired.
https://plus.google.com/116809495975386153151/posts/4KJ1hwUT8Gh

Good to know, Josh. I read your disclaimer on G+ and I'm sure you won't mind me taking this with a grain of salt!

In reply to Joshua J Cogliati (not verified)

An enjoyable and instructive article - as is the follow-up debate.

Thanks to everyone who took the trouble.

There's one paragraph I don't understand

"It's easy to tell if your music was upsampled from lower resolution sources - the spectrogram shows you."

So I took one of the latest Bowers & Wilkins Society-of-Sound downloads.

Audacity says this is an 88200Hz Stereo 32-bit float recording.

But the Audacity Spectrogram dives steeply beyond 20kHz and is -90dB down at about 22kHz.

Therefore, how does one deduce sampling & bit rate from the spectrogram (which in this case I would have said indicated 16 bits, 44.1kHz)?

Cordially

I might have been too glib there! Sorry. Let me try to be a bit more explicit.

First of all, my preferred Linux music player is Guayadeque and it tells me the bit rate of the music that is playing. So right now I'm listening to Afro Celt Sound System's Whirly Reel from their first album, ripped from a CD that I own. Guayadeque is showing me a bit rate of ~800kbps. Now I'm listening to Ali Farka Touré and Toumani Diabaté's Kenouna, a free MP3 download from a while ago. Guayadeque is showing me a bit rate of 256kbps. And now I'm listening to Beaten by Them's Damp Sky 1, a 96/24 download of their album Invisible Origins acquired from Linn Records. Guayadeque is showing me a bit rate of 2700kbps.

So I can learn about the word length and sample rate (combined) from Guayadeque. Interestingly, a while ago I bought a download of Ronn McFarlane's Indigo Road. This is supposedly 96/24, but it shows a bit rate of 1200kbps (half of what I would expect based on, for example, the Invisible Origins download I mentioned above). I contacted the vendor; they checked into it and informed me that it was actually 96/16! Hmmm. Well something's not right for sure. It could also be 44.1/24 as far as this test can tell.
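The bit-rate reasoning above can be sketched as simple arithmetic. The snippet assumes stereo files and assumes that the figure the player shows is the average bit rate of the compressed FLAC stream; both are assumptions, as are the function name and the table layout. It compares each observed bit rate with the uncompressed PCM bit rate (sample rate × word length × channels) for the candidate formats.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels=2):
    """Uncompressed PCM bit rate in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0

# (label, sample rate in Hz, word length in bits, observed bit rate in kbps)
observations = [
    ("CD rip, 44.1/16",           44100, 16,  800),
    ("Invisible Origins, 96/24",  96000, 24, 2700),
    ("Indigo Road if 96/24",      96000, 24, 1200),
    ("Indigo Road if 96/16",      96000, 16, 1200),
    ("Indigo Road if 44.1/24",    44100, 24, 1200),
]

for label, rate, bits, observed_kbps in observations:
    raw_kbps = pcm_bitrate_kbps(rate, bits)
    ratio = observed_kbps / raw_kbps
    print(f"{label:26s} raw {raw_kbps:7.1f} kbps, observed/raw = {ratio:.2f}")
```

The first two rows come out a little under 0.6, whereas 1200kbps against a true 96/24 stream would be a ratio of about 0.26, which is what makes it look suspicious. Compression ratios vary a good deal from one recording to another, so this is only a plausibility check and cannot say which of the alternative formats is the real one.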

Alright, what about the spectral thing? I like the "spek" utility http://spek.cc/ as its visualization of the music tells you a lot about what's out there - or not! As you say, there doesn't tend to be a lot above 20kHz in higher resolution stuff, but there is a bit.

I have a 96/24 download of Buena Vista Social Club's eponymous album. Looking at Chan Chan, there is signal that is clearly correlated with the music out to about 35kHz and then nothing above that. Mind you, anything above 20kHz is pretty much 120dB and downward, but when there is a gap in the higher-level stuff, there is a matching gap in the high-frequency, lower-level stuff.

I have a 96/24 download of Counting Crows' August and Everything After. Looking at Mr. Jones, there is a fair bit of signal at -60 to -70dB right up to 20kHz and lower-level overtones well beyond that, up to 30kHz. There's also a pretty constant noise source at about -100dB between 40 and 48kHz, which I bet is tape bias or something similar.

A very dynamic jazz album by Darcy James' Secret Society, Brooklyn Babylon, at 44.1/16 shows clearly that the overtones are chopped off abruptly just below 22kHz. Audible? Not going to argue about that, sorry :-)
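The idea behind reading a spectrogram this way can be sketched with synthetic data. The code below is a toy, not how spek or Audacity work internally: it builds two hypothetical 88.2kHz signals, one with nothing above 15kHz (as an upsampled CD-rate source might look) and one with a faint 30kHz component, then measures the level at a set of probe frequencies with a single-frequency DFT and reports the highest probe that rises above a chosen floor. The tone frequencies, amplitudes, the -90dB floor, and all names are assumptions for illustration.

```python
import cmath
import math

def tone_level_db(samples, fs, freq_hz):
    """Level (dB relative to amplitude 1.0) of the component at freq_hz."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * i / fs)
              for i, x in enumerate(samples))
    amplitude = 2.0 * abs(acc) / n
    return 20.0 * math.log10(max(amplitude, 1e-12))  # clamp to avoid log(0)

def highest_active_frequency(samples, fs, probe_freqs_hz, floor_db=-90.0):
    """Highest probe frequency whose level is above floor_db, or None."""
    active = [f for f in probe_freqs_hz if tone_level_db(samples, fs, f) > floor_db]
    return max(active) if active else None

FS = 88200.0
N = 882  # 10 ms of audio, so every multiple of 100 Hz falls exactly on a DFT bin

def make_signal(components):
    """Sum of sines from a list of (frequency in Hz, amplitude) pairs."""
    return [sum(a * math.sin(2 * math.pi * f * n / FS) for f, a in components)
            for n in range(N)]

upsampled_like = make_signal([(5000.0, 0.5), (15000.0, 0.1)])
native_like = make_signal([(5000.0, 0.5), (15000.0, 0.1), (30000.0, 0.001)])

probes = [1000.0 * k for k in range(1, 45)]  # 1 kHz to 44 kHz

print("upsampled-like:", highest_active_frequency(upsampled_like, FS, probes))
print("native-like:   ", highest_active_frequency(native_like, FS, probes))
```

A file whose container says 88.2 or 96kHz but whose content stops dead near 20 to 22kHz is a hint, not proof, that it started life at a CD-like rate; real music also needs averaging over many time windows, which is what a proper spectrogram tool provides.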

So give these two pieces of wonderful software a spin and see what you think!

In reply to rutherfordpaul (not verified)

I am re-encoding my favourite movies from Blu-rays to MKV files, and I choose to store the audio in FLAC format.
It's lossless, multichannel, and works with all the videoplayers I tested.

That sounds pretty cool, Danny! I guess when you say "videoplayers" you mean software videoplayers like VLC, and not the USB port on your home theatre receiver?

Thanks for the comment!

In reply to Danny3 (not verified)

Well, I'm convinced. Can you please suggest some places to buy FLAC format audio? I'm never buying another MP3 again.

Thanks!

Joseph, thanks for the comment!

I live in Canada and our choices are somewhat more restricted than in the UK or US (for example). But here are some sites where I regularly buy FLAC music:

https://bandcamp.com/
https://bleep.com/ (sometimes 24bit WAV too)
http://www.linnrecords.com/
http://www.gimell.com/
https://ca.7digital.com/ (or your country's site; getting to be more FLAC here)

There are lots of others too; and you should look back at the comments to this article as various readers make suggestions.

Another thing to do, when you are interested in a specific artist, is to look at their official or fan sites, or sometimes their label sites to see if they suggest a source for FLAC. This can take a bit of detective work...

There are also some sites that sell in FLAC format but require a Windows or OS X downloader. The good news is their catalogues are quite decent, but the bad news is you have to mess with Wine or borrow someone else's computer, at least to do the download.

https://www.hdtracks.com/
http://store.acousticsounds.com/superhirez
http://www.prostudiomasters.com/ (Canadian site)

Don't forget that sometimes a great way to get FLAC at a reasonable price is to buy the CD and rip it! CDs are often incredibly inexpensive these days... and also you have a backup, at least until the dog gets ahold of it...

And finally - don't say "never" to MP3. Sometimes there is no other obvious way to get a piece of music except in lossy format. So if you must...

Good listening!

In reply to Joseph S (not verified)

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.