Insight to Audio File Formats ~ The Pirado

I love listening to music, but I have always wondered why there are so many audio file formats. So let's discuss these formats, what differentiates them, and what to use them for.

PCM Audio: The Beginning of Digital Audio

Pulse-code modulation (PCM) was developed in 1937 by British engineer Alec Reeves. It creates a digital facsimile of an analog signal by measuring (sampling) the signal at regular intervals and rounding each measurement to the nearest of a fixed set of values. PCM streams have two basic properties that determine their fidelity to the original analog signal: the sampling rate, which is the number of times per second that samples are taken; and the bit depth, which determines the number of possible digital values that each sample can take.
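
To make those two properties concrete, here is a small Python sketch. The function names are my own and purely illustrative; all it does is count how many values a given bit depth allows and how many samples a given sampling rate produces.

```python
def quantization_levels(bit_depth: int) -> int:
    """Number of distinct values one sample can take at this bit depth."""
    return 2 ** bit_depth


def samples_taken(sample_rate_hz: int, seconds: float) -> int:
    """Number of samples captured per channel over the given duration."""
    return int(sample_rate_hz * seconds)


# 16 bits at 44,100 Hz is the familiar "CD quality" setting.
print(quantization_levels(16))   # 65536
print(samples_taken(44_100, 1))  # 44100
```

So every extra bit doubles the number of available values, and doubling the sampling rate doubles the number of samples to store.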

Real sound, in the real world, is continuous. In the digital world, it’s not. Somehow this is more confusing with audio than with video, so let’s look at video as a point of comparison. What we interpret to be “motion” or think of as “fluid” and constantly moving is, in actuality, a series of still pictures. In the same way, the amplitude of sound waves in a digital format isn’t “fluid” or constantly changing. It changes in fixed steps at predefined intervals.

In the figure above, a sine wave (red curve) is sampled and quantized for pulse-code modulation. The sine wave is sampled at regular intervals, shown as ticks on the x-axis. For each sample, one of the available values (ticks on the y-axis) is chosen by some algorithm. This produces a fully discrete representation of the input signal (shaded area) that can be easily encoded as digital data for storage or manipulation. For the sine wave example, we can verify that the quantized values at the sampling moments are 7, 9, 11, 12, 13, 14, 14, 15, 15, 15, 14, etc. Encoding these values as binary numbers would result in the following set of nibbles: 0111 (2^3*0 + 2^2*1 + 2^1*1 + 2^0*1 = 0 + 4 + 2 + 1 = 7), 1001, 1011, 1100, 1101, 1110, 1110, 1111, 1111, 1111, 1110, etc.

These digital values could then be further processed or analyzed by a purpose-built digital signal processor (DSP) or a general-purpose processor.

Several PCM streams could also be multiplexed into a larger aggregate data stream, generally for transmission of multiple streams over a single physical link. One technique is called time-division multiplexing (TDM), and is widely used, notably in the modern public telephone system. Another technique is called frequency-division multiplexing (FDM), where the signal is assigned a frequency in a spectrum and transmitted along with other signals inside that spectrum. Currently, TDM is much more widely used than FDM because of its natural compatibility with digital communication and its generally lower bandwidth requirements.
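
The nibble encoding above is easy to check in a few lines of Python. This is only a sketch: quantize and to_nibble are names I made up, the rounding rule is just one possible choice of "some algorithm", and the 16 samples per cycle is an arbitrary assumption.

```python
import math


def quantize(amplitude: float, bit_depth: int = 4) -> int:
    """Map an amplitude in [-1.0, 1.0] to the nearest of 2**bit_depth levels."""
    top = 2 ** bit_depth - 1              # highest level, 15 for 4 bits
    level = round((amplitude + 1.0) / 2.0 * top)
    return max(0, min(top, level))        # clamp out-of-range input


def to_nibble(value: int) -> str:
    """Encode a 4-bit value (0..15) as a binary string such as '0111'."""
    return format(value, "04b")


# The quantized values read off the sine wave example in the text.
figure_values = [7, 9, 11, 12, 13, 14, 14, 15, 15, 15, 14]
print([to_nibble(v) for v in figure_values])

# Sampling and quantizing one cycle of a sine wave of our own.
samples = [quantize(math.sin(2 * math.pi * n / 16)) for n in range(16)]
print(samples)
```

The first print reproduces the nibbles listed above, starting with 0111 for the value 7.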

There are many ways to implement a real device that performs this task. In real systems, such a device is commonly implemented on a single integrated circuit that lacks only the clock necessary for sampling, and is generally referred to as an ADC (Analog-to-Digital converter). These devices will produce on their output a binary representation of the input whenever they are triggered by a clock signal, which would then be read by a processor of some sort.

To produce output from the sampled data, the procedure of modulation is applied in reverse.
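
Going in reverse can be sketched the same way. The helper names here are hypothetical, and a real digital-to-analog converter would also smooth the stepped output, which this sketch does not attempt.

```python
def from_nibble(bits: str) -> int:
    """Decode a binary string such as '0111' back to its integer value."""
    return int(bits, 2)


def to_amplitude(value: int, bit_depth: int = 4) -> float:
    """Map a quantized level back to an amplitude in [-1.0, 1.0]."""
    top = 2 ** bit_depth - 1
    return value / top * 2.0 - 1.0


nibbles = ["0111", "1001", "1011", "1100"]
levels = [from_nibble(b) for b in nibbles]
print(levels)                                        # [7, 9, 11, 12]
print([round(to_amplitude(v), 3) for v in levels])   # amplitudes for playback
```

Note that the amplitudes you get back are the quantized ones, not the original analog values. Whatever was rounded away during quantization stays lost.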

Its advantage is that it is ready to be fed into a digital signal processor, and it is more or less universally playable. Most other formats manipulate audio via algorithms, so they need to be decoded while playing. PCM audio is considered “lossless,” but because it is uncompressed, it takes up a lot of hard drive space.

Uncompressed Audio Formats: AIFF, WAV

AIFF and WAV are the two main formats here. Both store lossless, uncompressed PCM, which means it takes about 10 MB to save a minute’s worth of CD-quality music. AIFF was developed by Apple for the Macintosh, and WAV by Microsoft and IBM for PCs, although both formats work on both platforms. WAV and AIFF are flexible file formats designed to store more or less any combination of sampling rates and bit depths. This makes them suitable file formats for storing and archiving an original recording. You may also come across .cda (Audio CD Track) files when browsing a music CD on Windows. A music CD holds uncompressed PCM audio, but the .cda entries themselves are only small pointers to the tracks on the disc.
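
Where does the 10 MB per minute figure come from? Here is a quick back-of-the-envelope sketch, assuming "CD quality" means 44,100 Hz, 16 bits and stereo. The function name is my own, and the small file header is ignored.

```python
def pcm_bytes(sample_rate_hz: int, bit_depth: int, channels: int, seconds: float) -> int:
    """Size in bytes of raw PCM audio, ignoring the small file header."""
    bytes_per_sample = bit_depth // 8
    return int(sample_rate_hz * bytes_per_sample * channels * seconds)


one_minute = pcm_bytes(44_100, 16, 2, 60)
print(one_minute)                          # 10584000
print(round(one_minute / 1_000_000, 1))    # 10.6
```

That is 44,100 samples a second, 2 bytes each, for 2 channels, over 60 seconds.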

The AIFF format is based on the IFF format. The WAV format is based on the RIFF file format, which is similar to the IFF format.
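
You can see the RIFF wrapper for yourself with Python's standard wave module. This sketch writes one second of silence into memory and then peeks at the first 12 bytes; the sample rate and the silent audio are arbitrary choices of mine.

```python
import io
import struct
import wave

# Write one second of 16-bit mono silence into an in-memory WAV file.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)          # 2 bytes = 16 bits per sample
    wav.setframerate(8000)
    wav.writeframes(b"\x00\x00" * 8000)

data = buffer.getvalue()

# A RIFF file opens with "RIFF", a little-endian size, then the form type.
chunk_id, riff_size, form_type = struct.unpack("<4sI4s", data[:12])
print(chunk_id, riff_size, form_type)
```

The chunk ID comes out as RIFF and the form type as WAVE, with the size field counting everything after the first 8 bytes.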

BWF (Broadcast Wave Format) is a standard audio format created by the European Broadcasting Union as a successor to WAV. BWF allows metadata to be stored in the file. This is the primary recording format used in many professional audio workstations in the television and film industry. BWF files include a standardized timestamp reference which allows for easy synchronization with a separate picture element. Stand-alone, file-based, multi-track recorders from Sound Devices, Zaxcom, HHB USA, Fostex, and Aaton all use BWF as their preferred format. If you’re recording at home for the purposes of mixing, an uncompressed format like these is what you want to use because it’s full quality.

Lossless Formats: FLAC, ALAC, APE

Lossless audio formats are becoming very popular these days. One of the main reasons is that, as the name suggests, they are lossless. In other words, they allow exactly the same audio to be played back as the original. The Free Lossless Audio Codec (FLAC), Apple Lossless Audio Codec (ALAC), and Monkey’s Audio (APE) are all formats which compress audio much in the same fashion that anything is compressed in the digital world: using algorithms. The difference between zipped files and FLAC files is that FLAC is designed specifically for audio, and so achieves better compression ratios without any loss of data. Typically, you’re seeing about half the size of WAVs. That is, a FLAC file for stereo audio at “CD quality” runs roughly 5 MB per minute.
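
What "lossless" means can be shown with a round trip. Python's standard library has no FLAC encoder, so this sketch uses zlib (the general-purpose compression behind zip files) purely as a stand-in. The test tone and its settings are my own assumptions, and the compression ratio you see says nothing about what FLAC would achieve on real music.

```python
import math
import struct
import zlib

# One second of a 440 Hz tone as 16-bit little-endian PCM (8000 Hz keeps it small).
rate = 8000
pcm = b"".join(
    struct.pack("<h", round(12000 * math.sin(2 * math.pi * 440 * n / rate)))
    for n in range(rate)
)

packed = zlib.compress(pcm, 9)
restored = zlib.decompress(packed)

print(len(pcm), len(packed))
print(restored == pcm)   # True: every byte comes back unchanged
```

The point is the last line. However much smaller the compressed data gets, decompressing it gives back the original PCM bit for bit.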

The advantage is that if you want to do audio manipulation, you can convert back to a WAV with no loss of quality. If you’re an audiophile and listen to a lot of music with a wide dynamic range, these formats are for you. If you’ve got a great set of speakers or earphones, these formats will bring out the tones to showcase them.

Lossy Formats: MP3, AAC, WMA, Vorbis

These are the most widely used formats these days; some degree of audio quality is sacrificed in exchange for a significant reduction in file size. An average “CD quality” MP3 runs about 1 MB per minute. Big difference compared to PCM, no? This is also called compression, but unlike with lossless formats, you can’t get that quality back once a lossy format has stripped it out. Different lossy formats use different algorithms to store data, and so they typically vary in file size for comparable quality. Lossy formats also use bitrate to refer to audio quality, which usually looks like “192 kbit/s” or “192 kbps.” Higher numbers mean that more data is being pumped out, so there’s more preservation of detail. Here are some details for the more popular formats.
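
Bitrate translates directly into file size. Here is a rough sketch, assuming a constant bitrate, 1 kbit = 1,000 bits, and no allowance for tags or headers; the function name is my own.

```python
def lossy_bytes(bitrate_kbps: int, seconds: float) -> int:
    """Approximate size in bytes of a constant-bitrate stream."""
    return int(bitrate_kbps * 1000 / 8 * seconds)


for kbps in (96, 128, 192):
    print(kbps, lossy_bytes(kbps, 60))
# 96 720000
# 128 960000
# 192 1440000
```

So a minute at 128 kbit/s is just under 1 MB, which is where the "about 1 MB per minute" rule of thumb comes from.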

MPEG-1 Audio Layer 3 (MP3) - MP3s are the most popular digital audio files, mainly because of how well they compress. MP3s are very small compared to WAV files and still maintain high-quality sound. There are many different bitrates, which affect the overall quality of the recording, but anything above 128 kbit/s is going to be pretty good quality.

AAC - This format is comparable to MP3, although some would say it offers better quality. You can get about the same quality sound out of a 96 kbit/s AAC as you can from a 128 kbit/s MP3. This allows AAC files to be a little smaller than MP3s and still sound just as good. AAC has recently become popular with the arrival of iTunes in the digital music market. iTunes music comes in the AAC format, which plays on Apple's iPod and on a growing number of other players and phones.

WMA - Short for Windows Media Audio, this is the file format commonly created by Windows Media Player. It is a compressed audio format that is well suited to streaming and downloading. WMA files are smaller than MP3s and have about the same sound quality. More and more audio players are able to play WMA files, but its use is not widespread.

Vorbis - A free and open-source lossy format, usually stored in an Ogg container (hence "Ogg Vorbis"), used mostly in PC games such as Unreal Tournament 3. FOSS fans, such as many Linux users, are bound to see plenty of this format.

Which is the best lossy audio format? It depends on what digital audio player you use, how much space you have, how big of a quality nitpicker you are, and a bunch of other variables. Nowadays, computers will play anything, most audio players (except Apple’s, of course) will do multiple lossy formats, and more and more do FLAC and APE. Apple sticks to MP3, ALAC, and AAC.

Other Audio File Types

Musical Instrument Digital Interface (.midi)
Musical Instrument Digital Interface (.midi) is commonly used for computer keyboards and other computer-based musical tools. MIDI files contain musical notes, rhythm notation and other information often needed by a composer.
Sun Audio (.au)
Sun Audio (.au) or Audio/Basic was developed by Sun Microsystems for use on UNIX systems.
Emblaze Audio (.ea)
Emblaze Audio (.ea) was created by Geo and offers compression similar to MP3, but its purpose is to be played with a Java applet, a miniature Internet program. Online greeting cards often use Java applets for motion and .ea sound files to play music.

What My Ears Want

Objectively, MP3 and WMA completely destroyed Ogg Vorbis in this test. But subjectively (which, of course, is how we judge music when we listen to it), the situation is different. Note that anything in this section can be and probably is biased.

I noticed that as bit rate decreases, MP3 adds distortion and kind of "warps" the sound, while Ogg Vorbis and WMA alter the tone. I'm sure you know what I mean by distortion in an MP3 -- kind of like your speakers are underwater. In Ogg and WMA files at low bit rates, the tone of the music shifts a little. They did not sound as rich as the original, but they also did not sound "worse" than the original in terms of musical, ear-pleasing quality. I can stand listening to 96 kbps Ogg or WMA files every day, because although they are not identical to the originals, they sound very similar. However, listening to 96 kbps MP3s is a pain, and they sound terrible.

MP3 at low bitrates, like 80 kbps, is unbearable. I can tell this apart from the original without having to listen to the original at all, and even when I'm just listening to music in the background while doing other things, I sometimes get annoyed because I can hear the massive artifacts in a 64 or 80 kbps MP3. I have to strain to even figure out what the singer is singing in some MP3s at 64 or 80 kbps. (Do note that older MP3 encoders may perform quite poorly compared to the fairly recent LAME encoder I used.) On the other hand, Ogg and WMA are not so bad. To my ears, they perform about equally well at low bit rates, though I think Ogg has an edge here. I could not hear any defects in a 47 kbps Ogg file, and could not even spot any differences from the original until I actually listened to them side by side. Again, this is because Ogg seems to shift the tone a little, and my brain does not remember tone perfectly. This type of distortion is undoubtedly preferable to the MP3-style "underwater" distortion.

The End

At high bitrates, the crown goes to WMA or MP3, especially if an audio track has vocals. At low bitrates, WMA still performs superbly, but Ogg Vorbis is a notable competitor, especially with the open-source advantage. There are dozens of other audio formats out there, including lossless formats, such as Monkey's Audio, which perfectly preserve data and will always sound identical to the original. MP3, WMA, and Ogg Vorbis are only three of the most popular formats, and each has its own advantages.

The best solution, of course, is to buy a bigger hard drive and use lossless compression.

The Pirado

22 January 2011