First, the CD format was becoming the de facto standard for high-quality audio. Transmitting a mere four minutes of CD-quality audio would have taken almost ten hours at the then-typical modem speed of 9.6 kb/s, which made transmitting (or storing) a library of high-quality audio nearly impossible. Second, digital signal processing (DSP) was advancing in leaps and bounds, CPUs were rapidly becoming faster, memory was becoming affordable, and powerful portable devices were starting to show up in the marketplace. Third, a body of psychoacoustics research data was becoming more accessible, and it showed us that, due to the limits of human hearing, there was a lot of irrelevant information in the CD format and thus the potential for large data-rate savings.
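The "almost ten hours" figure can be checked with a short back-of-the-envelope calculation. The sketch below assumes the standard CD audio parameters (44.1 kHz sampling, 16 bits per sample, two channels), which are not stated in the text; the function names are illustrative only.

```python
# Assumed CD audio parameters (not given in the text): 44.1 kHz, 16-bit, stereo.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

# Modem speed quoted in the text: 9.6 kb/s.
MODEM_BPS = 9_600


def cd_bit_rate():
    """Raw (uncompressed) CD audio bit rate in bits per second."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS


def transmission_hours(audio_seconds, link_bps=MODEM_BPS):
    """Hours needed to send `audio_seconds` of raw CD audio over the link."""
    total_bits = cd_bit_rate() * audio_seconds
    return total_bits / link_bps / 3600


if __name__ == "__main__":
    print(f"CD bit rate: {cd_bit_rate()} b/s")
    print(f"Four minutes over 9.6 kb/s: {transmission_hours(4 * 60):.1f} hours")
    # Ratio between the raw CD rate and the modem rate, i.e. the compression
    # factor that real-time transmission over this link would have required.
    print(f"Rate ratio: {cd_bit_rate() / MODEM_BPS:.0f}:1")
```

Under these assumptions the raw rate is about 1.4 Mb/s, so four minutes of audio comes to roughly 9.8 hours of modem time, consistent with the figure quoted above.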
Audio coding is a field at the intersection of many disciplines that has flourished in the past 30 years by leveraging advances in research and technology. By exploiting the advances in DSP to represent audio signals in ever more compact and efficient ways, applying heuristic models to identify irrelevant components, and optimizing distortion-rate trade-offs, audio coding made transmission and storage of high-quality audio a reality and also radically changed our approach to audio. Few of us would have dared imagine the revolutionary impact that audio coding would have on the general consumption of digital media.
Fast-forward to today: fast broadband connections and large cloud storage capacity are widely available, we are starting to watch ultra-high-definition television, and we will shortly be communicating through 5G telephone networks. Do we still need to worry about compressing audio? I believe the answer is “yes!” Although bandwidth and storage are becoming abundant, we are demanding more audio channels, spatial control, customizability, immersive technology in smaller formats, and more ubiquity and availability from the audio we consume. Responding to these needs, work continues in many directions, including 3D sound; immersive 6 Degrees of Freedom (6DoF) audio; and more device-neutral, personalized approaches to how we represent and render audio.