The situation is relatively fine when it comes to mainstream distros like Ubuntu, Debian, or Fedora because their developers have gone to great lengths to make audio work right out of the box, but the same can’t be said about Arch Linux, Gentoo, and other minimalistic distributions that expect users to configure everything from scratch.
This article won’t make you an expert on Linux audio, but it will, hopefully, explain the basic technologies responsible for making sound come out of your speakers when you open a video on YouTube or play a game on Steam.
Advanced Linux Sound Architecture (ALSA)
Let’s start with the most important layer of Linux audio, ALSA. Created in 1998 by Czech software developer Jaroslav Kysela, ALSA is responsible for giving a voice to all modern Linux distributions. It’s part of the Linux kernel itself, providing audio functionality to the rest of the system via an application programming interface (API) for sound card device drivers.
The original design of ALSA was largely inspired by the Linux device driver for the Gravis Ultrasound sound card, which was made by Canada-based Advanced Gravis Computer Technology and became very popular in the demo scene during the 1990s.
ALSA supports all types of audio interfaces thanks to fully modularized sound drivers. It can manage up to eight audio devices at the same time, access hardware MIDI functionality, perform hardware mixing of multiple channels, and more.
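Because ALSA lives in the kernel, it reports the sound cards it has detected through the /proc/asound/cards file. The following is a minimal Python sketch that parses this listing. The sample text is illustrative rather than captured from a real machine, and the names parse_cards, read_cards, and SAMPLE are made up for this example.

```python
import re
from pathlib import Path

# Each card starts with a line such as:
#  0 [PCH            ]: HDA-Intel - HDA Intel PCH
# The indented line that follows only repeats the long name and is skipped.
CARD_LINE = re.compile(r"^\s*(\d+)\s+\[(.+?)\s*\]:\s*(\S+)\s+-\s+(.*)$")


def parse_cards(text):
    """Return a list of (index, card_id, driver, name) tuples."""
    cards = []
    for line in text.splitlines():
        match = CARD_LINE.match(line)
        if match:
            index, card_id, driver, name = match.groups()
            cards.append((int(index), card_id, driver, name.strip()))
    return cards


def read_cards(path="/proc/asound/cards"):
    """Parse the live listing; only works on a Linux system with ALSA loaded."""
    return parse_cards(Path(path).read_text())


# Illustrative sample (an assumption, not real output from a specific machine).
SAMPLE = """\
 0 [PCH            ]: HDA-Intel - HDA Intel PCH
                      HDA Intel PCH at 0xf7f10000 irq 33
 1 [NVidia         ]: HDA-Intel - HDA NVidia
                      HDA NVidia at 0xf7080000 irq 17
"""

for card in parse_cards(SAMPLE):
    print(card)
```

On a machine with ALSA loaded, calling read_cards() should return the same kind of tuples for the real hardware.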
Users typically interact with ALSA using alsamixer, a text-based (ncurses) mixer program that can be used to configure sound settings and adjust the volume of individual channels. Alsamixer runs in the terminal, and you can invoke it just by typing its name. One particularly useful keyboard command is activated by hitting the M key. This command toggles channel muting, and it’s a fairly common answer to questions posted on Linux discussion boards.
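The same mute check can be scripted with amixer, the command-line companion to alsamixer. The sketch below is a rough Python illustration that looks for the [on] and [off] switches amixer prints for a control. The sample output is illustrative, the control name Master is an assumption (your card may name it differently), and the helper names are made up for this example.

```python
import re

# amixer prints one switch state per channel, for example "[on]" or "[off]".
SWITCH = re.compile(r"\[(on|off)\]")


def is_muted(amixer_output):
    """Return True when every channel in the output reports [off]."""
    states = SWITCH.findall(amixer_output)
    if not states:
        raise ValueError("no [on]/[off] switch found in amixer output")
    return all(state == "off" for state in states)


def toggle_mute_command(control="Master"):
    """Build the argv for toggling mute; 'Master' is an assumed control name."""
    return ["amixer", "set", control, "toggle"]


# Illustrative sample of what `amixer get Master` might print when muted.
SAMPLE_MUTED = """\
Simple mixer control 'Master',0
  Capabilities: pvolume pvolume-joined pswitch pswitch-joined
  Playback channels: Mono
  Limits: Playback 0 - 87
  Mono: Playback 87 [100%] [0.00dB] [off]
"""

print(is_muted(SAMPLE_MUTED))
print(" ".join(toggle_mute_command()))
```

To actually flip the switch, the command list could be passed to subprocess.run(toggle_mute_command(), check=True) on a system where amixer is installed.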
Open Sound System (OSS)
The official website of ALSA mentions support for Open Sound System, or OSS for short. Until Linux 2.5, OSS was the only sound system for Linux. ALSA was designed to overcome its various shortcomings, such as the fact that it didn’t allow more than one application to access the hardware at a time. In Linux 2.6, ALSA replaced OSS as the default sound system.
When the developers of OSS announced that OSS would move to a proprietary license, Linux developers quickly decided to replace it with ALSA. It’s worth noting that OSS became free software again with the release of version 4 in 2007. Today, OSS is distributed under four different licenses (BSD, CDDL, GPL, and proprietary).
Most Linux distributions these days don’t even bother activating the OSS emulation layer present in ALSA because almost nobody needs it anymore, making OSS a relic of the past.
PulseAudio
If you don’t remember the last time you interacted with ALSA when changing your audio settings, that’s probably because the user-facing layer of the Linux audio system in most modern distributions is called PulseAudio.
PulseAudio was initially released in 2004, and it’s now included and enabled by default in Ubuntu, Linux Mint, openSUSE, and other major distributions. The job of PulseAudio is to pass sound data between your applications and your hardware: it takes the audio produced by applications and routes it, by way of ALSA, to various output destinations, such as your computer speakers or headphones. That’s why it’s commonly referred to as a sound server.
At first glance, it might seem that PulseAudio doesn’t really add anything critically important to Linux audio, and many of its critics share the same opinion. In reality, there are actually many things that would be impossible or difficult to accomplish without it, including mixing several sounds into one, transferring audio to a different machine, or changing the sample format or channel count.
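To make those tasks a bit more concrete, here is a simplified Python sketch of what mixing, sample format conversion, and channel count conversion mean for raw audio samples. It is only an illustration of the general idea, not PulseAudio’s actual implementation, and all function names are made up for this example.

```python
S16_MIN, S16_MAX = -32768, 32767  # range of a signed 16-bit sample


def mix_s16(a, b):
    """Mix two streams of signed 16-bit samples by summing and clipping."""
    length = max(len(a), len(b))
    mixed = []
    for i in range(length):
        # A stream that has already ended contributes silence (0).
        total = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
        mixed.append(max(S16_MIN, min(S16_MAX, total)))
    return mixed


def s16_to_float(samples):
    """Convert signed 16-bit samples to floats in the range [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def mono_to_stereo(samples):
    """Duplicate each mono sample into interleaved left/right samples."""
    stereo = []
    for s in samples:
        stereo.extend((s, s))
    return stereo


print(mix_s16([1000, 30000], [2000, 10000, -5]))  # [3000, 32767, -5]
print(s16_to_float([-32768, 0, 16384]))           # [-1.0, 0.0, 0.5]
print(mono_to_stereo([1, 2]))                     # [1, 1, 2, 2]
```

Note how the second mixed sample is clipped to 32767 instead of overflowing. A real sound server also has to resample streams that run at different sample rates, which this sketch leaves out.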
PulseAudio also brings cross-platform compatibility (FreeBSD, NetBSD, OpenBSD, Linux, Illumos, Solaris, macOS, and, in a limited fashion, Microsoft Windows). If you want to control PulseAudio directly, instead of interacting with it through a volume control widget or panel of some sort, you can install PulseAudio Volume Control (called pavucontrol in most package repositories).
If you feel that you have no use for the features provided by PulseAudio, you can either use pure ALSA or replace PulseAudio with a different sound server.
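If you prefer the terminal, PulseAudio also ships with a command-line tool called pactl, which isn’t covered elsewhere in this article. The Python sketch below only builds the pactl commands for setting the volume and toggling mute on the default output. The helper names are made up for this example, and actually running the commands assumes a PulseAudio server is available.

```python
DEFAULT_SINK = "@DEFAULT_SINK@"  # pactl's placeholder for the default output


def set_volume_command(percent, sink=DEFAULT_SINK):
    """Build the argv that sets a sink's volume to the given percentage."""
    if percent < 0:
        raise ValueError("volume percentage must not be negative")
    return ["pactl", "set-sink-volume", sink, f"{percent}%"]


def toggle_mute_command(sink=DEFAULT_SINK):
    """Build the argv that toggles mute on a sink."""
    return ["pactl", "set-sink-mute", sink, "toggle"]


print(" ".join(set_volume_command(50)))
print(" ".join(toggle_mute_command()))
```

Either list can be handed to subprocess.run(..., check=True) to apply the change on a system where pactl is installed.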
PulseAudio vs. JACK
PulseAudio isn’t the only sound server for Linux. There’s also JACK, which is a recursive acronym for JACK Audio Connection Kit. Whereas PulseAudio was developed with the needs of general Linux users in mind, JACK is intended for DJs and audio professionals, providing real-time, low-latency connections for both audio and MIDI data.
Because JACK lets you connect the audio inputs and outputs of each and every one of your applications together, you can do some pretty cool things with it, such as monitoring your own voice, adding effects to it in real time, and more. In fact, the name of this sound system was inspired by the cables used in real recording studios to build intricate connections between instruments, synthesizers, MIDI controllers, and multitrackers.
Arguably the biggest downside of JACK is that it usually either works perfectly or horribly, owing to the fact that its chief goal is to provide low-latency audio. It also requires considerably more CPU power compared with PulseAudio, which is why you’ll find it mostly on professional workstations dedicated to audio editing.
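To get a feel for what low latency means in numbers, the delay added by an audio buffer can be estimated from the buffer size and the sample rate. The Python sketch below uses the common approximation of frames per period times number of periods divided by sample rate. The values are illustrative, the function name is made up for this example, and real-world latency also includes hardware and driver overhead.

```python
def buffer_latency_ms(frames_per_period, periods, sample_rate_hz):
    """Approximate buffering delay in milliseconds.

    frames_per_period: audio frames processed in one cycle
    periods: number of such periods held in the buffer
    sample_rate_hz: frames played per second (for example 48000)
    """
    return 1000.0 * frames_per_period * periods / sample_rate_hz


# Smaller periods mean lower latency, but the CPU must wake up more often
# and has less time to finish each cycle before the audio glitches.
for frames in (64, 256, 1024):
    latency = buffer_latency_ms(frames, periods=2, sample_rate_hz=48000)
    print(f"{frames:5d} frames x 2 periods @ 48 kHz = {latency:.2f} ms")
```

This trade-off helps explain why a low-latency setup tends to either run smoothly or produce audible dropouts: the smaller the buffer, the less slack there is when the system is busy.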
Audio on Linux seems complicated because it really is. Untangling the web of legacy technologies and layers of abstraction can be a real challenge even for seasoned Linux users who know the ins and outs of the operating system by heart. Hopefully, our article helped you better understand the most important components of the Linux audio system, including ALSA, OSS, and PulseAudio.