Do you know what's actually on air?

Testing a radio pipeline by recording the stream like a listener and proving what was on air with spectral analysis.

We run a self-hosted internet radio station. It plays music around the clock, takes live DJs, crossfades between tracks, and streams the result to anyone who connects. It all works, mostly. In radio, “mostly” tends to happen during a live broadcast. The question that follows is the one that matters: how do you know it actually sounds right on air?

Not “did the webhook fire.” Not “does the metadata match.” Does the audio coming out of the streaming server — the bytes a real listener receives — contain the right source, at the right time, with clean transitions and no unexpected gaps? That question turns out to be surprisingly hard to answer with conventional testing tools. Even established projects like LibreTime[1] — one of the more common building blocks in open-source radio infrastructure — have not solved it. An open issue puts it plainly: “Listening with the ear is ok for developing, but I hope we can find a way to analyse a test audio stream.” As of today, that issue is still open. This is my answer to it.

How internet radio works

Right, so an internet radio station is basically a pipeline. Audio goes in one end, a stream comes out the other. If you squint, it looks simple. Most setups use the following core components:

Liquidsoap is a programming language designed for audio streaming — yes, an entire language, because apparently config files were not enough. It routes between audio sources using a priority chain: if a live DJ is connected, play the DJ; if not, play whatever the scheduler queued; if the queue is empty, fall back to a default playlist. When the DJ disconnects, Liquidsoap falls back to the next available source automatically.
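Stripped of Liquidsoap's syntax, the priority chain is just "first available source wins." A sketch of the idea in Go — source names are illustrative, not the station's actual configuration:

```go
package main

import "fmt"

// One entry per source in the priority chain; "available" means the
// source can deliver audio right now. In Liquidsoap this is the
// fallback() operator over an ordered list of sources.
type source struct {
	name      string
	available bool
}

// pickSource returns the first available source, highest priority first.
func pickSource(chain []source) string {
	for _, s := range chain {
		if s.available {
			return s.name
		}
	}
	return "silence" // nothing left to play: dead air
}

func main() {
	chain := []source{
		{"live_dj", false},         // a DJ, when one is connected
		{"scheduler_queue", true},  // whatever the scheduler queued
		{"default_playlist", true}, // the always-on fallback
	}
	fmt.Println(pickSource(chain)) // prints "scheduler_queue": no DJ connected
}
```

When the DJ disconnects, `live_dj` simply stops being available and the next pass of the chain lands on the scheduler's queue — the "automatic fallback" is nothing more than re-evaluating this priority order.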

Icecast is the streaming server. It takes the processed audio from Liquidsoap and serves it over HTTP as an MP3 stream — the last hop before the listener. When you open a radio station URL in VLC or a web player, you are connecting to Icecast. It does one job and it does it well, which already puts it ahead of most software I deal with.

A scheduler decides what plays and when — queuing tracks for specific time slots, managing show rotations, and falling back to a default playlist when no live show is happening. Dead air on a radio station is the audio equivalent of a blank webpage — the listener assumes you have died and moves on.

Most radio setups stop there — Liquidsoap, Icecast, and a scheduler are the standard stack. My setup adds a message broker (NATS) that does a lot of things — among them, propagating source changes as events to downstream systems: the web frontend, the metadata pipeline, the live indicator, and the now-playing display.

Why this is hard to test

In most software, inputs and outputs are discrete and inspectable — JSON, rows, payloads. An audio stream is continuous, real-time, and has no response body to deserialize. You cannot ask Icecast to replay the last 30 seconds the way you would re-query a database. The natural instinct is to sidestep the audio entirely and test what you can inspect: the events, the metadata, the webhooks. So that is what I tried. You can probably see where this is going.

The first version of the test suite asserted on control-plane events. Liquidsoap fires a webhook on every track transition. A NATS event confirms the live source is active. The metadata confirms the right track is playing. All green. Brilliant. The problem is what happens to the audio when a DJ’s connection drops briefly. Liquidsoap’s live input has a buffer that absorbs short network hiccups, but the buffer is finite. If a dropout lasts longer than what the buffer can cover, Liquidsoap’s fallback operator switches to the next available source — no built-in grace period, no “hang on, let me check if he is coming back.” Just gone.

A debounce state machine sits between Liquidsoap’s raw source-change events and the rest of the system, absorbing brief glitches so downstream systems never see a flap. If the live source disappears and reappears within a couple of seconds, the debounce treats it as a glitch — from the perspective of every downstream system, the DJ was live the entire time. Great. That is exactly what you want.

But the audio path has no such protection. When the buffer drained, the listener heard it: a brief pop, a fraction of a second of the wrong track bleeding through, then the live audio again.

The event says the DJ is live. The listener hears the scheduled track bleed through for half a second. Both are telling the truth about different layers of the system.

Event-level assertions test the control plane. They do not test the audio plane. To test what the listener hears, you have to listen.

Record what the listener hears

End-to-end testing for a web application means driving a real browser and asserting on what the user actually sees. The same principle applies here: connect to the stream the way a listener would, record the audio, run through a full broadcast scenario, stop recording, and analyse the file afterwards. No real-time assertions, no frantically checking logs while the test runs. Just a recording and all the time in the world to pick it apart.

The test scenario covers a complete broadcast lifecycle:

  1. The regular (scheduled) playlist plays automatically
  2. The scheduler pushes a specific track to the queue
  3. A live DJ connects
  4. The live session runs for a fixed duration (in our case, 35 seconds)
  5. The DJ disconnects
  6. A new scheduled track starts from the queue

The stream arrives as MP3, but the test decodes it and saves the recording as a WAV file — uncompressed PCM audio[2]. Everything after the recording stops is retroactive analysis. The recording is the test subject. Everything else is setup.

Frequency fingerprinting

We now have a recording of what the listener heard, but an audio recording is just a sequence of samples. To make assertions against it, we need a way to identify which source was playing at any given moment — was it the scheduled playlist, the live DJ, or something else? With real music, that would require audio fingerprinting against a known library, which is fragile, slow, and — depending on the rights situation — probably a conversation nobody wants to have. But since we control the test inputs, we can do something much simpler: assign each audio source a distinct sine wave frequency and treat it as a spectral fingerprint.

Source                      | Frequency | Role
----------------------------|-----------|------------------------------------------
Scheduled track (pre-live)  | 550 Hz    | What was playing before the DJ connected
Live source                 | 880 Hz    | The DJ’s audio feed
Scheduled track (post-live) | 770 Hz    | What should play after the DJ disconnects

Each frequency is spectrally distinct, so an ffmpeg bandpass filter should isolate them cleanly. After the test, the analysis runs a filter chain on the captured audio output for each frequency in a specific time window:

# "Is 880 Hz present between seconds 40 and 62 of the recording?"
# atrim extracts the time window, asetpts resets its timestamps,
# bandpass isolates an 80 Hz-wide band around 880 Hz,
# and silencedetect flags any silence longer than 0.5 s.
ffmpeg -i capture.wav \
  -af "atrim=start=40:duration=22,asetpts=PTS-STARTPTS,bandpass=f=880:width_type=h:w=80,silencedetect=noise=-35dB:d=0.5" \
  -f null -

If the 880 Hz band goes silent during the live window, the live source dropped out. If the 550 Hz band is audible during the live window, the scheduled track leaked through — which means the priority chain flapped, even if the webhooks said otherwise.

The harmonics twist

The tests failed. Not for the reason you would expect. The analysis detected signal in the 550 Hz band during the 880 Hz live session, which should have been impossible — there is no 550 Hz content in a pure 880 Hz sine wave. Except there is, after the compressor gets hold of it.

Liquidsoap’s audio processing chain includes a compressor, which reshapes the waveform in ways that generate new frequencies that were not in the original signal[3]. An 880 Hz tone, after compression, acquires real signal at integer multiples like 1760 Hz — and, through the compressor’s gain modulation, even at the 440 Hz subharmonic. A bandpass filter looking for 550 Hz picks up that 440 Hz component easily.

The compressor was adding harmonics to the test tone. The assertion was correct. The signal was not.

The fix was to spread the frequencies to 300 Hz, 2000 Hz, and 5000 Hz — wide enough that no amount of harmonic distortion from the processing chain can bridge the gap. The lesson was harder to swallow: your test signal is not your test signal after the device under test has had its way with it.

The full test

With the corrected frequencies in place, a Go wrapper shells out to ffmpeg for each frequency and time window, parsing the silence_start / silence_duration lines from stderr — because of course ffmpeg puts its useful output on stderr, like someone who whispers the important bits:

// detectBandSilence: was this frequency absent in the given window?
// Each gap means the source dropped out — a potential flap.
func detectBandSilence(t *testing.T, filePath string, freqHz int,
    startSec, endSec float64) []silenceGap {

    dur := endSec - startSec
    filter := fmt.Sprintf(
        "atrim=start=%.1f:duration=%.1f,asetpts=PTS-STARTPTS,"+
            "bandpass=f=%d:width_type=h:w=80,"+
            "silencedetect=noise=-35dB:d=0.5",
        startSec, dur, freqHz)

    cmd := exec.Command("ffmpeg", "-i", filePath,
        "-af", filter, "-f", "null", "-")
    out, _ := cmd.CombinedOutput()
    // ... parse silence_start/silence_duration pairs from stderr
}
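The elided parsing step is ordinary text munging over ffmpeg's log output. A sketch, assuming silencedetect's usual line format, where each `silence_start:` line precedes a matching `silence_end: … | silence_duration:` line:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

type silenceGap struct{ start, duration float64 }

// parseSilence pulls silence_start/silence_duration pairs out of
// ffmpeg's stderr. Expected input looks like:
//   [silencedetect @ 0x55...] silence_start: 41.2
//   [silencedetect @ 0x55...] silence_end: 42.0 | silence_duration: 0.8
func parseSilence(stderr string) []silenceGap {
	starts := regexp.MustCompile(`silence_start: ([\d.]+)`).FindAllStringSubmatch(stderr, -1)
	durs := regexp.MustCompile(`silence_duration: ([\d.]+)`).FindAllStringSubmatch(stderr, -1)
	var gaps []silenceGap
	// Pair them up; a trailing silence_start at end-of-file may have
	// no matching end and is ignored here.
	for i := range durs {
		if i >= len(starts) {
			break
		}
		s, _ := strconv.ParseFloat(starts[i][1], 64)
		d, _ := strconv.ParseFloat(durs[i][1], 64)
		gaps = append(gaps, silenceGap{s, d})
	}
	return gaps
}

func main() {
	out := "[silencedetect @ 0x55] silence_start: 41.2\n" +
		"[silencedetect @ 0x55] silence_end: 42.0 | silence_duration: 0.8\n"
	fmt.Println(parseSilence(out)) // prints "[{41.2 0.8}]"
}
```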

The main test function runs through all phases while recording. Everything is sequential and real — real live streams from ffmpeg, real Liquidsoap processing, real Icecast output. Nothing is mocked. There is nowhere to hide.

// Phase 3: connect a live source at 5000Hz.
// ffmpeg sends a continuous sine wave to Liquidsoap.
liveCtx, liveCancel := context.WithCancel(context.Background())
startSRTStream(t, liveCtx, srtPort1, 5000)

// Wait for the LiveSourceDetected event on NATS — proves the debounce passed.
waitForLiveEvent(t, ec, "live_1", true, 15*time.Second)
liveStartOffset := time.Since(captureStarted).Seconds()

// Phase 4: let the live session run for 35 seconds.
time.Sleep(35 * time.Second)

// Phase 5: disconnect. Kill the ffmpeg process.
liveCancel()
waitForLiveEvent(t, ec, "live_1", false, 20*time.Second)
liveEndOffset := time.Since(captureStarted).Seconds()

After recording stops, four checks run against the captured audio output:

  1. No dead air — scan the entire recording for silence longer than 8 seconds (above the ~5–7 seconds that can legitimately occur while Liquidsoap detects a disconnect and triggers a fallback)
  2. Live tone (5000 Hz) — present during the live window, absent after disconnect
  3. Scheduled tone (300 Hz / 2000 Hz) — absent during the live window, present after disconnect
  4. One live event — exactly one LiveSourceDetected event on NATS (no flapping[4])
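Interpreting the gap lists takes one more small decision: how much detected silence still counts as the tone being "present"? A sketch with a hypothetical 50% threshold — the real suite may well use a stricter one:

```go
package main

import "fmt"

type silenceGap struct{ start, duration float64 }

// bandPresent interprets a silencedetect gap list for one frequency
// band: the tone counts as present in a window when silence covers
// less than half of it. The 50% threshold is illustrative.
func bandPresent(gaps []silenceGap, windowDur float64) bool {
	var silent float64
	for _, g := range gaps {
		silent += g.duration
	}
	return silent < windowDur/2
}

func main() {
	// A single half-second blip during a 35 s live window should not
	// fail the "live tone present" check...
	fmt.Println(bandPresent([]silenceGap{{12.0, 0.5}}, 35)) // prints "true"
	// ...but a band that is silent nearly throughout should.
	fmt.Println(bandPresent([]silenceGap{{0, 34.0}}, 35)) // prints "false"
}
```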

What I learned

Record once, analyse as many times as you need. The captured audio output is the listener’s experience. Adding a new assertion — say, checking for a specific frequency in a new time window — does not require re-running the 100-second scenario. It is just another ffmpeg command on the same file. If I had known that from the start, I would have skipped the week I spent trying to make webhook assertions tell me what the listener heard. They cannot. They never could.

  • Record the stream output as a single continuous file, then analyse it retroactively with ffmpeg bandpass filters and silencedetect — each audio source plays a known frequency that acts as a spectral fingerprint
  • Audio processing chains add harmonics to pure test tones — spread your test frequencies far apart (here, 300 Hz, 2000 Hz, and 5000 Hz) or the compressor will confuse your assertions

  1. LibreTime is an open-source radio automation platform. Issue #2076: “Setup e2e audio tests for liquidsoap and playout” — filed in 2022, still open as of April 2026. The request is for exactly this: programmatic analysis of test audio streams rather than manual listening. ↩︎

  2. The MP3 encoding that Icecast applies is lossy — it discards frequency information to save bandwidth, introducing spectral artefacts in the process. Those artefacts are baked in and cannot be undone by decoding. But re-encoding the capture as MP3 would run a second lossy pass, compounding the distortion and making bandpass analysis unreliable. WAV preserves exactly what came out of the MP3 decode without piling more on. The file is large — about 10 MB per minute at 44.1 kHz mono — but it only exists for the duration of the test. ↩︎

  3. A compressor reduces the dynamic range of a signal by attenuating peaks above a threshold. When the gain changes faster than the period of the waveform — which happens with fast attack times on low frequencies — the gain modulation effectively waveshapes the signal, generating harmonics at integer multiples of the fundamental. In this case, the compressor runs at −14 dB threshold with a 3:1 ratio and +3 dB gain, followed by a hard limiter at −1 dB — aggressive enough to produce clearly measurable harmonics on pure test tones. With multiple frequencies present, the same nonlinearity also produces intermodulation distortion (IMD): sum and difference frequencies that were not in the original signal. ↩︎

  4. Flapping is rapid, spurious state changes — the live indicator toggling off and on because the DJ’s connection dropped for half a second. It confuses listeners and downstream systems alike. ↩︎

If you have also tried to automate the question ‘does this sound right’ and ended up in a rabbit hole of bandpass filters and harmonic distortion, I would love to hear about it. I am on Bluesky, Mastodon, Twitter/X, and technically LinkedIn, though discussing ffmpeg filter chains there feels like bringing a modular synth to a business lunch.