Is Opus royalty-free?

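Yes. Opus is published as a royalty-free codec: the known patent holders license their patents royalty-free, and the reference implementation is BSD-licensed. Organizations should still follow internal compliance (license notices, legal review where needed).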

Opus vs AAC for music streaming?

Opus excels at realtime and speech; AAC dominates many on-demand music pipelines and device compatibility. Choose by latency, bitrate, and ecosystem.

What bitrate for Opus music?

Stereo music often starts around 128 kbps; below ~96 kbps artifacts grow quickly—use ABX tests.

Opus Audio Codec: Next-Gen Standard | WebRTC, Low Latency & FFmpeg

March 30, 2026 · 18 min read · Updated March 30, 2026 · Intermediate Guide

Key takeaways

Opus: low latency, speech/music modes, WebRTC integration, and FFmpeg. A royalty-free next-gen audio codec in one read.

Introduction

Opus (IETF RFC 6716) combines speech coding (SILK lineage) and music coding (CELT lineage) in one codec. Bitrates from about 6 to 510 kbps, frame durations from 2.5 to 60 ms, and low algorithmic delay make it strong for video calls, voice chat, and games where latency budgets are tight. It is royalty-free for use in browsers, mobile apps, and servers.

This article explains why Opus is the default WebRTC audio codec, what mode switching means, and how to encode with FFmpeg. It also contrasts AAC and MP3 from a product perspective.

After reading this post

Outline Opus history (SILK + CELT) and the hybrid design
Pick bitrate and frame size for speech vs music
Encode Opus in Ogg/WebM with FFmpeg
Contrast WebRTC, browsers, and VoIP with AAC/MP3 roles

Codec overview
How compression works
Practical encoding
Performance comparison
Real-world use cases
Optimization tips
Common problems
Wrap-up

Codec overview

History and background

Opus combines Xiph.Org's CELT with Skype's SILK in a hybrid design standardized by the IETF Codec Working Group as RFC 6716 in 2012. The result is one codec for both speech and music instead of separate tools for each. After standardization, adoption grew with WebRTC.

Technical characteristics

Topic	Description
Compression	Speech: LPC-style; music: MDCT (CELT); internal mode switching
Sample rate	8–48 kHz internally; 16 / 48 kHz common in VoIP
Bitrate	6–510 kbps; speech works at tens of kbps
Latency	2.5–60 ms frame durations (20 ms is the common default); shorter frames lower delay
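
To make the bitrate and frame-duration rows concrete, the sketch below computes samples per frame at 48 kHz and the average payload size per frame at a target bitrate, for the frame durations defined in RFC 6716 (2.5, 5, 10, 20, 40, 60 ms). It is plain shell arithmetic; the function names are illustrative, and the payload figure is an average for the target bitrate, not an exact packet size under VBR.

```shell
#!/usr/bin/env bash
# Illustrative helpers (hypothetical names, not part of any Opus tool).
# Durations are given in tenths of a millisecond so 2.5 ms stays an integer.

# Samples per channel in one frame: sample rate * frame duration.
frame_samples() {
  local rate_hz=$1 dur_tenth_ms=$2
  echo $(( rate_hz * dur_tenth_ms / 10000 ))
}

# Average payload bytes per frame at a target bitrate given in kbps.
frame_payload_bytes() {
  local kbps=$1 dur_tenth_ms=$2
  echo $(( kbps * 1000 * dur_tenth_ms / 10000 / 8 ))
}

# Print one line per frame duration for 48 kHz audio at 32 kbps.
for d in 25 50 100 200 400 600; do
  printf '%d.%d ms: %d samples, ~%d bytes at 32 kbps\n' \
    $(( d / 10 )) $(( d % 10 )) \
    "$(frame_samples 48000 "$d")" "$(frame_payload_bytes 32 "$d")"
done
```

In a realtime stack, RTP, UDP, and IP headers are added to every packet, so very short frames spend a larger share of bandwidth on overhead.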

Modes: speech vs music

Speech-heavy, low bitrate: the SILK path covers narrowband through wideband, and a hybrid SILK + CELT mode extends speech to super-wideband and fullband.
Music / fullband: CELT dominates; stereo music usually starts around 128 kbps in practice, and artifacts grow quickly below ~96 kbps.

In realtime stacks, packet loss concealment (PLC) and jitter buffers sit alongside the codec.

How compression works

Psychoacoustic model

CELT paths use psychoacoustic masking like other perceptual codecs. SILK is built on a linear-prediction model of speech production, so it spends bits on the formants and pitch that matter most for intelligibility.

MDCT

The CELT path uses MDCT with short frames and low-delay windows—critical for conversational quality.

Bit allocation

The encoder analyzes the content and adjusts mode and bit split on its own. Users usually set bitrate, frame duration, and channel count directly, and influence the mode choice only indirectly.

Pipeline (conceptual)

flowchart TB
  IN["PCM input"]
  DET["Speech vs music path"]
  SILK["SILK-style processing"]
  CELT["CELT (MDCT) processing"]
  MIX["Bitstream pack"]
  OUT["Opus frame"]
  IN --> DET
  DET --> SILK
  DET --> CELT
  SILK --> MIX
  CELT --> MIX
  MIX --> OUT

Practical encoding

Ogg Opus: mono voice/podcast ~32 kbps

ffmpeg -i input.wav -c:a libopus -b:a 32k -ac 1 -ar 48000 voice.opus

Stereo music 128 kbps (common starting point)

ffmpeg -i input.wav -c:a libopus -b:a 128k -ac 2 -ar 48000 music.opus

Higher music quality 160–192 kbps

ffmpeg -i input.wav -c:a libopus -b:a 192k -ac 2 -ar 48000 music_hq.opus

WebM (with video passthrough)

ffmpeg -i input.mkv -c:v copy -c:a libopus -b:a 96k -ac 2 output.webm

Parameter guide

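-frame_duration: Frame length in ms (for example 2.5, 5, 10, or 20). When supported, shorter frames help ultra-low latency at some cost in compression efficiency; see ffmpeg -h encoder=libopus.
-vbr: VBR is usually the default (on); constrained or off gives near-CBR output for networks that need a steady rate.
-ar 48000: 48 kHz matches the clock rate WebRTC uses for Opus, including for voice.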
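
One way to keep these parameters consistent across jobs is a small helper that maps a use case to encoder arguments. The sketch below is an assumption-heavy illustration: the profile names (voice, music, lowdelay) and the helper itself are hypothetical, and the lowdelay bitrate and frame duration are example values, not recommendations from this article. The -application, -frame_duration, and -vbr flags are libopus encoder options in FFmpeg; confirm what your build supports with ffmpeg -h encoder=libopus.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print FFmpeg audio arguments for a use case.
# Profile names are made up for this sketch; values for "lowdelay" are examples.
opus_args() {
  case "$1" in
    voice)    echo "-c:a libopus -application voip -b:a 32k -ac 1 -frame_duration 20 -vbr on" ;;
    music)    echo "-c:a libopus -application audio -b:a 128k -ac 2 -frame_duration 20 -vbr on" ;;
    lowdelay) echo "-c:a libopus -application lowdelay -b:a 64k -ac 1 -frame_duration 10 -vbr constrained" ;;
    *)        echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

# Print the commands instead of running them, so the sketch works without FFmpeg installed.
for profile in voice music lowdelay; do
  echo "ffmpeg -i input.wav $(opus_args "$profile") out_${profile}.opus"
done
```

Printing the command first makes it easy to review a batch before running it; drop the echo to execute.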

Quality vs size

Speech-only content can work at 12–40 kbps; music below ~96 kbps often shows obvious artifacts. For archival music, start at 128 kbps stereo and ABX from there.
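
For a rough sense of what these bitrates mean on disk, encoded size is approximately bitrate times duration. The sketch below assumes an average bitrate equal to the target and ignores container overhead, so real files run slightly larger; the function name is illustrative.

```shell
#!/usr/bin/env bash
# Approximate encoded size in bytes: bitrate (kbps) * 1000 * duration (s) / 8.
# Container overhead (Ogg pages, tags) is ignored in this estimate.
opus_size_bytes() {
  local kbps=$1 seconds=$2
  echo $(( kbps * 1000 * seconds / 8 ))
}

# One hour of audio at a few speech and music bitrates.
for kbps in 24 32 96 128 192; do
  bytes=$(opus_size_bytes "$kbps" 3600)
  tenths=$(( bytes / 100000 ))   # size in tenths of a megabyte (1 MB = 10^6 bytes)
  printf '%3d kbps: ~%d.%d MB per hour\n' "$kbps" $(( tenths / 10 )) $(( tenths % 10 ))
done
```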

Performance comparison

Compression

Low-bitrate speech: Opus is very strong vs legacy speech codecs.
Music file distribution: AAC vs Vorbis vs Opus depends on bitrate, encoder, and taste—Opus’s edge is often realtime + low delay.

Speed

libopus is fast on embedded and mobile; WebRTC stacks add DSP optimizations.

MOS / quality metrics

Voice: PESQ/POLQA; music: MUSHRA-style tests. Production needs E2E tests including jitter and loss.

Real-world use cases

Streaming

On-demand music still often uses AAC. Discord-style voice and game chat center on Opus.

Mobile

WebRTC video calls use Opus for audio by default. Upload-only recording may still favor AAC or FLAC.

VoIP and WebRTC

Opus is mandatory to implement for WebRTC endpoints (RFC 7874); look for opus/48000/2 in the a=rtpmap line of the SDP. Packet time (ptime), FEC, and NACK shape perceived quality.
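
A quick way to confirm that an offer carries Opus is to search the SDP for the rtpmap entry. The sketch below uses a made-up SDP fragment: payload type 111 and the fmtp values are example values, and real offers vary by browser and configuration. It extracts the Opus payload type and prints its format parameters.

```shell
#!/usr/bin/env bash
# Illustrative SDP fragment; payload type 111 and the fmtp line are example values.
sdp='m=audio 9 UDP/TLS/RTP/SAVPF 111 0 8
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10;useinbandfec=1
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000'

# Extract the payload type that maps to Opus (empty if Opus is not offered).
opus_pt=$(printf '%s\n' "$sdp" | sed -n 's|^a=rtpmap:\([0-9][0-9]*\) opus/48000/2.*|\1|p')
echo "Opus payload type: ${opus_pt:-not offered}"

# Show the format parameters attached to that payload type, if any.
if [ -n "$opus_pt" ]; then
  printf '%s\n' "$sdp" | grep "^a=fmtp:${opus_pt} "
fi
```

The same two patterns work on an SDP dump saved from a browser's WebRTC debugging page; replace the inline string with the file contents.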

Browsers

Modern browsers support WebRTC Opus. For file playback, check Ogg Opus and WebM support (see MDN).

Optimization tips

Smaller files without trashing quality

Speech-only: try mono + 24–48 kbps.
Music: cutting bitrate too far degrades stereo imaging; treat 128 kbps as the baseline and go lower only if ABX tests pass.

Faster encoding

Batch jobs gain more from parallel files than micro-tuning presets:

parallel ffmpeg -y -i {} -c:a libopus -b:a 128k -ac 2 {.}.opus ::: *.wav

Validation

opusinfo (opus-tools) can verify headers before release.
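
When opus-tools is not installed, a rough header check can still catch mislabeled files. The sketch below is a sanity check only, not a substitute for opusinfo: it tests for the Ogg capture pattern "OggS" at the start of the file and the "OpusHead" magic at byte offset 28. That offset assumes the first Ogg page has a single-entry segment table (27-byte page header plus 1 byte), which holds for typical Ogg Opus files; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Rough Ogg Opus header check (hypothetical helper name).
# Assumption: the first page has one segment, so the first packet starts at byte 28.
looks_like_ogg_opus() {
  local file=$1 capture magic
  capture=$(head -c 4 "$file")                                 # Ogg capture pattern
  magic=$(dd if="$file" bs=1 skip=28 count=8 2>/dev/null)      # Opus identification header magic
  [ "$capture" = "OggS" ] && [ "$magic" = "OpusHead" ]
}

# Usage (file name is a placeholder):
#   looks_like_ogg_opus voice.opus && echo "header looks like Ogg Opus"
```

This does not apply to Opus in WebM, which uses a different container layout.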

Common problems

Compatibility

Older Windows players may not play Ogg files; wide audiences sometimes need an MP3 or AAC fallback too.
Ogg Opus and Opus in WebM are different containers around the same codec; check support for each separately.

Quality

Do not expect music quality at VoIP bitrates; set a minimum bitrate per use case.
Heavy mic DSP (noise suppression, AGC) can damage audio before the codec—tune preprocessing.

Licensing

Opus is royalty-free and the reference implementation is BSD-licensed; include the license attribution and follow internal policies. Seek legal sign-off when patents are a concern.

Wrap-up

Summary

Opus is a speech + music hybrid with tunable frame sizes—ideal for low-latency realtime.
It is the de facto WebRTC audio codec for games and collaboration.
File distribution works, but set separate minimum bitrates for speech and music.

When to choose Opus

Realtime voice, video, games: Opus first; tune with network and jitter buffers.
Broad device music streaming: often AAC pipelines; experiment with Opus for new products.
Voice archives / podcasts: mono low-bitrate Opus can save bandwidth and hosting cost.

References

RFC 6716: Definition of the Opus Audio Codec
Opus: https://opus-codec.org/
FFmpeg libopus: ffmpeg -h encoder=libopus