Complete AAC Audio Codec Guide | LC-AAC, HE-AAC & FFmpeg Practical Encoding

Key Points of This Article

Complete guide to AAC codec profiles (LC-AAC, HE-AAC), MPEG-4 container integration, and FFmpeg encoding options. Learn how to balance quality and file size for streaming and mobile applications.

Introduction

AAC (Advanced Audio Coding) is an MPEG-family lossy compression codec designed as the successor to MP3, aiming to provide better audio quality at the same bitrate. It is a de facto standard in modern streaming and mobile ecosystems such as HLS, DASH, and MP4, where encoder quality and profile selection (LC-AAC, HE-AAC) directly affect a service's perceived audio quality and its bandwidth costs.

Major platforms such as Apple Music, YouTube, and Netflix deliver AAC widely, often alongside other codecs, because of its compression efficiency and broad device support. In mobile environments in particular, hardware-accelerated decoding is widely available, which reduces battery consumption while preserving audio quality.

In production, you have to decide at the same time which profile to use, what bitrate (in kbps) to target, and which FFmpeg encoder options fit. This article ties together codec structure, reproducible FFmpeg examples, and quality vs. file size tradeoffs in a single flow.

What You’ll Learn

  • Understand AAC’s history, its position in MPEG-2/4, and differences between major profiles (LC-AAC, HE-AAC)
  • Grasp the big picture of the psychoacoustic model and MDCT-based block coding
  • Construct AAC encoding commands with FFmpeg for specific purposes
  • Organize bitrate and container selection criteria from streaming and mobile perspectives
  • Learn common problems and solutions encountered in practice

Table of Contents

  1. Codec Overview
  2. Compression Principles
  3. Practical Encoding
  4. Performance Comparison
  5. Real-World Use Cases
  6. Optimization Tips
  7. Common Problems and Solutions
  8. Conclusion

Codec Overview

History and Development Background

AAC was standardized in 1997 as ISO/IEC 13818-7 (MPEG-2 Part 7) and later integrated and extended into ISO/IEC 14496-3 (MPEG-4 Audio).

The development goal was clear: provide better audio quality than MP3 at the same bitrate. To achieve this, the following elements were improved compared to MP3 (MPEG-1/2 Layer 3):

  • More sophisticated filter bank: Improved frequency resolution
  • Enhanced critical band division: Closer to human auditory characteristics
  • Improved tone and noise modeling: Increased efficiency for both music and speech
  • More flexible block size: Adapts to instantaneous sound changes

In commercial services, it was widely adopted in the Apple ecosystem (iTunes, Apple Music, AAC-LC based) and adaptive streaming (AAC in HLS). The iPod’s AAC support in the mid-2000s marked a turning point in popularization, followed by major platforms like YouTube and Netflix adopting AAC as standard.

Technical Features

| Item | Description |
|---|---|
| Compression method | Lossy perceptual coding: MDCT-family transform, quantization, and lossless (Huffman) codebooks |
| Sample rate | 8~96 kHz supported (depending on profile and implementation); 44.1/48 kHz most common in practice |
| Bitrate | Music: 128~256 kbps (stereo) is common; HE-AAC is advantageous at lower bitrates |
| Channels | Mono up to multichannel (5.1, 7.1, etc.); stereo is typical for streaming |
| Latency | On the order of 20~50 ms or more, depending on frame size, profile, and implementation (one 1024-sample frame alone lasts about 21.3 ms at 48 kHz) |
| Container | MP4 (.m4a), ADTS (.aac), 3GP, MPEG-TS, etc. |
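One frame's duration puts the latency row in perspective. A minimal sketch, assuming only bash and awk; `frame_ms` is a hypothetical helper. It uses 1024 samples per AAC-LC frame and 2048 output samples per frame for dual-rate HE-AAC (where the AAC core still codes 1024 samples at half the output rate).

```bash
#!/usr/bin/env bash
# frame_ms SAMPLES_PER_FRAME SAMPLE_RATE
# Prints the duration of one frame in milliseconds (hypothetical helper).
frame_ms() {
  awk -v n="$1" -v fs="$2" 'BEGIN { printf "%.2f\n", n / fs * 1000 }'
}

frame_ms 1024 48000   # AAC-LC frame at 48 kHz   -> 21.33
frame_ms 1024 44100   # AAC-LC frame at 44.1 kHz -> 23.22
frame_ms 2048 48000   # HE-AAC (dual-rate SBR) frame at 48 kHz output -> 42.67
```

Frame duration is only a lower bound: encoder look-ahead, block overlap, and decoder buffering all add to end-to-end delay.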

Major Profiles

AAC provides several profiles for different purposes. Each profile sets different tradeoffs between compression efficiency and complexity.

| Profile | Technology | Recommended bitrate | Main use |
|---|---|---|---|
| AAC-LC | Basic MDCT + psychoacoustic model | 128~256 kbps (stereo) | Music, podcasts, VOD, general purpose |
| HE-AAC v1 | LC + SBR (high-frequency restoration) | 64~96 kbps (stereo) | Mobile streaming, radio |
| HE-AAC v2 | v1 + PS (parametric stereo) | 32~64 kbps (stereo) | Ultra-low bandwidth, speech-focused |

AAC-LC (Low Complexity)

The most widely supported basic profile. It compresses with the MDCT, the psychoacoustic model, and the core AAC coding tools, without the SBR or PS extensions:

  • Advantages:

    • Playable on almost all devices (smartphones, PCs, cars, smart TVs, etc.)
    • Low decoding load (good battery efficiency)
    • Predictable and stable audio quality
    • Extensive hardware acceleration support
  • Disadvantages:

    • At low bitrates (64 kbps or below), audio quality may be inferior to HE-AAC
    • Inefficient in environments requiring extreme bandwidth savings
  • Recommended Scenarios:

    • General music streaming (Spotify, Apple Music, etc.)
    • Podcasts (128~160 kbps)
    • VOD services (audio tracks for video)
    • Offline music files

HE-AAC v1 (High-Efficiency AAC)

Adds SBR (Spectral Band Replication) technology to efficiently restore high-frequency components:

  • Principle:

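    • The AAC core codes only the lower part of the spectrum (up to a crossover that is often around 8 kHz, depending on bitrate and encoder)
    • The decoder reconstructs the high band by transposing low-band content upward and shaping it with transmitted SBR envelope parameters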
    • Exploits human hearing being less sensitive to high-frequency details
  • Advantages:

    • At 64~96 kbps, quality can approach that of AAC-LC at 128 kbps
    • Significant bandwidth savings (mobile data cost reduction)
    • Maintains appropriate quality for both music and speech
  • Disadvantages:

    • Increased decoding complexity (CPU usage ~1.5x)
    • Older devices (pre-2010) may not support
    • Artifacts possible in music where high frequencies are important (cymbals, hi-hats)
  • Recommended Scenarios:

    • Mobile network environments (3G, slow 4G)
    • Internet radio broadcasting
    • Streaming with bandwidth limitations

HE-AAC v2 (HE-AAC + PS)

Adds Parametric Stereo (PS) to compress stereo information as parameters:

  • Principle:

    • Transmit only mono signal, stereo position information as parameters
    • Express left-right channel differences as mathematical model
    • Decoder reconstructs stereo from mono + parameters
  • Advantages:

    • Maintains stereo feel even at 32~48 kbps
    • Extreme bandwidth savings (voice call level)
    • Minimized file size
  • Disadvantages:

    • More suitable for speech/talk than music
    • Complex music (orchestra, rock) has stereo image distortion
    • Limited device support (mainly recent devices)
  • Recommended Scenarios:

    • Voice calls, audiobooks
    • Talk shows, podcasts (without music)
    • Ultra-low bandwidth environments (satellite communication, etc.)

Selection Guide:

flowchart TD
    START[Decide Bitrate]
    START --> Q1{128 kbps or higher?}
    Q1 -->|Yes| LC[Use AAC-LC]
    Q1 -->|No| Q2{64~96 kbps?}
    Q2 -->|Yes| Q3{Music-focused?}
    Q3 -->|Yes| HE1[Use HE-AAC v1]
    Q3 -->|No| HE2[Consider HE-AAC v2]
    Q2 -->|No| Q4{48 kbps or below?}
    Q4 -->|Yes| Q5{Speech only?}
    Q5 -->|Yes| HE2
    Q5 -->|No| OPUS[Recommend reviewing Opus]
    Q4 -->|No| Q3
    
    LC --> RESULT1["Best compatibility\nStable quality"]
    HE1 --> RESULT2["Bandwidth savings\nMobile optimized"]
    HE2 --> RESULT3["Extreme compression\nSpeech specialized"]
    OPUS --> RESULT4["Real-time calls\nUltra-low latency"]
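The decision tree can also be expressed as a small function, for example to choose encoder settings in a batch script. This is a sketch under stated assumptions: `pick_profile` is hypothetical, it returns libfdk_aac `-profile:a` values (or `opus`), and because the flowchart's explicit ranges do not cover 49~63 kbps or 97~127 kbps, it assumes those rates follow the same music/speech split as the 64~96 kbps branch.

```bash
#!/usr/bin/env bash
# pick_profile KBPS CONTENT
# CONTENT is "music" or "speech" (a convention of this sketch).
# Prints aac_low, aac_he, aac_he_v2 (libfdk_aac -profile:a values) or opus.
pick_profile() {
  local kbps=$1 content=$2
  if (( kbps >= 128 )); then
    echo aac_low                      # best compatibility, stable quality
  elif (( kbps <= 48 )); then
    # 48 kbps or below: HE-AAC v2 for speech, otherwise review Opus
    if [[ $content == speech ]]; then echo aac_he_v2; else echo opus; fi
  else
    # 64~96 kbps (and, by assumption, the uncovered gaps): music vs. speech
    if [[ $content == music ]]; then echo aac_he; else echo aac_he_v2; fi
  fi
}

# Usage (libfdk_aac is required for the HE-AAC profiles):
#   p=$(pick_profile 64 music)
#   [[ $p == opus ]] || ffmpeg -i input.wav -c:a libfdk_aac -profile:a "$p" -b:a 64k output.m4a
```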

Compression Principles

Psychoacoustic Model

The core of AAC is the principle that sounds humans cannot hear are not stored.

Human hearing has the following characteristics:

  1. Simultaneous Masking: When there’s a loud sound, you can’t hear small sounds at similar frequencies

    • Example: When a drum kick sounds, subtle bass guitar vibrations are inaudible
  2. Temporal Masking: You can’t hear small sounds immediately before or after loud sounds

    • Example: Background noise is not perceived for 20~30ms right after cymbal strike
  3. Frequency Sensitivity Differences: Most sensitive in 2~5 kHz band, less sensitive toward low and high frequencies

AAC uses these effects to apply coarser quantization to perceptually less important coefficients and save bits. The strategy is not to shrink all data evenly, but to cut first the components that are least likely to be heard.
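Masking analysis is usually done on a critical-band (Bark) scale rather than in raw hertz, which is why bands get wider as frequency rises. As an illustration only, here is the Zwicker-Terhardt approximation of the Bark scale; it is not claimed to be the exact mapping any particular AAC encoder uses, and `hz_to_bark` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# hz_to_bark FREQ_HZ: Zwicker & Terhardt approximation of the Bark scale
#   z = 13 * atan(0.00076 * f) + 3.5 * atan((f / 7500)^2)
hz_to_bark() {
  awk -v f="$1" 'BEGIN {
    f += 0
    z = 13 * atan2(0.00076 * f, 1) + 3.5 * atan2((f / 7500) ^ 2, 1)
    printf "%.1f\n", z
  }'
}

for f in 100 1000 4000 16000; do
  echo "$f Hz -> $(hz_to_bark "$f") Bark"
done
```

With this approximation, 1~4 kHz spans about 9 Bark while 4~16 kHz spans only about 7, so each high-frequency band covers far more hertz.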

MDCT (Modified Discrete Cosine Transform)

AAC family mainly uses MDCT-based filter bank to convert time-domain signals to frequency coefficients.

MDCT Features:

  • Overlap-Add structure: Removes discontinuities at block boundaries
  • Variable block length:
    • Long blocks (1024 coefficients, 2048-sample window): Steady-state signals, excellent frequency resolution
    • Short blocks (128 coefficients, 256-sample window): Instantaneous attack sounds (transients), excellent time resolution
  • Critical band-based grouping: Groups coefficients according to human auditory frequency resolution
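The block-length tradeoff is easy to quantify: N MDCT coefficients cover 0 Hz up to half the sample rate, so each coefficient spans fs / (2N) Hz, while a new block starts every N samples. A minimal sketch, assuming bash and awk; `mdct_bin_hz` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# mdct_bin_hz COEFFS SAMPLE_RATE: width of one MDCT frequency bin in Hz.
# N coefficients span 0..fs/2, so each bin covers fs / (2 * N).
mdct_bin_hz() {
  awk -v n="$1" -v fs="$2" 'BEGIN { printf "%.2f\n", fs / (2 * n) }'
}

mdct_bin_hz 1024 48000   # long block  -> 23.44 Hz per bin (new block every ~21.3 ms)
mdct_bin_hz 128 48000    # short block -> 187.50 Hz per bin (new block every ~2.7 ms)
```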

Why use MDCT?

In time domain, you can know sound changes but frequency characteristics are difficult to know. Conversely, in frequency domain, you can know what sounds exist but time information is lacking. MDCT appropriately balances both while removing block boundary artifacts with overlap structure.
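For reference, the textbook MDCT maps a windowed block of 2N input samples x_n (window w_n) to N coefficients X_k; with N = 1024 for long blocks and N = 128 for short blocks, this matches the sizes listed above. Normalization and window shapes differ between implementations, so read this as the standard definition rather than the exact text of the AAC specification.

```latex
X_k = \sum_{n=0}^{2N-1} w_n \, x_n \cos\!\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right],
\qquad k = 0, \dots, N-1
```

Consecutive blocks overlap by N samples, and the time-domain aliasing each block introduces cancels when the inverse transforms are overlap-added (time-domain aliasing cancellation, TDAC). This is the overlap structure that removes block-boundary artifacts.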

Bitrate Allocation Strategy

Internally, bit pool is divided by frequency band, and bits are distributed to tone and noise components according to psychoacoustic weights.

Allocation Process:

  1. Psychoacoustic analysis: Calculate masking threshold for each frequency band
  2. Bit budget distribution: Allocate more bits to important bands
  3. Quantization step decision: Quantize with different precision per band
  4. Iterative optimization: Minimize distortion while meeting target bitrate
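The iterative idea can be illustrated with a deliberately simplified greedy allocator. This is a toy in the spirit of classic MPEG-1 Layer II bit allocation, not AAC's actual two-loop rate/distortion search: it assumes each extra bit in a band lowers quantization noise by about 6.02 dB, and it keeps giving the next bit to the band whose noise sits furthest above its masking threshold. `allocate_bits` and its SMR (signal-to-mask ratio, in dB) inputs are hypothetical.

```bash
#!/usr/bin/env bash
# allocate_bits BUDGET SMR_DB...
# Toy greedy allocator: every band starts with 0 bits; each iteration gives one
# bit to the band with the highest noise-to-mask ratio, NMR = SMR - 6.02 * bits.
# Stops when the budget is spent or no band's noise is above its mask (NMR <= 0).
allocate_bits() {
  local budget=$1
  shift
  printf '%s\n' "$@" | awk -v budget="$budget" '
    NF { n++; smr[n] = $1 + 0; bits[n] = 0 }
    END {
      budget += 0
      for (i = 0; i < budget; i++) {
        best = 0
        for (b = 1; b <= n; b++) {
          nmr = smr[b] - 6.02 * bits[b]
          if (best == 0 || nmr > bestnmr) { best = b; bestnmr = nmr }
        }
        if (best == 0 || bestnmr <= 0) break   # nothing audible left to fix
        bits[best]++
      }
      for (b = 1; b <= n; b++) out = out (b > 1 ? " " : "") bits[b]
      print out
    }'
}

allocate_bits 4 20 5   # loud tonal band vs. quieter band -> 3 1
```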

At low bitrates, the following techniques additionally operate:

  • Bandwidth limitation: Don’t transmit high frequencies at all (e.g., remove above 16 kHz)
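  • Perceptual Noise Substitution (PNS): Replace noise-like bands with parametrically generated noise instead of coding individual spectral lines
  • TNS (Temporal Noise Shaping): Shape quantization noise over time within a block, which reduces pre-echo on transients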

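The HE-AAC family encodes high-frequency information efficiently as a separate layer with SBR (Spectral Band Replication). The AAC core carries only the lower band (in typical configurations up to roughly 8 kHz), and the band above it (for example 8~16 kHz) is reconstructed from compact SBR parameters; the exact crossover depends on bitrate and encoder.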

Processing Flow

Below shows the entire AAC encoding pipeline.

flowchart TB
    subgraph INPUT[Input Stage]
        PCM["PCM Audio\n44.1/48 kHz"]
    end
    
    subgraph ANALYSIS[Analysis Stage]
        PSY["Psychoacoustic Analysis\nMasking Threshold Calculation"]
        MDCT["MDCT Transform\nTime→Frequency Domain"]
    end
    
    subgraph ENCODING[Encoding Stage]
        QUANT["Quantization\nBit Allocation per Band"]
        CODEBOOK["Codebook Selection\nEfficient Representation"]
        TNS["TNS Application\nTemporal Noise Shaping"]
    end
    
    subgraph OPTIONAL[Optional Technologies]
        SBR["SBR\nHigh-frequency Restoration\nHE-AAC v1"]
        PS["PS\nParametric Stereo\nHE-AAC v2"]
    end
    
    subgraph OUTPUT[Output Stage]
        ENTROPY["Entropy Coding\nHuffman etc."]
        BITSTREAM["AAC Bitstream\nADTS/MP4"]
    end
    
    PCM --> PSY
    PCM --> MDCT
    MDCT --> TNS
    PSY --> QUANT
    TNS --> QUANT
    QUANT --> CODEBOOK
    CODEBOOK --> ENTROPY
    
    PCM -.-> SBR
    SBR -.-> BITSTREAM
    
    PCM -.-> PS
    PS -.-> SBR
    
    ENTROPY --> BITSTREAM
    
    style INPUT fill:#e3f2fd
    style ANALYSIS fill:#fff3e0
    style ENCODING fill:#f3e5f5
    style OPTIONAL fill:#e8f5e9
    style OUTPUT fill:#fce4ec

Practical Encoding

FFmpeg Encoder Selection

FFmpeg provides two AAC encoders:

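| Encoder | Features | When to use |
|---|---|---|
| aac | Native encoder, built into FFmpeg | General purpose, fast encoding |
| libfdk_aac | Fraunhofer FDK AAC, higher quality; requires a custom FFmpeg build with --enable-libfdk-aac (plus --enable-nonfree in GPL builds) | When the highest quality or HE-AAC is needed |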

Check encoders:

# Check available AAC encoders
ffmpeg -encoders | grep aac

# Example output:
# A..... aac                  AAC (Advanced Audio Coding)
# A..... libfdk_aac           Fraunhofer FDK AAC
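Scripts that run on machines with and without libfdk_aac can pick the encoder automatically. A minimal sketch; `pick_aac_encoder` is hypothetical, and because it only inspects the text that `ffmpeg -encoders` prints, it can be tested without FFmpeg installed.

```bash
#!/usr/bin/env bash
# pick_aac_encoder: reads `ffmpeg -encoders` output on stdin and prints
# "libfdk_aac" if that encoder is listed, otherwise the native "aac".
pick_aac_encoder() {
  if grep -qw 'libfdk_aac'; then
    echo libfdk_aac
  else
    echo aac
  fi
}

# Usage (requires FFmpeg):
#   enc=$(ffmpeg -hide_banner -encoders 2>/dev/null | pick_aac_encoder)
#   ffmpeg -i input.wav -c:a "$enc" -b:a 128k output.m4a
```

Keep in mind that the HE-AAC profiles (`aac_he`, `aac_he_v2`) are only available when this returns `libfdk_aac`.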

Basic Encoding Examples

1. AAC-LC, CBR Stereo 128 kbps

The most basic and stable setting. Suitable for most music streaming.

ffmpeg -i input.wav \
  -c:a aac \
  -b:a 128k \
  -ar 48000 \
  -ac 2 \
  -aac_coder twoloop \
  output.m4a

Option Explanation:

  • -c:a aac: Use AAC encoder
  • -b:a 128k: Bitrate 128 kbps
  • -ar 48000: Sample rate 48 kHz (recommended when combining with video)
  • -ac 2: Stereo (2 channels)
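  • -aac_coder twoloop: Selects the two-loop search coder. Recent FFmpeg documentation lists twoloop as the default, so this flag mainly makes the choice explicit; check ffmpeg -h encoder=aac for your build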

2. AAC-LC, VBR Quality Mode

Encode based on quality rather than fixed bitrate.

# Native aac encoder: experimental VBR mode
ffmpeg -i input.wav \
  -c:a aac \
  -q:a 1 \
  -ar 44100 \
  -ac 2 \
  output.m4a

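-q:a Value Guide (native aac):

  • The native encoder's VBR mode is experimental; the FFmpeg wiki gives an effective -q:a range of about 0.1~2
  • Higher values mean higher quality and bitrate (the opposite direction from LAME's MP3 scale, where 0 is best)
  • The resulting bitrate depends heavily on content, so measure it (e.g., with ffprobe) instead of relying on a fixed mapping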

3. Using libfdk_aac (High Quality)

If libfdk_aac is installed, you can get better audio quality:

ffmpeg -i input.wav \
  -c:a libfdk_aac \
  -profile:a aac_low \
  -vbr 4 \
  -ar 44100 \
  -ac 2 \
  output.m4a

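-vbr Value Guide (libfdk_aac, approximate, per channel, as listed on the FFmpeg wiki):

  • 1: ~20~32 kbps per channel
  • 2: ~32~40 kbps per channel
  • 3: ~48~56 kbps per channel
  • 4: ~64~72 kbps per channel (about 128~144 kbps stereo)
  • 5: ~96~112 kbps per channel (about 192~224 kbps stereo)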

4. HE-AAC v1 Encoding

For low-bitrate mobile streaming:

ffmpeg -i input.wav \
  -c:a libfdk_aac \
  -profile:a aac_he \
  -b:a 64k \
  -ar 44100 \
  -ac 2 \
  output.m4a

Note: Native aac encoder does not support HE-AAC. libfdk_aac is required.

5. HE-AAC v2 Encoding

For ultra-low bitrate speech:

ffmpeg -i input.wav \
  -c:a libfdk_aac \
  -profile:a aac_he_v2 \
  -b:a 32k \
  -ar 44100 \
  -ac 2 \
  output.m4a

6. AAC for HLS Streaming

ADTS format for HTTP Live Streaming:

ffmpeg -i input.wav \
  -c:a aac \
  -b:a 160k \
  -ar 48000 \
  -ac 2 \
  -f adts \
  output.aac

ADTS vs MP4:

  • ADTS: Header included in each frame, suitable for streaming segments
  • MP4: Metadata concentrated at file start/end, suitable for download playback
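Because every ADTS frame starts with its own header, you can check what an .aac file contains by reading its first four bytes. A minimal bash sketch based on the ADTS header layout (12-bit syncword 0xFFF, then a 2-bit profile, a 4-bit sampling-frequency index, and a 3-bit channel configuration); `adts_info` is a hypothetical helper that inspects only the first frame and assumes `od` is available.

```bash
#!/usr/bin/env bash
# adts_info FILE: decode the first ADTS header of FILE.
# Prints "aot=<audio object type> rate=<Hz> channels=<config>";
# returns 1 if FILE does not start with an ADTS syncword.
adts_info() {
  local -a b
  read -r -a b < <(od -An -tu1 -N4 "$1")
  if (( ${#b[@]} < 4 || b[0] != 255 || (b[1] & 0xF0) != 0xF0 )); then
    echo "not ADTS" >&2
    return 1
  fi
  local rates=(96000 88200 64000 48000 44100 32000 24000 22050 16000 12000 11025 8000 7350)
  local aot=$(( (b[2] >> 6) + 1 ))                        # 2-bit profile = AOT - 1 (2 = LC)
  local sf_index=$(( (b[2] >> 2) & 0x0F ))                # sampling-frequency index
  local channels=$(( ((b[2] & 0x01) << 2) | (b[3] >> 6) )) # 3-bit channel config
  echo "aot=$aot rate=${rates[sf_index]:-unknown} channels=$channels"
}
```

HE-AAC is usually signaled implicitly in ADTS, so an HE-AAC stream typically reports the LC object type (aot=2) at the core sample rate, which is half the output rate.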

7. Encoding with Video

Re-encode only audio of video file to AAC:

ffmpeg -i input.mp4 \
  -c:v copy \
  -c:a aac \
  -b:a 192k \
  -ar 48000 \
  output.mp4

-c:v copy: Copy video without re-encoding (fast and no quality loss)

Parameter Tuning Guide

Sample Rate Selection

| Sample rate | Use | Notes |
|---|---|---|
| 44.1 kHz | CD quality, music | Keep it if the source is 44.1 kHz |
| 48 kHz | Video standard, broadcast | Conventional choice when muxing with video |
| 32 kHz | Low-quality streaming | Only for extreme bandwidth savings |
| 22.05 kHz | Speech, radio | Unsuitable for music |

Resampling Caution: Every sample-rate conversion adds filtering error. A round trip such as 44.1 kHz → 48 kHz → 44.1 kHz gains nothing and risks aliasing and distortion with a poor resampler, so keep the source sample rate when possible.

Bitrate Selection

Music Streaming (stereo basis):

| Bitrate | Quality | Use |
|---|---|---|
| 256 kbps | Near transparent | High-quality streaming, audio files |
| 192 kbps | Excellent | General streaming (Spotify Premium, etc.) |
| 128 kbps | Good | Standard streaming, mobile |
| 96 kbps | Average | Low-bandwidth environments (HE-AAC v1 recommended) |
| 64 kbps | Low | HE-AAC v1 strongly recommended |
| 48 kbps or below | Speech only | HE-AAC v2 or Opus |

Podcast/Speech:

  • Speech only: 64~96 kbps (mono 32~48 kbps also possible)
  • Speech + background music: 96~128 kbps
  • High-quality interview: 128~160 kbps

Genre-specific Recommended Bitrates:

  • Classical, Jazz: 192~256 kbps (reverb and detail important)
  • Pop, Rock: 128~192 kbps
  • Electronic: 160~192 kbps (high-frequency synthesizers)
  • Talk, Audiobooks: 64~96 kbps

Quality vs File Size Tradeoff

Actual File Size Examples (3-minute music, stereo):

| Bitrate | File size | Perceived quality |
|---|---|---|
| 320 kbps (MP3 max) | ~7.2 MB | Transparent |
| 256 kbps (AAC) | ~5.8 MB | Nearly transparent |
| 192 kbps (AAC) | ~4.3 MB | Excellent (most satisfied) |
| 128 kbps (AAC) | ~2.9 MB | Good (OK for general listening) |
| 96 kbps (HE-AAC v1) | ~2.2 MB | Average (mobile environment) |
| 64 kbps (HE-AAC v1) | ~1.4 MB | Low (speech-focused) |
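The sizes in the table follow directly from bitrate × duration ÷ 8. A quick way to reproduce them or size other tiers, assuming bash and awk; `size_mb` is hypothetical, uses decimal megabytes, and ignores container overhead and metadata.

```bash
#!/usr/bin/env bash
# size_mb KBPS SECONDS: approximate audio payload size in decimal MB
# (kbps * 1000 bit/s * seconds / 8 bit per byte / 1,000,000 byte per MB).
size_mb() {
  awk -v kbps="$1" -v sec="$2" 'BEGIN { printf "%.2f\n", kbps * sec / 8 / 1000 }'
}

size_mb 128 180    # 3-minute track at 128 kbps -> 2.88
size_mb 64 3600    # 1-hour mono podcast at 64 kbps -> 28.80
```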

In the same listening environment, the step from 128 kbps to 192 kbps is often clearly noticeable relative to the size it adds. Going to 256 kbps or higher, however, calls for a cost-benefit analysis (bandwidth and CDN costs) for your situation.

ABX listening tests are recommended to establish team baselines:

# Encode same source at multiple bitrates
ffmpeg -i source.wav -c:a aac -b:a 128k test_128.m4a
ffmpeg -i source.wav -c:a aac -b:a 192k test_192.m4a
ffmpeg -i source.wav -c:a aac -b:a 256k test_256.m4a

# Check differences with blind test
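To judge the blind test, it helps to know how likely a score is by pure guessing. Under that null hypothesis each ABX trial is a coin flip, so the chance of k or more correct answers out of n is a binomial tail. A minimal sketch; `abx_pvalue` is hypothetical, and treating values below about 0.05 as significant is a common convention rather than a rule from this article.

```bash
#!/usr/bin/env bash
# abx_pvalue CORRECT TRIALS
# Probability of at least CORRECT right answers out of TRIALS by guessing
# (binomial tail with p = 0.5). Small values suggest an audible difference.
abx_pvalue() {
  awk -v k="$1" -v n="$2" 'BEGIN {
    k += 0; n += 0
    c = 1; tail = 0                          # c = C(n, i), starting at C(n, 0)
    for (i = 0; i <= n; i++) {
      if (i > 0) c = c * (n - i + 1) / i     # C(n, i) from C(n, i - 1)
      if (i >= k) tail += c
    }
    printf "%.4f\n", tail / 2 ^ n
  }'
}

abx_pvalue 12 16   # 12 of 16 correct -> 0.0384
```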

Performance Comparison

Compression Efficiency vs Other Codecs

General evaluation under the same subjective listening conditions (may vary by content and encoder):

| Bitrate | MP3 | AAC-LC | HE-AAC v1 | Opus |
|---|---|---|---|---|
| 32 kbps | Very low | Low | Average | Excellent (speech) |
| 64 kbps | Low | Average | Good | Excellent |
| 96 kbps | Average | Good | Excellent | Excellent |
| 128 kbps | Good | Excellent | Excessive | Excellent |
| 192 kbps | Excellent | Nearly transparent | Excessive | Nearly transparent |

Conclusion:

  • 128 kbps or higher: Clear difference between AAC-LC and MP3, AAC advantageous
  • 64~96 kbps: HE-AAC v1 clearly superior to AAC-LC
  • 48 kbps or below: Opus often superior to AAC (speech specialized)

Encoding and Decoding Speed

Decoding Performance:

  • Hardware Acceleration: Most mobile SoCs have built-in AAC hardware decoder

    • iPhone: All A-series chips
    • Android: Most Snapdragon, Exynos, MediaTek
    • PC: usually decoded in software, which costs little CPU on desktop processors (Quick Sync and VCE are video engines)
  • Battery Efficiency: Hardware decoding uses 1/5~1/10 power consumption compared to software

Encoding Performance (Intel i7-10700K basis, 3-minute music):

| Setting | Encoding time | Real-time multiple |
|---|---|---|
| AAC-LC 128k (aac) | ~2 s | 90x |
| AAC-LC 192k (aac, twoloop) | ~5 s | 36x |
| AAC-LC 192k (libfdk_aac, vbr 5) | ~8 s | 22x |
| HE-AAC v1 64k (libfdk_aac) | ~12 s | 15x |

FFmpeg's native aac encoder is usually fast enough for large batch jobs; slower coder settings and libfdk_aac's HE-AAC mode increase CPU time.

Subjective Quality (MOS)

MOS (Mean Opinion Score) varies by experimental conditions and codec version.

Typical MOS Range (out of 5):

| Bitrate | AAC-LC | HE-AAC v1 | MP3 |
|---|---|---|---|
| 64 kbps | 3.0~3.5 | 3.5~4.0 | 2.5~3.0 |
| 96 kbps | 3.5~4.0 | 4.0~4.3 | 3.0~3.5 |
| 128 kbps | 4.0~4.5 | 4.3~4.6 | 3.5~4.0 |
| 192 kbps | 4.5~4.8 | - | 4.0~4.5 |

In practice, refer to standard listening procedures like ITU-R BS.1534 (MUSHRA), but it’s safer to design your own MOS survey with service-specific target devices and earphones.


Real-World Use Cases

Streaming Services

Major Platform AAC Usage:

  • Apple Music: AAC 256 kbps (high quality), AAC 128 kbps (standard)
  • YouTube: AAC 128~256 kbps (video audio track)
  • Netflix: AAC 192~640 kbps (up to 5.1 channels)
  • Spotify: Previously Ogg Vorbis-focused but uses AAC on some platforms

Adaptive Streaming Configuration Example:

# Create multiple bitrate versions
ffmpeg -i source.wav -c:a aac -b:a 256k -ar 48000 audio_256k.m4a
ffmpeg -i source.wav -c:a aac -b:a 192k -ar 48000 audio_192k.m4a
ffmpeg -i source.wav -c:a aac -b:a 128k -ar 48000 audio_128k.m4a
ffmpeg -i source.wav -c:a libfdk_aac -profile:a aac_he -b:a 64k audio_64k.m4a

Client automatically selects appropriate version based on network speed.
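The four commands can be wrapped in a loop so every tier is produced consistently. A sketch under stated assumptions: `encode_ladder` and the `DRY_RUN` variable are hypothetical, the HE-AAC tier still requires libfdk_aac, and the sketch sets `-ar 48000` on every tier (including the 64k HE-AAC one, which the commands above leave at the source rate) so that all renditions share one output sample rate.

```bash
#!/usr/bin/env bash
# encode_ladder SRC OUTDIR: encode SRC into the four renditions listed above.
# Set DRY_RUN=1 to print the ffmpeg commands instead of executing them.
encode_ladder() {
  local src=$1 out=$2 tier kbps profile
  local tiers=("256:lc" "192:lc" "128:lc" "64:he")   # bitrate:profile pairs
  [[ ${DRY_RUN:-0} == 1 ]] || mkdir -p "$out"
  for tier in "${tiers[@]}"; do
    kbps=${tier%%:*}
    profile=${tier#*:}
    local cmd=(ffmpeg -y -i "$src")
    if [[ $profile == he ]]; then
      cmd+=(-c:a libfdk_aac -profile:a aac_he)     # HE-AAC needs libfdk_aac
    else
      cmd+=(-c:a aac)                              # native encoder for AAC-LC
    fi
    # Same output sample rate on every tier (an assumption of this sketch)
    cmd+=(-b:a "${kbps}k" -ar 48000 "$out/audio_${kbps}k.m4a")
    if [[ ${DRY_RUN:-0} == 1 ]]; then
      printf '%s\n' "${cmd[*]}"
    else
      "${cmd[@]}" || return 1
    fi
  done
}
```

Running `DRY_RUN=1 encode_ladder source.wav encoded` first prints the four commands so you can review them before encoding.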

Mobile Apps

Both iOS and Android have good basic support for AAC decoding.

Mobile Optimization Strategy:

  1. Offline Cache:

    • Wi-Fi: 192 kbps AAC-LC
    • Mobile data: 96 kbps HE-AAC v1
  2. Background Playback: Battery savings with hardware decoding

  3. A/B Testing:

    • Provide different bitrates to user groups
    • Collect churn rate, playback completion rate, user feedback

Actual Implementation Example (JavaScript):

// Pick an audio rendition from a throughput estimate.
// networkSpeedKbps is assumed to be in kbps; the thresholds are illustrative.
function pickAudioUrl(networkSpeedKbps) {
  const quality = networkSpeedKbps > 5000 ? 'high'
                : networkSpeedKbps > 2000 ? 'medium'
                : 'low';

  const audioUrl = {
    high: '/audio/song_192k.m4a',    // AAC-LC 192 kbps
    medium: '/audio/song_128k.m4a',  // AAC-LC 128 kbps
    low: '/audio/song_64k.m4a'       // HE-AAC v1 64 kbps
  };
  return audioUrl[quality];
}

console.log(pickAudioUrl(3000)); // prints /audio/song_128k.m4a

Podcast Production

Podcasts are mainly speech, so low bitrates are sufficient:

Recommended Settings:

# Speech-only podcast (mono)
ffmpeg -i podcast.wav \
  -c:a aac \
  -b:a 64k \
  -ar 44100 \
  -ac 1 \
  podcast_mono.m4a

# Speech + intro/outro music (stereo)
ffmpeg -i podcast.wav \
  -c:a aac \
  -b:a 96k \
  -ar 44100 \
  -ac 2 \
  podcast_stereo.m4a

File Size Comparison (1-hour podcast):

  • 64 kbps mono: ~29 MB
  • 96 kbps stereo: ~43 MB
  • 128 kbps stereo: ~58 MB

VoIP and WebRTC

For real-time voice communication, Opus is more suitable than AAC:

  • AAC: Large frame size causes latency (20~50 ms)
  • Opus: Ultra-low-latency settings possible (5~10 ms)

Because of its latency and framing characteristics, AAC is better suited to recorded media files and adaptive streaming than to real-time calls.

Browser Support

HTML5 <audio> tag:

<audio controls>
  <source src="audio.m4a" type="audio/mp4">
  <source src="audio.mp3" type="audio/mpeg">
  Your browser does not support audio.
</audio>

Support Status:

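| Browser | AAC in MP4 | AAC in ADTS |
|---|---|---|
| Chrome | Supported | ⚠️ Limited |
| Firefox | Supported | ⚠️ Limited |
| Safari | Supported | Supported |
| Edge | Supported | ⚠️ Limited |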

Recommendation: Use MP4 container (.m4a) on web. ADTS support is unstable in some browsers.

In MSE (Media Source Extensions) environments, AAC in fragmented MP4 (fMP4) is widely supported. fMP4 is the segment format used by DASH and modern HLS, so it underpins adaptive streaming in the browser.


Optimization Tips

Reducing File Size While Maintaining Quality

1. Remove Unnecessary Resampling

Maintaining source sample rate reduces quality loss and processing time:

# Bad example: Unnecessary resampling
ffmpeg -i input_44100.wav -ar 48000 -c:a aac -b:a 128k temp.m4a
ffmpeg -i temp.m4a -ar 44100 output.m4a  # Quality degradation!

# Good example: Maintain source
ffmpeg -i input_44100.wav -c:a aac -b:a 128k output.m4a

2. Utilize HE-AAC

For low-bitrate mobile streams, HE-AAC can be advantageous for perceived quality vs file size:

# Compare AAC-LC 128 kbps vs HE-AAC v1 64 kbps
ffmpeg -i input.wav -c:a aac -b:a 128k lc_128.m4a
ffmpeg -i input.wav -c:a libfdk_aac -profile:a aac_he -b:a 64k he_64.m4a

# Roughly half the file size; perceived quality is often comparable (verify by listening)

Note: Client support verification required. Older devices may not play HE-AAC.

3. Optimize Silent Sections

For content with long silences (lectures, interviews), trimming them reduces file size; with CBR, silent passages otherwise cost the full bitrate:

# Trim leading silence and remove silent gaps longer than 1 s (below -50 dB)
ffmpeg -i input.wav \
  -af "silenceremove=start_periods=1:start_silence=0.1:start_threshold=-50dB:stop_periods=-1:stop_duration=1:stop_threshold=-50dB" \
  -c:a aac -b:a 96k \
  output.m4a

Improving Encoding Speed

1. Use Single-Pass CBR

Faster and more predictable than VBR or multi-pass:

# Fast encoding (CBR)
ffmpeg -i input.wav -c:a aac -b:a 128k -ar 48000 output.m4a

2. CPU Parallelization

Encode multiple files simultaneously:

# Using GNU parallel
find ./wav -name '*.wav' -print0 | \
  parallel -0 -j 4 \
  ffmpeg -y -i {} -c:a aac -b:a 160k {.}.m4a

-j 4: Process 4 files simultaneously (adjust to CPU core count)

3. Hardware Acceleration (Limited for Encoding)

AAC encoding is mostly software-based. Hardware acceleration is extensively supported only for decoding.

Batch Processing Automation

Script example for consistently encoding large numbers of files:

#!/bin/bash
# batch_encode.sh
set -euo pipefail
shopt -s nullglob   # skip the loop entirely if no .wav files match

INPUT_DIR="./source"
OUTPUT_DIR="./encoded"
BITRATE="128k"
SAMPLE_RATE="48000"

mkdir -p "$OUTPUT_DIR"

for file in "$INPUT_DIR"/*.wav; do
  filename=$(basename "$file" .wav)
  echo "Encoding: $filename"
  
  # -y (overwrite) and -nostdin are global options, so they go first
  ffmpeg -y -nostdin -i "$file" \
    -c:a aac \
    -b:a "$BITRATE" \
    -ar "$SAMPLE_RATE" \
    -ac 2 \
    -aac_coder twoloop \
    "$OUTPUT_DIR/${filename}.m4a"
    
  # Quality verification: Generate spectrogram
  ffmpeg -y -nostdin -i "$OUTPUT_DIR/${filename}.m4a" \
    -lavfi showspectrumpic=s=1920x1080 \
    "$OUTPUT_DIR/${filename}_spectrum.png"
done

echo "Batch encoding complete!"

CI/CD Integration:

# .github/workflows/encode-audio.yml
name: Encode Audio Files

on:
  push:
    paths:
      - 'audio/source/**/*.wav'

jobs:
  encode:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Install FFmpeg
        run: sudo apt-get update && sudo apt-get install -y ffmpeg
      
      - name: Encode AAC
        run: |
          shopt -s nullglob
          for file in audio/source/*.wav; do
            ffmpeg -y -nostdin -i "$file" -c:a aac -b:a 192k "${file%.wav}.m4a"
          done
      
      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: encoded-audio
          path: audio/source/*.m4a

Saving input fingerprint (checksum) and output spectrogram snapshots in CI helps with regression detection when upgrading encoders.


Common Problems and Solutions

Compatibility Issues

Problem 1: HE-AAC Won’t Play

Symptom: No sound or error on older devices

Cause: HE-AAC decoder not supported (pre-2010 devices)

Solution:

# Fallback strategy: Also provide AAC-LC version
ffmpeg -i input.wav -c:a aac -b:a 128k fallback_lc.m4a
ffmpeg -i input.wav -c:a libfdk_aac -profile:a aac_he -b:a 64k modern_he.m4a

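On web, provide multiple sources in the <audio> tag and declare the codec in each type attribute (mp4a.40.5 = HE-AAC, mp4a.40.2 = AAC-LC). Without the codecs parameter both files look identical to the browser, so it cannot skip a source whose codec it does not support:

<audio controls>
  <source src="audio_he.m4a" type='audio/mp4; codecs="mp4a.40.5"'>
  <source src="audio_lc.m4a" type='audio/mp4; codecs="mp4a.40.2"'>
</audio>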

Problem 2: ADTS Files Won’t Play in Browser

Symptom: .aac files fail to play in Chrome/Firefox

Cause: Browsers prefer MP4 container, limited ADTS support

Solution:

# Convert ADTS to MP4 (without re-encoding)
ffmpeg -i input.aac -c:a copy -movflags +faststart output.m4a

-movflags +faststart: Move moov atom to file beginning (streaming optimization)

Problem 3: Video and Audio Out of Sync

Symptom: Video and audio gradually drift apart

Cause: Sample rate mismatch, timestamp errors

Solution:

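# Pick streams explicitly, unify the audio sample rate, and resync audio to its timestamps
ffmpeg -i video.mp4 -i audio.wav \
  -map 0:v:0 -map 1:a:0 \
  -c:v copy \
  -c:a aac -b:a 192k -ar 48000 \
  -af aresample=async=1 \
  output.mp4

-map 0:v:0 -map 1:a:0: Take video from the first input and audio from the second (otherwise FFmpeg may select an audio track from video.mp4)
-af aresample=async=1: Stretch/squeeze audio to match its timestamps; FFmpeg's documentation recommends the aresample filter in place of the deprecated -async option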

Audio Quality Degradation

Problem 4: Sound Feels “Broken”

Cause 1: Clipping

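Source audio peaks at or very near 0 dBFS, and lossy encoding can push decoded peaks above full scale (inter-sample or "true" peaks), which clips on playback: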

# Adjust gain to secure headroom
ffmpeg -i input.wav \
  -af "volume=-3dB" \
  -c:a aac -b:a 192k \
  output.m4a

Recommended Headroom: -1~-3 dBTP (True Peak)
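To decide whether a source needs extra headroom at all, you can measure its true peak with FFmpeg's ebur128 filter (`peak=true`) and compare it with the -1 dBTP end of the range above. The parser is a sketch: `check_true_peak` is hypothetical, and it assumes the filter's summary contains a "True peak:" section followed by a "Peak: <value> dBFS" line, a layout that may vary between FFmpeg versions.

```bash
#!/usr/bin/env bash
# check_true_peak LIMIT_DBTP
# Reads ebur128 output on stdin, finds the summary's "True peak:" section,
# and compares its "Peak:" value with LIMIT_DBTP.
# Assumption: a line containing "Peak: <value> dBFS" follows "True peak:".
check_true_peak() {
  awk -v limit="$1" '
    /True peak:/ { in_tp = 1; next }
    in_tp && /Peak:/ {
      for (i = 1; i < NF; i++)
        if ($i == "Peak:") { peak = $(i + 1) + 0; found = 1; exit }
    }
    END {
      if (!found) { print "no true-peak value found"; exit 2 }
      if (peak > limit + 0)
        printf "reduce gain: true peak %.1f dBTP > %.1f\n", peak, limit
      else
        printf "ok: true peak %.1f dBTP\n", peak
    }'
}

# Usage (requires FFmpeg; the ebur128 summary is written to stderr):
#   ffmpeg -nostats -i input.wav -af ebur128=peak=true -f null - 2>&1 | check_true_peak -1
```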

Cause 2: Insufficient Bitrate

Encoding complex music at low bitrate:

# Increase bitrate
ffmpeg -i input.wav -c:a aac -b:a 192k output.m4a  # 128k → 192k

Problem 5: High Frequencies Sound Harsh

Symptom: Hi-hats, cymbals, high vocals sound “hissy”

Cause: High-frequency quantization errors at low bitrate

Solution 1: Increase Bitrate

ffmpeg -i input.wav -c:a aac -b:a 192k output.m4a

Solution 2: Use HE-AAC

# Restore high frequencies with SBR
ffmpeg -i input.wav \
  -c:a libfdk_aac \
  -profile:a aac_he \
  -b:a 96k \
  output.m4a

Solution 3: Gentle High-Shelf Cut

# Slightly reduce energy above ~8 kHz before encoding (treble = high-shelf filter)
ffmpeg -i input.wav \
  -af "treble=g=-3:f=8000" \
  -c:a aac -b:a 128k \
  output.m4a

A small cut can make high-frequency artifacts less noticeable at the cost of a slightly darker tone. Avoid a highpass filter here: it removes everything below its cutoff frequency.

Problem 6: Stereo Image Loss

Symptom: Weakened left-right separation

Cause: Stereo information loss at low bitrate

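Solution: Give the encoder more bits rather than toggling stereo tools. Mid/Side stereo coding is decided per band automatically (FFmpeg's native aac exposes it as -aac_ms, default auto; libfdk_aac decides internally):

# Higher-quality VBR mode leaves more bits for stereo detail
ffmpeg -i input.wav \
  -c:a libfdk_aac \
  -profile:a aac_low \
  -vbr 5 \
  output.m4a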

License Considerations

AAC may have license issues depending on patent pool and product category.

Major Patent Pools:

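  • Via Licensing Alliance (Via LA): AAC patent pool management (formed in 2023 from the merger of Via Licensing and MPEG LA)
  • MPEG LA: historically administered MPEG-4 related patent pools; now part of Via LA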

License Scenarios:

| Usage type | License required |
|---|---|
| Personal use | Not required |
| Free app/service | Generally not required (decoder provider bears cost) |
| Paid app/service | Review needed (depending on revenue scale) |
| Hardware products | Required (typically handled by decoder chip manufacturer) |

FFmpeg Encoder Licenses:

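  • aac (native): FFmpeg license (LGPL/GPL)
  • libfdk_aac: Fraunhofer FDK AAC license; it is not GPL-compatible (FFmpeg builds combining it with GPL components need --enable-nonfree and cannot be redistributed), and it grants no patent rights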

For commercial products and large-scale distribution, conduct legal review and check terms of encoders/decoders in use.

Safe Choice:

  • Personal/small-scale: Use FFmpeg native aac encoder
  • Commercial/large-scale: Decide after legal team and license review

Metadata Management

Add metadata (title, artist, album, etc.) to AAC files:

ffmpeg -i input.wav \
  -c:a aac -b:a 192k \
  -metadata title="Song Title" \
  -metadata artist="Artist Name" \
  -metadata album="Album Name" \
  -metadata date="2026" \
  -metadata genre="Pop" \
  output.m4a

Add Album Art:

ffmpeg -i input.wav -i cover.jpg \
  -c:a aac -b:a 192k \
  -c:v copy \
  -disposition:v:0 attached_pic \
  -metadata title="Song Title" \
  output.m4a

Conclusion

Key Summary

  • AAC is the central codec of MPEG-4 audio, with LC-AAC as the compatibility baseline and HE-AAC having strengths at low bitrates.
  • Internally uses psychoacoustic model + MDCT-based block coding to reduce bits for perceptually less important information.
  • In practice, must design sample rate unification, bitrate tiers, and container (MP4/ADTS) together.
  • Profile selection determined by bitrate and content type: LC for 128 kbps or higher, HE-AAC v1 for 64~96 kbps, HE-AAC v2 or Opus for 48 kbps or below

Music Streaming:

  • High quality: AAC-LC 192~256 kbps
  • Standard: AAC-LC 128~160 kbps
  • Mobile: HE-AAC v1 64~96 kbps

Podcasts:

  • Speech only: AAC-LC 64~96 kbps (mono possible)
  • Speech + music: AAC-LC 96~128 kbps (stereo)

VOD Services:

  • Premium: AAC-LC 192~256 kbps
  • Standard: AAC-LC 128~160 kbps
  • Mobile: HE-AAC v1 64~96 kbps

Adaptive Streaming:

  • Prepare multiple bitrate AAC versions (256k, 192k, 128k, 64k)
  • Client automatically selects based on network
  • MP4 container recommended

Next Steps

  1. Experiment: Test multiple bitrates with your content and find optimal point
  2. Monitor: Analyze user feedback and bandwidth costs
  3. Optimize: Apply different profiles and bitrates by content type
  4. Update: Continue encoder quality improvements with FFmpeg version upgrades

One-Line Summary: AAC is the standard of modern streaming with excellent balance of compression efficiency and compatibility, and the key is selecting bitrate and profile appropriate to content.


References
