How do I choose between LC-AAC and HE-AAC?

LC-AAC is the standard profile widely used for general music and speech, while HE-AAC is frequently used for streaming and mobile applications that need to handle high frequencies efficiently at low bitrates. Check your target bitrate and device support range first.

What are common audio quality issues when encoding AAC with FFmpeg?

Incorrect profile, sample rate, or channel mapping can cause quality degradation or sync issues. Match the source and container metadata, and explicitly specify `-ar`/`-ac` when needed for reproducibility.

Why use AAC instead of MP3 in production?

AAC often provides better audio quality efficiency at the same bitrate and works well with HLS, DASH, and MP4 ecosystems. License and platform policies vary by service, so review distribution channels together.

What should I watch out for when configuring AAC for streaming?

Segment boundaries, keyframes, and audio timestamps must align to avoid interruptions. It is safer to match encoder delay/buffer with player buffer policies.
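
One way to reason about alignment: AAC-LC codes audio in frames of 1024 samples, so a segment holds a whole number of frames only at certain durations. The sketch below rounds a target segment length to the nearest whole frame count. The helper names are hypothetical, and it assumes AAC-LC framing and ignores encoder priming delay.

```bash
# Hypothetical helpers (not part of FFmpeg): align a target segment
# duration to whole AAC frames of 1024 samples (AAC-LC assumption).

# Number of whole AAC frames closest to the target duration.
aac_frames_per_segment() {
  awk -v t="$1" -v sr="$2" \
    'BEGIN { printf "%d\n", int(t * sr / 1024 + 0.5) }'
}

# Actual duration in seconds of that many frames.
aac_aligned_duration() {
  frames=$(aac_frames_per_segment "$1" "$2")
  awk -v f="$frames" -v sr="$2" \
    'BEGIN { printf "%.6f\n", f * 1024 / sr }'
}

aac_frames_per_segment 6 48000   # 281
aac_aligned_duration 6 48000     # 5.994667
```

A nominal 6-second segment at 48 kHz therefore carries 281 frames and lasts slightly under 6 seconds, which is why audio and video segment boundaries rarely coincide exactly.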

Complete AAC Audio Codec Guide | LC-AAC, HE-AAC & FFmpeg Practical Encoding

March 30, 2026 · 20 min read · Updated April 7, 2026 · Intermediate Guide

Key Takeaways

Complete guide to AAC codec profiles (LC-AAC, HE-AAC), MPEG-4 container integration, and FFmpeg encoding options. Learn how to balance quality and file size for streaming and mobile applications.

Introduction

AAC (Advanced Audio Coding) is an MPEG-family lossy compression codec designed as the successor to MP3, aiming to provide better audio quality at the same bitrate. It is a de facto standard in modern streaming and mobile ecosystems such as HLS, DASH, and MP4, where encoder quality and profile selection (LC-AAC, HE-AAC) directly affect perceived audio quality and bandwidth costs.

Major streaming platforms such as YouTube, Netflix, and Apple Music deliver AAC audio because of its compression efficiency and wide device support (Spotify, historically Ogg Vorbis-focused, uses AAC on some platforms). On mobile devices in particular, hardware-accelerated decoding is commonly available, which reduces battery consumption during playback.

In production, you must simultaneously decide “which profile to use,” “what bitrate in kbps,” and “which FFmpeg encoder options are appropriate.” This article connects understanding codec structure, reproducible FFmpeg examples, and quality vs. file size tradeoffs in one flow.

What You’ll Learn

Understand AAC’s history, its position in MPEG-2/4, and differences between major profiles (LC-AAC, HE-AAC)
Grasp the big picture of psychoacoustic model and MDCT-based block coding
Construct AAC encoding commands with FFmpeg for specific purposes
Organize bitrate and container selection criteria from streaming and mobile perspectives
Learn common problems and solutions encountered in practice

Codec Overview
Compression Principles
Practical Encoding
Performance Comparison
Real-World Use Cases
Optimization Tips
Common Problems and Solutions
Conclusion

Codec Overview

History and Development Background

AAC was standardized in 1997 as ISO/IEC 13818-7 (MPEG-2 Part 7) and later integrated and extended into ISO/IEC 14496-3 (MPEG-4 Audio).

The development goal was clear: provide better audio quality than MP3 at the same bitrate. To achieve this, the following elements were improved compared to MP3 (MPEG-1/2 Layer 3):

More sophisticated filter bank: Improved frequency resolution
Enhanced critical band division: Closer to human auditory characteristics
Improved tone and noise modeling: Increased efficiency for both music and speech
More flexible block size: Adapts to instantaneous sound changes

In commercial services, it was widely adopted in the Apple ecosystem (iTunes, Apple Music, AAC-LC based) and adaptive streaming (AAC in HLS). The iPod’s AAC support in the mid-2000s marked a turning point in popularization, followed by major platforms like YouTube and Netflix adopting AAC as standard.

Technical Features

Item	Description
Compression Method	Lossy compression based on perceptual coding, MDCT-family transform, quantization, lossless codebook, etc.
Sample Rate	8 kHz~96 kHz supported (depending on profile and implementation), 44.1/48 kHz most common in practice
Bitrate	Music: 128~256 kbps (stereo) range is common, HE-AAC advantageous at lower bitrate bands
Channels	Expandable from mono to multi-channel (5.1, 7.1, etc.), stereo is typical for streaming
Latency	Approximately 20~50ms depending on frame size (profile and settings dependent)
Container	MP4 (.m4a), ADTS (.aac), 3GP, MPEG-TS, etc.

Major Profiles

AAC provides several profiles for different purposes. Each profile sets different tradeoffs between compression efficiency and complexity.

Profile	Technology	Recommended Bitrate	Main Use
AAC-LC	Basic MDCT + psychoacoustic	128~256 kbps (stereo)	Music, podcasts, VOD, general purpose
HE-AAC v1	LC + SBR (high-frequency restoration)	64~96 kbps (stereo)	Mobile streaming, radio
HE-AAC v2	v1 + PS (parametric stereo)	32~64 kbps (stereo)	Ultra-low bandwidth, speech-focused

AAC-LC (Low Complexity)

The most widely supported basic profile. Compresses using only MDCT and psychoacoustic model without complex additional technologies:

Advantages:
- Playable on almost all devices (smartphones, PCs, cars, smart TVs, etc.)
- Low decoding load (good battery efficiency)
- Predictable and stable audio quality
- Extensive hardware acceleration support
Disadvantages:
- At low bitrates (64 kbps or below), audio quality may be inferior to HE-AAC
- Inefficient in environments requiring extreme bandwidth savings
Recommended Scenarios:
- General music streaming (Spotify, Apple Music, etc.)
- Podcasts (128~160 kbps)
- VOD services (audio tracks for video)
- Offline music files

HE-AAC v1 (High-Efficiency AAC)

Adds SBR (Spectral Band Replication) technology to efficiently restore high-frequency components:

Principle:
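- The core AAC-LC layer codes only the lower part of the spectrum (for example up to roughly 8 kHz; the crossover depends on bitrate and encoder)
- The decoder rebuilds high frequencies by replicating the low band and shaping it with compact envelope parameters sent as side information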
- Exploits human hearing being less sensitive to high-frequency details
Advantages:
- 64~96 kbps can achieve AAC-LC 128 kbps level quality
- Significant bandwidth savings (mobile data cost reduction)
- Maintains appropriate quality for both music and speech
Disadvantages:
- Increased decoding complexity (CPU usage ~1.5x)
- Older devices (pre-2010) may not support
- Artifacts possible in music where high frequencies are important (cymbals, hi-hats)
Recommended Scenarios:
- Mobile network environments (3G, slow 4G)
- Internet radio broadcasting
- Streaming with bandwidth limitations

HE-AAC v2 (HE-AAC + PS)

Adds Parametric Stereo (PS) to compress stereo information as parameters:

Principle:
- Transmit only mono signal, stereo position information as parameters
- Express left-right channel differences as mathematical model
- Decoder reconstructs stereo from mono + parameters
Advantages:
- Maintains stereo feel even at 32~48 kbps
- Extreme bandwidth savings (voice call level)
- Minimized file size
Disadvantages:
- More suitable for speech/talk than music
- Complex music (orchestra, rock) has stereo image distortion
- Limited device support (mainly recent devices)
Recommended Scenarios:
- Voice calls, audiobooks
- Talk shows, podcasts (without music)
- Ultra-low bandwidth environments (satellite communication, etc.)

Selection Guide:

The selection logic is expressed below as a Mermaid flowchart:

flowchart TD
    START[Decide Bitrate]
    START --> Q1{128 kbps or higher?}
    Q1 -->|Yes| LC[Use AAC-LC]
    Q1 -->|No| Q2{64~96 kbps?}
    Q2 -->|Yes| Q3{Music-focused?}
    Q3 -->|Yes| HE1[Use HE-AAC v1]
    Q3 -->|No| HE2[Consider HE-AAC v2]
    Q2 -->|No| Q4{48 kbps or below?}
    Q4 -->|Yes| Q5{Speech only?}
    Q5 -->|Yes| HE2
    Q5 -->|No| OPUS[Recommend reviewing Opus]
    
    LC --> RESULT1["Best compatibility\nStable quality"]
    HE1 --> RESULT2["Bandwidth savings\nMobile optimized"]
    HE2 --> RESULT3["Extreme compression\nSpeech specialized"]
    OPUS --> RESULT4["Real-time calls\nUltra-low latency"]
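
The same decision flow can be sketched as a small shell function. `pick_aac_profile` is a hypothetical helper, and the handling of bitrates that fall between the flowchart's tiers is an assumption made here for illustration.

```bash
# Hypothetical helper mirroring the flowchart above.
# Usage: pick_aac_profile <stereo_kbps> <music|speech>
# Assumption: bitrates between the flowchart's tiers
# (for example 100 or 56 kbps) are treated as the tier below them.
pick_aac_profile() {
  kbps=$1
  content=$2
  if [ "$kbps" -ge 128 ]; then
    echo "AAC-LC"
  elif [ "$kbps" -ge 64 ]; then
    if [ "$content" = "music" ]; then
      echo "HE-AAC v1"
    else
      echo "HE-AAC v2"
    fi
  elif [ "$content" = "speech" ]; then
    echo "HE-AAC v2"
  else
    echo "Opus (review)"
  fi
}

pick_aac_profile 192 music    # AAC-LC
pick_aac_profile 64 music     # HE-AAC v1
pick_aac_profile 32 speech    # HE-AAC v2
pick_aac_profile 32 music     # Opus (review)
```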

Compression Principles

Psychoacoustic Model

The core of AAC is the principle that sounds humans cannot hear are not stored.

Human hearing has the following characteristics:

Simultaneous Masking: When there’s a loud sound, you can’t hear small sounds at similar frequencies
- Example: When a drum kick sounds, subtle bass guitar vibrations are inaudible
Temporal Masking: You can’t hear small sounds immediately before or after loud sounds
- Example: Background noise is not perceived for 20~30ms right after cymbal strike
Frequency Sensitivity Differences: Most sensitive in 2~5 kHz band, less sensitive toward low and high frequencies

AAC uses this to apply coarser quantization to perceptually less important coefficients to save bits. Rather than “reducing data,” the strategy is reducing components that are unlikely to be heard first.

MDCT (Modified Discrete Cosine Transform)

AAC family mainly uses MDCT-based filter bank to convert time-domain signals to frequency coefficients.

MDCT Features:

Overlap-Add structure: Removes discontinuities at block boundaries
Variable block length:
- Long blocks (1024 samples): Steady-state signals, excellent frequency resolution
- Short blocks (128 samples): Instantaneous attack sounds (transients), excellent time resolution
Critical band-based grouping: Groups coefficients according to human auditory frequency resolution
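
To make the time/frequency tradeoff between long and short blocks concrete, the sketch below computes the duration of one block and the approximate spacing between its frequency coefficients. The helper names are hypothetical, and the spacing assumes the block's coefficients evenly cover 0 Hz up to half the sample rate.

```bash
# Duration of one block in milliseconds: block_duration_ms <samples> <sample_rate>
block_duration_ms() {
  awk -v n="$1" -v sr="$2" 'BEGIN { printf "%.2f\n", n / sr * 1000 }'
}

# Approximate spacing between coefficients in Hz, assuming
# <samples> coefficients evenly cover 0 .. sample_rate/2.
bin_spacing_hz() {
  awk -v n="$1" -v sr="$2" 'BEGIN { printf "%.2f\n", sr / 2 / n }'
}

block_duration_ms 1024 48000  # 21.33
block_duration_ms 128 48000   # 2.67
bin_spacing_hz 1024 48000     # 23.44
bin_spacing_hz 128 48000      # 187.50
```

At 48 kHz a long block resolves frequency about eight times more finely than a short block, while a short block localizes a transient about eight times more tightly in time.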

Why use MDCT?

In time domain, you can know sound changes but frequency characteristics are difficult to know. Conversely, in frequency domain, you can know what sounds exist but time information is lacking. MDCT appropriately balances both while removing block boundary artifacts with overlap structure.

Bitrate Allocation Strategy

Internally, bit pool is divided by frequency band, and bits are distributed to tone and noise components according to psychoacoustic weights.

Allocation Process:

Psychoacoustic analysis: Calculate masking threshold for each frequency band
Bit budget distribution: Allocate more bits to important bands
Quantization step decision: Quantize with different precision per band
Iterative optimization: Minimize distortion while meeting target bitrate
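
As a toy illustration of the "more bits to important bands" step, the sketch below splits a bit budget across bands in proportion to each band's signal-to-mask ratio (SMR, in dB). This is not the rate-control loop of any real AAC encoder; `allocate_bits` and the proportional rule are assumptions chosen only to show the idea that fully masked bands receive nothing.

```bash
# Toy model only: allocate_bits <total_bits> <smr_db>...
# Bands whose signal is at or below the masking threshold (SMR <= 0) get 0 bits.
allocate_bits() {
  total=$1
  shift
  printf '%s\n' "$@" | awk -v total="$total" '
    { smr[NR] = ($1 > 0) ? $1 : 0; sum += smr[NR] }
    END {
      for (i = 1; i <= NR; i++) {
        bits = (sum > 0) ? int(total * smr[i] / sum) : 0
        printf "%s%d", (i > 1 ? " " : ""), bits
      }
      printf "\n"
    }'
}

allocate_bits 70 20 10 0 5   # 40 20 0 10
```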

At low bitrates, the following techniques additionally operate:

Bandwidth limitation: Don’t transmit high frequencies at all (e.g., remove above 16 kHz)
Spectral line replacement: Approximate complex frequency components with noise
TNS (Temporal Noise Shaping): Optimize time-domain noise distribution

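The HE-AAC family codes high-frequency information as a separate SBR (Spectral Band Replication) layer. In a typical configuration the core coder carries only the low band (for example below about 8 kHz), and the upper band (for example 8~16 kHz) is reconstructed from SBR parameters; the exact crossover depends on bitrate and encoder.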

Processing Flow

The following Mermaid flowchart shows the overall AAC encoding pipeline.

flowchart TB
    subgraph INPUT[Input Stage]
        PCM["PCM Audio\n44.1/48 kHz"]
    end
    
    subgraph ANALYSIS[Analysis Stage]
        PSY["Psychoacoustic Analysis\nMasking Threshold Calculation"]
        MDCT["MDCT Transform\nTime→Frequency Domain"]
    end
    
    subgraph ENCODING[Encoding Stage]
        QUANT["Quantization\nBit Allocation per Band"]
        CODEBOOK["Codebook Selection\nEfficient Representation"]
        TNS["TNS Application\nTemporal Noise Shaping"]
    end
    
    subgraph OPTIONAL[Optional Technologies]
        SBR["SBR\nHigh-frequency Restoration\nHE-AAC v1"]
        PS["PS\nParametric Stereo\nHE-AAC v2"]
    end
    
    subgraph OUTPUT[Output Stage]
        ENTROPY["Entropy Coding\nHuffman etc."]
        BITSTREAM["AAC Bitstream\nADTS/MP4"]
    end
    
    PCM --> PSY
    PCM --> MDCT
    PSY --> QUANT
    MDCT --> TNS
    TNS --> QUANT
    QUANT --> CODEBOOK
    CODEBOOK --> ENTROPY
    
    PSY -.-> SBR
    QUANT -.-> SBR
    SBR -.-> ENTROPY
    
    PSY -.-> PS
    PS -.-> ENTROPY
    
    ENTROPY --> BITSTREAM
    
    style INPUT fill:#e3f2fd
    style ANALYSIS fill:#fff3e0
    style ENCODING fill:#f3e5f5
    style OPTIONAL fill:#e8f5e9
    style OUTPUT fill:#fce4ec

Practical Encoding

FFmpeg Encoder Selection

FFmpeg commonly offers two AAC encoders (some builds expose additional platform-specific ones):

Encoder	Features	When to Use
`aac`	Native encoder, provided by default	General purpose, fast encoding
`libfdk_aac`	High-quality encoder, requires an FFmpeg build configured with --enable-libfdk-aac	When highest quality is needed

Check encoders:

# Check available AAC encoders
ffmpeg -encoders | grep aac

# Example output:
# A..... aac                  AAC (Advanced Audio Coding)
# A..... libfdk_aac           Fraunhofer FDK AAC

Basic Encoding Examples

1. AAC-LC, CBR Stereo 128 kbps

The most basic and stable setting. Suitable for most music streaming.

ffmpeg -i input.wav \
  -c:a aac \
  -b:a 128k \
  -ar 48000 \
  -ac 2 \
  -aac_coder twoloop \
  output.m4a

Option Explanation:

-c:a aac: Use AAC encoder
-b:a 128k: Bitrate 128 kbps
-ar 48000: Sample rate 48 kHz (recommended when combining with video)
-ac 2: Stereo (2 channels)
-aac_coder twoloop: Selects the two-loop search coder explicitly (it is the default in current FFmpeg releases; the `fast` coder trades quality for speed)

2. AAC-LC, VBR Quality Mode

Encode based on quality rather than fixed bitrate.

# Native aac encoder (VBR-like)
ffmpeg -i input.wav \
  -c:a aac \
  -q:a 1 \
  -ar 44100 \
  -ac 2 \
  output.m4a

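-q:a Value Guide (native aac):

Higher values mean higher quality and higher bitrate (the opposite of MP3-style quality scales)
The useful range is roughly 0.1 to 2
The resulting bitrate depends on the content, so measure it on your own material
This VBR mode is generally regarded as experimental; prefer -b:a when predictable results matter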

3. Using libfdk_aac (High Quality)

If libfdk_aac is installed, you can get better audio quality:

ffmpeg -i input.wav \
  -c:a libfdk_aac \
  -profile:a aac_low \
  -vbr 4 \
  -ar 44100 \
  -ac 2 \
  output.m4a

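-vbr Value Guide (libfdk_aac, approximate bitrate per channel):

1: ~20~32 kbps
2: ~32~40 kbps
3: ~48~56 kbps
4: ~64~72 kbps
5: ~96~112 kbps

For stereo, roughly double these figures; the actual bitrate varies with content.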

4. HE-AAC v1 Encoding

For low-bitrate mobile streaming:

ffmpeg -i input.wav \
  -c:a libfdk_aac \
  -profile:a aac_he \
  -b:a 64k \
  -ar 44100 \
  -ac 2 \
  output.m4a

Note: The native aac encoder does not encode HE-AAC, so libfdk_aac is required. It is only available in FFmpeg builds configured with --enable-libfdk-aac.

5. HE-AAC v2 Encoding

For ultra-low bitrate speech:

ffmpeg -i input.wav \
  -c:a libfdk_aac \
  -profile:a aac_he_v2 \
  -b:a 32k \
  -ar 44100 \
  -ac 2 \
  output.m4a
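
The profile-specific options used in the examples above can be collected in one place. The sketch below maps a profile name to the matching FFmpeg audio options; `aac_audio_args` and its profile labels (`lc`, `he_v1`, `he_v2`) are hypothetical names introduced here, while the emitted options are the ones already shown in this section.

```bash
# aac_audio_args <lc|he_v1|he_v2> <kbps>
# Prints the FFmpeg audio options for the requested profile.
aac_audio_args() {
  case "$1" in
    lc)    echo "-c:a aac -b:a ${2}k" ;;
    he_v1) echo "-c:a libfdk_aac -profile:a aac_he -b:a ${2}k" ;;
    he_v2) echo "-c:a libfdk_aac -profile:a aac_he_v2 -b:a ${2}k" ;;
    *)     echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

aac_audio_args he_v1 64   # -c:a libfdk_aac -profile:a aac_he -b:a 64k
```

It can be used as `ffmpeg -i input.wav $(aac_audio_args he_v1 64) output.m4a`, leaving the substitution unquoted so the options are split into separate arguments.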

6. AAC for HLS Streaming

ADTS format for HTTP Live Streaming:

ffmpeg -i input.wav \
  -c:a aac \
  -b:a 160k \
  -ar 48000 \
  -ac 2 \
  -f adts \
  output.aac

ADTS vs MP4:

ADTS: Header included in each frame, suitable for streaming segments
MP4: Metadata concentrated at file start/end, suitable for download playback

7. Encoding with Video

Re-encode only audio of video file to AAC:

ffmpeg -i input.mp4 \
  -c:v copy \
  -c:a aac \
  -b:a 192k \
  -ar 48000 \
  output.mp4

-c:v copy: Copy video without re-encoding (fast and no quality loss)

Parameter Tuning Guide

Sample Rate Selection

Sample Rate	Use	Notes
44.1 kHz	CD quality, music	Recommend maintaining if source is 44.1
48 kHz	Video standard, broadcast	Conventional when combining with video
32 kHz	Low-quality streaming	Only for extreme bandwidth savings
22.05 kHz	Speech, radio	Unsuitable for music

Resampling Caution: If source is 44.1 kHz and you upsample to 48 kHz then downsample back to 44.1 kHz, there’s risk of aliasing and distortion. Maintain source sample rate when possible.
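
A script can avoid accidental resampling by checking the source rate first and adding `-ar` only when a change is really wanted. In the sketch below, `source_sample_rate` wraps a standard ffprobe query, and `ar_option` is a hypothetical helper that emits the option only when source and target differ.

```bash
# Read the sample rate of the first audio stream (requires ffprobe).
source_sample_rate() {
  ffprobe -v error -select_streams a:0 \
    -show_entries stream=sample_rate \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# ar_option <source_rate> <target_rate>
# Prints "-ar <target_rate>" only when the two rates differ.
ar_option() {
  if [ "$1" != "$2" ]; then
    echo "-ar $2"
  fi
}

ar_option 44100 44100   # prints nothing
ar_option 44100 48000   # -ar 48000
```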

Bitrate Selection

Music Streaming (stereo basis):

Bitrate	Quality	Use
256 kbps	Near transparent	High-quality streaming, audio files
192 kbps	Excellent	General streaming (Spotify Premium, etc.)
128 kbps	Good	Standard streaming, mobile
96 kbps	Average	Low-bandwidth environment (HE-AAC v1 recommended)
64 kbps	Low	HE-AAC v1 required
48 kbps or below	Speech only	HE-AAC v2 or Opus

Podcast/Speech:

Speech only: 64~96 kbps (mono 32~48 kbps also possible)
Speech + background music: 96~128 kbps
High-quality interview: 128~160 kbps

Genre-specific Recommended Bitrates:

Classical, Jazz: 192~256 kbps (reverb and detail important)
Pop, Rock: 128~192 kbps
Electronic: 160~192 kbps (high-frequency synthesizers)
Talk, Audiobooks: 64~96 kbps

Quality vs File Size Tradeoff

Actual File Size Examples (3-minute music, stereo):

Bitrate	File Size	Perceived Quality
320 kbps (MP3 max)	~7.2 MB	Transparent
256 kbps (AAC)	~5.8 MB	Nearly transparent
192 kbps (AAC)	~4.3 MB	Excellent (most satisfied)
128 kbps (AAC)	~2.9 MB	Good (OK for general listening)
96 kbps (HE-AAC v1)	~2.2 MB	Average (mobile environment)
64 kbps (HE-AAC v1)	~1.4 MB	Low (speech-focused)
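
The sizes in this table follow from bitrate × duration ÷ 8. The sketch below reproduces them; `estimate_size_mb` is a hypothetical helper that takes 1 MB as 1,000,000 bytes and ignores container overhead.

```bash
# estimate_size_mb <kbps> <seconds>
# kbps * seconds = kilobits; / 8 = kilobytes; / 1000 = megabytes.
estimate_size_mb() {
  awk -v kbps="$1" -v sec="$2" 'BEGIN { printf "%.1f\n", kbps * sec / 8 / 1000 }'
}

estimate_size_mb 192 180   # 4.3
estimate_size_mb 128 180   # 2.9
estimate_size_mb 64 3600   # 28.8
```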

In the same listening environment, moving from 128 kbps to 192 kbps often gives a noticeable improvement for a modest increase in file size. At 256 kbps or higher the audible gain shrinks, so weigh it against bandwidth and CDN costs.

ABX listening tests are recommended to establish team baselines:

# Encode same source at multiple bitrates
ffmpeg -i source.wav -c:a aac -b:a 128k test_128.m4a
ffmpeg -i source.wav -c:a aac -b:a 192k test_192.m4a
ffmpeg -i source.wav -c:a aac -b:a 256k test_256.m4a

# Check differences with blind test

Performance Comparison

Compression Efficiency vs Other Codecs

General evaluation under the same subjective listening conditions (may vary by content and encoder):

Bitrate	MP3	AAC-LC	HE-AAC v1	Opus
32 kbps	Very low	Low	Average	Excellent (speech)
64 kbps	Low	Average	Good	Excellent
96 kbps	Average	Good	Excellent	Excellent
128 kbps	Good	Excellent	Excessive	Excellent
192 kbps	Excellent	Nearly transparent	Excessive	Nearly transparent

Conclusion:

128 kbps or higher: Clear difference between AAC-LC and MP3, AAC advantageous
64~96 kbps: HE-AAC v1 clearly superior to AAC-LC
48 kbps or below: Opus often superior to AAC (speech specialized)

Encoding and Decoding Speed

Decoding Performance:

Hardware Acceleration: Most mobile SoCs have built-in AAC hardware decoder
- iPhone: All A-series chips
- Android: Most Snapdragon, Exynos, MediaTek
- PC: Typically decoded in software; the CPU cost is small on desktop processors
Battery Efficiency: Hardware decoding generally draws less power than software decoding (figures such as 1/5~1/10 are device dependent)

Encoding Performance (Intel i7-10700K basis, 3-minute music):

Setting	Encoding Time	Real-time Multiple
AAC-LC 128k (aac)	~2s	90x
AAC-LC 192k (aac, twoloop)	~5s	36x
AAC-LC 192k (libfdk_aac, vbr 5)	~8s	22x
HE-AAC v1 64k (libfdk_aac)	~12s	15x
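
The "Real-time Multiple" column is the audio duration divided by the encoding time. The sketch below shows the calculation for your own measurements; `realtime_factor` is a hypothetical helper, and the encoding time would come from something like `time ffmpeg ...` on your hardware.

```bash
# realtime_factor <audio_seconds> <encode_seconds>
realtime_factor() {
  awk -v d="$1" -v e="$2" 'BEGIN { printf "%.1f\n", d / e }'
}

realtime_factor 180 2   # 90.0
realtime_factor 180 8   # 22.5
```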

FFmpeg's native aac encoder is usually fast enough for batch processing at many times real time; higher-quality encoder settings increase CPU time.

Subjective Quality (MOS)

MOS (Mean Opinion Score) varies by experimental conditions and codec version.

Typical MOS Range (out of 5):

Bitrate	AAC-LC	HE-AAC v1	MP3
64 kbps	3.0~3.5	3.5~4.0	2.5~3.0
96 kbps	3.5~4.0	4.0~4.3	3.0~3.5
128 kbps	4.0~4.5	4.3~4.6	3.5~4.0
192 kbps	4.5~4.8	-	4.0~4.5

In practice, refer to standard listening procedures like ITU-R BS.1534 (MUSHRA), but it’s safer to design your own MOS survey with service-specific target devices and earphones.

Real-World Use Cases

Streaming Services

Major Platform AAC Usage (indicative figures; services change these over time):

Apple Music: AAC 256 kbps (high quality), AAC 128 kbps (standard)
YouTube: AAC 128~256 kbps (video audio track)
Netflix: AAC 192~640 kbps (up to 5.1 channels)
Spotify: Previously Ogg Vorbis-focused but uses AAC on some platforms

Adaptive Streaming Configuration Example:

# Create multiple bitrate versions
ffmpeg -i source.wav -c:a aac -b:a 256k -ar 48000 audio_256k.m4a
ffmpeg -i source.wav -c:a aac -b:a 192k -ar 48000 audio_192k.m4a
ffmpeg -i source.wav -c:a aac -b:a 128k -ar 48000 audio_128k.m4a
ffmpeg -i source.wav -c:a libfdk_aac -profile:a aac_he -b:a 64k audio_64k.m4a

Client automatically selects appropriate version based on network speed.

Mobile Apps

Both iOS and Android have good basic support for AAC decoding.

Mobile Optimization Strategy:

Offline Cache:
- Wi-Fi: 192 kbps AAC-LC
- Mobile data: 96 kbps HE-AAC v1
Background Playback: Battery savings with hardware decoding
A/B Testing:
- Provide different bitrates to user groups
- Collect churn rate, playback completion rate, user feedback

Implementation Example:

// Quality selection based on measured network throughput (kbps).
// The 5000/2000 kbps thresholds are illustrative.
function selectAudioUrl(networkSpeedKbps) {
  const quality = networkSpeedKbps > 5000 ? 'high' :
                  networkSpeedKbps > 2000 ? 'medium' : 'low';

  const urls = {
    high: '/audio/song_192k.m4a',    // AAC-LC 192 kbps
    medium: '/audio/song_128k.m4a',  // AAC-LC 128 kbps
    low: '/audio/song_64k.m4a'       // HE-AAC v1 64 kbps
  };
  return urls[quality];
}

const audioUrl = selectAudioUrl(3000); // '/audio/song_128k.m4a'

Podcast Production

Podcasts are mainly speech, so low bitrates are sufficient:

Recommended Settings:

# Speech-only podcast (mono)
ffmpeg -i podcast.wav \
  -c:a aac \
  -b:a 64k \
  -ar 44100 \
  -ac 1 \
  podcast_mono.m4a

# Speech + intro/outro music (stereo)
ffmpeg -i podcast.wav \
  -c:a aac \
  -b:a 96k \
  -ar 44100 \
  -ac 2 \
  podcast_stereo.m4a

File Size Comparison (1-hour podcast):

64 kbps mono: ~28 MB
96 kbps stereo: ~43 MB
128 kbps stereo: ~57 MB

VoIP and WebRTC

For real-time voice communication, Opus is more suitable than AAC:

AAC: Large frame size causes latency (20~50 ms)
Opus: Ultra-low latency settings possible (5~10 ms)

AAC is better suited to recorded media files and adaptive streaming than to real-time calls because of its latency and framing characteristics.

Browser Support

HTML5 <audio> tag:

<audio controls>
  <source src="audio.m4a" type="audio/mp4">
  <source src="audio.mp3" type="audio/mpeg">
  Your browser does not support audio.
</audio>

Support Status:

Browser	AAC in MP4	AAC in ADTS
Chrome	✅	⚠️ Limited
Firefox	✅	⚠️ Limited
Safari	✅	✅
Edge	✅	⚠️ Limited

Recommendation: Use MP4 container (.m4a) on web. ADTS support is unstable in some browsers.

AAC in fMP4 (fragmented MP4) in MSE (Media Source Extensions) environment is widely supported. It’s the foundation technology for adaptive streaming like HLS and DASH.

Optimization Tips

Reducing File Size While Maintaining Quality

1. Remove Unnecessary Resampling

Maintaining source sample rate reduces quality loss and processing time:

# Bad example: Unnecessary resampling
ffmpeg -i input_44100.wav -ar 48000 -c:a aac -b:a 128k temp.m4a
ffmpeg -i temp.m4a -ar 44100 output.m4a  # Quality degradation!

# Good example: Maintain source
ffmpeg -i input_44100.wav -c:a aac -b:a 128k output.m4a

2. Utilize HE-AAC

For low-bitrate mobile streams, HE-AAC can be advantageous for perceived quality vs file size:

# Compare AAC-LC 128 kbps vs HE-AAC v1 64 kbps
ffmpeg -i input.wav -c:a aac -b:a 128k lc_128.m4a
ffmpeg -i input.wav -c:a libfdk_aac -profile:a aac_he -b:a 64k he_64.m4a

# File size is half, quality is similar level

Note: Client support verification required. Older devices may not play HE-AAC.

3. Optimize Silent Sections

For content with long silence (lectures, interviews), removing by editing reduces file size:

# Trim leading silence and remove silent gaps longer than 1 s throughout the file
ffmpeg -i input.wav \
  -af "silenceremove=start_periods=1:start_silence=0.1:start_threshold=-50dB:stop_periods=-1:stop_duration=1:stop_threshold=-50dB" \
  -c:a aac -b:a 96k \
  output.m4a

Improving Encoding Speed

1. Use Single-Pass CBR

Faster and more predictable than VBR or multi-pass:

# Fast encoding (CBR)
ffmpeg -i input.wav -c:a aac -b:a 128k -ar 48000 output.m4a

2. CPU Parallelization

Encode multiple files simultaneously:

# Using GNU parallel
find ./wav -name '*.wav' -print0 | \
  parallel -0 -j 4 \
  ffmpeg -y -i {} -c:a aac -b:a 160k {.}.m4a

-j 4: Process 4 files simultaneously (adjust to CPU core count)

3. Hardware Acceleration (Limited for Encoding)

AAC encoding is mostly software-based. Hardware acceleration is extensively supported only for decoding.

Batch Processing Automation

Script example for consistently encoding large numbers of files:

#!/bin/bash
# batch_encode.sh

INPUT_DIR="./source"
OUTPUT_DIR="./encoded"
BITRATE="128k"
SAMPLE_RATE="48000"

mkdir -p "$OUTPUT_DIR"

for file in "$INPUT_DIR"/*.wav; do
  # Skip the literal pattern when the directory has no .wav files
  [ -e "$file" ] || continue
  filename=$(basename "$file" .wav)
  echo "Encoding: $filename"

  ffmpeg -y -i "$file" \
    -c:a aac \
    -b:a "$BITRATE" \
    -ar "$SAMPLE_RATE" \
    -ac 2 \
    -aac_coder twoloop \
    "$OUTPUT_DIR/${filename}.m4a"

  # Quality verification: Generate spectrogram
  ffmpeg -y -i "$OUTPUT_DIR/${filename}.m4a" \
    -lavfi showspectrumpic=s=1920x1080 \
    "$OUTPUT_DIR/${filename}_spectrum.png"
done

echo "Batch encoding complete!"

CI/CD Integration:

# .github/workflows/encode-audio.yml
name: Encode Audio Files

on:
  push:
    paths:
      - 'audio/source/**/*.wav'

jobs:
  encode:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Install FFmpeg
        run: sudo apt-get update && sudo apt-get install -y ffmpeg
      
      - name: Encode AAC
        run: |
          for file in audio/source/*.wav; do
            ffmpeg -i "$file" -c:a aac -b:a 192k "${file%.wav}.m4a"
          done
      
      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: encoded-audio
          path: audio/source/*.m4a

Saving input fingerprint (checksum) and output spectrogram snapshots in CI helps with regression detection when upgrading encoders.
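
A minimal sketch of the fingerprint step: record a SHA-256 checksum for every source WAV in a manifest that can be stored as a CI artifact and compared with `diff` between runs. `write_manifest` and the manifest layout are assumptions for illustration, not an established convention.

```bash
# write_manifest <source_dir> <manifest_file>
# Writes one "<sha256>  <path>" line per .wav file, sorted by path.
write_manifest() {
  if command -v sha256sum >/dev/null 2>&1; then
    hasher="sha256sum"
  else
    hasher="shasum -a 256"   # common fallback where sha256sum is missing
  fi
  find "$1" -name '*.wav' -type f | sort | while IFS= read -r f; do
    $hasher "$f"
  done > "$2"
}
```

If the manifest is unchanged but the spectrogram snapshots differ after an FFmpeg upgrade, the difference comes from the encoder rather than from the inputs.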

Common Problems and Solutions

Compatibility Issues

Problem 1: HE-AAC Won’t Play

Symptom: No sound or error on older devices

Cause: HE-AAC decoder not supported (pre-2010 devices)

Solution:

# Fallback strategy: Also provide AAC-LC version
ffmpeg -i input.wav -c:a aac -b:a 128k fallback_lc.m4a
ffmpeg -i input.wav -c:a libfdk_aac -profile:a aac_he -b:a 64k modern_he.m4a

On web, provide multiple sources in <audio> tag:

<audio controls>
  <source src="audio_he.m4a" type='audio/mp4; codecs="mp4a.40.5"'>
  <source src="audio_lc.m4a" type='audio/mp4; codecs="mp4a.40.2"'>
</audio>

Problem 2: ADTS Files Won’t Play in Browser

Symptom: .aac files fail to play in Chrome/Firefox

Cause: Browsers prefer MP4 container, limited ADTS support

Solution:

# Convert ADTS to MP4 (without re-encoding)
ffmpeg -i input.aac -c:a copy -movflags +faststart output.m4a

-movflags +faststart: Move moov atom to file beginning (streaming optimization)

Problem 3: Video and Audio Out of Sync

Symptom: Video and audio gradually drift apart

Cause: Sample rate mismatch, timestamp errors

Solution:

# Unify the audio sample rate and let the resampler correct timestamp drift
ffmpeg -i video.mp4 -i audio.wav \
  -map 0:v:0 -map 1:a:0 \
  -c:v copy \
  -c:a aac -b:a 192k -ar 48000 \
  -af aresample=async=1 \
  output.mp4

-af aresample=async=1: Stretch or squeeze audio so it matches its timestamps (this filter replaces the deprecated -async 1 option)
-map 0:v:0 -map 1:a:0: Take video from the first input and audio from the second

Audio Quality Degradation

Problem 4: Sound Feels “Broken”

Cause 1: Clipping

Source audio exceeds 0 dBFS causing distortion:

# Adjust gain to secure headroom
ffmpeg -i input.wav \
  -af "volume=-3dB" \
  -c:a aac -b:a 192k \
  output.m4a

Recommended Headroom: -1~-3 dBTP (True Peak)
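
Rather than applying a fixed -3 dB, the gain can be derived from the measured peak. In the sketch below, `max_volume_db` parses the output of FFmpeg's `volumedetect` filter and `gain_for_headroom` computes the adjustment. Both helper names are hypothetical, and `volumedetect` reports sample peak, which can underestimate true peak.

```bash
# Measure the sample peak in dB (requires ffmpeg); prints e.g. "-0.2".
max_volume_db() {
  ffmpeg -hide_banner -i "$1" -af volumedetect -f null - 2>&1 |
    sed -n 's/.*max_volume: \(-\{0,1\}[0-9.]*\) dB.*/\1/p'
}

# gain_for_headroom <measured_peak_db> <target_peak_db>
# Prints the gain in dB that moves the measured peak to the target.
gain_for_headroom() {
  awk -v m="$1" -v t="$2" 'BEGIN { printf "%.1f\n", t - m }'
}

gain_for_headroom -0.2 -3   # -2.8
```

The result can be passed to the volume filter, for example `-af "volume=-2.8dB"`. Apply it only when the computed gain is negative.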

Cause 2: Insufficient Bitrate

Encoding complex music at low bitrate:

# Increase bitrate
ffmpeg -i input.wav -c:a aac -b:a 192k output.m4a  # 128k → 192k

Problem 5: High Frequencies Sound Harsh

Symptom: Hi-hats, cymbals, high vocals sound “hissy”

Cause: High-frequency quantization errors at low bitrate

Solution 1: Increase Bitrate

ffmpeg -i input.wav -c:a aac -b:a 192k output.m4a

Solution 2: Use HE-AAC

# Restore high frequencies with SBR
ffmpeg -i input.wav \
  -c:a libfdk_aac \
  -profile:a aac_he \
  -b:a 96k \
  output.m4a

Solution 3: High-Frequency Pre-emphasis

# Gently boost the high shelf (about +2 dB around 8 kHz) before encoding,
# with a matching gain reduction first to avoid clipping
ffmpeg -i input.wav \
  -af "volume=-2dB,treble=g=2:f=8000" \
  -c:a aac -b:a 128k \
  output.m4a

Problem 6: Stereo Image Loss

Symptom: Weakened left-right separation

Cause: Stereo information loss at low bitrate

Solution:

# Use AAC-LC at a high quality setting; the encoder makes joint-stereo
# (Mid/Side) decisions internally, so no extra flag is needed
ffmpeg -i input.wav \
  -c:a libfdk_aac \
  -profile:a aac_low \
  -vbr 5 \
  output.m4a

License Considerations

AAC may have license issues depending on patent pool and product category.

Major Patent Pools:

Via Licensing: AAC patent pool management
MPEG LA: MPEG-4 related patents

License Scenarios:

Usage Type	License Required
Personal use	Not required
Free app/service	Generally not required (decoder provider bears cost)
Paid app/service	Review needed (depending on revenue scale)
Hardware products	Required (typically handled by decoder chip manufacturer)

FFmpeg Encoder Licenses:

aac (native): FFmpeg license (LGPL/GPL)
libfdk_aac: Fraunhofer FDK AAC license (not GPL-compatible, so FFmpeg builds that include it are generally not redistributable; commercial use may be restricted)

For commercial products and large-scale distribution, conduct legal review and check terms of encoders/decoders in use.

Safe Choice:

Personal/small-scale: Use FFmpeg native aac encoder
Commercial/large-scale: Decide after legal team and license review

Metadata Management

Add metadata (title, artist, album, etc.) to AAC files:

ffmpeg -i input.wav \
  -c:a aac -b:a 192k \
  -metadata title="Song Title" \
  -metadata artist="Artist Name" \
  -metadata album="Album Name" \
  -metadata date="2026" \
  -metadata genre="Pop" \
  output.m4a

Add Album Art:

ffmpeg -i input.wav -i cover.jpg \
  -c:a aac -b:a 192k \
  -c:v copy \
  -disposition:v:0 attached_pic \
  -metadata title="Song Title" \
  output.m4a

Conclusion

Key Summary

AAC is the central codec of MPEG-4 audio, with LC-AAC as the compatibility baseline and HE-AAC having strengths at low bitrates.
Internally uses psychoacoustic model + MDCT-based block coding to reduce bits for perceptually less important information.
In practice, must design sample rate unification, bitrate tiers, and container (MP4/ADTS) together.
Profile selection determined by bitrate and content type: LC for 128 kbps or higher, HE-AAC v1 for 64~96 kbps, HE-AAC v2 or Opus for 48 kbps or below

Recommended Usage Scenarios

Music Streaming:

High quality: AAC-LC 192~256 kbps
Standard: AAC-LC 128~160 kbps
Mobile: HE-AAC v1 64~96 kbps

Podcasts:

Speech only: AAC-LC 64~96 kbps (mono possible)
Speech + music: AAC-LC 96~128 kbps (stereo)

VOD Services:

Premium: AAC-LC 192~256 kbps
Standard: AAC-LC 128~160 kbps
Mobile: HE-AAC v1 64~96 kbps

Adaptive Streaming:

Prepare multiple bitrate AAC versions (256k, 192k, 128k, 64k)
Client automatically selects based on network
MP4 container recommended

Next Steps

Experiment: Test multiple bitrates with your content and find optimal point
Monitor: Analyze user feedback and bandwidth costs
Optimize: Apply different profiles and bitrates by content type
Update: Continue encoder quality improvements with FFmpeg version upgrades

One-Line Summary: AAC is the standard of modern streaming with excellent balance of compression efficiency and compatibility, and the key is selecting bitrate and profile appropriate to content.
