Complete AAC Audio Codec Guide | LC-AAC, HE-AAC & FFmpeg Practical Encoding
Key Takeaways
Complete guide to AAC codec profiles (LC-AAC, HE-AAC), MPEG-4 container integration, and FFmpeg encoding options. Learn how to balance quality and file size for streaming and mobile applications.
Introduction
AAC (Advanced Audio Coding) is an MPEG-family lossy compression codec designed as the successor to MP3, aiming to provide better audio quality at the same bitrate. It is a de facto standard in modern streaming and mobile ecosystems such as HLS, DASH, and MP4, where encoder quality and profile selection (LC-AAC, HE-AAC) directly affect perceived audio quality and bandwidth costs.
Major platforms such as YouTube, Netflix, and Apple Music rely heavily on AAC because of its compression efficiency and wide device support. Particularly in mobile environments, hardware-accelerated decoding is widely available, which reduces battery consumption while maintaining high audio quality.
In production, you must simultaneously decide “which profile to use,” “what bitrate in kbps,” and “which FFmpeg encoder options are appropriate.” This article connects understanding codec structure, reproducible FFmpeg examples, and quality vs. file size tradeoffs in one flow.
What You’ll Learn
- Understand AAC’s history, its position in MPEG-2/4, and differences between major profiles (LC-AAC, HE-AAC)
- Grasp the big picture of psychoacoustic model and MDCT-based block coding
- Construct AAC encoding commands with FFmpeg for specific purposes
- Organize bitrate and container selection criteria from streaming and mobile perspectives
- Learn common problems and solutions encountered in practice
Table of Contents
- Codec Overview
- Compression Principles
- Practical Encoding
- Performance Comparison
- Real-World Use Cases
- Optimization Tips
- Common Problems and Solutions
- Conclusion
Codec Overview
History and Development Background
AAC was standardized in 1997 as ISO/IEC 13818-7 (MPEG-2 Part 7) and later integrated and extended into ISO/IEC 14496-3 (MPEG-4 Audio).
The development goal was clear: provide better audio quality than MP3 at the same bitrate. To achieve this, the following elements were improved compared to MP3 (MPEG-1/2 Layer 3):
- More sophisticated filter bank: Improved frequency resolution
- Enhanced critical band division: Closer to human auditory characteristics
- Improved tone and noise modeling: Increased efficiency for both music and speech
- More flexible block size: Adapts to instantaneous sound changes
In commercial services, it was widely adopted in the Apple ecosystem (iTunes, Apple Music, AAC-LC based) and adaptive streaming (AAC in HLS). The iPod’s AAC support in the mid-2000s marked a turning point in popularization, followed by major platforms like YouTube and Netflix adopting AAC as standard.
Technical Features
| Item | Description |
|---|---|
| Compression Method | Lossy perceptual coding: MDCT-family transform, quantization, and lossless Huffman codebooks for the quantized coefficients |
| Sample Rate | 8 kHz~96 kHz supported (depending on profile and implementation), 44.1/48 kHz most common in practice |
| Bitrate | Music: 128~256 kbps (stereo) range is common, HE-AAC advantageous at lower bitrate bands |
| Channels | Expandable from mono to multi-channel (5.1, 7.1, etc.), stereo is typical for streaming |
| Latency | Frame duration is roughly 20~50 ms depending on profile and sample rate; total codec delay is higher once encoder look-ahead and buffering are included |
| Container | MP4 (.m4a), ADTS (.aac), 3GP, MPEG-TS, etc. |
Major Profiles
AAC provides several profiles for different purposes. Each profile sets different tradeoffs between compression efficiency and complexity.
| Profile | Technology | Recommended Bitrate | Main Use |
|---|---|---|---|
| AAC-LC | Basic MDCT + psychoacoustic | 128~256 kbps (stereo) | Music, podcasts, VOD, general purpose |
| HE-AAC v1 | LC + SBR (high-frequency restoration) | 64~96 kbps (stereo) | Mobile streaming, radio |
| HE-AAC v2 | v1 + PS (parametric stereo) | 32~64 kbps (stereo) | Ultra-low bandwidth, speech-focused |
AAC-LC (Low Complexity)
The most widely supported basic profile. Compresses using only MDCT and psychoacoustic model without complex additional technologies:
- Advantages:
  - Playable on almost all devices (smartphones, PCs, cars, smart TVs, etc.)
  - Low decoding load (good battery efficiency)
  - Predictable and stable audio quality
  - Extensive hardware acceleration support
- Disadvantages:
  - At low bitrates (64 kbps or below), audio quality may be inferior to HE-AAC
  - Inefficient in environments requiring extreme bandwidth savings
- Recommended Scenarios:
  - General music streaming (Spotify, Apple Music, etc.)
  - Podcasts (128~160 kbps)
  - VOD services (audio tracks for video)
  - Offline music files
HE-AAC v1 (High-Efficiency AAC)
Adds SBR (Spectral Band Replication) technology to efficiently restore high-frequency components:
- Principle:
  - Only the low-frequency band is coded directly (for example, up to roughly 8 kHz; the crossover depends on bitrate)
  - The decoder rebuilds the high band by replicating the low band and shaping it with compact SBR side information
  - Exploits human hearing being less sensitive to high-frequency details
- Advantages:
  - 64~96 kbps can approach the quality of AAC-LC at 128 kbps
  - Significant bandwidth savings (mobile data cost reduction)
  - Maintains appropriate quality for both music and speech
- Disadvantages:
  - Increased decoding complexity (CPU usage roughly 1.5x)
  - Older devices (pre-2010) may not support it
  - Artifacts are possible in music where high frequencies are important (cymbals, hi-hats)
- Recommended Scenarios:
  - Mobile network environments (3G, slow 4G)
  - Internet radio broadcasting
  - Streaming with bandwidth limitations
HE-AAC v2 (HE-AAC + PS)
Adds Parametric Stereo (PS) to compress stereo information as parameters:
- Principle:
  - Transmit only a mono signal, with stereo position information as parameters
  - Express left-right channel differences as a mathematical model
  - Decoder reconstructs stereo from mono + parameters
- Advantages:
  - Maintains stereo feel even at 32~48 kbps
  - Extreme bandwidth savings (voice call level)
  - Minimized file size
- Disadvantages:
  - More suitable for speech/talk than music
  - Complex music (orchestra, rock) has stereo image distortion
  - Limited device support (mainly recent devices)
- Recommended Scenarios:
  - Voice calls, audiobooks
  - Talk shows, podcasts (without music)
  - Ultra-low bandwidth environments (satellite communication, etc.)
Selection Guide:
flowchart TD
START[Decide Bitrate]
START --> Q1{128 kbps or higher?}
Q1 -->|Yes| LC[Use AAC-LC]
Q1 -->|No| Q2{64~96 kbps?}
Q2 -->|Yes| Q3{Music-focused?}
Q3 -->|Yes| HE1[Use HE-AAC v1]
Q3 -->|No| HE2[Consider HE-AAC v2]
Q2 -->|No| Q4{48 kbps or below?}
Q4 -->|Yes| Q5{Speech only?}
Q5 -->|Yes| HE2
Q5 -->|No| OPUS[Recommend reviewing Opus]
LC --> RESULT1["Best compatibility\nStable quality"]
HE1 --> RESULT2["Bandwidth savings\nMobile optimized"]
HE2 --> RESULT3["Extreme compression\nSpeech specialized"]
OPUS --> RESULT4["Real-time calls\nUltra-low latency"]
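The flowchart can also be written as a small function, which is convenient when the decision has to live in an encoding pipeline. This is only a sketch of the chart as drawn: the function name `choose_profile` and its parameters are illustrative, and bitrates the chart does not cover (49~63 and 97~127 kbps) are reported as such rather than guessed.

```python
def choose_profile(bitrate_kbps, music_focused=True, speech_only=False):
    """Mirror the selection flowchart and return a short recommendation."""
    if bitrate_kbps >= 128:
        return "AAC-LC"
    if 64 <= bitrate_kbps <= 96:
        # Music keeps true stereo with SBR only; speech can tolerate PS.
        return "HE-AAC v1" if music_focused else "HE-AAC v2"
    if bitrate_kbps <= 48:
        return "HE-AAC v2" if speech_only else "Opus"
    # The flowchart has no branch for 49~63 or 97~127 kbps.
    return "not covered by the flowchart"


print(choose_profile(192))
print(choose_profile(64, music_focused=True))
print(choose_profile(32, speech_only=True))
```

For the uncovered ranges, a listening test on representative content is the safer way to pick between the neighboring profiles.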
Compression Principles
Psychoacoustic Model
The core of AAC is the principle that sounds humans cannot hear are not stored.
Human hearing has the following characteristics:
- Simultaneous Masking: When there’s a loud sound, you can’t hear small sounds at similar frequencies
  - Example: When a drum kick sounds, subtle bass guitar vibrations are inaudible
- Temporal Masking: You can’t hear small sounds immediately before or after loud sounds
  - Example: Background noise is not perceived for 20~30 ms right after a cymbal strike
- Frequency Sensitivity Differences: Most sensitive in the 2~5 kHz band, less sensitive toward low and high frequencies
AAC uses this to apply coarser quantization to perceptually less important coefficients and save bits. The strategy is not to reduce data uniformly but to discard first the components that are least likely to be heard.
MDCT (Modified Discrete Cosine Transform)
The AAC family uses an MDCT-based filter bank to convert time-domain signals into frequency coefficients.
MDCT Features:
- Overlap-add structure: Adjacent blocks overlap by 50%, which removes discontinuities at block boundaries
- Variable block length:
  - Long blocks (1024 coefficients): Steady-state signals, excellent frequency resolution
  - Short blocks (128 coefficients): Instantaneous attack sounds (transients), excellent time resolution
- Critical band-based grouping: Groups coefficients according to human auditory frequency resolution
Why use MDCT?
A time-domain signal shows when the sound changes but not which frequencies it contains; a frequency-domain view shows which frequencies are present but blurs when they occur. The MDCT balances the two through its choice of block length, and its overlapping windows cancel block-boundary artifacts.
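To make the overlap-add property concrete, here is a minimal pure-Python sketch of a windowed MDCT and its inverse. It is not how a production encoder computes the transform (those use FFT-based fast algorithms and AAC's own window shapes); the function names and the tiny block size are assumptions chosen for readability. It uses a sine window, which satisfies the condition needed for the aliasing of neighboring blocks to cancel.

```python
import math


def sine_window(n_half):
    """Sine window of length 2*n_half; w[n]^2 + w[n + n_half]^2 == 1."""
    size = 2 * n_half
    return [math.sin(math.pi * (n + 0.5) / size) for n in range(size)]


def _basis(n, k, n_half):
    # Shared cosine kernel of the forward and inverse transform.
    return math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))


def mdct(block, window):
    """2*N windowed time samples -> N frequency coefficients."""
    n_half = len(block) // 2
    return [
        sum(block[n] * window[n] * _basis(n, k, n_half) for n in range(2 * n_half))
        for k in range(n_half)
    ]


def imdct(coeffs, window):
    """N coefficients -> 2*N windowed, time-aliased samples."""
    n_half = len(coeffs)
    return [
        window[n] * (2.0 / n_half)
        * sum(coeffs[k] * _basis(n, k, n_half) for k in range(n_half))
        for n in range(2 * n_half)
    ]


def roundtrip(signal, n_half):
    """Analyse and resynthesise with 50% overlapping blocks."""
    window = sine_window(n_half)
    padded = [0.0] * n_half + list(signal) + [0.0] * n_half
    while len(padded) % n_half:
        padded.append(0.0)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_half + 1, n_half):
        block = padded[start:start + 2 * n_half]
        for i, value in enumerate(imdct(mdct(block, window), window)):
            out[start + i] += value  # overlap-add cancels the aliasing
    return out[n_half:n_half + len(signal)]


signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(32)]
restored = roundtrip(signal, n_half=8)
print(max(abs(a - b) for a, b in zip(signal, restored)))
```

Each block alone reconstructs a distorted signal; only the sum of two overlapping blocks recovers the input, which is why the codec can use half as many coefficients as windowed samples without losing information before quantization.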
Bitrate Allocation Strategy
Internally, bit pool is divided by frequency band, and bits are distributed to tone and noise components according to psychoacoustic weights.
Allocation Process:
- Psychoacoustic analysis: Calculate masking threshold for each frequency band
- Bit budget distribution: Allocate more bits to important bands
- Quantization step decision: Quantize with different precision per band
- Iterative optimization: Minimize distortion while meeting target bitrate
At low bitrates, the following techniques additionally operate:
- Bandwidth limitation: Don’t transmit high frequencies at all (e.g., remove above 16 kHz)
- Perceptual Noise Substitution (PNS): Describe noise-like bands by their energy and let the decoder regenerate them as noise
- TNS (Temporal Noise Shaping): Shape quantization noise over time so that it stays masked around transients
The HE-AAC family encodes high-frequency information as a separate layer with SBR (Spectral Band Replication). In a typical configuration the core coder carries only the lower part of the spectrum (for example, below about 8 kHz), and the decoder rebuilds the upper band (for example, 8~16 kHz) from SBR parameters.
Processing Flow
The diagram below shows the overall AAC encoding pipeline.
flowchart TB
subgraph INPUT[Input Stage]
PCM["PCM Audio\n44.1/48 kHz"]
end
subgraph ANALYSIS[Analysis Stage]
PSY["Psychoacoustic Analysis\nMasking Threshold Calculation"]
MDCT["MDCT Transform\nTime→Frequency Domain"]
end
subgraph ENCODING[Encoding Stage]
QUANT["Quantization\nBit Allocation per Band"]
CODEBOOK["Codebook Selection\nEfficient Representation"]
TNS["TNS Application\nTemporal Noise Shaping"]
end
subgraph OPTIONAL[Optional Technologies]
SBR["SBR\nHigh-frequency Restoration\nHE-AAC v1"]
PS["PS\nParametric Stereo\nHE-AAC v2"]
end
subgraph OUTPUT[Output Stage]
ENTROPY["Entropy Coding\nHuffman etc."]
BITSTREAM["AAC Bitstream\nADTS/MP4"]
end
PCM --> PSY
PCM --> MDCT
PSY --> QUANT
MDCT --> TNS
TNS --> QUANT
QUANT --> CODEBOOK
CODEBOOK --> ENTROPY
PSY -.-> SBR
QUANT -.-> SBR
SBR -.-> ENTROPY
PSY -.-> PS
PS -.-> ENTROPY
ENTROPY --> BITSTREAM
style INPUT fill:#e3f2fd
style ANALYSIS fill:#fff3e0
style ENCODING fill:#f3e5f5
style OPTIONAL fill:#e8f5e9
style OUTPUT fill:#fce4ec
Practical Encoding
FFmpeg Encoder Selection
FFmpeg provides two AAC encoders:
| Encoder | Features | When to Use |
|---|---|---|
| aac | Native encoder, provided by default | General purpose, fast encoding |
| libfdk_aac | High-quality encoder, requires separate build | When highest quality is needed |
Check encoders:
# Check available AAC encoders
ffmpeg -encoders | grep aac
# Example output:
# A..... aac AAC (Advanced Audio Coding)
# A..... libfdk_aac Fraunhofer FDK AAC
Basic Encoding Examples
1. AAC-LC, CBR Stereo 128 kbps
The most basic and stable setting. Suitable for most music streaming.
ffmpeg -i input.wav \
-c:a aac \
-b:a 128k \
-ar 48000 \
-ac 2 \
-aac_coder twoloop \
output.m4a
Option Explanation:
- -c:a aac: Use the native AAC encoder
- -b:a 128k: Bitrate 128 kbps
- -ar 48000: Sample rate 48 kHz (recommended when combining with video)
- -ac 2: Stereo (2 channels)
- -aac_coder twoloop: Two-loop search coder; it is already the default in recent FFmpeg releases, so the flag mainly documents intent
2. AAC-LC, VBR Quality Mode
Encode based on quality rather than fixed bitrate.
# Native aac encoder (VBR-like)
ffmpeg -i input.wav \
-c:a aac \
-q:a 1 \
-ar 44100 \
-ac 2 \
output.m4a
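-q:a Value Guide (native aac):
- Higher values mean higher quality and a higher bitrate (the opposite of the libmp3lame -q:a scale)
- The effective range is roughly 0.1~2; the resulting bitrate depends on the content, so check the output with ffprobe
- FFmpeg treats this VBR mode as experimental; prefer -b:a when a predictable bitrate matters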
3. Using libfdk_aac (High Quality)
If libfdk_aac is installed, you can get better audio quality:
ffmpeg -i input.wav \
-c:a libfdk_aac \
-profile:a aac_low \
-vbr 4 \
-ar 44100 \
-ac 2 \
output.m4a
-vbr Value Guide (libfdk_aac, approximate bitrate per channel; roughly double for stereo):
- 1: ~20~32 kbps
- 2: ~32~40 kbps
- 3: ~48~56 kbps
- 4: ~64~72 kbps
- 5: ~96~112 kbps
4. HE-AAC v1 Encoding
For low-bitrate mobile streaming:
ffmpeg -i input.wav \
-c:a libfdk_aac \
-profile:a aac_he \
-b:a 64k \
-ar 44100 \
-ac 2 \
output.m4a
Note: Native aac encoder does not support HE-AAC. libfdk_aac is required.
5. HE-AAC v2 Encoding
For ultra-low bitrate speech:
ffmpeg -i input.wav \
-c:a libfdk_aac \
-profile:a aac_he_v2 \
-b:a 32k \
-ar 44100 \
-ac 2 \
output.m4a
6. AAC for HLS Streaming
ADTS format for HTTP Live Streaming:
ffmpeg -i input.wav \
-c:a aac \
-b:a 160k \
-ar 48000 \
-ac 2 \
-f adts \
output.aac
ADTS vs MP4:
- ADTS: Header included in each frame, suitable for streaming segments
- MP4: Metadata concentrated at file start/end, suitable for download playback
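Because every ADTS frame carries its own header, a player can start decoding from any frame boundary. The sketch below decodes the main fields of the 7-byte fixed header to show what that per-frame header contains. The function name and the returned dictionary keys are illustrative choices, and the example bytes are a hand-built header rather than data from a real file.

```python
ADTS_SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000,
                     24000, 22050, 16000, 12000, 11025, 8000, 7350]


def parse_adts_header(header: bytes) -> dict:
    """Decode the main fields of a 7-byte ADTS frame header."""
    if len(header) < 7:
        raise ValueError("ADTS header needs at least 7 bytes")
    if header[0] != 0xFF or (header[1] & 0xF0) != 0xF0:
        raise ValueError("missing 0xFFF syncword")
    sr_index = (header[2] >> 2) & 0x0F
    if sr_index >= len(ADTS_SAMPLE_RATES):
        raise ValueError("reserved sampling frequency index")
    return {
        # The 2-bit profile field stores the audio object type minus 1.
        "audio_object_type": ((header[2] >> 6) & 0x03) + 1,  # 2 = AAC-LC
        "sample_rate": ADTS_SAMPLE_RATES[sr_index],
        "channel_config": ((header[2] & 0x01) << 2) | (header[3] >> 6),
        # 13-bit frame length in bytes, header included.
        "frame_length": ((header[3] & 0x03) << 11) | (header[4] << 3) | (header[5] >> 5),
        "has_crc": (header[1] & 0x01) == 0,
    }


# Hand-built example: AAC-LC, 44.1 kHz, stereo, 371-byte frame, no CRC.
print(parse_adts_header(bytes.fromhex("fff150802e7ffc")))
```

The frame length field is what lets a demuxer hop from one header to the next without decoding the audio payload.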
7. Encoding with Video
Re-encode only audio of video file to AAC:
ffmpeg -i input.mp4 \
-c:v copy \
-c:a aac \
-b:a 192k \
-ar 48000 \
output.mp4
-c:v copy: Copy video without re-encoding (fast and no quality loss)
Parameter Tuning Guide
Sample Rate Selection
| Sample Rate | Use | Notes |
|---|---|---|
| 44.1 kHz | CD quality, music | Recommend maintaining if source is 44.1 |
| 48 kHz | Video standard, broadcast | Recommended when combining with video |
| 32 kHz | Low-quality streaming | Only for extreme bandwidth savings |
| 22.05 kHz | Speech, radio | Unsuitable for music |
Resampling Caution: If source is 44.1 kHz and you upsample to 48 kHz then downsample back to 44.1 kHz, there’s risk of aliasing and distortion. Maintain source sample rate when possible.
Bitrate Selection
Music Streaming (stereo basis):
| Bitrate | Quality | Use |
|---|---|---|
| 256 kbps | Near transparent | High-quality streaming, audio files |
| 192 kbps | Excellent | General streaming (Spotify Premium, etc.) |
| 128 kbps | Good | Standard streaming, mobile |
| 96 kbps | Average | Low-bandwidth environment (HE-AAC v1 recommended) |
| 64 kbps | Low | HE-AAC v1 required |
| 48 kbps or below | Speech only | HE-AAC v2 or Opus |
Podcast/Speech:
- Speech only: 64~96 kbps (mono 32~48 kbps also possible)
- Speech + background music: 96~128 kbps
- High-quality interview: 128~160 kbps
Genre-specific Recommended Bitrates:
- Classical, Jazz: 192~256 kbps (reverb and detail important)
- Pop, Rock: 128~192 kbps
- Electronic: 160~192 kbps (high-frequency synthesizers)
- Talk, Audiobooks: 64~96 kbps
Quality vs File Size Tradeoff
Actual File Size Examples (3-minute music, stereo):
| Bitrate | File Size | Perceived Quality |
|---|---|---|
| 320 kbps (MP3 max) | ~7.2 MB | Transparent |
| 256 kbps (AAC) | ~5.8 MB | Nearly transparent |
| 192 kbps (AAC) | ~4.3 MB | Excellent (most satisfied) |
| 128 kbps (AAC) | ~2.9 MB | Good (OK for general listening) |
| 96 kbps (HE-AAC v1) | ~2.2 MB | Average (mobile environment) |
| 64 kbps (HE-AAC v1) | ~1.4 MB | Low (speech-focused) |
In the same listening environment, the step from 128 kbps to 192 kbps usually brings a clearly audible improvement for a modest increase in file size. Going to 256 kbps or higher yields smaller gains, so weigh it against bandwidth and CDN costs.
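The sizes in the table follow directly from bitrate × duration ÷ 8. A short helper makes the arithmetic reusable for other durations; the function name is illustrative, and the result ignores container overhead and uses 1 MB = 10^6 bytes.

```python
def aac_file_size_mb(bitrate_kbps, duration_s):
    """Approximate audio payload size in MB (10^6 bytes), no container overhead."""
    return bitrate_kbps * 1000 * duration_s / 8 / 1_000_000


# 3-minute track at the bitrates from the table above.
for kbps in (320, 256, 192, 128, 96, 64):
    print(kbps, "kbps ->", round(aac_file_size_mb(kbps, 180), 2), "MB")
```

Actual VBR files will deviate from these figures because the encoder spends bits according to content rather than at a fixed rate.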
ABX listening tests are recommended to establish team baselines:
# Encode same source at multiple bitrates
ffmpeg -i source.wav -c:a aac -b:a 128k test_128.m4a
ffmpeg -i source.wav -c:a aac -b:a 192k test_192.m4a
ffmpeg -i source.wav -c:a aac -b:a 256k test_256.m4a
# Check differences with blind test
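When scoring an ABX session, it helps to check whether the number of correct identifications could plausibly be guessing. The sketch below computes a one-sided binomial p-value under the assumption that each trial is an independent 50/50 guess; the function name is illustrative, and the significance threshold a team adopts (0.05 is a common convention) is a separate decision.

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of at least `correct` hits in `trials` by pure guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials


# 12 correct answers out of 16 trials.
print(round(abx_p_value(12, 16), 4))
```

A listener who scores 12 of 16 would be unlikely to do so by chance, while 9 of 16 is entirely consistent with not hearing a difference.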
Performance Comparison
Compression Efficiency vs Other Codecs
General evaluation under the same subjective listening conditions (may vary by content and encoder):
| Bitrate | MP3 | AAC-LC | HE-AAC v1 | Opus |
|---|---|---|---|---|
| 32 kbps | Very low | Low | Average | Excellent (speech) |
| 64 kbps | Low | Average | Good | Excellent |
| 96 kbps | Average | Good | Excellent | Excellent |
| 128 kbps | Good | Excellent | Not needed (use AAC-LC) | Excellent |
| 192 kbps | Excellent | Nearly transparent | Not needed (use AAC-LC) | Nearly transparent |
Conclusion:
- 128 kbps or higher: Clear difference between AAC-LC and MP3, AAC advantageous
- 64~96 kbps: HE-AAC v1 clearly superior to AAC-LC
- 48 kbps or below: Opus often superior to AAC (speech specialized)
Encoding and Decoding Speed
Decoding Performance:
- Hardware Acceleration: Most mobile SoCs have a built-in AAC hardware decoder
  - iPhone: All A-series chips
  - Android: Most Snapdragon, Exynos, MediaTek
  - PC: Usually decoded in software; Intel Quick Sync and AMD VCE accelerate video rather than AAC audio, and the CPU cost of AAC decoding is small
- Battery Efficiency: Hardware decoding can use roughly 1/5~1/10 of the power of software decoding
Encoding Performance (Intel i7-10700K basis, 3-minute music):
| Setting | Encoding Time | Real-time Multiple |
|---|---|---|
| AAC-LC 128k (aac) | ~2s | 90x |
| AAC-LC 192k (aac, twoloop) | ~5s | 36x |
| AAC-LC 192k (libfdk_aac, vbr 5) | ~8s | 22x |
| HE-AAC v1 64k (libfdk_aac) | ~12s | 15x |
The native FFmpeg aac encoder is usually fast enough for both real-time and batch workloads; higher-quality encoders and settings cost more CPU time, as the table shows.
Subjective Quality (MOS)
MOS (Mean Opinion Score) varies by experimental conditions and codec version.
Typical MOS Range (out of 5):
| Bitrate | AAC-LC | HE-AAC v1 | MP3 |
|---|---|---|---|
| 64 kbps | 3.0~3.5 | 3.5~4.0 | 2.5~3.0 |
| 96 kbps | 3.5~4.0 | 4.0~4.3 | 3.0~3.5 |
| 128 kbps | 4.0~4.5 | 4.3~4.6 | 3.5~4.0 |
| 192 kbps | 4.5~4.8 | - | 4.0~4.5 |
In practice, refer to standard listening procedures like ITU-R BS.1534 (MUSHRA), but it’s safer to design your own MOS survey with service-specific target devices and earphones.
Real-World Use Cases
Streaming Services
Major Platform AAC Usage:
- Apple Music: AAC 256 kbps (high quality), AAC 128 kbps (standard)
- YouTube: AAC 128~256 kbps (video audio track)
- Netflix: AAC 192~640 kbps (up to 5.1 channels)
- Spotify: Previously Ogg Vorbis-focused but uses AAC on some platforms
Adaptive Streaming Configuration Example:
# Create multiple bitrate versions
ffmpeg -i source.wav -c:a aac -b:a 256k -ar 48000 audio_256k.m4a
ffmpeg -i source.wav -c:a aac -b:a 192k -ar 48000 audio_192k.m4a
ffmpeg -i source.wav -c:a aac -b:a 128k -ar 48000 audio_128k.m4a
ffmpeg -i source.wav -c:a libfdk_aac -profile:a aac_he -b:a 64k audio_64k.m4a
Client automatically selects appropriate version based on network speed.
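When the ladder grows or changes often, generating the commands from data keeps the rungs consistent. The following sketch only builds argument lists and does not run FFmpeg; the function name, the tuple layout, and the uniform 48 kHz sample rate for every rung are assumptions for illustration, while the flags themselves are the ones used in the commands above.

```python
def ladder_commands(source, ladder, sample_rate=48000):
    """Build one ffmpeg argument list per (kbps, encoder, profile) rung."""
    commands = []
    for kbps, encoder, profile in ladder:
        cmd = ["ffmpeg", "-i", source, "-c:a", encoder]
        if profile:
            cmd += ["-profile:a", profile]  # e.g. aac_he for libfdk_aac
        cmd += ["-b:a", f"{kbps}k", "-ar", str(sample_rate), f"audio_{kbps}k.m4a"]
        commands.append(cmd)
    return commands


LADDER = [
    (256, "aac", None),
    (192, "aac", None),
    (128, "aac", None),
    (64, "libfdk_aac", "aac_he"),
]

for cmd in ladder_commands("source.wav", LADDER):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the output has been reviewed; passing a list rather than a shell string avoids quoting problems with unusual file names.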
Mobile Apps
Both iOS and Android have good basic support for AAC decoding.
Mobile Optimization Strategy:
- Offline Cache:
  - Wi-Fi: 192 kbps AAC-LC
  - Mobile data: 96 kbps HE-AAC v1
- Background Playback: Battery savings with hardware decoding
- A/B Testing:
  - Provide different bitrates to user groups
  - Collect churn rate, playback completion rate, user feedback
Actual Implementation Example (pseudocode):
// Quality selection based on measured network throughput (kbps)
function pickAudioUrl(networkSpeedKbps) {
  const quality = networkSpeedKbps > 5000 ? 'high' :
                  networkSpeedKbps > 2000 ? 'medium' : 'low';
  return {
    high: '/audio/song_192k.m4a',   // AAC-LC 192 kbps
    medium: '/audio/song_128k.m4a', // AAC-LC 128 kbps
    low: '/audio/song_64k.m4a'      // HE-AAC v1 64 kbps
  }[quality];
}
Podcast Production
Podcasts are mainly speech, so low bitrates are sufficient:
Recommended Settings:
# Speech-only podcast (mono)
ffmpeg -i podcast.wav \
-c:a aac \
-b:a 64k \
-ar 44100 \
-ac 1 \
podcast_mono.m4a
# Speech + intro/outro music (stereo)
ffmpeg -i podcast.wav \
-c:a aac \
-b:a 96k \
-ar 44100 \
-ac 2 \
podcast_stereo.m4a
File Size Comparison (1-hour podcast):
- 64 kbps mono: ~29 MB
- 96 kbps stereo: ~43 MB
- 128 kbps stereo: ~57 MB
VoIP and WebRTC
For real-time voice communication, Opus is more suitable than AAC:
- AAC: Large frame size causes latency (20~50 ms)
- Opus: Ultra-low latency settings possible (5~10 ms)
Because of its latency and framing characteristics, AAC is better suited to recorded media files and adaptive streaming than to real-time calls.
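The frame-size part of that latency is simple arithmetic: an AAC-LC frame holds 1024 samples per channel, and an HE-AAC frame corresponds to 2048 output samples because SBR doubles the core sample rate. The helper below is a sketch with an illustrative name; it gives the duration of one frame only, which is a lower bound on codec delay since encoder look-ahead, window overlap, and buffering add more.

```python
def frame_duration_ms(sample_rate_hz, samples_per_frame=1024):
    """Duration of one codec frame in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate_hz


print(round(frame_duration_ms(48000), 2))        # AAC-LC at 48 kHz
print(round(frame_duration_ms(44100), 2))        # AAC-LC at 44.1 kHz
print(round(frame_duration_ms(48000, 2048), 2))  # HE-AAC at 48 kHz output
```

These values line up with the 20~50 ms range quoted above, which is why a codec built around 1024-sample frames is hard to use for conversational audio.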
Browser Support
HTML5 <audio> tag:
<audio controls>
<source src="audio.m4a" type="audio/mp4">
<source src="audio.mp3" type="audio/mpeg">
Your browser does not support audio.
</audio>
Support Status:
| Browser | AAC in MP4 | AAC in ADTS |
|---|---|---|
| Chrome | ✅ | ⚠️ Limited |
| Firefox | ✅ | ⚠️ Limited |
| Safari | ✅ | ✅ |
| Edge | ✅ | ⚠️ Limited |
Recommendation: Use MP4 container (.m4a) on web. ADTS support is unstable in some browsers.
AAC in fMP4 (fragmented MP4) in MSE (Media Source Extensions) environment is widely supported. It’s the foundation technology for adaptive streaming like HLS and DASH.
Optimization Tips
Reducing File Size While Maintaining Quality
1. Remove Unnecessary Resampling
Maintaining source sample rate reduces quality loss and processing time:
# Bad example: Unnecessary resampling
ffmpeg -i input_44100.wav -ar 48000 -c:a aac -b:a 128k temp.m4a
ffmpeg -i temp.m4a -ar 44100 output.m4a # Quality degradation!
# Good example: Maintain source
ffmpeg -i input_44100.wav -c:a aac -b:a 128k output.m4a
2. Utilize HE-AAC
For low-bitrate mobile streams, HE-AAC can be advantageous for perceived quality vs file size:
# Compare AAC-LC 128 kbps vs HE-AAC v1 64 kbps
ffmpeg -i input.wav -c:a aac -b:a 128k lc_128.m4a
ffmpeg -i input.wav -c:a libfdk_aac -profile:a aac_he -b:a 64k he_64.m4a
# File size is half, quality is similar level
Note: Client support verification required. Older devices may not play HE-AAC.
3. Optimize Silent Sections
For content with long silence (lectures, interviews), trimming it reduces file size. The filter settings below trim silence at the start of the file only:
# Trim leading silence with the silenceremove filter (start_periods=1 leaves later pauses untouched)
ffmpeg -i input.wav \
-af "silenceremove=start_periods=1:start_silence=0.1:start_threshold=-50dB" \
-c:a aac -b:a 96k \
output.m4a
Improving Encoding Speed
1. Use Single-Pass CBR
Faster and more predictable than VBR or multi-pass:
# Fast encoding (CBR)
ffmpeg -i input.wav -c:a aac -b:a 128k -ar 48000 output.m4a
2. CPU Parallelization
Encode multiple files simultaneously:
# Using GNU parallel
find ./wav -name '*.wav' -print0 | \
parallel -0 -j 4 \
ffmpeg -y -i {} -c:a aac -b:a 160k {.}.m4a
-j 4: Process 4 files simultaneously (adjust to CPU core count)
3. Hardware Acceleration (Limited for Encoding)
AAC encoding is mostly software-based. Hardware acceleration is extensively supported only for decoding.
Batch Processing Automation
Script example for consistently encoding large numbers of files:
#!/bin/bash
# batch_encode.sh: encode every WAV in INPUT_DIR to AAC-LC and render a spectrogram
INPUT_DIR="./source"
OUTPUT_DIR="./encoded"
BITRATE="128k"
SAMPLE_RATE="48000"
mkdir -p "$OUTPUT_DIR"
shopt -s nullglob  # skip the loop entirely when no .wav files match
for file in "$INPUT_DIR"/*.wav; do
  filename=$(basename "$file" .wav)
  echo "Encoding: $filename"
  # -y overwrites existing outputs without prompting
  ffmpeg -y -i "$file" \
    -c:a aac \
    -b:a "$BITRATE" \
    -ar "$SAMPLE_RATE" \
    -ac 2 \
    -aac_coder twoloop \
    "$OUTPUT_DIR/${filename}.m4a"
  # Quality verification: Generate spectrogram
  ffmpeg -y -i "$OUTPUT_DIR/${filename}.m4a" \
    -lavfi showspectrumpic=s=1920x1080 \
    "$OUTPUT_DIR/${filename}_spectrum.png"
done
echo "Batch encoding complete!"
CI/CD Integration:
# .github/workflows/encode-audio.yml
name: Encode Audio Files
on:
  push:
    paths:
      - 'audio/source/**/*.wav'
jobs:
  encode:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install FFmpeg
        run: sudo apt-get update && sudo apt-get install -y ffmpeg
      - name: Encode AAC
        run: |
          for file in audio/source/*.wav; do
            ffmpeg -y -i "$file" -c:a aac -b:a 192k "${file%.wav}.m4a"
          done
      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: encoded-audio
          path: audio/source/*.m4a
Saving input fingerprint (checksum) and output spectrogram snapshots in CI helps with regression detection when upgrading encoders.
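A fingerprint of each input can be as simple as a SHA-256 digest stored next to the encoded output. The sketch below uses only the standard library; the function name and the chunk size are illustrative, and where the digests are stored (a JSON manifest, a build artifact) is left to the pipeline.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the empty read that marks EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the input digest is unchanged but the output or its spectrogram differs after an FFmpeg upgrade, the encoder rather than the source is the likely cause.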
Common Problems and Solutions
Compatibility Issues
Problem 1: HE-AAC Won’t Play
Symptom: No sound or error on older devices
Cause: HE-AAC decoder not supported (pre-2010 devices)
Solution:
# Fallback strategy: Also provide AAC-LC version
ffmpeg -i input.wav -c:a aac -b:a 128k fallback_lc.m4a
ffmpeg -i input.wav -c:a libfdk_aac -profile:a aac_he -b:a 64k modern_he.m4a
On web, provide multiple sources in <audio> tag:
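<audio controls>
<source src="audio_he.m4a" type='audio/mp4; codecs="mp4a.40.5"'>
<source src="audio_lc.m4a" type='audio/mp4; codecs="mp4a.40.2"'>
</audio>
The codecs parameter (mp4a.40.5 for HE-AAC, mp4a.40.2 for AAC-LC) lets the browser skip a source it cannot decode; with a bare audio/mp4 type it cannot tell the two files apart before loading them.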
Problem 2: ADTS Files Won’t Play in Browser
Symptom: .aac files fail to play in Chrome/Firefox
Cause: Browsers prefer MP4 container, limited ADTS support
Solution:
# Convert ADTS to MP4 (without re-encoding)
ffmpeg -i input.aac -c:a copy -movflags +faststart output.m4a
-movflags +faststart: Move moov atom to file beginning (streaming optimization)
Problem 3: Video and Audio Out of Sync
Symptom: Video and audio gradually drift apart
Cause: Sample rate mismatch, timestamp errors
Solution:
# Mux the video with the new audio track at a unified 48 kHz sample rate
ffmpeg -i video.mp4 -i audio.wav \
-map 0:v:0 -map 1:a:0 \
-c:v copy \
-c:a aac -b:a 192k -ar 48000 \
-af "aresample=async=1:first_pts=0" \
output.mp4
-map 0:v:0 -map 1:a:0: Take video from the first input and audio from the second
-af "aresample=async=1:first_pts=0": Pad or trim the audio so its timestamps line up with the start of the stream (this filter replaces the deprecated -async 1 option)
Audio Quality Degradation
Problem 4: Sound Feels “Broken”
Cause 1: Clipping
Source audio exceeds 0 dBFS causing distortion:
# Adjust gain to secure headroom
ffmpeg -i input.wav \
-af "volume=-3dB" \
-c:a aac -b:a 192k \
output.m4a
Recommended Headroom: -1~-3 dBTP (True Peak)
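The relationship between a dB setting and the actual scale factor is worth making explicit when choosing a gain value. The sketch below converts dB to a linear factor and estimates how much gain change brings the sample peak to a target level. The function names are illustrative, and the code measures sample peak only: true peak (dBTP) requires oversampling and can be higher, so the result is an optimistic estimate.

```python
import math


def db_to_gain(db):
    """Convert a level change in dB to a linear amplitude factor."""
    return 10 ** (db / 20)


def peak_dbfs(samples):
    """Sample peak in dBFS for floats where 1.0 is full scale."""
    peak = max(abs(s) for s in samples)
    return float("-inf") if peak == 0 else 20 * math.log10(peak)


def gain_for_headroom(samples, target_dbfs=-3.0):
    """dB of gain change that moves the sample peak to target_dbfs."""
    return target_dbfs - peak_dbfs(samples)


samples = [0.5, -1.0, 0.25]            # hypothetical full-scale peak
print(round(db_to_gain(-3.0), 4))      # linear factor for volume=-3dB
print(gain_for_headroom(samples))      # dB to apply for -3 dBFS peak
```

A volume=-3dB filter therefore multiplies every sample by about 0.71.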
Cause 2: Insufficient Bitrate
Encoding complex music at low bitrate:
# Increase bitrate
ffmpeg -i input.wav -c:a aac -b:a 192k output.m4a # 128k → 192k
Problem 5: High Frequencies Sound Harsh
Symptom: Hi-hats, cymbals, high vocals sound “hissy”
Cause: High-frequency quantization errors at low bitrate
Solution 1: Increase Bitrate
ffmpeg -i input.wav -c:a aac -b:a 192k output.m4a
Solution 2: Use HE-AAC
# Restore high frequencies with SBR
ffmpeg -i input.wav \
-c:a libfdk_aac \
-profile:a aac_he \
-b:a 96k \
output.m4a
Solution 3: High-Frequency Pre-emphasis
# Boost content above ~8 kHz by 2 dB with a high-shelf filter, leaving lower frequencies untouched
# (check peak levels afterward, since any boost can push the signal into clipping)
ffmpeg -i input.wav \
-af "treble=g=2:f=8000" \
-c:a aac -b:a 128k \
output.m4a
Problem 6: Stereo Image Loss
Symptom: Weakened left-right separation
Cause: Stereo information loss at low bitrate
Solution:
# Raise VBR quality so the encoder has more bits for stereo information (joint-stereo decisions such as mid/side are made internally by the encoder)
ffmpeg -i input.wav \
-c:a libfdk_aac \
-profile:a aac_low \
-vbr 5 \
output.m4a
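For readers unfamiliar with mid/side (M/S) coding, the transform itself is tiny: the encoder codes the sum and difference of the channels instead of left and right. The sketch below shows the lossless transform and its inverse on plain lists; the function names are illustrative, and a real encoder applies this per frequency band and only where it saves bits.

```python
def to_mid_side(left, right):
    """Convert left/right samples to mid (sum) and side (difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Recover left/right from mid/side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [0.5, 0.25, -0.75]
right = [0.5, 0.125, -0.5]
mid, side = to_mid_side(left, right)
print(side)  # small values when the channels are similar
print(from_mid_side(mid, side) == (left, right))
```

When the two channels are similar, the side signal is close to zero and cheap to code; stereo width suffers once the bit budget is too small to preserve that side signal accurately.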
License Considerations
AAC may have license issues depending on patent pool and product category.
Major Patent Pools:
- Via Licensing: AAC patent pool management
- MPEG LA: MPEG-4 related patents
License Scenarios:
| Usage Type | License Required |
|---|---|
| Personal use | Not required |
| Free app/service | Generally not required (decoder provider bears cost) |
| Paid app/service | Review needed (depending on revenue scale) |
| Hardware products | Required (typically handled by decoder chip manufacturer) |
FFmpeg Encoder Licenses:
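- aac (native): FFmpeg license (LGPL/GPL)
- libfdk_aac: Fraunhofer FDK AAC license; it is not GPL-compatible, so FFmpeg builds that enable it (--enable-nonfree) cannot be redistributed, and commercial use may be restricted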
For commercial products and large-scale distribution, conduct legal review and check terms of encoders/decoders in use.
Safe Choice:
- Personal/small-scale: Use FFmpeg native aac encoder
- Commercial/large-scale: Decide after legal team and license review
Metadata Management
Add metadata (title, artist, album, etc.) to AAC files:
ffmpeg -i input.wav \
-c:a aac -b:a 192k \
-metadata title="Song Title" \
-metadata artist="Artist Name" \
-metadata album="Album Name" \
-metadata date="2026" \
-metadata genre="Pop" \
output.m4a
Add Album Art:
ffmpeg -i input.wav -i cover.jpg \
-c:a aac -b:a 192k \
-c:v copy \
-disposition:v:0 attached_pic \
-metadata title="Song Title" \
output.m4a
Conclusion
Key Summary
- AAC is the central codec of MPEG-4 audio, with LC-AAC as the compatibility baseline and HE-AAC having strengths at low bitrates.
- Internally uses psychoacoustic model + MDCT-based block coding to reduce bits for perceptually less important information.
- In practice, must design sample rate unification, bitrate tiers, and container (MP4/ADTS) together.
- Profile selection determined by bitrate and content type: LC for 128 kbps or higher, HE-AAC v1 for 64~96 kbps, HE-AAC v2 or Opus for 48 kbps or below
Recommended Usage Scenarios
Music Streaming:
- High quality: AAC-LC 192~256 kbps
- Standard: AAC-LC 128~160 kbps
- Mobile: HE-AAC v1 64~96 kbps
Podcasts:
- Speech only: AAC-LC 64~96 kbps (mono possible)
- Speech + music: AAC-LC 96~128 kbps (stereo)
VOD Services:
- Premium: AAC-LC 192~256 kbps
- Standard: AAC-LC 128~160 kbps
- Mobile: HE-AAC v1 64~96 kbps
Adaptive Streaming:
- Prepare multiple bitrate AAC versions (256k, 192k, 128k, 64k)
- Client automatically selects based on network
- MP4 container recommended
Next Steps
- Experiment: Test multiple bitrates with your content and find optimal point
- Monitor: Analyze user feedback and bandwidth costs
- Optimize: Apply different profiles and bitrates by content type
- Update: Re-test after FFmpeg upgrades, since encoder quality continues to improve
One-Line Summary: AAC is the standard of modern streaming with excellent balance of compression efficiency and compatibility, and the key is selecting bitrate and profile appropriate to content.