Complete Character Encoding Guide | ASCII, UTF-8, UTF-16, EUC-KR
이 글의 핵심
The principles and differences of the major character encodings: ASCII, ANSI, Unicode, UTF-8, UTF-16, UTF-32, EUC-KR, and CP949. A complete guide, from fixing corrupted Korean text to BOM and endianness, with practical examples.
Introduction: Why Should You Know Character Encoding?
When developing, you experience Korean characters getting corrupted, files being unreadable, or API responses appearing strange. The root cause of all these problems is character encoding.
What This Article Covers:
- History of ASCII, ANSI, Unicode
- UTF-8, UTF-16, UTF-32 encoding methods
- Korean encoding (EUC-KR, CP949)
- BOM, Endian, encoding detection
- Practical problem solving
Reality in Practice
When learning development, everything seems clean and theoretical. But practice is different. You wrestle with legacy code, chase tight deadlines, and face unexpected bugs. The content covered in this article was initially learned as theory, but it was through applying it to actual projects that I realized “Ah, this is why it’s designed this way.”
What stands out in my memory is the trial and error from my first project. I did everything by the book but couldn’t figure out why it wasn’t working, spending days struggling. Eventually, through a senior developer’s code review, I discovered the problem and learned a lot in the process. In this article, I’ll cover not just theory but also the pitfalls you might encounter in practice and how to solve them.
Table of Contents
- History of Character Encoding
- ASCII: 7-bit Character Set
- ANSI and Code Pages
- Unicode: Global Character Integration
- UTF-8: Variable Length Encoding
- UTF-16 and UTF-32
- Korean Encoding (EUC-KR, CP949)
- BOM and Endian
- Practical Problem Solving
- Programming Language-specific Handling
1. History of Character Encoding
Timeline
timeline
title Character Encoding Evolution
1963 : ASCII established\n7-bit, 128 chars
1987 : ISO-8859-1 (Latin-1)\n8-bit, 256 chars
1991 : Unicode 1.0\n16-bit unified charset
1992 : UTF-8 invented\nVariable length encoding
1996 : UTF-16\nSurrogate pairs
2003 : UTF-8 web standardization
2008 : UTF-8 most used\non the web
2020s : UTF-8 ~98% share\non the web
Why Do Multiple Encodings Exist?
flowchart TB
Problem["Problem: Computers\nonly understand numbers"]
ASCII["ASCII\n128 English chars"]
Extended["Extended ASCII\n256 chars per language"]
Unicode["Unicode\nGlobal character integration"]
Problem --> ASCII
ASCII --> Extended
Extended --> Unicode
ASCII --> Issue1["Problem: Cannot express\nKorean, Chinese"]
Extended --> Issue2["Problem: Different\ncode pages per country"]
Unicode --> Solution["Solution: Assign unique\nnumber to all characters"]
2. ASCII: 7-bit Character Set
What is ASCII?
ASCII (American Standard Code for Information Interchange) represents English alphabet, numbers, and special characters with 7 bits (0-127).
ASCII Table
Dec Hex Char | Dec Hex Char | Dec Hex Char
-------------------------------------------------
32 20 Space | 64 40 @ | 96 60 `
33 21 ! | 65 41 A | 97 61 a
34 22 " | 66 42 B | 98 62 b
35 23 # | 67 43 C | 99 63 c
...
48 30 0 | 80 50 P | 112 70 p
49 31 1 | 81 51 Q | 113 71 q
...
57 39 9 | 90 5A Z | 122 7A z
ASCII Control Characters
# Main control characters
NUL = 0x00 # Null
LF = 0x0A # Line Feed (\n)
CR = 0x0D # Carriage Return (\r)
ESC = 0x1B # Escape
DEL = 0x7F # Delete
# Line break methods
# Unix/Linux: LF (\n)
# Windows: CR+LF (\r\n)
# Mac (Classic): CR (\r)
ASCII Examples
# Character → Code
ord('A') # 65
ord('a') # 97
ord('0') # 48
# Code → Character
chr(65) # 'A'
chr(97) # 'a'
# Check ASCII range
def is_ascii(text):
return all(ord(c) < 128 for c in text)
is_ascii("Hello") # True
is_ascii("안녕") # False
3. ANSI and Code Pages
What is ANSI?
What is commonly called “ANSI” (really a family of Windows code pages) extends ASCII to 8 bits (0-255) to support each country’s language. However, what the 128-255 range means differs per Code Page.
Major Code Pages
| Code Page | Name | Region | Features |
|---|---|---|---|
| CP437 | OEM-US | USA | DOS default |
| CP850 | Latin-1 | Western Europe | DOS multilingual |
| CP949 | Extended Wansung (UHC) | Korea | Windows Korean |
| CP932 | Shift-JIS | Japan | Windows Japanese |
| CP936 | GBK | China | Windows Chinese |
| ISO-8859-1 | Latin-1 | Western Europe | Unix/Web |
| ISO-8859-15 | Latin-9 | Western Europe | Euro (€) added |
Code Page Problems
# Same byte value, different meaning
data = bytes([0xC7])
# CP949 (Korean): 0xC7 is only a lead byte, so one byte alone cannot be decoded
try:
    data.decode('cp949')
except UnicodeDecodeError:
    print("❌ CP949 needs a second byte")
# ISO-8859-1 (Latin-1): 'Ç'
print(data.decode('latin-1')) # 'Ç'
# Reading the same bytes with a different encoding corrupts the text!
4. Unicode: Global Character Integration
What is Unicode?
Unicode is a character set that assigns unique Code Points to all characters worldwide.
Unicode Structure
U+0000 ~ U+10FFFF (1,114,112 code points)
U+0000 ~ U+007F : ASCII (128 chars)
U+0080 ~ U+00FF : Latin-1 Supplement
U+0100 ~ U+017F : Latin Extended-A
U+0370 ~ U+03FF : Greek
U+0400 ~ U+04FF : Cyrillic
U+0600 ~ U+06FF : Arabic
U+0E00 ~ U+0E7F : Thai
U+3040 ~ U+309F : Hiragana (Japanese)
U+30A0 ~ U+30FF : Katakana (Japanese)
U+4E00 ~ U+9FFF : CJK Unified Ideographs (Chinese/Japanese/Korean)
U+AC00 ~ U+D7AF : Hangul Syllables (Korean 11,172 chars)
U+1F600 ~ U+1F64F : Emoticons (Emoji)
Korean Unicode Range
# Korean syllables (가-힣)
print(f"가: U+{ord('가'):04X}") # U+AC00
print(f"힣: U+{ord('힣'):04X}") # U+D7A3
# Korean letters (ㄱ-ㅎ, ㅏ-ㅣ)
print(f"ㄱ: U+{ord('ㄱ'):04X}") # U+3131
print(f"ㅎ: U+{ord('ㅎ'):04X}") # U+314E
print(f"ㅏ: U+{ord('ㅏ'):04X}") # U+314F
print(f"ㅣ: U+{ord('ㅣ'):04X}") # U+3163
# Emoji
print(f"😀: U+{ord('😀'):04X}") # U+1F600
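The Hangul syllable block is arithmetically ordered: each syllable sits at 0xAC00 + (lead × 21 + vowel) × 28 + tail. A minimal sketch that decomposes a syllable into its jamo using this formula (displayed as compatibility jamo for readability):

```python
# Jamo tables in Unicode order (compatibility jamo, for display only)
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 lead consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"      # 21 vowels
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # no tail + 27 tails

def decompose(syllable):
    """Split a precomposed syllable (U+AC00..U+D7A3) into lead, vowel, tail."""
    offset = ord(syllable) - 0xAC00
    lead, rest = divmod(offset, 21 * 28)
    vowel, tail = divmod(rest, 28)
    return LEADS[lead], VOWELS[vowel], TAILS[tail]

print(decompose('한'))  # ('ㅎ', 'ㅏ', 'ㄴ')
```

Running it the other way (0xAC00 plus the recombined offset) yields the original syllable, which is exactly what NFC normalization does.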
Unicode vs Encoding
Unicode: Character Set
Assigns number (code point) to each character
Example: '한' = U+D55C
UTF-8/UTF-16/UTF-32: Encoding
Method to convert code points to bytes
Example: U+D55C → UTF-8: ED 95 9C (3 bytes)
→ UTF-16: D5 5C (2 bytes)
5. UTF-8: Variable Length Encoding
What is UTF-8?
UTF-8 encodes Unicode with 1-4 byte variable length. It’s the web standard and perfectly compatible with ASCII.
UTF-8 Encoding Rules
Code Point Range | Bytes | Encoding Pattern
U+0000 ~ U+007F | 1 | 0xxxxxxx
U+0080 ~ U+07FF | 2 | 110xxxxx 10xxxxxx
U+0800 ~ U+FFFF | 3 | 1110xxxx 10xxxxxx 10xxxxxx
U+10000 ~ U+10FFFF | 4 | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
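The table above translates directly into code. A minimal sketch of a single-code-point UTF-8 encoder (in practice you would simply call `str.encode`, but writing it out shows where each bit goes):

```python
def utf8_encode(cp):
    """Encode one Unicode code point to UTF-8 bytes, following the table above."""
    if cp < 0x80:        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | cp >> 6, 0x80 | cp & 0x3F])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | cp >> 12, 0x80 | cp >> 6 & 0x3F, 0x80 | cp & 0x3F])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | cp >> 18, 0x80 | cp >> 12 & 0x3F,
                  0x80 | cp >> 6 & 0x3F, 0x80 | cp & 0x3F])

print(utf8_encode(ord('한')).hex())  # ed959c — same as '한'.encode('utf-8')
```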
UTF-8 Encoding Examples
English ‘A’ (U+0041)
Code Point: U+0041 (65)
Binary: 0100 0001
UTF-8 Encoding:
0100 0001 = 0x41 (1 byte)
Memory: 41
Korean ‘한’ (U+D55C)
Code Point: U+D55C (54,620)
Binary: 1101 0101 0101 1100
UTF-8 Encoding (3 bytes):
1110xxxx 10xxxxxx 10xxxxxx
1110 1101 10 010101 10 011100
E D 9 5 9 C
Memory: ED 95 9C
Emoji ’😀’ (U+1F600)
Code Point: U+1F600 (128,512)
Binary: 0001 1111 0110 0000 0000
UTF-8 Encoding (4 bytes):
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
11110 000 10 011111 10 011000 10 000000
F 0 9 F 9 8 8 0
Memory: F0 9F 98 80
UTF-8 Advantages
flowchart TB
UTF8[UTF-8]
Adv1["✅ ASCII compatible\nEnglish is 1 byte"]
Adv2["✅ Self-synchronizing\nCan read from middle"]
Adv3["✅ Byte order independent\nNo Endian issues"]
Adv4["✅ Web standard\n98% market share"]
UTF8 --> Adv1
UTF8 --> Adv2
UTF8 --> Adv3
UTF8 --> Adv4
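The self-synchronizing property comes from the fact that continuation bytes always match the pattern 10xxxxxx, so you can always tell mid-character bytes apart from start bytes. A small sketch that backs up from an arbitrary byte offset to the start of the character containing it:

```python
def char_start(data, i):
    """Back up from byte offset i to the first byte of the character it falls in.
    Continuation bytes look like 10xxxxxx (top two bits == 10), so skip them."""
    while i > 0 and data[i] & 0xC0 == 0x80:
        i -= 1
    return i

data = "한글".encode('utf-8')   # ed 95 9c ea b8 80
print(char_start(data, 4))     # 3 — byte 4 (0xb8) belongs to '글', which starts at offset 3
```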
UTF-8 Encoding with Python
# String → Bytes
text = "Hello 한글 😀"
# UTF-8 encoding
utf8_bytes = text.encode('utf-8')
print(utf8_bytes)
# b'Hello \xed\x95\x9c\xea\xb8\x80 \xf0\x9f\x98\x80'
# Byte analysis
for i, byte in enumerate(utf8_bytes):
print(f"{i:2d}: 0x{byte:02X} ({byte:3d}) {chr(byte) if byte < 128 else '?'}")
# Output:
# 0: 0x48 ( 72) H
# 1: 0x65 (101) e
# 2: 0x6C (108) l
# 3: 0x6C (108) l
# 4: 0x6F (111) o
# 5: 0x20 ( 32)
# 6: 0xED (237) ? ← '한' start
# 7: 0x95 (149) ?
# 8: 0x9C (156) ?
# 9: 0xEA (234) ? ← '글' start
# 10: 0xB8 (184) ?
# 11: 0x80 (128) ?
# 12: 0x20 ( 32)
# 13: 0xF0 (240) ? ← '😀' start
# 14: 0x9F (159) ?
# 15: 0x98 (152) ?
# 16: 0x80 (128) ?
# Bytes → String
decoded = utf8_bytes.decode('utf-8')
print(decoded) # "Hello 한글 😀"
6. UTF-16 and UTF-32
UTF-16
UTF-16 encodes with 2 or 4 bytes. Used internally in Windows, Java, and JavaScript.
UTF-16 Encoding Rules
Code Point Range | Bytes | Method
U+0000 ~ U+FFFF | 2 | Direct encoding
U+10000 ~ U+10FFFF | 4 | Surrogate pair
Surrogate Pair
# Encode emoji '😀' (U+1F600) to UTF-16
# 1. U+1F600 - 0x10000 = 0xF600
# 2. High 10 bits: 0x3D (61)
# 3. Low 10 bits: 0x200 (512)
# 4. High Surrogate: 0xD800 + 0x3D = 0xD83D
# 5. Low Surrogate: 0xDC00 + 0x200 = 0xDE00
text = "😀"
utf16_bytes = text.encode('utf-16-le')
print(utf16_bytes.hex()) # '3dd800de' (Little-Endian, bytes of D83D DE00 swapped)
# UTF-16 BE (Big-Endian)
utf16_be = text.encode('utf-16-be')
print(utf16_be.hex()) # 'd83dde00'
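The reverse computation, combining a surrogate pair back into a code point, is just the steps above run backwards. A minimal sketch:

```python
def from_surrogates(high, low):
    """Combine a UTF-16 surrogate pair back into a single code point."""
    # Undo the offsets, shift the high 10 bits back, and re-add 0x10000
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

print(hex(from_surrogates(0xD83D, 0xDE00)))  # 0x1f600 — back to '😀'
```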
UTF-16 Example
text = "Hello 한글"
# UTF-16 LE (Little-Endian)
utf16_le = text.encode('utf-16-le')
print(utf16_le.hex())
# 48 00 65 00 6c 00 6c 00 6f 00 20 00 5c d5 00 ae
# UTF-16 BE (Big-Endian)
utf16_be = text.encode('utf-16-be')
print(utf16_be.hex())
# 00 48 00 65 00 6c 00 6c 00 6f 00 20 d5 5c ae 00
UTF-32
UTF-32 encodes all characters with fixed 4-byte length.
text = "A한😀"
# UTF-32 LE
utf32 = text.encode('utf-32-le')
print(utf32.hex())
# 41 00 00 00 5c d5 00 00 00 f6 01 00
# Each character is exactly 4 bytes
# 'A': 0x00000041
# '한': 0x0000D55C
# '😀': 0x0001F600
Encoding Comparison
text = "Hello 한글 😀"
encodings = ['utf-8', 'utf-16-le', 'utf-16-be', 'utf-32-le']
for enc in encodings:
encoded = text.encode(enc)
print(f"{enc:12s}: {len(encoded):2d} bytes | {encoded.hex()[:40]}...")
# Output:
# utf-8       : 17 bytes | 48656c6c6f20ed959ceab88020f09f9880...
# utf-16-le   : 22 bytes | 480065006c006c006f0020005cd500ae20003dd8...
# utf-16-be   : 22 bytes | 00480065006c006c006f0020d55cae000020d83d...
# utf-32-le   : 40 bytes | 41000000650000006c0000006c0000006f000000...
7. Korean Encoding (EUC-KR, CP949)
Korean Encoding History
timeline
title Korean Encoding Evolution
1987 : KS X 1001\nWansung, 2,350 syllables
1992 : EUC-KR\nWansung-based standard
1996 : CP949 (MS)\nExtended Wansung, 11,172 syllables
2000s : UTF-8\nUnicode based
EUC-KR
EUC-KR represents the 2,350 precomposed Hangul syllables of KS X 1001 with 2 bytes each.
# EUC-KR encoding
text = "한글"
euckr_bytes = text.encode('euc-kr')
print(euckr_bytes.hex()) # c7d1 b1db
# '한': 0xC7D1
# '글': 0xB1DB
# Problem: Characters like '똠', '쀍' cannot be represented
try:
"똠".encode('euc-kr')
except UnicodeEncodeError as e:
print(f"❌ Cannot encode to EUC-KR: {e}")
CP949 (Extended Wansung)
CP949 is Microsoft’s superset of EUC-KR that covers all 11,172 modern Hangul syllables.
# CP949 encoding
text = "똠방각하"
cp949_bytes = text.encode('cp949')
print(cp949_bytes.hex())
# Can represent characters not in EUC-KR
text2 = "쀍똠뙠"
print(text2.encode('cp949').hex())
UTF-8 vs EUC-KR Comparison
text = "Hello 한글"
# UTF-8: English 1 byte, Korean 3 bytes
utf8 = text.encode('utf-8')
print(f"UTF-8: {len(utf8)} bytes | {utf8.hex()}")
# UTF-8: 12 bytes | 48656c6c6f20ed959ceab880
# EUC-KR: English 1 byte, Korean 2 bytes
euckr = text.encode('euc-kr')
print(f"EUC-KR: {len(euckr)} bytes | {euckr.hex()}")
# EUC-KR: 10 bytes | 48656c6c6f20c7d1b1db
8. BOM and Endian
BOM (Byte Order Mark)
BOM is a special byte sequence at the start of a file that indicates the encoding and byte order.
Encoding | BOM (hex) | Size
UTF-8 | EF BB BF | 3 bytes
UTF-16 LE | FF FE | 2 bytes
UTF-16 BE | FE FF | 2 bytes
UTF-32 LE | FF FE 00 00 | 4 bytes
UTF-32 BE | 00 00 FE FF | 4 bytes
BOM Example
# UTF-8 with BOM
text = "Hello"
with open('file_with_bom.txt', 'wb') as f:
f.write(b'\xef\xbb\xbf') # BOM
f.write(text.encode('utf-8'))
# File content (hex):
# EF BB BF 48 65 6C 6C 6F
# ^^^^^^^^ BOM
# ^^^^^^^^^^^^^^ "Hello"
# UTF-8 without BOM (recommended)
with open('file_no_bom.txt', 'wb') as f:
f.write(text.encode('utf-8'))
# File content (hex):
# 48 65 6C 6C 6F
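Python also ships a `utf-8-sig` codec that adds the BOM when encoding and strips it when decoding, so the BOM rarely needs to be handled by hand:

```python
# 'utf-8-sig' prepends the BOM on encode and strips it on decode
data = "Hello".encode('utf-8-sig')
print(data.hex())  # efbbbf48656c6c6f — BOM followed by "Hello"

print(data.decode('utf-8-sig'))    # Hello  (BOM removed; also works if no BOM is present)
print(repr(data.decode('utf-8')))  # '\ufeffHello' (plain utf-8 keeps the BOM as a character)
```

Opening files with `encoding='utf-8-sig'` behaves the same way, which makes it a safe default for reading text that may or may not start with a BOM.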
BOM Detection and Removal
def detect_and_remove_bom(data):
"""Detect and remove BOM"""
bom_signatures = [
(b'\xef\xbb\xbf', 'utf-8-sig'),
(b'\xff\xfe\x00\x00', 'utf-32-le'),
(b'\x00\x00\xfe\xff', 'utf-32-be'),
(b'\xff\xfe', 'utf-16-le'),
(b'\xfe\xff', 'utf-16-be'),
]
for bom, encoding in bom_signatures:
if data.startswith(bom):
return data[len(bom):], encoding
return data, None
# Usage
with open('file.txt', 'rb') as f:
data = f.read()
data, encoding = detect_and_remove_bom(data)
if encoding:
print(f"✅ BOM detected: {encoding}")
text = data.decode(encoding.replace('-sig', ''))
else:
print("ℹ️ No BOM, assuming UTF-8")
text = data.decode('utf-8')
Endian (Byte Order)
# Big-Endian: Large byte first
# Little-Endian: Small byte first
# Example: Store 0x1234 in memory
# Big-Endian: 12 34
# Little-Endian: 34 12
# Important in UTF-16
text = "한" # U+D55C
# UTF-16 BE (Big-Endian)
be = text.encode('utf-16-be')
print(be.hex()) # d5 5c
# UTF-16 LE (Little-Endian)
le = text.encode('utf-16-le')
print(le.hex()) # 5c d5
# UTF-8 is byte-based, so Endian independent
utf8 = text.encode('utf-8')
print(utf8.hex()) # ed 95 9c (always same)
9. Practical Problem Solving
Problem 1: Korean Character Corruption (���)
Cause
# ❌ Saved as UTF-8 but read as EUC-KR
with open('file.txt', 'w', encoding='utf-8') as f:
f.write("한글")
# Incorrect reading
with open('file.txt', 'r', encoding='euc-kr') as f:
text = f.read()
print(text) # '���' (corrupted)
Solution
# ✅ Read with correct encoding
with open('file.txt', 'r', encoding='utf-8') as f:
text = f.read()
print(text) # '한글' (correct)
# ✅ Auto-detect encoding
import chardet
with open('file.txt', 'rb') as f:
raw_data = f.read()
result = chardet.detect(raw_data)
encoding = result['encoding']
confidence = result['confidence']
print(f"Detected: {encoding} ({confidence*100:.1f}% confidence)")
text = raw_data.decode(encoding)
print(text)
Problem 2: UnicodeDecodeError
# ❌ Decode with wrong encoding
utf8_bytes = "한글".encode('utf-8')
try:
text = utf8_bytes.decode('ascii')
except UnicodeDecodeError as e:
print(f"❌ {e}")
# 'ascii' codec can't decode byte 0xed in position 0
# ✅ Error handling options
# 1. Ignore
text = utf8_bytes.decode('ascii', errors='ignore')
print(text) # "" (Korean removed)
# 2. Replace
text = utf8_bytes.decode('ascii', errors='replace')
print(text) # "������" (each byte replaced with U+FFFD '�')
# 3. Backslash escapes ('xmlcharrefreplace' works only when encoding, not decoding)
text = utf8_bytes.decode('ascii', errors='backslashreplace')
print(text) # "\xed\x95\x9c\xea\xb8\x80" (original byte values kept visible)
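A fourth handler worth knowing is `errors='surrogateescape'`, which Python itself uses for OS data such as filenames: undecodable bytes become lone surrogates, so the exact original bytes can be recovered on re-encode:

```python
raw = b'Hello \xc7\xd1'  # ASCII text followed by two non-ASCII bytes

# Undecodable bytes map to lone surrogates U+DC80..U+DCFF instead of raising
text = raw.decode('ascii', errors='surrogateescape')
print(repr(text))  # 'Hello \udcc7\udcd1'

# Re-encoding with the same handler restores the exact original bytes
round_trip = text.encode('ascii', errors='surrogateescape')
print(round_trip == raw)  # True
```

This makes it possible to pass byte data through string-based APIs losslessly, at the cost of producing strings that cannot be printed or encoded to UTF-8 without the same handler.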
Problem 3: Korean Character Corruption on Web
import requests
# ❌ Wrong method
response = requests.get('https://example.com/korean-page')
print(response.text) # May be corrupted
# ✅ Check Content-Type header
response = requests.get('https://example.com/korean-page')
content_type = response.headers.get('Content-Type', '')
print(f"Content-Type: {content_type}")
# Content-Type: text/html; charset=euc-kr
# ✅ Decode with correct encoding
if 'euc-kr' in content_type.lower():
text = response.content.decode('euc-kr')
else:
text = response.text # requests auto-detects
# ✅ Or auto-detect with chardet
import chardet
detected = chardet.detect(response.content)
text = response.content.decode(detected['encoding'])
Problem 4: CSV File Encoding
import csv
# ❌ CSV saved from Windows Excel (CP949)
with open('data.csv', 'r', encoding='utf-8') as f:
reader = csv.reader(f)
for row in reader:
print(row) # UnicodeDecodeError!
# ✅ Correct encoding
with open('data.csv', 'r', encoding='cp949') as f:
reader = csv.reader(f)
for row in reader:
print(row)
# ✅ Auto-detect encoding
import chardet
with open('data.csv', 'rb') as f:
raw_data = f.read()
detected = chardet.detect(raw_data)
encoding = detected['encoding']
with open('data.csv', 'r', encoding=encoding) as f:
reader = csv.reader(f)
for row in reader:
print(row)
10. Programming Language-specific Handling
Python
# Default encoding: UTF-8
text = "Hello 한글 😀"
# Encoding
utf8 = text.encode('utf-8')
utf16 = text.encode('utf-16')
# text.encode('euc-kr')  # raises UnicodeEncodeError (emoji not representable in EUC-KR)
# Decoding
text = utf8.decode('utf-8')
# File I/O
with open('file.txt', 'w', encoding='utf-8') as f:
f.write(text)
with open('file.txt', 'r', encoding='utf-8') as f:
text = f.read()
# Byte string literal
utf8_bytes = b'\xed\x95\x9c\xea\xb8\x80'
text = utf8_bytes.decode('utf-8') # "한글"
JavaScript/Node.js
// JavaScript internal: UTF-16
const text = "Hello 한글 😀";
// String length (caution: surrogate pairs)
console.log(text.length); // 11 (😀 counted as 2)
// Correct length
console.log([...text].length); // 10
// UTF-8 encoding (Node.js)
const buffer = Buffer.from(text, 'utf-8');
console.log(buffer); // <Buffer 48 65 6c 6c 6f 20 ...>
// Decoding
const decoded = buffer.toString('utf-8');
console.log(decoded); // "Hello 한글 😀"
// Supported encodings
// utf-8, utf-16le, latin1, base64, hex, ascii
Java
// Java internal: UTF-16
String text = "Hello 한글 😀";
// UTF-8 encoding
byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
System.out.println(Arrays.toString(utf8Bytes));
// Decoding
String decoded = new String(utf8Bytes, StandardCharsets.UTF_8);
System.out.println(decoded);
// File I/O
// Write as UTF-8
Files.writeString(
Path.of("file.txt"),
text,
StandardCharsets.UTF_8
);
// Read as UTF-8
String content = Files.readString(
Path.of("file.txt"),
StandardCharsets.UTF_8
);
C++
#include <iostream>
#include <fstream>
#include <string>
#include <codecvt>
#include <locale>
int main() {
// UTF-8 string (C++11)
std::string utf8_str = u8"Hello 한글 😀";
// UTF-16 string
std::u16string utf16_str = u"Hello 한글 😀";
// UTF-32 string
std::u32string utf32_str = U"Hello 한글 😀";
// UTF-8 → UTF-16 conversion (note: std::wstring_convert/codecvt is deprecated since C++17)
std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> converter;
std::u16string utf16 = converter.from_bytes(utf8_str);
// Write file (UTF-8)
std::ofstream file("file.txt", std::ios::binary);
file << utf8_str;
file.close();
// Read file
std::ifstream input("file.txt", std::ios::binary);
std::string content((std::istreambuf_iterator<char>(input)),
std::istreambuf_iterator<char>());
std::cout << content << std::endl;
return 0;
}
Go
package main
import (
	"fmt"
	"unicode/utf8"

	"golang.org/x/text/encoding/korean"
	"golang.org/x/text/transform"
)
func main() {
// Go internal: UTF-8
text := "Hello 한글 😀"
// Byte length vs character (rune) length
fmt.Println("Bytes:", len(text)) // 17
fmt.Println("Runes:", utf8.RuneCountInString(text)) // 10
	// UTF-8 → EUC-KR conversion ('😀' has no EUC-KR mapping, so convert text without it)
	encoder := korean.EUCKR.NewEncoder()
	euckrBytes, _, err := transform.Bytes(encoder, []byte("Hello 한글"))
	if err != nil {
		fmt.Println("encode error:", err)
	}
fmt.Printf("EUC-KR: %x\n", euckrBytes)
// EUC-KR → UTF-8 conversion
decoder := korean.EUCKR.NewDecoder()
utf8Text, _, _ := transform.String(decoder, string(euckrBytes))
fmt.Println(utf8Text)
}
Advanced Topics
Normalization
import unicodedata
# Two ways to represent Korean '가'
# 1. Composed (NFC): U+AC00
nfc = "가"
print(f"NFC: {len(nfc)} chars, {nfc.encode('utf-8').hex()}")
# NFC: 1 chars, eab080
# 2. Decomposed (NFD): U+1100 + U+1161 (ㄱ + ㅏ)
nfd = unicodedata.normalize('NFD', nfc)
print(f"NFD: {len(nfd)} chars, {nfd.encode('utf-8').hex()}")
# NFD: 2 chars, e18480e185a1
# Comparison
print(nfc == nfd) # False (different byte sequence)
# Compare after normalization
print(unicodedata.normalize('NFC', nfc) ==
unicodedata.normalize('NFC', nfd)) # True
Encoding Detection
import chardet
def detect_encoding(file_path):
"""Auto-detect file encoding"""
with open(file_path, 'rb') as f:
raw_data = f.read()
result = chardet.detect(raw_data)
return {
'encoding': result['encoding'],
'confidence': result['confidence'],
'language': result.get('language', '')
}
# Usage
info = detect_encoding('unknown.txt')
print(f"Encoding: {info['encoding']}")
print(f"Confidence: {info['confidence']*100:.1f}%")
# Read with correct encoding
with open('unknown.txt', 'r', encoding=info['encoding']) as f:
content = f.read()
Encoding Conversion
def convert_file_encoding(input_file, output_file, from_enc, to_enc):
"""Convert file encoding"""
# Read original
with open(input_file, 'r', encoding=from_enc) as f:
content = f.read()
# Save with new encoding
with open(output_file, 'w', encoding=to_enc) as f:
f.write(content)
print(f"✅ Converted: {from_enc} → {to_enc}")
# EUC-KR → UTF-8 conversion
convert_file_encoding('old.txt', 'new.txt', 'euc-kr', 'utf-8')
Encoding in Web Development
HTML
<!DOCTYPE html>
<html>
<head>
<!-- ✅ UTF-8 declaration (required) -->
<meta charset="UTF-8">
<title>Korean Page</title>
</head>
<body>
<h1>안녕하세요</h1>
</body>
</html>
HTTP Headers
from flask import Flask, Response
app = Flask(__name__)
@app.route('/korean')
def korean_page():
content = "<h1>안녕하세요</h1>"
# ✅ Specify charset in Content-Type
return Response(
content,
mimetype='text/html; charset=utf-8'
)
# ❌ Without charset, browser guesses (may corrupt)
JSON
import json
data = {"name": "홍길동", "message": "안녕하세요"}
# JSON is UTF-8 by default
json_str = json.dumps(data, ensure_ascii=False)
print(json_str)
# {"name": "홍길동", "message": "안녕하세요"}
# ensure_ascii=True (default)
json_str_ascii = json.dumps(data, ensure_ascii=True)
print(json_str_ascii)
# {"name": "\ud64d\uae38\ub3d9", "message": "\uc548\ub155\ud558\uc138\uc694"}
URL Encoding
from urllib.parse import quote, unquote
# URL with Korean
text = "한글 검색"
# URL encoding (UTF-8 based)
encoded = quote(text)
print(encoded)
# %ED%95%9C%EA%B8%80%20%EA%B2%80%EC%83%89
# URL decoding
decoded = unquote(encoded)
print(decoded) # "한글 검색"
# Complete URL
url = f"https://example.com/search?q={encoded}"
print(url)
# https://example.com/search?q=%ED%95%9C%EA%B8%80%20%EA%B2%80%EC%83%89
Database Encoding
MySQL
-- Create database (UTF-8)
CREATE DATABASE mydb
CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci;
-- utf8mb4: 4-byte UTF-8 (emoji support)
-- utf8: 3-byte UTF-8 (no emoji, deprecated)
-- Create table
CREATE TABLE users (
id INT PRIMARY KEY,
name VARCHAR(100) CHARACTER SET utf8mb4
);
-- Set encoding on connection
SET NAMES utf8mb4;
PostgreSQL
-- Create database
CREATE DATABASE mydb
ENCODING 'UTF8'
LC_COLLATE 'ko_KR.UTF-8'
LC_CTYPE 'ko_KR.UTF-8';
-- Check client encoding
SHOW client_encoding;
-- Change encoding
SET client_encoding TO 'UTF8';
Python + DB
import psycopg2
# PostgreSQL connection
conn = psycopg2.connect(
host='localhost',
database='mydb',
user='user',
password='pass',
client_encoding='utf8' # ✅ Explicit specification
)
cursor = conn.cursor()
# Insert Korean data
cursor.execute(
"INSERT INTO users (name) VALUES (%s)",
("홍길동",)
)
# Query
cursor.execute("SELECT name FROM users")
name = cursor.fetchone()[0]
print(name) # "홍길동"
Practical Tools
Command Line Tools
# 1. Check encoding with file command
file -i file.txt
# file.txt: text/plain; charset=utf-8
# 2. Convert encoding with iconv
iconv -f EUC-KR -t UTF-8 old.txt > new.txt
# 3. Batch convert multiple files
find . -name "*.txt" -exec iconv -f EUC-KR -t UTF-8 {} -o {}.utf8 \;
# 4. Check bytes with hexdump
echo "한글" | hexdump -C
# 00000000 ed 95 9c ea b8 80 0a
# 5. Remove BOM
tail -c +4 file_with_bom.txt > file_no_bom.txt # UTF-8 BOM (3 bytes)
Python Script
#!/usr/bin/env python3
"""
Batch file encoding conversion tool
"""
import os
import sys
import chardet
from pathlib import Path
def convert_directory(directory, from_enc=None, to_enc='utf-8'):
"""Convert encoding of all text files in directory"""
for file_path in Path(directory).rglob('*.txt'):
try:
# Read original
with open(file_path, 'rb') as f:
raw_data = f.read()
# Detect encoding
if from_enc is None:
detected = chardet.detect(raw_data)
source_enc = detected['encoding']
confidence = detected['confidence']
if confidence < 0.7:
print(f"⚠️ {file_path}: Low confidence ({confidence:.2f})")
continue
else:
source_enc = from_enc
# Skip if already UTF-8
if source_enc.lower().replace('-', '') == 'utf8':
print(f"✓ {file_path}: Already UTF-8")
continue
# Convert
text = raw_data.decode(source_enc)
# Save
with open(file_path, 'w', encoding=to_enc) as f:
f.write(text)
print(f"✅ {file_path}: {source_enc} → {to_enc}")
except Exception as e:
print(f"❌ {file_path}: {e}")
if __name__ == '__main__':
if len(sys.argv) < 2:
print("Usage: python convert_encoding.py <directory>")
sys.exit(1)
convert_directory(sys.argv[1])
Encoding Comparison Table
Storage Space Comparison
text = "Hello 한글 😀"
encodings = {
'ASCII (English only)': 'ascii',
'UTF-8': 'utf-8',
'UTF-16 LE': 'utf-16-le',
'UTF-16 BE': 'utf-16-be',
'UTF-32 LE': 'utf-32-le',
'EUC-KR': 'euc-kr',
'CP949': 'cp949',
}
print(f"Original text: {text}\n")
print(f"{'Encoding':20s} | {'Bytes':6s} | Hex")
print("-" * 60)
for name, enc in encodings.items():
try:
encoded = text.encode(enc)
hex_str = encoded.hex()[:30] + ('...' if len(encoded) > 15 else '')
print(f"{name:20s} | {len(encoded):4d}B | {hex_str}")
except UnicodeEncodeError:
print(f"{name:20s} | {'N/A':6s} | (Cannot encode)")
# Output:
# Original text: Hello 한글 😀
#
# Encoding | Bytes | Hex
# ------------------------------------------------------------
# ASCII (English only) | N/A | (Cannot encode)
# UTF-8 | 17B | 48656c6c6f20ed959ceab88020f09f...
# UTF-16 LE | 22B | 480065006c006c006f0020005cd500...
# UTF-16 BE | 22B | 00480065006c006c006f0020d55cae...
# UTF-32 LE | 40B | 41000000650000006c0000006c0000...
# EUC-KR | N/A | (Cannot encode)
# CP949 | N/A | (Cannot encode)
Feature Comparison
| Encoding | Bytes/Char | ASCII Compatible | Korean Efficiency | Emoji | Main Usage |
|---|---|---|---|---|---|
| ASCII | 1 | ✅ | ❌ | ❌ | English only |
| EUC-KR | 1-2 | ✅ | ✅✅ | ❌ | Korean legacy |
| CP949 | 1-2 | ✅ | ✅✅ | ❌ | Windows Korean |
| UTF-8 | 1-4 | ✅ | ✅ | ✅ | Web, Linux, modern standard |
| UTF-16 | 2-4 | ❌ | ✅✅ | ✅ | Windows, Java internal |
| UTF-32 | 4 | ❌ | ❌ | ✅ | Internal processing |
Real-World Scenarios
Scenario 1: Legacy System Integration
# Problem: Bank API responds with EUC-KR
import requests
response = requests.get('http://legacy-bank-api.com/account')
# ❌ Auto-decode (assumes UTF-8)
# print(response.text) # Corrupted
# ✅ Correct handling
content = response.content # Bytes
text = content.decode('euc-kr')
print(text)
# ✅ Or provide hint to requests
response.encoding = 'euc-kr'
print(response.text)
Scenario 2: Multilingual Application
import locale
import sys
def setup_encoding():
"""Setup system encoding"""
# Check stdout encoding
print(f"stdout encoding: {sys.stdout.encoding}")
# System locale
print(f"System locale: {locale.getpreferredencoding()}")
# Force UTF-8 (Python 3.7+)
if sys.stdout.encoding != 'utf-8':
sys.stdout.reconfigure(encoding='utf-8')
# Handle multilingual text
texts = {
'en': "Hello",
'ko': "안녕하세요",
'ja': "こんにちは",
'zh': "你好",
'ar': "مرحبا",
'ru': "Здравствуйте",
'emoji': "👋🌍"
}
for lang, text in texts.items():
utf8 = text.encode('utf-8')
print(f"{lang:5s}: {text:15s} | {len(utf8):2d} bytes | {utf8.hex()[:30]}")
Scenario 3: File Upload Handling
For uploaded files of unknown origin, read the raw bytes, detect the encoding with chardet, and normalize to UTF-8:
from flask import Flask, request
import chardet
app = Flask(__name__)
@app.route('/upload', methods=['POST'])
def upload_file():
file = request.files['file']
# Read as binary
content = file.read()
# Detect encoding
detected = chardet.detect(content)
encoding = detected['encoding']
confidence = detected['confidence']
print(f"Detected: {encoding} ({confidence*100:.1f}%)")
    # Convert to UTF-8 (chardet may return None if detection fails)
    if encoding and encoding.lower() != 'utf-8':
try:
text = content.decode(encoding)
utf8_content = text.encode('utf-8')
return {
'status': 'converted',
'from': encoding,
'to': 'utf-8',
'content': text
}
except Exception as e:
return {'status': 'error', 'message': str(e)}, 400
return {
'status': 'ok',
'encoding': 'utf-8',
'content': content.decode('utf-8')
}
Best Practices
1. Always Use UTF-8
Declare UTF-8 explicitly at every layer of the stack: file I/O, source files, HTML, HTTP headers, and the database:
# ✅ File I/O
with open('file.txt', 'w', encoding='utf-8') as f:
f.write("한글")
# ✅ Source code encoding declaration (Python 2)
# -*- coding: utf-8 -*-
# ✅ HTML
# <meta charset="UTF-8">
# ✅ HTTP header
# Content-Type: text/html; charset=utf-8
# ✅ Database
# CREATE DATABASE mydb CHARACTER SET utf8mb4;
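The utf8mb4 choice in the last line matters: MySQL's legacy `utf8` charset stores at most 3 bytes per character, while emoji need 4 bytes in UTF-8. A quick check:

```python
# Korean fits in 3 bytes, emoji need 4 -- hence utf8mb4, not MySQL's legacy utf8
print(len("한".encode('utf-8')))   # 3
print(len("😀".encode('utf-8')))  # 4
```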
2. Read in Binary Mode and Decode Explicitly
Reading in binary mode and decoding explicitly protects you from the platform's default encoding:
# ✅ Safe method
with open('file.txt', 'rb') as f:
raw_data = f.read()
# Decode after checking encoding
text = raw_data.decode('utf-8')
# ❌ Risky method (uses system default encoding)
with open('file.txt', 'r') as f: # encoding not specified
text = f.read()
3. Error Handling
When the encoding is unknown, try the likely candidates in order and fall back gracefully:
# ✅ Error handling strategy
def safe_decode(data, encodings=('utf-8', 'cp949', 'euc-kr', 'latin-1')):
    """Try multiple encodings in order and return (text, encoding)"""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Reached only if the candidate list is changed: latin-1 accepts
    # any byte sequence, so it already acts as a last resort above
    return data.decode('utf-8', errors='replace'), 'utf-8'
# Usage
with open('unknown.txt', 'rb') as f:
data = f.read()
text, encoding = safe_decode(data)
print(f"Decoded as {encoding}: {text}")
4. BOM Handling
Python's utf-8-sig codec handles the BOM transparently on read; when writing, prefer plain utf-8 without a BOM:
# ✅ Auto-handle UTF-8 BOM
with open('file.txt', 'r', encoding='utf-8-sig') as f:
text = f.read() # Automatically removes BOM if present
# ✅ Save without BOM (recommended)
with open('file.txt', 'w', encoding='utf-8') as f:
f.write(text)
# ❌ Save with BOM (avoid)
with open('file.txt', 'w', encoding='utf-8-sig') as f:
f.write(text)
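What utf-8-sig actually does can be seen at the byte level (a small sketch):

```python
# 'utf-8-sig' prepends the 3-byte BOM EF BB BF on encode,
# and strips it (when present) on decode
data = "한글".encode('utf-8-sig')
assert data[:3] == b'\xef\xbb\xbf'           # BOM added
assert data[3:] == "한글".encode('utf-8')    # the rest is plain UTF-8
assert data.decode('utf-8-sig') == "한글"    # BOM removed on decode
assert data.decode('utf-8') == '\ufeff한글'  # plain utf-8 keeps it as U+FEFF
```

The last line is exactly the stray invisible character that breaks JSON parsers and shell scripts, which is why saving without a BOM is recommended.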
Problem Solving Checklist
When Korean Characters Are Corrupted
First detect what the file actually contains, then read with that encoding and re-save as UTF-8:
# 1. Check file encoding
import chardet
with open('file.txt', 'rb') as f:
result = chardet.detect(f.read())
print(result)
# 2. Read with correct encoding
with open('file.txt', 'r', encoding='cp949') as f:
text = f.read()
# 3. Re-save as UTF-8
with open('file.txt', 'w', encoding='utf-8') as f:
f.write(text)
When Korean Characters Are Corrupted on Web
On the web, check what requests guessed, compare it against the Content-Type header, and decode explicitly when they disagree:
# 1. Check HTTP header
import requests
response = requests.get('https://example.com')
print(response.encoding) # ISO-8859-1 (wrong guess)
# 2. Set correct encoding
response.encoding = 'utf-8'
print(response.text)
# 3. Check Content-Type header
print(response.headers.get('Content-Type'))
# text/html; charset=euc-kr
# 4. Explicit decoding
text = response.content.decode('euc-kr')
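If the text was already mis-decoded (classic mojibake, e.g. EUC-KR bytes read as Latin-1), it can often still be repaired, because Latin-1 maps every byte value to a code point losslessly:

```python
# Mojibake repair: undo the wrong decode, then decode correctly
garbled = "한글".encode('euc-kr').decode('latin-1')  # renders as 'ÇÑ±Û'
repaired = garbled.encode('latin-1').decode('euc-kr')
assert repaired == "한글"
```

This only works when the wrong codec was lossless; once text has been decoded with errors='replace', the original bytes are gone for good.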
When Korean Characters Are Corrupted in Database
For databases, specify the connection charset explicitly and verify the table's character set matches:
# 1. Check connection encoding
import pymysql
conn = pymysql.connect(
host='localhost',
user='user',
password='pass',
database='mydb',
charset='utf8mb4' # ✅ Explicit specification
)
# 2. Check table encoding
cursor = conn.cursor()
cursor.execute("SHOW CREATE TABLE users")
print(cursor.fetchone())
# 3. Convert encoding
# ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4;
Summary
Encoding Selection Guide
The decision flow below (Mermaid) summarizes which encoding to choose for a new project:
flowchart TD
Start[Start new project] --> Q1{Language?}
Q1 -->|English only| ASCII["ASCII\nor UTF-8"]
Q1 -->|Multilingual| UTF8["✅ UTF-8\nRecommended"]
Q1 -->|Legacy integration| Q2{System?}
Q2 -->|Windows Korean| CP949[CP949]
Q2 -->|Unix Korean| EUCKR[EUC-KR]
Q2 -->|Japanese| SJIS[Shift-JIS]
UTF8 --> Best["✅ Best choice\n- Web standard\n- All characters supported\n- ASCII compatible"]
Core Principles
The core principles boil down to five habits:
# 1. Always use UTF-8
encoding = 'utf-8'
# 2. Specify encoding
with open('file.txt', 'w', encoding='utf-8') as f:
f.write(text)
# 3. Binary mode + explicit decoding
with open('file.txt', 'rb') as f:
data = f.read()
text = data.decode('utf-8')
# 4. Error handling
try:
text = data.decode('utf-8')
except UnicodeDecodeError:
text = data.decode('utf-8', errors='replace')
# 5. Test
assert "한글 😀".encode('utf-8').decode('utf-8') == "한글 😀"
Encoding Summary
| Encoding | Bytes | Advantages | Disadvantages | When to Use |
|---|---|---|---|---|
| UTF-8 | 1-4 | Web standard, ASCII compatible | Korean 3 bytes | All new projects |
| UTF-16 | 2-4 | Korean 2 bytes | ASCII incompatible | Windows/Java internal |
| UTF-32 | 4 | Fixed length | Space waste | Internal processing |
| EUC-KR | 1-2 | Korean 2 bytes | Some Korean unsupported | Legacy systems |
| CP949 | 1-2 | All Korean supported | Windows only | Windows Korean |
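The per-character byte counts behind the table are easy to confirm:

```python
# Byte counts per character, matching the summary table
print(len("한".encode('utf-8')))      # 3: Korean costs 3 bytes in UTF-8
print(len("한".encode('utf-16-le')))  # 2: BMP characters are 2 bytes in UTF-16
print(len("한".encode('euc-kr')))     # 2: legacy Korean encodings use 2 bytes
print(len("😀".encode('utf-16-le')))  # 4: outside the BMP, a surrogate pair
print(len("😀".encode('utf-32-le')))  # 4: UTF-32 is always 4 bytes per char
```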
Debugging Tools
Python Encoding Debugger
This helper inspects a file's BOM, detected encoding, decodability under common codecs, and raw bytes:
import chardet

def analyze_encoding(file_path):
"""Detailed file encoding analysis"""
with open(file_path, 'rb') as f:
raw_data = f.read()
print(f"📄 File: {file_path}")
print(f"📊 Size: {len(raw_data)} bytes\n")
# Check BOM
if raw_data.startswith(b'\xef\xbb\xbf'):
print("🔖 BOM: UTF-8")
elif raw_data.startswith(b'\xff\xfe'):
print("🔖 BOM: UTF-16 LE")
elif raw_data.startswith(b'\xfe\xff'):
print("🔖 BOM: UTF-16 BE")
else:
print("🔖 BOM: None")
# Detect encoding
detected = chardet.detect(raw_data)
print(f"\n🔍 Detected encoding: {detected['encoding']}")
print(f"📈 Confidence: {detected['confidence']*100:.1f}%")
# Try multiple encodings
print("\n🧪 Decoding test:")
encodings = ['utf-8', 'cp949', 'euc-kr', 'utf-16', 'latin-1']
for enc in encodings:
try:
text = raw_data.decode(enc)
preview = text[:50].replace('\n', '\\n')
print(f" ✅ {enc:10s}: {preview}")
except UnicodeDecodeError as e:
print(f" ❌ {enc:10s}: {e}")
# Hex dump (first 100 bytes)
print(f"\n🔢 Hex Dump (first 100 bytes):")
for i in range(0, min(100, len(raw_data)), 16):
hex_str = ' '.join(f'{b:02x}' for b in raw_data[i:i+16])
ascii_str = ''.join(chr(b) if 32 <= b < 127 else '.' for b in raw_data[i:i+16])
print(f" {i:04x}: {hex_str:48s} | {ascii_str}")
# Usage
analyze_encoding('mystery.txt')
References
- Unicode Standard
- UTF-8 Specification (RFC 3629)
- Character Encoding in Python
- The Absolute Minimum Every Software Developer Must Know About Unicode
One-line Summary: Use UTF-8 for all new projects, consider EUC-KR/CP949 only for legacy system integration, and always explicitly specify encoding to prevent Korean character corruption.