TCP: Complete Guide | Handshake, Flow & Congestion Control, Sockets
Key takeaways
TCP delivers a reliable byte stream with flow and congestion control—socket options and kernel behavior drive service stability.
Introduction
TCP (Transmission Control Protocol) is the default reliable transport on the internet—HTTP/HTTPS, SSH, most database protocols, and many microservice RPCs run over it. Connection state, retransmissions, and ordering “just work” because of kernel TCP and the socket API.
Yet latency, throughput, and concurrent connections per server depend heavily on settings and traffic patterns. This article maps RFCs and kernel behavior to socket options and common incidents.
After reading this post
- Explain 3-way handshake, 4-way close, and state transitions
- Separate flow control (rwnd) from congestion control (cwnd, Reno/CUBIC, …)
- Apply basic socket patterns in C++, Python, and JavaScript
- Respond to TIME_WAIT, Nagle, and keepalive in operations
Table of contents
- Protocol overview
- How it works
- Hands-on programming
- Performance characteristics
- Real-world use cases
- Optimization tips
- Common problems
- Wrap-up
Protocol overview
History and background
TCP evolved from 1974 work by Vint Cerf and Bob Kahn; RFC 793 (1981) was the classic reference. Later additions include SACK, window scaling, timestamps, and congestion-control updates, consolidated in RFC 9293 (2022). Implementations differ: BSD Reno, Linux default CUBIC, etc.
OSI placement
TCP is layer 4 (transport). IP (layer 3) delivers packets between hosts; TCP provides a reliable byte stream between ports (processes). Segmentation, retransmission, and ordering are TCP’s job.
Core properties
| Property | Description |
|---|---|
| Connection-oriented | Logical connection and state machine before bulk data. |
| Reliable | Retransmits on loss; sequence numbers fix reordering. |
| Ordered delivery | Application reads bytes in send order (no message boundaries). |
| Flow control | rwnd matches sender rate to receiver buffer. |
| Congestion control | Reacts to network congestion—cwnd and algorithm. |
| Full duplex | Both directions on one connection (implemented with buffers and ACK interplay). |
How it works
3-way handshake
SYN → SYN-ACK → ACK. The client sends an ISN, the server responds with its ISN and ACK, the client ACKs—then ESTABLISHED.
sequenceDiagram
participant C as Client
participant S as Server
C->>S: SYN, seq=x
S->>C: SYN-ACK, seq=y, ack=x+1
C->>S: ACK, seq=x+1, ack=y+1
Note over C,S: ESTABLISHED
4-way termination
Each direction closes with FIN/ACK. The side that sends the last ACK enters TIME_WAIT briefly so late duplicates do not collide with a new connection on the same quad.
sequenceDiagram
participant A as Host A
participant B as Host B
A->>B: FIN
B->>A: ACK
B->>A: FIN
A->>B: ACK
Note over A: TIME_WAIT (2MSL)
Flow control (sliding window)
rwnd advertises how much more data the receiver can buffer. The sender keeps its unacknowledged bytes within rwnd, which protects the receiver from overflow.
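The sender-side arithmetic can be sketched with illustrative numbers (the byte counts below are made up for the example):

```python
# Sender-side sliding-window arithmetic (illustrative values).
rwnd = 65_535          # receiver-advertised window, bytes
bytes_sent = 120_000   # highest byte offset sent so far
bytes_acked = 80_000   # highest byte offset acknowledged

in_flight = bytes_sent - bytes_acked   # unacknowledged bytes on the wire
can_send = max(0, rwnd - in_flight)    # room left before rwnd is exhausted

print(in_flight, can_send)  # 40000 in flight, 25535 more allowed
```

When `can_send` reaches zero the sender stalls until new ACKs (or a window update) arrive.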
Congestion control
Congestion control reacts to router queues and drops to protect the whole network. Reno includes slow start, congestion avoidance, fast retransmit/recovery. Linux CUBIC adjusts window growth on high-BDP links. Exact parameters depend on kernel version and sysctl.
flowchart TB
subgraph fc [Flow control]
R[Receive buffer rwnd]
end
subgraph cc [Congestion control]
CWND[cwnd / algorithm]
end
SEND[Data actually sent]
R --> SEND
CWND --> SEND
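A toy Reno-style simulation of the two growth phases, assuming 1,460-byte segments, a fixed ssthresh, and no loss; real kernels add fast retransmit/recovery, pacing, and algorithm-specific curves such as CUBIC:

```python
# Toy Reno-style cwnd growth: slow start doubles per RTT until ssthresh,
# then congestion avoidance adds one segment per RTT. Loss handling omitted.
MSS = 1460               # segment size in bytes (assumption)
ssthresh = 16 * MSS      # slow-start threshold (assumption)

cwnd = 1 * MSS
history = []
for rtt in range(10):
    history.append(cwnd)
    if cwnd < ssthresh:
        cwnd *= 2        # slow start: exponential growth
    else:
        cwnd += MSS      # congestion avoidance: linear growth

# At any moment the sender is limited by min(rwnd, cwnd).
rwnd = 64 * MSS
effective_window = min(rwnd, cwnd)
print([c // MSS for c in history])  # [1, 2, 4, 8, 16, 17, 18, 19, 20, 21]
```

The `min(rwnd, cwnd)` line is the diagram above in one expression: both windows feed into what is actually sent.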
Hands-on programming
Educational minimal examples—production needs async I/O, TLS, pools, and logging.
C++ (Berkeley sockets)
// g++ -std=c++17 -O2 tcp_client.cpp -o tcp_client
#include <arpa/inet.h>
#include <cstring>
#include <iostream>
#include <string>
#include <sys/socket.h>
#include <unistd.h>
int main(int argc, char* argv[]) {
    const char* host = argc > 1 ? argv[1] : "127.0.0.1";
    const uint16_t port = argc > 2 ? static_cast<uint16_t>(std::stoi(argv[2])) : 8080;
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, host, &addr.sin_addr) != 1) {
        std::cerr << "inet_pton failed\n";
        close(fd);
        return 1;
    }
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        perror("connect");
        close(fd);
        return 1;
    }
    const std::string msg = "ping\n";
    ssize_t n = send(fd, msg.data(), msg.size(), 0);
    if (n < 0) { perror("send"); close(fd); return 1; }
    char buf[4096];
    n = recv(fd, buf, sizeof(buf) - 1, 0);
    if (n < 0) { perror("recv"); close(fd); return 1; }
    if (n == 0) std::cout << "peer closed\n";
    else { buf[n] = '\0'; std::cout << buf; }
    close(fd);
    return 0;
}
- Check every socket/connect/send/recv return value. send and recv may transfer fewer bytes than requested; loop until the full buffer is handled in real code.
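The looping that the bullet above asks for looks like this in Python (recv_exact is a hypothetical helper name, not a library function):

```python
import socket

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping over partial recv() results."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:  # peer closed the connection mid-message
            raise ConnectionError(f"peer closed before {n} bytes arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

The same pattern applies to send on platforms without sendall: loop until every byte is written.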
Python 3
#!/usr/bin/env python3
import socket
import sys
def main() -> None:
    host = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 8080
    with socket.create_connection((host, port), timeout=10.0) as sock:
        sock.sendall(b"ping\n")
        data = sock.recv(4096)
        if not data:
            print("peer closed")
        else:
            print(data.decode("utf-8", errors="replace"))

if __name__ == "__main__":
    main()
JavaScript (Node.js)
// node tcp_client.mjs
import net from "node:net";
const host = process.argv[2] ?? "127.0.0.1";
const port = Number(process.argv[3] ?? 8080);
const socket = net.createConnection({ host, port });
socket.setTimeout(10_000);
socket.on("timeout", () => socket.destroy(new Error("idle timeout")));
socket.on("connect", () => {
  socket.write("ping\n");
});
socket.on("data", (chunk) => {
  process.stdout.write(chunk);
});
socket.on("error", (err) => {
  console.error(err.message);
  process.exitCode = 1;
});
Timeouts and errors
- Connect timeout: limit long connects with OS options or async wrappers.
- Read timeout: SO_RCVTIMEO, socket.setTimeout, sock.settimeout, … to avoid a recv that blocks forever.
- Retries: safest for idempotent application requests only.
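One way to combine a connect timeout, a read timeout, and a bounded retry, sketched in Python; fetch_once, the attempt count, and the backoff values are illustrative, and the pattern is only safe because the request is assumed idempotent:

```python
import socket
import time

def fetch_once(host: str, port: int, payload: bytes, timeout: float = 5.0) -> bytes:
    # create_connection applies the timeout to connect(); settimeout covers recv().
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        sock.sendall(payload)
        return sock.recv(4096)

def fetch_with_retry(host: str, port: int, payload: bytes, attempts: int = 3) -> bytes:
    # Retrying may re-execute the request: only do this for idempotent operations.
    for attempt in range(attempts):
        try:
            return fetch_once(host, port, payload)
        except (socket.timeout, OSError):
            if attempt == attempts - 1:
                raise
            time.sleep(0.1 * (2 ** attempt))  # simple exponential backoff
```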
Performance characteristics
Latency
RTT caps throughput on many workloads. Request–response exchanges with small messages often pay a full RTT each, making them sensitive to packet count, ACK timing, and Nagle.
Throughput
If window sizes (receive + congestion) fall short of BDP (bandwidth × delay), you underfill the pipe. Window scaling, buffer tuning, and application I/O matter on fast long-haul links.
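The arithmetic behind that claim, with illustrative numbers (a 100 Mbit/s link and 80 ms RTT are assumptions for the example):

```python
# Bandwidth-delay product: bytes that must be in flight to fill the pipe.
bandwidth_bps = 100_000_000   # 100 Mbit/s link (assumption)
rtt_ms = 80                   # 80 ms round-trip time (assumption)

bdp_bytes = bandwidth_bps // 8 * rtt_ms // 1000
print(bdp_bytes)              # 1000000: roughly a 1 MB window is required

# Without window scaling, a 64 KiB window caps throughput far below the link:
max_bps_unscaled = 65_535 * 8 * 1000 // rtt_ms
print(max_bps_unscaled)       # 6553500: about 6.5 Mbit/s on this path
```

This is why window scaling (and large enough buffers) matters on fast long-haul links: the unscaled 16-bit window cannot cover the BDP.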
Overhead
Each segment adds IP and TCP headers (20 bytes each, plus options). TLS adds handshakes and crypto. With small payloads, headers dominate the bytes on the wire.
Benchmarks
Loopback can reach Gbps; cross-region may sit at Mbps to tens of Mbps due to RTT and loss. Measure with iperf3 in target environments before committing to architecture.
Real-world use cases
Web and APIs
HTTP/1.1 and HTTP/2 ride TCP (HTTP/3 uses QUIC over UDP). Reverse proxies and load balancers tune reuse, timeouts, and buffers—directly impacting latency.
Databases
PostgreSQL, MySQL, … default to TCP. Connection pools amortize TCP + auth cost.
File transfer
FTP's control channel, SFTP, and rsync over SSH all run on TCP, where ordering and reliability matter. For large transfers, profile disk vs network vs window limits.
Optimization tips
Nagle’s algorithm
Nagle batches small writes to reduce packet count but can add delay. Interactive workloads often set TCP_NODELAY.
TCP_NODELAY
Disables Nagle—small writes go out immediately at the cost of more packets and sometimes lower goodput.
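Setting the option is a one-liner; a sketch in Python (it can be applied before or after connect):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Disable Nagle so small writes are sent immediately instead of being batched.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))  # nonzero
sock.close()
```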
SO_KEEPALIVE
Probes idle connections. Often overlaps with app heartbeats—coordinate with proxy and firewall idle timeouts.
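A sketch of enabling keepalive with tuned probe timing; the 60/10/5 values are illustrative, and the per-probe constants are Linux-specific, hence the guard:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
# Probe tuning is not portable; these constants exist on Linux.
if hasattr(socket, "TCP_KEEPIDLE"):
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # idle seconds before first probe
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # seconds between probes
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # failed probes before reset
sock.close()
```

Keep the total probe window (60 + 10×5 here) shorter than any proxy or firewall idle timeout in the path, or the connection dies before the first probe fires.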
Buffers and windows
SO_SNDBUF / SO_RCVBUF and global sysctl limits trade throughput vs memory. Change after measurement.
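Per-socket buffer requests are capped by the global sysctl limits, so always read the value back; a sketch (the 1 MiB figure is an example, and Linux reports roughly double the requested size due to bookkeeping overhead):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
requested = 1 << 20  # ask for 1 MiB buffers (example value)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
# The kernel may clamp the request (net.core.rmem_max on Linux) or scale it.
effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(effective)
sock.close()
```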
Common problems
Too many TIME_WAITs
Bursty short connections pressure local ports and kernel tables. Mitigate with HTTP keep-alive, pools, and careful server-side tuning (e.g. reuse options—verify kernel and security guidance).
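One server-side piece of that tuning: SO_REUSEADDR lets a restarted server rebind a port whose old connections still sit in TIME_WAIT. A sketch (port 0 here so the example runs anywhere; a real server uses its fixed port):

```python
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Allow bind() to succeed while old connections on this port are in TIME_WAIT.
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 0))  # a real server would bind its fixed port here
srv.listen(128)
srv.close()
```

This is a server restart aid, not a fix for client-side port exhaustion; for that, prefer keep-alive and pools as noted above.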
Resets and disconnects
RST reasons include closed port, middleboxes, timeouts. Use tcpdump to see who sent FIN/RST.
Slow transfers
Causes mix send-buffer stalls, a slow receiver, cwnd limits, and disk I/O. Run iperf3 on a single connection to separate raw network capacity from application behavior.
Wrap-up
Summary
- TCP provides reliability, ordering, flow control, and congestion control—the transport backbone of the internet.
- Handshake, teardown, and TIME_WAIT tie directly to ops incidents.
- Nagle, windows, buffers, and keepalive are the latency vs throughput levers.
When to choose TCP
- Files, APIs, databases, remote shells when integrity dominates. For low-latency realtime, compare the UDP guide and WebRTC guide.