TCP: Complete Guide | Handshake, Flow & Congestion Control, Sockets

Key takeaway

TCP delivers a reliable byte stream with flow and congestion control—socket options and kernel behavior drive service stability.

Introduction

TCP (Transmission Control Protocol) is the default reliable transport on the internet—HTTP/HTTPS, SSH, most database protocols, and many microservice RPCs run over it. Connection state, retransmissions, and ordering “just work” because of kernel TCP and the socket API.

Yet latency, throughput, and concurrent connections per server depend heavily on settings and traffic patterns. This article maps RFCs and kernel behavior to socket options and common incidents.

After reading this post

  • Explain 3-way handshake, 4-way close, and state transitions
  • Separate flow control (rwnd) from congestion control (cwnd, Reno/CUBIC, …)
  • Apply basic socket patterns in C++, Python, and JavaScript
  • Respond to TIME_WAIT, Nagle, and keepalive in operations

Table of contents

  1. Protocol overview
  2. How it works
  3. Hands-on programming
  4. Performance characteristics
  5. Real-world use cases
  6. Optimization tips
  7. Common problems
  8. Wrap-up

Protocol overview

History and background

TCP evolved from 1974 work by Vint Cerf and Bob Kahn; RFC 793 (1981) was the classic reference. Later additions include SACK, window scaling, timestamps, and congestion-control updates, consolidated in RFC 9293 (2022). Implementations differ: BSD Reno, Linux default CUBIC, etc.

OSI placement

TCP is layer 4 (transport). IP (layer 3) delivers packets between hosts; TCP provides a reliable byte stream between ports (processes). Segmentation, retransmission, and ordering are TCP’s job.

Core properties

| Property | Description |
| --- | --- |
| Connection-oriented | Logical connection and state machine before bulk data. |
| Reliable | Retransmits on loss; sequence numbers fix reordering. |
| Ordered delivery | Application reads bytes in send order (no message boundaries). |
| Flow control | rwnd matches sender rate to receiver buffer. |
| Congestion control | Reacts to network congestion—cwnd and algorithm. |
| Full duplex | Both directions on one connection (implemented with buffers and ACK interplay). |

How it works

3-way handshake

SYN → SYN-ACK → ACK. The client sends an ISN, the server responds with its ISN and ACK, the client ACKs—then ESTABLISHED.

sequenceDiagram
  participant C as Client
  participant S as Server
  C->>S: SYN, seq=x
  S->>C: SYN-ACK, seq=y, ack=x+1
  C->>S: ACK, seq=x+1, ack=y+1
  Note over C,S: ESTABLISHED
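In the socket API the whole exchange above hides behind a single call: a blocking connect() returns only after the handshake completes. A minimal Python sketch (host and port are placeholders):

```python
import socket

# connect() blocks until the 3-way handshake completes (or fails);
# when it returns, the connection is ESTABLISHED on both ends.
def tcp_connect(host: str, port: int, timeout: float = 5.0) -> socket.socket:
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)    # bound the handshake, not just later reads
    sock.connect((host, port))  # SYN -> SYN-ACK -> ACK happens here
    return sock
```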

4-way termination

Each direction closes with its own FIN/ACK exchange. The side that sends the last ACK enters TIME_WAIT briefly so late duplicate segments do not collide with a new connection on the same four-tuple (source IP, source port, destination IP, destination port).

sequenceDiagram
  participant A as Host A
  participant B as Host B
  A->>B: FIN
  B->>A: ACK
  B->>A: FIN
  A->>B: ACK
  Note over A: TIME_WAIT (2MSL)

Flow control (sliding window)

rwnd advertises how much more data the receiver can buffer. The sender limits unacknowledged bytes to stay within rwnd—protects the receiver.
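The advertised window is bounded by the socket's receive buffer, so SO_RCVBUF is the knob an application can see. A small sketch of inspecting it (the returned value is a kernel default and platform-dependent):

```python
import socket

# rwnd can never exceed what the receive buffer holds, so the
# buffer size is the application-visible side of flow control.
def receive_buffer_bytes(sock: socket.socket) -> int:
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(receive_buffer_bytes(sock))  # kernel default, varies by platform
sock.close()
```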

Congestion control

Congestion control reacts to router queues and drops to protect the whole network. Reno includes slow start, congestion avoidance, fast retransmit/recovery. Linux CUBIC adjusts window growth on high-BDP links. Exact parameters depend on kernel version and sysctl.

flowchart TB
  subgraph fc [Flow control]
    R[Receive buffer rwnd]
  end
  subgraph cc [Congestion control]
    CWND[cwnd / algorithm]
  end
  SEND[Data actually sent]
  R --> SEND
  CWND --> SEND
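The algorithm in use can be queried per socket on Linux via the TCP_CONGESTION option; other platforms may not expose it, so this sketch guards for that:

```python
import socket

# On Linux, TCP_CONGESTION reports the per-socket algorithm (commonly
# "cubic" on modern kernels); the option may be absent elsewhere.
def congestion_algorithm(sock: socket.socket):
    opt = getattr(socket, "TCP_CONGESTION", None)
    if opt is None:
        return None  # option not exposed on this platform
    raw = sock.getsockopt(socket.IPPROTO_TCP, opt, 16)
    return raw.split(b"\x00", 1)[0].decode()

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(congestion_algorithm(sock))  # e.g. "cubic" on Linux, None elsewhere
sock.close()
```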

Hands-on programming

Educational minimal examples—production needs async I/O, TLS, pools, and logging.

C++ (Berkeley sockets)

// g++ -std=c++17 -O2 tcp_client.cpp -o tcp_client
#include <arpa/inet.h>
#include <cstring>
#include <iostream>
#include <string>
#include <sys/socket.h>
#include <unistd.h>

int main(int argc, char* argv[]) {
  const char* host = argc > 1 ? argv[1] : "127.0.0.1";
  const uint16_t port = argc > 2 ? static_cast<uint16_t>(std::stoi(argv[2])) : 8080;

  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) { perror("socket"); return 1; }

  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);
  if (inet_pton(AF_INET, host, &addr.sin_addr) != 1) {
    std::cerr << "inet_pton failed\n";
    return 1;
  }

  if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
    perror("connect");
    close(fd);
    return 1;
  }

  const std::string msg = "ping\n";
  ssize_t n = send(fd, msg.data(), msg.size(), 0);
  if (n < 0) { perror("send"); close(fd); return 1; }

  char buf[4096];
  n = recv(fd, buf, sizeof(buf) - 1, 0);
  if (n < 0) { perror("recv"); close(fd); return 1; }
  if (n == 0) std::cout << "peer closed\n";
  else { buf[n] = '\0'; std::cout << buf; }

  close(fd);
  return 0;
}
  • Check every socket/connect/send/recv return.
  • send/recv may be partial—loop for full buffers in real code.

Python 3

#!/usr/bin/env python3
import socket
import sys

def main() -> None:
    host = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 8080

    with socket.create_connection((host, port), timeout=10.0) as sock:
        sock.sendall(b"ping\n")
        data = sock.recv(4096)
        if not data:
            print("peer closed")
        else:
            print(data.decode("utf-8", errors="replace"))

if __name__ == "__main__":
    main()
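Because TCP delivers a byte stream with no message boundaries, one sendall may arrive as several recv chunks, and applications must frame messages themselves. A minimal line-based sketch:

```python
import socket

# TCP preserves byte order but not write boundaries, so the reader
# reassembles messages itself -- here, newline-delimited lines.
def read_line(sock: socket.socket, max_len: int = 65536) -> bytes:
    buf = bytearray()
    while b"\n" not in buf and len(buf) < max_len:
        chunk = sock.recv(4096)
        if not chunk:  # peer closed before a full line arrived
            break
        buf += chunk
    line, _, _rest = bytes(buf).partition(b"\n")
    return line  # _rest is dropped in this sketch; real code keeps it buffered
```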

JavaScript (Node.js)

// node tcp_client.mjs
import net from "node:net";

const host = process.argv[2] ?? "127.0.0.1";
const port = Number(process.argv[3] ?? 8080);

const socket = net.createConnection({ host, port });

socket.setTimeout(10_000);
socket.on("timeout", () => socket.destroy(new Error("idle timeout")));

socket.on("connect", () => {
  socket.write("ping\n");
});

socket.on("data", (chunk) => {
  process.stdout.write(chunk);
});

socket.on("error", (err) => {
  console.error(err.message);
  process.exitCode = 1;
});

Timeouts and errors

  • Connect timeout: limit long connects with OS options or async wrappers.
  • Read timeout: SO_RCVTIMEO, socket.setTimeout, sock.settimeout, … to avoid infinite recv.
  • Retries: safest for idempotent application requests only.
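The bullets above combine into one common client pattern: bound each attempt with a timeout and retry only requests that are safe to repeat. A sketch with illustrative attempt counts and timeouts:

```python
import socket

# Retry an idempotent request with a per-attempt timeout.
# Non-idempotent requests (e.g. "charge card") must not be blindly retried.
def request_with_retry(host: str, port: int, payload: bytes,
                       attempts: int = 3, timeout: float = 5.0) -> bytes:
    last_err = None
    for _ in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout) as sock:
                sock.settimeout(timeout)  # also bound reads, not just connect
                sock.sendall(payload)
                return sock.recv(4096)
        except OSError as err:
            last_err = err                # connect/read failed; try again
    raise last_err
```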

Performance characteristics

Latency

RTT caps throughput on many workloads. Request–response workloads with small messages often pay a full RTT per exchange and are sensitive to packet count, ACK timing, and Nagle.

Throughput

If window sizes (receive + congestion) fall short of BDP (bandwidth × delay), you underfill the pipe. Window scaling, buffer tuning, and application I/O matter on fast long-haul links.
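The bandwidth-delay product is plain arithmetic; a quick sketch with illustrative numbers (a 1 Gbit/s link with 50 ms RTT):

```python
# Bandwidth-delay product: bytes that must be "in flight" to keep the
# pipe full. Windows (rwnd, cwnd) below this value cap throughput.
def bdp_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> int:
    return int(bandwidth_bits_per_s * rtt_s / 8)

# 1 Gbit/s with 50 ms RTT needs ~6.25 MB of window to fill the link.
print(bdp_bytes(1e9, 0.050))  # 6250000
```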

Overhead

Each segment carries an IP header and a TCP header (20 bytes each, plus TCP options). TLS adds handshakes and crypto. Small payloads suffer a poor payload-to-header ratio.

Benchmarks

Loopback can reach Gbps; cross-region may sit at Mbps to tens of Mbps due to RTT and loss. Measure with iperf3 in target environments before committing to architecture.


Real-world use cases

Web and APIs

HTTP/1.1 and HTTP/2 ride TCP (HTTP/3 uses QUIC over UDP). Reverse proxies and load balancers tune reuse, timeouts, and buffers—directly impacting latency.

Databases

PostgreSQL, MySQL, … default to TCP. Connection pools amortize TCP + auth cost.

File transfer

FTP, SFTP, and rsync over SSH all run on TCP; ordering and reliability matter here. For large transfers, profile disk vs. network vs. window limits.


Optimization tips

Nagle’s algorithm

Nagle batches small writes to reduce packet count but can add delay. Interactive workloads often set TCP_NODELAY.

TCP_NODELAY

Disables Nagle—small writes go out immediately at the cost of more packets and sometimes lower goodput.
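Setting the option is one setsockopt call; a minimal sketch:

```python
import socket

# Disable Nagle so small writes go out immediately -- trades packet
# count for latency; typical for interactive or RPC traffic.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))  # nonzero
sock.close()
```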

SO_KEEPALIVE

Probes idle connections. Often overlaps with app heartbeats—coordinate with proxy and firewall idle timeouts.
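A sketch of enabling keepalive, with Linux-specific tuning guarded since other platforms name these knobs differently (the 60 s / 10 s / 5-probe values are illustrative):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# Linux-only knobs: first probe after 60 s idle, then every 10 s,
# declare the peer dead after 5 unanswered probes.
if hasattr(socket, "TCP_KEEPIDLE"):
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)
sock.close()
```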

Buffers and windows

SO_SNDBUF / SO_RCVBUF and global sysctl limits trade throughput vs memory. Change after measurement.
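A sketch of requesting a larger receive buffer (the 1 MB value is illustrative; the kernel clamps requests to sysctl limits, and Linux reports roughly double the requested value for internal bookkeeping):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
before = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

# Ask for a 1 MB receive buffer; the effective value is kernel-clamped.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1 << 20)
after = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(before, after)
sock.close()
```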


Common problems

Too many TIME_WAITs

Bursty short connections pressure local ports and kernel tables. Mitigate with HTTP keep-alive, pools, and careful server-side tuning (e.g. reuse options—verify kernel and security guidance).
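On the server side, SO_REUSEADDR is the standard reuse option for rebinding a listening port that still has TIME_WAIT sockets after a restart; it does not bypass TIME_WAIT safety for the old connections themselves. A sketch (port 0 picks an ephemeral port here; a real server would bind its fixed service port, which is where the option matters):

```python
import socket

# Let bind() succeed even while the previous instance's connections
# are still draining through TIME_WAIT on this port.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 0))  # a real server binds its service port here
srv.listen(128)
print(srv.getsockname()[1])  # the port the kernel assigned
srv.close()
```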

Resets and disconnects

RST reasons include closed port, middleboxes, timeouts. Use tcpdump to see who sent FIN/RST.

Slow transfers

Mix of send buffer stall, slow recv, cwnd limits, disk. Separate iperf on one connection from app profiling.


Wrap-up

Summary

  • TCP provides reliability, ordering, flow control, and congestion control—the transport backbone of the internet.
  • Handshake, teardown, and TIME_WAIT tie directly to ops incidents.
  • Nagle, windows, buffers, and keepalive are the latency vs throughput levers.

When to choose TCP

  • Files, APIs, databases, remote shells when integrity dominates. For low-latency realtime, compare the UDP guide and WebRTC guide.