Jun 7, 2023 · 4 min read · Fundamentals, Networks

How TCP/IP Actually Works — and Why the Handshake Still Bites You

Every API call, model request, and database query rides on TCP/IP. A refresher on the three-way handshake, sequence numbers, and flow control — and why round-trips, connection reuse, and TIME_WAIT quietly shape the latency of your cloud and AI systems.

Sequence diagram of the TCP three-way handshake: SYN, SYN-ACK, ACK, then data flows

Before a single byte of your data moves, both sides exchange three messages to agree they’re talking.

Every request your systems make — a call to a model API, a query to a database, a fetch from object storage — rides on TCP/IP. It is so reliable and so invisible that most engineers never think about it until a latency graph makes no sense or a connection pool exhausts itself at 3am. This is a refresher on what TCP is really doing under your requests.get(), and why that hidden machinery still shapes the performance of very modern, very AI-heavy systems.

The division of labor: IP and TCP

The stack splits into two jobs. IP (Internet Protocol) handles addressing and delivery — getting a packet from one machine to another across a mess of routers, with no promises. Packets can arrive out of order, duplicated, or not at all. IP is a postcard system: best effort, no guarantees.

TCP (Transmission Control Protocol) sits on top and turns that unreliable postcard system into something you can trust: a reliable, ordered stream of bytes. It does this with sequence numbers (so out-of-order packets can be reassembled), acknowledgments (so lost packets get resent), and flow control (so a fast sender doesn’t drown a slow receiver). When you read() from a socket and get exactly the bytes that were sent, in order, TCP did that work.
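To make the role of sequence numbers concrete, here is a minimal sketch of receiver-side reassembly. It is a toy model, not how a kernel implements it: the function name reassemble and the (seq, payload) tuple format are assumptions made for illustration, and it ignores sequence-number wraparound and partially overlapping segments.

```python
def reassemble(segments, initial_seq):
    """Rebuild an ordered byte stream from (seq, payload) segments.

    Segments may arrive out of order or duplicated. Returns the contiguous
    bytes starting at initial_seq, plus the next sequence number expected,
    which is the value a receiver would acknowledge.
    """
    buffered = {}
    for seq, payload in segments:
        if payload:
            # Keep the first copy of each segment; exact duplicates are dropped.
            buffered.setdefault(seq, payload)

    stream = bytearray()
    next_seq = initial_seq
    # Deliver bytes only while there is no gap in the sequence space.
    while next_seq in buffered:
        payload = buffered[next_seq]
        stream += payload
        next_seq += len(payload)
    return bytes(stream), next_seq


# Out of order, one duplicate, and one segment (seq 1010) still missing.
arrived = [(1005, b" worl"), (1000, b"hello"), (1005, b" worl"), (1015, b"!")]
data, ack = reassemble(arrived, initial_seq=1000)
print(data, ack)
```

The segment at 1015 has arrived but cannot be delivered yet, because the bytes at 1010 are missing. The acknowledgment stays at 1010, which is how the sender learns what to resend.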

The three-way handshake

Before any data flows, TCP establishes a connection with the handshake in the diagram:

SYN: the client sends a “synchronize” packet with an initial sequence number x: “I want to talk, and I’ll number my bytes starting near x.”
SYN-ACK: the server replies, acknowledging x (its acknowledgment number is x + 1) and offering its own initial sequence number y: “Got it, and here’s my numbering.”
ACK: the client acknowledges y (with acknowledgment number y + 1). Both sides now agree on sequence numbers, and the connection is ESTABLISHED.

Only now does application data move. The reason for three messages and not two: both directions need to agree on sequence numbers, and both need confirmation that the other side received their number. It is the minimum ceremony for a reliable two-way channel.
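The numbering in the three messages can be sketched as plain data. This is a model of the header fields only, with no networking involved; the Segment class and the three_way_handshake function are illustrative names, not part of any library. A SYN consumes one sequence number, which is why each acknowledgment is the peer's initial number plus one.

```python
from dataclasses import dataclass
from typing import Optional

SEQ_SPACE = 2**32  # TCP sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    flags: str
    seq: int
    ack: Optional[int] = None


def three_way_handshake(client_isn, server_isn):
    """Return the three segments exchanged for the given initial sequence numbers."""
    syn = Segment("SYN", seq=client_isn)
    # The server acknowledges the client's SYN and offers its own number.
    syn_ack = Segment("SYN-ACK", seq=server_isn, ack=(syn.seq + 1) % SEQ_SPACE)
    # The client acknowledges the server's SYN; its own SYN used up one number.
    ack = Segment("ACK", seq=(syn.seq + 1) % SEQ_SPACE, ack=(syn_ack.seq + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```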

Why the handshake is a latency tax

Here is the part that matters in production. The handshake costs a full network round-trip before your first byte of useful data. If your client is in Virginia and your server is in Frankfurt, that’s roughly 90 ms spent just agreeing to talk, before the request even starts. Add TLS (which we’ll cover in its own post) and you pay more round-trips on top: one for a full TLS 1.3 handshake, two for TLS 1.2.
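A back-of-the-envelope model makes the tax visible. The sketch below assumes a 90 ms round-trip, a hypothetical 50 ms of server processing time, and the usual counts of one extra round-trip for a full TLS 1.3 handshake and two for TLS 1.2. The function name is made up for illustration, and real numbers will vary with routing, congestion, and session resumption.

```python
def time_to_first_byte_ms(rtt_ms, server_ms, tls_round_trips=0, reuse=False):
    """Estimate the time until the first response byte arrives.

    A new connection pays one round-trip for the TCP handshake plus any
    TLS round-trips. A reused connection pays neither. The request itself
    always costs one round-trip plus server processing time.
    """
    setup_ms = 0 if reuse else rtt_ms * (1 + tls_round_trips)
    return setup_ms + rtt_ms + server_ms


RTT_MS = 90     # assumed Virginia-to-Frankfurt round-trip
SERVER_MS = 50  # hypothetical server processing time

scenarios = {
    "new connection, TCP only": time_to_first_byte_ms(RTT_MS, SERVER_MS),
    "new connection, TLS 1.3": time_to_first_byte_ms(RTT_MS, SERVER_MS, tls_round_trips=1),
    "new connection, TLS 1.2": time_to_first_byte_ms(RTT_MS, SERVER_MS, tls_round_trips=2),
    "reused connection": time_to_first_byte_ms(RTT_MS, SERVER_MS, reuse=True),
}
for name, ms in scenarios.items():
    print(f"{name}: {ms} ms")
```

Under these assumptions, connection setup accounts for more than half of the total on a new TLS connection, while a reused connection pays only for the request itself.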

This single fact drives a huge amount of real-world engineering:

Connection pooling and keep-alive exist to avoid re-paying the handshake. Reusing a warm connection for the next request skips SYN/SYN-ACK/ACK entirely. A connection pool that’s too small forces new handshakes under load; too large and you exhaust file descriptors. This is why tuning HTTP client pools is one of the highest-leverage latency fixes in a microservice mesh.
HTTP/2 multiplexing lets many logical requests share one TCP connection, amortizing that one handshake across thousands of calls.
QUIC / HTTP/3 goes further and folds the transport and TLS handshakes together over UDP, cutting the setup round-trips — a direct assault on exactly this cost.
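The effect of keep-alive can be observed directly with the Python standard library. The sketch below starts a throwaway local server that counts accepted TCP connections, then sends the same number of requests with and without reusing the client connection. The class and function names are made up for this example, and a loopback test shows the connection count only, not realistic latency.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingServer(ThreadingHTTPServer):
    """Local test server that counts accepted TCP connections."""

    accepted = 0

    def get_request(self):
        # Called once per accepted TCP connection, i.e. once per handshake.
        request = super().get_request()
        self.accepted += 1
        return request


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps connections alive by default

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


def count_connections(n_requests, reuse):
    """Send n_requests GETs and return how many TCP connections the server saw."""
    server = CountingServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    conn = None
    try:
        for _ in range(n_requests):
            if conn is None or not reuse:
                if conn is not None:
                    conn.close()
                conn = http.client.HTTPConnection(host, port, timeout=5)
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body so the socket can be reused
        return server.accepted
    finally:
        if conn is not None:
            conn.close()
        server.shutdown()
        server.server_close()


print("without reuse:", count_connections(5, reuse=False))
print("with reuse:", count_connections(5, reuse=True))
```

Higher-level clients expose the same idea through session or pool objects. The point of the sketch is only that the number of handshakes follows the number of connections, not the number of requests.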

TIME_WAIT and the connections that won’t die

When a TCP connection closes, the side that closes first (the active closer) holds the socket in the TIME_WAIT state for a while, nominally twice the maximum segment lifetime (Linux uses a fixed 60 seconds), so that stray delayed packets from the old connection cannot be mistaken for part of a new one. Under high connection churn, with lots of short-lived connections being opened and closed, TIME_WAIT sockets pile up and can exhaust the ephemeral port range available for new outbound connections to the same destination. If you’ve ever seen a load generator or a chatty service start throwing “cannot assign requested address,” you’ve met it. The fix is almost always fewer, longer-lived connections, which loops right back to pooling.
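The port arithmetic is easy to sketch. The figures below are assumptions based on common Linux defaults (an ephemeral port range of 32768 to 60999 and a 60-second TIME_WAIT). The function name is illustrative, and the result is a rough ceiling for one client address talking to one destination address and port, not a measured limit.

```python
def max_new_connections_per_second(port_low, port_high, time_wait_seconds):
    """Rough ceiling on sustained new outbound connections to one destination.

    Each closed connection keeps its local port in TIME_WAIT, so the port
    range can be cycled at most once per TIME_WAIT interval.
    """
    usable_ports = port_high - port_low + 1
    return usable_ports / time_wait_seconds


# Assumed Linux defaults; check the values on your own hosts.
rate = max_new_connections_per_second(32768, 60999, time_wait_seconds=60)
print(f"about {rate:.0f} new connections per second before ports run out")
```

A few hundred connections per second is well within reach of a load generator or a busy service that opens a fresh connection for every request, which is why this limit shows up in practice.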

The AI-era relevance

You might think a protocol from 1974 has nothing to say about LLM infrastructure. It has plenty.

Streaming responses live and die by the connection. When you stream tokens from a model, you’re holding a single TCP connection open for the duration of the generation — potentially tens of seconds. That changes your connection-pool math completely: connections are held far longer than in a typical REST service, so pools sized for quick request/response will starve. Long-lived streaming is a different traffic shape, and TCP is where you feel it.
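The pool math follows from Little's law: average connections in use equal the request arrival rate multiplied by how long each request holds a connection. The sketch below uses hypothetical traffic numbers and an illustrative function name. It gives the average steady-state demand only, so a real pool needs headroom for bursts.

```python
import math


def connections_needed(requests_per_second, hold_ms):
    """Average number of connections in use at steady state (Little's law)."""
    return math.ceil(requests_per_second * hold_ms / 1000)


RATE = 50  # hypothetical requests per second

rest_pool = connections_needed(RATE, hold_ms=200)          # quick request/response
streaming_pool = connections_needed(RATE, hold_ms=20_000)  # 20-second token stream

print("REST-style service:", rest_pool)
print("streaming service:", streaming_pool)
```

At the same request rate, holding each connection 100 times longer requires 100 times as many connections, so a pool sized for the first case will starve in the second.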

Agents make many small calls. An agent doing a task fires off tool call after tool call — each one potentially a fresh HTTP request to a different service. Without connection reuse, every step re-pays the handshake tax, and a ten-step agent loop can spend more time shaking hands than thinking. This is a real, measurable source of agent latency that never shows up in the prompt.
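A rough model of an agent loop shows how much reuse can save. Everything here is hypothetical: the hostnames, the 40 ms round-trip, and the assumption that each new connection costs two round-trips of setup (TCP plus TLS 1.3). With reuse, the model pays the setup cost once per distinct host, as a per-host connection pool would.

```python
def handshake_overhead_ms(steps, rtt_ms, setup_round_trips, reuse):
    """Total connection setup time for a sequence of calls.

    steps is the list of hosts called, in order. Without reuse every call
    opens a new connection. With reuse only the first call to each host does.
    """
    warm_hosts = set()
    total_ms = 0
    for host in steps:
        if not reuse or host not in warm_hosts:
            total_ms += rtt_ms * setup_round_trips
            warm_hosts.add(host)
    return total_ms


# A hypothetical ten-step agent run touching three services.
steps = [
    "search.internal", "db.internal", "search.internal", "llm.internal",
    "db.internal", "llm.internal", "search.internal", "llm.internal",
    "db.internal", "llm.internal",
]

cold = handshake_overhead_ms(steps, rtt_ms=40, setup_round_trips=2, reuse=False)
warm = handshake_overhead_ms(steps, rtt_ms=40, setup_round_trips=2, reuse=True)
print(f"setup overhead without reuse: {cold} ms, with reuse: {warm} ms")
```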

Retries interact with handshakes. A retry storm doesn’t just re-send requests — it re-opens connections, re-runs handshakes, and can push a struggling service from “slow” into “collapsed under connection load.” Understanding that a retry is not free at the transport layer is part of building systems that degrade gracefully.

The takeaway

TCP gives you a beautiful abstraction — a reliable stream of bytes — and the abstraction is so good it’s easy to forget there’s a cost underneath. But the handshake is real, the round-trips are real, and connection lifecycle is one of the most common places modern systems, including AI systems, quietly bleed latency. You don’t need to hand-write packet parsers. You do need to know that every connection has a setup cost, that reuse is how you avoid it, and that when the latency numbers stop making sense, the transport layer is often where the answer is hiding.