This article asks rude questions and isn't well organized. I wouldn't recommend reading further, unless you find yourself doing so anyway. It touches on some SSH protocol concepts (key exchange, KEXINIT, channels) — if you're not familiar with the basic SSH handshake flow, parts of this may be rough going.
I'm implementing an SSH server in Zig. While surveying existing SSH libraries, I noticed they handle I/O in fundamentally different ways — some take over the socket and do all the reading and writing internally, others never touch the network and only accept raw bytes. It took me a long time to understand why this divergence exists, and it turned out to go much deeper than I expected.
When read() Is Buried Inside Protocol Logic
Here's what an SSH library that "does its own I/O" looks like. The caller passes in a socket, calls run(), and the library handles everything:
fn run(self) {
    write(self.fd, "SSH-2.0-mylib\r\n");
    data = read(self.fd); // blocks
    write(self.fd, self.build_kexinit());
    data = read(self.fd); // blocks
    self.parse_kexinit(data);
    // ... key exchange, auth ...
    while (!self.done) {
        data = read(self.fd); // blocks
        response = self.process(self.decrypt(data));
        write(self.fd, self.encrypt(response));
    }
}
Notice how the read() calls are interleaved with the protocol logic. The code that builds the KEXINIT message sits between two read() calls. The library's protocol flow dictates when reading happens — the caller has no way to intervene.
For a single connection this is fine. For 50 connections it isn't. read() blocks — the OS suspends the thread until data arrives. While thread A is stuck on connection 1's read(), it can't process connection 2. Fifty connections, fifty threads.
What About Running an Event Loop Inside the Library?
epoll lets one thread monitor all sockets — process whichever has data. So why not have the library run an epoll event loop internally?
Cross-platform isn't the issue — libuv and libevent already abstract away epoll/kqueue/IOCP differences. The issue is that my application already has its own event loop.
My SSH server needs to integrate with libvaxis, a TUI framework. libvaxis has its own loop — reading terminal input, driving rendering. If the SSH library brings another event loop, the process has two loops. One thread can only run one loop, so they'd need separate threads plus inter-thread communication. And whichever event-loop library the SSH library picked (say, libuv) becomes my transitive dependency.
Can they nest? The outer loop checks SSH each round — data available, process it; nothing, go render, come back later. A cooperative scheduling approach.
The direction is right, but there's a detail: epoll_wait() isn't a cooperative yield — it's a blocking syscall. The calling thread gets suspended by the kernel until a socket is ready. During that suspension the outer loop can't run — the thread's execution is taken away by the kernel, not voluntarily relinquished by the SSH library.
Cooperative yield: the function voluntarily returns control. Blocking syscall: the kernel suspends the entire thread. The first is walking away on your own. The second is being held down.
What if the library's internal loop doesn't block? Each round, non-blocking check on the socket — data available, process it; nothing, return immediately:
fn ssh_poll_once() -> SshResult {
    // non-blocking check: process if data, return immediately if not
}

while (true) {
    libvaxis_poll_once();
    ssh_poll_once();
}
I stopped here, feeling like I'd derived something, but couldn't quite name it.
Pollable API
I later discovered this pattern appears repeatedly in well-known C libraries. libssh2's non-blocking mode is one — after calling libssh2_session_set_blocking(0), all operations become non-blocking, and the caller retrieves the library's internal socket fd via libssh2_session_get_fd() to add to their own poll/epoll set. An even more canonical example is libcurl's multi interface[1] — Daniel Stenberg specifically designed this API to let callers use curl_multi_fdset() to get all fds managed by libcurl, add them to their own select() or poll() set, and call curl_multi_perform() to let libcurl do one step of non-blocking processing when fds are ready. The common characteristic: the library holds the fd and handles low-level I/O, but exposes the fd so the caller controls scheduling. I call this Pollable API in this article — it's not a formal term (unlike Sans-IO which has an authoritative definition[2]), but my own label for this class of designs.
The library typically doesn't run its own event loop internally — it just holds a socket fd and exposes it. The caller adds it to their epoll set. When the fd is ready, the caller tells the library to process; the library does one non-blocking read() and returns. The whole time there's only one epoll — the caller's.
libssh2_session_set_blocking(session, 0);
ssh_fd = libssh2_session_get_fd(session);
epoll_add(ssh_fd);
epoll_add(tui_fd);
while (true) {
    ready = epoll_wait(...);
    if (ssh_fd in ready) libssh2_process();
    if (tui_fd in ready) libvaxis_process();
}
But the Pollable API has a practical pain point. SSH handshake requires calling handshake() → userauth() → channel_open() in sequence. In blocking mode each function waits until completion before returning — the caller just writes them in order. In non-blocking mode each function may return EAGAIN — "data hasn't arrived, try again later":
int step = 0;
while (true) {
    poll(&pfd, 1, -1);
    if (step == 0) {
        rc = libssh2_session_handshake(session, fd);
        if (rc == LIBSSH2_ERROR_EAGAIN) continue;
        step = 1;
    }
    if (step == 1) {
        rc = libssh2_userauth_password(session, "user", "pass");
        if (rc == LIBSSH2_ERROR_EAGAIN) continue;
        step = 2;
    }
    // ...
}
That step variable is a state machine the caller is forced to maintain by hand. In blocking mode the call stack automatically remembers "where we are" — the function hasn't returned, so naturally we're still on that step. In non-blocking mode the function returns early, the call stack information is lost, and the caller has to track it with a variable.
Sans-IO
Looking back at the Pollable API's step variable, where's the real problem? The library still calls read() and write() internally — just non-blocking now. The library holds the socket fd, the library decides when to call read() on it — the caller only controls "when to give the library a chance to execute." The binding between library and fd remains.
What if that binding is removed too? The library doesn't hold a socket fd, doesn't call read(), doesn't call write(). The library is just a state machine — the caller feeds in bytes read from the network, the library processes them and produces bytes for the caller to send. Where bytes come from and go to, the library doesn't know and doesn't care.
This is Sans-IO[2] — French for "without I/O." Python's hyper-h2 (HTTP/2), Rust's rustls (TLS), and MiSSHod (the Zig SSH library I'm referencing) all use this pattern.
Specifically, a Sans-IO library's entire interaction surface has only three actions. The caller repeatedly calls getNextEvent() to advance the state machine, each time getting one of:
- ReadyToConsume(N): "I need N bytes to continue." The caller reads data from anywhere (socket, pipe, memory buffer) and calls engine.write(data) to feed it in.
- ReadyToProduce(N): "I have N bytes to send out." The caller calls engine.peek(N) to get the data, sends it wherever, then calls engine.consumed(N) to acknowledge.
- Event(code): "Something happened that you need to decide on." For example, an auth request arrived — accept or reject? The caller handles it, then calls engine.clearEvent(code) to let the state machine continue.
MiSSHod's server example shows the complete caller loop. All of SSH's complexity — key exchange, encryption switch, auth, channels — is encapsulated inside getNextEvent(). The caller sees only these three actions:
while (!quit) {
    const ev = try engine.getNextEvent();
    switch (ev) {
        .ReadyToConsume => |len| {
            // engine needs data: read from TCP, feed to engine
            const n = try stream.read(buf[0..len]);
            try engine.write(buf[0..n]);
        },
        .ReadyToProduce => |len| {
            // engine has data to send: take it, write to TCP
            const data = engine.peek(len);
            _ = try stream.write(data);
            engine.consumed(data.len);
        },
        .Event => |code| {
            switch (code) {
                .Connected => {}, // handshake complete
                .RxData => |data| {}, // client data received
                .UserAuth => |creds| {
                    engine.grantAccess(true); // accept auth
                },
                .EndSession => { quit = true; },
            }
        },
    }
}
One thing worth noting about this code: engine never touches stream. It doesn't know whether the data comes from a TCP socket, a Unix pipe, or a test harness's memory buffer. To write unit tests, you don't need a real network connection — construct an SSH handshake byte sequence, engine.write() it in, check whether engine.peek() produces the expected output. The entire test runs in pure memory.
Contrast with the Pollable API's step variable: Sans-IO's state machine lives inside the library. The caller doesn't need to track "where we left off." No EAGAIN, no retries. Each getNextEvent() returns exactly what to do right now.
Five SSH Libraries, Five Choices
By this point I'd seen three approaches: library-owned blocking I/O, Pollable API, and Sans-IO. But real-world SSH libraries are richer than that. I looked at five implementations in different languages and found that each library's I/O design choice is tightly bound to the language's own concurrency model.
Go's x/crypto/ssh is the most straightforward. The caller hands over a net.Conn, and the library does blocking reads and writes:
listener, _ := net.Listen("tcp", ":22")
for {
    conn, _ := listener.Accept()
    go handleConnection(conn)
}

func handleConnection(conn net.Conn) {
    serverConn, chans, reqs, _ := ssh.NewServerConn(conn, config)
    // blocking reads and writes internally, completes entire handshake
}
ssh.NewServerConn internally calls conn.Read(), which blocks the current goroutine until data arrives. From the caller's perspective this is blocking code — line by line, send version string, read peer version, send KEXINIT, read KEXINIT. But there's an easily overlooked layer here: Go's runtime isn't actually blocking an OS thread. When a goroutine waits on Read(), the Go runtime suspends it, switches to other goroutines, and uses epoll under the hood to monitor the socket. Data arrives, goroutine wakes up. Goroutine creation costs a few KB, and the Go runtime multiplexes thousands of goroutines onto a handful of OS threads. So Go's SSH library is "API-level blocking, runtime-level non-blocking" — the programmer writes the simplest possible sequential code, and the runtime handles concurrency transparently.
This design only makes sense in Go, because Go has goroutines. In C and Zig, threads are OS threads — creation cost and 8MB default stacks make "one thread per connection" painful at high concurrency.
In terms of code organization, Go's implementation is procedural — the handshake is a sequence of function calls, control flow maintained by the call stack, no explicit state machine needed. Reading order equals execution order.
Rust's russh takes the async/await route, built on tokio. The caller implements a set of Handler traits, and russh calls back on events:
impl server::Handler for MyHandler {
    async fn auth_publickey(&mut self, user: &str, key: &PublicKey)
        -> Result<Auth, Error>
    {
        Ok(Auth::Accept)
    }

    async fn data(&mut self, channel: ChannelId, data: &[u8],
                  session: &mut Session) -> Result<(), Error>
    {
        session.data(channel, data.to_vec().into());
        Ok(())
    }
}
Like Go, the code reads sequentially — except each I/O operation is followed by .await. Under the hood tokio uses epoll for multiplexing, doing essentially the same thing as Go's runtime, but Rust's async is a compile-time transformation — the compiler splits functions into state machines at each .await point. Go's goroutines are runtime-scheduled green threads; Rust's Futures are compiler-generated state machines. Similar effect, different path.
tokio is the de facto standard for async networking in Rust. Most Rust network applications already run on tokio, so russh integrates with zero friction. But if the caller doesn't use tokio (say, a purely synchronous Zig program), russh's tokio dependency is a wall.
TinySSH takes the minimalist route: no concurrency at all. Each inbound connection triggers systemd or inetd to fork a TinySSH process. The process blocks serving that one connection, then exits. No threads, no event loop, no concurrency concerns. The cost is fork overhead and no shared state between processes. But TinySSH's goal is minimal attack surface, not performance — no algorithm negotiation (fixed ed25519 + chacha20-poly1305), no dynamic memory allocation, minimal code.
libssh2 is the Pollable API discussed earlier. MiSSHod is the Sans-IO discussed earlier.
Putting all five together, an interesting pattern emerges. Go binds to goroutines, Rust binds to tokio, TinySSH binds to one-process-per-connection — they all made a concurrency model choice on the caller's behalf. But libssh2 and MiSSHod specify no concurrency model whatsoever. The library doesn't create threads, doesn't start event loops, doesn't manage concurrency. The caller can use one thread per connection, single-threaded epoll, or embed into an existing event loop. This isn't an omission — it's the design core. The entire point of Pollable API and Sans-IO is to not make that choice for the caller.
This also reveals something: I/O model selection isn't a pure technical decision — it's bound to language ecosystems. Go has goroutines so blocking I/O is costless. Rust has tokio so async is natural. Zig has no runtime and no async framework — Sans-IO is the choice that makes the fewest assumptions. It's not about which is more advanced; it's about which is most reasonable under which constraints.
There's also a code organization difference worth noting. Go and Rust (async/await) handshake code is procedural — a series of sequential calls, control flow implicitly maintained by the call stack or compiler-generated state machines.
Sans-IO library internals are different. The MiSSHod caller loop shown earlier is clean — just three branches: ReadyToConsume/ReadyToProduce/Event. But getNextEvent() internally must remember "which SSH protocol step are we on." That state isn't on the call stack (because getNextEvent() returns each time it's called, destroying the stack frame), so it must be maintained as an explicit state enum. MiSSHod's server_session.zig[3] defines this:
const SessionState = enum {
    Init,
    KexInitWrite,
    KexInitRead,
    EcdhInitRead,
    EcdhReplyWrite,
    NewKeysRead,
    NewKeysWrite,
    AuthRead,
    AuthRspServReqSuccess,
    CheckUserPasswordAuth,
    UserAuthAccepted,
    UserPasswordAuthDenied,
    ChannelOpenConfirmWrite,
    ChannelRspWrite,
    ChannelData,
    // ...
};
Each state corresponds to an SSH protocol step. getNextEvent() internally is a big switch(state) — if the current state is KexInitWrite, it constructs the KEXINIT message and returns ReadyToProduce. After the caller sends it and calls getNextEvent() again, the state has advanced to KexInitRead, which returns ReadyToConsume to wait for the peer's KEXINIT.
libssh2 non-blocking mode's state machine lives in the caller — the caller uses a step variable to remember "was I on handshake() or userauth() last time." MiSSHod's state machine lives inside the library — the caller's loop doesn't track steps at all. The state machine's complexity hasn't disappeared; it moved from the caller into the library.
Both Pollable API and Sans-IO state machines are explicit — hand-written state enums and switch branches. Different from Go's procedural code (call stack implicitly maintains state) and Rust's async/await (compiler splits into state machine at each .await point). Procedural code is easier to read (all context is in front of you); explicit state machines are easier to audit (each state can only transition to specific next states, illegal transitions are detectable at compile time). Sans-IO requires the programmer to manually do what the compiler does automatically in Rust (I unpacked how the Rust compiler transforms async functions into state machines in an earlier article[4]) — which is one reason Sans-IO is less popular in languages that have async/await.
Three Code Styles Hiding in Five Libraries
The five-library comparison left one thing unresolved. Go's conn.Read() looks like ordinary sequential code. Rust's .await looks similar but adds a keyword. libssh2 non-blocking mode's EAGAIN retry loop is yet another completely different style. What do these three styles each mean? Are they surface-level syntax differences, or do they reflect something deeper?
Jumping between the five libraries, I kept running into these style differences without ever confronting them directly. Time to lay them side by side.
Go's style has a name: sequential blocking. The characteristic is that source order equals execution order — line 1 completes before line 2:
data := conn.Read() // stuck here waiting for data
result := process(data) // process after data arrives
conn.Write(result) // send
"Sequential blocking" — that phrase stuck with me. If this is a style, what's its opposite? What kind of code has source order not equal to execution order?
conn.read(function(data) {
    result = process(data);
    conn.write(result, function() {
        console.log("done");
    });
});
console.log("runs first"); // executes before read completes
Source order and execution order diverge. As logic grows complex, callbacks nest deeper and deeper — callback hell.
To solve this problem, C# introduced async/await syntax[5] in 2012, with the core idea of making non-blocking code look sequential again. The design was widely adopted — Python 3.5[6] (2015), JavaScript ES2017[7], Rust 1.39[8] (2019) each introduced their own async/await:
let data = conn.read().await;
let result = process(data);
conn.write(result).await;
Looks identical to Go's sequential code, but the underlying mechanism is non-blocking. .await isn't a blocking syscall — the thread is not suspended by the kernel. It's a cooperative yield: the async runtime switches to other tasks at this point, the thread continues doing other work, and comes back when data arrives.
Go and Rust achieve the same effect: non-blocking multiplexing underneath, sequential-looking code on top. But Go's style is called "sequential blocking" and Rust's is called "sequential non-blocking." This naming confused me for a very long time — both sides "stop and wait for data," the goroutine suspends at Read(), the async task yields at .await — why is one called blocking and the other non-blocking? Is only Go waiting while Rust somehow isn't?
I only figured this out later while studying the tokio runtime. The distinction isn't about "whether you wait" — it's about whether the function returns.
Go's conn.Read() does not return until data arrives. From the API contract's perspective, this function promises "I won't give you back control until I have data." The goroutine being suspended by Go's runtime, epoll monitoring the socket underneath — these are implementation details, but they don't change one fact: the Read() function call just sits there not returning. That's what "blocking" means: the function doesn't return while the operation is incomplete.
Rust's conn.read().await is different. What .await compiles to:
match self.read_future.poll(cx) {
    Poll::Ready(data) => data,              // data arrived, continue
    Poll::Pending => return Poll::Pending,  // no data, function returns
}
When data hasn't arrived, the function returns Poll::Pending — an ordinary function return, not a suspension. Control goes back to the tokio runtime, which executes other tasks. When data arrives, the runtime calls this function again, re-checking from the same match arm. I unpacked this mechanism in an earlier article[4]: the essence of .await is a function return.
So "blocking" and "non-blocking" describe the API contract — whether the function returns when the operation isn't complete — not "whether something is waiting for data." Both Go and Rust are waiting for data, but Go waits by not returning from the function (blocking), and Rust waits by returning a "not ready yet" signal and coming back later (non-blocking).
This naming distinction was my biggest source of confusion over the past few years, and untangling it my biggest moment of clarity; "biggest clarity" sounds like a hot take, I know. That's also why the opening suggests not reading further. But having sorted this out, looking back at the five SSH libraries' design choices, I can now tell what trade-offs each made in code style.
On top of this, Go and Rust have one more difference at the syntax level. Go's conn.Read() yields the goroutine, but you can't tell from the code — no syntactic marker indicates a suspension point. Rust's .await explicitly marks the yield: you know this line might suspend, you know the runtime might switch away here. Go says "you don't need to know." Rust says "you should know."
Two Dimensions, Not One Line
Time to step back and organize. I initially treated "blocking → event loop → Pollable API → Sans-IO" as a linear progression, as if later ones are always better than earlier ones. But the two sections above, unpacked separately, reveal two orthogonal questions mixed together:
Who does I/O? The library or the caller? This is library architecture — from "library does everything" through Pollable API to Sans-IO, the first half of this article explored this dimension.
How is I/O done? Sequential blocking, callbacks, async/await? This is code style and I/O mechanism — the dimension just clarified in the previous section.
These two dimensions are orthogonal. Sans-IO only answers the first question: "the caller does I/O." After getting a Sans-IO library, the caller can drive it with blocking read(), with epoll, or with async/await — the second dimension is entirely the caller's choice. Go's x/crypto/ssh bundles both answers together: "library does I/O" + "blocking + goroutines." That's not a bad choice — goroutines make blocking I/O costless, so bundling makes sense. But in languages without goroutines, that bundle is a constraint.
The Parts I Haven't Figured Out
Sans-IO isn't free. The caller has to manage read()/write() buffers, handle partial reads, handle error retries. What Go's ssh.NewServerConn(conn) does in one line, a Sans-IO caller needs dozens of lines for the driver loop.
The trade-off between flexibility and simplicity has different optima for different language ecosystems. Go has goroutines — blocking isn't a burden. Rust has tokio — async is the standard approach. Zig has no runtime — Sans-IO doesn't make choices for the caller. It's not about which is more advanced; it's about which choice is most reasonable under which constraints.
What I'm still thinking about is the boundary between Sans-IO and Pollable API. Pollable API also doesn't block the caller, also integrates with external event loops — the library just calls read() itself. How big is this difference in practice? Testability is one clear advantage — Sans-IO allows pure in-memory testing, while Pollable API needs real sockets. Beyond that, I don't have enough production experience to judge. I'll come back to fill this in later.
References
[1] libcurl - multi interface - Daniel Stenberg's API design for non-blocking, caller-driven HTTP handling.
[2] Sans-IO - The authoritative definition of the Sans-IO pattern for network protocol libraries.
[3] MiSSHod server_session.zig - The server-side session state machine in MiSSHod, a pure Zig SSH implementation.
[4] There's No Magic in Tokio - My earlier deep dive into how Rust's async/await compiles to state machines and how tokio drives them.
[5] C# Asynchronous Programming - Microsoft's documentation on the async/await model introduced in C# 5.0 (2012).
[6] PEP 0492 — Coroutines with async and await syntax - The Python Enhancement Proposal that introduced async/await to Python 3.5 (2015).
[7] ECMAScript Async Function Definitions - The ECMAScript specification for async functions (ES2017).
[8] Async-Await Hits Stable Rust - The Rust blog post announcing async/await stabilization in Rust 1.39 (2019).