Instant by default, multiplayer by design

A deep dive into the sync layer that makes Plain feel fast and collaborative. The browser holds a live replica of the slice of Postgres it's looking at, and edits documents as CRDTs. Two sync engines, one principle: subscribe to state, not endpoints. Here's how it works, end to end.

Open a pull request on GitHub, glance at the checks, and notice what your hands do. You reload. Not because nothing updates on its own (some things do; a comment thread will append while you're reading it), but because you can't tell from the outside which parts of the page are live and which are a photograph taken when it rendered. So you refresh to be sure. You cmd-R the PR to see if CI finally went green. You re-open the issue because someone might have changed the status while you had it sitting there. The reload is a reflex you built to cope with not knowing.

This isn't GitHub being careless. It's what happens when realtime gets added to a fifteen-year-old request/response app one surface at a time. Each feature that got it, got it on its own: a websocket here, a poll there, a "this conversation has been updated, reload" banner somewhere else. The result works (half the software in the world ships through it), but it sits well below the bar that Notion, Linear, and Figma reset for the rest of us. Those apps retrained everyone to expect that every view is live and every cursor is visible, with no thought given to freshness because freshness is just always there. Against that bar, "most of it updates when you reload" feels like the past.

A developer platform feels this more than most software, because a developer platform is inherently multiplayer. Issues, reviews, CI, chat, and now agents are all happening at once, to the same objects, while you watch. Patchy freshness is most expensive exactly where the most is going on.

The workaround became the convention

The industry's answer to staleness is to bolt freshness onto individual surfaces. Poll this endpoint every ten seconds. Open a websocket for the one feature that really needed it. Add pull-to-refresh. Show a banner when the server suspects you're out of date. Each surface reinvents freshness with its own mechanism, its own race conditions, its own spinner, and the ones nobody got around to just don't update at all.

The tell is that "realtime" is a line item on a feature spec. It's a thing you decide to add, per feature, after the fact. We think that's backwards. The reason the experience is inconsistent is that liveness was never the substrate; it was a finish applied unevenly on top of one.

Subscribe to state, not endpoints

Plain is built the other way around. The client doesn't call endpoints and cache the responses. It subscribes to a live replica of the slice of the database it's currently looking at.

The list of issues on your screen is a query against a local store that the server keeps up to date for you. When a row changes anywhere, whether the change comes from another teammate, a CI runner, or an agent, it's in your store before you would have thought to reload. There is no "realtime version" of a view to contrast against a stale one, because every view is a live query by construction. The refresh reflex has nothing to do, so it goes away.

Figure: Issues in Plain updating live as teammates change them, with no page reload.

That sounds like one trick. It's actually two, because there are two genuinely different kinds of state in the product, and they want two different machines.

Two kinds of state

The naive move is to pick one realtime technology and use it for everything. We didn't, because forcing one engine onto both kinds of state gives you a bad version of each.

The first kind is rows. Issues, pull requests, CI runs, messages, reactions, notifications, packages. These have a single source of truth (Postgres), a clear authority, and last-write-wins semantics. Two people rarely set the same issue's status in the same second, and if they do, the later write should win and everyone should converge on it. What you want here is for the browser to mirror Postgres cheaply, with the database as referee.

The second kind is document bodies. The text of a doc, an issue description, a PR body. Here many people genuinely do type into the same paragraph at the same time. Last-write-wins would mean one person's sentence eats another's. What you want is character-level concurrent merging that always converges, without a central referee adjudicating every keystroke.

These are different consistency models, so we use a purpose-built tool for each, and the tools' own descriptions make the split feel inevitable. ElectricSQL calls itself "a read-path sync engine for Postgres": it syncs data out of Postgres into clients, and has no write path at all, by design. Yjs calls itself "a high-performance CRDT" that needs "no central source of truth to perform conflict resolution." One is built to replicate an authoritative database. The other is built to merge concurrent edits with no authority in the loop. We just declined to pretend there was only one kind of state.

The two never fight, because they own different fields of the same object. An issue's status rides Electric, as a column on a row. Its body rides Yjs, as a CRDT document. They reconcile independently. The rest of this post is those two engines, and what falls out once they're the floor instead of a finish.

Throughout, it helps to follow one concrete thing: a CI run, streaming its steps onto your screen as they execute. It touches every layer, so we'll keep returning to it.

Engine one: rows, over Electric

Electric is a sync engine that reads the Postgres write-ahead log via logical replication and serves "Shapes" over plain HTTP. A Shape, in Electric's vocabulary, is "a filtered subset of a Postgres table": single-table, defined by the table name plus an optional where clause and column list. The client consumes a "Shape Log" by long-polling, and Electric streams each change as it lands in the log.
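To make the Shape Log concrete, here is a minimal sketch of how the request URLs for one Shape might be assembled. The `shapeUrl` helper and its two interfaces are hypothetical. The parameter names (`table`, `where`, `columns`, `offset`, `handle`, `live`) follow Electric's HTTP API as we understand it, so check them against the Electric docs for the version you run.

```typescript
// Hypothetical helper, not part of Electric's client library.
interface ShapeDefinition {
  table: string;
  where?: string;
  columns?: string[];
}

interface ShapeCursor {
  offset: string;  // "-1" requests the shape log from its start
  handle?: string; // returned by the server; sent on follow-up requests
  live?: boolean;  // true holds the request open until a change arrives
}

function shapeUrl(base: string, shape: ShapeDefinition, cursor: ShapeCursor): string {
  const params: Array<[string, string]> = [["table", shape.table]];
  if (shape.where !== undefined) params.push(["where", shape.where]);
  if (shape.columns !== undefined) params.push(["columns", shape.columns.join(",")]);
  params.push(["offset", cursor.offset]);
  if (cursor.handle !== undefined) params.push(["handle", cursor.handle]);
  if (cursor.live === true) params.push(["live", "true"]);
  const query = params.map(([k, v]) => `${k}=${encodeURIComponent(v)}`).join("&");
  return `${base}/v1/shape?${query}`;
}

// First request: fetch the log from the beginning.
const initialUrl = shapeUrl(
  "http://localhost:3000",
  { table: "ci_step", where: "run_id = 'r1'" },
  { offset: "-1" },
);

// Later requests: resume from the last offset and long-poll for new changes.
const followUpUrl = shapeUrl(
  "http://localhost:3000",
  { table: "ci_step" },
  { offset: "0_0", handle: "h1", live: true },
);
```

The offset and handle values here are made up for illustration; in practice the client echoes back whatever the previous response reported.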

The browser never talks to Electric directly. Every Shape request goes through one authenticating route in the web app, and that route is the entire security boundary:

apps/web/src/routes/api/electric/v1/shape.ts (server)

// The client names a table and an org. The server decides what it may see.
const session = await auth.api.getSession({ headers: request.headers });
if (!session) return json({ error: "unauthorized" }, { status: 401 });
if (!ALLOWED_TABLES.has(table)) return json({ error: "table not allowed" }, { status: 403 });

// Resolve the org slug to an id and confirm the caller is a member.
const { id: orgId, role } = await resolveMembership(orgSlug, session.user.id);
if (role === null) return json({ error: "forbidden" }, { status: 403 });

// The WHERE clause is built here, from validated ids, never sent by the client.
let where = `organization_id = '${orgId}'`;
if (table === "ci_step") where += ` AND run_id = '${runId}'`;

The important property is what the client can't do. It names a table and an org slug; it does not get to supply a filter. The proxy authenticates the session, checks the table against an allow-list of substrate tables that actually have a realtime consumer, resolves the org and confirms membership, then writes the where clause itself, always pinned to organization_id and then narrowed to the tightest scope the table supports: a repo, a CI run, the caller's own user id, a conversation they belong to. Every id is shape-checked against a strict regex before it goes near the SQL string. A caller can only ever stream rows from an org they belong to. Tables with no business syncing to the browser (package tarball bytes, say) simply aren't on the allow-list, so they never ride the pipe.

The client cannot ask a wrong question. It names a table and an org; the proxy writes the where.
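The allow-list and regex checks described above can be sketched as one pure function. This is an illustration, not Plain's actual code: the table list, the assumption that ids are UUIDs, and the `buildWhere` name and return type are all hypothetical.

```typescript
// Illustrative allow-list; the real set of tables is not given in this post.
const ALLOWED_TABLES = new Set<string>(["issue", "ci_step", "message"]);

// Assumption: ids are lowercase UUIDs. Anything else is rejected outright.
const ID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/;

type WhereResult =
  | { ok: true; where: string }
  | { ok: false; status: number; error: string };

function buildWhere(table: string, orgId: string, runId?: string): WhereResult {
  if (!ALLOWED_TABLES.has(table)) {
    return { ok: false, status: 403, error: "table not allowed" };
  }
  if (!ID_PATTERN.test(orgId)) {
    return { ok: false, status: 400, error: "bad org id" };
  }
  // Every clause starts pinned to the caller's organization.
  let where = `organization_id = '${orgId}'`;
  if (table === "ci_step") {
    // Narrow to one run, and only with an id that passed the shape check.
    if (runId === undefined || !ID_PATTERN.test(runId)) {
      return { ok: false, status: 400, error: "bad run id" };
    }
    where += ` AND run_id = '${runId}'`;
  }
  return { ok: true, where };
}
```

Because the only values interpolated into the string have already matched the id pattern, an input such as `x' OR '1'='1` is rejected before any SQL text is built.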

On the client, each Shape is wired to a TanStack DB collection, which calls itself "the reactive client store for your API." A collection is a singleton per scope that points at the proxy:

apps/web/src/collections/ci-steps.ts (client)

createCollection(
  electricCollectionOptions<CiStepRow>({
    id: `ci-steps-${runId}`,
    shapeOptions: {
      url: `${window.location.origin}/api/electric/v1/shape`,
      params: { table: "ci_step", org: owner, run: runId },
    },
    getKey: (row) => row.id,
  }),
)
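The "singleton per scope" part can be sketched as a keyed cache in front of the constructor. The snippet below is an assumption about how such a factory could look, with a plain object (`StepCollection`, `createStepCollection`) standing in for the real TanStack DB collection so that it runs on its own.

```typescript
// Stand-in for a real collection; only the id matters for this sketch.
interface StepCollection {
  id: string;
}

let createdCount = 0;

function createStepCollection(id: string): StepCollection {
  createdCount += 1; // counts how many collections were actually constructed
  return { id };
}

const collections = new Map<string, StepCollection>();

// One collection per (owner, run) scope, created lazily and then reused.
function getCiStepsCollection(runId: string, owner: string): StepCollection {
  const key = `${owner}/${runId}`;
  let collection = collections.get(key);
  if (collection === undefined) {
    collection = createStepCollection(`ci-steps-${runId}`);
    collections.set(key, collection);
  }
  return collection;
}
```

Reusing the same instance means every component that looks at a given run shares one subscription instead of opening its own.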

React reads it through one uniform idiom, a live query, and re-renders whenever Electric streams a change:

apps/web/src/routes/.../ci/$runId.tsx (client)

const collection = getCiStepsCollection(runId, owner);
const { data: steps } = useLiveQuery((q) => collection, [collection]);

No manual invalidation, no refetch, no "did this change" bookkeeping. And the live query is incremental: TanStack DB runs queries on differential dataflow, so a streamed change repatches just the affected rows rather than re-running the query, which is why a list with a hundred thousand rows stays smooth while updates land. Because a Shape is single-table, we expose one Shape per table and compose related data client-side with live queries, rather than asking the sync engine to perform a join.
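The idea of patching only the affected rows can be shown with a toy incrementally maintained filter. It is far simpler than differential dataflow and is not how TanStack DB is implemented; the `IncrementalFilter` class and `CiStep` type are hypothetical and exist only to show that one change costs one predicate evaluation, regardless of how many rows the view already holds.

```typescript
type Change<T> =
  | { type: "upsert"; key: string; row: T }
  | { type: "delete"; key: string };

class IncrementalFilter<T> {
  readonly result = new Map<string, T>();
  evaluations = 0; // how many rows the predicate has inspected so far
  private readonly predicate: (row: T) => boolean;

  constructor(predicate: (row: T) => boolean) {
    this.predicate = predicate;
  }

  // Touch only the row named by the change; never rescan the whole input.
  apply(change: Change<T>): void {
    if (change.type === "delete") {
      this.result.delete(change.key);
      return;
    }
    this.evaluations += 1;
    if (this.predicate(change.row)) this.result.set(change.key, change.row);
    else this.result.delete(change.key);
  }
}

interface CiStep {
  id: string;
  status: "running" | "passed" | "failed";
}

const failingSteps = new IncrementalFilter<CiStep>((s) => s.status === "failed");
for (let i = 0; i < 1000; i++) {
  failingSteps.apply({ type: "upsert", key: `s${i}`, row: { id: `s${i}`, status: "running" } });
}

const evaluationsBefore = failingSteps.evaluations;
failingSteps.apply({ type: "upsert", key: "s42", row: { id: "s42", status: "failed" } });
const workForOneChange = failingSteps.evaluations - evaluationsBefore;
```

A naive implementation would re-run the predicate over all thousand rows after each change; here the single update inspects a single row.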

So here is the CI run, end to end. The runner, executing your pipeline somewhere, writes a ci_step row when a command begins and updates it as output streams in. Electric sees that write on the WAL, recomputes the affected Shape, and pushes it down the open long-poll. The proxy forwards it. TanStack DB patches the ci-steps-<runId> collection. The live query fires, and the step appears on the run page, expands, and follows its own output as it scrolls. Nobody reloaded anything. The page is watching the database.
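The client half of that pipeline can be sketched as a keyed store that applies change messages and then notifies its subscribers. The `LocalStore` class is hypothetical. The message shape (a `key`, a `value`, and an `operation` header) mirrors Electric's change messages as we understand them, including the assumption that an update may carry only the changed columns.

```typescript
type Row = Record<string, unknown>;

interface ChangeMessage {
  key: string;
  value: Row;
  headers: { operation: "insert" | "update" | "delete" };
}

class LocalStore {
  private readonly rows = new Map<string, Row>();
  private readonly listeners = new Set<() => void>();

  // Returns an unsubscribe function, as most reactive stores do.
  subscribe(listener: () => void): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  applyBatch(messages: ChangeMessage[]): void {
    for (const m of messages) {
      if (m.headers.operation === "delete") {
        this.rows.delete(m.key);
      } else {
        // Merge so that a partial update keeps the columns it did not mention.
        this.rows.set(m.key, { ...this.rows.get(m.key), ...m.value });
      }
    }
    if (messages.length > 0) this.listeners.forEach((listener) => listener());
  }

  get(key: string): Row | undefined {
    return this.rows.get(key);
  }
}

const stepStore = new LocalStore();
let renderCount = 0;
stepStore.subscribe(() => {
  renderCount += 1;
});

// The runner inserts the step, then updates it as output streams in.
stepStore.applyBatch([
  { key: "step-1", value: { id: "step-1", name: "test", output: "" }, headers: { operation: "insert" } },
]);
stepStore.applyBatch([
  { key: "step-1", value: { id: "step-1", output: "ok 12 tests\n" }, headers: { operation: "update" } },
]);
```

Each batch triggers one notification, which is the point where a live query would re-render the step.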

The instant trick

There's a gap in that story. Electric is read-path only. It syncs data out of Postgres and has no way to send a write back. So how does a user's own action (dragging an issue across a board, toggling a reaction) feel instant, when the write has to round-trip to the server and come back down the read path before it's authoritative?

The answer is the part that makes the whole thing feel fast, and TanStack DB frames it exactly right: "the instant inner loop of optimistic state, superseded in time by the slower outer loop of persisting to the server." The inner loop is local and synchronous. The outer loop is the server function plus the Electric stream. A Postgres transaction id is the thread between them.

When you change an issue's status, the collection applies an optimistic overlay immediately. The UI updates on the spot, with no spinner and no waiting on the network. Then the collection's write handler runs, sends only the changed fields to a server function, and returns the transaction id of the resulting write:

apps/web/src/collections/issues.ts (client)

onUpdate: async ({ transaction }) => {
  const { original, modified } = transaction.mutations[0];
  const patch = diffIssue(original, modified); // only the fields that changed
  const { txid } = await updateIssue({ data: { id: original.id, ...patch } });
  return { txid };
},
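The `diffIssue` helper is not shown in this post. A minimal sketch of what such a function could do, assuming a small hypothetical `Issue` type, is to compare the two versions field by field and keep only the differences:

```typescript
// Hypothetical row type; the real issue has more columns.
interface Issue {
  id: string;
  title: string;
  status: string;
  assigneeId: string | null;
}

type IssuePatch = Partial<Omit<Issue, "id">>;

// Returns only the fields whose values differ between the two versions.
function diffIssue(original: Issue, modified: Issue): IssuePatch {
  const patch: IssuePatch = {};
  if (original.title !== modified.title) patch.title = modified.title;
  if (original.status !== modified.status) patch.status = modified.status;
  if (original.assigneeId !== modified.assigneeId) patch.assigneeId = modified.assigneeId;
  return patch;
}

const originalIssue: Issue = { id: "i1", title: "Fix flaky test", status: "open", assigneeId: null };
const statusOnlyPatch = diffIssue(originalIssue, { ...originalIssue, status: "done" });
```

Sending only the changed fields keeps two people editing different columns of the same issue from overwriting each other's work.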

The server function does the write and reads the transaction id back in the same statement, which turns out to matter:

apps/web/src/lib/issue.server.ts (server)

const [updated] = await db
  .update(issue)
  .set(patch)
  .where(eq(issue.id, id))
  .returning({
    id: issue.id,
    updatedAt: issue.updatedAt,
    // Read the txid in the same statement: each statement commits in its own
    // transaction, so a separate SELECT would report a different one. The ::xid
    // cast narrows it to the value Electric stamps on the row it streams back.
    txid: sql<string>`pg_current_xact_id()::xid::text`,
  });
return { ...updated, txid: Number(updated.txid) };
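The narrowing done by the `::xid` cast is plain arithmetic. `pg_current_xact_id()` returns a 64-bit `xid8` whose high bits count wraparound epochs, and casting to `xid` keeps the low 32 bits. A small sketch (the `narrowXid8` name is hypothetical) shows the effect and why the result is safe to hold in a JavaScript number:

```typescript
// Keep the low 32 bits of a 64-bit transaction id, as the ::xid cast does.
function narrowXid8(xid8: bigint): number {
  // A 32-bit unsigned value always fits exactly in a JavaScript number.
  return Number(BigInt.asUintN(32, xid8));
}

// An id in epoch 1: (1 * 2^32) + 742.
const epochOneXid8 = (BigInt(1) << BigInt(32)) + BigInt(742);
const narrowedXid = narrowXid8(epochOneXid8);
```

On a database that has never wrapped around, the two forms are numerically equal, which is why skipping the cast can appear to work for a long time before it stops matching.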

When Electric streams the authoritative row back down, it carries that same transaction id. TanStack DB matches it and retires the optimistic overlay in favor of the real value. If the write fails instead, the overlay is rolled back. Either way the optimistic value cannot linger as stale state: every overlay is either confirmed by a matching txid or rolled back, never left dangling.
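The two loops can be modeled in a few lines. This is a toy model of the idea, not TanStack DB's implementation: the `OptimisticIssues` class, its method names, and the `IssueRow` type are hypothetical. Reads layer pending overlays on top of the synced row, and an overlay is retired once a synced change carrying its txid has been seen.

```typescript
interface IssueRow {
  id: string;
  status: string;
}

interface Overlay {
  mutationId: number;
  issueId: string;
  patch: Partial<IssueRow>;
  txid?: number; // filled in once the server function returns
}

class OptimisticIssues {
  private readonly synced = new Map<string, IssueRow>();
  private readonly seenTxids = new Set<number>();
  private overlays: Overlay[] = [];
  private nextMutationId = 1;

  // Inner loop: record the overlay; reads reflect it immediately.
  mutate(issueId: string, patch: Partial<IssueRow>): number {
    const mutationId = this.nextMutationId++;
    this.overlays.push({ mutationId, issueId, patch });
    return mutationId;
  }

  // The server function returned a txid for this mutation.
  acknowledge(mutationId: number, txid: number): void {
    if (this.seenTxids.has(txid)) {
      this.rollback(mutationId); // the synced row already arrived; drop the overlay
      return;
    }
    const overlay = this.overlays.find((o) => o.mutationId === mutationId);
    if (overlay !== undefined) overlay.txid = txid;
  }

  // The write failed: discard the overlay so reads fall back to synced state.
  rollback(mutationId: number): void {
    this.overlays = this.overlays.filter((o) => o.mutationId !== mutationId);
  }

  // Outer loop: an authoritative row arrives, tagged with the txids that produced it.
  applySynced(row: IssueRow, txids: number[]): void {
    this.synced.set(row.id, row);
    txids.forEach((t) => this.seenTxids.add(t));
    this.overlays = this.overlays.filter(
      (o) => o.txid === undefined || !this.seenTxids.has(o.txid),
    );
  }

  read(issueId: string): IssueRow | undefined {
    const base = this.synced.get(issueId);
    if (base === undefined) return undefined;
    return this.overlays
      .filter((o) => o.issueId === issueId)
      .reduce<IssueRow>((row, o) => ({ ...row, ...o.patch }), base);
  }

  pendingCount(): number {
    return this.overlays.length;
  }
}

const board = new OptimisticIssues();
board.applySynced({ id: "i1", status: "open" }, []);

const dragMutation = board.mutate("i1", { status: "done" });
const statusWhileInFlight = board.read("i1")?.status; // overlay is visible at once

board.acknowledge(dragMutation, 7);
board.applySynced({ id: "i1", status: "done" }, [7]); // confirmed by txid 7
const pendingAfterConfirm = board.pendingCount();

const failedMutation = board.mutate("i1", { status: "closed" });
board.rollback(failedMutation); // the write failed
const statusAfterRollback = board.read("i1")?.status;
```

The `seenTxids` set covers the case where the synced row arrives before the server function's response does, so the order of the two events does not matter.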

This is one mechanism, and it's the same one everywhere. Dragging an issue across a board is just a status change, so the board reuses the issue list's write path with no board-specific server code. Reactions, message edits, mark-as-read, archiving a notification: all the same shape. Write locally, confirm by txid.

Engine two: documents, over Yjs

Document bodies don't have a single authoritative writer, so they don't go through any of that. They're CRDTs.

The editor is Plate with a Yjs binding, and each open document is one Y.Doc. A dedicated service, apps/collab, runs Hocuspocus, which describes itself as "a backend for Yjs": a websocket server that implements the Yjs CRDT, with persistence and auth. A keystroke becomes a Yjs update, travels over the websocket, merges into the server's copy of the document, and broadcasts to every other connected peer, whose bindings merge it in turn. Because it's a CRDT, the order updates arrive in doesn't matter and concurrent edits to the same paragraph converge. Nobody's sentence eats anybody's.
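Order independence is easier to believe after seeing it in a toy. The sketch below is not Yjs's algorithm and is far less capable. It is a hypothetical position-keyed character set (`ToyText`) in which inserts are keyed by a unique id, deletes are tombstones, and ties at the same position break on the author's site id. Applying the same operations in any order yields the same text.

```typescript
interface Char {
  id: string;   // globally unique, for example "site:counter"
  pos: number;  // where the character sorts in the document
  site: string; // author; breaks ties between concurrent inserts
  ch: string;
}

type Op =
  | { kind: "insert"; char: Char }
  | { kind: "delete"; id: string };

class ToyText {
  private readonly chars = new Map<string, Char>();
  private readonly tombstones = new Set<string>();

  // Both operations are set additions, so they commute and are idempotent.
  apply(op: Op): void {
    if (op.kind === "insert") this.chars.set(op.char.id, op.char);
    else this.tombstones.add(op.id);
  }

  toString(): string {
    return Array.from(this.chars.values())
      .filter((c) => !this.tombstones.has(c.id))
      .sort((a, b) => a.pos - b.pos || (a.site < b.site ? -1 : a.site > b.site ? 1 : 0))
      .map((c) => c.ch)
      .join("");
  }
}

const ops: Op[] = [
  { kind: "insert", char: { id: "a:1", pos: 1, site: "a", ch: "H" } },
  { kind: "insert", char: { id: "a:2", pos: 2, site: "a", ch: "i" } },
  // Two people append at the same position concurrently.
  { kind: "insert", char: { id: "a:3", pos: 3, site: "a", ch: "!" } },
  { kind: "insert", char: { id: "b:1", pos: 3, site: "b", ch: "?" } },
  // One of them also deletes the "i".
  { kind: "delete", id: "a:2" },
];

const inOrder = new ToyText();
ops.forEach((op) => inOrder.apply(op));

const reversed = new ToyText();
ops.slice().reverse().forEach((op) => reversed.apply(op));
```

In the reversed replica the delete arrives before the insert it refers to, and the tombstone still takes effect once the character shows up.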

Persistence is debounced and lives entirely on the server. Hocuspocus waits two seconds after the last edit, forced at ten seconds of continuous typing, then writes a single compacted snapshot. No per-keystroke database write. And in the same write, it does something worth pausing on: it projects the document to markdown and stores both side by side.
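The debounce rule can be written as a small pure function that, given the times of the edits, returns the times at which a snapshot would be written. This is a sketch under one reading of the rule: the two-second window restarts on every edit, and the ten-second cap is measured from the first unsaved edit. The `flushTimes` function is hypothetical.

```typescript
const DEBOUNCE_MS = 2000;  // quiet period after the last edit
const MAX_WAIT_MS = 10000; // cap measured from the first unsaved edit

// editTimes must be sorted ascending, in milliseconds.
function flushTimes(editTimes: number[]): number[] {
  const flushes: number[] = [];
  let firstPending: number | null = null;
  let deadline = 0;
  for (const t of editTimes) {
    // If the timer fired before this edit, a snapshot was written at the deadline.
    if (firstPending !== null && t >= deadline) {
      flushes.push(deadline);
      firstPending = null;
    }
    if (firstPending === null) firstPending = t;
    deadline = Math.min(t + DEBOUNCE_MS, firstPending + MAX_WAIT_MS);
  }
  if (firstPending !== null) flushes.push(deadline);
  return flushes;
}

// A short burst: one write, two seconds after the last keystroke.
const burst = flushTimes([0, 500, 1000]);

// Typing once a second for twelve seconds: a forced write at ten seconds,
// then a final write two seconds after typing stops.
const continuous = flushTimes([0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 11000, 12000]);
```

Thirteen edits produce two database writes in the second case, which is the point of compacting into snapshots instead of persisting every update.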

apps/collab/src/persistence.ts (server)

// One writer keeps the CRDT snapshot and its markdown projection in lockstep.
const snapshot = Y.encodeStateAsUpdate(document);
const markdown = toMarkdown(document);
await pool.query(
  `UPDATE doc SET body_yjs = $1, body_md = $2, updated_at = now() WHERE id = $3`,
  [Buffer.from(snapshot), markdown, docId],
);

A single writer keeps the binary CRDT and its markdown rendering in lockstep, so they can never drift, and no two open tabs ever race to write the cache. The server renders markdown with the exact same node definitions the browser uses, so the markdown a search index or an agent reads is byte-identical to what the editor would have produced. Every non-editing view (the doc list, search, export) gets a consistent rendering for free, and the editor never has to write it.

Access is a short-lived signed token. The web app mints a five-minute JWT scoped to a specific document, after checking org membership; the collab server verifies it and confirms the scope matches the room being opened. The token is passed as a function rather than a value, so a reconnect silently re-mints it.
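The claim checks and the token-as-function idea can be sketched without any cryptography. The snippet below deliberately omits signing and signature verification, which a real JWT library would handle, and models only the two checks described above: expiry and document scope. The claim field names and the `mintClaims`, `claimsAllowRoom`, and `connect` functions are hypothetical.

```typescript
interface CollabClaims {
  sub: string; // user id
  doc: string; // the one document this token is scoped to
  exp: number; // expiry, in seconds since the epoch
}

const TOKEN_TTL_SECONDS = 5 * 60;

// Web app side: mint claims after the org membership check has passed.
function mintClaims(userId: string, docId: string, nowSec: number): CollabClaims {
  return { sub: userId, doc: docId, exp: nowSec + TOKEN_TTL_SECONDS };
}

// Collab server side: after verifying the signature, check expiry and scope.
function claimsAllowRoom(claims: CollabClaims, room: string, nowSec: number): boolean {
  return claims.exp > nowSec && claims.doc === room;
}

// The client hands over a function, so each (re)connect gets a fresh token.
function connect(getToken: () => CollabClaims, room: string, nowSec: number): boolean {
  return claimsAllowRoom(getToken(), room, nowSec);
}

let clockSec = 1000;
const getToken = (): CollabClaims => mintClaims("u1", "doc-42", clockSec);

const firstConnect = connect(getToken, "doc-42", clockSec);

// Six minutes later a token captured as a value would have expired...
const capturedToken = getToken();
clockSec += 6 * 60;
const staleTokenAccepted = claimsAllowRoom(capturedToken, "doc-42", clockSec);

// ...but a reconnect through the function mints a new one.
const reconnect = connect(getToken, "doc-42", clockSec);
const wrongRoom = connect(getToken, "doc-99", clockSec);
```

Passing a function is what lets a long-lived tab survive a dropped websocket without the user noticing that the original token expired.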

The detail people tend to like most: agents edit the live document too, and they do it as a peer, not by writing markdown to the database behind everyone's back. When an agent rewrites a doc body, it opens the same Y.Doc a human would, computes a block-level three-way merge between the document it started from, its proposed version, and the current live state, and applies the result one block at a time, anchored to Yjs relative positions so the edits land in the right place even as a human keeps typing. It even carries an awareness cursor. You watch the agent write through the same colored caret you'd see a teammate through.
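A block-level three-way merge can be sketched over maps from block id to text. This is a simplification under stated assumptions: block order and the anchoring to Yjs relative positions are left out, and when both the agent and a human changed the same block the sketch keeps the human's live version, which is a policy choice of this example and not something the post specifies. The `mergeBlocks` function is hypothetical.

```typescript
type Blocks = Map<string, string>; // block id -> text

function mergeBlocks(base: Blocks, proposed: Blocks, live: Blocks): Blocks {
  const ids = new Set<string>([
    ...Array.from(base.keys()),
    ...Array.from(proposed.keys()),
    ...Array.from(live.keys()),
  ]);
  const merged: Blocks = new Map<string, string>();
  ids.forEach((id) => {
    const b = base.get(id);     // what the agent started from
    const p = proposed.get(id); // what the agent wants
    const l = live.get(id);     // what the document says now
    let result: string | undefined;
    if (p === b) result = l;      // agent left it alone: keep the live state
    else if (l === b) result = p; // only the agent changed it: take the proposal
    else result = l;              // both changed it: keep the live edit (assumption)
    if (result !== undefined) merged.set(id, result); // undefined means deleted
  });
  return merged;
}

const baseDoc: Blocks = new Map([["a", "Intro"], ["b", "Old plan"], ["c", "Notes"]]);
// The agent rewrites block b and adds block d.
const agentDoc: Blocks = new Map([["a", "Intro"], ["b", "New plan"], ["c", "Notes"], ["d", "Summary"]]);
// Meanwhile a human edits block a and deletes block c.
const liveDoc: Blocks = new Map([["a", "Intro, edited by a human"], ["b", "Old plan"]]);

const mergedDoc = mergeBlocks(baseDoc, agentDoc, liveDoc);
```

The human's edit and deletion survive, and the agent's rewrite and new block land alongside them.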

Presence is just awareness

Online dots, "three people here," typing indicators. None of that has a heartbeat table behind it. It's all Yjs awareness, on dedicated content-free rooms riding the same collab websocket. One room per org carries the green dots; one per conversation carries typing.

Yjs describes awareness as "a tiny state-based Awareness CRDT that propagates JSON objects to all users," deliberately ephemeral: it "isn't stored in the Yjs document." Best of all, when a client disconnects, "your own awareness state is automatically deleted and all users are notified that you went offline." We didn't build presence. We adopted a CRDT that already had exactly these semantics. There's no sweeper job, no last-seen column going stale, no heartbeat to miss. A dropped connection just vanishes from the set, and typing expires itself after three seconds.
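The semantics are small enough to model directly. The sketch below is not the Yjs awareness API; `PresenceRoom` is a hypothetical stand-in showing the two behaviors that matter here: a disconnect removes the client's state outright, and a typing flag is derived from a timestamp so it lapses after three seconds without any cleanup job.

```typescript
interface PresenceState {
  name: string;
  typingAt?: number; // ms timestamp of the most recent keystroke
}

const TYPING_TTL_MS = 3000;

class PresenceRoom {
  private readonly states = new Map<number, PresenceState>();

  setState(clientId: number, state: PresenceState): void {
    this.states.set(clientId, state);
  }

  // A dropped connection just removes the entry; nothing is left to go stale.
  disconnect(clientId: number): void {
    this.states.delete(clientId);
  }

  online(): string[] {
    return Array.from(this.states.values()).map((s) => s.name).sort();
  }

  // Typing is computed at read time, so it expires without a sweeper.
  typing(now: number): string[] {
    return Array.from(this.states.values())
      .filter((s) => s.typingAt !== undefined && now - s.typingAt < TYPING_TTL_MS)
      .map((s) => s.name)
      .sort();
  }
}

const room = new PresenceRoom();
room.setState(1, { name: "ada", typingAt: 1000 });
room.setState(2, { name: "lin" });

const onlineNow = room.online();
const typingAtTwoSeconds = room.typing(2000);
const typingAtFourSeconds = room.typing(4000);

room.disconnect(1);
const onlineAfterDrop = room.online();
```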

What falls out for free

Here's the payoff of making sync the floor. None of the following were built as realtime features. They're what you get when every view is already a live query and every document is already a CRDT.

Live CI streaming, the thread we've been following, is just rows arriving over a Shape. The run page renders steps from a live query and follows a running step's output as it streams. When the run finishes, the page swaps the live view for the byte-accurate log archive in object storage. The live part required no streaming infrastructure of its own; it's the same collection machinery that backs the issue list.

Live agent activity is the same: an agent's lifecycle events ride an event Shape, and a message's agent badge animates from working to done as the rows land. Reactions, message edits, and typing indicators are more rows and more awareness, so they came along for the ride once the substrate existed. Notifications turn the per-user notification Shape into a throttled sound and a background-tab OS banner, with no polling. Two browser tabs are simply two subscribers to the same Shapes, so they agree without a line of cross-tab code.

And because the row shapes and Electric wiring live in a shared package, the mobile app reuses the exact same collections; the only real difference is how the session cookie gets attached to the request. One sync model, two clients.

None of these are surfaces we remembered to make live. They're consequences.

What it costs

It would be dishonest to present this as free. It isn't.

It's two engines to operate where a request/response app runs none. Electric needs Postgres on logical replication, which is real operational surface: wal_level=logical, a replication role, a deploy checklist item that says don't ship Electric until the database actually reports it. The collab service is a second stateful thing to run and scale. The Shape proxy is, by design, a choke point that every subscription flows through; that's the right call for tenancy safety, but it's a thing to watch and scale, not a free lunch.

And we live within Electric's grain. It's read-path and single-table on purpose, and its where clause doesn't support every Postgres operator. That's not a knock (it's exactly why Electric stays simple and cacheable), but it does mean writes live in our server functions and joins live in client-side live queries rather than in the sync layer. We chose a tool with a narrow, sharp contract and built around its edges deliberately.

For a single-player CRUD app with one user poking at a form, all of this would be wild overkill, and we'd tell you so. We pay for it because a developer platform is the opposite case: multiplayer by nature, long-lived on the screen, with several kinds of actor changing the same objects while you watch. That is precisely where the request/response tax compounds, and precisely where making liveness the substrate pays for its operational weight.

The shape of the bet

The page used to be a photograph you didn't fully trust, so you reloaded it to be safe. Now it's a window. Nothing on it is older than the database, and you never have to wonder whether this particular view is the live one, because they all are.

That's the whole idea. We don't ship realtime as a feature you remember to add to a surface. We make sync the substrate, and instant and multiplayer stop being things you build and start being the shape of everything you build on top.