I deleted WebSockets and haven’t looked back
Why I swapped Turbo Stream broadcasts for polling after a paying customer got stuck with an infinite spinner.
Hey friends,
Ian Landsman posted something recently. And at first, I totally dismissed it. “Old man yells at cloud!” I was saying in my head.
He ripped every WebSocket call out of his app, moved to polling the server, and said everything just works better. “Sometimes the old ways are the best ways.”
But for some reason… it stuck with me. Probably because I’d been doing the exact opposite all over my own dashboard, and had been bitten by it twice.
A customer paid for a subscription and got dropped on a dead-end spinner that never resolved. The payment went through, Stripe was happy, but the page just never heard about it. A few days before that, someone breezing through onboarding missed a single broadcast and got stuck on a step that never advanced.
So I (begrudgingly) followed Ian’s lead and started deleting. Here’s the whole story, plus something I picked up only after the code was gone.
I was broadcasting everything with Turbo Streams
The dashboard for Ruby Native does a lot of waiting on background jobs. It validates your App Store Connect credentials, checks whether your beta testers got added, watches a build move through GitHub Actions, confirms an in-app purchase webhook, and validates a Stripe subscription.
Every one of those started the same way: a background job did its work, then broadcast the result to the page over Turbo Streams. Add an after_commit on the model and a turbo_stream_from in the view, and the result appears without a refresh. It’s the default UX in Hotwire, and it feels great when you wire it up.
The problem is what happens when the broadcast doesn’t land.
Broadcasts are fire and forget
A Turbo Stream broadcast is a single message pushed down an open socket. If nobody’s listening at that exact moment, it’s gone. There’s no retry or catch-up.
There are two easy ways to miss one. The first is a race: the job finishes and broadcasts before the browser has finished subscribing to the channel, so the message goes out to an empty room. The second is simpler, and way more likely: the connection blips, the message is lost, and nothing ever asks for it again.
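To make the race concrete, here’s a toy in-memory channel in plain Ruby. It is not Action Cable’s API. ToyChannel is a hypothetical stand-in that behaves like a broadcast: it hands each message to whoever is subscribed at that instant and keeps nothing for later.

```ruby
# Hypothetical toy channel with fire-and-forget delivery:
# a message goes to current subscribers and is then dropped.
class ToyChannel
  def initialize
    @subscribers = []
  end

  def subscribe(&handler)
    @subscribers << handler
  end

  # No buffer and no replay: with nobody subscribed, the message is simply gone.
  def broadcast(message)
    @subscribers.each { |handler| handler.call(message) }
  end
end

received = []
channel = ToyChannel.new

# The race: the job finishes and broadcasts before the browser subscribes.
channel.broadcast("credentials validated")
channel.subscribe { |msg| received << msg }

p received # => []
```

Nothing ever asks for the lost message again, so the subscriber’s view of the world stays wrong until something else happens to arrive.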
Either way the page sits there. And on a few of these screens the default state in the database rendered as a spinner, so when the result never arrived, nothing ever replaced it. The page wasn’t mid-update; it was stuck, forever, behind a sad little “try refreshing” link I’d bolted on as a patch.
That patch is what bothered me. If the answer to “my real-time UI didn’t update” is “ask the user to refresh,” then the real-time part isn’t carrying its weight, no?
What if the dashboard just… asked?
So I asked myself a question I probably should have asked at the start. Is pushing just overkill for a dashboard?
A dashboard isn’t a chat app. The updates I care about land seconds (sometimes minutes!) apart, not milliseconds. Nobody needs sub-second delivery to find out their credentials validated. What they need is for the page to be correct, and to recover on its own when something goes wrong.
The database already knew the truth the entire time. Broadcasting was just trying to mirror that truth into an open socket and occasionally missing. So I flipped it around. Instead of the job pushing the result out, the page re-asks the server “what’s true now?” every few seconds, and stops asking as soon as it has its answer.
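In the app the asking happens in the browser, but the control flow is small enough to sketch in plain Ruby. poll_until_done and its arguments are hypothetical names: fetch_state stands in for a request that re-renders from the database, and sleeper is injectable so the example runs instantly.

```ruby
# Hypothetical poll loop: re-ask "what's true now?" until there is an answer.
def poll_until_done(interval:, max_attempts:, fetch_state:, sleeper: ->(s) { sleep(s) })
  max_attempts.times do |attempt|
    state = fetch_state.call
    return state if state[:done] # answer arrived: stop asking
    sleeper.call(interval) unless attempt == max_attempts - 1
  end
  nil # gave up; the caller decides what to show
end

# A fake "database" whose row flips to done on the third read.
reads = 0
fake_db = lambda do
  reads += 1
  { done: reads >= 3, status: reads >= 3 ? "active" : "pending" }
end

result = poll_until_done(interval: 3, max_attempts: 10, fetch_state: fake_db, sleeper: ->(_) {})
p reads # => 3
```

Because every tick reads the current state rather than waiting for a message, a missed tick costs one interval instead of leaving the page stuck.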
Move the transient state into a row
The first piece is giving each async step somewhere durable to record what happened. I added one small model, OnboardingCheck, with a row per app per step. It holds only the transient state of the last attempt: whether a job is in flight and the error if one came back. The authoritative “this step passed” signal stays where it always lived, on the parent record, App.
```ruby
class OnboardingCheck < ApplicationRecord
  belongs_to :app

  # How long a stamped checking_at counts as "a job is in flight" before
  # the page assumes the worker died and re-runs the check on the next poll.
  CHECK_TTL = 2.minutes

  def checking?
    checking_at.present? && checking_at.after?(CHECK_TTL.ago)
  end

  def start!
    update!(checking_at: Time.current, error: nil, reason: nil)
  end

  def finish!(error: nil, reason: nil)
    update!(checking_at: nil, error:, reason:)
  end
end
```

The job calls start! when it kicks off and finish! when it’s done. Notice what that buys you: the error is now persisted. A failure survives a refresh, where before it lived for one broadcast and then vanished.
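Here’s a rough sketch of how a job might wrap its work so the outcome always lands in the row. It assumes plain Ruby rather than Rails: FakeCheck is a hypothetical Struct that only mimics the start!/finish! shape above, and run_check is an illustrative wrapper, not the actual job.

```ruby
# Hypothetical plain-Ruby stand-in for OnboardingCheck (no ActiveRecord).
FakeCheck = Struct.new(:checking_at, :error, :reason) do
  def start!
    self.checking_at = Time.now
    self.error = nil
    self.reason = nil
  end

  def finish!(error: nil, reason: nil)
    self.checking_at = nil
    self.error = error
    self.reason = reason
  end
end

# Hypothetical job body: success or failure, the outcome is written to the row,
# so the next poll (or a refresh) can render it.
def run_check(check)
  check.start!
  yield
  check.finish!
rescue StandardError => e
  check.finish!(error: e.message)
end

check = FakeCheck.new
run_check(check) { raise ArgumentError, "invalid App Store Connect key" }
p check.error       # => "invalid App Store Connect key"
p check.checking_at # => nil
```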
CHECK_TTL is the escape hatch. Since these checks should finish in a second or two, a checking_at older than two minutes means the worker died before reporting back, so the next poll re-runs the check instead of leaving the user stuck on a spinner.
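The TTL rule boils down to a couple of comparisons. This is a plain-Ruby sketch without ActiveSupport, so CHECK_TTL is in seconds and now is passed in explicitly. stale_check?, rerun_on_poll?, and its passed: flag are hypothetical names for the decision a poll makes, and the sketch covers only the stale-stamp case described above.

```ruby
CHECK_TTL = 120 # seconds, mirroring CHECK_TTL = 2.minutes in the model

# Same rule as OnboardingCheck#checking?, minus ActiveSupport.
def checking?(checking_at, now: Time.now)
  !checking_at.nil? && checking_at > now - CHECK_TTL
end

# A stamp exists but has outlived the TTL: assume the job never reported back.
def stale_check?(checking_at, now: Time.now)
  !checking_at.nil? && !checking?(checking_at, now: now)
end

# Hypothetical poll-side decision: re-run only a step that hasn't passed
# and whose in-flight stamp has gone stale.
def rerun_on_poll?(passed:, checking_at:, now: Time.now)
  !passed && stale_check?(checking_at, now: now)
end

now = Time.now
p rerun_on_poll?(passed: false, checking_at: now - 5, now: now)   # => false (still in flight)
p rerun_on_poll?(passed: false, checking_at: now - 600, now: now) # => true (worker presumably died)
p rerun_on_poll?(passed: true, checking_at: nil, now: now)        # => false (nothing to do)
```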
A frame that reloads itself until it’s done
The second piece is a Turbo Frame that polls. While a job is in flight, the frame renders a small Stimulus controller that reloads the frame on a timer. When the result lands, the re-render simply leaves that controller out, and the polling stops on its own.
```erb
<%= turbo_frame_tag "subscription_status", src: confirm_checkout_path(@app) do %>
  <% if @app.active_subscription? %>
    <%= render "checkouts/subscription_confirmed", app: @app %>
  <% else %>
    <p>Waiting for payment confirmation...</p>
    <div data-controller="poll" data-poll-interval-value="3000"></div>
  <% end %>
<% end %>
```

The controller is about as small as it gets.
```javascript
// app/javascript/controllers/poll_controller.js
import { Controller } from "@hotwired/stimulus"

// Reloads the enclosing <turbo-frame> on a timer for as long as this element is in the DOM.
export default class extends Controller {
  static values = { interval: { type: Number, default: 5000 } }

  connect() {
    this.timer = setInterval(() => {
      this.element.closest("turbo-frame")?.reload()
    }, this.intervalValue)
  }

  // Fires when a re-render drops this element, which stops the polling.
  disconnect() {
    clearInterval(this.timer)
  }
}
```

This is the part I find genuinely tidy. The controller only exists in the DOM while there’s work to wait on. The moment the success partial renders instead, the controller’s element is gone, Stimulus fires disconnect(), clearInterval runs, and the polling ends. There’s nothing to remember to tear down.
I wrapped that pattern in a poll_frame helper so every surface shares it, but the mechanism is exactly the two snippets above.
Three things that bit me
If you try this, here are the gotchas I hit so you can (hopefully) avoid them.
A self-referencing src makes Turbo complain. A frame whose src points at the same URL it’s already showing gets rejected. On the polling re-render I leave src off and let the frame keep the one it was born with. The Stimulus reload() still re-fetches that original src just fine.
Polling isn’t free when a poll hits an external API. My build feed re-renders each app’s Google Play status, and that’s a real API call, not a database read. Left alone, every poll would spend a request against Google’s rate limit. To fix this, I cache that lookup for 30 seconds and force a fresh read only right after a rollout, so the result still shows up immediately.
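In Rails this kind of caching is typically Rails.cache.fetch with the expires_in: and force: options. To keep the example runnable on its own, here’s a minimal in-memory stand-in with the same shape. TtlCache, the cache key, and passing force: true right after a rollout are illustrative assumptions, not the app’s actual code.

```ruby
# Minimal in-memory cache shaped like Rails.cache.fetch(key, expires_in:, force:).
class TtlCache
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @entries = {}
  end

  def fetch(key, expires_in:, force: false)
    entry = @entries[key]
    # Serve the cached value unless it has expired or a fresh read is forced.
    return entry[:value] if !force && entry && @clock.call < entry[:expires_at]

    value = yield
    @entries[key] = { value: value, expires_at: @clock.call + expires_in }
    value
  end
end

now = 0.0
cache = TtlCache.new(clock: -> { now })
api_calls = 0
lookup = -> { api_calls += 1; "status" }

cache.fetch("play_status:42", expires_in: 30) { lookup.call }              # miss: calls the API
now = 10
cache.fetch("play_status:42", expires_in: 30) { lookup.call }              # poll within 30s: cached
cache.fetch("play_status:42", expires_in: 30, force: true) { lookup.call } # just rolled out: fresh read
p api_calls # => 2
```

However often the frame polls, the external API sees at most one request per key per 30 seconds, plus the occasional forced read.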
Links inside a polling frame need an escape hatch. By default a link inside a Turbo Frame tries to navigate that frame, and any link to a page without a matching frame id throws Turbo’s “Content missing.” Set target: "_top" on the frame so links and forms do a normal full-page visit. The frame’s own reloading is unaffected.
And one decision worth making on purpose: what should a 500 do? I made errors during the flow bubble up and stop the polling, so the user sees that something broke and knows to refresh. A failure you can see beats a failure that hides behind a spinner.
The lesson wasn’t really “polling”
The reliability didn’t come from polling. It came from rendering off the database every single time.
A poll tick and a broadcast are both just “go show the current state.” The difference is that polling has no choice but to recompute from the source of truth, while a broadcast lets you push the rendered change and skip that step. That skip is exactly where the spinner bug lived!
Which means I didn’t strictly have to drop the socket. If each broadcast had only said “something changed, re-fetch” instead of carrying the result itself, the page would recompute from the database and heal the same way. That’s more or less what Turbo 8’s broadcast_refresh and morphing do: the broadcast says “re-render,” and the morph reconciles against fresh server HTML. You get push’s snappiness with polling’s self-healing.
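A toy model shows why the hint version heals. Everything here is plain Ruby and hypothetical, with no Turbo involved: db plays the database, one view applies the delta each message carries, and the other treats every message as “something changed, re-fetch.” The first message is lost in transit.

```ruby
# The "database": the job always writes the truth here first.
db = { credentials: "checking", testers: "checking" }

delta_view = db.dup # patches itself from whatever each message carries
hint_view = db.dup  # re-reads everything from db on any message

deliver = ->(lost:, step:, status:) do
  db[step] = status
  next if lost                 # the socket blipped: neither view hears about it
  delta_view[step] = status    # delta: apply only what the message carried
  hint_view.replace(db)        # hint: recompute from the source of truth
end

deliver.call(lost: true, step: :credentials, status: "passed") # missed broadcast
deliver.call(lost: false, step: :testers, status: "passed")

p delta_view[:credentials] # => "checking"
p hint_view[:credentials]  # => "passed"
```

The delta view stays stuck on the missed step forever, while the very next hint repairs the hint view. The caveat cuts both ways: if a hint is lost and no later message ever arrives, the hint view is just as stale, which is why something still has to ask eventually.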
I still went with polling, because a dashboard doesn’t need the snappiness and polling makes that recompute-from-truth discipline automatic instead of optional. But that’s the rule I’m keeping, and it’s the sharper version of Ian’s point: pick your transport for how fast and how wide you need to push, and render from durable state no matter what. Push a hint, not a delta.
I didn’t delete Action Cable
To be clear, I didn’t rip WebSockets out of the building. solid_cable is still mounted, and a couple of genuinely live surfaces still push. Polling isn’t automatically the better tool.
It’s the better tool here, on a dashboard where the database is the source of truth and “a few seconds late” is completely fine. If I were building live chat or a collaborative editor, I’d still reach for push without thinking twice.
The mistake wasn’t using broadcasts; it was using them as the default for everything, including the checkout page where a single lost message left a paying customer staring at a spinner.
Thanks for the nudge, Ian. Turns out sometimes the old ways really are the best ways.
Where have you reached for a broadcast when a poll, or a plain re-fetch, would have done the job? Hit reply and let me know!


