
Handshake state machine

A short story of how the SDK keeps your widget in sync with Grist, and how to lean on it when the default status (booting / ready / …) is not granular enough.

If you're happy with <GristBoundary> and useGrist().status, skip this page — those already work great. Read this if you've ever wanted to:

  • Show a distinct UI while column mappings are being configured.
  • React to a degrading network (stale → lost) before the widget fully fails.
  • Gate write actions on "do I actually have write access right now", not just "did the handshake complete".
  • Implement a custom retry / backoff that knows what generation of the state it's looking at.

The five orthogonal axes

The SDK models the widget ↔ Grist relationship as five independent state machines running in parallel rather than one big enum. The public status is a derived projection over all five — but each axis carries strictly more information.

text
┌──────────────────────────────────────────────────────────────┐
│ GristWidgetSnapshot                                          │
│                                                              │
│  lifecycle : idle → detecting → negotiating → online         │
│              ↓                                               │
│              terminated (not_embedded | script_timeout |     │
│                          negotiate_failed | manual_stop)     │
│                                                              │
│  link      : unknown ↔ connected ↔ stale ↔ lost              │
│                                                              │
│  authz     : { requested, granted } — widens monotonically   │
│                                                              │
│  config    : { requiredAccess, declared, mappings: pending   │
│                / resolved / unreported / invalidated }       │
│                                                              │
│  sync      : record / records / options / newRecord freshness│
│              + currentTable state                            │
└──────────────────────────────────────────────────────────────┘

Each axis transitions independently — for example, the link can go connected → stale → lost without the lifecycle leaving online. The top-level status reflects the worst-case across axes, but the snapshot gives you the full truth.
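
To make the "derived projection" idea concrete, here is a minimal sketch that folds two of the five axes into a status. The type and function names (AxesSketch, StatusSketch, projectStatus) are hypothetical and not part of the SDK. Mapping every terminated reason to "error" is an assumption of this sketch; the real status set is larger than the three values shown.

```typescript
// Hypothetical, simplified shapes; the SDK's real snapshot type is richer.
type LifecyclePhaseSketch = "idle" | "detecting" | "negotiating" | "online" | "terminated"
type LinkAxisSketch = "unknown" | "connected" | "stale" | "lost"

interface AxesSketch {
  lifecycle: LifecyclePhaseSketch
  link: LinkAxisSketch
}

// "booting", "ready" and "error" are status names used on this page.
// Collapsing every terminated reason into "error" is an assumption.
type StatusSketch = "booting" | "ready" | "error"

function projectStatus(axes: AxesSketch): StatusSketch {
  if (axes.lifecycle === "terminated") return "error"
  if (axes.lifecycle !== "online") return "booting"
  // Lifecycle is online: only a lost link escalates, stale keeps running.
  return axes.link === "lost" ? "error" : "ready"
}
```

Note how { lifecycle: "online", link: "stale" } and { lifecycle: "online", link: "connected" } project to the same status: that is the information the projection discards and the snapshot keeps.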

Capabilities, the actionable booleans

Reading the snapshot directly is fine for telemetry. For UI gating, prefer the derived capabilities — they fold the five axes into boolean answers to the questions widgets actually ask:

ts
import { useGristCapabilities } from "grist-widget-sdk/advanced"

function App() {
  const { canRead, canRender, canWriteRecords, canWriteSchema } = useGristCapabilities()

  if (!canRender) return <Skeleton />
  return canWriteRecords ? <Editable /> : <ReadOnly />
}

The capability chain is conjunctive:

canWriteSchema ⇒ canWriteRecords ⇒ canRender ⇒ canRead

If canWriteRecords is true, all the strictly weaker capabilities are guaranteed to be true too. That property is exercised by the property-based tests (fast-check) on every CI run.
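
One way to get this implication chain by construction is to define each capability as the next weaker one AND one extra condition. The sketch below assumes four hypothetical raw conditions (RawConditions); the SDK derives its capabilities from the five axes and its actual inputs are not shown here.

```typescript
// Hypothetical raw conditions, invented for this sketch.
interface RawConditions {
  readGranted: boolean
  renderable: boolean
  recordWriteGranted: boolean
  schemaWriteGranted: boolean
}

interface CapabilitiesSketch {
  canRead: boolean
  canRender: boolean
  canWriteRecords: boolean
  canWriteSchema: boolean
}

// Each capability is the weaker one AND one more condition,
// so a stronger capability can never be true while a weaker one is false.
function deriveCapabilities(raw: RawConditions): CapabilitiesSketch {
  const canRead = raw.readGranted
  const canRender = canRead && raw.renderable
  const canWriteRecords = canRender && raw.recordWriteGranted
  const canWriteSchema = canWriteRecords && raw.schemaWriteGranted
  return { canRead, canRender, canWriteRecords, canWriteSchema }
}

// Logical implication: a ⇒ b
const implies = (a: boolean, b: boolean): boolean => !a || b

// Checks canWriteSchema ⇒ canWriteRecords ⇒ canRender ⇒ canRead.
function chainHolds(c: CapabilitiesSketch): boolean {
  return (
    implies(c.canWriteSchema, c.canWriteRecords) &&
    implies(c.canWriteRecords, c.canRender) &&
    implies(c.canRender, c.canRead)
  )
}
```

With only four boolean inputs the invariant can be checked exhaustively over all 16 combinations; a property-based test generalizes the same check to richer inputs.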

Lifecycle phases in practice

ts
const { snapshot } = useGristHandshake()

switch (snapshot.lifecycle.phase) {
  case "idle":         return null                              // before any detection
  case "detecting":    return <Connecting />                    // polling for grist
  case "negotiating":  return <Connecting label="Negotiating" />// grist.ready in-flight
  case "online":       return <App />                           // happy path
  case "terminated":
    switch (snapshot.lifecycle.reason.kind) {
      case "not_embedded":     return <OpenInGristNotice />
      case "script_timeout":   return <ScriptTimeoutNotice />
      case "negotiate_failed": return <NegotiateFailNotice err={snapshot.lifecycle.reason.cause} />
      case "manual_stop":      return null
    }
}

terminated is absorbing: nothing moves the lifecycle out of it within the same generation. The way out is a reload. <GristBoundary>'s default reload() button calls the manager's reload(), which bumps the generation and re-enters detecting.
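
The lifecycle rules can be sketched as a small reducer. Everything here is illustrative: the event names (advance, terminate, reload) and the LifecycleSketch shape are hypothetical, and the sketch assumes termination is possible from any non-terminated phase.

```typescript
type PhaseSketch = "idle" | "detecting" | "negotiating" | "online" | "terminated"
type TerminationKind = "not_embedded" | "script_timeout" | "negotiate_failed" | "manual_stop"

interface LifecycleSketch {
  phase: PhaseSketch
  reason?: TerminationKind
  generation: number
}

// Hypothetical events; the SDK's internal action names are not documented here.
type LifecycleEvent =
  | { type: "advance" }
  | { type: "terminate"; reason: TerminationKind }
  | { type: "reload" }

function stepLifecycle(state: LifecycleSketch, event: LifecycleEvent): LifecycleSketch {
  // reload() is the only exit from terminated: new generation, back to detecting.
  if (event.type === "reload") {
    return { phase: "detecting", generation: state.generation + 1 }
  }
  // terminated is absorbing: every other event leaves it unchanged.
  if (state.phase === "terminated") return state
  if (event.type === "terminate") {
    return { phase: "terminated", reason: event.reason, generation: state.generation }
  }
  // "advance" walks idle → detecting → negotiating → online.
  switch (state.phase) {
    case "idle":
      return { ...state, phase: "detecting" }
    case "detecting":
      return { ...state, phase: "negotiating" }
    case "negotiating":
      return { ...state, phase: "online" }
    default:
      return state // already online
  }
}
```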

When the lifecycle is online, the manager runs a heartbeat in the background — by default a grist.docApi.getDocName() every 30 seconds — and feeds the result into the link axis:

| link.state | Meaning |
| --- | --- |
| unknown | No probe issued yet (just turned online, or pause-on-hidden). |
| connected | Last probe (or natural RPC) succeeded. |
| stale | Missed probes ≥ staleAfterMissed (default 2). |
| lost | Missed probes ≥ lostAfterMissed (default 4). |

lost escalates to the top-level status === "error". stale does not — the widget keeps running, but hasFreshSelection may drop to false to reflect that the record stream is going cold.
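
The threshold rules read naturally as a pure function of the missed-probe count. This is a sketch, not the manager's code: classifyLink and its parameters are hypothetical, and it assumes that a miss count below staleAfterMissed leaves the link at connected, a case the table does not spell out.

```typescript
type LinkStateSketch = "unknown" | "connected" | "stale" | "lost"

interface LinkThresholds {
  staleAfterMissed: number
  lostAfterMissed: number
}

// Defaults taken from the table above.
const DEFAULT_THRESHOLDS: LinkThresholds = { staleAfterMissed: 2, lostAfterMissed: 4 }

// hasResult: whether any probe (or natural RPC) has completed since the link
// was last reset, e.g. on turning online or after pause-on-hidden.
function classifyLink(
  hasResult: boolean,
  consecutiveMissed: number,
  thresholds: LinkThresholds = DEFAULT_THRESHOLDS,
): LinkStateSketch {
  if (!hasResult) return "unknown"
  // Check the stricter threshold first so lost wins over stale.
  if (consecutiveMissed >= thresholds.lostAfterMissed) return "lost"
  if (consecutiveMissed >= thresholds.staleAfterMissed) return "stale"
  // Assumption: fewer misses than staleAfterMissed still counts as connected.
  return "connected"
}
```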

Auto-coalescence

Every successful Grist RPC the SDK issues through useGristWrites(), useGrist().fetchTable(…), useGrist().setCursorPosition(…), etc. is reported to the manager. The heartbeat then skips its next scheduled probe because natural traffic already proved the link healthy.

The practical effect: chatty widgets cost zero extra requests; quiet widgets still get a baseline 30 s health check.
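
A minimal sketch of the coalescing idea, with timers left out so it stays deterministic: HeartbeatSketch is a hypothetical class, and tick() stands in for whatever the manager's scheduler runs every interval.

```typescript
// Hypothetical model of probe coalescing; not the SDK's manager.
class HeartbeatSketch {
  private sawNaturalTraffic = false
  private sent = 0

  // Called whenever a regular RPC succeeds (the role recordRpcSuccess plays).
  recordRpcSuccess(): void {
    this.sawNaturalTraffic = true
  }

  // Called once per heartbeat interval (30 s by default on this page).
  // Returns true if a probe was actually sent.
  tick(sendProbe: () => void): boolean {
    if (this.sawNaturalTraffic) {
      // Natural traffic already proved the link healthy: skip this probe only.
      this.sawNaturalTraffic = false
      return false
    }
    this.sent += 1
    sendProbe()
    return true
  }

  probeCount(): number {
    return this.sent
  }
}
```

Resetting the flag on every tick means a single RPC suppresses only the next probe, so a widget that goes quiet falls back to the baseline check within one interval.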

Talking to Grist outside the SDK

If you call grist.* directly, feed the manager yourself:

ts
const { recordRpcSuccess, recordRpcFailure } = useGristHandshakeContext()

try {
  const out = await grist.docApi.applyUserActions(actions)
  recordRpcSuccess()
  return out
} catch (err) {
  recordRpcFailure()
  throw err
}

This is rarely necessary — the SDK's slice hooks already cover the common paths.

Mappings, beyond columnMappingStatus

useGrist().columnMappingStatus collapses the mapping config into the familiar { ok, pending, missing, emptyMultiples } shape. The snapshot gives you the underlying state machine if you need it:

ts
const { snapshot } = useGristHandshake({ columns: [{ name: "Title" }, { name: "Done" }] })

switch (snapshot.config.mappings.state) {
  case "not_declared":  // we didn't declare columns at all
  case "pending":       // declared, waiting for the host
  case "resolved":      // host reported mappings — read snapshot.config.mappings.state2
  case "unreported":    // 5 s elapsed with no report — host probably old/buggy
  case "invalidated":   // we received a mapping update that wiped previous ones
}

The unreported state is the main reason to look at this — it lets you show a friendly "the Grist host didn't report column mappings" hint without falling back to a hard error.
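
The pending → unreported transition is essentially a grace-period check. The sketch below is an assumption-laden simplification: classifyMappings and its input shape are hypothetical, the 5000 ms default mirrors the 5 s mentioned in the comment above, and the invalidated state is left out entirely.

```typescript
type MappingsStateSketch = "not_declared" | "pending" | "resolved" | "unreported"

interface MappingsInput {
  declared: boolean // did the widget declare columns at all?
  reported: boolean // has the host reported mappings?
  elapsedMs: number // time since the columns were declared
}

function classifyMappings(input: MappingsInput, graceMs: number = 5000): MappingsStateSketch {
  if (!input.declared) return "not_declared"
  if (input.reported) return "resolved"
  // Declared but nothing reported yet: pending until the grace period runs out.
  return input.elapsedMs >= graceMs ? "unreported" : "pending"
}
```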

Generation: how stale callbacks are dropped

Every snapshot carries an integer generation. Every effect dispatch carries the generation it was scheduled in. The reducer drops actions whose action.generation is older than the current snapshot's. The manager bumps generation on reload() / restart().

This means a grist.ready() call that resolves 4 seconds after the user clicked "Retry" can never flip the lifecycle to online on behalf of the prior generation: its action still carries the old generation number, so the reducer silently drops it.

If you write your own effects (custom RPC layer, optimistic retry…), follow the same discipline: capture snapshot.generation before sending, and ignore the result if the generation has changed by the time the call returns.
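
That discipline fits in a small wrapper. The helper below is a sketch with hypothetical names (GenerationSource, guardByGeneration); in a real widget, current() would read snapshot.generation from wherever you keep the latest snapshot.

```typescript
// Hypothetical helper; not part of the SDK.
interface GenerationSource {
  current(): number
}

// Wraps a result handler so it only fires if the generation is unchanged
// between scheduling (now) and delivery (when the wrapper is called).
function guardByGeneration<T>(
  source: GenerationSource,
  onFresh: (value: T) => void,
): (value: T) => void {
  const scheduledIn = source.current() // capture before sending
  return (value: T): void => {
    if (source.current() !== scheduledIn) return // stale result: drop silently
    onFresh(value)
  }
}
```

Usage: create the guarded handler right before issuing the request and pass the response through it, for example `const deliver = guardByGeneration(source, applyResult)` followed by `deliver(await request())`, where source, applyResult and request are your own.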

Choosing your provider

| Provider | Slice hooks driven by FSM? | Capabilities exposed? | Heartbeat coalesces SDK RPCs? |
| --- | --- | --- | --- |
| <GristWidgetProvider> (default) | yes | implicitly via status | yes |
| <GristHandshakeProvider> (advanced) | no (no slice hooks) | yes (via useGristHandshakeContext) | yes (for the manager you own) |
| Both, nested | yes (legacy provider's manager drives slices) | yes (each provider has its own manager) | yes for both |

Pick one in practice:

  • Most widgets: <GristWidgetProvider> + <GristBoundary> is enough. Add useGristCapabilities() inside if you need finer gating.
  • Pure handshake: drop <GristWidgetProvider> and use <GristHandshakeProvider> + your own subscriptions. Useful for the smallest possible widgets.
  • Custom retry / observability: <GristWidgetProvider> for the slices, and useGristHandshake() standalone for the observable snapshot — both share the page-level grist.ready singleton so there's no double handshake.

See the API reference for the full type contract.

Released under the ISC License.