OpenClaw Safe Restart Playbook

Restarting an agent gateway is easy. Restarting it without confusing users, losing callbacks, or hiding a broken channel is the operational skill. Use this checklist before upgrades, config changes, model auth changes, or service restarts.

Scope and Redaction Note

This is an independent field guide, not the official OpenClaw documentation. Commands use placeholders and generic service names. Do not paste real tokens, webhook URLs, channel IDs, session IDs, private logs, or production hostnames into a public incident note.

Goal: reduce restart risk. No checklist can guarantee delivery for every external chat platform, provider, network, or plugin.

When To Use This

  • Before running openclaw gateway restart on a production-like gateway.
  • Before changing gateway.port, bind mode, auth mode, reverse proxy rules, or channel config.
  • Before upgrading OpenClaw or any plugin that owns channel delivery, model routing, sessions, or tools.
  • After a model-auth or provider fallback change that could alter reply behavior.

Restart Risk Map

In-Flight Replies

A user message may already be inside the agent loop when the service stops, and a restart at that point can cut the reply off. Avoid restarting during active replies unless you are prepared to retry or resend the affected work.

Channel Callbacks

Chat platforms may retry, drop, or duplicate callbacks depending on their delivery contract. Keep an incident log for any restart window.

Config Snapshot

Once a config loads successfully, the gateway serves from that active runtime snapshot, so a later edit on disk is not live until it is reloaded. Know whether your change is hot-safe or restart-required.

Auth and Provider Drift

OAuth profiles, API keys, model fallbacks, and proxy credentials can fail after a restart even when the gateway process comes back.

Pre-Flight Checklist

  1. Announce a short maintenance window in any team or operator channel.
  2. Pause non-critical public callbacks or queue heavy automations if your channel setup supports it.
  3. Capture the current gateway status, version, process manager state, and disk space.
  4. Snapshot configs, service files, plugin versions, and any recent local changes.
  5. Prepare the rollback command before the restart, not after failure.

bash Public-safe pre-flight commands
openclaw gateway status
openclaw gateway status --deep
openclaw channels status --probe
openclaw logs --tail 80

# Host-level checks. Adapt to your supervisor and OS.
df -h /
systemctl --user status openclaw-gateway --no-pager || true
pm2 list || true
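
Step 4 of the checklist (snapshot configs and service files) can be sketched as a small helper. This is a minimal sketch, not an OpenClaw feature: the function name `snapshot_files` and the `SNAP_ROOT` variable are hypothetical, and it copies regular files only.

```bash
# Sketch: copy files into a UTC-timestamped snapshot directory and print
# that directory's path. SNAP_ROOT and snapshot_files are hypothetical
# names chosen for this example.
snapshot_files() {
  local root="${SNAP_ROOT:-$HOME/openclaw-snapshots}"
  local dest f
  dest="$root/$(date -u +%Y%m%dT%H%M%SZ)"
  mkdir -p "$dest" || return 1
  for f in "$@"; do
    if [ -f "$f" ]; then
      # -p keeps mode and timestamps so the restored file matches the original
      cp -p "$f" "$dest/" || return 1
    else
      echo "skip (not a regular file): $f" >&2
    fi
  done
  printf '%s\n' "$dest"
}
```

A call such as `snapshot_files ~/.openclaw/openclaw.json /path/to/your/service-file` prints the directory to reference in the rollback command. Files that share a basename overwrite each other inside one snapshot, so snapshot them in separate calls.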

Drain Window

A drain window is a short period where the operator stops accepting new risk while letting current work finish. It is especially useful for chat channels where users expect conversational continuity.

  1. Hold off on starting new long-running tasks for 2 to 5 minutes.
  2. Check whether any operator session is actively producing a reply.
  3. If a reply is active, wait for completion or record that the restart is interrupting it intentionally.
  4. Keep one operator watching logs while another performs the restart.

If you cannot drain, mark the restart as an interruption and be ready to resend a concise status message to affected channels after recovery.
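
One rough way to support the drain check is to wait until a log file stops growing. This is only a heuristic sketch: it assumes your gateway writes to a log file whose path you know (an assumption, not something this guide confirms), and a quiet log does not prove that no reply is in flight. The function name `wait_until_quiet` is hypothetical.

```bash
# Sketch: return 0 once FILE has not grown for QUIET consecutive seconds,
# 1 if MAX seconds pass first, 2 if the file cannot be read.
wait_until_quiet() {
  local file="$1" quiet="${2:-30}" max="${3:-300}"
  local last now still=0 waited=0
  last=$(wc -c < "$file") || return 2
  while [ "$waited" -lt "$max" ]; do
    sleep 1
    waited=$((waited + 1))
    now=$(wc -c < "$file") || return 2
    if [ "$now" = "$last" ]; then
      still=$((still + 1))
      [ "$still" -ge "$quiet" ] && return 0
    else
      # The file grew: reset the quiet counter and remember the new size.
      still=0
      last="$now"
    fi
  done
  return 1
}
```

For example, `wait_until_quiet /path/to/gateway.log 30 300` waits up to five minutes for thirty quiet seconds. Treat a non-zero result as "could not drain" and record the restart as an interruption.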

Restart Command Ladder

Use the smallest restart that matches your change. Start with OpenClaw's own gateway controls, then fall back to the supervisor only when needed.

bash Gateway-first restart
# Preferred operator path
openclaw gateway restart
openclaw gateway status --require-rpc
openclaw channels status --probe

bash Supervisor fallback examples
# systemd user service example
systemctl --user restart openclaw-gateway
systemctl --user status openclaw-gateway --no-pager

# PM2 wrapper example
pm2 restart openclaw-gateway
pm2 logs openclaw-gateway --lines 80 --nostream
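
The ladder idea (try the gateway's own restart first, fall back to the supervisor only if that fails) can be expressed as a short wrapper. This is a sketch under one assumption: each command exits non-zero when it fails, which you should confirm for your own setup. The function name `restart_ladder` is hypothetical.

```bash
# Sketch: run each command string in order and stop at the first success.
# Prints the command that worked on stdout; progress goes to stderr.
restart_ladder() {
  local step
  for step in "$@"; do
    echo "trying: $step" >&2
    if sh -c "$step"; then
      printf '%s\n' "$step"
      return 0
    fi
    echo "failed: $step" >&2
  done
  return 1
}
```

A possible invocation is `restart_ladder "openclaw gateway restart" "systemctl --user restart openclaw-gateway"`. The printed command belongs in the incident note, because it records which rung of the ladder was actually used.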

Post-Restart Smoke Test

Do not call a restart complete until the process, RPC surface, channel path, and model path all respond.

  1. Confirm the gateway is running and the RPC health path is reachable.
  2. Probe channels and confirm the channel you care about is live, not just configured.
  3. Send one harmless operator message and wait for the full reply.
  4. Check logs for auth failures, transport errors, duplicate callbacks, and provider fallback loops.
  5. Record the final status and the exact time the maintenance window closed.

bash Verification ladder
openclaw gateway status --require-rpc
openclaw channels status --probe
openclaw logs --tail 120

# Optional OpenAI-compatible surface check when you use it.
curl -fsS -H "Authorization: Bearer <gateway-token>" \
  http://127.0.0.1:18789/v1/models
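
A gateway may need a few seconds after restart before its checks pass, so a bounded retry avoids both false alarms and endless waiting. The helper below is a generic sketch with a hypothetical name, `retry_until`. It assumes the wrapped command exits non-zero while the gateway is not ready.

```bash
# Sketch: run a command up to ATTEMPTS times, sleeping DELAY seconds
# between tries. Returns 0 on the first success, 1 if every try fails.
retry_until() {
  local attempts="$1" delay="$2"
  shift 2
  local n=1
  while :; do
    if "$@"; then
      return 0
    fi
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For instance, `retry_until 10 2 openclaw gateway status --require-rpc` allows roughly twenty seconds for the RPC surface to come back. If it still fails after that, move to the rollback path rather than raising the retry count.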

Rollback Path

Rollback is a prepared sequence, not a mood.

  • Restore the previous config or service file from the snapshot.
  • Reinstall or pin the previous OpenClaw version if the upgrade changed runtime behavior.
  • Disable newly added plugins or channels first, then restart the gateway.
  • Run the same smoke test before announcing recovery.
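
bash Rollback skeleton
# Restore the known-good config from the pre-flight snapshot.
cp /path/to/backup/openclaw.json ~/.openclaw/openclaw.json
# Restart, then re-run the same checks used in the smoke test.
openclaw gateway restart
openclaw gateway status --require-rpc
openclaw channels status --probe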
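
A slightly more defensive restore can refuse an empty backup and keep the failed config for later diagnosis. This is a sketch with a hypothetical function name, `restore_config`. The `.failed` suffix is a convention invented for this example.

```bash
# Sketch: restore TARGET from BACKUP, keeping the current TARGET as
# TARGET.failed. Refuses to proceed if BACKUP is missing or empty.
restore_config() {
  local backup="$1" target="$2"
  if [ ! -s "$backup" ]; then
    echo "backup missing or empty: $backup" >&2
    return 1
  fi
  if [ -e "$target" ]; then
    # Preserve the config that caused the rollback so it can be inspected.
    cp -p "$target" "$target.failed" || return 1
  fi
  cp -p "$backup" "$target"
}
```

Usage would look like `restore_config /path/to/backup/openclaw.json ~/.openclaw/openclaw.json`, followed by the restart and smoke test. The `.failed` copy may contain credentials, so handle it like the live config.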

Incident Note Template

markdown Keep it sanitized
# OpenClaw Restart Note

Date:
Window:
Reason:
Operator:

Before:
- gateway status:
- channel probe:
- known active sessions:

Change:
- command:
- config area:
- plugin/channel touched:

After:
- gateway status:
- channel probe:
- test message result:
- errors observed:

Rollback:
- needed? yes/no
- command used:
- final status:

Redactions:
- tokens removed:
- webhook URLs removed:
- private user/channel IDs removed:

FAQ

Can I rely on hot reload instead of a restart?

Sometimes. The official gateway docs describe reload modes including hot, restart, and hybrid. Treat any auth, channel, supervisor, port, bind, or plugin change as restart-risk unless you have tested that exact change.
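
As a rough aid for that judgment, you can diff the snapshot against the edited config and flag changed lines that mention a restart-risk area. This is a line-based heuristic sketch with a hypothetical name, `restart_risk_lines`. The keyword list mirrors the areas named above, it does not parse JSON nesting, and it will over-match (for example, "port" also matches "transport").

```bash
# Sketch: print changed lines between OLD and NEW that mention a
# restart-risk keyword. Exit status 0 means at least one such line exists.
restart_risk_lines() {
  local old="$1" new="$2"
  diff "$old" "$new" \
    | grep -E '^[<>]' \
    | grep -Ei 'auth|channel|supervisor|port|bind|plugin'
}
```

A call such as `restart_risk_lines "$snapshot/openclaw.json" ~/.openclaw/openclaw.json` that prints anything is a reason to plan a full restart window. No output is not proof of hot-safety, since a nested value can change on a line that names none of these keywords.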

What is the minimum safe proof after restart?

Process status is not enough. Require gateway status, RPC proof when available, channel probe, one harmless message, and a log check.
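
The scriptable part of that proof can be run as an all-must-pass gate. The sketch below uses a hypothetical function name, `proof_gate`, and assumes each check command exits non-zero on failure. The test message and the log review remain manual steps.

```bash
# Sketch: run every check, print PASS or FAIL per check, and return
# non-zero if any check failed. Output of the checks themselves is hidden.
proof_gate() {
  local failed=0 check
  for check in "$@"; do
    if sh -c "$check" >/dev/null 2>&1; then
      echo "PASS  $check"
    else
      echo "FAIL  $check"
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

One way to call it is `proof_gate "openclaw gateway status --require-rpc" "openclaw channels status --probe"`. Unlike a first-success ladder, it keeps going after a failure so the incident note can list every failing check at once.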

Should I publish my real restart log as a case study?

No. Publish the sanitized structure, not raw logs. Remove tokens, phone numbers, webhook URLs, account IDs, private hostnames, session IDs, and customer content.
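
A first redaction pass can be automated before the manual review. The filter below is a sketch with a hypothetical name, `redact`, and its patterns are assumptions about what your logs look like. It covers bearer tokens, URLs, and lower-case `token`, `secret`, or `api_key` assignments only, so phone numbers, account IDs, session IDs, and customer content still need a human pass.

```bash
# Sketch: read text on stdin and mask common secret shapes on stdout.
# Rules run in order: bearer tokens, whole URLs, then key/value secrets.
redact() {
  sed -E \
    -e 's#(Bearer )[A-Za-z0-9._~+/=-]+#\1<redacted>#g' \
    -e 's#https?://[^[:space:]"]+#<url-redacted>#g' \
    -e 's#((token|secret|api_key)"?[:=] *"?)[^" ,]+#\1<redacted>#g'
}
```

For example, `openclaw logs --tail 120 | redact > restart-note-logs.txt` produces a draft to review line by line. URLs are masked whole because webhook paths and query strings often carry credentials.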
