Recipe

Use a Local LLM with OpenClaw

Connect OpenClaw to a locally-running model via Ollama, llama.cpp, or any OpenAI-compatible endpoint. Ideal for privacy-sensitive workloads, offline deployments, or eliminating per-token API costs.

Option A — Ollama (Easiest)

Ollama provides a one-command local model server with an OpenAI-compatible API.

bashInstall Ollama + pull a model

# Install Ollama (Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model — Qwen2.5 14B is excellent for most tasks
ollama pull qwen2.5:14b

# Or smaller models for low-RAM servers:
# ollama pull qwen2.5:7b
# ollama pull llama3.2:3b
# ollama pull phi4-mini

# Verify it's running
curl http://127.0.0.1:11434/api/tags

jsonOpenClaw provider config

{
  "providers": {
    "ollama-local": {
      "type": "openai-compatible",
      "base_url": "http://127.0.0.1:11434/v1",
      "api_key": "ollama",
      "models": {
        "qwen2.5:14b": {
          "alias": "local-smart",
          "max_tokens": 8192
        },
        "qwen2.5:7b": {
          "alias": "local-fast",
          "max_tokens": 4096
        }
      }
    }
  }
}

Option B — llama.cpp Server

For maximum control over quantization and hardware utilization.

bashBuild and run llama.cpp server

# Build (requires cmake, gcc)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON  # remove CUDA flag if no GPU
cmake --build build -j$(nproc) --config Release

# Download a GGUF model (example: Qwen2.5 7B Q4)
wget https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-GGUF/resolve/main/qwen2.5-7b-instruct-q4_k_m.gguf

# Start server (OpenAI-compatible endpoint)
./build/bin/llama-server \
  --model qwen2.5-7b-instruct-q4_k_m.gguf \
  --host 127.0.0.1 \
  --port 8080 \
  --ctx-size 8192 \
  --n-gpu-layers 35   # adjust for your GPU VRAM

jsonOpenClaw config for llama.cpp

{
  "providers": {
    "llamacpp": {
      "type": "openai-compatible",
      "base_url": "http://127.0.0.1:8080/v1",
      "api_key": "sk-local",
      "models": {
        "qwen2.5-7b": {
          "alias": "local",
          "max_tokens": 4096,
          "temperature": 0.7
        }
      }
    }
  }
}

Option C — Any OpenAI-Compatible Endpoint

LM Studio, Jan, vLLM, TabbyAPI, and dozens of other tools expose an OpenAI-compatible API. The same config pattern applies:

json

{
  "providers": {
    "my-local-server": {
      "type": "openai-compatible",
      "base_url": "http://127.0.0.1:1234/v1",
      "api_key": "not-needed",
      "models": {
        "local-model": {
          "alias": "local"
        }
      }
    }
  }
}

Performance Tuning

RAM	Recommended Model	Quality
4 GB	phi4-mini, llama3.2:3b	Basic
8 GB	qwen2.5:7b, mistral:7b	Good
16 GB	qwen2.5:14b, deepseek-r1:14b	Very Good
32 GB+	qwen2.5:32b, deepseek-r1:32b	Excellent

Hybrid Setup: Local + Cloud Fallback

Use local models for most requests, fall back to cloud APIs for complex tasks:

json

{
  "routing": {
    "strategy": "fallback",
    "models": [
      { "id": "ollama-local/qwen2.5:14b", "timeout_ms": 60000, "on_error": "next" },
      { "id": "openai/gpt-4o-mini", "on_error": "fail" }
    ]
  }
}

What's Next?

Non-OpenAI Cloud Models — cost-effective API alternatives
Browse Provider Templates — ready-made provider configs
Security Hardening — lock down your local endpoint