How to mine TensorCash: a walkthrough

Most proof-of-work burns electricity on a number nobody wanted. TensorCash doesn’t. The work a miner does is a real forward pass — the same model inference that answers a prompt — and the proof that the pass happened is what extends the chain. The work is useful before it is ever a block.

This is the practical guide to running one. Before the hardware and the commands, start with what you actually want out of it — that’s what decides how you run.

How do you want to mine?

Pick the path that fits how you’ll run it. All three run the same worker software — they differ in where inference demand comes from and where your apps send requests.

Path	Best for	The switch
Provider / broker worker	The easiest start — earn from a provider’s demand.	`WORKER_MODE=broker` + `BROKER_WS_URL` + `PROVIDER_JWT_TOKEN`. Outbound WSS only; no inbound port.
Local HTTP + your own node	A private endpoint your own apps call.	`WORKER_MODE=standalone`, run a node, serve the OpenAI API at `http://localhost:$HTTP_PORT/v1`.
Mac / local / dev	Trying it on a laptop.	The TensorMiner app or the llama.cpp worker (Metal).

Already serving traffic, or starting cold? Either works. When external demand is low the worker generates its own prompts (backfill) so the GPU keeps mining; when real requests arrive they mine on the same forward pass at no extra cost.

Two further choices come next: which network (testnet/mainnet), and who verifies your proofs. Broker / provider workers delegate verification to the provider by default; if you run your own node, you then choose local or delegated verification (Step 3).

Setting it up, two ways. If you’d rather not do it by hand, use Option A — a prompt you paste into a coding agent (Claude Code, Cursor, …) that detects your hardware and sets everything up, asking before it changes anything. To understand and control each step, follow Option B.

Option A: hand it to a coding agent

Paste the prompt below into your coding agent. Read it first — it installs software and runs Docker, and it’s written to ask for confirmation before doing anything irreversible.

You are setting up a TensorCash mining worker on this machine. Work step by step
and ask for confirmation before installing software or running containers.

1. Detect the hardware: OS, CPU architecture, and GPU. For NVIDIA, report the
   compute capability (sm_XX). For Apple, report the chip (M-series) and unified
   RAM.

2. Choose the path:
   - Apple Silicon  -> the native TensorMiner app (Metal). If no signed build is
     available, fall back to the llama.cpp host worker.
   - NVIDIA GPU     -> Docker + vLLM. Pick VLLM_VERSION by architecture:
       Ampere / Ada (sm_80–sm_89, e.g. A100, RTX 30xx/40xx) -> 0.10.0  (prebuilt wheel)
       Hopper       (sm_90, H100/H200)                      -> 0.19.0  (prebuilt wheel)
       Blackwell    (sm_100/sm_120/sm_121: B200, RTX 50xx, GB10) -> 0.19.0, BUILT FROM SOURCE
         against the CUDA-13 NVIDIA PyTorch container (the PyPI wheel is CUDA-12
         and will std::bad_alloc on the first GPU allocation). The first build is slow.
   - No GPU         -> the CPU worker image (tiny model, slow; for testing only).

3. Make sure Docker works:
     docker run --rm hello-world
     # NVIDIA only — must print your GPU:
     docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi
   Install Docker (and the NVIDIA Container Toolkit, if needed) if these fail.

4. Ask me which network: testnet (start here) or mainnet.

5. Ask me where the inference should go:
   - To a provider (broker mode): set WORKER_MODE=broker and BROKER_WS_URL +
     PROVIDER_JWT_TOKEN from my provider dashboard; verification is handled by them.
   - To me, as a local model (standalone): set WORKER_MODE=standalone, run the
     local node, and expose the proxy's OpenAI endpoint (HTTP_HOST/HTTP_PORT).
   Also ask whether I want to run my own verifier (full sovereignty: the
   core-miner-validation-api stack with NODE_START_FLAGS=--desktop) or delegate it.

6. Pick a chain-approved mining model: check the explorer
   (testnet: explorer-testnet.tensorcash.org, mainnet: explorer.tensorcash.org)
   and set MODEL_NAME, plus MODEL_COMMIT to pin the approved revision.

7. Set the worker env. Read the capability values straight off the GPU:
     nvidia-smi --query-gpu=name,memory.total,compute_cap --format=csv,noheader
   -> COMPUTE_TYPE=nvidia-<compute_cap>, GPU_MODEL, GPU_MEMORY_GB. Also set
   MAX_MODEL_LEN, GPU_MEM_UTIL, WORKER_CAPACITY. Start the stack.

8. Verify: hit the worker's /health, confirm it connected to the broker (or node),
   curl /v1/chat/completions and show me the reply, then tail the logs until you
   see proofs/shares. Finally, show the log lines proving a share and how to stop it.

If that worked, you’re mining: skip to “Is it working?” below. If you want to know what it just did, read on.

Option B: do it yourself

Step 1: What hardware are you running?

You have	Use	Engine	Notes
Apple Silicon Mac (M1–M4)	TensorMiner app, or the llama.cpp worker	llama.cpp on Metal	Native GPU acceleration; no Docker needed for the app.
NVIDIA GPU	Docker + vLLM	vLLM (CUDA)	The main path. Version depends on your GPU — see vLLM builds.
No GPU / just curious	CPU worker container	llama.cpp (CPU)	Tiny model, slow. Good for a first run on testnet.

The rest of the steps assume you’ve picked one. Mac users can skip the Docker section if they use the native app.

Step 1b: Detect your GPU and CUDA (NVIDIA)

One command tells you everything you need to fill in: which vLLM build to use, and the three capability values the worker advertises to the broker.

nvidia-smi --query-gpu=name,memory.total,compute_cap --format=csv,noheader
# e.g.  NVIDIA GeForce RTX 4090, 24564 MiB, 8.9

Read it straight across:

compute_cap → COMPUTE_TYPE = nvidia-<compute_cap> (the 8.9 above → nvidia-8.9). It also picks your vLLM build (see vLLM builds): 8.0–8.9 → 0.10.0, 9.0 → 0.19.0, 10.0 or 12.x (Blackwell) → 0.19.0 from source.
name → GPU_MODEL (e.g. RTX-4090).
memory.total → GPU_MEMORY_GB (round to whole GB).
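The three mappings above can be sketched as a small shell helper. This is a minimal sketch, not part of the TensorCash tooling: the name gpu_env_from_csv is hypothetical, it assumes the name, memory.total, compute_cap column order from the command above, and it treats 1 GB as 1024 MiB when rounding. GPU_MODEL is left as the raw name; shorten it by hand to a label such as RTX-4090 if that is what you advertise.

```shell
# Turn one line of
#   nvidia-smi --query-gpu=name,memory.total,compute_cap --format=csv,noheader
# into the three capability values. gpu_env_from_csv is a hypothetical helper.
gpu_env_from_csv() {
  line=$1
  # Fields are comma-separated; the GPU name itself may contain spaces.
  name=${line%%,*}
  rest=${line#*,}
  mem=${rest%%,*}
  cc=${rest#*,}
  # Keep only the digits of the memory field (" 24564 MiB" -> 24564).
  mem_mib=$(printf '%s' "$mem" | tr -cd '0-9')
  # Strip whitespace from the compute capability (" 8.9" -> 8.9).
  cc=$(printf '%s' "$cc" | tr -d ' ')
  # Round MiB to the nearest whole GB.
  mem_gb=$(( (mem_mib + 512) / 1024 ))
  printf 'COMPUTE_TYPE=nvidia-%s\n' "$cc"
  printf 'GPU_MODEL=%s\n' "$name"
  printf 'GPU_MEMORY_GB=%s\n' "$mem_gb"
}

# Sample line taken from the example output above.
gpu_env_from_csv "NVIDIA GeForce RTX 4090, 24564 MiB, 8.9"
```

For the sample line this prints COMPUTE_TYPE=nvidia-8.9, GPU_MODEL=NVIDIA GeForce RTX 4090, and GPU_MEMORY_GB=24. On a real machine, pass it the first line of the nvidia-smi query instead of the sample string.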

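The highest CUDA version your driver supports is in the nvidia-smi header; treat it as the ceiling for CUDA_VERSION. nvcc reports the toolkit version, if one is installed on the host: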

nvidia-smi | grep -o 'CUDA Version: [0-9.]*'   # -> CUDA Version: 12.8
nvcc --version 2>/dev/null | grep release      # if the toolkit is installed

Detecting this across a fleet in Terraform is a separate question — see Detecting CUDA across a fleet at the end.

Step 2: Docker (NVIDIA and CPU paths)

macOS: install Docker Desktop. Linux: install Docker Engine + the Compose plugin, and — for NVIDIA — the NVIDIA Container Toolkit.

Confirm it works before going further:

docker run --rm hello-world          # prints "Hello from Docker!"
docker compose version               # v2.x

# NVIDIA only — this MUST list your GPU:
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi

If nvidia-smi doesn’t see your card inside the container, fix that first — nothing below will work until it does.

Step 3: Choose your sovereignty level

A miner produces proofs. Those proofs get verified before they count, and the chain lives on a node. You decide how much of that you run yourself.

Level	You run	Verification	Start with
Delegated (simplest)	the miner only	a provider’s verifier checks your proofs	`simple-worker`
Full sovereignty	miner + node + your own verifier	your own verifier, locally	`core-miner-validation-api`

Delegated is the gentlest on-ramp: two environment variables and you’re running. Full sovereignty means you trust no one — you also run a node and re-validate every proof yourself.

Running a node or a verifier is its own topic — see How to run a node and How to run a verifier. Under the hood the switch is one node flag: --desktop runs a local verifier; --http points the node at a remote verifier you delegate to (VALIDATOR_BASE_URL + VALIDATOR_API_KEY).
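As a rough illustration of that switch, the helper below picks the flag from the environment. It is a sketch under stated assumptions: choose_node_flags is a hypothetical name, and falling back to the local verifier when the remote settings are missing is a design choice made here, not documented node behaviour.

```shell
# Pick NODE_START_FLAGS: delegate only when both remote-verifier settings are
# present, otherwise run the local verifier.
# choose_node_flags is a hypothetical helper for illustration.
choose_node_flags() {
  if [ -n "${VALIDATOR_BASE_URL:-}" ] && [ -n "${VALIDATOR_API_KEY:-}" ]; then
    echo "--http"
  else
    echo "--desktop"
  fi
}

# Example: export the result for the compose stack.
NODE_START_FLAGS=$(choose_node_flags)
export NODE_START_FLAGS
```

Requiring both variables avoids starting a node in --http mode with a verifier URL but no key.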

Step 4: Run it

Pick the block that matches your hardware and the mode you chose above.

Mac — native (TensorMiner)

The native app runs llama-server on Metal plus a small proxy that talks to the broker. Download the latest TensorMiner build from the releases page.

Early builds are not notarized yet, so macOS Gatekeeper will refuse to open them on a double-click. Clear the quarantine flag once, then launch:

# Remove the quarantine attribute macOS adds to downloaded, un-notarized apps.
xattr -dr com.apple.quarantine /Applications/TensorMiner.app
open /Applications/TensorMiner.app

(Equivalently: right-click the app → Open → confirm the dialog once.) In the app, paste your broker URL and provider token, pick the model, and start.

Mac — llama.cpp worker (advanced)

If you’d rather run the engine yourself, launch llama-server against a GGUF model with the PoW flags. The key wiring: it serves on a local port and pushes found shares over ZMQ to the rest of the stack.

# A registered GGUF model on disk; serve it on Metal with the mining flags.
ZMQ_PUSH_HOST=127.0.0.1 ZMQ_PUSH_PORT=7067 \
PROOF_SAVE_DIR="$HOME/pow_logs" \
./llama-server \
  -m "$HOME/models/your-model.gguf" \
  --host 0.0.0.0 --port 8000 \
  --ctx-size 8192 --parallel 10 --jinja \
  --alias "Qwen/Qwen3-8B"

If the binary is also un-notarized, clear it the same way: xattr -d com.apple.quarantine ./llama-server && chmod +x ./llama-server.

NVIDIA — broker / provider worker (simplest)

One container, GPU-accelerated, outbound-only (WORKER_MODE=broker). From deployments/simple-worker/:

cp .env.example .env
# Edit .env — the only two REQUIRED values:
#   BROKER_WS_URL=wss://broker.tensorcash.org/v1/ws   (from your provider dashboard)
#   PROVIDER_JWT_TOKEN=eyJ...                          (from your provider dashboard)
docker compose up -d
docker compose logs -f

Set the advertised capability vars (COMPUTE_TYPE, GPU_MODEL, GPU_MEMORY_GB) to match your card so the broker matches jobs correctly — see the env reference.
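Before bringing the container up, it can help to sanity-check the two required values. The sketch below only checks the shapes this guide states (a wss:// broker URL, a token starting with eyJ); check_broker_env is a hypothetical helper, and the worker performs its own validation regardless.

```shell
# Sanity-check the two required broker settings before `docker compose up`.
# check_broker_env is a hypothetical helper, not part of the worker.
check_broker_env() {
  url=$1
  token=$2
  case $url in
    wss://*) ;;
    *) echo "BROKER_WS_URL should be a wss:// URL" >&2; return 1 ;;
  esac
  case $token in
    eyJ*) ;;
    *) echo "PROVIDER_JWT_TOKEN should start with eyJ" >&2; return 1 ;;
  esac
  return 0
}

# Example (not executed here): load .env, then check it.
#   set -a; . ./.env; set +a
#   check_broker_env "$BROKER_WS_URL" "$PROVIDER_JWT_TOKEN" && docker compose up -d
```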

NVIDIA — full sovereignty

Brings up the node, the vLLM backend, the miner proxy, and your own verifier in one stack:

sudo \
  API_KEY=change-me MODEL_API_KEY=change-me \
  RPC_USER=user1 RPC_PASS=pass1 \
  CUDA_VERSION=12.8.0 VLLM_VERSION=0.10.0 \
  WORKER_MODE=standalone \
  CHAIN_NAME=tensor-test \
  NODE_START_FLAGS=--desktop \
  docker compose -f deployments/docker-compose/core-miner-validation-api/docker-compose.yaml up --build

WORKER_MODE=standalone takes jobs from your local node instead of a broker; NODE_START_FLAGS=--desktop runs the local verifier. Pick VLLM_VERSION from vLLM builds for your GPU.

Use it yourself (local HTTP)

If you picked the local model path, the miner-proxy already speaks the OpenAI API — “using the compute yourself” just means pointing your apps at it. On a standalone worker, call http://localhost:$HTTP_PORT/v1/chat/completions. The proxy serves on HTTP_PORT — 8080 on the simple-worker / llama path, 8030 on the full compose stack. Every prompt to the chain-pinned mining model mines; requests for any other model are served as plain inference (audit-only, no share). Idle time is filled by backfill.

Keep it private. Leave HTTP_HOST=127.0.0.1 (localhost only) unless you mean to expose it: the proxy registers open OpenAI-style routes with permissive CORS, so if you set HTTP_HOST=0.0.0.0, put it behind a firewall or an authenticating reverse proxy. (Broker-mode workers bind to localhost as a debug surface only — real traffic arrives over the broker.)
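A small guard along these lines can catch an accidental public bind before the worker starts. It is a sketch only: is_loopback_host is a hypothetical helper, and it recognises just the common loopback spellings.

```shell
# Return 0 when the given host keeps the proxy on the local machine only.
# is_loopback_host is a hypothetical helper for illustration.
is_loopback_host() {
  case $1 in
    127.*|localhost|::1) return 0 ;;
    *) return 1 ;;
  esac
}

# Warn if the configured bind address is reachable from the network.
host=${HTTP_HOST:-127.0.0.1}
if ! is_loopback_host "$host"; then
  echo "HTTP_HOST=$host is network-reachable: add a firewall or an authenticating reverse proxy" >&2
fi
```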

No GPU — CPU worker

Same shape as the broker-worker path, using the CPU image and a tiny model. Slow, but it proves the pipeline end to end. From deployments/simple-worker-cpu/:

cp .env.example .env   # set BROKER_WS_URL + PROVIDER_JWT_TOKEN
docker compose up -d && docker compose logs -f

Step 4b: Set the mining model

Your proofs only count if you mine with a model the chain has approved. Mining any other model produces proofs that no verifier will accept.

See which models are approved on the explorer — start on testnet:

Testnet: explorer-testnet.tensorcash.org
Mainnet: explorer.tensorcash.org

(Or query a node directly: bitcoin-cli getmodelslist to list registered models, copy a model hash, then bitcoin-cli getmodelregistrationstatus <model_hash> to check it — getmodelinfo returns the full record.)

Set it two ways:

At startup — the env the worker boots with:

MODEL_NAME=Qwen/Qwen3-8B          # an approved model
MODEL_COMMIT=9c925d64...          # pin the exact approved revision (recommended)

At runtime — switch without a restart, via the proxy:

curl -s http://127.0.0.1:8080/v1/mining/active-model \
  -H "Authorization: Bearer internal-secret" \
  -H "Content-Type: application/json" \
  -d '{"model_name":"Qwen/Qwen3-8B","model_commit":"9c925d64..."}'
# GET the same path to see what's active now.

Pin the commit whenever you can: approval is per-revision, so an unpinned name can drift to a build the chain hasn’t approved.
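One way to make the pinning rule hard to forget is to refuse to start with anything but a full commit hash. The check below is a sketch that assumes approved revisions are identified by full 40-character lowercase hex hashes (the git convention); is_pinned_commit is a hypothetical helper, and the chain may accept other revision formats.

```shell
# Treat MODEL_COMMIT as pinned only if it is a full 40-character hex hash.
# Assumption: revisions are git-style commit hashes. Hypothetical helper.
is_pinned_commit() {
  [ "${#1}" -eq 40 ] || return 1
  case $1 in
    *[!0-9a-f]*) return 1 ;;   # any non-hex character disqualifies it
    *) return 0 ;;
  esac
}

# Example (not executed here):
#   is_pinned_commit "$MODEL_COMMIT" || echo "MODEL_COMMIT is not a full hash" >&2
```

An empty value, a branch name, or a truncated hash all fail the check, which are the cases where the served revision could drift.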

Environment reference

The variables you’ll actually touch. Defaults shown are the common ones; the worker advertises the capability fields to the broker for job matching, so set them to match your real hardware.

Variable	Meaning	Typical
`BROKER_WS_URL`	Broker WebSocket endpoint (broker mode). Required.	`wss://broker.…/v1/ws`
`PROVIDER_JWT_TOKEN`	Your provider token; must start with `eyJ`. Required.	`eyJ...`
`WORKER_MODE`	`broker` (provider) or `standalone` (local node)	`broker`
`HTTP_HOST` / `HTTP_PORT`	Where the proxy serves the OpenAI API (local-HTTP use)	`127.0.0.1` / `8080`
`MODEL_NAME`	Model the worker serves / mines (must be chain-approved)	`Qwen/Qwen3-8B`
`MODEL_COMMIT`	Pin the approved model revision	`9c925d64...`
`MAX_MODEL_LEN`	Max context length	`8192`
`GPU_MEM_UTIL`	Fraction of VRAM vLLM may use	`0.9`
`COMPUTE_TYPE`	Advertised architecture string	`nvidia-8.9`
`GPU_MODEL`	Advertised card	`RTX-4090`
`GPU_MEMORY_GB`	Advertised VRAM	`24`
`WORKER_CAPACITY`	Concurrent jobs to advertise	`4`
`CUDA_VERSION`	Base image CUDA tag	`12.8.0`
`VLLM_VERSION`	vLLM build (see matrix)	`0.10.0`
`CHAIN_NAME`	Network for a node you run: `tensor-test` / `tensor`	`tensor-test`
`NODE_START_FLAGS`	`--desktop` (local verifier) / `--http` (delegate)	`--desktop`

Testnet vs mainnet. A broker worker is network-agnostic — the network is simply whichever broker your BROKER_WS_URL and token point at, so start on a testnet broker. If you run your own node, the network is the node’s CHAIN_NAME: tensor-test for testnet, tensor for mainnet. Start on testnet. Mainnet mining is coordinated — check with your provider before pointing real hardware at it.
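For a node you run yourself, the network-to-CHAIN_NAME mapping above is small enough to encode once and reuse in scripts. A minimal sketch; chain_name_for is a hypothetical helper, and the two values come straight from the environment reference.

```shell
# Map a network name to the CHAIN_NAME value a self-run node expects.
# chain_name_for is a hypothetical helper for illustration.
chain_name_for() {
  case $1 in
    testnet) echo "tensor-test" ;;
    mainnet) echo "tensor" ;;
    *) echo "unknown network: $1" >&2; return 1 ;;
  esac
}

# Default to testnet, as recommended above.
CHAIN_NAME=$(chain_name_for "${NETWORK:-testnet}")
export CHAIN_NAME
```

NETWORK is an illustrative variable name, not one the worker reads.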

vLLM builds (TensorCash image compatibility)

These are the vLLM versions the TensorCash worker images are built and pinned against, by GPU architecture — not general vLLM advice. There’s no auto-detection; match your card:

Your GPU	Architecture	`VLLM_VERSION`	How
A100, A6000, RTX 30xx/40xx	Ampere / Ada (sm_80–sm_89)	`0.10.0`	Prebuilt wheel — just works.
H100, H200	Hopper (sm_90)	`0.19.0`	Prebuilt wheel.
B200, GB10, RTX 50xx	Blackwell (sm_100, sm_120, sm_121)	`0.19.0`	Build from source against the CUDA-13 NVIDIA PyTorch container.

Why the Blackwell image is a source build: the TensorCash Blackwell image runs on NVIDIA’s CUDA-13 PyTorch container, and a prebuilt CUDA-12 vLLM wheel mixed with that CUDA-13 runtime crashes on the first GPU allocation. So this image compiles vLLM from source against the container’s CUDA-13 PyTorch. That first compile is slow — subsequent runs reuse the image.

This is specific to the TensorCash image build, not a statement about upstream vLLM in general — recent upstream releases ship CUDA-12.9 wheels by default and also provide CUDA-13 binaries. If you build your own image, follow the vLLM GPU install docs (e.g. torch_cuda_arch_list for source builds) and keep the version aligned with the TensorCash PoW patch set.
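Since there is no auto-detection, the matrix above can be written down as a lookup keyed on the compute capability that nvidia-smi reports. This is a sketch, not part of the worker images: vllm_build_for is a hypothetical helper, and it treats both 10.x (B200) and 12.x (RTX 50xx, GB10) as Blackwell.

```shell
# Map a compute capability (e.g. 8.9) to the VLLM_VERSION and build method
# from the matrix above. vllm_build_for is a hypothetical helper.
vllm_build_for() {
  case $1 in
    8.*)       echo "0.10.0 prebuilt" ;;   # Ampere / Ada
    9.0)       echo "0.19.0 prebuilt" ;;   # Hopper
    10.*|12.*) echo "0.19.0 source" ;;     # Blackwell: build against CUDA 13
    *)         echo "no pinned build for compute capability $1" >&2; return 1 ;;
  esac
}

# Example (not executed here):
#   cc=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n1)
#   vllm_build_for "$cc"
```

Anything outside the matrix fails loudly instead of guessing a version.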

Is it working?

Three checks, in order:

# 1. The worker is up.
#    Broker / simple-worker publishes NO host port by default — check from inside
#    (use the CPU service name `simple-worker-cpu` on the no-GPU image):
docker compose ps
docker compose exec simple-worker curl -fsS http://127.0.0.1:8080/health
#    Local-HTTP / full compose stack (port published) — from the host:
curl -fsS http://127.0.0.1:8030/health
#    Sovereign full stack: the node's model API answers on :8050.

# 2. It connected. In the logs you should see the broker handshake (broker mode)
#    or "ZMQ listener started (standalone mode)" (standalone):
docker compose logs -f

# 3. Proofs are flowing. Watch for proof/share lines, or the files written to
#    your proof directory (PROOF_SAVE_DIR, e.g. ~/pow_logs).

When a share clears the current target it becomes a block and the reward is paid to your wallet. Acceptance is statistical by design: the network checks a property of your proof rather than re-running your whole forward pass. The gate rejects grinding — faking a proof without doing the work — not the ordinary variation in honest output. (Two companion posts go deep on why that’s safe.)

Talk to it — happy inferencing

The miner-proxy is an OpenAI-compatible front door. On the local-HTTP path you’ve published HTTP_PORT, so point any client at it — the proof-of-work rides the same pass, invisibly (adjust the model to your active one):

curl -s http://localhost:8030/v1/chat/completions \
  -H "Authorization: Bearer internal-secret" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-8B",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64
  }'

A healthy reply is the standard OpenAI shape, {"choices":[{"message":{"role":"assistant","content":"Hello! …"}}], …}; list what’s loaded with GET /v1/models. The port is 8030 on the full compose stack and 8080 on the simple-worker / llama path. On a broker / simple-worker the proxy port isn’t published, so run the request inside the container instead (docker compose exec simple-worker curl -s http://127.0.0.1:8080/v1/chat/completions …) or uncomment 8080:8080 in the compose file. Raw inference without PoW injection is vLLM on :8000. The default API key is internal-secret, or super-secret-token on the launch_linux.sh stack.

If that returned a sentence, you’re serving inference and mining on the same forward pass. Happy inferencing.

The check that makes it real

Mining and verification are two ends of one pipeline. The proof your worker produces isn’t something you have to take on faith — it’s replayable, and the verifier is open source. Run it against a share you produced, then against one you didn’t: anyone can confirm which model did the work and that the replay matches, without trusting the miner.

If you delegate verification, that’s exactly the step a provider runs for you — and the moment you want to stop trusting them, you bring up your own verifier and switch --http to --desktop. Nothing else changes.

Going deeper

You can mine with everything above and never read this part. It’s here for when you want to tune the worker or run a fleet.

What the miner-proxy actually does

The miner-proxy is the brain of a worker. It’s a small service that sits in front of the inference engine and does three jobs at once:

Serves inference. It exposes the OpenAI-compatible API (/v1/chat/completions, /v1/completions, /v1/responses, /v1/models) and forwards requests to vLLM on :8000. To a client it looks exactly like an OpenAI endpoint.
Injects the proof-of-work. Onto each forward pass it attaches the current mining context — block hash, VDF output, tick, and target. vLLM’s PoW sampler computes the proof during generation, so producing the answer and producing the proof are one pass, not two.
Collects and routes proofs. A ProofCollector receives finished proofs from the sampler over ZMQ. A block-tier solution is forwarded as a result; a sub-target share goes out on its own path. Where they go depends on mode:

	Standalone (you run a node)	Broker (provider-routed)
Mining jobs (new block / VDF) come from	your local core-node, over ZMQ	the broker, over WebSocket
Results / shares go to	your core-node	the broker
Inbound ports	local ZMQ to the node	none — outbound WSS only

The proxy also runs the VDF prover, caches proofs (served at /v1/proof/{id} so a verifier can fetch them), and exposes /health and /status.

Configuring mining: traffic, backfill, and what gets published

The problem: mining needs forward passes, but you don’t always have users sending prompts. An idle GPU mines nothing.

The fix — backfill. When real traffic drops below a target, the proxy generates its own synthetic prompts to keep the GPU busy and the proofs flowing. It refills toward a warm pool of MIN_ACTIVE_REQUESTS concurrent requests, checked every MONITOR_INTERVAL seconds. Backfill prompts target your chain-pinned mining model and (by default) carry a /no_think directive so the output stays in the regime the consensus entropy gate expects.

The knobs:

Variable	Default	What it does
`MINING_ENABLED`	`true`	Master switch. In broker mode, `false` = inference-only, no PoW.
`MIN_ACTIVE_REQUESTS`	`32`	Warm-pool target; backfill refills below this.
`MONITOR_INTERVAL`	`1.0`	Seconds between backfill checks.
`MINING_DISABLE_THINKING`	`true`	Append `/no_think` to backfill prompts.
`MINING_STALE_THRESHOLD_SECONDS`	`60`	Pause backfill if no fresh block has arrived for this many seconds.
`MINING_SOLUTION_COOLDOWN_SEC`	`0`	Optional pause after finding a solution.
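The refill arithmetic behind those knobs can be sketched in a few lines. This illustrates the behaviour described above and is not the proxy’s actual code: backfill_deficit is a hypothetical name, its two arguments (requests in flight, age of the newest block in seconds) are assumptions, and treating an age equal to the threshold as stale is a choice made here.

```shell
# Sketch of one backfill check (one run per MONITOR_INTERVAL).
# Arguments: requests currently in flight, age of the newest block in seconds.
# backfill_deficit is a hypothetical helper, not the proxy's implementation.
backfill_deficit() {
  active=$1
  block_age=$2
  target=${MIN_ACTIVE_REQUESTS:-32}
  stale=${MINING_STALE_THRESHOLD_SECONDS:-60}
  # No fresh block for too long: pause backfill.
  if [ "$block_age" -ge "$stale" ]; then
    echo 0
    return 0
  fi
  # Otherwise refill up to the warm-pool target, never below zero.
  if [ "$active" -lt "$target" ]; then
    echo $((target - active))
  else
    echo 0
  fi
}
```

With the defaults, 20 requests in flight and a 5-second-old block give a deficit of 12 synthetic prompts; 40 real requests give 0, so real traffic displaces backfill rather than competing with it.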

“Do I publish only the backfill, or everything?” This is decided by model, not by a flag. The rule:

A proof from your chain-pinned mining model is published as a share/solution — whether it came from backfill or from a real user prompt. Real traffic on the mining model mines for free.
A proof from any other model is audit-only: cached for inspection, never submitted.
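Put as code, the rule is a single comparison on the model, with no reference to where the prompt came from. A minimal sketch: proof_route is a hypothetical helper, and it compares model names only, whereas a real check would presumably also compare the pinned commit.

```shell
# Decide what happens to a finished proof, by model only.
# Arguments: the model that produced the proof, the chain-pinned mining model.
# proof_route is a hypothetical helper for illustration.
proof_route() {
  proof_model=$1
  pinned_model=$2
  if [ "$proof_model" = "$pinned_model" ]; then
    echo "publish"      # share or solution, backfill and user traffic alike
  else
    echo "audit-only"   # cached for inspection, never submitted
  fi
}
```

The function takes no backfill or user argument, which mirrors the point of the rule: the prompt’s origin never enters the decision.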

So there’s no “backfill-only” toggle. If you want your user-facing inference to not mine and only the backfill to count, run two models: a non-pinned model for users (audit path) and a separate chain-pinned mining model that only the backfill hits — the dual-backend pattern (MINING_VLLM_ENABLED=true, a second instance on :8001). Otherwise, the simplest and most efficient setup is one pinned mining model that serves users and mines on every pass.

Detecting CUDA across a fleet (Terraform)

On one box the one-liner above is enough. Across a fleet you want it automatic. Two honest options:

How the fleet does it today: it doesn’t detect in Terraform. The CUDA version, vLLM version, and target architecture are baked into the container image tag, and each node group is pinned to known hardware with a Kubernetes nodeSelector (or hostname match). The image’s COMPUTE_TYPE is a fixed ENV. Simple and explicit, but it means the image and the node must be kept in agreement by hand.

The idiomatic detect-and-set: have Terraform run the detection and feed the result into the worker’s env. A data "external" source works when the apply can reach the GPU host:

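data "external" "gpu" {
  program = ["bash", "-c", <<-EOT
    # Split on commas only: GPU names contain spaces ("NVIDIA GeForce RTX 4090").
    IFS=, read -r name mem cc < <(nvidia-smi \
      --query-gpu=name,memory.total,compute_cap \
      --format=csv,noheader,nounits | head -n1 | sed 's/, */,/g')
    # memory.total is reported in MiB; round it to whole GB.
    gb=$(( (mem + 512) / 1024 ))
    jq -n --arg n "$name" --arg m "$gb" --arg c "$cc" \
      '{gpu_model:$n, gpu_memory_gb:$m, compute_type:("nvidia-"+$c)}'
  EOT
  ]
}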

# then inject into the worker:
#   COMPUTE_TYPE  = data.external.gpu.result.compute_type   # e.g. nvidia-8.9
#   GPU_MODEL     = data.external.gpu.result.gpu_model
#   GPU_MEMORY_GB = data.external.gpu.result.gpu_memory_gb

data "external" runs on the Terraform host, so this fits bare-metal or a node you SSH-wrap to. For cloud instances Terraform can’t exec on, do the same detection in a user_data / cloud-init step that writes /etc/worker.env from nvidia-smi --query-gpu=...,compute_cap and have the container read it — exactly what the one-box command does, hoisted to boot. Either way the mapping stays single-sourced instead of duplicated across an image tag and a node label.

Where to go next

Build hub — every way to participate: /build/
Run it locally, end to end — the regtest guide: /docs/regtest/
Core node API / RPC — /docs/core-node/api/ · /docs/rpc/
Run your own node / verifier — How to run a node · How to run a verifier
The verifier API — /docs/verifier/api/
Approved models & block history — the explorers: explorer.tensorcash.org (mainnet) · explorer-testnet.tensorcash.org (testnet)
Why it’s safe — the Verification whitepaper

The work isn’t a wasted hash. It’s a forward pass someone wanted.

Authored pseudonymously by Imosuke Takakuni.