Most proof-of-work burns electricity on a number nobody wanted. TensorCash doesn’t. The work a miner does is a real forward pass — the same model inference that answers a prompt — and the proof that the pass happened is what extends the chain. The work is useful before it is ever a block.
This is the practical guide to running a miner. Before the hardware and the commands, start with what you actually want out of it, because that decides how you run.
How do you want to mine?
Pick the path that fits how you’ll run it. All three run the same worker software — they differ in where inference demand comes from and where your apps send requests.
| Path | Best for | The switch |
|---|---|---|
| Provider / broker worker | The easiest start — earn from a provider’s demand. | WORKER_MODE=broker + BROKER_WS_URL + PROVIDER_JWT_TOKEN. Outbound WSS only; no inbound port. |
| Local HTTP + your own node | A private endpoint your own apps call. | WORKER_MODE=standalone, run a node, serve the OpenAI API at http://localhost:$HTTP_PORT/v1. |
| Mac / local / dev | Trying it on a laptop. | The TensorMiner app or the llama.cpp worker (Metal). |
Already serving traffic, or starting cold? Either works. When external demand is low the worker generates its own prompts (backfill) so the GPU keeps mining; when real requests arrive they mine on the same forward pass at no extra cost.
Two further choices come next: which network (testnet/mainnet), and who verifies your proofs. Broker / provider workers delegate verification to the provider by default; if you run your own node, you then choose local or delegated verification (Step 3).
Setting it up, two ways. If you’d rather not do it by hand, use Option A — a prompt you paste into a coding agent (Claude Code, Cursor, …) that detects your hardware and sets everything up, asking before it changes anything. To understand and control each step, follow Option B.
Option A: hand it to a coding agent
Paste the prompt below into your coding agent. Read it first — it installs software and runs Docker, and it’s written to ask for confirmation before doing anything irreversible.
You are setting up a TensorCash mining worker on this machine. Work step by step
and ask for confirmation before installing software or running containers.
1. Detect the hardware: OS, CPU architecture, and GPU. For NVIDIA, report the
compute capability (sm_XX). For Apple, report the chip (M-series) and unified
RAM.
2. Choose the path:
- Apple Silicon -> the native TensorMiner app (Metal). If no signed build is
available, fall back to the llama.cpp host worker.
- NVIDIA GPU -> Docker + vLLM. Pick VLLM_VERSION by architecture:
Ampere / Ada (sm_80–sm_89, e.g. A100, RTX 30xx/40xx) -> 0.10.0 (prebuilt wheel)
Hopper (sm_90, H100/H200) -> 0.19.0 (prebuilt wheel)
Blackwell (sm_120, B200/GB10/RTX 50xx) -> 0.19.0, BUILT FROM SOURCE
against the CUDA-13 NVIDIA PyTorch container (the PyPI wheel is CUDA-12
and will std::bad_alloc on the first GPU allocation). The first build is slow.
- No GPU -> the CPU worker image (tiny model, slow; for testing only).
3. Make sure Docker works:
docker run --rm hello-world
# NVIDIA only — must print your GPU:
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi
Install Docker (and the NVIDIA Container Toolkit, if needed) if these fail.
4. Ask me which network: testnet (start here) or mainnet.
5. Ask me where the inference should go:
- To a provider (broker mode): set WORKER_MODE=broker and BROKER_WS_URL +
PROVIDER_JWT_TOKEN from my provider dashboard; verification is handled by them.
- To me, as a local model (standalone): set WORKER_MODE=standalone, run the
local node, and expose the proxy's OpenAI endpoint (HTTP_HOST/HTTP_PORT).
Also ask whether I want to run my own verifier (full sovereignty: the
core-miner-validation-api stack with NODE_START_FLAGS=--desktop) or delegate it.
6. Pick a chain-approved mining model: check the explorer
(testnet: explorer-testnet.tensorcash.org, mainnet: explorer.tensorcash.org)
and set MODEL_NAME, plus MODEL_COMMIT to pin the approved revision.
7. Set the worker env. Read the capability values straight off the GPU:
nvidia-smi --query-gpu=name,memory.total,compute_cap --format=csv,noheader
-> COMPUTE_TYPE=nvidia-<compute_cap>, GPU_MODEL, GPU_MEMORY_GB. Also set
MAX_MODEL_LEN, GPU_MEM_UTIL, WORKER_CAPACITY. Start the stack.
8. Verify: hit the worker's /health, confirm it connected to the broker (or node),
curl /v1/chat/completions and show me the reply, then tail the logs until you
see proofs/shares. Finally, show the log lines proving a share and how to stop it.
If that worked, you’re mining, and you can skip to the “Is it working?” section. If you want to know what it just did, read on.
Option B: do it yourself
Step 1: What hardware are you running?
| You have | Use | Engine | Notes |
|---|---|---|---|
| Apple Silicon Mac (M1–M4) | TensorMiner app, or the llama.cpp worker | llama.cpp on Metal | Native GPU acceleration; no Docker needed for the app. |
| NVIDIA GPU | Docker + vLLM | vLLM (CUDA) | The main path. Version depends on your GPU — see vLLM builds. |
| No GPU / just curious | CPU worker container | llama.cpp (CPU) | Tiny model, slow. Good for a first run on testnet. |
The rest of the steps assume you’ve picked one. Mac users can skip the Docker section if they use the native app.
Step 1b: Detect your GPU and CUDA (NVIDIA)
One command tells you everything you need to fill in: which vLLM build to use, and the three capability values the worker advertises to the broker.
nvidia-smi --query-gpu=name,memory.total,compute_cap --format=csv,noheader
# e.g. NVIDIA GeForce RTX 4090, 24564 MiB, 8.9
Read it straight across:
- compute_cap → COMPUTE_TYPE=nvidia-<compute_cap> (the 8.9 above → nvidia-8.9). It also picks your vLLM build (see vLLM builds): 8.0–8.9 → 0.10.0, 9.0 → 0.19.0, 12.0 (Blackwell) → 0.19.0 from source.
- name → GPU_MODEL (e.g. RTX-4090).
- memory.total → GPU_MEMORY_GB (round to whole GB).
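The read-across above can be sketched as a small POSIX shell function. This is a sketch under assumptions, not part of the worker: the function name gpu_env_from_csv is made up for illustration, and the way it shortens the card name (dropping the "NVIDIA" and "GeForce" prefixes, hyphenating spaces) is only a guess at how a name becomes a value like RTX-4090. Check the result against what your provider expects.

```shell
# Turn one line of
#   nvidia-smi --query-gpu=name,memory.total,compute_cap --format=csv,noheader
# into the three capability values the worker advertises.
# gpu_env_from_csv is a hypothetical helper name, not a TensorCash command.
gpu_env_from_csv() (
    line=$1
    # Fields are comma-separated: name, memory.total, compute_cap.
    name=$(printf '%s' "$line" | cut -d, -f1 | sed 's/^ *//; s/ *$//')
    mem_mib=$(printf '%s' "$line" | cut -d, -f2 | tr -dc '0-9')
    cap=$(printf '%s' "$line" | cut -d, -f3 | tr -d ' ')
    # Assumption: strip vendor/brand prefixes and hyphenate the rest.
    model=$(printf '%s' "$name" | sed 's/^NVIDIA //; s/^GeForce //; s/ /-/g')
    # Round MiB to the nearest whole GB using integer arithmetic.
    gb=$(( (mem_mib + 512) / 1024 ))
    printf 'COMPUTE_TYPE=nvidia-%s\nGPU_MODEL=%s\nGPU_MEMORY_GB=%s\n' \
        "$cap" "$model" "$gb"
)
```

Fed the example line from above (NVIDIA GeForce RTX 4090, 24564 MiB, 8.9), it prints COMPUTE_TYPE=nvidia-8.9, GPU_MODEL=RTX-4090 and GPU_MEMORY_GB=24, one per line. On a GPU host you would pass it the first line of the real nvidia-smi query and append the output to your .env.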
Your installed CUDA (for CUDA_VERSION) is in the nvidia-smi header, or:
nvidia-smi | grep -o 'CUDA Version: [0-9.]*' # -> CUDA Version: 12.8
nvcc --version 2>/dev/null | grep release # if the toolkit is installed
Detecting this across a fleet in Terraform is a separate question — see Detecting CUDA across a fleet at the end.
Step 2: Docker (NVIDIA and CPU paths)
macOS: install Docker Desktop. Linux: install Docker Engine + the Compose plugin, and — for NVIDIA — the NVIDIA Container Toolkit.
Confirm it works before going further:
docker run --rm hello-world # prints "Hello from Docker!"
docker compose version # v2.x
# NVIDIA only — this MUST list your GPU:
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi
If nvidia-smi doesn’t see your card inside the container, fix that first —
nothing below will work until it does.
Step 3: Choose your sovereignty level
A miner produces proofs. Those proofs get verified before they count, and the chain lives on a node. You decide how much of that you run yourself.
| Level | You run | Verification | Start with |
|---|---|---|---|
| Delegated (simplest) | the miner only | a provider’s verifier checks your proofs | simple-worker |
| Full sovereignty | miner + node + your own verifier | your own verifier, locally | core-miner-validation-api |
Delegated is the gentlest on-ramp: two environment variables and you’re running. Full sovereignty means you trust no one — you also run a node and re-validate every proof yourself.
Running a node or a verifier is its own topic — see How to run a node and How to run a verifier. Under the hood the switch is one node flag:
--desktop runs a local verifier; --http points the node at a remote verifier you delegate to (VALIDATOR_BASE_URL + VALIDATOR_API_KEY).
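If you script your launches, the choice between the two flags can be written down once. The sketch below is illustrative only: node_start_flags and the labels "local" and "delegated" are made-up names, and the check that both VALIDATOR_* variables are set is an assumption about what a delegated setup needs, based on the pairing described above.

```shell
# Print the NODE_START_FLAGS value for a verification choice.
# "local" and "delegated" are illustrative labels, not TensorCash options.
node_start_flags() (
    case "$1" in
        local)
            printf '%s\n' '--desktop' ;;
        delegated)
            # Delegation points the node at a remote verifier, so both
            # its address and its key should be present first.
            if [ -z "${VALIDATOR_BASE_URL:-}" ] || [ -z "${VALIDATOR_API_KEY:-}" ]; then
                echo "set VALIDATOR_BASE_URL and VALIDATOR_API_KEY first" >&2
                exit 1
            fi
            printf '%s\n' '--http' ;;
        *)
            echo "usage: node_start_flags local|delegated" >&2
            exit 1 ;;
    esac
)
```

You could then export NODE_START_FLAGS="$(node_start_flags local)" before bringing up the stack, and change the argument on the day you move verification in-house or out.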
Step 4: Run it
Pick the block that matches your hardware and the mode you chose above.
Mac — native (TensorMiner)
The native app runs llama-server on Metal plus a small proxy that talks to the
broker. Download the latest TensorMiner build from the releases page.
Early builds are not notarized yet, so macOS Gatekeeper will refuse to open them on a double-click. Clear the quarantine flag once, then launch:
# Remove the quarantine attribute macOS adds to downloaded, un-notarized apps.
xattr -dr com.apple.quarantine /Applications/TensorMiner.app
open /Applications/TensorMiner.app
(Equivalently: right-click the app → Open → confirm the dialog once.) In the app, paste your broker URL and provider token, pick the model, and start.
Mac — llama.cpp worker (advanced)
If you’d rather run the engine yourself, launch llama-server against a GGUF
model with the PoW flags. The key wiring: it serves on a local port and pushes
found shares over ZMQ to the rest of the stack.
# A registered GGUF model on disk; serve it on Metal with the mining flags.
ZMQ_PUSH_HOST=127.0.0.1 ZMQ_PUSH_PORT=7067 \
PROOF_SAVE_DIR="$HOME/pow_logs" \
./llama-server \
-m "$HOME/models/your-model.gguf" \
--host 0.0.0.0 --port 8000 \
--ctx-size 8192 --parallel 10 --jinja \
--alias "Qwen/Qwen3-8B"
If the binary is also un-notarized, clear it the same way:
xattr -d com.apple.quarantine ./llama-server && chmod +x ./llama-server
NVIDIA — broker / provider worker (simplest)
One container, GPU-accelerated, outbound-only (WORKER_MODE=broker). From
deployments/simple-worker/:
cp .env.example .env
# Edit .env — the only two REQUIRED values:
# BROKER_WS_URL=wss://broker.tensorcash.org/v1/ws (from your provider dashboard)
# PROVIDER_JWT_TOKEN=eyJ... (from your provider dashboard)
docker compose up -d
docker compose logs -f
Set the advertised capability vars (COMPUTE_TYPE, GPU_MODEL, GPU_MEMORY_GB)
to match your card so the broker matches jobs correctly — see the
env reference.
NVIDIA — full sovereignty
Brings up the node, the vLLM backend, the miner proxy, and your own verifier in one stack:
sudo \
API_KEY=change-me MODEL_API_KEY=change-me \
RPC_USER=user1 RPC_PASS=pass1 \
CUDA_VERSION=12.8.0 VLLM_VERSION=0.10.0 \
WORKER_MODE=standalone \
CHAIN_NAME=tensor-test \
NODE_START_FLAGS=--desktop \
docker compose -f deployments/docker-compose/core-miner-validation-api/docker-compose.yaml up --build
WORKER_MODE=standalone takes jobs from your local node instead of a broker;
NODE_START_FLAGS=--desktop runs the local verifier. Pick VLLM_VERSION from
vLLM builds for your GPU.
Use it yourself (local HTTP)
If you picked the local model path, the miner-proxy already speaks the OpenAI
API — “using the compute yourself” just means pointing your apps at it. On a
standalone worker, call http://localhost:$HTTP_PORT/v1/chat/completions. The proxy
serves on HTTP_PORT — 8080 on the simple-worker / llama path, 8030 on
the full compose stack. Every prompt to the chain-pinned mining model mines;
requests for any other model are served as plain inference (audit-only, no share).
Idle time is filled by backfill.
Keep it private. Leave HTTP_HOST=127.0.0.1 (localhost only) unless you mean to
expose it: the proxy registers open OpenAI-style routes with permissive CORS, so if
you set HTTP_HOST=0.0.0.0, put it behind a firewall or an authenticating reverse
proxy. (Broker-mode workers bind to localhost as a debug surface only — real traffic
arrives over the broker.)
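Because the proxy port differs by deployment, it can help to resolve the base URL in one place before pointing clients at it. A minimal sketch, assuming the two default ports named above; the function name proxy_base_url and the labels simple-worker, llama and full-stack are invented for this example.

```shell
# Resolve the proxy's base URL for a given deployment.
# The deployment labels are illustrative; the ports are the defaults
# described above, and an explicit HTTP_PORT always wins.
proxy_base_url() (
    case "$1" in
        simple-worker|llama) default_port=8080 ;;
        full-stack)          default_port=8030 ;;
        *)
            echo "unknown deployment: $1" >&2
            exit 1 ;;
    esac
    # Always loopback: exposing the proxy is a separate, deliberate step.
    printf 'http://127.0.0.1:%s\n' "${HTTP_PORT:-$default_port}"
)
```

A client script can then call curl -fsS "$(proxy_base_url full-stack)/health" without hard-coding the port, and the loopback address keeps the private default in place.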
No GPU — CPU worker
Same shape as the broker-worker path, using the CPU image and a tiny model.
Slow, but it proves the pipeline end to end. From deployments/simple-worker-cpu/:
cp .env.example .env # set BROKER_WS_URL + PROVIDER_JWT_TOKEN
docker compose up -d && docker compose logs -f
Step 4b: Set the mining model
Your proofs only count if you mine with a model the chain has approved. Mining any other model produces proofs that no verifier will accept.
See which models are approved on the explorer — start on testnet:
- Testnet: explorer-testnet.tensorcash.org
- Mainnet: explorer.tensorcash.org
(Or query a node directly: bitcoin-cli getmodelslist to list registered models,
copy a model hash, then bitcoin-cli getmodelregistrationstatus <model_hash> to
check it — getmodelinfo returns the full record.)
Set it two ways:
- At startup, in the env the worker boots with:
MODEL_NAME=Qwen/Qwen3-8B # an approved model
MODEL_COMMIT=9c925d64... # pin the exact approved revision (recommended)
- At runtime, via the proxy, to switch without a restart:
curl -s http://127.0.0.1:8080/v1/mining/active-model \
-H "Authorization: Bearer internal-secret" \
-H "Content-Type: application/json" \
-d '{"model_name":"Qwen/Qwen3-8B","model_commit":"9c925d64..."}'
# GET the same path to see what's active now.
Pin the commit whenever you can: approval is per-revision, so an unpinned name can drift to a build the chain hasn’t approved.
Environment reference
The variables you’ll actually touch. Defaults shown are the common ones; the worker advertises the capability fields to the broker for job matching, so set them to match your real hardware.
| Variable | Meaning | Typical |
|---|---|---|
| BROKER_WS_URL | Broker WebSocket endpoint (broker mode). Required. | wss://broker.…/v1/ws |
| PROVIDER_JWT_TOKEN | Your provider token; must start eyJ. Required. | eyJ... |
| WORKER_MODE | broker (provider) or standalone (local node) | broker |
| HTTP_HOST / HTTP_PORT | Where the proxy serves the OpenAI API (local-HTTP use) | 127.0.0.1 / 8080 |
| MODEL_NAME | Model the worker serves / mines (must be chain-approved) | Qwen/Qwen3-8B |
| MODEL_COMMIT | Pin the approved model revision | 9c925d64... |
| MAX_MODEL_LEN | Max context length | 8192 |
| GPU_MEM_UTIL | Fraction of VRAM vLLM may use | 0.9 |
| COMPUTE_TYPE | Advertised architecture string | nvidia-8.9 |
| GPU_MODEL | Advertised card | RTX-4090 |
| GPU_MEMORY_GB | Advertised VRAM | 24 |
| WORKER_CAPACITY | Concurrent jobs to advertise | 4 |
| CUDA_VERSION | Base image CUDA tag | 12.8.0 |
| VLLM_VERSION | vLLM build (see matrix) | 0.10.0 |
| CHAIN_NAME | Network for a node you run: tensor-test / tensor | tensor-test |
| NODE_START_FLAGS | --desktop (local verifier) / --http (delegate) | --desktop |
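A quick pre-flight check on the required values can save a failed start. The sketch below is not shipped with the worker: check_worker_env is a hypothetical name, defaulting WORKER_MODE to broker is an assumption taken from the "Typical" column, and requiring a wss:// URL is inferred from the broker path being outbound WSS only. The eyJ prefix rule comes straight from the table.

```shell
# Fail early if the variables a worker needs are missing or malformed.
# check_worker_env is a hypothetical helper, not a TensorCash command.
check_worker_env() (
    mode=${WORKER_MODE:-broker}   # assumption: broker when unset
    case "$mode" in
        broker)
            case "${BROKER_WS_URL:-}" in
                wss://*) ;;
                *) echo "BROKER_WS_URL must be a wss:// URL" >&2; exit 1 ;;
            esac
            case "${PROVIDER_JWT_TOKEN:-}" in
                eyJ*) ;;
                *) echo "PROVIDER_JWT_TOKEN must start with eyJ" >&2; exit 1 ;;
            esac
            ;;
        standalone)
            # Jobs come from the local node, so no broker values are needed.
            ;;
        *)
            echo "WORKER_MODE must be broker or standalone" >&2
            exit 1 ;;
    esac
)
```

One way to use it: load your .env into the shell (for example with set -a; . ./.env; set +a), then run check_worker_env && docker compose up -d so the stack only starts when the basics are in place.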
Testnet vs mainnet. A broker worker is network-agnostic — the network is simply
whichever broker your BROKER_WS_URL and token point at, so start on a testnet
broker. If you run your own node, the network is the node’s CHAIN_NAME:
tensor-test for testnet, tensor for mainnet. Start on testnet. Mainnet
mining is coordinated — check with your provider before pointing real hardware at
it.
vLLM builds (TensorCash image compatibility)
These are the vLLM versions the TensorCash worker images are built and pinned against, by GPU architecture — not general vLLM advice. There’s no auto-detection; match your card:
| Your GPU | Architecture | VLLM_VERSION | How |
|---|---|---|---|
| A100, A6000, RTX 30xx/40xx | Ampere / Ada (sm_80–sm_89) | 0.10.0 | Prebuilt wheel — just works. |
| H100, H200 | Hopper (sm_90) | 0.19.0 | Prebuilt wheel. |
| B200, GB10, RTX 50xx | Blackwell (sm_120) | 0.19.0 | Build from source against the CUDA-13 NVIDIA PyTorch container. |
Why the Blackwell image is a source build: the TensorCash Blackwell image runs on NVIDIA’s CUDA-13 PyTorch container, and a prebuilt CUDA-12 vLLM wheel mixed with that CUDA-13 runtime crashes on the first GPU allocation. So this image compiles vLLM from source against the container’s CUDA-13 PyTorch. That first compile is slow — subsequent runs reuse the image.
This is specific to the TensorCash image build, not a statement about upstream vLLM
in general — recent upstream releases ship CUDA-12.9 wheels by default and also
provide CUDA-13 binaries. If you build your own image, follow the
vLLM GPU install docs
(e.g. TORCH_CUDA_ARCH_LIST for source builds) and keep the version aligned with the
TensorCash PoW patch set.
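Since there is no auto-detection, the matrix above is easy to encode yourself. A minimal sketch: pick_vllm_build is a made-up name, and it keys on the major compute capability only. The table names sm_120 for Blackwell, but data-center Blackwell cards such as B200 report 10.0, so the sketch also accepts major version 10; treating those as the same source build is an assumption of this sketch, not something the matrix states.

```shell
# Map a compute capability (as printed by nvidia-smi, e.g. "8.9")
# to the build listed in the matrix. Hypothetical helper name.
pick_vllm_build() (
    cap=$1
    major=${cap%%.*}    # "8.9" -> "8"
    case "$major" in
        8)     printf '%s\n' 'VLLM_VERSION=0.10.0 (prebuilt wheel)' ;;
        9)     printf '%s\n' 'VLLM_VERSION=0.19.0 (prebuilt wheel)' ;;
        # Assumption: 10.x is handled like the 12.x Blackwell row.
        10|12) printf '%s\n' 'VLLM_VERSION=0.19.0 (build from source, CUDA-13 container)' ;;
        *)
            echo "no TensorCash vLLM build listed for compute capability $cap" >&2
            exit 1 ;;
    esac
)
```

Anything older than Ampere falls through to the error branch on purpose: the matrix lists no build for it, and guessing a version there would only move the failure to container start.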
Is it working?
Three checks, in order:
# 1. The worker is up.
# Broker / simple-worker publishes NO host port by default — check from inside
# (use the CPU service name `simple-worker-cpu` on the no-GPU image):
docker compose ps
docker compose exec simple-worker curl -fsS http://127.0.0.1:8080/health
# Local-HTTP / full compose stack (port published) — from the host:
curl -fsS http://127.0.0.1:8030/health
# Sovereign full stack: the node's model API answers on :8050.
# 2. It connected. In the logs you should see the broker handshake (broker mode)
# or "ZMQ listener started (standalone mode)" (standalone):
docker compose logs -f
# 3. Proofs are flowing. Watch for proof/share lines, or the files written to
# your proof directory (PROOF_SAVE_DIR, e.g. ~/pow_logs).
When a share clears the current target it becomes a block and the reward is paid to your wallet. Acceptance is statistical by design: the network checks a property of your proof rather than re-running your whole forward pass. The gate rejects grinding — faking a proof without doing the work — not the ordinary variation in honest output. (Two companion posts go deep on why that’s safe.)
Talk to it — happy inferencing
The miner-proxy is an OpenAI-compatible front door. On the local-HTTP path
you’ve published HTTP_PORT, so point any client at it — the proof-of-work rides the
same pass, invisibly (adjust the model to your active one):
curl -s http://localhost:8030/v1/chat/completions \
-H "Authorization: Bearer internal-secret" \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen3-8B",
"messages": [{"role": "user", "content": "Say hello in one sentence."}],
"max_tokens": 64
}'
A healthy reply is the standard OpenAI shape —
{"choices":[{"message":{"role":"assistant","content":"Hello! …"}}], …}; list
what’s loaded with GET /v1/models. Ports: 8030 on the full compose stack,
8080 on the simple-worker / llama path. On a broker / simple-worker the
proxy port isn’t published — run it inside the container instead
(docker compose exec simple-worker curl -s http://127.0.0.1:8080/v1/chat/completions …)
or uncomment 8080:8080 in the compose file. Raw inference without PoW injection is
vLLM on :8000. Default key internal-secret, or super-secret-token on the
launch_linux.sh stack.
If that returned a sentence, you’re serving inference and mining on the same forward pass. Happy inferencing.
The check that makes it real
Mining and verification are two ends of one pipeline. The proof your worker produces isn’t something you have to take on faith — it’s replayable, and the verifier is open source. Run it against a share you produced, then against one you didn’t: anyone can confirm which model did the work and that the replay matches, without trusting the miner.
If you delegate verification, that’s exactly the step a provider runs for you — and the
moment you want to stop trusting them, you bring up your own verifier and switch
--http to --desktop. Nothing else changes.
Going deeper
You can mine with everything above and never read this part. It’s here for when you want to tune the worker or run a fleet.
What the miner-proxy actually does
The miner-proxy is the brain of a worker. It’s a small service that sits in front of the inference engine and does three jobs at once:
- Serves inference. It exposes the OpenAI-compatible API (/v1/chat/completions, /v1/completions, /v1/responses, /v1/models) and forwards requests to vLLM on :8000. To a client it looks exactly like an OpenAI endpoint.
- Injects the proof-of-work. Onto each forward pass it attaches the current mining context: block hash, VDF output, tick, and target. vLLM’s PoW sampler computes the proof during generation, so producing the answer and producing the proof are one pass, not two.
- Collects and routes proofs. A ProofCollector receives finished proofs from the sampler over ZMQ. A block-tier solution is forwarded as a result; a sub-target share goes out on its own path. Where they go depends on mode:
| | Standalone (you run a node) | Broker (provider-routed) |
|---|---|---|
| Mining jobs (new block / VDF) come from | your local core-node, over ZMQ | the broker, over WebSocket |
| Results / shares go to | your core-node | the broker |
| Inbound ports | local ZMQ to the node | none — outbound WSS only |
The proxy also runs the VDF prover, caches proofs (served at /v1/proof/{id} so a
verifier can fetch them), and exposes /health and /status.
Configuring mining: traffic, backfill, and what gets published
The problem: mining needs forward passes, but you don’t always have users sending prompts. An idle GPU mines nothing.
The fix — backfill. When real traffic drops below a target, the proxy
generates its own synthetic prompts to keep the GPU busy and the proofs
flowing. It refills toward a warm pool of MIN_ACTIVE_REQUESTS concurrent
requests, checked every MONITOR_INTERVAL seconds. Backfill prompts target your
chain-pinned mining model and (by default) carry a /no_think directive so the
output stays in the regime the consensus entropy gate expects.
The knobs:
| Variable | Default | What it does |
|---|---|---|
| MINING_ENABLED | true | Master switch. In broker mode, false = inference-only, no PoW. |
| MIN_ACTIVE_REQUESTS | 32 | Warm-pool target; backfill refills below this. |
| MONITOR_INTERVAL | 1.0 | Seconds between backfill checks. |
| MINING_DISABLE_THINKING | true | Append /no_think to backfill prompts. |
| MINING_STALE_THRESHOLD_SECONDS | 60 | Pause backfill if no fresh block this long. |
| MINING_SOLUTION_COOLDOWN_SEC | 0 | Optional pause after finding a solution. |
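To see how the knobs interact, here is a sketch of the decision a single backfill check might make. It is a reading of the table above, not the proxy's actual code: backfill_deficit is a hypothetical function, the two inputs (requests currently in flight, whole seconds since the last fresh block) are assumed to be available, and treating "exactly at the threshold" as stale is a guess.

```shell
# Usage: backfill_deficit ACTIVE_REQUESTS SECONDS_SINCE_LAST_BLOCK
# Prints how many synthetic prompts one check would launch.
# Illustrative only; defaults mirror the table above.
backfill_deficit() (
    active=$1
    block_age=$2
    min_active=${MIN_ACTIVE_REQUESTS:-32}
    stale_after=${MINING_STALE_THRESHOLD_SECONDS:-60}
    # Mining switched off, or no fresh block for too long: launch nothing.
    if [ "${MINING_ENABLED:-true}" != "true" ] || [ "$block_age" -ge "$stale_after" ]; then
        echo 0
        exit 0
    fi
    # Refill toward the warm-pool target, never below zero.
    deficit=$(( min_active - active ))
    [ "$deficit" -gt 0 ] || deficit=0
    echo "$deficit"
)
```

With the defaults, 10 real requests in flight and a 5-second-old block give a deficit of 22; 40 real requests give 0, which is the case where user traffic alone keeps the GPU mining. In the described design this check would repeat every MONITOR_INTERVAL seconds.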
“Do I publish only the backfill, or everything?” This is decided by model, not by a flag. The rule:
- A proof from your chain-pinned mining model is published as a share/solution — whether it came from backfill or from a real user prompt. Real traffic on the mining model mines for free.
- A proof from any other model is audit-only: cached for inspection, never submitted.
So there’s no “backfill-only” toggle. If you want your user-facing inference to
not mine and only the backfill to count, run two models: a non-pinned model for
users (audit path) and a separate chain-pinned mining model that only the backfill
hits — the dual-backend pattern (MINING_VLLM_ENABLED=true, a second instance on
:8001). Otherwise, the simplest and most efficient setup is one pinned mining
model that serves users and mines on every pass.
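The publishing rule is small enough to state as code. A sketch, assuming the pinned mining model is the one named in MODEL_NAME; proof_route is a hypothetical name and the real proxy may compare more than the name (the pinned commit, for instance).

```shell
# Decide what happens to a finished proof, given the model that produced it.
# Note what is NOT an input: whether the prompt was backfill or real traffic.
proof_route() (
    proof_model=$1
    pinned=${MODEL_NAME:?set MODEL_NAME to the chain-pinned mining model}
    if [ "$proof_model" = "$pinned" ]; then
        echo "publish"       # submitted as a share or solution
    else
        echo "audit-only"    # cached for inspection, never submitted
    fi
)
```

The function takes no "source" argument, which is the point of the rule: the only way to keep user traffic from mining is to serve it from a different model.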
Detecting CUDA across a fleet (Terraform)
On one box the one-liner above is enough. Across a fleet you want it automatic. Two honest options:
How the fleet does it today: it doesn’t detect in Terraform. The CUDA + vLLM +
target architecture are baked into the container image tag, and each node group
is pinned to known hardware with a Kubernetes nodeSelector (or hostname match).
The image’s COMPUTE_TYPE is a fixed ENV. Simple and explicit, but it means the
image and the node must be kept in agreement by hand.
The idiomatic detect-and-set: have Terraform run the detection and feed the
result into the worker’s env. A data "external" source works when the apply can
reach the GPU host:
data "external" "gpu" {
program = ["bash", "-c", <<-EOT
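# Split on commas only: GPU names contain spaces, so word-splitting would
# scatter the name across all three variables.
IFS=, read -r name mem cc < <(nvidia-smi \
--query-gpu=name,memory.total,compute_cap \
--format=csv,noheader,nounits | head -n1)
# Strip the padding after each comma and convert MiB to whole GB (as a string).
jq -n --arg n "$name" --arg m "$mem" --arg c "$cc" \
'{gpu_model: $n,
gpu_memory_gb: ($m | gsub(" "; "") | tonumber / 1024 | round | tostring),
compute_type: ("nvidia-" + ($c | gsub(" "; "")))}'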
EOT
]
}
# then inject into the worker:
# COMPUTE_TYPE = data.external.gpu.result.compute_type # e.g. nvidia-8.9
# GPU_MODEL = data.external.gpu.result.gpu_model
# GPU_MEMORY_GB = data.external.gpu.result.gpu_memory_gb
data "external" runs on the Terraform host, so this fits bare-metal or a
node you SSH-wrap to. For cloud instances Terraform can’t exec on, do the same
detection in a user_data / cloud-init step that writes /etc/worker.env from
nvidia-smi --query-gpu=...,compute_cap and have the container read it — exactly
what the one-box command does, hoisted to boot. Either way the mapping stays
single-sourced instead of duplicated across an image tag and a node label.
Where to go next
- Build hub — every way to participate: /build/
- Run it locally, end to end — the regtest guide: /docs/regtest/
- Core node API / RPC — /docs/core-node/api/ · /docs/rpc/
- Run your own node / verifier — How to run a node · How to run a verifier
- The verifier API — /docs/verifier/api/
- Approved models & block history — the explorers: mainnet · testnet
- Why it’s safe — the Verification whitepaper
The work isn’t a wasted hash. It’s a forward pass someone wanted.
Authored pseudonymously by Imosuke Takakuni.