Find the Best GPU for Your LLM

Real benchmarks across -- GPUs and -- models. Pick a model, speed target, and budget.

At 20 tok/s, a streamed response looks like:
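The live demo does not survive in text form, but the pacing is easy to simulate. A minimal Python sketch (the sample text and the word-per-token approximation are illustrative, not from the benchmark data):

```python
import time

def stream(text: str, tokens_per_sec: float = 20.0) -> None:
    """Print words one at a time, paced to the target token rate."""
    interval = 1.0 / tokens_per_sec  # 20 tok/s -> one token every 50 ms
    for word in text.split():
        print(word, end=" ", flush=True)
        time.sleep(interval)
    print()

stream("At twenty tokens per second, text appears a bit faster than most people read.")
```

At 20 tok/s a typical chat reply of ~300 tokens finishes in about 15 seconds, which is why that rate is a common "feels responsive" target.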

Links below are Amazon affiliate links; using them costs you nothing extra.


Benchmark Results

Tokens/second (median). Click column headers to sort. GPU names link to Amazon.


Legend: ⚠ CPU offload — model exceeds VRAM; the overflow layers run from system RAM  |  OOM — tested but failed: not enough free memory even with offload  |  ~ Partial offload — only a few layers spill to system RAM, with a minor speed penalty
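The offload split behind these legend markers can be estimated from model size, VRAM, and layer count. A rough sketch, assuming memory divides evenly across layers (the 13B model size and layer count below are illustrative numbers, not benchmark data):

```python
def split_layers(model_gb: float, vram_gb: float, n_layers: int) -> tuple[int, int]:
    """Estimate how many layers fit in VRAM vs. spill to system RAM."""
    per_layer_gb = model_gb / n_layers
    gpu_layers = min(n_layers, int(vram_gb // per_layer_gb))
    return gpu_layers, n_layers - gpu_layers

# A ~26 GB model with 40 layers on a 24 GB card: most layers stay on the
# GPU, a handful spill to system RAM (the "~ Partial offload" case).
gpu, cpu = split_layers(26.0, 24.0, 40)
```

In practice the split is worse than this back-of-the-envelope math, since the KV cache and activations also need VRAM, which is why even a small spill shows a measurable penalty.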

Model Deep Dive

Compare GPU performance for a specific model. See how different hardware stacks up.

tok/s — raw inference speed

GPU | Full | Partial Offload | CPU Offload