llmfit: Find the Perfect LLM for Your Hardware
A terminal tool that right-sizes LLMs to your system's RAM, CPU, and GPU with intelligent scoring and runtime provider integration.
Introduction
With the explosion of Large Language Models (LLMs), developers and enthusiasts face a common challenge: finding the right model for their hardware. llmfit solves this problem by providing an intelligent terminal tool that detects your system specifications and scores hundreds of models across multiple dimensions.
Whether you're running on a high-end GPU server, a modest laptop with integrated graphics, or an Apple Silicon Mac, llmfit helps you identify which models will actually run well on your machine—before you download them.
What Makes llmfit Special
- Hundreds of Models: Covers models from Meta Llama, Mistral, Qwen, Google Gemma, Microsoft Phi, DeepSeek, and many more
- Multi-Dimensional Scoring: Evaluates models across quality, speed, fit, and context dimensions
- Smart Hardware Detection: Automatically detects CPU, RAM, and GPU (including VRAM) across multiple platforms
- Runtime Integration: Works with Ollama, llama.cpp, MLX, Docker Model Runner, and LM Studio
Key Features
Interactive TUI & CLI
llmfit ships with a beautiful terminal UI (TUI) by default, making it easy to navigate, search, and filter models. For automation workflows, a classic CLI mode is also available with JSON output options.
TUI Keyboard Shortcuts
| Key | Action |
|---|---|
| Up/Down | Navigate models |
| / | Enter search mode |
| f | Cycle fit filter |
| a | Cycle availability filter |
| s | Cycle sort column |
| v | Enter Visual mode |
| p | Open Plan mode |
| t | Cycle color themes |
| Enter | Toggle detail view |
| d | Download selected model |
| q | Quit |
Vim-like Modes
For power users, llmfit implements Vim-inspired modes:
- Normal mode: Default navigation and filtering
- Visual mode: Select multiple models for comparison
- Select mode: Column-based filtering with arrow keys
- Plan mode: Hardware planning for specific model configurations
Installation
Windows
scoop install llmfit
macOS / Linux
brew install llmfit
curl -fsSL https://llmfit.axjns.dev/install.sh | sh
Install without sudo:
Use this command to install to ~/.local/bin instead of system directories:
curl -fsSL https://llmfit.axjns.dev/install.sh | sh -s -- --local
Docker / Podman
docker run ghcr.io/alexsjones/llmfit
podman run ghcr.io/alexsjones/llmfit recommend --use-case coding | jq '.models[].name'
Build from Source
git clone https://github.com/AlexsJones/llmfit.git
cd llmfit
cargo build --release
# Binary at target/release/llmfit
Usage
Basic Commands
llmfit
llmfit --cli
llmfit fit --perfect -n 5
llmfit system
llmfit search "llama 8b"
llmfit info "Mistral-7B"
llmfit recommend --json --limit 5
llmfit recommend --json --use-case coding --limit 3
Hardware Planning
The Plan mode inverts the typical workflow—instead of asking "what fits my hardware?", it estimates "what hardware is needed for this model config?"
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --quant mlx-4bit
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --target-tps 25 --json
GPU Memory Override
When GPU VRAM auto-detection fails (broken drivers, VMs, passthrough), manually specify your GPU's VRAM:
# 32 GB VRAM
llmfit --memory=32G
# 24 GB VRAM
llmfit --memory=24G --cli
llmfit --memory=24G fit --perfect -n 5
REST API Server
llmfit can run as an HTTP API server, exposing the same scoring and recommendation data via REST endpoints:
llmfit serve --host 0.0.0.0 --port 8787
| Endpoint | Description |
|---|---|
| GET /health | Liveness check |
| GET /api/v1/system | Node hardware info |
| GET /api/v1/models | Full model list with filters |
| GET /api/v1/models/top | Top runnable models for scheduling |
| GET /api/v1/models/{model} | Search by model name |
How It Works
Hardware Detection
llmfit automatically detects your system specifications across multiple platforms:
- NVIDIA: Multi-GPU support via nvidia-smi; aggregates VRAM across all detected GPUs
- AMD: Detected via rocm-smi
- Intel Arc: Discrete VRAM via sysfs, integrated via lspci
- Apple Silicon: Unified memory via system_profiler, VRAM = system RAM
- Ascend: Detected via npu-smi
Multi-Dimensional Scoring
Each model is scored across four dimensions (0–100 each):
| Dimension | What It Measures |
|---|---|
| Quality | Parameter count, model family reputation, quantization penalty, task alignment |
| Speed | Estimated tokens/sec based on backend, params, and quantization |
| Fit | Memory utilization efficiency (sweet spot: 50–80% of available memory) |
| Context | Context window capability vs target for use case |
Dimensions are combined into a weighted composite score. Weights vary by use-case category (General, Coding, Reasoning, Chat, Multimodal, Embedding).
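As a sketch, the composite can be computed like this (the weight values below are illustrative placeholders, not llmfit's actual per-category weights):

```python
# Hypothetical use-case weight profiles; each sums to 1.0.
WEIGHTS = {
    "general": {"quality": 0.375, "speed": 0.25, "fit": 0.25, "context": 0.125},
    "coding":  {"quality": 0.5,   "speed": 0.125, "fit": 0.125, "context": 0.25},
}

def composite(scores: dict[str, float], use_case: str = "general") -> float:
    """Weighted sum of the four 0-100 dimension scores."""
    w = WEIGHTS[use_case]
    return sum(scores[dim] * w[dim] for dim in w)

composite({"quality": 80, "speed": 60, "fit": 90, "context": 70})  # 76.25
```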
Dynamic Quantization Selection
Instead of assuming a fixed quantization, llmfit walks a hierarchy from Q8_0 (best quality) down to Q2_K (most compressed), picking the highest quality that fits in available memory.
If nothing fits at full context, it automatically tries again at half context length.
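The walk can be sketched as follows (the bytes-per-parameter figures and KV-cache formula are rough approximations for illustration, not llmfit's exact model):

```python
# Quantization hierarchy, best quality first; values are approximate
# GiB of weights per billion parameters.
QUANTS = [("Q8_0", 1.0), ("Q6_K", 0.82), ("Q5_K_M", 0.72),
          ("Q4_K_M", 0.59), ("Q3_K_M", 0.48), ("Q2_K", 0.35)]

def kv_cache_gb(context: int, layers: int = 32, kv_dim: int = 4096) -> float:
    # fp16 K and V per token per layer (illustrative sizing)
    return context * layers * kv_dim * 2 * 2 / 1e9

def pick_quant(params_b: float, context: int, mem_gb: float):
    for ctx in (context, context // 2):      # retry at half context if needed
        for name, bpp in QUANTS:             # walk best quality -> smallest
            if params_b * bpp + kv_cache_gb(ctx) <= mem_gb:
                return name, ctx
    return None  # nothing fits even at half context

pick_quant(8, 8192, 12.0)  # an 8B model in 12 GB: Q8_0 misses, Q6_K fits
```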
MoE Support
Models with Mixture-of-Experts architectures (Mixtral, DeepSeek-V2/V3) are detected automatically. Only a subset of experts is active per token, so effective VRAM requirement is much lower than total parameter count suggests.
For example, Mixtral 8x7B has 46.7B total parameters but only activates ~12.9B per token, reducing VRAM from 23.9 GB to ~6.6 GB with expert offloading.
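The arithmetic behind those figures can be checked back-of-envelope, assuming a ~4-bit quantization (~0.512 bytes per parameter; an illustrative figure, not llmfit's internal constant):

```python
BYTES_PER_PARAM = 0.512  # ~4-bit quantization, illustrative

def vram_gb(params_billions: float) -> float:
    """Approximate weight memory for a given active-parameter count."""
    return params_billions * BYTES_PER_PARAM

round(vram_gb(46.7), 1)  # all 8 experts resident: 23.9 GB
round(vram_gb(12.9), 1)  # only the ~12.9B active params: 6.6 GB
```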
Speed Estimation
Token generation in LLM inference is memory-bandwidth-bound. llmfit uses actual GPU memory bandwidth to estimate throughput:
(bandwidth_GB/s / model_size_GB) × efficiency_factor
The efficiency factor (0.55) accounts for kernel overhead, KV-cache reads, and memory controller effects. The bandwidth lookup table covers ~80 GPUs across NVIDIA, AMD, and Apple Silicon families.
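The formula is simple enough to state as a function; the 1008 GB/s figure below is the published memory bandwidth of an RTX 4090, used here as an example input:

```python
EFFICIENCY = 0.55  # kernel overhead, KV-cache reads, memory controller effects

def tokens_per_sec(bandwidth_gbs: float, model_size_gb: float) -> float:
    """Memory-bandwidth-bound throughput estimate for token generation."""
    return bandwidth_gbs / model_size_gb * EFFICIENCY

round(tokens_per_sec(1008, 8.0), 1)  # RTX 4090, 8 GB quantized model: ~69.3 tok/s
```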
Runtime Provider Integration
llmfit integrates with multiple local runtime providers for seamless model management:
Ollama Integration
Automatically detects installed Ollama models, displays download status, and enables one-click pulls directly from the TUI.
- Queries GET /api/tags to list installed models
- Sends POST /api/pull to download new models
- Shows green checkmark (✓) for installed models in the Inst column
- Connects to http://localhost:11434 by default
Remote Ollama Support
Connect to Ollama on different machines using environment variables:
OLLAMA_HOST="http://192.168.1.100:11434" llmfit
llama.cpp Integration
Direct GGUF downloads from Hugging Face with local cache detection. Maps HF model names to llama.cpp tag format and marks models as installed when matching GGUF files are present locally.
Docker Model Runner Integration
Queries Docker Desktop's built-in model serving, matches models using Ollama-style tag mapping, and pulls via docker model pull.
LM Studio Integration
Connects to LM Studio's local server with built-in model download capabilities. Accepts HuggingFace model names directly and tracks download progress via polling.
Model Name Mapping
llmfit maintains an accurate mapping between HuggingFace names (e.g., Qwen/Qwen2.5-Coder-14B-Instruct) and runtime-specific naming schemes:
- Ollama: qwen2.5-coder:14b
- Docker Model Runner: ai/qwen2.5-coder:14b
- LM Studio: Direct HuggingFace name
Platform Support
| Platform | Status |
|---|---|
| Linux | Full support. GPU detection via nvidia-smi (NVIDIA), rocm-smi (AMD), sysfs/lspci (Intel Arc), and npu-smi (Ascend) |
| macOS (Apple Silicon) | Full support. Detects unified memory via system_profiler. VRAM = system RAM. Models run via Metal GPU acceleration. |
| macOS (Intel) | RAM and CPU detection works. Discrete GPU detection if nvidia-smi available. |
| Windows | RAM and CPU detection works. NVIDIA GPU detection via nvidia-smi if installed. |
| Android / Termux | CPU and RAM detection usually work. Mobile GPU autodetection not currently supported. |
GPU Detection Table
| Vendor | Detection Method | VRAM Reporting |
|---|---|---|
| NVIDIA | nvidia-smi | Exact dedicated VRAM |
| AMD | rocm-smi | Detected (VRAM may be unknown) |
| Intel Arc (discrete) | sysfs (mem_info_vram_total) | Exact dedicated VRAM |
| Intel Arc (integrated) | lspci | Shared system memory |
| Apple Silicon | system_profiler | Unified memory (= system RAM) |
| Ascend | npu-smi | Detected (VRAM may be unknown) |
Built-in Themes
Press t to cycle through 10 beautiful color themes. Your selection is automatically saved and restored on next launch:
| Theme | Description |
|---|---|
| Default | Original llmfit colors |
| Dracula | Dark purple background with pastel accents |
| Solarized | Ethan Schoonover's Solarized Dark palette |
| Nord | Arctic, cool blue-gray tones |
| Monokai | Monokai Pro warm syntax colors |
| Gruvbox | Retro groove palette with warm earth tones |
| Catppuccin Latte | 🌻 Light theme — harmonious pastel inversion |
| Catppuccin Frappé | 🪴 Low-contrast dark — muted, subdued aesthetic |
| Catppuccin Macchiato | 🌺 Medium-contrast dark — gentle, soothing tones |
| Catppuccin Mocha | 🌿 Darkest variant — cozy with color-rich accents |
Conclusion
llmfit is an essential tool for anyone working with local LLMs. By combining intelligent hardware detection, multi-dimensional model scoring, and seamless runtime provider integration, it takes the guesswork out of model selection.
Whether you're setting up a personal development environment, deploying models in production, or just exploring the capabilities of local AI, llmfit helps you make informed decisions about which models will actually perform well on your hardware.
The open-source nature, cross-platform support, and active development make llmfit a valuable addition to any LLM enthusiast's toolkit.
Get Started
Install llmfit today and discover the perfect models for your system:
# macOS/Linux (Homebrew)
brew install llmfit
# Windows (Scoop)
scoop install llmfit
# Or use the quick install script
curl -fsSL https://llmfit.axjns.dev/install.sh | sh