
llmfit: Find the Perfect LLM for Your Hardware

A terminal tool that right-sizes LLMs to your system's RAM, CPU, and GPU, with intelligent scoring and runtime provider integration.

Introduction

With the explosion of Large Language Models (LLMs), developers and enthusiasts face a common challenge: finding the right model for their hardware. llmfit solves this problem by providing an intelligent terminal tool that detects your system specifications and scores hundreds of models across multiple dimensions.

Whether you're running on a high-end GPU server, a modest laptop with integrated graphics, or an Apple Silicon Mac, llmfit helps you identify which models will actually run well on your machine—before you download them.

What Makes llmfit Special

  • Hundreds of Models: Covers models from Meta Llama, Mistral, Qwen, Google Gemma, Microsoft Phi, DeepSeek, and many more
  • Multi-Dimensional Scoring: Evaluates models across quality, speed, fit, and context dimensions
  • Smart Hardware Detection: Automatically detects CPU, RAM, and GPU (including VRAM) across multiple platforms
  • Runtime Integration: Works with Ollama, llama.cpp, MLX, Docker Model Runner, and LM Studio

Key Features

Interactive TUI & CLI

llmfit ships with a beautiful terminal UI (TUI) by default, making it easy to navigate, search, and filter models. For automation workflows, a classic CLI mode is also available with JSON output options.

TUI Keyboard Shortcuts

Key        Action
Up/Down    Navigate models
/          Enter search mode
f          Cycle fit filter
a          Cycle availability filter
s          Cycle sort column
v          Enter Visual mode
p          Open Plan mode
t          Cycle color themes
Enter      Toggle detail view
d          Download selected model
q          Quit

Vim-like Modes

For power users, llmfit implements Vim-inspired modes:

  • Normal mode: Default navigation and filtering
  • Visual mode: Select multiple models for comparison
  • Select mode: Column-based filtering with arrow keys
  • Plan mode: Hardware planning for specific model configurations

Installation

Windows

# PowerShell / Scoop
scoop install llmfit

macOS / Linux

# Homebrew
brew install llmfit

# Quick install script
curl -fsSL https://llmfit.axjns.dev/install.sh | sh

Install without sudo:

Use this command to install to ~/.local/bin instead of system directories:

curl -fsSL https://llmfit.axjns.dev/install.sh | sh -s -- --local

Docker / Podman

# Docker
docker run ghcr.io/alexsjones/llmfit

# Podman, piping JSON through jq
podman run ghcr.io/alexsjones/llmfit recommend --use-case coding | jq '.models[].name'

Build from Source

# Rust (Cargo)
git clone https://github.com/AlexsJones/llmfit.git
cd llmfit
cargo build --release
# Binary at target/release/llmfit

Usage

Basic Commands

# Launch the TUI
llmfit

# CLI mode, all models
llmfit --cli

# Perfect fits only
llmfit fit --perfect -n 5

# Show system specs
llmfit system

# Search models
llmfit search "llama 8b"

# Model details
llmfit info "Mistral-7B"

# Top recommendations (JSON)
llmfit recommend --json --limit 5

# Use-case filtered recommendations
llmfit recommend --json --use-case coding --limit 3

Hardware Planning

The Plan mode inverts the typical workflow—instead of asking "what fits my hardware?", it estimates "what hardware is needed for this model config?"

# Plan hardware for a model
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --quant mlx-4bit
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --target-tps 25 --json

GPU Memory Override

When GPU VRAM auto-detection fails (broken drivers, VMs, passthrough), manually specify your GPU's VRAM:

# Override detected VRAM
# 32 GB VRAM
llmfit --memory=32G

# 24 GB VRAM
llmfit --memory=24G --cli
llmfit --memory=24G fit --perfect -n 5

REST API Server

llmfit can run as an HTTP API server, exposing the same scoring and recommendation data via REST endpoints:

# Start the API server
llmfit serve --host 0.0.0.0 --port 8787

Endpoint                     Description
GET /health                  Liveness check
GET /api/v1/system           Node hardware info
GET /api/v1/models           Full model list with filters
GET /api/v1/models/top       Top runnable models for scheduling
GET /api/v1/models/{model}   Search by model name

How It Works

Hardware Detection

llmfit automatically detects your system specifications across multiple platforms:

  • NVIDIA: Multi-GPU support via nvidia-smi, aggregates VRAM across all detected GPUs
  • AMD: Detected via rocm-smi
  • Intel Arc: Discrete VRAM via sysfs, integrated via lspci
  • Apple Silicon: Unified memory via system_profiler, VRAM = system RAM
  • Ascend: Detected via npu-smi
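
As a rough sketch of the NVIDIA path, aggregated VRAM can be read by shelling out to nvidia-smi; the helper names below are hypothetical, not llmfit's internals:

```python
import subprocess

def total_nvidia_vram_mib(smi_output: str) -> int:
    """Sum VRAM (MiB) across every GPU line reported by nvidia-smi."""
    return sum(int(tok) for tok in smi_output.split() if tok)

def detect_nvidia_vram_mib():
    """Aggregate dedicated VRAM across all NVIDIA GPUs,
    or None when no working driver is present."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return total_nvidia_vram_mib(out)
```

With two 24 GB cards, nvidia-smi prints one `24576` line per GPU, and the sum is what a multi-GPU setup can address in aggregate.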

Multi-Dimensional Scoring

Each model is scored across four dimensions (0–100 each):

Dimension   What It Measures
Quality     Parameter count, model family reputation, quantization penalty, task alignment
Speed       Estimated tokens/sec based on backend, params, and quantization
Fit         Memory utilization efficiency (sweet spot: 50–80% of available memory)
Context     Context window capability vs. the target for the use case

Dimensions are combined into a weighted composite score. Weights vary by use-case category (General, Coding, Reasoning, Chat, Multimodal, Embedding).
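
The composite step can be sketched as a plain weighted average; the weight values below are hypothetical, not llmfit's actual coefficients:

```python
def composite_score(dims: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of the four 0-100 dimension scores."""
    total_w = sum(weights.values())
    return sum(dims[k] * weights[k] for k in dims) / total_w

# Hypothetical weights for the Coding use case (not llmfit's real values)
coding = {"quality": 0.4, "speed": 0.2, "fit": 0.2, "context": 0.2}
scores = {"quality": 85, "speed": 70, "fit": 90, "context": 60}
print(composite_score(scores, coding))  # ≈ 78.0
```

A coding-weighted profile like this one favors quality over raw speed, which is why the same model can rank differently across use cases.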

Dynamic Quantization Selection

Instead of assuming a fixed quantization, llmfit walks a hierarchy from Q8_0 (best quality) down to Q2_K (most compressed), picking the highest quality that fits in available memory.

If nothing fits at full context, it automatically tries again at half context length.
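
A minimal sketch of that walk, assuming approximate bits-per-weight figures for each GGUF quantization and a simple fp16 KV-cache estimate (the constants are illustrative, not llmfit's exact model):

```python
# Approximate bits per weight for common GGUF quantizations (illustrative)
QUANT_LADDER = [
    ("Q8_0", 8.5), ("Q6_K", 6.6), ("Q5_K_M", 5.7),
    ("Q4_K_M", 4.8), ("Q3_K_M", 3.9), ("Q2_K", 3.4),
]

def kv_cache_gb(context: int, layers: int = 32, kv_dim: int = 4096) -> float:
    # K and V caches in fp16: 2 tensors * 2 bytes per element, per layer
    return 2 * 2 * layers * kv_dim * context / 1e9

def pick_quant(params_b: float, mem_gb: float, context: int):
    """Walk Q8_0 -> Q2_K, picking the best quality that fits;
    if nothing fits, retry once at half the context length."""
    for ctx in (context, context // 2):
        for name, bits in QUANT_LADDER:
            weights_gb = params_b * bits / 8  # billions of params -> GB
            if weights_gb + kv_cache_gb(ctx) <= mem_gb:
                return name, ctx
    return None  # does not fit even at Q2_K and half context
```

Under these assumptions a 7B model with 40 GB free lands on Q8_0, while the same model with 8 GB free drops to a 3-bit K-quant before giving up on context length.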

MoE Support

Models with Mixture-of-Experts architectures (Mixtral, DeepSeek-V2/V3) are detected automatically. Only a subset of experts is active per token, so effective VRAM requirement is much lower than total parameter count suggests.

For example, Mixtral 8x7B has 46.7B total parameters but only activates ~12.9B per token, reducing VRAM from 23.9 GB to ~6.6 GB with expert offloading.
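
Those Mixtral figures are consistent with roughly 4-bit weights (about 0.512 bytes per parameter); a toy calculation with that assumption made explicit:

```python
def moe_vram_gb(total_params_b: float, active_params_b: float,
                bytes_per_param: float = 0.512) -> tuple[float, float]:
    """(dense-load VRAM, expert-offloaded VRAM) in GB.
    0.512 bytes/param corresponds to roughly 4-bit quantization."""
    return (total_params_b * bytes_per_param,
            active_params_b * bytes_per_param)

dense, offloaded = moe_vram_gb(46.7, 12.9)  # Mixtral 8x7B
print(round(dense, 1), round(offloaded, 1))  # 23.9 6.6
```

The gap between the two numbers is exactly why MoE detection matters: sizing by total parameter count alone would rule out models that actually fit.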

Speed Estimation

Token generation in LLM inference is memory-bandwidth-bound. llmfit uses actual GPU memory bandwidth to estimate throughput:

Formula
(bandwidth_GB/s / model_size_GB) × efficiency_factor

The efficiency factor (0.55) accounts for kernel overhead, KV-cache reads, and memory controller effects. The bandwidth lookup table covers ~80 GPUs across NVIDIA, AMD, and Apple Silicon families.
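
As a sketch of that formula (the RTX 4090 bandwidth figure is from public specs; the rest mirrors the formula above):

```python
def estimate_tps(bandwidth_gbs: float, model_size_gb: float,
                 efficiency: float = 0.55) -> float:
    """Memory-bandwidth-bound estimate: each generated token
    streams the full weight set from memory once."""
    return bandwidth_gbs / model_size_gb * efficiency

# e.g. an RTX 4090 (~1008 GB/s) running a 4 GB quantized model
print(estimate_tps(1008, 4.0))  # ≈ 138.6 tokens/sec
```

Note the model size here is the quantized on-disk size, not the parameter count, which is why heavier quantization directly translates into higher estimated throughput.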

Runtime Provider Integration

llmfit integrates with multiple local runtime providers for seamless model management:

Ollama Integration

Automatically detects installed Ollama models, displays download status, and enables one-click pulls directly from the TUI.

  • Queries GET /api/tags to list installed models
  • Sends POST /api/pull to download new models
  • Shows green checkmark (✓) for installed models in the Inst column
  • Connects to http://localhost:11434 by default

Remote Ollama Support

Connect to Ollama on different machines using environment variables:

OLLAMA_HOST="http://192.168.1.100:11434" llmfit

llama.cpp Integration

Direct GGUF downloads from Hugging Face with local cache detection. Maps HF model names to llama.cpp tag format and marks models as installed when matching GGUF files are present locally.

Docker Model Runner Integration

Queries Docker Desktop's built-in model serving, matches models using Ollama-style tag mapping, and pulls via docker model pull.

LM Studio Integration

Connects to LM Studio's local server with built-in model download capabilities. Accepts HuggingFace model names directly and tracks download progress via polling.

Model Name Mapping

llmfit maintains an accurate mapping between HuggingFace names (e.g., Qwen/Qwen2.5-Coder-14B-Instruct) and runtime-specific naming schemes:

  • Ollama: qwen2.5-coder:14b
  • Docker Model Runner: ai/qwen2.5-coder:14b
  • LM Studio: Direct HuggingFace name
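
A toy version of the Ollama direction of that mapping, purely illustrative and not llmfit's actual table:

```python
import re

def hf_to_ollama(hf_name: str) -> str:
    """Illustrative heuristic: lowercase the repo name, drop the
    -Instruct suffix, and move the parameter size into a tag."""
    repo = hf_name.split("/")[-1]
    repo = re.sub(r"-Instruct$", "", repo, flags=re.IGNORECASE)
    m = re.search(r"-(\d+(?:\.\d+)?)B$", repo, flags=re.IGNORECASE)
    size = m.group(1) + "b" if m else "latest"
    base = repo[:m.start()] if m else repo
    return f"{base.lower()}:{size}"

print(hf_to_ollama("Qwen/Qwen2.5-Coder-14B-Instruct"))  # qwen2.5-coder:14b
```

Real registries have enough naming irregularities that llmfit maintains an explicit mapping rather than relying on a heuristic like this one.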

Platform Support

Platform                Status
Linux                   Full support. GPU detection via nvidia-smi (NVIDIA), rocm-smi (AMD), sysfs/lspci (Intel Arc), and npu-smi (Ascend).
macOS (Apple Silicon)   Full support. Detects unified memory via system_profiler; VRAM = system RAM. Models run via Metal GPU acceleration.
macOS (Intel)           RAM and CPU detection works. Discrete GPU detection if nvidia-smi is available.
Windows                 RAM and CPU detection works. NVIDIA GPU detection via nvidia-smi if installed.
Android / Termux        CPU and RAM detection usually work. Mobile GPU autodetection is not currently supported.

GPU Detection Table

Vendor                  Detection Method               VRAM Reporting
NVIDIA                  nvidia-smi                     Exact dedicated VRAM
AMD                     rocm-smi                       Detected (VRAM may be unknown)
Intel Arc (discrete)    sysfs (mem_info_vram_total)    Exact dedicated VRAM
Intel Arc (integrated)  lspci                          Shared system memory
Apple Silicon           system_profiler                Unified memory (= system RAM)
Ascend                  npu-smi                        Detected (VRAM may be unknown)

Built-in Themes

Press t to cycle through 10 beautiful color themes. Your selection is automatically saved and restored on next launch:

Theme                    Description
Default                  Original llmfit colors
Dracula                  Dark purple background with pastel accents
Solarized                Ethan Schoonover's Solarized Dark palette
Nord                     Arctic, cool blue-gray tones
Monokai                  Monokai Pro warm syntax colors
Gruvbox                  Retro groove palette with warm earth tones
Catppuccin Latte 🌻      Light theme with a harmonious pastel inversion
Catppuccin Frappé 🪴     Low-contrast dark with a muted, subdued aesthetic
Catppuccin Macchiato 🌺  Medium-contrast dark with gentle, soothing tones
Catppuccin Mocha 🌿      Darkest variant, cozy with color-rich accents

Conclusion

llmfit is an essential tool for anyone working with local LLMs. By combining intelligent hardware detection, multi-dimensional model scoring, and seamless runtime provider integration, it takes the guesswork out of model selection.

Whether you're setting up a personal development environment, deploying models in production, or just exploring the capabilities of local AI, llmfit helps you make informed decisions about which models will actually perform well on your hardware.

The open-source nature, cross-platform support, and active development make llmfit a valuable addition to any LLM enthusiast's toolkit.

Get Started

Install llmfit today and discover the perfect models for your system:

# macOS/Linux (Homebrew)
brew install llmfit

# Windows (Scoop)
scoop install llmfit

# Or use the quick install script
curl -fsSL https://llmfit.axjns.dev/install.sh | sh
Based on llmfit by AlexsJones, a terminal tool for LLM model selection.