How to install drivers for NVIDIA Tesla V100 on Fedora 44 Server Edition for AI Inference

The NVIDIA Tesla V100 has become a surprisingly attractive GPU for local LLM inference: its end-of-life status has flooded the market with cheap used cards. This article describes how to install its drivers on Fedora 44, and the exact steps to get it working with Docker and llama.cpp.

TL;DR

Install driver branch 580, the last branch supporting Volta (packages: akmod-nvidia-580xx and xorg-x11-drv-nvidia-580xx-cuda). Use CUDA toolkit 12.6 in containers (FROM nvidia/cuda:12.6.3-devel-ubuntu22.04). For a mixed V100 + Ampere rig, compile llama.cpp with -DCMAKE_CUDA_ARCHITECTURES="70;86", where 70 = V100 (Volta) and 86 = RTX 3090 (Ampere).

Motivation

NVIDIA dropped support for the Volta architecture starting with driver branch 590, and CUDA 12 is the last major toolkit version supporting sm_70. Data centres have been retiring V100s en masse, dropping used prices to roughly 500 GBP per 32GB card. Despite being eight years old, the V100 32GB still offers a lot: more VRAM than an RTX 4090, 900 GB/s HBM2 bandwidth, first-generation FP16 Tensor cores, and clean support in llama.cpp for GGUF quantization formats. For running 30B+ models at Q4-Q8, or 70B models split across multiple cards, it offers more VRAM per pound than almost any alternative.

Problem

On Fedora 44, RPM Fusion's akmod-nvidia package defaults to driver branch 595, which has dropped Volta support. The V100 fails to initialize with:

NVRM: The NVIDIA GPU 0000:0c:00.0 (PCI ID: 10de:1db6)
NVRM: installed in this system is not supported by open nvidia.ko
nvidia 0000:0c:00.0: probe with driver nvidia failed with error -1

The fix is to install the 580 Long-Term Support Branch instead, which is the last NVIDIA driver branch supporting Volta and is maintained with security updates through June 2028.

How to install

Tested on Fedora 44 Server Edition, kernel 7.0.4-200.fc44 (upgraded to 7.0.8-200.fc44 during install).

Remove the default 595 driver and any related packages:

sudo dnf remove '*nvidia*'
sudo dnf autoremove

Install the 580xx legacy branch from RPM Fusion:

sudo dnf install akmod-nvidia-580xx xorg-x11-drv-nvidia-580xx-cuda

The install pulls in around 300 packages, most of which are build dependencies (gcc, kernel-devel, kernel-headers) and the RPM Fusion media stack. The actual NVIDIA components are akmod-nvidia-580xx-580.159.03, xorg-x11-drv-nvidia-580xx-580.159.03, and the CUDA driver libraries.
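
Before rebooting, it may be worth confirming that akmods has finished building the kernel module, since the build continues in the background after the dnf transaction ends. The following is a minimal sketch, assuming the module is named nvidia as in the log above; driver_branch is a made-up helper name. Note that modinfo looks up the module for the running kernel, so if the kernel was upgraded in the same transaction, an empty result is not conclusive.

```shell
#!/usr/bin/env bash
# driver_branch: hypothetical helper that prints the major branch
# of a driver version string (everything before the first dot).
driver_branch() {
    printf '%s\n' "${1%%.*}"
}

# modinfo only succeeds once a built nvidia module is installed
# for the currently running kernel.
if ver=$(modinfo -F version nvidia 2>/dev/null) && [ -n "$ver" ]; then
    echo "nvidia kmod built: $ver (branch $(driver_branch "$ver"))"
else
    echo "nvidia kmod not found for the running kernel; wait a few minutes and retry"
fi
```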

Reboot:

sudo reboot

Verify:

nvidia-smi

Expected output: all V100 cards listed with driver 580.159.03 and CUDA Version 13.2.
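
For a scriptable version of this check, nvidia-smi can emit selected fields as CSV. The sketch below assumes the driver is recent enough to expose the compute_cap query field; check_driver_rows is a hypothetical helper that flags any GPU not running on the 580 branch.

```shell
#!/usr/bin/env bash
# check_driver_rows: hypothetical helper. Reads "name, driver_version, compute_cap"
# CSV rows on stdin and returns non-zero if any row is not on the 580 branch.
check_driver_rows() {
    local status=0 name driver cap
    while IFS=, read -r name driver cap; do
        driver=${driver# }   # strip the space that follows each comma
        cap=${cap# }
        case $driver in
            580.*) echo "ok: $name (driver $driver, compute capability $cap)" ;;
            *)     echo "unexpected driver for $name: $driver"; status=1 ;;
        esac
    done
    return $status
}

# Only query the hardware when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,driver_version,compute_cap --format=csv,noheader \
        | check_driver_rows
fi
```

A V100 should report compute capability 7.0, which corresponds to the architecture value 70 used later in the llama.cpp build.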

A note on package names: use akmod-nvidia-580xx specifically, not akmod-nvidia (which defaults to 595 and lacks Volta support) and not akmod-nvidia-open (which requires GPU System Processor hardware that Volta lacks).

Which CUDA version to use

The driver and the CUDA toolkit are separate concerns, and the V100 has different compatibility for each:

Driver side: branch 580 LTSB. nvidia-smi reports "CUDA Version: 13.2", which is the maximum CUDA runtime version the driver supports.
Toolkit side (for compilation): CUDA 12.x, not 13.x. CUDA 13 removed sm_70 from nvcc entirely. Attempting to compile for compute_70 with CUDA 13.x yields nvcc fatal: Unsupported gpu architecture 'compute_70'. Use CUDA toolkit 12.6 as the sweet spot.
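
One way to check the toolkit side before starting a long build is to ask nvcc which virtual architectures it can target. This is a sketch rather than a definitive test; supports_volta is a hypothetical helper, and the snippet is meant to run wherever nvcc lives (typically inside the build container).

```shell
#!/usr/bin/env bash
# supports_volta: hypothetical helper. Reads an architecture list on stdin
# (one compute_XX entry per line) and succeeds only if compute_70 is present.
supports_volta() {
    grep -qx 'compute_70'
}

# nvcc --list-gpu-arch prints the virtual architectures the toolkit supports.
if command -v nvcc >/dev/null 2>&1; then
    if nvcc --list-gpu-arch | supports_volta; then
        echo "this toolkit can still target the V100 (compute_70)"
    else
        echo "this toolkit cannot target compute_70"
    fi
fi
```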

This split matters most when working with containers. A container built on nvidia/cuda:13.0.0-devel-ubuntu22.04 cannot compile CUDA code for the V100, even though the host driver works fine. Use:

FROM nvidia/cuda:12.6.3-devel-ubuntu22.04

PyTorch 2.6+ ships with Volta support, vLLM works on V100 with FP16 and AWQ-INT4, and llama.cpp's CUDA backend handles sm_70 correctly when built with CUDA 12.x.

Recompiling llama.cpp for V100

llama.cpp must be built with CUDA architecture 70 explicitly included. If the build targets only compute_86 (Ampere) or higher, the kernels won't execute on the V100 and you'll get CUDA error: CUDA-capable device(s) is/are busy or unavailable. In this situation the message really means "no compatible kernels found".

Build inside a CUDA 12.6 container:

cmake -B build \
    -DGGML_CUDA=ON \
    -DCMAKE_CUDA_ARCHITECTURES="70" \
    -DCMAKE_BUILD_TYPE=Release

cmake --build build --config Release -j$(nproc)

For mixed setups (V100 + Ampere card in the same machine), include both architectures:

-DCMAKE_CUDA_ARCHITECTURES="70;86"

Here, 70 covers V100 (Volta), 86 covers RTX 3090 (Ampere). The resulting binary contains kernels for both and dispatches at runtime.
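
Instead of typing the architecture list by hand, it can be derived from the GPUs actually installed. The sketch below assumes nvidia-smi exposes the compute_cap query field and that it is run on the host (or in a container started with GPU access); caps_to_archs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# caps_to_archs: hypothetical helper. Turns compute capabilities such as
# "7.0" and "8.6" (one per line on stdin) into a CMake list such as "70;86".
caps_to_archs() {
    tr -d '. ' | sort -un | paste -sd ';' -
}

if command -v nvidia-smi >/dev/null 2>&1; then
    archs=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | caps_to_archs)
    echo "-DCMAKE_CUDA_ARCHITECTURES=\"$archs\""
fi
```

Duplicate cards collapse into a single entry, so a machine with two V100s and one RTX 3090 would still yield 70;86.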

Verify the build picked up Volta. When llama-server starts, it prints a system_info line that includes the compiled architectures:

system_info: ... CUDA : ARCHS = 700,860 | USE_GRAPHS = 1 | ...

If the line shows only ARCHS = 860, the V100 will not work.
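
This check can be automated against a saved startup log. The following is a rough sketch: archs_include_volta is a hypothetical helper, llama-server.log is an assumed file name, and the pattern accepts either a two-digit or three-digit spelling of the Volta entry in case the format differs between versions.

```shell
#!/usr/bin/env bash
# archs_include_volta: hypothetical helper. Reads log text on stdin, pulls the
# value of the "ARCHS = ..." field, and succeeds if it lists Volta (70 or 700).
archs_include_volta() {
    sed -n 's/.*ARCHS = \([0-9,;]*\).*/\1/p' \
        | tr ',;' '\n\n' \
        | grep -Eqx '700?'
}

# llama-server.log is an assumed name for a captured startup log.
if [ -f llama-server.log ]; then
    if archs_include_volta < llama-server.log; then
        echo "build includes Volta kernels"
    else
        echo "build does not include Volta kernels; rebuild with architecture 70"
    fi
fi
```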

A note on stale builds: if you previously built against a different CUDA version, wipe the build directory before rebuilding. Stale CMake dependency files reference old toolkit paths and produce obscure build errors like No rule to make target '/usr/local/cuda-13.2/targets/x86_64-linux/lib/libcudart.so':

rm -rf build/
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="70" ...
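
To see whether an existing build directory was configured against a different toolkit, the compiler path recorded in the CMake cache can be inspected. This is a sketch under the assumption that the cache records CMAKE_CUDA_COMPILER; cached_nvcc is a hypothetical helper, and the recorded path may be a symlink such as /usr/local/cuda that does not reveal the version on its own.

```shell
#!/usr/bin/env bash
# cached_nvcc: hypothetical helper. Prints the CUDA compiler path recorded in
# <build-dir>/CMakeCache.txt, or nothing if the entry or file is missing.
cached_nvcc() {
    sed -n 's/^CMAKE_CUDA_COMPILER:[A-Z]*=//p' "$1/CMakeCache.txt" 2>/dev/null
}

if [ -f build/CMakeCache.txt ]; then
    echo "build/ was configured with: $(cached_nvcc build)"
fi
```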

Docker GPU passthrough

The NVIDIA Container Toolkit enables GPU access from Docker containers via the --gpus flag.

Install on Fedora 44:

sudo dnf install nvidia-container-toolkit

The package comes from NVIDIA's official repository. If dnf can't find it, add the repo:

curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
    sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf install nvidia-container-toolkit

Configure the Docker runtime:

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

This edits /etc/docker/daemon.json and adds the nvidia runtime:

{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
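
After restarting Docker, the daemon itself can be asked which runtimes it has registered, which confirms that the configuration was actually loaded. A minimal sketch follows; has_nvidia_runtime is a hypothetical helper that does a simple text match rather than full JSON parsing.

```shell
#!/usr/bin/env bash
# has_nvidia_runtime: hypothetical helper. Reads the JSON runtimes map on stdin
# and succeeds if it contains a key named "nvidia".
has_nvidia_runtime() {
    grep -q '"nvidia"[[:space:]]*:'
}

# docker info can print the registered runtimes as JSON via a Go template.
if command -v docker >/dev/null 2>&1; then
    if docker info --format '{{json .Runtimes}}' 2>/dev/null | has_nvidia_runtime; then
        echo "nvidia runtime is registered with the Docker daemon"
    else
        echo "nvidia runtime not found; re-run nvidia-ctk and restart docker"
    fi
fi
```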

Test passthrough:

sudo docker run --rm --gpus all nvidia/cuda:12.6.3-base-ubuntu22.04 nvidia-smi

If this prints the same GPU list as the host's nvidia-smi, passthrough works.

Conclusion

The V100 remains a cost-effective option for local LLM inference in 2026, provided you accept the constraints of an end-of-life architecture. The driver and CUDA toolkit version requirements are easy to miss but trivial to address once known. Driver 580 LTSB combined with CUDA 12.6 in containers gives a stable, well-supported stack through at least June 2028, more than enough time to extract value from cheap used hardware.