Grosan Flaviu Gheorghe

2026-06-03 · 8 min read · AI

Dual AMD R9700 Setup on ASUS X99-E WS/USB 3.1

Running two AMD Radeon AI PRO R9700 cards (gfx1201, RDNA4, 32GB each) on an ageing ASUS X99-E WS/USB 3.1 board is possible, but the platform fights you at

2026-06-02 · 12 min read · AI

Layer Split Model Parallelism on Hybrid AMD NVIDIA AI Servers using Vulkan and Llama CPP

Running a single large language model across GPUs from two different vendors is not something the tooling expects you to do. CUDA is NVIDIA-only. ROCm is AMD-only. The

2026-05-24 · 10 min read · linux

How to Limit GPU Power and Clock Speeds for AI Inference with a Dockerised Controller

Running multiple GPUs for local AI inference at stock power settings is wasteful. Consumer cards like the RTX 3090 draw 350W by default, but AI inference workloads are typically memory-bandwidth

2026-05-20 · 6 min read · AI

How to cool passive NVIDIA GPUs (Tesla V100, P40) with a Dockerised Fan Controller

The NVIDIA Tesla V100 and Tesla P40 are passively cooled cards, designed for data centre chassis with high-volume front-to-back airflow. Used in a desktop workstation or a home server they

2026-05-20 · 4 min read · AI

How to install drivers for NVIDIA Tesla V100 on Fedora 44 Server Edition for AI Inference

The NVIDIA Tesla V100 has become a surprisingly attractive GPU for local LLM inference, thanks to its end-of-life status causing a flood of cheap used cards on the market. This