AI - Grosan Flaviu Gheorghe

2026-06-30 · 6 min read · AI

How to Reduce AMD Radeon AI PRO R9700 GPU Power Usage by 17% for AI Inference with a Dockerised Controller

Running four AMD Radeon AI PRO R9700 GPUs at the stock 300W each for local AI inference is wasteful. As on NVIDIA GPUs, AI inference workloads on these cards are memory-bandwidth bound, not compute bound.

2026-06-28 · 13 min read · AI

Fine-Tuning a Qwen Model on AMD Radeon AI PRO R9700 (RDNA4) Using LlamaFactory

This documents the end-to-end process of fine-tuning Qwen3 models (0.6B and 1.7B) using LlamaFactory running on AMD Radeon AI PRO R9700 GPUs (gfx1201/RDNA4). For the purpose of

2026-06-03 · 8 min read · AI

Dual AMD R9700 Setup on ASUS X99-E WS/USB 3.1

Running two AMD Radeon AI PRO R9700 cards (gfx1201, RDNA4, 32GB each) on an ageing ASUS X99-E WS/USB 3.1 board is possible, but the platform fights you at

2026-06-02 · 12 min read · AI

Layer-Split Model Parallelism on Hybrid AMD/NVIDIA AI Servers Using Vulkan and llama.cpp

Running a single large language model across GPUs from two different vendors is not something the tooling expects you to do. CUDA is NVIDIA-only. ROCm is AMD-only. The

2026-05-24 · 10 min read · linux

How to Reduce NVIDIA GPU Power Usage and Clock Speeds for AI Inference with a Dockerised Controller

Running multiple GPUs for local AI inference at stock power settings is wasteful. Consumer cards like the RTX 3090 draw 350W by default, but AI inference workloads are typically memory-bandwidth