This documents the end-to-end process of fine-tuning Qwen3 models (0.6B and 1.7B) using LlamaFactory running on AMD Radeon AI PRO R9700 GPUs (gfx1201/RDNA4). For the purpose of this tutorial I will tune the model to detect multi-predicate conditional statements in JavaScript that should be extracted into predicate functions.

The final production setup achieves ~80% detection rate with zero false positives on unseen codebases, served via llama.cpp with Vulkan inference.


Hardware

  • AMD Radeon AI PRO R9700 (32GB VRAM, RDNA4/gfx1201) - only one GPU needed for this tutorial. Given the small model sizes (0.6B-1.7B with QLoRA), a GPU with less VRAM would also work.
  • AMD Threadripper PRO 3945WX
  • 64GB RAM
  • Fedora 44 Server Edition

Software

  • LlamaFactory - a unified framework for fine-tuning large language models. It supports LoRA, QLoRA, and full fine-tuning across 100+ model architectures, with built-in dataset management, a web UI, and export tools. Used in this tutorial for QLoRA training and merging the adapter back into the base model.
  • llama.cpp - a lightweight C/C++ inference engine for running LLMs on consumer hardware. It supports CPU, Vulkan, CUDA, and Metal backends, with its own GGUF model format optimised for fast loading and quantisation. Used in this tutorial to serve the fine-tuned model with Vulkan on the R9700.
  • tree-sitter - an incremental parsing library used by editors like VS Code and Neovim for syntax highlighting and code navigation. Used in this tutorial to parse JavaScript files into ASTs and mechanically extract multi-predicate conditionals for training data generation.

The RDNA4 Training Problem

AMD's ROCm training tools (AITER, Flash Attention, etc.) were originally built for CDNA GPUs (MI250/MI300) and RDNA4 support is still being added - often through community patches. At the time of writing, PyTorch training on R9700 has known performance and stability issues (ROCm #5674). Key issues encountered:

  • rocBLAS Tensile GEMM kernel crashes: Memory access faults on specific matrix shapes during training. The crash is non-deterministic and depends on sequence length, batch composition, and data ordering.
  • Standard LoRA training crashes: Full-precision LoRA hits broken GEMM kernels. Only QLoRA (4-bit quantisation) uses different code paths that avoid the crash.

The Fix: ROCm 6.3 + QLoRA + bitsandbytes from Source

Credit to @jd-lo who documented a working QLoRA training setup on R9700 in LlamaFactory issue #10511. Their findings - ROCm 6.3, bitsandbytes compiled from source for gfx1201, and QLoRA 4-bit - were the foundation that made this work.

The solution uses:

  • ROCm 6.3 base image (not the vLLM image)
  • PyTorch 2.9.1+rocm6.3
  • bitsandbytes compiled from source for gfx1201
  • QLoRA (4-bit quantisation) which hits different GEMM kernels that work on RDNA4
  • Single GPU training (CUDA_VISIBLE_DEVICES=0)

Docker Setup

Dockerfile.llamafactory:

FROM rocm/pytorch:rocm6.3_ubuntu22.04_py3.10_pytorch_release_2.3.0

ENV AMDGPU_TARGETS=gfx1201
ENV HSA_OVERRIDE_GFX_VERSION=12.0.1
ENV PYTORCH_ROCM_ARCH=gfx1201

# Upgrade PyTorch to 2.9.1+rocm6.3
RUN pip install --upgrade torch==2.9.1+rocm6.3 \
    --index-url https://download.pytorch.org/whl/rocm6.3

# Build and install bitsandbytes from source for gfx1201
RUN cd /tmp && \
    git clone https://github.com/bitsandbytes-foundation/bitsandbytes.git && \
    cd bitsandbytes && \
    cmake -DCOMPUTE_BACKEND=hip \
          -DCMAKE_BUILD_TYPE=Release \
          -DCMAKE_HIP_ARCHITECTURES="gfx1201" \
          -S . -B build && \
    cmake --build build -j$(nproc) && \
    rm -f /opt/conda/envs/py_3.10/compiler_compat/ld && \
    pip install . && \
    rm -rf /tmp/bitsandbytes

# Install LLaMA Factory
RUN pip install llamafactory

# Fix accelerate unhashable set bug
RUN MODELING_PY=$(python -c "import accelerate.utils.modeling; print(accelerate.utils.modeling.__file__)") && \
    sed -i '/elif not isinstance(no_split_module_classes, (list, tuple)):/i\    elif isinstance(no_split_module_classes, set):\n        no_split_module_classes = list(no_split_module_classes)' \
    "$MODELING_PY" || true

# Remove old torchvision incompatible with torch 2.9
RUN pip uninstall -y torchvision

WORKDIR /app

Key detail: the rm -f /opt/conda/envs/py_3.10/compiler_compat/ld step removes conda's broken linker, which otherwise prevents bitsandbytes from building when "pip install ." runs. The ROCm base image ships with conda, which is generally a problematic environment - this linker issue is one of many reasons to avoid it when possible.

docker-compose.yml:

services:
  llamafactory:
    build:
      context: .
      dockerfile: Dockerfile.llamafactory
    container_name: llamafactory
    entrypoint: ""
    command: bash
    devices:
      - /dev/kfd
      - /dev/dri
    group_add:
      - "video"
    cap_add:
      - SYS_PTRACE
    security_opt:
      - label=disable
    ipc: host
    ports:
      - "7860:7860"
      - "8000:8000"
    environment:
      - HF_TOKEN=${HF_TOKEN}
      - BNB_BACKEND=rocm
    volumes:
      - ./data:/app/data:Z
      - ./output:/app/output:Z
      - ./huggingface_cache:/root/.cache/huggingface:Z
    stdin_open: true
    tty: true

.env:

HF_TOKEN=hf_xxxxxxxxxxxx

Build and Verify

docker compose build
docker compose up -d
docker compose exec llamafactory bash

# Verify GPUs are visible
python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.device_count())"
# Should print: True and 1 or more

Dataset Generation

Why Predicate Functions?

Multi-predicate conditional statements like if (a > 5 && b < 10 && c !== null) are a seemingly innocent construct, but when duplicated throughout a codebase they can lead to subtle bugs if not kept in sync. Extracting them into predicate functions makes conditions reusable - change the logic in one place and it gets updated everywhere.

This tutorial focuses on detecting these patterns, but the broader goal is to build a small, fast MoE model composed of multiple LoRA adapters - each trained on a different code style rule - for use in automated code reviews.

Tool: tree-sitter

Tree-sitter parses JavaScript files into an AST and extracts multi-predicate conditional statements mechanically. This provides 100% accurate ground truth for labelling training data.

Dependencies:

pip install tree-sitter tree-sitter-javascript

Dataset Generator Script (generate_dataset.py)

The generator:

  1. Walks directories for .js files
  2. Parses each file with tree-sitter
  3. Extracts all conditionals (if, else if, while, do...while, for, ternary) with multiple predicates
  4. Generates full-file training examples (violation files + clean files)
  5. Generates short snippet examples for each individual violation
  6. Generates clean snippet examples from single-predicate conditionals

Key implementation details:

  • Detects else if by checking if the if_statement node's parent is an else_clause
  • Counts logical operators (&&, ||) recursively in the condition subtree
  • Skips minified files (any of the first 10 lines longer than 500 characters)
  • Skips files > 32KB (won't fit in context)
  • Skips node_modules in subdirectories
  • No line numbers in output (model hallucinated them)
  • Single dash separator in output format
  • Uses "all" in the prompt: "Review this code for all predicate function violations:"

import sys
import os
import json
import tree_sitter_javascript as tsjs
from tree_sitter import Language, Parser, Query, QueryCursor

JS = Language(tsjs.language())
parser = Parser(JS)


def count_logical_ops(node):
    count = 0
    if node.type == "binary_expression":
        op = node.child_by_field_name("operator")
        if op and op.text.decode() in ("&&", "||"):
            count += 1
    for child in node.children:
        count += count_logical_ops(child)
    return count


CONDITION_QUERIES = [
    ("if", "(if_statement condition: (_) @cond) @stmt"),
    ("while", "(while_statement condition: (_) @cond) @stmt"),
    ("do...while", "(do_statement condition: (_) @cond) @stmt"),
    ("for", "(for_statement condition: (_) @cond) @stmt"),
    ("ternary", "(ternary_expression condition: (_) @cond) @stmt"),
]


def find_violations(source):
    tree = parser.parse(source)
    violations = []

    for kind, query_str in CONDITION_QUERIES:
        q = Query(JS, query_str)
        cursor = QueryCursor(q)
        for _, captures in cursor.matches(tree.root_node):
            cond = captures["cond"][0]
            stmt = captures["stmt"][0]
            ops = count_logical_ops(cond)
            if ops >= 1:
                line = cond.start_point[0] + 1
                actual_kind = kind
                if kind == "if" and stmt.parent and stmt.parent.type == "else_clause":
                    actual_kind = "else if"
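                violations.append({
                    "line": line,
                    # if/while/do conditions are parenthesized_expression nodes, so drop
                    # the outer parentheses; build_example() adds its own pair.
                    "code": (cond.text.decode()[1:-1].strip()
                             if cond.type == "parenthesized_expression"
                             else cond.text.decode()),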
                    "predicates": ops + 1,
                    "kind": actual_kind,
                    "start_row": stmt.start_point[0],
                    "end_row": stmt.end_point[0],
                })

    return violations


def find_clean_ifs(source):
    """Find single-predicate conditional statements for clean snippet examples."""
    tree = parser.parse(source)
    clean = []

    for _, query_str in CONDITION_QUERIES:
        q = Query(JS, query_str)
        cursor = QueryCursor(q)
        for _, captures in cursor.matches(tree.root_node):
            cond = captures["cond"][0]
            stmt = captures["stmt"][0]
            ops = count_logical_ops(cond)
            if ops == 0:
                clean.append({
                    "start_row": stmt.start_point[0],
                    "end_row": stmt.end_point[0],
                })

    return clean


def extract_snippet(source_text, start_row, end_row, context=3):
    """Extract a few lines around a target range."""
    lines = source_text.split("\n")
    s = max(0, start_row - context)
    e = min(len(lines), end_row + context + 1)
    return "\n".join(lines[s:e]), s


def build_example(source_text, violations):
    user_content = f"Review this code for all predicate function violations:\n```js\n{source_text}\n```"

    if violations:
        lines = []
        for v in violations:
            kind = v.get("kind", "if")
            lines.append(f"{kind} ({v['code']}) - should be a predicate function.")
        assistant_content = "\n".join(lines)
    else:
        assistant_content = "No violations found."

    return {
        "messages": [
            {"role": "user", "content": user_content},
            {"role": "assistant", "content": assistant_content}
        ]
    }


def find_js_files(path):
    js_files = []
    for root, dirs, files in os.walk(path):
        # skip hidden dirs and node_modules in subdirs
        dirs[:] = [d for d in dirs if not d.startswith(".") and d != "node_modules"]
        for f in files:
            if f.endswith(".js"):
                js_files.append(os.path.join(root, f))
    return js_files


def main():
    if len(sys.argv) < 2:
        print(f"Usage: {sys.argv[0]} <path> [path2] [path3] ...")
        sys.exit(1)

    js_files = []
    for path in sys.argv[1:]:
        js_files.extend(find_js_files(path))

    dataset = []
    violations_count = 0
    clean_count = 0
    skipped = 0

    for filepath in js_files:
        try:
            with open(filepath, "rb") as f:
                source = f.read()

            # skip huge files that won't fit in context
            if len(source) > 32000:
                print(f"Skipping {filepath}: too large ({len(source)} bytes)", file=sys.stderr)
                skipped += 1
                continue

            source_text = source.decode("utf-8", errors="replace")

            # skip minified files
            lines = source_text.split("\n")
            if any(len(line) > 500 for line in lines[:10]):
                skipped += 1
                continue

            violations = find_violations(source)
            example = build_example(source_text, violations)
            dataset.append(example)

            if violations:
                violations_count += 1
                # Add individual snippet examples for each violation
                for v in violations:
                    snippet, offset = extract_snippet(source_text, v["start_row"], v["end_row"])
                    snippet_violation = {
                        "line": v["line"] - offset,
                        "code": v["code"],
                        "predicates": v["predicates"],
                        "kind": v.get("kind", "if"),
                    }
                    dataset.append(build_example(snippet, [snippet_violation]))
                    violations_count += 1
            else:
                clean_count += 1

            # Add clean snippet examples from single-predicate ifs
            clean_ifs = find_clean_ifs(source)
            for ci in clean_ifs[:3]:  # max 3 clean snippets per file
                snippet, _ = extract_snippet(source_text, ci["start_row"], ci["end_row"])
                dataset.append(build_example(snippet, []))
                clean_count += 1

        except Exception as e:
            print(f"Skipping {filepath}: {e}", file=sys.stderr)
            skipped += 1

    # write dataset
    output_path = "dataset.jsonl"
    with open(output_path, "w") as f:
        for example in dataset:
            f.write(json.dumps(example) + "\n")

    print(f"Files scanned: {len(js_files)}")
    print(f"Examples with violations: {violations_count}")
    print(f"Clean examples: {clean_count}")
    print(f"Skipped (too large / errors): {skipped}")
    print(f"Dataset written to: {output_path}")


if __name__ == "__main__":
    main()

Training Data Format

{"messages": [
  {"role": "user", "content": "Review this code for all predicate function violations:\n```js\n<full file content>\n```"},
  {"role": "assistant", "content": "if (a > 5 && b < 10) - should be a predicate function.\nelse if (x || y) - should be a predicate function."}
]}

{"messages": [
  {"role": "user", "content": "Review this code for all predicate function violations:\n```js\n<clean file content>\n```"},
  {"role": "assistant", "content": "No violations found."}
]}
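
Before training, it can help to confirm that every line of the generated file really has this shape. The following is a minimal sketch using only the standard library; validate_example is an illustrative helper name, not part of LlamaFactory or of the generator script.

```python
import json

EXPECTED_ROLES = ("user", "assistant")


def validate_example(line):
    """Return a list of problems found in one JSONL line (empty list means OK)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or len(messages) != 2:
        return ["expected a 'messages' list with exactly two entries"]

    problems = []
    for expected_role, message in zip(EXPECTED_ROLES, messages):
        if not isinstance(message, dict):
            problems.append(f"{expected_role} entry is not an object")
            continue
        if message.get("role") != expected_role:
            problems.append(f"expected role {expected_role!r}, got {message.get('role')!r}")
        content = message.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"{expected_role} message has empty content")
    return problems


# Example: report the first few malformed lines of a dataset file.
# with open("dataset.jsonl") as f:
#     for number, line in enumerate(f, start=1):
#         for problem in validate_example(line):
#             print(f"line {number}: {problem}")
```

An empty result for every line means the file matches the two-message layout shown above.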

Generating the Dataset

Clone open source JavaScript projects for training data:

git clone --depth 1 https://github.com/expressjs/express
git clone --depth 1 https://github.com/fastify/fastify
git clone --depth 1 https://github.com/koajs/koa
git clone --depth 1 https://github.com/socketio/socket.io
git clone --depth 1 https://github.com/webpack/webpack
git clone --depth 1 https://github.com/eslint/eslint
git clone --depth 1 https://github.com/chalk/chalk
git clone --depth 1 https://github.com/nodemailer/nodemailer
git clone --depth 1 https://github.com/sequelize/sequelize
git clone --depth 1 https://github.com/Automattic/mongoose

Scan all repos at once:

python generate_dataset.py ./express ./fastify ./koa ./socket.io ./webpack ./eslint ./chalk ./nodemailer ./sequelize ./mongoose

You can also include your own project directories alongside these.

Balancing the Dataset

An unbalanced dataset causes the model to favour whichever class dominates. Too many clean examples and it defaults to "No violations found" on everything. Too many violations and it hallucinates problems where there are none. A 50/50 split between violation and clean examples gives the model equal exposure to both outcomes.

balance_dataset.py:

import json
import random
import sys

input_file = sys.argv[1] if len(sys.argv) > 1 else "dataset.jsonl"
output_file = sys.argv[2] if len(sys.argv) > 2 else "dataset_balanced.jsonl"

with open(input_file) as f:
    lines = [l for l in f if l.strip()]

# Classify on the assistant reply only, so JavaScript source that happens to
# contain either phrase in the user message cannot be miscounted.
violations, clean = [], []
for l in lines:
    reply = json.loads(l)["messages"][-1]["content"]
    (clean if reply == "No violations found." else violations).append(l)

# Trim clean examples to match violation count
random.shuffle(clean)
clean = clean[:len(violations)]

# Shuffle so violations and clean examples are interleaved randomly.
# Without this, the model sees all violations first then all clean,
# and learns position-dependent patterns instead of the actual rule.
balanced = violations + clean
random.shuffle(balanced)

with open(output_file, "w") as f:
    for l in balanced:
        f.write(l)

print(f"Violations: {len(violations)}, Clean: {len(clean)}, Total: {len(balanced)}")

Run it:

python balance_dataset.py dataset.jsonl dataset_balanced.jsonl

Final dataset: ~9,300 balanced examples from 10+ open source repos.
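
To confirm the split after balancing, a small check along these lines may be useful. It is a sketch only: class_counts is an illustrative name, and it assumes clean examples carry exactly the "No violations found." reply produced by the generator.

```python
import json
from collections import Counter

CLEAN_REPLY = "No violations found."


def class_counts(lines):
    """Count clean vs. violation examples by looking only at the assistant reply."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        reply = json.loads(line)["messages"][-1]["content"]
        counts["clean" if reply.strip() == CLEAN_REPLY else "violation"] += 1
    return counts


# Example:
# with open("dataset_balanced.jsonl") as f:
#     counts = class_counts(f)
# print(counts, counts["clean"] / sum(counts.values()))
```

A clean share close to 0.5 indicates the balancing step did what was intended.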

dataset_info.json

Place in data/ directory alongside the dataset:

{
  "predicate_violations": {
    "file_name": "dataset_balanced.jsonl",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant"
    }
  }
}

The tags section is required because LlamaFactory's default ShareGPT format expects from/value keys, but the dataset uses role/content.
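
An alternative to the tags mapping would be to rewrite the dataset into the default ShareGPT layout. The sketch below assumes, as I understand LlamaFactory's defaults, a "conversations" list with from/value keys and human/gpt role names; to_default_sharegpt is an illustrative helper, not a LlamaFactory function.

```python
import json

# Assumed default ShareGPT role names: "human" for the user, "gpt" for the assistant.
ROLE_TO_SHAREGPT = {"user": "human", "assistant": "gpt"}


def to_default_sharegpt(record):
    """Convert one role/content record into the from/value layout."""
    return {
        "conversations": [
            {"from": ROLE_TO_SHAREGPT[message["role"]], "value": message["content"]}
            for message in record["messages"]
        ]
    }


# Example: convert a whole file line by line.
# with open("dataset_balanced.jsonl") as src, open("dataset_sharegpt.jsonl", "w") as dst:
#     for line in src:
#         dst.write(json.dumps(to_default_sharegpt(json.loads(line))) + "\n")
```

Keeping the role/content layout and declaring the tags, as done above, avoids this extra conversion step and keeps the file usable by other chat-format tools.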


Training Configuration

I initially aimed for the smallest model possible (Qwen3-0.6B) but after evaluation settled on Qwen3-1.7B. The 0.6B model learns the pattern and catches obvious violations but struggles with longer files - it tends to stop generating after the first violation and misses the rest. The 1.7B model is significantly more thorough, consistently listing multiple violations per file with fewer misses. I chose the 1.7B for production, but training both is recommended - the 0.6B trains faster (about 37 minutes versus 52 on this dataset) and is useful for quick iteration when experimenting with dataset changes.

Training YAML (predicate_lora.yaml)

For Qwen3-0.6B:

model_name_or_path: Qwen/Qwen3-0.6B
stage: sft
do_train: true
finetuning_type: lora
quantization_bit: 4
lora_target: all
lora_rank: 16
lora_alpha: 16
dataset: predicate_violations
template: qwen3
cutoff_len: 4096
output_dir: /app/output/predicate_lora
per_device_train_batch_size: 2
gradient_accumulation_steps: 4
num_train_epochs: 1
learning_rate: 0.0002
lr_scheduler_type: cosine
warmup_steps: 5
logging_steps: 10
save_steps: 200
save_total_limit: 2
bf16: true
plot_loss: true
report_to: none

For Qwen3-1.7B (predicate_lora_large.yaml):

Same as above but:

model_name_or_path: Qwen/Qwen3-1.7B
output_dir: /app/output/predicate_lora_large
learning_rate: 0.0001

Lower learning rate (0.0001 vs 0.0002) for the larger model - larger models generally tolerate less aggressive learning rates.

Critical Settings

  • quantization_bit: 4 - QLoRA. This is what makes training work on RDNA4. Standard LoRA crashes.
  • cutoff_len: 4096 - Must be large enough for whole-file examples. Setting this to 1024 caused the model to learn truncated responses and default to "No violations found."
  • lora_target: all - Targets all linear modules (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj). Better results than default q_proj/v_proj only.
  • num_train_epochs: 1 - Single epoch on 9,300 diverse examples. Multiple epochs on smaller datasets caused overfitting.
  • learning_rate: 0.0002 - Written as 0.0002 rather than 2e-4. PyYAML reads 2e-4 as a string (its float syntax needs a decimal point, as in 2.0e-4), and the older transformers version in the ROCm 6.3 image does not convert it back to a number.
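
One way to sanity-check cutoff_len against the dataset before training is to estimate how many examples would be truncated. The sketch below is only a rough screen: CHARS_PER_TOKEN is an assumed average for source code, not a measured value, and the function names are illustrative. An exact count would need the Qwen3 tokenizer.

```python
import json

CHARS_PER_TOKEN = 3.5  # assumption: rough average for source code, not measured


def estimate_tokens(example):
    """Very rough token estimate from the character count of all messages."""
    chars = sum(len(message["content"]) for message in example["messages"])
    return int(chars / CHARS_PER_TOKEN)


def over_cutoff(lines, cutoff_len=4096):
    """Return the indexes of examples whose estimated length exceeds cutoff_len."""
    return [
        index
        for index, line in enumerate(lines)
        if line.strip() and estimate_tokens(json.loads(line)) > cutoff_len
    ]


# Example:
# with open("dataset_balanced.jsonl") as f:
#     lines = f.readlines()
# print(f"{len(over_cutoff(lines))} of {len(lines)} examples may be truncated")
```

Under this heuristic a file near the generator's 32,000-byte limit comes out at roughly 9,000 tokens, so it is worth checking how many examples are actually affected.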

Training Command

# Single GPU, no distributed
FORCE_TORCHRUN=0 CUDA_VISIBLE_DEVICES=0 llamafactory-cli train /app/data/predicate_lora.yaml

# With two GPUs, both models can train at the same time:
# run each command in its own shell, one GPU per process
FORCE_TORCHRUN=0 CUDA_VISIBLE_DEVICES=0 llamafactory-cli train /app/data/predicate_lora.yaml
FORCE_TORCHRUN=0 CUDA_VISIBLE_DEVICES=1 llamafactory-cli train /app/data/predicate_lora_large.yaml

Training Results

Model        Examples   Epochs   Steps   Time      Final Loss
Qwen3-0.6B   9,300      1        1,163   ~37 min   0.092
Qwen3-1.7B   9,300      1        1,163   ~52 min   0.105
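
The step count follows from the batch settings in the YAML. As a quick check of the arithmetic, assuming roughly 9,300 examples and a single GPU (the exact rounding depends on the trainer version and the true example count):

```python
import math


def optimizer_steps(examples, per_device_batch, grad_accum, gpus=1, epochs=1):
    """Optimizer steps = examples / effective batch size, rounded up, per epoch."""
    effective_batch = per_device_batch * grad_accum * gpus
    return math.ceil(examples / effective_batch) * epochs


def seconds_per_step(minutes, steps):
    """Average wall-clock seconds per optimizer step."""
    return round(minutes * 60 / steps, 1)


steps = optimizer_steps(9300, per_device_batch=2, grad_accum=4)
print(steps)                        # 1163
print(seconds_per_step(37, steps))  # 1.9 (Qwen3-0.6B)
print(seconds_per_step(52, steps))  # 2.7 (Qwen3-1.7B)
```

With an effective batch of 8, one epoch over about 9,300 examples gives the 1,163 steps reported in the table.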

Export and Conversion

Step 1: Merge LoRA Adapter into Base Model

This step is generic - it produces a standard HuggingFace model directory that can be used with any inference framework (vLLM, llama.cpp, TGI, etc.). Inside the llamafactory container:

llamafactory-cli export \
  --model_name_or_path Qwen/Qwen3-1.7B \
  --adapter_name_or_path /app/output/predicate_lora_large \
  --template qwen3 \
  --finetuning_type lora \
  --export_dir /app/output/merged_predicate_1.7b

Step 2: Convert to GGUF (llama.cpp specific)

GGUF is llama.cpp's model format. The conversion requires Python dependencies that aren't in the llamafactory container. Add them to your llama.cpp Dockerfile:

FROM ubuntu:24.04

# ... your existing llama.cpp build steps ...

RUN apt-get update && apt-get install -y python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

# GGUF conversion deps (CPU-only torch to keep it small)
RUN pip install --break-system-packages \
    gguf transformers sentencepiece protobuf numpy

RUN pip install --break-system-packages \
    torch --index-url https://download.pytorch.org/whl/cpu

Then inside the llama.cpp container, clone the conversion script and run it:

git clone --depth 1 https://github.com/ggml-org/llama.cpp /tmp/llama.cpp
python3 /tmp/llama.cpp/convert_hf_to_gguf.py \
  /path/to/merged_predicate_1.7b \
  --outfile predicate-1.7b-q8_0.gguf \
  --outtype q8_0

Make sure the merged model directory from step 1 is accessible to the llama.cpp container via a shared volume.

Step 3: Serve with llama.cpp

llama-server -m predicate-1.7b-q8_0.gguf \
  --host 0.0.0.0 --port 8080

Inference Configuration

Critical Sampling Parameters

{
  "model": "Qwen3-1.7B",
  "messages": [{"role": "user", "content": "Review this code for all predicate function violations:\n```js\n...\n```"}],
  "temperature": 0.1,
  "min_tokens": 500,
  "max_tokens": 4096
}

  • temperature: 0.1 - Low temperature for consistent, near-deterministic output. Higher temperatures cause hallucinations.
  • min_tokens: 500 - Critical. Without this, the model generates EOS after the first violation and stops listing the rest. This was the fix for the early-stopping problem. LlamaFactory's API does NOT support this parameter - llama.cpp does.
  • max_tokens: 4096 - Ensure long files get complete violation lists.
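
For callers that prefer Python over curl, the same request could be built roughly as follows. This is a sketch using only the standard library; build_payload, clean_reply and review are illustrative names, the default URL assumes the llama-server command shown earlier, and the cleanup mirrors what scan.sh does with sed.

```python
import json
import urllib.request

FENCE = "`" * 3  # three backticks, built this way to keep the listing readable
PROMPT_PREFIX = "Review this code for all predicate function violations:\n"


def build_payload(code):
    """Build the chat completion body with the sampling parameters listed above."""
    return {
        "model": "Qwen3-1.7B",
        "messages": [{
            "role": "user",
            "content": PROMPT_PREFIX + FENCE + "js\n" + code + "\n" + FENCE,
        }],
        "temperature": 0.1,
        "min_tokens": 500,
        "max_tokens": 4096,
    }


def clean_reply(text):
    """Strip Qwen3's <think> tags and drop blank lines."""
    text = text.replace("<think>", "").replace("</think>", "")
    return "\n".join(line for line in text.splitlines() if line.strip())


def review(code, api="http://localhost:8080"):
    """Send one file's source to the server and return the cleaned report."""
    request = urllib.request.Request(
        api + "/v1/chat/completions",
        data=json.dumps(build_payload(code)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return clean_reply(body["choices"][0]["message"]["content"])
```

The prompt string has to match the training prompt exactly, which is why it is kept in one place here.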

Production Scanner (scan.sh)

Bash script that walks a directory, sends each JS file to the model, and optionally shows tree-sitter comparison:

# Model only
./scan.sh ./my-project

# Model + tree-sitter comparison
./scan.sh ./my-project --ts

# Custom API endpoint
./scan.sh ./my-project --api http://localhost:8080

scan.sh:

#!/bin/bash
# Usage: ./scan.sh /path/to/project [--ts] [--api URL]

DIR=""
API="http://localhost:8080"
SHOW_TS=false
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"

while [[ $# -gt 0 ]]; do
    case "$1" in
        --ts) SHOW_TS=true; shift ;;
        --api) API="$2"; shift 2 ;;
        *) DIR="$1"; shift ;;
    esac
done

DIR="${DIR:-.}"

find "$DIR" -name '*.js' -not -path '*/node_modules/*' -not -path '*/.*' | sort | while IFS= read -r file; do
    # skip minified files (first line is 1,000 bytes or longer)
    if [ "$(head -1 "$file" | wc -c)" -ge 1000 ]; then
        continue
    fi

    # skip files > 32KB
    size=$(wc -c < "$file")
    if [ "$size" -gt 32000 ]; then
        continue
    fi

    # Model scan
    model_result=$(jq -n --arg code "$(cat "$file")" '{
        model: "Qwen3-1.7B",
        messages: [{
            role: "user",
            content: ("Review this code for all predicate function violations:\n```js\n" + $code + "\n```")
        }],
        temperature: 0.1,
        min_tokens: 500,
        max_tokens: 4096
    }' | curl -s "$API/v1/chat/completions" \
        -H "Content-Type: application/json" \
        -d @- | jq -r '.choices[0].message.content' | sed 's/<think>//g; s/<\/think>//g' | sed '/^$/d')

    model_has_violations=false
    if [ -n "$model_result" ] && ! echo "$model_result" | grep -q "No violations found"; then
        model_has_violations=true
    fi

    # Tree-sitter scan (only if --ts)
    ts_has_violations=false
    ts_result=""
    if $SHOW_TS; then
        ts_result=$(python3 "$SCRIPT_DIR/ts_single.py" "$file" 2>/dev/null)
        if [ -n "$ts_result" ] && ! echo "$ts_result" | grep -q "No violations found"; then
            ts_has_violations=true
        fi
    fi

    # Only print if either found something
    if $model_has_violations || $ts_has_violations; then
        echo ""
        echo "=== $file ==="
        echo "$model_result"
        if $SHOW_TS; then
            echo "[Tree-sitter]"
            echo "$ts_result"
        fi
    fi
done

The --ts flag requires ts_single.py in the same directory, which uses tree-sitter to provide ground truth comparison. The sed commands strip Qwen3's <think> tags from the output.


Results

Detection Accuracy (1.7B model, unseen codebases)

  • Overall: ~80% detection rate (94/118 violations caught)
  • False positives: 0 (zero hallucinated violations at temp 0.1)
  • Perfect on: standalone if statements, validation patterns, guard clauses
  • Weak on: ternary expressions, repeated boilerplate near file tops, some duplicate patterns deep in long files
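
Numbers like these can be derived by comparing the model's report with the tree-sitter report for each file. A minimal sketch, assuming both reports use the one-line-per-violation format from the training data (parse_report and score are illustrative names, not the script used to produce the figures above):

```python
import re
from collections import Counter

REPORT_LINE = re.compile(
    r"^(else if|if|while|do\.\.\.while|for|ternary) \((.*)\) - should be a predicate function\.$"
)


def parse_report(text):
    """Turn a report into a multiset of (kind, condition) pairs."""
    found = Counter()
    for line in text.splitlines():
        match = REPORT_LINE.match(line.strip())
        if match:
            found[(match.group(1), match.group(2))] += 1
    return found


def score(model_report, truth_report):
    """Count caught violations and false positives against the ground truth."""
    model, truth = parse_report(model_report), parse_report(truth_report)
    caught = sum((model & truth).values())           # reported and real
    false_positives = sum((model - truth).values())  # reported but not real
    total = sum(truth.values())
    return {
        "caught": caught,
        "total": total,
        "false_positives": false_positives,
        "detection_rate": caught / total if total else 1.0,
    }
```

Summed over all files, 94 caught out of 118 gives 94/118, or about 0.797, which is the ~80% figure above.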

What the Model Adds Over Tree-sitter

For this specific rule (predicate function violations), tree-sitter achieves 100% accuracy mechanically. The model's value is:

  1. Natural language output format (human-readable reports)
  2. Foundation for rules tree-sitter CAN'T handle (naming conventions, comment quality, code readability)
  3. Proof of concept for the LoRA adapter architecture - add more adapters for more rules

Architecture for Multiple Rules

Each rule gets its own LoRA adapter trained on the same base model:

Base Qwen3-1.7B (frozen)
  ├── predicate_lora (predicate function violations)
  ├── naming_lora (camelCase + descriptive names)
  ├── comments_lora (comment quality)
  └── ... more rules

Swap adapters at inference time. llama.cpp and vLLM both support runtime LoRA loading.
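
As an illustration of adapter swapping with llama.cpp, the sketch below builds the request body for llama-server's /lora-adapters endpoint as I understand it: a list of id/scale pairs, with ids following the order of the --lora flags at startup. The adapter list and helper names are assumptions for this example, and the adapters would need to be in GGUF form rather than merged as in the export steps above.

```python
import json
import urllib.request

# Assumption: llama-server was started with one --lora flag per adapter, in this order.
ADAPTERS = ["predicate_lora", "naming_lora", "comments_lora"]


def adapter_scales(active):
    """Scale 1.0 for the chosen adapter and 0.0 for all others."""
    if active not in ADAPTERS:
        raise ValueError(f"unknown adapter: {active}")
    return [
        {"id": index, "scale": 1.0 if name == active else 0.0}
        for index, name in enumerate(ADAPTERS)
    ]


def select_adapter(active, api="http://localhost:8080"):
    """POST the new scales to the server and return the HTTP status code."""
    request = urllib.request.Request(
        api + "/lora-adapters",
        data=json.dumps(adapter_scales(active)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling select_adapter("naming_lora") before a batch of naming checks would then leave the base model loaded and change only the active adapter.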


Lessons Learned

  1. QLoRA is required on RDNA4 - standard LoRA hits broken rocBLAS Tensile GEMM kernels. QLoRA uses different code paths that work.
  2. ROCm 6.3 is more stable than the vLLM image for training. The vLLM image crashed at step 40-50 consistently; ROCm 6.3 ran 1,163 steps clean.
  3. cutoff_len must match your data - training on whole files but setting cutoff_len: 1024 truncated the assistant responses, teaching the model to default to "No violations found."
  4. Dataset balance matters - too many clean examples (66%) made the model never flag violations. 50/50 is the sweet spot.
  5. min_tokens is essential - the model learned to generate EOS after the first violation. Forcing minimum generation length fixed multi-violation detection.
  6. 1 epoch on diverse data beats 3 epochs on small data - more codebases > more passes over the same code.
  7. Scientific notation breaks older transformers - write 0.0002 not 2e-4 in the YAML.
  8. Remove conda's broken linker - rm -f /opt/conda/envs/py_3.10/compiler_compat/ld before pip install . for bitsandbytes.