#1727 CI caught an issue where the Qwen3 state dict adapter outputs unfused LoRA adapters (the transformers v4 behavior). In v5, Qwen3MoE is expected to save fused adapters for the expert layers. Since we are close to release, I limited the blast radius to Qwen3, but I strongly suspect other MoE custom models are affected as well; that remains to be investigated and fixed accordingly.
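For context, a minimal sketch of what "fusing" the expert LoRA weights means here: stacking the per-expert (v4-style, unfused) LoRA matrices into a single tensor per projection, matching a v5-style fused expert layout. The key names (`experts.{i}.gate_proj.lora_A.weight`, etc.) and the projection/matrix names are hypothetical placeholders, not the actual adapter's key schema.

```python
import torch

def fuse_expert_lora(state_dict, num_experts, prefix):
    """Stack per-expert (unfused) LoRA weights into one fused tensor per projection.

    Assumes hypothetical unfused keys of the form
    `{prefix}.experts.{i}.{proj}.{mat}.weight` and produces a fused key
    `{prefix}.experts.{proj}.{mat}.weight` of shape (num_experts, *per_expert_shape).
    """
    fused = dict(state_dict)
    for proj in ("gate_proj", "up_proj", "down_proj"):
        for mat in ("lora_A", "lora_B"):
            keys = [f"{prefix}.experts.{i}.{proj}.{mat}.weight" for i in range(num_experts)]
            if all(k in fused for k in keys):
                # Remove the per-expert entries and emit one stacked tensor.
                fused[f"{prefix}.experts.{proj}.{mat}.weight"] = torch.stack(
                    [fused.pop(k) for k in keys], dim=0
                )
    return fused
```

The inverse direction (splitting a fused tensor back into per-expert keys) is what the v4-style output corresponds to; the bug is that the adapter currently emits that unfused form where v5 expects the fused one.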