d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
This paper studies how to scale reasoning in diffusion large language models (dLLMs) with reinforcement learning. It introduces an efficient method for estimating log-probabilities and a new RL algorithm, diffu-GRPO.