Latest recommendations
Exploring scalable reasoning in diffusion-based large language models using advanced reinforcement learning techniques.
This paper explores scaling reasoning in diffusion large language models (dLLMs) with reinforcement learning. It introduces an efficient log-probability estimation method and an RL algorithm, diffu-GRPO.
Train LLMs to reason and call search engines efficiently using reinforcement learning
Search-R1 is an open-source reinforcement learning framework for training large language models (LLMs) that interleave reasoning with search. The trained models make coordinated tool calls, such as querying search engines, to enhance their reasoning capabilities.