👍 131
05/27 20:00
Speculative decoding accelerates LLM inference by drafting multiple tokens and verifying them in parallel with the target model. However, its practical speedup is constrained by the trade-off between draft quality and drafting cost: autoregressive drafters model causal dependencies among draft token
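Summary: The paper proposes Domino, a method that decouples causal modeling from autoregressive drafting to accelerate LLM inference. It drafts multiple tokens in parallel and verifies them with the target model, which speeds up inference and addresses the trade-off between draft quality and drafting cost.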
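The generic draft-then-verify loop that the abstract above builds on can be sketched as follows. This is a minimal greedy illustration of standard speculative decoding, not Domino itself; `speculative_step`, `draft_next`, and `target_next` are hypothetical names, and the toy models are stand-ins for real networks.

```python
from typing import Callable, List

Token = int
# Hypothetical interface: a greedy "next token given context" function.
NextToken = Callable[[List[Token]], Token]


def speculative_step(prefix: List[Token], draft_next: NextToken,
                     target_next: NextToken, k: int) -> List[Token]:
    """Run one greedy draft-then-verify step and return the tokens to append."""
    # 1) Draft k tokens autoregressively with the cheap model.
    draft: List[Token] = []
    for _ in range(k):
        draft.append(draft_next(prefix + draft))

    # 2) Verify each draft position against the target model. A real system
    #    scores all k positions in one batched forward pass; this loop only
    #    mirrors the acceptance logic.
    accepted: List[Token] = []
    for i in range(k):
        expected = target_next(prefix + draft[:i])
        if draft[i] != expected:
            # First mismatch: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(draft[i])

    # 3) Every draft token matched, so the target contributes one extra token.
    accepted.append(target_next(prefix + draft))
    return accepted


# Toy stand-ins: the target counts upward modulo 10; the drafter agrees
# except after token 3, where it guesses 0.
def target_next(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def draft_next(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


print(speculative_step([0], draft_next, target_next, k=4))  # [1, 2, 3, 4]
print(speculative_step([5], draft_next, target_next, k=3))  # [6, 7, 8, 9]
```

In the first call the drafter's fourth token is rejected and replaced by the target's token; in the second call all three draft tokens are accepted and the target adds a fourth. The drafting loop in step 1 is the sequential cost that the abstract's trade-off refers to.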
👍 54
05/29 20:00
Recent progress in the development of language models has been defined by scale, with each generation absorbing more of the world's knowledge into its weights. However, many practical applications benefit more from robust reasoning than from extensive parametric knowledge. In this setting, task-spec
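Summary: OCC-RAG proposes an optimal cognitive core approach for improving question-answering accuracy, and is particularly strong on tasks that require robust reasoning. It uses task-specific strategies to improve model faithfulness. The reported experiments show that it is effective across multiple domains.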
👍 34
05/21 20:00
Identifying which brain regions represent a visual concept in the human brain is a central challenge in neuroscience. Existing approaches have localized coarse functional regions (e.g., faces, places) through activation maximization, identifying regions that activate strongly for a target concept re
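Summary: This study examines causal representations of visual concepts in the human brain and proposes a new method for identifying what brain regions represent. Whereas existing approaches localize coarse functional regions through activation maximization, the work aims to localize the regions tied to a specific visual concept, offering a new perspective for neuroscience research.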
👍 34
05/31 20:00
Search agents are often trained as policies over growing transcripts: the model must decide how to search while also remembering what it has seen, which evidence is useful, which constraints remain open, and which claims have actually been checked. We argue that this formulation puts too much routin
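Summary: The paper explores how to use reinforcement learning (RL) to train search agents with tool-enforced external state, so that they make decisions more efficiently during dynamic search. It argues that existing training methods are limited in complex environments and proposes a more effective alternative.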
👍 31
05/30 20:00
On-Policy Distillation (OPD) is a fundamental technique for efficient post-training of large language models (LLMs), with broad applications in agent learning, multi-task enhancement, and model compression. However, OPD training becomes unstable when the teacher and student distributions differ subs
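Summary: The proposed trust-region On-Policy Distillation (OPD) method targets the training instability that arises in efficient post-training of large language models (LLMs). By keeping the teacher and student distributions close, it improves the stability and effectiveness of training.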
👍 25
05/27 20:00
Watermarking embeds statistical signatures in AI-generated text for detection and attribution. We reveal a fundamental vulnerability: when users access multiple models (today's reality), watermarks trivially fail. Watermarks perturb output distributions away from the original, and in competitive mar
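Summary: The paper reveals a vulnerability in existing watermarking techniques: when users have access to multiple models, watermarks fail easily. It shows that the perturbation a watermark introduces into the output distribution undermines reliable detection and attribution of text.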
👍 22
06/01 20:00
Test-time scaling is a powerful approach to obtain better reasoning in large language models, but it becomes memory-bottlenecked during long-horizon decoding, as the KV-cache grows. KV-cache quantization can help improve this, but current methods are evaluated under prefill-like settings and errors
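Summary: KVarN uses variance-normalized KV-cache quantization to address error accumulation during reasoning tasks. It improves memory use for large language models in long-horizon decoding and makes inference more efficient.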
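The memory pressure mentioned in the abstract above follows from simple arithmetic: the KV-cache stores one key and one value vector per layer, per KV head, per token. The sketch below is a back-of-the-envelope estimate, not part of KVarN; the function name and the model shape (32 layers, 8 KV heads, head dimension 128) are hypothetical, and the per-group scale and zero-point overhead of real quantization schemes is ignored.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bits_per_value: int,
                   batch_size: int = 1) -> int:
    """Estimate KV-cache size in bytes for a decoder-only transformer."""
    # Factor 2 covers keys and values.
    values = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return values * bits_per_value // 8


GIB = 1024 ** 3

# Hypothetical model shape, decoding out to 32,768 tokens.
shape = dict(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=32_768)

fp16 = kv_cache_bytes(bits_per_value=16, **shape)
int4 = kv_cache_bytes(bits_per_value=4, **shape)

print(fp16 / GIB)  # 4.0
print(int4 / GIB)  # 1.0
```

The cache grows linearly with sequence length, so a long reasoning trace at 16 bits per value quickly dominates memory. Dropping to 4 bits cuts the footprint by 4x in this idealized count.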
👍 21
05/31 20:00
Reinforcement learning (RL) post-training improves large language models (LLMs) on individual domains such as mathematical reasoning, code generation, question answering, and creative writing (CW), but training on one domain often degrades performance on others. Existing explanations based on catast
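Summary: The study addresses cross-domain interference in multi-domain reinforcement learning. It proposes a local perturbation theory that characterizes how training on one domain affects the others, with the goal of improving performance across domains and the overall effectiveness of RL training.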
👍 18
05/28 20:00
Mid-training has become an important stage in modern LLM development, using large-scale curated mixtures to strengthen capabilities before final post-training. Its data selection problem is distinct: the data are optimized under a pretraining-style objective at near-pretraining scale, but are curate
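Summary: MIRA improves large language model development through data selection for mid-training, strengthening model capabilities with carefully chosen data. The reported experiments show a significant performance gain at the final post-training stage.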
👍 16
05/31 20:00
Autonomous agents are increasingly expected to support end-to-end medical-AI research workflows, moving beyond isolated prediction tasks or short-form clinical question answering. However, existing medical agent benchmarks primarily evaluate final outputs, providing limited visibility into agent beh
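Summary: AutoMedBench targets autonomous agents that support end-to-end medical-AI research workflows, going beyond isolated prediction tasks. It aims to give medical agents a more comprehensive evaluation standard and to improve their performance on complex tasks.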
👍 16
06/02 09:07
World models and multimodal large language models (MLLMs) provide complementary capabilities for predicting future outcomes from static visual observations. World models can generate concrete visual rollouts of possible futures, while MLLMs can reason abstractly over questions, goals, and rules. How
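Summary: The study combines world models with multimodal large language models to improve prediction from static visual observations. World models generate concrete visual rollouts, while language models reason at an abstract level; the two complement each other on complex questions.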
👍 14
05/31 22:52
Reinforcement learning (RL) for visual reasoning needs scalable, verifiable, and controllable training signals. Existing visual RL post-training trains on static curated datasets, with fixed image-question-answer samples bounded by their collection budget. In this work, we introduce TRON (Targeted,
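Summary: TRON provides scalable, verifiable, and controllable training signals for reinforcement learning on visual reasoning. It addresses the limits of existing static datasets and improves the model's adaptability in dynamic settings.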
👍 11
06/01 20:00
The past few decades have witnessed significant advances in the design of machine learning algorithms, from early studies on task-specific shallow models to more general deep Large Language Models (LLMs). Despite showing promising results in tasks that require instant prediction or in-context learni
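Summary: The study examines the importance of self-modification and memory consolidation in large language models. It proposes learning mechanisms that let a model actively optimize itself, improving its performance and adaptability on complex tasks.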
👍 9
06/01 14:20
Personalization is a crucial capability of modern language agents. However, current research primarily positions personalized agents as passive responders to user preferences, limiting their ability to interact with users and provide suggestions or guidance proactively. To systematically evaluate su
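Summary: Ψ-Bench proposes a framework for systematically evaluating the influence of language agents in personalized interaction. It stresses that agents should not only respond passively to user preferences but also offer suggestions and guidance proactively, improving the quality and effectiveness of user interaction.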
👍 8
05/31 20:00
Reinforcement learning (RL) improves large language model (LLM) agents by teaching them which actions lead to high rewards, but provides little supervision on what those actions do to the environment. World modeling (WM) can fill this gap, yet existing approaches often require separate simulators, e
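Summary: The study proposes co-training the policy with world modeling, using reinforcement learning to improve the agentic capabilities of large language models. It removes the reliance on separate simulators found in existing approaches and aims to improve the model's understanding of the environment and its ability to act in it.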
👍 7
05/31 20:00
Instruction tuning aligns large language models, including multimodal ones, with diverse user intents, but scaling to heterogeneous mixtures is hindered by gradient interference and bandwidth-heavy synchronization. We ask whether these two bottlenecks can be addressed jointly by training parts of th
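Summary: The study proposes a decentralized instruction-tuning method that addresses gradient interference and bandwidth limits when large language models are trained on heterogeneous mixtures. It tackles both bottlenecks jointly by training different parts of the model, and reports significant gains in learning efficiency and performance.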
👍 7
06/01 20:00
As autonomous vehicle capabilities advance, the safe evaluation of driving policies in long-tail scenarios remains a critical bottleneck. In closed-loop simulation, the driving policy model actively interacts with the environment, where its actions dynamically update the simulator state and directly
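Summary: NVIDIA OmniDreams presents a real-time generative world model for closed-loop autonomous-vehicle simulation. It addresses the safe evaluation of driving policies in long-tail scenarios and improves the real-time responsiveness of the simulated environment.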
👍 7
05/31 20:00
Embodied visual navigation, where an agent perceives a complex environment and acts to reach a goal from raw sensory input, underpins a wide range of applications such as household service robotics, assistive robotics, and large-scale autonomous exploration. However, recent attempts to unify vision-
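Summary: PlatonicNav introduces Platonic topological maps to reveal semantic correspondences in navigation. It strengthens autonomous exploration in complex environments and applies to settings such as service robotics and large-scale autonomous exploration.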
👍 6
06/01 23:42
Test-time scaling improves the reasoning performance of large language models but incurs substantial cost in both total computation and latency. Existing adaptive sampling methods partially mitigate this issue by dynamically deciding when to stop sampling, yet they typically rely on heuristic rules
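Summary: The paper proposes guiding adaptive sampling with a small RL controller working alongside the large language model. This improves test-time reasoning performance while significantly reducing computation cost and latency, and extends the use of RL in language models.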
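For context, the kind of heuristic stopping rule that the abstract above contrasts with a learned controller can be sketched as follows. This is a generic majority-vote early-stopping rule, not the paper's method; `adaptive_majority_vote` and its `min_samples` and `threshold` parameters are hypothetical, and `sample_answer` stands in for one LLM sampling call.

```python
from collections import Counter
from typing import Callable, Tuple


def adaptive_majority_vote(sample_answer: Callable[[], str],
                           max_samples: int = 16,
                           min_samples: int = 3,
                           threshold: float = 0.8) -> Tuple[str, int]:
    """Sample answers until the leading answer is dominant or the budget ends.

    Returns the chosen answer and the number of samples actually drawn.
    """
    counts: Counter = Counter()
    for n in range(1, max_samples + 1):
        counts[sample_answer()] += 1
        answer, top = counts.most_common(1)[0]
        # Heuristic rule: stop once the leader holds a large enough share.
        if n >= min_samples and top / n >= threshold:
            return answer, n
    # Budget exhausted: fall back to a plain majority vote.
    return counts.most_common(1)[0][0], max_samples


# Toy samplers that replay fixed answer sequences.
easy = iter(["42"] * 16)
hard = iter(["a", "b", "a", "c"])

print(adaptive_majority_vote(lambda: next(easy)))                 # ('42', 3)
print(adaptive_majority_vote(lambda: next(hard), max_samples=4))  # ('a', 4)
```

The fixed `threshold` is what makes this rule a heuristic: it saves samples on easy questions but has no way to adapt to a particular model or task.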
👍 6
05/27 20:00
Long chain-of-thought (CoT) traces are widely used as supervision for reasoning-oriented LLM SFT, yet answer-correct traces can still lead to markedly different fine-tuning outcomes. We study post-conclusion continuation in answer-correct long-CoT data: a continuation where the answer appears suffic
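Summary: The paper studies post-conclusion continuation in answer-correct long chain-of-thought (CoT) data. It shows that such continuation can lead to markedly different fine-tuning outcomes, offering new directions and evidence for follow-up work.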