👍 127
06/26 20:00
LLM agents are expected to act over multiple turns, using search, browsing interfaces, and terminal tools to complete user goals. Yet not every goal is well specified or achievable in the available environment. In such cases, a reliable agent should recognize that further interaction is unlikely to
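Summary: The paper examines how an LLM agent facing an underspecified or unachievable goal can decide when to stop interacting rather than keep acting. The approach centers on agent reliability in multi-turn interaction and aims to improve decision-making, particularly when responding to user requests in complex environments. Significance: it opens a new direction for research on agents and self-monitoring.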
👍 79
06/25 20:00
Program verifiers play a central role in training coding agents, including selecting trajectories for supervised fine-tuning (SFT) and providing rewards for reinforcement learning (RL). Standard execution-based verification requires running unit tests inside per-repository environments such as Docke
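Summary: Proposes Dockerless, an environment-free program verifier for training coding agents that avoids the runtime-environment constraints of conventional execution-based verification. Because verification does not depend on a specific repository environment, it improves coding agents in both reinforcement learning and supervised fine-tuning. Significance: it advances the generality and efficiency of coding agents.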
👍 70
06/28 20:00
On-policy distillation (OPD) offers superior capacity transfer by supervising student-sampled trajectories with dense token-level signals. To furnish high-quality supervision sources and thereby elevate the performance frontier of distillation, an intuitive direction is to infuse privileged informat
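Summary: Proposes Dual On-Policy Distillation (DOPD), which supervises student-sampled trajectories with dense token-level signals to improve distillation performance. The method adds privileged information to obtain higher-quality supervision sources and raise model capability. Significance: it strengthens research on model distillation and supports the search for more efficient learning.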
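To make "dense token-level signals" on student-sampled trajectories concrete, here is a minimal sketch of a per-token distillation loss. It assumes a reverse KL divergence between student and teacher next-token distributions, which is one common choice for on-policy distillation and is not confirmed by the abstract as DOPD's objective. The function names and the logit lists are hypothetical.

```python
import math
from typing import List, Tuple


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def token_reverse_kl(student_logits: List[float], teacher_logits: List[float]) -> float:
    """KL(student || teacher) at a single token position (assumed loss choice)."""
    ls = log_softmax(student_logits)
    lt = log_softmax(teacher_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(ls, lt))


def opd_loss(
    student_steps: List[List[float]], teacher_steps: List[List[float]]
) -> Tuple[float, List[float]]:
    """Average the per-token signal over a trajectory sampled from the student.

    Each position contributes its own term, so the supervision is dense
    rather than a single end-of-sequence score.
    """
    per_token = [token_reverse_kl(s, t) for s, t in zip(student_steps, teacher_steps)]
    return sum(per_token) / len(per_token), per_token


# Toy trajectory of two positions over a three-token vocabulary.
student = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
teacher = [[1.0, 2.0, 3.0], [2.0, 0.0, 0.0]]
loss, per_token = opd_loss(student, teacher)
```

In this toy run the first position matches the teacher and contributes no loss, while the second position is penalized. That per-position feedback is what separates dense supervision from an outcome-only signal.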
👍 65
06/29 20:00
Speculative decoding accelerates inference by using a lightweight draft model to generate candidate tokens in parallel, which are then verified by the target model, enabling lossless acceleration. Recently, diffusion-based speculative decoding further improves parallelism by generating multiple tokens
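Summary: BlockPilot proposes instance-adaptive policy learning to improve diffusion-based speculative decoding, in which candidate tokens are generated in parallel to accelerate inference. The method increases decoding flexibility and efficiency while keeping the acceleration lossless. Significance: it raises the potential for real-time inference and generation and reduces the compute burden.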
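The draft-then-verify loop described in this entry can be sketched as follows. This is a simplified greedy variant under stated assumptions, not BlockPilot's method. The `draft_next` and `target_next` callables and the toy models are hypothetical. The draft proposes tokens one at a time here, whereas a diffusion-based drafter would emit a block at once. A real target model would check all proposed positions in a single forward pass rather than in a Python loop.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # hypothetical greedy next-token interface


def speculative_step(prefix: List[int], draft_next: NextToken,
                     target_next: NextToken, k: int) -> List[int]:
    """Propose k draft tokens, then keep the longest prefix the target agrees with."""
    proposal, ctx = [], list(prefix)
    for _ in range(k):
        token = draft_next(ctx)
        proposal.append(token)
        ctx.append(token)

    accepted, ctx = [], list(prefix)
    for token in proposal:
        expected = target_next(ctx)
        if token != expected:
            accepted.append(expected)  # first mismatch: take the target's token and stop
            return accepted
        accepted.append(token)
        ctx.append(token)
    accepted.append(target_next(ctx))  # every draft token accepted: one bonus token
    return accepted


def generate(prefix: List[int], draft_next: NextToken,
             target_next: NextToken, k: int, n_new: int) -> List[int]:
    """Repeat speculative steps until n_new tokens have been produced."""
    out = list(prefix)
    while len(out) - len(prefix) < n_new:
        out.extend(speculative_step(out, draft_next, target_next, k))
    return out[: len(prefix) + n_new]


def toy_target(ctx: List[int]) -> int:
    """Stand-in for the large model: a deterministic function of the context."""
    return (31 * sum(ctx) + len(ctx)) % 7


def toy_draft(ctx: List[int]) -> int:
    """Stand-in for the draft model: agrees with the target except at some positions."""
    token = toy_target(ctx)
    return token if len(ctx) % 3 else (token + 1) % 7


speculative = generate([1, 2], toy_draft, toy_target, k=4, n_new=10)
reference = [1, 2]
for _ in range(10):
    reference.append(toy_target(reference))
```

Every emitted token is either a draft token the target agreed with or the target's own choice, so `speculative` matches plain target-only greedy decoding. This is the sense in which the acceleration is lossless.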
👍 37
06/28 20:00
Foundation models for predictive machine learning on tabular data have recently gained significant traction in academia and industry. Research communities across disciplines are increasingly evaluating tabular foundation models on diverse datasets and tasks. However, these task- and discipline-speci
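Summary: Studies how tabular foundation models behave across diverse datasets and tasks, and how they extend beyond the independent and identically distributed (i.i.d.) assumption. The paper evaluates how their performance differs across tasks in order to understand their generalization ability. Significance: it advances tabular data processing and offers a reference point for applications across disciplines.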
👍 20
06/28 20:00
Modern large-scale LLM pretraining benefits from utilizing Pipeline Parallelism; however, synchronous implementations leave GPUs idle during pipeline bubbles, wasting computational resources. Asynchronous Pipeline Parallelism eliminates these bubbles, maximizing throughput at the cost of gradient st
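Summary: Argues that a one-step gradient delay is not an obstacle to large-scale asynchronous pipeline-parallel LLM pretraining. Removing idle waiting during training maximizes utilization of compute resources and thereby raises training throughput and efficiency. Significance: it offers a new angle on optimizing the training of deep learning models.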
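To show what a one-step gradient delay means in isolation, the toy sketch below runs gradient descent on f(w) = 0.5 * w**2, computing each update from the weights of the previous step instead of the current ones. It is an illustration under assumed settings (a scalar quadratic, a hypothetical `sgd` helper and a fixed learning rate), not the paper's training setup or evidence for its claims.

```python
def sgd(steps: int, lr: float, delay: int, w0: float = 5.0) -> float:
    """Minimize f(w) = 0.5 * w**2 (gradient w) using gradients `delay` steps stale."""
    w = w0
    history = [w]  # history[t] holds the weights at step t
    for t in range(steps):
        stale_w = history[max(0, t - delay)]  # weights the gradient was computed on
        grad = stale_w                        # df/dw at the stale weights
        w = w - lr * grad
        history.append(w)
    return w


fresh = sgd(steps=200, lr=0.1, delay=0)    # synchronous: gradient at current weights
delayed = sgd(steps=200, lr=0.1, delay=1)  # asynchronous: gradient one step stale
```

With this learning rate both runs shrink toward the minimum at 0, and the delayed run only contracts a little more slowly. Staleness can hurt more at larger learning rates, and this toy does not capture that.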
👍 19
06/22 20:00
Agent skills extend language-model agents with task-specific procedures, scripts, and references, but the tasks and environments they target continually change. Existing methods improve skills in bounded runs and retain only the final artifact, discarding the decision history that later agents need
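Summary: SkillHone proposes a framework that persists decision history to strengthen the task-specific skills of language-model agents. It addresses the problem that existing skill-improvement methods keep only the final artifact, and retains the decision history so that later agents can use it. Significance: it offers a new understanding and implementation of continual learning and skill evolution for agents.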
👍 19
06/29 20:00
Block Diffusion Language Models (BD-LMs) improve diffusion-based text generation with KV caching and flexible-length generation. A natural next step is to extend them from Single-Block Diffusion (SingleBD) to Multi-Block Diffusion (MultiBD), where a running-set of consecutive blocks is decoded concu
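Summary: Multi-Block Diffusion Language Models build on block diffusion language models, which improve diffusion-based text generation through KV caching and flexible-length generation. The work extends single-block diffusion to multi-block diffusion, decoding a running set of consecutive blocks together, with the aim of more efficient language generation. Significance: it advances the performance and range of application of text generation models.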
👍 19
06/26 20:00
Would experience designing faster GPU kernels also help close in on a long-standing open mathematical conjecture? Large Language Models (LLMs) integrated into evolutionary search have recently produced state-of-the-art solutions on optimization tasks, including open mathematical conjectures, GPU ker
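Summary: Examines how evolutionary fine-tuning learns to discover new solutions across 371 optimization tasks. The paper shows the potential of combining large language models with evolutionary search, reaching state-of-the-art results on optimization tasks. Significance: it offers a new methodology for solving optimization problems and may influence research in computer science and mathematics.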
👍 16
06/29 20:00
Video World Models are interactive video generation models that predict future world states based on user actions and history video frames. A critical challenge in video world models is the lack of memory, causing inconsistent generated scenes over extended durations. Previous methods explored rule-
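Summary: MemLearner introduces a method that learns to query contextual memory, aiming to fix the inconsistent scenes that video world models generate when they lack memory. By changing how history video frames are handled, the method makes generation more stable. Significance: it strengthens interactive video generation models and gives future agents a more reliable memory mechanism.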
👍 14
06/21 20:00
Procedural memory is increasingly used to improve LLM agents on recurring workplace tasks, yet its ability to produce reusable skills remains poorly understood. We introduce AFTER, a benchmark of 382 realistic enterprise tasks spanning six professional roles and 22 procedural skills, designed to eva
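Summary: The paper introduces AFTER, a benchmark for evaluating procedural memory in LLM agents, which uses 382 realistic enterprise tasks to study how reusable and adaptable the resulting skills are. The work provides an evaluation framework and practical cases for applications of procedural memory. Significance: it supplies key evaluation criteria for continual learning and task execution by agents.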
👍 12
06/29 20:00
Text-rich image generation is one of the most challenging settings in image generation, since models must simultaneously produce visually realistic images and render legible, semantically aligned, and layout-consistent text. Existing data pipelines usually follow a static crawl-filter-freeze paradig
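Summary: DataEvolver proposes a self-evolving multi-agent approach to data construction that targets the consistency between visuals and text in text-rich image generation. By generating data dynamically, it markedly improves image quality and text legibility and moves past the limits of static data pipelines. Significance: it drives innovation and application in high-quality image generation.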
👍 12
06/28 20:00
Most coding-agent benchmarks are static: an agent receives a complete task description up front and is judged only by its final code. Real coding assistance is interactive, with users clarifying goals, adding constraints, and correcting mistakes over multiple turns. We introduce SWE-Together, a mult
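Summary: SWE-Together introduces an evaluation standard for coding agents based on interactive user sessions, addressing the limitations of conventional static benchmarks. The method emphasizes user feedback over multiple turns so that it reflects an agent's assistance ability and performance more realistically. Significance: it improves the adaptability and interactive ability of coding agents in practical use.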
👍 9
06/28 20:00
While large language models have been dominating the research landscape recently, small language models remain highly relevant across various domains; yet, they receive far less attention. In this study, we investigate how smaller language models perform during the generation stage within a Retrieva
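Summary: Studies how small language models perform in the generation stage. Although large language models have drawn most of the recent attention, small models remain valuable across many tasks. Through comparative analysis, the study offers new insight into the potential of small models in varied applications. Significance: it provides evidence for work on the scalability and practicality of language models.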
👍 9
06/29 20:00
Metacognition is a critical component of intelligence that describes the ability to monitor and regulate one's own cognitive processes. Yet LLMs exhibit systemic deficiencies in key metacognitive faculties: they hallucinate with high confidence, fail to recognize knowledge boundaries, and misreprese
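Summary: Studies how reinforcement learning from metacognitive feedback can help LLMs express uncertainty truthfully. Current LLMs are often overconfident and unclear about the boundaries of their knowledge, and the proposed method can support their use in complex decision-making. Significance: it offers a new route to more reliable and context-adaptive intelligent systems.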
👍 8
06/24 20:00
Autoregressive Transformers dominate high-quality mesh generation by producing artist-worthy topologies, yet their inherent sequential decoding induces substantial computational overhead, falling orders of magnitude slower than parallel generative models. On the other hand, while continuous diffusio
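Summary: PolyFlow proposes flow matching with continuous topology embeddings to improve on mesh generation based on autoregressive Transformers, raising speed by cutting the computational overhead of sequential decoding. The study reports that flow models achieve higher efficiency and quality in parallel generation. Significance: it advances the use of generative models in computer graphics.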
👍 6
06/29 20:00
Graphical user interface (GUI) agents build on vision-language models to complete user tasks end-to-end in real applications through interface actions such as tapping, swiping, text entry, and navigation. However, existing GUI agents are trained and evaluated largely on offline trajectories, simulat
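Summary: The paper proposes a GUI agent built on vision-language models that completes user tasks through interface actions, while addressing the shortcomings of training and evaluation in real application environments and offering more flexible operation. Significance: it makes human-computer interfaces more intelligent and advances GUI agents.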
👍 6
06/23 20:00
Mathematical knowledge is organized around statements and their dependencies, but this structure is exposed unevenly: informal papers cite mostly at the document level, while formal libraries record fine-grained dependencies over a much smaller body of mathematics. We introduce TheoremGraph, a unifi
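Summary: TheoremGraph proposes a unified framework that connects formal and informal mathematical knowledge. By recording mathematical statements and their dependencies, it fills a gap in the existing literature and advances the structured representation of mathematical knowledge. Significance: it strengthens the basis for managing and using mathematical knowledge, supporting online education and scientific research.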
👍 5
06/29 20:00
LLM agents increasingly act over long horizons, where a single trajectory can contain hundreds or thousands of actions. In these settings, outcome-only rewards provide too sparse guidance, failing to inform the model about the goodness of intermediate actions. Dense supervision methods aim to solve
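Summary: QVal proposes a cheap and effective way to evaluate dense supervision signals for long-horizon LLM agents. It addresses the sparsity of outcome-only rewards so that the model receives richer feedback on intermediate actions. Significance: it offers a new direction for optimizing learning strategies in long-horizon interactive tasks.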
👍 4
06/28 20:00
Modeling the bidirectional correspondence between external sensory stimuli and internal neural activity has emerged as a critical frontier in neuroscience. However, existing approaches predominantly treat brain encoding and decoding as isolated tasks, relying heavily on unimodal alignment and extern
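Summary: BrainJanus studies a unified model of understanding and generation across brain, vision, and language, aiming to establish a bidirectional correspondence between external sensory stimuli and internal neural activity. The method emphasizes synergy between modalities and offers a new perspective for neuroscience research. Significance: it drives deeper work on multimodal learning and mechanisms of understanding.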