👍 57
06/24 20:00
Modern Vision-Language-Action (VLA) models often fail to generalize to novel setups, such as altered camera viewpoints or robot morphologies, because they are typically conditioned only on current observations and language instructions. By ignoring the underlying system configuration as a variable,
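Summary: To address the poor generalization of modern vision-language-action (VLA) models to novel setups, the work proposes a contextual world modeling approach. The method treats the system configuration as a variable during generation, which improves the model's adaptability to changed camera viewpoints and robot morphologies. The work is relevant to making robot control more flexible and better able to handle novel scenarios.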
👍 48
06/24 20:00
Outcome-based reinforcement learning provides a stable optimization backbone for language agents, but its sparse trajectory-level rewards provide little guidance on which intermediate decisions should be reinforced or suppressed. On-policy self-distillation offers dense token-level supervision, yet
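Summary: To address the sparse rewards that outcome-based reinforcement learning provides for intermediate decisions, the work proposes a skill distillation method built on an on-policy self-distillation framework, which supplies dense token-level supervision. Experiments show that it improves language agents' decision-making on complex tasks, advancing research on agentic reinforcement learning.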
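The contrast this entry draws between sparse and dense credit can be made concrete with a minimal Python sketch. It assumes a policy-gradient setup. Outcome-based RL gives every token the same advantage (trajectory reward minus a baseline). One common form of on-policy distillation instead scores each sampled token by the teacher-minus-student log-probability. The function names and numbers are hypothetical, and this is not the paper's skill-distillation method.

```python
from typing import List, Sequence


def outcome_advantages(num_tokens: int, reward: float, baseline: float) -> List[float]:
    """Trajectory-level credit: every token shares one advantage, reward - baseline."""
    return [reward - baseline] * num_tokens


def distillation_advantages(
    student_logprobs: Sequence[float], teacher_logprobs: Sequence[float]
) -> List[float]:
    """Token-level credit: teacher minus student log-prob of each sampled token.

    Positive values mark tokens the teacher favors more than the student does
    (reinforce); negative values mark tokens to suppress.
    """
    if len(student_logprobs) != len(teacher_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    return [t - s for s, t in zip(student_logprobs, teacher_logprobs)]


# Hypothetical 4-token action: the trajectory succeeded (reward 1.0, baseline 0.5),
# but the teacher strongly disagrees with the third token.
student = [-0.5, -1.0, -0.25, -2.0]
teacher = [-0.5, -0.5, -3.0, -1.0]
print(outcome_advantages(len(student), reward=1.0, baseline=0.5))  # [0.5, 0.5, 0.5, 0.5]
print(distillation_advantages(student, teacher))                   # [0.0, 0.5, -2.75, 1.0]
```

The outcome signal reinforces all four tokens equally. The per-token signal singles out the third token for suppression, which is the information a single trajectory-level reward cannot express.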
👍 45
06/24 20:00
While text-to-image (T2I) models have achieved remarkable progress, they struggle with real-world requests that are often underspecified, implicit, or dependent on up-to-date knowledge. We identify this challenge as the Context Gap: the mismatch between the user context and the sufficient generation
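Summary: For text-to-image (T2I) models, the work addresses the "Context Gap", the mismatch between the user's context and what is needed for generation. It introduces a new generation mechanism that lets the model handle real-world requests that are underspecified or depend on up-to-date knowledge. The work aims to improve the practicality and user experience of image generation models.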
👍 42
06/23 20:00
A classical intuition holds that verifying a solution is easier than producing one. For today's coding agents, this intuition is being inverted: as foundation models develop stronger reasoning capabilities and engineering harnesses grow more sophisticated, generating complex candidate solutions is n
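Summary: Addressing the counterintuitive relationship between verifying and producing solutions for today's coding agents, the work proposes a new evaluation method that exposes representative failures existing approaches do not adequately characterize. By assessing the capabilities of different models, it highlights the challenges currently facing the development of models for complex coding tasks, with implications for both applications and research.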
👍 38
06/24 20:00
A unified representation for text and vision is a natural pursuit, as it enables simpler multimodal modeling and more efficient training. However, representing images as discrete signals in the same way as text inevitably introduces severe information loss. Existing work struggles to balance low-lev
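Summary: The work proposes a unified representation for text and vision to simplify multimodal modeling and improve training efficiency. It focuses on the information loss introduced by quantizing visual representations into discrete signals, and on the difficulty existing work has in balancing low-level representation with high-level semantics. The work is relevant to future research on multimodal fusion.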
👍 32
06/24 20:00
Speculative decoding (SD) accelerates autoregressive Large Language Models (LLMs) by drafting multiple tokens and verifying them in parallel, but it faces a scaling limitation: increasing the draft budget improves speed only when acceptance remains high and drafting overhead stays low. This ceiling
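Summary: To address the scaling limitation of speculative decoding when accelerating autoregressive large language models, the work proposes a parallel tree-drafting method to improve decoding efficiency. By optimizing the draft budget and the acceptance rate, it substantially increases generation speed and offers a new approach for deploying large language models.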
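The scaling limitation described in this entry appears in the standard back-of-the-envelope model of speculative decoding, sketched below in Python. The model rests on two simplifying assumptions. First, each of the γ drafted tokens is accepted independently with probability α. Second, one draft step costs a fixed fraction of a target-model step. The parameter names are illustrative, and the sketch models plain chain drafting, not the paper's parallel tree drafting.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Assumes each of the gamma drafted tokens is accepted independently with
    probability alpha and only the longest accepted prefix is kept; the target
    model always adds one token (a correction or a bonus token), so the
    expectation is 1 + alpha + ... + alpha**gamma.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def expected_speedup(alpha: float, gamma: int, draft_cost: float) -> float:
    """Speedup over plain decoding; draft_cost is one draft step relative to one target step."""
    step_cost = gamma * draft_cost + 1.0
    return expected_tokens_per_step(alpha, gamma) / step_cost


for alpha in (0.6, 0.9):
    row = [round(expected_speedup(alpha, g, draft_cost=0.05), 2) for g in (1, 2, 4, 8, 16)]
    print(f"alpha={alpha}: {row}")
```

With these illustrative settings (a draft step costs 5% of a target step), the speedup for α = 0.6 peaks at γ = 4 among the tested budgets and then declines. For α = 0.9 it is still rising at γ = 16. A larger draft budget pays off only while acceptance stays high and drafting stays cheap.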
👍 28
06/21 20:00
Computer-use agents can execute software tasks through either graphical interfaces or programmatic command interfaces, but existing evaluations confound interaction modality with differences in tasks, initial states, verifiers, and permitted actions. We introduce a matched execution-layer benchmark
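Summary: The work introduces a new matched benchmark for comparing the execution performance of computer-use agents that act through graphical interfaces versus programmatic command interfaces. The benchmark removes the confounding effects of differing tasks and initial states. It supports more systematic evaluation of agents on complex tasks and has practical value for advancing human-computer interaction.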
👍 25
06/23 20:00
Joint-Embedding Predictive Architectures (JEPAs), including recent LeWorldModel (LeWM), have become a promising foundation for reconstruction-free visual world models. For visual planning, however, LeWM evaluates candidate action sequences by repeatedly applying a local one-step latent transition mo
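Summary: Building further on Joint-Embedding Predictive Architectures (JEPAs), the work improves the effectiveness of LeWorldModel (LeWM) for reconstruction-free visual world modeling, particularly when evaluating candidate action sequences. The approach offers a new perspective on visual planning and is relevant to research in reinforcement learning and robotics.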
👍 23
06/24 20:00
We present Qwen-Image-2.0-RL, a post-training pipeline that applies reinforcement learning from human feedback (RLHF) and on-policy distillation (OPD) to improve both the visual quality and instruction-following capability of the Qwen-Image-2.0 diffusion model. To provide reliable reward signals, we
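Summary: The work presents Qwen-Image-2.0-RL, which uses reinforcement learning from human feedback (RLHF) and on-policy distillation (OPD) to improve the visual quality and instruction-following ability of an image generation model. By providing reliable reward signals, it optimizes the generation process and supports the use of generative models in real-world scenarios.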
👍 18
06/24 20:00
As agentic systems continue to evolve and are widely deployed in real-world scenarios, there is a growing demand to faithfully evaluate their capabilities. However, current benchmarks are typically built on popular applications with relatively simple tasks and focus on a narrow set of capabilities w
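Summary: As agentic systems are widely deployed in real-world scenarios, the need to evaluate their capabilities keeps growing. This work re-examines agent performance in complex environments and proposes a benchmark that addresses the limited diversity and complexity of current benchmarks. It helps clarify the limits of agent capabilities.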
👍 17
05/06 20:00
We introduce an axiomatic evaluation framework for latent thought representations in LLMs, comprising metrics that are independent of downstream benchmark scores and reveal representational failures that benchmark accuracy masks. Existing evaluations conflate representation quality with model capaci
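Summary: The work introduces an axiomatic evaluation framework for latent thought representations. Its metrics are independent of downstream benchmark scores and reveal representational failures. The study offers insight into the representation quality of large language models (LLMs) and can help improve their reasoning and understanding.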
👍 16
06/23 20:00
Tool use enables large language models (LLMs) to perform complex tasks, and recent agentic reinforcement learning (RL) methods show promise for enhancing model capabilities. However, RL alone often leads to instability or limited gains in tool-use tasks. In our experiments, some models exhibit catas
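Summary: Addressing the instability of large language models (LLMs) in tool use, the work analyzes at the layer level why tool-use reinforcement learning collapses and proposes repairing it with supervision signals. The finding offers a new perspective on improving model reliability in complex tasks and is relevant to reinforcement learning practice.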
👍 13
06/24 20:00
The prevalent dual-branch paradigm, i.e., training a side network to encode visual conditions and fusing its intermediate-layer features to a frozen pretrained main network, has shown remarkable success in visual-condition controllable generation. Despite its widespread adoption, the role of the sid
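Summary: Focusing on the dual-branch paradigm for visual-condition controllable generation, the work proposes an alignment method based on the likelihood score of conditional generation, aiming to improve feature fusion. The study reports that the method improves generation quality in practice, advancing visual generation.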
👍 11
06/21 20:00
Vision-language models (VLMs) are increasingly deployed in consumer, medical, financial, and enterprise applications. This broad deployment expands the safety surface: risks can arise from multimodal question answering, assistant responses, and cross-modal composition, while moderation policies may
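Summary: Addressing the safety of vision-language models deployed across many applications, the work proposes a policy-adaptive safeguard based on dynamic reasoning. The safeguard mitigates risks arising from multimodal question answering and cross-modal composition. The work has practical importance for the safe and reliable deployment of these models.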
👍 10
06/24 20:00
Reasoning capability has advanced rapidly in large language models (LLMs), leading to an increasing size of key-value (KV) cache in both prefilling and decoding stages. Existing KV cache compression methods mainly rely on attention weights to estimate token importance. While attention effectively ca
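Summary: To address the growth of the key-value (KV) cache that accompanies the rapid advance of reasoning in large language models (LLMs), the work proposes an information-aware KV cache compression method. The study reports that the method improves inference efficiency, and it offers a new direction for research on long-reasoning models.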
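For reference, the attention-weight baseline this entry contrasts against can be sketched in a few lines of Python. The sketch accumulates the attention probability each cached token receives and always keeps a window of recent tokens. It then fills the remaining budget with the highest-scoring older tokens. This is a heavy-hitter style heuristic under simplifying assumptions (a single head and precomputed attention logits). The function names are hypothetical, and it is not the paper's information-aware method.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one query's attention logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_importance(attn_rows: Sequence[Sequence[float]]) -> List[float]:
    """Sum, per cached token, the attention probability it receives across queries."""
    importance = [0.0] * len(attn_rows[0])
    for row in attn_rows:
        for j, p in enumerate(row):
            importance[j] += p
    return importance


def select_kv_to_keep(importance: Sequence[float], budget: int, recent: int = 0) -> List[int]:
    """Return sorted indices of the cache entries to retain.

    The `recent` newest tokens are always kept; the rest of the budget goes
    to the older tokens with the largest accumulated attention.
    """
    n = len(importance)
    if budget >= n:
        return list(range(n))
    recent = min(recent, budget)
    keep = set(range(n - recent, n))
    older = sorted(range(n - recent), key=lambda j: importance[j], reverse=True)
    keep.update(older[: budget - len(keep)])
    return sorted(keep)


# Hypothetical attention logits: 3 decoding queries over 6 cached tokens.
logits = [
    [0.1, 2.0, 0.3, 1.5, 0.0, 0.8],
    [0.2, 1.8, 0.1, 1.7, 0.1, 1.0],
    [0.0, 2.2, 0.4, 1.2, 0.2, 1.1],
]
scores = attention_importance([softmax(row) for row in logits])
print(select_kv_to_keep(scores, budget=3, recent=1))  # [1, 3, 5]
```

Because the score depends only on how much attention a token has received, a token that carries information needed later but is rarely attended to can be evicted. That is the kind of weakness an information-aware criterion would target.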
👍 9
06/14 20:00
As LLM agents become capable of increasingly long-horizon tasks, evaluating their performance in economic systems is becoming increasingly important. Unlike existing benchmarks that primarily evaluate a single agent interacting with a passive environment, economic systems are inherently multi-agent,
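Summary: As LLM agents become capable of longer-horizon tasks, the work proposes a new benchmark for evaluating their performance in heterogeneous multi-agent economic systems, with an emphasis on multi-agent interaction in economic environments. The work can help improve agents' decision-making in complex economic scenarios and is relevant to economic modeling.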
👍 9
06/24 20:00
Video reasoning language models implicitly assume that every input frame is equally reliable. This leads to what we term the Blind Trust Problem: under realistic perturbations such as motion blur, glare, or occlusion, frontier video reasoning models can suffer 15-30%p accuracy drops on real-world em
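Summary: Addressing the assumption in video understanding that every frame is equally reliable, the work proposes a confidence-aware tool orchestration method. The method counters the accuracy drops caused in real-world scenes by motion blur, glare, and similar perturbations. It improves the robustness of video reasoning models under uncertain conditions and shows strong application potential.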
👍 9
06/24 20:00
We present PhysiFormer, a diffusion transformer for physically-plausible 3D object motion. Unlike video world models that operate in view-dependent pixel space, PhysiFormer represents objects as 3D meshes expressed in world coordinates. Given the initial vertex positions and velocities, as well as o
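Summary: The work presents PhysiFormer, a diffusion transformer for simulating physically plausible 3D object motion that represents objects in world coordinates. Its application to modeling object motion and interactions offers a new approach for physics simulation research, with promising practical applications.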
👍 8
06/23 20:00
Text detoxification, the automated detection and mitigation of abusive and harmful content, is essential for ensuring the safety of online communities and protecting users. However, low-resource languages such as Tatar have received little research attention. In this paper, we present Tatoxa, a novel
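Summary: Addressing text detoxification in low-resource languages such as Tatar, the work presents Tatoxa, a system designed specifically to detect and mitigate harmful content. It can help improve the safety of online communities and is relevant to text safety in low-resource language processing.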
👍 8
06/23 20:00
Process reward models enable fine-grained, step-level evaluation of LLMs, yet building them for agentic settings remains prohibitively difficult: long-horizon interactions, irreversible actions, and stochastic environment feedback make both human annotation and Monte Carlo estimation infeasible at s
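Summary: Process reward models enable fine-grained, step-level evaluation of LLM agents over long interactions, but building them for agentic settings remains difficult. The work analyzes why human annotation and Monte Carlo estimation break down on long-horizon tasks, with implications for how future reinforcement learning research is designed.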
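The cost problem this entry points to shows up directly in the usual Monte Carlo labeling scheme for process rewards, sketched here in Python. In this scheme, a step's value is estimated as the fraction of rollouts, continued from that step's prefix, that end in success. The rollout function and the toy success model are hypothetical stand-ins for an agent and environment. The sketch also assumes the environment can be reset to any intermediate prefix, which is exactly what irreversible actions rule out.

```python
import random
from typing import Callable, List, Tuple


def mc_step_values(
    num_steps: int,
    rollout_from: Callable[[int], bool],
    num_rollouts: int,
) -> Tuple[List[float], int]:
    """Estimate a value for the prefix ending at each step 1..num_steps.

    rollout_from(k) should replay the trajectory up to step k, let the policy
    finish the task, and return True on success. Returns the per-step success
    rates and the total number of rollouts spent.
    """
    values: List[float] = []
    total_rollouts = 0
    for k in range(1, num_steps + 1):
        successes = sum(rollout_from(k) for _ in range(num_rollouts))
        total_rollouts += num_rollouts
        values.append(successes / num_rollouts)
    return values, total_rollouts


# Toy stand-in: success becomes more likely the further the prefix gets.
rng = random.Random(0)
NUM_STEPS = 5


def toy_rollout(k: int) -> bool:
    return rng.random() < k / NUM_STEPS


values, cost = mc_step_values(NUM_STEPS, toy_rollout, num_rollouts=16)
print([round(v, 2) for v in values], "rollouts:", cost)
```

Labeling a single 5-step trajectory with 16 rollouts per step already costs 80 full episodes. In long-horizon agent tasks with stochastic feedback, both the number of steps and the rollouts needed for a stable estimate grow. This is the infeasibility the abstract describes.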