👍 60
06/03 20:00
Code language models need repository-level context to resolve imports, APIs, and project conventions. Existing methods inject this knowledge as long inputs (retrieved through RAG or dependency analysis) or through per-repository fine-tuning and LoRA -- costly at repository scale and brittle to evolv
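Summary: To meet code language models' need for repository-level context as software evolves, the paper proposes Code2LoRA. The method uses a hypernetwork to generate adapters, avoiding the high cost of fine-tuning separately for each repository while improving the model's ability to resolve imports, APIs, and project conventions. Experiments indicate that, compared with conventional methods, the technique maintains effectiveness while improving flexibility and adaptability. Significance: this line of research can help improve automatic code generation and intelligent programming assistants.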
👍 42
06/03 20:00
Role-playing language agents (RPLAs) should play characters whose values and behavior evolve as the story progresses, not maintain a fixed persona. Existing benchmarks measure factual recall at a given chapter, not whether responses align with the character's psychological trajectory, especially in
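Summary: The paper examines whether role-playing language agents (RPLAs) can adapt a character's traits as the story progresses rather than maintaining a fixed persona. Existing evaluations focus on factual recall and do not effectively measure how well responses match the character's psychological trajectory. By introducing new evaluation criteria, the paper provides a more comprehensive framework for analyzing character portrayal. Significance: it has important implications for research and applications aimed at improving character consistency in dialogue systems.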
👍 37
06/02 20:00
Agents are widely deployed as assistants over documents, tools, and code. However, they typically act only on explicit user requests, which surface only the problems the user has noticed, while many other important problems coexist, hidden in plain sight, within the broader user context, with their
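Summary: The paper proposes TIDE, which targets proactive discovery of multiple problems. A template-guided iterative mechanism helps agents identify important problems in documents and tools that the user has not noticed. The method moves beyond purely request-driven behavior and can proactively surface latent needs within the broader user context, improving user experience and satisfaction. Significance: it advances intelligent assistants' capabilities in multi-task handling and problem discovery.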
👍 35
06/03 20:00
Planning for real-world problems by language models often involves both world and user constraints, which may not be fully specified upfront and are progressively disclosed through interaction. However, existing benchmarks still underexplore adaptive planning under such progressively revealed dual c
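Summary: The paper proposes AdaPlanBench, a benchmark for evaluating the adaptive planning ability of large language models under world and user constraints. It focuses on planning strategies under progressively disclosed dual constraints, offering a new perspective for evaluating decision-making in complex environments. Preliminary results suggest that AdaPlanBench can effectively improve model performance in dynamic environments. Significance: it strengthens intelligent agents' decision-making in dynamic environments, with far-reaching implications for autonomous systems.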
👍 34
06/02 20:00
We introduce VideoKR, the first large-scale training corpus specifically designed to strengthen knowledge- and reasoning-intensive video understanding. It comprises 315K video reasoning examples over 145K newly collected, CC-licensed, expert-domain videos. We develop a human-in-the-loop, skill-orien
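Summary: The authors develop VideoKR, a large-scale training corpus built specifically for knowledge- and reasoning-intensive video understanding, containing 315K video reasoning examples. Using a human-in-the-loop, skill-oriented strategy, it aims to improve the depth and breadth of video understanding. Preliminary results suggest the corpus is valuable for advancing video understanding research. Significance: it supports the future development of video analysis and reasoning techniques.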
👍 23
06/01 20:00
While household robots are often evaluated based on task completion, everyday domestic environments involve value-conflicting situations in which robots are expected to choose actions that prioritize other values than task success, such as human autonomy, efficiency, or social appropriateness. Yet,
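Summary: The study argues that household robots should be evaluated not only on task completion but also on whether they make appropriate choices in value-conflicting situations. The framework sets a higher evaluation bar for robots, attending to multiple values such as human autonomy and social appropriateness, and proposes algorithms and evaluation methods for value-prioritized action selection. Significance: it advances the intelligent decision-making of household robots so that they can cope with complex everyday environments.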
👍 23
06/03 20:00
Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by undergoing continued training or even by encoding a grammar book in their context. However, both methods typically overfit specific languages, with limited zero-shot transfer at test time. To tra
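Summary: The paper proposes a reinforcement-learning-based method to strengthen a model's in-context learning ability for translating unseen languages. It improves on existing translation models by reducing overfitting to specific languages and strengthening zero-shot transfer, making the model more adaptable to diverse languages. Results show that the new method significantly improves translation accuracy across multiple language pairs. Significance: it offers a new approach to low-resource language processing with broad cross-lingual application potential.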
👍 21
06/03 20:00
Developing unified video generation and editing models capable of interpreting interleaved multimodal inputs is a promising yet challenging frontier field. Existing unified frameworks predominantly rely on massive models (typically 13B parameters or more) and incorporate source video conditions for
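Summary: The authors develop LoomVideo, a unified video generation and editing model that can interpret multimodal inputs. The model combines video conditions with user instructions to enable more flexible and efficient video content creation, addressing the performance bottleneck that conventional large models face when handling diverse inputs. Results indicate that the method performs well on multimodal generation tasks. Significance: it points to a new direction for more intelligent video creation and editing tools.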
👍 18
06/02 20:00
We study the personal camera roll visual question answering setting. In this setting, a conversational AI assistant can access a user's personal camera roll and retrieve relevant photos to answer queries, ranging from simple factual questions (e.g., "Name of the food I tried yesterday?") to more o
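Summary: In the personal camera roll visual question answering setting, the paper studies a conversational AI assistant that can access a user's personal camera roll. The architecture supports answering a range of queries, from simple factual questions to complex ones, and retrieves relevant photos to improve the accuracy and relevance of its answers. Experimental results demonstrate its effectiveness. Significance: it advances the use of personalized intelligent assistants in home settings.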
👍 17
06/02 20:00
Experience internalization converts contextual experience from past interactions into reusable parametric capability, offering a promising path toward continual learning in large language models (LLMs). While prior work has predominantly focused on single-iteration transfer, we discover that under m
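Summary: The paper proposes a new experience internalization strategy that uses contextual experience to build self-evolving LLM agents. Compared with earlier single-iteration transfer methods, the new approach effectively integrates experience from multiple interactions and makes the memory strategy more efficient. Experiments indicate that the proposed method meaningfully improves model performance on complex tasks. Significance: it helps advance research on continual learning and adaptability in LLMs.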
👍 15
06/03 20:00
Video generation models have made impressive strides in synthesizing visually compelling content, yet their outputs remain confined to the virtual domain. A natural question follows: how well do these models reflect the physical world when their generated videos leave the screen and enter reality? W
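Summary: The paper explores the potential of video generation models for robot manipulation and proposes a framework that converts generated videos into executable tasks. The approach aims to tighten the correspondence between generated content and the real world in order to enable more intelligent robot motion strategies. Preliminary results suggest the framework can effectively improve robots' execution ability. Significance: it advances research that connects robot visual understanding with real-world applications.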
👍 10
06/03 20:00
Inference-time skill augmentation provides a lightweight way to improve data-analytic agents by injecting reusable procedural knowledge without updating model parameters. However, discovering effective skills for data analysis remains challenging, as reliable supervision is expensive and success cri
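Summary: The paper proposes an unsupervised skill discovery method to improve agent performance in data analysis. Building on inference-time skill augmentation, it injects reusable procedural knowledge without updating model parameters, addressing the difficulty of acquiring data analysis skills. Preliminary results demonstrate its effectiveness. Significance: the work opens a new research direction for intelligent agents in data analysis.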
👍 7
06/01 20:00
Selection is a core operation in interactive image editing. To be practical, a user should be able to specify and disambiguate the desired selection region through either text or click-based interactions, and the system should support selecting not only objects but also other criteria, such as mater
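Summary: MAOAM proposes a method that uses vision-language models for unified object and material selection, meeting users' varied needs in interactive image editing. Users can specify the selection region through text or click-based interaction, and the system supports not only object selection but also other criteria such as material, improving the human-computer interaction experience. Significance: it advances the development of multi-purpose intelligent image editing tools.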
👍 7
06/03 20:00
Large language models can reproduce training data, but existing memorization evaluations mostly measure whether models can be forced to do so, rather than whether they do so under ordinary use. We introduce PropMe, a propensity-aware framework for memorization evaluation that contrasts prefix-based
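Summary: The paper studies the propensity of large language models to reproduce training data and proposes the PropMe framework, which evaluates memorization under ordinary use rather than only through forced extraction tests. By contrasting prefix-based approaches, it establishes a more controllable evaluation standard. Preliminary tests show that models' memorization behavior differs significantly across scenarios. Significance: it offers a new perspective on LLM memorization mechanisms and their implications in use.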
👍 6
06/03 20:00
Vision-Language-Action (VLA) models leverage the rich world knowledge of pretrained vision-language models (VLMs) to enable instruction-following robotic manipulation. However, the structural mismatch between VLM semantic spaces and embodied control policies often hinders the learning of precise per
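Summary: AffordanceVLA introduces a vision-language-action model that improves a robot's task execution through an understanding of its environment. The model combines rich semantic information with pretrained vision-language models to improve the accuracy of instruction following. Preliminary results indicate that the method performs strongly in complex environments. Significance: it advances research on autonomous robotic task execution and multimodal interaction.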
👍 6
05/27 20:00
Memory-augmented LLM agents tackle complex long-horizon tasks by recursively summarizing interaction trajectories into compact memory. However, existing approaches typically train these memory policies using outcome-based reinforcement learning, failing to localize where intermediate memory quality
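Summary: The paper proposes a metacognitive memory policy optimization framework for LLM agents tackling long-horizon tasks. The method recursively summarizes interaction trajectories and optimizes the memory policy, improving adaptability to complex tasks. Experiments show significant gains over conventional outcome-based reinforcement learning. Significance: it contributes to the self-learning and adaptive capabilities of LLM agents.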
👍 6
06/01 20:00
Inference-time scaling has emerged as a critical avenue for enhancing Large Language Models' performance, yet real-world deployment is constrained by strict computational budgets. In this work, we formulate inference budget allocation as a global constrained optimization problem governed by economic
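Summary: The paper combines inference-time scaling with resource allocation and proposes optimizing the budget allocation of LLMs from an economic perspective. It formulates a global constrained optimization problem aimed at improving model performance under strict computational budgets. Preliminary results show that the strategy improves resource utilization across a range of models. Significance: it provides new theoretical support for the practicality and efficiency of LLMs.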
👍 5
06/03 20:00
We propose world-language-action (WLA) models as a new class of embodied foundation models. WLA takes textual instructions, images, and robot states as inputs to jointly predict textual subtasks, subgoal images, and robot actions, conjoining the world modeling interface to learn from extensive egoce
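Summary: The paper proposes world-language-action (WLA) models, which take textual instructions, images, and robot states as inputs and jointly predict textual subtasks, subgoal images, and robot actions. By incorporating a world modeling interface, the new model can learn from rich execution experience, markedly improving the flexibility and accuracy of task execution. Significance: it lays groundwork for more intelligent multimodal systems and influences the field of human-robot collaboration.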
👍 5
06/03 20:00
Large language models often improve reasoning by generating explicit chain-of-thought (CoT), demonstrating the importance of intermediate computation. However, textual CoT forces this computation through a discrete, serial, and communication-oriented token stream: each reasoning step must be verbali
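Summary: The paper proposes a latent reasoning method based on normalizing flows to improve the reasoning performance of large language models. By making intermediate computation explicit, the method addresses the discrete and serial limitations of existing textual chain-of-thought approaches, improving reasoning efficiency and accuracy. Preliminary results indicate that the method performs well on complex logical reasoning tasks. Significance: it offers a new approach to improving LLM performance on reasoning tasks.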
👍 5
06/03 20:00
Video event prediction (VEP) requires models to infer unobserved future states from partial video evidence. Existing video MLLMs usually verbalize intermediate future reasoning in text space: once visual evidence is verbalized, fine-grained motion, geometry, and interaction cues can be lost, leading
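Summary: To address the challenges of video event prediction, the paper proposes an interleaved latent visual reasoning method that infers unobserved future states from partial video evidence. The method restructures how visual information is represented, avoiding the information loss that occurs in text space and thereby improving prediction accuracy. Preliminary results show clearly better performance than conventional methods in complex video scenes. Significance: it provides new technical support for future video understanding and analysis.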