👍 60
06/18 20:00
Massive unstructured multimodal streams suffer from high "data entropy," impeding both efficient human knowledge acquisition and high-quality AI post-training. Existing passive annotation paradigms, heavily reliant on heuristic rules or general VLMs, are costly, monotonous, and fail to unlock the de
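Summary: To address the high "data entropy" of large-scale unstructured multimodal data, this paper proposes DataClaw0, which improves data usefulness through active customization. The approach goes beyond the traditional passive annotation paradigm, unlocking complex information, making human knowledge acquisition more efficient, and supplying high-quality input for AI post-training. Significance: helps advance data processing and multimodal learning.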
👍 60
06/20 20:00
LLM agents increasingly operate in large tool ecosystems, where real-world tasks require discovering relevant tools, inferring implicit sub-goals, and adapting to dynamic environments over long horizons. However, existing benchmarks rarely evaluate planning under retrieval-limited tool visibility. T
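Summary: To address long-horizon planning in large tool ecosystems, this paper proposes the PlanBench-XL benchmark, which evaluates the planning ability of LLM tool-use agents under retrieval-limited tool visibility. Existing benchmarks rarely assess adaptation to dynamic environments over long horizons; PlanBench-XL fills this gap and offers an evaluation standard for future tool-using agents. Significance: advances research on deploying intelligent agents in complex environments.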
👍 55
06/21 20:00
Enterprise agents increasingly operate inside workspaces: they read heterogeneous files, invoke tools, and deliver business artifacts. We introduce EnterpriseClawBench, an enterprise agent benchmark constructed from proprietary, real-world agent sessions. Starting from a large archive of workplace s
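Summary: This paper introduces EnterpriseClawBench, an enterprise agent benchmark built from real-world workplace scenarios. Drawing on a large archive of workplace sessions, it evaluates how agents handle heterogeneous files and tool invocation, with the aim of improving agent effectiveness and adaptability in enterprise settings. Significance: provides a reference standard and data support for developing and deploying enterprise agents.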
👍 36
06/21 20:00
As retrieval systems scale, high-quality reranking becomes increasingly important. However, most existing rerankers, whether encoder-based or decoder-based, jointly encode the query and passage, tightly coupling their computation and limiting deployment efficiency as well as flexibility. We present
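Summary: To improve the flexibility and efficiency of retrieval systems, KaLM-Reranker-V1 proposes a "fast but not late" interaction approach to reranking. Unlike most conventional rerankers, which tightly couple computation by jointly encoding the query and passage, it loosely couples their encoding, which markedly improves deployment flexibility and efficiency. Significance: offers an efficient new method for information retrieval and natural language processing.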
👍 33
06/17 20:00
World Action Models (WAMs) are embodied predictive-action models that make a forecast of the future available to action. Recent WAMs repurpose large video generation models, and a parallel line relies on language or vision-language backbones without a video-generation core. This rapid expansion has
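Summary: This paper surveys World Action Models (WAMs), which couple prediction with action to strengthen decision-making in complex environments. Driven by recent progress in large video generation models, the use and understanding of WAMs have expanded, offering a new perspective for future agent research, particularly for complex real-world problem solving. Significance: encourages deeper study of action models across multiple domains.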
👍 24
06/21 20:00
While recent LLM-based terminal agents have demonstrated promising capabilities, the scarcity of high-quality, executable training data remains a critical bottleneck. Existing synthesis pipelines typically scale by retrofitting surface-level artifacts into tasks, frequently yielding ambiguous instru
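Summary: CLI-Universe aims to build a verifiable task synthesis engine to address the shortage of high-quality, executable training data for LLM terminal agents. By improving the synthesis pipeline, the proposed framework generates unambiguous instructions and thereby supports the development of terminal agent capabilities. Significance: important for applying intelligent agents in dynamic task environments.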
👍 21
06/18 20:00
Existing embedding models are inherently static: they encode text segments in isolation, ignoring their surrounding context and temporal order. This paper introduces EvoEmbedding, a novel embedding model that generates evolvable representations for retrieval. It is tailored for long-context scenario
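Summary: EvoEmbedding proposes an evolvable representation model to address the inability of existing static embedding models to account for surrounding context and temporal order. It is especially suited to retrieval in long-conversation scenarios, where it can improve the relevance and accuracy of retrieved information. Significance: lays groundwork for long-context memory and agent development.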
👍 17
06/19 20:00
We present BioMatrix, the first multimodal foundation model that natively integrates sequences, structures, and natural language for both molecules and proteins within a single decoder-only architecture. Existing biological foundation models pursue native multimodality and broad entity coverage sepa
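Summary: BioMatrix, presented as the first multimodal foundation model of its kind, integrates biological sequences, structures, and natural language within a single decoder-only architecture. The work broadens what biological foundation models can do by unifying sequence and structure information, and is expected to have a far-reaching impact on biomedical research. Significance: advances research at the intersection of biomedicine and multimodal learning.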
👍 16
06/17 20:00
The quadratic complexity of attention poses a critical bottleneck for long-context processing, spurring interest in hybrid attention designs. Most open-source hybrid models adopt a layer-wise strategy. Yet, prior work has noted the inherent difficulty of integrating Linear Attention (LA) with Full A
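Summary: HydraHead proposes a hybrid attention mechanism with head-level functional heterogeneity to address the quadratic complexity of attention in long-context processing. The method improves integration with full attention, offers a new approach to multi-head attention design, and improves computational efficiency and flexibility. Significance: points to a new direction for long-context processing and attention mechanisms.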
👍 14
06/14 20:00
Personalized presentation generation requires more than conditioning on a current prompt or template: agents must preserve stable user preferences across tasks, retain newly introduced preferences and constraints during multi-turn revision, and carry out local edits reliably. We propose MemSlides, a
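Summary: MemSlides proposes a hierarchical memory-driven framework for personalized slide generation and multi-turn local revision. The approach emphasizes preserving stable user preferences while incorporating newly introduced constraints, which helps make generated content more personalized and accurate. Significance: advances personalized generation and human-computer interaction.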
👍 14
06/16 20:00
Memory benchmarks for LLM agents largely assume single-user settings, leaving shared assistants for hospitals, workplaces, campuses, and households understudied. In these deployments, multiple principals write to a common memory pool and query it under different roles, scopes, and relationships, so
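Summary: GateMem benchmarks LLM agents whose memory is shared among multiple principals, examining shared-memory issues across different roles and settings. The study addresses a gap in the literature on multi-user shared systems and matters for making intelligent assistants more applicable in complex environments. Significance: promotes the application and study of intelligent agents in multi-principal settings.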
👍 13
06/01 20:00
Computer-Use Agents (CUAs) are increasingly deployed in dynamic interactive environments, creating a growing need for continual skill learning during interaction. Recent approaches address this challenge by learning reusable skills from successful trajectories. However, these skill learning methods
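Summary: To meet the need for continual skill learning by Computer-Use Agents (CUAs) in dynamic interactive environments, SkillHarness proposes a framework for learning safe skills. By analyzing successful trajectories, the method improves skill reusability and safety, supporting the safe deployment of agents. Significance: advances interactive learning and safety techniques.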
👍 13
06/16 20:00
Self-distillation improves reasoning in large language models by using the model's own rollouts as training signal, typically through implicit logit-level alignment that minimizes KL divergence toward a privileged target distribution. However, because this supervision is generated via uncontrolled s
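Summary: This paper proposes a self-distillation method that constructs learnable micro-reflective trajectories to improve the reasoning ability of large language models. Using the model's own outputs as the training signal, it markedly improves reasoning performance and offers a new model and approach for autonomous learning and knowledge accumulation. Significance: promotes continual learning and self-improvement in intelligent systems.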
👍 12
06/16 20:00
Modern agent systems often suffer from fragmented runtime state: transcripts, tool effects, memory events, workspace placement, branch provenance, and replay evidence are recorded separately and become difficult to inspect or reproduce. OpenRath addresses this issue with a PyTorch-like programming m
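Summary: OpenRath addresses fragmented runtime state in modern agent systems by providing a visual state-management tool that brings together recorded transcripts, tool effects, and memory events. This markedly improves the inspectability and reproducibility of such systems, which matters for agent development and debugging. Significance: improves the transparency and reliability of intelligent agents on complex tasks.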
👍 12
06/15 20:00
Retrieval-augmented generation (RAG) systems depend critically on how documents are chunked and searched. Fine-grained chunks can improve retrieval precision but expand the search space, increasing latency and cost; larger chunks reduce the number of candidates but make dense similarity less reliabl
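Summary: MCompassRAG uses topic metadata as a semantic guide to optimize passage-level retrieval. The method balances retrieval precision against contextual coherence, can reduce retrieval latency and cost, and improves retrieval efficiency and effectiveness. Significance: offers a new strategy for retrieval-augmented generation (RAG) systems.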
👍 10
06/16 20:00
Deep research agents are Large Language Model (LLM)-based systems designed for autonomous, multi-step scientific reasoning, and they hold immense potential for accelerating research in the physical sciences. However, comprehensive and in-depth evaluations of their capabilities within this domain rem
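Summary: Deep research agents are LLM-based systems intended to advance autonomous, multi-step reasoning in the physical sciences. Through an in-depth evaluation of these agents' capabilities, the paper offers a new perspective on their potential to accelerate scientific research and anticipates an important role for them in the field. Significance: advances AI-assisted research in the physical sciences.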
👍 10
06/14 20:00
While reasoning on autoregressive (AR) models is often performed by chain-of-thought reasoning and reflection, their refinement of previous outputs still relies on fully sequential generation, even when only local edits are needed. In contrast, the masking mechanism in Mask Diffusion Models (MDMs) n
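Summary: In autoregressive (AR) models, reasoning is carried out mainly through step-by-step chain-of-thought, whereas the masking mechanism of Mask Diffusion Models (MDMs) markedly improves the ability to refine outputs. By allowing local edits instead of fully sequential generation, the approach makes reasoning more flexible and efficient. Significance: an important contribution to innovation in reasoning techniques.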
👍 9
06/16 14:28
Retrieval-augmented generation (RAG) systems must balance retrieval granularity with contextual coherence, a challenge that existing methods address through LLM-guided chunking, single-level context expansion, or hierarchical summarization. These approaches variously depend on costly LLM calls durin
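Summary: SproutRAG combines attention-guided tree search with progressive encoding to handle retrieval-augmented generation over long documents. The combined strategy can reduce retrieval cost and latency and improve the contextual coherence of generation, providing a new tool for long-document management. Significance: strong application potential for in-depth long-text processing and multimodal retrieval.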
👍 8
06/21 20:00
Recently, end-to-end OCR models, exemplified by DeepSeek OCR, have once again thrust OCR into the spotlight. A widely held view is that employing a large language model (LLM) as the decoder allows the model to leverage the prior distribution of language, leading to improved OCR performance. However,
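Summary: Unlimited OCR Works examines end-to-end OCR models that use a large language model (LLM) as the decoder. The study indicates that introducing an LLM significantly improves optical character recognition performance, although the demands on data and model size remain high. This progress opens a new path for future OCR systems. Significance: important for the development of OCR and natural language processing.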
👍 7
06/09 20:00
Scientific discovery workflows usually contain and rely heavily on lab notes, where researchers record observations, interpret uncertain results, and plan follow-up experiments. Such informative lab notes preserve evolving scientific reasoning and author uncertainty, rather than polished final resul
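Summary: Notes2Skills works with the lab notes found in scientific discovery workflows and proposes a framework that converts them into uncertainty-aware skills for scientific agents. The method highlights the role of experimental records in evolving scientific reasoning and offers a new approach to automating scientific research. Significance: advances the application and development of intelligent agents in scientific research.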