vLLM Development Activity Report - 2025-12-22
Time window: 2025-12-22 10:50 (UTC+8) ~ 2025-12-23 10:50 (UTC+8). Statistics: 17 new issues | 9 issues closed | 78 new PRs | 29 PRs merged | 21 PRs closed without merging
📊 Daily Development Status Summary
Over the past 24 hours the vLLM community remained highly active, opening 17 issues and 78 PRs and merging 29 PRs. Development focused on three areas: 1) AMD ecosystem enablement, particularly FP8 support for new architectures (e.g., Strix Halo) and ROCm build fixes; 2) performance optimization and bug fixing across core modules such as attention, speculative decoding, and the compilation cache; 3) extended support for new hardware (e.g., NVIDIA Blackwell) and quantization formats (e.g., ModelOpt FP8). Overall, the project is iterating rapidly and actively working through platform-compatibility and performance bottlenecks.
🎯 AMD/ROCm Ecosystem Updates
AMD-related activity was very high this cycle, centered on build fixes, new-architecture (Strix Halo) support, and AITER runtime optimizations.
Issues:
- #31155 (closed) & #31139: ROCm build broken. An unused variable `rotary_dim` introduced by PR #30821 caused the ROCm build to fail (`-Werror=unused-variable`); quickly fixed via #31156. A separate build failure (Docker image) may involve more complex HIP compiler-warning issues.
- #31086 (closed): ROCm Triton backend broken. Multiple tests failed with the Triton backend enabled. Temporarily resolved by rolling back to an older base image, suggesting a compatibility issue with the newer Triton release.
PRs (mainly from c0de128, focused on Strix Halo support):
- FP8 support extension (#31184): adds a `gfx11` prefix check to `supports_fp8()` to enable FP8 quantization on RDNA 3/3.5 architectures (including Strix Halo). Verification needed: reviewers asked for lm_eval results on gfx11 hardware to confirm correctness; the contributor has no access to such hardware.
- Unified AITER device parameter (#31178, #31176, #31149): a series of PRs replacing hardcoded `device="cuda"` with the device taken from the input tensor, improving ROCm compatibility on multi-GPU setups and specific architectures. These likewise need validation on real hardware.
- Code quality and bug fixes (#31177, #31121, #31119, #31118): refined exception handling, fixed a MoE expert-assignment bug caused by Python list aliasing, unified tensor slice-assignment patterns, and fixed an uninitialized variable. Foundational hardening of the ROCm code paths.
- CI/test fixes (#31192, #31159): #31192 skips a V1 engine test that depends on `fork`, since ROCm uses the `spawn` start method; #31159 rolled back the Triton version to fix a GPT-OSS import error on gfx950.
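The FP8-detection change in #31184 amounts to a prefix match on the GPU architecture name. A minimal sketch of the idea follows; the function name, the capability list, and the architecture strings here are illustrative assumptions, not vLLM's actual API:

```python
# Illustrative sketch of prefix-based FP8 capability detection.
# The prefix list is hypothetical; gfx11 stands in for the RDNA 3/3.5 addition.
FP8_CAPABLE_PREFIXES = ("gfx94", "gfx95", "gfx11")

def supports_fp8(gcn_arch_name: str) -> bool:
    """Return True if the GPU architecture is assumed to support FP8."""
    # Strip any feature suffix like "gfx1151:sramecc+:xnack-" before matching.
    base_arch = gcn_arch_name.split(":")[0]
    return base_arch.startswith(FP8_CAPABLE_PREFIXES)

print(supports_fp8("gfx1151"))        # Strix Halo (RDNA 3.5) -> True
print(supports_fp8("gfx90a:xnack-"))  # not in this toy list -> False
```

The same pattern generalizes to any architecture gate: normalize the reported arch string first, then match against an allowlist of prefixes.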
Summary: AMD ecosystem work focuses on broadening hardware support (gfx11) and shoring up the code base (device compatibility, memory/logic errors). A standout challenge is the lack of accessible gfx11 test environments, which leaves some PRs without end-to-end performance validation and dependent on CI and core maintainers for final confirmation.
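The list-aliasing bug class fixed in #31121 is a common Python pitfall: multiplying a list of mutable objects copies references, so every "expert" ends up sharing the same inner list. A generic illustration (not the actual vLLM code):

```python
# Broken: [[]] * n creates n references to the SAME inner list.
num_experts = 4
buggy = [[]] * num_experts
buggy[0].append("token-A")   # intended for expert 0 only...
print(buggy)                 # ...but shows up under every expert

# Correct: a comprehension creates n independent lists.
fixed = [[] for _ in range(num_experts)]
fixed[0].append("token-A")
print(fixed)                 # only expert 0 holds the token
```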
💬 High-Activity Discussions
- PR #31193: [Feature] Add iteration level logging and enhance nvtx marker
  - Core question: whether to enable verbose iteration-level logging (request count, token count, latency) and enhanced NVTX markers by default.
  - Positions:
    - Against on-by-default (wangshangsam, nvpohanh, robertgshaw2-redhat): the feature mainly serves expert users doing GPU profiling and tuning; for the vast majority of users it is too verbose and may add unnecessary CPU overhead and log noise.
    - Maintainer action (maxyanghu): based on this feedback, quickly switched the feature to off by default, gated by the environment variable `VLLM_LOG_ITERATION_DETAILS`.
  - Outcome: the feature was kept under a "quiet by default" policy, balancing user experience against performance overhead.
- Issue #31128: [Feature]: Add support of Blackwell SM121(DGX Spark)
  - Core question: a request for native support of NVIDIA DGX Spark (Blackwell, ARM64, CUDA 13).
  - Discussion:
    - Problem: vLLM 0.13.0 depends on PyTorch 2.9.0, whose official wheels do not cover ARM64 + CUDA; the NGC PyTorch 2.10 image is currently the only option.
    - Community workarounds: several users contributed detailed steps for building from source, using the CUDA 13-specific wheels (available since v0.13.0), and specific Docker images.
  - Outcome: the discussion shows vLLM already supports Blackwell and CUDA 13 via nightly builds and version-specific wheels, but the official docs and compatibility notes may need updating to guide users better.
- Issue #31148 & PR #31198: Jais2 model unsupported rotary_dim kwarg
  - Core question: Jais2 model initialization fails because `get_rope()` receives an unsupported `rotary_dim` argument.
  - Discussion and resolution: a fix PR (#31198) was quickly linked in the issue; it removes the faulty LoRA logic that caused the problem. A textbook report-and-fix fast-response case.
- Issue #31155: [Bug] [ROCm] [Critical]: ROCm build broken
  - Core question: the ROCm build broke on an unused-variable error, a high-priority blocking issue.
  - Discussion and resolution: quickly labeled `critical` and fixed and merged within hours via #31156, demonstrating fast response to critical platform build breakage.
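The "off by default, opt in via environment variable" pattern adopted in #31193 above can be sketched as follows. The variable name comes from the PR; the surrounding function and its parameters are illustrative assumptions:

```python
import logging
import os

logger = logging.getLogger("vllm.iteration")

# Off by default; users opt in with VLLM_LOG_ITERATION_DETAILS=1.
LOG_ITERATION_DETAILS = os.getenv("VLLM_LOG_ITERATION_DETAILS", "0") == "1"

def log_iteration(step: int, num_requests: int, num_tokens: int, elapsed_ms: float) -> None:
    """Emit per-iteration stats only when explicitly enabled."""
    if not LOG_ITERATION_DETAILS:
        return  # no log noise (and near-zero CPU cost) for typical users
    logger.info(
        "step=%d requests=%d tokens=%d elapsed=%.2fms",
        step, num_requests, num_tokens, elapsed_ms,
    )
```

Gating on a module-level flag keeps the hot path to a single branch, which is why the CPU-overhead objection could be addressed without removing the feature.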
🔥 Hot Topics and Trends
- New-hardware support wave: the community is actively onboarding NVIDIA Blackwell (SM100, SM120) and AMD RDNA 3.5 (Strix Halo). Discussions and PRs cover architecture detection, kernel dispatch, CUDA version compatibility (12 vs 13), and installation paths.
- Expanding quantization support: FP8 quantization support keeps deepening, from NVIDIA ModelOpt's new variants (#30957) to AMD consumer GPUs. Quantization for MoE models (e.g., FP8 MoE kernel selection) is another active subarea.
- Compilation and caching issues: several issues (#31199, #31183) involve TorchInductor compile-cache conflicts under concurrent multi-process use, as well as CUDA Graph capture stability, reflecting the complexity that comes with chasing peak performance.
- Bugs in distributed inference: bug reports on the P/D KV-cache connector (#31145) and on Tensor Parallel with speculative decoding (#31154) show that state synchronization and determinism remain challenging in complex distributed setups.
- Model support and CI maintenance: with upstream models removed (e.g., mosaicml/mpt-7b), related tests had to be disabled promptly (#31182), and adaptation to Transformers API changes continues (#31181, #31146).
🛠️ Key Technical Changes
- PR #31197 (merged): Revert [SM100] Enable fp8 compute for prefill MLA: an urgent revert of the previously merged #30746, since the change could cause problems (e.g., breaking Blackwell CI) until the dependent FlashInfer update (#30993) lands. Impact: temporarily withdraws FP8 acceleration for the MLA prefill phase until the underlying library stabilizes.
- PR #31167 (merged): [Perf] Remove blocking copy in GDN Attention: removes a blocking copy in the GDN attention layer of Qwen3-Next. Impact: clears the way for fully non-blocking async scheduling and MTP; measured roughly a 1% speedup at small batch sizes.
- PR #31192 (merged): [AMD][CI] fix v1/engine test_preprocess_error_handling: skips the test because ROCm uses the `spawn` multiprocessing start method while the test depends on `fork`. Impact: keeps AMD CI green, but highlights the test complexity of differing multiprocessing models across platforms.
- PR #30957 (merged): Support NVIDIA ModelOpt HF FP8 variants: adds loading support for two ModelOpt proprietary FP8 formats (`FP8_PER_CHANNEL_PER_TOKEN`, `FP8_PB_WO`). Impact: broadens vLLM's compatibility with NVIDIA-optimized models for specific user workflows.
- Issue #31200: [Bug]: class Request and block_hasher has circular reference: points out a circular reference between `Request` and `block_hasher` in the multimodal feature cache that may leak memory. Impact: a fix using `weakref` is proposed; the issue matters for long-running multimodal services.
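The weakref-based fix proposed in Issue #31200 follows a standard pattern for breaking parent-child reference cycles so objects are reclaimed promptly. A generic sketch with hypothetical stand-in classes, not the actual vLLM code:

```python
import weakref

class BlockHasher:
    """Child object that needs to reach back to its owning request."""
    def __init__(self, request: "Request") -> None:
        # Hold a weak reference instead of a strong one: the hasher no longer
        # keeps the Request alive, so dropping the Request breaks the cycle.
        self._request_ref = weakref.ref(request)

    def owner_alive(self) -> bool:
        return self._request_ref() is not None

class Request:
    def __init__(self) -> None:
        self.block_hasher = BlockHasher(self)  # strong ref down, weak ref up

req = Request()
hasher = req.block_hasher
print(hasher.owner_alive())  # True while the request exists
del req
print(hasher.owner_alive())  # False once the request is reclaimed
```

With only a weak back-reference, dropping the last strong reference to the request frees it immediately under CPython's reference counting, instead of waiting for the cycle collector.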
📈 Development Activity Observations
- Contributor activity: very high. User c0de128 submitted a dense series of AMD-related fix PRs. Core contributors such as robertgshaw2-redhat, jeejeelee, mgoin, and AndreasKaratzas stayed heavily engaged in reviewing, fixing, and merging.
- Review and merge velocity: critical build fixes (e.g., #31156) and obvious bug fixes (e.g., #31198) were triaged and merged extremely fast (within hours). PRs touching performance or new features receive stricter review, typically requiring benchmarks (e.g., lm_eval results) or discussion of default-behavior impact (e.g., #31193).
- Cross-platform collaboration: in AMD-related PR discussions, review requests from AMD employees (@tjtanaa asking for performance tests) interact with external contributors who lack hardware; the community is exploring how to collaborate effectively under this constraint.
💡 Issues Worth Watching
- Validating FP8 support on AMD Strix Halo (gfx11): PRs such as #31184 add the code paths, but hardware validation is still missing. This is a key step toward running vLLM on AMD consumer GPUs and needs the community or AMD to coordinate final verification.
- Memory-leak risk in multimodal serving: the circular reference revealed by Issue #31200 could affect long-term service stability; its proposed fix deserves prompt evaluation and merging.
- Qwen3-Next MTP stability: Issue #31186 reports crashes of Qwen3-Next under multi-token prediction (MTP). As a recent, important model family, Qwen3-Next's performance and stability draw close attention.
- Concurrency safety of the compile cache: the TorchInductor compile-cache corruption under concurrent multi-process loading reported in Issue #31199 could affect stability in high-concurrency deployments.
- Coordination logic in complex distributed configs: Issues #31145 and #31154 expose edge cases in Multi-Connector setups and in N-gram speculative decoding under TP, suggesting the need for broader integration tests when composing distributed components.
📋 Appendix: Detailed Data
New Issues
- #31202 [Bug]: Mixtral Fp8 Accuracy is Degraded — bug,help wanted — by robertgshaw2-redhat (created: 2025-12-23 10:27 (UTC+8))
- #31200 [Bug]: class Request and block_hasher has cirular reference, may cause memory leak. — bug — by frelam (created: 2025-12-23 09:55 (UTC+8))
- #31199 [Bug]: UnpicklingError during concurrent model compilation on multiple GPUs — bug — by AIDevCuda (created: 2025-12-23 09:46 (UTC+8))
- #31148 [Bug]: Jais2 model in vLLM 0.13.0: get_rope() called with unsupported rotary_dim kwarg (TypeError during model init) — bug — by NikolasTh90 (created: 2025-12-22 21:18 (UTC+8))
- #31181 [CI Failure]: Transformers Nightly Models Test — ci-failure — by AndreasKaratzas (created: 2025-12-23 05:04 (UTC+8))
- #31186 [Bug]: Qwen3-Next MTP Crash — bug — by benchislett (created: 2025-12-23 06:02 (UTC+8))
- #31128 [Feature]: Add support of Blackwell SM121(DGX Spark) — feature request — by yanyunl1991 (created: 2025-12-22 15:35 (UTC+8))
- #31170 [Bug]: Torch.AcceleratorError: CUDA error: no kernel image is available for execution on the device — bug — by shahizat (created: 2025-12-23 02:40 (UTC+8))
- #31123 [Bug]: skip_tokenizer_init=True crashes `google/gemma-3-27b-it` — bug — by terrykong (created: 2025-12-22 14:40 (UTC+8))
- #31165 [CI Failure]: Entrypoints — ci-failure — by robertgshaw2-redhat (created: 2025-12-23 00:51 (UTC+8))
- #31155 [Bug] [ROCm] [Critical]: ROCm build broken — bug,rocm — by tjtanaa (created: 2025-12-22 22:56 (UTC+8))
- #31154 [Bug]: Ngram draft tokens diverge across TP ranks when using external_launcher. — bug — by xiaoxiaosuaxuan (created: 2025-12-22 22:52 (UTC+8))
- #31145 [Bug]: [P/D] multi-connector cannot be used together with a KV connector that uses a push scheme — bug — by liziyu179 (created: 2025-12-22 20:21 (UTC+8))
- #31139 [Bug]: Build vllm/rocm docker image — bug,rocm — by JartX (created: 2025-12-22 19:01 (UTC+8))
- #31136 [Bug]: error when run examples/online_serving/prompt_embed_inference_with_openai_client.py — bug — by yuekaizhang (created: 2025-12-22 18:10 (UTC+8))
- #31124 [Bug]: vllm.entrypoints.openai.api_server started but can’t be accessed in wsl — bug — by Usigned (created: 2025-12-22 14:44 (UTC+8))
- #31122 [Bug]: vllm.entrypoints.openai.api_server started but can’t be access on wsl — bug — by Usigned (created: 2025-12-22 14:40 (UTC+8))
Closed Issues
- #14954 [Bug]: vLLM running on Unspecified Platform raises NotImplementedError when using podman/docker-compose — bug,stale — by BastianBN (closed: 2025-12-23 10:46 (UTC+8))
- #17171 [Bug]: Qwen2VL-2b / Qwen2.5-7b has AssertionError and Cuda error when qps goes higher — bug,stale — by Ericoool9614 (closed: 2025-12-23 10:46 (UTC+8))
- #22961 [Bug]: TypeError: MoeWNA16Method.get_weight_loader.<locals>.moe_wna16_weight_loader() got an unexpected keyword argument 'return_sucess' — bug,stale — by n1ck-guo (closed: 2025-12-23 10:46 (UTC+8))
- #22965 [Bug]: [xPyD]Abnormal results when using v1 P2pNcclConnector as KV cache transport: repeated requests for the same input produce abnormal outputs — bug,stale — by qianyang01 (closed: 2025-12-23 10:46 (UTC+8))
- #31202 [Bug]: Mixtral Fp8 Accuracy is Degraded — bug,help wanted — by robertgshaw2-redhat (closed: 2025-12-23 10:42 (UTC+8))
- #31181 [CI Failure]: Transformers Nightly Models Test — ci-failure — by AndreasKaratzas (closed: 2025-12-23 08:46 (UTC+8))
- #31086 [Bug][ROCm]: Triton backend broken — bug,rocm — by AndreasKaratzas (closed: 2025-12-23 08:45 (UTC+8))
- #31165 [CI Failure]: Entrypoints — ci-failure — by robertgshaw2-redhat (closed: 2025-12-23 00:54 (UTC+8))
- #31155 [Bug] [ROCm] [Critical]: ROCm build broken — bug,rocm — by tjtanaa (closed: 2025-12-23 00:30 (UTC+8))
New PRs
- #31193 [Feature] Add iteration level logging and enhance nvtx marker — v1,nvidia — by maxyanghu (created: 2025-12-23 08:02 (UTC+8))
- #31203 [ROCm][Bugfix] Fix RuntimeError in MMEncoderAttention by replacing .view() with .reshape() — rocm,multi-modality — by AndreasKaratzas (created: 2025-12-23 10:49 (UTC+8))
- #31178 [ROCm][Strix Halo] Fix for device parameter in AITER topK metadata — rocm — by c0de128 (created: 2025-12-23 03:53 (UTC+8))
- #31184 [ROCm][Strix Halo] Fix for FP8 support detection on gfx11x architectures — rocm — by c0de128 (created: 2025-12-23 05:36 (UTC+8))
- #31176 [ROCm][Strix Halo] Fix for hardcoded device in MLA sparse attention — rocm,nvidia — by c0de128 (created: 2025-12-23 03:39 (UTC+8))
- #31197 Revert “[SM100] Enable fp8 compute for prefill MLA (#30746)” — v1,nvidia — by pavanimajety (created: 2025-12-23 09:39 (UTC+8))
- #31201 Add nvidia h800 moe config — nvidia — by lengrongfu (created: 2025-12-23 10:20 (UTC+8))
- #31131 [Model] Add verify_and_update_model_config for VerifyAndUpdateConfig. — qwen — by noooop (created: 2025-12-22 17:15 (UTC+8))
- #31169 [MoE Refactor][10/N] Cleanup Disk -> Kernel Shuffle — nvidia — by robertgshaw2-redhat (created: 2025-12-23 02:04 (UTC+8))
- #31166 [Bugfix] Fix MLA attention crash when using DP with DCP — needs-rebase,v1 — by sachinkumarsingh092 (created: 2025-12-23 01:06 (UTC+8))
- #31188 [Doc] Add Claude code usage example — documentation — by mgoin (created: 2025-12-23 06:11 (UTC+8))
- #31162 [Feature] OTEL tracing during loading — frontend,needs-rebase,ci/build,v1,cpu — by emricksini-h (created: 2025-12-23 00:36 (UTC+8))
- #31161 [Bugfix] Fix MoE LoRA bin/pt loading — ready — by jeejeelee (created: 2025-12-23 00:17 (UTC+8))
- #31194 [ci] Fix Pytorch compilation test oom in 2.10 — ready — by angelayi (created: 2025-12-23 08:23 (UTC+8))
- #31198 [Bugfix] Fix Jais2ForCausalLM — no labels — by jeejeelee (created: 2025-12-23 09:42 (UTC+8))
- #31147 Add prefix continuation feature to DeepSeek v3.2 — deepseek — by PHOEBEMOON0802 (created: 2025-12-22 20:57 (UTC+8))
- #31185 [PERF] Use `cutlass_scaled_mm` for Blackwell instead of deep-gemm’s blockscale gemm — nvidia — by vadiklyutiy (created: 2025-12-23 05:45 (UTC+8))
- #31175 [Bugfix] Properly apply v_scale for mimo_v2_flash — no labels — by mgoin (created: 2025-12-23 03:37 (UTC+8))
- #31149 [Bugfix][ROCm][Dynamo][DS 3.1][FP8] fix unsupported hasattr call when Dynamo tracing for ROCm device — rocm,ready — by zejunchen-zejun (created: 2025-12-22 21:31 (UTC+8))
- #31196 [compile] Remove torch 2.9 patches — no labels — by angelayi (created: 2025-12-23 09:20 (UTC+8))
- #31195 Fix TRTLLM Ragged Attention FP8 Path — ready,v1,nvidia — by pavanimajety (created: 2025-12-23 09:18 (UTC+8))
- #31192 [AMD][CI] fix v1/engine test_preprocess_error_handling — rocm,ready,v1 — by divakar-amd (created: 2025-12-23 07:41 (UTC+8))
- #31144 fix multiconnector for multi connector use push kv connector — kv-connector — by liziyu179 (created: 2025-12-22 20:13 (UTC+8))
- #31189 Amd ci/rlhf colocate — rocm,ci/build — by rjrock (created: 2025-12-23 06:45 (UTC+8))
- #31160 [Bug] Fix `Number of dimensions of tensors must match.` for Deepseek V3.2 — ready,deepseek — by yewentao256 (created: 2025-12-23 00:11 (UTC+8))
- #31190 [Transformers][Bugfix] Migrated to new transformers nightly logic — no labels — by AndreasKaratzas (created: 2025-12-23 07:01 (UTC+8))
- #31182 [CI Failure] Disable mosaicml/mpt-7b and databricks/dbrx-instruct tests — documentation,ready,ci-failure — by mgoin (created: 2025-12-23 05:15 (UTC+8))
- #31191 Fix RecursionError in MediaWithBytes unpickling — multi-modality — by nrghosh (created: 2025-12-23 07:02 (UTC+8))
- #31167 [Perf] Remove blocking copy in GDN Attention — performance,ready,v1 — by benchislett (created: 2025-12-23 01:45 (UTC+8))
- #31173 [Bug] Fix `'CutlassMLAImpl' object has no attribute '_workspace_buffer'` — ready,v1,nvidia — by yewentao256 (created: 2025-12-23 03:10 (UTC+8))
- #31187 [CI/ROCm] Fixing “V1 Test attention (H100)” test group. — rocm,v1 — by Alexei-V-Ivanov-AMD (created: 2025-12-23 06:07 (UTC+8))
- #31183 [wip] disable compile cache in test_fusion_attn — no labels — by angelayi (created: 2025-12-23 05:34 (UTC+8))
- #31110 [Bugfix][ROCm] Fix AITER method typos and invalid staticmethod — rocm — by c0de128 (created: 2025-12-22 11:17 (UTC+8))
- #31157 [Bugfix][ROCm] Fix typo: triton_fp4_gemm_dynamic_qaunt -> quant — rocm — by c0de128 (created: 2025-12-22 23:39 (UTC+8))
- #31118 [ROCm][Strix Halo] Fix for uninitialized prefix_scheduler_metadata — rocm,v1 — by c0de128 (created: 2025-12-22 12:51 (UTC+8))
- #31119 [ROCm][Strix Halo] Fix for tensor slice assignment in MLA — rocm,v1 — by c0de128 (created: 2025-12-22 13:00 (UTC+8))
- #31121 [ROCm][Strix Halo] Fix for list aliasing in fused MoE initialization — rocm — by c0de128 (created: 2025-12-22 13:19 (UTC+8))
- #31177 [ROCm][Strix Halo] Fix for exception types in AITER MLA FP8 check — rocm — by c0de128 (created: 2025-12-23 03:48 (UTC+8))
- #31179 [ROCm][Strix Halo] Fix for FP8 dtype in silu_mul quantization — rocm — by c0de128 (created: 2025-12-23 04:20 (UTC+8))
- #31180 [WIP] Mimo v2 flash mtp — new-model — by mgoin (created: 2025-12-23 04:53 (UTC+8))
- #31172 [NVIDIA] Add env var to override FlashInfer FP8 BMM — nvidia — by Kh4L (created: 2025-12-23 03:09 (UTC+8))
- #31174 [Doc] Add vllm-metal to hardware plugin documentation — documentation,ready,cpu — by mgoin (created: 2025-12-23 03:15 (UTC+8))
- #31168 Check if item.id exists to prevent 500 — frontend,ci/build — by gdombiak (created: 2025-12-23 01:51 (UTC+8))
- #31158 [torch.compile] Improve encoder compilation detection in PiecewiseBackend — qwen — by ilmarkov (created: 2025-12-22 23:43 (UTC+8))
- #31171 [perf] Integrate flashinfer concat_mla_k — v1 — by jiahanc (created: 2025-12-23 02:41 (UTC+8))
- #31153 [Chore] Update more locations to use `attention_config.backend` — performance,ready — by DarkLight1337 (created: 2025-12-22 22:46 (UTC+8))
- #31159 [ROCm][CI/Build] Fix triton version to one that has triton_kernels required for gpt-oss to run — rocm,ready,ci/build,gpt-oss — by gshtras (created: 2025-12-22 23:52 (UTC+8))
- #31164 [openai api] log http exception in handler — frontend — by andyxning (created: 2025-12-23 00:48 (UTC+8))
- #31125 [UX] improve profiler error message — ready,v1 — by BoyuanFeng (created: 2025-12-22 14:48 (UTC+8))
- #31140 [Feature] Support weight-shape-unaligned block-scale fp8 models — no labels — by Wanli-Jiang (created: 2025-12-22 19:29 (UTC+8))
- #31150 [BugFix] Fix architecture flags to prevent issues on SM103 — ready,ci/build,nvidia — by LopezCastroRoberto (created: 2025-12-22 21:44 (UTC+8))
- #31163 Fix prefill trace warmup — documentation,frontend,ci/build,v1 — by sraizada-tt (created: 2025-12-23 00:36 (UTC+8))
- #31156 [ROCm] [Critical]: Remove unused variable — rocm,ready — by tjtanaa (created: 2025-12-22 23:14 (UTC+8))
- #31137 [misc] Sort uvicorn log level description according to verbosity — frontend — by andyxning (created: 2025-12-22 18:13 (UTC+8))
- #31113 Fix document of torchrun_example.py — documentation — by foreverlms (created: 2025-12-22 12:01 (UTC+8))
- #31151 [CI][Bugfix] Fix `entrypoints/openai/test_audio.py` — ready — by NickLucche (created: 2025-12-22 22:03 (UTC+8))
- #31152 cwm tool parser — no labels — by ErezSC42 (created: 2025-12-22 22:43 (UTC+8))
- #31146 Add util function for checking nesting of rope parameters — ready — by hmellor (created: 2025-12-22 20:22 (UTC+8))
- #31142 Don’t tolerate `llm_config` instead of `text_config` — ready — by hmellor (created: 2025-12-22 19:57 (UTC+8))
- #31135 [V1] align text, token_ids and logprobs under stop buffering with str… — v1 — by quanliu1991 (created: 2025-12-22 18:01 (UTC+8))
- #31134 Add prefix continuation to DeepSeek v3.2 — deepseek — by PHOEBEMOON0802 (created: 2025-12-22 17:40 (UTC+8))
- #31143 Add encode time — documentation,performance,new-model,rocm,frontend,tpu,speculative-decoding,needs-rebase,ci/build,v1 — by LJH-LBJ (created: 2025-12-22 20:07 (UTC+8))
- #31141 Epd mooncake engine — documentation,frontend,v1,kv-connector — by khuonglmhw (created: 2025-12-22 19:32 (UTC+8))
- #31138 [Mistral common] Ensure all functions are imported from the top & only use public methods — ci/build — by patrickvonplaten (created: 2025-12-22 18:51 (UTC+8))
- #31112 [CI]Replace pip in docker.xpu with uv pip — ci/build — by 1643661061leo (created: 2025-12-22 11:54 (UTC+8))
- #31132 [Model] Fix bagel failed to run — no labels — by Potabk (created: 2025-12-22 17:16 (UTC+8))
- #31133 [Model] use maybe_all_reduce_tensor_model_parallel — no labels — by wangxiyuan (created: 2025-12-22 17:30 (UTC+8))
- #31129 [DeepSeek v3.2] Add prefix contiunuation feature for DeepSeek v3.2 — deepseek — by PHOEBEMOON0802 (created: 2025-12-22 15:46 (UTC+8))
- #31130 [Bugfix] Fix shape mismatch in sparse 2:4 bitmask decompression for vision models — no labels — by majiayu000 (created: 2025-12-22 16:21 (UTC+8))
- #31127 [Frontend] Make pooling entrypoints request schema consensus. — documentation,frontend,multi-modality — by noooop (created: 2025-12-22 15:16 (UTC+8))
- #31126 Add explicit n:1 parameter to OpenAI API payloads in benchmark functions — performance — by wisdomfriend (created: 2025-12-22 15:08 (UTC+8))
- #31109 [Bugfix][ROCm] Fix typo: is_linear_fp8_enaled -> is_linear_fp8_enabled — rocm — by c0de128 (created: 2025-12-22 11:12 (UTC+8))
- #31120 [Misc] Fix typo: ‘occured’ -> ‘occurred’ — no labels — by c0de128 (created: 2025-12-22 13:02 (UTC+8))
- #31117 [Misc] Fix spelling typos in model comments — qwen — by c0de128 (created: 2025-12-22 12:14 (UTC+8))
- #31115 [Misc] Fix grammar errors in comments and messages — no labels — by c0de128 (created: 2025-12-22 12:10 (UTC+8))
- #31116 [Misc] Fix quantization-related typos — no labels — by c0de128 (created: 2025-12-22 12:12 (UTC+8))
- #31114 [Misc] Fix spelling typos in comments — ci/build,multi-modality — by c0de128 (created: 2025-12-22 12:07 (UTC+8))
- #31111 [ROCm][Refactor] Move the contiguous logic for ROCm in `torch_sdpa_wrapper` into MMEncoderAttention — rocm — by shen-shanshan (created: 2025-12-22 11:33 (UTC+8))
Merged PRs
- #30097 [Feature] Batch invariant: Lora — ready — by quanliu1991 (merged: 2025-12-23 10:32 (UTC+8))
- #31197 Revert “[SM100] Enable fp8 compute for prefill MLA (#30746)” — v1,nvidia — by pavanimajety (merged: 2025-12-23 10:15 (UTC+8))
- #31194 [ci] Fix Pytorch compilation test oom in 2.10 — ready — by angelayi (merged: 2025-12-23 09:56 (UTC+8))
- #31192 [AMD][CI] fix v1/engine test_preprocess_error_handling — rocm,ready,v1 — by divakar-amd (merged: 2025-12-23 09:28 (UTC+8))
- #30746 [SM100] Enable fp8 compute for prefill MLA — documentation,rocm,ready,ci/build,v1,multi-modality,tool-calling,deepseek,nvidia — by pavanimajety (merged: 2025-12-23 03:15 (UTC+8))
- #31102 [MoE Refactor][7/N] AITER MK — rocm,ready,v1 — by robertgshaw2-redhat (merged: 2025-12-23 07:42 (UTC+8))
- #31182 [CI Failure] Disable mosaicml/mpt-7b and databricks/dbrx-instruct tests — documentation,ready,ci-failure — by mgoin (merged: 2025-12-23 07:40 (UTC+8))
- #31167 [Perf] Remove blocking copy in GDN Attention — performance,ready,v1 — by benchislett (merged: 2025-12-23 06:25 (UTC+8))
- #31173 [Bug] Fix `'CutlassMLAImpl' object has no attribute '_workspace_buffer'` — ready,v1,nvidia — by yewentao256 (merged: 2025-12-23 06:24 (UTC+8))
- #29845 [SpecDecode] Simplified alternative padded-speculation acceptance rate fix — rocm,speculative-decoding,ready,v1 — by LucasWilkinson (merged: 2025-12-23 05:06 (UTC+8))
- #31174 [Doc] Add vllm-metal to hardware plugin documentation — documentation,ready,cpu — by mgoin (merged: 2025-12-23 04:06 (UTC+8))
- #31052 [MoE Refactor][9/N] Use modular kernel for unquantized Triton MoE — ready — by zyongye (merged: 2025-12-23 01:34 (UTC+8))
- #31159 [ROCm][CI/Build] Fix triton version to one that has triton_kernels required for gpt-oss to run — rocm,ready,ci/build,gpt-oss — by gshtras (merged: 2025-12-23 01:19 (UTC+8))
- #31125 [UX] improve profiler error message — ready,v1 — by BoyuanFeng (merged: 2025-12-23 00:46 (UTC+8))
- #31156 [ROCm] [Critical]: Remove unused variable — rocm,ready — by tjtanaa (merged: 2025-12-23 00:28 (UTC+8))
- #31040 [AMD][CI] Add “V1 Test e2e + engine” to mi325_8 Agent Pool — rocm,ready,ci/build — by micah-wil (merged: 2025-12-22 23:41 (UTC+8))
- #31151 [CI][Bugfix] Fix `entrypoints/openai/test_audio.py` — ready — by NickLucche (merged: 2025-12-22 23:21 (UTC+8))
- #30781 [CI] add polling for precompiled wheel in python_only_compile.sh, fix index generation for releases — ready,ci/build — by Harry-Chen (merged: 2025-12-22 21:24 (UTC+8))
- #30242 [BugFix] skip language model in Encoder — documentation,ready,v1,qwen,kv-connector — by Bounty-hunter (merged: 2025-12-22 21:25 (UTC+8))
- #30205 [gpt-oss] Fix harmony parser in streaming responses — frontend,ready,gpt-oss — by AlonKejzman (merged: 2025-12-22 20:56 (UTC+8))
- #31132 [Model] Fix bagel failed to run — no labels — by Potabk (merged: 2025-12-22 18:15 (UTC+8))
- #31083 Update MiniMax-M2 ToolCall and add MiniMax-M2.1 in Docs — documentation,ready — by rogeryoungh (merged: 2025-12-22 13:28 (UTC+8))
- #31109 [Bugfix][ROCm] Fix typo: is_linear_fp8_enaled -> is_linear_fp8_enabled — rocm — by c0de128 (merged: 2025-12-22 13:14 (UTC+8))
- #31120 [Misc] Fix typo: ‘occured’ -> ‘occurred’ — no labels — by c0de128 (merged: 2025-12-22 13:14 (UTC+8))
- #31117 [Misc] Fix spelling typos in model comments — qwen — by c0de128 (merged: 2025-12-22 13:14 (UTC+8))
- #31115 [Misc] Fix grammar errors in comments and messages — no labels — by c0de128 (merged: 2025-12-22 13:14 (UTC+8))
- #31116 [Misc] Fix quantization-related typos — no labels — by c0de128 (merged: 2025-12-22 13:13 (UTC+8))
- #31114 [Misc] Fix spelling typos in comments — ci/build,multi-modality — by c0de128 (merged: 2025-12-22 13:13 (UTC+8))
- #30957 [Feature]: Support NVIDIA ModelOpt HF FP8 variants FP8_PER_CHANNEL_PER_TOKEN and FP8_PB_WO in vLLM — documentation,frontend,ready,nvidia — by CedricHwong (merged: 2025-12-22 11:34 (UTC+8))
PRs Closed Without Merging
- #17368 enable multiple platform device in DP init — needs-rebase,unstale,v1 — by HanlinDu (closed: 2025-12-23 09:13 (UTC+8))
- #19844 [BugFix][V0] Fix AssertionError for prompt_logprobs — needs-rebase,unstale — by xu-song (closed: 2025-12-23 08:16 (UTC+8))
- #31190 [Transformers][Bugfix] Migrated to new transformers nightly logic — no labels — by AndreasKaratzas (closed: 2025-12-23 08:04 (UTC+8))
- #29605 [BugFix] num_cpu_blocks metrics is None in cache_config_info even when using OffloadingConnector — v1,qwen — by sts07142 (closed: 2025-12-23 07:49 (UTC+8))
- #31110 [Bugfix][ROCm] Fix AITER method typos and invalid staticmethod — rocm — by c0de128 (closed: 2025-12-23 05:11 (UTC+8))
- #31157 [Bugfix][ROCm] Fix typo: triton_fp4_gemm_dynamic_qaunt -> quant — rocm — by c0de128 (closed: 2025-12-23 05:11 (UTC+8))
- #23437 [Frontend] Add deepseek v3.1 reasoning parser — frontend,needs-rebase,qwen,deepseek — by arsenetar (closed: 2025-12-23 02:13 (UTC+8))
- #21084 [CI/Build] Add the Nixl test to CI — ready,needs-rebase,ci/build,stale,kv-connector — by kouroshHakha (closed: 2025-12-23 02:08 (UTC+8))
- #23870 [Benchmark] Add ability to round robin over a set of urls for benchmarking — performance,needs-rebase,ci/build,nvidia — by kouroshHakha (closed: 2025-12-23 02:08 (UTC+8))
- #31163 Fix prefill trace warmup — documentation,frontend,ci/build,v1 — by sraizada-tt (closed: 2025-12-23 00:36 (UTC+8))
- #25168 [CI/build] Abort CI if pre-commit fails — documentation,ci/build — by tmuttaki (closed: 2025-12-22 22:25 (UTC+8))
- #24210 [DO NOT MERGE] PR for testing — needs-rebase,ci/build,unstale — by tmuttaki (closed: 2025-12-22 22:25 (UTC+8))
- #31142 Don’t tolerate `llm_config` instead of `text_config` — ready — by hmellor (closed: 2025-12-22 21:59 (UTC+8))
- #31134 Add prefix continuation to DeepSeek v3.2 — deepseek — by PHOEBEMOON0802 (closed: 2025-12-22 20:46 (UTC+8))
- #31143 Add encode time — documentation,performance,new-model,rocm,frontend,tpu,speculative-decoding,needs-rebase,ci/build,v1 — by LJH-LBJ (closed: 2025-12-22 20:08 (UTC+8))
- #31141 Epd mooncake engine — documentation,frontend,v1,kv-connector — by khuonglmhw (closed: 2025-12-22 19:32 (UTC+8))
- #31129 [DeepSeek v3.2] Add prefix contiunuation feature for DeepSeek v3.2 — deepseek — by PHOEBEMOON0802 (closed: 2025-12-22 16:40 (UTC+8))
- #30478 fix gme model do not use mrope — qwen — by zhuikefeng986285005-byte (closed: 2025-12-22 14:11 (UTC+8))
- #31111 [ROCm][Refactor] Move the contiguous logic for ROCm in `torch_sdpa_wrapper` into MMEncoderAttention — rocm — by shen-shanshan (closed: 2025-12-22 12:40 (UTC+8))
- #20036 [New Model] Support `StableLMAlphaForCausalLM` — new-model — by b8zhong (closed: 2025-12-22 11:50 (UTC+8))
- #20901 [WIP][EPLB] Enable Llama4 EPLB — needs-rebase,llama — by b8zhong (closed: 2025-12-22 11:50 (UTC+8))