vLLM Development Activity Report - 2025-12-13
Time window: 2025-12-13 10:53 (UTC+8) ~ 2025-12-14 10:53 (UTC+8). Statistics: 11 new issues | 25 closed issues | 30 new PRs | 14 merged PRs | 14 PRs closed without merging
📊 Daily Development Status Summary
During the 2025-12-13 to 12-14 window, the vLLM project remained highly active, handling 36 issues (11 opened, 25 closed) and 44 pull requests (30 opened, 14 merged). Development focused on two fronts: hardware-ecosystem enablement (notably performance work for NVIDIA's Blackwell series and AMD ROCm) and core architecture refactoring (especially continued cleanup and optimization of the mixture-of-experts stack). Community collaboration was healthy: several newly filed "good first issue" tickets were claimed quickly by contributors.
🎯 AMD/ROCm Ecosystem Activity
Activity directly tied to the AMD ecosystem was light this cycle, but one notable PR concerns ROCm performance optimization.
- PR #30611: [ROCm][Perf] Replace cat to bmm's inplace write when aiter enabled
  - Contributor: ganyi1996ppo. The username carries no "-amd" suffix, but the change explicitly targets the AMD ROCm platform.
  - Technical details: the PR replaces the `torch.cat` operation on the decode path (when `aiter` is enabled) with an in-place write performed by `bmm` (batched matrix multiplication).
  - Impact: the author reports roughly a 2.5% performance gain on MI308 GPUs. This shows the community continuing fine-grained, kernel-level optimization for AMD hardware to improve inference efficiency.

Analysis: although this cycle saw no large-scale AMD platform enablement (e.g. Quark or new MI300 features), steady, targeted performance PRs like this one indicate that support and maintenance of the existing AMD (ROCm) platform continues apace.
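The core idea behind this kind of cat-to-in-place-write change can be sketched outside vLLM. The snippet below is a NumPy stand-in (the actual PR operates on ROCm tensors, presumably via `torch.bmm` with a preallocated output, which is an assumption about its mechanics): rather than materializing two matmul results and concatenating them, which costs an extra allocation and a copy, it preallocates the combined buffer and writes each batched matmul straight into its slice.

```python
import numpy as np

# NumPy stand-in for the cat -> in-place-bmm rewrite (illustrative only).
B, M, K, N = 4, 8, 16, 32
a1 = np.random.rand(B, M, K); w1 = np.random.rand(B, K, N)
a2 = np.random.rand(B, M, K); w2 = np.random.rand(B, K, N)

# Baseline: two batched matmuls, then an allocating concatenation.
baseline = np.concatenate([a1 @ w1, a2 @ w2], axis=-1)

# Optimized: preallocate the fused output once and write each batched
# matmul directly into its slice, skipping the concatenation copy.
out = np.empty((B, M, 2 * N))
np.matmul(a1, w1, out=out[..., :N])
np.matmul(a2, w2, out=out[..., N:])

assert np.allclose(baseline, out)
```

On a GPU, removing the concatenation saves a kernel launch and a full read/write pass over the combined tensor, which is consistent with the small but measurable speedup reported on MI308.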
💬 High-Profile Discussions
- Issue #30617: vllm 12.0 run 120b tp=8 in blackwell 5060ti*7+5090 with hit nccl error in cuda graph
  - Core issue: on a heterogeneous GPU cluster of seven RTX 5060 Ti cards plus one RTX 5090, running a large model with tensor parallelism (TP=8) fails with an NCCL error during CUDA graph capture.
  - Viewpoints:
    - Reporter (gengchaogit): supplied detailed error logs, the launch command, and the system configuration; notes that `--enforce-eager` avoids the error at a significant performance cost, and hopes for a root-cause fix.
    - Discussion: focuses on error analysis. The logs show an "unhandled cuda error", usually associated with hardware or driver compatibility. The setup (mixed GPU models, in particular the newly released consumer-grade Blackwell cards) is unusual.
  - Point of contention: none as such; this is troubleshooting of NCCL and CUDA graph compatibility in a specific hardware environment.
  - Current status: open. The community (maintainers included) has not yet offered a definitive fix; the problem may sit at the edge of what the underlying driver or NCCL library supports on the new hardware.
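As a stopgap (the workaround the reporter already uses), CUDA graph capture can be disabled entirely. A hedged sketch of such a launch, where `<model>` is a placeholder for the actual checkpoint; `--tensor-parallel-size` and `--enforce-eager` are standard vLLM server flags:

```shell
# Disable CUDA graph capture and run in eager mode. This sidesteps the
# capture-time NCCL failure but lowers throughput; replace <model> with
# the real model path or Hugging Face id.
vllm serve <model> \
    --tensor-parallel-size 8 \
    --enforce-eager
```

This only avoids the symptom; the root cause reported in the issue (NCCL behavior on mixed consumer Blackwell cards) still needs a driver- or NCCL-level answer.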
- Issue #30604: [ARM_CPU_backend] Engine core proc EngineCore_DP0 died unexpectedly
  - Core issue: after installing vLLM from source on an AWS Graviton 3 ARM CPU server, the engine core process crashes unexpectedly when running a basic test.
  - Viewpoints:
    - Reporter (Mengjintao): provided thorough reproduction steps, environment details, and error logs; both installing from source and installing a prebuilt wheel failed.
    - Discussion: so far mostly the reporter supplying additional information. The error points to the engine core process exiting right after initialization, possibly an inter-process communication or memory-initialization problem in the ARM CPU backend.
  - Point of contention: none.
  - Current status: open. This is a serious problem for ARM CPU backend usability and needs core-developer diagnosis.
- Issues #30620 and #30621: refactoring the FusedMoE layer
  - Core issue: robertgshaw2-redhat filed two back-to-back "good first issue" tickets aimed at cleaning up the FusedMoE (fused mixture-of-experts) layer.
  - Viewpoints:
    - Author (robertgshaw2-redhat): called out specific historical baggage in the code: #30620 proposes removing chunking logic made unnecessary by chunked prefill; #30621 proposes moving the MXFP4 quantization-emulation logic out of vLLM core and into the quark quantization tool.
    - Other contributors: ProExpertProg voiced support for "removing all unnecessary chunking"; KonstGolfi and adityakamat24 quickly responded and claimed the two tasks.
  - Point of contention: none; the community agrees on the code-quality cleanup.
  - Current status: both issues remain open but are claimed, and should be resolved by follow-up PRs. This reflects ongoing refactoring and modularization of a core subsystem.
🔥 Hot Topics and Trends
- New hardware support and performance tuning: support for NVIDIA's Blackwell series (B300/GB300, SM103) is a clear hotspot. Related activity:
  - Issue #30630: asks about the full B300 support status and a `SymmMemCommunicator` warning.
  - PR #30484: merged; adds baseline support for SM103 (Blackwell Ultra).
  - PR #30629: new; provides tuned fused-MoE kernel configurations for the GLM-4.6 model on B300, to improve startup time and performance.
- Continued MoE refactoring: code cleanup and optimization around mixture-of-experts models is another main thread, spanning multiple issues and PRs aimed at improving maintainability and performance.
- Qwen-family model problems: Qwen3-VL-MoE and Qwen3-Next hit specific runtime errors (e.g. `masked_scatter_size_check`), suggesting that adaptation and testing for newly released complex architectures (especially vision MoE and hybrid modalities) still need strengthening.
- Installation and build complexity: several issues report difficulty installing vLLM on various platforms (ARM CPU, macOS M1, Docker builds against specific CUDA versions), highlighting the project's dependency complexity and the difficulty of cross-platform support.
🛠️ Key Technical Changes
- PR #30484: [Feature] Add SM103 (Blackwell Ultra) Support to vLLM (merged)
  - Summary: adds initial support for NVIDIA's SM103 architecture (Blackwell Ultra, e.g. the GB300 GPU), updating device-capability detection so vLLM correctly identifies and runs on the new architecture.
  - Impact: marks official vLLM support for the newest Blackwell Ultra datacenter GPUs, paving the way for large-scale inference on the platform. The author verified key paths such as quantization and MoE.
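To illustrate what "device-capability detection" means here, consider a hypothetical capability gate of the kind such a PR extends. The names and allow-list below are assumptions for illustration, not vLLM's actual API; a real implementation would query the capability at runtime, e.g. via `torch.cuda.get_device_capability()`, and dispatch accordingly.

```python
def device_capability_supported(major: int, minor: int) -> bool:
    """Hypothetical allow-list gate; SM103 reports compute capability 10.3."""
    supported = {(9, 0), (10, 0), (10, 3)}  # 10.3 is the new Blackwell Ultra entry
    return (major, minor) in supported

assert device_capability_supported(10, 3)      # Blackwell Ultra now accepted
assert not device_capability_supported(8, 6)   # unlisted capability rejected
```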
- PR #30627: [MoE][Refactor 1/N] Separate Online Quantization (in progress)
  - Summary: the first step of the MoE refactor series; it separates online quantization (dynamically quantizing expert weights at inference time) out of the existing code, defining it as a standalone `QuantizationMethod`.
  - Impact: improves modularity and clarity, laying the groundwork for further optimization and extension of MoE quantization strategies; an important step for the long-term health of the MoE subsystem.
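For readers unfamiliar with the term: "online quantization" means computing scales and quantizing weights at inference time rather than offline. A minimal, self-contained sketch of the idea follows; the function names are illustrative and this is not vLLM's `QuantizationMethod` interface.

```python
import numpy as np

def quantize_online(w: np.ndarray, n_bits: int = 8):
    """Dynamically derive a per-tensor scale from live weights, then quantize."""
    qmax = 2 ** (n_bits - 1) - 1                       # 127 for int8
    scale = float(np.abs(w).max()) / qmax              # scale computed "online"
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.array([[0.5, -1.27], [0.01, 1.0]], dtype=np.float32)
q, s = quantize_online(w)
# Round-trip error is bounded by half a quantization step.
assert np.max(np.abs(dequantize(q, s) - w)) <= s / 2 + 1e-9
```

Factoring this behavior into its own method object, as the PR reportedly does, lets the MoE layer treat offline and online quantization uniformly.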
- PR #30611: [ROCm][Perf] Replace cat to bmm's inplace write (in progress)
  - Summary: a fine-grained performance optimization for the AMD ROCm platform. Replacing the tensor concatenation on the decode path with an in-place write from a batched matrix multiplication reduces memory-operation overhead.
  - Impact: a small change, but part of the continued polishing of AMD platform performance; it delivers a measurable gain under specific conditions.
- PR #30618: [BugFix][Hybrid] Fix prefill chunk incorrectly including draft tokens (in progress)
  - Summary: fixes a bug where, when using speculative decoding with hybrid models (e.g. Mamba-equipped Qwen3-Next), a prefill chunk wrongly included draft tokens. This caused the Mamba state machine to save state of the wrong length and in turn produce incorrect output.
  - Impact: repairs a key defect at the intersection of speculative decoding and state-space model architectures, ensuring that speculative decoding works correctly on a wider range of models.
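The failure mode can be pictured with a toy chunk-length computation. This is entirely schematic; the function and parameter names are invented for illustration and are not taken from vLLM. The point is that draft tokens proposed by the speculator are unverified and must not count toward the committed prefill chunk, or downstream state saving sees the wrong length.

```python
def prefill_chunk_len(num_scheduled: int, num_draft: int, chunk_cap: int) -> int:
    """Schematic: tokens committed to a prefill chunk must exclude drafts.

    A buggy variant would compute min(num_scheduled, chunk_cap), silently
    counting unverified draft tokens and corrupting saved state lengths.
    """
    return min(num_scheduled - num_draft, chunk_cap)

# 10 scheduled tokens, 3 of which are speculative drafts: only 7 are real.
assert prefill_chunk_len(num_scheduled=10, num_draft=3, chunk_cap=8) == 7
```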
📈 Development Activity Observations
- Efficient merging: 14 PRs were merged this cycle, spanning documentation, bug fixes, features, and refactors, indicating a smooth review and merge pipeline.
- Community engagement: several new issues labeled help wanted / good first issue were claimed quickly by community members (e.g. KonstGolfi, adityakamat24), a sign of good newcomer onboarding and participation.
- Active core developers: robertgshaw2-redhat, DarkLight1337, Isotr0py and other core contributors were active across refactoring, bug fixing, and documentation.
- New contributors: fresh faces such as navmarri14 (who submitted tuned B300 configurations) show the project attracting optimization work from real deployment scenarios.
💡 Issues Worth Watching
- Issue #30630: SymmMemCommunicator: Device capability 10.3 not supported: a user on a B300 instance sees this warning and asks whether the hardware's performance is fully unlocked. This concerns support and performance validation of advanced features (likely symmetric-memory communication) on new hardware; an official statement on the B300 support matrix and performance expectations is needed.
- Issue #30617: NCCL error on heterogeneous Blackwell GPUs: running TP on a mixed cluster of consumer-grade Blackwell cards may expose compatibility risks between CUDA graphs, NCCL, and the new hardware's drivers in non-standard environments; a useful reference for anyone building clusters on the new models.
- Issue #30604: ARM CPU backend crash: a key blocker for deploying vLLM on ARM servers. Core developers should prioritize diagnosis to uphold the project's commitment to the ARM CPU backend.
📋 Appendix: Detailed Data
New Issues
- #30630 [Usage]: SymmMemCommunicator: Device capability 10.3 not supported — usage — by navmarri14 (created: 2025-12-14 09:00 (UTC+8))
- #30620 [Feature]: Remove Chunking From FusedMoE — help wanted, good first issue, feature request — by robertgshaw2-redhat (created: 2025-12-14 02:22 (UTC+8))
- #30621 [Feature]: Remove MXFP4 Logic From `fused_experts` — help wanted, good first issue, feature request — by robertgshaw2-redhat (created: 2025-12-14 02:30 (UTC+8))
- #30628 [Bug]: For building a CUDA 13 vLLM docker image, when building LMCache, wrong version of NIXL (`nixl-cu12`) is downloaded — bug, kv-connector, nvidia — by wangshangsam (created: 2025-12-14 06:50 (UTC+8))
- #30624 [Bug]: `masked_scatter_size_check` failed when running Qwen3VLMoE — bug, qwen, nvidia — by soodoshll (created: 2025-12-14 03:39 (UTC+8))
- #30622 [Feature]: [Refactor] Move `zero_experts_compute_triton` into model specific file — feature request — by zyongye (created: 2025-12-14 03:02 (UTC+8))
- #30617 [Bug]: vllm 12.0 run 120b tp=8 in blackwell 5060ti*7+5090 with hit nccl error in cuda graph — bug — by gengchaogit (created: 2025-12-13 23:31 (UTC+8))
- #30606 [Bug]: Installation fail on Macbook Pro M1 ARM64 with 64Go RAM: TypeError: unsupported operand type(s) for : 'type' and 'NoneType' — bug — by pilere (created: 2025-12-13 18:38 (UTC+8))
- #30604 [Installation]: [ARM_CPU_backend] Engine core proc EngineCore_DP0 died unexpectedly, shutting down client. — installation, aarch64-cpu — by Mengjintao (created: 2025-12-13 16:51 (UTC+8))
- #30599 [Bug]: Qwen3-30B-A3B on vLLM-OpenAI v0.12.0 "hangs" in thinking mode: keeps reasoning until token cap — bug — by golyshevskii (created: 2025-12-13 15:41 (UTC+8))
- #30595 [Bug]: Unsatisfiable testing dependencies — bug — by BlankRH (created: 2025-12-13 13:42 (UTC+8))
Closed Issues
- #18801 [Performance]: How can i improve performance further in vllm lmcache PD Disaggregate? Plz Help Me — performance, stale — by mugglew (closed: 2025-12-14 10:16 (UTC+8))
- #19136 [Bug]: Single-node multi-GPU inference: results differ drastically between tensor-parallel-size and pipeline-parallel-size — bug, stale — by Kenwwww (closed: 2025-12-14 10:16 (UTC+8))
- #22120 [Usage]: Can vllm skip MTP layer loading for GLM-4.5 to save some vram — usage, stale — by adonishong (closed: 2025-12-14 10:16 (UTC+8))
- #22291 [Feature]: how to calculate --gpu-memory-utilization for a given max concurrency such as 5 req/s, 20 req/s and 50 req/s — feature request, stale — by hobin2017 (closed: 2025-12-14 10:16 (UTC+8))
- #22430 [Bug]: PP+PD NixlConnector failed — bug, stale — by R2-Y (closed: 2025-12-14 10:15 (UTC+8))
- #22748 [Bug]: stop_profile take too long to finish — bug, stale — by gameofdimension (closed: 2025-12-14 10:15 (UTC+8))
- #22780 [Performance]: Performance Drop with Concurrent Requests Using BnB-4bit Quantized Models in vLLM — performance, stale — by yhhtc1201 (closed: 2025-12-14 10:15 (UTC+8))
- #22811 [Bug]: Load jina-ranker-m0 error — documentation, stale — by chen03191108-lab (closed: 2025-12-14 10:15 (UTC+8))
- #22828 [Feature]: vLLM Shutdown and Logging — help wanted, feature request, stale — by robertgshaw2-redhat (closed: 2025-12-14 10:15 (UTC+8))
- #22882 [Bug]: Loading a quantized model from S3 fails — bug, stale — by petrcezner (closed: 2025-12-14 10:15 (UTC+8))
- #22884 [Bug]: When deploying the fine-tuned model using vllm, lora loses its effect. — bug, stale — by tiga-dudu (closed: 2025-12-14 10:15 (UTC+8))
- #22886 [Bug]: After enabling the '--enable-expert-parallel' switch and setting it to 'deepep_high_throughput' mode, an error occurred during inference — bug, stale — by un4gh (closed: 2025-12-14 10:15 (UTC+8))
- #22901 [Bug]: Dequantized GPT-OSS Model Deployment Error — bug, stale — by satpalsr (closed: 2025-12-14 10:15 (UTC+8))
- #22914 [Feature][UX]: Consolidate vLLM Configuration — feature request, stale — by robertgshaw2-redhat (closed: 2025-12-14 10:15 (UTC+8))
- #22917 [Bug]: Llama4 pythonic parser doesn't parse when it starts with raw content — bug, stale — by erkintelnyx (closed: 2025-12-14 10:15 (UTC+8))
- #22922 [CI Failure]: v1/e2e/test_spec_decode.py::test_eagle_correctness[TREE_ATTN-llama3_eagle3] — stale, ci-failure — by mgoin (closed: 2025-12-14 10:15 (UTC+8))
- #22931 [Feature]: FP8 e4m3fn->fnuz KV cache conversion — feature request, stale — by bradleyhd (closed: 2025-12-14 10:15 (UTC+8))
- #22941 [Performance]: Run Arctic embedding benchmarks with V0 and V1 — performance, stale — by maxdebayser (closed: 2025-12-14 10:15 (UTC+8))
- #22959 [Bug]: Fail to load auto_round quantization format with quantized lm_head — bug, stale — by n1ck-guo (closed: 2025-12-14 10:15 (UTC+8))
- #22970 [Bug]: How to setup stable latency on VLLM for streaming — bug, stale — by saifulislam79 (closed: 2025-12-14 10:15 (UTC+8))
- #22981 [Bug]: Unable to test openai/gpt-oss-120b via vllm — bug, stale — by psydok (closed: 2025-12-14 10:15 (UTC+8))
- #23003 [Feature]: Support Diverse Beam Search (like HF transformers generation_strategies) — feature request, stale — by stonewst (closed: 2025-12-14 10:15 (UTC+8))
- #30622 [Feature]: [Refactor] Move `zero_experts_compute_triton` into model specific file — feature request — by zyongye (closed: 2025-12-14 03:04 (UTC+8))
- #29382 [Doc]: Expert Parallel Deployment says "Tensor parallel size (always 1 for now)" is confusing — documentation — by xeonliu (closed: 2025-12-14 01:38 (UTC+8))
- #29778 [Bug]: Using image_embeds to input multimodal image embedding results causes array out-of-bounds during processing. — bug — by NewZxy (closed: 2025-12-13 15:48 (UTC+8))
New PRs
- #30598 [LoRA] Set default MXFP4 LoRA backend to Marlin — no labels — by xyang16 (created: 2025-12-13 15:17 (UTC+8))
- #30629 tuned fused configs for B300 — no labels — by navmarri14 (created: 2025-12-14 08:21 (UTC+8))
- #30626 [docker] Restructure Dockerfile for more efficient and cache-friendly builds — documentation, ready, ci/build — by amrmahdi (created: 2025-12-14 05:02 (UTC+8))
- #30627 [MoE][Refactor 1/N] Separate Online Quantization — no labels — by robertgshaw2-redhat (created: 2025-12-14 05:37 (UTC+8))
- #30625 fix: prevent reasoning output when enable_thinking is false — frontend — by llsj14 (created: 2025-12-14 04:35 (UTC+8))
- #30623 [Misc][Refactor] Refactor MoE router functions into separate classes — no labels — by bnellnm (created: 2025-12-14 03:30 (UTC+8))
- #30619 [CI/Build] Ignore max transformers version skipping for initialization tests — ready — by Isotr0py (created: 2025-12-14 02:02 (UTC+8))
- #30615 [Docs] Clarify Expert Parallel behavior for attention and MoE layers — documentation, ready — by majiayu000 (created: 2025-12-13 23:09 (UTC+8))
- #30618 [BugFix][Hybrid] Fix prefill chunk incorrectly including draft tokens — v1 — by peakcrosser7 (created: 2025-12-14 01:08 (UTC+8))
- #30591 [Scheduer] Simplify stop checking for pooling models — ready, v1 — by njhill (created: 2025-12-13 11:04 (UTC+8))
- #30616 [Docs] Add FlashInfer environment variables to env_vars documentation — documentation — by majiayu000 (created: 2025-12-13 23:09 (UTC+8))
- #30614 [Feature] Default EPLB num_redundant_experts to minimum valid value — no labels — by majiayu000 (created: 2025-12-13 23:08 (UTC+8))
- #30613 [Bugfix] Add validation for tool requests when tool_parser is unavailable — frontend — by majiayu000 (created: 2025-12-13 23:08 (UTC+8))
- #30609 [Refactor] `TokenizerRegistry` only uses lazy imports — structured-output, frontend, ready, v1 — by DarkLight1337 (created: 2025-12-13 21:01 (UTC+8))
- #30612 [Chore] Remove redundant `RequestPrompt` — frontend, ready — by DarkLight1337 (created: 2025-12-13 22:30 (UTC+8))
- #30611 [ROCm][Perf] Replace cat to bmm's inplace write when aiter enabled — rocm, v1 — by ganyi1996ppo (created: 2025-12-13 22:30 (UTC+8))
- #30610 Fix incorrect dimension in reduce_scatter — nvidia — by RKai025 (created: 2025-12-13 22:16 (UTC+8))
- #30590 [ROCm][CI] Add "Qwen3-Next-80B-A3B-Instruct MTP Async EPLB Accuracy Test" Back Into AMD CI — rocm, ready, ci/build, qwen — by micah-wil (created: 2025-12-13 11:01 (UTC+8))
- #30607 [Bugfix] Improve DCP error hint in cp_utils — v1 — by jliu9515 (created: 2025-12-13 20:07 (UTC+8))
- #30608 [FixBug]fix gpt-oss v1/completions response bug — frontend, tool-calling, gpt-oss — by princepride (created: 2025-12-13 20:49 (UTC+8))
- #30601 [Chore] Adjust tokenizer import to avoid circular imports — performance, structured-output, frontend, ready, v1, multi-modality, tool-calling — by DarkLight1337 (created: 2025-12-13 16:18 (UTC+8))
- #30597 [CI/Build] Fix broken mm processor test Mistral-3-large — ready, multi-modality — by Isotr0py (created: 2025-12-13 14:22 (UTC+8))
- #30596 [Bugfix][benchmarks] Fix input token calculation for rerank benchmark metrics — performance — by Flink-ddd (created: 2025-12-13 14:05 (UTC+8))
- #30594 [docs][fix] Update Arm CPU vLLM wheel installation docs — documentation — by fadara01 (created: 2025-12-13 13:31 (UTC+8))
- #30605 [Bugfix] Fix ScalarType NanRepr enum comparisons — no labels — by NoonePauseferg (created: 2025-12-13 17:44 (UTC+8))
- #30603 just for testing dco — documentation, v1, qwen — by peakcrosser7 (created: 2025-12-13 16:31 (UTC+8))
- #30602 [Quantization] Pass `QuantizationArgs` to compress-tensors schema's get_min_capability — no labels — by Isotr0py (created: 2025-12-13 16:23 (UTC+8))
- #30592 scheduler: cap prefill token admission under backlog to reduce tail latency — v1 — by Benjamindaoson (created: 2025-12-13 13:16 (UTC+8))
- #30600 just testing — documentation, frontend, v1, qwen — by peakcrosser7 (created: 2025-12-13 15:56 (UTC+8))
- #30593 [Misc] Improve error messages for unsupported types and parameters — performance, kv-connector, nvidia — by BlankRH (created: 2025-12-13 13:26 (UTC+8))
Merged PRs
- #30577 [Bug][KVConnector][Metrics] Remove a vacuous assertion breaking external-launcher — ready, kv-connector, fb-exported, meta-exported — by QierLi (merged: 2025-12-14 09:23 (UTC+8))
- #30414 [Doc] Add instructions for building docker image on GB300 with CUDA13 — documentation, ready, aarch64-cuda, nvidia — by soodoshll (merged: 2025-12-14 05:56 (UTC+8))
- #29903 [Logs] Optimize startup logs 4 — ready, v1, nvidia — by yewentao256 (merged: 2025-12-14 05:12 (UTC+8))
- #30509 [Doc] Add documents for multi-node distributed serving with MP backend — documentation, ready, v1 — by Isotr0py (merged: 2025-12-14 02:02 (UTC+8))
- #30615 [Docs] Clarify Expert Parallel behavior for attention and MoE layers — documentation, ready — by majiayu000 (merged: 2025-12-14 01:37 (UTC+8))
- #30591 [Scheduer] Simplify stop checking for pooling models — ready, v1 — by njhill (merged: 2025-12-13 17:45 (UTC+8))
- #30459 set assume_32bit_indexing and pass unbacked hints — ready — by laithsakka (merged: 2025-12-13 23:36 (UTC+8))
- #30609 [Refactor] `TokenizerRegistry` only uses lazy imports — structured-output, frontend, ready, v1 — by DarkLight1337 (merged: 2025-12-13 23:16 (UTC+8))
- #30433 [Bugfix] Qwen3-next with --hf-overrides {"num_hidden_layers":8} — ready, qwen — by heheda12345 (merged: 2025-12-13 22:12 (UTC+8))
- #30601 [Chore] Adjust tokenizer import to avoid circular imports — performance, structured-output, frontend, ready, v1, multi-modality, tool-calling — by DarkLight1337 (merged: 2025-12-13 20:42 (UTC+8))
- #30597 [CI/Build] Fix broken mm processor test Mistral-3-large — ready, multi-modality — by Isotr0py (merged: 2025-12-13 20:43 (UTC+8))
- #30507 [Bugfix] Dictionary MM embeddings for online chat — frontend, ready, v1 — by DarkLight1337 (merged: 2025-12-13 15:48 (UTC+8))
- #30310 [Misc][Quantization] Clarify the intent of GGUF `FusedMoE` weight materialization — ready — by a4lg (merged: 2025-12-13 13:55 (UTC+8))
- #30484 [Feature] Add SM103 (Blackwell Ultra) Support to vLLM — ready, v1, nvidia — by LopezCastroRoberto (merged: 2025-12-13 11:34 (UTC+8))
Closed (Unmerged) PRs
- #18038 [Misc] Add torch.int16 to TORCH_DTYPE_TO_NUMPY_DTYPE conversion map — needs-rebase, stale — by rebel-jonghewk (closed: 2025-12-14 10:17 (UTC+8))
- #18094 [Frontend] Vendor exported templates to `vllm.tools` — documentation, frontend, needs-rebase, stale, tool-calling — by aarnphm (closed: 2025-12-14 10:17 (UTC+8))
- #18186 [Build] Refactor cmake — needs-rebase, ci/build, stale — by LucasWilkinson (closed: 2025-12-14 10:17 (UTC+8))
- #18238 Add Blackwell to Hardware Matrix — documentation, needs-rebase, stale — by b8zhong (closed: 2025-12-14 10:17 (UTC+8))
- #18277 [CI/Build] [TPU] Add test exit code in docker run to prevent silent failure — needs-rebase, ci/build, stale — by CAROLZXYZXY (closed: 2025-12-14 10:16 (UTC+8))
- #21519 [Bugfix] Fix v1 engine crash in priority scheduling with parallel sampling (n > 1) — stale, v1 — by HongBeenKim (closed: 2025-12-14 10:16 (UTC+8))
- #22438 [Speculators][Speculative Decoding] Add Eagle3 support for Qwen2 — stale, qwen — by hukongyi (closed: 2025-12-14 10:15 (UTC+8))
- #22874 [CI][V0 Deprecation] Remove `test_regression.py` — ready, ci/build, stale — by robertgshaw2-redhat (closed: 2025-12-14 10:15 (UTC+8))
- #22984 [ROCm][Bugfix] Add missing max_qlen argument — rocm, stale — by tuukkjs (closed: 2025-12-14 10:15 (UTC+8))
- #27523 [Fix] Change default MXFP4 backend for SM90 to Marlin — no labels — by mmangkad (closed: 2025-12-14 09:10 (UTC+8))
- #30096 [Bugfix] Use PIECEWISE cudagraph with gpt-oss on Ampere — needs-rebase, gpt-oss, nvidia — by bbrowning (closed: 2025-12-14 03:33 (UTC+8))
- #30603 just for testing dco — documentation, v1, qwen — by peakcrosser7 (closed: 2025-12-13 16:32 (UTC+8))
- #30600 just testing — documentation, frontend, v1, qwen — by peakcrosser7 (closed: 2025-12-13 15:56 (UTC+8))
- #24382 [Bugfix] request_id abort-reuse race fix — v1 — by ben11211 (closed: 2025-12-13 12:45 (UTC+8))