vLLM Development Activity Report - 2025-12-13

Time window: 2025-12-13 10:53 (UTC+8) ~ 2025-12-14 10:53 (UTC+8)
Statistics: New issues 11 | Closed issues 25 | New PRs 30 | Merged PRs 14 | Closed unmerged PRs 14

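📊 Daily Development Summary

During the December 13-14, 2025 window, the vLLM project remained highly active. It handled 36 issues (11 opened, 25 closed) and 44 pull requests (30 opened, 14 merged), and another 14 PRs were closed without merging. Development focused on two areas: hardware ecosystem enablement (especially performance work for NVIDIA's Blackwell family and AMD ROCm) and core architecture refactoring (notably the ongoing cleanup and optimization of the Mixture-of-Experts (MoE) code). Collaboration was healthy, and several newly filed "good first issue" items were quickly claimed by contributors.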

🎯 AMD/ROCm Ecosystem Activity

Activity directly related to the AMD ecosystem was light this period, but one notable PR targets ROCm performance.

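PR #30611: [ROCm][Perf] Replace cat to bmm’s inplace write when aiter enabled
- Contributor: ganyi1996ppo. The account name does not carry an "-amd" suffix, but the change explicitly targets the AMD ROCm platform.
- Technical details: On the decode path (when aiter is enabled), the PR replaces a torch.cat call with an in-place write by bmm (batched matrix multiplication): the bmm result goes directly into the output buffer instead of being produced separately and then concatenated.
- Impact: The author reports a performance gain of about 2.5% on MI308 GPUs. This shows continued fine-grained, kernel-level tuning of AMD hardware to improve inference efficiency.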
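The pattern behind PR #30611 can be illustrated without any GPU framework. In the old pattern, a batched matmul produces a temporary tensor that `torch.cat` then copies into a new, larger tensor. In the new pattern, the matmul writes straight into its slice of a preallocated output (PyTorch exposes this through the `out=` argument of `torch.bmm`). The sketch below uses nested Python lists as stand-in tensors. All function names are hypothetical, and it does not reproduce the actual ROCm/aiter code path.

```python
from typing import List

Matrix = List[List[float]]
Batch = List[Matrix]


def bmm(a: Batch, b: Batch) -> Batch:
    """Batched matmul that returns a freshly allocated result."""
    return [
        [[sum(row[t] * bi[t][c] for t in range(len(bi))) for c in range(len(bi[0]))]
         for row in ai]
        for ai, bi in zip(a, b)
    ]


def bmm_into(a: Batch, b: Batch, out: Batch, col_offset: int) -> Batch:
    """Batched matmul that writes its result directly into
    out[batch][row][col_offset : col_offset + n_cols] (no temporary)."""
    for batch, (ai, bi) in enumerate(zip(a, b)):
        for r, row in enumerate(ai):
            for c in range(len(bi[0])):
                out[batch][r][col_offset + c] = sum(row[t] * bi[t][c] for t in range(len(bi)))
    return out


def decode_with_cat(a: Batch, b: Batch, other: Batch) -> Batch:
    """Old pattern: allocate a temporary bmm result, then concatenate it
    with `other` along the last dimension (a second allocation plus a copy)."""
    tmp = bmm(a, b)
    return [[t_row + o_row for t_row, o_row in zip(tb, ob)] for tb, ob in zip(tmp, other)]


def decode_inplace(a: Batch, b: Batch, other: Batch) -> Batch:
    """New pattern: allocate the final buffer once and let the matmul fill
    its slice in place; only `other` is copied into the remaining columns."""
    n_cols = len(b[0][0])
    out = [[[0.0] * n_cols + list(o_row) for o_row in ob] for ob in other]
    return bmm_into(a, b, out, col_offset=0)
```

Both functions return the same values. The in-place version skips the intermediate allocation and the extra copy that concatenation requires, which is the kind of memory traffic such a change is meant to remove.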

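Analysis: This period saw no large-scale AMD platform enablement work (such as Quark integration or new MI300 features). However, steady and targeted performance PRs show that support and maintenance for existing AMD (ROCm) platforms continue.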

💬 Most-Discussed Threads

Issue #30617: vllm 12.0 run 120b tp=8 in blackwell 5060ti*7+5090 with hit nccl error in cuda graph
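- Core issue: On a heterogeneous setup of seven RTX 5060 Ti GPUs and one RTX 5090, a user running a large model with tensor parallelism (TP=8) hits an NCCL error during CUDA graph capture.
- Perspectives:
  - Reporter (gengchaogit): Provided detailed error logs, the launch command, and the system configuration. Notes that --enforce-eager (which disables CUDA graphs) avoids the error but performs poorly, and is looking for a root-cause fix.
  - Discussion: Centers on error analysis. The log shows "unhandled cuda error", which usually points to hardware or driver compatibility. The configuration (mixed GPU models, in particular newly released Blackwell consumer cards) is unusual.
- Points of contention: None directly; this is troubleshooting of NCCL and CUDA graph compatibility in a specific hardware environment.
- Status: Open. Maintainers have not yet proposed a clear fix, and the problem may reach the limits of driver or NCCL support for new hardware.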
Issue #30604: [ARM_CPU_backend] Engine core proc EngineCore_DP0 died unexpectedly
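- Core issue: After installing vLLM from source on an AWS Graviton 3 (ARM) CPU server, the engine core process crashes unexpectedly while running a basic test.
- Perspectives:
  - Reporter (Mengjintao): Provided detailed reproduction steps, environment information, and error logs. Both building from source and installing the prebuilt wheel failed.
  - Discussion: The reporter's interaction with the community has mostly consisted of supplying additional information. The error shows the engine core process exiting right after initialization, possibly due to inter-process communication or memory initialization in the ARM CPU backend.
- Points of contention: None.
- Status: Open. This seriously affects the usability of the ARM CPU backend and needs diagnosis by core developers.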
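Issues #30620 and #30621: Refactoring the FusedMoE layer
- Core issue: robertgshaw2-redhat opened two back-to-back "good first issue" tickets to clean up the FusedMoE (fused Mixture-of-Experts layer) code.
- Perspectives:
  - Author (robertgshaw2-redhat): Identified specific legacy baggage in the code. #30620 proposes removing chunking logic that chunked prefill has made unnecessary; #30621 proposes moving the MXFP4 quantization emulation logic out of vLLM's core fused_experts code and into the Quark quantization code path.
  - Other contributors: ProExpertProg supported "removing all unnecessary chunking". KonstGolfi and adityakamat24 responded quickly and claimed the two tasks.
- Points of contention: None; the thread reflects community consensus on improving code quality.
- Status: Both issues are open but already claimed, and are expected to be resolved by follow-up PRs. This reflects the ongoing refactoring and modularization of the project's core modules.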
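Issue #30621 concerns MXFP4 emulation logic. MXFP4, defined in the OCP Microscaling (MX) specification, stores blocks of 32 FP4 (E2M1) values that share a single power-of-two (E8M0) scale. In this context, emulation most likely means fake quantization: values are rounded onto the MXFP4 grid and immediately dequantized, so ordinary higher-precision kernels reproduce MXFP4 numerics on hardware without native FP4 support. The stdlib-only sketch below follows the spec's scale rule (floor(log2(amax)) minus the E2M1 maximum exponent of 2, saturating at ±6). It ignores E8M0 range limits and NaN handling, and its function names are hypothetical rather than taken from vLLM or Quark.

```python
import math
from typing import List

# Non-negative values representable by FP4 E2M1 (the sign is handled separately).
FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2        # exponent of the largest E2M1 value (6 = 1.5 * 2**2)
MX_BLOCK_SIZE = 32   # number of elements sharing one scale in MXFP4


def round_to_e2m1(x: float) -> float:
    """Round to the nearest E2M1 value, saturating at +/-6.
    Ties go to the even-mantissa neighbour, which sits at an even grid index."""
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP4_E2M1_GRID[-1])
    best = 0
    for i, v in enumerate(FP4_E2M1_GRID):
        d, best_d = abs(mag - v), abs(mag - FP4_E2M1_GRID[best])
        if d < best_d or (d == best_d and i % 2 == 0):
            best = i
    return sign * FP4_E2M1_GRID[best]


def fake_quant_block(block: List[float]) -> List[float]:
    """Quantize one block to MXFP4 and immediately dequantize it."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # Shared power-of-two scale; the E8M0 exponent range is not enforced here.
    scale = 2.0 ** (math.floor(math.log2(amax)) - E2M1_EMAX)
    return [round_to_e2m1(v / scale) * scale for v in block]


def mxfp4_emulate(values: List[float], block_size: int = MX_BLOCK_SIZE) -> List[float]:
    """Apply block-wise MXFP4 fake quantization to a flat list of values."""
    out: List[float] = []
    for start in range(0, len(values), block_size):
        out.extend(fake_quant_block(values[start:start + block_size]))
    return out
```

For example, `mxfp4_emulate([0.3, 7.0])` returns `[0.5, 6.0]`: the shared scale is 1, 0.3 rounds to the nearest E2M1 value 0.5, and 7.0 saturates at 6.0.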

🔥 Hot Topics and Trends

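New hardware support and performance tuning: Support for NVIDIA's Blackwell family (B300/GB300, SM103) was a clear hot spot. Related activity:
- Issue #30630: asks about the full support status of B300 and a SymmMemCommunicator warning.
- PR #30484: merged; adds basic support for SM103 (Blackwell Ultra).
- PR #30629: new; provides tuned fused MoE kernel configs for the GLM-4.6 model on B300 to improve startup time and performance.
Continued MoE refactoring: Cleanup and optimization of the Mixture-of-Experts code is the other main thread, spanning several issues and PRs (#30620, #30621, #30622, #30623, #30627) aimed at better maintainability and performance.
Qwen-family model issues: Qwen3-VL-MoE and Qwen3-Next hit model-specific errors (e.g., a masked_scatter_size_check failure with Qwen3VLMoE in #30624). This indicates that adaptation and testing for newly released complex architectures, particularly vision MoE models and hybrid models that mix attention with recurrent layers, still need strengthening.
Installation and build complexity: Several issues describe difficulties installing vLLM on different platforms (ARM CPU, macOS M1, Docker builds for specific CUDA versions), underscoring the project's complex dependencies and the difficulty of cross-platform support.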
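Tuned configs such as those in PR #30629 feed a lookup step. In vLLM's Triton fused-MoE path, tuning results are understood to be stored as per-device JSON files that map a token count M to kernel launch parameters, and at run time the entry whose M is closest to the actual batch size is chosen. The sketch below reproduces only that nearest-key lookup. The table values are made up for illustration and are not the B300 numbers from the PR; the function names are hypothetical.

```python
import json
from typing import Dict

# Hypothetical tuning table in the style of a fused-MoE config file:
# keys are token counts M, values are Triton launch parameters (illustrative only).
EXAMPLE_CONFIG_JSON = """
{
  "1":    {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 64,  "BLOCK_SIZE_K": 128, "GROUP_SIZE_M": 1,  "num_warps": 4, "num_stages": 3},
  "64":   {"BLOCK_SIZE_M": 32, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 128, "GROUP_SIZE_M": 8,  "num_warps": 4, "num_stages": 3},
  "1024": {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 256, "BLOCK_SIZE_K": 64,  "GROUP_SIZE_M": 16, "num_warps": 8, "num_stages": 4}
}
"""


def load_configs(text: str) -> Dict[int, dict]:
    # JSON object keys are strings; convert them to ints for numeric lookup.
    return {int(k): v for k, v in json.loads(text).items()}


def pick_config(configs: Dict[int, dict], num_tokens: int) -> dict:
    # Choose the entry whose tuned M is closest to the current batch size.
    best_m = min(configs, key=lambda m: abs(m - num_tokens))
    return configs[best_m]
```

With the table above, a batch of 50 tokens uses the M=64 entry. When no tuned file exists for a device, vLLM is understood to fall back to generic default parameters, which is why per-device tables can matter for performance.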

🛠️ Key Technical Changes

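PR #30484: [Feature] Add SM103 (Blackwell Ultra) Support to vLLM (merged)
- Interpretation: Adds initial support for NVIDIA's SM103 architecture (Blackwell Ultra, e.g., GB300 GPUs). It updates the device-capability detection logic so that vLLM correctly recognizes and runs on the new architecture.
- Impact: Marks official vLLM support for the latest Blackwell Ultra data-center GPUs and paves the way for large-scale model inference on this platform. The author reports having validated key paths such as quantization and MoE.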
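CUDA reports the compute capability as a (major, minor) pair, and SM103 is capability 10.3, the same value quoted in the SymmMemCommunicator warning of Issue #30630. The sketch below shows, under assumptions, why capability gating needs updating for a new architecture: an allow-list of exact values rejects 10.3, while a family-level check accepts it. The class uses the common `major * 10 + minor` flattening; the allow-list, function names, and policy are hypothetical and are not the code changed in PR #30484.

```python
from typing import NamedTuple


class DeviceCapability(NamedTuple):
    major: int
    minor: int

    def to_int(self) -> int:
        # Flatten (major, minor) to one integer: 10.3 -> 103, 9.0 -> 90.
        return self.major * 10 + self.minor


# Hypothetical allow-list written before SM103 existed: exact matches only.
EXACT_SUPPORTED = {90, 100}


def supported_exact(cap: DeviceCapability) -> bool:
    return cap.to_int() in EXACT_SUPPORTED


def supported_family(cap: DeviceCapability) -> bool:
    # Accept the whole 10.x family (e.g. SM100 and SM103) plus the exact list.
    return cap.major == 10 or cap.to_int() in EXACT_SUPPORTED


def check(cap: DeviceCapability, family_aware: bool = True) -> str:
    ok = supported_family(cap) if family_aware else supported_exact(cap)
    if ok:
        return "ok"
    return f"Device capability {cap.major}.{cap.minor} not supported"
```

With the family-aware check, 10.3 passes while a consumer Blackwell part reporting 12.0 is still rejected. Whether a given feature such as symmetric-memory communication should accept 10.3 is exactly the open question in Issue #30630.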
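PR #30627: [MoE][Refactor 1/N] Separate Online Quantization (in progress)
- Interpretation: The first step of the MoE refactoring series. It separates the online quantization logic (quantizing expert weights on the fly from an unquantized checkpoint when the model is loaded, rather than reading pre-quantized weights) from the existing code and defines it as a standalone QuantizationMethod.
- Impact: Improves modularity and readability and lays the groundwork for further optimizing and extending MoE quantization strategies; an important step in the long-term health of the MoE subsystem.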
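Splitting online quantization into its own method class is an instance of the strategy pattern: the layer owns its parameters but delegates post-load weight processing and the forward computation to a pluggable method object. This is a minimal sketch under assumptions. The hook names follow the `process_weights_after_loading` / `apply` naming used by vLLM's quantization methods, but the classes are hypothetical, a plain linear layer stands in for one MoE expert, and per-tensor symmetric 8-bit integer rounding stands in for the real online scheme.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class QuantMethod(ABC):
    """Hypothetical pluggable method; the layer only calls these hooks."""

    @abstractmethod
    def process_weights_after_loading(self, layer: "Layer") -> None:
        ...

    @abstractmethod
    def apply(self, layer: "Layer", x: List[float]) -> List[float]:
        ...


class UnquantizedMethod(QuantMethod):
    def process_weights_after_loading(self, layer: "Layer") -> None:
        pass  # keep the full-precision weights exactly as loaded

    def apply(self, layer: "Layer", x: List[float]) -> List[float]:
        return [sum(w * xi for w, xi in zip(row, x)) for row in layer.weight]


class OnlineQuantMethod(QuantMethod):
    """Quantizes full-precision weights once, right after loading.
    Per-tensor symmetric 8-bit integer rounding stands in for FP8 here."""

    def process_weights_after_loading(self, layer: "Layer") -> None:
        amax = max(abs(w) for row in layer.weight for w in row) or 1.0
        layer.scale = amax / 127.0
        layer.qweight = [[round(w / layer.scale) for w in row] for row in layer.weight]
        layer.weight = None  # free the full-precision copy

    def apply(self, layer: "Layer", x: List[float]) -> List[float]:
        return [layer.scale * sum(q * xi for q, xi in zip(row, x)) for row in layer.qweight]


class Layer:
    """Toy linear layer (one expert's weight matrix) that delegates to its method."""

    def __init__(self, weight: List[List[float]], method: QuantMethod):
        self.weight: Optional[List[List[float]]] = weight
        self.method = method
        self.scale = 1.0
        self.qweight: Optional[List[List[int]]] = None

    def finish_loading(self) -> None:
        self.method.process_weights_after_loading(self)

    def forward(self, x: List[float]) -> List[float]:
        return self.method.apply(self, x)
```

Swapping `UnquantizedMethod()` for `OnlineQuantMethod()` changes how weights are stored and used without touching `Layer` itself, which is the kind of separation such a refactor aims for.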
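PR #30611: [ROCm][Perf] Replace cat to bmm’s inplace write (in progress)
- Interpretation: A fine-grained performance optimization for the AMD ROCm platform. Replacing the tensor concatenation on the decode path with an in-place write by the batched matrix multiplication avoids a separate concatenation copy and reduces memory-operation overhead.
- Impact: Small in scope, but part of the ongoing polishing of AMD performance; the author reports a gain of about 2.5% on MI308 when aiter is enabled.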
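PR #30618: [BugFix][Hybrid] Fix prefill chunk incorrectly including draft tokens (in progress)
- Interpretation: Fixes a bug in hybrid models (e.g., Qwen3-Next, which combines attention with Mamba-style recurrent layers) where, with speculative decoding enabled, the prefill chunk incorrectly included draft tokens. As a result, the Mamba state was saved at the wrong length, producing incorrect output.
- Impact: Fixes a key defect in combining speculative decoding with state-space-model architectures, keeping this advanced inference feature correct across a wider range of models.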
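The defect fixed by PR #30618 is a token-accounting problem, and a toy model makes it concrete. The sketch assumes, hypothetically, that a request's scheduled tokens in one step consist of verified tokens plus some draft tokens, and that the hybrid layer checkpoints its recurrent state at a length derived from that count. All names are invented for illustration and do not come from vLLM's scheduler or from the PR.

```python
from dataclasses import dataclass


@dataclass
class StepPlan:
    """Hypothetical per-request bookkeeping for one scheduler step."""
    num_computed: int   # tokens whose recurrent state is already final
    num_scheduled: int  # tokens scheduled this step, possibly including drafts
    num_draft: int      # speculative (not yet verified) tokens among them

    def __post_init__(self) -> None:
        if not 0 <= self.num_draft <= self.num_scheduled:
            raise ValueError("draft tokens must be a subset of scheduled tokens")


def state_save_len_buggy(p: StepPlan) -> int:
    # Treats every scheduled token as committed, drafts included.
    return p.num_computed + p.num_scheduled


def state_save_len_fixed(p: StepPlan) -> int:
    # Drafts may still be rejected, so they must not extend the length at
    # which the Mamba-style state is checkpointed.
    return p.num_computed + p.num_scheduled - p.num_draft
```

With 100 computed tokens and 4 scheduled tokens of which 3 are drafts, the buggy rule checkpoints at 104 tokens even though only 101 are guaranteed to be correct.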

📈 Development Activity Observations

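Efficient merging: 14 PRs were merged this period, covering documentation, bug fixes, new features, and refactoring, which indicates a smooth review and merge process.
Community participation: Several new issues were labeled help wanted/good first issue and were quickly claimed by community members (e.g., KonstGolfi, adityakamat24), showing good newcomer onboarding and engagement.
Active core developers: Core contributors such as robertgshaw2-redhat, DarkLight1337, and Isotr0py were active across refactoring, bug fixes, and documentation.
New contributors: New faces such as navmarri14 (who submitted the tuned B300 configs) show that the project attracts optimization contributions from real-world deployments.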

💡 Issues Worth Watching

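Issue #30630: SymmMemCommunicator: Device capability 10.3 not supported: A user on a B300 instance received this warning and asked whether the GPU's performance is fully utilized. This concerns support and performance validation of advanced features (likely symmetric-memory communication) on new hardware, and maintainers need to clarify B300's support matrix and expected performance.
Issue #30617: NCCL error on heterogeneous Blackwell GPUs: The failure when running TP on a mixed cluster of consumer Blackwell cards may expose compatibility risks among CUDA graphs, NCCL, and new-hardware drivers in non-standard setups. It is a useful reference for users planning clusters built from the newest GPU models.
Issue #30604: ARM CPU backend crash: This is a key blocker for deploying vLLM on ARM servers. Core developers should prioritize it to uphold the project's commitment to the ARM CPU backend.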

📋 Appendix: Detailed Data

New Issues

#30630 [Usage]: SymmMemCommunicator: Device capability 10.3 not supported | usage | by navmarri14 (created: 2025-12-14 09:00 (UTC+8))
#30620 [Feature]: Remove Chunking From FusedMoE | help wanted,good first issue,feature request | by robertgshaw2-redhat (created: 2025-12-14 02:22 (UTC+8))
#30621 [Feature]: Remove MXFP4 Logic From fused_experts | help wanted,good first issue,feature request | by robertgshaw2-redhat (created: 2025-12-14 02:30 (UTC+8))
#30628 [Bug]: For building a CUDA 13 vLLM docker image, when building LMCache, wrong version of NIXL (nixl-cu12) is downloaded | bug,kv-connector,nvidia | by wangshangsam (created: 2025-12-14 06:50 (UTC+8))
#30624 [Bug]: masked_scatter_size_check failed when running Qwen3VLMoE | bug,qwen,nvidia | by soodoshll (created: 2025-12-14 03:39 (UTC+8))
#30622 [Feature]: [Refactor] Move zero_experts_compute_triton into model specific file | feature request | by zyongye (created: 2025-12-14 03:02 (UTC+8))
#30617 [Bug]: vllm 12.0 run 120b tp=8 in blackwell 5060ti*7+5090 with hit nccl error in cuda graph | bug | by gengchaogit (created: 2025-12-13 23:31 (UTC+8))
#30606 [Bug]: Installation fail on Macbook Pro M1 ARM64 with 64Go RAM: TypeError: unsupported operand type(s) for |: ‘type’ and ‘NoneType’ | bug | by pilere (created: 2025-12-13 18:38 (UTC+8))
#30604 [Installation]: [ARM_CPU_backend] Engine core proc EngineCore_DP0 died unexpectedly, shutting down client. | installation,aarch64-cpu | by Mengjintao (created: 2025-12-13 16:51 (UTC+8))
#30599 [Bug]: Qwen3-30B-A3B on vLLM-OpenAI v0.12.0 “hangs” in thinking mode: keeps reasoning until token cap | bug | by golyshevskii (created: 2025-12-13 15:41 (UTC+8))
#30595 [Bug]: Unsatisfiable testing dependencies | bug | by BlankRH (created: 2025-12-13 13:42 (UTC+8))

Closed Issues

#18801 [Performance]: How can i improve performance further in vllm lmcache PD Disaggregate? Plz Help Me | performance,stale | by mugglew (closed: 2025-12-14 10:16 (UTC+8))
#19136 [Bug]: Single-node multi-GPU inference: huge gap in results between tensor-parallel-size and pipeline-parallel-size | bug,stale | by Kenwwww (closed: 2025-12-14 10:16 (UTC+8))
#22120 [Usage]: Can vllm skip MTP layer loading for GLM-4.5 to save some vram | usage,stale | by adonishong (closed: 2025-12-14 10:16 (UTC+8))
#22291 [Feature]: how to calculate --gpu-memory-utilization for a given max concurrency such as 5 req/s, 20 req/s and 50 req/s | feature request,stale | by hobin2017 (closed: 2025-12-14 10:16 (UTC+8))
#22430 [Bug]: PP+PD NixlConnector failed | bug,stale | by R2-Y (closed: 2025-12-14 10:15 (UTC+8))
#22748 [Bug]: stop_profile take too long to finish | bug,stale | by gameofdimension (closed: 2025-12-14 10:15 (UTC+8))
#22780 [Performance]: Performance Drop with Concurrent Requests Using BnB-4bit Quantized Models in vLLM | performance,stale | by yhhtc1201 (closed: 2025-12-14 10:15 (UTC+8))
#22811 [Bug]: Load jina-ranker-m0 error | documentation,stale | by chen03191108-lab (closed: 2025-12-14 10:15 (UTC+8))
#22828 [Feature]: vLLM Shutdown and Logging | help wanted,feature request,stale | by robertgshaw2-redhat (closed: 2025-12-14 10:15 (UTC+8))
#22882 [Bug]: Loading a quantized model from S3 fails | bug,stale | by petrcezner (closed: 2025-12-14 10:15 (UTC+8))
#22884 [Bug]: When deploying the fine-tuned model using vllm, lora loses its effect. | bug,stale | by tiga-dudu (closed: 2025-12-14 10:15 (UTC+8))
#22886 [Bug]: After enabling the ‘--enable-expert-parallel’ switch and setting it to ‘deepep_high_throughput’ mode, an error occurred during inference | bug,stale | by un4gh (closed: 2025-12-14 10:15 (UTC+8))
#22901 [Bug]: Dequantized GPT-OSS Model Deployment Error | bug,stale | by satpalsr (closed: 2025-12-14 10:15 (UTC+8))
#22914 [Feature][UX]: Consolidate vLLM Configuration | feature request,stale | by robertgshaw2-redhat (closed: 2025-12-14 10:15 (UTC+8))
#22917 [Bug]: Llama4 pythonic parser doesn’t parse when it starts with raw content | bug,stale | by erkintelnyx (closed: 2025-12-14 10:15 (UTC+8))
#22922 [CI Failure]: v1/e2e/test_spec_decode.py::test_eagle_correctness[TREE_ATTN-llama3_eagle3] | stale,ci-failure | by mgoin (closed: 2025-12-14 10:15 (UTC+8))
#22931 [Feature]: FP8 e4m3fn->fnuz KV cache conversion | feature request,stale | by bradleyhd (closed: 2025-12-14 10:15 (UTC+8))
#22941 [Performance]: Run Arctic embedding benchmarks with V0 and V1 | performance,stale | by maxdebayser (closed: 2025-12-14 10:15 (UTC+8))
#22959 [Bug]: Fail to load auto_round quantization format with quantized lm_head | bug,stale | by n1ck-guo (closed: 2025-12-14 10:15 (UTC+8))
#22970 [Bug]: How to setup stable latency on VLLM for streaming | bug,stale | by saifulislam79 (closed: 2025-12-14 10:15 (UTC+8))
#22981 [Bug]: Unable to test openai/gpt-oss-120b via vllm | bug,stale | by psydok (closed: 2025-12-14 10:15 (UTC+8))
#23003 [Feature]: Support Diverse Beam Search (like HF transformers generation_strategies) | feature request,stale | by stonewst (closed: 2025-12-14 10:15 (UTC+8))
#30622 [Feature]: [Refactor] Move zero_experts_compute_triton into model specific file | feature request | by zyongye (closed: 2025-12-14 03:04 (UTC+8))
#29382 [Doc]: Expert Parallel Deployment says “Tensor parallel size (always 1 for now)” is confusing | documentation | by xeonliu (closed: 2025-12-14 01:38 (UTC+8))
#29778 [Bug]: Using image_embeds to input multimodal image embedding results causes array out-of-bounds during processing. | bug | by NewZxy (closed: 2025-12-13 15:48 (UTC+8))

New PRs

#30598 [LoRA] Set default MXFP4 LoRA backend to Marlin | no labels | by xyang16 (created: 2025-12-13 15:17 (UTC+8))
#30629 tuned fused configs for B300 | no labels | by navmarri14 (created: 2025-12-14 08:21 (UTC+8))
#30626 [docker] Restructure Dockerfile for more efficient and cache-friendly builds | documentation,ready,ci/build | by amrmahdi (created: 2025-12-14 05:02 (UTC+8))
#30627 [MoE][Refactor 1/N] Separate Online Quantization | no labels | by robertgshaw2-redhat (created: 2025-12-14 05:37 (UTC+8))
#30625 fix: prevent reasoning output when enable_thinking is false | frontend | by llsj14 (created: 2025-12-14 04:35 (UTC+8))
#30623 [Misc][Refactor] Refactor MoE router functions into separate classes | no labels | by bnellnm (created: 2025-12-14 03:30 (UTC+8))
#30619 [CI/Build] Ignore max transformers version skipping for initialization tests | ready | by Isotr0py (created: 2025-12-14 02:02 (UTC+8))
#30615 [Docs] Clarify Expert Parallel behavior for attention and MoE layers | documentation,ready | by majiayu000 (created: 2025-12-13 23:09 (UTC+8))
#30618 [BugFix][Hybrid] Fix prefill chunk incorrectly including draft tokens | v1 | by peakcrosser7 (created: 2025-12-14 01:08 (UTC+8))
#30591 [Scheduer] Simplify stop checking for pooling models | ready,v1 | by njhill (created: 2025-12-13 11:04 (UTC+8))
#30616 [Docs] Add FlashInfer environment variables to env_vars documentation | documentation | by majiayu000 (created: 2025-12-13 23:09 (UTC+8))
#30614 [Feature] Default EPLB num_redundant_experts to minimum valid value | no labels | by majiayu000 (created: 2025-12-13 23:08 (UTC+8))
#30613 [Bugfix] Add validation for tool requests when tool_parser is unavailable | frontend | by majiayu000 (created: 2025-12-13 23:08 (UTC+8))
#30609 [Refactor] TokenizerRegistry only uses lazy imports | structured-output,frontend,ready,v1 | by DarkLight1337 (created: 2025-12-13 21:01 (UTC+8))
#30612 [Chore] Remove redundant RequestPrompt | frontend,ready | by DarkLight1337 (created: 2025-12-13 22:30 (UTC+8))
#30611 [ROCm][Perf] Replace cat to bmm’s inplace write when aiter enabled | rocm,v1 | by ganyi1996ppo (created: 2025-12-13 22:30 (UTC+8))
#30610 Fix incorrect dimension in reduce_scatter | nvidia | by RKai025 (created: 2025-12-13 22:16 (UTC+8))
#30590 [ROCm][CI] Add “Qwen3-Next-80B-A3B-Instruct MTP Async EPLB Accuracy Test” Back Into AMD CI | rocm,ready,ci/build,qwen | by micah-wil (created: 2025-12-13 11:01 (UTC+8))
#30607 [Bugfix] Improve DCP error hint in cp_utils | v1 | by jliu9515 (created: 2025-12-13 20:07 (UTC+8))
#30608 [FixBug]fix gpt-oss v1/completions response bug | frontend,tool-calling,gpt-oss | by princepride (created: 2025-12-13 20:49 (UTC+8))
#30601 [Chore] Adjust tokenizer import to avoid circular imports | performance,structured-output,frontend,ready,v1,multi-modality,tool-calling | by DarkLight1337 (created: 2025-12-13 16:18 (UTC+8))
#30597 [CI/Build] Fix broken mm processor test Mistral-3-large | ready,multi-modality | by Isotr0py (created: 2025-12-13 14:22 (UTC+8))
#30596 [Bugfix][benchmarks] Fix input token calculation for rerank benchmark metrics | performance | by Flink-ddd (created: 2025-12-13 14:05 (UTC+8))
#30594 [docs][fix] Update Arm CPU vLLM wheel installation docs | documentation | by fadara01 (created: 2025-12-13 13:31 (UTC+8))
#30605 [Bugfix] Fix ScalarType NanRepr enum comparisons | no labels | by NoonePauseferg (created: 2025-12-13 17:44 (UTC+8))
#30603 just for testing dco | documentation,v1,qwen | by peakcrosser7 (created: 2025-12-13 16:31 (UTC+8))
#30602 [Quantization] Pass QuantizationArgs to compress-tensors schema’s get_min_capability | no labels | by Isotr0py (created: 2025-12-13 16:23 (UTC+8))
#30592 scheduler: cap prefill token admission under backlog to reduce tail latency | v1 | by Benjamindaoson (created: 2025-12-13 13:16 (UTC+8))
#30600 just testing | documentation,frontend,v1,qwen | by peakcrosser7 (created: 2025-12-13 15:56 (UTC+8))
#30593 [Misc] Improve error messages for unsupported types and parameters | performance,kv-connector,nvidia | by BlankRH (created: 2025-12-13 13:26 (UTC+8))

Merged PRs

#30577 [Bug][KVConnector][Metrics] Remove a vacuous assertion breaking external-launcher | ready,kv-connector,fb-exported,meta-exported | by QierLi (merged: 2025-12-14 09:23 (UTC+8))
#30414 [Doc] Add instructions for building docker image on GB300 with CUDA13 | documentation,ready,aarch64-cuda,nvidia | by soodoshll (merged: 2025-12-14 05:56 (UTC+8))
#29903 [Logs] Optimize startup logs 4 | ready,v1,nvidia | by yewentao256 (merged: 2025-12-14 05:12 (UTC+8))
#30509 [Doc] Add documents for multi-node distributed serving with MP backend | documentation,ready,v1 | by Isotr0py (merged: 2025-12-14 02:02 (UTC+8))
#30615 [Docs] Clarify Expert Parallel behavior for attention and MoE layers | documentation,ready | by majiayu000 (merged: 2025-12-14 01:37 (UTC+8))
#30591 [Scheduer] Simplify stop checking for pooling models | ready,v1 | by njhill (merged: 2025-12-13 17:45 (UTC+8))
#30459 set assume_32bit_indexing and pass unbacked hints | ready | by laithsakka (merged: 2025-12-13 23:36 (UTC+8))
#30609 [Refactor] TokenizerRegistry only uses lazy imports | structured-output,frontend,ready,v1 | by DarkLight1337 (merged: 2025-12-13 23:16 (UTC+8))
#30433 [Bugfix] Qwen3-next with --hf-overrides {"num_hidden_layers":8} | ready,qwen | by heheda12345 (merged: 2025-12-13 22:12 (UTC+8))
#30601 [Chore] Adjust tokenizer import to avoid circular imports | performance,structured-output,frontend,ready,v1,multi-modality,tool-calling | by DarkLight1337 (merged: 2025-12-13 20:42 (UTC+8))
#30597 [CI/Build] Fix broken mm processor test Mistral-3-large | ready,multi-modality | by Isotr0py (merged: 2025-12-13 20:43 (UTC+8))
#30507 [Bugfix] Dictionary MM embeddings for online chat | frontend,ready,v1 | by DarkLight1337 (merged: 2025-12-13 15:48 (UTC+8))
#30310 [Misc][Quantization] Clarify the intent of GGUF FusedMoE weight materialization | ready | by a4lg (merged: 2025-12-13 13:55 (UTC+8))
#30484 [Feature] Add SM103 (Blackwell Ultra) Support to vLLM | ready,v1,nvidia | by LopezCastroRoberto (merged: 2025-12-13 11:34 (UTC+8))

PRs Closed Without Merging

#18038 [Misc] Add torch.int16 to TORCH_DTYPE_TO_NUMPY_DTYPE conversion map | needs-rebase,stale | by rebel-jonghewk (closed: 2025-12-14 10:17 (UTC+8))
#18094 [Frontend] Vendor exported templates to vllm.tools | documentation,frontend,needs-rebase,stale,tool-calling | by aarnphm (closed: 2025-12-14 10:17 (UTC+8))
#18186 [Build] Refactor cmake | needs-rebase,ci/build,stale | by LucasWilkinson (closed: 2025-12-14 10:17 (UTC+8))
#18238 Add Blackwell to Hardware Matrix | documentation,needs-rebase,stale | by b8zhong (closed: 2025-12-14 10:17 (UTC+8))
#18277 [CI/Build] [TPU] Add test exit code in docker run to prevent silent failure | needs-rebase,ci/build,stale | by CAROLZXYZXY (closed: 2025-12-14 10:16 (UTC+8))
#21519 [Bugfix] Fix v1 engine crash in priority scheduling with parallel sampling (n > 1) | stale,v1 | by HongBeenKim (closed: 2025-12-14 10:16 (UTC+8))
#22438 [Speculators][Speculative Decoding] Add Eagle3 support for Qwen2 | stale,qwen | by hukongyi (closed: 2025-12-14 10:15 (UTC+8))
#22874 [CI][V0 Deprecation] Remove test_regression.py | ready,ci/build,stale | by robertgshaw2-redhat (closed: 2025-12-14 10:15 (UTC+8))
#22984 [ROCm][Bugfix] Add missing max_qlen argument | rocm,stale | by tuukkjs (closed: 2025-12-14 10:15 (UTC+8))
#27523 [Fix] Change default MXFP4 backend for SM90 to Marlin | no labels | by mmangkad (closed: 2025-12-14 09:10 (UTC+8))
#30096 [Bugfix] Use PIECEWISE cudagraph with gpt-oss on Ampere | needs-rebase,gpt-oss,nvidia | by bbrowning (closed: 2025-12-14 03:33 (UTC+8))
#30603 just for testing dco | documentation,v1,qwen | by peakcrosser7 (closed: 2025-12-13 16:32 (UTC+8))
#30600 just testing | documentation,frontend,v1,qwen | by peakcrosser7 (closed: 2025-12-13 15:56 (UTC+8))
#24382 [Bugfix] request_id abort-reuse race fix | v1 | by ben11211 (closed: 2025-12-13 12:45 (UTC+8))