A Practical Guide to GPU/NPU Compute Slicing on K8s with HAMi


Introduction

In AI model training, inference, and high-performance computing scenarios, heterogeneous compute resources such as GPUs and NPUs often face the twin challenges of idle compute and multi-task scheduling conflicts: a single chip may have ample compute yet sit underutilized serving only one task, while several lightweight tasks requesting it concurrently can stall on resource contention during scheduling.

HAMi (Heterogeneous AI Computing Virtualization Middleware) is an AI-native resource scheduling and management solution for the Kubernetes ecosystem. Through a unified device plugin and scheduler extension, it slices GPU and NPU resources at fine granularity and schedules them efficiently, substantially improving heterogeneous resource utilization while accommodating the resource needs of different workloads.

Drawing on the official HAMi documentation and hands-on experience, this article walks through the complete workflow of deploying, configuring, and verifying GPU and NPU compute slicing, providing a directly reusable guide for real deployments.

Core Components

HAMi Core Framework

HAMi is the AI resource management solution from the Project-HAMi community. Its core consists of two components: the scheduler extender (hami-scheduler) and the device plugin (hami-device-plugin). Its main strengths are:

  • Unified management of multiple heterogeneous resource types, including NVIDIA GPUs and Huawei Ascend NPUs;
  • Fine-grained slicing along dimensions such as device memory and compute cores, flexibly splitting physical chips into virtual resources (vGPU/vNPU);
  • Deep integration with the Kubernetes scheduling stack, deployable quickly without modifying the underlying cluster.

Device Plugin Components

  • GPU device plugin: integrates NVIDIA resource management and supports compute slicing for mainstream GPU models such as the Tesla T4. Device memory and compute are requested precisely via resource names such as nvidia.com/gpu and nvidia.com/gpumem, with no extra plugin to install;
  • NPU device plugin (hami-ascend-device-plugin): purpose-built for Huawei Ascend NPUs and adapted to models such as the Ascend 310P. It slices virtual resources along the AI Core and device-memory dimensions, with requests made via the huawei.com/AscendXXX family of resource names.

Installation and Deployment

Prerequisites

  1. Make sure the Kubernetes cluster runs v1.23 or later (this article was validated on v1.23.7) and that Helm 3+ is installed;
  2. GPU nodes need the NVIDIA driver and container runtime (nvidia-container-toolkit) installed beforehand; see the companion article "A Practical Guide to GPU Compute Slicing on K8s with Volcano Priority Scheduling";
  3. NPU nodes need the Ascend driver (version ≥ 7.2) and Ascend Docker Runtime installed beforehand so the hardware environment is ready; see the companion article "A Practical Guide to NPU Compute Slicing on K8s with Volcano Priority Scheduling". A quick sanity check follows this list.
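
Before moving on, it is worth confirming that the driver stacks actually respond on each node; a minimal sketch, assuming the standard NVIDIA and Ascend tools are on the PATH:

# On the GPU node, both the driver and the container toolkit CLI should respond
nvidia-smi
nvidia-container-cli --version
# On the NPU node, the Ascend driver ships the npu-smi tool
npu-smi info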

Label the Target Nodes

Node labels let each device plugin land only on the right nodes. Add the corresponding label to the GPU and NPU nodes:

# Label the GPU node
kubectl label node <gpu-node-name> gpu=on  # replace with the actual GPU node name
# Label the NPU node
kubectl label node <npu-node-name> ascend=on  # replace with the actual NPU node name
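
To confirm the labels took effect, list nodes by selector; each command should return the corresponding node:

# Nodes targeted by the GPU device plugin
kubectl get nodes -l gpu=on
# Nodes targeted by the NPU device plugin
kubectl get nodes -l ascend=on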

Deploy the HAMi Core Framework (via Helm)

GPU Compute Slicing Deployment

Add the HAMi Helm repository and deploy the core components with only GPU support enabled:

helm repo add hami-charts https://project-hami.github.io/HAMi/

helm install hami hami-charts/hami -n kube-system \
--version 2.7.0 \
--set scheduler.patch.imageNew.repository=lomtom-common/kube-webhook-certgen \
--set scheduler.kubeScheduler.image.repository=lomtom-common/kube-scheduler \
--set scheduler.kubeScheduler.image.tag=v1.23.7 \
--set scheduler.extender.image.repository=lomtom-common/hami \
--set devicePlugin.image.repository=lomtom-common/hami \
--set devicePlugin.monitor.image.repository=lomtom-common/hami \
--set global.imageRegistry=swr.cn-east-3.myhuaweicloud.com

Check that the hami-device-plugin and hami-scheduler Pods are running:

# kubectl get pod -n kube-system 
hami-device-plugin-qg8qj                                 2/2     Running   0             32s
hami-scheduler-d846f7b69-9l498                           2/2     Running   0             32s

NPU Compute Slicing Deployment

Deploy the HAMi core components with Ascend NPU support enabled, then additionally deploy the dedicated device plugin:

  1. Deploy the HAMi core components:
helm repo add hami-charts https://project-hami.github.io/HAMi/

helm install hami hami-charts/hami -n kube-system \
--version 2.7.0 \
--set scheduler.patch.imageNew.repository=lomtom-common/kube-webhook-certgen \
--set scheduler.kubeScheduler.image.repository=lomtom-common/kube-scheduler \
--set scheduler.kubeScheduler.image.tag=v1.23.7 \
--set scheduler.extender.image.repository=lomtom-common/hami \
--set devicePlugin.image.repository=lomtom-common/hami \
--set devicePlugin.monitor.image.repository=lomtom-common/hami \
--set global.imageRegistry=swr.cn-east-3.myhuaweicloud.com \
--set devices.ascend.enabled=true 

For reference, the upstream images behind these overrides are:

projecthami/hami:v2.7.0
jettech/kube-webhook-certgen:v1.5.2
liangjw/kube-webhook-certgen:v1.1.1
registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.23.7
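
If your nodes cannot pull from these registries directly, one option is to mirror the images into a registry you control; a sketch, where my-registry.example.com is a placeholder:

# Mirror one upstream image into a private registry (repeat for each image above)
docker pull projecthami/hami:v2.7.0
docker tag projecthami/hami:v2.7.0 my-registry.example.com/lomtom-common/hami:v2.7.0
docker push my-registry.example.com/lomtom-common/hami:v2.7.0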

Check the hami-scheduler Pod status:

kubectl get pod -n kube-system 
NAME                                         READY   STATUS    RESTARTS        AGE
hami-scheduler-59598d4f7d-k7sn7              2/2     Running   0               8m33s
  2. Create the NPU device configuration ConfigMap.

Define the slicing specs (device memory, AI Core, and AI CPU allocation) for NPU models such as the Ascend 310P3 and 910B4; adjust the exact values to your hardware. The YAML is as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: hami-scheduler-device
  namespace: kube-system
  labels:
    app.kubernetes.io/component: hami-scheduler
    app.kubernetes.io/name: hami
    app.kubernetes.io/instance: hami
data:
  device-config.yaml: |-
    vnpus:
    # Ascend 910B4 NPU configuration
    - chipName: 910B4
      commonWord: Ascend910B4
      resourceName: huawei.com/Ascend910B4  # NPU resource name (used in Pod requests)
      resourceMemoryName: huawei.com/Ascend910B4-memory  # NPU device-memory resource name
      memoryAllocatable: 32768  # total allocatable device memory (MB)
      memoryCapacity: 32768     # total device-memory capacity (MB)
      aiCore: 20                # total AI Cores
      aiCPU: 7                  # total AI CPUs
      templates:  # vNPU slicing templates
      - name: vir05_1c_8g
        memory: 8192  # device memory per vNPU (MB)
        aiCore: 5     # AI Cores per vNPU
        aiCPU: 1      # AI CPUs per vNPU
      - name: vir10_3c_16g
        memory: 16384  # device memory per vNPU (MB)
        aiCore: 10     # AI Cores per vNPU
        aiCPU: 3       # AI CPUs per vNPU
    # Ascend 310P3 NPU configuration
    - chipName: 310P3
      commonWord: Ascend310P
      resourceName: huawei.com/Ascend310P  # NPU resource name (used in Pod requests)
      resourceMemoryName: huawei.com/Ascend310P-memory  # NPU device-memory resource name
      memoryAllocatable: 21527  # total allocatable device memory (MB)
      memoryCapacity: 24576     # total device-memory capacity (MB)
      aiCore: 8                 # total AI Cores
      aiCPU: 7                  # total AI CPUs
      templates:  # vNPU slicing templates
      - name: vir01
        memory: 3072  # device memory per vNPU (MB)
        aiCore: 1     # AI Cores per vNPU
        aiCPU: 1      # AI CPUs per vNPU
      - name: vir02
        memory: 6144  # device memory per vNPU (MB)
        aiCore: 2     # AI Cores per vNPU
        aiCPU: 2      # AI CPUs per vNPU
      - name: vir04
        memory: 12288  # device memory per vNPU (MB)
        aiCore: 4      # AI Cores per vNPU
        aiCPU: 4       # AI CPUs per vNPU
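
Save the manifest and apply it, then confirm the scheduler can read the device config (the file name below is illustrative):

# Apply the vNPU template ConfigMap and confirm it exists
kubectl apply -f hami-scheduler-device.yaml
kubectl get configmap hami-scheduler-device -n kube-system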
  3. Deploy hami-ascend-device-plugin (the NPU-specific device plugin). Create the RBAC permissions and the DaemonSet it needs; the full YAML is as follows:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: hami-ascend
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "update", "watch", "patch"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: hami-ascend
subjects:
  - kind: ServiceAccount
    name: hami-ascend
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: hami-ascend
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: hami-ascend
  namespace: kube-system
  labels:
    app.kubernetes.io/component: "hami-ascend"
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: hami-ascend-device-plugin
  namespace: kube-system
  labels:
    app.kubernetes.io/component: hami-ascend-device-plugin
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: hami-ascend-device-plugin
      hami.io/webhook: ignore
  template:
    metadata:
      labels:
        app.kubernetes.io/component: hami-ascend-device-plugin
        hami.io/webhook: ignore
    spec:
      priorityClassName: "system-node-critical"
      serviceAccountName: hami-ascend
      containers:
        - image: swr.cn-east-3.myhuaweicloud.com/lomtom-common/ascend-device-plugin:v1.1.0
          imagePullPolicy: IfNotPresent
          name: device-plugin
          resources:
            requests:
              memory: 500Mi
              cpu: 500m
            limits:
              memory: 500Mi
              cpu: 500m
          args:
            - --config_file
            - /device-config.yaml
          securityContext:
            privileged: true
            readOnlyRootFilesystem: false
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
            - name: pod-resource
              mountPath: /var/lib/kubelet/pod-resources
            - name: hiai-driver
              mountPath: /usr/local/Ascend/driver
              readOnly: true
            - name: log-path
              mountPath: /var/log/mindx-dl/devicePlugin
            - name: tmp
              mountPath: /tmp
            - name: ascend-config
              mountPath: /device-config.yaml
              subPath: device-config.yaml
              readOnly: true
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
        - name: pod-resource
          hostPath:
            path: /var/lib/kubelet/pod-resources
        - name: hiai-driver
          hostPath:
            path: /usr/local/Ascend/driver
        - name: log-path
          hostPath:
            path: /var/log/mindx-dl/devicePlugin
            type: Directory
        - name: tmp
          hostPath:
            path: /tmp
        - name: ascend-config
          configMap:
            name: hami-scheduler-device
      nodeSelector:
        ascend: "on"
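
Apply the manifest and wait for the plugin to come up on the labeled NPU nodes (again, the file name is illustrative):

# Apply the RBAC and DaemonSet resources, then wait for the rollout to finish
kubectl apply -f hami-ascend-device-plugin.yaml
kubectl rollout status daemonset/hami-ascend-device-plugin -n kube-system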

Verify the HAMi Deployment

After deployment completes, check the core component Pods and make sure everything is running:

# Inspect the HAMi core components
kubectl get pod -n kube-system -l app.kubernetes.io/name=hami
# Inspect the NPU-specific device plugin
kubectl get pod -n kube-system -l app.kubernetes.io/component=hami-ascend-device-plugin

Expected output: hami-scheduler (2/2 Running), hami-device-plugin (2/2 Running), and hami-ascend-device-plugin (1/1 Running).

# GPU environment
NAME                             READY   STATUS    RESTARTS   AGE
hami-device-plugin-qg8qj         2/2     Running   0          33m
hami-scheduler-d846f7b69-9l498   2/2     Running   0          33m

# NPU environment
NAME                              READY   STATUS    RESTARTS   AGE
hami-scheduler-59598d4f7d-k7sn7   2/2     Running   0          43m
hami-ascend-device-plugin-sjmxj   1/1     Running   0          41m
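
Beyond Pod status, the sliced resources should now appear in each node's allocatable set; a quick check (node names are placeholders):

# The GPU node should advertise nvidia.com/gpu and nvidia.com/gpumem; the NPU
# node should advertise the huawei.com/Ascend310P resources from the ConfigMap
kubectl describe node <gpu-node-name> | grep -A 10 Allocatable
kubectl describe node <npu-node-name> | grep -A 10 Allocatable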

Functional Verification

GPU Compute Slicing Test (vGPU-Based)

Create a Pod that requests vGPU resources, specifying both device-memory and compute needs:

kubectl apply -f -<<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod-t4
spec:
  containers:
  - name: gpu-container
    image: swr.cn-east-3.myhuaweicloud.com/lomtom-common/pytorch:2.1.2-cuda12.1-cudnn8-runtime-ubuntu22.04
    command: ["sleep"]
    args: ["100000"]  # 保持 Pod 长期运行
    resources:
      limits:
        cpu: "1"
        memory: 1000Mi
        nvidia.com/gpu: "1"  # 申请 1 个 Ascend310P vNPU
        nvidia.com/gpumem: 3000
      requests:
        cpu: "1"
        memory: 1000Mi
        nvidia.com/gpu: "1"
        nvidia.com/gpumem: 3000
EOF
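
Before inspecting the device, confirm the Pod was scheduled onto the labeled GPU node:

# The Pod should be Running on the GPU node
kubectl get pod gpu-pod-t4 -o wide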

Verify the slicing result:

# Run nvidia-smi inside the Pod to inspect the resource allocation
kubectl exec -it gpu-pod-t4 -- nvidia-smi

Expected output: the Pod sees only the allocated 3000 MiB of device memory (matching the requested amount), confirming that the physical GPU has been sliced into a virtual resource and that GPU slicing is working.

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03             Driver Version: 550.144.03     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:00:06.0 Off |                    0 |
| N/A   45C    P8             14W /   70W |       0MiB /   3000MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
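
Since the test image ships with PyTorch, the limit can also be confirmed from inside the framework; a small sketch (the reported total should be on the order of the 3000 MiB requested):

# Ask PyTorch how much device memory it can see inside the Pod
kubectl exec -it gpu-pod-t4 -- python -c "import torch; print(torch.cuda.get_device_properties(0).total_memory / 1024**2, 'MiB')"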

NPU Compute Slicing Test (vNPU-Based)

Create a Pod that requests vNPU resources:

kubectl apply -f -<<EOF
apiVersion: v1
kind: Pod
metadata:
  name: npu-pod-310p
spec:
  containers:
  - name: npu-container
    image: swr.cn-south-1.myhuaweicloud.com/ascendhub/ascend-pytorch:24.0.RC1-A2-1.11.0-ubuntu20.04
    command: ["sleep"]
    args: ["100000"]  # 保持 Pod 长期运行
    resources:
      limits:
        cpu: "1"
        memory: 1000Mi
        huawei.com/Ascend310P: "1"  # 申请 1 个 Ascend310P vNPU
        huawei.com/Ascend310P-memory: "3072"  # 申请 3GB 显存(对应 vir01 模板)
      requests:
        cpu: "1"
        memory: 1000Mi
        huawei.com/Ascend310P: "1"
        huawei.com/Ascend310P-memory: "3072"
EOF
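
As with the GPU test, first confirm the Pod was scheduled onto the labeled NPU node:

# The Pod should be Running on the NPU node
kubectl get pod npu-pod-310p -o wide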

Verify the slicing result; npu-smi now reports the vir01 vNPU slice:

# npu-smi info
+-------------------------------------------------------------------------------------------------------+
| npu-smi 24.1.rc2                                 Version: 24.1.rc2                                    |
+-------------------------------+-----------------+-----------------------------------------------------+
| NPU     Name                  | Health          | Power(W)     Temp(C)           Hugepages-Usage(page)|
| Chip    Device                | Bus-Id          | AICore(%)    Memory-Usage(MB)                       |
+===============================+=================+=====================================================+
| 32896   310Pvir01             | OK              | NA           48                0     / 0            |
| 0       0                     | 0000:85:00.0    | 0            225  / 2690                            |
+===============================+=================+=====================================================+
+-------------------------------+-----------------+-----------------------------------------------------+
| NPU     Chip                  | Process id      | Process name             | Process memory(MB)       |
+===============================+=================+=====================================================+
| No running processes found in NPU 32896                                                               |
+===============================+=================+=====================================================+

On the host node, you can also query the static vNPU information for the physical chip (card 6, chip 0 here):

# npu-smi info -t info-vnpu -i 6 -c 0
+-------------------------------------------------------------------------------+
| NPU resource static info as follow:                                           |
| Format:Free/Total                   NA: Currently, query is not supported.    |
| AICORE    Memory    AICPU    VPC    VENC    VDEC    JPEGD    JPEGE    PNGD    |
|            GB                                                                 |
|===============================================================================|
| 7/8       18/21     6/7      11/12  3/3     11/12   14/16    7/8      NA/NA   |
+-------------------------------------------------------------------------------+
| Total number of vnpu: 1                                                       |
+-------------------------------------------------------------------------------+
|  Vnpu ID  |  Vgroup ID     |  Container ID  |  Status  |  Template Name       |
+-------------------------------------------------------------------------------+
|  132      |  0             |  ffffffffffff  |  1       |  vir01               |
+-------------------------------------------------------------------------------+

The host's vNPU config file likewise records the created slice (used to restore it after a restart):

# cat /etc/vnpu.cfg
vnpu_config_recover:enable
[vnpu-config start]
2:132:npu-smi set -t create-vnpu -i 6 -c 0 -f vir01 -v 132 -g 0
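
Once verification is complete, the test Pods can be deleted; the vGPU/vNPU slices they held should be released with them:

# Clean up the test Pods from both environments
kubectl delete pod gpu-pod-t4 npu-pod-310p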

Title: A Practical Guide to GPU/NPU Compute Slicing on K8s with HAMi

Author: lomtom

Link: https://lomtom.cn/8lov8ck4e6cy