Using GPUs from k8s - 天翼云

2024-09-25 10:14:34 Views: 101

Creating a pod that can use the GPU

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-master
  namespace: gpu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-master
  template:
    metadata:
      labels:
        app: gpu-master
    spec:
      hostname: gpu-master
      containers:
      - name: gpu-master
        image: 192.168.168.10:5000/library/pytorch-gpu:v3
        env:
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: compute,utility
        - name: NVIDIA_VISIBLE_DEVICES
          value: all
        securityContext:
          privileged: true
          runAsUser: 0
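        resources:
          limits:
            # The source shows only "/gpu"; the "nvidia.com" prefix is assumed
            # here, matching the NVIDIA_* variables set above.
            nvidia.com/gpu: "1"
          requests:
            nvidia.com/gpu: "1"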
        volumeMounts:
        - name: code-host-path
          mountPath: /persistent
      volumes:
      - name: code-host-path
        hostPath:
          path: /root/gpu/gpucode
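
Once the pod is running, it can help to confirm from inside the container that the GPU is actually exposed. The following is a minimal sketch, not part of the original article: it uses only the Python standard library, reads the two `NVIDIA_*` variables set in the manifest, and calls `nvidia-smi -L` if that binary is present. The function name `gpu_visibility_report` is hypothetical, and the sketch assumes the image ships a Python 3.6+ interpreter.

```python
import os
import shutil
import subprocess


def gpu_visibility_report():
    """Collect what the container can see of the GPU (standard library only)."""
    report = {
        # Both variables are set in the manifest above.
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        "NVIDIA_DRIVER_CAPABILITIES": os.environ.get("NVIDIA_DRIVER_CAPABILITIES"),
        # None when nvidia-smi is not on PATH inside the container.
        "nvidia_smi_path": shutil.which("nvidia-smi"),
        "gpus": [],
    }
    if report["nvidia_smi_path"]:
        try:
            # "nvidia-smi -L" prints one line per visible GPU.
            result = subprocess.run(
                [report["nvidia_smi_path"], "-L"],
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,  # text mode, compatible with Python 3.6
            )
        except OSError:
            return report
        if result.returncode == 0:
            report["gpus"] = [
                line for line in result.stdout.splitlines() if line.strip()
            ]
    return report


if __name__ == "__main__":
    for key, value in gpu_visibility_report().items():
        print(key, "=", value)
```

An empty `gpus` list together with a missing `nvidia_smi_path` usually means the driver utilities were not injected into the container; the `utility` entry in `NVIDIA_DRIVER_CAPABILITIES` is what makes `nvidia-smi` available.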

Creating a job that can use the GPU

apiVersion: batch/v1
kind: Job
metadata:
  labels:
    app-name: gpu-job
    job-name: gpu-job
  name: gpu-job
  namespace: gpu
spec:
  backoffLimit: 6
  parallelism: 1
  template:
    metadata:
      labels:
        app-name: gpu-job
        job-name: gpu-job
      name: gpu-job
    spec:
      containers:
      - command:
        - /bin/bash
        - -c
        - '/usr/local/anaconda2/envs/edu_pytorch/bin/python3.6 /persistent/test.py '
        image: 192.168.168.10:5000/library/pytorch-gpu:v3
        env:
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: compute,utility
        - name: NVIDIA_VISIBLE_DEVICES
          value: all
        imagePullPolicy: IfNotPresent
        name: gpu-job
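        resources:
          limits:
            # The source shows only "/gpu"; the "nvidia.com" prefix is assumed
            # here, matching the NVIDIA_* variables set above.
            nvidia.com/gpu: "1"
          requests:
            nvidia.com/gpu: "1"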
        securityContext:
          privileged: true
          procMount: Default
        volumeMounts:
        - name: code-host-path
          mountPath: /persistent
      dnsPolicy: ClusterFirst
      hostname: gpu-job
      restartPolicy: OnFailure
      schedulerName: default-scheduler
      securityContext: {}
      volumes:
      - name: code-host-path
        hostPath:
          path: /root/gpu/gpucode
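
The article does not show the contents of `/persistent/test.py`. As an illustration only, a script of that kind could look like the sketch below, assuming the `edu_pytorch` environment in the image provides PyTorch. The names `cuda_report` and `main` are hypothetical; the import is guarded so the file also loads on a machine without PyTorch.

```python
def cuda_report():
    """Describe what PyTorch can see; degrades gracefully if torch is missing."""
    try:
        import torch  # assumed to be installed in the image's conda environment
    except ImportError:
        return {"torch_installed": False, "cuda_available": False, "devices": []}

    available = torch.cuda.is_available()
    devices = []
    if available:
        devices = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
        # Run one tiny computation on the GPU to confirm it is usable.
        total = torch.ones(2, 2, device="cuda").sum().item()
        if total != 4.0:
            available = False
    return {"torch_installed": True, "cuda_available": available, "devices": devices}


def main():
    """Print the report and return an exit code (0 when a GPU is usable)."""
    report = cuda_report()
    print(report)
    return 0 if report["cuda_available"] else 1
```

Ending the file with `import sys; sys.exit(main())` would make the container exit with a non-zero status when no GPU is usable, so that with `restartPolicy: OnFailure` the Job retries up to its `backoffLimit` of 6.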
