To make full use of Ascend 910B bare-metal servers and meet enterprise users' needs for large-scale parallel computing and distributed storage, we have prepared this multi-node Ascend 910B deployment guide, based on the bare-metal Galaxy images and the CTyunOS operating system. It provides a complete walkthrough from environment preparation to service management, helping users quickly build a stable, efficient AI computing cluster.
1. Environment Preparation
1.1 Prerequisites
- Management node: an Ascend 910B bare-metal server using the CTyunOS-23.01.2@GalaxyMaster-NPU24.1.rc2.1 image (with an EIP bound)
- All compute nodes: Ascend 910B bare-metal servers using the CTyunOS-23.01.2@GalaxyCompute-NPU24.1.rc2.1 image
- Shared storage: OceanFS / SFS
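Before proceeding, it can help to confirm that the NPUs are visible on each node. Assuming the Ascend driver tools shipped with these NPU images, the standard check is:
npu-smi info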
1.2 Local NVMe Partitioning
Mount the node's two NVMe disks, nvme1n1 and nvme0n1, at /mnt/nvme1n1 and /mnt/nvme0n1 respectively. The following script handles formatting, mounting, and fstab registration:
#!/bin/bash
# Devices to mount and their target mount points
devices=("/dev/nvme0n1" "/dev/nvme1n1")
mount_points=("/mnt/nvme0n1" "/mnt/nvme1n1")
fs_type="xfs"

# Require root privileges
if [[ $EUID -ne 0 ]]; then
    echo "Please run this script as root!"
    exit 1
fi

for i in "${!devices[@]}"; do
    device="${devices[$i]}"
    mount_point="${mount_points[$i]}"

    # Create the mount directory
    mkdir -p "$mount_point"

    # Detect any existing filesystem on the device
    current_fs=$(blkid -s TYPE -o value "$device")
    if [[ -z "$current_fs" ]]; then
        echo "Device $device has no filesystem; formatting as $fs_type..."
        mkfs.xfs -f "$device"
    else
        echo "$device is already formatted as $current_fs; skipping formatting"
    fi

    # Make sure the device is not mounted elsewhere, then mount it
    umount "$device" 2>/dev/null
    if ! mount -t "$fs_type" "$device" "$mount_point"; then
        echo "Error: failed to mount $device at $mount_point; check the device or filesystem!"
        exit 1
    fi
    echo "$device mounted at $mount_point"

    # Record the UUID in /etc/fstab for mounting at boot, avoiding duplicate entries
    uuid=$(blkid -s UUID -o value "$device")
    if ! grep -q "$uuid" /etc/fstab; then
        echo "UUID=$uuid $mount_point $fs_type defaults 0 0" >> /etc/fstab
        echo "$device (UUID=$uuid) added to /etc/fstab"
    else
        echo "$device is already in /etc/fstab; nothing to add"
    fi
done

echo "All disks mounted and configured for automatic mounting at boot!"
The disk partition layout can be checked as follows:
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 4.2G 1 loop
sda 8:0 0 446.6G 0 disk
├─sda1 8:1 0 122M 0 part
├─sda2 8:2 0 976.6M 0 part /boot/efi
├─sda3 8:3 0 1.9G 0 part /boot
└─sda4 8:4 0 443.6G 0 part
├─system-lv_swap 253:0 0 16G 0 lvm [SWAP]
└─system-lv_root 253:1 0 427.6G 0 lvm /
nvme1n1 259:0 0 2.9T 0 disk
└─nvme1n1p1 259:4 0 2.9T 0 part /mnt/nvme1n1
nvme0n1 259:1 0 2.9T 0 disk
└─nvme0n1p1 259:3 0 2.9T 0 part /mnt/nvme0n1
After the Galaxy cluster is up, the /home and /opt directories are shared across all nodes:
$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 4.0M 0 4.0M 0% /dev
tmpfs 756G 419M 755G 1% /dev/shm
tmpfs 303G 1.8G 301G 1% /run
tmpfs 4.0M 0 4.0M 0% /sys/fs/cgroup
/dev/mapper/system-lv_root 428G 20G 408G 5% /
tmpfs 756G 200K 756G 1% /tmp
/dev/sda3 1.9G 148M 1.7G 9% /boot
/dev/sda2 975M 6.4M 969M 1% /boot/efi
tmpfs 152G 0 152G 0% /run/user/0
/dev/nvme0n1p1 3.0T 21G 2.9T 1% /mnt/nvme0n1
/dev/nvme1n1p1 3.0T 654G 2.3T 22% /mnt/nvme1n1
master001:/data/nfs/home 428G 20G 408G 5% /home
master001:/data/nfs/opt 428G 20G 408G 5% /opt
The output above illustrates master001 acting as the NFS server; OceanFS can be used instead for shared storage as needed.
In this deployment, the model and the MindIE container image are stored under each node's /mnt/nvme1n1 directory, while services are launched from the home directory. The home directory holds only log files, so it stays very lightweight.
1.3 Downloading the Scripts
Download the script package and extract it in the home directory.
cd /home
wget https://jiangsu-10.zos.ctyun.cn/galaxy/deployment/deepseek-hw-nnode-v20250319.tar
--2025-03-19 21:38:42-- https://jiangsu-10.zos.ctyun.cn/galaxy/deployment/deepseek-hw-nnode-v20250319.tar
Resolving jiangsu-10.zos.ctyun.cn (jiangsu-10.zos.ctyun.cn)... 117.88.33.247, 117.88.33.209, 218.91.113.207
Connecting to jiangsu-10.zos.ctyun.cn (jiangsu-10.zos.ctyun.cn)|117.88.33.247|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20480 (20K) [application/x-tar]
Saving to: ‘deepseek-hw-nnode-v20250319.tar’
deepseek-hw-nnode-v20250319.t 100%[===============================================>] 20.00K --.-KB/s in 0.007s
2025-03-19 21:38:42 (2.62 MB/s) - ‘deepseek-hw-nnode-v20250319.tar’ saved [20480/20480]
tar xvf deepseek-hw-nnode-v20250319.tar
$ ls -l
total 20
drwxr-xr-x 6 root root 212 Mar 19 21:21 deepseek
-rw-r--r-- 1 root root 20480 Mar 19 21:29 deepseek-hw-nnode-v20250319.tar
1.4 Downloading MindIE
Place the MindIE container image under /mnt/nvme1n1 on each node:
mkdir -p /mnt/nvme1n1/apptainer/
cd /mnt/nvme1n1/apptainer/
wget https://jiangsu-10.zos.ctyun.cn/galaxy/apptainer/mindie/mindie_2.0.T3-800I-A2-py311-openeuler24.03-lts-cthpc-fix-1.sif
ll /mnt/nvme1n1/apptainer/
Note: the cthpc-fix-1 build fixes an issue in Huawei's original container where the output ended with an end_of_sentence marker.
After the download completes, create a symlink in the /home/deepseek directory:
cd /home/deepseek
ln -s /mnt/nvme1n1/apptainer/mindie_2.0.T3-800I-A2-py311-openeuler24.03-lts-cthpc-fix-1.sif .
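Optionally, confirm the link resolves to the image (ls -lL follows symlinks):
ls -lL /home/deepseek/mindie_2.0.T3-800I-A2-py311-openeuler24.03-lts-cthpc-fix-1.sif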
1.5 Downloading the Model Files
Download the model files to the /mnt/nvme1n1/model/ directory on each node. The quantized DeepSeek-R1 model is used as the example here:
ll /mnt/nvme1n1/model/DeepSeek-R1-bf16-hfd-w8a8/
total 658940180
-rwxr-x--- 1 root root 1791 Mar 19 15:46 config.json
-rwxr-x--- 1 root root 10556 Mar 19 15:46 configuration_deepseek.py
-rwxr-x--- 1 root root 15229 Mar 19 15:46 md5sum.txt
-rwxr-x--- 1 root root 9923927 Mar 19 15:46 quant_model_description_w8a8_dynamic.json
-rwxr-x--- 1 root root 4300743184 Mar 19 15:47 quant_model_weight_w8a8_dynamic-00001-of-00157.safetensors
...
-rwxr-x--- 1 root root 16470256 Mar 19 15:46 quant_model_weight_w8a8_dynamic.index.json
-rwxr-x--- 1 root root 6038 Mar 19 15:46 README.md
-rwxr-x--- 1 root root 3584 Mar 19 15:55 tokenizer_config.json
-rwxr-x--- 1 root root 7847602 Mar 19 15:55 tokenizer.json
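Since the directory ships an md5sum.txt, you can verify the downloaded weights after transfer (assuming its entries list files relative to the model directory):
cd /mnt/nvme1n1/model/DeepSeek-R1-bf16-hfd-w8a8
md5sum -c md5sum.txt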
Note: you can contact the Public Cloud Business Unit to have the model pre-loaded quickly over a peering connection.
Note: set the access permissions of all model files to 750.
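A minimal way to apply this to the model directory used above:
chmod -R 750 /mnt/nvme1n1/model/DeepSeek-R1-bf16-hfd-w8a8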
2. Service Management
2.1 Service Configuration
Adjust srun_clean.sh and srun_deepseek.sh according to the number of nodes:
#!/bin/bash
#SBATCH -N 2
...
For a four-node deployment, change the value after -N accordingly (see the example below).
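For instance, a one-liner like the following (illustrative) updates the directive in both scripts:
sed -i 's/^#SBATCH -N .*/#SBATCH -N 4/' srun_clean.sh srun_deepseek.sh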
Next, modify node.sh according to the specific model in use:
export MODEL_DIR=/mnt/nvme1n1/model/DeepSeek-R1-bf16-hfd-w8a8
export MINDIE_IMG=mindie_2.0.T3-800I-A2-py311-openeuler24.03-lts-cthpc-fix-1.sif
Here:
- MODEL_DIR is the path to the model on each node's local disk (the path must be identical on every node)
- MINDIE_IMG is the MindIE container image to use
2.2 Starting the DeepSeek Service
Only the following three commands are needed to start the service (startup takes anywhere from 5 to 30 minutes, depending on the model size):
cd /home/deepseek
sbatch srun_clean.sh
sbatch srun_deepseek.sh
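While waiting, you can follow the startup log; the file name pattern here is inferred from the example output below:
tail -f /home/deepseek/log_ds/deepseek.*.out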
Watch the .out file in the log directory; the service is ready once messages like the following appear:
$ more log_ds/deepseek.7.out
Loading cthpc_910b/1.0.0/mpich-3.2.1
Loading requirement: mpich/3.2.1/gcc-10.3.1
Service Address: http://192.168.0.27:1040/v1
Config updated successfully. Output file: json/config.102.json
server IP: 192.168.0.27
Start to parse ranktable file
Finished parsing ranktable file.
Update worldSize and npuDeviceIds of backend config successfully for Multi Nodes Inference.
Start to parse ranktable file
Finished parsing ranktable file.
Update worldSize and npuDeviceIds of backend config successfully for Multi Nodes Inference.
Multi Nodes infer slave instance need not init TokenizerProcessPool and HttpWrapper
Daemon start success!
Daemon start success!
Note: the DeepSeek master node is assigned automatically by the scheduler; in this log it is 192.168.0.27.
As a quick test, ask DeepSeek a question:
$ curl -i --location 'http://192.168.0.27:1040/v1/chat/completions' --header 'Content-Type: application/json' --data '{ "model": "DeepSeek-R1", "stream": false, "messages": [ {"role": "user", "content": "你是谁"} ] }'
HTTP/1.1 200 OK
Connection: close
Content-Length: 623
Content-Type: application/json
Keep-Alive: timeout=180, max=2147483647
{"id":"endpoint_common_1","object":"chat.completion","created":1743140175,"model":"DeepSeek-R1","choices":[{"index":0,"message":{"role":"assistant","content":"<think>\n\n</think>\n\n您好!我是由中国的深度求索(DeepSeek)公司开发的智能助手DeepSeek-R1。如您有任何任何问题,我会尽我所能为您提供帮助。","tool_calls":null},"finish_reason":"stop"}],"usage":{"prompt_tokens":4,"completion_tokens":42,"total_tokens":46},"prefill_time":272,"decode_time_arr":[367,81,83,80,80,81,82,83,84,84,84,84,84,85,85,85,85,85,85,85,86,86,86,85,85,85,86,86,85,85,87,86,85,85,86,85,84,84,84,84,94]}
2.3 Checking DeepSeek Status
Job status can be viewed with the Slurm squeue command:
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
7 batch deepseek root R 5:13:46 2 compute001,master001
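For more detail on a specific job, standard Slurm tooling applies, for example:
scontrol show job 7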
Note: the current job's log files can be found in the /home/deepseek/log_ds directory.
2.4 Stopping the DeepSeek Service
A single command stops the DeepSeek service:
scancel --me
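Note that scancel --me cancels all of the current user's jobs. To stop only the DeepSeek job, cancel it by the job ID shown in squeue:
scancel 7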