A Hands-on Guide to Deploying Qwen Models with Docker on Huawei Ascend Servers
Prerequisites: docker, docker-compose, the NPU firmware and driver, MindIE, and the Ascend container image have already been installed correctly.
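A quick sanity check of the Docker side of these prerequisites can be run on the host before continuing (a minimal sketch; the exact compose command depends on how it was installed):
docker --version
docker compose version            # or: docker-compose --version
ls /usr/local/Ascend/driver       # the driver directory that is mounted into the container later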
Step 1: Download the model
Download the model from ModelScope; after the download it is stored under /root/.cache/modelscope/hub/models/Qwen.
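If the modelscope command-line tool is installed, the download can be started as follows; the model ID Qwen/Qwen2.5-14B-Instruct is only an example, replace it with the model you actually want to deploy:
pip install modelscope                                   # only needed if the CLI is not installed yet
modelscope download --model Qwen/Qwen2.5-14B-Instruct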
Set permissions on the model files and edit config.json:
chmod -R 640 {{modelpath}}
vim config.json
In the model's config.json, change "torch_dtype": "bfloat16" to "torch_dtype": "float16".
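If you prefer to make this edit non-interactively, a sed one-liner like the following can be used; it assumes the field appears in config.json exactly as "torch_dtype": "bfloat16":
sed -i 's/"torch_dtype": "bfloat16"/"torch_dtype": "float16"/' {{modelpath}}/config.json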
Step 2: Deploy MindIE
Load the prepared MindIE image into the local Docker image store (it is usually already loaded); you can verify this with docker images.
Copy the image ID and put it into the script below.
Create a script to start the container, which can be named mindie.sh:
#!/bin/bash
container_name="Qwen2.5-70B"
image_name="d5a029763969"
model_path="/data"
docker run -it --ipc=host --name=$container_name --shm-size=500G --net=host --privileged=true \
        --device=/dev/davinci0 \
        --device=/dev/davinci1 \
        --device=/dev/davinci2 \
        --device=/dev/davinci3 \
        --device=/dev/davinci4 \
        --device=/dev/davinci5 \
        --device=/dev/davinci6 \
        --device=/dev/davinci7 \
        --device=/dev/davinci_manager \
        --device=/dev/devmm_svm \
        --device=/dev/hisi_hdc \
        -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
        -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons/ \
        -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
        -v /usr/local/sbin/:/usr/local/sbin/ \
        -v /var/log/npu/conf/slog/slog.conf:/var/log/npu/conf/slog/slog.conf \
        -v /var/log/npu/slog/:/var/log/npu/slog \
        -v /var/log/npu/profiling/:/var/log/npu/profiling \
        -v /var/log/npu/dump/:/var/log/npu/dump \
        -v /var/log/npu/:/usr/slog \
        -v /etc/timezone:/etc/timezone:ro \
        -v $model_path:$model_path \
        $image_name /bin/bash
Adjust container_name, image_name, and model_path to your environment; the --device parameters control which NPUs are exposed to the container.
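To check which NPU device nodes actually exist on the host before editing the --device list, the driver tooling can be queried, for example:
npu-smi info          # lists the NPUs, their health and memory
ls /dev/davinci*      # the device nodes referenced by the --device flags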
After saving, make the script executable:
chmod +x mindie.sh
Run the script to start and enter the container; if that does not happen automatically, use the docker commands below.
./mindie.sh
If you are not dropped into the container automatically, run:
docker exec -it <container name or ID> /bin/bash
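If you do not remember the container name or ID, list the running containers first; with the script above unchanged, the NAMES column will show Qwen2.5-70B:
docker ps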
If running the script produces an error, the file may have Windows line endings: open it in VS Code and, in the bottom-right corner, change the line endings from CRLF to LF, then save.
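The same line-ending conversion can also be done on the command line, for example with sed (or dos2unix, if it is installed):
sed -i 's/\r$//' mindie.sh
# or: dos2unix mindie.sh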
Step 3: Configure the model inside the container
Enter the directory:
cd /usr/local/Ascend/mindie/latest/mindie-service
Edit the configuration file:
vi conf/config.json
For easy copy-paste, the model path used in this example is repeated here; this line can be ignored:
/root/.cache/modelscope/hub/models/Qwen/Qwen2___5-14B-Instruct
After saving, set permissions on config.json:
chmod 640 conf/config.json
To make repeated restarts easier, the detailed configuration is given further below; first, the start command and the command for viewing the log.
Start MindIE in the background with nohup:
nohup ./bin/mindieservice_daemon &
View the startup log:
tail -f nohup.out
If something goes wrong and the service needs to be restarted, kill the process and then start it again:
pkill -f mindieservice_daemon
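To confirm the daemon is actually up after a (re)start, check the process and the business port configured below (1025 in this example):
ps -ef | grep mindieservice_daemon
ss -lntp | grep 1025          # or: netstat -lntp | grep 1025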
The configuration file follows:
{
    "Version" : "1.0.0",
    "LogConfig" :
    {
        "logLevel" : "Info",
        "logFileSize" : 20,
        "logFileNum" : 20,
        "logPath" : "logs/mindie-server.log"
    },
    "ServerConfig" :
    {
        "ipAddress" : "0.0.0.0",
        "managementIpAddress" : "127.0.0.2",
        "port" : 1025,
        "managementPort" : 1026,
        "metricsPort" : 1027,
        "allowAllZeroIpListening" : false,
        "maxLinkNum" : 1000,
        "httpsEnabled" : false,
        "fullTextEnabled" : false,
        "tlsCaPath" : "security/ca/",
        "tlsCaFile" : ["ca.pem"],
        "tlsCert" : "security/certs/server.pem",
        "tlsPk" : "security/keys/server.key.pem",
        "tlsPkPwd" : "security/pass/key_pwd.txt",
        "tlsCrlPath" : "security/certs/",
        "tlsCrlFiles" : ["server_crl.pem"],
        "managementTlsCaFile" : ["management_ca.pem"],
        "managementTlsCert" : "security/certs/management/server.pem",
        "managementTlsPk" : "security/keys/management/server.key.pem",
        "managementTlsPkPwd" : "security/pass/management/key_pwd.txt",
        "managementTlsCrlPath" : "security/management/certs/",
        "managementTlsCrlFiles" : ["server_crl.pem"],
        "kmcKsfMaster" : "tools/pmt/master/ksfa",
        "kmcKsfStandby" : "tools/pmt/standby/ksfb",
        "inferMode" : "standard",
        "interCommTLSEnabled" : true,
        "interCommPort" : 1121,
        "interCommTlsCaPath" : "security/grpc/ca/",
        "interCommTlsCaFiles" : ["ca.pem"],
        "interCommTlsCert" : "security/grpc/certs/server.pem",
        "interCommPk" : "security/grpc/keys/server.key.pem",
        "interCommPkPwd" : "security/grpc/pass/key_pwd.txt",
        "interCommTlsCrlPath" : "security/grpc/certs/",
        "interCommTlsCrlFiles" : ["server_crl.pem"],
        "openAiSupport" : "vllm"
    },
    "BackendConfig" :
    {
        "backendName" : "mindieservice_llm_engine",
        "modelInstanceNumber" : 1,
        "npuDeviceIds" : [[0,1]],
        "tokenizerProcessNumber" : 8,
        "multiNodesInferEnabled" : false,
        "multiNodesInferPort" : 1120,
        "interNodeTLSEnabled" : true,
        "interNodeTlsCaPath" : "security/grpc/ca/",
        "interNodeTlsCaFiles" : ["ca.pem"],
        "interNodeTlsCert" : "security/grpc/certs/server.pem",
        "interNodeTlsPk" : "security/grpc/keys/server.key.pem",
        "interNodeTlsPkPwd" : "security/grpc/pass/mindie_server_key_pwd.txt",
        "interNodeTlsCrlPath" : "security/grpc/certs/",
        "interNodeTlsCrlFiles" : ["server_crl.pem"],
        "interNodeKmcKsfMaster" : "tools/pmt/master/ksfa",
        "interNodeKmcKsfStandby" : "tools/pmt/standby/ksfb",
        "ModelDeployConfig" :
        {
            "maxSeqLen" : 16384,
            "maxInputTokenLen" : 8192,
            "truncation" : false,
            "ModelConfig" : [
                {
                    "modelInstanceType" : "Standard",
                    "modelName" : "DeepSeek-R1-Distill-Qwen-7B",
                    "modelWeightPath" : "/data/DeepSeek-R1-Distill-Qwen-7B",
                    "worldSize" : 2,
                    "cpuMemSize" : 5,
                    "npuMemSize" : -1,
                    "backendType" : "atb",
                    "trustRemoteCode" : false
                }
            ]
        },
        "ScheduleConfig" :
        {
            "templateType" : "Standard",
            "templateName" : "Standard_LLM",
            "cacheBlockSize" : 128,
            "maxPrefillBatchSize" : 50,
            "maxPrefillTokens" : 8192,
            "prefillTimeMsPerReq" : 150,
            "prefillPolicyType" : 0,
            "decodeTimeMsPerReq" : 50,
            "decodePolicyType" : 0,
            "maxBatchSize" : 200,
            "maxIterTimes" : 8192,
            "maxPreemptCount" : 0,
            "supportSelectBatch" : false,
            "maxQueueDelayMicroseconds" : 5000
        }
    }
}
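Because a stray comma or bracket will keep the service from starting, it can be worth validating the edited file before launching; this assumes python3 is available inside the container:
python3 -c "import json; json.load(open('conf/config.json')); print('config.json is valid JSON')"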
The values called out below must be changed for your deployment; when deploying more than one model, the entries inside ModelConfig have to be duplicated and adjusted as well.
The following values are examples only and must be adapted to your setup:
"ipAddress": the business IP address the service listens on
"httpsEnabled": false disables HTTPS for the service endpoints
"npuDeviceIds": [[0,1]] selects which cards are used (here cards 0 and 1)
"modelName": "DeepSeek-R1-Distill-Qwen-7B" is the model name
"modelWeightPath": "/data/DeepSeek-R1-Distill-Qwen-7B" is the path to the model weights; because the start script mounts the model path identically inside the container, the host directory can be used directly, e.g. /root/.cache/modelscope/hub/models/Qwen/Qwen2___5-14B-Instruct
"worldSize": 2 is the number of cards used and must match npuDeviceIds
"maxSeqLen": maximum sequence length
"maxInputTokenLen": maximum number of input tokens
"maxIterTimes": maximum number of tokens the model may generate; the values should satisfy maxSeqLen = maxInputTokenLen + maxIterTimes
"maxPrefillTokens" should equal maxInputTokenLen (maximum number of tokens in a prefill batch)
"supportSelectBatch": true is recommended
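Once the service is up, a simple request can serve as a smoke test. The sketch below assumes the vLLM-style OpenAI-compatible route enabled by "openAiSupport": "vllm", the business port 1025 from the config above, and the modelName set in ModelConfig; adjust all three to your own configuration:
curl http://127.0.0.1:1025/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "DeepSeek-R1-Distill-Qwen-7B",
        "messages": [{"role": "user", "content": "Hello, who are you?"}],
        "max_tokens": 128
      }'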