First Impressions of the Qwen2.5 Models
1. Environment Setup
Hardware: 3x NVIDIA A100 GPUs with 40 GB of memory each
Software: CUDA 12.2, a conda virtual environment
conda create -n my_vllm python==3.9.19 pip
conda activate my_vllm
pip install modelscope
pip install vllm
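Before downloading the model it can help to confirm that the GPUs are visible from the new environment. A minimal sanity-check sketch (assuming torch was installed as a vLLM dependency) could look like this:
# Sanity check: vLLM imports cleanly and the A100s are visible from this conda env
import torch
import vllm

print("vLLM version:", vllm.__version__)
print("CUDA available:", torch.cuda.is_available())
for i in range(torch.cuda.device_count()):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
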
2. Model Download
Because of the hardware constraints, after several attempts I could only get the Qwen2.5-72B-Instruct-GPTQ-Int4 build deployed;
it may also be that my deployment approach was wrong, since larger variants kept hitting OOM errors...
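A rough weight-only memory estimate makes the constraint plausible (this ignores the KV cache and activation memory, so it understates real usage):
# Rough estimate: why only the GPTQ-Int4 build fits on 3 x 40 GB A100s
params = 72e9                       # ~72B parameters
fp16_gb = params * 2 / 1024**3      # FP16: 2 bytes per parameter -> ~134 GB
int4_gb = params * 0.5 / 1024**3    # GPTQ-Int4: ~0.5 bytes per parameter -> ~34 GB
print(f"FP16 weights ~= {fp16_gb:.0f} GB, Int4 weights ~= {int4_gb:.0f} GB, total GPU memory = 3 x 40 = 120 GB")
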
# Model download
# modelscope's default download path: /root/.cache/modelscope/hub/qwen/Qwen2.5-72B-Instruct-GPTQ-Int4
from modelscope import snapshot_download
model_dir = snapshot_download('qwen/Qwen2.5-72B-Instruct-GPTQ-Int4', local_dir='/home/models/qwen/Qwen2.5-72B-Instruct-GPTQ-Int4')
Reference docs:
ModelScope Community
Efficiency Evaluation - Qwen
3. Launching Directly on the Server with vLLM for Testing
vllm serve /home/models/qwen/Qwen2.5-72B-Instruct-GPTQ-Int4 --tensor-parallel-size 2
Once startup succeeds, the server listens on port 8000 by default and exposes an OpenAI-compatible API.
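A quick way to confirm the server is reachable (assuming the default port 8000) is to query the model list endpoint of the OpenAI-compatible API:
# Liveness check against the OpenAI-compatible API served by vLLM
import requests

print(requests.get("http://localhost:8000/v1/models").json())
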
Reference doc:
https://qwen.readthedocs.io/zh-cn/latest/deployment/vllm.html
4. Test Code
import json
import requests

url = "http://localhost:8000/v1/chat/completions"
headers = {"Content-Type": "application/json"}

question = 'balabala'
data = {
    "model": "/home/models/qwen/Qwen2.5-72B-Instruct-GPTQ-Int4",
    "messages": [
        {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
        {"role": "user", "content": question}
    ],
    "temperature": 0.7,
    "top_p": 0.8,
    "repetition_penalty": 1.05,
    "max_tokens": 512
}

response = requests.post(url, headers=headers, data=json.dumps(data))

# Print the response
print(response.json())
print(response.json()['choices'][0]['message']['content'][8:-4])
The request returns results normally.
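As an alternative to raw requests, the same call can be made with the official openai Python client. This is a sketch assuming `pip install openai` (1.x) and the same local vLLM server; the standard sampling parameters are kept and repetition_penalty is left out for simplicity:
# Same chat request via the openai client pointed at the local vLLM server
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # vLLM does not check the key unless --api-key is set
)

completion = client.chat.completions.create(
    model="/home/models/qwen/Qwen2.5-72B-Instruct-GPTQ-Int4",
    messages=[
        {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
        {"role": "user", "content": "balabala"},
    ],
    temperature=0.7,
    top_p=0.8,
    max_tokens=512,
)
print(completion.choices[0].message.content)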