Taking qwen2.5 as an example, I noticed that the model weights are always pulled directly from Hugging Face. Is it possible to point the model at a given path for the weights and run it from there? Current config (see the sketch after this block for what I have in mind):
```yaml
qwen2.5-7b-instruct-l4:
  enabled: false
  url: "hf://Qwen/Qwen2.5-7B-Instruct"
  features: [TextGeneration]
  env:
    VLLM_ATTENTION_BACKEND: "FLASHINFER"
    # VLLM_USE_V1: "1"
  args:
    - --max-model-len=8192
    - --max-num-batched-tokens=8192
    - --max-num-seqs=256
    - --gpu-memory-utilization=0.95
    - --kv-cache-dtype=fp8
    - --enable-prefix-caching
    # - --enforce-eager
  engine: VLLM
  resourceProfile: 'nvidia-gpu-l4:1'
```
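
A minimal sketch of the idea, assuming the deployed version supports a `pvc://` URL scheme for loading weights from a PersistentVolumeClaim instead of `hf://` (please check the project docs for which URL schemes your release actually accepts); the PVC name `qwen-weights` and the sub-path `Qwen2.5-7B-Instruct` below are hypothetical placeholders:

```yaml
# Hypothetical: serve weights already present on a PVC instead of pulling from Hugging Face.
# Assumes pvc:// is a supported URL scheme in the installed version; names are placeholders.
qwen2.5-7b-instruct-l4-local:
  enabled: true
  url: "pvc://qwen-weights/Qwen2.5-7B-Instruct"
  features: [TextGeneration]
  env:
    VLLM_ATTENTION_BACKEND: "FLASHINFER"
  args:
    - --max-model-len=8192
    - --gpu-memory-utilization=0.95
  engine: VLLM
  resourceProfile: 'nvidia-gpu-l4:1'
```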