
vllm bench sweep plot_pareto

JSON CLI Arguments

When passing JSON CLI arguments, the following sets of arguments are equivalent:

  • --json-arg '{"key1": "value1", "key2": {"key3": "value2"}}'
  • --json-arg.key1 value1 --json-arg.key2.key3 value2

Additionally, list elements can be passed individually using the + suffix:

  • --json-arg '{"key4": ["value3", "value4", "value5"]}'
  • --json-arg.key4+ value3 --json-arg.key4+='value4,value5'
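The equivalence above can be illustrated with a minimal sketch. This is not vLLM's actual parser: `expand_dotted_args` is a hypothetical helper that assumes the `--json-arg.` prefix has already been stripped and that every value arrives as a string. It only shows how dotted keys map to nested objects and how `+` keys accumulate list elements.

```python
import json


def expand_dotted_args(pairs):
    """Merge (dotted_key, value) pairs into one nested dict.

    Hypothetical helper for illustration only; not part of vLLM.
    A key ending in "+" appends to a list, splitting the value on commas.
    """
    result = {}
    for dotted_key, value in pairs:
        is_list_append = dotted_key.endswith("+")
        keys = dotted_key.rstrip("+").split(".")
        node = result
        # Walk (or create) the intermediate objects for every key but the last.
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        leaf = keys[-1]
        if is_list_append:
            node.setdefault(leaf, []).extend(value.split(","))
        else:
            node[leaf] = value
    return result


# --json-arg.key1 value1 --json-arg.key2.key3 value2
nested = expand_dotted_args([("key1", "value1"), ("key2.key3", "value2")])
# --json-arg.key4+ value3 --json-arg.key4+='value4,value5'
appended = expand_dotted_args([("key4+", "value3"), ("key4+", "value4,value5")])

print(json.dumps(nested))
print(json.dumps(appended))
```

Both results match the single-string JSON forms shown in the bullets above.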

Arguments

--user-count-var

Result key that stores the concurrent user count. Falls back to max_concurrent_requests if the key is missing.
Default: max_concurrency

--gpu-count-var

Result key that stores the GPU count. If not provided, falls back to num_gpus or gpu_count, and otherwise to tensor_parallel_size * pipeline_parallel_size.
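The fallback order described for --gpu-count-var can be sketched as follows. This is an illustration, not the tool's source: `resolve_gpu_count` is a hypothetical function, the precedence of num_gpus over gpu_count is an assumption, and so is treating a missing parallel size as 1.

```python
def resolve_gpu_count(record, gpu_count_var=None):
    """Return the GPU count for one result record (a plain dict).

    Hypothetical helper for illustration only; not part of vLLM.
    """
    # 1. An explicitly named key takes priority when it is present.
    if gpu_count_var is not None and gpu_count_var in record:
        return int(record[gpu_count_var])
    # 2. Otherwise try the conventional keys (order is an assumption).
    for key in ("num_gpus", "gpu_count"):
        if key in record:
            return int(record[key])
    # 3. Finally derive the count from the parallelism settings,
    #    assuming a missing size means 1.
    tp = int(record.get("tensor_parallel_size", 1))
    pp = int(record.get("pipeline_parallel_size", 1))
    return tp * pp


print(resolve_gpu_count({"tensor_parallel_size": 2, "pipeline_parallel_size": 4}))
```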

--label-by

Comma-separated list of fields to annotate on Pareto frontier points.
Default: max_concurrency,gpu_count
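As a rough illustration of how a comma-separated --label-by value could become a point annotation, consider the sketch below. `build_label` and the `field=value` label format are assumptions made for this example; the text the tool actually draws may differ.

```python
def build_label(record, label_by="max_concurrency,gpu_count"):
    """Build an annotation string from the fields named in label_by.

    Hypothetical helper for illustration only; not part of vLLM.
    Fields absent from the record are skipped.
    """
    fields = [name.strip() for name in label_by.split(",") if name.strip()]
    return ", ".join(
        f"{name}={record[name]}" for name in fields if name in record
    )


label = build_label({"max_concurrency": 32, "gpu_count": 4, "model": "example"})
print(label)
```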

--dry-run

If set, prints the figures that would be plotted without drawing them.
Default: False