flytekitplugins.inference.vllm.serve
class HFSecret(
secrets_prefix: str,
hf_token_key: str,
hf_token_group: typing.Optional[str],
)
| Parameter | Type | Description |
|-----------|------|-------------|
| `secrets_prefix` | `str` | The secrets prefix that Flyte appends to all mounted secrets. |
| `hf_token_key` | `str` | The key name for the HuggingFace token. |
| `hf_token_group` | `typing.Optional[str]` | The group name for the HuggingFace token. |
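For orientation, here is a sketch of how these fields fit together. The `HFSecret` below is an illustrative stand-in dataclass mirroring the documented signature (not the plugin's actual class), and both the `_FSEC_` prefix value and the environment-variable naming scheme are assumptions:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative stand-in mirroring the documented signature; the real class
# is flytekitplugins.inference.vllm.serve.HFSecret.
@dataclass
class HFSecret:
    secrets_prefix: str                    # prefix Flyte appends to mounted secrets (assumed "_FSEC_")
    hf_token_key: str                      # key name for the HuggingFace token
    hf_token_group: Optional[str] = None   # optional group name for the token

secret = HFSecret(secrets_prefix="_FSEC_", hf_token_key="token", hf_token_group="hf")

# Assumption: a secret mounted as an environment variable is addressed as
# <prefix><group>_<key>, uppercased.
env_var = f"{secret.secrets_prefix}{secret.hf_token_group}_{secret.hf_token_key}".upper()
# env_var == "_FSEC_HF_TOKEN"
```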
class VLLM(
hf_secret: flytekitplugins.inference.vllm.serve.HFSecret,
arg_dict: typing.Optional[dict],
image: str,
health_endpoint: str,
port: int,
cpu: int,
gpu: int,
mem: str,
)
Initialize the VLLM class for managing a Kubernetes pod template.
| Parameter | Type | Description |
|-----------|------|-------------|
| `hf_secret` | `flytekitplugins.inference.vllm.serve.HFSecret` | Instance of `HFSecret` for managing Hugging Face secrets. |
| `arg_dict` | `typing.Optional[dict]` | A dictionary of arguments for the vLLM model server (https |
| `image` | `str` | The Docker image to be used for the model server container. Default is "vllm/vllm-openai". |
| `health_endpoint` | `str` | The health endpoint for the model server container. Default is "/health". |
| `port` | `int` | The port number for the model server container. Default is 8000. |
| `cpu` | `int` | The number of CPU cores requested for the model server container. Default is 2. |
| `gpu` | `int` | The number of GPU cores requested for the model server container. Default is 1. |
| `mem` | `str` | The amount of memory requested for the model server container. Default is "10Gi". |
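The description of `arg_dict` is cut off above, but the vLLM server generally takes its engine options as command-line flags, so an entry like `{"dtype": "half"}` would correspond to `--dtype half`. The sketch below shows that dict-to-flags mapping; the helper `to_cli_args` is hypothetical, for illustration only, and is not the plugin's actual code:

```python
def to_cli_args(arg_dict: dict) -> list[str]:
    """Hypothetical helper: render an arg_dict as model-server CLI flags."""
    args = []
    for key, value in arg_dict.items():
        flag = "--" + key.replace("_", "-")   # snake_case keys become kebab-case flags
        if isinstance(value, bool):
            if value:                          # boolean options are bare switches
                args.append(flag)
        else:
            args.extend([flag, str(value)])
    return args

to_cli_args({"model": "google/gemma-2b-it", "dtype": "half", "enforce_eager": True})
# → ["--model", "google/gemma-2b-it", "--dtype", "half", "--enforce-eager"]
```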
| Property | Type | Description |
|----------|------|-------------|
| `base_url` | `None` | |
| `pod_template` | `None` | |
def setup_vllm_pod_template()
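`setup_vllm_pod_template()` is undocumented here, but given the parameters above it presumably assembles a Kubernetes container spec from the image, port, resource, and health-endpoint settings. The sketch below builds a plain dict of that general shape from the documented defaults; it illustrates how the inputs relate, and is not the plugin's actual implementation:

```python
def build_container_spec(
    image: str = "vllm/vllm-openai",
    health_endpoint: str = "/health",
    port: int = 8000,
    cpu: int = 2,
    gpu: int = 1,
    mem: str = "10Gi",
) -> dict:
    """Illustrative sketch of a container spec built from the documented defaults."""
    resources = {"cpu": str(cpu), "memory": mem, "nvidia.com/gpu": str(gpu)}
    return {
        "image": image,
        "ports": [{"containerPort": port}],
        "resources": {"requests": resources, "limits": resources},
        # The health endpoint maps naturally onto a Kubernetes readiness probe.
        "readinessProbe": {"httpGet": {"path": health_endpoint, "port": port}},
    }
```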