vllm.model_executor.layers.fused_moe.router.fused_topk_bias_router
FusedTopKBiasRouter
Bases: BaseRouter
Router that selects each token's top-k experts with a fused top-k kernel, applying e_score_correction_bias to the expert scores.
Source code in vllm/model_executor/layers/fused_moe/router/fused_topk_bias_router.py
__init__
__init__(
top_k: int,
global_num_experts: int,
eplb_state: EplbLayerState,
e_score_correction_bias: Tensor,
scoring_func: str,
renormalize: bool = True,
routed_scaling_factor: float = 1.0,
enable_eplb: bool = False,
indices_type_getter: Callable[[], dtype | None] | None = None,
)
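The renormalize and routed_scaling_factor parameters control how each token's selected expert weights are post-processed. The minimal pure-Python sketch below assumes that renormalization happens first and that routed_scaling_factor is then multiplied into every selected weight. The helper name postprocess_weights is hypothetical and is not part of vLLM.

```python
def postprocess_weights(
    weights: list[float],
    renormalize: bool = True,
    routed_scaling_factor: float = 1.0,
) -> list[float]:
    """Hypothetical helper: optionally renormalize one token's top-k weights, then scale them."""
    if renormalize:
        total = sum(weights)
        # Rescale so that the selected experts' weights sum to 1.
        weights = [w / total for w in weights]
    # Multiply every routed weight by the same constant factor (assumed ordering).
    return [w * routed_scaling_factor for w in weights]


# Two selected experts with raw weights 0.375 and 0.125.
print(postprocess_weights([0.375, 0.125], renormalize=True, routed_scaling_factor=2.0))
# → [1.5, 0.5]
```

With the defaults (renormalize=True, routed_scaling_factor=1.0), the selected weights simply sum to 1.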
_compute_routing
_compute_routing(
hidden_states: Tensor,
router_logits: Tensor,
indices_type: dtype | None,
) -> tuple[Tensor, Tensor]
Compute the top-k routing weights and expert indices using fused top-k with e_score_correction_bias.
fused_topk_bias
fused_topk_bias(
hidden_states: Tensor,
gating_output: Tensor,
e_score_correction_bias: Tensor,
topk: int,
renormalize: bool,
scoring_func: str = "softmax",
indices_type: dtype | None = None,
)
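The unfused, pure-Python sketch below illustrates the routing that fused_topk_bias is assumed to compute, following the DeepSeek-V3-style convention: e_score_correction_bias is added to the softmax or sigmoid scores only when ranking experts, while the returned weights are read from the unbiased scores. This is an assumption, not a description of the fused kernel. Lists stand in for tensors, the name topk_bias_reference is hypothetical, hidden_states and indices_type are omitted, and tie-breaking may differ from the real implementation.

```python
import math


def _softmax(row: list[float]) -> list[float]:
    m = max(row)  # subtract the row max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def _sigmoid(row: list[float]) -> list[float]:
    return [1.0 / (1.0 + math.exp(-x)) for x in row]


def topk_bias_reference(
    gating_output: list[list[float]],
    e_score_correction_bias: list[float],
    topk: int,
    renormalize: bool,
    scoring_func: str = "softmax",
) -> tuple[list[list[float]], list[list[int]]]:
    """Hypothetical unfused reference: one row of router logits per token."""
    if scoring_func == "softmax":
        score_fn = _softmax
    elif scoring_func == "sigmoid":
        score_fn = _sigmoid
    else:
        raise ValueError(f"unsupported scoring_func: {scoring_func!r}")

    all_weights: list[list[float]] = []
    all_ids: list[list[int]] = []
    for logits in gating_output:
        scores = score_fn(logits)
        # The bias shifts the ranking used to pick experts...
        biased = [s + b for s, b in zip(scores, e_score_correction_bias)]
        ids = sorted(range(len(biased)), key=biased.__getitem__, reverse=True)[:topk]
        # ...but the weights are read from the unbiased scores (assumption).
        weights = [scores[e] for e in ids]
        if renormalize:
            total = sum(weights)
            weights = [w / total for w in weights]
        all_weights.append(weights)
        all_ids.append(ids)
    return all_weights, all_ids


logits = [[2.0, 1.9, 0.0, -1.0]]  # one token, four experts
print(topk_bias_reference(logits, [0.0, 0.0, 0.0, 0.0], topk=2, renormalize=True)[1])
# → [[0, 1]]
print(topk_bias_reference(logits, [0.0, 0.0, 0.5, 0.0], topk=2, renormalize=True)[1])
# → [[2, 0]]
```

A positive bias on expert 2 moves it into the top-2 even though its raw score is low. Its routing weight, however, still reflects that low unbiased score. This is how a correction bias can steer load balancing without distorting the mixing weights.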
vllm_topk_sigmoid
vllm_topk_sigmoid(
topk_weights: Tensor,
topk_indices: Tensor,
token_expert_indices: Tensor,
gating_output: Tensor,
renormalize: bool = False,
e_score_correction_bias: Tensor | None = None,
) -> tuple[Tensor, ...]
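vllm_topk_sigmoid takes preallocated topk_weights and topk_indices outputs, which suggests that it fills them in place and returns them. The minimal sketch below shows that pattern. It assumes per-expert sigmoid scoring and assumes the bias is used only for selection. The name topk_sigmoid_into is hypothetical, lists stand in for tensors, and token_expert_indices is left out because its layout is not described here.

```python
import math
from typing import Optional


def topk_sigmoid_into(
    topk_weights: list[list[float]],
    topk_indices: list[list[int]],
    gating_output: list[list[float]],
    renormalize: bool = False,
    e_score_correction_bias: Optional[list[float]] = None,
) -> tuple[list[list[float]], list[list[int]]]:
    """Hypothetical sketch: write each token's sigmoid top-k into the given buffers."""
    topk = len(topk_weights[0])
    for t, logits in enumerate(gating_output):
        # Sigmoid scores every expert independently, so scores need not sum to 1.
        scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
        if e_score_correction_bias is None:
            bias = [0.0] * len(scores)
        else:
            bias = e_score_correction_bias
        biased = [s + b for s, b in zip(scores, bias)]
        ids = sorted(range(len(biased)), key=biased.__getitem__, reverse=True)[:topk]
        weights = [scores[e] for e in ids]  # unbiased scores (assumption)
        if renormalize:
            total = sum(weights)
            weights = [w / total for w in weights]
        # Overwrite the caller's preallocated rows in place.
        topk_weights[t][:] = weights
        topk_indices[t][:] = ids
    return topk_weights, topk_indices


w_buf = [[0.0, 0.0]]  # shape (num_tokens=1, topk=2)
i_buf = [[0, 0]]
topk_sigmoid_into(w_buf, i_buf, [[0.0, 2.0, -2.0]])
print(i_buf)  # → [[1, 0]]
print(w_buf[0][1])  # → 0.5
```

Because renormalize defaults to False here, the two selected sigmoid weights (about 0.88 and 0.5) do not sum to 1.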
vllm_topk_softmax
vllm_topk_softmax(
topk_weights: Tensor,
topk_indices: Tensor,
token_expert_indices: Tensor,
gating_output: Tensor,
renormalize: bool = False,
e_score_correction_bias: Tensor | None = None,
) -> tuple[Tensor, ...]
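vllm_topk_softmax has the same buffer-filling signature, with softmax scoring instead of sigmoid. The sketch below makes the same assumptions as a hypothetical reference: in-place outputs, bias used only for ranking, and token_expert_indices omitted. It highlights the key difference from sigmoid scoring: softmax normalizes over all experts before selection, so the top-k weights sum to at most 1 unless renormalize is set. The name topk_softmax_into is hypothetical.

```python
import math
from typing import Optional


def topk_softmax_into(
    topk_weights: list[list[float]],
    topk_indices: list[list[int]],
    gating_output: list[list[float]],
    renormalize: bool = False,
    e_score_correction_bias: Optional[list[float]] = None,
) -> tuple[list[list[float]], list[list[int]]]:
    """Hypothetical sketch: write each token's softmax top-k into the given buffers."""
    topk = len(topk_weights[0])
    for t, logits in enumerate(gating_output):
        # Softmax normalizes over *all* experts before any are selected.
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        scores = [e / total for e in exps]
        if e_score_correction_bias is None:
            bias = [0.0] * len(scores)
        else:
            bias = e_score_correction_bias
        biased = [s + b for s, b in zip(scores, bias)]
        ids = sorted(range(len(biased)), key=biased.__getitem__, reverse=True)[:topk]
        weights = [scores[e] for e in ids]  # unbiased scores (assumption)
        if renormalize:
            kept = sum(weights)
            weights = [w / kept for w in weights]
        topk_weights[t][:] = weights  # in-place write into the caller's buffer
        topk_indices[t][:] = ids
    return topk_weights, topk_indices


logits = [[math.log(4.0), math.log(2.0), 0.0, 0.0]]  # softmax ≈ [0.5, 0.25, 0.125, 0.125]
w_buf, i_buf = [[0.0, 0.0]], [[0, 0]]
topk_softmax_into(w_buf, i_buf, logits)
print(i_buf, [round(w, 3) for w in w_buf[0]])  # → [[0, 1]] [0.5, 0.25]
topk_softmax_into(w_buf, i_buf, logits, renormalize=True)
print([round(w, 3) for w in w_buf[0]])  # → [0.667, 0.333]
```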