batching

VLLMBatcher

VLLMBatcher(*args, **kwargs)

Bases: Batcher

Batcher that handles tensor-parallel gather/split for vLLM.

vLLM's ColumnParallelLinear and RowParallelLinear layers shard tensors across GPUs. When NNsight intervention code accesses inputs or outputs of these layers, this batcher transparently gathers the sharded tensors so the user sees the full (unsharded) values, then splits them back before returning control to vLLM.
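The gather-then-split round trip described above can be illustrated with a minimal, framework-free sketch. It is not the actual VLLMBatcher implementation: the helper names `gather_columns` and `split_columns` are hypothetical, plain nested lists stand in for tensors, and only sharding by concatenation along the last dimension is modeled (no distributed communication or reductions).

```python
from typing import List

Matrix = List[List[float]]


def split_columns(full: Matrix, world_size: int) -> List[Matrix]:
    """Hypothetical helper: shard a matrix along its last dimension."""
    width = len(full[0])
    if width % world_size != 0:
        raise ValueError("column count must be divisible by world_size")
    step = width // world_size
    # One shard per rank, each holding a contiguous slice of the columns.
    return [
        [row[rank * step:(rank + 1) * step] for row in full]
        for rank in range(world_size)
    ]


def gather_columns(shards: List[Matrix]) -> Matrix:
    """Hypothetical helper: concatenate per-rank shards into the full matrix."""
    n_rows = len(shards[0])
    return [
        [value for shard in shards for value in shard[i]]
        for i in range(n_rows)
    ]


# Assumed example data: a 2x4 layer output sharded across 2 ranks.
shards = [
    [[1.0, 2.0], [5.0, 6.0]],  # rank 0 holds columns 0-1
    [[3.0, 4.0], [7.0, 8.0]],  # rank 1 holds columns 2-3
]

# Gather: the intervention code sees the full (unsharded) value.
full = gather_columns(shards)

# The intervention edits the full value.
full[0][0] = 0.0

# Split: each rank gets its shard back before control returns to the engine.
resharded = split_columns(full, world_size=2)
```

In this sketch the edit to `full[0][0]` lands only in rank 0's shard after the split, which is the behavior the batcher is described as providing: the intervention is written against the full value, and each rank still receives a tensor of the shape it expects.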

current_module instance-attribute

current_module = None

parallel instance-attribute

parallel = False

gathered instance-attribute

gathered = False

type instance-attribute

type = None

wrap

wrap(model: Envoy)

check_gathered

check_gathered()

narrow

narrow(batch_group: Union[int, None])

swap

swap(batch_group: Union[int, None], swap_value: Any)