batching ¶

VLLMBatcher ¶

VLLMBatcher(*args, **kwargs)

Bases: Batcher

Batcher that handles tensor-parallel gather/split for vLLM.

vLLM's ColumnParallelLinear and RowParallelLinear layers shard tensors across GPUs. When NNsight intervention code accesses inputs or outputs of these layers, this batcher transparently gathers the sharded tensors so the user sees the full (unsharded) values, then splits them back before returning control to vLLM.
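The gather-then-split round trip described above can be illustrated with a minimal, library-free sketch. It is not the nnsight or vLLM implementation: the helpers `split`, `gather`, and `intervene_on_sharded` are hypothetical names, plain Python lists stand in for tensors, and only the simplest case (equal contiguous shards along one dimension) is modelled. Which collective operations `VLLMBatcher` actually uses is not shown on this page.

```python
from typing import Callable, List

Vector = List[float]


def split(full: Vector, world_size: int) -> List[Vector]:
    """Split a full vector into equal contiguous shards, one per rank (hypothetical helper)."""
    if len(full) % world_size != 0:
        raise ValueError("length must be divisible by world_size")
    step = len(full) // world_size
    return [full[i * step:(i + 1) * step] for i in range(world_size)]


def gather(shards: List[Vector]) -> Vector:
    """Concatenate per-rank shards back into the full vector (hypothetical helper)."""
    return [x for shard in shards for x in shard]


def intervene_on_sharded(
    shards: List[Vector], fn: Callable[[Vector], Vector]
) -> List[Vector]:
    """Gather the shards, run user code on the full value, then re-shard the result."""
    full = gather(shards)                # user code sees the unsharded value
    edited = fn(full)                    # the intervention runs here
    return split(edited, len(shards))    # shards go back to the parallel layers


# Two simulated ranks, each holding half of a four-element activation.
shards = [[1.0, 2.0], [3.0, 4.0]]
result = intervene_on_sharded(shards, lambda v: [x * 10 for x in v])
```

In this sketch the intervention function never needs to know how many ranks exist, which is the point of gathering before user code runs and splitting afterwards.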

current_module `instance-attribute` ¶

current_module = None

parallel `instance-attribute` ¶

parallel = False

gathered `instance-attribute` ¶

gathered = False

type `instance-attribute` ¶

type = None

wrap ¶

wrap(model: Envoy)

check_gathered ¶

check_gathered()

narrow ¶

narrow(batch_group: Union[int, None])

swap ¶

swap(batch_group: Union[int, None], swap_value: Any)