# GPUModelRunner
## NNsightGPUModelRunner

Bases: `GPUModelRunner`
### NNsightRequestHelper

Helper class for batching requests in the GPUModelRunner.
| Attribute | Description |
|---|---|
| `ids_to_batch_group` | Dictionary mapping request IDs to their assigned batch group indices. |
| `interleaver_to_ids` | Dictionary mapping interleavers to sets of request IDs. |
| `flat_batch_groups` | Dictionary mapping interleavers to their flattened batch groups. |
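The three tracking attributes above can be pictured as plain Python data structures. The following is an illustrative sketch only: `Interleaver` objects are stood in for by strings, and all keys and values shown are hypothetical examples, not the library's actual state.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in: the real keys are Interleaver objects, not strings.
InterleaverKey = str

# Request IDs mapped to the index of the batch group they were assigned to.
ids_to_batch_group: Dict[str, int] = {"req-0": 0, "req-1": 0, "req-2": 1}

# Each interleaver tracks the set of request IDs it is responsible for.
interleaver_to_ids: Dict[InterleaverKey, Set[str]] = {
    "interleaver-a": {"req-0", "req-1", "req-2"},
}

# Flattened batch groups: (start_position, size) token ranges per interleaver.
flat_batch_groups: Dict[InterleaverKey, List[Tuple[int, int]]] = {
    "interleaver-a": [(0, 5), (5, 3)],
}
```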
| Method | Description |
|---|---|
| `process_new_reqs` | Process new requests and compute the flat batch groups. |
| `process_finished_req` | Process a finished request by updating batch groups and cleaning up mappings. |
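The cleanup that `process_finished_req` is described as doing — updating batch groups and cleaning up mappings — can be sketched as follows. This is a minimal illustration under assumed semantics, not the library's implementation; the function name, signature, and use of strings in place of `Interleaver` objects are all hypothetical.

```python
from typing import Dict, Set

def cleanup_finished_request(
    req_id: str,
    interleaver: str,
    ids_to_batch_group: Dict[str, int],
    interleaver_to_ids: Dict[str, Set[str]],
) -> None:
    """Hypothetical sketch: drop a finished request from both tracking
    dictionaries, removing the interleaver entry entirely once no
    remaining requests reference it."""
    ids_to_batch_group.pop(req_id, None)
    ids = interleaver_to_ids.get(interleaver)
    if ids is not None:
        ids.discard(req_id)
        if not ids:
            # No live requests left for this interleaver; forget it.
            del interleaver_to_ids[interleaver]
```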
#### process_new_reqs

`process_new_reqs(new_reqs: List[NewRequestData], model: VLLM) -> None`
Process new requests and organize them into batch groups for execution.
This method handles the batching logic for new requests, organizing them into appropriate batch groups based on their interleaver's batching strategy.
| Parameter | Type | Description |
|---|---|---|
| `new_reqs` | `List[NewRequestData]` | List of new request data objects to process. Each request contains sampling parameters with an associated interleaver that defines the batching behavior. |
**Notes:**

- Resets the `flat_batch_groups` dictionary at the start.
- For interleavers that require batching, requests are assigned to batch groups.
- Batch groups are tuples of `(start_position, size)` indicating token ranges.
- Updates internal tracking dictionaries for the request-to-batch-group mapping.
- Advances to the next batch group when the current group's capacity is exceeded.
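The grouping behavior described in the notes above can be sketched in a few lines. This is a simplified model under stated assumptions — a fixed per-group token capacity, integer request IDs, and hypothetical names throughout — not the library's actual batching code.

```python
from typing import Dict, List, Tuple

def assign_batch_groups(
    request_sizes: List[int], group_capacity: int
) -> Tuple[Dict[int, int], List[Tuple[int, int]]]:
    """Hypothetical sketch: assign each request to a batch group, where a
    group is a (start_position, size) tuple over the flattened token range.
    Advance to a new group once adding the next request would exceed the
    current group's capacity."""
    ids_to_group: Dict[int, int] = {}
    groups: List[Tuple[int, int]] = []
    start, used, group_idx = 0, 0, 0
    for req_id, size in enumerate(request_sizes):
        if used > 0 and used + size > group_capacity:
            # Close the current group and open the next one.
            groups.append((start, used))
            start += used
            used = 0
            group_idx += 1
        ids_to_group[req_id] = group_idx
        used += size
    if used:
        groups.append((start, used))
    return ids_to_group, groups
```

For example, three requests of 2, 3, and 4 tokens with a capacity of 5 yield two groups: the first two requests share group 0, and the third starts group 1.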
### execute_model

`execute_model(scheduler_output: SchedulerOutput, intermediate_tensors: Optional[IntermediateTensors] = None)`