GPUModelRunner

NNsightGPUModelRunner

NNsightGPUModelRunner(*args, **kwargs)

Bases: GPUModelRunner

Custom vLLM GPU model runner that interleaves NNsight interventions with model execution.

Wraps the model with an NNsight Envoy, deserializes mediators from incoming NNsightSamplingParams, and manages batch group mappings so that each invoke's intervention code sees the correct slice of the batch.

nnsight_model instance-attribute

nnsight_model: VLLM

nnsight_request_helper instance-attribute

nnsight_request_helper = NNsightRequestHelper()

NNsightRequestHelper

NNsightRequestHelper()

Helper class for batching requests in the GPUModelRunner.

ATTRIBUTE DESCRIPTION
ids_to_batch_group

Dictionary mapping request IDs to their assigned batch group indices.

TYPE: Dict[str, int]

interleaver_to_ids

Dictionary mapping interleavers to sets of request IDs.

TYPE: Dict[Interleaver, Set[str]]

flat_batch_groups

Dictionary mapping interleavers to their flattened batch groups.

TYPE: Dict[Interleaver, List[Tuple[int, int]]]

METHOD DESCRIPTION
process_new_reqs

Processes new requests (List[NewRequestData]) and computes the flat batch groups. Returns None.

process_finished_req

Takes a str argument and an interleaver (Interleaver). Processes a finished request by updating batch groups and cleaning up mappings. Returns None.
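
As a rough illustration of the bookkeeping these attributes describe, the sketch below turns per-request token counts into contiguous slices of a flattened batch. Reading each tuple as (start, length), giving every request its own group, and the helper name flatten_batch_groups are all assumptions made for illustration; none of them is taken from the NNsight source.

```python
from typing import Dict, List, Tuple


def flatten_batch_groups(
    num_tokens_scheduled: Dict[str, int],
    batch_req_ids: List[str],
) -> List[Tuple[int, int]]:
    """Hypothetical helper: one (start, length) pair per request, in batch order."""
    groups: List[Tuple[int, int]] = []
    start = 0
    for req_id in batch_req_ids:
        length = num_tokens_scheduled[req_id]
        groups.append((start, length))
        start += length  # the next request begins where this one ends
    return groups


groups = flatten_batch_groups({"a": 3, "b": 1, "c": 2}, ["a", "b", "c"])
print(groups)
```

With slices like these, intervention code attached to one request can index only its own rows of a batched activation tensor.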

req_id_to_batch_group_idx instance-attribute

req_id_to_batch_group_idx: Dict[str, int] = {}

mediators instance-attribute

mediators: Dict[str, Any] = {}

trace_contexts instance-attribute

trace_contexts: Dict[str, dict] = {}

process_new_reqs

process_new_reqs(new_reqs: List[NewRequestData], model: VLLM) -> None

Process new requests and organize them into batch groups for execution.

Each request carries its own serialized mediator. When multiple mediators belong to the same trace (identified by trace_id), the first arrival's __globals__ become the canonical reference. Subsequent arrivals graft the saved variable entries from the canonical globals into their own __globals__, so all mediators share the same Python objects for cross-invoke state.

PARAMETER DESCRIPTION
new_reqs

List of new request data objects to process.

TYPE: List[NewRequestData]
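
The grafting step described above can be pictured with plain dictionaries standing in for each mediator's __globals__. This is a minimal sketch under stated assumptions: the registry canonical_globals, the function graft_globals, and the saved_names argument are hypothetical names, and the real implementation may identify saved entries differently.

```python
from typing import Any, Dict, Set

# Hypothetical registry: trace_id -> globals dict of the first mediator seen.
canonical_globals: Dict[str, Dict[str, Any]] = {}


def graft_globals(
    trace_id: str,
    mediator_globals: Dict[str, Any],
    saved_names: Set[str],
) -> None:
    """Make later mediators of a trace share the first arrival's saved objects."""
    canonical = canonical_globals.setdefault(trace_id, mediator_globals)
    if canonical is mediator_globals:
        return  # first arrival: its globals become the canonical reference
    for name in saved_names:
        if name in canonical:
            # Rebind to the same object so mutations are visible to every invoke.
            mediator_globals[name] = canonical[name]


first = {"results": []}
second = {"results": []}
graft_globals("trace-1", first, {"results"})
graft_globals("trace-1", second, {"results"})
second["results"].append("from second invoke")
print(first["results"])
```

Because the second mediator's entry is rebound to the canonical object rather than copied, an append made through one invoke is visible through the other.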

unflatten

unflatten(model: VLLM)

process_batch_groups

process_batch_groups(num_tokens_scheduled: Dict[str, int], batch_req_ids: List[str], model: VLLM) -> None

match_req_ids

match_req_ids(req_id_set: set) -> List[tuple]

Match engine-reported request IDs to stored mediators.

vLLM appends a hash suffix to request IDs (e.g. "0-abc123" or "uuid-abc123"). This method strips the suffix with rsplit and falls back to an exact match.

RETURNS DESCRIPTION
List[tuple]

List of (base_id, mediator, internal_key) tuples.
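
The suffix handling can be sketched as follows. This is an illustrative reimplementation, not the method's actual body: the function name match_req_ids_sketch is hypothetical, the mediators are placeholder strings, and treating internal_key as the dictionary key under which the mediator was found is an assumption.

```python
from typing import Any, Dict, List, Set, Tuple


def match_req_ids_sketch(
    req_id_set: Set[str],
    mediators: Dict[str, Any],
) -> List[Tuple[str, Any, str]]:
    """Match engine-reported IDs to stored mediators (illustrative only)."""
    matched: List[Tuple[str, Any, str]] = []
    for req_id in sorted(req_id_set):  # sorted only to make the output stable
        # Strip everything after the LAST hyphen, so hyphens inside the base
        # ID (for example in a UUID) are preserved.
        base_id = req_id.rsplit("-", 1)[0]
        if base_id in mediators:
            matched.append((base_id, mediators[base_id], base_id))
        elif req_id in mediators:
            # Fallback: the reported ID carries no suffix and matches exactly.
            matched.append((req_id, mediators[req_id], req_id))
    return matched


stored = {"0": "mediator-0", "a-b": "mediator-ab"}
result = match_req_ids_sketch({"0-abc123", "a-b", "unknown-xyz"}, stored)
print(result)
```

Here "0-abc123" matches after stripping, "a-b" matches only through the exact-match fallback (stripping would yield "a"), and "unknown-xyz" is skipped because neither form is stored.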

finalize_mediators

finalize_mediators(matched, finished_req_id_set, model: VLLM) -> set

Run the result handler for finished mediators and cancel them.

RETURNS DESCRIPTION
set

Set of internal keys for mediators that were finalized.

collect_saves

collect_saves(matched, finished_internal_keys: set) -> tuple

Collect saved values from mediator frames.

Gathers per-invoke saves from frame locals and trace-shared saves from canonical globals (only when a trace is fully done).

RETURNS DESCRIPTION
tuple

(saves, removals): the saves dict and the set of id() values to discard from Globals.saves.
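
One way to picture the (saves, removals) pair is the sketch below, which filters a frame's locals by object identity. It assumes that the saved registry is a set of id() values, as the return description suggests; the function name collect_saves_sketch, the finished flag, and the sample variables are hypothetical.

```python
from typing import Any, Dict, Set, Tuple


def collect_saves_sketch(
    frame_locals: Dict[str, Any],
    saved_ids: Set[int],
    finished: bool,
) -> Tuple[Dict[str, Any], Set[int]]:
    """Return locals whose id() is registered as saved, plus ids to discard."""
    saves: Dict[str, Any] = {}
    removals: Set[int] = set()
    for name, value in frame_locals.items():
        if id(value) in saved_ids:
            saves[name] = value
            if finished:
                # Only a finished request gives up its registry entries.
                removals.add(id(value))
    return saves, removals


hidden = [1.0, 2.0]   # stands in for a saved activation
scratch = [0.0]       # a local that was never saved
saved_ids = {id(hidden)}

saves, removals = collect_saves_sketch(
    {"hidden": hidden, "scratch": scratch}, saved_ids, finished=True
)
saved_ids -= removals  # the cleanup step discards the collected ids
print(sorted(saves))
```

Returning the removals separately, instead of mutating the registry during collection, lets a later cleanup step decide when the entries are actually dropped.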

cleanup_finished

cleanup_finished(finished_internal_keys: set, removals: set) -> None

Clean up state for finished requests.

Removes entries from Globals.saves, deletes completed trace contexts, and drops mediator entries.

load_model

load_model(*args, **kwargs) -> None

execute_model

execute_model(scheduler_output: SchedulerOutput, intermediate_tensors: Optional[IntermediateTensors] = None)

collect_nnsight

collect_nnsight(req_ids: list[str], finished_req_ids: list[str] | None = None) -> Optional[bytes]

Collect saved values from mediators, optionally finalizing finished requests.

Called on every streamed output (async) or on finished requests (sync). Saves are collected for ALL req_ids. Mediators listed in finished_req_ids are additionally finalized (result handler, cancel) and cleaned up.

PARAMETER DESCRIPTION
req_ids

Request IDs to collect current saves from.

TYPE: list[str]

finished_req_ids

Subset of request IDs that are finished and should be finalized and cleaned up. None means no requests are finished.

TYPE: list[str] | None DEFAULT: None
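
The collect-then-finalize flow described above can be sketched with a plain dictionary in place of the runner's mediator state. This is only an illustration: the function name collect_sketch is hypothetical, and both the use of pickle for the bytes payload and the choice to return None when nothing was collected are assumptions, not documented behavior.

```python
import pickle
from typing import Any, Dict, List, Optional


def collect_sketch(
    state: Dict[str, Dict[str, Any]],
    req_ids: List[str],
    finished_req_ids: Optional[List[str]] = None,
) -> Optional[bytes]:
    """Collect saves for every request; drop state for finished ones."""
    finished = set(finished_req_ids or [])
    collected: Dict[str, Dict[str, Any]] = {}
    for req_id in req_ids:
        if req_id in state:
            collected[req_id] = dict(state[req_id])  # snapshot current saves
    for req_id in finished:
        state.pop(req_id, None)  # finished requests are cleaned up
    if not collected:
        return None  # assumption: nothing collected means no payload
    return pickle.dumps(collected)  # assumption: saves travel as pickled bytes


state = {"r1": {"x": 1}, "r2": {"y": 2}}
payload = collect_sketch(state, ["r1", "r2"], finished_req_ids=["r2"])
print(pickle.loads(payload))
print(sorted(state))
```

Both requests contribute saves to the payload, but only the finished one is removed from the state, so a still-running request can be collected again on the next streamed output.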