GPUModelRunner¶
NNsightGPUModelRunner¶
Bases: GPUModelRunner
Custom vLLM GPU model runner that interleaves NNsight interventions with model execution.
Wraps the model with an NNsight `Envoy`, deserializes mediators from incoming
`NNsightSamplingParams`, and manages batch group mappings so that each invoke's
intervention code sees the correct slice of the batch.
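The batch-slicing idea can be sketched in plain Python. This is a hedged illustration, not the actual `NNsightGPUModelRunner` internals: each invoke's intervention code should only see the batch positions assigned to it.

```python
# Hedged sketch of per-invoke batch slicing (not the actual
# NNsightGPUModelRunner internals): each invoke's intervention code
# should only see the rows of the batch that belong to its invoke.
def slice_for_invoke(batch, batch_group):
    """Select the entries of `batch` assigned to one invoke."""
    return [batch[i] for i in batch_group]

# Two invokes split a batch of four requests between them.
batch = ["req0", "req1", "req2", "req3"]
groups = {0: [0, 1], 1: [2, 3]}  # invoke index -> batch positions
per_invoke = {g: slice_for_invoke(batch, idxs) for g, idxs in groups.items()}
```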
NNsightRequestHelper¶
Helper class for batching requests in the GPUModelRunner.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `ids_to_batch_group` | Dictionary mapping request IDs to their assigned batch group indices. |
| `interleaver_to_ids` | Dictionary mapping interleavers to sets of request IDs. |
| `flat_batch_groups` | Dictionary mapping interleavers to their flattened batch groups. |
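The three mappings above can be sketched with ordinary dictionaries. Plain strings stand in for `Interleaver` objects here; the shapes are assumptions for illustration, not the real types.

```python
from collections import defaultdict

# Sketch of the three NNsightRequestHelper mappings, with plain strings
# standing in for Interleaver objects (assumed shapes, not the real types).
ids_to_batch_group = {}                # request ID -> batch group index
interleaver_to_ids = defaultdict(set)  # interleaver -> set of request IDs
flat_batch_groups = defaultdict(list)  # interleaver -> flattened batch groups

def register(req_id, interleaver, group_index):
    """Record one request's place in the batch bookkeeping."""
    ids_to_batch_group[req_id] = group_index
    interleaver_to_ids[interleaver].add(req_id)
    flat_batch_groups[interleaver].append(group_index)

register("req-0", "interleaver-a", 0)
register("req-1", "interleaver-a", 0)
register("req-2", "interleaver-b", 1)
```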
| METHOD | DESCRIPTION |
|---|---|
| `process_new_reqs` | Process new requests and compute the flat batch groups. |
| `process_finished_req` | Process a finished request by updating batch groups and cleaning up mappings. |
process_new_reqs¶
process_new_reqs(new_reqs: List[NewRequestData], model: VLLM) -> None
Process new requests and organize them into batch groups for execution.
Each request carries its own serialized mediator. When multiple
mediators belong to the same trace (identified by trace_id), the
first arrival's __globals__ become the canonical reference.
Subsequent arrivals graft the saved variable entries from the
canonical globals into their own __globals__, so all mediators
share the same Python objects for cross-invoke state.
| PARAMETER | DESCRIPTION |
|---|---|
| `new_reqs` | List of new request data objects to process. |
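The canonical-globals grafting described above can be sketched with plain dictionaries. The `trace_id`, the globals dicts, and the saved-variable keys are illustrative stand-ins for the real mediator state, not the actual implementation.

```python
# Hedged sketch of the canonical-globals grafting: the first mediator to
# arrive for a trace becomes the canonical reference, and later arrivals
# have its saved entries grafted in so they share the same objects.
canonical_globals = {}  # trace_id -> the first-arriving mediator's globals

def graft(trace_id, mediator_globals, saved_keys):
    """First arrival becomes canonical; later arrivals share its objects."""
    if trace_id not in canonical_globals:
        canonical_globals[trace_id] = mediator_globals
        return mediator_globals
    canonical = canonical_globals[trace_id]
    for key in saved_keys:
        if key in canonical:
            mediator_globals[key] = canonical[key]  # same Python object
    return mediator_globals

first = {"saved": []}
graft("trace-1", first, ["saved"])
second = {"saved": []}
graft("trace-1", second, ["saved"])
first["saved"].append("state")  # visible through both mediators
```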
process_batch_groups¶
process_batch_groups(num_tokens_scheduled: Dict[str, int], batch_req_ids: List[str], model: VLLM) -> None
match_req_ids¶
Match engine-reported request IDs to stored mediators.
vLLM appends a hash suffix to request IDs (e.g. "0-abc123"
or "uuid-abc123"). This method strips the suffix with
`rsplit` and falls back to an exact match.
| RETURNS | DESCRIPTION |
|---|---|
| `List[tuple]` | List of |
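The suffix-stripping lookup can be sketched as follows; the mediator table and its keys here are illustrative, not the stored mediator format.

```python
# Sketch of the suffix-stripping lookup described above. vLLM appends a
# hash suffix ("0-abc123"), so we strip it with rsplit first and fall
# back to an exact match on the full engine-reported ID.
mediators = {"0": "mediator-0", "uuid": "mediator-1"}

def match_req_id(engine_id):
    """Strip the trailing hash suffix; fall back to an exact match."""
    base = engine_id.rsplit("-", 1)[0]
    if base in mediators:
        return mediators[base]
    return mediators.get(engine_id)
```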
finalize_mediators¶
finalize_mediators(matched, finished_req_id_set, model: VLLM) -> set
Run result handler and cancel finished mediators.
| RETURNS | DESCRIPTION |
|---|---|
| `set` | Set of internal keys for mediators that were finalized. |
collect_saves¶
Collect saved values from mediator frames.
Gathers per-invoke saves from frame locals and trace-shared saves from canonical globals (only when a trace is fully done).
| RETURNS | DESCRIPTION |
|---|---|
| `tuple` | |
| `tuple` | |
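The two-level collection can be sketched like this. All names here (the frame-locals layout, the `saves` key) are assumptions for illustration, not the real frame structure.

```python
# Illustrative sketch of the two-level save collection: per-invoke saves
# come from each mediator's frame locals; trace-shared saves come from
# the canonical globals, but only once the whole trace is done.
def collect_saves(frame_locals, canonical_globals, trace_done):
    per_invoke = {rid: locs.get("saves", {}) for rid, locs in frame_locals.items()}
    shared = canonical_globals.get("saves", {}) if trace_done else {}
    return per_invoke, shared

frames = {"req-0": {"saves": {"h": 1}}, "req-1": {}}
per_invoke, shared = collect_saves(frames, {"saves": {"total": 2}}, trace_done=False)
```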
cleanup_finished¶
Clean up state for finished requests.
Removes entries from `Globals.saves`, deletes completed
trace contexts, and drops mediator entries.
execute_model¶
execute_model(scheduler_output: SchedulerOutput, intermediate_tensors: Optional[IntermediateTensors] = None)
collect_nnsight¶
Collect saved values from mediators, optionally finalizing finished requests.
Called on every streamed output (async) or on finished requests (sync).
Saves are collected for ALL `req_ids`. Mediators listed in
`finished_req_ids` are additionally finalized (result handler,
cancel) and cleaned up.
| PARAMETER | DESCRIPTION |
|---|---|
| `req_ids` | Request IDs to collect current saves from. |
| `finished_req_ids` | Subset of request IDs that are finished and should be finalized and cleaned up. |
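The collect/finalize split can be sketched as follows. The `mediators` table is an illustrative stand-in; finalization (result handler, cancel) is reduced to dropping the entry here.

```python
# Hedged sketch of the collect/finalize split: saves are gathered for
# every listed request, but only the finished subset is finalized and
# dropped from the table.
def collect_nnsight(mediators, req_ids, finished_req_ids):
    collected = {rid: mediators[rid]["saves"] for rid in req_ids}
    for rid in finished_req_ids:  # stand-in for finalize + clean up
        mediators.pop(rid, None)
    return collected

table = {"a": {"saves": 1}, "b": {"saves": 2}}
out = collect_nnsight(table, ["a", "b"], ["b"])
```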