
GPUModelRunner

NNsightGPUModelRunner

NNsightGPUModelRunner(*args, **kwargs)

Bases: GPUModelRunner

nnsight_model instance-attribute

nnsight_model: VLLM

nnsight_request_helper instance-attribute

nnsight_request_helper = NNsightRequestHelper()

NNsightRequestHelper

NNsightRequestHelper()

Helper class for batching requests in the GPUModelRunner.

ATTRIBUTE DESCRIPTION
ids_to_batch_group

Dictionary mapping request IDs to their assigned batch group indices.

TYPE: Dict[str, int]

interleaver_to_ids

Dictionary mapping interleavers to sets of request IDs.

TYPE: Dict[Interleaver, Set[str]]

flat_batch_groups

Dictionary mapping interleavers to their flattened batch groups.

TYPE: Dict[Interleaver, List[Tuple[int, int]]]
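The three mappings above can be pictured with a small, hypothetical example. Note the `Interleaver` stub and all values below are illustrative only, not nnsight's actual objects:

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for nnsight's Interleaver; used here only as a dict key.
class Interleaver:
    def __init__(self, name: str) -> None:
        self.name = name

interleaver = Interleaver("trace-0")

# Request IDs mapped to their assigned batch group index.
ids_to_batch_group: Dict[str, int] = {"req-0": 0, "req-1": 0, "req-2": 1}

# Each interleaver tracks the set of request IDs it is interleaving.
interleaver_to_ids: Dict[Interleaver, Set[str]] = {
    interleaver: {"req-0", "req-1", "req-2"}
}

# Flattened batch groups per interleaver: (start_position, size) token ranges.
flat_batch_groups: Dict[Interleaver, List[Tuple[int, int]]] = {
    interleaver: [(0, 8), (8, 4)]
}
```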

METHOD DESCRIPTION
process_new_reqs

process_new_reqs(new_reqs: List[NewRequestData], model: VLLM) -> None: Process new requests and compute the flat batch groups.

process_finished_reqs

process_finished_reqs(finished_request_ids: Set[str], requests, model: VLLM) -> None: Process finished requests by updating batch groups and cleaning up mappings.

req_id_to_batch_group_idx instance-attribute

req_id_to_batch_group_idx: Dict[str, int] = {}

num_prompts_in_mediator instance-attribute

num_prompts_in_mediator = {}
process_new_reqs

process_new_reqs(new_reqs: List[NewRequestData], model: VLLM) -> None

Process new requests and organize them into batch groups for execution.

This method handles the batching logic for new requests, organizing them into appropriate batch groups based on their interleaver's batching strategy.

PARAMETER DESCRIPTION
new_reqs

List of new request data objects to process. Each request contains sampling parameters with an associated interleaver that defines the batching behavior.

TYPE: List[NewRequestData]

Notes
  • Resets the flat_batch_groups dictionary at the start
  • For interleavers that require batching, requests are assigned to batch groups
  • Batch groups are tuples of (start_position, size) indicating token ranges
  • Updates internal tracking dictionaries for request-to-batch-group mapping
  • Advances to next batch group when current group capacity is exceeded
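The steps above can be sketched as a standalone helper. This is a minimal sketch under assumed inputs: `assign_batch_groups`, `prompt_lens`, and `group_capacity` are hypothetical names for illustration, not part of the class:

```python
from typing import Dict, List, Tuple

def assign_batch_groups(
    prompt_lens: List[int], group_capacity: int
) -> Tuple[Dict[int, int], List[Tuple[int, int]]]:
    """Assign each request (by index) to a batch group holding at most
    `group_capacity` requests, and record each group's flattened
    (start_position, size) token range."""
    req_to_group: Dict[int, int] = {}
    groups: List[Tuple[int, int]] = []
    group_idx, group_start, group_size, reqs_in_group = 0, 0, 0, 0
    for i, n_tokens in enumerate(prompt_lens):
        # Advance to the next batch group once the current one is full.
        if reqs_in_group == group_capacity:
            groups.append((group_start, group_size))
            group_idx += 1
            group_start += group_size
            group_size, reqs_in_group = 0, 0
        req_to_group[i] = group_idx
        group_size += n_tokens
        reqs_in_group += 1
    if group_size:
        groups.append((group_start, group_size))
    return req_to_group, groups

# Four prompts of 3, 5, 2, 4 tokens, at most 2 requests per group:
req_to_group, groups = assign_batch_groups([3, 5, 2, 4], group_capacity=2)
# req_to_group == {0: 0, 1: 0, 2: 1, 3: 1}
# groups == [(0, 8), (8, 6)]
```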
unflatten

unflatten(model: VLLM)

process_finished_reqs

process_finished_reqs(finished_request_ids: Set[str], requests, model: VLLM) -> None

load_model

load_model(*args, **kwargs) -> None

execute_model

execute_model(scheduler_output: SchedulerOutput, intermediate_tensors: Optional[IntermediateTensors] = None)

finish_nnsight

finish_nnsight(finished_requests: list[RequestOutput]) -> ModelRunnerOutput