Multiple Token Generation

When generating more than one token, call <module>.next() to indicate that the interventions which follow should be applied to that module's next invocation, i.e. to the next generated token.

Here we generate three tokens and save the hidden states of the last layer for each one:

from nnsight import LanguageModel

model = LanguageModel('openai-community/gpt2', device_map='cuda')
with model.generate('The Eiffel Tower is in the city of', max_new_tokens=3) as tracer:

    hidden_states1 = model.transformer.h[-1].output[0].save()

    hidden_states2 = model.transformer.h[-1].next().output[0].save()

    hidden_states3 = model.transformer.h[-1].next().output[0].save()

    out = model.generator.output.save()

Note how saving before the first next() call returns the hidden states across the entire initial prompt (all 10 tokens), while each save after a next() call returns the hidden states of a single subsequently generated token.

torch.Size([1, 10, 768])
torch.Size([1, 1, 768])
torch.Size([1, 1, 768])
tensor([[ 464,  412,  733,  417, 8765,  318,  287,  262, 1748,  286, 6342,   11,
          290]], device='cuda:0')