Multiple Token Generation

When generating more than one token, call <module>.next() to indicate that the interventions which follow should be applied to that module's next invocation, i.e. to the next generated token.

Here we generate three tokens and save the hidden states of the last layer for each one:

from nnsight import LanguageModel

model = LanguageModel('openai-community/gpt2', device_map='cuda')
with model.generate('The Eiffel Tower is in the city of', max_new_tokens=3) as tracer:

    hidden_states1 = model.transformer.h[-1].output[0].save()

    hidden_states2 = model.transformer.h[-1].next().output[0].save()

    hidden_states3 = model.transformer.h[-1].next().output[0].save()

    out = model.generator.output.save()

Note how saving before the first next() call returns the hidden states across the entire initial prompt (all 10 tokens), while each save after a next() call returns the hidden states of a single subsequently generated token.

torch.Size([1, 10, 768])
torch.Size([1, 1, 768])
torch.Size([1, 1, 768])
tensor([[ 464,  412,  733,  417, 8765,  318,  287,  262, 1748,  286, 6342,   11,
          290]], device='cuda:0')