Multiple Token Generation
When generating more than one token, use <module>.next()
to indicate that the interventions that follow should apply to the next generation step for that module.
Here we generate three tokens and save the hidden states of the last layer for each one:
[1]:
from nnsight import LanguageModel
model = LanguageModel('openai-community/gpt2', device_map='cuda')
[7]:
with model.generate('The Eiffel Tower is in the city of', max_new_tokens=3) as tracer:
    # Hidden states of the last layer over the full prompt (first forward pass).
    hidden_states1 = model.transformer.h[-1].output[0].save()
    # Hidden states of the last layer for the second and third generated tokens.
    hidden_states2 = model.transformer.h[-1].next().output[0].save()
    hidden_states3 = model.transformer.h[-1].next().output[0].save()
    # Token ids of the full generated sequence.
    out = model.generator.output.save()
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Note how calling .save() before .next()
returns the hidden states across the entire initial prompt, while calling .save() after .next() returns the hidden state of each subsequently generated token.
[8]:
print(hidden_states1.shape)
print(hidden_states2.shape)
print(hidden_states3.shape)
print(out)
torch.Size([1, 10, 768])
torch.Size([1, 1, 768])
torch.Size([1, 1, 768])
tensor([[ 464, 412, 733, 417, 8765, 318, 287, 262, 1748, 286, 6342, 11,
290]], device='cuda:0')
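To confirm what was generated, the token ids above can be decoded with the model's tokenizer. A minimal check (assuming the saved out can be indexed like an ordinary tensor once the trace has finished, as the print above suggests):
[ ]:
# Decode the full sequence: the 10 prompt tokens plus the 3 generated ones.
print(model.tokenizer.decode(out[0]))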
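For longer generations, writing one .next() call per token becomes repetitive. Below is a sketch of one way to generalize the pattern; it assumes that a plain Python loop inside the tracing context behaves the same as the chained calls above, with each .next() call advancing to the following generation step:
[ ]:
n_new_tokens = 3

with model.generate('The Eiffel Tower is in the city of', max_new_tokens=n_new_tokens) as tracer:
    # The first forward pass runs over the whole prompt and yields the first new token.
    hidden_states = [model.transformer.h[-1].output[0].save()]
    # Each subsequent .next() targets the next generation step.
    for _ in range(n_new_tokens - 1):
        hidden_states.append(model.transformer.h[-1].next().output[0].save())
    out = model.generator.output.save()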