Applying Operations

Most basic Python operations and torch operations work directly on proxies, and each one is traced into the computation graph.

In this example, we compute the sum of the last layer's hidden states and add it back to the hidden states themselves (an arbitrary operation, purely for demonstration). By saving each intermediate step, we can inspect how the values change.

[1]:
from nnsight import LanguageModel
import torch

model = LanguageModel('openai-community/gpt2', device_map='cuda')

with model.trace('The Eiffel Tower is in the city of') as tracer:

    # Save the output of the last transformer layer; the output is a tuple,
    # and index [0] is the hidden states tensor.
    hidden_states_pre = model.transformer.h[-1].output[0].save()

    # Reduce the hidden states to a single scalar.
    hs_sum = torch.sum(hidden_states_pre).save()

    # Add the scalar back to every element of the hidden states.
    hs_edited = hidden_states_pre + hs_sum

    hs_edited = hs_edited.save()
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
[2]:
print(hidden_states_pre)
print(hs_sum)
print(hs_edited)
tensor([[[ 0.0505, -0.1728, -0.1690,  ..., -1.0096,  0.1280, -1.0687],
         [ 8.7494,  2.9057,  5.3024,  ..., -8.0418,  1.2964, -2.8677],
         [ 0.2960,  4.6686, -3.6642,  ...,  0.2391, -2.6064,  3.2263],
         ...,
         [ 2.1537,  6.8917,  3.8651,  ...,  0.0588, -1.9866,  5.9188],
         [-0.4460,  7.4285, -9.3065,  ...,  2.0528, -2.7946,  0.5556],
         [ 6.6286,  1.7258,  4.7969,  ...,  7.6714,  3.0683,  2.0481]]],
       device='cuda:0', grad_fn=<AddBackward0>)
tensor(501.2959, device='cuda:0', grad_fn=<SumBackward0>)
tensor([[[501.3464, 501.1231, 501.1269,  ..., 500.2863, 501.4239, 500.2272],
         [510.0453, 504.2016, 506.5983,  ..., 493.2541, 502.5923, 498.4282],
         [501.5919, 505.9645, 497.6317,  ..., 501.5350, 498.6895, 504.5222],
         ...,
         [503.4496, 508.1876, 505.1610,  ..., 501.3547, 499.3093, 507.2147],
         [500.8499, 508.7244, 491.9894,  ..., 503.3487, 498.5013, 501.8515],
         [507.9245, 503.0217, 506.0928,  ..., 508.9673, 504.3641, 503.3440]]],
       device='cuda:0', grad_fn=<AddBackward0>)
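Outside the tracing context, the same arithmetic works in plain PyTorch, which makes the broadcasting behavior easy to verify on a toy tensor (the tensor values below are illustrative, not taken from the model): adding the scalar sum shifts every element by that amount, exactly as seen in the saved outputs above.

```python
import torch

# Toy stand-in for the hidden states captured in the trace.
hidden = torch.tensor([[1.0, 2.0],
                       [3.0, 4.0]])

# Sum all elements to a scalar, then broadcast-add it back.
total = torch.sum(hidden)   # tensor(10.)
edited = hidden + total     # every element shifted by 10

print(edited)               # tensor([[11., 12.], [13., 14.]])
```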