Info
Last Execution: 2026-02-17
| Package | Version |
|---|---|
| nnsight | 0.5.15 |
| Python | 3.12.3 |
| torch | 2.10.0+cu128 |
| transformers | 5.2.0 |
Accessing Intermediate Operations
.output and .input let you hook into a module's inputs and outputs. But what about the operations inside a module's forward pass? .source lets you access any intermediate operation — function calls, method calls, tensor operations — within a module's forward method.
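To see why this matters, note that a plain PyTorch forward hook on a module only exposes the module's final output, never the tensors computed partway through its `forward`. A minimal sketch in plain `torch` (no nnsight):

```python
import torch
from torch import nn

class TinyMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 8)
        self.act = nn.GELU()

    def forward(self, x):
        h = self.fc(x)       # intermediate value: invisible to hooks on TinyMLP
        return self.act(h)

mlp = TinyMLP()
seen = []
mlp.register_forward_hook(lambda mod, inp, out: seen.append(out))
_ = mlp(torch.randn(1, 4))

# The hook captured only the post-activation output; the pre-activation
# tensor `h` never surfaced. `.source` exists to reach values like `h`.
print(len(seen), tuple(seen[0].shape))  # -> 1 (1, 8)
```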
Setup
from nnsight import LanguageModel
model = LanguageModel("openai-community/gpt2", device_map="auto", dispatch=True)
Discovering Operations
Print .source on any module to see its forward method with all hookable operations labeled.
print(model.transformer.h[0].mlp.source)
* def forward(self, hidden_states: tuple[torch.FloatTensor] | None) -> torch.FloatTensor:
self_c_fc_0 -> 0 hidden_states = self.c_fc(hidden_states)
self_act_0 -> 1 hidden_states = self.act(hidden_states)
self_c_proj_0 -> 2 hidden_states = self.c_proj(hidden_states)
self_dropout_0 -> 3 hidden_states = self.dropout(hidden_states)
4 return hidden_states
5
Each labeled line (like self_c_fc_0, self_act_0) is an operation you can access inside a trace. The number suffix distinguishes multiple calls to the same function.
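The naming pattern can be mimicked with a small counter: dots in the call target become underscores, and each repeated target gets the next index. This is an illustrative sketch of the scheme, not nnsight's actual implementation:

```python
from collections import defaultdict

def label_calls(call_names):
    """Assign labels in the style shown above: dotted name with dots
    replaced by underscores, plus a zero-based per-name counter."""
    counts = defaultdict(int)
    labels = []
    for name in call_names:
        base = name.replace(".", "_")
        labels.append(f"{base}_{counts[base]}")
        counts[base] += 1
    return labels

# Two calls to the same function get distinct suffixes
print(label_calls(["self.c_fc", "self.act", "transpose", "transpose"]))
# -> ['self_c_fc_0', 'self_act_0', 'transpose_0', 'transpose_1']
```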
Larger modules have more operations. Here's the attention module:
print(model.transformer.h[0].attn.source)
* def forward(
0 self,
1 hidden_states: tuple[torch.FloatTensor] | None,
2 past_key_values: Cache | None = None,
3 cache_position: torch.LongTensor | None = None,
4 attention_mask: torch.FloatTensor | None = None,
5 encoder_hidden_states: torch.Tensor | None = None,
6 encoder_attention_mask: torch.FloatTensor | None = None,
7 output_attentions: bool | None = False,
8 **kwargs,
9 ) -> tuple[torch.Tensor | tuple[torch.Tensor], ...]:
10 is_cross_attention = encoder_hidden_states is not None
11 if past_key_values is not None:
isinstance_0 -> 12 if isinstance(past_key_values, EncoderDecoderCache):
past_key_values_is_updated_get_0 -> 13 is_updated = past_key_values.is_updated.get(self.layer_idx)
14 if is_cross_attention:
15 # after the first generated id, we can subsequently re-use all key/value_layer from cache
16 curr_past_key_values = past_key_values.cross_attention_cache
17 else:
18 curr_past_key_values = past_key_values.self_attention_cache
19 else:
20 curr_past_key_values = past_key_values
21
22 if is_cross_attention:
hasattr_0 -> 23 if not hasattr(self, "q_attn"):
ValueError_0 -> 24 raise ValueError(
25 "If class is used as cross attention, the weights `q_attn` have to be defined. "
26 "Please make sure to instantiate class with `GPT2Attention(..., is_cross_attention=True)`."
27 )
self_q_attn_0 -> 28 query_states = self.q_attn(hidden_states)
29 attention_mask = encoder_attention_mask
30
31 # Try to get key/value states from cache if possible
32 if past_key_values is not None and is_updated:
33 key_states = curr_past_key_values.layers[self.layer_idx].keys
34 value_states = curr_past_key_values.layers[self.layer_idx].values
35 else:
self_c_attn_0 -> 36 key_states, value_states = self.c_attn(encoder_hidden_states).split(self.split_size, dim=2)
split_0 -> + ...
37 shape_kv = (*key_states.shape[:-1], -1, self.head_dim)
key_states_view_0 -> 38 key_states = key_states.view(shape_kv).transpose(1, 2)
transpose_0 -> + ...
value_states_view_0 -> 39 value_states = value_states.view(shape_kv).transpose(1, 2)
transpose_1 -> + ...
40 else:
self_c_attn_1 -> 41 query_states, key_states, value_states = self.c_attn(hidden_states).split(self.split_size, dim=2)
split_1 -> + ...
42 shape_kv = (*key_states.shape[:-1], -1, self.head_dim)
key_states_view_1 -> 43 key_states = key_states.view(shape_kv).transpose(1, 2)
transpose_2 -> + ...
value_states_view_1 -> 44 value_states = value_states.view(shape_kv).transpose(1, 2)
transpose_3 -> + ...
45
46 shape_q = (*query_states.shape[:-1], -1, self.head_dim)
query_states_view_0 -> 47 query_states = query_states.view(shape_q).transpose(1, 2)
transpose_4 -> + ...
48
49 if (past_key_values is not None and not is_cross_attention) or (
50 past_key_values is not None and is_cross_attention and not is_updated
51 ):
52 # save all key/value_layer to cache to be re-used for fast auto-regressive generation
53 cache_position = cache_position if not is_cross_attention else None
curr_past_key_values_update_0 -> 54 key_states, value_states = curr_past_key_values.update(
55 key_states, value_states, self.layer_idx, {"cache_position": cache_position}
56 )
57 # set flag that curr layer for cross-attn is already updated so we can re-use in subsequent calls
58 if is_cross_attention:
59 past_key_values.is_updated[self.layer_idx] = True
60
61 using_eager = self.config._attn_implementation == "eager"
ALL_ATTENTION_FUNCTIONS_get_interface_0 -> 62 attention_interface: Callable = ALL_ATTENTION_FUNCTIONS.get_interface(
63 self.config._attn_implementation, eager_attention_forward
64 )
65
66 if using_eager and self.reorder_and_upcast_attn:
self__upcast_and_reordered_attn_0 -> 67 attn_output, attn_weights = self._upcast_and_reordered_attn(
68 query_states, key_states, value_states, attention_mask
69 )
70 else:
attention_interface_0 -> 71 attn_output, attn_weights = attention_interface(
72 self,
73 query_states,
74 key_states,
75 value_states,
76 attention_mask,
77 dropout=self.attn_dropout.p if self.training else 0.0,
78 **kwargs,
79 )
80
attn_output_reshape_0 -> 81 attn_output = attn_output.reshape(*attn_output.shape[:-2], -1).contiguous()
contiguous_0 -> + ...
self_c_proj_0 -> 82 attn_output = self.c_proj(attn_output)
self_resid_dropout_0 -> 83 attn_output = self.resid_dropout(attn_output)
84
85 return attn_output, attn_weights
86
Getting an Intermediate Value
Access any labeled operation's .output inside a trace — just like you would with a module.
with model.trace("The Eiffel Tower is in the city of"):
# Get the hidden states after the GELU activation (before projection)
post_gelu = model.transformer.h[0].mlp.source.self_act_0.output.save()
print(f"Post-GELU shape: {post_gelu.shape}")
Post-GELU shape: torch.Size([1, 10, 3072])
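The 3072 in that shape is GPT-2's MLP expansion: `c_fc` widens the 768-dimensional hidden state by a factor of 4 before `c_proj` maps it back down. A quick sanity check of the arithmetic:

```python
n_embd = 768            # GPT-2 hidden size
batch, seq_len = 1, 10  # one prompt of 10 tokens
inner = 4 * n_embd      # c_fc expands 768 -> 3072

print((batch, seq_len, inner))  # -> (1, 10, 3072)
```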
How source works
When you access .source, nnsight rewrites the module's forward method to inject hooks into every operation. Each operation gets .input, .inputs, and .output properties — just like a regular module.
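Conceptually, the injection is like wrapping each operation so its result passes through a recording layer. A toy version using a plain Python wrapper, which only mimics the idea (nnsight actually rewrites the forward source rather than patching attributes):

```python
import torch
from torch import nn

def hooked(fn, record, key):
    """Wrap a callable so every call logs its output under `key`."""
    def wrapper(*args, **kwargs):
        out = fn(*args, **kwargs)
        record[key] = out
        return out
    return wrapper

mlp = nn.Sequential(nn.Linear(4, 8), nn.GELU(), nn.Linear(8, 4))
record = {}
# Wrap the GELU's forward, mimicking what a self_act_0 hook exposes
mlp[1].forward = hooked(mlp[1].forward, record, "self_act_0")
_ = mlp(torch.randn(2, 4))

print(tuple(record["self_act_0"].shape))  # -> (2, 8)
```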
Setting an Intermediate Value
You can also modify intermediate values. Here we zero out the MLP's post-GELU activations at layer 11 to see how it affects the prediction:
with model.trace("The Eiffel Tower is in the city of"):
normal_logits = model.lm_head.output.save()
with model.trace("The Eiffel Tower is in the city of"):
# Zero the MLP's GELU output at layer 11
model.transformer.h[11].mlp.source.self_act_0.output[:] = 0
modified_logits = model.lm_head.output.save()
print(f"Normal: {model.tokenizer.decode(normal_logits[0, -1].argmax(dim=-1))}")
print(f"Zeroed GELU: {model.tokenizer.decode(modified_logits[0, -1].argmax(dim=-1))}")
Normal: Paris
Zeroed GELU: London
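The same style of zero-ablation can be reproduced with a plain PyTorch forward hook, whose non-None return value replaces the module's output. A minimal sketch on a toy model, not the GPT-2 run above:

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.GELU(), nn.Linear(8, 3))
x = torch.randn(1, 4)

normal = model(x)

# Returning a tensor from a forward hook replaces the module's output,
# so this zeroes the post-GELU activations, like `.output[:] = 0` above.
handle = model[1].register_forward_hook(lambda mod, inp, out: torch.zeros_like(out))
ablated = model(x)
handle.remove()

# With the GELU output zeroed, only the final layer's bias survives
print(torch.allclose(ablated, model[2].bias))  # -> True
```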
Patching Between Layers
Transfer an intermediate value from one layer to another:
with model.trace("The Eiffel Tower is in the city of"):
# Capture layer 0's post-GELU activations
gelu_0 = model.transformer.h[0].mlp.source.self_act_0.output
# Patch them into layer 5
model.transformer.h[5].mlp.source.self_act_0.output = gelu_0
logits = model.lm_head.output.save()
print(f"Patched MLP prediction: {model.tokenizer.decode(logits[0, -1].argmax(dim=-1))}")
Patched MLP prediction: London
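Outside nnsight, the same capture-then-overwrite pattern can be built from a pair of forward hooks: one records a module's output, the other substitutes it into a second module. A toy sketch:

```python
import torch
from torch import nn

torch.manual_seed(0)
# Two modules with matching shapes, standing in for MLPs at different depths
layer_a = nn.GELU()
layer_b = nn.GELU()

captured = {}
# dict.update returns None, so this hook records without modifying the output
layer_a.register_forward_hook(lambda m, i, o: captured.update(src=o))
# Returning a tensor replaces layer_b's output with the captured value
layer_b.register_forward_hook(lambda m, i, o: captured["src"])

out_a = layer_a(torch.randn(1, 8))
out_b = layer_b(torch.randn(1, 8))   # different input, but output is patched

print(torch.equal(out_b, out_a))  # -> True
```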
Recursive Source Tracing
.source works recursively. If a labeled operation calls another function, you can chain .source to trace into it. For example, the attention interface function internally calls scaled_dot_product_attention:
with model.trace("The Eiffel Tower is in the city of"):
# Trace into the attention interface → access the SDPA call's input (the query tensor)
sdpa = model.transformer.h[0].attn.source.attention_interface_0.source
query = sdpa.torch_nn_functional_scaled_dot_product_attention_0.input.save()
print(f"Query shape: {query.shape}") # [batch, heads, seq_len, head_dim]
Query shape: torch.Size([1, 12, 10, 64])
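The `[1, 12, 10, 64]` shape comes from splitting GPT-2's 768-dimensional hidden state into 12 heads of 64 dimensions each, then swapping the sequence and head axes. This is the same `view(...).transpose(1, 2)` pattern labeled `query_states_view_0` in the listing above, reproduced here in plain `torch`:

```python
import torch

batch, seq_len, n_embd = 1, 10, 768
n_heads, head_dim = 12, 64   # 12 * 64 == 768

query = torch.randn(batch, seq_len, n_embd)
# Same reshape as query_states.view(shape_q).transpose(1, 2) in the source listing
query = query.view(batch, seq_len, n_heads, head_dim).transpose(1, 2)

print(tuple(query.shape))  # -> (1, 12, 10, 64)
```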
Viewing a specific operation
Print a specific operation to see it highlighted in its surrounding context:
print(model.transformer.h[0].attn.source.self_c_proj_0)
This shows the operation with an arrow (-->) and surrounding lines for context.
Don't chain .source through another .source on submodules
If a .source listing shows a submodule call (like self.c_proj), access it directly on the model — not through .source:
# Wrong — don't chain through .source
model.transformer.h[0].attn.source.self_c_proj_0.source
# Correct — access the submodule directly
model.transformer.h[0].attn.c_proj.source