Info
Last Execution: 2026-02-17
| Package | Version |
|---|---|
| nnsight | 0.5.15 |
| Python | 3.12.3 |
| torch | 2.10.0+cu128 |
| transformers | 5.2.0 |
Setting Activations¶
Setting is how you intervene on a model by editing activations as they flow through the network. This is the basis of techniques like activation patching, ablation, and steering.
Setup¶
from nnsight import LanguageModel
model = LanguageModel("openai-community/gpt2", device_map="auto", dispatch=True)
In-Place Setting¶
Use slice assignment to modify a tensor's values in-place. The original tensor object is mutated, so downstream modules see the change immediately.
with model.trace("The Eiffel Tower is in the city of"):
    # Clone before the edit so we can compare
    before = model.transformer.h[0].output[0].clone().save()
    # Zero out all activations at layer 0
    model.transformer.h[0].output[0][:] = 0
    after = model.transformer.h[0].output[0].save()

print("Before:", before[0, 0, :5])
print("After: ", after[0, 0, :5])
Before: tensor([ 0.1559, -0.7946, 0.3943, 0.3413, -0.5653], device='cuda:0',
grad_fn=<SliceBackward0>)
After: tensor([0., 0., 0., 0., 0.], device='cuda:0', grad_fn=<SliceBackward0>)
Clone before saving in-place modifications
When modifying in-place, the saved reference and the modified tensor point to the same memory. If you want to capture the "before" state, call .clone() before the modification.
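The aliasing pitfall can be seen in plain torch, independent of nnsight: a saved reference and a clone behave differently once the tensor is edited in-place.

```python
import torch

# A reference shares storage with the original tensor, so an in-place
# edit shows up in both; a clone is an independent copy.
x = torch.ones(3)
ref = x           # same underlying storage as x
snap = x.clone()  # independent copy
x[:] = 0          # in-place edit

print(ref)   # tensor([0., 0., 0.]) -- aliased, sees the edit
print(snap)  # tensor([1., 1., 1.]) -- clone preserved the "before" state
```

This is exactly why the example above calls `.clone()` before zeroing the activations.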
Replacement Setting¶
Assign a completely new tensor to a module's output. This replaces the tensor object rather than mutating it, so a previously saved reference still points to the old tensor and there is no need to .clone() when comparing before/after.
with model.trace("The Eiffel Tower is in the city of"):
    before = model.transformer.wte.output.save()
    # Replace the embedding output with a scaled version
    model.transformer.wte.output = model.transformer.wte.output * 0.5
    after = model.transformer.wte.output.save()

print("Before:", before[0, 0, :5])
print("After: ", after[0, 0, :5])
Before: tensor([-0.0686, -0.0203, 0.0645, -0.0621, -0.1135], device='cuda:0',
grad_fn=<SliceBackward0>)
After: tensor([-0.0343, -0.0101, 0.0322, -0.0310, -0.0568], device='cuda:0',
grad_fn=<SliceBackward0>)
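The replacement semantics have a plain-torch analogue: rebinding a name to a new tensor leaves the original object untouched, which is why no `.clone()` was needed here.

```python
import torch

# Rebinding creates a new tensor object; `before` still points at the old one.
x = torch.ones(3)
before = x   # reference to the original object
x = x * 0.5  # new object; the original is not mutated

print(before)  # tensor([1., 1., 1.])
print(x)       # tensor([0.5000, 0.5000, 0.5000])
```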
Setting Specific Positions¶
You can target specific batch items, token positions, or hidden dimensions using standard indexing.
with model.trace("The Eiffel Tower is in the city of"):
    # Zero out only the last token position at layer 5
    model.transformer.h[5].output[0][:, -1, :] = 0
    output = model.lm_head.output.save()

predicted = model.tokenizer.decode(output[0, -1].argmax(dim=-1))
print(f"Predicted next token (after zeroing last position at layer 5): {predicted}")
Predicted next token (after zeroing last position at layer 5): ,
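The same indexing patterns work on any (batch, seq_len, hidden) activation tensor. A quick pure-torch sketch (the shapes here are illustrative stand-ins, not GPT-2's real sizes):

```python
import torch

# Illustrative activation tensor: 2 batch items, 5 tokens, 8 hidden dims.
acts = torch.zeros(2, 5, 8)
acts[:, -1, :] = 1.0  # last token position, every batch item
acts[1, 0, :] = 2.0   # first token of the second batch item only
acts[:, :, :4] = 3.0  # first four hidden dimensions, everywhere

print(acts[0, -1, 4:])  # tensor([1., 1., 1., 1.]) -- untouched by the last write
```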
Adding a Steering Vector¶
A common intervention is adding a direction vector to activations to steer model behavior.
import torch

with model.trace("The Eiffel Tower is in the city of"):
    hidden = model.transformer.h[6].output[0]
    # Create a random steering vector on the same device
    steering = torch.randn(hidden.shape[-1], device=hidden.device) * 10
    # Add it to the last token position
    model.transformer.h[6].output[0][:, -1, :] += steering
    output = model.lm_head.output.save()

predicted = model.tokenizer.decode(output[0, -1].argmax(dim=-1))
print(f"Predicted next token (after steering): {predicted}")
Predicted next token (after steering): all
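In practice, steering vectors are usually derived rather than random, e.g. as a difference of mean activations between two sets of prompts. A sketch of that construction, using random tensors as stand-ins for real cached activations (the shapes and the magnitude 10 are assumptions for illustration):

```python
import torch

# Stand-ins for cached layer activations from two contrasting prompt sets,
# each of shape (num_prompts, hidden_dim). GPT-2's hidden_dim is 768.
pos_acts = torch.randn(10, 768)
neg_acts = torch.randn(10, 768)

# Difference-of-means direction, rescaled to a chosen magnitude.
steering = pos_acts.mean(dim=0) - neg_acts.mean(dim=0)
steering = steering / steering.norm() * 10

print(steering.shape)  # torch.Size([768])
```

The resulting vector would then be added to activations exactly as in the example above.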
Handling Tuple Outputs¶
Some modules (like GPT-2 transformer blocks) return tuples. Use slice assignment on the element you want to modify, or rebuild the tuple for replacement.
with model.trace("The Eiffel Tower is in the city of"):
    # In-place: modify the first element of the tuple
    model.transformer.h[0].output[0][:] = 0
    output = model.transformer.h[0].output.save()

print(f"Output is a {type(output).__name__} with {len(output)} elements")
Output is a tuple with 1 elements
Tuple assignment
model.layer.output[0][:] = 0 modifies the tensor inside the tuple in-place. To replace a tuple element entirely, reassign the full tuple: model.layer.output = (new_tensor,) + model.layer.output[1:].
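The rebuild step works because Python tuples are immutable: replacing an element means constructing a new tuple. A minimal sketch of the same pattern in plain torch:

```python
import torch

# "Replacing" a tuple element means building a new tuple, mirroring
# model.layer.output = (new_tensor,) + model.layer.output[1:].
out = (torch.ones(2, 4), torch.zeros(2, 4))
new_first = torch.full((2, 4), 7.0)
out = (new_first,) + out[1:]

print(out[0][0, 0].item())  # 7.0
print(len(out))             # 2 -- remaining elements are preserved
```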
Verifying Downstream Effects¶
Setting an early layer's output affects all downstream computations. Here we compare the final logits with and without an intervention.
with model.trace("The Eiffel Tower is in the city of"):
    clean_logits = model.lm_head.output.clone().save()

with model.trace("The Eiffel Tower is in the city of"):
    model.transformer.h[0].output[0][:] = 0
    modified_logits = model.lm_head.output.save()

diff = (clean_logits - modified_logits).abs().mean()
print(f"Mean absolute difference in logits: {diff:.4f}")
print(f"Clean prediction: {model.tokenizer.decode(clean_logits[0, -1].argmax(dim=-1))}")
print(f"Modified prediction: {model.tokenizer.decode(modified_logits[0, -1].argmax(dim=-1))}")
Mean absolute difference in logits: 59.8508
Clean prediction: Paris
Modified prediction: ,
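Beyond a mean absolute logit difference, the KL divergence between the two next-token distributions is a common way to quantify how much an intervention shifts the model's predictions. A self-contained sketch using random logits as stand-ins for `clean_logits[0, -1]` and `modified_logits[0, -1]`:

```python
import torch
import torch.nn.functional as F

# Stand-ins for next-token logits over GPT-2's 50257-token vocabulary.
clean = torch.randn(50257)
modified = torch.randn(50257)

# KL(clean || modified): F.kl_div expects log-probs as input, probs as target.
kl = F.kl_div(F.log_softmax(modified, dim=-1),
              F.softmax(clean, dim=-1),
              reduction="sum")
print(f"KL divergence: {kl.item():.4f}")
```

Unlike the raw logit difference, KL is invariant to the constant logit shifts that softmax ignores.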