Info
Last Execution: 2026-02-17
| Package | Version |
|---|---|
| nnsight | 0.5.15 |
| Python | 3.12.3 |
| torch | 2.10.0+cu128 |
| transformers | 5.2.0 |
Model Editing¶
Interventions inside model.trace() are temporary — they only apply during that forward pass. With model.edit(), you can create persistently modified versions of a model that apply interventions on every forward pass.
Setup¶
import torch
import torch.nn as nn
from nnsight import LanguageModel
model = LanguageModel("openai-community/gpt2", device_map="auto", dispatch=True)
Creating an Edited Model¶
Use model.edit() to define persistent interventions. By default, this creates a new model reference — the original model is unchanged.
# First, capture hidden states that produce "Paris"
with model.trace("The Eiffel Tower is in the city of"):
    paris_hs = model.transformer.h[-1].output[0][:, -1, :].save()
# Create an edited model that always injects "Paris" hidden states at the last layer
with model.edit() as model_edited:
    model.transformer.h[-1].output[0][:, -1, :] = paris_hs
# Original model still works normally
with model.trace("Vatican is in the city of"):
    original = model.lm_head.output.argmax(dim=-1).save()
# Edited model always predicts "Paris"
with model_edited.trace("Vatican is in the city of"):
    modified = model.lm_head.output.argmax(dim=-1).save()
print(f"Original: {model.tokenizer.decode(original[0, -1])}")
print(f"Edited: {model.tokenizer.decode(modified[0, -1])}")
Original: Rome
Edited: Paris
The edit persists across multiple calls:
prompts = [
"The Colosseum is in the city of",
"Big Ben is in the city of",
"The Statue of Liberty is in the city of",
]
for prompt in prompts:
    with model_edited.trace(prompt):
        tokens = model.lm_head.output.argmax(dim=-1).save()
    print(f"{prompt} → {model.tokenizer.decode(tokens[0, -1])}")
The Colosseum is in the city of → Paris
Big Ben is in the city of → Paris
The Statue of Liberty is in the city of → Paris
How editing works
model.edit() records your interventions and replays them on every subsequent forward pass through the edited model. The edit context uses the same syntax as model.trace() — you access .output and .input on modules the same way.
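For intuition only, this replay behavior resembles a plain PyTorch forward hook that stays registered across calls. The sketch below is a loose analogy, not how nnsight implements edits, and the hook function is hypothetical:

```python
import torch
import torch.nn as nn

# Analogy only: a registered forward hook replays the same intervention
# on every forward pass, much like an edit persists across traces.
layer = nn.Linear(4, 4)

def zero_last_feature(module, inputs, output):
    out = output.clone()
    out[:, -1] = 0.0  # the "edit": zero the last feature on every call
    return out        # returned tensor replaces the module's output

handle = layer.register_forward_hook(zero_last_feature)

x = torch.randn(2, 4)
for _ in range(3):  # the hook fires on every forward pass
    assert torch.all(layer(x)[:, -1] == 0)

handle.remove()  # removing the hook restores the original behavior
```

Removing the hook plays the same role that clear_edits() plays for in-place edits: the unmodified forward pass comes back.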
In-Place Editing¶
By default, model.edit() creates a new reference, leaving the original unchanged. If you want to modify the original model directly, use inplace=True.
with model.edit(inplace=True):
    model.transformer.h[-1].output[0][:, -1, :] = paris_hs
# Now the original model itself is modified
with model.trace("Vatican is in the city of"):
    tokens = model.lm_head.output.argmax(dim=-1).save()
print(f"In-place edited: {model.tokenizer.decode(tokens[0, -1])}")
In-place edited: Paris
Use inplace=True with caution
In-place edits affect all subsequent forward passes through the model. This includes any model.trace() calls. If you're experimenting, prefer the default (non-inplace) mode so the original model stays clean.
Clearing Edits¶
Use .clear_edits() to remove all in-place edits and restore the model to its original state.
model.clear_edits()
with model.trace("Vatican is in the city of"):
    tokens = model.lm_head.output.argmax(dim=-1).save()
print(f"After clear_edits(): {model.tokenizer.decode(tokens[0, -1])}")
After clear_edits(): Rome
Attaching Custom Modules¶
You can attach your own PyTorch modules to the model — such as a sparse autoencoder (SAE) or a LoRA adapter — and wire them into the forward pass with model.edit(). Once wired, the custom module is fully instrumented: you can access its .output in a trace just like any other module.
# Define a low-rank adapter
class Adapter(nn.Module):
    def __init__(self, dim, rank=16):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)  # Zero output at init, so the adapter has no effect yet

    def forward(self, x):
        return self.up(self.down(x))
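Before wiring the adapter in, it is easy to sanity-check in plain PyTorch that the zero-initialized up projection makes its contribution exactly zero. This standalone sketch re-declares the class above so it runs on its own:

```python
import torch
import torch.nn as nn

# Standalone copy of the Adapter above, just to verify the zero-init claim.
class Adapter(nn.Module):
    def __init__(self, dim, rank=16):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)  # zero up-projection => zero output

    def forward(self, x):
        return self.up(self.down(x))

adapter = Adapter(768, rank=16)
x = torch.randn(1, 10, 768)
out = adapter(x)
assert out.shape == x.shape       # same shape as the hidden states
assert out.abs().max() == 0       # contributes nothing until trained
```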
# Attach it to layer 5 (on the same device as the layer)
device = next(model.transformer.h[5]._module.parameters()).device
model.transformer.h[5].adapter = Adapter(768, rank=16).to(device)
# Wire the adapter into the forward pass:
# layer output += adapter(layer input)
with model.edit() as model_adapted:
    h5_input = model.transformer.h[5].inputs[0][0]
    adapter_out = model.transformer.h[5].adapter(h5_input, hook=True)
    model.transformer.h[5].output[0][:] = model.transformer.h[5].output[0] + adapter_out
With zero-initialized weights, the adapter has no effect:
with model.trace("The Eiffel Tower is in the city of"):
    orig_logits = model.lm_head.output.save()
with model_adapted.trace("The Eiffel Tower is in the city of"):
    adapter_result = model.transformer.h[5].adapter.output.save()
    edited_logits = model.lm_head.output.save()
print(f"Adapter output shape: {adapter_result.shape}")
print(f"Adapter norm: {adapter_result.norm():.4f}")
print(f"Original prediction: {model.tokenizer.decode(orig_logits[0, -1].argmax(dim=-1))}")
print(f"Adapted prediction: {model.tokenizer.decode(edited_logits[0, -1].argmax(dim=-1))}")
Adapter output shape: torch.Size([1, 10, 768])
Adapter norm: 0.0000
Original prediction: Paris
Adapted prediction: Paris
After training or modifying the adapter weights, it changes the model's behavior:
# Simulate trained weights by setting non-zero values
nn.init.normal_(model.transformer.h[5].adapter._module.up.weight, std=10.0)
with model_adapted.trace("The Eiffel Tower is in the city of"):
    adapter_result = model.transformer.h[5].adapter.output.save()
    adapted_logits = model.lm_head.output.save()
print(f"Adapter output norm: {adapter_result.norm():.4f}")
print(f"Adapted prediction: {model.tokenizer.decode(adapted_logits[0, -1].argmax(dim=-1))}")
Adapter output norm: 67712.0547
Adapted prediction: ichick
You can also intervene on the adapter itself during a trace — for example, zeroing out its contribution:
with model_adapted.trace("The Eiffel Tower is in the city of"):
    # Zero out the adapter's output, neutralizing it for this run
    model.transformer.h[5].adapter.output[:] = 0
    logits = model.lm_head.output.save()
print(f"Adapter zeroed: {model.tokenizer.decode(logits[0, -1].argmax(dim=-1))}")
Adapter zeroed: Paris
When to use editing vs tracing
- Use model.trace() for one-off interventions during a single forward pass
- Use model.edit() when you need the same intervention applied repeatedly across many forward passes
- Use model.edit() to wire custom modules (adapters, SAEs, probes) into the model's forward pass, making them automatically instrumented for tracing