Quick Start¶
This guide walks you through your first nnsight intervention in just a few minutes.
Loading a Model¶
nnsight wraps PyTorch models to enable tracing and intervention. For language models, use LanguageModel:
**Model Dispatching**

Setting `dispatch=True` loads the model weights immediately. Otherwise, the model is loaded on a meta device for faster initialization.
Your First Trace¶
The .trace() context manager runs a forward pass while giving you access to internal activations:
```python
with model.trace('The Eiffel Tower is in the city of'):
    # Access hidden states from the last layer
    hidden_states = model.transformer.h[-1].output[0].save()

    # Get the model's output
    output = model.output.save()

# After exiting the context, saved values are available
print(hidden_states.shape)  # torch.Size([1, 10, 768])
print(model.tokenizer.decode(output.logits.argmax(dim=-1)[0]))
```
**Always use `.save()`**

Values you want to access after the trace exits must be saved with `.save()`. Without it, tensors are garbage collected when the trace context ends.
Accessing Activations¶
Access any module's input or output during the forward pass. Check your model's architecture to understand its output structure. For example, layers in `transformers` models typically return tuples, where the first element contains the hidden states.
```python
with model.trace("The Eiffel Tower is in the city of"):
    attn_output = model.transformer.h[0].attn.output[0].save()  # (1)!
    mlp_output = model.transformer.h[0].mlp.output.save()  # (2)!

    # Access the full layer output
    layer_output = model.transformer.h[5].output[0].save()

    # Access the final logits
    logits = model.lm_head.output.save()
```
1. The output of the attention module is a tuple; index `[0]` to get the hidden states.
2. The MLP output is a single tensor, so we can save it directly without indexing.
Modifying Activations¶
Intervene on the model by modifying activations in-place:
```python
with model.trace("Hello"):
    # Zero out all activations at layer 0
    model.transformer.h[0].output[0][:] = 0

    # Modify only the last token position
    model.transformer.h[1].output[0][:, -1, :] = 0

    output = model.output.save()
```
Or replace activations entirely:
```python
import torch

with model.trace("Hello"):
    # Add noise to the MLP output
    hs = model.transformer.h[-1].mlp.output.clone()
    noise = 0.01 * torch.randn(hs.shape)
    model.transformer.h[-1].mlp.output = hs + noise
    result = model.transformer.h[-1].mlp.output.save()
```
Understanding Module Hierarchy¶
Print the model to see its structure and available modules:
```text
GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm(...)
        (attn): GPT2Attention(...)
        (ln_2): LayerNorm(...)
        (mlp): GPT2MLP(...)
      )
    )
    (ln_f): LayerNorm(...)
  )
  (lm_head): Linear(...)
)
```
Access any module using the same dotted path notation:
- `model.transformer.h[0]` — First transformer block
- `model.transformer.h[0].attn` — Attention module in the first block
- `model.transformer.h[-1].mlp` — MLP in the last block
- `model.lm_head` — Final language modeling head
Key Properties¶
Every module has these special properties for accessing values:
| Property | Description |
|---|---|
| `.output` | The module's forward pass output |
| `.input` | First positional argument to the module |
| `.inputs` | All inputs as `(args_tuple, kwargs_dict)` |
Using with Any PyTorch Model¶
For arbitrary PyTorch models (not just language models), use the base NNsight wrapper:
```python
import torch
from nnsight import NNsight

net = torch.nn.Sequential(
    torch.nn.Linear(5, 10),
    torch.nn.Linear(10, 2),
)

model = NNsight(net)

with model.trace(torch.rand(1, 5)):
    layer1_out = model[0].output.save()
    output = model.output.save()

print(layer1_out.shape)  # torch.Size([1, 10])
```
Next Steps¶
You've learned the basics of nnsight! Continue exploring:
- Features — Deep dives into specific capabilities
- Tutorials — Step-by-step guides for common tasks
- Documentation — Comprehensive reference material