Walkthrough#
An API for transparent science on black-box AI
In this era of large-scale deep learning, the most interesting AI models are massive black boxes that are hard to run. Ordinary commercial inference service APIs let us interact with huge models, but they do not let us access model internals.
The nnsight
library is different: it provides full access to all neural network internals. When using nnsight
together with a remote service like the National Deep Inference Fabric (NDIF), it is possible to run complex experiments on huge open models easily with fully transparent access.
Through NDIF and NNsight, our team wants to enable entire labs and independent researchers alike, as we believe a large, passionate, and collaborative community will produce the next big insights into this profoundly important field.
1️⃣ First, let’s start small#
Run an interactive version of this walkthrough in Google Colab
Setup#
Install NNsight:
pip install nnsight
Tracing Context#
To demonstrate the core functionality and syntax of nnsight, we’ll define and use a tiny two-layer neural network.
Our little model here is composed of two submodules – linear layers layer1
and layer2
. We specify the sizes of each of these modules and create some complementary example input.
[ ]:
from collections import OrderedDict

import torch

input_size = 5
hidden_dims = 10
output_size = 2

net = torch.nn.Sequential(
    OrderedDict(
        [
            ("layer1", torch.nn.Linear(input_size, hidden_dims)),
            ("layer2", torch.nn.Linear(hidden_dims, output_size)),
        ]
    )
).requires_grad_(False)
The core object of the NNsight package is NNsight
. This wraps around a given PyTorch model to enable investigation of its internal parameters.
[ ]:
from nnsight import NNsight
tiny_model = NNsight(net)
Printing a PyTorch model shows a named hierarchy of modules, which is very useful when accessing sub-components directly. NNsight models reflect the same hierarchy and can be printed in the same way.
[ ]:
print(tiny_model)
Sequential(
(layer1): Linear(in_features=5, out_features=10, bias=True)
(layer2): Linear(in_features=10, out_features=2, bias=True)
)
Before we actually get to using the model we just created, let’s talk about Python contexts.
Python contexts define a scope using the with
statement and are often used to create some object, or initiate some logic, that you later want to destroy or conclude.
The most common application is opening files as in the following example:
with open('myfile.txt', 'r') as file:
    text = file.read()
Python uses the with
keyword to enter a context-like object. This object defines logic to be run at the start of the with
block, as well as logic to be run when exiting. When using with
for a file, entering the context opens the file and exiting the context closes it. Being within the context means we can read from the file.
Simple enough! Now we can discuss how nnsight
uses contexts to enable intuitive access into the internals of a neural network.
The main tool with nnsight
is a context for tracing.
We enter the tracing context by calling model.trace(<input>)
on an NNsight
model, which defines how we want to run the model. Inside the context, we will be able to customize how the neural network runs. The model is actually run upon exiting the tracing context.
[ ]:
# random input
input = torch.rand((1, input_size))
with tiny_model.trace(input) as tracer:
    pass
But where’s the output? To get that, we’ll have to learn how to request it from within the tracing context.
Getting#
Earlier, we wrapped our little neural net with the NNsight
class. This added a couple properties to each module in the model (including the root model itself). The two most important ones are .input
and .output
.
model.input
model.output
The names are self-explanatory. They correspond to the inputs and outputs of their respective modules during a forward pass of the model. We can use these attributes inside the with
block.
However, it is important to understand that the model is not executed until the end of the tracing context. How can we access inputs and outputs before the model is run? The trick is deferred execution.
.input
and .output
are Proxies for the eventual inputs and outputs of a module. In other words, when we access model.output
what we are communicating to nnsight
is, “When you compute the output of model
, please grab it for me and put the value into its corresponding Proxy object.” Let’s try it:
[ ]:
with tiny_model.trace(input) as tracer:
    output = tiny_model.output
print(output)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/tmp/ipython-input-4206295698.py in <cell line: 0>()
3 output = tiny_model.output
4
----> 5 print(output)
NameError: name 'output' is not defined
Oh no, an error! “name output
is not defined.”
Why doesn’t our output
variable exist?
Proxy objects will only have their value at the end of a context if we call .save()
on them. This helps to reduce memory costs. Adding .save()
fixes the error:
[ ]:
with tiny_model.trace(input) as tracer:
    output = tiny_model.output.save()
print(output)
tensor([[ 0.2872, -0.0245]])
Success! We now have the model output. We just completed our first intervention using nnsight
.
Each time we access a module’s input or output, we create an intervention in the neural network’s forward pass. Collectively these requests form the intervention graph. We call the process of executing it alongside the model’s normal computation graph, interleaving.
On Model output
If we don’t need to access anything other than the model’s final output (i.e., the model’s predicted next token), we can call the tracing context with trace=False
and not use it as a context. This could be useful for simple inference using NNsight.
output = model.trace(<inputs>, trace=False)
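For example, a minimal sketch applied to the tiny model we defined above (the variable name direct_output is ours):

# No tracing context: the model runs immediately and its output is returned.
direct_output = tiny_model.trace(input, trace=False)
print(direct_output)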
Just like we saved the output of the model as a whole, we can save the output of any of its submodules. We use normal Python attribute syntax. We can discover how to access them by name by printing out the model:
[ ]:
print(tiny_model)
Sequential(
(layer1): Linear(in_features=5, out_features=10, bias=True)
(layer2): Linear(in_features=10, out_features=2, bias=True)
)
Let’s access the output of the first layer (which we’ve named layer1
):
[ ]:
with tiny_model.trace(input) as tracer:
    l1_output = tiny_model.layer1.output.save()
print(l1_output)
tensor([[ 0.2185, -0.3810, 0.9104, 0.4726, 0.3635, 0.1594, -0.9225, -0.5772,
-0.0298, 0.0354]])
Let’s do the same for the input of layer2
.
Because we aren’t accessing the tracer
object within these tracing contexts, we can also drop as tracer
.
[ ]:
with tiny_model.trace(input):
    l2_input = tiny_model.layer2.input.save()
print(l2_input)
tensor([[ 0.2185, -0.3810, 0.9104, 0.4726, 0.3635, 0.1594, -0.9225, -0.5772,
-0.0298, 0.0354]])
On module inputs
Notice how the value for l2_input
is just a single tensor. By default, the .input
attribute of a module will return the first tensor input to the module.
We can also access the full input to a module by using the .inputs
attribute, which will return the values in the form of:
tuple(tuple(args), dictionary(kwargs))
Where the first index of the tuple is itself a tuple of all positional arguments, and the second index is a dictionary of the keyword arguments.
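For example, a minimal sketch of grabbing the full input structure of layer2 (the exact contents depend on how the module is called):

with tiny_model.trace(input):
    # (args, kwargs) for layer2's forward call; args[0] is the same tensor that .input returns.
    l2_full_inputs = tiny_model.layer2.inputs.save()

print(l2_full_inputs)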
Until now we were saving the output of the model and its submodules within the trace
context and then printing it after exiting the context. We will continue doing this in the rest of the tutorial, since it’s good practice to save computation results for later analysis.
However, we can also log the outputs of the model and its submodules within the trace
context using print
statements. This is useful for debugging and understanding the model’s behavior while saving memory.
Let’s see how to do this:
[ ]:
with tiny_model.trace(input):
print("Layer 1 - out: ", tiny_model.layer1.output)
Layer 1 - out: tensor([[ 0.2185, -0.3810, 0.9104, 0.4726, 0.3635, 0.1594, -0.9225, -0.5772,
-0.0298, 0.0354]])
Functions, Methods, and Operations#
Now that we can access activations, we also want to do some post-processing on them. Let’s find out which dimension of layer1’s output has the highest value.
We could do this by calling torch.argmax(...)
after the tracing context, or we can just leverage the fact that nnsight
handles PyTorch functions and methods within the tracing context by creating a Proxy request for them:
[ ]:
with tiny_model.trace(input):
    # Note we don't need to call .save() on the output,
    # as we're only using its value within the tracing context.
    l1_output = tiny_model.layer1.output

    # We do need to save the argmax tensor however,
    # as we're using it outside the tracing context.
    l1_amax = torch.argmax(l1_output, dim=1).save()
print(l1_amax[0])
tensor(2)
We can chain together multiple operations on the model’s intermediate outputs. Just remember to save everything at the end!
[ ]:
with tiny_model.trace(input):
    value = (tiny_model.layer1.output.sum() + tiny_model.layer2.output.sum()).save()
print(value)
tensor(0.5118)
The code block above is saying to nnsight
, “Run the model with the given input
. When the output of tiny_model.layer1
is computed, take its sum. Then do the same for tiny_model.layer2
. Now that both of those are computed, add them and make sure not to delete this value as I wish to use it outside of the tracing context.”
We can apply any function we want during the trace context, even our own custom functions!
[ ]:
# Take a tensor and return the sum of its elements
def tensor_sum(tensor):
    flat = tensor.flatten()
    total = 0
    for element in flat:
        total += element.item()

    return torch.tensor(total)


with tiny_model.trace(input) as tracer:
    # call on our custom function within the trace context
    custom_sum = tensor_sum(tiny_model.layer1.output).save()

    sum = tiny_model.layer1.output.sum()
    sum.save()
print(custom_sum, sum)
tensor(0.2491) tensor(0.2491)
Setting#
Getting and analyzing the activations from various points in a model can be really insightful, and a number of ML techniques do exactly that. However, we often want not only to view a model’s computation, but also to influence it.
To demonstrate the effect of editing the flow of information through the model, let’s set the first dimension of the first layer’s output to 0. NNsight
makes this really easy using the ‘=’ operator:
[ ]:
with tiny_model.trace(input):
    # Save the output before the edit to compare.
    # Notice we apply .clone() before saving as the setting operation is in-place.
    l1_output_before = tiny_model.layer1.output.clone().save()

    # Access the 0th index of the hidden state dimension and set it to 0.
    tiny_model.layer1.output[:, 0] = 0

    # Save the output after to see our edit.
    l1_output_after = tiny_model.layer1.output.save()
print("Before:", l1_output_before)
print("After:", l1_output_after)
Before: tensor([[ 0.2185, -0.3810, 0.9104, 0.4726, 0.3635, 0.1594, -0.9225, -0.5772,
-0.0298, 0.0354]])
After: tensor([[ 0.0000, -0.3810, 0.9104, 0.4726, 0.3635, 0.1594, -0.9225, -0.5772,
-0.0298, 0.0354]])
Seems our change was reflected. Now let’s do the same for the last dimension:
[ ]:
with tiny_model.trace(input):
    # Save the output before the edit to compare.
    # Notice we apply .clone() before saving as the setting operation is in-place.
    l1_output_before = tiny_model.layer1.output.clone().save()

    # Access the last index of the hidden state dimension and set it to 0.
    tiny_model.layer1.output[:, hidden_dims] = 0

    # Save the output after to see our edit.
    l1_output_after = tiny_model.layer1.output.save()
print("Before:", l1_output_before)
print("After:", l1_output_after)
---------------------------------------------------------------------------
NNsightException Traceback (most recent call last)
/tmp/ipython-input-3404137504.py in <cell line: 0>()
----> 1 with tiny_model.trace(input):
2
3 # Save the output before the edit to compare.
4 # Notice we apply .clone() before saving as the setting operation is in-place.
5 l1_output_before = tiny_model.layer1.output.clone().save()
/usr/local/lib/python3.11/dist-packages/nnsight/intervention/tracing/base.py in __exit__(self, exc_type, exc_val, exc_tb)
431
432 # Execute the traced code using the configured backend
--> 433 self.backend(self)
434
435 return True
/usr/local/lib/python3.11/dist-packages/nnsight/intervention/backends/execution.py in __call__(self, tracer)
22 except Exception as e:
23
---> 24 raise wrap_exception(e, tracer.info) from None
25 finally:
26 Globals.exit()
NNsightException:
Traceback (most recent call last):
File "/tmp/ipython-input-3404137504.py", line 8, in <cell line: 0>
tiny_model.layer1.output[:, hidden_dims] = 0
IndexError: index 10 is out of bounds for dimension 1 with size 10
Oh no, we are getting an error! Ah, of course: we needed to index at hidden_dims - 1,
not hidden_dims
.
The error messaging feature can be toggled using nnsight.CONFIG.APP.DEBUG
which defaults to True.
Toggle Error Messaging
Turn off debugging:
import nnsight
nnsight.CONFIG.APP.DEBUG = False
nnsight.CONFIG.save()
Turn on debugging:
import nnsight
nnsight.CONFIG.APP.DEBUG = True
nnsight.CONFIG.save()
Now that we know more about NNsight’s error messaging, let’s try our setting operation again with the correct indexing and view the shape of the output before leaving the tracing context:
[ ]:
with tiny_model.trace(input):
    # Save the output before the edit to compare.
    # Notice we apply .clone() before saving as the setting operation is in-place.
    l1_output_before = tiny_model.layer1.output.clone().save()

    print(f"Layer 1 output shape: {tiny_model.layer1.output.shape}")

    # Access the last index of the hidden state dimension and set it to 0.
    tiny_model.layer1.output[:, hidden_dims - 1] = 0

    # Save the output after to see our edit.
    l1_output_after = tiny_model.layer1.output.save()
print("Before:", l1_output_before)
print("After:", l1_output_after)
Layer 1 output shape: torch.Size([1, 10])
Before: tensor([[ 0.2185, -0.3810, 0.9104, 0.4726, 0.3635, 0.1594, -0.9225, -0.5772,
-0.0298, 0.0354]])
After: tensor([[ 0.2185, -0.3810, 0.9104, 0.4726, 0.3635, 0.1594, -0.9225, -0.5772,
-0.0298, 0.0000]])
Gradients#
NNsight
also lets us apply backpropagation and access gradients with respect to a loss. Like .input
and .output
on modules, nnsight
exposes .grad
on Proxies themselves (assuming they are proxies of tensors):
[ ]:
# Now in NNsight 0.5
with tiny_model.trace(input):
    # 1) access l1 & l2 outputs so trace knows these are intermediate values we care about
    l1_output = tiny_model.layer1.output

    # 2) make sure gradient flows back to l1 (it will pass by l2)
    l1_output.requires_grad = True
    l2_output = tiny_model.layer2.output

    # 3) access gradients within a backwards trace
    with tiny_model.output.sum().backward():
        # access .grad within backward context in REVERSE ORDER
        layer2_output_grad = l2_output.grad.save()
        layer1_output_grad = l1_output.grad.save()
print("Layer 1 output gradient:", layer1_output_grad)
print("Layer 2 output gradient:", layer2_output_grad)
Layer 1 output gradient: tensor([[ 5.0732e-01, -3.0065e-01, -4.2533e-01, 2.5249e-02, 1.6884e-01,
-1.1749e-02, 1.9957e-04, 9.8918e-02, 1.0680e-01, 7.1143e-02]])
Layer 2 output gradient: tensor([[1., 1.]])
Some important things to look for when tracing gradients:

Register your intermediate values in advance: if you want the gradient of a layer’s output, first access that layer in the trace context before the .backward()
trace call. For us, that looked like:

l1_output = tiny_model.layer1.output

Make sure the gradient has somewhere to flow: we set l1_output.requires_grad
to True
to make sure that the gradient flows to the earliest output we care about. Another option would be to do tiny_model.input.requires_grad = True
at the beginning of the trace, but this is slightly less efficient, because we aren’t collecting any gradients there.

Call on modules in order within the trace: nnsight
will ensure that your modules are called in the same order as the model’s execution. This means we should do all of our operations on layer 1 before moving on to collecting any information from layer 2.

Call on gradients in reverse order: similarly, we want to follow the order of the backward pass, which starts at the final layer and works its way back to the input.
All of the features we learned previously also apply to .grad
. In other words, we can apply operations to and edit the gradients. Let’s double the grad of layer2
. Our intervention has downstream consequences: see how the gradient of layer1
ends up doubled as well?
[ ]:
# Now in NNsight 0.5
with tiny_model.trace(input):
    # 1) access l1 & l2 outputs so trace knows these are intermediate values we care about
    l1_output = tiny_model.layer1.output

    # 2) make sure gradient flows back to l1 (it will pass by l2)
    l1_output.requires_grad = True
    l2_output = tiny_model.layer2.output

    # 3) access gradients within a backwards trace
    with tiny_model.output.sum().backward():
        # access .grad within backward context in REVERSE ORDER
        l2_output.grad = l2_output.grad * 2
        layer2_output_grad = l2_output.grad.save()
        layer1_output_grad = l1_output.grad.save()
print("Layer 1 output gradient:", layer1_output_grad)
print("Layer 2 output gradient:", layer2_output_grad)
Layer 1 output gradient: tensor([[ 1.0146e+00, -6.0130e-01, -8.5066e-01, 5.0498e-02, 3.3768e-01,
-2.3498e-02, 3.9914e-04, 1.9784e-01, 2.1360e-01, 1.4229e-01]])
Layer 2 output gradient: tensor([[2., 2.]])
Early Stopping#
If we are only interested in a model’s intermediate computations, we can halt the forward pass at any module, reducing runtime and conserving compute resources. One example where this could be particularly useful is when working with SAEs: we can train an SAE on one layer and then stop the execution.
[ ]:
with tiny_model.trace(input) as tracer:
    l1_out = tiny_model.layer1.output.save()
    tracer.stop()
# get the output of the first layer and stop tracing
print("L1 - Output: ", l1_out)
L1 - Output: tensor([[ 0.2185, -0.3810, 0.9104, 0.4726, 0.3635, 0.1594, -0.9225, -0.5772,
-0.0298, 0.0354]])
2️⃣ Bigger#
Now that we have the basics of nnsight
under our belt, we can scale our model up and combine the techniques we’ve learned into more interesting experiments.
The NNsight
class is very bare bones. It wraps a pre-defined model and does no pre-processing on the inputs we enter. It’s designed to be extended with more complex and powerful types of models, and we’re excited to see what can be done to leverage its features!
However, if you’d like to load a Language Model from HuggingFace with its tokenizer, the LanguageModel
subclass greatly simplifies this process.
LanguageModel#
LanguageModel
is a subclass of NNsight
. While we could define and create a model to pass in directly, LanguageModel
includes special support for Huggingface language models, including automatically loading models from a Huggingface ID, and loading the model together with the appropriate tokenizer.
Here is how we can use LanguageModel
to load GPT-2
:
[ ]:
from nnsight import LanguageModel
llm = LanguageModel("openai-community/gpt2", device_map="auto", dispatch=True)
print(llm)
GPT2LMHeadModel(
(transformer): GPT2Model(
(wte): Embedding(50257, 768)
(wpe): Embedding(1024, 768)
(drop): Dropout(p=0.1, inplace=False)
(h): ModuleList(
(0-11): 12 x GPT2Block(
(ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(attn): GPT2Attention(
(c_attn): Conv1D()
(c_proj): Conv1D()
(attn_dropout): Dropout(p=0.1, inplace=False)
(resid_dropout): Dropout(p=0.1, inplace=False)
)
(ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(mlp): GPT2MLP(
(c_fc): Conv1D()
(c_proj): Conv1D()
(act): NewGELUActivation()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
(lm_head): Linear(in_features=768, out_features=50257, bias=False)
(generator): Generator(
(streamer): Streamer()
)
)
When we initialize LanguageModel
, we aren’t yet loading the parameters of the model into memory. We are actually loading a ‘meta’ version of the model which doesn’t take up any memory, but still allows us to view and trace actions on it. After exiting the first tracing context, the model is then fully loaded into memory. To load into memory on initialization, you can pass dispatch=True
into LanguageModel
like
LanguageModel('openai-community/gpt2', device_map="auto", dispatch=True)
.
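To make the lazy loading concrete, here is a minimal sketch of the default path without dispatch=True (variable names are ours); the real parameters are only materialized when the first trace runs:

lazy_llm = LanguageModel("openai-community/gpt2", device_map="auto")  # 'meta' weights only

with lazy_llm.trace("Hello, world"):
    out = lazy_llm.output.save()  # exiting this first trace dispatches the real parameters

print(out.logits.shape)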
On Model Initialization
A few important things to note:
Keyword arguments passed to the initialization of LanguageModel
are forwarded to HuggingFace-specific loading logic. In this case, device_map
specifies which devices to use, and the value auto
tells HuggingFace to distribute the model evenly across all available GPUs (and the CPU if no GPUs are available). Other arguments can be found here: https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForCausalLM
Let’s now apply some of the features that we used on the small model to GPT-2
. Unlike NNsight
, LanguageModel
does define logic to pre-process inputs upon entering the tracing context. This makes interacting with the model simpler (i.e., you can send prompts to the model without having to directly access the tokenizer).
In the following example, we ablate the value coming from the last layer’s MLP module and decode the logits to see what token the model predicts without influence from that particular module:
[ ]:
with llm.trace("The Eiffel Tower is in the city of"):
    # Access the last layer using h[-1] as it's a ModuleList
    # Access the first index of .output as that's where the hidden states are.
    llm.transformer.h[-1].mlp.output[0][:] = 0

    # Logits come out of model.lm_head and we apply argmax to get the predicted token ids.
    token_ids = llm.lm_head.output.argmax(dim=-1).save()
print("\nToken IDs:", token_ids)
# Apply the tokenizer to decode the ids into words after the tracing context.
print("Prediction:", llm.tokenizer.decode(token_ids[0][-1]))
Token IDs: tensor([[ 262, 12, 417, 8765, 11, 257, 262, 3504, 338, 3576]],
device='cuda:0')
Prediction: London
We just ran a little intervention on a much more complex model with many more parameters! However, we’re missing an important piece of information: what the prediction would have looked like without our ablation.
We could just run two tracing contexts and compare the outputs. However, this would require two forward passes through the model. NNsight
can do better than that with batching.
Batching#
Batching is a way to process multiple inputs in one forward pass. To better understand how batching works, we’re going to bring back the Tracer
object that we dropped before.
When we call .trace(...)
, it’s actually creating two different contexts behind the scenes. The first one is the tracing context that we’ve discussed previously, and the second one is the invoker context. The invoker context defines the values of the .input
and .output
Proxies.
If we call .trace(...)
with some input, the input is passed on to the invoker. As there is only one input, only one invoker context is created.
If we call .trace()
without an input, then we can call tracer.invoke(input1)
to manually create the invoker context with an input, input1
. We can also repeatedly call tracer.invoke(...)
to create the invoker context for additional inputs. Every subsequent time we call .invoke(...)
, interventions within its context will only refer to the input in that particular invoke statement.
When exiting the tracing context, the inputs from all of the invokers will be batched together, and they will be executed in one forward pass! To test this out, let’s do the same ablation experiment, but also add a ‘control’ output for comparison:
More on the invoker context
Note that when injecting data only into the relevant invoker’s interventions, nnsight
tries, but can’t guarantee, to narrow the data down to the right batch indices. Thus, there are cases where all invokes will get all of the data. Specifically, if the input or output data is stored as an object that is not an arbitrary collection of tensors, it will be broadcast to all invokes.
Just like .trace(...)
created a Tracer
object, .invoke(...)
creates an Invoker
object. For LanguageModel
models, the Invoker
prepares the input by running a tokenizer on it. Invoker
stores pre-processed inputs at invoker.inputs
, which can be accessed to see information about our inputs. In a case where we pass a single input to .trace(...)
directly, we can still access the invoker object at tracer.invoker
without having to call tracer.invoke(...)
.
Keyword arguments given to .invoke(..)
make their way to the input pre-processing. LanguageModel
has keyword arguments max_length
and truncation
used for tokenization, which can be passed to the invoker. If we want to pass keyword arguments to the invoker for a single-input .trace(...)
, we can pass invoker_args
as a dictionary of invoker keyword arguments. Here is an example to demonstrate everything we’ve described:
This snippet
with llm.trace("hello", invoker_args={"max_length":10}) as tracer:
invoker = tracer.invoker
does the same as
with llm.trace() as tracer:
with tracer.invoke("hello", max_length=10) as invoker:
invoker = invoker
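As another minimal sketch (assuming the gpt2 llm from above; the exact structure of .inputs may vary by nnsight version), we can peek at the pre-processed inputs the invoker prepared:

with llm.trace("hello") as tracer:
    pass

# The invoker holds the tokenized form of our prompt.
print(tracer.invoker.inputs)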
[ ]:
with llm.trace() as tracer:
with tracer.invoke("The Eiffel Tower is in the city of"):
# Ablate the last MLP for only this batch.
llm.transformer.h[-1].mlp.output[0][:] = 0
# Get the output for only the intervened on batch.
token_ids_intervention = llm.lm_head.output.argmax(dim=-1).save()
with tracer.invoke("The Eiffel Tower is in the city of"):
# Get the output for only the original batch.
token_ids_original = llm.lm_head.output.argmax(dim=-1).save()
print("Original token IDs:", token_ids_original)
print("Modified token IDs:", token_ids_intervention)
print("Original prediction:", llm.tokenizer.decode(token_ids_original[0][-1]))
print("Modified prediction:", llm.tokenizer.decode(token_ids_intervention[0][-1]))
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
Original token IDs: tensor([[ 198, 12, 417, 8765, 318, 257, 262, 3504, 7372, 6342]],
device='cuda:0')
Modified token IDs: tensor([[ 262, 12, 417, 8765, 11, 257, 262, 3504, 338, 3576]],
device='cuda:0')
Original prediction: Paris
Modified prediction: London
Based on our control results, our ablation did end up affecting what the model predicted. That’s pretty neat!
Another cool thing with multiple invokes is that Proxies can interact between them.
Here, we transfer the token embeddings from a real prompt into a placeholder prompt, so the latter prompt produces the output of the former:
[ ]:
with llm.trace() as tracer:
    barrier = tracer.barrier(2)

    with tracer.invoke("The Eiffel Tower is in the city of"):
        embeddings = llm.transformer.wte.output
        # call barrier
        barrier()

    with tracer.invoke("_ _ _ _ _ _ _ _ _ _"):
        # tell model to wait for the output from the previous invoke with barrier
        barrier()
        llm.transformer.wte.output = embeddings
        token_ids_intervention = llm.lm_head.output.argmax(dim=-1).save()

    with tracer.invoke("_ _ _ _ _ _ _ _ _ _"):
        token_ids_original = llm.lm_head.output.argmax(dim=-1).save()
print("original prediction shape", token_ids_original[0][-1].shape)
print("Original prediction:", llm.tokenizer.decode(token_ids_original[0][-1]))
print("modified prediction shape", token_ids_intervention[0][-1].shape)
print("Modified prediction:", llm.tokenizer.decode(token_ids_intervention[0][-1]))
original prediction shape torch.Size([])
Original prediction: _
modified prediction shape torch.Size([])
Modified prediction: Paris
For larger batch sizes, you can also iterate across multiple invoke contexts.
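For example, a minimal sketch of that pattern (the prompt list and variable names are ours), reusing the gpt2 llm from above:

prompts = [
    "The Eiffel Tower is in the city of",
    "Buckingham Palace is in the city of",
    "The Colosseum is in the city of",
]

predictions = []
with llm.trace() as tracer:
    for prompt in prompts:
        with tracer.invoke(prompt):
            # Each invoke only sees the batch indices of its own prompt.
            predictions.append(llm.lm_head.output.argmax(dim=-1).save())

for prompt, token_ids in zip(prompts, predictions):
    print(prompt, "->", llm.tokenizer.decode(token_ids[0][-1]))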
Multiple Token Generation#
Some HuggingFace models define methods to generate multiple outputs at a time. LanguageModel
wraps that functionality to provide the same tracing features by using .generate(...)
instead of .trace(...)
. This calls the underlying model’s .generate
method. It passes the output through a .generator
module that we’ve added onto the model, allowing us to get the generate output at .generator.output
. You can control the number of new tokens generated by setting
max_new_tokens = N
within your call to .generate()
.
Intervening on generated token iterations with .all()
and .iter[]
#
During model generation, the underlying model is called more than once, so the modules of said model produce more than one output. Which iteration should a given module.output
refer to? That’s where .all
and .iter
come in!
If you want to access and intervene on module outputs across all iterations, you should use .all()
. Simply create a with tracer.all():
context and include your intervention code within the indented block.
[ ]:
# using .all():
prompt = 'The Eiffel Tower is in the city of'
layers = llm.transformer.h
n_new_tokens = 50
with llm.generate(prompt, max_new_tokens=n_new_tokens) as tracer:
    hidden_states = list().save() # Initialize & .save() list

    # Call .all() to apply intervention to each new token
    with tracer.all():
        # Apply intervention - set first layer output to zero
        layers[0].output[0][:] = 0

        # Append desired hidden state post-intervention
        hidden_states.append(layers[-1].output) # no need to call .save
print("Hidden state length: ",len(hidden_states))
Alternatively, if you want to intervene on specific iterations of generation, you can use the with tracer.iter[<slice>]:
context. Here, let’s try intervening only on the generation iterations covered by the slice 2:5.
[ ]:
# using .iter[]:
prompt = 'The Eiffel Tower is in the city of'
layers = llm.transformer.h
n_new_tokens = 50
with llm.generate(prompt, max_new_tokens=n_new_tokens) as tracer:
    hidden_states = list().save() # Initialize & .save() list

    # Call .iter[2:5] to apply the intervention only to those generation iterations
    with tracer.iter[2:5]:
        # Apply intervention - set first layer output to zero
        layers[0].output[0][:] = 0

        # Append desired hidden state post-intervention
        hidden_states.append(layers[-1].output) # no need to call .save
print("Hidden state length: ",len(hidden_states))
Model Editing#
NNsight’s model editing feature allows you to create persistently modified versions of a model using .edit()
. Unlike interventions in a tracing context, which are temporary, the Editor context enables you to make lasting changes to a model instance.
This feature is useful for:
Creating modified model variants without altering the original
Applying changes that persist across multiple forward passes
Comparing interventions between original and edited models
Let’s explore how to use the Editor context to make a simple persistent change to a model:
[ ]:
# we take the hidden states with the expected output "Paris"
with llm.trace("The Eiffel Tower is located in the city of") as tracer:
    hs11 = llm.transformer.h[11].output[0][:, -1, :].save()

# the edited model will now always predict "Paris" as the next token
with llm.edit() as llm_edited:
    llm.transformer.h[11].output[0][:, -1, :] = hs11

# we demonstrate this by comparing the output of an unmodified model...
with llm.trace("Vatican is located in the city of") as tracer:
    original_tokens = llm.lm_head.output.argmax(dim=-1).save()

# ...with the output of the edited model
with llm_edited.trace("Vatican is located in the city of") as tracer:
    modified_tokens = llm.lm_head.output.argmax(dim=-1).save()
print("\nOriginal Prediction: ", llm.tokenizer.decode(original_tokens[0][-1]))
print("Modified Prediction: ", llm.tokenizer.decode(modified_tokens[0][-1]))
Original Prediction: Rome
Modified Prediction: Paris
Edits defined within an Editor context create a new, modified version of the model by default, preserving the original. This allows for safe experimentation with model changes. If you wish to modify the original model directly, you can set inplace=True
when calling .edit()
.
Use this option cautiously, as in-place edits alter the base model for all subsequent model calls.
[ ]:
# we use the hidden state we saved above (hs11)
with llm.edit(inplace=True) as llm_edited:
    llm.transformer.h[11].output[0][:, -1, :] = hs11

# we demonstrate this by comparing the output of an unmodified model...
with llm.trace("Vatican is located in the city of") as tracer:
    modified_tokens = llm.lm_head.output.argmax(dim=-1).save()
print("Modified In-place: ", llm.tokenizer.decode(modified_tokens[0][-1]))
Modified In-place: Paris
If you’ve made in-place edits to your model and need to revert these changes, you can apply .clear_edits()
. This method removes all edits applied to the model, effectively restoring it to its original state.
[ ]:
llm.clear_edits()
with llm.trace("Vatican is located in the city of"):
    modified_tokens = llm.lm_head.output.argmax(dim=-1).save()
print("Edits cleared: ", llm.tokenizer.decode(modified_tokens[0][-1]))
Edits cleared: Rome
3️⃣ I thought you said huge models?#
NNsight
is only one half of our project to democratize access to AI internals. The other half is the National Deep Inference Fabric, or NDIF
. NDIF
hosts large models for shared access using NNsight
, so you don’t have to worry about any of the headaches of hosting large models yourself!
The interaction between NDIF
and NNsight
is fairly straightforward. The intervention graph we create via the tracing context can be encoded into a custom JSON format and sent via an HTTP request to the NDIF
servers. NDIF
then decodes the intervention graph and interleaves it alongside the specified model.
To see which models are currently being hosted, check out the following status page: https://nnsight.net/status/
Remote execution#
In its current state, NDIF
requires an API key. Therefore, to run the rest of this walkthrough, you need one of your own. To get one, simply register at https://login.ndif.us.
With a valid API key, you then can configure nnsight
as follows:
[ ]:
from nnsight import CONFIG
CONFIG.set_default_api_key("YOUR_API_KEY")
If you’re running in a local IDE, this only needs to be run once as it will save the API key as the default in a .config file along with your nnsight
installation. You can also add your API key to Google Colab secrets.
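For Colab, a hypothetical sketch (the secret name NDIF_API_KEY is our own choice):

from google.colab import userdata
from nnsight import CONFIG

# Assumes you stored your key as a Colab secret named "NDIF_API_KEY".
CONFIG.set_default_api_key(userdata.get("NDIF_API_KEY"))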
To amp things up a few levels, let’s demonstrate using nnsight
’s tracing context with Llama-3.1-8b
!
[ ]:
import os
# Llama 3.1 8b is a gated model, so you need to apply for access on HuggingFace and include your token.
os.environ['HF_TOKEN'] = "YOUR_HUGGING_FACE_TOKEN"
[ ]:
from huggingface_hub import notebook_login
notebook_login()
WARNING:huggingface_hub._login:Note: Environment variable `HF_TOKEN` is set and is the current active token independently from the token you've just configured.
[ ]:
from nnsight import LanguageModel
# We'll never actually load the parameters locally, so no need to specify a device_map.
llama = LanguageModel("meta-llama/Meta-Llama-3.1-8B")
# All we need to do to execute on NDIF instead of locally is pass remote=True.
with llama.trace("The Eiffel Tower is in the city of", remote=True) as runner:
    hidden_states = llama.model.layers[-1].output.save()
    output = llama.output.save()
print(hidden_states)
print(output["logits"])
It really is as simple as remote=True
. All of the techniques we went through in earlier sections work just the same when running locally or remotely.
Sessions#
NDIF uses a queue to handle concurrent requests from multiple users. To optimize the execution of our experiments, we can use the session
context to efficiently package multiple interventions together as one single request to the server.
This offers the following benefits:
All interventions within a session will be executed one after another without additional wait in the NDIF queue
All intermediate outputs for each intervention are stored on the server and can be accessed by other interventions in the same session without moving the data back and forth between NDIF and the local machine
Let’s take a look:
[ ]:
with llama.session(remote=True) as session:
with llama.trace("The Eiffel Tower is in the city of") as t1:
# capture the hidden state from layer 32 at the last token
hs_31 = llama.model.layers[31].output[0][:, -1, :] # no .save()
t1_tokens_out = llama.lm_head.output.argmax(dim=-1).save()
with llama.trace("Buckingham Palace is in the city of") as t2:
llama.model.layers[1].output[0][:, -1, :] = hs_31[:]
t2_tokens_out = llama.lm_head.output.argmax(dim=-1).save()
print("\nT1 - Original Prediction: ", llama.tokenizer.decode(t1_tokens_out[0][-1]))
print("T2 - Modified Prediction: ", llama.tokenizer.decode(t2_tokens_out[0][-1]))
Next Steps#
Check out nnsight.net/tutorials for more walkthroughs implementing classic interpretability techniques using nnsight
.
Getting Involved!#
Note that both nnsight
and NDIF
are in active development, so changes may be made and errors may arise during use. If you’re interested in following updates to nnsight
, contributing, giving feedback, or finding collaborators, please join the NDIF discord. We’d love to hear about your work using nnsight!
You can also follow us on LinkedIn, Bluesky: @ndif-team.bsky.social, and X: @ndif_team.
💟