Walkthrough#
An API for transparent science on black-box AI
In this era of large-scale deep learning, the most interesting AI models are massive black boxes that are hard to run. Ordinary commercial inference service APIs let us interact with huge models, but they do not let us access model internals.
The nnsight
library is different: it provides full access to all neural network internals. When using nnsight
together with a remote service like the National Deep Inference Fabric (NDIF), it is possible to run complex experiments on huge open models easily with fully transparent access.
Through NDIF and NNsight, our team wants to enable entire labs and independent researchers alike, as we believe a large, passionate, and collaborative community will produce the next big insights into this profoundly important field.
1️⃣ First, let’s start small#
Run an interactive version of this walkthrough in Google Colab
Setup#
Install NNsight:
pip install nnsight
Tracing Context#
To demonstrate the core functionality and syntax of nnsight, we’ll define and use a tiny two-layer neural network.
Our little model here is composed of two submodules – linear layers layer1
and layer2
. We specify the sizes of each of these modules and create some complementary example input.
[ ]:
from collections import OrderedDict

import torch

input_size = 5
hidden_dims = 10
output_size = 2

net = torch.nn.Sequential(
    OrderedDict(
        [
            ("layer1", torch.nn.Linear(input_size, hidden_dims)),
            ("layer2", torch.nn.Linear(hidden_dims, output_size)),
        ]
    )
).requires_grad_(False)
The core object of the NNsight package is NNsight
. This wraps around a given PyTorch model to enable investigation of its internal parameters.
[ ]:
from nnsight import NNsight
tiny_model = NNsight(net)
Printing a PyTorch model shows a named hierarchy of modules, which is very useful when accessing sub-components directly. NNsight models reflect the same hierarchy and can be printed in the same way.
[ ]:
print(tiny_model)
Sequential(
(layer1): Linear(in_features=5, out_features=10, bias=True)
(layer2): Linear(in_features=10, out_features=2, bias=True)
)
Before we actually get to using the model we just created, let’s talk about Python contexts.
Python contexts define a scope using the with
statement and are often used to create some object, or initiate some logic, that you later want to destroy or conclude.
The most common application is opening files as in the following example:
with open('myfile.txt', 'r') as file:
    text = file.read()
Python uses the with
keyword to enter a context-like object. This object defines logic to be run at the start of the with
block, as well as logic to be run when exiting. When using with
for a file, entering the context opens the file and exiting the context closes it. Being within the context means we can read from the file.
Simple enough! Now we can discuss how nnsight
uses contexts to enable intuitive access into the internals of a neural network.
The main tool with nnsight
is a context for tracing.
We enter the tracing context by calling model.trace(<input>)
on an NNsight
model, which defines how we want to run the model. Inside the context, we will be able to customize how the neural network runs. The model is actually run upon exiting the tracing context.
[ ]:
# random input
input = torch.rand((1, input_size))
with tiny_model.trace(input) as tracer:
    pass
But where’s the output? To get that, we’ll have to learn how to request it from within the tracing context.
Getting#
Earlier, we wrapped our little neural net with the NNsight
class. This added a couple properties to each module in the model (including the root model itself). The two most important ones are .input
and .output
.
model.input
model.output
The names are self-explanatory. They correspond to the inputs and outputs of their respective modules during a forward pass of the model. We can use these attributes inside the with
block.
However, it is important to understand that the model is not executed until the end of the tracing context. How can we access inputs and outputs before the model is run? The trick is deferred execution.
.input
and .output
are Proxies for the eventual inputs and outputs of a module. In other words, when we access model.output
what we are communicating to nnsight
is, “When you compute the output of model
, please grab it for me and put the value into its corresponding Proxy object.” Let’s try it:
[ ]:
with tiny_model.trace(input) as tracer:
    output = tiny_model.output
print(output)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/tmp/ipython-input-4206295698.py in <cell line: 0>()
3 output = tiny_model.output
4
----> 5 print(output)
NameError: name 'output' is not defined
Oh no, an error! “name output
is not defined.”
Why doesn’t our output
variable exist?
Proxy objects will only have their value at the end of a context if we call .save()
on them. This helps to reduce memory costs. Adding .save()
fixes the error:
[ ]:
with tiny_model.trace(input) as tracer:
    output = tiny_model.output.save()
print(output)
tensor([[ 0.2872, -0.0245]])
Success! We now have the model output. We just completed our first intervention using nnsight
.
Each time we access a module’s input or output, we create an intervention in the neural network’s forward pass. Collectively these requests form the intervention graph. We call the process of executing it alongside the model’s normal computation graph, interleaving.
On Model output
If we don’t need to access anything other than the model’s final output (i.e., the model’s predicted next token), we can call the tracing context with trace=False
and not use it as a context. This could be useful for simple inference using NNsight.
output = model.trace(<inputs>, trace=False)
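For example, a minimal sketch applied to the tiny model we defined above (the variable name direct_output is ours):

# No tracing context: the model runs immediately and its output is returned.
direct_output = tiny_model.trace(input, trace=False)
print(direct_output)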
Just like we saved the output of the model as a whole, we can save the output of any of its submodules. We use normal Python attribute syntax. We can discover how to access them by name by printing out the model:
[ ]:
print(tiny_model)
Sequential(
(layer1): Linear(in_features=5, out_features=10, bias=True)
(layer2): Linear(in_features=10, out_features=2, bias=True)
)
Let’s access the output of the first layer (which we’ve named layer1
):
[ ]:
with tiny_model.trace(input) as tracer:
    l1_output = tiny_model.layer1.output.save()
print(l1_output)
tensor([[ 0.2185, -0.3810, 0.9104, 0.4726, 0.3635, 0.1594, -0.9225, -0.5772,
-0.0298, 0.0354]])
Let’s do the same for the input of layer2
.
Because we aren’t accessing the tracer
object within these tracing contexts, we can also drop as tracer
.
[ ]:
with tiny_model.trace(input):
    l2_input = tiny_model.layer2.input.save()
print(l2_input)
tensor([[ 0.2185, -0.3810, 0.9104, 0.4726, 0.3635, 0.1594, -0.9225, -0.5772,
-0.0298, 0.0354]])
On module inputs
Notice how the value for l2_input
is just a single tensor. By default, the .input
attribute of a module will return the first tensor input to the module.
We can also access the full input to a module by using the .inputs
attribute, which will return the values in the form of:
tuple(tuple(args), dictionary(kwargs))
Where the first index of the tuple is itself a tuple of all positional arguments, and the second index is a dictionary of the keyword arguments.
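For example, a minimal sketch of grabbing the full input structure of layer2 (the exact contents depend on how the module is called):

with tiny_model.trace(input):
    # (args, kwargs) for layer2's forward call; args[0] is the same tensor that .input returns.
    l2_full_inputs = tiny_model.layer2.inputs.save()

print(l2_full_inputs)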
Until now we were saving the output of the model and its submodules within the trace
context and then printing it after exiting the context. We will continue doing this in the rest of the tutorial, since it’s good practice to save computation results for later analysis.
However, we can also log the outputs of the model and its submodules within the trace
context using print
statements. This is useful for debugging and understanding the model’s behavior while saving memory.
Let’s see how to do this:
[ ]:
with tiny_model.trace(input):
print("Layer 1 - out: ", tiny_model.layer1.output)
Layer 1 - out: tensor([[ 0.2185, -0.3810, 0.9104, 0.4726, 0.3635, 0.1594, -0.9225, -0.5772,
-0.0298, 0.0354]])
Functions, Methods, and Operations#
Now that we can access activations, we also want to do some post-processing on them. Let’s find out which dimension of layer1’s output has the highest value.
We could do this by calling torch.argmax(...)
after the tracing context, or we can just leverage the fact that nnsight
handles PyTorch functions and methods within the tracing context by creating a Proxy request for them:
[ ]:
with tiny_model.trace(input):
    # Note we don't need to call .save() on the output,
    # as we're only using its value within the tracing context.
    l1_output = tiny_model.layer1.output

    # We do need to save the argmax tensor however,
    # as we're using it outside the tracing context.
    l1_amax = torch.argmax(l1_output, dim=1).save()
print(l1_amax[0])
tensor(2)
We can chain together multiple operations on the model’s intermediate outputs. Just remember to save everything at the end!
[ ]:
with tiny_model.trace(input):
    value = (tiny_model.layer1.output.sum() + tiny_model.layer2.output.sum()).save()
print(value)
tensor(0.5118)
The code block above is saying to nnsight
, “Run the model with the given input
. When the output of tiny_model.layer1
is computed, take its sum. Then do the same for tiny_model.layer2
. Now that both of those are computed, add them and make sure not to delete this value as I wish to use it outside of the tracing context.”
We can apply any function we want during the trace context, even our own custom functions!
[ ]:
# Take a tensor and return the sum of its elements
def tensor_sum(tensor):
    flat = tensor.flatten()
    total = 0
    for element in flat:
        total += element.item()

    return torch.tensor(total)


with tiny_model.trace(input) as tracer:
    # call on our custom function within the trace context
    custom_sum = tensor_sum(tiny_model.layer1.output).save()

    sum = tiny_model.layer1.output.sum()
    sum.save()
print(custom_sum, sum)
tensor(0.2491) tensor(0.2491)
Setting#
Getting and analyzing the activations from various points in a model can be really insightful, and a number of ML techniques do exactly that. However, we often want not only to view a model’s computation, but also to influence it.
To demonstrate the effect of editing the flow of information through the model, let’s set the first dimension of the first layer’s output to 0. NNsight
makes this really easy using the ‘=’ operator:
[ ]:
with tiny_model.trace(input):
    # Save the output before the edit to compare.
    # Notice we apply .clone() before saving as the setting operation is in-place.
    l1_output_before = tiny_model.layer1.output.clone().save()

    # Access the 0th index of the hidden state dimension and set it to 0.
    tiny_model.layer1.output[:, 0] = 0

    # Save the output after to see our edit.
    l1_output_after = tiny_model.layer1.output.save()
print("Before:", l1_output_before)
print("After:", l1_output_after)
Before: tensor([[ 0.2185, -0.3810, 0.9104, 0.4726, 0.3635, 0.1594, -0.9225, -0.5772,
-0.0298, 0.0354]])
After: tensor([[ 0.0000, -0.3810, 0.9104, 0.4726, 0.3635, 0.1594, -0.9225, -0.5772,
-0.0298, 0.0354]])
Seems our change was reflected. Now let’s do the same for the last dimension:
[ ]:
with tiny_model.trace(input):
    # Save the output before the edit to compare.
    # Notice we apply .clone() before saving as the setting operation is in-place.
    l1_output_before = tiny_model.layer1.output.clone().save()

    # Access the last index of the hidden state dimension and set it to 0.
    tiny_model.layer1.output[:, hidden_dims] = 0

    # Save the output after to see our edit.
    l1_output_after = tiny_model.layer1.output.save()
print("Before:", l1_output_before)
print("After:", l1_output_after)
---------------------------------------------------------------------------
NNsightException Traceback (most recent call last)
/tmp/ipython-input-3404137504.py in <cell line: 0>()
----> 1 with tiny_model.trace(input):
2
3 # Save the output before the edit to compare.
4 # Notice we apply .clone() before saving as the setting operation is in-place.
5 l1_output_before = tiny_model.layer1.output.clone().save()
/usr/local/lib/python3.11/dist-packages/nnsight/intervention/tracing/base.py in __exit__(self, exc_type, exc_val, exc_tb)
431
432 # Execute the traced code using the configured backend
--> 433 self.backend(self)
434
435 return True
/usr/local/lib/python3.11/dist-packages/nnsight/intervention/backends/execution.py in __call__(self, tracer)
22 except Exception as e:
23
---> 24 raise wrap_exception(e, tracer.info) from None
25 finally:
26 Globals.exit()
NNsightException:
Traceback (most recent call last):
File "/tmp/ipython-input-3404137504.py", line 8, in <cell line: 0>
tiny_model.layer1.output[:, hidden_dims] = 0
IndexError: index 10 is out of bounds for dimension 1 with size 10
Oh no, we are getting an error! Ah, of course: we needed to index at hidden_dims - 1,
not hidden_dims
.
The error messaging feature can be toggled using nnsight.CONFIG.APP.DEBUG
which defaults to True.
Toggle Error Messaging
Turn off debugging:
import nnsight
nnsight.CONFIG.APP.DEBUG = False
nnsight.CONFIG.save()
Turn on debugging:
import nnsight
nnsight.CONFIG.APP.DEBUG = True
nnsight.CONFIG.save()
Now that we know more about NNsight’s error messaging, let’s try our setting operation again with the correct indexing and view the shape of the output before leaving the tracing context:
[ ]:
with tiny_model.trace(input):
    # Save the output before the edit to compare.
    # Notice we apply .clone() before saving as the setting operation is in-place.
    l1_output_before = tiny_model.layer1.output.clone().save()

    print(f"Layer 1 output shape: {tiny_model.layer1.output.shape}")

    # Access the last index of the hidden state dimension and set it to 0.
    tiny_model.layer1.output[:, hidden_dims - 1] = 0

    # Save the output after to see our edit.
    l1_output_after = tiny_model.layer1.output.save()
print("Before:", l1_output_before)
print("After:", l1_output_after)
Layer 1 output shape: torch.Size([1, 10])
Before: tensor([[ 0.2185, -0.3810, 0.9104, 0.4726, 0.3635, 0.1594, -0.9225, -0.5772,
-0.0298, 0.0354]])
After: tensor([[ 0.2185, -0.3810, 0.9104, 0.4726, 0.3635, 0.1594, -0.9225, -0.5772,
-0.0298, 0.0000]])
Gradients#
NNsight
also lets us apply backpropagation and access gradients with respect to a loss. Like .input
and .output
on modules, nnsight
exposes .grad
on Proxies themselves (assuming they are proxies of tensors):
[ ]:
# Now in NNsight 0.5
with tiny_model.trace(input):
    # 1) access l1 & l2 outputs so trace knows these are intermediate values we care about
    l1_output = tiny_model.layer1.output

    # 2) make sure gradient flows back to l1 (it will pass by l2)
    l1_output.requires_grad = True
    l2_output = tiny_model.layer2.output

    # 3) access gradients within a backwards trace
    with tiny_model.output.sum().backward():
        # access .grad within backward context in REVERSE ORDER
        layer2_output_grad = l2_output.grad.save()
        layer1_output_grad = l1_output.grad.save()
print("Layer 1 output gradient:", layer1_output_grad)
print("Layer 2 output gradient:", layer2_output_grad)
Layer 1 output gradient: tensor([[ 5.0732e-01, -3.0065e-01, -4.2533e-01, 2.5249e-02, 1.6884e-01,
-1.1749e-02, 1.9957e-04, 9.8918e-02, 1.0680e-01, 7.1143e-02]])
Layer 2 output gradient: tensor([[1., 1.]])
Some important things to look for when tracing gradients:

Register your intermediate values in advance: if you want the gradient of a layer’s output, first access that layer in the trace context before the .backward()
trace call. For us, that looked like:

l1_output = tiny_model.layer1.output

Make sure the gradient has somewhere to flow: we set l1_output.requires_grad
to True
to make sure that the gradient flows to the earliest output we care about. Another option would be to do tiny_model.input.requires_grad = True
at the beginning of the trace, but this is slightly less efficient, because we aren’t collecting any gradients there.

Call on modules in order within the trace: nnsight
will ensure that your modules are called in the same order as the model’s execution. This means we should do all of our operations on layer 1 before moving on to collecting any information from layer 2.

Call on gradients in reverse order: similarly, we want to follow the order of the backward pass, which starts at the final layer and works its way back to the input.
All of the features we learned previously also apply to .grad
. In other words, we can apply operations to and edit the gradients. Let’s double the grad of layer2
. Our intervention has downstream consequences: see how the gradient of layer1
ends up doubled as well?
[ ]:
# Now in NNsight 0.5
with tiny_model.trace(input):
    # 1) access l1 & l2 outputs so trace knows these are intermediate values we care about
    l1_output = tiny_model.layer1.output

    # 2) make sure gradient flows back to l1 (it will pass by l2)
    l1_output.requires_grad = True
    l2_output = tiny_model.layer2.output

    # 3) access gradients within a backwards trace
    with tiny_model.output.sum().backward():
        # access .grad within backward context in REVERSE ORDER
        l2_output.grad = l2_output.grad * 2
        layer2_output_grad = l2_output.grad.save()
        layer1_output_grad = l1_output.grad.save()
print("Layer 1 output gradient:", layer1_output_grad)
print("Layer 2 output gradient:", layer2_output_grad)
Layer 1 output gradient: tensor([[ 1.0146e+00, -6.0130e-01, -8.5066e-01, 5.0498e-02, 3.3768e-01,
-2.3498e-02, 3.9914e-04, 1.9784e-01, 2.1360e-01, 1.4229e-01]])
Layer 2 output gradient: tensor([[2., 2.]])
Early Stopping#
If we are only interested in a model’s intermediate computations, we can halt the forward pass at any module, reducing runtime and conserving compute resources. One example where this could be particularly useful is when working with SAEs: we can train an SAE on one layer and then stop the execution.
[ ]:
with tiny_model.trace(input) as tracer:
    l1_out = tiny_model.layer1.output.save()
    tracer.stop()
# get the output of the first layer and stop tracing
print("L1 - Output: ", l1_out)
L1 - Output: tensor([[ 0.2185, -0.3810, 0.9104, 0.4726, 0.3635, 0.1594, -0.9225, -0.5772,
-0.0298, 0.0354]])
2️⃣ Bigger#
Now that we have the basics of nnsight
under our belt, we can scale our model up and combine the techniques we’ve learned into more interesting experiments.
The NNsight
class is very bare bones. It wraps a pre-defined model and does no pre-processing on the inputs we enter. It’s designed to be extended with more complex and powerful types of models, and we’re excited to see what can be done to leverage its features!
However, if you’d like to load a Language Model from HuggingFace with its tokenizer, the LanguageModel
subclass greatly simplifies this process.
LanguageModel#
LanguageModel
is a subclass of NNsight
. While we could define and create a model to pass in directly, LanguageModel
includes special support for Huggingface language models, including automatically loading models from a Huggingface ID, and loading the model together with the appropriate tokenizer.
Here is how we can use LanguageModel
to load GPT-2
:
[ ]:
from nnsight import LanguageModel
llm = LanguageModel("openai-community/gpt2", device_map="auto", dispatch=True)
print(llm)
GPT2LMHeadModel(
(transformer): GPT2Model(
(wte): Embedding(50257, 768)
(wpe): Embedding(1024, 768)
(drop): Dropout(p=0.1, inplace=False)
(h): ModuleList(
(0-11): 12 x GPT2Block(
(ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(attn): GPT2Attention(
(c_attn): Conv1D()
(c_proj): Conv1D()
(attn_dropout): Dropout(p=0.1, inplace=False)
(resid_dropout): Dropout(p=0.1, inplace=False)
)
(ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(mlp): GPT2MLP(
(c_fc): Conv1D()
(c_proj): Conv1D()
(act): NewGELUActivation()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
(lm_head): Linear(in_features=768, out_features=50257, bias=False)
(generator): Generator(
(streamer): Streamer()
)
)
When we initialize LanguageModel
, we aren’t yet loading the parameters of the model into memory. We are actually loading a ‘meta’ version of the model which doesn’t take up any memory, but still allows us to view and trace actions on it. After exiting the first tracing context, the model is then fully loaded into memory. To load into memory on initialization, you can pass dispatch=True
into LanguageModel
like
LanguageModel('openai-community/gpt2', device_map="auto", dispatch=True)
.
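To make the lazy loading concrete, here is a minimal sketch of the default path without dispatch=True (variable names are ours); the real parameters are only materialized when the first trace runs:

lazy_llm = LanguageModel("openai-community/gpt2", device_map="auto")  # 'meta' weights only

with lazy_llm.trace("Hello, world"):
    out = lazy_llm.output.save()  # exiting this first trace dispatches the real parameters

print(out.logits.shape)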
On Model Initialization
A few important things to note:
Keyword arguments passed to the initialization of LanguageModel
are forwarded to HuggingFace-specific loading logic. In this case, device_map
specifies which devices to use, and the value auto
tells HuggingFace to distribute the model evenly across all available GPUs (and the CPU if no GPUs are available). Other arguments can be found here: https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForCausalLM
Let’s now apply some of the features that we used on the small model to GPT-2
. Unlike NNsight
, LanguageModel
does define logic to pre-process inputs upon entering the tracing context. This makes interacting with the model simpler (i.e., you can send prompts to the model without having to directly access the tokenizer).
In the following example, we ablate the value coming from the last layer’s MLP module and decode the logits to see what token the model predicts without influence from that particular module:
[ ]:
with llm.trace("The Eiffel Tower is in the city of"):
    # Access the last layer using h[-1] as it's a ModuleList
    # Access the first index of .output as that's where the hidden states are.
    llm.transformer.h[-1].mlp.output[0][:] = 0

    # Logits come out of model.lm_head and we apply argmax to get the predicted token ids.
    token_ids = llm.lm_head.output.argmax(dim=-1).save()
print("\nToken IDs:", token_ids)
# Apply the tokenizer to decode the ids into words after the tracing context.
print("Prediction:", llm.tokenizer.decode(token_ids[0][-1]))
Token IDs: tensor([[ 262, 12, 417, 8765, 11, 257, 262, 3504, 338, 3576]],
device='cuda:0')
Prediction: London
We just ran a little intervention on a much more complex model with many more parameters! However, we’re missing an important piece of information: what the prediction would have looked like without our ablation.
We could just run two tracing contexts and compare the outputs. However, this would require two forward passes through the model. NNsight
can do better than that with batching.
Batching#
Batching is a way to process multiple inputs in one forward pass. To better understand how batching works, we’re going to bring back the Tracer
object that we dropped before.
When we call .trace(...)
, it’s actually creating two different contexts behind the scenes. The first one is the tracing context that we’ve discussed previously, and the second one is the invoker context. The invoker context defines the values of the .input
and .output
Proxies.
If we call .trace(...)
with some input, the input is passed on to the invoker. As there is only one input, only one invoker context is created.
If we call .trace()
without an input, then we can call tracer.invoke(input1)
to manually create the invoker context with an input, input1
. We can also repeatedly call tracer.invoke(...)
to create the invoker context for additional inputs. Every subsequent time we call .invoke(...)
, interventions within its context will only refer to the input in that particular invoke statement.
When exiting the tracing context, the inputs from all of the invokers will be batched together, and they will be executed in one forward pass! To test this out, let’s do the same ablation experiment, but also add a ‘control’ output for comparison:
More on the invoker context
Note that when injecting data only into the relevant invoker’s interventions, nnsight
tries, but can’t guarantee, to narrow the data down to the right batch indices. Thus, there are cases where all invokes will get all of the data. Specifically, if the input or output data is stored as an object that is not an arbitrary collection of tensors, it will be broadcast to all invokes.
Just like .trace(...)
created a Tracer
object, .invoke(...)
creates an Invoker
object. For LanguageModel
models, the Invoker
prepares the input by running a tokenizer on it. Invoker
stores pre-processed inputs at invoker.inputs
, which can be accessed to see information about our inputs. In a case where we pass a single input to .trace(...)
directly, we can still access the invoker object at tracer.invoker
without having to call tracer.invoke(...)
.
Keyword arguments given to .invoke(..)
make their way to the input pre-processing. LanguageModel
has keyword arguments max_length
and truncation
used for tokenization, which can be passed to the invoker. If we want to pass keyword arguments to the invoker for a single-input .trace(...)
, we can pass invoker_args
as a dictionary of invoker keyword arguments. Here is an example to demonstrate everything we’ve described:
This snippet
with llm.trace("hello", invoker_args={"max_length":10}) as tracer:
invoker = tracer.invoker
does the same as
with llm.trace() as tracer:
with tracer.invoke("hello", max_length=10) as invoker:
invoker = invoker
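As another minimal sketch (assuming the gpt2 llm from above; the exact structure of .inputs may vary by nnsight version), we can peek at the pre-processed inputs the invoker prepared:

with llm.trace("hello") as tracer:
    pass

# The invoker holds the tokenized form of our prompt.
print(tracer.invoker.inputs)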
[ ]:
with llm.trace() as tracer:
with tracer.invoke("The Eiffel Tower is in the city of"):
# Ablate the last MLP for only this batch.
llm.transformer.h[-1].mlp.output[0][:] = 0
# Get the output for only the intervened on batch.
token_ids_intervention = llm.lm_head.output.argmax(dim=-1).save()
with tracer.invoke("The Eiffel Tower is in the city of"):
# Get the output for only the original batch.
token_ids_original = llm.lm_head.output.argmax(dim=-1).save()
print("Original token IDs:", token_ids_original)
print("Modified token IDs:", token_ids_intervention)
print("Original prediction:", llm.tokenizer.decode(token_ids_original[0][-1]))
print("Modified prediction:", llm.tokenizer.decode(token_ids_intervention[0][-1]))
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
Original token IDs: tensor([[ 198, 12, 417, 8765, 318, 257, 262, 3504, 7372, 6342]],
device='cuda:0')
Modified token IDs: tensor([[ 262, 12, 417, 8765, 11, 257, 262, 3504, 338, 3576]],
device='cuda:0')
Original prediction: Paris
Modified prediction: London
Based on our control results, our ablation did end up affecting what the model predicted. That’s pretty neat!
Another cool thing with multiple invokes is that Proxies can interact between them.
Here, we transfer the token embeddings from a real prompt into a placeholder prompt, so the latter prompt produces the output of the former:
[ ]:
with llm.trace() as tracer:
    barrier = tracer.barrier(2)

    with tracer.invoke("The Eiffel Tower is in the city of"):
        embeddings = llm.transformer.wte.output
        # call barrier
        barrier()

    with tracer.invoke("_ _ _ _ _ _ _ _ _ _"):
        # tell model to wait for the output from the previous invoke with barrier
        barrier()
        llm.transformer.wte.output = embeddings
        token_ids_intervention = llm.lm_head.output.argmax(dim=-1).save()

    with tracer.invoke("_ _ _ _ _ _ _ _ _ _"):
        token_ids_original = llm.lm_head.output.argmax(dim=-1).save()
print("original prediction shape", token_ids_original[0][-1].shape)
print("Original prediction:", llm.tokenizer.decode(token_ids_original[0][-1]))
print("modified prediction shape", token_ids_intervention[0][-1].shape)
print("Modified prediction:", llm.tokenizer.decode(token_ids_intervention[0][-1]))
original prediction shape torch.Size([])
Original prediction: _
modified prediction shape torch.Size([])
Modified prediction: Paris
For larger batch sizes, you can also iterate across multiple invoke contexts.
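For example, a minimal sketch of that pattern (the prompt list and variable names are ours), reusing the gpt2 llm from above:

prompts = [
    "The Eiffel Tower is in the city of",
    "Buckingham Palace is in the city of",
    "The Colosseum is in the city of",
]

predictions = []
with llm.trace() as tracer:
    for prompt in prompts:
        with tracer.invoke(prompt):
            # Each invoke only sees the batch indices of its own prompt.
            predictions.append(llm.lm_head.output.argmax(dim=-1).save())

for prompt, token_ids in zip(prompts, predictions):
    print(prompt, "->", llm.tokenizer.decode(token_ids[0][-1]))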
Multiple Token Generation#
Some HuggingFace models define methods to generate multiple outputs at a time. LanguageModel
wraps that functionality to provide the same tracing features by using .generate(...)
instead of .trace(...)
. This calls the underlying model’s .generate
method. It passes the output through a .generator
module that we’ve added onto the model, allowing us to get the generate output at .generator.output
. You can control the number of new tokens generated by setting
max_new_tokens = N
within your call to .generate()
.
Intervening on generated token iterations with .all()
and .iter[]
#
During model generation, the underlying model is called more than once, so the modules of said model produce more than one output. Which iteration should a given module.output
refer to? That’s where .all
and .iter
come in!
If you want to access and intervene on module outputs across all iterations, you should use .all()
. Simply create a with tracer.all():
context and include your intervention code within the indented block.
[ ]:
# using .all():
prompt = 'The Eiffel Tower is in the city of'
layers = llm.transformer.h
n_new_tokens = 50
with llm.generate(prompt, max_new_tokens=n_new_tokens) as tracer:
    hidden_states = list().save() # Initialize & .save() list

    # Call .all() to apply intervention to each new token
    with tracer.all():
        # Apply intervention - set first layer output to zero
        layers[0].output[0][:] = 0

        # Append desired hidden state post-intervention
        hidden_states.append(layers[-1].output) # no need to call .save
print("Hidden state length: ",len(hidden_states))
Alternatively, if you want to intervene on specific iterations of generation, you can use the with tracer.iter[<slice>]:
context. Here, let’s try intervening only on the generation iterations covered by the slice 2:5.
[ ]:
# using .iter[]:
prompt = 'The Eiffel Tower is in the city of'
layers = llm.transformer.h
n_new_tokens = 50
with llm.generate(prompt, max_new_tokens=n_new_tokens) as tracer:
    hidden_states = list().save() # Initialize & .save() list

    # Call .iter[2:5] to apply the intervention only to those generation iterations
    with tracer.iter[2:5]:
        # Apply intervention - set first layer output to zero
        layers[0].output[0][:] = 0

        # Append desired hidden state post-intervention
        hidden_states.append(layers[-1].output) # no need to call .save
print("Hidden state length: ",len(hidden_states))
Model Editing#
NNsight’s model editing feature allows you to create persistently modified versions of a model using .edit()
. Unlike interventions in a tracing context, which are temporary, the Editor context enables you to make lasting changes to a model instance.
This feature is useful for:
Creating modified model variants without altering the original
Applying changes that persist across multiple forward passes
Comparing interventions between original and edited models
Let’s explore how to use the Editor context to make a simple persistent change to a model:
[ ]:
# we take the hidden states with the expected output "Paris"
with llm.trace("The Eiffel Tower is located in the city of") as tracer:
    hs11 = llm.transformer.h[11].output[0][:, -1, :].save()

# the edited model will now always predict "Paris" as the next token
with llm.edit() as llm_edited:
    llm.transformer.h[11].output[0][:, -1, :] = hs11

# we demonstrate this by comparing the output of an unmodified model...
with llm.trace("Vatican is located in the city of") as tracer:
    original_tokens = llm.lm_head.output.argmax(dim=-1).save()

# ...with the output of the edited model
with llm_edited.trace("Vatican is located in the city of") as tracer:
    modified_tokens = llm.lm_head.output.argmax(dim=-1).save()
print("\nOriginal Prediction: ", llm.tokenizer.decode(original_tokens[0][-1]))
print("Modified Prediction: ", llm.tokenizer.decode(modified_tokens[0][-1]))
Original Prediction: Rome
Modified Prediction: Paris
Edits defined within an Editor context create a new, modified version of the model by default, preserving the original. This allows for safe experimentation with model changes. If you wish to modify the original model directly, you can set inplace=True
when calling .edit()
.
Use this option cautiously, as in-place edits alter the base model for all subsequent model calls.
[ ]:
# we use the hidden state we saved above (hs11)
with llm.edit(inplace=True) as llm_edited:
    llm.transformer.h[11].output[0][:, -1, :] = hs11

# we demonstrate this by comparing the output of an unmodified model...
with llm.trace("Vatican is located in the city of") as tracer:
    modified_tokens = llm.lm_head.output.argmax(dim=-1).save()
print("Modified In-place: ", llm.tokenizer.decode(modified_tokens[0][-1]))
Modified In-place: Paris
If you’ve made in-place edits to your model and need to revert these changes, you can apply .clear_edits()
. This method removes all edits applied to the model, effectively restoring it to its original state.
[ ]:
llm.clear_edits()
with llm.trace("Vatican is located in the city of"):
    modified_tokens = llm.lm_head.output.argmax(dim=-1).save()
print("Edits cleared: ", llm.tokenizer.decode(modified_tokens[0][-1]))
Edits cleared: Rome
3️⃣ I thought you said huge models?#
NNsight
is only one half of our project to democratize access to AI internals. The other half is the National Deep Inference Fabric, or NDIF
. NDIF
hosts large models for shared access using NNsight
, so you don’t have to worry about any of the headaches of hosting large models yourself!
The interaction between NDIF
and NNsight
is fairly straightforward. The intervention graph we create via the tracing context can be encoded into a custom JSON format and sent via an HTTP request to the NDIF
servers. NDIF
then decodes the intervention graph and interleaves it alongside the specified model.
To see which models are currently being hosted, check out the following status page: https://nnsight.net/status/
Remote execution#
In its current state, NDIF
requires an API key. Therefore, to run the rest of this walkthrough, you need one of your own. To get one, simply register at https://login.ndif.us.
With a valid API key, you then can configure nnsight
as follows:
[ ]:
from nnsight import CONFIG
CONFIG.set_default_api_key("YOUR_API_KEY")
If you’re running in a local IDE, this only needs to be run once as it will save the API key as the default in a .config file along with your nnsight
installation. You can also add your API key to Google Colab secrets.
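For Colab, a hypothetical sketch (the secret name NDIF_API_KEY is our own choice):

from google.colab import userdata
from nnsight import CONFIG

# Assumes you stored your key as a Colab secret named "NDIF_API_KEY".
CONFIG.set_default_api_key(userdata.get("NDIF_API_KEY"))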
To amp things up a few levels, let’s demonstrate using nnsight
’s tracing context with Llama-3.1-8b
!
[ ]:
import os
# Llama 3.1 8b is a gated model, so you need to apply for access on HuggingFace and include your token.
os.environ['HF_TOKEN'] = "YOUR_HUGGING_FACE_TOKEN"
[ ]:
from huggingface_hub import notebook_login
notebook_login()
WARNING:huggingface_hub._login:Note: Environment variable `HF_TOKEN` is set and is the current active token independently from the token you've just configured.
[ ]:
from nnsight import LanguageModel
# We'll never actually load the parameters locally, so no need to specify a device_map.
llama = LanguageModel("meta-llama/Meta-Llama-3.1-8B")
# All we need to do to execute on NDIF instead of locally is pass remote=True.
with llama.trace("The Eiffel Tower is in the city of", remote=True) as runner:
    hidden_states = llama.model.layers[-1].output.save()
    output = llama.output.save()
print(hidden_states)
print(output["logits"])
It really is as simple as remote=True
. All of the techniques we went through in earlier sections work just the same when running locally or remotely.
Sessions#
NDIF uses a queue to handle concurrent requests from multiple users. To optimize the execution of our experiments, we can use the session
context to efficiently package multiple interventions together as one single request to the server.
This offers the following benefits:
All interventions within a session will be executed one after another without additional wait in the NDIF queue
All intermediate outputs for each intervention are stored on the server and can be accessed by other interventions in the same session without moving the data back and forth between NDIF and the local machine
Let’s take a look:
[ ]:
with llama.session(remote=True) as session:
with llama.trace("The Eiffel Tower is in the city of") as t1:
# capture the hidden state from layer 32 at the last token
hs_31 = llama.model.layers[31].output[0][:, -1, :] # no .save()
t1_tokens_out = llama.lm_head.output.argmax(dim=-1).save()
with llama.trace("Buckingham Palace is in the city of") as t2:
llama.model.layers[1].output[0][:, -1, :] = hs_31[:]
t2_tokens_out = llama.lm_head.output.argmax(dim=-1).save()
print("\nT1 - Original Prediction: ", llama.tokenizer.decode(t1_tokens_out[0][-1]))
print("T2 - Modified Prediction: ", llama.tokenizer.decode(t2_tokens_out[0][-1]))
Next Steps#
Check out nnsight.net/tutorials for more walkthroughs implementing classic interpretability techniques using nnsight
.
Getting Involved!#
Note that both nnsight
and NDIF
are in active development, so changes may be made and errors may arise during use. If you’re interested in following updates to nnsight
, contributing, giving feedback, or finding collaborators, please join the NDIF discord. We’d love to hear about your work using nnsight!
You can also follow us on LinkedIn, Bluesky: @ndif-team.bsky.social, and X: @ndif_team.
💟