Walkthrough#

An interactive version of this walkthrough can be found here.

In this era of large-scale deep learning, the most interesting AI models are massive black boxes that are hard to run. Ordinary commercial inference service APIs let you interact with huge models, but they do not let you access model internals.

The nnsight library is different: it gives you full access to all the neural network internals. When used together with a remote service like the National Deep Inference Facility (NDIF), it lets you run complex experiments on huge open source models easily, with fully transparent access.

Our team wants to enable entire labs and independent researchers alike, because we believe a large, passionate, and collaborative community will produce the next big insights into a profoundly important field.

1 First, let’s start small#

The Tracing Context#

To demonstrate the core functionality and syntax of nnsight, we’ll define and use a tiny two-layer neural network.

[ ]:
# Install nnsight
!pip install nnsight

from IPython.display import clear_output

clear_output()

Our little model here is composed of two sub-modules: the linear layers ‘layer1’ and ‘layer2’. We specify the sizes of each of these modules and create some matching example input.

[ ]:
from collections import OrderedDict
import torch

input_size = 5
hidden_dims = 10
output_size = 2

net = torch.nn.Sequential(
    OrderedDict(
        [
            ("layer1", torch.nn.Linear(input_size, hidden_dims)),
            ("layer2", torch.nn.Linear(hidden_dims, output_size)),
        ]
    )
).requires_grad_(False)

input = torch.rand((1, input_size))

The core object of the nnsight package is NNsight. It wraps around a given PyTorch model to enable the capabilities nnsight provides.

[ ]:
from nnsight import NNsight

model = NNsight(net)

Printing a PyTorch model shows a named hierarchy of modules, which is very useful when accessing sub-components directly. NNsight models work the same way.

[ ]:
print(model)
Sequential(
  (layer1): Linear(in_features=5, out_features=10, bias=True)
  (layer2): Linear(in_features=10, out_features=2, bias=True)
)

Before we actually get to using the model we just created, let’s talk about Python contexts.

Python contexts define a scope using the with statement and are often used to create some object, or initiate some logic, that you later want to destroy or conclude.

The most common application is opening files like the following example:

with open('myfile.txt', 'r') as file:
  text = file.read()

Python uses the with keyword to enter a context-like object. This object defines logic to be run at the start of the with block, as well as logic to be run when exiting. When using with for a file, entering the context opens the file and exiting the context closes it. Being within the context means we can read from the file. Simple enough! Now we can discuss how nnsight uses contexts to enable intuitive access into the internals of a neural network.

The main tool with nnsight is a context for tracing.

We enter the tracing context by calling model.trace(<input>) on an NNsight model, which defines how we want to run the model. Inside the context, we will be able to customize how the neural network runs. The model is actually run upon exiting the tracing context.

[ ]:
with model.trace(input) as tracer:
    pass

But where’s the output? To get that, we’ll have to learn how to request it from within the tracing context.

Getting#

Earlier, we wrapped our little neural net with the NNsight class. Doing so added a couple of properties to each module in the model (including the root model itself). The two most important ones are .input and .output.

model.input
model.output

The names are self-explanatory. They correspond to the inputs and outputs of their respective modules during a forward pass of the model. We can use these attributes inside the with block.

However, it is important to understand that the model is not executed until the end of the tracing context. How can we access inputs and outputs before the model is run? The trick is deferred execution.

.input and .output are Proxies for the eventual inputs and outputs of a module. In other words, when you access model.output what you are communicating to nnsight is, “When you compute the output of model, please grab it for me and put the value into its corresponding Proxy object’s .value attribute.” Let’s try it:

[ ]:
with model.trace(input) as tracer:

    output = model.output

print(output.value)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-34-c7e0c74b12fa> in <cell line: 5>()
      3   output = model.output
      4
----> 5 print(output.value)

/usr/local/lib/python3.10/dist-packages/nnsight/tracing/Proxy.py in value(self)
     47
     48         if not self.node.done():
---> 49             raise ValueError("Accessing Proxy value before it's been set.")
     50
     51         return self.node.value

ValueError: Accessing Proxy value before it's been set.

Oh no, an error! “Accessing Proxy value before it’s been set.”

Why doesn’t our output have a value?

Proxy objects will only have their value at the end of a context if we call .save() on them. This helps to reduce memory costs. Adding .save() fixes the error:

[ ]:
with model.trace(input) as tracer:

    output = model.output.save()

print(output.value)
tensor([[ 0.1473, -0.1518]])

Success! We now have the model output. You just completed your first intervention using nnsight.

Each time you access a module’s input or output, you create an intervention in the neural network’s forward pass. Collectively these requests form the intervention graph. We call the process of executing it alongside the model’s normal computation graph “interleaving.”

On Model output


If you don’t need to access anything other than the final model output, you can call the tracing context with trace=False and not use it as a context:

output = model.trace(<inputs>, trace=False)
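
For example, a minimal sketch using the input tensor we defined earlier:

output = model.trace(input, trace=False)
print(output)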

Just as we saved the output of the model as a whole, we can save the output of any of its submodules, using normal Python attribute syntax. We can discover how to access them by name by printing out the model:

[ ]:
print(model)
Sequential(
  (layer1): Linear(in_features=5, out_features=10, bias=True)
  (layer2): Linear(in_features=10, out_features=2, bias=True)
)
[ ]:
with model.trace(input) as tracer:

    l1_output = model.layer1.output.save()

print(l1_output.value)
tensor([[ 0.0458,  0.5267,  0.7119,  0.4046,  0.2460,  0.7998,  0.4485, -0.2506,
          0.2968, -0.8834]])

Let’s do the same for the input of layer2. While we’re at it, let’s also drop the as tracer, as we won’t be needing the tracer object itself for a few sections:

[ ]:
with model.trace(input):

    l2_input = model.layer2.input.save()

print(l2_input.value)
((tensor([[ 0.0458,  0.5267,  0.7119,  0.4046,  0.2460,  0.7998,  0.4485, -0.2506,
          0.2968, -0.8834]]),), {})

On module inputs


Notice how the value for l2_input was not just a single tensor. Values from .input take the form:

tuple(tuple(args), dictionary(kwargs))

Where the first index of the tuple is itself a tuple of all positional arguments, and the second index is a dictionary of the keyword arguments.
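
For example, with the l2_input we saved above, the hidden state tensor itself can be pulled out of that structure by indexing into the positional arguments (a minimal sketch):

# The first positional argument of layer2's input, i.e. the tensor produced by layer1.
l2_hidden = l2_input.value[0][0]
print(l2_hidden.shape)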


Now that we can access activations, we also want to do some post-processing on them. Let’s find out which dimension of layer1’s output has the highest value.

Functions, Methods, and Operations#

We could do this by calling torch.argmax(...) after the tracing context, or we can leverage the fact that nnsight handles functions and methods within the tracing context by creating a Proxy request for it:

[ ]:
with model.trace(input):

    # Note we don't need to call .save() on the output,
    # as we're only using its value within the tracing context.
    l1_output = model.layer1.output

    l1_amax = torch.argmax(l1_output, dim=1).save()

print(l1_amax[0])
tensor(5)

Nice! That worked seamlessly, but hold on: how come we didn’t need to call .value[0] on the result? In previous sections, we were being explicit to build an understanding of Proxies and their value. In practice, however, nnsight knows that outside of the tracing context we only care about the actual value, so printing, indexing, and applying functions all immediately return and reflect the data in .value. For the rest of the tutorial, we won’t use .value explicitly.

The same principles work for methods and operations as well:

[ ]:
with model.trace(input):

    value = (model.layer1.output.sum() + model.layer2.output.sum()).save()

print(value)
tensor(2.3416)

By default, torch functions, methods and all operators work with nnsight. We also enable the use of the einops library.

So to recap, the above code block is saying to nnsight, “Run the model with the given input. When the output of layer1 is computed, take its sum. Then do the same for layer2. Now that both of those are computed, add them and make sure not to delete this value as I wish to use it outside of the tracing context.”
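
As mentioned above, the einops library also works inside the trace. Here is a minimal sketch (assuming the einops package is installed):

from einops import rearrange

with model.trace(input):

    # Flatten the (batch, hidden) activation into a single vector, still inside the trace.
    flat = rearrange(model.layer1.output, "b h -> (b h)").save()

print(flat.shape)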

Getting and analyzing the activations from various points in a model can be really insightful, and a number of ML techniques do exactly that. Often, however, we want not only to view a model’s computation, but to influence it as well.

Setting#

To demonstrate the effect of editing the flow of information through the model, let’s set the first dimension of the first layer’s output to 0. NNsight makes this really easy using the ‘=’ operator:

[ ]:
with model.trace(input):

    # Save the output before the edit to compare.
    # Notice we apply .clone() before saving as the setting operation is in-place.
    l1_output_before = model.layer1.output.clone().save()

    # Access the 0th index of the hidden state dimension and set it to 0.
    model.layer1.output[:, 0] = 0

    # Save the output after to see our edit.
    l1_output_after = model.layer1.output.save()

print("Before:", l1_output_before)
print("After:", l1_output_after)
Before: tensor([[ 0.0458,  0.5267,  0.7119,  0.4046,  0.2460,  0.7998,  0.4485, -0.2506,
          0.2968, -0.8834]])
After: tensor([[ 0.0000,  0.5267,  0.7119,  0.4046,  0.2460,  0.7998,  0.4485, -0.2506,
          0.2968, -0.8834]])

It seems our change was reflected. Now let’s do the same for the last dimension:

[ ]:
with model.trace(input):

    # Save the output before the edit to compare.
    # Notice we apply .clone() before saving as the setting operation is in-place.
    l1_output_before = model.layer1.output.clone().save()

    # Access the last index of the hidden state dimension and set it to 0.
    model.layer1.output[:, hidden_dims] = 0

    # Save the output after to see our edit.
    l1_output_after = model.layer1.output.save()

print("Before:", l1_output_before)
print("After:", l1_output_after)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-42-a1e18ebd4137> in <cell line: 1>()
----> 1 with model.trace(input):
      2
      3   # Save the output before the edit to compare.
      4   # Notice we apply .clone() before saving as the setting operation is in-place.
      5   l1_output_before = model.layer1.output.clone().save()

/usr/local/lib/python3.10/dist-packages/nnsight/contexts/Runner.py in __exit__(self, exc_type, exc_val, exc_tb)
     40             raise exc_val
     41
---> 42         self._graph.tracing = False
     43
     44         if self.remote:

<ipython-input-42-a1e18ebd4137> in <cell line: 1>()
      6
      7   # Access the last index of the hidden state dimension and set it to 0.
----> 8   model.layer1.output[:, hidden_dims] = 0
      9
     10   # Save the output after to see our edit.

/usr/local/lib/python3.10/dist-packages/nnsight/tracing/Proxy.py in __setitem__(self, key, value)
     90
     91     def __setitem__(self, key: Union[Proxy, Any], value: Union[Self, Any]) -> None:
---> 92         self.node.graph.add(
     93             target=operator.setitem,
     94             args=[self.node, key, value],

/usr/local/lib/python3.10/dist-packages/nnsight/tracing/Graph.py in add(self, target, value, args, kwargs, name)
    144                 try:
    145
--> 146                     value = target(
    147                         *Node.prepare_proxy_values(_args),
    148                         **Node.prepare_proxy_values(_kwargs),

IndexError: index 10 is out of bounds for dimension 1 with size 10

Ah, of course: we needed to index at hidden_dims - 1, not hidden_dims. How did nnsight know about this indexing error before even leaving the tracing context?

Earlier, when discussing contexts in Python, we learned that some logic happens upon entering, and some logic happens upon exiting. We know the model is actually run on exit, but what happens on enter? Our input IS actually run through the model, but under its own “fake” context. This means the input makes its way through all of the model operations, allowing nnsight to record the shapes and data types of module inputs and outputs! The operations are never executed using tensors with real values, so this doesn’t incur any memory costs. Then, when creating proxy requests like the setting one above, nnsight also attempts to execute the request on the “fake” values we recorded. Hence, it lets us know if our request is feasible before even running the model.

On scanning


“Scanning” is what we call running “fake” inputs through the model to collect information like shapes and types. “Validating” is what we call trying to execute your intervention proxies with “fake” inputs to see if they work. If you are doing anything in a loop where efficiency is important, you should turn off scanning and validating. You can turn off validating in .trace(...) like .trace(..., validate=False). You can turn off scanning in Tracer.invoke(...) (see the Batching section) like Tracer.invoke(..., scan=False).
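
For example, a minimal sketch of a tight loop that turns off validation (you could likewise pass scan=False to each Tracer.invoke(...) when using invokers):

# Tracing the same model many times: skip validation to avoid the extra "fake" execution each iteration.
for _ in range(100):
    with model.trace(input, validate=False):
        l1_out = model.layer1.output.save()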


Let’s try again with the correct indexing, and view the shape of the output before leaving the tracing context:

[ ]:
with model.trace(input):

    # Save the output before the edit to compare.
    # Notice we apply .clone() before saving as the setting operation is in-place.
    l1_output_before = model.layer1.output.clone().save()

    print(f"layer1 output shape: {model.layer1.output.shape}")

    # Access the last index of the hidden state dimension and set it to 0.
    model.layer1.output[:, hidden_dims - 1] = 0

    # Save the output after to see our edit.
    l1_output_after = model.layer1.output.save()

print("Before:", l1_output_before)
print("After:", l1_output_after)
layer1 output shape: torch.Size([1, 10])
Before: tensor([[ 0.0458,  0.5267,  0.7119,  0.4046,  0.2460,  0.7998,  0.4485, -0.2506,
          0.2968, -0.8834]])
After: tensor([[ 0.0458,  0.5267,  0.7119,  0.4046,  0.2460,  0.7998,  0.4485, -0.2506,
          0.2968,  0.0000]])

We can also just replace proxy inputs and outputs with tensors of the same shape and type. Let’s use the shape information we have at our disposal to add noise to the output, and replace it with this new noised tensor:

[ ]:
with model.trace(input):

    # Save the output before the edit to compare.
    # Notice we apply .clone() before saving as the setting operation is in-place.
    l1_output_before = model.layer1.output.clone().save()

    # Create random noise with variance of .001
    noise = (0.001**0.5) * torch.randn(l1_output_before.shape)

    # Add to original value and replace.
    model.layer1.output = l1_output_before + noise

    # Save the output after to see our edit.
    l1_output_after = model.layer1.output.save()

print("Before:", l1_output_before)
print("After:", l1_output_after)
Before: tensor([[ 0.0458,  0.5267,  0.7119,  0.4046,  0.2460,  0.7998,  0.4485, -0.2506,
          0.2968, -0.8834]])
After: tensor([[ 0.0581,  0.5168,  0.6561,  0.4083,  0.2617,  0.7800,  0.4080, -0.2213,
          0.3394, -0.9187]])

Gradients#

NNsight also lets you apply backpropagation and access gradients with respect to a loss. Like .input and .output on modules, nnsight exposes .grad on Proxies themselves (assuming they are proxies of tensors):

[ ]:
with model.trace(input):

    # We need to explicitly have the tensor require grad
    # as the model we defined earlier turned off requiring grad.
    model.layer1.output.requires_grad = True

    # We call .grad on a tensor Proxy to communicate we want to store its gradient.
    # We need to call .save() of course as .grad is its own Proxy.
    layer1_output_grad = model.layer1.output.grad.save()
    layer2_output_grad = model.layer2.output.grad.save()

    # Need a loss to propagate through the later modules in order to have a grad.
    loss = model.output.sum()
    loss.backward()

print("Layer 1 output gradient:", layer1_output_grad)
print("Layer 2 output gradient:", layer2_output_grad)
Layer 1 output gradient: tensor([[ 0.4545, -0.0596, -0.2059,  0.4643, -0.4211, -0.2813,  0.2126,  0.5016,
         -0.0126, -0.1564]])
Layer 2 output gradient: tensor([[1., 1.]])

All of the features we learned previously also apply to .grad. In other words, we can apply operations to and edit the gradients. Let’s zero the grad of layer1 and double the grad of layer2.

[ ]:
with model.trace(input):

    # We need to explicitly have the tensor require grad
    # as the model we defined earlier turned off requiring grad.
    model.layer1.output.requires_grad = True

    model.layer1.output.grad[:] = 0
    model.layer2.output.grad = model.layer2.output.grad.clone() * 2

    layer1_output_grad = model.layer1.output.grad.save()
    layer2_output_grad = model.layer2.output.grad.save()

    # Need a loss to propagate through the later modules in order to have a grad.
    loss = model.output.sum()
    loss.backward()

print("Layer 1 output gradient:", layer1_output_grad)
print("Layer 2 output gradient:", layer2_output_grad)
Layer 1 output gradient: tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
Layer 2 output gradient: tensor([[2., 2.]])

2 Bigger#

Now that we have the basics of nnsight under our belt, we can scale our model up and combine the techniques we’ve learned into more interesting experiments.

The NNsight class is very bare-bones. It wraps a pre-defined model and does no pre-processing on the inputs we enter. It’s designed to be extended with more complex and powerful types of models, and we’re excited to see what can be done to leverage its features.

LanguageModel#

LanguageModel is a subclass of NNsight. While we could define and create a model to pass in directly, LanguageModel includes special support for Hugging Face language models, including automatically loading models from a Hugging Face ID and loading the model together with the appropriate tokenizer.

Here is how you can use LanguageModel to load GPT-2:

[ ]:
from nnsight import LanguageModel

model = LanguageModel("openai-community/gpt2", device_map="auto")

print(model)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:88: UserWarning:
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
  (generator): WrapperModule()
)

On Model Initialization


A few important things to note:

Keyword arguments passed to the initialization of LanguageModel are forwarded to Hugging Face-specific loading logic. In this case, device_map specifies which devices to use, and the value auto indicates to distribute the model evenly across all available GPUs (or the CPU if no GPUs are available). Other arguments can be found here: https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForCausalLM

When we initialize LanguageModel, we aren’t yet loading the parameters of the model into memory. We are actually loading a ‘meta’ version of the model which doesn’t take up any memory, but still allows us to view and trace actions on it. After exiting the first tracing context, the model is then fully loaded into memory. To load into memory on initialization, you can pass dispatch=True into LanguageModel like LanguageModel('openai-community/gpt2', device_map="auto", dispatch=True).


Let’s put together some of the features we applied to the small model, but now on GPT-2. Unlike NNsight, LanguageModel defines logic to pre-process inputs upon entering the tracing context. This makes interacting with the model simpler, since we don’t have to access the tokenizer directly.

In the following example, we ablate the value coming from the last layer’s MLP module and decode the logits to see what token the model predicts without influence from that particular module:

[ ]:
with model.trace("The Eiffel Tower is in the city of"):

    # Access the last layer using h[-1] as it's a ModuleList
    # Access the first index of .output as that's where the hidden states are.
    model.transformer.h[-1].mlp.output[0][:] = 0

    # Logits come out of model.lm_head and we apply argmax to get the predicted token ids.
    token_ids = model.lm_head.output.argmax(dim=-1).save()

print("Token IDs:", token_ids)

# Apply the tokenizer to decode the ids into words after the tracing context.
print("Prediction:", model.tokenizer.decode(token_ids[0][-1]))
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
Token IDs: tensor([[ 262,   12,  417, 8765,   11,  257,  262, 3504,  338, 3576]])
Prediction:  London

You just ran a little intervention on a much more complex model with a lot more parameters! An important piece of information we’re missing, though, is what the prediction would look like without our ablation.

Of course we could just run two tracing contexts and compare the outputs. This, however, would require two forward passes through the model. NNsight can do better than that.

Batching#

It’s time to bring back the Tracer object we dropped before. When you call .trace(...) with some input, it actually creates two different contexts behind the scenes. The second one is the invoker context. Being within this context just means that .input and .output should refer only to the input you’ve given to that invoke. Calling .trace(...) with some input just means there’s only one input and therefore only one invoker context.

We can call .trace() without input and call Tracer.invoke(...) to manually create the invoker context with our input. Now every subsequent time we call .invoke(...), new interventions will only refer to the input in that particular invoke. When exiting the tracing context, the inputs from all of the invokers will be batched together, and they will be executed in one forward pass! So let’s do the ablation experiment, and compute a ‘control’ output to compare to:

On the invoker context


Note that when injecting data into only the relevant invoker’s interventions, nnsight tries, but cannot guarantee, to narrow the data down to the right batch indices (in the case of an object as input or output). So there are cases where all invokes will get all of the data.

Just like .trace(...) created a Tracer object, .invoke(...) creates an Invoker object. The Invoker object has post-processed inputs at invoker.inputs, which can be useful for seeing information about your input. If you are using .trace(...) with inputs, you can still access the invoker object at tracer._invoker.

Keyword arguments given to .invoke(...) make their way to the input pre-processing. In LanguageModel, for example, keyword arguments like max_length and truncation are used when tokenizing. If you need to pass keyword arguments directly to a single-input .trace(...), you can pass an invoker_args keyword argument, which should be a dictionary of keyword arguments for the invoker: .trace(..., invoker_args={...}).
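
For example, a minimal sketch that passes tokenizer keyword arguments through invoker_args for a single-input trace (the particular values here are just illustrative):

with model.trace(
    "The Eiffel Tower is in the city of",
    invoker_args={"max_length": 10, "truncation": True},
):
    token_ids = model.lm_head.output.argmax(dim=-1).save()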


[ ]:
with model.trace() as tracer:

    with tracer.invoke("The Eiffel Tower is in the city of"):

        # Ablate the last MLP for only this batch.
        model.transformer.h[-1].mlp.output[0][:] = 0

        # Get the output for only the intervened on batch.
        token_ids_intervention = model.lm_head.output.argmax(dim=-1).save()

    with tracer.invoke("The Eiffel Tower is in the city of"):

        # Get the output for only the original batch.
        token_ids_original = model.lm_head.output.argmax(dim=-1).save()

print("Original token IDs:", token_ids_original)
print("Intervention token IDs:", token_ids_intervention)

print("Original prediction:", model.tokenizer.decode(token_ids_original[0][-1]))
print("Intervention prediction:", model.tokenizer.decode(token_ids_intervention[0][-1]))
Original token IDs: tensor([[ 198,   12,  417, 8765,  318,  257,  262, 3504, 7372, 6342]])
Intervention token IDs: tensor([[ 262,   12,  417, 8765,   11,  257,  262, 3504,  338, 3576]])
Original prediction:  Paris
Intervention prediction:  London

So it did end up affecting what the model predicted. That’s pretty neat!

Another cool thing about multiple invokes is that Proxies can interact across them. Here we transfer the word token embeddings from a real prompt into a placeholder prompt, so the placeholder prompt produces the output of the real prompt:

[ ]:
with model.trace() as tracer:

    with tracer.invoke("The Eiffel Tower is in the city of"):

        embeddings = model.transformer.wte.output

    with tracer.invoke("_ _ _ _ _ _ _ _ _ _"):

        model.transformer.wte.output = embeddings

        token_ids_intervention = model.lm_head.output.argmax(dim=-1).save()

    with tracer.invoke("_ _ _ _ _ _ _ _ _ _"):

        token_ids_original = model.lm_head.output.argmax(dim=-1).save()

print("Original prediction:", model.tokenizer.decode(token_ids_original[0][-1]))
print("Intervention prediction:", model.tokenizer.decode(token_ids_intervention[0][-1]))
Original prediction:  _
Intervention prediction:  Paris

.next()#

Some Hugging Face models define methods to generate multiple outputs at a time. LanguageModel wraps that functionality to provide the same tracing features: use .generate(...) instead of .trace(...). This calls the underlying model’s .generate method. It passes the output through a model.generator module that we’ve added onto the model, allowing you to access the generation output at model.generator.output.

In a case like this, the underlying model is called more than once; the modules of said model produce more than one output. Which iteration should a given module.output refer to? That’s where Module.next() comes in.

Each module has a call index associated with it, and .next() simply increments that attribute. At execution time, data is injected into the intervention graph only on the iteration that matches the call index.

[ ]:
with model.generate("The Eiffel Tower is in the city of", max_new_tokens=3):

    token_ids_1 = model.lm_head.output.argmax(dim=-1).save()

    token_ids_2 = model.lm_head.next().output.argmax(dim=-1).save()

    token_ids_3 = model.lm_head.next().output.argmax(dim=-1).save()

    output = model.generator.output.save()

print("Prediction 1: ", model.tokenizer.decode(token_ids_1[0][-1]))
print("Prediction 2: ", model.tokenizer.decode(token_ids_2[0][-1]))
print("Prediction 3: ", model.tokenizer.decode(token_ids_3[0][-1]))

print("All token ids: ", output)

print("All prediction: ", model.tokenizer.batch_decode(output))
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Prediction 1:   Paris
Prediction 2:  ,
Prediction 3:   and
All token ids:  tensor([[ 464,  412,  733,  417, 8765,  318,  287,  262, 1748,  286, 6342,   11,
          290]])
All prediction:  ['The Eiffel Tower is in the city of Paris, and']

3 I thought you said huge models?#

NNsight is only one half of our project to democratize access to AI internals. The other half is NDIF (National Deep Inference Facility).

The interaction between the two is fairly straightforward. The intervention graph we create via the tracing context can be encoded into a custom JSON format and sent via an HTTP request to the NDIF servers. NDIF then decodes the intervention graph and interleaves it alongside the specified model.

To see which models are currently being hosted, check out the following status page: https://nnsight.net/status/

Remote execution#

In its current state, NDIF requires an API key. To run the rest of this Colab, you would need to obtain your own API key. To do so, simply register for an NDIF account. After registering, you can manage and generate your own API keys.

With a valid API key, you then can configure nnsight by doing the following:

[ ]:
from nnsight import CONFIG

CONFIG.set_default_api_key("<your api key here>")

This only needs to be run once, as it saves the API key as the default in a config file alongside the nnsight installation.

To amp things up a few levels, let’s demonstrate using nnsight’s tracing context with one of the larger open-source language models, Llama-2-70b!

[ ]:
import os

# Llama-2-70b is a gated model; you need access via your Hugging Face token.
os.environ['HF_TOKEN'] = "<your huggingface token>"

# The Llama response object requires the development version of transformers from GitHub.
!pip uninstall -y transformers
!pip install git+https://github.com/huggingface/transformers

clear_output()
[ ]:
# We'll never actually load the parameters so no need to specify a device_map.
model = LanguageModel("meta-llama/Llama-2-70b-hf")

# All we need to specify using NDIF vs executing locally is remote=True.
with model.trace("The Eiffel Tower is in the city of", remote=True) as runner:

    hidden_states = model.model.layers[-1].output.save()

    output = model.output.save()

print(hidden_states)

print(output["logits"])

It really is as simple as remote=True. All of the techniques we went through in earlier sections work just the same when running locally and remotely.

Note that nnsight, and especially NDIF, is in active development, so there may be caveats, changes, and errors to work through.

Getting Involved!#

If you’re interested in following updates to nnsight, contributing, giving feedback, or finding collaborators, please join the NDIF discord!

The Mech Interp discord is also a fantastic place to discuss all things mech interp with a really cool community.

Our website, nnsight.net, has many more tutorials detailing more complex interpretability techniques using nnsight. If you want to share any of the work you do using nnsight, let others know on either of the Discords above and we might turn it into a tutorial on our website.

💟