Walkthrough#
An interactive version of this walkthrough can be found here
In this era of large-scale deep learning, the most interesting AI models are massive black boxes that are hard to run. Ordinary commercial inference service APIs let you interact with huge models, but they do not let you access model internals.
The nnsight library is different: it gives you full access to all the neural network internals. When used together with a remote service like the National Deep Inference Facility (NDIF), it lets you run complex experiments on huge open source models easily, with fully transparent access.
Our team wants to enable entire labs and independent researchers alike, as we believe a large, passionate, and collaborative community will produce the next big insights into a profoundly important field.
1 First, let's start small#
The Tracing Context#
To demonstrate the core functionality and syntax of nnsight, we'll define and use a tiny two-layer neural network.
[ ]:
# Install nnsight
!pip install nnsight
from IPython.display import clear_output
clear_output()
Our little model here is composed of two sub-modules, the linear layers "layer1" and "layer2". We specify the sizes of each of these modules, and create some complementary example input.
[ ]:
from collections import OrderedDict
import torch
input_size = 5
hidden_dims = 10
output_size = 2
net = torch.nn.Sequential(
    OrderedDict(
        [
            ("layer1", torch.nn.Linear(input_size, hidden_dims)),
            ("layer2", torch.nn.Linear(hidden_dims, output_size)),
        ]
    )
).requires_grad_(False)
input = torch.rand((1, input_size))
The core object of the nnsight package is NNsight. This wraps around a given PyTorch model to enable the capabilities nnsight provides.
[ ]:
from nnsight import NNsight
model = NNsight(net)
Printing a PyTorch model shows a named hierarchy of modules, which is very useful when accessing sub-components directly. NNsight models work the same.
[ ]:
print(model)
Sequential(
(layer1): Linear(in_features=5, out_features=10, bias=True)
(layer2): Linear(in_features=10, out_features=2, bias=True)
)
Before we actually get to using the model we just created, let's talk about Python contexts.
Python contexts define a scope using the with statement and are often used to create some object, or initiate some logic, that you later want to destroy or conclude.
The most common application is opening files like the following example:
with open('myfile.txt', 'r') as file:
    text = file.read()
Python uses the with keyword to enter a context-like object. This object defines logic to be run at the start of the with block, as well as logic to be run when exiting. When using with for a file, entering the context opens the file and exiting the context closes it. Being within the context means we can read from the file. Simple enough! Now we can discuss how nnsight uses contexts to enable intuitive access into the internals of a neural network.
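To make the enter/exit mechanics concrete, here is a minimal sketch of a custom context manager. ToyTrace is a hypothetical class written only for illustration and is not part of nnsight; it just mimics the pattern of collecting requests inside the block and executing them when the block exits.

```python
class ToyTrace:
    """Hypothetical context manager: collects callables, runs them on exit."""

    def __init__(self, value):
        self.value = value
        self.pending = []   # work requested inside the `with` block
        self.results = []   # filled in only when the block exits

    def __enter__(self):
        # Logic run at the start of the `with` block.
        return self

    def request(self, fn):
        # Nothing is executed yet; we only remember what to do.
        self.pending.append(fn)

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Logic run when leaving the block: execute everything requested.
        if exc_type is None:
            self.results = [fn(self.value) for fn in self.pending]
        return False  # do not swallow exceptions


with ToyTrace(3) as t:
    t.request(lambda x: x + 1)
    t.request(lambda x: x * 10)
    inside = list(t.results)  # snapshot taken before the block exits

print(inside)      # []
print(t.results)   # [4, 30]
```

The snapshot taken inside the block is empty because the requested work only runs in `__exit__`, which is the same ordering the tracing context relies on.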
The main tool with nnsight is a context for tracing.
We enter the tracing context by calling model.trace(<input>) on an NNsight model, which defines how we want to run the model. Inside the context, we will be able to customize how the neural network runs. The model is actually run upon exiting the tracing context.
[ ]:
with model.trace(input) as tracer:
    pass
But where's the output? To get that, we'll have to learn how to request it from within the tracing context.
Getting#
Earlier, we wrapped our little neural net with the NNsight class. This added a couple of properties to each module in the model (including the root model itself). The two most important ones are .input and .output.
model.input
model.output
The names are self-explanatory. They correspond to the inputs and outputs of their respective modules during a forward pass of the model. We can use these attributes inside the with block.
However, it is important to understand that the model is not executed until the end of the tracing context. How can we access inputs and outputs before the model is run? The trick is deferred execution.
.input and .output are Proxies for the eventual inputs and outputs of a module. In other words, when you access model.output, what you are communicating to nnsight is, "When you compute the output of model, please grab it for me and put the value into its corresponding Proxy object's .value attribute." Let's try it:
[ ]:
with model.trace(input) as tracer:
    output = model.output

print(output.value)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-34-c7e0c74b12fa> in <cell line: 5>()
3 output = model.output
4
----> 5 print(output.value)
/usr/local/lib/python3.10/dist-packages/nnsight/tracing/Proxy.py in value(self)
47
48 if not self.node.done():
---> 49 raise ValueError("Accessing Proxy value before it's been set.")
50
51 return self.node.value
ValueError: Accessing Proxy value before it's been set.
Oh no, an error! "Accessing Proxy value before it's been set."
Why doesn't our output have a value?
Proxy objects will only have their value at the end of a context if we call .save() on them. This helps to reduce memory costs. Adding .save() fixes the error:
[ ]:
with model.trace(input) as tracer:
    output = model.output.save()

print(output.value)
tensor([[ 0.1473, -0.1518]])
Success! We now have the model output. You just completed your first intervention using nnsight.
Each time you access a module's input or output, you create an intervention in the neural network's forward pass. Collectively these requests form the intervention graph. We call the process of executing it alongside the model's normal computation graph interleaving.
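The deferred-execution idea can be sketched in plain Python. Everything here (ToyProxy, ToyGraph, run) is hypothetical and far simpler than nnsight's real Proxy and intervention graph; the sketch only illustrates why a value is unavailable until the model runs, and why values that were never saved are not kept.

```python
class ToyProxy:
    """Hypothetical placeholder for a value that does not exist yet."""

    def __init__(self, name):
        self.name = name
        self.saved = False
        self._value = None
        self._done = False

    def save(self):
        self.saved = True
        return self

    @property
    def value(self):
        if not self._done:
            raise ValueError("Accessing value before it's been set.")
        return self._value


class ToyGraph:
    """Hypothetical record of which module outputs were requested."""

    def __init__(self):
        self.proxies = {}

    def output_of(self, name):
        return self.proxies.setdefault(name, ToyProxy(name))

    def run(self, layers, x):
        # "Interleaving": run each layer and fill in the saved proxies.
        for name, fn in layers:
            x = fn(x)
            proxy = self.proxies.get(name)
            if proxy is not None and proxy.saved:
                proxy._value, proxy._done = x, True
        return x


layers = [("layer1", lambda x: x * 2), ("layer2", lambda x: x + 1)]
graph = ToyGraph()
kept = graph.output_of("layer1").save()
dropped = graph.output_of("layer2")   # requested but never saved
graph.run(layers, 5)

print(kept.value)  # 10
```

Reading `dropped.value` after the run raises the same kind of ValueError shown earlier, because that proxy was never marked for saving.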
On Model output
If you don't need to access anything other than the final model output, you can call the tracing context with trace=False and not use it as a context:
output = model.trace(<inputs>, trace=False)
Just like we saved the output of the model as a whole, we can save the output of any of its submodules. We use normal Python attribute syntax. We can discover how to access them by name by printing out the model:
[ ]:
print(model)
Sequential(
(layer1): Linear(in_features=5, out_features=10, bias=True)
(layer2): Linear(in_features=10, out_features=2, bias=True)
)
[ ]:
with model.trace(input) as tracer:
    l1_output = model.layer1.output.save()

print(l1_output.value)
tensor([[ 0.0458, 0.5267, 0.7119, 0.4046, 0.2460, 0.7998, 0.4485, -0.2506,
0.2968, -0.8834]])
Let's do the same for the input of layer2. While we're at it, let's also drop the as tracer, as we won't be needing the tracer object itself for a few sections:
[ ]:
with model.trace(input):
    l2_input = model.layer2.input.save()

print(l2_input.value)
((tensor([[ 0.0458, 0.5267, 0.7119, 0.4046, 0.2460, 0.7998, 0.4485, -0.2506,
0.2968, -0.8834]]),), {})
On module inputs
Notice how the value for l2_input was not just a single tensor. The type/shape of values from .input is in the form of:
tuple(tuple(args), dictionary(kwargs))
where the first index of the tuple is itself a tuple of all positional arguments, and the second index is a dictionary of the keyword arguments.
Now that we can access activations, we also want to do some post-processing on them. Let's find out which dimension of layer1's output has the highest value.
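As a small illustration of that structure, the sketch below builds a stand-in with the same layout and unpacks it. The list named hidden is a placeholder for the tensor passed to layer2, not real nnsight output; by the same logic, the tensor inside a saved .input value of this form would sit at index [0][0].

```python
# A stand-in for what `.input` holds: ((positional args...), {keyword args}).
# The list below plays the role of the tensor passed to layer2.
hidden = [0.0458, 0.5267, 0.7119]
module_input = ((hidden,), {})

args, kwargs = module_input      # split into positional and keyword parts
first_arg = module_input[0][0]   # the first positional argument

print(len(args), kwargs)
print(first_arg is hidden)
```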
Functions, Methods, and Operations#
We could do this by calling torch.argmax(...) after the tracing context, or we can just leverage the fact that nnsight handles functions and methods within the tracing context by creating a Proxy request for them:
[ ]:
with model.trace(input):
    # Note we don't need to call .save() on the output,
    # as we're only using its value within the tracing context.
    l1_output = model.layer1.output
    l1_amax = torch.argmax(l1_output, dim=1).save()

print(l1_amax[0])
tensor(5)
Nice! That worked seamlessly, but hold on, how come we didn't need to call .value[0] on the result? In previous sections, we were just being explicit to get an understanding of Proxies and their value. In practice, however, nnsight knows that when outside of the tracing context we only care about the actual value, and so printing, indexing, and applying functions all immediately return and reflect the data in .value. So for the rest of the tutorial we won't use it.
The same principles work for methods and operations as well:
[ ]:
with model.trace(input):
    value = (model.layer1.output.sum() + model.layer2.output.sum()).save()

print(value)
tensor(2.3416)
By default, torch functions, methods and all operators work with nnsight. We also enable the use of the einops library.
So to recap, the above code block is saying to nnsight, "Run the model with the given input. When the output of layer1 is computed, take its sum. Then do the same for layer2. Now that both of those are computed, add them and make sure not to delete this value as I wish to use it outside of the tracing context."
Getting and analyzing the activations from various points in a model can be really insightful, and a number of ML techniques do exactly that. However, we often want not only to view the computation of a model, but to influence it as well.
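One way to picture how operators and methods can be recorded rather than executed is a tiny expression tree. The Lazy and Source classes below are hypothetical and are not how nnsight is implemented; they only show that `+` and `.sum()` can build a description of work that is evaluated later, once real values exist.

```python
import operator


class Lazy:
    """Hypothetical deferred value: remembers an operation and its arguments."""

    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args

    def __add__(self, other):
        # Using `+` does not compute anything; it records a new node.
        return Lazy(operator.add, self, other)

    def sum(self):
        return Lazy(sum, self)

    def evaluate(self, env):
        # Resolve arguments first, then apply this node's operation.
        args = [a.evaluate(env) if isinstance(a, Lazy) else a for a in self.args]
        return self.fn(*args)


class Source(Lazy):
    """Leaf node whose value is looked up only at evaluation time."""

    def __init__(self, name):
        super().__init__(None)
        self.name = name

    def evaluate(self, env):
        return env[self.name]


# Build the expression before any "activations" exist.
expr = Source("layer1").sum() + Source("layer2").sum()

# Later, supply values (plain lists standing in for tensors) and evaluate.
env = {"layer1": [1.0, 2.0, 3.0], "layer2": [0.5, 0.5]}
print(expr.evaluate(env))  # 7.0
```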
Setting#
To demonstrate the effect of editing the flow of information through the model, let's set the first dimension of the first layer's output to 0. NNsight makes this really easy using the "=" operator:
[ ]:
with model.trace(input):
    # Save the output before the edit to compare.
    # Notice we apply .clone() before saving as the setting operation is in-place.
    l1_output_before = model.layer1.output.clone().save()
    # Access the 0th index of the hidden state dimension and set it to 0.
    model.layer1.output[:, 0] = 0
    # Save the output after to see our edit.
    l1_output_after = model.layer1.output.save()

print("Before:", l1_output_before)
print("After:", l1_output_after)
Before: tensor([[ 0.0458, 0.5267, 0.7119, 0.4046, 0.2460, 0.7998, 0.4485, -0.2506,
0.2968, -0.8834]])
After: tensor([[ 0.0000, 0.5267, 0.7119, 0.4046, 0.2460, 0.7998, 0.4485, -0.2506,
0.2968, -0.8834]])
Seems our change was reflected. Now the same for the last dimension:
[ ]:
with model.trace(input):
    # Save the output before the edit to compare.
    # Notice we apply .clone() before saving as the setting operation is in-place.
    l1_output_before = model.layer1.output.clone().save()
    # Access the last index of the hidden state dimension and set it to 0.
    model.layer1.output[:, hidden_dims] = 0
    # Save the output after to see our edit.
    l1_output_after = model.layer1.output.save()

print("Before:", l1_output_before)
print("After:", l1_output_after)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-42-a1e18ebd4137> in <cell line: 1>()
----> 1 with model.trace(input):
2
3 # Save the output before the edit to compare.
4 # Notice we apply .clone() before saving as the setting operation is in-place.
5 l1_output_before = model.layer1.output.clone().save()
/usr/local/lib/python3.10/dist-packages/nnsight/contexts/Runner.py in __exit__(self, exc_type, exc_val, exc_tb)
40 raise exc_val
41
---> 42 self._graph.tracing = False
43
44 if self.remote:
<ipython-input-42-a1e18ebd4137> in <cell line: 1>()
6
7 # Access the last index of the hidden state dimension and set it to 0.
----> 8 model.layer1.output[:, hidden_dims] = 0
9
10 # Save the output after to see our edit.
/usr/local/lib/python3.10/dist-packages/nnsight/tracing/Proxy.py in __setitem__(self, key, value)
90
91 def __setitem__(self, key: Union[Proxy, Any], value: Union[Self, Any]) -> None:
---> 92 self.node.graph.add(
93 target=operator.setitem,
94 args=[self.node, key, value],
/usr/local/lib/python3.10/dist-packages/nnsight/tracing/Graph.py in add(self, target, value, args, kwargs, name)
144 try:
145
--> 146 value = target(
147 *Node.prepare_proxy_values(_args),
148 **Node.prepare_proxy_values(_kwargs),
IndexError: index 10 is out of bounds for dimension 1 with size 10
Ah of course, we needed to index at hidden_dims - 1, not hidden_dims. How did nnsight know there was this indexing error before leaving the tracing context?
Earlier when discussing contexts in Python, we learned some logic happens upon entering, and some logic happens upon exiting. We know the model is actually run on exit, but what happens on enter? Our input IS actually run through the model, however under its own "fake" context. This means the input makes its way through all of the model operations, allowing nnsight to record the shapes and data types of module inputs and outputs! The operations are never executed using tensors with real values so it doesn't incur any memory costs. Then, when creating proxy requests like the setting one above, nnsight also attempts to execute the request on the "fake" values we recorded. Hence, it lets us know if our request is feasible before even running the model.
On scanning
"Scanning" is what we call running "fake" inputs through the model to collect information like shapes and types. "Validating" is what we call trying to execute your intervention proxies with "fake" inputs to see if they work. If you are doing anything in a loop where efficiency is important, you should turn off scanning and validating. You can turn off validating in .trace(...) like .trace(..., validate=False). You can turn off scanning in Tracer.invoke(...) (see the Batching section) like Tracer.invoke(..., scan=False).
Let's try again with the correct indexing, and view the shape of the output before leaving the tracing context:
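The idea behind validating can be sketched without any tensors at all: once a shape has been recorded, an index can be checked against it before real data exists. The helper validate_column_index below is hypothetical and is not nnsight's mechanism; it only mirrors the kind of check that surfaced the IndexError above.

```python
def validate_column_index(shape, column):
    """Check a `[:, column]` index against a recorded shape, without any data."""
    n_columns = shape[1]
    if not -n_columns <= column < n_columns:
        raise IndexError(
            f"index {column} is out of bounds for dimension 1 with size {n_columns}"
        )


# Shape recorded by a hypothetical "scan" of layer1's output.
recorded_shape = (1, 10)
hidden_dims = 10

validate_column_index(recorded_shape, hidden_dims - 1)  # passes silently

try:
    validate_column_index(recorded_shape, hidden_dims)
except IndexError as err:
    message = str(err)
    print(message)
```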
[ ]:
with model.trace(input):
    # Save the output before the edit to compare.
    # Notice we apply .clone() before saving as the setting operation is in-place.
    l1_output_before = model.layer1.output.clone().save()
    print(f"layer1 output shape: {model.layer1.output.shape}")
    # Access the last index of the hidden state dimension and set it to 0.
    model.layer1.output[:, hidden_dims - 1] = 0
    # Save the output after to see our edit.
    l1_output_after = model.layer1.output.save()

print("Before:", l1_output_before)
print("After:", l1_output_after)
layer1 output shape: torch.Size([1, 10])
Before: tensor([[ 0.0458, 0.5267, 0.7119, 0.4046, 0.2460, 0.7998, 0.4485, -0.2506,
0.2968, -0.8834]])
After: tensor([[ 0.0458, 0.5267, 0.7119, 0.4046, 0.2460, 0.7998, 0.4485, -0.2506,
0.2968, 0.0000]])
We can also just replace proxy inputs and outputs with tensors of the same shape and type. Let's use the shape information we have at our disposal to add noise to the output, and replace it with this new noised tensor:
[ ]:
with model.trace(input):
    # Save the output before the edit to compare.
    # Notice we apply .clone() before saving as the setting operation is in-place.
    l1_output_before = model.layer1.output.clone().save()
    # Create random noise with variance of .001
    noise = (0.001**0.5) * torch.randn(l1_output_before.shape)
    # Add to original value and replace.
    model.layer1.output = l1_output_before + noise
    # Save the output after to see our edit.
    l1_output_after = model.layer1.output.save()

print("Before:", l1_output_before)
print("After:", l1_output_after)
Before: tensor([[ 0.0458, 0.5267, 0.7119, 0.4046, 0.2460, 0.7998, 0.4485, -0.2506,
0.2968, -0.8834]])
After: tensor([[ 0.0581, 0.5168, 0.6561, 0.4083, 0.2617, 0.7800, 0.4080, -0.2213,
0.3394, -0.9187]])
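The factor 0.001**0.5 in the noise line is the standard deviation that corresponds to a variance of 0.001, since scaling standard-normal samples by s multiplies their variance by s squared. The sketch below checks this numerically with the standard library only; the seed and sample count are arbitrary choices made for the illustration.

```python
import random
import statistics

variance = 0.001
std = variance ** 0.5          # scale factor applied to standard-normal noise

rng = random.Random(0)         # fixed seed so the check is repeatable
samples = [std * rng.gauss(0.0, 1.0) for _ in range(100_000)]

sample_variance = statistics.pvariance(samples)
print(round(std, 4))           # 0.0316
print(sample_variance)         # close to 0.001
```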
Gradients#
NNsight can also let you apply backprop and access gradients with respect to a loss. Like .input and .output on modules, nnsight also exposes .grad on Proxies themselves (assuming they are proxies of tensors):
[ ]:
with model.trace(input):
    # We need to explicitly have the tensor require grad
    # as the model we defined earlier turned off requiring grad.
    model.layer1.output.requires_grad = True
    # We call .grad on a tensor Proxy to communicate we want to store its gradient.
    # We need to call .save() of course as .grad is its own Proxy.
    layer1_output_grad = model.layer1.output.grad.save()
    layer2_output_grad = model.layer2.output.grad.save()
    # Need a loss to propagate through the later modules in order to have a grad.
    loss = model.output.sum()
    loss.backward()

print("Layer 1 output gradient:", layer1_output_grad)
print("Layer 2 output gradient:", layer2_output_grad)
Layer 1 output gradient: tensor([[ 0.4545, -0.0596, -0.2059, 0.4643, -0.4211, -0.2813, 0.2126, 0.5016,
-0.0126, -0.1564]])
Layer 2 output gradient: tensor([[1., 1.]])
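The all-ones gradient for layer2 follows from the loss being a plain sum of the outputs: each output contributes with coefficient 1. For a linear layer, the gradient with respect to its input is then the column sums of its weight matrix. The sketch below checks both facts on a made-up 3-to-2 layer using plain lists; W2, B2 and hidden are illustrative values, not the weights of the model above.

```python
def linear(weights, bias, x):
    """y_j = sum_i weights[j][i] * x[i] + bias[j], as in a linear layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def loss_from_hidden(hidden):
    """Loss used in the walkthrough: the sum of the layer's outputs."""
    return sum(linear(W2, B2, hidden))


# Hypothetical tiny "layer2": 3 hidden units -> 2 outputs.
W2 = [[0.5, -1.0, 2.0],
      [1.5, 0.25, -0.5]]
B2 = [0.1, -0.2]
hidden = [1.0, 2.0, 3.0]

# Analytic gradients for loss = sum(outputs).
grad_outputs = [1.0 for _ in W2]  # d loss / d y_j = 1
grad_hidden = [sum(row[i] for row in W2) for i in range(len(hidden))]

# Finite-difference check of d loss / d hidden_i.
eps = 1e-6
numeric = []
for i in range(len(hidden)):
    bumped = list(hidden)
    bumped[i] += eps
    numeric.append((loss_from_hidden(bumped) - loss_from_hidden(hidden)) / eps)

print(grad_outputs)  # [1.0, 1.0]
print(grad_hidden)   # [2.0, -0.75, 1.5]
```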
All of the features we learned previously also apply to .grad. In other words, we can apply operations to and edit the gradients. Let's zero the grad of layer1 and double the grad of layer2.
[ ]:
with model.trace(input):
    # We need to explicitly have the tensor require grad
    # as the model we defined earlier turned off requiring grad.
    model.layer1.output.requires_grad = True
    model.layer1.output.grad[:] = 0
    model.layer2.output.grad = model.layer2.output.grad.clone() * 2
    layer1_output_grad = model.layer1.output.grad.save()
    layer2_output_grad = model.layer2.output.grad.save()
    # Need a loss to propagate through the later modules in order to have a grad.
    loss = model.output.sum()
    loss.backward()

print("Layer 1 output gradient:", layer1_output_grad)
print("Layer 2 output gradient:", layer2_output_grad)
Layer 1 output gradient: tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
Layer 2 output gradient: tensor([[2., 2.]])
2 Bigger#
Now that we have the basics of nnsight under our belt, we can scale our model up and combine the techniques we've learned into more interesting experiments.
The NNsight class is very bare bones. It wraps a pre-defined model and does no pre-processing on the inputs we enter. It's designed to be extended with more complex and powerful types of models and we're excited to see what can be done to leverage its features.
LanguageModel#
LanguageModel is a subclass of NNsight. While we could define and create a model to pass in directly, LanguageModel includes special support for Huggingface language models, including automatically loading models from a Huggingface ID, and loading the model together with the appropriate tokenizer.
Here is how you can use LanguageModel to load GPT-2:
[ ]:
from nnsight import LanguageModel
model = LanguageModel("openai-community/gpt2", device_map="auto")
print(model)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:88: UserWarning:
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
warnings.warn(
GPT2LMHeadModel(
(transformer): GPT2Model(
(wte): Embedding(50257, 768)
(wpe): Embedding(1024, 768)
(drop): Dropout(p=0.1, inplace=False)
(h): ModuleList(
(0-11): 12 x GPT2Block(
(ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(attn): GPT2Attention(
(c_attn): Conv1D()
(c_proj): Conv1D()
(attn_dropout): Dropout(p=0.1, inplace=False)
(resid_dropout): Dropout(p=0.1, inplace=False)
)
(ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(mlp): GPT2MLP(
(c_fc): Conv1D()
(c_proj): Conv1D()
(act): NewGELUActivation()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
(lm_head): Linear(in_features=768, out_features=50257, bias=False)
(generator): WrapperModule()
)
On Model Initialization
A few important things to note:
Keyword arguments passed to the initialization of LanguageModel are forwarded to HuggingFace-specific loading logic. In this case, device_map specifies which devices to use and its value auto indicates to evenly distribute it to all available GPUs (and CPU if no GPUs are available). Other arguments can be found here: https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForCausalLM
When we initialize LanguageModel, we aren't yet loading the parameters of the model into memory. We are actually loading a "meta" version of the model which doesn't take up any memory, but still allows us to view and trace actions on it. After exiting the first tracing context, the model is then fully loaded into memory. To load into memory on initialization, you can pass dispatch=True into LanguageModel like LanguageModel('openai-community/gpt2', device_map="auto", dispatch=True).
Let's put together some of the features we applied to the small model, but now on GPT-2. Unlike NNsight, LanguageModel does define logic to pre-process inputs upon entering the tracing context. This makes interacting with the model simpler without having to directly access the tokenizer.
In the following example, we ablate the value coming from the last layer's MLP module and decode the logits to see what token the model predicts without influence from that particular module:
[ ]:
with model.trace("The Eiffel Tower is in the city of"):
    # Access the last layer using h[-1] as it's a ModuleList
    # Access the first index of .output as that's where the hidden states are.
    model.transformer.h[-1].mlp.output[0][:] = 0
    # Logits come out of model.lm_head and we apply argmax to get the predicted token ids.
    token_ids = model.lm_head.output.argmax(dim=-1).save()

print("Token IDs:", token_ids)
# Apply the tokenizer to decode the ids into words after the tracing context.
print("Prediction:", model.tokenizer.decode(token_ids[0][-1]))
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
Token IDs: tensor([[ 262, 12, 417, 8765, 11, 257, 262, 3504, 338, 3576]])
Prediction: London
You just ran a little intervention on a much more complex model with a lot more parameters! An important piece of information we're missing though is what the prediction would look like without our ablation.
Of course we could just run two tracing contexts and compare the outputs. This, however, would require two forward passes through the model. NNsight can do better than that.
Batching#
It's time to bring back the Tracer object we dropped before. See, when you call .trace(...) with some input, it's actually creating two different contexts behind the scenes. The first is the tracing context we have been using; the second is the invoker context. Being within this context just means that .input and .output should refer only to the input you've given invoke. Calling .trace(...) with some input just means there's only one input and therefore only one invoker context.
We can call .trace() without input and call Tracer.invoke(...) to manually create the invoker context with our input. Now every subsequent time we call .invoke(...), new interventions will only refer to the input in that particular invoke. When exiting the tracing context, the inputs from all of the invokers will be batched together, and they will be executed in one forward pass! So let's do the ablation experiment, and compute a "control" output to compare to:
On the invoker context
Note that when injecting data to only the relevant invoker interventions, nnsight tries, but can't guarantee, that it can narrow the data into the right batch indices (in the case of an object as input or output). So there are cases where all invokes will get all of the data.
Just like .trace(...) created a Tracer object, .invoke(...) creates an Invoker object. The Invoker object has post-processed inputs at invoker.inputs, which can be useful for seeing information about your input. If you are using .trace(...) with inputs, you can still access the invoker object at tracer._invoker.
Keyword arguments given to .invoke(...) make their way to the input pre-processing. For example in LanguageModel, the keyword arguments are used for tokenization, like max_length and truncation. If you need to pass keyword arguments directly to a single-input .trace(...), you can pass an invoker_args keyword argument that should be a dictionary of keyword arguments for the invoker: .trace(..., invoker_args={...})
[ ]:
with model.trace() as tracer:
    with tracer.invoke("The Eiffel Tower is in the city of"):
        # Ablate the last MLP for only this batch.
        model.transformer.h[-1].mlp.output[0][:] = 0
        # Get the output for only the intervened on batch.
        token_ids_intervention = model.lm_head.output.argmax(dim=-1).save()
    with tracer.invoke("The Eiffel Tower is in the city of"):
        # Get the output for only the original batch.
        token_ids_original = model.lm_head.output.argmax(dim=-1).save()

print("Original token IDs:", token_ids_original)
print("Intervention token IDs:", token_ids_intervention)
print("Original prediction:", model.tokenizer.decode(token_ids_original[0][-1]))
print("Intervention prediction:", model.tokenizer.decode(token_ids_intervention[0][-1]))
Original token IDs: tensor([[ 198, 12, 417, 8765, 318, 257, 262, 3504, 7372, 6342]])
Intervention token IDs: tensor([[ 262, 12, 417, 8765, 11, 257, 262, 3504, 338, 3576]])
Original prediction: Paris
Intervention prediction: London
So it did end up affecting what the model predicted. That's pretty neat!
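The batching behaviour can be pictured with a toy tracer that stacks the inputs of every invoke, runs the model once, and applies each edit only to the row that belongs to its invoke. ToyBatcher and toy_model are hypothetical stand-ins written for this sketch and do not reflect nnsight's internals.

```python
def toy_model(batch):
    """Stand-in for one forward pass over a whole batch (here: double each item)."""
    return [x * 2 for x in batch]


class ToyBatcher:
    """Hypothetical tracer: each invoke adds one input and remembers its batch row."""

    def __init__(self):
        self.inputs = []
        self.edits = {}   # batch index -> function applied to that row's output

    def invoke(self, x, edit=None):
        index = len(self.inputs)
        self.inputs.append(x)
        if edit is not None:
            self.edits[index] = edit
        return index

    def run(self):
        outputs = toy_model(self.inputs)          # a single forward pass
        for index, edit in self.edits.items():    # edits touch only their own row
            outputs[index] = edit(outputs[index])
        return outputs


tracer = ToyBatcher()
intervened = tracer.invoke(21, edit=lambda y: 0)  # "ablated" row
control = tracer.invoke(21)                       # untouched row
outputs = tracer.run()

print(outputs[control], outputs[intervened])  # 42 0
```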
Another cool thing with multiple invokes is that the Proxies can interact between them. Here we transfer the word token embeddings from a real prompt into another placeholder prompt. Therefore the latter prompt produces the output of the former prompt:
[ ]:
with model.trace() as tracer:
    with tracer.invoke("The Eiffel Tower is in the city of"):
        embeddings = model.transformer.wte.output
    with tracer.invoke("_ _ _ _ _ _ _ _ _ _"):
        model.transformer.wte.output = embeddings
        token_ids_intervention = model.lm_head.output.argmax(dim=-1).save()
    with tracer.invoke("_ _ _ _ _ _ _ _ _ _"):
        token_ids_original = model.lm_head.output.argmax(dim=-1).save()

print("Original prediction:", model.tokenizer.decode(token_ids_original[0][-1]))
print("Intervention prediction:", model.tokenizer.decode(token_ids_intervention[0][-1]))
Original prediction: _
Intervention prediction: Paris
.next()#
Some HuggingFace models define methods to generate multiple outputs at a time. LanguageModel wraps that functionality to provide the same tracing features by using .generate(...) instead of .trace(...). This calls the underlying model's .generate method. It passes the output through a model.generator module that we've added onto the model, allowing you to get the generate output at model.generator.output.
In a case like this, the underlying model is called more than once; the modules of said model produce more than one output. Which iteration should a given module.output refer to? That's where Module.next() comes in.
Each module has a call idx associated with it and .next() simply increments that attribute. At the time of execution, data is injected into the intervention graph only at the iteration that matches the call idx.
[ ]:
with model.generate("The Eiffel Tower is in the city of", max_new_tokens=3):
    token_ids_1 = model.lm_head.output.argmax(dim=-1).save()
    token_ids_2 = model.lm_head.next().output.argmax(dim=-1).save()
    token_ids_3 = model.lm_head.next().output.argmax(dim=-1).save()
    output = model.generator.output.save()

print("Prediction 1: ", model.tokenizer.decode(token_ids_1[0][-1]))
print("Prediction 2: ", model.tokenizer.decode(token_ids_2[0][-1]))
print("Prediction 3: ", model.tokenizer.decode(token_ids_3[0][-1]))
print("All token ids: ", output)
print("All prediction: ", model.tokenizer.batch_decode(output))
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Prediction 1: Paris
Prediction 2: ,
Prediction 3: and
All token ids: tensor([[ 464, 412, 733, 417, 8765, 318, 287, 262, 1748, 286, 6342, 11,
290]])
All prediction: ['The Eiffel Tower is in the city of Paris, and']
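The call-index bookkeeping behind .next() can be sketched with a small class. ToyModule is hypothetical and only imitates the matching rule described above: a request is attached to the current call index, and during generation each call's output is delivered only to the request registered for that index.

```python
class ToyModule:
    """Hypothetical module wrapper tracking which call a request refers to."""

    def __init__(self):
        self.call_idx = 0     # which call new requests attach to
        self.requests = {}    # call index -> list that receives that call's output

    def next(self):
        self.call_idx += 1
        return self

    def request_output(self):
        slot = []
        self.requests[self.call_idx] = slot
        return slot

    def run(self, outputs_per_call):
        # Each generated token is one call; deliver output only where idx matches.
        for idx, output in enumerate(outputs_per_call):
            if idx in self.requests:
                self.requests[idx].append(output)


lm_head = ToyModule()
first = lm_head.request_output()
second = lm_head.next().request_output()
third = lm_head.next().request_output()

lm_head.run([" Paris", ",", " and"])
print(first, second, third)
```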
3 I thought you said huge models?#
NNsight is only one part of our project to democratize access to AI internals. The other half is NDIF (National Deep Inference Facility).
The interaction between the two is fairly straightforward. The intervention graph we create via the tracing context can be encoded into a custom JSON format and sent via an HTTP request to the NDIF servers. NDIF then decodes the intervention graph and interleaves it alongside the specified model.
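To give a feel for what encoding a graph as JSON can look like, here is a sketch using only the standard library. The node layout, field names, and the "example-model" label are invented for illustration; the actual wire format used by nnsight and NDIF is not described in this walkthrough.

```python
import json

# Hypothetical node list; the real format is custom and not shown here.
graph = [
    {"name": "n0", "target": "module_output", "args": ["layer1"]},
    {"name": "n1", "target": "sum", "args": ["n0"]},
    {"name": "n2", "target": "save", "args": ["n1"]},
]

# Client side: serialize the request into a string that could be sent over HTTP.
payload = json.dumps({"model": "example-model", "graph": graph})

# Server side: decode the payload, then walk the nodes in order.
decoded = json.loads(payload)
targets = [node["target"] for node in decoded["graph"]]
print(targets)  # ['module_output', 'sum', 'save']
```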
To see which models are currently being hosted, check out the following status page: https://nnsight.net/status/
Remote execution#
In its current state, NDIF requires an API key. To run the rest of this Colab, you would need to obtain your own API key. To do so, simply register for an NDIF account. After registering, you can manage and generate your own API keys.
With a valid API key, you then can configure nnsight by doing the following:
[ ]:
from nnsight import CONFIG
CONFIG.set_default_api_key("<your api key here>")
This only needs to be run once, as it will save this API key as the default in a config file along with the nnsight installation.
To amp things up a few levels, let's demonstrate using nnsight's tracing context with one of the larger open source language models, Llama-2-70b!
[ ]:
import os
# llama2 70b is a gated model and you need access via your huggingface token
os.environ['HF_TOKEN'] = "<your huggingface token>"
# llama response object requires the version of transformers from github
!pip uninstall -y transformers
!pip install git+https://github.com/huggingface/transformers
clear_output()
[ ]:
# We'll never actually load the parameters so no need to specify a device_map.
model = LanguageModel("meta-llama/Llama-2-70b-hf")
# All we need to specify using NDIF vs executing locally is remote=True.
with model.trace("The Eiffel Tower is in the city of", remote=True) as runner:
    hidden_states = model.model.layers[-1].output.save()
    output = model.output.save()

print(hidden_states)
print(output["logits"])
It really is as simple as remote=True. All of the techniques we went through in earlier sections work just the same when running locally and remotely.
Note that both nnsight and, especially, NDIF are in active development, and therefore there may be caveats, changes, and errors to work through.
Getting Involved!#
If you're interested in following updates to nnsight, contributing, giving feedback, or finding collaborators, please join the NDIF discord!
The Mech Interp discord is also a fantastic place to discuss all things mech interp with a really cool community.
Our website, nnsight.net, has a bunch more tutorials detailing more complex interpretability techniques using nnsight. If you want to share any of the work you do using nnsight, let others know on either of the discords above and we might turn it into a tutorial on our website.