Activation Patching¶

📗 You can find an interactive Colab version of this tutorial here.

Activation patching is a technique used to understand how different components of a model (e.g. intermediate layer, attention_heads) causaly contribute to its behavior. In an activation patching experiment, we modify or "patch" the activations of model components and observe the impact on its output, where the activation here refers to the vector output of the component.

Activation patching experiments typically follow these steps:

Baseline Run: Run the model on the original prompt and record the activations.
Corrupted Run: Run the model with with a counterfactual (i.e., corrupted) prompt and record the difference in output.
Patching: Replace the activations of the model component of interest with alternate activations (or zeros, which is sometimes referred to as ablation).

By systematically testing different components this way, researchers can determine how information flows through the model. One common use case is circuit identification, where a circuit is a subgraph of a full model that is responsible for a specific and human-interpretable task (e.g., detecting whether an input is in English). Activation patching can help identify which model components are essential for model performance on a given task.

In this tutorial, we use nnsight to perform a simple activation patching experiment using an indirect object identification (IOI) task.

Note: IOI Task¶

Activation patching was used to find the Indirect Object Identification (IOI) circuit in GPT-2 small. IOI is a natural language task in which a model predicts the indirect object in a sentence. IOI tasks typically involve identifying the indirect object from two names introduced in an initial dependent clause. One name (e.g. "Mary") is the subject (S1), and the other name (e.g. "John") is the indirect object (IO). In the main clause, a second occurrence of the subject (S2) typically performs an action involving the exchange of an item. The sentence always ends with the preposition "to," and the task is to correctly complete it by identifying the non-repeated name (IO).

In this exercise, we will use the following 'clean' prompt:

"After John and Mary went to the store, Mary gave a bottle of milk to"

This prompt's correct answer (and thus its indirect object) is: " John"

We will also use a corrupted prompt to test how activation patching works. This corrupted prompt will switch the identity of the indirect object, so we can test how the model responds to this change.

"After John and Mary went to the store, John gave a bottle of milk to"

This prompt's correct answer (and thus its indirect object) is: " Mary"

Setup¶

In [1]:

Copied!





try:
    import google.colab
    is_colab = True
except ImportError:
    is_colab = False

if is_colab:
    !pip install -U nnsight==0.6.1 transformers==5.3.0
try:
    import google.colab
    is_colab = True
except ImportError:
    is_colab = False

if is_colab:
    !pip install -U nnsight==0.6.1 transformers==5.3.0

Requirement already satisfied: nnsight==0.6.1 in /usr/local/lib/python3.12/dist-packages (0.6.1)
Requirement already satisfied: transformers==5.3.0 in /usr/local/lib/python3.12/dist-packages (5.3.0)
Requirement already satisfied: astor in /usr/local/lib/python3.12/dist-packages (from nnsight==0.6.1) (0.8.1)
Requirement already satisfied: cloudpickle in /usr/local/lib/python3.12/dist-packages (from nnsight==0.6.1) (3.1.2)
Requirement already satisfied: httpx in /usr/local/lib/python3.12/dist-packages (from nnsight==0.6.1) (0.28.1)
Requirement already satisfied: python-socketio[client] in /usr/local/lib/python3.12/dist-packages (from nnsight==0.6.1) (5.16.1)
Requirement already satisfied: pydantic>=2.9.0 in /usr/local/lib/python3.12/dist-packages (from nnsight==0.6.1) (2.12.3)
Requirement already satisfied: torch>=2.4.0 in /usr/local/lib/python3.12/dist-packages (from nnsight==0.6.1) (2.10.0+cu128)
Requirement already satisfied: accelerate in /usr/local/lib/python3.12/dist-packages (from nnsight==0.6.1) (1.13.0)
Requirement already satisfied: toml in /usr/local/lib/python3.12/dist-packages (from nnsight==0.6.1) (0.10.2)
Requirement already satisfied: ipython in /usr/local/lib/python3.12/dist-packages (from nnsight==0.6.1) (7.34.0)
Requirement already satisfied: rich in /usr/local/lib/python3.12/dist-packages (from nnsight==0.6.1) (13.9.4)
Requirement already satisfied: zstandard in /usr/local/lib/python3.12/dist-packages (from nnsight==0.6.1) (0.25.0)
Requirement already satisfied: huggingface-hub<2.0,>=1.3.0 in /usr/local/lib/python3.12/dist-packages (from transformers==5.3.0) (1.5.0)
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.12/dist-packages (from transformers==5.3.0) (2.0.2)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.12/dist-packages (from transformers==5.3.0) (26.0)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.12/dist-packages (from transformers==5.3.0) (6.0.3)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.12/dist-packages (from transformers==5.3.0) (2025.11.3)
Requirement already satisfied: tokenizers<=0.23.0,>=0.22.0 in /usr/local/lib/python3.12/dist-packages (from transformers==5.3.0) (0.22.2)
Requirement already satisfied: typer in /usr/local/lib/python3.12/dist-packages (from transformers==5.3.0) (0.24.1)
Requirement already satisfied: safetensors>=0.4.3 in /usr/local/lib/python3.12/dist-packages (from transformers==5.3.0) (0.7.0)
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.12/dist-packages (from transformers==5.3.0) (4.67.3)
Requirement already satisfied: filelock>=3.10.0 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub<2.0,>=1.3.0->transformers==5.3.0) (3.25.0)
Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub<2.0,>=1.3.0->transformers==5.3.0) (2025.3.0)
Requirement already satisfied: hf-xet<2.0.0,>=1.2.0 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub<2.0,>=1.3.0->transformers==5.3.0) (1.3.2)
Requirement already satisfied: typing-extensions>=4.1.0 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub<2.0,>=1.3.0->transformers==5.3.0) (4.15.0)
Requirement already satisfied: anyio in /usr/local/lib/python3.12/dist-packages (from httpx->nnsight==0.6.1) (4.12.1)
Requirement already satisfied: certifi in /usr/local/lib/python3.12/dist-packages (from httpx->nnsight==0.6.1) (2026.2.25)
Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.12/dist-packages (from httpx->nnsight==0.6.1) (1.0.9)
Requirement already satisfied: idna in /usr/local/lib/python3.12/dist-packages (from httpx->nnsight==0.6.1) (3.11)
Requirement already satisfied: h11>=0.16 in /usr/local/lib/python3.12/dist-packages (from httpcore==1.*->httpx->nnsight==0.6.1) (0.16.0)
Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.12/dist-packages (from pydantic>=2.9.0->nnsight==0.6.1) (0.7.0)
Requirement already satisfied: pydantic-core==2.41.4 in /usr/local/lib/python3.12/dist-packages (from pydantic>=2.9.0->nnsight==0.6.1) (2.41.4)
Requirement already satisfied: typing-inspection>=0.4.2 in /usr/local/lib/python3.12/dist-packages (from pydantic>=2.9.0->nnsight==0.6.1) (0.4.2)
Requirement already satisfied: setuptools in /usr/local/lib/python3.12/dist-packages (from torch>=2.4.0->nnsight==0.6.1) (75.2.0)
Requirement already satisfied: sympy>=1.13.3 in /usr/local/lib/python3.12/dist-packages (from torch>=2.4.0->nnsight==0.6.1) (1.14.0)
Requirement already satisfied: networkx>=2.5.1 in /usr/local/lib/python3.12/dist-packages (from torch>=2.4.0->nnsight==0.6.1) (3.6.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.12/dist-packages (from torch>=2.4.0->nnsight==0.6.1) (3.1.6)
Requirement already satisfied: cuda-bindings==12.9.4 in /usr/local/lib/python3.12/dist-packages (from torch>=2.4.0->nnsight==0.6.1) (12.9.4)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.8.93 in /usr/local/lib/python3.12/dist-packages (from torch>=2.4.0->nnsight==0.6.1) (12.8.93)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.8.90 in /usr/local/lib/python3.12/dist-packages (from torch>=2.4.0->nnsight==0.6.1) (12.8.90)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.8.90 in /usr/local/lib/python3.12/dist-packages (from torch>=2.4.0->nnsight==0.6.1) (12.8.90)
Requirement already satisfied: nvidia-cudnn-cu12==9.10.2.21 in /usr/local/lib/python3.12/dist-packages (from torch>=2.4.0->nnsight==0.6.1) (9.10.2.21)
Requirement already satisfied: nvidia-cublas-cu12==12.8.4.1 in /usr/local/lib/python3.12/dist-packages (from torch>=2.4.0->nnsight==0.6.1) (12.8.4.1)
Requirement already satisfied: nvidia-cufft-cu12==11.3.3.83 in /usr/local/lib/python3.12/dist-packages (from torch>=2.4.0->nnsight==0.6.1) (11.3.3.83)
Requirement already satisfied: nvidia-curand-cu12==10.3.9.90 in /usr/local/lib/python3.12/dist-packages (from torch>=2.4.0->nnsight==0.6.1) (10.3.9.90)
Requirement already satisfied: nvidia-cusolver-cu12==11.7.3.90 in /usr/local/lib/python3.12/dist-packages (from torch>=2.4.0->nnsight==0.6.1) (11.7.3.90)
Requirement already satisfied: nvidia-cusparse-cu12==12.5.8.93 in /usr/local/lib/python3.12/dist-packages (from torch>=2.4.0->nnsight==0.6.1) (12.5.8.93)
Requirement already satisfied: nvidia-cusparselt-cu12==0.7.1 in /usr/local/lib/python3.12/dist-packages (from torch>=2.4.0->nnsight==0.6.1) (0.7.1)
Requirement already satisfied: nvidia-nccl-cu12==2.27.5 in /usr/local/lib/python3.12/dist-packages (from torch>=2.4.0->nnsight==0.6.1) (2.27.5)
Requirement already satisfied: nvidia-nvshmem-cu12==3.4.5 in /usr/local/lib/python3.12/dist-packages (from torch>=2.4.0->nnsight==0.6.1) (3.4.5)
Requirement already satisfied: nvidia-nvtx-cu12==12.8.90 in /usr/local/lib/python3.12/dist-packages (from torch>=2.4.0->nnsight==0.6.1) (12.8.90)
Requirement already satisfied: nvidia-nvjitlink-cu12==12.8.93 in /usr/local/lib/python3.12/dist-packages (from torch>=2.4.0->nnsight==0.6.1) (12.8.93)
Requirement already satisfied: nvidia-cufile-cu12==1.13.1.3 in /usr/local/lib/python3.12/dist-packages (from torch>=2.4.0->nnsight==0.6.1) (1.13.1.3)
Requirement already satisfied: triton==3.6.0 in /usr/local/lib/python3.12/dist-packages (from torch>=2.4.0->nnsight==0.6.1) (3.6.0)
Requirement already satisfied: cuda-pathfinder~=1.1 in /usr/local/lib/python3.12/dist-packages (from cuda-bindings==12.9.4->torch>=2.4.0->nnsight==0.6.1) (1.4.0)
Requirement already satisfied: psutil in /usr/local/lib/python3.12/dist-packages (from accelerate->nnsight==0.6.1) (5.9.5)
Requirement already satisfied: jedi>=0.16 in /usr/local/lib/python3.12/dist-packages (from ipython->nnsight==0.6.1) (0.19.2)
Requirement already satisfied: decorator in /usr/local/lib/python3.12/dist-packages (from ipython->nnsight==0.6.1) (4.4.2)
Requirement already satisfied: pickleshare in /usr/local/lib/python3.12/dist-packages (from ipython->nnsight==0.6.1) (0.7.5)
Requirement already satisfied: traitlets>=4.2 in /usr/local/lib/python3.12/dist-packages (from ipython->nnsight==0.6.1) (5.7.1)
Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /usr/local/lib/python3.12/dist-packages (from ipython->nnsight==0.6.1) (3.0.52)
Requirement already satisfied: pygments in /usr/local/lib/python3.12/dist-packages (from ipython->nnsight==0.6.1) (2.19.2)
Requirement already satisfied: backcall in /usr/local/lib/python3.12/dist-packages (from ipython->nnsight==0.6.1) (0.2.0)
Requirement already satisfied: matplotlib-inline in /usr/local/lib/python3.12/dist-packages (from ipython->nnsight==0.6.1) (0.2.1)
Requirement already satisfied: pexpect>4.3 in /usr/local/lib/python3.12/dist-packages (from ipython->nnsight==0.6.1) (4.9.0)
Requirement already satisfied: bidict>=0.21.0 in /usr/local/lib/python3.12/dist-packages (from python-socketio[client]->nnsight==0.6.1) (0.23.1)
Requirement already satisfied: python-engineio>=4.11.0 in /usr/local/lib/python3.12/dist-packages (from python-socketio[client]->nnsight==0.6.1) (4.13.1)
Requirement already satisfied: requests>=2.21.0 in /usr/local/lib/python3.12/dist-packages (from python-socketio[client]->nnsight==0.6.1) (2.32.4)
Requirement already satisfied: websocket-client>=0.54.0 in /usr/local/lib/python3.12/dist-packages (from python-socketio[client]->nnsight==0.6.1) (1.9.0)
Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.12/dist-packages (from rich->nnsight==0.6.1) (4.0.0)
Requirement already satisfied: click>=8.2.1 in /usr/local/lib/python3.12/dist-packages (from typer->transformers==5.3.0) (8.3.1)
Requirement already satisfied: shellingham>=1.3.0 in /usr/local/lib/python3.12/dist-packages (from typer->transformers==5.3.0) (1.5.4)
Requirement already satisfied: annotated-doc>=0.0.2 in /usr/local/lib/python3.12/dist-packages (from typer->transformers==5.3.0) (0.0.4)
Requirement already satisfied: parso<0.9.0,>=0.8.4 in /usr/local/lib/python3.12/dist-packages (from jedi>=0.16->ipython->nnsight==0.6.1) (0.8.6)
Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.12/dist-packages (from markdown-it-py>=2.2.0->rich->nnsight==0.6.1) (0.1.2)
Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.12/dist-packages (from pexpect>4.3->ipython->nnsight==0.6.1) (0.7.0)
Requirement already satisfied: wcwidth in /usr/local/lib/python3.12/dist-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython->nnsight==0.6.1) (0.6.0)
Requirement already satisfied: simple-websocket>=0.10.0 in /usr/local/lib/python3.12/dist-packages (from python-engineio>=4.11.0->python-socketio[client]->nnsight==0.6.1) (1.1.0)
Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.12/dist-packages (from requests>=2.21.0->python-socketio[client]->nnsight==0.6.1) (3.4.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.12/dist-packages (from requests>=2.21.0->python-socketio[client]->nnsight==0.6.1) (2.5.0)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.12/dist-packages (from sympy>=1.13.3->torch>=2.4.0->nnsight==0.6.1) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.12/dist-packages (from jinja2->torch>=2.4.0->nnsight==0.6.1) (3.0.3)
Requirement already satisfied: wsproto in /usr/local/lib/python3.12/dist-packages (from simple-websocket>=0.10.0->python-engineio>=4.11.0->python-socketio[client]->nnsight==0.6.1) (1.3.2)

In [2]:

Copied!





from IPython.display import clear_output
import torch
from nnsight import LanguageModel

import plotly.express as px
import plotly.io as pio
pio.renderers.default = "colab" if is_colab else "plotly_mimetype+notebook_connected+colab+notebook"
from IPython.display import clear_output
import torch
from nnsight import LanguageModel

import plotly.express as px
import plotly.io as pio
pio.renderers.default = "colab" if is_colab else "plotly_mimetype+notebook_connected+colab+notebook"

Flax classes are deprecated and will be removed in Diffusers v1.0.0. We recommend migrating to PyTorch classes or pinning your version of Diffusers.
Flax classes are deprecated and will be removed in Diffusers v1.0.0. We recommend migrating to PyTorch classes or pinning your version of Diffusers.

Let's start with loading gpt2 and printing its module tree:

In [3]:

Copied!





# Load gpt2
model = LanguageModel("openai-community/gpt2", device_map="auto")
clear_output()
print(model)
# Load gpt2
model = LanguageModel("openai-community/gpt2", device_map="auto")
clear_output()
print(model)

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
  (generator): Generator(
    (streamer): Streamer()
  )
)

Next up, we define our clean prompt and our corrupted prompt. As prompts may be associated with many different feature circuits (i.e., circuits responsible for IOI, deciding if the language is English, or prompt refusal), choosing a counterfactual prompt with only changes directly related your feature of interest is essential.

Here, we switch the name of the repeated subject, thus swapping out the indirect object for our IOI task:

In [4]:

Copied!

clean_prompt = "After John and Mary went to the store, Mary gave a bottle of milk to"
corrupted_prompt = "After John and Mary went to the store, John gave a bottle of milk to"
clean_prompt = "After John and Mary went to the store, Mary gave a bottle of milk to"
corrupted_prompt = "After John and Mary went to the store, John gave a bottle of milk to"

We then use the tokenizer on the two words of interest (“John” and “Mary”) to find the token that represents them. That way we can grab the prediction for these two tokens and compare. Because our prompts don't end in a space, make sure to add a space before each word (i.e., the combined space + word token is what we're looking for).

In [5]:

Copied!

correct_index = model.tokenizer(" John")["input_ids"][0] # includes a space
incorrect_index = model.tokenizer(" Mary")["input_ids"][0] # includes a space

print(f"' John': {correct_index}")
print(f"' Mary': {incorrect_index}")
correct_index = model.tokenizer(" John")["input_ids"][0] # includes a space
incorrect_index = model.tokenizer(" Mary")["input_ids"][0] # includes a space

print(f"' John': {correct_index}")
print(f"' Mary': {incorrect_index}")

' John': 1757
' Mary': 5335

Patching Experiment¶

Now we can run the actual patching intervention! What does this even mean?

We now have two prompts, a "clean" one and a "corrupted" one. Intuitively, the model output for each of these prompts should be different: we'd expect the model to answer "John" for the clean prompt and "Mary" for the corrupted prompt.

In this experiment, we run the model with the clean prompt as an input and then get each layer's output value (i.e., residual stream) and calculate the logit difference between the correct and incorrect answers for this run. Next, we calculate the logit difference between the correct and incorrect answers for the corrupted prompt. Finally, we patch in the residual stream from the clean prompt into the corrupted prompt and collect the logit difference.

Typically, you would do each of these steps in different forward passes, but with NNsight we can batch our operations and do these three runs in one forward pass! This saves us a lot of time.

Step 1: Clean Run

First, we'll run the model with the clean prompt:

"After John and Mary went to the store, Mary gave a bottle of milk to"

During this clean run, we collect the final output of each layer. We also record the logit difference in the final model output between the correct answer token " John" and the incorrect token " Mary".

Step 1 code:

We will be using NNsight's batching functionality to run this experiment, but if you were to do it in separate forward passes (less optimal code!), here is what the clean forward pass would look like.

N_LAYERS = len(model.transformer.h)
clean_hs = []

# Clean run
with model.trace() as tracer:
    with tracer.invoke(clean_prompt) as invoker:
        clean_tokens = invoker.inputs[1]['input_ids'][0].save()

        # Get hidden states of all layers in the network.
        # We index the output at 0 because it's a tuple where the first index is the hidden state.
        for layer_idx in range(N_LAYERS):
            clean_hs.append(model.transformer.h[layer_idx].output[0].save())

        # Get logits from the lm_head.
        clean_logits = model.lm_head.output

        # Calculate the difference between the correct answer and incorrect answer for the clean run and save it.
        clean_logit_diff = (
            clean_logits[0, -1, correct_index] - clean_logits[0, -1, incorrect_index]
        ).save()

Step 2: Corrupted Run

Next, we run the model using the corrupted input prompt:

"After John and Mary went to the store, John gave a bottle of milk to"

During this corrupted run, we collect the logit difference in the final model output between the correct and incorrect answer tokens

Note: because we are testing changes induced by the corrupted prompt, the target answers remain the same as in the clean run. That is, the correct token is still " John" and the incorrect token is still " Mary".

Step 2 code:

We will be using NNsight's batching functionality to run this experiment, but if you were to do activation patching in three separate forward passes, here is what the corrupted forward pass would look like.

# Corrupted run
with model.trace(corrupted_prompt) as tracer:
    corrupted_logits = model.lm_head.output

    # Calculate the difference between the correct answer and incorrect answer for the corrupted run and save it.
    corrupted_logit_diff = (
        corrupted_logits[0, -1, correct_index]
        - corrupted_logits[0, -1, incorrect_index]
    ).save()

Step 3: Activation Patching Intervention

Finally, we perform our activation patching procedure. For each token position in the clean prompt, we loop through all layers of the model. Within each layer, we run a forward pass using the corrupted prompt, and patch in the corresponding activation from our clean run at the given token position. We then collect the final output difference between the correct and incorrect answer tokens for each patched activation.

Step 3 code:

We will be using NNsight's batching functionality to run this experiment, but if you were to do it in separate forward passes (less optimal code!), here is what the patching portion would look like. Note that this code will run many forward passes, one for each layer and token position combination within the nested for loops. This is why we recommend optimizing your code using invoke batching.

# Activation Patching Intervention
ioi_patching_results = []

# Iterate through all the layers
for layer_idx in range(len(model.transformer.h)):
    _ioi_patching_results = []

    # Iterate through all tokens
    for token_idx in range(len(clean_tokens)):
        # Patching corrupted run at given layer and token
        with model.trace(corrupted_prompt) as tracer:
            # Apply the patch from the clean hidden states to the corrupted hidden states.
            model.transformer.h[layer_idx].output[0][:, token_idx] = clean_hs[layer_idx][:,token_idx,:]

            patched_logits = model.lm_head.output

            patched_logit_diff = (
                patched_logits[0, -1, correct_index]
                - patched_logits[0, -1, incorrect_index]
            )

            # Calculate the improvement in the correct token after patching.
            patched_result = (patched_logit_diff - corrupted_logit_diff) / (
                clean_logit_diff - corrupted_logit_diff
            )

            _ioi_patching_results.append(patched_result.item())
            _ioi_patching_results.save()

    ioi_patching_results.append(_ioi_patching_results)

Now that we understand each of the steps in the activation patching workflow, let's try implementing the whole experiment in one forward pass by breaking up each of our inputs into multiple invocation calls and batching them.

In [6]:

Copied!

tokenized_prompt = model.tokenizer(clean_prompt)["input_ids"]
print(model.tokenizer.decode(tokenized_prompt))
tokenized_prompt = model.tokenizer(clean_prompt)["input_ids"]
print(model.tokenizer.decode(tokenized_prompt))

After John and Mary went to the store, Mary gave a bottle of milk to

To batch our opperations we use an empty model.trace() together with several tracer.invoke(prompt) calls. Each invoker adds a seperate thread to our batch, so we can run all necessary computations in one forward pass. To omit synchronisation errors we also have to use tracer.barrier(). You can find more information on cross prompt intervention in the nnsight documentation [here].

So lets define a function that does that for us.

In [7]:

Copied!





def gpt2_activation_patching(clean_prompt, corrupted_prompt, N_LAYERS):
    ioi_patching_results = []
    with torch.no_grad():

        with model.trace() as tracer:

            #STEP 0: Define barriers for each layer and token
            barriers_per_layer = [tracer.barrier(len(tokenized_prompt)+1) for _ in range(N_LAYERS)]


            # STEP 1: Clean run
            with tracer.invoke(clean_prompt) as invoker:

                # At each layer save the output of the current layer.
                for layer_idx in range(N_LAYERS):
                    hidden_state = model.transformer.h[layer_idx].output

                    # Call barrier so nnsight can prepare the hidden states for the patching invoke.
                    # Now, all other invoker will complete their code until the same barrier is called.
                    barriers_per_layer[layer_idx]()

                # Get logits from the lm_head and calculate the difference between the correct answer and incorrect answer for the clean run and save it.
                clean_logits = model.lm_head.output
                clean_logit_diff = (
                    clean_logits[0, -1, correct_index] - clean_logits[0, -1, incorrect_index]
                ).save()


            # STEP 2: Corrupt run
            with tracer.invoke(corrupted_prompt) as invoker:
                corrupted_logits = model.lm_head.output

                # Calculate the difference between the correct answer and incorrect answer for the corrupted run and save it.
                corrupted_logit_diff = (
                    corrupted_logits[0, -1, correct_index]
                    - corrupted_logits[0, -1, incorrect_index]
                ).save()


            # STEP 3: Activation Patching Intervention – across all layers & token positions
            for layer_idx in range(len(model.transformer.h)):
                local_ioi_patching_results = []
                for token_idx in range(len(tokenized_prompt)):

                    # Call a seperate invoker for each layer and token postion
                    with tracer.invoke(corrupted_prompt) as invoker:

                        # Call the barrier so nnsight can grab the hidden states for this invoke
                        barriers_per_layer[layer_idx]()

                        # Patch (replace) the clean hidden states over the corrupted hidden states.
                        model.transformer.h[layer_idx].output[:, token_idx, :] = hidden_state[:, token_idx, :]

                        # Get logits from the lm_head and calculate the difference between the correct answer and incorrect answer for this invoker.
                        patched_logits = model.lm_head.output
                        patched_logit_diff = (
                            patched_logits[0, -1, correct_index]
                            - patched_logits[0, -1, incorrect_index]
                        )

                        # Calculate the improvement in the correct token after patching and append it to the local result list.
                        patched_result = (patched_logit_diff - corrupted_logit_diff) / (
                            clean_logit_diff - corrupted_logit_diff
                        )
                        local_ioi_patching_results.append(patched_result.item())

                # Append local results to global list and save
                ioi_patching_results.append(local_ioi_patching_results)
                ioi_patching_results.save()

    return clean_logit_diff, corrupted_logit_diff, ioi_patching_results
def gpt2_activation_patching(clean_prompt, corrupted_prompt, N_LAYERS):
    ioi_patching_results = []
    with torch.no_grad():

        with model.trace() as tracer:

            #STEP 0: Define barriers for each layer and token
            barriers_per_layer = [tracer.barrier(len(tokenized_prompt)+1) for _ in range(N_LAYERS)]


            # STEP 1: Clean run
            with tracer.invoke(clean_prompt) as invoker:

                # At each layer save the output of the current layer.
                for layer_idx in range(N_LAYERS):
                    hidden_state = model.transformer.h[layer_idx].output

                    # Call barrier so nnsight can prepare the hidden states for the patching invoke.
                    # Now, all other invoker will complete their code until the same barrier is called.
                    barriers_per_layer[layer_idx]()

                # Get logits from the lm_head and calculate the difference between the correct answer and incorrect answer for the clean run and save it.
                clean_logits = model.lm_head.output
                clean_logit_diff = (
                    clean_logits[0, -1, correct_index] - clean_logits[0, -1, incorrect_index]
                ).save()


            # STEP 2: Corrupt run
            with tracer.invoke(corrupted_prompt) as invoker:
                corrupted_logits = model.lm_head.output

                # Calculate the difference between the correct answer and incorrect answer for the corrupted run and save it.
                corrupted_logit_diff = (
                    corrupted_logits[0, -1, correct_index]
                    - corrupted_logits[0, -1, incorrect_index]
                ).save()


            # STEP 3: Activation Patching Intervention – across all layers & token positions
            for layer_idx in range(len(model.transformer.h)):
                local_ioi_patching_results = []
                for token_idx in range(len(tokenized_prompt)):

                    # Call a seperate invoker for each layer and token postion
                    with tracer.invoke(corrupted_prompt) as invoker:

                        # Call the barrier so nnsight can grab the hidden states for this invoke
                        barriers_per_layer[layer_idx]()

                        # Patch (replace) the clean hidden states over the corrupted hidden states.
                        model.transformer.h[layer_idx].output[:, token_idx, :] = hidden_state[:, token_idx, :]

                        # Get logits from the lm_head and calculate the difference between the correct answer and incorrect answer for this invoker.
                        patched_logits = model.lm_head.output
                        patched_logit_diff = (
                            patched_logits[0, -1, correct_index]
                            - patched_logits[0, -1, incorrect_index]
                        )

                        # Calculate the improvement in the correct token after patching and append it to the local result list.
                        patched_result = (patched_logit_diff - corrupted_logit_diff) / (
                            clean_logit_diff - corrupted_logit_diff
                        )
                        local_ioi_patching_results.append(patched_result.item())

                # Append local results to global list and save
                ioi_patching_results.append(local_ioi_patching_results)
                ioi_patching_results.save()

    return clean_logit_diff, corrupted_logit_diff, ioi_patching_results

Now lets run our function to obtain the activation patching results

In [8]:

Copied!

N_LAYERS = len(model.transformer.h)

clean_logit_diff, corrupted_logit_diff, ioi_patching_results = gpt2_activation_patching(clean_prompt, corrupted_prompt, N_LAYERS)
N_LAYERS = len(model.transformer.h)

clean_logit_diff, corrupted_logit_diff, ioi_patching_results = gpt2_activation_patching(clean_prompt, corrupted_prompt, N_LAYERS)

Visualize Results¶

Let's define a function to plot our activation patching results.

In [9]:

Copied!





def plot_ioi_patching_results(ioi_patching_results,
                              x_labels,
                              plot_title="Normalized Logit Difference After Patching Residual Stream on the IOI Task"):

    fig = px.imshow(
        ioi_patching_results,
        color_continuous_midpoint=0.0,
        color_continuous_scale="RdBu",
        labels={"x": "Position", "y": "Layer","color":"Norm. Logit Diff"},
        x=x_labels,
        title=plot_title,
    )

    return fig
def plot_ioi_patching_results(ioi_patching_results,
                              x_labels,
                              plot_title="Normalized Logit Difference After Patching Residual Stream on the IOI Task"):

    fig = px.imshow(
        ioi_patching_results,
        color_continuous_midpoint=0.0,
        color_continuous_scale="RdBu",
        labels={"x": "Position", "y": "Layer","color":"Norm. Logit Diff"},
        x=x_labels,
        title=plot_title,
    )

    return fig

Let's see how the patching intervention changes the logit difference! Let's use a heatmap to examine how the logit difference changes after patching each layer's output across token positions.

In [10]:

Copied!





print(f"Clean logit difference: {clean_logit_diff:.3f}")
print(f"Corrupted logit difference: {corrupted_logit_diff:.3f}")

clean_decoded_tokens = [model.tokenizer.decode(token) for token in tokenized_prompt]
token_labels = [f"{token}_{index}" for index, token in enumerate(clean_decoded_tokens)]

fig = plot_ioi_patching_results(ioi_patching_results,token_labels,"Patching GPT-2-small Residual Stream on IOI task")
fig.show()
print(f"Clean logit difference: {clean_logit_diff:.3f}")
print(f"Corrupted logit difference: {corrupted_logit_diff:.3f}")

clean_decoded_tokens = [model.tokenizer.decode(token) for token in tokenized_prompt]
token_labels = [f"{token}_{index}" for index, token in enumerate(clean_decoded_tokens)]

fig = plot_ioi_patching_results(ioi_patching_results,token_labels,"Patching GPT-2-small Residual Stream on IOI task")
fig.show()

Clean logit difference: 4.124
Corrupted logit difference: -2.272

In the above plot, we see that patching the clean residual stream into the corrupted model does not change much in the final token difference for input tokens 0-9. This is expected, as there is no difference in the clean vs. corrupted prompt for these tokens, so patching in the clean activations at this point shouldn't change the model prediction.

However, when we get to token #10, "Mary", where the subject is introduced for the second time, there is a sharp increase in output logit difference, indicating that the patch changes how the model predicts the outcome downstream, particularly for the earlier layers. Towards the middel layers, the logit differences are decreasing. We are thus seeing how the network is tracking information about the indirect object as the layers progress.

A similar but opposite effect is observed when the activations for the final prompt token are patched: the normalized logit difference increases after a transition period in the middle layers.

Limitations¶

Although activation patching is an effective technique for circuit localization, it requires running a forward pass through the model for every patch, making it computationally expensive.

Attribution patching is an approximation of activation patching that helps scale the technique to larger experiments and models. See our attribution patching tutorial here to try it out!

Trying on a bigger model¶

Although the original IOI experiment was performed on GPT-2 small, NDIF allows researchers to explore similar problems on largescale models!

Let's see how the activations of Llama 3.1-8B contributes to the IOI task using activation patching with NDIF's remote infrastructure.

To make it a bit more interesting, we will implement attention head activation patching and consider several alerations we can make to our previous code to make it run more efficient.

NNsight Remote Setup¶

Make sure you have obtained your NDIF API key and configured your workspace for remote execution. Further you should create an account with huggingface, create and access token and request access to the LLama 3.1 8B model.

In [11]:

Copied!





from nnsight import CONFIG
if is_colab:
    # include your HuggingFace Token and NNsight API key on Colab secrets
    from google.colab import userdata
    NDIF_API = userdata.get('NDIF_API')
    HF_TOKEN = userdata.get('HF_TOKEN')

    CONFIG.set_default_api_key(NDIF_API)
    !huggingface-cli login -token HF_TOKEN

clear_output()
from nnsight import CONFIG
if is_colab:
    # include your HuggingFace Token and NNsight API key on Colab secrets
    from google.colab import userdata
    NDIF_API = userdata.get('NDIF_API')
    HF_TOKEN = userdata.get('HF_TOKEN')

    CONFIG.set_default_api_key(NDIF_API)
    !huggingface-cli login -token HF_TOKEN

clear_output()

Let's load the llama 3.1 8B model.

Let's define some IOI prompts. Each of these prompts can be used as a 'clean' and as a 'corrupted' prompt, as each prompt has a related corrupted version with the IO switched out. I included prompts of different length so we have to deal with padding.

In [12]:

Copied!





prompts = [
    "After John and Mary went to the store, John gave a bottle of milk to",
    "After Mary and John went to the store, Mary gave a bottle of milk to",
    "When Lisa and Sarah went to the cinema, Lisa gave the ticket to",
    "When Lisa and Sarah went to the cinema, Sarah gave the ticket to"
]
prompts = [
    "After John and Mary went to the store, John gave a bottle of milk to",
    "After Mary and John went to the store, Mary gave a bottle of milk to",
    "When Lisa and Sarah went to the cinema, Lisa gave the ticket to",
    "When Lisa and Sarah went to the cinema, Sarah gave the ticket to"
]

In [13]:

Copied!

# Load model
model = LanguageModel("meta-llama/Llama-3.1-8B", device_map='auto')
print(model)
# Load model
model = LanguageModel("meta-llama/Llama-3.1-8B", device_map='auto')
print(model)

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
      )
    )
    (norm): LlamaRMSNorm((4096,), eps=1e-05)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=4096, out_features=128256, bias=False)
  (generator): Generator(
    (streamer): Streamer()
  )
)

Let's now have a look at the tokenization of the prompts. As you can see LLama 3.1 includes the <|begin_of_text|> (bos) special token at the start of each prompt. Also, the two shorter prompts include padding tokens. For our activation patching experiment we want to exclude these position.

In [14]:

Copied!

tokenized_prompts = model.tokenizer(prompts, padding=True)["input_ids"]
for i in range(len(tokenized_prompts)):
    print(model.tokenizer.convert_ids_to_tokens(tokenized_prompts[i]))
tokenized_prompts = model.tokenizer(prompts, padding=True)["input_ids"]
for i in range(len(tokenized_prompts)):
    print(model.tokenizer.convert_ids_to_tokens(tokenized_prompts[i]))

['<|begin_of_text|>', 'After', 'ĠJohn', 'Ġand', 'ĠMary', 'Ġwent', 'Ġto', 'Ġthe', 'Ġstore', ',', 'ĠJohn', 'Ġgave', 'Ġa', 'Ġbottle', 'Ġof', 'Ġmilk', 'Ġto']
['<|begin_of_text|>', 'After', 'ĠMary', 'Ġand', 'ĠJohn', 'Ġwent', 'Ġto', 'Ġthe', 'Ġstore', ',', 'ĠMary', 'Ġgave', 'Ġa', 'Ġbottle', 'Ġof', 'Ġmilk', 'Ġto']
['<|end_of_text|>', '<|end_of_text|>', '<|begin_of_text|>', 'When', 'ĠLisa', 'Ġand', 'ĠSarah', 'Ġwent', 'Ġto', 'Ġthe', 'Ġcinema', ',', 'ĠLisa', 'Ġgave', 'Ġthe', 'Ġticket', 'Ġto']
['<|end_of_text|>', '<|end_of_text|>', '<|begin_of_text|>', 'When', 'ĠLisa', 'Ġand', 'ĠSarah', 'Ġwent', 'Ġto', 'Ġthe', 'Ġcinema', ',', 'ĠSarah', 'Ġgave', 'Ġthe', 'Ġticket', 'Ġto']

Define the answers to these prompts, formatted as (correct, incorrect)

In [15]:

Copied!





answers = [
    (" Mary", " John"),
    (" John", " Mary"),
    (" Sarah", " Lisa"),
    (" Lisa", " Sarah")
]
answers = [
    (" Mary", " John"),
    (" John", " Mary"),
    (" Sarah", " Lisa"),
    (" Lisa", " Sarah")
]

From each example in our prompts we will now creat a clean and a corrupted version. We further use the special_token_mask to create a mask to later exclude padding and bos tokens from the experiment

To streamline the process, we create a function.

In [16]:

Copied!





def prepare_patching_examples(model, prompts, answers):
    # Tokenize prompts and crete the patching mask
    clean_inputs = model.tokenizer(prompts, return_tensors="pt", padding=True, return_special_tokens_mask=True)
    clean_tokens = clean_inputs['input_ids']
    attention_maks = clean_inputs['attention_mask']
    patching_mask = ~clean_inputs['special_tokens_mask'].bool()

    # To get the corrupted inputs we just switch the postions of the prompt of all pairs.
    corrupted_tokens = clean_tokens[
        [(i + 1 if i % 2 == 0 else i - 1) for i in range(len(prompts))]
    ]

    # Tokenize answers for each prompt:
    answer_token_indices = torch.tensor([
            [model.tokenizer(answers[i][j])["input_ids"][1] for j in range(2)]
            for i in range(len(answers))
    ])

    return clean_tokens, corrupted_tokens, attention_maks, patching_mask, answer_token_indices

clean_tokens, corrupted_tokens, attention_maks,\
    patching_mask, answer_token_indices = prepare_patching_examples(model, prompts, answers)

print("clean_tokens.shape = ", clean_tokens.shape)
print("corrupted_tokens.shape = ", corrupted_tokens.shape)
print("patching_mask.shape = ", patching_mask.shape)
print("answer_tokens = " , answer_token_indices)
def prepare_patching_examples(model, prompts, answers):
    # Tokenize prompts and crete the patching mask
    clean_inputs = model.tokenizer(prompts, return_tensors="pt", padding=True, return_special_tokens_mask=True)
    clean_tokens = clean_inputs['input_ids']
    attention_maks = clean_inputs['attention_mask']
    patching_mask = ~clean_inputs['special_tokens_mask'].bool()

    # To get the corrupted inputs we just switch the postions of the prompt of all pairs.
    corrupted_tokens = clean_tokens[
        [(i + 1 if i % 2 == 0 else i - 1) for i in range(len(prompts))]
    ]

    # Tokenize answers for each prompt:
    answer_token_indices = torch.tensor([
            [model.tokenizer(answers[i][j])["input_ids"][1] for j in range(2)]
            for i in range(len(answers))
    ])

    return clean_tokens, corrupted_tokens, attention_maks, patching_mask, answer_token_indices

clean_tokens, corrupted_tokens, attention_maks,\
    patching_mask, answer_token_indices = prepare_patching_examples(model, prompts, answers)

print("clean_tokens.shape = ", clean_tokens.shape)
print("corrupted_tokens.shape = ", corrupted_tokens.shape)
print("patching_mask.shape = ", patching_mask.shape)
print("answer_tokens = " , answer_token_indices)

clean_tokens.shape =  torch.Size([4, 17])
corrupted_tokens.shape =  torch.Size([4, 17])
patching_mask.shape =  torch.Size([4, 17])
answer_tokens =  tensor([[10455,  3842],
        [ 3842, 10455],
        [21077, 29656],
        [29656, 21077]])

Patching Attention Heads¶

The residual stream isn't the only model component you can apply activation patching on: let's try patching Llama's attention heads to see how they influence the IOI task! Here, we apply our patching intervention on the attention output, o_proj.output in Llama models.

Because the multihead attention of Llama models are stored in a projection matrix containing all attention heads, we will need to resize the tensor to reveal individual attention head contributions. The einops library is a handy way to resize tensors.

In [17]:

Copied!

import einops
import einops

Okay, now let's apply our three activation patching steps to our attention heads during an IOI task. Because Llama-8B is a bigger model, we're going to do this step inside a session in multiple different traces so that we don't run into memory issues. In total we have to run N_HEADS x N_LAYERS patching interventions. For LLama 3.1 8B that are 32 x 32 = 1024 interventions. We can further improve our result by skipping layers previous to our interventions by caching the original corrupted hidden states from previous runs.

In [18]:

Copied!





def activation_patching(model, prompts, answers):
    N_LAYERS = model.config.num_hidden_layers
    N_HEADS = model.config.num_attention_heads
    batch_size = len(prompts)

    # Prepare patching inputs with our previously defined function
    clean_tokens, corrupted_tokens, attention_maks,\
        patching_mask, answer_token_indices = prepare_patching_examples(model, prompts, answers)

    with torch.no_grad():
        # With setting remote=True, we send all code in the session to ndif's servers.
        with model.session(remote=True) as session:
            ioi_patching_results_all = [].save()

            # STEP 1: Clean run, grab clean activations for each attention head
            z_hs = {}
            with model.trace(**{"input_ids": clean_tokens, "attention_mask": attention_maks}) as tracer:
                for layer_idx, layer in enumerate(model.model.layers):

                    # attention output for llama models needs to be reshaped to look at individual heads
                    z = layer.self_attn.o_proj.input # dimensions [batch x seq x D_MODEL]
                    z_reshaped = einops.rearrange(z, 'b s (nh dh) -> b s nh dh',nh=N_HEADS)  # dimensions [batch x seq x N_HEADS x D_HEADS]

                    for head_idx in range(N_HEADS):
                        z_hs[layer_idx,head_idx] = z_reshaped[:,:,head_idx,:]

                # Get logits from the lm_head and calcute the logit diff.
                clean_logits = model.lm_head.output
                clean_logit_diff = (
                    clean_logits[range(batch_size), -1, answer_token_indices[:, 0]] - clean_logits[range(batch_size), -1, answer_token_indices[:, 1]]
                ).save()


            # STEP 2: Corrupted run, grab corrupted logits for later comparison.
            corrupted_hs = {}
            with model.trace(**{"input_ids": corrupted_tokens, "attention_mask": attention_maks}) as tracer:

                # cache layer outputs so we can skip pre-intervention layers in step 3.
                for layer_idx, layer in enumerate(model.model.layers):
                    corrupted_hs[layer_idx] = layer.output

                # Get logits from the lm_head and calcute the logit diff.
                corrupted_logits = model.lm_head.output
                corrupted_logit_diff = (
                    corrupted_logits[range(batch_size), -1, answer_token_indices[:, 0]] - corrupted_logits[range(batch_size), -1, answer_token_indices[:, 1]]
                ).save()


            # STEP 3: Patching runs, apply 'clean' model state at each layer and head,
            for layer_idx in range(N_LAYERS):
                local_ioi_patching_results = []

                # Create a new tracer for each layer.
                with model.trace() as tracer:
                    for head_idx in range(N_HEADS):

                        # Add a seperate batch dimension for each attention head.
                        with tracer.invoke(**{"input_ids": corrupted_tokens, "attention_mask": attention_maks}) as invoker:

                            # Layers preceeding our intervention can be skipped by incerting the cached hidden states from step 2.
                            for skip_layer_idx, layer in enumerate(model.model.layers[:layer_idx]):
                                layer.skip(corrupted_hs[skip_layer_idx])

                            # Apply the patch from the clean hidden states to the corrupted hidden state for given layer and head.
                            z = model.model.layers[layer_idx].self_attn.o_proj.input # dimensions [batch x seq x D_MODEL]
                            z_patched = einops.rearrange(z, 'b s (nh dh) -> b s nh dh',nh=N_HEADS)

                            # Patch in clean state. We apply the patching_mask to excude special tokens form our experiment.
                            z_patched[:,:,head_idx,:][patching_mask] = z_hs[layer_idx,head_idx][patching_mask]
                            z_patched = einops.rearrange(z_patched, 'b s nh dh -> b s (nh dh)') # reshape
                            model.model.layers[layer_idx].self_attn.o_proj.input = z_patched # apply to model

                            # Get logits from the lm_head and calcute the logit diff.
                            patched_logits = model.lm_head.output
                            patched_logit_diff = (
                                patched_logits[range(batch_size), -1, answer_token_indices[:, 0]] - patched_logits[range(batch_size), -1, answer_token_indices[:, 1]]
                            )

                            # Calculate the improvement in the correct token after patching.
                            patched_result = (patched_logit_diff - corrupted_logit_diff) / (
                                clean_logit_diff - corrupted_logit_diff
                            )
                            local_ioi_patching_results.append(patched_result.mean().item())

                ioi_patching_results_all.append(local_ioi_patching_results)

    return clean_logit_diff.mean().item(), corrupted_logit_diff.mean().item(), ioi_patching_results_all
def activation_patching(model, prompts, answers):
    N_LAYERS = model.config.num_hidden_layers
    N_HEADS = model.config.num_attention_heads
    batch_size = len(prompts)

    # Prepare patching inputs with our previously defined function
    clean_tokens, corrupted_tokens, attention_maks,\
        patching_mask, answer_token_indices = prepare_patching_examples(model, prompts, answers)

    with torch.no_grad():
        # With setting remote=True, we send all code in the session to ndif's servers.
        with model.session(remote=True) as session:
            ioi_patching_results_all = [].save()

            # STEP 1: Clean run, grab clean activations for each attention head
            z_hs = {}
            with model.trace(**{"input_ids": clean_tokens, "attention_mask": attention_maks}) as tracer:
                for layer_idx, layer in enumerate(model.model.layers):

                    # attention output for llama models needs to be reshaped to look at individual heads
                    z = layer.self_attn.o_proj.input # dimensions [batch x seq x D_MODEL]
                    z_reshaped = einops.rearrange(z, 'b s (nh dh) -> b s nh dh',nh=N_HEADS)  # dimensions [batch x seq x N_HEADS x D_HEADS]

                    for head_idx in range(N_HEADS):
                        z_hs[layer_idx,head_idx] = z_reshaped[:,:,head_idx,:]

                # Get logits from the lm_head and calcute the logit diff.
                clean_logits = model.lm_head.output
                clean_logit_diff = (
                    clean_logits[range(batch_size), -1, answer_token_indices[:, 0]] - clean_logits[range(batch_size), -1, answer_token_indices[:, 1]]
                ).save()


            # STEP 2: Corrupted run, grab corrupted logits for later comparison.
            corrupted_hs = {}
            with model.trace(**{"input_ids": corrupted_tokens, "attention_mask": attention_maks}) as tracer:

                # cache layer outputs so we can skip pre-intervention layers in step 3.
                for layer_idx, layer in enumerate(model.model.layers):
                    corrupted_hs[layer_idx] = layer.output

                # Get logits from the lm_head and calcute the logit diff.
                corrupted_logits = model.lm_head.output
                corrupted_logit_diff = (
                    corrupted_logits[range(batch_size), -1, answer_token_indices[:, 0]] - corrupted_logits[range(batch_size), -1, answer_token_indices[:, 1]]
                ).save()


            # STEP 3: Patching runs, apply 'clean' model state at each layer and head,
            for layer_idx in range(N_LAYERS):
                local_ioi_patching_results = []

                # Create a new tracer for each layer.
                with model.trace() as tracer:
                    for head_idx in range(N_HEADS):

                        # Add a seperate batch dimension for each attention head.
                        with tracer.invoke(**{"input_ids": corrupted_tokens, "attention_mask": attention_maks}) as invoker:

                            # Layers preceeding our intervention can be skipped by incerting the cached hidden states from step 2.
                            for skip_layer_idx, layer in enumerate(model.model.layers[:layer_idx]):
                                layer.skip(corrupted_hs[skip_layer_idx])

                            # Apply the patch from the clean hidden states to the corrupted hidden state for given layer and head.
                            z = model.model.layers[layer_idx].self_attn.o_proj.input # dimensions [batch x seq x D_MODEL]
                            z_patched = einops.rearrange(z, 'b s (nh dh) -> b s nh dh',nh=N_HEADS)

                            # Patch in clean state. We apply the patching_mask to excude special tokens form our experiment.
                            z_patched[:,:,head_idx,:][patching_mask] = z_hs[layer_idx,head_idx][patching_mask]
                            z_patched = einops.rearrange(z_patched, 'b s nh dh -> b s (nh dh)') # reshape
                            model.model.layers[layer_idx].self_attn.o_proj.input = z_patched # apply to model

                            # Get logits from the lm_head and calcute the logit diff.
                            patched_logits = model.lm_head.output
                            patched_logit_diff = (
                                patched_logits[range(batch_size), -1, answer_token_indices[:, 0]] - patched_logits[range(batch_size), -1, answer_token_indices[:, 1]]
                            )

                            # Calculate the improvement in the correct token after patching.
                            patched_result = (patched_logit_diff - corrupted_logit_diff) / (
                                clean_logit_diff - corrupted_logit_diff
                            )
                            local_ioi_patching_results.append(patched_result.mean().item())

                ioi_patching_results_all.append(local_ioi_patching_results)

    return clean_logit_diff.mean().item(), corrupted_logit_diff.mean().item(), ioi_patching_results_all

Now lets run our function.

In [19]:

Copied!

clean_logit_diff, corrupted_logit_diff, ioi_patching_results_all = activation_patching(model, prompts, answers)
clean_logit_diff, corrupted_logit_diff, ioi_patching_results_all = activation_patching(model, prompts, answers)

⠦ [2624d261-0325-4506-8eb7-d116ad9500c0] RUNNING    (8.5s) Your job has started running.

✓ [2624d261-0325-4506-8eb7-d116ad9500c0] COMPLETED  (29.3s) Your job has been completed.

Visualize Results¶

Let's use the same plotting function from earlier to visualize how patching the Llama-3.1-8B attention heads influenced model output during the IOI task.

In [20]:

Copied!





print(f"Clean logit difference: {clean_logit_diff:.3f}")
print(f"Corrupted logit difference: {corrupted_logit_diff:.3f}")

print(ioi_patching_results_all)
N_HEADS = 32
x_labels = [f"Head {i}" for i in range(N_HEADS)]

fig2 = plot_ioi_patching_results(ioi_patching_results_all, x_labels, "Patching Llama Attention Heads on IOI task")
fig2.show()
print(f"Clean logit difference: {clean_logit_diff:.3f}")
print(f"Corrupted logit difference: {corrupted_logit_diff:.3f}")

print(ioi_patching_results_all)
N_HEADS = 32
x_labels = [f"Head {i}" for i in range(N_HEADS)]

fig2 = plot_ioi_patching_results(ioi_patching_results_all, x_labels, "Patching Llama Attention Heads on IOI task")
fig2.show()

Clean logit difference: 4.375
Corrupted logit difference: -4.375
[[0.0032196044921875, 0.006805419921875, 0.0091552734375, 0.00396728515625, 0.0084228515625, 0.00160980224609375, 0.00360107421875, 0.00360107421875, 0.0087890625, 0.0012359619140625, 0.00396728515625, 0.001983642578125, 0.0064697265625, 0.0012359619140625, 0.006805419921875, 0.00286865234375, 0.0032196044921875, 0.005584716796875, 0.0087890625, 0.006805419921875, 0.01080322265625, 0.00518798828125, 0.005584716796875, 0.00518798828125, 0.0087890625, 0.00518798828125, 0.00360107421875, 0.0103759765625, 0.006805419921875, 0.00160980224609375, 0.002838134765625, 0.00518798828125], [0.0, 0.0032196044921875, 0.006805419921875, 0.00482177734375, 0.00518798828125, 0.005584716796875, 0.00160980224609375, 0.00518798828125, 0.0, 0.00037384033203125, 0.0012359619140625, 0.00360107421875, 0.00518798828125, 0.002838134765625, 0.006805419921875, 0.006805419921875, -0.001983642578125, -0.00160980224609375, 0.0064697265625, -0.00037384033203125, 0.0032196044921875, 0.00518798828125, -0.00396728515625, 0.00360107421875, -0.005950927734375, -0.00152587890625, 0.0032196044921875, 0.0072021484375, 0.00360107421875, 0.006805419921875, 0.005584716796875, 0.0], [0.006805419921875, 0.0032196044921875, 0.00160980224609375, 0.001983642578125, 0.0032196044921875, 0.00518798828125, 0.001983642578125, 0.0087890625, 0.0072021484375, 0.0032196044921875, 0.006805419921875, 0.00518798828125, 0.00286865234375, -0.002349853515625, 0.00482177734375, 0.006805419921875, 0.00408935546875, 0.0091552734375, 0.00518798828125, 0.01153564453125, 0.00360107421875, 0.0032196044921875, 0.0072021484375, 0.00518798828125, -0.005950927734375, 0.006805419921875, 0.00360107421875, 0.0012359619140625, 0.00286865234375, 0.0186767578125, 0.0012359619140625, 0.00160980224609375], [0.001983642578125, 0.006439208984375, 0.00360107421875, 0.0032196044921875, 0.0091552734375, 0.006805419921875, 0.0087890625, -0.00147247314453125, 0.0084228515625, 0.008056640625, 0.01043701171875, 0.006805419921875, 0.006805419921875, 0.0072021484375, 0.0068359375, 0.00518798828125, 0.00518798828125, 0.00482177734375, 0.0072021484375, 0.0084228515625, 0.01080322265625, 0.006805419921875, 0.0012359619140625, 0.001983642578125, 0.01116943359375, 0.00051116943359375, 0.00518798828125, 0.0032196044921875, 0.0012359619140625, 0.006805419921875, 0.0012359619140625, 0.006439208984375], [-0.001983642578125, 0.001983642578125, 0.005584716796875, 0.00482177734375, 0.005584716796875, 0.00518798828125, 0.00360107421875, 0.0032196044921875, 0.00360107421875, -0.001983642578125, 0.00360107421875, 0.001983642578125, 0.0012359619140625, 0.00482177734375, 0.006805419921875, 0.00360107421875, 0.03515625, 0.001983642578125, 0.001983642578125, 0.0012359619140625, 0.0072021484375, 0.00518798828125, 0.0012359619140625, 0.001983642578125, 0.005584716796875, 0.0084228515625, 0.0032196044921875, 0.0087890625, -0.00037384033203125, 0.0012359619140625, 0.00286865234375, 0.005584716796875], [0.00396728515625, 0.005584716796875, 0.0032196044921875, 0.00360107421875, 0.00518798828125, 0.006805419921875, 0.005584716796875, 0.0084228515625, 0.00360107421875, 0.005584716796875, 0.00160980224609375, 0.00360107421875, 0.00518798828125, 0.0032196044921875, 0.0, -0.00037384033203125, 0.00518798828125, 0.005584716796875, 0.00360107421875, 0.0012359619140625, 0.0032196044921875, 0.00482177734375, -0.00160980224609375, 0.001983642578125, 0.00518798828125, 0.00037384033203125, 0.00360107421875, 0.0064697265625, 0.0091552734375, 0.001983642578125, 0.005584716796875, 0.00396728515625], [0.00160980224609375, 0.0072021484375, 0.0032196044921875, 0.0072021484375, 0.00037384033203125, 0.0032196044921875, 0.0072021484375, 0.005584716796875, -0.00396728515625, 0.0084228515625, 0.00445556640625, 0.00037384033203125, 0.00518798828125, -0.001983642578125, 0.006805419921875, 0.002838134765625, 0.004486083984375, 0.00360107421875, 0.00360107421875, 0.0084228515625, 0.00518798828125, 0.0068359375, 0.006805419921875, 0.0, 0.0032196044921875, 0.0064697265625, 0.005584716796875, -0.002349853515625, 0.0084228515625, 0.0072021484375, 0.0064697265625, 0.006805419921875], [0.00518798828125, -0.002349853515625, 0.0072021484375, 0.00482177734375, 0.00396728515625, 0.00160980224609375, 0.0032196044921875, 0.005584716796875, -0.002349853515625, 0.005584716796875, 0.00482177734375, 0.00518798828125, 0.00360107421875, 0.0084228515625, 0.0, 0.0072021484375, 0.001983642578125, 0.00518798828125, 0.005584716796875, 0.00086212158203125, 0.0032196044921875, 0.00518798828125, 0.00360107421875, 0.00518798828125, 0.0012359619140625, 0.00360107421875, -0.00037384033203125, -0.00112152099609375, 0.00518798828125, 0.00518798828125, 0.00518798828125, 0.00518798828125], [0.00518798828125, 0.006805419921875, 0.005584716796875, 0.002838134765625, 0.001983642578125, 0.0072021484375, 0.00396728515625, 0.0, 0.009521484375, 0.0032196044921875, 0.004486083984375, 0.00396728515625, 0.00160980224609375, 0.001983642578125, 0.01239013671875, 0.0087890625, 0.0012359619140625, 0.0032196044921875, 0.0072021484375, 0.0, 0.00396728515625, 0.00518798828125, -0.00160980224609375, 0.00518798828125, 0.0032196044921875, 0.002838134765625, -0.00037384033203125, 0.00518798828125, 0.00360107421875, 0.0032196044921875, 0.002838134765625, 0.005584716796875], [0.0032196044921875, 0.006805419921875, 0.005584716796875, 0.0084228515625, 0.00518798828125, 0.0032196044921875, 0.01043701171875, 0.006805419921875, 0.00160980224609375, 0.0032196044921875, 0.00160980224609375, 0.00360107421875, 0.00160980224609375, 0.00360107421875, 0.00360107421875, 0.0032196044921875, 0.00360107421875, 0.00360107421875, 0.001983642578125, 0.007568359375, 0.001983642578125, 0.005584716796875, 0.00286865234375, 0.00518798828125, 0.0032196044921875, 0.0, 0.0072021484375, 0.00360107421875, 0.0032196044921875, 0.0012359619140625, 0.005584716796875, 0.00518798828125], [0.010009765625, -0.0164794921875, -0.0137939453125, 0.00160980224609375, 0.00518798828125, 0.005584716796875, 0.00286865234375, 0.00396728515625, 0.00360107421875, -0.00396728515625, -0.00037384033203125, 0.0, -0.001983642578125, 0.00160980224609375, 0.005584716796875, 0.006805419921875, 0.00396728515625, 0.00160980224609375, 0.0, 0.00286865234375, 0.00518798828125, 0.005615234375, 0.00518798828125, 0.0087890625, 0.00160980224609375, 0.00360107421875, 0.01080322265625, -0.001983642578125, 0.006439208984375, 0.00360107421875, 0.005584716796875, 0.001983642578125], [0.0087890625, 0.00518798828125, 0.009521484375, 0.00160980224609375, 0.00360107421875, -0.00037384033203125, 0.00518798828125, 0.00482177734375, -0.00037384033203125, 0.001983642578125, 0.00360107421875, 0.01116943359375, -0.00360107421875, 0.00160980224609375, 0.0, -0.001983642578125, 0.00518798828125, 0.00518798828125, 0.00518798828125, 0.00518798828125, 0.0084228515625, 0.01080322265625, 0.0240478515625, 0.0103759765625, 0.0087890625, 0.006805419921875, 0.001983642578125, 0.001983642578125, 0.00160980224609375, 0.0087890625, 0.0032196044921875, -0.001983642578125], [-0.00396728515625, 0.001983642578125, 0.0032196044921875, -0.00360107421875, 0.0032196044921875, 0.005584716796875, 0.0087890625, 0.01007080078125, 0.0032196044921875, 0.0103759765625, 0.005584716796875, 0.005584716796875, 0.0152587890625, -0.004852294921875, 0.004852294921875, 0.005584716796875, 0.007568359375, 0.0012359619140625, 0.0032196044921875, 0.00518798828125, 0.0091552734375, 0.01080322265625, 0.006805419921875, 0.0072021484375, 0.00518798828125, -0.0015869140625, -0.00160980224609375, 0.006103515625, 0.00160980224609375, -0.001983642578125, 0.005584716796875, 0.0072021484375], [0.0032196044921875, -0.00037384033203125, 0.00160980224609375, -0.00037384033203125, 0.006805419921875, 0.005584716796875, 0.0012359619140625, 0.00360107421875, 0.00360107421875, 0.00518798828125, 0.00396728515625, 0.00160980224609375, 0.00360107421875, 0.0064697265625, 0.0032196044921875, 0.001983642578125, 0.0087890625, 0.00518798828125, 0.00160980224609375, 0.005584716796875, 0.00360107421875, 0.0032196044921875, 0.00160980224609375, 0.0032196044921875, 0.005584716796875, 0.005584716796875, 0.0072021484375, 0.00360107421875, 0.0091552734375, 0.0072021484375, 0.0032196044921875, 0.00396728515625], [0.007568359375, 0.0072021484375, 0.006805419921875, 0.0087890625, 0.0032196044921875, 0.00396728515625, 0.037353515625, -0.00323486328125, 0.00160980224609375, 0.0, 0.0072021484375, 0.005950927734375, 0.00518798828125, 0.005584716796875, 0.0072021484375, 0.00396728515625, 0.045654296875, 0.0, 0.001983642578125, 0.05126953125, 0.00396728515625, 0.01611328125, -0.00037384033203125, 0.007568359375, -0.001983642578125, 0.00160980224609375, 0.00518798828125, 0.0087890625, 0.0091552734375, 0.00518798828125, 0.00518798828125, 0.004486083984375], [0.0032196044921875, 0.005584716796875, 0.00037384033203125, 0.005584716796875, 0.00518798828125, 0.00518798828125, 0.006805419921875, 0.002838134765625, 0.0084228515625, 0.00518798828125, -0.0103759765625, 0.006103515625, 0.00360107421875, 0.00360107421875, 0.005584716796875, 0.00360107421875, 0.0673828125, 0.0072021484375, 0.00160980224609375, 0.00396728515625, 0.01043701171875, 0.00037384033203125, 0.00360107421875, 0.00518798828125, 0.0087890625, 0.00360107421875, 0.0087890625, 0.006805419921875, 0.00360107421875, 0.00360107421875, 0.001983642578125, 0.01043701171875], [-0.01361083984375, 0.0908203125, 0.037109375, -0.01397705078125, 0.00396728515625, 0.00360107421875, 0.001983642578125, 0.0072021484375, 0.0, 0.00360107421875, 0.00160980224609375, 0.00160980224609375, 0.00160980224609375, 0.0012359619140625, 0.00360107421875, 0.006805419921875, -0.0233154296875, 0.0152587890625, 0.00445556640625, 0.00360107421875, 0.00360107421875, 0.001983642578125, 0.0230712890625, 0.00518798828125, 0.00360107421875, 0.0, 0.0012359619140625, 0.00396728515625, 0.0166015625, 0.0087890625, -0.0145263671875, 0.0], [0.005584716796875, 0.001983642578125, 0.0, 0.00518798828125, 0.00360107421875, -0.00360107421875, -0.00360107421875, -0.00037384033203125, 0.00396728515625, 0.0, 0.0072021484375, 0.00518798828125, 0.00396728515625, 0.00160980224609375, 0.00518798828125, 0.00160980224609375, 0.001983642578125, 0.002349853515625, 0.005584716796875, 0.00518798828125, 0.0072021484375, 0.0072021484375, 0.00160980224609375, 0.00160980224609375, 0.0269775390625, -0.010009765625, 0.0007476806640625, 0.1181640625, 0.005584716796875, 0.00360107421875, 0.0032196044921875, 0.00360107421875], [-0.00037384033203125, -0.001983642578125, 0.00160980224609375, 0.00396728515625, 0.00037384033203125, 0.001983642578125, 0.00037384033203125, 0.00396728515625, 0.0012359619140625, 0.005584716796875, -0.002716064453125, 0.00518798828125, 0.009521484375, 0.001983642578125, 0.00160980224609375, 0.00360107421875, 0.01153564453125, -0.0098876953125, 0.001983642578125, 0.001983642578125, 0.00360107421875, 0.001983642578125, 0.0091552734375, -0.0087890625, -0.00360107421875, 0.00160980224609375, 0.0072021484375, 0.01116943359375, -0.001983642578125, 0.057861328125, 0.009521484375, -0.001983642578125], [0.00469970703125, -0.001983642578125, 0.00037384033203125, -0.019287109375, 0.0, 0.001983642578125, 0.0, 0.001983642578125, 0.00396728515625, -0.001983642578125, 0.005584716796875, 0.0, 0.005950927734375, -0.00360107421875, 0.00396728515625, -0.006439208984375, -0.001983642578125, 0.0, 0.00396728515625, -0.00360107421875, -0.048828125, 0.00396728515625, -0.00160980224609375, 0.087890625, 0.0, -0.0012359619140625, 0.0, 0.002349853515625, -0.001983642578125, -0.00160980224609375, -0.0012359619140625, 0.00360107421875], [-0.007568359375, 0.001983642578125, 0.001983642578125, 0.044921875, 0.00037384033203125, 0.0, 0.00037384033203125, 0.00396728515625, -0.00160980224609375, -0.00286865234375, 0.001983642578125, 0.00396728515625, -0.068359375, 0.051025390625, -0.158203125, -0.0032196044921875, 0.00396728515625, 0.002349853515625, 0.001983642578125, -0.00160980224609375, 0.0027313232421875, -0.001983642578125, -0.001983642578125, 0.005584716796875, -0.0012359619140625, -0.00160980224609375, 0.00037384033203125, 0.005584716796875, 0.0072021484375, 0.00360107421875, 0.00037384033203125, 0.00360107421875], [0.0, -0.00396728515625, 0.025146484375, -0.0220947265625, 0.0, 0.0, 0.00037384033203125, 0.00396728515625, -0.01324462890625, 0.00360107421875, 0.025634765625, 0.005950927734375, 0.0, 0.001983642578125, -0.001983642578125, 0.0, 0.001983642578125, 0.00396728515625, 0.00360107421875, 0.00360107421875, 0.001983642578125, 0.0, 0.00360107421875, -0.001983642578125, -0.00037384033203125, -0.00160980224609375, -0.00360107421875, 0.001983642578125, 0.001983642578125, 0.001983642578125, -0.001983642578125, 0.00396728515625], [0.001983642578125, 0.001983642578125, 0.002349853515625, -0.0012359619140625, 0.001983642578125, 0.00360107421875, 0.00037384033203125, 0.001983642578125, 0.001983642578125, 0.001983642578125, 0.001983642578125, -0.00360107421875, 0.017333984375, -0.00518798828125, -0.016845703125, 0.00037384033203125, 0.005584716796875, -0.00396728515625, -0.023681640625, 0.1484375, 0.00360107421875, 0.00360107421875, 0.00396728515625, -0.001983642578125, 0.001983642578125, 0.001983642578125, 0.0, 0.001983642578125, 0.001983642578125, 0.0079345703125, 0.031494140625, 0.00396728515625], [-0.001983642578125, 0.00037384033203125, 0.001983642578125, -0.001983642578125, 0.001983642578125, 0.001983642578125, 0.001983642578125, 0.00160980224609375, -0.001983642578125, 0.001983642578125, -0.0012359619140625, 0.001983642578125, 0.00360107421875, 0.001983642578125, 0.0152587890625, 0.00037384033203125, 0.001983642578125, 0.0, 0.001983642578125, 0.005950927734375, 0.001983642578125, 0.001983642578125, 0.00396728515625, 0.005950927734375, -0.00360107421875, 0.00360107421875, 0.00037384033203125, 0.0, 0.001983642578125, 0.00037384033203125, -0.0072021484375, 0.00037384033203125], [-0.001983642578125, 0.00396728515625, 0.001983642578125, 0.001983642578125, 0.001983642578125, 0.0, 0.0, -0.002349853515625, 0.001983642578125, -0.00160980224609375, -0.002838134765625, -0.001983642578125, 0.001983642578125, 0.0, 0.001983642578125, 0.0, -0.00160980224609375, 0.001983642578125, 0.00037384033203125, -0.00160980224609375, -0.00396728515625, 0.00396728515625, 0.0274658203125, -0.00360107421875, -0.0087890625, 0.00037384033203125, 0.01239013671875, 0.0869140625, -0.001983642578125, 0.00396728515625, 0.0, 0.001983642578125], [0.001983642578125, -0.00396728515625, 0.00518798828125, -0.001983642578125, 0.0556640625, 0.01007080078125, 0.001983642578125, 0.001983642578125, -0.00518798828125, 0.001983642578125, 0.001983642578125, 0.001983642578125, 0.00360107421875, 0.0306396484375, -0.00396728515625, 0.00360107421875, -0.00396728515625, 0.005950927734375, -0.001983642578125, -0.001983642578125, -0.00360107421875, 0.001983642578125, 0.00396728515625, 0.0, 0.001983642578125, 0.00396728515625, 0.0, 0.001983642578125, 0.001983642578125, 0.001983642578125, 0.001983642578125, 0.001983642578125], [-0.01202392578125, -0.00037384033203125, -0.00360107421875, 0.01312255859375, -0.0012359619140625, -0.001983642578125, 0.001983642578125, 0.001983642578125, 0.005950927734375, 0.001983642578125, -0.001983642578125, -0.0012359619140625, 0.052001953125, -0.00037384033203125, 0.00396728515625, -0.076171875, -0.00360107421875, 0.001983642578125, 0.0, -0.001983642578125, -0.005584716796875, 0.005584716796875, 0.00037384033203125, -0.001983642578125, 0.001983642578125, -0.001983642578125, 0.001983642578125, 0.0, -0.001983642578125, -0.00518798828125, 0.001983642578125, 0.0], [0.0, -0.0032196044921875, 0.00396728515625, 0.001983642578125, -0.00396728515625, 0.0, -0.002349853515625, 0.00433349609375, 0.00396728515625, 0.001983642578125, 0.0, -0.001983642578125, 0.001983642578125, 0.00396728515625, 0.001983642578125, 0.001983642578125, 0.0, 0.001983642578125, 0.0, 0.001983642578125, 0.1298828125, 0.008056640625, 0.0225830078125, 0.0012359619140625, 0.001983642578125, 0.001983642578125, 0.00396728515625, 0.00396728515625, -0.00360107421875, -0.001983642578125, 0.0, 0.001983642578125], [0.001983642578125, 0.001983642578125, 0.001983642578125, -0.00360107421875, 0.005950927734375, 0.001983642578125, 0.001983642578125, -0.001983642578125, 0.001983642578125, -0.001983642578125, 0.001983642578125, 0.005950927734375, 0.0189208984375, 0.00360107421875, -0.0007476806640625, -0.049560546875, 0.001983642578125, -0.0072021484375, 0.00360107421875, 0.0, 0.00396728515625, 0.001983642578125, 0.001983642578125, 0.001983642578125, 0.001983642578125, 0.001983642578125, 0.001983642578125, 0.0, 0.001983642578125, 0.001983642578125, 0.001983642578125, 0.001983642578125], [0.001983642578125, 0.001983642578125, -0.00396728515625, 0.001983642578125, 0.0, 0.001983642578125, 0.001983642578125, 0.001983642578125, 0.027587890625, -0.001983642578125, -0.11474609375, 0.033203125, 0.001983642578125, -0.0012359619140625, -0.001983642578125, 0.00396728515625, 0.00160980224609375, 0.001983642578125, 0.0, 0.001983642578125, -0.0012359619140625, 0.001983642578125, 0.001983642578125, 0.00396728515625, -0.001983642578125, 0.00160980224609375, -0.001983642578125, -0.001983642578125, -0.00396728515625, 0.00396728515625, 0.001983642578125, -0.00037384033203125], [-0.0072021484375, -0.00037384033203125, 0.00518798828125, -0.001983642578125, 0.0, 0.0, 0.0072021484375, 0.001983642578125, -0.00396728515625, -0.001983642578125, 0.00396728515625, 0.001983642578125, 0.0091552734375, 0.053955078125, 0.0147705078125, 0.00160980224609375, -0.00396728515625, 0.00396728515625, -0.001983642578125, 0.0098876953125, 0.00518798828125, 0.001983642578125, 0.001983642578125, -0.00396728515625, 0.01116943359375, -0.0712890625, 0.0091552734375, 0.0247802734375, -0.00396728515625, -0.00396728515625, 0.037353515625, 0.0274658203125], [-0.0072021484375, 0.02197265625, -0.0159912109375, 0.01953125, -0.00396728515625, 0.001983642578125, 0.0, 0.0159912109375, -0.001983642578125, 0.0, 0.0, -0.00396728515625, 0.0032196044921875, -0.00396728515625, 0.00360107421875, 0.0, 0.0, -0.001983642578125, 0.001983642578125, -0.001983642578125, -0.00518798828125, 0.02392578125, 0.052490234375, -0.0072021484375, 0.001983642578125, 0.0, 0.0, 0.0, 0.001983642578125, 0.001983642578125, -0.001983642578125, -0.00396728515625]]