Model Editing#

NNsight’s model editing feature allows you to create persistently modified versions of a model using .edit(). Unlike interventions in a tracing context, which are temporary, the Editor context lets you make lasting changes to a model instance.

This feature is useful for:

* Creating modified model variants without altering the original
* Applying changes that persist across multiple forward passes
* Comparing interventions between original and edited models

Let’s explore how to use the Editor context to make a simple persistent change to a model:

[1]:
from nnsight import LanguageModel

model = LanguageModel('openai-community/gpt2', device_map='auto')

# save the layer-11 hidden state from a prompt whose expected next token is "Paris"
with model.trace("The Eiffel Tower is located in the city of") as tracer:
    hs11 = model.transformer.h[11].output[0][:, -1, :].save()

# the edited model will now always predict "Paris" as the next token
with model.edit() as model_edited:
    model.transformer.h[11].output[0][:, -1, :] = hs11

# we demonstrate this by comparing the output of an unmodified model...
with model.trace("Vatican is located in the city of") as tracer:
    original_tokens = model.lm_head.output.argmax(dim=-1).save()

# ...with the output of the edited model
with model_edited.trace("Vatican is located in the city of") as tracer:
    modified_tokens = model.lm_head.output.argmax(dim=-1).save()


print("\nOriginal Prediction: ", model.tokenizer.decode(original_tokens[0][-1]))
print("Modified Prediction: ", model.tokenizer.decode(modified_tokens[0][-1]))

Original Prediction:   Rome
Modified Prediction:   Paris

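Because the edit is baked into model_edited, it applies to every forward pass of that instance, not just the trace above. Below is a minimal sketch using the same API calls as the example; the prompts are illustrative:

[ ]:
# the edit persists across independent forward passes of the edited model
for prompt in ["Berlin is located in the country of",
               "The Colosseum is located in the city of"]:
    with model_edited.trace(prompt):
        tokens = model.lm_head.output.argmax(dim=-1).save()
    print(prompt, "->", model.tokenizer.decode(tokens[0][-1]))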
Edits defined within an Editor context create a new, modified version of the model by default, preserving the original. This allows for safe experimentation with model changes. If you wish to modify the original model directly, you can set inplace=True when calling .edit().

Use this option cautiously, as in-place edits alter the base model for all subsequent calls.

[ ]:
# we use the hidden state we saved above (hs11)
with model.edit(inplace=True) as model_edited:
    model.transformer.h[11].output[0][:, -1, :] = hs11

# since the edit was applied in place, tracing the original model now reflects it
with model.trace("Vatican is located in the city of") as tracer:
    modified_tokens = model.lm_head.output.argmax(dim=-1).save()

print("Modified In-place: ", model.tokenizer.decode(modified_tokens[0][-1]))
Modified In-place:   Paris

If you’ve made in-place edits to your model and need to revert them, .clear_edits() removes all edits applied to the model, restoring it to its original state.

[ ]:
model.clear_edits()

with model.trace("Vatican is located in the city of"):
    modified_tokens = model.lm_head.output.argmax(dim=-1).save()

print("Edits cleared: ", model.tokenizer.decode(modified_tokens[0][-1]))
Edits cleared:   Rome