Cross-Prompt Intervention and Batching#
Summary#
You can run multiple batches of prompts in NNsight during one forward pass! This is done with the invoke context, which defines the values of the .input and .output proxies across multiple batches.
with model.trace() as tracer:

    with tracer.invoke("Prompt 1"):
        # capture output
        x = model.transformer.wte.output

    with tracer.invoke("Prompt 2"):
        # capture second output
        y = model.transformer.wte.output
You can also use the invoke context to perform interventions across prompts. To set new module values across different invoke contexts and ensure proper execution order of the model, you create a barrier, which tells NNsight to prepare the variables for cross-prompt interventions.
with model.trace() as tracer:
    # initialize barrier
    barrier = tracer.barrier(2)

    with tracer.invoke("Prompt 1"):
        # capture output
        x = model.transformer.wte.output
        # call barrier after collecting output
        barrier()

    with tracer.invoke("Prompt 2"):
        # call barrier before setting output in second invoke
        barrier()
        # set the collected output for the second prompt
        model.transformer.wte.output = x
Where to Use#
Batching can be used to speed up processing for multiple prompts, and cross-prompt interventions can be used to optimize patching protocols, enabling you to perform multiple runs in one forward pass.
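For instance, a patching protocol that would normally require two separate runs can be folded into a single forward pass. The following is a minimal, illustrative sketch (not part of the tutorial cells): the prompts and the layer choice are arbitrary, it assumes gpt2’s module names, and it relies on the barrier mechanism explained in the How to Use section below. It restores a hidden state from a “clean” prompt inside a second, “corrupted” prompt.
from nnsight import LanguageModel

model = LanguageModel('openai-community/gpt2', device_map='auto')

with model.trace() as tracer:
    # barrier shared by the two invokes
    barrier = tracer.barrier(2)

    with tracer.invoke("The Eiffel Tower is in the city of"):
        # grab a hidden state from the "clean" prompt (layer 6 is an arbitrary choice)
        clean_hidden = model.transformer.h[6].output[0][:, -1, :]
        barrier()
        clean_logits = model.lm_head.output.save()

    with tracer.invoke("The Colosseum is in the city of"):
        # wait for the clean hidden state, then patch it into the "corrupted" prompt
        barrier()
        model.transformer.h[6].output[0][:, -1, :] = clean_hidden
        patched_logits = model.lm_head.output.save()

print("clean next token:  ", model.tokenizer.decode(clean_logits.argmax(dim=-1)[0, -1]))
print("patched next token:", model.tokenizer.decode(patched_logits.argmax(dim=-1)[0, -1]))
Because both prompts run in the same forward pass, the patched prediction can be compared against the clean one without a second call to the model.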
How to Use#
Let’s explore batching and cross-prompt interventions in greater detail. First, we’ll initialize the gpt2 model.
[ ]:
from nnsight import LanguageModel
model = LanguageModel('openai-community/gpt2', device_map='auto')
Batching#
To batch prompts in NNsight during one forward pass, you use the invoke context, which defines the values of the .input and .output proxies across multiple batches.
The invoke context is called within a trace or generate context by calling with tracer.invoke("PROMPT"):. Each new invoke creates an additional batch for the model.
[23]:
with model.generate(max_new_tokens=3) as tracer:

    with tracer.invoke("Madison square garden is located in the city of New") as invoker:
        batch1 = model.generator.output.save()

    with tracer.invoke("_ _ _ _ _ _ _ _ _ _") as invoker:
        batch2 = model.generator.output.save()

print(model.tokenizer.batch_decode(batch1))
print(model.tokenizer.batch_decode(batch2))
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
['Madison square garden is located in the city of New York City.']
['_ _ _ _ _ _ _ _ _ _ _ _ _']
Note that for prompts of different lengths, NNsight performs padding on the left side by default.
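As a quick illustration (separate from the tutorial cells, and using the model’s Hugging Face tokenizer directly rather than NNsight’s internal batching code), here is roughly what left padding looks like for two prompts of different lengths; the prompts are illustrative.
tok = model.tokenizer
if tok.pad_token is None:
    tok.pad_token = tok.eos_token   # gpt2 has no dedicated pad token
tok.padding_side = "left"           # mirror the left-padding default described above

batch = tok(["The quick brown fox", "Hi"], padding=True, return_tensors="pt")
print(batch["input_ids"])       # the shorter prompt is padded on the left
print(batch["attention_mask"])  # 0s mark the left-side padding positions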
Cross-Prompt Interventions#
Batching enables us to intervene on models across prompts! Operations can work between multiple invokes within the same generate or trace block. Let’s say you collect a module output that you want to apply to the same module for your second prompt. After you collect the module output in the first invoke, you call barrier(). This tells NNsight to prepare this variable to be accessed within the next prompt. You then call barrier() in the next invoke before setting the module output to the collected variable.
Let’s try grabbing the token embeddings coming from the first prompt, "Madison square garden is located in the city of New", and replacing the embeddings of the second prompt, "_ _ _ _ _ _ _ _ _ _", with them.
[21]:
with model.generate(max_new_tokens=3) as tracer:
    # create a barrier shared by the two invokes
    barrier = tracer.barrier(2)

    with tracer.invoke("Madison square garden is located in the city of New") as invoker:
        embeddings = model.transformer.wte.output
        # call barrier after defining the output for the first invoke
        barrier()
        original = model.generator.output.save()

    with tracer.invoke("_ _ _ _ _ _ _ _ _ _") as invoker:
        # we have to wait for the output from the previous invoke
        barrier()
        model.transformer.wte.output = embeddings
        intervened = model.generator.output.save()

print(model.tokenizer.batch_decode(original))
print(model.tokenizer.batch_decode(intervened))
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
['Madison square garden is located in the city of New York City.']
['_ _ _ _ _ _ _ _ _ _ York City.']
Note: Syntax for cross-prompt interventions with batching prior to v0.5
Before NNsight 0.5, cross-prompt interventions didn’t require a barrier. Here is how you would run the same cross-prompt example as above in prior versions of NNsight.
with model.generate(max_new_tokens=3) as tracer:

    with tracer.invoke("Madison square garden is located in the city of New") as invoker:
        embeddings = model.transformer.wte.output
        original = model.generator.output.save()

    with tracer.invoke("_ _ _ _ _ _ _ _ _ _") as invoker:
        model.transformer.wte.output = embeddings
        intervened = model.generator.output.save()

print(model.tokenizer.batch_decode(original))
print(model.tokenizer.batch_decode(intervened))
You can also run interventions across prompts by creating two different generate contexts, but note that this will run in two different forward passes.
[24]:
with model.generate("Madison square garden is located in the city of New", max_new_tokens=3) as tracer:
    embeddings = model.transformer.wte.output.save()
    original = model.generator.output.save()

print(model.tokenizer.batch_decode(original))

with model.generate("_ _ _ _ _ _ _ _ _ _", max_new_tokens=3) as tracer:
    # since this is a separate run, we don't have to use barriers
    model.transformer.wte.output = embeddings
    intervened = model.generator.output.save()

print(model.tokenizer.batch_decode(intervened))
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
['Madison square garden is located in the city of New York City.']
['_ _ _ _ _ _ _ _ _ _ York City.']
Related#
Activation Patching
Attribution Patching