Cross-Prompt Intervention and Batching#

Summary#

You can run multiple batches of prompts in NNsight during one forward pass! This is done with the invoke context, which defines the values of the .input and .output proxies for each batch.

with model.trace() as tracer:

    with tracer.invoke("Prompt 1"):
        # capture output
        x = model.transformer.wte.output

    with tracer.invoke("Prompt 2"):
        # capture second output
        y = model.transformer.wte.output

You can also use the invoke context to perform interventions across prompts. To set new module values across different invoke contexts and ensure the model executes in the proper order, you create a barrier, which tells NNsight to prepare the variables for cross-prompt interventions.

with model.trace() as tracer:
    # initialize barrier
    barrier = tracer.barrier(2)

    with tracer.invoke("Prompt 1"):
        # capture output
        x = model.transformer.wte.output

        # call barrier after collecting output
        barrier()

    with tracer.invoke("Prompt 2"):
        # call barrier before setting output in second invoke
        barrier()

        # set the collected output for the second prompt
        model.transformer.wte.output = x

Where to Use#

Batching can be used to speed up processing for multiple prompts (see the sketch below), and cross-prompt interventions can be used to optimize patching protocols, enabling you to perform multiple runs in one forward pass.
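For instance, instead of tracing each prompt in its own forward pass, you can collect the same activations for several prompts in one batched trace. The following sketch contrasts the two patterns; the prompts are hypothetical, and it assumes the gpt2 model loaded in the next section.

# Standalone sketch: collecting the same activations with and without batching.
# The prompts are illustrative; model is the gpt2 LanguageModel loaded below.
prompts = ["The Eiffel Tower is in", "The capital of Japan is"]

# Two separate forward passes: one trace per prompt.
separate_embeds = []
for prompt in prompts:
    with model.trace(prompt):
        separate_embeds.append(model.transformer.wte.output.save())

# One batched forward pass: a single trace with one invoke per prompt.
batched_embeds = []
with model.trace() as tracer:
    for prompt in prompts:
        with tracer.invoke(prompt):
            batched_embeds.append(model.transformer.wte.output.save())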

How to Use#

Let’s explore batching and cross-prompt interventions in greater detail. First, we’ll initialize the gpt2 model.

[ ]:
from nnsight import LanguageModel

model = LanguageModel('openai-community/gpt2', device_map='auto')

Batching#

To batch prompts in NNsight during one forward pass, you use the invoke context, which defines the values of the .input and .output proxies for each batch.

The invoke context is entered within a trace or generate context by calling with tracer.invoke("PROMPT"):. Each new invoke adds an additional batch of prompts to the model's forward pass.

[23]:
with model.generate(max_new_tokens=3) as tracer:
    with tracer.invoke("Madison square garden is located in the city of New") as invoker:
        batch1 = model.generator.output.save()

    with tracer.invoke("_ _ _ _ _ _ _ _ _ _") as invoker:
        batch2 = model.generator.output.save()

print(model.tokenizer.batch_decode(batch1))
print(model.tokenizer.batch_decode(batch2))
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
['Madison square garden is located in the city of New York City.']
['_ _ _ _ _ _ _ _ _ _ _ _ _']

Note that for prompts of different lengths, NNsight performs padding on the left side by default.
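To see what the left padding looks like, you can pad a batch of prompts directly with the model’s tokenizer. This is a standalone sketch that calls the Hugging Face tokenizer yourself, not a view of NNsight’s internal batching:

# Standalone illustration of left-side padding using the model's tokenizer.
# GPT-2 has no pad token by default, so we reuse the EOS token for padding.
model.tokenizer.pad_token = model.tokenizer.eos_token
model.tokenizer.padding_side = "left"

batch = model.tokenizer(
    ["A short prompt", "A noticeably longer prompt with several more tokens"],
    return_tensors="pt",
    padding=True,
)

print(batch["input_ids"].shape)    # both prompts padded to the same length
print(batch["attention_mask"][0])  # zeros on the left mark the padding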

Cross-Prompt Interventions#

Batching enables us to intervene on models across prompts! Operations can work between multiple invokes within the same generate or trace block. Say you collect a module output that you want to apply to the same module for your second prompt. After collecting the module output in the first invoke, you call barrier(); this tells NNsight to make that variable available to the next prompt. You then call barrier() in the next invoke before setting the module output to the collected variable. Here, tracer.barrier(2) creates a barrier shared by the two invokes that will call it.

Let’s try grabbing the token embeddings coming from the first prompt, "Madison square garden is located in the city of New", and replace the embeddings of the second prompt, "_ _ _ _ _ _ _ _ _ _", with them.

[21]:
with model.generate(max_new_tokens=3) as tracer:
    # create barrier
    barrier = tracer.barrier(2)
    with tracer.invoke("Madison square garden is located in the city of New") as invoker:

        embeddings = model.transformer.wte.output
        # set barrier after defining output for first invoke
        barrier()
        original = model.generator.output.save()

    with tracer.invoke("_ _ _ _ _ _ _ _ _ _") as invoker:
        # we have to wait for the output from the previous invoke
        barrier()
        model.transformer.wte.output = embeddings
        intervened = model.generator.output.save()

print(model.tokenizer.batch_decode(original))
print(model.tokenizer.batch_decode(intervened))
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
['Madison square garden is located in the city of New York City.']
['_ _ _ _ _ _ _ _ _ _ York City.']

Note: Syntax for cross-prompt interventions with batching prior to v0.5

Before NNsight 0.5, cross-prompt interventions didn’t require a barrier. Here is how you would run the same cross-prompt example as above in prior versions of NNsight.

with model.generate(max_new_tokens=3) as tracer:
    with tracer.invoke("Madison square garden is located in the city of New") as invoker:
        embeddings = model.transformer.wte.output
        original = model.generator.output.save()

    with tracer.invoke("_ _ _ _ _ _ _ _ _ _") as invoker:
        model.transformer.wte.output = embeddings
        intervened = model.generator.output.save()

print(model.tokenizer.batch_decode(original))
print(model.tokenizer.batch_decode(intervened))

You can also run interventions across prompts by creating two different generate contexts, but note that this will run in two different forward passes.

[24]:
with model.generate("Madison square garden is located in the city of New", max_new_tokens=3) as tracer:

    embeddings = model.transformer.wte.output.save()
    original = model.generator.output.save()

print(model.tokenizer.batch_decode(original))

with model.generate("_ _ _ _ _ _ _ _ _ _", max_new_tokens=3) as tracer:
    # since this is a separate run, we don't have to use barriers
    model.transformer.wte.output = embeddings
    intervened = model.generator.output.save()

print(model.tokenizer.batch_decode(intervened))
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
['Madison square garden is located in the city of New York City.']
['_ _ _ _ _ _ _ _ _ _ York City.']