Cross-Prompt Intervention

Intervention operations work across prompts! Use two invocations within the same generation block, and operations from one invocation can reference values from the other.

In this case, we grab the token embeddings produced by the first prompt, “Madison square garden is located in the city of New”, and replace the token embeddings of the second prompt with them.

[1]:
from nnsight import LanguageModel

model = LanguageModel('openai-community/gpt2', device_map='cuda')
[2]:
with model.generate(max_new_tokens=3) as tracer:

    with tracer.invoke("Madison square garden is located in the city of New") as invoker:

        # Save the token embeddings of the first prompt.
        embeddings = model.transformer.wte.output.save()
        # Save the generated output of the first prompt.
        original = model.generator.output.save()

    with tracer.invoke("_ _ _ _ _ _ _ _ _ _") as invoker:

        # Replace the second prompt's token embeddings with the saved ones.
        model.transformer.wte.output = embeddings
        intervened = model.generator.output.save()

print(model.tokenizer.batch_decode(original))
print(model.tokenizer.batch_decode(intervened))
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
['Madison square garden is located in the city of New York City.']
['_ _ _ _ _ _ _ _ _ _ York City.']
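
This substitution works because both prompts tokenize to the same length, so the saved embedding tensor lines up position-for-position with the second prompt. A quick sanity check worth running when adapting this to other prompts (both lines should print 10 with GPT-2's tokenizer):

# Verify both prompts encode to the same number of tokens.
print(len(model.tokenizer.encode("Madison square garden is located in the city of New")))
print(len(model.tokenizer.encode("_ _ _ _ _ _ _ _ _ _")))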

We could also have passed a pre-saved embedding tensor between two separate generation blocks, as shown here. Since the second trace runs after the first has finished, the saved tensor is accessed through its .value attribute:

[3]:
with model.generate("Madison square garden is located in the city of New", max_new_tokens=3) as tracer:

    embeddings = model.transformer.wte.output.save()
    original = model.generator.output.save()

print(model.tokenizer.batch_decode(original))

with model.generate("_ _ _ _ _ _ _ _ _ _", max_new_tokens=3) as tracer:

    model.transformer.wte.output = embeddings.value
    intervened = model.generator.output.save()

print(model.tokenizer.batch_decode(intervened))
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
['Madison square garden is located in the city of New York City.']
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
['_ _ _ _ _ _ _ _ _ _ York City.']
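
The cross-prompt pattern is not limited to token embeddings: any module's output can be carried from one invocation into another. Below is a minimal sketch, assuming GPT-2's module layout (each transformer block's output is a tuple whose first element is the hidden states) and the in-place slice assignment idiom used elsewhere in these tutorials; block index 5 is an arbitrary illustrative choice:

with model.generate(max_new_tokens=3) as tracer:

    with tracer.invoke("Madison square garden is located in the city of New") as invoker:

        # Save the hidden states leaving block 5 for the first prompt.
        hidden = model.transformer.h[5].output[0].save()

    with tracer.invoke("_ _ _ _ _ _ _ _ _ _") as invoker:

        # Patch the saved hidden states into the second prompt's forward pass.
        model.transformer.h[5].output[0][:] = hidden
        patched = model.generator.output.save()

print(model.tokenizer.batch_decode(patched))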