Sessions#

NDIF uses a queue to handle concurrent requests from multiple users. To optimize the execution of our experiments, we can use the session context to efficiently package multiple interventions together as a single request to the server.

This offers the following benefits:

1) All interventions within a session are executed one after another, without any additional waiting in the queue.

2) All intermediate outputs of each intervention are stored on the server and can be accessed by other interventions in the same session, without moving data back and forth between NDIF and the local machine.

Let’s take a look:

[1]:
from nnsight import LanguageModel, CONFIG
import os

# we are using a Llama model remotely hosted on NDIF servers
CONFIG.set_default_api_key("YOUR_API_KEY")
os.environ['HF_TOKEN'] = "YOUR_HUGGING_FACE_TOKEN"

model = LanguageModel("meta-llama/Meta-Llama-3.1-70B")
[2]:
with model.session(remote=True) as session:

  with model.trace("The Eiffel Tower is in the city of") as t1:
    # capture the hidden state from layer 79 at the last token
    hs_79 = model.model.layers[79].output[0][:, -1, :] # no .save()
    t1_tokens_out = model.lm_head.output.argmax(dim=-1).save()

  with model.trace("Buckingham Palace is in the city of") as t2:
    model.model.layers[1].output[0][:, -1, :] = hs_79[:] # inject t1's layer 79 hidden state into layer 1
    t2_tokens_out = model.lm_head.output.argmax(dim=-1).save()

print("\nT1 - Original Prediction: ", model.tokenizer.decode(t1_tokens_out[0][-1]))
print("T2 - Modified Prediction: ", model.tokenizer.decode(t2_tokens_out[0][-1]))
2024-08-30 08:15:49,072 MainProcess nnsight_remote INFO     1478bff8-622a-42cf-9ba0-78d3a0803a8b - RECEIVED: Your job has been received and is waiting approval.
2024-08-30 08:15:49,097 MainProcess nnsight_remote INFO     1478bff8-622a-42cf-9ba0-78d3a0803a8b - APPROVED: Your job was approved and is waiting to be run.
2024-08-30 08:15:49,114 MainProcess nnsight_remote INFO     1478bff8-622a-42cf-9ba0-78d3a0803a8b - RUNNING: Your job has started running.
2024-08-30 08:15:49,423 MainProcess nnsight_remote INFO     1478bff8-622a-42cf-9ba0-78d3a0803a8b - COMPLETED: Your job has been completed.
Downloading result: 100%|██████████| 1.69k/1.69k [00:00<00:00, 2.20MB/s]

T1 - Original Prediction:   Paris
T2 - Modified Prediction:   Paris

In the example above, we replace the hidden state of an early layer (layer 1) in the second prompt with the hidden state captured from a late layer (layer 79) of the first prompt. Since we are using a session, we don’t have to save the hidden state from Tracer 1 to reference it in Tracer 2.

It is important to note that all traces defined within the session context are executed sequentially, strictly following their order of definition (i.e., t2 is executed after t1, t3 after t2, and so on).
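
Because of this ordering guarantee, a later trace can freely consume values produced by any earlier trace in the same session. The sketch below illustrates this with three traces; the prompts and the choice of layer 40 are illustrative assumptions, not part of the example above:

with model.session(remote=True) as session:

  with model.trace("The capital of France is") as t1:
    hs_a = model.model.layers[40].output[0][:, -1, :]   # runs first

  with model.trace("The capital of England is") as t2:
    hs_b = model.model.layers[40].output[0][:, -1, :]   # runs second

  with model.trace("The capital of Germany is") as t3:
    # runs last, so it can combine the results of both earlier traces
    model.model.layers[40].output[0][:, -1, :] = (hs_a + hs_b) / 2
    mixed_out = model.lm_head.output.argmax(dim=-1).save()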

The session context object has its own methods, for example to log values or to terminate execution early.

[3]:
with model.session(remote=True) as session:

  session.log("-- Early Stop --")
  session.exit()
2024-08-30 08:15:53,770 MainProcess nnsight_remote INFO     e794cd1a-3c4d-4300-ba68-6dcb6d8bda02 - RECEIVED: Your job has been received and is waiting approval.
2024-08-30 08:15:53,795 MainProcess nnsight_remote INFO     e794cd1a-3c4d-4300-ba68-6dcb6d8bda02 - APPROVED: Your job was approved and is waiting to be run.
2024-08-30 08:15:53,801 MainProcess nnsight_remote INFO     e794cd1a-3c4d-4300-ba68-6dcb6d8bda02 - RUNNING: Your job has started running.
2024-08-30 08:15:53,811 MainProcess nnsight_remote INFO     e794cd1a-3c4d-4300-ba68-6dcb6d8bda02 - LOG: -- Early Stop --
2024-08-30 08:15:53,823 MainProcess nnsight_remote INFO     e794cd1a-3c4d-4300-ba68-6dcb6d8bda02 - COMPLETED: Your job has been completed.
Downloading result: 100%|██████████| 928/928 [00:00<00:00, 5.66MB/s]

In addition to the benefits mentioned above, the session context also enables experiments that are not possible with other nnsight tools. Since every trace is run on its own model, within one session we can run interventions between different models: for example, we can swap activations between the vanilla and instruct versions of a Llama model and compare the outputs. A session can also be used to run experiments entirely locally!
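
As a rough sketch of such a cross-model experiment (the instruct checkpoint name and the choice of layer 40 are illustrative assumptions, not taken from the example above):

base = LanguageModel("meta-llama/Meta-Llama-3.1-70B")
instruct = LanguageModel("meta-llama/Meta-Llama-3.1-70B-Instruct")

with base.session(remote=True) as session:

  with base.trace("The Eiffel Tower is in the city of") as t1:
    # hidden state from the vanilla (base) model
    hs_base = base.model.layers[40].output[0][:, -1, :]

  with instruct.trace("The Eiffel Tower is in the city of") as t2:
    # patch the base model's activation into the instruct model
    instruct.model.layers[40].output[0][:, -1, :] = hs_base
    patched_out = instruct.lm_head.output.argmax(dim=-1).save()

To run a session entirely locally instead of on NDIF, simply open it without remote=True (i.e. model.session()), assuming the model fits on your local hardware.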