Sessions#

NDIF uses a queue to handle concurrent requests from multiple users. To optimize the execution of our experiments, we can use the session context to efficiently package multiple interventions together as a single request to the server.

This offers the following benefits:

1) All interventions within a session are executed one after another, without any additional waiting in the queue.

2) All intermediate outputs of each intervention are stored on the server and can be accessed by other interventions in the same session, without moving the data back and forth between NDIF and the local machine.

Let’s take a look:

[6]:
from nnsight import CONFIG
import os

# we are using a Llama model remotely hosted on NDIF servers
CONFIG.set_default_api_key("YOUR_API_KEY")
os.environ['HF_TOKEN'] = "YOUR_HUGGING_FACE_TOKEN"
[7]:
from nnsight import LanguageModel
model = LanguageModel("meta-llama/Meta-Llama-3.1-70B")
[8]:
with model.session(remote=True) as session:

  with model.trace("The Eiffel Tower is in the city of") as t1:
    # capture the hidden state of layer 79 at the last token
    hs_79 = model.model.layers[79].output[0][:, -1, :] # no .save() needed inside a session
    t1_tokens_out = model.lm_head.output.argmax(dim=-1).save()

  with model.trace("Buckingham Palace is in the city of") as t2:
    # patch layer 1's hidden state with the layer-79 state captured in t1
    model.model.layers[1].output[0][:, -1, :] = hs_79[:]
    t2_tokens_out = model.lm_head.output.argmax(dim=-1).save()

print("\nT1 - Original Prediction: ", model.tokenizer.decode(t1_tokens_out[0][-1]))
print("T2 - Modified Prediction: ", model.tokenizer.decode(t2_tokens_out[0][-1]))
2025-02-06 18:00:12,636 5771f5a0-fb88-4439-8a0e-66056ff86f1e - RECEIVED: Your job has been received and is waiting approval.
2025-02-06 18:00:12,829 5771f5a0-fb88-4439-8a0e-66056ff86f1e - APPROVED: Your job was approved and is waiting to be run.
2025-02-06 18:00:13,263 5771f5a0-fb88-4439-8a0e-66056ff86f1e - RUNNING: Your job has started running.
2025-02-06 18:00:14,478 5771f5a0-fb88-4439-8a0e-66056ff86f1e - COMPLETED: Your job has been completed.
Downloading result: 100%|██████████| 1.62k/1.62k [00:00<00:00, 2.37MB/s]

T1 - Original Prediction:   Paris
T2 - Modified Prediction:   Paris

In the example above, we take the hidden state captured from a later layer (layer 79) on the first prompt and use it to replace the hidden state of an early layer (layer 1) on the second prompt. Since we are using a session, we don’t have to .save() the hidden state in Tracer 1 to reference it in Tracer 2.
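
For contrast, here is a minimal sketch of the same experiment without a session (hypothetical code, not from the notebook above): each trace becomes its own remote request, so the layer-79 hidden state must be explicitly saved, downloaded to the local machine, and then uploaded again with the second request.

with model.trace("The Eiffel Tower is in the city of", remote=True) as t1:
  hs_79 = model.model.layers[79].output[0][:, -1, :].save() # .save() is now required

with model.trace("Buckingham Palace is in the city of", remote=True) as t2:
  # hs_79 now lives on the local machine and travels back to the server with this request
  model.model.layers[1].output[0][:, -1, :] = hs_79
  t2_tokens_out = model.lm_head.output.argmax(dim=-1).save()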

It is important to note that all traces defined within the session context are executed sequentially, strictly following the order in which they were defined (i.e., t2 is executed after t1, t3 after t2, and so on).

Within a session, values can be logged and execution can be terminated early using nnsight.log() and nnsight.stop():

[12]:
import nnsight
with model.session(remote=True) as session:

  nnsight.log("-- Early Stop --") # logged on the server and streamed back to the client
  nnsight.stop() # terminate the rest of the session early

2025-02-06 18:01:54,568 a2bdb5ac-9885-45db-ac45-8e5e4bdc4c29 - RECEIVED: Your job has been received and is waiting approval.
2025-02-06 18:01:54,751 a2bdb5ac-9885-45db-ac45-8e5e4bdc4c29 - APPROVED: Your job was approved and is waiting to be run.
2025-02-06 18:01:54,970 a2bdb5ac-9885-45db-ac45-8e5e4bdc4c29 - RUNNING: Your job has started running.
2025-02-06 18:01:54,975 a2bdb5ac-9885-45db-ac45-8e5e4bdc4c29 - LOG: -- Early Stop --
2025-02-06 18:01:55,635 a2bdb5ac-9885-45db-ac45-8e5e4bdc4c29 - COMPLETED: Your job has been completed.
Downloading result: 100%|██████████| 928/928 [00:00<00:00, 6.14MB/s]

In addition to the benefits mentioned above, the session context enables experiments that are not possible with other nnsight tools. Since every trace is run on its own model, a single session can contain interventions across different models; for example, we can swap activations between the vanilla and instruct versions of a Llama model and compare the outputs. A session can also be used to run experiments entirely locally!
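
As a hypothetical sketch of that idea (the model names, layer index, and prompt below are illustrative assumptions, not part of this notebook), a trace of a second model can be opened inside the same session and fed an activation captured from the first:

from nnsight import LanguageModel

# illustrative model choices; any pair of NDIF-hosted models could be used
base = LanguageModel("meta-llama/Meta-Llama-3.1-8B")
instruct = LanguageModel("meta-llama/Meta-Llama-3.1-8B-Instruct")

with base.session(remote=True) as session:

  with base.trace("The Eiffel Tower is in the city of") as t1:
    # capture a mid-layer hidden state from the base model (no .save() needed)
    hs_base = base.model.layers[16].output[0][:, -1, :]

  with instruct.trace("The Eiffel Tower is in the city of") as t2:
    # patch the base model's hidden state into the instruct model
    instruct.model.layers[16].output[0][:, -1, :] = hs_base
    instruct_tokens_out = instruct.lm_head.output.argmax(dim=-1).save()

print("Instruct prediction with base activation: ",
      instruct.tokenizer.decode(instruct_tokens_out[0][-1]))

To run the same kind of experiment entirely locally, simply open the session without remote=True.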