Sessions

NDIF uses a queue to handle concurrent requests from multiple users. To optimize the execution of our experiments, we can use the session context to efficiently package multiple interventions together into a single request to the server.

This offers the following benefits:

  1. All interventions within a session are executed one after another, without additional waiting in the queue between them.

  2. All intermediate outputs of each intervention are stored on the server and can be accessed by other interventions in the same session without moving the data back and forth between NDIF and the local machine.

Let’s take a look:

[6]:
from nnsight import CONFIG
import os

# we are using Llama model remotely hosted on NDIF servers
CONFIG.set_default_api_key("YOUR_API_KEY")
os.environ['HF_TOKEN'] = "YOUR_HUGGING_FACE_TOKEN"
[7]:
from nnsight import LanguageModel
model = LanguageModel("meta-llama/Meta-Llama-3.1-70B")
[8]:
with model.session(remote=True) as session:

  with model.trace("The Eiffel Tower is in the city of") as t1:
    # capture the hidden state from layer 79 at the last token
    hs_79 = model.model.layers[79].output[0][:, -1, :] # no .save() needed within a session
    t1_tokens_out = model.lm_head.output.argmax(dim=-1).save()

  with model.trace("Buckingham Palace is in the city of") as t2:
    # patch the layer-79 hidden state from t1 into layer 1 of this trace
    model.model.layers[1].output[0][:, -1, :] = hs_79[:]
    t2_tokens_out = model.lm_head.output.argmax(dim=-1).save()

print("\nT1 - Original Prediction: ", model.tokenizer.decode(t1_tokens_out[0][-1]))
print("T2 - Modified Prediction: ", model.tokenizer.decode(t2_tokens_out[0][-1]))
T1 - Original Prediction:   Paris
T2 - Modified Prediction:   Paris

In the example above, we patch the hidden state captured at a late layer (layer 79) of the first prompt into an early layer (layer 1) of the second prompt. Since we are using a session, we don’t have to save the hidden state from Tracer 1 to reference it in Tracer 2.

It is important to note that all traces defined within the session context are executed sequentially, strictly following their order of definition (i.e. t2 is executed after t1, t3 after t2, and so on).
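For instance, a later trace can freely consume values produced by any trace defined before it. The sketch below reuses the model from above to chain three traces; the third prompt, the choice of layers, and the averaging of hidden states are illustrative assumptions rather than part of the original experiment:

with model.session(remote=True) as session:

  with model.trace("The Eiffel Tower is in the city of") as t1:
    # hidden state produced by the first trace (no .save() needed)
    hs_t1 = model.model.layers[79].output[0][:, -1, :]

  with model.trace("Buckingham Palace is in the city of") as t2:
    # t2 runs after t1, so hs_t1 is already available on the server
    hs_mean = (hs_t1 + model.model.layers[79].output[0][:, -1, :]) / 2

  with model.trace("The Colosseum is in the city of") as t3:
    # t3 runs last and can use results from both earlier traces
    model.model.layers[1].output[0][:, -1, :] = hs_mean
    t3_tokens_out = model.lm_head.output.argmax(dim=-1).save()

print("T3 - Modified Prediction: ", model.tokenizer.decode(t3_tokens_out[0][-1]))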

Within the session context, we can also log values and terminate execution early using nnsight.log and nnsight.stop.

[12]:
import nnsight
with model.session(remote=True) as session:

  nnsight.log("-- Early Stop --") # log a message from within the session
  nnsight.stop() # terminate the session early, skipping anything defined after this point

In addition to the benefits mentioned above, the session context enables experiments that other nnsight tools cannot express. Because every trace runs on its own model, a single session can apply interventions across different models; for example, we can swap activations between the vanilla and instruct versions of a Llama model and compare the outputs. Sessions can also be used to run experiments entirely locally.
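As a rough sketch of such a cross-model experiment, the snippet below patches a mid-layer hidden state from a base model into its instruct counterpart inside one session, assuming (as described above) that a second model can be traced within the same session. The model pair, layer index, and prompt are illustrative choices, not a prescription:

from nnsight import LanguageModel

base = LanguageModel("meta-llama/Meta-Llama-3.1-8B") # illustrative checkpoint
instruct = LanguageModel("meta-llama/Meta-Llama-3.1-8B-Instruct") # illustrative checkpoint

prompt = "The Eiffel Tower is in the city of"

with base.session(remote=True) as session:

  with base.trace(prompt):
    # capture a mid-layer hidden state from the base model
    base_hs = base.model.layers[16].output[0][:, -1, :]

  with instruct.trace(prompt):
    # patch it into the instruct model and read off its prediction
    instruct.model.layers[16].output[0][:, -1, :] = base_hs
    instruct_tokens_out = instruct.lm_head.output.argmax(dim=-1).save()

print("Instruct - Patched Prediction: ", instruct.tokenizer.decode(instruct_tokens_out[0][-1]))

To run a session entirely on the local machine instead of NDIF, omit remote=True (or pass remote=False) when opening it.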