Remote Execution

NDIF currently requires an API key. To get one, go to https://login.ndif.us and sign up.

With a valid API key, you can then configure nnsight as follows:

[ ]:
from nnsight import CONFIG

CONFIG.set_default_api_key("YOUR_API_KEY")

This only needs to be run once: it saves the API key as the default in a config file stored alongside the nnsight installation.
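If you would rather not hard-code the key in a notebook, one option is to read it from an environment variable. The sketch below is only an illustration under stated assumptions: the variable name `NDIF_API_KEY` and the helpers `resolve_api_key` and `configure_nnsight` are hypothetical names chosen for this example, and only `CONFIG.set_default_api_key` comes from the cell above.

```python
import os


def resolve_api_key(env_var="NDIF_API_KEY", environ=None):
    """Return the API key stored in `env_var`, or raise if it is missing.

    `env_var` is a hypothetical variable name chosen for this example.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to your NDIF API key first.")
    return key


def configure_nnsight(environ=None):
    """Save the key from the environment as nnsight's default API key."""
    # Imported lazily so resolve_api_key can be used without nnsight installed.
    from nnsight import CONFIG

    CONFIG.set_default_api_key(resolve_api_key(environ=environ))
```

Calling `configure_nnsight()` once, after exporting the variable in your shell, should have the same effect as the cell above while keeping the key out of the notebook.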

Let’s demonstrate nnsight’s tracing context with one of the larger open-source language models, Llama-3.1-70B!

[ ]:
import os

# Llama 3.1 70B is a gated model, so you need a Hugging Face token that has been granted access to it.
os.environ['HF_TOKEN'] = "YOUR_HUGGING_FACE_TOKEN"

[ ]:
from nnsight import LanguageModel

# We'll never actually load the parameters locally, so there is no need to specify a device_map.
llama = LanguageModel("meta-llama/Meta-Llama-3.1-70B")

# The only change needed to run on NDIF instead of locally is remote=True.
with llama.trace("The Eiffel Tower is in the city of", remote=True) as runner:

    hidden_states = llama.model.layers[-1].output.save()

    output = llama.output.save()

print(hidden_states)

print(output["logits"])
2024-08-30 07:11:21,150 MainProcess nnsight_remote INFO     36ff46f0-d81a-4586-b7e7-eaf6f97d6c0b - RECEIVED: Your job has been received and is waiting approval.
2024-08-30 07:11:21,184 MainProcess nnsight_remote INFO     36ff46f0-d81a-4586-b7e7-eaf6f97d6c0b - APPROVED: Your job was approved and is waiting to be run.
2024-08-30 07:11:21,206 MainProcess nnsight_remote INFO     36ff46f0-d81a-4586-b7e7-eaf6f97d6c0b - RUNNING: Your job has started running.
2024-08-30 07:11:21,398 MainProcess nnsight_remote INFO     36ff46f0-d81a-4586-b7e7-eaf6f97d6c0b - COMPLETED: Your job has been completed.
Downloading result: 100%|██████████| 9.48M/9.48M [00:02<00:00, 3.21MB/s]
(tensor([[[ 5.4688, -4.9062,  2.2344,  ..., -3.6875,  0.9609,  1.2578],
         [ 1.5469, -0.6172, -1.4531,  ..., -1.1562, -0.1406, -2.1250],
         [ 1.7812, -1.8906, -1.1875,  ...,  0.1680,  0.9609,  0.5625],
         ...,
         [ 0.9453, -0.3711,  1.3516,  ...,  1.3828, -0.7969, -1.9297],
         [-0.8906,  0.3672,  0.2617,  ...,  2.4688, -0.4414, -0.6758],
         [-1.6094,  1.0938,  1.7031,  ...,  1.8672, -1.1328, -0.5000]]],
       dtype=torch.bfloat16), DynamicCache())
tensor([[[ 6.3750,  8.6250, 13.0000,  ..., -4.1562, -4.1562, -4.1562],
         [-2.8594, -2.2344, -3.0938,  ..., -8.6250, -8.6250, -8.6250],
         [ 8.9375,  3.5938,  4.5000,  ..., -3.9375, -3.9375, -3.9375],
         ...,
         [ 3.5781,  3.4531,  0.0796,  ..., -6.5625, -6.5625, -6.5625],
         [10.8750,  6.4062,  4.9375,  ..., -4.0000, -4.0000, -3.9844],
         [ 7.2500,  6.1562,  3.5156,  ..., -4.7188, -4.7188, -4.7188]]])

It really is as simple as setting remote=True. All of the techniques available in nnsight locally work the same way when running remotely.
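Because the only difference is the `remote` flag, the same tracing code can be wrapped in a function and switched between local and remote execution. The following is a minimal sketch, assuming the model exposes `model.layers` as in the Llama example above; `run_trace` is a hypothetical helper name, not an nnsight function.

```python
def run_trace(model, prompt, remote=True):
    """Trace `prompt` and return the saved last-layer output and model output.

    `run_trace` is a hypothetical helper; `model` is expected to be an
    nnsight LanguageModel such as `llama` above.
    """
    # remote=True sends the job to NDIF; remote=False runs it locally.
    with model.trace(prompt, remote=remote):
        hidden_states = model.model.layers[-1].output.save()
        output = model.output.save()
    return hidden_states, output
```

For example, `run_trace(llama, "The Eiffel Tower is in the city of")` would issue the same request as the cell above, while passing `remote=False` would attempt to run it locally, which for a 70B model means loading the weights on your own hardware.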