Remote Execution#
In its current state, NDIF requires an API key. To get one, simply sign up at https://login.ndif.us.
With a valid API key, you can then configure nnsight as follows:
[ ]:
from nnsight import CONFIG
CONFIG.set_default_api_key("YOUR_API_KEY")
This only needs to be run once: the API key is saved as the default in a config file alongside the nnsight installation.
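If you prefer not to write the key to disk, you can set it for the current session only. A minimal sketch, assuming CONFIG.API.APIKEY is the attribute that set_default_api_key persists:
[ ]:
from nnsight import CONFIG

# Set the API key for this Python session only; nothing is saved to the config file.
CONFIG.API.APIKEY = "YOUR_API_KEY"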
Let’s demonstrate using nnsight’s tracing context with one of the larger open-source language models, Llama-3.1-70b!
[ ]:
import os

# Llama-3.1-70B is a gated model, so you need access via your Hugging Face token.
os.environ['HF_TOKEN'] = "YOUR_HUGGING_FACE_TOKEN"
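Alternatively, you can authenticate through the huggingface_hub client rather than an environment variable; a minimal sketch using its login helper:
[ ]:
from huggingface_hub import login

# Log this session in to the Hugging Face Hub with your access token.
login(token="YOUR_HUGGING_FACE_TOKEN")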
[ ]:
from nnsight import LanguageModel

# We'll never actually load the parameters locally, so there's no need to specify a device_map.
llama = LanguageModel("meta-llama/Meta-Llama-3.1-70B")

# All we need to do to run on NDIF instead of executing locally is pass remote=True.
with llama.trace("The Eiffel Tower is in the city of", remote=True) as runner:
    hidden_states = llama.model.layers[-1].output.save()
    output = llama.output.save()

print(hidden_states)
print(output["logits"])
2024-08-30 07:11:21,150 MainProcess nnsight_remote INFO 36ff46f0-d81a-4586-b7e7-eaf6f97d6c0b - RECEIVED: Your job has been received and is waiting approval.
2024-08-30 07:11:21,184 MainProcess nnsight_remote INFO 36ff46f0-d81a-4586-b7e7-eaf6f97d6c0b - APPROVED: Your job was approved and is waiting to be run.
2024-08-30 07:11:21,206 MainProcess nnsight_remote INFO 36ff46f0-d81a-4586-b7e7-eaf6f97d6c0b - RUNNING: Your job has started running.
2024-08-30 07:11:21,398 MainProcess nnsight_remote INFO 36ff46f0-d81a-4586-b7e7-eaf6f97d6c0b - COMPLETED: Your job has been completed.
Downloading result:   0%|          | 0.00/9.48M [00:00<?, ?B/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Downloading result: 100%|██████████| 9.48M/9.48M [00:02<00:00, 3.21MB/s]
(tensor([[[ 5.4688, -4.9062, 2.2344, ..., -3.6875, 0.9609, 1.2578],
[ 1.5469, -0.6172, -1.4531, ..., -1.1562, -0.1406, -2.1250],
[ 1.7812, -1.8906, -1.1875, ..., 0.1680, 0.9609, 0.5625],
...,
[ 0.9453, -0.3711, 1.3516, ..., 1.3828, -0.7969, -1.9297],
[-0.8906, 0.3672, 0.2617, ..., 2.4688, -0.4414, -0.6758],
[-1.6094, 1.0938, 1.7031, ..., 1.8672, -1.1328, -0.5000]]],
dtype=torch.bfloat16), DynamicCache())
tensor([[[ 6.3750, 8.6250, 13.0000, ..., -4.1562, -4.1562, -4.1562],
[-2.8594, -2.2344, -3.0938, ..., -8.6250, -8.6250, -8.6250],
[ 8.9375, 3.5938, 4.5000, ..., -3.9375, -3.9375, -3.9375],
...,
[ 3.5781, 3.4531, 0.0796, ..., -6.5625, -6.5625, -6.5625],
[10.8750, 6.4062, 4.9375, ..., -4.0000, -4.0000, -3.9844],
[ 7.2500, 6.1562, 3.5156, ..., -4.7188, -4.7188, -4.7188]]])
It really is as simple as remote=True. All of the techniques available in NNsight locally work just the same when running remotely.
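For example, an intervention written for local execution, such as zero-ablating a layer's output, runs remotely with the same syntax. A minimal sketch (the choice of layer and the ablation itself are illustrative, not part of the run above):
[ ]:
# The same intervention syntax used locally also runs on NDIF.
with llama.trace("The Eiffel Tower is in the city of", remote=True):
    # Zero out the last layer's hidden states before they flow downstream
    # (an illustrative edit; any local nnsight intervention works the same way).
    llama.model.layers[-1].output[0][:] = 0
    ablated_logits = llama.output.logits.save()

print(ablated_logits)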