Remote Execution#

Summary#

NNsight is integrated with NDIF's remote backend, allowing you to run experiments on large models without needing to host them yourself! All you need to do is register for an API key (https://login.ndif.us) and confirm that the HuggingFace model you want to use is hosted on NDIF (check out the live status of our remote models: https://nnsight.net/status/).

Accessing models remotely in NDIF is as simple as setting remote=True when you enter the tracing context. You can keep the same code you use in local experiments and run it remotely instead!

from nnsight import LanguageModel
# We'll never actually load the parameters so no need to specify a device_map.
model = LanguageModel("Model_from_HuggingFace")

# All we need to do to run on NDIF instead of locally is pass remote=True.
with model.trace("The Eiffel Tower is in the city of", remote=True):
    hidden_states = model.model.layers[-1].output.save()

When to Use#

NDIF hosts a selection of models for remote access using NNsight. If you are low on compute or would like to scale your experiment to large models like Llama-405B, you can run experiments on NDIF remote models!

How to Use#

1. Configure NDIF API Key#

To access remote models, NDIF requires an API key. To get one, simply sign up at https://login.ndif.us. Because NDIF's hardware resources are funded by the US National Science Foundation (NSF), NDIF usage is restricted to US residents and their collaborators. If you have questions about your eligibility, please email info@ndif.us.

Once you have a valid NDIF API key, you can then configure NNsight as follows:

[ ]:
from nnsight import CONFIG

CONFIG.set_default_api_key("YOUR_API_KEY")

If you are running NNsight locally, this saves the API key as the default in a config file alongside your nnsight installation, meaning that this command only needs to be run once.

Configuring your NNsight API key in Google Colab

If you are running in Google Colab, you will need to configure your workspace every time, so we recommend adding your API key as a Colab Secret. Check out this article on configuring secrets.

Once you’ve saved your API key as NDIF_API in Colab Secrets, you can add this code to your notebooks for remote NDIF access on Colab.

# Include your NDIF API key in Colab Secrets as NDIF_API.
from google.colab import userdata
from nnsight import CONFIG

NDIF_API = userdata.get('NDIF_API')
CONFIG.set_default_api_key(NDIF_API)
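
If you are not working in Colab, a similar pattern keeps the key out of your source code by reading it from an environment variable. A minimal sketch (the variable name NDIF_API_KEY is our choice, not an NDIF convention):

import os

from nnsight import CONFIG

# Read the key from an environment variable so it never
# appears in the notebook or script itself.
CONFIG.set_default_api_key(os.environ["NDIF_API_KEY"])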

2. Access NDIF-hosted Models Remotely#

Now that our API key is configured, let's run a small experiment on Llama-3.1-70B remotely!

[ ]:
import os

# Llama 3.1 70B is a gated model; you need access to it via your Hugging Face token.
os.environ['HF_TOKEN'] = "YOUR_HUGGING_FACE_TOKEN"
[ ]:
from nnsight import LanguageModel
# We'll never actually load the parameters so no need to specify a device_map.
llama = LanguageModel("meta-llama/Meta-Llama-3.1-70B")

# All we need to do to run on NDIF instead of locally is pass remote=True.
with llama.trace("The Eiffel Tower is in the city of", remote=True) as tracer:

    hidden_states = llama.model.layers[-1].output.save()

    output = llama.output.save()

print(hidden_states)

print(output["logits"])
2024-08-30 07:11:21,150 MainProcess nnsight_remote INFO     36ff46f0-d81a-4586-b7e7-eaf6f97d6c0b - RECEIVED: Your job has been received and is waiting approval.
2024-08-30 07:11:21,184 MainProcess nnsight_remote INFO     36ff46f0-d81a-4586-b7e7-eaf6f97d6c0b - APPROVED: Your job was approved and is waiting to be run.
2024-08-30 07:11:21,206 MainProcess nnsight_remote INFO     36ff46f0-d81a-4586-b7e7-eaf6f97d6c0b - RUNNING: Your job has started running.
2024-08-30 07:11:21,398 MainProcess nnsight_remote INFO     36ff46f0-d81a-4586-b7e7-eaf6f97d6c0b - COMPLETED: Your job has been completed.
Downloading result:   0%|          | 0.00/9.48M [00:00<?, ?B/s]
Downloading result: 100%|██████████| 9.48M/9.48M [00:02<00:00, 3.21MB/s]
(tensor([[[ 5.4688, -4.9062,  2.2344,  ..., -3.6875,  0.9609,  1.2578],
         [ 1.5469, -0.6172, -1.4531,  ..., -1.1562, -0.1406, -2.1250],
         [ 1.7812, -1.8906, -1.1875,  ...,  0.1680,  0.9609,  0.5625],
         ...,
         [ 0.9453, -0.3711,  1.3516,  ...,  1.3828, -0.7969, -1.9297],
         [-0.8906,  0.3672,  0.2617,  ...,  2.4688, -0.4414, -0.6758],
         [-1.6094,  1.0938,  1.7031,  ...,  1.8672, -1.1328, -0.5000]]],
       dtype=torch.bfloat16), DynamicCache())
tensor([[[ 6.3750,  8.6250, 13.0000,  ..., -4.1562, -4.1562, -4.1562],
         [-2.8594, -2.2344, -3.0938,  ..., -8.6250, -8.6250, -8.6250],
         [ 8.9375,  3.5938,  4.5000,  ..., -3.9375, -3.9375, -3.9375],
         ...,
         [ 3.5781,  3.4531,  0.0796,  ..., -6.5625, -6.5625, -6.5625],
         [10.8750,  6.4062,  4.9375,  ..., -4.0000, -4.0000, -3.9844],
         [ 7.2500,  6.1562,  3.5156,  ..., -4.7188, -4.7188, -4.7188]]])

It really is as simple as remote=True! All of the techniques available in NNsight locally work just the same when running remotely.
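
For example, an intervention written exactly as it would be locally runs remotely unchanged. A minimal sketch (the choice of layer and the zero-ablation are ours, purely for illustration):

# Zero-ablate one MLP output during a remote forward pass.
with llama.trace("The Eiffel Tower is in the city of", remote=True):
    # Overwrite the MLP output of layer 40 with zeros (an arbitrary choice).
    llama.model.layers[40].mlp.output[:] = 0

    ablated = llama.output.save()

print(ablated["logits"].argmax(dim=-1))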

3. Streamlining Remote Experiments with Sessions#

NDIF uses a queue to handle concurrent requests from multiple users. To optimize the execution of our experiments, we can use the session context to package multiple interventions together as a single request to the server.

This offers the following benefits:

  1. All interventions within a session are executed one after another, without additional waiting in the queue.

  2. All intermediate outputs of each intervention are stored on the server and can be accessed by other interventions in the same session, without moving data back and forth between NDIF and your local machine.

Let’s take a look:

[ ]:
with llama.session(remote=True) as session:

    with llama.trace("The Eiffel Tower is in the city of") as t1:
        # Capture the hidden state from layer 79 at the last token.
        hs_79 = llama.model.layers[79].output[0][:, -1, :]  # no .save() needed within a session
        t1_tokens_out = llama.lm_head.output.argmax(dim=-1).save()

    with llama.trace("Buckingham Palace is in the city of") as t2:
        # Inject the layer-79 hidden state from t1 into layer 1 of this trace.
        llama.model.layers[1].output[0][:, -1, :] = hs_79[:]
        t2_tokens_out = llama.lm_head.output.argmax(dim=-1).save()

print("\nT1 - Original Prediction: ", llama.tokenizer.decode(t1_tokens_out[0][-1]))
print("T2 - Modified Prediction: ", llama.tokenizer.decode(t2_tokens_out[0][-1]))

In the example above, we capture the hidden state of a late layer (layer 79) in the first trace and use it to overwrite the output of an early layer in the second trace. Since we are using a session, we don't have to save the hidden state from Tracer 1 to reference it in Tracer 2.

It is important to note that all traces defined within the session context are executed sequentially, strictly following their order of definition (i.e., t2 is executed after t1).

The session context also provides ways to log values and to terminate execution early.

[ ]:
import nnsight

with llama.session(remote=True) as session:

    nnsight.log("-- Early Stop --")
    # Terminate execution here; nothing after this call runs.
    nnsight.stop()
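
Because traces in a session run strictly in their order of definition, nnsight.log can also be used to confirm that ordering. A minimal sketch:

import nnsight

with llama.session(remote=True) as session:

    with llama.trace("The Eiffel Tower is in the city of"):
        nnsight.log("t1 ran first")

    with llama.trace("Buckingham Palace is in the city of"):
        nnsight.log("t2 ran second")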

In addition to the benefits mentioned above, the session context enables experiments that are not possible with other nnsight tools. Since every trace runs on its own model, a single session can apply interventions across different models: for example, we can swap activations between the vanilla and instruct versions of a Llama model and compare the outputs. Sessions can also be used to run experiments entirely locally!
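
A sketch of that cross-model idea, assuming both the base and instruct variants are hosted on NDIF (check the status page) and that their traces can share one session as described above:

from nnsight import LanguageModel

base = LanguageModel("meta-llama/Meta-Llama-3.1-70B")
instruct = LanguageModel("meta-llama/Meta-Llama-3.1-70B-Instruct")

with base.session(remote=True) as session:

    # Capture a hidden state from the base model...
    with base.trace("The Eiffel Tower is in the city of"):
        base_hs = base.model.layers[40].output[0][:, -1, :]  # no .save() needed

    # ...and inject it into the instruct model within the same session.
    with instruct.trace("The Eiffel Tower is in the city of"):
        instruct.model.layers[40].output[0][:, -1, :] = base_hs[:]
        swapped_out = instruct.lm_head.output.argmax(dim=-1).save()

print(instruct.tokenizer.decode(swapped_out[0][-1]))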

4. Remote Model Considerations & System Limits#

To view currently hosted models, please visit https://nnsight.net/status/. All models except meta-llama/Meta-Llama-3.1-405B and meta-llama/Meta-Llama-3.1-405B-Instruct are currently available for public access. If you are interested in running an experiment on Llama-3.1-405B, please reach out to us at info@ndif.us.

Our system is under active development, so please be prepared for system outages, updates, and wait times. NDIF runs on NCSA Delta, so our services will be down during any of its planned or unplanned outages.

We currently have a job run-time limit in place to ensure equitable model access between users.

  • Maximum Job Run Time: 1 hour

Jobs exceeding this limit will be automatically denied or aborted, so please plan your experiments accordingly. You can also reach out to our team at info@ndif.us if you have a special research case and would like to request changes!
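
If an experiment risks exceeding the limit, one simple pattern is to split it into several smaller traces, each submitted (and timed) as its own request. A minimal sketch, reusing the llama model from above:

prompts = [
    "The Eiffel Tower is in the city of",
    "Buckingham Palace is in the city of",
]

hidden_states = []

# Each trace is its own job, so no single request approaches the limit.
for prompt in prompts:
    with llama.trace(prompt, remote=True):
        hidden_states.append(llama.model.layers[-1].output[0][:, -1, :].save())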