Interpretability for Neural Networks

NNsight (/ɛn.saɪt/) is a package for interpreting and manipulating the internals of deep learning models.

What is NNsight?

NNsight is a Python library for interpreting and intervening on the internals of deep learning models. It provides a clean, Pythonic interface for:

  • Accessing activations at any layer during forward passes
  • Modifying activations to study causal effects
  • Computing gradients with respect to intermediate values
  • Batching interventions across multiple inputs efficiently
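To make the first three capabilities concrete without relying on NNsight's own API, here is a plain-PyTorch sketch of the same kinds of operations using `register_forward_hook` on a hypothetical toy model (`toy_model`, `save_hook`, and `zero_hook` are illustrative names, not part of NNsight). It is not how NNsight is used, and it makes no claim about how NNsight is implemented internally:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy network standing in for a real model (not part of NNsight).
toy_model = nn.Sequential(
    nn.Linear(4, 8),  # index 0
    nn.ReLU(),        # index 1: the activation we inspect and patch
    nn.Linear(8, 2),  # index 2
)

saved = {}

def save_hook(module, args, output):
    # Access: keep the activation and ask autograd to retain its gradient.
    output.retain_grad()
    saved["hidden"] = output  # returning None leaves the output unchanged

def zero_hook(module, args, output):
    # Modify: a tensor returned from a forward hook replaces the module's output.
    return torch.zeros_like(output)

x = torch.randn(3, 4)  # a batch of 3 inputs run in a single forward pass

# Access an intermediate activation and the gradient of the summed output w.r.t. it.
handle = toy_model[1].register_forward_hook(save_hook)
out = toy_model(x)
out.sum().backward()
handle.remove()
hidden = saved["hidden"]
print(hidden.shape, hidden.grad.shape)  # torch.Size([3, 8]) torch.Size([3, 8])

# Intervene: with the ReLU output zeroed, the last layer can only emit its bias.
handle = toy_model[1].register_forward_hook(zero_hook)
patched = toy_model(x)
handle.remove()
print(torch.allclose(patched, toy_model[2].bias.expand_as(patched)))  # True
```

With raw hooks, each hook stays active until `handle.remove()` is called. In the NNsight example that follows, the intervention is instead written directly inside the `with model.trace(...)` block.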

Originally developed by the NDIF (National Deep Inference Fabric) team at Northeastern University, NNsight supports local execution on any PyTorch model and remote execution on large models hosted on NDIF infrastructure.

What does that look like?

Install NNsight:

pip install nnsight

Intervene:

from nnsight import LanguageModel

model = LanguageModel('openai-community/gpt2', device_map='auto', dispatch=True)

with model.trace('The Eiffel Tower is in the city of', remote=False):  # set remote=True to run on NDIF instead
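    # Intervene: zero out the hidden states output by the first transformer block
    model.transformer.h[0].output[0][:] = 0

    # Save the last block's hidden states so they remain available after the trace
    hidden_states = model.transformer.h[-1].output[0].save()

    # Save the model's final output (including the logits)
    output = model.output.save()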

print(model.tokenizer.decode(output.logits.argmax(dim=-1)[0]))
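The last line works because `output.logits` holds one score per vocabulary entry at every input position, with shape (batch, sequence_length, vocab_size). `argmax(dim=-1)` picks the highest-scoring token id at each position, and `[0]` selects the first (here, only) sequence in the batch. The decoded string is therefore the model's next-token guess after each prefix; because the trace zeroes layer 0's output first, it reflects that intervention rather than GPT-2's ordinary prediction. The toy illustration below uses made-up logits and a hypothetical 5-entry vocabulary in place of GPT-2's tokenizer:

```python
import torch

# Hypothetical 5-entry vocabulary standing in for GPT-2's tokenizer.
vocab = ["The", " Eiffel", " Tower", " is", " Paris"]

# Made-up logits with shape (batch=1, seq_len=3, vocab_size=5).
logits = torch.tensor([[
    [0.1, 2.0, 0.3, 0.0, 0.2],  # position 0: highest score at index 1
    [0.0, 0.1, 3.0, 0.2, 0.1],  # position 1: highest score at index 2
    [0.2, 0.0, 0.1, 0.3, 4.0],  # position 2: highest score at index 4
]])

# argmax over the vocabulary dimension gives the top token id at every position.
pred_ids = logits.argmax(dim=-1)  # shape (1, 3)
first = pred_ids[0]               # ids for the first batch element
print(first.tolist())             # [1, 2, 4]

# Mimic a tokenizer's decode by concatenating the token strings.
decoded = "".join(vocab[i] for i in first.tolist())
print(repr(decoded))              # ' Eiffel Tower Paris'
```

To read only the prediction for the word after the full prompt, take the last position instead, e.g. `logits[0, -1].argmax()`.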