Blog

How to make your NDIF experiment 130x faster

A user recently reached out to me asking how they could make their nnsight code run faster on NDIF to meet a project deadline. After looking at their code, I introduced a number of improvements that leverage nnsight features and remote execution principles. The result was a 130x speedup.

The experience was a success and taught me several useful lessons. I want to share these key principles with you so that you, too, can implement your experiments optimally for remote execution.

TL;DR

  1. If you're doing more than one forward pass, wrap them in a model.session
  2. Downloading large tensors can be costly; only .save() what you need
  3. Cache all your activations in one go
  4. Reduce loops by batching invokes
  5. .skip what you can
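
To make point 2 concrete, here is a back-of-the-envelope sketch of how much data a remote job would send back under two saving strategies. Every number in it (layer count, batch size, sequence length, hidden size, float16 storage) is a hypothetical chosen for illustration, not a measurement from the experiment described above, and `tensor_bytes` is a helper defined here, not part of nnsight.

```python
from math import prod


def tensor_bytes(shape, bytes_per_element=2):
    """Size in bytes of a dense tensor (default assumes float16, 2 bytes each)."""
    return prod(shape) * bytes_per_element


# Hypothetical experiment dimensions, for illustration only.
n_layers, batch, seq_len, d_model = 32, 32, 512, 4096

# Strategy A: save the hidden states of every layer at every token position.
full = tensor_bytes((n_layers, batch, seq_len, d_model))

# Strategy B: save only the last-token hidden state of every layer.
last_token = tensor_bytes((n_layers, batch, 1, d_model))

print(f"all positions: {full / 2**30:.1f} GiB")        # all positions: 4.0 GiB
print(f"last token:    {last_token / 2**20:.1f} MiB")  # last token:    8.0 MiB
print(f"ratio:         {full // last_token}x")         # ratio:         512x
```

Under these assumptions, slicing or reducing on the server before calling .save() shrinks the download by a factor equal to the sequence length.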

Extending NNsight: From Custom Envoys to Your Own Model Class

By Jaden Fiotto-Kaufman

NNsight works out of the box on any torch.nn.Module. Wrap it, open a trace, read .output, save it. For a lot of interpretability work, that's all you need.

But the longer you spend doing this work, the more patterns you notice. You catch yourself writing the same six-line projection chain for every layer of a logit-lens sweep. You reshape attention heads in every single notebook. You wrap a model that isn't on HuggingFace and discover you now have to rebuild tokenization, batching, and generation by hand. You start wanting NNsight to speak your model's vocabulary.

NNsight is designed to be extended at exactly these points. This post is a cookbook of the extension surface — from lightweight per-module conveniences all the way down to custom execution backends. Pick the cheapest primitive that solves your problem; don't reach for a custom backend when a three-line eproperty will do.

Calling all Lies

AI Deception Is More Than Just Getting Facts Wrong

Models can lie about what they're capable of, fabricate plausible-sounding details under social pressure, strategically blend truth with fiction, or dodge questions they can't answer honestly. NDIF and Cadenza Labs are hosting a competition to study how models lie and are looking for red teams to create scenarios where models contradict their own beliefs (RFP: https://cadenza-labs.github.io/red-team-rfp/).

This competition is inspired by Liars’ Bench from Cadenza Labs. Their benchmark of over 72,000 labeled examples organizes LLM lies along two key dimensions: what the model lies about (world knowledge, its own capabilities, its actions, its policies) and why it lies (inherent behavioral patterns vs. context-driven pressure). The comprehensive benchmark spans from simple factual falsehoods to subtle introspective lies, and should serve as a starting point for red team scenario design.

In response to our RFP, we want red teams to develop a creative and diverse set of deception scenarios, which blue teams will then use to build novel and robust lie detection methods. In this blog post, we walk through an example red team project to inspire creative proposal submissions.

Introducing NNsight 0.6

By Jaden Fiotto-Kaufman

NNsight is releasing its sixth major version, focused on addressing user feedback about common hurdles with the library.

Wait, What is NNsight Again?

If you are a new user or you haven't used NNsight in a while, here's a small refresher! NNsight is a Python library for interpreting and intervening on the internals of PyTorch models. You wrap a model, open a tracing context, and read or write activations at any layer. While the tool supports any PyTorch model, we also provide first-class support for popular libraries such as 🤗 transformers and diffusers.

from nnsight import LanguageModel

model = LanguageModel("openai-community/gpt2", device_map="auto", dispatch=True)

with model.trace("The Eiffel Tower is in"):
    # Zero out the MLP output at layer 0.
    # Modules are accessed in forward-pass order, so layer 0 comes before layer 5.
    model.transformer.h[0].mlp.output[:] = 0

    # Read the hidden states at layer 5
    hidden = model.transformer.h[5].output[0].save()

Under the hood, NNsight uses deferred execution. When you enter with model.trace(...), your code AST is extracted, compiled into a function, and run in a worker thread. When that thread accesses .output, it waits until the model's forward pass reaches the selected module, extracting the desired output tensor through a PyTorch hook. This means your intervention code is fully aligned with the forward pass—no proxies, no fake tensors. You're working with real PyTorch values!
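
The waiting behaviour described above can be illustrated with a toy sketch that uses only the Python standard library. This is not NNsight's implementation: `ToyModule`, its `output` property, and the two lambda "layers" are hypothetical stand-ins, and a `threading.Event` plays the role of the PyTorch hook. It only shows the idea of an intervention thread blocking until the forward pass reaches the module it asked about.

```python
import threading


class ToyModule:
    """Hypothetical stand-in for a layer that publishes its output when it runs."""

    def __init__(self, fn):
        self.fn = fn
        self._ready = threading.Event()
        self._value = None

    def forward(self, x):
        self._value = self.fn(x)
        self._ready.set()  # analogous to a forward hook firing
        return self._value

    @property
    def output(self):
        self._ready.wait()  # block until the forward pass has reached this module
        return self._value


layers = [ToyModule(lambda x: x * 2), ToyModule(lambda x: x + 1)]
saved = {}


def intervention():
    # Runs in a worker thread; each access waits for the real value.
    saved["hidden"] = layers[0].output
    saved["final"] = layers[1].output


worker = threading.Thread(target=intervention)
worker.start()

# The "forward pass" runs on the main thread, layer by layer.
x = 3
for layer in layers:
    x = layer.forward(x)

worker.join()
print(saved)  # {'hidden': 6, 'final': 7}
```

The sketch only covers reads. Writing to an activation would also require the forward pass to pause until the worker has finished with that module, which is omitted here for brevity.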

NNterp Integration

We welcome nnterp to the NDIF ecosystem! The nnterp library is built on top of nnsight and provides a standardized transformer architecture for many LLMs, along with implementations of common interpretability techniques. Let's explore how it works!