
How to make your NDIF experiment 130x faster

A user reached out to me recently asking how to make their nnsight code run faster on NDIF to meet a project deadline. After reviewing their code, I made a number of improvements that leverage nnsight features and remote-execution principles. The result was a 130x speedup.

The exercise taught me several useful lessons, and I want to share the key principles with you so that you, too, can implement your experiments optimally for remote execution.

TL;DR

  1. If you're doing more than one forward pass, wrap them in a `model.session()`
  2. Downloading large tensors can be costly, so only `.save()` what you need
  3. Cache all your activations in one go
  4. Reduce loops by batching invokes
  5. `.skip()` what you can
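To make principle 2 concrete, here is a back-of-the-envelope calculation of how much data a `.save()` transfers when you download every layer's full hidden states versus only the final-token slice you actually need. The model dimensions below are hypothetical, chosen only for illustration:

```python
# Hypothetical dimensions for a mid-sized transformer (illustrative, not NDIF specifics).
n_layers, seq_len, d_model = 32, 1024, 4096
bytes_per_element = 2  # fp16

# Saving every layer's full hidden states for the whole sequence:
full_bytes = n_layers * seq_len * d_model * bytes_per_element

# Saving only the final-token hidden state at each layer:
slice_bytes = n_layers * 1 * d_model * bytes_per_element

print(f"full:  {full_bytes / 1e6:.0f} MB")   # ~268 MB
print(f"slice: {slice_bytes / 1e6:.3f} MB")  # ~0.262 MB
print(f"ratio: {full_bytes // slice_bytes}x less data")  # 1024x
```

Indexing the proxy before calling `.save()` (e.g. saving `output[:, -1]` rather than `output`) means the server ships only the slice, which is where savings like this come from.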

Extending NNsight: From Custom Envoys to Your Own Model Class

By Jaden Fiotto-Kaufman

NNsight works out of the box on any `torch.nn.Module`. Wrap it, open a trace, read `.output`, save it. For a lot of interpretability work, that's all you need.

But the longer you spend doing this work, the more patterns you notice. You catch yourself writing the same six-line projection chain for every layer of a logit-lens sweep. You reshape attention heads in every single notebook. You wrap a model that isn't on HuggingFace and discover you now have to rebuild tokenization, batching, and generation by hand. You start wanting NNsight to speak your model's vocabulary.

NNsight is designed to be extended at exactly these points. This post is a cookbook of the extension surface — from lightweight per-module conveniences all the way down to custom execution backends. Pick the cheapest primitive that solves your problem; don't reach for a custom backend when a three-line eproperty will do.