Info
Last Execution: 2026-02-17
| Package | Version |
|---|---|
| nnsight | 0.5.15 |
| Python | 3.12.3 |
| torch | 2.10.0+cu128 |
| transformers | 5.2.0 |
Early Stopping¶
When you only need activations from the first few layers, there's no reason to run the full forward pass. Call tracer.stop() to halt execution early, saving time and compute.
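Conceptually, the early stop works like unwinding out of the forward pass as soon as the target module has run. Here is a toy sketch of that mechanism in plain Python — not nnsight's actual implementation; the layer stack and sentinel exception are invented purely for illustration:

```python
# Toy sketch of exception-based early stopping (illustration only,
# not how nnsight is implemented internally).
class _EarlyStop(Exception):
    """Sentinel used to unwind out of the layer loop."""

def run_layers(layers, x, stop_after=None):
    """Apply layers in order; abort after index `stop_after` if given."""
    executed = []
    try:
        for i, layer in enumerate(layers):
            x = layer(x)
            executed.append(i)
            if i == stop_after:
                raise _EarlyStop
    except _EarlyStop:
        pass
    return x, executed

# 12 "layers" standing in for GPT-2's transformer blocks
layers = [lambda v: v + 1 for _ in range(12)]
_, ran = run_layers(layers, 0, stop_after=5)
print(ran)  # only layers 0-5 executed
```

The point of the sketch is the control flow: once the stop condition fires, no later layer is ever entered, which is why the skipped layers cost nothing.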
Setup¶
from nnsight import LanguageModel
model = LanguageModel("openai-community/gpt2", device_map="auto", dispatch=True)
Stopping After a Layer¶
tracer.stop() halts model execution at the most recently accessed module. Everything after that point is skipped.
with model.trace("The Eiffel Tower is in the city of") as tracer:
    hs = model.transformer.h[5].output[0].save()
    tracer.stop()
print(f"Layer 5 output shape: {hs.shape}")
Layer 5 output shape: torch.Size([1, 10, 768])
The model never ran layers 6–11 or lm_head. Only layers 0–5 executed.
Code After stop() Is Skipped¶
Any intervention defined after tracer.stop() will never execute — the trace ends at that point.
with model.trace("The Eiffel Tower is in the city of") as tracer:
    hs_early = model.transformer.h[0].output[0].save()
    tracer.stop()
    # This line never runs
    hs_late = model.transformer.h[-1].output[0].save()
print(f"Layer 0: {hs_early.shape}")
try:
    print(hs_late)
except NameError:
    print("hs_late was never defined — stop() halted execution before it")
Layer 0: torch.Size([1, 10, 768])
hs_late was never defined — stop() halted execution before it
Everything after stop() is dead code
tracer.stop() immediately terminates the forward pass. Any .save() calls, module accesses, or other interventions defined below it will never execute. Place tracer.stop() after your last intervention.
Performance Benefit¶
Early stopping is faster because the model skips all remaining layers after the stop point.
import time

# Full forward pass
start = time.perf_counter()
for _ in range(50):
    with model.trace("The Eiffel Tower is in the city of"):
        hs = model.transformer.h[5].output[0].save()
full_time = time.perf_counter() - start
# Early stop after layer 5
start = time.perf_counter()
for _ in range(50):
    with model.trace("The Eiffel Tower is in the city of") as tracer:
        hs = model.transformer.h[5].output[0].save()
        tracer.stop()
stop_time = time.perf_counter() - start
print(f"Full forward pass (50 runs): {full_time:.3f}s")
print(f"Early stop at layer 5 (50 runs): {stop_time:.3f}s")
print(f"Speedup: {full_time / stop_time:.1f}x")
Full forward pass (50 runs): 0.419s
Early stop at layer 5 (50 runs): 0.256s
Speedup: 1.6x
When to use early stopping
- Collecting activations from early/middle layers — no need to run the full model
- Training probes or SAEs on intermediate representations — skip the layers you don't need
- Debugging — quickly check a specific layer's output without waiting for the full pass