Info
Last Execution: 2026-02-17
| Package | Version |
|---|---|
| nnsight | 0.5.15 |
| Python | 3.12.3 |
| torch | 2.10.0+cu128 |
| transformers | 5.2.0 |
Early Stopping¶
When you only need activations from the first few layers, there's no reason to run the full forward pass. Call tracer.stop() to halt execution early, saving time and compute.
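Conceptually, the early stop works like unwinding out of the forward pass as soon as the target module has run. Here is a toy sketch of that mechanism in plain Python — not nnsight's actual implementation; the layer stack and sentinel exception are invented purely for illustration:

```python
# Toy sketch of exception-based early stopping (illustration only,
# not how nnsight is implemented internally).
class _EarlyStop(Exception):
    """Sentinel used to unwind out of the layer loop."""

def run_layers(layers, x, stop_after=None):
    """Apply layers in order; abort after index `stop_after` if given."""
    executed = []
    try:
        for i, layer in enumerate(layers):
            x = layer(x)
            executed.append(i)
            if i == stop_after:
                raise _EarlyStop
    except _EarlyStop:
        pass
    return x, executed

# 12 "layers" standing in for GPT-2's transformer blocks
layers = [lambda v: v + 1 for _ in range(12)]
_, ran = run_layers(layers, 0, stop_after=5)
print(ran)  # only layers 0-5 executed
```

The point of the sketch is the control flow: once the stop condition fires, no later layer is ever entered, which is why the skipped layers cost nothing.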
Setup¶
from nnsight import LanguageModel
model = LanguageModel("openai-community/gpt2", device_map="auto", dispatch=True)
Stopping After a Layer¶
tracer.stop() halts model execution at the most recently accessed module. Everything after that point is skipped.
with model.trace("The Eiffel Tower is in the city of") as tracer:
    hs = model.transformer.h[5].output[0].save()
    tracer.stop()
print(f"Layer 5 output shape: {hs.shape}")
Layer 5 output shape: torch.Size([1, 10, 768])
The model never ran layers 6–11 or lm_head. Only layers 0–5 executed.
Code After stop() Is Skipped¶
Any intervention defined after tracer.stop() will never execute — the trace ends at that point.
with model.trace("The Eiffel Tower is in the city of") as tracer:
    hs_early = model.transformer.h[0].output[0].save()
    tracer.stop()
    # This line never runs
    hs_late = model.transformer.h[-1].output[0].save()
print(f"Layer 0: {hs_early.shape}")
try:
    print(hs_late)
except NameError:
    print("hs_late was never defined — stop() halted execution before it")
Layer 0: torch.Size([1, 10, 768])
hs_late was never defined — stop() halted execution before it
Everything after stop() is dead code
tracer.stop() immediately terminates the forward pass. Any .save() calls, module accesses, or other interventions defined below it will never execute. Place tracer.stop() after your last intervention.
Performance Benefit¶
Early stopping is faster because the model skips all remaining layers after the stop point.
import time

# Full forward pass
start = time.perf_counter()
for _ in range(50):
    with model.trace("The Eiffel Tower is in the city of"):
        hs = model.transformer.h[5].output[0].save()
full_time = time.perf_counter() - start
# Early stop after layer 5
start = time.perf_counter()
for _ in range(50):
    with model.trace("The Eiffel Tower is in the city of") as tracer:
        hs = model.transformer.h[5].output[0].save()
        tracer.stop()
stop_time = time.perf_counter() - start
print(f"Full forward pass (50 runs): {full_time:.3f}s")
print(f"Early stop at layer 5 (50 runs): {stop_time:.3f}s")
print(f"Speedup: {full_time / stop_time:.1f}x")
Full forward pass (50 runs): 0.419s
Early stop at layer 5 (50 runs): 0.256s
Speedup: 1.6x
When to use early stopping
- Collecting activations from early/middle layers — no need to run the full model
- Training probes or SAEs on intermediate representations — skip the layers you don't need
- Debugging — quickly check a specific layer's output without waiting for the full pass