serialization

Source-based function serialization for cross-version compatibility.

This module provides a custom serialization system built on top of cloudpickle that serializes Python functions by their source code rather than bytecode.

Why source-based serialization? Standard pickle stores only a reference to a function (its module and qualified name), so the function must already be importable on the receiving side. cloudpickle serializes dynamically defined functions by value using Python bytecode, which is version-specific and can break when deserialized on a different Python version. By serializing the source code instead, we can reconstruct functions on any Python version that supports the syntax, enabling cross-version compatibility for remote execution (e.g., client on Python 3.10, server on 3.11).
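The idea can be sketched with nothing more than the built-ins compile and exec: the payload is the function's source text, which the receiving interpreter recompiles for itself. This is a minimal illustration, not the module's implementation; the helper name rebuild_from_source and the "<serialized>" filename are hypothetical.

```python
import textwrap


def rebuild_from_source(source: str, name: str):
    """Recompile `source` and return the function called `name` (hypothetical helper)."""
    namespace = {}
    # Compilation happens on the receiving interpreter, so the resulting
    # bytecode always matches the Python version that will run it.
    code = compile(textwrap.dedent(source), "<serialized>", "exec")
    exec(code, namespace)
    return namespace[name]


# The source text is what would travel between client and server.
source = """
    def add(x, y=10):
        return x + y
"""

add = rebuild_from_source(source, "add")
result = add(5)
```

Because only text crosses the wire, the one compatibility requirement left is that the receiving Python version accepts the syntax used in the source.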

Key components
  • CustomCloudPickler: Serializes functions by capturing their source code, closure variables, and metadata instead of bytecode.
  • CustomCloudUnpickler: Deserializes data with support for persistent object references (objects that are looked up by ID instead of being serialized).
  • make_function: Reconstructs a function from its serialized components.
  • dumps/loads: High-level API for serializing and deserializing objects (named to match the standard pickle module API).

Persistent objects

Some objects (like model proxies or tensors) shouldn't be serialized directly; instead they are referenced by ID and resolved at deserialization time. Objects with a _persistent_id entry in their __dict__ are handled this way.

Examples:

>>> import serialization
>>> def my_func(x, y=10):
...     return x + y
>>> data = serialization.dumps(my_func)
>>> restored = serialization.loads(data)
>>> restored(5)  # Returns 15

DEFAULT_PROTOCOL module-attribute

DEFAULT_PROTOCOL = 4

save module-attribute

save = dumps

load module-attribute

load = loads

SerializedFrame

SerializedFrame(co_filename: str, co_firstlineno: int, co_name: str)

f_locals instance-attribute

f_locals = {}

f_globals instance-attribute

f_globals = {}

f_code instance-attribute

f_code = SimpleNamespace(co_filename=co_filename, co_firstlineno=co_firstlineno, co_name=co_name)

CustomCloudPickler

Bases: Pickler

A cloudpickle-based pickler that serializes functions by source code.

This pickler extends cloudpickle.Pickler to override how dynamic functions are serialized. Instead of using bytecode (which is Python version-specific), it captures the function's source code, enabling cross-version compatibility.

Key features
  • Source-based function serialization via _dynamic_function_reduce
  • Persistent object references via persistent_id for objects that shouldn't be fully serialized

Examples:

>>> import io
>>> def my_func(x):
...     return x * 2
>>> buffer = io.BytesIO()
>>> CustomCloudPickler(buffer).dump(my_func)
>>> # Function is now serialized with its source code
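
How a pickler can substitute source text for bytecode can be sketched with the standard library's pickle.Pickler.reducer_override hook (Python 3.8+). This is an illustration under stated assumptions, not the module's _dynamic_function_reduce: SourcePickler, rebuild, and the _source attribute are hypothetical, and the source is attached by hand where the real pickler would capture it from the function itself.

```python
import io
import pickle


def rebuild(source: str, name: str):
    """Recompile a function from its source text (hypothetical helper)."""
    namespace = {}
    exec(compile(source, "<serialized>", "exec"), namespace)
    return namespace[name]


class SourcePickler(pickle.Pickler):
    """Toy pickler: callables carrying a `_source` string are sent as text."""

    def reducer_override(self, obj):
        source = getattr(obj, "_source", None)
        if callable(obj) and isinstance(source, str):
            # (callable, args): the unpickler will call rebuild(source, name).
            return rebuild, (source, obj.__name__)
        # Everything else falls back to pickle's default behaviour.
        return NotImplemented


SOURCE = "def double(x):\n    return x * 2\n"
double = rebuild(SOURCE, "double")
double._source = SOURCE  # assumption: stands in for real source capture

buffer = io.BytesIO()
SourcePickler(buffer, protocol=4).dump({"fn": double, "n": 21})

# A plain unpickler is enough here: the stream only references `rebuild`.
restored = pickle.loads(buffer.getvalue())
answer = restored["fn"](restored["n"])
```

The stream contains the source string plus a reference to rebuild, so the receiving side needs that helper to be importable but never sees client-side bytecode.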

reducer_override

reducer_override(obj)

persistent_id

persistent_id(obj: Any) -> Optional[Any]

Return a persistent ID for objects that shouldn't be fully serialized.

Pickle's persistent_id mechanism allows certain objects to be referenced by an ID rather than serialized. During deserialization, persistent_load resolves these IDs back to actual objects.

This is critical for nnsight's remote execution, where certain objects (like model proxies, intervention graph nodes, or large tensors) must be looked up on the server side rather than serialized.

PARAMETER DESCRIPTION
obj

The object being pickled.

TYPE: Any

RETURNS DESCRIPTION
Optional[Any]

The persistent ID if obj has a _persistent_id in its __dict__, otherwise None (meaning pickle should serialize the object normally).

Examples:

An object with obj.__dict__["_persistent_id"] = "node_42" will be serialized as just the reference "node_42", and during deserialization, persistent_load("node_42") will be called to resolve it.
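
The round trip described above can be reproduced with the standard pickle module alone. The sketch below is an assumption-laden stand-in for the real classes: Node, RefPickler, and RefUnpickler are hypothetical names, and only the __dict__ lookup mirrors the documented behaviour.

```python
import io
import pickle


class Node:
    """Hypothetical stand-in for an object that must stay on the server."""

    def __init__(self, pid=None):
        if pid is not None:
            self._persistent_id = pid  # stored in the instance __dict__


class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Returning None tells pickle to serialize the object normally.
        return getattr(obj, "__dict__", {}).get("_persistent_id")


class RefUnpickler(pickle.Unpickler):
    def __init__(self, file, persistent_objects=None):
        super().__init__(file)
        self.persistent_objects = persistent_objects or {}

    def persistent_load(self, pid):
        return self.persistent_objects[pid]


# Client side: only the string "node_42" is written for the node.
client_node = Node("node_42")
buffer = io.BytesIO()
RefPickler(buffer, protocol=4).dump({"target": client_node, "scale": 2})

# Server side: the ID is resolved to whatever object the server holds.
server_node = Node()
buffer.seek(0)
restored = RefUnpickler(buffer, {"node_42": server_node}).load()
```

Note that the Node class itself never enters the stream, so the receiving side can substitute an entirely different object for the same ID.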

CustomCloudUnpickler

CustomCloudUnpickler(file: BinaryIO, persistent_objects: Optional[dict] = None)

Bases: Unpickler

A custom unpickler that resolves persistent object references.

Works in conjunction with CustomCloudPickler to handle objects that were serialized by reference (persistent_id) rather than by value. During deserialization, persistent IDs are looked up in the provided dictionary.

This enables patterns where certain objects (like model proxies or graph nodes) are referenced by ID in the serialized data and resolved to actual objects on the server side.

PARAMETER DESCRIPTION
file

File-like object to read pickle data from.

TYPE: BinaryIO

persistent_objects

Dictionary mapping persistent IDs to actual objects. When a persistent ID is encountered during deserialization, it's looked up in this dictionary.

TYPE: Optional[dict] DEFAULT: None

Examples:

>>> # On the server side (get_model_proxy and receive_from_client are placeholders)
>>> import io
>>> model_proxy = get_model_proxy("gpt2")
>>> persistent_objects = {"model_ref_1": model_proxy}
>>> data = receive_from_client()
>>> obj = CustomCloudUnpickler(io.BytesIO(data), persistent_objects).load()
>>> # Any references to "model_ref_1" in the data are now resolved

PARAMETER DESCRIPTION
file

Binary file-like object containing pickle data.

TYPE: BinaryIO

persistent_objects

Optional dict mapping persistent IDs to objects. Defaults to empty dict if not provided.

TYPE: Optional[dict] DEFAULT: None

persistent_objects instance-attribute

persistent_objects = persistent_objects or {}

persistent_load

persistent_load(pid: Any) -> Any

Resolve a persistent ID to its corresponding object.

Called automatically by pickle when it encounters a persistent reference (created by persistent_id during serialization).

PARAMETER DESCRIPTION
pid

The persistent ID to resolve.

TYPE: Any

RETURNS DESCRIPTION
Any

The object corresponding to the persistent ID.

RAISES DESCRIPTION
UnpicklingError

If the persistent ID is not found in the persistent_objects dictionary.
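
The failure path can be sketched the same way. This is a minimal illustration assuming a plain dict lookup; LookupUnpickler, RefPickler, and the error message text are hypothetical, and the real class may word its error differently.

```python
import io
import pickle
from types import SimpleNamespace


class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Objects with `_persistent_id` in their __dict__ become references.
        return getattr(obj, "__dict__", {}).get("_persistent_id")


class LookupUnpickler(pickle.Unpickler):
    """Toy unpickler resolving persistent IDs from a plain dict."""

    def __init__(self, file, persistent_objects=None):
        super().__init__(file)
        self.persistent_objects = persistent_objects or {}

    def persistent_load(self, pid):
        try:
            return self.persistent_objects[pid]
        except KeyError:
            # Unknown IDs are reported as unpickling failures, not KeyErrors.
            raise pickle.UnpicklingError(f"Unknown persistent id: {pid!r}") from None


buffer = io.BytesIO()
RefPickler(buffer, protocol=4).dump([SimpleNamespace(_persistent_id="model_ref_1")])
payload = buffer.getvalue()

# With the ID registered, the reference resolves to the mapped object.
resolved = LookupUnpickler(io.BytesIO(payload), {"model_ref_1": "the model"}).load()

# Without it, loading fails with UnpicklingError.
try:
    LookupUnpickler(io.BytesIO(payload)).load()
    error = None
except pickle.UnpicklingError as exc:
    error = str(exc)
```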

make_function

make_function(source: str, name: str, filename: Optional[str], qualname: str, module: str, doc: Optional[str], annotations: Optional[dict], defaults: Optional[tuple], kwdefaults: Optional[dict], base_globals: dict, closure_values: Optional[list], closure_names: Optional[list]) -> FunctionType

Reconstruct a function from its serialized source code and metadata.

This is the deserialization counterpart to CustomCloudPickler's function serialization. It recompiles source code and reconstructs the function with all its original attributes (defaults, annotations, closure, etc.).

This function creates the function with minimal globals. The full globals (including any self-references for recursive functions) are applied later by _source_function_setstate, which is called after pickle memoizes the function. This two-phase approach enables proper handling of circular references like recursive or mutually recursive functions.
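
The two-phase idea can be sketched with types.FunctionType. This is an illustration only: it does not reproduce _source_function_setstate or pickle's memoization, and the variable names are hypothetical. It shows why a recursive function is unusable until the second phase adds its own name to the globals.

```python
import builtins
import types

SOURCE = """
def factorial(n):
    return 1 if n <= 1 else n * factorial(n - 1)
"""

# Compile in a scratch namespace just to obtain the code object.
scratch = {}
exec(compile(SOURCE, "<serialized>", "exec"), scratch)
code = scratch["factorial"].__code__

# Phase 1: create the function with minimal globals (no self-reference yet).
base_globals = {"__builtins__": builtins}
func = types.FunctionType(code, base_globals, "factorial")

try:
    func(3)  # the recursive call looks up `factorial` in the globals and fails
    early_error = None
except NameError as exc:
    early_error = exc

# Phase 2: the function object now exists, so it can be placed in its own
# globals (the real module does this after pickle has memoized the function).
func.__globals__.update({"factorial": func})
value = func(5)
```

Mutually recursive functions follow the same pattern: both objects are created first, and each is then added to the other's globals.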

PARAMETER DESCRIPTION
source

The function's source code as a string. May be indented.

TYPE: str

name

The function's __name__ attribute.

TYPE: str

filename

Original filename where the function was defined. Used for tracebacks and debugging. Falls back to "" if None.

TYPE: Optional[str]

qualname

The function's __qualname__ (qualified name including class).

TYPE: str

module

The function's __module__ attribute.

TYPE: str

doc

The function's docstring (__doc__).

TYPE: Optional[str]

annotations

Type annotations dict (__annotations__).

TYPE: Optional[dict]

defaults

Default values for positional arguments (__defaults__).

TYPE: Optional[tuple]

kwdefaults

Default values for keyword-only arguments (__kwdefaults__).

TYPE: Optional[dict]

base_globals

Minimal global variables dict. Full globals including self-references are added later by _source_function_setstate.

TYPE: dict

closure_values

List of closure variable values (passed immediately, not deferred, because closures need a factory pattern to bind properly).

TYPE: Optional[list]

closure_names

List of closure variable names (co_freevars).

TYPE: Optional[list]

RETURNS DESCRIPTION
FunctionType

A newly constructed function object. Note: the globals are minimal at this point; they are filled in by _source_function_setstate.

RAISES DESCRIPTION
ValueError

If the function name cannot be found in the compiled source.
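
The factory pattern mentioned for closure_values can be sketched as follows: the source is wrapped in an outer function whose parameters are the closure names, so the recompiled inner function gets real closure cells. This is a hedged illustration, not the body of make_function; rebuild_with_closure and _factory are hypothetical names, and the name check via ast is one possible way to produce the documented ValueError.

```python
import ast
import textwrap


def rebuild_with_closure(source, name, closure_names, closure_values):
    """Rebuild `name` so that `closure_names` become closure cells (sketch)."""
    source = textwrap.dedent(source)
    tree = ast.parse(source)
    if not any(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name
        for node in tree.body
    ):
        raise ValueError(f"Function {name!r} not found in source")

    # Nest the original definition inside a factory whose parameters are the
    # free variables; returning the inner function binds them as a closure.
    factory_source = (
        f"def _factory({', '.join(closure_names)}):\n"
        f"{textwrap.indent(source, '    ')}\n"
        f"    return {name}\n"
    )
    namespace = {}
    exec(compile(factory_source, "<serialized>", "exec"), namespace)
    return namespace["_factory"](*closure_values)


SOURCE = """
    def scale(x):
        return x * factor + offset
"""

scale = rebuild_with_closure(SOURCE, "scale", ["factor", "offset"], [3, 1])
scaled = scale(2)
```

Passing the values at construction time is what makes this work: a closure cell has to exist when the inner def executes, which is why these values cannot be deferred the way globals are.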

make_frame

make_frame(co_filename: str, co_firstlineno: int, co_name: str) -> tuple

dumps

dumps(obj: Any, path: Optional[Union[str, Path]] = None, protocol: int = DEFAULT_PROTOCOL) -> Optional[bytes]

Serialize an object using source-based function serialization.

This is the high-level API for serializing objects with CustomCloudPickler. Functions in the object graph will be serialized by source code rather than bytecode, enabling cross-Python-version compatibility.

PARAMETER DESCRIPTION
obj

Any picklable object. Functions will be serialized by source code.

TYPE: Any

path

Optional file path to write the serialized data to. Accepts both string paths and pathlib.Path objects. If None, returns the serialized bytes directly.

TYPE: Optional[Union[str, Path]] DEFAULT: None

protocol

Pickle protocol version to use. Defaults to DEFAULT_PROTOCOL (4). Protocol 4 is available in Python 3.4+ and supports large objects.

TYPE: int DEFAULT: DEFAULT_PROTOCOL

RETURNS DESCRIPTION
Optional[bytes]

If path is None: the serialized data as bytes. If path is provided: None (the data is written to the file).

Examples:

>>> # Serialize to bytes (for network transmission)
>>> data = dumps(my_function)
>>> send_to_server(data)
>>>
>>> # Serialize to file (for persistence)
>>> dumps(my_function, "/path/to/function.pkl")
>>>
>>> # Using pathlib.Path
>>> from pathlib import Path
>>> dumps(my_function, Path("./functions/my_func.pkl"))
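
The bytes-or-file contract can be sketched as below. This is a stand-in, not the module's code: dumps_sketch is a hypothetical name, and the standard pickle.Pickler is used where the real function would use CustomCloudPickler.

```python
import io
import pickle
import tempfile
from pathlib import Path
from typing import Any, Optional, Union

DEFAULT_PROTOCOL = 4


def dumps_sketch(
    obj: Any,
    path: Optional[Union[str, Path]] = None,
    protocol: int = DEFAULT_PROTOCOL,
) -> Optional[bytes]:
    """Return bytes when `path` is None, otherwise write the file and return None."""
    buffer = io.BytesIO()
    # Stand-in: the real module would use CustomCloudPickler here.
    pickle.Pickler(buffer, protocol=protocol).dump(obj)
    data = buffer.getvalue()
    if path is None:
        return data
    Path(path).write_bytes(data)  # Path() accepts both str and Path inputs
    return None


payload = dumps_sketch({"answer": 42})

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "obj.pkl"
    returned = dumps_sketch({"answer": 42}, target)
    on_disk = target.read_bytes()
```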

loads

loads(data: Union[str, bytes, Path], persistent_objects: Optional[dict] = None) -> Any

Deserialize data that was serialized with dumps().

This is the high-level API for deserializing objects with CustomCloudUnpickler. Functions serialized by source code will be reconstructed by recompiling their source on the current Python version.

PARAMETER DESCRIPTION
data

One of:
  • bytes: Serialized data (e.g., received over network)
  • str: Path to a file containing serialized data
  • Path: pathlib.Path to a file containing serialized data

TYPE: Union[str, bytes, Path]

persistent_objects

Optional dictionary mapping persistent IDs to objects. Used to resolve objects that were serialized by reference rather than by value. See CustomCloudUnpickler for details.

TYPE: Optional[dict] DEFAULT: None

RETURNS DESCRIPTION
Any

The deserialized object.

RAISES DESCRIPTION
UnpicklingError

If a persistent ID is encountered that isn't in the persistent_objects dictionary.

FileNotFoundError

If a file path is provided but the file doesn't exist.

Examples:

>>> # Load from bytes (received over network)
>>> data = receive_from_client()
>>> obj = loads(data)
>>>
>>> # Load from file (string path)
>>> obj = loads("/path/to/function.pkl")
>>>
>>> # Load from file (pathlib.Path)
>>> from pathlib import Path
>>> obj = loads(Path("./functions/my_func.pkl"))
>>>
>>> # Load with persistent object resolution
>>> persistent = {"model_proxy": actual_model}
>>> obj = loads(data, persistent_objects=persistent)
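
The dispatch on the type of data can be sketched as follows. This is an assumption-based stand-in: loads_sketch is a hypothetical name, the standard pickle.Unpickler replaces CustomCloudUnpickler, and persistent_objects is left out to keep the example short.

```python
import io
import pickle
import tempfile
from pathlib import Path
from typing import Any, Union


def loads_sketch(data: Union[str, bytes, Path]) -> Any:
    """Accept raw bytes or a file path, mirroring the documented `data` types."""
    if isinstance(data, (str, Path)):
        # A str is treated as a path, never as pickle text; a missing file
        # surfaces as FileNotFoundError from read_bytes().
        data = Path(data).read_bytes()
    # Stand-in: the real module would use CustomCloudUnpickler here.
    return pickle.Unpickler(io.BytesIO(data)).load()


payload = pickle.dumps([1, 2, 3], protocol=4)
from_bytes = loads_sketch(payload)

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "obj.pkl"
    target.write_bytes(payload)
    from_str = loads_sketch(str(target))
    from_path = loads_sketch(target)
    try:
        loads_sketch(Path(tmp) / "missing.pkl")
        missing_error = None
    except FileNotFoundError as exc:
        missing_error = exc
```

One consequence of this signature is that a str argument can never carry serialized data directly; callers holding pickle data must pass bytes.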