serialization¶
Source-based function serialization for cross-version compatibility.
This module provides a custom serialization system built on top of cloudpickle that serializes Python functions by their source code rather than bytecode.
Why source-based serialization? Standard pickle/cloudpickle serialize functions using Python bytecode, which is version-specific and can break when deserializing on a different Python version. By serializing the source code instead, we can reconstruct functions on any Python version that supports the syntax, enabling cross-version compatibility for remote execution (e.g., client on Python 3.10, server on 3.11).
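As a minimal illustration of the idea (a stdlib-only sketch, not this module's actual implementation), a function can be shipped as source text and recompiled on the receiving interpreter; only the syntax, not the bytecode, has to be compatible:

```python
# "Serialize": on the sending side, the function's source text is captured
# (e.g., via inspect.getsource); here it's written out literally.
source = """
def scale(x, factor=3):
    return x * factor
"""

# "Deserialize": recompile the text on the receiving interpreter.
namespace = {}
exec(compile(source, "<shipped>", "exec"), namespace)
restored = namespace["scale"]

print(restored(7))  # 21
```

The real module layers closure capture, persistent references, and metadata restoration on top of this basic recompile step.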
Key components
- CustomCloudPickler: Serializes functions by capturing their source code, closure variables, and metadata instead of bytecode.
- CustomCloudUnpickler: Deserializes data with support for persistent object references (objects that shouldn't be serialized but looked up by ID).
- make_function: Reconstructs a function from its serialized components.
- dumps/loads: High-level API for serializing and deserializing objects (named to match the standard pickle module API).
Persistent objects
Some objects (like model proxies or tensors) shouldn't be serialized directly but instead referenced by ID and resolved at deserialization time. Objects with a `_persistent_id` entry in their `__dict__` are handled this way.
Examples:
>>> import serialization
>>> def my_func(x, y=10):
... return x + y
>>> data = serialization.dumps(my_func)
>>> restored = serialization.loads(data)
>>> restored(5) # Returns 15
SerializedFrame
¶
CustomCloudPickler
¶
Bases: Pickler
A cloudpickle-based pickler that serializes functions by source code.
This pickler extends cloudpickle.Pickler to override how dynamic functions are serialized. Instead of using bytecode (which is Python version-specific), it captures the function's source code, enabling cross-version compatibility.
Key features
- Source-based function serialization via _dynamic_function_reduce
- Persistent object references via persistent_id for objects that shouldn't be fully serialized
Examples:
>>> import io
>>> def my_func(x):
... return x * 2
>>> buffer = io.BytesIO()
>>> CustomCloudPickler(buffer).dump(my_func)
>>> # Function is now serialized with its source code
persistent_id
¶
Return a persistent ID for objects that shouldn't be fully serialized.
Pickle's persistent_id mechanism allows certain objects to be referenced by an ID rather than serialized. During deserialization, persistent_load resolves these IDs back to actual objects.
This is critical for nnsight's remote execution where certain objects (like model proxies, intervention graph nodes, or large tensors) should not be serialized but instead looked up on the server side.
| PARAMETER | DESCRIPTION |
|---|---|
| `obj` | The object being pickled. |

| RETURNS | DESCRIPTION |
|---|---|
| `Optional[Any]` | The persistent ID if `obj` has a `_persistent_id` entry in its `__dict__`; otherwise `None` (meaning pickle should serialize normally). |
Examples:
An object with `obj.__dict__["_persistent_id"] = "node_42"` will be serialized as just the reference `"node_42"`, and during deserialization, `persistent_load("node_42")` will be called to resolve it.
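The diversion described above rides on pickle's standard `persistent_id` hook. The following is a simplified stdlib sketch of that mechanism (the `ModelRef` and `RefPickler` names are illustrative, not part of this module):

```python
import io
import pickle

class ModelRef:
    """Hypothetical stand-in for an object that must not be pickled by value."""
    def __init__(self, ref_id):
        self._persistent_id = ref_id

class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Divert any object carrying `_persistent_id` in its __dict__.
        # Returning None tells pickle to serialize the object normally.
        return getattr(obj, "__dict__", {}).get("_persistent_id")

buf = io.BytesIO()
RefPickler(buf).dump({"model": ModelRef("node_42"), "scale": 2.0})
# The stream now contains only the reference "node_42",
# not ModelRef's actual state.
```

Regular values (the `"scale"` entry here) are unaffected; only objects that opt in via `_persistent_id` are replaced by a reference.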
CustomCloudUnpickler
¶
Bases: Unpickler
A custom unpickler that resolves persistent object references.
Works in conjunction with CustomCloudPickler to handle objects that were serialized by reference (persistent_id) rather than by value. During deserialization, persistent IDs are looked up in the provided dictionary.
This enables patterns where certain objects (like model proxies or graph nodes) are referenced by ID in the serialized data and resolved to actual objects on the server side.
| PARAMETER | DESCRIPTION |
|---|---|
| `file` | File-like object to read pickle data from. |
| `persistent_objects` | Dictionary mapping persistent IDs to actual objects. When a persistent ID is encountered during deserialization, it's looked up in this dictionary. |
Examples:
>>> # On the server side
>>> model_proxy = get_model_proxy("gpt2")
>>> persistent_objects = {"model_ref_1": model_proxy}
>>> data = receive_from_client()
>>> obj = CustomCloudUnpickler(io.BytesIO(data), persistent_objects).load()
>>> # Any references to "model_ref_1" in the data are now resolved
| PARAMETER | DESCRIPTION |
|---|---|
| `file` | Binary file-like object containing pickle data. |
| `persistent_objects` | Optional dict mapping persistent IDs to objects. Defaults to an empty dict if not provided. |
persistent_load
¶
Resolve a persistent ID to its corresponding object.
Called automatically by pickle when it encounters a persistent reference (created by persistent_id during serialization).
| PARAMETER | DESCRIPTION |
|---|---|
| `pid` | The persistent ID to resolve. |

| RETURNS | DESCRIPTION |
|---|---|
| `Any` | The object corresponding to the persistent ID. |

| RAISES | DESCRIPTION |
|---|---|
| `UnpicklingError` | If the persistent ID is not found in the `persistent_objects` dictionary. |
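The full round trip can be sketched with stdlib pickle alone (a simplified illustration under assumed names like `Handle`; the real classes add source-based function handling on top):

```python
import io
import pickle

class Handle:
    """Hypothetical placeholder serialized by reference, not by value."""
    def __init__(self, ref_id):
        self._persistent_id = ref_id

class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Emit a persistent reference for opted-in objects, None otherwise.
        return getattr(obj, "__dict__", {}).get("_persistent_id")

class RefUnpickler(pickle.Unpickler):
    def __init__(self, file, persistent_objects=None):
        super().__init__(file)
        self.persistent_objects = persistent_objects or {}

    def persistent_load(self, pid):
        # Resolve the reference, or fail loudly like the real class does.
        try:
            return self.persistent_objects[pid]
        except KeyError:
            raise pickle.UnpicklingError(f"unresolved persistent id: {pid!r}")

buf = io.BytesIO()
RefPickler(buf).dump(["payload", Handle("model_ref_1")])

server_side_proxy = object()  # what the ID resolves to on the "server"
out = RefUnpickler(io.BytesIO(buf.getvalue()),
                   {"model_ref_1": server_side_proxy}).load()
assert out[1] is server_side_proxy
```

Note that the resolved object is returned by identity: the deserialized structure points at the very object registered in `persistent_objects`.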
make_function
¶
make_function(source: str, name: str, filename: Optional[str], qualname: str, module: str, doc: Optional[str], annotations: Optional[dict], defaults: Optional[tuple], kwdefaults: Optional[dict], base_globals: dict, closure_values: Optional[list], closure_names: Optional[list]) -> FunctionType
Reconstruct a function from its serialized source code and metadata.
This is the deserialization counterpart to CustomCloudPickler's function serialization. It recompiles source code and reconstructs the function with all its original attributes (defaults, annotations, closure, etc.).
This function creates the function with minimal globals. The full globals (including any self-references for recursive functions) are applied later by _source_function_setstate, which is called after pickle memoizes the function. This two-phase approach enables proper handling of circular references like recursive or mutually recursive functions.
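A stripped-down sketch of this reconstruction step (assumed helper `rebuild_function`; the real `make_function` handles many more attributes and the deferred-globals phase):

```python
import textwrap
from types import FunctionType

def rebuild_function(source, name, defaults=None, base_globals=None):
    # Compile the (possibly indented) source and locate the named
    # function's code object among the module's constants.
    code = compile(textwrap.dedent(source), "<reconstructed>", "exec")
    for const in code.co_consts:
        if getattr(const, "co_name", None) == name:
            func = FunctionType(const, base_globals or {"__builtins__": __builtins__}, name)
            func.__defaults__ = defaults  # restore positional defaults
            return func
    raise ValueError(f"function {name!r} not found in compiled source")

# Indented source, as it might have been captured from inside a class body.
f = rebuild_function("    def add(x, y=10):\n        return x + y\n",
                     "add", defaults=(10,))
print(f(5))  # 15
```

Constructing the function via `FunctionType` rather than `exec`ing the `def` statement is what allows the globals dict to start minimal and be populated later.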
| PARAMETER | DESCRIPTION |
|---|---|
| `source` | The function's source code as a string. May be indented. TYPE: `str` |
| `name` | The function's `__name__` attribute. TYPE: `str` |
| `filename` | Original filename where the function was defined. Used for tracebacks and debugging. Falls back to a default placeholder when not provided. TYPE: `Optional[str]` |
| `qualname` | The function's `__qualname__` (qualified name, including any enclosing class). TYPE: `str` |
| `module` | The function's `__module__` attribute. TYPE: `str` |
| `doc` | The function's docstring (`__doc__`). TYPE: `Optional[str]` |
| `annotations` | Type annotations dict (`__annotations__`). TYPE: `Optional[dict]` |
| `defaults` | Default values for positional arguments (`__defaults__`). TYPE: `Optional[tuple]` |
| `kwdefaults` | Default values for keyword-only arguments (`__kwdefaults__`). TYPE: `Optional[dict]` |
| `base_globals` | Minimal global variables dict. Full globals, including self-references, are added later by `_source_function_setstate`. TYPE: `dict` |
| `closure_values` | List of closure variable values (applied immediately rather than deferred, because closure cells must be bound via a factory function). TYPE: `Optional[list]` |
| `closure_names` | List of closure variable names (`co_freevars`). TYPE: `Optional[list]` |

| RETURNS | DESCRIPTION |
|---|---|
| `FunctionType` | A newly constructed function object. Note: the globals are minimal at this point; they are filled in later by `_source_function_setstate`. |

| RAISES | DESCRIPTION |
|---|---|
| `ValueError` | If the function name cannot be found in the compiled source. |
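The "factory pattern" mentioned for `closure_values` refers to creating real closure cells by closing over each value in a throwaway function. A hedged stdlib sketch of how a closure can be rebuilt this way (illustrative code, not the module's own):

```python
import textwrap
from types import FunctionType

def make_cells(values):
    # Build genuine closure cells by creating a small closure over each
    # value; inner.__closure__[0] is a cell holding v.
    def cell_factory(v):
        def inner():
            return v
        return inner.__closure__[0]
    return tuple(cell_factory(v) for v in values)

source = textwrap.dedent("""
    def outer():
        offset = 7
        def shifted(x):
            return x + offset
        return shifted
    """)

# Compile and locate the inner function's code object; it references
# the free variable `offset` (listed in co_freevars).
module_code = compile(source, "<reconstructed>", "exec")
outer_code = next(c for c in module_code.co_consts
                  if getattr(c, "co_name", None) == "outer")
inner_code = next(c for c in outer_code.co_consts
                  if getattr(c, "co_name", None) == "shifted")

# The closure tuple must match co_freevars, one cell per free variable.
restored = FunctionType(inner_code, {}, "shifted", None, make_cells([7]))
print(restored(5))  # 12
```

This is why closure values have to be supplied at construction time: `FunctionType` requires a cell tuple matching `co_freevars` up front, whereas globals can be patched into the function's dict afterwards.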
dumps
¶
dumps(obj: Any, path: Optional[Union[str, Path]] = None, protocol: int = DEFAULT_PROTOCOL) -> Optional[bytes]
Serialize an object using source-based function serialization.
This is the high-level API for serializing objects with CustomCloudPickler. Functions in the object graph will be serialized by source code rather than bytecode, enabling cross-Python-version compatibility.
| PARAMETER | DESCRIPTION |
|---|---|
| `obj` | Any picklable object. Functions will be serialized by source code. TYPE: `Any` |
| `path` | Optional file path to write the serialized data to. Accepts both string paths and `pathlib.Path` objects. If `None`, returns the serialized bytes directly. TYPE: `Optional[Union[str, Path]]` |
| `protocol` | Pickle protocol version to use. Defaults to `DEFAULT_PROTOCOL` (4). Protocol 4 is available in Python 3.4+ and supports large objects. TYPE: `int` |

| RETURNS | DESCRIPTION |
|---|---|
| `Optional[bytes]` | If `path` is `None`: the serialized data as bytes. If `path` is provided: `None` (data is written to the file). |
Examples:
>>> # Serialize to bytes (for network transmission)
>>> data = dumps(my_function)
>>> send_to_server(data)
>>>
>>> # Serialize to file (for persistence)
>>> dumps(my_function, "/path/to/function.pkl")
>>>
>>> # Using pathlib.Path
>>> from pathlib import Path
>>> dumps(my_function, Path("./functions/my_func.pkl"))
loads
¶
Deserialize data that was serialized with dumps().
This is the high-level API for deserializing objects with CustomCloudUnpickler. Functions serialized by source code will be reconstructed by recompiling their source on the current Python version.
| PARAMETER | DESCRIPTION |
|---|---|
| `data` | One of: `bytes` (serialized data, e.g. received over a network), `str` (path to a file containing serialized data), or `Path` (a `pathlib.Path` to such a file). |
| `persistent_objects` | Optional dictionary mapping persistent IDs to objects. Used to resolve objects that were serialized by reference rather than by value. See `CustomCloudUnpickler` for details. |

| RETURNS | DESCRIPTION |
|---|---|
| `Any` | The deserialized object. |

| RAISES | DESCRIPTION |
|---|---|
| `UnpicklingError` | If a persistent ID is encountered that isn't in the `persistent_objects` dictionary. |
| `FileNotFoundError` | If a file path is provided but the file doesn't exist. |
Examples:
>>> # Load from bytes (received over network)
>>> data = receive_from_client()
>>> obj = loads(data)
>>>
>>> # Load from file (string path)
>>> obj = loads("/path/to/function.pkl")
>>>
>>> # Load from file (pathlib.Path)
>>> from pathlib import Path
>>> obj = loads(Path("./functions/my_func.pkl"))
>>>
>>> # Load with persistent object resolution
>>> persistent = {"model_proxy": actual_model}
>>> obj = loads(data, persistent_objects=persistent)