Skip to content

Latest commit

 

History

History
213 lines (176 loc) · 7 KB

File metadata and controls

213 lines (176 loc) · 7 KB

StackRef Semantics

Author: Ken Jin

Prelude and Rationale

Due to PEP 703 (Making the Global Interpreter Lock Optional in CPython), tagged pointers will be introduced to CPython. Tagged pointers have in past attempts, been very hard to debug. From the author's own experience, there are the following non-exhaustive sources of bugs:

  1. Casting to-and-from a tagged pointer directly.
  2. Untagging a tagged pointer, then operating on it directly.
  3. Forgetting to convert deferred references to new references when required, this leads to hard-to-track segfaults.
  4. Using the wrong conversion function to convert a tagged pointer to a PyObject *.

We can solve 1. using the C compiler by making tagged pointers a struct, called _PyStackRef. 2. is quite easily detectable because it immediately leads to crashes during GC. However, 3. and 4. lead to hard-to-track reference leaks in the CPython interpreter.

Mark Shannon suggested since we are revamping the entire interpreter loop, we might as well make sure we do it right and in a principled way. Taking inspiration from HPy and Mark, I introduce _PyStackRef, basically a watered-down HPy for CPython that solves problems 3. and 4. automatically.

Definition in Words

(Copied from pycore_stackrefs.h)

This file introduces a new API for handling references on the stack, called _PyStackRef. This API is inspired by HPy. There are 3 main operations, that convert _PyStackRef to PyObject* and vice versa:

  1. Borrow (discouraged)
  2. Steal
  3. New

Borrow means that the reference is converted without any change in ownership. This is discouraged because it makes verification much harder. It also makes unboxed integers harder in the future.

Steal means that ownership is transferred to something else. The total number of references to the object stays the same.

New creates a new reference from the old reference. The old reference is still valid.

With these 3 API, a strict stack discipline must be maintained. All _PyStackRef must be operated on by the new reference operations:

  1. DUP
  2. CLOSE

DUP is roughly equivalent to Py_NewRef. It creates a new reference from an old reference. The old reference remains unchanged

CLOSE is roughly equivalent to Py_DECREF. It destroys a reference.

An implementation PR is at python/cpython#118450. It implements the foundations, without any of the debugging guarantees yet.

Definition in Operational Semantics

We define initial mapping:

$$ \begin{align} & \color{blue} live_{\text{PyStackRef}} \color{black} & = \{l \longmapsto \{0, 1\} \mid type(l) = \text{PyStackRef} \} \end{align} $$

The program state is defined as:

$$ (\color{blue} live_{\text{PyStackRef}} \color{black} ) $$

All operations are defined as operations on this state. These are the operational semantics:

$$ \begin{align} & borrow(Ref)[\text{PyStackRef} \hookrightarrow \text{PyObject}]: \\ &\hspace{10em} (\color{blue} live_{\text{PyStackRef}} \color{black}, \color{red} live_{\text{PyObject}} \color{black} ) \rightarrow (\color{blue} live_{\text{PyStackRef}} \color{black} ) \\ & steal(Ref)[\text{PyStackRef} \hookrightarrow \text{PyObject}]: \\ &\hspace{10em} (\color{blue} live_{\text{PyStackRef}} \color{black}, \color{red} live_{\text{PyObject}} \color{black} ) \rightarrow (\color{blue} live_{\text{PyStackRef}} \color{black} - Ref) \\ & steal(O)[\text{PyObject} \hookrightarrow \text{PyStackRef}]: \\ &\hspace{10em} (\color{blue} live_{\text{PyStackRef}} \color{black}, \color{red} live_{\text{PyObject}} \color{black} ) \rightarrow (\color{blue} live_{\text{PyStackRef}} \color{black} + O_{Ref}) \\ & new(Ref)[\text{PyStackRef} \hookrightarrow \text{PyObject}]: \\ &\hspace{10em} (\color{blue} live_{\text{PyStackRef}} \color{black}, \color{red} live_{\text{PyObject}} \color{black} ) \rightarrow (\color{blue} live_{\text{PyStackRef}} \color{black}) \\ & new(O)[\text{PyObject} \hookrightarrow \text{PyStackRef}]: \\ &\hspace{10em} (\color{blue} live_{\text{PyStackRef}} \color{black}, \color{red} live_{\text{PyObject}} \color{black} ) \rightarrow (\color{blue} live_{\text{PyStackRef}} \color{black} + O_{Ref}) \\ \end{align} $$

Additionally, these operations are used to manipulate references:

$$ % Note: defined in a separate block because the first expression is too big for GH. \begin{align} & dup(Ref): (\color{blue} live_{\text{PyStackRef}} \color{black}, \color{red} live_{\text{PyObject}} \color{black} ) \rightarrow (\color{blue} live_{\text{PyStackRef}} \color{black} + Ref_{Ref}) \\ & close(Ref): (\color{blue} live_{\text{PyStackRef}} \color{black}, \color{red} live_{\text{PyObject}} \color{black} ) \rightarrow (\color{blue} live_{\text{PyStackRef}} \color{black} - Ref) \\ \end{align} $$

  1. $O_{Ref}$ reads as "the reference of the object O".
  2. The $mapping - A$ operation must deduct $1$ from the corresponding $A$ in $mapping$.
  3. The $mapping + A$ operation should add $1$ to the corresponding $A$ in $mapping$.

Invariants (Detecting Unsoundness)

  1. $\text{PyStackRef}$ s are unique.
  2. $\text{PyObject}$ s are not necessarily unique (this is for compatibility with CPython).
  3. At normal function call frame exit, the program state should be $len(\text{live}) == 1$.
  4. $steal(Ref)[\text{PyStackRef} \hookrightarrow \text{PyObject}]$ should at the point of stealing, have exactly one stack ref available to steal.
  5. $steal(Ref)[\text{PyObject} \hookrightarrow \text{PyStackRef}]$ does not need to have exactly one PyObject * in the mapping, because of 2.
  6. $borrow(Ref)[\text{PyStackRef} \hookrightarrow \text{PyObject}]$ requires the stackref to map to 1. Ie. the stackref entry cannot be 0.
  7. The mapping for $\text{PyStackRef}$ can only map to 0 or 1.
  8. The mapping for $live_{\text{PyObject}}$ can never map to a negative number.

Examples

Forgot to close

x = PyStackRef_DUP(stackref)
// Error on frame pop, as live(PyStackRef) > 1

Passed dead reference to something

PyStackRef_CLOSE(stackref)
// Error!
foo(PyStackRef_AsPyObjectBorrow(stackref))
// borrow needs mapping to > 0.
foo(PyStackRef_AsPyObjectSteal(stackref))
// Error!
bar(PyStackRef_AsPyObjectSteal(stackref))
// steal needs mapping to 1.

Steal instead of new / extra close

# This should be new, or there should be no decref_inputs after.
steal_pyobject(PyStackRef_AsPyObjectSteal(stackref))
// Error!
DECREF_INPUTS()
// close needs mapping to 1.

Note: we can't fix most of the problems if borrow is used!

Implementation

We will only enforce invariants in debug builds. There will be no performance loss in release builds.

  1. For each thread state, we create an array that contains the mappings of references (handles) to actual $\text{PyObject}$.
  2. On each frame push, we note the current (most recent) reference ID.
  3. On frame pop, we note the current (most recent) reference ID. We then scan all the reference IDs from obtained in step 2 to 3 to ensure that there is only one live refrence ID at the point of exit.
  4. Each reference ID is just the index into the handle array.