A mypy plugin for optional fields

The framework I worked on at Superlinked ships a small DSL for declaring schemas. Users write something like:

class Product(sl.Schema):
    id: sl.IdField
    title: sl.String
    image: sl.Blob | None
    rating: sl.Integer | None

The | None is there because at ingest time these fields can legitimately be missing from the payload. The user-facing mental model is “this field might not be set.”

Inside the framework, none of that is actually true. Each attribute is a descriptor (a String, Blob, or Integer instance) and it is always present on the class. It is the underlying value that may be missing, not the field object itself. So when downstream code does Product.image, it gets a Blob instance every time, never None.

mypy does not know this. It sees image: Blob | None, infers Blob | None for every access, and then tells the user that every comparison like Product.image == something might blow up on None. The user gets a wall of red on perfectly correct code.

You have a few options here. You can lie about the type and drop the | None, but then you lose the ergonomics for users at ingest. You can make users write cast everywhere, which also kills the ergonomics. Or you can write a mypy plugin that knows the lie and tells the truth.

The plugin


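mypy plugins are sparsely documented, but they are small. The relevant hook is get_attribute_hook. mypy calls it with the fully qualified name of an attribute (module, class, attribute name), and if it returns a callback, mypy uses the type that callback returns for the access. Strictly, get_attribute_hook covers access through instances. Class-level access such as Product.image goes through the parallel get_class_attribute_hook, which takes the same kind of callback. Here is the shape, sanitized: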

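from __future__ import annotations

from typing import Callable

from mypy.nodes import TypeInfo
from mypy.options import Options
from mypy.plugin import AttributeContext, Plugin
from mypy.types import NoneType, Type, UnionType, get_proper_type

# Hypothetical: fully qualified name of the DSL's Schema base class.
SCHEMA_BASE_FULLNAME = "myproject.schema.Schema"


class SchemaFieldPlugin(Plugin):
    def __init__(self, options: Options) -> None:
        super().__init__(options)
        # Per-plugin caches; assumes declared field types are fixed for the run.
        self._schema_subclass_cache: dict[str, bool] = {}
        self._type_transform_cache: dict[str, Type] = {}

    def get_attribute_hook(self, fullname: str) -> Callable[[AttributeContext], Type] | None:
        # fullname is "<module>.<Class>.<attr>"; keep only the class part.
        cls_name, _, _attr = fullname.rpartition(".")
        if not self._is_schema_subclass(cls_name):
            return None
        return lambda ctx: self._strip_none(fullname, ctx)

    def _is_schema_subclass(self, cls_fullname: str) -> bool:
        cached = self._schema_subclass_cache.get(cls_fullname)
        if cached is not None:
            return cached
        # Resolve the class through mypy's symbol tables; TypeInfo.has_base
        # walks the MRO looking for the Schema base.
        sym = self.lookup_fully_qualified(cls_fullname)
        result = (
            sym is not None
            and isinstance(sym.node, TypeInfo)
            and sym.node.has_base(SCHEMA_BASE_FULLNAME)
        )
        self._schema_subclass_cache[cls_fullname] = result
        return result

    def _strip_none(self, fullname: str, ctx: AttributeContext) -> Type:
        cached = self._type_transform_cache.get(fullname)
        if cached is not None:
            return cached
        result = ctx.default_attr_type
        # Expand type aliases before inspecting the union's members.
        t = get_proper_type(result)
        if isinstance(t, UnionType):
            non_none = [i for i in t.items
                        if not isinstance(get_proper_type(i), NoneType)]
            if non_none and len(non_none) < len(t.items):
                # make_union returns the bare item when only one is left.
                result = UnionType.make_union(non_none)
        self._type_transform_cache[fullname] = result
        return result


def plugin(version: str) -> type[Plugin]:
    return SchemaFieldPlugin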

Wired up via pyproject.toml:

[tool.mypy]
plugins = ["myproject.mypy_plugin.schema_field_plugin"]

That is the whole thing. About 100 lines once you add the MRO walk and the imports.

The caches matter quite a bit. mypy asks the plugin for an attribute hook on every attribute access in every file. Without _schema_subclass_cache and _type_transform_cache, each of those requests repeats the symbol-table lookup and MRO walk, and mypy gets noticeably slower on large codebases. The caches turn the hot path into a dict lookup. Mine cut a 12-second mypy run to about 7 seconds on the same project.
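The subclass cache is plain memoization. In the toy model below, a made-up {class: [bases]} table stands in for mypy's symbol tables, and all class names are hypothetical. It shows that each class is walked once and every later lookup is a dict hit. It makes no claim about real timings.

```python
from collections.abc import Callable, Mapping


class SubclassCache:
    """Memoize an expensive 'is this class a Schema subclass?' check by name."""

    def __init__(self, walk_mro: Callable[[str], bool]) -> None:
        self._walk_mro = walk_mro
        self._cache: dict[str, bool] = {}

    def __call__(self, cls_fullname: str) -> bool:
        cached = self._cache.get(cls_fullname)
        if cached is not None:  # hot path: a single dict lookup
            return cached
        result = self._walk_mro(cls_fullname)
        self._cache[cls_fullname] = result
        return result


def make_mro_walker(
    bases: Mapping[str, list[str]], target: str, calls: list[str]
) -> Callable[[str], bool]:
    """Build a toy MRO walk over a {class: [bases]} table, logging each walk."""

    def walk(cls_fullname: str) -> bool:
        calls.append(cls_fullname)
        stack, seen = [cls_fullname], set()
        while stack:
            name = stack.pop()
            if name == target:
                return True
            if name not in seen:
                seen.add(name)
                stack.extend(bases.get(name, []))
        return False

    return walk
```

Checking "app.Product" and "app.Other" a thousand times each triggers exactly two walks. Note that cached False results are stored too, which matters because most classes mypy sees are not schemas.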

A mypy plugin is a hammer, though. You reach for one when three conditions hold: a descriptor-based API genuinely contradicts what the type system can express, that contradiction shows up at every access site rather than at just a few, and the alternative is shipping cast calls or a fake type stub that drifts away from the runtime over time.

It is not the right tool when you can fix the problem with Annotated, with a stub file, or by just changing the API. Plugins lock you into mypy specifically (Pyright will not run them) and they are a maintenance burden every time mypy releases a new version.

The reason I went with the plugin is the third condition. Every alternative either lied to the user or made them write cast(Blob, product.image) on every line, and that is no way to live. A 100-line plugin turned the original lie into a contract that the type checker actually enforces.

The mypy plugin docs are bad and the API is not really stable across versions. The actual code, once you find it, is small and worth the trouble.