Julian's Blog - Python Annotations: The 20/80 Essentials

JulianHysi 29 October, 2023

In my previous article, I talked about the state of type hinting in modern Python. I would encourage you to give it a read if you haven't, as I touch upon the history and evolution of typing in Python, the pros and cons of annotating your code, how that plays out with the current Python stack, and a sprinkle of my coding journey and habits.

This article is part two of that. Given how commonplace type annotations have become, no matter where we stand on the matter, I think every Python developer should be familiar with the essentials. This article, although by no means extensive, will attempt to cover the most relevant and useful aspects you need to know about the typing module, as of Python 3.12, based on my personal experience and bias. And since even a seemingly simple topic such as type annotations can get complex after a while, we'll follow the Pareto Principle and address the 20% of things that are used 80% of the time.

[Cover image: generated by GPT-4's DALL·E 3 integration... what a time to be alive!]

Intro to typing

It's 2023. Even if you started learning Python this year, you probably know what type hints are, or at least the gist of it. This section is not for you; you can go ahead and skip it. But for a minority that would still be interested in a brief explanation, follow along. We'll get some basic concepts out of the way.

Python, or more precisely the CPython implementation of Python (to cater to the pedants), is an interpreted, dynamically-typed and strongly-typed language. While you can read more about that in my other article On the Nature of Python, which I would highly recommend in light of this topic, the simplified TLDR of it is that in Python you don't declare types as you write the code; the type of each object is determined at run-time by the interpreter.

This is an important distinction to grasp, as it leads to the most important property of type hints in Python: they do not alter or affect code execution in any way. They're just that: hints. With them, you can hint to other readers of the code what your intentions were. You can help your IDE give you better auto-completion and suggestions. You can help static code analyzers like mypy catch potential issues in your code. But when the Python code is being run, the interpreter will simply ignore the annotations; they're nothing more than runtime-accessible metadata. You can think of type annotations as comments with a standardized syntax.

But wait a second... your FastAPI endpoint behaves differently if you change the typing on the function definition, or make changes to the response model (pydantic model). So it turns out typing does affect code execution? Well, FastAPI, pydantic and similar frameworks/libraries utilize annotations in a functional way, to drive their behavior and make certain decisions about data validation and parsing. When we say that annotations don't affect code execution, we mean that the standard Python interpreter does not act on them. They are not entirely invisible to it, as we do have an __annotations__ attribute available at run-time (this is how a library like pydantic knows what an object has been annotated as), but the interpreter does not make decisions about the code execution flow based on it. And it likely never will. You can easily write your custom library that makes heavy use of the annotations at runtime like pydantic does. But don't mistake that for the actions of the interpreter.
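To make the distinction concrete, here is a minimal sketch. The function double and the helper checked_call are hypothetical names made up for illustration; checked_call is a toy and not how pydantic or FastAPI are actually implemented. It only shows that the hints are stored, that the interpreter does not enforce them, and that a library is free to read them and act on them.

```python
from typing import get_type_hints


def double(x: int) -> int:
    return x * 2


# The interpreter stores the hints but never enforces them.
print(double.__annotations__)  # {'x': <class 'int'>, 'return': <class 'int'>}
print(double("ab"))            # abab (no error, despite the int hint)


def checked_call(func, **kwargs):
    """Toy runtime validator (hypothetical helper); only handles plain classes."""
    hints = get_type_hints(func)
    for name, value in kwargs.items():
        expected = hints.get(name)
        if expected is not None and not isinstance(value, expected):
            raise TypeError(
                f"{name} should be {expected.__name__}, got {type(value).__name__}"
            )
    return func(**kwargs)


print(checked_call(double, x=21))  # 42
# checked_call(double, x="ab") would raise TypeError: the library, not the
# interpreter, is the one deciding to act on the annotation.
```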

The basics of typing

The most basic example of using annotations is to annotate a variable. Let's take a look at a few examples of that:
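A minimal sketch of such examples follows. The names and values are illustrative assumptions, chosen so that the third and fourth lines match the weight and age cases discussed next.

```python
name: str = "Julian"   # 1st example: annotation matches the value
age: int = 30          # 2nd example: annotation matches the value
weight: str = 72.5     # 3rd example: annotated as str, but the value is a float
age = "thirty"         # 4th example: age now holds a str, so the int hint is stale

# The interpreter runs all of this without complaint;
# only a static checker such as mypy would flag lines 3 and 4.
print(type(weight).__name__)  # float
print(type(age).__name__)     # str
```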

As mentioned earlier, Python automatically infers the type at run-time, so it sees no issue with the third example in the snippet above. The variable weight will be of type float, even though it's annotated as a str. Of course, if the rest of your code assumes that the variable is a string, and attempts to perform string operations on it, your code will break. But that's a TypeError you'd get even if you didn't annotate anything.

Another aspect of Python is that a variable's type can change during its lifecycle. As you can see in the 4th example, age is now a string, so its initial annotation is misleading. The variable should've been annotated again when a value of a different type was assigned to it. One can also argue that changing variable types like that is a bad coding practice, but again, it's something that can technically happen, so you need to keep the typing in sync too.

Here's how to use annotations in a function definition:
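A small sketch of an annotated function; is_palindrome is a made-up example, not taken from any library.

```python
def is_palindrome(text: str) -> bool:
    # Annotating a local variable is optional; it is shown here for illustration.
    normalized: str = text.lower()
    return normalized == normalized[::-1]


print(is_palindrome("Level"))   # True
print(is_palindrome("Python"))  # False
```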

As you can see, we have hinted that this function takes in a string and returns a boolean. Sometimes it's useful to annotate the local variables inside the function body too, but that depends on how long/complex that function is and how many annotations are too many for you personally. Functions which return None should be annotated as such. Method parameters self and cls don't need to be annotated, as type checkers infer them.
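The last two points can be sketched with a small class. Tally is a hypothetical name used only for illustration.

```python
class Tally:
    def __init__(self, start: int = 0) -> None:  # __init__ returns None
        self.value = start

    def increment(self, step: int = 1) -> None:  # self is left unannotated
        self.value += step

    @classmethod
    def from_text(cls, text: str) -> "Tally":    # cls is left unannotated too
        return cls(int(text))


tally = Tally.from_text("41")
tally.increment()
print(tally.value)  # 42
```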

We can also annotate more complex data types and custom objects:
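A sketch of what such annotations might look like, assuming Python 3.9 or newer. The variable names are illustrative, and the tiny User class is a stand-in so that the example runs on its own.

```python
from typing import Any


class User:
    """Stand-in for a class defined or imported elsewhere in your codebase."""

    def __init__(self, name: str) -> None:
        self.name = name


scores: list[int] = [90, 85, 77]                    # a list of ints
settings: dict[str, Any] = {"debug": True, "retries": 3}  # str keys, any values
row: tuple[int, str, float] = (1, "Julian", 72.5)   # every element annotated
ids: tuple[int, ...] = (1, 2, 3, 4)                 # alternative: any number of ints
admin: User = User("Julian")                        # a custom class as the type
```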

There are a few things to note here. For container objects (such as lists, tuples, etc.) we can also annotate the type of the items contained in them. And we should do so, as that gives a better idea of what the object should look like. For dictionaries, we annotate both the key and the value.

Additionally, and this is a fairly recent addition (Python 3.9), we can use the same "constructor" class to create and annotate an instance. So for example, previously we had to use typing.Dict, whereas now we can just directly use dict, which is more intuitive and concise.

Notice also the use of Any; when a value can be just about anything, it's easier to annotate it as such, rather than list all the possible types. It's also better than not annotating it at all, because this way we're being explicit about what could go in there, helping the reader of the code fend off wrong assumptions. As the Zen says, explicit is better than implicit.

Tuple annotation may seem a bit strange compared to the other data structures, as here we are annotating every element of the tuple individually. But that makes sense when you take into account that a tuple is immutable (fixed size) and it typically stores items of different data types (like a database tuple). If the tuple is long, or you simply don't want to type out individual items, you can always resort to the alternative syntax shown in the snippet.

Lastly, we have the example of annotating with our custom classes. This example assumes the User class is part of the namespace (either defined or imported). But other than that, it's not any different than how we usually annotate everything else.

Type unions

Sometimes, we can expect an object to come in various types, but the scope is not broad enough to resort to typing.Any. Here's where type unions come in:

Here we have defined a dummy function that simply returns what it receives. We can see that its parameter is annotated as a string OR int, and so is the return type. We're essentially saying that either of those two types is acceptable/expected. We don't have to stop at two though; we can chain more than that if necessary.

The type union syntax is also supported by the isinstance() function (from Python 3.10 onwards), as shown at the end of the snippet.

Literal, Annotated, Self

These came out more recently, and I have covered them in more detail in my article Python 3.9-3.11 new features. But let's have a quick summary of them:

- typing.Literal => allows you to go one step further and not just define the type, but what literal value(s) of that type are expected.
- typing.Annotated => allows you to attach arbitrary metadata to the annotation, such as a written comment, which can be useful when the type may seem strange or not right to the reader at first glance. Putting a written note there can help clear things up.
- typing.Self => allows you to annotate that a method is returning an instance of the class it's located in. This is a neat and necessary trick, as the class itself hasn't been defined (or rather, fully interpreted) yet at this point, so we can't just use the class name.

Sequence vs Iterable vs Sized

These three have some overlapping scope, and I sometimes see them being used wrongly, or often not used at all. And when they're not used, it's usually because they have been shortsightedly replaced with list. I mentioned this in part one as one of the cons of typing, and my other article Duck Typing and Python goes into great detail about why this is a bad thing. I think understanding which type to use is not that hard, so let's get to it.

First of all, when you know for sure that you're expecting a specific type, such as list or tuple, go ahead and use that. But I'd say it's always worth considering the more general types first, starting from Iterable. This type is used to denote anything that can be looped over, just like the general concept of iterables in Python. Here's an example of how you could use it:

it: Iterable[Any]

If you want to get more specific, then you can use Sequence, as shown below:

s: Sequence[str]

A Sequence is an Iterable that supports indexing by position, membership checks and has a known length via len() (such as lists, tuples, strings etc.). Generators are a good example of an Iterable that is not a Sequence! But a less obvious one might be dictionaries: iterable, but not indexable by position!

Whereas Sized represents any object that has a __len__() method:

sz: Sized

It does not imply the implementation of any other methods.

A Sequence is both Sized and Iterable; it's the more specific one. Python dictionaries are Sized and Iterable, but not Sequences. An object can be Sized without being a Sequence or Iterable, and it can be an Iterable without being the other two, like generators.

Alright, this is getting a bit convoluted, so let's simplify it with some rules of thumb:

- use Sized if you only care that an object has a length, and you don't intend to iterate over it or access its elements by index
- use Iterable when all you care about is the iteration ability
- and use Sequence when you care about everything
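The rules of thumb can be sketched as three small functions, each asking only for what it actually needs. The function names are illustrative, and the imports come from collections.abc, which is where these types live on Python 3.9 and newer.

```python
from collections.abc import Iterable, Sequence, Sized


def total(numbers: Iterable[float]) -> float:
    return sum(numbers)               # only iterates


def first_and_last(items: Sequence[str]) -> tuple[str, str]:
    return items[0], items[-1]        # needs positional indexing


def is_empty(container: Sized) -> bool:
    return len(container) == 0        # only needs len()


squares = (n * n for n in range(4))   # a generator
ages = {"ann": 31}                    # a dictionary

# Checks are listed in the order: Iterable, Sequence, Sized
print(isinstance(squares, Iterable), isinstance(squares, Sequence), isinstance(squares, Sized))
# True False False
print(isinstance(ages, Iterable), isinstance(ages, Sequence), isinstance(ages, Sized))
# True False True

print(total(squares))                    # 14
print(first_and_last(["a", "b", "c"]))   # ('a', 'c')
print(is_empty(ages))                    # False
```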

Type aliasing

If the entities become quite complex, hinting gets too verbose and impractical. Creating aliases, as shown below, helps a lot in this case:
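A sketch of what such aliases might look like. The shapes chosen for Point and Circle, as well as the area function, are assumptions made for illustration.

```python
import math

Point = tuple[float, float]    # (x, y)
Circle = tuple[Point, float]   # (center, radius)


# Without the aliases, the signature would read:
# def area(circle: tuple[tuple[float, float], float]) -> float
def area(circle: Circle) -> float:
    _center, radius = circle
    return math.pi * radius ** 2


unit_circle: Circle = ((0.0, 0.0), 1.0)
print(area(unit_circle) == math.pi)  # True
```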

As you can see, custom aliases not only help with annotating intermediary objects, but make the end result easier to read and more intuitive to use. Now, we can import and use Circle anywhere else in the codebase, instead of repeating the nested monstrosity every time.

TypeVar

Most times, we know exactly what types we're dealing with, even if the types are unions or generics. But sometimes we need to be generic within a certain context. If that didn't make any sense, let's drive this home with a super simple example. Consider the following function:
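A minimal sketch of such a function, with no annotations yet; the name add is an assumption.

```python
def add(a, b):
    return a + b


print(add(1, 2))          # 3
print(add("foo", "bar"))  # foobar
# add(1, "bar") raises TypeError: int and str cannot be added together
```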

This function can add together integers, strings or whatever type that defines __add__(). Its return value will have the same type as that of the parameters a and b.

Now this is where TypeVar comes in. While the parameters can indeed be of many different types, they have to be of the same type as each other. In other words, both a and b have to be integers, strings or X type. If a is an integer and b is a string, the function will raise a TypeError. That's because Python is strongly typed (I address this topic as well in On the Nature of Python). But the point here is that we need to somehow annotate that both the parameters should be of the same type.

Let's rewrite the function using TypeVar:
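A sketch of the rewritten function, assuming the original is a simple two-argument add.

```python
from typing import TypeVar

T = TypeVar("T")  # stands for "some type, the same one everywhere it appears"


def add(a: T, b: T) -> T:
    # Note: a strict checker may point out that an unconstrained T
    # is not guaranteed to support +.
    return a + b


print(add(1, 2))          # 3
print(add("foo", "bar"))  # foobar
```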

In this version, T is a TypeVar that can represent any type. The benefit of using it is that, within that same context, the generic type provides consistency. This means if a is an integer, so should b and ultimately the return value. And if you try to call this function with integer arguments but expect a string in return, a type checker like mypy, or your IDE, will flag the mistake.

A TypeVar instance can also be constrained:

T = TypeVar('T', str, int)
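A short sketch of the constrained variant in use. With this definition, T can only be resolved to str or int, so a checker knows that + is valid for it; the add function is an illustrative assumption.

```python
from typing import TypeVar

T = TypeVar("T", str, int)  # T must be exactly str or exactly int


def add(a: T, b: T) -> T:
    return a + b


print(add(2, 3))        # 5
print(add("ab", "cd"))  # abcd
# add(2, "cd") is what a type checker would flag: the two arguments
# cannot both be resolved to the same T.
```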

A frequently asked question by people who are just introduced to this concept is "Why don't we just use union or Any?". But again, they do completely different things. TypeVar is not so much about the actual type but rather about maintaining a certain type consistency throughout a given context.

The name T is the conventional name to use for a TypeVar, particularly when there's only one in use in that part of the code. The convention is to use U, V etc. when your module needs more than one TypeVar. But the names can also be descriptive, especially when defining a constrained one:

StrOrInt = TypeVar('StrOrInt', str, int)

Usually, a single letter like T is clear in the context at hand, so there's no need for a more descriptive name.

Type Casting

There are times when you will need to annotate an object after it has been defined. I've had to do this to silence wrong mypy or IDE complaints. Python does not support this in a direct way, but there are some tricks we can use. Perhaps the most obvious one is re-assignment:
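A minimal sketch of the re-assignment trick. The json.loads call is an assumed source for a value whose type the checker cannot know. Depending on the checker and its configuration, annotating an already-defined name may itself be reported as a redefinition, so treat this as an illustration rather than a guaranteed fix.

```python
import json

my_var = json.loads('"hello"')  # json.loads returns Any as far as checkers know

my_var: str = my_var            # re-assign the name to itself, now annotated

print(my_var.upper())  # HELLO
```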

And that's all there is to it; you simply re-assign a name to itself and add the annotations this time. Another approach is to use typing.cast:

my_var = cast(str, my_var)
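A slightly fuller sketch of cast() in use; the payload and variable names are assumptions. The second half shows that cast() does no checking or converting at runtime, so a wrong cast goes unnoticed.

```python
import json
from typing import cast

payload = json.loads('{"name": "Julian"}')

# Tell the checker that this value is a str
name = cast(str, payload["name"])
print(name.upper())  # JULIAN

# cast() simply returns its second argument unchanged
number = cast(str, 42)
print(type(number).__name__)  # int
```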

A call to cast() is a no-op at runtime; it's purely an aid for type checkers. Either one of the two approaches is fine in general though, and the choice is mostly a matter of style.

Conclusion

Alright, that should cover the essentials. There's a lot more to this if you read the documentation, but I hope I managed to distill some of it. Remember that the journey to mastering Python type annotations is ongoing and dynamic (much like the language itself) for many of us. Embrace these basics as a foundation, and I think you'll find yourself well-equipped to explore the deeper, more intricate aspects of the typing module.
