Julian's Blog - Using the Pre-Commit Framework

JulianHysi 2 July, 2022

Using the Pre-Commit Framework

Overview

Pre-commit is a popular, language-agnostic framework for managing and maintaining multiple git hooks. Git hooks are actions that git triggers before or after a certain event; in our case, making a commit. The action could be a few lines of bash code, or a whole other tool. We will be hooking up tools that perform static code analysis, i.e. examining the code for possible issues without executing it.

While you can manually create git hooks just fine, pre-commit makes working with them as a team much more practical, by storing all the configuration in a single .yaml file. Also, you don't need to install the individual tools on your own; the only thing you install yourself is pre-commit.

Head over to the introduction section of pre-commit for more details.

Setting up pre-commit

The pre-commit package is installed just like any other python package, by using pip. Hit

pip install pre-commit

to do just that. Make sure to also update the requirements.txt file, so that everyone else working on the project can just refresh their venv to get it.

However, there is one extra step to follow: we have to tell the pre-commit file in the .git/hooks/ directory to use the hooks we want. Doing that is simple, but first we have to add a .pre-commit-config.yaml configuration file, in which the hooks are laid out. You can find a good example of that in this gist that I wrote. As is often the case with configuration files, place it in the project's root directory. If you don't understand how or why the file is set up like that, don't worry; we'll go over it in the next section. Do keep in mind that it is not a bare-minimum configuration file, but one that you would actually use in your day-to-day work on a big project, hence it's a bit lengthier and more customized. Once you get the gist of it (pun intended), you can add, remove or tweak things to your liking, which should be fairly straightforward.

Alright, now that the configuration file is in place, run pre-commit install from the terminal. This goes to the .git/hooks directory and writes the pre-commit file there, making it point to the pre-commit tool, so that before every commit the tool runs all the hooks laid out in the configuration file.

To avoid confusion, please bear in mind the distinction between the pre-commit git event and the pre-commit package/tool. The pre-commit event is a git internal. Git provides you with a default pre-commit.sample file inside the hooks directory whenever you do a git init, and removing the .sample extension enables it to be triggered. The triggering part is also done by git itself, but the file has to be named exactly pre-commit for that to succeed.

The pre-commit framework, on the other hand, with its pre-commit Python package, is just a command-line tool. It could have been named anything else, and it doesn't need a commit action in order to execute. By running pre-commit install, we simply glue these two actors together: the git-internal pre-commit event and the pre-commit tool, so that the tool runs automatically every time we attempt to commit.
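To make the naming rule concrete, here is a minimal Python sketch that checks whether a repository has a hook file git would actually pick up for the pre-commit event. The helper name has_active_pre_commit_hook is made up for this illustration, and the sketch assumes the default hooks directory (.git/hooks); it does not check that the file is executable, which git also requires on most systems.

```python
from pathlib import Path


def has_active_pre_commit_hook(repo_root: str) -> bool:
    """Return True if the repo contains a file named exactly 'pre-commit'.

    Hypothetical helper for illustration only. Git ignores
    'pre-commit.sample'; only the exact name 'pre-commit' is triggered.
    Assumes the default hooks location, .git/hooks.
    """
    hook = Path(repo_root) / ".git" / "hooks" / "pre-commit"
    return hook.is_file()
```

Right after a git init this returns False, because only pre-commit.sample exists; after pre-commit install it returns True.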

Understanding the configuration file

The file .pre-commit-config.yaml resides at the project’s root directory, and it contains all the hooks. This file is where you’d add or remove hooks, or change existing ones.

Each repo entry in the file is a source of hooks. The pre-commit project also provides a pre-commit-hooks repo, containing some basic hooks. As you can see in the config, one repo can contain multiple hooks, and you include only the hooks you want by specifying each hook's id. Besides that repo, we have included a couple of others as well.

For each repo, you can specify the revision you want to use. And for each hook, you can set all the options it supports, as well as pass in args as you would when executing that tool from the command line.
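The layout described above (a list of repos, each with a revision and a list of hooks) can be pictured as nested data. The sketch below mirrors that shape in Python rather than YAML. The rev values are placeholders and not real versions, and the selection of repos and hooks is an assumption made for illustration; it is not copied from the gist.

```python
from typing import List

# Shape of a .pre-commit-config.yaml, mirrored as Python data.
# The "rev" values are placeholders: pin real tags in an actual config.
CONFIG = {
    "repos": [
        {
            "repo": "https://github.com/pre-commit/pre-commit-hooks",
            "rev": "<pinned-tag>",
            "hooks": [
                # Only the hooks listed by id are enabled.
                {"id": "trailing-whitespace"},
                {"id": "end-of-file-fixer"},
            ],
        },
        {
            "repo": "https://github.com/PyCQA/isort",
            "rev": "<pinned-tag>",
            "hooks": [
                # args are passed as they would be on the command line.
                {"id": "isort", "args": ["--profile", "black"]},
            ],
        },
    ]
}


def hook_ids(config: dict) -> List[str]:
    """List the hook ids in the order they appear in the config."""
    return [hook["id"] for repo in config["repos"] for hook in repo["hooks"]]


print(hook_ids(CONFIG))
```

The order of the printed ids is the order in which the hooks would be listed in the file.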

Head over to Supported hooks for a list of all the hooks that pre-commit officially supports. Keep in mind that some of them aren't relevant to Python, some are direct competitors to each other, and some have partially overlapping scope. Therefore we've only included a reasonable subset of them in our config file.

The hooks up to flake8 make changes to the files (more on that later), so it makes sense to have flake8 run last, to avoid having it fail for something that the other tools would fix. For example, if we have an unused import in a Python module, pycln takes care of that. But if pycln comes after flake8, flake8 fails on the unused import before pycln gets a chance to remove it. This is something to keep in mind when making changes to the config file: the ordering of the hooks matters! If the order isn't right, you may even run into circular issues, where the first tool causes the second to fail, and vice versa.

Also, tools with overlapping scope need to handle the shared part in the same manner. For example, isort sorts the imports, but it also affects the formatting, which is black's domain. If you don't set up isort to use the black profile for the formatting, black will always fail due to the "wrong" formatting applied by isort, and the commit will never go through.
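As a rough illustration of the ordering rule, the following sketch flags any file-modifying hook that is listed after the checker. The function name misplaced_fixers and the idea of validating the order yourself are assumptions made for this example; pre-commit does not perform such a check for you.

```python
from typing import List, Set


def misplaced_fixers(hook_order: List[str], fixers: Set[str], checker: str) -> List[str]:
    """Return the fixer hooks that are listed after the checker hook."""
    checker_pos = hook_order.index(checker)
    return [h for h in hook_order[checker_pos + 1:] if h in fixers]


fixers = {"pycln", "isort", "black"}
good = ["pycln", "isort", "black", "flake8"]
bad = ["isort", "flake8", "pycln", "black"]

print(misplaced_fixers(good, fixers, "flake8"))  # []
print(misplaced_fixers(bad, fixers, "flake8"))   # ['pycln', 'black']
```

In the second ordering, flake8 would see the unused import before pycln has had a chance to remove it.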

The config file comes with some comments to help you understand what's going on. I'd like to emphasize that we are ignoring migration files. It doesn't make sense to change files that were generated automatically by another tool (e.g. Django or Flask). We are also ignoring some flake8 checks, for reasons explained in the file. There are workarounds to handle them, but it's questionable whether the benefits outweigh the effort, so we're keeping it simple for the moment.

Basic usage and commands

The pre-commit hooks will be automatically triggered every time a git commit is performed, operating only on the staged files. If the hooks pass, meaning they don’t find any issue or anything to improve, the changes are committed as usual. Otherwise, the commit fails, displaying the reason. Note that some hooks, prior to failing the commit, will make actual changes to the files, attempting to solve any issue they can. In that case you have to review the changes and stage them, before attempting to commit again.
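The decision described above can be modelled in a few lines. This is a simplified sketch and not pre-commit's actual implementation: the hooks are stand-in functions with fixed outcomes, and commit_allowed is a name invented for the example.

```python
from typing import Callable, Dict, List, Tuple

# A hook takes the staged files and returns (issues_found, files_modified).
Hook = Callable[[List[str]], Tuple[bool, bool]]


def commit_allowed(hooks: Dict[str, Hook], staged: List[str]) -> Tuple[bool, List[str]]:
    """Run every hook on the staged files and report whether the commit may proceed."""
    failed = []
    for name, hook in hooks.items():
        issues_found, files_modified = hook(staged)
        # A hook that rewrites files also counts as failed: the changes
        # must be reviewed and staged before the next attempt.
        if issues_found or files_modified:
            failed.append(name)
    return (not failed, failed)


# Stand-in hooks with fixed outcomes, for demonstration only.
hooks = {
    "formatter": lambda files: (False, True),  # reformatted a file
    "linter": lambda files: (False, False),    # nothing to report
}
print(commit_allowed(hooks, ["app.py"]))  # (False, ['formatter'])
```

Once the reformatted file is staged and the formatter has nothing left to change, the same call would return (True, []).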

Following is an example of a pre-commit pipeline:

[Figure: pre-commit pipeline diagram]

You can also invoke the hooks without committing, by running pre-commit run. As with a commit, it will only work on the staged files. You can add the --all-files option to process the entire codebase, which is generally a good idea if you've just added a new hook.

And you can also do a pre-commit run <hook_id> to run just one specific hook out of everything you have in the config file.

On a related note, running a new hook for the first time will be a bit slower, because pre-commit needs to download and install the actual tools and their dependencies. But they aren't discarded after the hook finishes executing, so they'll be reused the next time the hook runs.

With new releases, the versions of the repos in the config will change. Instead of having to manually extract the latest versions from each repo, you can simply run pre-commit autoupdate and it will take care of all that.

What if you need to make an urgent commit, and you don't have time to address the issues? In that case, you can pass the --no-verify option to the git commit command, and it will skip all checks. If you would like to skip only a subset of the hooks, you can do SKIP=flake8,black git commit -m "text".
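The effect of the SKIP variable can be pictured with a small sketch: a comma-separated list of hook ids is read from the environment and those hooks are left out of the run. This is a simplified model and not pre-commit's source code; hooks_to_run is a hypothetical name.

```python
import os
from typing import List, Mapping


def hooks_to_run(configured: List[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Drop the hook ids listed in the comma-separated SKIP variable."""
    skipped = {s.strip() for s in env.get("SKIP", "").split(",") if s.strip()}
    return [h for h in configured if h not in skipped]


configured = ["pycln", "isort", "black", "flake8"]
print(hooks_to_run(configured, {"SKIP": "flake8,black"}))  # ['pycln', 'isort']
```

With SKIP unset or empty, every configured hook runs as usual.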

Lastly, you can run pre-commit uninstall to uninstall the hooks entirely. Note that this command will only remove the hooks that you installed using pre-commit, not the ones you created manually before or after installing the pre-commit ones.

Ignoring checks

The config file is set up so that after all the hooks which make changes have completed, the last hook (i.e. flake8) performs some checks to see if the changed files are PEP8-compliant. However, the previous hooks cannot always solve the issues they find. The hooks like to play it safe, so they don’t make dangerous decisions. Also they don’t cover the entirety of the PEP8 standard, so you’ll often find flake8 flagging issues despite the improvements made by the other hooks.

But the issues need to be fixed, otherwise the commit will not go through. We have to step in manually and apply the fix, if possible. What if the fix isn't possible, or it's a trade-off we're willing to make? Or what if we need to commit and push the changes now, and apply the fix later?

There's a way to tell flake8 to ignore a certain line: add a # noqa comment at the end of the line, which ignores every kind of check on it. This is not a very good practice though (think of catching all types of exceptions), and for that reason flake8 provides us with a list of codes corresponding to specific issues. E501 stands for line too long, F823 for referencing a local variable before assignment, and so on. An example would be:

name = 'John Doe'  # noqa: F823

telling flake8 to ignore just that type of check. It’s not a good idea to overuse the ignoring feature though, so try to fix what you can, and put a TODO comment right after the noqa comment, when you plan to fix it later.

name = 'John Doe'  # noqa: F823 # TODO: will fix this later
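Here is a slightly fuller sketch with two different codes, each silencing only its own check. The variable name and the URL are invented for the example. F401 is the code flake8 reports for a module that is imported but unused, and E501 is the line-too-long check mentioned earlier.

```python
# Imported but unused on purpose; only the F401 check is silenced here.
import os  # noqa: F401

# A long literal that would otherwise trigger E501 (line too long).
DOCS_URL = "https://example.com/a/deliberately/long/path/that/pushes/this/line/well/past/the/limit"  # noqa: E501
```

Any other issue on these two lines would still be reported, which is the advantage over a bare # noqa.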

For more on ignoring issues, please refer to the official flake8 documentation.
