Moving away from PR/MR workflow…

In the past two years, many of my projects received more and more contributions. Some of them, such as This Week in Neovim (which I gave away), kak-tree-sitter and others, received a lot of them. As I reviewed contributions, I realized that I do not really enjoy the tool used to implement the process: GitHub. Moreover, I have a particular grudge against Microsoft’s (in)famous way of doing things (cf. EEE, “embrace, extend, extinguish”). So, why is it such a big deal to me, and what is the alternative?

What’s wrong with git hosting platforms (GitHub, GitLab, etc.)

git (I don’t think I need to explain what it is, right?) was created by Linus Torvalds (again, I don’t think I need to introduce him) in response to the BitKeeper dispute. Back then, many projects were using CVS or SVN, while the kernel was developed with BitKeeper, a proprietary version control system.

In 2005, after the decision to stop providing free BitKeeper copies to the kernel community, Linus created git to give the free and open-source software community a version control tool built on a decentralized model. Indeed, with git, teams can organize however they want and exchange patches and commits without having to depend on a third party.

For instance, if Alice and Bob want to work on a project together and version it with git, they do not need to rely on anyone but themselves. They can send commits to each other with git push or git pull (provided they configure SSH correctly). They can use their university server to host a bare git repository and synchronize there. They can exchange patches by mail. Etc.

Another option is to use GitHub. Or GitLab. Or Bitbucket. In the end, it’s the same as the university server mentioned above: a repository that is always up, for people to send commits to and receive commits from. However, I do see one weird pattern here. Using GitHub, for instance (but really, it’s the same for most others), is basically taking a decentralized tool and putting it in jail. If someone wants to contribute to Alice and Bob’s project, they have to create a GitHub account and abide by the GitHub rules. More interestingly, GitHub is not a free and open-source project, and it’s owned by Microsoft. Alice and Bob’s project is physically stored on a Microsoft server. Given the history of Microsoft in the FOSS world, I can imagine how that would cause a moral problem.

Another point that is specific to GitHub: Copilot. I think Copilot is a FOSS abomination. The reason is that, even though your code is under a copyleft license (or is simply copyrighted), GitHub’s terms of service are pretty explicit about hosted content: they can use it to improve their own products. See this section of the terms, for instance. I don’t know about you, but to me, that’s a grey zone. I do not like the idea that my code can be “analyzed” by Microsoft to enhance Copilot. All of my projects are licensed under BSD-3-Clause, a permissive license that requires my copyright notice to be retained when my code is redistributed, along with other clauses that I think are violated by this Copilot horror.

And I’m not even talking about Microsoft Copilot Pro, which is a product. That you have to pay for. That is made from public, free and open-source contributions from millions of projects. Fuck that.

Finally, those platforms often host many different services that people start using and heavily depend on: bug trackers, wikis, project management, etc. Even though this is not the main problem to me, I prefer using tools made to solve a single problem. A gigantic platform solving basically “how to do software development” now sounds like a red flag to me. What happens when you decide to move on? Well, you have to move everything. And that’s my situation today.

Going back to a decentralized way

About two years ago, someone on IRC praised the email-based git workflow. Wait, email-based? In 2022? Isn’t that a bit too hardcore? Well, there is one really big misconception about emails, and it comes from all-in-one providers.

Like many other people, I use webmail. If it’s not Gmail, it’s Outlook or something else. Gmail is hard to avoid, because in the professional world, so many companies use it. It’s easy for them to manage, and you get all the important things at once: a swarm of welcome emails for newcomers, meetings, various meetups, etc. All in one. The problem is that it requires you to go to your browser, and the online experience is honestly pretty bad (it always has been to me). I try to spend as little time as possible in Gmail, because I don’t really like the interface.

So how can someone use emails to work with git? Well, git was designed with the email workflow in mind from the beginning. The idea is simple:

  1. You make your changes exactly the same way as you normally do.
  2. When you have your commits and you are ready to share your contributions with others for review, instead of pushing your branch to a centralized platform and opening a PR, you simply use the git send-email command (or git format-patch and send the patch manually, but really, you shouldn’t have to do that). You typically send that email to a development mailing list.
  3. People discuss the patch by replying to the emails.
  4. Either you send new versions of your patches in new emails, or a maintainer eventually approves the patches and applies them with git am (forget about merging here!).
  5. Done.
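Step 2 is less magical than it sounds. The file produced by git format-patch is already an email, so the core of sending it is handing that email to an SMTP server. The following is a minimal Python sketch of that idea, not a replacement for git send-email, which also handles threading headers, Cc lists and more. The function names, the recipient address and the SMTP settings are all hypothetical.

```python
import smtplib
from email import message_from_string, policy
from email.message import EmailMessage
from typing import Optional


def patch_to_message(patch_text: str, to_addr: str) -> EmailMessage:
    """Turn the output of `git format-patch` into an email ready to send."""
    lines = patch_text.splitlines(keepends=True)
    # The first line ("From <sha1> Mon Sep 17 00:00:00 2001") is an mbox
    # separator, not a header, so drop it before parsing.
    if lines and lines[0].startswith("From "):
        lines = lines[1:]
    msg = message_from_string("".join(lines), policy=policy.default)
    del msg["To"]  # no-op if absent; avoids a duplicate To header
    msg["To"] = to_addr
    return msg


def send_patch(msg: EmailMessage, host: str, port: int = 587,
               user: Optional[str] = None, password: Optional[str] = None) -> None:
    """Submit the message over SMTP with STARTTLS (hypothetical server settings)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        if user is not None:
            smtp.login(user, password or "")
        smtp.send_message(msg)
```

Usage would look like send_patch(patch_to_message(open("0001-example.patch").read(), "dev@example.org"), "smtp.example.org"), with a placeholder file name, list address and server.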

Misconception 1: reviewing is hard

The main misconception is about the review part. On GitHub, reviewing is nice, because you can comment inline. With emails… well, it’s exactly the same, and even better. What people usually do is reply to your email by quoting it, and dissect it section by section. For instance, imagine I have a project with a README.md file containing a single line:

Hello. My name is Dimitri and this is a README.

Let’s say I modify that file to hold this content instead:

# Hello world!

My name is Dimitri and this is a README.

I make a commit with git commit -am "Add hello section to README.md.". When using git send-email / git format-patch, it will generate something like this patch:

From 020d9085a35d6c34bc1b7e86b78f257c4d556bbb Mon Sep 17 00:00:00 2001
From: Dimitri Sabadie <hadronized@strongly-typed-thoughts.net>
Date: Thu, 30 May 2024 14:28:11 +0200
Subject: [PATCH] Add hello section to README.md.

It looks better to me and allows to demonstrate the email workflow a bit more.
---
 README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 72ae80f..70341da 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,3 @@
-Hello. My name is Dimitri and this is a README.
+# Hello world!
+
+My name is Dimitri and this is a README.
--
2.45.1

The first line is an mbox separator carrying the commit SHA1, and the next three are regular email headers (sender, date, and subject), so the mail clients of the people reading the mailing list will display them as such. The rest is the email body! Someone could reply something like:

> -Hello. My name is Dimitri and this is a README.
> +# Hello world!

Thank you, we needed that header.

> +My name is Dimitri and this is a README.

May you add more info to include other maintainers as well? Alice and Bob could
provide more details there.

Cheers,
Dimitri

The author of the patch can then iterate and send another email (with the -v2 flag, because it is the next version of the patch; -v3 for the one after, etc.). The advantage here is that someone can reply to that first review by discussing only the first part, quoting it:

> > -Hello. My name is Dimitri and this is a README.
> > +# Hello world!
>
> Thank you, we needed that header.

I think we should also include a sub-header describing something else.

While at the same time, someone can discuss the second part of the change:

> > +My name is Dimitri and this is a README.
>
> May you add more info to include other maintainers as well? Alice and Bob could
> provide more details there.

Yes, like email addresses and PGP signatures, please.
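The nested quoting in these replies follows the usual plain-text convention. Each quoted line gets a "> " prefix, and a blank line becomes a bare ">", so quoting a reply again produces "> > ". Below is a minimal sketch of what a mail client does when you hit reply. The function name quote is hypothetical, and some clients use ">>" instead. Applied to the first review, it yields exactly the quote at the top of the first sub-thread.

```python
def quote(text: str) -> str:
    """Quote an email body for a reply, adding one '>' level per call."""
    quoted = []
    for line in text.splitlines():
        # Blank lines get a bare ">" so no trailing whitespace is produced;
        # every other line (including already-quoted ones) gets "> ".
        quoted.append(">" if line == "" else "> " + line)
    return "\n".join(quoted)
```

To keep a sub-thread scoped, the replier then deletes the quoted lines they are not discussing.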

Each reply that quotes only part of the patch starts its own branch, so the discussion forms a tree of emails where each sub-thread is scoped to a specific part of your patch. Such a feature is not possible in GitHub or any other centralized platform.
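That tree comes from standard headers. Every email has a Message-ID, and a reply carries the Message-ID of the email it answers in its In-Reply-To header. The following is a simplified sketch of how a client can rebuild the tree. Real clients also use the References header, for example via Jamie Zawinski’s threading algorithm. The names build_threads and render are hypothetical.

```python
from email import message_from_string


def build_threads(raw_messages):
    """Group raw emails into a reply tree using Message-ID and In-Reply-To."""
    msgs = [message_from_string(raw) for raw in raw_messages]
    children = {}
    for m in msgs:
        mid = (m.get("Message-ID") or "").strip()
        if mid:
            children[mid] = []
    roots = []
    for m in msgs:
        parent = (m.get("In-Reply-To") or "").strip()
        if parent in children:
            children[parent].append(m)  # a reply to a message we have
        else:
            roots.append(m)             # a thread starter (or an orphan)
    return roots, children


def render(msg, children, depth=0):
    """Return the subjects of a thread, indented by reply depth."""
    lines = ["  " * depth + msg["Subject"]]
    for child in children.get((msg.get("Message-ID") or "").strip(), []):
        lines.extend(render(child, children, depth + 1))
    return lines
```

Fed the patch email, the first review and the two sub-thread replies, render lists the patch first. The review is nested under it, and both sub-threads are nested side by side under the review.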

So the only important thing is to use a good client. I personally use aerc, along with some other tools to synchronize IMAP maildirs and send SMTP messages. The configuration is not that hard once you get the hang of it.
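Keeping the mail synchronized to a local Maildir also makes it scriptable. Here is a small sketch using Python’s standard mailbox module that lists the unread patch emails waiting for review. The Maildir path and the function name are hypothetical, and your sync tool decides where the Maildir actually lives.

```python
import mailbox
import re

# Matches "[PATCH]", "[PATCH v2]", "[PATCH v2 1/3]", etc.
PATCH_SUBJECT = re.compile(r"^\[PATCH\b")


def unread_patches(maildir_path: str) -> list:
    """Return the subjects of unread patch emails in a local Maildir, sorted."""
    box = mailbox.Maildir(maildir_path, create=False)
    subjects = []
    for msg in box:
        # In Maildir, the "S" (seen) flag marks a message as read.
        if "S" in msg.get_flags():
            continue
        subject = msg.get("Subject", "")
        if PATCH_SUBJECT.match(subject):
            subjects.append(subject)
    return sorted(subjects)
```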

Accounts?

Sending an email to contribute is nice, but where should you send it? Well, it depends on the project. That’s why I think every open-source project should have a CONTRIBUTING.md file explaining where to send patches. I use SourceHut, which gives me a place to host my repositories and mailing lists. The rest is done entirely outside of SourceHut. In particular, even though you can create an account on SourceHut, you don’t need one to contribute.

That part makes me smile, because it makes so much more sense to me. People can contribute with just their email address. They do not need to engage with the platform to start contributing, which is nice and more decentralized. One can even send a patch to a maintainer directly if they really want.

Authorship

I think the main problem with the email-based workflow is that you no longer exchange commits, but patches. There is a slight difference.

A patch is a set of changes to be applied. It contains some metadata, like the parent commit to apply to, the expected SHA1, some diff-stats, and that’s pretty much all.

A commit is a patch with more metadata, like the committer and author identities, a commit message, the parent commit, a signature, etc.

A commit always refers to at least one parent (unless it is the root commit). When you send a patch over email, the maintainer applying it does so with git am, which turns the email into a commit. The commit message, for instance, is taken from the email subject and body. However, one piece of information is lost in the process: the signature.

Indeed, when you work on your local copy of the repository, you might sign your commits with your PGP key. When you send those commits as patches to a development list, part of their metadata is stripped, and the signature is simply lost.
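Roughly, git am fills in the new commit as follows:

- The author name and email come from the From: header, and the author date from Date:.
- The commit message is built from the subject, minus the [PATCH …] prefix, plus the body up to the --- line.
- The committer is whoever runs git am.

Nothing in the email maps to the commit’s signature. Here is a simplified Python sketch of that mapping. The commit_fields helper is hypothetical, and the real parsing, done by git mailinfo, handles many more cases.

```python
import re
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def commit_fields(patch_email: str, committer: str) -> dict:
    """Approximate which commit fields `git am` fills from a patch email."""
    lines = patch_email.splitlines(keepends=True)
    if lines and lines[0].startswith("From "):
        lines = lines[1:]  # drop the mbox separator line
    msg = message_from_string("".join(lines))
    name, addr = parseaddr(msg["From"])
    # "[PATCH v2 1/3] Title" -> "Title": the bracketed prefix is not kept.
    title = re.sub(r"^\s*\[[^\]]*\]\s*", "", msg["Subject"])
    # The commit message body ends at the "---" line before the diffstat.
    body = ("\n" + msg.get_payload()).split("\n---\n", 1)[0].strip()
    return {
        "author": f"{name} <{addr}>",
        "author_date": parsedate_to_datetime(msg["Date"]),
        "committer": committer,  # whoever runs `git am`, usually the maintainer
        "message": title + ("\n\n" + body if body else ""),
        "signature": None,  # no email header maps to the commit's gpgsig
    }
```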

For most people, in terms of security, it’s not that bad. Patches are accepted from emails, and you can (you should!) sign your emails with your PGP key. As a maintainer, my email client (aerc) shows me the signature. If you don’t sign your email, I cannot verify that it’s really you sending it. Most people don’t seem to care that much about authorship, but I do. I think it’s important to be able to verify that a patch whose identity is foo@bar.net was really provided by that person. A signature doesn’t prove you wrote the patch, but it proves you submitted it, the same way that signing a commit means you integrated that commit into the repository.
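The signature a client like aerc checks here is typically a PGP/MIME signature (RFC 3156). The email is a multipart/signed message whose first part is the signed content and whose second part is an application/pgp-signature. Below is a sketch of the extraction step a client performs before handing both pieces to GnuPG. The name extract_pgp_mime is hypothetical. The sketch ignores inline PGP and boundary transport padding, and it does not verify anything itself.

```python
from email import message_from_bytes
from typing import Optional, Tuple


def extract_pgp_mime(raw: bytes) -> Optional[Tuple[bytes, bytes]]:
    """Return (signed_bytes, armored_signature) for a PGP/MIME email, else None."""
    msg = message_from_bytes(raw)
    if msg.get_content_type() != "multipart/signed":
        return None
    if msg.get_param("protocol") != "application/pgp-signature":
        return None
    boundary = msg.get_boundary()
    if boundary is None:
        return None
    # Work on the raw bytes: the signature covers the first body part exactly
    # as sent, so re-serializing the parsed message could break verification.
    text = raw.replace(b"\r\n", b"\n")
    chunks = text.split(b"\n--" + boundary.encode("ascii"))
    if len(chunks) < 4:
        return None  # expected: preamble, signed part, signature part, closing "--"
    # Each part starts with the newline that ends its delimiter line; drop it.
    signed = chunks[1][1:].replace(b"\n", b"\r\n")  # RFC 3156: canonical CRLF
    sig_part = message_from_bytes(chunks[2][1:])
    if sig_part.get_content_type() != "application/pgp-signature":
        return None
    return signed, sig_part.get_payload(decode=True)
```

After writing the two values to files (say signed.eml and signature.asc), the actual check is gpg --verify signature.asc signed.eml.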

Currently, there is no way to carry a commit signature through the email workflow, and I find that a bit of a problem. There should be an alternative, like manually signing the patch. The problem is that if the maintainer has to use a three-way merge, it will invalidate both the parent commit’s SHA1 and the expected SHA1 of the patch, and with them the carried signature.
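A re-applied patch cannot keep its signature because of how commit IDs are computed. A commit’s ID is a SHA-1 over its serialized content, and that content includes the parent line. A commit signature (the gpgsig header) is computed over that same content. Here is a minimal sketch of git’s object hashing. The helper names and the sample identity, timestamp and parent values are made up for illustration.

```python
import hashlib


def git_object_id(kind: str, content: bytes) -> str:
    """Hash an object like git does: SHA-1 of "<kind> <size>", a NUL byte, then the content."""
    header = f"{kind} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def commit_content(tree: str, parent: str, ident: str, message: str) -> bytes:
    """Serialize an unsigned commit the way git stores it (author = committer here)."""
    return (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {ident}\n"
        f"committer {ident}\n"
        "\n"
        f"{message}\n"
    ).encode()


EMPTY_TREE = git_object_id("tree", b"")  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
IDENT = "Alice <alice@example.org> 1717000000 +0200"  # hypothetical identity/timestamp

# Same tree, identity and message; only the parent differs.
original = commit_content(EMPTY_TREE, "1" * 40, IDENT, "Add hello section to README.md.")
reapplied = commit_content(EMPTY_TREE, "2" * 40, IDENT, "Add hello section to README.md.")
```

Hashing original and reapplied as "commit" objects gives two different IDs, so a signature made over the first content does not match the second. In practice, git am also records a new committer and commit date, which changes the ID as well.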

In the end

Even though the email workflow has some problems, having the contributions locally and applying patches locally is a much more enjoyable process. First, sending many commits results in a nice email thread. It’s super easy to open the thread locally and apply the patches. My only concern is authorship, but people told me I might be the only person caring about that. Am I? All in all, this email-based workflow lets me do everything from my terminal, and even tickets (todos in SourceHut) and bug reports / feature requests are handled via email. It just feels good.


aerc, email-based, pr, mr
Wed Jun 5 10:12:00 2024 UTC