[00:00.000 --> 00:13.760] Hello everybody, welcome back to the Stack Overflow podcast, a place to talk all things
[00:13.760 --> 00:16.000] software and technology.
[00:16.000 --> 00:21.280] I'm your host Ben Popper, world's worst coder, joined as I often am by my colleague and collaborator
[00:21.280 --> 00:25.760] Ryan Thor Donovan, editor of our blog and our newsletter.
[00:25.760 --> 00:29.800] Ryan, you helped to set up today's episode, so refresh me here.
[00:29.800 --> 00:32.040] Who's our guest and what are we going to be chit-chatting about?
[00:32.040 --> 00:35.480] So it's Pierre-Étienne Meunier.
[00:35.480 --> 00:42.960] He reached out because of this version control article I wrote, and he's the creator, I believe,
[00:42.960 --> 00:49.720] of Pijul, which was briefly mentioned in the article. I didn't hear much about it, but
[00:49.720 --> 00:53.640] somebody was interested in the fact that it used patch algebra.
[00:53.640 --> 00:54.640] Cool.
[00:54.640 --> 00:57.920] This was in the article about you're not using Git, but why?
[00:57.960 --> 00:59.760] Is this where we're getting back to?
[00:59.760 --> 01:03.840] Yeah, it's the version control systems that people use other than Git.
[01:03.840 --> 01:04.840] Okay, gotcha.
[01:04.840 --> 01:08.080] Well, then without further ado, Pierre, welcome to the program.
[01:08.080 --> 01:10.080] Well, thanks for having me.
[01:10.080 --> 01:17.440] Yeah, I did read the article and I was super interested in all the options and alternatives.
[01:17.440 --> 01:24.040] The fact that Pijul was just briefly mentioned was like, okay, yeah, they probably haven't
[01:24.040 --> 01:25.840] heard much about it.
[01:26.800 --> 01:31.320] This is based on a previous version control system called Darcs.
[01:31.320 --> 01:36.560] Some of the ideas come from Darcs, but at some point we were using Darcs, me and
[01:36.560 --> 01:42.040] a colleague, Florent Becker; we were using Darcs to write a paper about something completely
[01:42.040 --> 01:47.360] unrelated, tilings and geometry in computer science.
[01:47.360 --> 01:53.920] After work, we went out for a beer and started discussing version control, and Florent told
[01:53.960 --> 02:01.400] me that, well, as one of the last remaining maintainers of Darcs, he could tell me that
[02:01.400 --> 02:06.800] the algorithm was not proper, it wasn't an actual algorithm, there were a bunch of holes
[02:06.800 --> 02:14.440] in it, and this explains why it was so slow in dealing with conflicts.
[02:14.440 --> 02:18.280] And so we started chatting about, oh, look, we're computer scientists, and so our job
[02:18.280 --> 02:23.160] is to design algorithms and study their complexity, so, well, this is a job for us.
[02:23.160 --> 02:28.440] We were also working on a model of distributed computing, so this was like, okay, this is
[02:28.440 --> 02:32.760] exactly the kind of stuff we should be interested in.
[02:32.760 --> 02:39.200] This was one of our chances to have an impact on something, and so we started working
[02:39.200 --> 02:45.160] on the bibliography first; we found some arguments about using category theory to solve
[02:45.160 --> 02:51.040] the problem, and then we started working on that and writing code and code and more code
[02:51.120 --> 02:57.120] and debugging, and it turned out to be a much bigger project than we first imagined.
[02:57.120 --> 03:02.120] And so when I saw Ryan mention Darcs, and, yeah, well, there's this new thing coming
[03:02.120 --> 03:08.120] out, maybe someday, but it doesn't look finished yet or something, I reached out and was like,
[03:08.120 --> 03:13.240] oh, yeah, but, well, you're interested in version control, and probably there's something
[03:13.240 --> 03:15.320] we could chat about together.
[03:15.320 --> 03:19.280] Yeah, you sit down for a beer, you start talking version control, and things always go a bit
[03:19.280 --> 03:22.240] farther than you expected, I think that sounds about right.
[03:22.240 --> 03:26.000] Pierre, can you give people just a super short synopsis?
[03:26.000 --> 03:29.600] We started out talking about you already as an engineer and stuff like that, but how did
[03:29.600 --> 03:30.960] you get into this field?
[03:30.960 --> 03:34.760] What got you started down this path to the point where you're sitting around a bar coming
[03:34.760 --> 03:36.360] up with your own version control system?
[03:36.360 --> 03:39.720] How did you get educated in this world and enter into software development?
[03:39.720 --> 03:46.560] I'm not educated at all, really not, I don't know anything about software engineering,
[03:46.560 --> 03:55.120] I'm not an engineer myself, so I started coding when I was a bit young, I guess, on an old
[03:55.120 --> 04:00.800] computer that my uncle left when he went out for some trouble.
[04:00.800 --> 04:08.360] I think I was like 12 back then, and then I've been coding on and off, and then started
[04:08.360 --> 04:14.720] studying mathematics and physics, got into logic, theorem proving, that kind of stuff,
[04:14.960 --> 04:21.720] and where I'm from, there's one thing that is studied in France and, I don't think,
[04:21.720 --> 04:25.040] anywhere else in the world, and that's the OCaml language.
[04:25.040 --> 04:32.200] With OCaml, when you grow up as a Frenchman and go to university to study
[04:32.200 --> 04:38.200] general science, like the first two years of basic mathematics, science, physics, and
[04:38.200 --> 04:44.200] all that, you get told that, well, this OCaml thing, that's really something French
[04:44.240 --> 04:48.440] and it's really something we should be proud of, because there's this legend of computing,
[04:48.440 --> 04:52.760] Xavier Leroy, he did everything there, and he was the first to show the world that you
[04:52.760 --> 04:58.040] can design a functional programming language that's at the same time really fast.
[04:58.040 --> 05:03.400] But then I was interested in that and wanted to study computer science, so I did a PhD in computer
[05:03.400 --> 05:11.000] science, theoretical computer science, and I ended up working in theoretical
[05:11.000 --> 05:15.560] computer science, or what people here call fundamental computer science. That doesn't mean it's
[05:15.560 --> 05:21.000] particularly important or useful; it just means it's the basic things, the foundations.
[05:21.000 --> 05:27.000] Foundational is probably a better name for that. So yeah, that's how I got started.
[05:27.000 --> 05:32.920] So, you know, in researching the article, I started off way back in the day on Visual
[05:32.920 --> 05:39.480] SourceSafe, and it seemed like there were natural developments, but Pijul interested
[05:39.480 --> 05:43.960] me because it uses patch algebra, right? Can you talk about what that is?
[05:44.600 --> 05:51.720] Yeah. So patch algebra is a completely different way of talking about version control,
[05:52.360 --> 05:57.560] of thinking about version control. Why? Well, because instead of controlling versions,
[05:57.560 --> 06:02.360] you're actually controlling changes. That's completely different. Changes are actually the dual
[06:02.360 --> 06:09.160] of versions. So instead of just saying, well, this version came after that version,
[06:09.560 --> 06:18.600] which is what CVS, RCS, SVN, Git, Mercurial, Fossil, and whatnot do, all
[06:18.600 --> 06:24.760] this family of systems. They keep controlling versions, they keep insisting on the versions
[06:24.760 --> 06:31.800] and the snapshots that come one after the other. In contrast to that, most of the research about
[06:32.360 --> 06:37.240] parallel computing, about distributed data structures, focuses on changes.
[06:38.040 --> 06:43.800] So a change is, for example, well, I introduced the line here and deleted the line there. I
[06:43.800 --> 06:49.880] renamed the file from X to Z. I deleted that file, for example. I introduced a new file.
[06:51.000 --> 06:57.240] I solved the conflict. That's also super important. And in contrast to talking only
[06:57.240 --> 07:05.000] about snapshots and versions, this gives you much higher flexibility because all systems that deal
[07:05.000 --> 07:12.760] with versions actually show versions or commits or snapshots as changes. If you look at a commit
[07:12.760 --> 07:17.720] on GitHub, for example, you will never see the actual commit, as in you will never see the actual
[07:17.720 --> 07:23.720] full entire version. What GitHub will show you when you ask about the commit is what it changed
[07:23.720 --> 07:29.000] compared to its parents. So actually, there's a fundamental mismatch in the way people think
[07:29.000 --> 07:35.160] about version control when they use Git. They think, well, everything they see is changes or
[07:35.160 --> 07:40.520] differences. And then everything they need to reason about when they actually use the tool
[07:40.520 --> 07:48.040] is versions. So how can you reason about that? Well, we found ways around it, right? We have all
[07:48.040 --> 07:55.640] these workflows and Git gurus that will tell you what you should and should not do and all that.
[07:55.640 --> 08:01.240] You have good practices and all these things. But fundamentally, what these good practices aim at
[08:01.240 --> 08:06.600] is getting around this fundamental mismatch: having to think about
[08:06.600 --> 08:13.800] something you never see. So what patches and change algebra give you is that now you can reason about
[08:13.800 --> 08:19.240] things. So you can say, well, these two patches are independent, so I can reorder them in history.
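To make the versions-versus-changes distinction concrete, here is a minimal Python sketch, not Pijul's data model. Changes are plain values of the kinds Pierre lists (add a file, insert or delete a line, rename a file), and a version is simply what you get by replaying them. The class and function names are hypothetical. Because these changes address lines by number, two edits made in parallel do not commute: swapping them yields a different file, which is the problem a patch algebra has to solve before reordering is safe.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# A snapshot ("version"): file path -> list of lines.
Snapshot = Dict[str, List[str]]

@dataclass(frozen=True)
class AddFile:
    path: str

@dataclass(frozen=True)
class InsertLine:
    path: str
    index: int  # position given as a line number: this is what makes order matter
    text: str

@dataclass(frozen=True)
class DeleteLine:
    path: str
    index: int

@dataclass(frozen=True)
class RenameFile:
    old: str
    new: str

Change = Union[AddFile, InsertLine, DeleteLine, RenameFile]

def apply(snapshot: Snapshot, change: Change) -> Snapshot:
    """Return a new snapshot with one change applied; the input is left untouched."""
    s = {path: list(lines) for path, lines in snapshot.items()}
    if isinstance(change, AddFile):
        s[change.path] = []
    elif isinstance(change, InsertLine):
        s[change.path].insert(change.index, change.text)
    elif isinstance(change, DeleteLine):
        del s[change.path][change.index]
    elif isinstance(change, RenameFile):
        s[change.new] = s.pop(change.old)
    return s

def replay(changes: List[Change]) -> Snapshot:
    """A version is whatever you get by replaying a history of changes from empty."""
    s: Snapshot = {}
    for change in changes:
        s = apply(s, change)
    return s

base = [AddFile("x.txt"), InsertLine("x.txt", 0, "hello")]
alice = InsertLine("x.txt", 0, "from alice")
bob = InsertLine("x.txt", 0, "from bob")

print(replay(base + [alice, bob]))  # {'x.txt': ['from bob', 'from alice', 'hello']}
print(replay(base + [bob, alice]))  # {'x.txt': ['from alice', 'from bob', 'hello']}
```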
[08:19.880 --> 08:27.560] This sounds like a completely useless and unimportant operation, but it's not. What that
[08:27.560 --> 08:34.280] means is you can actually, for example, you can take a bug fix from a remote and
[08:34.280 --> 08:39.880] cherry pick it into your branch without any consequences. You will just cherry pick the
[08:39.880 --> 08:45.080] bug fix and that's it. And it will just work. You won't have to worry about having to merge that
[08:45.080 --> 08:50.120] branch in the future. You won't have to worry about any of that. And if that bug fix turns out
[08:50.120 --> 08:56.280] to be bad and turns out to be inefficient, for example, and you've continued working,
[08:56.280 --> 09:01.720] well, you can still go back and remove just that bug fix without touching any of your further work.
[09:02.360 --> 09:07.800] So this gives you this flexibility that people actually want to reason about. So when you're
[09:07.800 --> 09:13.640] using Git, you're constantly rebasing and merging and cherry picking. And there's also all these
[09:13.640 --> 09:19.400] commands to deal with conflicts, which Git doesn't really model. There's no conflict in
[09:19.400 --> 09:25.320] commits. Conflicts are just failures to merge and they're never stored in commits. They're
[09:25.320 --> 09:30.600] just stored in the working copy. And so when you fix a conflict, Git doesn't know about it.
[09:30.600 --> 09:35.400] It just knows that, oh, here's the fixed version. So this means that if you have to fix the same
[09:35.400 --> 09:40.120] conflict again in the future, well, Git doesn't know about it. It just knows that, well, there
[09:40.120 --> 09:44.920] was this conflict, or there were these two versions that the user tried to merge. And then there was
[09:44.920 --> 09:49.000] this version with the conflicts fixed, but it doesn't know how you fixed the conflict.
[09:49.000 --> 09:54.920] So your conflicts might reappear and you might have to solve them again, or you might even have
[09:54.920 --> 09:59.800] conflicts that just appear out of the blue. And then you don't know what these conflicts are about
[09:59.800 --> 10:05.240] and you still have to solve them. In contrast to that, having this ability to reorder your
[10:05.240 --> 10:11.560] changes gives you a possibility to just remove one side of the conflict without touching the other
[10:12.200 --> 10:18.360] or model precisely what happens when you change things. It also forces you to look at all the
[10:18.360 --> 10:23.880] cases. When you look at all the cases of a merge, you're like, okay, what are all the cases of a
[10:23.880 --> 10:28.920] conflict? Well, for example, if two people introduce a file with the same name in parallel, it's a
[10:28.920 --> 10:35.880] conflict. If Alice changes a function's name and Bob, at the
[10:35.880 --> 10:40.280] same time, in parallel, calls that function, what should happen? Is that a conflict? Well,
[10:40.280 --> 10:47.960] Pijul actually doesn't model that, but it does model a large number of cases of conflicts. And
[10:47.960 --> 10:54.440] so this is much easier. It will probably save a lot of expensive engineering time.
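The parallel cases Pierre walks through can be checked mechanically. The following is a minimal Python sketch under stated assumptions. Each patch is reduced to hypothetical `(kind, target)` operations, either adding a file path or inserting after a given line ID, and the patches are known to be independent (recorded in parallel). Two patches adding the same path give a name conflict. Two patches inserting after the same line leave their relative order undefined, which this sketch reports as an order conflict. As in the episode, the rename-versus-call case is out of scope.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical operation encoding: ("add_file", path) or ("insert_after", line_id).
Op = Tuple[str, str]

def find_conflicts(parallel: Dict[str, List[Op]]) -> List[str]:
    """Report conflicts among patches recorded in parallel (none depends on another)."""
    touched_by: Dict[Op, Set[str]] = defaultdict(set)
    for patch_name, ops in parallel.items():
        for op in ops:
            touched_by[op].add(patch_name)

    conflicts = []
    for (kind, target), patches in sorted(touched_by.items()):
        if len(patches) < 2:
            continue  # only one patch touched this spot: no conflict
        who = ", ".join(sorted(patches))
        if kind == "add_file":
            conflicts.append(f"name conflict: {target} added by {who}")
        elif kind == "insert_after":
            conflicts.append(f"order conflict after line {target}: {who}")
    return conflicts

print(find_conflicts({
    "alice": [("add_file", "README.md"), ("insert_after", "L7")],
    "bob": [("add_file", "README.md"), ("insert_after", "L9")],
}))
# ['name conflict: README.md added by alice, bob']
```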
[10:54.600 --> 11:02.520] Listen to season two of Crossing the Enterprise Chasm, hosted by WorkOS founder Michael
[11:02.520 --> 11:07.560] Grinich. Learn how top startups move upmarket and start selling to enterprises with features
[11:07.560 --> 11:13.880] like single sign-on, directory sync, audit logs, and more. Visit workos.com/podcast and
[11:13.880 --> 11:16.360] make your app enterprise-ready today.
[11:16.360 --> 11:25.560] Yeah, in my experience, the merge conflicts are very manual. So it takes a lot of time to actually
[11:25.560 --> 11:31.400] resolve them. Do Pijul and patch algebra help reduce the manual load?
[11:31.400 --> 11:36.920] Yeah, absolutely. So first of all, you have far fewer conflicts. Why? Well, all these
[11:36.920 --> 11:41.960] artificial conflicts that Git just invents out of nothing, just because you didn't follow the
[11:41.960 --> 11:47.160] good practices, for example, or because you have long-lived branches since your
[11:47.160 --> 11:54.680] job requirements need them: you won't have those. So there is a lot less
[11:54.680 --> 12:00.760] manual work to do in the first place, because there are fewer problems to fix. And then when you're in the
[12:00.760 --> 12:07.960] process of solving the conflicts, what happens in Pijul is that we keep track, in the data structure
[12:07.960 --> 12:15.080] used to merge the patches, of who introduced which bytes. It's down to the byte level.
[12:15.080 --> 12:21.880] It's still super efficient, but we know exactly who introduced which bytes in which patch.
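The following is a minimal Python sketch of per-byte attribution. It assumes a much simpler structure than Pijul's, which per the episode keeps this information in the data structure used to merge patches. Here the file is a list of spans, each tagged with the patch that introduced it, and an insertion splits the span it lands in. `AttributedFile` and `blame` are hypothetical names. While resolving a conflict, this is the kind of lookup that tells you which side each byte came from.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Span:
    patch: str   # name of the patch that introduced these bytes
    data: bytes

class AttributedFile:
    """Hypothetical toy: a file kept as spans, each tagged with the patch that wrote it."""

    def __init__(self) -> None:
        self.spans: List[Span] = []

    def insert(self, pos: int, data: bytes, patch: str) -> None:
        offset = 0
        for i, span in enumerate(self.spans):
            if offset <= pos <= offset + len(span.data):
                cut = pos - offset
                pieces = [Span(span.patch, span.data[:cut]), Span(patch, data),
                          Span(span.patch, span.data[cut:])]
                # Split the existing span around the new bytes, dropping empty pieces.
                self.spans[i:i + 1] = [p for p in pieces if p.data]
                return
            offset += len(span.data)
        if pos != offset:
            raise IndexError("insert position out of range")
        self.spans.append(Span(patch, data))

    def content(self) -> bytes:
        return b"".join(span.data for span in self.spans)

    def blame(self, index: int) -> str:
        """Return the name of the patch that introduced the byte at `index`."""
        offset = 0
        for span in self.spans:
            if offset <= index < offset + len(span.data):
                return span.patch
            offset += len(span.data)
        raise IndexError("byte index out of range")

f = AttributedFile()
f.insert(0, b"hello world", "alice-1")
f.insert(5, b",", "bob-1")
f.insert(12, b"!", "alice-2")
print(f.content())                          # b'hello, world!'
print(f.blame(0), f.blame(5), f.blame(12))  # alice-1 bob-1 alice-2
```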
[12:21.880 --> 12:27.640] We can tell, okay, this byte comes from that patch. And so this is a really useful tool
[12:27.640 --> 12:31.880] if you want to solve conflicts, because while you're solving a conflict,
[12:31.880 --> 12:38.120] you can know exactly what the sides of the conflict are, and this makes solving them much
[12:38.120 --> 12:44.040] easier, in my experience at least. So I think this is
[12:44.040 --> 12:51.320] going to save a lot of time. I was going back through the podcasts we've recorded, and I
[12:51.320 --> 12:56.840] remember now we sat down with Arthur Breitman, who was also educated in France, to talk about Tezos
[12:56.840 --> 13:03.800] and the blockchain, and why he loves OCaml. So you're right: for everyone who was educated
[13:03.800 --> 13:07.720] in it, something interesting came out of it. A source of national pride and some interesting
[13:07.720 --> 13:12.760] ideas about functional programming. Well, the initial version of Rust was also written in OCaml.
[13:12.760 --> 13:18.760] See? Today I learned. So one of the interesting splits I found in the version control article was
[13:19.320 --> 13:26.680] between folks who deal with mostly code and their source control, and places like
[13:26.680 --> 13:33.160] video game companies that have large binaries. Does patch algebra apply to binary files as well?
[13:34.040 --> 13:39.240] Absolutely. Because when you're describing changes, when you're describing what happens
[13:39.240 --> 13:47.000] in the change, you might say things like, oh, Alice today introduced that file. She added the
[13:47.000 --> 13:55.400] file to the repository, and the file is like two gigabytes in size. And so there's the actual two
[13:55.400 --> 14:00.760] gigabytes, which Git might store, for example. Well, you'd better use Git LFS if you do that.
[14:00.760 --> 14:07.400] But in a classic version control system, you might just add the file to SVN, for example. You might just
[14:07.400 --> 14:13.480] upload the file, and that's it. When you're describing changes, well, you can try to do that
[14:13.480 --> 14:18.840] in Darcs, but I don't recommend it for performance reasons. But in Pijul, what you'll actually say is,
[14:18.840 --> 14:24.440] okay, here's a change: Alice introduced two gigabytes. What I just
[14:24.440 --> 14:30.680] said is very short. And it's just like one file, and the information is really tiny. It's just
[14:30.680 --> 14:34.840] logarithmic in the actual two gigabytes. And then there's the two gigabytes themselves.
[14:35.640 --> 14:41.240] And the thing is, using patches, you can separate the contents of the patches from what the patch
[14:41.240 --> 14:46.360] did. So by modeling the actual operation, you can be like, okay, I can apply this patch without
[14:46.360 --> 14:52.120] knowing what's in the file. I can just say that I added two gigabytes without telling you what the
[14:52.120 --> 14:59.880] two gigabytes are. So this sounds like, okay, how can this be useful? Well, if Alice goes on and
[14:59.880 --> 15:07.000] writes multiple versions of the two gigabyte file, she might just go on and do that, upload a few
[15:07.000 --> 15:11.880] versions. And then when you want to know the contents of the file, you don't have to download
[15:11.880 --> 15:15.960] everything directly. You just have to download: well, Alice added two gigabytes here, then she modified the
[15:15.960 --> 15:22.040] file, added another gigabyte, then she compressed it or did something. And then there's
[15:22.040 --> 15:28.040] another three-gigabyte patch. But you don't have to download any of that, you just have to download
[15:28.040 --> 15:36.600] the information that Alice did some stuff. And then after you've applied all of Alice's
[15:36.600 --> 15:41.400] changes, you can just say, okay, the remaining parts of the file that are
[15:41.400 --> 15:47.400] still alive after all these patches are those bytes, and now I just have to download
[15:47.400 --> 15:53.560] those bytes. So maybe I'll just end up downloading one full version of the file, two gigabytes,
[15:53.560 --> 15:57.640] but I won't download the entire history, going through all the versions one by one.
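Pierre's point is that a patch can say "Alice inserted N bytes here" without carrying the N bytes. The following is a minimal Python sketch of that separation under a simplifying assumption: the file is a list of references into each patch's content blob, edited by byte offset, which is not Pijul's format. Every patch is applied as metadata only, and contents are fetched only for the ranges still alive at the end. `insert`, `delete`, `materialize`, and the `blobs` dict standing in for remote storage are all hypothetical.

```python
from typing import Callable, List, Tuple

# A reference into a patch's content blob: (patch name, start, end).
Ref = Tuple[str, int, int]

def insert(refs: List[Ref], pos: int, patch: str, length: int) -> List[Ref]:
    """Metadata only: 'patch inserted `length` bytes at `pos`' (its bytes are not needed)."""
    out: List[Ref] = []
    offset, placed = 0, False
    for name, start, end in refs:
        size = end - start
        if not placed and offset <= pos <= offset + size:
            cut = start + (pos - offset)
            out += [(name, start, cut), (patch, 0, length), (name, cut, end)]
            placed = True
        else:
            out.append((name, start, end))
        offset += size
    if not placed:
        if pos != offset:
            raise IndexError("insert position out of range")
        out.append((patch, 0, length))
    return [r for r in out if r[2] > r[1]]  # drop empty pieces

def delete(refs: List[Ref], pos: int, length: int) -> List[Ref]:
    """Metadata only: remove file bytes [pos, pos + length)."""
    out: List[Ref] = []
    offset, lo, hi = 0, pos, pos + length
    for name, start, end in refs:
        a, b = offset, offset + (end - start)  # this piece's range in the file
        if b <= lo or a >= hi:
            out.append((name, start, end))
        else:
            if a < lo:
                out.append((name, start, start + (lo - a)))
            if b > hi:
                out.append((name, start + (hi - a), end))
        offset = b
    return out

def materialize(refs: List[Ref], fetch: Callable[[str, int, int], bytes]) -> bytes:
    """Only now touch contents, and only for the byte ranges that are still alive."""
    return b"".join(fetch(name, start, end) for name, start, end in refs)

blobs = {"p1": b"0123456789", "p2": b"xyz"}  # stand-in for remote patch contents
fetched: List[int] = []

def fetch(name: str, start: int, end: int) -> bytes:
    fetched.append(end - start)
    return blobs[name][start:end]

refs: List[Ref] = []
refs = insert(refs, 0, "p1", 10)  # Alice adds a 10-byte file
refs = delete(refs, 0, 8)         # a later patch deletes its first 8 bytes
refs = insert(refs, 2, "p2", 3)   # another patch appends 3 bytes
print(materialize(refs, fetch), sum(fetched))  # b'89xyz' 5
```

Here 13 bytes of content were recorded across the history, but only the 5 live bytes are fetched. With multi-gigabyte assets, the same idea is what would save the bandwidth.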
[15:58.600 --> 16:04.920] Now, I've never tested that at scale on an actual video game project, but I
[16:04.920 --> 16:11.160] believe that this has the potential to save a lot of bandwidth and make things a lot easier
[16:11.160 --> 16:15.640] for video game studios. And actually, I have a project going on with the authors of
[16:16.440 --> 16:23.720] Godot, the open source video game engine, like a video game editor. So we'll see what
[16:23.720 --> 16:31.000] comes out of that. But we're totally aligned in what we want to do. We're fully open source.
[16:31.080 --> 16:36.360] So there's something exciting and new going on in the video game industry.
[16:37.320 --> 16:40.280] I think Godot is really bringing in a lot of fresh air.
[16:40.840 --> 16:47.960] Yeah, I mean, the fully open source projects are very popular. If you have a bug you want to fix,
[16:47.960 --> 16:53.720] you can just go fix it. And it's been fascinating to see sort of the race going on these days between
[16:53.720 --> 16:59.560] the closed corporate world that's developing cutting edge AI, and all of the places like
[16:59.560 --> 17:04.440] Hugging Face and Stable Diffusion and, you know, others that are trying to keep pace with them in
[17:04.440 --> 17:08.520] an open source way and with a kind of a community of contributors. So very cool.
[17:08.520 --> 17:15.320] Yeah. So we were talking about how it doesn't create versions. It seems to me that the version system
[17:15.320 --> 17:21.720] is sort of a legacy from when we actually burned discs or released binaries to download and install.
[17:22.280 --> 17:28.760] With your background in distributed computing, do you think that this can be a better way to
[17:29.560 --> 17:32.520] update and maintain all the distributed systems we have now?
[17:33.080 --> 17:41.960] Yeah, I hope so at least. So one really cool example I can give is NixOS. So NixOS is not really
[17:41.960 --> 17:47.480] a Linux distribution. It's actually a language with a massive standard library containing a lot of
[17:47.480 --> 17:54.360] packages. And you can use this language to build your own system. So that's the promise of NixOS.
[17:55.000 --> 18:01.800] And so while doing so, for example, if you're maintaining machines in the cloud, you probably
[18:01.800 --> 18:09.400] want to build an image and use, like, one custom version of what's called Nixpkgs, the standard
[18:09.400 --> 18:16.040] library of NixOS. And so you want to customize this in one way or another and then release some
[18:16.040 --> 18:23.480] of your patches to the official central repository for Nixpkgs, but then keep some of the others
[18:23.480 --> 18:29.080] for yourself. So you want these multiple variants. In Git, you would do that by having lots of different
[18:29.080 --> 18:37.160] branches or feature branches, which you can push to the central repository. Then you would work on
[18:38.360 --> 18:45.320] another branch, which would be the merge of all those patches plus the changes that occur in the
[18:45.320 --> 18:49.960] Nixpkgs central repository. This quickly becomes a nightmare to maintain because you have
[18:49.960 --> 18:55.000] to keep rebasing your changes on top of each other and on top of what happens in Nixpkgs.
[18:55.000 --> 19:01.800] Then when your changes get merged into Nixpkgs, you get conflicts. And so you have to go back to
[19:01.800 --> 19:10.040] some old commits, which might not even exist anymore. So I believe that maintaining multiple versions or
[19:10.040 --> 19:17.560] multiple fixes at the same time to one tool can be much, much, much easier using tools like Pijul.
[19:17.640 --> 19:24.040] So there's one announcement I can make. This is something I've been working on for a while.
[19:24.680 --> 19:30.840] So Pijul has its own sort of GitHub for Pijul, which is called the Nest.
[19:31.480 --> 19:37.960] And so far it hasn't been super successful, neither commercially nor, I should say,
[19:38.680 --> 19:45.240] industrially because it doesn't scale very well. It's been through a data center fire,
[19:45.240 --> 19:52.680] if you remember, two years ago in Strasbourg in the OVH fire. And so it's using a replicated
[19:52.680 --> 19:57.800] architecture, but it's not very satisfactory. It's written in Rust. It operates in three different
[19:57.800 --> 20:03.240] data centers, but it's not easy to maintain. So I've been working on a new serverless infrastructure
[20:03.240 --> 20:09.240] for that, Function as a Service. So Function as a Service providers don't give you an actual disk
[20:09.240 --> 20:16.040] on which you could run Pijul, but I've been able to fake Pijul repositories using Cloudflare's
[20:16.040 --> 20:24.360] KV, for example. And so this gives infinite scalability and excellent reliability. So
[20:25.160 --> 20:32.040] I'm working on that. My prototype is very close to being ready. So I hope I'll be able to release
[20:32.040 --> 20:36.120] that in a few days, or in the worst case, a few weeks.
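Pierre doesn't describe the layout he uses, so the following Python sketch is only a guess at the general shape of a repository on a key-value store. Patches are stored content-addressed under their SHA-256 hash, and each channel (Pijul's name for a branch) is kept as a list of patch hashes, all behind a two-method key-value interface. `KV`, `MemoryKV`, and `KVRepo` are hypothetical. An in-memory dict stands in for a hosted store such as Cloudflare Workers KV.

```python
import hashlib
import json
from typing import Dict, List, Optional, Protocol

class KV(Protocol):
    """The only storage operations the sketch needs."""
    def get(self, key: str) -> Optional[bytes]: ...
    def put(self, key: str, value: bytes) -> None: ...

class MemoryKV:
    """In-memory stand-in for a hosted key-value store."""
    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self.data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self.data[key] = value

class KVRepo:
    """Hypothetical layout: patch/<sha256> -> patch bytes, channel/<name> -> JSON list of hashes."""
    def __init__(self, kv: KV) -> None:
        self.kv = kv

    def put_patch(self, body: bytes) -> str:
        digest = hashlib.sha256(body).hexdigest()
        self.kv.put(f"patch/{digest}", body)  # same bytes -> same key, so re-uploads are idempotent
        return digest

    def log(self, channel: str) -> List[str]:
        raw = self.kv.get(f"channel/{channel}")
        return json.loads(raw) if raw else []

    def apply(self, channel: str, digest: str) -> None:
        if self.kv.get(f"patch/{digest}") is None:
            raise KeyError(f"unknown patch {digest}")
        hashes = self.log(channel)
        if digest not in hashes:  # applying a patch twice is a no-op
            hashes.append(digest)
            self.kv.put(f"channel/{channel}", json.dumps(hashes).encode())

repo = KVRepo(MemoryKV())
h = repo.put_patch(b"add README.md")
repo.apply("main", h)
repo.apply("main", h)
print(repo.log("main") == [h])  # True
```

A real deployment would also have to cope with the store's consistency model. Workers KV, for instance, is eventually consistent, so concurrent updates to one channel key need care.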
[20:40.680 --> 20:44.280] All right, everybody. It is that time of the show. We want to shout out someone who came
[20:44.280 --> 20:48.280] on and helped save a little knowledge from the dustbin of history, answered a question.
[20:49.400 --> 20:59.640] Today, the lifeboat was awarded to Rachit, R-A-C-H-I-T: passing objects between fragments.
[20:59.640 --> 21:02.360] That's the question. It's not really a phrase, it's a question. They're using the built-in
[21:02.360 --> 21:06.280] navigation drawer. They've got fragment menus, and they want to communicate between those fragments,
[21:06.280 --> 21:11.640] passing data from one to another. This is about Android fragments. So if you ever wanted to pass
[21:11.640 --> 21:16.440] objects between Android fragments, get that data moving around, we have an answer for you. And
[21:16.440 --> 21:20.040] thanks to Rachit and congrats on your Lifeboat badge. Appreciate you sharing some knowledge on
[21:20.040 --> 21:25.880] Stack Overflow. All right, everybody. Thanks for listening. I am Ben Popper, Director of Content
[21:25.880 --> 21:30.280] here at Stack Overflow. You can always find me on Twitter, at Ben Popper. You can always reach us
[21:30.280 --> 21:34.680] with questions or suggestions, podcast at stackoverflow.com. If you like the show,
[21:34.680 --> 21:41.960] leave us a rating and a review. It really helps. And thanks for listening. I'm Ryan Donovan. I edit
[21:41.960 --> 21:47.560] the blog here at Stack Overflow. It's located at stackoverflow.blog. And if you want to reach out
[21:47.560 --> 21:53.720] to me, you can find me on Twitter at RThorDonovan. Well, I'm Pierre-Étienne Meunier,
[21:54.440 --> 22:01.000] and you can browse pijul.com or pijul.org if you want to know more about this project
[22:01.640 --> 22:06.920] and send a message to pe@pijul.org. All right, everybody. Thanks for listening,
[22:06.920 --> 22:18.280] and we will talk to you soon.