
#716 Is there a way to convert the patch to the diff format?

Closed on February 28, 2023
JSDurand on September 24, 2022

The patch produced by commands like pijul diff and pijul change is not the same as that produced by diff. I guess this is because pijul needs more information to apply the patch.

But if pijul supports output in the diff format, then it would be easier to communicate with other version control systems, and perhaps to facilitate the integration in editors, such as Emacs.

What do you think? Is there already a plan for doing this that I missed?

JSDurand on September 24, 2022

After some more thinking, I guess the biggest problem is that pijul works with generalized files, instead of regular files.

Even though pijul represents the generalized files as plain files somehow on the file system, that representation is not “faithful”, that is, not injective. Of course this is so because every file is a generalized file, so if we can find an inverse to the ordinary inclusion from files to generalized files, then the two are actually equal, and the generalization generalizes nothing.

Moreover, I think this is related to the problem of translating a pijul repository to a git one and back. Every file is a generalized file, but if we translate a pijul repository to a git repository, the git history can only contain versions of ordinary files. So we can never be sure whether a version passed back from a git repository originally represented a generalized file.

Thus I am led to believe that neither the conversion to the standard diff format nor the conversion to a git repository can be done bijectively. Maybe we need to do something different. What do you think?
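
To make the non-injectivity argument concrete, here is a minimal toy sketch. It assumes a generalized file can be modelled as a set of lines plus "comes before" edges, and that rendering it to a plain file is a deterministic topological sort. The function `flatten` and the alphabetical tie-break are hypothetical choices for illustration, not Pijul's actual output algorithm.

```python
import heapq

def flatten(lines, edges):
    """Render a graph of lines as a plain file (toy model, not Pijul's algorithm).

    `edges` holds (before, after) pairs. Lines with no order between them
    are emitted alphabetically, which is an arbitrary tie-break.
    """
    indegree = {line: 0 for line in lines}
    successors = {line: [] for line in lines}
    for before, after in edges:
        successors[before].append(after)
        indegree[after] += 1

    ready = [line for line in lines if indegree[line] == 0]
    heapq.heapify(ready)
    output = []
    while ready:
        line = heapq.heappop(ready)
        output.append(line)
        for nxt in successors[line]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, nxt)
    return output

lines = ["a", "b", "c", "d"]
# "b" and "c" are unordered relative to each other: a genuinely generalized file.
concurrent = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}
# A totally ordered graph: an ordinary file.
linear = {("a", "b"), ("b", "c"), ("c", "d")}

print(flatten(lines, concurrent))
print(flatten(lines, linear))
```

In this toy model the two graphs differ but render to the same plain file, so the rendering cannot be inverted.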

joyously on October 4, 2022

I think using a common diff format is important to interface with existing tools. But with the current set of commands, can you even extract a file in any other state than what is in the working folder?

sellout on October 11, 2022

This is a bit hand-wavy, but I think you’re right, it can’t be bijective. It seems like a Git repo is a retract of a Pijul repo (and, equivalently(?), a (unified) diff is a retract of a Pijul patch). That is, pijulFromGit :: Git -> Pijul has a left-inverse gitFromPijul :: Pijul -> Git, which means that gitFromPijul . pijulFromGit :: Git -> Git should produce the same Git repo you started with. However, the reverse composition (pijulFromGit . gitFromPijul :: Pijul -> Pijul) is merely idempotent (you get the same result no matter how often you apply it, but it may not be the original input).

And then the same thing for conversions between patches and diffs.

That said, I think a surjective function pijul diff --unified would be very useful. And the properties of those conversions can be tested to prevent (some) regressions in the conversions.
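
As a rough illustration of the properties that could be tested, here is a toy sketch in Python rather than the Haskell-style notation above. It assumes a Git file is a tuple of lines and a Pijul file is a tuple of sets of alternative lines (more than one alternative standing in for a conflict). The names `pijul_from_git` and `git_from_pijul` mirror the signatures above but are hypothetical, as is the model itself.

```python
def pijul_from_git(git_file):
    """Embed an ordinary file: every position has exactly one alternative."""
    return tuple(frozenset([line]) for line in git_file)

def git_from_pijul(pijul_file):
    """Flatten a generalized file, forgetting which lines were alternatives."""
    return tuple(
        line
        for alternatives in pijul_file
        for line in sorted(alternatives)
    )

def round_trip_pijul(pijul_file):
    """The composition pijul_from_git . git_from_pijul."""
    return pijul_from_git(git_from_pijul(pijul_file))

git_file = ("x = 1", "y = 2")
conflicted = (frozenset(["x = 1"]), frozenset(["y = 2", "y = 3"]))

# Retract property: Git -> Pijul -> Git gives back the original.
assert git_from_pijul(pijul_from_git(git_file)) == git_file

# The reverse composition is idempotent, but it is not the identity.
once = round_trip_pijul(conflicted)
assert round_trip_pijul(once) == once
assert once != conflicted
```

The same three assertions could be run over randomly generated inputs by a property-testing library to catch regressions in real conversions.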

pmeunier on February 28, 2023

We used to have a conversion to/from existing Unix patches, a long time ago. This has been dropped because it was too confusing for everybody, and almost impossible to maintain.

I agree that more interoperability is always good, but Git doesn’t store the same information as Pijul:

  • Git doesn’t store enough information to guarantee anything about the result of a merge (I believe the only guarantee is that all the lines from both branches will appear at least (or exactly?) once in the result).

  • However, Git stores more information about the order between versions. That is optional in Pijul, and uses “tags” (which we plan to make super light in the near future).

This doesn’t matter much when importing in one direction, but if we have both directions, what about re-importing? The end of this comment shows how a simple operation we clearly want to do may end up causing a lot of trouble.

In the following explanation I’ll use an odd number of “primes” for the Pijul versions (such as A’ or A’’’), and an even number of “primes” for the Git versions.

Imagine a situation where we have two commits A and B, as well as an older commit U touching another part of the repo. We import them into Pijul as U’, A’ and B’ and “merge” A’ and B’ (meaning we just apply those two patches to the same channel). The merge goes without any conflict (so, no particular commit/patch to add). We then add a few patches C’, D’, E’, which all depend on A’ and B’, and unrecord U’ since we realised it had a bug.

Then, we re-export all these patches, meaning in Git terms that we re-export everything since U’, since these commits have changed their Git identity by losing U as an independent, unrelated ancestor.

After exporting, we merge the original branches (commits A and B), and pray that Git doesn’t reshuffle any line and doesn’t make up any conflict. Git might then be able to merge the re-exported A’’ and B’’ with the original A and B, also without any conflict. Here, one giant gap between Darcs/Pijul and all other version control systems is that Darcs and Pijul users find that silent non-conflict absolutely terrifying, whereas everybody else doesn’t seem to care. To me, this is the version control version of Bjarne Stroustrup’s argument that language safety doesn’t matter since “we can’t agree on what’s safe or not” (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2739r0.pdf)

Finally, we re-import that merge commit into Pijul one more time. This means in particular that we would have stored in the metadata of A’’ and B’’ that these commits came from Pijul patches A’ and B’, respectively. While this is possible, I’m not sure about security, since we have no way to verify that information rigorously. But anyway, even if we did that, we’d end up reimporting A and B, and since the conversion is deterministic, we’ll get exactly A’ and B’ with the same hashes. And we’d also import A’’ and B’’ as A’’’ and B’’’.

Finally, we’d get Pijul to merge A’’’, B’’’, A’ and B’ together, resulting in conflicts at every single line touched by each of these commits.

Since you presumably used this setup to make your merges faster when working with a Git repository you don’t control, this would only speed up your work the first time, and make it orders of magnitude worse every time afterwards.
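
The identity problem in this scenario can be sketched with a toy model. It assumes a Git commit id is a hash of the edit plus its parent ids, and that the import derives a patch id deterministically from the commit id. The helpers `git_commit` and `import_to_pijul` are hypothetical and do not reflect how Git or Pijul actually compute hashes.

```python
import hashlib

def digest(*parts):
    """Short content hash used for toy identities."""
    return hashlib.sha256("\x00".join(parts).encode()).hexdigest()[:12]

def git_commit(edit, parents):
    """Toy Git commit: identity depends on the edit and on the parent ids."""
    return {"id": digest("commit", edit, *sorted(parents)), "edit": edit}

def import_to_pijul(commit):
    """Toy deterministic import: the patch id is a pure function of the commit id."""
    return {"id": digest("patch", commit["id"]), "edit": commit["edit"]}

u = git_commit("fix typo in README", parents=[])
a = git_commit("add line 10 to main.rs", parents=[u["id"]])
# A'': the same edit re-exported after U' was unrecorded, so U is no longer an ancestor.
a_reexported = git_commit("add line 10 to main.rs", parents=[])

a1 = import_to_pijul(a)              # A'
a1_again = import_to_pijul(a)        # re-importing A
a3 = import_to_pijul(a_reexported)   # A'''

# Deterministic conversion: re-importing A gives exactly A' again.
assert a1["id"] == a1_again["id"]
# A''' is a different patch from A', yet it makes the same edit.
assert a1["id"] != a3["id"]
assert a1["edit"] == a3["edit"]
```

Under these assumptions the channel ends up with two unrelated patches making the same edit to the same lines, which is the situation described above.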

So, I’ll close this discussion in favor of #410, since the arguments are the same.

pmeunier closed this discussion on February 28, 2023