The sound distributed version control system

#872 Any plan to support unified diff output?

Opened by oknozor on February 6, 2024
oknozor on February 6, 2024

Supporting unified diff would make it possible to freely use tools from the Git ecosystem, such as delta, diff2html, etc.

I am not saying this should be the default, but it would be nice to have.

If you are interested I can do it.

pmeunier on February 7, 2024

Unfortunately, as much as everybody would like it to be, Pijul is not and cannot be made Git-compatible. Git stores far from enough information about edits to let Pijul do its job. Our patches are way more complicated, and there isn’t really anything we can do about it.

oknozor on February 7, 2024

Do you have an example of a diff incompatibility? I am not yet familiar with conflict resolution, or with Pijul in general.

joyously on February 7, 2024

I think the point is not to make Pijul compatible to Git, but to use an existing standard so that existing tools can be used. I think it’s only about an option for output, not the internal structure.

There are pages of info in the Git help describing all the flags that affect the diff output, the log output, the pretty format, the file stat output, the blame output, and the show output.

pmeunier on February 7, 2024

Anything that solves or creates a conflict will generally become incompatible.

For example if Alice deletes a block while Bob is writing in the same block, line numbers won’t be enough to know what to do when Alice applies Bob’s change. Outside of Pijul, two options exist:

  • Operate on patches directly, and do operational transforms on them, like Darcs, to detect the conflict.
  • Recover patches from snapshots (like Git), and then do a “diff of the diffs” (aka 3-way merge) to try and do some operational transform.
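
To make the Alice and Bob example concrete, here is a minimal sketch in Rust. `LineEdit`, `apply` and `alice_then_bob` are hypothetical names for illustration only, not Pijul's, Darcs's or Git's patch format; the only assumption is that an edit addresses the file by line number.

```rust
/// A naive edit that addresses the file by 1-based line number.
/// Hypothetical type for illustration; no real tool is modelled here.
#[derive(Debug, Clone)]
enum LineEdit {
    /// Insert `text` so that it becomes line `line`.
    Insert { line: usize, text: String },
    /// Delete `len` lines starting at line `start`.
    Delete { start: usize, len: usize },
}

/// Apply one edit to a file, trusting its line numbers blindly.
fn apply(file: &[String], edit: &LineEdit) -> Result<Vec<String>, String> {
    let mut out = file.to_vec();
    match edit {
        LineEdit::Insert { line, text } => {
            let line = *line;
            if line == 0 || line > out.len() + 1 {
                return Err(format!("line {} is out of range", line));
            }
            out.insert(line - 1, text.clone());
        }
        LineEdit::Delete { start, len } => {
            let (start, len) = (*start, *len);
            if start == 0 || start - 1 + len > out.len() {
                return Err(format!("lines {}..{} are out of range", start, start + len));
            }
            out.drain(start - 1..start - 1 + len);
        }
    }
    Ok(out)
}

/// Alice deletes the block (lines 2 to 4). Bob, working from the same base,
/// inserts a line inside that block (at line 3). This applies Alice's edit
/// and then Bob's, without adjusting Bob's line number.
fn alice_then_bob() -> Result<Vec<String>, String> {
    let base: Vec<String> = ["header", "block a", "block b", "block c", "footer 1", "footer 2"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    let alice = LineEdit::Delete { start: 2, len: 3 };
    let bob = LineEdit::Insert { line: 3, text: "bob's line".to_string() };
    let after_alice = apply(&base, &alice)?;
    apply(&after_alice, &bob)
}
```

In this sketch, Bob's line ends up between the two footer lines and no error is reported: the line number alone carries no trace of the block he was actually editing.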

Pijul does something completely different, by having its patches operate on a more complicated data structure than text files. We like to give the illusion that we are operating on text files (for example on the “changes” pages of this website, or in pijul change or pijul record), but this is not what we actually do.

pmeunier on February 7, 2024

That said, if what you want is outputting unified diff, then certainly we can do that, and I’d very much love to turn the “Changes” page of this website into something more colorful and interesting like the pages you link.

This is an ambitious project, though.

oknozor on February 7, 2024

I think the point is not to make Pijul compatible to Git

Exactly, actually the unified diff format is a GNU diff thing not related to git at all. It’s just a spec to format diff outputs.

My point is the following:
Because many tools out there already use the unified diff format as input, if you support this format in addition to the current one (for instance by adding a --unified-diff flag to pijul diff), you incidentally unlock a lot of this tooling for free.
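
For readers who have not looked at the format closely, here is a rough sketch of what emitting it involves. The `Line`, `Hunk`, `render_hunk` and `render_file` names are hypothetical and are not part of libpijul, and the `--unified-diff` flag mentioned above is only a proposal. The sketch always writes both line counts in the `@@` header (the format allows a count of 1 to be omitted) and does not handle the `\ No newline at end of file` marker.

```rust
/// One line of a hunk body. Hypothetical type, not part of libpijul.
enum Line {
    Context(String),
    Removed(String),
    Added(String),
}

/// A hunk: 1-based start lines in the old and new file, plus the body.
struct Hunk {
    old_start: usize,
    new_start: usize,
    lines: Vec<Line>,
}

/// Render one hunk: an `@@ -old_start,old_len +new_start,new_len @@` header,
/// then each line prefixed with ' ', '-' or '+'.
fn render_hunk(hunk: &Hunk) -> String {
    // The old side contains context and removed lines; the new side
    // contains context and added lines.
    let old_len = hunk.lines.iter().filter(|l| !matches!(l, Line::Added(_))).count();
    let new_len = hunk.lines.iter().filter(|l| !matches!(l, Line::Removed(_))).count();
    let mut out = format!(
        "@@ -{},{} +{},{} @@\n",
        hunk.old_start, old_len, hunk.new_start, new_len
    );
    for line in &hunk.lines {
        let (prefix, text) = match line {
            Line::Context(t) => (' ', t),
            Line::Removed(t) => ('-', t),
            Line::Added(t) => ('+', t),
        };
        out.push(prefix);
        out.push_str(text);
        out.push('\n');
    }
    out
}

/// Render the two file header lines followed by every hunk.
fn render_file(old_path: &str, new_path: &str, hunks: &[Hunk]) -> String {
    let mut out = format!("--- {}\n+++ {}\n", old_path, new_path);
    for hunk in hunks {
        out.push_str(&render_hunk(hunk));
    }
    out
}
```

Text of this shape is what pagers and renderers that accept unified diff expect on their input.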

Having this would mean that pretty pagers like delta or difftastic would be really easy to port to Pijul (or even compatible without any modification).

External tools such as diff2html could use the diff output directly, etc.

Maybe there are some edge cases with conflict resolution in the output? Hence my previous question. But even if there are conflicts, starting with a subset of the output spec and building up from there would still be great, I think.

oknozor on February 7, 2024

That said, if what you want is outputting unified diff, then certainly we can do that, and I’d very much love to turn the “Changes” page of this website into something more colorful and interesting like the pages you link.

That’s exactly what I want, and I think it’s easier than you think. As I said in the previous message, once you output unified diff, the pretty colors come for free :)

I will give it a try soon and let you know how it goes.

pmeunier on February 7, 2024

The colors shouldn’t be the problem indeed. If you think it’s easy please go ahead! This will require you to dive quite deep into the pristine graph, but that is never a bad thing.

joyously on February 7, 2024

It seems that the existing output could be massaged to the unified output. The difficult part might be providing context lines.

pmeunier on February 7, 2024

Context lines indeed require many design choices, in addition to tricky algorithms to retrieve them efficiently in the right context.
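
As an illustration of one such design choice, on plain text only (the graph side, which is the hard part, is not modelled here): how many context lines to show, and when two nearby changes should share a hunk. The function name `context_ranges` is hypothetical. GNU diff defaults to 3 context lines in unified mode, which is the value used in the usage note below.

```rust
/// Given the 0-based indices of changed lines in a file of `total_lines`
/// lines, return the half-open line ranges `[start, end)` that hunks
/// should cover when `context` lines are shown on each side.
/// Ranges that overlap or touch are merged into a single hunk.
/// Hypothetical helper for illustration; not part of libpijul.
fn context_ranges(changed: &[usize], total_lines: usize, context: usize) -> Vec<(usize, usize)> {
    let mut sorted = changed.to_vec();
    sorted.sort_unstable();
    let mut ranges: Vec<(usize, usize)> = Vec::new();
    for &line in &sorted {
        if line >= total_lines {
            continue; // ignore indices past the end of the file
        }
        // Clamp the window to the file boundaries.
        let start = line.saturating_sub(context);
        let end = (line + context + 1).min(total_lines);
        if let Some(last) = ranges.last_mut() {
            if start <= last.1 {
                // Overlaps or touches the previous window: extend it.
                last.1 = last.1.max(end);
                continue;
            }
        }
        ranges.push((start, end));
    }
    ranges
}
```

With 3 lines of context in a 30-line file, changes at indices 1 and 20 give two hunks, while changes at indices 4 and 8 are close enough to be merged into one.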

oknozor on February 8, 2024

I did some experiments this morning. It is far from complete, but I am getting somewhere at least.

Here is a screenshot of pijul producing its own unified diff, piped to delta: [screenshot: pijul diff with delta]

This is far from being complete and only the edit and replace hunks are supported.

Any hint on how to get context lines, @pmeunier? I will send a work-in-progress patch after doing a bit of cleaning.

joyously on February 8, 2024

It looks like progress!

I was going to point to

#833 missing command to show a file at a certain state

as a way to access the file, so you can get the context lines. I see that a change was added yesterday, so maybe it’s closer than I thought.

oknozor on February 8, 2024

My current implementation uses a custom writer for Hunks, so adding the context lines will be tricky (or not?). I am not sure how much context should be shown in this or that situation.

An alternative would be to use #833 plus imara-diff to diff files directly. My intuition is that it would be more efficient than producing a diff and then combining it with #833 to add context.

Would using imara be ok @pmeunier ? I will look at their unified diff implementation to get a grasp on what context should be displayed anyway.

pmeunier on February 9, 2024

Ok, now you get why I considered it so hard: getting context lines!

I’m not sure how to proceed. One thing you could do is look at a function like libpijul::alive::retrieve::retrieve, which has an instance of what you want, by doing a DFS on the graph.

In your case, you want to start with the hunks introducing new vertices, let’s ignore edges for now. Find the first part of the vertex that is still in the graph (using txn.find_block), and iterate backwards (i.e. following the edges that are PARENT and not DELETED) until you get to alive material (vertices with PARENT adjacent edges that are BLOCK and neither DELETED nor FOLDER).
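
Read literally, the walk described above could be sketched on a toy in-memory graph as follows. Everything here is an assumption made for illustration: the `Graph` and `ParentEdge` types, the flag values, and the function names are hypothetical and are not libpijul's API. The real traversal would go through the transaction (`txn.find_block` and the pristine's adjacency), and the sketch does not model splitting a vertex into blocks.

```rust
use std::collections::{HashMap, HashSet};

// Toy edge flags, named after the ones mentioned above. Values are arbitrary.
const BLOCK: u8 = 1;
const DELETED: u8 = 2;
const FOLDER: u8 = 4;

/// A PARENT edge from a vertex to one of its parents (hypothetical type).
struct ParentEdge {
    to: u32,
    flags: u8,
}

/// Toy graph: vertex id -> its PARENT edges.
type Graph = HashMap<u32, Vec<ParentEdge>>;

/// "Alive" as described above: the vertex has a PARENT edge that is BLOCK
/// and neither DELETED nor FOLDER.
fn is_alive(graph: &Graph, v: u32) -> bool {
    graph.get(&v).map_or(false, |edges| {
        edges
            .iter()
            .any(|e| (e.flags & BLOCK) != 0 && (e.flags & (DELETED | FOLDER)) == 0)
    })
}

/// Walk backwards from `start` along PARENT edges that are not DELETED,
/// stopping at the first alive vertex on each path. The start vertex itself
/// is not tested. Returns the alive vertices found, sorted by id.
fn nearest_alive_ancestors(graph: &Graph, start: u32) -> Vec<u32> {
    let mut seen: HashSet<u32> = HashSet::new();
    seen.insert(start);
    let mut stack = vec![start];
    let mut found = Vec::new();
    while let Some(v) = stack.pop() {
        for edge in graph.get(&v).into_iter().flatten() {
            // Skip deleted edges and vertices that were already visited.
            if (edge.flags & DELETED) != 0 || !seen.insert(edge.to) {
                continue;
            }
            if is_alive(graph, edge.to) {
                found.push(edge.to);
            } else {
                stack.push(edge.to);
            }
        }
    }
    found.sort_unstable();
    found
}
```

When the walk returns more than one candidate, something still has to decide which one is the actual context; the sketch leaves that open.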

The hardest bit will be about doing a topological sort efficiently in order to find which of those is the actual context, in case that comes from a solved conflict. Feel free to look at libpijul::alive::output to see how Pijul uses that same kind of tricks to output files.
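
For reference, here is a minimal sketch of a topological sort using Kahn's algorithm on a toy graph given as an edge list. This is a standard textbook technique shown only to illustrate the idea; nothing in this thread says it is what libpijul::alive::output does, and the function name `topo_sort` is hypothetical. Vertex ids are assumed to be in `0..n`.

```rust
use std::collections::VecDeque;

/// Kahn's algorithm: returns the vertices `0..n` in an order where every
/// edge `(from, to)` has `from` before `to`, or `None` if there is a cycle.
/// Panics if an edge mentions a vertex outside `0..n`.
fn topo_sort(n: usize, edges: &[(usize, usize)]) -> Option<Vec<usize>> {
    let mut indegree = vec![0usize; n];
    let mut children: Vec<Vec<usize>> = vec![Vec::new(); n];
    for &(from, to) in edges {
        children[from].push(to);
        indegree[to] += 1;
    }
    // Start with every vertex that has no incoming edge.
    let mut queue: VecDeque<usize> = (0..n).filter(|&v| indegree[v] == 0).collect();
    let mut order = Vec::with_capacity(n);
    while let Some(v) = queue.pop_front() {
        order.push(v);
        for &c in &children[v] {
            indegree[c] -= 1;
            if indegree[c] == 0 {
                queue.push_back(c);
            }
        }
    }
    // If some vertices were never released, the graph has a cycle.
    if order.len() == n {
        Some(order)
    } else {
        None
    }
}
```

This runs in time linear in the number of vertices and edges. Doing the equivalent efficiently on only the relevant part of a large graph is the difficult part mentioned above.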

joyously on February 9, 2024

Feel free to look at libpijul::alive::output to see how Pijul uses that same kind of tricks to output files.

I’m not sure I understand why there isn’t already a function to return a file. It could be used for this and for the actual resolution of #833.

pmeunier on February 9, 2024

This is because Pijul does something completely different than all other version control systems: the current state of the repository is a graph stored in a database (Sanakirja), and that graph evolves with each patch.

Recovering an old state of a file means reverting enough patches to get back there.

joyously on February 10, 2024

You are caught in the internals of Pijul. I’m simply stating that a VCS is like a database of file versions. It is basic CRUD operations that are needed to interface with that database. Since the logic exists already, it should be in such a fundamental form that it can be called for other commands instead of rewritten for each.

whynothugo on February 11, 2024

I’d also love to see the output of pijul diff and pijul change in unified diff format (or something similar). The lack of context makes the output very uninformative right now.