pijul/pijul - Discussion #803 - pijul export, pijul import, and import-stream compatibility

#803 pijul export, pijul import, and import-stream compatibility

Opened by esr on June 3, 2023

esr on June 3, 2023

Greetings, pijul developers. I am the maintainer of reposurgeon, a tool which aims to at least semi-automate high-quality conversions among version-control systems. I am interested in pijul because I thought the patch algebra of darcs was a fascinating failure, and would be delighted to see a VCS based on a sound patch algebra succeed. Especially if it can have a cleaner and simpler CLI than Git.

I have seen many version-control systems come and go since I first started working with them in the early 1990s. I believe that in order to succeed in today’s competitive environment, a new VCS like pijul needs to make two particular promises to potential users:

(1) Repository import from Git to pijul is painless and preserves every interesting semantic property of a Git repository.

(2) Experimenting with Pijul is not risky - you can move your history from pijul back to Git without losing work.

The second guarantee is important. Everybody hates data jails; even if you sincerely believe nobody who tries pijul will ever want to go back, you need to establish that pijul isn’t a data jail for people to have confidence in it.

Therefore, I suggest that pijul needs to have fast-import and fast-export subcommands built in. I am aware that a single-channel export already exists. I can also see from thinking about pijul’s patch model that replaying a Git commit sequence into an equivalent set of pijul patches should be pretty trivial.

Here is a third promise that would be highly desirable to fulfil:

(3) Export and import work such that the round trip Git -> pijul -> Git is idempotent.

All this having been said, I understand that the Git and pijul data models are not equivalent and that, in particular, (3) may not be achievable. One obvious issue is that pijul appears to have no equivalent of Git annotated tags.

Here is a related problem. Presently, there is no lossless serialization of a pijul repository state. I think it would be interesting to explore what a pijul analog of fast-import streams would look like. If it could be a compatible extension of the Git import-stream format, that would be extremely interesting.

I’m here to suggest a three-step project:

We begin by creating a design document that describes the differences between the pijul and Git data models in enough detail to support writing interoperability code. Such a document would also be useful for onboarding Git developers. E.g. “Want to know exactly how channels differ from Git branches? Go here!”
We should work from that document to specify a pijul dump stream format that is as close as possible to Git dump-stream format - ideally, designed so a git import stream reader can digest it and pull out everything compatible with Git’s data model.
Then we should implement import and export commands in a principled way.
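As a baseline for the second step, here is a minimal Python sketch of the part of a Git fast-import stream that any pijul dump format would need to stay compatible with: a `blob` command followed by a `commit` command that references it by mark. The helper names (`data`, `blob`, `commit`) are hypothetical, the sketch covers only regular files with mode 100644, and nothing in it is pijul-specific.

```python
from typing import Dict, Optional


def data(payload: bytes) -> bytes:
    """Format a 'data' command: exact byte count, then the raw bytes."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def blob(mark: int, content: bytes) -> bytes:
    """Emit a blob and give it a mark so later commits can refer to it."""
    return b"blob\nmark :%d\n" % mark + data(content)


def commit(ref: str, mark: int, committer: str, when: str, message: str,
           files: Dict[str, int], parent: Optional[int] = None) -> bytes:
    """Emit one commit on `ref`; `files` maps each path to a blob mark."""
    out = [
        b"commit %s\n" % ref.encode("utf-8"),
        b"mark :%d\n" % mark,
        b"committer %s %s\n" % (committer.encode("utf-8"), when.encode("ascii")),
        data(message.encode("utf-8")),
    ]
    if parent is not None:
        # Link to the previous commit by its mark.
        out.append(b"from :%d\n" % parent)
    for path, blob_mark in sorted(files.items()):
        # 'M' (filemodify) with mode 100644 = regular, non-executable file.
        out.append(b"M 100644 :%d %s\n" % (blob_mark, path.encode("utf-8")))
    out.append(b"\n")
    return b"".join(out)
```

Concatenating `blob(1, b"hello\n")` with a `commit(...)` whose `files` argument is `{"README": 1}` gives a stream that could be piped into `git fast-import` in an empty repository. Lengths are counted in bytes, not characters, which matters for non-ASCII commit messages.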

By doing these steps, each partial result of the project will be useful even if the resulting import/export tools turn out to have serious limitations.

What I bring to the table is that I have more experience understanding and bridging between different version-control ontologies than anybody else; this was required to write reposurgeon. Also, I am willing to do most of the work to bash together the prose in the design documents if I can get pijul developers to answer my questions.

pmeunier on June 4, 2023

Hi! Welcome here. I don’t have all the answers, but I definitely appreciate your interest.

I also do think that importing and exporting are important. It took me a while to realise what people actually wanted when they asked about Pijul → Git export. The “mathematically pure” version of this is obviously your point (3), and while this might not be impossible, it is at least very hard to do in a safe and “fun” way, unless we attach a lot of metadata to each Git commit when exporting (which could actually work, but there are tons of cases and I never had the time to look at that).

Another use case is a project that wants to use existing Git platforms (GitHub, GitLab…) for SEO/discoverability/others, yet wants to use Pijul to do all the actual work, and just wants a mirror.

The import tool we have (pijul git) was originally designed to test our algorithms at scale on real-size repos with complicated histories, like Nixpkgs, which is arguably much harder to import than carefully maintained histories like the Linux kernel.

The specification document you’re asking for doesn’t exist yet, but is probably not too hard to write now that the basic format is stable (I still expect it to evolve a bit, in a backward-compatible way). I’m interested in writing it anyway, so I guess the best way forward is that I keep this discussion open and start to write a specification (as soon as I’ve fixed the mess I’ve created with the new, “serverless” Nest over at nest.pijul.org).

esr on June 5, 2023

“attach a lot of metadata to each Git commit when exporting” is a plan reposurgeon could cooperate with. For purposes of export/import from bzr my import stream code already supports an extension syntax that can associate property-value pairs with a commit.

So maybe a good place to start on the specification document is this: what are all the properties associated with a pijul commit?
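To make the property-value idea concrete, here is a small Python sketch that writes and reads one metadata line per property. It assumes a length-prefixed `property NAME LENGTH VALUE` line shape of the kind used in bzr-style import streams; the exact syntax reposurgeon accepts should be checked against its documentation. The function names and the property name `pijul-hash` are hypothetical, not an agreed schema.

```python
from typing import Tuple


def format_property(name: str, value: str) -> bytes:
    """Serialize one property; the length is the value's size in bytes."""
    raw = value.encode("utf-8")
    return b"property %s %d %s\n" % (name.encode("utf-8"), len(raw), raw)


def parse_property(line: bytes) -> Tuple[str, str]:
    """Parse a line produced by format_property back into (name, value)."""
    keyword, name, length, rest = line.split(b" ", 3)
    if keyword != b"property":
        raise ValueError("not a property line")
    # The length prefix, not the newline, delimits the value,
    # so values may themselves contain spaces or newlines.
    value = rest[: int(length)]
    if len(value) != int(length):
        raise ValueError("truncated property value")
    return name.decode("utf-8"), value.decode("utf-8")
```

Because the value is length-prefixed, a reader that does not understand a given property can still skip it safely, which is what lets extra pijul metadata ride along without breaking the rest of the stream.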

esr on December 19, 2023

Is there still any interest in developing this specification?

I would like pijul to have first-class support in reposurgeon, and I think that would help pijul in its uphill struggle for adoption.

pmeunier on December 19, 2023

There is still a lot of interest, and I agree that having Pijul in Reposurgeon would be cool. But developer time has been scarce lately. I’m planning on getting some funding soon; hopefully we’ll be able to pay for people to work on things like this.