An implementation of git fast-export for Pijul

#2 panic with RenameNotImplemented

Closed on April 19, 2023
joyously on April 15, 2023

I tried your latest code on a Pijul repo into which I had imported WordPress, to see how it handles a large repo (about 75Meg of source code and 18 years of history). The .git folder is 228Meg and the .pijul folder is 2.3Gig.

Since there is no output for the export, I don’t know how far it got, but it didn’t take long to stop with a panic.

    Compiling pijul-export v0.1.0 (/_working/contrib/pijul-export)
    Finished release [optimized + debuginfo] target(s) in 1m 21s
/_working/contrib/pijul-export$ cd ..
/_working/contrib$ mkdir test-pijul-export
/_working/contrib$ cd test-pijul-export
/_working/contrib/test-pijul-export$ git init
hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint: 
hint: 	git config --global init.defaultBranch <name>
hint: 
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint: 
hint: 	git branch -m <name>
Initialized empty Git repository in /_working/contrib/test-pijul-export/.git/
/_working/contrib/test-pijul-export$ ../pijul-export/target/release/pijul-export --repo /_working/contrib/copy1-of-wordpress-develop/ | git fast-import
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: RenameNotImplemented("b2-include/b2quicktags.js", "wp-admin/FPVISVLU5UBPUOMV33LAI5ZR5CB2SXOVUWNL7TFY5LHTAIRFNY3Q")', src/main.rs:43:45
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:       5000
Total objects:          236 (        17 duplicates                  )
      blobs  :          143 (         0 duplicates         56 deltas of        140 attempts)
      trees  :           55 (        17 duplicates          0 deltas of          0 attempts)
      commits:           38 (         0 duplicates          0 deltas of          0 attempts)
      tags   :            0 (         0 duplicates          0 deltas of          0 attempts)
Total branches:           1 (         1 loads     )
      marks:     1073741824 (       181 unique    )
      atoms:             75
Memory total:          2493 KiB
       pools:          2141 KiB
     objects:           351 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 35184372088832
pack_report: pack_used_ctr            =          2
pack_report: pack_mmap_calls          =          1
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =     427249 /     427249
---------------------------------------------------------------------
andybalholm added a change on April 15, 2023
WQACY5X6MSK7NXFXKV5LSU77RU3UHEHK234GFJ4QUQCZZ23FMHKAC
main
andybalholm on April 15, 2023

I don’t really understand why libpijul is calling rename while it’s outputting a fresh working copy (not modifying an old one), but here’s a change that implements it. Let me know if it solves the issue.

By the way, you can tell from Git’s output that it successfully imported 38 commits.
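
As a rough illustration of reading that number out of the statistics block, here is a minimal Rust sketch. The function name `imported_commits` is hypothetical and not part of pijul-export; it only assumes the `commits:` line looks the way it does in the output pasted above.

```rust
/// Extracts the commit count from the text that `git fast-import` prints
/// when it exits. Hypothetical helper, not part of pijul-export.
fn imported_commits(stats: &str) -> Option<u64> {
    stats
        .lines()
        // The statistics lines are indented, so drop leading spaces first.
        .map(str::trim_start)
        // Keep only what follows the "commits:" label.
        .find_map(|line| line.strip_prefix("commits:"))
        // The first whitespace-separated token after the label is the count.
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|n| n.parse().ok())
}
```

Passing the statistics text from the first run should yield `Some(38)`; text without a `commits:` line yields `None`.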

joyously on April 15, 2023
  1. I tried pijul pull, but it doesn’t see this change. Do I need to pull it from this channel?

  2. Maybe you aren’t doing what you think you are doing when “outputting a fresh working copy (not modifying an old one)”. Isn’t it going through all the history? Can’t some of the commits involve a rename of a file, in this case from “b2-include/b2quicktags.js” to “wp-admin/FPVISVLU5UBPUOMV33LAI5ZR5CB2SXOVUWNL7TFY5LHTAIRFNY3Q” which happened very early on in the project? (although I don’t see why it has a Pijul ID for a file name)

  3. The Git output only appeared after the panic, so I can’t see how the export is going as it runs. The import does show progress as it goes.

andybalholm on April 15, 2023
  1. To pull changes attached to a discussion, you pull from a channel named with a colon and the discussion number: pijul pull --from-channel :2

  2. Yes, it iterates through all the history. But at each change, it creates a fresh, empty working copy and outputs the file contents to that.

  3. Would you like it to print a line with the change ID and timestamp after exporting each change?
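
To make point 2 concrete, here is a hedged Rust sketch of how one full-snapshot commit could be written in the git fast-import stream format. It is an assumption about the general approach, not the actual pijul-export code: the function name and parameters are hypothetical, file contents are treated as text, and paths are assumed not to need quoting. The `deleteall` command empties the previous tree, so every file is re-added with an `M` command.

```rust
/// Builds one `commit` command for a git fast-import stream that replaces the
/// whole tree. Hypothetical helper; simplified to text files and plain paths.
fn full_snapshot_commit(
    branch: &str,
    committer: &str,        // "Name <email>"
    timestamp: i64,         // seconds since the Unix epoch, written as UTC
    message: &str,
    files: &[(&str, &str)], // (path, contents)
) -> String {
    let mut out = String::new();
    out.push_str(&format!("commit refs/heads/{}\n", branch));
    out.push_str(&format!("committer {} {} +0000\n", committer, timestamp));
    // `data <n>` announces exactly n bytes of payload; str::len() counts bytes.
    out.push_str(&format!("data {}\n{}\n", message.len(), message));
    // Start from an empty tree instead of modifying the previous commit's tree.
    out.push_str("deleteall\n");
    for (path, contents) in files {
        out.push_str(&format!("M 100644 inline {}\n", path));
        out.push_str(&format!("data {}\n{}\n", contents.len(), contents));
    }
    // A blank line ends the commit command.
    out.push('\n');
    out
}
```

With this shape, a rename never has to be expressed in the Git stream: the file simply appears under its new path in the next snapshot.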

joyously on April 15, 2023

Thank you.

  1. The change compiled and it has run longer than before, so I guess it’s working. The fan on my laptop turned on…

  2. Regardless of creating a fresh copy, if the change itself is a rename, that functionality has to be there.

  3. I don’t know what it should output, but it needs something, especially for large repos, so a line after every 100 or 1000 changes would be good. (This one is still running after 12 minutes; I expect it to take quite a while, since there are around 50000.)
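
A minimal Rust sketch of the "every 100 or 1000" idea follows. The names `progress_due` and `report_progress` are hypothetical and this is not the exporter's actual code. One assumption matters: since the exporter's stdout is piped into git fast-import, progress lines would have to go to stderr (or another writer), never to stdout.

```rust
use std::io::{self, Write};

/// Returns true when a progress line is due after `done` changes.
/// An interval of 0 disables progress output. `done` is expected to start at 1.
fn progress_due(done: u64, interval: u64) -> bool {
    interval != 0 && done % interval == 0
}

/// Writes a progress line to `log` when one is due. In a real exporter `log`
/// would be stderr, because stdout carries the fast-import stream.
fn report_progress<W: Write>(
    log: &mut W,
    done: u64,
    total: u64,
    interval: u64,
) -> io::Result<()> {
    if progress_due(done, interval) {
        writeln!(log, "exported {} of {} changes", done, total)?;
    }
    Ok(())
}
```

Calling `report_progress(&mut io::stderr(), done, total, 1000)` once per exported change would print a line after every 1000 changes.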

joyously on April 16, 2023

OK, I started it at 18:20 yesterday and it finished at 17:00 today, but the only files output are a couple of files in the .git folder, the most significant of which is .git/objects/pack/pack-*, which is 1.3Gig (whereas the one in the original repo is 216Meg). There is no HEAD file and there are no working copy files. The output is:

fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:     340000
Total objects:       335989 (   8322257 duplicates                  )
      blobs  :       117825 (         0 duplicates      68717 deltas of     117780 attempts)
      trees  :       172374 (   8322257 duplicates          0 deltas of          0 attempts)
      commits:        45790 (         0 duplicates          0 deltas of          0 attempts)
      tags   :            0 (         0 duplicates          0 deltas of          0 attempts)
Total branches:           1 (         1 loads     )
      marks:     1073741824 (    163615 unique    )
      atoms:           5488
Memory total:         26423 KiB
       pools:          2516 KiB
     objects:         23906 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 35184372088832
pack_report: pack_used_ctr            =          2
pack_report: pack_mmap_calls          =          1
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =  217062257 /  217062257
---------------------------------------------------------------------

Looks like Git doesn’t know about this repo:

/_working/contrib/test-pijul-export$ git log
fatal: your current branch 'master' does not have any commits yet
/_working/contrib/test-pijul-export$ git status
On branch master

No commits yet

nothing to commit (create/copy files and use "git add" to track)
andybalholm on April 17, 2023

Try git checkout main. By default, it exports the main channel from Pijul to the main branch in Git.

joyously on April 17, 2023

That command worked and now the working tree is populated. But it still has the 1.3Gig pack file, which I presume has lots of data that Git doesn’t use.

andybalholm on April 17, 2023

I don’t know why the pack file is bigger than the old one. But it makes sense for the .git directory to be around the same size as the .pijul directory. (When I exported the Pijul source code to Git, the .git directory was just slightly smaller than the .pijul directory.) So the mystery isn’t so much why the .git directory is so big now, but why the .pijul directory is so much bigger than the original .git directory.

joyously on April 17, 2023

“it makes sense for the .git directory to be around the same size as the .pijul directory.”

It doesn’t make sense to me. Why give Git more data than it needs to store? Perhaps the pack file only grows (never shrinks), so it has all that extra info stuck in there.

I think the extra data is the bytewise diff info, but I really know nothing about either one. If Git can store all the history in 228Meg, there’s no reason to give it 1.3Gig of stuff.

andybalholm on April 17, 2023

What I’m trying to say is that I think the expansion in size is a result of something strange that happened when converting the repository from Git to Pijul. If it’s been more than a few weeks since you did the conversion from Git to Pijul, you might want to try redoing it. I think there have been some bugs fixed in pijul git since beta 2.

andybalholm on April 18, 2023

I’ve added a --progress-interval flag to turn on progress messages.

andybalholm closed this discussion on April 19, 2023