#667 Crash finding pending changes when pulling

Opened by rohan on April 12, 2022
rohan on April 12, 2022

Pulling 6 changes works but pulling them in 3 batches (some are dependencies of others in the set) leaves the repo in a corrupted state. Further pulls fail with this error, even if there are no further changes to pull:

name = 'pijul'
operating_system = 'unix:Unknown'
crate_version = '1.0.0-beta'
explanation = '''
Panic occurred in file '/source/src/' at line 1107
cause = 'introduced by ChangeId(HSU3CPKMPSLXC)'
method = 'Panic'
backtrace = '''

   0: 0x5597fa4caa0b - <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold::hdf00c77164aba4c2
   1: 0x5597fa47f1b6 - <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter::h92db358d9edc8d14
   2: 0x5597fa401b7a - libpijul::change::BaseHunk<libpijul::change::Atom<core::option::Option<libpijul::pristine::change_id::ChangeId>>,libpijul::change::LocalByte>::globalize::{{closure}}::hd42ddd2505371dca
   3: 0x5597fa3ffdb3 - libpijul::change::BaseHunk<libpijul::change::Atom<core::option::Option<libpijul::pristine::change_id::ChangeId>>,libpijul::change::LocalByte>::globalize::h78a18ce1ee1e7d3e
   4: 0x5597fa4c9599 - <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold::h6126a1445b98aff4
   5: 0x5597fa43f38b - alloc::vec::source_iter_marker::<impl alloc::vec::spec_from_iter::SpecFromIter<T,I> for alloc::vec::Vec<T>>::from_iter::hd5cf9d3220af5f60
   6: 0x5597fa52c909 - pijul::commands::pending::hfacdc39013c7dedb
   7: 0x5597fa4881ae - pijul::commands::pushpull::Pull::run::{{closure}}::h9217bbb61db8a20d
   8: 0x5597fa4b0f59 - pijul::run::{{closure}}::h6cb1aa2fda44d319
   9: 0x5597fa4bd945 - pijul::main::{{closure}}::hc764072e3460ac87
  10: 0x5597fa2d1a68 - tokio::park::thread::CachedParkThread::block_on::hbcaae1062291b590
  11: 0x5597fa2d26bd - tokio::runtime::thread_pool::ThreadPool::block_on::h1549147534487106
  12: 0x5597fa4bc08c - pijul::main::hf16343de51f95808
  13: 0x5597fa3c2d33 - std::sys_common::backtrace::__rust_begin_short_backtrace::h77b1e76336eceb8d
  14: 0x5597fa421619 - std::rt::lang_start::{{closure}}::h280a12298597047b
  15: 0x5597fab890db - std::rt::lang_start_internal::hc4dd8cd3ec4518c2
  16: 0x5597fa4bf2c2 - main
  17: 0x7f6057852780 - __libc_start_main
  18: 0x5597fa2a84ca - _start

When I clone the repo and try the same pull commands again, the change ID being blamed is different. These change IDs do not exist in the source or target repos.

I can provide a repo which replicates the problem using channels if required, although it is ~126M.

pmeunier on April 12, 2022

This sounds like a bug in unrecord.

I would very much like to see that repo.

pmeunier on April 12, 2022

Another way to give me the repo in a compressed format is to do:

pijul debug | gzip -9 > debug.gz

And send the resulting file (debug.gz) to

rohan on April 12, 2022

pijul debug fails in the corrupted repo. I’ll email you the full repo.

rohan on April 12, 2022

I’m using a local nix build with some changes to cargo.nix to align it with the toml, flake.nix to use zstd 1.49 and this fix:

pmeunier on April 13, 2022

I can reproduce with the following script:

#!/usr/bin/env bash

rm -Rf test test2

pijul init test
cd test

echo a > a
pijul rec -am. a

cd ..
pijul clone test test2

cd test
echo b >> a
pijul rec -am.

cd ../test2
rm a
pijul rec -am.
pijul pull -a ../test

HASH=$(pijul rec -am. | sed -e "s/Hash: //")

pijul unrec $HASH
pijul debug

rohan on April 13, 2022

The debug crash does look the same. Pulling still works, although if I add another change to test and run pijul pull ../test, the change is duplicated when selecting changes to pull.

pmeunier added a change on April 14, 2022

pmeunier on April 14, 2022

Nailed it! Thanks again. I won’t close it just now because of the problem of duplicated changes in pijul pull.

rohan on April 22, 2022

Thanks, I should have mentioned that there has been no unrecording at all for this repo, though I have switched channels a few times - just lots of merging and conflicts between two diverging channels. So I can unfortunately still recreate the corruption after updating to the latest code. Still, it’s possible to run debug against it - the resulting graphviz file is difficult to visualise due to its size.

rohan on April 23, 2022

Confirmed that on a fresh clone with no unrecords or channel switches the problem still occurs.

One of the pulled changes relates to a file which has already been deleted in the current channel. That creates two conflict warnings: a path deletion conflict and a normal deletion conflict starting at line 1. The resulting file contains only the deletion conflict.

rohan on April 23, 2022

While it doesn’t cause a crash, there’s something strange with a deletion conflict in a deleted file after pulling no changes.

pijul init
echo 1 >> a
pijul record --all --message a
pijul add a
pijul record --all --message a
pijul fork no-a
echo 2 >> a
pijul record --all --message a2
pijul channel switch no-a
rm a
pijul record --all --message no-a
pijul pull . --all --from-channel main
pijul diff
pijul pull . --all --from-channel main
pijul diff

The first diff correctly shows the conflict as a file undeletion but the additional pull (of no changes) changes it to:

1. File deletion: "a" 5.1 "UTF-8"
BFD:BFD 4.1 -> 5.5:30/2, BFD:BFD 4.1 -> 5.5:30/2, BFD:BFD 5.30 -> 5.1:1/2
B:BD 5.4 -> 3.1:3/3
- 2

2. File addition: "a" in "" "UTF-8"
  up 4.1, new 36:61
+ >>>>>>> 0 [S4TQHIQC]
+ 2
+ <<<<<<< 0

rohan on April 24, 2022

Looking further into the original repo, I notice that cloning the base channel gives slightly different results from creating a new repo and pulling the base channel into it. When pulling, there are 6 additional edges (3 green and 3 red), split across 2 changes. According to the logs, the changes were applied in a different order when cloning vs pulling.

The pulled-into repo gains 2 conflicts, apparently due to deletions being extended over into the 3rd change.
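
One way to pin down extra edges like these, assuming the output of pijul debug has been saved for both repos, is to diff the edge statements of the two graphviz files and count the differences by colour. This is only a sketch: the function names are made up for illustration, and it assumes every edge sits on one line of the form node_... -> node_... [label=..., color=...];.

```python
import re
from collections import Counter


def edge_lines(dot_text):
    """Return the set of edge statements (lines containing '->').

    Whitespace is normalised so that indentation differences between
    the two dumps do not show up as spurious extra edges.
    """
    return {
        " ".join(line.split())
        for line in dot_text.splitlines()
        if "->" in line
    }


def extra_edges_by_color(cloned_dot, pulled_dot):
    """Return (edges only in the pulled dump, count of those edges per colour)."""
    extra = edge_lines(pulled_dot) - edge_lines(cloned_dot)
    colors = Counter()
    for line in extra:
        # The colour attribute is assumed to look like color="red".
        m = re.search(r'color="(\w+)"', line)
        colors[m.group(1) if m else "unknown"] += 1
    return extra, colors
```

Reading the two saved dumps (for example cloned.dot and pulled.dot, both hypothetical file names) and passing their contents to extra_edges_by_color gives the edges present only in the pulled repo, plus a per-colour count to compare against the 3 green and 3 red edges mentioned above.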

If this seems unlikely to be related to the above crash I can move to a new discussion.

rohan on April 24, 2022

The corruption of the original repo is here:

node_VIRFTWFNSDFXM_12232_12408[label="VIRFTWFNSDFXM [12232;12408["];
node_VIRFTWFNSDFXM_12232_12408 -> node_VPEZTD2DMPOSU_13907525_13907544 [label="VIRFTWFNSDFXM", color="forestgreen"];
node_VIRFTWFNSDFXM_12232_12408 -> node_O2Z236DZUHTLS_10168_10404 [label="[VIRFTWFNSDFXM]", color="red"];
node_VIRFTWFNSDFXM_12232_12408 -> node_VIRFTWFNSDFXM_12076_12231 [label="AAAAAAAAAAAAA", color="red", style=dotted];
node_VIRFTWFNSDFXM_12232_12408 -> node_O2Z236DZUHTLS_10168_10404 [label="[S7TZLD4QCJS2C]", color="red", style=dashed];

The last line, labelled S7TZLD4QCJS2C, is the issue. That change does not exist, and there’s already an edge to node_O2Z236DZUHTLS_10168_10404 on line 3, which is from the change just pulled.

The problematic edge points into the middle of a change so there’ll have been a block split.
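
A quick way to look for other edges like this one, as a rough sketch: parse the edge lines of the graphviz dump and report any whose label names a change outside a known set. The set of known change IDs is an assumption here. It has to be supplied by hand, in the same short form used in the labels, and the helper names are hypothetical rather than anything pijul provides.

```python
import re

# Matches edge lines such as:
#   node_A_1_2 -> node_B_3_4 [label="[XYZ]", color="red", style=dashed];
# The label may or may not be wrapped in square brackets.
EDGE_RE = re.compile(
    r'^\s*(node_\w+)\s*->\s*(node_\w+)\s*\[label="\[?([A-Z0-9]+)\]?"(.*)\];\s*$'
)


def parse_edges(dot_text):
    """Return a list of (source, target, change_id, color) tuples."""
    edges = []
    for line in dot_text.splitlines():
        m = EDGE_RE.match(line)
        if m is None:
            continue  # node declarations and other lines are skipped
        src, dst, change, rest = m.groups()
        color = re.search(r'color="(\w+)"', rest)
        edges.append((src, dst, change, color.group(1) if color else None))
    return edges


def edges_from_unknown_changes(dot_text, known_changes):
    """Return the edges whose label is not in known_changes.

    known_changes is an assumption: a set of short change IDs written
    exactly as they appear in the edge labels.
    """
    return [e for e in parse_edges(dot_text) if e[2] not in known_changes]
```

Run against the five lines above with VIRFTWFNSDFXM and AAAAAAAAAAAAA in the known set, this reports only the dashed edge labelled S7TZLD4QCJS2C.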

rohan on April 24, 2022

Only the pulled repo, not the cloned one, suffers from the corruption issue which references a non-existent change. So the original description, where pulling all changes at once was OK, may simply be because that was the original repo; the issue turned up when I was trying to track down where the unusual conflicts came from.

In any case, the non-existent change arises because the pulled repo starts accumulating pending changes as soon as the first additional change is pulled. I’m not sure yet why that leads to corruption, but as it doesn’t affect the cloned repo, it’s most likely due to the original difference between the cloned and pulled repos mentioned above.

rohan on May 3, 2022

seems to have fixed the crashing issue. The duplicate changes in the pull set are still there, but the problem fixes itself when refreshing the editor with the full dependency set.

rohan on May 6, 2022

In the ongoing saga, it turns out that the pending changes from pulling are due to the logic around which files need updating in a pull. It seems to affect only files that have been renamed or moved between directories more than twice. The call to libpijul::fs::find_path asks for youngest=false. For the one case I’ve tested, it works correctly if I change that to youngest=true.

pmeunier on February 23, 2023

Ok, I’m a bit puzzled here. This is one of the few remaining discussions with an actual bug/strange behaviour.

I’m not sure I follow everything you wrote. Can you give a scenario reproducing the current situation?

rohan on February 28, 2023

The project which was hitting this issue has been on hiatus for a while but I’ll try and recall what I was doing and see if I can build a minimal scenario.

I notice that there have been a couple of changes around path and deletion/undeletion handling, so hopefully this is no longer an issue!