The sound distributed version control system

#669 [BUG] Applying patches that remove all lines from a file

Opened by spacefrogg on April 13, 2022
spacefrogg on April 13, 2022

Consider the following example (kudos to Wolfgang Jeltsch): Start a fresh repository with a single file named file containing the following:

begin
  content-1
end
begin
  content-2
end

Do the following:

pijul add file; pijul record -m file
# Remove line 1-3
pijul record -m 'content-1 deleted'
pijul unrecord --reset <content-1-hash>
# Remove line 3-5 (or 4-6, shouldn't matter)
pijul record -m 'content-2 deleted'
pijul unrecord --reset <content-2-hash>
# Now apply both changes, first 1, then 2.
pijul apply <change-1-hash>
pijul apply <change-2-hash>

In my trials, file still had the line begin as its only line, while I expected file to be empty.

apm on April 13, 2022

The final output seems correct to me. However, both patches remove the second “begin” line, and I would have expected a conflict.

pmeunier on April 14, 2022

I can’t reproduce, it’s empty in my case. Can you describe a slightly more precise test, ideally just a bash script where you edit your file with cat?

Examples of useful commands:

cat << EOF > file
begin
  a
end
EOF

H0=$(pijul rec -am. file | sed -e "s/Hash: //")
apm on April 14, 2022

Perhaps I got the test case wrong, but I understood we have two (independent) changes:

1. Edit in "file":4 2.29 "UTF-8"
B:BD 2.52 -> 2.52:58/2, B:BD 2.58 -> 2.58:74/2
- begin
-   content-2
- end

and

1. Edit in "file":2 2.29 "UTF-8"
B:BD 2.36 -> 2.36:58/2
-   content-1
- end
- begin

The first change removes lines 4,5,6; the second change removes lines 2,3,4.

  1. Only the first line remains, as expected (no bug here, in my opinion).
  2. Line 4 is deleted two times, with no conflict detected.

I’m curious about the latter: is it the expected behaviour for duplicate deletion? (In contrast, duplicate addition would have raised a conflict.)

spacefrogg on April 18, 2022

This script triggers the odd behaviour:

rm -rf test
pijul init test
cd test
cat <<EOF >file
begin
  content-1
end
begin
  content-2
end
EOF
pijul add file
pijul record -a -m file
sed -i -e '1,3d' file # remove line 1-3
pijul diff            # but diff algorithm uses 2-4!
H_one=$(pijul record -a -m 'content-1 deleted' | sed -n 's/^Hash: //p')
pijul unrecord --reset $H_one
sed -i -e '3,5d' file # remove line 3-5
pijul diff            # but diff algorithm uses 4-6!
H_two=$(pijul record -a -m 'content-2 deleted' | sed -n 's/^Hash: //p')
pijul unrecord --reset $H_two

pijul apply $H_one
pijul apply $H_two
echo Content of file:
cat file

I remove lines 1-3 (first block) and 3-5 (removing 4-6 makes no difference). I am not able to reliably remove both blocks in separate changes and end up with an empty file after application. This is due to the diff algorithm choosing (having to choose) the “wrong” lines when removing the first block.

In git, trying to merge the two changes would reliably result in a conflict, despite the diff algorithm choosing the “wrong” lines.

I am quite sure that the diff algorithm choosing the wrong lines should not matter for pijul, either, but right now it does. I suspect, what is going on here, is, that the diff algorithm makes its decision to use line 2-4 and not 1-3 based on the knowledge that line 1 and line 4 have the same content. This constraint gets lost in pijul changes, making them ambiguous when they are applied again.