pijul/pijul - Discussion #669 - [BUG] Applying patches that remove all lines from a file

#669 [BUG] Applying patches that remove all lines from a file

Closed on February 27, 2023

spacefrogg on April 13, 2022

Consider the following example (kudos to Wolfgang Jeltsch): Start a fresh repository with a single file named file containing the following:

begin
  content-1
end
begin
  content-2
end

Do the following:

pijul add file; pijul record -m file
# Remove line 1-3
pijul record -m 'content-1 deleted'
pijul unrecord --reset <content-1-hash>
# Remove line 3-5 (or 4-6, shouldn't matter)
pijul record -m 'content-2 deleted'
pijul unrecord --reset <content-2-hash>
# Now apply both changes, first 1, then 2.
pijul apply <change-1-hash>
pijul apply <change-2-hash>

In my trials, file still had the line begin as its only line, while I expected file to be empty.

apm on April 13, 2022

The final output seems correct to me. However, both patches remove the second “begin” line, and I would have expected a conflict.

pmeunier on April 14, 2022

I can’t reproduce, it’s empty in my case. Can you describe a slightly more precise test, ideally just a bash script where you edit your file with cat?

Examples of useful commands:

cat << EOF > file
begin
  a
end
EOF

H0=$(pijul rec -am. file | sed -e "s/Hash: //")

apm on April 14, 2022

Perhaps I got the test case wrong, but I understood we have two (independent) changes:

1. Edit in "file":4 2.29 "UTF-8"
B:BD 2.52 -> 2.52:58/2, B:BD 2.58 -> 2.58:74/2
- begin
-   content-2
- end

and

1. Edit in "file":2 2.29 "UTF-8"
B:BD 2.36 -> 2.36:58/2
-   content-1
- end
- begin

The first change removes lines 4,5,6; the second change removes lines 2,3,4.

Only the first line remains, as expected (no bug here, in my opinion).
Line 4 is deleted two times, with no conflict detected.

I’m curious about the latter: is it the expected behaviour for duplicate deletion? (In contrast, duplicate addition would have raised a conflict.)

spacefrogg on April 18, 2022

This script triggers the odd behaviour:

rm -rf test
pijul init test
cd test
cat <<EOF >file
begin
  content-1
end
begin
  content-2
end
EOF
pijul add file
pijul record -a -m file
sed -i -e '1,3d' file # remove line 1-3
pijul diff            # but diff algorithm uses 2-4!
H_one=$(pijul record -a -m 'content-1 deleted' | sed -n 's/^Hash: //p')
pijul unrecord --reset $H_one
sed -i -e '3,5d' file # remove line 3-5
pijul diff            # but diff algorithm uses 4-6!
H_two=$(pijul record -a -m 'content-2 deleted' | sed -n 's/^Hash: //p')
pijul unrecord --reset $H_two

pijul apply $H_one
pijul apply $H_two
echo Content of file:
cat file

I remove lines 1-3 (first block) and 3-5 (removing 4-6 makes no difference). I am not able to reliably remove both blocks in separate changes and end up with an empty file after application. This is due to the diff algorithm choosing (having to choose) the “wrong” lines when removing the first block.

In git, trying to merge the two changes would reliably result in a conflict, despite the diff algorithm choosing the “wrong” lines.

I am quite sure that the diff algorithm choosing the wrong lines should not matter for pijul, either, but right now it does. I suspect, what is going on here, is, that the diff algorithm makes its decision to use line 2-4 and not 1-3 based on the knowledge that line 1 and line 4 have the same content. This constraint gets lost in pijul changes, making them ambiguous when they are applied again.

pmeunier on February 27, 2023

Ok, I got it, thanks for your patience.

This isn’t really a bug, it only shows that there may be many longest common subsequences for the same input. This is the same reason why using diff3 for merging is wrong: it doesn’t matter for diff since patch is deterministic, but merges cannot just be chosen arbitrarily.

I believe the solution here would be a better patch editor, where the user could help the algorithm a bit. Or maybe just offer a choice of different diff algorithms. Ok, let me just push a patch here, let’s see if it work.

pmeunier added a change on February 27, 2023

Options to use Patience diff instead of Myers

MDY344ZZLCKXHA7VGJJYLNPCLHPI3I3LFODUYBPBH32SQ3SOOFIAC

main

pmeunier on February 27, 2023

Done for the moment, your test case works well with pijul record --patience.

We might want to think again about this in the future when we have proper UIs for patch editing.

At the moment we just get to choose between different heuristics, and we can’t do much more.

pmeunier closed this discussion on February 27, 2023