Hi! Thanks for the report! After testing on my machine (Ext4 on Linux, Pijul 1.0.0-alpha.54, Rustc 1.54), everything seems to works for me:
message = ''
timestamp = '2021-09-07T10:12:28.673944156Z'
[[authors]]
key = 'FZQ2g7VfnzLYM4mtTVDk9HAZjA8Jk9ndkwN1icgbtWUr'
# Hunks
1. File addition: "kälström" in "/" "UTF-8"
up 1.0, new 5:39
+ a
If I record again, it works. Would you mind sharing some bits of your configuration? Which platform? Filesystem? Is this the latest stable Rust? Latest Pijul?
Sure. Thanks for a quick reply! I was investigating further, and I think I managed to isolate the issue. The file addition hunk is “filtered through” the textual format you posted above, right? In this line:
1. File addition: "kälström" in "/" "UTF-8"
the "kälström"
string was produced using {:?}
, but it will be parsed out without un-escaping the string inside the quotes. This creates an extra layer of escaping. For you, for whatever reason, the string was not escaped so everything is OK. But I have "ka\u{308}llstro\u{308}m"
, which ends up incorrectly being recorded as "ka\\u{308}llstro\\u{308}m"
.
Hope it makes sense.
$ pijul --version
pijul 1.0.0-alpha.54
$ rustc --version
rustc 1.54.0 (a178d0322 2021-07-26)
This could be the reason indeed. I have no idea why Rust prints my file name correctly the first time. I’ll fix that now.
encoding is done at text_changes.rs:434
, decoding at text_changes.rs:572
, but these are not inverses.
SFJ3XRTFUNG6KNYDLIYKHCENZ6Y3PG33GNGDW6444LAVBMCSH2FAC
I just pushed a patch replacing the {:?}
with something independent from Rust. If you care to test again, it would be helpful.
That was super quick :-) Tested it, and it solved my test-case.
Would it be a good idea to quickcheck the write-read roundtrip? It seems that otherwise text_changes.rs
would be an endless source of bugs.
It is actually tested by cargo test
in Libpijul, but some of these tests are broken at the moment, too much stuff going on in this project.
I think your change breaks on some special characters in file-names, such as a newline character. I created a test file in bash like this:
echo txt > $'abc\ndef'
which can’t be recorded.
I am now trying to fix tests, but I don’t know if I’ll succeed.
I have a patch in preparation for that, we might get conflicts if you do that.
I think your change breaks on some special characters in file-names, such as a newline character. I created a test file in bash like this:
Indeed… Nice catch.
this may be a good library for string un/escaping, though I’ve never used it. https://crates.io/crates/snailquote
Unfortunately, that crate’s license doesn’t allow us to use it in Libpijul. I wouldn’t even look at their code in order to avoid deriving things.
But using a more general escaping function is a good idea, the patch I pushed can easily be extended to include more escape sequences.
ah OK.
Do you plan on keeping the changes editable in the future? If so, file name escaping would need to be 100% correct eventually.
I don’t think it can be made correct using the current approach based on regular expressions. Would it be beneficial if I rewrote it using a proper parser library?
And what about non-utf8 filenames, which may not be representable by String
?
LWBBN2IBFFW2UIQIP7XL3J2QOK76EJI3P4MO5D543UAATAYW66XAC
Unless I am mistaken, the quotes are still not properly fixed :-) Recognition regexps should match double quotes around the names, and the Escaped
function should be used for Edit
and Replace
hunks.
I believe this is fixed now, since your new code has been merged. Please reopen if not.
If I record a file with name
källström
, things break. I’m running Manjaro, and I’m using the most recent pijul from main. To reproduce:The last line,
pijul diff
, returns a file move, even though it should not return anything: