The sound distributed version control system

#747 UnRecord dangerously inverts Pijul's UI convention, high risk of catastrophic data loss [+ RECOVERY HOWTO]

Opened by jbthiel on December 16, 2022
jbthiel on December 16, 2022

BACKGROUND

The Pijul UI convention, as for example with the frequently used Record operation, is that if the proposed edit text is left completely unchanged when you exit the editor, then the operation exits out, is NOP No Operation, does not proceed. On Record, you must actively set the checkin log message to a non-empty value to proceed with recording.

This is a reasonable and good convention, I like it.

UNRECORD DANGEROUSLY INVERTS THE UI CONVENTION

Now, given this design, it is extremely important to maintain and follow-thru in all other commands. The UI must always default to NOP, or some inconsequential behavior, information, error msg, etc. in the absence of an activating user edit.

The interactive UnRecord command seriously breaks this convention, creating a high risk of catastrophic data loss.

Specifically, when you give the ‘–show-changes N’ parameter, it presents a list of the most recent N changes for you to review. The changes are by default selected, so if you exit the editor without editing anything, all of those changes will be immediately UnRecorded. Although you still have the working directory intact (if you did not specify –reset), all of the incremental patch history is irretrievably lost.

FIX

A simple fix to this issue, is the editor template should have each change prefixed with a comment char, so you have to uncomment that change hash to select it for UnRecording.

Another design alternative would be you build up a list of changes near the top of the editor template, by scrolling around to review the change hashes and descriptions, and copypaste your selected change hashes into a list of selections at the top, which starts out initially empty. (This would probably involve more scrolling for the user than the prior fix)

Also, pijul should give a final confirmation prompt, after you exit the editor, before actually doing the UnRecord. Something like: The following changes will be UnRecorded… A, B, C… Confirm y/n

ADDITIONAL NOTES

I think “–show-changes” is not meaningful enough as a parameter name; suggest “–select-changes [N]”, with some reasonable default like the most recent 5 or 10 changes if you omit N.

There are a couple other usability issues around UnRecord, which I saw mentioned in for example #675, and note here to clarify how they intersect with this issue.

The current UnRecord logic is correct, in regards that the selected changes should be the ones you want to UnRecord (not the opposite, as per #675). The leadin comment in the edit text, could emphasize that a bit further, to avoid users getting confused. Also write UNRECORD in all caps, to make it very clear.

It would be very nice for Pijul to offer an UNDO from the UnRecord operation (and all other destructive ops). An implementation might be to automatically use an internal temporary stash channel, in the manner described by spacefrogg in #675

joyously on December 18, 2022

I think I got caught by this. I was very unsure of what I was supposed to do in the editor for unrecord and wanted an easy out.

rohan on December 28, 2022

The unrecorded patches are not lost and can be applyed again although I don’t know of a quick way to find the hashes

jbthiel on December 29, 2022

@rohan, After UnRecord, patches are irretrievably lost; that’s what it does. The individual patches no longer exist to Apply. All you have left is an aggregate diff against your working directory. {UPDATE: In fact the patches do still exist on disk, see further comments below}

You could build the patches up again piecemeal with new Record-ing, from that diff. But if the patches had offsetting add/del/edits, then you have irretrievably lost all the intervening states. (and even if you manage to re-create the exact same content patches, they will have new hashes, as currently implemented – another bug/feature report I intend to open).

Since the entire reason for using VCS is to retain the interim states, this is catastrophic data loss, if you did it accidentally.

(Unless you are saying the individual patches are still in a secret place somewhere internally, after UnRecord? I don’t think that’s implemented currently. That’s what I also recommend: an Undo feature that for example puts them in an internal undo channel before UnRecording.)

Note just to clarify: There is no problem with the UnRecord function – the Patches are gone, and that’s how it’s supposed to work. The problem is the extreme ease of accidentally wiping out a bunch of patches via the current UI (unrecord –show-changes), because they are all selected by default in the editor.

The default needs to be opposite in the editor. So the hashes start out being commented in the UI, and you have to explicitly select them to make them unrecord. So If the user gets confused, answers a phone call and forgets what they were doing… they can safely exit the editor with No Operation. (instead of it instantly blowing away a giant gob of your work, as currently).

rohan on December 29, 2022
> pijul init
> echo 'something'>file
> cat file
something

> pijul add file
> pijul record --all --message file
Hash: E44VZEK2SLDRWUDOWRT77OTHT4JPRPIECA2CBSTXCLHSRNF5HCJAC

> echo 'delete me'>file
> cat file
delete me

> pijul record --all --message dm
Hash: T3IOCG4HN7LXJZ5KKE7EMBAKB7CISZBYURRR6IQ52YSVR4266ZRQC

> pijul log
Change T3IOCG4HN7LXJZ5KKE7EMBAKB7CISZBYURRR6IQ52YSVR4266ZRQC
Author: rohan
Date: 2022-12-29 21:11:40.266848320 UTC

    dm

Change E44VZEK2SLDRWUDOWRT77OTHT4JPRPIECA2CBSTXCLHSRNF5HCJAC
Author: rohan
Date: 2022-12-29 21:11:03.778962551 UTC

    file

Change 2QVMZO2O2KDKOW7OGJ7F3KPCPG26ZNXPS7ZJ4GWSB3MJOZTAC6RQC
Author:
Date: 2022-12-29 21:11:03.780118106 UTC

> pijul unrecord T3IOCG4HN7LXJZ5KKE7EMBAKB7CISZBYURRR6IQ52YSVR4266ZRQC
> cat file
delete me

> pijul reset file
> cat file
something

> pijul log
Change E44VZEK2SLDRWUDOWRT77OTHT4JPRPIECA2CBSTXCLHSRNF5HCJAC
Author: rohan
Date: 2022-12-29 21:11:03.778962551 UTC

    file

Change 2QVMZO2O2KDKOW7OGJ7F3KPCPG26ZNXPS7ZJ4GWSB3MJOZTAC6RQC
Author:
Date: 2022-12-29 21:11:03.780118106 UTC

> pijul apply T3IOCG4HN7LXJZ5KKE7EMBAKB7CISZBYURRR6IQ52YSVR4266ZRQC
> cat file
delete me

> pijul log
Change T3IOCG4HN7LXJZ5KKE7EMBAKB7CISZBYURRR6IQ52YSVR4266ZRQC
Author: rohan
Date: 2022-12-29 21:11:40.266848320 UTC

    dm

Change E44VZEK2SLDRWUDOWRT77OTHT4JPRPIECA2CBSTXCLHSRNF5HCJAC
Author: rohan
Date: 2022-12-29 21:11:03.778962551 UTC

    file

Change 2QVMZO2O2KDKOW7OGJ7F3KPCPG26ZNXPS7ZJ4GWSB3MJOZTAC6RQC
Author:
Date: 2022-12-29 21:11:03.780118106 UTC
jbthiel on December 29, 2022

@rohan, Thanks for that experiment, that is a very interesting transcript. I have reproduced your result with pijul 1.0.0-beta.2, So after UnRecord, the whole original patch with hash and contents is indeed in a “secret place”, still somehow accessible. You can apply it, and also view the whole patch via ‘pijul change’, if you know the hash.

However, I consider that a serious new bug we have discovered. We are looking at deleted data, that should have been zeroed/inaccessible.

Specifically, near the end of the transcript, the line

> pijul apply T3IOCG4HN7LXJZ5KKE7EMBAKB7CISZBYURRR6IQ52YSVR4266ZRQC

It works, when it should fail. There is no basis for that command to work. Where would it be finding that hash now? Where are the contents defined? The preceding log has clearly shown it’s not in the log; it’s not in another channel, nor remote; where is it? It should not be defined; it was just UnRecorded, and should be out of the repo. {UPDATE: It is actually very useful to have the patches still existing on disk, as an implicit Undo channel, so Pijul should implement some management commands for these dangling patches, that still exist but are not in any log/channel. See further comments below.}

If you give a different hash, for example just alter it by one letter, you get the error, Error: Hash not found Or some arbitrary valid hash from a completely different repository gives an error Error: No such file or directory (os error 2) In my view, the unrecorded hash should should give such an error too.

What if instead of ‘delete me’ that Patch had ‘super secret password’… which we quickly realized and UnRecorded out of the repo. Except it’s still lurking in there ?!? And all you need to know is the original hash code, and you can access. It’s a security nightmare.

Seems to me there is a code bug here around deletions in a backing data structure that are not being zeroed/cleaned up. UnRecord should be checking if it is removing the last reference to a patch (i.e. it is not on any other channel), and completely remove it from the backing store, as well as zero out the Patch contents, hash, etc. So it cannot be accessed in this phantom way.

(Or, is it intentional design that Patches exist forever once they are in the repo?? That leads to a whole other set of complications, security concerns, etc. Does not seem like a good idea for Pijul, (nor Darcs), although I think Fossil might work that way. So if that is the intent, Pijul could look there for how they approach such concerns.)

I suggest we open a new bug on this:

SUBJECT: BUG: Patch persists in the repo after UnRecord, despite not in any channel. Should be zeroed/cleared when you UnRecord the last reference.

jbthiel on December 30, 2022

UPDATE: I found the changes are in .pijul/changes/XX/YYYY.change files, partitioned into subdirs where XX is the first 2 letters of the hash, and YYYY the rest. Each patch is in a separate file.

When you UnRecord, the change file(s) are apparently not physically deleted currently (1.0.0-beta.2) That’s why they are still visible to ‘pijul change’ or ‘pijul apply’, even though they are no longer in the higher level indexes, like ‘pijul log’.

It is actually nice to still have the patches around, even after UnRecord, despite the security implications I mentioned earlier, and that it would become unwieldy as they number into the dozens or 100s. Although it’s clumsy and unsafe obviously for the user to be poking around directly in the .pijul directory. It would be better to have proper commands to manage this.

So instead of considering it a bug, that the patch files are still there, I now suggest designing additional commands for searching and managing such “dangling” patches – that are no longer in any channel, but still in the repository. They constitute a kind of implicit Undo channel already available. It just needs commands to Search, List details, Purge (really delete), etc.

RECOVERY WORKAROUND FOR ACCIDENTAL UNRECORD

So currently (pijul 1.0.0-beta.2), to find all your patches, including UnRecorded ones, from your working dir do: find .pijul/changes -type f The total hash is the XX subdir + YYYY.change filename. It seems you don’t even need to reconstruct the hash. It works to apply the exact filename directly: pijul apply .pijul/changes/XX/YYYY.change

The change files have attributes like Message Description in them, which you can grep, or inspect with an editor (although they are a binary format).

EXAMPLE RECOVERY LOG

$ pijul log --hash-only
3LOHL2T6I34O7ABIENGCLJHG6S6WVW4IQF7M34IRQFEGLZMM2L4QC
H52IMOE3S2ZGHEQ5MX6GOKX653EMAUYTEVUFLCEQ4P7J4GP3XHRAC
E5JPIQORHCNRY3J264V4OSJHXVJ2ICMVPVKIVKO2DXYRSKNIKTKAC
3JY6JM6JNBZWZ24VU4XU464C46HBGB5W23RS44CVJRJ7ETWNA5WQC

$ pijul unrecord 3LOHL2T6I34O7ABIENGCLJHG6S6WVW4IQF7M34IRQFEGLZMM2L4QC    H52IMOE3S2ZGHEQ5MX6GOKX653EMAUYTEVUFLCEQ4P7J4GP3XHRAC

$ pijul log --hash-only
E5JPIQORHCNRY3J264V4OSJHXVJ2ICMVPVKIVKO2DXYRSKNIKTKAC
3JY6JM6JNBZWZ24VU4XU464C46HBGB5W23RS44CVJRJ7ETWNA5WQC

$ find .pijul/changes -type f
.pijul/changes/3J/Y6JM6JNBZWZ24VU4XU464C46HBGB5W23RS44CVJRJ7ETWNA5WQC.change
.pijul/changes/H5/2IMOE3S2ZGHEQ5MX6GOKX653EMAUYTEVUFLCEQ4P7J4GP3XHRAC.change
.pijul/changes/3L/OHL2T6I34O7ABIENGCLJHG6S6WVW4IQF7M34IRQFEGLZMM2L4QC.change
.pijul/changes/E5/JPIQORHCNRY3J264V4OSJHXVJ2ICMVPVKIVKO2DXYRSKNIKTKAC.change

$ pijul apply .pijul/changes/*/*
Outputting repository ↖

$ pijul log --hash-only
3LOHL2T6I34O7ABIENGCLJHG6S6WVW4IQF7M34IRQFEGLZMM2L4QC
H52IMOE3S2ZGHEQ5MX6GOKX653EMAUYTEVUFLCEQ4P7J4GP3XHRAC
E5JPIQORHCNRY3J264V4OSJHXVJ2ICMVPVKIVKO2DXYRSKNIKTKAC
3JY6JM6JNBZWZ24VU4XU464C46HBGB5W23RS44CVJRJ7ETWNA5WQC

So this is a viable recovery workaround if you accidentally UnRecord a bunch of patches with ‘unrecord –show-changes’. If you recall the hashes, you can apply them directly. If you recall the descriptions, you can grep for the matching YYYY.change files.

Worst case, if you don’t recall anything, you can make a temporary channel, and apply all the XX/YYYY.change files into that channel, which will bring them back into visible operation. Then you can inspect and apply them into your main working channel(s).

pmeunier on January 5, 2023

Actually, the patches might or might not be lost, depending on the compilation options.

I agree that we could make it simpler by requiring that you uncomment a line in the proposed list of patches to unrecord.

@jbthiel, do you want to propose a workaround patch and submit it to this discussion?

rohan on January 5, 2023

In terms of finding unrecorded changes I’m thinking automatically forking as part of the bulk unrecord. And there probably should be some command to permanently remove changes to cover the case of a local checkin of the super secret password.

I think git behaves similarly to Pijul in that there’s a garbage-collection command to get rid of unreachable commits which will otherwise hang around. Git’s reflog provides some ability to find these but a proper channel which can be dropped when no longer required should be cheap enough.