#43 `pijul record` locks up other commands

Opened by cole-h on November 12, 2020
cole-h on November 12, 2020

Commands like pijul log and pijul change hang when there is a pijul record editor window open.

cole-h on November 12, 2020

Looks like this is due to the libpijul::pristine::sanakirja::Pristine::new(&pristine_dir)? call in pijul/src/repository.rs – changing it to new_nolock (wrapped in unsafe {}) makes it work. But this probably isn’t a good idea, considering find_root is called in every subcommand… Maybe a new find_root_nolock should be used for subcommands like log, change, and diff, and the other commands that don’t write? Or maybe it should be refactored to allow asynchronous (but read-only) reading?

cole-h added a change on November 12, 2020
This change could not be retrieved at the moment
cole-h added a change on November 17, 2020
This change could not be retrieved at the moment
cole-h on November 17, 2020

OK, re-uploaded the change (or one that functions similarly). Currently, only pijul log and pijul change don’t need mutable access to the repo. I don’t think pijul diff and pijul credit should need to be able to modify the database, but maybe that’s a lack of understanding. I don’t think it’s safe to new_nolock repos that are declared as mut.

pmeunier on November 17, 2020

Good questions:

  • It is never safe to new_nolock if you don’t have another locking mechanism in place (it is guaranteed to corrupt your repository, except if none of the transactions do anything). This method is intended to be used in places like the Nest, where repositories are locked by an asynchronous mutex rather than by filesystem locks. This allows to use the full power of Sanakirja, where immutable transactions can be started in parallel to a mutable one (the exact rule is: no more than one mutable transaction at the same time, and an immutable transaction t can be concurrent to at most one mutable transaction during the entire lifespan of t).

  • I don’t think pijul credit needs mutable access, because it’s only doing txn_begin, not mut_txn_begin.

One correct way to solve this issue is by dropping the transaction before doing edit::edit_bytes. That would probably take some refactoring of push, pull and record (the only commands that take a lock, I believe).

If you’re interested in solving this, I’m super happy to help.

cole-h on November 18, 2020

You weren’t kidding about needing to refactor record, at least (the one I started with). I don’t have too much time right now aside from my drive-by fixes; I’m interested in giving it a shot, but it will be slow going (if it goes at all) until finals are over.

pmeunier on November 19, 2020

I started thinking about this, and tried a few things, only to conclude that this is trickier than I thought, not only from an implementation perspective.

If we stick to the goal of a minimally confusing system, especially to new users and non-technical people, I don’t think it is reasonable to allow multiple records to happen on the same repository, or a channel switch in the middle of a record.

I can already hear the angry messages here, saying Pijul is inventing conflicts that shouldn’t exist, because the same action has been recorded twice in two different changes.

I think one reasonable design choice here is to implement named read-write locks in Sanakirja, and extremely verbose feedback to tell the user that their repository is locked by another process. I’ll do this now.

cole-h on November 19, 2020

I think one thing to note is that I don’t want records to happen concurrently (multiple writers does sound like a bad time), but just commands like seeing the pijul log (e.g. to reference the hash of another change) or running pijul diff to see if anything has changed since you ran pijul rec (a bit contrived of an example, but…).

Basically: commands that you wouldn’t think would need to write to the repo shouldn’t need a lock on said repo. pijul log is just reading the history, not mutating it. pijul diff is showing unrecorded changes. pijul change is reading out the contents of a specific change.

pmeunier on November 19, 2020

I think one thing to note is that I don’t want records to happen concurrently

Of course, sorry if I wasn’t clear: I only wrote that because the primitives we have to lock stuff with filesystem locks do not allow to implement this easily.

Meanwhile, I have implemented a few things to avoid hanging (which is a terrible class of bugs):

  • With #PJ7T2VFLV5PYG3CV23GC2GIQETXKGC6CO74JBGREV3JC3LG5OXUAC, pijul doesn’t hang anymore, but instead returns an error.

  • In Sanakirja 0.14, environments are generic in the type of lock they take. This is not enough for now, since we can only do read-write locks, but it will enable finer-grained locks in the future.