The sound distributed version control system

#857 Error: Pijul clone - broken (FreeBSD)

Opened by levi on December 31, 2023
levi on December 31, 2023

Here’s the thing: until now I had only cloned my own experimental repositories, which were rather small. This time I wanted to clone the repository of Pijul itself.

First a test: let’s try it on a rather small repo, say https://nest.pijul.com/finchie/new_manual/changes

~/experiment
λ pijul clone https://nest.pijul.com/finchie/new_manual
Repository created at /home/levi/experiment/new_manual
Downloading changes [===============================================] 48/48
           Applying [===============================================] 48/48
 Completing changes [===============================================] 0/0

~/experiment
λ ls
new_manual

Looks good. Now let’s try the pijul repo.

~/experiment
λ pijul clone https://nest.pijul.com/pijul/pijul
Repository created at /home/levi/experiment/pijul
Downloading changes [==>                                        ] 58/990
           Applying [==>                                        ] 58/990
Error: No error: 0 (os error 0)

~/experiment
λ ls
new_manual

Unfortunately, error 0 is marked as “Not used”, so I conclude that the error is unknown, and it is probably not specific to FreeBSD. In any case, this is not an error defined in the C standard library, so it must be something more bizarre. I know it’s Rust, but apparently Rust libraries follow the same conventions.
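The message itself can be reproduced without Pijul. A minimal sketch, assuming only the Rust standard library: `io::Error::from_raw_os_error(0)` wraps the raw code 0, and its `Display` output appends `(os error 0)` to whatever text the platform's C library supplies for that code (`No error: 0` in the FreeBSD output above). The helper names `os_error_zero` and `render` are made up for illustration.

```rust
use std::io;

/// Build the same kind of value the log shows: an OS error with raw code 0.
fn os_error_zero() -> io::Error {
    io::Error::from_raw_os_error(0)
}

/// Render the error the way a `"Error: {}"` format string would.
fn render(e: &io::Error) -> String {
    format!("Error: {}", e)
}
```

Calling `render(&os_error_zero())` on the affected machine should show whether the text matches the one printed by `pijul clone`; the descriptive part differs between platforms, only the `(os error 0)` suffix is fixed.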

It does not even leave a directory behind, and it always stops at 58/990 in the case of the pijul repo.

A working hypothesis is that this somehow depends on the size of the repository. It is probably a similar error to the one in another thread, #849 Error: No such file or directory (os error 2) (FreeBSD): an OS-dependent error, perhaps related to async I/O in the Tokio module?
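One generic way an `os error 0` can surface is offered here purely as an illustration, not as a diagnosis: some POSIX functions (the `pthread_*` family and `posix_fallocate`, for example) report failure through their return value and do not set `errno`. If the caller notices the failure but then reads `errno` for the code, it gets whatever was there before, often 0. Whether anything like this happens in Pijul or its dependencies is not established in this thread. All names below are hypothetical, and `errno` is modelled as a plain parameter so the sketch needs no FFI.

```rust
use std::io;

/// Hypothetical model of a C call that returns its error number directly
/// and leaves errno untouched. 22 is an arbitrary nonzero code.
fn call_returning_code() -> i32 {
    22
}

/// Correct conversion: the returned value is the error code.
fn convert_from_return(ret: i32) -> io::Result<()> {
    if ret == 0 {
        Ok(())
    } else {
        Err(io::Error::from_raw_os_error(ret))
    }
}

/// Mistaken conversion: the failure is detected from the return value,
/// but the code is taken from errno (passed in as `errno_now`),
/// which the call never set.
fn convert_from_errno(ret: i32, errno_now: i32) -> io::Result<()> {
    if ret == 0 {
        Ok(())
    } else {
        Err(io::Error::from_raw_os_error(errno_now))
    }
}
```

With `errno_now` equal to 0, the mistaken variant yields an error whose raw code is 0, which is the shape of the error reported above.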

FreeBSD 14.0-RELEASE-p3
pijul 1.0.0-beta.7 and pijul 1.0.0-beta.8
rustc 1.74.1 (a28077b28 2023-12-04) (built from a source tarball)

This time, setting RUST_BACKTRACE=0 for my shell no longer helps. I rather doubt that changing to beta.8 will help this time either. Pijul has a very hard time running on the BSDs.

I will post the debug session shortly…

levi on December 31, 2023

Pijul debug run:

λ ./pijul clone https://nest.pijul.com/pijul/pijul x
Repository created at /home/levi/experiment/x
Downloading changes [==>                                               ] 59/990
           Applying [==>                                               ] 59/990
[2023-12-31T16:13:46Z ERROR pijul] Error: TxnErr(
        Sanakirja(
            IO(
                Os {
                    code: 0,
                    kind: Uncategorized,
                    message: "No error: 0",
                },
            ),
        ),
    )
Error: No error: 0 (os error 0)
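The debug output shows the OS error wrapped in several layers (`TxnErr(Sanakirja(IO(Os { .. })))`). A minimal sketch of how such a chain can be unwrapped generically, assuming each layer exposes its inner error through `std::error::Error::source`. The types `StoreError` and `TxnError` are hypothetical stand-ins, not the real Pijul or Sanakirja types.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hypothetical storage-layer error wrapping an I/O error.
#[derive(Debug)]
enum StoreError {
    Io(io::Error),
}

/// Hypothetical transaction-layer error wrapping the storage error.
#[derive(Debug)]
enum TxnError {
    Store(StoreError),
}

impl fmt::Display for StoreError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StoreError::Io(e) => write!(f, "{}", e),
        }
    }
}

impl fmt::Display for TxnError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TxnError::Store(e) => write!(f, "{}", e),
        }
    }
}

impl Error for StoreError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            StoreError::Io(e) => Some(e),
        }
    }
}

impl Error for TxnError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            TxnError::Store(e) => Some(e),
        }
    }
}

/// Walk the `source()` chain and return the raw OS code of the first
/// `io::Error` found, if it carries one.
fn raw_os_code(err: &(dyn Error + 'static)) -> Option<i32> {
    let mut cur: Option<&(dyn Error + 'static)> = Some(err);
    while let Some(e) = cur {
        if let Some(io_err) = e.downcast_ref::<io::Error>() {
            return io_err.raw_os_error();
        }
        cur = e.source();
    }
    None
}
```

A helper like `raw_os_code` makes it possible to log the numeric code separately from the message, which helps when the message text differs between operating systems.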

Pijul debug in a gdb session:

~/experiment took 1m18s
λ rust-gdb -q ./pijul
Reading symbols from ./pijul...
(gdb) b main.rs:160
Breakpoint 1 at 0xa3603c: main.rs:160. (2 locations)
(gdb) clone https://nest.pijul.com/pijul/pijul pijul_repo
No symbol 'https' in current context
(gdb) run clone https://nest.pijul.com/pijul/pijul pijul_repo
Starting program: /home/levi/experiment/pijul clone https://nest.pijul.com/pijul/pijul pijul_repo
[New LWP 141540 of process 99972]
[New LWP 141541 of process 99972]
[New LWP 141542 of process 99972]
[New LWP 141543 of process 99972]
[New LWP 141544 of process 99972]
[New LWP 141545 of process 99972]
[New LWP 141546 of process 99972]
[New LWP 141547 of process 99972]
[New LWP 141548 of process 99972]
[New LWP 141549 of process 99972]
[New LWP 141550 of process 99972]
[New LWP 141551 of process 99972]
[New LWP 141552 of process 99972]
Repository created at /home/levi/experiment/pijul_repo
[New LWP 141553 of process 99972]
[New LWP 141554 of process 99972]
Downloading changes [==>                                               ] 59/990
           Applying [==>                                               ] 59/990

Thread 1 hit Breakpoint 1.1, pijul::main::{async_block#0} () at src/main.rs:160
Downloading changes [===>                                              ] 60/990
(gdb) n
Thread 1 hit Breakpoint 1.2, pijul::main::{async_block#0} () at src/main.rs:160
160	        log::error!("Error: {:#?}", e);
(gdb)
           Applying [==>                                               ] 59/990
Downloading changes [===>                                              ] 60/990
           Applying [==>                                               ] 59/990
Downloading changes [===>                                              ] 60/990
           Applying [==>                                               ] 59/990
[2023-12-31T16:19:27Z ERROR pijul] Error: TxnErr(
        Sanakirja(
            IO(
                Os {
                    code: 0,
                    kind: Uncategorized,
                    message: "No error: 0",
                },
            ),
        ),
    )
Downloading changes [===>                                              ] 60/990
           Applying [==>                                               ] 59/990
Downloading changes [===>                                              ] 60/990
164	            Err(e) => writeln!(std::io::stderr(), "Error: {}", e).unwrap_or(()),
(gdb)
           Applying [==>                                               ] 59/990
Downloading changes [===>                                              ] 60/990
           Applying [==>                                               ] 59/990
Error: No error: 0 (os error 0)
165	        }
(gdb)
Downloading changes [===>                                              ] 60/990
           Applying [==>                                               ] 59/990
[LWP 141553 of process 99972 exited]
[LWP 141544 of process 99972 exited]
[LWP 141548 of process 99972 exited]
[LWP 141551 of process 99972 exited]
[LWP 141546 of process 99972 exited]
[LWP 141541 of process 99972 exited]
[LWP 141552 of process 99972 exited]
[LWP 141549 of process 99972 exited]
[LWP 141540 of process 99972 exited]
[LWP 141550 of process 99972 exited]
[LWP 141543 of process 99972 exited]
[LWP 141554 of process 99972 exited]
[LWP 141542 of process 99972 exited]
[LWP 141547 of process 99972 exited]
[LWP 141545 of process 99972 exited]
[Inferior 1 (process 99972) exited with code 01]
(gdb)

Last lines of a session with (gdb) b clone.rs:1:

282	            if let Ready(v) = crate::runtime::coop::budget(|| f.as_mut().poll(&mut cx)) {
(gdb)
282	            if let Ready(v) = crate::runtime::coop::budget(|| f.as_mut().poll(&mut cx)) {
(gdb)
284	            }
(gdb)
Downloading changes [==>                                               ] 52/990
           Applying [==>                                               ] 52/990
282	            if let Ready(v) = crate::runtime::coop::budget(|| f.as_mut().poll(&mut cx)) {
(gdb)
282	            if let Ready(v) = crate::runtime::coop::budget(|| f.as_mut().poll(&mut cx)) {
(gdb)
284	            }
(gdb)
Downloading changes [==>                                               ] 53/990
           Applying [==>                                               ] 53/990
282	            if let Ready(v) = crate::runtime::coop::budget(|| f.as_mut().poll(&mut cx)) {
(gdb)
282	            if let Ready(v) = crate::runtime::coop::budget(|| f.as_mut().poll(&mut cx)) {
(gdb)
284	            }
(gdb)
Downloading changes [==>                                               ] 54/990
           Applying [==>                                               ] 54/990
282	            if let Ready(v) = crate::runtime::coop::budget(|| f.as_mut().poll(&mut cx)) {
(gdb)
282	            if let Ready(v) = crate::runtime::coop::budget(|| f.as_mut().poll(&mut cx)) {
(gdb)
Downloading changes [==>                                               ] 55/990
286	            self.park();
(gdb)
           Applying [==>                                               ] 55/990
282	            if let Ready(v) = crate::runtime::coop::budget(|| f.as_mut().poll(&mut cx)) {
(gdb)
282	            if let Ready(v) = crate::runtime::coop::budget(|| f.as_mut().poll(&mut cx)) {
(gdb)
284	            }
(gdb)
Downloading changes [==>                                               ] 56/990
           Applying [==>                                               ] 56/990
282	            if let Ready(v) = crate::runtime::coop::budget(|| f.as_mut().poll(&mut cx)) {
(gdb)
282	            if let Ready(v) = crate::runtime::coop::budget(|| f.as_mut().poll(&mut cx)) {
(gdb)
284	            }
(gdb)
Downloading changes [==>                                               ] 57/990
           Applying [==>                                               ] 57/990
282	            if let Ready(v) = crate::runtime::coop::budget(|| f.as_mut().poll(&mut cx)) {
(gdb)
Downloading changes [==>                                               ] 58/990
           Applying [==>                                               ] 58/990
284	            }
(gdb)
Downloading changes [==>                                               ] 58/990
           Applying [==>                                               ] 58/990
Downloading changes [==>                                               ] 59/990
           Applying [==>                                               ] 58/990
[2023-12-31T18:01:46Z ERROR pijul] Error: TxnErr(
        Sanakirja(
            IO(
                Os {
                    code: 0,
                    kind: Uncategorized,
                    message: "No error: 0",
                },
            ),
        ),
    )

tankf33der on January 1, 2024

This can be closed, since the issue does not arise on a UFS mount point.

levi on January 1, 2024

With reference to #849: this bug could be related. This and all future FreeBSD and ZFS file system I/O problems could be related, so the solution would probably be the same. Yesterday @tankf33der discovered that this is neither a FreeBSD-only issue nor a ZFS-only issue. Pijul works fine on ZFS under Linux, and works fine on FreeBSD on a UFS file system, but the combination FreeBSD/ZFS causes the problem.

We have several hypotheses, including a missing or incomplete implementation (see #849) and kernel knob values that are too low for ZFS. To rule out at least one of these, we need the opinion of a programmer familiar with tokio::io::bsd. All these problems seem to share one problem at their core: handling concurrency properly. Let me remind you that pijul clone only fails on larger repositories (unproven), and pijul record fails… only some of the time.

Just to add: the FreeBSD+ZFS tandem is the de facto standard. It is recognized as the default and used in more than 90% of cases.
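To compare a UFS and a ZFS mount point independently of Pijul, a small probe could be run in a directory on each file system. This is a minimal sketch under stated assumptions: it uses only the Rust standard library, the function name `probe_dir` and the file name `fs_probe.tmp` are made up, and which file operations Sanakirja actually performs is not shown in this thread. Note that `set_len` truncates or extends the file and does not preallocate blocks, so a clean run of this probe does not clear the file system of suspicion.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Create a file in `dir`, grow it to `len` bytes, write, sync, read back,
/// and remove it. On failure, returns the name of the failing step
/// together with the underlying error.
fn probe_dir(dir: &Path, len: u64) -> Result<(), (&'static str, io::Error)> {
    let path = dir.join("fs_probe.tmp"); // hypothetical scratch file name
    let mut f = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open(&path)
        .map_err(|e| ("open", e))?;
    f.set_len(len).map_err(|e| ("set_len", e))?;
    f.write_all(b"probe").map_err(|e| ("write", e))?;
    f.sync_all().map_err(|e| ("sync_all", e))?;
    f.seek(SeekFrom::Start(0)).map_err(|e| ("seek", e))?;
    let mut buf = [0u8; 5];
    f.read_exact(&mut buf).map_err(|e| ("read", e))?;
    if &buf != b"probe" {
        let e = io::Error::new(io::ErrorKind::InvalidData, "read-back mismatch");
        return Err(("verify", e));
    }
    drop(f);
    fs::remove_file(&path).map_err(|e| ("remove", e))?;
    Ok(())
}
```

Running `probe_dir` once on each mount point and printing the returned step name and `raw_os_error()` would at least show whether plain file I/O behaves differently on the two file systems.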