pijul_org / pijul

#371 pijul panics when cloning this repository

Opened by AveryLychee, on March 17, 2019
Open
AveryLychee commented on March 17, 2019

When running pijul clone https://nest.pijul.com/pijul_org/pijul, it manages to pull around 25 patches before getting the following error:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Inner { kind: Hyper(Error { kind: Incomplete, cause: None }), url: Some("https://nest.pijul.com/pijul_org/pijul/.pijul/patches/Achz4MzVtzVmqrw9m7rgDVbUycKmWinumnwLDDj75QfNhiyrGs5bG9guBQrjhQKdFak8wVuFmETA8atdL8wib8Zy.sig") }', src/libcore/result.rs:1009:5

The stack backtrace from running with RUST_BACKTRACE=1:

stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
   1: std::sys_common::backtrace::print
   2: std::panicking::default_hook::{{closure}}
   3: std::panicking::default_hook
   4: std::panicking::rust_panic_with_hook
   5: std::panicking::continue_panic_fmt
   6: rust_begin_unwind
   7: core::panicking::panic_fmt
   8: core::result::unwrap_failed
   9: pijul::commands::remote::Session::download_patch
  10: pijul::commands::remote::Session::pull
  11: pijul::main
  12: std::rt::lang_start::{{closure}}
  13: std::panicking::try::do_call
  14: __rust_maybe_catch_panic
  15: std::rt::lang_start_internal
  16: main
  17: __libc_start_main
  18: _start

My version of pijul is 0.11.0

iantownsend commented on March 22, 2019

It could also be a bug in the Nest. Anyway, just running the command again usually helps, so Pijul should probably retry in such cases.

AveryLychee commented on March 26, 2019

On the day I submitted this issue, I had tried around 10 times, and it always failed.

Just running the command again doesn't work, as it responds with:

error: Cannot clone onto / into existing repository "pijul"

so I first have to remove the folder it created and try again.

Today I've tried twice, with the same result.

Edit: On the third time, it worked! At last...

mistone commented on March 30, 2019

Hi folks,

I'm running into the same problem here, cloning with pijul-0.11.0 on Darwin from nixpkgs.

After a dozen or so attempts I'm still unable to finish cloning.

Running with RUST_LOG=debug produces the following sort of output:

...
DEBUG 2019-03-30T14:19:22Z: pijul::commands::remote: renaming "/Users/mstone/pijul/pijul/.pijul/patches/pijul_patch.VKm5KQLAylf7/patch" to "pijul/.pijul/patches/B7mLjmEN4yTwDCQyeoVNrBMficktLnmPiPRDndGaN7JRUo7gmaQAtpJmBj8Mi8JpuuhfX8X8d8hwRFGvf4Rv6rqa.gz"
DEBUG 2019-03-30T14:19:22Z: pijul::commands::remote: renaming "/Users/mstone/pijul/pijul/.pijul/patches/pijul_patch.VKm5KQLAylf7/patch.sig" to "pijul/.pijul/patches/B7mLjmEN4yTwDCQyeoVNrBMficktLnmPiPRDndGaN7JRUo7gmaQAtpJmBj8Mi8JpuuhfX8X8d8hwRFGvf4Rv6rqa.sig"
DEBUG 2019-03-30T14:19:22Z: pijul::commands::remote: hash = 8q11AGJKZT2L17PrYxCeDW9pwrDzwgAzMYS9zUKfGAgh4Hdfo8CL4jrcVPPKWsahf8aKe8H9eam88XtELDQYPJZE
DEBUG 2019-03-30T14:19:22Z: pijul::commands::remote: downloading uri "https://nest.pijul.com/pijul_org/pijul/.pijul/patches/8q11AGJKZT2L17PrYxCeDW9pwrDzwgAzMYS9zUKfGAgh4Hdfo8CL4jrcVPPKWsahf8aKe8H9eam88XtELDQYPJZE.gz"
DEBUG 2019-03-30T14:19:22Z: pijul::commands::remote: "https://nest.pijul.com/pijul_org/pijul/.pijul/patches/8q11AGJKZT2L17PrYxCeDW9pwrDzwgAzMYS9zUKfGAgh4Hdfo8CL4jrcVPPKWsahf8aKe8H9eam88XtELDQYPJZE.sig"
DEBUG 2019-03-30T14:19:22Z: hyper::client::pool: reuse idle connection for ("https://nest.pijul.com", Http1)
DEBUG 2019-03-30T14:19:22Z: hyper::client::connect::dns: resolving host="nest.pijul.com", port=443
DEBUG 2019-03-30T14:19:22Z: hyper::proto::h1::io: flushed 244 bytes
DEBUG 2019-03-30T14:19:22Z: hyper::client::connect::http: connecting to 35.195.10.148:443
DEBUG 2019-03-30T14:19:22Z: tokio_reactor: loop process - 1 events, 0.000s
DEBUG 2019-03-30T14:19:22Z: hyper::proto::h1::io: read 750 bytes
DEBUG 2019-03-30T14:19:22Z: hyper::proto::h1::io: parsed 3 headers
DEBUG 2019-03-30T14:19:22Z: hyper::proto::h1::conn: incoming body is content-length (633 bytes)
DEBUG 2019-03-30T14:19:22Z: hyper::proto::h1::conn: incoming body completed
DEBUG 2019-03-30T14:19:22Z: reqwest::async_impl::response: Response: '200 OK' for https://nest.pijul.com/pijul_org/pijul/.pijul/patches/8q11AGJKZT2L17PrYxCeDW9pwrDzwgAzMYS9zUKfGAgh4Hdfo8CL4jrcVPPKWsahf8aKe8H9eam88XtELDQYPJZE.gz
DEBUG 2019-03-30T14:19:22Z: hyper::client::pool: reuse idle connection for ("https://nest.pijul.com", Http1)
DEBUG 2019-03-30T14:19:22Z: hyper::proto::h1::io: flushed 226 bytes
DEBUG 2019-03-30T14:19:22Z: tokio_reactor: loop process - 1 events, 0.000s
DEBUG 2019-03-30T14:19:22Z: hyper::proto::h1::io: read 0 bytes
DEBUG 2019-03-30T14:19:22Z: hyper::proto::h1::conn: parse error (parsed HTTP message from remote is incomplete) with 0 bytes
DEBUG 2019-03-30T14:19:22Z: hyper::proto::h1::dispatch: read_head error: parsed HTTP message from remote is incomplete
DEBUG 2019-03-30T14:19:22Z: tokio_reactor: dropping I/O source: 1
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Inner { kind: Hyper(Error { kind: Incomplete, cause: None }), url: Some("https://nest.pijul.com/pijul_org/pijul/.pijul/patches/8q11AGJKZT2L17PrYxCeDW9pwrDzwgAzMYS9zUKfGAgh4Hdfo8CL4jrcVPPKWsahf8aKe8H9eam88XtELDQYPJZE.sig") }', libcore/result.rs:1009:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.
DEBUG 2019-03-30T14:19:22Z: tokio_reactor: loop process - 1 events, 0.000s
DEBUG 2019-03-30T14:19:22Z: tokio_reactor: loop process - 1 events, 0.000s
DEBUG 2019-03-30T14:19:22Z: tokio_reactor::background: shutting background reactor down NOW
DEBUG 2019-03-30T14:19:22Z: tokio_reactor::background: background reactor has shutdown

mistone commented on March 30, 2019

One other note: setting RUST_LOG=trace, RUST_LOG="tokio=trace,hyper=trace", or RUST_LOG="tokio=trace" seems to reliably cause cloning to succeed for me (n = ~5), while setting just RUST_LOG="hyper=trace" reliably causes it to fail with errors similar to the one pasted above. This makes me wonder whether there is a race condition that tracing tokio slows down enough to change the outcome.

berkus commented on April 14, 2019

Had the same issue today: solved it by doing cd pijul; pijul pull a dozen times.

Action Log: https://gist.github.com/berkus/2a498562a1668154f52d3f4c3c391a90

It's apparently a problem with the Nest's networking cutting the connection short, but then Pijul's network layer doesn't retry and fails immediately.

I believe that on a Hyper Incomplete error, Pijul should do a configurable number of retries.
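The configurable-retry idea could look something like the sketch below. Everything here is hypothetical: `retry_download`, the `max_tries` knob, and the linear backoff are illustrative names and choices, not part of pijul's actual code; the real fix would presumably wrap the patch-download call in `pijul::commands::remote` with something like this.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Retry a fallible operation up to `max_tries` times, backing off a
/// little between attempts. `op` stands in for the per-patch download.
fn retry_download<T, E, F>(mut op: F, max_tries: u32) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    let mut last_err = None;
    for attempt in 1..=max_tries {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) => {
                last_err = Some(e);
                if attempt < max_tries {
                    // Linear backoff; a real implementation might prefer
                    // exponential backoff with jitter.
                    sleep(Duration::from_millis(100 * attempt as u64));
                }
            }
        }
    }
    // Only reached when every attempt failed.
    Err(last_err.expect("max_tries must be at least 1"))
}
```

With `max_tries = 1` this degenerates to the current behavior; the transient Hyper Incomplete errors seen in this thread would be absorbed by any larger value.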

berkus commented on April 14, 2019

However, pulling does not complete successfully:

λ pijul pull https://nest.pijul.com/pijul_org/pijul
Pulling patches: 100% (267/267), done.
Hash: 9tNKV1UL5cYbw3gpDScSfoZSRthQ3a6KHHKMQTjA1CyCumqrVcXey6t37qJMHURpZQwZt1UsKoj3bhNPfevwdhTK
Authors: Pierre-Étienne Meunier <pe@pijul.org>
Timestamp: 2018-02-27 14:44:24.130178136 UTC

    Init

Shall I pull this patch? [ynkad?] a

fish: 'pijul pull https://nest.pijul.c…' terminated by signal SIGKILL (Forced quit)

:(

pmeunier commented on April 14, 2019

Ok, this might be caused by 0.11, which is several months old. I'm releasing 0.12 NOW!

berkus commented on April 14, 2019

Yes, it's five months old! Thank you! :)

pmeunier commented on April 14, 2019

Also, the Nest is running on a small machine at the moment, and sometimes Pijul blows up its memory usage. There is certainly action to be taken to limit its memory consumption.

pmeunier commented on April 14, 2019

Btw, no offense taken, but there are nicer ways to interact with a young project than irony about an old version on twitter ;-)

berkus commented on April 14, 2019

I still think it's great, but this behavior surprised me a lot.

I thought Darcs's pain of going into an endless permutation loop had been solved, while here it seems to go into an endless memory-consumption loop, with the same drastic result.

I didn't know that this discussion exists on nest, so i thought i'd ping you on twitter first. Sorry if that was seen as irony.

pmeunier commented on April 14, 2019

The algorithm is supposed to fix the problems in Darcs. The tradeoff is extra disk space, but we have ideas to improve that.

There were still pretty nasty bugs in the implementation in version 0.11, and that's why Pijul is not 1.0 yet.

berkus commented on April 14, 2019

Got it, let's see how 0.12 improves things!

berkus commented on April 15, 2019

How's the release going?

pmeunier commented on April 15, 2019

Found one last bug I want to fix before releasing.

berkus commented on April 15, 2019

Ok, I'm refreshing crates.io every hour :)

berkus commented on April 22, 2019

AWESOME! 0.12.0 has cloned its repo from the Nest very quickly! Building pijul from it now...

lelf commented on April 23, 2019

Having the same problem with 0.12.0

pmeunier commented on April 23, 2019

@lelf: could you be more explicit? What's the problem exactly? Would you mind pasting the output of RUST_BACKTRACE=1 RUST_LOG="pijul=debug" pijul …, where … is whatever command you're trying to run?

lelf commented on April 23, 2019

@pmeunier pijul clone https://nest.pijul.com/pijul_org/pijul

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: Hyper(Error { kind: Incomplete }), url: Some("https://nest.pijul.com/pijul_org/pijul/.pijul/patches/A91y9zQHKj7bjATjopKTKxyoVKKvQe3oAH5yLEgBCricrN7hcNT57EcwXihcXU56VaH2s5fRwDBcmKs53Hj5r3FF.gz") }', src/libcore/result.rs:997:5
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
   1: std::sys_common::backtrace::_print
   2: std::panicking::default_hook::{{closure}}
   3: std::panicking::default_hook
   4: std::panicking::rust_panic_with_hook
   5: std::panicking::continue_panic_fmt
   6: rust_begin_unwind
   7: core::panicking::panic_fmt
   8: core::result::unwrap_failed
   9: pijul::commands::remote::UriSession::fetch_patch
  10: pijul::commands::remote::Session::pull
  11: pijul::commands::clone::run
  12: pijul::main
  13: std::rt::lang_start::{{closure}}
  14: std::panicking::try::do_call
  15: __rust_maybe_catch_panic
  16: std::rt::lang_start_internal
  17: main

Full log — https://lelf.lu/files/pijul-1.log

eh commented on April 24, 2019

I'm also seeing this issue when cloning this repo. It finished on the third attempt:

❯ pijul clone https://nest.pijul.com/pijul_org/pijul
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: Hyper(Error { kind: Incomplete }), url: Some("https://nest.pijul.com/pijul_org/pijul/.pijul/patches/9xMfbvqgFkX9q5Ty2zt81jkLXMSGFYmU4ch2gYKKRBK2fgKGDvYPsewntW4PmGfGQvTdXuLvrQuPRhQKDCni5wNE.sig") }', src/libcore/result.rs:997:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

❯ pijul clone https://nest.pijul.com/pijul_org/pijul
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: Hyper(Error { kind: Incomplete }), url: Some("https://nest.pijul.com/pijul_org/pijul/.pijul/patches/98ivUJcqBq5eYeGCfatRxSnPj9yXLM6XMyscpPzdpcjRHtpQWxpf9UKy7m74M9qbUL6HwCQG47nYQhnQNUNLR6uc.gz") }', src/libcore/result.rs:997:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

❯ pijul clone https://nest.pijul.com/pijul_org/pijul
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: Hyper(Error { kind: Incomplete }), url: Some("https://nest.pijul.com/pijul_org/pijul/.pijul/patches/AZ4nNnhQj8A4Kx6sL9f2uqxzXL7xaUEA8iyi9DzpMFbXMofp4ZNWjaN6ZgnKzWr2xTQoVwaZKe66vUd4DopDakNh.gz") }', src/libcore/result.rs:997:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

❯ pijul clone https://nest.pijul.com/pijul_org/pijul
Pulling patches: 100% (280/280), done.
Applying patches: 100% (280/280), done.

❯ pijul -V
pijul 0.12.0

pmeunier commented on April 24, 2019

Ok, I do need to fix that. These files don't even exist; the server actually returns a 404. But the fix must include some "asynchronisation" of downloading the patches, and that also means the SSH backend has to support it, which requires a little bit of extra work.

pmeunier commented on April 24, 2019

Next time this happens, would you mind sending the output of date as well, so that I can check the server logs in parallel?

lelf commented on April 24, 2019

$ date -u; pijul clone https://nest.pijul.com/pijul_org/pijul; date -u
Mer 24 Apr 2019 18:06:19 UTC
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: Hyper(Error { kind: Incomplete }), url: Some("https://nest.pijul.com/pijul_org/pijul/.pijul/patches/9tNKV1UL5cYbw3gpDScSfoZSRthQ3a6KHHKMQTjA1CyCumqrVcXey6t37qJMHURpZQwZt1UsKoj3bhNPfevwdhTK.sig") }', src/libcore/result.rs:997:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
Mer 24 Apr 2019 18:06:50 UTC

Lysxia commented on May 24, 2019

The problem seems to occur whether or not the patch exists.

Here's a minimized executable that just keeps requesting some arbitrary patch: https://gist.github.com/Lysxia/ed129e39e4cb730c9458408690187abf I usually get hit after 250-300 consecutive requests, but perhaps you will have some luck reproducing the issue by running it more times.

It seems the only thing we can do is to retry requests, as berkus suggested earlier. I could work on this if you want.

Lysxia added a patch:
Make number of HTTP retries configurable by Lysxia, created on May 24, 2019

7Zmfdfo9XB6Z6sWndqf66oGuw9TjJW2xz6GQ38vA7ApsoEmetdugThPMUyJf2KSHdYe28QnyZum6gMj5jSDBijVd
latest
master
testing

jimbo1qaz commented on June 27, 2019

Note that pijul 0.12.0 now clones to a hidden directory (./.tmp[random chars]). However, I was able to enter (or rename) the hidden directory and then run pijul pull as a workaround. Does "latest\n master\n testing" mean this patch will be included in the next release?

tuxsoy commented on August 24, 2019

Having this issue on v0.12.2. An easy workaround is to do the following:

$ mkdir pijul
$ cd pijul
$ pijul init
$ pijul pull https://nest.pijul.com/pijul_org/pijul

When that fails, repeat the following command as many times as needed:

$ pijul pull https://nest.pijul.com/pijul_org/pijul
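The "repeat as many times as needed" step can be automated with a small shell wrapper. This is an illustrative helper, not a pijul feature: `retry` and `MAX_TRIES` are made-up names, and the pull URL in the usage comment is the one from this thread.

```shell
# retry CMD...: re-run CMD until it exits successfully,
# giving up after MAX_TRIES attempts (default 10).
retry() {
    max=${MAX_TRIES:-10}
    n=1
    until "$@"; do
        if [ "$n" -ge "$max" ]; then
            echo "giving up after $n attempts" >&2
            return 1
        fi
        n=$((n + 1))
        sleep 1
    done
}

# Usage, from inside the initialized repository:
# retry pijul pull https://nest.pijul.com/pijul_org/pijul
```

The loop only papers over the transient Hyper Incomplete failures; it does not help with the "Cannot clone onto / into existing repository" error, which is why initializing once and pulling repeatedly (as above) works better than re-running clone.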