Sounds like you’re doing some really cool work. @darleybarreto did fantastic work translating the code to Rust, in an attempt to mitigate the performance regressions we were experiencing in ZStd 1.5. It didn’t mitigate those, but it does feel a bit more future-proof and maintainable.
However, this was done at a time of wrapping up the beta for Pijul, and there were still a number of unexplained bugs when testing on a larger scale. Since Pijul uses ZStd-seekable almost everywhere, a more reasonable way to test was to move back to the conservative, C version temporarily while debugging the rest of Pijul.
But now that we’re in beta, the testing can finally resume! If you’re interested in helping, I’m sure @darleybarreto would enjoy the discussions I haven’t had much time to hold in the last few months :(
However, this was done at a time of wrapping up the beta for Pijul, and there were still a number of unexplained bugs when testing on a larger scale. Since Pijul uses ZStd-seekable almost everywhere, a more reasonable way to test was to move back to the conservative, C version temporarily while debugging the rest of Pijul.
Just to make sure I understand: were there a number of unexplained bugs in Pijul or in zstd-seekable? If the former, did you stick with using zstd-seekable just because it was more tested/less to worry about while you were looking for Pijul bugs? Or were there bugs in zstd-seekable that you didn’t have time to deal with, so you reverted to the C version for now?
If you’re interested in helping, I’m sure @darleybarreto would enjoy the discussions I haven’t had much time to hold in the last few months :(
If there’s something specific I can do to help, I can probably spend a little bit of time here and there, though I’m not familiar with how the zstd-seekable (or zstd) internals work, so I’m not sure how much help I can be.
To give some idea of our use of zstd-seekable: we usually generate a set of data that, after compression with the current zstd-seekable, is about 500 GiB. It is mostly similar data, so definitely not an exhaustive sample, but probably better than nothing. If there’s some version of zstd-seekable that can be tried, I could re-compress the data and check that nothing crashes when decoding with the C-backed version and encoding with the new version.
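A minimal sketch of the re-compression check described above: decode each existing archive with the old implementation, re-encode it with the new one, decode again and compare bytes. The `Codec` trait and `migrate` function are hypothetical wrappers, not the zstd-seekable API, and the toy run-length codec only stands in for zstd so the example runs on its own. It keeps whole buffers in memory, so for ~500 GiB you would run it file by file.

```rust
/// Hypothetical interface that both the old (C-backed) and the new (Rust)
/// implementations could be wrapped in for the migration check.
trait Codec {
    fn compress(&self, input: &[u8]) -> Vec<u8>;
    fn decompress(&self, input: &[u8]) -> Vec<u8>;
}

/// Toy run-length codec standing in for zstd: (run length, byte) pairs.
struct Rle;

impl Codec for Rle {
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        let mut out = Vec::new();
        let mut i = 0;
        while i < input.len() {
            let b = input[i];
            let mut run = 1;
            // Cap runs at 255 so the length fits in one byte.
            while i + run < input.len() && input[i + run] == b && run < 255 {
                run += 1;
            }
            out.push(run as u8);
            out.push(b);
            i += run;
        }
        out
    }

    fn decompress(&self, input: &[u8]) -> Vec<u8> {
        let mut out = Vec::new();
        for pair in input.chunks_exact(2) {
            out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
        }
        out
    }
}

/// Decode an archive produced by `old`, re-encode it with `new`, and check
/// that decoding the new archive yields exactly the same bytes.
/// Returns the re-encoded archive on success.
fn migrate(old: &dyn Codec, new: &dyn Codec, archive: &[u8]) -> Result<Vec<u8>, String> {
    let original = old.decompress(archive);
    let recompressed = new.compress(&original);
    let roundtrip = new.decompress(&recompressed);
    if roundtrip == original {
        Ok(recompressed)
    } else {
        Err(format!(
            "mismatch: {} bytes before, {} bytes after round trip",
            original.len(),
            roundtrip.len()
        ))
    }
}

fn main() {
    let data = b"aaaabbbcccccccccd".to_vec();
    let archive = Rle.compress(&data); // pretend this came from the C version
    let migrated = migrate(&Rle, &Rle, &archive).expect("round trip failed");
    assert_eq!(Rle.decompress(&migrated), data);
    println!("ok: {} -> {} bytes", data.len(), migrated.len());
}
```

In a real run, `old` would wrap the C-backed release and `new` the Rust translation; a mismatch or a panic points at the file to report.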
Hi,
Sorry for not responding earlier; for some reason I don’t get notified when people mention me in discussions.
If there’s something specific I can do to help […]
I’m not sure, actually. There were some people reporting bugs on pijul’s main repo, but other than that I don’t know of any particular bug. What would be much appreciated is adding tests: I added one or two, but this is far from ideal. Fuzzing it would help too.
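As an illustration of the kind of test that could be added, here is a fuzz-style property check for seeking: for random inputs, frame sizes and ranges, reading `len` bytes at `offset` must equal the same slice of the original data. The `Archive` type is a hypothetical toy (frames stored raw instead of zstd-compressed, no real seek table), so this only exercises frame/offset arithmetic; against the real crate one would call its decompression entry point instead, or use a tool such as cargo-fuzz or proptest. A tiny xorshift generator keeps it free of external crates.

```rust
/// Toy seekable archive: the input is cut into fixed-size frames stored
/// independently. A real implementation would zstd-compress each frame;
/// here frames are stored raw so only the offset arithmetic is tested.
struct Archive {
    frames: Vec<Vec<u8>>,
}

impl Archive {
    fn new(data: &[u8], frame_size: usize) -> Self {
        assert!(frame_size > 0, "frame size must be positive");
        Archive {
            frames: data.chunks(frame_size).map(|c| c.to_vec()).collect(),
        }
    }

    /// Return `len` bytes starting at decompressed offset `offset`,
    /// copying only from the frames that overlap the requested range.
    fn read_at(&self, offset: usize, len: usize) -> Vec<u8> {
        let end = offset + len;
        let mut out = Vec::with_capacity(len);
        let mut frame_start = 0;
        for frame in &self.frames {
            let frame_end = frame_start + frame.len();
            if frame_end > offset && frame_start < end {
                let lo = offset.max(frame_start) - frame_start;
                let hi = end.min(frame_end) - frame_start;
                out.extend_from_slice(&frame[lo..hi]);
            }
            if frame_end >= end {
                break; // later frames are never touched
            }
            frame_start = frame_end;
        }
        out
    }
}

/// Deterministic xorshift64 generator (seed must be non-zero).
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Uniform-ish value in 0..n (n must be > 0).
    fn below(&mut self, n: usize) -> usize {
        (self.next() % n as u64) as usize
    }
}

/// Property: `read_at(offset, len)` equals `data[offset..offset + len]`.
fn check_random_reads(seed: u64, cases: usize) -> Result<(), String> {
    let mut rng = XorShift(seed);
    for case in 0..cases {
        let data: Vec<u8> = (0..rng.below(2000)).map(|_| rng.next() as u8).collect();
        let frame_size = 1 + rng.below(64);
        let archive = Archive::new(&data, frame_size);
        let offset = if data.is_empty() { 0 } else { rng.below(data.len()) };
        let len = rng.below(data.len() - offset + 1);
        let got = archive.read_at(offset, len);
        if got.as_slice() != &data[offset..offset + len] {
            return Err(format!(
                "case {}: frame_size={} offset={} len={}",
                case, frame_size, offset, len
            ));
        }
    }
    Ok(())
}

fn main() {
    match check_random_reads(0x9E37_79B9_7F4A_7C15, 10_000) {
        Ok(()) => println!("all random reads matched"),
        Err(e) => panic!("{}", e),
    }
}
```

Printing the failing case's parameters makes it easy to turn a random failure into a fixed regression test.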
Discussions and code reviews also help. You could browse the source code (this one and, optionally, the original C), understand it, and discuss with me anything you find interesting, different from the original, potentially wrong, or that could be coded better. Beyond that, using it on a daily basis would also be great for improving the implementation and the API, finding bugs, etc. I also keep an eye on the original C code base to see what has changed and port it to this implementation.
Please note that I’m not a fluent Rust programmer, nor someone with extensive systems programming skills (I am a machine learning researcher/developer who mainly codes in Python). I do things in Rust to learn and to help others, so any help is much appreciated!
I’m the author of the zstd-seekable-s3 crate, which provides an object implementing Read and Seek that allows making seekable reads from AWS S3 directly, including when the files are behind a seekable zstd archive. It uses your crate for the compression/decompression bits.

The way this was achieved was via tokio::runtime::block_on + rusoto. However, we’ve found that this has too many limitations when you actually want to use async in the rest of your library: you can end up with nested block_on calls and all kinds of other unfun things.

With the released version of the crate (0.1.7), it’s not possible to get rid of synchronicity: Seekable::init requires Read and Seek so that it can invoke callbacks.

I thought everything was lost, but I noticed that you have gotten rid of the zstd-seekable C code and translated it to Rust. This means that if we allow Seekable::src to be async, everyone is happy.

So the questions are:

Seekable::src callbacks should also allow async versions. I don’t know of a “pretty” way to support both at once: worst comes to worst, I can hack something myself based on this code here, though that depends on the answer to my first question.

Thanks!
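One pattern sometimes used to avoid maintaining both sync and async source callbacks is a "sans-I/O" split: the seek-table lookup is pure computation that reports which compressed byte range a read needs, and the caller fetches those bytes however it likes (a blocking `Read + Seek` source, or an async ranged S3 GET) before handing them to the CPU-only frame decoder. This is only a sketch of that idea, not something the crate or this thread proposes; `FrameEntry`, `ReadPlan`, `SeekTable`, `plan_read`, `range_header` and `fetch_sync` are all hypothetical names. The entries mirror the per-frame compressed/decompressed sizes the seekable format's seek table stores.

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

/// One seek-table entry: sizes of a single independently compressed frame.
#[derive(Clone, Copy, Debug)]
struct FrameEntry {
    compressed_size: u64,
    decompressed_size: u64,
}

/// What the caller must fetch to serve a read: a contiguous run of frames.
#[derive(Debug, PartialEq)]
struct ReadPlan {
    first_frame: usize,
    last_frame: usize,     // inclusive
    compressed_start: u64, // offset of `first_frame` in the archive
    compressed_end: u64,   // exclusive end of `last_frame`
    skip: u64,             // decompressed bytes to drop from the first frame
}

struct SeekTable {
    frames: Vec<FrameEntry>,
}

impl SeekTable {
    /// No I/O: which compressed bytes cover decompressed [offset, offset + len)?
    /// Returns None for empty reads or reads past the end of the archive.
    fn plan_read(&self, offset: u64, len: u64) -> Option<ReadPlan> {
        if len == 0 {
            return None;
        }
        let end = offset.checked_add(len)?;
        let (mut c_pos, mut d_pos) = (0u64, 0u64);
        let mut first: Option<(usize, u64, u64)> = None; // (index, c_start, skip)
        for (i, f) in self.frames.iter().enumerate() {
            let d_end = d_pos + f.decompressed_size;
            let c_end = c_pos + f.compressed_size;
            if first.is_none() && d_end > offset {
                first = Some((i, c_pos, offset - d_pos));
            }
            if let Some((first_frame, compressed_start, skip)) = first {
                if d_end >= end {
                    return Some(ReadPlan {
                        first_frame,
                        last_frame: i,
                        compressed_start,
                        compressed_end: c_end,
                        skip,
                    });
                }
            }
            c_pos = c_end;
            d_pos = d_end;
        }
        None
    }
}

/// HTTP Range header value for the planned bytes (end is inclusive),
/// e.g. for a ranged S3 GET issued from async code.
fn range_header(plan: &ReadPlan) -> String {
    format!("bytes={}-{}", plan.compressed_start, plan.compressed_end - 1)
}

/// Blocking fetch of the planned bytes from any `Read + Seek` source.
fn fetch_sync<R: Read + Seek>(src: &mut R, plan: &ReadPlan) -> std::io::Result<Vec<u8>> {
    src.seek(SeekFrom::Start(plan.compressed_start))?;
    let mut buf = vec![0u8; (plan.compressed_end - plan.compressed_start) as usize];
    src.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() {
    // Three frames: 10, 20 and 30 compressed bytes, 100 decompressed bytes each.
    let table = SeekTable {
        frames: vec![
            FrameEntry { compressed_size: 10, decompressed_size: 100 },
            FrameEntry { compressed_size: 20, decompressed_size: 100 },
            FrameEntry { compressed_size: 30, decompressed_size: 100 },
        ],
    };
    let plan = table.plan_read(150, 100).unwrap();
    println!("{:?} -> Range: {}", plan, range_header(&plan));
    let archive: Vec<u8> = (0..60).collect();
    let bytes = fetch_sync(&mut Cursor::new(archive), &plan).unwrap();
    assert_eq!(bytes.len(), 50);
}
```

With this split, only the byte-fetching step differs between the sync and async paths; the planning and decoding code can stay synchronous and shared.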