Since that function is called absolutely everywhere, there could be performance problems in doing that, we’d need a very careful benchmark. However, we might not need such a tight packing, we could probably afford to double the size of entries.
What are you trying to do exactly?
there could be performance problems in doing that
Yeah, that might be the case… I wouldn’t expect it to make much difference, but I’m not sure, especially if you abort on panics or something.
What are you trying to do exactly?
It’s still pretty much an experiment, I’m trying to index a running blockchain. To be honest I still haven’t figured out all the details though.
The idea is to fork some btrees per block, so handling rollbacks in the protocol is basically just changing the hash of the current main branch. I need to have one fork per block to make queries over that data (I don’t care much about write speed). As far as I could understand, pijul is a bit different because you don’t need to query over a change, so there are only forks per channel.
The problem lies in the fact that depending on the index, there may be periods of inactivity where it’s not modified. I’m not sure if I can hit the rc limit in other scenarios that don’t involve the root to be honest.
In general, I only need to keep forks around if they can still be a forking point, which means I can limit the depth, and in practice I think 2^12 states should be enough, but I wanted that to be a parameter and maybe leave some wiggle room to keep space management simpler.
As you can see, I still think I can work around this thing, but I still wanted to bring out some discussion about it.
Sounds interesting. My main concern here is backwards compatibility. One way to achieve it would be to make the type of MutTxn
parametric in the type of the RC B tree. You could have a trait to add pages to the RC B tree, with the type of the RC B tree as a parameter to MutTxn
itself. Of course, for backwards compatibility of the API, MutTxn
would have to keep the name, and be defined as type MutTxn<…> = GenericMutTxn<CurrentRCStrategy, …>
.
Also, the only place where something has to be changed is sanakirja
(i.e. the easy crate), not sanakirja-core
(the horrible one).
Does that make sense?
You could have a trait to add pages to the RC B tree
I’m not sure I follow that part, you mean adding a reference to that page?
If what you mean is that we can have the RC tree to be something like Db<u64, u64>
(as an extreme case of course, don’t need that much), then yes, that makes sense to me. And the new MutTxn
would have as a default the current implementation, right? So I would just need to define a new alias.
Also, just in case, I think I can probably give a hand with this if you need it, although I’m not that familiar with the code.
First, coming back to one thing you wrote earlier:
Yeah, that might be the case… I wouldn’t expect it to make much difference, but I’m not sure, especially if you abort on panics or something.
My experience with Sanakirja is that any intuition I’ve ever had about performance on this project has been contradicted by benchmarks, and I’ve tried to benchmark every single piece I could benchmark. One thing that has been pretty consistent though, is that the CPU pipeline matters a lot in this case, and adding more tests gives it more potential to fail.
I’m not sure I follow that part, you mean adding a reference to that page?
Yes, add its address in the file, i.e. a u64
referencing the page.
If what you mean is that we can have the RC tree to be something like Db<u64, u64> (as an extreme case of course, don’t need that much), then yes, that makes sense to me. And the new MutTxn would have as a default the current implementation, right? So I would just need to define a new alias.
Exactly. Note that u64
for the RC doesn’t cost anything extra here, since all values have to be 64-bit-aligned. The only potential source of extra cost (even though I don’t even think it would be measurable) is on platforms that aren’t little-endian, the bits would have to be reordered. But then most of Sanakirja would probably suffer from the same problem.
This is why the current implementation uses the remaining 12 bits: adding just one more bit would cost an entire extra u64
anyway (which actually leaves you with a total of 76 bits).
Also, just in case, I think I can probably give a hand with this if you need it, although I’m not that familiar with the code.
If you want to do that, I can help! I don’t think it is very hard.
My experience with Sanakirja is that any intuition I’ve ever had about performance on this project has been contradicted by benchmarks, and I’ve tried to benchmark every single piece I could benchmark. One thing that has been pretty consistent though, is that the CPU pipeline matters a lot in this case, and adding more tests gives it more potential to fail.
Yeah, sure, I agree with the sentiment. I have some doubts mostly because the function already returns an error, but it may very well have some overhead.
Note that u64 for the RC doesn’t cost anything extra here, since all values have to be 64-bit-aligned.
Good point actually.
In any case though, I’m happy with just increasing the ref count type to u64. I can close this discussion and open a new one for that if you prefer.
And yeah, I would like to try doing it myself, I’ll try to find the time for it one of these days.
No need to close the discussion, you can you push your patch to this one.
Feel free to ask question, if I’m slow to answer (which is likely) you can also ask on https://pijul.zulipchat.com/
I’ve been playing a bit with this, I’m currently thinking that it may be simpler to just make the key of the rc tree parametric.
Then implement a trait for both u64
and struct { page: u64, count: u64 }
for example.
The problem with this is that the trait bound becomes a bit more viral, as it needs to be put on the MutTxn
struct instead of the impl
that needs it.
OTOH, could the rc
field by something other than a btree? As far as I can see, if we make the whole thing parametric then the trait would need to basically mirror the btree api (put
, delete
, create
… even cursor
?).
As an aside question, I noticed the code is not formatted with rustfmt
, is that intentional? I find it a bit bothersome to have to do manual formatting.
This is more or less what I had in mind. We can have a default type MutTxn = MutTxnWithGenericRc<DefaultRc>
, so I don’t think this is a problem in practice.
In any case, I have a very large use case of Sanakirja in Pijul, we can use it to test reasonable backwards-compatibility of the API. If Pijul still compiles, I wouldn’t be afraid of calling the API backwards-compatible.
As an aside question, I noticed the code is not formatted with rustfmt, is that intentional?
Chicken and egg problem. Sanakirja is used as the foundation of Pijul, and uses Pijul as its version-control system. This is fun, provides many test cases, but has the downside that Sanakirja was mostly written at a time where Pijul didn’t support record hooks. It does now, feel free to add the following lines to your .pijul/config
:
[hooks]
record = [ "cargo fmt" ]
And submit a formatting patch.
YATBB4O7MHKUVYI2JKIYPHLFJIAPK7UPJWAXHR26VVJPIYAOK66AC
EDTFMMZW4N2LJQRV3UTY7PVAIKPLNRKQTB6QCMRXXT7QS2YWG5ZAC
NW4S4ZCCT7356356Z46YXQNWL6MTKZY4CGRNZDCHIHUFANYZLN6QC
FFERM7K5YLIE6WQVVFQ33TZMOMNB2IYMHI56UOXCOU2BBWMC4QOQC
FREPDP7BFOY6H7NJORRJAMJMZQAC4ZSBMOLNBXQTDFFF4PZQIDYQC
O4PSZ7TQINXL4PQI3ZHAAY2A2EDNJLDB7F7CZI7IQHT7DQ6W2UXQC
I pushed an initial implementation to gather some feedback, there may be one or two things to change (like not putting the trait at the top of the file). This is my first time using both pijul and the nest, so if I messed up somewhere please let me know.
I have a couple of tests failing currently, but I think they fail on main too, so I’m not sure if it’s something worth looking into.
Hello, thank you for your work on this library.
I was wondering whether we can make the
assert!(rc + 1 <= 0xfff);
inincr_rc
an error instead of a panic.I runned into it when creating too many forks without modifications. (like I think it would be the if you create 4097 channels in pijul without touching anything).
For my use-case it would be a really pathological/border case for it to happen in practice, so I could potentially just catch the panic and degrade to a deep clone, for example, but I think using an error would be a bit cleaner and I think most(all?) functions return results anyway.