pijul/sanakirja - Discussion #5 - make rc overflow an error variant

#5 make rc overflow an error variant

Opened by enzoc4 on October 20, 2021

enzoc4 on October 20, 2021

Hello, thank you for your work on this library.

I was wondering whether we can make the assert!(rc + 1 <= 0xfff); in incr_rc an error instead of a panic.

I runned into it when creating too many forks without modifications. (like I think it would be the if you create 4097 channels in pijul without touching anything).

For my use-case it would be a really pathological/border case for it to happen in practice, so I could potentially just catch the panic and degrade to a deep clone, for example, but I think using an error would be a bit cleaner and I think most(all?) functions return results anyway.

pmeunier on October 21, 2021

Since that function is called absolutely everywhere, there could be performance problems in doing that, we’d need a very careful benchmark. However, we might not need such a tight packing, we could probably afford to double the size of entries.

What are you trying to do exactly?

enzoc4 on October 21, 2021

there could be performance problems in doing that

Yeah, that might be the case… I wouldn’t expect it to make much difference, but I’m not sure, especially if you abort on panics or something.

What are you trying to do exactly?

It’s still pretty much an experiment, I’m trying to index a running blockchain. To be honest I still haven’t figured out all the details though.

The idea is to fork some btrees per block, so handling rollbacks in the protocol is basically just changing the hash of the current main branch. I need to have one fork per block to make queries over that data (I don’t care much about write speed). As far as I could understand, pijul is a bit different because you don’t need to query over a change, so there are only forks per channel.

The problem lies in the fact that depending on the index, there may be periods of inactivity where it’s not modified. I’m not sure if I can hit the rc limit in other scenarios that don’t involve the root to be honest.

In general, I only need to keep forks around if they can still be a forking point, which means I can limit the depth, and in practice I think 2^12 states should be enough, but I wanted that to be a parameter and maybe leave some wiggle room to keep space management simpler.

As you can see, I still think I can work around this thing, but I still wanted to bring out some discussion about it.

pmeunier on October 21, 2021

Sounds interesting. My main concern here is backwards compatibility. One way to achieve it would be to make the type of MutTxn parametric in the type of the RC B tree. You could have a trait to add pages to the RC B tree, with the type of the RC B tree as a parameter to MutTxn itself. Of course, for backwards compatibility of the API, MutTxn would have to keep the name, and be defined as type MutTxn<…> = GenericMutTxn<CurrentRCStrategy, …>.

Also, the only place where something has to be changed is sanakirja (i.e. the easy crate), not sanakirja-core (the horrible one).

Does that make sense?

enzoc4 on October 21, 2021

You could have a trait to add pages to the RC B tree

I’m not sure I follow that part, you mean adding a reference to that page?

If what you mean is that we can have the RC tree to be something like Db<u64, u64> (as an extreme case of course, don’t need that much), then yes, that makes sense to me. And the new MutTxn would have as a default the current implementation, right? So I would just need to define a new alias.

Also, just in case, I think I can probably give a hand with this if you need it, although I’m not that familiar with the code.

pmeunier on October 21, 2021

First, coming back to one thing you wrote earlier:

Yeah, that might be the case… I wouldn’t expect it to make much difference, but I’m not sure, especially if you abort on panics or something.

My experience with Sanakirja is that any intuition I’ve ever had about performance on this project has been contradicted by benchmarks, and I’ve tried to benchmark every single piece I could benchmark. One thing that has been pretty consistent though, is that the CPU pipeline matters a lot in this case, and adding more tests gives it more potential to fail.

I’m not sure I follow that part, you mean adding a reference to that page?

Yes, add its address in the file, i.e. a u64 referencing the page.

If what you mean is that we can have the RC tree to be something like Db<u64, u64> (as an extreme case of course, don’t need that much), then yes, that makes sense to me. And the new MutTxn would have as a default the current implementation, right? So I would just need to define a new alias.

Exactly. Note that u64 for the RC doesn’t cost anything extra here, since all values have to be 64-bit-aligned. The only potential source of extra cost (even though I don’t even think it would be measurable) is on platforms that aren’t little-endian, the bits would have to be reordered. But then most of Sanakirja would probably suffer from the same problem.

This is why the current implementation uses the remaining 12 bits: adding just one more bit would cost an entire extra u64 anyway (which actually leaves you with a total of 76 bits).

pmeunier on October 21, 2021

Also, just in case, I think I can probably give a hand with this if you need it, although I’m not that familiar with the code.

If you want to do that, I can help! I don’t think it is very hard.

enzoc4 on October 21, 2021

My experience with Sanakirja is that any intuition I’ve ever had about performance on this project has been contradicted by benchmarks, and I’ve tried to benchmark every single piece I could benchmark. One thing that has been pretty consistent though, is that the CPU pipeline matters a lot in this case, and adding more tests gives it more potential to fail.

Yeah, sure, I agree with the sentiment. I have some doubts mostly because the function already returns an error, but it may very well have some overhead.

Note that u64 for the RC doesn’t cost anything extra here, since all values have to be 64-bit-aligned.

Good point actually.

In any case though, I’m happy with just increasing the ref count type to u64. I can close this discussion and open a new one for that if you prefer.

And yeah, I would like to try doing it myself, I’ll try to find the time for it one of these days.

pmeunier on October 21, 2021

No need to close the discussion, you can you push your patch to this one.

pmeunier on October 21, 2021

Feel free to ask question, if I’m slow to answer (which is likely) you can also ask on https://pijul.zulipchat.com/

enzoc4 on October 25, 2021

I’ve been playing a bit with this, I’m currently thinking that it may be simpler to just make the key of the rc tree parametric.

Then implement a trait for both u64 and struct { page: u64, count: u64 } for example.

The problem with this is that the trait bound becomes a bit more viral, as it needs to be put on the MutTxn struct instead of the impl that needs it.

OTOH, could the rc field by something other than a btree? As far as I can see, if we make the whole thing parametric then the trait would need to basically mirror the btree api (put, delete, create… even cursor?).

As an aside question, I noticed the code is not formatted with rustfmt, is that intentional? I find it a bit bothersome to have to do manual formatting.

pmeunier on October 25, 2021

This is more or less what I had in mind. We can have a default type MutTxn = MutTxnWithGenericRc<DefaultRc>, so I don’t think this is a problem in practice.

In any case, I have a very large use case of Sanakirja in Pijul, we can use it to test reasonable backwards-compatibility of the API. If Pijul still compiles, I wouldn’t be afraid of calling the API backwards-compatible.

As an aside question, I noticed the code is not formatted with rustfmt, is that intentional?

Chicken and egg problem. Sanakirja is used as the foundation of Pijul, and uses Pijul as its version-control system. This is fun, provides many test cases, but has the downside that Sanakirja was mostly written at a time where Pijul didn’t support record hooks. It does now, feel free to add the following lines to your .pijul/config:

[hooks]
record = [ "cargo fmt" ]

And submit a formatting patch.