pijul/sanakirja - Discussion #3 - I release the user-friendly wrapper for sanakirja database ( https://docs.rs/crate/sdb)

#3 I release the user-friendly wrapper for sanakirja database ( https://docs.rs/crate/sdb)

Opened by gcxfd on June 16, 2021

gcxfd on June 16, 2021

I see that Sanakirja can use Db<String, (Db<A, B>, Db<C, D>, u64)>, according to https://www.reddit.com/r/rust/comments/lp5jez/sanakirja_10_pure_rust_transactional_ondisk/

But I read the test code at https://nest.pijul.com/pijul/sanakirja:main/UAQX27N4PI4LG.BMAAA, and there is no code example for this usage.

Is there a code example?

pmeunier on June 16, 2021

This is really cool. Interested in co-writing a guest blog post on the Pijul blog?

There are code examples in the source code of Pijul, but it would be really hard to make it user-friendly, since the calls to set_root have to be replaced by different things depending on the context (in the particular example of Db<K, Db>, you have to do del + put in the outer btree whenever you edit an inner btree). This is the main reason Sanakirja exposes such a low-level API.
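To illustrate the del + put point, here is a minimal sketch using only the standard library. It is a toy model, not Sanakirja's API: `Store`, `put_nested` and the root ids are hypothetical names, and it assumes that editing an inner tree can change its root, so the outer entry has to be re-pointed after every inner edit.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a page store: each inner tree lives under a root id,
/// and every edit is copy-on-write, so it yields a new root id.
#[derive(Default)]
struct Store {
    trees: BTreeMap<u64, BTreeMap<u64, u64>>,
    next_root: u64,
}

impl Store {
    /// Create an empty inner tree and return its root id.
    fn create(&mut self) -> u64 {
        let root = self.next_root;
        self.next_root += 1;
        self.trees.insert(root, BTreeMap::new());
        root
    }

    /// Copy-on-write insert: the old root is dropped and a new one returned.
    fn put(&mut self, root: u64, k: u64, v: u64) -> u64 {
        let mut tree = self.trees.remove(&root).unwrap_or_default();
        tree.insert(k, v);
        let new_root = self.next_root;
        self.next_root += 1;
        self.trees.insert(new_root, tree);
        new_root
    }
}

/// Edit the inner tree stored under `name`, then re-point the outer entry.
fn put_nested(
    store: &mut Store,
    outer: &mut BTreeMap<String, u64>,
    name: &str,
    k: u64,
    v: u64,
) {
    let old_root = match outer.get(name) {
        Some(r) => *r,
        None => store.create(),
    };
    let new_root = store.put(old_root, k, v);
    outer.remove(name); // "del" the stale root
    outer.insert(name.to_string(), new_root); // "put" the fresh root
}

fn main() {
    let mut store = Store::default();
    let mut outer = BTreeMap::new();
    put_nested(&mut store, &mut outer, "a", 1, 10);
    put_nested(&mut store, &mut outer, "a", 2, 20);
    let root = outer["a"];
    println!("root = {}, inner = {:?}", root, store.trees[&root]);
}
```

If the outer entry were left untouched after the inner edit, it would keep pointing at a root that no longer exists in this model, which is the situation the del + put pair avoids.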

gcxfd on June 16, 2021

This is really cool. Interested in co-writing a guest blog post on the Pijul blog?

I am very interested in writing a blog post to share sanakirja and sdb, but I want to wait a little longer.

Because sdb has only just been completed, I think there may be many potential problems.

I plan to develop chat software based on a peer-to-peer network, using sdb as an index database.

I want to write a blog post about the actual use of sanakirja and sdb once the demo of the chat software is released.

There are code examples in the source code of Pijul, but it would be really hard to make it user-friendly, since the calls to set_root have to be replaced by different things depending on the context (in the particular example of Db<K, Db>, you have to do del + put in the outer btree whenever you edit an inner btree). This is the main reason Sanakirja exposes such a low-level API.

I am not familiar with the code structure of Pijul. Could you please give me a link to a specific file? I'll study it. Thank you very much.

In addition, I have one more question.

Will Sanakirja free up disk space after a key is deleted? Or is there a function similar to SQLite's VACUUM that can rebuild the database and release disk space?

pmeunier on June 16, 2021

I am not familiar with the code structure of Pijul. Could you please give me a link to a specific file? I'll study it. Thank you very much.

The commit_channel method in https://nest.pijul.com/pijul/pijul:main/SXEYMYF7P4RZM.OANAQ would be a good start, and give you pointers to other types and traits defined in the same file.

Will Sanakirja free up disk space after a key is deleted? Or is there a function similar to SQLite's VACUUM that can rebuild the database and release disk space?

Sanakirja has a memory-management system, and frees up memory-mapped pages automatically, which are actually disk sectors (when the backend is an mmapped file, which is the default feature in Sanakirja). If the freed sectors happen to be at the end of the file, this does free up disk space. Else, the freed pages are simply stored in a btree of free pages.
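The behaviour described above can be pictured with a small model. This is only a sketch under stated assumptions, not Sanakirja's allocator: `PageFile`, `alloc` and `free_page` are hypothetical names, a `BTreeSet` stands in for the btree of free pages, and the rule that the file keeps shrinking while its last page is free is an assumption made for the illustration.

```rust
use std::collections::BTreeSet;

/// Toy model of a page file: `len` pages exist, some of them are free.
struct PageFile {
    len: u64,
    free: BTreeSet<u64>,
}

impl PageFile {
    fn new() -> Self {
        PageFile { len: 0, free: BTreeSet::new() }
    }

    /// Reuse a free page if there is one, else grow the file by one page.
    fn alloc(&mut self) -> u64 {
        if let Some(p) = self.free.iter().next().copied() {
            self.free.remove(&p);
            p
        } else {
            self.len += 1;
            self.len - 1
        }
    }

    /// Record a page as free, then shrink the file while its last page is free.
    fn free_page(&mut self, p: u64) {
        self.free.insert(p);
        while self.len > 0 && self.free.remove(&(self.len - 1)) {
            self.len -= 1;
        }
    }
}

fn main() {
    let mut f = PageFile::new();
    let (a, b, c) = (f.alloc(), f.alloc(), f.alloc());
    f.free_page(b); // middle page: kept in the free set, file size unchanged
    println!("after freeing page {}: len = {}", b, f.len);
    f.free_page(c); // last page: the file can shrink
    println!("after freeing page {}: len = {}", c, f.len);
    let _ = a;
}
```

In this model, freeing a page in the middle of the file never reduces `len`; the space is only reused by later allocations, which matches the "stored in a btree of free pages" case.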

pmeunier on June 16, 2021

More questions/comments

I plan to develop chat software based on a peer-to-peer network, using sdb as an index database.

If you look at https://nest.pijul.com/pijul/pijul:main/QL6K2ZM35B3NI.7UMAA, it shows you a way to compress a database and do fast lookups in it. I have a blog post in preparation describing that design.

gcxfd on June 16, 2021

Thanks again.

Is there an easy way to use a custom serialization function?

For example:

#[derive(Default, Eq, PartialEq, PartialOrd, Ord, Hash, Clone, Copy, Debug)]
pub struct Data {
 pub hash: [u8; 2],
 pub id: u64,
}

I found that the size with direct_repr! is 16.

When I use desse (ultra fast binary serialization and deserialization for types with size known at compile time), the size is 8 + 2 = 10.

pmeunier on June 17, 2021

Is there an easy way to use a custom serialization function?

This isn’t easy if you want it to be fast. If you can tolerate lower performance, you could use bincode/desse, and serialize/deserialize the resulting Vec<u8>. Note that at the moment, large keys or large values (an entry is large when the total key + value size is more than 510 bytes) aren’t implemented. This is because it is quite hard to do, and would take a while to test.
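As a rough sketch of the slower "serialize to bytes" route, the following uses hand-written encoding from the standard library in place of bincode or desse. The names `encode`, `decode`, `fits` and `MAX_ENTRY` are hypothetical, and the size check simply mirrors the 510-byte figure quoted above; it is not a Sanakirja function.

```rust
use std::convert::TryInto;

/// Limit quoted above for the total key + value size.
const MAX_ENTRY: usize = 510;

#[derive(Debug, PartialEq)]
struct Data {
    hash: [u8; 2],
    id: u64,
}

/// Pack the fields back to back: 2 hash bytes, then 8 little-endian id bytes.
fn encode(d: &Data) -> Vec<u8> {
    let mut out = Vec::with_capacity(10);
    out.extend_from_slice(&d.hash);
    out.extend_from_slice(&d.id.to_le_bytes());
    out
}

/// Inverse of `encode`; returns None if the input is not exactly 10 bytes.
fn decode(bytes: &[u8]) -> Option<Data> {
    if bytes.len() != 10 {
        return None;
    }
    let hash: [u8; 2] = bytes[..2].try_into().ok()?;
    let id = u64::from_le_bytes(bytes[2..].try_into().ok()?);
    Some(Data { hash, id })
}

/// True if a key/value pair stays within the quoted size limit.
fn fits(key: &[u8], value: &[u8]) -> bool {
    key.len() + value.len() <= MAX_ENTRY
}

fn main() {
    let d = Data { hash: [7, 9], id: 42 };
    let bytes = encode(&d);
    assert_eq!(decode(&bytes), Some(Data { hash: [7, 9], id: 42 }));
    println!("{} bytes, fits: {}", bytes.len(), fits(&8u64.to_le_bytes(), &bytes));
}
```

The encoded form takes 10 bytes, with no padding, at the cost of a copy on every read and write.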

I found that the size with direct_repr! is 16.

When I use desse (ultra fast binary serialization and deserialization for types with size known at compile time), the size is 8 + 2 = 10.

This is because when you do btree::get (or iter), Sanakirja gives you a direct pointer to the struct, without copying anything. Moreover, Rust inserts padding in the struct, in order to align the u64 on an 8-byte boundary. I don’t think this is mandatory on x86, nor affects the performance, but other platforms are stricter.
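The padding can be checked directly with `std::mem::size_of` and `std::mem::align_of`. The sketch below assumes a typical 64-bit target where `u64` has 8-byte alignment; the `Packed` struct is a hypothetical variant added only for comparison and is not something sdb or Sanakirja uses.

```rust
use std::mem::{align_of, size_of};

/// Same fields as the struct in the question, with the default layout.
#[allow(dead_code)]
struct Data {
    hash: [u8; 2],
    id: u64,
}

/// Same fields with padding removed; `id` is then no longer 8-byte aligned.
#[allow(dead_code)]
#[repr(C, packed)]
struct Packed {
    hash: [u8; 2],
    id: u64,
}

fn main() {
    // 10 bytes of fields are rounded up to a multiple of the alignment.
    println!("Data:   size {}, align {}", size_of::<Data>(), align_of::<Data>());
    println!("Packed: size {}, align {}", size_of::<Packed>(), align_of::<Packed>());
}
```

Handing out direct references into a page requires the stored bytes to have the in-memory layout, padding included, which is why the stored size follows `size_of::<Data>()` rather than the sum of the field sizes.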

gcxfd on June 19, 2021

I found a way to encode and decode automatically.

I have not finished rewriting all the code for this, but I have verified a minimal unit, shown below (it runs):

https://github.com/rmw-link/sdb/blob/9be99cce305f40413d92bcbe9a3c5b55ab643580/src/dbpage.rs#L89

pub trait EncodeDecode<T: ?Sized> {
  // Encode `self` into its stored form `T` and pass a reference to `next`.
  fn encode<R>(&self, next: &mut dyn FnMut(&T) -> R) -> R;
  /*
  fn decode(val: &T) -> Self;
  */
}

https://github.com/rmw-link/sdb/blob/9be99cce305f40413d92bcbe9a3c5b55ab643580/src/lib.rs#L242

impl<
    'a,
    'b,
    K: 'a + Storable + PartialEq + ?Sized,
    V: 'a + Storable + PartialEq + ?Sized,
    P: BTreeMutPage + BTreePage,
    RK: ?Sized + EncodeDecode<K>,
    RV: ?Sized + EncodeDecode<V>,
  > TxDb<'b, K, V, MutTxnEnv<'b>, P, RK, RV>
{
  pub fn put(&mut self, k: &RK, v: &RV) -> std::result::Result {
    k.encode(&mut |k| v.encode(&mut |v| set_root!(btree::put(tx, &mut self.db, k, v), self, tx)))
  }

https://github.com/rmw-link/sdb/blob/9be99cce305f40413d92bcbe9a3c5b55ab643580/tests/db.rs#L69


#[derive(DesseSized, Desse)]
pub struct Data2 {
  pub hash: [u8; 3],
  pub id: u64,
}

#[derive(Default, Eq, PartialEq, PartialOrd, Ord, Hash, Clone, Copy, Debug, DesseSized, Desse)]
pub struct Data2Desse([u8; Data2::SIZE]);

use sdb::direct_repr;
direct_repr!(Data2Desse);

#[dynamic]
pub static DB5: DbEv<'static, u64, Data2Desse, Data2> = TX.db(5);

impl EncodeDecode<Data2Desse> for Data2 {
  #[inline]
  fn encode<R>(&self, next: &mut dyn FnMut(&Data2Desse) -> R) -> R {
    // Serialize into the fixed-size wrapper and hand it to the callback.
    next(&Data2Desse(self.serialize()))
  }
}

Usage example:

https://github.com/rmw-link/sdb/blob/9be99cce305f40413d92bcbe9a3c5b55ab643580/tests/main.rs#L101

    let mut db5 = tx.db(&DB5);
    let data = Data2 {
      id: 1234,
      hash: [3, 2, 1],
    };
    db5.put(&1, &data)?;
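For readers who want to try the callback pattern without sdb or Sanakirja, here is a self-contained sketch using only the standard library. It is an approximation under stated assumptions: `Table` is a hypothetical stand-in for the typed database wrapper (backed by a `BTreeMap`), `Data2Bytes` replaces the desse-derived wrapper with hand-written little-endian packing, and the trait signature is a guess at the generic parameters of `EncodeDecode`.

```rust
use std::collections::BTreeMap;

/// Convert `Self` into a stored form `T` and hand a reference to `next`.
trait EncodeDecode<T> {
    fn encode<R>(&self, next: &mut dyn FnMut(&T) -> R) -> R;
}

struct Data2 {
    hash: [u8; 3],
    id: u64,
}

/// Fixed-size stored form: 3 hash bytes followed by 8 little-endian id bytes.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Data2Bytes([u8; 11]);

impl EncodeDecode<Data2Bytes> for Data2 {
    fn encode<R>(&self, next: &mut dyn FnMut(&Data2Bytes) -> R) -> R {
        let mut buf = [0u8; 11];
        buf[..3].copy_from_slice(&self.hash);
        buf[3..].copy_from_slice(&self.id.to_le_bytes());
        next(&Data2Bytes(buf))
    }
}

/// Plain values encode as themselves, with no copy.
impl EncodeDecode<u64> for u64 {
    fn encode<R>(&self, next: &mut dyn FnMut(&u64) -> R) -> R {
        next(self)
    }
}

/// Hypothetical stand-in for the typed database wrapper.
struct Table<K, V> {
    map: BTreeMap<K, V>,
}

impl<K: Ord + Copy, V: Copy> Table<K, V> {
    /// Encode key and value, then insert the stored forms.
    fn put<RK: EncodeDecode<K>, RV: EncodeDecode<V>>(&mut self, k: &RK, v: &RV) -> Option<V> {
        let map = &mut self.map;
        k.encode(&mut |k: &K| v.encode(&mut |v: &V| map.insert(*k, *v)))
    }
}

fn main() {
    let mut table: Table<u64, Data2Bytes> = Table { map: BTreeMap::new() };
    let data = Data2 { id: 1234, hash: [3, 2, 1] };
    table.put(&1u64, &data);
    println!("{:?}", table.map[&1]);
}
```

Passing the encoded value to a callback, rather than returning it, lets the temporary buffer live on the stack of `encode` for exactly as long as the insertion needs it.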