all right thanks Dean thanks everyone
this is my first software a conference
ever simple little impressed or I'm
going to talk about the project I've
been working on for about a year now
it's called people from the name of the
Spanish bird it's we're trying to do
same version control and by saying like
I'm going to like this talk mostly about
what sane means and what vertical should
be like so it all started because I was
trying to surf long Baker my coop in
this project myself we're trying to
convince our co-authors and an academic
paper to use version control instead of
just emails and like like reserving
files for a week and analytic file right
so so but well flow happens to be one of
the core darks developers which is a
like rather old ish version control
system and so we couldn't really
convince him to use anything else so we
try to convince our co-authors to start
installing darks and windows and then
like setting up SSH keys and pushing to
pushing patches and pulling stuff and
that didn't really work out so well also
turns out it's darks has performance
problems and get as simplicity problems
so well I wasn't really easy to do
anything about this and that's pretty
much the situation with vertical roll
today like most people most non hackers
don't even use it so it's write widely
regardless of hackers thing most people
like use extremely basic version
cultural even like top like world
experts in in computer science still use
like fight locking in emails and that
makes us lose a significant amount of
data and or time and even the situation
with programmers and sorry say isn't
much better because like as soon as
distributed version controls for
invented their like if they were so
complicated and hard to use that like
business is even started to resent
relies them and use the use the use them
in a centralized way in order to master
the Beast and
Pramod this functional programming as
finally convincing people that they can
really tackle everything like all feel
the computer science that were that were
previously regarded as like elite field
just like Ashley showed us for operating
systems we're trying to replace C++ with
rust recently also JavaScript it's like
it's trying trying to get replaced by
Elm package managers and Linux
distributions are getting replaced by
like nicks it's not widely accepted so
far but and so these talks about letting
adenine other functional tool to that
collection and it's called
we start out to replace gets probably
this goal is as ambitious as for us to
replace it C++ or maybe more but but
anyway so before starting this talk I
just want you all to like get give some
basic concepts about what I well I think
functional programming is so most people
have their own definition feel free to
agree or not with this so I think
functional so by what I like in
functional languages like rust is that
there are static types so we don't like
we can reason about code easily there's
fine control of immutability it's like
we're not mutating some some states all
the time we're just thinking about
transforms and stuff in functions most
operations are atomic which is what most
people use when they use like any
software at all and it's something
that's not really happening in most most
software Mo's tools and we also like
memory safe seaweed we don't want like
you just corrupt our back hand or
correct our storage our files or
something we don't want to lose data
just because we're using it like using
types in the wrong way so how does that
apply to has there any beneficial to
version control well because the way we
use version call today has mostly about
States we're talking about comets
if we use gifts we're talking about
comments and coming so state fool
they're just basically a hash like a
patch to like make it easier to expect
but it's just in optimizations just to
make it easier to store and and their
hash of the whole state the repository
that's where the comment is and will you
commit something you just advance the
head of the repository bye-bye one small
notch and you has to all state of the
repository that's the name of your new
new comment and that's quite different
from patches so patches are all about
transforms they are all about like
bringing like any stays like applying
when you apply a patch you can basically
apply a patch to a file and it's like
that's what most people think um it's
our with the first first discover guess
but but then they realize it's not
that's not the case and so this can have
major benefits like for instance in like
any system that uses three-way merge
such as like git mercurial lesbian CVS
and like most others they don't have
that really cool property that we call
the sensitivity in algebra and that
property is the following so if you have
two of two people at least in Bob and
you're not really good communication
right you're a couple
well at least bride's like the red patch
here and Bob in thrall that writes that
like first a blue patch and then a great
green patch well depending on the order
in which you merge things depending on
what like your merge strategy you might
get different results in three-way merge
like if the patch are exactly the same I
like first I try to merge the blue patch
first and then the green patch
I might guess in get sometimes I might
get different results then if I do the
other you have a theme like on the image
on the top if I merge the two branches I
may get one result and if I merge the
two like the exact same patches and they
are not conflicting they are not like
they are not weird the situation like
you're dealing they're really different
parts of the file sometimes
Bob's patches might get merged in parts
of the files is never seen and when you
consider applying tools like that to
like highly sensitive code such that
it's like cryptography libraries it's
kind of scary but it's like what
everyone does and so
so what's even worse is that you can
actually you can actually tell when this
hits you like if you're a gates user and
this happened to you there's no way it
doesn't wait to tell get says yeah I've
merged it's fine it works and you don't
you don't notice it maybe it maybe it
even complies like there are real-world
examples of this this thing it's not
just an implementation bug that can be
worked around it's like a fundamental
problem in the three-way merge algorithm
all right so we're we're not the first
ones to believe this is wrong there was
another version called system I talked
about in my intro slide which was called
darks it's still still called darks by
the way so they so they are their main
their main principle is that you say two
patches are okayed of like fighting with
each other down conflicts if they
commute so with that that means it's
simply that you can apply them in the
order well then you're like oh they
almost commute it's not exactly like if
for you would like do W RL into algebra
it's not really commuting because you
sometimes you may have to change the
like line numbers like for instance if
Bob at the line called which containing
just F and main and slang one and at
least adds a line that like line 10
saying println hello world when you
merge them well you might have to merge
at least Elise's line after Bob's line
at line 11 instead of 10 but that's what
that's kind of commuting anyway so
that's our that's the situation with
darks when it commutes it's fine
what if it doesn't commute which happens
sometimes well it's an under commanded
part the algorithm so no one really
knows what it does flow home my
co-author on people is one of the core
darks developers and he told me like I
was pretty confident in darks before
they told me at some point yeah
we have no absolutely no clue what it
does it seems to work most of the time
it's like it's highlights conflicts but
we don't really know other than that so
that praises obviously issues about
correctness so the situation with darks
not much much better than we were kids
well at least highlights complex and
warns the user about complex
but then what it does no one knows
another problem is slightly bigger
that's the main reason why darks was in
but I've been done even though it came
before before get is that it sometimes
exponentially slow so exponentially slow
in the size of history number of patches
in their position and so that some
people to device the like merge over
that we can work through where you like
try to synchronize all the patches you
made during the week and instead of
synchronizing like while you're working
on it you're just like waiting for
Friday night to arrive start to merge
like booster patches and well hopefully
by Monday morning when you come back to
office
well the patches are merged but
sometimes not so this is this is no this
makes it kind of like not really
acceptable and this is also one of the
reasons why we could not convince our
colleagues to out to use it because they
tried to use it
naively thought yeah well patches well
we cannot understand what they are so
let's try to push about bunch of patches
repository and well they were surprised
to have to wait for like half an hour
merge so it's not usually like that like
when we write we're developing people
were using darks to develop it and it's
like we've never had we've never ever
runs you that's brought that problem but
that's also because we're like we know
how it works we know what the drawbacks
are we know when the exponentially is
slow merges happen so but that's not
acceptable like that it's it's not like
I wouldn't recommend it to anyone and
that's where people comes in so we try
to after our day trying to convince our
colleagues to use this we just like grab
grab a beer and started discussing about
like the here if patchy is what which
could be like what what good way would
be like you have a cool patch algebra we
could use and that would be simple and
easy to use and that's where we're
starting to learning about category
theory so we don't exactly started
learning learning it back then because
it's a kind of it's now it's not the
easiest theory on earth on earth but
well so whether it is what it is
basically it's a general theory of
transformations
of things so clearly theorists like to
see it as like the general theory of
everything the there are numerous
attempts to rewrite all mathematics in a
categorical language and I think it's
pretty successful except that no one can
understand this so it's already well
written we have all mathematics in there
but no one knows what it does all right
so but it's pretty good for us though
because it's a so we're trying to talk
about changes and files and this is a
theory of changes of on things so one
particularly cool concept so this you
can you see all like I I should like
these things in darks these are
commutative diagrams and category
theories try like drawing them basically
all day long if several colleagues
working mates they have their whiteboard
through that these diagrams sometimes in
3d sometimes it like we are interleaved
arrows and no one knows but what they
are doing because they're yeah they're
talking about transformations on things
all right so in particular when very
cool concepts of category theory is the
concept of push out so push have is a
depreciated of two patches is like for
in our particular case the push out of
two patches would be a file such that no
matter what you do after these two
patches that yield the common States
like a common file so if at least in Bob
write some stuff and then later they can
find some patches to agree in a common
file
well the via star here is a we call it
the push out if you can reach the car
you can reach any common state state in
the future from that star that clear
alright so anything you can do to reach
common stage you can also reach it from
the Prashad so that means the push out
is kind of a minimal common state that
can be like that you that you really
need to reach leg and it's really watch
what we want in a version control system
we really want push out we want minimum
common states that any further
developments any further work in the
repository can also be reached from that
common States that's what a merge
really is and it's it's it's it's so the
run problem is that it's not the case at
all categories have push outs have all -
shouts like we see that like the
translation of that in English is it's
not clear than any editing any - of any
couple of editing operations on a file
will always be merge about like
sometimes we have conflicts we all know
like most programmers know bits are in
conflicts so it's not the case that
tries have piles and patches have old -
shouts but what's cool but category
theory is that there's a solution you
just need to dig into them like pretty
big books of like abstract diagrams that
I don't fit in but then you can
ultimately find something called the
free conservative code completion of a
category and that's a way to
artificially add all push outs into the
category so that means when you when you
have a category of files and passes
between files the free conservative code
completion is a construction that will
automatically give you a category like a
generalized generalization of files so
that's like that generalization will
have all push outs so in people if we
translate it into like files and patches
for instance if Bob adds the line
country like saying just print a line
Bob and at least at the lines in just
print a line at least when they merge
that in people what they get is just a
graph of lines where the two lines are
added they're not comparable if not yet
said what like how they should compare
and how they they should relate to each
other in the file you may be at least is
write anybody's right maybe you're both
right maybe they're both right but in a
different order
you don't mean oh that's a conflict and
so what that theory or the category
theory gives you here as a
generalization of files its which is
well in most cases a little more complex
that gets more the the benefits for a
three lines file or maybe not obvious
but when files get really large it's
it's really great to have that sound
theory that backs you up and right so
that's that's what people is about
that's that's how it works so I want to
say before moving on that this is quite
different from see our LEDs sociology's
are conflict-free replicated data types
and people is more like conflict
tolerant replicated data types so we're
not resolving all the conflicts all the
time we're just rather than that we're
just like accepting conflicts as part of
the system like the data structure can
handle like it can it can represent
conflicts doesn't have to resolve them
all the time and so that's what makes it
fast that's what makes people really
fast because you can apply many patches
they are conflicting but that's alright
and then once you are done applying a
bunch of patches you're you can just
like detect conflicts and I'll put the
files and that's it so what what would
the situation be with theologies if we
were trying to apply your leans to this
problem of like merging patches well in
theory oddities you need to always order
so the way it gets real complex is by
ordering all operation determine
deterministically so it finds whatever
solution like whatever deterministic
merge algorithm they can find to like
just order order things all their
operations in an arbitrary but
deterministic order like for instance we
can have a rule same lease Elisa's
patches always can always come first and
when there are two conflicting patches
from at least I'll just take like take
them in alphabetical order for instance
and so you know in our case that would
like lead to the following file like two
lines one saying print a line at least
one same printer and Bob and the user
would barely see anything they would be
like yeah that's that's it the merge has
succeeded but that doesn't that isn't
really right because that like it the
user should that be we at least be
warned that there's a conflict and they
should do something about it all right
so the end result of that it's well the
fury is a slightly more complicated in
what I've explained but not much more
the result of that is that we developed
a sound theory of fashions
sound the giraffe patches it has the
following
very cool properties so bear with me for
a moment what I'm right like saying
these like bad words so they're the the
fury is like the or algebra is
commutative
it means you can like if pet cheese
don't depend on each other you can apply
them in any order so that's that's what
you would expect from that cheese right
it's associative so we resolved the
initial the initial problem and started
this talk with so you can you can
basically like no matter how you merge
patches if the merge is right like
there's only one solution to merge and
you're not you're like you're not
getting into trouble like by you're
never like do you're never doing what
get does which is like merging things
from Bob in parts of the fight is never
seen that's what associativity is about
and also one very cool property is that
all past is about semantic inverse which
means when you're when you've read in a
patch you can you can derive in other
patch permits that has the like the
opposite effects and then like push it
to other predators you cancel previous
patches the thing is you can since you
can it's it's all commits all
commutative so you can basically compute
an inverse for pets you boost like 1,000
patches ago and it all just works and it
it does even better than just working
it's also pretty fast so merging and
applying patches it's basically the same
operation in our system
well it's possibly the last complicated
slide I've written like there I'll move
on to like easier stuff after that so
our complexity for people we know
complexity here our complexity is like
linear inside of the patch and the guy
with make in the size of history so it's
like it doesn't like you can have an
arbitrarily big history it's not it
doesn't really matter like you can you
can apply the new patch your new patch P
after an arbitrary alerts arbitrarily
large number of patches it doesn't
matter it always works the same like
almost and this is this is actually that
was surprising to us but it's actually
better than three-way merge which also
adds in a like square factor of the size
of the file
that's actually really also observed
that in real world cases where we're
trying to benchmark like in early stages
of our development we were trying to
burnish mark it against like really fast
competitors such as yet and as file size
increase we not we actually noticed the
difference in performance to the point
that people who was actually faster and
get when merging really large patches on
really large price which I'm not really
sure is a real-world case but anyway and
so that brings me to the last part of
this talk so why what made Russ a cool
tool to work with what did we like and
rust and how it helped us build this
this cool new system so one thing is
we're working on our algorithms
mathematical objects and that means we
need types we need to be able to reason
about our code and that's very important
to us like we couldn't really like we
want to develop a sound theory of
patches and we couldn't like the
implement it and then rely on like our
intuition to build correct CC Curcio C++
code that would be like yeah maybe the
theory is correct then what about the
implementation so because we have types
and rust that makes it easy to do so but
we also want to be fast like as I said
like the complexity theory stuff tells
us that we have the we have the
potential to become faster than gates
and we really want to exploit that
potential and we really want to to have
like to be as fast as we can and rust
that I was also asked to do that because
we can add like we can use a like fast
back hands we can add like roll pointers
to like the parameter memory like I'm
Maps and what now
that's really cool and that wasn't the
case in the early prototypes that were
really not in other languages but also
more maybe more importantly so we
started doing that because we couldn't
convince Windows users to install dark
cinder machines and and that's and and
our goal was to be as inclusive as
possible was to like green version
control to everyone but we cannot really
do that if
can only tell you a small portion of
computer users which are like expert
Linux users and so what we really loved
in rust is that we can write clients and
servers for real word protocols so I had
to write some of that myself but it was
actually quite pleasant right but also
yeah there was a there's a Lord unit an
HTTP stack working out pretty well and I
wrote the message library alright and so
we finally got Windows support and so
I'd like to thank the rust developers
for that that's that's really awesome
thanks ok so as part of like what needed
to be Bradon because rust pretty young
language there aren't that many
libraries so there are I just wanted to
conclude with two really hard things
like two really two things that I really
did didn't put didn't think I would have
to write when I when I started its
projects so when as a as a project that
I call sonically I'll just finish word
for dictionary there was infinite time
well security has a transactional on
this b3 with like many others like LM DB
if you know an MV like many other
database backends but it's particularly
is that it has a fork operation that
trends in logarithmic time like a fast
for corporation
so for cooperation means you can clone
the database in like log log in time and
then have two copies of the database
behave as two different databases so I
needed that to implement branches and
still work in progress because it's
really hard to do I'm coming back to it
in a minute
and then there's a another bacterial
agent just like crossed fingers like I
crossed it's called fresh it's nested
current and server library they wrote
it's it's been made like it's really
entirely in rust and it's been it's
gotten rid of all unsafe blocks two
weeks ago thanks to our brian smith
who's not
okay so the trickiest parts the
trickiest thing that I had to do in
where this is sanic area so why was it
tricky well because rust always wants to
free everything before the program
closes and when you're writing a
database back-end it doesn't really
sound right you won't like something to
remain on disk after the program closes
so that was really hard so you have to
do like manual memory management you're
like--you're yourself and that's not
really when you're used to a functional
programming high scale or cameras like
coming back to manual memory management
isn't free Pleasants so how does it work
so I'm going to explain anyway how rust
helped us do that so how does it work
it's just not going to be very technical
here but like most database and Giants
are like sorry some cool database is
enjoying that I liked are based on be
trees so B trees are basically like
trees made of blocks in each block yours
there are like ordered elements there's
no word list of elements there's a
constant number of elements and between
these elements you have pointers to
other other blocks to children blocks
and it's so insertion happens it
believes and then while the blocks plate
when they're gonna get too big but
that's well that's how B trees work
there's a good B tree library in the
standard brush library so now that's
cool but we can actually use it to like
it doesn't allocate inside the files so
yeah it's allocates the program's memory
so we have to write a like a different
like a new library for B trees that
would work in files and so the main
thing I've wish I knew when I started
writing that as about iterators so wait
I'm coming back to it so when when
sometimes in B trees you have to merge
blocks when they're when they get to
like well when they get to like under
fool we have to merge them and something
I wasn't doing in the beginning because
like since since I was doing manual
memory management I figured I could like
like the root
habits came back and I was like okay
let's do manual like manual stuff roll
pointers and manual things and that's
not really the right solution Rus
because you can use like cool right
things anyway even if you have to roll
pointers so main thing I've learned
while doing this is iterators and how to
use them so merging pages like merging
blocks like this can be done in the
following way
so if right you're basically right in
each other like two iterators one for
each page there will like give you yield
like all elements all successive
elements in the two pages and you can
then chain them and add other elements
in the middle that's exactly what you
want to do with your merchant pages and
well deletions and B trees are usually
pretty tricky and that allows you to our
get tested really easily and so the main
reason why I've why I really like this
is that when you're prototyping usually
your your head is kind of in a messy
state and you really don't know what
you're going to so most of the lines
you're writing will be deleted in the
end and you really don't know how to
proceed what to do and so this kind of
hurry concise and and short sorts of
statements allow you to uh to get that
phase and get like production ready code
much much faster because the prototype
start working much much faster another
thing is that I've learned is that well
then I was like oh yeah
so we can have saved I've tried like we
can have cool attractions anyway so then
there has Carol reflexes came back and I
was like oh yeah let's do recursion
let's put in record recursion everywhere
and one thing I've learned is that it's
not always so in Russ you can't you can
write very concise code but the
recursive way is not as surely the most
concise you can do so in sanik area for
instance sometimes we need so it's on
these cried so there's a every time you
load something from this from this kit
like cost you a lot so you want to avoid
like do you avoid doing that at all cost
like and any time you can avoid loading
your page it's a map right so anytime
you can avoid loading your page to the
program's memory you won't really want
to avoid it
and so in order to do that the solutions
that will do things lazily so we're
instead of deleting something or adding
like I think new element to the sabitri
what we're doing is just like we're
doing it lazily that means we're we're
just saying well next time you want to
do something you have to copy too you
have to copy the page and update
everything but just at the time of doing
it not right now because we we don't
really know what we're going through
this page maybe we will merge it maybe
we'll drop it and it would be a huge
waste of time to copy it and then edit
it or or copy it and then merge it that
would be like we would have to copy the
page twice instead of just once so in
order to do that we need to have like
write actually something that looks like
a recursive function well we just have
to the that function which we have
access to several consecutive elements
in the programs call stack and that's
not really easy when you're writing a
like writing it's really recursively so
the way it's done actually at the moment
it's there's a fake call stack it's just
basically an array and the fact that B
trees are well balanced means that there
are not actually they're not actually
going to eat up the world memory of the
computer so you know that the the the
death is not going to be larger than 64
so you can allocate an array on the
stack and and write like how does fall
off a stack pointer so you're basically
simulating the program stack and so that
these are the two main things I wish I
knew when I started writing something
yeah so just why do thank you
especially you're a spell trust
organizers this conference is awesome
we're really enjoying us and if you have
questions I'm really happy you're
answering Thanks
[Applause]