Yeah, so my name is Martin von Zweigbergk. I expected my speaker
notes to be here somewhere, but yeah.
I work on source control at Google, and I'm going to talk about a
project I've been working on for almost three years. It's a
Git-compatible VCS called Jujutsu. In case you're wondering, the name
has nothing to do with Jujutsu Kaisen, the anime. It's called Jujutsu
just because the binary is called jj, and the binary is called jj
because it's easy to type.
Oh, okay, now there it is.
So here's an overview of the presentation. First I'm going to give
you some background about me and about the history of source control
at Google. Then I'm going to go through the workflows and
architecture of jj, and at the end I'll explain what's next for the
open source project and for the integration at Google.
So, background about me. After graduating I worked for Ericsson for
about seven years. While there, I think it's fair to say I drove the
migration to Git from ClearCase, and I cleaned up some Git rebase
scripts in my spare time. Then I joined Google and worked on a
compensation app. For the last eight years I've worked on Fig, which
is a project to integrate Mercurial as a client for our in-house
monorepo.
For context, let me tell you a bit about the history of version
control at Google. A long time ago we supposedly started with CVS.
Then we switched to Perforce, and after a while Perforce wasn't able
to handle the repository anymore because it got too large, so we
wrote our own VCS called Piper. But the working copy was still too
big, so we created a virtual file system called CitC on top of Piper,
and that's what almost every user at Google uses now. People were
still missing the DVCS workflows that they were used to from outside
Google, so we added Mercurial on top of that, and as I said, that's
what I've been working on for the last eight years. Also, in case you
didn't know, our monorepo at Google is extremely large and has
basically all the source code at Google. You can watch the talk by
Rachel Potvin from @Scale if you're curious.
Generally people really like Fig, but there are still some major
problems we're having. Probably the biggest one is performance.
That's partly because of Python, which is slow, and partly because of
eager data structures that don't scale to the size of the repo.
Another problem is consistency. We're seeing write races because
Mercurial is not designed for distributed storage, so we get
corruption when we store it on top of our distributed file system.
Another pain has been integration, because we're calling the
Mercurial CLI and parsing the output, which is not fun.
So a few years ago we started talking about what we would want from a
next-gen VCS, and one of the ideas that came up was to automatically
make a commit from every save in your editor. I got really excited by
that idea and started jj to experiment with it. Then I worked on it
as my 20% project for about two years, and this spring we decided to
invest more, so now it's my 100% project.
Next.
You may be wondering why I didn't decide to just add these features
to Git. As I said, I wanted to experiment with a different UX, so I
think that would end up being a completely separate set of commands
inside of Git. That would be really ugly, and it wouldn't and
shouldn't get accepted upstream. I also wanted to be able to
integrate into Google's ecosystem, and we had already decided with
Fig against using Git there because of the problems with integrating
it with our ecosystem at Google. One of the problems is that there
are multiple implementations of Git that read from the file system,
so we would have to add any new features in at least two or three
places.
Okay, so let's take a look at the workflows. The first feature is
anonymous branches, which is something I copied from Mercurial.
Instead of Git's scary detached HEAD state, jj keeps track of all
your commits without you having to name them. That means they do show
up in log output and they will not be garbage collected. It may seem
like you would get a very cluttered log output very quickly, but
whenever you rewrite a commit, if you amend it or rebase it for
example, the old version gets hidden. You can also manually hide
commits with jj abandon.
One of the first things you'll notice if you start using jj is that
the working copy is an actual commit that gets automatically
committed. It shows up in the log output with an @ symbol. Whenever
you make any changes in the working copy and run any jj command after
that, they get automatically amended into that commit.
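To make the automatic amending concrete, here is a minimal Python
sketch of the idea. It is a toy model, not jj's implementation: the
names ToyRepo, disk, snapshot, run and status are made up for
illustration, and real snapshotting works on files and trees rather
than an in-memory dict.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    files: dict            # path -> contents
    description: str = ""


@dataclass
class ToyRepo:
    # Files as they currently exist "on disk" (hypothetical stand-in).
    disk: dict = field(default_factory=dict)
    # The commit that represents the working copy (shown as @ in the log).
    working_copy_commit: Commit = field(default_factory=lambda: Commit(files={}))

    def snapshot(self) -> None:
        """Amend the working-copy commit so it matches the files on disk."""
        if self.working_copy_commit.files != self.disk:
            self.working_copy_commit = Commit(
                files=dict(self.disk),
                description=self.working_copy_commit.description,
            )

    def run(self, command):
        """Every command snapshots the working copy first, then does its work."""
        self.snapshot()
        return command(self)


def status(repo: ToyRepo) -> list:
    """A toy command: list the paths in the working-copy commit."""
    return sorted(repo.working_copy_commit.files)


repo = ToyRepo()
repo.disk["README"] = "hello\n"      # edit a file; nothing is recorded yet
files_before = dict(repo.working_copy_commit.files)
listing = repo.run(status)           # running any command records the edit
```

In this model there is no separate "commit" step: the edit becomes
part of the working-copy commit as a side effect of the next command.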
You can use the jj checkout command, and that actually creates a new
commit on top of the commit you specify to store your working copy
changes. If you instead want to resume editing an existing commit,
you can use jj edit.
This has some very interesting consequences. One important one is
that the working copy will never be dirty, so if you check out a
different commit or rebase or something, it will never tell you that
you have unstaged changes. You get automatic backups, because every
command you run creates another automatic backup for you. It makes
stash unnecessary, because your working copy commit is effectively a
stash. It also makes commit unnecessary, because, well, it's already
committed: whenever you run a command, your working copy is
committed. If you want to set the description, the commit message,
you can run jj describe to do that at any time. We get a more
consistent CLI, because the @ commit, the working copy commit,
behaves just like any other commit, so there are no special flags to
act on the working copy. For example, jj restore can restore files
between any two commits. It defaults to restoring from the parent of
the working copy and into the working copy, just like git restore
does, I think, but you can pass any two commits. Also, uncommitted
changes stay in place, so they don't move around with you like they
do with git checkout.
Another one of jj's distinguishing features is first-class conflicts.
If you look at the screenshot there, we merge the working copy, @,
with some other commit from another branch, and that succeeds even
though there are conflicts. We can see in the log output afterwards
that there are conflicts in the working copy. As you see, the working
copy commit there, with the @, is a merge commit with conflicts.
These conflicts are recorded in the commit in a structured way, so
they're not just conflict markers stored in a file.
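One way to picture "recorded in a structured way" is a tree whose
entries can be either file contents or a conflict value. The Python
sketch below is an illustration under that assumption, not jj's
actual data model: Conflict, merge_file, merge_trees and
conflicted_paths are hypothetical names, and whole-file contents
stand in for real line-level merging.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Conflict:
    """A conflict kept as data: the common base and the competing sides."""
    base: Optional[str]
    sides: tuple


FileValue = Union[str, Conflict]


def merge_file(base, left, right):
    """Three-way merge of one file's contents; never raises, may return a Conflict."""
    if left == right:
        return left
    if left == base:
        return right
    if right == base:
        return left
    return Conflict(base=base, sides=(left, right))


def merge_trees(base: dict, left: dict, right: dict) -> dict:
    """Merge path -> contents maps; conflicted paths hold a Conflict value."""
    merged = {}
    for path in sorted(set(base) | set(left) | set(right)):
        value = merge_file(base.get(path), left.get(path), right.get(path))
        if value is not None:        # None means the file was deleted
            merged[path] = value
    return merged


def conflicted_paths(tree: dict) -> list:
    """List the paths whose value is still an unresolved Conflict."""
    return [path for path, value in tree.items() if isinstance(value, Conflict)]


base = {"a.txt": "1\n", "b.txt": "x\n"}
left = {"a.txt": "2\n", "b.txt": "x\n"}
right = {"a.txt": "3\n", "b.txt": "y\n"}

merged = merge_trees(base, left, right)      # succeeds even though a.txt conflicts
paths_with_conflicts = conflicted_paths(merged)

# Resolving later is just replacing the Conflict value with real contents.
resolved = {**merged, "a.txt": "2 and 3\n"}
```

Because the merge returns a value instead of stopping, the result can
be stored in a commit like any other tree and resolved whenever it is
convenient.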
This design leads to even more magic. Maybe an obvious one is that
you can delay resolving conflicts until you're ready, until you feel
like it. In this case, when we had this conflict, we could just check
out another commit and deal with the conflicts later if we want to.
You can also collaborate on conflict resolutions: you can resolve
some of the conflicts in these files and leave the rest for your
co-worker, for example. Maybe less obvious is that a rebase never
fails. We saw here that merge doesn't fail, and the same thing is
true for rebase and all the other similar commands, which makes
continue and abort for rebase and cherry-pick and all those similar
commands unnecessary.
We also get a more consistent conflict resolution flow. You don't
need to remember which command created your conflicts. You just check
out the commit with the conflict, resolve the conflicts in the
working copy, and then squash that conflict resolution into the
parent that has the conflict. Or, as in the screenshot here, we were
already editing that commit, so we just resolve the conflict in the
working copy and it's gone from the log output. So now, in the second
screenshot, the conflict has been resolved in the working copy.
Sorry, in the merge commit there, which is the working copy. That
working copy resolves the conflict, right? We're going to come back
to that in the next slide.
A very important feature, one that took me a year to realize was
possible, is that because rebase always succeeds, we can always
rebase all descendants. So if you check out a commit in the middle of
a stack and amend in some changes, jj will always rebase all the
descendants on top, so you will not have to do that yourself. It
moves the branches over, and so on.
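The automatic rebasing of descendants can be sketched in a few lines
of Python. This is a toy model under simplifying assumptions, not
jj's algorithm: commits carry an opaque "change" string instead of a
real diff, each commit has a single parent, and ToyRepo, rewrite and
history are hypothetical names.

```python
import itertools
from dataclasses import dataclass
from typing import Optional

_ids = itertools.count(1)


@dataclass(frozen=True)
class Commit:
    id: int
    parent: Optional[int]
    change: str              # stand-in for the commit's diff


class ToyRepo:
    def __init__(self):
        self.commits = {}    # id -> Commit (old versions are kept, just hidden)
        self.visible = set() # ids that would show up in the log
        self.branches = {}   # branch name -> commit id

    def add(self, parent, change):
        commit = Commit(next(_ids), parent, change)
        self.commits[commit.id] = commit
        self.visible.add(commit.id)
        return commit.id

    def children(self, cid):
        return [c.id for c in self.commits.values()
                if c.parent == cid and c.id in self.visible]

    def rewrite(self, old_id, new_change):
        """Amend a commit; all descendants are rebased automatically."""
        new_id = self.add(self.commits[old_id].parent, new_change)
        self._replace(old_id, new_id)
        return new_id

    def _replace(self, old_id, new_id):
        kids = self.children(old_id)
        self.visible.discard(old_id)                 # hide the old version
        for name, target in list(self.branches.items()):
            if target == old_id:
                self.branches[name] = new_id         # move branches over
        for kid in kids:                             # rebase each child in turn
            rebased = self.add(new_id, self.commits[kid].change)
            self._replace(kid, rebased)

    def history(self, cid):
        """Changes from the root commit up to cid."""
        changes = []
        while cid is not None:
            changes.append(self.commits[cid].change)
            cid = self.commits[cid].parent
        return changes[::-1]


repo = ToyRepo()
a = repo.add(None, "A")
b = repo.add(a, "B")
c = repo.add(b, "C")
repo.branches["feature"] = c

new_a = repo.rewrite(a, "A (amended)")   # edit the bottom of the stack
```

In this sketch the rebase step cannot fail because it only re-parents
changes; in the talk's design the equivalent guarantee comes from
conflicts being storable in commits.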
We can actually even rebase conflicts and conflict resolutions. I
don't have time to explain how that works, but it does. The top
screenshot here is a continuation from the previous slide. We rebased
one side of the merge, "conflicting change 2" there and its
descendants, onto "conflicting change 1", and of course we get the
same conflict as we had in the merge commit before. That conflict
stays in the non-conflicting change, but in the working copy, because
we had the conflict resolution in the working copy, which was a merge
commit before, that resolution gets rebased over and it stays in the
working copy. So we don't have a conflict in the working copy here.
Then, in the second screenshot, we ran jj move --to the first commit
with the conflict. That command is a bit like the autosquash that
someone talked about earlier today, but much more generic: you can
move a change from any commit into any other commit. In this case
we're moving the changes from the working copy. That's the default,
since there's no --from, so it moves them from the working copy and
into the first commit with the conflict. Then the conflict is gone
from the entire stack, and the working copy in this case becomes
empty.
We can also rebase merges correctly, by rebasing the diff compared to
the auto-merged parents onto the new auto-merged parents. That's
actually how jj treats all diffs or changes in commits, not just when
rebasing. So when you diff a merge commit, it's compared to the
auto-merged parents, just like the remerge-diff that I think Elijah
talked about before.
The last feature I was going to talk about is what I call the
operation log. You can think of it as the entire repo being under
source control, and by the entire repo I mean refs and anonymous
heads and the working copy's position in each work tree. It's kind of
like Git's reflogs, but across all the refs at once, and there's some
extra metadata as well. In the top screenshot there you can see the
username and hostname and time of the operation.
As you can imagine, having these snapshots of the repository at
different points in time lets us go back and look at the repository
from a previous snapshot. In the middle screenshot there, we run jj
status at a particular operation, and that shows the state just
before we had set the description on the commit. So we see that the
working copy commit has no description, but it has a modified file,
because it was from that point in time. You can run this with any
command, not just status; you can run log or diff or anything. Of
course this is very useful when you're trying to figure out how a
repository got into a certain state, especially if it's not your own
repository or you don't remember what happened.
The last screenshot there shows how we restored the entire repo to an
earlier state, before the file was even added in this case. That's
the second operation from the bottom, when we had just created the
repository, so the working copy is empty and there's no description.
Restoring back to an earlier state like that is very useful when you
have made a mistake, but jj actually even lets you undo just a single
operation without undoing all the later operations. That works kind
of like git revert does, but instead of acting on the files in a
commit, it acts on the refs in an operation. The screenshots here
show how we undo an operation, the second-to-last operation there,
which abandoned a commit. We undo that operation and the commit
becomes visible again.
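The difference between restoring to an earlier operation and undoing
a single operation can be sketched as follows. This is a toy Python
model, not jj's implementation: the repository "view" is reduced to a
dict from ref name to commit name, OpLog, transact, restore and undo
are hypothetical names, and a ref that was changed again later is
simply left alone where a real tool would have to record a conflict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    description: str
    before: dict     # refs before the operation
    after: dict      # refs after the operation


class OpLog:
    def __init__(self):
        self.view = {}   # current refs: name -> commit
        self.ops = []    # every operation ever run, oldest first

    def transact(self, description, new_view):
        """Record a new operation that moves the repo to new_view."""
        op = Operation(description, dict(self.view), dict(new_view))
        self.ops.append(op)
        self.view = dict(new_view)
        return op

    def restore(self, op):
        """Put the whole repo back to the state right after op."""
        return self.transact(f"restore to {op.description!r}", op.after)

    def undo(self, op):
        """Reverse only what op changed, keeping later operations."""
        new_view = dict(self.view)
        for ref in set(op.before) | set(op.after):
            was, became = op.before.get(ref), op.after.get(ref)
            if was == became:
                continue                      # op did not touch this ref
            if self.view.get(ref) == became:  # untouched since op ran
                if was is None:
                    new_view.pop(ref, None)
                else:
                    new_view[ref] = was
            # else: changed again later; this sketch leaves it alone
        return self.transact(f"undo {op.description!r}", new_view)


log = OpLog()
first = log.transact("create A and B", {"main": "A", "feature": "B"})
abandon = log.transact("abandon feature", {"main": "A"})
log.transact("move main to C", {"main": "C"})

log.undo(abandon)                 # feature comes back, main stays at C
view_after_undo = dict(log.view)

log.restore(first)                # everything goes back, including main
view_after_restore = dict(log.view)
```

Note that both undo and restore are themselves recorded as new
operations, so they can be undone in turn.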
The operation log also gives us safe concurrency. Even if you have
run commands on a repository on different machines and synced them
via Dropbox, for example, you can get conflicts of course, but you
will not lose data or get corruption. In those cases, if you sync two
operations via Dropbox, you'll see a merge in the operation log
output.
We also get a simpler programming model, because transactions never
fail. When you start a command, the repository is loaded at the
latest operation, and any changes that happen concurrently will not
be seen by that command. Then when the command is done, it commits
that transaction as a new operation, which gets recorded and can't
fail, just like writing a commit can't fail. This is actually why I
developed the operation log, for the simpler programming model, and
then the undo and time travel stuff is just an accident.
Okay, so that's all about features and workflows. Let's take a look
at the architecture. jj is written in Rust. It's written as a
library, to be easy to reuse. The CLI is currently the only user of
that library, but the goal is to be able to use it in a server or a
CLI or a GUI or an IDE, for example. I hope that by making it a
library we also reduce the temptation to rewrite it and end up with
the same problem as Git.
Because I wanted jj to be able to integrate with the ecosystem at
Google, I tried to make it easy to replace different modules. For
example, it comes with two different commit storage backends by
default. One is the Git backend, which stores commits as Git commits,
and the other one is just a very simple custom one, but it should be
possible to write one that stores commits in the cloud, for example.
Same thing with the working copy. It's of course stored on local disk
normally, but the goal is to be able to replace that with an
implementation that writes the working copy to a VFS, for example, or
integrates with a smarter VFS by not actually writing all the files.
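The replaceable-module idea amounts to coding against an interface
rather than a concrete store. The Python sketch below illustrates
that shape only. jj itself is written in Rust and its real backend
interface is not shown in this talk, so CommitBackend,
InMemoryBackend, write_commit, read_commit and describe are all
hypothetical names chosen for the example.

```python
import hashlib
import json
from abc import ABC, abstractmethod


class CommitBackend(ABC):
    """What the rest of the tool needs from commit storage (hypothetical)."""

    @abstractmethod
    def write_commit(self, commit: dict) -> str:
        """Store a commit and return its id."""

    @abstractmethod
    def read_commit(self, commit_id: str) -> dict:
        """Load a commit by id."""


class InMemoryBackend(CommitBackend):
    """A very simple custom backend; a cloud-backed one would expose the
    same two methods but talk to a server instead of a dict."""

    def __init__(self):
        self._objects = {}

    def write_commit(self, commit: dict) -> str:
        data = json.dumps(commit, sort_keys=True).encode()
        commit_id = hashlib.sha256(data).hexdigest()   # content-addressed id
        self._objects[commit_id] = data
        return commit_id

    def read_commit(self, commit_id: str) -> dict:
        return json.loads(self._objects[commit_id])


def describe(backend: CommitBackend, commit_id: str) -> str:
    """Caller code depends only on the interface, not on the storage."""
    return backend.read_commit(commit_id)["description"]


backend = InMemoryBackend()
commit_id = backend.write_commit({"description": "add README", "parents": []})
description = describe(backend, commit_id)
```

Swapping the storage then means passing a different CommitBackend
implementation to the same caller code.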
Also, to be able to use it at Google, it needs to be scalable to
very, very large repositories, which means we can't have any
operations that need all the ancestors of a commit, for example. We
achieve that mostly by laziness, by not fetching objects that we
don't need, and by trying to be careful when designing algorithms so
that they don't scale with the size of the repository or the size of
a tree unless necessary.
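Here is a small Python sketch of what "laziness" means for an
ancestor walk: parents are looked up only when the caller actually
asks for more commits. It is an illustration, not jj's code.
CountingStore and ancestors are hypothetical names, and the fetch
counter stands in for network round trips to a remote backend.

```python
from collections import deque
from itertools import islice


class CountingStore:
    """Pretend remote commit store that counts how often it is queried."""

    def __init__(self, parents: dict):
        self._parents = parents   # commit id -> list of parent ids
        self.fetches = 0

    def get_parents(self, commit_id):
        self.fetches += 1         # each call models one object fetch
        return self._parents[commit_id]


def ancestors(store, head):
    """Lazily yield head and its ancestors, breadth first.

    Being a generator, it only fetches parents as far as the caller
    consumes the results.
    """
    seen = {head}
    queue = deque([head])
    while queue:
        commit_id = queue.popleft()
        yield commit_id
        for parent in store.get_parents(commit_id):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)


# A linear history of 100,000 commits: commit i has parent i - 1.
history = {i: ([i - 1] if i > 0 else []) for i in range(100_000)}
store = CountingStore(history)

# Showing the three most recent commits touches only a handful of objects.
recent = list(islice(ancestors(store, 99_999), 3))
```

An eager version that built the full ancestor list first would fetch
all 100,000 commits before showing anything.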
Another important design decision was to perform operations in the
repository first, not in the working copy. All operations write
commits and update references to point to those commits, and only
afterwards, after the transaction commits, is the working copy
updated to match the new state, if the working copy was even changed
by the operation. It helps here that the working copy is a commit,
because then even when you're running a command that updates the
working copy, you do the same thing: you just commit the transaction
that modifies the working copy commit and then update the working
copy afterwards.
We get a simpler programming model, because we always create whatever
commits we need and we don't need to worry about what's in the
working copy at that point. So it's the same thing as Elijah was
talking about, I think, with merge-ort. This also makes it much
faster, by not touching the working copy unnecessarily. And it helps
with the laziness, because you don't need to fetch objects from the
backend, which might be a server, just in order to update the working
copy and then update it back later, for example.
As I said, jj is Git-compatible, and you can start using it
independently of others who use the same Git project. This
compatibility happens at a few different levels. At the lowest level
there's the commit storage. As I said, there's one commit storage
backend that stores commits in a backing Git repository. It reads
commits from the backing Git repository's object store and converts
them into the in-memory representation that jj expects. There's also
functionality for importing Git refs into jj and exporting refs to
Git, and for interacting with Git remotes. Of course these commands
only work when the backend is the Git backend, not with the custom
backend.
Since I worked on Mercurial for many, many years, there are many good
things I wanted to replicate, such as the simple, clean UX, with for
example its revsets language for selecting revisions, and the simpler
workflows without the staging area. So I copied those. Mercurial's
history rewriting support is also pretty good, with hg split and
fold, for example, for splitting and squashing commits, and hg
rebase, which rebases a whole tree of commits, not just a linear
branch, or a single branch at least. So I copied those things as
well. Mercurial is also very customizable, mostly thanks to being
written in Python. Since jj is written in Rust we can't just copy
that; you can't use monkey patching in the same way. I hope the
modular design I talked about earlier at least helps there, but we
still have a long way to go to be as customizable as Mercurial.
Okay, so let's take a look at our plans for the open source project
and our integrations at Google. Remember, this has been my 20%
project for most of its lifetime, so there are a lot of features and
functionality missing. For example, GC and repacking are not
implemented, so we need to implement that for both commits and
operations. For backends that lazily fetch objects over the network,
to make that perform okay, we need to do some batch prefetching and
caching. There is no support for copies or renames, so we have to
decide if we want to track them like Mercurial or BitKeeper does, for
example, or if we want to detect them like Git does. Happy to hear
suggestions. And just lots of other features are missing. On this
particular one I actually got a pull request a few days ago, but
still, something like git blame, for example, is not there.
We need to make it easier to replace different components, like
adding a custom command or adding a custom backend while using the
API, without having to patch the source code. And language bindings:
again, we want to avoid rewrites, so I hope people can use the API
instead, and we want to make that as easy as possible in different
languages. Security hardening. And the custom backend, the non-Git
backend, has very little support. There is no support for push and
pull, for example, so that would need to be done at some point. But
the Git backend is just what I would recommend to everyone for many,
many years ahead anyway. And of course contributions are very
welcome. The URL for the project is there. We don't want this to be
just Google running the project. And as I said, this is now my
full-time project at Google.
The goal of that project is to improve the internal development
ecosystem. There are two main parts of this project. One is to
replace Mercurial with jj internally. The other part is to move the
commit storage and repository storage out of the file system and into
the cloud. We're hoping to create a single huge commit graph with all
the commits from all Google developers in just one graph. By having
the repositories available in the cloud, you should be able to access
them from anywhere, and a cloud IDE or review tool, for example,
should be able to access your repository and modify it.
This diagram here shows how we're planning to accomplish that. Most
of this is the same as on a previous slide. I've added some new
systems and integrations in red. One of the first things we're going
to have to do is to replace the commit graph, which is called "commit
index" in the diagram, because it currently assumes that all
ancestors are available, which of course doesn't scale to the size of
our monorepo. So that's something we're going to have to rewrite to
make it lazy. I'm probably going to use something called segmented
changelog that our friends at Meta developed for their
Mercurial-based VCS. Then we're going to add new backends for the
internal cloud-based storage of commits and operation logs. Well,
operation logs are basically repositories.
We will add a custom working copy implementation for our VFS. We're
hoping to not use sparse checkouts; instead, with our VFS, we can
just tell the VFS which commit to check out, basically, and a few
files on top. We'll probably add a bunch of commands for integrating
with our existing review tool, like starting a review or merging into
mainline and stuff like that. Then we're going to add a server on top
of this that will be used by the cloud IDE and review tool.
Yeah, that was all I had, so thanks for listening. If you have any
questions, you can find a link to the Discord chat, for example, from
the project's repository page on GitHub, and feel free to email me at
martinvonz@google.com. Thank you.
[Applause]