
# Why Pijul?

Pijul is the first distributed version control system to be based on a sound mathematical theory of changes. It is inspired by [Darcs](http://darcs.net), but aims to solve the soundness and performance issues of Darcs.

Pijul has a number of features that allow it to scale to very large repositories and fast-paced workflows. In particular, **change commutation** means that changes written independently can be applied in any order, without changing the result. This property simplifies workflows, allowing Pijul to:

 - **clone sub-parts of repositories**
 - **solve conflicts reliably**
 - **easily combine different versions**.

## Change commutation

In Pijul, for any two changes A and B, either A and B can be applied in any order, or A depends on B, or B depends on A.
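
The three cases can be illustrated with a deliberately simplified toy model. This is a hypothetical sketch, not Pijul's actual data structure: a line is a tuple `(line_id, text, line_above, line_below)`, a change is a set of such lines, applying a change adds its lines, and a change depends on another one if it is attached to a line the other one introduced. The helpers `introduced`, `context`, `depends_on` and `apply` are illustrative names, and deletions and conflicts are ignored.

```python
from typing import FrozenSet, Tuple

# Hypothetical toy model, not Pijul's real format: a line is
# (line_id, text, id_of_line_above, id_of_line_below), and a change is a set
# of such lines. "start" and "end" are sentinel ids for the file boundaries.
Line = Tuple[str, str, str, str]
Change = FrozenSet[Line]
SENTINELS = {"start", "end"}


def introduced(lines) -> set:
    """Ids of the lines a change (or a state) contains."""
    return {line_id for line_id, _, _, _ in lines}


def context(change: Change) -> set:
    """Ids of the lines a change is attached to."""
    return {ctx for _, _, up, down in change for ctx in (up, down)}


def depends_on(a: Change, b: Change) -> bool:
    """`a` depends on `b` if `a` is attached to a line that `b` introduced."""
    return bool(context(a) & introduced(b))


def apply(state: FrozenSet[Line], change: Change) -> FrozenSet[Line]:
    """Add a change's lines, refusing it if its context is not there yet."""
    missing = context(change) - introduced(state) - introduced(change) - SENTINELS
    if missing:
        raise ValueError(f"missing dependencies: {sorted(missing)}")
    return state | change


base = frozenset({("x", "first", "start", "y"), ("y", "second", "x", "end")})
a = frozenset({("a1", "between first and second", "x", "y")})
b = frozenset({("b1", "after second", "y", "end")})
c = frozenset({("c1", "below a1", "a1", "y")})  # attached to a line of A

# A and B are independent: applying them in either order gives the same state.
assert apply(apply(base, a), b) == apply(apply(base, b), a)

# C depends on A, so it cannot be applied before A.
assert depends_on(c, a) and not depends_on(a, c)
try:
    apply(base, c)
except ValueError as err:
    print(err)  # missing dependencies: ['a1']
```

Set union makes commutation trivial in this toy; the sketch does not try to model deletions or conflicts, which Pijul's real representation also has to handle.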

- **[Use case: In the early stage of a project]** Change commutation
  makes Pijul a highly forgiving system, as you can "unapply"
  (or "unrecord") changes made in the past without changing the
  identity of the changes recorded after them. Readers familiar with
  Git will recognize this as a use case for rebasing, except that
  rebasing gives every rewritten commit a new identity.

  This tends to happen pretty often in the early stages of a project
  when most things are still uncertain. With Pijul, exploring new
  features and new implementations comes at no extra cost in time.

- **[Use case: In a mature project]** As your project grows, change
  commutation saves even more time: imagine a project with two main
  branches, a stable one accepting only bugfixes, and an unstable
  one, where changes happen constantly.

  The team working on the unstable branch is likely to discover old
  bugs and will want to fix them in the stable branch too.

  In Pijul, maintainers of the stable branch can simply pull only the
  changes they are interested in. Pulled changes *do not* change when
  imported: unlike a Git cherry-pick, which creates a new commit, a
  pulled change keeps its identity, so later pulls between the two
  branches work just as expected.
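
Why keeping a change's identity matters can be sketched with hashes. This is a simplified, hypothetical model: it assumes a Git-style commit identifier hashes the edit together with its parent's identifier, while a Pijul-style change identifier hashes only the change itself (real identifiers in both systems cover more data than this). `commit_id` and `change_id` are illustrative names, not real APIs.

```python
import hashlib


def digest(*parts: str) -> str:
    """Short SHA-256 of the given strings (illustrative identifier)."""
    return hashlib.sha256("\x00".join(parts).encode()).hexdigest()[:12]


def commit_id(diff: str, parent: str) -> str:
    """Simplified Git-style id: depends on the parent, i.e. on history."""
    return digest(diff, parent)


def change_id(diff: str) -> str:
    """Simplified Pijul-style id: depends only on the change itself."""
    return digest(diff)


bugfix = "fix off-by-one in parser"

# Cherry-picking the fix onto `stable` gives the same edit a new identity.
on_unstable = commit_id(bugfix, parent="unstable-head")
cherry_picked = commit_id(bugfix, parent="stable-head")
assert on_unstable != cherry_picked

# Pulling the change keeps its identity, so both branches agree on it.
unstable = {change_id("new feature"), change_id(bugfix)}
stable = {change_id(bugfix)}  # the stable branch pulled only the fix
assert stable <= unstable
print(len(stable | unstable))  # 2: combining the branches does not duplicate the fix
```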


## Associativity

In Pijul, change application is an associative operation, meaning
that applying some change A and then a set of changes (BC) at once
yields the same result as applying (AB) first and then C.
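
Spelled out, with $P \cdot X$ denoting the state obtained by applying a change, or a set of changes, $X$ to a repository state $P$ (a notation introduced here only for illustration), the property reads:

$$
\bigl(P \cdot A\bigr) \cdot \{B, C\} \;=\; \bigl(P \cdot \{A, B\}\bigr) \cdot C
$$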

With branches, the first scenario looks like this: Bob creates A,
while Alice creates B and C, and Bob finally merges both B and C at once.

<div style="text-align:center">
<img src="./associativity0.svg"/>
</div>

The second scenario looks like this: Bob creates change A and then
pulls B. At that point, Bob has both A and B on his branch and wants
to pull C from Alice.

<div style="text-align:center">
<img src="./associativity1.svg"/>
</div>


Note that this differs from change reordering: here, we apply A,
then B, then C, in the same order in both scenarios.

Using math words such as "associative" for such a simple operation may
sound like nitpicking, because intuition suggests it should always be
true. However, **Git does not guarantee that merges are associative**,
even if A, B, and C do not conflict.

Specifically, Git (and related systems) can **sometimes shuffle lines around**,
because these systems *only* track versions rather than the changes that
happen between versions. Even though changes can in principle be
reconstructed from versions, the following example
(taken from [the "badmerge" example](https://tahoe-lafs.org/~zooko/badmerge/simple.html))
shows that tracking versions alone does not yield the expected result.


<div style="text-align:center">
<div style="display:inline-block">
<img style="margin:1em 2em;clear:both;display:block" src="./badmerge0.svg"/>
Git merge (which A is which?)
</div>
<div style="display:inline-block">
<img style="margin:1em 2em;clear:both;display:block" src="./goodmerge.svg"/>
Pijul merge
</div>
</div>

In this diagram, Alice and Bob start from an identical file with lines A and B.
Alice adds G above everything and then another instance of A and B above that
(her new lines are shown in green). Meanwhile, Bob adds a line X between the
original A and B.

Git, SVN, and Mercurial will merge this example… into the file shown on the
left, with the relative positions of G and X swapped, whereas Pijul
(and Darcs) yield the file on the right, preserving the order between the
lines. Note that this example **has nothing to do with a conflict**, since the
edits happen in different parts of the file. Indeed, neither Git nor Pijul
reports a conflict in this case.

The reason for this counter-intuitive behavior is that Git runs a
heuristic algorithm called three-way merge, or diff3, which extends diff
to two "new" versions instead of one. However, diff often has several
optimal solutions: the same change can be described equally well by
different diffs. This is fine for diff alone, since the patch it produces
has a unique interpretation, but in diff3 the choice is ambiguous and can
lead to lines being reshuffled arbitrarily.
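
The ambiguity can be reproduced on the Alice and Bob example. The sketch below is a hypothetical brute-force search, only practical for tiny inputs and not how Git's diff is implemented: `optimal_matchings` lists every order-preserving way of matching the original lines `A`, `B` to Alice's version `A B G A B` with the maximum number of matched lines, and Bob's `X` is then placed according to each matching.

```python
from itertools import combinations


def optimal_matchings(old, new):
    """Every order-preserving matching of `old` lines to equal `new` lines that
    matches as many lines as possible (brute force, tiny inputs only)."""
    for k in range(min(len(old), len(new)), 0, -1):
        found = [
            dict(zip(olds, news))
            for olds in combinations(range(len(old)), k)
            for news in combinations(range(len(new)), k)
            if all(old[i] == new[j] for i, j in zip(olds, news))
        ]
        if found:
            return found
    return [{}]


base = ["A", "B"]
alice = ["A", "B", "G", "A", "B"]  # Alice's version
# Bob's edit: insert "X" between base lines 0 ("A") and 1 ("B").

merges = set()
for m in optimal_matchings(base, alice):
    # Only matchings that keep A and B adjacent give an unambiguous spot for X.
    if m[1] == m[0] + 1:
        pos = m[0] + 1
        merges.add(tuple(alice[:pos] + ["X"] + alice[pos:]))

for merged in sorted(merges):
    print(" ".join(merged))
# A B G A X B
# A X B G A B
```

Both matchings are equally optimal from diff's point of view, yet they put X in different places; the second output line is the kind of reshuffling described above.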


It is prudent to note that change associativity
**does not guarantee that the result will have the intended semantics**, because
languages have context-specific rules. Every change should still be tested and
go through code review. However, **the code review won't be made pointless** by
the version control tool reshuffling lines.

## Modeling conflicts

Conflicts are a normal state in the internal representation of a Pijul
repository. After applying new changes, extra work is needed to find where
the conflicts are.

In particular, edits from both sides of a conflict get applied without
resolving the conflict. This guarantees that no information is ever lost.

This is different from both Git and Darcs:

- Git writes conflicts into the working directory and refuses to
  commit any changes to the repository until conflicts get manually
  resolved.

- In Darcs, conflicts can trigger the [exponential merge
  problem](http://darcs.net/FAQ/Performance#is-the-exponential-merge-problem-fixed-yet),
  which might cause it to take several hours to merge even a two-line
  change.
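
The idea of applying both sides first and computing conflicts afterward can be sketched on a toy graph. In this hypothetical model, much simpler than Pijul's actual pristine, each line is a vertex and an edge `(u, v)` means "u comes before v"; applying a change only adds edges, so both sides of a conflict are always kept. The graph describes a single file exactly when its lines have one possible order, so `order_conflicts` (an illustrative name) runs Kahn's topological sort and reports adjacent lines that no edge orders.

```python
def order_conflicts(edges):
    """Return pairs of adjacent lines (in one topological order) that no edge
    orders; an empty list means the graph describes a single, unambiguous file."""
    succ, indegree = {}, {}
    for u, v in edges:
        for n in (u, v):
            succ.setdefault(n, set())
            indegree.setdefault(n, 0)
        if v not in succ[u]:
            succ[u].add(v)
            indegree[v] += 1
    # Kahn's algorithm, sorting ready vertices to keep the output deterministic.
    ready = sorted(n for n, d in indegree.items() if d == 0)
    order = []
    while ready:
        n = ready.pop(0)
        order.append(n)
        for m in succ[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
        ready.sort()
    # A DAG has a unique order iff consecutive vertices are joined by an edge.
    return [(u, v) for u, v in zip(order, order[1:]) if v not in succ[u]]


base = {("x", "y")}
alice = {("x", "alice"), ("alice", "y")}  # Alice inserts a line between x and y
bob = {("x", "bob"), ("bob", "y")}        # so does Bob, independently

print(order_conflicts(base | alice))        # []
print(order_conflicts(base | alice | bob))  # [('alice', 'bob')]
```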


## Comparisons with other version control systems

### Pijul for Git/Mercurial/SVN/… users

The main difference between Pijul and Git (and related systems) is
that Pijul stores changes (or patches), whereas Git deals only with
snapshots (or versions).

There are many advantages to using changes. First, changes are the
intuitive atomic unit of work. Moreover, changes can be merged
according to formal axioms that guarantee a correct result in all cases.

In contrast, commits have to be *stitched together based on their
contents rather than on the edits that took place*. This is why
conflicts are often painful in these systems: there is no natural way to
solve a conflict once and for all (Git, for example, has the `rerere`
command, "reuse recorded resolution", to try to simulate that in some cases).

### Pijul for Darcs users

Pijul is a mostly formally correct version of Darcs' theory of
changes, together with a new algorithm for merging changes. Its main
innovation compared to Darcs is a better data structure for its
pristine (its internal representation of the current state of the
repository), allowing for:

- A sane representation of conflicts: Pijul's pristine is stored in a
  "conflict-tolerant" data structure. Many changes can be applied to
  it, and the presence or absence of conflicts is only computed
  afterward by looking at the pristine.

- Conflicting changes always commute in Pijul and never commute in
  Darcs.

- Fast algorithms: Pijul's pristine can be seen as a "cache" of applied
  changes to which new changes can be applied directly without having
  to compute anything on the repository's history.


However, Pijul's pristine format is designed to satisfy its axioms only
for a specific set of operations. As a result, some of Darcs'
features, such as `darcs replace`, are not yet available.