MZUWWTJLL7W7QAGTT4TL5RK56ILCZ4FNJUWF43NTRXAMDDT7TBKAC
As a first approximation, one can think of a repository as a single file represented by a directed graph $G = (V, E)$ of lines of text, where each vertex $v\in V$ represents a line of text, and an edge from $u \in V$ to $v\in V$, labelled with a change (also called patch) number $c$, can be read as "according to change $c$, line $u$ comes before $v$".
This means that changes may introduce both vertices and edges, as in the following example, where a line of text $D$ is introduced between $A$ and $B$:
Here, the thick arrow represents a change $c_0$ from a file containing the lines $A$, $B$, $C$, to a file which includes the line $D$. As mentioned above, the edges are labelled with the change that introduced them, in this case $c_0$.
An important feature to note is that **vertices are uniquely identified** by the hash of the change that introduced them, along with a position in that change. This means that two lines of text with the same content, introduced by different changes, are different vertices. It also means that a line keeps its identity, even if the change is applied in a totally different context.
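The graph and the identity rule described so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual on-disk representation: the names `Vertex`, `Edge`, `c_init` and `c_other` are hypothetical, change hashes are replaced by short strings, and the sketch assumes that the original edge from $A$ to $B$ is kept when $D$ is inserted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    change: str    # hash of the change that introduced the line (shortened here)
    position: int  # position of the line inside that change


@dataclass(frozen=True)
class Edge:
    source: Vertex
    target: Vertex
    introduced_by: str  # label: the change that introduced this edge


# Initial file A, B, C, introduced by a hypothetical change "c_init".
A, B, C = (Vertex("c_init", i) for i in range(3))
contents = {A: "A", B: "B", C: "C"}
edges = {Edge(A, B, "c_init"), Edge(B, C, "c_init")}

# Change c0 introduces line D between A and B:
# one new vertex and two new edges, all labelled c0.
D = Vertex("c0", 0)
contents[D] = "D"
edges |= {Edge(A, D, "c0"), Edge(D, B, "c0")}

# A line with the same content introduced by another change is a different vertex.
D_other = Vertex("c_other", 0)
contents[D_other] = "D"
```

Because vertices are compared by `(change, position)` and never by content, `D` and `D_other` remain distinct even though both carry the text `"D"`.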
Moreover, this system is append-only, in the sense that *deletions* are handled by a more sophisticated labelling of the edges. In the example above, if we want to delete line $D$, we just need to make a change mapping the edge introduced by $c_0$ to a deleted edge, which is also labelled with the change in which it was introduced, this time $c_1$:
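As a rough sketch of deletion by relabelling, the snippet below stores each edge with a status and a change label. The names `Label`, `delete_line` and `is_alive` are hypothetical. Two choices here are assumptions rather than something stated above: the relabelled edges are taken to be the ones pointing to the deleted line, and a line is considered deleted when one of its incoming edges is labelled deleted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    deleted: bool  # status of the edge
    change: str    # change that introduced this status


# Graph after change c0: D sits between A and B.
graph = {
    ("A", "B"): Label(False, "c_init"),
    ("B", "C"): Label(False, "c_init"),
    ("A", "D"): Label(False, "c0"),
    ("D", "B"): Label(False, "c0"),
}


def delete_line(graph, line, change):
    """Relabel every live edge pointing to `line` as deleted by `change`.

    No vertex or edge is removed from the graph.
    """
    for (u, v), label in list(graph.items()):
        if v == line and not label.deleted:
            graph[(u, v)] = Label(True, change)


def is_alive(graph, line):
    """A line is alive when none of its incoming edges is labelled deleted."""
    incoming = [label for (_, v), label in graph.items() if v == line]
    return all(not label.deleted for label in incoming)


size_before = len(graph)
delete_line(graph, "D", "c1")
```

After `delete_line`, the graph still has the same number of edges; only the label on the edge into `D` has changed, from `c0` to a deleted edge labelled `c1`.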
Our goal is to find the smallest possible system, both for reasons of mathematical aesthetics (why store useless stuff?) and for performance. Therefore, one immediate question comes to mind: why even keep the change number on the edges?
In this scenario, Alice writes something in the middle of a paragraph $p$, while Bob deletes $p$ in parallel.
One issue here is that the situation is not symmetric: when Bob applies Alice's change, he can tell immediately that something is wrong, because the context of Alice's edits is labelled as deleted in his repository.
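Bob's side of this scenario can be sketched as follows. This is a simplified illustration: `apply_insertion`, the line names `p1`, `p2`, `x` and the change names `c_init`, `c_alice` are all hypothetical, and the set of deleted lines stands in for the deleted-edge labels.

```python
def apply_insertion(deleted, edges, context, new_line, change):
    """Insert `new_line` after `context` and report a conflict.

    Returns True when the context line is already deleted in this repository.
    """
    edges.append((context, new_line, change))
    return context in deleted


# Bob's repository: he has deleted the paragraph p, made of lines p1 and p2.
bob_deleted = {"p1", "p2"}
bob_edges = [("p1", "p2", "c_init")]

# Alice's repository: the paragraph is still alive.
alice_deleted = set()
alice_edges = [("p1", "p2", "c_init")]

# Alice's change inserts line x after p1, in the middle of the paragraph.
conflict_for_alice = apply_insertion(alice_deleted, alice_edges, "p1", "x", "c_alice")
conflict_for_bob = apply_insertion(bob_deleted, bob_edges, "p1", "x", "c_alice")
```

In Bob's repository the check fires as soon as Alice's change is applied, because its context `p1` is already marked as deleted there. The new edge is still recorded, in keeping with the append-only design.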