Why Pijul?

Pijul is the first distributed version control system to be based on a sound mathematical theory of changes. It is inspired by Darcs, but aims at solving the soundness and performance issues of Darcs.

Pijul has a number of features that allow it to scale to very large repositories and fast-paced workflows. In particular, change commutation means that changes written independently can be applied in any order, without changing the result. This property simplifies workflows, allowing Pijul to:

  • clone sub-parts of repositories
  • solve conflicts reliably
  • easily combine different versions.

Change commutation

In Pijul, for any two changes A and B, either A and B can be applied in any order, or A depends on B, or B depends on A.
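The rule above can be sketched with a toy model. The `Change` class and the `valid_orders` helper are hypothetical names chosen for illustration, not part of Pijul; a change is reduced to a name plus the names of the changes it depends on.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Change:
    name: str
    deps: frozenset = frozenset()  # names of changes that must be applied first


def valid_orders(changes):
    """Return every ordering in which each change comes after its dependencies."""
    orders = []
    for order in permutations(changes):
        applied = set()
        ok = True
        for change in order:
            if not change.deps <= applied:
                ok = False
                break
            applied.add(change.name)
        if ok:
            orders.append([change.name for change in order])
    return orders


a = Change("A")
b = Change("B")                    # independent of A
c = Change("C", frozenset({"A"}))  # C depends on A

independent_orders = valid_orders([a, b])  # both orders are allowed
dependent_orders = valid_orders([a, c])    # only "A before C" is allowed
```

Two independent changes admit both orders, while a dependency leaves exactly one.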

  • [Use case: In the early stage of a project] Change commutation makes Pijul a highly forgiving system, as you can “unapply” (or “unrecord”) changes made in the past, without having to change the identity of new changes. Readers familiar with Git will recognize changing the identity of later commits as what “rebasing” does.

    This tends to happen pretty often in the early stages of a project when most things are still uncertain. With Pijul, exploring new features and new implementations comes at no extra cost in time.

  • [Use case: In a mature project] As your project grows, change commutation saves even more time: imagine a project with two main branches, a stable one accepting only bugfixes, and an unstable one, where changes happen constantly.

    The team working on the unstable branch is likely to discover old bugs, and fix them in the stable branch too.

    In Pijul, maintainers of the stable branch can simply pull only the changes they are interested in. Pulled changes do not change when imported, which means that pulling new changes will work just as expected.
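As a rough sketch of the stable/unstable scenario, the toy code below pulls a single change, together with the changes it depends on, from one branch to another. Everything here is an assumption made for illustration: the dictionaries, the `make_change` and `pull` helpers, and the use of a truncated SHA-256 as identity are not Pijul's actual format or API.

```python
import hashlib


def change_id(content, deps):
    """Hypothetical identity: a hash of the change's content and its dependencies."""
    payload = content + "|" + ",".join(sorted(deps))
    return hashlib.sha256(payload.encode()).hexdigest()[:8]


def make_change(content, deps=()):
    return {"id": change_id(content, deps), "content": content, "deps": tuple(deps)}


def pull(target, source, wanted_id):
    """Copy one change (and, recursively, its dependencies) from source to target."""
    change = source[wanted_id]
    for dep in change["deps"]:
        if dep not in target:
            pull(target, source, dep)
    target[wanted_id] = change  # stored as-is: the identity is not rewritten


# Unstable branch: a feature, an unrelated bugfix, and a follow-up to the bugfix.
feature = make_change("add experimental feature")
bugfix = make_change("fix off-by-one in parser")
followup = make_change("add regression test", deps=[bugfix["id"]])
unstable = {c["id"]: c for c in (feature, bugfix, followup)}

# The stable branch pulls only the follow-up; the bugfix comes along as a dependency.
stable = {}
pull(stable, unstable, followup["id"])
```

The feature never reaches the stable branch, and the pulled changes keep the identifiers they had on the unstable branch.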

Associativity

In Pijul, change application is an associative operation, meaning that applying some change A and then a set of changes (BC) at once yields the same result as applying (AB) first and then C.

With branches, the first scenario looks like this: Bob creates A while Alice creates B and C, and Bob finally merges both B and C at once.

In the second scenario, Bob creates A and then pulls B. At that point, Bob has both A and B on his branch and wants to pull C from Alice.

Note that this differs from change reordering: here, we apply A, then B, then C, in the same order in both scenarios.

Using math words such as “associative” for such a simple operation may sound like nitpicking because intuition suggests it should always be true. However, Git doesn’t guarantee the associative change property, even if A, B, and C do not conflict.

Specifically, Git (and relatives) can sometimes shuffle lines around, because these systems only track versions rather than the changes that happen between the versions. And even though one can reconstruct one from the other, the following example (taken from here) shows that tracking versions only does not yield the expected result.

[Figure: two panels, “Git merge (which A is which?)” on the left and “Pijul merge” on the right]

In this diagram, Alice and Bob start from an identical file with lines A and B. Alice adds G above everything and then another instance of A and B above that (her new lines show green). Meanwhile, Bob adds a line X between the original A and B.

Git, SVN, and Mercurial will merge this example… into the file shown on the left, with the relative positions of G and X swapped, whereas Pijul (and Darcs) yield the file on the right, preserving the order between the lines. Note that this example has nothing to do with a conflict since the edits happen in different file parts. Furthermore, neither Git nor Pijul will report a conflict in this case.
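To see why tracking changes preserves the order, here is a minimal sketch of the example above in which every line carries an identifier and an insertion is anchored to the identifier of the line it follows. The `Insert` class, the `apply` function, and the line identifiers are hypothetical; Pijul's real data structure is more elaborate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    """Hypothetical change: insert `lines` right after the line whose id is `after`."""
    after: str    # id of the anchor line ("ROOT" means the start of the file)
    lines: tuple  # (line_id, text) pairs


def apply(doc, change):
    """Return a new document (a list of (id, text) pairs) with the change applied."""
    if change.after == "ROOT":
        pos = 0
    else:
        pos = next(i for i, (lid, _) in enumerate(doc) if lid == change.after) + 1
    return doc[:pos] + list(change.lines) + doc[pos:]


def text(doc):
    return [t for _, t in doc]


base = [("a0", "A"), ("b0", "B")]
alice = Insert("ROOT", (("a1", "A"), ("b1", "B"), ("g1", "G")))  # new A, B, G on top
bob = Insert("a0", (("x1", "X"),))                               # X after the original A

alice_then_bob = apply(apply(base, alice), bob)
bob_then_alice = apply(apply(base, bob), alice)
```

Because Bob's X is attached to the original A rather than to a line number, both orders give A, B, G, A, X, B.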

The reason for the counter-intuitive behavior in Git is that Git runs a heuristic algorithm called three-way merge or diff3. Diff3 extends diff to two “new” versions instead of one. Note, however, that diff has multiple optimal solutions, and a single change can be described equivalently by different diffs. While this is fine for diff (since the patch resulting from diff has a unique interpretation), it is ambiguous in the case of diff3 and might lead to an arbitrary reshuffling of files.
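The ambiguity can be made concrete with a small sketch. It is not an implementation of diff3; it only enumerates the ways the original file can be matched inside Alice's version, then places Bob's X right after wherever the original A was matched. The `alignments` and `place_x` helpers are hypothetical names.

```python
from itertools import combinations


def alignments(old, new):
    """All ways to match every line of `old`, in order, to an equal line of `new`."""
    return [
        idx
        for idx in combinations(range(len(new)), len(old))
        if all(new[i] == line for i, line in zip(idx, old))
    ]


old = ["A", "B"]
alice = ["A", "B", "G", "A", "B"]

# Each match describes Alice's edit as three inserted lines, so all are equally small.
matches = alignments(old, alice)


def place_x(match):
    pos = match[0] + 1  # right after the line the original "A" was matched to
    return alice[:pos] + ["X"] + alice[pos:]


results = {match: "".join(place_x(match)) for match in matches}
```

Matching the original lines to the top pair puts X above G, while matching them to the bottom pair puts X below G; nothing in the two versions alone says which match is the right one.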

It is prudent to note that change associativity does not guarantee the result will have the intended semantics, because languages have context-specific rules. Every change should still be tested and go through code review. However, that code review won’t be made pointless by the version control tool reshuffling lines.

Modeling conflicts

Conflicts are a normal state in the internal representation of a Pijul repository. They are not detected while changes are applied; finding where they are is extra work done afterward.

In particular, edits from both sides of a conflict get applied without resolving the conflict. This guarantees no information ever gets lost.

This is different from both Git and Darcs:

  • Git writes conflicts into the working directory and refuses to commit any changes to the repository until conflicts get manually resolved.

  • In Darcs, conflicts can trigger the exponential merge problem, which might cause it to take several hours to merge even a two-line change.
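The idea described in this section, applying both sides first and looking for conflicts afterward, can be sketched with a toy graph in which an edge (u, v) means “line u comes before line v”. This is a simplified assumption for illustration only: the function names are hypothetical, and a real repository also has to handle deletions and files.

```python
from itertools import combinations


def apply(edges, change):
    """Applying a change only adds "comes before" edges; nothing is overwritten."""
    return edges | change


def reachable(edges, start):
    """All line ids that can be reached from `start` by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for u, v in edges:
            if u == node and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def find_conflicts(edges):
    """Pairs of lines whose relative order is not determined by the graph."""
    nodes = {n for edge in edges for n in edge}
    reach = {n: reachable(edges, n) for n in nodes}
    return {
        frozenset((p, q))
        for p, q in combinations(sorted(nodes), 2)
        if q not in reach[p] and p not in reach[q]
    }


base = frozenset({("ROOT", "a"), ("a", "b")})
alice = frozenset({("a", "x"), ("x", "b")})  # Alice inserts line x between a and b
bob = frozenset({("a", "y"), ("y", "b")})    # Bob inserts line y at the same place

merged = apply(apply(base, alice), bob)      # both edits are kept
conflicts = find_conflicts(merged)           # computed only after applying
```

Both x and y end up in the merged state, and the separate `find_conflicts` pass reports that their relative order is undetermined.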

Comparisons with other version control systems

Pijul for Git/Mercurial/SVN/… users

The main difference between Pijul and Git (and related systems) is that Pijul stores changes (or patches), whereas Git deals only with snapshots (or versions).

There are many advantages to using changes. First, changes are the intuitive atomic unit of work. Moreover, changes can be merged according to formal axioms that guarantee correctness in 100% of cases.

In contrast, commits have to be stitched together based on their contents rather than on the edits that took place. This is why conflicts are often painful in these systems, as there is no natural way to solve a conflict once and for all (for example, Git has the rerere command to try and simulate that in some cases).

Pijul for Darcs users

Pijul is a mostly formally correct version of Darcs’ theory of changes, combined with a new algorithm for merging changes. Its main innovation compared to Darcs is to use a better data structure for its pristine, allowing for:

  • A sane representation of conflicts: Pijul’s pristine is stored in a “conflict-tolerant” data structure. Many changes can be applied to it, and the presence or absence of conflicts is only computed afterward by looking at the pristine.

  • Conflicting changes always commute in Pijul and never commute in Darcs.

  • Fast algorithms: Pijul’s pristine can be seen as a “cache” of applied changes to which new changes can be applied directly without having to compute anything on the repository’s history.

However, Pijul’s pristine format is designed to only comply with axioms on a specific set of operations. As a result, some of Darcs’ features, such as darcs replace, have yet to be made available.