Disclaimer

The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Tuesday, June 06, 2017

(D)VCS branching models: notes in progress

I like branches.  But I don't like YOUR branches

I don't like git's branches. But then again, I don't really like Mercurial's branches. Or Bazaar's branches. Or Perforce/SVN/CVS/RCS branches. I may be polluted: I used CVS branches extensively, back in the day. Heck, I used RCS branches in RCS-wrapper-tools. I've used Mercurial and Bazaar branches extensively.   I like what Brad Appleton has written about branches in
Streamed Lines: Branching Patterns for Parallel Software Development - but then again, Brad and friends are really talking about streams of software development, not branches.  This may really be the problem: several different concepts use the same underlying mechanism.

This post is work in progress. I want to make some notes about branches, often in reaction to statements in other webpages. I will try to properly reference those webpages - but I am more interested in evolving the ideas than being properly academic.

Why I am writing this

1) Writing stuff like this helps me understand the differences between tools, and adapt my work style to new tools.  Although I have been using git for more than 10 years, it has only become my primary VCS recently for personal stuff - and I have to use Perforce at work. Most recently I mainly used Bazaar for personal stuff, but bzr is declining. Mercurial for some projects at work.  Plus, in the more distant past, SVN, CVS, RCS.

Similarly, I may not have noticed features added during git's evolution.

2) I am intrigued by analogies between version control software and OOO speculative hardware.  Git, in particular, is all about rewriting history to be "nice". OOO speculation is similarly about making execution appear to be in a serializable order.  Similarly, memory ordering hardware often performs operations in an order inconsistent with the architectural memory ordering model, but monitors and/or repairs to be consistent.

3) I am just plain interested in version control.

3') I had started writing my own DVCS that I abandoned when git came out.  Mine was intended to better support partial checkins and checkouts, not just of workspaces, but of entire repos.  It was intended to be able to handle repos for the same or overlapping source trees that had been created independently - i.e. that did not have common ancestors within recorded history. (Why?  Think about it...)

Immediate Trigger

I knew that git branches are really just refs to versions - with what others might call a branch being some form of transitive closure of ancestry. Not quite the same thing, but tolerable.
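A minimal sketch of that view, in Python, using a toy commit graph. The commit ids, the `parents` and `refs` tables, and the `ancestry` helper are all made up for illustration; this is not how git stores anything, only the shape of the idea: the ref is one pointer, and "the branch" is whatever can be reached from it.

```python
# Toy commit DAG: each commit id maps to the list of its parent ids.
# All ids and names here are hypothetical.
parents = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],   # made while working on master
    "f1": ["c2"],   # made while working on feature
    "f2": ["f1"],
}

# A git-style branch is only a name pointing at one commit.
refs = {"master": "c3", "feature": "f2"}


def ancestry(tip):
    """Return the tip plus every commit reachable through parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


feature_commits = ancestry(refs["feature"])
master_commits = ancestry(refs["master"])
```

In this sketch c1 and c2 land in both sets, and nothing in the graph says on which branch f1 or c3 was originally made. That is the "not quite the same thing" part.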

Even with this, I felt strongly that git branches are not really first class.

I was flabbergasted when I learned that git branch descriptions are not transferred to remote repositories.

This suggests a definition of a requirement for a DVCS to treat something as a first class concept:  the objects representing that concept should be versionable, and pushable remotely.   

Random Note Snippings

---+ Branch Names and Versions

Several writers on DVCS, usually git advocates, have said that the problem with Mercurial style branches is that they are recorded in the commit history, and that this prevents deleting or renaming branches.

For example:
[Contreras 2011] In Mercurial, a branch is embedded in a commit; a commit done in the ‘do-test’ branch will always remain in such a branch. This means you cannot delete, or rename branches, because you would be changing the history of the commits on those branches. 
although I recall but cannot find a better statement.

(Yeah: Mercurial's obsession with immutable history tends to get in the way of clear thinking.  But HGers (huggers?) rewrite history all the time, eg via rebase. So imagine that we are talking about a hypothetical VCS that wants to keep some of the good things about CVS and HG and BZR style branches.)



So: branch names need to be deleted and renamed.  It would also be nice to be able to hide the branch names and the branch contents. But probably more important, branch names may need to be reused.  And quite likely different developers may want to have different branches that have the same name, i.e. different branches with the same name may need to be distinguished, especially if simultaneously active.

Below, I go on about naming conventions for contours (a set of file versions), and branches. E.g. names that are permanent, e.g. a contour name RELEASE-2017-06-15-03h13UDT_AFG, versus floating LATEST-RELEASE. Or a branch name, like a task branch BUGFIX-BRANCH-ISSUE#24334, versus a more longlasting branch or stream R1+BUGFIXES-MAINTENANCE-BRANCH.

Insight: whenever you are tempted to put uniqifying info like date or unique-number in a name, you are thinking about versioning.

Wait!  We are talking about version control systems!!!  VCSes are all about uniqifying different objects with the same name!   For that matter, so are hierarchical directory structures.  And so on, e.g. object labels and tags.

==> How to distinguish different branch objects with same name.

==> Encourage actions that help distinguish branches.

E.g. instead of saying "switch to branch BBB", where BBB is created if it does not already exist,

Prefer "create new branch BBB" which may warn you if name BBB already exists.

==> PROBLEM:  tools that might simply go "merge branchname" might now have to say "branchname is not unique - which instance of branchname do you want to merge?"  Yet another error case, but not necessarily a real error, just an ambiguity.

How do we resolve such ambiguities?

a) query

b) priority - eg. PATH for hierarchical. Choose the branch that is "closest" to the guy doing the merge.
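A rough sketch of both strategies, in Python. Everything here is hypothetical: `BranchInstance`, the `owner` field standing in for "closeness", and the `search_path` argument that plays the role of PATH. Priority is tried first; if it does not settle the matter, the candidates are handed back so that a front end can query the user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchInstance:
    name: str    # human-visible name, may collide with other instances
    owner: str   # hypothetical namespace, e.g. a developer or a repo
    uid: int     # unique id assigned by the hypothetical VCS


class AmbiguousBranch(Exception):
    """Raised when a name matches several instances and no rule picks one."""


def resolve(name, instances, search_path=()):
    matches = [b for b in instances if b.name == name]
    if not matches:
        raise KeyError(name)
    if len(matches) == 1:
        return matches[0]
    # (b) priority: walk the search path, nearest owner first.
    for owner in search_path:
        nearest = [b for b in matches if b.owner == owner]
        if len(nearest) == 1:
            return nearest[0]
    # (a) query: return the candidates so the caller can ask the user.
    raise AmbiguousBranch(matches)


instances = [
    BranchInstance("bugfix", "alice", 1),
    BranchInstance("bugfix", "bob", 2),
    BranchInstance("release", "alice", 3),
]
```

With these instances, `resolve("bugfix", instances, search_path=("bob", "alice"))` picks Bob's branch, while `resolve("bugfix", instances)` raises `AmbiguousBranch` carrying both candidates: an ambiguity to be resolved, not necessarily an error.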



---+ Contours? Who needs Contours?

"Contour" is my old RCS-era name for a set of file versions.  

"Whole repo" VCSes don't need contours, since any commit implies state for all files.

Except... when a project is assembled from multiple repos.  Even here, VCSes that have subrepo support usually are smart enough to include the commit or checkin or version number of all subrepos in their top level commit.

But... subrepos don't scale all that well.  E.g. not so much for my personal library, where each directory node should be considered separately versionable.

---+ SVN / Perforce Style branches

As I say elsewhere:
VCSes are all about uniqifying different objects with the same name!   For that matter, so are hierarchical directory structures.  
SVN and Perforce subsume the former in the latter: branches are really just trees in the hierarchical directory structure.

...Pros/cons. Workspaces assembled from multiple branches.   Where does the branch level live? "Floating"

---+ Git Branch Descriptions are not first class

[StackOverflow 2012 - git - pushing branch descriptions to remote]  The description is stored in the config file (here, the local one, within your Git repo), then, no, branch descriptions aren't pushed. Config files are not pushed (ever). See "Is it possible to clone git config from remote location?"

Simple text files are, though, as my initial answer for branch description recommended at the time.
Branch descriptions are all about helping make an helpful message for publishing. Not for copying that message over the other repos which won't have to publish the same information/commits.
 I can't criticize the guy who provided this answer, VonC, because earlier he discussed exactly this issue, proposing using text files to hold pushable branch descriptions - in exactly the same way that I have hacked branch descriptions before in other VCSes, and with exactly the same problems.

Using text files to hold branch descriptions is potentially an example of what I might call a file that wants to cross branch boundaries.  Or, a workspace that is mostly branched, but which usually contains the mainline of the branch description text file.

Sure, you may not always want that.  But it is nice to be able to do so.

---+  [StackOverflow 2009]

[StackOverflow 2009]: Git glossary defines "branch" as an active line of development. This idea is behind an implementation of branches in Git. ... The most recent commit on a branch is referred to as the tip of that branch. The tip of the branch is referenced by a branch head, which is just a symbolic name for this commit.

A single git repository can track an arbitrary number of branches, but your working tree (if you have any) is associated with just one of them (the "current" or "checked out" branch).

GLEW COMMENT: I have often wanted to create working trees which are composed of several branches. Yeah, yeah - you can simulate this by merges - but I want to make it convenient. 

E.g. say that a particular configuration = mainline of most code, but the FOO branch of some library libFoo.   Yes, this is almost equivalent to saying that this configuration is really all the FOO branch - but it provides more information, in saying that "Yes, the configuration is FOO specific, but in general we expect only the libFoo library to be different with FOO."  

My thoughts on partial checkins and checkouts often involve this. More, partial repositories. Referencing tools and repos that have separate version control systems.  libXXX may be checked into its own repo in isolation as that repo's mainline.   But from the point of view of some other tool that uses XXX, say T, libXXX's mainline is not T's mainline. Yet(?).  A partial checkin of libXXX amounts to creating a CANDIDATE for T's mainline.  Once the candidate is tested, it becomes T's mainline, assuming tests pass.  But if tests fail, T's version of libXXX may lag, or may fork and diverge from libXXX's mainline.

This notion of "candidate" maps well to Git's model.  Such a candidate is just a HEAD. Once tested, the candidate label may go away, and no longer clutter our listings of branches and tags and other named references.
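The candidate idea can be sketched as a label that either gets promoted or stays put. This is a toy in Python, not a git command: the `promote` function, the ref names, and the `tests_pass` callback are assumptions made for illustration.

```python
def promote(refs, candidate, mainline, tests_pass):
    """Move `mainline` to the candidate commit if tests pass, then drop the label.

    Returns True if the mainline advanced. On failure the candidate label is
    kept, so the lagging (or forking) state stays visible.
    """
    commit = refs[candidate]
    if not tests_pass(commit):
        return False
    refs[mainline] = commit
    del refs[candidate]
    return True


# T tracks its own idea of libXXX's mainline, which may lag the library's own.
refs = {"T/libXXX-mainline": "x41", "T/libXXX-candidate": "x42"}
advanced = promote(
    refs,
    "T/libXXX-candidate",
    "T/libXXX-mainline",
    tests_pass=lambda commit: commit == "x42",  # stand-in for a real test run
)
```

After a successful promotion only the mainline ref remains, which is the "no longer clutters our listings" property.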


---+ [Contreras 2011] and [Contreras 2012] 


[Contreras 2011] and [Contreras 2012] provided good comparisons of the Git and Mercurial branching mechanisms.  But Contreras is fairly rabid about git, and makes many statements of the form "Why would anyone ever need to do it in that way?  There's a different way to do it in git. Or, you should not need to do it - I never have." That sort of statement pisses me off, even when I agree with it.

[Contreras 2011] Reacting to Google’s analysis  comparing Hg with Git, that says that History is Sacred.
This was an invalid argument from the beginning. Whether history is sacred or not depends on the project, many Git projects have such policy, and they don’t allow rebases of already published branches. You don’t need your SCM to be designed specifically to disallow your developers to do something (in fact rebases are also possible in Mercurial); this should be handled as a policy. If you really want to prevent your developers from doing this, it’s easy to do that with a Git hook. So really, Mercurial doesn’t have any advantage here.

GLEW COMMENT:


(1) I agree: it MUST be possible to change history. 


(1.1) Or at least to be able to remove some things from the history, e.g. it must be possible to remove code that you do not have a license for, that was inappropriately checked into your repo.  Or possibly code that you HAD a license for at some point in time, but for which the license expired.

I would prefer it if the code with license problems was removed, but some sort of note left behind.  Possibly an automated note, e.g. with a crypto checksum/hash and other metadata, so that you could determine what the missing code should be if you ever again have a license.
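A sketch of what such an automated note might hold, in Python. The field names and the `tombstone` / `matches` helpers are hypothetical; the only real API used is `hashlib.sha256` from the standard library. The note keeps a hash and size but none of the removed content.

```python
import hashlib


def tombstone(path, content: bytes, reason: str) -> dict:
    """Build the note left behind when `content` is removed from history."""
    return {
        "path": path,
        "sha256": hashlib.sha256(content).hexdigest(),
        "size": len(content),
        "reason": reason,
    }


def matches(note: dict, candidate: bytes) -> bool:
    """Check whether re-obtained content is the content that was removed."""
    return (
        len(candidate) == note["size"]
        and hashlib.sha256(candidate).hexdigest() == note["sha256"]
    )


removed = b"/* third-party code, license expired */\n"
note = tombstone("vendor/foo.c", removed, "license expired")
```

If the license is ever regained, `matches(note, new_copy)` tells you whether the copy you were handed is byte-for-byte what used to be in the repo.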

But I can also imagine the need to hide one's tracks: to completely expunge all mention of the unlicensed code.  Trying to avoid lawsuits.

(1.2) Plus, I like the good history rewriting stuff like rebase.


(1.2') Even better if we can change our view of the history, without losing the history

BUT...  I really would prefer that rebase did not lose history.   I think that it can sometimes be useful to know that a branch started off with a different original base, and was rebased later.  If nothing else, it can explain bugs caused by the rebased code using an idiom that was otherwise eliminated between original base and the new rebase's origin.  I think of this as an original branch, and a rebase'd shadow of that original branch.

Yes, clutter:  But I think that we need to create a UI that hides such clutter, that presents only the clean history, but which remembers all the dirty details.
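One way to picture "rebase that remembers" is the Python toy below. The `rebased_from` back-link and the `hidden` flag are hypothetical fields, not something git or Mercurial is claimed to provide in this form; the point is only that the originals stay in the store and the default view skips them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    cid: str
    parent: Optional[str]
    rebased_from: Optional[str] = None  # hypothetical back-link to the original
    hidden: bool = False                # kept in the store, omitted by default


def rebase(store, chain, new_base):
    """Copy `chain` (oldest first) onto `new_base`; hide the originals, keep them."""
    parent = new_base
    new_ids = []
    for cid in chain:
        new_id = cid + "'"
        store[new_id] = Commit(new_id, parent, rebased_from=cid)
        store[cid].hidden = True
        parent = new_id
        new_ids.append(new_id)
    return new_ids


def visible(store, show_hidden=False):
    """Default view shows the clean history; show_hidden reveals the shadows."""
    return sorted(c.cid for c in store.values() if show_hidden or not c.hidden)


store = {
    "a": Commit("a", None),
    "b": Commit("b", "a"),
    "t1": Commit("t1", "a"),   # task branch originally based on a
    "t2": Commit("t2", "t1"),
}
new_ids = rebase(store, ["t1", "t2"], "b")
```

`visible(store)` lists only a, b and the rebased copies; `visible(store, show_hidden=True)` also lists t1 and t2, and each copy's `rebased_from` says which original it shadows.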

[Contreras 2011]   It’s all in the branches ... Say I have a colleague called Bob, and he is working on a new feature, and create a temporary branch called ‘do-test’, I want to merge his changes to my master branch, however, the branch is so simple that I would prefer it to be hidden from the history.

GLEW COMMENT:  so hide it already.   Hide = leave in the history, but don't show it by default.  As opposed to removing it from the history.

[Contreras 2011]  hg branch != git branch In Git, a branch is merely one of the many kinds of ‘refs’, and a ‘ref’ is simply a pointer to a commit. ... In Mercurial, a branch is embedded in a commit; a commit done in the ‘do-test’ branch will always remain in such a branch. This means you cannot delete, or rename branches, because you would be changing the history of the commits on those branches. You can ‘close’ branches though. As Jakub points out, these “named branches” can be better thought as “commit labels”.
GLEW COMMENT:  Key: git branches are just refs.  Specifically, the ref to the tip of what other models call a branch.  AFAICT there is not much distinguishing a git branch from other refs.  
There should be different types of ref.   E.g. a named ref, i.e. a VERSION of all files.   Some VERSIONS are intended to be fixed, immutable - e.g. "Passes-all-tests-date-YYYY-MM-DD-HH". Other VERSIONS "float" - e.g. "Passes-all-tests-LATEST".   
But such a version named ref is very different from a branch.  A branch is a set of versions, that probably have some parent-child relationship. I.e. a (contiguous) path through the DAG.
[Contreras 2011] In Mercurial, a branch is embedded in a commit; a commit done in the ‘do-test’ branch will always remain in such a branch. This means you cannot delete, or rename branches, because you would be changing the history of the commits on those branches. 
Bullshit. Obviously Mercurial has history rewriting tools, that can do things like deleting or renaming branches.
But, an important point underlies the git-centricity:  Mercurial records the branch a commit was made on in the commit metadata.  By default.  Obviously git can also do this - see [StackOverflow 2015 - add Git branch name to commit message] - but it does not do so by default.
"By default" matters.  One of Glew's Rules: First provide the capabilities. Then design the defaults. Git may provide the capabilities.  But many properties are implicit, convention, in git.  Not first class.
And, yes, branches may need to be renamed. (Although as usual I would like to be able to rename, but also remember the old name).   For gitters that have added branch names to the commit message, you could edit all the commit messages.  But if the branch name is typed metadata, standardized, it could be automatically recognized and renamed.
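A sketch of the difference typed metadata makes, in Python. The commit records, the `branch` field, and the `former_branches` field are assumptions for illustration, not any real VCS's storage format.

```python
def rename_branch(commits, old, new):
    """Rename a branch recorded as typed metadata, remembering the old name.

    `commits` is a list of dicts with a "branch" field; "former_branches"
    is a hypothetical field that keeps earlier names.
    """
    changed = 0
    for commit in commits:
        if commit["branch"] == old:
            commit.setdefault("former_branches", []).append(old)
            commit["branch"] = new
            changed += 1
    return changed


commits = [
    {"id": "c1", "branch": "master", "message": "Initial commit"},
    {"id": "c2", "branch": "do-test", "message": "Try do-test idea"},
    {"id": "c3", "branch": "do-test", "message": "More tests"},
]
renamed = rename_branch(commits, "do-test", "issue-24334")
```

The free-text message of c2 still mentions "do-test" and is left alone. With the name buried in the message instead, a tool would have to guess which occurrences of the string were the branch label and which were prose.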
GLEW COMMENT:  since Git's "branches" are really just the tips of a branch, the set of versions on the branch is really the set of ancestors. Whereas Mercurial's branches, labelled in the commit history, indicate path taken for different reasons.
[Contreras 2011]  I paraphrase: "Mercurial bookmarks are like git refs (but with no namespace support)."

One poster said that "Mercurial really wants a linear history".   But the git advocates' examples often rewrite a nonlinear history like
 
to a linear history
Seems to me like the gitters want a linear history, and delete (not hide) the non-linearities.
TBD: put an example of what I mean: messy history, and linearized "clean" view.
GLEW COMMENT: I was pissed the first time I created a task branch in git, and then merged. In CVS and Mercurial (and probably others) I expected and wanted to see a node on the master saying "merged task branch".  Even if there had been no intervening changes on the master.  Instead git just pointed the master's HEAD to the task branch - i.e. the task branch lost its identity.  Better to have done [StackOverflow 2015 - add Git branch name to commit message] !!! - if the task branch name was the bug number.  (Yeah, yeah, you can just add a hook.  Everything can be hooked. Yeah, yeah.  (That's an example in English of a double affirmative being a mocking negative.))

(Eventually learned about git merge --no-ff, which disables "fast forwarding" on merges.)
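The two behaviours can be modelled on a toy graph. This Python sketch is not git's merge algorithm: conflicts are ignored, the `merge` function and commit ids are made up, and the `no_ff` argument merely imitates what the --no-ff option asks for.

```python
def merge(parents, refs, target, source, no_ff=False):
    """Merge `source` into `target` in a toy DAG (conflicts are ignored)."""

    def ancestors(tip):
        seen, stack = set(), [tip]
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(parents[c])
        return seen

    if refs[source] in ancestors(refs[target]):
        return None  # nothing to merge: target already contains source
    if refs[target] in ancestors(refs[source]) and not no_ff:
        # Fast-forward: just move the pointer; no node records the merge.
        refs[target] = refs[source]
        return None
    merge_id = f"merge({source})"
    parents[merge_id] = [refs[target], refs[source]]
    refs[target] = merge_id
    return merge_id


def fresh():
    # master has not moved since the task branch was created.
    parents = {"m1": [], "t1": ["m1"], "t2": ["t1"]}
    refs = {"master": "m1", "bug-24334": "t2"}
    return parents, refs


p1, r1 = fresh()
ff_result = merge(p1, r1, "master", "bug-24334")
p2, r2 = fresh()
noff_result = merge(p2, r2, "master", "bug-24334", no_ff=True)
```

In the fast-forward case master simply ends up pointing at t2 and no node mentions the task branch. In the no_ff case a merge node with two parents appears on master, which is the "merged task branch" node I was expecting.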

[Contreras 2012] The fundamental difference between mercurial and git branches can be visualized in this example:
[Figure: Merge example]
In which branches is the commit ‘Quick fix’ contained? Is it in ‘quick-fix’, or is it both in ‘quick-fix’ and master? In mercurial it would be the former, and in git the latter. (If you ask me, it doesn’t make any sense that the ‘Quick fix’ commit is only on the ‘quick-fix’ branch)
In mercurial a commit can be only on one branch, while in git, a commit can be in many branches (you can find out with ‘git branch --contains‘). Mercurial “branches” are more like labels, or tags, which is why you can’t delete them, or rename them; they are stored forever in posterity just like the commit message.
GLEW COMMENT: Yes, this is a key difference.
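The two answers can be put side by side on a toy version of the merge example. The graph shape below is an assumption about what the figure shows (an initial commit, 'Quick fix' on quick-fix, and a merge on master); the ids, the `hg_branch` field, and both helper functions are hypothetical.

```python
# Toy version of the merge example: 'Quick fix' is made on quick-fix,
# then merged into master. Ids are made up.
commits = {
    "init":  {"parents": [],              "hg_branch": "master"},
    "fix":   {"parents": ["init"],        "hg_branch": "quick-fix"},
    "merge": {"parents": ["init", "fix"], "hg_branch": "master"},
}
git_refs = {"master": "merge", "quick-fix": "fix"}


def reachable(tip):
    """All commits reachable from `tip` through parent links, tip included."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(commits[c]["parents"])
    return seen


def git_branches_containing(commit):
    """Rough analogue of `git branch --contains`: refs that can reach the commit."""
    return sorted(name for name, tip in git_refs.items() if commit in reachable(tip))


def hg_branch_of(commit):
    """Mercurial-style answer: the single label stored in the commit itself."""
    return commits[commit]["hg_branch"]
```

For the "fix" commit the git-style question returns both master and quick-fix, while the Mercurial-style question returns only quick-fix. Neither is wrong; they are answers to different questions (reachability versus where the commit was made).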
We might talk about branches and sub-branches.  'Quick-fix' is a sub-branch of 'master'.
There might be branches  or paths that start off in the 'master' branch, and end up in 'some-other-branch'. Such a "crossing-branch" is not really a sub-branch at all.
In fact, a branch that is merged and then terminated is no longer a branch at all.  At least, in trees, branches usually do not start off low down in the trunk, and then merge back into the trunk.  Although this can be arranged by grafting.  Hortitorture.
I would like to have better terms.  "Streams" can diverge and recombine, but "streams" are too dynamic. "Paths" may be a better term, although paths can be bidirectional, and version control systems usually go forward in time. Paths can fork and merge.  Paths may be created out of distinct stepping stone nodes.
(Hmm: railway "tracks" might be even better than paths. Similarly bidirectional. Tracks can fork and merge. Tracks can be shunts. Sidings. Tracks have railway ties => rather like nodes.)
(Or possibly roadways. Networks of one way streets.  Side streets, dead ends, cul de sacs.  Multiple lanes, that may be divided - rather like the parallel streams we so often see. Service roads running beside major highways. ...)

(Later: perhaps "routes", as in rock-climbing?  Rock-climbing routes are usually, mostly, one-way.  Although I like downclimbing, most people rappel down; and down-climbing is different enough that down-climbing routes are frequently not the same as up-climbing routes.)

(Or, how about ski-trails?  Again, mostly one-way, downhill in this case.)
But "branches" are the term most people use.  Even though many people have different ideas about what a branch means.
So back to talking about branches and sub-branches. 'Quick-fix' is a sub-branch of 'master'. 'Quick-fix' is one of the two paths that lead from the initial commit to the head of the master path above. The checkin "Quick fix" is on the branch(path) "quick-fix", and leads to the node "Merge branch quick-fix" on branch "master".
AFAICT git has no concept of a branch as a path, i.e. a contiguous directed linear subset of nodes, as distinct from the set of all nodes/paths leading to a node.
Much of [Contreras 2012] amounts to confusion about these concepts.
And then piling on immutability, Mercurial's recording branch in the commit metadata.



[Figure: Merge example]
GLEW COMMENT: the way this graph is drawn is biased towards git's model, where the branch is designated by its youngest node.   TBD: draw with 2 or more nodes on each path.  Color the sets of nodes on each path as the branch.

[Contreras 2012] Anonymous heads are probably the most stupid idea ever; in mercurial a branch can have multiple heads. So you can’t just merge, or checkout a branch, or really do any operation that needs a single commit.
One of GLEW'S OBSERVATIONS: the most important thing is to be able to give something a name.  The next most important thing is to not be required to give it a name.
Mercurial's anonymous heads can be a pain.  Just like arithmetic zero.

[Contreras 2012] Git forces you to either merge, or rebase before you push, this ensures that nobody else would need to do that


[Contreras 2012]
I didn’t ask for a list of all the commits that are currently included in the head of the branch currently named ‘release’ that are not included in the head of the branch currently named ‘master’. I wanted to know what was the name of the branch on which the commit was made, at the time, and in the repository, where it was first introduced.
How convenient; now he doesn’t explain why he needs that information, he just says he needs it. ‘git log master..release‘ does what he said he was looking for.
Pissant arrogance, lack of imagination. Here's an example of why you might want the branch name: some workflows put a BugFix#, Issue#, or ECO#, in the branch name.
Sure, there are other ways to do that, both in git and other VCSes.
But: it's a convention, as are, usually, those other ways.
Here's another way of thinking about compatibility between VCSes: it would be nice if procedures and concepts ported.  It would be nice if you could import from, say, Mercurial to git, and then export back to Mercurial, and get (almost) exactly the same repo.


Some articles and references

[StackOverflow 2009] := StackOverflow: Pros and Cons of Different Branching Models in DVCS

[Brad 1998] := Streamed Lines: Branching Patterns for Parallel Software Development TBD notes

[Contreras 2011] := Mercurial vs Git - It's All in the Branches.  Nice overview, although Git biased.

[Contreras 2012] := No, mercurial branches are still not better than git ones; response to jhw’s More On Mercurial vs. Git (with Graphs!)

[StackOverflow 2015 - add Git branch name to commit message]

[StackOverflow 2009 Jakub] := Git and Mercurial – Compare and Contrast - much liked by [Contreras 2011] TBD - notes

TBD: J. H. Woodyatt’s blog posts:
Why I Like Mercurial More Than Git
More On Mercurial vs. Git (with Graphs!)

[StackOverflow 2010 - Branch descriptions in git] - especially interesting to me because, along with mention of the then new branch description feature, VonC discusses shortcomings of that feature, and use of text files as a not-really-satisfactory but possibly better alternative.