Disclaimer

The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Iagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Thursday, October 25, 2012

Multiple paths of history

I have blogged, whined, ranted and complained about systems that allow you to, nay, encourage you to, edit history.  But I also whine about systems that do not allow you to do something as trivial as fix a typo in a checkin message, let alone systems like Mercurial that do not really allow you to add properties like "passes the days-long QA test suite" to a version of the code in the repository.

Moreover, I really do understand why you want to rearrange history. My classoc example is of interleaved changes

  V0---(a1)--->Va1---(b1)--->Va1b1---(a2)--->Va2b1---(b2)--->Va2b2 = Vab

I.e. starting off at version V0, apply changeset (a1) to get version Va1, then changeset (b1) to get version Va1b1, and so on until you get final version Va2b2, which we will call simply version Vab.

I.e. there are really two independent changes going on, (a)=(a1)+(a2) and (b)=(b1)+(b2).  But they got interleaved.

Try as you might, this happens.

Using history rewriting tools like git rebase and hg graft you can rearrange the histories.


  V0---(a1)--->Va1--->(a2')--->Va --->(b1')--->Vab1'---(b2')--->Vab

or in the other order


  V0---(b1")--->Vb1"--->(b2")--->Vb --->(a1")--->Va1"b---(a2")--->Vab

Note that I have tried to indicated by priming, ' and ", that the changesets applied in different order may not be identical. Note: we are not talking about applying the SAME changesets in a different order, which might give different final results Vab, but instead calculating MODIFIED changesets (a1"), (b1'), etc. to give the same final result Vab.

Compactly, letting (a)=(a1)+(a2) and (b)=(b1)+(b2)

  V0---(a)--->Va--->(b)--->Vab

or in the other order


  V0---(b)--->Vb--->(a)--->Vab

I know how to do this with standard history editing tools.

---



But this is problematic when others have pulled and cloned and made modifications to the intermediate pre-history edited versions like Va2b1.  In Mercurial, if that version is deleted from the repository, you may not be able to merge back in - or, when you merge, it may revive the deleted version from the dead.

This is bad.

Moreover, this "you must edit history before pushing" philosophy leads to people delaying sharing code.  Which is bad.

---

I have been thinking more and more about recording multiple paths of history in the repo.

I.e. pushing first

    V0---(a1)--->Va1--->(a2')--->Va --->(b1')--->Vab1'---(b2')--->Vab

and then later, at your leisure, going and rearranging the history so that it makes sense:


    V0---+---(a1)--->Va1--->(a2')--->Va --->(b1')--->Vab1'---(b2')--->Vab
               \                                                                                   ||
                +-----------(a)---------->Va--------->(b)--------------------->Vab

I.e. the final nodes Vab are equivalent.

Or, pushing my ascii art skills

    V0---+---(a1)--->Va1--->(a2')--->Va --->(b1')--->Vab1'---(b2')---+--->Vab
               \                                                                                       /
                +-----------(a)---------->Va--------->(b)-------------------------+


(I think I like the equivalence || drawing better.)


Still later you may create a second alternate  history:

    V0---+---(a1)--->Va1--->(a2')--->Va --->(b1')--->Vab1'---(b2')--->Vab
               \                                                                                   ||
                +-----------(a)---------->Va--------->(b)--------------------->Vab
                  \                                                                                ||
                   +-----------(b')---------->Vb--------->(a')------------------>Vab
   
The basic idea is that when you establish that you are creating alternate history paths, the operations are constrained to create the same final results.

And then, for those who want a simple history, you hide the paths, the changesets and the versions, that you do not want to appear by default.  But they still are recorded in the history, so that if somebody made modifications based off an intermediate version, you can still merge meaningfully.   You get to decide if that merge appears as if based of a version that is currently hidden, or if you want to roll up the visibility of its ancestors.