The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Tuesday, October 09, 2012

Why rewrite history? Why disentangle?

Why ever rewrite history in a version control system?

Linus Torvalds does not need or want to see a thousand different branches, one for each contributor.
Linus has a longer post somewhere, along the lines of "the history should show how the code should have been written, not how it actually was written".
Basically, to make things understandable.  

Because in real life, in the actual development history, changes get entangled. Not nicely

   A -> B -> C

but instead

    A1 -> B1 -> C1 -> B2 -> A2 -> B3 -> C2 -> ...

which we might like to see reordered as

    A1 -> A2 -> B1 -> B2 -> B3 -> C1 -> C2

and smashed to 

     A -> B -> C
          A = A1 -> A2, etc.
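
As a minimal sketch of that reorder-then-squash step, assume each commit carries a label such as A1 whose leading letter names its topic (a hypothetical convention, used only for this illustration). The helper names topic, reorder and squash are likewise made up here. A stable grouping then gives both the reordered and the squashed history:

```python
from itertools import groupby

def topic(label):
    # Assumption: the topic is the label with trailing digits stripped ("B2" -> "B").
    return label.rstrip("0123456789")

def reorder(history):
    # sorted() is stable, so commits keep their relative order within a topic.
    return sorted(history, key=topic)

def squash(history):
    # Collapse each run of same-topic commits into one (topic, members) entry.
    return [(t, list(run)) for t, run in groupby(history, key=topic)]

actual = ["A1", "B1", "C1", "B2", "A2", "B3", "C2"]
reordered = reorder(actual)
squashed = squash(reordered)
print(reordered)
print(squashed)
```

This only shuffles labels. Whether the underlying patches can really be applied in the new order is a separate question.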

If the changes (patches) are operators that commute, Darcs-style, all well and good.  But many patches don't commute: logically they should, but in practice they do not.


Note that this entangling can be at file granularity, but is more painful when within the same file.  Worse still if in the same line of code.
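
To make the commuting question concrete, here is a toy model, not Darcs's actual patch theory. The assumptions: a state is a dict from (file, line) to text, and a patch maps a location to (expected old text, new text). The names Conflict, apply and commutes are hypothetical, as are the file names.

```python
class Conflict(Exception):
    pass

def apply(state, patch):
    # A patch maps a location to (expected_old_text, new_text).
    result = dict(state)
    for loc, (old, new) in patch.items():
        if result.get(loc) != old:
            raise Conflict(loc)
        result[loc] = new
    return result

def commutes(state, p, q):
    # True if p and q apply in either order and give the same outcome.
    try:
        return apply(apply(state, p), q) == apply(apply(state, q), p)
    except Conflict:
        return False

base = {("f.c", 1): "int x;", ("g.c", 1): "int y;"}
a1 = {("f.c", 1): ("int x;", "long x;")}
b1 = {("g.c", 1): ("int y;", "long y;")}
b2 = {("f.c", 1): ("long x;", "long x = 0;")}  # edits the line a1 produced

print(commutes(base, a1, b1))  # patches in different files
print(commutes(base, a1, b2))  # patches on the same line
```

In this model a1 and b1 can be swapped freely, while b2 cannot move ahead of a1 because it expects the text that a1 wrote.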

Hey - "entanglement".  Darcs is inspired by physics, right?


OK, so we want to rewrite history.  But the danger in rewriting history is that we might not end up where we want to.

More and more I am thinking that it would be a good idea to rewrite history in a lattice defined by the actual history.

E.g. if we have (using dA1B1 etc. to indicate the difference between states A1 and B1)

   0 -d0A1->  A1 -dA1B1-> B1 -dB1C1-> C1 -dC1B2-> B2 -dB2A2-> A2 -dA2B3-> B3  = FINAL

and we want

   0 -d0A-> A -dAB-> B -dBC-> C = FINAL

then we should constrain the rewritten history to arrive at the same final state.

I.e. the actual history and the rewritten history (virtual, there for understanding) should be considered alternate paths to the same final state.

Imposing this constraint may be helpful in history rewriting tools.
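
A minimal sketch of what such a check might look like, assuming patches are modeled as plain functions from state to state and a state is a dict. The names replay, same_final_state and set_key are hypothetical helpers for this illustration, not part of any real tool.

```python
from functools import reduce

def replay(initial, patches):
    # Apply each patch (a function from state to state) in order.
    return reduce(lambda state, patch: patch(state), patches, initial)

def same_final_state(initial, actual, rewritten):
    # The constraint: both paths must end at the same FINAL.
    return replay(initial, actual) == replay(initial, rewritten)

def set_key(key, value):
    # Hypothetical patch constructor: returns a patch that sets one key.
    return lambda state: {**state, key: value}

# Actual, entangled history: 0 -> A1 -> B1 -> A2
actual = [set_key("a", 1), set_key("b", 1), set_key("a", 2)]

# Rewritten history: 0 -> A -> B, with A1 and A2 squashed into A
good = [set_key("a", 2), set_key("b", 1)]
bad = [set_key("a", 1), set_key("b", 1)]  # the squash lost the A2 change

print(same_final_state({}, actual, good))
print(same_final_state({}, actual, bad))
```

In Git terms, the analogous check would be that the tips of the original and rewritten branches point at the same tree.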

Representing these alternate paths may also be helpful. Sure, present the elegant history that Linus wants, but also preserve the grotty history that... well, historians like me want.


When rewriting history manually I find that the FINAL state keeps changing.  Often rewriting history exposes issues that were not seen in the original path.

   0 -d0A1-> A1 ... -> B3 = FINAL -dPostFinal-> FINAL'
    \
     -d0A-> A -dAB-> B -dBC-> C = FINAL'

A moving target, perhaps, but still alternate paths to the same ultimate final state.
