The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Thursday, September 13, 2012

Partial checkins, file objects not necessary

I very much want partial checkins and checkouts in my version control systems.  I had them in SVN and CVS.  I lost them with DVCS.

I have posted elsewhere about DVCS compatible concepts that might support partial checkins and checkouts.

Partial checkouts are not that hard to imagine.  E.g. to check out a subdirectory tree, scan the repo for all history relating to files that ever were in that subdirectory.
     Use whatever "history relating to" criteria you want: the change was in a file in that subdirectory, or was in a file that at some point in its history was moved under that subdirectory.  Possibly the hypothetical git-like "looks like a piece of code that once lived in a file under that subdirectory". Just not "true always".
     One might provide the full history of all such files in the partial checkout or clone, although it might not be possible to check out some versions from the partial clone's history into the partial clone's workspace, since to be properly checked out some versions would have to live outside the cloned subdirectory.  But one could at least access them from the history.
     Some fun in ensuring that one can check out into the workspace different versions that lived at different places in the overall directory tree - i.e. where the whole subtree was moved.
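A rough sketch of the "ever lived under that subdirectory" criterion, in Python.  Everything here is hypothetical: the `Change` record, the stable `file_id` that follows a file across renames, and the function names are assumptions for illustration, not any real VCS's data model.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Change:
    rev: int       # revision number in the full repo
    file_id: str   # hypothetical stable identity that follows a file across renames
    path: str      # path the file had at this revision


def under(path, subdir):
    """True if path is subdir itself or lies somewhere below it."""
    p, s = PurePosixPath(path), PurePosixPath(subdir)
    return p == s or s in p.parents


def partial_history(changes, subdir):
    """Keep every change to any file that was ever under subdir."""
    wanted = {c.file_id for c in changes if under(c.path, subdir)}
    return [c for c in changes if c.file_id in wanted]


def can_materialize(change, subdir):
    """A version can go into the partial workspace only if its path is inside subdir."""
    return under(change.path, subdir)
```

With this criterion a file that started in lib/ and later moved into src/ brings its lib/ history along into a partial clone of src/, but the lib/ versions are history-only: `can_materialize` says no for them.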

Partial checkins are a bit more of a problem, given the desire for atomicity and consistency: one wants to at least have a chance to ensure that an entire repo is consistent and can pass all tests. Automatically including a partial checkin in the default trunk of the enclosing project would break this.
     The basic idea is for partial checkins to automatically create branches.  And then for the superrepository to be strongly encouraged to merge such partial checkin branches into the default trunk (or whatever branch was partially checked out from).  Somewhat like the way Mercurial automatically creates heads, anonymous branches, when there were conflicting edits to the same branch.  Now one would have heads, but not all heads would include the entire repo; some might be for partial checkins.
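To make the branch-per-partial-checkin idea concrete, here is a minimal sketch.  The `Repo` and `Head` classes, the `scope` field, and the `tests_pass` callback are all invented for illustration; this is not how Mercurial or any existing tool represents heads.

```python
from dataclasses import dataclass, field


@dataclass
class Head:
    name: str
    parent: str   # branch the partial checkout was taken from
    scope: str    # subdirectory this head covers
    files: dict   # path -> new content, all inside scope


@dataclass
class Repo:
    trunk: dict = field(default_factory=dict)   # path -> content
    heads: list = field(default_factory=list)   # unmerged partial-checkin heads

    def partial_checkin(self, scope, files):
        """Record the checkin as a new head; the trunk is left untouched."""
        prefix = scope.rstrip("/") + "/"
        outside = [p for p in files if not p.startswith(prefix)]
        if outside:
            raise ValueError(f"paths outside {scope!r}: {outside}")
        head = Head(f"partial-{len(self.heads) + 1}", "trunk", scope, dict(files))
        self.heads.append(head)
        return head

    def merge(self, head, tests_pass):
        """Fold a partial head into trunk only if the whole tree passes."""
        candidate = {**self.trunk, **head.files}
        if not tests_pass(candidate):
            return False
        self.trunk = candidate
        self.heads.remove(head)
        return True
```

The point of the sketch is only the ordering: the checkin always succeeds and always lands on its own head, and the whole-repo consistency check happens later, at merge time, in the superrepository.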

What has long troubled me, however, is that the easiest way to imagine building this is to create history objects that correspond to files.  A project version would be a mapping of pathnames within the repo to a set of history objects. A project might be defined to want to include the latest version of any particular history object going forward, but not to actually include it until the user has had a chance to test everything together.
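A sketch of that file-shaped design, just to have something to point at.  `HistoryObject`, `ProjectVersion`, and the `tests_pass` callback are hypothetical names; each path is pinned to one version of its history object, and the pins only move once the combined tree has been tested.

```python
from dataclasses import dataclass, field


@dataclass
class HistoryObject:
    versions: list = field(default_factory=list)   # successive contents of one file

    def latest(self):
        return len(self.versions) - 1


@dataclass
class ProjectVersion:
    pins: dict   # path -> (HistoryObject, pinned version index)

    def pending(self):
        """Paths whose history object has moved past the pinned version."""
        return {path: h.latest()
                for path, (h, v) in self.pins.items() if h.latest() > v}

    def advance(self, tests_pass):
        """Return a new project version at the latest pins, if the tree passes."""
        candidate = {path: (h, h.latest()) for path, (h, v) in self.pins.items()}
        tree = {path: h.versions[v] for path, (h, v) in candidate.items()}
        return ProjectVersion(candidate) if tests_pass(tree) else self
```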

This troubles me because, while I think Linus is excessively pedantic for forbidding rename tracking in git, I agree with him that file rename tracking is primitive. I really do want to track functions as they are cut and pasted between files.
     I have long been fascinated by bits and rumours I have heard about non file based version control systems.  E.g. old systems that were conceptual card decks - where you didn't lock a file, but you checked out and in ranges of cards, potentially replacing a range with a larger or smaller number of cards.  Or IBM Visual Age for Smalltalk/Java/C++, which seems not to be inherently tied to file granularity.

History objects corresponding to files get in the way of this.

I think now I can see how to generalize this.  I often imagine a filesystem as an XML database - the hierarchy is natural. The old "version control of card decks" applies. One can imagine the patches for any subtree being under that subtree in the XML.
     OK, XML is horrible.  But it just shows the direction. Not "patch objects" that apply to the whole monolithic project, but patches that are themselves an interleaved collection of patches to files and subdirs.  With the ability to interleave and uninterleave patches according to filesystem structure.
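As a toy illustration of interleaving and uninterleaving, assume a patch is nothing more than a mapping from pathname to a list of hunks.  That representation and the function names are assumptions made for this sketch; a real design would nest per-subdirectory patches rather than use flat paths.

```python
def uninterleave(patch, subtree):
    """Split a patch (path -> list of hunks) into (inside, outside) the subtree."""
    prefix = subtree.rstrip("/") + "/"
    inside = {p: h for p, h in patch.items() if p.startswith(prefix)}
    outside = {p: h for p, h in patch.items() if not p.startswith(prefix)}
    return inside, outside


def interleave(*parts):
    """Recombine per-subtree patches into one patch, concatenating hunks per path."""
    merged = {}
    for part in parts:
        for path, hunks in part.items():
            merged.setdefault(path, []).extend(hunks)
    return dict(sorted(merged.items()))
```

Splitting a patch by subtree and interleaving the pieces again gives back the original mapping, which is the property a partial checkin would rely on: the piece for one subtree can travel alone and be recombined with the rest later.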
