Disclaimer

The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Iagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Friday, July 18, 2014

p4 partial checkouts, file sets, spatial and temporal

I like Perforce's partial checkouts - workspaces that do not need the whole depot - but it comes at a cost in speed.



With a whole-repo DVCS like hg or git, diffing can be quite fast:  you check the whole-repo version object.. If unchanged, that's it - you have essentially checked only a single file.  Whereas p4, because it seems to operate by assembling versions of individual file objects, has to check each.  IT complains when I try to reconcile a p4 workspace with >3000 files, saying "prune your workspace".



Something similar applies to diffs when files have changed.



---



I think my concept of "file-sets" or "versioned-object-sets" can help here.



A versioned-object-set can be the whole repo, or an individual file.  And possibly stuff in between, subsets, like directory trees.



Let's imagine that at the lowest level we have individual file-versioned-objects.   (This is not necessarily true - might have versioned-objects corresponding to parts of a file, or even parts of several different files (e.g. function definition and declaration.  But it's nice to have an atomic level to think about.)



A spatial set describes while file versioned objects are considered.  It might be a list of disjoint files,

or it might have predicates such as "the subtree under /path/subdir".



The spatial set's rules, that define what files are considered in the spatial set, may itself be versioned.  E.g. you may add or delete a subdirectory from a spatial set.



Note difference: adding or deleting a subtree from the whole repo (or from some other spatial set), versus adding or deleting it completely from the repo.

      "I don't want to see this subtree in this subset any more" (doesn't propagate to other sets overlapping the subset)

      Versus "remove from this set, and all other sets, going forward." (propagates to other spatial sets and subsets (when they decide to merge), but doesn't affect saved history

      Versus "remove content completely from the history"  (le.g. licensing problems)



Apart from versioning a spatial set's rules, the spatial set's contents, the list of files inside it, may be versioned.



Call that a versioned spatial set instance.



Partial checkins do not necessarily immediately affect a spatial set's contents when next updated.   But the spatial set may be directed to merge candidates.



I.e. a spatial set may be constructed from the latest trunk version of files specified by the spatial sets rules.



For that matter, a spatial set may be constructed from the last version of files on different versioning branches - e.g. the latest trunk version of files under main/...,  and a development branch version of files under library/...



In so doing, we are transitioning from spatial sets being "pure" spatial descriptions, to spatial-temporal sets, combinations of spatial and branch versioning descriptions.    Operations such as "all files spatially under subdir/..."  and "latest files on branch bbb..." and intersections and unions and other set operations thereof.



I dislike the world "spatial-temporal", since temporal seems to imply versions as of a precise time.



Better?:  "spatial" for file position, and "lineage", for things like "the latest of a branch".



---


This concept of a versioning sets enables us to have simple tracking branches that do not fully propagate changes, at least on file granularity.

E.g. if you have Branch1 and Branch2, each with corresponding READMEs, and you do not want to propagate README.Branch1 to Branch2, or vice versa.

Branch1 = common-spatial-set + README.Branch1
Branch2 = common-spatial-set + README.Branch2

checking stuff in on Branch1 may affect common-spatial-set and README.Branch1.

updating Branch2 receives changes to the common-spatial-set but not changes to README.Branch1.



Flip-side, we may clone Branch2 from Branch1, which will give us README.Branch1 in Branch2.   When we prune README.Branch1 from Branch2, we have two options - making it a deletion that propagates, or not.
 (Q: what does "propagation" look like?
     "Propagate a property to any new checkin - like branch."
     "Do not propagate property - tag that applies to a version"
     "Propagate across merges by default."
     "Do not propagate across merges by default - branch specific file".
     "Propagate to child branches, but not to parent branches..."
)

















No comments: