The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Thursday, August 14, 2014

Versioned Label Sets

I like labels in version control systems. Like "Compiles". "Passes Tests".  "Passes all tests except Test#44".   Status if you will.

Of course, such status must be applied to a set of files+versions.  Or a repo-project-version. Whatever.  (I will not use the whole-repo-project viewpoint here, since I am trying to think about partial checkouts. Whole-repo trivially fits - it is just a single entry, the repo version.)

You can think of a label as a file itself, containing a list of files and their version numbers.   Such a label-file might also contain branch info, etc. - i.e. more metadata.

Generalize to an arbitrary package of metadata associated with a set of files+versions.  "Labels" may be such that their name is the only metadata that matters.

Such a label-or-metadata-file can itself be versioned.  Must be, should be.
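
A minimal sketch of the idea in Python, purely illustrative: the class name LabelSet, its fields, and the example paths are all hypothetical and do not correspond to any real VCS. It treats a label as a versioned record holding a file-to-version map plus free-form metadata.

```python
from dataclasses import dataclass, field, replace

@dataclass(frozen=True)
class LabelSet:
    """A named set of file+version pairs plus arbitrary metadata (hypothetical)."""
    name: str
    entries: dict                                  # path -> version number
    metadata: dict = field(default_factory=dict)   # e.g. branch info
    version: int = 1                               # the label file is itself versioned

    def revised(self, new_entries):
        """Return the next version of this label with some entries changed."""
        merged = {**self.entries, **new_entries}
        return replace(self, entries=merged, version=self.version + 1)

# Hypothetical example: a status label over a partial checkout.
passes_tests = LabelSet("Passes Tests", {"src/a.c": 12, "src/b.c": 7})
passes_tests_v2 = passes_tests.revised({"src/b.c": 8})
```

Because each revision is a new immutable record, the older version of the label stays available, which is the point of versioning the label file.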

In fact, just about everything we care about can be considered a set of objects+versions, in a file, itself versioned.  

Branches may be defined by rules such as a list of filenames or patterns.  Possibly with versions that are frozen in the branch. 

OK, there is a difference: branch histories are graphs.  Steps along the history are the sets of objects+versions that most closely correspond to a label set.  

I.e. there are graphs whose nodes are objects+histories.  

Anyway... : the default action is where the difference arises.

When a workspace is checked out from a branch head, when trying to check in the default is to extend the branch. 

When a workspace is checked out from a label, the default is not to extend the label.

We can imagine interconversion: forcing a checkin to a label, making the label into a branch.
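
The difference in default action, and the interconversion, might be sketched like this in Python. The names Ref, checkin, and the force flag are assumptions made for illustration, not any existing tool's interface.

```python
from dataclasses import dataclass, field

@dataclass
class Ref:
    """Something a workspace was checked out from (hypothetical model)."""
    name: str
    kind: str                                     # "branch" or "label"
    history: list = field(default_factory=list)   # successive file+version sets

def checkin(ref, snapshot, force=False):
    """Apply the default check-in action for the kind of ref."""
    if ref.kind == "label":
        if not force:
            # Default for a label: do not extend it.
            raise PermissionError(f"{ref.name} is a label; not extended by default")
        # Forcing a check-in turns the label into a branch.
        ref.kind = "branch"
    # Default for a branch: extend it.
    ref.history.append(snapshot)
    return ref
```

Here the only thing distinguishing a branch from a label is the stored kind, which is what makes the forced conversion a one-line change.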


Who stores the linkages?

Label-sets may be marked inside a branch-graph file, or outside.

Outside allows non-privileged access.  Library users can label library versions.

Inside may be faster and more convenient.

It is important to be able to track the configuration of stuff that you are not allowed to write into the "home" VCS for. 

The DVCS people say "just clone", but that may not always be possible.

I may want to have a local repo, linked to a master repo without incorporating all, and be able to define cross repo actions.

Click through IP licensing

The p4ideax forums terms of use has some interesting details: http://p4ideax.com/terms-of-use

Starts off mild:

User Submissions and License By using P4IdeaX, you agree that any information you send to Perforce via P4IdeaX, including suggestions, ideas, materials, and comments, (collectively referred to as the "Materials") is non-confidential.

But then gets stronger:

 Furthermore, by submitting the Materials using IdeaX, you grant Perforce and its designees an irrevocable, unrestricted, perpetual, non-exclusive, fully-paid up and royalty free worldwide license to make, use, sell, import, modify, reproduce, transmit, display, perform, create derivative works, combine with other works, and distribute such Materials for any purpose whatsoever to the extent permitted by law. This license to Perforce includes the right for Perforce to sublicense these rights to third parties.
Perforce may be working on a same or similar idea at the time of your submission. You understand that we may continue to develop our own idea independent of your submission without acknowledging your Materials.
As part of its license to your Materials, Perforce may make modifications to, derivative works of, or improvements to your Materials. These modified or improved versions shall be owned exclusively by Perforce.
Submission under a Patent or Patent Application You agree to disclose to Perforce if your Materials are protected by a patent or subject to a pending patent application. If your Materials are not yet patented, but you wish to patent your idea in the future, you also agree to disclose this information to Perforce.

Now, I think that recent updates to US patent law mean that there is no grace period here. If you post to a pretty-much-public website like p4ideax, then you have made a public disclosure and may not patent.

If your Materials are patented, subject to a pending patent application, or you intend to file for patent protection, these Terms of Use will automatically grant Perforce a license under the terms of the previous section entitled User Submission and License. Such license may be superseded only by a separate written license or assignment agreement between you and Perforce.

This is interesting. What if the materials are not yours to license?  What if you are posting GPL'ed materials? I can imagine some lawyer arguing that because you did not specify GPL when you posted, then the GPL would not apply.
Posting your idea to P4IdeaX may impact your ability to protect your idea under patent laws. If your goal is to patent your idea, we suggest you consult with an attorney before posting your idea on IdeaX. You agree not to hold Perforce liable for any loss of patent protection.

This is the other side of ARM's "click-through licensing": to view ARM materials you have to promise not to use them to detect patent infringement.


As for p4ideax:  haven't registered yet.  

What about posting a link to a blog on my own site?  The link is licensed, but is the content I linked to licensed (I doubt it).


I guess my interest is left over from working at IV.

Perforce Software p4ideax | Intelligent symbolic links in the depot


I have also been looking for this "symlinks in depot".

It is possible that streams may do this - I may not totally grok streams yet (not helped by our IT forbidding us from using streams in P4, and highly discouraging branching (p4 branching support is, of course, primitive)).  But based on what I have seen so far, streams are much more complicated than what I want to do with symlinks.

Here is one of the use cases where I want to use depot side symlinks:

I want to merge two directories that have diverged versions of files.

Unfortunately, they are NOT branches.   The user who created them did not understand branching.   Instead, she copied the files outside perforce, and then added the copy as a separate set of files that, from Perforce's point of view, are totally independent, unrelated. (Fixing that is a separate topic.)  Call this a "fake branch".  (E.g. think about cp -R from to creating a fake branch of a directory tree - logically a branch, just one that your version control tool may not be able to figure out.)

Unfortunately^2 they are binary files that I can merge, but must do so by hand.  Painful. Slow.  I can't get the merge done all in one day.

So here is what I want to do: as I merge the several hundred files in the fake branch directory

- let's call the original

DEPOT/a/b/GoodDir

and the "fake branch"

DEPOT/c/d/FakeBranchDir

- I must leave the two directories GoodDir and FakeBranchDir around.

But as I merge files GoodDir/file1#666 and FakeBranchDir/file1#1 into GoodDir/file1#667,

I want to make FakeBranchDir/file1#2 into a "depot symlink" to GoodDir/file1

so thereafter anyone attempting to work with FakeBranchDir/file1 will get whatever the latest version of GoodDir/file1 is.

And I will do this one by one for all of the files.
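
To make the intended semantics concrete, here is a toy model in Python. It is only a sketch: Perforce has no such feature as far as this post knows, and the Depot class and its submit, symlink, and head methods are invented for illustration.

```python
class Depot:
    """Toy depot: each path maps to a list of revisions; #1 is index 0."""

    def __init__(self):
        self.revs = {}

    def submit(self, path, content):
        """Add an ordinary file revision and return its revision number."""
        self.revs.setdefault(path, []).append(("data", content))
        return len(self.revs[path])

    def symlink(self, path, target):
        """Add a revision that is a depot symlink to another depot path."""
        self.revs.setdefault(path, []).append(("link", target))
        return len(self.revs[path])

    def head(self, path, _seen=None):
        """Return the latest content of path, following depot symlinks."""
        seen = set() if _seen is None else _seen
        if path in seen:
            raise ValueError("symlink cycle at " + path)
        seen.add(path)
        kind, value = self.revs[path][-1]
        return self.head(value, seen) if kind == "link" else value

depot = Depot()
depot.submit("GoodDir/file1", "original")
depot.submit("FakeBranchDir/file1", "diverged copy")       # FakeBranchDir/file1#1
depot.submit("GoodDir/file1", "hand-merged")
depot.symlink("FakeBranchDir/file1", "GoodDir/file1")      # FakeBranchDir/file1#2
depot.submit("GoodDir/file1", "later edit")
latest = depot.head("FakeBranchDir/file1")                 # follows the link
```

The link is stored as just another revision of FakeBranchDir/file1, so the old diverged content remains in its history.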

(By the way, I can do this because I know the dependencies.  I.e. I can do continuous partial integration (merging, reconciliation).

Sometimes I have to do several files together atomically, but not the entire directory.)

When all of the files are merged, so that every file in FakeBranchDir/fileN is a "depot symlink" to GoodDir/fileN,

I can do the following:

* remove all FakeBranchDir/fileN depot symlinks, and make DEPOT/c/d/FakeBranchDir a depot symlink to  DEPOT/a/b/GoodDir

* potentially just plain remove FakeBranchDir completely, and stop the insanity of having unnecessary fake branches in the depot

Anyway... streams may do this, but they seem like overkill, plus IT has forbidden p4 streams. Heck, my team barely knows how to use branches - actually, I am strongly discouraged from using branches (but I am so used to branching...)

Lacking depot symlinks or other support, here is what I am doing:

+ Merging the files

+ Once merged, copying the files into BOTH GoodDir/file1 and FakeBranchDir/file1, etc.

+ hoping that nobody modifies the merged files separately, causing them to re-diverge.

   + unfortunately, not allowed to create a long-lived lock. Folks still want to edit in their diverged directories
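
Rather than only hoping, one could periodically check the already-merged files for re-divergence. The following Python sketch assumes both directories are synced into a local workspace and that a list of already-merged file names is kept somewhere; the function names and that list are hypothetical.

```python
import hashlib
from pathlib import Path

def digest(path):
    """SHA-256 of a file's bytes; works for binary files too."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def rediverged(good_dir, fake_dir, merged_names):
    """Return the merged file names whose two copies no longer match."""
    return [name for name in merged_names
            if digest(Path(good_dir) / name) != digest(Path(fake_dir) / name)]
```

Calling rediverged with the two workspace paths and the merged list returns the files that need to be reconciled again; an empty list means the copies are still identical.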

I have thought about using p4 branch mappings to accomplish the same thing as a "depot symlink", but that is a pain - I would have to edit the branch mapping every time a file GoodDir/fileK and FakeBranchDir/fileK were merged.

Basically, "depot symlinks" are just a way of allowing you to edit the branch mapping, without actually having to edit the mapping in a central place.   They are a "distributed" view of the branch mappings.


Now, yes, I know: this creates a "fragile base class" problem.   Somebody checking something into GoodDir/fileM might break FakeBranchDir/fileM (if it is a depot symlink), because the "context", the surrounding files, may break it in the FakeBranchDir context.   Yes, I realize that we really need to be using branches here (not p4's primitive branching, but some sort of branching for a partial subset of the depot - which may be what p4 streams are trying to do).  So that when somebody checks into GoodDir/fileM, FakeBranchDir/fileM can detect that it needs to be updated, but is not automatically updated until you have tested it in the FakeBranchDir context.

(Hmm, what this really means is that FakeBranchDir/fileM#2 may be a depot symlink to GoodDir/fileM (after some base revision)

FakeBranchDir/fileM#2-->GoodDir/fileM(#latest,validated=1011). Using notation to indicate that we are supposed to link to the latest, but at the last time of checkin that value was GoodDir/fileM#1011; as opposed to linking to FakeBranchDir/fileM#2-->GoodDir/fileM#1011, which would be a depot symlink, but one that is not normally updated by default.

     I.e., a depot symlink really wants to be a branch.  But it is a branch that you normally want to be encouraged to update as quickly as possible, perhaps by default, as opposed to having to do an explicit branch merge.)

     But, these are dreams for my own VCS.

     Just plain old depot symlinks, though, are a darn good first step.
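
The (#latest,validated=N) notation above could be modeled roughly as follows. This is a Python sketch only; DepotLink and its fields are made-up names, and revision numbers are assumed to be plain increasing integers.

```python
from dataclasses import dataclass

@dataclass
class DepotLink:
    """A depot symlink that remembers the last target revision it was tested against."""
    target: str                 # e.g. "GoodDir/fileM"
    validated: int              # last target revision validated in the linking context
    follow_latest: bool = True  # False models a pinned link such as GoodDir/fileM#1011

    def resolve(self, target_head):
        """Revision of the target that a sync through this link would fetch."""
        return target_head if self.follow_latest else self.validated

    def needs_validation(self, target_head):
        """True when the target has moved past the last validated revision."""
        return target_head > self.validated

link = DepotLink("GoodDir/fileM", validated=1011)
pinned = DepotLink("GoodDir/fileM", validated=1011, follow_latest=False)
```

With this split, a tool can follow the latest revision by default yet still flag that the FakeBranchDir context has not been retested since revision 1011.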