The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Monday, July 14, 2014

Transformations when moving changes between branches

I often want to have transformations automatically applied when I perform operations between branches.

Very simple example: I have occasionally had readmes for specific branches, that I want to live only in that branch. E.g. README.vcs-branch-name1, README.vcs-branch-name2

Therefore, when merging from branch1 to branch2, I do NOT want to transfer README.vcs-branch1.

But when doing a reverse merge from branch2 to branch1, I do not want to transfer README.vcs-branch2, and I especially do NOT want to delete README.vcs-branch1.

Mercurial's merge tracking will arrange to delete the README.vcs-branch1 file on the reverse merge, since leaving it out of branch2 was recorded as a deletion.  Bad, Mercurial.

You can think of this as a patch that is implicitly applied whenever there is a cross branch operation.  Patch may be too specific: possibly a programmed transformation expressed as code.

(Would also want to notify on cross-branch diffs about such transformations.)
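To make the idea concrete, here is a minimal Python sketch of such a transformation, written as code rather than as a patch. Everything in it is hypothetical: the BRANCH_LOCAL table, the transform_merge function, and the representation of a merge as a dict of path -> action are my own illustration, not anything Mercurial provides.

```python
import fnmatch

# Hypothetical rule table: patterns for files that live only in one branch.
BRANCH_LOCAL = {
    "branch1": ["README.vcs-branch1"],
    "branch2": ["README.vcs-branch2"],
}

def transform_merge(changes, source, target):
    """Filter the changes of a cross-branch merge.

    changes maps path -> action ("add", "modify" or "delete").
    Returns (kept, skipped): kept is what the merge should apply,
    skipped is what the transformation held back, so that a
    cross-branch diff can notify the user about it.
    """
    local = BRANCH_LOCAL.get(source, []) + BRANCH_LOCAL.get(target, [])
    kept, skipped = {}, {}
    for path, action in changes.items():
        if any(fnmatch.fnmatch(path, pat) for pat in local):
            skipped[path] = action   # branch-local: neither add nor delete
        else:
            kept[path] = action
    return kept, skipped
```

For the reverse merge from branch2 to branch1, the rule holds back both the addition of README.vcs-branch2 and the deletion of README.vcs-branch1, and lets everything else through.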


A contrived example: if tracking Linux installations, may want to change text in some control files.

E.g. some file may contain  a user name, like "UserThatRunsFooBar"

On one machine it may be FooBarUser.   On another it may be SamJones.

All of the rest of the diffs to the file may transfer, just not that variable name.

May want a different branch for the two systems.

Hence, a desire for a transformation applied whenever such a file is moved between the branches for the two systems.
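A rough Python sketch of what such a transformation might look like. The branch names machine-a and machine-b, the BRANCH_VARS table, and the function name are all made up for illustration, and the plain string replacement is deliberately naive: a real tool would need to know where in the file the variable is allowed to appear.

```python
# Hypothetical per-branch values for the logical name "UserThatRunsFooBar".
BRANCH_VARS = {
    "machine-a": {"UserThatRunsFooBar": "FooBarUser"},
    "machine-b": {"UserThatRunsFooBar": "SamJones"},
}

def move_between_branches(text, source, target):
    """Rewrite branch-specific values when file content crosses branches.

    Every other line of the file passes through untouched, so the rest
    of the diffs transfer as usual.
    """
    src_vars = BRANCH_VARS[source]
    dst_vars = BRANCH_VARS[target]
    for name, src_value in src_vars.items():
        text = text.replace(src_value, dst_vars[name])
    return text

control_file = "run_as=FooBarUser\ntimeout=30\n"
moved = move_between_branches(control_file, "machine-a", "machine-b")
```

Here moved keeps the timeout line as it was, and only the user name changes to the value for the target branch.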


Partial checkouts can then be considered to be branches with such transformations based on filesystem structure.

A partial checkout of a subtree may have the transformation rules:

* include all stuff under trees T1, T2, ...

* exclude all stuff not under those trees.
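The two rules above can be sketched as a path filter. This is only an illustration in Python; the function names are mine, and paths are assumed to be repo-relative with forward slashes.

```python
from pathlib import PurePosixPath

def in_partial_checkout(path, trees):
    """True if path is one of the trees or lies somewhere under one."""
    p = PurePosixPath(path)
    for tree in trees:
        t = PurePosixPath(tree)
        if p == t or t in p.parents:
            return True
    return False

def partial_checkout(paths, trees):
    """Include everything under the given trees, exclude everything else."""
    return [p for p in paths if in_partial_checkout(p, trees)]
```

Comparing path components rather than string prefixes matters: a file under T10 must not be picked up by a rule for T1.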

DVCS branches = sets++

It is a good idea to be able to identify "sets" of revisions. Both by predicate functions, and by tagging with names.

Branches are sets that automatically extend: when you do a checkin from a workspace with a parent set that is a branch, the checkin automatically gets added to the branch set.

This allows branches to converge and then diverge:

of course, a version can be tagged as being in multiple sets

similarly, a version can be tagged as being in multiple branches at the same time.

Two versions on different branches can merge, and the branches can be converged for a while.  But then later diverge.

This can be done on a file by file basis:  not just whole repo versions, but individual file versions.
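A toy model of "branches are sets that automatically extend", in Python. The class and method names are invented for this sketch. Revision ids are just strings here, but nothing stops them from being (file, version) pairs for the file by file case.

```python
from collections import defaultdict

class RevisionSets:
    """Named sets of revisions; branches are the sets that auto-extend."""

    def __init__(self):
        self.members = defaultdict(set)  # set name -> revision ids
        self.branches = set()            # names of the auto-extending sets

    def tag(self, name, rev):
        """Explicitly put a revision into a named set."""
        self.members[name].add(rev)

    def make_branch(self, name):
        """Mark a named set as a branch."""
        self.branches.add(name)

    def checkin(self, rev, workspace_sets):
        """Check in from a workspace whose parent sets are workspace_sets.

        Only the parent sets that are branches pick up the new revision.
        """
        for name in workspace_sets:
            if name in self.branches:
                self.members[name].add(rev)

    def sets_of(self, rev):
        return {name for name, revs in self.members.items() if rev in revs}

repo = RevisionSets()
repo.make_branch("A")
repo.make_branch("B")
repo.checkin("r1", {"A"})
repo.checkin("r2", {"B"})
repo.checkin("r3", {"A", "B"})        # merged: the branches have converged
repo.tag("release", "r3")             # a plain set, not a branch
repo.checkin("r4", {"A", "release"})  # diverging again, on A only
repo.checkin("r5", {"B"})             # ... and on B only
```

The merge r3 sits in both branches and in the release set; r4 lands in A alone, because release is only a tag and does not extend itself.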






file-sets - most meaningful when file-version-sets

named-file-sets => these are objects that can be versioned


set operations on named-file-sets

=> partial, union, difference
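A small Python sketch of these operations on named file-sets. The set names and file names are made up, and I am assuming here that "partial" means restricting a file-set by a predicate; other readings are possible.

```python
# Hypothetical named file-sets: name -> immutable set of paths.
filesets = {
    "docs": frozenset({"README", "doc/guide.txt"}),
    "core": frozenset({"README", "src/main.c", "src/util.c"}),
}

def union(a, b):
    """Files in either named file-set."""
    return filesets[a] | filesets[b]

def difference(a, b):
    """Files in a that are not in b."""
    return filesets[a] - filesets[b]

def partial(name, keep):
    """The part of a named file-set selected by the predicate keep."""
    return frozenset(f for f in filesets[name] if keep(f))
```

The results are anonymous file-sets; giving one a name is just another entry in the table, which is also the thing one would version.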

BTW, parse this as named--file-sets or named(file-sets)

Doesn't need to be named(file-sets).  Can be anonymous.  Perhaps better called explicit(file-sets)

or identified(file-sets)


I have elsewhere figured out that

partial checkouts are easy,

while partial checkins correspond to creating a branch, at least temporarily, from which changes can be propagated to larger filesets.

Probably with some sort of nagging system:

Partial checkin doesn't automatically check into containing filesets,

but does automatically check into candidate filesets for enclosing branches.

This might be a good place to exploit file versioning as opposed to whole repo version - candidate-filesets or candidate-branches on a per file basis.

UNIX tools and special characters in filenames

See, for example:  bash - Is there a grep equivalent for find's -print0 and xargs's -0 switches? - Stack Overflow:


UNIX tools are great, with their composability - find | grep | xargs | etc.

But UNIX tools have problems handling entities or objects, such as filenames, that have special characters such as blank spaces or newlines within them.

UNIX tools typically operate on lines (grep, xargs' input), or on words separated by whitespace (e.g. backtick expansion, xargs' invocation of other tools).

Some UNIX tools provide the option of using null separated strings, such as find -print0 or xargs -0.

But as the Stack Overflow page shows, people want such flexibility in other tools, like grep. Of course, GNU grep has provided it - --null-data (-z) for null-separated input, --null (-Z) for null-terminated filenames in its output - but there are probably other such tools.   ... cat?  but of course tr '\n' '\0' ...   still, the list continues.  Mercurial?  Git?
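For a tool that lacks such an option, the filter is easy enough to write by hand. A minimal Python sketch, with a function name (grep0) of my own invention: it reads null-terminated records, keeps the ones matching a regular expression, and writes them back null-terminated, so that it can sit between find -print0 and xargs -0.

```python
import re

def grep0(data: bytes, pattern: bytes) -> bytes:
    """Keep the NUL-terminated records of data that match pattern."""
    records = data.split(b"\0")
    if records and records[-1] == b"":
        records.pop()                # drop the piece after the final NUL
    rx = re.compile(pattern)
    return b"".join(r + b"\0" for r in records if rx.search(r))

# Filenames with a blank and with a newline survive intact.
names = b"a b.txt\0line\nbreak.txt\0c.log\0"
kept = grep0(names, rb"\.txt$")
```

As a command-line filter, this would read sys.stdin.buffer and write to sys.stdout.buffer, so that the bytes are never decoded or split on newlines.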

Moreover, null separated is by no means the last word.   What if nulls are allowed in the strings that you are manipulating?  Need either a quotation system, such as XML (and then we get into the issue of quotes upon quotes), or a strings-with-length system.

I have elsewhere talked about making all UNIX tools work with XML.  This is a generalization.

Strings-with-length is most general.  Possibly fragile.  Possibly XML clauses wrapped around simple "obvious" quoting.
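A sketch of one possible strings-with-length framing in Python, loosely in the style of netstrings: a decimal length, a colon, the raw bytes, then a comma. The exact format and the function names are my choice for illustration only. It also shows the fragility: one wrong length and the rest of the stream cannot be parsed.

```python
def encode(items):
    """Frame each byte string as b'<length>:<bytes>,'."""
    return b"".join(
        str(len(s)).encode("ascii") + b":" + s + b"," for s in items
    )

def decode(data):
    """Inverse of encode; raises ValueError on a malformed stream."""
    items = []
    i = 0
    while i < len(data):
        colon = data.index(b":", i)          # ValueError if no colon
        length = int(data[i:colon].decode("ascii"))
        start = colon + 1
        end = start + length
        if data[end:end + 1] != b",":
            raise ValueError("bad length or missing terminator")
        items.append(data[start:end])
        i = end + 1
    return items

# Nulls, newlines, colons and commas inside the strings are all fine.
strings = [b"a\0b", b"new\nline", b"", b"1:x,"]
framed = encode(strings)
```

No quoting is needed because the decoder never looks inside a string; it only counts bytes.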