Disclaimer

The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Tuesday, May 26, 2009

git fetch and tag conflicts

As mentioned in earlier posts, I may merge two unrelated repositories.

Warning: the two unrelated repositories to be merged may both have tags of the same names. I.e. there may be tag conflicts.

By default, git fetch leaves conflicting tags pointing where they originally were in the destination. By default, git fetch only fetches non-conflicting tags, in the range fetched.

git fetch --tags .. will fetch other tags. However, in the case of conflicting tags, the source tag, the new one, will completely override the existing tag in the destination.

This can result in loss of the metainformation represented by the tags. E.g. if both of the repositories involved have the same tag, created by something like 'git tag Okay', the Okay tag for one of the repositories will be lost.

Note that even if you embed the date within the tag, there may be lossage. E.g. 'git tag Tests-passed-2009-05-29'. Even if you embed the time ... but the more fine grained the tag timestamping, the lower the likelihood of a tag conflict producing loss of information.

Fortunately (?), git fetch / merge / pull do not seem to merge tags on a file by file basis. That would be bad, indicating possibly inconsistent sets.

There appears to be no way - i.e. I have not found a way - to signal an error if a tag is being lost as a result of such a fetch/merge.

MORAL AND COMMENTARY:

Beware of the possibility of losing information via this mechanism.

Embedding the date and time in tags may be a good way to reduce the possibility of losing tag information.

You can embed time a priori.
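A minimal sketch of embedding the date (and optionally the time) in a tag name, in Python. The helper name timestamped_tag and the exact timestamp format are assumptions made for illustration; the only constraint taken from git itself is that ref names cannot contain a colon, so the time is written without one.

```python
from datetime import datetime, timezone

def timestamped_tag(base: str, when: datetime, with_time: bool = False) -> str:
    """Return the base tag name with a UTC date (and optionally time) appended."""
    when = when.astimezone(timezone.utc)
    # No ':' in the time part, since git ref names may not contain colons.
    fmt = "%Y-%m-%dT%H%M%SZ" if with_time else "%Y-%m-%d"
    return f"{base}-{when.strftime(fmt)}"

when = datetime(2009, 5, 29, 14, 30, 5, tzinfo=timezone.utc)
print(timestamped_tag("Tests-passed", when))                  # Tests-passed-2009-05-29
print(timestamped_tag("Tests-passed", when, with_time=True))  # Tests-passed-2009-05-29T143005Z
```

The resulting string would then be passed to 'git tag'. Two tags made in the same second on different machines would still collide, which is why finer timestamps only reduce the risk.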

A posteriori, you may wish to rename tags on a repository or branch that is about to be fetched/merged/pulled, from something like tagFooBar to branch1-tagFooBar. This is a posteriori, because you don't know at the time you create a tag what uniqifying branch prefix it would be merged as.
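The a posteriori renaming could be sketched as follows. The functions rename_plan and rename_commands are hypothetical helpers; they only build the list of git commands ('git tag NEW OLD' followed by 'git tag -d OLD') rather than running them. One caveat, as far as I know: for an annotated tag, the new ref points at the old tag object, which still carries its original name internally.

```python
import shlex

def rename_plan(tags, prefix):
    """Map each old tag name to '<prefix>-<old name>', skipping names already prefixed."""
    return {t: f"{prefix}-{t}" for t in tags if not t.startswith(prefix + "-")}

def rename_commands(plan):
    """Build git commands that create each new tag, then delete the old one."""
    cmds = []
    for old, new in sorted(plan.items()):
        cmds.append(f"git tag {shlex.quote(new)} {shlex.quote(old)}")
        cmds.append(f"git tag -d {shlex.quote(old)}")
    return cmds

plan = rename_plan(["tagFooBar", "Okay"], "branch1")
for cmd in rename_commands(plan):
    print(cmd)
# git tag branch1-Okay Okay
# git tag -d Okay
# git tag branch1-tagFooBar tagFooBar
# git tag -d tagFooBar
```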

... Or do you ...? If tags implicitly had (a) timestamp, (b) hostname, (c) pathname, (d) user, the probability of a tag conflict would be diminishingly low. If furthermore you had the contents (checksum thereof) of the files involved ... Then this would be effectively unique. After all, if two tags have the same name, refer to the same file versions, then for all intents and purposes they ARE identical. Involving time, user, etc. is just icing on the cake.

This tends to imply to me that tags should be first class objects, much as files. They should have an arbitrarily long unique name, specifiable by any set of coordinates that is uniqifying. However, when coordinates are not uniqifying, i.e. when two tags have the same name but differ in other coordinates such as file contents, then (1) both should be maintained in the repository, but (2) the shorthand using the tagname only should not be allowed.
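To make the idea concrete, here is a toy sketch in Python. It is not how git works; TagKey, TagRegistry and AmbiguousTag are invented names, and the hosts, users and checksums are made-up values. Tags with the same name coexist, and the bare-name shorthand is refused only when the matching tags disagree about the file contents.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class TagKey:
    name: str
    timestamp: str    # e.g. "2009-05-26T10:00:00Z"
    host: str
    path: str
    user: str
    content_sum: str  # checksum of the tagged file versions

class AmbiguousTag(Exception):
    """Raised when the given coordinates do not pin down one set of contents."""

class TagRegistry:
    def __init__(self):
        self._by_name = defaultdict(set)

    def add(self, key):
        # Same-named tags are all kept; nothing is overwritten.
        self._by_name[key.name].add(key)

    def lookup(self, name, **coords):
        """Return the content checksum selected by name plus optional coordinates."""
        matches = [k for k in self._by_name.get(name, ())
                   if all(getattr(k, c) == v for c, v in coords.items())]
        if not matches:
            raise KeyError(name)
        contents = {k.content_sum for k in matches}
        if len(contents) > 1:
            raise AmbiguousTag(f"{name!r} matches {len(matches)} tags with different contents")
        return contents.pop()

reg = TagRegistry()
reg.add(TagKey("Okay", "2009-05-26T10:00:00Z", "hostA", "/repo/a", "alice", "abc123"))
reg.add(TagKey("Okay", "2009-05-26T11:00:00Z", "hostB", "/repo/b", "bob", "def456"))
print(reg.lookup("Okay", host="hostA"))  # abc123
```

Here reg.lookup("Okay") with no further coordinates raises AmbiguousTag, while adding any coordinate that disambiguates (host, user, path, ...) succeeds.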


This is relevant to "floating tags", such as "git tag tests-passed". The uniqifying coordinates described above would automatically be applied. This way, you could do "git tag tests-passed" as many times as you wanted.

Perhaps for floating tags you would want the name tests-passed to select the most recent. Or perhaps there should be operators such as "Most recent tag tests-passed on branch branchBelongToSomebodyElse".
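A "most recent tag on branch" operator might look like the following sketch. FloatingTag and most_recent are hypothetical names, and the sketch assumes timestamps are ISO 8601 strings in UTC, so that string order equals time order.

```python
from typing import NamedTuple, Optional

class FloatingTag(NamedTuple):
    name: str
    branch: str
    timestamp: str  # ISO 8601 UTC, so lexical order is chronological order
    commit: str

def most_recent(tags, name, branch: Optional[str] = None):
    """Return the newest tag called `name`, optionally restricted to one branch."""
    candidates = [t for t in tags
                  if t.name == name and (branch is None or t.branch == branch)]
    return max(candidates, key=lambda t: t.timestamp, default=None)

tags = [
    FloatingTag("tests-passed", "main", "2009-05-25T09:00:00Z", "c1"),
    FloatingTag("tests-passed", "main", "2009-05-26T09:00:00Z", "c2"),
    FloatingTag("tests-passed", "branchBelongToSomebodyElse", "2009-05-27T09:00:00Z", "c3"),
]
print(most_recent(tags, "tests-passed").commit)                 # c3
print(most_recent(tags, "tests-passed", branch="main").commit)  # c2
```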


Since git does not support such tag uniqification, it behooves you to check for tag conflicts manually.
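One way to do the manual check, sketched in Python under some assumptions: the tag listings have been captured beforehand as text, for example from 'git show-ref --tags' in the destination and 'git ls-remote --tags' against the source, both of which print an object id followed by a ref name. The function names and the object ids in the sample are made up. The comparison is conservative: two different annotated tag objects pointing at the same commit are still reported.

```python
def parse_tag_refs(text):
    """Parse '<object-id> refs/tags/<name>' lines into {name: object_id}."""
    tags = {}
    for line in text.splitlines():
        parts = line.split()  # handles both space- and tab-separated output
        if len(parts) != 2 or not parts[1].startswith("refs/tags/"):
            continue
        name = parts[1][len("refs/tags/"):]
        if name.endswith("^{}"):  # skip peeled entries for annotated tags
            continue
        tags[name] = parts[0]
    return tags

def tag_conflicts(dest, src):
    """Return {name: (dest_id, src_id)} for tags present in both but differing."""
    return {n: (dest[n], src[n])
            for n in sorted(dest.keys() & src.keys()) if dest[n] != src[n]}

dest_text = "1111111 refs/tags/Okay\n2222222 refs/tags/v1.0\n"
src_text = ("3333333\trefs/tags/Okay\n2222222\trefs/tags/v1.0\n"
            "4444444\trefs/tags/v1.0^{}\n5555555\trefs/tags/new-tag\n")
print(tag_conflicts(parse_tag_refs(dest_text), parse_tag_refs(src_text)))
# {'Okay': ('1111111', '3333333')}
```

Any name reported here is a candidate for renaming before the fetch.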


Git tag philosophy - don't share tags



The 'git tag' manpage,
e.g.
http://www.kernel.org/pub/software/scm/git/docs/git-tag.html

explains git's philosophy, that tags are really not meta-data that should be shared.

This happens because of the "multiple users" mindset.

Since I have a "single user" mindset, and also a "subprojects" mindset, it appears that I am probably more inclined to want to preserve tags than the original git authors, Linus et al.

My approach to making tags be first class, implicitly uniqified, seems to solve this nicely: it preserves the information, without requiring extra work. Whereas the present git strategy seems to require considerable extra work to preserve tag history.

Tags on branches



Some tags want to be branch specific. E.g. tests-passed is a floating tag that should float independently on many different branches.

Whereas other tags may want to be repository wide. E.g. tag-the-only-version-on-any-branch-that-looks-good.

Uniqification may be the appropriate thing here. Tags may want to be implicitly made on branches, as one of the uniqifying coordinates.

What about different files in different branches having the same tag? Again, think of it as a query: "SELECT files WHERE branch=* AND tag_name=foo"
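The query view can be tried out with a toy table, sketched here with Python's sqlite3 module. The table file_tags, its columns and its rows are invented for illustration. Since "branch=*" is not valid SQL, the sketch uses SQLite's GLOB operator, where the pattern '*' matches every branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file_tags (branch TEXT, tag_name TEXT, file TEXT)")
conn.executemany(
    "INSERT INTO file_tags VALUES (?, ?, ?)",
    [("main", "foo", "README"),
     ("branch1", "foo", "bin/run"),
     ("branch1", "bar", "lib/x")],
)

def files_with_tag(conn, tag_name, branch_glob="*"):
    """Return (branch, file) pairs carrying tag_name on branches matching the glob."""
    rows = conn.execute(
        "SELECT branch, file FROM file_tags "
        "WHERE branch GLOB ? AND tag_name = ? ORDER BY branch, file",
        (branch_glob, tag_name),
    )
    return rows.fetchall()

print(files_with_tag(conn, "foo"))          # [('branch1', 'bin/run'), ('main', 'README')]
print(files_with_tag(conn, "foo", "main"))  # [('main', 'README')]
```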

Something for me to do?



Ahhhh.... maybe there is some way I can improve the state of the art in version control. Subprojects, tags, directories.

Tags and subprojects



In CVS I grew into the habit of having subprojects live in separate directory trees of a great big mother source tree.

However, one might want to consider using tags, on a per file basis, as an indication of subprojects.

E.g. in that big source tree I would typically have a project-skeleton - README, bin, etc. I could tag those files 'skeleton'. If I then checked out just those files marked 'skeleton', I would get those files.

A subproject might be checked out embedded in the whole tree. It might have the skeleton files, as well as its own contributions to shared directories such as bin.

I would prefer to use non-overlapping subdirectories, but, tags used in this manner would be useful, given how traditional UNIX distributes subproject files all over the standard directory tree (bin, lib, etc).

Such tags would need to float automatically. I.e. they would need to be associated with file object names, not particular file object versions.
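A small sketch of tags attached to file names rather than file versions, in Python. The mapping file_tags, the paths in it and the helper files_for are all hypothetical. Because the tag is keyed by path, committing a new version of a file leaves its membership unchanged, which is the "floating" behaviour described above.

```python
# Tags are keyed by path, not by (path, version), so they float with the file.
file_tags = {
    "README": {"skeleton"},
    "bin/setup": {"skeleton"},
    "bin/frob": {"subprojectA"},
    "lib/frob.a": {"subprojectA"},
}

def files_for(file_tags, *wanted):
    """Return the sorted paths carrying at least one of the wanted tags."""
    wanted = set(wanted)
    return sorted(p for p, tags in file_tags.items() if tags & wanted)

print(files_for(file_tags, "skeleton"))
# ['README', 'bin/setup']
print(files_for(file_tags, "skeleton", "subprojectA"))
# ['README', 'bin/frob', 'bin/setup', 'lib/frob.a']
```

The second call corresponds to checking out a subproject together with the skeleton, with both contributing files to the shared bin directory.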
