The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Tuesday, May 26, 2009

git fetch and tag conflicts

As mentioned in earlier posts, I may merge two unrelated repositories.

Warning: the two unrelated repositories to be merged may both have tags with the same names, i.e. there may be tag conflicts.

By default, git fetch leaves conflicting tags pointing where they originally pointed in the destination. By default, git fetch also only fetches non-conflicting tags, and only those in the range of history fetched.

git fetch --tags ... will fetch the other tags too. However, in the case of conflicting tags, the source tag, the new one, will completely override the existing tag in the destination.

This can result in loss of the metainformation represented by the tags. E.g. if both of the repositories involved have a tag of the same name, created by something like 'git tag Okay', the Okay tag from one of the repositories will be lost.

Note that even if you embed the date within the tag, e.g. 'git tag Tests-passed-2009-05-29', there may be loss. Even embedding the time does not rule it out, but the more fine-grained the timestamp in the tag, the less likely it is that a tag conflict loses information.

Fortunately (?), git fetch / merge / pull do not seem to merge tags on a file-by-file basis. That would be bad, since a tag could then denote an inconsistent set of file versions.

There appears to be no way - i.e. I have not found a way - to signal an error if a tag is being lost as a result of such a fetch/merge.
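Git itself does not perform such a check, but it can be approximated before fetching. A minimal sketch, assuming a POSIX shell and awk: it compares the local tags with those advertised by the other repository (via git ls-remote --tags) and reports every tag name that exists on both sides but points at a different object. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch (not a git feature): detect tags that "git fetch --tags" would
# overwrite, i.e. the same tag name pointing at different objects.

# Read "<sha> refs/tags/<name>" lines (space- or tab-separated) on stdin,
# as printed by "git ls-remote --tags" or by
# "git for-each-ref --format='%(objectname) %(refname)' refs/tags",
# and print "<sha> <name>". Peeled "<name>^{}" entries are dropped.
normalize_tags() {
    awk 'index($2, "^{}") == 0 { sub("^refs/tags/", "", $2); print $1, $2 }'
}

# Given two normalized listings (files), print the names of tags that
# appear in both but point at different objects.
conflicting_tags() {
    awk 'NR == FNR { sha[$2] = $1; next }
         ($2 in sha) && sha[$2] != $1 { print $2 }' "$1" "$2" | sort -u
}

# Compare local tags with those of remote "$1". Return non-zero, listing
# the conflicting names on stderr, if any local tag would be overwritten.
check_tag_conflicts() {
    tmp=$(mktemp -d) || return 1
    git for-each-ref --format='%(objectname) %(refname)' refs/tags |
        normalize_tags > "$tmp/local"
    git ls-remote --tags "$1" > "$tmp/raw" || { rm -rf "$tmp"; return 1; }
    normalize_tags < "$tmp/raw" > "$tmp/remote"
    conflicts=$(conflicting_tags "$tmp/local" "$tmp/remote")
    rm -rf "$tmp"
    if [ -n "$conflicts" ]; then
        echo "tags that differ in $1:" >&2
        echo "$conflicts" >&2
        return 1
    fi
}
```

Something like `check_tag_conflicts other && git fetch --tags other` (remote name hypothetical) then refuses to fetch when a tag would be lost. As far as I know, newer versions of git (2.20 and later) also refuse to overwrite an existing tag during a fetch unless --force is given.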


Beware of the possibility of losing information via this mechanism.

Embedding the date and time in tags may be a good way to reduce the possibility of losing tag information.

You can embed the time a priori, when you create the tag.
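A minimal sketch of doing so, assuming a POSIX shell and date(1): the helper below (a hypothetical name) appends a UTC timestamp to a base tag name. The timestamp format is my assumption; it avoids ":", which git does not allow in ref names, and it sorts chronologically as plain text.

```shell
#!/bin/sh
# Build a tag name that embeds the current UTC date and time,
# e.g. tests-passed-2009-05-26T184849Z.
timestamped_tag_name() {
    printf '%s-%s\n' "$1" "$(date -u +%Y-%m-%dT%H%M%SZ)"
}

# Example use, tagging HEAD:
#   git tag "$(timestamped_tag_name tests-passed)"
```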

A posteriori, you may wish to rename the tags in a repository or branch that is about to be fetched/merged/pulled, from something like tagFooBar to branch1-tagFooBar. This has to be done a posteriori, because when you create a tag you don't know what uniqifying branch prefix it will be merged under.

... Or do you ...? If tags implicitly had (a) timestamp, (b) hostname, (c) pathname, (d) user, the probability of a tag conflict would be vanishingly low. If furthermore you had the contents (a checksum thereof) of the files involved, then the tag would be effectively unique. After all, if two tags have the same name and refer to the same file versions, then for all intents and purposes they ARE identical. Involving time, user, etc. is just icing on the cake.

This suggests to me that tags should be first-class objects, much like files. They should have an arbitrarily long unique name, specifiable by any set of coordinates that is uniqifying. However, when the coordinates given are not uniqifying, i.e. when two tags have the same name but differ in other coordinates such as file contents, then (a) both should be maintained in the repository, but (b) the shorthand using the tag name alone should not be allowed.

This is relevant to "floating tags", such as "git tag tests-passed". The uniqifying coordinates described above would be applied automatically. This way, you could do "git tag tests-passed" as many times as you wanted.
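Git does not apply such coordinates itself, but a wrapper can approximate the idea. A minimal sketch, assuming a POSIX shell: the tag name is the user-visible name followed by UTC time, user, host and an abbreviated tree hash of HEAD (git rev-parse 'HEAD^{tree}', which changes whenever the committed file contents change). The pathname coordinate is left out for brevity, and the layout name/time-user-host-tree and the helper names are arbitrary, hypothetical choices. Because the coordinates live under name/, git will refuse to create a plain tag called just name alongside them, which loosely mirrors point (b) above.

```shell
#!/bin/sh
# Sketch: "implicitly uniqified" tag names built from coordinates.
# This is a naming convention on ordinary git tags, not a git feature.

# Replace characters that are awkward in ref names with "_".
ref_safe() {
    printf '%s\n' "$1" | sed 's/[^A-Za-z0-9._-]/_/g'
}

# $1=name $2=time $3=user $4=host $5=tree hash  ->  one tag name
uniq_tag_name() {
    printf '%s/%s-%s-%s-%s\n' "$1" "$(ref_safe "$2")" "$(ref_safe "$3")" \
        "$(ref_safe "$4")" "$(printf '%s\n' "$5" | cut -c1-12)"
}

# Tag HEAD. "uniq_tag tests-passed" can be repeated; two runs collide
# only if every coordinate (down to the second) matches.
uniq_tag() {
    git tag "$(uniq_tag_name "$1" "$(date -u +%Y-%m-%dT%H%M%SZ)" \
        "${USER:-unknown}" "$(hostname)" "$(git rev-parse 'HEAD^{tree}')")"
}
```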

Perhaps for floating tags you would want the name tests-passed to select the most recent. Or perhaps there should be operators such as "Most recent tag tests-passed on branch branchBelongToSomebodyElse".
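Git tags have no such operators, but with a naming convention a simple query gets close. A minimal sketch, assuming (hypothetically) that each tag in a family is named family/<UTC timestamp>-..., with a fixed-width timestamp such as 2009-05-26T184849Z right after the slash, so that plain text order is chronological order. The branch restriction uses git tag --merged, which only exists in newer versions of git (2.7 and later).

```shell
#!/bin/sh
# Read tag names on stdin and print the newest one in family "$1".
# Relies on the assumed fixed-width UTC timestamp right after "$1/".
newest_in_family() {
    awk -v p="$1/" 'index($0, p) == 1' | LC_ALL=C sort | tail -n 1
}

# "Most recent tag $1 on branch $2": only tags reachable from $2 count.
latest_tag_on_branch() {
    git tag -l --merged "$2" | newest_in_family "$1"
}

# Example:
#   latest_tag_on_branch tests-passed branchBelongToSomebodyElse
```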

Since git does not support such tag uniqification, it behooves the user to check for tag conflicts manually.
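When such a check does turn up conflicts, the a posteriori renaming mentioned above (tagFooBar to branch1-tagFooBar) can be scripted in the repository that is about to be fetched from. A minimal sketch, assuming a POSIX shell; branch1 is only an example prefix and the function names are hypothetical. Caveats: an annotated tag object still records its original name internally (git describe may warn about this), and the renaming affects only the local repository, not copies already pushed elsewhere.

```shell
#!/bin/sh
# Sketch: rename every tag NAME to PREFIX-NAME in the current repository.
# Tags that already carry the prefix are left alone.

# Print the new name for tag $2 under prefix $1, or nothing if already prefixed.
prefixed_tag_name() {
    case $2 in
        "$1"-*) ;;                       # already prefixed: no rename
        *) printf '%s-%s\n' "$1" "$2" ;;
    esac
}

prefix_tags() {
    git for-each-ref --format='%(refname)' refs/tags |
    while read -r ref; do
        name=${ref#refs/tags/}
        new=$(prefixed_tag_name "$1" "$name")
        [ -n "$new" ] || continue
        # Create the new ref first; delete the old one only if that succeeded.
        git tag "$new" "$ref" && git tag -d "$name" >/dev/null
    done
}

# Example:  prefix_tags branch1
```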

Git tag philosophy - don't share tags

The 'git tag' manpage explains git's philosophy, that tags are really not meta-data that should be shared.

This stems from the "multiple users" mindset.

Since I have a "single user" mindset, and also a "subprojects" mindset, I am probably more inclined to preserve tags than the original git authors, Linus et al.

My approach of making tags first-class and implicitly uniqified seems to solve this nicely: it preserves the information without requiring extra work, whereas the present git strategy seems to require considerable extra work to preserve tag history.

Tags on branches

Some tags want to be branch-specific. E.g. tests-passed is a floating tag that should float independently on each of many different branches.

Other tags may want to be repository-wide. E.g. tag-the-only-version-on-any-branch-that-looks-good.

Uniqification may be the appropriate thing here. Tags may want to be implicitly made on branches, as one of the uniqifying coordinates.
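Absent built-in support, the branch can be made an explicit coordinate in the tag name. A minimal sketch, assuming a POSIX shell and a checked-out branch (the helper names are hypothetical): the floating tag lives at <branch>/<name> and is moved with git tag -f each time. Moving tags is exactly what the git tag manpage advises against for tags that have already been shared, so this suits private, single-user tags.

```shell
#!/bin/sh
# Sketch: per-branch floating tag, e.g. "master/tests-passed".
branch_scoped_tag_name() {
    printf '%s/%s\n' "$1" "$2"
}

# Move tag "$1" for the current branch to HEAD, creating it if needed.
float_branch_tag() {
    branch=$(git symbolic-ref --short HEAD) || return 1  # fails on a detached HEAD
    git tag -f "$(branch_scoped_tag_name "$branch" "$1")"
}

# Example:  float_branch_tag tests-passed
```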

What about different files in different branches having the same tag? Again, think of it as a query: "SELECT files WHERE branch=* AND tag_name=foo"

Something for me to do?

Ahhhh.... maybe there is some way I can improve the state of the art in version control. Subprojects, tags, directories.

Tags and subprojects

In CVS I grew into the habit of having subprojects live in separate directory trees of a great big mother source tree.

However, one might want to consider using tags, on a per-file basis, as an indication of subprojects.

E.g. in that big source tree I would typically have a project skeleton - README, bin, etc. I could tag those files 'skeleton'. If I then checked out just the files marked 'skeleton', I would get just the skeleton.

A subproject might be checked out embedded in the whole tree. It might have the skeleton files, as well as its own contributions to shared directories such as bin.

I would prefer to use non-overlapping subdirectories, but tags used in this manner would be useful, given how traditional UNIX distributes subproject files all over the standard directory tree (bin, lib, etc.).

Such tags would need to float automatically. I.e. they would need to be associated with file object names, not particular file object versions.
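Git has no per-file tags. One existing mechanism that comes close to floating, per-file-name tags is gitattributes: arbitrary attribute names can be attached to path patterns in a .gitattributes file and queried with git check-attr. Since attributes attach to path patterns rather than file versions, they float in the sense above. A minimal sketch, assuming a POSIX shell; the attribute name skeleton, the patterns, and the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: approximate per-file "skeleton" tags with a custom git attribute.
# Hypothetical .gitattributes entries:
#   README   skeleton
#   bin/*    skeleton

# Read "git check-attr" output ("<path>: <attr>: <value>") on stdin and
# print the paths whose attribute is set. Paths containing ": " would
# confuse this simple parser.
set_paths() {
    awk -F': ' '$NF == "set" { print $1 }'
}

# List tracked files that carry attribute $1.
files_with_attr() {
    git ls-files | git check-attr --stdin "$1" | set_paths
}

# Example: copy just the skeleton files from the index into ../skeleton/
#   files_with_attr skeleton | git checkout-index --stdin --prefix=../skeleton/
```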

git directories not first class?

git apparently does not treat directories as first class citizens.

E.g. there apparently is no way to check in an empty directory. Or, at least, no way to provide a log entry when creating a directory.

SomeSecretHostname /users/glew/hack/git-hacking/ 441 : mkdir git-dir-example
SomeSecretHostname /users/glew/hack/git-hacking/ 442 : cd git-dir-example
Directory: /users/glew/hack/git-hacking/git-dir-example
SomeSecretHostname /users/glew/hack/git-hacking/git-dir-example/ 443 : got init
got: Command not found.
# my usual typo: got instead of git
SomeSecretHostname /users/glew/hack/git-hacking/git-dir-example/ 444 : git init
Initialized empty Git repository in /fs30/home.directory.11/glew/hack/git-hacking/git-dir-example/.git/
SomeSecretHostname /users/glew/hack/git-hacking/git-dir-example/ 445 : git log
fatal: bad default revision 'HEAD'
SomeSecretHostname /users/glew/hack/git-hacking/git-dir-example/ 446 : echo hi > there
SomeSecretHostname /users/glew/hack/git-hacking/git-dir-example/ 447 : git add there
SomeSecretHostname /users/glew/hack/git-hacking/git-dir-example/ 448 : git commit -m'there'
[master (root-commit) 2849a27] there
1 files changed, 1 insertions(+), 0 deletions(-)
create mode 100644 there
SomeSecretHostname /users/glew/hack/git-hacking/git-dir-example/ 449 : git log
commit 2849a27602d10ec8192e931c5410491521f3fe73
Author: Andy Glew Linux
Date: Tue May 26 11:48:49 2009 -0700

SomeSecretHostname /users/glew/hack/git-hacking/git-dir-example/ 450 : git mkdir foo
git: 'mkdir' is not a git-command. See 'git --help'.
SomeSecretHostname /users/glew/hack/git-hacking/git-dir-example/ 451 : mkdir foo
SomeSecretHostname /users/glew/hack/git-hacking/git-dir-example/ 452 : git commit
# On branch master
nothing to commit (working directory clean)
SomeSecretHostname /users/glew/hack/git-hacking/git-dir-example/ 453 : git add foo
SomeSecretHostname /users/glew/hack/git-hacking/git-dir-example/ 454 : git commit
# On branch master
nothing to commit (working directory clean)
SomeSecretHostname /users/glew/hack/git-hacking/git-dir-example/ 455 : echo hi > foo/there
SomeSecretHostname /users/glew/hack/git-hacking/git-dir-example/ 456 : git add foo/there
SomeSecretHostname /users/glew/hack/git-hacking/git-dir-example/ 457 : git commit
Waiting for Emacs...
[master aa4fd7d] foo/there
1 files changed, 1 insertions(+), 0 deletions(-)
create mode 100644 foo/there
SomeSecretHostname /users/glew/hack/git-hacking/git-dir-example/ 458 : git log
commit aa4fd7d713584ca2a3781f264fc9d48b1169f032
Author: Andy Glew Linux
Date: Tue May 26 11:50:01 2009 -0700


commit 2849a27602d10ec8192e931c5410491521f3fe73
Author: Andy Glew Linux
Date: Tue May 26 11:48:49 2009 -0700

SomeSecretHostname /users/glew/hack/git-hacking/git-dir-example/ 459 :
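The usual workaround, sketched below assuming a POSIX shell, is to commit a placeholder file inside the directory, which also gives the directory's creation a log entry. The file name .gitkeep is only a common convention (git attaches no meaning to it), and add_empty_dir is a hypothetical helper. Git still does not record the directory itself; it only exists in checkouts because the placeholder does.

```shell
#!/bin/sh
# Create directory $1 and commit a placeholder file in it, using $2
# (optional) as the log message. Note: "git commit" also commits
# anything else that is already staged.
add_empty_dir() {
    mkdir -p "$1" &&
    touch "$1/.gitkeep" &&
    git add "$1/.gitkeep" &&
    git commit -q -m "${2:-Create directory $1}"
}

# Example:  add_empty_dir foo "Create directory foo"
```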

Merging unrelated repositories

As I start up with git, I am also starting up work with several codebases that I am unfamiliar with.

The codebases are mostly under SVN, but not all; some are under CVS, some under Perforce. I have decided to just suck everything into git for the purpose of tracking any changes I make myself. I am not linking to svn repos or anything else; I am just sucking in the SVN and CVS metadata, with the expectation that I can work under git, and then do an svn checkin.

The codebases are structured as overlapping projects, subprojects, and libraries. I was not aware of the relationships between the subprojects and libraries. E.g. I checked out a library, placed it under git, made changes - and now have checked out another tool that has this library as a subcomponent. I want the new tool to live in git as well, and I want to check in the vanilla source code - but I also, eventually, want to use my edits to the library with the new tool.

Basically, I am doing a lot of work with subprojects. I am doing a lot of work with initially unrelated repositories.
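For reference, the basic recipe in current git is roughly: add the other repository as a remote, fetch it, and merge its branch. A minimal sketch, assuming a POSIX shell; the remote name other, the default branch master, and the helper name are assumptions. Git 2.9 and later refuse to merge histories with no common ancestor unless --allow-unrelated-histories is given; the git of 2009 did not have or need that option. The fetch also brings in the other repository's tags, with the conflict caveats discussed above.

```shell
#!/bin/sh
# Sketch: merge unrelated repository "$1" (path or URL) into the current
# one. "$2" is the branch to merge from it (default "master").
# The fetch also auto-follows tags, so check for tag conflicts first.
merge_unrelated() {
    branch=${2:-master}
    git remote add other "$1" &&
    git fetch other &&
    git merge --allow-unrelated-histories \
        -m "Merge unrelated repository $1" "other/$branch"
}
```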

Here's a link to a procedure on "Merging two unrelated repositories":