Disclaimer

The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Wednesday, March 30, 2016

Back to the future: RCS

I have long been frustrated by the poor support for nested repositories in all version control systems that I am aware of:  Mercurial, Git, Bazaar...



Yeah, yeah, Git 2.8 has better submodules support.   Mercurial has subrepos. Blah, blah, blah.



My problem with all of these that I have looked into in detail is that they require a posteriori identification of a module.  And there is overhead at the root of a module.



Whereas I have, for many years, maintained a personal source code tree where nearly any subdirectory tree can be cloned at any time, and used independently.  I do this because I want to use arbitrary libraries of my own in arbitrary other projects - e.g. my employer does not want me to insert my entire library tree into any source code of theirs, not if I just need two random libraries from disconnected places in my tree.  I also try to structure my libraries so that the minimum necessary can be imported.



TBD: Insert anecdote about discussing this with Linus - after his explanation, he said "Yeah, you need to add a porcelain to git."



But not just a porcelain. I do not usually want the whole history of the whole repository, all the way up to its root.  Usually I want only the subdirectory tree history (with provision for files in the subdirectory that may have been moved, i.e. that may have history, outside the tree).



And often I do not want the history at all, just a pointer to the repo.



E.g. today: I want to import one of my libraries for the umpteenth time into a project at work.



Way back when I started doing this regularly, my personal source code tree was CVS, as was my company's.  You can make a CVS directory be a symlink to outside CVSROOT, and it works pretty well. (Except that the company history doesn't have its own history of my tools.)



I have not found an equally satisfactory system since I gave up CVS.



Oftentimes, I use two VCS in the same module:



My company may be using Perforce, /p4/workspace/project



My library may be in ~glew/src/lib/a/b/glewlibXX, under Mercurial (or git, or bzr, or...)



and I clone my library using my VCS to the company workspace



Possibly in

/p4/workspace/project/users/glew/src/lib/a/b/glewlibXX



But preferably in a better location, like

/p4/workspace/project/external-dependencies/glewlibXX



I check all of the files into the company repo (perforce).



When I edit, I check into the company repo using the company VCS, e.g. perforce.  If I am allowed, I also check into my personal repo using my VCS, e.g. hg.



If the company wants, they can pull updates that I have made to my personal library from my VCS into their VCS.  And so on.



If I am using a DVCS, this creates a history, typically

/p4/workspace/project/external-dependencies/glewlibXX/.hg



This wastes diskspace, since the company has its history in their depot, and I have my history in mine.  But we don't care about diskspace any more, right?



It's a minor pain, since I have to remember to push history from

/p4/workspace/project/external-dependencies/glewlibXX/.hg

to

~glew/src/lib/a/b/glewlibXX,

in addition to having to checkin to the company repo.

I can automate that.
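
A minimal sketch of what that automation might look like, assuming Mercurial is the personal VCS. The function names are hypothetical, and the paths are the ones from the example above. It only builds and optionally runs `hg -R <clone> push <dest>`; the company checkin is left out.

```python
import os
import subprocess

def build_push_command(clone_dir, personal_repo):
    """Return the hg command that pushes history from the work clone
    to the personal repo. Both arguments are plain filesystem paths."""
    return ["hg", "-R", clone_dir, "push", os.path.expanduser(personal_repo)]

def push_history(clone_dir, personal_repo, dry_run=True):
    """Print the command (dry run) or actually run it and return hg's exit status."""
    cmd = build_push_command(clone_dir, personal_repo)
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.call(cmd)
```

For example, `push_history("/p4/workspace/project/external-dependencies/glewlibXX", "~glew/src/lib/a/b/glewlibXX")` prints the command it would run; pass `dry_run=False` to run it.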



A bigger annoyance is the question: does the cloned module's history and metadata,

/p4/workspace/project/external-dependencies/glewlibXX/.hg,

get checked into the parent repo?  I.e. is there a history of the history?

I have tried it both ways, many times.  Either way has problems.









Anyway: frustrated, I have been thinking about going back to what worked well.



I was considering going back to CVS, since as I mention above it is fairly easy to link CVS directories.





The annoyance there is that CVS requires CVSROOT.  And I would prefer not to go back to having a full CVS repo.









Anyway: frustrated, I have been thinking about the simplest possible thing.



If not CVS, then next simplest is RCS.  (Or maybe SCCS, but I would rather not think about that.)







I.e. I am considering using RCS, only for this submodule sharing.    I would be using a different VCS for my master, and the company would continue to use its own.



I.e., RCS might be just the VCS for fine grain submodule sharing.







I will use comments to this post to record further thoughts and issues.





































4 comments:

Andy "Krazy" Glew said...

Of course, the primary problem with RCS is that it is file based, not directory tree based.

RCS does allow use of an RCS subdirectory, and that can be a symlink (like CVS).

But unlike CVS, if you RCS add a new file in a new subdirectory, it does not automatically create the appropriate RCS file.

(And even though CVS may create the appropriate CVS file, under CVSROOT, it is not necessarily the link that you want.)



Neither RCS nor CVS ascend the filesystem to find the root of the repo, or a parent CVS.



But... these problems can probably be scripted or wrapped.
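
A minimal sketch of one such wrapper, assuming the history lives in a separate "history area" whose layout mirrors the workspace. The function name `ensure_rcs_link` and the two root arguments are hypothetical, not part of RCS. Before the first checkin in a new subdirectory, it creates the matching history directory and points an `RCS` symlink at it.

```python
import os

def ensure_rcs_link(workspace_root, history_root, subdir):
    """Make <workspace_root>/<subdir>/RCS a symlink to <history_root>/<subdir>.

    Creates both directories if needed. Leaves an existing RCS entry alone.
    Returns the path of the RCS link.
    """
    work_dir = os.path.join(workspace_root, subdir)
    hist_dir = os.path.abspath(os.path.join(history_root, subdir))
    os.makedirs(work_dir, exist_ok=True)
    os.makedirs(hist_dir, exist_ok=True)
    link = os.path.join(work_dir, "RCS")
    if not os.path.lexists(link):
        os.symlink(hist_dir, link)
    return link
```

Once the link exists, an ordinary `ci` in that subdirectory should put its `,v` file under the linked directory, since RCS uses an `RCS` subdirectory when one is present.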



As I said in <>

one of the main problems in a VCS is mapping workspace files to history objects.

It can be done by suffixes or prefixes (SCCS s.file.c, RCS file.c,v), e.g. in the same directory.

It can be done by "links to history areas", like the CVS or RCS or SCCS directories.

It can be done by substituting parts of pathnames (e.g. CVS's CVSROOT, which also stores metadata)

This can be generalized to mapping files using transform rules like regexps (Perforce)

You can find the control info and/or repo root by ascending the filesystem, looking for a filename like .p4config or a directory like .hg/.git/.bzr

It can be in environment variables (gag).

Once you have found the control info that describes how the mappings can be done, you can script a lower level thing like RCS.
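
A minimal sketch of two of the mappings listed above: ascending the filesystem to find control info, and the RCS suffix / `RCS` directory convention. The function names and the `MARKERS` tuple are hypothetical; the marker names are the ones mentioned above.

```python
import os

# Names whose presence marks a directory as holding control info.
MARKERS = (".hg", ".git", ".bzr", ".p4config")

def find_control_root(start, markers=MARKERS):
    """Ascend from `start` toward the filesystem root.

    Returns (directory, marker) for the nearest directory containing one
    of `markers`, or None if the root is reached without finding any.
    """
    path = os.path.abspath(start)
    while True:
        for marker in markers:
            if os.path.exists(os.path.join(path, marker)):
                return path, marker
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            return None
        path = parent

def rcs_history_candidates(workfile):
    """Where RCS looks by default for the history of `workfile`:
    first RCS/<name>,v next to it, then <name>,v in the same directory."""
    directory, name = os.path.split(workfile)
    return [os.path.join(directory, "RCS", name + ",v"),
            os.path.join(directory, name + ",v")]
```

A wrapper could call `find_control_root` first, read whatever mapping rules it finds there, and only then hand individual files to RCS.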





Andy "Krazy" Glew said...

i.e. in some ways I want a cvs without CVSROOT, and with some degree of directory traversal / ascent to find control info.

Andy "Krazy" Glew said...

In case people think that using RCS means giving up DVCS:

years ago I wrote "RCS,v-merge", a tool that would take RCS files from cloned, forked, repos, and merge them meaningfully.

I gave up on this when git came out. But I may revive it.


I used monotone-like hashes to detect common history. I was even able to merge RCS,v files that did not share an ultimate common ancestor. I.e. I could merge histories in the middle or towards the leaves of the tree, not just at the root.
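
To illustrate the general idea only (this is not the RCS,v-merge tool, and it skips parsing `,v` files entirely): a sketch that identifies each revision by a SHA-1 hash of its content and finds the revisions two histories have in common. The function names are hypothetical, and each history is assumed to be a list of revision texts, oldest first.

```python
import hashlib

def revision_ids(revisions):
    """Content hash for each revision text, in order."""
    return [hashlib.sha1(text.encode("utf-8")).hexdigest() for text in revisions]

def common_revisions(history_a, history_b):
    """Return (index_a, index_b) pairs for revisions with identical content.

    If a text occurs more than once in history_b, its first occurrence is used.
    """
    first_seen_in_b = {}
    for j, digest in enumerate(revision_ids(history_b)):
        first_seen_in_b.setdefault(digest, j)
    return [(i, first_seen_in_b[digest])
            for i, digest in enumerate(revision_ids(history_a))
            if digest in first_seen_in_b]
```

With `["x", "v1", "v2", "v3a"]` and `["y", "v1", "v2", "v3b"]`, the two histories start from different roots but still match in the middle, which is the case described above.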


Andy "Krazy" Glew said...

A minor but especially annoying annoyance with using RCS is that RCS does not use an EDITOR variable.

Sure, I can script that - but I don't want to unconditionally write a checkin message, and then learn that it is unnecessary,
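
A minimal sketch of such a script, assuming the file already has RCS history. It asks `rcsdiff` whether anything changed before opening `$EDITOR`, so no message gets written for an unnecessary checkin. The function names and the injectable `run` parameter are hypothetical; `rcsdiff -q` and `ci -u -m` are standard RCS options.

```python
import os
import shlex
import subprocess
import tempfile

def needs_checkin(workfile, run=subprocess.call):
    """True if rcsdiff reports differences (exit 1), False if none (exit 0)."""
    status = run(["rcsdiff", "-q", workfile],
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    if status not in (0, 1):
        raise RuntimeError("rcsdiff failed with status %d" % status)
    return status == 1

def checkin_with_editor(workfile, run=subprocess.call):
    """Check in `workfile`, taking the log message from $EDITOR.

    Returns the ci command that was run, or None if nothing was checked in
    (no changes, or an empty message).
    """
    if not needs_checkin(workfile, run):
        return None  # unchanged: never ask for a message
    editor = os.environ.get("EDITOR", "vi")  # "vi" fallback is an assumption
    with tempfile.NamedTemporaryFile("w+", suffix=".msg", delete=False) as tmp:
        msg_path = tmp.name
    try:
        run(shlex.split(editor) + [msg_path])
        with open(msg_path) as f:
            message = f.read().strip()
    finally:
        os.unlink(msg_path)
    if not message:
        return None  # treat an empty message as "abort"
    cmd = ["ci", "-u", "-m" + message, workfile]
    run(cmd)
    return cmd
```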