Saturday, July 31, 2010
I love my tablet PC. But...
Actually, I prefer reading articles and academic publications on paper.
But it is a hassle to print papers downloaded from the web.
And there is always the hope that one day decent annotation software will arise.
Anyway, reading papers on my tablet PC is infinitely preferable to reading papers on my desktop system.
The latter is work, and literally a pain in the neck.
The former is a pleasure.
But: when I used to read papers at school in the library,
I would have the journal or the paper on one side of the desk,
and my notebook on the other.
Problem: I only have one tablet PC. Splitting the screen spatially or temporally between the paper I am reading and my notebook software is a pain.
What I want is a dual tablet PC. One PC, or at least one interface, that has two tablet surfaces.
Possibly attached.
Possibly one e-paper optimized for reading, and a second dynamic for writing.
Or possibly two displays that fold open like a book. (TBD: I have a link to such a product proposal.)
But even more likely two independent tablet surfaces. Separately positionable. I often used to read books with my notebook or notepaper oriented perpendicular.
It might be acceptable, or even desirable, to tether the two displays by a cable. Just like I like tethering my pen: it prevents them from getting lost or separated.
But it might equally well be desirable to link the different display surfaces by radio or other wireless.
(Funny idea: body area networking, through the user body, for the link.)
They could possibly be separate PCs, but I want them coupled, e.g. so that cut and paste from one tablet to the other can work.
---
Not just a dual tablet PC. A multi-tablet PC. But dual would be a good start.
---
Blogged at http://andyglew.blogspot.com/2010/07/dual-tablet-pc.html
Disclaimer
The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.
See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.
Thursday, July 29, 2010
Outlook late and repeated reminders
Outlook just flashed a reminder, telling me I had missed several meetings, by anywhere from 22 hours to 2 days.
Second time it has done it today, for the same meetings.
There was a crash in between.
Annoying.
Saturday, July 17, 2010
Better a crank than a cog!
On 7/17/2010 2:08 PM, nedbrek wrote:
> "Andy Glew"<"newsgroup at comp-arch.net"> wrote in message
> news:BsedncmAl52kRdzRnZ2dnUVZ_oOdnZ2d@giganews.com...
>>
>> Now, please don't get insulted when I say you are a crank. Cranks are my
>> people. I'm a crank. Cranks occasionally come up with good ideas.
>
> Hehe, reminded me of this cartoon:
> http://sydneypadua.com/2dgoggles/lovelace-and-babbage-vs-the-organist-pt-1/
>
> I can be a crank, as long as I am not the "Itanium crank". Anything but
> that, please. :P
Better a crank than a cog!
(Hey, that's my new motto.)
> "Andy Glew"<"newsgroup at comp-arch.net"> wrote in message
> news:BsedncmAl52kRdzRnZ2dnUVZ_oOdnZ2d@giganews.com...
>>
>> Now, please don't get insulted when I say you are a crank. Cranks are my
>> people. I'm a crank. Cranks occasionally come up with good ideas.
>
> Hehe, reminded me of this cartoon:
> http://sydneypadua.com/2dgoggles/lovelace-and-babbage-vs-the-organist-pt-1/
>
> I can be a crank, as long as I am not the "Itanium crank". Anything but
> that, please. :P
Better a crank than a cog!
(Hey, that's my new motto.)
Wednesday, July 14, 2010
A Poor Manager of Engineers...
I once asked manager 1 at company 1 about manager 2 at company 2, who had worked for manager 1 at company 3. Manager 1 said "manager 2 is a good engineer, but a poor manager of engineers".
I didn't think much more on this topic until recently, when I wondered "Wait, manager 2 is managing a team of engineers at a VLSI engineering driven company. How can it be that he is a poor manager of engineers?"
And then I realized: at companies like Intel, most VLSI engineers' first experience of management and team leadership is (or at least was, until recently) NOT managing other engineers. It is managing technicians, specifically a team of layout (physical design) specialists. Amazing as it sounds, 20 years after silicon compilers, Intel still largely accomplished chip layout by hand.
I posit that managing a team of mask designers is different from managing a team of design engineers. In the former the tasks are supposedly known: convert schematics into layout. In the latter, there is more backing up and retrying, more experimentation. More and more so as the level of abstraction rises, through microarchitecture and architecture.
Sunday, July 11, 2010
FLOSS, Larry Augustin, VA Linux IPO, one of my many missed opportunities...
One of the best things since I joined IV was discovering FLOSS Weekly, the podcasts from Leo Laporte's TWiT network with Randal Schwartz (and originally Chris DiBona) on Free, Libre, and Open Source Software.
I'm going back and listening to old podcasts as well as new ones. While driving between Seattle and Portland, or while exercising on the elliptical machine at the gym.
Today listened to old Episode 6, with Larry Augustin, founder of VA Linux, SourceForge, etc.
They mentioned VA Linux's IPO, and how Chris DiBona ran the program... heck, quoting wikipedia:
Many authors of free software were invited to buy shares at the initial price offering as part of a friends and family deal.
http://en.wikipedia.org/wiki/Geeknet
I'm not sure why I was one of the 1500 Open Source developers invited to join the IPO. The invite letter did not say. Chris DiBona said he went through source code control history records. Probably something I had contributed to RCS (-I. / -I$) or CVS (although I don't remember anything getting into the CVS distribution).
I was at university at the time. I was invited to buy shares at $30 - I don't remember how many, maybe $1000 worth, maybe more. But I do remember that it was more money than I had at the time. I remember discussing it with my wife, saying that it might double in a year, but I didn't think that VA Linux was making money, so the money might be lost.
The IPO price was $30. Trading opened at $299, peaked at $320, and closed at $239 on the first day.
Sigh. If I had purchased the IPO... but then, I would have probably held on too long, riding the stock price until it was essentially not worth the IPO price.
--
There is a moral here: I was right about the overall poor prospects for VA Linux. (I think.) But riding the wave can be very profitable, even though it ends up as nothing on the beach.
Saturday, July 10, 2010
New cygwin problems
I have depended on cygwin for years. For years my main development platform has been cygwin emacs, writing Perl and C++ and Python to be uploaded to various Linux machines and run.
I just upgraded to CYGWIN_NT-6.0. Running on Vista.
And I am plagued by instabilities. Emacs running out of resources, vfork failures.
And everything is much slower than the older version of Cygwin on same Vista machine. E.g. I may no longer be able to run git - too slow.
Thursday, July 08, 2010
Microsoft OneNote grows on me
I almost hate to say it, but I am growing dependent on Microsoft OneNote.
Don't get me wrong: I hate OneNote. Its user interface sucks. There are so many improvements I could make to it...
But: it is really useful just to have something that I can drag and drop bitmaps into. Screen Clippings. That makes the bitmaps searchable via OCR.
E.g. today I made OneNote notebooks as I was (1) moving my NoIP accounts, and (2) shopping for a better wireless phone plan. With OneNote, I could just grab screen dumps of the various windows - that can't be cut and paste as text at all well - and can search the result.
Sure, I would like better. E.g. I wish OneNote's automatic recording of attribution worked more often. E.g. I have had OneNote crash and lose data, when using it at work, more times than I like to remember. But nevertheless, what there is, is a start.
I tried Evernote as well today. I like the net centeredness. Haven't really used it enough to know how it compares. One immediate problem: it doesn't work properly on my USB displays, only working on my laptop LCD. I also can't get screen clipping working outside of a web browser. But I'll probably figure that out eventually.
Sunday, July 04, 2010
Certs and virtual hosts - multiple signatures
Messing around with webhosting setup on my vacation.
Wikipedia says (http://en.wikipedia.org/wiki/Server_Name_Indication):
Since 2005, CAcert has run experiments on different methods of using TLS on virtual servers.[4] Most of the experiments are unsatisfactory and impractical. For example, it is possible to use subjectAltName to contain multiple domains in a single certificate, but as this is one certificate, this means all the domains must be owned and controlled by one person, and the certificate has to be re-issued every time the list of domains changes.
... ... ...
While I understand the motivations for SNI, I think the above statement is an indication of bogosity - if not in the statement, then in the structure of a certificate and signatures.
What I want is
...
I.e. I want to be able to have multiple certifiers attached, post facto, to a certificate.
(as well as I want to be able to have the certifier statements and signatures stored separately)
Certifier statements and signatures are metadata. Metadata can be both adjacent, as above, or non-adjacent.
There can be an integrity code on the certificate_carrier, signed by the certifiee. Since the certifiers have themselves signed, protecting the integrity of the enclosed certifiee statement, this enables the certifiee to add more certifier statements over time.
I.e. the certifier certifies the certifiee statement, not the entire certificate carrier. Although we could have that, too: the certifier could say "I certify this only if Verisign has also certified it", etc.
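To make the shape of that concrete, here is a toy Python sketch. sign() and verify() are hypothetical stand-ins, not a real signature scheme; the point is only the structure: each certifier signs the certifiee statement (plus its own certifier statement), so additional certifications can be attached to the carrier later, post facto, without reissuing anything.
    # Sketch only: sign/verify are stand-ins for a real signature algorithm.
    import hashlib

    def sign(key: bytes, message: bytes) -> bytes:
        # Hypothetical: a real implementation would use an actual signature scheme.
        return hashlib.sha256(key + message).digest()

    def verify(key: bytes, message: bytes, signature: bytes) -> bool:
        return sign(key, message) == signature

    class CertificateCarrier:
        """One certifiee statement plus any number of certifier statements.

        Certifiers sign only the certifiee statement (and their own statement),
        so more certifications can be appended without re-issuing anything."""
        def __init__(self, certifiee_statement: bytes):
            self.certifiee_statement = certifiee_statement
            self.certifications = []  # (certifier_name, certifier_statement, signature)

        def add_certification(self, name, certifier_statement: bytes, certifier_key: bytes):
            signed_blob = self.certifiee_statement + certifier_statement
            self.certifications.append(
                (name, certifier_statement, sign(certifier_key, signed_blob)))

        def verify_all(self, keys_by_certifier) -> bool:
            return all(
                verify(keys_by_certifier[name], self.certifiee_statement + stmt, sig)
                for name, stmt, sig in self.certifications)

    carrier = CertificateCarrier(b"subject: www.example.com; pubkey: ...")
    carrier.add_certification("CertifierA", b"domain validated 2010-07-04", b"keyA")
    # Later, post facto, a second certifier signs the same certifiee statement:
    carrier.add_certification("CertifierB", b"extended validation", b"keyB")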
Friday, July 02, 2010
The Version Control Tool I Want
I want more from a version control tool.
I'm not particularly happy with the present generation of distributed version control tools:
Mercurial (hg) or git. Nor Bazaar (bzr).
Nor their predecessor Bitkeeper.
And certainly not CVS, SCCS, or RCS.
= To Do List =
At the top, to put it in my face:
* I'd like to extract code from git-pack-objects (or mercurial) that takes a list of not-hash-named files, and compacts them into delta storage. And then wrap my own version control concepts, in layered user level filesystem scripts, around it
* Leverage existing merge tools.
= Blog Wiki Crosslinks =
http://wiki.andy.glew.ca/wiki/The_Version_Control_Tool_I_Want
= My Main Complaint: Subprojects, Lack Of =
My main complaint:
I want a distributed version control system that naturally supports single file or sub-directory checkouts, subprojects, etc.
Ironically, this is somewhere where CVS (and CVS's immediate predecessors, various RCS tools (some of which I wrote at various companies))
was better than its successors SVN, BitKeeper, Git, Mercurial.
== My VC'ed Home Directory as a Tree of Subprojects ==
For many years - at least since 1985, and I think even a bit earlier - I have been a heavy user of version control.
In particular, I have long VC'ed my home directory.
Now, my home directory is not monolithic. I might check out different parts of it on different systems.
But I do not, at least historically have not, needed to divide it into separate repositories, and logically merge those together.
I have tried, and have not liked, various tools that allow multiple repositories to be logically merged into a single workspace.
(I think such tools are great for when you have to merge different repositories, from different organizations and/or under different VC tools. But I think they are overkill when I can get away with everything being in a single repository.)
E.g. under glew-home, my repository, I have subdirectories like glew-home/src, glew-home/src/include, glew-home/src/libag, glew-home/src/libag/number, glew-home/src/libag/builtin_initialization, etc. Currently roughly 50 of these libraries.
Each in a separate directory. Each designed to be used in isolation, although there are occasional interdependencies. Often header only.
When I started doing this, long, long ago, I might create a library libag.a that somebody would have to link into a binary.
This was an impediment to people using my libraries. Projects would not want my entire libag; they would only want the minimum, the stuff that they were using, and nothing else. Hence my evolution: a module in a directory, check out that directory, place it anywhere, include its header and, usually, that was all you needed.
If I were to structure these as independent projects, that would be roughly 50 independent repositories. Much more of a hassle to manage.
Fortunately, CVS made it possible to check out an arbitrary subdirectory tree. (This is one reason why I evolved to a directory per library module - one directory, to get the source file (typically a header) and any test files associated with it.) That subdirectory could be placed anywhere in the target workspace. But it could always get checked back into my master tree.
Of course, projects using my libraries would want to keep their own version control history for their local copy of my libraries. I therefore created a version of CVS where the "CVS" subdirectory name that contained or pointed to metadata could be specified on the command line. So the default CVS metadir might be for the project using my library, whereas CVS.libag would point to the master tree for my library.
This is not too unlike having a local repository that you check in to, and a parent repository that you push to.
Which is not to say that the other good things of distributed version control are not desired.
My main point: I want to be able to check out an arbitrary subdirectory. And then check it back in.
And not have to do manual merges to all of the projects that include that subdirectory.
And certainly not to have to do stupid things like cloning the entire tree, and then building a workspace with only a subtree.
I understand the desire for atomic checkins of an entire project.
I think I have figured out [[how to reconcile multifile atomicity and subset checkins]], see that section.
I say again: it is NOT good enough to have to structure each separate module as a separate project with separate repositories.
If nothing else, often library modules begin as tools in some other module,
and then get refactored into independence. I want to preserve the entire history, including the history from before the module was split out, when it was still part of some other module.
Like I said, CVS could handle this well enough, since it just allowed subtrees to be checked out.
But it had all of the other stupidities of centralized version control.
SVN didn't handle it so well as CVS: its attempts at atomic checkin got in the way.
Git and Hg just seem to have given up on this.
If I am wrong: PLEASE TELL ME. I would rather not have to write a new VC tool.
I was actually starting to write a DVCS in my copious spare time when Bitkeeper cut off Linux. Git and Mercurial got written far quicker than I could write mine. But, after years of trying to cope with their limitations, I am about to give up. (Also, especially, way back then Intel owned everything I did, with no guarantee of it being open-sourceable. Whereas with my new employer, IV, I am allowed to open source this.)
= Other Good Stuff I would Like in a VC Tool =
The idea of using cryptographic checksums is a good idea. Hg and Git do this, based on Monotone's initial work, which in turn was probably based on rsync. However, I dislike ignoring the possibility of hashes colliding. In my original work on this sort of "content based hashing", I deliberately chose to use a poor hash (UNIX sum or cksum) to cause collisions to occur, with the option of using a stronger hash later.
I like the idea of having an unsafe mode, where hashes are assumed to be unique, and a safe mode that ensures that hash collisions do not occur.
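A minimal Python sketch of that safe/unsafe distinction (names and layout are mine, not any existing tool's): store content under a deliberately weak hash so collisions actually happen, and in safe mode resolve them by comparing contents and appending a suffix.
    import hashlib, os

    def weak_hash(data: bytes) -> str:
        # Deliberately short hash so that collisions occur and the
        # collision-handling path gets exercised (the spirit of sum/cksum).
        return hashlib.sha1(data).hexdigest()[:4]

    def store(objdir: str, data: bytes, safe: bool = True) -> str:
        """Store data under its content hash; return the key actually used."""
        os.makedirs(objdir, exist_ok=True)
        key, suffix = weak_hash(data), 0
        while True:
            name = key if suffix == 0 else "%s-%d" % (key, suffix)
            path = os.path.join(objdir, name)
            if not os.path.exists(path):
                with open(path, "wb") as f:
                    f.write(data)
                return name
            if not safe:
                return name          # unsafe mode: assume hashes never collide
            with open(path, "rb") as f:
                if f.read() == data:
                    return name      # same content already stored
            suffix += 1              # real collision: try the next slot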
I used to go to great extents to try to keep the version data for a file near to the file itself. E.g. I used to keep the versions of a file in a subdirectory like CVS or RCS. However, this is not robust across renames and source tree reorganizations, so I have fallen back to the now standard technique of placing the versions in a centralized directory somewhere, typically something like .git or .hg off the root of the project.
I like the idea of automatically finding the .git directory by walking up the directory tree; but I also like the possibility of having pointer metadata like CVS subdirs, since I have occasionally had to work on systems where the source tree did not fit on one disk partition. Or environment variables. Heck, this dates back to my 1986 era publication, boxes, links, and parallel trees; I just no longer think the parallel trees are necessary, although they are nice to have.
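The lookup order I have in mind looks roughly like this sketch (the directory names, pointer file, and environment variable are placeholders for this hypothetical tool, not git/hg/CVS conventions): walk up from the working directory, honor a pointer file that names a metadata directory living elsewhere, and fall back to an environment variable.
    import os

    def find_metadata_dir(start=".", names=(".myvc",),
                          pointer_file=".myvc-pointer", env_var="MYVC_DIR"):
        """Walk up from 'start' looking for VC metadata; return its path or None."""
        d = os.path.abspath(start)
        while True:
            for name in names:
                candidate = os.path.join(d, name)
                if os.path.isdir(candidate):
                    return candidate
            pointer = os.path.join(d, pointer_file)
            if os.path.isfile(pointer):
                # CVS-subdir-style indirection: the file names a metadata dir
                # that may live on another disk partition.
                with open(pointer) as f:
                    return f.read().strip()
            parent = os.path.dirname(d)
            if parent == d:                      # hit the filesystem root
                return os.environ.get(env_var)   # last resort: environment
            d = parent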
I prefer files to databases, since databases nearly always need sysadmin infrastructure such as backups. However, atomicity is good, even though it appears that the requirements of atomicity in git/hg's filesystem based implementation have resulted in some of the limitations I am railing against. I am thinking more and more about using an sqlite database as an option to make it easier to maintain certain invariants atomically. However, I am also thinking about making it possible to reconstruct all of the information from the filesystem, assuming you did not crash in the middle of a checkin. At least, file data in the filesystem.
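For the sqlite option, a minimal sketch using Python's built-in sqlite3 (table names are made up here) of how one checkin's metadata could be committed atomically while the file contents themselves stay in the filesystem:
    import sqlite3

    def record_checkin(db_path, checkin_id, files_and_hashes):
        """Atomically record one checkin's metadata; file data stays on disk."""
        con = sqlite3.connect(db_path)
        try:
            with con:  # one transaction: either all rows land, or none do
                con.execute("CREATE TABLE IF NOT EXISTS checkins (id TEXT PRIMARY KEY)")
                con.execute("""CREATE TABLE IF NOT EXISTS checkin_files
                               (checkin_id TEXT, path TEXT, content_hash TEXT)""")
                con.execute("INSERT INTO checkins VALUES (?)", (checkin_id,))
                con.executemany(
                    "INSERT INTO checkin_files VALUES (?, ?, ?)",
                    [(checkin_id, path, h) for path, h in files_and_hashes])
        finally:
            con.close()

    # e.g. record_checkin("vc-metadata.sqlite", "r42",
    #                     [("src/libag/number/num.h", "ab12"), ("README", "cd34")])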
I want portability: my home directory migrates from various UNIXes to Linux to FreeBSD to Cygwin, and parts migrate to Windows.
My VC tool must run on Windows.
I place portability above performance.
I'm willing that my VC tool may be entirely written in a scripting language like Perl or Python,
and hence need minimal installation.
(I can probably code it faster in Perl, since I have many existing pieces, but may want to do Python.)
Although I would be happy if the performance critical parts could optionally be in C or C++.
Git's approach of a user level filesystem is a good idea.
I am seriously tempted by the idea of structuring this as several different layers of user level filesystem tools.
Unfortunately, I am having trouble disentangling the layers of git that I want from the layers that I do not.
See [[User Level Filesystem]].
I want to leverage as much existing code as possible.
As explained above, I want my VC tool to be usable for my home directory, with all of its subdirectories/subprojects.
But I also want to be able to use my VC tool for my wiki, to allow me to
(a) work while disconnected on subsets of my wiki, and then merge,
and
(b) use it as a Poor Man's file replication subsystem.
(Another usage model that worked well on CVS, but works much less well on Git or Mercurial.)
In my dreams, I could create repositories at different points in a tree, and then merge them.
E.g. do a checkin at a/x and a/y, and then merge into a single repository for a that includes both x and y.
(This can be done with existing tools, but is a pain.)
Even further in dream-space, I would like to be able to do a filesystem cp -R of a subtree, and then propagate the history.
I must beware - the desire to do this has wasted time before; the current BKM (best known method) is a user level filesystem: svn cp or git mv.
But, it can *almost* be done,
if you have per directory CVS metadir.
Some metadirs could be deep, with actual VC files, some shallow. One can imagine an operation that copies version data from the centralized .hg or .git directory at the root of a repo to the CVS-metadir-like files, and vice versa.
The usual good stuff: renaming. Both manual, and git's implicit.
== User Level Filesystem ==
Linus' treatment of git as a filesystem is a good idea.
It would be nice to have several different user level filesystem layers
* Basic functionality
** Names
*** Store as files, with original names, parallel tree structure
*** Store as numeric names
*** Store as content hash names
**** handling hash collisions
*** Store as human friendly, but not strictly original names
*** Store human friendly original tree structure pointing into obscure numeric space.
** Metadata
*** Store file data and metadata, e.g. in filename/.data and filename/.metadata (see the sketch after this list)
*** Allow user to collect arbitrary metadata, potentially non-adjacent
*** Nice ls for metadata
** Storage
*** Store as files (or files in directories)
*** Store in a tar archive
*** Store in a database
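As a strawman for the "Metadata" layer above (the layout and names are purely illustrative, not what any existing tool does), here is a Python sketch that stores each file as a directory whose .data and .metadata entries sit side by side, plus a trivial "ls" over the metadata:
    import json, os

    def store_with_metadata(root, relpath, data: bytes, metadata: dict):
        """Store a file as <relpath>/.data plus adjacent <relpath>/.metadata (JSON)."""
        d = os.path.join(root, relpath)
        os.makedirs(d, exist_ok=True)
        with open(os.path.join(d, ".data"), "wb") as f:
            f.write(data)
        with open(os.path.join(d, ".metadata"), "w") as f:
            json.dump(metadata, f, indent=2)

    def list_metadata(root):
        """'Nice ls' for metadata: yield (path, metadata) pairs."""
        for dirpath, dirnames, filenames in os.walk(root):
            if ".metadata" in filenames:
                with open(os.path.join(dirpath, ".metadata")) as f:
                    yield os.path.relpath(dirpath, root), json.load(f)

    store_with_metadata("objstore", "src/libag/number/num.h",
                        b"#define ...\n", {"author": "glew", "version": "1.3"})
    for path, meta in list_metadata("objstore"):
        print(path, meta)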
None of the above is really VC tool specific. Some may be useful in other contexts.
More VC specific:
* Storage
** Store as unpacked file/objects
** Store as packed file/objects - delta storage
While this is "more VC specific", it is not necessarily completely so. One can imagine wanting tools that accomplish sinmilar compression. It's VC specific mainly in that te VC DAG is a guide to how things shuld be packed.
Git's pack-objects is almost exactly what I want. However, git-pack-objects seems not to be able to take ordinary filenames as the things to be compressed; it requires SHA hashes.
TBD: figure out if this can be disentangled from the rest of git.
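The disentangling may not even be strictly necessary if one is willing to shell out. A sketch in Python, assuming git is on PATH and we are inside a git repository (a scratch one made with git init is fine): take ordinary filenames, register them as blobs with git hash-object -w, then feed the resulting SHAs to git pack-objects.
    import os, subprocess

    def pack_plain_files(filenames, pack_basename="packed/pack"):
        """Delta-compress ordinary files by name using git's object machinery."""
        os.makedirs(os.path.dirname(pack_basename) or ".", exist_ok=True)
        shas = []
        for fn in filenames:
            out = subprocess.run(["git", "hash-object", "-w", fn],
                                 check=True, capture_output=True, text=True)
            shas.append(out.stdout.strip())
        # pack-objects reads object names on stdin and writes
        # <pack_basename>-<packsha>.pack and .idx
        subprocess.run(["git", "pack-objects", pack_basename],
                       input="\n".join(shas) + "\n",
                       check=True, capture_output=True, text=True)
        return shas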
If we get this far, we have the ability to manage user level storage. Heck, we could move on to the next step before the delta storage was implemented - it should be that independent.
Finally, we get to the point that is really VC specific:
* Managing versions, branches, etc., on a whole project, sub-project, subdirectory tree, and even individual file basis.
See the next section, [[how to reconcile multifile atomicity and subset checkins]]
= [[How to reconcile multifile atomicity and subset checkins]] =
I think that the fundamental problem that I am trying to solve,
that leads to poor support for subprojects in tools like git and hg,
is related to atomicity.
If you have a project in a repository, you want to make sure that changes to the project are atomic.
At least, changes to the main branch are atomic. All done, all tested, all checked in together.
But if a user checks out and modifies a subset of the files, then it cannot be checked back into the main branch.
You really need to start a branch for all projects that those files belong to, including all the files in those projects and those branches, including files that the user is not working on. Edit the files or module that the user is working on.
Check back in - to the branch (or branches).
Test.
Merge.
I.e. every subset checkout/checkin corresponds to creating a branch ON ALL PROJECT BRANCHES THAT MAY BE AFFECTED BY THE CHANGE.
Note that I am saying "subset" rather than "subproject". I am trying to emphasize that the subset may be defined dynamically, ad-hoc,
not necessarily a priori.
Note also that there is not necessarily a single superset or superproject.
But, imagine that the subset or subproject is in a separate organization from the superset or superprojects. In different repositories. You can't allow the subset/subproject to create a branch in the superset/superproject's repository, and vice versa. You have to provide the ability to track contours, tags, and branches without access to the other's repository.
I encountered this ability with AMD's RCS based build lists, BOM files that are essentially out-of-repository tags. By now obsolete, but an interesting insight. We don't need to go there yet.
So here's where I think that we need to go: if I make changes to a subset/subproject, in particular to a branch of the subset that is depended on by a superset/superproject:
# I do not want to automatically merge those changes back into the superset/superproject; you have to allow the chance for the merge to be tested,
# but I don't want all of the possibly very large number of superset/superprojects to have to know of the possibly very large number of dynamically created subset/subprojects.
The existing methodologies, the existing flows, seem to work when there is a small number of branches known in advance. They break down when we get a large number of implicit, dynamic branches.
So, I think that we want to make these implicit relationships explicit. Instead of the main branch of SuperProject1 being manually merged from the main branches of SubProjectA and SubProjectB, I think that we want to record or deduce that SuperProject1 contains SubProjectA and SubProjectB, specifically the main branches thereof. And when A and B get modified (let's assume A and B do not overlap), I want SuperProject1's main branch to get told "Hey, you are at Vxyz, but your subcomponents A and B have advanced, in two separate non-overlapping checkins. These are candidates for merging into you, SuperProject1." And now the SuperProject1 maintainer can go and do the merges, together or separately, directly into SuperProject1's main branch or into task branches for the merge, as he may wish.
When I first started thinking about this, I thought that the checkin of SubProjectA might add itself as a candidate to SuperProject1. I now realize this is bogus. Only a project should manipulate itself.
However, the idea of candidates is a good one. It is not that the checkin of A creates a candidate for SuperProject1; it is that SuperProject1 knows how to go look for candidates that should be merged into it.
I.e. a project, or more precisely a branch of a project, should have rules about what stuff should be considered as candidates for merge. These rules could be something like "Automatically consider as a candidate for merge into the release branch stuff on the test branch that has been labelled TESTS_PASSED". But it may also be something like "Automatically consider as candidates for merge any modification or addition to any subdirectory under such and such a place in the hierarchy, which is marked as being MAIN branch for that subobject."
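To make that concrete, a toy Python sketch (pure illustration of the idea, no real tool behind it) of a branch object that pulls, rather than is pushed, its merge candidates, by applying its own rules to the checkins it can see:
    from dataclasses import dataclass

    @dataclass
    class Checkin:
        branch: str     # e.g. "SubProjectA/main"
        paths: tuple    # paths touched by the checkin
        labels: tuple   # e.g. ("TESTS_PASSED",)

    class Branch:
        def __init__(self, name, rules):
            self.name = name
            self.rules = rules      # list of predicates: Checkin -> bool

        def merge_candidates(self, visible_checkins):
            # The branch goes looking for candidates; checkins never push
            # themselves onto it.
            return [c for c in visible_checkins
                    if any(rule(c) for rule in self.rules)]

    super1 = Branch("SuperProject1/main", rules=[
        lambda c: c.branch == "test" and "TESTS_PASSED" in c.labels,
        lambda c: c.branch.endswith("/main") and
                  any(p.startswith("src/libag/") for p in c.paths),
    ])

    candidates = super1.merge_candidates([
        Checkin("SubProjectA/main", ("src/libag/number/num.h",), ()),
        Checkin("SubProjectB/main", ("docs/readme",), ()),
    ])
    # candidates now holds only the SubProjectA checkin; the maintainer
    # decides whether and how to merge it.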
Now, it might be easiest to fall back to per-file or per-pathname history: a checkin to a subset or superset that contains a file automatically makes changes to the history objects for such a file. I'm not sure that this is required. It may make things easier. But it gets a bit confusing across renames, etc. But it means that we do not NECESSARILY have to scan subsets, etc., that have been dynamically created. A superproject may be composed or related to a set of known other projects or other branches, or to an enumerable set of locations in an abstract filesystem tree.
I think it is best to start with file objects, and then try eliminating them.
However, file objects, or even non-file subset objects, raise issues of atomicity. In git, you are only modifying a single file, a manifest, at any time. If we have superprojects and subsets and individual file history objects, do we need to manipulate these atomically?
* Not necessarily. We can imagine modifying the individual file history objects, and then modifying the supersets one at a time. The latest versions of the file objects may be inconsistent, but the versions of the set objects would be consistent. Heck, if you wanted, you could backlabel the file history objects with a tag saying that they have been consistently checked into some superset.
* However, we might want to prevent even those minor inconsistencies - even though, AFAICT, git and hg have them.
** Being willing to use a transactional database, like sqlite, as part of the metadata system may get us around even this.
*** I'm grudgingly willing to go along with sqlite because it is a database in an ordinary file.
*** I would hope the different user level filesystem instances can be created, both with and without.
Thursday, July 01, 2010
What I want to do on my vacation
I'm on vacation. 11 days (10 now). Needed one. Haven't really had a vacation in 2-3 years. Went straight from Intel to IV without any idle time in between.
Of course, maybe it's good that I didn't take any time between jobs: Last time I changed jobs, from AMD to Intel in 2004, I wrote up some inventions during the time between employers. Worked with IV to patent them. Disclosed them to Intel on being rehired, and re-disclosed them to Intel when it looked like these inventions might be relevant to Intel projects. Got frozen out: wasn't allowed to work on such projects at Intel, in my main area of expertise. Not so much for fear that I would influence Intel to use my inventions - heck, knowing me, I would probably try to surpass them. But (according to an Intel lawyer, verbally - they never email stuff like this) because if I worked on such a project, I would be able to see if Intel was infringing my patents.
But, in any case, I'm on vacation now. Needed one. Recover from the full court press, the end of quarter rush to meet (and exceed) quotas.
So, what do I want to do on my vacation?:
- Invent Stuff. On my own time, using my own resources. Write up disclosures. If I can get IV to patent, great. If not, I'm allowed to publish it.
- Write articles on comp-arch.net
- Webadmin work on comp-arch.net and my other websites
- I am grudgingly concluding that I need to run my own server.
- No shared hosting site that I can find allows me to have suexec within subdirectories.
- I hate this need to run my own server. I *like* having someone else responsible to apply patches. But it appears that no shared hosting service makes it easy or cheap to run suexec and SSL at the same time.
- Coding for comp-arch.net and my other websites
- Drawing and table editor - in particular, I want to try to incorporate svg-edit
- Share software
- my Igetnumber and Dgetnumber libraries, which recognize expressions such as 4K-1 (=4095); see the sketch after this list
- PerlSQL
- my "noSQL" database tool from 1997-2000
- not really noSQL - it actually supports SQL, but does not require schemas
- I'm not aware of any other schema free database that supports SQL and operates on plain text files. Not sqlite, not ...
- haven't worked on it since I rejoined Intel; now that I'm at IV I'm allowed to work on this in my own time
- written in Perl 4 IIRC, needs to be upgraded
- feature wishlist: JSON, XML, ...
- Software I want to write - actually, software I DO NOT REALLY want to write, but which I have not been able to find good existing software for
- a distributed version control system that naturally supports single file or sub-directory checkouts, etc. I think Git and Mercurial have it wrong.
- although Git's approach of treating it as a user level filesystem is right. In fact, I wouldn't mind being able to use git's storage backend.
- basically, I want to get back to VC'ing my home directory, treating it as a tree of subprojects. Something that I was able to do with SCCS, RCS, CVS, SVN, but which broke with Git and Mercurial.
- Graphical editing - as described above, for my wiki
- Offline wiki
- Probably means I have to move off mediawiki
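Roughly what Igetnumber does, as a Python sketch (the real libraries surely handle more; this is not their code): accept binary suffixes like K/M/G and simple +/- arithmetic, so that "4K-1" comes out as 4095.
    import re

    _SUFFIX = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}

    def getnumber(expr: str) -> int:
        """Evaluate expressions like '4K-1' or '2M+16' (sketch, not the real Igetnumber)."""
        total, expr = 0, expr.replace(" ", "")
        # Tokenize into signed terms such as '+4K' or '-1'.
        for sign, digits, suffix in re.findall(r"([+-]?)(\d+)([KMGkmg]?)", expr):
            term = int(digits) * _SUFFIX[suffix.upper()]
            total += -term if sign == "-" else term
        return total

    assert getnumber("4K-1") == 4095
    assert getnumber("2M+16") == 2097168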
http://wiki.andy.glew.ca/wiki/To_Do_List
Although really that depends on me getting my servers sorted.
---
Oh, yeah: I also want to do some R&R. Read some SF. Get some exercise. Go on hikes with my wife and daughter and dog. (The wife and daughter are usually too busy, the dog is happy to join me.)