The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Sunday, September 30, 2012

Is Google Sites a wiki? I think not.

I went to Ward's original wiki, http://c2.com/cgi/wiki?WikiDesignPrinciples, seeking ammunition to blast Google Sites (http://sites.google.com) with.

Although http://en.wikipedia.org/wiki/Google_Sites says that Google Sites is a wiki, I don't think it is.  Or, at least, Google Sites is a lot harder and more painful to use than most of the wikis I am familiar with (Ward's original, mediawiki, twiki, moinmoin, zwiki, ...)

I think Google Sites, née JotSpot, is really one of those CMS that started independently, and then tried to adopt the wiki moniker after the fact.  Or else was developed by someone unfamiliar with wiki, who put a wiki-like cast over something else.
     Overall, in terms of my least favored, most hated, wiki faker sites, Google Sites is less objectionable than Atlassian Confluence, and roughly equivalent to Microsoft's FlexWiki.

My main complaint: Google Sites makes it hard to create links to pages that do not exist yet.

Actually, Google Sites makes it hard to create links period:  There is no quick syntax like WikiWord or [[double bracket links]].  The keyboard shortcut alt-I gets you to a place where you have to choose from a too-long list of link types.

But, worst IMHO: there doesn't seem to be a way to create a link to a page that does not exist yet.  At least not that I have found.  In creating a link I can create a new page - but there's a big difference between a link to a page that does not exist, and a link to a newly created page.


I *want* to like Google Sites as a wiki.  But I can't.


Google Docs is even less wiki-like.

Tuesday, September 25, 2012

emacs kill limited to 64K across frames (copy-region-as-kill / yank)

I just realized that cutting and pasting more than 64K bytes does not work in emacs 24.1.1

... if I cut in one frame and paste (yank) in a different frame

It is truncated at 64K.

It works if I cut and paste in the same frame.


I have encountered this bug when cutting and pasting from emacs to X.  Not sure if it is related here.

Improving Test Pass Rate and Correlation

I prefer to do TDD, Test Driven Design - heck, often I prefer the original term, Test First Programming.

It's nice to write a test that fails, write the code, pass the tests including the new one (getting a green bar on your testing widget, or a count of tests failed=0 if not so GUI), rinse, lather, and repeat.

Sometimes I write a bunch of tests in advance, but comment, ifdef, or if out all but the first of the tests that I do not expect to pass.  Get the green bar of all current tests passing, enable an already written test, and repeat.  This gives me an idea of where I am going overall, preventing getting bogged down as tests proliferate for minor details.

It is important, useful, good, to incrementally enable such tests, rather than enabling them all at once.

a) It's nice to get the green bar when all tests pass.  Positive reinforcement.  If a whole slew of tests are written in advance and are mostly failing, it can be too easy to lose sight of the progress you are making.  Or to go backwards without noticing - e.g. if 2 tests start working and 1 regresses, the net looks like +1 test started working, so you may miss the new failure.

b) on a more mundane level, this allows you to sketch out tests, in languages like C++, that do not even compile yet.  It is a waste of time to write a whole slew of tests in advance, spend ages getting them to compile, only to realize that there was a design flaw along the way that becomes evident as you get some earlier tests to run.

OK: TDD good.  Keeping tests mostly running, good.


But unfortunately sometimes we work in less Agile organizations.  Or in Agile teams, coupled to an ordinary QA team.  The sort of QA team that writes a thousand test cases in the first week, and then goes off and does something else while you take months to implement.  (It is especially easy to write such test cases if there is a random pattern stress tester, or some automation that creates many similar tests, slightly different.  A maze of twisty little tests, all slightly different.)

Worse: the QA team may be working concurrently. So, yesterday you had 100 tests passing, 900 failing.  Today you have 110 passing, 990 failing?  Is that an improvement, or did they just write some tests that test already implemented features?

It is depressing to have a thousand tests failing.  And to have the pass/fail stats vary almost randomly.

Beyond depressing, it doesn't help you decide what to work on next.  Some of the test failures are expected - unimplemented features.  Some may be the feature you are coding now - higher priority.  Some are for features that were working, but which regressed - possibly higher priority to fix.

Faced with such an amorphous mass of tests, it seems a good idea to carve out a subset that is known good, and grow that.

Also, to do a preliminary triage or failure analysis, and create a simple automated tool that classifies the failures according to expected fail/pass, type of error, etc.   This way, you pick a class of failing tests, write the code to make them pass, and then repeat - always choosing a class of failing tests that is incrementally easy to make pass, rather than arbitrarily choosing a test that may require much blind coding before it can pass.
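
A minimal sketch of such a classifier, in Python (the test names, buckets, and "known good" set are invented for illustration, not from any real test suite):

```python
# Illustrative failure triage: bucket each test result so that failing
# tests can be chosen in classes, not one arbitrary test at a time.

def classify(name, passed, known_good, expected_fail):
    """Triage one test result into a bucket."""
    if passed:
        return "pass"
    if name in known_good:
        return "regression"       # was working before - high priority
    if name in expected_fail:
        return "expected-fail"    # unimplemented feature - work queue
    return "needs-triage"         # new or unclassified failure

def triage(results, known_good, expected_fail):
    """Group (name, passed) results by bucket."""
    buckets = {}
    for name, passed in results:
        bucket = classify(name, passed, known_good, expected_fail)
        buckets.setdefault(bucket, []).append(name)
    return buckets
```

Run on every test pass, something like this turns "990 failing" into a handful of buckets you can choose among.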

Run that regularly.  Use it to guide your work.

Or, at least: I'm using it to guide MY work.

And, along the way... yes, I'll also do some TDD test-first of my own. In my experience, the QA tests often stress, but often do not test the most basic features.

Monday, September 24, 2012

What I want in PIM software


I confess:
I keep trying to use [[PIM (Personal Information Management)]] software,
but I keep getting disappointed.
I keep falling back to low tech solutions,
such as index cards.

* Back in 1997, Moshe Bach, seeing me using the paper ScanCard system, said that he had lost all respect for me as a hacker. :-(
* See [[Organization, technology, evolution of]]

But... I know the advantages of computers.

This page is for notes on what I want in such a system.

Perhaps I should keep these notes private, and secretly develop a killer app.
In my copious free time.

= Embedded in a Log =

My overall vision:
* smart stuff embedded in what is overall a log, a diary.
* i.e. overall unstructured, with structured nuggets embedded in it.

Actually, a log has some structure: timestamped items.
But there is further specialized structure embedded in such items:
[[to-do lists]],

I envision this as XML, but am not religious about XML.
JSON, however, I think is too stupid to represent what I want
without a lot of extra work.

= Checklists =

== Transcludable checklists ==

Checklists should be hierarchical. More: embeddable. [[Transcludable]].

Nearly everything should be transcludable, both as source and destination.
E.g. text blocks should be transcludable into checklists, and vice versa.

E.g. the checklist I use to remind me what not to forget when I go to the coast
may have my PC travel checklist and my kayaking checklist transcluded into it.
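
A sketch of how checklist transclusion might work, assuming a simple by-reference syntax (the "@name" marker and the checklist names are invented):

```python
# Sketch of transcludable checklists: "@name" items are invented
# syntax for transclusion by reference; expanding flattens the tree.

CHECKLISTS = {
    "pc-travel": ["laptop", "charger"],
    "kayaking": ["paddle", "drysuit"],
    "coast": ["@pc-travel", "@kayaking", "snacks"],
}

def expand(name, lists=CHECKLISTS, seen=None):
    """Flatten a checklist, following transclusions (cycle-safe)."""
    seen = set() if seen is None else seen
    if name in seen:
        return []          # ignore transclusion cycles
    seen.add(name)
    out = []
    for item in lists[name]:
        if item.startswith("@"):
            out.extend(expand(item[1:], lists, seen))
        else:
            out.append(item)
    return out
```

Here the coast checklist transcludes the PC travel and kayaking lists, exactly as in the example above.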

Checklists and to-do lists are closely related.

== Tracking partial progress ==

In my [[GTD]] [[tickler file]], I have many repetitive items like
"check that all of my many email accounts are working".
I try to automate as much as possible, but manual checks are still worthwhile.
E.g. "check that I can log in to all of my password manager accounts".
"Check that I can log into my password manager".

I have tried making the check of each account an individual tickler item.

But making a single to-do item for checking that all of my many email accounts are working is equally overwhelming.

I want to create a single repeating checklist,
listing all of the things I want to check,
repeating with an overall frequency.

A checklist to remember what I have finished on the current iteration,
and, more important, what I have left to do.

It should support activities that you do at a low boil, such as checking one account per day.
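
One way such a repeating checklist with partial progress might be modeled (a sketch; the class and the account names are invented):

```python
# Sketch of a repeating checklist that tracks partial progress across
# one iteration and resets when everything has been checked.

class RepeatingChecklist:
    def __init__(self, items):
        self.items = list(items)
        self.done = set()

    def next_item(self):
        """What is left to do this iteration (None only if empty list)."""
        for item in self.items:
            if item not in self.done:
                return item
        return None

    def check_off(self, item):
        self.done.add(item)
        if set(self.items) <= self.done:
            self.done = set()      # iteration complete: start over
```

Checking one account per day: after checking off "gmail", the next item is the next unchecked account; when the last item is checked, the list resets for the next repetition.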

= Projects =

Following [[GTD]], there are projects, separate from tasks.

I divide my projects into
* [[Current Projects]]
** things I am currently working on. Things that I am trying to actively drive.
* [[Background Projects]]
** things I may work on for months or years
** accumulating info as I go.

= Scrubbing =

One of the most important things about a PIM is scrubbing.
Ensuring that information is revisited periodically.

Tasks need to be revisited on next action dates.
Ditto current projects.

Background projects may not have next action dates
- although revisiting as time allows is a good idea.

Revisit all reference items?  All assets like checklists?

= Reminders =

It is often useful to have not just an agenda for the day,
but also a next reminder.
Many items seem too unimportant to put on an agenda or calendar,
but are important enough to create a reminder for.

When completed, the reminder can be automatically logged.

It is often important to create reminders not just for the beginning of a meeting,
but also for the end.
That way, you can choose not to allow your meeting to run longer than planned, even if nothing is scheduled next.
(I assume that there are always jobs that can be fit into unscheduled time.)

= Calendars - min time =

It should be possible for a calendar to reserve, not just specific blocks of time, but specific amounts of time at no particular instant.

E.g. "I want to reserve 8 hours per week for 'sharpening the saw' - it doesn't matter when, but it must be in blocks of >2 hours".
Accept meeting requests up to that point...
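
A sketch of the accept/reject rule this implies, assuming free time is known as a list of block lengths in hours (all names and numbers are illustrative):

```python
# Illustrative sketch: reserve an amount of time per week, in blocks of
# at least min_block hours, and refuse meetings that would break the
# quota.

def usable_time(free_blocks, min_block=2):
    """Hours available in free blocks that are each >= min_block long."""
    return sum(b for b in free_blocks if b >= min_block)

def may_accept_meeting(free_blocks_after, quota=8, min_block=2):
    """Accept a meeting only if the quota survives in big-enough blocks."""
    return usable_time(free_blocks_after, min_block) >= quota
```

E.g. free blocks of 4, 3, and 2 hours leave 9 usable hours, so a meeting is acceptable; a meeting that shrinks the 2-hour block to 1.5 hours leaves only 7 usable hours, so it should be refused even though total free time barely changed.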

= Exercise Trackers =

I have been enjoying Android exercise trackers that count situps, pushups, etc.
But many are too specific.
It should be possible to have a simple counter app that can count...  whatever you want.
E.g. sun salutations in yoga.

Basically, name of thing being counted,
and generic count trigger: shake, touch, etc.

= Trackers =

Should be automatically logged.


Did I mention, again, that I want the same formatting in wiki, blog, log, etc.?

Tuesday, September 18, 2012

Cloud filesystems and permissions - not just user, but also user/local/device - DRM for users?

IMHO things like Dropbox and Google Drive (aka Docs) are just steps towards a web or cloud filesystem.

Frustrated by the deficiencies of to-do list and organizer apps, I am playing around with just using files - on one of these nascent cloud filesystems. Perhaps edited by hand.  (Hey, emacs runs on my android pda-like device, but I haven't tried org-mode.)  Perhaps with a nicer front end.

Frustrated because I really need the ability to do offline editing on my PDA.  Google Drive only allows offline viewing.  Dropbox... unclear... certainly allows files to be queued for upload when offline... but the first few text editors I tried just lose the data when you try to edit offline, and then save (of course, so many don't have an explicit save).

Perhaps I'll cobble something together with git or mercurial. After all, a DVCS is just a Poor Man's occasionally connected wide area network filesystem.

But, the above is just background to my post.

As I play around with Dropbox and Google Drive, I am

1) happy to see how easy they have made it to share things.  Share is a button as prominent as "Save File" used to be.

2) scared at how easy they have made it to share things by accident. I have several times hit Share by accident. (Especially since Google Chrome has oscillatory layout issues - buttons keep jumping around.)  In another app, MapMyHike, every time I save I am asked if I want my record to be private or public.  There doesn't seem to be a default setting so that I can disable the Public option. Several times I have saved as public by accident.  (I should not have to explain the security issues of making records of your hikes publicly accessible.)

But more... playing around with Dropbox and Google Drive synching and offline access:

It occurs to me that permissions need to be not just by user.  Not just by user/role (as I have discussed elsewhere).  But also by user and locale, by user and device.

I.e. there is some stuff that I may want to have on a cloud filesystem that I may never want to have on my phone or PDA.

There is some stuff that I may want to have on a cloud filesystem, that I may want to access on my phone or PDA, but that I may never want to have cached or enabled for offline access.

This is, I suppose, just a role.  But instead of having to manage many roles, I might just want to say "Never save this file offline on a phone or PDA".  Simple.
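
A sketch of what such a per-file "never offline on a phone or PDA" attribute might look like (the file names, attribute names, and device classes are all invented):

```python
# Sketch: a per-file attribute, associated with the data rather than
# the user, restricting which device classes may cache it offline.

FILE_ATTRS = {
    "finances.xls": {"no_offline_on": {"phone", "pda"}},
    "todo.txt": {},
}

def may_cache_offline(path, device_class):
    """May this device class keep an offline copy of the file?"""
    attrs = FILE_ATTRS.get(path, {})
    return device_class not in attrs.get("no_offline_on", set())
```

So finances.xls may never be cached on a phone or PDA, while the same file on a desktop, or todo.txt anywhere, is allowed - one attribute, no role management.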

Hmmm.... I want to say that this is just a capability.  But it isn't, is it?   It is rather like a capability modifier, something associated with the data, not the user.

In some ways, it is an application of DRM, except for the benefit of the user, not just the motion picture industry.

Thursday, September 13, 2012

Partial checkins, file objects not necessary

I very much want partial checkins and checkouts in my version control systems.  I had them in SVN and CVS.  I lost them with DVCS.

I have posted elsewhere about DVCS compatible concepts that might support partial checkins and checkouts.

Partial checkouts are not that hard to imagine.  E.g. to checkout a subdirectory tree, scan the repo for all history relating to files that ever were in that subdirectory.
     Use whatever "history relating to" criteria you want: was in a file in that subdirectory, was in a file that at some point in its history was ever moved under that subdirectory.  Possibly the hypothetical git-like "looks like a piece of code that once lived in a file under that subdirectory". Just not "true always".
     One might provide the full history of all such files in the partial checkout or clone, although it might not be possible to check out some files from the partial clone repo history into the partial clone workspace, since to be properly checked out some versions might lie outside the cloned subdirectory.  But one could at least access them from the history.
     Some fun in ensuring that one can check out into the workspace different versions that lived at different places in the overall directory tree - i.e. where the whole subtree was moved.
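
The "history relating to" filter might be sketched like this, assuming a simplified rename map from current path to previous path (all data structures are invented for illustration, not any DVCS's actual model):

```python
# Sketch of the "history relating to a subdirectory" filter for a
# partial clone: keep commits touching any path that ever lived under
# the subtree, following renames backwards.

def ever_under(path, subtree, renames):
    """True if path, or any earlier name of it, was under subtree/."""
    seen = set()
    while path is not None and path not in seen:
        seen.add(path)
        if path == subtree or path.startswith(subtree + "/"):
            return True
        path = renames.get(path)   # follow the rename chain backwards
    return False

def partial_history(commits, subtree, renames):
    """Commits relevant to a partial checkout of subtree."""
    return [c for c in commits
            if any(ever_under(p, subtree, renames) for p in c["paths"])]
```
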

Partial checkins are a bit more of a problem, given the desire for atomicity and consistency: one wants at least a chance to ensure that an entire repo is consistent, can pass all tests. Automatically including a partial checkin in the default trunk of the enclosing project would break this.
     The basic idea is for partial checkins to automatically create branches.  And then for the superrepository to be strongly encouraged to merge such partial checkin branches into the default trunk (or whatever branch was partially checked out from).  Somewhat like the way Mercurial automatically creates heads, anonymous branches, when there are conflicting edits to the same branch.  Now one would have heads, but not all heads would include the entire repo; some might be for partial checkins.

 What has long troubled me, however, is that the easiest way to imagine building this is to create history objects that correspond to files.  A project version would be a mapping of pathnames within the repo to a set of history objects. A project might be defined to want to include the latest version of any particular history object going forward, but not to actually include it until the user has had a chance to test it all together.

This troubles me because, while I think Linus is excessively pedantic for forbidding rename tracking in git, I agree with him that file rename tracking is primitive. I really do want to track functions as they are cut and pasted between files.
     I have long been fascinated by bits and rumours I have heard about non file based version control systems.  E.g. old systems that were conceptual card decks - where you didn't lock a file, but you checked out and in ranges of cards, potentially replacing a range with a larger or smaller number of cards.  Or with IBM Visual Age for Smalltalk/Java/C++, which seems not to be inherently file granularity.

History objects corresponding to files get in the way of this.

I think now I can see how to generalize this.  I often imagine a filesystem as an XML database - the hierarchy is natural. The old "version control of card decks" applies. One can imagine the patches for any subtree being under that subtree in the XML.
     OK, XML is horrible.  But it shows the direction: not "patch objects" that apply to the whole monolithic project, but patches that are themselves an interleaved collection of patches to files and subdirs.  With the ability to interleave and uninterleave patches according to the filesystem hierarchy.

Tuesday, September 11, 2012

Dataflow scheduling for ordinary software - providing error messages as early as possible

I just ran a rather slow tool that ran for a long time to produce an error message that it could have figured out immediately.

Oh, heck, I might as well say what it was:

I just typed in
hg revert -r default
instead of
hg revert -r default .hgrc
Mercurial went and scanned my home directory repository, which is rather large, taking 10+ minutes to scan, before it gave me the error message
abort: no files or directories specified
It would have been nice if it had reported the error message immediately, instead of scanning the repo.

This isn't just Mercurial specific: other tools have the same problem.  Some of them my tools.

It is nice if interactive tools report error messages as soon as possible.

Aspect oriented programming might help.  Create an aspect for "command line checking", and reorder code related to command line checking earlier.

Dataflow analysis might also help.  In an ideal world, sans ambiguous pointers, etc., it would be evident that the error check depended only on the command line, and could be moved up earlier.
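
For comparison, the fix is trivial to express by hand - the sketch below is a hypothetical revert-like command, not Mercurial's actual code:

```python
# Hypothetical revert-like command: the check that depends only on the
# arguments runs first, before the expensive repository scan.

def scan_repository():
    pass   # stand-in for the 10+ minute scan of a large repo

def revert(rev, paths):
    # Argument-only check: report the error immediately.
    if not paths:
        raise SystemExit("abort: no files or directories specified")
    scan_repository()   # slow work happens only for valid invocations
    return f"reverted {paths} to {rev}"
```

Dataflow scheduling would, in effect, perform this reordering automatically whenever the check's inputs are available early.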


This is just part of my not-so-hidden agenda, to apply dataflow everywhere.

Monday, September 10, 2012


I am getting into trackers, as in Quantified Self.

But the tracking apps that you have to invoke manually are too high overhead.

I used to use a chess clock to track time working versus lost to IT bugs.

Idea: extend and adapt this for multiple trackers.

E.g. start off with an Android chess clock like tracker.  Big buttons saying working, lunch, hung, ...

Extend to scroll through a list of tracker/samplers. 

Friday, September 07, 2012

Find latest version of tool on path

Frustrated by systems where there isn't a single order of bin directories that can arrange for the most recent version of tools to be used, I wrote a tool to find the latest version of a tool on the path:

~/bin/path-version -latest -verbose --version -print -exec bash -c 'echo hi from bash $($SHELL -version)'
/home/glew/bin/bash: 3.2.25(1)-release
/usr/local/bin/bash: 2.05a.0(2)-release
/bin/bash: 3.2.25(1)-release
latest: /home/glew/bin/bash: 3.2.25(1)-release
exec /home/glew/bin/bash, -c echo hi from bash $($SHELL -version);
hello /bin/bash -c echo hi from bash $($SHELL -version)
hi from bash GNU bash, version 3.2.25(1)-release (x86_64-redhat-linux-gnu) Copyright (C) 2005 Free Software Foundation, Inc.

--version (or -v, or -V, or -version) is the knob passed to the tool to query its version.

You can print all versions, just the most recent, or just exec.


Based on a Perl library I just wrote, Version_Numbers.pm, that parses Version number strings of several syntaxes.  I.e. not just Perl's, and not just 1.2.3

I was surprised not to find such a library already existing.  Some were close, but not quite there.

It also compares.  Although the comparison rules are a bit too simplistic: strings are lexically sorted, whereas 1.2rc should arguably be less recent than 1.2, if the latter is what actually got released.
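
A sketch of a comparison key that handles the 1.2rc case - not the author's Version_Numbers.pm, just one plausible ordering, in Python rather than Perl:

```python
# A version-comparison key that fixes the lexical-sort pitfall:
# "1.2rc" orders before "1.2", and "1.10" after "1.2".

import re

def version_key(v):
    key = []
    for num, suffix in re.findall(r"(\d+)([A-Za-z]*)", v):
        # A bare number ranks above the same number with a pre-release
        # suffix, so flag suffixed components with -1.
        key.append((int(num), 0 if suffix == "" else -1, suffix))
    return key

ordered = sorted(["1.2", "1.10", "1.2rc"], key=version_key)
# ordered is ["1.2rc", "1.2", "1.10"]
```
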

Thursday, September 06, 2012

Capabilities created on the fly

I'm a fan of capabilities.

But a problem with capabilities systems is creating and maintaining the capabilities.

If somebody has to design the capabilities, then you end up with a finite list of whatever the creator anticipated you might ask for.  Something like Android's app privileges: "This app requests full Internet access."

Hell, no: I only want it to access particular sites.  And I only want it to send data from certain files. ...

But what OS writer wants to code all of that up?


How about creating capabilities on the fly?

E.g. create capabilities automatically for all syscalls:

"open( function_or_regexp_to_be_applied_filename, function_to_be_applied_to_permissions)"


only files under such and such a directory
    owned by such and such a user
    after a scan has been done on them

"* ... "

any syscall
with a filename argument that meets certain criteria ...


Since syscalls are rather low level, might apply this to any function call or library.


In general, I want functions to be applied before the call, given the call and its arguments.  Possibly the caller...  (like "No socket opens from the user interaction facility").

Possibly evaluate on the raw call and args.
Better yet if they can be evaluated on the canonicalized data that might be recorded in a log file - e.g. where filenames are made absolute, etc.
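
A user-level sketch of such an on-the-fly capability, as a predicate wrapped around open() - the policy, paths, and helper names are invented, and a real system would enforce this below user code (kernel or syscall filter), not in Python:

```python
# Sketch of a capability created on the fly: a predicate function is
# attached to open() and evaluated on the canonicalized path before
# the call is allowed to proceed.

import posixpath

def make_open_capability(path_pred):
    """Wrap open()-like access in a path predicate."""
    def guarded_open(path, mode="r"):
        canon = posixpath.normpath(path)   # canonicalize, e.g. fold ".."
        if not path_pred(canon):
            raise PermissionError("capability denies open of " + canon)
        return ("open", canon, mode)       # stand-in for the real syscall
    return guarded_open

# Capability: only files under /tmp/sandbox may be opened.
sandbox_open = make_open_capability(lambda p: p.startswith("/tmp/sandbox/"))
```

Note that canonicalizing before the check is what makes the invariant iron-clad: a path like /tmp/sandbox/../etc/passwd is folded to /etc/passwd and denied.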


This doesn't eliminate the need to design coherent capabilities systems.

But it does mean that you can fairly easily create iron-clad invariants, such as: only files in a particular place are accessed.

Aargh!! Out of date systems!!!

Aargh!!! I am sick and tired of working on out of date Linux boxes (that I am not sysadmin on, that I cannot easily update).

Today's trivial annoyance:

bash prompt \D{strftime format}

doesn't work in bash 2.05a.0(2), copyright 2001,
installed on the machine I am working on at work;
it works in a more recent Ubuntu's bash, 4.1.5(1), copyright 2009.


Many of my friends just maintain their own virtual machines with whatever they depend on
- usually more recent than work - installed.
I must start doing that.

(Last time I tried, I ran out of disk space. Plus, the laptop I can install on is much slower than the workstations I can run on, even though they have old software.)

And then there's the risk that what works in my virtual box won't work on a standard work machine that somebody else is using...


These are trivial annoyances.  But finding failures like this wastes a non-trivial amount of time.  Many times in any given week.