The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Wednesday, July 18, 2012

Log entries cloned to status

I want to make logs (diaries, journals, etc.) more useful.

I have discussed the distinction between log entries - "at time T I did or observed this" - and status records - "the current state, e.g. of a code tree or my home directory, is SSSSS".

I.e. log entries are transient and historical.

Status entries are persistent, and, supposedly, current.

The big problem is that status goes stale.  So status is always of the form "At time T the status was S (and I assume it still is, unless somebody has changed it, in which case they should please change the status)".

It is a pain to update both status and log.

Many log entries are status.  But the log is not status - if for no other reason than that a log grows huge, and periodically gets cleaned out or renamed.

Idea: automatically mark blurbs that are being written as status entries, in addition to adding them to the log.  (I want to make everything written get logged.)

E.g. copy them to ~/STATUS as well as ~/LOG.  Linked appropriately.
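
A minimal sketch of that idea in Python. The "== timestamp id=..." header format, the function name write_blurb, and the use of a shared id as the link between the two copies are all assumptions made for illustration, not an existing tool.

```python
import uuid
from datetime import datetime
from pathlib import Path


def write_blurb(text, log_path, status_path=None, now=None):
    """Append a timestamped blurb to the log.

    If status_path is given, append an identical copy there as well.
    Both copies carry the same id, which serves as the cross link.
    (Header format and id scheme are hypothetical.)
    """
    now = now or datetime.now()
    blurb_id = uuid.uuid4().hex[:8]
    header = f"== {now.isoformat(timespec='seconds')} id={blurb_id}"
    entry = f"{header}\n{text.rstrip()}\n\n"

    targets = [Path(log_path)]
    if status_path is not None:
        targets.append(Path(status_path))
    for target in targets:
        with target.open("a", encoding="utf-8") as f:
            f.write(entry)
    return blurb_id
```

Everything goes to the log; only blurbs marked as status also land in the status file, and the timestamp in the header records the "at time T the status was S" part.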


Andy "Krazy" Glew said...

A friend responded to this blog saying "why not just tag entries?"

Indeed, that is where I started. But tagging stuff in a central log/blog/whatever is not enough.

I see it as a problem of distribution, subsetting, and security.

E.g. I want to keep a single log. For all of my stuff – e.g. my blood pressure, my thoughts on politics, my hacking around with personal compute tools, and my work. It is too much of a hassle to figure out that my work log goes in ~/work/LOG, and my personal in ~/LOG, etc.

But some stuff is proprietary to work, and some is not. I am not allowed to keep a copy of my log notes relating to work on a personal computer. However, the converse also applies: I don’t necessarily want my health records to be stored on a work computer.

So, I think I need log software that gives me a single interface to write a log.
A log consisting of multiple timestamped items – what I call blurbs.
And then which allows me to go back and tag the blurbs as proprietary, personal, whatever,
and which, when that is done, moves the data (and ensures that there is no trace of the original blurb on the system I wrote it on).
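
The routing half of this could look roughly like the following Python sketch. The Blurb class, the tag names, and the destination strings are hypothetical. The sketch only decides where each blurb should end up; actually scrubbing every trace from the original machine (editor backups, filesystem snapshots, and so on) is a much harder problem that it does not attempt.

```python
from dataclasses import dataclass, field


@dataclass
class Blurb:
    timestamp: str   # ISO 8601 string, so string order is time order
    text: str
    tags: set = field(default_factory=set)


def route(blurbs, destinations, default):
    """Assign each blurb to exactly one destination based on its tags.

    destinations maps a tag (e.g. "proprietary") to a place to store it;
    the first matching tag wins. Untagged blurbs go to default.
    Returns a dict: destination -> list of blurbs.
    """
    routed = {}
    for blurb in blurbs:
        target = default
        for tag, place in destinations.items():
            if tag in blurb.tags:
                target = place
                break
        routed.setdefault(target, []).append(blurb)
    return routed
```

Because each blurb is assigned to exactly one destination, a tool built on this could write the routed blurbs out and then rewrite the original log without them.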

So, you are right, tagging is part of it.
But you can’t leave the data in a central place, like a private blog.
You are not legally allowed to in many cases.

The same goes for LOGS and STATUSes.

Where do you store the status? In a central place? ~/STATUS. ~/work/STATUS?

Say that your project has taken some of my personal library of tools – most of the companies I have worked for have done that a bit.
Would you not want to have the LOGs and STATUSes that pertain just to that library?

It’s all a question of partial checkouts. You don’t want all of ~glew/src – you may only want ~glew/src/lib/debug/debug.h.

But you want the log and the status info related to that subset.

Typically, I think of partial checkouts of subtrees of the filesystem.
But logs don’t work that way.

(Yes, some of that you get from a version control system. E.g. CVS allowed partial checkouts of subtrees of the repository. DVCSes like git and hg are not so nice in this regard.)

Imagine that I have a big tree.
I keep my logs and status in ~glew/LOG and ~glew/STATUS.
Imagine it is a database.

Then, when you check out ~glew/src/lib/debug/debug.h
the portions of the log and status that pertain to what you have checked out are provided to you,
automatically extracted.
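
A rough Python sketch of that extraction, assuming each blurb in the database records the path it is about. The (path, text) pair representation and the rule "a blurb pertains if its path is the checked-out path or lies beneath it" are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def pertains(blurb_path, checkout):
    """True if blurb_path is the checked-out path or lies beneath it."""
    p = PurePosixPath(blurb_path)
    c = PurePosixPath(checkout)
    return p == c or c in p.parents


def extract(blurbs, checkout):
    """Return the (path, text) blurbs relevant to a partial checkout."""
    return [(path, text) for path, text in blurbs if pertains(path, checkout)]
```

Checking out a single file yields only the blurbs attached to that file; checking out a directory yields everything attached to it or to anything below it.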

Or, perhaps better:

If I am editing ~glew/src/lib/debug/debug.h,
I can edit ~glew/src/lib/debug/LOG
and ~glew/src/lib/debug/STATUS.

(With the automatic cross linking / copying of LOG items to status that I mentioned in the blog)

Great. But if I want to look at everything I have done this week, all of the blurbs from all of the LOGS are assembled,
into ~/LOG.

I.e. ~/LOG.overall = find . -name 'LOG' | sort-blurbs
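
The one-liner above could be fleshed out roughly as follows in Python. sort-blurbs is not an existing tool; here it is approximated under the assumption that every blurb starts with a header line of the form "== <ISO timestamp>", which is a made-up convention.

```python
import re
from pathlib import Path

# Hypothetical blurb header: "== 2012-07-18T09:30:00 ..."
HEADER = re.compile(r"^== (\S+)", re.MULTILINE)


def parse_blurbs(text):
    """Split log text into (timestamp, blurb) pairs."""
    matches = list(HEADER.finditer(text))
    blurbs = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        blurbs.append((m.group(1), text[m.start():end].strip()))
    return blurbs


def assemble(root):
    """Collect blurbs from every file named LOG under root, oldest first."""
    blurbs = []
    for log in Path(root).rglob("LOG"):
        if log.is_file():
            blurbs.extend(parse_blurbs(log.read_text(encoding="utf-8")))
    # ISO 8601 timestamps sort correctly as plain strings.
    return sorted(blurbs)
```

Filtering the result to "this week" is then just a comparison on the timestamp field.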

So far, so good.

But then I noticed that I often want to PRESERVE the LOG from DVCS clones that I am deleting.

I.e. I may be deleting a failed experiment, but I want to preserve the information I learned from it.

i.e. I want …/failed-experiment/LOG to be propagated up, and out, automatically.
Even though I may be deleting the files and directory it was associated with.
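
A minimal sketch of such a "delete, but keep what I learned" step in Python. The function name and the "(preserved from ...)" provenance line are assumptions; a real tool would presumably hook into the DVCS rather than be called by hand.

```python
import shutil
from pathlib import Path


def delete_preserving_log(directory, keep_log):
    """Append every LOG file under directory to keep_log, then delete directory.

    Returns the number of LOG files preserved.
    """
    directory = Path(directory).resolve()
    keep = Path(keep_log).resolve()
    # Refuse to run if the destination would be deleted along with the tree.
    if directory in keep.parents:
        raise ValueError("keep_log must live outside the directory being deleted")

    saved = 0
    with keep.open("a", encoding="utf-8") as out:
        for log in sorted(directory.rglob("LOG")):
            if log.is_file():
                out.write(f"(preserved from {log})\n")
                out.write(log.read_text(encoding="utf-8").rstrip() + "\n\n")
                saved += 1
    shutil.rmtree(directory)
    return saved
```

The provenance line keeps a record of where each blurb came from, even though that directory no longer exists afterwards.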

Andy "Krazy" Glew said...

So, it is not accurate to say that a LOG wants to be an ordinary file …/some-dir/LOG.
A LOG really wants to be something that sometimes appears at a local place in the tree, and sometimes appears globally. Hierarchically.

It's like aspect oriented programming. Imagine assembling a soup of all blurbs below you in the filesystem.
Or, possibly, that have ever been below you, even though their directories may have been deleted.

Now filter that soup, and sort.

But then again it can't be all that have ever been below you - because it is an ABSOLUTE REQUIREMENT that it be possible to completely and utterly delete stuff for proprietary reasons. So shit like Mercurial's immutable history is just that - shit.

Some say that such IP issues are rare, and that expensive and clumsy mechanisms like rewriting history via hg histedit are okay.

Myself, I encounter such issues regularly, several times a day. Most people probably do too; they just don't realize it.

BACKGROUND: have you ever tried keeping a LOG inside a version control system?
Something that describes something as simple as all the branches in your version control tree.

In the checkin comments – problems with cleanliness. At the very least you need to filter, extract, etc.

In files in the system – then you find that you want your LOG to be a sort of special file
that is visible in all branches. Or, some may say, something outside of the VCS –
but then it also needs to be version controlled. Just not exactly the same way as ordinary files.

As you can tell, I continue to be fascinated by version control.

Andy "Krazy" Glew said...

As you can tell, this is a fun side project, something I think about.

By the way, when I say “log” I am not necessarily talking about a source code version control log. Oftentimes I am talking about a human log or diary. E.g. I find that I work better when I write notes about my work in ~glew/LOG, which at the moment I keep in ~glew/work/LOG.

Now, I think that some of the same principles apply to version control logs, diaries, journals – as well as the medical log that I use to keep track of my weight and blood sugar and exercise (I’m a newly diagnosed diabetic, and am trying to learn what it feels like to have high or low blood sugar. As distinct from having an allergy attack. And so on.)

For that matter, I am fascinated by how similar some of these problems are to maintaining speculative state in an advanced processor. E.g. I just realized that the problem of cherrypicking revsets, which is one of the most asked-for features of DVCS, but which hg does not really support, is very much like the problem of constructing a consistent history (a) for speculative multithreading, and (b) for memory ordering.

So, I write in ~glew/LOG -> ~glew/work/LOG. And on my blog, http://blog.andy.glew.ca/. And on my various wikis, e.g. http://wiki.andy.glew.ca and http://semipublic.comp-arch.net/.
I have various trackers on my phone – I use LifeTracker at the moment – and on websites – my weight is tracked automatically by the Withings wifi-connected scale. I use MapMyHike.com to track my exercise.
Much less often, I tweet or use FaceBook.
Quite regularly, I ask questions about how to do a programming task on http://stackoverflow.com/ (BTW, I recommend stackoverflow to you – I have received a lot of help there.)
Oh, did I mention that I sometimes email or text voicemail to myself with todo items and ideas? So my email is also part of my “log” or diary.

Oh, and I almost forgot: I write checkin messages for various source code control systems.

But I find that I often want to have the same thing on multiple sites. Cutting and pasting is too slow.

One of my particular complaints about DVCS is that I often create a clone, do some work, make some checkins, and then throw it away. But I would probably like to have recorded what, approximately, I was doing in ~/work/LOG. Otherwise, what I recorded in the VCS checkin log can too easily get lost along with the clone.

~glew/LOG -> ~glew/work/LOG is an example. It is too much of a pain to try to remember that I should put proprietary stuff in ~/work/LOG, and personal stuff in ~/LOG. So I put all of my stuff in ~glew/LOG -> ~glew/work/LOG. But what this means is that I have no notes about the changes I made to my personal .emacs, etc., on my non-work computers.

It’s not just proprietariness. It’s FUD about proprietariness.

At the moment my log is an EMACS text file. At my last company it was a Microsoft OneNote notebook – mainly because I was taking many screenclips of webpages. Screenclips and images turn out to be a great way to record stuff from the web. Cutting and pasting text wastes so much time with formatting issues.
Especially since OneNote OCRs bitmaps, so you can often search them.
But emacs has many good features. E.g. org-mode.

One of the most useful things I have done recently is install a modern emacs, and start using my-org-screenshot so that I can save clips of simview pipetraces in my ~/work/LOG. Helps me track where I am going.