Disclaimer

The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Saturday, December 06, 2014

Conditional Text, Superposition, Quantum

I have been posting a lot about conditional text. Work. FrameMaker. Bah!

Just read a Quantum Computing paper.

Conditional text and quantum seem related: both involve superposition, having a single value represent multiple. If I advocate conditional text, should I also advocate quantum?

Here's a thought: why is quantum computing more efficient? If the answer is superposition, the increased efficiency may lie in the fact that a single operation on two values, one a superposition of M values and the other a superposition of N, usually corresponds to M*N operations on non-superposed values.

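In I stages, where each value at stage i is computed from two values at stage i-1:

$$v_{i,j} = v_{i-1,jj} \;\mathrm{OP}\; v_{i-1,kk}$$

If every value is a 2-way superposition at stage 0, the number of superposed states squares at every stage: stage 1: 2*2 = 4, stage 2: 4*4 = 16, and so on, i.e. $S_i = S_{i-1}^2 = 2^{2^i}$.

Not just 2^N, but more.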
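A small Python sketch of the counting argument, assuming every operand at stage 0 is an independent 2-way superposition and that each OP combines every state of one operand with every state of the other. `superposed_states` is a hypothetical name, and this only counts states; it does not simulate anything quantum:

```python
def superposed_states(stages: int, initial: int = 2) -> int:
    """Count of classical values represented after `stages` stages.

    Assumes each stage combines two independent operands that each carry
    the previous stage's count, so M states OP N states gives M*N states.
    """
    count = initial
    for _ in range(stages):
        count = count * count  # both operands carry `count` states
    return count


for i in range(4):
    print(i, superposed_states(i))  # counts 2, 4, 16, 256, i.e. 2**(2**i)
```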

---

But, if noise reduces the number of superposed states to a small finite number: then quantum is "only" a constant multiplier increase in efficiency.

I.e. in the presence of noise, quantum is not a big-O increase in computational efficiency.
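The same counting sketch with noise, assuming (hypothetically) that noise caps the number of usefully distinguishable superposed states at a constant `cap`. Once the count hits the cap it stops growing, so the advantage over the non-superposed computation is bounded by a constant factor rather than growing with the number of stages:

```python
def noisy_states(stages: int, cap: int, initial: int = 2) -> int:
    """Superposed-state count when noise limits it to at most `cap`."""
    count = min(initial, cap)
    for _ in range(stages):
        count = min(count * count, cap)  # squaring, then clipped by noise
    return count


# With cap=16 the count stops growing after stage 2.
print([noisy_states(i, cap=16) for i in range(6)])  # [2, 4, 16, 16, 16, 16]
```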

This is so obvious that I am sure there must be a flaw in my reasoning.

Friday, December 05, 2014

WYSIWYG conditional text: distinguishing

One big problem for WYSIWYG conditional text is allowing the editor to recognize the conditions.

Basically, to do WYSIWYG editing of conditional text, you need to be non-WYSIWYG ... :-(

There are only a few things that you can do to distinguish different conditions:

* use color

* use font bold, italic. Fontsize not so good.

* use background colors

* use marks like underlining, strikethrough, crosshatching.

Anything that is used to indicate conditions might conflict with the same visual effects used in the final document. Not so bad if preparing for publication in the traditional press, which is largely black and white. Bad if you want to use those effects, e.g. on a webpage.

If you have limited colors, say 4, and limited effects, say single and double underlining, you only get 8 combinations. But even the v123 tABC example exceeds that.

You get to 4x4x4x2 if you allow all combinations of letter color, underline color, and background color. I don't know of a tool that does this. Sure, it would look cluttered - but I suspect that my brain could decode it.
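The arithmetic behind these two paragraphs, as a Python sketch. It assumes each piece of text may carry any subset of the six tags V1, V2, V3, TA, TB, TC (including none), and that every visual channel can be varied independently; the channel counts are the ones used above:

```python
from math import prod

TAGS = ["V1", "V2", "V3", "TA", "TB", "TC"]

# Any subset of the tags (including none) is a distinct condition: 2**6.
tag_sets = 2 ** len(TAGS)

# Visual channels and how many values each can take, as in the text.
color_and_underline = {"text color": 4, "underline style": 2}
all_channels = {"text color": 4, "underline color": 4,
                "background color": 4, "underline style": 2}

print(tag_sets)                            # 64
print(prod(color_and_underline.values()))  # 8
print(prod(all_channels.values()))         # 128
```

So 8 styles cannot distinguish the 64 possible tag sets, while 128 combinations can, at least on paper, before discounting useless pairs like red on red.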

Of course, one can have more than 4 colors. But conversely, one probably should not use exactly the same colors: red foreground text on a red background is useless. The system probably needs to adjust automatically, so that it is red foreground text on a dimmer red background that still shows the contrast. Conventions.

---

Not to mention that we want to see overlapping tags, e.g. reserving background color for one set of tags, v123, and text color for another, tABC. This doesn't scale.

---

Two effects may scale:

underlining with different colors

flyovers - these can indicate arbitrary combos of conditions.

---

Part of the trouble in managing WYSIWYG conditional text is that an upstream user may decide to use a color that you have already used.

Quite apart from the logic.

Logic for conditional text

Before I talk more about WYSIWYG for conditional text, an observation about the logic of tagging, i.e. labelling, text.

For many years FrameMaker only allowed ORing of conditional text tags: if a piece of text was tagged with both A and B, it would print if either A or B was true. There was no way of saying that both A and B had to be true.

Sure, FrameMaker eventually fixed that: first ANDing, then AND NOT, then parentheses. I am not sure if it is arbitrarily complex yet, but it is better than it was. (I wish my company used that version of FrameMaker, or did not use FrameMaker at all.)

In theory all conditions could be expressed as a sum of product minterms or a product of sum maxterms, with NOT. Heck, it could all be expressed with NAND or NOR alone. But that would be a pain.

Part of the problem is that conditional text tagging is done at different times, by different people - or, worse, simultaneously by people who are not necessarily coordinating. And that the logical expressions required are fragile - one contributor adding a tag may "EXPLODE" the logical expressions needed to handle his text, when transcluded by somebody else. Fragile base class all over again!

Say that user1 is producing three versions of a document, tagged V1, V2, V3. Just those tags are used; there is no "not-V1". The logic to determine if text is displayed is not "V1 or V2 or V3" - it is actually "Untagged OR V1 OR V2 OR V3". If text is tagged both V1 and V2, it displays if either V1 or V2 is being produced, but not V3.

(Much confusion occurs because people forget the implicit "Untagged", i.e. that untagged text is displayed in all versions.)

But user2 has added a different set of tags: let's say TA, TB, TC. Same logic: "Untagged OR TA OR TB OR TC".

I am deliberately not using binary tags, X and not-X. I have observed that often tags come in these mutual-??? sets. Not really mutually exclusive - text may be tagged both V1 and V2. But the V1 and V2 versions are conceptually separate. The problem, of course, arises because I often like to create a readable meta-document, a unified version, rather than having to cope with combinatoric explosion. But that's for later.

Now the user1 and user2 conditions may combine. Combinatoric explosion.

To emphasize: the combined condition is NOT simply user1.condition OR user2.condition, i.e. not ("Untagged OR V1 OR V2 OR V3") OR ("Untagged OR TA OR TB OR TC"), which would collapse to "Untagged OR V1 OR V2 OR V3 OR TA OR TB OR TC".

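E.g. if we are generating the combination V1.TB, the condition is "(Untagged OR V1) AND (Untagged OR TB)", where each "Untagged" means untagged within that group: carrying no V tag in the first term, no T tag in the second.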
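One way to make the per-group logic concrete, as a Python sketch. It assumes tags are plain strings, each mutual group (V and T here) is a known set, and "Untagged" is evaluated per group; `GROUPS` and `visible` are hypothetical names, not a FrameMaker feature:

```python
GROUPS = {
    "V": {"V1", "V2", "V3"},   # user1's versions
    "T": {"TA", "TB", "TC"},   # user2's tags
}


def visible(text_tags: set[str], selection: dict[str, str]) -> bool:
    """True if text carrying `text_tags` appears in the build `selection`.

    `selection` maps each group to the tag being produced, e.g.
    {"V": "V1", "T": "TB"} for V1.TB. Each group contributes
    (untagged-in-group OR selected tag); groups are combined with AND.
    """
    for group, members in GROUPS.items():
        tags_in_group = text_tags & members
        if tags_in_group and selection[group] not in tags_in_group:
            return False
    return True


build = {"V": "V1", "T": "TB"}
print(visible(set(), build))         # True: untagged in every group
print(visible({"V1", "V2"}, build))  # True: V1 is selected
print(visible({"V1", "TA"}, build))  # False: the T group wants TB
print(visible({"TB"}, build))        # True: untagged in the V group
```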

Where people get into trouble is when they try to implement this via a UI (User Interface) which is supposedly simple, and indicates whether a tag is wanted or not via V1wanted=0 or V1wanted=1. Part of the trouble is that, at the very least, you need 3 states for a wanted predicate: definitely wanted, definitely not wanted, and "I don't care".

As in: "I want all text related to V1."

"I don't want any text related to V2 (unless it is also tagged as related to V1)."

And: "I don't care whether the text tagged V2 is included or not."

In all cases the text is conceptually passed on to another layer, with the tags intact; and only at the end are all conditions stripped out.

E.g. V[123].T[ABC] => V[12].T[BC] => V[1].T[B].
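A sketch of such a pipeline in Python, assuming each stage gives a three-state answer per tag, narrows the remaining alternatives in each group, and passes surviving text on with its tags intact. The names (`Want`, `run_stage`, `finish`) are hypothetical. Tags a stage does not mention default to DONT_CARE rather than to "not wanted", which is exactly what a two-state checkbox UI gets wrong:

```python
from enum import Enum


class Want(Enum):
    WANTED = "wanted"          # definitely include text with this tag
    UNWANTED = "unwanted"      # exclude, unless another tag keeps the text
    DONT_CARE = "don't care"   # leave the decision to a later stage


def run_stage(doc, alternatives, wants):
    """Filter `doc` for one stage, passing tags through unchanged.

    doc: list of (text, tags) pairs.
    alternatives: dict of group name -> set of tags still possible.
    wants: dict of tag -> Want; unlisted tags default to DONT_CARE.
    """
    narrowed = {
        group: {t for t in tags if wants.get(t, Want.DONT_CARE) is not Want.UNWANTED}
        for group, tags in alternatives.items()
    }
    kept = []
    for text, tags in doc:
        keep = all(
            not (tags & alternatives[g]) or bool(tags & narrowed[g])
            for g in alternatives
        )
        if keep:
            kept.append((text, tags))  # tags intact for the next layer
    return kept, narrowed


def finish(doc, alternatives):
    """Last stage: every group must be down to one choice; strip the tags."""
    if any(len(tags) != 1 for tags in alternatives.values()):
        raise ValueError("some group still has more than one alternative")
    return [text for text, _ in doc]


DOC = [
    ("intro", set()),
    ("v1 only", {"V1"}),
    ("v3 only", {"V3"}),
    ("v2 or v3", {"V2", "V3"}),
    ("v1 with TB", {"V1", "TB"}),
    ("TA only", {"TA"}),
]
alts = {"V": {"V1", "V2", "V3"}, "T": {"TA", "TB", "TC"}}

# V[123].T[ABC] => V[12].T[BC]
doc1, alts1 = run_stage(DOC, alts, {"V3": Want.UNWANTED, "TA": Want.UNWANTED})
# V[12].T[BC] => V[1].T[B]
doc2, alts2 = run_stage(doc1, alts1, {"V1": Want.WANTED, "V2": Want.UNWANTED,
                                      "TC": Want.UNWANTED})
print(finish(doc2, alts2))  # ['intro', 'v1 only', 'v1 with TB']
```

In this sketch WANTED and DONT_CARE filter identically; the difference is one of intent, recording a decision versus deferring it to a later stage.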

Worse, the ultimate user may not know of options or tags added by a user earlier in this pipeline. From the ultimate user's point of view, he may know that he does not want any TC stuff, but he does not know if user2 has added a TD alternative. So the downstream user cannot say TA|TB|TD - he just says "Not TC". Which is only meaningful in the context of a mutual group T[ABCD], where "not TC" really means "Untagged OR TA OR TB OR TD", or "Untagged OR any tag in the T* group except TC".
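A sketch of evaluating "not TC" late, against whatever the group actually contains at build time, so that a TD added upstream is picked up without the downstream user having to know about it. The group contents and function names are hypothetical:

```python
def expand_not(excluded: str, group: set[str]) -> set[str]:
    """Tags that "not <excluded>" permits, given the group as it is now."""
    return group - {excluded}


def allowed(text_tags: set[str], excluded: str, group: set[str]) -> bool:
    """Untagged-in-group text always passes; otherwise one permitted tag suffices."""
    in_group = text_tags & group
    return not in_group or bool(in_group & expand_not(excluded, group))


# user2's group, including a TD the downstream user never heard of.
T_GROUP = {"TA", "TB", "TC", "TD"}

print(sorted(expand_not("TC", T_GROUP)))     # ['TA', 'TB', 'TD']
print(allowed({"TD"}, "TC", T_GROUP))        # True
print(allowed({"TC"}, "TC", T_GROUP))        # False
print(allowed({"TB", "TC"}, "TC", T_GROUP))  # True: also tagged TB
```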

E.g. a microprocessor family that evolved through 8, 16, 32, and 64 bit versions, like x86 (not to mention 20 and 24 bit stages).

Let's say a user wants just the 8 and 16 bit versions. If he says "not 32", then he gets 64. Would that not be strange - having documentation for the 8, 16, and 64 bit versions of a CPU, but not the 32? (Although I can imagine...)

So saying "not 32" normally also implies "not 64". Or, perhaps "not 32" really means "wordsize < 32" or "wordsize NOT >= 32".

But user1 may introduce a different tag that is not part of this evolutionary sequence - so not-32 should not imply not-virtual-machine.
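For an ordered group, a sketch of reading "not 32" as "wordsize NOT >= 32". It assumes a hypothetical mapping from tags to word sizes (including the 20 and 24 bit steps), and that tags outside the group, such as a hypothetical "VM" tag, are simply not consulted:

```python
# Hypothetical tags for an ordered word-size group, mapped to their sizes.
WORDSIZE = {"W8": 8, "W16": 16, "W20": 20, "W24": 24, "W32": 32, "W64": 64}


def below(limit: int, text_tags: set[str]) -> bool:
    """Read "not <limit>" as "wordsize NOT >= limit" within WORDSIZE only.

    Text with no word-size tag passes; tags outside the group
    (e.g. a hypothetical "VM") are ignored rather than excluded.
    """
    sizes = [WORDSIZE[t] for t in text_tags if t in WORDSIZE]
    return not sizes or any(s < limit for s in sizes)


print(below(32, {"W16"}))        # True
print(below(32, {"W64"}))        # False: "not 32" also drops 64
print(below(32, {"VM"}))         # True: not part of the sequence
print(below(32, {"VM", "W64"}))  # False
```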

The problem is that a condition that works at one time may not work going forward, as more tags are added.

Aspects, Conditional Text, and WYSIWYG

I like "aspect oriented programming".

In my undergrad, well before I had heard of AOSP, I was writing tools so that I could manage all of the aspects of an instruction definition in a single place - and then distribute them to multiple places.

E.g.

instruction name: OR

encoding: 00000.rd5.rs5.imm16

action: GPR[rd] := GPR[rs] + imm16

traps: none

latency: 1

instruction name: FADD

encoding: 00000.fd5.fsrcA5.fsrcB5.imm16

action: FPR[fd] := FPR[fsrcA] + FPR[fsrcB]

traps: FP-traps

latency: 4

I would then split up this stuff to generate many different files used in the CPU toolchain: simulator decoder, execution, disassembler, compiler timing tables, etc. Combined with other specs, e.g. the pipeline spec (latency might be defined there instead), etc.
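A minimal Python sketch of the single-source idea, assuming the instruction records live in a plain list of dicts (a hypothetical format, not the original tools) and each generator emits one downstream view:

```python
# Hypothetical single-source spec: every aspect of an instruction in one record.
SPEC = [
    {"name": "OR", "encoding": "00000.rd5.rs5.imm16", "traps": "none", "latency": 1},
    {"name": "FADD", "encoding": "00000.fd5.fsrcA5.fsrcB5.imm16",
     "traps": "FP-traps", "latency": 4},
]


def timing_table(spec) -> str:
    """Compiler timing table: one 'NAME LATENCY' line per instruction."""
    return "\n".join(f"{i['name']} {i['latency']}" for i in spec)


def trap_list(spec) -> dict[str, list[str]]:
    """Simulator view: instructions grouped by the traps they can raise."""
    groups: dict[str, list[str]] = {}
    for i in spec:
        groups.setdefault(i["traps"], []).append(i["name"])
    return groups


def manual_entry(instr) -> str:
    """Manual view: 'aspect: value' lines for one instruction."""
    return "\n".join(f"{k}: {v}" for k, v in instr.items())


print(timing_table(SPEC))
print(trap_list(SPEC))
print(manual_entry(SPEC[1]))
```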

(I also used to generate the opcodes to minimize logic, but that's a different story.)

But this involves looking at the aspects sorted according to only one criterion, in this case by instruction. Sometimes you want to sort by other criteria: the sort key is itself an aspect.

E.g. sorted by property

instruction name: OR

instruction name: FADD

traps: OR: none

traps: FADD: FP-traps

encoding: OR: 00000.rd5.rs5.imm16

encoding: FADD: 00000.fd5.fsrcA5.fsrcB5.imm16

action: OR: GPR[rd] := GPR[rs] + imm16

action: FADD: FPR[fd] := FPR[fsrcA] + FPR[fsrcB]

latency: OR: 1

latency: FADD: 4

I would like to be able to edit in any rearrangement of the aspects, and still have it generate.
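A sketch of one rearrangement, assuming the same kind of hypothetical list-of-dicts spec: pivot it into the property-sorted view, edit there, and regenerate the instruction-sorted view. This only works because the pivot is lossless for these records; views that merge or summarize may not be invertible:

```python
from collections import defaultdict

# Hypothetical instruction-major records, as in the first listing.
SPEC = [
    {"name": "OR", "traps": "none", "latency": 1},
    {"name": "FADD", "traps": "FP-traps", "latency": 4},
]


def by_property(spec):
    """Property-major view: {property: {instruction: value}}."""
    view = defaultdict(dict)
    for instr in spec:
        for prop, value in instr.items():
            if prop != "name":
                view[prop][instr["name"]] = value
    return dict(view)


def by_instruction(view):
    """Inverse of by_property: rebuild instruction-major records."""
    records = defaultdict(dict)
    for prop, per_instr in view.items():
        for name, value in per_instr.items():
            records[name][prop] = value
    return [{"name": name, **props} for name, props in records.items()]


view = by_property(SPEC)
view["latency"]["FADD"] = 5   # edit in the property-sorted view...
print(by_instruction(view))   # ...and regenerate the per-instruction view
```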

Database views: some editable, some read only.

Later, I realized that, even though I might not be able to edit and regenerate from some views, I can often verify the consistency of a non-generatable view.
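A sketch of verifying, rather than generating, a view: a hypothetical hand-maintained latency table is parsed and compared against the spec, and any disagreement is reported. The table format and names are assumptions:

```python
# Hypothetical authoritative spec and a hand-maintained table from a manual.
SPEC_LATENCY = {"OR": 1, "FADD": 4}

HANDWRITTEN = """\
Instruction | Latency
OR          | 1
FADD        | 3
"""


def check_latency_table(text: str, spec: dict[str, int]) -> list[str]:
    """Compare a handwritten 'name | latency' table against the spec."""
    problems = []
    seen = set()
    for line in text.strip().splitlines()[1:]:  # skip the header row
        name, latency = (cell.strip() for cell in line.split("|"))
        seen.add(name)
        if name not in spec:
            problems.append(f"{name}: not in spec")
        elif int(latency) != spec[name]:
            problems.append(f"{name}: manual says {latency}, spec says {spec[name]}")
    for name in sorted(spec.keys() - seen):
        problems.append(f"{name}: missing from manual")
    return problems


print(check_latency_table(HANDWRITTEN, SPEC_LATENCY))
# ['FADD: manual says 3, spec says 4']
```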

---

In techwriting, conditional text is the most primitive support for AOSP - AOTW? But conditional text can be hard to deal with. I would not be surprised if there was a polemic CONDITIONAL TEXT CONSIDERED HARMFUL. Heck, in programming, I wrote a diatribe IFDEF CONSIDERED HARMFUL, very much the same. And a friend has said that he would like to have a simulator with no IF statements... (not exactly that).

Ah, here are some TW equivalents of Dijkstra (why is it that techwriters do not write as well as Dijkstra the computer scientist?):

woes-conditional-text

Similarly wrt transclusion. From woes-conditional-text:

When you have just a few conditions or transclusions with your content, there’s no problem. But when you suddenly realize that editing the topic is requiring an immense amount of concentration and careful analysis because you’ve got too many conditions or transclusions to sort out in your mind, you have to consider whether simply copying and pasting is more efficient.

This is very much like the Fragile Base Class Problem in programming - and many object oriented gurus recommend "reuse interface, not implementation".

I have made a minor step forward in this area by realizing that one of the good things from programming - writing tests that should apply to all implementations of an interface - also applies to documentation. Many aspects (coincidence?) of documentation can be automatically tested. Such automated tests for documentation help maintain invariants and standards both when there is lots of conditional text and transclusion, and when there has been lots of replication via cut and paste.
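A sketch of a documentation test that applies to every variant, assuming conditional text is modelled as hypothetical (text, tags) fragments with per-group selection. The invariant checked, exactly one "Encoding:" line per rendered variant, is just an example; a mis-tagged fragment shows up as failures in several variants at once:

```python
from itertools import product

GROUPS = {"V": ["V1", "V2", "V3"], "T": ["TA", "TB", "TC"]}


def render(doc, selection):
    """Text of the fragments visible for one selection, e.g. {"V": "V1", "T": "TA"}."""
    return [
        text for text, tags in doc
        if all(not (tags & set(members)) or selection[g] in tags
               for g, members in GROUPS.items())
    ]


def encoding_failures(doc):
    """Same invariant for every V x T variant: exactly one 'Encoding:' line."""
    failures = []
    for v, t in product(GROUPS["V"], GROUPS["T"]):
        lines = render(doc, {"V": v, "T": t})
        count = sum(line.startswith("Encoding:") for line in lines)
        if count != 1:
            failures.append((v, t, count))
    return failures


GOOD = [
    ("Title", set()),
    ("Encoding: 32-bit", {"V1", "V2"}),
    ("Encoding: 64-bit", {"V3"}),
    ("Example for TA", {"TA"}),
]
# Same document with the 64-bit fragment mis-tagged V2 instead of V3.
BAD = [GOOD[0], GOOD[1], ("Encoding: 64-bit", {"V2"}), GOOD[3]]

print(encoding_failures(GOOD))  # []
print(encoding_failures(BAD))   # V2 variants have 2, V3 variants have 0
```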

--

But what I want to talk about is making conditional text (and transclusion) easier to manage.

Sunday, November 23, 2014

Now trying: Jawbone UP - who needs the stinking cloud anyway?

Elsewhere I have discussed how I was disappointed by the FitBit Charge as a replacement for my Basis B1 watch activity tracker. So I returned the FitBit Charge, and am now trying the Jawbone UP.

I plan to use the Jawbone UP not as a replacement for my Basis B1 watch, possibly not even as an activity tracker at all - but as a vibrating alarm gadget on my wrist.

Why? I think that vibrating on the wrist, rather than ringing annoyingly on the handset, might be the killer app for watches.

I was reasonably happy with my Basis Watch as an activity tracker. Certainly, it has upped my fitness level. But I was frustrated by the following Basis shortcomings: (1) no sharing of data with other users, no friendly social scene; (2) no counting of vertical.

I bought a FitBit One to try to remedy these Basis shortcomings. I have certainly liked the vertical distance metric: Portland ain't flat! I hoped to like the social aspects, LoseIt.com or MyFitnessPal.com - but that has not worked out so well, yet.

I greatly disliked the FitBit One reporting only total steps, not tracking time of day. I disliked the FitBit One's wristband.

But mainly, I found that I *LIKED* the FitBit One's vibrating alarms: both wakeup, and occasionally during the day (time to head out, midmorning, lunch, midafternoon, time to head home, time for bed). Should I admit that I often get so swallowed up by work that I don't notice what time it is?

I had hoped that the FitBit Charge would fill the bill. Unfortunately, my Charge did not charge its batteries, or else discharged them far too quickly: it only worked for 15 minutes after charging for 24 hours. Normally I would do an exchange - but the Charge felt like such a piece of shit - its interface being a single button where I could not tell whether it had been pressed or not (as opposed to the nice tactile feedback of the button on the FitBit One or Garmin Vivofit) - that I did not feel like bothering.

So instead, I am trying a Jawbone. Not the more expensive Jawbone UP3 or UP24, but the original Jawbone, not wireless, which can be found fairly cheaply (especially if you are willing to take an unpopular color).

My plan, my theory: I can use the Jawbone just for its "vibration" features:

(1) vibrating wake from sleep (aka a smart alarm)

(2) vibrating idle alert

(3) vibrating reminders.

Unfortunately, as usual there are annoying limits:

-- only 4 each of alarms and reminders

++ Stupid short-sightedness. How about a vibe alarm every hour or two?

-- the reminders seem to always generate a notification on my phone. Damn phone centric mindset: I do not want to have reminders on both phone and watch. I want to avoid the slowness of my phone as much as possible.

I may not want to use the tracking features of the Jawbone at all, except implicitly for the idle alert.

After fitness, I think that notifications may be the killer app. But not a wrist notification for every email - selectivity!!!

Friday, October 24, 2014

10 things I hate about Git | Steve Bennett blogs

10 things I hate about Git | Steve Bennett blogs:

'via Blog this'

Sounds like the old Bazaar slogan:

Version control that doesn’t make your eyes bleed

http://blogs.operationaldynamics.com/andrew/software/version-control/git-is-like-cvs

Friday, October 03, 2014

The Perfected Self - David H. Freedman - The Atlantic

The Perfected Self - David H. Freedman - The Atlantic:

I read Walden Two in high school, and could never understand how Skinner's work could be perceived so badly.

Chomsky has a lot to answer for.

Activity Details | Basis

I like my Basis watch.

I give credit to my Basis watch for giving me continued motivation for increasing exercise.

But...

---

I chose the Basis watch when I originally went shopping, a year or so ago, for a QS device.

There were reasons that I chose the Basis watch over other devices:

I particularly like that the Basis watch is a WATCH, not just a wristband or a clip-on pedometer. It is actually useful for telling time - and I think that, since I started wearing it, not only do I exercise more, but I am probably more punctual, and better at time management overall because I am more easily aware of the time.

At the time I was looking, the Basis watch was the only well-known device that reported heart rate. (Although I have since become familiar with how often the Basis has aliasing problems: e.g. I will be working out on the treadmill, my pulse will be rising - 100 bpm, 120 - but then it will drop down to 70 all of a sudden, when it should probably be reading 140 bpm.) (Hmm, I just realized that this "aliasing" may be happening when my steps per minute and my beats per minute are almost the same. I wonder if that is related.)

Also, the Basis watch, at that time, was unusual in that it measured skin temperature and Galvanic skin response. I have a long interest in trying to measure mood or mental state... but so far have not seen a useful correlation.

---

But...

While the Basis device, the watch itself, is great,

the Basis website and software suck.

Basis has limited graphs and analytics on their website.

Basis does not make it easy to get your own data
(but see http://www.quantifiedbob.com/2013/04/liberating-your-data-from-the-basis-b1-band/)

It seems to be well known that one of the most important features of fitness or activity monitors is sharing information with a support network - family, friends, people with similar health and fitness goals.

But as far as I know, there is absolutely no "official" Basis ability to do this.

Not only not on the Basis website,

but neither does Basis interface to any of the many apps and websites such as Lose It

that aggregate data from multiple devices.

I know that Basis is owned by Intel,

and that Intel has ambitions in eHealth.

Perhaps Intel doesn't want to allow you to share your own Basis data,

and would prefer to hold it hostage until

they provide their own Intel Health solution?

---

Anyway, I am just writing this as I nerve myself up to get a non-Basis device

that talks better to the world.

I may keep my Basis watch - after all, it is a good watch and a good pedometer.

But it falls short on the social aspects of fitness and activity tracking.

Basis:

Good device.

Probably good software inside the device, as evidenced by upgrades to distinguish running and walking.

Bad software outside the device,

in the Android app

and in the webpage.

Thursday, October 02, 2014

IntelMemoryProtectionExtensions - address-sanitizer - Discussion of Intel Memory Protection Extensions (MPX) and comparison with AddressSanitizer - AddressSanitizer: a fast memory error detector - Google Project Hosting

IntelMemoryProtectionExtensions - address-sanitizer - Discussion of Intel Memory Protection Extensions (MPX) and comparison with AddressSanitizer - AddressSanitizer: a fast memory error detector - Google Project Hosting:

'via Blog this'

Intel MPX (Memory Protection eXtensions) has false positives with atomic pointers.

Sigh: this is the big part that Intel gave up wrt my work: I had carefully figured out how to manipulate the bounds metadata atomically.

It should only be in hardware if atomicity is properly handled.

Friday, September 19, 2014

Two Email Accounts => Can Only Keep Up With One

If I have more than one email account - e.g. work (Outlook) and gmail (personal) I can only ever keep up with one of them. I can only regularly drive one of them to zero, in the GTD manner.

Any of the various "unified mailboxes" {c,w,sh}ould help

- except that my employer doesn't want me to have corporate email outside their control, and I don't want personal email on my employer's email system.

And even if that were not the issue, using separate email systems avoids embarrassing errors, like cross forwarding.

It would be nice if unified email systems had some sort of "Are you really sure that you mean to forward a TOP SECRET company email to a public mailing list?" filter/query applied to outgoing email. Actually, for that matter, applying to saving email to the filesystem "Are you sure that you want to save your tax refund message from the IRS on a company share drive?"

Keeping separate email systems reduces the chance of such an error. But it comes at the cost of having two email systems. And it seems that I can never keep up with two at the same time. I am either keeping up with work, or with personal.

One of the costs of keeping two email systems is - what do they call it, cognitive overload? Gmail does some things one way, Outlook a different way. Or not at all. Or vice versa. Two different tagging and folder systems. Two... Using both regularly educates me as to their relative strengths and weaknesses - e.g. I recently accessed Gmail through IMAP on my personal copy of Outlook, because Outlook is much better at handling large amounts of accumulated email than Gmail is - but I really don't want to become an expert in comparative email systems.

Another "advantage" of two email systems is scheduling. E.g, I can try to restrict myself to never look at personal email during work hours, and vice versa. But... well, how many of us can get away with not looking at work email on the weekend? Or overnight? During a project crunch?

EMACS Gnus solved this years ago with mail reading topic modes. You might have one order for reading messages, both in inboxes and folders, at work, one after work, one for "quick check", one for handling your mailing lists... Two email systems is just a hack, a kluge, a poor approximation.

Plus, not reading personal email at work means that my wife and others have to use text messaging as the "high priority" email equivalent. Which means that there is a third messaging system.

So perhaps I am wrong when I say "If I have 2 email systems I can only stay caught up with one of them."

Perhaps it is that if I have 3 messaging systems - Outlook, Gmail, and SMS Text Messaging - I can only stay caught up with 2 of them. Or maybe it is N, N-1.

Wednesday, September 17, 2014

Agile Documentation and Manuals - Testing

Should The User Manual Be Agile:

'via Blog this'

I have spent far too much time writing and maintaining manuals (for computer architecture) in the last year.

Since I have drunk the koolaid of Agile Development in software (test driven design, refactoring, pair programming) and have tried to evangelize Agile Methodologies for hardware development and computer architecture (with limited success - e.g. Intel has adopted something like scrum, although I hesitate to call it agile), I naturally think about how to apply Agile principles to writing and maintaining technical documentation, computer architecture manuals, etc.

Of course, Ward's wiki has some pages on this: Should The User Manual Be Agile. WriteTheUserManualFirstIsWaterfall, ManualAsSpecification, WriteTheUserManualAsYouGo... Of course, Ward's wiki is very stale. I recall discussions on agile mailing lists such as the [XP] mailing list.

But quick googling does not find much in the detail that I am thinking about.

A big frustration in my current work is that I had more automation in the documentation and manuals in my undergraduate RAMM/RISC/SEISM computer project in 1985 than I do now. I generated much of the manuals from high level descriptions of instruction encodings, and so on - in fact, I actually generated the instruction encodings.

However, over time I encountered much resistance to tools that generate correct documentation. So in some ways I have switched to emphasizing tools that can VERIFY that handwritten documentation is correct. Although I am still happy to generate whatever can be generated.

Of course, either of these approaches - generate or verify, automatically - requires the ability to automate. It is easy to automate text based markup - nroff, LaTeX, XML, SGML. Wiki markup. It is harder to automate when you are dealing with obsolete versions of binary formats like FrameMaker's binary .fm files. (MIF helps, but is limited.) Semantic markup helps a lot, as do word processing macros with meaningful names.

As many of the references point out, testing manuals and other documentation can be challenging. There is simple stuff - spell checking, automatically checking table of contents. Literate Programming techniques can be fruitful. I believe the Itanium manuals defined instruction semantics in terms of pseudocode that could be extracted and run in a simulator.
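As an example of the simple stuff, a sketch of an automatic table-of-contents check on a text-based manual. The markup (an indented list under "Contents", headings starting with "== ") is hypothetical; the point is that text formats make such checks a few lines of code:

```python
from itertools import zip_longest

# A tiny plain-text manual in a hypothetical markup: "== " marks a heading.
MANUAL = """\
Contents
  1 Overview
  2 Encodings
  3 Traps

== 1 Overview
...
== 2 Encoding
...
== 3 Traps
...
"""


def toc_entries(text: str) -> list[str]:
    """Indented lines directly after the 'Contents' line."""
    lines = text.splitlines()
    entries = []
    for line in lines[lines.index("Contents") + 1:]:
        if not line.startswith("  "):
            break
        entries.append(line.strip())
    return entries


def headings(text: str) -> list[str]:
    """Body headings, without the '== ' marker."""
    return [line[3:].strip() for line in text.splitlines() if line.startswith("== ")]


def check_toc(text: str) -> list[str]:
    """Report every position where the ToC and the body headings disagree."""
    return [
        f"ToC has {toc!r}, body has {body!r}"
        for toc, body in zip_longest(toc_entries(text), headings(text))
        if toc != body
    ]


print(check_toc(MANUAL))  # ["ToC has '2 Encodings', body has '2 Encoding'"]
```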

But there are other steps.

Thursday, September 11, 2014

More hassle switching sound devices ...

More hassle switching sound devices this morning:

In a Fuse meeting. Started up using the internal speakers and microphone on my tablet PC - NOT the headset mike that I prefer to use.

Manually changing default audio device via Control panel did not switch over - apparently Fuse reads the default devices when it starts, and uses those for the rest of the session.

In the past I have been able to find a clickable way in Fuse to change the active audio device - but could not find it today.

Exiting and reentering Fuse would probably have worked - but the meeting was running, and I did not want the interruption.

---

Later, after undocking and returning, I received a Google Voice call. Made the mistake of picking it up on my PC using Google Hangouts. Same problem.

The default sound devices had changed back to the internal speakers when undocking disconnected the headset mike.

---

There are many apps to change or switch sound devices, e.g. audioswitch - Switch between default audio input or output + change volume - Google Project Hosting:( 'via Blog this')

All that I have tried change the DEFAULT audio device.

What I want is to change the ACTIVE audio device. Possibly on a per-app basis, although personally I have not seen the need.

I also want the DEFAULT sound device to change according to what sound devices are plugged in. Rather like the way Windows remembers that if I have a particular set of monitors plugged in, I want one arrangement of resolution and orientation. I.e. software configuration dependent on hardware configuration.

I.e. when undocked with no headset, use the internal sound devices.

When docked, I get a headset, plus external speakers. In my case, make the headset the default.
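A sketch of the rule described above, assuming the OS can report the set of currently present device names (the names here are hypothetical). It only picks a device; actually switching the active device of a running app like Fuse is OS- and app-specific and not shown:

```python
from typing import Optional

# Hypothetical device names and priorities; a real tool would query the OS.
RULES = [
    # (devices that must all be present, device to make the default)
    ({"USB Headset"}, "USB Headset"),              # docked: prefer the headset
    ({"Internal Speakers"}, "Internal Speakers"),  # undocked fallback
]


def pick_default(present: set[str]) -> Optional[str]:
    """First rule whose required devices are all plugged in wins."""
    for required, device in RULES:
        if required <= present:
            return device
    return None


print(pick_default({"Internal Speakers", "USB Headset", "Dock Speakers"}))  # USB Headset
print(pick_default({"Internal Speakers"}))  # Internal Speakers
```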

I also wish that my sound devices, like a headset, had a button that did something like "signal OS to make this the default or active sound device". And mute and volume control...

Tuesday, September 09, 2014

Lumo Lift arrives - no app - annoying proprietary charging interface

A while back I bought a LumoBack - a QS device that detects posture, worn on a lightweight velcro strap around the waist. Overall I liked it - especially when I realized that it counted steps pretty well, better in many ways than my Basis watch.

Unfortunately, I lost mine - I found it annoying to have the lightweight strap around my waist, when wearing an actual belt. I determined that I could take the LumoBack off the strap, and put it on my actual belt. Unfortunately, after a few days of doing this, with occasional buzzing reminding me to maintain good posture, I realized that the LumoBack had fallen off my belt. I conjecture that the plastic loops were stressed by the real belt, in ways they had not been with the original strap.

I would have bought a replacement, except the LumoBack was no longer for sale. It was replaced by the LumoLift, worn at the collar.

Received my LumoLift yesterday. First impressions:

---+ Overall Good

Nice small device.

Magnets hold it on through clothes.

Worn just under collar bone.

Advised not to wear on loose shirts. Has worked okay on all the shirts I have tried so far.

Magnets hold tightly enough that twice I have stripped off a shirt and tossed it into the laundry hamper, forgetting to remove the LumoLift - and had to go dig in the hamper when I realized I was no longer feeling the occasional buzz when slouching.

User interface on actual device good: double click to "snapshot", i.e. calibrate, triple buzz acknowledge; press and hold to toggle enable/disable buzzing when slouching, single buzz to acknowledge on, double buzz off.

In this respect better than LumoBack, where I frequently had to recalibrate using the Android app. Actually, the LumoBack had similar UI on the device, but I seemed to use the Android app most. It is a good thing that the LumoLift does not depend on the app as much as LumoBack did - since there is no LumoLift app yet.

The LumoLift seems less sensitive, out of the box, than the LumoBack was. Fewer false alarms; necessary because there is no Android app to control sensitivity. But it also accepts slouched postures - I have to stand almost upside down to make it start buzzing. Perhaps when the app is available...

---+ No App Yet :-(

No Android app.

iOS app possibly available (unclear, I have no incentive to investigate).

Windows desktop app available "real soon now".

No Android app supposedly because Google Android's BLE (BlueTooth Low Energy) support is immature and unreliable. This may be consistent with the flakiness of the LumoBack, which required frequent resets.

---+ No Pedometer ... ?

At least not until the app is available.

I liked the LumoBack enough that I would recommend getting it instead of an activity watch.

But the LumoLift cannot play there.

---+ Non-standard USB Charging cradle

I am annoyed that the LumoLift has a custom charging station - USB, sure. But this means that I cannot simply plug the LumoLift into whatever USB charger I have close at hand with a standard micro-USB cable - at my desk at work, at home, in the car. Instead, I can only charge it in one place.

Carrying around the charging station is suboptimal, since I have several such devices, e.g. my Basis watch.

Indeed, one of the reasons that I liked the LumoBack so much was that it used a generic microUSB charging connector. On several occasions I was able to charge it up, when my Basis watch was out of charge. Now my LumoLift and Basis watch are equally non-functional.

Buying extra charging stations - perhaps. After all, I have a total of 4 chargers for my laptop, albeit one being a universal (home, work, cottage, backpack). I used to buy multiple cell phone chargers, back when cell phones had proprietary chargers. But now that most cell phones have standard USB chargers, I just buy multiple USB chargers.

Perhaps it is good "gouge the consumer" marketing to require extra charging stations be bought by those of us who want ubiquity. Although, at the moment, you cannot purchase extra charging stations from Lumo. (I think Basis does sell extra charging stations, for a "gouge the consumer" price).

Although "do not ascribe to malice too quickly that which can be explained by stupidity". Or lack of standards.

Perhaps the designers of LumoLift and the Basis Watch realized that USB micro connectors are unreliable. Certainly an impediment for a QS device that you might want to wear into the shower after a workout. (Easy to do so accidentally - I haven't yet forgotten to take off my Basis watch, but already, in less than one day of wearing my LumoLift, I have changed shirts twice without taking it off the old shirt. Which I think means that the LumoLift is unobtrusive.)

Micro-USB is unreliable. The Basis watch has 4 metal pads, the LumoLift two. Probably these connectors are more reliable. But they need pressure to make contact: on the Basis watch provided by a wraparound plastic cradle, on the LumoLift provided by magnets.

I do not know whether the LumoLift and Basis watch power contacts follow any standard; I did not find one in a quick google.

I do rather wish that the LumoLift and Basis Watch had chosen to use the same sort of TRS connector (headphone jack, TRRS, Tip/Ring*/Sleeve, mini 3.5mm Apple, submini 2.5mm). Not standard USB, but reasonably common. Good enough that iPods can be made waterproof for swimmers.

I can understand that such a "shaft" may be a pain to design around. And that it is annoyingly thick. Still, I wish that the LumoLift and Basis watch used connectors that were reasonably widely available.

Monday, September 08, 2014

Neat website with microbenchmarks

Measuring Stuff « Blog:

'via Blog this'

Thursday, September 04, 2014

These course notes are broken

'via Blog this'

Looks like course notes for a computer architecture parallel programming course.

Stupid quote: "LL, SC ... Unlike the RMW instructions, there is no need to lock the bus, yet it implements an atomic operation".

Apparently the author does not know that most advanced microprocessors have not "locked the bus" to implement atomic RMW instructions for decades.

The author also does not know that a smart implementation of an atomic RMW is guaranteed to make forward progress - at least, the RMW instruction itself will complete - whereas LL/SC implementations are plagued by forward progress problems.
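To make the forward-progress point concrete, here is a toy Python model. It is not any real ISA. It assumes a hypothetical implementation in which a load-linked grabs the cache line exclusively, which breaks any other core's reservation. Under an adversarial lockstep schedule, two cores retrying LL/SC never complete a single store-conditional, while the single-step atomic RMW always completes.

```python
"""Toy model (hypothetical, not any real ISA): LL/SC livelock vs. atomic RMW."""


class Line:
    def __init__(self) -> None:
        self.value = 0
        self.reservation = None  # id of the core currently holding the reservation

    def load_linked(self, core: int) -> int:
        # Assumption: LL takes the line exclusively, breaking other reservations.
        self.reservation = core
        return self.value

    def store_conditional(self, core: int, value: int) -> bool:
        if self.reservation != core:
            return False  # reservation was lost: the SC fails
        self.value = value
        self.reservation = None
        return True

    def fetch_add(self, delta: int) -> int:
        # An atomic RMW performed as one indivisible step always completes.
        old = self.value
        self.value = old + delta
        return old


def lockstep_llsc(line: Line, rounds: int) -> int:
    """Two cores retry LL/SC increments in a pathological lockstep schedule.

    Returns how many store-conditionals succeeded.
    """
    successes = 0
    for _ in range(rounds):
        a = line.load_linked(0)  # core 0 reserves the line
        b = line.load_linked(1)  # core 1 steals it; core 0 loses its reservation
        successes += line.store_conditional(0, a + 1)  # fails
        a = line.load_linked(0)  # core 0 retries and steals the line back
        successes += line.store_conditional(1, b + 1)  # fails
    return successes


def rmw_increments(line: Line, rounds: int) -> int:
    """The same two cores using fetch_add; every operation completes."""
    for _ in range(rounds):
        line.fetch_add(1)  # core 0
        line.fetch_add(1)  # core 1
    return line.value
```

Real LL/SC implementations can add mechanisms such as briefly holding the line after an LL, or randomized backoff, precisely to avoid schedules like this. The RMW gets forward progress without such tricks.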

Instruction like this is one of the big reasons parallel programming advances so slowly.

Wednesday, August 27, 2014

Version control clean: unknown / ignored / skipped over

The linked reference is not exactly what I am talking about in general, just an example.

General - hg clean is pure evil:


Tools such as "hg purge" (or "git clean") have options such as

"remove all unknown files" (hg purge)

"remove unknown and ignored files" (hg purge --all)

Methinks there is a third option needed - not files that you have ignored because they are generated, but files that are skipped over, e.g. because they are controlled by a different version control system.

E.g. I exclude a directory with -X because I often have a different VCS in that subdirectory, .bzr rather than .hg.

It is dangerous to type in "hg purge --all" in such a situation. E.g. it may delete the .hg/.bzr/.git subdirectories.

This is an example of "splitting": what is a single simple action, "exclude", from the point of view of "hg add" actually comes in two flavors from the point of view of "hg purge".
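Here is a minimal Python sketch of the three-way split. It assumes a hypothetical classify_untracked helper that walks a working tree. Files the top-level VCS does not track are classified as "unknown" or "ignored"; here fnmatch patterns stand in for .hgignore. Any subdirectory that contains its own .hg, .git, or .bzr is reported as "skipped" and is never descended into. This is not how hg purge is implemented. It only shows the extra category a safer purge would need.

```python
import fnmatch
import os

# Assumption: any of these directories marks a nested, separately versioned tree.
VCS_DIRS = {".hg", ".git", ".bzr"}


def classify_untracked(root, tracked, ignore_patterns):
    """Split files not tracked by the top-level VCS into three buckets.

    tracked: set of paths (relative to root) the top-level VCS knows about.
    ignore_patterns: fnmatch-style patterns standing in for an ignore file.
    Returns {"unknown": [...], "ignored": [...], "skipped": [...]}.
    """
    result = {"unknown": [], "ignored": [], "skipped": []}
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        if rel_dir != "." and VCS_DIRS.intersection(dirnames):
            # A nested repository belongs to someone else: skip the whole subtree.
            result["skipped"].append(rel_dir)
            dirnames[:] = []
            continue
        # Never descend into the top-level repository's own metadata directory.
        dirnames[:] = sorted(d for d in dirnames if d not in VCS_DIRS)
        for name in sorted(filenames):
            rel = os.path.normpath(os.path.join(rel_dir, name))
            if rel in tracked:
                continue
            if any(fnmatch.fnmatch(rel, pat) for pat in ignore_patterns):
                result["ignored"].append(rel)
            else:
                result["unknown"].append(rel)
    return result
```

A purge built on this would delete unknown files (and, with an --all style flag, ignored files too), but never anything under a skipped subtree.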

Thursday, August 21, 2014

Perforce Anti-Patterns --- labels

Perforce Anti-Patterns | Perforce:


While I agree with encouraging the use of things other than labels (changesets, automatic labels, and branches can all be equivalent to labels), I think that calling labels an anti-pattern is specious.

There are anti-patterns that are fundamentally bad.

But there are also anti-patterns that are bad just because of a poor implementation - and apparently Perforce's implementation of labels is poor.

Conceptually, a changeset is very much like a label - a set of files and versions, with comment metadata. (Again, using the "files at the bottom" mindset, which I don't like, but which I find easier to think about.)

Similarly, a branch is very much like a label - or, rather, the place on a parent where a branch starts.

These are similar conceptually, and probably should be interconvertible.

It is foolish to create a rather heavyweight concept like a branch when a lightweight one, a contour across the files' histories (i.e. a label), will do.

Perforce's slowness probably arises because it has a table that looks like

filename : version : label-name

which grows as #Files*#Labels.

Even worse if the metadata is stored RCS-like, replicated per file. I can't imagine Perforce being that silly... now... although I suspect they may have been historically.
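One way to see why a label can be cheap: store it as a changelist number plus an optional list of per-file overrides. Its size then grows with the number of labels and overrides rather than as #Files*#Labels, and the per-file view can be reconstructed on demand. Below is a small Python sketch over a hypothetical in-memory depot. It does not reproduce Perforce's actual schema.

```python
# Hypothetical depot history: changelist number -> {filename: revision it creates}.
CHANGES = {
    1: {"a.c": 1, "b.c": 1},
    2: {"a.c": 2},
    3: {"b.c": 2, "c.c": 1},
}


def files_at_change(changes, n):
    """The set of files+versions as of changelist n, i.e. an "automatic label"."""
    state = {}
    for number in sorted(changes):
        if number > n:
            break
        state.update(changes[number])
    return state


def resolve_label(changes, base, overrides=None):
    """A label stored as (base changelist, overrides) instead of one row per file.

    overrides covers labels that are not pure snapshots, e.g. one file pinned
    to a different revision than the base changelist implies.
    """
    state = files_at_change(changes, base)
    state.update(overrides or {})
    return state
```

Going the other way, from a per-file label to (base, overrides), you could pick whichever base changelist needs the fewest overrides. Files absent from the label would also need a deletion marker, which this sketch omits.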

Friday, August 15, 2014

Workflow is (almost) expressible as a Perforce branch mapping

Recently I have been working on a painful manual merge of some diverged, forked documents. Painful because FrameMaker has no automatic merge tools - I use Compare Documents, and then manually investigate diffs. (I remember doing this to the BSD 4.x kernel at Gould Real Time UNIX in the 1980s... gah!!!)

The key is developing a workflow: I assembled the files to work on in XXX.merge-needed.

I do my stuff, and then move the files to XXX.merge-pending.

In XXX.merge-pending, I wait to get approval, and then distribute stuff from there to the unfortunately still replicated and forked sources/destinations, call them XXX.fork1 and XXX.fork2. I also move my work item files to XXX.merge-done, once the distribution is done.

Editing of stuff in XXX.merge-needed is manual, as is moving to XXX.merge-pending.

But thereafter, it can be automated: when I have a batch of stuff in XXX.merge-pending I can do

p4 cp XXX.merge-pending XXX.fork1

p4 cp XXX.merge-pending XXX.fork2

p4 mv XXX.merge-pending XXX.merge-done

Yes: I could be tracking the workflow in Perforce - even though IT discourages that.

The copy-copy-move script is *almost* a Perforce branch-mapping:

XXX.merge-pending/... --cp--> XXX.fork1/...

XXX.merge-pending/... --cp--> XXX.fork2/...

XXX.merge-pending/... --mv--> XXX.merge-done/...

Which would be nice - I could save it as a branch mapping, and reuse.

Almost but not quite - as far as I can tell, Perforce branch mappings really amount to "p4 cp", copies, but do not have the ability to do moves.

Q: what is the difference between a script, copy-copy-move, and a branch mapping (extended to also do moves)?

The branch mapping is conceptually used to create a script of commands - but it also does various integrity checks. Plus it is transactional (at least I think that Perforce is transactional, ACID) - all succeed or none does.

Integrity checks, like ensuring files overwritten are not locked. It might be nice if it could be better - in this case ensuring that the destination files have not been changed since their common ancestor in the workflow loop. Which is basically a very simple merge or integration algorithm.

It is nice to have that sort of integrity wrapper around a script. It is a pain to code a script with all the necessary error handling.

I have long suggested that the folks who want transactional memory start by creating a transactional filesystem based scripting environment - for Perl, Python, whatever.
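Here is a minimal Python sketch of that integrity wrapper. It assumes local directories stand in for depot paths and uses a hypothetical run_mapping helper. The copy-copy-move mapping is first expanded into a per-file plan. The plan is then checked: the run is refused if a destination's checksum differs from the one recorded at the last sync. Finally the plan is applied all-or-nothing by snapshotting the tree and restoring it on any failure. A real transactional filesystem would do this far more cheaply than copying the whole tree.

```python
import hashlib
import os
import shutil
import tempfile


def digest(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def run_mapping(root, mapping, last_synced=None):
    """Apply [(op, src_dir, dst_dir), ...] under `root`, all or nothing.

    op is "cp" or "mv". last_synced maps destination paths (relative to root)
    to the sha256 they had at the last distribution; a destination whose
    current content differs was edited since, so the whole run is refused.
    """
    last_synced = last_synced or {}
    # 1. Expand the mapping into a per-file plan.
    plan = []
    for op, src_dir, dst_dir in mapping:
        for name in sorted(os.listdir(os.path.join(root, src_dir))):
            plan.append((op, os.path.join(src_dir, name), os.path.join(dst_dir, name)))
    # 2. Integrity checks before touching anything.
    for _, _, dst in plan:
        full = os.path.join(root, dst)
        if dst in last_synced and os.path.exists(full) and digest(full) != last_synced[dst]:
            raise RuntimeError(f"{dst} changed since last sync; refusing to overwrite")
    # 3. Apply, restoring a snapshot of the whole tree if any step fails.
    backup = tempfile.mkdtemp()
    snapshot = os.path.join(backup, "snapshot")
    shutil.copytree(root, snapshot)
    try:
        for op, src, dst in plan:
            s, d = os.path.join(root, src), os.path.join(root, dst)
            os.makedirs(os.path.dirname(d), exist_ok=True)
            if op == "cp":
                shutil.copy2(s, d)
            elif op == "mv":
                shutil.move(s, d)
            else:
                raise ValueError(f"unknown op {op!r}")
    except Exception:
        shutil.rmtree(root)
        shutil.copytree(snapshot, root)
        raise
    finally:
        shutil.rmtree(backup)
    return plan
```

The workflow above would be the mapping [("cp", "XXX.merge-pending", "XXX.fork1"), ("cp", "XXX.merge-pending", "XXX.fork2"), ("mv", "XXX.merge-pending", "XXX.merge-done")]. It could be saved once and reused, like a branch mapping that also knows how to move.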

A DVCS is, in some ways, a mechanism that allows the user to create quite complicated transactions while disconnected, operating in a clone - and then later pushing, committing, rebasing or grafting. Interactive, because when a merge/integration fails, the user has to get involved.

Thursday, August 14, 2014

Versioned Label Sets

I like labels in version control systems. Like "Compiles". "Passes Tests". "Passes all tests except Test#44". Status if you will.

Of course, such status must be applied to a set of files+versions. Or a repo-project-version. Whatever. (I will not use the whole-repo-project viewpoint here, since I am trying to think about partial checkouts. Whole-repo trivially fits - it is just a single entry, the repo version.)

You can think of a label as a file itself, containing a list of files and their version numbers. Such a label-file might also contain branch info, etc. - i.e. more metadata.

Generalize to an arbitrary package of metadata associated with a set of files+versions. "Labels" may be such that their name is the only metadata that matters.

Such a label-or-metadata-file can itself be versioned. Must be, should be.

In fact, just about everything we care about can be considered a set of objects+versions, in a file, itself versioned.

Branches may be defined by rules such as a list of filenames or patterns. Possibly with versions that are frozen in the branch.

OK, there is a difference: branch histories are graphs. Steps along the history are the sets of objects+versions that most closely correspond to a label set.

I.e. there are graphs whose nodes are objects+histories.

Anyway, the difference arises in the default action.

When a workspace is checked out from a branch head, when trying to check in the default is to extend the branch.

When a workspace is checked out from a label, the default is not to extend the label.

We can imagine interconversion: forcing a checkin to a label, making the label into a branch.
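Here is a small Python sketch of that difference. It assumes a hypothetical VersionSet type whose history is a list of {path: revision} snapshots, i.e. the label-file itself, versioned. Checking in from a branch extends it by default. Checking in from a label leaves the label alone unless forced. A label can also be turned into a branch.

```python
from dataclasses import dataclass, field


@dataclass
class VersionSet:
    """Hypothetical versioned set of {path: revision}; kind is "branch" or "label"."""
    name: str
    kind: str
    history: list = field(default_factory=list)  # each entry: one version of the set

    def head(self) -> dict:
        return self.history[-1] if self.history else {}


def check_in(parent: VersionSet, changes: dict, force: bool = False) -> dict:
    """Check in `changes` from a workspace checked out from `parent`.

    A branch is extended by default; a label is not, unless forced.
    Returns the resulting set of files+versions either way.
    """
    new_version = {**parent.head(), **changes}
    if parent.kind == "branch" or force:
        parent.history.append(new_version)
    return new_version


def label_to_branch(label: VersionSet, name: str) -> VersionSet:
    """Interconversion: start a branch whose first version is the label's contents."""
    return VersionSet(name, "branch", [dict(label.head())])
```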

---

Who stores the linkages?

Label-sets may be marked inside a branch-graph file, or outside.

Outside allows non-privileged access. Library users can label library versions.

Inside may be faster and more convenient.

It is important to be able to track the configuration of stuff that you are not allowed to write into its "home" VCS.

The DVCS people say "just clone", but that may not always be possible.

I may want to have a local repo, linked to a master repo without incorporating all, and be able to define cross repo actions.

Click through IP licensing

The p4ideax forums terms of use has some interesting details: http://p4ideax.com/terms-of-use

Starts off mild:

User Submissions and License By using P4IdeaX, you agree that any information you send to Perforce via P4IdeaX, including suggestions, ideas, materials, and comments, (collectively referred to as the "Materials") is non-confidential.

But then gets stronger:

Furthermore, by submitting the Materials using IdeaX, you grant Perforce and its designees an irrevocable, unrestricted, perpetual, non-exclusive, fully-paid up and royalty free worldwide license to make, use, sell, import, modify, reproduce, transmit, display, perform, create derivative works, combine with other works, and distribute such Materials for any purpose whatsoever to the extent permitted by law. This license to Perforce includes the right for Perforce to sublicense these rights to third parties.

Perforce may be working on a same or similar idea at the time of your submission. You understand that we may continue to develop our own idea independent of your submission without acknowledging your Materials.

As part of its license to your Materials, Perforce may make modifications to, derivative works of, or improvements to your Materials. These modified or improved versions shall be owned exclusively by Perforce.

Submission under a Patent or Patent Application You agree to disclose to Perforce if your Materials are protected by a patent or subject to a pending patent application. If your Materials are not yet patented, but you wish to patent your idea in the future, you also agree to disclose this information to Perforce.

Now, I think that recent updates to US patent law mean that there is no grace period here. If you post to a pretty-much-public website like p4ideax, then you have made a public disclosure and may not patent.

If your Materials are patented, subject to a pending patent application, or you intend to file for patent protection, these Terms of Use will automatically grant Perforce a license under the terms of the previous section entitled User Submission and License. Such license may be superseded only by a separate written license or assignment agreement between you and Perforce.

This is interesting. What if the materials are not yours to license? What if you are posting GPL'ed materials? I can imagine some lawyer arguing that because you did not specify GPL when you posted, then the GPL would not apply.

Posting your idea to P4IdeaX may impact your ability to protect your idea under patent laws. If your goal is to patent your idea, we suggest you consult with an attorney before posting your idea on IdeaX. You agree not to hold Perforce liable for any loss of patent protection.

This is the other side of ARM's "click-through licensing": to view ARM materials you have to promise not to use them to detect patent infringement.

---

As for p4ideax: haven't registered yet.

What about posting a link to a blog post on my own site? The link is licensed, but is the content I linked to licensed? (I doubt it.)

---

I guess my interest is left over from working at IV.

Perforce Software p4ideax | Intelligent symbolic links in the depot

Perforce Software p4ideax | Intelligent symbolic links in the depot:


I have also been looking for this "symlinks in depot".

It is possible that streams may do this - I may not totally grok streams yet (not helped by our IT forbidding us from using streams in P4, and highly discouraging branching (p4 branching support is, of course, primitive)). But based on what I have seen so far, streams are much more complicated than what I want to do with symlinks.

Here is one of the use cases where I want to use depot side symlinks:

I want to merge two directories that have diverged versions of files.

Unfortunately, they are NOT branches. The user who created them did not understand branching. Instead, she copied the files outside perforce, and then added the copy as a separate set of files that, from Perforce's point of view, are totally independent, unrelated. (Fixing that is a separate topic.) Call this a "fake branch". (E.g. think about cp -R from to creating a fake branch of a directory tree - logically a branch, just one that your version control tool may not be able to figure out.)

Unfortunately^2 they are binary files that I can merge, but must do so by hand. Painful. Slow. I can't get the merge done all in one day.

So here is what I want to do: as I merge the several hundred files in the fake branch directory

- let's call the original

DEPOT/a/b/GoodDir

and the "fake branch"

DEPOT/c/d/FakeBranchDir

I must leave the two directories GoodDir and FakeBranchDir around.

But as I merge files GoodDir/file1#666 and FakeBranchDir/file1#1 into GoodDir/file1#667,

I want to make FakeBranchDir/file1#2 into a "depot symlink" to GoodDir/file1

so thereafter anyone attempting to work with FakeBranchDir/file1 will get whatever the latest version of GoodDir/file1 is.

And I will do this one by one for all of the files.

(By the way, I can do this because I know the dependencies. I.e. I can do continuous partial integration (merging, reconciliation).

Sometimes I have to do several files together atomically, but not the entire directory.)

When all of the files are merged, so that every file in FakeBranchDir/fileN is a "depot symlink" to GoodDir/fileN,

I can do the following:

* remove all FakeBranchDir/fileN depot symlinks, and make DEPOT/c/d/FakeBranchDir a depot symlink to DEPOT/a/b/GoodDir

* potentially just plain remove FakeBranchDir completely, and stop the insanity of having unnecessary fake branches in the depot

Anyway... streams may do this, but they seem like overkill, plus IT has forbidden p4 streams. Heck, my team barely knows how to use branches - actually, I am strongly discouraged from using branches (but I am so used to branching...)

Lacking depot symlinks or other support, here is what I am doing:

+ Merging the files

+ Once merged, copying the files into BOTH GoodDir/file1 and FakeBranchDir/file1, etc.

+ hoping that nobody modifies the merged files separately, causing them to re-diverge.

+ unfortunately, not allowed to create a long-lived lock. Folks still want to edit in their diverged directories

I have thought about using p4 branch mappings to accomplish the same thing as a "depot symlink", but that is a pain - I would have to edit the branch mapping every time a file GoodDir/fileK and FakeBranchDir/fileK were merged.

Basically, "depot symlinks" are just a way of allowing you to edit the branch mapping, without actually having to edit the mapping in a central place. They are a "distributed" view of the branch mappings.

---

Now, yes, I know: this creates a "fragile base class" problem. Somebody checking something into GoodDir/fileM might break FakeBranchDir/fileM (if it is a depot symlink), because the "context", the surrounding files, may break it in the FakeBranchDir context. Yes, I realize that we really need to be using branches here (not p4's primitive branching, but some sort of branching for a partial subset of the depot - which may be what p4 streams are trying to do.). So that when somebody checks into GoodDir/fileM, FakeBranchDir/fileM can detect that it needs to be updated, but is not automatically updated until you have tested it in the FakeBranchDir context.

(Hmm, what this really means is that FakeBranchDir/fileM#2 may be a depot symlink to GoodDir/fileM (after some base revision)

FakeBranchDir/fileM#2-->GoodDir/fileM(#latest,validated=1011). Using notation to indicate that we are supposed to link to the latest, but at the last time of checkin that value was GoodDir/fileM#1011; as opposed to linking to FakeBranchDir/fileM#2-->GoodDir/fileM#1011, which would be a depot symlink, but one that is not normally updated by default.

I.e., a depot symlink really wants to be a branch. But it is a branch that you normally want to be encouraged to update as quickly as possible, perhaps by default, as opposed to having to do an explicit branch merge.)

But, these are dreams for my own VCS.

Just plain old depot symlinks, though, are a darn good first step.
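The "latest, validated=1011" notation can be sketched as data. In this hypothetical Python model (nothing here is a Perforce feature), a depot maps each path to its list of revisions. A revision is either content or a Symlink that records its target and the target revision it was last validated against. Resolving a link returns the target's latest content, plus a flag saying whether the target has moved past the validated revision and so needs re-testing in the FakeBranchDir context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symlink:
    """Hypothetical depot-symlink revision: follow `target` to its latest
    revision; `validated` is the target revision that was current (and
    tested) when this symlink revision was checked in."""
    target: str
    validated: int


def resolve(depot, path):
    """Return (content, stale) for the head revision of `path`.

    depot maps path -> list of revisions (index 0 is #1); each revision is
    either content (str) or a Symlink. stale is True when the link's target
    has moved past the validated revision.
    """
    head = depot[path][-1]
    if isinstance(head, Symlink):
        content, _ = resolve(depot, head.target)
        latest = len(depot[head.target])
        return content, latest > head.validated
    return head, False
```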

Monday, August 11, 2014

Version control branches are not branches - need "merging" of unrelated version control objects

Most version control systems have the concept of branching: a versioned object (e.g. a ,v file in RCS or CVS, an entire repository in most DVCSes) starts off with a single line of versions, but at some point the lines of development may diverge, and be developed almost independently.

"Almost independently", since it is common to synchronize two diverged branches - sometimes a complete synchronization, making them identical, sometimes incomplete. E.g. copying changes from a master branch to a maintenance release branch.

The term "branch" is a bad term, at least if you are thinking in terms of trees - unless your concept of branching includes re-merging, with tissue fusion where branches overlap. This often happens with vines like ivy, and occasionally happens with rubbing branches in trees.

The term "branch" is a bad term - but unfortunately I do not know of a better one.

Source code version control corresponds more closely to gene flow diagrams or "family trees" - but again the terminology is inaccurate.

I will no longer obsess about improving the terminology - but I do think that "branches" <=> trees has warped or limited our thinking.

Anyway...

The idea that branches reflect divergence followed, possibly, by convergence is also misleading. The versioned objects may start off independent, and then converge first, before having the chance to diverge again.

Small real world example: We recently changed VCS, from CVS (and HG) to Perforce. All of the CVS files were converted en masse. Evolution then continued in the Perforce depot.

Later it was discovered that some edits had continued to be made (in neither CVS nor P4 nor HG). These files were checked into a separate place in Perforce.

One may argue that what should have been done was to have created a branch off some revision along the CVS-converted-to-P4-history, and then checked those diverged versions in on that branch. But that was not done. Too late.

One may argue that these files are logically related to a common ancestor. True - but that ancestor may not be represented in the Perforce depot.

What I argue is that it should be possible in a VCS to take separately versioned objects, and then merge them into a single versioned object. Or, you may prefer "connect two independent graphs of versions into a graph with no supremum, no common ancestor".

Similarly, it should be possible in a VCS to create new ancestor versions at any time. Not just to start off with a base or original version, and then move forwards in time - but also to go backwards in time. Imagine, for example, that one is doing literary research, say into versions of ancient Greek literature that were copied and recopied by scribes in Alexandria, during the Islamic Golden Age, and also in monasteries in Ireland during the Middle Ages. Then a new scroll is discovered preserved in a library in Pompeii - and it is obvious that it is an ancestor to some but not all of the later versions. It should be possible to retroactively edit the history graph, inserting this new version.
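Both operations are cheap on a version graph stored as a parent map. Here is a Python sketch with hypothetical version ids. Two unrelated histories have no common ancestor until a merge node connects them. A newly discovered ancestor can be spliced in afterwards by adding parent edges to the existing graph.

```python
def ancestors(parents, node):
    """All versions reachable from `node` via parent edges, including itself."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents.get(n, ()))
    return seen


def common_ancestors(parents, a, b):
    return ancestors(parents, a) & ancestors(parents, b)


def connect(parents, merge_id, heads):
    """Join independent version graphs with a new version whose parents are `heads`."""
    parents[merge_id] = list(heads)


def insert_ancestor(parents, new_id, children):
    """Retroactively record `new_id` as a parent of each version in `children`."""
    parents.setdefault(new_id, [])
    for child in children:
        parents[child].append(new_id)
```

With parents = {"p4-1": [], "p4-2": ["p4-1"], "stray-1": [], "stray-2": ["stray-1"]}, the two heads share no ancestor. After insert_ancestor(parents, "scroll", ["p4-1", "stray-1"]), their only common ancestor is "scroll".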

Now, in this example the versions may be imagined as being descended from a common ancestor. But perhaps not - perhaps two independent works were created, and then borrowed from each other until they became very similar, possibly identical after a certain point.

Linus has argued against explicitly recording file renamings in git - saying that comparing file content should suffice. This is true... but imagine the literature research problem. Here we may want to record the opinions of antiquities experts as to which version of the document is related to which, as a matter of opinion rather than incontrovertible fact. Those experts may have used content comparisons to make their inferences, but it may well be impossible for an automated tool to reliably infer those connections, unless they are recorded explicitly.

Another example, beyond individual files: I have version controlled my ~glew home directory for decades (yes, literally). But I have occasionally switched version control systems without preserving history. And the versioned tree has diverged on some OSes. I should like to merge these graphs.

I need to improve my terminology wrt "versioned objects". Were there two distinct versioned objects prior to connecting their version graphs, but one thereafter? The concept "versioned object" is mainly used so that I can talk about versioned sets of versioned objects (and versioned sets of versioned sets of ...) There are really default queries - there are relations between versions that are essentially or conceptually 1:1, such as just adding a few lines to a 1000 line file, but leaving it in the same place in the filesystem. Similarly, moving from place to place in the filesystem. There are relations that are "into", such as taking a file containing a function and making it just part of a larger text file. This is much the same as including a file in a directory, except that it is easier to track the evolution of a file in a directory than it is a chunk of text in a larger text file. In my dreams, tracking a chunk of text that happens to be a function is easy, even across renamings - but functions may replicate, etc.

Plus there are graph edges that correspond to creating new versioned objects - such as splitting a file into pieces.

What I am trying to say is that it is easy to track the evolution of sets of files or other versioned objects if the transformations are 1:1, or into, or...

Overall, if a set is identified by a rule, it is easy to track if the converged or merged objects all satisfy the rule for the set. E.g. "all files under dir/subdir" is not affected if a file is renamed but lives in the same directory.

But if a transformation means that some of the participants no longer are covered by the rule defining a set, then one may need to query. E.g. if you have defined the set for a tool as "Module A = all files under tools/toolA",

but tools/toolA/foo.c has had a library function extracted into tools/lib/libAA/lib1.c, then Module A may no longer be freestanding. Or we may want to modify its dependencies.
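The distinction can be checked mechanically. Here is a Python sketch that assumes a set defined by a single fnmatch-style rule and a hypothetical list of transformation edges (old path to new paths). A rename that stays under the rule passes silently. An extraction that lands outside the rule is reported as something the user should be asked about.

```python
from fnmatch import fnmatch


def members(rule, paths):
    """Paths selected by a rule-defined set (rule: fnmatch-style pattern)."""
    return {p for p in paths if fnmatch(p, rule)}


def escaped(rule, edges):
    """Report transformations whose results leave the rule-defined set.

    edges: list of (old_path, [new_paths]) for renames, splits, extractions.
    Returns (old_path, new_path) pairs where old was a member but new is not.
    """
    return [(old, new) for old, news in edges for new in news
            if fnmatch(old, rule) and not fnmatch(new, rule)]
```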

Friday, July 18, 2014

How to determine what may have changed (figuring out how to work around perforce slowness)

Determine what has changed using full content comparison

- slow (especially transferring whole content across net)

- completely accurate

Determine what has probably changed using content checksums

- transfer only checksums across net => fast if no change

- may be inaccurate if checksums collide (yeah, I am paranoid)

- computation (of checksum) on both sides

- or, a VC tool may cache checksums of original checkout => local change

- false negatives - no change detected - if checksums collide - but unlikely

- no false positives

Determine what may have changed using heuristics and metadata

- e.g. file modification dates

- e.g. whether user has checked a file out for editing or not

- false positives and false negatives

- false negatives - undetected changes - may be common, e.g. if not using modification dates or if mtimes can be changed

p4v "get latest" seems to use the third approach, heuristics and metadata. I got bitten by false negatives - true changes not reported - almost immediately.

p4v "get revision / latest / force" uses the full content transfer.

I realized that I was implicitly assuming the second approach, content checksums. Fast, very low chance of false negatives (checksum collision). I.e. I have grown used to rsync and its descendants. It does not appear that p4/p4v have this ability.

It is not clear where "p4v reconcile" falls. It is possible that p4v reconcile uses local metadata heuristics, and that p4v reconcile in combination with p4v get latest is high confidence. But having been burned once, I am using p4v-get-revision/latest/force far too often. And it is very slow.

Perhaps what I need to do is keep a clean workspace, use p4v get latest on that, and then diff using local tools. Avoids the slow net transfers of p4v-get-revision/latest/force.
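The difference between the metadata heuristic and the checksum approach is easy to demonstrate. This Python sketch assumes the VC tool cached a SHA-256 digest plus mtime and size for each file at checkout; the helper names are hypothetical, not p4 or p4v features. Take an edit that keeps the file size and restores the old mtime. The metadata check misses it, while the checksum check catches it without any network transfer.

```python
import hashlib
import os


def sha256_of(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def cache_at_checkout(paths):
    """Record (digest, mtime_ns, size) for each file when it is checked out."""
    cache = {}
    for p in paths:
        st = os.stat(p)
        cache[p] = (sha256_of(p), st.st_mtime_ns, st.st_size)
    return cache


def changed_by_metadata(cache):
    """Heuristic: changed if mtime or size differs. Fast, but can miss edits."""
    changed = []
    for p, (_, mtime, size) in cache.items():
        st = os.stat(p)
        if st.st_mtime_ns != mtime or st.st_size != size:
            changed.append(p)
    return changed


def changed_by_checksum(cache):
    """Recompute local digests: misses an edit only on a hash collision."""
    return [p for p, (digest, _, _) in cache.items() if sha256_of(p) != digest]
```

Comparing the same cached digests against digests reported by the server would be the rsync-like middle ground: when nothing has changed, only checksums cross the network.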

p4 partial checkouts, file sets, spatial and temporal

I like Perforce's partial checkouts - workspaces that do not need the whole depot - but it comes at a cost in speed.

With a whole-repo DVCS like hg or git, diffing can be quite fast: you check the whole-repo version object. If unchanged, that's it - you have essentially checked only a single file. Whereas p4, because it seems to operate by assembling versions of individual file objects, has to check each. IT complains when I try to reconcile a p4 workspace with >3000 files, saying "prune your workspace".

Something similar applies to diffs when files have changed.

---

I think my concept of "file-sets" or "versioned-object-sets" can help here.

A versioned-object-set can be the whole repo, or an individual file. And possibly stuff in between, subsets, like directory trees.
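One way intermediate versioned-object-sets could recover the whole-repo speed for a partial workspace is to give every directory tree its own content hash, Merkle-style, as git's tree objects do. Here is a Python sketch over a hypothetical nested dict ({name: bytes or subtree}). It does not describe how p4 works. Equal subtrees are dismissed with one comparison, and only differing subtrees are descended into.

```python
import hashlib


def tree_hash(tree):
    """Merkle-style hash of a nested dict {name: bytes | subtree}."""
    h = hashlib.sha256()
    for name in sorted(tree):
        child = tree[name]
        if isinstance(child, dict):
            digest = "tree:" + tree_hash(child)
        else:
            digest = "blob:" + hashlib.sha256(child).hexdigest()
        h.update(f"{name}\0{digest}\n".encode())
    return h.hexdigest()


def diff_trees(a, b, prefix=""):
    """Return (changed_paths, nodes_compared), skipping equal subtrees wholesale.

    Hashes are recomputed here for brevity; a real tool would store them
    with each version of the set.
    """
    if tree_hash(a) == tree_hash(b):
        return [], 1
    changed, compared = [], 1
    for name in sorted(set(a) | set(b)):
        x, y = a.get(name), b.get(name)
        path = prefix + name
        if isinstance(x, dict) and isinstance(y, dict):
            sub, n = diff_trees(x, y, path + "/")
            changed += sub
            compared += n
        else:
            compared += 1
            if x != y:
                changed.append(path)
    return changed, compared
```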

Let's imagine that at the lowest level we have individual file-versioned-objects. (This is not necessarily true - we might have versioned-objects corresponding to parts of a file, or even parts of several different files (e.g. a function definition and its declaration). But it's nice to have an atomic level to think about.)

A spatial set describes which file versioned objects are considered. It might be a list of disjoint files,

or it might have predicates such as "the subtree under /path/subdir".

The spatial set's rules, that define what files are considered in the spatial set, may itself be versioned. E.g. you may add or delete a subdirectory from a spatial set.

Note difference: adding or deleting a subtree from the whole repo (or from some other spatial set), versus adding or deleting it completely from the repo.

"I don't want to see this subtree in this subset any more" (doesn't propagate to other sets overlapping the subset)

Versus "remove from this set, and all other sets, going forward." (Propagates to other spatial sets and subsets (when they decide to merge), but doesn't affect saved history.)

Versus "remove content completely from the history" (e.g. licensing problems)

Apart from versioning a spatial set's rules, the spatial set's contents, the list of files inside it, may be versioned.

Call that a versioned spatial set instance.

Partial checkins do not necessarily immediately affect a spatial set's contents when next updated. But the spatial set may be directed to merge candidates.

I.e. a spatial set may be constructed from the latest trunk version of files specified by the spatial set's rules.

For that matter, a spatial set may be constructed from the last version of files on different versioning branches - e.g. the latest trunk version of files under main/..., and a development branch version of files under library/...

In so doing, we are transitioning from spatial sets being "pure" spatial descriptions, to spatial-temporal sets, combinations of spatial and branch versioning descriptions. Operations such as "all files spatially under subdir/..." and "latest files on branch bbb..." and intersections and unions and other set operations thereof.

I dislike the word "spatial-temporal", since temporal seems to imply versions as of a precise time.

Better?: "spatial" for file position, and "lineage", for things like "the latest of a branch".
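Spatial rules and lineage rules compose as ordinary set operations. This Python sketch uses a hypothetical history table keyed by (path, branch). Each selection takes, for one branch, the latest revision of every path matching one spatial pattern. A workspace that uses trunk for main/... and a development branch for library/... is then just the union of two selections.

```python
from fnmatch import fnmatch

# Hypothetical history: (path, branch) -> revisions on that branch, oldest first.
HISTORY = {
    ("main/app.c", "trunk"): [1, 2, 3],
    ("main/app.c", "dev"): [1, 2],
    ("library/lib.c", "trunk"): [1],
    ("library/lib.c", "dev"): [1, 2, 3, 4],
}


def select(history, spatial, lineage):
    """Latest revision on branch `lineage` of every path matching `spatial`."""
    return {path: revs[-1] for (path, branch), revs in history.items()
            if branch == lineage and fnmatch(path, spatial)}


def union(*selections):
    """Union of selections; a later selection wins if a path appears twice."""
    out = {}
    for s in selections:
        out.update(s)
    return out


workspace = union(select(HISTORY, "main/*", "trunk"),
                  select(HISTORY, "library/*", "dev"))
```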

---

This concept of versioned sets enables us to have simple tracking branches that do not fully propagate changes, at least at file granularity.

E.g. if you have Branch1 and Branch2, each with corresponding READMEs, and you do not want to propagate README.Branch1 to Branch2, or vice versa.

Branch1 = common-spatial-set + README.Branch1
Branch2 = common-spatial-set + README.Branch2

checking stuff in on Branch1 may affect common-spatial-set and README.Branch1.

updating Branch2 receives changes to the common-spatial-set but not changes to README.Branch1.

Flip-side, we may clone Branch2 from Branch1, which will give us README.Branch1 in Branch2. When we prune README.Branch1 from Branch2, we have two options - making it a deletion that propagates, or not.
(Q: what does "propagation" look like?
"Propagate a property to any new checkin - like branch."
"Do not propagate property - tag that applies to a version"
"Propagate across merges by default."
"Do not propagate across merges by default - branch specific file".
"Propagate to child branches, but not to parent branches..."
)

Monday, July 14, 2014

Transformations when moving changes between branches

I often want to have transformations automatically applied when I perform operations between branches.

Very simple example: I have occasionally had readmes for specific branches, that I want to live only in that branch. E.g. README.vcs-branch-name1, README.vcs-branch-name2

Therefore, when merging from branch1 to branch2, I do NOT want to transfer README.vcs-branch1.

But when doing a reverse merge from branch2 to branch1, I do not want to transfer README.vcs-branch2, and I especially do NOT want to delete README.vcs-branch1.

Mercurial's merge tracking will arrange to delete the README.vcs-branch1 file on the reverse merge. Bad, mercurial.

You can think of this as a patch that is implicitly applied whenever there is a cross branch operation. Patch may be too specific: possibly a programmed transformation expressed as code.

(Would also want to notify on cross-branch diffs about such transformations.)

===

A contrived example: if tracking Linux installations, may want to change text in some control files.

E.g. some file may contain a user name, like "UserThatRunsFooBar"

On one machine it may be FooBarUser. On another it may be SamJones.

All of the rest of the diffs to the file may transfer, just not that variable name.

May want a different branch for the two systems.

Hence, a desire for a transformation applied whenever such a file is moved between the branches for the two systems.
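Here is a Python sketch of such a cross-branch transformation. It assumes hypothetical per-branch settings: a set of branch-private files and a table of branch-specific values. None of this is a Mercurial or Perforce feature. When changes move between branches, private files are neither sent nor deleted, and the source branch's value for each variable is rewritten to the destination's.

```python
# Hypothetical per-branch settings.
BRANCHES = {
    "branch1": {"private": {"README.vcs-branch1"},
                "vars": {"UserThatRunsFooBar": "FooBarUser"}},
    "branch2": {"private": {"README.vcs-branch2"},
                "vars": {"UserThatRunsFooBar": "SamJones"}},
}


def transform(files, src, dst):
    """Prepare a {path: text} snapshot of branch `src` for merging into `dst`."""
    s, d = BRANCHES[src], BRANCHES[dst]
    out = {}
    for path, text in files.items():
        if path in s["private"] or path in d["private"]:
            continue  # branch-private files never cross between branches
        for var, value in s["vars"].items():
            # Naive textual substitution; a real rule would be more targeted.
            text = text.replace(value, d["vars"][var])
        out[path] = text
    return out


def merge_into(dst_files, incoming, dst):
    """Take incoming versions of shared files, but keep dst's private files.

    A "theirs wins" merge for brevity; the point is that the absence of the
    destination's README from `incoming` is not treated as a deletion.
    """
    private = {p: t for p, t in dst_files.items() if p in BRANCHES[dst]["private"]}
    return {**incoming, **private}
```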

===

Partial checkouts can then be considered to be branches with such transformations based on filesystem structure.

A partial checkout of a subtree may have the transformation rules:

* include all stuff under trees T1, T2, ...

* exclude all stuff not under those trees.
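Viewed that way, a partial checkout is a transformation applied in both directions. This Python sketch uses hypothetical include rules. Checkout keeps only paths under T1 and T2. Checkin carries everything outside the view over unchanged, so "not in my workspace" is never mistaken for "deleted", the same trap as Mercurial deleting the branch-specific README.

```python
from fnmatch import fnmatch

INCLUDE = ["T1/*", "T2/*"]  # hypothetical include rules for the partial checkout


def in_view(path):
    return any(fnmatch(path, pat) for pat in INCLUDE)


def checkout(repo):
    """Outbound transformation: keep only paths inside the view."""
    return {p: t for p, t in repo.items() if in_view(p)}


def checkin(repo, workspace):
    """Inbound transformation: paths outside the view carry over unchanged,
    while additions, edits, and deletions inside the view are applied."""
    outside = {p: t for p, t in repo.items() if not in_view(p)}
    inside = {p: t for p, t in workspace.items() if in_view(p)}
    return {**outside, **inside}
```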

DVCS branches = sets++

It is a good idea to be able to identify "sets" of revisions. Both by predicate functions, and by tagging with names.

Branches are sets that automatically extend: when you do a checkin from a workspace with a parent set that is a branch, the checkin automatically gets added to the branch set.

This allows branches to converge and then diverge:

of course, a version can be tagged as being in multiple sets

similarly, a version can be tagged as being in multiple branches at the same time.

Two versions on different branches can merge, and the branches can be converged for a while. But then later diverge.

This can be done on a file by file basis: not just whole repo versions, but individual file versions.
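Here is a Python sketch of branches as auto-extending sets, assuming a hypothetical Repo class. A checkin joins every branch its workspace was tracking. So a merge version can sit on two branches at once (converged), and later checkins can split them again (diverged). The keys here are whole-repo versions, but the same structure would work with (file, revision) keys for per-file branching.

```python
from collections import defaultdict


class Repo:
    """Hypothetical model: branches are named sets of versions that auto-extend."""

    def __init__(self):
        self.parents = {}                 # version -> tuple of parent versions
        self.branches = defaultdict(set)  # branch name -> set of versions

    def commit(self, version, parents, on_branches):
        """Record a checkin; it joins every branch its workspace was tracking."""
        self.parents[version] = tuple(parents)
        for b in on_branches:
            self.branches[b].add(version)

    def branches_of(self, version):
        return {b for b, vs in self.branches.items() if version in vs}

    def heads(self, branch):
        """Versions in the branch that have no child within the same branch."""
        vs = self.branches[branch]
        with_children = {p for v in vs for p in self.parents[v] if p in vs}
        return vs - with_children
```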

--

repo-versions

repo-version-sets

file-versions

file-version-sets

file-sets - most meaningful when file-version-sets

named-file-sets => these are objects that can be versioned

named-file-set-versions

set operations on named-file-sets

=> partial, union, difference

BTW, parse this as named--file-sets or named(file-sets)

Doesn't need to be named(file-sets). Can be anonymous. Perhaps better called explicit(file-sets)

or identified(file-sets)

---

I have elsewhere figured out that

partial checkouts are easy,

while partial checkins correspond to creating a branch, at least temporarily, from which changes can be propagated to larger filesets.

Probably with some sort of nagging system:

Partial checkin doesn't automatically check into containing filesets,

but does automatically check into candidate filesets for enclosing branches.

This might be a good place to exploit file versioning as opposed to whole repo version - candidate-filesets or candidate-branches on a per file basis.

UNIX tools and special characters in filenames

See, for example: bash - Is there a grep equivalent for find's -print0 and xargs's -0 switches? - Stack Overflow:


UNIX tools are great, with their composability - find | grep | xargs | etc.

But UNIX tools have problems handling entities or objects, such as filenames, that have special characters such as blank spaces or newlines within them.

UNIX tools typically operate on lines (grep, xargs' input), or on words separated by whitespace (e.g. backtick expansion, xargs' invocation of other tools).

Some UNIX tools provide the option of using null separated strings, such as find -print0 or xargs -0.

But as the Stack Overflow page shows, people want such flexibility in other tools, like grep. Of course, GNU grep has provided it (-z / --null-data reads and writes NUL-terminated lines; -Z / --null merely emits a NUL after each filename it prints), but there are probably other such tools. ... cat? But of course there is tr '\n' '\0' ... Still, the list continues. Mercurial? Git?
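A small Python illustration of why the separator matters. It assumes the stream is what `find ... -print0` would emit, but builds it by hand so the example does not depend on any files existing. `split_nul` is a hypothetical helper name:

```python
def split_nul(data: bytes) -> list[bytes]:
    """Split a stream of NUL-terminated records (the find -print0 / xargs -0 convention)."""
    records = data.split(b"\0")
    # The terminating NUL after the last record leaves one empty field at the end.
    if records and records[-1] == b"":
        records.pop()
    return records


names = [b"plain.txt", b"with space.txt", b"with\nnewline.txt"]
stream = b"".join(name + b"\0" for name in names)

print(split_nul(stream) == names)                            # True
# The same names, newline-separated, cannot be split back correctly:
print(len(b"".join(n + b"\n" for n in names).splitlines()))  # 4, not 3
```

In a real pipeline the bytes would come from something like `subprocess.run(["find", ".", "-print0"], capture_output=True).stdout`.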

Moreover, null separated is by no means the last word. What if nulls are allowed in the strings that you are manipulating? Then you need either a quotation system, such as XML (and then we get into the issue of quotes upon quotes), or a strings-with-length system.

I have elsewhere talked about making all UNIX tools work with XML. This is a generalization.

Strings-with-length is most general. Possibly fragile. Possibly XML clauses wrapped around simple "obvious" quoting.
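One existing strings-with-length scheme is D. J. Bernstein's netstrings. Each string is framed as `<decimal length>:<bytes>,`. The sketch below is a minimal encoder/decoder for that framing, with hypothetical function names. The trailing comma partly addresses the fragility concern: a corrupted length usually trips the check instead of silently mis-framing everything after it.

```python
def encode_netstring(payload: bytes) -> bytes:
    """Netstring framing: decimal length, ':', the raw bytes, ','."""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode_netstrings(data: bytes) -> list[bytes]:
    """Decode a concatenation of netstrings; raise ValueError on malformed input."""
    out: list[bytes] = []
    pos = 0
    while pos < len(data):
        colon = data.find(b":", pos)
        if colon < 0 or not data[pos:colon].isdigit():
            raise ValueError(f"bad length field at offset {pos}")
        start = colon + 1
        end = start + int(data[pos:colon])
        # The trailing comma is a cheap check that the length was not corrupted.
        if data[end:end + 1] != b",":
            raise ValueError(f"missing ',' terminator at offset {end}")
        out.append(data[start:end])
        pos = end + 1
    return out


fields = [b"plain", b"with space", b"new\nline", b"nul\0inside", b""]
stream = b"".join(encode_netstring(f) for f in fields)
print(decode_netstrings(stream) == fields)  # True
print(encode_netstring(b"hello"))           # b'5:hello,'
```

Nothing in the payload is special: spaces, newlines, and NULs all pass through unquoted.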

Saturday, July 05, 2014

I wish that EverNote / OneNote had 1990s era Infocentral's linking

Why InfoCentral?

For the umpteenth time, I am trying to use EverNote to collect shopping research. And it sucks because Evernote doesn't really have hierarchy.

Evernote has notebooks. And stacks. And tags.

OneNote is slightly, moderately, better than EverNote. It has books, folders, groups of folders, and notes can have subnotes. But that's it. Oh, yes, it has tags.

Gmail has tags, aka labels. Or are they folders? Really, folders implemented by constraining the labels system.

Better, but the tree structured folder constraints make non tree structured labels harder to use. Some labels want to be tree structured, some do not.

I think the problem is that developers are trying to maintain a paper mindset, using "abstractions" that behave somewhat like real objects. Real paper manila folders cannot be arbitrarily recursively nested, and hence EverNote / OneNote should not. Bzzt!!! Wrong!!! I want to take advantage of what a computer can do that paper cannot do.

And, yes, tags in theory can be used to implement everything that a folder hierarchy has - but only in theory. Because to really accomplish this you have to create a really ugly tag naming system.

I have elsewhere posted about how I even want my tags to be organized, possibly in a hierarchy. Because just plain searching through the approved list of tags can be a pain, when you have a lot of tags.

--

Gnashing my teeth about this, I reminisced about InfoCentral, the very first note organizing software that I used on a tablet PC, way back in 1996-7.

InfoCentral was by no means perfect, but it was better than tags, better than hierarchy. InfoCentral was all about links between objects. Links that were reversible, unlike in a hierarchy. But where you could use hierarchical browsing up to the point where it failed, and then "shake the tree".

So you could look at a family as

Father - John
    Son - William
        Grandson - Simon
        Granddaughter - Evelyn
    Daughter - Sonia
        Granddaughter - Mildred

Or shake the tree to look at it from somebody else's point of view

William
    Father - John
    Son - Simon
    Daughter - Evelyn
    Sister - Sonia
        Niece - Mildred

and then continue browsing.
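A minimal sketch of the "reversible links, then shake the tree" idea. Each link is stored once, with a label for how each end sees the other, and any person can be chosen as the root of an indented view. The names `adjacency`, `link`, and `view` are hypothetical, and this is not how InfoCentral itself was implemented:

```python
from collections import defaultdict

# Each link is stored once but is traversable from both ends:
# adjacency[person] -> [(label of the other person as seen from this one, other person)]
adjacency: dict[str, list[tuple[str, str]]] = defaultdict(list)


def link(parent: str, parent_role: str, child_role: str, child: str) -> None:
    adjacency[parent].append((child_role, child))
    adjacency[child].append((parent_role, parent))


def view(root: str) -> list[str]:
    """'Shake the tree': render the link graph as an indented tree rooted at `root`."""
    lines, seen = [root], {root}

    def walk(person: str, depth: int) -> None:
        for label, other in adjacency[person]:
            if other not in seen:
                seen.add(other)
                # The label is relative to `person`, the node we arrived from.
                lines.append("    " * depth + f"{label} - {other}")
                walk(other, depth + 1)

    walk(root, 1)
    return lines


link("John", "Father", "Son", "William")
link("John", "Father", "Daughter", "Sonia")
link("William", "Father", "Son", "Simon")
link("William", "Father", "Daughter", "Evelyn")
link("Sonia", "Mother", "Daughter", "Mildred")
print("\n".join(view("William")))
```

In this sketch each label is relative to the node the person was reached from. Sonia therefore shows up under John as his daughter, not as William's sister. Deriving "Sister" or "Niece" would need composition rules of the son's son = grandson kind.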

OK, so InfoCentral wasn't smart enough to know that son's son = grandson.

Or to group sons and daughters as children. Or sisters and brothers as siblings.

And Infocentral wasn't smart enough to do the classic pivoting:

Sales/Year/Month

Sales/Month/Year for month comparisons between different years
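The pivot amounts to computing the hierarchy from flat records instead of storing one fixed hierarchy. A minimal sketch with made-up sales figures and a hypothetical `group` helper:

```python
from collections import defaultdict
from operator import itemgetter

# Hypothetical flat records: (year, month, amount).
sales = [(2013, 1, 100), (2013, 2, 80), (2014, 1, 120), (2014, 2, 90)]
YEAR, MONTH = itemgetter(0), itemgetter(1)


def group(records, outer, inner) -> dict[int, dict[int, int]]:
    """Build a two-level hierarchy {outer key: {inner key: total amount}}."""
    tree = defaultdict(lambda: defaultdict(int))
    for rec in records:
        tree[outer(rec)][inner(rec)] += rec[2]
    return {k: dict(v) for k, v in tree.items()}


by_year_month = group(sales, YEAR, MONTH)  # Sales/Year/Month
by_month_year = group(sales, MONTH, YEAR)  # Sales/Month/Year: compare a month across years
print(by_month_year[1])                    # {2013: 100, 2014: 120}
```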

But Infocentral allowed me to do a lot of what I wanted.

I wish something like InfoCentral were available on the web, in Evernote or OneNote.

I'd love to have the time to extend the approach.

Thursday, July 03, 2014

Hidden Files in Perforce — Encodo Systems AG

Hidden Files in Perforce — Encodo Systems AG:


Security model:

user can see everything - file names, file contents
user can see file names, but not file contents

with an error indication if trying to access forbidden file contents

user can see neither file names nor file contents

with an error indication "some information was forbidden for you to see"
with no error indication

Different error models when scanning / listing / enumerating directory trees, and when probing for a single filename / identifier.

The above can apply to any query that might return multiple file objects, e.g. opening by "filename" on a system where several file objects can share the same name and are disambiguated by extra metadata (keywords, version numbers).

Filenames are just one form of metadata that can apply to file objects. Other metadata may apply: keywords, version numbers, cryptographic signatures. We should be able to handle the situation where some but not all metadata is accessible:

e.g. filename is allowed, file contents access is allowed, but access to certain crypto signatures is not allowed - may not even be allowed to see who has signed things.

Each such metadata instance should have any of the above properties: visible, forbidden with error notification, forbidden silently.
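The three per-item properties can be sketched as a small policy check applied to each metadata instance. `Access`, `read_metadata`, and the default of "forbidden silently" for anything the policy does not mention are all assumptions of this sketch. The same rule could drive a directory listing, where an object whose name is silently forbidden simply does not appear.

```python
from enum import Enum


class Access(Enum):
    VISIBLE = "visible"
    FORBIDDEN_WITH_ERROR = "forbidden, with error notification"
    FORBIDDEN_SILENTLY = "forbidden, silently"


def read_metadata(obj: dict[str, str], policy: dict[str, Access]) -> tuple[dict[str, str], bool]:
    """Return the metadata the caller may see, plus whether to report that something was withheld."""
    visible: dict[str, str] = {}
    notify = False
    for key, value in obj.items():
        # Assumption: anything the policy does not mention is hidden without a trace.
        access = policy.get(key, Access.FORBIDDEN_SILENTLY)
        if access is Access.VISIBLE:
            visible[key] = value
        elif access is Access.FORBIDDEN_WITH_ERROR:
            notify = True  # "some information was forbidden for you to see"
        # FORBIDDEN_SILENTLY: omitted, and the caller cannot tell it exists
    return visible, notify


obj = {"name": "plan.c", "contents": "int main() {}", "signed-by": "alice"}
policy = {"name": Access.VISIBLE, "contents": Access.FORBIDDEN_WITH_ERROR,
          "signed-by": Access.FORBIDDEN_SILENTLY}
print(read_metadata(obj, policy))  # ({'name': 'plan.c'}, True)
```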

This extends past visibility to permissions such as writeable, appendable.

Similar treatment for "obliterate" - completely removing an object from repository. E.g. removing proprietary code erroneously checked in to an open source project, or vice versa:

Such removal is just like a permissions failure, with no possibility of getting around it (except possibly for backups...).

Friday, June 06, 2014

Todo-lists versus task queues

New in my emacs org mode setup:

"TODO-LIST-EMPTY(-!)darkorange" ;; empty list, waiting for stuff to be added.

;; really, there are 2 different concepts

;; (1) a todo-list - an object that may start empty and be completed

;; (2) a task queue - an object which may repeatedly transition from empty to non-empty to empty to ...

;; We use the same term for both.

E.g. there is a queue of personal items - my personal todolist

Or "to-do today" - really a queue.

Or, more like:

"todo today" is a list, which one hopes may be completed at the end of the day, but usually is not.

Items left over at the end of the day may be moved forward to tomorrow or the next workday, moved to a longer-term todo list or tracker, or abandoned.

Conceptually, todo-today is a view of my overall todo queue - not necessarily a snapshot at a single point in time.

Something like a query "all todo list items from (midnight,midnight], that are on my top priority list, as of the end of the day (or the current time, if not at end of day)"
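The "todo today as a view" idea can be sketched as a query over item history. Everything here is an assumption for illustration: the hypothetical `Item` record, the use of creation time for the (midnight, midnight] window, and a list simply named "top".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Item:
    """Hypothetical todo item with a history of which list it was on."""
    title: str
    created: datetime
    moves: list[tuple[datetime, str]]  # (when, list it was moved to)

    def list_as_of(self, t: datetime) -> Optional[str]:
        current = None
        for when, where in sorted(self.moves):
            if when <= t:
                current = where
        return current


def todo_today(items: list[Item], midnight: datetime, now: datetime, top: str = "top") -> list[str]:
    """Items created in (midnight, next midnight] and on `top` as of end of day (or now)."""
    end = midnight + timedelta(days=1)
    as_of = min(now, end)
    return [i.title for i in items
            if midnight < i.created <= end and i.list_as_of(as_of) == top]


day = datetime(2014, 6, 6)
items = [
    Item("write report", datetime(2014, 6, 6, 9), [(datetime(2014, 6, 6, 9), "top")]),
    Item("file bug", datetime(2014, 6, 6, 10),
         [(datetime(2014, 6, 6, 10), "top"), (datetime(2014, 6, 6, 15), "tracker")]),
    Item("old task", datetime(2014, 6, 5, 17), [(datetime(2014, 6, 5, 17), "top")]),
]
print(todo_today(items, day, now=datetime(2014, 6, 6, 12)))  # ['write report', 'file bug']
print(todo_today(items, day, now=datetime(2014, 6, 7, 8)))   # ['write report']
```

The same items give different answers at noon and after midnight. This is the "view, not a snapshot" property: "file bug" was moved to a tracker in the afternoon.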

Thursday, June 05, 2014

Smart alarms for when awake and working

The link is vaguely related, but this post was not prompted by it - I just wanted to have some link to the state of the art: Five free apps to help remind you to take a break - TechRepublic:

Like many folks, I sit too much working at my computer.

I have an activity monitor, my Basis Watch. It tracks how long I sit still. In the past few weeks, I have sat still for as long as 4.5 hours at a stretch - that's absolutely still, in my chair at my keyboard, typing. Nearly every day I sit still for 2 or 3 stretches of 2 hours. Basis allows you to set a goal - "Don't Be A Sitter: from 9-5, get up every N hours." I currently have the goal set at 2 hours - because whenever I move it lower, say 90 minutes, it gets depressing.

I have tried setting alarms to remind myself to get up and walk. Right now I have alarms set at 11:30, 2:00pm, and 3:30pm. Why so irregular? See below.

This post was prompted by my 11:30 alarm going off. Unnecessarily, because I had just been active, walking over to a coworker's desk.

Having the alarm go off unnecessarily is irritating. Having it go off just as I am settling back at my desk to get back to work disrupts my concentration, breaks my flow. Having the alarm go off when I am 30 or 45 minutes into a good working period, into flow, really pisses me off. I have this theory that interruptions while you are in the middle of a critical bit of work, with several things up in the air in your head, are one of the primary causes of bugs.

What I want is an alarm, a reminder, to get up and move around, that is not at an absolute time. What I want is an alarm that occurs, say, an hour from the last time I got up and moved around. An alarm that is smart enough to reset itself.

More: I want an alarm, a reminder, that is smart enough to detect (by some heuristic) if I am in flow or not. (Hmm, I wonder if my Basis watch can reliably distinguish typing. Its accelerometer is on my wrist, after all. Since I am a hunt and peck typist, 60wpm, but 80% right handed, I might have to move my watch from my left wrist to my right wrist.)

A reminder that is smart enough to try to look for a period to notify me, after I have been working for an hour, but before I have been working for 2. Looking for a period where I am not working intensely. Possibly looking for a period where I am not typing intensely - or possibly looking at what I am actually doing, whether I am working, or in Blogger (like now). Which is smart enough to look for a good period to interrupt me. But which might interrupt me no matter what I am doing after 2 hours of sitting.
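The behaviour described above can be sketched as a small state machine. The 60/120 minute window follows "after an hour, but before 2". The `intensity` input is an assumed 0..1 score from some sensor, such as typing rate or a wrist accelerometer. `SmartSitAlarm` and its parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SmartSitAlarm:
    """Hypothetical 'get up and move' reminder. Times are minutes; intensity is 0..1."""
    soft_limit: float = 60.0      # after this long seated, start looking for a good moment
    hard_limit: float = 120.0     # after this long seated, interrupt no matter what
    busy_threshold: float = 0.5   # intensity at or above this is treated as "in flow"
    last_moved: float = 0.0

    def update(self, now: float, moved: bool, intensity: float) -> bool:
        """Feed one sensor sample; return True if the reminder should fire now."""
        if moved:
            self.last_moved = now  # getting up resets the timer: no fixed clock times
            return False
        seated = now - self.last_moved
        fire = seated >= self.hard_limit or (
            seated >= self.soft_limit and intensity < self.busy_threshold)
        if fire:
            # Design choice: restart the timer after a reminder rather than nagging continuously.
            self.last_moved = now
        return fire


alarm = SmartSitAlarm()
print(alarm.update(10, moved=True, intensity=0.0))   # False: just got up
print(alarm.update(70, moved=False, intensity=0.9))  # False: an hour in, but in flow
print(alarm.update(90, moved=False, intensity=0.2))  # True: past an hour and a lull
```

A real tool might escalate gradually instead of restarting the timer after each reminder.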

You know those products that try to wake you up at a good point in your sleep cycle? E.g. http://www.sleeptracker.com/how-it-works/

What I want is a product that interrupts me at a good point in my work cycle.

(Ideally it might be smart enough not to interrupt me when I am in a meeting.)

--

Smart alarms when you are awake.

Smart alarms when you are working.

Smart alarms should not just be for sleep.

---

(Possibly it could have the sort of incremental alarm feature that a dawn simulator has. A low priority background notification that ramps up gradually.)

---

OK, I should just go ahead and write it myself. I installed Tasker for Android to write such scripts. I am not a big fan of Tasker - stupid graphical interface, but worse, my phone battery always drains. I got the basic timer functionality working, but was not able to detect movement by accelerometer. (Blogging this prompted me to re-Google, and I found https://play.google.com/store/apps/details?id=com.kanetik.movement_detection_trial_premium)

Thursday, May 01, 2014

Temporary files, security - and filesystem transactions

It is well known that temporary files can be security holes.

Hence

mkstemp(3)

which generates a unique temporary file name, creates and opens the file, and returns an open file descriptor for the file.

"Atomically".

But this may not be enough.

E.g. today I am trying to replace a file with a new file generated from it - by creating a temporary, and then renaming.
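For comparison, here is the conventional create-temporary-then-rename approach in Python. It is a sketch assuming a POSIX-style filesystem where a rename within one directory atomically replaces the target. `replace_file` and its `transform` parameter are hypothetical names. It does not avoid the problems listed next: the temporary still has a globally visible name between `mkstemp` and `os.replace`, and the rename goes by name, not by descriptor.

```python
import os
import tempfile
from typing import Callable


def replace_file(path: str, transform: Callable[[bytes], bytes]) -> None:
    """Rewrite `path` as transform(old contents) via a same-directory temporary and rename."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as f:
        new_data = transform(f.read())
    # mkstemp creates and opens a uniquely named file (O_EXCL, mode 0600) and returns the fd.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Rename by *name*: another process could tamper with tmp_path before this point.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Note also that `mkstemp` creates the file with mode 0600, so this sketch does not preserve the original file's permissions.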

Problem:

* I can't rename the file specified by a file descriptor

* if I rename the file by name:

    * on Linux, the name may have been reused, since Linux allows files to be removed even while open

    * on Cygwin, I cannot rename the file while it is open; but if I close the handle, then the bad guy may be able to race and intercept

We can discuss kluges for this specific case:

* e.g. rename a file specified by descriptor

But the more general problem is

* atomicity

* and the fact that temporary filenames are globally visible.

If the temporary filename were not globally visible, then we could securely

create tmp

write tmp

close tmp

rename tmp

with confidence that nobody else is squeezing between.

More generally, if we had filesystem transactions to guarantee atomicity

BEGIN TRANSACTION

create new file1, file2

write new file1, file2

close new file1, file2

abort if error

END TRANSACTION

Then we can create multiple such files, without having to mess with temporary filenames,

and without having to rename the temporary filenames to the official filenames.

We can use the official filenames from the get-go.

I.e. filesystem transactions automatically create secure hidden temporary files.

Without error prone programming.

---

The same may apply to shared memory transactions - but is most interesting when the shared memory has fine grain access control, e.g. capabilities, rather than the "shared-memory = security hole" we have nowadays.