The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Friday, June 26, 2009

git subtree

Mike Haertel forwarded me a link to git subtree.

It goes a long way towards what I want. It doesn't have the intentional branching - what it means for a project to be on the same branch as a subproject library it is using. But it goes a long way, perhaps most of the way, towards what I want.

Am I paranoid, or is this unacceptable?

I'm quite reluctant to blog about this, because it makes me seem crazy. Paranoid.

But: my computer is really slow. Lots of examples documented on this blog. 15 minutes to come out of standby. Pauses of minutes in the middle of writing a document. Etc. Etc.

I cannot imagine that most other employees suffer this slowness. I cannot imagine this being accepted by a wide number of people.

I can imagine that my computer might be slowed down by security software. Not just virus scanners, but possibly keyboard loggers or network traffic recorders or the like. Possibly installed by corporate security or IT or legal. I hope not malware. It often seems to run more slowly after I have blogged in public as I am doing now.

If this is the case, well: I understand that my employer has the right to monitor everything I do on my work computers. I agreed to it. It is US law.

I just wish that it did not impact my productivity so much by slowing down my computer so much.

And if I am being falsely paranoid, if these slowdowns are not the result of monitoring and security software, if other employees experience substantially the same slowdowns: WHY IS THIS TOLERATED!!!!!

Austin Airport Blog

I’m very happy with Southwest Airlines on this trip. Still cattle car boarding, but the legroom is better than any airline (United, American, Continental, Delta, Northwest) that I have flown recently. My shoulders are still wider than the seats, but that’s true of any airline below first class. (And I never travel first class.)

Here’s a nice touch: by Southwest’s gates in Austin’s Bergstrom airport, they have quite a few (8 that I can see) cabinets with power outlets beside unusually comfortable chairs. They even have USB, to charge up your phone. This is a welcome change from the usual airport “socket hunt”, having to walk all around to find an outlet, often far distant from your gate and requiring you to sit on the floor.


Peeve: pay internet.

At the Hilton in Austin: bad enough that I was paying twice what I should have, but that I had to pay $14/day for Internet in my room as well? Or, rather, I did not pay, except for two nights. The conference had free wireless, but with 150 slots for more than 500 people it was usually unavailable. Trouble is, my coworkers expect me to be connected. Now, here I am at the Austin airport, reluctant to pay for Boingo hotspot access. If I did pay, I could have my expense report (which must be completed online via a web app) done before I got home. But I won’t pay, so even that trivial task piles up. If I were still in my hotel room I would have the last few hours of my second 24 hours. It’s not so much the paying for Internet, as the paying for it repeatedly because I am moving around. I swear I will get a data plan for my phone…


My new laptop computer has no privacy screen. That has a chilling effect. Must order one. I wonder why my company’s IT department doesn’t automatically provide one. I suppose that I should have ordered one as soon as I got my machine.

The chilling effect: so much for reading email at the conference, when it is visible to all behind me. I wanted to use the free internet at the conference, rather than the pay internet in my room. So much for reading on the plane. I was willing to risk blogging, but then a few calendar reminders popped up.

Blogging from ISCA: couldn't keep up

I couldn't keep up. Not only is my typing limited, but my laptop PCs could not keep up.

I am not a touch typist, although I am a very fast hunt and pecker. It's not clear how much value there is to stream of consciousness blogging while I am watching a talk. Why not just get the slides from the speaker? There's value in recording answers to audience questions and off the cuff remarks that won't be on slides. But it is not clear that I will not be hurting myself by making such recordings. E.g. the remarks by HP's Kauffman about Intel's Atom in datacenters.

My company provided HP 8530 laptop is fast enough. When it is up. But when it takes 15 minutes to come out of a power save mode (as mine just did), it is darned easy to miss most of a 20 minute talk.

My OLPC XO is fast enough, and comes out of standby more quickly. I ran into a problem with fairly large text files - at a certain file size it slowed down to a crawl. Even slower than my work PC.

Anyway, enough excuses. My attempt to blog ISCA was not a success. Too bad.

Monday, June 22, 2009

Blogging from ISCA: limited WiFi

Only 150 connections for 500 people.

To reduce contention, will blog offline to a text file.

What were we saying about ubiquitous connectivity? You only get that if using cell services.

Sunday, June 21, 2009

Blogging from ISCA: WDDD: Is TM easier than locks?

Is TM really easier than locks? Programmers must still write critical sections.

Experiment: measure on a class of inexperienced programmers in a UT OS class.

The problem: sync gallery: a rogue's gallery of synchronization. Rogues shoot paint balls in lanes. Cleaners. Etc.

9 different variations: {single lane, two lane, cleaner thread} X {coarse, fine, TM}

TM support: Year1 DSTM2 (Herlihy 06). Year2 JDASTM (Ramadan 09). Library, not language support.

DSTM syntax baroque. "thread.doit(method)" == thread.execute_method_as_transaction(method).

JDASTM: txbegin, txend.

Survey included experience, hours, rankings

147 students

Hours reported:

single lane: coarse grained > tm > fine grain. They did coarse grained first.

Best syntax: coarse > (TM|Coarse) > fine > (fine|conditions)

Easiest to think about: coarse > fine|conditions > tm

Defects tested using condor.

Errors taxonomy:
  • lock ordering - 8%
  • lock cond - 7%
  • lock forgot - 15%
  • cv exotic - 8%
  • cv use - 11%
  • tm exotic - 3%
  • tm forgot ..
  • tm order - not really a bug ..
TM had far fewer errors.


Wants support for atomic blocks.


I asked if there were programmer visible abort actions - if there was contention. There was not.
AFG opinion: abort actions will be a source of errors. Hard to test. I like transactions, but I hate increasing non-determinism.

Audience Q: learning curve. Why not teach TM first, and then translate into locks?

I was surprised at syntax preference for coarse grained locks. IMHO txstart/txend is almost equivalent to lock/unlock coarse grained locks. The presenter was similarly puzzled.
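To make the comparison concrete, here is a minimal C++ sketch, not taken from the talk or the course, of a two-lane update written once with a single coarse lock and once with per-lane locks. The names `Lanes`, `shoot_coarse`, and `shoot_fine` are hypothetical. The coarse version has the same shape as a txbegin/txend bracket: one begin, one end, nothing to order.

```cpp
#include <array>
#include <mutex>

// Hypothetical two-lane shooting gallery: each lane counts paint ball hits.
// Both locking schemes live in one struct only for side-by-side comparison;
// a real program would pick one scheme and use it everywhere.
struct Lanes {
    std::array<int, 2> hits{};             // shared state
    std::mutex big_lock;                   // coarse: one lock for everything
    std::array<std::mutex, 2> lane_lock;   // fine: one lock per lane
};

// Coarse grained: a single lock brackets the whole critical section,
// much as txbegin ... txend would.
inline void shoot_coarse(Lanes& g, int a, int b) {
    std::lock_guard<std::mutex> guard(g.big_lock);
    ++g.hits[a];
    ++g.hits[b];
}

// Fine grained: lock only the lanes touched. std::lock acquires both
// mutexes with a deadlock-avoidance algorithm, which sidesteps the
// "lock ordering" class of errors in the taxonomy above.
inline void shoot_fine(Lanes& g, int a, int b) {
    if (a == b) {                          // same lane: lock it only once
        std::lock_guard<std::mutex> guard(g.lane_lock[a]);
        g.hits[a] += 2;
        return;
    }
    std::lock(g.lane_lock[a], g.lane_lock[b]);
    std::lock_guard<std::mutex> ga(g.lane_lock[a], std::adopt_lock);
    std::lock_guard<std::mutex> gb(g.lane_lock[b], std::adopt_lock);
    ++g.hits[a];
    ++g.hits[b];
}
```

The fine-grained version needs a special case and an acquisition protocol that the coarse version does not, which may be part of why the students found coarse locks easiest to think about.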

Blogging from ISCA: WDDD: Emmett Witchel: Mondriaan-like

Wrote paper: lots of people need metadata. Mondriaan. Pixie dust.

Mondriaan is a general solution, but must be engineered to fit a specific problem.

Multi-paper projects. Emphasize novelty. Features disappear because they didn't work; lack of space; lack of novelty.

Mondriaan gave metadata for each 32-bit word. No alignment restrictions. No warts.

MMP hardware: like page table. Perm Table Base. Perm Table. Protection Lookaside Buffer.

Mondriaan tradeoffs:


PLB refill costs: minimize refills / maximize reach.

SW overheads writing table entries: infrequent updates and/or simple table updates.

Mondriaan stored permissions data, but can handle arbitrary metadata.

Talk is something like a design rationale for Mondriaan. 2 bit permissions for each word. (Byte?) Run length encoded. Zones <= 4 for 16 words. Fallback to pointer to a bitmap entry.

(AFG: I'm not interested in zones. Bad security model. But the metadata mechanisms may be interesting.)

RLE made writes slow.

Mondrix wrote permissions table entries a lot. RLE made it slow. Fell back to using bitmap entries. Coarser grained protection. Page granularity.
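A rough sketch of why the bitmap form is cheap to write, assuming 2 permission bits per word and 16 words packed into one 32-bit entry. The type and enumerator names are hypothetical, and the real Mondriaan table formats are more involved than this.

```cpp
#include <cstdint>

// Hypothetical 2-bit permission codes (the talk only says "2 bit permissions").
enum Perm : std::uint32_t { kNone = 0, kReadOnly = 1, kReadWrite = 2, kExecRead = 3 };

// One bitmap entry: 16 words x 2 bits = 32 bits.
struct BitmapEntry {
    std::uint32_t bits = 0;

    // word must be in [0, 16)
    Perm get(unsigned word) const {
        return static_cast<Perm>((bits >> (2 * word)) & 0x3u);
    }

    // Updating one word is a mask-and-or: constant time, with no need to
    // re-encode neighbouring words as a run-length format would require.
    void set(unsigned word, Perm p) {
        const unsigned shift = 2 * word;
        bits = (bits & ~(0x3u << shift)) | (static_cast<std::uint32_t>(p) << shift);
    }
};
```

The price is space: every word pays for its 2 bits whether or not its neighbours share the same permission, which is the space versus write overhead balance the talk keeps returning to.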

Emmett walks through examples of other work using Mondriaan-like metadata.

Colorama. Luis Ceze HPCA 2007. Explicitly uses Mondriaan. Colors data structures.
Measured dynamic memory allocations (malloc) every 1,900 instructions.

Loki. OSDI 2008. All data has 32 bit security tag. Tags per page, tags per word.


Balance space, PLB reach, write overhead.

PLB reach neglected.

Measuring write overhead maybe too hard for research project: suggests counting, multiplying by a cost estimate.

Audience Q: Dave Christie. PLB coherency? A: Mondrix did not deal with.

Dave Christie of AMD (also of my high school) asked about single threaded coherency and multithreaded. It sounds like he has some familiarity with the issue.
I think that it is also interesting that David chose this talk to attend.

Colorama: Gabriel Loh, Luis Ceze, Emmett: not totally practical, but interesting. Luis: had some deadlocks, might need TM support as well.

Blogging from ISCA: CARD: Panel Session, Multicore Programming

Arvind, David August, Keshav Pingali

Phrased as a debate between A and P

The motion: How should the potential of multicore processors be extracted by applications programmers?

Arvind: Intro

Implicit vs. Explicit


Too much detail => programming too hard. But is there an appropriate level of detail?

(AFG: infinite fine grain CPUs. Expose memory, cache, line size.)

Is expressing non-determinism essential to ||?

Speculation? Or not?

Debaters: August for (implicit), Pingali against (implicit, for explicit).

The motion: How should the potential of multicore processors be extracted by applications programmers.


NOT parallel programming

NOT parallel compilers

FOR implicitly parallel programming, + parallelizing compilers.

What is implicitly parallel programming.

E.g. "inline" directive on function. - explicit. Implicit inlining done automatically by compiler. Often better than human.

Tim Mattson's list of >150 explicit parallel programming languages/systems.

Q: Is explicit parallel programming inherently more difficult than implicit parallel programming?

Quotes Pingali PLDI, quoting Tim Sweeney, who said that Unreal || tripled SW development.

SQL: ... implicit parallel. Another Pingali quote.

Example: commutative annotation. e.g. indicating order of rand calls between loop iterations. "It's okay if I get my random numbers out of order."

SPECint2000, modified ~50 lines of code of 2M. "Restored the trend" ...

Is it important to be able to express non-determinism: YES.

Speculative Execution? YES

Is explicitly || needed by anyone other than compiler writers and systems programmers? NO

Only explicit indications needed are of the form function/anti-function (add-to-list/remove-from-list).

Allies: Church Turing Thesis

Full Employment for Compiler Writer's Thesis.


For explicit.

Necessary evil for applications programmers writing irregular programs, e.g. linked data structures.

YES to non determinism. YES to speculation. (Data value dependent parallelism.)

Delaunay Mesh Refinement.

Don't care non-determinism.

Parallelism depends on run-time analysis. Compiler can't...

(AFG: can compiler generate the code to do the analysis?)

Galois model (non deterministic set iterators.)

Collections approach: libraries implemented parallel by experts,
used by higher level programmers. E.g. database B-trees.

Problems: no fixed set of datastructures.

Even generic datastructures need to be tuned.

In many applications the API functions do little computation. Little time in the datastructures. (AFG: but ||ing e.g. a map execute-over-all...)

August counterexamination of Pingali

A: Are there more libraries than programs?

P: Doesn't matter. App programmer still needs to optimize data structure for his app.

A: since in Q&A, I won't explain how you were wrong.

A: Are there patterns? Cliches? Compiler support for same.
In the past when compilers were a 15% solution, not enough money to develop.

(AFG: how many compilers handle even the Gang of 4 patterns? Let alone parallel programming.)

P: functional languages, compilers, haven't gone anywhere over the years.

A: analysis wrong crutch to lean on. E.g. perl (perlbmk). Parallelism in input set. No amount of analysis, neither compiler nor Perl programmer, will know ||ism.
Runtime analysis.


Mattav (sp?) UT prof: Parallelism not a problem. Implicit works. Explicit works. Tuning is the problem.

A: compiler needs to observe effects of transformation. Run-time feedback.

Mattav summarized P and A's response as "both of you are saying implicit for locality".

I asked my STL question.

To P: not using STL grounds for firing. Customizing grounds for firing. Maybe just fire them all / wait until new generation hired?

P: STL doesn't parallelize. Is just the datastructure. (He didn't follow the map applicator). Joe programmers, domain specific. Stephanie programmers, ||. (He didn't say ninja/padawan.)

To A: people using iterators, but not map application. Can your compiler convert?

A responded: there exist tools that change datastructure, e.g. list to vector,
or to skiplist.
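For reference, the distinction behind the STL question, as a small C++ sketch. The function names are hypothetical, and nothing here claims that any particular compiler performs the conversion.

```cpp
#include <algorithm>
#include <vector>

// Explicit iterator loop: the iteration order is spelled out, so a tool
// has to prove the iterations independent before running them in parallel.
inline std::vector<int> scale_with_iterators(const std::vector<int>& in, int k) {
    std::vector<int> out;
    out.reserve(in.size());
    for (std::vector<int>::const_iterator it = in.begin(); it != in.end(); ++it)
        out.push_back(*it * k);
    return out;
}

// Map application: std::transform says "apply f to every element".
// The per-element function has no side effects, so a parallel
// implementation could process the elements in any order.
inline std::vector<int> scale_with_map(const std::vector<int>& in, int k) {
    std::vector<int> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(),
                   [k](int x) { return x * k; });
    return out;
}
```

Both produce the same result. The second states the intent (map over all elements) where the first states a mechanism (walk from begin to end, appending).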


Audience Q: to A: where do you draw the line between implicit and explicit?

A: if the annotation brings information that cannot be determined automatically, that's implicit. (AFG: copout. That's explicit. But nevertheless still appropriate.)

Arvind's close

If a course is important enough to teach to grad students, why not teach to freshmen?

Should || programming be taught to freshmen?

A: explicitly || programming taught to junior/senior (3rd/4th year)

P: Teach about parallelism in algorithms first. Later, implicit parallel at Joe level.
Later still, fully explicitly parallel.


Applications programmers do not need to write explicitly parallel programs.

The motion does not carry.

Blogging from ISCA: I fell behind

I fell behind. Laptop battery died, couldn't find a plug. Session hopped. I'll write more when I get a chance.


FutMem tutorial

HP Kauffman datacenters.

Blogging from ISCA: ACLD: Chuck Thacker, Rethinking Data Centers

Chuck Thacker invented the PC and Ethernet.

Chuck is now a post-PC person. He has no laptop. He survives with a SmartPhone. (It wasn't clear if he showed a Windows Mobile SmartPhone or an iPhone.)

Datacenters... shipping containers full of servers.

Anecdote: shipping containers were designed as a system... they succeeded where others had failed because he made them into an ISO standard.

Anecdote: SF longshoremen fought containerization of Port of SF. Oakland did not. SF is no longer a commercial port.

Container advantages:

Side to side airflow not impeded by the server case. There is no case. The container is... Loop airflow. Big impeller fans.

Shock mounting at the server, not the rack.

Aggressive ideas:

Once through water: pump through datacenter, and then on to farms.

Power distribution: reduce conversion steps.

12VDC->1VDC point of load regulators ~90%

AC->12VDC 2 stage 85%. Can do better: combined power factor correction, etc.

AC transformers 98%

Final efficiency ~80%
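A quick arithmetic check of the chain, as a C++ sketch with hypothetical function names. Multiplying the three stage figures above gives roughly 75%, a little below the ~80% in my notes; my guess, and it is only a guess, is that the ~80% assumed the improved AC->12VDC stage.

```cpp
// Overall efficiency of a chain of conversion stages is the product of
// the per-stage efficiencies.
inline double chain_efficiency(const double* stages, int n) {
    double eff = 1.0;
    for (int i = 0; i < n; ++i) eff *= stages[i];
    return eff;
}

// Figures from the notes: AC transformer, AC->12VDC, 12VDC->1VDC.
inline double noted_chain() {
    const double stages[] = {0.98, 0.85, 0.90};
    return chain_efficiency(stages, 3);   // 0.98 * 0.85 * 0.90, about 0.75
}
```

Because the stages multiply, removing a conversion step helps more than shaving a point off any single stage.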

UPS and backup generators aren't part of the picture until the grid fails.

Datacenter close to big hydro dam => reliability. Not cost.

Now 1 phase AC to rack. Direction 3 phase AC to rack. Balance, lower ripple. 12-20VAC? Select to maximize overall efficiency.

Commodity servers: HP, Rackable, others.

Thacker: "The PC ecosystem is awful. Terrible." IBM non-x86 servers not in PC ecosystem. "Design our own..."

Custom motherboards.

Commodity disks

Cables at front

Redesign power supply

Thacker: wants ECC / checking.

Use lower power processors, notebook rather than server. (Note, opposite HP Kauffman.)

Network switches expensive. Unreliable.

Data center network:

fixed top

limited nodes

no broadcast/multicast

security simpler

load balance

design goals:

get rid of large switches

push complexity to edge

standard link technology, but not standard protocols

he likes monsoon.

short copper, long optical

2 kinds of switch: middle of rack, boundary. Both can be implemented using Xilinx Virtex 5 FPGAs. Prototypable...


boundary: 10 Gb ports, 64 copper to rack, 64 optical outside container

rack 20 1+1 Gb to servers, 2 x10Gb to container boundary switches

AFG Q: what switches are outside the containers?

route controllers. (I am not sure if those are outside the containers)

likes monsoon valiant load balancing.

boundary switches are synchronous. source routed. less packet like. long lived connections.


Commodity hardware ... network switches to connect commodity parts NOT commodity.



Treating data centers as systems, and do full system optimization.


Thacker likes ATM. ATM is better than Ethernet because it doesn't lose packets. Chuck is allowed to say this, since he invented Ethernet. But the IP community refused to cooperate with ATM.

I asked him a question about switches outside the containers. He said, none: the datacenter is nothing except server containers and the NOC (Network Operations Center).

I asked about containers of switches. Chuck had thought, but doesn't like reliability.

Audience Q: Server computers have a 3 year lifetime. The container itself - power, cooling - can be reused - send back to manufacturer, and upgrade computers.

Compare to servers with cooling integrated: then the cooling solution lifetime is reduced to the computer electronics lifetime.

Audience Q: free air cooling. Thacker: outside air variations. (AFG: Think, winter/summer in the Dalles.) Running electronics hotter bad, reliability.

Audience Q: is power density going to increase? Thacker: microchannels, etc., bad. 8KW/rack now. Thacker doesn't think we will see 24KW/rack.

Mark Horowitz: no more than 10 cores per chip.

Saturday, June 20, 2009

Blogging from ISCA: AMAS-BT: Pardo, Crusoe

Transmeta Crusoe

VLIW 5 wide

Generic simulation support

Shadowed registers, commit/abort

Gated store buffer - 32 entries x 32 bytes.

load-and-protect - like ALAT (hw trap)

x86 condition code support.

PC support - low memory steering I/O vs. DRAM. A20M. Crusoe had hardware for memory map.

Crusoe: x86 ISA support, entirely software decode.
Pardo argues that x86 decode big & power hungry.

LongRun, voltage scaling. E.g. leave CPU at 90%, mem at 100%

Shade: 100 inst/sim inst. Perf 3:1 int, 1:1 FP.

Crusoe translation: 10,000 inst/inst.

Scheduling to VLIW target harder than Shade's RISC SPARC target.

x86 reuse rates low.

Crusoe summary:

Reliability, x86ness - good

Cost: good 1/2 Intel/AMD

Power: good 1/3 Intel/AMD

Perf: umm...

Crusoe faster than low power parts, but slower than 15W laptop parts.

Compute bound, often faster at lower watts. But there aren't that many compute bound workloads.

Memory/cache traffic: slower.

Low reuse: translation overhead -> slower

Crusoe has system gotchas:

PCI graphics, not AGP.

Software DMA, not overlapped.

How to do a small project:

Automate, automate, automate.

Reference simulator. Must be fast enough to boot OS.

Fast VLIW simulator (for host): 30 inst/sim inst. (30 I/I)

Never published?

Narrowing: Reverse execution. Cosimulation, compare. "Nexus" binary search for first divergence. => Bit 17 in register 5 is wrong in this context ... I.e. nexus = automatic bug narrowing.
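A minimal sketch of the binary search idea, not the actual Nexus tool. It assumes both simulators are deterministic and can be replayed to any step from reset, that they agree at step 0, and that once they diverge they stay diverged. Each simulator is reduced here to a function from step count to a 64-bit state digest; the names `StateAt` and `first_divergence` are hypothetical.

```cpp
#include <cstdint>
#include <functional>

// A simulator is modelled as "digest of the state after n steps from reset".
using StateAt = std::function<std::uint64_t(std::uint64_t step)>;

// Returns the first step in (0, max_step] at which the two simulators
// disagree, or 0 if they still agree at max_step.
inline std::uint64_t first_divergence(const StateAt& reference,
                                      const StateAt& under_test,
                                      std::uint64_t max_step) {
    if (reference(max_step) == under_test(max_step)) return 0;
    std::uint64_t good = 0;          // known to agree
    std::uint64_t bad = max_step;    // known to disagree
    while (bad - good > 1) {
        const std::uint64_t mid = good + (bad - good) / 2;
        if (reference(mid) == under_test(mid)) good = mid;
        else bad = mid;
    }
    return bad;
}
```

A real tool would then diff the full machine state at the returned step to produce a report of the "bit 17 in register 5" kind. The same search applies to checkins instead of simulation steps.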


conventional, hand written

Random. Biased to interesting cases.

"Test" means "checkable". Crash => failure. Consistency check => suspicious.

AFG Q: MP non-determinism.

AFG: one nice aspect of SW DMA would be reproducibility.

Reverse HW execution in HW VLIW debugger started, not finished.

Single step through nested fault handling. A complete debugger is totally transparent.

Fast builds. Check in early and often.

On failure: binary search of checkins.

War stories...

CMS SW allowed working around many hardware bugs.

Not all reg resources were shadowed. Some bugs due to rolling back after non-shadowed state changed. Added rules checker to catch future bugs.
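A toy illustration of that class of bug, assuming a machine with a few shadowed registers and one piece of unshadowed state. All names are hypothetical; this shows the commit/rollback idea, not Crusoe's actual mechanism or its rules checker.

```cpp
#include <array>
#include <cstdint>

// Hypothetical machine state: four shadowed registers plus one mode flag
// that the hardware does NOT shadow.
struct Machine {
    std::array<std::uint32_t, 4> working{};   // speculative values
    std::array<std::uint32_t, 4> shadow{};    // last committed values
    bool mode_flag = false;                   // not shadowed

    void commit()   { shadow = working; }     // make speculative work official
    void rollback() { working = shadow; }     // discard speculative work
};

// The kind of rule a checker might enforce: nothing unshadowed may have
// changed between the last commit and a rollback.
inline bool rollback_is_safe(bool flag_at_commit, const Machine& m) {
    return m.mode_flag == flag_at_commit;
}
```

After a rollback the registers return to their committed values but `mode_flag` keeps whatever the aborted code wrote, so the machine ends up in a state that no committed execution ever produced.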

What Pardo would do differently:

Hardware was bottleneck. Changing ISA was *different* for software teams.

Big projects rules of thumb do not apply to small teams.

Better perf studies from get go. (Threw out 1st 2 CMS.)

More software inspection.


CMS written in C, gcc extension to provide HW access. More control.

Modest amount of assembly. Modest amount. Including modest amount of self modifying VLIW assembly code.

Not MP.

Blogging from ISCA: BIC: MATLAB, CUDA

This is just a placeholder. Must take more notes for Pardo.

Blogging from ISCA: AMAS-BT: Pardo, SMC

Dave Keppel, Pardo, Google: Self Modifying Code

Everyone knows SMC is dead, but SMC is alive in the very tools that complain about it: dynamic optimizers, dynamic linkers, etc.

Pardo talks fast.

Detecting SMC via page protection. Slow if data and code in same page.

BitBLT - recompile every 10K instructions.

Debugger watchpoints. Change immediates in code.

Present in real commercial workloads.

Coherency events:

x86 - none.

Hardware instructions: ISCP "something changed". iflush addr. coherency(base,length).
Poor match between application and simulator/emulator. Need to detect what really changed.

Adaptive: default-write protect. Change strategy if too many faults. Fall back to default after a while.

Self checking strategy: check current ibytes against saved copy of original ibytes.

Pardo noted that invalidated code often reappears - things like debugger watchpoints may change back to original. "Revalidation". Another use for invalid cache entries.
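The self-checking strategy and revalidation can be sketched as follows. This is a simplified illustration with hypothetical names (`Translation`, `translate`, `check`), not the Shade or Crusoe implementation.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical translation-cache entry: remembers the guest instruction
// bytes it was translated from.
struct Translation {
    const std::uint8_t* guest_addr;     // where the guest code lives
    std::vector<std::uint8_t> saved;    // copy of the ibytes at translation time
    bool valid;
};

inline Translation translate(const std::uint8_t* code, std::size_t len) {
    return Translation{code, std::vector<std::uint8_t>(code, code + len), true};
}

// Self check: compare the current ibytes with the saved copy before using
// the translation. An invalid entry is kept rather than discarded, so it
// is revalidated if the original bytes reappear.
inline bool check(Translation& t) {
    t.valid = std::memcmp(t.guest_addr, t.saved.data(), t.saved.size()) == 0;
    return t.valid;
}
```

With a debugger watchpoint that patches an immediate and later restores it, `check` fails while the patch is in place and succeeds again afterwards, with no retranslation.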

Shade. SPARC. Iflush addr. But, there were some applications that did not use iflush, but which worked on real hardware.

Transmeta Crusoe. Subpage write protection.

"Fetch immediates" - translate code, but fetch immediates that might have been patched.

Crusoe: lots of retranslations when falling through the gears.

Deoptimized translators: fetch immediates. Translation calls interpreter.

Bad: BT leads to more implementations, more chances of bugs, reduced test coverage.

Performance stability, lack of. Consistent sometimes better than fast.

SMC/ISC. Q: what does ISC stand for?

Hardware support:

Crusoe: 2 write protect bits per page. Subpage WP cache.

Shade: 100 instructions to translate an instruction.

Gill51 - universal simulator.

Blogging from ISCA: AMAS-BT keynote

I'll be at ISCA the next few days.

Today: AMAS-BT workshop.

Antonio G. keynote.

Unfortunately, Antonio G and I share the same initials, AG. I will refer to him as Antonio, and to myself as me or AFG.

Pollack's Rule: I suppose I should be gratified that one of my laws, perf = sqrt(area), is now widespread. I am somewhat chagrined that my old boss, Fred Pollack, has his name associated with it. He publicized it in some keynotes.

Somehow perf=sqrt(power) has also crept in. And Antonio is multiplying the effects. This was not part of my, or Fred's, formulation. Perf=sqrt(area). We often assume that power, at least leakage, is 1:1 with area, which would imply perf=sqrt(power). But I do not think that this needs to be the case. Leakage may be negligible, and active power also seems to be proportional to sqrt(area). Or even less. This implies that perf=sqrt(active power), or less.

Antonio says that multicore => 1:1 perf increases. 2 cores => 2x parallelism. Q: is this correct? The old rule of thumb is that MP, too, perf=sqrt(#processors).
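The two scaling assumptions can be put side by side in a few lines of C++. This is only arithmetic on the rules of thumb above, with hypothetical function names and everything normalized so that one unit of area gives perf 1.

```cpp
#include <cmath>

// perf = sqrt(area) for a single core.
inline double single_core_perf(double area) { return std::sqrt(area); }

// n cores, each of area total_area / n, assuming perfect 1:1 scaling
// across cores (Antonio's assumption).
inline double multicore_perf_linear(double total_area, int n) {
    return n * single_core_perf(total_area / n);
}

// Same cores, but with the old MP rule of thumb perf = sqrt(#processors).
inline double multicore_perf_sqrt(double total_area, int n) {
    return std::sqrt(static_cast<double>(n)) * single_core_perf(total_area / n);
}
```

With total area 4 and n = 4 these give 2, 4, and 2. Under 1:1 scaling the four small cores double the throughput of the one big core; under the sqrt rule for MP they only match it, since sqrt(n) * sqrt(A/n) = sqrt(A). So the case for multicore rests on how close to 1:1 the parallel scaling really is.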

Antonio makes the EPI = Vdd**2 * Cdyn + Leakage. Handwaves leakage. Says Vdd cannot be lowered. (Me: is this true? Differential signalling?) So argues about Cdyn.

Guest ISA / Host ISA. Me: although part of the story, the real challenge is not BT from ISA to ISA. The real challenges are (a) coming up with a host uarch and ISA that makes sense - that would make sense if compatibility was not a requirement. (b) Minimizing the cost of dynamic instrumentation and optimization.

I.e. the basic host ISA and uarch must make sense, irrespective of the guest ISA.

With an exception: possibly the host ISA is big, with lots of hint bits. The guest ISA may be small and compact. In this case, perhaps the act of binary translation itself helps. The guest ISA may be considered to be a compact form of the host ISA, with just the semantics. The host ISA may be considered to be an expanded form. The host ISA may be considered to be a cache of performance annotations to the guest ISA.

Antonio: memory checkpointing. AFG comment: easier to BT single threaded or message passing programs, harder shared memory.

Antonio: adapting hardware. Resizing, power gating. AFG: hardware can do similar adaptation. Software dynamic must have larger time constants.

Antonio: BT advantages include compatibility, both over time and across different microarchitectures (which he calls scalability). He notes that forward compatibility is especially interesting. E.g. old binaries taking advantage of new hardware features, like longer vector registers.

Pardo asked about soft real time workloads. Variability introduced by dynamic systems.

Friday, June 19, 2009

In airport, forgot PC power supply

Passing through airport security, on my way to ISCA, I realized I had forgotten my work laptop PC's power supply (AC adapter). Typing this on my OLPC XO non-work computer.

I remember packing my power supply and PC, but then unpacked them to print my boarding pass. Must have forgotten to repack the power supply, although I have the laptop.

I wish PC power were more ubiquitous. One of my PCs - which, the Concerto perhaps? - had no separate AC adapter. Instead, AC right inside the laptop. Harder to forget.

Tuesday, June 16, 2009

Error reporting in libraries

It's a perennial problem: You have written a library function, perhaps a header-only class. Usually it is silent. However, occasionally it suffers errors or warnings. The interface does not provide a nice way to return an error code - e.g. you are trying to return a string, not an int where 0 or -1 indicates failure. You want to print error messages or otherwise report errors for these hopefully rare cases.

Special case: assertion failures. If you are writing something like a simulator, with simulator stdout and stderr separate from program-under-simulation stdout or stderr, you probably want to ensure your assertion failures go to the simulator log. This is usually easy, but in Pin, simulator and program stdout and stderr are shared.

Note: we are not just talking about error messages that are followed by death. Sometimes we are talking about warning or informational messages, after which the program should keep running.

If everything belongs to a single class, you might give the class data members for the streams to be used, e.g.

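#include <iostream>
class Foo {
  std::ostream* m_cout = nullptr;  // null means "not initialized": fall back to std::cout
  std::ostream* m_cerr = nullptr;  // null means "not initialized": fall back to std::cerr
public:
  std::ostream& my_cout() { return m_cout ? *m_cout : std::cout; }
  std::ostream& my_cerr() { return m_cerr ? *m_cerr : std::cerr; }
};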

with the usual methods wrapping the data member ostreams.
E.g. to allow you to print to std::cout and std::cerr if not initialized.
But not everything lies in the same object/class.

Passing around ostreams to every possible function is ugly.

Adding ostreams to every possible class is ugly.

I've occasionally passed generic "env" objects to lots of functions, where the env carries things like ostreams, and other good stuff like exit functions.
But this is ugly.

My BKM is beginning to look like ... Cpp (C preprocessor) macros COUT and CERR used everywhere. With COUT defined to be std::cout, or my_cout(), or whatever.

Formalized as a header file COUT_CERR_ostream_redirection.hpp:

// Interface:
// To use std::cout and std::cerr
//     #define USE_STD_COUT 1
//     #include "COUT_CERR_ostream_redirection.hpp"
// This is also the default - if neither USE_STD_COUT nor USE_MY_COUT_FUNCTIONS is defined
// To use my_cout() and my_cerr()
//     #define USE_MY_COUT_FUNCTIONS 1
//     #include "COUT_CERR_ostream_redirection.hpp"
// To use neither
//     #define USE_STD_COUT 0
//     #include "COUT_CERR_ostream_redirection.hpp"
// and then define your own COUT and CERR macros
// If you want to #include this header multiple times
//     #undef COUT_CERR_OSTREAM_REDIRECTION_already_included
// before each later #include

#ifndef COUT_CERR_OSTREAM_REDIRECTION_already_included
#define COUT_CERR_OSTREAM_REDIRECTION_already_included

#include <iostream>

#if !defined(USE_STD_COUT) && !defined(USE_MY_COUT_FUNCTIONS)
#define USE_STD_COUT 1
#endif

#if defined(USE_STD_COUT) && USE_STD_COUT
#undef COUT
#undef CERR
#define COUT std::cout
#define CERR std::cerr
#elif defined(USE_MY_COUT_FUNCTIONS) && USE_MY_COUT_FUNCTIONS
extern std::ostream& my_cout();
extern std::ostream& my_cerr();
#undef COUT
#undef CERR
#define COUT my_cout()
#define CERR my_cerr()
#else
// let user define his or her own COUT and CERR macros
// e.g. #define COUT foo_cout() in module foo, and bar_cout() in module bar
#endif

#endif //COUT_CERR_OSTREAM_REDIRECTION_already_included
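A usage sketch, under the assumption that the simulator log is a single stream shared by output and errors. To keep the example self-contained it defines COUT and CERR inline rather than including the header; `sim_log` and `checked_divide` are hypothetical names, and a real log would more likely be a std::ofstream.

```cpp
#include <iostream>
#include <sstream>

// Stand-ins for what the header provides when my_cout()/my_cerr() are used.
inline std::ostringstream& sim_log() {
    static std::ostringstream log;   // hypothetical in-memory simulator log
    return log;
}
inline std::ostream& my_cout() { return sim_log(); }
inline std::ostream& my_cerr() { return sim_log(); }
#define COUT my_cout()
#define CERR my_cerr()

// Library code never names std::cout or std::cerr directly.
inline int checked_divide(int a, int b) {
    if (b == 0) {
        CERR << "warning: divide by zero, returning 0\n";
        return 0;                    // warn and keep running
    }
    COUT << "divide ok\n";
    return a / b;
}
```

The library function stays silent on the real stdout and stderr; where the messages go is decided in one place, by whoever defines my_cout() and my_cerr().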

Friday, June 12, 2009

Implicit vs. Explicit Filenames

In an earlier post I talked about creating a UNIX command line tool for user level filesystems.

I described primitives such as
    FSinF get fileSrc fileDst
    get contents of fileSrc (in FSinF) and write to fileDst (in native filesystem)

    FSinF put fileSrc fileDst
    put contents of fileSrc (in native filesystem) to fileDst (in FSinF)
I must admit that I was thinking in terms of UNIX like filesystems, where the user specifies the filename. However, there is a more basic sort of filesystem: one where the storage system assigns the name.

My first real encounter with such a system was the MH Mail Handling system. Other mail systems are similar: when you save or file an email, you don't have to give it a name. MH gave it a number, sequentially incrementing. Outlook doesn't even do that, although there is probably a message ID somewhere. Apple iTunes' music manager is similar. The user typically looks at a listing of the files in a folder, using metadata that may be extracted from the file, or otherwise associated. In other words, there is a browse/search/select interface. It is harder to automate than a simple filename interface, because uniqueness is harder to guarantee for a query selection.

Content based filesystems are similar. E.g. git, where the name is the hash.

Some content based filesystems cannot handle conflicts - two different files with the same hash. Git seems to have little provision for conflicts, although I have been told by Linus that it handles them. My own work in content based filesystems handles hash conflicts - the filename is the hash, with a version number to handle conflicts. The comparison to handle conflicts can be avoided, at the risk of data loss (such as occurs in other content based filesystems).

In any case: for a strictly hash based filesystem that does not handle conflicts, the user could calculate the hash outside and install it using a filename interface. However, for a content based filesystem such as mine, only the filesystem can return the filename.

This suggests an interface:
    fileDst := FSinF put fileSrc
    put contents of fileSrc (in native filesystem)
    into fileDst (in FSinF). FSinF specifies the filename.
where the storage system (FSinF) returns the filename.

Call these two types of filesystem implicitly versus explicitly named filesystems.

Obviously, an implicit filesystem can be built on top of an explicit filesystem. The wrapper calculates the names.

An explicit filesystem can be built on top of an implicit filesystem, but it needs bootstrapping: one file that can be located without knowing the implicit name, that contains the mappings of explicit to implicit names.
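A small in-memory sketch of the two kinds of store, with the explicit one layered on the implicit one. Everything here is hypothetical: the class names, the "<hash>.<version>" naming scheme, and the use of std::hash as a stand-in for a real content hash. In a real system the mapping table would itself be a file that can be found without knowing its implicit name; here it is just a member.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Implicitly named store: put() chooses and returns the name.
// The version suffix separates different contents that share a hash.
class ImplicitStore {
public:
    using Hash = std::function<std::size_t(const std::string&)>;
    explicit ImplicitStore(Hash h = std::hash<std::string>()) : hash_(h) {}

    std::string put(const std::string& content) {
        const std::string base = std::to_string(hash_(content));
        for (int version = 0;; ++version) {
            const std::string name = base + "." + std::to_string(version);
            auto it = files_.find(name);
            if (it == files_.end()) { files_[name] = content; return name; }
            if (it->second == content) return name;   // already stored
            // same hash, different content: try the next version
        }
    }
    const std::string& get(const std::string& name) const { return files_.at(name); }

private:
    Hash hash_;
    std::map<std::string, std::string> files_;
};

// Explicitly named store on top: one table maps user-chosen names to
// store-chosen names.
class ExplicitStore {
public:
    void put(const std::string& user_name, const std::string& content) {
        names_[user_name] = store_.put(content);
    }
    const std::string& get(const std::string& user_name) const {
        return store_.get(names_.at(user_name));
    }

private:
    ImplicitStore store_;
    std::map<std::string, std::string> names_;   // the bootstrap mapping
};
```

The content comparison inside put() is what makes conflicts safe. Skipping it and always returning version 0 would give the "hash is the name" behaviour, with the data loss risk noted above.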

Version Control - File Branches and Directory Tree Branches

You have file version objects. These are the contents of a file at some particular point in time.

You have file objects.

I know that file objects are not necessarily the right thing. E.g. Linus' objection, that it is better to have tools that can track pieces of code around, and automatically figure out, e.g. when a file has been renamed or a function has been moved between files. E.g. the old Cray and IBM version control systems that treated entire projects as card decks, and could version particular regions or intervals of cards. But these just say that there should be provision for subobjects at finer granularity than file objects - line objects, function objects, expression objects? Character objects? When I start thinking of XML as source code, such things become natural. Nevertheless, for now, file objects are a natural unit.

There are file name objects.

File name objects are grouped into directory tree objects. (We'll probably just say tree objects.) There obviously should be no conflicts - no two file objects should have the same pathname within a tree object. If a given file object appears at two different places in a directory tree, they are logically linked. We may want these links to be hard links (a change to one filename is reflected in a change in the others), or copy-on-write links (changes break the links). Soft links, such as UNIX symlinks, are different sorts of file objects.

It is possible to traverse from file version objects to file objects, and hence to filename objects and directory tree objects. And vice versa.

File objects may be versioned. Obviously.

File objects may have branches or lines. Lists of file version objects. With description of intent: if "on" branch1 for fileA, a new checkin of fileA is made to branch1, extending branch1.

It is possible to go from any particular file version object to file objects, and to all file object branches or lines associated with the file object.

File object branches or lines are associated with individual file objects. (Often, file name objects.) Usually they are not visible to users, since users normally care about multiple files, i.e. directory tree objects.

Directory tree objects may have branches or lines.

Directory tree branches or lines may refer to particular file object branches or lines. This is an indication of *intent*. It says that any new checkins on those file object branches or lines should be included in the directory tree branch or line. After appropriate testing.

Directory tree branches or lines may refer to particular directory tree file version sets. Which are, of course, objects themselves. A directory tree version set is a set of filename objects, file objects, and file version objects. I.e. a directory tree file version set object is a particular checkpoint of file versions.

Directory tree branches refer to sets or sequences of directory tree file version set objects. Typically arranged in a directed graph. However, where in git branches are really just these tree objects, which point to each other, in my ideal version control system directory tree branch objects are separate from and contain additional information beyond the directory tree file version set objects.

The directory tree file version set objects are just particular points or contours of file objects. The file objects themselves have their own lines or branches. File object versions that belong to file object branches that are part of a directory tree object, but which have not yet been incorporated into a directory tree file version set object for that branch, can be said to be EXTRAPOLATIONS of the branch. Similarly, file versions (say file.v2) that belong to file branches that are between other versions file.v1 and file.v3, where file.v1 and file.v3 belong to file version set objects on the branch, but where file.v2 does not, can be said to be INTRAPOLATIONS of the branch. Logically, such intrapolated file versions belong to the directory tree's history, but they are not first class. It may be useful to know about them, so that you can go back to them while trying to figure out where a bug was introduced. But they do not necessarily belong to a consistent set or contour, as reflected in the file version set object.
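
To make the extrapolation / intrapolation distinction concrete, here is a small sketch for a single file branch. The function name classify and the label strings are hypothetical. How to label versions older than the first checkpoint is not covered by the definitions above, so the "predates-branch" label is purely my assumption.

```python
def classify(file_branch, checkpointed):
    """Label each version on a file branch relative to a directory tree branch.

    file_branch:  versions of one file, oldest first.
    checkpointed: versions of that file that appear in some file version set
                  on the directory tree branch.
    """
    positions = [i for i, v in enumerate(file_branch) if v in checkpointed]
    first = positions[0] if positions else None
    last = positions[-1] if positions else -1

    labels = {}
    for i, version in enumerate(file_branch):
        if version in checkpointed:
            labels[version] = "checkpointed"
        elif i > last:
            # Newer than anything the tree branch has incorporated so far.
            labels[version] = "extrapolation"
        elif first is not None and i > first:
            # Sits between two incorporated versions but was never incorporated.
            labels[version] = "intrapolation"
        else:
            # Assumption: older than the first checkpoint on this tree branch.
            labels[version] = "predates-branch"
    return labels


labels = classify(["v1", "v2", "v3", "v4"], checkpointed={"v1", "v3"})
```

Here v2 is an intrapolation (useful when bisecting for a bug, but never part of a consistent contour) and v4 is an extrapolation (a candidate for the next checkpoint).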

Directory tree branches or lines may be bound to different file object branches or lines at different times. Such varying tree/file branch or line bindings should be recorded historically. Notice that there is a difference between current bindings, the bindings at any particular point in time, and the set of all such.

Branches or lines are necessarily history objects. There is a distinction between the branch, and the branch head.

In git, branches seem mainly to be pointers between objects or tree versions. I want branches to be first class. git gives examples of rebasing, followed by garbage collection. I want the excursions and backtracking to be recorded in the tree history. Of course, it should be possible to change history - to go v1 -> v2 -> v3 -> v4 / backtrack, and get a pruned history v1 -> v2 -> v3 -> v5. But I want also to include
  • v1 (time t1)
  • v1->v2 (time t2)
  • v2->v3 (time t3)
  • v3 was backed out, and reverted to v2 (time t4)
  • v3->v4 (time t5)
Note that if the file object branch or directory tree object branch is itself a file, then the slightest change in history will create a new version. (A new file object branch version file.) We can expect that they will pack nicely.
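
One way to picture a first class branch is as an append-only log of head movements, with the pruned history derived from the log rather than replacing it. The sketch below is my own illustration: the Branch class, its method names, and the event tuples are hypothetical, and the version sequence in the usage is a simplified one chosen for the demo.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    """A branch that keeps every head movement, not just the current head."""
    name: str
    events: list = field(default_factory=list)  # (time, action, version)

    def advance(self, time, version):
        self.events.append((time, "advance", version))

    def back_out(self, time, to_version):
        self.events.append((time, "back_out", to_version))

    @property
    def head(self):
        return self.events[-1][2] if self.events else None

    def pruned(self):
        """Linear history with backed-out excursions dropped."""
        line = []
        for _, action, version in self.events:
            if action == "back_out":
                # Drop everything after the version we reverted to.
                line = line[: line.index(version) + 1]
            else:
                line.append(version)
        return line


b = Branch("main")
b.advance(1, "v1")
b.advance(2, "v2")
b.advance(3, "v3")
b.back_out(4, "v2")  # v3 turned out to be a mistake
b.advance(5, "v4")
```

The pruned view is what a rebase-and-garbage-collect workflow would leave behind. The events list is what I actually want kept: it still knows that v3 existed and when it was backed out.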

What does it mean to have a subtree branch? The subtree branch is just a directory tree branch that does not contain all of the (current) file objects. Because checkins to subtree branches automatically are made to individual file object branches, they automatically become candidates for inclusion in other directory tree branches that refer to that file object branch.

Branching is hard or soft. Hard branches actually break the default, implicit, connection with other directory tree objects sharing a file object branch. (Paradoxically, this is almost exactly the opposite meaning as hard links.) Directory tree branches may share file object branches in a soft manner: changes to soft shared branches are candidates for inclusion, i.e. they are extrapolated, in other directory tree branches sharing that file object branch.

Directory tree branches or lines will probably be named.

File object branches or lines probably usually do not have names. They are made as side effects of the user visible directory tree branches or lines. E.g. if a directory tree branch declines to accept a file version object that was checked in on a soft shared file object branch, then, implicitly, a new branch is probably going to be created for that file object. We don't want the user to have to create such names. Creating names is hard work. We probably want to annotate the file version objects and file branch objects with the directory tree branch or lines that led to their creation - also, for that matter, which directory tree branches incorporate them - but we do not want to use this in the naming of the file object branch, since it would incorrectly imply that the file object branch belongs exclusively to that directory tree object branch. Which is not the case. A file object branch belongs to many directory tree branches or lines, especially when there is soft sharing (extrapolation or intrapolation).

Directory tree objects may be logically assembled out of other directory tree objects; and/or directory tree object branches may be logically assembled out of other directory tree object branches. Probably with simple assembly rules: no conflicts, or, override on a per-file-object basis, or, override on a per subtree basis, or ...

Although directory trees may be assembled this way, it may be wisest to flatten this information for recording into a directory tree file version set. This way we avoid revising history on one branch affecting the history recorded on others that include that branch. However, we probably want to record the overlaying history.

This flattening is rather like AMD's build-list.
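
A sketch of the assembly and flattening, covering only the "no conflicts" and "override on a per-file-object basis" rules. The function name flatten, the on_conflict parameter, and the path and version strings in the usage are all hypothetical. The provenance map is a stand-in for recording the overlaying history.

```python
def flatten(layers, on_conflict="error"):
    """Overlay directory trees (path -> file version) into one flat version set.

    layers:      list of (layer_name, tree) pairs, lowest priority first.
    on_conflict: "error" (no conflicts allowed) or "override" (a later layer
                 wins, file by file).
    Returns (flat_tree, provenance), where provenance maps path -> layer_name.
    """
    flat, provenance = {}, {}
    for layer_name, tree in layers:
        for path, version in tree.items():
            if path in flat and flat[path] != version and on_conflict == "error":
                raise ValueError(
                    f"{path}: {provenance[path]} and {layer_name} disagree"
                )
            flat[path] = version
            provenance[path] = layer_name
    return flat, provenance


lib = {"lib/util.c": "util.v3", "README": "readme.v1"}
app = {"src/main.c": "main.v7", "README": "readme.v2"}
flat, provenance = flatten([("lib", lib), ("app", app)], on_conflict="override")
```

Because flat is a plain path-to-version table, later changes to the lib branch cannot silently rewrite what this checkpoint contained.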

Tags are normally applied to directory tree file version set objects.

Tags may also be applied to subtree objects or file version objects. We should distinguish "whole" and "partial" tags.

Tags applied to branch objects are, naturally enough, good candidates for branch names.

Tags themselves are versioned. E.g. "This tag XYZZY applied to ... at time t0, and ... at time t1".
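
A versioned tag can be sketched as a small time-indexed table. The class VersionedTag, its method names, and the target strings in the usage are hypothetical; times are plain numbers for simplicity.

```python
import bisect


class VersionedTag:
    """A tag that remembers what it pointed at over time."""

    def __init__(self, name):
        self.name = name
        self._times = []    # sorted application times
        self._targets = []  # target applied at the matching time

    def apply(self, time, target):
        i = bisect.bisect_right(self._times, time)
        self._times.insert(i, time)
        self._targets.insert(i, target)

    def at(self, time):
        """Return what the tag referred to at the given time, or None."""
        i = bisect.bisect_right(self._times, time)
        return self._targets[i - 1] if i else None


tag = VersionedTag("XYZZY")
tag.apply(0, "treeset-a")
tag.apply(10, "treeset-b")
```

With this, tag.at(5) answers "what did XYZZY mean at time 5?" without losing the fact that it was later moved.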


What data structure? Files? I keep coming back to relational tables - albeit ones with many constraints. But my usual objections to RDBMSes apply.

Wednesday, June 10, 2009

5 seconds to file mail and move to next in Outlook

Reading a lot of email, and immediately filing and moving on to next.

The command is accepted quickly enough, but it is taking about 5 seconds to move on to the next email.

During that 5 seconds, I have the subject line of the new email, but the contents of the old. Several times I have wrongly handled mail because I thought the contents were for the old subject line, not the new.

Outlook. Specifically Outlook 2007.

It would not be so bad if I were locked out for 5 seconds while filing the old and fetching the new - if, for example, there was some visual indication such as graying the old and new messages in the subject line summary, displaying the old message grayed in the preview pane, etc.

And if new interactive commands were locked out until the display was consistent.

And if the time delay were consistently, say, 5 seconds. But with it randomly varying between 3 and 10 seconds, it is hard to know when to trust what I see on my screen. I am finding myself counting seconds every time I go to a new screen, to give it a chance to become consistent.

Tuesday, June 09, 2009

More Git


Friday, June 05, 2009

git slowness

Lots of coding in my checkin early and often style - I'm actually coding, rather than spending most of my time trying to figure out how somebody else's code works.

Unfortunately, git is becoming a real performance bottleneck. Especially compared to CVS.

Reason: I'm trying to use a single big repository. Logically there are many separate subprojects, but I found that using separate git repos was a real hassle.

In CVS you can restrict the repository scans to a subtree. Apparently not so in git. Or, at least, I don't know how.

Full repo scans are great, but have a perf cost.