CompArch: "Hint instructions"
Oftentimes the hardware, the processor (or memory subsystem, or ...) microarchitecture,
wishes to know information that the programmer or compiler might already know.
Similarly, oftentimes the compiler wishes to know something that the programmer might already know.
Conversely, oftentimes the compiler wishes to know something that the hardware might already know,
if the program has already been executed several times. This last is my cute way of trying to fold
[[profiling feedback]] into this discussion.
Such questions might include:
* "Is this conditional branch more likely to be taken, or not?"
* "Is this load likely to forward from an earlier store, and if so, which?"
* "Is this load likely to be a cache miss?"
TBD: create a canonical list of operations for which hints have been proposed.
= Hints to compiler =
Computer architecture folklore, that probably has a basis in fact (TBD, get reference):
the original FORTRAN compiler allowed the programmer to say which alternative (of its 3-way arithmetic IFs) was most likely.
I.e. it allowed the programmer to give a hint to the compiler.
The story goes on to say that the human programmers were usually wrong.
In any case, there are often ways of conveying hints to the compiler. #pragmas are one example.
[[Profiling feedback]] is another.
Having received a hint, the compiler can change code generation, e.g. the order of basic blocks. Or it might choose to emit the [[Hint instructions]] to hardware described here.
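As one concrete instance of a programmer-to-compiler hint, GCC and Clang provide the builtin <code>__builtin_expect</code>. The sketch below is only an illustration: the <code>LIKELY</code>/<code>UNLIKELY</code> macro names and the <code>sum_nonnegative</code> function are made up for this page, and what the compiler does with the hint (block reordering, a hardware hint, or nothing) is up to the compiler.

```c
#include <stddef.h>

/* __builtin_expect is a GCC/Clang builtin; other compilers get a no-op. */
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* Hypothetical example: sum the non-negative entries of v.
 * The programmer asserts that negative entries are rare. */
long sum_nonnegative(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(v[i] < 0))
            continue;          /* hinted as the cold path */
        total += v[i];
    }
    return total;
}
```

The hint changes only performance, never the result: removing the macro leaves the function's value unchanged.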
GLEW OPINION: hints from human to compiler should allow at least two representations: (1) inline, and (2) separate.
It should be possible to provide such hints in the source code, as close as possible to where they apply:
#pragma IF-THEN probability = 90%
IF ... THEN
I am agnostic as to whether the hints or #pragmas should be before the statement, or inside the statement
IF ... THEN
#pragma IF-THEN probability = 90%
#pragma IF-THEN probability = 10%
However, it should be noted that designing a pleasant and unambiguous set of #pragmas is a hard language design issue in the first place.
I am NOT agnostic about the following: there should be at least two representations: (1) inline, and (2) separate.
Sometimes the programmer does not have the ability to modify the source code.
Then it is necessary to be able to specify the hints separately,
e.g. on the command line
cc -pragma "foo.cc line 22 IF 90%" foo.cc
or in a separate file
foo.cc line 22 /if( c1 < c2 )/ probability 90%
bar.cc line 19 /.../ probability 10%
It is, of course, clumsy to specify the location where hints should apply in a separate file.
Line numbers, possibly with regexps to apply when the line numbers change by editing, may be required.
But it is necessary.
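To make the separate-file form concrete, here is a minimal sketch of parsing one line in the illustrative syntax used above (file, line number, a /regexp/, and a probability). The <code>struct branch_hint</code> layout, the field sizes, and the function name are assumptions made for this sketch, not any existing tool's format.

```c
#include <stdio.h>

/* One out-of-line hint record (hypothetical layout). */
struct branch_hint {
    char file[64];        /* source file the hint applies to */
    int  line;            /* line number at the time the hint was written */
    char pattern[128];    /* text between the slashes, used to re-find the line */
    int  probability_pct; /* 0..100 */
};

/* Parse e.g.  foo.cc line 22 /if( c1 < c2 )/ probability 90%
 * Returns 1 on success, 0 on a malformed line.
 * Limitation of this sketch: the pattern may not itself contain '/'. */
int parse_branch_hint(const char *text, struct branch_hint *out)
{
    int got = sscanf(text, "%63s line %d /%127[^/]/ probability %d%%",
                     out->file, &out->line, out->pattern,
                     &out->probability_pct);
    if (got != 4)
        return 0;
    if (out->line <= 0 || out->probability_pct < 0 || out->probability_pct > 100)
        return 0;
    return 1;
}
```

A real tool would then have to decide what to do when the line number and the pattern disagree after an edit, which is exactly the clumsiness noted above.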
= Hint Instruction Set Embeddings =
How can hints be embedded in the instruction set?
* Fields of non-hint instructions
** CON: takes bits away from instruction encodings that are already tight.
** e.g. BRANCH-AND-LINK, with coroutine versus call semantic hints
* Separate HINT instructions
** CON: takes instruction fetch bandwidth
** PRO: more bits available, and does not waste bits if hints are not used
* Separate HINT instruction stream
:: Just as I recommend that compiler hints can be placed inline versus out of line, it has been proposed by some researchers to have hints, and possibly other operations, in a separate instruction stream, paralleling the normal instruction stream. Just as with compiler hints in a separate file, issues arise as to how to establish the correspondence - typically by IP.
* Hint prefixes
:: On a variable length instruction set like x86, certain prefixes can be applied to existing instructions. If the prefix is originally a NOP, it can be used to provide HINT semantics. Similarly suffixes, and other modifiers.
** PRO: the hint cost is paid only by the hint user
** CON: variable length instructions and prefixes are a pain to decode
See [[Call skip and hints]] for a proposal to hide hint instructions in the shadow of unconditionally taken branches, to reduce overhead with less complexity than a hint instruction stream.
Another intermediate form is to have hint instructions in the instruction stream, but in a form that allows them to be hoisted out of loops and far away.
E.g. instead of
HINT: next loop branch is taken 32 times
One might do
HINT: loop branch at IP=label is taken 32 times
label: jnz loop
This makes the trigger for the hint more complicated, but reduces the instruction decoding pressure.
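The more complicated trigger can be sketched as a small table keyed by the branch's IP: the hoisted hint installs an entry, and the predictor consults it when the branch is later fetched. This is a toy software model, not a description of any real predictor; the table size, the direct-mapped indexing, and all names are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define HINT_SLOTS 4  /* arbitrary size for the sketch */

struct loop_hint {
    uint64_t branch_ip;   /* IP named by the hint */
    uint32_t taken_left;  /* remaining "taken" predictions */
    bool     valid;
};

static struct loop_hint hint_table[HINT_SLOTS];

/* Models "HINT: loop branch at IP=branch_ip is taken count times". */
void hint_install(uint64_t branch_ip, uint32_t count)
{
    struct loop_hint *slot = &hint_table[branch_ip % HINT_SLOTS];
    slot->branch_ip = branch_ip;
    slot->taken_left = count;
    slot->valid = true;
}

/* Predict the branch at ip; use fallback when no hint matches. */
bool predict_taken(uint64_t ip, bool fallback)
{
    struct loop_hint *slot = &hint_table[ip % HINT_SLOTS];
    if (!slot->valid || slot->branch_ip != ip)
        return fallback;
    if (slot->taken_left == 0) {
        slot->valid = false;  /* hint used up: predict the loop exit */
        return false;
    }
    slot->taken_left--;
    return true;
}
```

Because the entry is matched by IP rather than by position, the hint instruction itself can sit outside the loop and is decoded once instead of on every iteration.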
= Hint Instruction Set Encodings =
Probably the most important principle in instruction set design is to trap undefined encodings, so that they can be used for new instruction set extensions on new machines, and so that those new instructions can be trapped and emulated on old machines.
However, a slightly less important principle is to reserve, sometimes, multiple flavors of NOP instructions,
and, less frequently, NOP operands.
Such NOPs can be reserved for hintability in future processors,
but are treated as NOPs by present processors.
;Anecdote, or, Giving Credit:
Although the principle of reserving NOP HINT instruction set encodings was known during Intel P6,
Ronny Ronen of Intel Israel took the next step:
he realized that we could take several existing undefined instructions that took a #UD trap,
and redefine them to be NOPs reserved for future HINTs.
Although these new NOPs would have to be trapped and emulated,
several years later most of the marketplace would be full of machines that treated them as NOPs
- and at THAT future time they could be re-defined as HINT instructions,
emulated efficiently as NOPs on older machines,
and only trapped slowly on really old machines no longer common in the marketplace.
Note that it is important that, although treated as NOPs by the hardware, the compiler be discouraged from using them,
from using any but the canonical NOP.
Should the compiler or programmer start using the different flavors of nop,
let's call them NOP_HINT_0 ... NOP_HINT_15 or similar,
to convey information (e.g. information to a human viewing an instruction trace in a debugger),
then it may be difficult to grant them specific hint behavior in the future.
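The three generations in the anecdote above can be summarized as a toy decode function. The opcode range is invented purely for illustration (it is not a real encoding), and the generation names are labels for this sketch only.

```c
/* What a processor does with one of the reserved encodings. */
enum action { ACT_TRAP_UD, ACT_NOP, ACT_HINT, ACT_OTHER };

/* Three processor generations, as in the anecdote. */
enum generation {
    GEN_OLD,          /* encoding still undefined: #UD trap, slow emulation */
    GEN_RESERVED_NOP, /* encoding redefined as a NOP reserved for hints */
    GEN_HINT_AWARE    /* encoding finally given real hint behavior */
};

/* Made-up opcode range standing in for NOP_HINT_0 ... NOP_HINT_15. */
#define HYPOTHETICAL_HINT_BASE  0x100u
#define HYPOTHETICAL_HINT_COUNT 16u

enum action decode_hint_slot(enum generation gen, unsigned opcode)
{
    if (opcode < HYPOTHETICAL_HINT_BASE ||
        opcode >= HYPOTHETICAL_HINT_BASE + HYPOTHETICAL_HINT_COUNT)
        return ACT_OTHER;               /* not one of the reserved slots */

    switch (gen) {
    case GEN_OLD:          return ACT_TRAP_UD;
    case GEN_RESERVED_NOP: return ACT_NOP;
    case GEN_HINT_AWARE:   return ACT_HINT;
    }
    return ACT_TRAP_UD;                 /* unknown generation: be conservative */
}
```

The same binary runs correctly on all three generations; only the cost (trap, nothing, or a useful hint) differs.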
= Prefetches and Hints =
Are prefetch instructions hints? It depends.
If prefetch instructions can be ignored, yes, then they are hints. You might choose to ignore a prefetch instruction if some other prediction mechanism overrides it.
If prefetch instructions have side effects, such as setting [[access or dirty bits in page tables]], or even taking [[page faults]],
then it is questionable whether prefetch instructions are hints.
If these side effects are optional, not required, then even such a [[prefetch instruction with architecturally visible side effects]] (*) can be considered a hint. If these side effects are required, then the prefetch instruction really has significant non-hint consequences.
: * Note: WikiWish: I would like to have [[here labels]] - the ability to define a term inline, saying "here is the definition of a term", using some notation like double curly braces used to define wikilinks. Without all of the formality of creating a wiki page or section. Sometimes the best definitions are given in context, rather than broken out.
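For the ignorable kind of prefetch, GCC and Clang expose <code>__builtin_prefetch</code>, which the GCC documentation describes as not faulting on an invalid address. A minimal sketch follows; the macro name, the function, and the prefetch distance are assumptions of this example, and the distance in particular is a guess rather than a tuned value.

```c
#include <stddef.h>

/* __builtin_prefetch(addr, rw, locality) is a GCC/Clang builtin.
 * rw = 0 means prefetch for read; locality = 3 means keep in all cache levels. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 3)
#else
#define PREFETCH_READ(p) ((void)0)
#endif

#define PREFETCH_DISTANCE 16  /* elements ahead; an untuned guess */

/* Sum an array, asking for data a few iterations ahead of its use. */
long sum_with_prefetch(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            PREFETCH_READ(&v[i + PREFETCH_DISTANCE]);
        total += v[i];
    }
    return total;
}
```

Deleting the prefetch line, or building with a compiler where the macro expands to nothing, yields the same sum. That is what makes this flavor of prefetch a hint.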
= Branch Prediction Hints =
== Call indirect default target hints ==
* [[Call skip and hints]] and [[How_to_use_the_extra_operands_of_indirect_call_instructions#CALL-INDIRECT-WITH-SKIP the discussion that led to it]]
= Security and Reliability are NOP HINTs =
It is possible to implement security and reliability instruction set extensions as NOP HINTs on older machines.
For example, Milo Martin's HardBound bounds checking, which detects buffer overflow bugs.
Think about it: for a correctly written program with no buffer overflow bugs,
it should be possible to disable all such security checks, and still have the program run correctly.
Only incorrect programs would break if the security checks became NOPs.
(Or, malicious programs would break in.)
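A software analogy, offered only as a sketch: a bounds check that can be compiled away, much as <code>assert</code> is under <code>NDEBUG</code>. This is not HardBound's mechanism; the <code>BOUNDS_CHECKS_ENABLED</code> switch and the <code>checked_store</code> function are hypothetical names for illustration.

```c
#include <stddef.h>

/* Build-time switch standing in for "this machine implements the check". */
#ifndef BOUNDS_CHECKS_ENABLED
#define BOUNDS_CHECKS_ENABLED 1
#endif

#if BOUNDS_CHECKS_ENABLED
#define BOUNDS_CHECK(idx, len) ((idx) < (len))
#else
/* The "older machine": the check is a NOP and always passes. */
#define BOUNDS_CHECK(idx, len) ((void)(idx), (void)(len), 1)
#endif

/* Store value at base[idx]. Returns 0 on success, -1 if the check fires. */
int checked_store(int *base, size_t len, size_t idx, int value)
{
    if (!BOUNDS_CHECK(idx, len))
        return -1;
    base[idx] = value;
    return 0;
}
```

With the check disabled, every in-bounds call behaves identically; only an out-of-bounds call, i.e. a buggy or malicious one, can tell the difference.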
Like Milo Martin's HardBound bounds checking that detects buffer overflow bugs,
which is itself a mechanism for improving software reliability,
various RAS instruction set features can be treated similarly,
such as "SCRUB ECC FROM CACHE LINE".
= Memory Ordering =
On a sequentially consistent memory ordering model, synchronization fences are hints or nops.
However, on weaker memory ordering models, they are not.
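To illustrate with standard C11 atomics (the <code>publish</code>/<code>try_consume</code> names and the message-passing scenario are assumptions of this sketch): the two fences below are what make the payload visible to a consumer on a weakly ordered machine. On hardware that is already sequentially consistent they need no ordering instruction, although the compiler must still not reorder across them.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* ordinary, non-atomic data */
static atomic_bool ready;  /* flag that publishes the payload */

/* Producer: write the data, then raise the flag. */
void publish(int value)
{
    payload = value;
    /* Release fence: the payload write may not be reordered after the flag store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Consumer: returns true and fills *out once the flag has been seen. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    /* Acquire fence: the payload read may not be reordered before the flag load. */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

On a weaker model, dropping either fence makes this a data race, so there the fences are clearly not hints.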