[[Category:Virtual Memory]]
= The Annoying Quote =
One occasionally hears (and reads in computer architecture textbooks) statements such as
"Instruction set XXX can access N memory locations in a single instruction, so therefore requires a TLB with at least N-way associativity."
This statement is wrong in several ways, both as an underestimate and as an overestimate.
It reflects ignorance of out-of-order processors, and even for in-order processors it reflects ignorance of other implementations.
= Basic Assumption =
The basic assumption begind this statement is something like this:
Assume you have an instruction whose operation may be described in pseudocode or microcode as:
tL1 := load(M[A1])
...
tLN := load(M[AN])
tC1 := f1(tL1..tLN)
...
tCN := fN(tL1..tLN)
store( M[A1] := tC1 )
...
store( M[AN] := tCN )
Assume that you have a [[restart versus resume instruction exception|restart and not a resume instruction fault architecture]] - i.e. assume that you must access all of M[A1] .. M[AN], eventually, without a fault or TLB miss.
Then you need N TLB entries to hold all of the [[translation]]s for M[A1]..M[A2].
Or, equivalently, assume that you are not allowed to take a fault or TLB miss in the "[[commit phase]]" of the instruction.
Then, once again, you need N TLB entries to hold all of the [[translation]]s for M[A1]..M[A2].
Sounds simple, eh?
= Out-of-order with Speculative TLB miss handling =
This betrays an in-order mindset. It does not necessarily work on an [[out-of-order]] machine that does not block on a TLB miss,
i.e. which can perform ALU operations, memory references, and TLB misses out-of-order.
It doesn't work because, even though an operation such as load, store, or a [[tickle]] in the pseudocode or microcode for an instruction may load an TLB entry,
this TLB entry may be thrashed out of the TLB by (a) a later operation in same instruction, (b) an earlier operation in the same instruction (remember, out-of-order), or (c) a TLB use from a different instruction (remember, out-of-order and non-blocking: other instructions may be executing at the same time).
In particular, not that there is northing that says that the operations within a single instruction will be performed in-order --- and, indeed, on an out-of-order machine like the Intel P6 where the microinstructions within an instruction were performed out-of-order - so you can't necessarily make assumptions about the order of accesses, how they will affect LRU, etc.
== Kluges to Make It Work ==
* Allow out-of-order between instructions, but impose ordering restrictions within an instruction - e.g. by implementing every instruction with a strictly in-ordr state machine.
* TLB misses in-order, at commit time
= Other Implementations =
== Save Translations ==
Allow a "translation" to be saved in a register.
tT1 := save_translation_and_permission(M[A1])
...
tTN := save_translation_and_permission(M[AN])
tL1 := load_using_saved_translation_and_permission(M[phys_tr=tT1])
...
tLN := load_using_saved_translation_and_permission(M[phys_tr=tTN])
tC1 := f1(tL1..tLN)
...
tCN := fN(tL1..tLN)
store_using_saved_translation_and_permission( M[phys_tr=tT1] := tC1 )
...
store_using_saved_translation_and_permission( M[phys_tr=tTN] := tCN )
Issues:
* such a "saved translation" should contain not only the physical address corresponding to a virtual address, but also permissions.
* it is relatively easy to provide such an operation to microcode. It is harder to make it available to software.
** it obviously cannot be provided to user code
** virtualizing such a saved translation may be a challenge - even the OS may not be allowed to see the true physical address or permissions
== Who Cares? So What? ==
One could take the viewpoint of "Who Cares?": allow multiple TLB misses in the same instruction, don't restart.
Issue:
* there arises the possibility of [[intra-instruction translation inconsistency]] - different accesses to the same virtual address may receive different translations, different physical addresses, or, perhaps worse, a check may pass but a subsequent access fault.
Again, one may say "Who cares? The OS should not be changing translations while an instruction may be in flight."
But
* Saying that the OS should not do something does not always mean that it will
* An OS implemented on top of a VMM may lead to issues: the VMM may not be tracking where the OS keeps its page tables
* While this strategy may be acceptable, it may have lousy performance because of the necessity of stopping multiple processors for a [[TLB shootdown]] while changing a translation, e.g. in a [[page table]] in memory.
= Conclusion =
Saying
"Instruction set XXX can access N memory locations in a single instruction, so therefore requires a TLB with at least N-way associativity."
* underestimates the TLB entries required for an out-of-order processor with non-blocking TLB miss handling
* overestimates the TLB entries required for several reasonable implementation strategies, such as "Save Translations" and "Who Cares?"
More accurately, one might say
"Instruction set XXX can access N memory locations in a single instruction,
and for a certain set of microarchitecture assumptions
may require a TLB with at least N-way associativity."
But there are other ways...
No comments:
Post a Comment