Disclaimer

The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Monday, April 18, 2011

Potential customers for tagged memory

http://semipublic.comp-arch.net/wiki/Potential_customers_for_tagged_memory

There are many potential customers for [[tagged memory]] or [[memory metadata]].
I have long maintained a list (or, rather, lists, since I have had to recreate these lists on 3 occasions)
of uses for tagged memory.
Here is a list that is by no means complete (previous versions of this list have exceeded 20 items).

Which of these uses should be allocated the tags?
Obviously, my favorite idea...

Glew opinion: tagged memory should be microarchitecture, not macroarchitecture. If there are physical tags, it is a good idea to be able to use them to speed things up. But there should always be a fallback proposal that does not depend on having main memory tags. See Milo Martin's Hardbound for an example of how to have metadata in memory, without tags.
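
As an illustration of metadata in memory without tags: Hardbound keeps bounds metadata for pointers in a shadow region of ordinary memory, looked up by the address at which the pointer itself is stored. Below is a minimal C sketch of that general idea - not Hardbound's actual layout or interface; all names, sizes, and the direct-mapped shadow table are hypothetical simplifications.

 /* Shadow-table metadata for pointers: no physical tag bits, just a second
  * region of ordinary memory indexed by where the pointer is stored. */
 #include <stdint.h>
 #include <stddef.h>
 #include <assert.h>
 
 typedef struct { uintptr_t base, limit; } bounds_t;
 
 #define SHADOW_ENTRIES (1u << 20)            /* toy size; a real scheme covers all of memory */
 static bounds_t shadow[SHADOW_ENTRIES];
 
 static size_t shadow_index(void **ptr_slot) {
     /* one metadata entry per pointer-sized word of ordinary memory */
     return ((uintptr_t)ptr_slot / sizeof(void *)) % SHADOW_ENTRIES;
 }
 
 /* conceptually done by hardware on every store of a pointer */
 static void store_ptr(void **slot, void *p, bounds_t b) {
     *slot = p;
     shadow[shadow_index(slot)] = b;          /* metadata written alongside the pointer */
 }
 
 /* conceptually done by hardware on every load of a pointer */
 static void *load_ptr(void **slot, bounds_t *b_out) {
     *b_out = shadow[shadow_index(slot)];     /* metadata travels with the pointer */
     return *slot;
 }
 
 /* the check performed when the pointer is dereferenced */
 static void check_access(const void *p, bounds_t b) {
     assert((uintptr_t)p >= b.base && (uintptr_t)p < b.limit);
 }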


* Tagged memory to identify data types - integer, floating point ...
:* Problem: there is an infinite number of types ...
:* 1..n bits. Per qword? Maybe per byte?
* Tagged memory to aid in [[garbage collection]]. [[Read and write barriers]].
:* 1 bit (or more) per minimum allocation unit, 1-4 words
* Tagged memory to aid security, e.g. to create [[non-forgeable pointers]] as in the IBM AS400
:* This may well be my favorite use of only a single bit of tagged memory.
:* 1 bit per pointer - 64 or 128 bits
* Tagged memory for [[taint or poison propagation]], as in [[DIFT (Dynamic Information Flow Tracking)]]. E.g. Raksha.
:* 1 bit per ... word? byte?
:* Raksha uses multiple bits
* Tagged memory for [[transactional memory]] support
:* 1 bit per ... many TM systems use cache line granularity, although smaller is better
* Tagged memory for [[debugging]]
:* 1 bit per ... can deal with coarse granularity, e.g. by trapping, determining that the access is not of interest, and continuing.
* Tagged memory for [[performance monitoring]] - e.g. to mark all memory accessed in a time interval.
:* 1 bit per cache line
* Tagged memory for [[uninitialized memory]]
:* 1 bit per ... byte? word? (a minimal sketch of a 1-bit-per-word shadow bitmap follows this list)
* Tagged memory for [[synchronization]] of parallel programs. E.g. [[full/empty bits]].
* Fine grain [[copy-on-write]] data structures
:* [[Myrias style parallelism]]
:* 1 bit per ... byte? word? ...
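
To make the "1 bit per ... byte? word?" question concrete, here is a minimal C sketch of the uninitialized-memory case at 1-bit-per-64-bit-word granularity, kept as a software shadow bitmap. All names are hypothetical; real tagged hardware would set and check the bit implicitly on store and load, and trap rather than print.

 #include <stdint.h>
 #include <stdio.h>
 #include <stddef.h>
 
 #define MEM_WORDS 1024
 static uint64_t memory[MEM_WORDS];
 static uint8_t  init_bits[MEM_WORDS / 8];    /* 1 tag bit per 64-bit word */
 
 static void tagged_store(size_t w, uint64_t v) {
     memory[w] = v;
     init_bits[w / 8] |= (uint8_t)(1u << (w % 8));   /* mark word initialized */
 }
 
 static uint64_t tagged_load(size_t w) {
     if (!(init_bits[w / 8] & (1u << (w % 8))))
         fprintf(stderr, "load of uninitialized word %zu\n", w);  /* hardware would trap */
     return memory[w];
 }
 
 int main(void) {
     tagged_store(3, 42);
     (void)tagged_load(3);    /* was stored: OK */
     (void)tagged_load(4);    /* never stored: flagged */
     return 0;
 }

The same structure, with the bitmap indexed by byte rather than by word, illustrates the cost of finer granularity: eight times the metadata.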

Heck, I suppose that conventional width-oriented ECC can be considered a form of tagged memory. And although ECC is common, at the time of writing it is absent from the vast majority of PCs. That absence causes enough problems to warrant the invention of [[Poor Man's ECC]], a non-width-oriented implementation of ECC that avoids physically tagged memory.

Wide operand cache

http://semipublic.comp-arch.net/wiki/Wide_operand_cache
= By Microunity =

The [[wide operand cache]] is a concept that originated at [[Microunity]].

Situation:
* you want to design a RISC machine, with "natural" 32 or 64 bit registers.
** or even 1024 bit registers - no matter how wide you go, there will nearly always be a reason to have significantly wider operands.
* but you also want to support "simple" instructions that just happen to have very wide operands.

See [[#Examples of instructions with wide operands]].
For the purposes of this discussion, we will consider [[BMM (bit matrix multiply)]].
BMM wants to have one, say, 64 bit operand, and a second operand, the matrix, that is 64x64 bits in size - 4Kib.

Defining an instruction such as
BMM destreg64b := srcreg64b * srcreg64x64b
could be possible - but it has many issues:
* how many 64x64=4Kib operand registers do you want to define? 1? 2? 4?
* you probably do not want to copy such a wide operand around - instead, you might want it to live inside its execution unit
etc.

The [[wide operand cache]] approach is as follows: define an instruction with a memory operand
BMM destreg64b := srcreg64b * M[...]
* You may specify the memory operand simply as register indirect, M[reg], or you may define it using a normal addressing mode such as M[basereg+offset], or even using scaled indexing.

Conceptually, the wide operand is loaded before every use, i.e.
BMM destreg64b := srcreg64b * M[addr]
behaves as if it were
tmpreg64x64b := load M[addr]
destreg64b := srcreg64b * tmpreg64x64b

However, we may avoid unnecessary memory traffic by caching the wide operand.
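
Here is a minimal C sketch of that caching, with a single entry for clarity; the names and the single-entry organization are hypothetical, not Microunity's design. The wide operand is looked up by its memory address and reloaded only on a miss.

 #include <stdint.h>
 #include <string.h>
 #include <stdbool.h>
 
 typedef struct {
     bool      valid;
     uintptr_t tag;           /* memory address of the cached wide operand */
     uint64_t  data[64];      /* the 64x64-bit operand, held near the execution unit */
 } wide_operand_cache_t;
 
 static wide_operand_cache_t woc;
 
 /* Performed as part of the wide-operand instruction: return the cached copy,
  * loading the 4Kib operand from memory only on a miss. The BMM datapath then
  * consumes the returned operand. */
 static const uint64_t *woc_lookup(const uint64_t *matrix_in_memory) {
     uintptr_t addr = (uintptr_t)matrix_in_memory;
     if (!woc.valid || woc.tag != addr) {               /* miss: (re)load the wide operand */
         memcpy(woc.data, matrix_in_memory, sizeof woc.data);
         woc.tag = addr;
         woc.valid = true;
     }
     return woc.data;                                   /* hit: no memory traffic */
 }
 
 /* If the cache is NOT snooped ("TLB semantics"), software must invalidate it
  * explicitly after modifying the operand in memory - see the next section. */
 static void woc_invalidate(void) { woc.valid = false; }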

This leads to the basic issue: [[#TLB semantics versus coherent|]].



= TLB semantics versus coherent =

The [[wide operand cache]] concept raises a basic question:
* do you snoop the wide operand cache, keeping it coherent with main memory, so that if somebody writes to a cached wide operand the change is reflected in the next execution of the instruction?
* or do you use "[[TLB semantics]]", i.e. make it a [[noncoherent cache]], requiring the user to explicitly invalidate it (e.g. via an [[invalidate wide operand cache(s) instruction]])?

If coherent, then an implementation can vary the number of wide operand cache entries transparently.

If noncoherent, then not only can software detect the number of entries, but you also run the risk of getting different answers depending on the context switch rate.

[[Glew opinion]]: I prefer coherent, but would live with noncoherent if necessary.

= So Close... =

Frustrating anecdote: at AMD I was working out, with Alex Klaiber, how to support instructions with very wide operands, such
as [[BMM]]. My whiteboard was full of scribblings, with the basic question of [[#TLB semantics versus coherent|]].

I then went to a meeting where Microunity presented their patents.

So close... Actually, probably off by 5-10 years. But I was following a path that I did not know had been trailblazed...


= Examples of instructions with wide operands =
Any instruction that has an operand that is significantly wider than its other inputs,
and which either tends to be constant, or which tends to be modified in place,
is a candidate for a [[wide operand in memory]] implemented via a [[wide operand cache]].
For that matter, one could have wide operand instructions whose operands are all pseudo-registers in a wide operand cache.
(A minimal C sketch of [[BMM]], the first example below, follows the list.)
* [[BMM (bit matrix multiply)]] - 64b X 64x64b
* [[vector-matrix instructions]] - N X NxN
* [[permutation index vector instruction]] - Nbits*log2(Nbits)
* [[permutation bit matrix instruction]] - really a form of [[BMM]]
* [[superaccumulator]] - 32 bit - thousands ...
* [[regex instructions]] - with large regex operands that can be compiled
* [[LUT (lookup table) instructions]]
* [[texture sampling]]
* [[interpolation instructions]]
* [[CAM instructions]]
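
As promised above, here is a minimal C sketch of the 64b X 64x64b [[BMM]] itself. Bit matrix multiply conventions vary; the one assumed here is that result bit j is the parity (XOR reduction) of the AND of the 64-bit source with row j of the matrix.

 #include <stdint.h>
 
 static uint64_t bmm64(uint64_t src, const uint64_t matrix[64]) {
     uint64_t result = 0;
     for (int j = 0; j < 64; j++) {
         uint64_t x = src & matrix[j];   /* select the source bits row j cares about */
         x ^= x >> 32;                   /* fold down to the parity of those bits */
         x ^= x >> 16;
         x ^= x >> 8;
         x ^= x >> 4;
         x ^= x >> 2;
         x ^= x >> 1;
         result |= (x & 1) << j;         /* parity becomes result bit j */
     }
     return result;
 }

The point of the wide operand cache is that matrix - the 4Kib operand - stays resident near the execution unit across many executions, while src and result are ordinary 64-bit registers.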

TBD

Saturday, April 09, 2011

New models for Industrial Research (in reply to: The death of Intel Labs and what it means for industrial research)

Matt Welsh's post on "The death of Intel Labs and what it means for industrial research" must have struck a nerve with me, because I have spent a morning writing a long response.

BRIEF:

Intel's lablets have been shut down, not the labs. I helped start Intel's labs, but not the lablets. It's not clear how effective the lablets ever were. Same for the labs. I discuss models for research, including

(1) Academia far-out, and industry close-in (nice if it were true)

(2) Google's 20%

(3) IBM and HP's business group motivated research labs

(4) Some of my experience from Intel, in both product groups and research labs

(5) Open Source (if ever I can retire I would work on Open Source. But I have not yet managed to find a job that allowed me to work on Open Source.)

I think my overall point is that each of these models works, sometimes - and each is subject to herd mentality, deference to power, etc. Perhaps there is room for new ways of doing research, invention, and innovation - a new business model.

Finally, I mention briefly, providing links to quotes, Intellectual Ventures' website. With a disclaimer saying that I don't speak for IV, although obviously I have hope for its potential since I left Intel to join IV.


DETAIL:

Intel recently announced that it is closing down its three "lablets"
in Berkeley, Seattle, and Pittsburgh.


So it goes. This might be unfortunate. None of the lablet work in my
field, computer architecture, has caught my eye, although I did enjoy
interacting with Todd Mowry's group in Pittsburgh on Log Based
Architecture (I had come up with Log Based Microarchitecture at
Wisconsin).

However, it is wrong to say that closing the lablets reflects the death of
Intel Labs. I was involved with the creation of Intel Labs, circa
1995, inside Intel.

This was historically a hard sell, since Intel had been *created* by
refugees from the research labs of other companies. It was a
touchstone of Intel culture that Intel would never do ivory tower
research not relevant to product groups.

E.g. while campaigning for the creation of Intel Labs I created a
slideset that said "Intel must start doing our own research in
computer architecture, now that we have copied all of the ideas from
older companies." I am not sure, but it seems like Craig Barrett may
have seen these slides when he was quoted in the Wall Street Journal:
''Now we're at the head of the class, and there is nothing left to
copy,'' Mr. Barrett was quoted as having said.


(Ironically, DEC used this to justify their patent infringement
lawsuit against Intel circa 1997 -- but when I created these slides I
had IBM in mind, not DEC, since I freely admit that much of my work at
Intel was built upon a foundation of IBM work on RISC and Tomasulo
out-of-order. Not DEC Alpha. Perhaps I should never have created
those slides, but they put the case pithily and helped justify the
creation of Intel Labs. And *I* did not quote them to the WSJ.)

Ref: http://query.nytimes.com/gst/fullpage.html?res=9F02E3D61F39F937A25756C0A961958260&pagewanted=2

Matt says "Before the Labs opened, Intel Research was consistently
ranked one of the lowest amongst all major technology companies in
terms of research stature and output." Well, yes and no. The lablets
opened in 2001. MRL, the Microprocessor Research Lab I helped start,
opened in 1995, as did some of the other labs. When I was at AMD in
2002-2004 my AMD coworkers were already saying to me that Intel's MRL
work was the most interesting work being published in computer
architecture conferences like ISCA, HPCA and Micro. I.e. I think MRL
was picking up steam well before the lablets were created.

Actually, the lablets were part of a trend to "academify" Intel
Labs. E.g. around that time my old lab MRL was taken over by a famous
professor imported from academia, who proceeded to do short term work
on the Itanium - and over the next few years most of the senior
researchers who did not agree with Itanium left or were forced out.
Ironically, the academic created a much shorter term focus at MRL by
betting on VLIW - and then ultimately he moved out of Intel.

Now, don't get me wrong: the guys left in the lab formerly known as
MRL do good work. Chris Wilkerson has published lots of good
papers. Jared Stark accomplished the most successful technology
transfer I am aware of, of branch prediction to SNB. Chris and Jared
are largely the guys whose work my former AMD coworkers admired.

But, such work at Intel is largely incremental, evolutionary. I
mentioned that the famous professor in charge of my old research lab
tried to play politics by favoring Itanium, even though his
researchers were opposed.

Annoyingly, from when I joined Intel in 1991 to when Intel Labs
started in 1995, computer architecture work inside Intel was pretty
much 10 years ahead of academia. Out-of-order execution like P6 did
not come from mainstream academia, who were busy following the fads of
RISC and in-order. (OOO came from what was at that time not mainstream
academia, like Yale Patt and Wen-Mei Hwu, although they became mainstream
as OOO became successful.)

But companies like Intel rest on their laurels. Having defeated RISC,
Intel did not need to do any serious computer architecture work for,
what, 10 years? 16 years now?

Matt says: I am very concerned about what happens if we don't have
enough long-range research. One model that could evolve is that
universities do the far-out stuff and industry focuses on the shorter
term.
It is hard to justify the Bell Labs model in today's world,
though no doubt it had tremendous impact.


I share your concern. But my experience is that universities are not
necessarily good at doing the far-out stuff.

About my experience: I'm not an academic, although perhaps I should
have been one. I failed to complete my Ph.D. twice (first marriage,
then when my daughter was born). I've never had an academic paper
published (although I had a few rejected that later got built into
successful products). But I made some useful contributions to the
form of OOO that is in most modern computers. You are almost 100%
likely to have used some of my stuff, probably in the computer
you are reading this on. At one time I was Intel's most prolific
inventor. I helped start Intel Labs.

I wasted too many years of my life on what I think is the next major
step forward in computer architecture to improve single threaded
execution - speculative multithreading (SpMT). I say "wasted" not
because I think that SpMT is a bad idea, but because I spent far too
many of those years seeking funding and approval, rather than just
doing the work. The actual work was only a few intense months,
embedded in years of PowerPoint and politics. But even though SpMT has
not proven a success yet, a spin-off idea, Multi-cluster
Multithreading (MCMT), the substrate that I wanted to build SpMT on,
is the heart of AMD's next flagship processor, Bulldozer. 7 years
after I left AMD in 2004. 13+ years after I came up with the idea of
MCMT, at Wisconsin during my second failed attempt to get a PhD.

My last major project at Intel, 2005-2009, has not yet seen the light of day,
but newsblurbs such as "Intel developing security 'game-changer':
Intel CTO says new technology will stop zero-day attacks in their
tracks"
suggest that it may.

Source: http://www.computerworld.com/s/article/9206366/Intel_developing_security_game_changer_

But in my last year at Intel, this major project, and a couple of
minor projects, were cancelled under me, until I was forced to work on
Larrabee, a project that I was not quite so opposed to as I was to
Itanium. Enough being enough, I left.

So: I am not an academic, but I have worked at, and hope to remain
working at, the leading edge of technology. I have tried to create
organizations and teams that do leading edge research, like MRL, but I
am more interested in doing the work myself than in being a manager.

The question remains: where do we get the ideas for the future? How
do we fund research, invention, and innovation?

Matt says: One model that could evolve is that universities do the
far-out stuff and industry focuses on the shorter term.


"Could evolve"? Believe me, this is what every academic research
grant proposal said that I saw when I sat on an Intel committee giving
out research grants. It usually doesn't work - although once in a while it does.

Matt says: Google takes a very different approach, one in which
there is no division between "research" and "engineering."
This is
an interesting approach. Myself, I am not a very good multi-tasker - I
tend to work intensely on a problem for weeks at a time. I don't know
how well I could manage 20% time, 1 day a week, for new projects.
(Although I am supposedly doing something like this for my current
job, the 90% main job tends to expand to fill all time available to
it.) But it may work for some people.

Somebody else posts about industry research: IBM Research and HP
Labs don't really have an academic research mindset and haven't for a
long time thanks to business unit based funding.
Then goes on to say
Even within Intel Research, successful researchers (in terms of
promotion beyond a certain key point) also had to have some kind of
significant internal impact.
Which is, I suspect, why the famous
professor who ran MRL after I left emphasized the VLIW dead-end, over
the objections of his senior researchers.

My own vision for Intel's Microprocessor Research Labs was that the best
technology transfer is by transferring people. You can't throw
research over the wall and expect a product group to use it. Instead,
I wanted to have people flow back and forth between product groups and
research. In part I wanted to use the labs as an R&R stop for smart
people in the product groups - give them a place to recharge their
batteries after an exhausting 5 to 7 year product implementation. A
place to create their own new ideas, and/or borrow ideas from the
academics they might interact with in the labs. And then go back to a
product group, perhaps dragging a few of the academics along with
them, when they align with the start of a new project. "Align with the
start of a new project" - this is important. Sometimes a project
finishes, and there is no new project for the smart guys coming off
the old project to join, because of the vagaries of project schedules.
All too often people jump ship off an old project too early, because
they want to get onto the sexy new project at the right time for their
career growth. By providing such a scheduling buffer, this thrashing
may be avoided - and the even worse happenstance, when a smart guy
leaves the company, because there is no new project for him at his
current employer, while there is at the new company. And, finally,
once in a while a new project flows out of the lab.

I am particularly sympathetic to the Anonymous poster who said How
about a totally different alternative model?
and then talks about
memes popular in the Open Source community, such as People do not
need to spend half of their life in formal schooling to start doing
cutting edge work.
But then he says:

Most academic research outside of the top 5-10 schools in any field
is not useful, even by academic standards.
I go further: MOST
academic work at MOST schools, even the top 5-10 schools, is not
useful. But often the best academic work is at some little known
third or fourth tier school, and has trouble getting published.

Code is (far) more useful than papers. I am very sympathetic to
this. But (a) most engineers and programmers are not free to publish
either code or papers, limited by their employment agreements. And
occasionally (b) the papers are a useful summary of the good ideas.

I look forward to the day when we can have Open Source computer
hardware. I don't say this facetiously: some of my best friends are
working on it. I would also, if I did not need an income.

Many of the people capable of contributing at a high level in
academia have the ability to start significant companies and create
genuine wealth.
Many, but not most. Not every good technical person
is a good business person.

Which leads in to my closing point: Not every good technical person is
a good business person. Not every inventor is capable of building a
company around his inventions. Many of the most useful inventions
cannot justify a completely new and independent company: they need the
ecosystem of an existing product line, and the support of a larger
organization.

For many years my resume said that my goal was to re-create Thomas
Edison's Invention Factory in the modern world. In 2009 I left Intel
for the second time, and joined Intellectual Ventures. (With whom I
had earlier worked on some inventions I had made in 2004, in the short
time between my leaving AMD and rejoining Intel, the only time in my
career that my ideas belonged to me, and not my employer.)

I'm not authorized to speak for Intellectual Ventures, but I can refer
you to some of the things on the IV website,
http://www.intellectualventures.com:

“An industry dedicated to financing inventors and monetizing their
creations could transform the world.” Nathan Myhrvold, Founder and
CEO

    We believe ideas are valuable. At Intellectual Ventures, we invest both expertise and capital in the development of inventions. We collaborate with leading inventors and partner with pioneering companies. ... We are creating an active market for invention... We do this by: * Employing talented inventors here at Intellectual Ventures who work on new inventions to help solve some of the world’s biggest problems. * Purchasing inventions from individual inventors and businesses ... * Partnering with our international network of more than 3,000 inventors and helping them to monetize their inventions.

Most of the posters on this topic, the original blogger and the authors
of the comments, are interested in promoting research, invention, and
innovation. Sometimes you need to create a new economic or business
model. I hope it works.

Matt says: It is hard to justify the Bell Labs model in today's
world, though no doubt it had tremendous impact.


Somebody else once said to me that Bell Labs could have been kept running, avoiding its decline, based solely on its patent royalties.

Would that have been worth it? These were the people that gave us the
transistor. Information Theory. UNIX.






Finally, I must make the following disclaimer:

The content of this message is my personal opinion only. Although I am
an employee (currently of Quantum Intellectual Property Services,
working for Intellectual Ventures; in the past of companies such as
Intel, AMD, Motorola, and Gould), I reveal this only so that the
reader may account for any possible bias I may have towards my
employer's products. The statements I make here in no way represent my
employer's position on the issue, nor am I authorized to speak on
behalf of my employer.

Tuesday, April 05, 2011

Why saying "You need N-way associativity in the TLB to guarantee forward progress" is bogus and reflective of an in-order mindset

http://semipublic.comp-arch.net/wiki/Why_saying_%22You_need_N-way_associativity_in_the_TLB_to_guarantee_forward_progress%22_is_bogus_and_reflective_of_an_in-order_mindset
[[Category:Virtual Memory]]

= The Annoying Quote =

One occasionally hears (and reads in computer architecture textbooks) statements such as

"Instruction set XXX can access N memory locations in a single instruction, so therefore requires a TLB with at least N-way associativity."

This statement is wrong in several ways, both as an underestimate and as an overestimate.
It reflects ignorance of out-of-order processors, and even for in-order processors it reflects ignorance of other implementations.

= Basic Assumption =

The basic assumption behind this statement is something like this:

Assume you have an instruction whose operation may be described in pseudocode or microcode as:
tL1 := load(M[A1])
...
tLN := load(M[AN])

tC1 := f1(tL1..tLN)
...
tCN := fN(tL1..tLN)

store( M[A1] := tC1 )
...
store( M[AN] := tCN )

Assume that you have a [[restart versus resume instruction exception|restart and not a resume instruction fault architecture]] - i.e. assume that you must access all of M[A1] .. M[AN], eventually, without a fault or TLB miss.
Then you need N TLB entries to hold all of the [[translation]]s for M[A1]..M[AN].

Or, equivalently, assume that you are not allowed to take a fault or TLB miss in the "[[commit phase]]" of the instruction.
Then, once again, you need N TLB entries to hold all of the [[translation]]s for M[A1]..M[AN].

Sounds simple, eh?

= Out-of-order with Speculative TLB miss handling =

This betrays an in-order mindset. It does not necessarily work on an [[out-of-order]] machine that does not block on a TLB miss,
i.e. which can perform ALU operations, memory references, and TLB misses out-of-order.

It doesn't work because, even though an operation such as a load, store, or [[tickle]] in the pseudocode or microcode for an instruction may load a TLB entry,
this TLB entry may be thrashed out of the TLB by (a) a later operation in the same instruction, (b) an earlier operation in the same instruction (remember, out-of-order), or (c) a TLB use from a different instruction (remember, out-of-order and non-blocking: other instructions may be executing at the same time).

In particular, note that there is nothing that says that the operations within a single instruction will be performed in-order --- indeed, on an out-of-order machine like the Intel P6 the microinstructions within an instruction are performed out-of-order --- so you can't necessarily make assumptions about the order of accesses, how they will affect LRU, etc.

== Kluges to Make It Work ==

* Allow out-of-order between instructions, but impose ordering restrictions within an instruction - e.g. by implementing every instruction with a strictly in-order state machine.

* TLB misses in-order, at commit time

= Other Implementations =

== Save Translations ==
Allow a "translation" to be saved in a register.
tT1 := save_translation_and_permission(M[A1])
...
tTN := save_translation_and_permission(M[AN])

tL1 := load_using_saved_translation_and_permission(M[phys_tr=tT1])
...
tLN := load_using_saved_translation_and_permission(M[phys_tr=tTN])

tC1 := f1(tL1..tLN)
...
tCN := fN(tL1..tLN)

store_using_saved_translation_and_permission( M[phys_tr=tT1] := tC1 )
...
store_using_saved_translation_and_permission( M[phys_tr=tTN] := tCN )

Issues:
* such a "saved translation" should contain not only the physical address corresponding to a virtual address, but also permissions.
* it is relatively easy to provide such an operation to microcode. It is harder to make it available to software.
** it obviously cannot be provided to user code
** virtualizing such a saved translation may be a challenge - even the OS may not be allowed to see the true physical address or permissions
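
As a rough illustration of the first issue above, here is a minimal C sketch (hypothetical names and fields) of what such a "saved translation" register might hold: the physical address plus the permissions that were checked when the translation was captured, so that a later access "using saved translation" can bypass the TLB but still be permission-checked.

 #include <stdint.h>
 #include <stdbool.h>
 
 typedef struct {
     uint64_t phys_page;        /* physical page number from the page table walk */
     bool     readable;
     bool     writable;         /* permissions captured at save_translation time */
     bool     user_accessible;
 } saved_translation_t;
 
 /* the check performed by load/store_using_saved_translation_and_permission */
 static bool saved_translation_allows(const saved_translation_t *t, bool is_write) {
     return is_write ? t->writable : t->readable;
 }

Exposing such a value to software would reveal physical addresses and raw permissions, which is why it is easy to give to microcode and hard to give to user code, as noted above.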

== Who Cares? So What? ==

One could take the viewpoint of "Who Cares?": allow multiple TLB misses in the same instruction, don't restart.

Issue:
* there arises the possibility of [[intra-instruction translation inconsistency]] - different accesses to the same virtual address may receive different translations, different physical addresses, or, perhaps worse, a check may pass but a subsequent access may fault.

Again, one may say "Who cares? The OS should not be changing translations while an instruction may be in flight."

But
* Saying that the OS should not do something does not mean that it will not do it
* An OS implemented on top of a VMM may lead to issues: the VMM may not be tracking where the OS keeps its page tables
* While this strategy may be acceptable, it may have lousy performance because of the necessity of stopping multiple processors for a [[TLB shootdown]] while changing a translation, e.g. in a [[page table]] in memory.

= Conclusion =

Saying
"Instruction set XXX can access N memory locations in a single instruction, so therefore requires a TLB with at least N-way associativity."

* underestimates the TLB entries required for an out-of-order processor with non-blocking TLB miss handling

* overestimates the TLB entries required for several reasonable implementation strategies, such as "Save Translations" and "Who Cares?"

More accurately, one might say

"Instruction set XXX can access N memory locations in a single instruction,
and for a certain set of microarchitecture assumptions
may require a TLB with at least N-way associativity."

But there are other ways...

Friday, April 01, 2011

Reset: Hard, Soft, Cold, Warm

http://semipublic.comp-arch.net/wiki/Reset:_Hard,_Soft,_Cold,_Warm
= The need for RESET at power-on =

In the beginning there was [[RESET]]: a signal asserted while powering on, deasserted when power and logic levels were stable, so that the circuit could be initialized.

Soon thereafter, or even before, there was POWEROK:
* !POWEROK and RESET => do not attempt operation
* POWEROK and RESET => power good, now do initialization
* POWEROK and !RESET => initialization done, running.

But let us not get obsessed by the details of signalling: I'm okay, you're okay, I've initialized and you are ready.
Moving forward: ...

Eventually people realized that they wanted to be able to restore the state of the system to that right after RESET, without going through the full power on sequence.
Hence the concept of [[soft reset]] was born.
On Intel x86 systems the INIT pin or message can be considered to be approximately a [[soft reset]],
with the old reset at power-on constituting a [[hard reset]].

But [[soft reset]]s like INIT cannot recover from all errors. Sometimes a computer is truly hung, and a [[power cycle]] is necessary.
Or, if not a power cycle, an assertion of the RESET signal,
so that all of the rest of the system is initialized.

Did I mention that, perhaps before [[soft reset]], people would build circuits that could interrupt power with a relay, and thus provoke a power on [[hard reset]] under software control?

Trouble is, after a [[hard reset]] system state might be unreliable, or might be initialized to a true reset state, such as all zeros.
How can you then distinguish a true [[power on reset]] from a [[hard reset]] invoked under software control?
Perhaps used to recover from a system hang?

What you need is a softer hard reset - a [[warm reset]] that acts like a [[hard reset]], asserting the RESET signal to the rest of the system, I/O devices et al,
so that the state of the machine is as close as possible to a true power-on [[cold reset]],
but where power persists across the [[warm reset]],
so that at least certain status registers can be reliably read.

At first the state that persisted across such a [[warm reset]] might reside only in a battery-backed unit near the reset state machine.
But when the amount of state that you want to persist grows large,
e.g. the [[MCA (Machine Check Architecture)]] error status log registers,
state inside the [[CPU]] may be allowed, indeed, required, to persist across such a [[warm reset]].

= A possible sequence of progressively harder RESETs for error recovery =

I am sure that you can see that [[cold reset]] versus [[warm reset]] and [[hard reset]] versus [[soft reset]] are not necessarily discrete points but,
as usual, points on a spectrum -
a not necessarily 1D, not necessarily totally ordered, set of possible reset mechanisms.

If an error is detected,
e.g. if a processor stops responding, you might escalate as follows (a minimal sketch of such an escalation loop appears after this list):
* first you might try sending it a normal interrupt, of progressively higher priority
* then you might try sending in an [[NMI (Non-Maskable Interrupt)]]
** then any of the flavors of [[even less maskable non-maskable interrupt]]s, such as Intel's [[SMI (System Management Interrupt)]], or some sort of [[VMM]] or [[hypervisor]] interrupt
* then you might try sending a [[soft reset]] message like INIT to the apparently hung processor
* this failing, you may try to do a [[warm reset]] of the hung processor
** although by this time you probably want to reset all of the processors and I/O that are tightly bound to the hung processor
** exactly how you define such a "reset domain" is system specific, although it is often the same as a [[shared memory cache coherency domain]].
* failing this, you might try to use a [[hard reset]] under the control of an external circuit, e.g. at the power supply, that can trip a relay and then untrip it after a time has elapsed
** heck, this can be done at power supply points increasingly distant: inside the PC or blade, at the rack, in the datacenter...
* all of these failing, you can try to notify the user, although by this point that probably is impossible; or you may rely on some external mechanism, such as a user or a watchdog, to try to use ever more extreme forms of resetting the system.
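
Here is a minimal C sketch of that escalation sequence. Every function below is hypothetical and system specific; they are declared only to show the shape of the loop, and the real mechanisms (how long to wait, what "responds" means, what a reset domain is) are exactly the system-specific details discussed above.

 #include <stdbool.h>
 
 /* platform-specific hooks, declared here only to show the shape */
 bool cpu_responds(int cpu);              /* poll, with a timeout, for signs of life */
 void send_interrupt(int cpu, int prio);
 void send_nmi(int cpu);
 void send_smi(int cpu);                  /* "even less maskable" interrupt */
 void send_init(int cpu);                 /* soft reset */
 void warm_reset_domain(int cpu);         /* assert RESET to the cpu's reset domain, power stays on */
 void hard_reset_via_power_relay(void);   /* external circuit cycles the power */
 
 void revive_hung_cpu(int cpu) {
     for (int prio = 0; prio < 4; prio++) {        /* normal interrupts, rising priority */
         send_interrupt(cpu, prio);
         if (cpu_responds(cpu)) return;
     }
     send_nmi(cpu);   if (cpu_responds(cpu)) return;
     send_smi(cpu);   if (cpu_responds(cpu)) return;
     send_init(cpu);  if (cpu_responds(cpu)) return;   /* soft reset */
     warm_reset_domain(cpu);
     if (cpu_responds(cpu)) return;
     hard_reset_via_power_relay();        /* last resort short of notifying the user */
 }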

The above tends to imply that there is a linear order of reset mechanisms. This is not necessarily true. You may reset subsystems in heuristic order.

= RESET is a splitting concept =

I.e. overall [[RESET]] is one of those concepts that inevitably split,
whenever you look too closely at it.
Of course, create no more flavors of reset than you need, because each brings complexity.
But inevitably, if your system is successful and lasts for several years,
you will need one or two more flavors of RESET than were originally anticipated.

= Similar =

[[Watchdog timer]]s are a concept that splits similarly to RESET.
Indeed, each level of progressive reset for error recovery of a hung system is often driven
by a new level of [[watchdog timer]].
Or [[sanity timer]].

(Or a [[neurosis timer]], or a [[psychosis timer]] ... no, I am writing this on April Fool's.)