Disclaimer

The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Saturday, February 12, 2011

RISC versus CISC

http://semipublic.comp-arch.net/wiki/RISC_versus_CISC

* [[RISC (Reduced Instruction Set Computer)]]
* [[CISC (Complex Instruction Set Computer)]]

[[RISC]] and [[CISC]] have been discussed elsewhere, outside this wiki, in great detail.
I will not try to add much to what has already been written.

However, although I am a RISC sympathizer, I may take a contrarian stance, talking about some of the issues and problems with RISC.
The RISC philosophy was certainly influential. However, in many ways it failed.
I may try to talk about those failures.
And how the so-called [[RISC Revolution]] warped a generation of computer architects.

= Background =

The late 1970s and early 1980s were an era of increasing capability and complexity in computers.
The mainframe IBM 360 family was reaching the peak of its influence.
The relatively simple PDP-11 minicomputer
was supplanted by the more complex DEC VAX.
Microprocessors were marching forward: the Intel 8086 started along the road to world domination in 1978,
the Motorola 68000 started along the road to elegant failure in 1979.
Intel had been talking about the
[http://en.wikipedia.org/wiki/Intel_432 Intel iAPX 432]
microprocessor for a while, with features such as bit granular instructions,
garbage collection in hardware and microcode,
a software object model supported by hardware, etc.
People were trying to solve the so-called software gap with hardware, by building computers that more closely mapped to whatever language implementation they were targeting.


Complexity seemed to be ever increasing.
In actuality, much of the software complexity was simply moved into microcode.
Woe betide languages and operating systems that did not match the use cases the computer was designed to support.

= The RISC Revolution =

Into this came a bolt of sanity:
    David A. Patterson and David R. Ditzel. 1980. The case for the reduced instruction set computer. SIGARCH Comput. Archit. News 8, 6 (October 1980), 25-33. DOI=10.1145/641914.641917 http://doi.acm.org/10.1145/641914.641917
Heck - it was not even published in a refereed journal!

One must also mention the [[IBM 801 minicomputer]], arguably the first RISC.

The late 1980s and early 1990s were full of RISC computers, especially microprocessors: [[Power]], [[PowerPC]], [[MIPS]], [[SPARC]], [[Motorola 88K]].
Not to mention many more failed companies.
All seemed destined to replace the IBM mainframe, the VAX minicomputer and,
then, as the importance of the PC market grew, the PC microprocessor.
But only Apple's 68000-based Macintosh PCs fell to the PowerPC RISC onslaught.
The VAX and other minicomputers died out.
But the IBM mainframe, and the Intel x86, continued, the latter spectacularly.

Now, by 2010,
* IBM is slowly transitioning to [[Power]], but the IBM Z-series mainframe stays strong
* the PC world is still ruled by Intel [[x86]]
** Intel's RISC and VLIW projects fell by the wayside
* [[PowerPC]] and [[MIPS]] have been relegated to [[embedded computing]]
* [[ARM]] is the biggest RISC success story, in embedded and, particularly, in low power, cell phones, etc.
** [[ARM]] is likely to be Intel's big competition
* Sun [[SPARC]] survives, barely - but Sun itself got swallowed by Oracle
** Sun/Oracle x86 boxes predominate even here
* DEC is long dead, as is [[Alpha]]

= What Happened? =

Many RISC companies went after the high priced, high profit margin, workstation or server markets.
But those markets got killed by [[The March of the Killer Micros]], specifically, the x86 PC microprocessor.
It is telling that ARM is the most successful "RISC", and that ARM targeted embedded and low power, the low end,
lower than the PC market, rather than the high end.

[[PowerPC]] and [[MIPS]] made a concerted attack on the Intel x86 PC franchise.
Microsoft even provided Windows NT support.
But when Intel proved that they could keep the x86 architecture competitive,
and stay the best semiconductor manufacturer, the "RISC PC" withered.

Intel proved they could keep the x86 PC microprocessor competitive in several stages:
* the i486 proved that the x86 could be pipelined.
** Up until then one of the pro-RISC arguments was that CISCs were too complicated to pipeline. But, see the next section
** I was thinking about the Motorola 88K around the time the i486 started being talked about, and I realized that RISC had no fundamental advantage
* the Intel P5/Pentium did in-order superscalar
* the Intel P6, first released as the Pentium Pro, then Pentium II, did out-of-order
** briefly, the fastest microprocessor in the world, beating even DEC Alpha

Some say that the P6 killed RISC.

A more nuanced view is that RISC was a response to a short term issue:
the transition from board level to chip level integration.
When only a small fraction of a board level computer could fit on a chip,
RISC made more sense. When more could fit on a chip, RISC made less sense.

Not no sense at all. The RISC principles always have value.
But less sense, less of a competitive advantage.
Unnecessary complexity is always wasteful.

Moving on...

= The CISCs that survived were not that CISCy =

The CISCs that failed - the DEC VAX and the Motorola 68000 - were the most CISCy.
Most instructions were variable length.
Some frequently used instructions could be very long.
Many instructions had microcode.
Many operations had side effects.
They had complicated addressing modes - elegant in their generality, but complicated, sometimes necessitating microcode just to calculate an address.


The CISCs that survived - the IBM 360 mainframe family, and the Intel x86 - were not that CISCy.

Sure, they had some complicated, almost impossible to implement without microcode, instructions.
Just consider x86 FAR CALL through a CALL GATE.

However, most of their instructions were simple:
ADD: dest_reg += src_reg_or_mem.
True, not [[load-store]].
But [[load-op]] pipelines were not that hard for the IBM mainframes, and the Intel i486 and P5, to implement.
And the Intel P6 showed that the microarchitecture could be [[load-store]], even though the ISA was not.
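
To make the load-op point concrete, here is a rough sketch - illustrative Python pseudocode, not any vendor's actual decoder, and the uop fields and temporary register name are made up - of cracking a load-op macroinstruction into load-store style uops:

    # Hypothetical sketch: cracking a load-op macroinstruction such as
    #   ADD reg, [mem]   (reg += memory operand)
    # into load-store style uops, roughly the way a P6-class decoder might.
    # The uop format, names, and temporary register are illustrative only.
    from dataclasses import dataclass

    @dataclass
    class Uop:
        op: str       # "LOAD", "ADD", ...
        dest: str     # destination register (architectural or temporary)
        srcs: tuple   # source registers / address expression

    def crack_load_op(opcode, dest_reg, mem_addr):
        """Split 'opcode dest_reg, [mem_addr]' into load-store style uops."""
        tmp = "t0"    # temporary supplied by the renamer
        return [
            Uop("LOAD", tmp, (mem_addr,)),           # bring the memory operand into a temp
            Uop(opcode, dest_reg, (dest_reg, tmp)),  # then a register-register ALU op
        ]

    # ADD eax, [ebx+8]  ->  LOAD t0, [ebx+8] ; ADD eax, eax, t0
    for uop in crack_load_op("ADD", "eax", "ebx+8"):
        print(uop)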

The IBM 360 family has relatively simple instruction decoding.

The Intel x86 has painful instruction encodings, ranging from a single byte up to 15 bytes.
But the most frequently used instructions are small.
The really complicated instruction encodings, with prefixes, proved possible to (a) implement in hardware, but (b) at a significant performance cost, on the order of 1 cycle per prefix originally.
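
Why roughly a cycle per prefix? A simple decoder has to consume each prefix byte before it can even locate the opcode. Here is a toy byte-at-a-time prefix scan; the prefix byte values are the usual x86 ones, but the cycle accounting is purely illustrative:

    # Sketch of a byte-at-a-time x86 prefix scanner, illustrating why early
    # hardware decoders charged roughly one cycle per prefix byte: each prefix
    # must be consumed before the real opcode byte can even be located.
    X86_PREFIXES = {
        0xF0, 0xF2, 0xF3,                     # LOCK, REPNE, REP
        0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65,   # segment overrides
        0x66, 0x67,                           # operand-size, address-size overrides
    }

    def scan_prefixes(instruction_bytes):
        """Return (number_of_prefixes, opcode_byte), one 'cycle' per prefix."""
        cycles = 0
        i = 0
        while instruction_bytes[i] in X86_PREFIXES:
            cycles += 1                       # each prefix costs an extra scan step
            i += 1
        return cycles, instruction_bytes[i]

    # 66 F3 0F ... : operand-size + REP prefix before a two-byte opcode escape
    print(scan_prefixes(bytes([0x66, 0xF3, 0x0F, 0xB8])))   # (2, 15)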

Most important, these instruction sets had few side effects.
Yes, the x86 has condition codes.
But most x86 instructions overwrite all of the arithmetic condition codes (INC and DEC not affecting the carry flag being the notable exception).
This avoided the sort of RAW hazard on the condition codes that an OOO implementation of, say, the Motorola 68000 would have had to cope with.
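
A toy renamer sketch makes the difference clear - this models only the dependence structure, not any real machine: an instruction that overwrites all the flags can simply be handed a fresh physical flags register, whereas a partial update such as INC must also read the previous flags value to merge in the unchanged carry, and that read is the RAW hazard.

    # Toy sketch of flag renaming, showing why full flag overwrites are friendlier
    # to out-of-order execution than partial updates.
    next_phys = 0
    current_flags = "pf0"   # physical register currently holding the flags

    def rename_flags(writes_all_flags):
        """Allocate a fresh physical flags register; return (dest, extra_source)."""
        global next_phys, current_flags
        next_phys += 1
        dest = f"pf{next_phys}"
        # A full overwrite (most x86 arithmetic) depends on no older flags value.
        # A partial update (INC/DEC leaving carry alone, or 68000-style selective
        # updates) must read the previous flags to merge them: a RAW hazard.
        extra_source = None if writes_all_flags else current_flags
        current_flags = dest
        return dest, extra_source

    print(rename_flags(True))    # ADD: ('pf1', None)   - no flags dependence
    print(rename_flags(False))   # INC: ('pf2', 'pf1')  - must wait for pf1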

= Didn't RISC win anyway? =

Did not RISC win anyway? After all, doesn't a modern CISC processor translate to RISC uops internally?

Well, yes and no. Let's look at some of the RISC principles proposed in the early papers:

* fixed length instructions - 32 bit
** at the [[macroinstruction set level]], not so much
** at the [[microinstruction set]] or [[UISA]] level, maybe
*** maybe - but definitely not "small". Is it a RISC if the (micro)instructions are 160 bits wide?
*** even here, the recent trend is towards compressing microcode. Some are small, some take more bits.
** the most popular surviving RISC instruction sets have 16 bit subsets to increase code density

* simple instruction decoding
** ISA level, no
** UISA level - undocumented. Probably. But, again, very wide!!

* software floating point
** nope!!

* large uniform register set
** in the early days, not so much
** over time, the register set has grown. As has the complexity of the ISA encoding, [[REX bytes]] etc.

* small number of instructions
** definitely not!!!
** microcode instruction sets have long been full of many instructions, particularly widget instructions
** even macroinstruction sets have increased dramatically in size since 1990. More than quadrupled.

Some have said that the point of RISC was not reduced instruction count,
but reduced instruction complexity.
This may be true - certainly, this was always the argument that I used *against* rabid RISC enthusiasts who were trying to reduce the number of instructions in the instruction set.
But nevertheless, there were many, many, RISC zealots and managers who evaluated proposals by counting instructions.


= What's left? =

The most important intellectual survivor of the RISC philosophy has been the aversion to complicated microcode sequences.
Expanding instructions to 1 to 4 [[uop]]s may be okay,
but taking a cycle or more to branch into a [[microcode sequencer]], perform the operation, and branch back is a performance penalty nobody wants to take.

The number of instructions in instruction sets has increased dramatically over the past decade.
But the vast majority of these are instructions that can be implemented directly by hardware,
by combinational logic,
or by simple state machines in the case of operations such as divide.
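
Divide is the canonical example: it can be sequenced by a trivial per-bit hardware state machine, no microcode sequencer involved. A sketch of the classic restoring (shift-and-subtract) algorithm, one quotient bit per "cycle":

    # Sketch of a restoring (shift-and-subtract) divider: the kind of simple
    # per-bit state machine that hardware can sequence without microcode.
    # One loop iteration corresponds to one hardware cycle.
    def restoring_divide(dividend, divisor, bits=32):
        assert divisor != 0
        remainder, quotient = 0, dividend
        for _ in range(bits):                      # one quotient bit per cycle
            # Shift the (remainder, quotient) pair left by one bit.
            remainder = (remainder << 1) | ((quotient >> (bits - 1)) & 1)
            quotient = (quotient << 1) & ((1 << bits) - 1)
            if remainder >= divisor:               # trial subtract succeeds
                remainder -= divisor
                quotient |= 1                      # set the new quotient bit
        return quotient, remainder

    print(restoring_divide(100, 7))   # (14, 2)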

One may conjecture that the original RISC aversion to floating point
was actually an aversion to microcoded floating point:
when the most important floating point operations like [[FADD]] and [[FMUL]] became pipelined,
capable of being started one or more per clock
even though the operation took several cycles of latency,
the objections to FP on a RISC died away.

== Bad Effects ==

We are still paying the cost of certain RISC-era design decisions.

For example, Intel MMX is irregular.
It does not have all the reasonable combinations of
datasize={8,16,32},
unsaturated, signed and unsigned saturation.
This irregularity was NOT because of hardware complexity,
but because management was trying to follow RISC principles by counting instructions.
Even when providing all regular combinations would have made the hardware simpler rather than harder.
(Aside: validation complexity is often used as an argument here, against regularity. The complexity of validating all regular combinations grows combinatorially.)
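
To make the regularity argument concrete, the full cross product of packed-add variants is small and the datapath logic is essentially shared. Enumerating it - illustratively; this is not Intel's actual MMX opcode list - looks like this:

    # Sketch of the "regular" matrix of packed-add variants argued for above:
    # every element size crossed with every saturation mode.
    from itertools import product

    sizes = (8, 16, 32)                                   # element width in bits
    modes = ("wraparound", "signed_sat", "unsigned_sat")  # saturation behavior

    def packed_add(a, b, size, mode):
        """Elementwise add of two equal-length tuples of <size>-bit lanes."""
        lo, hi = 0, (1 << size) - 1                           # unsigned lane range
        slo, shi = -(1 << (size - 1)), (1 << (size - 1)) - 1  # signed lane range
        out = []
        for x, y in zip(a, b):
            s = x + y
            if mode == "wraparound":
                s &= hi
            elif mode == "unsigned_sat":
                s = min(max(s, lo), hi)
            else:  # signed_sat: interpret lanes as signed, clamp, re-encode
                sx = x - (1 << size) if x > shi else x
                sy = y - (1 << size) if y > shi else y
                s = min(max(sx + sy, slo), shi) & hi
            out.append(s)
        return tuple(out)

    # Nine regular combinations, sharing essentially one datapath.
    for size, mode in product(sizes, modes):
        print(f"PADD size={size:<2} mode={mode:<12}",
              packed_add((250, 3), (10, 4), size, mode))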

AMD x86-64 is not a bad instruction set extension.
But life might be easier in the future - assuming x86 does not die away - if it had been more regular.
More RISC-like.
But in this way RISC was its own enemy:
RISC did not achieve the often hoped for great performance improvements over CISC.
RISC reduces complexity, which does not directly improve performance.
So people went chasing after a Holy Grail of VLIW performance, which also did not pan out.

= Conclusion =

I did not really want to write this article. There is not much to add to all that has been written about the so-called [[RISC Wars]] except to say
that they caused a lot of thrashing, amounted to less than promised, although they did bring some improvements.

However, I could not really write a wiki on computer architecture without mentioning [[RISC versus CISC]].

I would prefer, however, to discuss interesting aspects of RISC computer architecture that may not be discussed in many other places,
rather than rehash this old debate:

* [[Breaking out of the 32-bit RISC instruction set straitjacket]]
** [[variable length RISC instruction sets]]
** [[16-bit RISC instruction sets]]
** [[RISC instruction sets with prefix instructions]]

* [[post-RISC proliferation of instructions]]

* [[How do you count instructions in an instruction set architecture?]]

* [[Microcoded instructions and RISC]]
* [[Hardwired instructions and RISC]]
* [[Hardware state-machine sequenced instructions and RISC]]

Many of these latter issues are of the form "XXX and RISC", and really should be only of the form "XXX",
except that one of the leftovers of the [[RISC era]] is that it is considered obligatory to explain
how a feature supports or opposes the [[RISC philosophy]].
