The Microprocessor Today: Microprocessor Report's Publisher
TODAY
Michael Slater
December 1996 33
Current status
Table 1. Originators and licensees for leading desktop architectures.

Architecture  Originator             Licensees
Alpha         Digital Semiconductor  Mitsubishi, Samsung
Mips          Mips Technologies      IDT, NEC, Toshiba
PA-RISC       Hewlett-Packard        Hitachi, Samsung
PowerPC       Apple, IBM, Motorola   Groupe Bull, Exponential
Sparc         Sun Microelectronics   Fujitsu (includes HaL, ICL, and Ross)

dard. This effort failed miserably for at least two reasons. First, Unix was, and is, unsuitable for a mass-market operating system because of its complexity, its resource requirements, and its lack of personal productivity applications. Second, Sun was never willing to let Sun-compatible system makers operate unrestrained, for fear of the effect that might have on Sun’s own hardware business.

Next up was Mips, then an independent company (Mips Computer Systems). Microsoft chose Mips as the first RISC architecture that the emerging high-end version of Windows, Windows NT, would support. Mips engineered the ACE initiative, and at one point had both Digital Equipment Corporation and Compaq planning to build Mips-based systems to run Windows NT. But the timing was bad; Compaq fell on hard times as lower cost PC clones eroded its business, and Digital decided to create its own architecture and abandon Mips.

Mips made its own contribution to the failure of ACE by trying to collect large license fees for reference system designs. Silicon Graphics (SGI) soon thereafter swallowed Mips Computer Systems, which became SGI’s Mips Technologies subsidiary. SGI has shown no interest in either Windows NT or the high-volume desktop market, and Microsoft is dropping Mips support in Windows NT.

Then came PowerPC, backed by IBM, Motorola, and Apple. Apple has successfully converted the Macintosh line to PowerPC, giving PowerPC the biggest desktop market of any RISC processor. Because Apple and the Macintosh platform itself are struggling to maintain their modest position, however, this doesn’t represent much of a growth opportunity. Efforts to go beyond Macintosh on the mainstream desktop have largely failed: IBM’s OS/2 for PowerPC was stillborn, Taligent folded its tent, and PowerPC’s position in the Windows NT market is weak.

The most recent architecture to aim at the desktop market is Alpha, Digital’s home-brewed replacement for the Mips architecture. Digital wholeheartedly embraced Windows NT and has the benefit of owning its own systems business, including a PC business. But Windows NT is only now approaching the maturity that will enable it to become a mainstream operating system; and DRAM prices are only now becoming low enough to render Windows NT’s additional memory requirement insignificant.

Until now, Digital has used Alpha’s outstanding performance to sell very fast systems at premium prices, a nice niche business but hardly a factor in the PC market. Next year, Digital plans to begin moving down into high-end PCs, setting the stage for an eventual attack on the mainstream PC market. It is a long shot for Alpha to capture a significant mainstream role, but at least it can’t be counted out yet.

It is in this light that PowerPC’s position in the Windows NT market appears so weak. PowerPC processors are not nearly as fast as the Alpha chips and don’t offer a significant performance advantage over Intel processors, leaving them between a rock and a hard place. Customers looking for safety and compatibility choose Intel; those seeking maximum performance on a small set of applications are drawn to Alpha. This leaves few for whom PowerPC would be a compelling choice.

The staying power of the PowerPC backers is the architecture’s key strength. If a future generation of chips is much stronger than today’s, the architecture could end up head to head with Alpha in an attempt to capture the number two position in the Windows NT market.

This leaves Hewlett-Packard’s PA-RISC as the only RISC architecture whose owner never attempted to use it in an attack on Intel’s market share. This may have been an excellent decision, considering the fate of companies that have tried. HP is now engaged with Intel in a joint development project that will lead to a new architecture around 1998. The architecture, called IA-64 and to be first implemented in a chip code-named Merced, will provide backward compatibility with both x86 and PA-RISC programs. Having built a large computer business around its architecture, HP has found no compelling reason to spend billions of dollars on fabrication facilities and chip designs to provide processors for these systems. Thus, it has joined future paths with Intel.

Table 1 shows the companies backing each of the architectures for general-purpose computers. (Not shown are licensed implementations; for example, Cyrix has licensed its x86 processor designs to IBM Microelectronics and SGS-Thomson.) Although there has been a mad rush to sign up licensees, it has turned out to be relatively insignificant; the owners and primary backers of each architecture determine its fate.

Pentium dominates computing today

Intel’s Pentium processor series dominates today’s desktop computer market. Depending on clock speed, this chip spans a price range (in quantities of 1,000) from about $75 to just over $500, putting it at the appropriate price points for most PCs. Although early Pentium processors provided little advantage over 486 chips, Intel’s aggressive promotion of Pentium and rapid increase in the chip’s clock speed enabled it to sweep the desktop market by the end of 1995 and the notebook market in the first half of 1996.

Following a familiar pattern in the microprocessor industry, but at an accelerated pace, Intel has twice moved Pentium to a new process technology. The initial chips, code-named P5, were built in 0.8-micron BiCMOS and ran at 60 and 66 MHz. These chips were power hungry, and Intel phased them out before Pentium began its move into the mainstream PC market. The next version, the P54C, shrank the design to 0.6-micron BiCMOS and enabled clock rates of 75 to 120 MHz. This version also cut the supply voltage to 3.3 V and added dynamic power management circuitry.
34 IEEE Micro
This feature shuts down portions of the chip not in use on a cycle-by-cycle basis, slashing typical power consumption. Then Intel shrank the design once again to 0.35-micron BiCMOS, enabling clock speeds up to 166 MHz. A minor revision of this design pushes the clock speed to 200 MHz, more than three times that of the original Pentium.

To keep system design relatively easy, however, Intel has held the system bus speed at 60 or 66 MHz. Because of this, there is a huge gap between increasing core CPU speeds and the bandwidth of the external bus, which provides access to the level-two cache as well as to main memory. This reduces the benefit of faster core speeds; the 200-MHz Pentium has a typical performance gain of less than 10% over the 166-MHz chip. Power consumption has also crept up to uncomfortable levels as the clock speed has increased, keeping the 166-MHz and faster chips out of portable systems.

Intel will mitigate these problems early next year with a new version of Pentium, code-named the P55C and implemented in 0.28-micron CMOS. By doubling the size of the on-chip cache, Intel estimates that the miss rate will decrease 20 to 40% on typical Windows applications, mitigating the performance loss from the relatively slow external bus. The P55C will also include pipeline enhancements to boost its per-clock performance, as well as the MMX instruction set extensions for multimedia (described later).

The P55C will mark Intel’s shift away from the BiCMOS process technology of earlier Pentiums. The 0.28-micron (drawn gate size) process enables Intel to reduce the supply voltage from 3.3 to 2.8 V, which significantly reduces power consumption. At this low voltage, however, bipolar transistors offer little benefit, making the extra process steps of BiCMOS unjustified. The supply voltage reduction will make higher clock rates practical for portable systems and will simplify cooling in desktop systems.

Intel’s new frontier: Pentium Pro

The Pentium design uses a simple, restrictive approach to superscalar operation. Its two pipelines do not operate entirely independently; when one stalls, the other must stop as well, so no out-of-order execution is allowed. Furthermore, the floating-point unit is not autonomous but relies on the integer pipelines, so integer and floating-point instructions cannot execute in parallel.

Figure 2. Pentium Pro microprocessor block diagram. TLB: translation look-aside buffer.

Intel’s most recent microprocessor design, Pentium Pro (P6), takes a far more aggressive approach to deliver more performance per clock cycle while also enabling higher clock speeds. Figure 2 shows the processor’s block diagram.

The Pentium Pro design completely decouples instruction dispatch and execution, translating x86 instructions into internal micro-operations, not unlike traditional microcode instructions. These micro-ops then pass to a 40-entry reorder buffer, where they are stored until any required operands are available. From there, they are issued to a 20-entry reservation station, which queues them until the needed execution unit is free. This design allows micro-ops to execute out of order, making it easier to keep parallel execution resources busy. At the same time, the fixed-length micro-ops are easier to handle in the speculative, out-of-order core than complex, variable-length x86 instructions.

To enable high clock speeds, Pentium Pro is very deeply pipelined (also called superpipelined). Because the reservation station represents an elastic element, the pipeline does
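The out-of-order issue that the Pentium Pro discussion describes can be illustrated with a toy scheduler. This is only a sketch under simplifying assumptions, not a model of the actual chip: the `MicroOp` fields, the `simulate` function, the register names, and the latencies are all hypothetical, and the in-order retirement that a reorder buffer provides is not modeled. It shows a younger, independent micro-op completing ahead of an older load and the add that depends on that load.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    name: str
    dest: str      # register written by the micro-op (hypothetical names)
    srcs: tuple    # registers that must be ready before it can issue
    latency: int   # cycles spent in the execution unit (illustrative values)


def simulate(program, ready_regs, issue_width=2):
    """Return micro-op names in completion order for a toy dataflow scheduler."""
    ready = set(ready_regs)
    waiting = list(program)   # stands in for the reservation station
    in_flight = []            # (finish_cycle, micro-op) pairs
    completed = []
    cycle = 0
    while waiting or in_flight:
        # Results finishing this cycle make their destination registers ready.
        for item in [f for f in in_flight if f[0] <= cycle]:
            in_flight.remove(item)
            ready.add(item[1].dest)
            completed.append(item[1].name)
        # Issue the oldest waiting micro-ops whose operands are all ready.
        issued = 0
        for op in list(waiting):
            if issued == issue_width:
                break
            if all(src in ready for src in op.srcs):
                waiting.remove(op)
                in_flight.append((cycle + op.latency, op))
                issued += 1
        if waiting and not in_flight:
            raise ValueError("a micro-op depends on a register that is never written")
        cycle += 1
    return completed


program = [
    MicroOp("load", dest="r1", srcs=("r0",), latency=3),
    MicroOp("add", dest="r2", srcs=("r1",), latency=1),        # needs the load result
    MicroOp("mul", dest="r3", srcs=("r4", "r5"), latency=1),   # independent of both
]
order = simulate(program, ready_regs={"r0", "r4", "r5"})
print(order)
```

In program order the `mul` comes last, but because its operands are ready immediately it finishes before the three-cycle `load`; the `add` has to wait for the load result. A strictly in-order pipeline would have left the multiplier idle during that wait.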
Figure 3. SPECint95 (base) performance versus time for x86 and RISC architectures. Numbers following processor names are clock speeds in MHz.

To push Pentium Pro performance as high as possible, Intel designed a special level-two cache chip that is mounted in the same package with the CPU chip. The connections between the CPU and the cache chip are point to point and don’t leave the package, which enables Intel to use nonstandard voltage levels and achieve high data rates. The level-two cache chip, which Intel makes in both 256- and 512-Kbyte versions, delivers 64 bits per clock cycle, even with CPU clock speeds up to 200 MHz.

This cache strategy was effective in bringing Pentium Pro to market with performance numbers that sent shock waves through the planning departments of most RISC microprocessor makers. As Figure 3 shows, at its introduction Pentium Pro exceeded the SPECint95 performance of all shipping RISC microprocessors. This position didn’t last long, however, as Intel has gone more than a year without either increasing clock speed or introducing a new microarchitecture. Each of Intel’s RISC competitors has done one or both. As Figure 4 shows, Pentium Pro is even further behind the RISCs when it comes to floating-point performance.

[Figure 4. SPECint95 (base) and SPECfp95 (base), log scale, for the R10000-200, PPC604e-225, PentiumPro-200, PA-8000-180, and 21164-500.]

In the long run, however, Intel doesn’t want to devote half its fab capacity to relatively low-margin SRAMs; it has been working with SRAM makers to provide industry-standard
Table 2. Key features of selected x86 microprocessors.

Maximum clock rate (MHz)          120            200            200            150            200            233-266†       100            >180           150            225
Pinout                            P54C           P54C           P54C           PPro           PPro           Klamath        P54C           P55C           P54C           P55C
Cache (data/instr., Kbytes)       8/8            8/8            16/16          8/8            8/8            N.A.           8/16           32/32          16/16          64 (unified)
MMX                               No             No             Yes            No             No             Yes            No             Yes            No             Yes
Decode rate (instr./clock cycle)  2              2              2              3              3              3              1-4            2              2              2
Issue rate per clock cycle        2 instr.*      2 instr.       2 instr.       5 micro-ops    5 micro-ops    5 micro-ops    4 micro-ops    4 micro-ops    2 instr.       2 instr.
Out-of-order execution            No             No             No             Yes            Yes            Yes            Yes            Yes            Limited        Limited
Die size (mm2)                    148            90             140            308            196            N.A.**         181            ~180           167            <200
Transistors (millions)            3.3            3.3            4.5            5.5            5.5            N.A.           4.0            8.8            3.3            6.0
Process (µm/layers)               0.5/4 BiCMOS   0.35/4 BiCMOS  0.28/4 CMOS    0.5/4 BiCMOS   0.35/4 BiCMOS  0.28/4 CMOS    0.35/3 CMOS    0.35/5 CMOS    0.44/5 CMOS    0.35/5 CMOS
Mfg. cost†                        $50            $40            $60            $180††         $145††         N.A.           $70            $85            $70            $95
Production                        Now            Now            1Q97           Now            Now            1H97           Now            1H97           Now            1H97
List price***                     $106-134       $204-509       N.A.           $534           $428-1,035     N.A.           $60-134        N.A.           $98-299        N.A.
microprocessors. Table 2 summarizes the key features of today’s most important x86 microprocessors.

AMD has a long history as an alternative supplier of x86 microprocessors. The company was a licensed alternate source of Intel’s 8086 and 286 microprocessors, but the technology exchange agreement between the two companies broke down into a bitter and drawn-out arbitration. As a result, Intel never transferred its 386 or later technology to AMD. Instead, AMD entered the 386 and 486 markets by reverse-engineering Intel’s chips. This involved extracting the circuit designs, making minor modifications (such as for static rather than dynamic operation), and producing new physical layouts tuned for AMD’s process technology.

This path proved successful in that it enabled AMD to continue supplying microprocessors to the PC industry. However, it offered AMD little opportunity for differentiation and no chance of catching up with Intel’s performance level. AMD couldn’t even begin its reverse-engineering and reimplementation process until Intel shipped a product.

AMD therefore decided to create an entirely independent design, taking from the Intel chips only the instruction set (for software compatibility) and the bus interface and pinout (for system interface compatibility). After several delays, the K5 reached the market, but without delivering the anticipated performance level. The chip was supposed to deliver performance 30% higher than an Intel Pentium processor at the same clock rate. Instead, it barely matched Intel’s per-clock performance on Windows application benchmarks, despite a much more complex design and a 30% greater transistor count.

As of October 1996, AMD had been unable to make the chip run faster than 100 MHz, while Intel was shipping Pentiums at up to 200 MHz. This failing relegated AMD to the low end of the PC microprocessor business, leaving little profit for a chip as large as the K5 (see Table 2). At the same time, the 486 market had largely dried up, and what remained was priced in the $20-30 range, leaving AMD no significant older products to fall back on.

AMD recently released an improved version of the K5 design that eliminates bottlenecks and reaches the originally targeted performance levels. At 100 MHz, it delivers performance equivalent to a 133-MHz Pentium, moving AMD into the midrange Pentium market.

AMD’s big opportunity, however, depends on the K6, a design that started life as the NexGen 686, which AMD bought NexGen to obtain. Like the Pentium Pro and K5, the K6 uses a decoupled decode/execute design in which x86 instructions are first decoded into internal, RISC-like operations. AMD also is adding the MMX instruction set extensions. As the K5 design has shown, though, the devil is in the details: A design’s effectiveness depends on a multitude of subtle design issues, any one of which can become a performance-limiting bottleneck. On paper, the K6 looks good, but until AMD ships its first K6 samples, due by the end of 1996, how well it performs will remain an unknown.

Unlike AMD, Cyrix designed its own x86 cores from the start. The company started with a low-end 486-class core,
Table 3. Key features of selected high-performance microprocessors. (Source: Vendors except where noted)

Clock rate (MHz)                          500           200           225           250           110           180           160*          200           180           200
Cache size (Kbytes)                       8/8/96        32/32         32/32         16/16         16/8          None          64/64         32/32         32/32         8/8
Issue rate (instr./cycle)                 4             4             4             4             2             4             2             4             2             3
Pipeline stages                           7             5             6             6/9           5             7-9           5             5-7           5             12-14
Out-of-order execution                    6 loads       16 instr.     16 instr.     None          None          56 instr.     None          32 instr.     None          40 ROPs
Rename registers                          None          8 int/8 FP    12 int/8 FP   None          None          56 total      None          32 int/32 FP  None          40 total
Memory bandwidth (Mbytes/s)               ~400          1,200         ~180          1,300         ~100          768           213           539           ~160          528
Package, pins                             CPGA-499      CBGA-625      CBGA-255      PBGA-521      CPGA-321      LGA-1,085     CPGA-464      CPGA-527      SBGA-272      MCM-387
Process (µm/layers)                       0.35/4        0.35/4        0.35/4        0.29/5        0.4/3         0.5/4         0.5/4         0.35/4        0.35/3        0.35/4
Die size (mm2)                            209           240*          148           149           233           345           259           298           84            196
Transistors (millions)                    9.3           6.9           5.1           3.8           2.3           3.9           9.2           5.9           3.6           5.5
Estimated mfg. cost*                      $150          $210          $60           $90           $80           $290          $95           $160          $25           $175**
Maximum power (W)                         25            30            20*           30            9             >40           15            30            10            35**
SPEC95 baseline performance (integer/FP)  12.6/18.3     9.0/9.0*      8.5/7.0       8.5/15        1.4/1.9       10.8/18.3     5.5/7.3       8.9/17.2      4.0/3.7       8.7/6.0
Availability                              Now           1H97          Now           Now           Now           Now           Now           Now           Now           Now
List price (1,000)                        N.A.          N.A.          $594          $1,995        $379          N.A.          N.A.          $3,000        $365          $1,035
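Because Table 3 lists both clock rate and SPEC95 baseline scores, a rough per-clock comparison takes only a division. The following is a minimal sketch, assuming that SPECint95 per 100 MHz is an acceptable proxy for performance per clock cycle; that metric and the names `TABLE3` and `specint_per_100mhz` are illustrative choices, not something the table defines. Values are copied from the table in column order, and asterisked entries there are estimates.

```python
# (clock rate in MHz, SPECint95 base) pairs copied from Table 3, left to right.
# Asterisked table entries (160 MHz, 9.0) are estimates in the source.
TABLE3 = [
    (500, 12.6), (200, 9.0), (225, 8.5), (250, 8.5), (110, 1.4),
    (180, 10.8), (160, 5.5), (200, 8.9), (180, 4.0), (200, 8.7),
]


def specint_per_100mhz(clock_mhz, specint):
    """SPECint95 delivered per 100 MHz of clock (illustrative metric)."""
    return specint / clock_mhz * 100.0


per_clock = [round(specint_per_100mhz(clock, score), 2) for clock, score in TABLE3]

for (clock, score), ratio in zip(TABLE3, per_clock):
    print(f"{clock:>4} MHz  SPECint95 {score:>5}  per 100 MHz {ratio}")
```

On these numbers the 500-MHz column has the highest absolute SPECint95 score but only about 2.5 per 100 MHz, lower than most other columns, while the 180-MHz column that scores 10.8 works out to 6.0.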
which it leveraged into a range of products from the 386SX-pin-compatible 486SLC to a 486DX2. Cyrix abandoned these products at the end of 1995, however, as it began the switch to its Pentium-class core, code-named the M1 and officially called the 6x86. This chip delivers impressive performance per clock cycle: At 133 MHz, for example, it outperforms a 166-MHz Pentium on common Windows application benchmarks. Rather than using the complex decoupled decode/execute approach of Pentium Pro and the K5, the 6x86 extends Pentium’s relatively straightforward dual-pipeline approach with additional features that enable both pipelines to run concurrently more often.

If Cyrix had access to Intel’s leading-edge process technology, its chips might match Intel’s Pentium clock rates. But as things stand, Cyrix uses 0.44-micron CMOS technology to compete against Intel’s 0.35-micron chips. That Cyrix can beat Intel’s Pentium performance even with this handicap is a testament to the efficiency of its design.

Like AMD, Cyrix will move to a next-generation design in early 1997 that will be key to its future success. Code-named the M2, this chip is based on the 6x86 core but adds a much larger 64-Kbyte cache and other performance enhancements, as well as the MMX instruction set extensions.

In 1997, makers of leading-edge PCs will be able to use Intel’s P55C or P6-series chips, AMD’s K6, or Cyrix’s M2. Intel is all but guaranteed the lion’s share of the market, but AMD and Cyrix have the opportunity to gain a minority share big enough to be quite significant for them, if they execute well.

By the end of 1997, however, there may be other competitors to contend with. Texas Instruments has a long-pending effort to develop its own x86 CPU core; at least four start-ups in the United States are working on x86 microprocessors; and semiconductor makers in Korea and Japan are probably considering similar efforts as well.

The pursuit of speed

In the never-ending pursuit of maximum performance, microprocessor makers have followed a variety of strategies. In each case, designers must make countless judgment calls (generally backed by simulations) on myriad design options, hoping to make the best use of transistor budgets. Table 3 summarizes the key characteristics of today’s highest performance microprocessors.

Perhaps the most fundamental trade-off is between doing lots of work in each clock cycle, which tends to generate complex designs with limited clock rates, and streamlining the design as much as possible in pursuit of maximum clock speed. Sun’s SuperSparc is a notable example of a chip that pushed complexity too far, giving up too much in clock rate to justify the per-clock efficiency. Sun remedied this in its
next design, UltraSparc.

Digital has been the most successful proponent of the maximum clock speed approach. The company plans to ship 500-MHz processors this year, while most other vendors’ chips will be at 200 to 250 MHz. Digital’s Alpha 21164 does deliver the industry’s best performance, but not by as big a margin as the high clock speed would indicate. As part of the speed/complexity trade-off, it has among the industry’s worst performance per clock cycle. Figure 5 shows a block diagram of the 21164.

[Figure 5. Block diagram of the Alpha 21164.]
than twice as many logic transistors. The figure shows that high-end processors today typically have CPU cores with 2 to 4 million transistors devoted to logic. The number of transistors devoted to memory ranges from less than 1 million to more than 6 million.

Extending instruction sets for multimedia

Although the gulf in instruction set design style between the x86 and RISC camps remains, they do agree on one point: Modest extensions to the instruction set can significantly improve multimedia performance. A small increase in die area delivers a significant boost in performance for functions such as MPEG encoding and decoding, audio synthesis, image processing, and modems.

At the heart of most vendors’ multimedia extensions are single-instruction, multiple-data (SIMD) operations. By taking a 64-bit ALU and allowing the carry chain to be broken at various points, essentially the same amount of logic can perform two 32-bit operations, four 16-bit operations, or eight 8-bit operations, all in parallel. One complication is that multiple carry bits are not available. Fortunately, however, most signal-processing operations benefit from saturation arithmetic. Instead of rolling over and setting the carry bit, saturation arithmetic sets the result at the minimum or maximum value. Most multimedia extensions add saturation arithmetic as an option. Other common additions are instructions for multiply-add and data element packing and unpacking.

HP was the first to add such extensions to its RISC architecture, but HP’s instructions are quite simple. Sun offers the most comprehensive set of extensions in its VIS (Visual Instruction Set), implemented in UltraSparc. Sun’s extensions include some relatively complex instructions, such as pixel distance, in addition to the simpler SIMD operations.

The most widely discussed, though not yet shipped, set of extensions is Intel’s MMX, which will appear next year in the P55C and Klamath processors. Both AMD and Cyrix will offer MMX-compatible extensions next year as well. Intel estimates that the performance of MMX-enhanced code will be from 1.4 times better for MPEG video decoding to more than 4 times better for still-image processing (such as Adobe Photoshop filtering). Of course, most programs won’t benefit at all, and compilers don’t use MMX; programmers must handcraft the code to realize the benefits.

The Mips and Alpha camps recently announced their own multimedia extensions, leaving PowerPC as the only popular architecture not to follow suit. This is ironic, since PowerPC’s primary user, Apple, focuses on multimedia, and one of the PowerPC’s predecessors, Motorola’s ill-fated 88110, had a set of graphics instruction set extensions.

Media processors enter the fray

General-purpose microprocessors can improve their handling of multimedia data types through instruction set extensions, but there are compelling reasons to use a separate processor for these tasks. DSP-like architectures provide multiple operand data paths, very-long-instruction-word-like arrangements, and other special features that make them fast but often hard to program. With these characteristics, a given silicon area can deliver much greater performance on signal-processing applications than could an equal area in an extended general-purpose architecture.

DSP chips are not new; indeed, they are at the heart of most modems, cellular phones, disk drives, and countless other devices. They have had little success in PCs, however, because they aren’t well optimized for the PC environment. However, several companies are now making media processors carefully designed for PCs. These chips typically have PCI bus interfaces, integrated codecs or codec interfaces, and graphics engines that provide compatibility with legacy PC display controller standards (such as VGA). Most importantly, makers of these PC media processors also provide driver software that enables applications to communicate with the chips via Microsoft’s DirectX application programming interfaces (APIs). Thus, programmers need not customize application programs for each hardware design.

Today, a start-up company called Chromatic Research (Sunnyvale, Calif.) is the closest to shipping such a media processor. Like many pioneering microprocessor companies of recent times, Chromatic Research is fabless. LG Semicon and Toshiba manufacture and sell the chips, while Chromatic sells the software that makes them work. Chromatic’s Mpact media processor can perform not only 2D and 3D graphics rendering but also MPEG-1 and MPEG-2 decompression, MPEG-1 compression, teleconferencing, 33-Kbps fax/modem, and audio synthesis. Philips has its own media processor, TriMedia; Samsung, Mitsubishi, IBM, and others have media processors in the works.

Whether these media processors have a long-term role in PCs remains a subject of controversy. From Intel’s perspective, there is room for only one programmable processor in a system. In this view, functions that require hardware acceleration (such as 3D rendering) are best performed by fixed-function accelerators. In time, as the PC’s central processor becomes faster, less opportunity will remain for media processors. In the near term, though, there appears to be a clear opportunity for such processors to boost PC capabilities for a modest incremental cost.

Embedded processors enable digital consumer electronics

Embedded microprocessors rarely bask in public attention or earn huge profits, but manufacturers produce them in enormous volume and in great diversity. Because software compatibility is not as driving a force as in the desktop market, the embedded market allows more architectures to survive.

Early embedded microprocessor applications were control oriented: Traffic-light and elevator controllers are the classic examples. As microprocessor performance increased, the range of tasks that processors can handle broadened. The vast majority of embedded applications don’t demand any more performance than low-cost 8-bit (or even 4-bit) processors offer. Figure 7 shows that, as a result, the bulk of the volume remains with these older devices, which continue to evolve by adding more on-chip peripherals and memory. Ancient 4-bit processors have remained surprisingly popular, but new designs rarely use them because low-end 8-bit devices have dropped
to very low prices and are easier to program. Even so, 4-bit chips, long considered obsolete by most observers, are only now beginning to fade away and will continue shipping more than a billion units per year through the end of the decade.

Some automotive engine controllers, as well as disk and network cards for PCs, use 16-bit embedded processors. (Note that Figure 7 defines 16-bit processors by their exter-

[Figure 7. Legend: 4-bit, 8-bit, 16-bit, 32-bit.]
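Returning to the multimedia extensions discussed under "Extending instruction sets for multimedia": the partitioned, saturating arithmetic described there can be emulated in a few lines. This is a minimal sketch, assuming unsigned lanes only; the function names (`split_lanes`, `join_lanes`, `packed_add`) and the operand values are hypothetical and do not correspond to any vendor's instruction definitions. Breaking the carry chain is modeled by masking each lane so that no carry crosses into its neighbor.

```python
def split_lanes(word, lane_bits, total_bits=64):
    """Split a word into unsigned lanes, least significant lane first."""
    mask = (1 << lane_bits) - 1
    return [(word >> shift) & mask for shift in range(0, total_bits, lane_bits)]


def join_lanes(lanes, lane_bits):
    """Reassemble lanes (least significant first) into one word."""
    word = 0
    for index, lane in enumerate(lanes):
        word |= lane << (index * lane_bits)
    return word


def packed_add(a, b, lane_bits, saturate=False):
    """Add two 64-bit words lane by lane, treating each lane as unsigned."""
    limit = (1 << lane_bits) - 1
    result = []
    for x, y in zip(split_lanes(a, lane_bits), split_lanes(b, lane_bits)):
        total = x + y
        if saturate:
            total = min(total, limit)   # clamp at the lane maximum
        else:
            total &= limit              # wrap; the carry out of the lane is dropped
        result.append(total)
    return join_lanes(result, lane_bits)


# Four 16-bit lanes packed into one 64-bit word (illustrative operands).
a = 0x0001_FFFF_8000_00FF
b = 0x0001_0001_8000_0001

wrapped = packed_add(a, b, lane_bits=16)
clamped = packed_add(a, b, lane_bits=16, saturate=True)
print(f"{wrapped:016X}")
print(f"{clamped:016X}")
```

With wraparound, the two lanes that overflow roll over to 0x0000; with saturation they stick at 0xFFFF, which is usually the preferable result for pixel or audio samples. The same functions handle two 32-bit or eight 8-bit lanes by changing `lane_bits`.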
Figure 8. Block diagram of Cirrus Logic’s CL-PS7110 integrated processor with ARM7 core.
infrastructure elements, could be a significant enabler of broader competition in the microprocessor business.

Embedded processors proliferate

Table 4 summarizes the key features of a few of the more than one hundred 32- and 64-bit embedded processors now available. As application demands and the competitive environment have changed, architectures have evolved. Digital’s StrongARM is a stunningly fast derivative of the power-miserly but not especially fast ARM architecture. Hitachi’s new SuperH series has a wide range of devices, of which the table lists only one. Similarly, Motorola and IBM are each producing numerous PowerPC variations for embedded control applications.

Motorola is the champion of embedded processor proliferations, with uncounted 68000 variations. Now it has even modified the base instruction set architecture to produce the RISC-like ColdFire subset. NEC, along with IDT and LSI Logic, is pushing the Mips architecture into embedded applications; Table 4 shows only one of many options. Intel continues to develop its 960 series, which is successful in some markets but shows little sign of progress in the expanding market for low-cost 32-bit processors. (The PC market is a formidable distraction for Intel.)

As high-performance embedded processors move into consumer electronics, low power consumption becomes as important as low price. In portable applications, the value of low power is obvious: longer battery life or smaller, lighter batteries. Even in nonportable consumer applications, however, low power consumption is important, because it reduces the cost of power supplies and eliminates the need for a fan.

These changes in the embedded market have led to major shifts in market share. As Figure 1 shows, Hitachi’s SH series has come from nowhere to lead 32-bit RISC processor shipments on the strength of Sega’s video games and other consumer applications. Meanwhile, Intel’s more traditional 960 series, once the industry leader, has stagnated. AMD has entirely stopped future development of its 29000 family, once the 960’s top competitor.

Customization for embedded applications

As transistor counts in chips selling for under $100 (and eventually under $30) skyrocket to millions, and soon to tens of millions, processors for PCs will continue to use most of these transistors to increase performance. For most embedded applications, however, the demands for ever-higher performance just aren’t there. Instead, embedded-application designers would like to reduce system costs by integrating more functions on the same chip with the microprocessor. The logical end point of this evolution is a complete system on a chip. Technology is reaching a point where chips can integrate even significant amounts of memory. For example, eliminating half the DRAM array from a 64-Mbit DRAM still leaves 4 Mbytes of memory and room for millions of logic transistors.

As embedded microprocessors evolve toward systems on a chip, they inevitably become more specialized. Different applications have different needs for memory, peripheral controllers, and interfaces to the external world. The desire for
42 IEEE Micro
Table 4. Key features of selected embedded microprocessors. (Source: vendors, except where noted.)

Vendor                           Digital    VLSI       NEC        Hitachi    IBM PPC    Motorola   Motorola   Motorola   Intel      Intel
Feature                          SA-110     ARM710     R4300      SH7604     403GA      860DC      68EC040    CF5102     960JA      960HT
Architecture                     StrongARM  ARM        Mips       SuperH     PowerPC    PowerPC    68000      ColdFire   i960       i960
Clock rate (MHz)                 200        40         133        20         33         40         40         25         33         60
Instr./data cache size (Kbytes)  16/16      8/8        16/8       4/4        2/1        4/4        4/4        2/1        2/1        16/8
FPU                              No         No         Yes        No         No         No         Yes        No         No         No
MMU                              Yes        Yes        Yes        No         No         Yes        Yes        No         No         No
Bus frequency (MHz)              66         40         66         20         33         40         40         25         33         20
MIPS†                            230        36         160*       20         41         52         44         27         28         100*
Voltage                          2.0/3.3**  5          3.3        3.3        3.3        3.3        5          3.3        3.3        3.3
Power (typical, mW)              900        424        2,200      200        265        900        4,500      900        500        4,500
MIPS/watt                        239        85         73         100        155        58         10         30         56         22
MIPS/price                       4.30       1.04       5.00       0.24       1.05       0.51       0.59       N.A.       0.76       0.79
Transistors (millions)           2.1        0.6        1.7        0.45       0.58       1.8        1.2        N.A.       0.75       2.3
Process (µm/layers)              0.35/3     0.6/2      0.35/3     0.8/2      0.5/3      0.5/3      0.65/3     0.6/3      0.8/3      0.6/4
Die size (mm²)                   50         34         45         82         39         25         163        N.A.       64         100
Estimated mfg. cost*             $18        $9         $11        $7         $14        $20        $30        $9         $8         $34
Availability                     Now        Now        Now        Now        Now        Now        Now        Now        Now        Now
List price (10,000s)             $49        $28        $32        $27        $28        $102       $75        $25        $37        $126

* MicroDesign Resources estimate
** Core/bus voltage
† MIPS rating as supplied by vendor, based on Dhrystone 2.1
highly integrated system chips is increasing the demand for building-block microprocessors that can function as parts of application-specific integrated circuits (ASICs). Many of the leading microprocessor vendors are not major ASIC suppliers, however, nor are they set up to customize chips for every customer. Indeed, eliminating the need to do so was a key benefit of the microprocessor in the first place.

LSI Logic is one company that has pioneered the design of ASICs with microprocessor cores. Many other companies, including Texas Instruments, IBM Microelectronics, VLSI Technology, and NEC, are also aggressively developing this technology. Not only must these companies have a range of microprocessor cores available, but they must provide a variety of other complex building blocks, such as MPEG decoders and graphics engines, as well as the software tools to design, debug, verify, and test the chips. In the future embedded-processor market, these factors may be more important than the processor cores themselves.

In this world of core-based ASICs, some microprocessor cores are becoming near commodities. Advanced RISC Machines (ARM) in the UK has licensed its core designs widely, and many companies offer ARM cores as part of their ASIC libraries. Mips has also licensed its cores widely, though not as widely as ARM, and Sparc cores have a few licensees.

Table 5 shows CPU architectures that companies have licensed to chip and equipment makers for embedded applications. Motorola continues to keep most of its cores proprietary and is gradually allowing more and more customer involvement in the design process.

Table 5. Originators and licensees of RISC processors for embedded applications.

Architecture  Originator            Licensees
ARM           ARM Ltd.              Asahi Kasei Microsystems (AKM), Alcatel, Atmel,
                                    Cirrus Logic, Digital, GEC Plessey, LG Semicon,
                                    NEC, Oki, Samsung, Sharp, Symbios Logic,
                                    Texas Instruments, VLSI Technology, Yamaha
ColdFire      Motorola              Mitsubishi
Mips          Mips Technologies     Integrated Device Technology, LSI Logic, NEC,
                                    NKK, Philips, QED, Sony, Toshiba
PowerPC       IBM Microelectronics  Mitsubishi
SuperH        Hitachi               VLSI Technology
Sparc         Sun Microelectronics  C-Cube, Fujitsu, Hyundai, Matra MHS, Scientific
                                    Atlanta, TGI

Packaging is another key area that needs improvement. As designers put more functions on a chip, the chips need more input/output pins. Today's common plastic quad flat packs offer a cost per pin around 2 cents, but can't provide pin counts much beyond 200. High-pin-count pin grid arrays typically have costs around 10 cents per pin, leading to a $50 package for a 500-pin device. New packaging technologies, such as plastic ball grid arrays and various chip-scale packages, promise high-pin-count packages with costs approaching a penny per
pin in the next few years. If this comes to pass, it would be a significant enabling technology for highly integrated, low-cost chips. Sometimes the silicon seems like the easy part!

the leader of the team; Jim Turley, who tracks embedded microprocessors and their applications; and Yong Yao, Peter Glaskowsky, and Steve Hammond, who track a range of PC hardware technologies.
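
The packaging arithmetic quoted in the article (about 2 cents per pin for plastic quad flat packs, about 10 cents per pin for pin grid arrays, and a projected penny per pin for newer packages) can be checked with a few lines of Python. This is only a minimal sketch: the function name, the dictionary, and the example pin counts other than the 500-pin case are illustrative assumptions, and the per-pin figures are the article's round numbers rather than vendor quotes.

```python
# Per-pin package costs in US cents, using the round figures quoted in the text.
# The names below are hypothetical and chosen only for this illustration.
COST_PER_PIN_CENTS = {
    "plastic quad flat pack": 2,            # "around 2 cents"; pin counts not much beyond 200
    "pin grid array": 10,                   # "around 10 cents per pin"
    "ball grid array or chip-scale": 1,     # projected: "approaching a penny per pin"
}


def package_cost_dollars(pins: int, cents_per_pin: int) -> float:
    """Return the package cost in dollars for a given pin count and per-pin cost."""
    return pins * cents_per_pin / 100


if __name__ == "__main__":
    # Quad flat packs top out near 200 pins, so they are evaluated at 200 pins;
    # the two high-pin-count options are evaluated at the article's 500-pin example.
    examples = [
        ("plastic quad flat pack", 200),
        ("pin grid array", 500),
        ("ball grid array or chip-scale", 500),
    ]
    for name, pins in examples:
        cost = package_cost_dollars(pins, COST_PER_PIN_CENTS[name])
        print(f"{name}: {pins} pins -> ${cost:.2f}")
```

With the article's figures, a 500-pin pin grid array comes to $50, while the same pin count at a penny per pin would come to $5.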