How Microprocessors Work
by Marshall Brain
The computer you are using to read this page uses a microprocessor to do its work. The
microprocessor is the heart of any normal computer, whether it is a desktop machine, a
server or a laptop. The microprocessor you are using might be a Pentium, a K6, a PowerPC,
a Sparc or any of the many other brands and types of microprocessors, but they all do
approximately the same thing in approximately the same way.
If you have ever wondered what the microprocessor in your computer is doing, or if you have
ever wondered about the differences between types of microprocessors, then read on. In this
edition of HowStuffWorks, you will learn how fairly simple digital logic techniques allow a
computer to do its job, whether it's playing a game or spell-checking a document!
Microprocessor History
A microprocessor -- also known as a CPU or central processing unit -- is a complete
computation engine that is fabricated on a single chip. The first microprocessor was the Intel
4004, introduced in 1971. The 4004 was not very powerful -- all it could do was add and
subtract, and it could only do that 4 bits at a time. But it was amazing that everything was on
one chip. Prior to the 4004, engineers built computers either from collections of chips or from
discrete components (transistors wired one at a time). The 4004 powered one of the first
portable electronic calculators.
The first microprocessor to make it into a home computer was the Intel 8080, a complete 8-
bit computer on one chip, introduced in 1974. The first microprocessor to make a real splash
in the market was the Intel 8088, introduced in 1979 and incorporated into the IBM PC
(which first appeared in 1981). If you are familiar with the PC market and its history, you
know that the PC market moved from the 8088 to the 80286 to the 80386 to the 80486 to the
Pentium to the Pentium II to the Pentium III to the Pentium 4. All of these microprocessors
are made by Intel and all of them are improvements on the basic design of the 8088. The
Pentium 4 can execute any piece of code that ran on the original 8088, but it does it about
5,000 times faster!
What's a Chip?
A chip is also called an integrated circuit. Generally it is a small, thin piece of silicon
onto which the transistors making up the microprocessor have been etched. A chip might be
as large as an inch on a side and can contain tens of millions of transistors. Simpler
processors might consist of a few thousand transistors etched onto a chip just a few
millimeters square.
The following table helps you understand the differences between the processors that Intel
has introduced over the years.
• The date is the year that the processor was first introduced. Many processors are
re-introduced at higher clock speeds for many years after the original release date.
• Transistors is the number of transistors on the chip. You can see that the number of
transistors on a single chip has risen steadily over the years.
• Microns is the width, in microns, of the smallest wire on the chip. For comparison, a
human hair is 100 microns thick. As the feature size on the chip goes down, the
number of transistors rises.
• Clock speed is the maximum rate that the chip can be clocked at. Clock speed will
make more sense in the next section.
• Data width is the width of the ALU (arithmetic logic unit). An 8-bit ALU can add/subtract/multiply/etc. two
8-bit numbers, while a 32-bit ALU can manipulate 32-bit numbers. An 8-bit ALU
would have to execute four instructions to add two 32-bit numbers, while a 32-bit
ALU can do it in one instruction. In many cases, the external data bus is the same
width as the ALU, but not always. The 8088 had a 16-bit ALU and an 8-bit bus, while
the modern Pentiums fetch data 64 bits at a time for their 32-bit ALUs.
• MIPS stands for "millions of instructions per second" and is a rough measure of the
performance of a CPU. Modern CPUs can do so many different things that MIPS
ratings lose a lot of their meaning, but you can get a general sense of the relative
power of the CPUs from this column.
From this table you can see that, in general, there is a relationship between clock speed and
MIPS. The maximum clock speed is a function of the manufacturing process and delays
within the chip. There is also a relationship between the number of transistors and MIPS. For
example, the 8088 clocked at 5 MHz but only executed at 0.33 MIPS (about one instruction
per 15 clock cycles). Modern processors can often execute at a rate of two instructions per
clock cycle. That improvement is directly related to the number of transistors on the chip and
will make more sense in the section on microprocessor performance.
Inside a Microprocessor
To understand how a microprocessor works, it is helpful to look inside and learn about the
logic used to create one. In the process you can also learn about assembly language -- the
native language of a microprocessor -- and many of the things that engineers can do to
boost the speed of a processor.
A microprocessor executes a collection of machine instructions that tell the processor what
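to do. Based on the instructions, a microprocessor does three basic things:
• Using its ALU, a microprocessor can perform mathematical operations like addition,
subtraction, multiplication and division.
• A microprocessor can move data from one memory location to another.
• A microprocessor can make decisions and jump to a new set of instructions based on
those decisions.
Consider an extremely simple microprocessor capable of doing those three things. It has an
address bus that sends an address to memory, a data bus that carries data to and from
memory, RD (read) and WR (write) lines that tell the memory whether to get or set the
addressed location, and a clock line that sequences the processor. Let's assume that both
the address and data buses are 8 bits wide in this example. Its components are: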
• Registers A, B and C are simply latches made out of flip-flops. (See the section on
"edge-triggered latches" in How Boolean Logic Works for details.)
• The address latch is just like registers A, B and C.
• The program counter is a latch with the extra ability to increment by 1 when told to do
so, and also to reset to zero when told to do so.
• The ALU could be as simple as an 8-bit adder (see the section on adders in How
Boolean Logic Works for details), or it might be able to add, subtract, multiply and
divide 8-bit values. Let's assume the latter here.
• The test register is a special latch that can hold values from comparisons performed
in the ALU. An ALU can normally compare two numbers and determine if they are
equal, if one is greater than the other, etc. The test register can also normally hold a
carry bit from the last stage of the adder. It stores these values in flip-flops and then
the instruction decoder can use the values to make decisions.
• There are six boxes marked "3-State" in the diagram. These are tri-state buffers. A
tri-state buffer can pass a 1, a 0 or it can essentially disconnect its output (imagine a
switch that totally disconnects the output line from the wire that the output is heading
toward). A tri-state buffer allows multiple outputs to connect to a wire, but only one of
them to actually drive a 1 or a 0 onto the line.
• The instruction register and instruction decoder are responsible for controlling all of
the other components.
Helpful Articles
If you are new to digital logic, you may find the following articles helpful in
understanding this section:
• How Bytes and Bits Work
• How Boolean Logic Works
• How Electronic Gates Work
Although they are not shown in this diagram, there would be control lines from the
instruction decoder that would:
• Tell the A register to latch the value currently on the data bus
• Tell the B register to latch the value currently on the data bus
• Tell the C register to latch the value currently on the data bus
• Tell the program counter register to latch the value currently on the data bus
• Tell the address register to latch the value currently on the data bus
• Tell the instruction register to latch the value currently on the data bus
• Tell the program counter to increment
• Tell the program counter to reset to zero
• Activate any of the six tri-state buffers (six separate lines)
• Tell the ALU what operation to perform
• Tell the test register to latch the ALU's test bits
• Activate the RD line
• Activate the WR line
Coming into the instruction decoder are the bits from the test register and the clock line, as
well as the bits from the instruction register.
ROM stands for read-only memory. A ROM chip is programmed with a permanent collection
of pre-set bytes. The address bus tells the ROM chip which byte to get and place on the data
bus. When the RD line changes state, the ROM chip presents the selected byte onto the
data bus.
RAM stands for random-access memory. RAM contains bytes of information, and the
microprocessor can read or write to those bytes depending on whether the RD or WR line is
signaled. One problem with today's RAM chips is that they forget everything once the power
goes off. That is why the computer needs ROM.
By the way, nearly all computers contain some amount of ROM (it is possible to create a
simple computer that contains no RAM chips -- many microcontrollers do this by placing a handful
of RAM bytes on the processor chip itself -- but generally impossible to create one that
contains no ROM). On a PC, the ROM is called the BIOS (Basic Input/Output System).
When the microprocessor starts, it begins executing instructions it finds in the BIOS. The
BIOS instructions do things like test the hardware in the machine and then go to the
hard disk to fetch the boot sector (see How Hard Disks Work for details). This boot sector is
another small program, and the BIOS stores it in RAM after reading it off the disk. The
microprocessor then begins executing the boot sector's instructions from RAM. The boot
sector program will tell the microprocessor to fetch something else from the hard disk into
RAM, which the microprocessor then executes, and so on. This is how the microprocessor
loads and executes the entire operating system.
Microprocessor Instructions
Even the incredibly simple microprocessor shown in the previous example will have a fairly
large set of instructions that it can perform. The collection of instructions is implemented as
bit patterns, each one of which has a different meaning when loaded into the instruction
register. Humans are not particularly good at remembering bit patterns, so a set of short
words is defined to represent the different bit patterns. This collection of words is called the
assembly language of the processor. An assembler can translate the words into their bit
patterns very easily, and then the output of the assembler is placed in memory for the
microprocessor to execute.
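Here's the set of assembly language instructions that the designer might create for the
simple microprocessor in our example:
• LOADA mem - Load register A from memory address
• LOADB mem - Load register B from memory address
• CONB con - Load a constant value into register B
• SAVEB mem - Save register B to memory address
• SAVEC mem - Save register C to memory address
• ADD - Add A and B and store the result in C
• SUB - Subtract A and B and store the result in C
• MUL - Multiply A and B and store the result in C
• DIV - Divide A and B and store the result in C
• COM - Compare A and B and store the result in the test register
• JUMP addr - Jump to an address
• JEQ addr - Jump, if equal, to address
• JNEQ addr - Jump, if not equal, to address
• JG addr - Jump, if greater than, to address
• JGE addr - Jump, if greater than or equal, to address
• JL addr - Jump, if less than, to address
• JLE addr - Jump, if less than or equal, to address
• STOP - Stop execution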
If you have read How C Programming Works, then you know that this simple piece of C code
will calculate the factorial of 5 (where the factorial of 5 = 5! = 5 * 4 * 3 * 2 * 1 = 120):
a=1;
f=1;
while (a <= 5)
{
    f = f * a;
    a = a + 1;
}
At the end of the program's execution, the variable f contains the factorial of 5.
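A C compiler translates this C code into assembly language. Assuming that RAM starts at
address 128 in this processor, and ROM (which contains the assembly language program)
starts at address 0, the assembly language for our simple microprocessor might look
like this:
// a is at address 128, f is at address 129
0   CONB 1      // a=1;
1   SAVEB 128
2   CONB 1      // f=1;
3   SAVEB 129
4   LOADA 128   // if a > 5, jump to 17
5   CONB 5
6   COM
7   JG 17
8   LOADA 129   // f=f*a;
9   LOADB 128
10  MUL
11  SAVEC 129
12  LOADA 128   // a=a+1;
13  CONB 1
14  ADD
15  SAVEC 128
16  JUMP 4      // loop back to the test at line 4
17  STOP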
So now the question is, "How do all of these instructions look in ROM?" Each of these
assembly language instructions must be represented by a binary number. For the sake of
simplicity, let's assume each assembly language instruction is given a unique number, like
this:
• LOADA mem - 1
• LOADB mem - 2
• CONB con - 3
• SAVEB mem - 4
• SAVEC mem - 5
• ADD - 6
• SUB - 7
• MUL - 8
• DIV - 9
• COM - 10
• JUMP addr - 11
• JEQ addr - 12
• JNEQ addr - 13
• JG addr - 14
• JGE addr - 15
• JL addr - 16
• JLE addr - 17
• STOP - 18
The numbers are known as opcodes. In ROM, our little program would look like this:
// Assume a is at address 128
// Assume f is at address 129
Addr  opcode/value
0     3     // CONB 1
1     1
2     4     // SAVEB 128
3     128
4     3     // CONB 1
5     1
6     4     // SAVEB 129
7     129
8     1     // LOADA 128
9     128
10    3     // CONB 5
11    5
12    10    // COM
13    14    // JG 17 (assembly line 17, the STOP at byte 31)
14    31
15    1     // LOADA 129
16    129
17    2     // LOADB 128
18    128
19    8     // MUL
20    5     // SAVEC 129
21    129
22    1     // LOADA 128
23    128
24    3     // CONB 1
25    1
26    6     // ADD
27    5     // SAVEC 128
28    128
29    11    // JUMP 4 (assembly line 4, the LOADA 128 at byte 8)
30    8
31    18    // STOP
You can see that seven lines of C code became 18 lines of assembly language, and that
became 32 bytes in ROM (addresses 0 through 31).
The instruction decoder needs to turn each of the opcodes into a set of signals that drive the
different components inside the microprocessor. Let's take the ADD instruction as an
example and look at what it needs to do:
1. During the first clock cycle, we need to actually load the instruction. Therefore the
instruction decoder needs to:
• activate the tri-state buffer for the program counter
• activate the RD line
• activate the data-in tri-state buffer
• latch the instruction into the instruction register
2. During the second clock cycle, the ADD instruction is decoded. It needs to do very
little:
• set the operation of the ALU to addition
• latch the output of the ALU into the C register
3. During the third clock cycle, the program counter is incremented (in theory this could
be overlapped into the second clock cycle).
Every instruction can be broken down as a set of sequenced operations like these that
manipulate the components of the microprocessor in the proper order. Some instructions,
like this ADD instruction, might take two or three clock cycles. Others might take five or six
clock cycles.
Microprocessor Performance
The number of transistors available has a huge effect on the performance of a processor.
As seen earlier, a typical instruction in a processor like an 8088 took 15 clock cycles to
execute. Because of the design of the multiplier, it took approximately 80 cycles just to do
one 16-bit multiplication on the 8088. With more transistors, much more powerful multipliers
capable of single-cycle speeds become possible.
More transistors also allow for a technology called pipelining. In a pipelined architecture,
instruction execution overlaps. So even though it might take five clock cycles to execute
each instruction, there can be five instructions in various stages of execution simultaneously.
That way it looks like one instruction completes every clock cycle.
Many modern processors have multiple instruction decoders, each with its own pipeline. This
allows for multiple instruction streams, which means that more than one instruction can
complete during each clock cycle. This technique can be quite complex to implement, so it
takes lots of transistors.
The trend in processor design has been toward full 32-bit ALUs with fast floating point
processors built in and pipelined execution with multiple instruction streams. Designers have
also added special instructions (like the MMX instructions) that make certain operations
particularly efficient, along with hardware virtual memory support and L1 caching on the
processor chip. All of these trends push up the transistor
count, leading to the multi-million transistor powerhouses available today. These processors
can execute about one billion instructions per second!