AMD Athlon™ Processor



Technical Brief

Publication # 22054 Rev: D

Issue Date: December 1999
Revision History

Date           Rev  Description
August 1999    C    Initial public release.
December 1999  D    Added information about AMD's new 0.18-micron process
                    technology to “Process Technology” on page 7.



The AMD Athlon™ processor powers the next generation in
computing platforms, delivering the ultimate performance for
cutting-edge applications and an unprecedented computing

The AMD Athlon™ processor is the first member of a new
family of seventh-generation AMD processors designed to meet
the computation-intensive requirements of cutting-edge
software applications running on high-performance desktop
systems, workstations, and servers. This technical brief
describes the features of the AMD Athlon processor’s

The AMD Athlon processor’s microarchitecture is designed to
support the growing processor and system bandwidth
requirements of emerging software, graphics, I/O, and memory
technologies. The AMD Athlon processor's high-speed
execution core includes multiple x86 instruction decoders, a
dual-ported 128-Kbyte split level-one (L1) cache, three
independent integer pipelines, three address calculation
pipelines, and the x86 industry's first superscalar, fully
pipelined, out-of-order, three-way floating-point engine. The
floating-point engine is capable of delivering 2.4 gigaflops
(Gflops) of single-precision and more than 1 Gflop of
double-precision floating-point results at 600 MHz for superior
performance on numerically complex applications.

The AMD Athlon processor’s microarchitecture includes:

■ The industry's first nine-issue, superpipelined, superscalar
x86 processor microarchitecture designed for high clock
frequency:
  • Multiple x86 instruction decoders
  • 72-entry instruction control unit
  • Advanced dynamic branch prediction
  • Three out-of-order, superscalar, fully pipelined
    floating-point execution units, which execute all x87
    (floating-point), MMX™ and 3DNow!™ instructions
  • Three out-of-order, superscalar, pipelined integer units
  • Three out-of-order, superscalar, pipelined address
    calculation units
■ Enhanced 3DNow! technology with new instructions to
enable improved integer math calculations for speech or
video encoding and improved data movement for internet
plug-ins and other streaming applications
■ High-performance cache architecture featuring an
integrated 128-Kbyte L1 cache and a programmable,
high-speed backside L2 cache interface
■ 200-MHz AMD Athlon system bus (scalable beyond 400
MHz) enabling leading-edge system bandwidth for data
movement-intensive applications

AMD Athlon™ Processor Microarchitecture

The AMD Athlon processor is based on a seventh-generation
x86 microarchitecture that features a superpipelined,
nine-issue superscalar microarchitecture optimized for high
clock frequency. The AMD Athlon has a large dual-ported
128-Kbyte split-L1 cache (64-Kbyte instruction cache +
64-Kbyte data cache), a two-way, 2048-entry branch prediction
table, multiple parallel x86 instruction decoders, and multiple
integer and floating-point schedulers for independent
superscalar, out-of-order, speculative execution of instructions.
These elements are packed into an aggressive processing
pipeline that includes 10-stage integer and 15-stage
floating-point pipelines, which are illustrated in Figure 1.

[Block diagram. Labeled blocks: 2-way, 64-Kbyte instruction cache
(24-entry L1 TLB/256-entry L2 TLB); predecode cache; branch
prediction table; 3-way x86 instruction decoders; instruction
control unit (72-entry); integer scheduler (18-entry); FPU stack
map/rename; FPU scheduler (36-entry); FPU register file (88-entry);
IEU0, IEU1, IEU2; AGU0, AGU1, AGU2; FADD, FMUL, and FSTORE units,
with 3DNow!™ labels; load/store queue unit; 2-way, 64-Kbyte data
cache (32-entry L1 TLB/256-entry L2 TLB); bus interface; L2 cache
controller; system interface; L2 SRAMs.]

Figure 1. AMD Athlon™ Processor Block Diagram

Multiple Decoders
The AMD Athlon processor includes three full x86 instruction
decoders. These decoders translate x86 instructions into
fixed-length MacroOPs for higher instruction throughput and
increased processing power. Instead of executing x86
instructions, which have lengths of 1 to 15 bytes, the
AMD Athlon processor executes the fixed-length MacroOPs,
while maintaining the instruction coding efficiencies found in
x86 programs.

Instruction Control Unit

Once MacroOPs are decoded, up to three MacroOPs per cycle
are dispatched to the instruction control unit (ICU). The ICU is
a 72-entry MacroOP reorder buffer (ROB) that manages the
execution and retirement of all MacroOPs, performs register
renaming for operands, and controls any exception conditions
and instruction retirement operations. The ICU dispatches the
MacroOPs to the AMD Athlon processor’s multiple execution
unit schedulers.
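
As a rough illustration of how a reorder buffer of this kind behaves, the following Python sketch models a 72-entry buffer that accepts up to three MacroOPs per cycle and retires them in program order. The class and method names are hypothetical, the retirement width of three per cycle is an assumption that this brief does not state, and register renaming and exception handling are left out.

```python
from collections import deque

ROB_ENTRIES = 72      # ICU capacity stated in the brief
DISPATCH_WIDTH = 3    # MacroOPs dispatched to the ICU per cycle (from the brief)
RETIRE_WIDTH = 3      # assumption: the brief does not state a retirement width


class ReorderBuffer:
    """Toy in-order-retire buffer; not a model of the real ICU logic."""

    def __init__(self, capacity=ROB_ENTRIES):
        self.capacity = capacity
        self.entries = deque()   # oldest MacroOP sits at the left end
        self.done = set()        # ids of MacroOPs that have finished executing

    def dispatch(self, macro_ops):
        """Accept up to DISPATCH_WIDTH MacroOPs; return how many were taken."""
        free = self.capacity - len(self.entries)
        taken = list(macro_ops)[:min(DISPATCH_WIDTH, free)]
        self.entries.extend(taken)
        return len(taken)

    def complete(self, op_id):
        """Mark a MacroOP as executed (this may happen out of program order)."""
        self.done.add(op_id)

    def retire(self):
        """Retire finished MacroOPs from the head, strictly in program order."""
        retired = []
        while (self.entries and len(retired) < RETIRE_WIDTH
               and self.entries[0] in self.done):
            op = self.entries.popleft()
            self.done.discard(op)
            retired.append(op)
        return retired


rob = ReorderBuffer()
rob.dispatch(["op0", "op1", "op2"])
rob.complete("op2")      # finishes first, but cannot retire yet
early = rob.retire()     # nothing retires while op0 is still pending
rob.complete("op0")
rob.complete("op1")
late = rob.retire()      # all three retire, oldest first
```

The point of the sketch is the separation between out-of-order completion and in-order retirement: a younger MacroOP that finishes early waits in the buffer until every older MacroOP has retired.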

Execution Pipelines
The AMD Athlon processor contains an 18-entry
integer/address generation MacroOP scheduler and a 36-entry
floating-point unit (FPU)/multimedia scheduler. These
schedulers issue MacroOPs to the nine independent execution
pipelines: three for integer calculations, three for address
calculations, and three for execution of MMX, 3DNow!, and x87
floating-point instructions.

The AMD Athlon processor offers the most powerful,
architecturally advanced floating-point engine ever delivered
in an x86 microprocessor. The AMD Athlon processor's
three-issue, superscalar floating-point capability is based on
three pipelined, out-of-order floating-point execution units,
each with a one-cycle throughput. These three execution units
(FMUL, FADD, and FSTORE) execute all x87 (floating-point)
instructions, MMX instructions, and enhanced 3DNow!
instructions. Using a data format and single-instruction
multiple-data (SIMD) operations based on the MMX instruction
model, the AMD Athlon processor can deliver as many as four
32-bit, single-precision floating-point results per clock cycle,
resulting in a peak performance of 2.4 Gflops at 600 MHz.
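
The peak figure quoted above follows directly from the results-per-clock count and the clock frequency. The short Python sketch below reproduces the arithmetic; the function name is hypothetical, and the double-precision line assumes two x87 results per clock (one each from FADD and FMUL), which is an inference from the "more than 1 Gflop" claim rather than a number given in this brief.

```python
def peak_gflops(results_per_clock, clock_mhz):
    """Peak rate in Gflops for a per-clock result count and a clock in MHz."""
    return results_per_clock * clock_mhz / 1000.0


# Four 32-bit single-precision results per clock at 600 MHz (from the brief).
single_precision = peak_gflops(4, 600)

# Assumption: two double-precision results per clock (FADD plus FMUL).
double_precision = peak_gflops(2, 600)

print(f"single precision: {single_precision:.1f} Gflops")
print(f"double precision: {double_precision:.1f} Gflops (assumed issue rate)")
```

These are peak rates only; sustained throughput depends on the instruction mix and on how well the schedulers keep all three pipelines busy.
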
Branch Prediction
The AMD Athlon processor offers sophisticated dynamic
branch prediction logic to minimize or eliminate the delays due
to the branch instructions (jumps, calls, returns) common in x86
software. The processor includes the following:
■ Branch prediction table
■ Branch target address table
■ Return address stack

The AMD Athlon processor implements a two-way, 2048-entry
branch prediction table. The branch prediction table stores
prediction information that is used for predicting the direction
of conditional branches. The branch target address table stores
target addresses of conditional and unconditional branches.
The return address stack optimizes CALL/RET instruction pairs
by storing the return address of each CALL within a nested
series of subroutines and supplying a return address as the
predicted target address of the corresponding RET instruction.
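
The return address stack behavior described above can be sketched in a few lines of Python. This is a minimal model, not the processor's implementation: the class name, the stack depth, and the overflow policy are all assumptions, since this brief gives none of them.

```python
class ReturnAddressStack:
    """Toy return address stack; depth and overflow policy are assumptions."""

    def __init__(self, depth=12):
        self.depth = depth       # placeholder depth, not taken from the brief
        self.entries = []

    def on_call(self, return_address):
        """Push the address of the instruction that follows the CALL."""
        if len(self.entries) == self.depth:
            self.entries.pop(0)  # assumed policy: discard the oldest entry
        self.entries.append(return_address)

    def predict_ret(self):
        """Pop the predicted target for a RET, or None if the stack is empty."""
        return self.entries.pop() if self.entries else None


ras = ReturnAddressStack()
ras.on_call(0x1005)                    # outer CALL; its RET should go to 0x1005
ras.on_call(0x2010)                    # nested CALL; its RET should go to 0x2010
inner_prediction = ras.predict_ret()   # RET of the nested subroutine
outer_prediction = ras.predict_ret()   # RET of the outer subroutine
```

Because calls and returns nest, a last-in, first-out structure predicts each RET target from the most recent unmatched CALL, which a single target-address table entry per RET cannot do when a subroutine has several callers.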

Enhanced 3DNow!™ Technology

The AMD Athlon processor includes enhanced 3DNow!
technology designed to take 3D multimedia performance to new
heights. The enhanced 3DNow! technology implemented in the
AMD Athlon includes AMD’s original twenty-one 3DNow!
instructions (the industry’s first x86 instruction set to use
superscalar SIMD floating-point techniques to accelerate 3D
performance), plus twenty-four new instructions, which
perform the following functions:
■ Twelve instructions that improve multimedia-enhanced
integer math calculations used in such applications as
speech recognition and video processing
■ Seven instructions that accelerate data movement for more
detailed graphics and functionality for internet browser
plug-ins and other streaming applications, enabling a richer
internet experience
■ Five digital signal processing (DSP) instructions that
enhance the performance of communications applications,
including soft modems, soft ADSL, MP3, and Dolby Digital
surround sound processing

In enhancing 3DNow! technology, AMD kept the instruction set
design simple, yet powerful. AMD’s plan in designing the new
3DNow! instructions was to provide powerful SIMD
performance while enabling ease of implementation for
software developers. The relatively few instructions of
enhanced 3DNow! technology allow developers to adopt this
technology and optimize their applications quickly.

Cache Architecture
The AMD Athlon processor’s high-performance cache
architecture includes an integrated, 64-bit, dual-ported
128-Kbyte split-L1 cache with separate snoop port, multi-level
translation lookaside buffers (TLBs), a scalable L2 cache
controller with a 72-bit (64-bit data + 8-bit ECC) interface to as
much as 8 Mbytes of industry-standard SDR or DDR SRAMs, and
an integrated tag for the most cost-effective 512-Kbyte L2

The AMD Athlon processor’s integrated L1 cache comprises
two separate 64-Kbyte, two-way set-associative data and
instruction caches. The data cache has eight banks to support
concurrent access by two 64-bit loads or stores. The instruction
cache contains predecode data to assist multiple,
high-performance instruction decoders. The robust bi-level TLB
structure minimizes code and data delays when accessing
physical memory.
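
To make the L1 geometry concrete, the Python sketch below derives the number of sets in a 64-Kbyte, two-way set-associative cache and splits an address into tag, set index, and line offset. The 64-byte line size is an assumption (this brief does not state the L1 line size), the function name is hypothetical, and the eight-bank organization of the data cache is not modeled.

```python
CACHE_BYTES = 64 * 1024   # one 64-Kbyte L1 cache (from the brief)
WAYS = 2                  # two-way set-associative (from the brief)
LINE_BYTES = 64           # assumption: line size is not given in the brief

NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)
OFFSET_BITS = LINE_BYTES.bit_length() - 1   # log2 of the line size
INDEX_BITS = NUM_SETS.bit_length() - 1      # log2 of the number of sets


def split_address(addr):
    """Return (tag, set_index, line_offset) for the assumed geometry."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


tag, index, offset = split_address(0x12345678)
print(f"{NUM_SETS} sets; tag={tag:#x}, index={index}, offset={offset}")
```

Under these assumptions each set holds two lines, so two addresses that share an index but differ in tag can reside in the cache at the same time; a third such address forces a replacement.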

The AMD Athlon processor’s L2 cache controller operates at a
programmable frequency for compatibility with a variety of
industry-standard SRAMs including DDR. The integrated L2
cache tag provides a full tag for a 512-Kbyte L2 cache or a
partial tag for larger L2 caches.

System Bus Interface

The 200-MHz AMD Athlon system bus interface, the fastest
bus implementation for x86 platforms, leverages the
high-performance Digital™ Alpha™ EV6 system interface
technology to significantly boost system performance and
provide ample headroom for today's and tomorrow's
applications. The AMD Athlon system bus provides advanced
features, such as source synchronous clocking for high-speed
200-MHz-to-400-MHz operation, point-to-point topology for
peak data bandwidth independent of the number of processors,
packet-based transfers for improved transaction pipelining,
large 64-byte burst data transfers, 8-bit ECC protection of data
and instructions, low-voltage signaling for high-performance,
low-cost motherboard implementations, and the ability to
address more than eight terabytes of physical memory.

The 200-MHz system bus implemented in the AMD Athlon
processor is capable of delivering a peak data transfer rate of
1.6 Gbytes per second, twice that of previous processor
generations. With its source synchronous clocking design, the
AMD Athlon processor's system bus is scalable to operate
beyond 400 MHz.
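
The 1.6-Gbyte-per-second figure can be checked with simple arithmetic. The Python sketch below assumes a 64-bit (8-byte) data path, which is inferred from the 8-bit ECC mentioned above and is not stated outright in this brief; the function name is hypothetical.

```python
DATA_WIDTH_BITS = 64   # assumption: inferred from the 8-bit ECC, not stated
BURST_BYTES = 64       # burst size stated in the brief


def peak_bandwidth_gbytes(transfer_rate_mhz, data_width_bits=DATA_WIDTH_BITS):
    """Peak transfer rate in Gbytes/s for a transfer rate given in MHz."""
    bytes_per_transfer = data_width_bits // 8
    return transfer_rate_mhz * 1e6 * bytes_per_transfer / 1e9


at_200 = peak_bandwidth_gbytes(200)   # the shipping 200-MHz bus
at_400 = peak_bandwidth_gbytes(400)   # the upper end of the quoted range
transfers_per_burst = BURST_BYTES // (DATA_WIDTH_BITS // 8)

print(f"200 MHz: {at_200:.1f} Gbytes/s, 400 MHz: {at_400:.1f} Gbytes/s")
print(f"one 64-byte burst takes {transfers_per_burst} transfers")
```

With the assumed data width, a 100-MHz bus of the same width would peak at half the 200-MHz figure, which is consistent with the "twice that of previous processor generations" comparison.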

Process Technology
The AMD Athlon processor is manufactured on AMD's six-layer
metal, 0.25-micron process technology and AMD's new
0.18-micron process technology. In 0.25-micron technology, the
approximately 22-million-transistor AMD Athlon processor has
a die size of 184 mm². In 0.18-micron technology, the
AMD Athlon processor has a die size of 102 mm². The
AMD Athlon processor is included in a cost-effective,
industry-standard module form factor, Slot A, which is
mechanically compatible with the existing Slot 1 infrastructure
and therefore leverages commonly available chassis, power
supply, and thermal solutions.
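
For a sense of scale, the two die sizes can be compared with what an idealized linear shrink would give. The Python sketch below assumes that area scales with the square of the feature size, which is a simplification: real designs do not shrink uniformly, and this brief does not describe how the 0.18-micron die was derived. All variable names are illustrative.

```python
AREA_025_MM2 = 184.0   # die size in 0.25-micron technology (from the brief)
AREA_018_MM2 = 102.0   # die size in 0.18-micron technology (from the brief)

# Idealized assumption: area scales with the square of the feature size.
ideal_scale = (0.18 / 0.25) ** 2
ideal_area = AREA_025_MM2 * ideal_scale
actual_scale = AREA_018_MM2 / AREA_025_MM2

print(f"ideal shrink:  {ideal_scale:.3f} of original area ({ideal_area:.1f} mm^2)")
print(f"actual shrink: {actual_scale:.3f} of original area ({AREA_018_MM2:.0f} mm^2)")
```

The reported 0.18-micron die is somewhat larger than the idealized estimate, as expected when pads, analog circuits, and wiring do not scale with the transistors.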

The AMD Athlon processor's seventh-generation
microarchitecture and high-bandwidth system bus enable it to
attain performance levels never before achieved by an x86
processor. The AMD Athlon significantly outperforms
previous-generation x86 processors and delivers the highest
integer, floating-point, and 3D multimedia performance
available for x86 platforms, as measured by industry-standard

The AMD Athlon provides industry-leading processing power
for cutting-edge software applications, including digital
content creation, digital photo editing, digital video, image
compression, video encoding for streaming over the internet,
soft DVD, commercial 3D modeling, workstation-class
computer-aided design (CAD), commercial desktop publishing,
and speech recognition.

