Slides: Intel NetBurst Architecture

Modern Processor Architectures:
The Intel NetBurst Technology

Embedded Systems Design (ESD) / Hagenberg
Michael Bogner
Source: Intel Corp, A. Fog. Copenhagen University College of Engineering.
Industrial Software Development
Intel NetBurst Technology (1)
• Deeply pipelined design
-> allows high clock rates
e.g. Pentium 4: 20 pipeline stages
e.g. Pentium 4 with Prescott core: 31 pipeline stages
Note: Some of the stages are just for moving data from one part of the chip to another!
• Different clock rates
-> different parts of the chip run at different clock rates
• Out-of-order and speculative execution
-> to enable parallelism
• Superscalar issue
-> to enable parallelism
• Hardware register renaming
-> to avoid register name space limitations
• L1 instruction cache is a trace cache
-> stores decoded µops instead of instruction opcodes
• L1 data cache
-> 8 or 16 kB, 4 or 8 ways, 64 bytes per line
• Techniques employed to hide stall penalties:
- parallel execution,
- buffering, and
- speculation.
This means:
The microarchitecture executes instructions dynamically and out-of-order.
So the time it takes to execute each individual instruction is not always deterministic.
Design Goal:
The primary design goal of the NetBurst microarchitecture was to obtain the highest possible clock
frequency. This can only be achieved by making the pipeline longer: each stage then does less
work and fits into a shorter clock cycle.
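Hardware register renaming, listed among the features above, can be illustrated with a minimal Python sketch. It is not how the NetBurst hardware is built. The instruction format (dest, src1, src2), the function name rename and the unlimited pool of physical registers are assumptions made for illustration only.

```python
def rename(instructions, num_arch_regs=4):
    """Map architectural register numbers to fresh physical registers.

    Each instruction is a tuple (dest, src1, src2) of architectural
    register numbers. The result uses physical register numbers.
    """
    # Initially, architectural register i lives in physical register i.
    mapping = {r: r for r in range(num_arch_regs)}
    next_phys = num_arch_regs  # assumption: the physical pool never runs out
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources read the physical register that currently holds the value.
        p1, p2 = mapping[src1], mapping[src2]
        # Every write gets a fresh physical register, so reusing a register
        # name no longer creates a false dependency.
        mapping[dest] = next_phys
        renamed.append((next_phys, p1, p2))
        next_phys += 1
    return renamed


program = [
    (0, 1, 2),  # r0 = r1 op r2
    (3, 0, 0),  # r3 = r0 op r0   (true dependency on the first write)
    (0, 2, 2),  # r0 = r2 op r2   (reuses the name r0 for an unrelated value)
]
print(rename(program))  # [(4, 1, 2), (5, 4, 4), (6, 2, 2)]
```

After renaming, the third instruction writes physical register 6 instead of overwriting the register read by the second instruction, so the two can execute in either order.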
Level-1 Trace Cache
The trace cache stores instructions after they have been decoded into µops, rather than
storing instruction opcodes as a conventional level-1 instruction cache does.
One important reason for this is that the decoding stage was a bottleneck on
earlier processors. An instruction can have any length from 1 to 15 bytes. It is quite
complicated to determine the length of an instruction, and the length of the first
instruction must be known in order to know where the second instruction
begins. Therefore, it is difficult to determine instruction lengths in parallel.
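The serial nature of length decoding can be sketched in Python. The encoding used here is a toy assumption, not the x86 format: the first byte of each instruction simply states the instruction's total length. The function name instruction_starts is likewise made up for illustration.

```python
def instruction_starts(code):
    """Return the start offset of every instruction in a toy byte stream.

    Toy rule (an assumption, not x86): the first byte of an instruction
    holds its total length in bytes.
    """
    starts = []
    pos = 0
    while pos < len(code):
        starts.append(pos)
        length = code[pos]
        if length < 1 or pos + length > len(code):
            raise ValueError(f"bad length {length} at offset {pos}")
        # The next start is only known after this length has been decoded.
        pos += length
    return starts


code = bytes([1, 3, 0xAA, 0xBB, 2, 0xCC, 1])
print(instruction_starts(code))  # [0, 1, 4, 6]
```

Each loop iteration depends on the result of the previous one, which is the dependency chain that makes parallel length decoding difficult.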
Pipeline
The pipeline of the Intel NetBurst microarchitecture contains:
• an in-order issue front end,
• an out-of-order superscalar execution core,
• an in-order retirement unit.
The front end supplies instructions in program order to the out-of-order core. It
fetches and decodes instructions. The decoded instructions are translated into
µops.
The front end’s primary job is to feed a continuous stream of µops to the
execution core in original program order.
The out-of-order core aggressively reorders µops so that µops whose inputs are
ready (and have execution resources available) can execute as soon as possible. The
core can issue multiple µops per cycle.
The retirement section ensures that the results of execution are processed
according to original program order and that the proper architectural states are
updated.
Figure 2-5 illustrates a diagram of the major functional blocks associated with the
Intel NetBurst microarchitecture pipeline. The following subsections provide an
overview for each.
Pipeline Front End
The front end of the pipeline performs the following functions:
• prefetches instructions that are likely to be executed
• fetches required instructions that have not been prefetched
• decodes instructions into µops
• generates microcode for complex instructions and special-purpose code
• delivers decoded instructions from the execution trace cache
• predicts branches using advanced algorithms
The front end is designed to address two problems that are sources of delay:
• time required to decode instructions fetched from the target
• wasted decode bandwidth due to branches or a branch target in the
middle of a cache line
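How caching decoded µops saves decode time can be shown with a small Python sketch. It is heavily simplified: a real trace cache stores whole traces of µops along the predicted execution path, whereas this sketch keeps one entry per instruction address. The class ToyTraceCache, the function toy_decode and the µop names are hypothetical.

```python
class ToyTraceCache:
    """Caches decoded µops per instruction address (simplified model)."""

    def __init__(self, decode):
        self.decode = decode      # function: address -> list of µops
        self.lines = {}           # address -> cached µops
        self.decode_count = 0     # how often the decoder had to run

    def fetch(self, address):
        if address not in self.lines:
            # Miss: pay the decode cost once and keep the result.
            self.decode_count += 1
            self.lines[address] = self.decode(address)
        # Hit: deliver µops without decoding again.
        return self.lines[address]


def toy_decode(address):
    # Hypothetical decoder: every instruction becomes two made-up µops.
    return [f"uop{address}a", f"uop{address}b"]


cache = ToyTraceCache(toy_decode)
for _ in range(3):                 # a loop body executed three times
    for addr in (100, 104, 108):
        cache.fetch(addr)
print(cache.decode_count)          # 3: one decode per address, not nine
```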
Out-of-order Core
The core’s ability to execute instructions out of order is a key factor in enabling
parallelism. This feature enables the processor to reorder instructions so that if one
µop is delayed while waiting for data or a contended resource, other µops that
appear later in the program order may proceed. This implies that when one portion
of the pipeline experiences a delay, the delay may be covered by other operations
executing in parallel or by the execution of µops queued up in a buffer.
The core is designed to facilitate parallel execution. It can dispatch up to six µops
per cycle through the issue ports (Figure 2-6).
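The reordering described above can be sketched as a small Python scheduler. This is an illustrative model under stated assumptions, not the NetBurst scheduler: every µop has a fixed latency of at least one cycle, all producers appear earlier in program order, and execution units are unlimited apart from the dispatch width. The function name schedule and the µop names are made up.

```python
def schedule(uops, width=6):
    """Return the start cycle of each µop under out-of-order dispatch.

    uops maps a µop name to (latency, list of producer names), with the
    dict order taken as program order. At most `width` µops start per cycle.
    """
    start, finish = {}, {}
    pending = list(uops)          # program order
    cycle = 0
    while pending:
        issued = 0
        for name in list(pending):
            if issued == width:
                break
            latency, deps = uops[name]
            # A µop is ready once all of its inputs have been produced.
            if all(d in finish and finish[d] <= cycle for d in deps):
                start[name] = cycle
                finish[name] = cycle + latency
                pending.remove(name)
                issued += 1
        cycle += 1
    return start


uops = {
    "load": (4, []),         # slow load
    "add":  (1, ["load"]),   # must wait for the load
    "mul":  (2, []),         # later in program order, but independent
    "sub":  (1, ["mul"]),
}
print(schedule(uops))  # {'load': 0, 'mul': 0, 'sub': 2, 'add': 4}
```

The independent mul and sub run while add is still waiting for the load, which is how the delay of one µop is covered by others.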
Retirement
The retirement section receives the results of the executed µops from the execution
core and processes the results so that the architectural state is updated according
to the original program order. For semantically correct execution, the results of
Intel 64 and IA-32 instructions must be committed in original program order
before they are retired. Exceptions may be raised as instructions are retired. For this
reason, exceptions cannot occur speculatively.
The retirement section also keeps track of branches and sends updated branch
target information to the branch target buffer (BTB). This updates branch history.
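In-order retirement can be sketched in a few lines of Python. The model assumes unlimited retirement bandwidth and ignores exceptions and branch updates. The function name retire_order and the example completion cycles are assumptions for illustration.

```python
def retire_order(program, completion_cycle):
    """Return (name, retire cycle) pairs in original program order.

    A µop retires no earlier than its own completion and no earlier
    than any older µop, even if it finished executing long before.
    """
    retired = []
    last = 0
    for name in program:
        last = max(last, completion_cycle[name])
        retired.append((name, last))
    return retired


program = ["load", "add", "mul", "sub"]
completion = {"load": 4, "add": 5, "mul": 2, "sub": 3}  # executed out of order
print(retire_order(program, completion))
# [('load', 4), ('add', 5), ('mul', 5), ('sub', 5)]
```

Although mul and sub finish early, their results become architecturally visible only after the older load and add have retired.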
