Slides: Intel NetBurst Architecture

Modern Processor Architectures:
Intel NetBurst Technology


Embedded Systems Design (ESD) / Hagenberg
Michael Bogner

Source: Intel Corp.; A. Fog, Copenhagen University College of Engineering.

Industrial Software Development
Intel NetBurst Technology (1)

• Deeply pipelined design
-> allows high clock rates
e.g. Pentium 4: 20 pipeline stages
e.g. Pentium 4 Extreme (HT): 32 pipeline stages

Note: Some of the stages only move data from one part of the chip to another.

• Different clock rates
-> different parts of the chip run at different clock rates

• Out-of-order and speculative execution
-> to enable parallelism

• Superscalar issue
-> to enable parallelism

• Hardware register renaming
-> to avoid register name space limitations

• L1 instruction cache is a trace cache
-> stores decoded µops instead of instruction opcodes

• L1 data cache
-> 8 or 16 kB, 8 ways, 64 bytes per line

• Techniques to hide stall penalties, among them:
- parallel execution,
- buffering, and
- speculation.
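
The register renaming bullet above can be illustrated with a small sketch. It is a toy model, not the NetBurst hardware: the function name `rename`, the three-field µop format and the unlimited pool of physical registers are all assumptions made for the example. Every write receives a fresh physical register, so reusing an architectural name no longer forces later µops to wait.

```python
from itertools import count

def rename(uops, num_arch_regs=4):
    """Map architectural registers to physical registers.

    Each µop is (dest, src1, src2) in architectural register numbers.
    Returns the same µops expressed in physical register numbers.
    """
    # Initially architectural register i lives in physical register i.
    mapping = {r: r for r in range(num_arch_regs)}
    fresh = count(num_arch_regs)  # unlimited pool: a simplification
    renamed = []
    for dest, src1, src2 in uops:
        # Sources read the current mapping, so true dependences survive.
        p1, p2 = mapping[src1], mapping[src2]
        # Every write gets a new physical register, which removes
        # write-after-read and write-after-write conflicts on the name.
        pd = next(fresh)
        mapping[dest] = pd
        renamed.append((pd, p1, p2))
    return renamed

program = [
    (0, 1, 2),  # r0 = r1 op r2
    (3, 0, 1),  # r3 = r0 op r1  (true dependence on the first µop)
    (0, 2, 2),  # r0 = r2 op r2  (reuses the name r0)
]
renamed_program = rename(program)
```

In this sketch the third µop writes physical register 6 rather than the register read by the second µop, so it could execute before the second µop without corrupting its input.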

Intel NetBurst Technology (2)

This means:

The microarchitecture executes instructions dynamically and out-of-order.

So the time it takes to execute each individual instruction is not always
deterministic.

Design Goal:

The primary design goal of the NetBurst microarchitecture was to obtain the
highest possible clock frequency. This can only be achieved by making the
pipeline longer.
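
The link between pipeline length and clock frequency can be sketched with a common textbook model: a fixed amount of logic delay is divided evenly over the stages, and each stage adds a fixed latch overhead. The function names and the numbers (10 ns of logic, 0.1 ns per latch) are assumptions chosen for illustration, not NetBurst measurements.

```python
def clock_ghz(stages, logic_ns=10.0, latch_ns=0.1):
    """Clock frequency in GHz when logic_ns of work is split over `stages`."""
    cycle_ns = logic_ns / stages + latch_ns
    return 1.0 / cycle_ns

def refill_ns(stages, logic_ns=10.0, latch_ns=0.1):
    """Worst-case time to refill the whole pipeline after a flush."""
    cycle_ns = logic_ns / stages + latch_ns
    return stages * cycle_ns

for stages in (10, 20, 32):
    print(stages, round(clock_ghz(stages), 2), round(refill_ns(stages), 2))
```

With these made-up numbers, going from 20 to 32 stages raises the modelled clock from about 1.67 GHz to about 2.42 GHz, while the worst-case refill time grows from 12.0 ns to 13.2 ns. This is the trade-off a very long pipeline has to hide through prediction and buffering.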

Intel NetBurst Technology (3)

Level-1 Trace Cache

Instructions are stored in the trace cache after being decoded into µops. Instead
of holding instruction opcodes, as a conventional level-1 instruction cache does,
it holds decoded µops.

One important reason for this is that the decoding stage was a bottleneck on
earlier processors. An instruction can have any length from 1 to 15 bytes. It is
quite complicated to determine the length of an instruction, and we have to know
the length of the first instruction in order to know where the second one
begins. Therefore, it is difficult to determine instruction lengths in parallel.
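
The serial nature of length decoding can be shown with a deliberately simplified encoding. The rule used here (the first byte of each instruction holds its total length) is an assumption for the sketch and is not how x86 encodes lengths. The function name `instruction_starts` is likewise hypothetical.

```python
def instruction_starts(code):
    """Return the start offsets of instructions in a toy encoding.

    Toy rule (an assumption, not real x86): the first byte of each
    instruction holds its total length in bytes (1 to 15).
    """
    starts = []
    offset = 0
    while offset < len(code):
        length = code[offset]
        if not 1 <= length <= 15:
            raise ValueError(f"bad length {length} at offset {offset}")
        starts.append(offset)
        # The next start is only known once this length has been decoded.
        offset += length
    return starts

# Bytes 5 and 6 look like valid length bytes but are operand data of the
# instruction that starts at offset 4.
code = bytes([3, 0xAA, 0xBB, 1, 5, 1, 2, 3, 4, 2, 0xFF])
starts = instruction_starts(code)
```

Each loop iteration depends on the previous one, which is why a decoder cannot simply inspect every byte position in parallel and trust what it finds there.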

Pipeline

The pipeline of the Intel NetBurst microarchitecture contains:

• an in-order issue front end,
• an out-of-order superscalar execution core,
• an in-order retirement unit.

The front end supplies instructions in program order to the out-of-order core. It
fetches and decodes instructions. The decoded instructions are translated into
µops.

The front end’s primary job is to feed a continuous stream of µops to the
execution core in original program order.

The out-of-order core aggressively reorders µops so that µops whose inputs are
ready (and have execution resources available) can execute as soon as possible. The
core can issue multiple µops per cycle.

The retirement section ensures that the results of execution are processed
according to original program order and that the proper architectural states are
updated.

Figure 2-5 shows the major functional blocks of the Intel NetBurst
microarchitecture pipeline. The following subsections provide an overview of
each.

Intel NetBurst Technology (4)

Pipeline Front End

The front end of the pipeline performs the following functions:

• prefetches instructions that are likely to be executed
• fetches required instructions that have not been prefetched
• decodes instructions into µops
• generates microcode for complex instructions and special-purpose code
• delivers decoded instructions from the execution trace cache
• predicts branches using advanced algorithms

The front end is designed to address two problems that are sources of delay:

• time required to decode instructions fetched from the target
• wasted decode bandwidth due to branches or a branch target in the
middle of a cache line
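
How a cache of decoded µops removes repeated decode work can be sketched as follows. This is a strong simplification: the model caches µops per instruction address, whereas a real trace cache stores sequences of µops along the executed path. The class `TraceCacheModel`, the decoder rule and the addresses are hypothetical.

```python
class TraceCacheModel:
    """Toy model: keep decoded µops, keyed by instruction address."""

    def __init__(self, decoder):
        self.decoder = decoder   # function: instruction -> list of µops
        self.cache = {}          # address -> decoded µops
        self.decode_count = 0    # how often the slow decoder had to run

    def fetch(self, address, instruction):
        if address not in self.cache:
            self.decode_count += 1
            self.cache[address] = self.decoder(instruction)
        return self.cache[address]

def toy_decoder(instruction):
    # Hypothetical rule: a read-modify-write instruction becomes three µops.
    if instruction == "add [mem], reg":
        return ["load", "add", "store"]
    return [instruction]

loop_body = [(0x100, "add [mem], reg"), (0x103, "dec ecx"), (0x104, "jnz 0x100")]
tc = TraceCacheModel(toy_decoder)
delivered = []
for _ in range(1000):  # the loop body executes 1000 times
    for address, instruction in loop_body:
        delivered.extend(tc.fetch(address, instruction))
```

In this model the decoder runs only three times although 5000 µops are delivered, so the decode stage stays off the critical path for code that hits in the cache.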

Intel NetBurst Technology (5)

Out-of-order Core
The core’s ability to execute instructions out of order is a key factor in enabling
parallelism. This feature enables the processor to reorder instructions so that if one
µop is delayed while waiting for data or a contended resource, other µops that
appear later in the program order may proceed. This implies that when one portion
of the pipeline experiences a delay, the delay may be covered by other operations
executing in parallel or by the execution of µops queued up in a buffer.

The core is designed to facilitate parallel execution. It can dispatch up to six µops
per cycle through the issue ports (Figure 2-6).
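
The benefit of letting later µops overtake a delayed one can be sketched with a small timing model. It assumes unlimited execution resources and ignores the issue-width limit. The function `schedule`, the µop names and the latencies are made up for the example.

```python
def schedule(uops, in_order):
    """Return the cycle in which the last µop completes.

    uops: list of (name, latency, [producer names]) in program order.
    in_order=True:  a µop cannot start before its predecessor has started.
    in_order=False: a µop starts as soon as its inputs are available.
    """
    done = {}        # name -> cycle at which the result is available
    prev_start = 0
    for name, latency, deps in uops:
        ready = max((done[d] for d in deps), default=0)
        start = max(ready, prev_start) if in_order else ready
        done[name] = start + latency
        prev_start = start
    return max(done.values())

program = [
    ("load_a", 10, []),          # long latency, e.g. a cache miss
    ("use_a",   1, ["load_a"]),  # has to wait for the load
    ("b",       1, []),          # independent work
    ("c",       1, ["b"]),
    ("d",       1, ["c"]),
]
cycles_in_order = schedule(program, in_order=True)
cycles_out_of_order = schedule(program, in_order=False)
```

In this model the in-order run needs 13 cycles and the out-of-order run 11, because the independent chain b, c, d executes while the load is still outstanding.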

Intel NetBurst Technology (6)

Retirement
The retirement section receives the results of the executed µops from the execution
core and processes the results so that the architectural state is updated according
to the original program order. For semantically correct execution, the results of
Intel 64 and IA-32 instructions must be committed in original program order
before they are retired. Exceptions may be raised as instructions are retired. For this
reason, exceptions cannot occur speculatively.
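
In-order retirement can be sketched as a queue that only releases finished µops from its head. The function `retire`, the µop names and the `faulting` set are assumptions for this example. The sketch models only the ordering rule, not the actual retirement hardware.

```python
from collections import deque

def retire(rob, completed, faulting=frozenset()):
    """Retire finished µops from the head of the buffer, oldest first.

    rob:       deque of µop names in program order (oldest at the left)
    completed: names of µops whose execution has finished
    faulting:  names of µops whose execution detected an exception
    Returns (retired, exception); exception is a µop name or None.
    """
    retired = []
    while rob and rob[0] in completed:
        uop = rob.popleft()
        if uop in faulting:
            rob.clear()           # younger µops are discarded, never committed
            return retired, uop   # the exception becomes visible only here
        retired.append(uop)
    return retired, None

rob = deque(["u0", "u1", "u2", "u3"])
completed = {"u1", "u2"}           # finished out of order, u0 still pending
first = retire(rob, completed)     # nothing retires: u0 blocks the head
completed.add("u0")
second = retire(rob, completed, faulting={"u2"})
```

Here `u1` and `u2` finish early but cannot retire until `u0` has finished. The fault detected by `u2` is reported only after `u0` and `u1` have retired, and `u3` is dropped without changing architectural state.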

The retirement section also keeps track of branches and sends updated branch
target information to the branch target buffer (BTB). This updates branch history.
